Iciinfs 2015 7399058
18-20,2015,Sri Lanka
I. INTRODUCTION
A. Malaria Diagnosis
Malaria is an infectious disease common in many parts of the world, although tropical countries are the most affected because it is transmitted through the bite of infected Anopheles mosquitoes. If not treated immediately, it can not only lead to death but also result in socio-economic downfalls and complications during pregnancy [1].
As listed in Table 1, there are five Plasmodium species which cause malaria in humans. Among them, P. falciparum is the most malevolent form and is responsible for the majority of deaths [1]-[3]. The species vary in size, shape and density as illustrated in Fig. 1. The Gametocyte stage, i.e. the adult stage of the Plasmodium life-cycle, of P. falciparum has a distinctive "banana" shape as seen in Fig. 1 (a) and (b).
Fig. 1 Sample images of blood. Malaria parasites are shown inside the red rectangle. (a) and (b) P. falciparum (c) P. knowlesi (d) P. malariae (e) P. ovale (f) P. vivax
Early detection based on blood sample analysis can significantly minimize adverse effects and fatalities. Symptoms of malaria such as high fever, headache, body pain, profuse sweating, severe chills, etc. are common to many other diseases [1]. Hence, accurate diagnosis demands microscope-based blood film analysis carried out by medical laboratory technologists (MLT) or parasitologists. It is an arduous task for these professionals to test hundreds of samples merely to detect malaria infection. On the other hand, patients have a considerable wait-time to obtain test results. Antigen-based rapid diagnostic tests (RDT) such as those based on PfHRP2, pLDH and plasmodial aldolase have major issues including sensitivity to temperature, genetic variation and persistence of antigen in the bloodstream, which make these test results inconsistent in specificities and sensitivities [4].
A well-grounded microscopic analysis of malaria involves preparing thick and thin films from a blood drop. In order to vividly identify parasites, the films are stained, typically with Giemsa stain. While basic information is observed from the thick film, the thin film is vital for accurate and detailed analysis.
B. Photomicrography
The motorized and software-controlled microscope developed by Silva et al. [5] can be programmed to adjust magnification, illumination, etc. With the development of these automated microscopes and novel automatic blood smear collection technologies [6], fully automated blood tests, especially detection of hemoparasites, are promising.
Elter et al. detected possible malaria parasites in thick blood films [7]. The majority of automatic malarial detection research studies [8]-[12] were aimed at classifying parasites based on morphological features in thin films. However, accurately inferring all malarial parasite species merely based on shapes was challenging because there are several species, such as P. ovale and P. vivax, that share the same shape. References [13]-[15] separated all cells and analyzed them based on the difference in colors. Nevertheless, colors of the cytoplasm and chromatin can vary based on the staining procedure, how cells take up the stain and sex [1]. Though similar, in a more recent study Linder et al. [16] attempted to exhaustively search for the parasite using a sliding window and segment the parasite based on RGB intensities.
Tek et al. [10] carried out a comprehensive analysis to identify different life-cycle stages of the Plasmodium parasite. Since it is required to analyze a multitude of features such as size, number of parasites, shape, visibility of chromatin, pigmentation, etc. to accurately determine the stage [1], feature extraction becomes more challenging and less feasible. The external validity is hard to appraise as they maintained controlled constant staining and microscope illumination levels. The closest study to ours was carried out by Anggarani et al. [17], which specifically aimed to count the number of P. falciparum. They used a multiple thresholding technique for 20 photomicrographs. Converting a color image to gray scale without extracting any valuable information, especially in a stained blood film, is clearly a limitation. Using only P. falciparum infected blood films can be identified as the other drawback and a deterrent for ecological validity, because the type of parasite (colors of their cytoplasm and chromatin) is clearly a confounding factor for thresholding. The brief experiment discussed in [18], which is based on the absolute difference between the original and an inverted RGB image, similarly assumed the presence of only P. falciparum in blood samples. Since the proposed method was merely to enhance the image, but not to automatically detect the parasite, features of the isolated parasitic region may be less suitable for automatic classification due to erratic morphological distortion. Anand et al. [19] identified malaria infected red blood cells based on a less ubiquitous holographic microscope, which allowed them to consider the cell thickness as a candidate feature.
Identifying features suitable for classification is challenging because parasites differ in size, can be in any orientation and can be anywhere in the image. For instance, compare the size and position of parasites in Fig. 1 (a) and (b). In this paper, we discuss how the background (all pixels that do not belong to parasites) of a Giemsa-stained thin film image is suppressed based on properties of the HSV color space and image pre-processing techniques. Nevertheless, unlike previous methods, colors are not used as input features for classifiers. In light of work done by Hu et al. [20], significant invariant features for this application are identified. So as to determine if P. falciparum is present in its Gametocyte stage in the film, binary classification is performed based on K-nearest neighbors (K-NN) and naive Bayes classifiers.
II. APPROACH
A. Data Collection
Thirty RGB Giemsa-stained thin film photomicrographs (9, 2, 6, 6 and 7 images from P. falciparum, P. knowlesi, P. malariae, P. ovale and P. vivax respectively) of size 300 x 300 pixels were obtained from the DPDx database [21]. These images were at different magnification and illumination levels. Images were selected so that each image contained at least one malarial parasite because detection of non-parasite samples is trivial in image processing.
B. Photomicrograph pre-processing
The primary purpose of morphological pre-processing is to isolate the parasite from the rest of the background and qualify it for the subsequent feature extraction process. As shown in Fig. 1, the background is cluttered with blood cells, platelets, etc.
In order to make the parasite more vivid, the input RGB image is converted to HSV color space, which consists of M x N pixels where M is the height and N is the width of the image. The pixel-wise color thresholding operation given in (1) is performed to develop a new binary image $f'(i,j)$ on the converted image $f(i,j,c)$, where $i$ and $j$ indicate columns and rows respectively and $c \in \{Hue, Saturation, Value\}$ is the channel. The operator "$\vee$" is the inclusive disjunction (a.k.a. logical OR) between two indicator functions $\mathbb{I} \in \{0,1\}$, and $S_L$, $S_U$, $V_L$ and $V_U$ are empirically determined scalars.
$$f'(i,j) = \mathbb{I}_S(i,j) \vee \mathbb{I}_V(i,j) \tag{1}$$
where,
$$\mathbb{I}_S(i,j) = \begin{cases} 1, & S_L < f(i,j,Saturation) < S_U \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
$$\mathbb{I}_V(i,j) = \begin{cases} 1, & V_L < f(i,j,Value) < V_U \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
Artefacts and contaminants such as fungus, airborne pollen, spores, dirt, bacteria, contaminated water, etc. are typically found in blood films in small scale. As a result, the resulting image may contain small undesirable pigments and noise. Additionally, the quality of the image and ill-tuned scalars in the previous stage could contribute to this noise. These can be removed by morphological opening followed by morphological closing operations [22]. If there are multiple parasites in the image, all smaller (by area) parasites are discarded from the image, leaving only the largest parasite for feature extraction.
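The thresholding rule in (1) to (3) can be illustrated with a minimal sketch. It assumes the image is a nested list of 8-bit `(r, g, b)` tuples and uses the standard-library `colorsys` module for the RGB to HSV conversion; the function name `threshold_hsv` is hypothetical, and the default bounds are the empirical settings reported in Section III-A. It is an illustration of the rule, not the implementation used for the experiments.

```python
import colorsys

def threshold_hsv(rgb_image, s_low=0.2, s_high=0.9, v_low=0.6, v_high=0.7):
    """Binary mask per Eq. (1): a pixel is 1 if its Saturation lies in
    (s_low, s_high) OR its Value lies in (v_low, v_high).

    rgb_image is assumed to be a list of rows of (r, g, b) tuples in 0..255.
    """
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            # colorsys works on floats in [0, 1] and returns (h, s, v)
            _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            in_s = s_low < s < s_high   # indicator of Eq. (2)
            in_v = v_low < v < v_high   # indicator of Eq. (3)
            mask_row.append(1 if (in_s or in_v) else 0)
        mask.append(mask_row)
    return mask
```

Because the two indicators are combined with a logical OR, a pixel passes if either channel falls inside its band, so the bounds need to be tuned jointly.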
2015 IEEE 10th International Conference on Industrial and Information Systems,ICIIS 2015, Dec. 18-20,2015,Sri Lanka
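The clean-up described at the end of Section II-B (opening, then closing, then keeping only the largest region) can be sketched in pure Python as follows. The helper names, the 4-connectivity and the border handling (neighbors outside the image are ignored) are assumptions made for this illustration; a real pipeline would more likely rely on an image-processing library.

```python
from collections import deque

def disk_offsets(radius):
    """Offsets (di, dj) covered by a disk-shaped structuring element."""
    return [(di, dj)
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)
            if di * di + dj * dj <= radius * radius]

def erode(mask, offsets):
    """Keep a pixel only if all in-bounds neighbors under the element are 1."""
    h, w = len(mask), len(mask[0])
    return [[int(all(mask[i + di][j + dj] for di, dj in offsets
                     if 0 <= i + di < h and 0 <= j + dj < w))
             for j in range(w)] for i in range(h)]

def dilate(mask, offsets):
    """Set a pixel if any in-bounds neighbor under the element is 1."""
    h, w = len(mask), len(mask[0])
    return [[int(any(mask[i + di][j + dj] for di, dj in offsets
                     if 0 <= i + di < h and 0 <= j + dj < w))
             for j in range(w)] for i in range(h)]

def open_then_close(mask, radius=2):
    """Opening (erode, dilate) removes specks; closing (dilate, erode) fills gaps."""
    se = disk_offsets(radius)
    opened = dilate(erode(mask, se), se)
    return erode(dilate(opened, se), se)

def largest_component(mask):
    """Return a mask containing only the largest 4-connected region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                seen[i][j] = True
                queue, region = deque([(i, j)]), []
                while queue:  # breadth-first flood fill of one region
                    a, b = queue.popleft()
                    region.append((a, b))
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if (0 <= na < h and 0 <= nb < w
                                and mask[na][nb] and not seen[na][nb]):
                            seen[na][nb] = True
                            queue.append((na, nb))
                if len(region) > len(best):
                    best = region
    out = [[0] * w for _ in range(h)]
    for a, b in best:
        out[a][b] = 1
    return out
```

A typical call would be `largest_component(open_then_close(mask, radius=2))`, where `mask` is the binary image produced by (1).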
C. Feature Extraction
Although the parasite is isolated from its background, it may be at any position, orientation and size. Hence, it is simply impossible to consider area, perimeter and components of the color histogram as features. Similarly, due to the many permutations of the aforementioned features, it is less pragmatic to take all possibilities into account and utilize a classical matching technique. To obtain reliable results in template matching [22], the query image should approximately overlap with the template.
We make use of TRS moment invariant [20],[23] features, which were developed based on the "uniqueness theorem" [20]. Although the derivations are based on the Riemann integral, which is inferior to the Lebesgue integral, it makes no difference in this application as images are bounded in size. Analogous to statistical moments, seven moment invariant features, given in (4) to (10), were derived [20] based on a moment generating function (MGF). These features have been previously used in recognizing faces [24], hand gestures [25]-[26], landed aircraft [27], etc.
$$M_1 = \eta_{20} + \eta_{02} \tag{4}$$
The remaining invariants $M_2$ to $M_7$ are given in (5) to (10) [20].
Here, $\eta$ is defined as in (11) based on (12).
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{w}} \tag{11}$$
for $w = [(p + q)/2 + 1]$ and $(p + q) = 2, 3, 4, \ldots$
$$\mu_{pq} = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} (i - \bar{i})^{p} (j - \bar{j})^{q} f'(i,j) \tag{12}$$
where $\bar{i} = m_{10}/m_{00}$, $\bar{j} = m_{01}/m_{00}$ and $m_{pq} = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} i^{p} j^{q} f'(i,j)$.
As there are seven features $M_1$ to $M_7$, the "curse of dimensionality" [28] would be an issue, especially when the Euclidean distance among data points is measured. As this is the case in nearest neighbor search, which will be discussed under section II D, it is vital to reduce the dimensionality. Although each of the seven features has the attractive property of being invariant to translation, orientation and scale, they do not possess a straightforward physical interpretation in contrast to statistical moments such as mean, variance, skewness, kurtosis, etc. Therefore, it is not possible to directly identify unwanted and trivial features. Principal component analysis (PCA) is suggested to identify highly correlated features. If there are such correlated features, some extraneous features are removed and the significant features that will be used in classification are named $F_1, F_2, F_3, \ldots, F_Q$ where $Q \le 7$.
D. Classification
K-nearest neighbors is a non-parametric memory-based classification method [28]-[29] which infers the class of a query input by observing the nearest K points.
For a dataset of size $R$ that contains $D$ dimensional inputs $X \in \mathbb{R}^{R \times D}$ and outputs $y \in C$, the K-NN predictor determines the probability (13) that the point belongs to each class $C \in \{P.fal, Non\ P.fal\}$ for a query input $X^*$.
$$p(y = C \mid X^*, X, y, K) = \frac{1}{K} \sum_{r \in \Gamma_K(X^*, X, y)} \mathbb{I}(y_r = C) \tag{13}$$
where $\Gamma_K(X^*, X, y)$ are the indices of the K nearest neighbors to $X^*$. The class that maximizes the probability is calculated from (14).
$$\hat{y}(X^*) = \arg\max_{C} \; p(y = C \mid X^*, X, y, K) \tag{14}$$
During the training process, hyperparameter K can be chosen to maximize the classification accuracy.
Although Support Vector Machines (SVM) and Artificial Neural Networks (ANN) are popular [25]-[26], [30]-[31] due to their easy-to-use property, they are not of our interest for this application as they lack probabilistic interpretation and require many training points. Since the dimensionality has been reduced, cumbersome Bayesian models such as [...]
[...] parameters tend to over-fit unless regularization is performed and sufficiently large amount of data is not provided. Nevertheless, regularized logistic regression is also a possible option.
III. EXPERIMENTAL RESULTS
A. Pre-processing and Feature Extraction
All images were converted to HSV color space as illustrated in Fig. 2. Then the images were filtered by (1). It was empirically found that settings $S_L = 0.2$, $S_U = 0.9$, $V_L = 0.6$ and $V_U = 0.7$ produce acceptable images. Then morphological opening followed by dilation was performed with a disk shaped structuring element with a radius of 2 pixels. The output images of the two operations are shown in Fig. 3 (b) and (c) respectively. Note that noise in Fig. 3 (a) (e.g. white pixels in the bottom-right corner) has been removed. The results did not change significantly if disk radius < 5. Fig. 3 (d) shows isolation of the largest contiguous area based on connected components. Eventually, as an additional step, image hole filling was performed to ensure that there are no black pixels on the parasite.
The seven TRS moment invariant features [20],[22] were calculated for each image. The normalized features were used to perform PCA, which revealed, as illustrated in the bi-plot of Fig. 4, that features M1 and M2; M3 and M4; and M5, M6 and M7 were highly correlated, because the angle formed by any two M variables in the bi-plot indicates pair-wise correlation. Pearson correlation among features verified the result. Therefore, only features M1, M4 and M6, which will henceforth be called F1, F2 and F3 respectively, were used in further analysis.
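To make the moment computation concrete, the sketch below evaluates the raw moments, the central moments of (12), the normalized moments of (11) and the first invariant of (4) on a binary image stored as a nested list. The function names are hypothetical. The second invariant is written in the standard form from Hu [20], $M_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$, which is an assumption here because that equation is not reproduced in the text above.

```python
def raw_moment(img, p, q):
    """m_pq = sum_i sum_j i^p j^q f'(i, j)."""
    return sum((i ** p) * (j ** q) * img[i][j]
               for i in range(len(img)) for j in range(len(img[0])))

def central_moment(img, p, q):
    """mu_pq of Eq. (12), centered on the centroid (i_bar, j_bar)."""
    m00 = raw_moment(img, 0, 0)
    i_bar = raw_moment(img, 1, 0) / m00
    j_bar = raw_moment(img, 0, 1) / m00
    return sum(((i - i_bar) ** p) * ((j - j_bar) ** q) * img[i][j]
               for i in range(len(img)) for j in range(len(img[0])))

def eta(img, p, q):
    """Normalized central moment of Eq. (11), with w = (p + q) / 2 + 1."""
    w = (p + q) / 2 + 1
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** w

def hu_first_two(img):
    """Return (M1, M2); M1 follows Eq. (4), M2 is the standard Hu form (assumed)."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    m1 = n20 + n02
    m2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    return m1, m2
```

Shifting the same shape to another position, or rotating the image by 90 degrees, leaves both values unchanged up to floating-point error, which is the property that motivates using these features when the parasite can appear anywhere in the image.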
Fig. 2 RGB to HSV conversion. (a) Original RGB image (b) Hue (c) Saturation (d) Value
Fig. 3 Morphological processing. (a) Output binary image in (1) (b) Morphological opening (c) Morphological closing (d) Largest contiguous area
Fig. 4 [bi-plot of the moment invariant features; vertical axis: Component 2]
Fig. 5 Model selection for K-NN classifier [horizontal axis: K in K-NN]
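The K-NN predictor of (13) and (14), together with a simple way of choosing K, can be sketched as below. The leave-one-out accuracy used for model selection is an assumption made for illustration, since the validation scheme behind Fig. 5 is not stated in the text; the function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(x_query, X, y, k):
    """Return (label, class probabilities) following Eq. (13) and (14)."""
    # Indices of the k training points closest to the query (Euclidean distance)
    order = sorted(range(len(X)), key=lambda r: math.dist(X[r], x_query))
    counts = Counter(y[r] for r in order[:k])
    # Eq. (13): fraction of the k neighbors that carry each class label
    probs = {c: counts.get(c, 0) / k for c in sorted(set(y))}
    # Eq. (14): class with the highest probability
    label = max(probs, key=probs.get)
    return label, probs

def leave_one_out_accuracy(X, y, k):
    """Accuracy when each point is classified from all remaining points."""
    hits = 0
    for r in range(len(X)):
        label, _ = knn_predict(X[r], X[:r] + X[r + 1:], y[:r] + y[r + 1:], k)
        hits += (label == y[r])
    return hits / len(X)

def choose_k(X, y, candidates):
    """Pick the candidate K with the best leave-one-out accuracy."""
    return max(candidates, key=lambda k: leave_one_out_accuracy(X, y, k))
```

Odd values of K avoid ties in the two-class case; with even K, this sketch resolves ties in favor of the alphabetically first label.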
477
2015 IEEE 10th International Conference on Industrial and Information Systems,ICIIS 2015, Dec. 18-20,2015,Sri Lanka
Fig. 6 [panels: F2 vs. F1, F3 vs. F2 and F3 vs. F1]
A. Pre-processing
As shown in Fig. 2, magnitudes of Hue pixels are approximately similar throughout the image, as Hue is a measure of the difference in perceived colors w.r.t. red, green, blue and yellow. Since parasites and cells appear to share approximately similar colors, there is no difference in Hue magnitudes. In contrast, Saturation magnitudes can be used to differentiate the parasite from its background as the parasite is more chromatic relative to its own brightness. Value magnitude is particularly important as the chromatin of the parasite is darker. We conjecture the possibility of using HSL and YCrCb color spaces instead of HSV space. Further, as colors of the chromatin and cytoplasm determine sexual orientation of the parasite, we also conjecture the possibility of inferring it based on Value magnitudes in conjunction with RGB values.
Fig. 7 Robustness (a) Artefacts and contaminants in the original image (b) Filter performance
B. Classification
Naive Bayes classification has the highest true positive rate and true negative rate. As in many medical tests, true positive rates are more important than false positive rates, because if non-P. falciparum is detected as P. falciparum it is possible to proceed to a formal laboratory test carried out by a human expert. In contrast, if P. falciparum is detected as non-P. falciparum, the patient may simply assume that he/she is not infected with malaria. As a result, Gaussian naive Bayes should be selected as the classifier. The results are not comparable to [17],[34] as sensitivity and specificity of detecting P. falciparum are reported w.r.t. red blood cell count.
Fig. 8 Reason for false positive alarms (a) Original image (b) Filtered image
Detailed feature plots, probability surfaces and decision surfaces (maximum probability surfaces) are shown in the rows of Fig. 6. Each column represents one feature-feature plot. The scatter plots in the first row show training data. The second and third rows depict the probability of a given point in the graph being P. falciparum and the probability of it being non-P. falciparum respectively. These two rows verify the relationship (16) for binary classification.
$$p(P.fal) + p(Non\ P.fal) = 1 \tag{16}$$
The last row indicates the decision surface determined by applying (17) for each coordinate.
$$y_{MAP} = \max[\,p(P.fal),\ p(Non\ P.fal)\,] \tag{17}$$
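The relationships in (16) and (17) can be checked numerically with a small Gaussian naive Bayes sketch. Everything below is an illustrative assumption: the function names, the maximum-likelihood variance estimate with a small floor, and the toy feature values are not taken from the experiments.

```python
import math

def fit_gaussian_nb(X, y):
    """Per class: prior, per-feature mean and per-feature variance."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Small floor keeps the variance strictly positive (assumption)
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (n / len(X), means, variances)
    return model

def posterior(model, x):
    """Class posteriors; features are treated as independent Gaussians."""
    log_joint = {}
    for c, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_joint[c] = ll
    top = max(log_joint.values())           # subtract the max for stability
    unnorm = {c: math.exp(ll - top) for c, ll in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Toy three-feature data (hypothetical values standing in for F1, F2, F3)
X = [(0.20, 0.10, 0.30), (0.30, 0.20, 0.20), (0.25, 0.15, 0.35),
     (0.80, 0.90, 0.70), (0.70, 0.80, 0.90), (0.90, 0.70, 0.80)]
y = ["P.fal"] * 3 + ["Non P.fal"] * 3
post = posterior(fit_gaussian_nb(X, y), (0.25, 0.15, 0.30))
y_map = max(post.values())                  # Eq. (17): maximum probability
```

Here `sum(post.values())` equals 1 up to rounding, which is the relationship in (16), and `y_map` is the value that would be plotted on a decision surface at this coordinate.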