Texture Indexes and Gray Level Size Zone Matrix Application To Cell Nuclei Classification
Abstract: In this paper, we present a study on the characterization and
classification of textures. This study is performed using a set of
values obtained by the computation of indexes. To obtain these indexes,
we extract a set of data with two techniques: the computation of
matrices, which are statistical representations of the texture, and the
computation of "measures". These matrices and measures are subsequently
used as parameters of a function returning real or discrete values that
give information about texture features. A model of texture
characterization is then built from this numerical information, for
example to classify textures. An application is proposed to classify
cell nuclei in order to diagnose patients affected by the Progeria
disease.

1. INTRODUCTION
Pattern recognition is a major part of artificial intelligence that
aims to automate the identification of typical situations. It is a
major objective for many applications: handwritten character
recognition (optical character recognition, automatic reading of postal
letters and bank checks, etc.), video surveillance (facial
recognition), medical imaging (ultrasound, CAT scan, Magnetic Resonance
Imaging), etc. At the heart of the pattern recognition issue, there is
a first and unavoidable step: shape characterization. Indeed, in order
to recognize an object or a person, it is necessary to describe it by
defining its characteristics (morphological, geometrical, textural,
etc.) and then to find and identify these characteristics in the
digital source under investigation. For this reason, it is often
helpful to distinguish two classes of characteristics: the shape
(using global analysis algorithms or outline analysis algorithms) and
the texture.

The aim of this paper is to create a model to classify cultured skin
fibroblast nuclei in patients affected by Progeria disease (otherwise
known as the Hutchinson-Gilford syndrome). This rare disease (which
affects about one hundred patients in the world) is caused by a
mutation in a gene encoding lamins A and C, two proteins localized at
the nuclear periphery (lamina) and within the nucleoplasm [5]. Progeria
patients exhibit accelerated aging. The presence of mutated lamin A
protein results in abnormal nuclear shape and texture (not
homogeneous), evidenced by the immunodetection of lamin A/C (primary
antibody directed against lamin A/C, secondary antibody coupled to
fluorescein isothiocyanate) (see fig. 1). Digitized pictures of
immunostained nuclei were sampled using a conventional epifluorescent
microscope (Leica DMR) coupled to a Princeton-Roper camera. The two
aims of this study are: i/ to design an automatic classification of
abnormal nuclei, to be compared with the classification made by an
expert microscopist, through the analysis of more nuclei than is
possible for the expert; ii/ to follow up, using this automated
procedure, any nuclear changes induced by therapeutic drugs.

The first work on the characterization and classification of nucleus
shape [10, 11] made it possible to obtain a classification success rate
of more than 95% thanks to the creation and use of dedicated shape
indexes. The texture characterization method, on the other hand, was
less satisfactory. This original approach, based on indexes initially
obtained for shape characterization, was not able to achieve a success
rate of more than 85%. This rate is inferior to the expert's
repeatability rate (which corresponds to the percentage of nuclei
classified in the same way by an expert in two successive analyses, and
is between 86 and 89%). Since the result of this first attempt was
inferior to the repeatability rate, we wished to improve it and obtain
one close to that given by shape classification.

OUR CONTRIBUTION
With this in mind, we present the classification and validation methods
used, followed by the characterization techniques used to improve the
results of texture classification. Cooccurrence matrices and Haralick
indexes will be presented first. The Run Length Matrix is then used and
modified in order to create a novel texture homogeneity
characterization method. Our contribution introduces a new method based
on the construction and analysis of statistical matrices that represent
the texture.
All these techniques have been studied and validated by the model
developed in order to solve the texture characterization issue.

2. CLASSIFICATION
The aim of classification is to attribute a class to each object being
studied. As mentioned earlier, the objective of this sub-task is to
determine whether a cell's nucleus has a normal (homogeneous) or an
abnormal (inhomogeneous) texture.

The classification methods are said to be supervised, as they require a
reference expert analysis. In this study we benefit from the knowledge
of biologists and geneticists, who have specified classes (healthy and
pathological) and subclasses (normal and irregular shapes, homogeneous
and non-homogeneous textures, etc.).

A classification model is usually built using a learning method, with
the help of data divided into known classes. Though applied to a
specific problem, the model must be capable of generalizing to new
data. With this objective, the data is separated into two groups: a
learning sample and a validation sample. The classifier must have the
same performance rate on both. Prior to the classification phase, it is
necessary to construct a characteristic vector for each data item. The
vector must be relevant to the problem in order to allow accurate
classification and prediction. The major risk when providing too many
characteristics to the classifier is rote learning. The greater the
vector's dimension, the greater the flexibility of the model and the
better the classification of the learning sample, but the greater the
likelihood that the model's performance will be poor on data not used
during learning. Each model must therefore be systematically validated,
and the model giving the best classification of the validation sample
is retained. Due to the few elements in the sample at our disposal,
validation was done according to the Leave One Out protocol [8].

The method of classification chosen for the model is logistic
regression [7]. It is a linear model particularly well adapted to
classification problems with two classes:

$$P(Y/x) = \frac{e^{f(x)}}{1 + e^{f(x)}}$$

with $x = (x_1, \ldots, x_n)$ the characteristic vector of the initial
data, $f(x) = \sum_{i=1}^{n} \alpha_i x_i$, and $P(Y/x)$ the
conditional probability that the variable $x$ belongs to the class $Y$.
To estimate the coefficients $\alpha_i$ of the model, the maximum
likelihood method is often used, which maximizes the probability of
obtaining the values observed on the learning sample. It consists in
finding the parameters that optimize the likelihood function

$$£(\alpha, Y) = P^{Y} [1 - P]^{1 - Y}.$$

Logistic regression is preferred over [...]

[...] representation. This method defines texture according to the
spatial distribution of gray levels and characterizes texture by means
of second-order statistics. To accomplish this, it considers the
relationships that exist between the gray levels of pixels of the
texture for a given displacement vector d. The resulting matrix is of
size $N \times N$, where N is the number of gray levels in the texture.
For a given displacement vector $d = (d_x, d_y)$, an element (i,j) of
the matrix is defined by the number of pixels in the texture that have
a gray level j at a distance d from a pixel of gray level i. This can
be written as follows:

$$M_d(i,j) = \operatorname{card}\left\{ \big( (r,s), (r + d_x, s + d_y) \big) \;/\; I(r,s) = i,\ I(r + d_x, s + d_y) = j \right\}$$

Figure 2 shows an example of cooccurrence matrix calculation.

In this study we are not only interested in the neighbors but rather in
the neighborhood in general. Thus, for a given spacing distance E, four
matrices are calculated, one for each of the four displacement vectors,
and these are then averaged to combine all the extracted information.
From the resulting reduced matrix, 15 second-order texture indexes
(Haralick features [6]) are extracted, allowing the characterization of
the texture.

Next, a systematic study, aimed at finding the best subset of indexes,
was undertaken. The best result was obtained for a subset of 8 indexes,
on images reduced to 32 gray levels with a distance of one pixel. This
gives a classification success rate of 90% by logistic regression.
Figure 3 illustrates the distribution of probabilities as given by the
model.
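As a minimal sketch of the cooccurrence computation described above, the
following Python code counts gray-level pairs for one displacement
vector, averages the matrices of four directions, and derives three
Haralick-style indexes. Several points are assumptions rather than
details taken from the paper: the use of NumPy, the function names
(cooccurrence, averaged_cooccurrence, texture_indexes), the choice of
the four directions (the usual 0, 45, 90 and 135 degrees), and the three
indexes shown (energy, contrast, homogeneity). The image is assumed to
be already quantized to integer gray levels 0..levels-1.

```python
import numpy as np


def cooccurrence(image, dx, dy, levels):
    """Count pixel pairs with I(r, s) = i and I(r + dx, s + dy) = j."""
    img = np.asarray(image)
    rows, cols = img.shape
    # Keep only the positions whose displaced neighbor lies inside the image.
    r0, r1 = max(0, -dx), min(rows, rows - dx)
    s0, s1 = max(0, -dy), min(cols, cols - dy)
    first = img[r0:r1, s0:s1].ravel()
    second = img[r0 + dx:r1 + dx, s0 + dy:s1 + dy].ravel()
    matrix = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(matrix, (first, second), 1)  # accumulate repeated (i, j) pairs
    return matrix


def averaged_cooccurrence(image, spacing, levels):
    """Average the matrices of four directions (assumed: 0, 45, 90, 135 degrees)."""
    e = spacing
    # Each vector is (dx, dy) = (row offset, column offset); this list is an assumption.
    vectors = [(0, e), (-e, e), (-e, 0), (-e, -e)]
    return np.mean([cooccurrence(image, dx, dy, levels) for dx, dy in vectors], axis=0)


def texture_indexes(matrix):
    """Three illustrative second-order indexes from a cooccurrence matrix."""
    p = matrix / matrix.sum()  # normalize counts to joint probabilities
    i, j = np.indices(p.shape)
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
    }


# Small 4-level test image, used only for illustration.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
print(cooccurrence(image, 0, 1, levels=4))
print(texture_indexes(averaged_cooccurrence(image, spacing=1, levels=4)))
```

A perfectly uniform image puts all the mass in a single matrix cell, so
its energy and homogeneity are 1 and its contrast is 0; textured images
spread the mass away from the diagonal and raise the contrast.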
To characterize these nuclei types, two new indexes, which are weighted
variances of gray level or area size, are needed:

$$\mathrm{VarN} = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} \big( n \times M(n,s) - \mu_N \big)^2 \quad \text{with} \quad \mu_N = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} n \times M(n,s)$$

$$\mathrm{VarS} = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} \big( s \times M(n,s) - \mu_S \big)^2 \quad \text{with} \quad \mu_S = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} s \times M(n,s)$$

with N and S the dimensions of the matrix and M(n,s) the matrix element
of coordinates (n,s). The more the texture consists of large areas with
high intensity variations between them, the higher the value of the
VarN index. In the case of a more homogeneous texture, the value of
this index will be low. The same is true for the VarS index, concerning
area size.

In this way two new texture indexes are added to the 11 previous ones,
making a total of 13 indexes. An extensive study was once again
undertaken, using four different classification methods: logistic
regression, k-nearest neighbors, random forests and neural networks.
For nearest neighbors, we tested k from 1 to 30, and k equal to the
number of indexes plus 1 to 30. The best result was obtained with k
equal to 1. The neural network is a multi-layer perceptron with one
hidden layer. We tested various numbers of nodes in the hidden layer:
the number of nodes of the input layer divided by 2 to 6. The best
result was obtained for 2. This study showed that the best subset is
composed of 12 indexes (only the LRHGLE index is not used) on images
reduced to 32 gray levels and classified by logistic regression. The
performances of the different methods can be seen and compared in
figure 8: better distribution of the elements at both extremities and
number of ambiguous [...]

Our model's validity is illustrated in figure 9, which shows the
distribution of probabilities as given by the model. The high
concentration at both extremities of the histogram and the near absence
of ambiguous cases (having a probability near the decision value of
0.5) show the efficiency of the classification and the pertinence of
the choice of indexes.

6. CONCLUSION AND FUTURE WORK
In this article, the problem of texture classification, applied to cell
nuclei classification, has been covered. The main goal was to achieve a
pertinent characterization of texture homogeneity. To do so, two
existing texture characterization methods have been presented: the
cooccurrence matrix with Haralick features and the Run Length Matrix.
Applied to this particular problem, these methods did not obtain a
satisfactory success rate (90%). For this reason, a novel method of
texture homogeneity characterization has been presented. It consists in
tagging, then counting the size of, areas of the same intensity level.
This allows a matrix, representative of the texture's homogeneity, to
be found. In order to improve the
pertinence, two new texture characterization indexes have also been
determined. When combined and applied to nuclei texture classification,
this method and these indexes obtain a success rate of 92.6%.

The initial aim of our work was to classify nuclei into healthy and
unhealthy groups. Expert diagnosis showed that it was necessary to
examine both the shape and the texture of the nuclei in order to
achieve this objective. Our shape classification model (based on shape
indexes) presented in [10] achieved a classification success rate of
95.4% when applied to the shape of nuclei, but was only 86.9%
successful with respect to the initial problem. The model presented in
this article obtains a 92.6% success rate when applied to the texture
of nuclei, and when combined with the shape classification model, the
overall success rate with respect to the initial problem reaches 87.7%.

Even though this improvement of nearly one percent may seem slight, it
is in fact significant. As there is a large intersection between
abnormally shaped nuclei and nuclei with inhomogeneous texture, most of
the abnormally textured ones are already classified by shape. The best
possible improvement could only have been 1%, so a 0.8% improvement is
highly pertinent with respect to the probabilities.

By studying the intersections between the different classes of nuclei,
the following conclusion was reached: only 94% of nuclei can be
classified by their shape and/or texture alone. As a consequence, in
our future work, complementary models will have to be established in
order to characterize the infrequent diagnostic criteria and thus
further improve the classification success rate for the initial
problem. These infrequent diagnostic elements are related to the
presence of holes and foci. Holes are areas of the nuclei in which no
lamin A/C is present, which leads to an absence of reaction to the
marker (cf. figure 10 a and b). A focus is a small, nearly circular
area of high intensity (cf. figure 10 c and d).

To detect and characterize these elements, we wish to determine an
original approach, based on the representation of texture by its volume
below the surface (cf. figure 11). This representation will allow the
extraction of troughs and peaks, amongst which the 3D representation of
holes and foci can be found. Once extracted, these volumes could then
be characterized by a 3D extension of 2D shape indexes.

7. REFERENCES
[1] Chen Y. Q., Dixon M. S., Thomas D. W., “Statistical Geometrical
Features For Texture Classification”, Pattern Recognition, vol. 28,
n° 4, p. 537-552, 1995.
[2] Chu A., Sehgal C. M., Greenleaf J. F., “Use of gray value
distribution of run length for texture analysis”, Pattern Recognition
Letters, vol. 11, n° 6, p. 415-419, 1990.
[3] Fisher R. A., “The use of multiple measurements in taxonomic
problems”, Annals of Eugenics, vol. 7, p. 179-188, 1936.
[4] Galloway M. M., “Texture analysis using grey level run lengths”,
Computer Graphics Image Processing, vol. 4, p. 172-179, July, 1975.
[5] De Sandre-Giovannoli A., Bernard R., Cau P., Navarro C., Amiel J.,
Boccaccio I., Lyonnet S., Stewart C. L., Munnich A., Merrer M. L.,
Levy N., “Lamin A Truncation in Hutchinson-Gilford Progeria”, Science,
vol. 300, n° 5628, p. 2055, June, 2003.
[6] Haralick R. M., Shanmugam K., Dinstein I., “Textural features for
image classification”, IEEE Transactions on Systems, Man and
Cybernetics, vol. 3, p. 610-621, 1973.
[7] Hosmer D., Lemeshow S., “Applied Logistic Regression”, John Wiley &
Sons, Toronto, 1989.
[8] Martens H., Dardenne P., “Validation and verification of regression
in small data sets”, Chemometrics and Intelligent Laboratory Systems,
vol. 44, n° 1-2, p. 99-121, 1998.
[9] McCulloch W. S., Pitts W., “A logical calculus of the ideas
immanent in nervous activity”, Bulletin of Mathematical Biophysics,
vol. 5, p. 115-133, 1943.
[10] Thibault G., Devic C., Horn J.-F., Fertil B., Sequeira J.,
Mari J.-L., “Classification of cell nuclei using shape and texture
indexes”, International Conference in Central Europe on Computer
Graphics, Visualization and Computer Vision (WSCG), Plzen, Czech
Republic, p. 25-28, February, 2008.
[11] Wolberg W., Street W., Heisey D., Mangasarian O.,
“Computer-Derived Nuclear Features Distinguish Malignant From Benign
Breast Cytology”, Human Pathology, vol. 26, Elsevier, New York, NY,
United States, p. 792-796, July, 1995.
[12] Xu D., Kurani A., Furst J., Raicu D., “Run-Length Encoding For
Volumetric Texture”, International Conference on Visualization, Imaging
and Image Processing (VIIP), p. 452-458, 2004.
[13] Shakhnarovich G., Darrell T., Indyk P., “Nearest Neighbor Methods
in Learning and Vision: Theory and Practice”, The MIT Press, 2006.
[14] Breiman L., “Random forests”, Machine Learning, vol. 45, p. 5-32,
2001.