
Texture Indexes and Gray Level Size Zone Matrix

Application to Cell Nuclei Classification


Guillaume THIBAULT 1), Bernard FERTIL 1), Claire NAVARRO 2), Sandrine PEREIRA 2),
Pierre CAU 2), Nicolas LEVY 2), Jean SEQUEIRA 1) and Jean-Luc MARI 1)
1) LSIS Laboratory, Aix-Marseille II University, France
2) INSERM UMR 910, Medical Genetic and Functional Genomic, Medical School, France

Abstract: In this paper, we present a study on the characterization and the classification of textures. This study is performed using a set of values obtained by the computation of indexes. To obtain these indexes, we extract a set of data with two techniques: the computation of matrices which are statistical representations of the texture, and the computation of "measures". These matrices and measures are subsequently used as parameters of a function returning real or discrete values which give information about texture features. A model of texture characterization is built on this numerical information, for example to classify textures. An application is proposed to classify cell nuclei in order to diagnose patients affected by the Progeria disease.

Keywords: texture indexes, gray level size zone matrix (GLSZM), cell nuclei classification.

1. INTRODUCTION
Pattern recognition is a major part of artificial intelligence that aims to automate the identification of typical situations. It is a major objective for many applications: handwritten character recognition (optical character recognition, automatic reading of postal letters and bank checks, etc.), video surveillance (facial recognition), medical imaging (ultrasound, CAT scan, Magnetic Resonance Imaging), etc. At the heart of the pattern recognition issue, there is a first and unavoidable step: shape characterization. Indeed, in order to recognize an object or a person, it is necessary to describe it by defining its characteristics (morphological, geometrical, textural, etc.) and then to find and identify these characteristics in the digital source under investigation. For this reason, it is often helpful to distinguish two classes of characteristics: the shape, described using global analysis algorithms or outline analysis algorithms, and the texture.

The aim of this paper is to create a model to classify cultured skin fibroblast nuclei from patients affected by Progeria disease (otherwise known as the Hutchinson-Gilford syndrome). This rare disease (which affects about one hundred patients in the world) is caused by a mutation in a gene encoding lamins A and C, two proteins localized at the nuclear periphery (lamina) and within the nucleoplasm [5]. Progeria patients exhibit accelerated aging. The presence of mutated lamin A protein results in abnormal nuclear shape and texture (not homogeneous), evidenced by the immunodetection of lamin A/C (primary antibody directed against lamin A/C, secondary antibody coupled to fluorescein isothiocyanate) (see fig. 1). Digitized pictures of immunostained nuclei were sampled using a conventional epifluorescent microscope (Leica DMR) coupled to a Princeton-Roper camera. The two interests of this study are i/ to design an automatic classification of abnormal nuclei, to be compared with the classification made by an expert microscopist, through the analysis of more nuclei than is possible for the expert; ii/ to follow up, using this automated procedure, the possible nuclear changes induced by therapeutic drugs.

The first work on the characterization and classification of the shape of a nucleus [10, 11] made it possible to obtain a classification success rate of more than 95% thanks to the creation and use of dedicated shape indexes. The texture characterization method, on the other hand, was less satisfactory. This original approach, based on indexes initially obtained for shape characterization, was not able to achieve a success rate of more than 85%. This rate is inferior to the expert's repeatability rate (which corresponds to the percentage of nuclei classified in the same way by an expert in two successive analyses, and is between 86 and 89%). Since the result of this first attempt was inferior to the repeatability rate, we wished to improve it and obtain a result close to that given by shape classification.

OUR CONTRIBUTION
With this in mind, we present the classification and validation methods used, followed by the characterization techniques used in order to improve the results of texture classification. Cooccurrence matrices and Haralick indexes will be presented first. The Run Length Matrix is then used and modified in order to create a novel texture homogeneity characterization method. Our contribution introduces a new method based on the construction and analysis of statistical matrices that represent the texture.
All these techniques have been studied and validated by the model developed in order to solve the texture characterization issue.

2. CLASSIFICATION
The aim of classification is to attribute a class to each object being studied. As mentioned earlier, the objective of this sub-task is to determine whether a cell's nucleus has a normal (homogeneous) or an abnormal (inhomogeneous) texture.

The classification methods are said to be supervised, as they require a reference expert analysis. In this study we benefit from the knowledge of biologists and geneticists, who have specified classes (healthy and pathological) and subclasses (normal and irregular shapes, homogeneous and non-homogeneous textures, etc.).

A classification model is usually built using a learning method, with the help of data divided into known classes. Though applied to a specific problem, the model must be capable of generalizing to new data. With this objective, the data is separated into two groups: a learning sample and a validation sample. The classifier must have the same performance rate on both. It is necessary to construct a characteristic vector for each data item prior to the classification phase. The vector must be relevant to the problem in order to allow accurate classification and prediction. The major risk when providing too many characteristics to the classifier is rote learning. The greater the vector's dimension, the greater the flexibility of the model and the better the classification of the learning sample, but the greater the likelihood that the model's performance will be poor on data not used during learning. Each model must therefore be systematically validated, and the model giving the best classification of the validation sample is retained. Due to the few elements in the sample at our disposal, validation was done according to the Leave One Out protocol [8].

The method of classification chosen for the model is logistic regression [7]. It is a linear model particularly well adapted to classification problems with two classes:

$$P(Y/x) = \frac{e^{f(x)}}{1 + e^{f(x)}}$$

with $x = (x_1, \ldots, x_n)$ the characteristic vector of the initial data, $f(x) = \sum_{i=1}^{n} \alpha_i x_i$, and $P(Y/x)$ the conditional probability that the data item described by $x$ belongs to the class $Y$. To estimate the coefficients $\alpha_i$ of the model, the maximum likelihood method is often used, which maximizes the probability of obtaining the values observed on the learning sample. It consists in finding the parameters that optimize the likelihood function $\mathcal{L}(\alpha, Y) = P^{Y} [1 - P]^{1 - Y}$. Logistic regression is preferred over discriminant analysis [3] because of its greater reliability, its versatility, the few restrictions that it imposes on the variables and the clarity of its results.

Nevertheless, other methods will also be used in order to compare results. These methods are more complex, non-linear, and are obtained through different design techniques: neural networks [9], k-nearest neighbors [13], and random forests [14].

3. GRAY LEVEL COOCCURRENCE MATRIX
The cooccurrence matrix technique is one of the oldest and most efficient methods of statistical texture representation. This method defines texture according to the spatial distribution of gray levels and characterizes texture by means of second order statistics. To do so, it considers the relationships that exist between the gray levels of pixels of the texture for a given displacement vector $d$. The resulting matrix is of size $N \times N$, where $N$ is the number of gray levels in the texture. For a given displacement vector $d = (d_x, d_y)$, an element $(i,j)$ of the matrix is defined by the number of pixels in the texture that have a gray level $j$ at a distance $d$ from a pixel of gray level $i$. This can be written as follows:

$$M_d(i,j) = \mathrm{card}\left\{ \big( (r,s), (r + d_x, s + d_y) \big) \;/\; I(r,s) = i,\ I(r + d_x, s + d_y) = j \right\}$$

Figure 2 shows an example of cooccurrence matrix calculation.

In this study we are not only interested in the neighbors but rather in the neighborhood in general. Thus, for a given spacing distance E, four matrices are calculated, one for each of the four displacement vectors, and these are then averaged to combine all the extracted information. From the resulting reduced matrix, 15 second-order texture indexes (Haralick features [6]) are extracted, allowing the characterization of the texture.

Next, a systematic study, aimed at finding the best subset of indexes, was undertaken. The best result was obtained for a subset of 8 indexes, on images reduced to 32 gray levels with a distance of one pixel. This gives a classification success rate of 90% by logistic regression. Figure 3 illustrates the distribution of probabilities as given by the model.
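As a minimal sketch of the cooccurrence matrix of this section, the following Python code counts, for a displacement vector (dx, dy), the pixel pairs whose gray levels are (i, j), then averages four such matrices for a spacing distance. The function names, the small test image, and the choice of the four vectors (taken here as the 0°, 45°, 90° and 135° directions, with dx applied to the row index r as in the formula) are assumptions made for illustration, not the authors' implementation.

```python
def cooccurrence_matrix(image, dx, dy, levels):
    """Count pairs ((r, s), (r + dx, s + dy)) with I(r, s) = i and I(r + dx, s + dy) = j."""
    rows, cols = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for s in range(cols):
            r2, s2 = r + dx, s + dy
            # Pairs whose second pixel falls outside the image are ignored.
            if 0 <= r2 < rows and 0 <= s2 < cols:
                matrix[image[r][s]][image[r2][s2]] += 1
    return matrix


def averaged_cooccurrence(image, distance, levels):
    """Average the matrices of four directions (assumed: 0, 45, 90 and 135 degrees)."""
    e = distance
    vectors = [(0, e), (-e, e), (-e, 0), (-e, -e)]  # assumption, not given in the text
    mats = [cooccurrence_matrix(image, dx, dy, levels) for dx, dy in vectors]
    return [
        [sum(m[i][j] for m in mats) / len(mats) for j in range(levels)]
        for i in range(levels)
    ]


# Hypothetical 3 x 3 image with 3 gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
right = cooccurrence_matrix(image, 0, 1, 3)   # right-hand neighbor at distance 1
average = averaged_cooccurrence(image, 1, 3)  # neighborhood at distance 1
```

Texture indexes such as the Haralick features would then be computed from the averaged matrix; that step is not shown here.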
A high concentration rate at both extremities can be seen, which shows that the elements are well separated. But 40 ambiguous cases exist (for which the given probability is near the decision value of 0.5, between 0.3 and 0.7). Moreover, 8 important errors appear: dark green (respectively light green) elements in the right hand (respectively left hand) column. These errors correspond to elements for which the model gives a probability greater than 0.8 (respectively lower than 0.2) but which have an inhomogeneous (respectively homogeneous) texture. The presence of these non-negligible errors and ambiguities has led us to improve the model.

4. GRAY LEVEL RUN LENGTH MATRIX
The Gray Level Run Length Matrix is a statistical texture characterization method [2,4,6]. This method consists in counting the number of pixel segments having the same intensity in a given direction, then representing the results in a matrix. A direction (0°, 45°, 90° or 135°) and a number of gray levels are decided on beforehand. The value contained in the matrix's (l,n) cell is equal to the number of segments of length l and gray level n. This implies that the matrix's number of columns is dynamic, as it is determined by the length of the longest segment. By design, this calculation is symmetrical; consequently, it is unnecessary to consider the four complementary directions (180°, 225°, 270° or 315°; in this example, 8 possible directions between a given pixel and its neighbors are taken into account). Figure 4 shows an example of the calculation of a Run Length Matrix.

Once the matrix is obtained, 11 indexes are calculated [12] to determine the vector that characterizes the texture. To establish our model, the matrix was calculated for a given number of gray levels and for four directions. Then, for each index, the average value over the four directions was taken. A systematic study found that the best model was obtained for a set of 7 indexes and 32 gray levels. The classification success rate was 84.81% by logistic regression, which is inferior to the rate obtained with the cooccurrence matrix and the Haralick features (90%).

5. NEW METHOD: GRAY LEVEL SIZE ZONE MATRIX
A homogeneous texture is composed of large areas of the same intensity, and not of small groups of pixels or segments in any given direction. To take this fact into consideration, it was necessary to record, in a matrix, the size of each area of pixels with the same intensity level. This matrix was calculated according to the Run Length Matrix principle: the value of the matrix's (s,n) cell is equal to the number of areas of size s and of gray level n. Figure 5 shows an example of the calculation of such a matrix, named the Size Zone Matrix.

The resulting matrix has a fixed number of lines equal to the number of gray levels and a dynamic number of columns, determined by the size of the largest area. The more homogeneous the texture, the wider and flatter the matrix will be. This matrix has the advantage of not requiring calculations in several directions, which are replaced by the tagging of the different areas. Specifying the number of gray levels is still necessary, but this reduction renders the calculations robust to noise. The same 11 indexes as for the Run Length Matrix are calculated, with 32 gray levels. The classification success rate for these 11 indexes is 91.11% by logistic regression, which is the best result obtained by all the techniques used so far. This improvement can be clearly seen by comparing figures 3 and 6. A better distribution of the elements at both extremities and a reduction of 16 in the number of ambiguous cases (leaving 29 ambiguous cases) are visible. However, six important errors remain.

When examining the data, the indexes and the false positive results, it became clear that a specific texture case was not being correctly characterized and was the cause of the remaining errors: nuclei with large homogeneous areas, but with high variations in intensity between the areas, making them inhomogeneously textured nuclei.
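The construction of the Size Zone Matrix can be sketched in a few lines of Python. The sketch below tags each area of connected pixels sharing the same gray level with a breadth-first flood fill, then counts the areas by gray level (rows) and size (columns). The function name, the test image and the use of 8-connectivity are assumptions made for illustration; the text does not state which connectivity was used.

```python
from collections import deque


def size_zone_matrix(image, levels, connectivity=8):
    """Return M where M[n][s - 1] is the number of areas of gray level n and size s."""
    rows, cols = len(image), len(image[0])
    if connectivity == 8:  # assumption: diagonal neighbors belong to the same area
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    visited = [[False] * cols for _ in range(rows)]
    zones = []  # one (gray level, size) pair per tagged area
    for r in range(rows):
        for c in range(cols):
            if visited[r][c]:
                continue
            level = image[r][c]
            visited[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:  # flood fill of the area containing (r, c)
                cr, cc = queue.popleft()
                size += 1
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not visited[nr][nc] and image[nr][nc] == level):
                        visited[nr][nc] = True
                        queue.append((nr, nc))
            zones.append((level, size))

    # The number of columns is dynamic: it is the size of the largest area.
    max_size = max(size for _, size in zones)
    matrix = [[0] * max_size for _ in range(levels)]
    for level, size in zones:
        matrix[level][size - 1] += 1
    return matrix


# Hypothetical 4 x 4 image with 4 gray levels (0 to 3).
image = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 2, 2, 3],
    [3, 3, 0, 0],
]
glszm = size_zone_matrix(image, levels=4)
```

No direction parameter appears anywhere in this sketch, which reflects the advantage of the method: the directional passes of the Run Length Matrix are replaced by a single tagging pass.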
To characterize these nuclei types, two new indexes, which are weighted variances of gray level or area size, are needed:

$$Var_N = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} \left( n \times M(n,s) - \mu_N \right)^2 \quad \text{with} \quad \mu_N = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} n \times M(n,s)$$

$$Var_S = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} \left( s \times M(n,s) - \mu_S \right)^2 \quad \text{with} \quad \mu_S = \frac{1}{N \times S} \sum_{n=1}^{N} \sum_{s=1}^{S} s \times M(n,s)$$

with $N$ and $S$ the dimensions of the matrix and $M(n,s)$ the matrix's element of coordinates $(n,s)$. The more the texture consists of large areas with high intensity variations between them, the higher the value of the $Var_N$ index. In the case of a more homogeneous texture, the value of this index will be low. The same is true for the $Var_S$ index, concerning area size.

In this way two new texture indexes are added to the 11 previous ones, making a total of 13 indexes. An extensive study was once again undertaken, using four different classification methods: logistic regression, k-nearest neighbors, random forests and neural networks. For nearest neighbors, we tested k equal to the number of indexes plus 1 to 30, and k equal to 1 to 30. The best result is obtained with k equal to 1. The neural network is a multi-layer perceptron with one hidden layer. We tested various numbers of nodes in the hidden layer: the number of nodes of the input layer divided by 2 to 6. The best result is obtained for a division by 2. This study showed that the best subset is composed of 12 indexes (only the LRHGLE index is not used) on images reduced to 32 gray levels and classified by logistic regression. The performances of the different methods can be seen and compared in figure 8: the distribution of the elements at both extremities is better and the number of ambiguous cases is reduced to 16. Thanks to the use of the two new indexes in this new model composed of the 12 most pertinent indexes, a classification success rate of 92.59% has been reached. For the best configuration of indexes, the performances of logistic regression and of the multi-layer perceptron are comparable.

Our model's validity is illustrated in figure 9, which shows the distribution of probabilities as given by the model. The high concentration rate at both extremities of the histogram and the near absence of ambiguous cases (having a probability near the decision value of 0.5) show the efficiency of the classification and the pertinence of the choice of indexes.

6. CONCLUSION AND FUTURE WORK
In this article, the problem of texture classification, applied to cell nuclei classification, has been covered. The main goal was to achieve a pertinent characterization of texture homogeneity. To do so, two existing texture characterization methods have been presented: the cooccurrence matrix with Haralick features and the Run Length Matrix. Applied to this particular problem, these methods did not obtain a satisfactory success rate (at most 90%). For this reason, a novel method of texture homogeneity characterization has been presented. It consists in tagging, then counting the size of, areas of the same intensity level. This allows a matrix, representative of the texture's homogeneity, to be found. In order to improve the
pertinence, two new texture characterization indexes have also been determined. When combined and applied to nuclei texture classification, this method and these indexes obtain a success rate of 92.6%.

The initial aim of our work was to classify nuclei into healthy and unhealthy groups. Expert diagnosis showed that it was necessary to examine both the shape and the texture of the nuclei in order to achieve this objective. Our shape classification model (based on shape indexes) presented in [10] achieved a classification success rate of 95.4% when applied to the shape of the nuclei, but was only 86.9% successful with respect to the initial problem. The model presented in this article obtains a 92.6% success rate when applied to the texture of the nuclei, and when combined with the shape classification model, the overall success rate with respect to the initial problem reaches 87.7%. Even though this improvement of nearly one percent may seem slight, it is in fact significant. As there is a large intersection between abnormally shaped nuclei and inhomogeneously textured nuclei, most of the abnormally textured ones are already classified by shape. The best possible improvement could only have been 1%, so a 0.8% improvement is highly pertinent with respect to the probabilities.

By studying the intersections between the different classes of nuclei, the following conclusion was reached: only 94% of nuclei can be classified by their shape and/or texture alone. As a consequence, in our future work, complementary models will have to be established in order to characterize the infrequent diagnostic criteria and thus further improve the classification success rate for the initial problem. These infrequent diagnostic elements are related to the presence of holes and foci. Holes are areas of the nuclei in which no lamin A/C is present, which leads to an absence of reaction to the marker (cf. figure 10 a and b). A focus is a small, nearly circular area of high intensity (cf. figure 10 c and d).

To detect and characterize these elements, we wish to develop an original approach, based on the representation of the texture by its volume below the surface (cf. figure 11). This representation will allow the extraction of troughs and peaks, amongst which the 3D representation of holes and foci can be found. Once extracted, these volumes could then be characterized by a 3D extension of 2D shape indexes.

7. REFERENCES
[1] Chen Y. Q., Dixon M. S., Thomas D. W., "Statistical Geometrical Features For Texture Classification", Pattern Recognition, vol. 28, n° 4, p. 537-552, 1995.
[2] Chu A., Sehgal C. M., Greenleaf J. F., "Use of gray value distribution of run length for texture analysis", Pattern Recognition Letters, vol. 11, n° 6, p. 415-419, 1990.
[3] Fisher R. A., "The use of multiple measurements in taxonomic problems", Annals of Eugenics, vol. 7, p. 179-188, 1936.
[4] Galloway M. M., "Texture analysis using grey level run lengths", Computer Graphics Image Processing, vol. 4, p. 172-179, July, 1975.
[5] De Sandre-Giovannoli A., Bernard R., Cau P., Navarro C., Amiel J., Boccaccio I., Lyonnet S., Stewart C. L., Munnich A., Merrer M. L., Levy N., "Lamin A Truncation in Hutchinson-Gilford Progeria", Science, vol. 300, n° 5628, p. 2055, June, 2003.
[6] Haralick R. M., Shanmugam K., Dinstein I., "Textural features for image classification", IEEE Transactions on Systems, Man and Cybernetics, vol. 3, p. 610-621, 1973.
[7] Hosmer D., Lemeshow S., "Applied Logistic Regression", John Wiley & Sons, Toronto, 1989.
[8] Martens H., Dardenne P., "Validation and verification of regression in small data sets", Chemometrics and Intelligent Laboratory Systems, vol. 44, n° 1-2, p. 99-121, 1998.
[9] McCulloch W. S., Pitts W., "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, vol. 5, p. 115-133, 1943.
[10] Thibault G., Devic C., Horn J.-F., Fertil B., Sequeira J., Mari J.-L., "Classification of cell nuclei using shape and texture indexes", International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), Plzen, Czech Republic, p. 25-28, February, 2008.
[11] Wolberg W., Street W., Heisey D., Mangasarian O., "Computer-Derived Nuclear Features Distinguish Malignant From Benign Breast Cytology", Human Pathology, vol. 26, Elsevier, New York, NY, United States, p. 792-796, July, 1995.
[12] Xu D., Kurani A., Furst J., Raicu D., "Run-Length Encoding For Volumetric Texture", International Conference on Visualization, Imaging and Image Processing (VIIP), p. 452-458, 2004.
[13] Shakhnarovich G., Darrell T., Indyk P., "Nearest Neighbor Methods in Learning and Vision: Theory and Practice", The MIT Press, 2006.
[14] Breiman L., "Random forests", Machine Learning, vol. 45, p. 5-32, 2001.
