Image Segmentation: Femur
Model of a segmented femur. It shows the outer surface (red), the surface between compact bone
and spongy bone (green) and the surface of the bone marrow (blue).
In computer vision, image segmentation is the process of partitioning a digital image into
multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to
simplify and/or change the representation of an image into something that is more meaningful
and easier to analyze.[1][2] Image segmentation is typically used to locate objects and boundaries
(lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a
label to every pixel in an image such that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or
a set of contours extracted from the image (see edge detection). Each of the pixels in a region is
similar with respect to some characteristic or computed property, such as color, intensity, or
texture. Adjacent regions are significantly different with respect to the same characteristic(s).[1]
When applied to a stack of images, as is typical in medical imaging, the resulting contours after
image segmentation can be used to create 3D reconstructions with the help of interpolation
algorithms like marching cubes.
Contents
1 Applications
2 Thresholding
3 Clustering methods
4 Compression-based methods
5 Histogram-based methods
6 Edge detection
7 Dual clustering method
8 Region-growing methods
10 Variational methods
12 Watershed transformation
14 Multi-scale segmentation
15 Semi-automatic segmentation
16 Trainable segmentation
17 Other methods
18 Segmentation benchmarking
19 See also
20 Notes
21 References
22 External links
Applications
Some of the practical applications of image segmentation are:
Machine vision
Medical imaging[3][4]
o Intra-surgery navigation
Object detection[7]
o Pedestrian detection
o Face detection
Recognition tasks
o Face recognition
o Fingerprint recognition
o Iris recognition
Video surveillance
Several general-purpose algorithms and techniques have been developed for image
segmentation. To be useful, these techniques must typically be combined with domain-specific
knowledge in order to solve the segmentation problems of that domain effectively.
Thresholding
The simplest method of image segmentation is called the thresholding method. This method is
based on a clip level (or a threshold value) to turn a grayscale image into a binary image. A
related variant is balanced histogram thresholding.
The key to this method is selecting the threshold value (or values, when multiple levels are
selected). Several popular methods are used in industry, including the maximum entropy method,
Otsu's method (maximum variance), and k-means clustering.
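As a rough illustration of how a threshold can be selected automatically, the following sketch implements Otsu's criterion with NumPy: every histogram cut is tried and the one that maximizes the between-class variance is kept. The function name otsu_threshold and the 256-bin default are illustrative choices, not a reference implementation; it assumes a single global threshold on a grayscale array.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Pick the histogram cut that maximises the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(np.asarray(image, dtype=float), bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    p = hist / hist.sum()                  # normalised histogram
    w0 = np.cumsum(p)                      # probability of the class below each cut
    w1 = 1.0 - w0                          # probability of the class above each cut
    mu = np.cumsum(p * centers)            # cumulative first moment
    mu_t = mu[-1]                          # global mean intensity
    sigma_b = np.zeros_like(w0)
    valid = (w0 > 1e-12) & (w1 > 1e-12)    # skip cuts that leave one class empty
    sigma_b[valid] = (mu_t * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]                    # values >= this edge fall in the upper class
```

The binary image is then obtained as `image >= otsu_threshold(image)`. Working on the histogram rather than on the pixels keeps the search cost proportional to the number of bins.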
Recently, methods have been developed for thresholding computed tomography (CT) images.
The key idea is that, unlike Otsu's method, the thresholds are derived from the radiographs
instead of the (reconstructed) image.[8][9]
Newer methods use multi-dimensional, fuzzy rule-based, non-linear thresholds. In these works,
the decision about each pixel's membership in a segment is based on multi-dimensional rules
derived from fuzzy logic and evolutionary algorithms, chosen according to the image's lighting
environment and application.[10]
Clustering methods
Main article: Data clustering
Source image.
Image after running k-means with k = 16. Note that a common technique to improve
performance for large images is to downsample the image, compute the clusters, and then
reassign the values to the larger image if necessary.
The K-means algorithm is an iterative technique that is used to partition an image into K clusters.[11]
The basic algorithm is
1. Pick K cluster centers, either randomly or based on some heuristic
2. Assign each pixel in the image to the cluster that minimizes the distance between the
pixel and the cluster center
3. Re-compute the cluster centers by averaging all of the pixels in the cluster
4. Repeat steps 2 and 3 until convergence is attained (i.e. no pixels change clusters)
In this case, distance is the squared or absolute difference between a pixel and a cluster center.
The difference is typically based on pixel color, intensity, texture, and location, or a weighted
combination of these factors. K can be selected manually, randomly, or by a heuristic. This
algorithm is guaranteed to converge, but it may not return the optimal solution. The quality of the
solution depends on the initial set of clusters and the value of K.
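The basic algorithm can be sketched in NumPy as follows, assuming each pixel is described by a feature vector stored in the last axis of the array (e.g. RGB; a grayscale image would need a trailing axis of length 1). The random initialization, squared Euclidean distance, and the name kmeans_segment are illustrative assumptions; weighted combinations of color, texture, and location would only change how the feature vectors are built.

```python
import numpy as np

def kmeans_segment(pixels, k, iters=100, seed=0):
    """Plain k-means on per-pixel feature vectors; returns (label image, cluster centres)."""
    X = np.asarray(pixels, dtype=float).reshape(-1, np.shape(pixels)[-1])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1: random initial centres
    labels = np.full(len(X), -1)
    for _ in range(iters):
        # step 2: assign every pixel to the nearest centre (squared Euclidean distance)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):               # step 4: no pixel changed cluster
            break
        labels = new_labels
        # step 3: move each centre to the mean of its pixels (an empty cluster keeps its centre)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels.reshape(np.shape(pixels)[:-1]), centers
```

Because the result depends on the initial centres, running it with several seeds and keeping the lowest-distortion solution is a common practical remedy.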
Compression-based methods
Compression-based methods postulate that the optimal segmentation is the one that minimizes,
over all possible segmentations, the coding length of the data.[12][13] The connection between these
two concepts is that segmentation tries to find patterns in an image and any regularity in the
image can be used to compress it. The method describes each segment by its texture and
boundary shape. Each of these components is modeled by a probability distribution function and
its coding length is computed as follows:
1. The boundary encoding leverages the fact that regions in natural images tend to have a
smooth contour. This prior is used by Huffman coding to encode the difference chain
code of the contours in an image. Thus, the smoother a boundary is, the shorter coding
length it attains.
For any given segmentation of an image, this scheme yields the number of bits required to
encode that image based on the given segmentation. Thus, among all possible segmentations of
an image, the goal is to find the segmentation which produces the shortest coding length. This
can be achieved by a simple agglomerative clustering method. The distortion in the lossy
compression determines the coarseness of the segmentation and its optimal value may differ for
each image. This parameter can be estimated heuristically from the contrast of textures in an
image. For example, when the textures in an image are similar, such as in camouflage images,
stronger sensitivity and thus lower quantization is required.
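The boundary-coding idea can be illustrated on its own. The sketch below converts a closed Freeman chain code (directions 0 to 7, assumed here to be counted counterclockwise from east) into its difference chain code and counts the bits a Huffman code built from that sequence would need. It covers only the boundary term; the texture term and the agglomerative search are not shown, and all function names are hypothetical.

```python
import heapq
from collections import Counter

def difference_chain_code(chain):
    """Turn a closed Freeman chain code (directions 0-7) into its difference code."""
    return [(nxt - cur) % 8 for cur, nxt in zip(chain, chain[1:] + chain[:1])]

def huffman_bits(symbols):
    """Total bits needed to Huffman-code the sequence with a code built from its own counts."""
    counts = list(Counter(symbols).values())
    if len(counts) == 1:
        return len(symbols)              # a single distinct symbol still costs 1 bit each
    heapq.heapify(counts)
    total = 0
    while len(counts) > 1:
        a, b = heapq.heappop(counts), heapq.heappop(counts)
        total += a + b                   # each merge adds one bit to every symbol beneath it
        heapq.heappush(counts, a + b)
    return total

def boundary_bits(chain):
    return huffman_bits(difference_chain_code(chain))

square = [0] * 4 + [2] * 4 + [4] * 4 + [6] * 4                # 4x4 square, 16 steps
staircase = [0, 2, 0, 2, 2, 4, 2, 4, 4, 6, 4, 6, 6, 0, 6, 0]  # jagged diamond, 16 steps
print(boundary_bits(square), boundary_bits(staircase))        # 16 24
```

Both contours have 16 steps, but the smoother square needs 16 bits while the jagged staircase needs 24, which is the bias toward smooth boundaries described in item 1.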
Histogram-based methods
Histogram-based methods are very efficient compared to other image segmentation methods
because they typically require only one pass through the pixels. In this technique, a histogram is
computed from all of the pixels in the image, and the peaks and valleys in the histogram are used
to locate the clusters in the image.[1] Color or intensity can be used as the measure.
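One simple way to turn the histogram's peaks and valleys into a segmentation is to smooth the histogram until exactly two peaks remain and cut at the lowest point between them. The sketch below assumes a bimodal grayscale image and a 3-tap moving average as the smoothing step; the names valley_threshold and _peaks are illustrative, and images with more than two modes would need a multi-threshold variant.

```python
import numpy as np

def _peaks(h):
    """Indices of local maxima; plateaus of equal values count once (at their start)."""
    starts = np.flatnonzero(np.r_[True, h[1:] != h[:-1]])
    vals = h[starts]
    padded = np.r_[-np.inf, vals, -np.inf]
    return [int(starts[j]) for j in range(len(vals))
            if padded[j] < padded[j + 1] > padded[j + 2]]

def valley_threshold(image, bins=256, max_iter=10000):
    """Smooth the histogram until two peaks remain, then cut at the valley between them."""
    hist, edges = np.histogram(np.asarray(image, dtype=float), bins=bins)
    h = hist.astype(float)
    for _ in range(max_iter):
        peaks = _peaks(h)
        if len(peaks) <= 2:
            break
        # 3-tap moving average; edge padding keeps the length unchanged
        h = np.convolve(np.pad(h, 1, mode="edge"), np.ones(3) / 3, mode="valid")
    if len(peaks) != 2:
        raise ValueError("histogram did not reduce to exactly two peaks")
    lo, hi = peaks
    valley = lo + int(np.argmin(h[lo:hi + 1]))
    return edges[valley + 1]
```

Only one pass over the pixels is needed to build the histogram; all later work is on the (much smaller) histogram, which is where the efficiency of this family comes from.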
Histogram-based approaches can also be quickly adapted to apply to multiple frames, while
maintaining their single-pass efficiency. The histogram can be computed in several ways when
multiple frames are considered. The same approach that is taken with one frame can be applied
to multiple frames, and after the results are merged, peaks and valleys that were previously
difficult to identify are more likely to be distinguishable. The histogram can also be applied on a
per-pixel basis, where the resulting information is used to determine the most frequent color for
each pixel location. This approach segments based on active objects and a static environment,
resulting in a different type of segmentation useful in video tracking.
Edge detection
Edge detection is a well-developed field on its own within image processing. Region boundaries
and edges are closely related, since there is often a sharp change in intensity at region
boundaries. Edge detection techniques have therefore been used as the basis of another
segmentation technique.
The edges identified by edge detection are often disconnected. To segment an object from an
image, however, one needs closed region boundaries. The desired edges are the boundaries
between such objects or spatial-taxons.[15][16]
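A minimal edge map can be computed with the 3x3 Sobel operator, sketched below in NumPy with edge-replicated borders (an assumption; other border rules are equally common). Thresholding the gradient magnitude yields edge pixels, but, as noted above, such maps are generally not closed contours, so a linking or region step is still needed before they define segments.

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude with 3x3 Sobel kernels (edge-replicated borders)."""
    img = np.asarray(image, dtype=float)
    p = np.pad(img, 1, mode="edge")
    # right column minus left column, centre row weighted by 2
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    # bottom row minus top row, centre column weighted by 2
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)
```

For example, `edges = mag > 0.5 * mag.max()` keeps the strongest responses; the factor 0.5 is an arbitrary illustrative choice.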
Segmentation methods can also be applied to edges obtained from edge detectors. Lindeberg and
Li[21] developed an integrated method that segments edges into straight and curved edge segments
for parts-based object recognition, based on a minimum description length (MDL) criterion that
was optimized by a split-and-merge-like method. Candidate breakpoints for this optimization were
obtained from complementary junction cues, which indicate the points at which a partition into
different segments is most likely.
Dual clustering method
This method combines three characteristics of the image: a partition of the image based on
histogram analysis is checked by the high compactness of the clusters (objects) and the high
gradients of their borders. For that purpose, two spaces have to be introduced: the first is the
one-dimensional histogram of brightness $H = H(B)$, and the second is the dual three-dimensional
space of the original image itself, $B = B(x, y)$. The first space makes it possible to measure
how compactly the brightness of the image is distributed, by calculating a minimal clustering
$k_{\min}$. The threshold brightness $T$ corresponding to $k_{\min}$ defines the binary
(black-and-white) image, a bitmap $b = \varphi(x, y)$, where $\varphi(x, y) = 0$ if $B(x, y) < T$
and $\varphi(x, y) = 1$ if $B(x, y) \ge T$. The bitmap $b$ is an object in dual space. On that
bitmap a measure has to be defined that reflects how compactly the black (or white) pixels are
distributed. The goal, then, is to find objects with good borders. For all $T$ the measure
$\mathrm{MDC} = G/(k \times L)$ has to be calculated (where $k$ is the difference in brightness
between the object and the background, $L$ is the length of all borders, and $G$ is the mean
gradient on the borders). The maximum of MDC defines the segmentation.[22]
Region-growing methods
Region-growing methods rely mainly on the assumption that the neighboring pixels within one
region have similar values. The common procedure is to compare one pixel with its neighbors. If
a similarity criterion is satisfied, the pixel can be set to belong to the same cluster as one or
more of its neighbors. The selection of the similarity criterion is significant, and the results are
influenced by noise in all instances.
The method of Statistical Region Merging[23] (SRM) starts by building the graph of pixels using
4-connectedness with edges weighted by the absolute value of the intensity difference. Initially
each pixel forms a single pixel region. SRM then sorts those edges in a priority queue and decides
whether or not to merge the current regions belonging to the edge pixels using a statistical
predicate.
One region-growing method is the seeded region growing method. This method takes a set of
seeds as input along with the image. The seeds mark each of the objects to be segmented. The
regions are iteratively grown by comparison of all unallocated neighboring pixels to the regions.
The difference between a pixel's intensity value and the region's mean, $\delta$, is used as a measure of
similarity. The pixel with the smallest difference measured in this way is assigned to the
respective region. This process continues until all pixels are assigned to a region. Because seeded
region growing requires seeds as additional input, the segmentation results are dependent on the
choice of seeds, and noise in the image can cause the seeds to be poorly placed.
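Seeded region growing maps naturally onto a priority queue keyed by the similarity measure: every unallocated pixel bordering a region is queued with its difference from that region's mean, and the smallest difference is allocated first. The sketch below assumes 4-connectivity, seeds given as a dict from positive integer labels to pixel coordinates, and that a queued pixel keeps the difference computed when it was queued (a common simplification); the function name is hypothetical.

```python
import heapq
import numpy as np

def seeded_region_growing(image, seeds):
    """seeds: {label (> 0): [(row, col), ...]}; returns an int label image."""
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)
    sums, counts = {}, {}
    heap, order = [], 0

    def push_neighbours(r, c, lab):
        nonlocal order
        for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1] and labels[y, x] == 0:
                delta = abs(img[y, x] - sums[lab] / counts[lab])  # |intensity - region mean|
                heapq.heappush(heap, (delta, order, y, x, lab))
                order += 1                                        # tie-breaker for equal deltas

    for lab, points in seeds.items():
        sums[lab], counts[lab] = 0.0, 0
        for r, c in points:
            labels[r, c] = lab
            sums[lab] += img[r, c]
            counts[lab] += 1
    for lab, points in seeds.items():
        for r, c in points:
            push_neighbours(r, c, lab)

    while heap:
        _, _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] != 0:          # already allocated through a smaller delta
            continue
        labels[r, c] = lab
        sums[lab] += img[r, c]         # update the running region mean
        counts[lab] += 1
        push_neighbours(r, c, lab)
    return labels
```

Every pixel connected to some seed ends up allocated, so the result is a complete partition of those pixels, and it changes if the seeds are moved.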
One variant of this technique, proposed by Haralick and Shapiro (1985),[1] is based on pixel
intensities. The mean and scatter of the region and the intensity of the candidate pixel are used to
compute a test statistic. If the test statistic is sufficiently small, the pixel is added to the region,
and the region's mean and scatter are recomputed. Otherwise, the pixel is rejected, and is used to
form a new region.
In split-and-merge segmentation, the image is represented by a quadtree. The method starts at the
root of the tree, which represents the whole image. If a region is found to be non-uniform (not
homogeneous), it is split into four child squares (the splitting process), and so on. If, in
contrast, four child squares are homogeneous, they are merged as several connected components
(the merging process). A node in the tree is a segmented node. This process continues recursively
until no further splits or merges are possible.[25][26] When a special data structure is used in the
implementation, the time complexity of the method can be brought down to that of an optimal
algorithm.[27]
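A compact sketch of both phases follows. It assumes a block is homogeneous when its intensity range (max minus min) is at most max_range, and it merges adjacent leaf blocks greedily with a union-find structure; both the criterion and the function names are illustrative, not the cited algorithms. Blocks that are one pixel thick are accepted as leaves even if non-homogeneous, to keep the recursion simple.

```python
import numpy as np

def quadtree_split(image, max_range):
    """Split phase: recursively quarter blocks whose intensity range exceeds max_range."""
    img = np.asarray(image, dtype=float)
    blocks = []

    def split(r, c, h, w):
        block = img[r:r + h, c:c + w]
        if block.max() - block.min() <= max_range or h == 1 or w == 1:
            blocks.append((r, c, h, w))
            return
        h2, w2 = h // 2, w // 2
        for rr, cc, hh, ww in ((r, c, h2, w2), (r, c + w2, h2, w - w2),
                               (r + h2, c, h - h2, w2), (r + h2, c + w2, h - h2, w - w2)):
            split(rr, cc, hh, ww)

    split(0, 0, *img.shape)
    return blocks

def merge_blocks(image, blocks, max_range):
    """Merge phase: join adjacent blocks while the union stays homogeneous."""
    img = np.asarray(image, dtype=float)
    owner = np.empty(img.shape, dtype=int)
    for i, (r, c, h, w) in enumerate(blocks):
        owner[r:r + h, c:c + w] = i
    parent = list(range(len(blocks)))
    lo = [img[r:r + h, c:c + w].min() for r, c, h, w in blocks]
    hi = [img[r:r + h, c:c + w].max() for r, c, h, w in blocks]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # pairs of blocks that touch horizontally or vertically
    pairs = set(zip(owner[:, :-1].ravel(), owner[:, 1:].ravel()))
    pairs |= set(zip(owner[:-1, :].ravel(), owner[1:, :].ravel()))
    for a, b in sorted(pairs):
        ra, rb = find(a), find(b)
        if ra != rb and max(hi[ra], hi[rb]) - min(lo[ra], lo[rb]) <= max_range:
            parent[rb] = ra
            lo[ra], hi[ra] = min(lo[ra], lo[rb]), max(hi[ra], hi[rb])
    return np.vectorize(find)(owner)
```

On an 8x8 image whose left and right halves differ strongly, the split yields four quadrants and the merge joins them back into two regions.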
Parametric methods
Lagrangian techniques are based on parameterizing the contour according to some sampling
strategy and then evolving each element according to image and internal terms. Such techniques
are fast and efficient; however, the original "purely parametric" formulation (due to Kass, Witkin
and Terzopoulos in 1987 and known as "snakes") is generally criticized for its limitations
regarding the choice of sampling strategy, the internal geometric properties of the curve,
topology changes (curve splitting and merging), addressing problems in higher dimensions, etc.
Nowadays, efficient "discretized" formulations have been developed to address these limitations
while maintaining high efficiency. In both cases, energy minimization is generally conducted
using a steepest-gradient descent, whereby derivatives are computed using, e.g., finite
differences.
The level set method was initially proposed to track moving interfaces by Osher and Sethian in
1988 and spread across various imaging domains in the late 1990s. It can be used to efficiently
address the problem of curve/surface/etc. propagation in an implicit manner. The central idea is
to represent the evolving contour using a signed function whose zero level set corresponds to the
actual contour. Then, according to the motion equation of the contour, one can easily derive a
similar flow for the implicit surface that, when applied to the zero level, will reflect the
propagation of the contour. The level set method affords numerous advantages: it is implicit, is
parameter-free, provides a direct way to estimate the geometric properties of the evolving
structure, allows for change of topology, and is intrinsic. It can be used to define an optimization
framework, as proposed by Zhao, Merriman and Osher in 1996. It is therefore a very convenient
framework for addressing numerous applications of computer vision and medical image analysis.[29]
Research into various level set data structures has led to very efficient implementations of this
method.
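The implicit representation and its free handling of topology changes can be seen in a small example. The contour is stored as the zero level of a function phi (negative inside), built here as the minimum of two circle distance functions. For a constant outward normal speed F, this particular phi is simply lowered by F·t, which is used instead of a numerical PDE solver; that shortcut, the grid size, and the helper names are assumptions of the sketch. The two fronts merge without any special bookkeeping.

```python
from collections import deque
import numpy as np

def count_regions(mask):
    """Count 4-connected components of True pixels (simple BFS labelling)."""
    seen = np.zeros_like(mask, dtype=bool)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and not seen[r, c]:
                count += 1
                seen[r, c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count

# phi < 0 inside, phi > 0 outside, phi == 0 on the contour
yy, xx = np.mgrid[0:64, 0:64]
d1 = np.hypot(xx - 20, yy - 32) - 8   # disc centred at (20, 32), radius 8
d2 = np.hypot(xx - 44, yy - 32) - 8   # disc centred at (44, 32), radius 8
phi0 = np.minimum(d1, d2)             # implicit union of the two discs

def evolve(phi, speed, t):
    # constant outward speed: for this distance-like phi the front at time t
    # is the level set phi0 == speed * t
    return phi - speed * t

print(count_regions(phi0 < 0))                     # 2 separate regions
print(count_regions(evolve(phi0, 1.0, 6.0) < 0))   # 1 region after the fronts merge
```

A parametric contour would need explicit code to detect the collision and splice the two curves; here the merge is a side effect of thresholding phi.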
The fast marching method has been used in image segmentation,[30] and this model has been
improved (permitting both positive and negative propagation speeds) in an approach called the
generalized fast marching method.[31]
Variational methods
The goal of variational methods is to find a segmentation which is optimal with respect to a
specific energy functional. The functionals consist of a data-fitting term and regularizing terms.
A classical representative is the Potts model defined for an image $f$ by

$$\operatorname*{argmin}_u \; \gamma \|\nabla u\|_0 + \int (u - f)^2 \, dx.$$

A minimizer $u^*$ is a piecewise constant image which has an optimal tradeoff between the
squared L2 distance to the given image $f$ and the total length of its jump set. The jump set of
$u^*$ defines a segmentation. The relative weight of the energies is tuned by the parameter
$\gamma > 0$. The binary variant of the Potts model, i.e., if the range of $u$ is restricted to two
values, is often called the Chan-Vese model.[32] An important generalization is the Mumford-Shah
model[33] given by

$$\operatorname*{argmin}_{u, K} \; \gamma |K| + \mu \int_{K^C} |\nabla u|^2 \, dx + \int (u - f)^2 \, dx.$$

The functional value is the sum of the total length of the segmentation curve $K$, the smoothness
of the approximation $u$, and its distance to the original image $f$. The weight of the smoothness
penalty is adjusted by $\mu > 0$. The Potts model is often called the piecewise constant
Mumford-Shah model, as it can be seen as the degenerate case $\mu \to \infty$. The optimization
problems are known to be NP-hard in general, but near-minimizing strategies work well in practice.
Classical algorithms are graduated non-convexity and the Ambrosio-Tortorelli approximation.
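In one dimension the jump set is just the set of positions where u changes, and the discrete Potts energy gamma·(number of jumps) + Σ(u_i − f_i)² can be minimized exactly by dynamic programming over the position of the last jump. The sketch below is this 1-D special case only (the 2-D problem is the hard one); prefix sums give each segment's squared error in constant time, and the name potts_1d is hypothetical.

```python
import numpy as np

def potts_1d(f, gamma):
    """Exact minimiser of gamma * (#jumps of u) + sum((u - f)**2) for 1-D data."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    s1 = np.concatenate(([0.0], np.cumsum(f)))        # prefix sums of f
    s2 = np.concatenate(([0.0], np.cumsum(f * f)))    # prefix sums of f**2

    def seg_err(l, r):
        """Squared deviation of f[l:r] from its mean."""
        return s2[r] - s2[l] - (s1[r] - s1[l]) ** 2 / (r - l)

    best = np.full(n + 1, np.inf)
    best[0] = -gamma                  # the first segment pays no jump penalty
    last = np.zeros(n + 1, dtype=int)
    for r in range(1, n + 1):         # best energy for the prefix f[:r]
        for l in range(r):            # f[l:r] is the last constant segment
            cand = best[l] + gamma + seg_err(l, r)
            if cand < best[r]:
                best[r], last[r] = cand, l
    u = np.empty(n)
    r = n
    while r > 0:                      # backtrack and fill each segment with its mean
        l = last[r]
        u[l:r] = (s1[r] - s1[l]) / (r - l)
        r = l
    return u
```

Increasing gamma removes jumps (coarser segmentation); decreasing it lets u follow the noise more closely.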
The application of Markov random fields (MRF) to images was suggested in 1984 by Geman and
Geman.[39] Their strong mathematical foundation and ability to provide a global optimum even when
defined on local features proved to be the foundation for novel research in the domain of image
analysis, denoising and segmentation. MRFs are completely characterized by their prior probability
distributions, marginal probability distributions, cliques, smoothing constraint, and criterion for
updating values. The criterion for image segmentation using MRFs is restated as finding the
labeling scheme which has maximum probability for a given set of features. The broad categories
of image segmentation using MRFs are supervised and unsupervised segmentation.
In terms of image segmentation, the function that MRFs seek to maximize is the probability of
identifying a labeling scheme given that a particular set of features is detected in the image. This
is a restatement of the maximum a posteriori estimation method.
MRF neighborhood for a chosen pixel
The generic algorithm for image segmentation using MAP is given below:
Optimization algorithms
Each optimization algorithm is an adaptation of models from a variety of fields and they are set
apart by their unique cost functions. The common trait of cost functions is to penalize change in
pixel value as well as difference in pixel label when compared to labels of neighboring pixels.
The iterated conditional modes (ICM) algorithm tries to reconstruct the ideal labeling scheme by
changing the value of each pixel over each iteration and evaluating the energy of the new labeling
scheme using the cost function

$$E_i = \alpha \bigl(1 - \delta(\ell_i, \ell_i^{\text{initial}})\bigr) + \beta \sum_{j \in N(i)} \bigl(1 - \delta(\ell_i, \ell_j)\bigr),$$

where $\alpha$ is the penalty for a change in pixel label and $\beta$ is the penalty for a difference in
label between neighboring pixels and the chosen pixel. Here $N(i)$ is the neighborhood of pixel $i$
and $\delta$ is the Kronecker delta function. A major issue with ICM is that, similar to gradient
descent, it tends to get stuck in local optima and thus may not obtain a globally optimal labeling
scheme.
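ICM with this kind of cost can be sketched for a label image as below: each pixel in turn takes the label that minimizes alpha (if it differs from the pixel's initial label) plus beta times the number of 4-neighbours with a different label, and sweeps repeat until nothing changes. The binary label set, raster visiting order, and the name icm are assumptions of the sketch.

```python
import numpy as np

def icm(initial, alpha, beta, labels=(0, 1), iters=10):
    """Iterated conditional modes: greedy per-pixel relabelling of an integer label image."""
    lab = initial.copy()
    rows, cols = lab.shape
    for _ in range(iters):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neigh = [lab[y, x] for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= y < rows and 0 <= x < cols]
                # alpha * [label differs from initial] + beta * (#neighbours with another label)
                costs = [alpha * (l != initial[r, c]) + beta * sum(l != n for n in neigh)
                         for l in labels]
                best = labels[int(np.argmin(costs))]
                if best != lab[r, c]:
                    lab[r, c] = best
                    changed = True
        if not changed:               # a full sweep without changes: local optimum reached
            break
    return lab
```

Because every update only lowers the local cost, the procedure stops at the first configuration no single-pixel change can improve, which is exactly the local-optimum behaviour described above.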
Simulated annealing (SA)
Derived as an analogue of annealing in metallurgy, SA changes pixel labels over iterations and
estimates the difference in energy between each newly formed graph and the previous one. If the
newly formed graph is more profitable in terms of energy cost, i.e. if

$$\Delta U = U^{\text{new}} - U^{\text{old}} < 0,$$

the algorithm selects the newly formed graph. Simulated annealing requires the input of a
temperature schedule, which directly affects the speed of convergence of the system, as well as an
energy threshold for minimization to occur.
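A small SA sketch for a binary label image follows. It assumes the usual Metropolis rule (always accept ΔU ≤ 0, accept ΔU > 0 with probability exp(−ΔU/T)), a geometric cooling schedule, and a total energy with an alpha cost per pixel whose label differs from the initial one plus a beta cost per disagreeing 4-neighbour pair. These choices and the function names are illustrative. The best labeling seen is returned, since the chain itself may wander uphill.

```python
import math
import random
import numpy as np

def labeling_energy(lab, initial, alpha, beta):
    """alpha per pixel relabelled w.r.t. initial + beta per 4-neighbour pair that disagrees."""
    data = alpha * np.count_nonzero(lab != initial)
    smooth = beta * (np.count_nonzero(lab[1:, :] != lab[:-1, :]) +
                     np.count_nonzero(lab[:, 1:] != lab[:, :-1]))
    return data + smooth

def simulated_annealing(initial, alpha, beta, t0=2.0, cooling=0.95, steps=2000, seed=0):
    """initial: integer array of 0/1 labels."""
    rng = random.Random(seed)
    lab = initial.copy()
    energy = labeling_energy(lab, initial, alpha, beta)
    best, best_energy = lab.copy(), energy
    temp = t0
    rows, cols = lab.shape
    for _ in range(steps):
        r, c = rng.randrange(rows), rng.randrange(cols)
        lab[r, c] = 1 - lab[r, c]                  # propose flipping one binary label
        new_energy = labeling_energy(lab, initial, alpha, beta)
        delta = new_energy - energy                # Delta U = U_new - U_old
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy                    # accept the new labeling
            if energy < best_energy:
                best, best_energy = lab.copy(), energy
        else:
            lab[r, c] = 1 - lab[r, c]              # reject: undo the flip
        temp = max(temp * cooling, 1e-3)           # geometric cooling schedule
    return best, best_energy
```

Recomputing the full energy after every flip keeps the code short; a practical implementation would update only the terms around the flipped pixel.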
Alternative algorithms
A range of other methods exist for solving simple as well as higher-order MRFs. They include
Maximization of Posterior Marginal, Multi-scale MAP estimation,[40] Multiple Resolution
segmentation[41] and more. Apart from likelihood estimates, graph-cut using maximum flow[42]
and other highly constrained graph based methods[43][44] exist for solving MRFs.
2. E-step: Estimate class statistics based on the random segmentation model defined. Using
these, compute the conditional probability of belonging to a label given the feature set, using
naive Bayes' theorem.
3. M-step: The established relevance of a given feature set to a labeling scheme is now used to
compute the a priori estimate of a given label in the second part of the algorithm. Since the actual
number of labels is unknown (it is not available from a training data set), an estimate of the
number of labels given by the user is used in the computations.
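The E-step/M-step loop can be sketched on pixel intensities with a Gaussian mixture whose number of labels k is supplied by the user. This sketch deliberately leaves out the MRF spatial prior (each pixel is treated independently), and the Gaussian class model, the initialization, and the name em_segment are assumptions; it only shows how posteriors from Bayes' rule feed the re-estimation of priors and class statistics.

```python
import numpy as np

def em_segment(image, k, iters=50):
    """Intensity-only EM for a k-class Gaussian mixture; returns (label image, class means)."""
    x = np.asarray(image, dtype=float).ravel()
    means = np.linspace(x.min(), x.max(), k)       # initial class statistics
    variances = np.full(k, x.var() / k + 1e-6)
    priors = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: posterior P(label | intensity) via Bayes' theorem
        lik = np.exp(-(x[:, None] - means) ** 2 / (2 * variances)) / np.sqrt(2 * np.pi * variances)
        post = priors * lik
        post /= post.sum(axis=1, keepdims=True) + 1e-300
        # M-step: re-estimate the label priors and the class statistics
        weight = post.sum(axis=0) + 1e-12
        priors = weight / x.size
        means = (post * x[:, None]).sum(axis=0) / weight
        variances = (post * (x[:, None] - means) ** 2).sum(axis=0) / weight + 1e-6
    return post.argmax(axis=1).reshape(np.shape(image)), means
```

Adding a neighbourhood term to the E-step (for example, via the ICM cost above) would turn this into the MRF-based variant.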
Watershed transformation
The watershed transformation considers the gradient magnitude of an image as a topographic
surface. Pixels having the highest gradient magnitude intensities (GMIs) correspond to watershed
lines, which represent the region boundaries. Water placed on any pixel enclosed by a common
watershed line flows downhill to a common local intensity minimum (LIM). Pixels draining to a
common minimum form a catch basin, which represents a segment.
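The flooding picture translates into a priority-flood procedure. The sketch below is a marker-controlled variant: instead of detecting every local minimum automatically, it assumes the caller supplies an integer marker image (0 for unlabelled, k > 0 for basin k) and grows each basin in order of rising water level over the gradient-magnitude relief. Ridge pixels go to whichever basin reaches them first, so no explicit watershed line is drawn; the function name is hypothetical.

```python
import heapq
import numpy as np

def watershed_from_markers(gradient, markers):
    """Flood a gradient-magnitude relief from labelled markers (0 = unlabelled)."""
    g = np.asarray(gradient, dtype=float)
    labels = np.array(markers, dtype=int)
    rows, cols = g.shape
    heap, order = [], 0
    for r, c in zip(*np.nonzero(labels)):
        heapq.heappush(heap, (g[r, c], order, r, c))
        order += 1
    while heap:
        level, _, r, c = heapq.heappop(heap)       # lowest water level first
        for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= y < rows and 0 <= x < cols and labels[y, x] == 0:
                labels[y, x] = labels[r, c]        # water from this basin arrives first
                heapq.heappush(heap, (max(level, g[y, x]), order, y, x))
                order += 1
    return labels
```

With a relief such as `[[0, 1, 2, 3, 9, 3, 2, 1, 0]]` and markers at both ends, the two basins meet at the high-gradient pixel in the middle.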
Multi-scale segmentation
Image segmentations are computed at multiple scales in scale space and sometimes propagated
from coarse to fine scales; see scale-space segmentation.
Segmentation criteria can be arbitrarily complex and may take into account global as well as
local criteria. A common requirement is that each region must be connected in some sense.
Witkin's seminal work[47][48] in scale space included the notion that a one-dimensional signal
could be unambiguously segmented into regions, with one scale parameter controlling the scale
of segmentation.
A key observation is that the zero-crossings of the second derivatives (minima and maxima of
the first derivative or slope) of multi-scale-smoothed versions of a signal form a nesting tree,
which defines hierarchical relations between segments at different scales. Specifically, slope
extrema at coarse scales can be traced back to corresponding features at fine scales. When a
slope maximum and slope minimum annihilate each other at a larger scale, the three segments
that they separated merge into one segment, thus defining the hierarchy of segments.
There have been numerous research works in this area, out of which a few have now reached a
state where they can be applied either with interactive manual intervention (usually with
application to medical imaging) or fully automatically. The following is a brief overview of some
of the main research ideas that current approaches are based upon.
The nesting structure that Witkin described is, however, specific for one-dimensional signals and
does not trivially transfer to higher-dimensional images. Nevertheless, this general idea has
inspired several other authors to investigate coarse-to-fine schemes for image segmentation.
Koenderink[49] proposed to study how iso-intensity contours evolve over scales and this approach
was investigated in more detail by Lifshitz and Pizer.[50] However, the intensity of
image features changes over scales, which implies that it is hard to trace coarse-scale image
features to finer scales using iso-intensity information.
Lindeberg[51][52] studied the problem of linking local extrema and saddle points over scales, and
proposed an image representation called the scale-space primal sketch which makes explicit the
relations between structures at different scales, and also makes explicit which image features are
stable over large ranges of scale including locally appropriate scales for those. Bergholm
proposed to detect edges at coarse scales in scale-space and then trace them back to finer scales
with manual choice of both the coarse detection scale and the fine localization scale.
Gauch and Pizer[53] studied the complementary problem of ridges and valleys at multiple scales
and developed a tool for interactive image segmentation based on multi-scale watersheds. The
use of multi-scale watershed with application to the gradient map has also been investigated by
Olsen and Nielsen[54] and has been carried over to clinical use by Dam.[55] Vincken et al.[56] proposed a
hyperstack for defining probabilistic relations between image structures at different scales. The
use of stable image structures over scales has been furthered by Ahuja[57][58] and his co-workers
into a fully automated system. A fully automatic brain segmentation algorithm based on closely
related ideas of multi-scale watersheds has been presented by Undeman and Lindeberg[59] and
been extensively tested in brain databases.
These ideas for multi-scale image segmentation by linking image structures over scales have also
been picked up by Florack and Kuijper.[60] Bijaoui and Rué[61] associate structures detected in
scale-space above a minimum noise threshold into an object tree which spans multiple scales and
corresponds to a kind of feature in the original signal. Extracted features are accurately
reconstructed using an iterative conjugate gradient matrix method.
Semi-automatic segmentation
In one kind of segmentation, the user outlines the region of interest with mouse clicks, and
algorithms are applied so that the path that best fits the edge of the image is shown.
Techniques like SIOX, Livewire, Intelligent Scissors or IT-SNAPS are used in this kind of
segmentation. In an alternative kind of semi-automatic segmentation, the algorithms return a
spatial-taxon (i.e. foreground, object-group, object or object-part) selected by the user or
designated via prior probabilities.[62][63]
Trainable segmentation
Most segmentation methods are based only on color information of pixels in the image. Humans
use much more knowledge than this when doing image segmentation, but implementing this
knowledge would cost considerable computation time and would require a huge domain-
knowledge database, which is currently not available. In addition to traditional segmentation
methods, there are trainable segmentation methods which can model some of this knowledge.
Neural network segmentation relies on processing small areas of an image using an artificial
neural network[64] or a set of neural networks. After such processing, the decision-making
mechanism marks the areas of an image according to the category recognized by the neural
network. A type of network designed especially for this is the Kohonen map.
Pulse-coupled neural networks (PCNNs) are neural models proposed by modeling a cat's visual
cortex and developed for high-performance biomimetic image processing. In 1989, Eckhorn
introduced a neural model to emulate the mechanism of a cat's visual cortex. The Eckhorn model
provided a simple and effective tool for studying the visual cortex of small mammals, and was
soon recognized as having significant application potential in image processing. In 1994, the
Eckhorn model was adapted to be an image processing algorithm by Johnson, who termed this
algorithm Pulse-Coupled Neural Network. Over the past decade, PCNNs have been utilized for a
variety of image processing applications, including image segmentation, feature generation, face
extraction, motion detection, region growing, and noise reduction. A PCNN is a two-dimensional
neural network. Each neuron in the network corresponds to one pixel in an input
image, receiving its corresponding pixel's color information (e.g. intensity) as an external
stimulus. Each neuron also connects with its neighboring neurons, receiving local stimuli from
them. The external and local stimuli are combined in an internal activation system, which
accumulates the stimuli until it exceeds a dynamic threshold, resulting in a pulse output. Through
iterative computation, PCNN neurons produce temporal series of pulse outputs. The temporal
series of pulse outputs contain information of input images and can be utilized for various image
processing applications, such as image segmentation and feature generation. Compared with
conventional image processing means, PCNNs have several significant merits, including
robustness against noise, independence of geometric variations in input patterns, capability of
bridging minor intensity variations in input patterns, etc.
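The pulse mechanism described above can be sketched with a commonly used simplified PCNN: the feeding input is the pixel intensity, the linking input sums the previous pulses of the 4-neighbours, the internal activity is feeding × (1 + beta × linking), and a neuron pulses when that activity exceeds a threshold that decays exponentially and jumps after each pulse. The parameter values, the toroidal neighbourhood via np.roll, and the use of first-firing time as a segmentation cue are assumptions of this sketch, not Eckhorn's or Johnson's exact formulation.

```python
import numpy as np

def pcnn_fire_times(image, beta=0.2, v_l=1.0, alpha_e=0.3, v_e=20.0, iters=30):
    """Simplified PCNN: iteration at which each neuron first pulses (-1 if never)."""
    S = np.asarray(image, dtype=float)
    S = S / S.max()                          # external stimulus in [0, 1] (non-negative input assumed)
    E = np.ones_like(S)                      # dynamic threshold
    Y = np.zeros_like(S)                     # pulse output of the previous iteration
    fire_time = np.full(S.shape, -1)
    for n in range(iters):
        # linking input: previous pulses of the 4-neighbours (np.roll wraps at the borders)
        L = v_l * (np.roll(Y, 1, 0) + np.roll(Y, -1, 0) + np.roll(Y, 1, 1) + np.roll(Y, -1, 1))
        U = S * (1.0 + beta * L)             # internal activity: feeding modulated by linking
        Y = (U > E).astype(float)            # pulse where the activity exceeds the threshold
        E = np.exp(-alpha_e) * E + v_e * Y   # threshold decays, then jumps after a pulse
        fire_time[(Y > 0) & (fire_time < 0)] = n
    return fire_time
```

Brighter neurons reach their threshold first, so grouping pixels by first-firing time gives a crude intensity segmentation, while the linking term lets a neuron fire slightly early when its neighbours pulse, which is how minor intensity variations get bridged.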
IMMI
Other methods
There are many other methods of segmentation like multispectral segmentation or connectivity-
based segmentation based on DTI images.[65]
Segmentation benchmarking
Several segmentation benchmarks are available for comparing the performance of segmentation
methods with the state-of-the-art segmentation methods on standardized sets.
See also
Computer vision
Image-based meshing
Vector quantization
Image quantization
Color quantization
Notes
1. Shapiro, Linda G.; Stockman, George C. (2001). Computer Vision. New Jersey: Prentice-Hall.
pp. 279-325. ISBN 0-13-030796-3.
3. Pham, Dzung L.; Xu, Chenyang; Prince, Jerry L. (2000). "Current Methods in Medical Image
Segmentation". Annual Review of Biomedical Engineering 2: 315-337.
doi:10.1146/annurev.bioeng.2.1.315. PMID 11701515.