Segmentation of Digital Rock Images Using Deep Convolutional Autoencoder
Keywords: Segmentation; Digital rock physics (DRP); Mineral identification; Artificial intelligence

Segmentation is a critical step in Digital Rock Physics (DRP) because the original images are available only in
gray-scale format. Conventional methods often use thresholding to delineate distinct phases and, subsequently, a
watershed algorithm to identify the existing phases. Such methods rely on color contrast, which makes it difficult
to differentiate automatically between phases with similar colors and intensities. Recently, the deep learning and
machine learning communities have proposed several algorithms for working with images, including Convolutional
Neural Networks (CNN). Among them, convolutional autoencoder networks have produced accurate results in different
applications when a variety of images is available for training. In this paper, a convolutional autoencoder
algorithm is therefore implemented to enhance the segmentation of digital rock images. However, the bottleneck for
applying CNN algorithms in DRP is the limited number of available rock images. As an effective data augmentation
method, a cross-correlation based simulation was used to enlarge the dataset in this study. Starting from the
originally available dataset, namely 20 images of Berea sandstone, a training seed comprising manually and
semi-manually segmented images was used. The dataset is then divided into training, validation and testing groups
with fractions of 80, 10 and 10%, respectively. Next, the dataset is given to our stochastic image generator
algorithm, and 20000 realizations, along with their segmented images, are produced simultaneously. The implemented
CNN algorithm was tested in two versions, with basic and extended architectures. The results show that the extended
network reaches a categorical accuracy of 96% on the designated images in the testing group. Finally, a qualitative
comparison with conventional multiphase segmentation (multi-thresholding) revealed that our results are more
accurate and reliable even if very few rock images are available.
1. Introduction

Rock physics provides the relationships between the physical properties of the porous structure of rock and
remotely sensed geophysical measurements. Recently, the emergence of high-resolution micro-computed tomography
(μCT) images of rock samples has led to remarkable developments in Digital Rock Physics (DRP). In the standard DRP
workflow, segmenting pores and minerals into separate phases is a vital step. The segmentation methods developed
for DRP have been reviewed extensively in previous publications (Iassonov et al., 2009; Sezgin, 2004). Among them,
the procedures introduced by the Visual Science Group (VSG), Stanford University (SU) and Kongju (KJ) are the most
effective available frameworks (Andrä et al., 2013). These methods can be summarized in the following steps:
1. Detecting boundary pixels using the gradient magnitude of the image and excluding them, 2. Thresholding the
remaining pixels into pore and mineral phases, and 3. Expanding all phases to the boundary pixels with a
marker-based watershed algorithm (Beucher and Meyer, 1992). Marker detection in minerals with complicated patterns
is not straightforward, and there may be no general method to achieve it (Beucher and Meyer, 1992). Although most
segmentation methods are based on image processing algorithms that run in an automatic framework, manual control is
essential in each step. For instance, thresholding may fail when no color contrast is observed between two separate
phases. In other words, there may be two different minerals in the rock image with similar colors. This issue
arises from either their close densities or the limited detection power of the imaging instrument. For the sake of
simplicity, researchers often consider the rock images as a two-phase
Dr. Sadegh Karimpouli and Dr. Pejman Tahmasebi together conceived the problem, developed the method, performed the computations and contributed to
https://doi.org/10.1016/j.cageo.2019.02.003
Received 19 April 2018; Received in revised form 12 October 2018; Accepted 5 February 2019
Available online 06 February 2019
0098-3004/ © 2019 Elsevier Ltd. All rights reserved.
S. Karimpouli and P. Tahmasebi Computers and Geosciences 126 (2019) 142–150
(porosity and mineral) sample (Karimpouli et al., 2018; Fattahi and Karimpouli, 2016; Karimpouli and Fattahi,
2016). This simplification is, however, not applicable in general and strongly affects the subsequent computations
of rock physical parameters, particularly P- and S-wave velocities (Andrä et al., 2013).

Artificial Neural Networks (ANNs) are a class of machine learning algorithms inspired by the human brain. ANNs
learn to perform tasks such as classification or prediction when they are trained on examples. Recent developments
in ANNs and, in particular, deep learning have offered new possibilities to tackle very complex problems in image
analysis. One such algorithm is the Convolutional Neural Network (CNN) (Krizhevsky et al., 2012; Lecun et al.,
1998), which uses convolution and pooling functions to extract new features for analyzing visual imagery. These
networks, which are considered one of the deep learning derivatives, have shown significant improvements in
accuracy and effectiveness over conventional networks (Garcia-Garcia et al., 2017). CNNs have been used for many
applications, including face detection (Li et al., 2015), semantic segmentation (Garcia-Garcia et al., 2017), video
analysis in autonomous driving (Badrinarayanan et al., 2015), speech recognition (Y. Zhang et al., 2017b) and
medical image analysis and applications (Havaei et al., 2017; Litjens et al., 2017; Wallach et al., 2015). In
geosciences, and particularly in rock physics, deep learning methods and CNNs are used in applications such as
lithology detection using borehole imaging (P. Y. Zhang et al., 2017a), rock type classification (Cheng and Guo,
2017; Ferreira and Giraldi, 2017), permeability prediction (Srisutthiyakorn, 2016) and reconstruction of rock
porous media (Laloy et al., 2017; Mosser et al., 2017). In this work, we aim to use this powerful method for rock
image segmentation. Thus, deep learning segmentation methods are first reviewed briefly.

As mentioned earlier, CNNs can be used for pixel labeling problems, i.e. segmentation. The most significant
advantage that lets the CNN surpass conventional methods is its ability to learn from patterns, features, and
textures rather than relying only on color variation. According to a recent review by Garcia-Garcia et al. (2017),
the most successful state-of-the-art deep learning based segmentation methods are fully convolutional networks
(Shelhamer et al., 2017). In fully convolutional networks, the fully connected layers are replaced by
convolutional layers, which produce spatial maps instead of classification scores. Such maps are then deconvolved
in an up-sampling procedure to obtain a per-pixel labeled image (i.e. segmented image). SegNet (Badrinarayanan et
al., 2015), an encoder-decoder convolutional network, is one of the best currently available networks of this kind;
it has been used in different applications and has demonstrated promising results (Garcia-Garcia et al., 2017;
Kendall et al., 2015; Nanfack et al., 2017).

One of the major issues in using deep learning methods for DRP is the limited dataset available for the training
step. Acquiring μCT images is both expensive and time-consuming. Moreover, because of either the lengthy sample
preparation procedure or the limited number of core samples, preparing hundreds (or thousands) of polished
thin-section samples for microscope imaging may not be feasible in subsurface applications. Therefore, data
augmentation methods should be applied to enlarge the existing dataset. A larger dataset leads to more effective
training and thus helps avoid overfitting through either faster convergence or better regularization. The common
data augmentation methods are image transformation operators such as rotation, translation, scaling and cropping.
For example, Mosser et al. (2017) cropped several small images from a large image and used an overlap of 12–64%
between the small images to increase the dataset. In this paper, however, a more efficient reconstruction method,
namely Hybrid Pattern- and Pixel-based Simulation (HYPPS) introduced by Tahmasebi (2017), is used for data
augmentation. HYPPS takes an input image and generates any number of realizations with different structures and of
any size, but with similar statistics. HYPPS is a powerful tool that has been used in several applications to
reconstruct new scenarios of heterogeneous media such as sandstone and carbonate samples (Karimpouli and Tahmasebi,
2016), coal samples (Karimpouli et al., 2017), small-scale porous media modeling (Tahmasebi, 2018a; Tahmasebi,
2018c; Tahmasebi and Kamrava, 2018) and unconventional plays (Tahmasebi et al., 2017; Tahmasebi et al., 2018;
Tahmasebi, 2018b).

In this paper, we first tackle the problem of limited data in DRP with the HYPPS algorithm as an effective
augmentation approach. Then, SegNet is used to overcome the difficulties and drawbacks of the conventional methods
for segmenting digital rock images. This workflow can be considered an opening for deep learning techniques in the
vast world of DRP. The CNN, SegNet and HYPPS algorithms are introduced in Section 2. In Section 3, the Berea
sandstone used here is described as a benchmark dataset in DRP studies. Then, we produce a limited number of
ground-truth (segmented) images semi-manually and increase their number using the HYPPS method. Next, the two
versions of SegNet are used for segmentation and their segmented results are compared. Finally, the results are
discussed in Section 4.

2. Basic concepts

The two main algorithms used in this paper are SegNet for image-based segmentation and the HYPPS method for
simulation and data augmentation. This section gives a brief description of these algorithms as well as of the
CNN, the core of the image-based neural network.

2.1. Convolutional Neural Networks

Convolutional Neural Networks belong to a large group of deep learning methods. They have attracted attention due
to their strong abilities in image classification/recognition (He et al., 2015). CNNs are trained through their
convolutional layers to recognize various patterns in the input images. Small kernels are the pillars of the
convolutional layers; they effectively extract high-level characteristics of the input image. The convolutional
layers are followed by a fully connected neural network, which translates the features obtained from the previous
layers into the given output phases. The basic layers in a common CNN are as follows:

1. Input layer: Images are the input data, and they are introduced to the CNN in this layer.
2. Convolutional layer: In this layer, input images or feature maps from the previous layer are convolved with
small filters (or kernels) to generate new feature maps. These convolutions are performed with a shift of 'n'
pixels, which is called the stride (Krizhevsky et al., 2012). The stride controls how the filter moves across the
input image.
3. ReLU (Rectified Linear Units) layer: The purpose of this layer is to introduce nonlinearity to a system that
has so far computed only linear operations (multiplications and summations) in the convolutional layers. The ReLU
layer applies the function f(x) = max(0, x) to all the values in the input volume, i.e. it sets all negative
values to zero.
4. Max-pooling layer: This layer, usually known as down-sampling, summarizes the data by choosing the local
maximum in a sliding window that moves across the feature maps with a stride equal to the window length.
5. Fully connected layer: It is similar to the traditional Multi-Layer Perceptron (MLP) neural networks (Haykin,
1999) and is used to translate the feature maps or patterns obtained in previous layers into a known
classification.
6. Soft-max layer: The soft-max, or normalized exponential function, is another activation function, which
produces a categorical
Fig. 1. The general architecture of the SegNet (after Badrinarayanan et al. (2015)).
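To make the basic CNN layers of Section 2.1 concrete (convolution with a stride, ReLU, max-pooling and soft-max), the following is a minimal NumPy sketch. It is not the SegNet implementation used in this study; the function names, the 3 × 3 edge filter and the random inputs are assumptions made only for illustration.

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """'Valid' sliding-window product-sum of a kernel over an image with a given stride."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r * stride:r * stride + kh, c * stride:c * stride + kw]
            out[r, c] = np.sum(patch * kernel)
    return out


def relu(x):
    """f(x) = max(0, x): negative values are set to zero."""
    return np.maximum(0, x)


def max_pool(x, size=2):
    """Non-overlapping max-pooling: the stride equals the window length."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def softmax(scores, axis=-1):
    """Normalized exponential: turns scores into class probabilities."""
    shifted = scores - scores.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
image = rng.random((8, 8))                   # hypothetical gray-scale patch
kernel = np.array([[1.0, 0.0, -1.0]] * 3)    # hypothetical 3x3 vertical-edge filter
features = max_pool(relu(conv2d(image, kernel, stride=1)))  # 8x8 -> 6x6 -> 3x3
probs = softmax(rng.random((4, 5)))          # 4 pixels, 5 hypothetical phases
strided = conv2d(np.ones((5, 5)), np.ones((3, 3)), stride=2)
```

In a segmentation network the soft-max is applied per pixel across the phase channels, so each pixel receives one probability per phase and the label is the most probable phase.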
Fig. 3. (a) Different minerals of Berea sandstone detected in a scanning electron microscopy (SEM) (after (Madonna et al., 2013)) and (b) 3D μCT image of Berea
sandstone with a size of 1024 × 1024 × 1024 voxels and a resolution of 0.74 μm.
Then, starting from a corner of the simulation grid, a pattern of a specific size is selected from the input image
and pasted on the simulation grid. Note that the first pattern is inserted on an empty simulation grid and is
selected completely at random. The size of the selected pattern depends on the heterogeneity of the input image. A
larger template size can be used if the input image contains very complex patterns. Similarly, a smaller pattern
size can be chosen when the input image is homogeneous. However, it should be kept in mind that a larger template
often reduces the variability between the produced realizations while generating high-quality images. On the other
hand, a smaller template size increases the variability, but the final realizations may not be very similar to the
input image. Therefore, after defining the appropriate size, a small overlap OL (OL_x × OL_y) with the previously
inserted patterns is selected and the cross-correlation of the selected region(s) with the input image is
calculated using:

$$\mathcal{C}(i, j; x, y) = \sum_{x=0}^{OL_x - 1} \sum_{y=0}^{OL_y - 1} DI(x + i, y + j)\, D_T(x, y), \tag{1}$$

where DI is the input 2D image, D_T is the visiting data-event at point (x, y) and T represents its size (i.e.
template size). The patterns are sorted according to the resulting similarity map, and a certain number of the most
similar patterns are selected. Finally, one of these patterns is selected and inserted at the visiting point
(Tahmasebi, 2017). This process continues until the simulation grid is filled. Some of the realizations produced
from an input sandstone image are shown in Fig. 5.

The HYPPS algorithm was originally developed to deal with complex secondary continuous and point data (Tahmasebi,
2017). Reproducing such conditioning data can be difficult with a purely pattern-based strategy. Therefore, the
HYPPS method offers some flexibility, such as simulating the point data through a pixel-based method. In this
study, however, since no such data are available, the HYPPS method is used in its pattern-based mode.

One of the main differences between artificial intelligence and physics-based modeling is the amount of
variability they need. In other words, deep learning methods give their best outcome when the input data show a
large standard deviation, so that the new models can be built from a rich database. To leverage the spatial
relationships between the input images and to create new patterns, the HYPPS method was modified so that new
realizations/images are generated using an ensemble of inputs. In other words, all the available images are
searched during the pattern selection phase. Doing so results in more variability and also produces new transition
patterns that might not be available in any of the input images individually.

3. Benchmark data

Andrä et al. (2013) introduced several standard digital rock samples such as Berea and Fontainebleau sandstone and
Grosmont carbonate¹. These benchmark samples have been frequently used for DRP studies. Among them, we used the
Berea sandstone to evaluate our segmentation results. This sample is mainly composed of quartz and some minor
minerals such as clay, K-feldspar, ankerite, and zircon; see Fig. 3(a). The acquired image consists of
1024 × 1024 × 1024 voxels with a resolution of 0.74 μm (Fig. 3(b)). Andrä et al. (2013) implemented three
segmentation methods, namely VSG, SU, and KJ, to obtain a mono-mineral sample. The results indicate that the
porosity ranges from 18.4 to 20.9%. This is mostly because, even with a bimodal image histogram (i.e. two distinct
modes of porosity and mineral), the choice of threshold value strongly affects the estimated porosity.

4. Results

4.1. Semi-automatic segmentation

Fig. 4(a) shows an example of the original grayscale image used in this study. The original image size is
1024 × 1024 pixels, but to avoid the streak artifacts around the central Z-axis, especially at the boundary of the
image, an image of 512 × 512 pixels was selected from the center of the original image. Then, we resampled the
image by a factor of 0.5 to produce an image of 256 × 256 pixels with a resolution of 1.48 μm, mostly to reduce the
computational time.

Obtaining a multi-mineral segmented image with several threshold values requires addressing two crucial issues. As
illustrated in Fig. 4(b): 1. Grain boundaries are brighter than grain surfaces, so they are misclassified, and
2. Different minerals are assigned to the same class because of their close color values or intensities. Although
these minerals have different textures, threshold-based segmentation is insensitive to such features. Based on
several trials, the watershed algorithm improves the segmentation. This algorithm, however, also failed to
differentiate complex structures. Therefore, we decided to manually label each misclassified pixel, acting as an
expert supervisor. Fig. 4(c) shows the result of our semi-manual segmentation. Although there are many minor
minerals with a similar color in this sample, we decided to categorize all of them as one phase. Therefore, five
phases were

¹ These images and their corresponding segmentations are available on:
https://github.com/fkrzikalla/drp-benchmarks.
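The pattern-selection step built on Eq. (1) can be sketched as follows. This is a minimal NumPy illustration and not the HYPPS source code: the function names, the number of retained candidates (`n_best`) and the random test image are assumptions. It evaluates the cross-correlation between an overlap data-event and every candidate location in the input image, sorts the locations by similarity, and draws one of the most similar candidates at random.

```python
import numpy as np


def cross_correlation_map(di, dt):
    """C(i, j) = sum_x sum_y DI(x + i, y + j) * DT(x, y) for every valid offset (i, j)."""
    ol_x, ol_y = dt.shape
    n_i = di.shape[0] - ol_x + 1
    n_j = di.shape[1] - ol_y + 1
    cmap = np.empty((n_i, n_j))
    for i in range(n_i):
        for j in range(n_j):
            cmap[i, j] = np.sum(di[i:i + ol_x, j:j + ol_y] * dt)
    return cmap


def pick_candidate(cmap, n_best, rng):
    """Sort the offsets by similarity and draw one of the n_best most similar at random."""
    order = np.argsort(cmap, axis=None)[::-1][:n_best]
    choice = rng.choice(order)
    return np.unravel_index(choice, cmap.shape)


rng = np.random.default_rng(42)
di = rng.random((12, 12))      # hypothetical input image DI
dt = di[3:7, 5:9]              # hypothetical 4x4 overlap data-event DT
cmap = cross_correlation_map(di, dt)
pick = pick_candidate(cmap, n_best=5, rng=rng)
```

Drawing at random among several good candidates, rather than always taking the best one, is what keeps the realizations different from one another. Note that the raw product-sum of Eq. (1) favors bright regions; a mean-removed or normalized correlation is a common alternative, although only Eq. (1) is given here.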
Fig. 4. An example of (a) original image, (b) automatic multi-thresholding segmentation and (c) semi-manual
segmentation of Berea sandstone.

Fig. 5. Three realizations and their corresponding segmented images produced using the HYPPS algorithm.

As mentioned, the main obstacle in using the CNN for digital rock images is the limited data available for
training. Although image transforms are used as a data augmentation method (Garcia-Garcia et al., 2017), the method
described in Section 2.3 is more efficient. Therefore, the HYPPS method is applied in this study to enrich the
dataset. This algorithm has been shown to be an efficient augmentation method that can generate as many images as
deep learning studies require.

To avoid a secondary segmentation, we changed the source code so that the segmented images are produced
simultaneously. This means that no extra round of segmentation is required: the segmented image is generated at the
same time as each new realization. The input image size is 256 × 256 pixels, and the optimal template and overlap
sizes used for the reconstructions are 90 × 90 and 10 × 10, respectively. According to the heterogeneity of this
sample, five images are considered. An example of such a simultaneous reconstruction is shown in Fig. 5. In this
example, we used the gray-scale image shown in Fig. 4(a) and its corresponding segmented image in Fig. 4(c) as
inputs, and three realizations, together with their segmented images, were produced. These results demonstrate how
efficiently and accurately images are generated in this study.

To avoid overfitting, the original 20 images (Section 4.1) are divided into training, validation and testing
groups with fractions of 80, 10 and 10%, respectively. Then, 16000 stochastic images are generated using the images
in the training group. In a similar way, 4000 other images are produced for the validation and testing phases, 2000
images each. In the next step, the images produced in the training group are used to train the SegNet, and the
network is adjusted using the images in the validation dataset. Finally, the designed network is tested using the
unseen images in the testing subset.
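The bookkeeping of this split can be sketched as below. The seed images are divided first and the stochastic realizations are generated afterwards within each group, so no realization in the validation or testing group is derived from a training seed image. The function name and the equal number of realizations per seed image (1000, which reproduces the reported 16000/2000/2000 totals) are assumptions; how the realizations were actually distributed over the seed images is not stated.

```python
import random


def split_seed_images(image_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle the seed images and split them into training, validation and testing groups."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(fractions[0] * len(ids))
    n_val = round(fractions[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_seed_images(range(20))

# Assumption: the same number of realizations is generated from every seed image.
REALIZATIONS_PER_IMAGE = 1000
groups = {"train": train_ids, "val": val_ids, "test": test_ids}
counts = {name: len(ids) * REALIZATIONS_PER_IMAGE for name, ids in groups.items()}
```

With 20 seed images this gives 16, 2 and 2 images per group and 20000 realizations in total.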
Fig. 6. The categorical accuracy and loss of (a) basic and (b) extended SegNet used in this study.
Fig. 7. Categorical accuracy values of (a) the basic and (b) extended SegNet for each phase.
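As a reading aid for these accuracy values, the sketch below computes an overall and a per-phase categorical accuracy from a ground-truth label image and a predicted label image. It assumes that the per-phase value is the fraction of ground-truth pixels of a phase that receive the correct label; the exact definition used for Fig. 7 is not given here, and the function names and the tiny label arrays are hypothetical.

```python
import numpy as np


def categorical_accuracy(y_true, y_pred):
    """Fraction of pixels whose predicted phase equals the ground-truth phase."""
    return float(np.mean(y_true == y_pred))


def per_phase_accuracy(y_true, y_pred, n_phases):
    """For each phase k: fraction of ground-truth pixels of phase k labeled as k."""
    acc = {}
    for k in range(n_phases):
        mask = y_true == k
        acc[k] = float(np.mean(y_pred[mask] == k)) if mask.any() else float("nan")
    return acc


# Hypothetical 2x4 label images with three phases (0, 1, 2).
y_true = np.array([[0, 0, 1, 1],
                   [1, 1, 2, 2]])
y_pred = np.array([[0, 1, 1, 1],
                   [1, 1, 2, 0]])
overall = categorical_accuracy(y_true, y_pred)          # 6 of 8 pixels match
by_phase = per_phase_accuracy(y_true, y_pred, n_phases=3)
```

Because the overall value is averaged over all pixels, abundant phases dominate it, so a high overall accuracy can coexist with a much lower accuracy for a sparse phase.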
the categorical accuracy values of all phases after 10000 epochs, which are more than 90% (Fig. 7(b) and Table 1).
Although the overall categorical accuracy for the training step in the last epoch is 99%, the network does not
perform well for the validation and test images. The extended SegNet, however, segmented the testing dataset with
95, 98, 64, 73 and 88% categorical accuracy for Phi, Qtz, OM, K-Fld, and Zrc, respectively, with an overall
categorical accuracy of 96%.

According to these results, the extended SegNet is considered the more reliable network. Fig. 8 illustrates the
gray-scale image, the ground truth and the basic and extended SegNet results for four unseen images. It is clear
that the extended SegNet has been successful in both training and image segmentation, whereas the basic SegNet
introduces a large amount of noise in the segmented images.

A key point that makes CNN-based segmentation superior to conventional methods is that it is based on small
convolutional kernels combined in a deep architecture, which enables it to consider both color and texture
simultaneously. To demonstrate this capability, the results of multiphase segmentation of an image using the
conventional and the CNN methods are compared. The multi-thresholding method consists of the following steps:

1. Because of the artifacts associated with image acquisition and reconstruction, the image was smoothed using a
median filter with a size of two pixels.
2. The gradient image was computed to identify the boundaries between the minerals. The transition areas between
separate phases are those with a high gradient magnitude. Such areas are then masked as the transition regions.
3. Outside the masked transition area, multiphase segmentation is implemented by manually choosing four thresholds
to obtain five different phases based on the color contrast of the minerals.
4. Finally, each phase is extended into the transition area using a watershed algorithm.
Fig. 9. (a) Real, (b) ground truth, (c) SegNet and (d) multiphase thresholding segmentation.
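The multi-thresholding baseline compared in Fig. 9 (median smoothing, gradient masking, manual thresholds, expansion into the transition area) can be sketched in NumPy as follows. This is only an approximation under stated assumptions: a 3 × 3 median window, a gradient quantile to define the transition mask, and hypothetical threshold values. In particular, the last step uses a simple neighbor-growing fill as a stand-in for the watershed algorithm, not a watershed itself.

```python
import numpy as np


def median_filter3(img):
    """3x3 median filter with edge padding (the window size is an assumption)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    stack = [p[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(stack), axis=0)


def multi_threshold(img, thresholds, grad_quantile=0.8, max_iter=1000):
    # Step 1: smooth the image to suppress acquisition/reconstruction artifacts.
    smooth = median_filter3(img.astype(float))
    # Step 2: mask high-gradient pixels as transition regions.
    gy, gx = np.gradient(smooth)
    grad = np.hypot(gx, gy)
    transition = grad > np.quantile(grad, grad_quantile)
    # Step 3: n thresholds give n + 1 phases, labeled 1..n+1; 0 marks unassigned pixels.
    labels = np.digitize(smooth, thresholds) + 1
    labels[transition] = 0
    # Step 4 (stand-in for watershed): grow labels into the transition area from the
    # four nearest neighbors; ties are resolved by taking the largest neighbor label.
    for _ in range(max_iter):
        empty = labels == 0
        if not empty.any():
            break
        p = np.pad(labels, 1, mode="constant")
        neighbours = np.stack([p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:]])
        labels = np.where(empty, neighbours.max(axis=0), labels)
    return labels


# Hypothetical image: five vertical bands of constant intensity, 8 pixels wide each.
img = np.repeat(np.array([10, 60, 110, 160, 210]), 8)[None, :].repeat(16, axis=0)
thresholds = [35, 85, 135, 185]   # four hypothetical manual thresholds -> five phases
labels = multi_threshold(img, thresholds)
```

On this idealized image every band has its own intensity, so thresholding recovers all five phases. When two minerals share the same intensity range, as described in Section 4.1, no choice of thresholds can separate them, which is the limitation the CNN addresses by also using texture.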
segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528. https://doi.org/10.1109/ICCV.2015.178.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-net: convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, Cham, pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28.
Sezgin, M., 2004. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imag. 13, 146–168.
Shelhamer, E., Long, J., Darrell, T., 2017. Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 640–651. https://doi.org/10.1109/TPAMI.2016.2572683.
Srisutthiyakorn, N., 2016. Deep-learning methods for predicting permeability from 2D/3D binary-segmented images. In: SEG Technical Program Expanded Abstracts 2016. Society of Exploration Geophysicists, pp. 3042–3046. https://doi.org/10.1190/segam2016-13972613.1.
Tahmasebi, P., 2017. HYPPS: a hybrid geostatistical modeling algorithm for subsurface modeling. Water Resour. Res. 53, 5980–5997. https://doi.org/10.1002/2017WR021078.
Tahmasebi, P., 2018a. Accurate modeling and evaluation of microstructures in complex materials. Phys. Rev. E 97, 023307. https://doi.org/10.1103/PhysRevE.97.023307.
Tahmasebi, P., 2018b. Nanoscale and multiresolution models for shale samples. Fuel 217, 218–225. https://doi.org/10.1016/j.fuel.2017.12.107.
Tahmasebi, P., 2018c. Packing of discrete and irregular particles. Comput. Geotech 100, 52–61. https://doi.org/10.1016/J.COMPGEO.2018.03.011.
Tahmasebi, P., Javadpour, F., Sahimi, M., 2017. Data mining and machine learning for identifying sweet spots in shale reservoirs. Expert Syst. Appl. 88, 435–447.
Tahmasebi, P., Kamrava, S., 2018. Rapid multiscale modeling of flow in porous media. Phys. Rev. E 98, 052901. https://doi.org/10.1103/PhysRevE.98.052901.
Tahmasebi, P., Sahimi, M., Shirangi, M.G., 2018. Rapid Learning-Based and Geologically Consistent History Matching. Transp. Porous Media. https://doi.org/10.1007/s11242-018-1005-6.
Wallach, I., Dzamba, M., Heifets, A., 2015. AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-Based Drug Discovery.
Zhang, P.Y., Sun, J.M., Jiang, Y.J., Gao, J.S., 2017a. Deep Learning Method for Lithology Identification from Borehole Images. https://doi.org/10.3997/2214-4609.201700945.
Zhang, Y., Chan, W., Jaitly, N., 2017b. Very deep convolutional networks for end-to-end speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 4845–4849. https://doi.org/10.1109/ICASSP.2017.7953077.
Zhao, W., 2017. Research on the deep learning of the small sample data based on transfer learning. In: AIP Conference Proceedings. AIP Publishing LLC, 020018. https://doi.org/10.1063/1.4992835.