A Machine Learning Approach for Fracture Density Estimation Using Conventional Logs
ABSTRACT
Fracture density estimation is a critical aspect of reservoir characterization, particularly in low-
permeability rock formations. Accurate fracture density estimates can provide valuable insights into the
heterogeneity, permeability, and potential hydrocarbon recovery of a reservoir. They can also help optimize
production and develop robust geological models. Traditional methods for estimating fracture density
include pressure analysis during well testing and observation of core samples. However, these methods
have several limitations, including uncertainty in estimates, sensitivity to reservoir conditions, cost, and
operational complexity, which can affect their implementation.
In recent years, data-driven methods for fracture density estimation in conventional reservoirs have
gained attention. However, these methods are typically built on small datasets, which restricts their
representativeness and applicability. To address these limitations, this paper applies TabNet, an
attention-based deep learning architecture, to fracture density estimation using logging data. The proposed
model utilizes a public dataset from Volve Field on the Norwegian continental shelf, which includes
geophysical parameters such as photoelectric effect, gamma-ray, spontaneous potential, resistivity,
neutron-porosity, bulk density, and acoustic logs. The TabNet model outperforms the other machine learning
models evaluated, including a boosted decision tree model, offering narrow error margins and
interpretable feature attributions that provide insights into the global model behavior.
Using metrics such as the mean squared error (MSE) and coefficient of determination (R²), the
intelligent model achieved high predictability, with an R² score of 0.8970 and an MSE of 0.0897
on the test set. This performance suggests that the proposed model can be used to estimate fracture density
and create local models for predicting the spatial distribution of permeability in heterogeneous reservoirs
when reservoir samples are unavailable but logging data are available. Overall, the approach outlined in
this paper offers a robust and interpretable alternative that addresses the limitations associated with
traditional methods and previous data-driven approaches.
INTRODUCTION
Petroleum reservoirs are essential resources for the energy industry. Unconventional reservoirs represent
a type of petroleum reservoir that poses unique challenges for their production. Compared to conventional
reservoirs, unconventional reservoirs contain hydrocarbons trapped in less permeable and porous rock
formations, making it difficult to extract these resources. Overcoming this challenge requires specialized
techniques, such as hydraulic fracturing or "fracking". Fracking involves pumping specially engineered
fluids into the reservoir at high pressure and rate to create fractures that facilitate hydrocarbon flow.
However, the success of fracking depends on the number and distribution of fractures created, which is
influenced by the initial fracture density of the reservoir. In this regard, natural (pre-existing) fractures play
a crucial role in the initiation and development of induced fractures in unconventional reservoirs.
In the context of a petroleum reservoir, a fracture is a natural or induced opening or discontinuity in the
subsurface rock formations that can serve as a pathway for fluid flow. These openings occur due to a variety
of geological processes, including tectonic activity, natural dissolution, and mineral precipitation.
Reservoir fracture density refers to the number of fractures per unit length, area, or volume of the rock
formation. This parameter is critical in determining the effectiveness of fracking and other stimulation
techniques used to enhance oil and gas recovery. Fractures can have both positive and negative effects on
flow rates. They can provide additional channels
for fluid flow, and their orientation may control the direction of fluid flow, particularly in low porosity and
permeability carbonates. However, when filled with clay or shale, they can function as barriers to fluid
flow. Natural fracture systems have a significant impact on production flow patterns, artificial stimulation,
cementing and completion procedures, as well as the trajectory and quality of the wellbore during drilling
operations. Therefore, an understanding of fracture characteristics and distribution is critical.
There are several methods used to identify and characterize fractures across reservoir intervals. Direct
investigation methods such as core analysis and image log interpretation are not usually applied in all drilled
wells due to limitations of cost and availability. In the absence of these data, indirect methods are often
utilized for fracture characterization. The indirect techniques for the identification of fractured zones
include petrophysical logging, mud loss analysis, well testing, and seismic imaging. Petrophysical logs
indicate one-dimensional property variation with depth in the subsurface. These logs measure various
properties of the rock and fluids in the wellbore, such as porosity, natural radiation, resistivity, acoustic
velocity, and neutron capture, amongst others. However, it's important to note that there may not be a direct
causal relationship between the parameter that a certain log measures and the distribution of fractures. Thus,
the interpretation of well logs for fracture identification and characterization requires expertise and
experience. It is necessary to integrate multiple types of logs and other data sources to obtain a
comprehensive understanding of the fracture system within reservoir intervals. In this paper, we employ
machine learning techniques to investigate the feasibility of predicting fracture densities across reservoir
intervals using petrophysical data.
LITERATURE REVIEW
Numerous techniques have been developed to identify fractures, including petrophysical logging,
mud loss analysis, core analysis, well testing, and seismic imaging. Many of these methods are
expensive and time-consuming. Therefore, it would be advantageous to use approaches such as machine
learning to obtain reliable results while saving time and money. Machine learning (ML) is a discipline of
artificial intelligence that employs algorithms and statistical models to evaluate and draw inferences from
data patterns, allowing systems to learn and adapt without the need for explicit programming. Over the past
few decades, several data-driven methodologies have been developed to estimate fracture density in
reservoir rocks directly from geophysical logs.
Sjøgren et al. (1979) and Idziak (1988) used least-squares regression to fit measured field parameters,
namely seismic velocity and fracture density. The derived equations could then be used to invert seismic
velocities for fracture density. Boadu (1998) proposed an alternate approach for
the inversion of seismic velocities (compressional and shear waves) into fracture density (FVDC) using
artificial neural networks (ANNs). Ince (2004) also presented an ANN technique to predict concrete
fracturing using 40 data records. Later, Sarkheil et al. (2009) showed that fracture densities could be
obtained from image logs (FMI) and core measurements using nonlinear models and forecasting systems.
Their findings indicated a strong correlation between observed and predicted fracture densities. Ja’fari et
al. (2012) also suggested a predictive model using adaptive neuro-fuzzy inference systems (ANFIS) and
conventional well logs.
Zazoun (2013) obtained satisfactory results by employing an artificial neural network (ANN) with six
inputs, namely core depth, gamma ray (GR), sonic interval transit time (DT), caliper, neutron porosity
(NPHI), and bulk density (RHOB), to predict FVDC. The four-layered ANN model, trained with a conjugate
gradient descent algorithm, gave the best fracture-density prediction performance, with an R² value of
0.812. Nouri-Taleghani et al. (2015) devised a hybrid model combining three models, multi-layer perceptron
(MLP), radial basis function (RBF), and least-squares support vector machine (LSSVM), while Eze and Li
(2018) presented a model for predicting FVDC using support-vector machines (SVM) with genetic algorithms
(GA) based on acoustic properties in carbonate reservoirs. In the same year, Bhattacharya and Mishra (2018)
obtained accuracies of 74.8% and 79.6% using random forest (RF) and Bayesian Network (BN) algorithms.
More recently, Rajabi et al. (2021) presented a novel machine learning approach using optimizer
algorithms for FVDC prediction.
It is important to note that these solutions have limitations concerning the size of their datasets,
interpretability, and the inability of their proposed models to account for outlier petrophysical readings
resulting from poor geology. This study proposes a distinct workflow to address these challenges. Its
objective is to investigate the feasibility of using optimizer and attention-based models to estimate and
characterize fracture density, utilizing conventional well logs.
RESEARCH METHODOLOGY
Data Acquisition
The dataset used in this study consists of petrophysical data from an offshore field on the Norwegian
continental shelf (NCS) operated by Equinor. Three wells (15/9-15, 16/1-6A, and 16/10-3) from the study
area were analyzed and used to build the workflow presented in this paper. The dataset consists of 15,174
well log records: well 15/9-15 contributed 9,531 data points over a depth range of 1,149.648 to 3,192.68 m,
well 16/1-6A contributed 3,104 data points over a depth range of 1,199.4 to 1,725 m, and well 16/10-3
contributed 2,539 data points over a depth range of 2,358 to 2,839.6 m. The dataset comprises the
borehole diameter and several logging measurements, namely the bulk density log, neutron porosity log,
medium resistivity, deep resistivity, compressional wave sonic log, photoelectric factor log, and
natural radioactivity (gamma ray, GR), taken at intervals of 0.3 m. These geophysical parameters are
intrinsically linked, as they reflect the physical properties of the rock layer. Table 1 provides a summary
of this dataset's features.
Table 1 — Dataset Summary
Relative to the fracture density, the compressional sonic log, density-derived porosity, and resistivity
showed an appreciable correlation. This could be because wave velocity is sensitive to changes in rock
density, porosity, and fractures: higher wave velocities are typically associated with less fractured rocks.
Resistivity measurements are sensitive to the presence of fluids, which can fill or occupy fractures. Higher
resistivity values are associated with less fractured rocks, as conductive fluids in fractures lower the
resistivity of the rock. No significant correlation is observed between the gamma ray, neutron-porosity, or
density logs and fracture density. We are not aware of any published study that suggests a reason
for such behaviour.
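The kind of screening described above can be sketched as a Pearson correlation between each log and the fracture density. This is only an illustration: the helper `pearson_r`, the log names used as dictionary keys, and all numeric values are assumptions made for the example and are not taken from the study's dataset or code.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Synthetic, illustrative values only (NOT from the study's dataset).
logs = {
    "DT": [60.0, 65.0, 70.0, 80.0, 90.0],  # compressional sonic transit time
    "GR": [40.0, 85.0, 55.0, 70.0, 45.0],  # gamma ray
}
fracture_density = [0.1, 0.2, 0.3, 0.5, 0.7]

# Correlate every log with the target and rank by absolute strength.
correlations = {name: pearson_r(values, fracture_density) for name, values in logs.items()}
for name, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: r = {r:+.3f}")
```

A Pearson coefficient only captures linear association, so a weak value for a log does not rule out a nonlinear relationship that a model could still exploit.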
Model Development
The aim of this study is to develop a robust data-driven approach for predicting fracture densities,
utilizing a range of machine learning models. To achieve this objective, a systematic workflow was devised
and implemented to ensure the proper preparation of each model. This section presents a discussion of the
machine learning algorithms explored in this paper. While not comprehensive, the discussion aims to offer
a basic understanding of the underlying mechanisms that influence the results we will analyze later on.
• Recursive Least Squares: Recursive Least Squares (RLS) is a method used in machine
learning to estimate the parameters of a linear regression model. It works by processing data
samples one by one and continuously updating the model's parameters based on the latest data.
At the start of the algorithm, the model's parameters are initialized to some initial values. As
new data samples are fed into the algorithm, the model's parameters are updated to minimize
the difference between the predicted values and the actual target values. This update is
performed recursively, with the latest data sample being used to adjust the model's parameters.
The RLS algorithm uses a recursive formula to compute the model's parameters at each step,
which allows it to be computationally efficient and suitable for real-time applications. The
algorithm also includes a forgetting factor, which determines the weight given to past data
samples. This allows the model to adapt to changes in the data distribution over time. The
parameters explored during the fine-tuning process were the regularization and forgetting factor.
The model was then applied to both the training and test datasets.
• Extreme Gradient Boosting: Extreme Gradient Boosting, commonly referred to as XGBoost,
is a machine learning algorithm that works by iteratively learning decision trees and combining
them into a strong model (Chen & Guestrin, 2016). The algorithm starts by fitting a simple
decision tree to the data, which makes predictions based on a set of rules about the input features.
It then calculates the errors between the predictions and the actual target values, and fits a new
tree to these errors in the next iteration. This process is repeated multiple times,
with each new tree added to the ensemble to correct the errors of the previous trees and reduce the
overall error. During the learning process, XGBoost uses the gradient of the
loss function to decide how each new tree is built, so that the overall loss is minimized.
This approach results in an accurate and efficient model that can handle
large and complex datasets. The regression implementation of XGBoost has hyperparameters
to control the growth of decision trees (e.g., maximum depth, minimum samples per leaf) and
to manage ensemble training (e.g., number of trees).
• TabNet: TabNet is a single deep learning model based on sequential multi-step processing. This
machine learning algorithm uses an attention mechanism to learn which features in the input
data are most important for making accurate predictions, enabling interpretability and more
efficient learning, as the learning capacity is used for the most salient features (Arik & Pfister,
2021). Rather than building explicit decision trees, TabNet mimics tree-like feature selection:
at each decision step, a learned sparse mask selects a subset of the input features, and the
outputs of the decision steps are aggregated to make predictions on new data.
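The recursive update described in the Recursive Least Squares bullet can be sketched as follows. This is a minimal illustration that assumes the textbook RLS equations with a forgetting factor; the function name `rls_fit`, the parameter `delta`, and the synthetic data are hypothetical and do not reproduce the implementation used in this study.

```python
def rls_fit(samples, targets, forgetting=0.99, delta=1e-2):
    """Fit linear weights with recursive least squares, one sample at a time.

    forgetting: forgetting factor in (0, 1]; values below 1 down-weight old samples.
    delta: regularization; the matrix P (an estimate of the inverse of the
           weighted input correlation matrix) is initialized to I / delta.
    """
    n = len(samples[0])
    w = [0.0] * n
    P = [[(1.0 / delta if i == j else 0.0) for j in range(n)] for i in range(n)]
    for x, y in zip(samples, targets):
        # Gain vector: k = P x / (forgetting + x^T P x)
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        # Prediction error with the current weights, then weight update.
        error = y - sum(w[i] * x[i] for i in range(n))
        w = [w[i] + gain[i] * error for i in range(n)]
        # P <- (P - k x^T P) / forgetting; P is symmetric, so x^T P = (P x)^T.
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return w

# Synthetic check: y = 2*x1 - 1*x2 + 0.5 (the last feature is a constant bias term).
X = [[float(a), float(b), 1.0] for a in range(5) for b in range(5)]
y = [2.0 * a - 1.0 * b + 0.5 for a, b, _ in X]
weights = rls_fit(X, y, forgetting=1.0, delta=1e-6)
print([round(v, 3) for v in weights])
```

With `forgetting=1.0` all samples are weighted equally and the result approaches the ridge-regularized batch least-squares solution; lowering the factor lets the estimate track drifting data at the cost of noisier weights.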
Model Performance Metrics
To ensure that each model generalizes, a cross-validation strategy was implemented. Cross-validation
is a resampling technique used to assess the performance of machine learning models. It involves training
multiple models on various subsets of the available input data and evaluating them on the remaining subset.
This method optimizes the use of available data and provides a more reliable indication of the model's
ability to generalize or perform well on unseen data. The mean squared error (MSE) is used in evaluating
each model’s performance. It quantitatively measures the average of the squared differences between the
predicted values and the actual target values. Higher MSE values indicate larger errors or greater deviation
of model predictions from the target values. In contrast, lower MSE values indicate a smaller margin of
error between the model predictions and the target values. A mathematical representation of this quantity
is provided in Eq. 1.
\[ \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \qquad (1) \]
where $N$ is the number of samples, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value of the $i$-th sample.
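As an illustration of how cross-validation and the MSE of Eq. 1 fit together, the sketch below scores a simple stand-in regressor with k-fold cross-validation. The number of folds, the helper names (`k_fold_indices`, `fit_line`), and the synthetic data are assumptions made for the example; the study does not state these details here.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error as defined in Eq. 1."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def fit_line(x, y):
    """Ordinary least squares for y = a*x + b (a stand-in for any regressor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
             / sum((xi - mean_x) ** 2 for xi in x))
    return slope, mean_y - slope * mean_x

# Synthetic one-feature data with a small deterministic perturbation (illustrative only).
x = [i / 10 for i in range(50)]
y = [0.3 * xi + 0.1 + 0.05 * ((i % 5) - 2) for i, xi in enumerate(x)]

fold_scores = []
for train_idx, test_idx in k_fold_indices(len(x), k=5):
    slope, intercept = fit_line([x[j] for j in train_idx], [y[j] for j in train_idx])
    predictions = [slope * x[j] + intercept for j in test_idx]
    fold_scores.append(mse([y[j] for j in test_idx], predictions))

cv_mse = sum(fold_scores) / len(fold_scores)
print(f"cross-validated MSE: {cv_mse:.4f}")
```

Each sample is used for testing exactly once, so the averaged score reflects performance on data the model did not see during fitting.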
RESULTS
After developing and evaluating the models with predefined portions of the dataset, the predicted results
were compared with the actual fracture density readings from the field using the performance metrics. The
models were compared using the MSE metric to select the best one. In general, the lower the mean squared
error, the better the fit. The R² score is also an important index for verifying the accuracy of the
predictions of a regression algorithm; it is at most 1, and values closer to 1 indicate a better fit.
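For reference, the R² score can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes this standard definition; the function name `r2_score` and the numbers are illustrative only. Under this definition a model that fits worse than simply predicting the mean of the targets gives a value below zero.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative values only (not results from the study).
observed = [0.2, 0.4, 0.6, 0.8]
perfect = list(observed)            # exact predictions
mean_only = [0.5] * len(observed)   # always predicts the mean of the targets
reversed_trend = [0.8, 0.6, 0.4, 0.2]

for label, predicted in [("perfect", perfect),
                         ("mean only", mean_only),
                         ("reversed trend", reversed_trend)]:
    print(f"{label}: R2 = {r2_score(observed, predicted):.3f}")
```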
(Comparing the three models developed in this study. The highest R² and lowest MSE are marked with
bold font. The best results are given by the TabNet model).
Table 2 and Fig. 2 show the comparative results of the proposed TabNet model and the two other
independent models, confirming that it outperforms the others with the highest R² (0.8970) and the
lowest MSE (0.0897).
The results show that linear or boosting models may not be sufficient to capture all the
complexities in the data. In the case of the TabNet model, overfitting is kept low.
Overfitting is an undesirable machine learning behaviour that occurs when a model
gives accurate predictions for training data but not for new data. Thus, by aggregating the outputs of
multiple decision steps through its sequential attention mechanism, the TabNet model achieves the
highest accuracy of the three models.
Variable Importance Analysis
In this workflow, we employ variable importance analysis to measure the contribution that
each feature makes to the model's overall prediction performance. This helps to identify which features
the model finds crucial for estimating fracture densities, uncovering insights into the underlying
patterns and relationships between the input variables and the predicted output. The proposed model
uses multiple features, and each contributes distinctively to the prediction outcome. Understanding
the impact of each feature on the model is crucial in comprehending the model's behaviour. Therefore,
knowledge of feature importance and feature interaction is necessary for a comprehensive understanding
of our proposed model (Sejong, 2022).
To quantify the role of each feature in our machine learning model, we employed a technique called
permutation importance. We randomly shuffled each input parameter in turn and measured the effect on the
model's prediction accuracy: the larger the resulting loss in accuracy, the more the model relies on that
feature. This approach quantifies the importance of each feature and reveals relationships
between the input variables and the fracture density.
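A minimal sketch of permutation importance is given below. It assumes the MSE as the scoring metric and uses a hypothetical stand-in model (`toy_model`) with synthetic inputs; the function names, the number of repeats, and the data are illustrative and do not reproduce the configuration used in this study.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving every other column untouched.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            permuted_error = mse(targets, [predict(r) for r in permuted])
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical stand-in model: relies strongly on feature 0, weakly on
# feature 1, and ignores feature 2 entirely.
def toy_model(row):
    return 1.0 * row[0] + 0.1 * row[1]

data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
targets = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, targets)
for name, score in zip(["feature_0", "feature_1", "feature_2"], scores):
    print(f"{name}: {score:.5f}")
```

Because the method only needs predictions, it can be applied to any fitted model; note that strongly correlated inputs can share importance and make individual scores harder to interpret.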
Model Comparison
To further evaluate the comparative robustness of the TabNet model, we compare its performance with
models documented in the existing literature. Nouri-Taleghani et al. (2015) documented a comparison of the
performance of the ML models developed in their study. However, the data
used in Zazoun (2013) are not publicly available; hence, we trained the models using the configuration
specified in their study to enable comparison with the models developed in this study.
From the figures above, one can note that the TabNet model performed better than the other models in this
comparison.
CONCLUSION AND RECOMMENDATIONS
Fractures have the potential to improve fluid recovery, but they can also significantly complicate
reservoir flow behaviour. In this work, we estimated the density of natural fractures across reservoir
intervals using different machine learning algorithms, including recursive least squares regression,
Extreme Gradient Boosting, and TabNet. These models were developed using well log data. The
dataset was divided into two portions: 75% of the dataset was used for training, and model performance
was evaluated on a 25% blind test set. Fracture information derived from core samples was also used to
further validate the model's predictions. Performance metrics for the developed models range from 0.0897
to 0.1493 for the mean squared error and from 0.8375 to 0.897 for the coefficient of determination (R² score).
Compared to models documented in previous research studies, TabNet presents an improvement
in accuracy.
The results indicate that petrophysical data obtained from a limited depth interval within a fractured zone
can be used to develop a robust model. Thus, when reservoir samples are unavailable but logging data are
available, the model can be used to estimate fracture density and create local models for predicting the
spatial distribution of permeability in heterogeneous reservoirs. The approach presented in this study can
be replicated and customized for specific reservoirs, enabling the creation of discrete fracture networks and
testing of hydraulic simulation and stimulation techniques.
A major limitation encountered in this study is the high requirement for computational
resources, despite the state-of-the-art performance achieved. The proposed model requires significant
computational resources to train. However, once a training cycle is completed, time and computational
resources are not a hindrance for inference and prediction. Future work can focus on developing
models that can achieve state-of-the-art performance with fewer computational resources.
REFERENCES
Arik, S. Ö., and Pfister, T. 2021. TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI
Conference on Artificial Intelligence. 35(8): 6679-6687.
Bagheri, H., and Reza F. 2022. Fracture permeability estimation utilizing conventional well logs and flow
zone indicator. Petroleum Research. 7(3): 357-365. https://doi.org/10.1016/j.ptlrs.2021.11.004
Bhattacharya, S., and Nikolaou, M. 2016. Comprehensive Optimization Methodology for Stimulation
Design of Low-Permeability Unconventional Gas Reservoirs. SPE Journal. 21(3): 947–964.
https://doi.org/10.2118/147622-PA
Blake, O.O., Faulkner, D.R. and Tatham, D.J. 2019. The role of fractures, effective pressure and loading
on the difference between the static and dynamic Poisson’s ratio and Young’s modulus of Westerly
granite. International Journal of Rock Mechanics and Mining Sciences. 116: 87-98.
https://doi.org/10.1016/j.ijrmms.2019.03.001
Boadu, F.K. 1998. Inversion of fracture density from field seismic velocities using artificial neural
networks. Geophysics. 63(2): 534–545. https://doi.org/10.1190/1.1444354
Chen, T., and Guestrin, C. 2016. XGBoost: A scalable tree boosting system. Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785-794.
Gale, Julia F.W. 2019. Natural Fracture Characterization in the Wolfcamp Formation at the Hydraulic
Fracture Test Site (HFTS), Midland Basin, Texas. Unconventional Resources Technology Conference.
OnePetro. https://doi.org/10.15530/urtec-2019-644
Gale, Julia F.W., Laubach, S. E., Olson, J. E, et al. 2014. Natural fractures in shale: A review and new
observations. AAPG Bulletin. 98(11): 2165–2216. https://doi.org/10.1306/08121413151
Gale, J. F. et al. 2007. Natural fractures in the Barnett Shale and their importance for hydraulic fracture
treatments. AAPG Bulletin. 91(4): 603–622. https://doi.org/10.1306/11010606061
Gamal, M., El-Araby A. A. et al. 2022. Detection and characterization of fractures in the Eocene Thebes
formation using conventional well logs in October field, Gulf of Suez, Egypt. Egyptian Journal of
Petroleum. 31(3): 1-9. https://doi.org/10.1016/j.ejpe.2022.06.001
Idziak, A. 1988. Seismic wave velocity in fractured sedimentary carbonate rocks. Acta
Geophysics. 36: 101-114.
Ince, R. 2004. Prediction of fracture parameters of concrete by artificial neural networks. Engineering
Fracture Mechanics. 71: 2143–2159. https://doi.org/10.1016/j.engfracmech.2003.12.004
Ja'fari, A., Ali K., Yoosef, S. et al. 2012. Fracture density estimation from petrophysical log data using the
adaptive neuro-fuzzy inference system. Journal of Geophysics and Engineering. 9: 105–114.
https://doi.org/10.1088/1742-2132/9/1/013
Nouri-Taleghani, M. et al. 2015. Fracture density determination using a novel hybrid computational scheme:
a case study on an Iranian Marun oil field reservoir. Journal of Geophysics and Engineering. 12(2):
188–198. https://doi.org/10.1088/1742-2132/12/2/188
Eze, P. C., and Lin Y. Hu. 2022. Natural Fracture Presence Prediction in Unconventional Reservoirs Using
Machine Learning and Geostatistical Methods-Workflow and HFTS1 Case. SPE/AAPG/SEG
Unconventional Resources Technology Conference. OnePetro.
Rajabi, M., Beheshtian, S., and Mohamadian, N. 2021. Novel hybrid machine learning optimizer algorithms to
prediction of fracture density by petrophysical data. Journal of Petroleum Exploration and Production
Technology. 11: 4375-4397.
Sarkheil, H., Hassani, H. et al. 2009. The fracture network modeling in naturally fractured reservoirs using
artificial neural network based on image logs and core measurements. Australian Journal of Basic and
Applied Science. 3(4): 3297–3306.
Sejong Oh, 2022. Predictive case-based feature importance and interaction. Information Sciences. 593: 115-
176.
Sjøgren, B., Øfsthus, A., and Sandberg, J. 1979. Seismic classification of rock mass qualities. Geophysical
Prospecting 27(2): 409-442.
Tokhmechi, B., Memarian, H., Noubari, H.A. et al. 2009. A novel approach proposed for fractured zone
detection using petrophysical logs. Journal of Geophysics and Engineering. 6(4): 365-373.
https://doi.org/10.1088/1742-2132/6/4/004
Tokhmechi, B., Memarian, H., Rasouli, V. et al. 2009. Fracture detection from water saturation log data
using a Fourier–wavelet approach. Journal of Petroleum Science and Engineering.
https://doi.org/10.1016/j.petrol.2009.08.005
Zazoun, R.S. 2013. Fracture density estimation from core and conventional well logs data using artificial
neural networks: The Cambro-Ordovician reservoir of Mesdar oil field, Algeria. Journal of African
Earth Sciences. 83: 55–73. https://doi.org/10.1016/j.jafrearsci.2013.03.003