A R T I C L E  I N F O

Keywords:
Rate of penetration
Feature ranking
Random forest
Neural network
Rule extraction
Data mining

A B S T R A C T

Rate of Penetration (ROP) estimation is one of the main factors in drilling optimization and in minimizing the operation costs. However, ROP depends on many parameters, which makes its prediction a complex problem. In the presented study, a novel and reliable computational approach for the prediction of ROP is proposed. Firstly, the fscaret package in the R environment was implemented to find the importance and ranking of the input parameters. According to the feature ranking technique, weight on bit and mud weight had the highest impact on ROP based on their ranges within this dataset. Also, for developing further models, the Cubist method was applied to reduce the input vector from 13 to 6 and 4. Then, Random Forest (RF) and Monotone Multi-Layer Perceptron (MON-MLP) models were applied to predict ROP. The goodness of fit of all models was measured by RMSE and R2 in a 10-fold cross-validation scheme, and both models showed a reliable accuracy. In order to gain a deeper understanding of the relationships between the input parameters and ROP, the MON-MLP model with 6 inputs was used to check the effects of weight on bit, mud weight and viscosity. Finally, the RF model with 4 variables was used to extract the most important rules from the dataset as a transparent model.
* Corresponding author.
E-mail addresses: [email protected] (S. Eskandarian), [email protected] (P. Bahrami), [email protected] (P. Kazemi).
http://dx.doi.org/10.1016/j.petrol.2017.06.039
Received 17 February 2017; Received in revised form 7 June 2017; Accepted 15 June 2017
Available online 16 June 2017
0920-4105/© 2017 Elsevier B.V. All rights reserved.
S. Eskandarian et al. Journal of Petroleum Science and Engineering 156 (2017) 605–615
2.4. Decision trees

One of the fast and efficient methods for data modeling is decision tree learning, which is suitable for high-dimensional data. In these tree-structured models, construction starts at the root node, which splits into children nodes. Splitting of nodes continues until they end at leaves. Each interior node corresponds to an attribute, each leaf node represents a value of the output variable, and each branch represents a conjunction of features that leads to those values. Splitting of the training set into smaller subsets is done recursively based on a greedy algorithm (Quinlan, 1986), and each node is evaluated by an impurity function which decreases as new nodes are generated (Breiman et al., 2000). Children nodes are more homogeneous, and have lower impurity, than their parent nodes. The splitting stops when the impurity of the children nodes no longer decreases, and finally a tree is formed.

Although decision trees are easy to interpret, they have some disadvantages which limit their usage. Overfitting is one of the most important issues that reduce the accuracy of the models, especially in larger trees. Also, a small perturbation in the training data can result in severe changes in the obtained trees. To overcome the problems associated with decision tree models, it is recommended to use ensembles of trees such as Random Forest (RF) and Cubist.

2.4.1. Random forest

Breiman (2001) first introduced RF. It is an ensemble of many unpruned decision trees which are grown sequentially, instead of only one. RF applies a bootstrap sampling technique that randomly selects a group of the dataset with replacement, and uses the remaining samples for testing each tree. This process repeats for all trees, and the random sampling ensures diversity among the ensemble of trees and enhances the predictions. At the end, an internal cross-validation averages the errors over all trees.

This method randomly picks a small subset of size m out of M features for each node to train a tree (m < M). The size m is kept constant while the forest grows, but the features are changed for each node. As a result, the best split for each node is chosen among the m features rather than among all M features.

There are different parameters which RF uses to grow the trees: the number of trees in the forest (ntree), the size of the random subset (mtry) and the maximum number of nodes in the trees (maxnode). In practice, ntree and mtry should be optimized to find the best accuracy in the ensemble of trees. When features are replaced by random noise, the importance of each variable can also be quantified from the change in model accuracy. Hence, importance determination of features is another advantage of implementing RF.

Table 2
Controlling parameters for RF models.

Parameter   RF model (4 inputs)   RF model (6 inputs)
mtry        4                     6
ntree       20                    20
maxnode     200                   800

2.4.2. Cubist

Cubist is a rule-based predictive model, first introduced by Quinlan as an extension of the earlier M5 model (Quinlan, 1992). It constructs a type of regression tree in which the predictions at the terminal nodes are based on linear regressions rather than discrete values. The final Cubist model is a set of rules (paths from root to leaves), where each rule corresponds to a multivariate linear expression. In other words, if all conditions within a rule are met, the linear expression can be applied to predict the target value. Fig. 2 shows an example of one Cubist tree with 6 Linear Models (LM).

Cubist reduces the rules via pruning and/or combining with nearest-neighbor rules in order to decrease the absolute error and to stop the rules from overlapping (Quinlan, 1992). This pruning/combining procedure makes the model more robust to outliers and prevents it from overfitting.

2.4.3. MON-MLP

Artificial Neural Network (ANN) is one of the methods for data processing. Generally, an ANN consists of three main layers, called the input, hidden and output layers. There are some controlling parameters in a routine ANN which should be optimized to build a precise predictive model, such as the number of hidden layers, the number of neurons in each layer, the transfer functions, etc. (Bashiri and Farshbaf Geranmayeh, 2011). Hence, a challenging step in ANN modeling is tuning these parameters, which is mostly done by trial and error. In an ANN, the signals (inputs) arriving at each neuron in the hidden layers are multiplied by the adjusted weight elements (Wip), combined, and then passed through a transfer function to produce the neuron outputs. Fig. 3 illustrates a scheme of a single neuron in ANN modeling. The transfer functions most commonly

Table 3
Summary of fitness error for all models.
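The RF construction described in Section 2.4.1 (bootstrap samples, a random feature subset at each split, and the ntree/mtry/maxnode controls of Table 2) can be sketched in Python with scikit-learn's RandomForestRegressor, whose n_estimators, max_features and max_leaf_nodes parameters play roughly the roles of ntree, mtry and maxnode. The drilling dataset itself is not available here, so synthetic inputs stand in for WOB, MW, incline and azimuth; this is an illustrative analogue, not the authors' R implementation.

```python
# Sketch only: scikit-learn analogue of the RF setup in Section 2.4.1 / Table 2.
# n_estimators ~ ntree, max_features ~ mtry, max_leaf_nodes ~ maxnode.
# The synthetic data below merely stand in for the (unavailable) drilling dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))  # stand-ins for WOB, MW, incline, azimuth
y = 5.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)  # toy "ROP"

rf = RandomForestRegressor(
    n_estimators=20,      # ntree = 20 (Table 2)
    max_features=4,       # mtry = 4 (4-variable model)
    max_leaf_nodes=200,   # maxnode = 200
    bootstrap=True,       # sample the training set with replacement per tree
    oob_score=True,       # out-of-bag samples act as the internal test set
    random_state=0,
)
rf.fit(X, y)

print("OOB R2:", round(rf.oob_score_, 3))
print("impurity-based importances:", np.round(rf.feature_importances_, 3))
```

The out-of-bag score here plays the role of the internal testing on held-out bootstrap samples described above.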
Table 4
General rules extracted from RF with 4 input variables.

No.  Length  Error  Condition                                                        ROP level  Importance
1    3       0.328  12.5 < WOB & Azimuth ≤ 280.3 & MW ≤ 10.598,15                    Medium     1
2    2       0.333  14.5 < WOB & 233.6 < Azimuth                                     Medium     0.509
3    5       0.2    WOB ≤ 12.5 & 33.5 < Incline & 71.2 < Azimuth ≤ 206 & MW ≤ 11.81  Medium     0.366
4    3       0      13.5 < WOB & Incline ≤ 48.65 & 280.3 < Azimuth                   Medium     0.277
5    3       0.103  Incline ≤ 41.5 & 196.6 < Azimuth ≤ 225                           Low        0.125
6    2       0.222  18.5 < WOB & 44.5 < Incline                                      High       0.118
7    1       0      55 < Incline                                                     High       0.101
8    2       0.1    42.5 < Incline ≤ 44.1                                            Medium     0.089
9    3       0      WOB ≤ 12.5 & 9.89 < MW ≤ 11.47                                   Low        0.084
10   2       0      Incline ≤ 36.5 & Azimuth ≤ 190.6                                 Low        0.06
11   3       0      15.5 < WOB & 224.7 < Azimuth & MW ≤ 10.222,625                   Medium     0.048
12   3       0.286  14.5 < WOB & 45.8 < Incline & 10.22 < MW                         High       0.047
13   3       0      18.5 < WOB & 44.5 < Incline & Azimuth ≤ 224.7                    High       0.045
14   4       0.444  14.5 < WOB ≤ 22.5 & 45.5 < Incline ≤ 55                          Medium     0.029
15   2       0.25   20 < WOB & Incline ≤ 42.5                                        Low        0.026
16   2       0.135  Incline ≤ 43.5 & 10.14 < MW                                      Low        0.022
17   4       0.125  12.5 < WOB & 63.7 < Azimuth ≤ 73 & 9.39 < MW                     Medium     0.015
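Reading Table 4 is mechanical: a record receives the ROP level of the first rule whose conditions it satisfies. A minimal Python sketch, using the thresholds of rules 6, 7 and 15 from the table (the short rule list and the "Medium" fallback are simplifications for illustration, not the full inTrees output):

```python
# Illustrative only: applying a few of the Table 4 rules by hand.
# A record is assigned the ROP level of the first matching rule; "Medium" is
# used here as a stand-in default when none of the listed rules fires.

# Each rule: (condition function, predicted ROP level)
rules = [
    (lambda r: r["WOB"] > 18.5 and r["Incline"] > 44.5, "High"),  # rule 6
    (lambda r: r["Incline"] > 55, "High"),                        # rule 7
    (lambda r: r["WOB"] > 20 and r["Incline"] <= 42.5, "Low"),    # rule 15
]

def rop_level(record, default="Medium"):
    """Return the ROP level of the first rule that matches, else a default."""
    for cond, level in rules:
        if cond(record):
            return level
    return default

sample = {"WOB": 21.0, "Incline": 50.0}
print(rop_level(sample))  # rule 6 fires -> "High"
```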
employed in ANN are the hyperbolic tangent sigmoid (TANSIG), the logarithm sigmoid (LOGSIG) and the pure linear (PURELIN) functions, presented in Equations (4)–(6):

TANSIG = \frac{2}{1 + \exp(-n_p)} - 1   (4)

LOGSIG = \frac{1}{1 + \exp(-n_p)}   (5)

PURELIN = n_p   (6)

n_p = \sum_{i=1}^{N} X_i W_{ip} + b_p   (7)

where X_i is the inputs, b_p is the bias and n_p is the transfer function input.

In this study, the "monmlp.fit" function of the "monmlp" package in the R environment was used to implement the Monotone Multi-Layer Perceptron (MON-MLP) neural network (Cannon, 2014). This function builds an ensemble of MLP models, and it uses bootstrap aggregation (bagging) to help avoid overfitting. The final result was an average of the RMSE and R2 of the ensemble of models. All of the generated models had two hidden layers with 2–20 nodes per layer. The transfer functions TANSIG and PURELIN were used for the hidden and output layers, respectively. Fig. 4 illustrates a schematic of the MON-MLP model. Also, the other parameters were as follows:

- The ensemble was a group of 10 models.
- Iteration numbers for training the models were set to 10, 50, 80, 100, 500, 1000 and 2000.
- The "trial" parameter, as a multi-start technique, was set to 5 to avoid local minima.

The workflow of this study, along with the results of feature ranking, is demonstrated in Fig. 5, and consists of 3 main phases: data preparation, selection of important variables and modeling.

3. Results and discussion

As an initial effort, linear regression was applied to verify the complexity of the system and to find out whether machine learning approaches can improve the predictions. Fig. 6 shows the estimated results based on the linear regression model. From this figure, it can be concluded that the behavior of the system is highly complex (R2 = 0.482), which confirms that further investigations are needed.

3.1. Feature ranking

The initial input vector consisted of 13 independent variables. To perform feature ranking and find the importance of each variable, the fscaret package in the R environment was used. Fig. 7 demonstrates the results of feature ranking created by fscaret.

The selected variables were mostly consistent with the findings in the literature. According to the results, WOB and MW were the first and second most effective parameters on ROP for this dataset, respectively. These two parameters have been investigated in the literature as the main factors affecting the ROP (Akpabio et al., 2015; Ernst et al., 2007; Cheatham and Nahm, 1985). The parameters of incline and azimuth, which are related to the well trajectory, were the next important parameters for this formation type. The importance of incline and azimuth may differ for other formation types, and they may have a negligible effect on ROP. It should be mentioned that the result for variable importance is case specific, and for other reservoirs and formations it may yield different results.

3.2. Machine learning approaches

3.2.1. Cubist

The Cubist model, which is a fast algorithm, was implemented on the dataset to find the best input vector size based on the generalization error (RMSE). Hence, the models were trained on 10CV at various numbers of variables, from the most important to the least important one, based on the results of feature ranking. According to the results, the RMSE reduced from 2.0134 (for all variables) to the lowest amounts of 1.7777 and 1.7987 (for 4 and 6 variables). Finally, input vector sizes of 4 and 6 were chosen for further analyses to find more accurate results with other methods.

3.2.2. Random forest

To find the best RF model, around 3000 models were trained with different controlling parameters (mtry, ntree and maxnode), and finally the ones presented in Table 2 were chosen as the most accurate models.

To evaluate the RF models' performance, R2 and RMSE based on the 10CV approach were used. A summary of the results for all models with input vector sizes of 4 and 6 is presented in Table 3. It can be concluded from this table that the models generated by RF (4 and 6 variables) were well trained, and were more accurate than the ones obtained from the other models, but they had a higher generalization error based on 10CV compared to the MON-MLP models.

Rule extraction is one of the benefits of using RF models in comparison to black-box models such as ANN. It means that the generated trees can be expressed as some general rules which explain the relationships in the dataset, and are very easy to interpret. Therefore, the "inTrees" package in the R environment was applied on the ensemble of trees to extract the rules according to their length, error and importance (Deng, 2014). The extracted rules for the RF model with the input vector size of 4 are shown in Table 4. The output (ROP) was automatically split into three levels, Low (0.183–5.9), Medium (5.9–11.6) and High (11.6–17.3), based on its minimum and maximum.

3.2.3. MON-MLP

It was concluded from Table 3 that the MON-MLP model with 6 variables was the best model in terms of generalization error (R2 and RMSE in 10CV). So, this model was selected for further analysis to see the behavior of the important variables on ROP. Similar to the RF procedure in finding the most accurate model, around 3000 models were trained to select the best MON-MLP model. A summary of these controlling parameters is presented in Table 5. Also, Fig. 8a and b demonstrate the predictions of ROP obtained from the training and 10CV models with MON-MLP.

Table 5
Controlling parameters for MON-MLP models.

A survey was applied on the parameters of WOB, MW and plastic viscosity to see how they affect the ROP. Fig. 9a–c represent the effect of these inputs on ROP. To plot these figures, the variable under study was varied while the other variables were fixed at their midrange values. By analyzing Fig. 9a, it can be seen that up to a limit (flounder point), increasing the WOB enhanced the penetration. If too much weight is applied, ROP decreases, and rapid bit wear may result. Furthermore, an increase in MW generally tends to decrease the ROP; however, a slight growth in ROP was seen at low MW in Fig. 9b. This behavior may happen because of the positive effect of MW on the hydraulic impact force, which improves the ROP (Bourgoyne et al., 1986). At low plastic viscosities, increasing the viscosity improves the transport of cuttings, and therefore raises the ROP. But an excess amount of viscosity inverses the result. In Fig. 9c, because the viscosity range is limited, the inverse behavior was not observed.

Fig. 8. Predicted values vs. actual values for a) training and b) 10-CV model of MON-MLP.
Fig. 9. a–c: Effect of WOB, MW, and plastic viscosity on ROP.
Fig. 10. a–c: Predicted and actual ROPs at different depths for wells 1, 2 and 5.

As a final effort, the predicted ROP was compared to the actual ROP at various depths using the MON-MLP model. Fig. 10a–c show the predicted and actual ROPs vs. depth for wells number 1, 2 and 5. As can be seen, there is a reasonable match between the ROPs at the various depths, which implies the success of the modeling with MON-MLP and of the whole procedure done in this work. Therefore, the achieved MON-MLP model can be used as a reliable method for the estimation of ROP depending on various parameters, and for further analysis within the reservoir.
4. Conclusion

In this work, a comprehensive data mining approach is presented to perform the computational modeling of ROP. Moreover, following this approach and using data mining techniques can help to unfold knowledge from the dataset. Applying the fscaret package in the R environment enabled finding the importance and ranking of all 13 variables. According to the results of feature ranking, WOB and MW had the most impact on the ROP. Furthermore, the Cubist model was implemented to reduce the input vector to 4 and 6 variables, which gave the lowest RMSEs of 1.7777 and 1.7987, respectively. RF and MON-MLP models were then developed on the dataset using the reduced input vectors. Both models showed a reasonable RMSE and R2 based on the 10CV approach. The RF model, as a transparent model unlike other black-box ones, was used to extract general rules from the dataset. So, the RF model with 4 variables (R2 of 0.946 and 0.7653 for the training and test models) was employed, and 17 general rules were extracted. These rules can help to better understand the relationships of the features with ROP. Finally, the MON-MLP model with 6 variables, which was the most accurate model among all (R2 of 0.9472 and 0.8009 for the training and test models), was used to predict the ROP at various depths for 3 wells. Also, the effects of WOB, MW and viscosity on ROP were investigated to see how they behave within the dataset.
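The 10CV evaluation protocol used throughout (RMSE and R2 averaged over 10 folds) can be sketched with scikit-learn; the data and the RF stand-in below are placeholders rather than the paper's dataset or fitted models:

```python
# Sketch of the 10-fold cross-validation scoring used in the paper: RMSE and R2
# averaged over folds. Placeholder data and model, not the paper's dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))
y = 4.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=20, random_state=1)
rmses, r2s = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)  # per-fold RMSE
    r2s.append(r2_score(y[test_idx], pred))                     # per-fold R2

print("10CV RMSE:", round(float(np.mean(rmses)), 3))
print("10CV R2:", round(float(np.mean(r2s)), 3))
```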
Nomenclature
Abbreviations
10CV 10-fold Cross Validation
AI Artificial Intelligence
ANN Artificial Neural Network
Gel 10 s Gel Strength after 10 s
Gel 10 m Gel Strength after 10 min
LM Linear Model
LOGSIG Logarithm Sigmoid Transfer Function
maxnode maximum number of nodes in trees
MW Mud Weight
mtry size of subset randomly sampled
MON-MLP Monotone Multi-Layer Perceptron
ntree number of trees in forest
PURELIN Pure linear Transfer Function
R2 coefficient of determination
RF Random Forest
RMSE Root Mean Squared Error
ROP Rate of Penetration
rpm revolution per minute
TANSIG Hyperbolic Tangent Sigmoid Transfer Function
WOB Weight on Bit
Symbols
bp ANN bias
m random subset size
M number of all parameters
n total number of records
N number of ANN inputs
np transfer function input
p number of ANN layers
Wip adjusted weight of input i for layer p in ANN
Xi neurons input
yact actual value
Yi output of neurons
ym actual values average
ypred predicted value
References

Akpabio, J.U., Inyang, P.N., Iheaka, C.I., 2015. The effect of drilling mud density on penetration rate. Int. Res. J. Eng. Technol. 2, 29–35.
Bahrami, P., Kazemi, P., Mahdavi, S., Ghobadi, H., 2016. A novel approach for modeling and optimization of surfactant/polymer flooding based on genetic programming evolutionary algorithm. Fuel 179, 289–298.
Bashiri, M., Farshbaf Geranmayeh, A., 2011. Tuning the parameters of an artificial neural network using central composite design and genetic algorithm. Sci. Iran. 18, 1600–1608.
Bourgoyne Jr., A.T., Millheim, K.K., Chenevert, M.E., Young Jr., F.S., 1986. Applied Drilling Engineering. SPE Textbook Series, vol. 2, Chapter 4, p. 131.
Bourgoyne Jr., A.T., Young, F.S., 1974. A multiple regression approach to optimal drilling and abnormal pressure detection. SPE J. 14, 371–384.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 2000. Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL.
Cannon, A.J. monmlp: Monotone Multi-layer Perceptron Neural Network. Available from: http://cran.r-project.org/web/packages/monmlp/index.html (Accessed 1 June 2014).
Cheatham, C.A., Nahm, J.J., 1985. Effects of selected mud properties on rate of penetration in full-scale shale drilling simulations. SPE-13465-MS, SPE/IADC Drilling Conference, 5–8 March, New Orleans, Louisiana.
Deng, H., 2014. Interpreting tree ensembles with inTrees. arXiv:1408.5456.
Edalatkhah, S., Rasoul, R., Hashemi, A., 2010. Bit selection optimization using artificial intelligence systems. Petrol. Sci. Technol. 28, 1946–1956.
Ernst, S., Pastusek, P.E., Lutes, P.J., 2007. Effects of RPM and ROP on PDC bit steerability. SPE-105594-MS, SPE/IADC Drilling Conference, 20–22 February, Amsterdam, The Netherlands.
Freedman, D.A., 2009. Statistical Models: Theory and Practice, second ed. University of California, Berkeley.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer, p. 204.
Kordon, K., 2010. Applying Computational Intelligence – How to Create Value. Springer, pp. 73–113.
Monazami, M., Hashemi, A., Shahbazian, M., 2012. Drilling rate of penetration prediction using artificial neural network: a case study of one of Iranian southern oil fields. Electron. Sci. J. Oil Gas Bus. 6, 21–31.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 81–106.
Quinlan, J.R., 1992. Learning with continuous classes. World Scientific, pp. 343–348.
R Core Team, 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Radermacher, F.J., 1991. Modeling and artificial intelligence. Appl. Artif. Intell. 5, 131–151.
Rahimzadeh, H., Mostofi, M., Hashemi, A., Salahshoor, K., 2010. Comparison of the penetration rate models using field data for one of the gas fields in Persian Gulf area. SPE 131253, CPS/SPE International Oil and Gas Conference and Exhibition in China, Beijing, China, 8–10 June.
Ricardo, J., Mendes, P., Fonseca, T.C., Serapaio, A.B.S., 2007. Applying a neuro-model reference adaptive controller in drilling optimization. World Oil Mag. 228, 29–38.
Szlek, J., Mendyk, A., 2015. fscaret: Automated Feature Selection Using Variety of Models Provided by Caret Package. http://CRAN.R-project.org/package=fscaret.
Szlek, J., Pacławski, A., Lau, R., Jachowicz, R., Kazemi, P., Mendyk, A., 2016. Empirical search for factors affecting mean particle size of PLGA microspheres containing macromolecular drugs. Comput. Meth. Programs Biomed. 134, 137–147.
Walker, B.H., Black, A.D., Klauber, W.P., Little, T., Khodaverdian, M., 1986. Roller-bit penetration rate response as a function of rock properties and well depth. SPE 15620, SPE Annual Technical Conference and Exhibition, 5–8 October, New Orleans, Louisiana.
Warren, T.M., March 1987. Penetration-rate performance of roller-cone bits. SPEDE 2, 9–18.
Winters, W.J., Warren, T.M., Onyia, E.C., 1987. Roller bit model with ductility and cone offset. SPE 16696, SPE 62nd Annual Technical Conference and Exhibition, Dallas, Texas, September 27–30.
Zhang, Y., Yang, Y.J., 2015. Cross-validation for selecting a model selection procedure. Econometrics 187, 95–112.