A R T I C L E  I N F O

Keywords:
Rate of penetration
Feature ranking
Random forest
Neural network
Rule extraction
Data mining

A B S T R A C T

Rate of Penetration (ROP) estimation is one of the main factors in drilling optimization and in minimizing the operation costs. However, ROP depends on many parameters, which makes its prediction a complex problem. In the presented study, a novel and reliable computational approach for the prediction of ROP is proposed. Firstly, the fscaret package in the R environment was implemented to find the importance and ranking of the input parameters. According to the feature ranking technique, weight on bit and mud weight had the highest impact on ROP based on their ranges within this dataset. Also, for developing further models, the Cubist method was applied to reduce the input vector from 13 to 6 and 4. Then, Random Forest (RF) and Monotone Multi-Layer Perceptron (MON-MLP) models were applied to predict ROP. The goodness of fit of all models was measured by RMSE and R2 in a 10-fold cross-validation scheme, and both models showed a reliable accuracy. In order to gain a deeper understanding of the relationships between the input parameters and ROP, the MON-MLP model with 6 inputs was used to check the effects of weight on bit, mud weight and viscosity. Finally, the RF model with 4 variables was used to extract the most important rules from the dataset as a transparent model.
* Corresponding author.
E-mail addresses: [email protected] (S. Eskandarian), [email protected] (P. Bahrami), [email protected] (P. Kazemi).
http://dx.doi.org/10.1016/j.petrol.2017.06.039
Received 17 February 2017; Received in revised form 7 June 2017; Accepted 15 June 2017
Available online 16 June 2017
0920-4105/© 2017 Elsevier B.V. All rights reserved.
S. Eskandarian et al. Journal of Petroleum Science and Engineering 156 (2017) 605–615
2.4. Decision trees

One of the fast and efficient methods for data modeling is decision tree learning, which is suitable for high-dimensional data. In these tree-structured models, construction starts at the root node, which splits into children nodes. Splitting of nodes continues until they end at leaves. Each interior node corresponds to an attribute, each leaf node represents a value of the output variable, and each branch represents a conjunction of features that leads to those values. Splitting of the training set into smaller subsets is done recursively based on a greedy algorithm (Quinlan, 1986), and each node is evaluated by an impurity function which decreases as new nodes are generated (Breiman et al., 2000). Children nodes are more homogeneous, and have lower impurity, than their parent nodes. The splitting stops when the impurity of the children nodes no longer decreases, and finally a tree is formed.

Although decision trees are easy to interpret, they have some disadvantages which limit their usage. Overfitting is one of the most important issues that reduce the accuracy of the models, especially in larger trees. Also, a small perturbation in the training data can result in severe changes in the obtained trees. To overcome the problems associated with decision tree models, it is recommended to use ensembles of trees such as Random Forest (RF) and Cubist.

2.4.1. Random forest

Breiman (2001) first introduced RF. It is an ensemble of many unpruned decision trees which are grown sequentially, instead of only one. RF applies a bootstrap sampling technique that randomly selects a group of the dataset with replacement, and uses the remaining samples for testing each tree. This process repeats for all trees, and the random sampling ensures diversity among the ensemble of trees and enhances the predictions. At the end, an internal cross-validation averages the errors over all trees.

This method randomly picks a small subset of size m out of M features for each node to train a tree (m < M). The size m is kept constant while the forest grows, but the features are changed for each node. As a result, the best split for each node is chosen among the m features rather than among all M features.

There are different parameters which RF uses to grow the trees: the number of trees in the forest (ntree), the size of the random subset (mtry) and the maximum number of nodes in the trees (maxnode). In practice, ntree and mtry should be optimized to find the best accuracy in the ensemble of trees. When features are replaced by random noise, the importance of each variable can also be quantified from the change in model accuracy. Hence, importance determination of features is another advantage of implementing RF.

Table 2
Controlling parameters for RF models.

Parameter   RF model (4 inputs)   RF model (6 inputs)
mtry        4                     6
ntree       20                    20
maxnode     200                   800

2.4.2. Cubist

Cubist is a rule-based predictive model, first introduced by Quinlan as an extension of the earlier M5 model (Quinlan, 1992). It constructs a type of regression tree in which the predictions at the terminal nodes are based on linear regressions rather than discrete values. The final Cubist model is a set of rules (paths from root to leaves), where each rule corresponds to a multivariate linear expression. In other words, if all conditions within a rule are met, the linear expression can be applied to predict the target value. Fig. 2 shows an example of one Cubist tree with 6 Linear Models (LM).

Cubist reduces the rules via pruning and/or combining with nearest-neighbor rules in order to decrease the absolute error and to stop the rules from overlapping (Quinlan, 1992). This pruning/combining procedure makes the model more robust to outliers and prevents it from overfitting.

2.4.3. MON-MLP

Artificial Neural Network (ANN) is one of the methods for data processing. Generally, an ANN consists of three main layers, called the input, hidden and output layers. There are some controlling parameters in a routine ANN which should be optimized to build a precise predictive model, such as the number of hidden layers, the number of neurons in each layer, the transfer functions, etc. (Bashiri and Farshbaf Geranmayeh, 2011). Hence, a challenging step in ANN modeling is tuning these parameters, which is mostly done by trial and error. In an ANN, the signals (inputs) arriving at each neuron in the hidden layers are multiplied by the adjusted weight elements (Wip), combined, and then passed through a transfer function to produce the neuron outputs. Fig. 3 illustrates a scheme of a single neuron in ANN modeling. The transfer functions most commonly

Table 3
Summary of fitness error for all models.
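The RF construction described in Section 2.4.1 (bootstrap samples, a random feature subset at each split, and the ntree/mtry/maxnode controls of Table 2) can be sketched in Python with scikit-learn's RandomForestRegressor, whose n_estimators, max_features and max_leaf_nodes parameters play roughly the roles of ntree, mtry and maxnode. The drilling dataset itself is not available here, so synthetic inputs stand in for WOB, MW, incline and azimuth; this is an illustrative analogue, not the authors' R implementation.

```python
# Sketch only: scikit-learn analogue of the RF setup in Section 2.4.1 / Table 2.
# n_estimators ~ ntree, max_features ~ mtry, max_leaf_nodes ~ maxnode.
# The synthetic data below merely stand in for the (unavailable) drilling dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))  # stand-ins for WOB, MW, incline, azimuth
y = 5.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)  # toy "ROP"

rf = RandomForestRegressor(
    n_estimators=20,      # ntree = 20 (Table 2)
    max_features=4,       # mtry = 4 (4-variable model)
    max_leaf_nodes=200,   # maxnode = 200
    bootstrap=True,       # sample the training set with replacement per tree
    oob_score=True,       # out-of-bag samples act as the internal test set
    random_state=0,
)
rf.fit(X, y)

print("OOB R2:", round(rf.oob_score_, 3))
print("impurity-based importances:", np.round(rf.feature_importances_, 3))
```

The out-of-bag score here plays the role of the internal testing on held-out bootstrap samples described above.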
Table 4
General rules extracted from RF with 4 input variables.

No.  Length  Error  Condition                                                        ROP level  Importance
1    3       0.328  12.5 < WOB & Azimuth ≤ 280.3 & MW ≤ 10.598,15                    Medium     1
2    2       0.333  14.5 < WOB & 233.6 < Azimuth                                     Medium     0.509
3    5       0.2    WOB ≤ 12.5 & 33.5 < Incline & 71.2 < Azimuth ≤ 206 & MW ≤ 11.81  Medium     0.366
4    3       0      13.5 < WOB & Incline ≤ 48.65 & 280.3 < Azimuth                   Medium     0.277
5    3       0.103  Incline ≤ 41.5 & 196.6 < Azimuth ≤ 225                           Low        0.125
6    2       0.222  18.5 < WOB & 44.5 < Incline                                      High       0.118
7    1       0      55 < Incline                                                     High       0.101
8    2       0.1    42.5 < Incline ≤ 44.1                                            Medium     0.089
9    3       0      WOB ≤ 12.5 & 9.89 < MW ≤ 11.47                                   Low        0.084
10   2       0      Incline ≤ 36.5 & Azimuth ≤ 190.6                                 Low        0.06
11   3       0      15.5 < WOB & 224.7 < Azimuth & MW ≤ 10.222,625                   Medium     0.048
12   3       0.286  14.5 < WOB & 45.8 < Incline & 10.22 < MW                         High       0.047
13   3       0      18.5 < WOB & 44.5 < Incline & Azimuth ≤ 224.7                    High       0.045
14   4       0.444  14.5 < WOB ≤ 22.5 & 45.5 < Incline ≤ 55                          Medium     0.029
15   2       0.25   20 < WOB & Incline ≤ 42.5                                        Low        0.026
16   2       0.135  Incline ≤ 43.5 & 10.14 < MW                                      Low        0.022
17   4       0.125  12.5 < WOB & 63.7 < Azimuth ≤ 73 & 9.39 < MW                     Medium     0.015
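Reading Table 4 is mechanical: a record receives the ROP level of the first rule whose conditions it satisfies. A minimal Python sketch, using the thresholds of rules 6, 7 and 15 from the table (the short rule list and the "Medium" fallback are simplifications for illustration, not the full inTrees output):

```python
# Illustrative only: applying a few of the Table 4 rules by hand.
# A record is assigned the ROP level of the first matching rule; "Medium" is
# used here as a stand-in default when none of the listed rules fires.

# Each rule: (condition function, predicted ROP level)
rules = [
    (lambda r: r["WOB"] > 18.5 and r["Incline"] > 44.5, "High"),  # rule 6
    (lambda r: r["Incline"] > 55, "High"),                        # rule 7
    (lambda r: r["WOB"] > 20 and r["Incline"] <= 42.5, "Low"),    # rule 15
]

def rop_level(record, default="Medium"):
    """Return the ROP level of the first rule that matches, else a default."""
    for cond, level in rules:
        if cond(record):
            return level
    return default

sample = {"WOB": 21.0, "Incline": 50.0}
print(rop_level(sample))  # rule 6 fires -> "High"
```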
employed in ANN are the hyperbolic tangent sigmoid (TANSIG), the logarithm sigmoid (LOGSIG) and the pure linear (PURELIN) functions, presented in Equations (4)–(6):

TANSIG = \frac{2}{1 + \exp(-n_p)} - 1   (4)

LOGSIG = \frac{1}{1 + \exp(-n_p)}   (5)

PURELIN = n_p   (6)

n_p = \sum_{i=1}^{N} X_i W_{ip} + b_p   (7)

where X_i is the inputs, b_p is the bias and n_p is the transfer function input.

In this study, the "monmlp.fit" function of the "monmlp" package in the R environment was used to implement the Monotone Multi-Layer Perceptron (MON-MLP) neural network (Cannon, 2014). This function builds an ensemble of MLP models, and it uses bootstrap aggregation (bagging) to help avoid overfitting. The final result was an average of the RMSE and R2 of the ensemble of models. All of the generated models had two hidden layers with 2–20 nodes per layer. The transfer functions TANSIG and PURELIN were used for the hidden and output layers, respectively. Fig. 4 illustrates a schematic of the MON-MLP model. Also, the other parameters were as follows:

- The ensemble was a group of 10 models.
- Iteration numbers for training the models were set to 10, 50, 80, 100, 500, 1000 and 2000.
- The "trial" parameter, as a multi-start technique, was set to 5 to avoid local minima.

The workflow of this study, along with the results of feature ranking, is demonstrated in Fig. 5, and consists of 3 main phases: data preparation, selection of important variables and modeling.

3. Results and discussion

As an initial effort, linear regression was applied to verify the complexity of the system and to find out whether machine learning approaches can improve the predictions. Fig. 6 shows the estimated results based on the linear regression model. From this figure, it can be concluded that the behavior of the system is highly complex (R2 = 0.482), which confirms that further investigations are needed.

3.1. Feature ranking

The initial input vector consisted of 13 independent variables. To perform feature ranking and find the importance of each variable, the fscaret package in the R environment was used. Fig. 7 demonstrates the results of feature ranking created by fscaret.

The selected variables were mostly consistent with the findings in the literature. According to the results, WOB and MW were the first and second most effective parameters on ROP for this dataset, respectively. These two parameters have been investigated in the literature as the main factors affecting the ROP (Akpabio et al., 2015; Ernst et al., 2007; Cheatham and Nahm, 1985). The parameters of incline and azimuth, which are related to the well trajectory, were the next important parameters for this formation type. The importance of incline and azimuth may differ for other formation types, and they may have a negligible effect on ROP. It should be mentioned that the result for variable importance is case specific, and for other reservoirs and formations it may yield different results.

3.2. Machine learning approaches

3.2.1. Cubist

The Cubist model, which is a fast algorithm, was implemented on the dataset to find the best input vector size based on the generalization error (RMSE). Hence, the models were trained on 10CV at various numbers of variables, from the most important to the least important one, based on the results of feature ranking. According to the results, the RMSE reduced from 2.0134 (for all variables) to the lowest amounts of 1.7777 and 1.7987 (for 4 and 6 variables). Finally, input vector sizes of 4 and 6 were chosen for further analyses to find more accurate results with other methods.

3.2.2. Random forest

To find the best RF model, around 3000 models were trained with different controlling parameters (mtry, ntree and maxnode), and finally the ones presented in Table 2 were chosen as the most accurate models.

To evaluate the RF models' performance, R2 and RMSE based on the 10CV approach were used. A summary of the results for all models with input vector sizes of 4 and 6 is presented in Table 3. It can be concluded from this table that the models generated by RF (4 and 6 variables) were well trained, and were more accurate than the ones obtained from the other models, but they had a higher generalization error based on 10CV compared to the MON-MLP models.

Rule extraction is one of the benefits of using RF models in comparison to black-box models such as ANN. It means that the generated trees can be expressed as some general rules which explain the relationships in the dataset, and are very easy to interpret. Therefore, the "inTrees" package in the R environment was applied on the ensemble of trees to extract the rules according to their length, error and importance (Deng, 2014). The extracted rules for the RF model with the input vector size of 4 are shown in Table 4. The output (ROP) was automatically split into three levels, Low (0.183–5.9), Medium (5.9–11.6) and High (11.6–17.3), based on its minimum and maximum.

3.2.3. MON-MLP

It was concluded from Table 3 that the MON-MLP model with 6 variables was the best model in terms of generalization error (R2 and RMSE in 10CV). So, this model was selected for further analysis to see the behavior of the important variables on ROP. Similar to the RF procedure in finding the most accurate model, around 3000 models were trained to select the best MON-MLP model. A summary of these controlling parameters is presented in Table 5. Also, Fig. 8a and b demonstrate the predictions of ROP obtained from the training and 10CV models with MON-MLP.

Table 5
Controlling parameters for MON-MLP models.

A survey was applied on the parameters of WOB, MW and plastic viscosity to see how they affect the ROP. Fig. 9a–c represent the effect of these inputs on ROP. To plot these figures, the variable under study was varied while the other variables were fixed at their midrange values. By analyzing Fig. 9a, it can be seen that up to a limit (flounder point), increasing the WOB enhanced the penetration. If too much weight is applied, ROP decreases, and rapid bit wear may result. Furthermore, an increase in MW generally tends to decrease the ROP; however, a slight growth in ROP was seen at low MW in Fig. 9b. This behavior may happen because of the positive effect of MW on the hydraulic impact force, which improves the ROP (Bourgoyne et al., 1986). At low plastic viscosities, increasing the viscosity improves the transport of cuttings, and therefore raises the ROP. But an excess amount of viscosity inverses the result. In Fig. 9c, because the viscosity range is limited, the inverse behavior was not observed.

Fig. 8. Predicted values vs. actual values for a) training and b) 10-CV model of MON-MLP.
Fig. 9. a–c: Effect of WOB, MW, and plastic viscosity on ROP.
Fig. 10. a–c: Predicted and actual ROPs at different depths for wells 1, 2 and 5.

As a final effort, the predicted ROP was compared to the actual ROP at various depths using the MON-MLP model. Fig. 10a–c show the predicted and actual ROPs vs. depth for wells number 1, 2 and 5. As can be seen, there is a reasonable match between the ROPs at the various depths, which implies the success of the modeling with MON-MLP and of the whole procedure done in this work. Therefore, the achieved MON-MLP model can be used as a reliable method for the estimation of ROP depending on various parameters, and for further analysis within the reservoir.
4. Conclusion

In this work, a comprehensive data mining approach is presented to perform the computational modeling of ROP. Moreover, following this approach and using data mining techniques can help to unfold knowledge from the dataset. Applying the fscaret package in the R environment enabled finding the importance and ranking of all 13 variables. According to the results of feature ranking, WOB and MW had the most impact on the ROP. Furthermore, the Cubist model was implemented to reduce the input vector to 4 and 6 variables, which gave the lowest RMSEs of 1.7777 and 1.7987, respectively. RF and MON-MLP models were then developed on the dataset using the reduced input vectors. Both models showed a reasonable RMSE and R2 based on the 10CV approach. The RF model, as a transparent model unlike other black-box ones, was used to extract general rules from the dataset. So, the RF model with 4 variables (R2 of 0.946 and 0.7653 for the training and test models) was employed, and 17 general rules were extracted. These rules can help to better understand the relationships of the features with ROP. Finally, the MON-MLP model with 6 variables, which was the most accurate model among all (R2 of 0.9472 and 0.8009 for the training and test models), was used to predict the ROP at various depths for 3 wells. Also, the effects of WOB, MW and viscosity on ROP were investigated to see how they behave within the dataset.
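The 10CV evaluation protocol used throughout (RMSE and R2 averaged over 10 folds) can be sketched with scikit-learn; the data and the RF stand-in below are placeholders rather than the paper's dataset or fitted models:

```python
# Sketch of the 10-fold cross-validation scoring used in the paper: RMSE and R2
# averaged over folds. Placeholder data and model, not the paper's dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))
y = 4.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=300)

model = RandomForestRegressor(n_estimators=20, random_state=1)
rmses, r2s = [], []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    model.fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(mean_squared_error(y[test_idx], pred) ** 0.5)  # per-fold RMSE
    r2s.append(r2_score(y[test_idx], pred))                     # per-fold R2

print("10CV RMSE:", round(float(np.mean(rmses)), 3))
print("10CV R2:", round(float(np.mean(r2s)), 3))
```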
Nomenclature
Abbreviations
10CV 10-fold Cross Validation
AI Artificial Intelligence
ANN Artificial Neural Network
Gel 10 s Gel Strength after 10 s
Gel 10 m Gel Strength after 10 min
LM Linear Model
LOGSIG Logarithm Sigmoid Transfer Function
maxnode maximum number of nodes in trees
MW Mud Weight
mtry size of subset randomly sampled
MON-MLP Monotone Multi-Layer Perceptron
ntree number of trees in forest
PURELIN Pure linear Transfer Function
R2 coefficient of determination
RF Random Forest
RMSE Root Mean Squared Error
ROP Rate of Penetration
rpm revolution per minute
TANSIG Hyperbolic Tangent Sigmoid Transfer Function
WOB Weight on Bit
Symbols
bp ANN bias
m random subset size
M number of all parameters
n total number of records
N number of ANN inputs
np transfer function input
p number of ANN layers
Wip adjusted weight of input i for layer p in ANN
Xi neurons input
yact actual value
Yi output of neurons
ym actual values average
ypred predicted value
References

Akpabio, J.U., Inyang, P.N., Iheaka, C.I., 2015. The effect of drilling mud density on penetration rate. Int. Res. J. Eng. Technol. 2, 29–35.
Bahrami, P., Kazemi, P., Mahdavi, S., Ghobadi, H., 2016. A novel approach for modeling and optimization of surfactant/polymer flooding based on genetic programming evolutionary algorithm. Fuel 179, 289–298.
Bashiri, M., Farshbaf Geranmayeh, A., 2011. Tuning the parameters of an artificial neural network using central composite design and genetic algorithm. Sci. Iran. 18, 1600–1608.
Bourgoyne Jr., A.T., Millheim, K.K., Chenevert, M.E., Young Jr., F.S., 1986. Applied Drilling Engineering. SPE Textbook Series, vol. 2, Chapter 4, p. 131.
Bourgoyne Jr., A.T., Young, F.S., 1974. A multiple regression approach to optimal drilling and abnormal pressure detection. SPE J. 14, 371–384.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 2000. Classification and Regression Trees. Chapman & Hall/CRC, Boca Raton, FL.
Cannon, A.J. monmlp: Monotone Multi-layer Perceptron Neural Network. Available from: http://cran.r-project.org/web/packages/monmlp/index.html (Accessed 1 June 2014).
Cheatham, C.A., Nahm, J.J., 1985. Effects of selected mud properties on rate of penetration in full-scale shale drilling simulations. SPE-13465-MS, SPE/IADC Drilling Conference, 5–8 March, New Orleans, Louisiana.
Deng, H., 2014. Interpreting tree ensembles with inTrees. arXiv:1408.5456.
Edalatkhah, S., Rasoul, R., Hashemi, A., 2010. Bit selection optimization using artificial intelligence systems. Petrol. Sci. Technol. 28, 1946–1956.
Ernst, S., Pastusek, P.E., Lutes, P.J., 2007. Effects of RPM and ROP on PDC bit steerability. SPE-105594-MS, SPE/IADC Drilling Conference, 20–22 February, Amsterdam, The Netherlands.
Freedman, D.A., 2009. Statistical Models: Theory and Practice, second ed. University of California, Berkeley.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer, p. 204.
Kordon, K., 2010. Applying Computational Intelligence – How to Create Value. Springer, pp. 73–113.
Monazami, M., Hashemi, A., Shahbazian, M., 2012. Drilling rate of penetration prediction using artificial neural network: a case study of one of Iranian southern oil fields. Electron. Sci. J. Oil Gas Bus. 6, 21–31.
Quinlan, J.R., 1986. Induction of decision trees. Mach. Learn. 81–106.
Quinlan, J.R., 1992. Learning with continuous classes. World Scientific, pp. 343–348.
R Core Team, 2015. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing.
Radermacher, F.J., 1991. Modeling and artificial intelligence. Appl. Artif. Intell. 5, 131–151.
Rahimzadeh, H., Mostofi, M., Hashemi, A., Salahshoor, K., 2010. Comparison of the penetration rate models using field data for one of the gas fields in Persian Gulf area. SPE 131253, CPS/SPE International Oil and Gas Conference and Exhibition in China, Beijing, China, 8–10 June.
Ricardo, J., Mendes, P., Fonseca, T.C., Serapaio, A.B.S., 2007. Applying a neuro-model reference adaptive controller in drilling optimization. World Oil Mag. 228, 29–38.
Szlek, J., Mendyk, A., 2015. fscaret: Automated Feature Selection Using Variety of Models Provided by Caret Package. http://CRAN.R-project.org/package=fscaret.
Szlek, J., Pacławski, A., Lau, R., Jachowicz, R., Kazemi, P., Mendyk, A., 2016. Empirical search for factors affecting mean particle size of PLGA microspheres containing macromolecular drugs. Comput. Meth. Programs Biomed. 134, 137–147.
Walker, B.H., Black, A.D., Klauber, W.P., Little, T., Khodaverdian, M., 1986. Roller-bit penetration rate response as a function of rock properties and well depth. SPE 15620, SPE Annual Technical Conference and Exhibition, 5–8 October, New Orleans, Louisiana.
Warren, T.M., March 1987. Penetration-rate performance of roller-cone bits. SPEDE 2, 9–18.
Winters, W.J., Warren, T.M., Onyia, E.C., 1987. Roller bit model with ductility and cone offset. SPE 16696, SPE 62nd Annual Technical Conference and Exhibition, Dallas, Texas, September 27–30.
Zhang, Y., Yang, Y.J., 2015. Cross-validation for selecting a model selection procedure. Econometrics 187, 95–112.