A Comparative Study on Machine Learning Algorithms for Smart Manufacturing: Tool Wear Prediction Using Random Forests
1 Introduction

Smart manufacturing aims to integrate big data, advanced analytics, high-performance computing, and the Industrial Internet of Things (IIoT) into traditional manufacturing systems and processes to create highly customizable products with higher quality at lower costs. As opposed to traditional factories, a smart factory utilizes interoperable information and communications technologies (ICT), intelligent automation systems, and sensor networks to monitor machinery conditions, diagnose the root cause of failures, and predict the remaining useful life (RUL) of mechanical systems or components. For example, almost all engineering systems (e.g., aerospace systems, nuclear power plants, and machine tools) are subject to mechanical failures resulting from deterioration with usage and age or abnormal operating conditions [1–3]. Some of the typical failure modes include excessive load, overheating, deflection, fracture, fatigue, corrosion, and wear. The degradation and failures of engineering systems or components will often incur higher costs and lower productivity due to unexpected machine downtime. In order to increase manufacturing productivity while reducing maintenance costs, it is crucial to develop and implement an intelligent maintenance strategy that allows manufacturers to determine the condition of in-service systems in order to predict when maintenance should be performed.

Conventional maintenance strategies include reactive, preventive, and proactive maintenance [4–6]. The most basic approach to maintenance is reactive, also known as run-to-failure maintenance planning. In the reactive maintenance strategy, assets are deliberately allowed to operate until failures actually occur, and they are maintained on an as-needed basis. One of the disadvantages of reactive maintenance is that it is difficult to anticipate the maintenance resources (e.g., manpower, tools, and replacement parts) that will be required for repairs. Preventive maintenance is often referred to as use-based maintenance. In preventive maintenance, maintenance activities are performed after a specified period of time or amount of use, based on the estimated probability that the systems or components will fail in the specified time interval. Although preventive maintenance allows for more consistent and predictable maintenance schedules, more maintenance activities are needed as opposed to reactive maintenance.
Fig. 2 Tool wear prediction using SVR

3.2 Tool Wear Prediction Using SVR. The original SVM for regression was developed by Vapnik and coworkers [46,47]. An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification and regression.

The framework of SVR for linear cases is illustrated in Fig. 2. Formally, SVR can be formulated as a convex optimization problem:

\[
\begin{aligned}
\text{Minimize} \quad & \frac{1}{2}\lVert \omega \rVert^2 + C \sum_{i=1}^{\ell} \left( \xi_i + \xi_i^* \right) \\
\text{Subject to} \quad & y_i - \langle \omega, x_i \rangle - b \le \varepsilon + \xi_i \\
& \langle \omega, x_i \rangle + b - y_i \le \varepsilon + \xi_i^* \\
& \xi_i,\, \xi_i^* \ge 0
\end{aligned}
\tag{3.1}
\]

where ω ∈ 𝒳, C = 1, ε = 0.1, and ξ_i = ξ_i* = 0.001. The offset b can be computed as follows:

\[
\begin{aligned}
b &= y_i - \langle \omega, x_i \rangle - \varepsilon \quad \text{for } \alpha_i \in [0, C] \\
b &= y_i - \langle \omega, x_i \rangle + \varepsilon \quad \text{for } \alpha_i^* \in [0, C]
\end{aligned}
\tag{3.2}
\]

For nonlinear SVR, the training patterns x_i can be preprocessed by a nonlinear kernel function k(x, x′) := ⟨Φ(x), Φ(x′)⟩, where Φ(x) is a transformation that maps x to a high-dimensional space. These kernel functions need to satisfy Mercer's theorem. Many kernels have been developed for various applications. The most popular kernels include the polynomial, Gaussian radial basis function (RBF), and sigmoid kernels. In many applications, a nonlinear kernel function provides better accuracy. According to the literature [32,33], the Gaussian RBF kernel is one of the most effective kernel functions used in tool wear prediction. In this research, the Gaussian RBF kernel is used to transform the input dataset D = (x_i, y_i), where x_i is the input vector and y_i is the response variable (i.e., flank wear), into a new dataset in a high-dimensional space. The new dataset is linearly separable by a hyperplane in a higher-dimensional Euclidean space, as illustrated in Fig. 2. The slack variables ξ_i and ξ_i* are introduced for the instances where the constraints are infeasible; they denote the deviation from predicted values beyond the error tolerance ε = 0.1. The RBF kernel is k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)), where σ² = 0.5. At the optimal solution, we obtain

\[
\omega = \sum_{i=1}^{\ell} \left( \alpha_i - \alpha_i^* \right) \Phi(x_i)
\quad \text{and} \quad
f(x) = \sum_{i=1}^{\ell} \left( \alpha_i - \alpha_i^* \right) k(x_i, x) + b
\tag{3.3}
\]
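A minimal sketch of this SVR configuration can be written with scikit-learn; this is an assumption, as the paper does not name its software, and the feature matrix X and flank-wear vector y below are random placeholders. Note that scikit-learn parameterizes the RBF kernel as exp(−γ‖x_i − x_j‖²), so σ² = 0.5 corresponds to γ = 1/(2σ²) = 1.0.

```python
# Sketch of the SVR setup described above (scikit-learn assumed).
# X (n_samples x 28 statistical features) and y (flank wear) are placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((315, 28))   # placeholder feature matrix
y = rng.random(315)         # placeholder flank-wear values

# C = 1 and epsilon = 0.1 as in Eq. (3.1); sigma^2 = 0.5 maps to
# gamma = 1 / (2 * sigma^2) = 1.0 in scikit-learn's RBF parameterization.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=1.0)
model.fit(X, y)
wear_pred = model.predict(X)   # f(x) of Eq. (3.3), evaluated on the inputs
```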
3.3 Tool Wear Prediction Using Random Forests. The random forest algorithm [48,49] for regression is as follows:

(1) Draw a bootstrap sample Z* of size N from the training data.
(2) For each bootstrap sample, construct a regression tree by splitting a node into two children nodes until the stopping criterion is satisfied.
(3) Output the ensemble of trees {T_b}, b = 1, …, B.
(4) Make a prediction at a new point x by aggregating the predictions of the B trees.

The framework of predicting flank wear using an RF is illustrated in Fig. 3. In this research, a random forest is constructed using B = 500 regression trees. Given the labeled training dataset D = (x_i, y_i), a bootstrap sample of size N = 630 is drawn from the training dataset. For each regression tree, m = 9 (m = p/3, p = 28) variables are selected at random from the 28 variables/features. The best variable/split-point is selected among the nine variables. A regression tree progressively splits the training dataset into two child nodes: a left node (with samples < z) and a right node (with samples ≥ z). A splitting variable and split point are selected by solving Eqs. (3.7) and (3.8). The process is applied recursively on the dataset in each child node. The splitting process stops if the number of records in a node is less than 5.
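A minimal sketch of this configuration, again assuming scikit-learn and placeholder arrays: B = 500 trees (n_estimators), m = 9 of the p = 28 features tried at each split (max_features), and nodes holding fewer than 5 records left unsplit (min_samples_split).

```python
# Sketch of the random-forest configuration described above
# (scikit-learn assumed; X and y are placeholder arrays).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((630, 28))   # placeholder: 28 statistical features
y = rng.random(630)         # placeholder: flank-wear measurements

# B = 500 trees, m = 9 of p = 28 features tried at each split,
# and a node is not split once it holds fewer than 5 records.
rf = RandomForestRegressor(n_estimators=500, max_features=9,
                           min_samples_split=5, bootstrap=True)
rf.fit(X, y)
wear_pred = rf.predict(X)   # average of the 500 tree predictions
```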
3.3.1 Bootstrap Aggregating or Bagging. Given a training dataset D = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, bootstrap aggregating or bagging generates B new training datasets D_i of size N by sampling from the original training dataset D with replacement. D_i is referred to as a bootstrap sample. By sampling with replacement or bootstrapping, some observations may be repeated in each D_i. Bagging helps reduce variance and avoid overfitting. The number of regression trees B is a parameter specified by users. Typically, a few hundred to several thousand trees are used in the random forest algorithm.
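A minimal sketch of the bootstrap sampling step described above; bootstrap_samples is a hypothetical helper name, and numpy arrays are assumed.

```python
# Each D_i draws N indices from D uniformly with replacement, so some
# observations repeat within a bootstrap sample and others are left out.
import numpy as np

def bootstrap_samples(X, y, B, seed=0):
    """Return B bootstrap samples (X_i, y_i), each of size N."""
    rng = np.random.default_rng(seed)
    N = len(y)
    samples = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)  # N indices, with replacement
        samples.append((X[idx], y[idx]))
    return samples
```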
3.3.2 Choosing Variables to Split On. For each of the bootstrap samples, grow an un-pruned regression tree with the following procedure: at each node, randomly sample m variables and choose the best split among those variables rather than choosing the best split among all predictors. This process is sometimes called "feature bagging." A random subset of the predictors or features is selected because doing so reduces the correlation among the trees grown from ordinary bootstrap samples. For regression, the default is m = p/3.

3.3.3 Splitting Criterion. Suppose that a partition is divided into M regions R_1, R_2, …, R_M. The response is modeled as a constant c_m in each region:

\[
f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m)
\tag{3.4}
\]

The splitting criterion at each node is to minimize the sum of squares. Therefore, the best \(\hat{c}_m\) is the average of y_i in region R_m:

\[
\hat{c}_m = \operatorname{ave}(y_i \mid x_i \in R_m)
\tag{3.5}
\]

Consider a splitting variable j and split point s, and define the pair of half-planes

\[
R_1(j, s) = \{ X \mid X_j \le s \}
\quad \text{and} \quad
R_2(j, s) = \{ X \mid X_j > s \}
\tag{3.6}
\]

The splitting variable j and split point s should satisfy

\[
\min_{j,\, s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]
\tag{3.7}
\]

For any j and s, the inner minimization is solved by

\[
\hat{c}_1 = \operatorname{ave}(y_i \mid x_i \in R_1(j, s))
\quad \text{and} \quad
\hat{c}_2 = \operatorname{ave}(y_i \mid x_i \in R_2(j, s))
\tag{3.8}
\]

Having found the best split, the dataset is partitioned into the two resulting regions, and the splitting process is repeated on each of them. This continues until a predefined stopping criterion is satisfied.
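The exhaustive least-squares split search of Eqs. (3.5)–(3.8) can be sketched as follows; best_split is a hypothetical helper operating on numpy arrays.

```python
# For every candidate variable j and split point s, fit the region means
# c1, c2 (Eq. (3.8)) and keep the (j, s) pair minimizing the total sum
# of squares (Eq. (3.7)).
import numpy as np

def best_split(X, y):
    """Exhaustive search for the least-squares split of Eq. (3.7)."""
    best = (None, None, np.inf)        # (j, s, sum of squares)
    n, p = X.shape
    for j in range(p):
        for s in np.unique(X[:, j]):
            left = X[:, j] <= s        # R1(j, s) = {X | X_j <= s}
            right = ~left              # R2(j, s) = {X | X_j >  s}
            if left.all() or right.all():
                continue               # a split must create two regions
            # Inner minimization, Eq. (3.8): region means minimize the SSE.
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[right] - y[right].mean()) ** 2).sum())
            if sse < best[2]:
                best = (j, s, sse)
    return best
```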
4 Experimental Setup

The data used in this paper were obtained from Li et al. [50]. Some details of the experiment are presented in this section. The experimental setup is shown in Fig. 5.

The cutter material and workpiece material used in the experiment are high-speed steel and stainless steel, respectively. A detailed description of the operating conditions in the dry milling operation can be found in Table 2. The spindle speed of the cutter was 10,400 RPM. The feed rate was 1555 mm/min. The Y depth of cut (radial) was 0.125 mm. The Z depth of cut (axial) was 0.2 mm.

315 cutting tests were conducted on a three-axis high-speed CNC machine (Röders Tech RFM 760). During each cutting test, seven signal channels, including cutting force, vibration, and acoustic emission data, were monitored in real time. The sampling rate was 50 kHz per channel. Each cutting test took about 15 s. A stationary dynamometer, mounted on the table of the CNC machine, was used to measure cutting forces in three mutually perpendicular axes (x, y, and z). Three piezo accelerometers, mounted on the workpiece, were used to measure vibration in three mutually perpendicular axes (x, y, and z). An acoustic emission (AE) sensor, mounted on the workpiece, was used to monitor the high-frequency oscillation that occurs spontaneously within metals due to crack formation or plastic deformation. Acoustic emission is caused by the release of strain energy as the microstructure of the material is rearranged. After each cutting test, the value of tool wear was measured off-line using a microscope (Leica MZ12). The total size of the condition monitoring data is about 8.67 GB.
5 Results and Discussion

In machine learning, feature extraction is an essential preprocessing step in which raw data collected from various signal channels are converted into a set of statistical features in a format supported by machine learning algorithms. The statistical features are then given as an input to a machine learning algorithm. In this experiment, the condition monitoring data were collected from (1) cutting force, (2) vibration, and (3) acoustic emission signal channels. A set of 28 statistical features was extracted from these signals, including maximum, median, mean, and standard deviation, as listed in Table 3.
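A minimal sketch of this preprocessing step; extract_features is a hypothetical helper, and only the four statistics named above are computed (the full 28-feature set of Table 3 is not reproduced here).

```python
# Reduce one raw signal channel to a vector of summary statistics.
import numpy as np

def extract_features(signal):
    """Map a raw signal channel to example statistical features."""
    return np.array([np.max(signal), np.median(signal),
                     np.mean(signal), np.std(signal)])

# e.g., stack the features from all seven channels of one cutting test:
# x = np.concatenate([extract_features(ch) for ch in channels])
```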
Three predictive models were developed using ANNs, SVR, and RFs, respectively. Two-thirds (2/3) of the input data (i.e., three datasets) were selected at random for model development (training). The remaining one-third (1/3) of the input data was used for model validation (testing). Figures 6–8 show the predicted against observed tool wear values on the test dataset using ANNs, SVR, and RFs, respectively. Figure 9 shows the tool wear against time with RFs.

In addition, the performance of the three algorithms was evaluated on the test dataset using accuracy and training time. Accuracy is measured using the R² statistic, also referred to as the coefficient of determination, and the mean squared error (MSE). In statistics, the coefficient of determination is defined as R² = 1 − (SSE/SST), where SSE is the sum of the squared residuals and SST is the total sum of squares. The coefficient of determination is a measure that indicates the percentage of the response variable variation that is explained by a regression model. A higher R-squared indicates that more variability is explained by the regression model; for example, an R² of 100% indicates that the regression model explains all the variability of the response data around its mean. In general, the higher the R-squared, the better the regression model fits the data. The MSE of an estimator measures the average of the squares of the errors. The MSE is defined as

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
\]

where \(\hat{y}_i\) is a predicted value, y_i is an observed value, and n is the sample size.
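Both accuracy measures can be sketched directly from these definitions; the function names below are illustrative.

```python
# MSE = (1/n) * sum((y_hat_i - y_i)^2) and R^2 = 1 - SSE/SST,
# as defined above.
import numpy as np

def mse(y_obs, y_pred):
    """Mean squared error of the predictions."""
    return float(np.mean((y_pred - y_obs) ** 2))

def r_squared(y_obs, y_pred):
    """Coefficient of determination, R^2 = 1 - SSE/SST."""
    sse = np.sum((y_obs - y_pred) ** 2)          # sum of squared residuals
    sst = np.sum((y_obs - np.mean(y_obs)) ** 2)  # total sum of squares
    return float(1.0 - sse / sst)
```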
The ANN, SVR, and RF algorithms use between 50% and 90% of the input data for model development (training) and the remainder for model validation (testing). Because the performance of ANNs depends on the hidden layer configuration, five ANNs with a single hidden layer but different numbers of neurons were tested on the training dataset. Tables 4–8 list the MSE, R-squared, and training time for the ANNs with 2, 4, 8, 16, and 32 neurons.
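The evaluation protocol described here might be sketched as follows, assuming scikit-learn and placeholder data; the numbers it prints will not reproduce Tables 4–10.

```python
# Train on 50-90% of the data and score the held-out remainder.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((315, 28)), rng.random(315)  # placeholder data

for train_size in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_size, random_state=0)
    rf = RandomForestRegressor(n_estimators=500, max_features=9,
                               min_samples_split=5).fit(X_tr, y_tr)
    y_hat = rf.predict(X_te)
    print(f"train={train_size:.0%}  "
          f"MSE={mean_squared_error(y_te, y_hat):.3f}  "
          f"R2={r2_score(y_te, y_hat):.3f}")
```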
With respect to the performance of the ANN, the training time increases as the number of neurons increases; however, the increase in training time is not significant, as shown in Fig. 10. In addition, while the prediction accuracy increases as the number of neurons increases, the performance is not significantly improved by adding more than eight neurons in the hidden layer, as shown in Figs. 11 and 12. Tables 9 and 10 list the MSE, R-squared, and training time for SVR and RFs. While the training time for RFs is longer than that of ANNs and SVR, the predictive model built by RFs is the most accurate, as shown in Figs. 10–12.

Fig. 6 Comparison of observed and predicted tool wear using an ANN with 16 neurons in the hidden layer (termination criterion: tolerance is equal to 1.0 × 10⁻⁴)

Fig. 7 Comparison of observed and predicted tool wear using SVR (termination criterion: slack variable or tolerance ξ is equal to 0.001)

Fig. 8 Comparison of observed and predicted tool wear using RFs (termination criterion: minimum number of samples in each node is equal to 5)

Fig. 9 Tool wear against time (cut) using RFs

Fig. 12 Comparison of R-squared errors

Table 5 Accuracy on the test data and training time for the FFBP ANN with four neurons in the hidden layer

Table 6 Accuracy on the test data and training time for the FFBP ANN with eight neurons in the hidden layer

ANN (number of neurons = 8)
Training size (%)    MSE       R²       Training time (s)
50                   36.810    0.964    0.167
60                   34.168    0.968    0.186
70                   39.795    0.961    0.202
80                   44.175    0.957    0.197
90                   46.634    0.954    0.234

Table 8 Accuracy on the test data and training time for the FFBP ANN with 32 neurons in the hidden layer

Table 9 Accuracy on the test data and training time for SVR with radial basis kernel

SVR
Training size (%)    MSE       R²       Training time (s)
50                   54.993    0.946    0.060
60                   49.868    0.952    0.073
70                   41.072    0.959    0.088
80                   31.958    0.969    0.107
90                   23.997    0.975    0.126

Table 10 Accuracy on the test data and training time for RFs

RFs (500 trees)
Training size (%)    MSE       R²       Training time (s)
50                   14.170    0.986    1.079
60                   11.053    0.989    1.386
70                   10.156    0.990    1.700
80                    8.633    0.991    2.003
90                    7.674    0.992    2.325
Acknowledgment

The research reported in this paper is partially supported by NSF under Grant Nos. IIP-1238335 and DMDII-15-14-01. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation and the Digital Manufacturing and Design Innovation Institute.
References

[1] Swanson, L., 2001, "Linking Maintenance Strategies to Performance," Int. J. Prod. Econ., 70(3), pp. 237–244.
[2] Valdez-Flores, C., and Feldman, R. M., 1989, "A Survey of Preventive Maintenance Models for Stochastically Deteriorating Single-Unit Systems," Nav. Res. Logist., 36(4), pp. 419–446.
[3] Wu, D., Terpenny, J., Zhang, L., Gao, R., and Kurfess, T., 2016, "Fog-Enabled Architecture for Data-Driven Cyber-Manufacturing Systems," ASME Paper No. MSEC2016-8559.
[4] Lee, J., 1995, "Machine Performance Monitoring and Proactive Maintenance in Computer-Integrated Manufacturing: Review and Perspective," Int. J. Comput. Integr. Manuf., 8(5), pp. 370–380.
[5] Bevilacqua, M., and Braglia, M., 2000, "The Analytic Hierarchy Process Applied to Maintenance Strategy Selection," Reliab. Eng. Syst. Saf., 70(1), pp. 71–83.
[6] Suh, J. H., Kumara, S. R., and Mysore, S. P., 1999, "Machinery Fault Diagnosis and Prognosis: Application of Advanced Signal Processing Techniques," CIRP Ann.-Manuf. Technol., 48(1), pp. 317–320.
[7] Hu, C., Youn, B. D., and Kim, T., 2012, "Semi-Supervised Learning With Co-Training for Data-Driven Prognostics," IEEE Conference on Prognostics and Health Management (PHM), Denver, CO, June 18–21, pp. 1–10.
[8] Schwabacher, M., 2005, "A Survey of Data-Driven Prognostics," AIAA Paper No. 2005-7002.
[9] Byrne, G., Dornfeld, D., Inasaki, I., Ketteler, G., König, W., and Teti, R., 1995, "Tool Condition Monitoring (TCM)—The Status of Research and Industrial Application," CIRP Ann.-Manuf. Technol., 44(2), pp. 541–567.
[10] Teti, R., Jemielniak, K., O'Donnell, G., and Dornfeld, D., 2010, "Advanced Monitoring of Machining Operations," CIRP Ann.-Manuf. Technol., 59(2), pp. 717–739.
[11] Gao, R., Wang, L., Teti, R., Dornfeld, D., Kumara, S., Mori, M., and Helu, M., 2015, "Cloud-Enabled Prognosis for Manufacturing," CIRP Ann.-Manuf. Technol., 64(2), pp. 749–772.
[12] Daigle, M. J., and Goebel, K., 2013, "Model-Based Prognostics With Concurrent Damage Progression Processes," IEEE Trans. Syst. Man Cybern.: Syst., 43(3), pp. 535–546.
[13] Si, X.-S., Wang, W., Hu, C.-H., Chen, M.-Y., and Zhou, D.-H., 2013, "A Wiener-Process-Based Degradation Model With a Recursive Filter Algorithm for Remaining Useful Life Estimation," Mech. Syst. Signal Process., 35(1), pp. 219–237.
[14] Dong, M., and He, D., 2007, "Hidden Semi-Markov Model-Based Methodology for Multi-Sensor Equipment Health Diagnosis and Prognosis," Eur. J. Oper. Res., 178(3), pp. 858–878.