
A Comparative Study on Machine Learning Algorithms for Smart Manufacturing: Tool Wear Prediction Using Random Forests

Dazhong Wu (1)
Department of Industrial and Manufacturing Engineering, National Science Foundation Center for e-Design, Pennsylvania State University, University Park, PA 16802
e-mail: [email protected]

Connor Jennings
Department of Industrial and Manufacturing Engineering, National Science Foundation Center for e-Design, Pennsylvania State University, University Park, PA 16802
e-mail: [email protected]

Janis Terpenny
Department of Industrial and Manufacturing Engineering, National Science Foundation Center for e-Design, Pennsylvania State University, University Park, PA 16802
e-mail: [email protected]

Robert X. Gao
Department of Mechanical and Aerospace Engineering, Case Western Reserve University, Cleveland, OH 44106
e-mail: [email protected]

Soundar Kumara
Department of Industrial and Manufacturing Engineering, National Science Foundation Center for e-Design, Pennsylvania State University, University Park, PA 16802
e-mail: [email protected]

Manufacturers have faced an increasing need for the development of predictive models that predict mechanical failures and the remaining useful life (RUL) of manufacturing systems or components. Classical model-based or physics-based prognostics often require an in-depth physical understanding of the system of interest to develop closed-form mathematical models. However, prior knowledge of system behavior is not always available, especially for complex manufacturing systems and processes. To complement model-based prognostics, data-driven methods have been increasingly applied to machinery prognostics and maintenance management, transforming legacy manufacturing systems into smart manufacturing systems with artificial intelligence. While previous research has demonstrated the effectiveness of data-driven methods, most of these prognostic methods are based on classical machine learning techniques, such as artificial neural networks (ANNs) and support vector regression (SVR). With the rapid advancement in artificial intelligence, various machine learning algorithms have been developed and widely applied in many engineering fields. The objective of this research is to introduce a random forests (RFs)-based prognostic method for tool wear prediction as well as to compare the performance of RFs with feed-forward back propagation (FFBP) ANNs and SVR. Specifically, the performance of FFBP ANNs, SVR, and RFs is compared using an experimental dataset collected from 315 milling tests. Experimental results have shown that RFs can generate more accurate predictions than FFBP ANNs with a single hidden layer and SVR. [DOI: 10.1115/1.4036350]

Keywords: tool wear prediction, predictive modeling, machine learning, random forests (RFs), support vector machines (SVMs), artificial neural networks (ANNs), prognostics and health management (PHM)

1 Introduction

Smart manufacturing aims to integrate big data, advanced analytics, high-performance computing, and Industrial Internet of Things (IIoT) into traditional manufacturing systems and processes to create highly customizable products with higher quality at lower costs. As opposed to traditional factories, a smart factory utilizes interoperable information and communications technologies (ICT), intelligent automation systems, and sensor networks to monitor machinery conditions, diagnose the root cause of failures, and predict the remaining useful life (RUL) of mechanical systems or components. For example, almost all engineering systems (e.g., aerospace systems, nuclear power plants, and machine tools) are subject to mechanical failures resulting from deterioration with usage and age or abnormal operating conditions [1–3]. Some of the typical failure modes include excessive load, overheating, deflection, fracture, fatigue, corrosion, and wear. The degradation and failures of engineering systems or components will often incur higher costs and lower productivity due to unexpected machine downtime. In order to increase manufacturing productivity while reducing maintenance costs, it is crucial to develop and implement an intelligent maintenance strategy that allows manufacturers to determine the condition of in-service systems in order to predict when maintenance should be performed.

Conventional maintenance strategies include reactive, preventive, and proactive maintenance [4–6]. The most basic approach to maintenance is reactive, also known as run-to-failure maintenance planning. In the reactive maintenance strategy, assets are deliberately allowed to operate until failures actually occur. The assets are maintained on an as-needed basis. One of the disadvantages of reactive maintenance is that it is difficult to anticipate the maintenance resources (e.g., manpower, tools, and replacement parts) that will be required for repairs. Preventive maintenance is often referred to as use-based maintenance. In preventive maintenance, maintenance activities are performed after a specified period of time or amount of use, based on the estimated probability that the systems or components will fail in the specified time interval. Although preventive maintenance allows for more consistent and predictable maintenance schedules, more maintenance activities are needed as opposed to reactive maintenance. To improve

(1) Corresponding author.
Manuscript received October 25, 2016; final manuscript received March 13, 2017; published online April 18, 2017. Assoc. Editor: Laine Mears.

Journal of Manufacturing Science and Engineering JULY 2017, Vol. 139 / 071018-1
Copyright © 2017 by ASME


the efficiency and effectiveness of preventive maintenance, predictive maintenance is an alternative strategy in which maintenance actions are scheduled based on equipment performance or conditions instead of time. The objective of proactive maintenance is to determine the condition of in-service equipment and ultimately to predict the time at which a system or a component will no longer meet desired functional requirements.

The discipline that predicts health condition and remaining useful life (RUL) based on previous and current operating conditions is often referred to as prognostics and health management (PHM). Prognostic approaches fall into two categories: model-based and data-driven prognostics [7–12]. Model-based prognostics refers to approaches based on mathematical models of system behavior derived from physical laws or probability distributions. For example, model-based prognostics includes methods based on Wiener and Gamma processes [13], hidden Markov models (HMMs) [14], Kalman filters [15,16], and particle filters [17–20]. One limitation of model-based prognostics is that an in-depth understanding of the underlying physical processes that lead to system failures is required. Another limitation is the assumption that the underlying processes follow certain probability distributions, such as gamma or normal distributions. While probability density functions enable uncertainty quantification, distributional assumptions may not hold true in practice.

To complement model-based prognostics, data-driven prognostics refers to approaches that build predictive models using learning algorithms and large volumes of training data. For example, classical data-driven prognostics is based on autoregressive (AR) models, multivariate adaptive regression, fuzzy set theory, ANNs, and SVR. The unique benefit of data-driven methods is that an in-depth understanding of system physical behaviors is not a prerequisite. In addition, data-driven methods do not assume any underlying probability distributions, which may not be practical for real-world applications. While ANNs and SVR have been applied in the area of data-driven prognostics, little research has been conducted to evaluate the performance of other machine learning algorithms [21]. Because RFs have the potential to handle a large number of input variables without variable selection and they do not overfit [22–24], we investigate the ability of RFs to predict tool wear using an experimental dataset. Further, the performance of RFs is compared with that of FFBP ANNs and SVR using accuracy and training time.

The main contributions of this paper include the following:

- Tool wear in milling operations is predicted using RFs along with cutting force, vibration, and acoustic emission (AE) signals. Experimental results have shown that the predictive model trained by RFs is very accurate. The mean squared error (MSE) on the test tool wear data is as low as 7.67, and the coefficient of determination (R2) on the test tool wear data is up to 0.992. To the best of our knowledge, this is the first time the random forest algorithm has been applied to predict tool wear.
- The performance of ANNs, support vector machines (SVMs), and RFs is compared using an experimental dataset with respect to regression accuracy (e.g., MSE and R2) and training time. While the training time for RFs is longer than that of ANNs and SVMs, the predictive model built by RFs is the most accurate for the application example.

The remainder of the paper is organized as follows: Section 2 reviews the related literature on data-driven methods for tool wear prediction. Section 3 presents the methodology for tool wear prediction using ANNs, SVMs, and RFs. Section 4 presents the experimental setup and the experimental dataset acquired from different types of sensors (e.g., cutting force sensor, vibration sensor, and acoustic emission sensor) on a computer numerical control (CNC) milling machine. Section 5 presents experimental results, demonstrates the effectiveness of the three machine learning algorithms, and compares the performance of each. Section 6 provides conclusions, including a discussion of the research contribution and future work.

2 Data-Driven Methods for Tool Wear Prediction

Tool wear is the most commonly observed and unavoidable phenomenon in manufacturing processes such as drilling, milling, and turning [25–27]. The rate of tool wear is typically affected by process parameters (e.g., cutting speed and feed rate), cutting tool geometry, and the properties of the workpiece and tool materials. Taylor's equation for tool life expectancy [28] provides an approximation of tool wear. However, with the rapid advancement of sensing technology and the increasing number of sensors equipped on modern CNC machines, it is possible to predict tool wear more accurately using various measurement data. This section presents a review of data-driven methods for tool wear prediction.

Schwabacher and Goebel [29] conducted a review of data-driven methods for prognostics. The most popular data-driven approaches to prognostics include ANNs, decision trees, and SVMs in the context of systems health management. ANNs are a family of computational models based on biological neural networks which are used to estimate complex relationships between inputs and outputs. Bukkapatnam et al. [30–32] developed effective tool wear monitoring techniques using ANNs based on features extracted from the principles of nonlinear dynamics. Özel and Karpat [33] presented a predictive modeling approach for surface roughness and tool wear in hard turning processes using ANNs. The inputs of the ANN model include workpiece hardness, cutting speed, feed rate, axial cutting length, and the mean values of three force components. Experimental results have shown that the model trained by ANNs provides accurate predictions of surface roughness and tool flank wear. Palanisamy et al. [34] developed a predictive model for tool flank wear in end milling operations using feed-forward back propagation (FFBP) ANNs. Experimental results have shown that the predictive model based on ANNs can make accurate predictions of tool flank wear using cutting speeds, feed rates, and depths of cut. Sanjay et al. [35] developed a model for predicting tool flank wear in drilling using ANNs. Feed rates, spindle speeds, torques, machining times, and thrust forces were used to train the ANN model. The experimental results have demonstrated that ANNs can predict tool wear accurately. Chungchoo and Saini [36] developed an online fuzzy neural network (FNN) algorithm that estimates the average width of flank wear and the maximum depth of crater wear. A modified least-square backpropagation neural network was built to estimate flank and crater wear based on cutting force and acoustic emission signals. Chen and Chen [37] developed an in-process tool wear prediction system using ANNs for milling operations. A total of 100 experimental data points were used for training the ANN model. The input variables include feed rate, depth of cut, and average peak cutting forces. The ANN model can predict tool wear with an error of 0.037 mm on average. Paul and Varadarajan [38] introduced a multisensor fusion model to predict tool wear in turning processes using ANNs. A regression model and an ANN were developed to fuse the cutting force, cutting temperature, and vibration signals. Experimental results showed that the coefficient of determination was 0.956 for the regression model trained by the ANN. Karayel [39] presented a neural network approach for the prediction of surface roughness in turning operations. A feed-forward back-propagation multilayer neural network was developed to train a predictive model using data collected from 49 cutting tests. Experimental results showed that the predictive model has an average absolute error of 2.29%.

Cho et al. [40] developed an intelligent tool breakage detection system with the SVM algorithm by monitoring cutting forces and power consumption in end milling processes. Linear and polynomial kernel functions were applied in the SVM algorithm. It has been demonstrated that the predictive model built by SVMs can recognize process abnormalities in milling. Benkedjouh et al. [41] presented a method for tool wear assessment and remaining useful life prediction using SVMs. The features were extracted from cutting force, vibration, and acoustic emission signals. The experimental results have shown that SVMs can be used to estimate the



wear progression and predict RUL of cutting tools effectively. Shi
and Gindy [42] introduced a predictive modeling method by com-
bining least squares SVMs and principal component analysis
(PCA). PCA was used to extract statistical features from multiple
sensor signals acquired from broaching processes. Experimental
results showed that the predictive model trained by SVMs was
effective to predict tool wear using the features extracted by PCA.
Another data-driven method for prognostics is based on deci-
sion trees. Decision trees are a nonparametric supervised learning
method used for classification and regression. The goal of deci-
sion tree learning is to create a model that predicts the value of a
target variable by learning decision rules inferred from data fea-
tures. A decision tree is a flowchart-like structure in which each
internal node denotes a test on an attribute, each branch represents
the outcome of a test, and each leaf node holds a class label. Jia
and Dornfeld [43] proposed a decision tree-based method for the
prediction of tool flank wear in a turning operation using acoustic
emission and cutting force signals. The features characterizing the
AE root-mean-square and cutting force signals were extracted from
both time and frequency domains. The decision tree approach was
demonstrated to be able to make reliable inferences and decisions
on tool wear classification. Elangovan et al. [44] developed a deci-
sion tree-based algorithm for tool wear prediction using vibration
signals. Ten-fold cross-validation was used to evaluate the accuracy
of the predictive model created by the decision tree algorithm. The maximum classification accuracy was 87.5%. Arisoy and Özel [45] investigated the effects of machining parameters on surface microhardness and microstructure, such as grain size and fractions, using a random forests-based predictive modeling method along with finite element simulations. Predicted microhardness profiles and grain sizes were used to understand the effects of cutting speed, tool coating, and edge radius on surface integrity.

In summary, the related work presented in this section builds on previous research to explore how the conditions of tool wear can be monitored as well as how tool wear can be predicted using predictive modeling. While earlier work focused on prediction of tool wear using ANNs, SVMs, and decision trees, this paper explores the potential of a new method, random forests, for tool wear prediction. Further, the performance of RFs is compared with that of ANNs and SVMs. Because RFs are an extension of decision trees, the performance of RFs is not compared with that of decision trees.

3 Methodology

This section presents the methodology for data-driven prognostics for tool wear prediction using ANNs, SVR, and RFs. The input of ANNs, SVR, and RFs is the following labeled training data:

  $D = (x_i, y_i)$

where $x_i = (F_X, F_Y, F_Z, V_X, V_Y, V_Z, AE)$ and $y_i \in \mathbb{R}$. The description of these input data can be found in Table 1.

Table 1 Signal channel and data description

Signal channel    Data description
Channel 1         $F_X$: force (N) in X dimension
Channel 2         $F_Y$: force (N) in Y dimension
Channel 3         $F_Z$: force (N) in Z dimension
Channel 4         $V_X$: vibration (g) in X dimension
Channel 5         $V_Y$: vibration (g) in Y dimension
Channel 6         $V_Z$: vibration (g) in Z dimension
Channel 7         AE: acoustic emission (V)

3.1 Tool Wear Prediction Using ANNs. ANNs are a family of models inspired by biological neural networks. An ANN is defined by three types of parameters: (1) the interconnection pattern between different layers of neurons, (2) the learning process for updating the weights of the interconnections, and (3) the activation function that converts a neuron's weighted input to its output activation. Among the many types of ANNs, the feed-forward neural network is the first and the most popular ANN. Back-propagation is a learning algorithm for training ANNs in conjunction with an optimization method such as gradient descent.

[Fig. 1 Tool wear prediction using a feed-forward back-propagation (FFBP) ANN]

Figure 1 illustrates the architecture of the FFBP ANN with a single hidden layer. In this research, the ANN has three layers, including input layer i, hidden layer j, and output layer k. Each layer consists of one or more neurons or units, represented by the circles. The flow of information is represented by the lines between the units. The first layer has input neurons which act as buffers for distributing the extracted features (i.e., $F_i$) from the input data (i.e., $x_i$). The number of neurons in the input layer is the same as the number of features extracted from the input variables. Each value from the input layer is duplicated and sent to all neurons in the hidden layer. The hidden layer is used to process and connect the information from the input layer to the output layer in a forward direction. Specifically, the values entering a neuron in the hidden layer are multiplied by weights $w_{ij}$. Initial weights are randomly selected between 0 and 1. A neuron in the hidden layer sums up the weighted inputs and generates a single output. This value is the input of an activation function (sigmoid function) in the hidden layer $f_h$ that converts the weighted input to the output of the neuron. Similarly, the outputs of all the neurons in the hidden layer are multiplied by weights $w_{jk}$. A neuron in the output layer sums up the weighted inputs and generates a single value. An activation function in the output layer $f_o$ converts the weighted input to the predicted output $y_k$ of the ANN, which is the predicted flank wear VB. The output layer has only one neuron because there is only one response variable. The performance of ANNs depends on the topology or architecture of ANNs (i.e., the number of layers) and the number of neurons in each layer. However, there are no standard or well-accepted methods or rules for determining the number of hidden layers and neurons in each hidden layer. In this research, single-hidden-layer ANNs with 2, 4, 8, 16, and 32 neurons in the hidden layer are selected. The termination criterion of the training algorithm is that training stops if the fit criterion (i.e., least squares) falls below $1.0 \times 10^{-4}$.

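As a minimal, hypothetical sketch of the single-hidden-layer FFBP ANN configuration described in Sec. 3.1 (sigmoid activation, gradient-descent training, and a least-squares stopping tolerance of $1.0 \times 10^{-4}$), scikit-learn's MLPRegressor can be configured analogously. The feature matrix and wear values below are synthetic placeholders rather than the milling data, and this is an assumed mapping, not the authors' implementation:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins: 315 cuts x 28 extracted features, one wear value per cut.
rng = np.random.default_rng(0)
X = rng.normal(size=(315, 28))
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=315)

ann = MLPRegressor(
    hidden_layer_sizes=(16,),   # one hidden layer; 16 is one of the sizes tried
    activation="logistic",      # sigmoid activation, as in the paper
    solver="sgd",               # gradient-descent-style weight updates
    learning_rate_init=0.01,
    tol=1e-4,                   # stop when improvement falls below 1.0e-4
    max_iter=5000,
    random_state=0,
)
ann.fit(X, y)
pred = ann.predict(X[:5])       # predicted flank wear for five cuts
```

Note that MLPRegressor uses a single linear output unit, matching the paper's single response variable (flank wear VB), though its output activation is the identity rather than a configurable $f_o$.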


[Fig. 2 Tool wear prediction using SVR]

3.2 Tool Wear Prediction Using SVR. The original SVM for regression was developed by Vapnik and coworkers [46,47]. An SVM constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space, which can be used for classification and regression.

The framework of SVR for linear cases is illustrated in Fig. 2. Formally, SVR can be formulated as a convex optimization problem

  Minimize    $\frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*)$
  Subject to  $y_i - \langle \omega, x_i \rangle - b \le \varepsilon + \xi_i$
              $\langle \omega, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*$
              $\xi_i, \xi_i^* \ge 0$          (3.1)

where $\omega \in \chi$, $C = 1$, $\varepsilon = 0.1$, and $\xi_i, \xi_i^* = 0.001$. $b$ can be computed as follows:

  $b = y_i - \langle \omega, x_i \rangle - \varepsilon$  for $\alpha_i \in [0, C]$
  $b = y_i - \langle \omega, x_i \rangle + \varepsilon$  for $\alpha_i^* \in [0, C]$          (3.2)

For nonlinear SVR, the training patterns $x_i$ can be preprocessed by a nonlinear kernel function $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$, where $\Phi(x)$ is a transformation that maps $x$ to a high-dimensional space. These kernel functions need to satisfy Mercer's theorem. Many kernels have been developed for various applications. The most popular kernels include the polynomial, Gaussian radial basis function (RBF), and sigmoid kernels. In many applications, a nonlinear kernel function provides better accuracy. According to the literature [32,33], the Gaussian RBF kernel is one of the most effective kernel functions for tool wear prediction. In this research, the Gaussian RBF kernel is used to transform the input dataset $D = (x_i, y_i)$, where $x_i$ is the input vector and $y_i$ is the response variable (i.e., flank wear), into a new dataset in a high-dimensional space. The new dataset is linearly separable by a hyperplane in a higher-dimensional Euclidean space, as illustrated in Fig. 2. The slack variables $\xi_i$ and $\xi_i^*$ are introduced for the instances where the constraints are infeasible. The slack variables denote the deviation from predicted values with the error of $\varepsilon = 0.1$. The RBF kernel is $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$, where $\sigma^2 = 0.5$. At the optimal solution, we obtain

  $\omega = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\,\Phi(x_i)$  and  $f(x) = \sum_{i=1}^{\ell} (\alpha_i - \alpha_i^*)\,k(x_i, x) + b$          (3.3)

3.3 Tool Wear Prediction Using RFs. The random forest algorithm, developed by Breiman [22,48], is an ensemble learning method that constructs a forest of decision trees from bootstrap samples of a training dataset. Each decision tree produces a response, given a set of predictor values. In a decision tree, each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label for classification or a response for regression. A decision tree in which the response is continuous is also referred to as a regression tree. In the context of tool wear prediction, each individual decision tree in a random forest is a regression tree because tool wear describes the gradual failure of cutting tools. A comprehensive tutorial on RFs can be found in Refs. [22,48,49]. Some of the important concepts related to RFs, including bootstrap aggregating or bagging, splitting, and the stopping criterion, are introduced in Secs. 3.3.1–3.3.4.

3.3.1 Bootstrap Aggregating or Bagging. Given a training dataset $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, bootstrap aggregating or bagging generates $B$ new training datasets $D_i$ of size $N$ by sampling from the original training dataset $D$ with replacement. $D_i$ is referred to as a bootstrap sample. By sampling with replacement or bootstrapping, some observations may be repeated in each $D_i$. Bagging helps reduce variance and avoid overfitting. The number of regression trees $B$ is a parameter specified by users. Typically, a few hundred to several thousand trees are used in the random forest algorithm.

3.3.2 Choosing Variables to Split On. For each of the bootstrap samples, grow an un-pruned regression tree with the following procedure: at each node, randomly sample $m$ variables and choose the best split among those variables rather than choosing the best split among all predictors. This process is sometimes called "feature bagging." A random subset of the predictors or features is selected because it reduces the correlation of the trees that would arise in an ordinary bootstrap sample. For regression, the default is $m = p/3$.

3.3.3 Splitting Criterion. Suppose that a partition is divided into $M$ regions $R_1, R_2, \ldots, R_M$. The response is modeled as a constant $c_m$ in each region

  $f(x) = \sum_{m=1}^{M} c_m\, I(x \in R_m)$          (3.4)

The splitting criterion at each node is to minimize the sum of squares. Therefore, the best $\hat{c}_m$ is the average of $y_i$ in region $R_m$

  $\hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)$          (3.5)

Consider a splitting variable $j$ and split point $s$, and define the pair of half-planes

  $R_1(j, s) = \{X \mid X_j \le s\}$  and  $R_2(j, s) = \{X \mid X_j > s\}$          (3.6)

The splitting variable $j$ and split point $s$ should satisfy

  $\min_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]$          (3.7)

For any $j$ and $s$, the inner minimization is solved by

  $\hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s))$  and  $\hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s))$          (3.8)

Having found the best split, the dataset is partitioned into the two resulting regions, and the splitting process is repeated on each of them. This continues until a predefined stopping criterion is satisfied.

3.3.4 Stopping Criterion. Tree size is a tuning parameter governing the complexity of a model. The stopping criterion is that the splitting process proceeds until the number of records in $D_i$ falls below a threshold; five is used as the threshold.
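The split-selection rule of Eqs. (3.6)–(3.8) can be sketched directly: for each candidate variable j and split point s, fit each half-plane by its mean response and keep the pair minimizing the total sum of squares. The function name and the synthetic data below are hypothetical illustrations, not code from the study:

```python
import numpy as np

def best_split(X, y, feature_idx):
    """Return (j, s, sse): the splitting variable, split point, and total sum of
    squared errors per Eqs. (3.6)-(3.8), modeling each half-plane's response
    by its mean as in Eq. (3.5)."""
    best_j, best_s, best_sse = None, None, np.inf
    for j in feature_idx:                       # candidate splitting variables
        for s in np.unique(X[:, j]):            # candidate split points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if left.size == 0 or right.size == 0:
                continue                        # both half-planes must be nonempty
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_j, best_s, best_sse = j, s, sse
    return best_j, best_s, best_sse

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = np.where(X[:, 2] > 0.3, 5.0, 1.0)   # response driven entirely by feature 2
j, s, sse = best_split(X, y, feature_idx=[0, 1, 2, 3])
```

Passing a random subset of columns as `feature_idx` (m of the p predictors) corresponds to the "feature bagging" of Sec. 3.3.2.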

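Taken together, Secs. 3.3.1–3.3.4 describe standard random-forest regression, which an off-the-shelf implementation can reproduce. The sketch below maps the settings reported in this paper (B = 500 trees, m = 9 candidate variables per split, node-size threshold of five) onto scikit-learn's RandomForestRegressor; the data are synthetic placeholders, and the parameter mapping is our reading, not the authors' code:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-ins: N = 630 samples with p = 28 features.
rng = np.random.default_rng(3)
X = rng.normal(size=(630, 28))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=630)

rf = RandomForestRegressor(
    n_estimators=500,       # B = 500 regression trees
    max_features=9,         # m = p/3 = 9 candidate variables per split
    min_samples_split=5,    # do not split nodes with fewer than 5 records
    bootstrap=True,         # each tree is grown on a bootstrap sample of size N
    random_state=0,
)
rf.fit(X, y)

# The forest's prediction is the average of the individual trees' predictions.
x_new = X[:1]
manual = np.mean([tree.predict(x_new)[0] for tree in rf.estimators_])
```

Averaging over `rf.estimators_` by hand reproduces `rf.predict(x_new)`, which is exactly the ensemble-averaging step of the algorithm.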


[Fig. 3 Tool wear prediction using an RF]

After $B$ such trees $\{T_b\}_1^B$ are constructed, a prediction at a new point $x$ can be made by averaging the predictions from all the individual $B$ regression trees on $x$

  $\hat{f}_{rf}^{B}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$          (3.9)

The random forest algorithm [48,49] for regression is as follows:

(1) Draw a bootstrap sample $Z$ of size $N$ from the training data.
(2) For each bootstrap sample, construct a regression tree by splitting a node into two children nodes until the stopping criterion is satisfied.
(3) Output the ensemble of trees $\{T_b\}_1^B$.
(4) Make a prediction at a new point $x$ by aggregating the predictions of the $B$ trees.

The framework of predicting flank wear using an RF is illustrated in Fig. 3. In this research, a random forest is constructed using $B = 500$ regression trees. Given the labeled training dataset $D = (x_i, y_i)$, a bootstrap sample of size $N = 630$ is drawn from the training dataset. For each regression tree, $m = 9$ ($m = p/3$, $p = 28$) variables are selected at random from the 28 variables/features. The best variable/split point is selected among the nine variables. A regression tree progressively splits the training dataset into two child nodes: a left node (with samples $< z$) and a right node (with samples $\ge z$). A splitting variable and split point are selected by solving Eqs. (3.7) and (3.8). The process is applied recursively to the dataset in each child node. The splitting process stops if the number of records in a node is less than 5. An individual regression tree is built by starting at the root node of the tree, performing a sequence of tests on the predictors, and organizing the tests in a hierarchical binary tree structure as shown in Fig. 4. After 500 regression trees are constructed, a prediction at a new point can be made by averaging the predictions from all the individual binary regression trees on this point.

[Fig. 4 Binary regression tree growing process]

4 Experimental Setup

[Fig. 5 Experimental setup]

The data used in this paper were obtained from Li et al. [50]. Some details of the experiment are presented in this section. The experimental setup is shown in Fig. 5. The cutter material and workpiece material used in the experiment are high-speed steel and stainless steel, respectively. A detailed description of the operating conditions in the dry milling operation can be found in Table 2. The spindle speed of the cutter was 10,400 RPM. The feed rate was 1555 mm/min. The Y depth of cut (radial) was 0.125 mm. The Z depth of cut (axial) was 0.2 mm.

315 cutting tests were conducted on a three-axis high-speed CNC machine (Röders Tech RFM 760). During each cutting test, seven signal channels, including cutting force, vibration, and acoustic emission data, were monitored in real time. The sampling rate was 50 kHz/channel. Each cutting test took about 15 s. A stationary dynamometer, mounted on the table of the CNC machine, was used to measure cutting forces in three mutually perpendicular axes (x, y, and z dimensions). Three piezo accelerometers, mounted on the workpiece, were used to measure vibration in three mutually perpendicular axes (x, y, and z dimensions). An acoustic emission (AE) sensor, mounted on the workpiece, was

Journal of Manufacturing Science and Engineering JULY 2017, Vol. 139 / 071018-5

Downloaded From: http://manufacturingscience.asmedigitalcollection.asme.org/ on 03/10/2018 Terms of Use: http://www.asme.org/about-asme/terms-of-use
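The RF construction described above (B = 500 bootstrapped trees, m = p/3 = 9 candidate variables per split, and splitting stopped once a node holds fewer than five records) can be sketched from scratch in a few dozen lines. The code below is an illustrative stand-in, not the authors' implementation: function names, the default tree count, and the data shapes are assumptions, and standard-library Python is used in place of an optimized library.

```python
import random
import statistics

def build_tree(rows, targets, m, min_size=5):
    """Grow one regression tree. At each node, m features are sampled at
    random and the split minimizing the summed squared error of the two
    child nodes is kept; a node with fewer than min_size records becomes
    a leaf that predicts the mean response, as in the paper."""
    if len(rows) < min_size or len(set(targets)) == 1:
        return statistics.mean(targets)
    best = None
    for f in random.sample(range(len(rows[0])), m):
        # Skip the smallest value so both children are always nonempty.
        for z in sorted({r[f] for r in rows})[1:]:
            left = [t for r, t in zip(rows, targets) if r[f] < z]
            right = [t for r, t in zip(rows, targets) if r[f] >= z]
            lm, rm = statistics.mean(left), statistics.mean(right)
            err = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or err < best[0]:
                best = (err, f, z)
    if best is None:  # no valid split (identical feature values)
        return statistics.mean(targets)
    _, f, z = best
    lpairs = [(r, t) for r, t in zip(rows, targets) if r[f] < z]
    rpairs = [(r, t) for r, t in zip(rows, targets) if r[f] >= z]
    return (f, z,
            build_tree([r for r, _ in lpairs], [t for _, t in lpairs], m, min_size),
            build_tree([r for r, _ in rpairs], [t for _, t in rpairs], m, min_size))

def predict_tree(node, x):
    """Descend from the root to a leaf and return its mean response."""
    while isinstance(node, tuple):
        f, z, left, right = node
        node = left if x[f] < z else right
    return node

def random_forest(rows, targets, B=500, m=None, min_size=5):
    """Draw B bootstrap samples of size N = len(rows); grow one tree per
    sample with m = p/3 candidate features per split."""
    m = m or max(1, len(rows[0]) // 3)
    forest = []
    for _ in range(B):
        idx = [random.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], m, min_size))
    return forest

def predict_forest(forest, x):
    """Aggregate by averaging the B tree predictions."""
    return statistics.mean(predict_tree(t, x) for t in forest)
```

In practice, B would be set to 500 as in the paper; a much smaller B suffices to exercise the logic on toy data.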


Table 2 Operating conditions

Parameter        Value
Spindle speed    10,400 RPM
Feed rate        1555 mm/min
Y depth of cut   0.125 mm
Z depth of cut   0.2 mm
Sampling rate    50 kHz/channel
Material         Stainless steel

Table 3 List of extracted features

Cutting force (X, Y, Z dimensions)   Vibration (X, Y, Z dimensions)   Acoustic emission
Max                                  Max                              Max
Median                               Median                           Median
Mean                                 Mean                             Mean
Standard deviation                   Standard deviation               Standard deviation

used to monitor high-frequency oscillations that occur spontaneously within metals due to crack formation or plastic deformation. Acoustic emission is caused by the release of strain energy as the microstructure of the material is rearranged. After each cutting test, the value of tool wear was measured off-line using a microscope (Leica MZ12). The total size of the condition monitoring data is about 8.67 GB.

Fig. 7 Comparison of observed and predicted tool wear using SVR (termination criterion: slack variable or tolerance ξ is equal to 0.001)

5 Results and Discussion

In machine learning, feature extraction is an essential preprocessing step in which raw data collected from various signal channels are converted into a set of statistical features in a format supported by machine learning algorithms. The statistical features are then given as input to a machine learning algorithm. In this experiment, the condition monitoring data were collected from (1) cutting force, (2) vibration, and (3) acoustic emission signal channels. A set of 28 statistical features was extracted from these signals, including maximum, median, mean, and standard deviation, as listed in Table 3.

Three predictive models were developed using ANNs, SVR, and RFs, respectively. Two-thirds (2/3) of the input data (i.e., three datasets) were selected at random for model development (training). The remaining one-third (1/3) of the input data was used for model validation (testing). Figures 6-8 show the predicted against observed tool wear values on the test dataset using ANNs, SVR, and RFs, respectively. Figure 9 shows the tool wear against time with RFs.

In addition, the performance of the three algorithms was evaluated on the test dataset using accuracy and training time. Accuracy is measured using the R^2 statistic, also referred to as the coefficient of determination, and the mean squared error (MSE). In statistics, the coefficient of determination is defined as R^2 = 1 - (SSE/SST), where SSE is the sum of the squares of

Fig. 6 Comparison of observed and predicted tool wear using an ANN with 16 neurons in the hidden layer (termination criterion: tolerance is equal to 1.0 × 10^-4)

Fig. 8 Comparison of observed and predicted tool wear using RFs (termination criterion: minimum number of samples in each node is equal to 5)
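As described in Sec. 5, the preprocessing step condenses each of the seven monitored signal channels into the four statistics listed in Table 3, yielding the 28-feature input vector. The sketch below illustrates that step with standard-library Python; the channel names and synthetic signals are placeholders, not the authors' identifiers or data.

```python
import statistics

def extract_features(channels):
    """Reduce each raw signal to the four statistics of Table 3
    (max, median, mean, standard deviation). `channels` maps a
    channel name to its list of samples; seven channels yield
    7 x 4 = 28 features, as in the paper."""
    features = {}
    for name, signal in channels.items():
        features[name + "_max"] = max(signal)
        features[name + "_median"] = statistics.median(signal)
        features[name + "_mean"] = statistics.mean(signal)
        features[name + "_std"] = statistics.stdev(signal)
    return features

# Illustrative stand-in for the seven channels: three force axes,
# three vibration axes, and acoustic emission.
channels = {c: [0.1 * i * (h + 1) for i in range(100)]
            for h, c in enumerate(["force_x", "force_y", "force_z",
                                   "vib_x", "vib_y", "vib_z", "ae"])}
feature_vector = extract_features(channels)
assert len(feature_vector) == 28
```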



residuals and SST is the total sum of squares. The coefficient of determination is a measure that indicates the percentage of the response variable variation that is explained by a regression model. A higher R-squared indicates that more variability is explained by the regression model; for example, an R^2 of 100% indicates that the regression model explains all the variability of the response data around its mean. In general, the higher the R-squared, the better the regression model fits the data. The MSE of an estimator measures the average of the squares of the errors. The MSE is defined as MSE = (1/n) Σ_{i=1}^{n} (ŷ_i - y_i)^2, where ŷ_i is a predicted value, y_i is an observed value, and n is the sample size. The ANN, SVR, and RF algorithms use between 50% and 90% of the input data for model development (training) and use the remainder for model validation (testing). Because the performance of ANNs depends on the hidden layer configuration, five ANNs with a single hidden layer but different numbers of neurons were tested on the training dataset. Tables 4-8 list the MSE, R-squared, and training time for the ANNs with 2, 4, 8, 16, and 32 neurons. With respect to the performance of the ANNs, the training time increases as the number of neurons increases; however, the increase in training time is not significant, as shown in Fig. 10. In addition, while the prediction accuracy increases with the number of neurons, the performance is not significantly improved by adding more than eight neurons in the hidden layer, as shown in Figs. 11 and 12. Tables 9 and 10 list the MSE, R-squared, and training time for SVR and RFs. While the training time for RFs is longer than that of ANNs and SVR, the predictive model built by RFs is the most accurate, as shown in Figs. 10-12.

Fig. 9 Tool wear against time (cut) using RFs

Table 4 Accuracy on the test data and training time for the FFBP ANN with two neurons in the hidden layer

ANN (number of neurons = 2)

Training size (%)   MSE      R^2     Training time (s)
50                  49.790   0.951   0.049
60                  45.072   0.955   0.054
70                  45.626   0.956   0.055
80                  47.966   0.953   0.062
90                  48.743   0.955   0.056

Table 5 Accuracy on the test data and training time for the FFBP ANN with four neurons in the hidden layer

ANN (number of neurons = 4)

Training size (%)   MSE      R^2     Training time (s)
50                  43.428   0.958   0.122
60                  51.001   0.951   0.084
70                  43.645   0.958   0.093
80                  45.661   0.955   0.103
90                  45.058   0.958   0.118

Table 6 Accuracy on the test data and training time for the FFBP ANN with eight neurons in the hidden layer

ANN (number of neurons = 8)

Training size (%)   MSE      R^2     Training time (s)
50                  36.810   0.964   0.167
60                  34.168   0.968   0.186
70                  39.795   0.961   0.202
80                  44.175   0.957   0.197
90                  46.634   0.954   0.234

Table 7 Accuracy on the test data and training time for the FFBP ANN with 16 neurons in the hidden layer

ANN (number of neurons = 16)

Training size (%)   MSE      R^2     Training time (s)
50                  36.337   0.964   0.394
60                  41.420   0.959   0.412
70                  40.138   0.960   0.468
80                  42.486   0.957   0.506
90                  44.056   0.957   0.566

Table 8 Accuracy on the test data and training time for the FFBP ANN with 32 neurons in the hidden layer

ANN (number of neurons = 32)

Training size (%)   MSE      R^2     Training time (s)
50                  35.305   0.965   1.165
60                  38.612   0.963   1.301
70                  38.824   0.963   1.498
80                  42.469   0.959   1.496
90                  48.138   0.953   1.633

Fig. 10 Comparison of training times
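The two accuracy measures used above follow directly from their definitions, R^2 = 1 - SSE/SST and MSE = (1/n) Σ (ŷ_i - y_i)^2. The functions below are a generic sketch of those formulas, not code from the paper:

```python
def mse(y_pred, y_obs):
    """Mean squared error: (1/n) * sum((yhat_i - y_i)^2)."""
    return sum((p - o) ** 2 for p, o in zip(y_pred, y_obs)) / len(y_obs)

def r_squared(y_pred, y_obs):
    """Coefficient of determination: R^2 = 1 - SSE/SST, where SSE is the
    sum of squared residuals and SST is the total sum of squares about
    the mean of the observations."""
    mean_obs = sum(y_obs) / len(y_obs)
    sse = sum((o - p) ** 2 for p, o in zip(y_pred, y_obs))
    sst = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - sse / sst

# A perfect predictor gives MSE = 0 and R^2 = 1.
assert mse([1.0, 2.0], [1.0, 2.0]) == 0.0
assert r_squared([1.0, 2.0], [1.0, 2.0]) == 1.0
```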



6 Conclusions and Future Work

In this paper, the prediction of tool wear in milling operations was conducted using three popular machine learning algorithms: ANNs, SVR, and RFs. The performance of these algorithms was evaluated on the dataset collected from 315 milling tests, using mean squared error, R-squared, and training time as performance measures. A set of statistical features was extracted from cutting forces, vibrations, and acoustic emissions. The experimental results have shown that, while the training time on this particular dataset is longer for RFs than for the FFBP ANNs with a single hidden layer and for SVR, RFs generate more accurate predictions than either. The main contribution of this paper is twofold: (1) to the best of our knowledge, we demonstrated for the first time that a predictive model trained by RFs can predict tool wear in milling processes very accurately; and (2) we compared the performance of RFs with that of FFBP ANNs and SVR, and observed that RFs outperform FFBP ANNs and SVR for this particular application example.

Fig. 11 Comparison of MSEs

In the future, a comparison of the performance of SVR and RFs with that of other types of ANNs, such as recurrent neural networks and dynamic neural networks, will be conducted. In addition, our future work will focus on designing parallel implementations of machine learning algorithms that can be applied to large-scale and real-time prognosis.

Acknowledgment

The research reported in this paper is partially supported by NSF under Grant Nos. IIP-1238335 and DMDII-15-14-01. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation and the Digital Manufacturing and Design Innovation Institute.

Fig. 12 Comparison of R-squared errors

Table 9 Accuracy on the test data and training time for SVR with radial basis kernel

SVR

Training size (%)   MSE      R^2     Training time (s)
50                  54.993   0.946   0.060
60                  49.868   0.952   0.073
70                  41.072   0.959   0.088
80                  31.958   0.969   0.107
90                  23.997   0.975   0.126

Table 10 Accuracy on the test data and training time for RFs

RFs (500 trees)

Training size (%)   MSE      R^2     Training time (s)
50                  14.170   0.986   1.079
60                  11.053   0.989   1.386
70                  10.156   0.990   1.700
80                  8.633    0.991   2.003
90                  7.674    0.992   2.325
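The evaluation protocol behind Tables 4-10 (train on a random 50-90% of the data, time the fit, and score the held-out remainder) can be expressed generically for any learner. The sketch below uses a deliberately trivial mean predictor as a stand-in for the actual ANN, SVR, and RF models; the function names and synthetic usage are assumptions for illustration only.

```python
import random
import time

def evaluate(fit, X, y, fractions=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Mimic the protocol of Tables 4-10: for each training fraction,
    split the data at random, time the fit, and report the held-out
    MSE. `fit(X, y)` must return a callable predictor."""
    results = []
    for frac in fractions:
        idx = list(range(len(X)))
        random.shuffle(idx)
        cut = int(frac * len(X))
        train, test = idx[:cut], idx[cut:]
        t0 = time.perf_counter()
        model = fit([X[i] for i in train], [y[i] for i in train])
        train_time = time.perf_counter() - t0
        preds = [model(X[i]) for i in test]
        obs = [y[i] for i in test]
        mse = sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs)
        results.append((frac, mse, train_time))
    return results

def mean_fit(X, y):
    """Trivial stand-in learner: always predict the training mean."""
    mean_y = sum(y) / len(y)
    return lambda x: mean_y
```

Swapping `mean_fit` for a real regressor reproduces the (training size, MSE, training time) rows reported in the tables, up to randomness in the splits.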



