Mathematical Programming For Piecewise Linear Regression Analysis

Abstract
…demonstrate the efficiency of our proposed method. It is shown that our proposed piece-wise regression method can be solved to global optimality for datasets of thousands of samples, and that it consistently achieves higher prediction accuracy than a number of state-of-the-art regression methods. Another advantage of the proposed method is that the learned model can be conveniently expressed as a small number of if-then rules that are easily interpretable. Overall, this work proposes an efficient rule-based multivariate regression method based on piece-wise functions that achieves better prediction performance than state-of-the-art approaches. This novel method can benefit expert systems in various applications by automatically acquiring knowledge from databases to improve the quality of the knowledge base.
Keywords: regression analysis, surrogate model, piecewise linear function,
mathematical programming, optimisation
1. Introduction
…usually via some other intermediate variables, is known, yet is too complex and expensive to be evaluated comprehensively in feasible computational time. In this case, regression analysis is capable of approximating the overall system behaviour with much simpler functions that preserve a desired level of accuracy and can be evaluated much more cheaply (Caballero & Grossmann, 2008; Henao & Maravelias, 2011, 2010; Viana et al., 2014; Beck et al., 2012).
Over the past years, regression analysis has been established as a powerful tool in a wide range of applications, including: customer demand forecasting (Levis & Papageorgiou, 2005; Kone & Karwan, 2011), investigation of CO2 capture processes (Zhang & Sahinidis, 2013; Nuchitprasittichai & Cremaschi, 2013), optimisation of moving bed chromatography (Li et al., 2014b), forecasting of CO2 emissions (Pan et al., 2014), prediction of acidity constants of aromatic acids (Ghasemi et al., 2007), prediction of induction of apoptosis by different chemical components (Afantitis et al., 2006) and estimation of thermodynamic properties of ionic liquids (Chen et al., 2014; Wu et al., 2014).
Linear regression
Linear regression is one of the most classic types of regression analysis, which predicts the output variable as a linear combination of the input variables. The regression coefficients of the input variables are usually estimated using least squared error or least absolute error approaches, and the problems can be formulated as quadratic programming or linear programming problems, respectively, which can be solved efficiently. In cases where the estimated linear relationship fails to adequately describe the data, a variant of linear regression analysis, called polynomial regression, can be adopted to accommodate non-linearity (Khuri & Mukhopadhyay, 2010). In polynomial regression, higher degree polynomials of the original independent input variables are added as new input variables to the regression function, before the coefficients of the aggregated regression function are estimated. Second-degree polynomial functions have been used most frequently in the literature due to their robust performance and computational efficiency (Khayet et al., 2008; Minjares-Fuentes et al., 2014).
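As a concrete illustration of the linear programming route mentioned above, the sketch below casts least-absolute-error linear regression as an LP with SciPy. This is an illustration added for this text, not code from the paper; variable layout and solver choice are assumptions.

```python
# Least-absolute-error linear regression as a linear program.
# Decision variables are [w_1..w_M, b, d_1..d_S]; we minimise sum(d_s)
# subject to d_s >= y_s - (w.x_s + b) and d_s >= (w.x_s + b) - y_s.
import numpy as np
from scipy.optimize import linprog

def lad_regression(X, y):
    S, M = X.shape
    c = np.concatenate([np.zeros(M + 1), np.ones(S)])      # minimise sum of d_s

    # Inequalities in the form A_ub @ z <= b_ub
    A_pos = np.hstack([-X, -np.ones((S, 1)), -np.eye(S)])  # y - (Xw+b) <= d
    A_neg = np.hstack([X, np.ones((S, 1)), -np.eye(S)])    # (Xw+b) - y <= d
    A_ub = np.vstack([A_pos, A_neg])
    b_ub = np.concatenate([-y, y])

    bounds = [(None, None)] * (M + 1) + [(0, None)] * S    # errors non-negative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w, b = res.x[:M], res.x[M]
    return w, b
```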
SVR
Support vector machine is a well-established statistical learning algorithm, which fits a hyperplane to the data in hand (Smola & Schölkopf, 2004). SVR minimises two terms in the objective function, one of which is the ε-insensitive loss function, i.e. only sample training errors greater than a user-specified threshold, ε, are considered in the loss function. The other term is the model complexity, expressed as the sum of squared regression coefficients. Controlling model complexity usually ensures model generalisation, i.e. high prediction accuracy on testing samples. Another user-specified trade-off parameter balances the significance of the two terms (Chang & Lin, 2011; Bermolen & Rossi, 2009). One of the most important features that contribute to the competitiveness of SVR is the kernel trick, which maps the dataset from the original space to a higher-dimensional inner product space, where a linear regression is equivalent to a non-linear regression function in the original space (Li et al., 2000). A number of kernel functions can be employed, e.g. polynomial functions, radial basis functions and Fourier series (Levis & Papageorgiou, 2005). Formulated as a convex quadratic programming problem, SVR can be solved to global optimality. Despite the simplicity and optimality of SVR, tuning its two parameters, i.e. the training error tolerance ε and the trade-off parameter balancing model complexity and accuracy, and selecting a suitable kernel still considerably affect its prediction accuracy (Lu et al., 2009; Cherkassky & Ma, 2004).
Kriging
Kriging is a spatial interpolation-based regression analysis methodology (Kleijnen & Beers, 2004). Given a query sample, kriging estimates its output as a weighted sum of the outputs of the known nearby samples. The weights of the samples are computed solely from the data by considering sample closeness and redundancy, instead of being given by an arbitrary decreasing function of distance (Kleijnen, 2009). The interpolation nature of kriging means that the derived interpolant passes through the given training data points, i.e. the error between predicted output and real output is zero for all training samples. Different variants of kriging have been developed in the literature, including the most popular ordinary kriging (Lloyd & Atkinson, 2002; Zhu & Lin, 2010) and universal kriging (Brus & Heuvelink, 2007; Sampson et al., 2013).
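An illustrative sketch, assuming scikit-learn: Gaussian process regression with a constant mean is mathematically equivalent to simple/ordinary kriging, so it can serve as a stand-in for the kriging variants cited above.

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

kernel = ConstantKernel() * RBF(length_scale=1.0)
krig = GaussianProcessRegressor(kernel=kernel, alpha=1e-10)  # alpha~0: exact interpolation
# krig.fit(X_train, y_train) makes predictions pass (almost) exactly
# through the training points, matching the interpolation property above.
```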
MARS
MARS (Friedman, 1991) is another type of regression analysis that accommodates non-linearity and interactions between independent input variables in its functional relationship. Non-linearity is introduced into MARS in the form of so-called hinge functions, which are expressions with max operators of the form max(0, X − const). If the independent variable X is greater than the constant const, the hinge function equals X − const; otherwise it equals 0. The hinge functions create knots in the prediction surface of MARS. The functional form of MARS is a weighted sum of a constant, hinge functions and products of multiple hinge functions, which makes it suitable for modelling a wide range of non-linearities (Andrés et al., 2011).

The building of MARS usually consists of two steps: a forward addition step and a backward deletion step. In the forward addition step, MARS starts from a single intercept term/constant and iteratively adds the pair of hinge functions (i.e. max(0, X − const) and max(0, const − X)) that leads to the largest reduction in training error. Afterwards, a backward deletion step, which removes one by one those hinge functions contributing insignificantly to the model accuracy, is employed to improve the generalisation of the final model (Leathwick et al., 2006; Balshi et al., 2009). The presence of hinge functions also makes MARS a piece-wise regression method.
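A small sketch (illustrative, not the paper's implementation) of a MARS-style basis: a mirrored pair of hinge functions with a knot at `const`, plus a product term modelling an interaction; the weights `w0..w3` are assumed given.

```python
import numpy as np

def hinge(x, const):
    return np.maximum(0.0, x - const)        # max(0, X - const)

def mirrored_pair(x, const):
    return hinge(x, const), np.maximum(0.0, const - x)

# A MARS prediction is a weighted sum of such terms, e.g.:
# y_hat = w0 + w1*hinge(x1, 3.0) + w2*np.maximum(0, 3.0 - x1) \
#            + w3*hinge(x1, 3.0)*hinge(x2, 1.5)   # interaction term
```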
MLP
Multilayer perceptron is a feedforward artificial neural network whose structure is inspired by the organisation of biological neural networks (Hill et al., 1994). An MLP typically consists of an input layer of measurable features and an output layer of response variables, sandwiching multiple intermediate layers of neurons. The network is fully interconnected in the sense that the neurons in each layer are connected to all the neurons in the two neighbouring layers (Comrie, 1997; Gevrey et al., 2003). Each neuron in the intermediate layers takes a weighted linear combination of the outputs of all neurons in the previous layer as input and applies a non-linear transformation function before supplying its output to all neurons of the next layer. The use of non-linear transformation functions, including sigmoid, hyperbolic tangent and logarithmic functions, makes MLP suitable for modelling highly non-linear relationships (Gevrey et al., 2003; Rafiq et al., 2001).
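A hedged sketch, assuming scikit-learn, of the MLP described above: two intermediate layers with hyperbolic tangent activations (the layer sizes are illustrative choices, not the paper's architecture).

```python
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(hidden_layer_sizes=(32, 16),  # two intermediate layers
                   activation="tanh",            # non-linear transformation
                   max_iter=2000, random_state=0)
# Each run of a stochastic optimiser may reach a different local optimum,
# which is the non-determinism contrasted with OPLRA later in the paper.
```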
Random forest
Before introducing random forest we first describe the regression tree, which is a decision tree-based prediction model. Starting from the entire set of samples, a regression tree selects one independent input variable and performs a binary split into two child sets, under the condition that the two child nodes give increased purity of the data compared with their single parent node. Purity is often defined as the deviation from predicting with the mean value of the output variable. The process of binary splitting is recursively applied to each child node until a terminating criterion is satisfied. The nodes that are not further partitioned are called leaves. After growing a large tree, a pruning process is employed to remove the leaves contributing insignificantly to the purity improvement (Breiman et al., 1984; Loh, 2011). In order to improve model fit, a linear regression model can be fitted for each leaf (Quinlan, 1992).
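A short sketch, assuming scikit-learn: a random forest averages many regression trees, each grown as described above on a bootstrap sample with a random subset of features considered at each split (parameter values are illustrative).

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=500,     # number of trees
                           max_features="sqrt",  # features tried per split
                           random_state=0)
# rf.fit(X_train, y_train); the prediction averages the trees' leaf values.
```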
KNN
KNN belongs to the category of lazy learning algorithms, as prediction is based directly on the stored instances without an explicit training phase of constructing a model, making it one of the simplest regression methods in the literature (Korhonen & Kangas, 1997). Given an enquiry sample, KNN first identifies the K closest instances in the training sample set, where the exact value of K is given a priori. The closeness of samples can be measured by different distance metrics, for example Euclidean and Manhattan distances (Scheuber, 2010; Eronen & Klapuri, 2010). The prediction is then taken as the weighted mean of the outputs of the K nearest neighbours, with the weights often defined as the inverse of distance (Papadopoulos et al., 2011). Despite its simplicity, KNN usually provides competitive prediction performance against much more sophisticated algorithms.
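A compact sketch (illustrative) of KNN regression with inverse-distance weighting, matching the description above: find the K nearest training samples and average their outputs, weighted by 1/distance.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, K=5):
    d = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    idx = np.argsort(d)[:K]                         # K nearest neighbours
    w = 1.0 / (d[idx] + 1e-12)                      # inverse-distance weights
    return np.sum(w * y_train[idx]) / np.sum(w)     # weighted mean output
```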
Previous work on piecewise regression
Piecewise functions have also been frequently studied in the literature. In (Toms & Lesperance, 2003), univariate piece-wise linear functions are used to fit ecological data and identify break-points that represent critical threshold values of a phenomenon. In (Strikholm, 2006), a method based on statistical testing is proposed to estimate the number of break-points of a univariate piece-wise linear function. Malash & El-Khaiary (2010) also apply piece-wise linear regression techniques to univariate experimental adsorption data, where the piece-wise function is determined by solving a non-linear programming model. SegReg (www.waterlog.info/segreg.htm) is free software that permits the estimation of piece-wise regression functions with up to two independent variables. For one independent variable, SegReg considers a series of candidate break-points and, for each one, fits a linear regression on either side of the break-point. The break-point corresponding to the largest statistical confidence is taken as the final solution. In the case of two independent variables, SegReg first determines the two-region piece-wise regression function between the dependent variable and the most significant input variable, before computing the relation between its residual/deviation and the second input variable.
Both Magnani & Boyd (2009) and Toriello & Vielma (2012) have published work on data fitting with a special family of piece-wise regression functions, called max-affine functions. A max-affine function is defined as the maximum of a series of linear functions, i.e. a sample is projected onto all linear functions, and the maximum projected value is taken as the final predicted value of the piece-wise function. The use of max-affine functions limits the fitted surface to be convex. In (Magnani & Boyd, 2009) a heuristic method is used to ease the difficulty of directly solving the highly non-linear max-affine fitting problem, while in (Toriello & Vielma, 2012) big-M constraints are used to reformulate the problem into a non-convex mixed integer non-linear programming model. However, computational complexity limits their applications to examples of small scale.
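A one-line sketch (illustrative) of max-affine prediction: project the sample onto every linear function and keep the maximum, which forces the fitted surface to be convex, as noted above. `W` is a (K, M) matrix of slopes and `b` a length-K vector of intercepts, assumed already fitted.

```python
import numpy as np

def max_affine_predict(W, b, x):
    return np.max(W @ x + b)   # maximum over the K linear functions
```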
More recently, Greene et al. (2015) apply piece-wise regression analysis to predict patients' post-treatment quality of life from the pre-treatment quality of life measure, identifying the segments where therapy benefits vary significantly. The analysis is performed using Segmented (Muggeo, 2008), a package written in R (R Development Core Team, 2008). Segmented formulates the problem using a non-linear model and requires the user to specify the segmented input variables, the number of break-points and an initial guess of each break-point. Starting from those user-supplied initial positions, Segmented iteratively moves around the neighbourhood of the initial guesses to search for break-points of better quality using local linearisation. However, it is difficult if not impossible to reasonably guess good starting points for real world multivariate problems with large numbers of samples and input variables, where visual examination cannot be performed. This makes it hard to identify quality solutions. Furthermore, Segmented only allows the partitioned input variables to have different regression coefficients across different segments, while the other input variables keep the same coefficients over their entire ranges, significantly restricting its flexibility.
In both (Xue et al., 2013) and (Li et al., 2014a), piece-wise regression functions were employed to detect vegetation changes; piece-wise linear regression was tackled using fuzzy logic to identify changes in patterns of vegetation greenness. Cavanaugh et al. (2014) employ piece-wise regression and find that the change in mangrove area over the last 20 years is a piece-wise function of latitude, with regions above and below a specific threshold latitude following two different patterns of mangrove growth. Moreover, Matthews et al. (2014) use 2-segment piece-wise functions to describe the relationship between species richness and fragment area of islands, with the critical break-point determined by simply sampling a number of candidate values and selecting the one giving the best model fit. Unfortunately, the above methods are all limited to modelling rather simple relationships between one output variable and one input variable, seriously limiting their usage in more complex problems.
The proposed piece-wise regression method can help construct expert systems in various application domains. Expert systems are computer programs designed to make decisions analogous to human experts. As an expert system is typically made up of an inference engine and a knowledge base, the quality and quantity of the information in the knowledge base directly affects the usefulness of the constructed expert system. Our proposed piece-wise regression method can help build expert systems more efficiently via automatic and efficient acquisition of knowledge. More specifically, the proposed piece-wise regression method can extract latent knowledge from large collections of databases curated by domain experts. The discovered knowledge is represented in the form of identified relationships between input and output variables of interest, which can be combined with expert knowledge to form the final expert system (Alonso et al., 2012). For example, the proposed piece-wise regression method can be used for building prognostic expert systems in medical applications. When presented with historical data of patients' clinical variables and survival length, piece-wise regression can induce domain knowledge by approximating the complex relationship between clinical variables and survival length. The induced knowledge can then be used to perform prognosis for current patients, imitating the end-behaviour of human experts, i.e. medical doctors.
Overall, the key contributions of our work are listed below:

• Given that neither the feature to be segmented nor the number of segments is typically known a priori, a heuristic solution procedure is also introduced that automatically identifies the key partition variable and the final number of segments.
• Our proposed regression method has the advantage of being easily understandable and interpretable, as the learned model can be conveniently represented as a small set of rules.
2. Method
A novel piecewise linear regression method is proposed in this work. The core idea of the proposed method is to identify a single input feature and separate the samples into complementary regions along this feature. A different linear regression function is fitted locally for each region. The sample partition and the calculation of the local regression coefficients are performed simultaneously within the proposed optimisation model so as to achieve the least absolute error.
The indices, parameters and variables associated with the proposed model are listed below:
Indices
s    sample, s = 1, 2, ..., S
m    feature/independent input variable, m = 1, 2, ..., M
r    region, r = 1, 2, ..., R
m*   the feature where sample partition takes place

Parameters
A_{sm}    numeric value of sample s on feature m
Y_s       output value of sample s
U', U''   arbitrarily large positive numbers

Continuous variables
W_m^r      regression coefficient for feature m in region r
B^r        intercept of regression function in region r
Pred_s^r   predicted output for sample s in region r
X_{m*}^r   break-point r on partition feature m*
D_s        training error between predicted output and real output for sample s

Binary variables
F_s^r      1 if sample s falls into region r; 0 otherwise
Assume first that both the partition feature m* and the number of regions R are given; the R−1 break-points are arranged in increasing order:

$$X_{m^*}^{r-1} \le X_{m^*}^{r} \qquad \forall\, r = 2, 3, ..., R \quad (1)$$
Binary variable $F_s^r$ indicates whether sample s falls into region r or not. Modelling of which sample belongs to which region is achieved with the following constraints:

$$X_{m^*}^{r-1} - U'(1 - F_s^r) \le A_{s,m^*} \qquad \forall\, s,\; r = 2, 3, ..., R \quad (2)$$

$$A_{s,m^*} \le X_{m^*}^{r} + U'(1 - F_s^r) \qquad \forall\, s,\; r = 1, 2, ..., R-1 \quad (3)$$

When sample s belongs to region r (i.e. $F_s^r = 1$), $A_{s,m^*}$ falls into the interval bounded by the two consecutive break-points $X_{m^*}^{r-1}$ and $X_{m^*}^{r}$ on feature m*; otherwise the two sets of constraints become redundant. A visualisation of break-points and regions is provided in Figure 1.
The following constraints restrict each sample to belong to one and only one region:

$$\sum_{r} F_s^r = 1 \qquad \forall\, s \quad (4)$$
For sample s, its predicted output value for region r, $Pred_s^r$, is given by:

$$Pred_s^r = \sum_{m} A_{sm} W_m^r + B^r \qquad \forall\, s, r \quad (5)$$
For any sample s, its training error $D_s$ equals the absolute deviation between the real output and the predicted output of the region r to which it belongs (i.e. $F_s^r = 1$); the absolute value is linearised by the following pair of constraints:

$$D_s \ge Y_s - Pred_s^r - U''(1 - F_s^r) \qquad \forall\, s, r \quad (6)$$

$$D_s \ge Pred_s^r - Y_s - U''(1 - F_s^r) \qquad \forall\, s, r \quad (7)$$

The objective is to minimise the total absolute training error over all samples:

$$\min \sum_{s} D_s \quad (8)$$
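The formulation above can be prototyped directly in an algebraic modelling layer. Below is a hedged sketch of constraints (1)–(8) using the PuLP library; the paper does not specify an implementation, so the solver (PuLP's default CBC), the variable names and the concrete big-M value `U` are all assumptions made for illustration.

```python
import pulp

def oplra(A, Y, m_star, R, U=1e4):
    """Solve OPLRA for a fixed partition feature m_star and region count R."""
    S, M = len(A), len(A[0])
    prob = pulp.LpProblem("OPLRA", pulp.LpMinimize)
    W = pulp.LpVariable.dicts("W", (range(R), range(M)))      # coefficients
    B = pulp.LpVariable.dicts("B", range(R))                  # intercepts
    X = pulp.LpVariable.dicts("X", range(1, R))               # break-points 1..R-1
    D = pulp.LpVariable.dicts("D", range(S), lowBound=0)      # training errors
    F = pulp.LpVariable.dicts("F", (range(S), range(R)), cat="Binary")

    prob += pulp.lpSum(D[s] for s in range(S))                # objective (8)
    for r in range(2, R):
        prob += X[r - 1] <= X[r]                              # ordering (1)
    for s in range(S):
        prob += pulp.lpSum(F[s][r] for r in range(R)) == 1    # one region (4)
        for r in range(R):                                    # r = 0 is paper's region 1
            pred = pulp.lpSum(A[s][m] * W[r][m] for m in range(M)) + B[r]  # (5)
            prob += D[s] >= Y[s] - pred - U * (1 - F[s][r])   # error bound (6)
            prob += D[s] >= pred - Y[s] - U * (1 - F[s][r])   # error bound (7)
            if r >= 1:                                        # lower break-point (2)
                prob += X[r] - U * (1 - F[s][r]) <= A[s][m_star]
            if r <= R - 2:                                    # upper break-point (3)
                prob += A[s][m_star] <= X[r + 1] + U * (1 - F[s][r])
    prob.solve()
    return W, B, X, pulp.value(prob.objective)
```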
The final model, named Optimal Piece-wise Linear Regression Analysis (OPLRA) in this work, consists of a linear objective function and linear constraints, and the presence of both binary and continuous variables defines an MILP problem, which can be solved to global optimality by standard solution algorithms, for example branch and bound. A heuristic solution procedure is also employed in this work to identify the partition feature and the number of regions, as described in Figure 2 below.
The heuristic procedure starts by solving a linear regression on the entire set of data with least absolute deviation. Subsequently, each input feature in turn serves as partition feature m* and the OPLRA model is solved allowing two regions (i.e. R = 2). The feature corresponding to the minimum training error is kept, and if its error represents a percentage reduction of more than β from the global linear regression without data partition, the procedure continues; otherwise it is decided that two-region piecewise linear regression does not provide a desirable improvement upon classic linear regression, and the initially derived linear regression function without sample partition is used for prediction. The parameter β, taking a value between 0 and 1, quantifies the percentage reduction in training error that justifies adding one more region.
Figure 2: Heuristic procedure to identify the partition feature and the number of regions
If two-region piecewise regression is accepted, the corresponding partition feature is retained for further analysis while the number of regions is iteratively increased, until the β training error reduction criterion is no longer satisfied between iterations.
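A schematic rendering of the procedure in Figure 2 may help; in the sketch below, `fit_lad_linear`, `oplra_error` and `oplra_model` are hypothetical helpers (e.g. wrappers around the MILP sketch above) and are not part of the paper.

```python
def heuristic_procedure(A, Y, features, beta=0.03):
    # Global least-absolute-deviation linear regression (no partition).
    err_lin, lin_model = fit_lad_linear(A, Y)          # hypothetical helper
    # Try every feature as the partition feature with R = 2 regions.
    err2, m_star = min((oplra_error(A, Y, m, R=2), m) for m in features)
    if err2 > (1 - beta) * err_lin:                    # reduction below beta
        return lin_model                               # keep plain regression
    R, err = 2, err2
    while True:
        err_next = oplra_error(A, Y, m_star, R + 1)    # add one more region
        if err_next > (1 - beta) * err:                # improvement too small
            return oplra_model(A, Y, m_star, R)        # stop at current R
        R, err = R + 1, err_next
```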
The constructed piecewise linear regression functions are then used to predict the output values of new samples. A testing sample is first assigned to one of the regions, and the regression coefficients of that region are used to estimate its output value.
In order to better illustrate the training of the proposed regression method, a simulation model is taken from the literature. In brief, the illustrative example (Palmer & Realff, 2002) describes the operation of a continuous stirred tank reactor, where a chain reaction A → B → C takes place. An inlet stream containing both reactants A and B enters the reactor, and the desirable output is component B. There are 4 independent input variables to the simulation model: the temperature of the reactor (T), the volume of the reactor (V), and the concentrations of A and B in the inlet stream (C_A,in and C_B,in). The output to be predicted is the production rate of B (P). The process and associated variables are described in Figure 3.
Figure 3: Illustrative example of a continuous stirred tank reactor
With the Latin hypercube sampling technique (Helton & Davis, 2003) employed to specify a set of data points, we run the simulation model and collect 300 samples. The goal of the regression analysis is to approximate the functional relationship between the output variable P and the input variables T, V, C_A,in and C_B,in using piece-wise linear functions. The step-wise description of the training procedure is presented in Table 1 below.
Initially, a linear regression function is fitted to the entire dataset without feature segmentation, which gives an absolute deviation of 1677.78. The second iteration of the method solves 4 independent OPLRA models allowing 2 regions each, respectively specifying T, V, C_A,in and C_B,in as the partition feature. The two-region piece-wise linear functions constructed while partitioning on T yield a lower training error (i.e. 1030.63) than the other 3, and this is therefore taken as the solution of iteration 2. It represents a significant improvement (i.e. 38.57%) on the initial global linear regression function. From iteration 3, the partition feature is fixed as T while one more region is allocated at each subsequent iteration. Iterations 3 and 4 lower the training error to 876.66 and 807.12, respectively. The iterative procedure terminates when the β criterion is no longer satisfied, e.g. if β = 20%, the iterative procedure terminates at the third iteration and the final regression function has 2 regions; if β = 10%, the final regression function has 3 regions.
...
Overall, the key features of our proposed piecewise linear regression method are summarised here: 1) our method identifies one key partition feature and separates the samples into multiple complementary regions along it, 2) each region has the flexibility of being fitted by its own linear regression function, with all input features allowed to have different regression coefficients across different regions, 3) there is only one tuning parameter, β, and 4) compared with algorithms like kernel-based SVR and MLP, the constructed regression function is easy to understand, as it exhibits linear relationships within the different regions.
It is noted here that the obtained relationship between input and output variables, presented as rules in Table 1, can be used to build an expert system for the above operation. Given the chain reaction A → B → C in the stirred tank reactor (Palmer & Realff, 2002), domain experts perform experiments to create a database of samples for different levels of temperature, reactor volume and reactant concentrations. Our proposed piece-wise regression method is then applied to automatically extract the rules that predict the production rate from temperature, reactor volume and reactant concentrations. Such rules would be difficult to provide directly even for chemical engineering experts, due to the complex nature of the reaction. Since the extracted rules can calculate a production rate value for any arbitrary values of temperature, tank volume and reactant concentrations, regardless of whether they obey physical laws (they must be positive) or are valid for the reaction of interest, expert knowledge should be incorporated to further refine the rules. For example, expert knowledge can be used to constrain the applicable temperature range, outside which the liquid phase will vaporise to gas or freeze to solid, making it impossible for the reaction to proceed as normal. The final expert system will allow users to query the likely outcome, as a production rate or no reaction, of any combination of values of temperature, reactor volume and reactant concentrations.
In the next section, a number of real world regression problems are employed
to benchmark the predictive performance of our proposed model.
A total of 7 real world datasets have been downloaded from the UCI machine learning repository (http://archive.ics.uci.edu/ml/) (Bache & Lichman, 2013) to test the prediction performance of our proposed method. The first regression problem, Yacht Hydrodynamics, predicts the hydrodynamic performance of sailing yachts from 7 features describing the hull dimensions and velocity of the boat, for 308 samples. Energy Efficiency (Tsanas & Xifara, 2012) collects data corresponding to 768 building shapes, described by 8 features including wall area, roof area and so on. The aims are to establish the relationship between either the heating load or the cooling load requirement and the 8 parameters of the building. The third example, Concrete Strength (Yeh, 1998), looks into the relationship between the compressive strength of concrete and 8 input variables, including water concentration and age, with 1030 samples of different concretes. The Airfoil dataset concerns how different airfoil blade designs, wind speeds and angles of attack affect the sound pressure level. The last 2 case studies, Red Wine Quality and White Wine Quality (Cortez et al., 2009), aim to predict experts' preference of red and white wine taste from 11 physicochemical features of the wines. Almost 1600 red wine and 4900 white wine samples have been obtained for analysis.
Figure 4: Sensitivity analysis of β. The numbers above points in each plot correspond to the
average numbers of final regions.
Figure 4 describes how the mean absolute error changes with β. The numbers attached to the points in each plot are the average numbers of final regions, which always go up as β decreases. For the Yacht Hydrodynamics example, setting β = 0.20 results in just over 4 final regions. Decreasing the β value to 0.15 slightly increases the prediction error with a marginally higher number of regions. Further decreasing β to 0.10 leads to the lowest mean prediction error of 0.648 with an average of 5 regions, before excessively low values of β over-fit, yielding much increased prediction error on the unseen testing samples. For the Energy Efficiency Heating case study, when β = 0.10, 0.15 and 0.20 our proposed regression method constructs piece-wise regression functions with an average of 3 regions, yielding an MAE of 0.907. Smaller values of β lead to about 5 regions, which are shown to predict the testing samples with higher accuracy (MAE around 0.810). In the Energy Efficiency Cooling and Concrete Strength examples, a similar phenomenon can be observed: when β takes overly high values (i.e. 0.20, 0.15), the proposed method terminates prematurely with only 2 regions and relatively high MAE. More regions are allowed by lowering β, which gives higher prediction accuracies. On the Airfoil case study, the proposed method outputs global multiple linear regression functions without data partition when β = 0.20. As β decreases, more regions are permitted, which predict unseen samples with better accuracy. With regards to the Red Wine Quality dataset, the optimal prediction occurs when β = 0.03. On the last example, White Wine Quality, the 2-region piece-wise regression functions achieved with β = 0.01, 0.03, 0.05 outperform the global multiple linear regressions obtained for higher values of β.
It can be seen from Figure 4 that values of β between 0.01 and 0.05 generally lead to smaller prediction errors than higher values. For all datasets except Yacht Hydrodynamics, the prediction errors for β = 0.01, 0.03 and 0.05 are evidently smaller than those for β = 0.10, 0.15 and 0.20. Within the range between 0.01 and 0.05 there is no clear optimal value for β, as different values have different effects on the accuracy. We instead seek to identify the most robust value for β, which gives consistently desirable prediction accuracy across a wide range of problems. For each dataset, we normalise the MAE of each β according to the formula:

$$\frac{MAE_\beta - \min_\beta MAE_\beta}{\min_\beta MAE_\beta}$$

For example, in Yacht Hydrodynamics, the original MAE for β = 0.01 is normalised from 0.7131 to (0.7131 − 0.6481)/0.6481 = 10.0%, where 0.6481 is the lowest MAE, achieved when β = 0.10. The normalised MAE of each β represents its actual deviation from the lowest error, and is averaged over all examples to reflect its overall competitiveness.
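A two-line sketch of the normalisation above, reusing the two MAE values quoted for Yacht Hydrodynamics (the other β values are omitted rather than invented):

```python
mae = {0.01: 0.7131, 0.10: 0.6481}                      # values from the text
best = min(mae.values())
norm = {b: (v - best) / best for b, v in mae.items()}   # normalised MAE per beta
print(norm[0.01])  # ~0.100, i.e. the 10.0% deviation reported above
```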
After identifying a robust value (i.e. 0.03) for the only tuning parameter β in our proposed regression method, we now compare the accuracy of the proposed method against some popular regression algorithms on the same set of 7 examples. The results of the comparison are available in Table 2 below.
In Table 2, for each tested dataset, the lowest prediction error achieved among all implemented regression methods is marked in bold. On the Yacht Hydrodynamics problem, the proposed method provides an MAE of 0.706, which is lower than that of any other competing algorithm. ALAMO, MLP and MARS follow with MAEs of 0.787, 0.809 and 1.011, respectively. The mean error rates of the rest of the methods are between 3 and 8. On Energy Efficiency Heating, MARS emerges as the most accurate algorithm with a mean absolute error of 0.796, closely matched by our proposed method and MLP. The mean prediction errors of the other approaches are almost all twice as large as that of MARS. On the Energy Efficiency Cooling dataset, the proposed method, MARS, random forest and MLP are the top 4 performers, with MAEs between 1.278 and 1.924. On Concrete Strength, our proposed approach and MARS, with MAEs of 4.870 and 4.871, again emerge as the leading methods ahead of random forest, kriging, MLP and the others. On the Airfoil example, all the competing algorithms achieve similar prediction accuracies, with KNN topping the chart with an MAE of 0.026. The proposed approach is merely 0.003 behind, with kriging and random forest a further 0.001 behind; a mere 0.011 separates the 10 methods. Lastly, on the two Wine Quality examples, our proposed approach is ranked as the 1st and 3rd best method, respectively.
As no single regression method can always outperform all others on all datasets, a desirable regression algorithm should demonstrate consistently competitive prediction accuracy. In order to compare the methods more comprehensively, the per-dataset results are aggregated into an overall score for each method, as shown in Figure 5.
Figure 5: Scoring of regression methods
According to this scoring, the proposed method emerges as the most accurate and robust regression algorithm among all, achieving a score of 9.43 out of a possible 10. Random forest and MARS are second and third in the ranking with scores of 8 and 7.43, followed by kriging, KNN, MLP, SVR, ALAMO, linear regression and PaceRegression in descending order. The advantages of the proposed regression method compared with the other implemented methods are quite obvious.
Lastly, we examine, for each dataset, the number of regions and the key partition feature determined by our proposed regression method. The results are summarised in Table 3. It is clear that the proposed segmented regression method provides good interpretability, as the number of regions is small (usually between 2 and 4, and at most 5). The partition feature may reveal important insights into the underlying system, as the output variable changes more dramatically across different ranges along this feature.
No regression method will be the best for all problems. In this section, we give a general discussion of the pros and cons of the proposed OPLRA piece-wise linear regression method and compare it against some other literature methods. OPLRA is inherently deterministic, which means the same solution is always obtained regardless of the number of runs executed. This is an advantage of OPLRA over stochastic methods, for example MLP, where each execution would typically end up at a different locally optimal solution. Moreover, OPLRA is intuitive and easy to interpret: it approximates the potentially highly non-linear relationship between output and input variables with piece-wise linear algebraic functions, a formalism that is easy to understand, interpret and use for users without sophisticated background knowledge. By contrast, the mechanisms of certain methods like SVR, MLP and kriging lack transparency, as the former two work as black-box techniques and the latter requires detailed knowledge of statistics. The small number of user-specified parameters involved in training OPLRA is another remarkable advantage: β is the only tuning parameter, and the predictive performance is robust with regard to varying values of β, as shown in the Results and Discussion section. Conversely, certain regression methods, including SVR, MLP and kriging, require tuning a large number of parameters, making it a challenging task to identify their optimal values. More importantly, OPLRA achieves more accurate and robust prediction performance than the other methods: it is shown to outperform popular state-of-the-art multivariate regression methods in terms of prediction accuracy, and does so consistently across a number of real world problems.
4. Concluding Remarks
To demonstrate the applicability and efficiency of the proposed piece-wise regression method, 7 real world problems covering a wide range of application domains have been employed. To benchmark the predictive capability of the proposed method, we have also implemented various popular regression methods from the literature for comparison, including support vector regression, artificial neural networks, MARS and K nearest neighbours. Computational experiments clearly indicate that our proposed piece-wise regression method achieves consistently high predictive accuracy, yielding the lowest prediction errors for 4 out of the 7 datasets, the second lowest errors for 2 datasets and the third lowest error for the remaining example. The results confirm our proposed method as a reliable alternative to traditional regression analysis methods. Another remarkable advantage of our proposed method is that the learned model can be conveniently expressed as a set of if-then rules that are compact and easily understandable. From Table 3, it is clear that the number of if-then rules identified by our method as the hidden patterns in the large scale databases (up to thousands of expert curated samples) is extremely small (usually 2 to 3, and at most 5). The interpretability of the proposed piece-wise regression model is a desirable advantage over black-box modelling techniques, for example support vector regression and neural networks.
With regard to the research contribution to expert and intelligent systems, the generic machine learning method proposed in this work can be used to construct a large number of automatic decision making or decision support systems for various domain applications. As the quality and coverage of the information contained in the knowledge base critically affects the efficiency of any expert and intelligent system, our proposed machine learning method can serve to automatically and more efficiently acquire knowledge from databases by approximating the relationship between output and input variables as rules. Subsequently, the discovered knowledge can be used to generate forecasts in response to users' enquiries.
To further improve the efficiency of the proposed piece-wise regression method, the following limitations can be considered for refinement. As the piece-wise regression method proposed in this work can only partition a single input variable, one potential improvement is to generalise the method to permit segmentation of multiple variables, so as to better capture the non-linearity in datasets. Secondly, as our proposed method can only handle continuous input variables, we plan to improve its applicability by generalising it to deal with categorical input variables having many distinct levels. In addition, the relationship between output and input variables is approximated as linear within each segment in the current method, which may not adequately model the underlying patterns. To overcome this, more complex non-linear basis functions, for example polynomial, exponential and logarithmic forms, can be added to allow more flexibility. Another limitation of our method is its relatively high computational cost, which may restrict its usage in certain online applications, where the learning speed of the method is considered more important than the actual prediction accuracy. To tackle this problem, we can explore more efficient heuristic solution procedures that, by estimating the possible break-point positions and restricting the solution space, converge more quickly to a quality solution.
In terms of practical future applications in expert and intelligent systems, the proposed piece-wise regression method can benefit many by automatically extracting knowledge from databases and generating accurate forecasts. As examples, we have identified the following directions as possible avenues worth investigating in the near future. First, our proposed method can be incorporated into the construction of a decision support expert system that continuously predicts the personalised risk of releasing prisoners with mental illness from jail, aiding clinicians in decision making (Constantinou et al., 2015). Other applications that can benefit from our work include intelligent drowsiness monitoring systems and stock price prediction. In drowsiness monitoring, the proposed regression model can be built into intelligent fatigue detection equipment, which records the dynamic physiological signals of drivers or medical staff and continuously predicts their levels of fatigue. A warning would be automatically issued when the model predicts the fatigue level of a subject to be above a pre-specified threshold (Chen et al., 2015). In the financial area, our method can help with the construction of an automatic system that forecasts stock prices based on the ever-changing variables quantifying the current performance of a company, including assets, liabilities and income, providing management with data support for better financial decisions (Ballings et al., 2015). Lastly, the proposed method can also find application in the airline industry, where managers and decision makers can benefit from a framework capable of predicting the level of customer satisfaction from various aspects of service, making it possible for them to carefully allocate resources to maximise customer loyalty (Leong et al., 2015).
5. Acknowledgements
Funding from the UK Engineering and Physical Sciences Research Council (to LY, SL and LGP through the EPSRC Centre for Innovative Manufacturing in Emergent Macromolecular Therapies), the UK Leverhulme Trust (to ST and LGP, RPG-2012-686), the European Union (to ST, HEALTH-F2-2011-261366), and the Centre for Process Systems Engineering (CPSE) at Imperial and University College London is gratefully acknowledged.
References
Afantitis, A., Melagraki, G., Sarimveis, H., Koutentis, P. A., Markopoulos, J., & Igglessi-Markopoulou, O. (2006). A novel QSAR model for predicting induction of apoptosis by 4-aryl-4H-chromenes. Bioorganic and Medicinal Chemistry, 14, 6686–6694.

Alonso, F., Martínez, L., Pérez, A., & Valente, J. P. (2012). Cooperation between expert knowledge and data mining discovered knowledge: Lessons learned. Expert Systems with Applications, 39, 7524–7535.

Andrés, J. D., Lorca, P., de Cos Juez, F. J., & Sánchez-Lasheras, F. (2011). Bankruptcy forecasting: A hybrid approach using fuzzy c-means clustering and multivariate adaptive regression splines (MARS). Expert Systems with Applications, 38, 1866–1875.

Bai, Y., Wang, P., Li, C., Xie, J., & Wang, Y. (2014). A multi-scale relevance vector regression approach for daily urban water demand forecasting. Journal of Hydrology, 517, 236–245.

Ballings, M., den Poel, D. V., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications, 42, 7046–7056.
Balshi, M. S., Mcguire, A. D., Duffy, P., Flannigan, M., Walsh, J., & Melillo, J. (2009). Assessing the response of area burned to changing climate in western boreal North America using a multivariate adaptive regression splines (MARS) approach. Global Change Biology, 15, 578–600.

Beck, J., Friedrich, D., Brandani, S., Guillas, S., & Fraga, E. (2012). Surrogate based optimisation for design of pressure swing adsorption systems. In Proceedings of the 22nd European Symposium on Computer Aided Process Engineering.

Bermolen, P., & Rossi, D. (2009). Support vector regression for link load prediction. Computer Networks, 53, 191–201.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Wadsworth.

Cavanaugh, K. C., Kellner, J. R., Forde, A. J., Gruner, D. S., Parker, J. D., Rodriguez, W., & Feller, I. C. (2014). Poleward expansion of mangroves is a threshold response to decreased frequency of extreme cold events. Proceedings of the National Academy of Sciences, 111, 723–727.

Chang, C.-C., & Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2, 27:1–27:27.
Chen, L., Zhao, Y., Zhang, J., & Zou, J.-z. (2015). Automatic detection of alertness/drowsiness from physiological signals using wavelet-based nonlinear features and machine learning. Expert Systems with Applications, 42, 7344–7355.

Chen, Q.-L., Wu, K.-J., & He, C.-H. (2014). Thermal conductivity of ionic liquids at atmospheric pressure: Database, analysis, and prediction using a topological index method. Industrial and Engineering Chemistry Research, 53, 7224–7232.

Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17, 113–126.

Constantinou, A. C., Freestone, M., Marsh, W., Fenton, N., & Coid, J. (2015). Risk assessment and risk management of violent reoffending among prisoners. Expert Systems with Applications, 42, 7511–7529.

Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47, 547–553.

Cozad, A., Sahinidis, N. V., & Miller, D. C. (2014). Learning surrogate models for simulation-based optimization. AIChE Journal, 60, 2211–2227.

Davis, E., & Ierapetritou, M. (2008). A kriging-based approach to MINLP containing black-box models and noise. Industrial and Engineering Chemistry Research, 47, 6101–6125.

Demšar, J., Curk, T., Erjavec, A., Gorup, Č., Hočevar, T., Milutinovič, M., Možina, M., Polajnar, M., Toplak, M., Starič, A., Štajdohar, M., Umek, L., Žagar, L., Žbontar, J., Žitnik, M., & Zupan, B. (2013). Orange: Data mining toolbox in Python. Journal of Machine Learning Research, 14, 2349–2353.
Eronen, A., & Klapuri, A. (2010). Music tempo estimation with k-NN regression. Audio, Speech, and Language Processing, IEEE Transactions on, 18, 50–57.

Fanelli, G., Gall, J., & Van Gool, L. (2011). Real time head pose estimation with random regression forests. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 617–624).

Genuer, R., Poggi, J.-M., & Tuleau-Malot, C. (2010). Variable selection using random forests. Pattern Recognition Letters, 31, 2225–2236.

Gevrey, M., Dimopoulos, I., & Lek, S. (2003). Review and comparison of methods to study the contribution of variables in artificial neural network models. Ecological Modelling, 160, 249–264.

Ghasemi, J., Saaidpour, S., & Brown, S. D. (2007). QSPR study for estimation of acidity constants of some aromatic acids derivatives using multiple linear regression (MLR) analysis. Journal of Molecular Structure: THEOCHEM, 805, 27–32.

Greene, M., Rolfson, O., Garellick, G., Gordon, M., & Nemes, S. (2015). Improved statistical analysis of pre- and post-treatment patient-reported outcome measures (PROMs): the applicability of piecewise linear regression splines. Quality of Life Research, 24, 567–573.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. SIGKDD Explorations Newsletter, 11, 10–18.

Helton, J., & Davis, F. (2003). Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems. Reliability Engineering and System Safety, 81, 23–69.

Hill, T., Marquez, L., O'Connor, M., & Remus, W. (1994). Artificial neural network models for forecasting and decision making. International Journal of Forecasting, 10, 5–15.
Khuri, A. I., & Mukhopadhyay, S. (2010). Response surface methodology. Wiley Interdisciplinary Reviews: Computational Statistics, 2, 128–149.

Leathwick, J., Elith, J., & Hastie, T. (2006). Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecological Modelling, 199, 188–196.

Leong, L.-Y., Hew, T.-S., Lee, V.-H., & Ooi, K.-B. (2015). An SEM–artificial-neural-network analysis of the relationships between SERVPERF, customer satisfaction and loyalty among low-cost and full-service airline. Expert Systems with Applications, 42, 6620–6634.
Li, B., Zhang, L., Yan, Q., & Xue, Y. (2014a). Application of piecewise linear regression in the detection of vegetation greenness trends on the Tibetan Plateau. International Journal of Remote Sensing, 35, 1526–1539.

Li, S., Feng, L., Benner, P., & Seidel-Morgenstern, A. (2014b). Using surrogate models for efficient optimization of simulated moving bed chromatography. Computers and Chemical Engineering, 67, 121–132.

Li, Y., Gong, S., & Liddell, H. (2000). Support vector regression and classification based multi-view face detection and recognition. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on (pp. 300–305).

Lloyd, C. D., & Atkinson, P. M. (2002). Deriving DSMs from LiDAR data with kriging. International Journal of Remote Sensing, 23, 2519–2524.

Loh, W.-Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1, 14–23.

Lu, C.-J., Lee, T.-S., & Chiu, C.-C. (2009). Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47, 115–125.

Magnani, A., & Boyd, S. (2009). Convex piecewise-linear fitting. Optimization and Engineering, 10, 1–17.

Matthews, T. J., Steinbauer, M. J., Tzirkalli, E., Triantis, K. A., & Whittaker, R. J. (2014). Thresholds and the species–area relationship: a synthetic analysis of habitat island datasets. Journal of Biogeography, 41, 1018–1028. doi:10.1111/jbi.12286.
Miller, D. C., Syamlal, M., Mebane, D. S., Storlie, C., Bhattacharyya, D., Sahinidis, N. V., Agarwal, D., Tong, C., Zitney, S. E., Sarkar, A., Sun, X., Sundaresan, S., Ryan, E., Engel, D., & Dale, C. (2014). Carbon capture simulation initiative: A case study in multiscale modeling and new challenges. Annual Review of Chemical and Biomolecular Engineering, 5, 301–323.

Minjares-Fuentes, R., Femenia, A., Garau, M., Meza-Velázquez, J., Simal, S., & Rosselló, C. (2014). Ultrasound-assisted extraction of pectins from grape pomace using citric acid: A response surface methodology approach. Carbohydrate Polymers, 106, 179–189.

Paliwal, M., & Kumar, U. A. (2009). Neural networks and statistical techniques: A review of applications. Expert Systems with Applications, 36, 2–17.

Pan, J., Kung, P., Bretholt, A., & Lu, J. (2014). Prediction of energy's environmental impact using a three-variable time series model. Expert Systems with Applications, 41, 1031–1040.
Quinlan, J. R. (1992). Learning with continuous classes. In Proceedings of the Australian Joint Conference on Artificial Intelligence (pp. 343–348). World Scientific.

Rafiq, M., Bugmann, G., & Easterbrook, D. (2001). Neural network design for engineering applications. Computers and Structures, 79, 1541–1552.

Sampson, P. D., Richards, M., Szpiro, A. A., Bergen, S., Sheppard, L., Larson, T. V., & Kaufman, J. D. (2013). A regionalized national universal kriging model using partial least squares regression for estimating annual PM2.5 concentrations in epidemiology. Atmospheric Environment, 75, 383–392.

Tibshirani, R. (1994). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.
Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: a retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 273–282.

Toriello, A., & Vielma, J. P. (2012). Fitting piecewise linear continuous functions. European Journal of Operational Research, 219, 86–95.

Venkatesh, K., Ravi, V., Prinzie, A., & den Poel, D. V. (2014). Cash demand forecasting in ATMs by clustering and neural networks. European Journal of Operational Research, 232, 383–392.

Viana, F. A. C., Simpson, T. W., Balabanov, V., & Toropov, V. (2014). Metamodeling in Multidisciplinary Design Optimization: How Far Have We Really Come? AIAA Journal, 52, 670–690.

Wu, K.-J., Chen, Q.-L., & He, C.-H. (2014). Speed of sound of ionic liquids: Database, estimation, and its application for thermal conductivity prediction. AIChE Journal, 60, 1120–1131.

Xue, Y., Liu, S., Zhang, L., & Hu, Y. (2013). Integrating fuzzy logic with piecewise linear regression for detecting vegetation greenness change in the Yukon River Basin, Alaska. International Journal of Remote Sensing, 34, 4242–4263.

Zhang, J.-R., Zhang, J., Lok, T.-M., & Lyu, M. R. (2007). A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Applied Mathematics and Computation, 185, 1026–1037.

Zhu, Q., & Lin, H. (2010). Comparing ordinary kriging and regression kriging for soil properties in contrasting landscapes. Pedosphere, 20, 594–606.