

Which artificial intelligence algorithm better predicts the Chinese stock market?
Lin Chen, Zhilin Qiao, Minggang Wang, Chao Wang, Ruijin Du, H. Eugene Stanley

Abstract—Unpredictable stock market factors make it difficult to predict stock index futures. Although efforts to develop an effective prediction method have a long history, recent developments in artificial intelligence and the use of artificial neural networks have increased our success in nonlinear approximation. When we study financial markets we can now extract features from a big data environment without prior predictive information. We here propose to further improve this predictive performance using a combination of a deep-learning based stock index futures prediction model, an autoencoder, and a restricted Boltzmann machine. We use high frequency data to examine the predictive performance of deep learning, and we compare three traditional artificial neural networks, (i) the back propagation neural network, (ii) the extreme learning machine, and (iii) the radial basis function neural network. We use all of the one-minute high frequency transaction data of the CSI 300 futures contract (IF1704) in our empirical analysis, and we test three groups of different volume samples to validate our observations. We find that the deep learning method of predicting stock index futures outperforms the back propagation, the extreme learning machine, and the radial basis function neural network in both its fitting degree and its directional predictive accuracy. We also find that increasing the amount of data increases predictive performance. This indicates that deep learning captures the nonlinear features of transaction data and can serve as a powerful stock index futures prediction tool for financial market investors.

Index Terms—Prediction methods, Artificial neural networks, Stock markets, Deep learning.

Lin Chen is with School of Management, Northwestern Polytechnical University, Xi'an 710072, China. Zhilin Qiao is with School of Economics and Finance, Xi'an Jiaotong University, Xi'an 710061, China. Minggang Wang is with Nanjing Normal University. Chao Wang is with Beijing University of Technology. Ruijin Du is with Jiangsu University. H. Eugene Stanley is with Boston University. Manuscript received April 4, 2018; revised July 12, 2018.

I. INTRODUCTION

Stock market prediction is a classic topic in both financial circles and academia. Extreme stock market fluctuations, e.g., the global stock market turmoil in February 2018, damage financial markets and the global economy. We thus need a more effective way of predicting market fluctuations. Among the many predictive efforts over the last few decades [1-3], some have had success using quantitative methods [4-9], such as autoregressive integrated moving average (ARIMA) models, artificial neural networks, support vector machines, and neuro-fuzzy based systems, but because of the nonlinear characteristics of stock market behavior, financial economists continue to debate these methodologies [10-13].

Recently "deep learning" (DL) has attracted a great deal of attention in many research fields. DL is a new area of machine learning that has improved the ability of computers in areas of image recognition and classification, natural language processing, speech recognition, and social network filtering [14-16]. In some cases the results are comparable to or even superior to those of human experts [17,18]. The structure of DL is a multi-layer neural network that uses a cascade of multiple layers of nonlinear processing units to extract and transform various features. Its learning can be either supervised or unsupervised, and it forms a hierarchy of concepts by utilizing multiple representation levels that correspond to different levels of abstraction [19]. The use of DL has improved computational power and big data processing, and it allows more sophisticated algorithms. Previous studies indicate that DL solves nonlinear problems more efficiently than traditional methods [19-21]. Irrespective of the level of complication or the presence of linear and nonlinear big data financial market factors, DL can extract abstract features and identify hidden relationships in financial markets without making econometric assumptions [22]. Traditional financial economic methods and other quantitative techniques cannot do this. Unlike existing models with these limitations, DL can process high frequency big data, analyze the financial market, and predict stock returns [23,24]. When setting the parameters of artificial neural networks, the learning rate, epochs, goals, and number of artificial neurons must all be taken into account to achieve desirable results [25,26]. DL can extract abstract features without subjective interference. We thus do not need to add influencing factors or control variables when we use a large time-series dataset to predict financial market behavior. Other methods have also been applied to time-series analysis [27-29]. For example, Gao et al. propose a new wavelet multiresolution complex network for analyzing multivariate time series [28], which is capable of analyzing information in the dynamical and topological fields and has been successfully applied in many research fields. But DL can provide accurate financial market time-series forecasting, and the accuracy increases as the size of the database increases [30]. Thus DL is well suited to time-series prediction in the financial market.

We here use deep learning to predict stock market behavior, and we compare the performance of this approach with the performance of the traditional back propagation (BP) network, the extreme learning machine (ELM), and the radial basis function (RBF) network. Many studies have used traditional predictive methods and low frequency data to predict stock prices and returns, but the market inefficiencies caused by high frequency microstructure noise may provide additional profit opportunities.


When traditional predictive methods are used to examine high frequency data, they are also subject to overfitting and low accuracy. We thus use a DL-based prediction model and high frequency time-series data from 20 February 2017 to 20 April 2017 in the Chinese stock market to predict stock index futures, and we compare the predictive performance with that of other methods. In Sec. 2 we describe deep learning and other artificial intelligence algorithms. In Sec. 3 we describe the data source and the data characteristics and propose the criteria for evaluating the performance of our model. Sec. 4 provides our empirical results, and Sec. 5 provides the conclusions.

II. DEEP LEARNING AND OTHER ARTIFICIAL INTELLIGENCE ALGORITHMS

A. Deep learning

Deep learning is a machine learning technology that bypasses manual extraction and extracts features mechanistically. Deep learning simulates the way the cerebral cortex abstracts data or signals layer by layer, and models the image recognition of the cerebral cortex. Deep learning first extracts low-level features from original signals. It then extracts high-level features, first from low-level features and then from higher-level features. In an image recognition system, the original signals are pixels. The low-level features are the edges of objects. The high-level features are the contours, and the highest-level feature is the image. Using the characteristics of high-level classification, deep learning outputs forecast results [15]. Figure 1 shows the deep learning hierarchy.

[Fig. 1: Illustration of deep learning hierarchy (input, low-level features, high-level features, higher-level features, classifier).]

Deep learning abstracts and transfers features of data through different layers. Here f is the activation function. Given input X and predictable output Ŷ, the prediction function of deep learning is

    M_1 = f_1(W_0 X + b_0),
    M_2 = f_2(W_1 M_1 + b_1),
    ···
    M_L = f_L(W_{L-1} M_{L-1} + b_{L-1}),
    Ŷ(X) = W_L M_L + b_L.    (1)

Here W indicates the weight matrices, b the biases, and L the number of layers. The training process of deep learning has two stages. The first is unsupervised learning from bottom to top. The second is supervised learning from top to bottom. The widely used data representation methods in deep learning are two nonlinear transformations, (i) the autoencoder (AE) and (ii) the restricted Boltzmann machine (RBM).
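To make the stacked prediction function of Eq. (1) concrete, here is a minimal Python/NumPy sketch of the forward pass only; the layer sizes and random parameters are hypothetical stand-ins, and training is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dl_predict(x, weights, biases):
    """Forward pass of Eq. (1): M_l = f_l(W_{l-1} M_{l-1} + b_{l-1}),
    followed by a linear output layer Y_hat = W_L M_L + b_L."""
    m = x
    for W, b in zip(weights[:-1], biases[:-1]):
        m = sigmoid(W @ m + b)           # nonlinear hidden layers
    return weights[-1] @ m + biases[-1]  # linear output layer

# Hypothetical sizes: 6 inputs, two hidden layers (10 and 4 nodes), 1 output.
rng = np.random.default_rng(0)
sizes = [6, 10, 4, 1]
weights = [rng.normal(size=(q, p)) for p, q in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=q) for q in sizes[1:]]
print(dl_predict(rng.normal(size=6), weights, biases))
```

The hidden sizes (10 and 4) mirror the node counts arrived at in the empirical analysis of Sec. IV.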
1) Autoencoder: The autoencoder learns a representation (an encoding) for a dataset, typically in order to reduce its dimensionality. It compresses the input into a latent-space representation and then reconstructs the output from this representation (see Fig. 2).

[Fig. 2: Principle of autoencoder (input, encoder, code, decoder, reconstruction; the reconstruction error drives training).]

An autoencoder consists of an encoder and a decoder [14], defined as transitions ϕ and φ,

    ϕ : χ → κ,
    φ : κ → χ,    (2)
    ϕ, φ = arg min_{ϕ,φ} ‖X − (φ ∘ ϕ)X‖².

In the encoder stage, x_l ∈ R^p = χ is the input of layer l, mapped to y_l ∈ R^q = κ with the function y_l = f_l(W_l x_l + b_l). A sigmoid function is typically used as the activation function f_l, which we also use here. In the decoder stage, y_l is mapped to a reproduction x′_l of the same shape as x_l using the function x′_l = f′_l(W′_l y_l + b′_l). Autoencoders are trained to minimize the reconstruction error,

    E(X, X′) = ‖X − X′‖² = ‖X − f′(W′(f(W X + b)) + b′)‖².    (3)

We here use autoencoders to set initial weights and thresholds to reduce the error and more rapidly reach the desired range.
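A minimal sketch of the reconstruction error of Eq. (3), with a single sigmoid encoder/decoder pair; the dimensions and random data are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ae_reconstruction_error(X, W, b, W_dec, b_dec):
    """Reconstruction error of Eq. (3) for a one-layer autoencoder.
    Encoder (phi): code = f(W X + b); decoder (psi): X' = f'(W' code + b').
    X holds one sample per column."""
    code = sigmoid(W @ X + b)              # latent-space representation
    X_rec = sigmoid(W_dec @ code + b_dec)  # reconstruction of the input
    return np.sum((X - X_rec) ** 2)

# Hypothetical sizes: 6 input features compressed to a 3-dimensional code.
rng = np.random.default_rng(1)
X = rng.uniform(size=(6, 100))             # 100 samples, already normalized
W, b = rng.normal(size=(3, 6)), np.zeros((3, 1))
W_dec, b_dec = rng.normal(size=(6, 3)), np.zeros((6, 1))
print(ae_reconstruction_error(X, W, b, W_dec, b_dec))
```

In the pipeline described here, weights obtained by minimizing this error would then serve as the initial weights of the prediction network.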
2) Restricted Boltzmann Machine: The restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution. It is a restricted variant of a Boltzmann machine in which the neurons must form a bipartite graph. Depending on the task, the training of an RBM can be either supervised or unsupervised. A standard RBM has binary-valued hidden and visible units. Here X, Y are the visible (input) and hidden layers, respectively. The energy of a configuration (a pair of boolean vectors) (X, Y) is defined as

    G(X, Y) = −α^T X − β^T Y − X^T W Y,    (4)

where the parameters α, β, and W are the weights of the visible units, the hidden units, and those associated with the connections between hidden and visible units, respectively. The probability distributions over hidden or visible vectors in general Boltzmann machines are defined in terms of the energy function [31]:

    P(X, Y) = (1/Z) e^{−G(X,Y)},    (5)


where Z is a partition function defined as the sum of e^{−G(X,Y)} over all possible configurations, and the marginal probability of a visible vector is the sum over all possible hidden layer configurations,

    P(X) = (1/Z) Σ_Y e^{−G(X,Y)}.    (6)

With the restriction that the RBM has the shape of a bipartite graph with no intra-layer connections, the hidden unit activations are mutually independent given the visible unit activations, and vice versa. In this paper, we use the RBM to pre-train the network layer by layer, and then fine-tune it with a feedback method.
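Because the layer sizes in the following illustration are tiny, the partition function Z of Eqs. (5)-(6) can be computed by brute-force enumeration; this is a sketch under that assumption (practical RBM training instead approximates these quantities by sampling, e.g., contrastive divergence [31]):

```python
import itertools
import numpy as np

def energy(x, y, alpha, beta, W):
    """Energy of a configuration (x, y), Eq. (4)."""
    return -alpha @ x - beta @ y - x @ W @ y

def joint_probability(x, y, alpha, beta, W, n_visible, n_hidden):
    """Eq. (5): P(x, y) = exp(-G(x, y)) / Z, with Z summed over all
    2^(n_visible + n_hidden) binary configurations."""
    Z = sum(
        np.exp(-energy(np.array(v), np.array(h), alpha, beta, W))
        for v in itertools.product([0, 1], repeat=n_visible)
        for h in itertools.product([0, 1], repeat=n_hidden)
    )
    return np.exp(-energy(x, y, alpha, beta, W)) / Z

rng = np.random.default_rng(2)
nv, nh = 4, 3                       # tiny, enumerable layer sizes
alpha, beta = rng.normal(size=nv), rng.normal(size=nh)
W = rng.normal(size=(nv, nh))
x, y = np.ones(nv), np.zeros(nh)
print(joint_probability(x, y, alpha, beta, W, nv, nh))
```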
B. Artificial intelligence algorithms

Artificial intelligence algorithms have been used to predict economic and market behavior [32-34]. To compare predictive performances, we focus on three popular artificial neural networks, (i) the back propagation neural network, (ii) the extreme learning machine, and (iii) the radial basis function neural network.

1) Back propagation neural network: The back propagation (BP) neural network is an artificial intelligence algorithm widely used in prediction, in particular for advanced multiple regression analysis. It generates complex and nonlinear responses better than a standard regression analysis [35]. The formula for the BP algorithm is

    W(n) = W(n−1) − ΔW(n),    (7)

where

    ΔW(n) = γ (∂E/∂W)(n−1) + θ ΔW(n−1).    (8)

Here γ is the learning rate, E the error function, and θ ΔW(n−1) the incremental (momentum) weight term. A BP network uses the gradient method, and the learning and inertial factors are determined by experience, which affects the convergence of a BP network. A BP network readily converges to a local minimum, and because its convergence velocity is slow, it has relatively few applications.
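The update rule of Eqs. (7)-(8) is a gradient step plus a momentum term; a minimal sketch on a one-dimensional toy error surface (the error function and the rates γ and θ here are illustrative choices, not values from the experiments):

```python
def bp_update(w, grad, prev_delta, gamma=0.1, theta=0.9):
    """Eqs. (7)-(8): Delta_W(n) = gamma * dE/dW(n-1) + theta * Delta_W(n-1),
    then W(n) = W(n-1) - Delta_W(n)."""
    delta = gamma * grad + theta * prev_delta
    return w - delta, delta

# Toy error surface E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3).
w, delta = 10.0, 0.0
for _ in range(200):
    w, delta = bp_update(w, 2.0 * (w - 3.0), delta)
print(w)  # approaches the minimum at w = 3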
2) Extreme learning machine: The extreme learning machine (ELM) is a feedforward neural network for classification, regression, clustering, and feature learning with a single layer or multiple layers of hidden nodes, in which the parameters of the hidden nodes need not be tuned. For a set of training samples {(X_j, Z_j)}_{j=1}^S with S samples and V classes, the single hidden layer feedforward neural network with d hidden nodes and activation function f is

    Y_j = Σ_{i=1}^d η_i f_i(X_j) = Σ_{i=1}^d η_i f(W_i X_j + e_i),  j = 1, 2, ···, S,    (9)

where X_j = [x_{j1}, x_{j2}, ···, x_{js}]^T is the input, Z_j = [z_{j1}, z_{j2}, ···, z_{jv}]^T the corresponding output, W_i = [w_{i1}, w_{i2}, ···, w_{is}]^T the connecting weights of hidden neuron i to the input neurons, e_i the bias of hidden node i, η_i = [η_{i1}, η_{i2}, ···, η_{iv}]^T the connecting weights of hidden neuron i to the output neurons, and Y_j the actual network output. Usually the hidden parameters {W_i, e_i} are randomly generated during training without tuning. The ELM solves the compact model using error minimization,

    min_η ‖Dη − Z‖_F    (10)

with

    D(W_1, W_2, ···, W_d, e_1, e_2, ···, e_d) =
        | f(W_1 X_1 + e_1)  ···  f(W_d X_1 + e_d) |
        |       ···         ···        ···        |
        | f(W_1 X_S + e_1)  ···  f(W_d X_S + e_d) |,

    η = [η_1^T; ···; η_d^T],  Z = [Z_1^T; ···; Z_S^T]  (rows stacked),

where D is the hidden layer output matrix and η the output weight matrix. The ELM randomly selects the hidden node parameters, and only the output weight parameters need to be determined. Despite these advantages, the ELM cannot be used when the time series is noisy.
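Since only the output weights are learned, Eqs. (9)-(10) reduce to a single least-squares solve; the following sketch demonstrates this on synthetic data (the sample counts, node count, and toy target are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
S, s, v, d = 200, 6, 1, 20                 # samples, inputs, outputs, hidden nodes
X = rng.normal(size=(S, s))
Z = np.sin(X.sum(axis=1, keepdims=True))   # toy regression target

# Random hidden parameters {W_i, e_i}, generated once and never tuned (Eq. 9).
W = rng.normal(size=(d, s))
e = rng.normal(size=d)
D = 1.0 / (1.0 + np.exp(-(X @ W.T + e)))   # hidden layer output matrix D (S x d)

# Output weights: min_eta ||D eta - Z||_F solved by least squares (Eq. 10).
eta, *_ = np.linalg.lstsq(D, Z, rcond=None)

Y = D @ eta                                # actual network output Y_j
print("training RMSE:", np.sqrt(np.mean((Y - Z) ** 2)))
```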
3) Radial basis function neural network: The radial basis function (RBF) network is an artificial neural network that uses radial basis functions for activation. The network output is a linear combination of radial basis functions of the inputs and neuron parameters. RBF networks have many uses, including function approximation, time series prediction, classification, and system control [36]. Radial basis functions are often used to construct function approximations,

    y(x) = Σ_{i=1}^N μ_i υ(‖x − x_i‖),    (11)

where y(x) is the sum of N radial basis functions, and μ_i the weight, estimated using the linear least squares method. The Gaussian function is a commonly used radial basis function and is the choice for υ(‖x − x_i‖) here. Geometrically, the RBF network divides the input space into hypersphere subspaces, which can cause such algorithmic problems as overfitting, overtraining, the small-sample effect, and singularities.
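A corresponding sketch of the RBF approximation of Eq. (11): Gaussian basis functions centered on the data points, with the weights μ_i obtained by least squares (the kernel width and the toy data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 30
centers = rng.uniform(-3, 3, size=(N, 1))   # basis centers x_i
targets = np.sin(centers).ravel()           # toy function values

def gaussian(r, width=1.0):
    """A commonly used radial basis function v(||x - x_i||)."""
    return np.exp(-(r / width) ** 2)

# Design matrix Phi[j, i] = v(||x_j - x_i||); weights mu by least squares.
Phi = gaussian(np.abs(centers - centers.T))
mu, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

def rbf_predict(x):
    """Eq. (11): y(x) = sum_i mu_i * v(||x - x_i||)."""
    return gaussian(np.abs(x - centers.ravel())) @ mu

print(rbf_predict(0.5), np.sin(0.5))   # should be close
```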


III. DATA AND PERFORMANCE EVALUATION CRITERIA

A. Data

China Securities Index Co., Ltd, a joint venture of the Shanghai and Shenzhen Stock Exchanges, received permission from the China Securities Regulatory Commission to release the China securities indices. The CSI 300 is a representative index among these indices. It is a capitalization-weighted stock market index designed to represent the performance of the top 300 stocks listed on the Shanghai and Shenzhen stock exchanges. As an early stock index futures contract issued in China, the CSI 300 futures contract has enriched the existing index system in China's securities markets and achieved great success. The CSI 300 index futures began trading on the China Financial Futures Exchange (CFFEX) with the commodity ticker symbol IF on 16 April 2010. The notional value of one contract is RMB 300 times the value of the CSI 300. The CSI 300 index futures provides an indicator for observing the fluctuation of the stock market. It helps companies and individual investors to better understand stock market developments and allows the accumulation of experience in trading index investment products. Thus we here select the CSI 300 futures contract transaction data for empirical analysis.

The CSI 300 index futures uses the T+0 trading system for short-term transactions. It tracks the price fluctuations and performance of the Chinese A-share market, which serve as benchmarks for derivative innovations and indexing. Because our goal here is to predict the short-term price of stock index futures and to provide guidance in transactions, the CSI 300 index futures is the appropriate instrument for analysis. We collect all transaction data of the CSI 300 futures contract (IF1704) from 20 February 2017 to 20 April 2017, which we acquire from the high frequency RESSET database, a top Chinese financial research data resource. We use one-minute high frequency data that include opening price, highest price, lowest price, closing price, trade volume, and open interest to test the performance of deep learning and the other artificial intelligence algorithms. We collect 10,000 six-variable groups from 20 February 2017 to 20 April 2017, for a total of 60,000 data points.

Figure 3 shows the minute-by-minute trend of the opening price of the CSI 300 futures contract (IF1704) from 20 February 2017 to 20 April 2017. Note the large fluctuations during this period, which may present arbitrage opportunities if predictions are accurate.

[Fig. 3: Opening price per minute of the CSI 300 futures contract (IF1704), 2017/02/20-2017/04/20.]

When using the deep learning algorithm, the preparatory data processing includes decreasing the difference between the threshold and the actual data. Usually the sample data are normalized prior to being input into the neural network,

    x′_t = (x_t − x_min) / (x_max − x_min),    (12)

where x′_t is the data after normalization, and x_min and x_max are the minimum and maximum of the input x_t, respectively. After processing we anti-normalize the output,

    ŷ_t = y′_t (y_max − y_min) + y_min,    (13)

where ŷ_t is the predicted data after anti-normalization, and y_min and y_max are the minimum and maximum, respectively, of the output y′_t.
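Eqs. (12)-(13) translate directly into code; a small sketch (the price values are made up for illustration):

```python
import numpy as np

def normalize(x):
    """Min-max scaling of Eq. (12); also returns the bounds needed to invert it."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def denormalize(y_scaled, y_min, y_max):
    """Anti-normalization of Eq. (13), the inverse of Eq. (12)."""
    return y_scaled * (y_max - y_min) + y_min

prices = np.array([3400.2, 3425.8, 3411.0, 3452.6])   # illustrative prices
scaled, lo, hi = normalize(prices)
assert np.allclose(denormalize(scaled, lo, hi), prices)
```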
B. Performance evaluation

We use three criteria to evaluate predictive accuracy, (i) the root-mean-square error (RMSE), (ii) the mean absolute percentage error (MAPE), and (iii) the directional predictive accuracy. The RMSE is defined as

    RMSE = √( Σ_{t=1}^N (ŷ_t − y_t)² / N ),    (14)

where y_t and ŷ_t are the actual and predicted values at time t, respectively, and N the size of the test set. RMSE expresses the standard deviation of the difference between the predicted and actual values.

The MAPE, also known as the mean absolute percentage deviation (MAPD), expresses the accuracy as a percentage, and is defined as

    MAPE = (1/N) Σ_{t=1}^N |(y_t − ŷ_t) / y_t|.    (15)

MAPE measures the mean absolute relative error of each prediction model. RMSE and MAPE have been widely used to evaluate predictive accuracy [37,38]. The smaller the values of RMSE and MAPE, the higher the accuracy of the model.

Because we are also interested in tendencies in predictive accuracy, we measure the directional predictive accuracy,

    DA = (1/N) Σ_{t=1}^N D_t,  D_t = { 1, (y_{t+1} − y_t)(ŷ_{t+1} − y_t) ≥ 0; 0, otherwise }.    (16)

The closer DA is to 1, the higher the directional predictive accuracy. The closer DA is to 0, the lower the directional predictive accuracy.
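The three criteria of Eqs. (14)-(16), implemented directly; the short arrays stand in for the actual and predicted test series:

```python
import numpy as np

def rmse(y, y_hat):
    """Eq. (14): root-mean-square error."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mape(y, y_hat):
    """Eq. (15): mean absolute percentage error."""
    return np.mean(np.abs((y - y_hat) / y))

def directional_accuracy(y, y_hat):
    """Eq. (16): share of steps whose predicted move has the right sign."""
    hits = (y[1:] - y[:-1]) * (y_hat[1:] - y[:-1]) >= 0
    return hits.mean()

y = np.array([3400.0, 3402.5, 3401.0, 3405.2])
y_hat = np.array([3400.8, 3401.9, 3402.0, 3404.7])
print(rmse(y, y_hat), mape(y, y_hat), directional_accuracy(y, y_hat))
```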


IV. RESULTS OF EMPIRICAL ANALYSIS

To eliminate random influences within each simulation, we use the average of 10 simulation results to evaluate the performance of each method. To determine the performance of each neural network when there are different amounts of data, we divide the database into small, medium, and large datasets (see Fig. 4). In each small dataset there are 2000 groups with six variables per group, yielding a total of 12000 data points; there are five small-scale datasets. In each medium dataset there are 5000 groups with six variables per group, yielding 30000 data points per medium dataset; there are two medium-scale datasets. Similarly, the large dataset contains 10000 groups with 60000 data points. We use the first 90 percent of the data in each dataset as training samples and the remaining 10 percent as testing samples. We run all neural networks using the Matlab R2017b software package and a Lenovo laptop computer with a Core(TM) i5-5200U 2.20GHz CPU and 8GB of random-access memory (RAM).

[Fig. 4: Demonstration of different scale datasets (small: 2000 groups each; medium: 5000 groups each; large: 10000 groups).]

In DL the number of hidden layer nodes strongly affects the prediction. There is no uniform way of determining the number of hidden layer nodes [39], and the experience of the model creator is key during the layer-by-layer experiment. The number of hidden layer nodes strongly correlates with the number of input and output layer nodes. In general, the greater the number of nodes in the input and output layers, the greater the number of nodes needed in the hidden layers to achieve efficient feature learning. Currently, the following methods can be used to determine the initial number of hidden layer nodes,

    L = √(m + n) + α,    (17)
    L = log₂ m,    (18)
    L = √(mn),    (19)

where m is the number of input layer nodes, n the number of output layer nodes, and α a constant between 1 and 10. The number of hidden layer nodes produced by these methods is only approximate, and often must be corrected during training and learning. Gradually increasing and decreasing the number of hidden layer nodes is a common way of reducing errors to a usable range. We use Eq. (17) to calculate the number of hidden layer nodes and arrive at 10 nodes in the first hidden layer and 4 nodes in the second hidden layer.
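The heuristics of Eqs. (17)-(19) are trivial to compute; a quick sketch (the specific α is a free choice in 1-10; with m = 6, n = 1, and α = 7, Eq. (17) gives the 10 first-layer nodes used here):

```python
import math

def initial_hidden_nodes(m, n, alpha=7):
    """Candidate hidden-layer sizes from Eqs. (17)-(19)."""
    return {
        "eq17_sqrt(m+n)+alpha": round(math.sqrt(m + n) + alpha),
        "eq18_log2(m)":         round(math.log2(m)),
        "eq19_sqrt(m*n)":       round(math.sqrt(m * n)),
    }

# Six input variables and one output, as in the empirical setup.
print(initial_hidden_nodes(m=6, n=1))
```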
A. Predictive performance of small-scale datasets

Each small dataset contains 12000 data points. Given the total sample volume, five small-scale datasets are obtained for analysis. The first 10800 data points (90%) are training data and the remaining 1200 (10%) are test data. Figure 5 shows the testing sample values and the CSI 300 futures contract predictions for the different periods. Table I compares the predictive performances of the DL, BP, ELM, and RBF methods. In each dataset we record the average value of 10 simulations. The last line in each period gives the running time of each method, and the last row of Table I shows the mean value for each method.

[Fig. 5: Actual and prediction data of the CSI 300 futures contract (small-scale datasets); test periods 3/01 13:01-3/02 10:50, 3/13 14:21-3/14 13:40, 3/24 10:11-3/24 15:00, 4/07 13:01-4/10 10:50, and 4/19 14:21-4/20 13:40.]

TABLE I: The Error and Running Time of Prediction (Small-scale Datasets)

Testing Period          Criteria   DL        BP        ELM       RBF
3/01 13:01-3/02 10:50   RMSE       0.4167    1.6583    0.8595    0.7208
                        MAPE       0.0001    0.0004    0.0002    0.0002
                        DA         0.7150    0.4800    0.5850    0.6750
                        Time(s)    10.0472   5.3171    0.3734    18.6100
3/13 14:21-3/14 13:40   RMSE       1.6441    8.5737    3.7178    3.7802
                        MAPE       0.0004    0.0025    0.0010    0.0008
                        DA         0.4850    0.4450    0.4500    0.4600
                        Time(s)    14.4155   1.7276    0.1273    19.0071
3/24 10:11-3/24 15:00   RMSE       2.1212    3.4200    5.4519    4.2656
                        MAPE       0.0004    0.0007    0.0011    0.0006
                        DA         0.7650    0.5050    0.4950    0.7350
                        Time(s)    19.1702   1.6232    0.1061    16.1977
4/07 13:01-4/10 10:50   RMSE       0.5213    6.0358    1.8921    0.7040
                        MAPE       0.0001    0.0017    0.0005    0.0001
                        DA         0.8200    0.4200    0.5300    0.7800
                        Time(s)    24.4831   1.6459    0.0907    18.1132
4/19 14:21-4/20 13:40   RMSE       0.5151    4.9356    2.5872    4.4237
                        MAPE       0.0001    0.0012    0.0007    0.0010
                        DA         0.7900    0.4750    0.4900    0.5250
                        Time(s)    29.9869   1.7319    0.0927    16.5090
Mean                    RMSE       1.0437    4.9247    2.9017    2.7788
                        MAPE       0.0002    0.0013    0.0007    0.0005
                        DA         0.7150    0.4650    0.5100    0.6350
                        Time(s)    19.6206   2.4091    0.1581    17.6874

Table I shows that the DL prediction of the opening price per minute of the CSI 300 futures contract outperforms the BP, ELM, and RBF methods in both accuracy and direction. The mean RMSE value of DL is 1.0437, lower than the mean RMSE values of BP (4.9247), ELM (2.9017), and RBF (2.7788), and the mean MAPE value of DL (0.0002) is also lower than those of BP (0.0013), ELM (0.0007), and RBF (0.0005). Thus the DL method has the lowest error when predicting the price of the CSI 300 index futures. The DL mean directional accuracy of 0.7150 is also higher than the mean DA values of BP (0.4650), ELM (0.5100), and RBF (0.6350). The running times for the small datasets do not indicate that DL is superior to the other methods; DL usually has more than one hidden layer and thus requires more time to train the network. These results indicate that the predictive performance of DL is better than that of BP, ELM, and RBF, implying that DL may be a useful tool when forecasting stock market behavior.


B. Predictive performance of medium-scale datasets

When a dataset contains 30000 data points, we divide it into two medium datasets. We use the first 27000 data points (90%) to train the network and the remaining 3000 (10%) to test the predictive performance of each method. Figure 6 shows the sample values and forecasts of the CSI 300 futures contract during the two periods. Table II compares the predictive performances of the DL, BP, ELM, and RBF methods. Listed are the average values of 10 simulation results for each dataset and a summary of the mean value for each method.

[Fig. 6: Actual and prediction data of the CSI 300 futures contract (medium-scale datasets); test periods 3/16 14:01-3/20 14:20 and 4/18 13:21-4/20 13:40.]

TABLE II: The Error and Running Time of Prediction (Medium-scale Datasets)

Testing Period          Criteria   DL        BP        ELM       RBF
3/16 14:01-3/20 14:20   RMSE       0.7324    5.9677    4.2976    2.9929
                        MAPE       0.0001    0.0017    0.0010    0.0006
                        DA         0.7440    0.4780    0.4700    0.6040
                        Time(s)    13.0792   4.8034    0.3846    229.3378
4/18 13:21-4/20 13:40   RMSE       0.8531    4.2729    2.1513    4.8682
                        MAPE       0.0002    0.0012    0.0005    0.0012
                        DA         0.7180    0.4700    0.5420    0.5400
                        Time(s)    9.4341    1.5968    0.1121    208.4432
Mean                    RMSE       0.7928    5.1203    3.2245    3.9305
                        MAPE       0.0002    0.0014    0.0007    0.0009
                        DA         0.7310    0.4740    0.5060    0.5720
                        Time(s)    11.2566   3.2001    0.2482    218.8905

Table II shows that the DL predictions are superior to those of BP, ELM, and RBF in both accuracy and direction. DL has a mean RMSE value of 0.7928, much lower than the mean RMSE values of BP (5.1203), ELM (3.2245), and RBF (3.9305). The mean MAPE value of DL is 0.0002, also lower than those of the BP (0.0014), ELM (0.0007), and RBF (0.0009) methods, indicating that the error of DL when predicting the price of the CSI 300 index futures is minimal. The mean DA value of DL is 0.7310 for directional prediction, much higher than those of BP (0.4740), ELM (0.5060), and RBF (0.5720). When the size of the sample data increases, DL also shows better performance: in contrast to the average RBF running time of 218.8905 seconds, DL requires on average 11.2566 seconds to complete the training and testing with a usable predictive accuracy. Thus the high predictive performance of DL is also confirmed on the medium datasets.

C. Predictive performance of large-scale datasets

Our large-scale dataset contains 60000 data points. We use the first 54000 data points (90%) to train the network and the remaining 6000 (10%) to test the predictive accuracy of the methods. We have one large-scale dataset in the current database. Figure 7 shows the sample values and the forecasts of the CSI 300 futures contract in the predictive period. Table III shows the predictive performances of the DL, BP, ELM, and RBF methods. The average value of 10 simulation results is again reported.

[Fig. 7: Actual and prediction data of the CSI 300 futures contract (large-scale dataset); test period 4/14 13:01-4/20 13:40.]

TABLE III: The Error and Running Time of Prediction (Large-scale Datasets)

Testing Period          Criteria   DL        BP        ELM       RBF
4/14 13:01-4/20 13:40   RMSE       0.6423    7.6147    1.7071    2.1135
                        MAPE       0.0001    0.0020    0.0004    0.0002
                        DA         0.8120    0.4790    0.5460    0.7870
                        Time(s)    26.8537   5.5083    0.4637    3681.4000

Table III shows that the DL predictions of value and direction are more accurate than those of the BP, ELM, and RBF methods. The RMSE value of DL (0.6423) is lower than those of BP (7.6147), ELM (1.7071), and RBF (2.1135). The MAPE value of DL (0.0001) is also lower than those of BP (0.0020), ELM (0.0004), and RBF (0.0002). The error is smallest when using DL to predict the price of the CSI 300 index futures. The DA value of DL (0.8120) is also higher than those of BP (0.4790), ELM (0.5460), and RBF (0.7870), indicating that the deep learning method has the highest directional predictive accuracy. The running time of DL is also superior to that of RBF: although the DA value of RBF reaches 0.7870, RBF requires 3681.4000 seconds to process data of this scale, whereas DL requires only 26.8537 seconds to process the same data, with higher predictive accuracy. Thus the predictive performance of DL is better than that of the other methods for the large-scale dataset.

Figure 8 shows the empirical results for the different size datasets and compares their average predictive performances.


[Fig. 8: Predictive performance comparison of different scale datasets: mean RMSE, MAPE, DA, and running time of DL, BP, ELM, and RBF for the small-, medium-, and large-scale datasets.]

DL exhibits the minimum RMSE and MAPE errors for the small, medium, and large datasets, and its directional predictions are the most accurate. The predictive accuracy of DL increases as the sample volume increases. Although the running time of DL exceeds that of BP and ELM, it is much less than that of RBF, and although RBF achieves a relatively high directional accuracy, its excessive running time obviates this advantage.

The empirical analysis shows that DL can be trained to a given accuracy and that it fits the actual data well. We construct a non-linear map from the transaction data to the closing price of the following minute. The error reaches an ideal value after a few iterations and satisfies the experimental requirements. DL also has the highest fitting degree for directional prediction, and its accuracy increases as the size of the sample data increases. The fitting degrees of BP and ELM are less accurate, and the higher accuracy of RBF is undone by its excessively long running time.

We follow the study of Chong et al., add trade volume to the input, and test high frequency (one-minute) stock market data. We find that DL is more effective when applied to high frequency data, consistent with the assumption of Chong et al. [22]. Thus DL is an effective method, particularly for large high frequency data. It improves predictive accuracy, reduces investment blindness, and lowers investment risk.

V. CONCLUSION

Predicting stock market behavior is challenging. Nonlinear relationships among transaction data and unpredictable factors in market fluctuations make predictions difficult. Deep learning is a machine learning method suitable for solving nonlinear approximations and has been successfully applied in many fields.

To achieve better stock market predictive performance, we have used a deep architecture-based model. We use one-minute high frequency transaction data from the CSI 300 futures contract (IF1704) in the Chinese stock market and carry out an empirical analysis. To show the effect of sample volume on network training and prediction, we divide the sample into three dataset scales and compare the stock price prediction of deep learning with three traditional artificial neural networks (BP, ELM, RBF). We compare their fitting degree and directional predictive accuracy and find that the predictive performance of deep learning is superior to that of BP, ELM, and RBF. We also find that increasing the sample volume significantly increases the predictive performance of deep learning. We thus find that sample volume strongly affects stock prediction, and that deep learning performs well when applied to large data. Furthermore, deep learning does not need prior predictive information to extract features from large datasets, and this increases its usefulness in predicting stock market behavior. This result provides additional evidence that DL is an effective method of predicting stock prices.

Our deep learning prediction model expands our ability to analyze financial market behavior. Because there are complex relationships among stock futures prices and such factors as the economy, politics, the environment, and culture, future research could apply complex network theory to key input variables that influence stock prices and returns. That would allow the construction of a deep learning network that would facilitate better predictive performance.

ACKNOWLEDGMENT

Financial support from the National Social Science Fund of China (17BGL143) is gratefully acknowledged.

REFERENCES

[1] Pesaran MH, Timmermann A. "Predictability of stock returns: Robustness and economic significance," The Journal of Finance, vol.50, no.4, pp.1201-1228, Sep. 1995.
[2] Ang A, Bekaert G. "Stock return predictability: Is it there?" The Review of Financial Studies, vol.20, no.3, pp.651-707, Jul. 2006.
[3] Rapach DE, Strauss JK, Zhou G. "International stock return predictability: what is the role of the United States?" The Journal of Finance, vol.68, no.4, pp.1633-1662, Aug. 2013.
[4] Qi M. "Nonlinear predictability of stock returns using financial and economic variables," Journal of Business & Economic Statistics, vol.17, no.4, pp.419-429, Oct. 1999.
[5] Aiolfi M, Favero CA. "Model uncertainty, thick modelling and the predictability of stock returns," Journal of Forecasting, vol.24, no.4, pp.233-254, Jul. 2005.
[6] Huang W, Nakamori Y, Wang SY. "Forecasting stock market movement direction with support vector machine," Computers & Operations Research, vol.32, no.10, pp.2513-2522, Oct. 2005.
[7] Atsalakis GS, Valavanis KP. "Forecasting stock market short-term trends using a neuro-fuzzy based methodology," Expert Systems with Applications, vol.36, no.7, pp.10696-10707, Sep. 2009.
[8] Schumaker RP, Chen H. "Textual analysis of stock market prediction using breaking financial news: The AZFin text system," ACM Transactions on Information Systems (TOIS), vol.27, no.2, pp.12, Feb. 2009.
[9] Adebiyi AA, Adewumi AO, Ayo CK. "Comparison of ARIMA and artificial neural networks models for stock price prediction," Journal of Applied Mathematics, 2014.
[10] Hinich MJ, Patterson DM. "Evidence of nonlinearity in daily stock returns," Journal of Business & Economic Statistics, vol.3, no.1, pp.69-77, Jan. 1985.
[11] McMillan DG. "Nonlinear predictability of stock market returns: Evidence from nonparametric and threshold models," International Review of Economics & Finance, vol.10, no.4, pp.353-368, Dec. 2001.
[12] Kanas A. "Nonlinearity in the stock price-dividend relation," Journal of International Money and Finance, vol.24, no.4, pp.583-606, Jun. 2005.

[13] Bonilla CA, Romero-Meza R, Hinich MJ. "Episodic nonlinearity in Latin American stock market indices," Applied Economics Letters, vol.13, no.3, pp.195-199, Feb. 2006.
[14] Hinton GE, Salakhutdinov RR. "Reducing the dimensionality of data with neural networks," Science, vol.313, no.5786, pp.504-507, Jul. 2006.
[15] Hinton GE, Osindero S, Teh YW. "A fast learning algorithm for deep belief nets," Neural Computation, vol.18, no.7, pp.1527-1554, Jul. 2006.
[16] LeCun Y, Bengio Y, Hinton G. "Deep learning," Nature, vol.521, no.7553, pp.436-444, May 2015.
[17] Ciresan D, Meier U, Schmidhuber J. "Multi-column deep neural networks for image classification," in CVPR, 2012, pp.3642-3649.
[18] Krizhevsky A, Sutskever I, Hinton GE. "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp.1097-1105.
[19] Deng L, Yu D. "Deep learning: methods and applications," Foundations and Trends in Signal Processing, vol.7, no.3-4, pp.197-387, Jun. 2014.
[20] Chen XW, Lin X. "Big data deep learning: challenges and perspectives," IEEE Access, vol.2, pp.514-525, 2014.
[21] Jin X, Shao J, Zhang X, An W, Malekian R. "Modeling of nonlinear system based on deep learning framework," Nonlinear Dynamics, vol.84, no.3, pp.1327-1340, May 2016.
[22] Chong E, Han C, Park FC. "Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies," Expert Systems with Applications, vol.83, pp.187-205, Oct. 2017.
[23] Ding X, Zhang Y, Liu T, Duan J. "Deep learning for event-driven stock prediction," in IJCAI, 2015, pp.2327-2333.
[24] Heaton JB, Polson NG, Witte JH. "Deep learning for finance: deep portfolios," Applied Stochastic Models in Business and Industry, vol.33, no.1, pp.3-12, Jan. 2017.
[25] Gao Z, Dang W, Mu C, Yang Y, Li S, Grebogi C. "A novel multiplex network-based sensor information fusion model and its application to industrial multiphase flow system," IEEE Transactions on Industrial Informatics, Dec. 2017.
[26] Wang M, Tian L. "From time series to complex networks: The phase space coarse graining," Physica A: Statistical Mechanics and its Applications, vol.461, pp.456-468, Nov. 2016.
[27] Gao ZK, Cai Q, Yang YX, Dong N, Zhang SS. "Visibility graph from adaptive optimal kernel time-frequency representation for classification of epileptiform EEG," International Journal of Neural Systems, vol.27, no.4, pp.1750005, Jun. 2017.
[28] Gao ZK, Li S, Dang WD, Yang YX, Do Y, Grebogi C. "Wavelet multiresolution complex network for analyzing multivariate nonlinear time series," International Journal of Bifurcation and Chaos, vol.27, no.8, pp.1750123, Jul. 2017.
[29] Wang M, Vilela AL, Du R, Zhao L, Dong G, Tian L, Stanley HE. "Exact results of the limited penetrable horizontal visibility graph associated to random time series and its application," Scientific Reports, vol.8, no.1, pp.5130, Mar. 2018.
[30] Chourmouziadis K, Chatzoglou PD. "An intelligent short term stock trading fuzzy system for assisting investors in portfolio management," Expert Systems with Applications, vol.43, pp.298-311, Jan. 2016.
[31] Hinton G. "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, Berlin: Springer, 2012, pp.599-619.
[32] Wang M, Chen Y, Tian L, et al. "Fluctuation behavior analysis of international crude oil and gasoline price based on complex network perspective," Applied Energy, vol.175, pp.109-127, Aug. 2016.
[33] Wang M, Tian L, Zhou P. "A novel approach for oil price forecasting based on data fluctuation network," Energy Economics, vol.71, pp.201-212, Mar. 2018.
[34] Wang M, Zhao L, Du R, et al. "A novel hybrid method of forecasting crude oil prices using complex network science and artificial intelligence algorithms," Applied Energy, vol.220, pp.480-495, Jun. 2018.
[35] Jin W, Li ZJ, Wei LS, Zhen H. "The improvements of BP neural network learning algorithm," in WCCC-ICSP, 2000, pp.1647-1649.
[36] Schwenker F, Kestler HA, Palm G. "Three learning phases for radial-basis-function networks," Neural Networks, vol.14, no.4-5, pp.439-458, May 2001.
[37] Hyndman RJ, Koehler AB. "Another look at measures of forecast accuracy," International Journal of Forecasting, vol.22, no.4, pp.679-688, Oct. 2006.
[38] Prestwich S, Rossi R, Armagan Tarim S, Hnich B. "Mean-based error measures for intermittent demand forecasting," International Journal of Production Research, vol.52, no.22, pp.6782-6791, Nov. 2014.
[39] Stathakis D. "How many hidden layers and nodes?" International Journal of Remote Sensing, vol.30, no.8, pp.2133-2147, Apr. 2009.

Lin Chen is a professor with the School of Management, Northwestern Polytechnical University, China. She received the B.S. degree in accounting and the Ph.D. degree in management from Xi'an Jiaotong University in 2006. During 2011-2012, she was a visiting scholar at the University of Michigan. She is the author of one book and more than 20 articles. Her research interests include performance evaluation, corporate governance, econometrics, and big data analytics in financial markets.

Zhilin Qiao is an associate professor with the School of Economics and Finance, Xi'an Jiaotong University, China. He received the Ph.D. degree in management from Xi'an Jiaotong University in 2006. During 2006-2008, he did research at the Antai College of Economics and Management, Shanghai Jiao Tong University. During 2011-2012, he was a visiting scholar at the University of Michigan. He is the author of more than 20 academic articles. His research interests include experimental economics, behavioral finance, and complexity science in economics and management.

Minggang Wang is an associate professor with the School of Mathematics, Nanjing Normal University, China. He received the Ph.D. degree in science from Nanjing Normal University in 2015. Since July 2017, he has been a visiting scholar at Boston University, USA. He is the author of more than 20 articles in Scientific Reports, Applied Energy, Energy Economics, etc. His research interests include complexity science in economics and management, econometrics, time series analysis, data mining, and big data analysis.

Chao Wang is an associate professor with the School of Economics and Management, Beijing University of Technology, China. He is also a postdoctoral fellow at Boston University, USA. His research focuses on metaheuristics in optimization problems and econophysics. He conducted his doctoral study in management science and engineering at Beijing Jiaotong University (BJTU). He received the Ph.D. from BJTU (2015) with joint training by Purdue University (2013 & 2014). His research has been published in Computers & Industrial Engineering, Engineering Computations, etc.

Ruijin Du is an associate professor with the Faculty of Science at Jiangsu University, China. Since August 2017, she has been a visiting scholar at Boston University, USA. Her current work involves robustness of complex networks, modeling and analysis of complex systems, data mining, and big data analysis. She has published more than 20 academic papers in Applied Energy, Physical Review E, Scientific Reports, Chaos, Europhysics Letters, Physica A, etc.


H. Eugene Stanley is an American physicist and university professor with Boston University, USA. He received the Ph.D. degree in physics from Harvard University in 1967 and was elected to the U.S. National Academy of Sciences in 2004. He has made fundamental contributions to complex systems and is one of the founding fathers of econophysics. His current research interests include complexity science, econometrics, etc.
