Which Artificial Intelligence Algorithm Better Predicts the Chinese Stock Market?
Abstract—Unpredictable stock market factors make it difficult to predict stock index futures. Although efforts to develop an effective prediction method have a long history, recent developments in artificial intelligence and the use of artificial neural networks have increased our success in nonlinear approximation. When we study financial markets we can now extract features from a big data environment without prior predictive information. We here propose to further improve this predictive performance using a combination of a deep-learning based stock index futures prediction model, an autoencoder, and a restricted Boltzmann machine. We use high frequency data to examine the predictive performance of deep learning, and we compare it with three traditional artificial neural networks: (i) the back propagation neural network, (ii) the extreme learning machine, and (iii) the radial basis function neural network. We use all of the one-minute high frequency transaction data of the CSI 300 futures contract (IF1704) in our empirical analysis, and we test three groups of different volume samples to validate our observations. We find that the deep learning method of predicting stock index futures outperforms the back propagation, the extreme learning machine, and the radial basis function neural network in both fitting degree and directional predictive accuracy. We also find that increasing the amount of data increases predictive performance. This indicates that deep learning captures the nonlinear features of transaction data and can serve as a powerful stock index futures prediction tool for financial market investors.

Index Terms—Prediction methods, Artificial neural networks, Stock markets, Deep learning.

I. INTRODUCTION

Stock market prediction is a classic topic in both financial circles and academia. Extreme stock market fluctuations, e.g., the global stock market turmoil in February 2018, damage financial markets and the global economy. We thus need a more effective way of predicting market fluctuations. Among the many predictive efforts over the last few decades [1-3], some have had success using quantitative methods [4-9], such as autoregressive integrated moving average (ARIMA) models, artificial neural networks, support vector machines, and neuro-fuzzy based systems, but because of the nonlinear characteristics of stock market behavior, financial economists continue to debate these methodologies [10-13].

Recently "deep learning" (DL) has attracted a great deal of attention in many research fields. DL is a new area of machine learning that has improved the ability of computers in areas of image recognition and classification, natural language processing, speech recognition, and social network filtering [14-16]. In some cases the results are comparable or even superior to those of human experts [17,18]. The structure of DL is a multi-layer neural network that uses a cascade of multiple layers of nonlinear processing units to extract and transform features. Its learning can be either supervised or unsupervised, and it forms a hierarchy of concepts by utilizing multiple representation levels that correspond to different levels of abstraction [19]. Growth in computational power and big data processing has expanded the use of DL and allows more sophisticated algorithms. Previous studies indicate that DL solves nonlinear problems more efficiently than traditional methods [19-21]. Irrespective of the level of complication or the presence of linear and nonlinear big data financial market factors, DL can extract abstract features and identify hidden relationships in financial markets without making econometric assumptions [22]. Traditional financial economic methods and other quantitative techniques cannot do this. Unlike existing models, DL can process high frequency big data, analyze the financial market, and predict stock returns [23,24]. When setting the parameters of artificial neural networks (the learning rate, epochs, goals, and number of artificial neurons), all must be taken into account to achieve desirable results [25,26]. DL can extract abstract features without subjective interference. We thus do not need to add influencing factors or control variables when we use a large time-series dataset to predict financial market behavior. Other methods have also been applied to time-series analysis [27-29]. For example, Gao et al. propose a new wavelet multiresolution complex network for analyzing multivariate time series [28], which is capable of analyzing information in the dynamical and topological fields and has been successfully applied in many research fields. But DL can provide accurate financial market time-series forecasting, and the accuracy increases as the size of the database increases [30]. DL is thus well suited to time-series prediction in the financial market.

We here use deep learning to predict stock market behavior, and we compare the performance of this approach with the performance of the traditional back propagation (BP) network, the extreme learning machine (ELM), and the radial basis function (RBF) network. Many studies have used traditional predictive methods and low frequency data to predict stock prices and returns, but the market inefficiencies caused by high frequency microstructure noise may provide additional profit opportunities. When traditional predictive methods are used to examine high frequency data ...

Lin Chen is with the School of Management, Northwestern Polytechnical University, Xi'an 710072, China.
Zhilin Qiao is with the School of Economics and Finance, Xi'an Jiaotong University, Xi'an 710061, China.
Minggang Wang is with Nanjing Normal University. Chao Wang is with Beijing University of Technology. Ruijin Du is with Jiangsu University. H. Eugene Stanley is with Boston University.
Manuscript received April 4, 2018; revised July 12, 2018.
... In Sec. 2 we describe deep learning and other artificial intelligence algorithms. In Sec. 3 we describe the data source and the data characteristics and propose the criteria for evaluating the performance of our model. Sec. 4 provides our empirical results, and Sec. 5 provides the conclusions.

II. DEEP LEARNING AND OTHER ARTIFICIAL INTELLIGENCE ALGORITHMS

A. Deep learning

Deep learning is a machine learning technology that bypasses manual feature extraction and extracts features mechanistically. Deep learning simulates the way the cerebral cortex abstracts data or signals, layer by layer, and models the image recognition of the cerebral cortex. Deep learning first extracts low-level features from original signals. It then extracts high-level features, first from low-level features and then from higher-level features. In an image recognition system, the original signals are pixels, the low-level features are the edges of objects, the high-level features are the contours, and the highest-level feature is the image. Using the characteristics of high-level classification, deep learning outputs forecast results [15]. Figure 1 shows the deep learning hierarchy.

[Fig. 1: The deep learning hierarchy: low-level features, high-level features, and higher-level features extracted layer by layer.]

Deep learning abstracts and transfers features of data through different layers. Here f is the activation function. Given input X and predictable output Ŷ, the prediction function of deep learning is

  M_1 = f_1(W_0 X + b_0),
  M_2 = f_2(W_1 M_1 + b_1),
  ...
  M_L = f_L(W_{L-1} M_{L-1} + b_{L-1}),
  Ŷ(X) = W_L M_L + b_L,                                         (1)

where W indicates the weight matrices, b the biases, and L the number of layers. The training process of deep learning has two stages: the first is unsupervised learning from bottom to top; the second is supervised learning from top to bottom. The widely used data representation methods in deep learning are two nonlinear transformations, (i) the autoencoder (AE) and (ii) the restricted Boltzmann machine (RBM).
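Before turning to these two representations, the layered map in Eq. (1) can be made concrete with a short numpy sketch; the layer sizes, random initial weights, and sigmoid activation below are illustrative assumptions, not the network configuration used in the paper:

```python
import numpy as np

def forward(X, weights, biases, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Layered prediction of Eq. (1): M_l = f_l(W_{l-1} M_{l-1} + b_{l-1}),
    followed by a linear output layer Y_hat = W_L M_L + b_L."""
    M = X
    for W, b in zip(weights[:-1], biases[:-1]):  # hidden layers, nonlinear
        M = f(W @ M + b)
    W_L, b_L = weights[-1], biases[-1]           # output layer, linear
    return W_L @ M + b_L

# Toy dimensions: 4 inputs -> 8 -> 8 -> 1 output (illustrative only).
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) * 0.1
           for i in range(len(dims) - 1)]
biases = [np.zeros((dims[i + 1], 1)) for i in range(len(dims) - 1)]
x = rng.standard_normal((4, 1))       # one input vector X
print(forward(x, weights, biases))    # predicted Y_hat(X)
```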
1) Autoencoder: In the encoder stage, the input x_l ∈ R^p = χ of layer l is mapped to y_l ∈ R^q = κ with the function

  y_l = f_l(W_l x_l + b_l).                                     (2)

A sigmoid function is typically used as the activation function f_l, and we use it here. In the decoder stage, y_l is mapped to a reconstruction x'_l of the same shape as x_l using the function x'_l = f'_l(W'_l y_l + b'_l). Autoencoders are trained to minimize the reconstruction error

  E(X, X') = ‖X − X'‖² = ‖X − f'(W'(f(WX + b)) + b')‖².         (3)

We here use autoencoders to set the initial weights and thresholds, which reduces the error and reaches the desired range more rapidly.
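As a concrete reading of Eqs. (2)-(3), the following minimal sketch computes the reconstruction error for one layer; the tied decoder weights (W' = W^T), the layer sizes, and the sigmoid on the decoder are illustrative assumptions the text does not specify. In practice, gradients of this error would be used to update W and b, giving the initial weights mentioned above.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def reconstruction_error(X, W, b, W_dec, b_dec):
    """Eq. (3): squared error between X and decode(encode(X))."""
    Y = sigmoid(W @ X + b)               # encoder, Eq. (2)
    X_rec = sigmoid(W_dec @ Y + b_dec)   # decoder
    return np.sum((X - X_rec) ** 2)

rng = np.random.default_rng(1)
p, q = 6, 3                              # visible and hidden sizes (illustrative)
W, b = rng.standard_normal((q, p)) * 0.1, np.zeros((q, 1))
W_dec, b_dec = W.T.copy(), np.zeros((p, 1))   # tied weights, a common choice
X = rng.uniform(size=(p, 1))
print(reconstruction_error(X, W, b, W_dec, b_dec))
```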
2) Restricted Boltzmann Machine: The restricted Boltzmann machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution. It is a restricted variant of a Boltzmann machine in which the neurons must form a bipartite graph. Depending on the task, the training of an RBM can be either supervised or unsupervised. A standard RBM has binary-valued hidden and visible units. Here X and Y are the visible (input) and hidden layers, respectively. The energy of a configuration (a pair of boolean vectors) (X, Y) is defined as

  G(X, Y) = −α^T X − β^T Y − X^T W Y,                           (4)

where the parameters α, β, and W are the weights of the visible units, the hidden units, and the connections between hidden and visible units, respectively. The probability distributions over hidden or visible vectors in general Boltzmann machines are defined in terms of the energy function [31]:

  P(X, Y) = (1/Z) e^{−G(X,Y)},                                  (5)
where Z is a partition function defined as the sum of e^{−G(X,Y)} over all possible configurations, and the marginal probability of a visible vector is the sum over all possible hidden layer configurations,

  P(X) = (1/Z) Σ_Y e^{−G(X,Y)}.                                 (6)

With the restriction that the RBM has the shape of a bipartite graph with no intra-layer connections, the hidden unit activations are mutually independent given the visible unit activations, and vice versa. In this paper we use the RBM to pre-train the network layer by layer, and then fine-tune it with a feedback method.
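The paper does not spell out how the RBM parameters are updated during layer-wise pre-training; the standard choice for this energy model is contrastive divergence (CD-1) [31], so the following sketch should be read as that assumed variant, with toy sizes and learning rate:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(X, W, alpha, beta, lr=0.1):
    """One contrastive-divergence (CD-1) update for the RBM energy of Eq. (4).
    X: batch of visible vectors, shape (n, p). Returns updated parameters."""
    # Positive phase: P(Y=1 | X); hidden units are independent given the
    # visibles because of the bipartite restriction.
    ph = sigmoid(X @ W + beta)
    h = (rng.uniform(size=ph.shape) < ph).astype(float)  # sample hidden states
    # Negative phase: reconstruct visibles, then hidden probabilities again.
    pv = sigmoid(h @ W.T + alpha)
    nh = sigmoid(pv @ W + beta)
    n = X.shape[0]
    W += lr * (X.T @ ph - pv.T @ nh) / n
    alpha += lr * (X - pv).mean(axis=0)
    beta += lr * (ph - nh).mean(axis=0)
    return W, alpha, beta

p, q = 6, 4                                  # visible/hidden sizes (illustrative)
W = rng.standard_normal((p, q)) * 0.1
alpha, beta = np.zeros(p), np.zeros(q)
X = (rng.uniform(size=(20, p)) < 0.5).astype(float)  # toy binary data
for _ in range(100):
    W, alpha, beta = cd1_step(X, W, alpha, beta)
```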
B. Artificial intelligence algorithms

Artificial intelligence algorithms have been used to predict economic and market behavior [32-34]. To compare predictive performances, we focus on three popular artificial neural networks: (i) the back propagation neural network, (ii) the extreme learning machine, and (iii) the radial basis function neural network.

1) Back propagation neural network: The back propagation (BP) neural network is an artificial intelligence algorithm widely used in prediction, in particular for advanced multiple regression analysis. It generates complex and nonlinear responses better than a standard regression analysis [35]. The weight update of the BP algorithm is

  W(n) = W(n − 1) − ΔW(n),                                      (7)

where

  ΔW(n) = γ (∂E/∂W)(n − 1) + θ ΔW(n − 1).                       (8)

Here γ is the learning rate, ∂E/∂W the gradient of the error function E, and θΔW(n − 1) the incremental (momentum) term. A BP network uses the gradient method, and the learning and inertial factors are determined by experience, which affects convergence. A BP network converges rapidly to a local minimum, but its overall learning speed is slow, so it has relatively limited applications.
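Eqs. (7)-(8) amount to gradient descent with a momentum term. A minimal sketch follows; the quadratic error function and the values of γ and θ are chosen only for illustration:

```python
import numpy as np

def bp_update(W, dE_dW, prev_dW, gamma=0.01, theta=0.9):
    """Eqs. (7)-(8): W(n) = W(n-1) - dW(n), with
    dW(n) = gamma * dE/dW(n-1) + theta * dW(n-1) (momentum term)."""
    dW = gamma * dE_dW + theta * prev_dW
    return W - dW, dW

# Toy usage on the quadratic error E = 0.5 * ||W||^2, so dE/dW = W.
W = np.array([1.0, -2.0])
prev_dW = np.zeros_like(W)
for _ in range(50):
    W, prev_dW = bp_update(W, W, prev_dW)
print(W)   # converges toward the minimum at the origin
```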
2) Extreme learning machine: The extreme learning machine (ELM) is a feedforward neural network for classification, regression, clustering, and feature learning with a single layer or multiple layers of hidden nodes in which the parameters of the hidden nodes need not be tuned. For a set of training samples {(X_j, Z_j)}_{j=1}^S with S samples and V classes, the single hidden layer feedforward neural network with d hidden nodes and activation function f is

  Y_j = Σ_{i=1}^d η_i f_i(X_j) = Σ_{i=1}^d η_i f(W_i X_j + e_i),  j = 1, 2, ..., S,   (9)

where X_j = [x_{j1}, x_{j2}, ..., x_{js}]^T is the input, Z_j = [z_{j1}, z_{j2}, ..., z_{jv}]^T the corresponding output, W_i = [w_{i1}, w_{i2}, ..., w_{is}]^T the connecting weights of hidden neuron i to the input neurons, e_i the bias of hidden node i, η_i = [η_{i1}, η_{i2}, ..., η_{iv}]^T the connecting weights of hidden neuron i to the output neurons, and Y_j the actual network output. Usually the hidden parameters {W_i, e_i} are randomly generated during training without tuning. The ELM solves the compact model using the error minimization

  min_η ‖Dη − Z‖_F,                                             (10)

with

  D(W_1, W_2, ..., W_d, e_1, e_2, ..., e_d) =
    | f(W_1 X_1 + e_1)  ...  f(W_d X_1 + e_d) |
    |       ...         ...        ...        |
    | f(W_1 X_S + e_1)  ...  f(W_d X_S + e_d) |,

  η = [η_1^T; ...; η_d^T],   Z = [Z_1^T; ...; Z_S^T],

where D is the hidden layer output matrix and η the output weight matrix. ELM randomly selects the hidden node parameters, and only the output weight parameters need to be determined. Despite these advantages, ELM cannot be used when the time series is noisy.
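As an illustration of Eqs. (9)-(10), here is a minimal numpy sketch of an ELM regressor; the hidden width, sigmoid activation, and toy sine-fitting data are illustrative assumptions, and the least-squares solve stands in for the minimization in Eq. (10):

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, Z, d=50):
    """Random hidden layer {W_i, e_i}, then solve min_eta ||D eta - Z||_F
    (Eq. (10)) by least squares. X: (S, s) inputs, Z: (S, v) targets."""
    W = rng.standard_normal((X.shape[1], d))   # hidden weights, not tuned
    e = rng.standard_normal(d)                 # hidden biases, not tuned
    D = sigmoid(X @ W + e)                     # hidden layer output matrix
    eta, *_ = np.linalg.lstsq(D, Z, rcond=None)  # output weights only
    return W, e, eta

def elm_predict(X, W, e, eta):
    return sigmoid(X @ W + e) @ eta

# Toy regression: approximate sin on [0, 2*pi] (illustrative data).
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
Z = np.sin(X)
W, e, eta = elm_fit(X, Z)
print(np.max(np.abs(elm_predict(X, W, e, eta) - Z)))  # small fit error
```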
3) Radial basis function neural network: The radial basis function (RBF) network is an artificial neural network that uses radial basis functions for activation. The network output is a linear combination of radial basis functions of the inputs and neuron parameters. RBF networks have many uses, including function approximation, time series prediction, classification, and system control [36]. Radial basis functions are often used to construct function approximations,

  y(x) = Σ_{i=1}^N μ_i υ(‖x − x_i‖),                            (11)

where y(x) is the sum of N radial basis functions and μ_i is the weight, estimated using the linear least squares method. The Gaussian function is a commonly used radial basis function, and we use it here for υ(‖x − x_i‖). Geometrically, the RBF network divides the input space into hypersphere subspaces, which can cause such algorithmic problems as overfitting, overtraining, the small-sample effect, and singularities.
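A minimal sketch of the approximation in Eq. (11), assuming a one-dimensional input, Gaussian basis functions, and a fixed width s (all illustrative choices the text does not specify):

```python
import numpy as np

def rbf_fit(x, y, centers, s=1.0):
    """Eq. (11): y(x) = sum_i mu_i * v(||x - x_i||) with a Gaussian v;
    the weights mu are obtained by linear least squares, as in the text."""
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    mu, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return mu

def rbf_eval(x, centers, mu, s=1.0):
    Phi = np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))
    return Phi @ mu

# Toy 1-D approximation (illustrative): N = 10 Gaussian basis functions.
x = np.linspace(-3, 3, 100)
y = np.tanh(x)
centers = np.linspace(-3, 3, 10)
mu = rbf_fit(x, y, centers)
print(np.max(np.abs(rbf_eval(x, centers, mu) - y)))   # small residual
```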
III. DATA AND PERFORMANCE EVALUATION CRITERIA

A. Data

China Securities Index Co., Ltd, a joint venture of the Shanghai and Shenzhen Stock Exchanges, received permission from the China Securities Regulatory Commission to release the China securities indices. The CSI 300 is a representative index among these. It is a capitalization-weighted stock market index designed to represent the performance of the top 300 stocks listed on the Shanghai and Shenzhen stock exchanges. As an early stock index futures contract issued in China, the CSI 300 futures contract has enriched the existing index system in China's securities markets and achieved great success. The CSI 300 index futures began trading on the China Financial Futures Exchange (CFFEX) with the commodity ticker symbol IF on 16 April 2010. The notional value of one contract is RMB 300 times the value of the CSI 300. The CSI 300 index futures provides an indicator for observing the fluctuation of the stock market, and it helps companies and individual investors to better understand stock market fluctuations.
B. Performance evaluation

We use three criteria to evaluate predictive accuracy: (i) the root-mean-square error (RMSE), (ii) the mean absolute percentage error (MAPE), and (iii) the directional accuracy (DA).
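The formulas for these criteria are truncated in this copy, so the sketch below assumes the standard definitions (RMSE and MAPE as mean errors over the test set, DA as the fraction of one-step price changes whose predicted sign is correct); these are assumptions, not the paper's stated equations:

```python
import numpy as np

def rmse(actual, pred):
    """Root-mean-square error (standard definition, assumed here)."""
    return np.sqrt(np.mean((actual - pred) ** 2))

def mape(actual, pred):
    """Mean absolute percentage error, as a fraction (assumed definition)."""
    return np.mean(np.abs((actual - pred) / actual))

def directional_accuracy(actual, pred):
    """DA: share of steps whose predicted price change has the correct sign."""
    return np.mean(np.sign(np.diff(actual)) == np.sign(np.diff(pred)))

# Toy one-minute opening-price series (illustrative values only).
actual = np.array([3500.0, 3502.0, 3499.0, 3503.0])
pred = np.array([3500.5, 3501.2, 3499.8, 3502.6])
print(rmse(actual, pred), mape(actual, pred), directional_accuracy(actual, pred))
```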
[Fig. 4: Grouping of the one-minute transaction data into small-scale (2000 data), medium-scale (5000 data), and large-scale (10000 data) sample groups; testing periods include 3/01 13:01-3/02 10:50, 3/13 14:21-3/14 13:40, 3/24 10:11-3/24 15:00, 4/07 13:01-4/10 10:50, and 4/19 14:21-4/20 13:40.]

[Fig. 5: Actual and prediction data of the CSI 300 futures contract (small-scale datasets); curves: Actual, DL, BP, ELM, RBF.]
B. Predictive performance of medium-scale datasets

We use the medium-scale datasets to test the predictive performance of each method. Figure 6 shows the sample values and forecasts of the CSI 300 futures contract during two periods. Table 2 compares the predictive performances of the DL, BP, ELM, and RBF methods. In the last row are the average values of 10 simulation results for each dataset and a summary of the mean value for each method.

[Fig. 6: Actual and prediction data of the CSI 300 futures contract (medium-scale datasets); testing periods 3/16 14:01-3/20 14:20 and 4/14 13:01-4/20 13:40; curves: Actual, DL, BP, ELM, RBF.]

[TABLE II: The Error and Running Time of Prediction (Medium-scale Datasets); columns: Testing Period, Criteria, DL, BP, ELM, RBF; entries not preserved in this copy.]

Table 2 shows that the DL predictions are superior to those of BP, ELM, and RBF in both accuracy and direction. DL has a mean RMSE value of 0.7928, much lower than the mean RMSE values of BP (5.1203), ELM (3.2245), and RBF (3.9305). The mean MAPE value of DL is 0.0002, also lower than those of the BP (0.0014), ELM (0.0007), and RBF (0.0009) methods, indicating that the error of DL when predicting the price of the CSI 300 index futures is minimal. The mean DA value of DL is 0.7310 for directional prediction, much higher and more accurate than those of BP (0.4740), ELM (0.5060), and RBF (0.5720). When the size of the sample data increases, DL also shows better performance. In contrast to the average RBF running time of 218.8905 seconds, DL requires on average 11.2566 seconds to complete the training and testing with usable predictive accuracy. Thus the high predictive performance of DL is also confirmed on medium-scale datasets.
C. Predictive performance of large-scale datasets

Our large-scale dataset contains 60000 data. We use the first 54000 data (90%) to train the network and the remaining 6000 (10%) to test the predictive accuracy of the methods. We have one large-scale dataset in the current database. Figure 7 shows the sample values and the forecasts of the CSI 300 futures contract in the predictive period. Table 3 shows the predictive performances of the DL, BP, ELM, and RBF methods. The average value of 10 simulation results is also provided.

[Fig. 7: Actual and prediction data of the CSI 300 futures contract (large-scale datasets); testing period 4/14 13:01-4/20 13:40; curves: Actual, DL, BP, ELM, RBF.]

TABLE III: The Error and Running Time of Prediction (Large-scale Datasets)

  Testing Period         Criteria   DL        BP        ELM      RBF
  4/14 13:01-4/20 13:40  RMSE       0.6423    7.6147    1.7071   2.1135
                         MAPE       0.0001    0.0020    0.0004   0.0002
                         DA         0.8120    0.4790    0.5460   0.7870
                         Time(s)    26.8537   5.5083    0.4637   3681.4000

Table 3 shows that the DL predictions of value and direction are more accurate than those of the BP, ELM, and RBF methods. The RMSE value of DL (0.6423) is lower than those of BP (7.6147), ELM (1.7071), and RBF (2.1135). The MAPE value of DL (0.0001) is also lower than those of BP (0.0020), ELM (0.0004), and RBF (0.0002). The error is smallest when using DL to predict the price of the CSI 300 index futures. The DA value of DL (0.8120) is also higher than those of BP (0.4790), ELM (0.5460), and RBF (0.7870), indicating that the deep learning method has the highest directional predictive accuracy. The running time of DL is also superior to that of RBF. Although the DA value of RBF reaches 0.7870, RBF requires 3681.4000 seconds to process data of this scale. In contrast, DL requires only 26.8537 seconds to process the same data, and its predictive accuracy is higher. Thus the predictive performance of DL is better than that of the others for the large-scale dataset.
Figure 8 shows the empirical results for the different size datasets and compares their average predictive performances. DL exhibits the minimum RMSE and MAPE errors for small, medium, and large datasets, and its directional predictions are the most accurate. The predictive accuracy of DL increases as the sample volume increases. Although the DL running time is greater than that of BP and ELM, it is much less than that of RBF; and although RBF is relatively accurate, it requires so much more running time that this advantage is obviated.

[Fig. 8: Predictive performance comparison of different scale datasets; panels: RMSE, MAPE (×10⁻³), DA, and Time(s) for DL, BP, ELM, and RBF on small-, medium-, and large-scale datasets.]

An empirical analysis shows that DL can be trained to a given accuracy, and that it fits actual data well. We construct a nonlinear map from the transaction data to the closing price of the following minute. The error reaches an ideal value after a few iterations and satisfies the experimental requirements. DL also has the highest fitting degree of directional prediction, and its accuracy increases as the size of the sample data increases. The fitting degrees of BP and ELM are less accurate, and the higher-accuracy RBF is undone by its excessively long running time.

We follow the study of Chong et al., add trade volume to the input, and test high frequency (one-minute) stock market data. We find that DL is more effective when applied to high frequency data, consistent with the assumption of Chong et al. [22]. Thus DL is an effective method, particularly for large high frequency data. It improves predictive accuracy, reduces investment blindness, and lowers investment risk.

V. CONCLUSION

Predicting stock market behavior is challenging. Nonlinear relationships among transaction data and unpredictable factors in market fluctuations make predictions difficult. Deep learning is a machine learning method suitable for solving nonlinear approximations and has been successfully applied in many fields.

To achieve better stock market predictive performance, we have used a deep architecture-based model. We use one-minute high frequency transaction data from the CSI 300 futures contract (IF1704) in the Chinese stock market and carry out an empirical analysis. To show the effect of sample volume on network training and prediction, we divide the sample into three scale datasets and compare the stock price prediction of deep learning with that of three traditional artificial neural networks (BP, ELM, RBF). We compare their predictive fitting degree and directional predictive accuracy and find that the predictive performance of deep learning is superior to that of BP, ELM, and RBF. We also find that increasing the sample volume significantly increases the predictive performance of deep learning. We thus find that sample volume strongly affects stock prediction, and that deep learning performs well when applied to large data. Furthermore, deep learning does not need prior predictive information to extract features from large datasets, and this increases its usefulness in predicting stock market behavior. This result provides additional evidence that DL is an effective method of predicting stock price.

Our deep learning prediction model expands our ability to analyze financial market behavior. Because there are complex relationships among stock futures prices and such factors as the economy, politics, the environment, and culture, future research could apply complex network theory to select the key input variables that influence stock prices and returns. That would allow the construction of a deep learning network that would facilitate better predictive performance.

ACKNOWLEDGMENT

Financial support from the National Social Science Fund of China (17BGL143) is gratefully acknowledged.

REFERENCES

[1] Pesaran MH, Timmermann A. "Predictability of stock returns: Robustness and economic significance," The Journal of Finance, vol.50, no.4, pp.1201-1228, Sep. 1995.
[2] Ang A, Bekaert G. "Stock return predictability: Is it there?" The Review of Financial Studies, vol.20, no.3, pp.651-707, Jul. 2006.
[3] Rapach DE, Strauss JK, Zhou G. "International stock return predictability: What is the role of the United States?" The Journal of Finance, vol.68, no.4, pp.1633-1662, Aug. 2013.
[4] Qi M. "Nonlinear predictability of stock returns using financial and economic variables," Journal of Business & Economic Statistics, vol.17, no.4, pp.419-429, Oct. 1999.
[5] Aiolfi M, Favero CA. "Model uncertainty, thick modelling and the predictability of stock returns," Journal of Forecasting, vol.24, no.4, pp.233-254, Jul. 2005.
[6] Huang W, Nakamori Y, Wang SY. "Forecasting stock market movement direction with support vector machine," Computers & Operations Research, vol.32, no.10, pp.2513-2522, Oct. 2005.
[7] Atsalakis GS, Valavanis KP. "Forecasting stock market short-term trends using a neuro-fuzzy based methodology," Expert Systems with Applications, vol.36, no.7, pp.10696-10707, Sep. 2009.
[8] Schumaker RP, Chen H. "Textual analysis of stock market prediction using breaking financial news: The AZFin text system," ACM Transactions on Information Systems (TOIS), vol.27, no.2, p.12, Feb. 2009.
[9] Adebiyi AA, Adewumi AO, Ayo CK. "Comparison of ARIMA and artificial neural networks models for stock price prediction," Journal of Applied Mathematics, 2014.
[10] Hinich MJ, Patterson DM. "Evidence of nonlinearity in daily stock returns," Journal of Business & Economic Statistics, vol.3, no.1, pp.69-77, Jan. 1985.
[11] McMillan DG. "Nonlinear predictability of stock market returns: Evidence from nonparametric and threshold models," International Review of Economics & Finance, vol.10, no.4, pp.353-368, Dec. 2001.
[12] Kanas A. "Nonlinearity in the stock price-dividend relation," Journal of International Money and Finance, vol.24, no.4, pp.583-606, Jun. 2005.
[13] Bonilla CA, Romero-Meza R, Hinich MJ. "Episodic nonlinearity in Latin American stock market indices," Applied Economics Letters, vol.13, no.3, pp.195-199, Feb. 2006.
[14] Hinton GE, Salakhutdinov RR. "Reducing the dimensionality of data with neural networks," Science, vol.313, no.5786, pp.504-507, Jul. 2006.
[15] Hinton GE, Osindero S, Teh YW. "A fast learning algorithm for deep belief nets," Neural Computation, vol.18, no.7, pp.1527-1554, Jul. 2006.
[16] LeCun Y, Bengio Y, Hinton G. "Deep learning," Nature, vol.521, no.7553, pp.436-444, May 2015.
[17] Ciresan D, Meier U, Schmidhuber J. "Multi-column deep neural networks for image classification," in CVPR, 2012, pp.3642-3649.
[18] Krizhevsky A, Sutskever I, Hinton GE. "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp.1097-1105.
[19] Deng L, Yu D. "Deep learning: Methods and applications," Foundations and Trends in Signal Processing, vol.7, no.3-4, pp.197-387, Jun. 2014.
[20] Chen XW, Lin X. "Big data deep learning: Challenges and perspectives," IEEE Access, vol.2, pp.514-525, 2014.
[21] Jin X, Shao J, Zhang X, An W, Malekian R. "Modeling of nonlinear system based on deep learning framework," Nonlinear Dynamics, vol.84, no.3, pp.1327-1340, May 2016.
[22] Chong E, Han C, Park FC. "Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies," Expert Systems with Applications, vol.83, pp.187-205, Oct. 2017.
[23] Ding X, Zhang Y, Liu T, Duan J. "Deep learning for event-driven stock prediction," in IJCAI, 2015, pp.2327-2333.
[24] Heaton JB, Polson NG, Witte JH. "Deep learning for finance: Deep portfolios," Applied Stochastic Models in Business and Industry, vol.33, no.1, pp.3-12, Jan. 2017.
[25] Gao Z, Dang W, Mu C, Yang Y, Li S, Grebogi C. "A novel multiplex network-based sensor information fusion model and its application to industrial multiphase flow system," IEEE Transactions on Industrial Informatics, Dec. 2017.
[26] Wang M, Tian L. "From time series to complex networks: The phase space coarse graining," Physica A: Statistical Mechanics and its Applications, vol.461, pp.456-468, Nov. 2016.
[27] Gao ZK, Cai Q, Yang YX, Dong N, Zhang SS. "Visibility graph from adaptive optimal kernel time-frequency representation for classification of epileptiform EEG," International Journal of Neural Systems, vol.27, no.4, p.1750005, Jun. 2017.
[28] Gao ZK, Li S, Dang WD, Yang YX, Do Y, Grebogi C. "Wavelet multiresolution complex network for analyzing multivariate nonlinear time series," International Journal of Bifurcation and Chaos, vol.27, no.8, p.1750123, Jul. 2017.
[29] Wang M, Vilela AL, Du R, Zhao L, Dong G, Tian L, Stanley HE. "Exact results of the limited penetrable horizontal visibility graph associated to random time series and its application," Scientific Reports, vol.8, no.1, p.5130, Mar. 2018.
[30] Chourmouziadis K, Chatzoglou PD. "An intelligent short term stock trading fuzzy system for assisting investors in portfolio management," Expert Systems with Applications, vol.43, pp.298-311, Jan. 2016.
[31] Hinton G. "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade, Berlin: Springer, 2012, pp.599-619.
[32] Wang M, Chen Y, Tian L, et al. "Fluctuation behavior analysis of international crude oil and gasoline price based on complex network perspective," Applied Energy, vol.175, pp.109-127, Aug. 2016.
[33] Wang M, Tian L, Zhou P. "A novel approach for oil price forecasting based on data fluctuation network," Energy Economics, vol.71, pp.201-212, Mar. 2018.
[34] Wang M, Zhao L, Du R, et al. "A novel hybrid method of forecasting crude oil prices using complex network science and artificial intelligence algorithms," Applied Energy, vol.220, pp.480-495, Jun. 2018.
[35] Jin W, Li ZJ, Wei LS, Zhen H. "The improvements of BP neural network learning algorithm," in WCCC-ICSP, 2000, pp.1647-1649.
[36] Schwenker F, Kestler HA, Palm G. "Three learning phases for radial-basis-function networks," Neural Networks, vol.14, no.4-5, pp.439-458, May 2001.
[37] Hyndman RJ, Koehler AB. "Another look at measures of forecast accuracy," International Journal of Forecasting, vol.22, no.4, pp.679-688, Oct. 2006.
[38] Prestwich S, Rossi R, Armagan Tarim S, Hnich B. "Mean-based error measures for intermittent demand forecasting," International Journal of Production Research, vol.52, no.22, pp.6782-6791, Nov. 2014.
[39] Stathakis D. "How many hidden layers and nodes?" International Journal of Remote Sensing, vol.30, no.8, pp.2133-2147, Apr. 2009.

Lin Chen is a professor with the School of Management, Northwestern Polytechnical University, China. She received the B.S. degree in accounting and the Ph.D. degree in management from Xi'an Jiaotong University in 2006. During 2011-2012, she was a visiting scholar at the University of Michigan. She is the author of one book and more than 20 articles. Her research interests include performance evaluation, corporate governance, econometrics, and big data analytics in financial markets.

Zhilin Qiao is an associate professor with the School of Economics and Finance, Xi'an Jiaotong University, China. He received the Ph.D. degree in management from Xi'an Jiaotong University in 2006. During 2006-2008, he did research at the Antai College of Economics and Management, Shanghai Jiao Tong University. During 2011-2012, he was a visiting scholar at the University of Michigan. He is the author of more than 20 academic articles. His research interests include experimental economics, behavioral finance, and complexity science in economics and management.

Minggang Wang is an associate professor with the School of Mathematics, Nanjing Normal University, China. He received the Ph.D. degree in science from Nanjing Normal University in 2015. Since July 2017, he has been a visiting scholar at Boston University, USA. He is the author of more than 20 articles in Scientific Reports, Applied Energy, Energy Economics, etc. His research interests include complexity science in economics and management, econometrics, time series analysis, data mining, and big data analysis.

Chao Wang is an associate professor with the School of Economics and Management, Beijing University of Technology, China. He is also a postdoctoral fellow at Boston University, USA. His research focuses on metaheuristics in optimization problems and econophysics. He conducted his doctoral study in management science and engineering at Beijing Jiaotong University (BJTU). He received the Ph.D. from BJTU (2015) with joint training at Purdue University (2013 & 2014). His research has been published in Computers & Industrial Engineering, Engineering Computations, etc.

Ruijin Du is an associate professor with the Faculty of Science at Jiangsu University, China. Since August 2017, she has been a visiting scholar at Boston University, USA. Her current work involves robustness of complex networks, modeling and analysis of complex systems, data mining, and big data analysis. She has published more than 20 academic papers in Applied Energy, Physical Review E, Scientific Reports, Chaos, Europhysics Letters, Physica A, etc.