Integrated Long-Term Stock Selection Models Based On Feature Selection and Machine Learning Algorithms For China Stock Market
ABSTRACT The classical linear multi-factor stock selection model is widely used for long-term stock price trend prediction. However, the stock market is chaotic, complex, and dynamic, so the linear model assumption may be unreasonable, and it is more meaningful to construct a better integrated stock selection model based on different feature selection and nonlinear stock price trend prediction methods. In this paper, features are selected by various feature selection algorithms, and the parameters of the machine-learning-based stock price trend prediction models are set through time-sliding window cross-validation on 8 years of data from the Chinese A-share market. The analysis of the different integrated models shows that the model performs best when the random forest algorithm is used for both feature selection and stock price trend prediction. Based on the random forest algorithm, a long-short portfolio is constructed to validate the effectiveness of the best model.
INDEX TERMS Stock, trend prediction, machine learning, feature selection, long-term investment.
22672 VOLUME 8, 2020
X. Yuan et al.: Integrated Long-Term Stock Selection Models Based on Feature Selection and Machine Learning Algorithms
the Indian market, including artificial neural network (ANN), SVM, random forest (RF), and Naive Bayes (NB) [13], [14].

At present, there are more than 3,000 listed companies in the Chinese A-share market, and the number of listed companies is still increasing. The traditional strategy mainly relies on research into each listed company to determine whether it is worth buying. However, as the number of listed companies grows, this traditional strategy requires ever more manpower and material resources, so its sustainability is weak. Numerous experts and scholars have verified that the China stock market is still in a developing stage and that a large number of individual investors trade in it. Moreover, due to information asymmetry and other phenomena, the prices of some stocks tend to deviate from their intrinsic value. Therefore, a quantitative stock selection model built by analyzing historical stock data by computer has great potential in such a market. This kind of model can generate a set of logical and strict trading instructions that keep investors from being swayed by market sentiment into incorrect judgments.

This paper focuses on a multi-feature stock selection model for the China stock market based on different feature selection algorithms and machine-learning-based stock price trend prediction algorithms, and establishes the nonlinear relationship between factors and stock returns. It expands the development direction of the classical multi-factor model and provides a new investment strategy for investors in the China stock market. In our work, 60 features obtained from financial reports, daily opening prices, closing prices, volumes, and other A-share market data are used as the input of the model. The main contributions of this research are as follows. First, a feature selection algorithm is used to filter the features, which reduces the complexity of the model and avoids the curse of dimensionality caused by too many features. Second, considering the problems of applying the original cross-validation method to time series data, time-sliding window cross-validation is adopted to make the model better match the actual situation. Third, a stock price trend forecasting algorithm is applied to predict the excess returns of stocks over the subsequent month.

The remainder of this paper is organized as follows. Section 2 introduces several common stock price trend prediction algorithms and feature selection algorithms and describes the principle and application of each in detail. Section 3 explains the experiment scheme. Section 4 discusses the results of the experiment and proposes an effective trading strategy. Section 5 concludes the paper.

II. METHODOLOGY
A. THE METHODS OF STOCK PRICE TREND PREDICTION
1) SVM
The support vector machine was first proposed by Vapnik and then applied in different fields [15]. There are two categories of support vector machines: classification (SVC) [3], [16], and regression (SVR) [17], [18]. The core idea of SVM is the maximum-margin hyperplane, by which it classifies the sample data into two categories, positive and negative examples [19].

The SVM classifier is built as follows. (x_i, label_i), (i = 1, 2, \ldots, n) is a series of linearly separable sample data in an N-dimensional space, where x_i is a data point in the N-dimensional space and label_i is the label corresponding to the sample, taking the value -1 or 1. The separating hyperplane obtained by SVM is w^T \cdot x + b = 0. The distance from a point to the hyperplane is:

D = \frac{|w^T \cdot x + b|}{\|w\|} = \frac{label_i (w^T \cdot x_i + b)}{\|w\|}

The idea of SVM mentioned above is to maximize the minimum margin, so the algorithm can be transformed into the following optimization problem:

\min_{w,b} \frac{1}{2}\|w\|^2 \quad s.t. \quad label_i (w^T \cdot x_i + b) \ge 1, \ (i = 1, 2, \ldots, n)

The above optimization problem is a quadratic programming problem, which can be solved by a corresponding method. This paper then introduces a second way to solve for the classification hyperplane: the Lagrange multiplier method. The above optimization problem is transformed into the following one by the Lagrange multiplier method and the KKT conditions (Lagrange duality):

\max_{\alpha \ge 0} \min_{w,b} L(w, b, \alpha)

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left( label_i (w^T \cdot x_i + b) - 1 \right)

where label_i is the classification label and \alpha_i is the Lagrange multiplier. Working out the above formula yields the dual problem:

\max_{\alpha} L(w, b, \alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j \, label_i \, label_j \, x_i^T x_j
\quad s.t. \ \alpha_i \ge 0; \ \sum_{i=1}^{n} \alpha_i \, label_i = 0

This optimization problem can be solved by the sequential minimal optimization (SMO) algorithm, after which the specific values of the parameters \alpha_i are known. The resulting decision function can be written as:

f(x) = sign(w^T \cdot x + b) = sign\left( \left( \sum_{i=1}^{n} \alpha_i x_i \, label_i \right)^T \cdot x + b \right) = sign\left( \sum_{i=1}^{n} \alpha_i \, label_i \langle x_i, x \rangle + b \right)

However, the data distribution discussed so far is an ideal case that is completely linearly separable.
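The margin maximization above can be illustrated with a small sketch. As a stand-in for the SMO solver named in the text, this uses plain sub-gradient descent on the regularized hinge loss; the 2-D toy data set, learning rate, and epoch count are made-up illustration values:

```python
# Minimal linear max-margin classifier sketch (hinge loss + sub-gradient
# descent, NOT the SMO algorithm from the text) on a toy 2-D data set.
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:  # point inside the margin: hinge loss is active
                w[0] += lr * (y * x1 - lam * w[0])
                w[1] += lr * (y * x2 - lam * w[1])
                b += lr * y
            else:           # only the regularizer contributes
                w[0] -= lr * lam * w[0]
                w[1] -= lr * lam * w[1]
    return w, b

def predict(w, b, x):
    # f(x) = sign(w^T x + b)
    s = w[0] * x[0] + w[1] * x[1] + b
    return 1 if s >= 0 else -1

# linearly separable toy data: +1 roughly above the line x1 + x2 = 0
points = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(points, labels)
```

After training, the sign of w^T x + b separates the two toy classes, mirroring the decision function derived above.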
FIGURE 3. The structure of the two-layer fully connected neural network.

through the output layer after the calculation of two hidden layers. The weights on the nodes are calculated by the error back propagation algorithm.

For example, this paper assumes a two-layer fully connected neural network, presented in Fig. 3. The details of the figure are as follows: the number of input layer nodes is N, defined as \{x_1, x_2, \ldots, x_N\}; the number of hidden layer nodes is M, defined as \{hidden_1, hidden_2, \ldots, hidden_M\}; and the number of output layer nodes is K, defined as \{y_1, y_2, \ldots, y_K\}. The weight from input layer node i to hidden layer node j is W^1_{ij} (i = 1, 2, \ldots, N; j = 1, 2, \ldots, M), and the weight from hidden layer node i to output layer node j is W^2_{ij} (i = 1, 2, \ldots, M; j = 1, 2, \ldots, K). The bias term of the input layer is b_1, and the weight from this bias to the hidden layer node is b^1_k; the bias term of the hidden layer is b_2, and the weight from it to the output layer node is b^2_k.

Given the data of the input layer and the output layer, the weights between the nodes must be determined to construct the entire network. The method for determining them is the error back propagation algorithm, whose flow chart is displayed in Fig. 4.

(a) Parameter initialization
The initial weights and bias weights are set, and then they are updated through forward propagation and error back propagation until the error or the number of iterations meets the stopping condition.

(b) Forward propagation
From the input layer to the hidden layer:

hidden_j = f(net^1_j) = f(x_1 W^1_{1j} + x_2 W^1_{2j} + \cdots + x_N W^1_{Nj} + b_1 b^1_k), \quad (j = 1, 2, \ldots, M)

where f(\cdot) is the activation function, set to the sigmoid function f(x) = \frac{1}{1 + e^{-x}}.

From the hidden layer to the output layer:

y_j = f(net^2_j) = f(hidden_1 W^2_{1j} + hidden_2 W^2_{2j} + \cdots + hidden_M W^2_{Mj} + b_2 b^2_k), \quad (j = 1, 2, \ldots, K)

(c) Calculate the total error
The algorithm defines a loss function to measure the fit of the model; the smaller the value of the loss function, the better the fit. For each data sample, the loss function is:

E = \frac{1}{2} \sum_{i=1}^{n} (y_i - t_i)^2

where t_i is the target value.

(d) Error back propagation
The weights are updated by error back propagation so that the error decreases. There are some common optimization methods such as gradient descent, which is used here as an example to illustrate the process.

(1) Update the weights between the hidden layer and the output layer
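Before the gradients are derived, the forward pass of step (b) and a first output-layer weight update can be previewed numerically. This is a minimal sketch: the sizes, weights, input, target, and learning rate are made-up toy values, and the bias terms are omitted for brevity:

```python
import math

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z}), the activation used in the text
    return 1.0 / (1.0 + math.exp(-z))

# Toy sizes: N = 2 inputs, M = 2 hidden nodes, K = 1 output (made-up values).
x = [0.5, -0.2]
W1 = [[0.1, -0.3], [0.4, 0.1]]    # W1[i][j]: input i -> hidden j
W2 = [[0.3], [-0.1]]              # W2[i][j]: hidden i -> output j
t = [1.0]                         # target value
eta = 0.5                         # learning rate

def forward(x, W1, W2):
    hidden = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))))
              for j in range(len(W1[0]))]
    y = [sigmoid(sum(hidden[i] * W2[i][j] for i in range(len(hidden))))
         for j in range(len(W2[0]))]
    return hidden, y

hidden, y = forward(x, W1, W2)
E_before = 0.5 * sum((y[j] - t[j]) ** 2 for j in range(len(y)))

# One gradient step on the hidden->output weights:
# dE/dW2[i][j] = (y_j - t_j) * y_j * (1 - y_j) * hidden_i
for i in range(len(hidden)):
    for j in range(len(y)):
        W2[i][j] -= eta * (y[j] - t[j]) * y[j] * (1 - y[j]) * hidden[i]

hidden, y = forward(x, W1, W2)
E_after = 0.5 * sum((y[j] - t[j]) ** 2 for j in range(len(y)))
```

A single step along the negative gradient reduces the loss E on this sample, which is exactly what the derivation below establishes term by term.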
Firstly, the effect of each weight on the overall error is calculated, that is, the partial derivative of the error with respect to the weight:

\frac{\partial E}{\partial W^2_{ij}} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial net^2_j} \cdot \frac{\partial net^2_j}{\partial W^2_{ij}} = (y_j - t_j) \cdot y_j \cdot (1 - y_j) \cdot hidden_i, \quad (i = 1, 2, \ldots, M; j = 1, 2, \ldots, K)

For the bias weights:

\frac{\partial E}{\partial b^2_k} = \frac{\partial E}{\partial y_j} \cdot \frac{\partial y_j}{\partial net^2_j} \cdot \frac{\partial net^2_j}{\partial b^2_k} = (y_j - t_j) \cdot y_j \cdot (1 - y_j) \cdot b_2

Secondly, a learning rate \eta is set to update the weights:

W^{2+}_{ij} = W^2_{ij} - \eta \cdot \frac{\partial E}{\partial W^2_{ij}}

b^{2+}_k = b^2_k - \eta \cdot \frac{\partial E}{\partial b^2_k}

where W^{2+}_{ij} and b^{2+}_k are the updated weights.

(2) Update the weights between the input layer and the hidden layer
Firstly, the partial derivative of the total error with respect to the weight is calculated:

\frac{\partial E}{\partial W^1_{ij}} = \frac{\partial E}{\partial hidden_j} \cdot \frac{\partial hidden_j}{\partial W^1_{ij}} = \left( \sum_{k=1}^{K} \frac{\partial E}{\partial y_k} \cdot \frac{\partial y_k}{\partial net^2_k} \cdot \frac{\partial net^2_k}{\partial hidden_j} \right) \cdot hidden_j \cdot (1 - hidden_j) \cdot x_i

Secondly, the learning rate is used to update the weights:

W^{1+}_{ij} = W^1_{ij} - \eta \cdot \frac{\partial E}{\partial W^1_{ij}}

b^{1+}_k = b^1_k - \eta \cdot \frac{\partial E}{\partial b^1_k}

where W^{1+}_{ij} and b^{1+}_k are the updated weights. So far, all of the weights have been updated so that the error is reduced, and steps (2), (3), and (4) are repeated until the error is less than the set threshold or the number of iterations satisfies the condition.

The above introduces the construction of a two-layer fully connected neural network and its weight update method. For multi-layer neural networks, the same algorithm can be used.

B. THE METHODS OF FEATURE SELECTION
When all features are used directly as input to the model, the following situations may occur because of irrelevant features and correlations between features: (1) higher complexity of the model, and (2) after the feature dimension exceeds a certain limit, the performance of the classifier decreases as the feature dimension increases [27].

1) SVM-RFE
Recursive feature elimination (RFE) applies a machine learning model to perform multiple rounds of training [28]. After each round of training, the features with the lowest importance are eliminated, and then the model is retrained on the new feature set.

SVM-RFE is the recursive feature elimination algorithm based on SVM [29]. The operation steps are as follows. The original sample data set is

D = \{x_{i1}, x_{i2}, \ldots, x_{in}, y_i\}, \quad (i = 1, 2, \ldots, m)

where n represents the number of features and m represents the number of samples.

(a) The original sample data set is used as input to train a linear SVM model. The classification decision function of the linear SVM model is f(x) = sign(w^T \cdot x + b), where w_i (i = 1, 2, \ldots, n) indicates the weight corresponding to the i-th feature.

(b) Calculate the importance score of each feature as score_i = w_i^2 (i = 1, 2, \ldots, n), where score_i represents the importance score of the i-th feature.

(c) Sort all features by importance in descending order and remove the last-ranked feature. The number of features changes from n to n-1, and the data set D is updated.

(d) Repeat steps (a), (b), and (c) until the number of remaining features meets the set condition.

2) FEATURE SELECTION BASED ON RF
The decision tree model is constructed by selecting the optimal feature from all features at each split to divide the sample data, so how to select the optimal feature from a large number of features is a key issue for the decision tree model. The decision tree model measures the importance of features by information gain, information gain ratio, the Gini index, and so on; in this way, the importance of all features can be measured well. Therefore, a feature selection algorithm based on the tree model can extract the features with higher importance [30], [31].

With the continuous development of such models, ensemble forests such as random forest and GBDT have emerged, which solve the over-fitting problem of the decision tree to some extent. Therefore, this paper adopts a feature selection algorithm based on random forest. The specific calculation of the model is as follows:

(a) The random forest uses sampling with replacement to generate sub-data sets, which are used to construct the corresponding decision trees. Some data are therefore never selected during the sampling process; this part of the data is called the out-of-bag (OOB) data and is used to evaluate the newly constructed decision trees.
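The elimination loop of steps (a)-(d) can be sketched generically. Here `score_features` is a hypothetical stand-in for retraining the model and scoring features (squared linear-SVM weights in SVM-RFE, or tree-based importances in the RF variant); the feature names and fixed importances below are made-up illustration values:

```python
# Generic recursive feature elimination: score the remaining features,
# drop the least important one, and repeat until n_keep features remain.
def rfe(features, score_features, n_keep):
    kept = list(features)
    while len(kept) > n_keep:
        scores = score_features(kept)  # one importance score per remaining feature
        worst = min(range(len(kept)), key=lambda i: scores[i])
        kept.pop(worst)                # eliminate the least important feature
    return kept

# Toy scorer: pretend these importances came from a trained model.
toy_importance = {"pe": 3.0, "turnover": 1.0, "momentum": 2.0, "beta": 0.5}
selected = rfe(list(toy_importance),
               lambda feats: [toy_importance[f] for f in feats],
               n_keep=2)
```

In the real procedure the scorer is re-fit on the shrunken data set each round, which is what distinguishes RFE from ranking all features once.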
2) RF
The RF feature selection algorithm is very similar to SVM-RFE, and both need to select a certain number of features. For consistency, the top 80% of the features are likewise selected.

D. PARAMETER SETTING OF STOCK PREDICTION
The stock price trend forecasting algorithms used in this paper include SVM, RF, and ANN. For each model, different parameters will lead to different results. In the field of machine learning, the common method of parameter setting is cross-validation. However, it cannot be applied in finance directly. For example, the operation steps of the 10-fold cross-validation method, one of the most common cross-validation methods, are exhibited in Fig. 6. If 10-fold cross-validation is used on financial data, it is inevitable that future data are used to build the model and predict past data. Thus, in this paper, the time window slicing cross-validation strategy is applied [32]. With 12 months of data used for cross-validation, the training set data are first divided into 12 equal groups, as illustrated in Fig. 7; each time, the first 4 groups of data are used as the training set and the next group of data as the validation set.

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)

According to the formula, the parameters that need to be adjusted in the SVM model are mainly C and gamma (\gamma), which are shown in Table 3. For example, when C = 0.001 and gamma = 0.0001, the results obtained by time window slicing cross-validation are used as the evaluation results of the model. By testing each pair of C and gamma values, a result is obtained for each model, and the best-performing model is finally selected.

In addition, when choosing the RF algorithm, some parameters need to be set, such as the number of decision trees, the maximum number of features each decision tree considers when splitting, the minimum number of samples required to split an internal node, the minimum number of samples at a leaf node, and so on. In order to reduce the complexity of the model, the paper fixes some of these parameters: the number of decision trees is 100, and the maximum number of features each decision tree considers when splitting is the square root of the total number of features. The minimum number of samples required to split an internal node is marked as "s" and the minimum number of samples at a leaf node as "l". They are described in detail in Table 4.

In this paper, a three-layer fully connected neural network is applied to predict the stock price trend. Compared with the other two algorithms, it has more parameters, such as the number of hidden layer neurons, the optimization function, the L2 penalty (regularization term) parameter, and the maximum number of iterations. The number of neurons in the first hidden layer is set to 20 and that of the second to 10. The optimization function is "lbfgs", an optimizer in the family of quasi-Newton methods. The L2 penalty (regularization term) parameter is marked as "alpha" and the maximum number of iterations as "max_iter". They are described in detail in Table 5.

E. OUT-SAMPLE TEST
The sliding window method, which has been widely used in stock price trend prediction, is applied to divide the sample into different groups of training and testing sets. The main reason for applying this method is that investors pay attention to the recent trend of stocks but are not interested in data from long ago. Therefore, the model needs to be continuously updated during application. In order to use the latest stock information as much as possible, the model is regenerated every month. For example, the training set of the first group covers the period from January 2010 to January 2011, while the testing set covers the period from January 2011 to February 2011. The specific division method is shown in Fig. 8.

F. MODEL EVALUATION
For the classification algorithms, there are several common evaluation indicators, such as accuracy, precision, and so on. They are calculated from a confusion matrix, as defined in Table 6.

People often measure the success of classification tasks by the correct rate, and accuracy is the most commonly used indicator for evaluating a model. However, calculating accuracy requires manually setting a threshold to achieve classification, and accuracy is greatly affected by this threshold, so this paper further uses the AUC indicator to evaluate the model. AUC is the area under the ROC curve, whose x-axis is the false positive rate and whose y-axis is the recall (true positive rate).

In order to further evaluate the model, the paper constructs a strategy model and implements back-testing on historical data. Some indicators, exhibited in Table 7, are used to evaluate the strategy model [33].
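The time-window slicing described above can be sketched as a split generator: 12 chronological groups, a training window of the first 4 groups, and the following group as the validation set, sliding forward one group at a time. The function name is ours; the group counts are the ones given in the text:

```python
# Time-sliding-window cross-validation splits: training groups always
# precede the validation group, so no future data leaks into training.
def sliding_window_splits(n_groups=12, train_len=4):
    # each split: (indices of training groups, index of validation group)
    splits = []
    for start in range(n_groups - train_len):
        train = list(range(start, start + train_len))
        splits.append((train, start + train_len))
    return splits

splits = sliding_window_splits()
```

Unlike 10-fold cross-validation, every split here respects time order, which is the point made in the text about financial data.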
The annualized return refers to the return that can be obtained after one year of investment. The Sharpe ratio is an indicator used to measure both the returns and the risks of the model. The max drawdown is the maximum loss that the model may incur over a certain period of time in the past. In the case of investment, a large max drawdown will often lead investors to lose confidence in the model, so it is especially important to control the max drawdown of the model reasonably. Here R_p is the annualized return, P is the total return, n is the number of days the strategy is conducted, and R_f is the risk-free rate.

IV. RESULTS
A. EMPIRICAL RESULTS AND DISCUSSION
In this paper, the procedure described in Section 3 is used for the experiment. The above steps are repeated every three months according to the time window slicing method, and the results obtained are shown as follows.

First, the prediction accuracy and AUC of the different integrated models are analyzed, as shown in Table 8 and Table 9. As can be seen from the two tables, the stock price trend forecast result is best when the RF model is adopted. Considering that the prediction performance of the three models is relatively close, the paper conducts a more in-depth analysis of all three.

Second, the main job is to analyze the profitability of the different integrated models. In order to further explain the application of the algorithm proposed in this paper, the APT model is used to predict the stock returns for comparative analysis. The specific strategy is built as follows: the back-test period is from January 2011 to January 2018, and the stocks are traded at the end of each month. Ranking the A-share stocks by the probability of being predicted as "1", the top 1%, 3%, or 5% of the stocks are selected separately, with the funds divided equally among the selected stocks.

Table 10 shows the annualized return of the different integrated models. The results demonstrate that the profitability of the SVM and RF models is better than that of APT, and regardless of whether SVM or RF is used to predict stock price trends, the profitability is stronger than that of ANN. As the number of selected stocks decreases, the profitability of the model becomes stronger; therefore, choosing 1% of the total number of stocks yields the highest returns. The Sharpe ratio of the models is calculated in Table 11. Moreover, Table 12 and Table 13 show the win rates and profit-loss ratios of the different models.

The above tables show that the best result is obtained when the features are selected through RF and RF is applied for stock price trend prediction. The RF-RF integrated model¹ is analyzed in detail as follows.

First of all, the specific strategy is built as follows: the back-test period is from January 2011 to January 2018, and the stocks are traded at the end of each month. Ranking the A-share stocks by the probability of being predicted as "1", all stocks are divided into 10 equal groups, and within each group the funds are divided equally among the stocks. The next step is to analyze the profitability of these ten groups.

¹ The RF-RF integrated model means that the random forest algorithm is used for both feature selection and stock price trend prediction.
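The grouping step just described, together with one of the Table 7 style risk indicators, can be sketched as follows. The stock codes, predicted probabilities, and net-value series are made-up toy numbers:

```python
# Rank stocks by the predicted probability of "1" and cut them into
# equal-sized groups (Group 1 = highest probabilities).
def split_into_groups(stocks, probs, n_groups):
    ranked = [s for s, _ in sorted(zip(stocks, probs), key=lambda t: -t[1])]
    size = len(ranked) // n_groups
    return [ranked[i * size:(i + 1) * size] for i in range(n_groups)]

# Max drawdown: the largest peak-to-trough decline of a net-value curve.
def max_drawdown(net_values):
    peak, mdd = net_values[0], 0.0
    for v in net_values:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd

groups = split_into_groups(["a", "b", "c", "d"], [0.9, 0.1, 0.8, 0.2], 2)
mdd = max_drawdown([1.0, 1.2, 0.9, 1.1])
```

In the paper's setting the grouping is redone at each monthly rebalance, and the drawdown is computed on the cumulative net value of each group.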
Fig. 9 shows the net value of these ten groups over the period from January 2011 to January 2018. The red line represents the net value trend of Group 1, while the purple and black lines represent the net values of the HS300 and the Shanghai Composite Index. Table 14 shows the performance of the different groups. Group 1 has the best results on every evaluation indicator. In order to further illustrate the profitability of the models, the return for each year of the ten groups is calculated, as shown in Table 15. Group 1 has the best performance compared to the other groups. Therefore, the RF-RF integrated model has strong profitability.

The above empirical results indicate that the RF-RF integrated model has the best performance in annualized return, Sharpe ratio, win rate, and profit-loss ratio. The hierarchical combined back-testing also shows that the RF-RF integrated model has strong long-term predictability and profitability. At present, the proportion of quantitative investment in the field of financial investment is becoming larger and larger. Especially in recent years, with the poor market situation, traditional investment has been suffering large losses in the China stock market, while quantitative investment obtains stable income by controlling systematic risk, which makes quantitative investment products more and more trusted by investors. Based on feature selection and machine learning algorithms, our empirical results find that the RF-RF integrated model can bring stable long-term returns to investors, which is meaningful for guiding investment, as well as promoting investors' willingness to invest and improving the vitality of the capital market.

B. A LONG-SHORT TRADING STRATEGY BASED ON THE RF-RF INTEGRATED MODEL
Note that the Sharpe ratio of Group 1 in Table 14 is 0.744 and the max drawdown is 45.95%. The main reason is that this model always holds stocks fully at all times, so it is difficult to control the drawdown when the market falls substantially. Consequently, when investors choose their investment strategy, they may still have doubts about the relatively volatile return fluctuations.

The RF-RF integrated model in this paper helps investors solve this problem by buying the stocks of Group 1 and selling the stocks of Group 10 at the same time. As can be seen from Fig. 10 and Table 16, the annualized return of this new long-short portfolio is 21.92%, which is lower than that of the portfolio that selects the top 1% of stocks, but the max drawdown of the new portfolio is 13.58%, far below that of the long-only portfolio, and the Sharpe ratio is 2.86, which is significant for investors.

V. CONCLUSION
This paper aims to analyze the profitability of various integrated stock selection models based on different feature selection and stock price trend prediction algorithms. The original features are filtered by feature selection methods. The time sliding window method is applied for cross-validation to determine the parameters of the stock price trend prediction algorithms, which makes the model more practical in actual investment transactions. The empirical results show that the best performance is obtained when RF is applied for both feature selection and stock price trend forecasting. By building the model with different numbers of selected stocks, it is also found that the RF-RF model has the highest return when it chooses the top 1% of stocks, achieving a 29.51% annualized return. The stratified back-testing method is used to further analyze the profitability of the RF-RF model, and the annualized return from 2011 to 2018 for the new long-short portfolio is 21.92% while the max drawdown is only 13.58%. Therefore, the RF-RF model is highly predictive of long-term stock price trends and can be used to guide investment. There are still some issues that need to be improved in this article: (1) there is no test in overseas markets such as the US and UK; (2) the feature selection algorithm still needs to be optimized, for example in how to determine the number of features selected; and (3) there is still a need to continually explore more new features with greater predictive power.

REFERENCES
[1] W. Huang, Y. Nakamori, and S.-Y. Wang, "Forecasting stock market movement direction with support vector machine," Comput. Oper. Res., vol. 32, no. 10, pp. 2513-2522, Oct. 2005.
[2] C. Huang, D. Yang, and Y. Chuang, "Application of wrapper approach and composite classifier to the stock trend prediction," Expert Syst. Appl., vol. 34, no. 4, pp. 2870-2878, May 2008.
[3] M.-C. Lee, "Using support vector machine with a hybrid feature selection method to the stock trend prediction," Expert Syst. Appl., vol. 36, no. 8, pp. 10896-10904, Oct. 2009.
[4] Y. Kara, M. Acar Boyacioglu, and Ö. K. Baykan, "Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul Stock Exchange," Expert Syst. Appl., vol. 38, no. 5, pp. 5311-5319, May 2011.
[5] M. Ballings, D. Van Den Poel, N. Hespeels, and R. Gryp, "Evaluating multiple classifiers for stock price direction prediction," Expert Syst. Appl., vol. 42, no. 20, pp. 7046-7056, Nov. 2015.
[6] D. Enke and S. Thawornwong, "The use of data mining and neural networks for forecasting stock market returns," Expert Syst. Appl., vol. 29, no. 4, pp. 927-940, Nov. 2005.
[7] C.-F. Tsai and S.-P. Wang, "Stock price forecasting by hybrid machine learning techniques," in Proc. Int. Multi-Conf. Eng. Comput. Scientists, Mar. 2009, pp. 755-761.
[8] Y. Zuo and E. Kita, "Stock price forecast using Bayesian network," Expert Syst. Appl., vol. 39, no. 8, pp. 6729-6737, Jun. 2012.
[9] E. Chong, C. Han, and F. C. Park, "Deep learning networks for stock market analysis and prediction: Methodology, data representations, and case studies," Expert Syst. Appl., vol. 83, pp. 187-205, Oct. 2017.
[10] B. Weng, L. Lu, X. Wang, F. M. Megahed, and W. Martinez, "Predicting short-term stock prices using ensemble methods and online data sources," Expert Syst. Appl., vol. 112, pp. 258-273, Dec. 2018.
[11] A. Oztekin, R. Kizilaslan, S. Freund, and A. Iseri, "A data analytic approach to forecasting daily stock returns in an emerging market," Eur. J. Oper. Res., vol. 253, no. 3, pp. 697-710, Sep. 2016.
[12] E. F. Fama and K. R. French, "Common risk factors in the returns on stocks and bonds," J. Financial Economics, vol. 33, no. 1, pp. 3-56, Feb. 1993.
[13] J. Patel, S. Shah, P. Thakkar, and K. Kotecha, "Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques," Expert Syst. Appl., vol. 42, no. 1, pp. 259-268, Jan. 2015.
[14] H. Y. Kim and C. H. Won, "Forecasting the volatility of stock price index: A hybrid model integrating LSTM with multiple GARCH-type models," Expert Syst. Appl., vol. 103, pp. 25-37, Aug. 2018.
[15] V. Vapnik, "An overview of statistical learning theory," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 988-999, Sep. 1999.
[16] A. Fan and M. Palaniswami, "Stock selection using support vector machines," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2001, pp. 1793-1798.
[17] F. E. H. Tay and L. Cao, "Application of support vector machines in financial time series forecasting," Omega, vol. 29, no. 4, pp. 309-317, Aug. 2001.
[18] K.-J. Kim, "Financial time series forecasting using support vector machines," Neurocomputing, vol. 55, nos. 1-2, pp. 307-319, Sep. 2003.
[19] R. Khemchandani, Jayadeva, and S. Chandra, "Knowledge based proximal support vector machines," Eur. J. Oper. Res., vol. 195, no. 3, pp. 914-923, Jun. 2009.
[20] S. Safavian and D. Landgrebe, "A survey of decision tree classifier methodology," IEEE Trans. Syst., Man, Cybern., vol. 21, no. 3, pp. 660-674, Jun. 1991.
[21] Y. H. Cho, J. K. Kim, and S. H. Kim, "A personalized recommender system based on Web usage mining and decision tree induction," Expert Syst. Appl., vol. 23, no. 3, pp. 329-342, Oct. 2002.
[22] V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston, "Random forest: A classification and regression tool for compound classification and QSAR modeling," J. Chem. Inf. Comput. Sci., vol. 43, no. 6, pp. 1947-1958, Nov. 2003.
[23] B. Lariviere and D. Vandenpoel, "Predicting customer retention and profitability by using random forests and regression forests techniques," Expert Syst. Appl., vol. 29, no. 2, pp. 472-484, Aug. 2005.
[24] A. Prinzie and D. Van Den Poel, "Random forests for multiclass classification: Random multinomial logit," Expert Syst. Appl., vol. 34, no. 3, pp. 1721-1732, Apr. 2008.
[25] M. Khashei and M. Bijari, "An artificial neural network (p, d, q) model for time series forecasting," Expert Syst. Appl., vol. 37, no. 1, pp. 479-489, Jan. 2010.
[26] Q. Cao, M. E. Parry, and K. B. Leggio, "The three-factor model and artificial neural networks: Predicting stock price movement in China," Ann. Oper. Res., vol. 185, no. 1, pp. 25-44, May 2011.
[27] R. Cervelló-Royo, F. Guijarro, and K. Michniuk, "Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data," Expert Syst. Appl., vol. 42, no. 14, pp. 5963-5975, Aug. 2015.
[28] W. You, Z. Yang, and G. Ji, "Feature selection for high-dimensional multi-category data using PLS-based local recursive feature elimination," Expert Syst. Appl., vol. 41, no. 4, pp. 1463-1475, Mar. 2014.
[29] X. Lin, F. Yang, L. Zhou, P. Yin, H. Kong, W. Xing, X. Lu, L. Jia, Q. Wang, and G. Xu, "A support vector machine-recursive feature elimination feature selection method based on artificial contrast variables and mutual information," J. Chromatography B, vol. 910, pp. 149-155, Dec. 2012.
[30] X.-Q. Hu, M. Cui, and B. Chen, "Feature selection based on random forest and application in correlation analysis of symptom and disease," presented at the Conf. IEEE Int. Symp. Med. Educ., Aug. 2009.
[31] D.-J. Yao, J. Yang, and X. J. Zhan, "Feature selection algorithm based on random forest," J. Jilin Univ., vol. 44, no. 1, pp. 137-141, 2014.
[32] X. Zhang, Y. Hu, K. Xie, S. Wang, E. Ngai, and M. Liu, "A causal feature selection algorithm for stock prediction modeling," Neurocomputing, vol. 142, pp. 48-59, Oct. 2014.
[33] J. Rohde, "Downside risk measure performance in the presence of breaks in volatility," J. Risk Model Validation, vol. 9, no. 2, pp. 31-68, Dec. 2015.

XIANGHUI YUAN received the B.Sc. degree in electrical engineering and its automation from Shaanxi Science and Technology University, Xianyang, China, in 2002, and the Ph.D. degree in control science and engineering from Xi'an Jiaotong University, Xi'an, China, in 2008. From 2008 to 2018, he was a Faculty Member with the Department of Automation, Xi'an Jiaotong University. He is currently an Associate Professor with the Department of Financial Engineering, Xi'an Jiaotong University. His research interests include estimation and decision theory for stochastic systems, financial engineering, and machine learning for financial data analysis. He is also a CFA charter holder.

JIN YUAN was born in Anhui, China, in 1994. He received the B.S. degree from Northwest Polytechnic University, Xi'an, China, in 2015. He is currently pursuing the Ph.D. degree in applied economics with Xi'an Jiaotong University, Xi'an. His research interests include financial engineering, machine learning for financial data analysis, performance evaluation, and ranking.

TIANZHAO JIANG was born in Anhui, China, in 1993. He received the B.S. degree from Northwest University, Xi'an, China, in 2016, and the M.S. degree in control theory and science from Xi'an Jiaotong University, Xi'an, in 2019. He is currently working as a Researcher with Shanghai Foresee Investment Ltd., Liability Company. His research interests include information fusion, financial engineering, machine learning algorithms, and investment.

QURAT UL AIN received the master's degree in accounting and finance and the M.S. degree in finance from the University of Central Punjab, Lahore, Pakistan, in 2014 and 2017, respectively. She is currently pursuing the Ph.D. degree with the School of Economics and Finance, Xi'an Jiaotong University, China. Her areas of research interest are corporate finance, corporate governance, risk management, and technology innovation. Her current research work revolves around the governance role of women directors. Her research work has been published in well-reputed journals.