
Received July 29, 2021, accepted August 2, 2021, date of publication August 16, 2021, date of current version August 24, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3105259

Intelligent Algorithmic Trading Strategy Using Reinforcement Learning and Directional Change

MONIRA ESSA ALOUD AND NORA ALKHAMEES
Department of Management Information System, College of Business Administration, King Saud University, Riyadh 11451, Saudi Arabia

Corresponding author: Monira Essa Aloud ([email protected])

This work was supported by the Research Center for the Humanities, Deanship of Scientific Research, King Saud University.

ABSTRACT Designing a profitable trading strategy plays a critical role in algorithmic trading, where the algorithm can manage and execute automated trading decisions. Determining a specific trading rule to apply at a particular time is a critical research problem in financial market trading. However, an intelligent and dynamic algorithmic trading strategy driven by the current patterns of a given price time-series may help deal with this issue. Thus, Reinforcement Learning (RL) can achieve optimal dynamic algorithmic trading by considering the price time-series as its environment. A comprehensive representation of the environment states is vital for proposing dynamic algorithmic trading using RL. Therefore, we propose a representation of the environment states using the Directional Change (DC) event approach with a dynamic DC threshold. We refer to the proposed algorithmic trading approach as the DCRL trading strategy. In addition, the proposed DCRL trading strategy was trained using the Q-learning algorithm to find an optimal trading rule. We evaluated the DCRL trading strategy on real stock market data (S&P500, NASDAQ, and Dow Jones, over a five-year period from 2015 to 2020), and the results demonstrate that the DCRL state representation policies obtained more substantial trading returns and improved Sharpe Ratios in a volatile stock market. In addition, a series of performance analyses demonstrates the robust performance and extensive applicability of the proposed DCRL trading strategy.

INDEX TERMS Machine learning, reinforcement learning, Q-learning, directional change event, algorithmic trading, stock market.

I. INTRODUCTION
Developing algorithmic trading strategies that can make timely stock trading decisions has always been a subject of interest for investors and financial analysts. The decision-making problem for financial trading remains particularly challenging given the variety of factors that can influence stock prices. The design challenge of algorithmic trading primarily emerges from the continuous evolution of price time-series and thus the dynamic cycle of making trading action decisions. Algorithmic trading relies on computer algorithms to produce automated trading decisions and place orders in the market [1]. Recent advancements in information technologies and machine learning techniques have led to the creation of algorithmic trading, which is also referred to as quantitative trading [1]. Decision-making in financial trading requires the trading algorithm to explore the environment and make appropriate and timely decisions without supervised information.

Classic algorithmic trading strategy models include trend-following and mean reversion strategies [2]. Early works include the use of filter trading rules to control when to buy or sell a stock [3]. Several studies have investigated algorithmic trading, including algorithms based on fundamental and technical analysis indicators and algorithms based on machine learning techniques [1]. Machine learning algorithms learn from historical data and interact with the environment to generate profitable trading rules. Machine learning techniques for algorithmic trading can be divided into supervised learning and RL algorithm-based methods [4]. Supervised learning methods examine and analyze training data (structured data) to predict stock prices or trends. The RL algorithm recognizes different environmental states, performs an action, and receives feedback (i.e., a reward). Thus, RL methods learn to change actions to maximize future rewards [5]. In this study, we develop an algorithmic trading strategy using a machine learning technique, i.e., a hybrid of Reinforcement Learning (RL) and Q-learning algorithms.

RL is a machine learning method used for sequential decision-making problems [5]. It achieves policy improvement through continuous interaction with and ongoing evaluation of its environment. An RL agent performs a sequence of actions based on the environment states to receive a predefined reward. In contrast to supervised machine learning, which requires historical labeled data, the RL agent learns the environment's states and performs actions through continuous evaluation of the dynamic environment. The RL algorithm has several advantages, e.g., self-learning, ongoing behavior enhancements, and adaptivity to the environment states. RL has been applied effectively in different domains, e.g., job scheduling [6], pattern recognition [7], and algorithmic trading [8]-[11].

Despite the effectiveness and robustness of the RL algorithm, employing an algorithmic trading strategy remains a challenge in real-world trading for three reasons. First, using a physically fixed time interval (e.g., hourly data) to represent the environment states makes the flow of the price time-series irregularly spaced because prices are transacted at irregular times and at different magnitudes and directions [12]. Physical time employs a point-based system, where a single time unit for observing price changes ranges from seconds to hours or even days; thus, time is homogeneous. Under intrinsic time, the Directional Change (DC) event approach emerges as an alternative approach for price time-series analysis that can capture periodic patterns in price time-series. Second, selecting appropriate features and data to represent the environment states can be difficult. For example, manual selection of features and data is challenging due to the large search space (e.g., fundamental and technical indicator data) [9]. Finally, machine learning algorithms have a complex structure and a large number of different parameters [4]. Reducing the number of parameters simplifies the tracking and interpretation of the trading performance results.

This study extends the work of Alkhamees and Aloud [7], where a DCRL model was introduced to detect directional price changes in price time-series. The proposed DCRL model is considered an alternative to traditional time-series analytical approaches for environment state representation. These traditional approaches are based on fixed time interval analysis; in contrast, the DCRL model samples the price time-series under intrinsic time. The DCRL model also learns the states of the price time-series to find the optimal dynamic threshold for DC event analysis. The dynamic DC threshold was introduced in [13] to replace the fixed DC threshold, which is used to identify DC events (i.e., directional price changes).

This paper develops an intelligent and dynamic algorithmic trading strategy using the proposed DCRL model. Specifically, we present two algorithmic trading strategies: the first is a direct RL approach, and the second additionally incorporates an RL Q-learning algorithm. Essentially, the proposed DCRL algorithmic trading employs the DC event approach with the dynamic DC threshold to derive the state representation in RL. In addition, it uses the RL decision-making algorithm to make decisions and take the most appropriate trading action.

The DCRL algorithmic trading strategies were evaluated using real financial market data for stock trading. We conducted a series of systematic experiments to confirm the effectiveness and interpretability of the trading performance results. We selected three common US stock indices to evaluate the performance of the DCRL algorithmic trading strategies (with and without Q-learning) and compared their performance against Zero-Intelligence (ZI) trading agents. The experimental results demonstrate that the DCRL algorithmic trading strategies are effective in different market situations and can potentially generate profits.

Our primary contributions are summarized as follows. First, we contribute to the financial market literature by designing and developing an algorithmic trading strategy that is suitable for stock markets by improving the RL environment state representation and action decision-making to ensure stable trading returns even in the case of a volatile price time-series. Second, we contribute to the application of the DC event approach for the representation of the environmental states in RL. The proposed algorithmic trading considers sequential DC event recognition in the price time-series using the dynamic DC threshold. This model can support decision-makers in determining optimal trading opportunities to maximize profits. Finally, we contribute to the literature by using the Q-learning algorithm to improve the learning process via previously gained experience, and we capture long-term learning and continuous improvements via the Q-learning algorithm to achieve optimal policies under different market states.

The remainder of this paper is organized as follows. Section II presents a brief discussion of the literature related to the RL algorithm in financial trading. Section III provides a brief description of the DC event approach and the definition of the dynamic DC threshold. Section IV describes DCRL algorithmic trading and the Q-learning algorithm. Section V presents the datasets, experiment settings, and profitability results, and discusses the empirical results. Section VI concludes the paper and presents suggestions for potential future work.

II. RELATED WORKS
Several works in the financial and machine learning literature have exploited RL in different financial market studies, e.g., financial signal representation [4], [7], [14], building algorithmic trading [4], [8]-[10], [15], [16], portfolio management [11], [17], [18], optimizing trade execution [19], Foreign Exchange (FX) asset allocations [20], changes in market regimes [11], and stock market modelling [21], [22]. Building algorithmic trading using RL has been the focus of many studies for a range of market settings. Some studies have used direct RL [23], while others have employed a value-based RL approach with a Q-Learning matrix to realize algorithmic trading [15], [23], [24]. In addition, other studies have used the Recurrent RL (RRL) approach [10], [11], [25]
or applied a Q-learning algorithm to the design of trading strategies [9], [26], [27]. Furthermore, several recent studies have employed deep RL for financial portfolio management [17], [18].

Regarding the literature on algorithmic trading using direct RL, Bertsimas and Lo [23] examined an application of the RL algorithm for trading a large block of equity over a fixed time horizon to minimize the expected cost of executing trades. They identify optimal trading rules (i.e., executed actions) as a strategy that evolves over a few days. Their experimental results demonstrated that the RL strategy saved between 25% and 40% of execution costs compared to the naive strategy. However, this study's main drawback was the assumption that the quantity of each buy order is sufficiently high to increase the price of the traded security. The work by [22] designed a next-generation multi-agent system (MAS) stock market simulator. Each agent learns price forecasting and stock trading autonomously via RL. The results demonstrate that agent learning allows accurate simulation of the market microstructure.

Several studies in the literature utilized a value-based RL approach with a Q-Learning matrix for algorithmic trading. Gao and Laiwan [24] and Pendharkar and Cusatis [15] employed a value-based RL approach with a Q-Learning matrix to develop algorithmic trading methods. Here, the core idea is to approximately calculate each state's value function (or state-action pair) and subsequently select the greedy trading action based on the value function. Reference [24] used two performance functions, i.e., absolute profit and relative risk-adjusted profit, to train the algorithmic trading model. The authors in [15] proposed several RL agents for trading portfolio assets. They designed on-policy (SARSA(λ)) and off-policy (Q-learning) discrete state and discrete action agents. Here, the goal is to maximize one of two values: the portfolio returns or the differential Sharpe ratio. They examined the impact of RL and trading frequencies. The results demonstrate that a continuous adaptive action RL trading strategy consistently performs the best in forecasting portfolio allocations in the following period. The learning frequency of RL algorithmic trading is essential in determining trading performance. The work by [9] and [20] demonstrated the effectiveness of the policy-based model over the value-based function model relative to performance and applicability.

With regard to the adoption of Q-learning, Neuneier [26] applied a Q-learning algorithm to optimize a trading portfolio. Neuneier constructed an Artificial Neural Network (ANN) to forecast price movement and then used the Q-learning algorithm to find an optimal policy. Another study [27] proposed a portfolio optimization technique using the RL Q-learning approach. This method improved the Q-learning algorithm for optimal asset allocation introduced in [26]. This model simplifies the previous model [26] by using one value function for several assets, facilitating model-free policy iteration. Another study [9] used a direct RL variant and compared their algorithm to Q-learning and temporal difference algorithms using real data. Their results demonstrated that the differential Sharpe ratio RRL system outperformed the Q-learning algorithm. Carapuço et al. [28] developed an RL trading system to trade in the foreign exchange market. They used ANNs with three hidden layers, where the neurons were trained as RL agents under the Q-learning algorithm using a simulated market environment framework. The framework was tested using EUR/USD market data from 2010 to 2017 with more than 10 tests with different initial conditions, and an average total profit of 114.0% ± 19.6% was achieved.

Other studies have used the Recurrent RL (RRL) approach. Moody et al. [10] proposed an application of the RRL approach. RRL is an unconstrained RL algorithm that avoids the problem of dimensionality. Several studies have extended the RRL model. For example, Zhang and Maringer [25] used technical analysis indicators, fundamental analysis, and econometric study with RRL to improve trading decisions. The analytical indicators were filtered using the genetic algorithm evolutionary process. Reference [8] combined RRL and a particle swarm with a Calmar ratio-based objective function for portfolio trading. They evaluated their method using S&P100 index stocks, and the results demonstrated that the proposed portfolio trading system outperformed benchmark trading strategies, particularly under high transaction cost conditions. In addition, the results demonstrated that the Calmar ratio was the best fitness function for particle swarm algorithms.

In recent years, RL research has clustered around deep RL. The work by [17] used a financial-model-free RL framework to deliver a deep RL solution to the portfolio management problem. The central part of the deep RL framework is the Ensemble of Identical Independent Evaluators (EIIE) topology. An EIIE is a neural network designed to examine the historical data of an asset and evaluate its potential growth. In their work, the portfolio weights identify the action for the RL agent. The reward of the RL agent is the explicit average value of the recurring logarithmic returns. In a similar context, [18] offers a portfolio management approach using deep RL on markets with a dynamic number of assets. The neural network architecture is employed and trained using deep RL. Their design was tested on a historical dataset of one of the largest world cryptocurrency markets. The results outperform state-of-the-art methods in the literature, accomplishing average daily returns of over 24%.

The main advantage of the algorithmic trading strategies proposed in this paper is their continuous adaptability to new market conditions using a learning process resulting from dynamic DC events. In addition, existing RL algorithmic trading modules do not consider an event-based system, where an event is the basic unit for studying a price time-series. Thus, the representation of environmental states (i.e., market states) in RL algorithmic trading must be improved to realize adaptability to continuous changes in market behaviour.

III. DYNAMIC DC EVENT APPROACH
DC is an event-based approach for price time-series analysis in financial markets [29]. DC summarizes price movements using intrinsic time, which is an event-based timing system (irregularly spaced in time), rather than physical time, which is a point-based timing system that depends on fixed time intervals (e.g., hourly). DC identifies significant price movements (i.e., DC events) if the price change between two points satisfies a given fixed threshold value. DC has two types of events: upturn events, which are identified once the price increase is greater than or equal to the fixed threshold value, and downturn events, which are identified once the price decrease is greater than or equal to the fixed threshold value. A DC event detection algorithm with a comprehensive description of DC can be found in the literature [29]. The potential of DC event approaches has been proven in many studies for different financial markets, and they have been used for event detection [7], [13], forecasting models [30], [31], designing trading strategies [30], [32]-[40], profiling price time-series [31], and time-series analysis [41], [42].

A dynamic threshold definition method that replaces the fixed DC threshold value was proposed in [13]. The dynamic threshold is applicable to financial markets that operate over specific opening and closing times, e.g., stock markets. The dynamic threshold is a flexible value that can identify significant price movements (i.e., DC events) of different magnitudes in continuously changing and dynamic environments. Setting the dynamic threshold value depends on the previous day's price behavior (i.e., the short-term price history) and additional data sources (e.g., news outlets) [13]. In general, the dynamic threshold is set depending on the previous day's price change (between the opening and closing price), the overnight price change (between the previous day's closing price and the current day's opening price), and the rate of change in price between the current day's opening price and the reached trend price. The dynamic threshold definition equations can be found in [13].

In the work by [7], we were able to set the dynamic threshold value based only on financial market data (without any additional data sources) using RL. In this study, to identify an actionable trading opportunity, we use the DC event approach and the dynamic threshold definition method of [7].
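
To make the event definitions above concrete, the following minimal Python sketch detects upturn and downturn DC events on a daily closing-price series. The detection loop follows the standard DC definition summarized above; the dynamic_threshold helper is only an illustrative stand-in based on the previous-day and overnight changes mentioned in the text, and does not reproduce the exact equations of [13].

# Minimal sketch of DC event detection on a closing-price series.
# dynamic_threshold is an illustrative placeholder, not the definition in [13].

def dynamic_threshold(prev_open, prev_close, today_open):
    """Toy dynamic threshold: average magnitude of the previous-day
    (open-to-close) and overnight (previous close-to-open) price changes."""
    prev_day_change = abs(prev_close - prev_open) / prev_open
    overnight_change = abs(today_open - prev_close) / prev_close
    return (prev_day_change + overnight_change) / 2.0

def detect_dc_events(prices, threshold):
    """Return (index, 'upturn' | 'downturn') pairs for a given threshold."""
    events = []
    extreme = prices[0]            # last extreme price of the current trend
    mode = "up"                    # trend currently being tracked
    for t, p in enumerate(prices[1:], start=1):
        if mode == "up":
            extreme = max(extreme, p)
            if (extreme - p) / extreme >= threshold:   # drop of at least the threshold
                events.append((t, "downturn"))
                mode, extreme = "down", p
        else:
            extreme = min(extreme, p)
            if (p - extreme) / extreme >= threshold:   # rise of at least the threshold
                events.append((t, "upturn"))
                mode, extreme = "up", p
    return events

prices = [100.0, 101.5, 103.0, 101.9, 100.4, 101.8, 103.5]
th = dynamic_threshold(prev_open=100.0, prev_close=101.0, today_open=100.5)
print(detect_dc_events(prices, threshold=th))   # [(3, 'downturn'), (5, 'upturn')]
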

IV. DCRL ALGORITHMIC TRADING
The proposed DCRL algorithmic trading comprises two main components. First, the DC event approach with the dynamic DC threshold is used to identify and represent the market's environmental states. Second, the RL decision-making algorithm is used to make decisions and take appropriate trading actions. In this section, the DCRL algorithmic trading strategy is described in detail. In addition, a rigorous formalization of this algorithmic trading approach and its associated characteristics is presented. Algorithm 1 depicts the core mechanism of the DCRL algorithmic trading.

Algorithm 1 DCRL Algorithmic Trading
Input: Pstream: price time-series stream
1 Initialise the DCRL model parameters
2 for each incoming episode Et in Pstream do
3     Define the dynamic DC threshold
4     The DC model represents the environment state by st
5     The RL agent decides trading action at according to environment state st
6     Calculate reward rt and the evaluation metrics
7     Observe the new state st+1
8     Store the transition (st, at, rt, st+1) in D
9 end
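
For readers who prefer code to pseudocode, the loop of Algorithm 1 can be transcribed roughly as below. DCModel and RLAgent (and their methods) are hypothetical placeholders for the components described in the remainder of this section; this is a structural sketch, not the authors' implementation.

# Structural sketch of Algorithm 1; all class and method names are illustrative.
def run_dcrl(price_stream, dc_model, agent):
    memory = []                                        # replay memory D
    for episode in price_stream:                       # each incoming episode E_t
        dc_model.update_threshold(episode)             # step 3: dynamic DC threshold
        s_t = dc_model.represent_state(episode)        # step 4: environment state s_t
        a_t = agent.act(s_t)                           # step 5: trading action a_t
        r_t = agent.reward(s_t, a_t, episode)          # step 6: reward and metrics
        s_next = dc_model.observe_next_state(episode)  # step 7: observe s_{t+1}
        memory.append((s_t, a_t, r_t, s_next))         # step 8: store transition in D
    return memory
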
A. RL
RL comprises an agent, an environment represented by a set of states st ∈ S, and a set of actions at ∈ A. By performing an action at time t, the agent receives reward rt and transitions from state st to state st+1. Here, the goal of the agent is to maximize its current reward. The RL algorithm creates models using the Markov Decision Process (MDP) [5]; thus, RL can be represented by a state, action, and reward sequence (st, at, rt, st+1, at+1, rt+2, ...). The RL algorithm designs policies (π) that associate an environment state s with an action a to maximize the immediate reward r received over a specified time. In a trading algorithm, a trading rule can be considered a programmed policy π(s, a) that yields a trading action according to the available state data st at time t. As shown in Figure 1, the RL algorithm is based on the sequential interactions between the agent and its environment.

FIGURE 1. Interactive process of the RL algorithm.

The RL algorithm includes two value functions, i.e., the function of states and the function of state-action pairs [5].
These functions estimate the effectiveness of an agent's action in a given state. The notion of "effectiveness" in RL is defined according to future rewards, i.e., the expected return in a financial trading context. Thus, these value functions are determined based on specified policies. The value of state s following policy π (denoted vπ(s)) is the expected return when starting in s and following π through the specified period. For the MDP, we can define the state-value function vπ(s) for policy π as follows in Eq. (1):

vπ(s) = Eπ[Gt | St = s]    (1)

where Eπ[·] is the expected value following policy π, and t is any time step. Here, Gt is the cumulative discounted return for state s at time t, which is defined as follows in Eq. (2):

Gt(s) = Σ_{k=0}^{∞} γ^k R_{t+k+1} | St = s    (2)

where gamma (γ) is a discount factor that takes a value between 0 and 1. The discount factor (γ) defines the importance of future rewards and weighs recent rewards more heavily. In algorithmic trading, a higher discount factor value implies that the agent will become more long-term investment oriented. For example, in the extreme case of γ = 1, the agent considers each reward equally throughout the market run. In contrast, for γ = 0, the agent is myopic because it only reflects the current reward and discards future rewards.

Similarly, we define the function of a state-action pair Q(s, a). The value of taking action a in state s following policy π (denoted Qπ(s, a)) is defined as the expected return starting from s, taking action a, and subsequently following policy π. The action-value function for policy π is expressed as follows:

Qπ(s, a) = Eπ[Gt | St = s, At = a]    (3)

where Gt is the cumulative discounted return for all actions in state s at time t, which is defined as follows:

Gt(s) = Σ_{k=0}^{∞} γ^k R_{t+k+1} | St = s, At = a    (4)

In RL, there are two main algorithm families designed to find the optimal action at+1 to take given the current state st+1. The first is the off-policy algorithm, where the Q(s, a) function does not depend on the agent's learning policy; thus, it learns from taking different actions (e.g., random actions). The second is the on-policy algorithm, where the Q(s, a) function is dependent on the agent's learning policy; thus, the agent learns from actions it has taken using the current policy π(a|s).
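
As a small numerical illustration of Eq. (2), the discounted return can be accumulated directly from a reward sequence; with γ = 0.9 and three unit rewards the result is 1 + 0.9 + 0.81 = 2.71.

# Discounted return of Eq. (2): sum of gamma^k * R_{t+k+1} over future rewards.
def discounted_return(rewards, gamma):
    g, weight = 0.0, 1.0
    for r in rewards:              # rewards = [R_{t+1}, R_{t+2}, ...]
        g += weight * r
        weight *= gamma
    return g

print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))   # 2.71
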
B. DCRL STATES
The principle of RL is that an agent continuously interacts with the environment and learns the optimal trading rule to improve its trading strategy. For stock market trading, the environment comprises the current stock price data and historical price series, including a variety of fundamental data and technical analysis indicators. Therefore, selecting the set of data inputs is a prerequisite for trading agents to learn the stock market environment and discover trading rules. The underlying challenge of stock market trading is capturing market states at a specific time. For price time-series, the data commonly employed in the financial forecasting literature represent the price sequence at regular time intervals (e.g., daily data). In this study, we used the daily data of stock market indexes, i.e., the opening, closing, high, and low prices for each day.

The market state variable of each trading day is represented by a pair of the DC price trend direction (an upward or downward trend) and the type of detected event (an overnight event, a previous-day event, or no significant event). This gives six states for our research problem. A lookup table (Table 1) is established for the state representation of the environment, where each state is associated with a single action and an expected reward.

TABLE 1. Lookup table for DCRL algorithmic trading.

The agent uses an RL algorithm to change from state st to st+1, which is based on learning the dynamics of the environment. Thus, if state st were Overnight or PreviousDay with an Upward trend, the action would be Sell because we consider that the price increase that occurred due to an overnight or previous-day price change was high. The same applies if st were Overnight or PreviousDay with a Downward trend, i.e., the action would be Buy because we consider that prices have fallen sharply due to a sudden overnight or previous-day change in price. The Overnight or PreviousDay states are satisfied if the five-day moving average is greater than the overnight or previous day's price change. However, if the detected state is Neutral, which indicates that no significant event was identified in the price time-series between time t − 1 and t, we use the optimal state-action value function to select the optimal policy for t + 1.
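
Because the Table 1 image is not reproduced here, the dictionary below sketches the lookup logic described in the text under the assumption that the six states are the pairs of event type (Overnight, PreviousDay, Neutral) and trend direction (Upward, Downward): Overnight and PreviousDay states map to a fixed action, while Neutral states fall back to the learned state-action values.

# Hypothetical reconstruction of the Table 1 lookup described in the text.
STATE_ACTION_LOOKUP = {
    ("Overnight",   "Upward"):   "Sell",
    ("Overnight",   "Downward"): "Buy",
    ("PreviousDay", "Upward"):   "Sell",
    ("PreviousDay", "Downward"): "Buy",
    ("Neutral",     "Upward"):   None,   # defer to argmax_a Q(s, a)
    ("Neutral",     "Downward"): None,   # defer to argmax_a Q(s, a)
}

def choose_action(state, q_values):
    """q_values maps (state, action) -> learned Q-value."""
    fixed = STATE_ACTION_LOOKUP[state]
    if fixed is not None:
        return fixed
    return max(("Buy", "Sell", "Hold"), key=lambda a: q_values.get((state, a), 0.0))

print(choose_action(("Overnight", "Upward"), q_values={}))   # 'Sell'
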

C. DCRL ACTIONS
At each time step t, the agent observes the environment's state st and executes a trading action following policy π(s, a). Here, the agent actions are buy, sell, or hold, i.e., A = {Buy, Sell, Hold}. An agent receives a reward after it takes an action. An action at may have an impact on the agent's portfolio value, specifically the cash and share values, given that a trading action executes at the current market closing price pt.

Two experimental design constraints are assumed regarding the quantity of traded shares Qt at time t. First, for the Buy action, the amount of shares to be bought by the trading agent is based on all available cash that the agent has at time t. For the Sell action, the agent sells all of its available shares at time t. In other words, the agent spends 100% of its cash when buying and sells 100% of its shares when selling. Second, there is no transaction cost in this simulation. By making these simplifying assumptions, the complexity of the trading strategy is reduced to a level that can be explored and examined within the scope of this study. Simplicity is essential to understand an agent's trading behaviors and the trading rules generated by the agent, because assigning variable quantities may result in a more complicated analysis. Note that relaxation of these assumptions does not affect the generality or the accuracy of the obtained results. Nevertheless, we are aware of the share quantity's critical role as a choice variable for the generated trading rules (especially with risk aversion).
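
Under these two simplifying constraints (all-in trades at the closing price and no transaction costs), the portfolio bookkeeping reduces to a few lines. The sketch below is one possible formulation and is not taken from the authors' code.

# All-in portfolio update at the closing price p_t: Buy spends all cash,
# Sell liquidates all shares, Hold leaves the position unchanged; no fees.
def apply_action(cash, shares, action, close_price):
    if action == "Buy" and cash > 0:
        shares += cash / close_price
        cash = 0.0
    elif action == "Sell" and shares > 0:
        cash += shares * close_price
        shares = 0.0
    return cash, shares                     # "Hold" falls through unchanged

cash, shares = apply_action(10_000.0, 0.0, "Buy", close_price=250.0)
print(cash, shares)                                            # 0.0 40.0
print(apply_action(cash, shares, "Sell", close_price=260.0))   # (10400.0, 0.0)
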
D. REWARD FUNCTION
An agent designed based on the RL algorithm learns the optimal policy to trade to achieve maximum profit; therefore, the reward function design is critical when designing trading strategies based on the RL algorithm. In the stock market trading literature, several studies have used the Rate of Return (RoR) as a reward function [15].

In this study, we used two immediate reward criteria for the DCRL agent. The first criterion is for the Buy action, where the Relative Return (RR) is used (Eq. (5)). The second immediate reward criterion is for the Sell action, where the RoR is used (Eq. (6)). The RR is defined as the difference between the absolute price return at time t and the return reached by the target time. The RoR is the net gain (or loss) of a single trade over a particular period based on the trade's initial cost.

RR = (pt − pt−1) / pt−1    (5)

RoR = (pSell − pBuy) / pBuy    (6)

Here, pt and pt−1 are the current price at time t and the price at time t − 1, respectively, and pSell and pBuy are the selling and buying prices, respectively. To assess the action taken (i.e., the executed trading action), we employ two reward functions so that we can consider the different impacts of the Sell and Buy actions. The authors of [28] also used two reward functions, i.e., the trade profit was used for closing a position (sell action), and the variation of unrealized profit was employed for opening (buy action) or holding a position.
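
Eqs. (5) and (6) translate directly into code; which reward is applied depends on whether the evaluated action was a buy (RR) or a sell (RoR).

# Immediate reward criteria of Eqs. (5) and (6).
def relative_return(p_t, p_prev):
    return (p_t - p_prev) / p_prev          # RR: reward for a Buy action

def rate_of_return(p_sell, p_buy):
    return (p_sell - p_buy) / p_buy         # RoR: reward for a Sell action

print(relative_return(102.0, 100.0))   # 0.02
print(rate_of_return(110.0, 100.0))    # 0.10
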
E. Q-LEARNING ALGORITHM
Q-learning is an off-policy RL algorithm that seeks to maximize the total reward. Quality in the RL approach signifies how effective an executed action at at time t was relative to achieving a particular future reward. In the Q-learning algorithm, we create a Q-table or matrix that follows policy π(s, a) and randomly initialize the values in the matrix. Then, for each iteration of the market run, the Q-values are updated and stored in the matrix. Accordingly, the Q-matrix turns into a reference matrix for the agent to determine the optimal action based on the maximum Q-value. The Q-function uses the Bellman equation, which takes two inputs, i.e., state st and an action under policy π(s, a). Given the current state st of the environment at time t and the action taken at, we can formulate the action-value update following policy π as follows:

Q(s, a) = Q(s, a) + α [R(s, a) + γ max_{a′} Q(s′, a′) − Q(s, a)]    (7)

where Q(s, a) is the new Q-value for state st and action at, α is the learning rate satisfying 0 ≤ α ≤ 1, R(s, a) is the reward for taking action at in state st, γ is the discount factor (also referred to as the discount rate) satisfying 0 ≤ γ ≤ 1, and max_{a′} Q(s′, a′) is the maximum expected reward for the new state s′ over all possible actions at state s′. Low alpha (α) values imply a slower learning rate, while higher alpha values indicate more rapid learning of the Q-value updates.

For simplicity, we refer to DCRL with the Q-learning algorithm as QDCRL. A QDCRL agent learns an optimal state-action value function Q∗ for the Neutral state, where an update process considers a quintuple (st, at, rt, st+1, at+1) of the environment. For the six states and three actions, we create a matrix Q ∈ R^{6×3} initialized with random values. Therefore, Q(s, a) represents the Q-value for state s and action a. The initial random values of Q(s, a) are subsequently updated in the simulation run by identifying new states and actions using the dataset, where reward r(s, a) is assigned for each selected action. The structure of QDCRL algorithmic trading is shown in Figure 2.
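
A dictionary-based sketch of the Eq. (7) update over the 6 x 3 state-action table is given below; the random initialization and the three-action set follow the description above, while everything else (seed, value range) is an illustrative choice.

import random

ACTIONS = ("Buy", "Sell", "Hold")

def init_q_table(states, actions=ACTIONS, seed=0):
    """Random initialization of the Q-matrix over all state-action pairs."""
    rng = random.Random(seed)
    return {(s, a): rng.uniform(-1.0, 1.0) for s in states for a in actions}

def q_update(q, s, a, reward, s_next, alpha=0.001, gamma=0.95):
    """One update of Eq. (7): Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
    q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
    return q
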
V. EXPERIMENT AND RESULTS
In this section, we discuss a series of experiments conducted with the proposed DCRL (with and without Q-learning) algorithmic trading strategies, including the datasets used, performance evaluation metrics, benchmarks, experimental settings, and trading performance results.

We evaluated three aspects of the proposed DCRL and QDCRL algorithmic trading strategies, i.e., trading performance profitability and effectiveness, as well as the adaptability and efficiency of the dynamic threshold DC event approach for the RL environment state representation. Finally, we confirmed the efficacy of the Q-learning algorithm in RL for algorithmic trading.

FIGURE 2. Structure of QDCRL algorithmic trading.

A. DATASETS
We performed several experiments to confirm the effectiveness and robustness of DCRL algorithmic trading using three different stock indices, i.e., the S&P500, NASDAQ, and Dow Jones. These data were downloaded from Yahoo! Finance for the period July 2015 to July 2020 (five years, i.e., 1260 days of daily price data).

The movement of the three stock indices and their detailed price curve evolution is shown in Figure 3, and Table 2 shows a descriptive statistics analysis (mean (µ), standard deviation (σ), skewness, kurtosis, minimum, and maximum price values) of the investigated stock indices.
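
The paper does not state which tooling was used to retrieve the Yahoo! Finance data. One common way to assemble a comparable daily OHLC dataset is the yfinance package with the usual Yahoo symbols for the three indices; the package choice and the symbols are our assumptions, not the authors'.

# Illustrative download of daily OHLC data; ^GSPC, ^IXIC and ^DJI are the usual
# Yahoo symbols for the S&P500, NASDAQ and Dow Jones (an assumption here).
import yfinance as yf

data = yf.download(["^GSPC", "^IXIC", "^DJI"], start="2015-07-01", end="2020-07-01")
daily_ohlc = data[["Open", "High", "Low", "Close"]]    # drop volume/adjusted close
print(daily_ohlc.tail())
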
B. EXPERIMENTAL SETTINGS
Here, we define the set of parameters used in our experiments. We examined different parameter settings for the QDCRL trading strategy systematically through several rounds of tuning. The parameter settings are summarized in Table 3.

TABLE 2. Descriptive statistics of stock indices.

TABLE 3. QDCRL parameters used at time t.

TABLE 4. ROI and Sharpe Ratio (SR) for different parameter settings (discount factor (γ) and learning rate (α)). Results for the S&P500 stock index are shown.

We executed several independent simulation runs using the same configuration values with different random seeds to confirm that the obtained results are consistent. This allowed us to verify the effectiveness and accuracy of the results. Therefore, the experimental results are averaged over 20 independent simulation runs.

For the learning rate (α) and discount factor (γ) parameters in QDCRL, we conducted a series of systematic experiments to explain the effect of these two parameters on the Q-learning algorithm and the profitability of QDCRL algorithmic trading. Here, we examined values between 0.85 and 0.99 for the discount factor and 0.0001, 0.001, 0.05, and 0.1 for the learning rate (these values are in line with previous studies [15], [28], [43]). The Return On Investment (ROI) and Sharpe Ratio (SR) results for the different γ and α values are given in Table 4. The objective is to find the optimal parameter set to be used for the QDCRL. Higher ROI and SR values indicate that a set of parameters achieved the best performance; thus, we selected these parameter sets in our experiments. We found that different discount factor (γ) values do not affect performance significantly. In contrast, small learning rate (α) values demonstrated slightly better performance than higher values, which implies the importance of a slower learning rate for the Q-value updates in the Q-learning algorithm.
Q-learning algorithm. where E(R) is the expected accumulated return of
investments over trading period T , σ (R) is the standard
C. EVALUATION METRICS deviation of the return, Rf is the risk-free factor, and n
We used four metrics to evaluate the trading strategies’ effec- is the number of observations.
tiveness and robustness: profit curve, ROI, SR, and the num- iv. The number of trading signals refers to the number
ber of trading signals. In the literature, two commonly used of times a Buy or Sell trading action was executed

during the trading period, which is measured to avoid exceedingly frequent trading resulting in extremely high risk.
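
As referenced above, the ROI (item ii) and the SR of Eq. (8) (item iii) can be computed as follows; the risk-free factor defaults to zero here purely for illustration.

# Quantitative evaluation metrics: ROI and the Sharpe Ratio of Eq. (8).
import math
import statistics

def roi(net_gain, trading_cost):
    """Item ii: ratio of the net trading gain (or loss) to the trading cost."""
    return net_gain / trading_cost

def sharpe_ratio(returns, risk_free=0.0):
    """Eq. (8): SR = (E(R) - Rf) / sigma(R) * sqrt(n)."""
    n = len(returns)
    return (statistics.mean(returns) - risk_free) / statistics.stdev(returns) * math.sqrt(n)

print(roi(net_gain=1_200.0, trading_cost=10_000.0))     # 0.12
print(sharpe_ratio([0.01, 0.02, -0.005, 0.015]))
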
FIGURE 3. Price curve movement of the S&P500, NASDAQ, and Dow Jones stock indices during the target period.

D. BENCHMARK TRADING STRATEGY
To further evaluate the performance of the proposed DCRL and QDCRL trading strategies, we compare them to a ZI agent with a budget constraint for stock trading. The ZI is a benchmark used to evaluate intelligent algorithmic trading models. It is a completely random approach that allows us to assess the intelligence and learning effectiveness of the DCRL and QDCRL. In addition, we benchmark against the Direct RL model designed by [9], which is a classical RL model for algorithmic trading. The reason behind choosing the Direct RL model as a baseline benchmark is to provide a rational comparison with a minimum level of supervised learning. Besides, this allows us to evaluate the dynamic DC event approach's effectiveness in representing the environment's state. Furthermore, we compare the performance of DCRL and QDCRL with the classic DC event approach (fixed threshold) introduced by [12]. The DC approach provides pattern detection for price time-series without utilizing any machine learning techniques. We employed the DC approach using a variety of fixed thresholds in the range [0.001, 0.01]. The average performance of the different simulation runs was reported.

A ZI trading agent with a budget constraint was established in [44] to trade in a continuous double auction market. Thus, the ZI agent has no intelligence and does not observe states in the market (i.e., it does not employ a learning process from historical data). As a result, the ZI agent is not informed of the current market conditions and does not have any beliefs about future price movements. The ZI agent places an order based on a random probability defined from a uniform distribution subject to budget constraints. Therefore, the ZI agent decides to either submit a buy or a sell order (or hold) with equal probability. Here, we examined the effect of the ZI trading strategy's randomness compared to the DCRL and QDCRL trading strategies.
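
For reference, the ZI benchmark reduces to uniformly random order placement subject to the budget constraint; a minimal sketch (choosing uniformly among the currently feasible actions) is shown below.

# Zero-Intelligence benchmark: random Buy/Sell/Hold subject to the budget
# constraint (cannot buy without cash, cannot sell without shares).
import random

def zi_action(cash, shares, rng=random.Random(0)):
    choices = ["Hold"]
    if cash > 0:
        choices.append("Buy")
    if shares > 0:
        choices.append("Sell")
    return rng.choice(choices)

print([zi_action(cash=1_000.0, shares=0.0) for _ in range(5)])
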

E. TRADING PERFORMANCE RESULTS
Our core focus is the profitability performance and effectiveness of the proposed DCRL and QDCRL algorithmic trading strategies; therefore, standard performance evaluations were conducted on the three stock indices. Table 5 summarizes the results of the quantitative assessment using the ROI and SR. We identify three main findings from the results.

TABLE 5. ROI and SR of the QDCRL, DCRL, classic DC, ZI, and Direct RL trading strategies on different stock indices.

First, the QDCRL and DCRL trading strategies generate substantial profits for the three stock indices compared to the Direct RL, ZI, and classic DC (fixed threshold) trading strategies. This observation confirms that the dynamic DC threshold effectively contributes to the algorithmic trading design. This is because the dynamic DC event approach summarizes patterns in the price time-series. These patterns represent environmental states for the RL algorithm. Instead of employing discrete price values during the training of the RL algorithm (e.g., Direct RL), a continuous environmental state signal is fed to the DCRL and QDCRL. The DC environmental states can provide more detailed information regarding the dynamics of the price time-series. This is also clear in the results of the classic DC trading strategy (which comes third in the trading strategy performance ranking), which also confirms the potential of employing an event-based approach for summarizing price time-series.

Second, the QDCRL trading strategy generally outperformed the DCRL trading strategy without Q-learning. The QDCRL trading strategy outperformed DCRL on the S&P500 and Dow Jones stock indices in terms of ROI and SR, which indicates that the Q-learning algorithm can potentially improve the trading performance, resulting in good profits within an acceptable level of risk (for NASDAQ, the DCRL trading strategy ROI was only slightly higher than the QDCRL ROI).

Third, the QDCRL trading strategy SR values were significantly higher than those for the DCRL, ZI, Direct RL, and the classic DC (except for NASDAQ, where the classic DC SR was slightly higher). This confirms the validity of the trading actions taken by QDCRL, which avoids risk while securing profit, especially when the price curve increases sharply. In addition, the high SR values for QDCRL imply that the risk of the QDCRL is more controllable and that the trading performance results are more effective.

FIGURE 4. Comparison of the number of trading signals of QDCRL, DCRL, ZI, Direct RL, and classic DC for the S&P500, NASDAQ, and Dow Jones.

The number of trading signals generated by the QDCRL, DCRL, ZI, Direct RL, and classic DC trading agents for the different stock indices is plotted in Figure 4. For the three trading strategies designed based on the DC event approach (DCRL, QDCRL, and classic DC algorithmic trading), the number of trading signals indicates their sensitivity to price fluctuations in the market as a result of the learning process and adaptability to market changes. Figure 4 shows that the total number of trading signals generated by the QDCRL, DCRL, and classic DC trading agents for the three stock indices was significantly less than that for the ZI and Direct RL agents. The average number of trading signals generated by the ZI trading agent for all three stock indices represents approximately 42% of the total available trading time. As for the Direct RL trading agent, the average number of trading signals represents approximately 50% of
the total available trading time. The higher the number of trading signals, the more likely they are to lead to negative investment results.

FIGURE 5. Profit curves of the QDCRL, DCRL, and ZI trading strategies for the three stock indices.

Figure 5 shows the daily portfolio return for the QDCRL, DCRL, and ZI trading agents during the target period (1260 days) for the S&P500, NASDAQ, and Dow Jones indices. We have excluded the portfolio return for the Direct RL given its massive negative returns during most of the trading periods. For the S&P500, the DCRL and ZI initially outperformed the QDCRL. After that, we can clearly see how the learning is well reflected in the QDCRL performance and, hence, how the QDCRL significantly outperformed both DCRL and ZI. The same applies to the Dow Jones (in the third chart), where, initially, ZI and DCRL outperformed QDCRL. However, as learning goes on, the QDCRL significantly outperformed both DCRL and ZI. The same also applies to NASDAQ, where learning has proven to be effective when used with RL and DC. Finally, learning had a remarkable effect on
QDCRL performance, and QDCRL generally outperformed both DCRL and ZI.

VI. CONCLUSION
In this paper, we have proposed two algorithmic trading strategies based on the DCRL model. Our main focus was to improve the environment state representations for the RL algorithm. The dynamic DC threshold event approach was able to precisely represent the environment states. In addition, it was able to efficiently capture stable market states, which led to achieving profitable trading returns under acceptable risk levels on several stock indices. The effectiveness and robustness of the DCRL trading strategies were verified on real stock market data, and the experimental results demonstrate that the proposed DCRL algorithmic trading outperformed the ZI, Direct RL, and classic DC trading strategies with higher total profits and SR, as well as more consistent profit curves.

Our primary contributions are summarized as follows. We defined the environment states in the RL algorithm using the dynamic DC threshold event approach, we developed a simple lookup table for RL algorithmic stock trading, and we employed the Q-learning algorithm to select the optimal policy under the Neutral market state.

Given the dynamic nature of the price time-series, trained and adaptive algorithmic trading must be retrained when the environment states change based on specified preconditions. The learning mechanism based on the dynamic DC threshold event approach is effective in improving the representation of the market's states. The DCRL agents' trading performance (with and without Q-learning) was generally significant and turned a profit within an appropriate level of risk. These results indicate that, to generate proper trading rules and high-performance returns, learning the environment states is required (i.e., adaptive and non-static representations of the price time-series are needed).

We used two reward functions for the DCRL agents, where each reward is associated with a specific action (either a buy or sell action). The relative return reward function was used for the buy action, and the rate of return reward function was used for the sell action. We found that using these reward functions (rather than a single reward function) improved the performance of the Q-learning matrix.

There are two reasons why the QDCRL trading algorithm outperformed DCRL. The first is the learning process for the optimal trading policy under specific market conditions. As stated previously, the performance of QDCRL agents depends on the selection of the optimal policy. The learning frequency of algorithmic trading plays a critical role in influencing the agent's trading analytical performance; however, we did not find that large learning rate (α) values are always effective. We consider that the difference between loss and reward in the Neutral state was caused by the fact that Q-learning may effectively model the long-term discounted returns of a particular state. In addition, we restricted the agent to select from a finite action set based on the optimal policy, which may permit the agent to submit more trading signals. The results of this study suggest that adaptive QDCRL agents with Q-learning provide the best performance based on investment profitability and are more promising in practical applications.

This paper can be further extended in several research directions. For example, in the future, we can examine DCRL (with and without Q-learning) on high-frequency trading to explore and confirm the effectiveness of DCRL algorithmic trading, thus further improving and optimising DCRL to fit that trading context. In addition, we can evaluate applying DCRL algorithmic trading to different emerging markets, e.g., the Forex market and cryptocurrencies. Finally, DCRL algorithmic trading can currently only trade one asset at a time; thus, we can also extend our investigations to managing portfolios involving multiple assets.

ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers for their useful comments and suggestions.

DECLARATION OF INTEREST
The authors report no conflicts of interest. They alone are responsible for the content and writing of the article.

REFERENCES
[1] P. Treleaven, M. Galas, and V. Lalchand, "Algorithmic trading review," Commun. ACM, vol. 56, pp. 76-85, Nov. 2013.
[2] B. Bruce, Trading Algorithms, Student-Managed Investment Funds, 2nd ed. Cambridge, U.K.: Academic, 2020, pp. 285-315.
[3] E. Fama and M. Blume, "Filter rules and stock market trading profits," J. Bus., vol. 39, pp. 226-241, Jan. 1966.
[4] K. Lei, B. Zhang, Y. Li, M. Yang, and Y. Shen, "Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading," Expert Syst. Appl., vol. 140, Feb. 2020, Art. no. 112872.
[5] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[6] S. Chinchali, P. Hu, T. Chu, M. Sharma, M. Bansal, R. Misra, M. Pavone, and S. Katti, "Cellular network traffic scheduling with deep reinforcement learning," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 766-774.
[7] N. Alkhamees and M. Aloud, "DCRL: Approach to identify financial events from time series using directional change and reinforcement learning," Int. J. Adv. Comput. Sci. Appl., vol. 12, no. 8, 2020.
[8] F. Bertoluzzo and M. Corazza, "Reinforcement learning for automatic financial trading: Introduction and some applications," Dept. Econ., Ca' Foscari Univ. Venice, Venice, Italy, Work. Paper 2012:33, 2012.
[9] J. Moody and M. Saffell, "Learning to trade via direct reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 4, pp. 875-889, Jul. 2001.
[10] J. Moody, L. Wu, Y. Liao, and M. Saffell, "Performance functions and reinforcement learning for trading systems and portfolios," J. Forecasting, vol. 17, nos. 5-6, pp. 441-470, Sep. 1998.
[11] S. Almahdi and S. Yang, "An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown," Expert Syst. Appl., vol. 87, pp. 267-279, Nov. 2017.
[12] J. B. Glattfelder, A. Dupuis, and R. B. Olsen, "Patterns in high-frequency FX data: Discovery of 12 empirical scaling laws," Quant. Finance, vol. 11, no. 4, pp. 599-614, Apr. 2011.
[13] N. Alkhamees and M. Fasli, "Event detection from time-series streams using directional change and dynamic thresholds," in Proc. IEEE Int. Conf. Big Data (Big Data), Boston, MA, USA, Dec. 2017, pp. 1882-1891.
[14] Y. Deng, F. Bao, Y. Kong, Z. Ren, and Q. Dai, "Deep direct reinforcement learning for financial signal representation and trading," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 653-664, Mar. 2017.
[15] P. C. Pendharkar and P. Cusatis, "Trading financial indices with reinforcement learning agents," Expert Syst. Appl., vol. 103, pp. 1-13, Aug. 2018.
[16] L. Weng, X. Sun, M. Xia, J. Liu, and Y. Xu, "Portfolio trading system of digital currencies: A deep reinforcement learning with multidimensional attention gating mechanism," Neurocomputing, vol. 402, pp. 171-182, Aug. 2020.
[17] Z. Jiang, D. Xu, and J. Liang, "A deep reinforcement learning framework for the financial portfolio management problem," 2017, arXiv:1706.10059. [Online]. Available: http://arxiv.org/abs/1706.10059
[18] C. Betancourt and W.-H. Chen, "Deep reinforcement learning for portfolio management of markets with a dynamic number of assets," Expert Syst. Appl., vol. 164, Feb. 2021, Art. no. 114002.
[19] Y. Nevmyvaka, Y. Feng, and M. Kearns, "Reinforcement learning for optimized trade execution," in Proc. 23rd Int. Conf. Mach. Learn. (ICML), 2006, pp. 1-8.
[20] M. Dempster and V. Leemans, "An automated FX trading system using adaptive reinforcement learning," Expert Syst. Appl., vol. 30, pp. 543-552, Apr. 2006.
[21] C.-H. Kuo, C.-T. Chen, S.-J. Lin, and S.-H. Huang, "Improving generalization in reinforcement learning-based trading by using a generative adversarial market model," IEEE Access, vol. 9, pp. 50738-50754, 2021.
[22] J. Lussange, I. Lazarevich, S. Bourgeois-Gironde, S. Palminteri, and B. Gutkin, "Modelling stock markets by multi-agent reinforcement learning," Comput. Econ., vol. 57, no. 1, pp. 113-147, Jan. 2021.
[23] D. Bertsimas and A. W. Lo, "Optimal control of execution costs," J. Financial Markets, vol. 1, no. 1, pp. 1-50, Apr. 1998.
[24] X. Gao and C. Laiwan, "An algorithm for trading and portfolio management using Q-learning and Sharpe ratio maximization," in Proc. Int. Conf. Neural Inf. Process., 2000, pp. 832-837.
[25] J. Zhang and D. Maringer, "Indicator selection for daily equity trading with recurrent reinforcement learning," in Proc. 15th Annu. Conf. Companion Genetic Evol. Comput., Jul. 2013, pp. 1757-1758.
[26] R. Neuneier, "Optimal asset allocation using adaptive dynamic programming," in Proc. Adv. Neural Inf. Process. Syst., Cambridge, MA, USA: MIT Press, 1996, pp. 952-958.
[27] R. Neuneier, "Enhancing Q-learning for optimal asset allocation," in Proc. Adv. Neural Inf. Process. Syst., 1998, pp. 936-942.
[28] J. Carapuço, R. Neves, and N. Horta, "Reinforcement learning applied to Forex trading," Appl. Soft Comput., vol. 73, pp. 783-794, Dec. 2018.
[29] M. Aloud, E. Tsang, R. Olsen, and A. Dupuis, "A directional-change events approach for studying financial time series," Econ. Open Access Open Assess. E-J., vol. 6, pp. 1-18, Dec. 2012.
[30] A. Bakhach, E. P. K. Tsang, and H. Jalalian, "Forecasting directional changes in the FX markets," in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), Dec. 2016, pp. 1-8.
[31] E. P. K. Tsang, R. Tao, A. Serguieva, and S. Ma, "Profiling high-frequency equity price movements in directional changes," Quant. Finance, vol. 17, no. 2, pp. 217-225, Feb. 2017.
[32] H. Ao and E. Tsang, "Trading algorithms built with directional changes," in Proc. IEEE Conf. Comput. Intell. Financial Eng. Econ. (CIFEr), May 2019, pp. 1-7.
[33] A. M. Bakhach, E. P. K. Tsang, and V. L. Raju Chinthalapati, "TSFDC: A trading strategy based on forecasting directional change," Intell. Syst. Accounting, Finance Manage., vol. 25, no. 3, pp. 105-123, Jul. 2018.
[34] N. Alkhamees and M. Fasli, "An exploration of the directional change based trading strategy with dynamic thresholds on variable frequency data streams," in Proc. Int. Conf. Frontiers Adv. Data Sci. (FADS), Oct. 2017, pp. 108-113.
[35] N. Alkhamees and M. Fasli, "A directional change based trading strategy with dynamic thresholds," in Proc. IEEE Int. Conf. Data Sci. Adv. Anal. (DSAA), Oct. 2017, pp. 283-292.
[36] M. Aloud, "Directional-change event trading strategy: Profit-maximizing learning strategy," in Proc. 7th Int. Conf. Adv. Cogn. Technol. Appl., F. Nice, Ed., 2015, pp. 123-129.
[37] M. Aloud, "Profitability of directional change based trading strategies: The case of Saudi stock market," Int. J. Econ. Financ., vol. 6, no. 1, pp. 87-95, 2016.
[38] M. Aloud, "Investment opportunities forecasting: A genetic programming-based dynamic portfolio trading system under a directional-change framework," J. Comput. Finance, vol. 22, pp. 1-35, Mar. 2017.
[39] M. Aloud and M. Fasli, "Exploring trading strategies and their effects in the foreign exchange market," Comput. Intell., vol. 33, no. 2, pp. 280-307, May 2017.
[40] M. Kampouridis and F. E. B. Otero, "Evolving trading strategies using directional changes," Expert Syst. Appl., vol. 73, pp. 145-160, May 2017.
[41] M. Aloud, "Time series analysis indicators under directional changes: The case of Saudi stock market," Int. J. Econ. Financ., vol. 6, no. 1, pp. 55-64, 2016.
[42] J. Ma, X. Xiong, F. He, and W. Zhang, "Volatility measurement with directional change in Chinese stock market: Statistical property and investment strategy," Phys. A, Stat. Mech. Appl., vol. 471, pp. 169-180, Apr. 2017.
[43] G. Jeong and H. Y. Kim, "Improving financial trading decisions using deep Q-learning: Predicting the number of shares, action strategies, and transfer learning," Expert Syst. Appl., vol. 117, pp. 125-138, Mar. 2019.
[44] D. K. Gode and S. Sunder, "Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality," J. Political Economy, vol. 101, no. 1, pp. 119-137, Feb. 1993.

MONIRA ESSA ALOUD received the B.Sc. degree in information technology and the M.Sc. degree in e-commerce technology from King Saud University, in 2006 and 2008, respectively, and the Ph.D. degree from the School of Computer Science and Electronic Engineering (CSEE), University of Essex, U.K., in 2013. She is currently an Associate Professor with the Department of Management Information Systems, College of Business Administration, King Saud University. She is also a member of the Computational Finance and Economics Research Laboratory, Centre for Computational Finance and Economic Agents (CCFEA), University of Essex. While in CSEE, she worked on research projects with Olsen Ltd. She served as the Dean of the College of Business Administration, Princess Nourah Bint Abdulrahman University, from March 2018 to August 2019. Since her appointment, she has developed and implemented various strategic initiatives, including implementing student engagement and career development programs, launching the Trading Stock Lounge and a new Bloomberg Finance Lab, and introducing faculty professional development initiatives and incentives.

NORA ALKHAMEES received the B.Sc. degree in information technology and the M.Sc. degree in information systems from the College of Computer and Information Sciences, King Saud University (KSA), in 2008 and 2011, respectively, and the Ph.D. degree in computer science from the School of Computer Science and Electronic Engineering (CSEE), University of Essex, U.K., in 2019. She is currently working as an Assistant Professor with the Department of Management Information Systems, College of Business Administration, King Saud University.