Enhancing Option Pricing Accuracy in The Indian

This research article presents a hybrid deep learning model combining Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) to enhance the accuracy of option pricing in the Indian market. The model utilizes 15 predictive factors derived from fundamental market data and technical indicators, and is evaluated against other deep learning models using various metrics. Results indicate that the proposed CNN-BiLSTM model significantly outperforms competing models in terms of prediction accuracy and robustness.

Enhancing Option Pricing Accuracy in the Indian Market: A CNN-BiLSTM Approach


Akanksha Sharma (  [email protected] )
Maulana Azad National Institute of Technology
Dr. Chandan Kumar Verma
Maulana Azad National Institute of Technology
Priya Singh
Maulana Azad National Institute of Technology

Research Article

Keywords: CNN-BiLSTM, Technical indicators, Deep learning, Option pricing, Derivatives

Posted Date: September 7th, 2023

DOI: https://doi.org/10.21203/rs.3.rs-3322968/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional Declarations: No competing interests reported.



Enhancing Option Pricing Accuracy in the Indian Market: A CNN-BiLSTM Approach

Akanksha Sharma1*, Chandan Kumar Verma2† and Priya Singh3†

1* Department of Mathematics, Bioinformatics, and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, 462003, Madhya Pradesh, India.
2 Department of Mathematics, Bioinformatics, and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, 462003, Madhya Pradesh, India.
3 Department of Mathematics, Bioinformatics, and Computer Applications, Maulana Azad National Institute of Technology, Bhopal, 462003, Madhya Pradesh, India.

*Corresponding author(s). E-mail(s): [email protected];
Contributing authors: [email protected]; [email protected];
† These authors contributed equally to this work.

Abstract
Due to overly optimistic economic and statistical assumptions, the
classical option pricing model frequently falls short of ideal predic-
tions. Rapid progress in Artificial Intelligence, the availability of mas-
sive datasets, and the rise in computational power in machines have
all created an environment conducive to the development of com-
plex methods for predicting financial derivatives prices. This study
proposes a hybrid deep learning (DL) based predictive model for
accurate and prompt prediction of option prices by fusing a one-
dimensional Convolutional Neural Network (CNN) and a Bidirectional
Long Short-Term Memory (BiLSTM). A set of 15 predictive factors
is carefully built under the umbrella of fundamental market data and
technical indicators. Our proposed model is compared with other DL-based
models using six evaluation metrics: Root Mean Square Error
(RMSE), Mean Absolute Percentage Error (MAPE), Mean Absolute
Error (MAE), Determination Coefficient (R2), Maximum Error, and
Median Absolute Error (MedAE). Further, statistical analysis of the models
is done using one-way ANOVA, with post hoc analysis using
the Tukey HSD test, to demonstrate that the CNN-BiLSTM model
outperforms competing models in terms of fit and prediction accuracy.


1 Introduction
Derivatives in the financial markets are products that allow for trading spe-
cific financial risks in relation to another financial instrument, indicator, or
commodity. Risk management, hedging, arbitrage across markets, and even
pure speculation are all possible with the help of financial derivatives. By
increasing the efficiency with which market resources are allocated, financial
derivatives have become an integral part of the financial system. During the
last several decades, options have grown indispensable to businesses and
investors as a financial derivative product and one of the most prominent
items traded on the financial derivatives market. Assets and portfolios can be
protected from the impact of share price fluctuations by purchasing options
[42]. Due to the enormous profits made by trading stock or index options over
the past decade, a number of investors and speculators have become interested
in the options market [34].
Options are complicated financial contracts that confer the right, but not the
duty, to purchase or sell an underlying asset at a defined price by a given
date. In the market, there are numerous types of options, including European
call (put) options and American call (put) options, exotic options, Bermudan
options, barrier options, Asian options, and lookback options [31]. Due to the
dynamic nature of the market and the complexity of the underlying assets,
accurately predicting the price of financial options is a difficult task.
The central topic of options research, the option pricing method, has been
studied for decades, yet a precise match to the price process remains difficult
to achieve. Many studies have been conducted by academics to refine option
pricing’s precision. Today, option pricing models are primarily classified as
parametric or non-parametric. In 1973, Black and Scholes made history when
they presented the famous Black-Scholes (B-S) pricing formula for European
options and developed the classical parametric pricing model [2]. Nevertheless,
their model’s assumptions, like constant volatility and continuous underlying
asset process assumptions, were in contrast with the features of the option
data traded on the market. Numerous financial engineering models have
attempted to loosen the limitations of the Black-Scholes model and optimize
empirical results, resulting in popular approaches such as statistical series
expansion [6], local volatility models [47], stochastic volatility models [10],
and models with jumps [40]. These models have shown to be more effective in
terms of valuation, but they are computationally expensive due to the need
to calibrate a large number of implicit factors. Furthermore, they continue
to be constrained by certain economic and statistical hypotheses, such as
no-arbitrage and market completeness principles. So, a new field of research
has been created that uses data-driven models to rapidly and precisely price
financial derivatives. Machine Learning methods are extremely helpful if
options are viewed as a functional relation between the contracted terms
(inputs) and the premium (outputs). There is a wealth of literature in this
field such as option pricing using Support Vector Machine [41], Decision Tree
[20], Artificial Neural Network [8]. In contrast to parametric models, Hutchin-
son et al. claimed, a data-driven method is flexible and can account for subtle
shifts in the data’s underlying structure [17]. Since these models do not rely
on limiting parametric presumptions, they are resistant to the specification
errors that constrain traditional models.
Subsequent research from a variety of sources [53, 33, 46] confirms that an
artificial neural network is highly effective at options valuation. Notably, deep
learning has garnered considerable interest in the field of financial data anal-
ysis because of its effective nonlinear fitting and feature capture capabilities.
Deep learning can filter and learn data in depth, removing irrelevant factors
and strengthening relevant ones while learning variability. Convolutional
Neural Network performs particularly well in this respect by autonomously
extracting features. CNN’s low hyperparameter and computational require-
ments make it a popular choice for use in graphic and image processing
[28, 18]. Hu et al. made stock price predictions using CNN. CNN was found
to be capable of time series prediction, and deep learning was found to be the
optimal approach for dealing with time series problems [14].
Hochreiter et al. in 1997 first suggested the long short-term memory (LSTM)
neural network [12]. Recent years have seen a rise in the use of long short-
term memory (LSTM) for predicting data that depends on the past, and
LSTM has reasonably mature studies in fields like option pricing, volatility
prediction in the option market [52], stock price prediction [21, 7], and so on.
Sezer et al. suggested that LSTM is one of the most widely used DL models
for predicting financial time series [49]. However, LSTM occasionally failed
to recognize when a pattern suddenly changed, which negatively impacted
the reliability of the prediction model [48]. When predicting stock prices on
the National Stock Exchange of India and the New York Stock Exchange,
Hiransha et al. examined CNN, LSTM, MLP, and RNN [11]. They reported
that CNN was superior to the three other models in predicting the next-day
closing price of NSE stocks. Pokhrel et al. used
three deep learning models to make price predictions for the Nepal Stock
Exchange index, and their results demonstrated that the LSTM model archi-
tecture provides a superior fit with high prediction accuracy [43].
In contrast to LSTM, which can only use forward information, Bidirectional
LSTM (BiLSTM) can use both forward and backward information, which
enhances model performance. Siami et al. compared ARIMA and LSTM
models to BiLSTM, and their findings indicate that BiLSTM results in
more accurate predictions [50], which makes it well suited to time series
prediction. To demonstrate the efficacy of BiLSTM,
Kulshrestha et al. used it in conjunction with Bayesian Optimisation (BO) to
predict quarterly visitor numbers to Singapore [24]. Numerous studies have
used BiLSTM for predicting time series [36]. The use of multitask RNNs for
time series forecasting has also been proposed, with applications including
EEG-based motion intention identification and dynamic sickness severity
prediction [4, 3]. BiLSTM was utilized by Zeng to make predictions about the
S&P 500 in 2019. The findings indicate that the accuracy of the predictions
far exceeded that of the currently available prediction models [54].
For a variety of applications, including healthcare [45], financial market pre-
diction [19], etc., some researchers have tried to merge CNN and LSTM to
build CNN-LSTM hybrid models. Numerous empirical findings demonstrate
that the influence of trade discontinuities makes it challenging to reliably esti-
mate option prices using a single neural network prediction model. To predict
stock prices, Lu et al. suggested a CNN-LSTM-based model [35]. To solve
the option pricing prediction problem, Zhao et al. merged a CNN and LSTM
model with a standard stochastic volatility Heston model and a stochastic
interest rate CIR model, and the results demonstrated the high accuracy of the
dual hybrid model [55].
According to the research studies discussed above, most papers only skim the
surface of how technical indicators can be used in sequential deep networks.
This study constructs its model from a selected set of features derived
from fundamental and technical data. Motivated by the widespread use of
DL-based hybrid models for financial derivative price prediction, it combines
CNN and BiLSTM to price options appropriately, enhance prediction accuracy,
and boost model performance. The close price of the index is first de-noised
using the soft mode of the Haar wavelets, and the data are then normalized
with the min-max method. Following hyperparameter tuning, the input data
are fed into the CNN-BiLSTM model, which is subsequently used to predict
the close price of the index. The reliability of the proposed model is
evaluated using MAE, RMSE, MAPE, MedAE, Max Error, and R2 scores.
The main contributions of this research are as follows: (a) a novel hybrid
DL-based model (CNN-BiLSTM) that incorporates fundamental and technical
data is proposed to predict the price of NIFTY 50 index options; (b) experiments
are carried out to compare the proposed approach with other DL
models, and the results demonstrate that the proposed model provides more
precise predictions; (c) the model's validity and robustness are confirmed
through statistical and robustness tests.
The remaining sections of the paper are arranged as follows. Section 2
describes the modeling approach used in this study. Dataset collection,
feature selection procedure, input preparation, and evaluation criteria are
discussed in Section 3. Section 4 explains the experimental design, results,
and prediction performance, and also describes the robustness checks and
statistical analysis of the models. Finally, Sections 5 and 6 present the
discussion and the conclusion with future work, followed by references.

2 Proposed Model
To improve the precision of option price predictions, we present a hybrid model
CNN-BiLSTM comprising 1DCNN and BiLSTM. CNN is utilized to extract
local features of the data in a layer-by-layer manner. Extraction of highly
expressive advanced features from the data can help overcome the subjectivity
and constraints of manual feature extraction. As BiLSTM can remember past
contexts for a long period, it can accomplish time- and distance-dependent
feature extraction. Moreover, BiLSTM may extract the long-term time series
link between the index’s influencing elements and its close price [22]. Hence, the
CNN output data are fed into the BiLSTM to model the bidirectional temporal
structure via Eqs. (2)-(10). By maximizing the use of data
information, our suggested model can automatically train and extract local
features and long memory patterns in the time series, significantly lowering
the model's complexity [5]. Figure 1 is a simplified visual representation of the
suggested study framework. The architecture of the proposed network is depicted
in Figure 2. These CNN and BiLSTM layers, which constitute the core of our
proposed model, are briefly described in the following subsections.

Fig. 1: Diagrammatic illustration of the proposed research plan

2.1 Convolutional neural network


Fig. 2: CNN-BiLSTM

Convolutional Neural Network (CNN) is based on studies on how animals
naturally perceive light and color [16]. The CNN architecture consists
of an input, convolutional layers with a non-linear activation function, a
pooling layer, a fully connected (FC) layer, and an output. Each convolution
layer contains numerous convolution kernels [35], and its output is computed
as in Eq. (1):

\[ Y_t = \varphi(x_t * k_t + b_t) \tag{1} \]

where $Y_t$ is the output after convolution, $\varphi$ is the activation function, $x_t$ is
the input vector, and $k_t$ and $b_t$ represent the weight and bias of the convolutional
kernel, respectively. The convolution operation extracts data features, but the
dimension of the extracted features is rather high. To address this issue and
lower the cost of network training, a pooling layer is added after the
convolution layer. FC layers are used as the last layers of the CNN architecture
to learn the nonlinear combination of the features extracted by the convolution
layer and produce the final output [13].
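As an illustration of Eq. (1), the following is a minimal NumPy sketch of a single one-dimensional convolution kernel sliding over a multivariate time window. The function name `conv1d_same`, the zero "same" padding, the `tanh` activation, and the toy numbers are assumptions made for the example, not details taken from the paper's implementation. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np

def conv1d_same(x, kernel, bias, activation=np.tanh):
    """Apply Y_t = phi(x_t * k_t + b_t) along the time axis with zero 'same' padding.

    x:      array of shape (timesteps, features)
    kernel: array of shape (kernel_size, features), one convolution kernel
    bias:   scalar bias of that kernel
    """
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)))  # zero-pad the time axis only
    out = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        window = padded[t:t + k]                 # local neighbourhood around step t
        out[t] = np.sum(window * kernel) + bias  # weighted sum plus bias
    return activation(out)

# Toy input: 5 timesteps, 2 features (hypothetical values).
x = np.arange(10, dtype=float).reshape(5, 2) / 10.0
kernel = np.ones((3, 2)) / 6.0   # averaging kernel, chosen only for readability
y = conv1d_same(x, kernel, bias=0.0)
print(y.shape)  # one output value per timestep
```

A layer with 64 filters would hold 64 such kernels and stack their outputs along a new channel axis.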

2.2 Bidirectional Long short-term memory neural network
The LSTM network model, a variant of the RNN, was proposed by Hochreiter and
Schmidhuber in 1997 [12]. It has found widespread application in both
classification and regression problems, not only for stock market prediction
but also in a variety of areas such as text analysis, emotional analysis,
speech recognition, rainfall-runoff modeling, anomaly detection, mobile traffic
prediction, etc. [32, 44, 27].
While traditional RNNs surpass classic networks in conserving information,
they struggle to grasp long-term dependencies due to the vanishing gradient
problem. LSTM employs memory cells to overcome this issue. BiLSTM is an
enhanced form of LSTM that allows access to attribute values in
both the forward and backward directions and was introduced by Graves and
Schmidhuber [9]. A cell state, an input layer, a hidden layer, and an output
layer make up the overall framework (Fig. 3). The essential element of the LSTM
structure is the cell state, which passes through the chain with only linear
interaction, preserving the information flow. The cell state information can be
deleted or altered by the LSTM gate mechanism. A combination of a sigmoid
layer, a hyperbolic tangent layer, and a pointwise multiplication operation is
used to selectively transmit data [38, 23].

Fig. 3: Schematic diagram of LSTM

The LSTM computation is as follows, where $*$ denotes pointwise multiplication:

\[ \alpha_t = \sigma(\Omega_{\alpha} \lambda_t + \Omega_{\psi\alpha} \psi_{t-1} + b_{\alpha}) \tag{2} \]
\[ \beta_t = \sigma(\Omega_{\beta} \lambda_t + \Omega_{\psi\beta} \psi_{t-1} + b_{\beta}) \tag{3} \]
\[ \gamma_t = \sigma(\Omega_{\gamma} \lambda_t + \Omega_{\psi\gamma} \psi_{t-1} + b_{\gamma}) \tag{4} \]
\[ \hat{\tau}_t = \tanh(\Omega_{\tau} \lambda_t + \Omega_{\psi\tau} \psi_{t-1} + b_{\tau}) \tag{5} \]
\[ \tau_t = \beta_t * \tau_{t-1} + \alpha_t * \hat{\tau}_t \tag{6} \]
\[ \psi_t = \gamma_t * \tanh(\tau_t) \tag{7} \]

where the $\Omega$ terms are weight matrices, the $b$ terms are biases, $\lambda_t$ is the input,
$\alpha_t$ represents the input gate, $\beta_t$ is the forget gate, $\hat{\tau}_t$ is the candidate value,
$\tau_t$ is the current cell state, $\gamma_t$ is the output gate, and $\psi_t$ is the hidden
state of the LSTM cell for timestep $t$ [29]. By introducing a double-layer LSTM
and configuring the forward and reverse layers, BiLSTM is obtained (Fig. 4).
BiLSTM's activation function is present at the output of the hidden layer
for both the forward and backward directions [26].
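To make the gate equations (2)-(7) concrete, here is a small NumPy sketch of one LSTM step. It is only an illustration under stated assumptions: the dictionary layout of the weights (`W` for input weights, `U` for recurrent weights, `b` for biases), the layer sizes, and the random initialisation are all hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(lam_t, psi_prev, tau_prev, W, U, b):
    """One LSTM step following Eqs. (2)-(7).

    lam_t:    input vector at time t
    psi_prev: previous hidden state
    tau_prev: previous cell state
    W, U, b:  dicts keyed by 'alpha', 'beta', 'gamma', 'tau' holding input
              weights, recurrent weights and biases (assumed layout)
    """
    alpha = sigmoid(W["alpha"] @ lam_t + U["alpha"] @ psi_prev + b["alpha"])  # input gate, Eq. (2)
    beta = sigmoid(W["beta"] @ lam_t + U["beta"] @ psi_prev + b["beta"])      # forget gate, Eq. (3)
    gamma = sigmoid(W["gamma"] @ lam_t + U["gamma"] @ psi_prev + b["gamma"])  # output gate, Eq. (4)
    tau_hat = np.tanh(W["tau"] @ lam_t + U["tau"] @ psi_prev + b["tau"])      # candidate value, Eq. (5)
    tau_t = beta * tau_prev + alpha * tau_hat                                 # new cell state, Eq. (6)
    psi_t = gamma * np.tanh(tau_t)                                            # new hidden state, Eq. (7)
    return psi_t, tau_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                      # hypothetical sizes
gates = ("alpha", "beta", "gamma", "tau")
W = {g: rng.normal(scale=0.1, size=(n_hid, n_in)) for g in gates}
U = {g: rng.normal(scale=0.1, size=(n_hid, n_hid)) for g in gates}
b = {g: np.zeros(n_hid) for g in gates}

# Run the cell over a short random sequence of 5 timesteps.
psi, tau = np.zeros(n_hid), np.zeros(n_hid)
for lam_t in rng.normal(size=(5, n_in)):
    psi, tau = lstm_step(lam_t, psi, tau, W, U, b)
```

Because the hidden state is a sigmoid gate times a `tanh`, every entry of `psi` stays strictly inside (-1, 1).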

Fig. 4: BiLSTM

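The mathematical equations of BiLSTM are expressed as follows:

\[ \overrightarrow{\psi}_t = \varphi(\Omega_{\lambda\overrightarrow{\psi}} \lambda_t + \Omega_{\overrightarrow{\psi}\overrightarrow{\psi}} \overrightarrow{\psi}_{t-1} + b_{\overrightarrow{\psi}}) \tag{8} \]
\[ \overleftarrow{\psi}_t = \varphi(\Omega_{\lambda\overleftarrow{\psi}} \lambda_t + \Omega_{\overleftarrow{\psi}\overleftarrow{\psi}} \overleftarrow{\psi}_{t-1} + b_{\overleftarrow{\psi}}) \tag{9} \]
\[ O_t = \Omega_{\overrightarrow{\psi}\chi} \overrightarrow{\psi}_t + \Omega_{\overleftarrow{\psi}\chi} \overleftarrow{\psi}_t + b_{\chi} \tag{10} \]

where $\varphi$ represents the activation function of the model; $\Omega$ denotes a weight
matrix, for example $\Omega_{\lambda\overrightarrow{\psi}}$ is the weight from the input ($\lambda$) to the forward hidden
layer ($\overrightarrow{\psi}$); $b$ denotes the corresponding bias; and $O_t$ is the output computed from
the hidden layers. The output is obtained by combining the hidden state updated in the
forward direction $\overrightarrow{\psi}$ with the hidden state built in the backward direction $\overleftarrow{\psi}$.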
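The bidirectional idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's code: a plain `tanh` recurrence stands in for the full LSTM cell, the function and variable names are hypothetical, and the backward direction is implemented by running the same recurrence over the time-reversed sequence and then re-aligning its states to the original order, so that the "previous" state of the backward pass comes from the later timestep.

```python
import numpy as np

def run_direction(seq, W_in, W_rec, b):
    """Simple recurrent pass: psi_t = tanh(W_in @ lam_t + W_rec @ psi_prev + b)."""
    psi = np.zeros(W_rec.shape[0])
    states = []
    for lam_t in seq:
        psi = np.tanh(W_in @ lam_t + W_rec @ psi + b)
        states.append(psi)
    return np.stack(states)                       # (timesteps, hidden)

def bidirectional(seq, fwd, bwd, W_out_f, W_out_b, b_out):
    """Combine forward and backward hidden states into one output per timestep."""
    h_f = run_direction(seq, *fwd)                # forward pass over t = 1..T
    h_b = run_direction(seq[::-1], *bwd)[::-1]    # backward pass, re-aligned to original order
    return h_f @ W_out_f.T + h_b @ W_out_b.T + b_out

rng = np.random.default_rng(1)
T, n_in, n_hid, n_out = 5, 3, 4, 1                # hypothetical sizes

def make_params():
    return (rng.normal(scale=0.5, size=(n_hid, n_in)),
            rng.normal(scale=0.5, size=(n_hid, n_hid)),
            np.zeros(n_hid))

fwd, bwd = make_params(), make_params()
W_out_f = rng.normal(scale=0.5, size=(n_out, n_hid))
W_out_b = rng.normal(scale=0.5, size=(n_out, n_hid))
b_out = np.zeros(n_out)

seq = rng.normal(size=(T, n_in))
outputs = bidirectional(seq, fwd, bwd, W_out_f, W_out_b, b_out)
print(outputs.shape)  # one output row per timestep
```

The output at the first timestep already depends on the last input through the backward pass, which is the property a unidirectional LSTM lacks.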

3 Dataset Preparation
The study’s predictive methodology is based on the NIFTY 50 index as it is
the most commonly traded asset globally. In feature selection, the primary
contributors of index value fluctuations are isolated. The data incorporated
into the study range from January 2007 to January 2021. The selected period
includes the effects of the 2008 global financial crisis on the Indian economy
and the economic downturn around 2020, which coincides with the global COVID-19
pandemic. Using fundamental trading data and technical indicators of the
underlying index, the close price is predicted. The features of the proposed
approach are first briefly discussed.

3.1 Fundamental data


The foundation of any successful stock trading strategy is a solid understanding
of fundamentals, also known as historical data. The open price, close price, high
price, low price, and trading volume are all included. Indexes begin trading at
the open price each day, while they end the day at the closing price. The high
and low prices are, respectively, the daily high and low. The volume of a stock’s
trading activity is a measure of how actively investors are participating in the
market on a given day. Interest is proportional to trading volume, therefore
more trading indicates more interest. A ratio between the day’s open and close
prices is also used as a statistical feature.

3.2 Technical indicators


Traders and analysts employ technical indicators—mathematical calculations
based on factors like price, volume, and other technical indicators—to help
them spot possible buy and sell signals. Because of their focus on short-term
price changes, they are often used by active traders. The technical indica-
tors utilized in this study provide information regarding trends, volatility,
momentum, and additional relevant data (Table 1).
Table 1: List of technical indicators used in the study along with their formulas

Name | Abbreviation | Formula
Simple moving average | SMA | $\sum_{i=1}^{p} C_i / p$
True range | TR | $\max(High - Low,\ abs(High - PC),\ abs(Low - PC))$
Average True Range | ATR | $\frac{Prev.ATR \cdot (n-1) + TR}{n}$
Exponential Moving Average | EMA | $(C_p - EMA_{i-1}) * \alpha + EMA_{i-1}$
Relative Strength Index | RSI | $100 - \frac{100}{1 + \frac{Avg.Gain}{Avg.Loss}}$
Rate of Change | RoC | $\frac{C_p - C_{p-n}}{C_{p-n}} * 100$
Money Flow Index | MFI | $100 - \frac{100}{1 + MoneyFlowRatio}$
Commodity Channel Index | CCI | $\frac{TP - MA}{0.015 * MD}$
Smoothed Moving Average | SMMA | $\frac{SMMA_{i-1} * (n-1) + C_p}{n}$
Momentum | MoM | $C_p - C_{p-n}$
Chande Momentum Oscillator | CMO | $\frac{Up - Down}{Up + Down} * 100$
Triangular Moving Average | TRIMA | $\frac{1}{n} \sum_{i=0}^{n} i C_i$
Efficiency Ratio | ER | $\frac{C_{p-n} - C_p}{\sum (C_p - C_{p-1})}$
Volume-Weighted Average Price | VWAP | $\frac{\sum (Price * Vol)}{\sum Vol}$
Bollinger Bands Width | BBwidth | $\frac{UpperBB - LowerBB}{MA}$
Average Directional Index | ADX | $100 * \frac{MA_{+} - MA_{-}}{MA_{Total}}$
Volume Price Trend | VPT | $VPT_{i-1} + Vol * \frac{C_p - C_{p-1}}{C_{p-1}}$
Force Volume Energy | FVE | $Vol * (C_n - C_{n-1})$
On-Balance Volume | OBV | $OBV_{i-1} + Vol * (C_n - C_{n-1})$
Accumulation Distribution Line | ADL | $ADL_{i-1} + Vol * \beta$
Stochastic Oscillator | STOCH | $\frac{C_p - L_p}{H_p - L_p} * 100$
Volume Oscillator | VO | $\frac{MA_S - MA_L}{MA_L} * 100$
Awesome Oscillator | AO | $SMA_5 - SMA_{34}$
Mass Index | MI | $\frac{EMA_9}{EMA_9^{new}}$

where $C_p$ = current price of the asset at period $p$; abs = absolute function;
$PC$ = previous close; $C_{p-n}$ = price of the asset $n$ periods ago; $\alpha = \frac{2}{1+n}$;
Money flow ratio = $\frac{\text{14-period positive money flow}}{\text{14-period negative money flow}}$;
TP (Typical Price) = (High + Low + Close)/3; MA (Moving Average) = $\sum_{i=1}^{n} \frac{TP}{n}$;
MD (Mean Deviation) = $\sum_{i=1}^{n} \frac{abs(TP - MA)}{n}$; Up/Down = sum of positive/negative changes in
price; $\beta = \frac{(Close - Low) - (High - Close)}{High - Low}$; $L_p$/$H_p$ = lowest/highest price over period $p$;
$MA_S$/$MA_L$ = short/long term moving average of volume; $EMA_9$ = sum of 9-period EMA of
Range; $EMA_9^{new}$ = sum of 9-period EMA of 9-period EMA of Range.
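A few of the simpler indicators in Table 1 can be computed directly from a close-price series. The sketch below uses NumPy only; the function names, the choice to seed the EMA with the first close, the NaN padding for periods without enough history, and the toy prices are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def sma(close, p):
    """Simple moving average over the last p closes; first p-1 entries are NaN."""
    out = np.full(len(close), np.nan)
    csum = np.cumsum(np.insert(close, 0, 0.0))
    out[p - 1:] = (csum[p:] - csum[:-p]) / p
    return out

def ema(close, n):
    """Exponential moving average with alpha = 2 / (1 + n), seeded with close[0] (assumption)."""
    alpha = 2.0 / (1.0 + n)
    out = np.empty(len(close))
    out[0] = close[0]
    for i in range(1, len(close)):
        out[i] = (close[i] - out[i - 1]) * alpha + out[i - 1]
    return out

def roc(close, n):
    """Rate of change in percent relative to the close n periods ago."""
    out = np.full(len(close), np.nan)
    out[n:] = (close[n:] - close[:-n]) / close[:-n] * 100.0
    return out

def momentum(close, n):
    """Momentum: difference between the current close and the close n periods ago."""
    out = np.full(len(close), np.nan)
    out[n:] = close[n:] - close[:-n]
    return out

close = np.array([100.0, 102.0, 101.0, 105.0, 110.0])  # hypothetical closes
print(sma(close, 3))
print(ema(close, 3))
print(roc(close, 1))
print(momentum(close, 2))
```

In practice each indicator becomes one candidate feature column, and the leading NaN rows are dropped before feature selection.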

3.3 Feature selection strategy


Deep learning is a method of processing massive amounts of data to draw rele-
vant results. The effectiveness of deep learning algorithms, however, is typically
hampered by the existence of irrelevant or duplicate input. To anticipate the
final price, each of the input factors described above contributes to some extent
[51]. This section discusses the strategies for picking the essential features and
deleting the extraneous ones from the original feature sets.

3.3.1 Pearson correlation coefficient


As a statistical measure of the degree to which two continuous variables are
related to one another, Pearson's correlation coefficient (PCC) is a widely used
test statistic [15]. It illustrates not only the strength but also the direction
of the link or correlation. Consider two time series $A = \{a_1, a_2, \ldots, a_n\}$ and
$B = \{b_1, b_2, \ldots, b_n\}$; the PCC between series A and B can be written as:

\[ \varphi_{A,B} = \frac{\mathrm{Cov}(A, B)}{\sigma_A \sigma_B} \tag{11} \]

which, written out in full, is

\[ \varphi_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{A})^2}\,\sqrt{\sum_{i=1}^{n} (b_i - \bar{B})^2}} \]

where Cov denotes the covariance, $\sigma_A$ and $\sigma_B$ are the standard deviations of the
two series, $\bar{A}$ is the mean of series A, and $\bar{B}$ is the mean of series B.

Its value varies from -1 (perfect negative correlation) to 1 (perfect positive
correlation), with 0 denoting that there is no linear relationship.
The heatmap of the Pearson correlation coefficient is shown in Figure 5.

Fig. 5: Heatmap

Closing prices may or may not have a high correlation with the other variables.
Predictive strength may be poor if there is little to no association between the
considered variables. Hence, a correlation coefficient of 0.70 is used as a cutoff
for identifying and discarding features that provide irrelevant information.
During the modeling process, the filter method for selecting features is
complemented by an embedded method, which combines the best characteristics of
both the filter and wrapper approaches; for this, the L2 (Ridge) regularization
approach is used. The L2 regularizer shrinks the coefficients that are less
relevant to the target variable towards zero, which helps to reduce the
complexity of the model and prevent overfitting. Following the completion of
the feature selection procedure, a total of 12 variables are utilized as inputs.
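The filter step can be sketched as follows. This is a minimal illustration that assumes a feature is kept when the absolute value of its Pearson correlation with the close price is at least 0.70; the text gives the cutoff but not the exact rule, so that reading, the function names, and the toy data are all assumptions.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2)))

def select_features(features, target, cutoff=0.70):
    """Keep the names of features whose |correlation| with the target reaches the cutoff."""
    return [name for name, column in features.items()
            if abs(pearson(column, target)) >= cutoff]

close = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # hypothetical target
features = {
    "sma": np.array([1.1, 1.9, 3.2, 3.9, 5.1]),      # tracks the close closely
    "noise": np.array([3.0, 1.0, 4.0, 1.0, 3.0]),    # uncorrelated with the close
}
kept = select_features(features, close)
print(kept)
```

The L2 penalty mentioned above is a separate, embedded step that acts inside the network during training rather than on the raw feature table.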

3.4 Input preparation


Due to a large number of sudden market fluctuations and trading noise,
financial data contain a complicated structure of irregularities and roughness.
Denoising time series data with the Discrete Wavelet Transform (DWT)
is a highly effective method [1]. Haar wavelets are widely used for analyzing
time series data. The Haar wavelet has compact support, which gives the
function a sharp drop-off. Further, its support length is 1, which shortens
the computing time and reduces the data processing and training duration.
It is also symmetric, so the true price can be substantially recovered after
noise reduction [30]. Using the scikit-image Python library, we de-noised the
close price by employing the soft mode of the Haar wavelets [39].
We applied the min-max normalization method for scaling features to
boost our model's efficiency and reliability. The mathematical representation
of the min-max normalization method is

\[ \beta = \frac{\alpha - \alpha_{\min}}{\alpha_{\max} - \alpha_{\min}} \tag{12} \]

where $\beta$ and $\alpha$ are the scaled and real input, respectively, and $\alpha_{\max}$ and
$\alpha_{\min}$ are the maximum and minimum values of the input, respectively.
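The two preprocessing steps can be illustrated with a short NumPy sketch. It does not call scikit-image, which is the library the text reports using; instead it shows the underlying idea with a single-level Haar transform and soft thresholding of the detail coefficients. The function names, the single decomposition level, and the explicit `threshold` argument are assumptions for the example.

```python
import numpy as np

def haar_soft_denoise(signal, threshold):
    """Single-level Haar DWT, soft-threshold the detail coefficients, then invert.

    The signal length must be even in this simplified sketch.
    """
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even for this single-level sketch")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # low-pass (pairwise averages)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # high-pass (pairwise differences)
    # Soft thresholding: shrink every detail coefficient towards zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)  # inverse transform
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out

def min_max_scale(alpha):
    """Scale a series to [0, 1] as in Eq. (12)."""
    alpha = np.asarray(alpha, dtype=float)
    return (alpha - alpha.min()) / (alpha.max() - alpha.min())

close = [100.0, 102.0, 101.0, 105.0]              # hypothetical closes
smoothed = haar_soft_denoise(close, threshold=10.0)
scaled = min_max_scale(smoothed)
print(smoothed, scaled)
```

With a threshold of zero the series is returned unchanged, and a very large threshold replaces each pair of values by its mean. When a train/test split is used, the minimum and maximum would normally be taken from the training portion only, so that no information from the test period leaks into the scaling.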

4 Experiment & Findings


The objective is to accurately predict the closing price of the NIFTY 50 index,
which has complicated, noisy, and volatile behavior. The original price data
of the NIFTY50 index option is directly taken from the NSE website. Figure
6 illustrates the general trend of the predictor variables. The blue line shows
the original closing price time series from September 2007 to January 2021. To
further illustrate the long-term and short-term patterns of the closing price,
the green and red curves reflect the 200-day and 50-day moving averages,
respectively.

Fig. 6: Closing price and moving averages of NIFTY 50

4.1 Environmental Setup & Dataset processing


All the tests are executed in a Python environment (version 3.10.11), utilizing
the TensorFlow and Keras APIs. Table 2 details the machine setup used for
each experiment.

Table 2: Computing Setup

Machine configuration | Google Colab with NVIDIA-SMI 525.85.12 GPU
Python version | 3.10.11
Processor | Intel i5-4700H
Edition | Windows 8.1 with 64-bit OS

As a part of dataset processing, appropriate features are selected and the data
are transformed and split into two parts. The first 80% of the series forms
the training set and the remaining 20% the testing set.
Further, we split the training set 80%-20%, where the latter 20% is
utilized as the validation set during hyperparameter tuning. In addition, a
min-max normalization approach is used to scale each feature in the datasets
to fall inside the range [0,1]. The transformed datasets are in 2D array format
(number of observations, number of features), whereas our model architecture
requires 3D input data. To feed the data into the model, they are first
transformed into three-dimensional arrays (number of observations, time steps,
number of features) that include the time dimension. After this, the models
are trained with optimal hyperparameters using the full training data set. To
evaluate the efficacy of the trained models, we use the test dataset.
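The chronological split and the 2D-to-3D reshaping can be sketched with NumPy as below. The helper names are hypothetical, and the convention that each window of consecutive observations is paired with the target value of the step immediately after it is an assumption; the text specifies the array shapes but not the exact windowing rule. The window length of 5 matches the timestep value listed in Table 3.

```python
import numpy as np

def chronological_split(data, train_frac=0.8):
    """Split a time-ordered array into an earlier and a later part (no shuffling)."""
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

def make_windows(features, target, timesteps=5):
    """Turn a 2D array (observations, features) into 3D windows.

    Returns X with shape (samples, timesteps, features) and y with shape (samples,),
    where y[i] is the target one step after window i (assumed convention).
    """
    n = len(features) - timesteps
    X = np.stack([features[i:i + timesteps] for i in range(n)])
    y = target[timesteps:]
    return X, y

# Toy data: 100 observations, 2 features; column 0 plays the role of the close price.
data = np.arange(200, dtype=float).reshape(100, 2)
train, test = chronological_split(data)      # first 80% / last 20%
fit, val = chronological_split(train)        # 80% / 20% of the training part
X_train, y_train = make_windows(train, train[:, 0])
print(X_train.shape, y_train.shape)
```

Splitting before windowing keeps every training window strictly earlier in time than the validation and test observations.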

4.2 Hyperparameter selection
The ideal values of the model's hyperparameters were determined after repeated
experiments. The following explains the impact that several
hyperparameters have on the model, along with their significance:
• Batch size - It specifies the number of input samples processed
before the model's internal parameters are updated. Its size affects the
degree of optimization and the training speed. We experimented with
batches of 32, 64, and 128 samples to train our models.
• Epochs - A single epoch is one pass through the entire training dataset. The
neural network's weights are updated more times as the epoch count
rises, so the fit moves from underfitting towards overfitting as training
progresses.
• Learning rate - This hyperparameter controls the step size of each weight
update and therefore how the model converges. In our model, we tried
learning rates such as 0.1, 0.001, ..., 0.00001.
• Optimizer - This is the optimization function that is applied in order to
achieve the best possible outcomes. In our experiments, we tried the Adam,
Nadam, and Adagrad optimizers. When compared to Adam, Nadam (which
combines the Adam optimizer with the Nesterov approach) has a somewhat
shorter training duration.
• BiLSTM units - In general, more units increase the model's capacity to fit
the data, but they also make overfitting more likely, which is why dropout
is applied.
• Dropout - Regularisation in DL models is achieved through the use of a
dropout layer, which aids in reducing dependent learning between individual
nodes. The dropout rate, which can range from 0 to 1, helps prevent
the model from overfitting.
• Convolution Kernel and Filters - They determine which local patterns are
extracted from the input data. A convolution kernel of size 3 is adopted
in this study. We have conducted experiments using 32 and 64 filters.
Table 3 shows a description of the hyperparameters selected.
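For orientation, the hyperparameter values reported in Table 3 could be assembled into a Keras model roughly as follows. This is a hedged sketch, not the authors' code: the exact layer order, the use of max pooling, a single BiLSTM layer, a single dense output unit, and the 12 input features are assumptions, and the function name `build_cnn_bilstm` is hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn_bilstm(timesteps=5, n_features=12):
    """Assemble a CNN-BiLSTM regressor from the values listed in Table 3 (layout assumed)."""
    l2 = regularizers.l2(1e-4)                       # kernel regularizer (L2) = 0.0001
    model = keras.Sequential([
        keras.Input(shape=(timesteps, n_features)),
        # Two Conv1D layers: 64 filters, kernel size 3, 'same' padding, tanh activation.
        layers.Conv1D(64, 3, padding="same", activation="tanh", kernel_regularizer=l2),
        layers.Conv1D(64, 3, padding="same", activation="tanh", kernel_regularizer=l2),
        layers.MaxPooling1D(pool_size=1),            # pool size 1 as listed; pooling type assumed
        layers.Bidirectional(layers.LSTM(256, activation="tanh")),
        layers.Dropout(0.1),
        layers.Dense(1),                             # predicted (scaled) close price
    ])
    model.compile(
        optimizer=keras.optimizers.Nadam(learning_rate=1e-4),
        loss="mean_squared_error",
    )
    return model

model = build_cnn_bilstm()
model.summary()
```

Training would then follow the usual pattern, for example `model.fit(X_train, y_train, epochs=100, batch_size=128, validation_data=(X_val, y_val))`, where the array names are placeholders for the windowed training and validation data.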

4.3 Prediction Performance


Table 3: List of Optimal Hyperparameter values

Hyper-parameter | Value | Hyper-parameter | Value
Epochs | 100 | Batch size | 128
BiLSTM units | 256 | Activation Function | tanh
Conv1D layers | 2 | Filters | 64
Kernel size | 3 | Padding | same
Pool size | 1 | Kernel Regularizer(L2) | 0.0001
Timesteps | 5 | Optimizer | Nadam
Loss Function | Mean Squared Error | Dropout | 0.1
Learning rate | 0.0001 | |

To validate the effectiveness, superiority, and generalizability of the suggested
model, we compared its prediction performance to that of the competing models
CNN, LSTM, BiLSTM, and CNN-LSTM. Figure 7 shows the prediction
results of all models. The red line represents the actual value, while the blue
line represents the predicted value. This study uses the following six criteria
to compute prediction error and accuracy in order to assess the effectiveness
of the predictions made: Mean absolute error (MAE), Root
mean squared error (RMSE), Mean absolute percentage error (MAPE),
Determination Coefficient (R2), Maximum Error (Max Error), and Median absolute
error (MedAE). Prediction performance improves as the model's R2
rises and the remaining metrics decline. The mathematical representation of
these measures is as follows:

• Mean absolute error (MAE) - It measures the average absolute deviation of
the predicted values from the actual values in a dataset.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| Y_i - Y_i' \right|$$

• Root mean squared error (RMSE) - It is the square root of the mean squared
difference between the predicted values and the actual values.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( Y_i - Y_i' \right)^2}$$

• Mean absolute percentage error (MAPE) - It expresses the error relative to
the size of the value and is frequently used as a scale-independent criterion
for evaluating a prediction.

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{Y_i - Y_i'}{Y_i'} \right|$$

• Determination Coefficient ($R^2$) - It is a statistical indicator of the amount
of variation in the target variable that can be accounted for by the model's
independent variables. For a useful model its value lies between zero and one;
it becomes negative when the model fits worse than predicting the mean.

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( Y_i - Y_i' \right)^2}{\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^2}$$

• Maximum Error (Max Error) - It is the largest discrepancy between the
predicted and the observed values in a given dataset.

$$\mathrm{MaxE} = \max_{1 \le i \le n} \left| Y_i - Y_i' \right|$$

Fig. 7: Actual vs Predicted Graph for (a) CNN, (b) LSTM, (c) BiLSTM, (d) CNN-LSTM, and (e) CNN-BiLSTM

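• Median absolute error (MedAE) - It is the median of all the absolute
discrepancies between the predictions and the actual values.

$$\mathrm{MedAE} = \operatorname{median}\left( \left| Y_1 - Y_1' \right|, \ldots, \left| Y_n - Y_n' \right| \right)$$

where $Y_i$ is the actual value, $Y_i'$ is the predicted value, $\bar{Y}$ is the
mean of the actual values, and $n$ is the number of samples.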
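The six measures can be computed together in a few lines. The following is a minimal sketch using only the Python standard library; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper. MAPE follows the formula as printed in this section, dividing by the predicted value, whereas many libraries divide by the actual value instead.

```python
import math
import statistics


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, MAPE, R2, MaxE and MedAE for two equal-length lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    abs_err = [abs(y - p) for y, p in zip(actual, predicted)]
    sq_err = [(y - p) ** 2 for y, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return {
        "MAE": sum(abs_err) / n,
        "RMSE": math.sqrt(sum(sq_err) / n),
        # Denominator is the predicted value, as in the formula printed above.
        "MAPE": sum(abs((y - p) / p) for y, p in zip(actual, predicted)) / n,
        "R2": 1.0 - sum(sq_err) / ss_tot,
        "MaxE": max(abs_err),
        "MedAE": statistics.median(abs_err),
    }


# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 2.0, 5.0]
for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

Because MaxE and MedAE are order statistics of the same absolute errors that MAE averages, comparing the three gives a quick view of whether a model's error is driven by a few large misses or is spread evenly.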

Table 4 displays the average prediction errors of the different models
on the six evaluation indicators after multiple runs.

Table 4: Average performance scores
Models             RMSE      MAE      MAPE     R2       MaxE     MedAE
CNN                0.0344    0.0291   0.0412   0.8393   0.1656   0.0286
LSTM               0.0377    0.0339   0.0474   0.8070   0.1486   0.0346
BiLSTM             0.0358    0.0323   0.0450   0.8260   0.1353   0.0330
CNN-LSTM           0.0307    0.0237   0.0345   0.8721   0.1423   0.0201
CNN-BiLSTM         0.0266    0.0191   0.0283   0.9036   0.1309   0.0140
Lal et al.[25]     0.0430    0.0373   0.0518   0.7494   0.1462   0.0350
Madhu et al.[37]   0.03798   0.0278   0.0402   0.7855   0.2283   0.0213
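To put the RMSE column of Table 4 in relative terms, the short sketch below computes the percentage by which the CNN-BiLSTM RMSE is lower than that of each other deep learning model. The RMSE figures are copied from Table 4; the variable names are illustrative, and the percentages are derived here rather than reported by the authors.

```python
# RMSE values copied from Table 4.
rmse = {
    "CNN": 0.0344,
    "LSTM": 0.0377,
    "BiLSTM": 0.0358,
    "CNN-LSTM": 0.0307,
    "CNN-BiLSTM": 0.0266,
}

proposed = rmse["CNN-BiLSTM"]

# Relative RMSE reduction of the proposed model against each baseline.
reduction_pct = {
    model: 100.0 * (value - proposed) / value
    for model, value in rmse.items()
    if model != "CNN-BiLSTM"
}

for model, pct in reduction_pct.items():
    print(f"vs {model}: {pct:.1f}% lower RMSE")
```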

Mean Squared Error is used as the loss function because it directly measures
the deviation between the value generated by the output layer and the real
value of the data. Figure 8 shows the training loss and validation loss
(referred to as test loss in the legend) for the above-mentioned models.

4.4 Statistical evaluation


To demonstrate that the output of the model can be trusted, we carry out a
statistical investigation to determine whether the results obtained by
the models differ significantly from one another. As a first step,
the Shapiro-Wilk test is used to check that the RMSEs follow a normal
distribution. After confirming normality, a one-way analysis of variance
(ANOVA) is conducted to compare them; the resulting p-value was
0.00 < 0.05, which indicates a significant difference between the
performance of the models. Following this primary statistical analysis, a
post hoc analysis using the Tukey-HSD test is performed to identify which
pairs of groups differ significantly. Figure 9 shows the p-values of each
model with respect to the others. Comparing CNN-BiLSTM to CNN, LSTM,
BiLSTM, CNN-LSTM, [25], and [37], the reported p-values are 0.0137, 0,
0.0295, 0.0192, 1, 0, and 0. As a result, there is sufficient evidence from
both tests to assert that the CNN-BiLSTM model's performance is clearly
above that of its competitors.
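The ANOVA step can be made concrete with a small sketch. It computes only the one-way ANOVA F statistic and its degrees of freedom with the Python standard library; obtaining the p-value requires the F distribution, for which a library routine such as `scipy.stats.f_oneway` is the usual choice (with `scipy.stats.shapiro` and `scipy.stats.tukey_hsd` for the other two tests). The function name `one_way_anova_f` and the RMSE samples are hypothetical and are not the paper's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical RMSE values from repeated runs, for illustration only.
runs = {
    "CNN": [0.034, 0.035, 0.033, 0.036],
    "LSTM": [0.038, 0.037, 0.039, 0.037],
    "CNN-BiLSTM": [0.027, 0.026, 0.027, 0.026],
}
f_stat, df_b, df_w = one_way_anova_f(list(runs.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value means the spread between model means dominates the run-to-run spread within each model, which is the situation that leads to a small p-value and motivates the pairwise Tukey-HSD follow-up.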

4.5 Robustness Check


The experimental findings presented in Section 4.3 demonstrate that the CNN-
BiLSTM model suggested in this paper significantly outperforms the benchmark
models in terms of prediction accuracy. This research employs two approaches
to examine the robustness of the proposed hybrid model against the other
benchmark models. First, we examine whether the model suggested in this study
continues to produce the best results under a different training and testing
split. We then investigate the prediction performance of the suggested model
in the context of the ongoing conflict between Russia and Ukraine.

Fig. 8: Loss function Graph for (a) CNN, (b) LSTM, (c) BiLSTM, (d) CNN-LSTM, and (e) CNN-BiLSTM

4.5.1 Train-test ratio


We used a smaller sample for the training set and a larger sample for
the test set, with 40% and 60% of the data respectively. Figure 10a displays
the comparative prediction results of the proposed model and its competitors.
The line chart of performance metrics is shown in Figure 10b. The proposed
model still has better accuracy than the others.
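A 40:60 split of this kind can be sketched as follows. The sketch assumes the samples are kept in time order and not shuffled, which is the usual practice for time-series data but is not stated explicitly in the paper; the function name `chronological_split` and the sample series are hypothetical.

```python
def chronological_split(samples, train_fraction):
    """Split an ordered sequence into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Ten hypothetical observations in time order; 40% go to training.
series = list(range(10))
train, test = chronological_split(series, 0.4)
print(len(train), len(test))
```

Keeping the earliest observations for training means the model is always evaluated on data that comes after everything it has seen.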

4.5.2 Validation of recent dataset


According to the results presented above, the model suggested in this work
should be able to anticipate the effects of extreme events. To verify this, we
extended the dataset with data spanning January 2021 through March 2023
(a period that includes the effects of inflation and the ongoing crisis in
Ukraine). Figure 11a depicts the models' prediction comparison for a 40%
training and 60% test split. Again, the performance scores (Figure 11b)
demonstrate the high quality of the predictions made by the model described
in this study.

Fig. 9: P-Statistic Values

5 Discussion: Validity Threat


In this section, we address the main threats to the internal and external
validity of our research. Experiments on various datasets led
us to the conclusion that our prediction model benefits from a sufficiently
large dataset, which may be a drawback shared by all DL models. Choosing
a suitable training dataset to mine past trends is also crucial to our work.
To address this, we experimented with various dataset partitioning ratios
and picked the optimal one for model training. Moreover, the models'
outputs may vary depending on the hyperparameter values used during training.
However, there is currently no well-established method for selecting ideal
values, which makes choosing hyperparameters both time-consuming and
subjective. To mitigate this problem, we conducted several trial runs with
varying hyperparameter values in an effort to fine-tune our suggested
model. Hopefully, this gives the reader a sense of how broadly applicable
our prediction model is in the real world.

Fig. 10: (a) Prediction graph and (b) performance scores of all models under 40:60 split

6 Conclusion and Future work


In this article, we propose a hybrid deep learning model for predicting the prices
of index options. Our model, CNN-BiLSTM, combines the capabilities of both
CNN and BiLSTM in a single computational framework. The CNN extracts
features from the raw data, while the BiLSTM unit addresses the problem of
long-term dependencies. The hyperparameters of the model that provided
the best prediction have also been reported in this article. This study
constructs the predictive models using thirty different predictors that
are classified as either fundamental or technical data. We incorporated
Pearson's correlation coefficient (filter method) and the L2 regularisation
approach (embedded method) during model-building to extract a balanced set of
input variables. Experimental results demonstrate that, when compared to other
DL models, the CNN-BiLSTM achieves the highest prediction accuracy and
the best overall performance. The Shapiro-Wilk test is used to check that the
RMSE values are normally distributed. As an added layer of verification,
one-way ANOVA and post hoc analysis with the Tukey-HSD test show
that the CNN-BiLSTM model is statistically superior to the rest. Finally,
we demonstrate that the model developed in this research continues to
exhibit superior performance and dependability through a robustness
check that varies the ratio of the training and test sets and adds an
evaluation on recent data.

Improved hybrid predictive models based on different neural network
architectures will be developed in future work in an effort to improve accuracy.
Combining features of both traditional and modern machine learning model
architectures into a single predictive model is another viable option. More
informative features from multivariate datasets, such as macroeconomic data
and the Greeks, can be used as well, which may improve prediction accuracy.
Hybrid optimization methods, which combine existing local optimizers with
global optimizers like genetic algorithms, can be used to train the model
parameters and increase the prediction accuracy even more.

Fig. 11: (a) Prediction graph and (b) performance scores of all models under 40:60 split in addition to recent data

Declaration
Conflict of interest: The authors state that they do not have any conflict of
interest.

References
[1] R. M. Alrumaih and M. A. Al-Fawzan. Time series forecasting using
wavelet denoising an application to saudi stock index. Journal of King
Saud University-Engineering Sciences, 14(2):221–233, 2002.
[2] F. Black and M. Scholes. The pricing of options and corporate liabilities.
Journal of political economy, 81(3):637–654, 1973.
[3] W. Chen, S. Wang, G. Long, L. Yao, Q. Z. Sheng, and X. Li. Dynamic
illness severity prediction via multi-task rnns for intensive care unit. In
2018 IEEE International Conference on Data Mining (ICDM), pages 917–
922. IEEE, 2018.
[4] W. Chen, S. Wang, X. Zhang, L. Yao, L. Yue, B. Qian, and X. Li. Eeg-
based motion intention recognition via multi-task rnns. In Proceedings of
the 2018 SIAM International Conference on Data Mining, pages 279–287.
SIAM, 2018.
[5] Y. Chen, R. Fang, T. Liang, Z. Sha, S. Li, Y. Yi, W. Zhou, and
H. Song. Stock price forecast based on cnn-bilstm-eca model. Scientific
Programming, 2021:1–20, 2021.
[6] C. J. Corrado and T. Su. Skewness and kurtosis in s&p 500 index returns
implied by option prices. Journal of Financial research, 19(2):175–192,
1996.
[7] Y. Feng and Y. Li. A research on the csi 300 index prediction model based
on lstm neural network. Math. Pract. Theory, 49(7):308–315, 2019.
[8] N. Gradojevic, R. Gençay, and D. Kukolj. Option pricing with modular
neural networks. IEEE transactions on neural networks, 20(4):626–637,
2009.
[9] A. Graves and J. Schmidhuber. Framewise phoneme classification
with bidirectional lstm and other neural network architectures. Neural
networks, 18(5-6):602–610, 2005.
[10] S. L. Heston. A closed-form solution for options with stochastic volatility
with applications to bond and currency options. The review of financial
studies, 6(2):327–343, 1993.
[11] M. Hiransha, E. A. Gopalakrishnan, V. K. Menon, and K. Soman. Nse
stock market prediction using deep-learning models. Procedia computer
science, 132:1351–1362, 2018.
[12] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural
computation, 9(8):1735–1780, 1997.
[13] E. Hoseinzade and S. Haratizadeh. Cnnpred: Cnn-based stock market pre-
diction using a diverse set of variables. Expert Systems with Applications,
129:273–285, 2019.
[14] Y. Hu. Stock market timing model based on convolutional neural
network–a case study of shanghai composite index. Finance & Economy,
4:71–74, 2018.
[15] H. Huang, R. Jia, X. Shi, J. Liang, and J. Dang. Feature selection
and hyper parameters optimization for short-term wind power forecast.
Applied Intelligence, pages 1–19, 2021.

[16] D. H. Hubel. Single unit activity in striate cortex of unrestrained cats.
The Journal of physiology, 147(2):226, 1959.
[17] J. M. Hutchinson, A. W. Lo, and T. Poggio. A nonparametric approach
to pricing and hedging derivative securities via learning networks. The
journal of Finance, 49(3):851–889, 1994.
[18] A. Ikram and Y. Liu. Skeleton based dynamic hand gesture recogni-
tion using lstm and cnn. In Proceedings of the 2020 2nd International
Conference on Image Processing and Machine Vision, pages 63–68, 2020.
[19] Ishwarappa and J. Anuradha. Big data based stock trend prediction using
deep cnn with reinforcement-lstm model. International Journal of System
Assurance Engineering and Management, pages 1–11, 2021.
[20] C.-F. Ivașcu. Option pricing using machine learning. Expert Systems with
Applications, 163:113799, 2021.
[21] R. Jun, W. Jianhua, W. Chuanmei, and W. Jianxiang. Stock index fore-
cast based on regularized lstm model. Comput. Appl. Softw, 35:44–48,
2018.
[22] P. Kavianpour, M. Kavianpour, E. Jahani, and A. Ramezani. A cnn-
bilstm model with attention mechanism for earthquake prediction. The
Journal of Supercomputing, pages 1–33, 2023.
[23] A. Ke and A. Yang. Option pricing with deep learning. Department of
Computer Science, Stanford University, In CS230: Deep learning, 8:1–8,
2019.
[24] A. Kulshrestha, V. Krishnaswamy, and M. Sharma. Bayesian bilstm
approach for tourism demand forecasting. Annals of tourism research,
83:102925, 2020.
[25] J. K. Lal and A. K. Timalsina. A cnn-bgru method for stock price
prediction. 2022.
[26] M.-C. Lee, J.-W. Chang, S.-C. Yeh, T.-L. Chia, J.-S. Liao, and X.-
M. Chen. Applying attention-based bilstm and technical indicators in
the design and performance analysis of stock trading strategies. Neural
Computing and Applications, 34(16):13267–13279, 2022.
[27] J. Li. A comparative study of lstm variants in prediction for tesla’s stock
price. BCP Business and Management, 34:30–38, 12 2022.
[28] J. Li, Y. Sun, and B. Zhang. Interactive behavior recognition based
on sparse coding feature fusion. Laser and Optoelectronics Progress,
57(11):181006, 18 2020.
[29] L. Liang and X. Cai. Time-sequencing european options and pricing
with deep learning–analyzing based on interpretable ale method. Expert
Systems with Applications, 187:115951, 2022.
[30] X. Liang, Z. Ge, L. Sun, M. He, and H. Chen. Lstm with wavelet trans-
form based data preprocessing for stock price prediction. Mathematical
Problems in Engineering, 2019, 2019.
[31] X. Liang, H. Zhang, J. Xiao, and Y. Chen. Improving option price
forecasts with neural networks and support vector regressions. Neurocom-
puting, 72(13-15):3055–3065, 2009.

[32] B. Lindemann, B. Maschler, N. Sahlab, and M. Weyrich. A survey on
anomaly detection for technical systems using lstm networks. Computers
in Industry, 131:103498, 2021.
[33] S. Liu, C. W. Oosterlee, and S. M. Bohte. Pricing options and computing
implied volatilities using neural networks. Risks, 7(1):16, 2019.
[34] Y. Liu and X. Zhang. Option pricing using lstm: A perspective of realized
skewness. Mathematics, 11(2):314, 2023.
[35] W. Lu, J. Li, Y. Li, A. Sun, and J. Wang. A cnn-lstm-based model to
forecast stock prices. Complexity, 2020:1–10, 2020.
[36] W. Lu, J. Li, J. Wang, and L. Qin. A cnn-bilstm-am method for stock
price prediction. Neural Computing and Applications, 33:4741–4753, 2021.
[37] B. Madhu, M. A. Rahman, A. Mukherjee, M. Z. Islam, R. Roy, and
L. E. Ali. A comparative study of support vector machine and artifi-
cial neural network for option price prediction. Journal of Computer and
Communications, 9(05):78–91, 2021.
[38] A. Q. Md, S. Kapoor, C. J. AV, A. K. Sivaraman, K. F. Tee, H. Sabireen,
and N. Janakiraman. Novel optimization approach for stock price fore-
casting using multi-layered sequential lstm. Applied Soft Computing,
134:109830, 2023.
[39] T. Meinl and E. W. Sun. Methods of denoising financial data. In Handbook
of Financial Econometrics and Statistics, pages 519–538. Springer USA,
2015.
[40] R. C. Merton. Option pricing when underlying stock returns are
discontinuous. Journal of financial economics, 3(1-2):125–144, 1976.
[41] M. Nikou, G. Mansourfar, and J. Bagherzadeh. Stock price prediction
using deep learning algorithm and its comparison with machine learning
algorithms. Intelligent Systems in Accounting, Finance and Management,
26(4):164–174, 2019.
[42] H. Park, N. Kim, and J. Lee. Parametric models and non-parametric
machine learning models for predicting option prices: Empirical compari-
son study over kospi 200 index options. Expert Systems with Applications,
41(11):5227–5237, 2014.
[43] N. R. Pokhrel, K. R. Dahal, R. Rimal, H. N. Bhandari, R. K. Khatri,
B. Rimal, and W. E. Hahn. Predicting nepse index price using deep
learning models. Machine Learning with Applications, 9:100385, 2022.
[44] M. M. Rahman, O. L. Usman, R. C. Muniyandi, S. Sahran, S. Mohamed,
and R. A. Razak. A review of machine learning methods of feature
selection and classification for autism spectrum disorder. Brain sciences,
10(12):949, 2020.
[45] H. M. Rai and K. Chatterjee. Hybrid cnn-lstm deep learning model and
ensemble technique for automatic detection of myocardial infarction using
big ecg data. Applied Intelligence, 52(5):5366–5384, 2022.
[46] J. Ruf and W. Wang. Neural networks for option pricing and hedging: a
literature review. arXiv preprint arXiv:1911.05620, 2019.
[47] M. Schroder. Computing the constant elasticity of variance option pricing

formula. the Journal of Finance, 44(1):211–219, 1989.
[48] S. Selvin, R. Vinayakumar, E. Gopalakrishnan, V. K. Menon, and
K. Soman. Stock price prediction using lstm, rnn and cnn-sliding win-
dow model. In 2017 international conference on advances in computing,
communications and informatics (icacci), pages 1643–1647. IEEE, 2017.
[49] O. B. Sezer, M. U. Gudelek, and A. M. Ozbayoglu. Financial time series
forecasting with deep learning: A systematic literature review: 2005–2019.
Applied soft computing, 90:106181, 2020.
[50] S. Siami-Namini, N. Tavakoli, and A. S. Namin. The performance of
lstm and bilstm in forecasting time series. In 2019 IEEE International
conference on big data (Big Data), pages 3285–3292. IEEE, 2019.
[51] H. Tyralis and G. Papacharalampous. Variable selection in time series
forecasting using random forests. Algorithms, 10(4):114, 2017.
[52] H. Xie and T. You. Research on european stock index options pricing
based on deep learning algorithm: evidence from 50etf options markets.
In Stat. Inf. Forum, volume 33, pages 99–106, 2018.
[53] S.-H. Yang and J. Lee. Predicting a distribution of implied volatilities for
option pricing. Expert Systems with Applications, 38(3):1702–1708, 2011.
[54] N. W.-j. ZENG An. Stock recommendation system based on deep
bidirectional lstm. Computer Science, 46(10):84, 2019.
[55] K. Zhao, J. Zhang, and Q. Liu. Dual-hybrid modeling for option pricing
of csi 300etf. Information, 13(1):36, 2022.
