[Aide300][Group 5] Final Report

The report presents a study on predicting Amazon's stock price using advanced deep learning techniques, specifically the Bidirectional Long Short-Term Memory (Bi-LSTM) model. It compares the performance of Bi-LSTM with traditional econometric models like ARIMA and OLS, demonstrating that Bi-LSTM outperforms these models in accuracy for short-term predictions. The research highlights the importance of integrating external data sources to enhance prediction accuracy and provides a framework beneficial for investors and analysts in volatile markets.

Uploaded by

julsworking13

FOREIGN TRADE UNIVERSITY
HO CHI MINH CAMPUS

FINAL REPORT OF ASSIGNMENT

Course name: Artificial Intelligence in Era of Digital Transformation


Date: 19/03/2025 – Class code: ML183 – Course code: AIDE300

No.  Full Name        ID          Peer Evaluation (0% - 100%)
1    Lê Hoàng Triều   2312155212  100%
2    Phạm Phú Tài     2312155177  100%

Grade (in number) Grade (in words)

Examiner 1’s signature Examiner 2’s signature


TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1: INTRODUCTION
1.1. Background of stock price prediction
1.2. Overview of Amazon (AMZN)
1.3. Previous studies
1.4. Project overview
1.5. Disclaimer
CHAPTER 2: PREDICTIVE MODEL BUILDING
2.1. Data cleaning process
2.2. Exploratory Data Analysis (EDA)
2.2.1. Data description
2.2.1.1. Data source
2.2.1.2. Key stock market terms
2.2.1.3. Data analysis
2.2.2. Feature selection
2.2.3. Testing feature significance
2.3. Model analysis
2.3.1. Model training
2.3.2. Model building
2.3.3. Model testing results
2.3.4. Potential causes of stock price trends
2.3.5. AI model evaluation and comparison with traditional econometric models
CHAPTER 3: CONCLUSION
3.1. Findings
3.2. Limitations
3.3. Recommendations
REFERENCES

LIST OF FIGURES
Figure 1. Result of checking for missing values
Figure 2. Result of checking for outliers
Figure 3. The open prices of Amazon stock from 2020 to 2025
Figure 4. The high prices of Amazon stock from 2020 to 2025
Figure 5. The low prices of Amazon stock from 2020 to 2025
Figure 6. The close prices of Amazon stock from 2020 to 2025
Figure 7. Amazon’s stock exchange volume from 2020 to 2025
Figure 8. Amazon's stock close distribution
Figure 9. Amazon's distribution of volume of stock trading
Figure 10. Correlation matrix of features

LIST OF TABLES
Table 1. Predicted results

ABSTRACT
This report forecasts Amazon's (AMZN) stock price using a deep learning model, the
Bidirectional Long Short-Term Memory (Bi-LSTM) network. Traditional econometric
techniques, namely ARIMA and Ordinary Least Squares (OLS), are compared with Bi-LSTM
to show the advantages and drawbacks of each approach. The model uses historical stock
data and integrates features such as moving averages, volatility metrics, and market
sentiment indicators to improve the accuracy of short-term price predictions. The
research contributes to the rapidly growing field of stock price prediction by employing
deep learning to identify complex, non-linear relationships in stock price fluctuations
that conventional models fail to capture. On the evaluation criteria, Mean Absolute
Error (MAE), Root Mean Square Error (RMSE), and R-squared (R²), the Bi-LSTM model
outperforms the ARIMA and OLS models. The report also indicates that Bi-LSTM provides
a strong framework for predicting stock price movements, especially in volatile markets,
but recommends adding external data sources, such as sentiment analysis and
macroeconomic factors, to enhance prediction accuracy. This methodology offers useful
information for investors and analysts seeking more effective stock price prediction
tools.

CHAPTER 1: INTRODUCTION

1.1. Background of stock price prediction


Stock price prediction has long been a challenge and research interest due to its
importance in the financial field. Stock market prediction is essential for investors,
traders, and financial analysts who have to make informed decisions based on expected
market movements. Given the volatility and complexity of financial markets, accurate
price prediction can lead to better decision-making, increased profits, and more stable
financial strategies (Fama, 1970).
The importance of stock price prediction has increased significantly with the rise
of algorithmic trading, machine learning, and deep learning techniques. These methods
are increasingly used to analyze historical data, identify patterns, and predict future
trends. Stock price prediction is important for traders and companies as it provides
insights into stock performance, supports portfolio management, and helps manage risk
(Diego, 2024). Despite extensive research in this area, there is still a significant gap in
accurately predicting short-term stock price movements due to the nonlinear behavior and
high volatility of the stock market. This makes it necessary to apply advanced deep-
learning techniques to improve prediction accuracy. Traditional stock prediction research
mainly focuses on linear models and econometric methods such as ARIMA
(AutoRegressive Integrated Moving Average) and GARCH (Generalized Autoregressive
Conditional Heteroskedasticity), which often fail to capture complex and nonlinear
relationships (Bollerslev, 1986). With the rapid development of deep learning methods,
researchers are now aiming to overcome these limitations by using architectures such as
Bidirectional Long Short-Term Memory (Bi-LSTM) networks. Unlike standard LSTMs,
which process data in one direction, Bi-LSTMs process time series data in both forward
and backward directions, allowing for the capture of more complex dependencies in stock
price movements (Hochreiter & Schmidhuber, 1997; Schuster & Paliwal, 1997).
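
The two-direction idea can be illustrated with a minimal sketch. It is not the report's model: it assumes a plain tanh recurrent cell standing in for an LSTM cell, random weights, and a toy input, and the names rnn_pass and bidirectional_pass are hypothetical. It only shows how a forward pass and a backward pass over the same sequence are concatenated at each time step.

```python
import numpy as np

def rnn_pass(x, W_in, W_rec, reverse=False):
    """Run a plain tanh recurrent cell over x (shape: time steps x features)."""
    steps = range(len(x) - 1, -1, -1) if reverse else range(len(x))
    h = np.zeros(W_rec.shape[0])
    states = np.zeros((len(x), W_rec.shape[0]))
    for t in steps:
        # The new state depends on the current input and the previous state
        h = np.tanh(x[t] @ W_in + h @ W_rec)
        states[t] = h
    return states

def bidirectional_pass(x, fwd_weights, bwd_weights):
    """Concatenate forward-in-time and backward-in-time hidden states."""
    forward = rnn_pass(x, *fwd_weights)
    backward = rnn_pass(x, *bwd_weights, reverse=True)
    return np.concatenate([forward, backward], axis=1)

# Toy input: 6 time steps, 3 features, 4 hidden units per direction (all assumed)
rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 6, 3, 4
x = rng.normal(size=(n_steps, n_features))
fwd = (rng.normal(size=(n_features, n_hidden)), rng.normal(size=(n_hidden, n_hidden)))
bwd = (rng.normal(size=(n_features, n_hidden)), rng.normal(size=(n_hidden, n_hidden)))

out = bidirectional_pass(x, fwd, bwd)
print(out.shape)  # (6, 8): 4 forward units + 4 backward units per time step
```

At each time step the first half of the output summarizes the past and the second half summarizes the remainder of the input window, which is the property the Bi-LSTM relies on.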

1.2. Overview of Amazon (AMZN)
Amazon (AMZN) is one of the largest and most influential companies in the
world, dominating many industries, including e-commerce, cloud computing, artificial
intelligence, and digital advertising. As a NASDAQ-listed company, Amazon's stock
price is closely followed by institutional investors, hedge funds, and retail traders.
Predicting the Amazon (AMZN) stock price is not only important for investors but also
provides insights into the broader technology and consumer markets, as Amazon plays a
significant role in shaping global economic trends (Gupta & Chen, 2020).
Amazon’s stock price is influenced by several factors, including:
● Quarterly earnings reports: Amazon’s financial performance in key areas such as
Amazon Web Services (AWS), Prime membership, and advertising significantly
impacts the company’s stock price (Zhang, 2024).
● Macroeconomic conditions: Changes in interest rates set by the Federal Reserve
(FED) and general market sentiment affect Amazon’s valuation (Fama, 1970).
● Competitive Landscape: Amazon faces competition from Walmart in e-commerce,
Microsoft in cloud computing, and Google in digital advertising, which could
impact investor confidence and stock performance (Ma et Y., 2024).
● Technology Development: Investments in artificial intelligence (AI), logistics, and
automation can lead to large swings in stock prices (Bernard & Obinna, 2024).
Over the years, Amazon has diversified its revenue streams, making its stock an
interesting case for deep learning-based forecasting. The stock is known for its high
volatility, often experiencing sharp price swings following earnings reports, policy
changes, or macroeconomic events. Therefore, predicting AMZN stock prices using
advanced AI techniques such as Bi-LSTM poses a significant challenge while providing
valuable insights into the performance of state-of-the-art financial forecasting models
(Selvin et al., 2017).
This study focuses on applying Bi-LSTM deep learning models to forecast
Amazon stock prices over the next five days using historical stock data from Yahoo
Finance, aiming to improve the accuracy of short-term predictions in technology-driven
financial markets.
1.3. Previous studies
Despite significant advances in stock price prediction, there is still a notable
gap in the literature regarding the application of deep learning techniques, especially
Bi-LSTM networks, to predict short-term stock price movements. Most studies have focused
on long-term predictions or used more traditional models such as linear regression or
ARIMA (Kuang, 2023). However, deep learning models, including Bi-LSTM, have
shown great promise in capturing temporal dependencies and nonlinear patterns in stock
price data, which are often overlooked by traditional models (Chong et al., 2017; Kim &
Shin, 2019).
In addition, while several studies have focused on predicting stock prices for large
companies such as Apple or Tesla, few studies have specifically addressed Amazon
(AMZN) stock in the context of deep learning and short-term price forecasting. This
study aims to fill this gap by focusing on Amazon stock prices, using historical price data
and sentiment analysis to improve predictions (Nelson et al., 2017).
1.4. Project overview
The main objective of this paper is to develop a deep-learning model that can
predict the daily stock price of Amazon (AMZN) for the next five days using historical
stock data obtained from Yahoo Finance. This paper will:
● Apply advanced deep learning techniques, especially Bi-LSTM networks, to
forecast the short-term stock price of AMZN.
● Compare the performance of Bi-LSTM models with traditional econometric
models such as ARIMA and Linear Regression.
● Provide detailed information on the performance of the developed models and
evaluate their accuracy through performance metrics such as Mean Absolute Error
(MAE), Root Mean Square Error (RMSE), and R².
● Contribute to the understanding of how deep learning can be used to predict stock
prices more effectively, especially in volatile markets, thus potentially benefiting
investors and analysts.
By the end of the project, we hope to provide a robust predictive model for
Amazon stock, generating predictions that can aid decision-making for investors and
financial professionals.
1.5. Disclaimer
The code used in this report was generated with the help of AI models, including
ChatGPT-4o and Deepseek. The AI-generated code was used to develop a model to
predict Amazon's (AMZN) stock price using advanced deep learning techniques.
The AI prompts used to generate the code included:
● Creating a Python script to forecast stock prices on Google Colab using the best
performing model.
● Implementing key steps such as feature engineering, data preprocessing, model
training, and evaluating deep learning techniques such as Bi-LSTM, along with
traditional models such as ARIMA and Linear Regression.
● Retrieving Amazon stock data from Yahoo Finance and calculating technical
indicators, including moving averages, MACD, Bollinger Bands, and ATR.
● Using performance metrics such as Mean Absolute Error (MAE), Root Mean Square
Error (RMSE), and R-squared (R²) to evaluate different models and select the
most accurate one.
● Performing correlation analysis to identify the most relevant features and prevent
multicollinearity.
● Using the selected model to forecast Amazon's stock price for the next five days and
visualizing the results using Matplotlib and Seaborn.
While the AI-generated code is used as a foundation, additional modifications and
optimizations have been made to ensure accuracy and efficiency. The predictions and
analysis in this report are for research purposes only and should not be considered
financial advice. Investors and analysts should conduct their own independent
evaluations before making any financial decisions.

CHAPTER 2: PREDICTIVE MODEL BUILDING

2.1. Data cleaning process


After collecting stock data for AMZN from Yahoo Finance, we first check the
integrity of the data to ensure that our model is trained on clean data. To do this, we start
by printing the first and last rows of the data to check whether there are any invalid or
corrupted columns or rows.

Step 1: Checking the collected data

We use the head() and tail() functions to print out the first and last 5 lines of the
data, which helps to confirm that the data has been collected correctly and is in the right
structure. Here is how to do it:

# Display the first 5 rows
print(data.head())
# Display the last 5 rows of the dataset
print(data.tail())

The results show that the obtained data is in the correct format and can be used for
further analysis.

Step 2: Checking the data type of the columns

We continue to check the data type of the columns in the dataset using the info()
method. This helps to confirm that the columns have the correct data type, for example,
the time column (index) must be in datetime format.

# Check the data types (info() prints its summary directly)
data.info()
# Convert the time index to datetime format
data.index = pd.to_datetime(data.index)

Step 3: Checking for missing values

After checking the basic structure of the data, we proceed to check for missing
values in the dataset using the isnull().sum() method. If there are missing values, we will
handle them, possibly by deleting the missing rows or filling in replacement values.

# Count the missing values in each column
print("Missing values:\n", data.isnull().sum())


It can be seen from the output of the code that the data obtained is in the correct
format and can be used for analysis.
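
Had missing values been found, the two handling options mentioned in Step 3 (deleting rows or filling in replacement values) could look like the sketch below. It runs on a hypothetical toy frame rather than the AMZN data, and forward filling is only one possible replacement strategy, chosen here as an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices with one gap; this is not the AMZN dataset
toy = pd.DataFrame(
    {"Close": [100.0, np.nan, 102.0, 103.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Option 1: delete every row that contains a missing value
dropped = toy.dropna()

# Option 2: fill each gap with the last known value (forward fill)
filled = toy.ffill()

print(len(dropped))
print(filled["Close"].tolist())
```

For time series, forward filling avoids using future information, whereas interpolation between neighbours would leak the next day's price into the gap.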

Figure 1. Result of checking for missing values
Step 4: Check for duplicate values
To ensure the reliability of the data, we used the pandas method data.duplicated()
to count duplicate rows and prepared code to remove any that were found. The results
showed that there were no duplicates in the data, so there was no need to remove
any rows.
# Check the data types (info() prints its summary directly)
data.info()

# Convert the time index to datetime format
data.index = pd.to_datetime(data.index)

# Check the number of rows and columns
print("Dataset shape:", data.shape)

# Count the missing values in each column
print("Missing values:\n", data.isnull().sum())

# Count the duplicated rows
print("Number of duplicate observations:", data.duplicated().sum())

# Count the unique values in each column
print("Number of unique values per column:\n", data.nunique())

Step 5: Check the number of unique values in each column

Finally, we also check the number of unique values in each column to ensure the
validity of the data. If any column contains too few unique values, we will reconsider the
validity of that column in the model.

# Count the unique values in each column
print("Number of unique values per column:\n", data.nunique())

Step 6: Checking for outliers

Another important step is to check for outliers that may affect the quality of the
model. We perform outlier testing using the IQR (Interquartile Range) method. A value is
treated as an outlier if it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. Here is
how to identify outliers for important columns such as Open, High, Low, Close, and Volume:
def outlier_detection(data, columns):
    # One row per summary statistic (count, mean, std, min, ...)
    result_df = pd.DataFrame(index=data.describe().index)
    for col in columns:
        IQR = data[col].quantile(0.75) - data[col].quantile(0.25)
        lower_bound = data[col].quantile(0.25) - 1.5 * IQR
        upper_bound = data[col].quantile(0.75) + 1.5 * IQR
        # Flag the values that fall outside the IQR bounds
        outlier_condition = (data[col] < lower_bound) | (data[col] > upper_bound)
        outliers = data.loc[outlier_condition.values, col].describe()
        # Store the summary statistics of the outliers
        result_df[f'outlier_{col}'] = outliers
    return result_df

# Check for outliers in the important columns
outlier_result = outlier_detection(data, ['Open', 'High', 'Low', 'Close', 'Volume'])
print(outlier_result)

Figure 2. Result of checking for outliers


After performing the above checks, we concluded that the data was clean and
ready for further analysis and model building. Checking the data structure, missing
values, duplicate values, and outliers helped confirm the integrity of the collected
data.

2.2. Exploratory Data Analysis (EDA)


2.2.1. Data description
2.2.1.1. Data source
The dataset consists of historical stock prices of Amazon (AMZN) from March 16,
2020 to March 17, 2025, downloaded via the yfinance package. yfinance is an open-source
Python package with a simple approach to downloading historical market data
from Yahoo Finance, one of the largest and most trusted websites providing news and
information on a wide range of stocks. The information includes Date, Open, High, Low,
Close, and Volume.
import yfinance as yf
import pandas as pd

# Define the date range
start_date = "2020-03-16"
end_date = "2025-03-17"

# Download daily Amazon stock data; yfinance treats `end` as exclusive,
# so the last row is the final trading day before end_date
df = yf.download('AMZN', start=start_date, end=end_date, interval='1d')

# Copy data to avoid modifying the original dataset
data = df.copy()

2.2.1.2. Key stock market terms


- Daily Return: Percentage change in the closing price from one trading day to the
next.
- Cumulative Return: Compound growth over time, used to track the overall growth
of an investment.
- Volatility: Rolling standard deviation of daily returns over a 21-day window, used
to measure the risk of a stock's returns.
- Moving Average (7-day, 30-day, 100-day): Rolling average of closing prices, used
to smooth price data to determine trends over different time periods.
- Trading Volume: Number of shares traded daily, used to indicate the liquidity and
market activity of a stock.
- Correlation Matrix: Correlation coefficient between columns, used to measure the
strength and direction of the relationship between variables (e.g. Open, High, Low,
Close, Volume).
- Descriptive Statistics: Summary statistics, used to provide an overview of the
distribution of a data set and key metrics.
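
As a rough illustration of how the return, volatility, and moving-average terms above translate into pandas, the sketch below computes them from a closing-price series. The function name add_return_features, the output column names, and the four toy prices are assumptions made for the example, not part of the report's code.

```python
import pandas as pd

def add_return_features(close: pd.Series) -> pd.DataFrame:
    """Derive return, volatility, and moving-average columns from closing prices."""
    out = pd.DataFrame({"Close": close})
    # Daily return: percentage change from the previous close
    out["Daily_Return"] = close.pct_change()
    # Cumulative return: compounded daily returns since the first observation
    out["Cumulative_Return"] = (1 + out["Daily_Return"]).cumprod() - 1
    # Volatility: rolling standard deviation of daily returns over 21 days
    out["Volatility_21"] = out["Daily_Return"].rolling(window=21).std()
    # Moving averages of the closing price over 7, 30, and 100 days
    for window in (7, 30, 100):
        out[f"MA_{window}"] = close.rolling(window=window).mean()
    return out

# Toy closing prices (hypothetical): +10%, -10%, +10%
toy_close = pd.Series([100.0, 110.0, 99.0, 108.9])
features = add_return_features(toy_close)
print(features[["Daily_Return", "Cumulative_Return"]])
```

With only four toy rows the rolling columns stay empty (NaN); on the five-year dataset they fill in once each window has enough observations.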

2.2.1.3. Data analysis
The opening price of Amazon (AMZN) shows volatility from 2020 to 2025, with
prices remaining relatively stable until a sharp increase around 2022. From 2022 to 2023,
the stock shows periods of volatility with significant increases and decreases, which may
correlate with external events, market conditions, or internal company strategies. The
price gradually declines by the end of 2024, reflecting the potential impact of external
factors such as a market downturn or economic uncertainty.

Figure 3. The open prices of Amazon stock from 2020 to 2025


The high price fluctuates similarly to the opening price but tends to increase more
clearly in 2025. There are periods of rapid growth, especially in 2022 and early 2023,
signaling investor optimism and possibly good market performance or an important
product or service release by Amazon. A sudden drop in early 2025 could reflect a market
correction or an economic downturn.

Figure 4. The high prices of Amazon stock from 2020 to 2025
The low price shows a steady increase from 2020 to 2025, reflecting overall
market optimism for AMZN, with a dip occurring in 2022 and 2023.
The steady increase from 2023 onwards could be related to increasing demand for
Amazon’s services, possibly due to the growth of e-commerce or expansion of cloud
services.

Figure 5. The low prices of Amazon stock from 2020 to 2025


The close price typically follows the same pattern as the open and high but is more
conservative in its movements. There are notable peaks around 2022, coinciding with the
overall increase in AMZN stock during that time. After the 2024 decline, the stock
showed signs of a slight recovery, suggesting a possible positive performance or investor
confidence returning as the market stabilizes.

Figure 6. The close prices of Amazon stock from 2020 to 2025


The chart illustrates the volume of shares traded for Amazon (AMZN) stock from
2020 to 2025. Volume shows significant fluctuations over time, with sharp peaks and
troughs. These peaks can correspond to major market events, such as earnings
announcements, stock splits, or external factors that influence investor sentiment, leading
to higher-than-normal trading activity. The overall trend shows that volume remains
highly volatile, with periods of high trading activity followed by quieter, less volatile
periods. Such fluctuations suggest that while Amazon stock experiences periods of
heightened investor interest, these periods tend to be sporadic rather than continuous,
reflecting the stock’s response to both internal and external market stimuli.

Figure 7. Amazon’s stock exchange volume from 2020 to 2025
The distribution of closing prices for Amazon (AMZN) shows a fairly
symmetrical distribution, with a peak around $160, indicating that AMZN’s closing
prices have mostly fluctuated within this range. Most closing prices are concentrated
around the median, but there is some volatility between $80 and $240, with a few
instances of prices reaching above $200. This suggests that while AMZN can reach
higher prices during positive market conditions, these higher prices are relatively rare.
The presence of higher price points suggests that AMZN has the potential to generate
high returns during favorable market periods, but overall the stock price has remained
stable and has not deviated much in either direction.

Figure 8. Amazon's stock close distribution
Finally, from the chart of the distribution of trading volume for Amazon stock, we
can see that the distribution is skewed to the right, indicating that the majority of trading
volume falls in the lower range, with a few instances of very high volume. The peak of
the distribution is between 0.5 and 1.0 million shares traded, indicating that most of the
time, trading volume is in this range. The tail on the right represents occasional spikes in
trading activity, where large volumes of shares are traded. These spikes can be due to
important events such as earnings reports, market reactions to news, or other external
factors that influence investor behavior. The overall trend shows that while high volume
days do occur, they are relatively rare and Amazon stock typically has moderate trading
volume.

Figure 9. Amazon's distribution of volume of stock trading
2.2.2. Feature selection
Before training the model, we examined the fundamentals of stock prices from a
practical viewpoint in order to add new features aimed at improving the performance of
our predictive model. Technical indicators based on past stock prices and volume data
are crucial for predicting stock trends and price fluctuations (Shynkevich et al., 2017),
as these indicators reflect market dynamics, investor mood, and volatility, offering
essential data for predictive models.

Essential price-based features include Close, High, Low, and Open prices, which
indicate fluctuations in markets and contribute to the calculation of technical indicators.
Volume measures, including Volume, Volume Change, and On-Balance Volume (OBV),
enable the evaluation of market activity and investor sentiment (Bao et al., 2017; Oak et
al., 2024). Trend-following indicators, including Moving Averages (MA) and
Exponential Moving Averages (EMA), mitigate price volatility and assist in trend
identification (Selvin et al., 2017). Momentum indicators, such as Momentum, Returns,
and Volatility, measure price acceleration and reversals, thereby enhancing short-term
prediction precision (Fischer & Krauss, 2018). Oscillators such as RSI, Williams %R,
and CCI assess price momentum and detect overbought or oversold levels (Selvin et al.,
2017; Oak et al., 2024). Bollinger Bands, ATR, and ADX measure market volatility and
trend strength, differentiating between stable and volatile phases (Bao et al., 2017;
Fischer & Krauss, 2018). The MACD and Signal Line are extensively utilized to identify
trend reversals and momentum fluctuations (Shynkevich et al., 2017). Researchers have
found that using important indicators like MA, EMA, MACD, RSI, Bollinger Bands,
ATR, and OBV can help make predictions more accurate and reduce overfitting (Oak et
al., 2024). With these indicators, we will have 25 features in total, which are listed
below:
Price-based features
● Close, High, Low, Open
● Return: The percentage change in the closing price compared to the previous
day's closing price. It is used to measure the daily return on the stock.
● Volatility_10: The standard deviation of price fluctuations over the previous 10
days. It measures the fluctuations in the stock's price, with increased volatility
indicating greater risk.
Moving averages
● MA_5: The 5-day simple moving average of the stock price, determined by
averaging the closing prices from the previous 5 days.
● MA_20: The 20-day simple moving average of the stock price.
● EMA_5: The 5-day EMA of the stock price, which provides greater significance
to recent prices to react more swiftly to price fluctuations.
● EMA_20: The 20-day EMA, which responds more swiftly to market fluctuations
than a standard moving average.
Volatility and trend indicators
● Bollinger_Upper: indicates the stock's overbought status, determined by adding
twice the standard deviation of the closing price to the 20-day moving average.
● Bollinger_Lower: indicates the stock's oversold status, determined by deducting
twice the standard deviation from the 20-day moving average.

● RSI_14: evaluates overbought or oversold conditions by calculating the
magnitude of recent price fluctuations. A value above 70 indicates overbought
conditions, whereas a value below 30 indicates oversold conditions.
● Williams %R: compares the current closing price against the high-low range
over a given period (usually 14 days).
● CCI: measures the difference of the stock price from its mean price over a
specified duration. Positive values signify an overbought state, whilst negative
values indicate an oversold state.
● ADX: measures the strength of a trend, whether ascending or descending, with
values exceeding 25 indicating a robust trend.
● ATR: computes the average true range (the high-low range, adjusted for gaps
from the previous close) over a specified period.
Volume-related features
● Volume
● Volume Change: The percentage change in volume compared to the prior day.
● Volume_Spike: measures volume in relation to the 10-day moving average of
volume.
Technical indicators
● MACD: measures the difference between two exponential moving averages
(typically the 12-day and 26-day EMAs).
● Signal_Line: is the 9-day EMA of the MACD; buy or sell signals arise when the
MACD line crosses above or below it.
● Momentum_5: The difference between the current day's closing price and the
closing price from five days earlier. This assesses the stock's price momentum
over a short duration.
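
Several of the features listed above can be derived with pandas alone. The following is a sketch rather than the report's actual feature-engineering code: it assumes a frame with Close and Volume columns, computes Volatility_10 on daily returns (the text does not say whether returns or raw prices were used), uses adjust=False for the EMAs, and runs on synthetic random-walk data. RSI, Williams %R, CCI, ADX, and ATR are left out to keep it short.

```python
import numpy as np
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Add a subset of the listed technical indicators to a price/volume frame."""
    out = df.copy()
    close = out["Close"]

    # Price-based features
    out["Return"] = close.pct_change()
    out["Volatility_10"] = out["Return"].rolling(10).std()  # assumption: std of returns

    # Simple and exponential moving averages
    out["MA_5"] = close.rolling(5).mean()
    out["MA_20"] = close.rolling(20).mean()
    out["EMA_5"] = close.ewm(span=5, adjust=False).mean()
    out["EMA_20"] = close.ewm(span=20, adjust=False).mean()

    # Bollinger Bands: 20-day MA plus/minus twice the 20-day standard deviation
    std_20 = close.rolling(20).std()
    out["Bollinger_Upper"] = out["MA_20"] + 2 * std_20
    out["Bollinger_Lower"] = out["MA_20"] - 2 * std_20

    # MACD (12-day EMA minus 26-day EMA) and its 9-day EMA signal line
    ema_12 = close.ewm(span=12, adjust=False).mean()
    ema_26 = close.ewm(span=26, adjust=False).mean()
    out["MACD"] = ema_12 - ema_26
    out["Signal_Line"] = out["MACD"].ewm(span=9, adjust=False).mean()

    # Momentum and volume-related features
    out["Momentum_5"] = close - close.shift(5)
    out["Volume_Change"] = out["Volume"].pct_change()
    out["Volume_Spike"] = out["Volume"] / out["Volume"].rolling(10).mean()
    return out

# Synthetic random-walk prices and volumes (not real AMZN data)
rng = np.random.default_rng(42)
n = 60
toy = pd.DataFrame({
    "Close": 100 + np.cumsum(rng.normal(0, 1, n)),
    "Volume": rng.integers(1_000, 5_000, n).astype(float),
})

features = add_indicators(toy).dropna()
print(features.shape)
```

Dropping the rows with NaN removes the warm-up period of the longest rolling window (20 days here), which mirrors the cleaning of the indicator data described in the next section.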
2.2.3. Testing feature significance
As at the beginning of the code, we also cleaned the technical indicators' data
so that further processing would run without problems. We then continued to optimize
our model by identifying the variables that matter most to the prediction, analyzing
the correlation between each pair of variables through a correlation matrix. The purpose
of this phase is to pinpoint which features are the most and least important for the
model’s prediction accuracy.

Figure 10. Correlation matrix of features


The Close, High, Low, and Open prices are almost perfectly correlated with one
another (r = 1.0), as they reflect different aspects of the same daily price
movement; MA, EMA, and Bollinger Bands are also strongly interrelated, as they
all track price trends derived from past data.
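
The entries of such a correlation matrix are Pearson coefficients. A minimal sketch of the calculation follows; the function names and the small sample columns are assumptions for illustration and are not the report's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative columns only (hypothetical values).
data = {
    "Close": [1.0, 2.0, 3.0, 4.0],
    "High": [1.5, 2.5, 3.5, 4.5],
    "Volume": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(data)
print(round(matrix["Close"]["High"], 2))    # 1.0: the two columns move together
print(round(matrix["Close"]["Volume"], 2))  # -1.0: they move in opposite directions
```

Features whose pairwise correlation is close to 1 carry nearly the same information, which is why highly correlated price columns are candidates for removal.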
To refine feature selection, we applied XGBoost to identify the most significant
variables for forecasting stock closing prices. The findings indicated that Low
(0.5859) and High (0.3852) were the most significant predictors, while the Open
price, despite its strong correlation with Close, received a negligible
importance score (0.0001) and was excluded as redundant. Among technical
indicators, EMA_5 (0.0233) was recognized as a key feature, demonstrating its
effectiveness in identifying short-term trends. A threshold of 50% of the
average feature importance score was applied, so that only the most significant
signals were retained. This methodology highlighted High, Low, and EMA_5 as
essential features, enhancing model efficiency and prediction accuracy while
minimizing redundancy.
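
The thresholding step can be sketched without the model itself. In the example below the four named scores are the ones reported in the text; the function name, the count of 21 remaining features, and the way the leftover importance is spread across them are assumptions made only so that the example runs.

```python
def select_features(importances, fraction=0.5):
    """Keep features whose score is at least `fraction` of the mean score."""
    threshold = fraction * sum(importances.values()) / len(importances)
    kept = [name for name, score in importances.items() if score >= threshold]
    return kept, threshold


# Scores reported in the text.
importances = {"Low": 0.5859, "High": 0.3852, "EMA_5": 0.0233, "Open": 0.0001}

# Hypothetical placeholders: 21 other features sharing the leftover importance.
leftover = 1.0 - sum(importances.values())
for i in range(21):
    importances[f"other_{i}"] = leftover / 21

kept, threshold = select_features(importances)
print(round(threshold, 4))  # 0.02 under these assumptions
print(kept)                 # ['Low', 'High', 'EMA_5']
```

Under these assumptions the cut-off is 0.02, which EMA_5 (0.0233) clears and Open (0.0001) does not.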
2.3. Model analysis
2.3.1. Model training

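from sklearn.model_selection import train_test_split

# shuffle=False keeps the samples in chronological order
X_train, X_test, y_train, y_test = train_test_split(X_seq, y_seq,
                                                    test_size=0.2, shuffle=False)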

This code divides the dataset into training and test subsets, which is essential
for evaluating the generalization capability of the predictive model. The first
80% of the sequences are allocated to the training set and the last 20% are
reserved for testing. Because shuffle=False, the chronological order is
preserved and the model is tested only on data that follows the training period.
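
How X_seq and y_seq might be built, and what an unshuffled 80/20 split does to them, can be sketched with NumPy. The helper names and the synthetic data are assumptions; the 60-day lookback follows the window length mentioned in the Limitations section.

```python
import math

import numpy as np


def make_sequences(features, target, lookback=60):
    """Pair each window of `lookback` rows with the target of the following row."""
    X, y = [], []
    for end in range(lookback, len(features)):
        X.append(features[end - lookback:end])
        y.append(target[end])
    return np.array(X), np.array(y)


def chronological_split(X, y, test_size=0.2):
    """Unshuffled split: the earliest samples train, the latest samples test."""
    n_test = math.ceil(len(X) * test_size)
    cut = len(X) - n_test
    return X[:cut], X[cut:], y[:cut], y[cut:]


# Synthetic data: 100 days, 3 features (illustrative, not AMZN data).
features = np.arange(300, dtype=float).reshape(100, 3)
target = features[:, 0]

X_seq, y_seq = make_sequences(features, target)
X_train, X_test, y_train, y_test = chronological_split(X_seq, y_seq)
print(X_train.shape, X_test.shape)  # (32, 60, 3) (8, 60, 3)
```

Each sample has the shape (time steps, features), which matches the input_shape expected by the first layer of the model.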

2.3.2. Model building

Traditional forecasting techniques, including linear regression, ARIMA, and
GARCH, have specific limitations, particularly in addressing the non-linear,
unstable, and extremely volatile nature of the stock market. Consequently, the
study chose to implement a deep learning model. The Bi-LSTM (Bidirectional Long
Short-Term Memory) model was selected over other deep learning models because it
processes each input sequence in both time directions, which can make financial
time series prediction more accurate. In contrast to the one-directional LSTM,
which reads a sequence only from the past to the present, the Bi-LSTM reads it
in both directions: from the past to the present and from the present to the
past. This ability is particularly significant in finance, where stock prices
are influenced not only by historical data but also by recurring patterns and
recent market reaction trends. Numerous studies indicate that the Bi-LSTM
outperforms the conventional LSTM in financial time series forecasting, owing
to its capacity to acquire and integrate input bidirectionally, thereby reducing
information loss during training (Fischer & Krauss, 2018; Selvin et al., 2017).

In addition, Bao et al. (2017) showed that combining Bi-LSTM with methods such
as an attention mechanism and dropout improves the accuracy of forecasts while
keeping the model stable. Dao and Nguyen (2024) evaluated the efficacy of the
LSTM model in predicting the volatility of the VNIndex, demonstrating that this
model exhibits high accuracy and effectively captures market movements. Tran
(2024) evaluated the efficacy of the LSTM-GRU ensemble model in predicting stock
indices, revealing that this model enhances forecasting efficiency relative to
conventional methods. These results further support the efficacy of Bi-LSTM in
forecasting financial time series, particularly in the setting of intricate data
influenced by numerous external variables.

Finally, we run and train the Bidirectional LSTM (Bi-LSTM) model by fitting it to
the training data (X_train, y_train) over 20 epochs with a batch size of 32.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

# Build the Bi-LSTM model
model = Sequential()
# Add a bidirectional LSTM layer
model.add(Bidirectional(LSTM(units=50, return_sequences=True),
                        input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(LSTM(units=50, return_sequences=False))  # final LSTM layer
model.add(Dense(units=1))  # predict a single value for Close
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
# Print the model summary
model.summary()
# Train the model
history = model.fit(X_train, y_train, epochs=20, batch_size=32,
                    validation_data=(X_test, y_test))

2.3.3. Model testing results

The accuracy of the Bi-LSTM model in forecasting stock prices can be evaluated
by four principal metrics: Mean Squared Error (MSE), Root Mean Squared Error
(RMSE), Mean Absolute Error (MAE), and R-squared (R²).

● The MSE of 0.0008 is a small average squared error, indicating that the
model's predictions closely correspond with real stock values.
● The RMSE of 0.0284 is expressed in the same unit as the target variable.
Since the forecasting code stores its outputs in forecast_scaled, the target
appears to be scaled, in which case the typical error is roughly 2.84% of the
scaled price range rather than 2.84% of the price itself.
● The MAE of 0.0226 supports the model's accuracy by measuring the average
absolute deviation between projected and actual prices.
● The R-squared (R²) score of 0.9474 indicates that the model accounts for 94.74%
of the variation in stock price movements, reflecting robust predictive capability.
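
The four metrics follow from their standard definitions. The sketch below uses a hypothetical function name and toy values; it does not reproduce the report's evaluation data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "R2": 1.0 - ss_res / ss_tot}


# Toy values chosen so the arithmetic is easy to follow.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(metrics)  # MSE 0.25, RMSE 0.5, MAE 0.25, R2 0.8
```

Because the RMSE is the square root of the MSE, the reported pair (0.0008 and 0.0284) is internally consistent up to rounding.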

The results indicate that the Bi-LSTM model accurately captures stock price trends.
The code below then runs a prediction loop of five iterations, one for each of the
next five trading days. At each iteration, the Bi-LSTM model generates a
one-day-ahead stock price prediction using model.predict(current_input), the
result is appended to forecast_scaled, and the prediction is fed back into the
input window for the next step.

import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay

# Generate predictions for the next 5 business days (excluding weekends).
# current_input (the last input window) and forecast_scaled (a list) are
# assumed to be defined earlier in the code.
num_predictions = 5
predicted_dates = []
current_date = pd.Timestamp("2025-03-18")
for _ in range(num_predictions):
    # Predict the next day's stock price
    next_day_pred = model.predict(current_input)
    forecast_scaled.append(next_day_pred.flatten()[0])
    # Update input with the new prediction (the predicted value is copied
    # into every feature column of the new time step)
    next_day_reshaped = np.repeat(next_day_pred,
                                  X_train.shape[2]).reshape(1, 1, X_train.shape[2])
    current_input = np.concatenate([current_input[:, 1:, :],
                                    next_day_reshaped], axis=1)
    # Get the next business day
    current_date += BDay(1)  # Moves to the next business day
    predicted_dates.append(current_date)

Date (yyyy-mm-dd)    Predicted price (USD)
2025-03-19           197.080310
2025-03-20           196.621833
2025-03-21           196.171110
2025-03-24           195.717799
2025-03-25           195.262849

Table 1. Predicted results

2.3.4. Potential causes of stock price trends

The projected trend indicates a modest decrease in Amazon's stock price
throughout the five-day period, falling from $197.08 to $195.26, which represents
a total drop of around 0.92%. This fluctuation may be related to Amazon's recent
financial performance, macroeconomic factors, and technical indicators.
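
The size of the projected move can be checked directly from the first and last rows of Table 1. The variable names are illustrative.

```python
first_day = 197.080310  # predicted price for 2025-03-19 (Table 1)
last_day = 195.262849   # predicted price for 2025-03-25 (Table 1)

pct_change = (last_day - first_day) / first_day * 100.0
print(f"{pct_change:.2f}%")  # -0.92%
```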

Technical factors
Amazon's moving average behavior, particularly the 20-day and 50-day EMAs,
has recently demonstrated a bearish crossover, aligning with the declining trend.
Historically, Amazon's stock has responded strongly to its 50-day EMA, and past
breaches have often led to even bigger drops. Recently, after falling below the
50-day EMA at $221.40 (Barchart, 2025), the stock kept going down, following the
same trend seen in mid-2024, when a similar breach led to a further 2% drop
within two weeks (Money Morning, 2025).

Fundamental factors
The latest earnings reports and revenue predictions from Amazon have a big
influence on investor sentiment. If Amazon Web Services growth slows below its
normal yearly rate of about 30%, or if e-commerce sales drop because people are
spending less, the stock may come under even more pressure. For Q1 2025, the
company has projected revenues between $151 billion and $155.5 billion, below
analysts' expectations of $158.6 billion (Bensinger and Sophia, 2025).
Furthermore, higher operating expenses such as wages and shipping costs could
reduce the profit margin, which is another reason why prices are expected to go
down.

Macroeconomic factors
Amazon's stock is also affected by macroeconomic variables such as interest rates
and inflation patterns. In early 2025, the Federal Reserve reduced interest rates
yet anticipated only gradual further reductions due to ongoing inflation,
affecting market volatility and Amazon's performance (Hughes, Alim and Smith,
2025). Moreover, broader market patterns, such as sector rotations from
technology companies to defensive sectors, might influence Amazon's stock price.
In early 2025, concerns regarding inflation and trade policy led to a
significant sell-off in technology stocks, with Amazon experiencing considerable
declines.
Institutional activity and volume trends
Amazon's shares dropped 4.1% to $229.15 on February 7, 2025, with a trading
volume of 77.3 million, much more than its 50-day average volume of 34.9 million.
Stock performance may be impacted by these deviations, which may signal changes in
institutional purchasing or selling activity. Negative sentiment may also be strengthened
by a decline in institutional buying activity, which is indicated by reduced trading
volumes during price declines. This could indicate a decline in confidence in Amazon's
short-term potential. On the other hand, consistently high trading volumes during price
drops could indicate that investors are hesitant to "buy the dip," which would further
affect stock dynamics.

2.3.5. AI model evaluation and comparison with traditional econometric models
Several models are used for time-series forecasting of stock prices; this section
compares ARIMA and OLS, two traditional econometric models, with the Bi-LSTM.
Each model applied to Amazon's stock price exhibits distinct strengths and
weaknesses, dependent on the underlying data and market conditions.
The ARIMA model uses autoregressive (AR), differencing (I), and moving
average (MA) components to forecast stock prices. According to Tsay (2010), ARIMA
models work best when the market is stable because they assume stationary data.
In the analysis of Amazon, the ARIMA model forecasted a minor decrease of 0.11%
over a five-day period, with an MAE of 2.6177, an RMSE of 3.4479, and an R² value
of 0.9684; both error values are greater than those of the Bi-LSTM model. Because
ARIMA cannot integrate external events or technical indicators, such as AWS
growth or e-commerce expansion, its performance is limited during volatile
market conditions, including earnings statements or geopolitical changes.
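
For reference, the three components can be written in the standard ARIMA(p, d, q) form. The symbols below follow the usual textbook notation (B is the backshift operator, ε_t a white-noise error term) and are not taken from the report's code.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^{p},
\qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^{q}
```

Here (1 - B)^d applies d rounds of differencing, which is the step intended to make the price series stationary before the AR and MA terms are fitted.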
On the other hand, Ordinary Least Squares (OLS) regression analyzes the
relationship between a dependent variable (stock price) and independent variables
(such as economic indicators and technical indicators like EMA). OLS models are
not adept at managing non-linear relationships and frequently neglect volatility
or technical factors. In the case of Amazon, OLS demonstrated superior
performance compared to ARIMA, as indicated by an R² value of 0.9961, which
suggests a better fit to the data. However, the MAE of 0.9215 and RMSE of 1.2166
for OLS indicated larger errors than those observed with Bi-LSTM. Furthermore,
because OLS depends on linear relationships, it represents short-term trends and
volatility less well than Bi-LSTM, which utilizes historical price patterns and
market signals.
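
The linear structure that OLS imposes is easiest to see with a single regressor. The sketch below is a minimal closed-form fit; the function name and the sample values are hypothetical, and the report's OLS model may use several regressors.

```python
def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical regressor (for example an EMA value) and closing price.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)  # 1.0 2.0
```

Whatever the regressor does, the fitted response is always a straight line, which is the restriction that limits OLS when price dynamics are non-linear.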
In conclusion, ARIMA demonstrates effectiveness during periods of market
stability, while OLS is suitable for linear relationships and economic variables.
Conversely, Bi-LSTM is more adept at predicting Amazon’s stock price due to its
capacity to capture complex, non-linear interactions and market volatility. This
underscores the benefit of Bi-LSTM in stock forecasting, particularly where technical
indicators and market sentiment are essential.
CHAPTER 3: CONCLUSION
3.1. Findings
Predicting the stock price of Amazon (AMZN) for the next five days using a Bi-
LSTM (Bidirectional Long Short-Term Memory) deep learning model has shown
significant promise in forecasting short-term market trends. By leveraging historical
stock data from Yahoo Finance over the past five years, the model successfully identified
patterns in Amazon stock movements. Key features such as Exponential Moving
Averages (EMA_5, EMA_20), Relative Strength Index (RSI), Moving Average
Convergence Divergence (MACD), and Average True Range (ATR) were used to capture
both short-term trends and market volatility.
The accuracy of the model was evaluated using performance metrics, including Mean
Absolute Error (MAE), Root Mean Square Error (RMSE), and R² values, demonstrating
that Bi-LSTM outperforms traditional models such as ARIMA and Linear Regression in
predicting Amazon stock prices over the next 5 days. For example, the MAE is relatively
low, indicating that the Bi-LSTM model's predictions are close to the actual value, while
the R² value indicates that the model can explain a significant portion of the variation in
stock prices.
3.2. Limitations
Although the Bi-LSTM model is effective, some limitations were identified during
the analysis. A significant challenge is that the model is primarily trained using historical
stock price data and does not account for external variables such as macroeconomic
indicators, industry news, or global events (e.g., the COVID-19 pandemic or regulatory
changes) that can cause sudden stock price fluctuations. Ignoring these factors means that
the model may not be able to predict large market changes influenced by unpredictable,
often non-linear global events.
In addition, while the Bi-LSTM model is capable of capturing non-linear
relationships in the data, it still faces challenges when dealing with rare market
phenomena or sudden shocks that can cause significant price fluctuations in very short
periods of time. The model also relies on feature engineering and lag-based input data
(e.g., the previous 60 days), which limits its ability to predict price movements based on
external factors or real-time sentiment changes.
Furthermore, while the Bi-LSTM model provides accurate predictions for the
short term (5 days), its generalizability over longer time frames (e.g., weeks or months)
remains uncertain. A more robust model would need to take longer time frames and
external influences into account, potentially incorporating both historical data and real-
time input data.
3.3. Recommendations
To address these limitations and improve prediction accuracy, future research
should focus on integrating external data sources, such as sentiment analysis of financial
news, social media, or macroeconomic indicators. For example, using natural language
processing (NLP) techniques to analyze news sentiment or investor sentiment on
platforms like Twitter could provide valuable insights into how Amazon’s stock price is
affected. Additionally, including broader economic factors like GDP growth, inflation,
and interest rates would allow the model to better account for market conditions that
affect stock prices.
Furthermore, experimenting with other machine learning algorithms, such as
Support Vector Machines (SVM), Random Forest, or Gradient Boosting Machines
(GBM), could provide better performance in handling non-linear relationships or
complex data patterns that Bi-LSTM may struggle with. These methods, combined
with deep learning, could enhance the model's ability to handle diverse data and
improve prediction reliability.
For practical use, investors should combine the model’s stock predictions with
fundamental analysis and regular market monitoring. Integrating the model’s output with
major market events (such as earnings reports, product launches, and major market
moves) can improve decision-making and provide a more comprehensive approach to
stock price forecasting. Furthermore, using real-time updates and continuously training
the model with the latest data will keep predictions relevant and timely.

Finally, sentiment analysis tools can add value by showing how the market reacts
to news or events that are not immediately reflected in stock data. This
multifaceted approach can lead to more informed investment strategies and better
decision-making.

REFERENCES

Bao, W., Yue, J. and Rao, Y. (2017). A deep learning framework for financial time series
using stacked autoencoders and long-short term memory. PLOS ONE, 12(7),
p.e0180944. doi:https://doi.org/10.1371/journal.pone.0180944.

Barchart (n.d.). AMZN - Amazon.com Stock Price. [online] Barchart.com. Available at:
https://www.barchart.com/stocks/quotes/AMZN/overview.

Bensinger, G. and Sophia, D.M. (2025). Amazon shares drop as cloud growth, sales
forecast lag. Reuters. [online] 7 Feb. Available at:
https://www.reuters.com/technology/amazon-beats-quarterly-revenue-estimates-2025-02-06/.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal
of Econometrics, 31(3), pp.307–327. doi:https://doi.org/10.1016/0304-4076(86)90063-1.

Chong, E., Han, C. and Park, F.C. (2017). Deep learning networks for stock market
analysis and prediction: Methodology, data representations, and case studies. Expert
Systems with Applications, 83, pp.187–205.
doi:https://doi.org/10.1016/j.eswa.2017.04.030.

Dao, O. and Nguyen, C. (2024). Dự báo chỉ số chứng khoán bằng học máy: Bằng chứng
thực nghiệm từ thị trường chứng khoán Việt Nam [Forecasting stock indices with
machine learning: Empirical evidence from the Vietnamese stock market]. [online]
Philarchive.org. Available at: https://philarchive.org/rec/LKIDBC [Accessed 18 Mar. 2025].

Fama, E. (1970). Efficient capital markets: A review of theory and empirical work. The
Journal of Finance, 25(2), pp.383–417. doi:https://doi.org/10.2307/2325486.

Fischer, T. and Krauss, C. (2018). Deep learning with long short-term memory networks
for financial market predictions. European Journal of Operational Research, 270(2),
pp.654–669.

Gupta, R. and Chen, M. (2020). Sentiment Analysis for Stock Price Prediction. [online]
IEEE Xplore. doi:https://doi.org/10.1109/MIPR49039.2020.00051.

Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural
Computation, 9(8), pp.1735–1780. doi:https://doi.org/10.1162/neco.1997.9.8.1735.

Hughes, J., Alim, A.N. and Smith, I. (2025). Is the Federal Reserve’s preferred measure
of inflation set to fall? [online] @FinancialTimes. Available at:
https://www.ft.com/content/2a85a487-c881-4f9f-b287-87617b6673d3.

Kim, T. and Kim, H.Y. (2019). Forecasting stock prices with a feature fusion LSTM-
CNN model using different representations of the same data. PLOS ONE, [online] 14(2),
p.e0212320. doi:https://doi.org/10.1371/journal.pone.0212320.

Kuang, S. (2023). A Comparison of Linear Regression, LSTM model and ARIMA model
in Predicting Stock Price A Case Study: HSBC’s Stock Price. BCP Business &
Management, 44, pp.478–488. doi:https://doi.org/10.54691/bcpbm.v44i.4858.

Ma, Y. (2023). A Comparative Analysis of Amazon, Microsoft, and Apple’s Stock
Investment Value. Highlights in Business, Economics and Management, [online] 13(2),
pp.231–235. doi:https://doi.org/10.54097/hbem.v13i.8825.

Money Morning (2024). Home. [online] Moneymorning.com. Available at:
https://moneymorning.com/ [Accessed 18 Mar. 2025].

Nelson, D.M.Q., Pereira, A.C.M. and de Oliveira, R.A. (2017). Stock market’s price
movement prediction with LSTM neural networks. 2017 International Joint Conference
on Neural Networks (IJCNN). doi:https://doi.org/10.1109/ijcnn.2017.7966019.

Oak, O., Nazre, R., Budke, R. and Mahatekar, Y. (2024). A Novel Multivariate Bi-LSTM
model for Short-Term Equity Price Forecasting. [online] arXiv.org. Available at:
https://arxiv.org/abs/2409.14693.

Onyenahazi, O.B. and Antwi, B.O. (2024). The Role of Artificial Intelligence in
Investment Decision-Making: Opportunities and Risks for Financial Institutions.
International Journal of Research Publication and Reviews, 5(10), pp.70–85.
doi:https://doi.org/10.55248/gengpi.5.1024.2701.

Schuster, M. and Paliwal, K.K. (1997). Bidirectional recurrent neural networks. IEEE
Transactions on Signal Processing, 45(11), pp.2673–2681.
doi:https://doi.org/10.1109/78.650093.

Selvin, S., Vinayakumar, R., Gopalakrishnan, E.A., Menon, V.K. and Soman, K.P.
(2017). Stock price prediction using LSTM, RNN and CNN-sliding window model. 2017
International Conference on Advances in Computing, Communications and Informatics
(ICACCI). doi:https://doi.org/10.1109/icacci.2017.8126078.

Shynkevich, Y., McGinnity, T.M., Coleman, S.A., Belatreche, A. and Li, Y. (2017).
Forecasting price movements using technical indicators: Investigating the impact of
varying input window length. Neurocomputing, 264, pp.71–88.
doi:https://doi.org/10.1016/j.neucom.2016.11.095.

Tran, D.T. (2024). Đánh giá hiệu suất mô hình phức hợp LSTM-GRU: nghiên cứu điển
hình về dự báo chỉ số đo lường xu hướng biến động giá cổ phiếu trên sàn giao dịch chứng
khoán Hồ Chí Minh [Evaluating the performance of the LSTM-GRU hybrid model: a case
study on forecasting the index measuring stock price movement trends on the Ho Chi
Minh Stock Exchange]. CTU Journal of Science, [online] 60(1).
doi:https://doi.org/10.22144/ctujos.2023.232.

Tsay, R.S. (2010). Analysis of Financial Time Series. [online] Wiley Series in Probability
and Statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc.
doi:https://doi.org/10.1002/9780470644560.

Zhang, X. (2024). Analysis of Strategies and Performance Based on the Background of
Amazon Company. Economics, law and policy, 7(1), pp.p99–p99.
doi:https://doi.org/10.22158/elp.v7n1p99.
