
Volume 9, Issue 11, November – 2024 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

Evaluating Prediction of Stock Price using Machine Learning

Amit Kumar Yadav¹; Rohit Sharma²; Swastik Bainsla³
Manav Rachna International Institute of Research and Study

Abstract:- Predicting stock prices is an essential and unresolved problem in finance: an accurate forecast can have considerable economic consequences, yet the nature of the markets makes the task difficult. This research applies machine learning to forecasting the stock price of Google shares using the company's historical stock data for the last 20 years. The methodology comprises data collection through the yfinance API and data preprocessing, including the handling of missing values and the removal of outliers. In the feature engineering step, technical indicators, namely simple moving averages and daily returns, were added to improve the capability of the models. Three types of machine learning models (Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) networks) were built and compared on the basis of the MAE and RMSE performance measures. Of these, the LSTM model performed best because it captures temporal dependencies and non-linear trends in the data. The research thereby establishes the significance of state-of-the-art deep learning models in financial prediction while stressing the importance of data preprocessing and feature engineering. The results are informative for investors and financial analysts and can guide the development of further prediction models. Future work can complement the price data with external variables such as sentiment analysis and macroeconomic factors to improve the models.

Keywords:- Stock Prediction, Machine Learning, LSTM, Stock Price Forecasting, Feature Engineering, Financial Time Series, Yfinance.

I. INTRODUCTION

The stock market is a system of many moving parts, and its behavior is influenced by many factors, including the economic situation, political events, and market sentiment. Stock price forecasting is one of the oldest objectives of financial analysts, economists, and researchers because of its impact on reducing risk and improving returns. However, the volatility and stochastic character of stock prices remain a central problem of financial markets, and traditional forecasting techniques are not very effective at capturing stock price dynamics. Fundamental and technical analysis, which are often used for stock price prediction, cannot efficiently handle massive data and cannot fully consider the interactions between variables. Fundamental analysis assesses a firm's financial health and market characteristics, while technical analysis relies on past prices and related indicators. Although both approaches have been used in market analysis and forecasting, they tend to miss a holistic view of the data and may lose much of the information contained in noisy, high-dimensional data sets. Machine learning has become prominent in predictive modeling, and finance has not been left behind[8]. Machine learning algorithms do well where other methods do not because they mine for patterns and relationships that are not discernible by human observation, such as non-linear or temporal ones, which makes them well suited to time series forecasting of financial data. Models such as Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) networks can capture the attributes of historical stock data and provide future predictions. These methods also allow feature engineering and technical indicators to be incorporated to improve the performance of stock price prediction models[7].

This work applies machine learning to predicting the stock price of Google (GOOG). The dataset was collected from the yfinance API, covers the last 20 years, and was then preprocessed. Key technical features, comprising moving averages and daily returns, were derived to provide useful input for the models. Three models, namely Linear Regression, Random Forest, and LSTM, were analyzed for their ability to capture stock price trends[1].

The objectives of this study are twofold: first, to analyze and compare the accuracy of various machine learning models in stock price prediction; and second, to determine the advantages and disadvantages of using different models to analyze the movement of financial time series data. The results of this study add to the existing literature on machine learning in financial forecasting and provide insights that will benefit investors, analysts, and researchers[10]. Additionally, this work identifies directions for future research based on other types of external data, including sentiment analysis results and various macroeconomic factors, to increase prediction accuracy. This research seeks to fill the gap in the literature by employing machine learning algorithms to generate stock price forecasts while taking into account the complexity of modern financial markets. The findings

IJISRT24NOV1725 www.ijisrt.com 2644


in the study thus bring to the fore the usefulness of data-driven techniques in driving decision-making and managing risk, especially in an uncertain and turbulent environment such as that of the financial markets[9].

II. LITERATURE REVIEW

Stock price prediction is one of the most actively researched areas because knowledge of stock price trends can help investors make better investments and reduce their risks[11]. Traditional approaches such as statistical models and technical analysis have long been used for stock market forecasting. However, these conventional methods are limited in handling non-linear relationships and large data sets, which motivates the use of machine learning algorithms. This section reviews the development of the methodologies used in stock price prediction, pointing to the major advances as well as the open issues, especially in relation to machine learning.

A. The Success of Stock Price Prediction Through Traditional Methods
Conventional analytic techniques, including linear regression, ARIMA, and GARCH, have been the typical methods for analyzing stock prices in the past. In particular, AutoRegressive Integrated Moving Average (ARIMA) modeling is prominent in time series forecasting; it assumes a linear relationship between lagged variables[6]. However, such methods handle the non-linearity and randomness of stock price changes poorly, which leads to weak performance in volatile markets. Technical analysis, on the other hand, uses charts, price data, and volume information together with indicators such as moving averages and the RSI. Although technical analysis is effective in short-term trading, it cannot adequately account for external conditions such as news or market sentiment[2].

B. Introduction & Evolution of Machine Learning in Forecasting of Financial Statements
AI methods have changed the way analysts forecast stock prices by addressing some of the limitations of conventional practices[12]. Unlike statistical models, machine learning algorithms do not require specific patterns in the data to be presupposed. Early solutions relied on supervised learning methods such as the support vector machine (SVM) and decision trees, which proved valuable in improving the accuracy of stock price trend forecasts. Random Forest, an ensemble learning method, performs particularly well on non-linear and high-dimensional problems. In a comparative study, Patel et al. (2015) found that Random Forest outperformed other machine learning algorithms in stock price prediction.

C. New Techniques in Deep Learning Applied to Forecasting Using Time Series
With the development of deep learning, stock price prediction has been taken to the next level. LSTM is a type of recurrent neural network (RNN) that is especially useful in time series analysis and is one of the models used in this paper. Unlike standard RNNs, LSTMs are good at capturing temporal dependencies and sequential patterns, and they also mitigate the vanishing gradient problem. The use of LSTM networks was exemplified by Fischer and Krauss (2018), who established that LSTM significantly outperforms traditional models in predicting stock market indices. Along similar lines, Zhong and Enke (2019) noted that combining LSTM with sentiment analysis could improve stock price forecasting by adding news data and social media sentiment.

D. Feature Engineering and Data Enrichment
Feature engineering is important for increasing a predictive model's accuracy. Exponential moving averages (EMA), Bollinger Bands, and momentum oscillators are popular inputs to machine learning algorithms because they represent technical aspects of the price signal. Chen et al. (2020) demonstrated that integrating technical analysis indicators increased the accuracy of stock price prediction models by a large margin.

Aside from technical indicators, other sources such as macroeconomic variables, news feeds, and geopolitical events have been studied for enriching data sets. Sentiment analysis of financial news and social media platforms has been found to be a worthwhile method for capturing market sentiment. Bollen et al. (2011) showed that the state of the stock market can be forecast from sentiment on Twitter.

E. The Hindrances to Accuracy in Machine Learning Stock Prediction
Nevertheless, applying machine learning methods to stock price prediction comes with challenges. One problem is overfitting, which is prevalent in complex models such as deep neural networks. An overfitted model works well on the training data but poorly on the test data. Methods such as cross-validation, regularization, and dropout can be used to deal with this problem.

Another problem is noise in financial data sets, which makes it difficult to find clear patterns. Outlier removal and data normalization are the most important preprocessing steps for increasing the stability of the models. In addition, data availability and quality remain key concerns in the construction of sound models.

F. Present Scenario and Future Prospects
Recent developments in machine learning have been directed toward ensemble and hybrid approaches, in which several algorithms are combined to take advantage of the strengths of each. For example, hybrid models combining ARIMA with LSTM have been shown to overcome the limitations of linear models when modelling time-series data. Likewise, attention mechanisms in deep learning have received much interest because they learn to focus on the useful features in sequential data[4].

Another new trend is the use of explainable AI (XAI) in financial forecasting, since machine learning models are often called "black boxes". XAI brings interpretability and transparency to financial applications, thereby making them more trustworthy and easier to use.

III. METHODOLOGY

The methodology of this research aims to predict stock prices using machine learning techniques, incorporating data collection, preprocessing, feature engineering, model implementation, and evaluation stages. Each of these steps is crucial to building an effective prediction model, and the process is outlined systematically. This approach leverages historical stock data along with technical indicators, employing both traditional machine learning models (Linear Regression, Random Forest) and deep learning models (Long Short-Term Memory networks, or LSTMs) to forecast stock price movements. Below is a detailed description of each step, including mathematical formulations to clarify the methodologies used.

A. Data Collection
The first step involves the collection of historical stock data, which serves as the basis for prediction. In this study, the data was retrieved from the yfinance API. The dataset includes daily stock prices for Google (GOOG) over a period of 20 years. The stock data consists of the following attributes at each time step t:

• Open: The price at market opening.
• High: The highest price reached during the trading day.
• Low: The lowest price reached during the trading day.
• Close: The price at market closing.
• Volume: The number of shares traded.

Let the time series data be represented as:

$$S = \{S_1, S_2, \ldots, S_T\}$$

where $S_t$ represents the closing stock price at time $t$, and $T$ is the total number of time steps in the dataset.

B. Data Preprocessing
Data preprocessing is a critical step in preparing the raw data for machine learning models. This includes handling missing values, outlier detection, and normalization. Each of these substeps plays a role in ensuring the data is clean and ready for analysis.

• Handling Missing Values
Missing values are often present in financial datasets due to various factors such as data collection errors or market holidays. In this study, missing values are imputed using forward fill, where each missing value is replaced by the most recent available value:

$$S_t = S_{t-1} \quad \text{if } S_t \text{ is missing}$$

This imputation method ensures that the dataset remains continuous and usable for training machine learning models.

• Outlier Removal
Outliers in stock price data can significantly distort predictions. For this reason, outliers are removed using a simple z-score thresholding method. Any data point whose absolute z-score exceeds a certain threshold (e.g., 3) is considered an outlier and is removed from the dataset.

• Normalization
Normalization ensures that the input features are on a similar scale, which helps machine learning algorithms converge faster and reduces the impact of large feature values on model performance. Min-Max scaling is used to normalize the stock prices and technical indicators into the range [0,1]:

$$S'_t = \frac{S_t - S_{\min}}{S_{\max} - S_{\min}}$$

where $S_{\min}$ and $S_{\max}$ represent the minimum and maximum values of the stock price over the entire dataset, respectively.

C. Feature Engineering
Feature engineering involves creating additional input features that improve the model's predictive capabilities. In this study, technical indicators are derived from the stock price data to better capture market trends and behaviors. These features are then added to the input feature set $X_t$.

• Moving Average (MA)
A moving average (MA) is used to smooth out short-term fluctuations in stock prices and highlight longer-term trends. The simple moving average at time $t$ with a window size of $n$ is computed as:

$$\mathrm{MA}_t = \frac{1}{n} \sum_{i=0}^{n-1} S_{t-i}$$

where $S_t$ is the stock price at time $t$, and $n$ is the number of periods (e.g., a 50-day or 200-day moving average).

• Daily Return
The daily return, which measures the percentage change in the stock price from one day to the next, is calculated as:

$$R_t = \frac{S_t - S_{t-1}}{S_{t-1}}$$

This feature captures the rate of change in the stock price and is commonly used in financial modeling to assess market momentum.

These technical indicators are combined with the raw stock price data to create the final feature set $X_t$, which contains the raw price attributes together with $\mathrm{MA}_t$ and $R_t$.

D. Machine Learning Models
This study evaluates three different machine learning models: Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) networks. These models are chosen based on their ability to capture both linear and non-linear relationships in the stock price data.

• Linear Regression
Linear Regression assumes a linear relationship between the input features $X_t$ and the target stock price $S_t$. The model is formulated as:

$$S_t = \beta_0 + \sum_{i=1}^{p} \beta_i X_{i,t} + \epsilon$$

Where:
• $\beta_0$ is the intercept,
• $\beta_i$ are the coefficients of the features $X_{i,t}$, with $p$ the number of features,
• $\epsilon$ is the error term (residual).

Linear regression is a simple model that provides interpretable coefficients but may not capture complex patterns in stock price data.

• Random Forest
Random Forest is an ensemble learning method based on decision trees. Each tree makes a prediction, and the final prediction is the average of the individual tree predictions:

$$\hat{S}_t = \frac{1}{k} \sum_{j=1}^{k} f_j(X_t)$$

where $f_j$ represents the $j$-th decision tree in the forest, and $k$ is the total number of trees.

Random Forests are capable of capturing complex, non-linear relationships and are relatively robust to overfitting, making them suitable for stock price prediction.

• Long Short-Term Memory (LSTM) Networks
LSTM networks are a type of recurrent neural network (RNN) designed to capture temporal dependencies in time-series data. The LSTM cell updates its states through a series of gates:

• Forget Gate:

$$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$$

• Input Gate:

$$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i)$$

• Cell State Update:

$$\tilde{C}_t = \tanh(W_C [h_{t-1}, X_t] + b_C), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

• Output Gate:

$$o_t = \sigma(W_o [h_{t-1}, X_t] + b_o)$$

• Hidden State Update:

$$h_t = o_t \odot \tanh(C_t)$$

where $\sigma$ is the sigmoid activation function, and $\odot$ is the element-wise multiplication. In this standard formulation, $W$ and $b$ denote the weight matrix and bias vector of each gate, and $[h_{t-1}, X_t]$ is the concatenation of the previous hidden state with the current input.

LSTM networks are particularly effective in capturing long-term dependencies in time-series data, making them suitable for predicting stock prices based on past behavior.

E. Model Evaluation
To assess the performance of the models, we use the following evaluation metrics, where $S_t$ is the actual value, $\hat{S}_t$ the predicted value, $\bar{S}$ the mean of the actual values, and $N$ the number of evaluated samples:

• Mean Absolute Error (MAE):

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| S_t - \hat{S}_t \right|$$

This metric computes the average absolute error between predicted and actual values, providing a clear measure of prediction accuracy.

• Root Mean Squared Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( S_t - \hat{S}_t \right)^2}$$

RMSE penalizes larger errors more heavily, making it useful for detecting large deviations in predictions.

• R-Squared ($R^2$):

$$R^2 = 1 - \frac{\sum_{t=1}^{N} \left( S_t - \hat{S}_t \right)^2}{\sum_{t=1}^{N} \left( S_t - \bar{S} \right)^2}$$

R-squared indicates the proportion of the variance in the dependent variable (stock price) that is predictable from the independent variables (features).
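The three evaluation metrics can be illustrated with a minimal sketch in plain Python. The function names and the sample values below are hypothetical and chosen only for illustration; they are not results from this study, and the paper does not state which implementation was used.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (not data from the paper).
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]

print(mae(actual, predicted))        # 1.25
print(rmse(actual, predicted))       # sqrt(1.75), about 1.3229
print(r_squared(actual, predicted))  # 0.5
```

Because RMSE squares each error before averaging, the single two-point miss in this example raises RMSE above MAE, which is the behaviour the text describes.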

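The preprocessing and feature engineering steps of Sections III.B and III.C can likewise be sketched with the Python standard library. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, missing values are assumed to be represented as `None`, the z-score is assumed to use the population standard deviation of the price levels, and the sample prices are invented.

```python
import statistics


def forward_fill(prices):
    """Replace each missing value (None) with the most recent available value."""
    filled, last = [], None
    for price in prices:
        if price is not None:
            last = price
        filled.append(last)  # remains None only if the series starts with a gap
    return filled


def remove_outliers(values, threshold=3.0):
    """Drop points whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) / std <= threshold]


def min_max_scale(values):
    """Scale values into [0, 1] using the minimum and maximum of the series."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def simple_moving_average(values, n):
    """Mean of the last n values; None until n observations are available."""
    return [
        None if t + 1 < n else sum(values[t - n + 1 : t + 1]) / n
        for t in range(len(values))
    ]


def daily_returns(values):
    """Relative change from the previous value; the first entry has no predecessor."""
    changes = [(values[t] - values[t - 1]) / values[t - 1] for t in range(1, len(values))]
    return [None] + changes


# Invented closing prices with two gaps (not data from the paper).
raw = [100.0, None, 102.0, 104.0, None, 103.0]
closes = forward_fill(raw)
clean = remove_outliers(closes)
scaled = min_max_scale(clean)
sma3 = simple_moving_average(clean, 3)
returns = daily_returns(clean)
```

One design point worth noting: with a threshold of 3, a very short series can never contain a point that exceeds the threshold, so the filter only has an effect on longer samples such as the multi-year series described in Section III.A.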
IV. RESULTS

Fig 1: Closing Price of Google Data

Fig 2: Open of Google Data

Fig 3: High of Google Data

Fig 4: Low of Google Data

Fig 5: Close of Google Data

Fig 6: Adj Close of Google Data

Fig 7: Volume of Google Data

Fig 8: MA_for_days of Google Data

Fig 9: MA_for_250_days of Google Data

Fig 10: MA_for_100_days of Google Data

Fig 11: MA of Google Data

Fig 12: Percentage_Change of Google Data

Fig 13: Test data of Google Data

Fig 14: Whole data of Google Data

V. CONCLUSION

In this study, we have outlined a detailed framework for building machine learning models for stock price prediction. Using historical stock data and indicators derived from technical analysis, we illustrated how data preprocessing, feature engineering, and model selection are essential for increasing the predictive performance of the algorithms. We used Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) models to capture both linear and non-linear relationships in the stock prices[5].

In the first step, we preprocessed the data by imputing missing values and normalizing the data set. Feature engineering then added technical indicators, namely moving averages and daily returns, which improved the representation of market behavior.

The processed data was used to train the machine learning models, and their performance was tested using statistical measures such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared ($R^2$). These evaluations assisted in identifying the best architecture for this stock price prediction task.

The evaluation outcomes show that, compared with traditional models including Linear Regression and Random Forest, deep learning models such as LSTM, which can model temporal dependencies, achieve better performance in forecasting future stock prices. However, each model had its own advantages, with Random Forest offering a useful method for modelling both linear and non-linear relationships.

REFERENCES

[1]. Mehtab, S., Sen, J., Dutta, A. (2021). Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models. In: Thampi, S.M., Piramuthu, S., Li, KC., Berretti, S., Wozniak, M., Singh, D. (eds) Machine Learning and Metaheuristics Algorithms, and Applications. SoMMA 2020. Communications in Computer and Information Science, vol 1366. Springer, Singapore.
[2]. Sen J, Chaudhuri TD. Stock price prediction using machine learning and deep learning frameworks. In Proceedings of the 6th International Conference on Business Analytics and Intelligence, Bangalore, India 2018 Dec 20 (pp. 20-22).
[3]. Soni, P., Tewari, Y., & Krishnan, D. (2022). Machine learning approaches in stock price prediction: a systematic review. In Journal of Physics: Conference Series (Vol. 2161, No. 1, p. 012065). IOP Publishing.
[4]. Jeevan, B., Naresh, E. and Kambli, P., 2018, October. Share price prediction using machine learning technique. In 2018 3rd International Conference on Circuits, control, communication and computing (i4c) (pp. 1-4). IEEE.
[5]. Mokalled, W. E. H. M., & Jaber, M. (2019, September). Automated stock price prediction using machine learning. In Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019) (pp. 16-24).
[6]. Shahi TB, Shrestha A, Neupane A, Guo W. Stock price forecasting with deep learning: A comparative study. Mathematics. 2020 Aug 27;8(9):1441.
[7]. Milosevic N. Equity forecast: Predicting long term stock price movement using machine learning. arXiv preprint arXiv:1603.00751. 2016 Mar 2.
[8]. Tsai CF, Wang SP. Stock price forecasting by hybrid machine learning techniques. In Proceedings of the international multiconference of engineers and computer scientists 2009 Mar 18 (Vol. 1, No. 755, p. 60).
[9]. Emioma CC, Edeki SO. Stock price prediction using machine learning on least-squares linear regression basis. In Journal of Physics: Conference Series 2021 (Vol. 1734, No. 1, p. 012058). IOP Publishing.
[10]. Vijh M, Chandola D, Tikkiwal VA, Kumar A. Stock closing price prediction using machine learning techniques. Procedia computer science. 2020 Jan 1;167:599-606.
[11]. Chen J, Wen Y, Nanehkaran YA, Suzauddola MD, Chen W, Zhang D. Machine learning techniques for stock price prediction and graphic signal recognition. Engineering Applications of Artificial Intelligence. 2023 May 1;121:106038.
[12]. Sonkavde G, Dharrao DS, Bongale AM, Deokate ST, Doreswamy D, Bhat SK. Forecasting stock market prices using machine learning and deep learning models: A systematic review, performance analysis and discussion of implications. International Journal of Financial Studies. 2023 Jul 26;11(3):94.
[13]. Habib, Honey, Gautam Siddharth Kashyap, Nazia Tabassum, and Nafis Tabrez. "Stock price prediction using artificial intelligence based on LSTM–deep learning model." In Artificial Intelligence & Blockchain in Cyber Physical Systems, pp. 93-99. CRC Press, 2023.

[14]. Abe M, Nakagawa K. Cross-sectional stock price prediction using deep learning for actual investment management. In Proceedings of the 2020 Asia Service Sciences and Software Engineering Conference 2020 May 13 (pp. 9-15).
[15]. Cho CH, Lee GY, Tsai YL, Lan KC. Toward stock price prediction using deep learning. In Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing Companion 2019 Dec 2 (pp. 133-135).
[16]. Kumari J, Sharma V, Chauhan S. Prediction of stock price using machine learning techniques: A survey. In 2021 3rd International conference on advances in computing, communication control and networking (ICAC3N) 2021 Dec 17 (pp. 281-284). IEEE.
