Evaluating Prediction of Stock Price using Machine Learning
ISSN No:-2456-2165
Abstract:- Forecasting stock prices is an essential and unresolved problem in finance: an accurate forecast can have considerable economic consequences, and the nature of the markets makes the task difficult. This research applies machine learning to forecasting the stock price of Google shares using the company's historical stock data for the last 20 years. The data was collected with the yfinance API and preprocessed by handling missing values and removing outliers. For feature engineering, technical indicators including simple moving averages and daily returns were added to improve the capability of the models. Three types of machine learning models (Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) networks) were built and compared on the MAE and RMSE performance measures. Of these, the LSTM model performed best because it captures temporal dependencies and non-linear trends in the data. In so doing, this research establishes the significance of state-of-the-art deep learning models in financial prediction while stressing the importance of data preparation and feature engineering. The results are informative for investors and financial analysts, as well as for the development of further prediction models. Future work can complement the price data with external variables such as sentiment analysis and macroeconomic factors to improve the models.

Keywords:- Stock Prediction, Machine Learning, LSTM, Stock Price Forecasting, Feature Engineering, Financial Time Series, Yfinance.

I. INTRODUCTION

The stock market is made up of many moving parts, and its behavior is influenced by the economic situation, political events, and market sentiment, among other factors. Stock price forecasting is one of the oldest objectives of financial analysts, economists, and researchers because of its impact on risk reduction and returns. However, the volatility and stochastic character of stock prices is a central problem of financial markets, and traditional forecasting techniques are not very effective given stock price dynamics. The problem is that fundamental and technical analysis, which are often used for stock price prediction, cannot efficiently handle massive data sets and cannot fully consider the interactions between variables. Fundamental analysis assesses a firm's financial health and market characteristics, while technical analysis works from past prices and related indicators. Although both approaches have been used in market analysis and forecasting, they tend to lack a holistic view of the data and may lose much of the information in noisy, high-dimensional data sets. Machine learning has become prominent in predictive modeling, and finance is no exception[8]. Machine learning algorithms do well where other methods do not because they mine for patterns and relationships that are not discernible by human observation, such as non-linear or temporal ones, which makes them well suited to time series forecasting of financial data. Models such as Linear Regression, Random Forest, and Long Short-Term Memory (LSTM) networks can capture the attributes of historical stock data and provide future predictions. These methods can also incorporate feature engineering and technical indicators to improve the performance of models used to predict stock prices[7].

This work applies machine learning to stock price prediction for Google (GOOG). The dataset was collected from the yfinance API and covers the last 20 years, after which preprocessing was applied. Key technical features, comprising moving averages and daily returns, were developed to provide useful input for the models. Three models, namely Linear Regression, Random Forest, and LSTM, were analyzed for their ability to capture the stock price trend[1].

The objectives of this study are twofold: first, to analyze and compare the accuracy of various machine learning models in stock price prediction; and second, to determine the advantages and disadvantages of using different models to analyze the movement of financial time series data. The results of this study add to the existing literature on using machine learning in financial forecasting and provide insights that will benefit investors, analysts, and researchers[10]. Additionally, this work identifies directions for future research based on other types of external data, including sentiment analysis results and various macroeconomic factors, to increase prediction accuracy. This research seeks to fill the gap in the literature by employing machine learning algorithms to generate stock price forecasts while taking into account the complexity of modern financial markets. The findings
in the study thus bring to the fore the usefulness of data-driven techniques in decision-making and risk management, especially in an uncertain and turbulent environment such as the financial markets[9].

II. LITERATURE REVIEW

Stock price prediction is one of the most actively researched areas, because knowledge of stock price trends can help investors make better investments and decrease their risks[11]. Traditional approaches such as statistical models and technical analysis have long been used for stock market forecasting. However, these conventional methods are limited in handling nonlinear relationships and large data sets, which motivates the use of machine learning algorithms. This section reviews the development of the methodologies used in stock price prediction, pointing to the major advances as well as the open issues, especially in relation to machine learning.

A. The Success of Stock Price Prediction Through Traditional Methods
Conventional analytic techniques including linear regression, ARIMA and GARCH have been the typical methods for analyzing stock prices in the past. Specifically, AutoRegressive Integrated Moving Average (ARIMA) modeling is prominent in time series forecasting; it rests on the hypothesis that there is a linear relationship between lagged variables[6]. However, such methods handle the non-linearity and randomness of stock price changes poorly, resulting in poor performance in volatile markets. Technical analysis, on the other hand, uses charts, price data, and volume information together with indicators such as moving averages and RSI. Although efficient for short-term trading, technical analysis has little capacity to factor in external conditions such as news or market sentiment[2].

B. Introduction & Evolution of Machine Learning in Forecasting of Financial Statements
AI methods have changed the way analysts forecast stock prices by addressing some of the limitations of conventional practices[12]. Unlike statistical models, which must presuppose specific patterns in the data, machine learning algorithms learn them from the data. Early solutions used supervised learning methods such as the support vector machine (SVM) and decision trees, which proved valuable in improving the accuracy of stock price trend forecasts. Random Forest, an ensemble learning method, performs particularly well with non-linearity and high dimensionality. Patel et al. (2015) compared several machine learning algorithms for stock price prediction and found that Random Forest outperformed the others.

C. New Techniques in Deep Learning Applied to Forecasting Using Time Series
Since the development of deep learning, stock price prediction has been taken to the next level. LSTM is a type of RNN that is especially useful in time series analysis and is the deep learning model used in this paper. Unlike ordinary RNNs, LSTMs capture temporal dependencies and sequential patterns well, and they also mitigate the vanishing gradient problem. Fischer and Krauss (2018) applied LSTM networks and established that they significantly outperform traditional models in stock market index prediction. On similar grounds, Zhong and Enke (2019) noted that combining LSTM with sentiment analysis would improve stock price forecasting by adding news data and social media sentiment.

D. Feature Engineering and Data Enrichment
Feature engineering is important for increasing a predictive model's accuracy. EMA, Bollinger Bands, and momentum oscillators are popular inputs to machine learning algorithms because they represent technical aspects of the financial signal. Chen et al. (2020) demonstrated that integrating technical analysis (TA) indicators increased the accuracy of stock price prediction models by a large margin.

Aside from technical indicators, other sources such as macroeconomic variables, news feeds, and geopolitical events have been studied for enriching data sets. Sentiment analysis of financial news and social media platforms has been found to be a useful way to capture market sentiment. In another study, Bollen et al. (2011) showed that it is possible to forecast the state of the stock market from sentiment on Twitter.

E. The Hindrances to Accuracy in Machine Learning Stock Prediction
Nevertheless, applying machine learning methods to stock price prediction comes with several challenges. The first is overfitting, which is prevalent in complex models such as deep neural networks. It results in a model that works well on the training data but poorly on the test data. Methods like cross-validation, regularization, and dropout can be used to deal with this problem.

The second is noise in financial data sets, which makes it difficult to find clear patterns. Outlier removal and data normalization are the most important preprocessing steps and should be studied further to increase the stability of the proposed models. Finally, data availability and quality remain key concerns in the construction of sound models.
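Cross-validation, named in this review as one remedy for overfitting, needs some care with time series: each test window should come after the data used for training. The following is a minimal sketch of such a walk-forward split in plain Python. The function name `walk_forward_splits` and its parameters are hypothetical and are not taken from the study.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in time order.

    Every fold trains on all observations that precede its test window,
    so no future value leaks into training.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_samples")
    test_size = (n_samples - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(n_folds):
        test_start = min_train + fold * test_size
        test_end = test_start + test_size
        yield list(range(test_start)), list(range(test_start, test_end))


# Illustrative call on 10 time steps (hypothetical sizes).
for train_idx, test_idx in walk_forward_splits(n_samples=10, n_folds=3, min_train=4):
    print(len(train_idx), test_idx)
```

A model whose error is much lower on the training indices than on the later test indices in most folds is likely overfitting.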
F. Present Scenario and Future Prospects
Recent developments in machine learning have been directed toward ensemble approaches, in which several algorithms are combined to take advantage of the strengths of each. For example, hybrid models combining ARIMA with LSTM have been shown to overcome the limitations of linear models when modelling time-series data. Likewise, attention mechanisms in deep learning have received much attention because they learn to focus on the useful features in sequential data[4].

III. METHODOLOGY

A. Data Collection
The first step involves the collection of historical stock data, which serves as the basis for prediction. In this study, the data was retrieved from the yfinance API. The dataset includes daily stock prices for the Google (GOOG) stock over a period of 20 years. The stock data consists of the following attributes at each time step t:

Open: The price at market opening.
High: The highest price reached during the trading day.
Low: The lowest price reached during the trading day.
Close: The price at market closing.
Volume: The number of shares traded.

Let the time series data be represented as:
$$X_t = (\text{Open}_t, \text{High}_t, \text{Low}_t, \text{Close}_t, \text{Volume}_t)$$

B. Data Preprocessing
Data preprocessing consists of several substeps, described below. Each of these substeps plays a role in ensuring the data is clean and ready for analysis.

Handling Missing Values
Missing values are often present in financial datasets due to various factors such as data collection errors or market holidays. In this study, missing values are imputed using forward-fill interpolation, where each missing value is replaced by the most recent available value:
$$S_t = S_{t-1} \quad \text{if } S_t \text{ is missing}$$

Normalization
The stock prices are then scaled with min-max normalization:
$$S'_t = \frac{S_t - S_{\min}}{S_{\max} - S_{\min}}$$
Where S_min and S_max represent the minimum and maximum values of the stock price over the entire dataset, respectively.

C. Feature Engineering
Feature engineering involves creating additional input features that improve the model's predictive capabilities. In this study, technical indicators are derived from the stock price data to better capture market trends and behaviors. These features are then added to the input feature set X_t.
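The preprocessing steps and the two indicators used in this study (simple moving average and daily return) can be sketched in plain Python so that the arithmetic is visible. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, the sample prices are made up, and no yfinance call is made so that the example runs offline.

```python
from typing import List, Optional


def forward_fill(prices: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the most recent available value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for price in prices:
        if price is not None:
            last = price
        filled.append(last)  # remains None only before the first observation
    return filled


def min_max_scale(prices: List[float]) -> List[float]:
    """Scale prices to [0, 1] using the minimum and maximum of the whole series."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        return [0.0 for _ in prices]  # constant series: avoid division by zero
    return [(p - lo) / (hi - lo) for p in prices]


def simple_moving_average(prices: List[float], n: int) -> List[Optional[float]]:
    """n-period average ending at each step; None until n values are available."""
    out: List[Optional[float]] = []
    window_sum = 0.0
    for t, price in enumerate(prices):
        window_sum += price
        if t >= n:
            window_sum -= prices[t - n]  # drop the value that left the window
        out.append(window_sum / n if t >= n - 1 else None)
    return out


def daily_returns(prices: List[float]) -> List[Optional[float]]:
    """Fractional change from the previous day; None for the first day.

    Assumes strictly positive prices, so the division is always defined.
    """
    out: List[Optional[float]] = [None]
    for prev, cur in zip(prices, prices[1:]):
        out.append((cur - prev) / prev)
    return out


# Made-up closing prices with two gaps (illustrative only).
closes = [100.0, None, 102.0, 101.0, None, 104.0]
filled = forward_fill(closes)
print(filled)
print(min_max_scale(filled))
print(simple_moving_average(filled, n=3))
print(daily_returns(filled))
```

With pandas, the same steps are usually written with `ffill()`, `rolling(n).mean()`, and `pct_change()`. Note that scaling with the minimum and maximum of the entire dataset, as described in the text, uses information from the test period; fitting the scaler on the training portion only is a common alternative.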
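Simple Moving Average
The simple moving average over the last n periods is calculated as:
$$\text{SMA}_t = \frac{1}{n} \sum_{i=0}^{n-1} S_{t-i}$$
Where S_t is the stock price at time t, and n is the number of periods (e.g., 50-day or 200-day moving average).

Daily Return
The daily return, which measures the percentage change in the stock price from one day to the next, is calculated as:
$$R_t = \frac{S_t - S_{t-1}}{S_{t-1}}$$

D. Model Selection

Linear Regression
Linear regression models the stock price as a linear combination of the input features:
$$y_t = \beta_0 + \sum_{i=1}^{m} \beta_i X_{i,t} + \epsilon$$
Where:
y_t is the stock price at time t and m is the number of features,
β_0 is the intercept,
β_i are the coefficients of the features X_{i,t},
ϵ is the error term (residual).

Linear regression is a simple model that provides interpretable coefficients but may not capture complex patterns in stock price data.

Random Forest
Random Forest is an ensemble learning method based on decision trees. Each tree makes a prediction, and the final prediction is the average of the individual tree predictions:
$$\hat{y}_t = \frac{1}{k} \sum_{j=1}^{k} f_j(X_t)$$
Where f_j represents the j-th decision tree in the forest, and k is the total number of trees.

Random Forests are capable of capturing complex, non-linear relationships and are relatively robust to overfitting, making them suitable for stock price prediction.

Long Short-Term Memory (LSTM) Networks
LSTM networks are a type of recurrent neural network (RNN) designed to capture temporal dependencies in time-series data. The LSTM cell updates its states through a series of gates, given here in the standard formulation:
Forget Gate:
$$f_t = \sigma(W_f [h_{t-1}, X_t] + b_f)$$
Input Gate:
$$i_t = \sigma(W_i [h_{t-1}, X_t] + b_i), \quad \tilde{C}_t = \tanh(W_C [h_{t-1}, X_t] + b_C)$$
Cell State Update:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Output Gate:
$$o_t = \sigma(W_o [h_{t-1}, X_t] + b_o), \quad h_t = o_t \odot \tanh(C_t)$$
Where σ is the sigmoid activation function, ⊙ is element-wise multiplication, h_t and C_t are the hidden and cell states, and W and b are the learned weight matrices and bias vectors.

LSTM networks are particularly effective in capturing long-term dependencies in time-series data, making them suitable for predicting stock prices based on past behavior.

E. Model Evaluation
To assess the performance of the models, we use the following evaluation metrics:

Mean Absolute Error (MAE):
$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$
Where y_i is the actual value, ŷ_i is the predicted value, and N is the number of observations. This metric computes the average absolute error between predicted and actual values, providing a clear measure of prediction accuracy.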
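Root Mean Squared Error (RMSE):
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
Where y_i is the actual value, ŷ_i is the predicted value, and N is the number of observations. RMSE penalizes larger errors more heavily, making it useful for detecting large deviations in predictions.

R-squared (R²):
$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
Where ȳ is the mean of the actual values. R-squared indicates the proportion of the variance in the dependent variable (stock price) that is predictable from the independent variables (features).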
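As a minimal sketch, the three evaluation metrics can be computed in plain Python from their standard definitions. The function names and the sample values are illustrative assumptions and do not come from the study's experiments.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the absolute differences between actual and predicted values."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the average squared difference; large errors weigh more."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 minus the ratio of residual to total sum of squares."""
    _check(actual, predicted)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when the actual values are constant")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 11.0]
print(mean_absolute_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
print(r_squared(actual, predicted))
```

In this example the single error of 2 raises RMSE above MAE, which illustrates why RMSE is the more sensitive of the two to large deviations.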
IV. RESULTS
V. CONCLUSION

In this study, we have outlined a detailed framework for building machine learning models for stock price prediction. Using historical stock data and indicators derived from technical analysis, we illustrated how data preprocessing, feature engineering, and model selection are essential for increasing the predictive performance of the algorithms. We used Linear Regression, Random Forest and Long Short-Term Memory (LSTM) models to capture both linear and non-linear relationships in the stock prices[5].

In the first step, we preprocessed the data by imputing missing values and normalizing the data set. Feature engineering added technical indicators, namely moving averages and daily returns, which strengthened the representation of market behavior.

This processed data was used to train the machine learning models, and their performance was tested using statistical measures such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and R-squared (R²). These evaluations helped identify the best architecture for this stock price prediction task.

The evaluation outcomes show that, compared with traditional models including Linear Regression and Random Forest, deep learning models such as LSTM, which can model temporal dependency, achieve better performance in future stock price forecasting. However, each model had its own advantages, with Random Forest offering a useful method for modeling both linear and non-linear relationships.

REFERENCES

[1]. Mehtab, S., Sen, J., Dutta, A. (2021). Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models. In: Thampi, S.M., Piramuthu, S., Li, KC., Berretti, S., Wozniak, M., Singh, D. (eds) Machine Learning and Metaheuristics Algorithms, and Applications. SoMMA 2020. Communications in Computer and Information Science, vol 1366. Springer, Singapore.
[2]. Sen J, Chaudhuri TD. Stock price prediction using machine learning and deep learning frameworks. In Proceedings of the 6th International Conference on Business Analytics and Intelligence, Bangalore, India 2018 Dec 20 (pp. 20-22).
[3]. Soni, P., Tewari, Y., & Krishnan, D. (2022). Machine learning approaches in stock price prediction: a systematic review. In Journal of Physics: Conference Series (Vol. 2161, No. 1, p. 012065). IOP Publishing.
[4]. Jeevan, B., Naresh, E. and Kambli, P., 2018, October. Share price prediction using machine learning technique. In 2018 3rd International Conference on Circuits, control, communication and computing (i4c) (pp. 1-4). IEEE.
[5]. Mokalled, W. E. H. M., & Jaber, M. (2019, September). Automated stock price prediction using machine learning. In Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019) (pp. 16-24).
[6]. Shahi TB, Shrestha A, Neupane A, Guo W. Stock price forecasting with deep learning: A comparative study. Mathematics. 2020 Aug 27;8(9):1441.
[7]. Milosevic N. Equity forecast: Predicting long term stock price movement using machine learning. arXiv preprint arXiv:1603.00751. 2016 Mar 2.
[8]. Tsai CF, Wang SP. Stock price forecasting by hybrid machine learning techniques. In Proceedings of the international multiconference of engineers and computer scientists 2009 Mar 18 (Vol. 1, No. 755, p. 60).
[9]. Emioma CC, Edeki SO. Stock price prediction using machine learning on least-squares linear regression basis. In Journal of Physics: Conference Series 2021 (Vol. 1734, No. 1, p. 012058). IOP Publishing.
[10]. Vijh M, Chandola D, Tikkiwal VA, Kumar A. Stock closing price prediction using machine learning techniques. Procedia computer science. 2020 Jan 1;167:599-606.
[11]. Chen J, Wen Y, Nanehkaran YA, Suzauddola MD, Chen W, Zhang D. Machine learning techniques for stock price prediction and graphic signal recognition. Engineering Applications of Artificial Intelligence. 2023 May 1;121:106038.
[12]. Sonkavde G, Dharrao DS, Bongale AM, Deokate ST, Doreswamy D, Bhat SK. Forecasting stock market prices using machine learning and deep learning models: A systematic review, performance analysis and discussion of implications. International Journal of Financial Studies. 2023 Jul 26;11(3):94.
[13]. Habib, Honey, Gautam Siddharth Kashyap, Nazia Tabassum, and Nafis Tabrez. "Stock price prediction using artificial intelligence based on LSTM–deep learning model." In Artificial Intelligence & Blockchain in Cyber Physical Systems, pp. 93-99. CRC Press, 2023.