Stock Price Prediction Using Machine Learning
Subject Specific Project report
Acknowledgement
I would like to express my deepest gratitude to our ML teacher, Dr. Satya Ranjan Pattanaik,
for her continued guidance and constant support throughout the project work, which made it
possible to complete it well. I would also like to thank all the teachers of the Computer
Science Department for their guidance in completing this project work.
I take this opportunity to convey my heartfelt thanks to our respected Principal, Dr.
Trilochan Sahoo; our esteemed Dean (Academics), Dr. R. N. Panda; and our Second Year
coordinator, Mohapatra Girashree Sahu, who gave me the opportunity to do this wonderful
project and made it a learning journey throughout.
I am thankful to my parents and friends, without whom this work could not have been so
successfully completed.
Signature of Student
Jyotipriya Panda
Certificate
This is to certify that the Subject Specific Project entitled
“Stock Price Prediction Using Machine Learning” has
been carried out by Jyotipriya Panda (2407432009) and
completed under my guidance, and that the project meets the
academic requirement of the subject “Machine
Learning”.
ABSTRACT
The prediction of stock market prices is one of the most challenging tasks in the field of financial
data analysis due to the inherently volatile and non-linear nature of the market. The stock market
is influenced by a multitude of factors, including economic indicators, political events, investor
sentiment, and global financial trends, all of which contribute to the complexity of modeling price
movements. With the advent of advanced machine learning techniques and the availability of vast
historical financial data, it has become increasingly feasible to construct models that can analyze
patterns in stock price movements and provide reasonably accurate predictions. This project
focuses on developing a stock price prediction system using machine learning algorithms to
forecast future prices of selected stocks.
The primary objective of this project is to explore and evaluate various machine learning models
and techniques for predicting the closing prices of stocks based on historical data. The project
employs a systematic approach that begins with the collection of historical stock market data,
followed by data cleaning, preprocessing, feature selection, and model training and evaluation.
Popular regression algorithms such as Linear Regression, Decision Tree Regressor, Random Forest
Regressor, and Support Vector Machine (SVM), as well as deep learning models like Long Short-
Term Memory (LSTM) networks, are analyzed and compared to determine their effectiveness in
capturing trends and making accurate forecasts.
Data preprocessing, which includes handling missing values, normalization, and transforming the time
series data into a supervised learning problem, plays a critical role in model performance. Technical
indicators such as moving averages, the relative strength index (RSI), and the exponential moving
average (EMA) are also integrated to enrich the feature set and improve model accuracy.
Models are evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error
(MSE), and Root Mean Squared Error (RMSE) to assess the predictive performance and reliability
of each approach.
Contents
Certificate
ABSTRACT
1 Introduction
1.1 Background
1.2 Problem Statement
1.3 Objective
1.4 Scope of the Project
2 Literature Survey
2.1 Existing System
2.2 Proposed System
2.3 Advantages of the Proposed System
3 System Analysis
3.1 Requirements Analysis
3.2 Feasibility Study
4 System Design
4.1 SYSTEM REQUIREMENTS
4.1.2 Software Requirements
4.2 ARCHITECTURE
4.3 MODULE DESCRIPTION
5 Code
6 Result and Discussion
6.1 Result & Discussion
7 Conclusion
8 Reference
1 Introduction
1.1 Background
The stock market has long been a cornerstone of the global economy, serving as a vital
platform for companies to raise capital and for investors to generate wealth. Stock price
movements, however, are highly volatile and influenced by a wide range of factors
including economic performance, political stability, global events, interest rates, corporate
announcements, investor sentiment, and even social media trends. This dynamic and
often unpredictable nature of financial markets makes stock price prediction an
immensely challenging task. Traditionally, investors and financial analysts have relied on
fundamental and technical analysis to make informed decisions. However, with the
advent of modern computational tools and access to vast amounts of historical data, new
methods such as machine learning have emerged as powerful alternatives for modeling
and predicting stock market behavior.
In recent years, the integration of machine learning (ML) techniques into financial
forecasting has gained significant attention due to their ability to discover complex
patterns and relationships within data that traditional statistical methods may fail to
capture. Machine learning algorithms can adaptively learn from data, identify trends, and
make data-driven predictions without being explicitly programmed for the task. This
adaptability makes them particularly suitable for stock price forecasting, where historical
trends and patterns are often indicative of future movements, albeit with a degree of
uncertainty.
1.3 Objective
The main objective of this project is to develop a machine learning model that can
predict future stock prices based on historical stock data. Specific objectives
include:
• To collect and preprocess historical stock market data from reliable sources.
• To identify and engineer relevant features that influence stock price
movement.
• To explore and implement multiple machine learning algorithms including
regression models and deep learning techniques such as LSTM.
The prediction models will primarily utilize historical stock prices along with
engineered features derived from technical indicators. Although fundamental
data and sentiment analysis are also relevant to stock price movements, they are
beyond the current scope of this project and are proposed as areas for future
enhancement.
The project also emphasizes comparative analysis, wherein multiple machine
learning algorithms will be implemented and evaluated to identify strengths,
weaknesses, and optimal use cases for each. Visualization tools such as Matplotlib
and Seaborn will be used for data analysis and presentation of results. Python
programming language, along with libraries like Scikit-learn, TensorFlow, and
Keras, will form the core technological stack.
2 Literature Survey
2.1 Existing System
Traditionally, stock price prediction has been approached using two major
methodologies: Fundamental Analysis and Technical Analysis.
• Technical Analysis uses historical market data such as
prices and volumes. Analysts use chart patterns and technical indicators
like Moving Averages, RSI (Relative Strength Index), and Bollinger Bands
to predict future price movements. Though widely used, technical analysis
assumes that historical patterns will repeat themselves, which does not
always hold true in volatile markets.
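To make the indicators named above concrete, here is a minimal sketch that computes them from a pandas Series of closing prices. The function name add_technical_indicators, the 14-day default window, and the simple-average form of RSI (Wilder's original formulation uses exponential smoothing) are illustrative assumptions, not the project's exact implementation.

```python
import pandas as pd

def add_technical_indicators(close: pd.Series, window: int = 14) -> pd.DataFrame:
    """Return SMA, EMA, RSI and Bollinger Bands computed from closing prices."""
    out = pd.DataFrame({"Close": close})
    # Moving averages over `window` trading days
    out["SMA"] = close.rolling(window).mean()
    out["EMA"] = close.ewm(span=window, adjust=False).mean()
    # RSI, simple-average variant: 100 - 100 / (1 + avg_gain / avg_loss)
    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(window).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["RSI"] = 100 - 100 / (1 + avg_gain / avg_loss)
    # Bollinger Bands: SMA plus/minus two rolling standard deviations
    band = 2 * close.rolling(window).std()
    out["BB_upper"] = out["SMA"] + band
    out["BB_lower"] = out["SMA"] - band
    return out

# Made-up closing prices, used only to exercise the function
prices = pd.Series([10, 11, 12, 11, 13, 14, 13, 15, 16, 15, 17, 18, 17, 19, 20, 21], dtype=float)
ind = add_technical_indicators(prices, window=5)
print(ind.round(2).tail())
```

The first window - 1 rows are NaN because each rolling statistic needs a full window of history; such rows are usually dropped before training.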
These traditional methods suffer from limitations such as an inability to model
nonlinear patterns in the data, subjectivity in interpretation, and a reliance on
human expertise.
With the rise of computing power and big data, statistical methods such as ARIMA
(AutoRegressive Integrated Moving Average) and GARCH (Generalized
Autoregressive Conditional Heteroskedasticity) have also been applied for time-
series forecasting. However, they struggle with high-dimensional data and non-
stationary trends in real-world stock prices.
2.2 Proposed System
The proposed system applies machine learning to historical stock data in the following stages:
• Data Collection: Gathering historical stock prices from sources like Yahoo
Finance or Alpha Vantage.
• Preprocessing: Cleaning the data, handling missing values, and normalizing
the dataset.
• Feature Engineering: Extracting useful features such as moving averages,
MACD, and volume-based indicators.
• Modeling: Implementing and evaluating multiple machine learning models,
including:
▪ Linear Regression
▪ Support Vector Regressor
▪ Decision Tree Regressor
▪ Random Forest Regressor
▪ Long Short-Term Memory (LSTM) network
• Evaluation: Using metrics like MAE, MSE, and RMSE to assess model
performance.
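The MACD and volume-based features listed under Feature Engineering could be derived roughly as follows. This sketch assumes a DataFrame with Close and Volume columns, uses the common 12/26/9 MACD spans, and takes On-Balance Volume (OBV) as one example of a volume-based indicator; the helper name add_macd_and_obv is hypothetical.

```python
import numpy as np
import pandas as pd

def add_macd_and_obv(df: pd.DataFrame) -> pd.DataFrame:
    """Add MACD (12/26/9 convention) and On-Balance Volume to a price frame.

    Assumes `df` has 'Close' and 'Volume' columns (Yahoo Finance naming).
    """
    out = df.copy()
    ema_fast = out["Close"].ewm(span=12, adjust=False).mean()
    ema_slow = out["Close"].ewm(span=26, adjust=False).mean()
    out["MACD"] = ema_fast - ema_slow
    out["MACD_signal"] = out["MACD"].ewm(span=9, adjust=False).mean()
    # OBV: add volume on up days, subtract it on down days, ignore flat days
    direction = np.sign(out["Close"].diff()).fillna(0)
    out["OBV"] = (direction * out["Volume"]).cumsum()
    return out

# Made-up five-day price and volume data
frame = pd.DataFrame({"Close": [10.0, 11.0, 10.5, 10.5, 12.0],
                      "Volume": [100, 200, 150, 120, 300]})
result = add_macd_and_obv(frame)
print(result[["MACD", "MACD_signal", "OBV"]])
```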
The machine learning models are trained on historical data and optimized using
techniques such as cross-validation and grid search. Deep learning models like
LSTM are used to exploit temporal dependencies in the data, providing an edge
over shallow models.
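For time-series data, ordinary shuffled k-fold cross-validation would let a model train on prices that come after the ones it is validated on. A minimal sketch of grid search combined with scikit-learn's TimeSeriesSplit is shown below; the synthetic random-walk prices, the 5-day lag window, the helper make_lag_features, and the parameter grid are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

def make_lag_features(prices: np.ndarray, n_lags: int):
    """Use the previous `n_lags` prices as features to predict the next price."""
    X = np.array([prices[i:i + n_lags] for i in range(len(prices) - n_lags)])
    y = prices[n_lags:]
    return X, y

# Synthetic random-walk prices stand in for real data (assumption)
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))
X, y = make_lag_features(prices, n_lags=5)

# Each validation fold lies strictly after its training fold, so there is no look-ahead
cv = TimeSeriesSplit(n_splits=4)
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=cv, scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_, "CV MAE:", -search.best_score_)
```

scikit-learn maximizes scores, so the MAE is passed as its negative and flipped back when printed.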
3 System Analysis
3.1 Requirements Analysis
Before developing the stock price prediction system, it is crucial to identify and
analyze both the functional and non-functional requirements of the project to
ensure a successful and efficient implementation.
3.1.1 Functional Requirements
• Data Acquisition: The system must be able to fetch or accept historical stock
data from reliable sources such as Yahoo Finance, Alpha Vantage, or CSV files.
• Data Preprocessing: The system must clean the raw data by handling
missing values, outliers, and formatting inconsistencies.
• Model Training: The system must support training using multiple machine
learning algorithms like Linear Regression, Decision Trees, Random Forest,
and LSTM.
• Prediction: Based on the trained model, the system must be able to predict
the next day's or future stock prices.
• Evaluation: The system must evaluate model performance using metrics
such as MAE, MSE, and RMSE.
• Visualization: The system should graphically represent trends, actual vs.
predicted values, and evaluation results.
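As an illustration of the Data Acquisition and Data Preprocessing requirements, the following sketch loads daily prices from a CSV source and applies basic cleaning. The column names follow the Yahoo Finance CSV layout and, like the helper load_and_clean and the forward-fill policy, are assumptions rather than the project's actual code.

```python
import io
import pandas as pd

def load_and_clean(csv_source) -> pd.DataFrame:
    """Load daily OHLCV data from a CSV and apply basic cleaning.

    Assumes Yahoo-Finance-style columns: Date, Open, High, Low, Close, Volume.
    """
    df = pd.read_csv(csv_source, parse_dates=["Date"])
    # Remove duplicated trading days and put rows in chronological order
    df = df.drop_duplicates(subset="Date").sort_values("Date").set_index("Date")
    # Coerce price/volume columns to numbers; unparseable text becomes NaN
    df = df.apply(pd.to_numeric, errors="coerce")
    # Fill gaps from the previous trading day, then drop any leading NaNs
    return df.ffill().dropna()

# In-memory CSV with an unsorted row, a duplicate day and a malformed price
sample = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2024-01-03,101,103,100,102,1000\n"
    "2024-01-02,100,102,99,101,900\n"
    "2024-01-04,102,104,101,bad,1100\n"
    "2024-01-04,102,104,101,bad,1100\n"
)
clean = load_and_clean(sample)
print(clean)
```

Forward-filling is one simple policy for missing prices; dropping those days instead is equally valid and avoids inventing values.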
4 System Design
The system design phase translates the functional requirements of the stock price
prediction model into a blueprint for implementation. This phase involves
defining the system architecture, data flow, and modeling the components of the
application using UML diagrams. Good design ensures the system is modular,
scalable, and maintainable.
4.1 SYSTEM REQUIREMENTS
• RAM: 8 GB
• PROCESSOR: 2.4 GHz
• IDE: ANACONDA
• OPERATING SYSTEM: WINDOWS 10
4.2 ARCHITECTURE
Fig Architecture Design
1. Data Preprocessing
2. Feature selection
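Null entries in the dataset are removed using df = df.dropna(), where df is the data frame.
Attributes read as text (Date, High, Low, Close, Adj Close) are converted into numeric form. The
price attributes are continuous values, so they should be parsed as numbers; a Label Encoder is only
appropriate for categorical attributes, because it would replace prices with arbitrary integer codes.
The Date attribute is split into new attributes (such as the month) that can be used as features
for the model.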
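A short sketch of the conversion steps described above, assuming a DataFrame with a text Date column and a text Close column: null rows are dropped, prices are parsed with pd.to_numeric, and the date is split into calendar attributes. The helper add_date_features and the chosen attributes (year, month, day of week) are hypothetical.

```python
import pandas as pd

def add_date_features(df: pd.DataFrame) -> pd.DataFrame:
    """Replace a text 'Date' column with numeric calendar attributes."""
    out = df.copy()
    dates = pd.to_datetime(out["Date"])
    out["Year"] = dates.dt.year
    out["Month"] = dates.dt.month
    out["DayOfWeek"] = dates.dt.dayofweek  # Monday = 0 ... Sunday = 6
    return out.drop(columns="Date")

raw = pd.DataFrame({"Date": ["2024-01-02", "2024-02-15", None],
                    "Close": ["101.5", "99.0", "100.0"]})
# Drop null rows (as in df = df.dropna()), then parse prices as numbers, not label codes
raw = raw.dropna().assign(Close=lambda d: pd.to_numeric(d["Close"]))
feats = add_date_features(raw)
print(feats)
```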
Feature selection is then performed to choose the attributes used to build the model. The
attributes considered for feature selection are Date, Price, Adj Close, and Forecast, together with
the month derived from the date.
After feature selection, the selected attributes are used for training. The dataset is divided into
training and test pairs (xtrain, ytrain) and (xtest, ytest). The model classes are imported from
sklearn, and each model is built using model.fit(xtrain, ytrain). This phase involves supervised
regression methods such as linear regression and ensemble regressors (for example, AdaBoost and
Random Forest).
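The split-and-fit procedure can be sketched with scikit-learn as follows. The synthetic trending prices, the three lagged closes used as features, and the 80/20 chronological split are assumptions for illustration; LinearRegression, RandomForestRegressor, and AdaBoostRegressor are the regression counterparts of the methods named above.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Synthetic upward-trending prices (illustrative assumption, not the project's data)
rng = np.random.default_rng(42)
prices = 50 + 0.1 * np.arange(400) + rng.normal(0, 1, 400)

# Features: the previous 3 closes; target: the next close
n_lags = 3
X = np.column_stack([prices[i:len(prices) - n_lags + i] for i in range(n_lags)])
y = prices[n_lags:]

# Chronological split: the test period comes strictly after the training period
split = int(0.8 * len(X))
xtrain, xtest, ytrain, ytest = X[:split], X[split:], y[:split], y[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(xtrain, ytrain)
    results[name] = mean_absolute_error(ytest, model.predict(xtest))
    print(f"{name}: MAE = {results[name]:.3f}")
```

Because tree ensembles predict combinations of training targets, they cannot output prices above the highest value seen in training, so on a steadily rising series like this one the linear model is expected to achieve the lower MAE.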
A notable feature of Python is its use of indentation to structure source statements, which makes
the code easier to read. Python offers dynamic typing, ready-made classes, and interfaces to many
system calls and libraries. It can be extended using C or C++. Python can be used as a scripting
language in Microsoft's Active Server Pages (ASP) technology. The scoreboard system for the
Melbourne (Australia) Cricket Ground is written in Python. Zope (Z Object Publishing Environment),
a popular web application server, is also written in Python.
4.4.2 Python Library
Machine Learning, as the name suggests, is the science of programming a computer so that it can
learn from different kinds of data. A more general definition given by Arthur Samuel is: “Machine
Learning is the field of study that gives computers the ability to learn without being explicitly
programmed.” ML techniques are typically used to solve various kinds of real-life problems.
In the past, people performed Machine Learning tasks by manually coding all the algorithms and
mathematical and statistical formulas. This made the process time-consuming, tedious, and
inefficient. Today, it has become much easier and more efficient thanks to various Python
libraries, frameworks, and modules. Python is now one of the most popular programming languages
for this task and has replaced many languages in the industry; one of the reasons is its vast
collection of libraries. The Python libraries used in Machine Learning include:
o NumPy
o SciPy
o Scikit-learn
o Theano
o TensorFlow
o Keras
o PyTorch
o Pandas
o Matplotlib
4.4.2.1 NumPy
NumPy is a very popular python library for large multi- dimensional array and
matrix processing, with the help of a large collection of high- level mathematical
functions. It is very useful for fundamental scientific computations in Machine
Learning. It is particularly useful for linear algebra, Fourier transform, and
random number capabilities. High- end libraries like TensorFlow uses NumPy
internally for manipulation of Tensors.
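As a small example of NumPy's vectorized array processing, daily simple and log returns can be computed from an array of closing prices without an explicit loop. The prices below are made up for illustration.

```python
import numpy as np

close = np.array([100.0, 102.0, 101.0, 105.0])
# Simple returns: (P_t - P_{t-1}) / P_{t-1}, computed for all days at once
simple_returns = np.diff(close) / close[:-1]
# Log returns: they add up across consecutive days
log_returns = np.diff(np.log(close))
print(simple_returns.round(4), log_returns.round(4))
```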
4.4.2.2 SciPy:
SciPy is a very popular library among Machine Learning enthusiasts as it contains different
modules for optimization, linear algebra, integration, and statistics. There is a difference
between the SciPy library and the SciPy stack: the SciPy library is one of the core packages that
make up the SciPy stack. SciPy is also very useful for image manipulation.
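One way SciPy can support data cleaning is outlier screening with scipy.stats.zscore. In this sketch the trading volumes are invented, and the threshold of 2 is an assumption chosen because the sample is tiny; a threshold of 3 is more common on larger datasets.

```python
import numpy as np
from scipy import stats

# Made-up daily volumes with one obvious spike
volume = np.array([1000, 1050, 980, 1020, 990, 1010, 9000, 1005])
z = stats.zscore(volume)                # (x - mean) / std for each value
outliers = np.where(np.abs(z) > 2)[0]   # indices whose |z| exceeds the threshold
print(outliers)
```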
4.4.2.3 Scikit-learn:
Scikit-learn is one of the most popular ML libraries for classical ML algorithms. It is built on
top of two basic Python libraries, NumPy and SciPy. Scikit-learn supports most of the supervised
and unsupervised learning algorithms. It can also be used for data mining and data analysis, which
makes it a great tool for anyone starting out with ML.
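A typical scikit-learn pattern relevant to this project is scaling with MinMaxScaler. The sketch below, using made-up prices, fits the scaler on the training portion only and reuses it for the test portion, so no information from later prices leaks into training; test values can then fall outside the 0 to 1 range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

close = np.array([[10.0], [12.0], [11.0], [14.0], [20.0], [18.0]])
train, test = close[:4], close[4:]            # chronological split
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train)    # learn min/max from training data only
test_scaled = scaler.transform(test)          # reuse them; values may exceed 1
restored = scaler.inverse_transform(test_scaled)
print(train_scaled.ravel(), test_scaled.ravel())
```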
4.4.2.4 Theano:
Machine Learning relies heavily on mathematics and statistics. Theano is a popular Python library
used to define, evaluate, and optimize mathematical expressions involving multi-dimensional arrays
efficiently. It achieves this by optimizing the use of the CPU and GPU. It performs extensive
unit-testing and self-verification to detect and diagnose many types of errors. Theano is a
powerful library that has been used in large-scale, computationally intensive scientific projects
for a long time, yet it is simple and approachable enough for individuals to use in their own
projects.
4.4.2.5 TensorFlow:
TensorFlow is a very popular open-source library for high-performance numerical computation
developed by the Google Brain team at Google. As the name suggests, TensorFlow is a framework that
involves defining and running computations involving tensors. It can train and run deep neural
networks that can be used to develop several AI applications. TensorFlow is widely used in deep
learning research and applications.
4.4.2.6 Keras:
Keras is a very popular Machine Learning library for Python. It is a high-level neural networks
API capable of running on top of TensorFlow, CNTK, or Theano. It can run seamlessly on both CPU
and GPU. Keras makes it really easy for ML beginners to build and design a neural network. One of
the best things about Keras is that it allows for easy and fast prototyping.
4.4.2.7 PyTorch:
PyTorch is a popular open-source Machine Learning library for Python based on Torch, an
open-source Machine Learning library implemented in C with a wrapper in Lua. It has an extensive
choice of tools and libraries that support Computer Vision, Natural Language Processing (NLP), and
many other ML applications. It allows developers to perform computations on tensors with GPU
acceleration and also helps in creating computational graphs.
4.4.2.8 Pandas:
Pandas is a popular Python library for data analysis. It is not directly related to Machine
Learning, but since a dataset must be prepared before training, Pandas comes in handy, as it was
developed specifically for data extraction and preparation. It provides high-level data structures
and a wide variety of tools for data analysis, including many built-in methods for grouping,
combining, and filtering data.
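The grouping and filtering mentioned above can be illustrated on a small made-up price table indexed by business days: resample groups daily closes into weekly values, and a boolean mask keeps only the days on which the price rose.

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=10, freq="B")  # 10 business days
prices = pd.DataFrame({"Close": [100, 101, 103, 102, 104, 106, 105, 107, 108, 110]},
                      index=dates)
weekly = prices["Close"].resample("W").last()    # group daily rows into calendar weeks
up_days = prices[prices["Close"].diff() > 0]     # filter rows where the price rose
print(weekly)
print(up_days)
```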
4.4.2.9 Matplotlib:
Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not
directly related to Machine Learning. It is particularly handy when a programmer wants to visualize
patterns in the data. It is a 2D plotting library used for creating 2D graphs and plots. A module
named pyplot makes plotting easy, as it provides features to control line styles, font properties,
axis formatting, and more. It provides various kinds of graphs and plots for data visualization,
such as histograms, error charts, and bar charts.
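A sketch of an actual-vs-predicted line chart with pyplot is shown below. The series are synthetic (the "prediction" is simply the actual curve plus noise), and the Agg backend and in-memory PNG buffer are choices made so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np

# Illustrative series only: "predicted" is the actual curve plus random noise
days = np.arange(30)
actual = 100 + np.cumsum(np.sin(days / 3))
predicted = actual + np.random.default_rng(1).normal(0, 0.5, days.size)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, actual, label="Actual")
ax.plot(days, predicted, linestyle="--", label="Predicted")
ax.set_xlabel("Trading day")
ax.set_ylabel("Closing price")
ax.set_title("Actual vs. predicted closing price")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)  # a file path such as "plot.png" also works
plt.close(fig)
```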
5 Code
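import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
def create_dataset(series, time_step):  # sliding window: `time_step` past closes -> next close
    X = np.array([series[i:i + time_step, 0] for i in range(len(series) - time_step)])
    y = series[time_step:, 0]
    return X, y
# 2. Data preprocessing (`data` is the price DataFrame loaded in step 1)
print("Preprocessing data...")
df = data[['Close']]
scaler = MinMaxScaler(feature_range=(0, 1))
df_scaled = scaler.fit_transform(df)
time_step = 60
X, y = create_dataset(df_scaled, time_step)
X = X.reshape(-1, time_step, 1)  # LSTM input shape: (samples, timesteps, features)
split = int(len(X) * 0.8)  # assumed chronological 80/20 split, no shuffling
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]
model = keras.Sequential([keras.Input(shape=(time_step, 1)), keras.layers.LSTM(50), keras.layers.Dense(1)])  # assumed architecture
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_test, y_test), verbose=1)
# 6. Make predictions and convert them back to price units
y_pred = scaler.inverse_transform(model.predict(X_test))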
6 Result and Discussion
The results and discussion section presents the outcomes of the machine learning models
implemented for stock price prediction. It includes an in-depth analysis of model
performance, comparison among different algorithms, and an interpretation of their
predictive capabilities. The results are evaluated using both quantitative metrics and
visualizations.
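After preprocessing the dataset and engineering relevant features, several machine
learning models were trained and tested. The models implemented included:
• Linear Regression
• Support Vector Regressor
• Decision Tree Regressor
• Random Forest Regressor
• Long Short-Term Memory (LSTM) network
The models were evaluated using the following metrics:
• Mean Absolute Error (MAE): The average absolute difference between predicted and
actual prices.
• Mean Squared Error (MSE): The average of the squared errors; squaring penalizes large
deviations.
• Root Mean Squared Error (RMSE): The square root of MSE; easier to interpret because it is
in the same unit as the data.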
Model                       MAE     MSE     RMSE
Linear Regression           6.21    58.45   7.64
Support Vector Regressor    5.84    54.01   7.35
Decision Tree Regressor     4.32    36.78   6.06
Random Forest Regressor     3.78    28.56   5.34
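The three metrics in the table can be computed as in the sketch below, which uses four made-up actual and predicted prices and cross-checks the manual formulas against scikit-learn's mean_absolute_error and mean_squared_error. It also shows why RMSE is the square root of MSE, consistent with the table (for example, the square root of 28.56 is about 5.34).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100.0, 102.0, 101.0, 105.0])
y_pred = np.array([98.0, 103.0, 101.0, 108.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))   # average absolute deviation
mse = np.mean(errors ** 2)      # squaring penalizes large misses more heavily
rmse = np.sqrt(mse)             # back in the same unit as the prices
print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f}")  # MAE=1.50 MSE=3.50 RMSE=1.87
```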
From the results, LSTM outperforms all other models, achieving the lowest MAE, MSE, and
RMSE. This can be attributed to its ability to learn long-term dependencies in time-series
data, which makes it well suited to stock price prediction.
To better understand the performance, visual comparisons between actual and predicted
stock prices were made using line charts. The following patterns were observed:
• Linear Regression tends to underfit the data and cannot capture the nonlinear
trends, resulting in wider prediction gaps.
• Decision Tree and Random Forest models show better alignment with actual prices
but sometimes exhibit sharp changes due to overfitting on training data.
• LSTM predictions closely follow the actual trend, with minimal error in most time
windows. It performs particularly well in capturing momentum and volatility in the
price series.
Example Visualization:
• A line graph showing actual vs. predicted stock prices for a specific period revealed
that:
o Random Forest predictions closely tracked the actual prices with occasional
deviation.
o LSTM predictions were smoother and consistently aligned with real trends,
especially near volatile market points.
• Trade-Offs in Simplicity vs. Accuracy: While simpler models like Linear Regression
are easier to implement and interpret, they fall short in performance. More complex
models like Random Forest and LSTM offer better accuracy at the cost of increased
training time and complexity.
6.4 Limitations
• Market Volatility: Sudden market crashes or spikes caused by news events cannot
be predicted accurately by historical data-based models.
• Data Dependency: The accuracy of predictions heavily depends on the quality and
quantity of input data.
• Overfitting Risks: Complex models like LSTM may overfit if not properly regularized
or validated.
6.5 Summary
In conclusion, the experimental results affirm the potential of machine learning, especially
deep learning, in forecasting stock prices with reasonable accuracy. Among the tested
models, LSTM proved to be the most effective, making it suitable for real-world
applications. The findings underscore the importance of model selection, feature
engineering, and data quality in developing reliable stock prediction systems.
7 Conclusion
In this project, we have explored and implemented various machine learning algorithms
to predict stock prices based on historical data. The primary aim was to evaluate the
effectiveness of these algorithms in forecasting future stock prices, a task known for its
complexity due to the highly dynamic and non-linear nature of financial markets. Through
systematic experimentation, the project has demonstrated that machine learning—
particularly advanced techniques like Long Short-Term Memory (LSTM) networks—can
serve as a powerful tool for financial forecasting when used appropriately.
The process began with extensive data collection and preprocessing, which formed the
foundation for accurate and meaningful predictions. This was followed by the application
of several models including Linear Regression, Support Vector Regressor, Decision Trees,
Random Forest, and LSTM. Performance evaluation using metrics like MAE, MSE, and
RMSE revealed that LSTM significantly outperforms other models by effectively capturing
temporal dependencies and complex patterns in time-series data.
• Machine learning models can achieve substantial accuracy when trained on well-
preprocessed stock market data.
However, it is also clear that no model can guarantee precise predictions in all market
conditions. Stock prices are influenced by numerous unpredictable external factors such
as geopolitical events, economic news, and investor sentiment, which may not be fully
captured by historical data alone.
This project serves as a solid foundation for building intelligent financial systems that can
assist investors and analysts in making data-driven decisions. While the current model
offers promising results, further improvements can be made by incorporating real-time
data streams, sentiment analysis from news and social media, and hybrid approaches
combining multiple models for ensemble predictions.
In conclusion, this project not only validates the application of machine learning in stock
price prediction but also provides a scalable framework for future enhancement. It
reflects the growing potential of AI in transforming the finance industry and sets the stage
for more robust, real-time, and intelligent trading systems in the future.
8 Reference
• Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network
model. Neurocomputing, 50, 159–175.
• Patel, J., Shah, S., Thakkar, P., & Kotecha, K. (2015). Predicting stock and stock price
index movement using Trend Deterministic Data Preparation and machine learning
techniques. Expert Systems with Applications, 42(1), 259–268.
• Chollet, F. (2015). Keras: The Python Deep Learning library. Retrieved from
https://keras.io/