
STOCK MARKET ANALYSIS

AND PREDICTION

A PROJECT REPORT
Submitted by
KANAGARAJ P (17BIT008)
BOOPATHY K (17BIT028)
PRASANTH G (17BIT201)

In partial fulfillment for the award of the degree


of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

KUMARAGURU COLLEGE OF TECHNOLOGY
COIMBATORE-641 049


(An Autonomous Institution Affiliated to Anna University, Chennai)

May 2021

KUMARAGURU COLLEGE OF TECHNOLOGY
COIMBATORE-641 049
(An Autonomous Institution Affiliated to Anna University, Chennai)

BONAFIDE CERTIFICATE
Certified that this project report “Stock Market Analysis and Prediction” is
the bonafide work of “Kanagaraj P (17BIT008), Boopathy K (17BIT028),
Prasanth G (17BIT201)”, who carried out the project work under my
supervision.

SIGNATURE
Dr. M. Alamelu
Head of the Department
Associate Professor
Information Technology

SIGNATURE
Dr. P. C. Thirumal
Supervisor
Associate Professor
Information Technology

The candidates with university register numbers 17BIT008, 17BIT028 and 17BIT201 were
examined in the Project Viva-Voce Examination held on ……………….

INTERNAL EXAMINER                                 EXTERNAL EXAMINER

DECLARATION

We affirm that the project work titled “Stock Market Analysis and Prediction”,
being submitted in partial fulfillment for the award of B.Tech. Information
Technology, is original work carried out by us. It has not formed part of any
other project work submitted for the award of any degree or diploma, either in
this or any other university.

Kanagaraj P (17BIT008)

Boopathy K (17BIT028)

Prasanth G (17BIT201)

I certify that the declaration made above by the candidates is true.

ACKNOWLEDGEMENT

We express our profound gratitude to the management of Kumaraguru College
of Technology for providing the infrastructure that enabled us to
successfully complete the project.

We extend our gratitude to our Principal, Dr. D. Saravanan, for providing us the
necessary facilities to pursue the project.

We would like to acknowledge Dr. M. Alamelu, Professor and Head,
Department of Information Technology, for her support and encouragement
throughout this project.

We thank our project coordinator and guide, Dr. P. C. Thirumal, Associate
Professor, Department of Information Technology, for his constant and
continuous effort, guidance and valuable time.

Our sincere and hearty thanks to the staff members of the Department of
Information Technology of Kumaraguru College of Technology for their good
wishes and the timely help and support rendered to us during our project. We
are greatly indebted to our family, relatives and friends, without whom our
lives would not have been shaped to this level.

-Kanagaraj

-Boopathy

-Prasanth

TABLE OF CONTENTS

CHAPTER TITLE PAGE NO


1. INTRODUCTION 6
1.1 DESCRIPTION 6

2. TECHNOLOGIES AND TOOLS 7

3. ALGORITHMS 8
3.1 SVM (Support Vector Machine for Regression) 8
3.2 LSTM (Long Short-Term Memory ) 8
3.3 ARIMA (Auto Regressive Integrated Moving Average) 8
3.4 RANDOM FOREST 8
3.5 LINEAR REGRESSION 8

4. SYSTEM DESIGN & WORKFLOW 9

5. EXPERIMENTS / PROOF OF CONCEPT EVALUATION 10


5.1 DATASET 10
5.2 DATA PRE-PROCESSING 10

6. RESULTS 11
6.1 GRAPH 11

6.2 EVALUATION 12
6.3 RMSE COMPARISON 13
7. DISCUSSION 14
7.1 DECISIONS MADE 14
7.2 DIFFICULTIES FACED 14
7.3 THINGS THAT WORKED AND DIDN'T WORK WELL 14
8. CONCLUSION 15
9. FUTURE SCOPE 16
REFERENCES 16
10. APPENDIX 17

CHAPTER-1
INTRODUCTION
1.1 DESCRIPTION

Stock market forecasting is a way to predict the future prices of stocks. It has
long been an attractive topic for researchers and investors. Stock prices change
dynamically from day to day, so it is hard to decide the best time to buy and sell
stocks. Machine learning provides a wide range of algorithms that have been
reported to be quite effective at predicting future stock prices.
In this project, we explored different data mining algorithms to forecast prices on
the NSE (National Stock Exchange) market. Our goal is to compare various algorithms
and evaluate the models by comparing their prediction accuracy. We examined several
models, including Linear Regression, ARIMA, LSTM, Random Forest and Support Vector
Regression. Based on the accuracy of each model, measured using RMSE, we predicted
prices for different industries.
For forecasting, we used historical data from the NSE stock market and applied a few
pre-processing methods to make the predictions more accurate and relevant.

CHAPTER-2
TECHNOLOGIES & TOOLS
Language and libraries: Python, SciPy, NumPy, Pandas, scikit-learn, Keras.
Keras is required to implement the LSTM model; the other libraries are required
to process the data and implement the machine learning algorithms. Pandas made
data pre-processing relatively easy.
Tool: Google Colab, which is convenient to use and very fast.

CHAPTER-3
ALGORITHMS
3.1 SVM (Support Vector Machine for Regression):
SVM is considered one of the most important breakthroughs in the machine
learning field and can be applied to both classification and regression. In this
project, SVR (Support Vector Regression) is used to solve the regression problem,
as it avoids the difficulties of relying on purely linear functions.
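A minimal sketch of fitting an SVR model on lagged closing prices with scikit-learn is given below; the kernel and hyperparameters are illustrative assumptions, not the exact configuration used in the project.

import numpy as np
from sklearn.svm import SVR

def fit_svr(prices, lag=1):
    # prices: 1-D numpy array of closing prices, in date order
    # pair each day's close with the close `lag` days earlier
    X = prices[:-lag].reshape(-1, 1)   # past values as features
    y = prices[lag:]                   # values to be predicted
    model = SVR(kernel='rbf', C=1.0, epsilon=0.01)
    model.fit(X, y)
    return model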
3.2 LSTM (Long Short-Term Memory):
LSTM is a recurrent neural network (RNN) architecture that learns from values
observed over time intervals. It keeps track of past values and uses their changes
to predict future values. In our project we have a stock value for each day, which
can be treated as a sequence of values. Because of its ability to act as a memory
unit, LSTM can be treated as one of the best algorithms for time-series analysis
problems.
For example, if Y is the present value and X is the value one day in the past,
LSTM learns the link between X and Y to predict future values.
X     Y
22    35
35    48
48    52
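The create_dataset helper in the Appendix builds exactly these (X, Y) pairs; a simplified sketch of the idea, assuming the values arrive as a plain list or array, is:

import numpy as np

def make_lag_pairs(values, lag=1):
    # pair each value with the value `lag` days later, e.g. 22 -> 35, 35 -> 48, 48 -> 52
    values = np.asarray(values, dtype='float32')
    X = values[:-lag]    # past values
    Y = values[lag:]     # present values to be predicted
    return X.reshape(-1, 1), Y

X, Y = make_lag_pairs([22, 35, 48, 52], lag=1)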

3.3 ARIMA (Auto Regressive Integrated Moving Average):
The ARIMA model works well for modelling time series with trend characteristics,
random walk processes, and seasonal and non-seasonal behaviour. It has a simple
structure that lets us model the characteristics of our time-series dataset properly.
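A minimal sketch of fitting an ARIMA model on a daily closing-price series with statsmodels is shown below; the (p, d, q) order is an illustrative assumption, not the order tuned in the project.

from statsmodels.tsa.arima.model import ARIMA

def fit_arima(daily_close, order=(5, 1, 0)):
    # daily_close: a pandas Series of daily closing prices, in date order
    model = ARIMA(daily_close, order=order)
    return model.fit()

# e.g. forecast the next 30 trading days:
# forecast = fit_arima(daily_close).forecast(steps=30)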
3.4 Random Forest Algorithm:
Random Forest is an ensemble of multiple decision trees; each tree makes its own
prediction and the forest averages them to produce the final value. We do not know
in advance whether the data is linear or not, and in such cases Random Forest is
effective.
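A minimal sketch, assuming the same lagged features X and targets Y as in the LSTM example above (the hyperparameters are illustrative assumptions):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X, Y)                      # fit the ensemble on (past value -> next value) pairs
next_value = rf.predict(X[-1:])   # predict from the most recent observation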
3.5 Linear Regression:
Linear regression is one of the most widely used algorithms for regression analysis.
It is implemented here mainly as a baseline, to check how it performs compared to
the other algorithms.
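A corresponding baseline sketch, again assuming the lagged features X and targets Y from above:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X, Y)               # ordinary least squares on the lagged features
predicted = lr.predict(X)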
CHAPTER-4
SYSTEM DESIGN AND WORKFLOW
The input data is pre-processed by cleaning it and splitting it into proper training
and test sets. This is in turn fed to the learning algorithms for the main phase of
analysis. Based on the output of each algorithm, values are predicted and new data
is generated. The generated data is then used to evaluate the predicted values and
determine the accuracy and efficiency of each algorithm.
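A rough end-to-end sketch of this workflow is given below, using linear regression as a stand-in learner; the helper is illustrative only, not the project's exact code.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def run_pipeline(prices):
    # prices: 1-D numpy array of cleaned closing prices, in date order
    prices = np.asarray(prices, dtype='float32')
    # split chronologically: first 80% for training, the rest for testing
    split = int(len(prices) * 0.8)
    train, test = prices[:split], prices[split:]
    # relate each day to the previous day's close and fit the learner
    model = LinearRegression()
    model.fit(train[:-1].reshape(-1, 1), train[1:])
    # predict over the test period and evaluate with RMSE
    predicted = model.predict(test[:-1].reshape(-1, 1))
    rmse = np.sqrt(mean_squared_error(test[1:], predicted))
    return predicted, rmse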

Fig.1: system design

Fig.2: workflow

CHAPTER-5
EXPERIMENTS / PROOF OF CONCEPT EVALUATION
5.1 DATASET
In this project, we chose a National Stock Exchange (NSE) dataset collected from
Kaggle (see References). The dataset includes Indian stocks, and our index covers a
diverse set of sectors featuring many Indian companies. Our aim was to build a
general, unbiased model that works in every scenario irrespective of company or
financial sector. This helps to validate our predictive algorithm and provide more
accurate stock predictions.
The dataset includes eight features: company index (code), Date, Time, Open, Close,
High, Low and Volume of trading (prices are in INR). It covers 440 companies at
one-minute intervals since 2015. We chose this dataset because its size is quite
large (~2 GB) and it allows several companies to be evaluated with the same
algorithm. With the primary dataset prepared, we applied pre-processing methods to
carry out individual experiments.
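A sketch of loading and inspecting the raw data; the file name is a placeholder, and the column names follow the features listed above (the company index column is called Code in the Appendix).

import pandas as pd

raw = pd.read_csv('nse_minute_data.csv')   # placeholder name for the raw minute-level dump
print(raw.columns.tolist())                # expected: Code, Date, Time, Open, High, Low, Close, Volume
print(raw['Code'].nunique())               # number of companies covered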
5.2 Data pre-processing
Pre-processing refers to the transformations applied to the data before feeding it
to the algorithm. Selecting and pre-processing the data are crucial steps in any
modelling effort, particularly when generalizing a new predictive model. Our dataset
has some limitations: it contains invalid values, null values and missing records.
We applied the following techniques to pre-process the data and make the predictions
more accurate.
1. Data cleaning:
In the real world, data tends to be incomplete and inconsistent. The purpose of
data cleaning is therefore to fill in missing values and correct inconsistencies
in the data. The index (code), Date, Time and closing prices of the NSE dataset
are used as input. There were some missing values due to public holidays, so we
removed null values and invalid indexes. There were also a few irrelevant columns
in the dataset which were not used as input, and we eliminated those columns to
reduce the complexity of our prediction model.
2. Data transformation:
Our dataset contains minute-wise stock prices, but we needed daily prices to fit
our model, so we grouped the data by day and took the mean of all the rows for
each day. We also applied min-max scaling for a few algorithms to get more
accurate predictions.
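A minimal sketch of these two steps with pandas and scikit-learn, assuming the raw frame has Code, Date and Close columns as described above:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def to_daily_scaled(raw):
    # drop rows with missing or invalid key fields
    clean = raw.dropna(subset=['Code', 'Date', 'Close'])
    # collapse minute-level rows into one row per company per day (mean of the day)
    daily = clean.groupby(['Code', 'Date']).mean(numeric_only=True).reset_index()
    # scale closing prices into the [0, 1] range for the algorithms that need it
    scaler = MinMaxScaler(feature_range=(0, 1))
    daily['Close_scaled'] = scaler.fit_transform(daily[['Close']]).ravel()
    return daily, scaler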

CHAPTER-6
RESULTS
In this project we performed a time-series analysis, so n-fold cross-validation is
not needed: the data is sequential and must keep its order. We split the dataset
into training and test data; the first 80 percent of the data is used for training
and the remaining 20 percent for testing.
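In code this is a chronological slice rather than a random split (a sketch; the testandtrain function in the Appendix does the same thing after scaling):

split = int(len(prices) * 0.80)               # first 80% of the series, in time order
train, test = prices[:split], prices[split:]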
6.1 Graphs

Below are a few companies with graphs plotted for predicted values (red) versus
actual values (blue) for the different algorithms. We can see that LSTM and ARIMA
perform better compared to Random Forest and Linear Regression.

Fig 3: Actual closing price index and its predicted value from the LR, RF and LSTM models

ARIMA model (monthly basis)

Fig 4: Graph comparison for five companies (left to right): Infotech, 8kmiles, Aban,
Bosch Ltd, NTPC

6.2 Evaluation
The accuracy of a prediction is referred to as its “goodness of fit”. In this
project, the most popular statistical accuracy measure, RMSE (root mean square
error), is used to compare the different algorithms on the same dataset. It is
defined as

RMSE = sqrt( (1/n) * Σ (predicted_i − actual_i)² )

where n is the number of test samples.
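This is the quantity the Appendix code reports for the LSTM model; a self-contained sketch of the same computation (the numbers below are toy values, not project results):

import math
from sklearn.metrics import mean_squared_error

def rmse(actual, predicted):
    # root mean square error between actual and predicted values
    return math.sqrt(mean_squared_error(actual, predicted))

print('Test RMSE: %.3f' % rmse([35, 48, 52], [34.1, 47.2, 53.0]))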
6.3 RMSE Comparison
Model                                        3IINFOTECH    8KMILES     ABAN
Linear Regression                                 0.334      0.502    0.415
Support Vector Regression                            NA         NA       NA
LSTM (relating present and past fifth day)        0.011      0.064    0.043
LSTM (relating present and past day)              0.013      0.054    0.039
ARIMA                                             1.263    206.344   23.707
Random Forest                                     0.345      0.502    0.402

CHAPTER-7

DISCUSSION

7.1 Decisions made

● We decided to make the analysis using the close values of a stock on a particular
  day or month and to predict the closing values for the future.
● We decided to work with different data mining algorithms, namely Linear
  Regression, Recurrent Neural Networks (LSTM), Support Vector Machines, Random
  Forest and the ARIMA model, with different approaches.
7.2 Difficulties faced
● The dataset is about 2 GB and takes some time to load and process.
● Pre-processing took some time to extract the values in the required format, and
  we initially had difficulty understanding how to forecast.
● Deciding which features to use for the regression models was not straightforward.

7.3 Things that worked and didn’t work well


● The models that worked well are the Long Short-Term Memory RNN and the ARIMA
  model.
● SVM and Linear Regression did not give accurate results.
● SVM took a long time to process our large dataset.

CHAPTER-8
CONCLUSION

In this project, we proposed the use of different algorithms to predict the future
stock prices of almost twenty companies. Although the comparison is shown for only
five companies (randomly selected) in this report due to space constraints, the
behaviour can be obtained for any company by running the same code. The Long
Short-Term Memory algorithm worked best for forecasting. Ranked from best to worst
for stock market forecasting, the algorithms are:
● LSTM
● ARIMA
● Random Forest
● Linear Regression & SVR

CHAPTER-9
FUTURE SCOPE
In the future, we will extend the project with other effective methods that might
give better performance. Our algorithms can be used to maximize investors' profit,
but they have to be improved for real-time conditions.

REFERENCES
http://markdunne.github.io/public/mark-dunne-stock-market-prediction.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.278.6139&rep=rep1&type=pdf
https://www.kaggle.com/ramamet4/nse-company-stocks
https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python/
https://ec.europa.eu/eurostat/sa-elearning/arima-model

CHAPTER-10
APPENDIX
import math
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
%matplotlib inline

from keras.models import Sequential
from keras.layers import Dense, LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

# Load the grouped NSE data and collapse minute-level rows into daily means per company
df = pd.read_csv('groupeddf.csv')
df4 = df.set_index("Code")
uniqueVals = df["Code"].unique()

grouped_df = pd.DataFrame()
for i in uniqueVals:
    # average all rows of company i for each date (minute-wise -> daily prices)
    df5 = df4.loc[i, :].groupby(['Code', 'Date']).mean(numeric_only=True)
    grouped_df = pd.concat([grouped_df, df5])

def create_dataset(dataset, past=5):
    # relate the value `past` days back to the current value (default: 5th day)
    dataX, dataY = [], []
    for i in range(len(dataset) - past - 1):
        dataX.append(dataset[i:(i + past), 0])
        dataY.append(dataset[i + past, 0])
    return np.array(dataX), np.array(dataY)

def testandtrain(prices):
    # scale prices to [0, 1] and split chronologically: first 80% train, rest test
    prices = prices.reshape(len(prices), 1)
    scaler = MinMaxScaler(feature_range=(0, 1))
    prices = scaler.fit_transform(prices)
    trainsize = int(len(prices) * 0.80)
    train, test = prices[0:trainsize, :], prices[trainsize:len(prices), :]
    print(len(train), len(test))
    # relate each day to the previous day (lag of one)
    x_train, y_train = create_dataset(train, 1)
    x_test, y_test = create_dataset(test, 1)
    # reshape to (samples, timesteps, features) as expected by the LSTM layer
    x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
    x_test = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
    return x_train, y_train, x_test, y_test

def trainingmodel(model, trainX, trainY):
    # single LSTM layer with 20 units followed by one dense output neuron
    model = Sequential()
    model.add(LSTM(20, input_shape=(1, 1)))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='rmsprop')
    model.fit(trainX, trainY, epochs=10, batch_size=1, verbose=2)
    return model

def predicting(prices, testX, testY, trainX):
    # predict the test period with the trained model (the global `model` set in the loop);
    # `prices` and `trainX` are kept for signature compatibility but are not used here
    testPredict = model.predict(testX)
    # report RMSE on the scaled values
    error = math.sqrt(mean_squared_error(testY, testPredict))
    print('Test RMSE: %.3f' % error)
    plt.plot(testPredict, color="blue")
    plt.plot(testY, color='red')
    plt.show()
    print(" -------- end for the company --------")
    return testPredict

# Train and evaluate an LSTM for the first ten companies
for val in uniqueVals[:10]:
    df1 = grouped_df.loc[val, :]
    df2 = df1.reset_index()
    prices = df2['Close'].values.astype('float32')
    print(val)
    # train model
    model = Sequential()
    trainX, trainY, testX, testY = testandtrain(prices)
    model = trainingmodel(model, trainX, trainY)
    # predict and plot
    predictingY = predicting(prices, testX, testY, trainX)
