Cryptocurrency Price Prediction Using Deep Learning
Major project report submitted in partial fulfillment of the requirement for the degree
of Bachelor of Technology
in
By
JYOTI SARPAL(181314)
TANISHA RANTA(181342)
Technology
I hereby declare that this project has been done by me under the supervision of
Dr Monika Bharti, Assistant Professor (CSE/IT), Jaypee University of
Information Technology.
I also declare that neither this project nor any part of this project has been
submitted elsewhere for award of any degree or diploma.
Supervised by:
(Dr Monika Bharti)
Assistant Professor
(CSE/IT)
Department of Computer Science & Engineering and Information
Technology Jaypee University of Information Technology
This is to certify that the work which is being presented in the project report titled
“CRYPTOCURRENCY PRICE PREDICTION USING DEEP LEARNING ”
in partial fulfillment of the requirements for the award of the degree of B Tech in
Computer Science and Engineering submitted to the Department of Computer
Science and Engineering, Jaypee University of Information Technology,
Waknaghat, Solan is an authentic record of work carried out by Jyoti Sarpal
(181314) and Tanisha Ranta (181342) during the period from January 2022 to May
2022 under the supervision of Dr Monika Bharti, Department of Computer
Science and Engineering, Jaypee University of Information Technology,
Waknaghat, Solan.
I would also like to sincerely thank everyone who has helped me, directly or
indirectly, in making this project a success. In particular, I would like to
thank the staff members, both teaching and non-teaching, who extended their
timely help and facilitated my work.
Finally, I must acknowledge, with due respect, the constant support and
patience of my parents.
TABLE OF CONTENTS
Chapters
1 INTRODUCTION
1.1 Introduction
1.3 Objectives
1.4 Methodology
1.5 Organization
2 LITERATURE SURVEY
2.2 Requirements
3 SYSTEM DEVELOPMENT
4 PERFORMANCE ANALYSIS
5 CONCLUSIONS
REFERENCES
ABSTRACT
Cryptocurrencies are a form of digital currency in which all transactions are carried out over
the internet. They are a soft currency that does not exist in hard cash form. The key
difference between a decentralized currency and a centralized one is that all users of a virtual
currency can obtain services without the intervention of a third party. The use of
cryptocurrencies, however, influences international relations and trade because of
their severe price volatility. Furthermore, the rapid variations in cryptocurrency prices indicate
that a reliable method for estimating these prices is urgently required.
Because prices are shaped by a number of organizations rather than by one main or central
authority, they affect relationships with other businesses and international trade. Their
constant oscillations show that a more accurate means of projecting these prices is badly
needed. We therefore design a method for accurate price prediction that uses deep learning
techniques, namely the recurrent neural network (RNN), long short-term memory (LSTM), and
gated recurrent unit (GRU), which are effective models for learning from training data, and
that considers factors such as market capitalization, maximum supply, circulating supply, and
volume.
The proposed method is written in Python and tested on benchmark datasets. The results show
that the proposed method can be used to make reliable predictions. The method builds on the
neural network, which has been used by academics in numerous fields over the past ten years as
an intelligent data mining tool.
Stock market data is critical in today's economy. Forecasting methodologies fall into two types:
linear models (AR, MA, ARMA, ARIMA) and non-linear models (ARCH, GARCH, neural
networks). To anticipate prices based on past prices, we employed the Autoregressive
Integrated Moving Average (ARIMA) model and the Recurrent Neural Network (RNN), Long
Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) deep learning architectures.
CHAPTER 1
1.1 INTRODUCTION
The rapid rise in market capitalization and price of bitcoin produced a swarm of other
cryptocurrencies, the majority of which differ from bitcoin in only a few areas (block time,
currency supply, and issuance scheme). With more than 5.7 thousand cryptocurrencies, 23
thousand online exchanges, and a market capitalization of more than 270 billion USD, the
cryptocurrency business had evolved to become one of the world's largest unregulated
marketplaces by July 2021.
Bitcoin and other cryptocurrencies soon gained a reputation as purely speculative ventures,
despite Bitcoin's origins as a peer-to-peer electronic payment system. Their prices are unpredictable
because they are mostly influenced by behavioral factors and are unrelated to the primary types
of financial assets; nonetheless, their informational efficiency is high.
As a result, many hedge funds and asset managers began to include cryptocurrencies in their
portfolios, while researchers concentrated on cryptocurrency trading, particularly with machine
learning (ML) algorithms. Bitcoin's early success as a peer-to-peer virtual currency was due to its
cryptography-based technology, which eliminates the need for a trusted third party and solves
the problem of double-spending.
Bitcoin is a digital currency that was first introduced in January 2009. It is the most valuable
cryptocurrency in the world, and it is traded on more than 40 exchanges worldwide that accept
more than 30 different currencies. Bitcoin, as a currency, presents a novel opportunity for price
forecasting due to its extreme volatility, which is significantly higher than that of traditional
currencies.
The bitcoin system consists of a collection of decentralized nodes that run the bitcoin code and
keep track of its block chain. Metaphorically, a block chain can be thought of as a collection of
blocks, each of which contains a number of transactions.
Because all the computers running the block chain have the same list of blocks and transactions,
and can transparently see these new blocks being filled with new bitcoin transactions, no one
can cheat the system.
The block chain is essentially a digital ledger of transactions that is distributed across the
entire network of computer systems on the block chain. It consists of two fundamental
components: the transaction and the block. The transaction
represents the action triggered by the participant, and the block is a data collection that
records the transaction and additional details such as the correct sequence and creation
timestamp. Block chains have also been used as the signaling system of multi-domain,
cooperative DDoS defense systems in which each autonomous system (AS) joins the
defensive line.
The effect of networks on competition in the nascent cryptocurrency market, as reflected over time
in exchange rates among cryptocurrencies, depends on two aspects: (1) competition
among different currencies and (2) competition among exchanges. There are hundreds of
cryptocurrencies, but Bitcoin is the most popular one: it has proved a stubborn competitor and has
not been displaced by the competition. As a result, it has become the dominant
cryptocurrency. The authors describe the competition between cryptocurrencies as “healthy
competition” and suggest that it encourages new technology and security innovation.
The aim of this research is to examine whether the price of Bitcoin can be predicted in a way
similar to stock market tickers. The answer bears on whether Bitcoin can be used further as a
medium of payment. The block chain keeps track of all Bitcoin transactions occurring anywhere
in the world. It is a cryptographic implementation that provides a high level of security. The
popularity of cryptocurrencies soared in 2017 as their market value grew exponentially for
several months in a row. In January 2018, their combined market value reached a high of around
$800 billion.
Although machine learning has been successful in predicting stock market prices using a
variety of time series models, its use for predicting cryptocurrency prices has been limited.
The reason is that cryptocurrency values are influenced by a variety of factors such
as technological advancements, internal competitiveness, market pressure to produce,
economic troubles, security concerns, political factors, and so on. Because of their tremendous
volatility, they have a huge profit potential if smart investment tactics are used. Cryptocurrencies
are, unfortunately, less predictable than stock markets.
1.2 PROBLEM STATEMENT
We aim to use machine learning algorithms, which are widely utilized by many
organizations. This report walks through a simple implementation of analyzing and
forecasting cryptocurrency prices using various machine learning algorithms.
1.3 OBJECTIVES
The main goal of this study is to use technical trade indicators and machine learning to
build and integrate price prediction for various cryptocurrencies.
1.4 METHODOLOGY
Due to price volatility and dynamism, cryptocurrency prices are difficult to forecast.
Hundreds of cryptocurrencies are used by clients all around the world. We'll look at three
of the more popular ones in this paper. By employing deep learning algorithms, which
may uncover hidden patterns in data, integrate them, and generate considerably more
accurate predictions, the study intends to do the following:
A full examination of the many existing systems for predicting BTC cryptocurrency
prices is presented. The LSTM, ARIMA, and GRU algorithms are used to anticipate
cryptocurrency prices reliably. Using several algorithms for prediction enables an
automated machine learning method. The proposed models are evaluated using
metrics such as RMSE.
1.5 ORGANIZATION
We used APIs to fetch the Bitcoin cryptocurrency dataset. The data obtained was
then averaged into one dataset for consistency and in order to fill in the gaps created by
missing data. Building the neural network model: machine learning is the
most suitable technique that can be used here to predict cryptocurrency prices.
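The averaging and gap-filling step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the two sources, their values, and the choice of forward-filling any remaining gaps are assumptions made for the example.

```python
from statistics import mean

def average_sources(sources):
    """Average the price reported by several sources for each timestamp.

    `sources` is a list of dicts mapping timestamp -> price (or None when a
    source has no value). Timestamps missing from every source stay None.
    """
    timestamps = sorted(set().union(*sources))
    merged = {}
    for ts in timestamps:
        values = [s[ts] for s in sources if s.get(ts) is not None]
        merged[ts] = mean(values) if values else None
    return merged

def forward_fill(series):
    """Replace remaining gaps with the most recent known value."""
    filled, last = {}, None
    for ts in sorted(series):
        if series[ts] is not None:
            last = series[ts]
        filled[ts] = last
    return filled

# Hypothetical prices from two APIs; source_b lacks day 2, both lack day 3.
source_a = {1: 100.0, 2: 104.0, 3: None, 4: 110.0}
source_b = {1: 102.0, 3: None, 4: 112.0}
prices = forward_fill(average_sources([source_a, source_b]))
```

Averaging first means a gap in one source is covered by the others, so forward-filling is only needed where every source is silent.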
The model to be built had to achieve several goals in order to produce a near-accurate
prediction.
This included selecting a framework that could produce good prediction accuracy,
take other parameters into consideration in its prediction algorithm, and be trainable.
Below is the organization of our information.
Fig 3: Organization of Model
Fig 3 shows how our model is organized and the various steps that it includes.
CHAPTER 2
LITERATURE SURVEY
1) A Novel Cryptocurrency Price Prediction Model Using GRU, LSTM and bi-LSTM Machine
Learning Algorithms
Hamayel and Owda proposed a model for predicting the prices of three cryptocurrencies:
BTC, ETH, and LTC.
Performance measures were computed to test the accuracy of the different models, and the
actual and predicted prices were then compared. The experimental results show that GRU
outperformed the other algorithms, with a MAPE of 0.2454%, 0.8267%, and 0.2116% for BTC, ETH,
and LTC, respectively. The RMSE for the GRU model was found to be 174.129, 26.59, and 0.825 for
BTC, ETH, and LTC, respectively. Based on these outcomes, the GRU model for the targeted
cryptocurrencies can be considered efficient and reliable, and it is considered the best
model. The bi-LSTM model, however, is less accurate than GRU and LSTM, with substantial
differences between the actual and the predicted prices for both BTC and ETH.
2) Deep Learning-Based Cryptocurrency Price Prediction Scheme with Interdependent
Relations
For Zcash, however, it followed a stochastic pattern. For Zcash, the proposed model performs
better with the smaller window size than with the bigger one. For a greater window
size of -days and 30-days for Zcash, the proposed model demonstrates the stochastic character.
In the future, the authors will use the proposed methodology to work on cryptocurrencies with
multiple interdependencies. They will also add sentiment factors to the suggested algorithm,
such as Twitter and Facebook posts and messages, to improve the accuracy of the forecast
findings. The prices of traditional commodities like gold and oil can also be used to improve the
prediction outcome.
The crypto market, on the other hand, is less stable than traditional commodities markets. Many
technological, sentimental, and legal elements can influence it, making it very volatile,
uncertain, and unpredictable. Many studies have been conducted on various cryptocurrencies in
order to estimate prices correctly, but the bulk of these methods are not applicable in real time.
Based on the preceding discussion, the authors present a solution in this work: to predict the
price of Litecoin and Zcash with the inter-dependency of the parent currency, a deep-learning-based
hybrid model (combining Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM)) was
used. The suggested model is well trained and tested using standard datasets and can be
employed in real-time applications. In comparison to existing models, the suggested model
estimates prices with a high degree of accuracy.
3) Machine Learning for Bitcoin Pricing — A Structured Literature Review
Jaquart et al. conducted a study in which they assess the existing corpus of literature
on empirical bitcoin pricing via machine learning and organize it into four main concepts.
They demonstrate that research on this subject is quite heterogeneous, and that the findings of
multiple studies can only be compared to a limited extent. They also develop guidelines
for future papers in the field to ensure a high level of transparency and reproducibility.
2.2 REQUIREMENTS
This section lists the requirements of the model: the various tools, technologies, and
libraries that are used.
TensorBoard:
TensorBoard is a visualization toolkit for TensorFlow that lets you track metrics like loss and accuracy,
display the model graph, and observe histograms of weights, biases, and other tensors as they change over
time, among much else. It is a component of the TensorFlow ecosystem and is open source.
Dashboards for Scalars, Graphs, Histograms, Distributions, and HParams are currently
available. Over time, more TensorBoard dashboards will be introduced.
Anyone with a link can see any data published to TensorBoard.dev. It should not be used to
store sensitive information.
TensorBoard provides the visualization tools needed for building machine learning
models. It also helps us to:
a) Keep track of and visualize metrics like loss and accuracy
c) Display histograms, boxplots, line graphs, bar charts, and other views of tensors as they
change over time.
PyTorch:
Several deep learning libraries are built on top of PyTorch, including Hugging Face's
Transformers, PyTorch Lightning, and Catalyst, to name a few.
PyTorch offers two high-level features:
● Tensor computation (like NumPy) with powerful graphics processing unit (GPU)
acceleration
● Deep neural networks built on a tape-based automatic differentiation system.
Modules
Module Autograd:
PyTorch uses a technique called automatic differentiation. A recorder records the operations
that have been performed and then replays them backwards to compute the gradients. This method is
very beneficial for building neural networks since it allows for parameter differentiation during the
forward pass, which saves time on a single epoch.
Module Optim:
The optim module implements different optimization strategies for neural network
construction. There is no need to construct the most widely used methods from scratch because
they are already supported.
Module nn:
The PyTorch autograd module makes it simple to create computational graphs and compute
gradients, but raw autograd is too low-level for defining sophisticated neural networks. The nn
module can help with this.
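The record-and-replay idea behind Autograd can be illustrated with a small pure-Python sketch. This is not PyTorch's implementation or API: the `Var` class is a hypothetical stand-in that supports only scalar addition and multiplication, and it is meant only to show how a recorded graph is walked backwards to obtain gradients.

```python
class Var:
    """A scalar that records how it was computed (a tiny 'tape')."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent entry is (parent_var, local_gradient).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Replay the recorded operations in reverse to accumulate gradients."""
        # Topological order so each node is processed after all its consumers.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


w = Var(3.0)
x = Var(2.0)
b = Var(1.0)
y = w * x + b      # forward pass records the graph
y.backward()       # backward pass: dy/dw = x, dy/dx = w, dy/db = 1
```

After `y.backward()`, `w.grad`, `x.grad`, and `b.grad` hold the partial derivatives of `y`. In PyTorch the same bookkeeping is handled by tensors created with `requires_grad=True`.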
DataReader:
To obtain data with a DataReader, first construct an instance of the Command object, then
create a DataReader by calling Command.ExecuteReader to obtain rows from a data source.
The DataReader offers an unbuffered stream of data that allows procedural logic to
process results from a data source sequentially and efficiently. Because the data is not cached in
memory, the DataReader is an excellent choice for retrieving large
volumes of data. To get a row from the query results, use the Read method. Each column in the returned row
may be accessed by passing the column's name or ordinal number to the DataReader.
NumPy:
For scientific computing, NumPy is the most important Python package. It is a Python library
that provides a multidimensional array object, derived objects (like masked arrays and matrices),
and a variety of routines for performing fast array operations, including mathematical, logical,
shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic
statistical operations, random simulation, and more.
The NumPy library includes multidimensional array and matrix data structures. It provides
methods for working efficiently with ndarray, a
homogeneous n-dimensional array object. NumPy allows you to perform a variety of array-based
mathematical operations. It includes a large library of high-level mathematical functions
that operate on these arrays and matrices, as well as advanced data structures that ensure
quick array and matrix calculations.
NumPy (Numerical Python) is a widely used open-source Python library in research and
engineering. It is the de facto standard for working with numerical data in Python, and it is at the
centre of the scientific Python and PyData ecosystems. NumPy users range from inexperienced
coders to experienced researchers, and most other data science and scientific Python tools use
the NumPy API extensively.
NumPy arrays are more compact and faster than Python lists. An array consumes less memory
and is simpler to work with. NumPy stores data in a much smaller amount of memory and
allows you to choose data types. This makes it possible to optimize the code even further.
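A short sketch of the points above, assuming NumPy is installed. The price values are illustrative and not taken from the project's dataset; the final reshape shows the (samples, timesteps, features) layout that recurrent layers commonly expect.

```python
import numpy as np

# Hypothetical closing prices; the values are illustrative only.
close = np.array([101.0, 104.0, 104.0, 111.0], dtype=np.float64)

# Choosing a narrower dtype halves the memory used per element.
close32 = close.astype(np.float32)

# Vectorized arithmetic: day-over-day change without a Python loop.
daily_change = np.diff(close)

# Reshape into a (samples, timesteps, features) layout.
as_sequences = close.reshape(2, 2, 1)
```

The `nbytes` attribute of each array reports the memory its elements occupy, which makes the effect of the dtype choice easy to check.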
Pandas:
Pandas is a data manipulation and analysis software library for the Python programming
language. In particular, it includes data structures and methods for manipulating numerical
tables and time series. It is open-source software released under the three-clause BSD license.
The name is derived from "panel data", an econometrics term for data sets that comprise
observations for the same individuals over several time periods.
DataFrames:
Pandas is mostly used for data analysis and the associated manipulation of tabular data in
DataFrames. Pandas can read data from CSV, JSON, Parquet, SQL database tables or queries, and
Microsoft Excel files. Pandas supports merging, reshaping, selecting, data cleaning, and data
wrangling, among other data manipulation operations. With the advent of pandas, many features
for working with DataFrames that were established in the R programming language
were brought into Python. The pandas library is built on the NumPy library, which is
designed to operate with arrays rather than DataFrames.
Seaborn:
Seaborn is a Python library for creating statistical visualisations. It is built on top of
matplotlib and works closely with pandas data structures.
Seaborn can help you explore and understand your data. Its plotting functions work with
dataframes and arrays holding entire datasets, using internal semantic mapping and statistical
aggregation to produce informative graphs. Because of its declarative, dataset-oriented API,
you can focus on the meaning of your charts rather than the mechanics of generating them. We
only need to import the seaborn library for a short example. The alias sns is commonly
used when importing it.
Seaborn draws its plots with matplotlib behind the scenes. For interactive work, it is best to use a
Jupyter/IPython interface in matplotlib mode; otherwise you have to call
matplotlib.pyplot.show() to view the plot.
To obtain rapid access to an example dataset, much of the code in the seaborn documentation uses
the load_dataset() function. These datasets are nothing special: they are just pandas dataframes
that could have been loaded using pandas.read_csv() or generated manually. Although most of
the examples in the documentation use pandas dataframes, seaborn is fairly versatile when it comes to
data structures.
Using a single call to the seaborn function relplot(), this plot depicts the relationship between
five variables in the tips dataset. Notice how we merely gave the names of the variables and their
roles in the plot. Unlike when using matplotlib directly, it wasn't necessary to specify plot element
properties in terms of colour values or marker codes. Behind the scenes, seaborn
managed the conversion of dataframe values into matplotlib-friendly arguments. This
declarative method allows you to concentrate on the questions you want to answer rather than
the specifics of matplotlib control.
In seaborn, there are several specialised plot types for showing categorical data. They can be
accessed through catplot(). These plots offer different levels of granularity. At the most
basic level, you might want to make a "swarm" plot, which is a scatter plot in which the points
along the categorical axis are adjusted so that they don't overlap.
Pyplot :
Pyplot is a matplotlib API that essentially turns matplotlib into a viable open-source
alternative to MATLAB. Matplotlib is a data visualization package that generates plots, graphs,
and charts.
Pyplot is stateful: it keeps track of the current figure and axes once you start plotting, and
subsequent calls within the same loop or session act on that state until plt.close() is called. This
state also matters when creating many plots in a row.
The pyplot API is a hierarchy of Python code objects topped by matplotlib.pyplot. It comprises
three layers:
Scripting Layer: used to construct a figure that comprises one or more plots with
axes (i.e., x axis, y axis, and possibly z axis).
Artist Layer: used to edit plot components such as labels, lines, and so on.
Backend Layer: used to format the plot for presentation in a specific target
application, such as a Jupyter Notebook.
There are times when you have data in a format that allows you to use strings to access certain
variables, for instance numpy.recarray or pandas.DataFrame.
The data keyword parameter in Matplotlib lets you supply such an object. If it is provided,
you can create plots using the strings that correspond to these variables.
TensorFlow:
TensorFlow is an open-source platform for machine learning. It includes a number of
comprehensive, flexible tools and libraries, as well as community resources, that help
researchers develop machine learning models and help developers quickly build and deploy
machine learning-based applications.
TensorFlow is a free and open-source machine learning and artificial intelligence software
library. It can be used for a wide range of tasks, however it is primarily focused on deep neural
network training and inference.
TensorFlow was created by the Google Brain team for internal Google research and production.
The Apache License 2.0 was used to release the initial version in 2015.
The above table shows the description of the proposed approach that we have used.
CHAPTER 3
SYSTEM DEVELOPMENT
The suggested method was tested using one of the most well-known and oldest
cryptocurrencies, Bitcoin (BTC). The BTC dataset included exchange data from January 2016 to
December 2021, with OHLC (Open, High, Low, Close) updates every minute, the volume of
BTC and of the specified currency, and weighted Bitcoin prices. The dataset was publicly
accessible over the Internet.
Table 2: Bitcoin Datasets
Parameter            Value/Description
Dataset Details      USD (Large in number named as L1)
Memory usage         919.8 MB
Range Index          1259 entries, 0 to 1258
Total Data Columns   14
Date                 int64
Open                 float64
Close                float64
High                 float64
Low                  float64
Volume               float64
adjClose             float64
adjHigh              float64
adjOpen              float64
adjVolume            float64
divCash              float64
splitFactor          float64
The table above describes the Bitcoin dataset and its parameters.
Fig 4: BTC Dataset
Timestamp: The bitcoin server's one-minute timestamp.
Volume: Bitcoin volume in that one-minute period.
Low: The lowest bitcoin price in that one-minute period.
Open: Bitcoin's open price at that one-minute interval.
Adj Close: The adjusted close is the closing price after adjusting for all applicable
splits.
Adj High: The adjusted high is the highest price of bitcoin after adjusting for all applicable
splits.
Adj Low: The adjusted low is the lowest price of bitcoin after adjusting for all
applicable splits.
Adj Open: The adjusted open is bitcoin's opening price after adjusting for all applicable splits.
Split: A 2-for-1 split gives you two shares for the price of one.
3.3 THE PROPOSED APPROACH
Data Analysis Phase: This phase examines the data and its parameters for redundancy in data
values that might affect prediction outcomes.
If a dataset contains any unnecessary parameters, the data values for those parameters are
eliminated. This phase also analyses the data for prospective merging to increase model
predictability.
Data Filtration Phase: In this phase, data is filtered to eliminate any empty or redundant values.
Train-Test Split Phase: This phase divides data into training and testing subsets.
For example, data is split into two sections, with 65 percent training data and 35 percent test
data.
Data-Scaling Phase: Data is scaled according to model requirements before being delivered
to the model. This phase reshapes the data to make it better suited to the model.
Modeling Phase: The suggested technique is written in Python. In Python, two of the most
powerful libraries for machine learning models are Keras and TensorFlow. Keras is used with
TensorFlow as the backend library to build the model. The Keras
sequential model is made up of two kinds of layers: LSTM and dense layers. These layers
examine the data thoroughly in order to capture the patterns present in the dataset and so improve
the model's precision. The data is then fed into the model for training.
Model Learning and Evaluation Phase: Data is trained using several LSTM units. Each unit
has four components: a memory cell, an input gate, an output gate, and a forget gate.
These gates control how information passes through.
The same data is then used to train the ARIMA forecasting model, which is tested with various
features and measures, and the GRU model.
All these models are trained and tested on the data so that we can forecast the next 30 days
and compare their results.
Prediction Phase: The stored model is used to make the prediction. The model is fed input
data and produces projected values as output. The output is then compared against testing data
to determine accuracy and losses.
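The filtration and train-test split phases, followed by the windowing that a sequence model needs, can be sketched as below. This is a minimal illustration under stated assumptions: the function names, the sample prices, and the window length of 3 are hypothetical, and only the 65/35 ratio comes from the text.

```python
def drop_missing(values):
    """Data filtration: discard empty (None) entries."""
    return [v for v in values if v is not None]

def train_test_split(values, train_fraction=0.65):
    """Chronological split; no shuffling, so the test set stays in the future."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]

def make_windows(values, time_step):
    """Turn a series into (input window, next value) pairs for a sequence model."""
    xs, ys = [], []
    for i in range(len(values) - time_step):
        xs.append(values[i:i + time_step])
        ys.append(values[i + time_step])
    return xs, ys

# Hypothetical closing prices with one gap.
raw = [10.0, 11.0, None, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0]
clean = drop_missing(raw)                 # 10 values remain
train, test = train_test_split(clean)     # 6 training values, 4 test values
x_train, y_train = make_windows(train, time_step=3)
```

Splitting by position rather than at random keeps every test observation later in time than the training data, which matters for price series.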
Fig 5: Proposed Approach
The above figure explains the various proposed steps used to make our model.
Fig 6: Training and Testing Datasets
Machine learning is the most appropriate approach for predicting Bitcoin cryptocurrency
values in this case. In order to create a near-accurate forecast, the model had to accomplish
numerous goals. This involved picking a framework that could yield excellent prediction accuracy,
take other parameters into account in its prediction process, and be trainable.
Following that, it was necessary to determine which layers would be included and how many would
be required, as well as the number of epochs. The training dataset was normalized and reshaped,
since this makes it better suited to various activation functions.
The square of the correlation coefficient is used in this method to determine the link between feature
fields in the dataset. This helps identify the dominant parameters, which determine the values of the other
fields. The bitcoin price is then established using linear and exponential forecasting. To anticipate
cryptocurrency prices, the suggested method employs the LSTM model and the GRU model.
3.4.1 Recurrent neural networks (RNN):
The output is affected by both current and previous inputs. Let I1 be the initial input with a dimension
of n*1, where n is the vocabulary length. S0 represents the hidden state fed into the first RNN cell, which
has d neurons. The hidden state fed into each cell is that of the previous cell. Because no prior state is
available for the first cell, S0 is initialised with zeros or random values. U is a d*n-dimensional matrix,
where d is the number of neurons in the first RNN cell and n is the size of the input vocabulary. W
is a d*d-dimensional matrix. b is a bias with a size of d*1. Another matrix V with the size k*d
is used to find the output from the first cell.
In general,
$$S_n = U I_n + W S_{n-1} + b, \qquad O_n = V S_n + c$$
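The recurrence above can be traced with a few lines of NumPy. This is a sketch under stated assumptions: the matrix values and the sizes n = 2, d = 2, k = 1 are illustrative, c is taken to be an output bias of size k*1, and the `activation` argument is an addition that defaults to the identity so that the code matches the recurrence as written (practical RNN cells usually apply a nonlinearity such as tanh).

```python
import numpy as np

def rnn_forward(inputs, U, W, V, b, c, activation=lambda z: z):
    """Run S_n = f(U I_n + W S_{n-1} + b) and O_n = V S_n + c over a sequence.

    With the default identity `activation` this matches the recurrence as
    written in the text; passing np.tanh gives the usual nonlinear RNN cell.
    """
    d = W.shape[0]
    state = np.zeros((d, 1))          # S_0 initialised with zeros
    outputs = []
    for x in inputs:                  # each x has shape (n, 1)
        state = activation(U @ x + W @ state + b)
        outputs.append(V @ state + c)
    return outputs, state

# Illustrative sizes (assumptions): n = 2 inputs, d = 2 hidden units, k = 1 output.
U = np.array([[1.0, 0.0], [0.0, 1.0]])   # d x n
W = np.array([[0.5, 0.0], [0.0, 0.5]])   # d x d
V = np.array([[1.0, 1.0]])               # k x d
b = np.zeros((2, 1))                     # d x 1
c = np.zeros((1, 1))                     # k x 1
sequence = [np.array([[1.0], [2.0]]), np.array([[3.0], [4.0]])]
outputs, final_state = rnn_forward(sequence, U, W, V, b, c)
```

The second state depends on the first through W, which is how earlier inputs influence later outputs.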
CHAPTER 4
PERFORMANCE ANALYSIS
The long short-term memory network, or LSTM, solves the vanishing gradient problem
that plagues recurrent neural networks. It is a type of recurrent neural network used in
deep learning because very large architectures can be trained successfully. The LSTM is an
RNN-like architecture with gates that control data flow between cells. The input and forget gate
structures can alter information passing through the cell state, with the final output being
a filtered version of the cell state based on the context of the inputs. The mathematical
representation of the LSTM forward training procedure is as follows:
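A commonly used formulation of the LSTM forward pass is given below as a reference sketch. The notation is an assumption: x_t is the input at step t, h_t the hidden state, c_t the cell state, σ the logistic sigmoid, ⊙ element-wise multiplication, and the W, U, and b terms are the learned weights and biases of each gate.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state / output)}
\end{aligned}
```

The last line is the "filtered version of the cell state" mentioned above: the output gate decides how much of the squashed cell state is exposed as output.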
Fig 8: LSTM OUTPUT
Gated recurrent neural networks (Gated RNNs) have demonstrated their effectiveness in
a variety of applications requiring sequential or temporal data. The transition functions
in hidden units of GRU are given as follows:
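One widely used form of the GRU transition functions is sketched below. The notation is an assumption (x_t input, h_t hidden state, σ the logistic sigmoid, ⊙ element-wise multiplication), and some references swap the roles of z_t and 1 − z_t in the last line.

```latex
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)} \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) && \text{(candidate state)} \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```

Compared with the LSTM, the GRU has no separate cell state and uses two gates instead of three, so it has fewer parameters per unit.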
Below is the GRU model being applied:
4.3 AUTO REGRESSIVE INTEGRATED MOVING AVERAGE (ARIMA):
ARIMA (Auto Regressive Integrated Moving Average) is a family of models that 'explains' a
given time series based on its own previous values, namely its lags and lagged prediction errors, so
that the resulting equation can be used to predict future values.
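To make the idea concrete, the sketch below implements the simplest useful member of the family, ARIMA(1,1,0): one round of differencing, one autoregressive lag, no moving-average term, and no constant. It is an illustrative simplification with hypothetical data, not the model configuration used in the project, which would normally be fitted with a statistics library.

```python
def difference(series):
    """The 'I' step with d = 1: replace values by consecutive differences."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares estimate of phi in d_t = phi * d_{t-1} (no constant term)."""
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den

def forecast(series, steps):
    """Forecast `steps` future values with an ARIMA(1,1,0) model."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff          # autoregression on the differences
        last_value = last_value + last_diff  # undo the differencing
        predictions.append(last_value)
    return predictions

# Hypothetical series whose increases halve each step: 8, 4, 2, 1.
history = [0.0, 8.0, 12.0, 14.0, 15.0]
next_two = forecast(history, steps=2)
```

Here the fitted coefficient is 0.5, so each forecast step adds half of the previous increase. A series with all-zero differences would need special handling, since the denominator in `fit_ar1` would be zero.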
4.4 SCREENSHOTS OF THE VARIOUS STAGES OF THE PROJECT
4.4.3 Exploration of data and its visual analysis:
Fig 11: Heat Map
4.4.4 Data Scaling:
Before the data is passed to the model, it is scaled according to model requirements.
In this way, this phase reshapes the data to make it more suitable for the model.
The train-test split phase divides the data into training and testing subsets. For example,
the data is divided into two parts in a ratio of 65% training data to 35% test data.
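A minimal sketch of the scaling step, assuming min-max scaling to the range [0, 1], which is a common choice for recurrent networks but is not confirmed by the text. The class name and sample prices are hypothetical. The scaler is fitted on the training portion only, so that no information from the test period leaks into training.

```python
class MinMaxScaler01:
    """Scale values to [0, 1] using the range seen in the training data only."""

    def fit(self, values):
        self.low, self.high = min(values), max(values)
        return self

    def transform(self, values):
        span = self.high - self.low
        return [(v - self.low) / span for v in values]

    def inverse_transform(self, scaled):
        span = self.high - self.low
        return [s * span + self.low for s in scaled]

train = [10.0, 12.0, 14.0]      # hypothetical training prices
test = [13.0, 15.0]             # hypothetical test prices
scaler = MinMaxScaler01().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)     # may fall outside [0, 1]
restored = scaler.inverse_transform(test_scaled)
```

`inverse_transform` is what converts the model's scaled predictions back into prices before they are compared with the observed values.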
The final phase, model evaluation, is discussed ahead.
The blue line represents the whole dataset, whereas the orange line is our training data.
Number of epochs: This is the number of complete passes over the training dataset that the
model makes during the training stage.
Correlation coefficient: A measurement of the strength of the link between two variables.
The Pearson correlation coefficient is the most widely used measure, and its formula for any two
variables x and y is as follows:
$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
Root mean square error: This measures the difference between two datasets. It is the square root
of the mean, over the n observations, of the squared difference
between the anticipated value (Pi) and the observed value (Oi):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)^2}$$
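Both measures are short enough to compute directly. The sketch below uses the standard definitions of the Pearson coefficient and the RMSE; the function names and sample values are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rmse(predicted, observed):
    """Square root of the mean squared difference between P_i and O_i."""
    n = len(predicted)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical values for illustration.
close = [1.0, 2.0, 3.0, 4.0]
high = [2.0, 4.0, 6.0, 8.0]       # perfectly linear in close
r_squared = pearson_r(close, high) ** 2
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 6.0])
```

Squaring r gives the share of variance in one field that a linear relation with the other explains. RMSE is expressed in the same units as the prices themselves.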
CHAPTER 5
CONCLUSION
The model was implemented and run on benchmark datasets. The square of the correlation
coefficient was computed over the entire dataset to discover a dominant feature, and
then correlations between Close and High, Close and Low, Close and Open, and Close
and Volume were obtained. Correlations between several types of market data are
displayed.
Any value of Low can be found by putting the value of Close in the following equation:
Any value of Open can be found by putting the value of Close in the following equation:
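Relations of this kind are linear equations of the form Low = slope × Close + intercept, which can be obtained by ordinary least squares. The sketch below shows the fitting step only. The Close and Low values are hypothetical, so the resulting coefficients are not the ones found in the project.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical Close and Low values; not taken from the project's dataset.
close = [100.0, 110.0, 120.0, 130.0]
low = [95.0, 104.0, 113.0, 122.0]
slope, intercept = fit_line(close, low)
estimated_low = slope * 125.0 + intercept
```

The same function fits Open, High, or Volume against Close by passing a different second list.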
5.2 MODEL EVALUATION
5.2.1 LSTM
The blue line shows the values from the original dataset, while the orange line shows the
training values and the green line shows the projected test values.
Further prediction for next 30 days is shown:
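One common way to produce a multi-day forecast from a model that predicts a single step is to feed each prediction back in as input. The sketch below assumes that recursive scheme. `naive_trend_model` is a hypothetical stand-in for the trained LSTM or GRU, and the prices are illustrative.

```python
def forecast_recursive(predict_next, history, time_step, days=30):
    """Forecast `days` steps ahead by feeding predictions back as inputs.

    `predict_next` is any callable mapping a window of `time_step` values
    to the next value, for example a wrapper around a trained model.
    """
    window = list(history[-time_step:])
    forecasts = []
    for _ in range(days):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # slide the window forward
    return forecasts

def naive_trend_model(window):
    """Stand-in for a trained network: continue the last observed step."""
    return window[-1] + (window[-1] - window[-2])

history = [100.0, 102.0, 104.0]            # hypothetical closing prices
next_30_days = forecast_recursive(naive_trend_model, history, time_step=3)
```

Because every step builds on earlier predictions, errors accumulate, so forecasts near the end of the 30-day horizon are less reliable than those at the start.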
The blue line shows the values from the original dataset, while the orange line shows the training
values and the green line shows the projected test values.
PREDICTION GRAPH OF ARIMA:
In conclusion, the findings suggest that the closing price (Close) may play an important role in
influencing the other features. Furthermore, as seen in the findings for the proposed model trained
on big datasets, the size of the dataset may affect future predictions.
To create the predictive model, the study considers only the bitcoin closing price. It does not
take into account other factors such as bitcoin news, government regulations, and
market sentiment, which may form the project's future scope in order to estimate the price with
greater precision. The forecast is confined to historical data. The ability to forecast on streaming
data would increase the model's performance and predictability. The study compares only
ARIMA, LSTM, and GRU.
Based on these findings, the GRU model for the cryptocurrency under consideration may be
regarded as efficient and dependable. This model is regarded as the best. LSTM and
ARIMA are less accurate than GRU and show significant disparities between the real and
forecasted prices for our dataset.
• GRU predicts bitcoin prices better than LSTM and ARIMA, but all algorithms produce outstanding
predictions overall.
5.3 FUTURE SCOPE
● We will experiment with new machine learning methods in order to improve accuracy.
● We will create a website that contains all of the information about our project.
● To make this project accessible to everyone, we will also deploy it on cloud platforms.
● We will look into other factors that could influence bitcoin market values.
The price volatility of cryptocurrency is affected and determined by factors such as a country's
political system, public relations, and market policy. Other cryptocurrencies such as Ripple,
Ethereum, Litecoin, and others were not examined in our research. We will improve the model
by applying it to these coins, making it more robust. Fuzzification can also be applied at the
input.
REFERENCES