
A Comprehensive Analysis of Machine Learning Models for Algorithmic Trading of Bitcoin

Abdul Jabbar and Syed Qaisar Jalil

arXiv:2407.18334v1 [q-fin.TR] 9 Jul 2024

Abstract—This study evaluates the performance of 41 machine learning models, including 21 classifiers and 20 regressors, in predicting Bitcoin prices for algorithmic trading. By examining these models under various market conditions, we highlight their accuracy, robustness, and adaptability to the volatile cryptocurrency market. Our comprehensive analysis reveals the strengths and limitations of each model, providing critical insights for developing effective trading strategies. We employ both machine learning metrics (e.g., Mean Absolute Error, Root Mean Squared Error) and trading metrics (e.g., Profit and Loss percentage, Sharpe Ratio) to assess model performance. Our evaluation includes backtesting on historical data, forward testing on recent unseen data, and real-world trading scenarios, ensuring the robustness and practical applicability of our models. Key findings demonstrate that certain models, such as Random Forest and Stochastic Gradient Descent, outperform others in terms of profit and risk management. These insights offer valuable guidance for traders and researchers aiming to leverage machine learning for cryptocurrency trading.

Index Terms—Bitcoin, Machine Learning, Trading Strategies

I. INTRODUCTION

The advent of Bitcoin and the subsequent proliferation of cryptocurrencies have not only disrupted traditional financial systems but also introduced novel paradigms in asset trading. Cryptocurrencies, led by Bitcoin, have carved a niche in financial markets, attracting attention from both retail and institutional investors. The allure of high returns, coupled with the inherent volatility of these digital assets, has spurred the development of sophisticated trading strategies. Among these, algorithmic trading, leveraging the prowess of machine learning models, has emerged as a key player in navigating the cryptocurrency market landscape [1].

Bitcoin, the forerunner in this domain, presents a unique blend of challenges and opportunities for traders. Its decentralized nature, coupled with the absence of regulatory oversight, results in significant price fluctuations. This volatility, while posing risks, also creates opportunities for substantial gains, making Bitcoin an attractive asset for algorithmic trading strategies. These strategies, once the domain of sophisticated institutional traders, are now increasingly accessible to a wider audience, thanks to advancements in computational power and machine learning techniques.

The integration of machine learning in trading strategies for Bitcoin and other cryptocurrencies represents a significant shift from traditional trading approaches. Machine learning models offer the capability to process and learn from vast datasets, including historical price movements, trading volumes, and market sentiment. This ability to extract meaningful patterns and insights from complex and often noisy data is crucial in predicting future market behavior and making informed trading decisions [2].

Our research delves into the realm of algorithmic trading for Bitcoin, employing a range of machine learning models. The primary aim is to critically analyze the performance of these models in the context of Bitcoin trading. We explore various aspects of these models, including their predictive accuracy, response to market volatility, and the effectiveness of different feature sets. In doing so, this study sheds light on the nuances of algorithmic trading in the cryptocurrency market and provides a roadmap for traders and investors in navigating this volatile yet potentially lucrative domain.

A key motivation behind this study is the growing interest in cryptocurrency trading and the need for robust trading strategies that can adapt to the dynamic nature of these markets. The extreme volatility of cryptocurrencies, while a deterrent for some, presents fertile ground for algorithmic trading strategies. Machine learning models, with their adaptability and learning capabilities, are well suited to capture the intricacies of these markets. However, it is imperative to critically assess the performance of different machine learning models, each with its unique strengths and limitations, to identify the most effective strategies for cryptocurrency trading.

Our research contributes to the existing literature by providing a comprehensive analysis of various machine learning models in the context of Bitcoin trading. We not only assess the performance of these models but also explore the implications of their results in practical trading scenarios. This includes considerations of market volatility, transaction costs, and other factors that impact the trading outcomes.

The structure of the paper is as follows: In Section 2, we present a detailed literature review, examining previous work in this field and identifying gaps that our study aims to fill. Section 3 describes the methodology, including the data sources, machine learning models employed, and the evaluation criteria used. Section 4 discusses the results, providing an in-depth analysis of the performance of each model. Section 5 offers a discussion on the implications of our findings, both for traders and the broader field of financial machine learning. Finally, Section 6 concludes the paper, summarizing our key findings and suggesting avenues for future research.

Through this comprehensive exploration, our study aims not only to advance the understanding of algorithmic trading in the cryptocurrency sphere but also to provide practical insights that can be leveraged by traders and investors.

Dr. A. Jabbar and Dr. S.Q. Jalil are with Neurog LLP. E-mails: abduljab[email protected], [email protected]
II. LITERATURE REVIEW

The application of machine learning (ML) techniques to predict cryptocurrency prices has garnered significant attention due to the volatile nature of these markets and the potential for substantial financial returns. This section reviews key studies in this domain, highlighting methodologies, findings, and how our study advances the current state of knowledge.

Machine learning algorithms have been extensively employed to predict Bitcoin prices, leveraging their ability to handle large datasets and capture complex patterns. Several studies have explored various ML models and techniques to enhance prediction accuracy. For example, [3] applied Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) algorithms to forecast Bitcoin prices, demonstrating that SVM outperforms KNN in terms of accuracy. This study emphasizes the importance of machine learning in producing more accurate results compared to traditional techniques. Similarly, [4] investigated the prediction of Bitcoin prices using the prices of other cryptocurrencies, such as Ethereum, Zcash, and Litecoin. They employed cointegration analysis, regression models, and ARIMA models to analyze price trends and found that Zcash performed best in forecasting Bitcoin prices without direct Bitcoin price information.

Highlighting the superiority of machine learning over traditional methods, [5] evaluated the forecasting performance of various ML algorithms using high-frequency intraday data. They found that SVM achieved the highest accuracy, outperforming traditional models like ARIMA, especially during market turmoil such as the COVID-19 pandemic. In a different approach, [6] combined a high-end multi-layer perceptron (MLP) with various machine learning techniques to predict Bitcoin prices. This study achieved high prediction accuracies using optimization techniques and classifiers like KNN and SVM.

Several studies have conducted comparative analyses of different ML models to identify the most effective techniques for cryptocurrency price prediction. [7] analyzed various machine learning methods for predicting Bitcoin prices, highlighting the superior prediction accuracy of Artificial Neural Networks (ANN) and SVMs compared to traditional parametric regression approaches. Additionally, [8] evaluated SVM, KNN, and Light Gradient Boosted Machine (LGBM) in predicting price movements of Bitcoin, Ethereum, and Litecoin. They found that KNN outperformed other models on the overall dataset, while SVM and LGBM were better for specific cryptocurrencies. Supporting these findings, [9] compared the effectiveness of Simple Moving Average (SMA) and Radial Basis Function Neural Network (RBFNN) methods. The study demonstrated that RBFNN significantly outperforms SMA, providing a more accurate tool for forecasting Bitcoin prices.

Advanced machine learning techniques, including ensemble methods, have shown promising results in predicting cryptocurrency prices. [10] explored the predictability of major cryptocurrencies using linear models, random forests, and SVMs. The study found that ensemble approaches achieve significant profitability, particularly during bear market periods. Additionally, [11] investigated the predictability of Bitcoin prices using a stacking ensemble model, integrating Random Forest and Generalized Linear Model with Support Vector Regression (SVR) as a meta-learner. The study achieved high predictive accuracy, suggesting the effectiveness of ensemble methods.

Studies have also focused on practical applications and real-world testing of ML models to validate their performance and applicability in actual trading scenarios. [12] applied various ML techniques, including Logistic Regression, SVM, Random Forest, XGBoost, and LightGBM, to predict Bitcoin price movements. The study highlighted the potential of ensemble models in enhancing prediction accuracy and constructing effective trading strategies. Similarly, [13] conducted a comparative analysis of ARIMA, Facebook Prophet, and XGBoost to predict the monthly Bitcoin price rate. The results indicated that Facebook Prophet outperformed the other models, demonstrating high accuracy and reliability. Furthermore, [14] performed a comparative analysis of machine learning models for forecasting next-day cryptocurrency returns. They found that SVMs provided the highest classification accuracy and developed a probability-based trading strategy that significantly outperformed standalone investments.

Some studies have integrated sentiment analysis and technical indicators to improve the accuracy of cryptocurrency price predictions. For instance, [15] applied machine learning and sentiment analysis techniques to predict price movements of major cryptocurrencies. The study leveraged data from Twitter and market data, finding that neural networks outperformed other models. Additionally, [16] investigated the application of ML algorithms to forecast Bitcoin price movements. The study found that Random Forest achieved the highest forecasting performance on continuous datasets, while ANN performed best on discrete datasets.

Various performance metrics have been used to evaluate the effectiveness of ML models in predicting cryptocurrency prices. [17] compared ARIMA, Facebook Prophet, and XGBoost using metrics such as RMSE, MAE, and R-squared. The study demonstrated that ARIMA outperformed the other models, highlighting the importance of preprocessing and feature selection. Similarly, [18] investigated the efficacy of ML algorithms in predicting Bitcoin prices. The study found that RF exhibited the highest forecasting accuracy on continuous datasets, while ANN performed best on discrete datasets.

While existing studies have significantly advanced the field of cryptocurrency price prediction, they often face challenges related to model robustness, overfitting, and the ability to adapt to rapidly changing market conditions. Our study addresses these challenges by integrating both machine learning and trading metrics (e.g., Mean Absolute Error, Root Mean Squared Error, Profit and Loss percentage, Sharpe Ratio) to comprehensively evaluate model performance. Furthermore, our evaluation process includes backtesting on historical data, forward testing on recent unseen data, and real-world testing to ensure robustness and practical applicability. This multi-faceted evaluation approach provides a more thorough assessment of model performance compared to previous studies.

Key findings from our study demonstrate that certain models, such as Random Forest and Stochastic Gradient Descent, outperform others in terms of profit and risk management. These insights offer valuable guidance for traders and researchers aiming to leverage machine learning for cryptocurrency trading, highlighting the practical benefits and improved accuracy of our approach. By incorporating economic indicators and considering practical trading constraints, our study contributes to the development of more efficient and reliable algorithmic trading strategies in the cryptocurrency domain. This comprehensive evaluation framework and the integration of diverse metrics set our study apart from previous research, offering a more robust and practical solution for Bitcoin price prediction and trading.
III. METHODOLOGY

A. Data

In machine learning, the quality and depth of data are critical, especially in complex fields like financial trading. For Bitcoin trading, the challenge is even more pronounced due to the market's relatively recent development and the lack of centralized, comprehensive historical data. To circumvent these challenges, our study leverages a detailed dataset of Bitcoin prices, publicly available since the inception of Bitcoin trading in 2013. This extensive dataset is invaluable for training models capable of recognizing and adapting to a wide spectrum of market conditions, which is essential for developing sophisticated algorithmic trading strategies.

The dataset for this research is meticulously divided into three segments: training, backtesting, and forward testing. The training dataset spans a decade, from January 2013 to January 2023, providing a rich historical context for the models to learn from. This lengthy period is crucial to encompass the diverse range of market behaviors and trends Bitcoin has experienced. The backtesting phase covers six months, from February to July 2023, and is instrumental in evaluating the models on unseen data, thus testing their ability to generalize beyond the training set. This is a crucial step in preventing overfitting. Finally, the forward testing phase, from August to October 2023, serves as a real-world application of the models, ensuring they are tested against new, unencountered data, thereby eliminating any survivorship bias.

To enhance the models' input features, the study incorporates a range of technical indicators alongside the raw pricing data. These indicators include:

• Accumulation/Distribution Index: A volume-based indicator designed to reflect cumulative inflows and outflows of money, providing insights into the strength of a trend based on volume movements.
• Money Flow Index (MFI): This indicator combines price and volume to identify overbought or oversold conditions in an asset, offering a perspective on the intensity of buying or selling pressure.
• Bollinger Bands: A statistical chart characterizing the prices and volatility of an asset over time, which includes a moving average and two standard deviation lines.
• Keltner Channel Width: This encompasses a volatility-based envelope set above and below an exponential moving average of the price, offering insights into potential trend breakouts or reversals.
• Parabolic SAR (Stop and Reverse): This indicator is used to determine the direction of an asset's momentum and the point in time when this momentum has a higher-than-normal probability of switching directions.

Each of these indicators provides a unique lens through which to analyze market trends and movements, and their incorporation is expected to enrich the feature set available for our machine learning models.

Our methodology is further characterized by the use of rolling windows of various sizes: 1, 7, 14, 21, and 28 days. This approach ensures that our models have access to a dynamic, evolving view of market conditions, as each window encompasses the preceding n intervals of data. Such a technique is crucial for models that need to understand and predict market trends over different time horizons. The models that show the highest performance, particularly in terms of profit and loss (PNL) percentage across these windows, are then selected for detailed examination in the subsequent sections of this study.

An important preprocessing step applied to our dataset is the log difference transformation. Mathematically, this can be expressed as:

∆log(P_t) = log(P_t) − log(P_{t−1})

where P_t and P_{t−1} represent the price of Bitcoin at times t and t−1, respectively. This transformation is effective in stabilizing variance, linearizing trends, and introducing stationarity to the dataset, which is crucial for analyzing financial time series where understanding growth rates and temporal changes is important.

Finally, the design of our dataset is intentionally made flexible to accommodate various time intervals. While the primary focus is on a 24-hour trading horizon, the structure is adaptable to different temporal scales. This flexibility showcases the broad applicability of our methodology, suitable for a range of trading frequencies and market conditions.
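As a concrete illustration of the log-difference preprocessing step, the transformation above can be sketched in a few lines. This is an illustrative snippet rather than the authors' code, and the toy prices are invented:

```python
import numpy as np
import pandas as pd

def log_difference(prices: pd.Series) -> pd.Series:
    """Compute the log difference log(P_t) - log(P_{t-1}) of a price series."""
    return np.log(prices).diff()

# Toy daily closes (invented numbers, not real Bitcoin data).
closes = pd.Series([100.0, 110.0, 104.5, 104.5])
returns = log_difference(closes)
# The first value is NaN because P_{t-1} does not exist at t = 0;
# equal consecutive prices yield a log difference of exactly 0.
print(returns.tolist())
```

Because log differences approximate percentage returns for small moves, they put early-2013 and 2023 price regimes on a comparable scale, which is one reason such transformations are common for long financial time series.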
[Fig. 1 flowchart: a Data module (Collection → Preprocessing → Feature Selection → Rolling Windows → Dataset) feeds a Machine Learning module (Development and Training of classifiers and regressors, with Hyperparameter Optimization), which feeds an Evaluation module (Backtest, Forward test, Real-world).]
Fig. 1: Overview of the Methodology: This flowchart illustrates the comprehensive process used in our study, encompassing
three main modules: data, machine learning, and evaluation. The data module includes all the steps from data collection to
dataset creation, preparing the data for use by the machine learning module. The machine learning module covers model
development and training, including hyperparameter optimization for both classifiers and regressors. The evaluation module
involves rigorous backtesting on historical data, forward testing on recent unseen data, and real-world testing to validate model
performance and ensure practical applicability.
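The chronological train/backtest/forward-test split described in Section III-A can be sketched as below. This is an illustrative reconstruction: the exact day-level boundaries, the column name, and the synthetic data are assumptions.

```python
import pandas as pd

# Month ranges as stated in Section III-A (day-level boundaries assumed).
splits = {
    "train":    ("2013-01-01", "2023-01-31"),
    "backtest": ("2023-02-01", "2023-07-31"),
    "forward":  ("2023-08-01", "2023-10-31"),
}

# Synthetic daily frame standing in for the real price history.
data = pd.DataFrame(
    {"close": 1.0},
    index=pd.date_range("2013-01-01", "2023-10-31", freq="D"),
)

# .loc slicing on a DatetimeIndex is inclusive of both endpoints.
segments = {name: data.loc[start:end] for name, (start, end) in splits.items()}
for name, seg in segments.items():
    print(name, seg.index.min().date(), "to", seg.index.max().date())
```

Keeping the three segments strictly non-overlapping and in chronological order is what prevents look-ahead leakage between training and the two evaluation phases.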

B. Machine Learning Models

In this research, a diverse array of machine learning classifiers and regressors has been employed to analyze and predict Bitcoin trading patterns. Each model has been meticulously selected for its unique attributes and potential effectiveness in capturing the complexities of the cryptocurrency market. Classifiers are tasked with determining the trading action—specifically, whether to buy (go long) or sell (go short). In contrast, regressors focus on predicting the magnitude of price changes over specified intervals. To distinguish between the two, we denote classifiers with a suffix 'C' and regressors with 'R'.

1) Classifiers: The following classifiers have been employed:

1) Ada Boost (ABC): This ensemble method combines multiple weak learners to form a stronger model, enhancing performance in varied market conditions.
2) Bagging (BGC): Uses bootstrap aggregating to improve stability and reduce overfitting, crucial in volatile market scenarios.
3) Bernoulli NB (BNBC): Suited for binary classification; effective in scenarios with binary/boolean feature sets.
4) Calibrated CV (CCVC): Improves probability estimation in classification, essential for better trade decision-making.
5) Decision Tree (DTC): Offers a transparent, tree-structured modeling approach, useful for clear interpretation of trading signals.
6) Extra Tree (ETC): A Random Forest variant that introduces more randomness in split decisions, aiming to reduce model overfitting.
7) Gaussian Process (GPC): Excellent for small datasets; captures complex patterns using kernel functions, suitable for nuanced market analysis.
8) K Neighbors (KNC): A non-parametric method that classifies based on the proximity to nearest neighbors, useful in identifying market trends.
9) Linear Discriminant Analysis (LDAC): Effective in finding linear combinations of features for class separation, suitable for linearly separable market data.
10) Linear SVC (LSVC): Applies Support Vector Classification in scenarios with linear separability, efficient for clear market trend data.
11) Logistic Regression (LRC): A fundamental model for binary classification, ideal for straightforward buy or sell decisions.
12) Logistic Regression CV (LRCVC): Integrates logistic regression with cross-validation, optimizing for the best model parameters.
13) MLP (MLPC): A neural network-based model, capable of capturing complex, non-linear relationships in market data.
14) Passive Aggressive (PAC): Suitable for large-scale learning; it updates models based on prediction errors, adapting swiftly to market changes.
15) Perceptron (PC): A simple yet effective linear classifier for large datasets, efficient in handling vast market data.
16) Quadratic Discriminant Analysis (QDAC): Assumes Gaussian distribution for class separation, effective in markets exhibiting normal distribution patterns.
17) Random Forest (RFC): An ensemble of decision trees, known for high accuracy and robustness against overfitting in complex market environments.
18) Ridge (RC): A linear model with L2 regularization, adept at handling multicollinearity in financial datasets.
19) SGD (SGDC): Utilizes stochastic gradient descent for optimized computational efficiency, crucial in high-frequency trading scenarios.
20) SVC (SVC): Versatile in handling both non-linear and high-dimensional data, adaptable to various market conditions.
21) Radius Neighbors (RNC): Classifies based on a fixed radius, useful in spatial or locality-based market analyses.

2) Regressors: The regressors used in this study are as follows:

1) Ada Boost (ABR): Applies an ensemble technique focusing on challenging data points, enhancing accuracy in regression tasks.
2) Bagging (BGR): Employs bootstrap sampling to create multiple models, reducing variance and improving predictions in regression.
3) Decision Tree (DTR): An interpretable model for regression, useful in capturing non-linear relationships in price movements.
4) Extra Tree (ETR): Improves on Random Forest by randomizing decision trees, enhancing regression performance in unpredictable markets.
5) Gaussian Process (GPR): Ideal for small datasets with complex patterns; offers probabilistic outputs beneficial for risk assessment.
6) K Neighbors (KNR): Predicts values based on the proximity of neighbors, effective in markets with spatial correlation.
7) Linear SVR (LSVR): Adapts Support Vector Regression for linear contexts, efficient in markets with linear price movements.
8) MLP (MLPR): A neural network approach for modeling complex regression patterns in financial markets.
9) Random Forest (RFR): Known for high accuracy in regression, leveraging an ensemble of decision trees to predict price changes.
10) Ridge (RR): Utilizes L2 regularization to mitigate overfitting in regression, essential for stable financial predictions.
11) SGD (SGDR): Implements stochastic gradient descent for efficient regression analysis in large datasets.
12) SVR (SVRR): A versatile kernel-based method, effective for both linear and non-linear regression tasks in trading.
13) ARD (ARDR): Uses Automatic Relevance Determination to adapt regression models to the inherent structure of the data.
14) Bayesian Ridge (BRR): Combines ridge regression with Bayesian inference, offering flexible modeling in uncertain market conditions.
15) Gradient Boosting (GBR): Constructs an additive model in a forward stage-wise fashion, useful in progressive market trend analysis.
16) Lars (LaR): Efficient in high-dimensional data regression, providing solutions along a regularization path.
17) Linear Regression (LiR): The foundational regression model, establishing linear relationships between market variables.
18) RANSAC (RanR): Fits models robustly to subsets of data, effectively dealing with outliers in financial datasets.
19) Theil Sen (TSR): A non-parametric approach resilient to outliers, suitable for complex multivariate regression in trading.
20) Radius Neighbors (RNR): Utilizes a fixed radius for neighborhood-based regression, applicable in spatially correlated market environments.

The selection of these diverse models is based on their established effectiveness in predictive modeling, particularly in financial markets where accuracy, adaptability, and robustness are of utmost importance. This wide range of models ensures a comprehensive analysis, allowing us to identify the most effective strategies for Bitcoin trading prediction.

C. Rolling Windows and Training Process

The concept of rolling windows is pivotal in time series analysis, especially in financial markets where data is sequential and market conditions are dynamic. A rolling window approach involves using a window of a fixed size that moves through the dataset over time. For each position of the window, a subset of data is selected, which is then used for training the model. This technique is crucial in capturing the evolving nature of financial markets, as it allows models to learn from the most recent trends and patterns.

In the context of machine learning for Bitcoin trading, rolling windows are essential for several reasons. Firstly, they enable models to adapt to changing market conditions, which is crucial in a volatile market like Bitcoin. By training on the most recent data, the models stay updated with current market dynamics, enhancing their predictive accuracy. Secondly, rolling windows help in mitigating the risk of overfitting. Models trained on a specific period might perform well on that period but fail to generalize to new data. By continuously updating the training dataset, rolling windows ensure that models are not overly tuned to a specific historical period.

In this study, five different rolling window sizes were used: 1, 7, 14, 21, and 28 days. These sizes were chosen to capture various market dynamics, from short-term fluctuations to longer-term trends. Each window size provides a different perspective on the data, allowing models to learn patterns and trends over different time horizons. For instance, a 1-day window focuses on very short-term movements, while a 28-day window captures broader market trends.

Each machine learning model in our study was trained against each rolling window size. This process involved sequentially moving the window through the entire dataset, training the model on the data within the window at each step. For example, with a 7-day window, the model would be trained on data from days 1 to 7, then on data from days 2 to 8, and so on, until the end of the dataset. This approach ensures that each model is exposed to a wide range of market conditions, enhancing its ability to generalize and adapt.

The use of multiple window sizes allows us to analyze the performance of each model under different market conditions. It provides insights into which models are better at capturing short-term trends versus long-term trends. This is particularly important in Bitcoin trading, where market conditions can change rapidly. Models that perform well across multiple window sizes are likely to be more robust and versatile, making them more reliable for real-world trading applications.

After training, each model's performance was evaluated based on its predictive accuracy within each window. The model with the highest performance in terms of predictive accuracy and profitability (PNL) for each window size was then selected for further analysis. This approach allows us to identify the most effective models for Bitcoin trading, considering both short-term and long-term market behaviors.
D. Hyperparameter Optimization

Hyperparameters are the configurable settings used to tune the performance of machine learning models. Unlike model parameters, which are learned during training, hyperparameters are set prior to the training process and can have a significant impact on the effectiveness of the models. Proper hyperparameter optimization is critical in machine learning, particularly in financial applications like Bitcoin trading, where the optimal model configuration can substantially influence predictive accuracy and profitability.

For the purpose of hyperparameter optimization in this study, we employed Optuna [19], an open-source hyperparameter optimization framework. Optuna is designed for automating the process of finding the best hyperparameters, making it an ideal tool for our complex machine learning tasks. It uses a Bayesian optimization technique to search the hyperparameter space efficiently, focusing on combinations that are more likely to yield better model performance. This approach is especially beneficial given the large number of models and the extensive range of hyperparameters involved in our study.

In our implementation with Optuna, each model underwent 100 trials of hyperparameter tuning. In each trial, Optuna varied the hyperparameters within predefined ranges, searching for the combination that maximized the model's performance. The hyperparameters varied included learning rates, regularization strengths, the number of layers and neurons in neural network models, and other model-specific parameters. The variation in these hyperparameters was guided by Optuna's optimization algorithm, which adapted its search strategy based on the results of previous trials, thereby progressively homing in on the most promising hyperparameter values.

The primary metric for evaluating the performance of the models during the hyperparameter optimization process was the Profit and Loss (PNL) percentage. PNL was chosen as it directly reflects the financial efficacy of the models in trading scenarios. For each model, the hyperparameter combination that yielded the highest PNL percentage during the backtesting phase was identified as the optimal set. This approach ensured that the selected hyperparameters were not only statistically effective but also financially practical in terms of trading performance.

The optimization of hyperparameters is particularly important in the volatile and unpredictable domain of Bitcoin trading. Bitcoin markets exhibit unique characteristics and can behave differently from traditional financial markets. Therefore, fine-tuning the models to adapt to these idiosyncrasies through hyperparameter optimization is essential to achieve the best possible predictive performance.

E. Backtest and Forward Test Procedures

In financial machine learning applications, backtesting and forward testing are crucial steps for evaluating the effectiveness and robustness of models. Backtesting involves testing the models against historical data to assess their performance, [...] October 2023, served as a real-world test bed to assess the models' performance on new, unseen data.

IV. RESULTS AND DISCUSSION

A. Evaluation Metrics

In the domain of algorithmic trading, the performance of classifiers and regressors is quantified through a series of established metrics. Each metric provides unique insights into the model's predictive accuracy, risk management, and overall economic viability. Below is a detailed explanation of each metric employed in this study:

• Profit and Loss (PNL) Percentage: This metric measures the total percentage gain or loss of a trading strategy over a specified period. It is calculated by summing up individual trade outcomes (profit or loss) and dividing by the total investment. A positive PNL indicates profitability, while a negative PNL suggests a loss.
• Sharpe Ratio: Named after Nobel laureate William F. Sharpe, this ratio is used to understand the return of an investment compared to its risk. It is calculated by subtracting the risk-free rate of return from the average return of the investment and dividing the result by the investment's standard deviation. A higher Sharpe Ratio indicates a more desirable risk-adjusted return [20].
• R-squared (R²): R² is a statistical measure that represents the proportion of the variance of a dependent variable that is explained by an independent variable or variables in a regression model. An R² of 1 indicates that the regression predictions perfectly fit the data.
• Accuracy: In classification tasks, accuracy is the fraction of predictions the model got right, defined as the number of correct predictions divided by the total number of predictions. It is a useful metric when the classes in the dataset are nearly balanced.
• F1 Score: The F1 score is the harmonic mean of precision and recall and is particularly useful when the class distribution is imbalanced. It is calculated as 2 times the product of precision and recall divided by the sum of precision and recall.
• Precision: Precision is defined as the number of true positives divided by the number of true positives plus the number of false positives. It is a measure of a classifier's exactness. A high precision relates to a low false positive rate.
• Recall: Recall, also known as sensitivity or true positive rate, is the number of true positives divided by the number of true positives plus the number of false negatives. It is
while forward testing (also known as paper trading) tests the a measure of a classifier’s completeness.
models on more recent, unseen data to evaluate how well they • Mean Absolute Error (MAE): For regression models,
might perform in real-world trading scenarios. MAE is a metric that sums the absolute differences
For the purpose of this research, the dataset was divided between predicted and actual values and then takes the
into three distinct segments: training, backtesting, and forward average. It gives an idea of how wrong the predictions
testing. The training set, spanning from January 2013 to were in terms of an average amount.
January 2023, was used to train the models. The backtesting • Mean Squared Error (MSE): MSE is the average of
phase covered data from February to July 2023, providing a the squares of the errors of the predictions. It penalizes
recent historical dataset to evaluate the trained models. The larger errors more than smaller ones, due to squaring each
forward testing phase, encompassing data from August to difference.
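The tuning loop described above (100 trials per model, each proposing hyperparameters from predefined ranges and scored by backtest PNL) can be sketched in a dependency-free form. Plain random search stands in here for Optuna's adaptive sampler, and `backtest_pnl` is a hypothetical stand-in for the study's full backtest, with a toy scoring surface of our own:

```python
import random

def backtest_pnl(learning_rate, reg_strength):
    """Hypothetical scoring function: in the actual study this would run a
    full backtest of the model and return its PNL percentage."""
    # Toy, smooth PNL surface with an optimum near lr=0.01, reg=0.1.
    return 100 - 1e4 * (learning_rate - 0.01) ** 2 - 50 * (reg_strength - 0.1) ** 2

def tune(n_trials=100, seed=0):
    """Run n_trials, each proposing hyperparameters from predefined ranges,
    and keep the combination with the highest backtest PNL."""
    rng = random.Random(seed)
    best_params, best_pnl = None, float("-inf")
    for _ in range(n_trials):
        params = {
            # Log-uniform range for the learning rate, uniform for regularization.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "reg_strength": rng.uniform(0.0, 1.0),
        }
        pnl = backtest_pnl(**params)
        if pnl > best_pnl:
            best_params, best_pnl = params, pnl
    return best_params, best_pnl

best_params, best_pnl = tune()
```

With Optuna itself, the loop becomes an `objective(trial)` that draws values via `trial.suggest_float(...)` and is passed to `optuna.create_study(direction="maximize").optimize(objective, n_trials=100)`; Optuna's sampler then adapts each trial's proposals to the results of previous trials rather than sampling blindly, as described in [19].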
Classifier | Rolling window | Backtest: PNL (%), Sharpe, R2, Accuracy, F1 score, Precision, Recall, No. of Trades | Forwardtest: PNL (%), Sharpe, R2, Accuracy, F1 score, Precision, Recall, No. of Trades
AdaBoostClassifier 21 89.26 6.47 0.92 0.53 0.64 0.51 0.88 55 -9.24 -0.81 0 0.49 0.59 0.48 0.77 40
BaggingClassifier 28 121.73 7.17 0.89 0.6 0.65 0.57 0.74 74 -21.67 -2.78 0.5 0.53 0.59 0.51 0.7 40
BernoulliNB 21 113.31 6.14 0.89 0.58 0.59 0.56 0.63 106 29.78 5.17 0.84 0.52 0.53 0.5 0.56 38
CalibratedClassifierCV 28 92 7.59 0.86 0.52 0.66 0.51 0.94 24 -2.78 0.05 0.11 0.42 0.54 0.44 0.72 30
DecisionTreeClassifier 28 62.22 3.81 0.62 0.5 0.63 0.5 0.88 41 -8.17 -1.12 0.28 0.44 0.57 0.45 0.77 28
ExtraTreeClassifier 28 103.73 6.14 0.9 0.57 0.65 0.54 0.82 49 12.16 2.46 0.22 0.49 0.58 0.48 0.74 31
GaussianProcessClassifier 21 47.36 3.42 0.57 0.52 0.58 0.5 0.69 90 -24.72 -3.37 0.52 0.43 0.46 0.42 0.51 40
KNeighborsClassifier 28 103.84 5.44 0.96 0.56 0.57 0.55 0.61 95 -5.14 0 0.57 0.49 0.47 0.47 0.47 51
LinearDiscriminantAnalysis 28 88.7 4.65 0.85 0.51 0.58 0.5 0.67 82 -6.84 -0.23 0.24 0.5 0.54 0.48 0.6 49
LinearSVC 28 73.36 4.12 0.81 0.53 0.55 0.52 0.58 102 -6.63 -0.21 0.49 0.5 0.54 0.48 0.6 49
LogisticRegression 28 97.44 5.43 0.87 0.53 0.59 0.52 0.69 94 -20.12 -1.96 0.34 0.49 0.52 0.47 0.58 51
LogisticRegressionCV 28 111.04 6.57 0.81 0.56 0.68 0.53 0.96 36 6.81 1.47 0.32 0.52 0.63 0.5 0.86 31
MLPClassifier 28 112.04 4.94 0.77 0.55 0.62 0.53 0.75 74 -28.13 -3.23 0.46 0.43 0.51 0.44 0.63 41
PassiveAggressiveClassifier 21 83.33 4.94 0.84 0.53 0.57 0.5 0.66 87 -40.23 -5.67 0.84 0.36 0.41 0.36 0.47 42
Perceptron 28 75.61 4.84 0.9 0.58 0.6 0.57 0.64 90 -55.87 -8.62 0.85 0.37 0.33 0.33 0.33 35
QuadraticDiscriminantAnalysis 21 90.09 12.36 0.89 0.53 0.66 0.5 0.97 28 -3.98 -0.16 0.53 0.51 0.61 0.49 0.79 29
RandomForestClassifier 21 87.75 6.3 0.79 0.53 0.65 0.5 0.92 40 15.38 8.68 0.84 0.52 0.66 0.5 0.98 10
RidgeClassifier 28 94.36 4.17 0.78 0.51 0.59 0.5 0.71 74 -7.02 -0.23 0.35 0.48 0.53 0.47 0.63 51
SGDClassifier 28 104.29 5.16 0.87 0.52 0.57 0.51 0.64 90 -15.64 -1.4 0.68 0.49 0.53 0.47 0.6 49
SVC 28 106.92 5.24 0.81 0.52 0.63 0.51 0.84 60 -12.8 -1.34 0.85 0.46 0.56 0.46 0.72 35
RadiusNeighborsClassifier 1 26.97 13.22 0.8 0.46 0.63 0.46 0.99 6 12.78 5.08 0.08 0.49 0.65 0.48 0.98 6
TABLE I: Performance Metrics of Classifiers: A Comparative Analysis of Backtest and Forwardtest Results
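As a concrete reference, the trading and classification metrics reported in Table I can be computed from raw per-trade returns and label predictions. A minimal stdlib-only sketch (the function names and toy inputs are ours, not the paper's):

```python
import statistics

def pnl_percent(trade_returns):
    """Total PNL percentage: sum of individual trade outcomes divided by
    the (here, unit) total investment, expressed in percent."""
    return 100 * sum(trade_returns)

def sharpe_ratio(returns, risk_free=0.0):
    """(mean return - risk-free rate) / standard deviation of excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

def precision_recall_f1(y_true, y_pred):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 as their
    harmonic mean, for binary labels (1 = up-move, 0 = down-move)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, three trades returning +2%, -1%, and +3% on a unit investment give a PNL of 4%, and a prediction run with two true positives, one false positive, and one false negative gives precision, recall, and F1 of 2/3 each.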
• Root Mean Squared Error (RMSE): RMSE is the square root of the mean of the squared errors. It is commonly used in regression analysis to verify experimental results, and like MSE, gives more weight to larger errors.
• Number of Trades: This metric indicates the count of trades executed based on the model's recommendations. It can provide an understanding of the model's trading frequency and has implications for transaction costs and market liquidity.

These metrics collectively provide a holistic view of the models' performance, enabling us to not only assess the profitability and accuracy of predictions but also to gauge the risk and reliability of the trading strategies derived from the models.

These metrics were chosen to provide a comprehensive evaluation of the models' performance. PNL, Sharpe Ratio, and Number of Trades directly relate to the financial effectiveness of the models. In contrast, R2, Accuracy, F1 Score, Precision, Recall, MAE, MSE, and RMSE offer insights into the models' predictive accuracy and error characteristics. A combination of these metrics allows for a balanced assessment, considering both financial viability and statistical accuracy.

B. Classifier Results Interpretation

Table I provides a quantitative evaluation of classifier models over two distinct phases: backtesting and forward testing. The performance of each classifier is contextualized by a set of metrics, and the rolling window sizes are instrumental in capturing temporal market dynamics. The top-performing models in each phase are highlighted, indicating their superior ability to navigate the complexities of market prediction.

1) Backtest Insights: The backtest phase reveals the intrinsic strength of the classifiers when applied to historical data. For instance, the highlighted BaggingClassifier, with a rolling window of 28 days, achieved an exceptional PNL, suggesting that its ensemble approach is particularly suited to grasp long-term trends. Conversely, the BernoulliNB classifier demonstrates a high degree of precision in the shorter rolling window of 21 days, indicating its potential effectiveness in short-term market movement prediction. The MLPClassifier's balanced metrics, particularly its F1 score, suggest a well-tuned model that avoids overfitting, evidenced by its ability to maintain high precision and recall.

2) Forward Test Observations: The forward testing phase is critical for assessing the real-world applicability of the classifiers. The Random Forest Classifier, which maintained a consistent performance across both phases, indicates not just a strong fit to the data but also adaptability to evolving market conditions. The sharp increase in Sharpe Ratio for the Quadratic Discriminant Analysis and RadiusNeighborsClassifier from backtest to forward test underscores their potential for yielding profitable strategies when applied in real-time, despite their less impressive backtest PNL. These results underscore the importance of evaluating models on unseen data to gauge their practical utility.

3) Rolling Window and Model Responsiveness: The varying rolling window sizes play a significant role in the classifiers' ability to capture different market conditions. Larger windows may allow classifiers to integrate longer-term trends into their predictions, which can be crucial for capturing macroeconomic movements that affect asset prices. Smaller windows, on the other hand, may enable classifiers to react more quickly to short-term market volatility, which could be advantageous in rapidly changing trading environments.

4) Interpreting the Discrepancies Between Backtest and Forward Test Results: The highlighted models exhibit varied performances when transitioning from backtest to forward test environments. Such discrepancies may stem from overfitting
Regressor | Rolling window | Backtest: PNL (%), Sharpe, R2, MAE, MSE, RMSE, No. of Trades | Forwardtest: PNL (%), Sharpe, R2, MAE, MSE, RMSE, No. of Trades
AdaBoostRegressor 28 94.69 7.62 0.79 0.0183 0.0007 0.0255 25 9.68 2.73 0.6 0.0117 0.0004 0.0198 16
BaggingRegressor 21 102.04 6.56 0.92 0.0179 0.0006 0.0251 88 11.01 1.99 0.01 0.0128 0.0004 0.0205 52
DecisionTreeRegressor 21 97.31 6.34 0.92 0.0181 0.0006 0.0252 64 13.41 2.39 0.1 0.0122 0.0004 0.0198 38
ExtraTreeRegressor 28 101.03 4.92 0.7 0.0183 0.0007 0.0255 80 -6.84 -0.38 0.81 0.0121 0.0004 0.0201 39
GaussianProcessRegressor 28 90.8 4.89 0.73 0.0183 0.0007 0.0256 20 -0.05 -1000 0 0.0118 0.0004 0.0198 0
KNeighborsRegressor 28 106.01 6.71 0.94 0.0186 0.0006 0.0255 76 11.62 2.09 0.13 0.0133 0.0004 0.0204 41
LinearSVR 21 71.57 4.7 0.86 0.0181 0.0007 0.0256 93 24.49 4.13 0.3 0.0124 0.0004 0.0199 38
MLPRegressor 28 76.92 4.6 0.86 0.1229 0.0236 0.1536 88 -20.99 -2.96 0.71 0.229 0.0768 0.2771 34
RandomForestRegressor 28 84.01 14.67 0.91 0.0183 0.0007 0.0257 8 3.38 2.72 0.01 0.0117 0.0004 0.0198 10
Ridge 21 37.35 2.42 0.45 0.0197 0.0007 0.0264 84 20.92 3.76 0.56 0.0163 0.0005 0.0221 45
SGDRegressor 28 81.28 5.06 0.87 0.0184 0.0007 0.0256 63 34.01 5.34 0.8 0.0117 0.0004 0.0195 38
SVR 7 76.74 4.86 0.73 0.0272 0.0013 0.0355 81 -24.45 -2.81 0.61 0.0261 0.0012 0.0341 50
ARDRegression 28 76.33 4.99 0.8 0.0183 0.0007 0.0257 42 17.54 3.6 0.12 0.0119 0.0004 0.0199 13
BayesianRidge 28 59.04 4.67 0.85 0.0185 0.0007 0.0256 47 15.2 2.67 0.09 0.0118 0.0004 0.0195 29
GradientBoostingRegressor 28 80.81 6.44 0.83 0.0185 0.0006 0.0254 30 0.43 2.36 0.05 0.0123 0.0004 0.02 6
Lars 21 48.78 3.18 0.74 0.0461 0.004 0.063 105 31.69 4.88 0.66 0.0525 0.0051 0.0711 42
LinearRegression 28 48.98 3.14 0.5 0.1156 0.0211 0.1452 79 27.64 4.57 0.66 0.1197 0.0231 0.1519 38
RANSACRegressor 21 47.17 2.97 0.57 0.1399 0.0316 0.1777 97 -6.12 -0.16 0.08 0.1487 0.0345 0.1857 49
TheilSenRegressor 7 81.96 4.45 0.75 0.1429 0.0474 0.2176 70 -7.02 -0.42 0.73 0.1724 0.0564 0.2374 31
RadiusNeighborsRegressor 1 42.09 3.65 0.67 0.0175 0.0007 0.0255 42 1.69 1.12 0.16 0.0126 0.0004 0.0211 20
TABLE II: Performance Metrics of Regressors: Evaluating Predictive Strength Across Market Conditions
to historical data patterns that do not extrapolate well into future market states. The BaggingClassifier, while performing optimally in backtesting, shows a decrease in PNL during forward testing. This could indicate a model finely tuned to past conditions but less adaptable to unforeseen market shifts. In contrast, the Random Forest Classifier demonstrates robustness, with a more consistent PNL, suggesting a model that captures underlying market drivers that persist over time.

5) Assessing Model Robustness and Economic Significance: Robustness in financial models is demonstrated by consistent performance across both backtesting and forward testing. Economic significance, however, is derived from the model's ability to produce actionable insights leading to profitable trades. The BernoulliNB classifier, for instance, maintains a high PNL in both phases, reinforcing its potential for real-world application. The Sharpe Ratios, especially in forward test results, reflect the models' capabilities to deliver returns above the risk-free rate, which is crucial for long-term investment strategies.

C. Regressor Results Interpretation

In parallel, Table II lays out the regressors' performance, where the highlighted models exhibit noteworthy predictive power. Each regressor is scrutinized under metrics that collectively portray its predictive accuracy and economic impact.

1) Backtest Insights: During the backtest period, the SGDRegressor distinguished itself with a notable PNL and the highest Sharpe Ratio, suggesting effective risk management combined with profitability. This is further corroborated by its relatively high R2 value, reflecting the model's capability to capture the variance in price movement effectively. The GradientBoostingRegressor and Lars, both highlighted for their substantial PNL, also demonstrate solid R2 scores, which points to their models' good explanatory power.

2) Forward Test Observations: Transitioning to forward testing, the SGDRegressor maintains a strong performance, indicating robustness and potential for real-world application. The Lars regressor shows an increase in both PNL and Sharpe Ratio, suggesting that its simpler, linear approach is well-suited for the forward test market conditions. The consistency in the performance of the RadiusNeighborsRegressor, with minimal trades, accentuates its precision in trade selection, which is vital for strategies aiming to minimize transaction costs.

3) Rolling Window and Model Predictive Dynamics: The regressors' results highlight the significance of selecting an appropriate rolling window size, which directly influences their ability to assimilate and predict based on the market's historical data. The rolling window's impact is evident in the models' varied performance across the two testing phases, with different window lengths aligning with specific market behaviors that the models have learned to predict.

4) Analysis of Regressor Robustness: The robustness of regressors is evaluated through their ability to maintain predictive accuracy from backtesting to live-market forward testing. The SGDRegressor, with its high Sharpe Ratio and consistent PNL, exemplifies a model with a stable foundation, likely to withstand market volatilities. The Rolling Window's significance is evident in the models' ability to incorporate relevant market data into their predictive framework, with longer windows capturing more extensive market trends.

5) Economic and Predictive Implications: The economic implications of the regressors' performance are multifaceted. A high PNL is desirable but must be coupled with low predictive error metrics, such as MAE and RMSE, to be economically significant. The Lars model, for instance, illustrates this with an improved Sharpe Ratio and a lower RMSE in forward testing, suggesting a model that not only forecasts
[Figure 2: PNL (%) over a monthly timeline for the top classifiers (BGC, BNBC, MLPC, RFC, RNC), with shaded Backtest, Forwardtest, and Real-World phases.]
Fig. 2: Profit and Loss (PNL) Trajectories of Top Classifiers in Real-World Trading Scenarios
accurately but does so with economic prudence.

D. Closing Evaluation

The detailed analysis of classifiers and regressors underscores the multifaceted nature of financial prediction. The highlighted models in the tables provide a benchmark for what can be achieved with careful tuning and selection of rolling windows. These results emphasize the necessity of a comprehensive evaluation framework that incorporates a variety of performance metrics to assess model efficacy thoroughly. The findings from the backtest and forward test phases offer invaluable insights for developing resilient trading strategies capable of adapting to the ever-evolving patterns of financial markets.

E. Hyperparameter Optimization: Tuning for Peak Performance

In the quest for optimal model performance, hyperparameter optimization serves as the fine-tuning process that can make or break the predictive power of classifiers and regressors. The hyperparameter optimization for classifiers was meticulously performed using advanced techniques that explored the depth and breadth of the parameter space, striking a balance between model complexity and generalization capability. The regressors underwent a similar process, with each model's unique parameters adjusted to navigate the intricate landscape of financial time series forecasting. This iterative and methodical approach ensured that the final model configurations were not just suited to historical patterns but were also robust and flexible enough to adapt to new, unseen market data.

F. Analysis of Top Models on Real-World Data

An empirical evaluation of the top-performing classifiers was conducted to assess their ability to generalize beyond backtesting and forward testing scenarios. This analysis is crucial to determine the models' viability in live-market conditions, where unpredictability and external factors play a significant role.

1) Interpreting Classifier Performance in the Real World: Figure 2 illustrates the Profit and Loss (PNL) trajectories of the top classifiers over a timeline that spans backtesting, forward testing, and into the real-world application phase. Each line represents the PNL progression of a model, providing insights into their performance stability and adaptability to real market conditions.

The shaded areas—red for backtesting, green for forward testing, and blue for the real-world phase—contextualize the timeline of each model's deployment. Across the transition from controlled testing environments to the real world, the following observations are made:
• Consistency of Performance: The models that maintain a steady trajectory from backtesting through to real-world trading, such as the Random Forest Classifier (RFC), indicate a strong ability to adapt to evolving market conditions without overfitting to historical data.
• Adaptability to Market Shifts: Some models, like the Multi-Layer Perceptron Classifier (MLPC), show resilience in the face of market volatility, as evidenced by their PNL performance remaining robust or improving when transitioning to real-world trading.
• Real-World Viability: The Bagging Classifier (BGC) and BernoulliNB Classifier (BNBC) demonstrate significant real-world viability, highlighted by their sustained PNL levels in the live market phase. This suggests that these models have captured fundamental market drivers that are applicable in ongoing trading.
• Economic Significance: The Ridge Classifier (RNC), while showing a dip in the forward test phase, recovers in the real-world application, pointing to economic strategies embedded within the model that may only become evident under actual market pressures.
• Volatility and Risk Management: The volatility in the PNL trajectories for some classifiers indicates the varying risk profiles and the models' sensitivity to market fluctuations. Effective risk management strategies are imperative for these models to ensure that high volatility does not erode profitability.

The detailed visualization of PNL trajectories in Figure 2 serves as a testament to the models' capabilities and provides a predictive lens through which investors can gauge the potential success of deploying these models in live trading scenarios. The analysis confirms that while backtest and forwardtest performances are indicative, the ultimate test for any trading model lies in its real-world application.

[Figure 3: PNL (%) over a monthly timeline for the top regressors (KNR, BGR, ETR, SGDR, LaR, LiR), with shaded Backtest, Forwardtest, and Real-World phases.]

Fig. 3: Real-World Profit and Loss (PNL) Performance of Top Regressors

2) Real-World Performance of Regressors: Figure 3 presents the PNL performance of selected regressor models as they transition from the controlled environments of backtesting and forward testing into actual market deployment. The PNL trajectories provide a longitudinal view of each model's ability to navigate and capitalize on real market trends.

The shaded regions represent different evaluation phases: backtesting (red), forward testing (green), and the real-world trading period (blue). The regressors' performance trends across these phases offer a multifaceted perspective on their predictive capabilities and economic utility:
• KNeighborsRegressor (KNR) displays a relatively stable PNL during backtesting, which declines during forward testing but shows recovery in real-world conditions. This pattern suggests a sensitivity to market conditions that may require adaptive parameter adjustments or dynamic feature selection to maintain profitability.
• BaggingRegressor (BGR) and ExtraTreesRegressor (ETR) both demonstrate high PNL in the backtest phase, with the BGR maintaining this performance in the forward test phase, indicating a robust model less prone to overfitting and capable of capturing persistent market signals.
• Stochastic Gradient Descent Regressor (SGDR) shows a consistent increase in PNL across all phases, highlighting its strength in adapting to new data. Its performance in the real-world phase, in particular, underscores the potential of SGD-based models for financial time series forecasting.
• Lasso Regression (LaR) and Linear Regression (LiR) exhibit significant PNL volatility post-backtesting. The divergence in their PNL during the real-world phase could reflect their varying degrees of regularization and feature weighting, which impact their ability to handle non-stationary market data.
• Performance Fluctuations: The fluctuations and drops in PNL for some models from backtesting to real-world application highlight the challenges of model generalization and the impact of market volatility. These variations call for ongoing model recalibration and robust risk management strategies to mitigate potential drawdowns.

The PNL trajectories in Figure 3 underscore the importance of rigorous model evaluation. Models that demonstrate resilience and adaptability in forward testing are more likely to perform well in real-world trading, but the ultimate litmus test for any trading strategy is its ability to sustain profitability in the live market. This graph illustrates not only the successes but also the limitations of the tested regressors, guiding future model refinement and the development of adaptive trading systems.

V. CONCLUSION

This study evaluated the performance of 41 machine learning models, comprising 21 classifiers and 20 regressors, for Bitcoin price prediction in algorithmic trading. Through rigorous backtesting, forward testing, and real-world testing, we identified that models like Random Forest and Stochastic Gradient Descent exhibit superior performance in terms of profit and risk management. The integration of both machine learning metrics (e.g., Mean Absolute Error, Root Mean Squared Error) and trading metrics (e.g., Profit and Loss percentage, Sharpe Ratio) provided a comprehensive assessment of model performance.

Our findings underscore the necessity for a multi-faceted evaluation approach to ensure the practical utility of trading models. Many models that performed well in backtesting did not translate effectively to forward tests and real-world scenarios, highlighting the limitations of relying solely on backtesting. By incorporating economic indicators and considering practical trading constraints, our study offers a robust and practical solution for Bitcoin price prediction and trading. Future research should extend these findings to other cryptocurrencies and investigate the impact of different economic indicators on model performance. Additionally, exploring emerging machine learning techniques can further enhance predictive accuracy and trading effectiveness. This study provides valuable insights for traders and researchers aiming to leverage machine learning for more strategic and profitable cryptocurrency trading. Future work will also focus on refining our multi-faceted evaluation framework and exploring its application in different market conditions to further validate and improve the robustness of trading models.

REFERENCES

[1] F. Fang, C. Ventre, M. Basios, L. Kanthan, D. Martinez-Rego, F. Wu, and L. Li, "Cryptocurrency trading: a comprehensive survey," Financial Innovation, vol. 8, no. 1, p. 13, 2022.
[2] F. Dakalbab, M. A. Talib, Q. Nassir, and T. Ishak, "Artificial intelligence techniques in financial trading: A systematic literature review," Journal of King Saud University-Computer and Information Sciences, p. 102015, 2024.
[3] P. Nagamani, G. J. Anand, S. G. Prasanna, B. S. Raju, and M. S. Satish, "Bitcoin price prediction using machine learning algorithms," in Second International Conference on Emerging Trends in Engineering (ICETE 2023). Atlantis Press, 2023, pp. 389–396.
[4] N. Maleki, A. Nikoubin, M. Rabbani, and Y. Zeinali, "Bitcoin price prediction based on other cryptocurrencies using machine learning and time series analysis," Scientia Iranica, 2020.
[5] E. Akyildirim, O. Cepni, S. Corbet, and G. S. Uddin, "Forecasting mid-price movement of bitcoin futures using machine learning," Annals of Operations Research, pp. 1–32, 2021.
[6] S. Ziweritin, "Height-end multi-layer perceptron and machine learning methods of forecasting bitcoin price time series," Authorea Preprints, 2023.
[7] R. Siva, B. Subrahmanian et al., "Analyzing the machine learning methods to predict bitcoin pricing," World Journal of Advanced Research and Reviews, vol. 21, no. 1, pp. 1288–1294, 2024.
[8] L. Al Hawi, S. Sharqawi, Q. A. Al-Haija, and A. Qusef, "Empirical evaluation of machine learning performance in forecasting cryptocurrencies," Journal of Advances in Information Technology, vol. 14, no. 4, 2023.
[9] M. A. S. Yudono, A. D. W. M. Sidik, I. H. Kusumah, A. Suryana, A. P. Junfithrana, A. Nugraha, M. Artiyasa, E. Edwinanto, and Y. Imamulhak, "Bitcoin usd closing price (btc-usd) comparison using simple moving average and radial basis function neural network methods," FIDELITY: Jurnal Teknik Elektro, vol. 4, no. 2, pp. 29–34, 2022.
[10] H. Sebastião and P. Godinho, "Forecasting and trading cryptocurrencies with machine learning under changing market conditions," Financial Innovation, vol. 7, pp. 1–30, 2021.
[11] S. A. Gyamerah, "Are bitcoins price predictable? evidence from machine learning techniques using technical indicators," arXiv preprint arXiv:1909.01268, 2019.
[12] S. Yao, D. Ma, and Y. Zhang, "Prediction of bitcoin price movements based on machine learning method and strategy construction."
[13] A. P. Kumar and S. Sunny, "Comparative analysis of machine learning models for predicting bitcoin price rate," International Research Journal of Modernization in Engineering, Technology and Science, vol. 3, pp. 234–238, 2021.
[14] A. Falcon and T. Lyu, "Daily cryptocurrency returns forecasting and trading via machine learning," Journal of Student Research, vol. 10, no. 4, 2021.
[15] F. Valencia, A. Gómez-Espinosa, and B. Valdés-Aguirre, "Price movement prediction of cryptocurrencies using sentiment analysis and machine learning," Entropy, vol. 21, no. 6, p. 589, 2019.
[16] J. Parra-Moyano, D. Partida, and M. Gessl, "Your sentiment matters: A machine learning approach for predicting regime changes in the cryptocurrency market," in The 56th Hawaii International Conference on System Sciences, HICSS 2023. Hawaii International Conference on System Sciences (HICSS), 2023, pp. 920–929.
[17] M. Iqbal, M. Iqbal, F. Jaskani, K. Iqbal, and A. Hassan, "Time-series prediction of cryptocurrency market using machine learning techniques," EAI Endorsed Transactions on Creative Technologies, vol. 8, no. 28, 2021.
[18] T. W. Septiarini, M. R. Taufik, M. Afif, and A. R. Masyrifah, "A comparative study for bitcoin cryptocurrency forecasting in period 2017-2019," in Journal of Physics: Conference Series, vol. 1511, no. 1. IOP Publishing, 2020, p. 012056.
[19] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, "Optuna: A next-generation hyperparameter optimization framework," in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[20] W. F. Sharpe, "The sharpe ratio," The Journal of Portfolio Management, vol. 21, no. 1, pp. 49–58, 1994.