Classification of ECG Signals using Decision Trees and Linear Discriminant Analysis
Mahmud Ihsan Fuady, Muhammad Glavin S. Zaidan, Thierry Rain Dhafin Montoya
Department of Electrical Engineering, Universitas Padjadjaran
Jatinangor, Indonesia
Abstract—The heart's crucial function calls for practical detection tools such as the electrocardiograph (ECG), which records the heart's electrical activity and helps in understanding heart muscle contractions and identifying abnormalities. Artificial Intelligence, notably Machine Learning, enables computers to analyze cardiac data without explicit programming, surpassing traditional approaches such as auscultation with a stethoscope or manual reading of ECGs. This study compares ECG signal classification using Decision Trees and Linear Discriminant Analysis (LDA) models on patient data. The methodology involves data acquisition, signal processing, feature extraction, and model application. Decision Trees outperform LDA, achieving 98.95% accuracy under K-Fold cross-validation, whereas LDA achieves 88.82%. Precision, recall, and F1-Score further underscore the advantage of Decision Trees, which handle complexity and nonlinearity better than LDA. These results affirm the accuracy, stability, and balanced overall performance of Decision Trees, reinforcing their effectiveness in ECG signal classification.
Keywords—Decision Tree, LDA, ECG, Heart, Classification
I. INTRODUCTION
The heart, a vital organ in the human body, plays a key role in circulating blood throughout the body through the heartbeat, which consists of contraction (systole) and relaxation (diastole). The frequency of the heartbeat reflects the speed at which blood is pumped through the circulatory system. Although doctors use a stethoscope or an electrocardiograph (ECG) to check the heart rate, this method is cumbersome and time-consuming. Therefore, a practical, economical, and easy-to-use heart detection device is needed to help the medical team determine the patient's heart rate [1], [2]. Heart function can be assessed from the heart's electrical activity using a medical device called an electrocardiograph (ECG). Electrocardiography records the electrical activity of the heart through sensors placed on the skin; this activity involves two types of heart muscle cells: contractile cells, which perform the mechanical work, and autorhythmic cells, which generate the potentials that trigger the contractile cells [3]. Diagnosing heart abnormalities by listening to heart sounds with a stethoscope is less effective because of its difficulty, the variation between individuals, and the strong dependence of the analysis on the hearing sensitivity and experience of the medical professional [4]. The ECG signal consists of the QRS complex, which is the main representation of the heart's electrical activity, and two other important waves, the P and T waves [5]–[7].
Machine Learning is a branch of Artificial Intelligence that focuses on developing systems and algorithms that allow computers to learn from data, recognize patterns, and perform tasks without explicit programming [8], [9]. The goal is to build data-driven models that improve computer performance over time. Classification in data mining is the process of grouping objects into predefined categories: a model is built from a training set with known class labels and attribute values and is then used to predict the class of new data [10].
Several previous studies serve as references for this work. Bin Huang et al. (2022) used a Multi-Layer Perceptron (MLP) algorithm to develop a tool that measures human blood pressure from ECG and PPG signals, reporting 96.6% and 94.2% for diastolic and systolic blood pressure classification, respectively [11]. Usha K. et al. (2021) created an automated method for detecting cardiac arrhythmias using a Support Vector Machine (SVM) combined with the Discrete Wavelet Transform (DWT); evaluated with a confusion matrix, the SVM classifier achieved an accuracy of 95.92% in classifying arrhythmias [12]. Furthermore, B. Krithiga et al. (2021) studied the early detection of Coronary Heart Disease (CHD) with the Naive Bayes algorithm, focusing on detecting mild CHD abnormalities at an early stage. Their classifier considers parameters such as age, gender, type of chest pain, and blood pressure, and it detected the presence of CHD with an accuracy of 84.0714%, higher than other classifiers such as SVM (82.943%) and an Artificial Neural Network (80.1567%) [13].
Based on the research problems described above and the results of previous research, this study compares ECG signal classification using the Decision Trees and Linear Discriminant Analysis (LDA) classification models, with ECG data collected from patients serving as test data. LDA separates classes optimally, provided that the data are normally distributed and the classes have similar covariance. Decision Trees, on the other hand, are easy to understand because the model is a tree of simple decisions, which makes it easy to interpret. Decision Trees also handle various types of data, both categorical and numerical.
The ECG device consists of an AD8232 sensor that acquires the heart signal and an ESP32 module that controls the device and sends data to the computer via Bluetooth. The device is also equipped with a battery as its power source and a DC-DC converter that raises the battery voltage from 3.7 V to 5 V. All of these components are integrated on one PCB. Three electrodes are connected to the device to acquire the ECG signal. The position of the electrodes on the subject's skin must be chosen carefully to obtain accurate heart signal recordings. Electrode placement follows Einthoven's theory: the red electrode is placed above the right diaphragm, the yellow electrode above the left diaphragm, and the green electrode on the right side below the ribs.
Authorized licensed use limited to: Pusan National University Library. Downloaded on May 12,2024 at 07:54:30 UTC from IEEE Xplore. Restrictions apply.
Fig. 3. Raw Signal From ECG Recording
In the initial stage of signal processing, baseline correction is applied to shift the ECG signal so that its baseline lies at zero, which facilitates the subsequent filtering. Linear detrending is then applied to the voltage signal to remove its linear trend component. Next, a low-pass Butterworth filter attenuates unwanted high-frequency interference, including noise introduced during signal capture. The Butterworth design is generally chosen because its magnitude response is maximally flat, with no ripple in the passband.
The final stage of ECG signal preprocessing is filtering with a Finite Impulse Response (FIR) filter. This stage improves the quality of the ECG signal by suppressing low-frequency components and any remaining noise. An FIR filter designed with a Kaiser window is preferred here because the Kaiser window allows a narrow transition band to be specified, yielding filters with sharp transitions that reduce noise more effectively.
Fig. 4. Filtering Using Baseline Correction, Butterworth, and FIR
The choice of Butterworth and FIR filters with a Kaiser window over alternative techniques rests on several factors. First, the characteristics of the signal and the desired filtering requirements play a crucial role. Butterworth filters are commonly used for their smooth frequency response, which suits applications where preserving signal amplitude across all important frequencies is essential. Conversely, FIR filters with a Kaiser window offer flexible design parameters that allow precise control over characteristics such as passband ripple and stopband attenuation, which is beneficial when specific frequency response requirements must be met. Furthermore, practical considerations such as computational complexity and implementation feasibility also influence the choice. Butterworth filters are known for their simplicity and low computational cost, making them preferable for real-time applications or situations where computational resources are limited. FIR filters with a Kaiser window, on the other hand, tend to be more computationally demanding because of their longer filter lengths, but they offer superior stopband attenuation and passband ripple control, which justifies their use in applications requiring high precision despite the additional computational cost.
C. Feature Extraction
Feature extraction is a crucial step in analyzing ECG signals, as it identifies patterns and characteristics that are useful for detecting cardiac disorders. The extracted features serve as variables in a classification model and provide insight into the heart's electrical activity. Parameters of the ECG cycle, namely the RR interval, PR interval, QS complex, QT interval, ST segment, and QTc, are used to determine whether a subject shows signs of arrhythmia. The RR interval is the time between two consecutive R waves and provides information on heart rhythm and heart rate. The PR interval measures the time from the P wave to the QRS complex, providing information about atrioventricular (AV) conduction and revealing conduction disturbances. A QS complex indicates myocardial infarction or heart tissue damage in the area corresponding to that complex. The QT interval represents the total duration of ventricular depolarization and repolarization; prolongation of this interval indicates a risk of ventricular arrhythmias. The ST segment represents the period between ventricular depolarization and repolarization, and changes in it can indicate myocardial ischemia or ventricular repolarization problems. QTc (the corrected QT interval) adjusts the QT interval for heart rate, accounting for heart rate variability.
Several strategies are used to handle missing data or anomalies in the extracted features. Missing values can be estimated from the available data with techniques such as mean imputation or interpolation. Anomalies or outliers, which may arise from measurement errors or other factors, are often identified with outlier detection methods such as z-score thresholding or Tukey's method. Observations identified as anomalies may also be excluded or given lower weights during data preprocessing to keep the analysis robust. These approaches help mitigate the impact of missing data or
anomalies on the accuracy and reliability of arrhythmia detection based on electrocardiogram (ECG) parameters.
D. Decision Trees
A Decision Tree is a classification algorithm that uses a tree-shaped data structure to determine the class of data; it is one of the classification methods used in data mining. A decision tree is a flowchart-like tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test (an attribute value), and each leaf represents a class or class distribution [14]. There are three types of nodes in a decision tree [15]:
• Root Node: the initial node of the tree, which has no inputs but can have multiple outputs.
• Internal Node: a node with one input and multiple outputs, serving as a branching point based on a particular variable.
• Leaf or Terminal Node: a final node with one input and no outputs, used to determine the class of the data.
E. Linear Discriminant Analysis
Linear Discriminant Analysis (LDA) is a statistical and machine learning method that classifies data into specific categories by enhancing the differences between categories and reducing the variation within each category [16]. It does so by using between-category and within-category scatter functions [17], increasing inter-class distances while decreasing intra-class distances [18]. The number of attributes LDA generates depends on the number of classes and the application, and the method preserves discriminative information while handling many categories [14]. LDA describes objects using class variables (dependent) and attribute variables (independent), with the former linked to the latter to describe the characteristics of each object [19]. Before training begins, the within-class covariance (scatter) matrix $S_W$ and the between-class covariance matrix $S_B$ are computed as follows:
$$S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T} \tag{5}$$
$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^{T} \tag{6}$$
Here, $x_k$ is the $k$th sample, $N_i$ is the number of samples in class $X_i$, $c$ is the total number of classes, $\mu$ is the mean of all samples, and $\mu_i$ is the mean of class $i$. Next, the eigenvectors of the product of the inverse of $S_W$ and $S_B$ are found:
$$S_W^{-1} S_B \, w = \lambda w \tag{7}$$
The eigenvector with the largest eigenvalue $\lambda$ is selected and used to project each training sample using equation (10). In this process, the distance between the projection of the training data and the projection of the test data is computed.
III. RESULTS AND DISCUSSION
A. Research Results
After the feature extraction stage, the data are combined and grouped into four categories for the classification stage: abnormal, normal, potentially arrhythmic, and highly potentially arrhythmic. This process begins by summing the PQRST interval features, excluding the standard deviation and heart rate; the result is referred to as
sum. Categorization was based on two main variables, sum and heart rate. The class boundaries for each variable were set relative to predefined normal heart parameters. The limits of the sum variable were obtained by adding up the lower limits and the upper limits of the individual intervals, giving 1250 and 1870 as the new lower and upper limits. Details of the normal cardiac interval parameters are given in Table I.
TABLE I. NORMAL HEART INTERVAL PARAMETERS
The next step was to group the sum variable into four categories, with a normal range of 1250–1870, and the heart rate variable into four categories, with a normal range of 60–100 bpm. The final class is determined from the grouping results of both variables: the normal class is assigned only when both variables fall in their normal categories. Details of the category boundaries and the final classification are given in Table II.
After the entire dataset has been labeled with these classes, the next step is to train the classifiers on the data. For validation, the data are split into test and training sets in a 25:75 ratio. The training data are used to train, tune, and optimize the models, while the test data are used to evaluate their performance and provide unbiased estimates of classification performance. These data are then used to compare the Decision Trees classifier with LDA to determine which of the two methods performs better.
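The sum and heart-rate rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: interval durations are assumed to be in milliseconds (the text gives 1250 and 1870 without units), heart rate is assumed to come from the mean RR interval, and only the normal-class decision is encoded, because the remaining category boundaries of Table II are not reproduced here. All function names are hypothetical.

```python
"""Sketch of the sum / heart-rate categorization step (hypothetical helper names).

Assumptions: interval durations are in milliseconds, heart rate is derived from
the mean RR interval, and only the 'normal' decision is encoded because the
other category boundaries (Table II) are not reproduced in the text.
"""
from statistics import mean

SUM_NORMAL_MS = (1250, 1870)  # lower/upper limits of the summed intervals
HR_NORMAL_BPM = (60, 100)     # normal heart-rate range


def heart_rate_bpm(rr_intervals_ms):
    """Heart rate from RR intervals: 60 000 ms per minute divided by the mean RR."""
    return 60_000 / mean(rr_intervals_ms)


def interval_sum(intervals_ms):
    """The 'sum' variable: total of the interval features (no SD, no heart rate)."""
    return sum(intervals_ms.values())


def sum_limits(normal_ranges_ms):
    """Add up the lower limits and the upper limits of each interval's normal range."""
    lower = sum(lo for lo, _ in normal_ranges_ms.values())
    upper = sum(hi for _, hi in normal_ranges_ms.values())
    return lower, upper


def is_normal(sum_ms, hr_bpm):
    """Normal class only when both variables lie inside their normal ranges."""
    in_sum = SUM_NORMAL_MS[0] <= sum_ms <= SUM_NORMAL_MS[1]
    in_hr = HR_NORMAL_BPM[0] <= hr_bpm <= HR_NORMAL_BPM[1]
    return in_sum and in_hr
```

For example, with placeholder ranges `{"A": (100, 200), "B": (300, 400)}`, `sum_limits` returns `(400, 600)`; applied to the normal interval ranges of Table I, the same summation is what yields the limits 1250 and 1870. RR intervals of 800, 820, and 780 ms give a heart rate of 75 bpm.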
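To make the evaluation protocol concrete, the following is a minimal Python sketch, assuming NumPy and scikit-learn, of how a Decision Tree and LDA could be compared under a 25:75 test:training split with accuracy, precision, recall, F1-Score, and K-fold cross-validation. It is not the authors' implementation: the data are randomly generated, only the sum and heart-rate variables are used as features, the four classes of Table II are replaced by a hypothetical two-class rule (`normal` vs. `not_normal`), and the function names and the choice of five folds are illustrative assumptions.

```python
"""Sketch: comparing a Decision Tree with LDA on synthetic ECG-style features.

Assumptions (not from the paper): the data are randomly generated, the only
features are a summed-interval value and heart rate, and the labels follow a
simplified two-class rule instead of the four classes of Table II.
"""
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

SUM_LIMITS = (1250, 1870)  # normal range of the summed intervals (from the text)
HR_LIMITS = (60, 100)      # normal heart-rate range in bpm (from the text)


def make_synthetic_data(n_samples=400, seed=0):
    """Hypothetical data: uniform features spread around the normal ranges."""
    rng = np.random.default_rng(seed)
    sums = rng.uniform(1000, 2100, n_samples)
    rates = rng.uniform(40, 130, n_samples)
    normal = ((sums >= SUM_LIMITS[0]) & (sums <= SUM_LIMITS[1])
              & (rates >= HR_LIMITS[0]) & (rates <= HR_LIMITS[1]))
    labels = np.where(normal, "normal", "not_normal")
    return np.column_stack([sums, rates]), labels


def compare_models(X, y, seed=0):
    """Fit both models on a 25:75 test:training split and score them."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "lda": LinearDiscriminantAnalysis(),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Weighted averages over classes; weighted recall equals accuracy.
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="weighted", zero_division=0)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
            # Mean accuracy over 5 shuffled folds of the whole dataset.
            "cv_accuracy": cross_val_score(model, X, y, cv=cv).mean(),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, scores in compare_models(X, y).items():
        print(name, {k: round(float(v), 4) for k, v in scores.items()})
```

Because the synthetic label is an axis-aligned rectangle in the (sum, heart rate) plane, a tree can represent it with a few threshold splits, whereas LDA is limited to a single linear boundary; this mirrors, in simplified form, the nonlinearity argument made in the results. With weighted averaging, recall always equals accuracy, which may explain why the reported accuracy and recall values coincide for both models.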
Decision Trees achieved a higher F1-Score of 98.69%, while LDA's F1-Score was 87.98%.
The study found that Decision Trees performed substantially better than LDA in handling the complexity and nonlinearity of the classification data. Considering accuracy, stability, and the balance between precision and recall, these findings support the superiority of Decision Trees for this classification task.
IV. CONCLUSIONS
This study compares ECG signal classification using Linear Discriminant Analysis (LDA) and Decision Trees and finds that Decision Trees yield superior results. Using a homemade ECG device, the study extracts features such as the RR, PR, QRS, QT, and ST intervals and the heart rate. Four classes are used: abnormal, normal, potentially arrhythmic, and highly potentially arrhythmic. After processing data from 40 subjects, the Decision Trees classifier achieves 98.68% accuracy, while LDA achieves 88.16%, with the test and training data split 25:75. The precision, recall, and F1-Score of Decision Trees are 98.75%, 98.68%, and 98.69%, respectively, while those of LDA are 88.64%, 88.16%, and 87.98%, respectively. The results suggest that Decision Tree-based approaches are more effective at distinguishing ECG signal patterns, potentially improving classification accuracy and clinical relevance. The study underscores the importance of selecting classification methods suited to the characteristics of the data, offering insights for future medical signal analysis research.
ACKNOWLEDGMENTS
This research was supported by the Department of Electrical Engineering, Universitas Padjadjaran, Indonesia.
REFERENCES
[1] A. R. Rinaldi, "Rancang Bangun Alat Deteksi Jantung Berbasis Mikrokontroler Arduino Dengan Pulse Sensor," Seminar Nasional Fortel Regional 7 (SinarFe7), pp. 374–377, 2021.
[2] A. Turnip, J. G. Hamonangan, S. J. Lase, N. S. Syafei, and AS. Rahmi, "Deteksi Dini Aritmia Menggunakan K-Nearest Neighbour," JIIF (Jurnal Ilmu dan Inovasi Fisika), vol. 8, no. 1, pp. 86–95, 2024.
[3] F. M. Rosidi, "Implementasi Sistem Telemedicine Untuk Monitoring Detak Jantung Berbasis Sensor AD8232," Seminar Nasional Fortel Regional 7 (SinarFe7), pp. 317–320, 2021.
[4] M. A. Mukhtar, "Alat Deteksi Sinyal Elektrokardiografi Pada Jantung Menggunakan Wavelet Transform dan Neural Network," Seminar Nasional Fortel Regional 7 (SinarFe7), pp. 313–316, 2021.
[5] Hendrick, A. Okvironi, and R. Y. Setyawan, "Pemantauan Detak Jantung Sinyal EKG Melalui Jaringan LoRa," Seminar Nasional Terapan Riset Inovatif (Sentrinov), vol. VI, no. 1, pp. 321–328, 2020.
[6] M. Turnip, A. Dharma, A. Afriansyah, A. Oktarino, and A. Turnip, "Integration of FIR and Butterworth Algorithm for Real-Time Extraction of Recorded ECG Signals," Computer and Automation System, Advances in Intelligent Systems and Computing, vol. 1291, 2021.
[7] S. J. Lase, A. Trisanto, N. S. Syafei, N. L. Hoa, and A. Turnip, "Identification and Classification of Arrhythmias on Portable ECG Using ANN Method," in 2022 IEEE International Conference on Sustainable Engineering and Creative Computing (ICSECC), Bekasi, 2022.
[8] A. Turnip, J. G. Hamonangan, G. F. Yohanes, P. Turnip, E. Sitompul, and N. Le Hoa, "Detection of Drug Effects on Brain-Based EEG Signals using K-Nearest Neighbours," in 2023 IEEE 9th Information Technology International Seminar (ITIS), Batu Malang, Indonesia, 2023.
[9] A. Turnip, M. Taufik, D. R. Manday, Yennimar, E. Sitompul, and D. Hidayat, "PPG Signal-Based Blood Pressure Classification With Ensemble Bagged Trees Method," in 2023 International Conference of Computer Science and Information Technology (ICOSNIKOM), Binjai, 2023.
[10] M. Solahuddin, A. I. Purnamasari, and A. R. Dikananda, "Klasifikasi Kualitas Berita Pada Majalah Menggunakan Metode Decision Tree," Jurnal Teknologi Ilmu Komputer, vol. I, no. 2, pp. 48–54, 2023.
[11] B. Huang, W. Chen, C. L. Lin, C. F. Juang, and J. Wang, "MLP-BP: A novel framework for cuffless blood pressure measurement with PPG and ECG signals based on MLP-Mixer neural networks," Biomed. Signal Process. Control, vol. 71, 2021.
[12] C. U. Kumari, A. S. D. Murthy, B. L. Prasanna, M. P. P. Reddy, and A. K. Panigrahy, "An automated detection of heart arrhythmias using machine learning technique: SVM," Mater. Today Proc., vol. 45, 2021.
[13] B. Krithiga, P. Sabari, I. Jayasri, and I. Anjali, "Early detection of coronary heart disease by using naive bayes algorithm," J. Phys. Conf. Ser., vol. 1717, 2021.
[14] D. Septhya, K. Rahayu, S. Rabbani, V. Fitria, Rahmaddeni, Y. Irawan, and R. Hayami, "Implementasi Algoritma Decision Tree dan Support Vector Machine Untuk Klasifikasi Penyakit Kanker Paru," MALCOM: Indonesian Journal of Machine Learning and Computer Science, vol. III, no. 1, pp. 15–19, 2023.
[15] C. A. Sari, A. Sukmawati, R. P. Aprilli, P. S. Kayaningtias, and N. Yudistira, "Perbandingan Metode Naive Bayes, Support Vector Machine, dan Decision Tree Dalam Klasifikasi Konsumsi Obat," Jurnal Litbang Edusaintech (JLE), vol. III, no. 1, pp. 33–41, 2022.
[16] M. Melinda, M. Oktiana, Y. Yunidar, N. H. Nabila, and I. K. A. Enriko, "Classification of EEG Signal Using Independent Component Analysis and Discrete Wavelet Transform Based on Linear Discriminant Analysis," International Journal on Informatics Visualization, vol. VII, no. 3, pp. 830–838, 2023.
[17] D. A. Widyati, R. R. Isnanto, and M. A. Riyadi, "Analysis of Recognition Pattern Leaves Uses The Method Linear Discriminant Analysis (LDA) and The Distance Minkowski," Transformtika, vol. XVIII, no. 2, pp. 225–230, 2021.
[18] R. R. Istanto, I. Rashad, and C. E. Widodo, "Classification of Heart Disease Using Linear Discriminant Analysis Algorithm," ICENIS, pp. 1–11, 2023.
[19] F. R. Malau and D. I. Mulyana, "Classification of Edelweiss Flowers Using Data Augmentation and Linear Discriminant Analysis Methods," Journal of Applied Engineering and Technological Science, vol. IV, no. 1, pp. 139–148, 2022.