
Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 132 (2018) 1253–1262
www.elsevier.com/locate/procedia

International Conference on Computational Intelligence and Data Science (ICCIDS 2018)

Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals

Swapna G, Soman KP, Vinayakumar R

Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India

Abstract

Diabetes mellitus, commonly known as diabetes, is a disease that affects a large number of people globally. Diabetes cannot be cured; it can only be kept under control. In this paper, diabetes is diagnosed by the analysis of Heart Rate Variability (HRV) signals obtained from ECG signals. We employed the deep learning networks of Convolutional Neural Network (CNN) and a CNN-LSTM combination (LSTM = Long Short Term Memory) to automatically detect the abnormality. Unlike the conventional analysis methods followed so far, deep learning techniques do not require any feature extraction. We initially performed classification by splitting the database into separate training and testing data. The maximum accuracy obtained for test data is 90.9% using CNN-LSTM. Using 5 fold cross-validation, CNN gave an accuracy of 93.6% while the CNN-LSTM combination gave the maximum accuracy of 95.1%. To the best of our knowledge, this is the first paper in which deep learning techniques are employed in distinguishing diabetes and normal HRV. The accuracy obtained using cross-validation is the maximum value achieved so far for the automated detection of diabetes using HRV.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Computational Intelligence and Data Science (ICCIDS 2018).

Keywords: ECG; HRV; CNN; LSTM; deep learning; diabetes; Cardiovascular Autonomic Neuropathy

∗ Swapna G. E-mail address: [email protected]
ISSN 1877-0509; DOI 10.1016/j.procs.2018.05.041

1. Introduction

Diabetes is a condition in which the body is unable to metabolize glucose effectively. This in turn creates an abnormally high level of glucose in the blood, referred to as hyperglycemia. Diabetes may be caused either by the inability of the body to produce enough insulin or by the situation wherein cells cannot respond to the insulin that is produced in the body. It is very difficult to cure diabetes; it has to be managed effectively. Type 1 diabetes, Type 2 diabetes and gestational diabetes are the three types of diabetes, of which the most prevalent is Type 2 diabetes. In this paper, the word diabetes refers to Type 2 diabetes. According to the World Health Organisation (WHO) update on diabetes of July 2017, the number of people with diabetes rose from 108 million in 1980 to 422 million in 2014. Over the same span, the share of adults over 18 years of age afflicted by diabetes almost doubled (4.7% in 1980 to 8.5% in 2014). Middle and low income countries are more affected by diabetes. In 2012, 1.5 million deaths were reported, and WHO studies showed an additional 2.2 million deaths due to complications caused by diabetes. The grim fact is that 43% of these 3.7 million deaths occur before the age of 70 years.
The economic impact of increasing diabetes is evident from the fact that diabetes management costs 132 billion
dollars in the United States alone every year according to the National Diabetes Information Clearinghouse.
In addition, diabetes is now increasingly affecting women. Globally, 8% of women (205 million women) are affected by diabetes; half of them live in South-East Asia and the Western Pacific. Pregnancy related diabetes adversely affects the health of both mother and child. It also increases the child's risk of diabetes in future.
All types of diabetes can lead to complications in many parts of the body, thus increasing the risk of premature death. Kidney failure, heart attack, stroke, leg amputation, vision loss due to diabetic retinopathy and nerve damage are some of the possible complications. All these data and reports on diabetes, mostly from WHO, point to the necessity of developing effective diabetes management methods. Timely detection is of great importance in managing diabetes. In this work, we present a method to diagnose diabetes using heart rate variability signals. The analysis method employed is deep learning.
Diabetes causes disorders of the nervous system, known as diabetic neuropathy. The heart and blood vessels are part of the cardiovascular system, and nerve damage in this system negatively affects heart rate and blood pressure. We concentrate on the diabetic neuropathy that affects the nerves controlling the heart and blood pressure, known as Cardiovascular Autonomic Neuropathy (CAN).
Statistics indicate that 50% of people with diabetes die from heart disease and stroke. The death rate from heart disease among people with diabetes is about 2 to 4 times higher than among people without diabetes (National Diabetes Statistics 2011).
Diabetes induced hyperglycemia causes cardiovascular abnormalities irrespective of the presence of conditions like arterial hypertension, dyslipidemia and obesity. Poor glycemic control results in precapillary damage and affects nerve blood flow. Diabetes also speeds up the hardening of arteries, which is known as atherosclerosis. CAN, which causes dysfunction of the autonomic nervous system, is thus associated with diabetes and leads to reduced heart rate variability. Depressed HRV is therefore an indicator of diabetic neuropathy [1]. Other than reduced heart rate variability, CAN caused by diabetes may cause changes in the ECG like QT dispersion, ST-T changes, left ventricular hypertrophy and sinus tachycardia. From among these ECG alterations, we have chosen heart rate variability as the measure of cardiac autonomic impairment caused by diabetes.
The interval between adjacent QRS complexes in the ECG signal is called the NN (or RR) interval ($t_{R-R}$). The Heart Rate (HR), measured in beats per minute, is expressed as $HR = 60/t_{R-R}$, with $t_{R-R}$ in seconds. Thus, the heart rate signal is a sequence of non-uniform RR intervals in the time domain. The variation of the RR interval is defined as Heart Rate Variability (HRV). In other words, HRV indicates the variations of the instantaneous heart rate. The main advantages of HRV measurements are that they are non-invasive and easy to acquire. When acquired under standardized conditions, HRV signals have good reproducibility [2].
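The relation HR = 60/t_{R-R} can be illustrated with a minimal Python sketch. The function name `rr_to_heart_rate` and the sample RR values are hypothetical and chosen only for illustration; they are not taken from the recorded data.

```python
def rr_to_heart_rate(rr_intervals_s):
    """Convert RR intervals (in seconds) to instantaneous heart rate (beats per minute)."""
    if any(rr <= 0 for rr in rr_intervals_s):
        raise ValueError("RR intervals must be positive")
    # HR = 60 / t_RR, applied to each interval separately
    return [60.0 / rr for rr in rr_intervals_s]


# Hypothetical RR intervals in seconds (illustrative values only)
rr = [0.80, 0.75, 1.00, 0.60]
hr = rr_to_heart_rate(rr)
print(hr)
```

Because each RR interval yields one HR value, the resulting series is sampled at beat times rather than at uniform time steps, which is why the text calls it non-uniform.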
It is observed that heart is not a periodic oscillator under normal physiological conditions [3]. Hence, nonlinear
techniques, which extract and analyse the nonlinear features from HRV signals, are widely employed in the analysis
of HRV signals, much more than time domain and frequency domain methods.
In this work, we apply CNN and CNN-LSTM networks to HRV signals to detect diabetes, achieving a maximum accuracy of 95.1% with a 5-layer CNN combined with LSTM under 5-fold cross-validation. To our knowledge, ours is the first work to employ deep learning in the automated detection of diabetes using HRV, and it obtains the highest accuracy reported so far. Previous works on diabetes detection using HRV are summarized in Table 4.
The paper is organized as follows. Section 2 describes the data used. Section 3 briefly describes the deep learning techniques, CNN and LSTM, used in this work. Section 4 details the experiments conducted and the network architecture arrived at. Section 5 lists the results obtained. Section 6 discusses the results. The paper concludes in Section 7.

Fig. 1. Sample Heart Rate signals corresponding to (a) Normal subject; (b) Diabetic subject.

2. Data used

The electrocardiograms (ECG) of 20 diabetes patients and 20 normal subjects were recorded for 10 minutes with the subjects in a relaxed supine position. The sampling rate of the ECG was 500 Hz. The ECG data were converted to heart rate time series signals using the Pan and Tompkins algorithm. Sample HR signals of normal and diabetic subjects are displayed in Figure 1. From the extracted data, we used 71 datasets of diabetic subjects and 71 datasets of normal subjects, each containing 1000 samples. No further pre-processing is done before giving this data to the CNN and CNN-LSTM networks.

3. Methods

We have adopted deep learning techniques for the detection of diabetes. The conventional steps of feature extraction, feature selection and classification used in traditional machine learning methods need not be explicitly defined in deep learning networks. They are instead embedded in the network itself, and the main process taking place is self-learning from the data.

3.1. Convolutional neural network (CNN)

A convolutional neural network (CNN) is a special type of multilayer perceptron (MLP). CNNs are similar to ordinary neural networks in the following aspects: they are made up of neurons with weights and biases that have to be learned, and each neuron receives some inputs and performs a dot product, optionally followed by a nonlinearity. CNNs were initially used in image processing, where the network receives raw image pixels at the input end, transforms them through a series of hidden layers and finally gives the class scores at the other end.
A CNN is built from three main types of layers: the convolutional layer, the pooling layer and the fully connected layer, with the rectified linear unit (ReLU) as activation function. Since our application analyses one dimensional signals, we use Convolution 1D layers, pooling 1D layers and a fully connected layer. Here the CNN takes the time series data in one dimensional form, with the data arranged in the order of sequential time instants.
In our case, the input one dimensional data vector is $x = (x_1, x_2, \ldots, x_{n-1}, x_n, cl)$, where $x_n \in \mathbb{R}^d$ denotes the features (here, time series HRV data) and $cl \in \mathbb{R}$ denotes a class label (either diabetic or normal). Convolution1D constructs a feature map $fm$ by applying the convolution operation on the input data with a filter $w \in \mathbb{R}^{fd}$, where $f$ denotes the features inherent in the input data. The output is a new set of features, which is fed to the input of the next block in line.
A new feature map $fm$ is obtained from a set of features $f$ as follows:

$$hl_i^{fm} = \tanh(w^{fm} x_{i:i+f-1} + b) \tag{1}$$

The filter $w$ is applied to each set of features $f$ in the input data, defined by $\{x_{1:f}, x_{2:f+1}, \ldots, x_{n-f+1:n}\}$, so as to generate a feature map $hl = [hl_1, hl_2, \ldots, hl_{n-f+1}]$, where $b \in \mathbb{R}$ denotes a bias term and $hl \in \mathbb{R}^{n-f+1}$.
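Eq. (1) can be made concrete with a small pure-Python sketch for a single filter on a scalar-valued series (d = 1). The function name `conv1d_feature_map`, the input values and the filter weights are hypothetical; only the sliding-window form and the tanh nonlinearity come from Eq. (1).

```python
import math


def conv1d_feature_map(x, w, b):
    """Slide a filter w of width f over x and apply tanh, as in Eq. (1).

    Returns n - f + 1 values, one per window x[i : i + f].
    """
    f, n = len(w), len(x)
    if f > n:
        raise ValueError("filter is longer than the input")
    return [
        math.tanh(sum(w[k] * x[i + k] for k in range(f)) + b)
        for i in range(n - f + 1)
    ]


# Hypothetical normalized samples and filter (illustrative values only)
x = [0.1, 0.5, 0.9, 0.3, 0.7]
w = [1.0, 0.0, -1.0]
b = 0.0
hl = conv1d_feature_map(x, w, b)
print(len(hl), hl)
```

With n = 5 samples and a filter of width f = 3 the feature map has n - f + 1 = 3 entries, which matches the dimension stated for hl.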

The output of the convolutional layer is given to the pooling (POOL) layer. The convolutional layer uses the ReLU activation function, which applies max(0, x) to each of its inputs x. The next layer (POOL) performs a downsampling operation. Here, the max-pooling operation $\overrightarrow{hl} = \max\{hl\}$ is applied on each feature map. This selects the most significant features (here, the features with the highest values). These selected features are fed to the fully connected (FC) layer, which contains the softmax function that gives the probability distribution over the classes. Thus, the FC layer computes the class scores that form the final output of the CNN. The CNN therefore has the architecture INPUT-CONV-POOL-FC.
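The three operations named above (ReLU, max-pooling and softmax) can be sketched in a few lines of Python. This is an illustration under assumptions: the pooling window of 2 follows the pool size used later in the experiments, the trailing sample of an odd-length map is dropped, and all numeric values are hypothetical.

```python
import math


def relu(values):
    """Apply max(0, x) to every element."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, pool_size=2):
    """Keep the largest value in each non-overlapping window of pool_size samples."""
    last_start = len(values) - pool_size
    return [max(values[i:i + pool_size]) for i in range(0, last_start + 1, pool_size)]


def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [-0.4, 0.2, 0.9, -0.1, 0.3]  # hypothetical convolution output
activated = relu(feature_map)
pooled = max_pool1d(activated)
probs = softmax([1.0, 2.0])  # hypothetical scores for two classes
print(activated, pooled, probs)
```

Pooling halves the length of the feature map while keeping the strongest responses, which is the downsampling role described in the text.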

3.2. Long short term memory (LSTM)

LSTM is an improved form of the Recurrent Neural Network (RNN). LSTM introduces memory blocks in place of conventional simple RNN units to handle the problem of vanishing and exploding gradients. LSTMs can handle long term dependencies much better than traditional RNNs. This means that LSTMs can remember information from far back in time and connect it to the present.
A memory block in LSTM is a complex processing unit made up of one or more memory cells. The complete operation of a memory block is controlled by a set of adaptive multiplicative gates, a pair of which serve as the input and output gates. The input gate allows or discards the flow of an input cell activation into a memory cell. The output gate allows or discards the flow of the output state of a memory cell to other nodes.
LSTMs remember information over long periods so well that they are now widely used in natural language processing (NLP) applications. They have achieved great improvements in areas like language modelling, computer vision and speech recognition. The concept of LSTM is so complex that it was not widely used in earlier years, but its performance in language translation and speech processing has brought it into wide use in many areas. Other than NLP and speech applications, it can be used wherever modelling of long sequences of data is required. As research on LSTM progressed, the forget gate and peephole connections were introduced into the existing LSTM network. The forget gate is used instead of the Constant Error Carousel (CEC) and helps to forget or reset the state of a memory cell. The peephole connections are made from a memory cell to all of its gates. They learn the precise timing of the outputs as well as the internal state of a memory cell.
The operation of the LSTM is as follows. The input sequence of arbitrary length $x = (x_1, x_2, \ldots, x_{T-1}, x_T)$ is fed to the LSTM architecture. The output sequence $o = (o_1, o_2, \ldots, o_{T-1}, o_T)$ is estimated iteratively from $t = 1$ to $T$ in the recurrent hidden layer, with continuous write, read and reset operations performed on the memory cell ($cl$) by three multiplicative units: the input gate ($in$), the output gate ($ot$) and the forget gate ($fr$). The operations taking place at time step $t$, which map $x_t, h_{t-1}, cl_{t-1} \rightarrow h_t, cl_t$, can be briefly represented by the equations below.

$$in_t = \sigma(w_{xin} x_t + w_{hin} h_{t-1} + w_{clin} cl_{t-1} + b_{in}) \tag{2}$$

$$fr_t = \sigma(w_{xfr} x_t + w_{hfr} h_{t-1} + w_{clfr} cl_{t-1} + b_{fr}) \tag{3}$$

$$cl_t = fr_t \odot cl_{t-1} + in_t \odot \tanh(w_{xcl} x_t + w_{hcl} h_{t-1} + b_{cl}) \tag{4}$$

$$ot_t = \sigma(w_{xot} x_t + w_{hot} h_{t-1} + w_{clot} cl_t + b_{ot}) \tag{5}$$

$$h_t = ot_t \odot \tanh(cl_t) \tag{6}$$

An LSTM memory block is a complex processing unit with many components. The memory cell stores information across many time steps under the control of the three adaptive multiplicative gating units. The input and output flows of a cell activation are modulated by the input and output gates of the memory cell. The forget gate is used to reset the self-recurrent value when it becomes irrelevant: it multiplies the memory cell value by 0 to delete it or by 1 to retain it for the next step. The memory cell has peephole connections to all gates so that they can learn the precise timing of the outputs.
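To make Eqs. (2) to (6) concrete, the following is a minimal Python sketch of one time step for a single memory cell with scalar input and state, including the peephole terms (the weights named w_clin, w_clfr and w_clot). The function `lstm_step`, the dictionary keys and all weight values are hypothetical and serve only to mirror the symbols in the equations; a practical layer would use vectors and learned weight matrices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, cl_prev, p):
    """One LSTM time step for a single scalar memory cell, following Eqs. (2)-(6)."""
    in_t = sigmoid(p["w_xin"] * x_t + p["w_hin"] * h_prev + p["w_clin"] * cl_prev + p["b_in"])
    fr_t = sigmoid(p["w_xfr"] * x_t + p["w_hfr"] * h_prev + p["w_clfr"] * cl_prev + p["b_fr"])
    cl_t = fr_t * cl_prev + in_t * math.tanh(p["w_xcl"] * x_t + p["w_hcl"] * h_prev + p["b_cl"])
    # The output gate peeks at the updated cell state cl_t, as in Eq. (5)
    ot_t = sigmoid(p["w_xot"] * x_t + p["w_hot"] * h_prev + p["w_clot"] * cl_t + p["b_ot"])
    h_t = ot_t * math.tanh(cl_t)
    return h_t, cl_t


names = ["w_xin", "w_hin", "w_clin", "b_in", "w_xfr", "w_hfr", "w_clfr", "b_fr",
         "w_xcl", "w_hcl", "b_cl", "w_xot", "w_hot", "w_clot", "b_ot"]
params = {name: 0.5 for name in names}  # hypothetical weights, not learned values

h, cl = 0.0, 0.0
for x_t in [0.2, 0.4, 0.6]:  # hypothetical normalized HRV samples
    h, cl = lstm_step(x_t, h, cl, params)
print(h, cl)
```

The state pair (h, cl) is carried from one step to the next, which is how the cell retains information across the sequence.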

3.3. Hybrid network of CNN-LSTM

In this hybrid network, the CNN consists of convolution1D and maxpooling1D layers only. The output of the max-pooling1D layer is fed to the subsequent LSTM layer.

$$y_i = CNN(x_i) \tag{7}$$

Here $x_i$ is the initial input vector to the CNN network with the class label, and $y_i$ is the output of the CNN network, that is, the feature vector formed from the max-pooling operation in the CNN. It is fed to the LSTM to learn the long-range temporal dependencies.

4. Experiments and Architecture

Many powerful and easy-to-use software frameworks have been developed to facilitate the implementation of deep learning network architectures. A comparative study of these frameworks has evaluated various deep learning architectures executed on CPU and GPU devices with respect to extensibility, hardware utilization and speed [4]. In this section, we discuss the effectiveness of the various CNN and CNN-LSTM architectures for the detection of diabetes using our privately collected HRV data sets. The deep learning architectures used in our research are implemented in TensorFlow (Google's open source data flow engine) [5]. TensorFlow allows researchers to model numerical systems as unified data flow graphs, in which mathematical operations are represented with the help of tensors, nodes and edges. In addition, programmers can perform computations on heterogeneous platforms such as one or more CPUs, GPUs or mobile devices. We ran all our experiments on GPU enabled TensorFlow on a single NVIDIA GK110BGL Tesla K40 to accelerate the gradient descent computations.

4.1. Hyper parameter selection

Deep learning architecture is represented by parameterized functions and hence the optimal parameters have direct
impact on the accuracy of diabetes detection. In order to find out the optimal values for parameters such as learning
rate, number of units / memory blocks, number of filters, number of hidden layers, we conducted experiments on
various configurations for CNN and hybrid of CNN and LSTM networks.
Initially, we tried a moderately sized CNN network consisting of an input layer, a hidden layer and an output layer. The input layer is made up of 1000 neurons. The hidden layer is built as follows: a convolutional layer with 32 filters, kernel size 3 and stride 1, followed by maxpooling1d with pool size 2, flatten, and dropout with rate 0.5. This is followed by a fully connected layer with sigmoid activation function. The connections from the input layer to the hidden layer and from the hidden layer to the output layer are fully connected. The values in the input data sets are normalized so that they fall in the range 0 to 1. Three trials of experiments are run with the number of filters set to 32, then 64 and finally 128. All experiments are run for 300 epochs with a batch size of 16. ADAM is used as the optimizer and binary cross entropy is chosen as the loss function. We observed that the CNN with 64 filters performed well in comparison to the other settings. When we increased the number of filters from 128 to 256 and to 512, the network produced the same detection rate of diabetes as the CNN with 64 filters. Hence, we set 64 filters for the CNN layer for the rest of the experiments.
Our next objective was to find an optimal value for the learning rate of the CNN network. We conducted three trials of experiments with different learning rates in the range [0.01-0.5]. It was observed that the performance of the CNN network with learning rate 0.001 is better in comparison to the other values of learning rate.
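The normalization of the input values to the range 0 to 1 mentioned above can be sketched as a min-max scaling. This is an assumption: the text does not state the exact formula, nor whether the scaling is applied per dataset or over all data. The function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(signal):
    """Scale a sequence linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A constant signal carries no variation; map it to zeros (a design choice)
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


# Hypothetical heart rate samples in beats per minute (illustrative values only)
hr_samples = [62.0, 75.0, 88.0, 70.0]
scaled = min_max_normalize(hr_samples)
print(scaled)
```

Scaling of this kind keeps all inputs in a common range, which generally helps gradient based optimizers such as ADAM behave consistently across datasets.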

4.2. Network topologies

To select the most suitable CNN network structure for training our privately collected HRV dataset to detect the
diabetes, the following network topologies are used.

• CNN 1 layer
• CNN 2 layer
• CNN 3 layer
• CNN 4 layer
• CNN 5 layer
• CNN 1 layer followed by LSTM with 70 memory blocks
• CNN 2 layer followed by LSTM with 70 memory blocks
• CNN 3 layer followed by LSTM with 70 memory blocks
• CNN 4 layer followed by LSTM with 70 memory blocks
• CNN 5 layer followed by LSTM with 70 memory blocks

When more than one layer is used in the CNN and CNN-LSTM networks, the number of filters is doubled in each successive layer, with the rest of the parameter values remaining the same. Three trials of experiments are run for the above network topologies. All the experiments are run for 600 epochs with learning rate 0.001. We observed that the complex network topologies needed a large number of epochs (more than 500) to attain the desired level of performance in the detection of diabetes, and that they performed well in comparison to the simpler networks. Each network took a different number of epochs to attain the desired performance level. Moreover, the simple CNN and CNN-LSTM network topologies faced the problem of overfitting once the number of epochs reached 500, meaning that the network had started to memorize the training data samples. Overfitting thus deteriorates the generalization performance of the network. The configuration details of the best performing model are provided in Table 1.

Table 1. Structure and configuration details of proposed CNN-LSTM structure.

Layers  Type             Neurons (output shape)  Parameters  Filters  Kernel-size  Strides  Pool-size
0-1     Convolution1D    (None, 1000, 64)        256         64       3            1        -
1-2     Max-pooling1D    (None, 500, 64)         0           -        -            -        2
2-3     Convolution1D    (None, 500, 128)        24704       128      3            1        -
3-4     Max-pooling1D    (None, 250, 128)        0           -        -            -        2
4-5     Convolution1D    (None, 250, 256)        98560       256      3            1        -
5-6     Max-pooling1D    (None, 125, 256)        0           -        -            -        2
6-7     Convolution1D    (None, 125, 512)        393728      512      3            1        -
7-8     Max-pooling1D    (None, 62, 512)         0           -        -            -        2
8-9     Convolution1D    (None, 62, 1024)        1573888     1024     3            1        -
9-10    Max-pooling1D    (None, 31, 1024)        0           -        -            -        2
10-11   LSTM             (None, 70)              306600      -        -            -        -
12-13   Dropout (0.1)    (None, 70)              0           -        -            -        -
13-14   Fully-connected  (None, 1)               71          -        -            -        -
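The parameter counts and output lengths in Table 1 can be checked with simple arithmetic. The sketch below assumes the standard counting formulas for a 1D convolution with bias, an LSTM layer without peephole weights and a dense layer with bias, and assumes that the convolutions preserve the sequence length while each pooling halves it (rounding down). These assumptions are inferred from the tabulated shapes and are not stated in the text; the helper names are hypothetical.

```python
def conv1d_params(in_channels, filters, kernel_size):
    """Weights (kernel_size * in_channels per filter) plus one bias per filter."""
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input plus one bias, for every output unit."""
    return (input_dim + 1) * units


length, channels = 1000, 1  # 1000 samples, one value per time step
rows = []
for filters in (64, 128, 256, 512, 1024):
    rows.append(("Convolution1D", (length, filters), conv1d_params(channels, filters, 3)))
    length //= 2  # max-pooling with pool size 2
    rows.append(("Max-pooling1D", (length, filters), 0))
    channels = filters
rows.append(("LSTM", (70,), lstm_params(channels, 70)))
rows.append(("Fully-connected", (1,), dense_params(70, 1)))

for name, shape, count in rows:
    print(f"{name:16s} {str(shape):14s} {count}")
print("total parameters:", sum(count for _, _, count in rows))
```

Under these assumptions the arithmetic reproduces every parameter count and sequence length listed in Table 1, with the deepest convolution and the LSTM layer holding most of the weights.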

4.3. Proposed Architecture

The proposed architecture for detecting diabetes using ECG is shown in Fig. 2.

Fig. 2. Architecture of proposed system for detection of diabetes

Unlike traditional machine learning classifiers, a deep learning algorithm does not rely on any feature engineering mechanism. Instead, it takes the raw data as such and passes it through more than one hidden layer to obtain the optimal feature representation by itself. The network is composed of an input layer, hidden layers and an output layer. The input layer contains 1000 neurons and the hidden part contains 5 convolution layers with 64, 128, 256, 512 and 1024 filters. Each convolutional layer is followed by maxpooling1d with pool length 2 to reduce the dimension. Finally, the features learnt by the CNN are passed to the LSTM layer, which contains 70 memory blocks. The LSTM layer captures sequence related information and passes it to the fully connected layer. Between the LSTM and the fully connected layer, a dropout layer with rate 0.1 is used. It acts as a regularizer and helps to alleviate overfitting. The fully connected layer is followed by an output layer with sigmoid non-linear activation function, which outputs the value 0 (diabetes) or 1 (non-diabetes). The network uses binary cross entropy as loss function, which is represented by the equation

$$loss(ed, pd) = -\frac{1}{N} \sum_{j=1}^{N} \left[ ed_j \log pd_j + (1 - ed_j) \log(1 - pd_j) \right] \tag{8}$$

where $ed$ is the vector of expected class labels and $pd$ is the vector of predicted labels.
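Eq. (8) can be evaluated directly, as in the following minimal Python sketch. The function name `binary_cross_entropy`, the clipping constant `eps` (added here only to avoid log(0)) and the example labels and predictions are hypothetical.

```python
import math


def binary_cross_entropy(ed, pd, eps=1e-12):
    """Mean binary cross entropy between expected labels ed and predictions pd, Eq. (8)."""
    if len(ed) != len(pd) or not ed:
        raise ValueError("ed and pd must be non-empty and of equal length")
    total = 0.0
    for e, p in zip(ed, pd):
        p = min(max(p, eps), 1.0 - eps)  # keep the logarithms finite
        total += e * math.log(p) + (1 - e) * math.log(1 - p)
    return -total / len(ed)


# Hypothetical expected labels and sigmoid outputs (illustrative values only)
loss = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
chance_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
print(round(loss, 4), round(chance_loss, 4))
```

A network that outputs 0.5 for every sample has a loss of log 2, about 0.693, so loss values near 0.69 indicate predictions close to chance level.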

5. Results

The cross-validation accuracy and the detailed test accuracy are given in Table 2 and Table 3 respectively.

Table 2. Accuracy of 5-fold cross-validation

Architecture          Accuracy
CNN 1 layer           0.681
CNN 2 layer           0.754
CNN 3 layer           0.884
CNN 4 layer           0.912
CNN 5 layer           0.936
CNN 1 layer - LSTM    0.742
CNN 2 layer - LSTM    0.762
CNN 3 layer - LSTM    0.851
CNN 4 layer - LSTM    0.931
CNN 5 layer - LSTM    0.951
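The splitting step behind a 5-fold cross-validation such as the one summarized in Table 2 can be sketched as follows. This is an illustration under assumptions: the text does not state whether the folds were stratified by class or grouped by subject, so the stratified scheme, the function name `stratified_k_fold` and the random seed are hypothetical choices. Only the data sizes (71 datasets per class) come from Section 2.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with classes balanced across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal the indices of each class round-robin
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train_idx, test_idx


labels = [1] * 71 + [0] * 71  # 71 datasets per class, as in Section 2
splits = list(stratified_k_fold(labels))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Each dataset is held out exactly once, and the reported cross-validation accuracy would be the average of the accuracies obtained on the five held-out folds.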

The epoch wise test accuracy of each network is displayed in Fig. 3(b).
It can be observed from Fig. 3(b) that most of the networks attained their highest accuracy at epochs in the range [200-400], after which they started to overfit. The 5-layer CNN and CNN-LSTM networks, however, maintained the trend of increasing accuracy up to 1000 epochs. The CNN-LSTM model reached its highest accuracy at epochs in the range [850-950], and the CNN at epochs in the range [400-850]. To assess the detection rate of diabetes, the ROC curve is drawn and shown in Fig. 3(a). CNN-LSTM achieved the highest AUC of 0.974.

Table 3. Summary of test results.

Algorithm             Accuracy  Precision  Recall  F1-Score  Loss
CNN 1 layer           0.50      0.0        0.0     0.0       0.69
CNN 2 layer           0.659     0.667      0.636   0.651     0.90
CNN 3 layer           0.773     0.750      0.818   0.783     1.16
CNN 4 layer           0.795     0.842      0.727   0.780     1.72
CNN 5 layer           0.841     0.857      0.818   0.837     0.51
CNN 1 layer - LSTM    0.591     0.559      0.864   0.679     0.69
CNN 2 layer - LSTM    0.659     0.667      0.636   0.651     0.69
CNN 3 layer - LSTM    0.705     0.696      0.727   0.711     0.70
CNN 4 layer - LSTM    0.818     0.889      0.727   0.800     0.37
CNN 5 layer - LSTM    0.909     0.846      1.00    0.917     0.38
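The four metrics in Table 3 follow from the confusion counts of a binary classifier, as the sketch below shows. The counts used in the example are hypothetical: they assume a test set of 44 datasets with 22 per class, which is not stated in the text, and were chosen only because they are consistent with the values reported for the CNN 5 layer - LSTM row.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall, F1) from binary confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts: 22 true positives, 4 false positives, 18 true negatives, 0 false negatives
acc, prec, rec, f1 = classification_metrics(tp=22, fp=4, tn=18, fn=0)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

A classifier that never predicts the positive class has zero precision, recall and F1-score while still reaching 0.50 accuracy on a balanced test set, which is the pattern seen in the CNN 1 layer row.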

Fig. 3. (a) Accuracy; (b) ROC curve.

In the networks we experimented with, the inputs are passed through more than one layer to capture hidden patterns in both the space and time domains. The activation in each layer enhances the capability of the system to distinguish the HRV data as belonging to the diabetes or the normal (non-diabetes) class. To visualize and understand this, the last layer activation values of the LSTM networks are passed to t-SNE instead of the fully connected network [6]. This transforms the high dimensional feature vectors into two dimensional feature vectors, which are shown in Fig. 4.
As we can see from Fig. 4, the HRV data of diabetes and non-diabetes do not fall into completely separate clusters. From this, it can be inferred that the proposed deep learning architecture has not learnt the complete patterns associated with the diabetes and non-diabetes data. The primary reason may be that the HRV data set used in this work is small compared to the typical size of data needed as input to deep learning networks. As part of our future research, we intend to feed the proposed architecture with a larger data set to further improve the accuracy of distinguishing diabetes.

Fig. 4. 12 samples of each class of Diabetes and Non-Diabetes with their corresponding activation values of the penultimate layer neurons are
represented using 2-dimensional linear projection (PCA). Note that the samples are clustered based on the similarity in activation values

6. Discussions

Deep learning methods using CNN have been employed to analyse ECG signals for the detection of coronary artery disease [7] and myocardial infarction [8], and for the classification of heartbeats [9].
Diabetes detection has been done with 88.41% accuracy by a deep learning neural network using 8 attributes, including glucose concentration in plasma, blood pressure and body mass index [10]. In this work, we have used CNN and CNN-LSTM deep learning networks to achieve a maximum accuracy of 95.1% in detecting diabetes. This is the highest accuracy obtained so far in the automated detection of diabetes using HRV signals.
The advantages of our method are:

• Deep learning techniques are introduced for the first time to diagnose diabetes using HRV data (derived from ECG) as input.
• Since deep learning methods are employed, there is no need for separate feature extraction, feature selection and classification steps.
• Ours is an automated diabetes detection method whose accuracy is the highest among the HRV-based automated diabetes detection methods produced so far.

Table 4. Summary of methods used for automated detection of diabetes using HRV parameters

Authors          Methods                                             Performance

Ref [11]         Nonlinear (RQA features, Correlation Dimension)     Accuracy = 86%
Ref [12]         HOS (Bispectrum moments, entropies)                 Accuracy = 90.5%
Ref [13]         HOS (PCA features)                                  Accuracy = 79.93%
Ref [14]         Nonlinear (RQA features, Approximate Entropy)       Accuracy = 90.0%
Ref [15]         EMD related features                                difference between DM and normal
Ref [16]         DWT (entropies, energy, skewness, kurtosis)         Accuracy = 92.02%
Proposed paper   Deep learning using CNN, CNN-LSTM                   Accuracy = 95.1%

7. Conclusion

A large share of the human population is affected by diabetes. Diabetes cannot be cured; it can only be kept under
control. Uncontrolled diabetes can lead to complications, including several other chronic diseases. Hence, early
detection, efficient treatment and proper management of diabetes are very important. Diabetes causes nerve disorders that
can affect the heart and thus the heart rate. In our work, we use HRV data (extracted from ECG signals) to detect
diabetes with high accuracy. To the best of our knowledge, this is the first work to employ deep learning in detecting
diabetes using HRV data. The accuracy of 95.1% achieved using the CNN-LSTM network with 5-fold cross-validation
is the highest accuracy obtained so far in the automated detection of diabetes using HRV. There is no requirement for
explicit feature extraction or the use of traditional classifiers. Our method is non-invasive and reproducible. Our system
can assist clinicians in diagnosing diabetes accurately. As explained in the Results section, further improvement in
accuracy can be explored by feeding the proposed architecture with a larger input dataset than the one
used in this work.
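
The 5-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code of this work: the split is a plain shuffled (non-stratified) one, a toy threshold classifier stands in for the CNN-LSTM network, the 1-dimensional data are synthetic, and all function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(samples, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracies."""
    fold_acc = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        fold_acc.append(correct / len(test_idx))
    return sum(fold_acc) / k, fold_acc


# Toy stand-in classifier (assumption): threshold halfway between class means.
def fit_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Synthetic, well-separated 1-D data (assumption): 10 samples per class.
samples = [i / 10 for i in range(10)] + [10 + i / 10 for i in range(10)]
labels = [0] * 10 + [1] * 10

mean_acc, fold_acc = cross_validated_accuracy(
    samples, labels, fit_threshold, predict_threshold, k=5)
print("per-fold accuracy:", fold_acc)
print("mean accuracy:", mean_acc)
```

The reported figure in such a protocol is the mean of the per-fold accuracies, so every sample contributes to the test score exactly once while never being used to train the model that classifies it.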

Acknowledgement

We are grateful to NVIDIA India for the GPU hardware support for our research. We are grateful to the Computational
Engineering and Networking (CEN) department for encouraging the research. We would like to thank Dr. Venkatesh
(Government Medical College, Manjeri, Kerala, India) for sharing his ECG data for this study.

References

[1] Michael A Pfeifer, Daniel Cook, Joel Brodsky, David Tice, A Reenan, Sally Swedine, Jeffrey B Halter, and Daniel Porte. (1982) Quantitative evaluation of cardiac parasympathetic activity in normal and diabetic man. Diabetes 31 (4): 339–345.
[2] Robert E. Kleiger, J Thomas Bigger, Matthew S. Bosner, Mina K. Chung, James R. Cook, Linda M. Rolnitzky, Richard Steinman, and Joseph L. Fleiss. (1991) Stability over time of variables measuring heart rate variability in normal subjects. American Journal of Cardiology 68 (6): 626–630.
[3] Ary L. Goldberger and Bruce J. West. (1987) Applications of nonlinear dynamics to clinical cardiology. Annals of the New York Academy of Sciences 504 (1): 195–213.
[4] Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, and Mohak Shah. (2015) Comparative study of deep learning software frameworks. arXiv preprint arXiv:1511.06435.
[5] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. (2016) TensorFlow: A system for large-scale machine learning. In OSDI 16: 265–283.
[6] Laurens van der Maaten and Geoffrey Hinton. (2008) Visualizing data using t-SNE. Journal of machine learning research 9: 2579–2605.
[7] Rajendra Acharya U., Hamido Fujita, Oh Shu Lih, Muhammad Adam, Jen Hong Tan, and Chua Kuang Chua. (2017) Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network. Knowledge-Based Systems 132: 62–71.
[8] Rajendra Acharya U., Hamido Fujita, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Muhammad Adam. (2017) Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Information Sciences 415: 190–198.
[9] Rajendra Acharya U., Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, Muhammad Adam, Arkadiusz Gertych, and Ru San Tan. (2017) A deep convolutional neural network model to classify heartbeats. Computers in biology and medicine 89: 389–396.
[10] Akm Ashiquzzaman, Abdul Kawsar Tushar, Md Rashedul Islam, Dongkoo Shon, Kichang Im, Jeong-Ho Park, Dong-Sun Lim, and Jongmyon Kim. (2018) Reduction of overfitting in diabetes prediction using deep learning neural network. In IT Convergence and Security, pages 35–43. Springer.
[11] Rajendra Acharya U., Oliver Faust, Vinitha Sree S., Dhanjoo N. Ghista, Sumeet Dua, Paul Joseph, Thajudin Ahamed V.I., Nittiagandhi Janarthanan, and Toshiyo Tamura. (2013) An integrated diabetic index using heart rate variability signal features for diagnosis of diabetes. Computer methods in biomechanics and biomedical engineering 16 (2): 222–234.
[12] Goutham Swapna, Rajendra Acharya U., Vinitha Sree S., and Jasjit S. Suri. (2013) Automated detection of diabetes using higher order spectral features extracted from heart rate signals. Intelligent Data Analysis 17 (2): 309–326.
[13] Lee Wei Jian and Teik-Cheng Lim. (2013) Automated detection of diabetes by means of higher order spectral features obtained from heart rate signals. Journal of Medical Imaging and Health Informatics 3 (3): 440–447.
[14] Rajendra Acharya U., Oliver Faust, Nahrizul Adib Kadri, Jasjit S. Suri, and Wenwei Yu. (2013) Automated identification of normal and diabetes heart rate signals using nonlinear measures. Computers in biology and medicine 43 (10): 1523–1529.
[15] Ram Bilas Pachori, Pakala Avinash, Kora Shashank, Rajeev Sharma, and Rajendra Acharya U. (2015) Application of empirical mode decomposition for analysis of normal and diabetic RR-interval signals. Expert Systems with Applications 42 (9): 4567–4581.
[16] Rajendra Acharya U., Vidya K. Sudarshan, Dhanjoo N. Ghista, Wei Jie Eugene Lim, Filippo Molinari, and Meena Sankaranarayanan. (2015) Computer-aided diagnosis of diabetic subjects by heart rate variability signals using discrete wavelet transform method. Knowledge-based systems 81: 56–64.