
SPECIAL SECTION ON TRENDS, PERSPECTIVES AND PROSPECTS OF MACHINE LEARNING

APPLIED TO BIOMEDICAL SYSTEMS IN INTERNET OF MEDICAL THINGS

Received April 8, 2018, accepted April 28, 2018, date of publication May 7, 2018, date of current version May 24, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2833746

Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals

TINGXI WEN AND ZHONGNAN ZHANG, (Member, IEEE)
Software School, Xiamen University, Xiamen 361005, China
Corresponding author: Zhongnan Zhang ([email protected])
This work was supported by the Science and Technology Guiding Project of Fujian Province, China, under Grant 2015H0037 and
Grant 2016H0035.

ABSTRACT Epilepsy is a health problem that has seriously affected the quality of human life for many years. It is therefore important to accurately analyze and recognize epilepsy from EEG signals, and researchers have long attempted to extract new features from the signals for epilepsy recognition. However, it is very difficult to select useful features from the large number available in this diagnostic application. With the progress of artificial intelligence, unsupervised feature learning based on deep learning models can obtain, from unlabeled data, features that better describe the identified objects. In this paper, a deep convolution network and autoencoder-based model, named AE-CDNN, is constructed to perform unsupervised feature learning from epileptic EEG. We extract features with the AE-CDNN model and classify them on two public EEG data sets. Experimental results show that the features obtained by AE-CDNN yield better classification results than features obtained by principal component analysis and sparse random projection. Classifying the features obtained by the AE-CDNN model with several common classifiers achieves high accuracy, not inferior to the results of most recent studies. The results also show that the features learned by the AE-CDNN model are clear, effective, and easy to learn; they speed up convergence and reduce the training time of classifiers. Therefore, the AE-CDNN model can be effectively applied to feature extraction of epileptic EEG.

INDEX TERMS EEG, unsupervised learning, feature extraction, CNN, epileptic seizure.

I. INTRODUCTION
Epilepsy is a noninfectious chronic brain disease that affects people of all ages. With about 50 million patients at present, it is one of the most common neurological diseases in the world [1]. Seizures can cause cognitive dysfunction such as loss of consciousness, which can lead to serious physical harm to patients, e.g. fractures and injuries. Moreover, patients may suffer great mental pain because of shame and discrimination. Because epileptic seizures can cause irreversible damage to the brain and may result in unprovoked recurrent attacks, the analysis of epilepsy is of great significance.

The electroencephalogram (EEG) is a measure of the voltage fluctuation generated by the ion currents of neurons in the brain. It reflects the brain's bioelectrical activity and carries a large amount of physiological and disease information [2]. Because the frequency and rhythm of brain activity change during seizures, the EEG has become the most commonly used epilepsy diagnostic method. Since the first study using EEG to detect epilepsy by Gotman [3], researchers have performed many experiments on this technique [4]–[6]. The essence of EEG-based epileptic detection is the classification of patients' EEG signals. Some studies [7]–[9] examined the EEG during the presence and absence of epileptic seizures; others [10]–[12] studied healthy persons as well as epileptic patients during the onset and absence of seizures. These methods follow the steps of data acquisition and preprocessing, feature extraction, classification model training, and EEG signal classification. In data acquisition, research has focused on physiological signal sensors [13]–[15]. In data preprocessing, Sharma et al. [16] subtracted EEG signals of adjacent channels to reduce the effect of noise, and Anindya et al. [17] removed frequencies higher than 64 Hz by means of a 6th-order Butterworth filter. Feature extraction has always been the focus of EEG classification: it greatly reduces the dimension of the data, and a small number of features can describe EEG data well and improve classification performance considerably. There are many EEG feature extraction methods, drawing on time-domain, frequency-domain, time-frequency, and chaotic features. Zandi et al. [18] proposed a wavelet-based algorithm for real-time detection of epileptic seizures. Polat and Güneş [19] extracted features using the fast Fourier transform and classified the EEG with a decision tree classifier. Acharya et al. [20] reviewed entropy-based feature extraction for epilepsy detection, including approximate entropy, sample entropy, and spectral entropy.

The core of these common methods is effective feature extraction from the EEG. However, designing new features is complex and not easily verifiable, and it is extremely difficult to select an optimal subset from the large pool of time-domain, frequency-domain, time-frequency, and chaotic features. At the same time, some feature extraction processes, such as wavelet transforms, empirical mode analysis, and multifractal detrended analysis, are complicated and time-consuming. In recent years, deep learning has become a research hotspot in machine learning and has achieved high efficiency in computer vision. In EEG classification, Tabar and Halici [21] used the short-time Fourier transform to convert EEG into 2D images and then performed independent feature learning and classification with a deep learning method. Xun et al. [22] segmented the EEG into smaller sections with a sliding window and applied a context-learning model to each section to form "EEG words", compiled these into an "EEG dictionary" to obtain a new feature representation, and then combined the original data with the dictionary-derived features in the classifier. Deep learning can learn features from data independently, which greatly improves the performance of classification models. Because of this powerful feature learning ability, the deep convolutional neural network (CNN) has become an important research topic in the image field and has had a significant influence on EEG classification. Masci et al. [23] proposed the convolutional auto-encoder, an unsupervised CNN-based feature learning method. Chen et al. [24] presented several descriptors for feature-based matching using convolution and pooling auto-encoders. Noh et al. [25] proposed a novel semantic segmentation algorithm built on a deconvolution network. However, these unsupervised feature learning methods are mostly used in image analysis. This paper presents an unsupervised feature learning method based on convolution, deconvolution, and autoencoders, and applies it to epilepsy detection.

Although some experienced experts can identify epileptic EEG heuristically, and scholars have researched EEG-based epilepsy detection widely, many challenges remain in automatic epilepsy detection. Feature design and selection is difficult, and whether these methods can be applied to new patients is still unknown. Rather than improving classification accuracy with existing techniques, we still need a model with independent feature learning ability. Facing this challenge, we propose an auto-encoding-framework-based deep network that combines convolution and deconvolution to perform unsupervised feature learning from EEG signals. This model can effectively learn low-dimensional features from high-dimensional EEG data, helping classifiers achieve higher detection accuracy and faster speed.

The structure of this paper is as follows: Section II details the deep learning-based unsupervised feature learning model. Section III presents the experiment and results, describing the source and structure of the datasets and the classification results obtained by applying the proposed model. Section IV discusses the features and classification results and compares our model with other well-established models for EEG analysis. Finally, Section V concludes the paper.

FIGURE 1. Flowchart of the unsupervised learning method, whereby the light green and light blue dashed frames represent the training and testing phases respectively.

II. METHODOLOGY
This paper adopts an unsupervised learning method for the EEG of epilepsy patients so as to automatically obtain features that better describe the recognized objects. In other words, it extracts a small number of features from the original high-dimensional data by means of dimension reduction. The light green dashed frame in Fig. 1 is the training stage of the model, which uses unlabeled EEG signals to train the deep convolution network. The light blue dashed frame represents the testing phase, in which the test data are input into the model to obtain the relevant features. After this stage, we use several common classifiers to verify the validity of the test-data features. The classifier validation is described in the experimental results in Section III and discussed in Section IV. Section II-A describes the autoencoder framework used in the model, and Section II-B presents the structure and details of the model.

This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/


A. AUTOENCODERS FRAMEWORK
The autoencoder [26] is a special neural network structure with an input layer, a hidden layer, and an output layer. Training adjusts the weights of the hidden layer so that the input and output values are as close to each other as possible. The hidden layer therefore captures important features of the original signal, realizing unsupervised feature extraction. In this respect the autoencoder is similar to principal component analysis (PCA), which can also reduce the data dimension [27].

FIGURE 2. Autoencoder framework with input layer (x_1, x_2, ..., x_n), hidden layer (h_1, h_2, ..., h_m), and output layer (y_1, y_2, ..., y_n), whereby the weights of the hidden layer represent features of the input signal.

Fig. 2 illustrates the basic framework of the autoencoder, which has two processing phases, encoding and decoding. During encoding, the original input signal is x ∈ [0, 1]^n. We obtain the hidden layer h by the encoding function h = encoder(x) (h ∈ [0, 1]^m), defined as follows:

h = encoder(x) = g(W · x + b). (1)

Here, W ∈ R^{m×n} is the weight matrix connecting the input layer and the hidden layer, b ∈ [0, 1]^m is the bias vector, and g is the activation function. In the decoding phase, the hidden layer h is the input of the decoding function y = decoder(h), which produces the output layer y:

y = decoder(h) = g(W′ · h + b′). (2)

Here, the weight matrix between the hidden layer and the output layer is W′ ∈ R^{n×m}, and the bias vector is b′ ∈ [0, 1]^n. During training, we require each output signal y^(i) to be as close in value as possible to the original input signal x^(i). The objective function of the model is therefore:

min Σ_i ||y^(i) − x^(i)||. (3)

B. DEEP NETWORK BASED UNSUPERVISED FEATURE LEARNING MODEL
Because it is easy to directly copy the input vector to the output vector during the traditional autoencoder's training process, the model can perform poorly. When the test samples and training samples do not follow the same distribution, the prediction quality of the model decreases dramatically [28]. Moreover, EEG signal samples are high-dimensional and their dimensions are not independent, so a plain autoencoder has difficulty extracting effective features from EEG signals. On the other hand, a CNN can perceive adjacent dimensions of the signal (i.e. local perception) and attain local features through receptive fields and parameter sharing. This process is based on kernel convolution: applying multiple convolution kernels to the sample signals yields a variety of local features, while down-sampling reduces the number of features. The final features of the sample signals can therefore be extracted by iterating convolution and down-sampling. Our autoencoder-framework-based unsupervised feature learning works as follows. In the encoder part, we extract features by a CNN that repeatedly applies multi-kernel convolution and down-sampling to reduce the number of features to a preset value. In the decoder part, we use the extracted features to reconstruct the sample signals by deconvolution and up-sampling; that is, we first apply deconvolution, and then iterate up-sampling and deconvolution to restore the signal.

We present a fusion model based on the deep convolution network and autoencoder, named AE-CDNN. The structure of the AE-CDNN model is shown in Fig. 3. The model has two stages: 1) the encoder stage consists of the sample input, convolution layers, pooling (down-sampling) layers, a reshape operation, a full connection layer, and the feature coding; 2) the decoder stage consists of the feature coding input, a full connection layer, a reshape operation, deconvolution layers, up-sampling layers, and the reconstructed samples. The following is a specific description of each layer of the model.

Assume that we input one-dimensional EEG data into the model, and let x represent the input data. The convolution layer is the feature extractor: it uses multiple convolution kernels to convolve x (each kernel perceives x locally), producing feature maps that retain the main components of the input samples. The k-th feature map fm_k in the convolution layer is calculated as follows:

fm_k = g(w_c_k ∗ x + b_c_k). (4)

For the k-th convolution kernel of the convolution layer, w_c_k and b_c_k represent the filter and bias of the kernel, and ∗ is the convolution operation. The pooling layer is a down-sampling process that samples the upper layer's feature maps to obtain pooled feature maps, reducing the data dimension. The pooling operation slides a window of length l over the feature maps; the sampling intervals do not overlap, and we take the maximum value within each window to obtain the pooled feature maps:

pm_k = Maxpooling(fm_k, l). (5)
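As a concrete illustration of Equations (1)–(3), the encoding and decoding passes of the basic autoencoder can be sketched in plain Python. This is a toy sketch, not the paper's implementation: the 4-dimensional input, the 2-unit hidden layer, and all weight values are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, x, b):
    # g(W·x + b) with g = sigmoid; W is m x n, x has length n, b has length m.
    return [sigmoid(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
            for i in range(len(b))]

def encoder(x, W, b):        # Eq. (1): h = g(W·x + b)
    return affine(W, x, b)

def decoder(h, W_p, b_p):    # Eq. (2): y = g(W'·h + b')
    return affine(W_p, h, b_p)

def reconstruction_error(x, y):  # one term of Eq. (3): ||y - x||
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

# Illustrative 4-dim input and 2-unit hidden layer; weights are arbitrary.
x   = [0.1, 0.9, 0.4, 0.6]
W   = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
b   = [0.0, 0.1]
W_p = [[0.3, -0.5], [0.2, 0.4], [-0.1, 0.6], [0.5, 0.1]]
b_p = [0.0, 0.0, 0.1, -0.1]

h = encoder(x, W, b)          # hidden features (length 2)
y = decoder(h, W_p, b_p)      # reconstruction (length 4)
err = reconstruction_error(x, y)
```

Training would adjust W, b, W′, b′ to minimize the sum of `err` over all samples; here only the forward passes are shown.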


FIGURE 3. Structure of the AE-CDNN model, consisting of the encoder and decoder stages, whereby the left and right show the processing flow and data flow respectively.

We can iterate the convolution and pooling process to reduce the number and dimension of the pooled feature maps, resulting in m pooled feature maps. The reshape operation maps the pooled feature maps to a one-dimensional vector, and the feature coding is then achieved by the full connection layer, which synthesizes the information of all pooled feature maps. The reshape operation concatenates all pooled feature maps into a one-dimensional vector v of length r. After the calculation of the full connection layer, the vector v becomes the feature coding as follows:

c = g(w_v ∗ v + b_f). (6)

In the above equation, w_v and b_f represent the weight and bias of the full connection layer respectively, and c is the resulting feature coding. Decoding (signal reconstruction) starts after the encoding. First, the feature coding becomes a one-dimensional vector v′ of length r through the calculation of the second full connection layer:

v′ = g(w′_v ∗ c). (7)

The weight of this full connection layer is w′_v. Because we need to ensure that all information in the decoding process comes from the feature coding, there is no bias in this layer. The second reshape operation cuts v′ into m pooled feature maps, corresponding to the first reshape operation; the k-th pooled feature map is pm′_k. In the up-sampling layer, we use the original sampling window and insert the same value as the previous sample to attain fm′_k:

fm′_k = upsampling(pm′_k, l). (8)
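The encoder and decoder primitives of Equations (4), (5), and (8) — convolution, non-overlapping max pooling, and value-repeating up-sampling — can be sketched on a one-dimensional signal as follows. The kernel values, zero padding, and test signal are illustrative assumptions, not the paper's configuration, and the activation g is omitted for clarity.

```python
def conv1d_same(x, kernel, bias):
    # Eq. (4)-style feature map: slide the kernel over x with zero
    # padding and stride 1, then add the bias.
    k, half = len(kernel), len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j in range(k):
            idx = i + j - half
            if 0 <= idx < len(x):
                s += kernel[j] * x[idx]
        out.append(s + bias)
    return out

def max_pool(fm, l):
    # Eq. (5): non-overlapping windows of length l; keep the maximum.
    return [max(fm[i:i + l]) for i in range(0, len(fm), l)]

def upsample(pm, l):
    # Eq. (8): repeat each pooled value l times to restore the length.
    out = []
    for v in pm:
        out.extend([v] * l)
    return out

fm = conv1d_same([0.0, 1.0, 0.0, -1.0, 0.0, 1.0], [1.0, 0.0, -1.0], 0.0)
pm = max_pool(fm, 2)    # halves the length
up = upsample(pm, 2)    # back to the original length
```

Iterating `conv1d_same` and `max_pool` shrinks the representation (encoder); `upsample` followed by deconvolution grows it back (decoder).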


The deconvolution is:

y = g(Σ_k w_c′_k ◦ fm′_k + b_c′_k). (9)

Here, the deconvolution kernel w_c′_k has the shape of the transpose of w_c_k, b_c′_k is the bias, and ◦ denotes deconvolution. The reconstructed signal y is thus obtained. In the network, the output layer uses the sigmoid activation function and all other layers use the ReLU activation function.

FIGURE 4. Sample signals of dataset 1 based on the intracranial signal, whereby subsets C and D pertain to the EEG signals of epileptic patients during the absence of seizures and subset E pertains to the EEG signal during seizure.

Assume that there are N training samples and that y^(i) is computed from sample x^(i). We minimize a loss function analogous to the target function of the autoencoder model (based on Equation (3)):

Loss1 = (1/N) Σ_{i=1}^{N} ||x^(i) − y^(i)||. (10)

However, the signal is serial and its features are not independent. Samples differ in magnitude, and Equation (10) is dominated by large-magnitude samples. We therefore adopt a second loss calculation:

Loss2 = (1/N) Σ_{i=1}^{N} ||x^(i) − y^(i)|| / avg(x^(i)). (11)

Here, avg(x^(i)) is the average of sample x^(i). The loss function of this network is optimized using the Adam optimizer [22].
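A minimal sketch of the two loss functions, assuming samples are plain Python lists and that avg(x) is nonzero (which the (0, 1) normalization of Section III effectively guarantees); the toy samples and reconstructions are illustrative:

```python
def l2_norm_diff(x, y):
    # ||x - y|| for two equal-length vectors.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def loss1(X, Y):
    # Eq. (10): mean reconstruction error over the N samples.
    return sum(l2_norm_diff(x, y) for x, y in zip(X, Y)) / len(X)

def loss2(X, Y):
    # Eq. (11): each sample's error is scaled by its own mean amplitude,
    # so large-magnitude samples do not dominate the loss.
    return sum(l2_norm_diff(x, y) / (sum(x) / len(x))
               for x, y in zip(X, Y)) / len(X)

X = [[0.2, 0.4, 0.6], [0.8, 0.9, 1.0]]   # two toy samples
Y = [[0.2, 0.5, 0.6], [0.8, 0.8, 1.0]]   # their reconstructions
```

Both samples here have the same raw error, but loss2 weights the small-amplitude sample more heavily, which is exactly the behavior the paper motivates.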

III. EXPERIMENT AND RESULTS
This paper focuses on the unsupervised feature learning of EEG signals. The research process is as follows: 1) perform unsupervised feature learning on two data sets to obtain features; 2) use a variety of common classifiers to classify these features and verify their effectiveness by the classification accuracy. In this section, we explain the source of the experimental data and the preprocessing, describe the network structure and parameters of the AE-CDNN model, list the common classifiers used to validate the AE-CDNN model, and finally explain the experimental results.

A. DATA PREPROCESSING
1) DATASET 1
The first dataset is the online public dataset published by Andrzejak et al. [30]. It consists of five subsets (denoted A-E), and each subset has 100 EEG signals. Each EEG signal has a duration of 23.6 s and a length of 4096 samples; the dataset includes records of healthy subjects and epilepsy patients. Subsets A and B contain EEG records of 5 healthy volunteers, recorded with the standard international 10-20 electrode placement scheme. Subsets C and D contain EEG signals of 5 epileptic patients during the absence of seizures, and subset E contains EEG signals of epileptic patients during seizures. The C, D, and E subsets were recorded intracranially. Samples of subsets C, D, and E are shown in Fig. 4.

2) DATASET 2
The second dataset consists of scalp EEG signals collected at a children's hospital in Boston [31]. The EEG signals were obtained by measuring electrical activity in the brain through multiple electrodes attached to the patients' scalps. The dataset comprises EEG signals of 23 children with refractory epilepsy. All signals were recorded at 256 Hz with 16-bit resolution. Each sample has 23 channels, and the length of each channel is about 921600; some of the samples contain epileptic EEG signals. Each channel has its own name, e.g. the first channel is FP1-F7 (see Fig. 5). We selected one channel from the 23 channels for this study. The larger the variance of a signal, the greater its fluctuation range, and when epilepsy occurs the EEG signals fluctuate significantly.

FIGURE 5. Sample signal of dataset 2 based on the brain scalp signal, whereby the left and the right of the red line are non-epileptic and epileptic-seizure segments respectively.

Therefore, we chose the channel based on variance [32]. Our method is as follows: 1) calculate the variance of each channel in each sample, and select the channel with the maximum variance for each sample; 2) count these channels.
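The two-step channel selection rule can be sketched as follows. The channel names and toy samples are illustrative assumptions, with variances arranged so that the winning channel matches the paper's choice.

```python
from collections import Counter

def variance(sig):
    mean = sum(sig) / len(sig)
    return sum((v - mean) ** 2 for v in sig) / len(sig)

def pick_channel(samples, channel_names):
    # Step 1: for each sample, find the channel with maximum variance.
    winners = []
    for sample in samples:                    # sample: list of channels
        vs = [variance(ch) for ch in sample]
        winners.append(channel_names[vs.index(max(vs))])
    # Step 2: count the winning channels and keep the most frequent one.
    return Counter(winners).most_common(1)[0][0]

# Two toy 3-channel samples; "FT9-FT10" has the largest swings in both.
names = ["FP1-F7", "FT9-FT10", "F7-T7"]
samples = [
    [[0.1, 0.2, 0.1], [0.9, -0.8, 0.7], [0.0, 0.1, 0.0]],
    [[0.2, 0.1, 0.2], [1.0, -0.9, 0.8], [0.1, 0.0, 0.1]],
]
chosen = pick_channel(samples, names)   # → "FT9-FT10"
```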


FIGURE 6. Network structure constructed from the AE-CDNN model in the experiment, whereby the left is the encoding network and the right is the decoding network.

Among the first 10 patients, the ''FT9-FT10'' channel appeared most often, so we chose the ''FT9-FT10'' channel as the sample data for classification. From the EEG data of the first 10 patients, we randomly extracted 200 epileptic-seizure EEG samples and 200 non-seizure EEG samples on channel ''FT9-FT10''. Note that the length of each sample is 4096.

The output activation function of AE-CDNN is the sigmoid function, whose range is (0, 1). To ensure that the loss functions (10) and (11) are applicable, each dimension of the input layer should also lie in (0, 1). We used 0-1 normalization to map the sample data to [0, 1]. The transformation was as follows:

dist = max(x^(k)) − min(x^(k)), k ∈ {1, . . . , N} (12)
tranfun = (d − min(x^(k))) / dist. (13)

In the above equations, d is a dimension value of the input sample x, max(x^(k)) is the maximum value of the samples over all dimensions, and min(x^(k)) is the minimum of the samples over all dimensions.

B. NETWORK PARAMETERS AND CLASSIFIERS PARAMETERS
The deep network based on the AE-CDNN model is shown in Fig. 6. Although increasing the network depth can enhance the learning ability of the model, it may also cause vanishing gradients during training, or overfitting. We therefore fixed the network depth and the encoding and decoding process, and in our experiment we analyzed the learning ability of the model by setting different values of the feature coding length m (the feature quantity).

TABLE 1. Parameters used in each classifier.

In Fig. 6, Convolutional (k_size = 3, c = 16) represents a convolution layer, where k_size = 3 means that the convolution kernel size is 3 and c = 16 means that the number of output channels of this layer is 16. Pooling (k = 2) is a pooling layer, where k is the down-sampling factor with value 2 and the stride is 1. Deconvolutional (k_size = 3, c = 1) is a deconvolution layer whose kernel size is 3 and whose number of output channels is given by c. Depooling (k = 2) is the up-sampling layer. The original input data is a one-dimensional vector of length 4096. After Convolutional (k_size = 3, c = 16), it becomes a 4096∗16 matrix (16 channels, each of length 4096). Convolutional (k_size = 3, c = 32) then changes the matrix of the upper layer into 4096∗32, and the matrix becomes 2048∗32 after Pooling (k = 2).
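The 0-1 normalization of Equations (12)–(13) can be sketched as a min-max rescaling. Taking the minimum and maximum globally over all samples and dimensions is an assumption based on the wording "over all dimensions"; the input values are illustrative.

```python
def normalize_01(samples):
    # Eq. (12)-(13): dist = max - min over all samples and dimensions,
    # then each value d is mapped to (d - min) / dist, landing in [0, 1].
    lo = min(v for s in samples for v in s)
    hi = max(v for s in samples for v in s)
    dist = hi - lo
    return [[(v - lo) / dist for v in s] for s in samples]

scaled = normalize_01([[-2.0, 0.0], [2.0, 1.0]])
# every value now lies in [0, 1]; -2.0 → 0.0 and 2.0 → 1.0
```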


TABLE 2. Classification results of AE-CDNN-L1 for dataset 1.

TABLE 3. Classification results of AE-CDNN-L2 for dataset 1.

TABLE 4. Classification results of AE-CDNN-L1 for dataset 2.

In this paper, we used several common classifiers, including k-NN, support vector machines (linear and radial basis kernels), decision tree, random forests, multilayer neural network, the AdaBoost algorithm, and Gaussian Bayesian classification, to classify the features attained by the unsupervised algorithm and thereby verify the effectiveness of the AE-CDNN model. All these classifiers are from the scikit-learn library [33], with parameters set to the library defaults, as shown in Table 1. In this table, ''–'' indicates that a parameter was set to its default value.

C. EXPERIMENTAL RESULTS
In our experiment, we used the training set to train the model and the testing set to extract features with the trained model. We then used several common classifiers to classify the extracted features, verifying the effectiveness of the features obtained by the unsupervised learning method. For the AE-CDNN model, we proposed two different loss functions, (10) and (11). We refer to the method using function (10) as AE-CDNN-L1, and the method using function (11) as AE-CDNN-L2. Tables 2 to 5 present the classification results of the different classifiers for the features learned by AE-CDNN-L1 and AE-CDNN-L2 on the two datasets. We used 5-fold cross validation to calculate each classifier's accuracy. In the first column, m is the number of features that the model learned; for example, in Table 2, m = 2 means that the model learned two features, and after 5-fold cross validation the average accuracy of the k-NN classifier is 73.336%. In the last column, AVG is the average accuracy over the listed classifiers.

Fig. 7 shows how the average accuracy of AE-CDNN-L1 and AE-CDNN-L2 changes with the feature dimension on the two datasets. When the number of features is greater than 8, the average performance of the above classifiers is acceptable. In general, the accuracy and stability of AE-CDNN-L2 are better than those of AE-CDNN-L1.

IV. DISCUSSION
In this section, we analyze the effectiveness of the features extracted by AE-CDNN, compare the features of the model with those of other unsupervised feature extraction models, and compare our classification results with those of other related studies.
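Before examining the results, the k-fold protocol used to produce Tables 2–9 can be sketched with the standard library alone (in practice scikit-learn's cross_val_score does this). The contiguous fold split and the toy nearest-neighbour model below are illustrative stand-ins, not the paper's classifiers.

```python
def kfold_indices(n, k=5):
    # Split indices 0..n-1 into k contiguous folds; each fold serves once
    # as the test set while the remaining folds form the training set.
    fold = n // k
    for i in range(k):
        test = list(range(i * fold, (i + 1) * fold if i < k - 1 else n))
        train = [j for j in range(n) if j not in test]
        yield train, test

def cross_val_accuracy(X, y, fit, predict, k=5):
    accs = []
    for train, test in kfold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in test]
        accs.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(accs) / len(accs)   # the AVG-style mean accuracy

# Toy 1-nearest-neighbour "classifier" on scalar features.
def fit_1nn(Xs, ys):
    return list(zip(Xs, ys))

def predict_1nn(model, x):
    return min(model, key=lambda t: abs(t[0][0] - x[0]))[1]

X = [[0.0], [0.1], [0.2], [0.9], [1.0], [1.1], [0.05], [0.95], [0.15], [1.05]]
y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
acc = cross_val_accuracy(X, y, fit_1nn, predict_1nn)
```

Note that production code would also shuffle or stratify the folds; scikit-learn's KFold and cross_val_score handle that.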


TABLE 5. Classification results of AE-CDNN-L2 for dataset 2.

TABLE 6. Classification results of AE-CDNN-L1 for dataset 1.

TABLE 7. Classification results of AE-CDNN-L2 for dataset 1.

FIGURE 7. Average accuracy of AE-CDNN-L1 and AE-CDNN-L2 based on different feature dimensions of the two datasets.

A. NETWORK MODEL FEATURE EXTRACTION ANALYSIS
From Fig. 7, when m = 2, the network models had difficulty learning effective features. Tables 6 and 7 present the 5-fold cross validation results for dataset 1 with AE-CDNN-L1 and AE-CDNN-L2.

In Tables 6 and 7, an accuracy of 0.667 appears multiple times. This happens when the initial weights of the network are 0 and the gradient of the autoencoder model vanishes quickly. It is very difficult to train multiple hidden layers (such as 2-4 layers) [34], which can leave the hidden layers (including the feature coding) untrained with weights of 0. In addition, because the ratio of positive to negative samples in the testing set is 2:1, the accuracy is 0.667 when all samples are deemed positive and 0.333 when all samples are deemed negative. We can see that the hidden layers corresponding to the 2nd, 3rd, and 5th lines of Table 6, and to the 1st line of Table 7, were not trained; the network learned no features there. The same problem exists for dataset 2, where the ratio of positive to negative samples is 1:1: when the model learns no features, the accuracy becomes 0.5, as shown in the 2nd, 3rd, 4th, and 5th lines of Table 8.

We increased the coding length of the features. When m = 4, the model learns features more easily. Fig. 8 shows the feature distribution of the two models in the first test of dataset 1. Here, f1, f2, f3, and f4 represent the four features. Note that f2 and f3 of AE-CDNN-L1 remain at 0, which means that the model did not learn these features. From f1 and f4, we can see that the features obtained by AE-CDNN-L2 are better than those of AE-CDNN-L1 in terms of feature separation degree.
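The degenerate accuracies above follow directly from the class ratios: an untrained model that predicts one constant label scores exactly the prevalence of that label. A small sketch, with sample counts chosen only to reproduce the paper's 2:1 and 1:1 ratios (the absolute counts are illustrative assumptions):

```python
def constant_predictor_accuracy(n_pos, n_neg, predicted_label):
    # Accuracy of a model that predicts the same label for every sample.
    total = n_pos + n_neg
    return (n_pos if predicted_label == 1 else n_neg) / total

# 2:1 positive-to-negative ratio (dataset 1 test split):
acc_all_pos = constant_predictor_accuracy(200, 100, 1)   # ≈ 0.667
acc_all_neg = constant_predictor_accuracy(200, 100, 0)   # ≈ 0.333
# 1:1 ratio (dataset 2): either constant prediction scores 0.5.
acc_balanced = constant_predictor_accuracy(150, 150, 1)  # 0.5
```

Seeing exactly these values in a results table is therefore a quick diagnostic for a collapsed, feature-free model.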


TABLE 8. Classification results of AE-CDNN-L1 for dataset 2.

FIGURE 8. Feature distribution of the AE-CDNN-L1 and AE-CDNN-L2 models in the first test of dataset 1 with feature dimension 4.

This also indicates that AE-CDNN-L2 can learn better features.

In Fig. 8, f1 is the first feature of the learned feature coding array. f1 and f2 are independent of each other, and the f1 values of the two models are independent with no corresponding relationship. P and N represent the positive and negative samples. In deep learning, the change of the loss function value reflects the learning progress of the deep network. Fig. 9 shows the training of AE-CDNN-L1 and AE-CDNN-L2. We find that the testing-set loss functions of the two methods converged after 2000 epochs. From the convergence results of the two methods, we find that AE-CDNN-L2 could be the better choice.

B. COMPARISON OF DIMENSION REDUCTION METHODS
In the last section, we discussed the feature learning ability of the AE-CDNN model. In this section, we compare the model with the main existing dimension reduction methods. PCA is a very important linear dimension reduction method, which obtains low-dimensional features by linear transformation of high-dimensional data [27]. Random projection (RP) is a powerful method used to construct Lipschitz mappings so as to realize dimension reduction with a high probability.


T. Wen, Z. Zhang: Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning

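For a concrete sense of these baselines, the following minimal sketch reduces toy segments to 16 dimensions with PCA and with a sparse variant of random projection using scikit-learn; the data shape and target dimension are placeholder assumptions, not the paper's actual segments.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(0)
# Placeholder: 500 EEG-like segments of 256 samples each (not the paper's data).
X = rng.standard_normal((500, 256))

# PCA: data-dependent linear transformation onto the top principal components.
X_pca = PCA(n_components=16).fit_transform(X)

# Sparse random projection: data-independent projection through a sparse random matrix.
X_srp = SparseRandomProjection(n_components=16, random_state=0).fit_transform(X)

print(X_pca.shape, X_srp.shape)  # both (500, 16)
```

Both transforms map each segment to a 16-dimensional feature vector that can then be fed to the classifiers being compared.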
Sparse random projection (SRP) can reduce the dimensionality by projecting the original input space through a sparse random matrix. Fig. 10 presents a comparison of the average classification accuracy on the two datasets for the features obtained by AE-CDNN-L1, AE-CDNN-L2, PCA and SRP with each classifier. We find that the classification accuracy of the features obtained by PCA and SRP is more stable, which means that the number of effective features does not increase with the total number of features. However, AE-CDNN-L1 and AE-CDNN-L2 can obtain new features as the total number of features increases. When the number of features is greater than 4, the features obtained by these two networks yield better classification accuracy than those of PCA and SRP.

FIGURE 9. Change of loss function of AE-CDNN-L1 and AE-CDNN-L2 during training; the blue and green lines represent the loss on the training and testing sets, respectively.

FIGURE 10. Comparison of average classification accuracy on the two datasets, obtained by AE-CDNN-L1, AE-CDNN-L2, PCA and SRP with each classifier.

C. COMPARISON OF CLASSIFICATION APPLICATION
There are many classification studies on dataset 1. However, they all focus on designing new features by combining existing features, or on searching for useful features among a large number of candidates. For example, Pachori and Patidar [7] (2014) designed new features based on empirical mode decomposition (EMD) and the second-order difference plot (SODP) and classified them with a neural network; the best classification accuracy was 97.75%. Sharma and Pachori [8] (2015) combined the previous method with a phase space representation to obtain new features; using an SVM, their best classification accuracy reached 98.67%. Wen et al. used an optimization algorithm to search for features in the frequency domain of the signal [36], achieving an accuracy of 99%. It is worth noting that high classification accuracy is often tied to a specific data processing method, feature design, classification model, parameter setting, etc. When these constraints change, the accuracy fluctuates. Therefore, we need a model with autonomous learning ability that is less dependent on the classifier.

TABLE 9. Classification results of AE-CDNN-L1 for dataset 1 (10-fold cross validation).

Table 9 presents the classification results of AE-CDNN-L1 for dataset 1 under 10-fold cross validation. When m = 16 or 32, most classifiers achieve good results without parameter tuning.

Table 10 shows the 10-fold cross-validation results of a neural network (NN-2) with two hidden layers (containing 22 and 12 nodes, respectively) on the features obtained by AE-CDNN-L1 in Table 9. The numbers 1-10 denote 10 independent experiments, and AVG is the average classification accuracy over the 10 experiments. The results show that the best classification accuracy reaches 100%. Fig. 11 shows the change of the loss function values during the NN-2 training process of Table 10. Based on the features obtained by AE-CDNN-L2, NN-2 converges well at approximately 500 epochs, which indicates that the features obtained by AE-CDNN-L2 are clear, effective and easy to learn.

Most studies on the classification of epileptic EEG focus on only one dataset, but the application of the AE-CDNN model to dataset 2 also gives good classification results.
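The evaluation protocol described above, 10-fold cross validation of an NN-2-style network with hidden layers of 22 and 12 nodes, can be sketched as follows; the synthetic features and labels and the training settings are placeholder assumptions, not the paper's data or exact configuration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))        # stand-in for 16-dimensional learned features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary labels (positive / negative class)

# NN-2-style classifier: two hidden layers with 22 and 12 nodes.
clf = MLPClassifier(hidden_layer_sizes=(22, 12), max_iter=2000, random_state=0)

# 10-fold cross validation; each score is the accuracy on one held-out fold.
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean())
```

Averaging the ten fold accuracies gives the AVG figure reported per experiment in Table 10.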

Because dataset 2 has multiple channels, Xun et al. used a window of length 5 s to extract single-channel samples for classification, with an error rate of 22.93% [22]. Although there are some differences in preprocessing, Table 5 shows that our method achieves better classification accuracy.

TABLE 10. Results of 10-fold cross validation based on NN-2.

FIGURE 11. Change of loss function values of the NN-2 training process in Table 10.

V. CONCLUSION
In this paper, we use a deep convolution network and autoencoders to perform unsupervised feature learning on epileptic EEG signals and construct the AE-CDNN model. The model extracts features from unlabeled EEG signals, greatly reducing the data dimension while achieving good classification accuracy. In our experiments, we used two public EEG datasets (one of intracranial EEG signals and the other of scalp EEG signals) for feature learning and classification. Judging from the classification results alone, with multiple classifiers and no parameter tuning, the average classification accuracy of the obtained features exceeds 92% when the feature dimension is greater than 16, and the new method is better than PCA and SRP in feature effectiveness after dimension reduction. However, the model performs badly when the feature dimension is low; we find that this is caused by the initial weight of the network. Therefore, our next research objective is to discover how to pre-train the initial weights effectively. Our method is not inferior in classification accuracy to the other techniques, and because traditional epilepsy classifications are based on a single dataset, our method is less constrained. In the future, we plan to develop a meaningful visualization of the features extracted by the deep convolution network, which can be applied to feature recognition at all stages of epileptic seizure.

ACKNOWLEDGMENT
The findings achieved herein are solely the responsibility of the authors.

REFERENCES
[1] WHO. (2012). Epilepsy. [Online]. Available: http://www.who.int/mediacentre/factsheets/fs999/zh/
[2] M. Z. Koubeissi, Niedermeyer's Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, 6th ed. Philadelphia, PA, USA: Lippincott Williams & Wilkins, 2010.
[3] J. Gotman, "Automatic recognition of epileptic seizures in the EEG," Electroencephalogr. Clin. Neurophysiol., vol. 54, no. 5, pp. 530–540, 1982.
[4] Y. Zhang, G. Xu, J. Wang, and L. Liang, "An automatic patient-specific seizure onset detection method in intracranial EEG based on incremental nonlinear dimensionality reduction," Comput. Biol. Med., vol. 40, nos. 11–12, pp. 889–899, 2010.
[5] A. Shoeb, A. Kharbouch, J. Soegaard, S. Schachter, and J. Guttag, "A machine-learning algorithm for detecting seizure termination in scalp EEG," Epilepsy Behav., vol. 22, no. 1, pp. S36–S43, 2011.
[6] M. Qaraqe, M. Ismail, and E. Serpedin, "Band-sensitive seizure onset detection via CSP-enhanced EEG features," Epilepsy Behav., vol. 50, pp. 77–87, Sep. 2015.
[7] R. B. Pachori and S. Patidar, "Epileptic seizure classification in EEG signals using second-order difference plot of intrinsic mode functions," Comput. Methods Programs Biomed., vol. 113, no. 2, pp. 494–502, 2014.
[8] R. Sharma and R. B. Pachori, "Classification of epileptic seizures in EEG signals based on phase space representation of intrinsic mode functions," Expert Syst. Appl., vol. 42, no. 3, pp. 1106–1117, 2015.
[9] G. Zhu, Y. Li, and P. Wen, "Epileptic seizure detection in EEGs signals using a fast weighted horizontal visibility algorithm," Comput. Methods Programs Biomed., vol. 115, no. 2, pp. 64–75, Jul. 2014.
[10] A. T. Tzallas, M. G. Tsipouras, and D. I. Fotiadis, "Automatic seizure detection based on time-frequency analysis and artificial neural networks," Comput. Intell. Neurosci., vol. 2007, Oct. 2007, Art. no. 80510.
[11] E. D. Übeyli, "Least squares support vector machine employing model-based methods coefficients for analysis of EEG signals," Expert Syst. Appl., vol. 37, no. 1, pp. 233–239, 2010.
[12] A. S. M. Murugavel and S. Ramakrishnan, "Hierarchical multi-class SVM with ELM kernel for epileptic EEG signal classification," Med. Biol. Eng. Comput., vol. 54, no. 1, pp. 149–161, 2016.
[13] X. Xiao, S. Pirbhulal, and K. Dong, "Performance evaluation of plain weave and honeycomb weave electrodes for human ECG monitoring," J. Sensors, vol. 2017, Jul. 2017, Art. no. 7539840.
[14] S. Pirbhulal, H. Zhang, and S. C. Mukhopadhyay, "An efficient biometric-based algorithm using heart rate variability for securing body sensor networks," Sensors, vol. 15, no. 7, pp. 15067–15089, 2015.
[15] W. Wu, H. Zhang, S. Pirbhulal, S. C. Mukhopadhyay, and Y. T. Zhang, "Assessment of biofeedback training for emotion management through wearable textile physiological monitoring system," IEEE Sensors J., vol. 15, no. 12, pp. 7087–7095, Dec. 2015.
[16] R. Sharma, R. B. Pachori, and S. Gautam, "Empirical mode decomposition based classification of focal and non-focal seizure EEG signals," in Proc. Int. Conf. Med. Biometrics, May 2014, pp. 135–140.
[17] A. B. Das and M. I. H. Bhuiyan, "Discrimination and classification of focal and non-focal EEG signals using entropy-based features in the EMD-DWT domain," Biomed. Signal Process. Control, vol. 29, pp. 11–21, Aug. 2016.


[18] A. S. Zandi, M. Javidan, G. A. Dumont, and R. Tafreshi, "Automated real-time epileptic seizure detection in scalp EEG recordings using an algorithm based on wavelet packet transform," IEEE Trans. Biomed. Eng., vol. 57, no. 7, pp. 1639–1651, Jul. 2010.
[19] K. Polat and S. Güneş, "Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform," Appl. Math. Comput., vol. 187, no. 2, pp. 1017–1026, 2007.
[20] U. R. Acharya, H. Fujita, V. K. Sudarshan, S. Bhat, and J. E. W. Koh, "Application of entropies for automated diagnosis of epilepsy using EEG signals: A review," Knowl.-Based Syst., vol. 88, pp. 85–96, Nov. 2015.
[21] Y. R. Tabar and U. Halici, "A novel deep learning approach for classification of EEG motor imagery signals," J. Neural Eng., vol. 14, no. 1, p. 016003, 2017.
[22] G. Xun, X. Jia, and A. Zhang, "Detecting epileptic seizures with electroencephalogram via a context-learning model," BMC Med. Inform. Decision Making, vol. 16, no. 2, p. 70, 2016.
[23] J. Masci, U. Meier, C. Dan, and J. Schmidhuber, "Stacked convolutional auto-encoders for hierarchical feature extraction," in Proc. Int. Conf. Artif. Neural Netw., 2011, pp. 52–59.
[24] L. Chen, F. Rottensteiner, and C. Heipke, "Feature descriptor by convolution and pooling autoencoders," Int. Arch. Photogram., Remote Sens. Spatial Inf. Sci., vol. 3, no. 3, pp. 31–38, 2015.
[25] H. Noh, S. Hong, and B. Han, "Learning deconvolution network for semantic segmentation," in Proc. IEEE Int. Conf. Comput. Vis., May 2015, pp. 1520–1528.
[26] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Proc. Int. Conf. Neural Inf. Process. Syst., 1993, pp. 3–10.
[27] T. Ahmad, R. A. Fairuz, F. Zakaria, and H. Lsa, "Selection of a subset of EEG channels of epileptic patient during seizure using PCA," in Proc. World Sci. Eng. Acad. Soc. (WSEAS), 2008, pp. 270–273.
[28] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[29] D. P. Kingma and J. Ba. (2014). "Adam: A method for stochastic optimization." [Online]. Available: https://arxiv.org/abs/1412.6980
[30] R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, "Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state," Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 64, no. 6, p. 061907, 2001.
[31] A. H. Shoeb, "Application of machine learning to epileptic seizure onset detection and treatment," Ph.D. dissertation, Dept. Electr. Med. Eng., Massachusetts Inst. Technol., Cambridge, MA, USA, 2009.
[32] T. Alotaiby, F. E. A. El-Samie, and S. A. Alshebeili, "A review of channel selection algorithms for EEG signal processing," EURASIP J. Adv. Signal Process., vol. 2015, p. 66, Dec. 2015.
[33] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[34] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[35] J. Wang, Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Beijing, China: Higher Education Press, 2012.
[36] T. Wen and Z. Zhang, "Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification," Medicine, vol. 96, no. 19, p. e6879, 2017.

TINGXI WEN was born in Fujian, China. He received the M.S. degree in software engineering from Xiamen University. He has published widely in the field of robotic control based on EMG using various data classification methods. He has also developed a novel multi-objective optimization technique based on the Spark platform, applied to routing optimization. His research interests include data mining, machine learning, and cloud computing.

ZHONGNAN ZHANG (M'15) received the B.E. and M.E. degrees in computer science and technology from Southeast University, Nanjing, China, in 1999 and 2001, respectively, and the Ph.D. degree in computer science from The University of Texas at Dallas, TX, USA, in 2008. Since 2017, he has been a Full Professor with the Software School, Xiamen University, Xiamen, China, where he was an Assistant Professor from 2009 to 2012 and an Associate Professor from 2012 to 2017. His research interests include big data analysis, data mining, machine learning, and bioinformatics.
