Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals
Received April 8, 2018, accepted April 28, 2018, date of publication May 7, 2018, date of current version May 24, 2018.
Digital Object Identifier 10.1109/ACCESS.2018.2833746
ABSTRACT Epilepsy is a health problem that has seriously affected human quality of life for many years. It is therefore important to accurately analyze and recognize epilepsy based on EEG signals, and researchers have long attempted to extract new features from the signals for epilepsy recognition. However, it is very difficult to select useful features from the large number available in this diagnostic application. With the development of artificial intelligence, unsupervised feature learning based on deep learning models can obtain features that better describe the identified objects from unlabeled data. In this paper, a deep convolution network and autoencoder-based model, named AE-CDNN, is constructed to perform unsupervised feature learning from epileptic EEG. We extract features with the AE-CDNN model and classify them on two public EEG datasets. Experimental results show that the classification results of features obtained by AE-CDNN are better than those of features obtained by principal component analysis and sparse random projection. Using several common classifiers to classify the features obtained by the AE-CDNN model yields high accuracy, not inferior to the results of most recent studies. The results also show that the features learned by the AE-CDNN model are clear, effective, and easy to learn; they can speed up convergence and reduce the training time of classifiers. Therefore, the AE-CDNN model can be effectively applied to feature extraction of epileptic EEG.
INDEX TERMS EEG, unsupervised learning, feature extraction, CNN, epileptic seizure.
I. INTRODUCTION
Epilepsy is a noninfectious chronic brain disease that affects people of all ages. With about 50 million patients at present, it is one of the most common neurological diseases in the world [1]. Seizures can cause cognitive dysfunction such as loss of consciousness, which leads to great physical harm to patients, e.g., fractures and injuries. Moreover, patients may suffer great mental pain because of shame and discrimination. Because epileptic seizures can cause irreversible damage to the brain and may result in unprovoked recurrent attacks, it is of great significance to analyze epilepsy.

Electroencephalogram (EEG) is a measure of the voltage fluctuation generated by the ion currents of neurons in the brain; it reflects the brain's bioelectrical activity and contains a large amount of physiological and disease information [2]. Because the frequency and rhythm of brain activity can change during seizures, EEG has become the most commonly used epilepsy diagnostic method. After the first study using EEG to detect epilepsy by Gotman [3], researchers have performed many experiments on this technique [4]–[6]. The essence of EEG-based epileptic detection is the classification of patients' EEG signals. Some studies [7]–[9] examined EEG during the presence and absence of epileptic seizures; others [10]–[12] studied healthy persons as well as epileptic patients during the onset and absence of seizures. These methods follow the steps of data acquisition and preprocessing, feature extraction, classification model training, and EEG signal classification. In data acquisition, research has focused on physiological signal sensors [13]–[15]. In data preprocessing, Sharma et al. [16] subtracted EEG signals of adjacent channels to reduce the effect of noise, and Das and Bhuiyan [17] filtered frequencies that are
higher than 64 Hz by means of a 6th-order Butterworth filter. Feature extraction has always been the focus of EEG classification, as it greatly reduces the dimension of the data: a small number of features can describe EEG data well and greatly improve classification performance. There are many EEG feature extraction methods, such as time-domain, frequency-domain, time-frequency analysis, and chaotic features. Zandi et al. [18] proposed a wavelet-based algorithm for real-time detection of epileptic seizures. Polat and Güneş [19] extracted features using the fast Fourier transform and classified the EEG with a decision tree classifier. Acharya et al. [20] reviewed entropy-based feature extraction for epilepsy detection, covering measures such as approximate entropy, sample entropy, and spectral entropy.

The core of these common methods is to perform effective feature extraction for EEG. However, the process of designing new features is complex and not easily verifiable, and it is extremely difficult to select optimal characteristics from the large number of time-domain, frequency-domain, time-frequency analysis, and chaotic features. At the same time, some wavelet transform, empirical mode analysis, and trend multifractal feature extraction processes are complicated and time-consuming. In recent years, deep learning has become a research hotspot in machine learning and has achieved high efficiency in computer vision. In EEG classification, Tabar and Halici [21] used the short-time Fourier transform to convert EEG into 2D images and then performed independent feature learning and classification with a deep learning method. Xun et al. [22] dissected the EEG into smaller sections through a small window and used a context-learning model for each section to form various ''EEG words'', compiling them into an ''EEG dictionary'' and thereby obtaining a new feature. They then used the original data to attain new features from the ''EEG dictionary'' and fed two parts of these features into the classifier. Deep learning can independently learn features from data, which greatly improves the performance of the classification model. Because of its powerful feature learning ability, the deep convolution neural network (CNN) has become an important research hotspot in the image field and has had a significant influence on EEG classification. Masci et al. [23] proposed the convolutional auto-encoder, an unsupervised feature learning method based on CNN. Chen et al. [24] presented several descriptors for feature-based matching using convolution and pooling auto-encoders. Noh et al. [25] proposed a novel semantic segmentation algorithm by implementing a deconvolution network. However, these unsupervised feature learning methods are mostly used in the image analysis field. This paper presents an unsupervised feature learning method based on convolution, deconvolution, and autoencoders, and applies it to epilepsy detection.

Although some experienced experts can identify epileptic EEG heuristically, and scholars have researched EEG-based epilepsy detection widely, there are still many challenges in automatic epilepsy detection. Feature design and selection is difficult, and whether these methods can be applied to new patients is still unknown. Beyond improving classification accuracy with existing techniques, we still need a model with independent feature learning ability. Facing this challenge, we propose an autoencoder-framework-based deep network combining convolution and deconvolution in order to perform unsupervised feature learning from EEG signals. This model can effectively learn low-dimensional features from high-dimensional EEG data, helping classifiers achieve higher detection accuracy and faster speed.

The structure of this paper is as follows. Section II details the deep learning-based unsupervised feature learning model. The experiment and results are presented in Section III, where we describe the source and structure of the datasets and the classification results of applying the proposed model. Section IV discusses the features and classification results and compares our model with other well-established models for analyzing EEG. Finally, Section V concludes this paper.

FIGURE 1. Flowchart for the unsupervised learning method, whereby the light green and light blue dashed frames represent the training and testing phases, respectively.

II. METHODOLOGY
This paper adopts an unsupervised learning method for the EEG of epilepsy patients so as to automatically obtain features that better describe the recognized objects. In other words, it extracts a number of features from the original high-dimensional data by means of dimension reduction algorithms. The light green dashed frame in Fig. 1 is the training stage of the model, which uses unlabeled EEG signals to train the deep convolution network. The light blue dashed frame represents the testing phase, in which the test data is input into the model to obtain the relevant features. After this stage, we use different common classifiers to verify the validity of the test data features. The classifier validation is described with the experimental results in Section III and discussed in Section IV. Section II-A describes the autoencoders framework used in the model, and the structure and details of the model are given in Section II-B.
A. AUTOENCODERS FRAMEWORK
Autoencoders [26] is a special neural network structure with an input layer, an output layer, and a hidden layer. It adjusts the weights of the hidden layer by training so that the input and output values are as close to each other as possible. The hidden layer therefore carries important features of the original signal, realizing unsupervised feature extraction. Autoencoders is similar to Principal Component Analysis (PCA) in that it can reduce the data dimension [27].

FIGURE 2. Autoencoders framework that includes input layer (x1, x2, ..., xn), hidden layer (h1, h2, ..., hm), and output layer (y1, y2, ..., yn), whereby the weights of the hidden layer represent features of the input signal.

Fig. 2 illustrates the basic framework of autoencoders, which has two data processing phases, encoding and decoding. During encoding, the original input signal is x ∈ [0, 1]^n. We obtain the hidden layer h ∈ [0, 1]^m by the encoding function h = encoder(x), defined as follows:

h = encoder(x) = g(W · x + b).  (1)

Here, W ∈ R^(m×n) is the weight matrix connecting the input layer and the hidden layer, b ∈ [0, 1]^m is the bias vector, and g is the activation function. In the decoding phase, the hidden layer h is the input of the decoding function y = decoder(h), which produces the output layer y:

y = decoder(h) = g(W′ · h + b′).  (2)

Here, the weight matrix between the hidden layer and the output layer is W′ ∈ R^(n×m), and the bias vector is b′ ∈ [0, 1]^n. During training, we let each output signal y^(i) be as close in value as possible to the original input signal x^(i). The objective function of the model is therefore:

min Σ_i ||y^(i) − x^(i)||.  (3)

B. DEEP NETWORK BASED UNSUPERVISED FEATURE LEARNING MODEL
Because it is easy to directly copy the input vector to the output vector during the traditional autoencoders' training process, the model has inferior performance: when the test samples and training samples do not follow the same distribution, the prediction quality of the model decreases dramatically [28]. Moreover, EEG signal samples are high-dimensional and their dimensions are not independent, so it is difficult for plain autoencoders to extract effective features from EEG signals. On the other hand, a CNN can perceive adjacent dimensions of the signal (i.e., local perception) to attain local features through receptive fields and parameter sharing. This process is based on kernel convolution. Applying multiple convolution kernels to the sample signals allows us to obtain a variety of local features, and at the same time the number of features can be reduced by down-sampling. Therefore, the final features of the sample signals can be extracted by iterating convolution and down-sampling. Our autoencoder-framework-based unsupervised feature learning is described as follows. In the encoder part, we extract features with a CNN, which repeatedly applies multi-kernel convolution and down-sampling to reduce the number of features to a preset value. Then, in the decoder part, we use the extracted features to reconstruct the sample signals by deconvolution and up-sampling; that is, we first apply a deconvolution and then iterate up-sampling and deconvolution so as to restore the signal.

We present a fusion model based on the deep convolution network and autoencoders (AE-CDNN). The structure of the AE-CDNN model is shown in Fig. 3. The model has two stages: 1) the encoder stage consists of the sample input, convolution layers, pooling (down-sampling) layers, a reshape operation, a full connection layer, and the feature coding; 2) the decoder stage consists of the feature coding input, a full connection layer, a reshape operation, deconvolution layers, up-sampling layers, and the reconstructed samples. The following presents a specific description of each layer of the model.

Assume that we input one-dimensional EEG data into the model. Let x represent the input data; the convolution layer is the feature extractor. It uses multiple convolution kernels to perform convolution on x (multiple convolution kernels perceive x locally), so as to attain multiple feature maps, which maintain the main components of the input samples. The k-th feature map fm_k in the convolution layer is calculated as follows:

fm_k = g(w_c_k ∗ x + b_c_k).  (4)

For the k-th convolution kernel of the convolution layer, w_c_k and b_c_k represent the filter and bias of the kernel, and ∗ is the convolution operation. The pooling layer is a down-sampling process, which samples the upper layer's feature maps to obtain pooled feature maps, reducing the data dimension. The pooling operation slides a window of length l over the feature maps; the sampling intervals do not overlap, and we take the maximum value within each window to obtain the pooled feature maps:

pm_k = Maxpooling(fm_k, l).  (5)
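The convolution and non-overlapping max-pooling steps of equations (4) and (5) can be sketched in NumPy as follows; the toy signal, kernel values, and window length are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

def sigmoid(z):
    """Activation g used throughout the model."""
    return 1.0 / (1.0 + np.exp(-z))

def conv_feature_map(x, w_c, b_c):
    """Equation (4): fm_k = g(w_c_k * x + b_c_k) for a 1-D signal.
    'same' padding keeps the map the same length as the input."""
    return sigmoid(np.convolve(x, w_c, mode="same") + b_c)

def max_pool(fm, l):
    """Equation (5): non-overlapping max pooling with window length l."""
    n = (len(fm) // l) * l              # drop any tail shorter than l
    return fm[:n].reshape(-1, l).max(axis=1)

x = np.linspace(0.0, 1.0, 16)           # toy 1-D "EEG" sample
fm = conv_feature_map(x, w_c=np.array([0.2, 0.5, 0.2]), b_c=0.1)
pm = max_pool(fm, l=2)
print(fm.shape, pm.shape)               # (16,) (8,)
```

Iterating these two operations, as the model does, halves the map length at each pooling step while the sigmoid keeps every value inside (0, 1).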
FIGURE 3. Structure of the AE-CDNN model, consisting of the encoder and decoder stages; the left shows the processing flow and the right the data flow.
Here, we can iterate multiple rounds of convolution and pooling to reduce the number and dimension of the pooled feature maps, resulting in m pooled feature maps. The reshape operation maps the pooled feature maps to a one-dimensional vector, and feature coding is then achieved by a full connection layer, which synthesizes the information of all pooled feature maps. The reshape operation turns all pooled feature maps into a one-dimensional vector v of length r. After the calculation of the full connection layer, the vector v becomes the feature coding as follows:

c = g(w_v ∗ v + b_f).  (6)

In the above equation, w_v and b_f represent the weight and bias of the full connection layer, respectively, and c is the achieved feature. The decoding (signal reconstruction) starts after the encoding. First, the feature coding becomes a one-dimensional vector v′ of length r through the calculation of the second full connection layer:

v′ = g(w′_v ∗ c).  (7)

The weight of this full connection layer is w′_v. Because we need to ensure that all information in the decoding process comes from the feature coding, there is no bias in this layer. The second reshape operation cuts v′ into m pooled feature maps, corresponding to the first reshape operation; the k-th pooled feature map is pm′_k. In the up-sampling layer, we use the original sampling window to insert the same value as the previous sample to attain fm′_k as follows:

fm′_k = upsampling(pm′_k, l).  (8)

However, the signal is serial and its features are not independent. There are differences between samples in terms of magnitude, and Equation (10) is strongly affected by large-magnitude samples. We therefore adopt a new loss calculation method:

Loss2 = (1/N) Σ_{i=1}^{N} ||x^(i) − y^(i)|| / avg(x^(i)).  (11)

Here, avg(x^(i)) is the average of sample x^(i). The loss function of this network is optimized using the Adam optimizer [29].
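A sketch of the magnitude-normalized loss in equation (11) follows, under the assumption that the per-sample deviation is measured with a mean absolute difference; the exact norm is not fully recoverable from the extracted equation, and the toy inputs are illustrative.

```python
import numpy as np

def loss2(X, Y):
    """Equation (11): reconstruction error of each sample scaled by its
    average magnitude avg(x_i), then averaged over the N samples.
    X, Y: arrays of shape (N, length) holding inputs and reconstructions."""
    per_sample = np.abs(X - Y).mean(axis=1)   # deviation of each sample
    scale = X.mean(axis=1)                    # avg(x_i); inputs lie in (0, 1)
    return float((per_sample / scale).mean())

X = np.array([[0.2, 0.4, 0.6], [0.1, 0.1, 0.4]])
Y = X + 0.05                                  # a uniformly biased reconstruction
print(round(loss2(X, Y), 4))                  # 0.1875
```

Dividing by avg(x^(i)) makes the same absolute error count for more on a low-amplitude sample (second row) than on a high-amplitude one, which is the motivation given above.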
FIGURE 6. Network structure constructed from the AE-CDNN model in the experiment; the left is the encoding network and the right is the decoding network.
variance for each sample; 2) count these channels. In the first 10 patients, the ''FT9-FT10'' channel appeared most often, so we chose the ''FT9-FT10'' channel as sample data for classification. For the EEG data of the first 10 patients, we randomly extracted 200 epileptic-seizure EEG signal samples and 200 non-seizure EEG signal samples on channel ''FT9-FT10''. Note that the length of each sample is 4096.

The activation function of AE-CDNN is the sigmoid function, whose output range is (0, 1). To ensure that loss functions (10) and (11) are applicable, each dimension of the input layer should lie in (0, 1). We used 0-1 normalization to map the sample data to [0, 1]. The transformation is as follows:

dist = max x^(k) − min x^(k),  k ∈ {1, . . . , N}  (12)

tranfun = (d − min x^(k)) / dist.  (13)

In the above equations, d is a dimension value of the input sample x, max x^(k) is the maximum value of the samples over all dimensions, and min x^(k) is the minimum over all dimensions.

B. NETWORK PARAMETERS AND CLASSIFIERS PARAMETERS
The AE-CDNN model-based deep network is shown in Fig. 6. Although increasing the network depth can enhance the learning ability of the model, it may also cause vanishing gradients during training, or overfitting. In the deep network, we fixed the network depth and the encoding and decoding process. In our experiment, we analyzed the learning ability of the model by setting different values for the feature coding length m (the feature quantity).

In Fig. 6, Convolutional(k_size = 3, c = 16) represents a convolution layer, where k_size = 3 means the convolution kernel size is 3 and c = 16 means the number of output channels of this layer is 16. Pooling(k = 2) is a pooling layer, where k is the down-sampling factor, whose value is 2. Deconvolutional(k_size = 3, c = 1) is a deconvolution layer whose kernel size is 3 and number of output channels is 1. Depooling(k = 2) is an up-sampling layer. The original input data was a one-dimensional vector of length 4096. After Convolutional(k_size = 3, c = 16), it became a matrix of 4096×16 (16 channels, each of length 4096). Convolutional(k_size = 3, c = 32) then changed this into a 4096×32 matrix, which became 2048×32 after Pooling(k = 2).

TABLE 1. Parameters used in each classifier.
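The 0-1 normalization of equations (12) and (13) can be sketched as follows; the max and min are taken over all values of the sample set, as the text describes, and the toy amplitudes are illustrative.

```python
import numpy as np

def normalize01(samples):
    """Equations (12)-(13): map every dimension value d of the samples
    to (d - min) / dist, where dist = max - min over all samples."""
    lo, hi = samples.min(), samples.max()
    dist = hi - lo                        # equation (12)
    return (samples - lo) / dist          # equation (13)

raw = np.array([[-3.0, 0.0, 1.0], [2.0, 5.0, -1.0]])  # toy EEG amplitudes
scaled = normalize01(raw)
print(scaled.min(), scaled.max())        # 0.0 1.0
```

After this mapping every input dimension lies in [0, 1], matching the sigmoid output range that the loss functions require.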
In this paper, we used several common classifiers, including k-NN, support vector machine (linear kernel and radial basis kernel), decision tree, random forests, multilayer neural network, the AdaBoost algorithm, and Gaussian Bayesian classification, to classify the features attained by the unsupervised algorithm in order to verify the effectiveness of the AE-CDNN model. All these classifiers are from the scikit-learn library [33], and their parameters are the default parameters of the library, as shown in Table 1. In this table, ''–'' indicates that the parameters were set to default values.

C. EXPERIMENTAL RESULTS
In our experiment, we used the training set to train the model and the testing set to extract features with the trained model. We then used several common classifiers to classify the extracted features so as to verify the effectiveness of the features obtained by the unsupervised learning method. For the AE-CDNN model, we proposed two different loss functions, (10) and (11). Here, we denote the method that uses function (10) as AE-CDNN-L1, and the method that uses function (11) as AE-CDNN-L2. Tables 2 to 5 present the classification results of different classifiers for features learned by AE-CDNN-L1 and AE-CDNN-L2, respectively, on the two datasets. We used 5-fold cross validation to calculate each classifier's accuracy. In the first column, m is the number of features that the model learnt. For example, in Table 2, m = 2 means that the model needs to learn two features, and after 5-fold cross validation the average accuracy of the k-NN classifier is 73.336%. In the last column, AVG is the average accuracy over the listed classifiers.

Fig. 7 shows the change of the average accuracy of AE-CDNN-L1 and AE-CDNN-L2 for different feature dimensions on the two datasets. When the number of features is greater than 8, the average performance of the above classifiers is acceptable. In general, the accuracy and stability of AE-CDNN-L2 are better than those of AE-CDNN-L1.

IV. DISCUSSION
In this section, we analyze the effectiveness of the features extracted by AE-CDNN. Then, we compare the features of the model with those of other unsupervised feature extraction models. Finally, we compare our classification results with those of other related studies.
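The evaluation protocol described above, classifying learned features with default scikit-learn classifiers under 5-fold cross validation and averaging the accuracies, can be sketched as follows; the synthetic Gaussian features are an illustrative stand-in for the AE-CDNN feature coding, and only three of the paper's classifiers are shown.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# stand-in for m = 8 learned features of 200 seizure / 200 non-seizure samples
X = np.vstack([rng.normal(0.0, 1.0, (200, 8)), rng.normal(1.0, 1.0, (200, 8))])
y = np.array([0] * 200 + [1] * 200)

classifiers = {
    "k-NN": KNeighborsClassifier(),               # default parameters, as in Table 1
    "decision tree": DecisionTreeClassifier(random_state=0),
    "Gaussian NB": GaussianNB(),
}
accs = {name: cross_val_score(clf, X, y, cv=5).mean()   # 5-fold mean accuracy
        for name, clf in classifiers.items()}
avg = sum(accs.values()) / len(accs)                    # the AVG column
print({k: round(v, 3) for k, v in accs.items()}, round(avg, 3))
```

Using untuned defaults keeps the comparison about the features themselves rather than about classifier tuning, which is the point the paper makes.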
FIGURE 8. Feature distribution of the AE-CDNN-L1 and AE-CDNN-L2 models in the first test of dataset 1 when the feature dimension was 4.
separation degree. It also indicated that AE-CDNN-L2 can learn better features.

In Fig. 8, f1 is the first feature of the learned feature coding array. f1 and f2 are independent of each other, and the values of f1 from the two models are independent, with no corresponding relationship. P and N represent the positive and negative samples. In the process of deep learning, the change of the loss function value can reflect the learning situation of the deep network. Fig. 9 shows the training of AE-CDNN-L1 and AE-CDNN-L2. We find that the testing-set loss functions of the two methods converged after 2000 epochs. From the convergence results of the two methods, we find that AE-CDNN-L2 could be the better choice.

B. COMPARISON OF DIMENSION REDUCTION METHODS
In the last section, we discussed the feature learning ability of the AE-CDNN model. In this section, we compare the model with the main existing dimension reduction methods. PCA is a very important linear dimension reduction method, which obtains low-dimensional features by a linear transformation of high-dimensional data [27]. Random projection (RP) is a powerful method used to construct Lipschitz mappings so as to realize dimension reduction with a high
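The two baseline reducers named here can be run with scikit-learn as a sketch; the data shape (length-4096 signals, as in the experiment) and the target dimension of 16 are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.random_projection import SparseRandomProjection

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4096))          # 400 samples of length-4096 signals

# PCA: linear transformation of high-dimensional data to m features [27]
X_pca = PCA(n_components=16).fit_transform(X)

# Sparse random projection: a random Lipschitz mapping to m dimensions
X_srp = SparseRandomProjection(n_components=16, random_state=1).fit_transform(X)
print(X_pca.shape, X_srp.shape)           # (400, 16) (400, 16)
```

Both reducers are unsupervised, like AE-CDNN, so the reduced features from all three can be handed to the same set of classifiers for a like-for-like comparison.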
FIGURE 11. Change of loss function values of the NN-2 training process in Table 10.

Xun et al. used a window with a length of 5 s to extract single-channel samples for classification, with an error rate of 22.93% [22]. Although there are some differences in the preprocessing, Table 5 shows that our method can achieve better classification accuracy.

V. CONCLUSION
In this paper, we use the deep convolution network and autoencoders to perform unsupervised feature learning of epileptic EEG and construct the AE-CDNN model. Our model can extract features from unlabeled EEG signals, greatly reducing the data dimension while achieving good classification accuracy. In our experiments, we used two public EEG datasets (one of intracranial EEG signals and the other of scalp EEG signals) to learn features and classify them. From the classification results alone, we could see that, using multiple classifiers without parameter tuning, the average classification accuracy of the obtained features could reach more than 92% when the feature dimension was greater than 16. The new method is also superior to PCA and SRP in feature effectiveness after dimension reduction. However, the model performed badly when the feature dimension was low. We find that the initial weights of the network caused this problem; therefore, our next research objective is to discover how to pre-train the initial weights effectively. Our method is not inferior in classification accuracy to the other techniques, and because traditional epilepsy classifications are based on a single dataset, our method is less constrained. In future,

ACKNOWLEDGMENT
The findings achieved herein are solely the responsibility of the authors.

REFERENCES
[1] WHO. (2012). Epilepsy. [Online]. Available: http://www.who.int/mediacentre/factsheets/fs999/zh/
[2] M. Z. Koubeissi, Niedermeyer's Electroencephalography, Basic Principles, Clinical Applications, and Related Fields, 6th ed. Philadelphia, PA, USA: Lippincott Williams & Wilkins, 2010.
[3] J. Gotman, ''Automatic recognition of epileptic seizures in the EEG,'' Electroencephalogr. Clin. Neurophysiol., vol. 54, no. 5, pp. 530–540, 1982.
[4] Y. Zhang, G. Xu, J. Wang, and L. Liang, ''An automatic patient-specific seizure onset detection method in intracranial EEG based on incremental nonlinear dimensionality reduction,'' Comput. Biol. Med., vol. 40, nos. 11–12, pp. 889–899, 2010.
[5] A. Shoeb, A. Kharbouch, J. Soegaard, S. Schachter, and J. Guttag, ''A machine-learning algorithm for detecting seizure termination in scalp EEG,'' Epilepsy Behav., vol. 22, no. 1, pp. S36–S43, 2011.
[6] M. Qaraqe, M. Ismail, and E. Serpedin, ''Band-sensitive seizure onset detection via CSP-enhanced EEG features,'' Epilepsy Behav., vol. 50, pp. 77–87, Sep. 2015.
[7] R. B. Pachori and S. Patidar, ''Epileptic seizure classification in EEG signals using second-order difference plot of intrinsic mode functions,'' Comput. Methods Programs Biomed., vol. 113, no. 2, pp. 494–502, 2014.
[8] R. Sharma and R. B. Pachori, ''Classification of epileptic seizures in EEG signals based on phase space representation of intrinsic mode functions,'' Expert Syst. Appl., vol. 42, no. 3, pp. 1106–1117, 2015.
[9] G. Zhu, Y. Li, and P. Wen, ''Epileptic seizure detection in EEGs signals using a fast weighted horizontal visibility algorithm,'' Comput. Methods Programs Biomed., vol. 115, no. 2, pp. 64–75, Jul. 2014.
[10] A. T. Tzallas, M. G. Tsipouras, and D. I. Fotiadis, ''Automatic seizure detection based on time-frequency analysis and artificial neural networks,'' Comput. Intell. Neurosci., vol. 2007, Oct. 2007, Art. no. 80510.
[11] E. D. Übeyli, ''Least squares support vector machine employing model-based methods coefficients for analysis of EEG signals,'' Expert Syst. Appl., vol. 37, no. 1, pp. 233–239, 2010.
[12] A. S. M. Murugavel and S. Ramakrishnan, ''Hierarchical multi-class SVM with ELM kernel for epileptic EEG signal classification,'' Med. Biol. Eng. Comput., vol. 54, no. 1, pp. 149–161, 2016.
[13] X. Xiao, S. Pirbhulal, and K. Dong, ''Performance evaluation of plain weave and honeycomb weave electrodes for human ECG monitoring,'' J. Sensors, vol. 2017, Jul. 2017, Art. no. 7539840.
[14] S. Pirbhulal, H. Zhang, and S. C. Mukhopadhyay, ''An efficient biometric-based algorithm using heart rate variability for securing body sensor networks,'' Sensors, vol. 15, no. 7, pp. 15067–15089, 2015.
[15] W. Wu, H. Zhang, S. Pirbhulal, S. C. Mukhopadhyay, and Y. T. Zhang, ''Assessment of biofeedback training for emotion management through wearable textile physiological monitoring system,'' IEEE Sensors J., vol. 15, no. 12, pp. 7087–7095, Dec. 2015.
[16] R. Sharma, R. B. Pachori, and S. Gautam, ''Empirical mode decomposition based classification of focal and non-focal seizure EEG signals,'' in Proc. Int. Conf. Med. Biometrics, May 2014, pp. 135–140.
[17] A. B. Das and M. I. H. Bhuiyan, ''Discrimination and classification of focal and non-focal EEG signals using entropy-based features in the EMD-DWT domain,'' Biomed. Signal Process. Control, vol. 29, pp. 11–21, Aug. 2016.
[18] A. S. Zandi, M. Javidan, G. A. Dumont, and R. Tafreshi, ''Automated real-time epileptic seizure detection in scalp EEG recordings using an algorithm based on wavelet packet transform,'' IEEE Trans. Biomed. Eng., vol. 57, no. 7, pp. 1639–1651, Jul. 2010.
[19] K. Polat and S. Güneş, ''Classification of epileptiform EEG using a hybrid system based on decision tree classifier and fast Fourier transform,'' Appl. Math. Comput., vol. 187, no. 2, pp. 1017–1026, 2007.
[20] U. R. Acharya, H. Fujita, V. K. Sudarshan, S. Bhat, and J. E. W. Koh, ''Application of entropies for automated diagnosis of epilepsy using EEG signals: A review,'' Knowl.-Based Syst., vol. 88, pp. 85–96, Nov. 2015.
[21] Y. R. Tabar and U. Halici, ''A novel deep learning approach for classification of EEG motor imagery signals,'' J. Neural Eng., vol. 14, no. 1, p. 016003, 2017.
[22] G. Xun, X. Jia, and A. Zhang, ''Detecting epileptic seizures with electroencephalogram via a context-learning model,'' BMC Med. Inform. Decision Making, vol. 16, no. 2, p. 70, 2016.
[23] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, ''Stacked convolutional auto-encoders for hierarchical feature extraction,'' in Proc. Int. Conf. Artif. Neural Netw., 2011, pp. 52–59.
[24] L. Chen, F. Rottensteiner, and C. Heipke, ''Feature descriptor by convolution and pooling autoencoders,'' Int. Arch. Photogram., Remote Sens. Spatial Inf. Sci., vol. 3, no. 3, pp. 31–38, 2015.
[25] H. Noh, S. Hong, and B. Han, ''Learning deconvolution network for semantic segmentation,'' in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), 2015, pp. 1520–1528.
[26] G. E. Hinton and R. S. Zemel, ''Autoencoders, minimum description length and Helmholtz free energy,'' in Proc. Int. Conf. Neural Inf. Process. Syst., 1993, pp. 3–10.
[27] T. Ahmad, R. A. Fairuz, F. Zakaria, and H. Isa, ''Selection of a subset of EEG channels of epileptic patient during seizure using PCA,'' in Proc. World Sci. Eng. Acad. Soc. (WSEAS), 2008, pp. 270–273.
[28] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, ''Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,'' J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[29] D. P. Kingma and J. Ba. (2014). ''Adam: A method for stochastic optimization.'' [Online]. Available: https://arxiv.org/abs/1412.6980
[30] R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, and C. E. Elger, ''Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state,'' Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 64, no. 6, p. 061907, 2001.
[31] A. H. Shoeb, ''Application of machine learning to epileptic seizure onset detection and treatment,'' Ph.D. dissertation, Dept. Electr. Med. Eng., Massachusetts Inst. Technol., Cambridge, MA, USA, 2009.
[32] T. Alotaiby, F. E. A. El-Samie, and S. A. Alshebeili, ''A review of channel selection algorithms for EEG signal processing,'' EURASIP J. Adv. Signal Process., vol. 2015, p. 66, Dec. 2015.
[33] F. Pedregosa et al., ''Scikit-learn: Machine learning in Python,'' J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Oct. 2011.
[34] G. E. Hinton and R. R. Salakhutdinov, ''Reducing the dimensionality of data with neural networks,'' Science, vol. 313, no. 5786, pp. 504–507, 2006.
[35] J. Wang, Geometric Structure of High-Dimensional Data and Dimensionality Reduction. Beijing, China: Higher Education Press, 2012.
[36] T. Wen and Z. Zhang, ''Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification,'' Medicine, vol. 96, no. 19, p. e6879, 2017.

TINGXI WEN was born in Fujian, China. He received the M.S. degree in software engineering from Xiamen University. He has published widely in the field of robotic control based on EMG using various data classification methods. He has also developed a novel multi-objective optimization technique based on the Spark platform, applied to routing optimization. His research interests include data mining, machine learning, and cloud computing.

ZHONGNAN ZHANG (M'15) received the B.E. and M.E. degrees in computer science and technology from Southeast University, Nanjing, China, in 1999 and 2001, respectively, and the Ph.D. degree in computer science from The University of Texas at Dallas, TX, USA, in 2008. Since 2017, he has been a Full Professor with the Software School, Xiamen University, Xiamen, China, where he was an Assistant Professor from 2009 to 2012 and an Associate Professor from 2012 to 2017. His research interests include big data analysis, data mining, machine learning, and bioinformatics.