
Application of Convolutional Neural Networks to Four-Class Motor Imagery Classification Problem

Tomas Uktveris, Vacius Jusas

Kaunas University of Technology, Software Engineering Department,
Studentu st. 50, Kaunas, Lithuania
e-mail: [email protected], [email protected]

Abstract. In this paper the use of a novel feature extraction method oriented to convolutional neural networks (CNN) is discussed in order to solve the four-class motor imagery classification problem. Viable CNN architectures and their influence on the obtained accuracy for the given task are analyzed. Furthermore, the selection of the optimal feature map image dimension, filter sizes and other CNN parameters used for network training is investigated. Methods for generating 2D feature maps from 1D feature vectors are presented for commonly used feature types. Initial results show that CNN can achieve a high classification accuracy of 68% for the four-class motor imagery problem with less complex feature extraction techniques. It is shown that the optimal accuracy highly depends on feature map dimensions, filter sizes, epoch count and other tunable factors; therefore, various fine-tuning techniques must be employed. Experiments show that a simple FFT energy map generation technique is enough to reach state-of-the-art classification accuracy for common CNN feature map sizes. This work also confirms that CNNs are able to learn a descriptive set of information needed for optimal electroencephalogram (EEG) signal classification.

Keywords: convolutional neural network; motor imagery; feature map; image classification; FFT energy map.

1. Introduction

Motor imagery classification is one of many widespread machine-learning problems of brain-computer interface (BCI) systems. With the need for human mind controlled applications, the recording of electroencephalograms (EEG) has emerged as an optimal solution for non-interventional brain activity analysis. The ability to fully understand this brain induced electrical signal would greatly simplify the life of people with disabilities or break the barrier for natural interaction in the entertainment industry.

This work focuses on the four-class motor imagery problem, where the recorded EEG signal is classified into four different classes that correspond to four different imagined motoric actions of a human subject (left hand, right hand, feet and tongue movement). Even if the simpler two-class (binary) problem achieves good classification performance, the four-class problem still struggles to reach the same results and requires more scientific investigation.

A relatively new and promising approach to EEG data classification was found in the deep learning branch of machine learning. The convolutional neural network (CNN) is a novel, animal visual cortex inspired method for image based classification that has not been widely used with EEG, let alone the motor imagery task. With its ability to generalize/pool and self-learn the needed features in non-linear ways, it can benefit EEG classification. Since the EEG motor imagery task lacks accurate solutions, the CNN could be a new perspective way to look deeper into the same problem. Regarding its novelty and success in other fields, it was chosen as the main tool for the four-class EEG motor imagery problem analysis in this paper.

By using CNN for classification, subtle fine tuning is required to receive the best results. This involves selecting a proper neural network architecture, feature method and feature map size. These nuances and their effect on classification performance are further analyzed and discussed in this paper.

Furthermore, feature extraction and feature map (image) generation methods for classification are of great significance. In the simplest cases, the EEG signal and feature vector can be treated as a one-dimensional signal. In order to move to two-dimensional image classification, two-dimensional features or feature transformation methods are required. Possible techniques for such a task are presented and discussed in this work.

2. Related work

In recent years, an increasing number of papers that use CNN for the EEG classification task have been published. Multiple approaches have been proposed for solving motor imagery and other related problems. A short review of the common techniques is presented in the remainder of this section.

CNN was successfully used by Mirowski et al. [1] to predict epileptic seizures from EEG. The authors proposed to use four types of bivariate statistical properties of the EEG signal as features for classification. They argue that commonly used univariate features (computed on each EEG channel separately) lack the required channel relationship information. Cross-correlation, non-linear interdependence, Lyapunov exponent and wavelet synchrony feature information was packed into 2D images for classification. Prediction accuracy of 70% was achieved. Another work in the field of EEG analysis was dedicated to solving the SSVEP (Steady State Visually Evoked Potential) signal classification problem by Cecotti and Gräser [2], where a subject is introduced to visual stimulation at a specific frequency. A four-layer CNN network topology with a Fourier transform filter in the second layer was tested. The selected architecture proved to achieve up to 97% classification accuracy. It was noted that the switch from the time domain to the frequency domain gave a positive effect on the classification performance. However, the introduced reliability rejection criteria for each class made the final solution less robust, produced a lot of sample rejections and gave average generalization. A different application of CNN to SSVEP is described in a paper by Bevilacqua et al. [3]. The authors used a four-layer network architecture with a hidden L2 Fast Fourier Transform (FFT) layer for frequency extraction. Due to the nature of the problem, the signal analysis was done in the frequency domain. Channels Pz, PO3, PO4, Oz (of the 10-20 electrode system) were used to record EEG samples at 256 Hz within 2 second windows. Images of 4x512 elements were composed of filtered EEG data and used as input for the CNN classifier. The network was trained for 1000 epochs. Mean accuracy of 88% was obtained by this method.

CNN capability of detecting P300 events from EEG was showcased by Cecotti and Gräser [4] with an accuracy of 95%. The signal analysis was conducted separately in the time and space domains. Images of 64x64 in size, created from 64 channels of downsampled EEG data, were used for classification. Seven different CNN models were verified. Additionally, the work employed a strategy to use vector based CNN kernels instead of matrix kernels in order to prevent mixing features related to the space and time domains. A technique based on trained network first layer weight analysis was used to extract the 8 most relevant electrodes for each subject.

Recently, CNN has been used by Manor et al. [5] to solve the RSVP (Rapid Serial Visual Presentation) task (where a subject has to detect a target image within five possible categories). The authors introduced a spatio-temporal regularization penalty for EEG classification to reduce network overfitting. Accuracy of 75% was reached with a CNN architecture of three layers having 64x1 convolutional, two pooling and two fully connected filters. Images of 64x64 (64 channels by 64 time samples) were used as input for the network. Advantages of using neural network models against manually designed feature extraction algorithms were presented, along with criticism of the manual method for the unclear and endless possibilities of combining different methods in an efficient way.

Various techniques directly related to the current motor imagery problem have been proposed over the years in literature. Qin and He in [6] describe an analysis of a two-class motor imagery problem. The authors proposed a technique to analyze the EEG in the frequency domain. Time-frequency distribution (TFD) images were constructed based on complex Morlet wavelet decomposition for electrode pairs. The TFDs were subtracted from symmetrical channels to form weight matrices that were used to compute a weighted energy for classification. A Laplacian filter was used for signal preprocessing. An average classification rate of 78% was achieved for this method. Another approach based on energy entropy preprocessing and Fisher class separability criteria was proposed in [7] by Xiao et al. The authors analyzed a two-class motor imagery problem in the time-frequency domain. Similar TFD distributions (spectrograms) were constructed from EEG short-term Fourier transform (STFT) data. Three different classification methods were compared. Classification accuracy for the two-class problem was 85%. A more complicated approach for 3-class motor imagery analysis was done by Zhou et al. in [8]. The study proposed a new method to extract MRICs (movement related independent components) and utilized ICA (Independent Component Analysis) spatial distribution patterns for such a task. Different ICA filter designs were tested. The ICA filter design was confirmed to be subject invariant. Classification accuracy of 62% was received.

A more recent study by Bai et al. [9] on 4-class motor imagery proposed a novel Wavelet-CSP (Common Spatial Patterns) with ICA-filter method. The EEG artifacts were removed using negative entropy-based ICA. Mean accuracy of 76% was achieved using an SVM (Support Vector Machine) classifier.

One of the latest works in the field of CNN and 4-class motor imagery is the paper by Yang et al. [10]. The authors proposed a frequency complementary feature map selection (FCMS) method. ACSP (Augmented CSP) feature filtering was used in their work. Two other feature selection methods - random map selection (RMS) and selection of all feature maps (SFM) - were analyzed. FCMS was the best performing method due to its ability to limit the ACSP feature redundancy in different frequency bands. The CNN used a 5-layer architecture with 5x5 filters (kernels). The work also demonstrated that CNNs are capable of learning discriminant, deep structure features for EEG classification without relying on handcrafted features. Average classification accuracy achieved was 69%.

3. Methods of analysis

3.1. Convolutional Neural Networks (CNN)

Convolutional neural networks are biologically-inspired variants of MLPs (multi-layer perceptrons). They have been successfully used for character recognition in the past by LeCun et al. [11] and currently have gained interest from researchers due to their performance capabilities. CNNs consist of one or more convolutional layers, with the weights of the layer shared across the input. Multiple such layers form a non-linear "filter" chain. The convolution is designed to handle 2D data, as opposed to other neural networks that operate on 1D vectors. This ability makes the extracted features easier to view and interpret.

3.1.1. Feed-forward neural network

A typical neural network function as presented by Vedaldi and Lenc [12] is defined as:

$f(\mathbf{x}) = f_n(\dots f_2(f_1(\mathbf{x}; \mathbf{w}_1); \mathbf{w}_2) \dots, \mathbf{w}_n)$ (1)

$f: \mathbb{R}^{M \times N \times K} \to \mathbb{R}^{M' \times N' \times K'}$ (2)

where $\mathbf{x} = (x_1, \dots, x_K)$ is the network layer input (an $M \times N$ size image with $K$ channels) and $\mathbf{w} = (\mathbf{w}_1, \dots, \mathbf{w}_n)$ is the vector of learned parameters.

3.1.2. Convolution

A 3-dimensional convolution operation with $k'$ filter count can be expressed as:

$y_{i'j'k'} = \sum_{ijk} w_{ijkk'} \, x_{i+i',\, j+j',\, k}$ (3)

where $y$ is the output of the convolution.

3.1.3. Pooling

The CNN concept of pooling is a form of non-linear down-sampling. Pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum or average value. This way it is possible to reduce the feature size (and computation) as required and provide translation invariance. The max pooling function is given by:

$y_{ijk} = \max\{\, x_{i'j'k} : i \le i' < i + p,\ j \le j' < j + p \,\}$ (4)

where $y$ is the output and $p$ is the pooling window size.

3.1.4. Non-linear gating

Typical CNN non-linear filters use linear functions with a non-linear gating function, applied identically to each component of a feature map. The simplest such function is the Rectified Linear Unit (ReLU). Such a filter can be written as:

$y_{ijk} = \max\{0, x_{ijk}\}$. (5)

3.1.5. Normalization

Another important CNN building block is channel-wise normalization. This operator normalizes the vector over feature channels at each spatial location in the input map $x$. The form of the normalization operator is:

$y_{ijk'} = \dfrac{x_{ijk'}}{\left(\kappa + \alpha \sum_{k \in G(k')} x_{ijk}^2\right)^{\beta}}$ (6)

where $y$ is the output; $\kappa$, $\alpha$, $\beta$ are normalization parameters and $G(k') = [\,k' - \lfloor \rho/2 \rfloor,\ k' + \lceil \rho/2 \rceil\,] \cap \{1, 2, \dots, K\}$ is a group of $\rho$ consecutive feature channels in the input map.

3.1.6. Softmax

The operation computes the softmax operator across feature channels and, in a convolutional manner, at all spatial locations. It is a combination of an activation function (exponential) and a normalization operator:

$y_{ijk} = \dfrac{e^{x_{ijk}}}{\sum_{t=1}^{D} e^{x_{ijt}}}$. (7)
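
To make the building blocks (3)-(7) concrete, the following MATLAB sketch applies a single convolution, ReLU, 2x2 max pooling and softmax step to a random feature map. It is a toy illustration only; the input, kernel and class scores are placeholders and do not correspond to the networks trained later in this work.

% Toy numeric illustration of the CNN building blocks (3)-(7).
% All inputs are random placeholders, not real EEG feature maps.
x = rand(8, 8);                          % single-channel input feature map
w = randn(3, 3);                         % one 3x3 convolution kernel

% Convolution (3): conv2 flips the kernel, so pre-flipping w gives the
% cross-correlation form commonly used in CNN layers.
y = conv2(x, rot90(w, 2), 'valid');

% ReLU gating (5)
y = max(y, 0);

% 2x2 max pooling with stride 2 (4), via a column-major reshape trick
p = 2;
[r, c] = size(y);
r = r - mod(r, p);  c = c - mod(c, p);   % crop to a multiple of the pool size
blocks = reshape(y(1:r, 1:c), p, r/p, p, c/p);
pooled = squeeze(max(max(blocks, [], 1), [], 3));

% Softmax over a vector of four class scores (7)
scores = randn(4, 1);
probs  = exp(scores - max(scores));      % shift by the maximum for numerical stability
probs  = probs / sum(probs);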

3.2. Common Spatial Patterns (CSP)

CSP is a widely adopted signal pre-processing method that decomposes the raw EEG into subcomponents (spatial patterns) having maximum differences in variance, as shown by Naeem et al. [13]. Wang et al. in [14] concluded that this technique allows better feature separation in feature space and thus more accurate signal classification. Also, the property of CSP to decrease feature dimensionality is very suitable for EEG data complexity reduction. It has been shown by Uktveris and Jusas in [15] and other works that this method gives a substantial EEG signal classification performance increase, thus it is a highly recommended filtering method.

The filter is a spatial coefficient matrix $W$:

$S = W E$

where $S$ is the filtered signal matrix and $E$ is the original EEG signal. Columns of $W$ denote spatial filters, while the inverse, i.e. $W^{-1}$, holds the spatial patterns of the EEG signal. The criterion of CSP for a two-class ($C_1$, $C_2$) problem is given by:

$\text{maximize: } \operatorname{tr}(W^{T} \Sigma_1 W)$ (8)

$\text{subject to: } W^{T}(\Sigma_1 + \Sigma_2) W = I$ (9)

where $\Sigma_1$ and $\Sigma_2$ are the class covariance matrices. The solution can be acquired by solving a generalized eigenvalue problem. Since CSP was designed for a binary problem, multiclass solutions are combined of multiple spatial filters.

Due to the broad and positive acknowledgement of CSP, the method was used in the current work to filter the EEG data before commencing feature extraction.
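
A minimal sketch of the two-class criterion (8)-(9) is given below. It assumes band-pass filtered trials are already available as channels x samples matrices (replaced here by random placeholders) and solves the generalized eigenvalue problem directly with MATLAB's eig; the filtering pipeline actually used in the experiments follows [15] and may differ in detail.

% Illustrative two-class CSP sketch for the criterion (8)-(9).
% E1, E2 stand for band-pass filtered EEG of classes C1 and C2
% (channels x samples); random placeholders are used here.
E1 = randn(22, 750);
E2 = randn(22, 750);

S1 = cov(E1');                          % class C1 covariance (channels x channels)
S2 = cov(E2');                          % class C2 covariance

% Generalized eigenvalue problem: S1 * w = lambda * (S1 + S2) * w
[W, D] = eig(S1, S1 + S2);
[~, order] = sort(diag(D), 'descend');  % order filters by variance ratio
W = W(:, order);                        % columns of W are the spatial filters

S = W' * E1;                            % spatially filtered signal (one common convention for S = W E)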

3.3. Feature extraction methods

A multitude of EEG feature extraction methods have been studied by Uktveris and Jusas in [15] and other literature. Their output is usually a one-dimensional feature vector that can be used for classification. The ability to adapt the algorithms for two-dimensional CNN has not been thoroughly analyzed. It is also important to know if the adapted methods can give similar or better results when applied in 2D for CNN. Thus, a review of the most common feature extraction techniques and their implementations for CNN is presented in this work. A short description of the EEG feature methods that were tested and analyzed in this paper is given next.

3.3.1. Mean channel energy (MCE)

The energy of each i-th EEG channel is computed as the mean of the squared time domain samples (10). The result is then transformed using a Box-Cox [16] transformation (i.e. logarithm) in order to make the features more normally distributed, and finally, the resulting values are combined into a feature vector:

$y_i = \log\left(\frac{1}{N}\sum_{k=1}^{N} x_i[k]^2\right), \quad i = \overline{1, n}$. (10)

3.3.2. Channel variance (CV)

The variance of each i-th EEG channel is the second moment of the signal computed about its mean $\bar{x}_i$. The result is normalized using Box-Cox for the final feature vector:

$y_i = \log\left(\frac{1}{N}\sum_{k=1}^{N} \left(x_i[k] - \bar{x}_i\right)^2\right), \quad i = \overline{1, n}$. (11)

An example of a feature map generated using this technique is given in Figure 1.

Figure 1. Feature map generated with CV method

3.3.3. Mean window energy (MWE)

This technique computes (12) the mean signal energy of $N$ windows of size $W = s/N$ for each i-th EEG channel (where $s$ is the EEG channel sample count). The resulting coefficients are Box-Cox transformed (12) to form the final map:

$H_{i,j} = \log\left(\frac{1}{W}\sum_{k=1}^{W} x_i[(j-1)W + k]^2\right), \quad j = \overline{1, p}$. (12)

The maximum window count in experiments was selected as $p = n$ (where $n$ is the EEG channel count) in order to form rectangular feature maps.
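
The three energy-style features above reduce to a few lines of MATLAB. The sketch below assumes one pre-processed trial X of size channels x samples (a random placeholder) and uses the logarithm as the Box-Cox step; the window indexing follows the per-window reading of (12).

% Sketch of the MCE (10), CV (11) and MWE (12) features for one EEG trial.
X = randn(22, 750);                    % placeholder trial: channels x samples
[n, s] = size(X);

mce = log(mean(X.^2, 2));              % (10) mean channel energy, Box-Cox via log
cv  = log(var(X, 1, 2));               % (11) channel variance (normalized by N)

p = n;                                 % number of windows; p = n gives a square map
W = floor(s / p);                      % window length in samples
mwe = zeros(n, p);                     % (12) mean window energy map H
for i = 1:n
    for j = 1:p
        seg = X(i, (j-1)*W + 1 : j*W); % j-th window of the i-th channel
        mwe(i, j) = log(mean(seg.^2));
    end
end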

3.3.4. Principal Component Analysis (PCA)

PCA is a filtering technique that decomposes the input signal into its main components by using orthogonal transformations (13). Wang et al. showed in [17] that it can also be used to suppress artifacts and noise in the EEG signal. The decomposition (13) is carried out multiple times - initially to determine the principal components, secondly to suppress noisy components at decomposition levels 1-3:

$\hat{\mathbf{x}}_i = |PCA(\mathbf{x}_i)|, \quad i = \overline{1, n}$. (13)

The final feature vector consists of the filtered EEG mean energy elements (14) that were normalized via Box-Cox:

$y_i = \log\left(\frac{1}{N}\sum_{k=1}^{N} \hat{x}_i[k]^2\right), \quad i = \overline{1, n}$. (14)

A multi-resolution decomposition of 5 levels with Daubechies' least-asymmetric wavelet (4 vanishing moments) was used for the decomposition in this work.

3.3.5. Mean band power (BP)

The algorithm calculates the power of three major frequency bands: 8-14 Hz, 19-24 Hz and 24-30 Hz (corresponding to the mu, alpha and beta brain waves) by first band-pass filtering the signal using a 4-th order Butterworth filter. The resulting signal is then squared to obtain the power, and a w-sized smoothing window operation is performed to filter the signal as shown in (15):

$\bar{p}[n] = \ln\left(\frac{1}{w}\sum_{k=0}^{w} p[n-k]^2\right)$ (15)

The mean power values (16):

$y_i = \frac{1}{N}\sum_{k=1}^{N} \bar{p}[k], \quad i = \overline{1, n}$ (16)

of each computed band are then used as feature vector components.
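
A rough sketch of the band power computation is shown below. It band-pass filters each channel with a Butterworth design (butter and filtfilt from the Signal Processing Toolbox), squares the result and smooths it with a moving window, in the spirit of (15)-(16); the band limits are taken from the description above, while the filter realization and window length are illustrative assumptions.

% Rough sketch of the band power (BP) feature (15)-(16) for one EEG trial.
% Assumes the Signal Processing Toolbox (butter, filtfilt); details are illustrative.
X  = randn(22, 750);                   % placeholder trial: channels x samples
fs = 250;                              % sampling rate of dataset 2a
bands = [8 14; 19 24; 24 30];          % frequency bands in Hz
w  = 25;                               % smoothing window length (samples), an assumed value

bp = zeros(size(X, 1), size(bands, 1));
for b = 1:size(bands, 1)
    [B, A] = butter(4, bands(b, :) / (fs/2), 'bandpass');  % 4th order band-pass design
    for i = 1:size(X, 1)
        p  = filtfilt(B, A, X(i, :));          % zero-phase band-pass filtering
        ps = conv(p.^2, ones(1, w)/w, 'same'); % squared signal, smoothed over w samples
        bp(i, b) = mean(log(ps));              % (15)-(16): log power averaged over time
    end
end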

3.3.6. Channel FFT energy (CFFT)

As analyzed by Cecotti and Gräser in [2], this method employs the Fast Fourier Transform (FFT) for computing the i-th EEG channel signal energy estimation in the frequency domain. The FFT result is squared and the sum of all elements is computed:

$y_i = \log\left(\sum_{k=1}^{N} FFT(\mathbf{x}_i)[k]^2\right), \quad i = \overline{1, n}$ (17)

The final feature vector components are formed after the Box-Cox transformation is applied.

3.3.7. Channel Discrete Cosine Transform (DCT)

Signal energy concentration can be estimated via DCT as shown by Birvinskas et al. in [18]. The sum of the squared DCT coefficients of each i-th EEG channel forms the feature vector components of this method (18). Features are normalized using the Box-Cox transform:

$y_i = \log\left(\sum_{k=1}^{N} DCT(\mathbf{x}_i)[k]^2\right), \quad i = \overline{1, n}$. (18)

3.3.8. Time Domain Parameters (TDP)

Time domain parameters compute the time-varying energy of the first $k$ derivatives of the i-th EEG channel. The obtained derivative values (19) are smoothed using an exponential moving average and a logarithm is taken as given by (20). The resulting signal mean is used in feature vector generation.

$p_j(t) = \dfrac{d^j x(t)}{dt^j}, \quad j = 0, 1, \dots, k$ (19)

$y_i = \frac{1}{N}\sum_{n=1}^{N} \ln\left(u \cdot p_j[n] - (1 - u) \cdot p_j[n-1]\right), \quad j = \overline{1, m}$ (20)

here $u$ is the moving average parameter, $u \in [0; 1]$.

3.3.9. Teager-Kaiser Energy Operator (TKEO)

TKEO is a more accurate signal energy calculation method that allows detecting high frequency and low amplitude components. The approximation for discrete i-th EEG channel signals is given by (21).

$\Psi[x[n]] = x^2[n] - x[n-1]\, x[n+1]$ (21)

$y_i = \log\left(\frac{1}{N}\sum_{k=1}^{N} \Psi[x_i[k]]\right), \quad i = \overline{1, n}$ (22)

The components of the final feature vector are computed by using (22).

3.3.10. FFT energy map (FFTEM)

This method generates a 2D feature map from EEG by using the FFT. Each i-th EEG channel signal is transformed into the frequency domain and forms a single row in the feature map as shown in (23). The full signal window was used to gain a global energy view, as opposed to the work by Hu et al. [19], which used short-term FFT windows.

$\mathbf{H}_i = |FFT(\mathbf{x}_i)|, \quad i = \overline{1, n}$ (23)

The computed map $H$ is scaled to the required feature map size for CNN classification. Figure 2 shows an example result map generated with this method.

Figure 2. FFT energy map example

3.3.11. Complex Morlet Wavelet Transform (CWT)

CWT is a time-frequency analysis method used by Le Van Quyen et al. in [20] for obtaining wavelet coefficient maps $W$ (24) at specific frequencies, and was analyzed further by Qin and He in [6]:

$W_x(\tau, f) = \int_{-\infty}^{+\infty} x(u)\, \Psi_{\tau, f}(u)\, du$. (24)

All EEG channel signals combined as one $x(t)$ signal are convolved with a number of different frequency Morlet wavelets (25), where $\sigma = n / (2\pi f)$ and $n$ is the number of wavelet cycles.

$\Psi_{\tau, f}(u) = \sqrt{f}\; e^{i 2\pi f (u - \tau)}\; e^{-\frac{(u - \tau)^2}{2\sigma^2}}$ (25)

Finally, $W_x(\tau, f)$ is decomposed back to the initial EEG dimensions and the mean energy coefficients of each channel form a single row (26) in the feature map:

$\mathbf{H}_i = \frac{1}{N}\sum_{k=1}^{N} |W_x(\tau_k, f_i)|, \quad i = \overline{1, n}$. (26)

In this work, 22 different frequencies were used from the [0; 30] Hz range band, along with wavelet cycles from the range [0.5; 5]. An example output of this method is given in Figure 3.

Figure 3. Feature map generated with CWT

3.3.12. Raw signal features (RAW)

RAW is a baseline method that uses the initially pre-processed EEG signal as values for the feature map. Each i-th EEG signal channel directly maps to the i-th row of the feature map H as shown in (27):

$\mathbf{H}_i = \mathbf{x}_i, \quad i = \overline{1, n}$. (27)

If necessary, the resulting feature map is scaled to the required image size for CNN training.

3.3.13. Signal energy map (SEM)

This method uses raw EEG signal energy values for feature map generation. The Box-Cox normalized energy of each i-th EEG signal channel is computed and the resulting vector is directly mapped to the i-th row of the feature map H as shown in (28):

$\mathbf{H}_i = \log \mathbf{x}_i^2, \quad i = \overline{1, n}$. (28)

If necessary, the resulting feature map is scaled to the required image size for CNN training. An example map generated with this method is given in Figure 4.

Figure 4. Feature map generated with SEM
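
Among the methods of this section, the FFT energy map (3.3.10) turns out to be the strongest feature in the experiments (Section 4.3). As a minimal sketch, assuming a channels x samples trial and the Image Processing Toolbox for imresize, such a map can be generated as follows; the 32x32 target size, the one-sided spectrum and the normalization step are illustrative choices, not the exact settings used in the experiments.

% Sketch of the FFT energy map (FFTEM) feature (23): each channel's
% magnitude spectrum forms one row of the 2-D map H, which is then
% rescaled to the CNN input size. Placeholder data and sizes.
X = randn(22, 750);                     % single trial: channels x samples
H = abs(fft(X, [], 2));                 % (23) row-wise magnitude spectra
H = H(:, 1:floor(size(H, 2)/2));        % keep the one-sided spectrum (illustrative)
H = imresize(H, [32 32], 'bilinear');   % scale to the desired feature map size
H = (H - min(H(:))) / (max(H(:)) - min(H(:)));  % normalize to [0, 1] (illustrative)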

3.4. CNN architecture selection

Choosing the correct network architecture for the problem gives a greater probability of getting better classification results. A CNN supports serially connected layers. Due to the large number of different layer types, it is not trivial to find an optimal chain that closely matches the given problem.

Tests for 11 different CNN architectures were completed, starting from the simplest and ending with more complex ones. The architecture configurations in a simplified notation are given in Table 1. The used notation is explained in Table 2.

Table 1. Different evaluated CNN architectures

#   CNN configuration        Notes
1   IC(4)RPFSO               4 filters
2   IC(4)RP(4)FSO            stride 4
3   IC(8)RPFSO               8 filters
4   ICRPFSO                  -
5   IC(32)RPFSO              32 filters
6   IC(64)RPFSO              64 filters
7   ICRPCRPFSO               -
8   ICRFSO                   -
9   ICFSO                    -
10  IC(7x1)RC(1x7)RPFSO      Non-rect filters
11  IC(1x7)RPC(7x1)RPFSO     Non-rect filters

Table 2. CNN layer symbolic notation

Notation   Description (default parameters)
I          input layer of size (44x44x1)
C          convolutional layer (7x7, 16 filters)
R          ReLU layer
P          max pooling layer (2x2, stride 2)
F          fully connected layer (4 classes)
S          softmax layer
O          classification (output) layer

Evaluation results are shown in Figure 5. It can be noted that the testing accuracy is around ~65% for most of the configurations. However, the training accuracy displays a more dynamic profile, from 50% to 80%. In this case, the CNN configuration with the least amount of computational-processing resources (i.e. the simplest) should be selected as optimal - 1, 2, 4 or 10.

Figure 5. CNN architecture evaluation
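
Using the default parameters from Table 2, the baseline ICRPFSO chain (configuration 4 in Table 1) can be written down directly with the MATLAB Neural Network Toolbox layer objects; the listing below is a sketch of this notation rather than the exact training script.

% Sketch of the ICRPFSO architecture with the Table 2 default parameters
% (MATLAB Neural Network Toolbox, R2016a or later).
layers = [
    imageInputLayer([44 44 1])              % I: 44x44x1 feature map input
    convolution2dLayer(7, 16)               % C: 7x7 convolution, 16 filters
    reluLayer                               % R: ReLU gating
    maxPooling2dLayer(2, 'Stride', 2)       % P: 2x2 max pooling, stride 2
    fullyConnectedLayer(4)                  % F: four motor imagery classes
    softmaxLayer                            % S: softmax
    classificationLayer];                   % O: classification output

Changing the convolution2dLayer filter counts and sizes in this listing reproduces the other configurations of Table 1.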

3.5. CNN parameter tuning

CNNs are more complex than a standard MLP since they have more hyper-parameters. However, the usual learning rates and regularization constants still apply. The CNN training parameters - initial learning rate, momentum, batch size and the number of epochs - must be tuned for best performance. Since a 4D parameter grid based search is too resource intensive, a parameter range scanning approach was carried out to find optimal values.

The momentum value denotes the contribution of the gradient value from the previous iteration to the next one in the Stochastic Gradient Descent (SGD) method. Larger parameter values decrease the effectiveness of faster learning, as shown in Figure 6. In tests, values above 0.6 push the CNN to overfitting and thus decrease generalization and testing accuracy. A value of zero for momentum is not recommended, since that invokes a loss of historical gradient learning information.

Figure 6. Momentum evaluation

The optimal number of training epochs ensures that the network learns and generalizes the provided features. Excessive epochs deteriorate the testing accuracy since the network is overfitting. Figure 7 shows that the optimal count for training is 400-500 epochs.

Figure 7. Epoch count evaluation

The batch size is the image count that is used for single epoch training. It has a direct effect on the network learning quality, as shown in Figure 8. The maximum batch size is the number of total images, e.g. N=288 in the experiments. Values lower than N/4 prevent the network from fully maximizing learning efficiency, while greater values only increase computational costs at the price of no change in testing accuracy.

Figure 8. Batch size evaluation

The initial learning rate must be adapted for each problem. Experiments show that the value should be no bigger than 0.1, while the network testing accuracy peak is achieved with values close to 0.01, as shown in Figure 9. Lower values allow learning fine grained features, while large ones have the tendency to overfit the network.

Figure 9. Initial learning rate evaluation
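
The tuning ranges above map onto the standard SGDM training options in MATLAB. The listing below is a sketch with values inside the regions suggested by Figures 6-9 (momentum well below 0.6, about 400-500 epochs, batch size near N/4, initial learning rate around 0.01); the exact values used for the final evaluation are given in Section 4.3.

% Sketch of SGDM training options in the ranges suggested by the tuning
% experiments (Figures 6-9); exact final values are listed in Section 4.3.
options = trainingOptions('sgdm', ...
    'Momentum',         0.1, ...    % small momentum, well below the 0.6 overfitting region
    'MaxEpochs',        400, ...    % optimal range reported as 400-500 epochs
    'MiniBatchSize',    72, ...     % about N/4 of the 288 trials per subject
    'InitialLearnRate', 0.01);      % accuracy peak observed near 0.01
% net = trainNetwork(XTrain, YTrain, layers, options);   % layers as in Section 3.4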

3.6. Feature map generation

3.6.1. Feature duplication

Many feature extraction methods form a single one-dimensional vector of coefficients known as a feature vector. A problem arises since CNNs are designed to process two-dimensional images. Two approaches for image generation form a viable solution.

The first is to interpret the one-dimensional signal as a 2D single row image. However, the negative aspect of this approach is that only single row CNN filters/kernels will be usable.

The second method, exploiting the CNN translational nature, is to find such a transformation $H$ that allows converting a 1D signal into 2D:

$H: \mathbb{R} \to \mathbb{R}^2$. (29)

A simple example of such a transformation is to duplicate the feature vector $y$ in both directions to fill the feature map space. Some additional filtering can be applied to the new repeated copies. An example of such a feature map is given in Figure 10.

Figure 10. Feature map generated via vector duplication
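
A minimal sketch of the duplication idea: the 1-D feature vector y is tiled with repmat until it fills a square map of the required size, with optional smoothing of the repeated copies. The vector length, map size and smoothing choice are placeholders.

% Sketch of transformation (29) by duplication: tile the 1-D feature
% vector y into a 2-D map of the required size (placeholder dimensions).
y = randn(22, 1);                         % 1-D feature vector (e.g. one value per channel)
mapSize = 44;
reps = ceil(mapSize / numel(y));
H = repmat(y, reps, mapSize);             % duplicate vertically and horizontally
H = H(1:mapSize, 1:mapSize);              % crop to the target feature map size
% H = imgaussfilt(H, 1);                  % optional smoothing of the repeated copies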

3.6.2. Feature map scaling

A baseline method, and the simplest approach of all the feature extraction techniques, is to classify the raw EEG signal samples. The raw EEG data form factor of NxM (where N is the number of channels, M is the number of samples, N<<M) restricts its direct use for CNN feature images due to the large number of samples. Thus, it must be scaled down. Generally, a feature map of WxH size (where W is the width and H is the height) can be formed by down/up-scaling the raw EEG signal or the extracted feature data. The resizing technique can use bilinear or another type of filtering in order to prevent sharp data transitions, limit noise and smooth out the final feature map. An example of filtering applied to raw EEG feature maps can be seen in Figure 11.

Figure 11. Example of 22x22 raw EEG feature maps (from left - nearest, bilinear, bicubic filtering)

Initial testing results of the three different image filtering techniques for raw EEG signal classification are given in Table 3. The results show a ~10% difference in classification accuracy when various filtering techniques are applied. It can be seen that for raw EEG signal analysis the nearest filtering method should be used in order to retain the original signal details as much as possible. For other feature types the effect could be the opposite.

Table 3. Raw EEG feature map resize filtering accuracy

Filter method   Training        Testing
Nearest         0.47 ± 0.14     0.43 ± 0.11
Bilinear        0.35 ± 0.11     0.33 ± 0.12
Bicubic         0.33 ± 0.10     0.32 ± 0.11
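
The resize filtering compared in Table 3 corresponds to the interpolation method argument of imresize (Image Processing Toolbox). The sketch below produces the three 22x22 variants of a raw EEG trial in the spirit of Figure 11; the trial data are placeholders.

% Sketch of the three resize filters compared in Table 3 / Figure 11,
% applied to a raw channels x samples EEG trial (placeholder data).
X = randn(22, 750);
Hnearest  = imresize(X, [22 22], 'nearest');
Hbilinear = imresize(X, [22 22], 'bilinear');
Hbicubic  = imresize(X, [22 22], 'bicubic');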

3.7. Feature map and filter dimensions

The problem is to find the right level of granularity in order to create data abstractions at the proper scale, given a particular dataset. Different feature map and filter sizes were analyzed for the motor imagery problem. Feature map dimensions from 8x8 to 64x64 were tested. Test results are given in Figure 12.

Figure 12. Feature map size evaluation

The plot shows that the optimal feature map size is 24x24 with an accuracy of 65%, even though a more accurate solution of 66% exists at size 44x44. Choosing a smaller size feature map ensures faster computation and processing speeds. Also note that the accuracy convergence is reached when the feature map size is at least twice (15x15) the size of the convolution layer filter (7x7 in the experiment). When the optimal size is reached, further increases in dimension only introduce extra computational costs.

The convolution layer filter size limits the learning granularity by encompassing fixed size feature map regions. Ten different filter sizes were tested in the range [2; 11] for 22x22 feature maps. Test results are displayed in Figure 13.

Figure 13. Convolution layer filter size evaluation

The optimal filter sizes, which give the highest accuracy, are 7x7 and 11x11. Choosing the smaller filter size ensures faster processing speeds. Filters of size 2x2 and 3x3 exhibit too few weights to fully learn the details of the provided data.

4. Experiments

The main purpose of the experimentation activities was to investigate the capabilities of the CNN classifier for the four-class motor imagery classification problem, and also to analyze the influence of various CNN architectures, feature maps, filter sizes and other parameters on classification accuracy. Experiments were conducted in the analysis step (tuning the CNN network parameters) and also in the main motor imagery classification step (for each subject).

The experiment results were measured and evaluated using normalized accuracy in the range [0; 1]. The CNN network parameters were tuned and verified before the final classification step. Tests were carried out using ten-fold cross validation. Also, the ability of CNN to learn from feature data was validated visually by inspecting the learned filter/weight images.

Final classification results for each subject are provided further in the results section.

4.1. Dataset

BCI signal Dataset 2a (contributed and described by Brunner et al. [21]) from the BCI IV competition held in 2008 was used for classifier training and testing. The data consists of 22 channels of 250 Hz sample rate recorded EEG signal for 9 healthy test subjects (in total 288 motor imagery trials per subject). The EEG signal was bandpass-filtered between 0.5 Hz and 100 Hz and an additional 50 Hz notch filter was enabled to suppress line noise. For each subject, two sessions on different days were recorded. During each session, and using a cue-paced (synchronous) mode of operation, the test subjects were asked to imagine the movement of one out of four different motions (left hand, right hand, feet, tongue) for 3 s. Each of the trials (Figure 14) in the dataset started with an audible signal (beep), followed by visual information (cue) on screen to perform one of the mental tasks, and a short break after the mental task. Before the experiments, additional artifact correction of the EEG data was done to discard invalid trials, as described in [21] by Brunner et al. The corrected EEG data were bandpass-filtered between 7 Hz and 30 Hz in order to cover the mu and beta rhythm bands.

Figure 14. Single trial timing scheme

4.2. Implementation details

Software code for the experiments was implemented in the MATLAB 2016b/9.1 numerical computation environment. CNN is a new MATLAB functionality (starting from the 2016a/9.0 version), which uses the GPU processor for parallel computations.

Other alternatives for convolutional neural networks exist, such as the open source MatConvNet library created by Vedaldi and Lenc [12]; however, the library was left as an option for future CNN evaluations. Parts of the open source BioSig library for biomedical signal processing and imaging were used in the EEG signal analysis.

The CNN convolution layer initial filter weights in all tests were set to have a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. The default for the initial bias was 0.

All MATLAB source code used in this work, together with the results, is available at the GitHub Git repository: https://github.com/tomazas/itc2017.

4.3. Results

Final classification results were obtained after the analysis and CNN parameter fine-tuning step. A CNN with an initial learning rate of 0.01, momentum of 0.1, batch size of 128, 200 epochs and architecture I(22x22)C(4x4,16)RPFSO was trained and tested for the final evaluation on all subjects. Results were verified by using a 10-fold cross-validation scheme. The accuracies with their standard deviation values are displayed in Table 4. From the results, it can be seen that the best performing approach (70% in training and 68% in testing) is the FFT energy map method. The second and third best methods in the tests are the Channel variance (68%/61%) and Signal energy map (67%/61%) features. The lowest accuracy (41%/31%) was achieved by the TDP feature method.
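
For completeness, the listing below sketches the final evaluation setup described above: the I(22x22)C(4x4,16)RPFSO network, Gaussian weight initialization with a standard deviation of 0.01 (Section 4.2), the reported SGDM settings and a 10-fold split via cvpartition (Statistics and Machine Learning Toolbox). Data loading and feature map generation are omitted; XAll (a 22x22x1xM array of feature maps) and YAll (categorical labels) are assumed to exist.

% Sketch of the final evaluation setup (Section 4.3). XAll is assumed to be
% a 22x22x1xM array of feature maps and YAll a categorical label vector.
convLayer = convolution2dLayer(4, 16);              % C(4x4,16)
convLayer.Weights = 0.01 * randn(4, 4, 1, 16);      % Gaussian init, sigma = 0.01 (Section 4.2)
convLayer.Bias    = zeros(1, 1, 16);                % zero initial bias

layers = [
    imageInputLayer([22 22 1])                      % I(22x22)
    convLayer
    reluLayer                                       % R
    maxPooling2dLayer(2, 'Stride', 2)               % P
    fullyConnectedLayer(4)                          % F: four classes
    softmaxLayer                                    % S
    classificationLayer];                           % O

options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, 'Momentum', 0.1, ...
    'MaxEpochs', 200, 'MiniBatchSize', 128);

cv = cvpartition(YAll, 'KFold', 10);                % 10-fold cross-validation
acc = zeros(cv.NumTestSets, 1);
for k = 1:cv.NumTestSets
    net    = trainNetwork(XAll(:, :, :, training(cv, k)), YAll(training(cv, k)), layers, options);
    pred   = classify(net, XAll(:, :, :, test(cv, k)));
    acc(k) = mean(pred == YAll(test(cv, k)));       % per-fold testing accuracy
end
mean(acc)                                           % mean testing accuracy over the folds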

Table 4. Classification results for feature methods

Method   Training        Testing
MCE      0.66 ± 0.19     0.58 ± 0.20
CV       0.68 ± 0.18     0.61 ± 0.22
MWE      0.66 ± 0.19     0.58 ± 0.20
PCA      0.61 ± 0.16     0.55 ± 0.20
BP       0.52 ± 0.18     0.39 ± 0.11
CFFT     0.66 ± 0.19     0.58 ± 0.20
DCT      0.54 ± 0.17     0.42 ± 0.11
TDP      0.41 ± 0.11     0.31 ± 0.07
TKEO     0.43 ± 0.12     0.34 ± 0.05
FFTEM    0.70 ± 0.18     0.68 ± 0.20
CWT      0.46 ± 0.10     0.43 ± 0.13
RAW      0.48 ± 0.14     0.37 ± 0.11
SEM      0.67 ± 0.18     0.61 ± 0.20

5. Conclusions

This work analyzed Convolutional Neural Networks and their application to the four-class motor imagery based problem. After an in-depth CNN analysis and parameter fine-tuning, promising results were achieved for the selected problem. The FFT energy map method demonstrated the best feature determination abilities and achieved 68% mean testing accuracy over all the BCI IV competition 2a dataset subjects. The gained accuracy is slightly better than in the new techniques proposed by Tabar and Halici in [22] and similar to the more complex state-of-the-art EEG analysis techniques by Yang et al. [10]. The use of simpler feature extraction methods like the FFT energy map shows a high potential of the CNN method for motor imagery EEG analysis.

Further work will continue in order to provide more efficient feature extraction methods favoring processing speed and accuracy.

Acknowledgments

We would like to present our thanks to the anonymous reviewers for their helpful suggestions.

References

[1] P. Mirowski, Y. LeCun, D. Madhavan, R. Kuzniecky. Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG. IEEE Workshop on Machine Learning for Signal Processing, 2008. https://doi.org/10.1109/MLSP.2008.4685487
[2] H. Cecotti, A. Gräser. Convolutional neural network with embedded Fourier transform for EEG classification. Proceedings of the 19th International Conference on Pattern Recognition (ICPR '08), 2008, 1-4. https://doi.org/10.1109/ICPR.2008.4761638
[3] V. Bevilacqua et al. A novel BCI-SSVEP based approach for control of walking in Virtual Environment using a Convolutional Neural Network. International Joint Conference on Neural Networks (IJCNN), 2014, 4121-4128. https://doi.org/10.1109/IJCNN.2014.6889955
[4] H. Cecotti, A. Gräser. Convolutional neural networks for P300 detection with application to brain-computer interfaces. IEEE Trans. Pattern Anal. Mach. Intell., 2011, Vol. 33, No. 3, 433-445. https://doi.org/10.1109/TPAMI.2010.125
[5] R. Manor, A. B. Geva. Convolutional Neural Network for Multi-Category Rapid Serial Visual Presentation BCI. Frontiers Media S.A., 2015. https://dx.doi.org/10.3389/fncom.2015.00146
[6] L. Qin, B. He. A wavelet-based time-frequency analysis approach for classification of motor imagery for brain-computer interface applications. J. Neural Eng., 2005, Vol. 2, 65-72. https://doi.org/10.1088/1741-2560/2/4/001
[7] D. Xiao, Z. D. Mu, J. F. Hu. Classification of motor imagery EEG signals based on energy entropy. 2009 International Symposium on Intelligent Ubiquitous Computing and Education, 2009, 61-64. https://doi.org/10.1109/IUCE.2009.57
[8] B. Zhou, X. Wu, L. Zhang, Z. Lv, X. Guo. Robust Spatial Filters on Three-Class Motor Imagery EEG Data Using Independent Component Analysis. Journal of Biosciences and Medicines, 2014, Vol. 2, 43-49. http://dx.doi.org/10.4236/jbm.2014.22007
[9] X. Bai, X. Wang, S. Zheng, M. Yu. The offline feature extraction of four-class motor imagery EEG based on ICA and Wavelet-CSP. Control Conference (CCC), 2014, 7189-7194. https://doi.org/10.1109/ChiCC.2014.6896188
[10] H. Yang, S. Sakhavi, K. K. Ang, C. Guan. On the use of convolutional neural networks and augmented CSP features for multi-class motor imagery of EEG signals classification. 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, 2620-2623. https://doi.org/10.1109/EMBC.2015.7318929
[11] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278-2324. https://doi.org/10.1109/5.726791
[12] A. Vedaldi, K. Lenc. MatConvNet - Convolutional Neural Networks for MATLAB. Proc. of the ACM Int. Conf. on Multimedia, 2015. https://arxiv.org/abs/1412.4564
[13] M. Naeem, C. Brunner, G. Pfurtscheller. Dimensionality Reduction and Channel Selection of Motor Imagery Electroencephalographic Data. Computational Intelligence and Neuroscience, 2009. http://dx.doi.org/10.1155/2009/537504
[14] Y. Wang, S. Gao, X. Gao. Common Spatial Pattern Method for Channel Selection in Motor Imagery Based Brain-Computer Interface. IEEE Engineering in Medicine and Biology 27th Annual Conference, 2005, 5392-5395. https://doi.org/10.1109/IEMBS.2005.1615701
[15] T. Uktveris, V. Jusas. Comparison of Feature Extraction Methods for EEG BCI Classification. Information and Software Technologies: 21st International Conference, 2015, 81-92. https://doi.org/10.1007/978-3-319-24770-0_8
[16] G. E. P. Box, D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society, Series B, 1964, Vol. 26, No. 2, 211-252. http://www.jstor.org/stable/2984418
[17] D. Wang, D. Miao, G. Blohm. Multi-Class Motor Imagery EEG Decoding for Brain-Computer Interfaces. Frontiers in Neuroscience, 2012. https://doi.org/10.3389/fnins.2012.00151
[18] D. Birvinskas, V. Jusas, I. Martišius, R. Damaševičius. EEG dataset reduction and feature extraction using discrete cosine transform. UKSim-AMSS EMS 2012: 6th European Modelling Symposium on Mathematical Modeling and Computer Simulation, 2012, 186-191. https://doi.org/10.1109/EMS.2012.88
[19] J. Hu, Z. Mu, D. Xiao. Application of Energy Entropy in Motor Imagery EEG Classification. JDCTA, Vol. 3, 83-90. http://dblp.dagstuhl.de/rec/bib/journals/jdcta/HuXM09
[20] M. Le Van Quyen, J. Foucher, J. Lachaux, E. Rodriguez, A. Lutz, J. Martinerie, F. J. Varela. Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. Journal of Neuroscience Methods, 2001, Vol. 111, 83-98. https://doi.org/10.1016/S0165-0270(01)00372-7
[21] C. Brunner et al. BCI Competition 2008 - Graz data set A, 2008. http://www.bbci.de/competition/iv/desc_2a.pdf
[22] Y. R. Tabar, U. Halici. A novel deep learning approach for classification of EEG motor imagery signals. Journal of Neural Engineering, 2016, Vol. 14, No. 1. https://doi.org/10.1088/1741-2560/14/1/016003
