0% found this document useful (0 votes)
11 views

SURF

The document describes a new method called SURF for compressing electrocardiogram (ECG) signals collected by wearable devices. SURF uses unsupervised learning to build subject-specific dictionaries to encode the signals in a way that significantly reduces data and energy requirements for wireless transmission while maintaining signal quality. Evaluation shows it achieves much higher compression ratios and lower transmission energy use than existing techniques.

Uploaded by

sunilkumarkn26
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views

SURF

The document describes a new method called SURF for compressing electrocardiogram (ECG) signals collected by wearable devices. SURF uses unsupervised learning to build subject-specific dictionaries to encode the signals in a way that significantly reduces data and energy requirements for wireless transmission while maintaining signal quality. Evaluation shows it achieves much higher compression ratios and lower transmission energy use than existing techniques.

Uploaded by

sunilkumarkn26
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

SPECIAL SECTION ON BODY AREA NETWORKS

Received July 28, 2017, accepted August 24, 2017, date of publication September 7, 2017, date of current version October 12, 2017.
Digital Object Identifier 10.1109/ACCESS.2017.2749758

SURF: Subject-Adaptive Unsupervised ECG Signal


Compression for Wearable Fitness Monitors
MOHSEN HOOSHMAND1 , DAVIDE ZORDAN2 , TOMMASO MELODIA3 , (Senior Member, IEEE),
AND MICHELE ROSSI2 , (Senior Member, IEEE)
1 Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
2 Department of Information Engineering, University of Padova, 35131 Padua, Italy
3 Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115 USA
Corresponding author: Michele Rossi ([email protected])
The work of M. Hooshmand, D. Zordan, and M. Rossi was supported in part by Samsung Advanced Institute of Technology, South Korea,
as part of its Samsung Global Research Outreach program, and in part by by the University of Padova through the Project IoT-SURF under
Grant CPDA 151221. The work of T. Melodia was supported by the U.S. National Science Foundation under Grant CNS-1253309 and
Grant CNS-1618731.

ABSTRACT Recent advances in wearable devices allow non-invasive and inexpensive collection of
biomedical signals including electrocardiogram (ECG), blood pressure, respiration, among others.
Collection and processing of various biomarkers are expected to facilitate preventive healthcare through
personalized medical applications. Since wearables are based on size- and resource-constrained hardware,
and are battery operated, they need to run lightweight algorithms to efficiently manage energy and memory.
To accomplish this goal, this paper proposes SURF, a subject-adaptive unsupervised signal compressor
for wearable fitness monitors. The core idea is to perform a specialized lossy compression algorithm on
the ECG signal at the source (wearable device), to decrease the energy consumption required for wireless
transmission and thus prolong the battery lifetime. SURF leverages unsupervised learning techniques to build
and maintain, at runtime, a subject-adaptive dictionary without requiring any prior information on the signal.
Dictionaries are constructed within a suitable feature space, allowing the addition and removal of code words
according to the signal’s dynamics (for given target fidelity and energy consumption objectives). Extensive
performance evaluation results, obtained with reference ECG traces and with our own measurements from
a commercial wearable wireless monitor, show the superiority of SURF against state-of-the-art techniques,
including: 1) compression ratios up to 90-times; 2) reconstruction errors between 2% and 7% of the signal’s
range (depending on the amount of compression sought); and 3) reduction in energy consumption of up to
two orders of magnitude with respect to sending the signal uncompressed, while preserving its morphology.
SURF, with artifact prone ECG signals, allows for typical compression efficiencies (CE) in the range
CE ∈ [40, 50], which means that the data rate of 3 kbit/s that would be required to send the uncompressed
ECG trace is lowered to 60 and 75 bit/s for CE = 40 and CE = 50, respectively.

INDEX TERMS Biomedical signal processing, data compression, energy efficiency, self-organizing feature
maps, unsupervised learning, wearable sensors.

I. INTRODUCTION easy to measure, but are at the same time extremely valuable
Wearables can be integrated into wireless body sensor for the aforementioned purposes. We consider the acquisition
networks (WBSN) [1] to update medical records via the of such signals through wearable devices like smart watches
Internet, thus enabling prevention, early diagnosis and per- or chest straps [2], [3] and are concerned with prolonging
sonalized care. However, since they are required to be small the battery time of these wearables through lossy signal com-
and lightweight, they are also resource constrained in terms pression. We consider scenarios where wireless transmission
of energy, transmission capability, and memory. of ECG signals to some access point is required, so that
In this article, we propose new data processing solutions the signal can be stored and made available through cloud
for the long-term monitoring of quasi-periodic electrocardio- servers to be analyzed by clinicians. Our approach consists
graphy (ECG) signals. These biomedical traces are relatively of compressing the ECG time series right on the wearable

2169-3536 2017 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 5, 2017 Personal use is also permitted, but republication/redistribution requires IEEE permission. 19517
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

device, prior to their transmission, so that the data to be


stored and sent takes a small portion of its original space.
As we quantify below, this leads to substantial energy savings
(between one and two orders of magnitude) and to prolonged
battery life.
The proposed compression algorithm is based on unsuper-
vised neural maps for the construction of online dictionar-
ies, whose codewords are utilized to match input patterns.
The acquired biomedical signal, thanks to its quasi-periodic
nature, is decomposed into segments made up of samples
between consecutive signal peaks. We consider these seg-
ments as the signal’s recurrent patterns. A preliminary train- FIGURE 1. Communication diagram.
ing phase uses the incoming segments to learn the signal
distribution of the actual subject. The synaptic weights of the at an Internet server (see scenario a) or at the wireless receiver
neural maps become progressively and adaptively tuned to (e.g., at the smartphone for scenario b). The wireless channel
approximate such distribution, without any prior knowledge is used to transport the compressed bitstream and to also
upon it. Note that these weights represent the codewords. update the dictionaries that are utilized at the decompressor
Once the dictionary has been set up, each input segment is to reconstruct the original ECG sequence.
then encoded through a Vector Quantization (VQ) approach, SURF is based on the Growing Neural Gas (GNG) network
which selects the best matching codeword and transmits its of [8]. With GNG, the dictionary size can be dynamically
index to the receiving end in place of the full data. Moreover, adapted through the addition and removal of neurons. This
each new data segment is also utilized to refine the dictionary allows the exploration of new regions in the data space with-
in a real-time, online fashion. This is particularly appealing out affecting the accuracy reached by the dictionary in the
since it allows updating the dictionary to new subjects or to regions that have been already explored. Also, with SURF
the same subject if and when their signal statistics undergoes dictionaries are learned in a suitable feature space, with
major changes. We underline that, although our approach is reduced dimensionality with respect to the dimensionality
here designed and tested against ECG traces, it can be applied of the signal to be compressed. This makes it possible to
to any quasi-periodic signal exhibiting recurrent patterns. further enhance the compression efficiency and reduce the
In a previous research paper [4], we devised a first cost of dictionary updates. In addition, an original dictionary
subject-adaptive compressor that builds and maintains dic- management scheme is adopted, where one dictionary is used
tionaries on the fly using Time Adaptive Self Organizing for compression, one for continuous learning and one for the
Maps (TASOM), see [5]–[7]. As shown in that paper, assessment of previously unseen patterns. Before being used
these neural networks have excellent learning and adaptation for compression, new codewords must pass the assessment
capabilities and in general lead to high compression rates, phase. Patterns that are still under assessment are not encoded
reaching very good performance were other algorithms typ- through the dictionary, but their compact (transform domain)
ically fail. In this work, we build on our previous research, representation is sent instead. This allows achieving small
and in particular on the TASOM-based algorithm of [4], representation errors at all times, while effectively coping
by proceeding along two main axes: (i) first, we prove with artifacts (see Section VIII) and refining the dictionaries
that TASOM-based dictionaries have some major limitations as the signal statistics change. Overall, SURF learns and
when new and sporadic patterns arise, such as in the presence maintains dictionaries in a totally unsupervised and online
of artifacts caused, for example, by motion of the wearer, fashion, requiring less than 20kbytes of memory space, while
(ii) given this, we design and validate a new algorithm, called allowing for high compression ratios and small reconstruction
SURF, for ‘‘Subject-adpative Unsupervised signal compres- errors (usually within 2 and 7% of the signal’s peak-to-peak
sor for weaRable Fitness monitors’’, which overcomes the range, depending on the compression factor).
limitations of TASOM-based schemes, while retaining their Several codebook-based solutions were proposed in the
excellent performance in terms of compression ratio. This literature, see, e.g., the Gain-Shape Vector Quantization
is an entirely new design that combines three different dic- (GSVQ) method of [9]. GSVQ relies on offline learning,
tionaries with unsupervised learning techniques, successfully i.e., the codebook is attained from pre-collected datasets and
coping with artifacts. is then used at runtime. With this approach, the codebook
Two possible usage models for SURF are shown in Fig. 1. cannot be changed if the signal statistics changes significantly
The SURF compressor is run inside the wearable ECG mon- and a stream of residuals is transmitted to compensate for
itor, taking as input the raw ECG sequence and generating a this. With GSVQ, the achievable reduction in size for ECG
compressed bitstream. The latter is then sent over a wireless signals is up to 35-fold, as opposed to the much higher perfor-
channel to an access point (‘‘scenario a’’) or to a smartphone mance attained by SURF, which ranges from 50- to 96-fold
(‘‘scenario b’’). The SURF decompressor is used to recon- (depending on the frequency of artifacts in the input data).
struct the original ECG signal and can either be implemented Also, the performance of SURF is here compared against

19518 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

that of selected compression algorithms from the literature Transformation methods perform a linear orthogonal trans-
including neural networks [4], linear approximations [10], formation. The most widely adopted techniques are Fast
Fourier [11]–[13], Wavelet [14] transforms and compressive Fourier Transform (FFT) [11], Discrete Cosine Transform
sensing (CS) [15], [16]. SURF surpasses all of them, achiev- (DCT) [20], and Discrete Wavelet Transform (DWT) [14].
ing remarkable performance, especially at high compression The amount of compression they achieve depends on the
ratios where the reconstruction error (Root Mean Square number of transform coefficients that are selected, whereas
Error, RMSE) at the decompressor is kept below 7% of the their representation accuracy depends on how many and
signal’s peak-to-peak amplitude, whereas the RMSE of other which coefficients are retained. Although these algorithms
solutions becomes unacceptably high. can provide high compression ratios, their computational
A thorough numerical analysis of SURF, carried out on complexity is often too high for wearable devices. Also, as we
the PhysioNet public dataset [17] and our own collected quantify below, these methods are in general outperformed by
ECG traces from a Zephyr Bioharness 3 device, reveals the linear and dictionary-based approaches at high compression
following: ratios.
i) SURF’s dictionaries gracefully and effectively adapt to Parameter extraction methods use Artificial Neural Net-
new subjects or to their new activities, works (ANNs), Vector Quantization (VQ), and pattern recog-
ii) the size of these dictionaries is kept bounded within nition techniques. This is a field with limited investigation
20 kbytes, making them amenable to implementation in that has recently aroused great interest from the research
wireless monitors, community. Unlike direct and transformation methods, the
iii) high compression efficiency is reached (reductions in rationale is to process the input time series to obtain some
the signal size from 50 to 96-fold), kind of knowledge (e.g., input data probability distribution,
iv) the original ECG time series are reconstructed at the signal features, hidden topological structures) and utilize it
receiver with high accuracy, i.e., obtaining peak-to-peak to get compact and accurate signal representations. The algo-
RMSEs within 7% and often smaller than 3% and, rithm that we propose in this paper belongs to this class. Other
v) compression allows saving energy at the transmitter, representative algorithms are [4], [9], [21]–[24]. In [21],
leading to reductions of up to two orders of magnitude a direct waveform Mean-Shape Vector Quantization (MSVQ)
at the highest compression ratios. is tailored for single-lead ECG compression. Based on the
The reminder of this paper is structured as follows. observation that many short-length segments mainly differing
In Section II, we discuss previous work on lossy compression in their mean values can be found in a typical ECG signal,
for ECG signals. In Section III, we briefly review vector quan- the authors segment the ECG into vectors, subtract from
tization, which we also exploit in our design. In Section IV, each vector its mean value, and apply scalar quantization
we introduce the self-organizing maps, and in Section V we and vector quantization to the extracted means and zero-
describe an initial design based on them. This design is the mean vectors respectively. Differently from our approach,
same of [4] and is discussed here for the sake of completeness the segmentation procedure is carried out by fixing the vec-
and for a better understanding of the more complex design of tor length to a predetermined value. This avoids the com-
Section VI, where we describe in detail the SURF compres- putational burden of peak detection, but it does not take
sion scheme. A thorough performance evaluation is carried full advantage of the redundancy among adjacent heartbeats,
out in Section VIII, comparing SURF against state-of-the which are in fact highly correlated. Moreover, in MSVQ,
art solutions for reference and own collected ECG traces. dictionaries are built through the Linde-Buzo-Gray (LBG)
Conclusions are drawn in Section IX. algorithm [25], without adapting its codewords at runtime.
The following Section III introduces some preliminary In [9], Sun et al. propose another vector quantization scheme
notions on network quantization and Section IV summarizes for ECG compression, using the Gain-Shape Vector Quanti-
the main operational principles of self organizing maps, on zation (GSVQ) approach. There, the ECG is segmented into
which the algorithms that are proposed in this paper rest. This vectors made up of samples between two consecutive signal
material can be skipped by an expert reader on these matters. peaks. Each extracted vector is normalized to a fixed length
and divided by its gain to obtain the so called shape vector.
II. RELATED WORK A codebook for the shape vectors is generated using the
Compression algorithms for ECG signals can be grouped LBG algorithm. After this, each normalized vector is assigned
into three main categories: Direct Methods, Transformation the index of the nearest codeword in the dictionary and a
Methods and Parameter Extraction Methods. residual vector is encoded to compensate for inaccuracies.
Direct methods, which include the Lightweight Temporal The original length of each heartbeat, the gain, the index
Compression (LTC) [10], the Amplitude Zone Time Epoch of the nearest codeword and the encoded (residual) stream
Coding (AZTEC) [18], and the Coordinate Reduction Time are sent to the decoder. For the signal reconstruction at the
Encoding System (CORTES) [19], operate in the time domain receiver, the decoder retrieves the codeword from its local
and utilize prediction or interpolation algorithms to reduce copy of the dictionary, performs a denormalization using
redundancy in the input signal by examining subsequent time the gain and the length, and adds the residual signal. SURF
samples. resembles [9] in the way signal segments are defined and in

VOLUME 5, 2017 19519


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

the adoption of the GSVQ approach. Indeed, the main ECG instead of scalars, even if the data source is memoryless or
peaks are used to extract segments, which then constitute the data compression system is allowed to have memory.
the recurrent patterns to be learned. [22] distinguishes itself Let x = [x1 , x2 , ..., xm ]T ∈ Rm be an m dimensional input
from the previous schemes because it defines a codebook random vector. A vector quantizer is described by:
of ECG vectors adaptable in real-time. The dictionary is • A set of decision regions Ij ⊆ Rm , j = 1, . . . , L, such
implemented in a one dimensional array with overlapped and that Ij ∩ Ih = ∅, j, h = 1, . . . , L, j 6 = h, and the union of
linearly shifted codewords that are continuously updated and Ij (with j = 1, . . . , L) spans Rm .
possibly removed according to their frequency of utilization. • A finite set of reproduction vectors (codewords)
In particular, an input vector that does not find a matching C = {yyj }Lj=1 , y j = [yj1 , yj2 , . . . , yjm ]T ∈ Ij ⊆ Rm . This
codeword is added to the codebook, triggering the removal set is called codebook or dictionary. Each codeword y j
of the codeword with the least number of matches. However, is assigned a unique index.
no details are provided on ECG segmentation nor on how • A quantization rule q(·):
ECG segments with different lengths are to be processed.
A compression approach based on vector quantization, where q(xx ) = yj if x ∈ Ij . (1)
dictionaries are built and maintained at runtime is presented This means that the jth
decision region Ij is associated
in [4]. In this paper, time adaptive self organizing maps with the jth codeword y j and that each vector x belonging
are utilized to reshape the dictionary as the signal statistics to Ij is mapped by (1) into y j .
change. As we show in Section VIII, while this approach has A compression system based on VQ involves an encoder
excellent compression performance and gracefully adapts to and a decoder. At the encoder, the output samples from the
non-stationary signals, it is not robust to artifacts, i.e., the data source (e.g., samples from a waveform, pixels from an
quality of the dictionary degrades in the presence of sudden image) are grouped into blocks (vectors) and each of them is
changes in the signal statistics or of previously unseen pat- given as input to the VQ. The VQ maps each vector x onto
terns. A compression scheme for quasi-periodic time series codeword y j according to (1). Compression is achieved since
can be found in [24], where the authors target the lightweight the index j associated with y j is transmitted to the decoder in
compression of biomedical signals for constrained devices, as place of the whole codeword. Because the decoder has exactly
we do in this paper. They do not use a VQ approach but exploit the same dictionary stored at the encoder, it can retrieve the
sparse autoencoders and pattern recognition as a means to codeword given its index through a table lookup. Note that,
achieve dimensionality reduction and compactly represent the for correct decoding, the dictionary at the decoder shall be the
information in the original signal segments through shorter same in use at the encoder at all times. We say that encoder
segments. Quantitative results assess the effectiveness of their and decoder are synchronized if this is the case and that are
approach in terms of compression ratio, reconstruction error out-of-synchronism otherwise.
and computational complexity. However, the scheme is based The quality of reconstruction is measured by the average
on a training phase that must be carried out offline and is thus distortion between the quantizer input x and the quantizer out-
not suitable for patient-centered applications featuring pre- put y j . A common distortion measure between a vector x and
viously unseen signal statistics. A taxonomy describing most a codeword y j is the Euclidean distance d(xx , y j ) = kxx − y j k.
of these compression approaches, including their quantitative The average distortion is measured through the root mean
comparison, can be found in the survey paper [26]. squared error (RMSE):
Our present work improves upon previous research as neu-
L Z
ral network structures are utilized to build and adapt compact X
representations of biomedical signals at runtime, utilizing E[d(xx , y j )] = kaa − y j kfx (aa)daa, (2)
j=1 Ij
unsupervised learning. Our design uses multiple dictionaries
to ensure robustness against artifacts, still obtaining very high where fx (·) is the probability density function of the random
compression ratios, and achieving small reconstruction errors vector x . The design of an optimal VQ consists in finding the
at all times. dictionary C and the partition of Rm that minimize the average
distortion. It can be proved that an optimal VQ must satisfy
III. PRELIMINARIES ON VECTOR QUANTIZATION the following conditions:
FOR SIGNAL COMPRESSION 1) Nearest Neighbor Condition (NNC). Given the set of
Vector quantization is a technique originally conceived for codewords C, the optimal partition of Rm is the one
lossy data compression but also applicable to clustering, returning the minimum distortion:
pattern recognition and density estimation. VQ is a gener-
alization of scalar quantization of a single random variable Ij = {xx : d(xx , y j ) ≤ d(xx , y h ), j 6 = h}. (3)
to quantization of a block (vector) of random variables [27].
This condition implies that the quantization rule (1)
Its motivation lies on the fundamental result of Shannon’s
can be equivalently defined as q(xx ) = arg min d(xx , yj ),
rate-distortion theory, which states that better performance yj
(i.e., lower distortion for a given rate or lower rate for a i.e., the selected y j is the nearest codeword to the input
given distortion) can always be achieved by encoding vectors vector x .

19520 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

2) Centroid Condition (CC). Given the partition Ij , Let {xx (n)}N


n=0 be the training set of unlabeled examples (train-
j = 1, . . . , L, the codewords are the centroids of the ing input patterns), selected at random from X . The learning
decision regions. process proceeds iteratively, from n = 0 to n = N , where
Linde, Buzo and Gray, inspired by the k-means method for N should be large enough so that self organization develops
data clustering, provided an iterative algorithm (the LBG properly. At iteration n, the n-th training input pattern x (n)
algorithm) to generate a vector quantizer that satisfies the is presented to the SOM and the following three steps are
above conditions [25]. It essentially defines an initial dic- performed [6].
tionary and proceeds by repeatedly computing the decision Step 1 (Competition): The neurons compete among them-
regions (according to NNC) and improving the codewords selves to be selected as the winning neuron, i.e., the one
(according to CC) until the average distortion falls below whose synaptic-weight vector most closely matches x (n)
a given threshold. It can be formulated to address known according to the Euclidean distance. Its index i(xx ) satisfies:
or unknown source statistics. In the last case, a large set
of input vectors, called training set, must be used to build i(xx ) = arg min kxx (n) − w j (n)k, j = 1, . . . , L. (4)
j
up the quantizer. In this paper, we adopt a VQ approach.
Input vectors for the quantizer are determined by subdividing Step 2 (Cooperation): The winning neuron i(xx ) identifies
the original signal into segments between successive peaks. the center of a topological neighborhood of cooperating neu-
Dictionaries are obtained exploiting artificial neural networks rons modeled by the function hij (n). If dij is the lateral dis-
and our primary interest is to obtain and update them in an tance between i(xx ) and neuron j in A, then hij (n) is symmetric
online fashion as the signal statistics change. Note that the around i(xx ) and its amplitude decreases monotonically with
LBG algorithm does not natively support this, since it is increasing lateral distance dij . Moreover, hij (n) shrinks over
conceived for time-invariant dictionaries. time. We set hij (n) = exp(−dij2 /(2σ (n)2 )) (unnormalized
Since our reference scenario is a wearable-based health- Gaussian function), where σ (n) is the width of the topological
care application, the proposed compression framework aims neighborhood, that exponentially decreases with increasing
at being as energy-efficient as possible. A problem that arises time n [6].
with VQ is related to the search of the nearest codeword Step 3 (Synaptic Adaptation): The synaptic-weight vector
during the quantization process, i.e., the codeword y j∗ ∈ C w j (n) of neuron j at time n is adjusted through the equation:
such that y j∗ = argminy j d(xx , y j ). In fact, the number of w j (n + 1) = w j (n) + η(n)hij (n)(xx (n) − w j (n)). (5)
operations performed in such phase affects the overall com-
putational complexity performance and, in turn, power con- Equation (5) has the effect of moving the synaptic-weight
sumption. To speed up the search and thus save energy, vector w i(xx ) of the winning neuron i(xx ) (and the synaptic-
we exploit the fast dictionary search algorithm devised by weight vectors of the neurons in its topological neighborhood,
Wu and Lin [28]. The idea is to bypass those codewords through hij (n)) toward the input vector x . The learning-rate
which satisfy a kick-out condition without the actual com- parameter η(n) starts at some initial value η(0) and then
putation of the distortion for the bypassed codewords. This is exponentially decreases with increasing n through η(n) =
achieved by decomposing the Euclidean distance and using η0 exp(−n/τ2 ), n = 0, 1, . . . , where τ2 is a time constant.
the Cauchy-Schwarz inequality, see [28]. The synaptic weight adaptation process proceeds according
to (5) and can be decomposed into two phases: an ordering
IV. UNSUPERVISED DICTIONARY LEARNING phase, during which the topological ordering of the weight
THROUGH SELF-ORGANIZING MAPS vectors takes place, followed by a convergence phase, which
The Self Organizing Map (SOM) and its time-adaptive ver- fine-tunes the feature map and therefore provide an accurate
sion (TASOM) are single layer feed-forward networks having statistical quantification of the input space. As a general rule,
an input layer of source nodes that projects directly onto the total number of iterations allowing the map to develop
an output layer of neurons. The SOM provides a struc- properly should be at least N = 1000 + 500 × L [6].
tured representation of the input data distribution with the Once the SOM algorithm has terminated, a nonlinear
synaptic-weight vectors acting as prototypes. For its out- transformation (feature map) 8 : X → A is obtained
put layer, we consider a square lattice A with L neurons as 8(xx ) = w i(xx ) , where the index i(xx ) is found using (4).
arranged in M rows and M columns. The input space X is 8(·) is a quantization rule as it approximates the input data
m-dimensional with input vectors x = [x1 , x2 , . . . , xm ]T ∈ space X with the finite set of weights (prototypes) w j ∈ A.
X ⊂ Rm . The SOM input layer has m source nodes, In fact, the same weight vector w j is returned in response to
each associated with a single component of the input vec- all the input vectors x for which 8(xx ) = w j . Thus, the SOM
tor x and each neuron in the lattice is connected to all the algorithm is a VQ algorithm. However, upon completion of
source nodes. The links (synapses) between the source nodes the learning phase, the SOM map stabilizes and further learn-
and the neurons are weighted, such that the j-th neuron is ing / adaptation to new input distributions is difficult. With
associated with a synaptic-weight vector denoted by w j = non-stationary signals, adaptive learning must be employed to
[wj1 , wj2 , . . . , wjm ]T ∈ Rm , j = 1, . . . , L, where L = M 2 is update the feature map. The Time-Adaptive Self-Organizing
the total number of neurons in A. Training is unsupervised. Map (TASOM) achieves this by allowing the map to increase

VOLUME 5, 2017 19521


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

FIGURE 2. Diagram of the TASOM-based compression algorithm.

the learning rate when the signal’s statistics changes and for The normalized segment feeds the dictionary manager, which
this reason is a more appealing technique with non-stationary uses it to update the dictionary, and the pattern matching
signals. The TASOM has been introduced in [7] improving module, which returns the best matching codeword from
upon the basic SOM and preserving its properties in station- the dictionary and outputs its index. The segment’s original
ary and non-stationary settings. In a TASOM, each neuron j, length, offset, gain and codeword index are then sent to the
j = 1, . . . , L, has a synaptic-weight vector w j ∈ Rm with receiver in place of the original samples.
its own learning-rate ηj (n) and neighborhood width σj (n), The dictionary manager is the key block of the
which are continuously adapted so as to allow a potentially TASOM-based compressor. We designed it thinking of a
unlimited training of the synaptic-weight vectors. The reader communication scenario entailing a transmitting wearable
is referred to [7] for additional details. device and a receiver, such as a smartphone. At any time
instant n, two dictionaries are maintained at the transmitter:
V. A FIRST DESIGN: TASOM-BASED ECG COMPRESSION the current dictionary C c (n), which is used to compress
In this section, we describe an initial design that uses the the input signal, and the updated dictionary C u (n), which
TASOM unsupervised learning algorithm. First, we identify undergoes updating at each time instant through the TASOM
as ECG segments the sequence of samples between con- algorithm and is maintained to track statistical changes in
secutive ECG peaks and we use them to build a dictionary the input signal’s distribution. As for the dictionaries, we
that stores typical segments and is maintained consistent and consider a TASOM with L neurons. When the compression
well representative through online updates. A diagram of the scheme is used for the first time, a sufficient number N of
proposed technique is shown in Fig. 2. The ECG signal is signal segments shall be provided as input to the TASOM to
first preprocessed through a third-order Butterworth filter to perform a preliminary training phase. This training allows
remove artifacts. Hence, the fast peak detection algorithm the map to learn the subject signal’s distribution. This may
of [29] is employed to locate the signal peaks. Since the be accomplished the first time the subject wears the device.
segments may have different lengths, after their extraction, After this, a first subject-specific dictionary is available. It can
a linear interpolation block resizes each segment from its be used for compression and can also be updated at runtime
actual length rx (n) to a fixed length m. The resized segment as more data is acquired. Let us assume that time is reset
is termed x (n) = [x1 (n), . . . , xm (n)]T , whereas when the preliminary training ends, and assume n = 0 at
Pm such point. The current and updated dictionaries are C c (0) =
xk (n)
ex (n) = k=1 (6) {ccc1 (0), . . . , ccL (0)} and C u (0) = {ccu1 (0), . . . , cuL (0)}, respec-
m c/u
tively. Their codewords c j (0) represent the synaptic-weight
is its offset and vectors of the corresponding neural (TASOM) maps. At time
m
X
!1/2 n = 0, we have c cj (0) = c uj (0) = w j (0), j = 1, . . . , L. Let also
gx (n) = xk (n) /m
2
(7) assume that the decompressor at the receiver is synchronized
k=1 with the compressor, i.e., it owns a copy of C c (0). From
is its gain. The normalization module applies the following time 0 onwards, for any new segment x (n) (n = 1, 2, . . . )
transformation to each entry of x (n): the following procedure is followed:
Step 2 makes it possible to always maintain an updated
xk (n) − ex (n) approximation of the input segment distribution at the
xk (n) ← , k = 1, . . . , m. (8)
gx (n) transmitter. With step 4, we check the validity of the

19522 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

Algorithm 1 TASOM-Based Compressor learned. In that case, we do not want previous neurons to be
1) Map x (n) onto the index of the best matching codeword involved in the refinement as we are exploring a new area
in C c (n), i.e., map x (n) onto the index ix (n) such that in the input signal space, and we do not want to do this at the
cost of getting lower accuracies in the portion of space that we
ix (n) = arg min kxx (n) − c cj (n)k, j = 1, . . . , L. (9)
j have already inspected and successfully approximated. In our
new design, we accomplish this through a GNG network [8].
2) Let d(n) = kxx (n) − c ci (n)k be the distance between This type of neural network incrementally learns the topolog-
the current segment and the associated codeword, where ical relations in a given input signal set, dynamically adding
we use index i as a shorthand notation for ix (n). Use x (n) and removing neurons and connections, adapting itself to
as the new input for the current iteration of the TASOM previously unseen input patterns.
learning algorithm and obtain the new synaptic-weight O2) Objective 2 (Reducing Overhead): we aim at further
vectors wj (n), j = 1, . . . , L. reducing the overhead associated with maintaining and trans-
3) Update C u (n) by using the weights obtained in step 2, mitting the dictionary. This is achieved through two tech-
i.e., setting c uj (n) ← w j (n) for j = 1, . . . , L. niques: 1) working within a suitable feature space, where a
4) Let ε > 0 be a tuning parameter. If d(n)/kxx (n)k > ε, number of features much smaller than the size of each ECG
then update C c (n) by replacing it with C u (n), i.e., C c (n) ← segment suffices for its accurate representation; 2) selective
C u (n) and, using (9), re-map x (n) onto the index ix (n) of dictionary update. In the TASOM-based approach, a dictio-
the best matching codeword in the new dictionary C c (n). nary is entirely replaced whenever any of its codewords is no
5) Send to the receiver the segment’s original length rx (n), longer capable of approximating ECG segments belonging to
its offset ex (n), gain gx (n), and the codeword index ix (n). a given signal’s area within a preset accuracy. Instead, in our
If C c (n) has been modified in step 4, then also send C u (n) new design codewords are selectively replaced by new ones
(that in this case is equal to the new C c (n)). that better approximate the portion of signal space that they
are responsible for.
O3) Objective 3 (Coping With Artifacts): ECG signals
approximation provided by the current dictionary (the one gathered from wearable devices are prone to artifacts,
used for compression, which is also known at the receiver). caused, for example, by the body movements of the wearer.
The tunable parameter ε is used to control the signal recon- Dictionary-based approaches are particularly sensitive to arti-
struction fidelity at the decompressor: if d(n)/kxx (n)k ≤ ε, facts as no existing codeword can adequately approximate
codeword c cix (n) (n) is considered suitable to approximate the them. Dictionary updates, attempting to bring the codewords
current segment, otherwise C c (n) is replaced with the updated closer to the new segments (i.e., the artifacts) are likely to
dictionary C u (n) and the encoding mapping is re-executed. result in a degraded representation accuracy for the recurring
Note that the higher ε, the higher the error tolerance and the segments. However, these noisy segments must be accurately
lower the number of updates of the current dictionary. On the represented, as these may indicate anomalous behavior that
contrary, a small ε entails frequent dictionary updates: this could play an important role in the diagnosis of a disor-
regulates the actual representation error and also determines der. Our new compressor successfully copes with this by:
the maximum achievable compression efficiency. 1) sending features in place of full codewords whenever none
At the receiver, the n-th ECG segment is reconstructed by of the current codewords provide a satisfactory match and
picking the codeword with index ix (n) from the local dic- 2) concurrently starting an assessment phase for the new
tionary, performing renormalization of such codeword with pattern. In the assessment phase, a new neuron (codeword)
respect to offset ex (n) and gain gx (n) and stretching the is temporarily added to a local dictionary, which is only
codeword according to the actual segment length rx (n). maintained at the source and is used for the evaluation of
new (or anomalous) patterns. The permanent addition of
VI. THE SURF COMPRESSION SCHEME such codeword to the main dictionary only occurs if further
In what follows, the TASOM-based compressor of the pre- segments are found to match it, which means that the new
vious section is improved through the use of a more flex- segment has become recurrent.
ible neural network architecture, aiming at the following Objectives O1, O2, and O3 are achieved through the
objectives. SURF compression algorithm that we describe in detail next.
O1) Objective 1 (Specializing the Dictionary to New Signal It leverages a GNG neural structure to learn and maintain
Areas): we recall that the number of neurons in the TASOM a set of prototypes in the signal’s feature space in a totally
map remains fixed as time evolves and this entails that some unsupervised fashion. This neural network structure has a
further refinement of the dictionary, whenever the signal number L(n) of neurons, where n is the (discrete) time index,
statistics undergoes major changes and new behaviors arise, which is updated as n ← n+1 each time a new ECG segment
may not always be possible. In fact, from our experiments we is processed.
have seen that, at times, additional neurons may be benefi- A diagram of the SURF algorithm is shown in Fig. 3. The
cial to specialize the dictionary upon the occurrence of new signal is at first preprocessed through the same chain of Fig. 2,
patterns, while at the same time preserving what previously involving filtering to remove artifacts, ECG peak detection

VOLUME 5, 2017 19523


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

FIGURE 3. Flow diagram of the SURF compression algorithm: the dictionaries are learned in the feature space. Codewords
can be added or removed. When the distance between the best matching codeword in D1 and the current feature vector is
higher than a threshold, a new codeword is added to D2. That codeword is in an assessment phase until it is either
permanently added (to D1 and D3) or deleted (i.e., when no further matches occur). Dictionary D1 is used for
(dictionary-based) compression, D3 for continuous learning. When a good match is found for an ECG segment, the
compressor sends its length, offset and the index of the matching codeword. Otherwise, the segment’s feature vector is sent
along with its length and offset.

and segment extraction. After this, ECG segments are nor- by the decompressor at the receiver. This implies that any
malized, resized and their offset is removed. As different changes to D1 should be promptly communicated to the
ECG segments may have different lengths, linear interpola- decompressor so that the dictionaries at source and at the
tion is used to resize them to a fixed length m. Let x (n) = receiver remain synchronized at all times. Instead, D2 and D3
[x1 (n), . . . , xm (n)]T be the resized m-length ECG segment at only need to be maintained at the source (transmitter).
time n. Offset removal is achieved through: Dictionary D1: The current dictionary D1 contains the
codewords which are currently in use. For each new fea-
xk (n) ← xk (n) − ex (n), k = 1, . . . , m, (10) ture segment y (n), the closest codeword cci∗ (n) in D1 is
fetched (‘‘pattern matching’’ in Fig. 3) by minimizing the
where ex (n) is defined in (6). After this, the normalized distance d(yy(n), c cj (n)) = kyy(n) − c cj (n)k for all codewords
ECG segment x (n) is fed to a feature extraction block which ccj (n) ∈ C c (n), i.e.,
reduces the dimensionality of x (n) through the computation
of a number f < m of features. This mapping is denoted i∗ = arg min d(yy(n), c cj (n)), j = 1, . . . , L(n). (11)
by 9 : Rm → Rf and we have: y(n) = 9(xx (n)), where j
y(n) = [y1 (n), . . . , yf (n)]T . For our experimental results, If d(yy(n), c ci∗ ) is smaller than a preset error tolerance
this mapping corresponds to the DCT transform of x (n), by εf > 0,1 the codeword c ci∗ from D1 is deemed a good
retaining the first (low-pass filtering) f coefficients in the candidate to approximate the current ECG segment. In this
transform (frequency) domain. We underline that our method case, we say that y (n) is matched by c ci∗ . Index i∗ is thus sent
is rather general and other transformation and coefficient to the receiver in place of the entire feature set y (n). At the
selection methods can be applied. receiver side, a copy of D1 is maintained at all times and is
At this point, the SURF dictionaries come into play. used to retrieve c ci∗ from its index.
Differently from the TASOM approach, three dictionaries are Dictionary D2: If d(yy(n), c ci∗ ) > εf , none of the codewords
maintained at the transmitter: D1) the current dictionary in D1 adequately approximates the current feature vector,
C c (n) = {cc1 (n), . . . , ccL(n) (n)}, D2) the reserved dictionary which is then termed unmatched. Note that this may be a
C r (n) = {cr1 (n), . . . , crR(n) (n)} and D3) the updated dictionary consequence of changes in the signal statistics such as sudden
C u (n) = {cu1 (n), . . . , cuL(n) (n)}. D1 and D3 contain the same variations in the subject’s activity, to pathological (and often
number of codewords at all times, whereas D2 contains R(n) 1 Here, ε represents the error tolerance in the feature space, which must
f
codewords, where in general R(n)  L(n). D1 is used for not be confused with that in the signal space ε, that was used for the
compression at the source (transmitter) and has to be known TASOM-based compressor of Section V.

19524 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

sporadic) ECG segments or to measurement artifacts. In these all times, while adaptively (and automatically) tuning the
cases, we check for a match in the reserved dictionary D2 instantaneous compression rate (as a function of the charac-
(C r (n)). If a match occurs, the matching count of the match- teristics of the current segment). Also, this allows refining
ing codeword in D2 is increased by one. Otherwise, a new the main dictionary by only including those patterns that
codeword is added to D2. This is achieved by adding a neuron have become recurrent. As we shall see, this provides excel-
to dictionary C r (n) and using feature vector y (n) to initialize lent accuracy performance, resilience against artifacts, while
its synaptic-weight vector. We stress that the codewords in retaining most of the benefits of dictionary-based schemes
D2 are not yet ready for use in the signal compression, but (very high compression rates).
they have to go through an assessment phase. D2 behaves For the formal description of the SURF algorithm, let us
as a buffer with maximum size Lmax : if a codeword in D2 assume that time is reset when the preliminary training ends
is matched γ times (with γ being a preset parameter), it is and assume n = 0 at such point. The codewords of D1 and
removed from D2 and added to D1. If instead D2 gets full D3, at time n = 0, C c (0) = {ccc1 (0), . . . , c cL(0) (0)} and C u (0) =
and a new codeword has to be added to it for assessment, the {ccu1 (0), . . . , c uL(0) (0)} are set equal to the synaptic-weight vec-
oldest codeword from D2 is deleted and the new one takes tors at the end of the initial training, i.e., c cj (0) = c uj (0) =
its place. The rationale behind the assessment phase is that w j (0), j = 1, . . . , L(0). We also assume that the decompressor
new codewords are added to explore a new portion of the at the receiver is synchronized with the compressor, that is,
signal’s feature space, and this exploration is prompted by it owns a copy of D1 (C c (0)). Also, for any codeword c
the measurement of previously unseen patterns. Now, if these belonging to any dictionary, if d(yy(n), c ) < εf we say that y (n)
patterns are very unlikely to occur again it does not make any is matched by c . For the continuous update of the synaptic
sense to add them to dictionary D1 and it is better to send weight vectors (codewords) in dictionary D3, we apply the
the feature vector y(n) for these isolated instances. In turn, following Algorithm 2, which rests on the Hebbian learning
y(n) will be utilized to reconstruct the pattern at the receiver. theory in [30], [31].
Instead, if after their first appearance, these become recurring
patterns, it does make sense to add them to D1 (and D3 for Algorithm 2 Synaptic Weight Vector Update
their continuous refinement). Note that the combined use of At the generic time n, let y (n) and i∗ respectively be the
D1 and D2 makes it possible to specialize the dictionary to current feature vector and the index associated with the best
new signal areas (new patterns, i.e., objective O1) and as well matching codeword in D1, i.e.,
to cope with artifacts (objective O3).
Dictionary D3: This dictionary has the same number of d(yy(n), c ui∗ (n)) ≤ d(yy(n), c uj (n)), j = 1, . . . , L(n). (12)
neurons of D1 but its codewords are updated for each new We have that i∗ is the winning neuron in map (dictionary)
matched ECG segment. That is, when d(yy(n), c ci∗ ) < εf the D1 for this input (feature) vector y (n) and its synaptic
feature vector y (n) is also used to update dictionary C u (n). weight vector is w i∗ = c ui∗ (n), with w i∗ ∈ Rf . The update
As stated above, dictionary D2 and D3 are continu- rule for w i∗ is:
ously updated: D3 when a match occurs between y (n) and
a codeword in D1, whereas D2 when no codeword in i∗ ← w i∗ + b (y
w new y(n) − w i∗ ). (13)
D1 matches y (n). In this case, if y (n) matches some code-
Moreover, when we have a match, an edge will be created
word in D2, the corresponding matching count is increased,
in the neural map between i∗ and i∗∗ , where i∗∗ is the
otherwise D2 is extended through the addition of a new
second-closest neuron to the current input vector y (n).
codeword. Dictionaries D1 and D3 are initialized with L(0)
If i∗ and i∗∗ are already connected with an edge, no new
neurons, where L(n) is always bounded, i.e., L(n) ≤ Lmax
edge will be added. After that, we update the synaptic
at all times, where Lmax is a preset parameter to cope
weight vector of every neuron j that is a neighbor of i∗ ,
with memory constraints. At time 0, D2 is empty and the
i.e., that is connected to it with an edge:
number of neurons therein is likewise bounded by Lmax .
Similarly to the TASOM-based approach, when the com- w new
j ← w j + n (yy(n) − w j ), (14)
pression scheme is activated for the first time, a suffi-
cient number N of signal segments must be provided as where b and n are constant learning rates. The new
input to perform a preliminary training phase. Such train- weight vectors of (13) and (14) correspond to the updated
ing allows the dictionaries to learn the subject signal’s codewords for dictionary D3.
distribution. An observation is in oder. Basically, the just
described approach dynamically switches the compression Keeping the above definitions and update rules into
strategy between a dictionary-based technique and a stan- account, from time 0 onwards, for any new feature segment
dard transform-based one (i.e., sending a number of DCT y (n) (n = 1, 2, . . . ) the following procedure is executed:
coefficients for the current segment). The dictionary is In the above algorithm, Step 2 checks whether the current
used when it approximates well the current ECG pattern. segment is matched by one codeword in the current dictio-
Otherwise, a DCT compression approach is exploited. Note nary D1. If not, the current feature vector is tagged as an
that this makes is possible to achieve high accuracy at unknown pattern and is added to dictionary D2 to go through

VOLUME 5, 2017 19525


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

Algorithm 3 SURF an assessment phase. If instead a matching codeword in D1


Step 1): for y (n), find the indices of the two closest codewords in is found, this codeword is used in Step 3 to approximate
D1 (C c (n)), which are respectively called iy∗ (n) and iy∗∗ (n), where the current segment. This is achieved by sending the index
iy∗ (n) = arg min d(yy(n), c cj (n)), j = 1, . . . , L(n) (15) associated with this matching codeword to the receiver, which
j owns a copy of dictionary D1 and uses the index to retrieve
and iy∗∗ (n) is the index of the second-closest codeword in D1. the approximating codeword. With Step 4, we periodically
Step 2): Let d(n) = d(yy(n), cci∗ (n)) be the distance between y(n)
perform a dictionary assessment, i.e., we check whether
and the closest codeword c ci∗ (n), where we use i∗ as a shorthand
the current dictionary D1 is still well representative of the
notation for iy∗ (n). If d(n) ≤ εf move to Step 3, otherwise act
as follows. Check the reserved dictionary D2 to see whether any actual input distribution. This assessment is accomplished
of its codewords matches y (n). If this is the case, then increase by checking the distance between each codeword in D1 and
by one unit the matching count for that codeword: if this count its corresponding codeword in D3: if this distance gets too
reaches γ , this codeword is removed from D2, added to D1 and D3 large (namely, larger than the maximum error tolerance εf ),
(increasing their size, i.e., L(n) ← L(n)+1) and transmitted to the the codeword in D1 is replaced by its counterpart in D3.
receiver. If no matching codeword exists in D2, the feature vector Note that, the higher εf , the higher the error tolerance and
y (n) is sent to the receiver along with the length and offset of the the lower the number of updates that are carried out for the
corresponding signal segment. Also, a new codeword (neuron) is current dictionary D1. Conversely, a small εf entails frequent
added to the reserved dictionary D2 and this neuron has a weight dictionary updates. This regulates the actual representation
vector w = y (n). Go to Step 4.
error and also determines the maximum achievable compres-
Step 3) Here, d(n) ≤ εf . 3.1) Use the weight vector of neuron
sion efficiency. Moreover, we stress that in Step 4 the update
i∗ as the approximating vector for y (n). Hence, send index i∗
to the receiver along with the length and offset of the signal procedure is solely applied to those neurons that need to be
segment associated with y (n). 3.2) Use y (n) to update D3 through updated as opposed to our previous design of Section V,
Algorithm 2 above. 3.3) For dictionary D3 do the following. where the whole dictionary is updated. This helps reduce
Increase the age aj of all the neighbors j of neuron i∗ . Remove the overhead associated with the dictionary update operation
any edge with age aj ≥ amax , with amax being a preset parameter. (see objective O2).
If this makes it so a neuron remains with no neighbors (no edges At the receiver, when compression is achieved picking
connecting it to other neurons in D3), then remove this neuron the closest codeword from dictionary D1 and sending the
from both D1 and D3 and decrease their size, L(n) ← L(n) − 1. corresponding index i∗ , the n-th segment is reconstructed
3.4) For dictionary D1 do the following. The distance between by picking the codeword y ∗ with index i∗ from the local
the input y and the nearest neuron i∗ will be added to the local
dictionary, moving it into the time domain through the inverse
accumulated error of neuron i∗ :
feature map, i.e., x ∗ = 9 −1 (yy∗ ). Instead, when the feature
error(i∗ )new ← error(i∗ )old + d(yy(n), c ci∗ (n)). (16) vector y (n) is transmitted, the decompressor directly applies
Step 4) Dictionary Management: The following dictionary x ∗ = 9 −1 (yy(n)) to the received feature vector. In both cases
update procedure follows the growing neural gas network algo- then, the offset ex (n) is added back to x ∗ and the latter is
rithm of [8]. Every λ time steps, we check the current dictionary resized to the actual segment length rx (n). This returns the
D1 for its possible update as follows: 4.1) Each two corresponding reconstructed segment x̂x (n).
neurons in C c (n) and C u (n) will be considered. If their distance is
greater than εf , the weight vector of the neuron (codeword) in VII. HARDWARE ARCHITECTURE, ENERGY MODEL
C c (n) will be replaced by the one of the corresponding neuron in AND PERFORMANCE METRICS
C u (n). The weight vectors (codewords) in C c (n) that are updated
To evaluate the energy consumption, following the approach
as a consequence of this check are sent to the receiver. 4.2) For
of [32], [33] we compute three metrics:
dictionary D1, the neuron p (synaptic weight vector w p ) with
the maximum accumulated error is determined. A new neuron r 1) compression energy: the energy consumption to execute
(synaptic weight vector w r ) is generated halfway between p and the compression algorithm on the wearable node,
its neighbor q that has the largest 2) transmission energy: the energy drained by the the trans-
accumulated error: mission of the (either compressed or original) signal
wp + w q ).
w r = 0.5(w (17) over a wireless channel, and the
3) total energy, given by the sum of the previous two
The new neuron r is then added to both D1 and D3 and is also
transmitted to the receiver to update the decoder’s dictionary D1.
metrics.
For both D1 and D3, remove the edge connecting neurons p and
q (edge (p, q)) and add the two edges (p, r) and (r, q). Multiply A. COMPRESSION ENERGY
the accumulated error of p and q by constant α and initialize the The compression energy has been evaluated by taking into
accumulated error of r with the new value of the accumulated account the number of operations performed by the Micro-
error of p. Controller Unit (MCU), i.e., the number of additions, mul-
Step 5): All the accumulated errors will be multiplied by a second tiplications, divisions and comparisons. Then, according to
constant β. After this, go to Step 1 for the next input segment. the considered sensor hardware, we translated these figures

19526 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

into the corresponding number of clock cycles Ncc and, from The maximum number of packets per seconds PPSmax that
there, we derived the energy expenditure, as in [32]. For the can be exchanged between the two devices is thus: PPSmax =
energy consumption plots of Section VIII-A, we considered nmax /CImin , with CImin expressed in seconds, and the maxi-
a Cortex M4-90LP [34] processor, whose number of clock mum throughput is obtained as
cycles per operation is detailed in Table 7-1 of [35]. As for
the energy consumption per clock cycle, Ecc , in active mode Thrmax = PPSmax × payload_size. (19)
the Cortex M4-90LP consumes 10.94 µA with the MCU
operating at 1 MHz and the supply voltage being +3 V: Here, nmax depends on the operating system of the terminals,
for example, at the time of writing, Android has nmax = 6,
Ecc = 10.94 µA × 3 V/1 MHz = 32.82 · 10−12 J. (18) whereas iOS has nmax = 4. Using (19), the maximum
throughput for a wireless ECG monitor connected with an
B. TRANSMISSION ENERGY Android terminal (nmax = 6), is thus: Thrmax = PPSmax ×
When ECG samples are measured using a Zephyr Bioharness payload_size = (6/0.0075) × 105 = 672 kbit/s. This
3 module [36], the sampling frequency is 250 Hz, and each maximum throughput is more than enough to support the
ECG sample takes 12 bits. This amounts to a transmission transmission of the raw ECG signal (3 to 4 kbit/s).
rate for a continuously streamed (uncompressed) ECG signal The number of transmitted packets is computed according
of 3 kbit/s. This is the setup considered for the results in to the number of information bits that are to be transmitted by
Section VIII-B, whereas in Section VIII-A the bitrate is of the radio, segmenting the bitstream into multiple data packets
3.96 kbit/s, as the sampling rate is higher (360 Hz with 11 bits according to a fixed payload length of 105 information bytes.
per sample). The raw ECG signal is then compressed using The energy consumption associated with the transmission of
SURF and transmitted through the wireless channel. Next, we a single data packet is obtained as Epacket , as per the above
detail how we estimated the energy consumption associated discussion. Finally, the total energy consumption is computed
with the transmission of data packets as they travel from the as the sum of processing and transmission energy.
wearable device to the data receiver. Towards this end, we Two additional metrics are considered in the performance
consider the energy consumption figure of the Bluetooth LE analysis, i.e., the Compression Efficiency (CE) and the Root
Texas Instruments CC2541 radio [37], whose energy con- Mean Square Error (RMSE). CE has been computed as the
sumption per transmitted bit is Ebit = 300 nJ/bit (18.2 mA ratio between the total number of bits that would be required
at 3.3 V considering a physical layer bitrate of 2 Mbps to transmit the full signal divided by those required for the
and the radio in active mode). The procedure that we now transmission of the compressed bitstream. The RMSE is used
describe can be applied to any other radio, by plugging the to represent the reconstruction fidelity, as is computed as
corresponding Ebit . the root mean square error between the original and the
The energy consumption for each transmitted packet is compressed signals, normalizing it with respect to the signal’s
obtained as Epacket = Ebit × packet_size, where peak-to-peak amplitude (p2p), that is
packet_size = header_size + payload_size. PK !1/2
2
No energy consumption is accounted for when the radio is 100 i=1 (xi − x̂i )
RMSE = , (20)
in idle mode (between packet transmissions). The packet p2p K
transmission process follows the Bluetooth LE protocol in
the connected mode (in our case, a point-to-point connection where K corresponds to the total number of samples in the
between only one master and only one slave). In Bluetooth ECG trace, xi and x̂i are the original sample i and that recon-
LE, a data packet consists of the following fields: pream- structed at the decompressor (receiver side), respectively. The
ble (1 byte), access address (4 bytes), link layer header SURF default parameters have been set as follows: b = 0.01,
(2 bytes), L2CAP header (4 bytes), which are followed by n = 0.005, α = 0.5, β = 0.995, γ = 3, Lmax = 10,
3 bytes of ATT command type/ attribute ID, Ldata informa- λ = 200 and amax = 100. These parameters were selected
tion bytes (containing application data), and the CRC field empirically and provide a good tradeoff between RMSE and
(3 bytes), see [38]. This leads to a total protocol overhead overhead (memory and compression efficiency).
of header_size = 17 bytes. For our results, we picked
a payload size of Ldata = payload_size = 105 unen- VIII. NUMERICAL RESULTS
coded information bytes (leading to a protocol overhead of In this section, we show quantitative results for the proposed
(17/122) × 100 = 13.9%), although the numerical results signal compression algorithms, detailing their energy con-
can be promptly adapted for any other value. Each side com- sumption, compression efficiency and reconstruction fidelity.
municates with the other on a given period called Connection In Section VIII-A, we first assess the performance
Interval (CI), whose minimum value is 7.5 milliseconds. Each of the considered compression algorithms for the refer-
communication instance between the master and the slave ence ECG traces from the PhysioNet MIT-BIH arrhythmia
is called a communication event, subsequent communica- database [17]. In Section VIII-B, we extend our analysis to
tion events are separated by CI seconds and a maximum (artifact prone) ECG traces that we collected from a Zephyr
of nmax data packets can be transmitted within this period. BioHarness 3 wearable chest monitor.

VOLUME 5, 2017 19527


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

A. PHYSIONET ECG TRACES correspond to the original signal segments, i.e., y (n) = x (n).
For the first set of graphs, we considered the following ECG If an input segment is unmatched, the corresponding DCT
traces from the MIT-BIH arrhythmia database [17]: 101, 112, coefficients are transmitted to the receiver. So, in this case
115, 117, 118, 201, 209, 212, 213, 219, 228, 231 and 232, the DCT transform is only applied if a new pattern that the
which were sampled at rate of 360 samples/s with 11−bit res- current dictionary is unable to approximate is detected. In this
olution. Note that not all the traces in the database are usable case, f of its DCT coefficients are sent to reconstruct it at the
(some are very noisy due to heavy artifacts probably due to receiver (f = 200 is used for the SURF-TD curve in Fig. 4).
the disconnection of the sensing devices) and an educated
selection has to be carried out for a meaningful performance 2) SURF
analysis, as done in previous work [17], [39]. The above It is the feature domain implementation that we have
performance metrics were obtained for these ECG signals and described in Section VI, for which we considered the fol-
their average values are shown in the following plots. lowing values for the feature space size f ∈ {50, 75, 100,
150, 200} (see Figs. 4, 5 and 7).
From Fig. 4, we see that SURF achieves the highest CE,
up to 90-fold for the considered PhysioNet signals, whereas
time domain processing allows for maximum efficiencies of
60-fold. As expected, increasing f entails a smaller RMSE
at the cost of a smaller CE. However, we see that when f
increases beyond 100 the RMSE performance gets affected
and starts decreasing. In these cases, SURF behaves similarly
to its time domain counterpart. This is because dictionary
construction in feature space allows for more robustness and
generalization capabilities than working in the time domain,
which may lead to overfitting codewords to specific signal
examples. This means that an optimal value of f can be
identified, which in our case is around f ' 100. Fig. 5
shows the total energy consumption (adding up processing
FIGURE 4. SURF – RMSE vs compression efficiency. and transmission) and we see that savings of almost two
orders of magnitude with respect to the case where the signal
is sent uncompressed (‘‘no-compression’’) are possible. This
is further discussed below.

FIGURE 5. SURF – RMSE vs total energy consumption.

As a first result, in Figs. 4 and 5 we show the compression


efficiency and the total energy consumption of SURF, both
FIGURE 6. RMSE vs CE – comparison of compression algorithms.
plotted versus the RMSE by varying εf as a free parameter.
In these plots we quantify the impact of the feature space
size f , corresponding to the number of DCT coefficients that In Fig. 6, we plot RMSE vs CE for SURF, compar-
are retained and stored in the feature vector y (n). Two SURF ing it against the TASOM-based algorithm of Section V
variants were implemented: and selected lossy compression techniques from the litera-
ture based on DCT [20], DWT [14], linear approximation
1) SURF-TD LTC [10], GSVQ [9], and schemes based on compressive
It is a time domain implementation of SURF dictionaries, i.e., sensing: BSBL [40] and SOMP [15]. At very low compres-
the feature vectors y (n) that are inputted into the dictionary sion efficiencies LTC outperforms DCT in terms of RMSE.

19528 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

DWT does a much better job than DCT in terms of RMSE,


especially at relatively small compression efficiencies, say,
smaller than 30, but it is unable to reach higher CEs, for
which LTC, TASOM and SURF are to be preferred. As for
the CS-based algorithms, neither SOMP nor BSBL pro-
vides satisfactory performance. The compression efficiency
of SOMP is rather small and the corresponding RMSE tends
to diverge for, e.g., CE larger than 5. As we shall discuss
shortly, although the BSBL compressor has the lowest energy
consumption, its overall energy expenditure is high as this
approach is less effective in terms of CE than other schemes
such as dictionary based (GSVQ) and neural map based
algorithms (TASOM and SURF). For GSVQ, we move along
the RMSE vs CE curves by changing the threshold governing
the number of bits that are encoded into the residual stream
FIGURE 8. SURF – RMSE vs compression efficiency.
(residual encoding is the operation that affects the perfor-
mance of GSVQ the most). Although not shown in the plot,
with GSVQ one may think of not sending the residual encod-
ing stream, so as to reach higher compression efficiencies.
However, due to the use of a precomputed and fixed dictio-
nary, this leads to a very high RMSE and is not a viable option.
SURF offers very good performance both in terms of RMSE
and CE, thus clearly outperforming the other algorithms. We
also emphasize the substantial gap in both RMSE and CE
that SURF achieves with respect to TASOM. The reasons for
this are: i) SURF dictionaries more effectively represent new
patterns and artifacts, ii) SURF works in the signal feature
space, where the size of codewords is f < m elements and
iii) dictionary updates are selectively implemented only for
those codewords that no longer meet the error tolerance εf .

FIGURE 9. SURF – Dictionary size vs compression efficiency.

none of the current codewords will match the new segment.


Also, as a new pattern is detected and the corresponding
feature vector is added to dictionary D2, this codeword will be
put into use (moving it to D1 and D3) with small probability,
as further ‘‘nearly exact’’ (εf → 0) matches are difficult
to occur for it. On the other hand, as εf increases, more
codewords will be added to the dictionary and they will be
used to encode multiple patterns each. However, as εf keeps
increasing beyond a certain threshold, because of the relaxed
accuracy requirement, a smaller codeword set will suffice to
represent the input signal space and the dictionary size will
FIGURE 7. SURF dictionary size vs CE. The tradeoff curves are obtained by
varying the error tolerance εf . correspondingly decrease. Again, f = 100 was found to
be a good choice, requiring less than 10 kbytes of memory,
For SURF, we also look at the size of dictionaries as a func- while resulting in very high CEs. For this reason, f = 100 is
tion of CE. In Fig. 7, we plot the total size of dictionaries D1, used for SURF in the following graphs. Two further graphs,
D2 and D3 and we see that it never exceeds 17kbytes. For this Figs. 8 and 9, quantify the impact of the maximum number
reason, the approach is deemed amenable to implementation of codewords in the dictionaries, Lmax , which corresponds to
on wearables. We also note that the size at first (small CE) the number of neurons in the adopted GNG neural networks.
increases up to a maximum and then starts decreasing for The error tolerance εf is varied as an independent parameter
higher CE. This is because when the error tolerance εf is very in both plots. As expected, Fig. 8 shows that a higher Lmax
small, the compressor often sends the full feature vector as leads to a higher accuracy, i.e., the current dictionary more

VOLUME 5, 2017 19529


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

accurately represents the input signal. Nevertheless, we see


that the accuracy increase is not very large, whereas there
is a substantial difference in the overall memory space that
is taken by the dictionaries as Lmax increases. A number of
neurons per dictionary between 15 and 20 appears to be a
good choice, as further increasing Lmax from 20 to 30 only
leads to minor fidelity improvements (RMSE). Once Lmax is
fixed, the error tolerance can be used to tune the RMSE as
desired.

FIGURE 11. RMSE vs total energy consumption.

required for compression to that for the subsequent transmis-


sion of the compressed bitstream, as detailed in Section VII.
This total energy is then normalized with respect to the
number of bits in the original ECG signal. From this plot,
we see that the total energy consumption is dominated by the
transmission energy, which depends on the compression effi-
FIGURE 10. RMSE vs compression energy obtained varying CE as the ciency. In this respect, the best algorithms are LTC and SURF
independent parameter.
and the algorithm of choice depends on the target RMSE that,
In Fig. 10, we show the RMSE and the energy drained in turn, directly descends from the selected CE. As discussed
for compression (processing) at the transmitter, expressed in above, an adaptive algorithm may be a good option, where
Joule per bit of the original ECG sequence. These tradeoff for each value of CE the scheme that provides the smallest
curves are obtained by varying the compression efficiency of RMSE is used. In Fig. 11, the energy consumption when
each algorithm from the minimum to the maximum achiev- no compression is applied is also shown for comparison.
able (which is scheme specific, see Fig. 6). The RMSE We see that signal compression, and the subsequent reduction
increases with an increasing compression efficiency, whereas in size of the data to be transmitted, allows a considerable
the compression energy depends weakly on CE. As expected, decrease in the total energy consumption. When the energy
BSBL has the smallest energy consumption. This good per- reduction is one order of magnitude, LTC and SURF both
formance is due to its lightweight compression algorithm, provide RMSEs smaller than 2%. The performance of SURF
which just multiplies the input signal by sparse binary matri- is particularly striking as it allows saving up to two order of
ces. LTC is the second best, whereas SOMP, GSVQ and magnitude in terms of energy consumption while still keeping
TASOM perform very close to one another and have the the RMSE around 6%. Also, note that SURF’s actual RMSE
worst energy consumption for compression, although SURF is automatically adjusted at runtime, by allowing slightly less
consumes a slightly smaller amount of energy than them. accurate representations, and thus much higher compression,
We underline that the energy consumption of SURF, TASOM, when no critical patterns occur.
SOMP and GSVQ is dominated by the preprocessing chain A breakdown of the complexity and energy consump-
of Fig. 3 (as we quantify below through Tables 1 and 2). tion figures for the considered algorithms is provided in
In Fig. 10, we also show the performance of SURF by remov- Tables 1 and 2. These metrics were obtained for the Phy-
ing the contribution of this pre-processing chain (filtering, sioNet ECG signals and represent the average complexity
peak detection and segment extraction): the corresponding (expressed in terms of number of operations) and energy
curve is referred to in the plot as ‘‘SURF NoPre’’. Note that consumption (Joules) for the compression and transmission
filtering is always performed to remove measurement arti- of an ECG segment. From Table 2, we see that SURF has
facts and peak detection is also very often utilized to extract a lower energy consumption with respect to TASOM for
relevant signal features. Given this, the energy consumption compression, transmission and in total. We also see that the
associated with the required pre-processing functions may peak detection block of TASOM and SURF accounts for 91%
not be a problem, especially if these functions are to be of the per-segment energy drainage. The same fact applies to
executed anyway. the other segment-based approaches (SOMP, GSVQ).
In Fig. 11, we show the RMSE as a function of the total The plots in Fig. 12 show original and reconstructed ECG
energy consumption, which is obtained adding up the energy temporal signals using LTC, TASOM and SURF, in the

19530 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

TABLE 1. Energy breakdown [no. operations] and consumption [µJ] for TASOM. RMSE = 3.6% and CE = 20.92.

TABLE 2. Energy breakdown [no. operations] and consumption [µJ] for SURF. RMSE = 3.6% and CE = 76.6.

FIGURE 12. Original and reconstructed signal in the presence of artifacts for LTC, TASOM and SURF. (a) LTC: CE = 22 and RMSE = 2%. (b) LTC: CE = 29 and
RMSE = 3%. (c) TASOM: CE = 34 and RMSE = 2%. (d) TASOM: CE = 49 and RMSE = 3%. (e) SURF: CE = 43 and RMSE = 2%. (f) SURF: CE = 53 and
RMSE = 3%.

presence of anomalous ECG segments (toward the middle unable to effectively represent the new (anomalous) pat-
of the plots). Remarkably, although all algorithms have the terns. SURF provides the best results as it preserves the
same average RMSE, LTC heavily affects the ECG mor- signal morphology, while achieving the highest CE, i.e.,
phology. TASOM does a better job, but its dictionary is up to CE = 53.

VOLUME 5, 2017 19531


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

B. WEARABLE ECG SIGNALS


We now present some results for ECG signals that we
acquired from a Zephyr BioHarness 3 wearable device [36].
To this end, we collected ECG traces from eleven healthy
individuals, which were continuously recorded during work-
ing hours, i.e., from 8am to 6pm. These were sampled at a
rate of 25 samples/s with each sample taking 12 bits.

FIGURE 14. RMSE as a function of time. CE = 25 for all schemes,


RMSE(LTC) = 5%, RMSE(TASOM) = 1.58% and RMSE(SURF) = 1.29%.

FIGURE 13. RMSE vs CE for Bioharness ECG signals.

The RMSE vs CE tradeoff for these signals is shown in


Fig. 13 for the best performing compression algorithms. The
results are similar to those of Fig. 6 with the main difference
that in this case the ECG signals are prone to artifacts. Due
to the artifacts and to the highly non-stationary behavior of
the new traces, the resulting RMSE is higher and the CE
performance is degraded for all schemes. DWT and LTC are
good choices at low up to intermediate compression effi- FIGURE 15. CE as a function of time. RMSE = 2% for all schemes,
ciencies, whereas SURF shows its superior performance at CE(LTC) = 14, CE(TASOM) = 42 and CE(SURF) = 50.
very high CEs, and especially its ability to gracefully adapt
to artifact-prone and non-stationary signals. Although its
maximum compression efficiency is affected, being lowered thanks to its adaptive mechanism by which feature vectors are
from 96 to 50, the RMSE remains within 6% and is much transmitted in place of dictionary indices. From Fig. 15, we
smaller than that achieved by all other schemes. SURF, with see that at times SURF’s compression efficiency is reduced.
artifact prone ECG signals, allows for typical compression This is either due to dictionary updates, which, with the
efficiencies in the range CE ∈ [40, 50], which means that considered SURF parameters, occur every λ = 200 time steps
the data rate of 3 kbit/s that would be required to send the (ECG segments) or to artifacts, which in this figure are seen
uncompressed ECG trace is lowered to 60 bit/s and 75 bit/s for again around ECG segments 1100 or 1500 (the same portion
CE = 40 and CE = 50, respectively. The energy consump- of ECG signal is used for the last two figures).
tion figures, although rescaled, have a very similar behavior In Fig. 16, we analyze the training behavior (RMSE versus
as those obtained with the PhysioNet MIT-BIH traces and time) for dictionary D3, which is continuously updated at the
shown in Figs. 10 and 11. They are thus not shown in the transmitter. Note that the current dictionary D1 is replaced
interest of space. with D3 when the distance among their codewords exceeds
In Figs. 14 and 15, we respectively show how the RMSE a given threshold. So, the evolution of D3, although at a
and CE evolve with time for LTC, TASOM and SURF, where coarser time scale, also represents that of D1. To obtain this
these metrics are shown for each new ECG segment. For plot, we ran the following experiment: we picked a first
the RMSE (Fig. 14), we see that both TASOM and SURF subject and we trained D3 with their ECG signal for the first
provide excellent approximation accuracy. However, when 55 minutes, at which point, the input signal was swapped
artifacts occur, at around times 1100 and 2500 (at the end of with that of a second subject. Two curves are shown in the
the plot) we see that TASOM struggles to keep the RMSE low. figure, using Lmax = 10 and Lmax = 30 and keeping all
SURF instead still provides satisfactory RMSE performance the remaining parameters as specified at the beginning of

19532 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

FIGURE 16. Average normalized RMSE versus training time for the FIGURE 17. Efficiency regions for SURF compression with different radios
updated dictionary D3: the dictionary is trained on a first subject for the and MCUs. Radios: CC2420 (250 kbit/s, power 0 dBm), CC2541 (2 Mbit/s,
first 55 minutes. After that, the ECG trace of a different subject is used as at maximum power 0 dBm), CC2541LP (low rate 500 kbit/s and power
the input time series. The dictionary at first produces high errors, but then −20 dBm). MCUs: Cortex-M4 versions 40LP, 90LP, 180ULL.
quickly adapts and converges to the steady-state RMSE for the second
subject.
indicate the number of clock cycles that are needed to run
the section. At time zero, the dictionary is initialized using the compression algorithm, which depends on the number
random ECG segments from the first subject, whereas its bits B in the original signal and on the compression error εf
subsequent training follows the GNG-based algorithms of (which dictates a certain compression factor). Compression
Section VI. A few observations are in order. As expected, is convenient when the following inequality holds:
when the training starts the error is higher (the RMSE is
higher than 4% for the first subject for Lmax = 10) but it Ecc Ncc (B, εf ) + Etx
0
B̂ < Etx
0
B, (21)
decreases with time and converges to the steady-state error which means that the energy for compression added to that
within 20 minutes. After 55 minutes, the signal is swapped for transmission of the compressed sequence (left hand side)
with that of another subject and this may for example occur must be smaller than the energy that would be required to
when the wireless ECG monitor is handled over to another send the uncompressed signal (right hand side). Solving this
patient. At this point, we observe a peak in the RMSE, 0 , we find the minimum E 0 that allows
inequality for Etx tx
which suddenly increases from 2.8% to 4.1%. However, D3 compression to be energy efficient, that is:
is retrained and in about 20 more minutes converges to the Ecc Ncc (B, εf )
0,min
new steady-state RMSE for the second subject. This shows Etx = . (22)
that SURF gracefully adapts to new wearers, progressively B − B̂
0,min
tuning its dictionaries to their ECG patterns. From this graph, The lines plotted in Fig. 17 correspond to Etx computed
we also see that the RMSE depends on the maximum number for several values of εf , which in turn imply different com-
of codewords in the dictionary, Lmax : an increasing Lmax leads pression efficiencies (CE in the figure). The region in this
to higher accuracies. As a last remark, we recall that the plot where compression is advantageous (energy efficient
0,min
RMSE in Fig. 16 corresponds to the representation error of region) is that for which E 0tx > Etx , which corresponds
SURF dictionaries, but the actual RMSE of the full SURF to the region above the curves. As seen from the plot, the
algorithm is always within the preset error tolerance. In fact, energy efficient regions weakly depend on the compression
according to the algorithms of Section VI, when the RMSE parameters as the number of clock cycles is almost constant
is higher than a preset threshold the dictionary is not used, for different settings (changing εf ), B is also constant and
but the feature vector associated with the current segment is it depends on the sampling rate of the ECG monitor, and
sent as the compressed representation. In other words, SURF the only variable that changes is B̂. Most importantly, in
automatically switches between dictionary-based compres- the graph we have also reported the energy consumption
sion and feature-based (e.g., DCT) compression, meeting the figures (Etx0 and E ) of several radios and MCUs (each
cc
preset representation accuracy at all times. radio/MCU pair is indicated by a filled dot in the figure).
Fig. 17 shows the energy consumption associated with All of them fall within the efficient region and, as expected,
radio transmission and processing, identifying the region compression provides the highest gain when the radio is
where compression provides energy savings and it is there- energy hungry (CC2420) and the processor is energy efficient
fore recommended. We obtained this plot as follows. Let (Cortex M4-40LP). Before applying the SURF algorithm
B and B̂ respectively be the number of bits to send over to any architecture, one should make sure that the selected
the channel when no compression is applied and those to combination of radio and MCU operates within the energy
be sent when the signal is compressed. With Ncc (B, εf ) we efficient region of Fig. 17.

VOLUME 5, 2017 19533


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

IX. CONCLUSIONS [15] L. F. Polania, R. E. Carrillo, M. Blanco-Velasco, and K. E. Barner,


In this paper, we have presented SURF, an original subject- ‘‘Compressed sensing based method for ECG compression,’’ in Proc.
IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Prague,
specific and time-adaptive lossy compression algorithm for Czech Republic, May 2011, pp. 761–764.
wearable fitness monitors. This algorithm is based upon dic- [16] G. D. Poian, R. Bernardini, and R. Rinaldo, ‘‘Gaussian dictionary for
tionaries that are learned and maintained at runtime through compressive sensing of the ECG signal,’’ in Proc. IEEE Workshop Bio-
metric Meas. Syst. Secur. Med. Appl. (BIOMS), Rome, Italy, Oct. 2014,
the use of neural network maps. Our design utilizes unsu- pp. 80–85.
pervised learning to accomplish the following objectives: [17] M. Saeed et al., ‘‘Multiparameter intelligent monitoring in intensive care II
i) dictionaries gracefully and effectively adapt to new subjects (MIMIC-II): A public-access intensive care unit database,’’ Critical Care
Med., vol. 39, no. 5, pp. 952–960, 2011.
or their new activities, ii) the size of these dictionaries is kept
[18] J. Cox, H. Fozzard, F. M. Nolle, and G. Oliver, ‘‘AZTEC: A preprocessing
bounded (i.e., within 20 kbytes), making them amenable to system for real-time ECG rhythm analysis,’’ IEEE Trans. Biomed. Eng.,
implementation in wireless monitors, iii) high compression vol. BME-37, no. 9, pp. 128–129, Apr. 1968.
efficiencies are reached, allowing for reductions in the signal [19] J. P. Abenstein and W. J. Tompkins, ‘‘A new data-reduction algorithm for
real-time ECG analysis,’’ IEEE Trans. Biomed. Eng., vol. BME-29, no. 1,
size from 50- to 96-fold, depending on the frequency of pp. 43–48, Jan. 1982.
artifacts in the sampled signal, iv) the original biometric time [20] H. Lee and K. M. Buckley, ‘‘ECG data compression using cut and align
series are reconstructed at the receiver with high accuracy, beats approach and 2-D transforms,’’ IEEE Trans. Biomed. Eng., vol. 46,
no. 5, pp. 556–564, May 1999.
i.e., within a peak-to-peak RMSE of 7% and often smaller
[21] J. L. Cardenas-Barrera and J. V. Lorenzo-Ginori, ‘‘Mean-shape vector
than 3% and v) compression allows saving energy at the quantizer for ECG signal compression,’’ IEEE Trans. Biomed. Eng.,
transmitter, lowering the total energy expenditure of almost vol. 46, no. 1, pp. 62–70, Jan. 1999.
two orders of magnitude. SURF outperforms the compression [22] S.-G. Miaou and J.-H. Larn, ‘‘Adaptive vector quantisation for elec-
trocardiogram signal compression using overlapped and linearly shifted
approaches that were proposed thus far. Although in this codevectors,’’ Med. Biolog. Eng. Comput., vol. 38, no. 5, pp. 547–552,
paper SURF has been designed and tested with ECG signals, 2000.
it can be applied to other quasi-periodic signals as long as a [23] A. Chatterjee, A. Nait-Ali, and P. Siarry, ‘‘An input-delay neural-network-
based approach for piecewise ECG signal compression,’’ IEEE Trans.
reliable segment extraction technique is provided. Biomed. Eng., vol. 52, no. 5, pp. 945–947, May 2005.
[24] D. Del Testa and M. Rossi, ‘‘Lightweight lossy compression of biometric
REFERENCES patterns via denoising autoencoders,’’ IEEE Signal Process. Lett., vol. 22,
no. 12, pp. 2304–2308, Dec. 2015.
[1] P. J. Soh, G. A. E. Vandenbosch, M. Mercuri, and D. M. M.-P. Schreurs,
[25] Y. Linde, A. Buzo, and R. M. Gray, ‘‘An algorithm for vector quan-
‘‘Wearable wireless health monitoring: Current developments, challenges,
tizer design,’’ IEEE Trans. Commun., vol. COM-28, no. 1, pp. 84–95,
and future trends,’’ IEEE Microw. Mag., vol. 16, no. 4, pp. 55–70,
Jan. 1980.
May 2015.
[26] M. Hooshmand, D. Zordan, D. Del Testa, E. Grisan, and M. Rossi,
[2] M. Srivastava, T. Abdelzaher, and B. Szymanski, ‘‘Human-centric sens-
‘‘Boosting the battery life of wearables for health monitoring through the
ing,’’ Phylos. Trans. Roy. Soc., vol. 370, no. 1958, pp. 176–197, Jan. 2012.
compression of biosignals,’’ IEEE Internet Things J., to be published.
[3] S. M. R. Islam, D. Kwak, M. H. Kabir, M. Hossain, and
[27] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression,
K.-S. Kwak, ‘‘The Internet of Things for health care: A comprehensive
vol. 159. New York, NY, USA: Springer, 1992.
survey,’’ IEEE Access, vol. 3, pp. 678–708, Jun. 2015.
[4] V. Vadori, E. Grisan, and M. Rossi, ‘‘Biomedical signal compression [28] K.-S. Wu and J.-C. Lin, ‘‘Fast VQ encoding by an efficient kick-out
with time- and subject-adaptive dictionary for wearable devices,’’ in Proc. condition,’’ IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 1,
IEEE Int. Workshop Mach. Learn. Signal Process. (MLSP), Salerno, Italy, pp. 59–62, Feb. 2000.
Sep. 2016, pp. 1–6. [29] M. Elgendi, ‘‘Fast QRS detection with an optimized knowledge-based
[5] T. Kohonen, ‘‘The self-organizing map,’’ Proc. IEEE, vol. 78, no. 9, method: Evaluation on 11 standard ECG databases,’’ PLoS ONE, vol. 8,
pp. 1464–1480, Sep. 1990. no. 9, pp. 1–18, Sep. 2013.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. [30] T. M. Martinetz, S. G. Berkovich, and K. J. Schulten, ‘‘Neural-gas network
Upper Saddle River, NJ, USA: Prentice-Hall, 1998. for vector quantization and its application to time-series prediction,’’ IEEE
[7] H. Shah-Hosseini and R. Safabakhsh, ‘‘TASOM: The time adaptive self- Trans. Neural Netw., vol. 4, no. 4, pp. 558–569, Jul. 1993.
organizing map,’’ in Proc. Int. Conf. Inf. Technol., Coding Comput., [31] B. Fritzke, ‘‘Growing cell structures: A self-organizing network for
Las Vegas, NV, USA, Mar. 2000, pp. 422–427. unsupervised and supervised learning,’’ Neural Netw., vol. 7, no. 9,
[8] B. Fritzke, A Growing Neural Gas Network Learns Topologies. Cambridge, pp. 1441–1460, 1994.
MA, USA: MIT Press, 1995. [32] C. Karakus, A. C. Gurbuz, and B. Tavli, ‘‘Analysis of energy efficiency
[9] C.-C. Sun and S.-C. Tai, ‘‘Beat-based ECG compression using gain- of compressive sensing in wireless sensor networks,’’ IEEE Sensors J.,
shape vector quantization,’’ IEEE Trans. Biomed. Eng., vol. 52, no. 11, vol. 13, no. 5, pp. 1999–2008, May 2013.
pp. 1882–1888, Nov. 2005. [33] M. Hooshmand, M. Rossi, D. Zordan, and M. Zorzi, ‘‘Covariogram-based
[10] T. Schoellhammer, B. Greenstein, E. Osterweil, M. Wimbrow, and compressive sensing for environmental wireless sensor networks,’’ IEEE
D. Estrin, ‘‘Lightweight temporal compression of microclimate datasets,’’ Sensors J., vol. 16, no. 6, pp. 1716–1729, Mar. 2016.
in Proc. IEEE Int. Conf. Local Comput. Netw. (LCN), Tampa, FL, USA, [34] ARM The Architecture for the Digital World. (2015). ARM Cortex-
Nov. 2004, pp. 516–524. M4 Processor. [Online]. Available: http://www.arm.com/products/
[11] R. Shankara and S. M. Ivaturi, ‘‘ECG data compression using Fourier processors/cortex-m/
descriptors,’’ IEEE Trans. Biomed. Eng., vol. BME-33, no. 4, pp. 428–434, [35] ARM. (2010). Cortex-M4 Technical Reference Manual. [Online].
Apr. 1986. Available: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439b/
[12] V. A. Allen and J. Belina, ‘‘ECG data compression using the discrete cosine DDI0439B_cortex_m4_r0p0_trm.pdf
transform (DCT),’’ in Proc. Comput. Cardiol., Oct. 1992, pp. 687–690. [36] Zephyr Technology Corporation. (2017). Bioharness 3-Wireless Profes-
[13] D. Zordan, B. Martinez, I. Vilajosana, and M. Rossi, ‘‘On the performance sional Heart Rate Monitor and Physiological Monitor. [Online]. Available:
of lossy compression schemes for energy constrained sensor networking,’’ http://www.zephyranywhere.com/
ACM Trans. Sensor Netw., vol. 11, no. 1, pp. 15-1–15-34, Nov. 2014. [37] Texas Instruments. (2015). CC2451: 2:4 GHz Low Energy and Proprietary
[14] B. A. Rajoub, ‘‘An efficient coding algorithm for the compression of ECG System-on-Chip. [Online]. Available: http://www.ti.com/product/cc2541
signals using the wavelet transform,’’ IEEE Trans. Biomed. Eng., vol. 49, [38] Specification of the Bluetooth System v4.2. (Dec. 2014). Bluetooth Core
no. 4, pp. 355–362, Apr. 2002. Specification Standard. [Online]. Available: https://www.bluetooth.com

19534 VOLUME 5, 2017


M. Hooshmand et al.: SURF: Subject-Adaptive Unsupervised ECG Signal Compression

[39] Y. Zigel, A. Cohen, and A. Katz, ‘‘ECG signal compression using anal- TOMMASO MELODIA (S’02–M’07–SM’16)
ysis by synthesis coding,’’ IEEE Trans. Biomed. Eng., vol. 47, no. 10, received the Ph.D. degree in electrical and com-
pp. 1308–1316, Oct. 2000. puter engineering from the Georgia Institute of
[40] Z. Zhang, T.-P. Jung, S. Makeig, and B. D. Rao, ‘‘Compressed sensing Technology, Atlanta, GA, USA, in 2007. He is cur-
for energy-efficient wireless telemonitoring of noninvasive fetal ECG via rently an Associate Professor with the Department
block sparse Bayesian learning,’’ IEEE Trans. Biomed. Eng., vol. 60, no. 2, of Electrical and Computer Engineering, North-
pp. 300–309, Feb. 2013. eastern University, Boston, MA, USA. He is also
serving as the lead PI on multiple grants from U.S.
federal agencies including the National Science
Foundation, the Air Force Research Laboratory,
the Office of Naval Research, and the Army Research Laboratory. He is
the Director of Research for the PAWR Project Office, a public-private
partnership that is developing four city-scale platforms for advanced wireless
research in the U.S. His research focuses on modeling, optimization, and
experimental evaluation of wireless networked systems, with applications
MOHSEN HOOSHMAND received the M.Sc. to 5G Networks and Internet of Things, software-defined networking,
degree in computer engineering from the Isfa- and body area networks. He is the Technical Program Committee Chair
han University of Technology, in 2011, and the for IEEE INFOCOM 2018. He is a recipient of the National Science
Ph.D. degree from the University of Padova, Foundation CAREER award and of several other awards. He is an Asso-
in 2017. He is currently a Postdoctoral Fellow with ciate Editor for the IEEE TRANSACTIONS on WIRELESS COMMUNICATIONS, the
the Biomedical and Clinical Informatics Labora- IEEE TRANSACTIONS on MOBILE COMPUTING, and the IEEE TRANSACTIONS on
tory, Department of Computational Medicine and BIOLOGICAL, MOLECULAR, and MULTI-SCALE COMPUTER NETWORKS, and Smart
Bioinformatics and a member with the Michigan Health.
Center for Integrative Research in Clinical Care,
University of Michigan, USA. His research inter- MICHELE ROSSI (SM’–) is currently an Asso-
ests include signal and medical image processing, machine learning and ciate Professor with the Department of Infor-
Internet of Things devices. mation Engineering, University of Padova, Italy.
In the last few years, he has been actively
involved in EU projects on IoT technology and
has collaborated with SMEs, such as Worldsensing
(Barcelona, ES) in the design of optimized IoT
solutions for smart cities and, with large compa-
nies, such as Samsung and Intel. He is author of
over 100 scientific papers published in interna-
DAVIDE ZORDAN received the M.Sc. degree in tional conferences, book chapters and journals and has been the recipient
telecommunications engineering and the Ph.D. of four Best Paper Awards from the IEEE. His current research interests are
degree from the University of Padova, Italy, in centered around wireless sensor networks, Internet of Things (IoTs), green
2010 and 2014, respectively. He is currently a 5G mobile networks and wearable computing. In 2014, he was the recipient
Postodoctoral Researcher with the Department of of a Samsung Gro award with a project entitled Boosting Efficiency in
Information Engineering, University of Padova. Biometric Signal Processing for Smart Wearable Devices. Since 2016, he has
His research interests include stochastic mod- been collaborating with Intel on the design of IoT protocols exploiting cog-
eling and optimization, protocol design, and nition and machine learning, as part of the Intel Strategic Research Alliance
performance evaluation for wireless networks, Research and Development program. His research is also supported by the
in-network processing techniques including European Commission on green 5G mobile networks. He currently serves on
compressive sensing, energy efficient protocols and energy harvesting tech- the Editorial Board of the IEEE TRANSACTIONS ON MOBILE COMPUTING.
niques for WSNs, and wearable IoT devices.

VOLUME 5, 2017 19535

You might also like