Banerjee 2017
Phonocardiogram
Rohan Banerjee1, Anirban Dutta Choudhury1, Parijat Deshpande1, Sakyajit Bhattacharya1, Arpan Pal1 and Dr. K M Mandana2

1 Rohan Banerjee, Anirban Dutta Choudhury, Parijat Deshpande, Sakyajit Bhattacharya and Arpan Pal are with Tata Consultancy Services. {rohan.banerjee, anirban.duttachoudhury, parijat.deshpande, sakyajit.bhattacharya, arpan.pal}@tcs.com
2 Dr. K M Mandana is with Fortis Hospital, Kolkata, India. [email protected]

Abstract: Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we analyse a wide list of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal subjects and cardiac patients. The large, open access database made available in the Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected from an Indian hospital using our in-house smartphone based digital stethoscope, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches when applied on the same dataset.

I. INTRODUCTION

This paper presents a new feature extraction algorithm for robust classification of normal/abnormal heart sounds. Heart sound signals, commonly known as Phonocardiogram (PCG), are known to carry meaningful information regarding the heart condition of a person and are considered an important marker for non-invasive detection of heart diseases. Cardiovascular disease continues to be the major cause of death in both developed and developing nations. Thus its early detection is very important for preventive healthcare. Several research works [1], [2] already exist for identifying heart diseases by analysing PCG signals recorded with a high quality digital stethoscope. Most of them fall into two major categories: approaches with segmentation and approaches without segmentation. In a segmentation based approach ([1], [2], [3]), the PCG signal is split into different cardiac phases (S1, systole, S2 and diastole), followed by extraction of relevant features from those segments and classification. In the second approach [4], [5], the entire unsegmented signal is analysed for feature extraction and classification. Although more accurate, the first approach needs an accurate segmentation of the fundamental cardiac phases, which is a challenging task on noisy PCG.

However, a PCG signal is inherently noisy, and its quality is affected by the frictional noise generated by movement of the stethoscope on the human body during data collection, as well as by background noise in the audible range. Moreover, there is significant variation in the morphology of PCGs collected using different commercial digital stethoscopes available in the market. The intensity of the S1 and S2 heart sounds also varies depending upon the position of the stethoscope on the chest. Thus the performance of a classifier validated on one particular dataset often degrades when tested on another corpus that differs in patient demography or in the device used for data collection. In this paper, we analysed a wide list of PCG features to identify a feature set that is independent of device quality as well as patient demography. Features derived from both the segmented PCG and the entire unsegmented signal were combined to construct the proposed set. For an exhaustive and unbiased analysis, two completely different datasets, in terms of device quality, patient demography and ethnicity as well as background noise, were used for creation of training models and for performance evaluation. Finally, the proposed approach was compared with three recent prior art approaches ([6], [7] and [8]) that showed promising accuracy on a similar problem in the Physionet Challenge 2016 [9].

II. METHODOLOGY

This section describes the algorithm we devised for feature extraction. The algorithm provides discriminative features on which a classifier can be trained using the training data. The trained classifier was then applied to the test data for prediction. We used the Physionet Challenge 2016 dataset [10], [11] for training and validation, and data from another urban hospital for testing. These datasets are detailed in the subsequent section. In this section, we mainly concentrate on the feature extraction procedure, which is the main contribution of the paper.

One persistent problem with PCG is that the segregation of S1 and S2 on a noisy signal often fails. As a result, features derived from these faulty segments become unreliable. Thus, in this study we considered features derived from both the segmented PCG and the entire signal without segmentation. A wide list of possible features available in prior art, along with some new features, was explored. Since the intensity of S1 and S2 depends on the quality of the stethoscope as well as the position of the device, we did not consider any feature related to the amplitude of the time signal. Features related to patient demography were also discarded. The optimum feature set was selected based on statistical feature selection tools like Maximal Information Coefficient (MIC), minimum
[Figure 1: six panels, (a) Training Set a, (b) Training Set c, (c) Training Set d, (d) Training Set e, (e) Training Set f, (f) Test data. Each panel shows the frequency response (amplitude vs. frequency in Hz) above the time signal (amplitude vs. time in seconds).]
Fig. 1: Sample PCGs from Experimental Datasets along with Frequency Response
Redundancy Maximum Relevance (mRMR) as well as internal evaluation on the training set. A detailed description of our selected 60 features is provided subsequently.

A. Time domain features (f1 − f20)

These features were extracted from the segmented PCG. A total of 20 features related to the individual cardiac cycle as well as short term Heart Rate Variability (HRV), as detailed in [11], were derived from the internal distances between different phases inside a cardiac cycle. HRV is widely considered a primary marker for identifying heart diseases. Being independent of signal amplitude, these features can be expected to be device-agnostic.

B. Spectral features (f21 − f26)

Fig. 1 shows sample PCGs of cardiac patients along with their frequency response, randomly selected from all the training and test sets detailed in Section III-A. It can be observed that many of the time signals do not show distinct S1 and S2 regions. In spite of morphological differences in the time domain, the overall frequency response looks quite similar in all cases. Thus frequency features are expected to be more reliable for analysis. We proposed 3 new features, extracted from the spectrum of the signal broken into small rectangular windows with 50% overlap. The window length was chosen as 2 seconds so that at least one complete cardiac cycle fits in every window. The proposed features consisted of 1) spectral centroid, 2) spectral roll-off and 3) spectral flux across all the windows in a recording. The frequency centroid of the Power Spectral Density (PSD) of the entire signal without windowing was also used as a feature. Finally, the Area Under Curve (AUC) of the normalized power spectrum plot in two normalized frequency regions (0.7-0.8 and 0.9-1), used in [7], was incorporated in the feature list.

1) Statistical features (f27 − f33): Mean kurtosis of all time windows (each having a duration of 2 seconds) was considered as a novel feature. The first, third, sixth, eighth, ninth and tenth Linear Predictive Coefficients (LPC) of a tenth order linear predictor of the signal were also included in our feature set.

2) Entropy based features (f34 − f35): Natural and Tsallis entropy of PCG signals were used as features [7]. They are defined as

$$H(x) = -\sum_i \mathrm{prob}(x_i)\,\ln(\mathrm{prob}(x_i)) \qquad (1)$$

$$S_q(x) = \frac{k}{q-1}\sum_i \left(1 - \mathrm{prob}(x_i)^q\right) \qquad (2)$$

where $\mathrm{prob}(x_i)$ is the probability of the $i$th PCG sample, $x_i$. $k$ and $q$ are real parameters equal to 1 and 2 respectively.

3) Wavelet features (f36 − f37): The input PCG was decomposed up to the third and fifth levels of detail coefficients ($d_3$, $d_5$) using Daubechies 4 (db4) as the mother wavelet. The following parameters were derived as features [7]

$$\lambda(d_3) = \log_2(\mathrm{var}(d_3)) \qquad (3)$$

$$H_q(d_5) = \frac{1}{q-1}\ln\left(\sum_i \mathrm{prob}(d_{5i})^q\right) \qquad (4)$$

$H_q$ is known as Rényi entropy, where $q = 2$.

4) MFCC features (f38 − f60): In Mel Frequency Cepstral Coefficients (MFCC), an audio signal is represented in terms of its short term power spectrum, by translating its frequency components to the Mel scale ($f_{mel}$) using eqn. 5, which approximates the human auditory system.

$$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (5)$$
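As a quick check of eqn. 5, the Hz to Mel mapping and its inverse can be written in a few lines. This is a minimal sketch and not the authors' code: the function names hz_to_mel and mel_to_hz are illustrative, and the inverse is simply eqn. 5 solved for f.

```python
import math


def hz_to_mel(f_hz):
    """Map a frequency in Hz to the Mel scale using eqn. 5."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(f_mel):
    """Inverse of eqn. 5 (illustrative helper, obtained by solving for f)."""
    return 700.0 * (10.0 ** (f_mel / 2595.0) - 1.0)
```

Under this mapping 1000 Hz lands very close to 1000 mel, and a frequency of 500 Hz corresponds to roughly 607 mel.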
Although MFCC is widely used in speech and speaker recognition systems, it was also used in prior art [7] for analysis of heart sounds. In this study, we proposed a newer set of features derived from MFCC. The raw signal was broken into windows of 25 ms duration, with a 10 ms overlap, using a Hamming window. These were analysed up to 500 Hz to extract 13 dimensional MFCC parameters from each window. The mean values across all the windows for all 13 coefficients were included to complete the feature set.

TABLE I: Distribution of Normal and Abnormal Recordings in Training Dataset

Dataset   Normal recordings   Abnormal recordings
set a     114                 274
set c     7                   18
set d     4                   13
set e     1374                95
set f     78                  31
Total     1577                431
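The windowed spectral features of Section II-B (2 second rectangular windows with 50% overlap) can be sketched with NumPy as follows. The paper does not give formulas for these features, so the definitions below are common textbook ones and should be read as assumptions: the roll-off point is taken at 85% of the cumulative spectral magnitude, flux is the squared difference between consecutive L2-normalized magnitude spectra, and each feature is averaged over the windows of a recording. The names frame_signal and spectral_features are hypothetical.

```python
import numpy as np


def frame_signal(x, fs, win_sec=2.0, overlap=0.5):
    """Cut x into rectangular windows of win_sec seconds with fractional overlap."""
    win = int(round(win_sec * fs))
    hop = max(1, int(round(win * (1.0 - overlap))))
    if len(x) < win:
        raise ValueError("signal shorter than one window")
    n_frames = 1 + (len(x) - win) // hop
    return np.stack([x[i * hop: i * hop + win] for i in range(n_frames)])


def spectral_features(x, fs, rolloff_fraction=0.85):
    """Mean spectral centroid, roll-off and flux over all windows (assumed definitions)."""
    frames = frame_signal(np.asarray(x, dtype=float), fs)
    mag = np.abs(np.fft.rfft(frames, axis=1))            # magnitude spectrum per window
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)  # bin frequencies in Hz
    eps = 1e-12                                           # guards against division by zero

    # Centroid: magnitude-weighted mean frequency of each window.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)

    # Roll-off: first frequency where cumulative magnitude reaches the chosen fraction.
    cum = np.cumsum(mag, axis=1)
    rolloff_bin = np.argmax(cum >= rolloff_fraction * cum[:, -1:], axis=1)
    rolloff = freqs[rolloff_bin]

    # Flux: squared change between consecutive L2-normalized spectra.
    unit = mag / (np.linalg.norm(mag, axis=1, keepdims=True) + eps)
    flux = np.sum(np.diff(unit, axis=0) ** 2, axis=1)

    return {
        "centroid": float(centroid.mean()),
        "rolloff": float(rolloff.mean()),
        "flux": float(flux.mean()) if flux.size else 0.0,
    }
```

For a stationary 50 Hz tone sampled at 1000 Hz, both the centroid and the roll-off sit at 50 Hz and the flux is essentially zero, which gives a simple sanity check before applying the function to PCG recordings.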
III. DATA ANALYSIS

In this section, we describe the experimental data, its pre-processing, and the classifier used. Our method is compared with some other recent existing techniques ([6], [7], [8]) and the results are reported in Section III-D.

A. Experimental dataset

The dataset provided in the Physionet 2016 challenge was used as the training dataset. It is a large and open access dataset [11], containing a total of 3153 annotated recordings of normal and abnormal heart sounds. The abnormal recordings were collected from patients having heart valve defects and Coronary Artery Disease (CAD). The entire dataset contains 6 sets (a, b, c, d, e, f), each recorded from hospitals at different geographical locations using different types of digital stethoscopes under different controlled or uncontrolled environments. All recordings were also marked as good/bad by expert annotators based on their signal quality. Since most of its recordings were too noisy, set b was discarded completely from our analysis, along with all noisy recordings present in the other sets. Further, all recordings shorter than 15 seconds were also omitted for being too short to contain any relevant HRV information. Table I depicts the distribution of normal and abnormal recordings present in all five training sets used in our study. The larger sets a, c and e were used for internal training and validated on d and f. The final training model was created on the entire training set to evaluate on the test dataset.

The test dataset was collected from an urban hospital in Kolkata, India using our in-house digital stethoscope [12]. We obtained the necessary approval from the hospital ethics committee before the data collection drive. A total of 41 PCG segments, including 24 normal and 17 abnormal recordings, were obtained from 25 consenting subjects. The duration of the recordings varied from 30 seconds to a minute. Normal heart sounds were collected from both non-cardiac patients and healthy adults, whereas the abnormal recordings were from CAD patients having varying percentages of heart blockage. Being collected using a non-medical grade digital stethoscope in an uncontrolled hospital environment, the test set is noisier than the training data. This was done purposefully to test the robustness of our methodology on a dataset containing both clean and noisy PCG.

B. Pre-processing

All recordings in the Physionet challenge dataset were sampled at 2000 Hz, whereas the test dataset was sampled at 8000 Hz. Critical information regarding heart sound typically lies within 500 Hz. This allowed us to down-sample all the recordings to 1000 Hz, which also significantly reduced the computation cost. Subsequently, the logistic regression-HSMM-based algorithm by Springer et al. [13] was used to segregate the S1, systole, S2 and diastole phases from each cardiac cycle.

C. Classification

The training set was highly unbalanced in terms of normal and abnormal recordings. Moreover, overall signal quality and noise level also varied across the dataset. Thus, creating a single Neural Network (NN) based training model containing equal representation of both class labels might not fully utilize the training data. Hence, we used an ensemble of multiple feed-forward NNs. A total of 5 NNs were used in our methodology, each containing a single hidden layer with 50 neurons. The output layer of each NN contained two neurons, representing the two output class labels. The log-sigmoid function was used to activate the neurons in both hidden and output layers. During the training phase, the back propagation algorithm was applied to learn the weights of the neurons. Abnormal recordings being the minority class in the training set, their entire population was split into five partitions of equal size. Subsequently, 5 different training scenarios were created by drawing 4 of the partitions at a time, covering all possible combinations. In each case, an equal number of normal recordings was also selected at random. Five individual NNs were trained on these overlapping training sets.

To classify a test case, it was passed through all 5 NNs to get 5 different classification labels ($y_i$, $i \in 1...5$), and the corresponding probability values ($pr_i$, $i \in 1...5$) of belonging to the predicted class label were computed. The prediction of an abnormal and a normal heart sound by each classifier was labelled as +1 and −1 respectively. The sign of the sum of the probability scores of all 5 classifiers, each multiplied by its respective classification label, was computed (as shown in eqn. 6) for the final prediction.

$$F = \mathrm{sign}\left(\sum_{i=1}^{5} y_i \, pr_i\right) \qquad (6)$$

D. Experimental results

Our experimental results were reported in terms of sensitivity (Se) and specificity (Sp) of identifying abnormal heart sounds and overall accuracy was measured as
Acc = (Se + Sp)/2. For internal validation, we trained our classifier by randomly selecting training indices from sets a, c and e. Sets d and f were used to study the performance of the algorithm. The process was repeated five times. Table II shows that the overall performance of the classifier is stable across varying training indices.

TABLE II: Performance on Validation dataset

           Se            Sp            Acc
Dataset    mean   std    mean   std    mean   std
set d      0.63   0.04   0.85   0.08   0.74   0.04
set f      0.79   0.02   0.75   0.05   0.77   0.04

Once the internal validation was done, we moved on to the test set. The new training models for this phase were regenerated from the entire set of training and validation data. We compared the performance of our methodology against three prior art approaches, which secured the top three positions in the Physionet Challenge 2016 [9].

A total of 124 time-frequency features, mostly from segmented PCG, were extracted by Potes et al. in [6]. A combination of AdaBoost and Convolutional Neural Network (CNN) was used for classification. Zabihi et al. [7] used 18 features, all extracted from unsegmented PCG, and an ensemble of NNs for classification. Kay et al. [8] proposed a fully connected neural network regularized with DropConnect for classification using features from segmented PCG. Fig. 2 shows a comparative study between our methodology and the prior art approaches. It can be observed that all the prior art approaches were biased towards normal heart sounds, resulting in a very high specificity and a much lower sensitivity, whereas our methodology yielded a balanced response in both sensitivity and specificity. As the segmentation algorithms often failed on noisy PCGs, prior art [7] outperformed prior art [6] and [8] on the test dataset, where many signals did not have prominent S1 and S2 regions throughout the recording. We also found that the statistical and entropy features proposed by [7] yielded a high specificity. In spite of a marginal reduction in specificity, a major improvement in overall sensitivity was achieved by combining these features with the time domain, spectral and MFCC features added by us in this study. Thus, our methodology yielded the maximum score in terms of overall accuracy.

[Figure 2: bar chart of Se, Sp and Acc (vertical axis from 0.3 to 1) for Potes et al., Zabihi et al., Kay et al. and Proposed.]
Fig. 2: Comparative study between Proposed Methodology and SoA techniques on Test Dataset

IV. CONCLUSION

In this paper we presented a new feature extraction technique, on which we built a robust methodology for classifying normal and abnormal heart sounds. Two datasets, completely different in terms of device quality, patient demography and background noise, were used for training and performance evaluation. We proposed a robust set of features from the training set that yielded an accuracy of 75% on the test dataset. Our future work includes enhancing the current work by testing on larger and more diverse datasets, as well as fusing the PCG classifier with other low cost non-invasive signals like PPG, single lead ECG etc. for designing a multi-modal classifier.

REFERENCES

[1] Dominique Gauthier, Yasemin M Akay, Robert G Paden, William Pavlicek, F David Fortuin, John K Sweeney, Richard W Lee, and Metin Akay, “Spectral analysis of heart sounds associated with coronary occlusions,” in Information Technology Applications in Biomedicine, 2007. ITAB 2007. 6th International Special Topic Conference on. IEEE, 2007, pp. 49–52.
[2] Samuel E Schmidt, John Hansen, Henrik Zimmermann, Dorte Hammershøi, Egon Toft, and Johannes J Struijk, “Coronary artery disease and low frequency heart sound signatures,” in Computing in Cardiology, 2011. IEEE, 2011, pp. 481–484.
[3] Liang Huiying, Lukkarinen Sakari, and Hartimo Iiro, “A heart sound segmentation algorithm using wavelet decomposition and reconstruction,” in Engineering in Medicine and Biology Society, 1997. Proceedings of the 19th Annual International Conference of the IEEE. IEEE, 1997, vol. 4, pp. 1630–1633.
[4] Sumeth Yuenyong, Akinori Nishihara, Waree Kongprawechnon, and Kanokvate Tungpimolrut, “A framework for automatic heart sound analysis without segmentation,” Biomedical engineering online, vol. 10, no. 1, pp. 1, 2011.
[5] Shi-Wen Deng and Ji-Qing Han, “Towards heart sound classification without segmentation via autocorrelation feature and diffusion maps,” Future Generation Computer Systems, vol. 60, pp. 13–21, 2016.
[6] Cristhian Potes, Saman Parvaneh, Asif Rahman, and Bryan Conroy, “Ensemble of feature-based and deep learning-based classifiers for detection of abnormal heart sound,” in Computing in Cardiology, 2016. IEEE, 2016.
[7] Morteza Zabihi, Ali Bahrami Rad, Serkan Kiranyaz, Moncef Gabbouj, and Aggelos K. Katsaggelos, “Heart sound anomaly and quality detection using ensemble of neural networks without segmentation,” in Computing in Cardiology, 2016. IEEE, 2016.
[8] Edmund Kay and Anurag Agarwal, “Dropconnected neural network trained with diverse features for classifying heart sound,” in Computing in Cardiology, 2016. IEEE, 2016.
[9] Gari D. Clifford et al., “Classification of normal/abnormal heart sound recordings: the physionet/computing in cardiology challenge 2016,” in Computing in Cardiology, 2016. IEEE, 2016.
[10] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley, “Physiobank, physiotoolkit, and physionet,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
[11] Chengyu Liu et al., “An open access database for the evaluation of heart sound algorithms,” Physiological Measurement, vol. 37, no. 9, 2016.
[12] Arijit Sinharay et al., “Smartphone based digital stethoscope for connected health - a direct acoustic coupling technique,” in 2016 IEEE 1st International Conference on Connected Health: Applications, Systems and Engineering Technologies, 2016.
[13] David B Springer, Lionel Tarassenko, and Gari D Clifford, “Logistic regression-hsmm-based heart sound segmentation,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 4, pp. 822–832, 2016.