https://doi.org/10.1007/s11042-019-08428-w
Abstract
Older people living alone face serious risks, and falls are the main threat to their lives. In this paper, a new vision-based fall detection method is proposed to allow older people to live independently and safely. The proposed method uses shape deformation and motion information to distinguish between normal activity and falls. The main contribution of this paper is a new descriptor based on silhouette deformation, together with a new image sequence representation that captures the change between different postures, which is discriminant information for action classification.
Experiments are conducted on two state-of-the-art datasets (SDUFall and URFall) and a comparative study is presented. The results show the ability of the proposed method to differentiate between fall events and normal activity. The accuracy achieved is up to 98.41% on the SDUFall dataset and 95.45% on the URFall dataset.
Keywords: Fall detection · Human activity recognition · Image processing · Kinect · Video monitoring · Depth information
1 Introduction
Over the last century, the elderly population has been increasing, and falls are the major health risk that debilitates this growing community and even threatens its life. According to the study in [1], 28 to 35% of people aged over 65 fall each year, and this percentage increases for people aged over 70. Falls lead to severe consequences such as hospitalization due to hip fractures,
upper limb injuries and traumatic brain injuries. They cause fear of falling and a reduction in normal activities [29]. Falls also affect the family, society and the economy.
Much research has been conducted around the world to develop efficient systems to prevent fall incidents, and most commercial systems use portable sensors, such as accelerometers and gyroscopes, that the user must wear. Wearing these sensors is a major disadvantage of such systems, because the elderly often forget to wear them or to recharge them. To deal with this problem, other studies have used sensors mounted in the environment, usually based on vibration, sound, vision and infrared sensors [7].
In recent years, computer vision has evolved rapidly and attracted more developers. Video monitoring systems use cameras to detect falls; these sensors yield information about the daily activity of a person. The main advantage of such systems is that the person does not need to wear any device. However, conventional cameras do not respect the privacy of older people. The Kinect sensor is a camera that has gained great importance due to its low cost and its ability to protect people's privacy. Such systems are not only beneficial for the great autonomy they offer to seniors at home, but can also be of great assistance in large international gatherings. Many studies have addressed weak and old people in such situations, for example in overcrowding [19].
The main idea behind this work is to provide a solution to the problem of fall detection of elderly people at home. In this paper, we present a new, simple and flexible fall detection method using the Kinect sensor that leverages the advantages of different state-of-the-art solutions. This work considers the fall detection problem from two perspectives: the first uses spatial information to describe the silhouette in order to improve accuracy, and the second aims at fall detection in real time. Most studies in the computer vision field have focused on the use of a bounding box surrounding the human body silhouette and an orientation angle to detect falls. Unlike state-of-the-art methods, the novelty of this work consists in proposing a new shape-based feature. This feature describes the orientation and size of the human silhouette in a new way, using histograms with two main characteristics: invariance to translation and invariance to scale. We also propose a new image sequence representation and use a classification method to discriminate between falls and normal activities.
The remainder of this paper is organized as follows: Section 2 presents related work in the field of fall detection systems. Section 3 presents a detailed description of the proposed approach. In Section 4, experimental results are presented and discussed. Finally, Section 5 concludes the paper with some remarks.
2 Related works
Falls are a serious issue in the elderly community, and their dangerous health implications can degrade the quality of life of an elderly person. In recent decades, many researchers have proposed solutions to automatically detect falls in the elderly population. According to the literature, fall detection methods can be divided into two main classes: computer vision-based methods and non-vision methods. In this section, we highlight some works proposed for fall detection and elderly assistance.
- Non-vision-based methods
Different sensors have been used. In the system proposed in [4], a special piezoelectric sensor coupled to the floor is exploited, and a binary signal is generated in case of a fall. The main advantage of this system is that no sensor has to be worn, while its drawback can be false alarms.
In [8], two accelerometers were attached to the chest and thigh of 10 subjects, and a decision tree was applied to the angles of the body postures to recognize posture transitions, with a threshold to detect falls. However, the main inconvenience of this system is that users must wear the sensors.
In [32], falls were detected using three sensors: a vibration sensor and two PIR sensors. Vibrations captured by the vibration sensor were converted into electrical signals, and feature vectors were extracted from the vibration waveforms using the complex wavelet transform (CWT). These vectors were then classified using a support vector machine (SVM). To eliminate false alarms generated when the vibration sensor signal energy is concentrated in low-frequency bands (vibration from door slams and other similar events), the PIR sensors are used as additional sensors to detect the infrared radiation emitted by moving objects in the room. The authors in [18] proposed an approach based on radar technology to detect falls. Time-frequency (TF) analysis is used, which can reveal the velocities of different parts of the human body; deep learning captures the properties of the TF signatures without human intervention. The final features are fed to a classifier that achieved an 87% success rate.
In the recent work of [12], fall detection was based on an accelerometer and sound, using a fuzzy logic-based algorithm developed to process the output signals of the accelerometer and the sound sensor. This combination of signals can minimize false alarms, reducing false fall detections per day from a high of 1.37 to a low of 0.06. In another work [31], the authors implemented a system consisting of three stages. The first, the mobile stage, filters out ADLs using a lightweight threshold method. If a fall is suspected at this stage, the data are transmitted to the second, collaboration stage, in which acceleration data are transferred to a cloud server for feature extraction; the system then detects falls through classification using FEDT (Fall-detection Ensemble Decision Tree) in the cloud stage. A warning is sent to a mobile device to get help.
In the work of [10], a dynamic range-Doppler trajectory (DRDT) method was proposed based on a frequency-modulated continuous-wave (FMCW) radar system. This method extracts multi-domain features, which include temporal variations of range, Doppler, radar cross-section (RCS) and dispersion. Falls are detected using a K-Nearest Neighbor (KNN) classifier. The conducted experiments achieved an accuracy of 95.5%.
The methods described above suffer from several weaknesses. They are sensitive to environmental noise, which causes many false alarms; elderly people often forget to carry the sensor in the case of wearable-sensor-based methods; and these techniques can require expensive hardware.
- Vision-based methods
Camera-based methods have been shown to yield efficient and accurate results compared with other sensors. These systems have multiple advantages: they are less intrusive, low cost, and easy to use and integrate. Hence, researchers have adopted computer vision solutions. These methods vary from work to work; some are based on 2D sensors, whereas others are based on 3D sensors.
For 2D sensor-based methods, the pipeline generally starts by detecting the person in each video frame using foreground/background separation techniques. The authors in [13] proposed an approach based on shape variation, combining a best-fit approximated ellipse around the human body, projection histograms of the segmented silhouette, and temporal changes of head pose. The accuracy achieved was 88.08%.
In the work of [23], the KNN algorithm was used to classify postures using the ratio and difference of the height and width of the bounding box of the human body silhouette. In addition to posture classification, time difference was used to distinguish between fall events and lying-down events. The accuracy obtained was 84.44%.
The authors in [30] discriminated fall events from normal activities using the standard deviation and the orientation standard deviation of an ellipse. An ellipse approximates the silhouette in order to analyze the change in shape, and the motion is then quantified by calculating the pixel values of the Motion History Image blob.
Despite their accuracy, these methods are not effective in the case of forward or backward falls. Thus, systems using multiple cameras have been introduced.
In [5], a multi-camera system was chosen to obtain 3D information. The fall detection process was performed at two levels: the first level infers the state of the object at each frame, while the second level operates on a linguistic summarization of the 3D person's states (voxels) in order to infer the human activity.
Rougier et al. [26] proposed another approach that uses four cameras mounted in a room to detect falls, based on GMM classification and analysis of the deformation of the human shape. The basic rule is a majority vote over all cameras: each camera's GMM classifier has one vote, and if an abnormal event occurs in a majority of cameras, the event is considered a fall, which increases accuracy.
The authors in [25] also proposed a multi-camera system based on the area occupied by the person on the ground, in order to differentiate lying-down activity from other activities. From the foreground information obtained from each camera, the percentage of the surface of the person in contact with the ground is estimated using two orthogonal views. This method obtained 95.8% sensitivity and 100% specificity.
Lately, the Kinect sensor has been exploited to acquire 3D data instead of using multiple cameras. This sensor overcomes many shortcomings of regular cameras, such as illumination changes and shadows.
In [24], a shape-based fall detection method using the Kinect sensor was introduced, where curvature scale space (CSS) features of the silhouette were extracted from each depth frame. Video sequences were represented by a bag of CSS words, and this resulting feature represents the action. A VPSO-ELM classifier, which uses Particle Swarm Optimization (PSO), was used to detect abnormal events.
In [6], the depth information provided by the Kinect was exploited. The authors extracted the key joints of the human body using an RDT algorithm trained with Shannon entropy. These joints are then tracked, and the fall is detected based on the trajectory of the head joint's distance to the floor and an SVM classifier.
Kwolek and Kepski [21] adopted a combination of a Kinect sensor and an accelerometer. The accelerometer was used to filter out non-fall events, and the depth sensor was used to distinguish between such filtered events and falls. From the depth map, the person's body was delineated from the background, and depth features were extracted based on the width and height of the person's bounding box and on the person's centroid. Finally, a KNN classifier was used to identify fall events.
In the work of [22], the Kinect is combined with another sensor, this time a mobile phone, and the system adopts one method for each sensor. With the Kinect, the speed of the head joint combined with an SVM classifier was exploited to identify whether the person is falling. In the smartphone-based fall detection algorithm, three feature values are computed from the acceleration data: Signal Magnitude Area (SMA), Signal Magnitude Vector (SMV), and Tilt Angle (TA), and the fall is detected using thresholds. To process the data from the two sensors, two decision-making fusion methods (logical rules and D-S evidence fusion) were used.
In the work of [28], the authors proposed a computer vision-based fall detection system that introduces a foreground segmentation method. This method uses the hue channel of the HSV color space to obtain the human silhouette and runs with a simple adaptive background model, which also addresses light and shadow changes. Falls were detected based on the human shape and the center of gravity. Experiments were done on the authors' own dataset, and the accuracy achieved is up to 96% on average.
In this paper, we propose a new and simple method based on human shape to detect falls. Inspired by works that use the height/width of the bounding box surrounding the human body silhouette and by solutions that use the orientation angle to detect falls, a new descriptor is built to represent postures by combining these two features in one vector. The motivation behind this work is the need to characterize the human body with features that are invariant to scale and translation with respect to the camera. A new image sequence representation is also proposed in order to capture the change between different postures, which is discriminative information. In addition, the state-of-the-art methods generally depend on a threshold, which causes many misclassifications of falls. Another important aspect addressed in this work is the detection of falls in real time by proposing a simple descriptor, unlike other methods that do not take this into account.
3 Proposed method
The proposed approach uses a Microsoft Kinect sensor, a special camera developed by Microsoft for the Xbox 360. It consists of two sensors, an RGB camera and an infrared depth sensor, where depth represents the distance between the sensor and the object.
Only the depth map sequences are processed in this work; this choice is based on the advantages provided by the depth sensor. Thanks to the infrared sensor, our system can remain operational day and night, because it is not affected by changes in light and darkness. It also preserves the privacy of elderly people. Figure 1 shows an example of depth and RGB images captured by the Kinect sensor.
Fig. 1 Kinect sensor: a Kinect sensor, b RGB image and c depth image
The main objective of this work is to closely observe a person living alone at home. To this end, we first detect the person in the scene, extract features that describe their posture, and then classify their action as an activity of daily living or as a fall.
An overview of the proposed method is illustrated in Fig. 2. Initially, the object of interest is detected using a background subtraction method. Then, shape and motion features are extracted from a sequence of images to build a descriptor, which is fed to a classifier. Finally, a fall event or normal activity is detected. In the following subsections, each of these steps is described in more detail.
3.1 Preprocessing
This step aims to extract the object of interest from the background and to perform additional processing to eliminate noise and fill holes in the depth images.
To fill holes in the depth images, inpainting is used, following the method proposed in [27]. The algorithm is based on the Fast Marching Method: it starts from the boundary of the black region and moves inward, and each pixel to be replaced is filled with the normalized weighted sum of the pixels in its neighborhood. Fig. 3 shows an example of an image obtained after the hole-filling step.
Fig. 3 a Image before the hole-filling step, b image after the hole-filling step
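For illustration, OpenCV exposes Telea's Fast Marching inpainting directly; the sketch below assumes missing depth readings are encoded as zero-valued pixels and rescales the map to 8 bits, which is what cv2.inpaint expects. These details are assumptions, not values specified in the paper.

```python
import cv2
import numpy as np

def fill_depth_holes(depth16, radius=3):
    """Fill holes in a Kinect depth map with Telea's Fast Marching inpainting [27]."""
    # Assumption: pixels with no depth reading are stored as 0.
    mask = (depth16 == 0).astype(np.uint8)
    # cv2.inpaint expects an 8-bit image, so rescale the depth map first.
    depth8 = cv2.normalize(depth16, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.inpaint(depth8, mask, radius, cv2.INPAINT_TELEA)
```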
Most vision-based fall detection methods start with a background subtraction step, which delineates the object of interest from the background. In this work, we use the frame difference method because it is simple and fast, and we then apply morphological operations to separate the pixels that belong to the foreground from those of the background. Fig. 4 presents an example of the background subtraction step.
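A minimal sketch of this stage with OpenCV might look as follows; the difference threshold and the 5 × 5 structuring element are assumptions, not values reported by the authors.

```python
import cv2
import numpy as np

def extract_silhouette(frame, prev_frame, thresh=30):
    """Frame differencing followed by morphological cleanup."""
    diff = cv2.absdiff(frame, prev_frame)
    _, fg = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = np.ones((5, 5), np.uint8)
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)   # remove speckle noise
    fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)  # fill small gaps
    return fg
```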
3.2 Feature extraction
In this subsection, we describe how to extract the body feature from an image using shape characteristics.
The shape can be characterized by the height and the orientation angle of the human body. Since the height/width ratio is very informative, it can tell us the main posture of the person when they fall. The orientation angle of the body is also a discriminative feature when fall events occur. Inspired by the well-known Histogram of Oriented Gradients (HOG) descriptor [9], we build a new shape-based descriptor to describe body posture.
First, the contour C of the body silhouette is extracted using the findContours() function of the OpenCV library [17]. Then, this contour is approximated using the Douglas-Peucker algorithm [11] in order to estimate points of higher curvature; this reduces the processing time without losing important information. The resulting curve is denoted Ca. Furthermore, the center of mass pc(xc, yc) is located using the first geometrical moments m10, m01 and m00 of the silhouette as:

$$x_c = \frac{m_{10}}{m_{00}}, \qquad y_c = \frac{m_{01}}{m_{00}} \qquad (1)$$
where the moments of the binary silhouette image I(x, y) are defined as:

$$m_{pq} = \sum_{x}\sum_{y} x^p\, y^q\, I(x, y) \qquad (2)$$
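For reference, a sketch of this step with OpenCV follows; the approximation tolerance (1% of the contour perimeter) is an assumption, and the findContours return signature shown is the OpenCV 4 one.

```python
import cv2

def silhouette_contour_and_centroid(fg):
    """Contour C, its Douglas-Peucker approximation Ca, and centroid pc (Eqs. 1-2)."""
    # OpenCV 4 signature; OpenCV 3 returns an extra first value.
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)        # largest blob = the person
    eps = 0.01 * cv2.arcLength(c, True)           # assumed approximation tolerance
    ca = cv2.approxPolyDP(c, eps, True)           # approximated curve Ca
    m = cv2.moments(c)
    xc, yc = m["m10"] / m["m00"], m["m01"] / m["m00"]   # centroid, Eq. (1)
    return c, ca, (xc, yc)
```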
In order to make the descriptor robust to geometrical transformations such as translation and scale, we apply a normalization operation to it.
In this step, we bring the center of mass pc(xc, yc) of the object to the point (0, 0) using the following transformation:

$$x_t = x_i - x_c, \qquad y_t = y_i - y_c \qquad (3)$$

where (xc, yc) are the coordinates of the center of mass pc computed from Eq. (1), and xi and yi are the coordinates of the point pi of the contour C.
Figure 5 illustrates the translation of the center of mass.
The shape information is preserved at any size using the normalization of Eq. (5).
First, we compute the maximal distance from the center of mass of the silhouette to the points pai(xai, yai) of the curve Ca as follows:

$$d_{max} = \max_i \sqrt{(x_{a_i} - x_c)^2 + (y_{a_i} - y_c)^2} \qquad (4)$$

The points pti(xti, yti) obtained after the translation (Eq. (3)) are then scaled by dmax:

$$x_{ts_i} = \frac{x_{t_i}}{d_{max}}, \qquad y_{ts_i} = \frac{y_{t_i}}{d_{max}} \qquad (5)$$
After the normalization operation, the distance di between the centroid pc(xc, yc) and the boundary points ptsi(xtsi, ytsi) of the curve is measured using the Euclidean distance, as in Eq. (6):

$$d_i = \sqrt{(x_{ts_i} - x_c)^2 + (y_{ts_i} - y_c)^2} \qquad (6)$$
Then, the angle θi between the point ptsi(xtsi, ytsi) and the centroid pc(xc, yc) is calculated using Eq. (7). Fig. 7 illustrates the angle θ and the distance between the centroid pc(xc, yc) and the contour points of the silhouette.

$$\theta_i(x_{ts_i}, y_{ts_i}) = \arctan\left(\frac{y_{ts_i} - y_c}{x_{ts_i} - x_c}\right) \qquad (7)$$
After the translation and scale transformations, the angles are binned into orientation bins uniformly spaced over 0°-180°. To reduce aliasing, votes are interpolated bilinearly between the neighboring bin centers in both orientation and position [9]. Each angle θi(x, y) contributes its distance di(x, y) to the corresponding bin of the histogram, and a histogram of nine bins is obtained.
Fig. 8 shows examples of the histogram features obtained with different postures chosen randomly from the SDUFall database [24].
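Putting Eqs. (3)-(7) together, a distance-weighted orientation histogram over the approximated contour (the `ca` array from the sketch above) can be written as follows. For brevity, this version uses hard bin assignment rather than the bilinear vote interpolation of [9]; the nine-bin default follows the text.

```python
import numpy as np

def hdo_descriptor(ca, centroid, nbins=9):
    """Distance-weighted orientation histogram of the approximated contour Ca."""
    xc, yc = centroid
    pts = ca.reshape(-1, 2).astype(np.float64)    # OpenCV contour -> (N, 2) points
    t = pts - np.array([xc, yc])                  # translation, Eq. (3)
    dmax = np.hypot(t[:, 0], t[:, 1]).max()       # maximal distance, Eq. (4)
    ts = t / dmax                                 # scale normalization, Eq. (5)
    d = np.hypot(ts[:, 0], ts[:, 1])              # distances to the centroid, Eq. (6)
    theta = np.degrees(np.arctan2(ts[:, 1], ts[:, 0])) % 180.0   # angles, Eq. (7)
    hist, _ = np.histogram(theta, bins=nbins, range=(0.0, 180.0), weights=d)
    return hist
```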
In order to differentiate between similar actions, such as falling and lying down, motion information is measured and introduced. In this approach, the distance traveled by the silhouette's center of mass across frames represents the human body velocity. This distance is calculated using the Euclidean distance, yielding a vector V of length n, where n is the number of frames in the sequence. Fig. 9 illustrates this distance plotted over time.
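Assuming the velocity feature is simply the per-frame Euclidean displacement of the centroid, it can be computed as below.

```python
import numpy as np

def velocity_feature(centroids):
    """Vector V: Euclidean displacement of the centroid between consecutive frames."""
    c = np.asarray(centroids, dtype=np.float64)   # one (xc, yc) per frame
    # Note: n frames yield n-1 displacements.
    return np.hypot(np.diff(c[:, 0]), np.diff(c[:, 1]))
```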
3.3 Action representation
The goal of this step is to represent an action using temporal and spatial information in a simple, fast and efficient way. Representing and describing an action is a challenging topic. Generally, frame-similarity techniques or the bag-of-words (BoW) model [15] are used. In this work, the main idea is to measure the variation of the descriptor values from the first frame to the last one; the standard deviation is used for this purpose.
Let Sf = [f1, f2, …, fn] be a sequence of frames, hdo the descriptor built in the previous steps (Section 3.2), and A the matrix of dimension (n × m), where m is the length of the descriptor hdo and n is the number of frames. Each row of A contains the hdo descriptor of one frame.
Let AT be the transpose of A and Tv a vector of length m containing the standard deviation of each column of A.
The standard deviation of each column is computed using Eq. (8):

$$\sigma_i = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(A^{T}_{ij} - \mu_i\right)^2} \qquad (8)$$

where σi is the ith element of the vector Tv, ATij is the entry of the matrix AT with i ∈ {1, …, m} and j ∈ {1, …, n}, and μi is the mean of the ith column of A.
The final descriptor representing the action is the combination of the vector Tv and the velocity feature extracted in Section 3.2.
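A sketch of this aggregation, reusing velocity_feature from above, is given below. Since the paper does not spell out how the variable-length velocity vector is concatenated with Tv, summarizing it by its mean and maximum is an assumption made here so that the final descriptor has a fixed length.

```python
import numpy as np

def action_descriptor(hdo_per_frame, centroids):
    """Combine Tv (Eq. 8) with a summary of the centroid velocity."""
    A = np.vstack(hdo_per_frame)      # shape (n frames, m descriptor elements)
    tv = A.std(axis=0)                # Tv: standard deviation of each column
    v = velocity_feature(centroids)
    # Assumption: the velocity vector is summarized by its mean and maximum.
    return np.concatenate([tv, [v.mean(), v.max()]])
```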
3.4 Classification
Once the descriptor is built, we move to the classification stage: the extracted descriptor is fed to a classifier in order to distinguish between fall events and daily activities. Many fall detection methods use different machine learning techniques for classification, and the choice of classifier is an important decision. In this work, we have selected three well-known classifiers with proven accuracy to test the robustness and efficiency of the proposed descriptor, using K-fold cross validation with each of the following classifiers.
- Random Decision Forest [16]
The random decision forest is a very popular classifier, chosen for its simplicity, its performance, and its ease of application to larger datasets. It is a combination of many binary decision trees built from several bootstrap samples. Each tree computes its own prediction, and the final classification is elected by the forest using plurality voting.
- AdaBoost [14]
Boosting is an ensemble technique that tries to create a robust classifier from several weak classifiers, typically decision trees. Training instances that are difficult to predict are given the highest weights, and the model is built sequentially. AdaBoost is a successful classifier, especially in binary classification.
- KNN [3]
KNN is a simple and widely used classification algorithm. An instance is classified by the majority vote of its k nearest neighbors according to a similarity distance. The KNN classifier is known for the interpretability of its output and for its calculation speed.
Subsection 4.2.2 discusses in more detail the results of the classification step and the performance of each classifier.
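As an illustration of this setup with scikit-learn, assuming a feature matrix X of action descriptors and binary fall/non-fall labels y; the hyperparameters shown are common defaults chosen here, not values reported by the authors.

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate_classifiers(X, y, folds=10):
    """K-fold cross validation of the three classifiers on descriptors X, labels y."""
    classifiers = {
        "RDF": RandomForestClassifier(n_estimators=100),
        "AdaBoost": AdaBoostClassifier(n_estimators=100),
        "KNN": KNeighborsClassifier(n_neighbors=5),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=folds)
        print(f"{name}: mean accuracy = {scores.mean():.4f}")
```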
4 Experimental results
In this section, we describe the experiments conducted to evaluate the performance of the proposed method. Our aim is to obtain a minimum of false alarms and a fast run time, which are the requirements of any fall detection system. For this purpose, we tested our algorithm on two datasets.
4.1 Datasets
In order to evaluate the effectiveness of our fall detection method, we performed experiments on two datasets widely used in the literature. The first dataset is SDUFall [24]. The SDUFall dataset was chosen because it contains multiple actions performed by 20 people, men and women. These actions consist of sitting down, lying down, bending, squatting and falling. They were performed 10 times under randomly chosen conditions, such as turning the light on or off, carrying an object or not, and changing direction with respect to the camera. There are 1200 depth videos with a frame size of 320 × 240, recorded at 30 fps in AVI format. These data were recorded with a Microsoft Kinect installed at a height of 1.5 m.
The other source of data is the URFall database [20]. It consists of 70 sequences, of which 30 are fall activities and 40 are Activity of Daily Living (ADL) sequences. The images are acquired at 25 frames per second with a resolution of 640 × 480 pixels. The videos are recorded in different environments and contain variable illumination, such as shadows and reflections, that can be detected as moving objects.
All experiments were conducted on a PC with an Intel(R) Core i5 at 1.80 GHz. Algorithms were implemented using C++, Python and the OpenCV library.
To evaluate the effectiveness of the proposed descriptor, named hdo, we used recall, precision and accuracy, as defined in formulas (9), (10) and (11) respectively [6].
Recall indicates the capacity of the algorithm to detect fall events:

$$Recall = \frac{TP}{TP + FN} \qquad (9)$$

Precision indicates the proportion of detected fall events that are actual falls:

$$Precision = \frac{TP}{TP + FP} \qquad (10)$$

Accuracy indicates how many results were correctly detected among all measurements:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (11)$$
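These three measures follow directly from the confusion-matrix counts; a small helper, written here for illustration:

```python
def fall_metrics(tp, tn, fp, fn):
    """Recall, precision and accuracy from confusion-matrix counts (Eqs. 9-11)."""
    recall = tp / (tp + fn)                       # Eq. (9)
    precision = tp / (tp + fp)                    # Eq. (10)
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # Eq. (11)
    return recall, precision, accuracy
```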
Fig. 10 Confusion matrix of posture classification using the random decision forest (RDF) classifier
In the previous section, we described the results obtained for the classification of five postures; in the following, we focus on the results obtained for fall detection.
For this, we used the ROC curve (receiver operating characteristic) and the AUC (area under the curve), which are well-known tools for evaluating binary classifiers. The ROC curve plots the true positive rate against the false positive rate, and the AUC represents the capability of the model to distinguish between classes: the higher the AUC, the better the model. We used 10-fold cross validation to train the data. As mentioned in Section 3.4, we used three classifiers, AdaBoost, RDF and KNN, to test the robustness of the features used in this work. Overall, RDF gives the best accuracy on the SDUFall dataset, reaching 98.41%. For the URFall dataset, the results for AdaBoost and RDF were close, at 95.45% and 95.36% respectively, with AdaBoost slightly outperforming RDF. This gap in accuracy is due to the difference between the two datasets in terms of the number of samples.
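A sketch of how such curves can be produced with scikit-learn, under the same X and y assumptions as in the classification sketch above:

```python
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import cross_val_predict

def roc_auc(clf, X, y, folds=10):
    """ROC curve and AUC from out-of-fold probability scores."""
    proba = cross_val_predict(clf, X, y, cv=folds, method="predict_proba")[:, 1]
    fpr, tpr, _ = roc_curve(y, proba)
    return fpr, tpr, auc(fpr, tpr)
```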
This step is performed in order to determine the best classifier for the new features proposed in this paper. The three classifiers AdaBoost, RDF and KNN are considered to have the best performance at classifying the data, and were therefore chosen for testing. We notice that AdaBoost and RDF (random decision forest) outperform KNN. However, AdaBoost and RDF have close results, with a 0.09% difference on the URFall dataset, and this difference grows when the tests are run on the SDUFall dataset. Random forest and AdaBoost are similar in performance, considering the difference in data size (number of samples) and in data type. In conclusion, we cannot generalize these results to decide on the best classifier for our features, as AdaBoost and RDF obtained approximately similar accuracies on both datasets, each performing better on one database; for that reason, future testing will be done on other datasets. Overall, the three classifiers achieve high values, except for some errors due to background subtraction accuracy and postures oriented toward or away from the camera, which lead to confusion between different actions. Another problem, rarely mentioned in the fall detection state of the art, which we encountered during our experiments, is related to the dataset itself: the activity movements did not simulate well the movements of an older adult, for example when young actors lie down abruptly. Such an action is confused with a fall and triggers an alarm, since our method relies on movement to discriminate between these two actions.
Fig. 11 ROC curve for KNN, RDF and AdaBoost (ADC) on the SDUFall dataset
Fig. 12 ROC curve for KNN, RDF and AdaBoost (ADC) on the URFall dataset
Figures 11 and 12 show the ROC curves obtained for the SDUFall and URFall datasets respectively. Tables 2 and 3 show the results obtained with cross-validation on the SDUFall and URFall datasets.
We use three scenarios showing the practicality and effectiveness of the proposed method:
Table 2 Accuracy results with the SDUFall database for AdaBoost, RDF and KNN classifiers
Table 3 Accuracy results with the URFall database for AdaBoost, RDF and KNN classifiers
Lying on the floor is an action that can be confused with a fall. Our method was able to classify it as non-fall without any false alarm, thanks to the incorporation of the center-of-mass velocity in our solution.
Squatting and then lying on the floor to look for something is an action that can trigger alarms; in our case, it was perfectly classified as non-fall. Another challenging action, falling from a chair, was correctly classified as a fall.
A visualization of the results is shown in Figs. 13 and 14 for the SDUFall and URFall datasets respectively.
The obtained results show the ability of the proposed method to detect falls. The proposed method, based on the histogram of distance and orientation combined with the center-of-mass velocity, is comparable to other state-of-the-art vision-based methods. Our method uses the Kinect sensor and benefits from its advantages: unlike RGB camera-based approaches, our system is not affected by shadows or light changes, and it respects the privacy of older people because only the depth map is used. In addition, the proposed features have proven their ability to discriminate fall events from daily activities, resulting in better accuracy.
For a fair comparison, we compare the results obtained on each dataset with state-of-the-art methods that use the same dataset.
For the SDUFall dataset, the accuracy obtained by our method is higher than that of [24], which used curvature scale space (CSS) features extracted from the silhouette at each frame, combined with a bag of CSS words (BoCSS) for action representation and the complicated VPSO-ELM classifier. It also outperforms the approach proposed in [2], where a new descriptor based on Silhouette Orientation Images (SOI) is used, which relies only on shape orientation and posture changes to detect the activity. In fact, we noticed that these two works did not use movement in their studies, which is an important feature for discriminating between similar actions.
In the work of [21], RGB and depth images combined with an accelerometer were exploited. This method is based on three modalities, which makes it computationally expensive; moreover, the use of RGB images leads to misclassifications due to their ineffectiveness in darkness, shadows and changing illumination.
Tables 4 and 5 show a comparative evaluation between our method and some state-of-the-art methods.
Table 5 Comparison with other state-of-the-art methods on the URFall dataset
An effective fall detection system should work in real time; for that, the chosen feature plays a very important role in reducing the run time. The complexity of our feature (see Sections 3.2 and 3.3) depends on the number n of approximated contour points, i.e., it is O(n). For the training and testing phase, the run time was longer with the RDF and AdaBoost classifiers, depending on the dataset size, but this time does not matter once the system is operational, since this step is performed offline.
5 Conclusion
In this paper, we have presented a new fall detection method using the Kinect sensor. The proposed method combines spatial and temporal features in order to discriminate falls. To this end, a new approach has been proposed to describe the posture of the human silhouette: at each frame, the silhouette is detected, the distances between its center of mass and the contour points are measured, and the corresponding angles are calculated in order to build a histogram feature. Finally, a descriptor composed of the combination of this histogram and the center-of-mass motion vector is submitted to a classifier. Three classifiers have been used in order to test the robustness of our descriptor.
We have shown that the proposed method contributes to better fall detection performance: the accuracy achieved is up to 98.41% on the SDUFall dataset and 95.45% on the URFall dataset. A comparison with state-of-the-art methods was presented to confirm the performance of the method. In addition, our method is able to work in real time, which is an important criterion for any fall detection system.
In future work, further investigations will be done to enhance the accuracy by incorporating the depth information provided by the Kinect sensor. We are also looking forward to building our own dataset to test the proposed method in more scenarios.
References
1. Ageing WHO, Unit LC (2008) WHO global report on falls prevention in older age. World Health
Organization, Geneva
2. Akagündüz E, Aslan M, Şengür A, Wang H, İnce MC (2017) Silhouette orientation volumes for efficient
fall detection in depth videos. IEEE J Biomed Health Inform 21(3):756–763
3. Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):
175–185
4. Alwan M, Rajendran PJ, Kell S, Mack D, Dalal S, Wolfe M, Felder R (2006) A smart and passive floor-
vibration based fall detector for elderly. In: Information and communication technologies, 2006. ICTTA’06.
2nd, vol 1, 1003–1007. IEEE
5. Anderson D, Luke RH, Keller JM, Skubic M, Rantz M, Aud M (2009) Linguistic summarization of video
for fall detection using voxel person and fuzzy logic. Comput Vis Image Underst 113(1):80–89
6. Bian ZP, Hou J, Chau LP, Magnenat-Thalmann N (2015) Fall detection based on body part tracking using a
depth camera. IEEE J Biom Health Inform 19(2):430–439
7. Chen WH, Ma HP (2015) A fall detection system based on infrared array sensors with tracking capability for the elderly at home. In: 2015 17th international conference on E-health networking, application & services (HealthCom), pp 428–434. IEEE
8. Cheng J, Chen X, Shen M (2013) A framework for daily activity monitoring and fall detection based on
surface electromyography and accelerometer signals. IEEE J Biomed Health Inform 17(1):38–45
9. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: IEEE computer society
conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005, 1, 886–893
10. Ding C, Zou Y, Sun L, Hong H, Zhu X, Li C (2019) Fall detection with multi-domain features by a portable
FMCW radar. In: 2019 IEEE MTT-S International Wireless Symposium (IWS), 1–3. IEEE
11. Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent
a digitized line or its caricature. Cartographica 10(2):112–122
12. Er PV, Tan KK (2018) Non-intrusive fall detection monitoring for the elderly based on fuzzy logic.
Measurement 124:91–102
13. Foroughi H, Rezvanian A, Paziraee A (2008) Robust fall detection using human shape and multi-class
support vector machine. In: Sixth Indian conference on computer vision, graphics & image processing,
2008. ICVGIP’08, 413–420. IEEE
14. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In Icml, 96, 148–156
15. Harris ZS (1954) Distributional structure. Word 10(2–3):146–162
16. Ho TK (1995) Random decision forests. In Document analysis and recognition, 1995. Proceedings of the
third international conference on, vol 1, pp 278–282. IEEE
17. https://docs.opencv.org/2.4/doc/tutorials/imgproc/shapedescriptors/find_contours/find_contours.html
18. Jokanovic B, Amin M, Ahmad F (2016) Radar fall motion detection using deep learning. In: Radar
Conference (RadarConf), 2016 IEEE, 1–6. IEEE
19. Kim S, Guy SJ, Hillesland K, Zafar B, Gutub AAA, Manocha D (2015) Velocity-based modeling of
physical interactions in dense crowds. Vis Comput 31(5):541–555
20. Kwolek B, Kepski M (2014) Human fall detection on embedded platform using depth maps and wireless
accelerometer. Comput Methods Prog Biomed 117(3):489–501
21. Kwolek B, Kepski M (2015) Improving fall detection by the use of depth sensor and accelerometer.
Neurocomputing 168:637–645
22. Li X, Nie L, Xu H, Wang X (2018) Collaborative fall detection using smart phone and kinect. Mobile Netw
Appl 23:1–14
23. Liu CL, Lee CH, Lin PM (2010) A fall detection system using k-nearest neighbor classifier. Expert Syst
Appl 37(10):7174–7181
24. Ma X, Wang H, Xue B, Zhou M, Ji B, Li Y (2014) Depth-based human fall detection via shape features and
improved extreme learning machine. IEEE J Biomed Health Inform 18(6):1915–1922
25. Mousse MA, Motamed C, Ezin EC (2017) Percentage of human-occupied areas for fall detection from two
views. Vis Comput 33(12):1529–1540
26. Rougier C, Auvinet E, Rousseau J, Mignotte M, Meunier J (2011) Fall detection from depth map video
sequences. In: International conference on smart homes and health telematics. Springer, Berlin/Heidelberg,
pp 121–128
27. Telea A (2004) An image inpainting technique based on the fast marching method. J Graph Tools 9(1):23–
34
28. Tsai TH, Wang RZ, Hsu CW (2019) Design of fall detection system using computer vision technique. In:
Proceedings of the 2019 4th international conference on robotics, control and automation, 33–37. ACM
29. Walker JE, Howland J (1991) Falls and fear of falling among elderly persons living in the community:
occupational therapy interventions. Am J Occup Ther 45(2):119–122
30. Worrakulpanit N, Samanpiboon P (2014) Human fall detection using standard deviation of C-motion
method. J Autom Control Eng 2(4)
31. Wu T, Gu Y, Chen Y, Xiao Y, Wang J (2019) A Mobile cloud collaboration fall detection system based on
ensemble learning. arXiv preprint arXiv:1907.04788
32. Yazar A, Erden F, Cetin AE (2014) Multi-sensor ambient assisted living system for fall detection. In:
Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP’14),
1–3