Article
Automatic, Qualitative Scoring of the Clock Drawing Test
(CDT) Based on U-Net, CNN and Mobile Sensor Data
Ingyu Park and Unjoo Lee *
Abstract: The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular screening tool for cognitive functions. In spite of its qualitative capabilities in the diagnosis of neurological diseases, the assessment of the CDT has depended on quantitative as well as manual, paper-based methods. Furthermore, with the advancement of mobile smart devices embedding several sensors and deep learning algorithms, the need for a standardized, qualitative, and automatic scoring system for the CDT has increased. This study presents a mobile phone application, mCDT, for the CDT and suggests a novel, automatic and qualitative scoring method using mobile sensor data and deep learning algorithms: CNN, a convolutional network, U-Net, a convolutional network for biomedical image segmentation, and the MNIST (Modified National Institute of Standards and Technology) database. To obtain DeepC, a trained model for segmenting a contour image from a hand-drawn clock image, U-Net was trained with 159 CDT hand-drawn images at 128 × 128 resolution, obtained via mCDT. To construct DeepH, a trained model for segmenting the hands in a clock image, U-Net was trained with the same 159 CDT 128 × 128 resolution images. For obtaining DeepN, a trained model for classifying the digit images from a hand-drawn clock image, CNN was trained with the MNIST database. Using DeepC, DeepH and DeepN with the sensor data, parameters of contour (0–3 points), numbers (0–4 points), hands (0–5 points), and the center (0–1 points) were scored for a total of 13 points. From 219 subjects, performance testing was completed with images and sensor data obtained via mCDT. For an objective performance analysis, all the images were scored and crosschecked by two clinical experts in CDT scaling. Performance test analysis derived a sensitivity, specificity, accuracy and precision for the contour parameter of 89.33, 92.68, 89.95 and 98.15%, for the hands parameter of 80.21, 95.93, 89.04 and 93.90%, for the numbers parameter of 83.87, 95.31, 87.21 and 97.74%, and for the center parameter of 98.42, 86.21, 96.80 and 97.91%, respectively. From these results, the mCDT application and its scoring system provide utility in differentiating dementia disease subtypes, being valuable in clinical practice and for studies in the field.

Keywords: clock drawing test; automatic scoring; wearable sensor; deep learning; U-Net; CNN; MNIST

Citation: Park, I.; Lee, U. Automatic, Qualitative Scoring of the Clock Drawing Test (CDT) Based on U-Net, CNN and Mobile Sensor Data. Sensors 2021, 21, 5239. https://doi.org/10.3390/s21155239

Academic Editors: Sonya A. Coleman, Dermot Kerr and Yunzhou Zhang

Received: 22 June 2021; Accepted: 29 July 2021; Published: 3 August 2021
verbal working memory, visual memory and reconstruction, on-demand motor execution
(praxis), auditory comprehension, numerical knowledge, and executive function can be
tested [9,11,12]. Shulman and collaborators employed the CDT as a screening tool for cognitive disorders in older patients and, since then, multiple studies have affirmed
the utility of the CDT for screening and diagnosis of cognitive impairment for suspected
pathologies such as Huntington’s disease, schizophrenia, unilateral neglect, delirium,
multiple sclerosis, etc. [13]. Given the utility of CDT, studies on generating a scoring system
for CDT are valuable.
To gauge visuo-constructional disorders in moderate and severe dementia, quantitative scoring methods have been developed for the CDT, and these scoring methods have indicated that the CDT may be practical in distinguishing the various clinical features of cognitive deficits [14–17]. However, to improve such methods, neuropsychological approaches using information processing and its qualitative aspects have become necessary, with the aims of assessing the executive functions for the task and analyzing errors in the execution of the drawing. Qualitative approaches analyze various error
types such as graphing problems, conceptual deficits, stimulus-bound responses, spa-
tial/planning deficits, and perseveration, and these have helped describe various dementia
profiles [18,19]. Furthermore, the assessment of the CDT has conventionally been conducted manually by a medical expert based on identifying abnormalities in the drawings, including poor number positioning, omission of numbers, incorrect sequencing, missing clock hands and the presence of irrelevant writing, which is labor intensive, complex and also prone to subjective human error [20]. Thus, the need for an automatic scoring system has increased. As such, qualitative and automatic scoring systems are helpful for
differential diagnoses [21]. There are several ways of interpreting CDT quantitatively
and/or qualitatively [22–28]. For several of these methods, the scoring involves assessment
of drawing parameters, including the size of the clock, the closure of the contour, the
circularity of the contour, the existence of the two hands, the proportion of the minute
and the hour hands, the positioning of the hands, the correct target numbers of the hands
according to the time-setting instruction, the presence of all the digit numbers, the correct
position of the digit numbers, the order of drawing of the digit numbers, and the existence
of the center in the hand-drawn images for CDT. However, the scoring of CDT is dependent
on a clinician’s subjectivity, and so it is prone to bias and human error. Furthermore, it is not very practical for analyzing big data, such as personal lifelogs. With the CDT results being qualitative and not numeric, they are also difficult to evaluate objectively. Therefore, the necessity of a standardized, qualitative, and automatic scoring system for the CDT has increased. An optoelectronic protocol was suggested to qualitatively derive a parameter related to movement kinematics in the CDT execution by acquiring the graphic gesture from video recordings using six cameras, where a trial duration index was evaluated as a temporal parameter classifying between groups of subjects with Parkinson’s Disease and with both Parkinson’s Disease and dementia [29]. However, the optoelectronic protocol requires a great deal of equipment and does not provide a parametric analysis of the specific behavior patterns during the CDT. Thanks to the sensors and sensing systems of modern mobile phones, accurate and rapid measurements of the user’s behavior patterns are available and can be easily implemented without much hardware equipment. A comparative
study was presented for an automatic digit and hand sketch recognition, where several
conventional machine learning classification algorithms were considered such as a decision
tree, k-nearest neighbors, and multilayer perceptron [30]. Another machine learning based
classification was developed for the CDT using a digital pen, where the scoring system
used was the original Rouleau scoring system, and a comparative study was executed
on the digital pen stroke analysis [31]. A deep learning approach was recently applied
for automatic dementia screening and scoring on CDT, where the screening was for dis-
tinguishing between sick and healthy groups without scaling on the levels of dementia,
and the scoring was for a performed dementia test not for an assessment of CDT [8]. In
other words, the deep learning approach scored the CDT image in six levels according
to the severity and status of dementia by using the image classification neural networks,
such as VGG16 [32], ResNet-152 [33], and DenseNet-121 [34]. Therefore, a qualitative and
automatic scoring system for CDT in mobile environments is still strongly demanded.
For this study, a mobile-phone app version for CDT, namely mCDT, was developed
that includes a novel, original, automatic, and qualitative scoring system. The scoring
methodology employs U-Net, a convolutional network for biomedical image segmentation,
CNN, a convolutional network for digit classification, and MNIST database, the Modified
National Institute of Standards and Technology database. Smart phone mobile sensor data
were used to develop mCDT and its scoring system. A set of 159 hand-drawn clock images of 128 × 128 resolution, obtained via mCDT, was used to train the U-Net
to generate the trained model, DeepC, tasked with segmenting the contour of a drawn
clock image. Similarly, the U-Net was trained with the same 159 CDT 128 × 128 resolution
images to obtain DeepH, also a trained model, designed for segmenting the hands of a
drawn clock image. The MNIST database was used to train CNN to obtain the trained
model, DeepN, employed for classifying the digit images from a hand drawn clock image.
Accuracies greater than 75% were obtained and saturated over the training epochs for DeepC and DeepH; similarly, accuracies greater than 98% were achieved and saturated over the epochs for DeepN. The mobile sensor data were the x and y
coordinates, timestamps, and touch events for all the samples with a 20 ms sampling period
extracted from the mobile touch sensor. Four parameters including contour (0–3 points),
numbers (0–4 points), hands (0–5 points), and the center (0–1 points) were estimated by
using DeepC, DeepH and DeepN along with the sensor data, resulting in scaling with a
total of 13 points.
As a result, this study presents a significantly effective and accurate CDT scoring system available on a mobile device, which not only achieves state-of-the-art accuracy in the assessment and scaling of each CDT parametric criterion, but is also applicable to neurological disease diagnosis as well as temporal difference assessment of cognitive functioning in a daily lifelog. Notably, the performance results show higher specificity and precision for the PD test group than for the young volunteer group. These results suggest that our mCDT application and the
scoring system are valuable in differentiating dementia disease subtypes and also useful
for clinical practice and field studies.
The rest of this paper is organized as follows. Section 2 describes the number of
subjects enrolled in this study and the information related to the obtained approval of
the ethics committee for gathering data from the subjects and the protocols suggested in
this study. Section 2 also describes the implementation of mCDT, a mobile-phone app
of CDT, the training models DeepC, DeepH and DeepN generated in this study, and the
CDT scoring methods using the training models and the sensor data collected from mCDT
for each of the parameters, such as the contour, numbers, hands, and center. Section 3
describes the results of the performance test of mCDT on a case-by-case basis for each of
the four parameters. Section 4 presents a discussion of the results, the limitations, and the
future works. Finally, Section 5 summarizes the overall results of this study.
age data were subjected to testing the application’s scoring method. Two neurologists, as experts in CDT scaling, assisted in scoring the CDT and cross-checked all the images for an objective performance analysis, as well as gathering the CDT test data from the 140 PD patients. The Institutional Review Board of the Hallym University Sacred Heart Hospital, as an independent ethics committee (IEC), ethical review board (ERB), or research ethics board (REB), approved the data gathering and the protocols used for this study (IRB number: 2019-03-001). Table 1 provides the age, gender, and binary CDT score summary of the 238 volunteers and 140 PD patients.
Table 1. Statistics of age, gender, handedness and clinical status of the participants.
2.2. Implementation of the Deep Learning Based Mobile Clock Drawing Test, mCDT
The Android Studio development environment was used to develop the deep learning
based mobile application mCDT for the clock drawing test. A user of mCDT draws the
face of a clock with all the numbers present and sets the hands to a specific time such
as 10 after 11. Here, the clock face contour could be pre-drawn by mCDT as an option
chosen by the user and the specific time is randomly selected by mCDT. Then, mCDT
scores the drawn image qualitatively; this scoring is based on mobile sensor data of the
drawing image and pre-trained models, DeepC, DeepH and DeepN created in this study.
Fast and precise segmentation of the clock face contour and the hands in the images was accomplished by DeepC and DeepH, respectively, using U-Net, a convolutional network architecture. In turn, DeepN classifies the numbers using CNN and the MNIST database.
The mobile sensor data of x and y coordinates in pixels, timestamps in seconds, and touch events for each sample of the drawing image are recorded at a 50 Hz sampling frequency. Three types of touch events, ‘up’, ‘down’, and ‘move’, were considered in mCDT. The
touch event ‘down’ occurs when the user starts to touch on the screen; ‘up’ when the user
ends it; and ‘move’ when the user moves the finger or the pen across the screen. Figure 1a
provides the flow chart of the processes by mCDT. Figure 1b–d provide the screen shots of
the registration window, the CDT window, and the result window of mCDT, respectively.
As shown in Figure 1a,b, an informed consent prompt appears at the launch of mCDT,
followed by a registration window for entering the subject’s information; these include
age, name, gender, education level and handedness of the subject plus optional parameters
including an email address. After pressing the start button in the registration window, the
CDT window appears as shown in Figure 1c, and the user is instructed to draw numbers
and hands on a clock face contour; the contour of the clock face is either drawn by the user or pre-drawn by mCDT, as an option chosen by the user. In the drawing, the clock hands have to be set to a specific time given randomly by mCDT. The sensor data are saved
as the subject draws a contour, numbers and hands of a clock face on the touch screen of
the CDT window. The sensor data along with the drawn image are then provided in the
results window as shown in Figure 1d. The results could then be forwarded to the email
address submitted at the registration window.
Figure 1. (a) Flow diagram of mCDT operation; screen shots of (b) registration window; (c) CDT window; and (d) results window of mCDT.
2.3. Pre-Trained Models, DeepC, DeepH and DeepN Based on the U-Net and the CNN
Novel pre-trained models DeepC, DeepH and DeepN were developed for the segmentation and classification of the contour, the hands, and the numbers, respectively, of the clock face from a drawn clock image. DeepC and DeepH were created based on the U-Net convolutional network architecture and DeepN, based on the CNN in keras [6]. The U-Net and CNN network architectures implemented in this study are illustrated in Figures 2 and 3, respectively. The U-Net network architecture consists of a contracting path, an expansive path, and a final layer, as shown in Figure 2. The contracting path consists of repeated applications of two 3 × 3 convolutions and a 2 × 2 max pooling operation with stride 2 for down-sampling. At each repetition, the number of feature channels is doubled. The expansive path consists of two 3 × 3 convolutions and a 2 × 2 convolution (“up-convolution”) for up-sampling to recover the size of the segmentation map. At the final layer, a 1 × 1 convolution was used to map each 16-component feature vector to the desired number of classes. In total, the network has 23 convolutional layers. The training data for both DeepC and DeepH contain 477 images of 128 × 128 resolution, which were augmented using the ImageDataGenerator module in keras.preprocessing.image and resized from the original 159 images of 2400 × 1200 resolution. The augmentation was carried out by randomly translating horizontally or vertically using the parameter value 0.2 for both width_shift_range and height_shift_range. DeepC and DeepH were both trained for 100 epochs to an accuracy of about 77.47% and 79.56%, respectively. The loss function used for the training was a binary cross entropy. The CNN network architecture consists of two convolution layers (C1 and C3), two pooling layers (D2 and D4), and two fully connected layers (F5 and F6), as shown in Figure 3. The first convolution layer C1 filters the 28 × 28 input number image with 32 kernels of size 5 × 5, while the second convolution layer C3 filters the down-sampled 12 × 12 × 32 feature maps with 64 kernels of size 5 × 5 × 32. A unit stride is used in both convolution layers, and a ReLU nonlinear function is used at the output of each of them. Down-sampling occurs at layers D2 and D4 by applying 2 × 2 non-overlapping max pooling. Finally, the two fully-connected layers, F5 and F6, have 1024 and 10 neurons, respectively. The MNIST handwritten digit database (about 60,000 images) and the digit images from the 477 CDT images were used to train the CNN architecture used here to obtain the trained model, DeepN.
Figure 2. U-Net network architecture for this study. The U-Net network architecture is composed of a contracting path, an expansive path, and a final layer. The contracting path is made up of repetitive applications of two 3 × 3 convolutions and a 2 × 2 max pooling operation with stride 2 for down-sampling. The expansive path consists of two 3 × 3 convolutions and a 2 × 2 convolution for up-sampling. The final layer employs a 1 × 1 convolution for mapping each 16-component feature vector to the required number of classes.
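For concreteness, a minimal Keras sketch of a DeepN-style classifier with the stated layer shapes follows; it is an illustrative reconstruction rather than the authors' released code, and the optimizer and loss shown for DeepN are our assumptions (the text specifies binary cross entropy only for DeepC/DeepH). With 'valid' padding, C1 maps the 28 × 28 input to 24 × 24, which D2 pools to 12 × 12, matching the 12 × 12 × 32 feature maps described above.

```python
# Illustrative Keras sketch of the DeepN digit classifier (C1-D2-C3-D4-F5-F6).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_deepn():
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, (5, 5), activation="relu", name="C1"),  # 28 -> 24
        layers.MaxPooling2D((2, 2), name="D2"),                   # 24 -> 12
        layers.Conv2D(64, (5, 5), activation="relu", name="C3"),  # 12 -> 8
        layers.MaxPooling2D((2, 2), name="D4"),                   # 8 -> 4
        layers.Flatten(),                                         # 4*4*64
        layers.Dense(1024, activation="relu", name="F5"),
        layers.Dense(10, activation="softmax", name="F6"),
    ])
    # Optimizer and loss are assumptions; the paper does not state them for DeepN.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Shift augmentation used for the DeepC/DeepH training images, as stated above.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    width_shift_range=0.2, height_shift_range=0.2)
```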
Figure 4. Overall flowchart of the scoring method, (a) for parameter contour, (b) for parameters hands and center and (c) for parameter numbers.
Two forms of data, the sensor data and the clock drawing image, are generated for output once mCDT has been completed, as shown in Figure 4a. The clock drawing image, I_C, is intended to be of a clock face with numbers, hands and a contour. From the original 2400 × 1200 pixel drawing image at the CDT window, the clock drawing image I_C is resized to 128 × 128 pixels. Time stamps t[n] in seconds, x- and y-coordinates x[n] and y[n] in pixels, and touch-events e[n] of the sensor data for the 128 × 128 drawing image have a sampling rate of 50 Hz, with n being the index of a sample point. Each touch-event e[n] takes one of the values −1, 0, or 1: the value −1 designates the event ‘down’, the screen starting to be touched; 1 the event ‘up’, the screen touch ending; and 0 the event ‘move’, touching continuing while moving on the screen.
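As a concrete illustration, such a log can be held as four parallel arrays; the following minimal Python sketch assumes a CSV export with columns t, x, y, e, which is a hypothetical format rather than one specified by the paper.

```python
# Minimal sketch: load an mCDT-style sensor log into parallel numpy arrays.
# The CSV layout (columns t, x, y, e) is an assumed export format.
import numpy as np

def load_sensor_log(path):
    data = np.genfromtxt(path, delimiter=",", names=("t", "x", "y", "e"))
    t, x, y = data["t"], data["x"], data["y"]
    e = data["e"].astype(int)  # -1 = 'down', 0 = 'move', 1 = 'up'
    return t, x, y, e
```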
The sensor data x[n] and y[n], c_i ≤ n ≤ c_f, belonging to the contour in the clock drawing image I_C are obtained using the touch-events e[n], c_i ≤ n ≤ c_f, where c_i and c_f are the start and the end indices of the contour, respectively. These can be estimated from the touch-event down-shift from the event ‘down’ into the event ‘move’ and the touch-event up-shift from the event ‘move’ into the event ‘up’, respectively. In addition, the touch-events e[n], c_i < n < c_f, between the down- and up-shifts have to stay in the event ‘move’ for the longest time if such a period occurs more than once. In other words, the longest continuous sequence of 0 s in the touch-event e[n], starting with the value −1 and ending with the value 1, identifies itself as belonging to the contour in the clock drawing image I_C.
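A minimal Python sketch of this rule (the helper name is ours): collect every ‘down’-to-‘up’ stroke and keep the one with the longest run of ‘move’ samples as the contour.

```python
# Identify the contour stroke: among all 'down'(-1) ... 'up'(1) strokes,
# pick the one with the longest run of 'move'(0) samples in between.
def contour_indices(e):
    strokes, start = [], None
    for n, ev in enumerate(e):
        if ev == -1:                          # touch begins
            start = n
        elif ev == 1 and start is not None:   # touch ends
            strokes.append((start, n))        # candidate (c_i, c_f)
            start = None
    return max(strokes, key=lambda s: s[1] - s[0])

# usage: ci, cf = contour_indices(e); xc, yc = x[ci:cf + 1], y[ci:cf + 1]
```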
The contour image I_fc is segmented from the clock drawing image I_C using DeepC, the pre-trained model. Next, the percentage p_fc of the segmented image I_fc matching the corresponding portion of the clock drawing image I_C is estimated by Equation (1), where n(I_{C,c_i≤n≤c_f} ∩ I_fc) is the number of pixel coordinates that I_fc and the contour image I_{C,c_i≤n≤c_f} have in common, and n(I_{C,c_i≤n≤c_f}) is the total number of pixel coordinates in the sensor data belonging to the contour image.

$$p_{fc} = \frac{n(I_{C,\,c_i \le n \le c_f} \cap I_{fc})}{n(I_{C,\,c_i \le n \le c_f})} \tag{1}$$

$$p^k_{fh} = \frac{n(I_{C,\,h^k_i \le n \le h^k_f} \cap I_{fh})}{n(I_{C,\,h^k_i \le n \le h^k_f})} \tag{2}$$
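A small numpy sketch of Equation (1); it assumes the DeepC output and the rasterized contour stroke are available as boolean pixel masks, a representation the paper does not spell out. Equation (2) is the same computation applied to the kth hand stroke and the DeepH segmentation I_fh.

```python
# p_fc from Eq. (1): fraction of contour-stroke pixels covered by the
# DeepC segmentation. Both inputs are H x W boolean masks.
import numpy as np

def overlap_percentage(stroke_mask, segmented_mask):
    n_common = np.logical_and(stroke_mask, segmented_mask).sum()
    n_stroke = stroke_mask.sum()
    return 100.0 * n_common / n_stroke if n_stroke else 0.0
```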
The sensor data x[n] and y[n], h^k_i ≤ n ≤ h^k_f, k = 1, 2, belonging to one of the hour and minute hand images in the modified clock drawing image I_C′ are obtained using the touch events e[n], h^k_i ≤ n ≤ h^k_f, k = 1, 2, where h^k_i and h^k_f are the start and the end indices of the hand, respectively, which can be estimated by a touch-event down-shift from the event ‘down’ into the event ‘move’ and by the touch-event up-shift from the event ‘move’ into the event ‘up’, respectively. Here, the time stamps between t[h^k_i] and t[h^k_f] have to overlap with those between t_i[j] and t_f[j], j = 1, 2, . . . , N. If there is only one hand or none, so that such a touch-event down- or up-shift is not identified in the modified clock drawing image I_C′, then the corresponding time stamps t[h^k_i] and t[h^k_f] are treated as missing values, NAs.
The minimum and maximum x- and y-coordinates in pixels, x^c_min and x^c_max, and y^c_min and y^c_max, of the sensor data x[n] and y[n], c_i ≤ n ≤ c_f, are estimated as the boundary of the contour and formulated by Equations (3)–(6), respectively.

$$x^c_{min} = \min_{c_i \le n \le c_f} x[n] \tag{3}$$

$$x^c_{max} = \max_{c_i \le n \le c_f} x[n] \tag{4}$$

$$y^c_{min} = \min_{c_i \le n \le c_f} y[n] \tag{5}$$

$$y^c_{max} = \max_{c_i \le n \le c_f} y[n] \tag{6}$$

The x- and y-coordinates in pixels, x^c_mid and y^c_mid, of the center point of the contour are defined by the bisecting point of the minimum and maximum x- and y-coordinates, x^c_min, x^c_max, y^c_min and y^c_max, as formulated by Equations (7) and (8), respectively:

$$x^c_{mid} = (x^c_{min} + x^c_{max})/2 \tag{7}$$

$$y^c_{mid} = (y^c_{min} + y^c_{max})/2 \tag{8}$$
Pre-estimated point positions P_k = (x_d[k], y_d[k]), k = 1, 2, . . . , 12, of the clock number digits from 0 to 12 are evaluated as shown in Figure 4c by using the boundaries x^c_min, x^c_max, y^c_min and y^c_max of the contour along with the center point (x^c_mid, y^c_mid), where x_d[k] and y_d[k] are the x- and y-coordinates in pixels of the kth digit number. Table 3 summarizes the corresponding formula for each of the pre-estimated positions P_k, k = 1, 2, . . . , 12.
Table 3. Formulas of the pre-estimated positions of the number digits from 0 to 12.

Number Digit k    x_d[k]                      y_d[k]
1                 (2x^c_mid + x^c_max)/3      (2y^c_max + y^c_mid)/3
2                 (x^c_mid + 2x^c_max)/3      (y^c_max + 2y^c_mid)/3
3                 x^c_max                     y^c_mid
4                 (x^c_mid + 2x^c_max)/3      (y^c_min + 2y^c_mid)/3
5                 (2x^c_mid + x^c_max)/3      (2y^c_min + y^c_mid)/3
6                 x^c_mid                     y^c_min
7                 (2x^c_mid + x^c_min)/3      (2y^c_min + y^c_mid)/3
8                 (x^c_mid + 2x^c_min)/3      (y^c_min + 2y^c_mid)/3
9                 x^c_min                     y^c_mid
10                (x^c_mid + 2x^c_min)/3      (y^c_max + 2y^c_mid)/3
11                (2x^c_mid + x^c_min)/3      (2y^c_max + y^c_mid)/3
12                x^c_mid                     y^c_max
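The Table 3 geometry can be written compactly; the sketch below (function name ours) also reproduces the bounding-box and midpoint computations of Equations (3)–(8).

```python
# Pre-estimated positions P_k of the clock digits (Table 3), computed from
# the contour bounding box and its midpoint, Eqs. (3)-(8).
def digit_positions(x, y, ci, cf):
    xs, ys = x[ci:cf + 1], y[ci:cf + 1]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return {
        1:  ((2 * xmid + xmax) / 3, (2 * ymax + ymid) / 3),
        2:  ((xmid + 2 * xmax) / 3, (ymax + 2 * ymid) / 3),
        3:  (xmax, ymid),
        4:  ((xmid + 2 * xmax) / 3, (ymin + 2 * ymid) / 3),
        5:  ((2 * xmid + xmax) / 3, (2 * ymin + ymid) / 3),
        6:  (xmid, ymin),
        7:  ((2 * xmid + xmin) / 3, (2 * ymin + ymid) / 3),
        8:  ((xmid + 2 * xmin) / 3, (ymin + 2 * ymid) / 3),
        9:  (xmin, ymid),
        10: ((xmid + 2 * xmin) / 3, (ymax + 2 * ymid) / 3),
        11: ((2 * xmid + xmin) / 3, (2 * ymax + ymid) / 3),
        12: (xmid, ymax),
    }
```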
Next, each of the number images I^j_C′, j = 1, 2, . . . , N, corresponding to a digit number is cropped out from the modified clock drawing image I_C′ using the function findContours() of OpenCV2, where N is the total number of the digit images cropped out and j is the index sorted by the time stamps in ascending order. Here, the function findContours() can be used for finding the outer contours of white objects on a black background [35].
The model DeepN classifies each of the number images I^j_C′, j = 1, 2, . . . , N, into one of 10 integer values ranging from 0 to 9, inclusive, and saves the identified integer in D[j]. At the same time, spatial data L_ux[j], L_uy[j], L_dx[j], and L_dy[j], j = 1, 2, . . . , N, of the outer contours and the corresponding time stamps t_i[j] and t_f[j], j = 1, 2, . . . , N, are generated, where L_ux[j] and L_uy[j] are the upper left x- and y-coordinates in pixels of the jth outer contour, respectively; L_dx[j] and L_dy[j] are the lower right x- and y-coordinates in pixels of the jth outer contour, respectively; t_i[j] ∈ t[n] and t_f[j] ∈ t[n] are the corresponding initial and final time stamps of the sensor data belonging to the number image in the jth outer contour, respectively; and the index j is sorted by the time stamp t_i[j].
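A short OpenCV sketch of this cropping step (helper name ours); each crop would then be resized to 28 × 28 and passed to DeepN.

```python
# Crop candidate digit images from the binarized clock image using OpenCV
# outer contours; boxes are (L_ux, L_uy, L_dx, L_dy) per cropped number.
import cv2

def crop_digit_boxes(binary_img):
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes, crops = [], []
    for c in contours:
        x0, y0, w, h = cv2.boundingRect(c)
        boxes.append((x0, y0, x0 + w, y0 + h))   # upper-left, lower-right
        crops.append(binary_img[y0:y0 + h, x0:x0 + w])
    return boxes, crops
```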
Figure 5. Flowcharts for scoring presence of all the numbers and no additional numbers; for scoring correctness of the order of the numbers; for scoring correctness of the positions of the numbers; and for scoring positioning of the numbers within the contour.
The correctness of the position of the numbers is evaluated by the classified outputs D[j], L_ux[j], L_uy[j], L_dx[j], and L_dy[j], t_i[j] and t_f[j], j = 1, 2, . . . , N, of DeepN for the cropped number images I^j_C′, j = 1, 2, . . . , N, as well as the predefined positions P(x_d[k], y_d[k]), k = 1, 2, . . . , 12, of the number digits from 0 to 12. The center position P(x_m[j], y_m[j]), j = 1, 2, . . . , N, of each of the cropped number images I^j_C′ is estimated by the bisecting point of the upper left point P(L_ux[j], L_uy[j]) and the lower right point P(L_dx[j], L_dy[j]), j = 1, 2, . . . , N, as given by Equations (9) and (10).

$$x_m[j] = (L_{ux}[j] + L_{dx}[j])/2, \quad j = 1, 2, \ldots, N \tag{9}$$

$$y_m[j] = (L_{uy}[j] + L_{dy}[j])/2, \quad j = 1, 2, \ldots, N \tag{10}$$

The digit number k in D[j] is identified and the distance d_cn[j] is estimated between the center position P(x_m[j], y_m[j]) of the jth cropped number image I^j_C′ and the predefined position P(x_d[k], y_d[k]) corresponding to the identified digit number k in D[j]. Then, the correctness of the position of the numbers is identified if the percentage of the distances d_cn[j], j = 1, 2, . . . , N, within a given limit ℓ_dc is greater than a given value θ_dc.
The positioning of the numbers within the contour is evaluated by using the center point P(x_m[j], y_m[j]), j = 1, 2, . . . , N, of each of the cropped number images I^j_C′ and the contour sensor data, x[n] and y[n], c_i ≤ n ≤ c_f. A circle F_cL is fitted to the contour sensor data, x[n] and y[n], c_i ≤ n ≤ c_f, using the least squares circle fitting algorithm [37], where the center P(x[n_cL], y[n_cL]) and the radius R_cL of the fitted circle F_cL are obtained. Similarly, a circle F_cN is fitted to the center points P(x_m[j], y_m[j]), j = 1, 2, . . . , N, using also the least squares circle fitting algorithm, where the center point P(x[n_cN], y[n_cN]) and the radius R_cN of the fitted circle F_cN are obtained. Then, the positioning of the numbers is identified if the radius R_cN of the fitted circle F_cN is smaller than the radius R_cL of the fitted circle F_cL.
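The least squares circle fitting algorithm of [37] is not reproduced in the paper; the common algebraic (Kasa-style) formulation below is sketched as one plausible realization.

```python
# Algebraic least-squares circle fit: solve a*x + b*y + c ~= x^2 + y^2,
# then recover center (a/2, b/2) and radius sqrt(c + cx^2 + cy^2).
import numpy as np

def fit_circle(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    cx, cy = a / 2, b / 2
    return (cx, cy), np.sqrt(c + cx**2 + cy**2)

# e.g., (_, R_cL) = fit_circle(xc, yc)   # contour sensor data
#       (_, R_cN) = fit_circle(xm, ym)   # digit centers; pass if R_cN < R_cL
```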
2.4.3. Scoring on Criteria of Hands Parameter
Figure 6 shows the flowchart suggested in this study for scoring the presence of two or one hand, correctness of the proportion of the hands, correctness of the hour target number, and correctness of the minute target number.
Figure 6. Flowcharts for scoring presence of one or two hands, correctness of the proportion of the hands, correctness of the hour target number, and correctness of the minute target number.
Presence of two or one hand is evaluated by the percentage p_fh of the segmented image I_fh matching the corresponding portion of the clock drawing image I_C. The presence of two hands is identified if the value of the percentage p_fh is larger than a given threshold θ_h1. The presence of one hand is identified if the value of the percentage p_fh is larger than a given threshold θ_h2. Here, the value of the given threshold θ_h2 is smaller than that of the given threshold θ_h1, since DeepH is trained with images of the clock face with two hands, so that the criteria of the two hands is included in the criteria of the one hand.
Indication of the correct proportion of hands is evaluated by using the hands sensor data x[n], y[n], t[n], and e[n], h_i ≤ n ≤ h_f, between the time stamps t[h_i] and t[h_f]. Here, indication of the presence of two hands is a prerequisite of the indication of the correct proportion of hands. The hands sensor data x[n], y[n], t[n], and e[n], h_i ≤ n ≤ h_f, is divided into two sets H_1 = {x[n], y[n], t[n], e[n] | h_i ≤ n < h_m} and H_2 = {x[n], y[n], t[n], e[n] | h_m < n ≤ h_f} using a line clustering algorithm [38]. Here, the time stamp t[h_m], h_i < h_m < h_f, is the time point of the intermission between the two sets H_1 and H_2. Then, the length ℓ^1_h of one hand is estimated as the maximum distance between P(x[n], y^1_min), h_i ≤ n < h_m, and P(x[n], y^1_max), h_i ≤ n < h_m, using the hand sensor data in the set H_1, where y^1_min and y^1_max are the minimum and maximum of the y coordinates in the set H_1. Similarly, the length ℓ^2_h of another hand is estimated as the maximum distance between P(x[n], y^2_min), h_m < n ≤ h_f, and P(x[n], y^2_max), h_m < n ≤ h_f, using the hand sensor data in the set H_2, where y^2_min and y^2_max are the minimum and maximum of the y coordinates in the set H_2. Finally, the indication of the correct proportion of hands is identified if the length difference Δℓ_h = ℓ^2_h − ℓ^1_h between the lengths of one and another hand is larger than a given number θ_pr.
Indication of the hour target number is evaluated by using the hands sensor data x[n], y[n], t[n], and e[n], h_i ≤ n ≤ h_f, between the time stamps t[h_i] and t[h_f]. Two different cases are considered here; one is that the presence of two hands is identified, and the other is that the presence of only one hand is identified. For the first case, the hour hand sensor data S_h, the one of the two sets H_1 and H_2 with the larger data size, is fitted to a line and then the fitted line is extrapolated within a range [y_h,min, y_h,max] of y pixel coordinates, where y_h,min is the minimum value of the y coordinates y[n], h_i ≤ n < h_f, in the hands sensor data and y_h,max is the maximum of the y coordinates y[n], c_i ≤ n < c_f, in the contour sensor data. For the second case, the whole hands sensor data is fitted to a line and then the fitted line is extrapolated within the range [y_h,min, y_h,max] of y pixel coordinates. Next, the closest point P_h(x[n_h], y[n_h]), c_i ≤ n < c_f, of the contour sensor data to the extrapolated line is evaluated. Finally, the indication of the hour target number is identified if the point P_h(x[n_h], y[n_h]) is within a given range from the predefined pixel point P(x_d[h_t], y_d[h_t]) of the given hour target digit h_t.
Indication of the minute target number is similarly evaluated by using the hands sensor data x[n], y[n], t[n], and e[n], h_i ≤ n ≤ h_f, between the time stamps t[h_i] and t[h_f]. Two different cases are considered here; one is that the presence of two hands is identified, and the other is that the presence of only one hand is identified. For the first case, the minute hand sensor data S_m, the one of the two sets H_1 and H_2 with the smaller data size, is fitted to a line and then the fitted line is extrapolated within a range [y_h,min, y_h,max] of y pixel coordinates, where y_h,min is the minimum value of the y coordinates y[n], h_i ≤ n < h_f, in the hands sensor data, and y_h,max is the maximum of the y coordinates y[n], c_i ≤ n < c_f, in the contour sensor data. For the second case, the whole hands sensor data is fitted to a line and then the fitted line is extrapolated within the range [y_h,min, y_h,max] of y pixel coordinates. Next, the closest point P(x[n_m], y[n_m]), c_i ≤ n < c_f, of the contour sensor data to the extrapolated line is evaluated. Finally, the indication of a minute target number is identified if the point P(x[n_m], y[n_m]) is within a given range from the predefined pixel point P(x_d[m_t], y_d[m_t]) of the given minute target digit m_t. Here, the point P(x[n_m], y[n_m]) has to be the same as the point P(x[n_h], y[n_h]) if the indication of the hour target number is already identified.
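A rough Python sketch of the hour-target check described above (the helper name and the x-on-y line parameterization, chosen to tolerate near-vertical hands, are ours):

```python
# Fit the hour-hand samples to a line, extrapolate it, find the closest
# contour sample, and test whether it lies near the target digit position.
import numpy as np

def hour_target_hit(hand_x, hand_y, contour_x, contour_y, target_xy, eps):
    a, b = np.polyfit(hand_y, hand_x, 1)          # least-squares x = a*y + b
    cx = np.asarray(contour_x, float)
    cy = np.asarray(contour_y, float)
    d = np.abs(cx - (a * cy + b)) / np.sqrt(1 + a * a)  # point-line distance
    nh = int(np.argmin(d))                        # closest contour point
    return np.hypot(cx[nh] - target_xy[0], cy[nh] - target_xy[1]) <= eps
```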
2.4.4. Scoring on Criteria of Center Parameter
Presence or inference of the center point of the clock face in the drawing image I_C is identified, as shown in Figure 4, if the presence of two or one hand is identified. Also, presence or inference of the center is identified if there is a data point within a given range from the center point P(x^c_mid, y^c_mid).
2.4.5. Assignment of Scores
Table 4 lists the conditions for assigning scores for each parameter in mCDT; the heuristic values of all the thresholds used in this study are summarized in the footnote.
Table 4. Cont.
Total sum 13
* Heuristic values of the thresholds: 1 θ_c1 = 75.00 pixels; 2 θ_c2 = 50.00 pixels; 3 θ_c3 = 0.1; 4 θ_n = 65.00 pixels; 5 ℓ_dc = 100.00 pixels; 6 θ_dc = 0.65; 7 θ_h1 = 65.00 pixels; 8 θ_h2 = 50.00 pixels; 9 θ_pr = 30.00 pixels; 10 ε = 200.00 pixels; 11 ε_c = 75.00 pixels.
The score of the contour parameter is determined via the percentage p_fc, the maximum contour closure distance d^c_max, and the ratio of the contour to the CDT window sizes, A_c/W_c. The score of the circular contour is a 1 if the percentage p_fc is greater than a given threshold θ_c1; the score of the closed contour is a 1 if the maximum contour closure distance d^c_max is smaller than a given threshold θ_c2; the score of the appropriately sized contour is a 1 if the ratio of the contour to the CDT window sizes A_c/W_c is greater than a given threshold θ_c3.
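These three checks reduce to a few comparisons; a compact sketch using the threshold values from the Table 4 footnote, with the closure test direction following the worked examples of Section 3.1:

```python
# Contour score (0-3): circularity, closure, and size checks.
def score_contour(p_fc, d_max, area_ratio,
                  th_c1=75.0, th_c2=50.0, th_c3=0.1):
    circular = p_fc > th_c1      # DeepC overlap percentage vs theta_c1
    closed = d_max < th_c2       # max contour closure distance (pixels)
    sized = area_ratio > th_c3   # contour size / CDT window size
    return int(circular) + int(closed) + int(sized)
```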
The score of the numbers parameter is determined by the contour sensor data, x[n] and y[n], c_i ≤ n ≤ c_f, the outputs D[j], L_ux[j], L_uy[j], L_dx[j], L_dy[j], t_i[j] and t_f[j], j = 1, 2, . . . , N, of DeepN, the reference number sequences S_i, i = 1, 2, 3, and the predefined number positions P(x_d[k], y_d[k]), k = 1, 2, . . . , 12, of the number digits from 0 to 12. There are four criteria on the numbers parameter: the presence of all the numbers and no additional numbers, the correctness of the order of the numbers, the correctness of the position of the numbers, and the positioning of the numbers within the contour. The score of the presence of all the numbers and no additional numbers is a 1 if the total number N is equal to 15, all the values in D[j], j = 1, 2, . . . , N, are in the range from 0 to 9, and the counts of the digits 1 and 2 are 5 and 2, respectively. The score of the correctness of the order of the numbers is a 1 if the maximum value R_seq of the percentages of matched sequence between the number sequence S_N and each of the reference number sequences S_1, S_2 and S_3 is greater than a given threshold θ_n. The score of the correctness of the position of the numbers is a 1 if the percentage of the distances d_cn[j], j = 1, 2, . . . , N, within a given limit ℓ_dc is greater than a given value θ_dc, where the distance d_cn[j] is estimated between the jth center position P((L_ux[j] + L_dx[j])/2, (L_uy[j] + L_dy[j])/2) and the predefined position P(x_d[k], y_d[k]) corresponding to the identified digit number k in D[j]. The score of the positioning of the numbers within the contour is a 1 if the radius R_cN is smaller than the radius R_cL, where the radius R_cN is of the circle F_cN fitted to the center points P(x_m[j], y_m[j]), j = 1, 2, . . . , 15, and the radius R_cL is of the circle F_cL fitted to the contour sensor data, x[n] and y[n], c_i ≤ n ≤ c_f. The score of the criteria on the numbers parameter is the sum of the scores of the presence of all the numbers and no additional numbers, the correctness of the order of the numbers, the correctness of the position of the numbers, and the positioning of the numbers within the contour.
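For instance, the presence criterion can be phrased directly over the DeepN outputs; a minimal sketch (writing 1–12 digit by digit yields 15 crops, in which the digit 1 occurs five times and the digit 2 twice):

```python
# Presence of all numbers and no extras, per the criterion above.
def all_numbers_present(D):
    return (len(D) == 15
            and all(0 <= d <= 9 for d in D)
            and D.count(1) == 5     # '1' in 1, 10, 11 (twice), 12
            and D.count(2) == 2)    # '2' in 2 and 12
```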
The score of the hands parameter is determined by the percentage p_fh, the linearly clustered sets H_1 and H_2 of the hands sensor data x[n] and y[n], h_i ≤ n ≤ h_f, the contour sensor data x[n] and y[n], c_i ≤ n ≤ c_f, and the point P(x[n_h], y[n_h]) in the range [y^i_min, y_h,max], i = 1, 2, where y^i_min is the minimum of the y coordinates y[n] in H_i, i = 1, 2, and y_h,max is the maximum of the y coordinates y[n], c_i ≤ n < c_f. There are four criteria on the hands parameter: the presence of two or one hand, the indication of the correct proportion of hands, the indication of the hour target number, and the indication of the minute target number. The score of the presence of two hands or one hand is a 2 if the value of the percentage p_fh is larger than a given threshold θ_h1, and a 1 if the value of the percentage p_fh is larger than a given threshold θ_h2. The score of the indication of the correct proportion of hands is a 1 if the value of the percentage p_fh is larger than a given threshold θ_h1 and the size difference Δℓ_h of the extrapolated and fitted lines in the data between H_1 and H_2 is larger than a given number θ_pr, where H_1 and H_2 are the two sets divided by a line cluster algorithm on the hands sensor data, x[n] and y[n], h_i ≤ n ≤ h_f. The score of the indication of the hour target number is a 1 if the point P(x[n_h], y[n_h]) is within a given range of a predefined pixel point of the given hour target, where the point P(x[n_h], y[n_h]) is obtained by estimating the closest point of the extrapolated hands sensor data spatially to the contour sensor data within the range [y_h,min, y_h,max]. The score of the indication of the minute target number is a 1 if the point P(x[n_m], y[n_m]) is within a given range of predefined pixel points of the given minute target, where the point P(x[n_m], y[n_m]) is obtained by estimating the closest point of the extrapolated hands sensor data spatially to the contour sensor data within the range [y_h,min, y_h,max]. Here, the point P(x[n_m], y[n_m]) has to be the same as the point P(x[n_h], y[n_h]) if the indication of the hour target number is already identified. The score of the criteria on the hands parameter is the sum of the scores of the presence of two hands or one hand, the indication of the correct proportion of hands, the indication of the hour target number, and the indication of the minute target number.
The score of the criteria of the center parameter is determined by the center point P(x^c_mid, y^c_mid) of the contour sensor data, x[n] and y[n], c_i ≤ n ≤ c_f. The score of the presence or the inference of the center of the clock face is a 1 if the presence of two or one hand is identified, or if there is a data point within a given range from the center point P(x^c_mid, y^c_mid).

3. Results
3.1. Scoring on Criteria of Contour Parameter
Figure 7 depicts separate examples of original drawings (first column), each with the segmented image (second column) perceived by DeepC for a detected contour, the overlapping image (third column) of the original and the segmented images, and the corresponding parameter values including the total score estimated.
Figure 7. Four representative examples demonstrating how contours in original images are detected and classified in their types and sizes, showing the original image (first column) with the segmented image (second column), the overlapping image (third column) and the corresponding parameter values (fourth column) of each example; (a) the case in which the original image is of a closed circular contour sized appropriately; (b) the case in which the original image is of a circular contour sized appropriately, but not wholly closed; (c) the case in which the original image is of an appropriately sized but neither closed nor circular contour; and (d) the case in which the original image is of a closed circular contour but sized not appropriately. * Total score/score of contour/score of numbers/score of hands/score of center.
In Figure 7a, where the original image is of a closed circular contour sized appropriately, the segmented image has the estimated percentage p_fc of 95.74%, the maximum contour closure distance has the evaluated pixel value d^c_max of 9.67 pixels, and the ratio of the contour to the CDT window size has the estimated value A_c/W_c of 0.339. Both the closure and the circularity of the contour were evaluated to be one, as p_fc was greater than the 75.00 score for θ_c1, a threshold heuristically set, and d^c_max was less than the 50.00 score for θ_c2, a threshold heuristically set. The size of the contour was also evaluated to be one, as A_c/W_c was greater than the 0.1 score for θ_c3, a threshold heuristically set. Therefore, the total score of the contour parameter was evaluated to be three.
Figure 7b has an original drawing image of a circular contour sized appropriately, but not wholly closed. The segmented image has the estimated percentage p_fc of 89.66%, the maximum contour closure distance has the evaluated pixel value d^c_max of 55.56 pixels, and the ratio of the contour to the CDT window sizes has the estimated value A_c/W_c of 0.337. The closure of the contour was evaluated to be zero, as d^c_max was greater than the 50.00 score for θ_c2; however, both the circularity and the size were gauged to be one, as the estimated percentage p_fc was greater than the 75.00 score for θ_c1 and A_c/W_c was greater than the 0.1 score for θ_c3. Therefore, the total score of the contour parameter was evaluated to be two.
Figure 7c shows an example of an original drawing of an appropriately sized, but neither closed nor circular, contour. The segmented image has the estimated percentage p_fc of 52.31%, the maximum contour closure distance has the evaluated pixel value d^c_max of 51.56 pixels, and the ratio of the contour to the CDT window sizes has the estimated value A_c/W_c of 0.237. Both the closure and the circularity of the contour were evaluated to be zero, as p_fc was not greater than the 75.00 score for θ_c1 and d^c_max was greater than the 50.00 score for θ_c2; however, the size was gauged to be one, as A_c/W_c was greater than the 0.1 score for θ_c3. Therefore, the total score of the contour parameter was evaluated to be one.
Finally, the original drawing image of Figure 7d depicts an example of a closed circular contour, but not sized appropriately. The segmented image has the estimated percentage p_fc of 97.44%, the maximum contour closure distance has the evaluated pixel value d^c_max of 32.01 pixels, and the ratio of the contour to the CDT window sizes has the estimated value A_c/W_c of 0.061. Both the closure and the circularity of the contour were evaluated to be one, as p_fc was greater than the 75.00 score for θ_c1 and d^c_max was less than the 50.00 score for θ_c2; however, the size was gauged to be zero, as A_c/W_c was less than the 0.1 score for θ_c3. Therefore, the total score of the contour parameter was evaluated to be two.
3.2. Scoring on Criteria of Number Parameter
In eight representative examples, Figure 8 demonstrates how numbers in original images are detected and classified in their orders and positions, showing the binarized image (left), the original image with the cropped areas for the numbers (middle), and the corresponding parameter values (right) of each example.
Figure 8. Eight representative examples demonstrating how numbers in original images are detected and classified in their orders and positions, showing the binarized image (left), the original image with the cropped areas for the numbers (middle), and the corresponding parameter values (right) of each example; (a) the case in which all the numbers without any additional numbers are present in the correct orders and the proper positions within the contour; (b) the case in which all the numbers without any additional numbers are present in the correct orders within the contour but not in proper positions; (c) the case in which all the numbers without any additional numbers are present in the correct orders but neither in the proper positions nor within the contour; (d) the case in which all the numbers without any additional numbers are present in the correct orders within the contour but not in the proper positions; (e) the case in which some numbers are missing and the presented numbers are not in proper positions but mostly in correct order within the contour; (f) the case in which there are additional numbers not belonging to a clock but the numbers are in correct orders within the contour; (g) the case in which some numbers are missing and the presented numbers are not in proper positions but mostly in correct order within the contour; and (h) the case in which many numbers are missing and the presented numbers are not in proper positions and correct order but within the contour. * Total score/score of contour/score of numbers/score of hands/score of center.

Figure 8a displays the case in which all the numbers without any additional numbers are present in the correct orders and the proper positions within the contour. The classified output D[j], j = 1, 2, . . . , N, of DeepN, the pretrained model, was identified to have no missing values, since the total number N was equal to 15, and the classified output D[j],
j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j, and the
numbers n( D [ j] = 1) and n( D [ j] = 2) were five and two, respectively. Therefore, the
presence of all the numbers and no additional numbers was evaluated to be one. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 100.00%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 100.00 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was greater
than 0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of
the digit numbers k in D [ j] in pixels was specifically 46.0, 66.9, 27.5, 32.0, 33.0, 16.6, 119.0,
123.0, 38.0, 51.1, 40.7 and 22.6. Therefore, the correctness of the position of the numbers
was evaluated to be one. The radius RcL of the fitted circle FcL to the contour sensor data
was obtained to be 454.7 pixels. The radius RcN of the fitted circle FcN to the center point
P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N
was obtained to be 351.6 pixels. Now, the positioning of the numbers within the contour
was evaluated to be one, since the radius RcN of the fitted circle FcN is smaller than the
radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was
evaluated to be four.
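To make the preceding walk-through concrete, the following is a minimal Python sketch of the four criteria of the numbers parameter, assuming the quantities described above (the DeepN classifications D, the order ratio Rseq, the distances dcn, and the fitted radii RcN and RcL) have already been computed; all function and variable names are hypothetical, and the thresholds are the heuristic values quoted in the text.

```python
from collections import Counter

# Hypothetical sketch of the numbers-parameter scoring (0 to 4 points).
# A complete clock face yields N = 15 digit crops: 1-9 plus the two-digit
# numbers 10, 11 and 12, so five crops classify as "1" and two as "2".
def presence_score(D):
    counts = Counter(D)
    complete = (len(D) == 15 and all(0 <= d <= 9 for d in D)
                and counts[1] == 5 and counts[2] == 2)
    return 1 if complete else 0

def position_score(dcn, limit_dc=100.0, theta_dc=0.65):
    # dcn[j]: pixel distance of each number from its expected position
    return 1 if sum(d <= limit_dc for d in dcn) / len(dcn) > theta_dc else 0

def numbers_score(D, r_seq, dcn, r_cn, r_cl, theta_n=65.0):
    score = presence_score(D)              # all present, none additional
    score += 1 if r_seq > theta_n else 0   # correctness of the order
    score += position_score(dcn)           # correctness of the positions
    score += 1 if r_cn < r_cl else 0       # numbers lie within the contour
    return score
```

With the Figure 8a values (N = 15, Rseq = 100.00, ten of the twelve distances within 100 pixels, and RcN = 351.6 < RcL = 454.7), this sketch returns four, matching the score above.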
Figure 8b displays the case in which all the numbers without any additional numbers
are present in the correct orders within the contour but not in proper positions. The
classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to
have no missing values, since the total number N was equal to 15, and the classified output
D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j, and
the numbers n( D [ j] = 1) and n( D [ j] = 2) were five and two, respectively. Therefore, the
presence of all the numbers and no additional numbers was evaluated to be one. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 86.66%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 86.66 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] in pixels for each
of the digit numbers k in D [ j] was specifically 243.0, 367.8, 366.93, 594.4, 518.2, 398.0, 491.1,
365.9, 418.2, 404.0, 628.2 and 143.6. Therefore, the correctness of the position of the numbers
was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour sensor data
was obtained to be 447.6 pixels. The radius RcN of the fitted circle FcN to the center point
P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N
was obtained to be 338.1 pixels. Now, the positioning of the numbers within the contour
was evaluated to be one, since the radius RcN of the fitted circle FcN is smaller than the
radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was
evaluated to be three.
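The maximum ratio Rseq can be obtained with difflib's SequenceMatcher, which the implementation references [36]; below is a short sketch under that assumption, where the three reference strings are illustrative placeholders, since the exact sequences Si used by mCDT are not given here.

```python
from difflib import SequenceMatcher

# Illustrative reference digit strings S_i; the actual sequences used by
# mCDT are not spelled out in the text, so these are placeholders only.
REFERENCES = [
    "123456789101112",   # digits of 1..12 written clockwise from 1
    "121234567891011",   # e.g., a drawing started at 12
    "101112123456789",   # e.g., a drawing started at 10
]

def order_ratio(D):
    # S_N: the recognized digits joined in drawing order
    s_n = "".join(str(d) for d in D)
    # R_seq: best similarity (in percent) against the reference sequences
    return max(SequenceMatcher(None, s_n, r).ratio() for r in REFERENCES) * 100.0

D = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 0, 1, 1, 1, 2]
print(order_ratio(D))  # 100.0 for a perfectly ordered clock, as in Figure 8a
```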
Figure 8c displays the case in which all the numbers without any additional numbers
are present in the correct orders, but not in the proper positions nor within the contour.
The classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified
to have no missing values, since the total number N was equal to 15, and the classified
output D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j,
and the numbers n( D [ j] = 1) and n( D [ j] = 2) were five and two, respectively. Therefore,
the presence of all the numbers and no additional numbers was evaluated to be one. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 93.33%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 93.33 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the
digit numbers k in D [ j] was specifically 147.7, 170.7, 214.2, 364.0, 369.4, 139.1, 114.3, 226.4,
157.0, 242.5, 224.0 and 127.6. Therefore, the correctness of the position of the numbers
was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour sensor data
was obtained to be 538.5 pixels. The radius RcN of the fitted circle FcN to the center point
P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N
was obtained to be 694.2 pixels. Now, the positioning of the numbers within the contour
was evaluated to be zero, since the radius RcN of the fitted circle FcN is larger than the
radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was
evaluated to be two.
Figure 8d displays the case in which all the numbers without any additional numbers
are present in the correct orders within the contour, but not in the proper positions. The
classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to
have no missing values, since the total number N was equal to 15, and the classified output
D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j, and
the numbers n( D [ j] = 1) and n( D [ j] = 2) were five and two, respectively. Therefore, the
presence of all the numbers and no additional numbers was evaluated to be one. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 100.00%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 100.00 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the digit
numbers k in D [ j] was specifically 618.19, 1106.19, 1408.42, 931.44, 378.00, 185.95, 630.61,
1079.10, 1381.07, 1314.39, 720.95 and 58.69. Therefore, the correctness of the position of the
numbers was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour
sensor data was obtained to be 554.8 pixels. The radius RcN of the fitted circle FcN to
the center point P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j,
j = 1, 2, . . . , N was obtained to be 330.8 pixels. Now, the positioning of the numbers within
the contour was evaluated to be one, since the radius RcN of the fitted circle FcN is smaller
than the radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter
was evaluated to be three.
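The radii RcL and RcN come from circles fitted to the contour sensor data and to the number center points, respectively; the text does not name the fitting procedure, so the sketch below uses a standard algebraic least-squares circle fit as one plausible choice, not necessarily the one implemented in mCDT.

```python
import numpy as np

def fit_circle_radius(x, y):
    # Algebraic least-squares fit: x^2 + y^2 = a*x + b*y + c, so that the
    # center is (a/2, b/2) and the radius is sqrt(c + (a/2)^2 + (b/2)^2).
    x, y = np.asarray(x, float), np.asarray(y, float)
    A = np.column_stack([x, y, np.ones_like(x)])
    rhs = x**2 + y**2
    a, b, c = np.linalg.lstsq(A, rhs, rcond=None)[0]
    return np.sqrt(c + (a / 2) ** 2 + (b / 2) ** 2)

# Criterion: the numbers lie within the contour if R_cN < R_cL.
def within_contour(contour_x, contour_y, centers_x, centers_y):
    r_cl = fit_circle_radius(contour_x, contour_y)   # contour sensor data
    r_cn = fit_circle_radius(centers_x, centers_y)   # number center points
    return 1 if r_cn < r_cl else 0
```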
Figure 8e displays the case in which some numbers are missing and the presented
numbers are not in proper positions, but mostly in correct order within the contour. The
classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to
have some missing values, since the total number N was equal to 13, and the classified
output D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j,
and the numbers n( D [ j] = 1) and n( D [ j] = 2) were four and two, respectively. Therefore,
the presence of all the numbers and no additional numbers was evaluated to be zero. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 85.71%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 85.71 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the
digit numbers k in D [ j] was specifically 497.8, 462.5, 350.7, 399.3, 415.1, 254.5, 208.5, 1037.8,
836.2, 792.1, 743.6 and 952.1. Therefore, the correctness of the position of the numbers
was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour sensor data
was obtained to be 319.8 pixels. The radius RcN of the fitted circle FcN to the center point
P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N
was obtained to be 193.2 pixels. Now, the positioning of the numbers within the contour
was evaluated to be one, since the radius RcN of the fitted circle FcN is smaller than the
radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was
evaluated to be two.
Figure 8f displays the case in which there are additional numbers not belonging to
a clock but the numbers are in correct orders within the contour. The classified output
D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to have some additional numbers, since the total number N was equal to 42, and the classified output D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j, and the numbers n( D [ j] = 1) and n( D [ j] = 2) were thirteen and nine, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 83.57%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 83.57 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the
digit numbers k in D [ j] was specifically 376.5, 483.4, 432.1, 636.2, 743.4, 856.4, 947.7, 1056.9,
1171.4, 1113.0, 826.2 and 837.7. Therefore, the correctness of the position of the numbers
was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour sensor data
was obtained to be 694.2 pixels. The radius RcN of the fitted circle FcN to the center point
P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N
was obtained to be 446.1 pixels. Now, the positioning of the numbers within the contour
was evaluated to be one, since the radius RcN of the fitted circle FcN was smaller than the
radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was
evaluated to be two.
Figure 8g displays the case in which some numbers are missing and the presented
numbers are not in proper positions, but mostly in correct order within the contour. The
classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to have some missing numbers, since the total number N was equal to 11, and the classified
output D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j,
and the numbers n( D [ j] = 1) and n( D [ j] = 2) were two and one, respectively. Therefore,
the presence of all the numbers and no additional numbers was evaluated to be zero. The
maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 66.66%. Therefore, the correctness of the order
of the numbers was evaluated to be one, since the value 66.66 of the maximum ratio Rseq
was greater than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than
0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the digit
numbers k in D [ j] was specifically 87.7, 68.9, 56.1, 78.0, 163.0, 190.1, 232.8, 265.3, 894.3, 860.6,
802.5 and 990.8. Therefore, the correctness of the position of the numbers was evaluated
to be zero. The radius RcL of the fitted circle FcL to the contour sensor data was obtained
to be 242.4 pixels. The radius RcN of the fitted circle FcN to the center point P( xm [ j], ym [ j]),
j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N was obtained to be
149.9 pixels. Now, the positioning of the numbers within the contour was evaluated to be
one, since the radius RcN of the fitted circle FcN is smaller than the radius RcL of the fitted
circle FcL . Finally, the total score of the numbers parameter was evaluated to be two.
Figure 8h displays the case in which many numbers are missing and the presented numbers are not in proper positions and correct order, but within the contour. The classified output D [ j], j = 1, 2, . . . , N of DeepN, the pretrained model, was identified to have some missing numbers, since the total number N was equal to five, and the classified output D [ j], j = 1, 2, . . . , N was in the range from zero to nine, that is 0 ≤ D [ j] ≤ 9 for all j, and the numbers n( D [ j] = 1) and n( D [ j] = 2) were one and two, respectively. Therefore, the presence of all the numbers and no additional numbers was evaluated to be zero. The maximum ratio Rseq in percentage between the number sequence SN and the reference sequences Si , i = 1, 2, 3, was evaluated to be 42.10%. Therefore, the correctness of the order of the numbers was evaluated to be zero, since the value 42.10 of the maximum ratio Rseq was less than 65.00, the heuristically given value of θn . The ratio n(dcn [ j] ≤ ℓdc )/n(dcn [ j]) of the distances dcn [ j], j = 1, 2, . . . , 12 within 100.00 pixels, a given limit ℓdc , was less than 0.65, a heuristically given value θdc , where the evaluated distance dcn [ j] for each of the digit numbers k in D [ j] was specifically 545.5, 920.4, 1167.1, 1739.7, 1644.2, 1464.5, 1338.6, 1095.1, 715.6, 630.7, 734.1 and 153.0. Therefore, the correctness of the position of the numbers was evaluated to be zero. The radius RcL of the fitted circle FcL to the contour sensor data was obtained to be 601.1 pixels. The radius RcN of the fitted circle FcN to the center point P( xm [ j], ym [ j]), j = 1, 2, . . . , 12 of each of the cropped number images IC^j, j = 1, 2, . . . , N was obtained to be 446.3 pixels. Now, the positioning of the numbers within the contour was evaluated to be one, since the radius RcN of the fitted circle FcN is smaller than the radius RcL of the fitted circle FcL . Finally, the total score of the numbers parameter was evaluated to be one.

3.3. Scoring Criteria of the Hand Parameter

The analytical ability of the pre-trained model DeepH is demonstrated in Figure 9, with eight separate examples of how the two hands in the original images are evaluated in the presence, the correctness of the proportions, and the correctness of the target numbers, showing the segmented image of hands (left), the original image with the cropped areas of the target numbers and the extrapolated lines of hands (middle), and the corresponding parameter values (right) of each example.
Figure 9. Eight representative examples demonstrating how the two hands in original images are evaluated in the presence, the correctness of the proportions, and the correctness of the target numbers, showing the segmented image of hands (left), the original image with the cropped areas of the target numbers and the extrapolated lines of hands (middle), and the corresponding parameter values (right) of each example; (a) the case in which two hands are present with the proper proportions and target numbers; (b) another example of the case in which two hands are present with the proper proportions and target numbers, where one of the target numbers is not in the proper position; (c) the case in which two hands are present with the proper proportions but one of them is not indicating the target number; (d) the case in which two hands are present with the proper target numbers but not the proper proportions; (e) the case in which two hands are present with the proper proportions but not the proper target numbers; (f) the case in which only one hand is present with the proper target number; (g) the case in which only one hand is present with neither the proper proportions nor the target numbers; and (h) the case in which no hands are present. * Total score/score of contour/score of numbers/score of hands/score of center.
Figure 9a displays the case in which two hands are present with the proper proportions
and target numbers. In this case, the percentage p f h was evaluated to be 100.00%, greater than 65.0%, the score for θh1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ℓh was evaluated to be 89.4 pixels, greater than 30.0 pixels, the score for θpr . Therefore, the correct proportion of the two hands was also scored to be one. Both of the
distances abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) and abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ]))
were estimated to be 123.9 and 86.6 pixels less than 200.0 pixels, the score for ε, respectively.
Therefore, both the correctness of hand and minute target numbers were evaluated to be
one. Finally, the total score of the hands parameter was evaluated to be five.
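Collecting the thresholds quoted in these examples (θh1 = 65.0%, θh2 = 50.0%, θpr = 30.0 pixels and ε = 200.0 pixels), the hands-parameter scoring can be sketched as follows; the inputs are assumed to be precomputed from the DeepH segmentation and the extrapolated hand lines, the names are hypothetical, and treating the length difference as signed (minute length minus hour length) is an interpretive assumption, not stated in the text.

```python
# Hypothetical sketch of the hands-parameter scoring (0 to 5 points).
def hands_score(p_fh, delta_len, d_hour, d_minute,
                theta_h1=65.0, theta_h2=50.0, theta_pr=30.0, eps=200.0):
    score = 0
    if p_fh > theta_h1:                # two hands detected
        score += 2
        if delta_len > theta_pr:       # minute hand sufficiently longer
            score += 1
    elif p_fh > theta_h2:              # only one hand detected
        score += 1
    else:
        return score                   # no hands: nothing else to score
    score += 1 if d_hour < eps else 0      # hour hand meets its target number
    score += 1 if d_minute < eps else 0    # minute hand meets its target number
    return score

# Figure 9a: p_fh = 100.0, delta_len = 89.4, d_hour = 123.9, d_minute = 86.6
print(hands_score(100.0, 89.4, 123.9, 86.6))  # -> 5
```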
Figure 9b displays another example of the case in which two hands are present with
the proper proportions and target numbers, where one of the target numbers is not in the
proper position. In this case, the percentage p f h was evaluated to be 73.37%, greater than 65.0%, the score for θh1 . Therefore, the presence of two hands was scored to be two. The length difference ∆ℓh was evaluated to be 219.3 pixels, greater than 30.0 pixels, the score for θpr .
Therefore, the correct proportion of the two hands was also scored to be one. Both of the
distances abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) and abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ]))
were estimated to be 57.7 and 44.5 pixels less than 200.0 pixels, the score for ε, respectively.
Therefore, both the correctness of hand and minute target numbers were evaluated to be
one. Finally, the total score of the hands parameter was evaluated to be five.
Figure 9c displays the case in which two hands are present with the proper propor-
tions, but one of them is not indicating the target number. In this case, the percentage
p f h was evaluated to be 65.35% greater than 65%, the score for θh1 . Therefore, the pres-
ence of two hands was scored to be two. The length difference ∆ℓh was evaluated to be 101.7 pixels, greater than 30.0 pixels, the score for θpr . Therefore, the correct proportion of the two hands was also scored to be one. The distance abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) was estimated to be 110.7 pixels, less than 200.0 pixels, the score for ε, but the distance abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ])) was estimated to be 292.0 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of the hand target number was evaluated to be one, but the correctness of the minute target number was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be four.
Figure 9d displays the case in which two hands are present with the proper target
numbers but not the proper proportions. In this case, the percentage p f h was evaluated
to be 89.8% greater than 65.0%, the score for θh1 . Therefore, the presence of two hands
was scored to be two. The length difference ∆ℓh was evaluated to be 63.9 pixels less
than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands
was also scored to be zero. Both of the distances abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) and
abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ])) were estimated to be 60.5 and 109.5 pixels less than
200.0 pixels, the score for ε, respectively. Therefore, both the correctness of the hand and
minute target numbers were evaluated to be one. Finally, the total score of the hands
parameter was evaluated to be four.
Figure 9e displays the case in which two hands are present with the proper proportions
but not the proper target numbers. In this case, the percentage p f h was evaluated to be
91.1% greater than 65.0%, the score for θh1 . Therefore, the presence of two hands was
scored to be two. The length difference ∆ℓh was evaluated to be 37.80 pixels, greater
than 30.0 pixels, the score for θ pr . Therefore, the correct proportion of the two hands
was also scored to be one. Both of the distances abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) and
abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ])) were estimated to be 610.1 and 540.1 pixels greater
than 200.0 pixels, the score for ε, respectively. Therefore, both the correctness of hand and
minute target numbers were evaluated to be zero. Finally, the total score of the hands
parameter was evaluated to be three.
Figure 9f displays the case in which only one hand is present with the proper target
number. In this case, the percentage p f h was evaluated to be 64.4% less than 65.0%,
the score for θh1 , and greater than 50.0%, the score for θh2 . Therefore, the presence
of two hands was scored to be one. The distance abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ]))
was estimated to be 133.6 pixels, less than 200.0 pixels, the score for ε, but the distance abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ])) was estimated to be 1888.6 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of the hand target number was evaluated to be one, but the correctness of the minute target number was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be two.
Figure 9g displays the case in which only one hand is present with neither the
proper proportions nor the target numbers. In this case, the percentage p f h was eval-
uated to be 63.7% less than 65.0%, the score for θh1 and greater than 50.0%, the score for
θh2 . Therefore, the presence of two hands was scored to be one. Both of the distances
abs( P( x [nh ], y[nh ]) − P( x [ht ], y[ht ])) and abs( P( x [nm ], y[nm ]) − P( x [mt ], y[mt ])) were estimated to be 229.63 pixels, greater than 200.0 pixels, the score for ε. Therefore, the correctness of both the hand and minute target numbers was evaluated to be zero. Finally, the total score of the hands parameter was evaluated to be one.
Figure 9h displays the case in which no hands are present. Therefore, the total score of
the hands parameter was evaluated to be zero.
Table 5. Frequency of the ground truth of the 219 images for each criterion of the parameters of the scoring method of mCDT.

Parameter | Criteria | Frequency Count (%) | Errors in Estimation Count (%)
Contour | Contour is circular | 217 (99.08) | 6 (2.76)
Contour | Contour is closed | 178 (81.27) | 13 (7.30)
Contour | Contour size is appropriate | 215 (98.17) | 1 (0.46)
Numbers | Numbers are all present without additional numbers | 153 (69.86) | 11 (7.18)
Numbers | Numbers are in correct order | 181 (82.64) | 5 (2.76)
Numbers | Numbers are in the correct positions | 88 (40.18) | 2 (2.27)
Numbers | Numbers are within the contour | 202 (92.23) | 2 (0.99)
Hands | Two hands are present | 171 (78.08) | 13 (7.60)
Hands | One hand is present | 181 (82.64) | 6 (3.31)
Hands | Hands are in correct proportion | 170 (77.62) | 13 (7.64)
Hands | Hour target number is indicated | 153 (69.86) | 1 (0.65)
Hands | Minute target number is indicated | 149 (68.03) | 6 (4.02)
Center | A center is drawn or inferred | 190 (86.75) | 3 (1.57)
Tables 6 and 7 list the distribution of the estimated scores and the performance of each scoring parameter, respectively, in total as well as in the two separate groups of young volunteers and PD patients. As shown in Table 7, for the contour parameter, the sensitivity, specificity, accuracy and precision values were 89.33%, 92.68%, 89.95% and
98.15%; for numbers, they were 80.21%, 95.93%, 89.04% and 93.90%; for hands, they were
83.87%, 95.31%, 87.21% and 97.74%; and for center, they were 98.42%, 86.21%, 96.80%
and 97.91%, respectively.
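For reference, the four indicators reported in Table 7 are the standard confusion-matrix quantities, computed per criterion against the experts' cross-checked ground truth; a minimal sketch:

```python
# Standard definitions of the four reported indicators (in percent),
# computed from true/false positive and negative counts for one criterion.
def indicators(tp, tn, fp, fn):
    return {
        "sensitivity": 100.0 * tp / (tp + fn),   # true-positive rate
        "specificity": 100.0 * tn / (tn + fp),   # true-negative rate
        "accuracy":    100.0 * (tp + tn) / (tp + tn + fp + fn),
        "precision":   100.0 * tp / (tp + fp),   # positive predictive value
    }
```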
Table 6. Distribution of the estimated scores for each scoring parameter in mCDT.
4. Discussion
A conventional CDT, based on a paper-and-pencil test, is inadequate for examining the active and dynamic mechanisms of cognitive function. With the conventional CDT,
multiple studies have indicated that a number of brain regions are recruited for the tasks
required; these include the temporal lobes, frontal and parietal lobes in addition to the
cerebellum, thalamus, premotor area and inferior temporal sulcus, the bilateral parietal
lobe, and the sensorimotor cortex [39,40]. What is not clearly known is which portions of cognitive function are required for recruiting these areas, as such an association and quantitation would be difficult to accomplish with the conventional CDT. Our study sought to address this requirement, and by introducing mCDT, a mobile phone application with a qualitative, automatic scoring system for the CDT, this may have been realized. As elaborated previously, the mCDT scoring system was constructed using
CNN, a convolutional network for digit classification, U-Net, a convolutional network for
biomedical image segmentation, and the MNIST database, the Modified National Institute
of Standards and Technology database. The sensor data are also collected by mCDT. The performance test results show that the scoring algorithm in mCDT is efficient and accurate compared with that of the traditional CDT. In addition, mCDT is able to evaluate the
relevant components of the cognitive function. The subjects in our study carried out the
drawings with a smart pen on a smartphone screen when required to reproduce figures in
a setting similar to the conventional CDT using pen and paper. This method also allows for increased accuracy in gauging the drawing process while minimizing any noise in an assay of activated brain function. The smartphone could also provide the motor-related
markers of speed and pausing as the test is being carried out; in a conventional pencil-and-paper CDT, such motor-function measures may not be easily implemented.
In summary, our study introduces the developed mCDT as a tool for increasing the accuracy
required for the cognitive function evaluation in CDT. As described in the performance
test results, mCDT showed fairly good statistical indicators, especially excellent values
in specificity and precision. Furthermore, the values of specificity and precision for the PD patient group were better than those for the young volunteer group, which suggests that mCDT classifies the two groups well and consistently, so that it is applicable as a diagnostic tool in neurological disease groups and also as a correlation tool between the scores of each criterion and the regional functions of the degenerated brain. Of course, the ability as a correlation tool needs to be investigated in future work, some preliminary studies of which are ongoing with several clinical groups in collaboration with primary care
physicians and neurology subspecialists. Furthermore, since the presented CDT scoring
method here uses sensor data collected from a smart mobile device and deep learning
based algorithm in the CDT image segmentation and processing, other stroke behavior
patterns due to neurological disease symptoms such as motor, memory, and cognitive
disorders could be additionally extracted using stroke speed variation and touch event
sequence patterns that could be estimated from the sensor data, even though the CDT scoring itself is limited to four parameters with thirteen criteria.
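As an illustration of such additional markers, assuming each stroke is logged as (x, y, t) touch samples by mCDT's sensor capture, per-stroke speed statistics and a simple pause count could be extracted as follows; the sample format and the 100 ms pause threshold are assumptions for the sketch, not part of the published scoring method.

```python
import numpy as np

# Illustrative extraction of stroke-behavior markers from touch samples,
# each given as (x, y, t) with t in seconds; names and thresholds are assumed.
def stroke_markers(samples, pause_threshold=0.1):
    xy = np.asarray([(s[0], s[1]) for s in samples], dtype=float)
    t = np.asarray([s[2] for s in samples], dtype=float)
    dx, dy = np.diff(xy, axis=0).T
    dt = np.diff(t)
    speed = np.hypot(dx, dy) / np.where(dt > 0, dt, np.nan)  # px per second
    pauses = int(np.sum(dt > pause_threshold))               # gaps > 100 ms
    return float(np.nanmean(speed)), float(np.nanstd(speed)), pauses
```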
5. Conclusions
In this study, a mobile phone application mCDT for the CDT was implemented, and
also an automatic and qualitative scoring method in thirteen criteria was developed using
mobile sensor data and deep learning algorithms, U-Net and CNN. A young healthy volunteer group (n = 238, 147 males and 89 females, aged 23.98 ± 2.83 years) and a PD patient group (n = 140, 76 males and 64 females, aged 75.09 ± 8.57 years) were recruited and participated in training the models DeepC, DeepH and DeepN and in validating the performance of the CDT scoring algorithm. Most of the resulting overall statistical indicators (sensitivity, specificity, accuracy, and precision) were greater than 85% in the performance validation with the 79 young healthy volunteers and 140 PD patients. Two exceptions were
recognized in the sensitivities of the numbers and hands parameters. In particular, the specificities of the contour and hands parameters in the young volunteer group were far too low (60.00% and 66.67%, respectively), because the numbers of true negatives and false positives were much smaller and in relatively similar proportions. Furthermore, the specificities and precisions of the PD patient group were better than those of the young volunteer group, which suggests that mCDT, along with its scoring method, can be used as a tool for classifying neurological disease groups and also for scaling disease symptoms related to degenerated regions of the brain. Further clinical studies should be conducted on differentiating neurological disease subtypes, which would be valuable in clinical practice and for studies in the field.
Author Contributions: Conceptualization, U.L.; data curation, I.P. and U.L.; formal analysis, U.L.;
funding acquisition, U.L.; investigation, U.L.; methodology, U.L.; project administration, U.L.;
resources, U.L.; software, I.P. and U.L.; supervision, U.L.; validation, U.L.; visualization, I.P. and U.L.;
writing—original draft, U.L.; writing—review and editing, U.L. All authors have read and agreed to
the published version of the manuscript.
Funding: This research was supported by Basic Science Research (2020R1F1A1048281) and Hallym
University Research Fund (HRF-202003-021).
Institutional Review Board Statement: The Institutional Review Board of the Hallym University
Sacred Heart Hospital approved the data gathering and the protocols used for this study (IRB number:
2019-03-001).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author. The data are not publicly available due to privacy issues.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Biundo, R.; Weis, L.; Bostantjopoulou, S.; Stefanova, E.; Falup-Pecurariu, C.; Kramberger, M.G.; Geurtsen, G.J.; Antonini, A.;
Weintraub, D.; Aarsland, D. MMSE and MoCA in Parkinson’s disease and dementia with Lewy bodies: A multicenter 1-year
follow-up study. J. Neural. Transm. 2016, 123, 431–438. [CrossRef]
2. Mittal, C.; Gorthi, S.P.; Rohatgi, S. Early Cognitive Impairment: Role of Clock Drawing Test. Med. J. Armd. Forces India 2010, 66,
25–28. [CrossRef]
3. Aprahamian, I.; Martinelli, J.E.; Neri, A.L.; Yassuda, M.S. The Clock Drawing Test: A review of its accuracy in screening for
dementia. Dement. Neuropsychol. 2009, 3, 74–81. [CrossRef]
4. Youn, Y.C.; Pyun, J.-M.; Ryu, N.; Baek, M.J.; Jang, J.-W.; Park, Y.H.; Ahn, S.-W.; Shin, H.-W.; Park, K.-Y.; Kim, S.Y. Use of the
Clock Drawing Test and the Rey–Osterrieth Complex Figure Test-copy with convolutional neural networks to predict cognitive
impairment. Alzheimer’s Res. Ther. 2021, 13, 1–7. [CrossRef]
5. Straus, S.H. Use of the automatic clock drawing test to rapidly screen for cognitive impairment in older adults, drivers, and the
physically challenged. J. Am. Geriatr. Soc. 2007, 55, 310–311. [CrossRef] [PubMed]
6. Chen, S.; Stromer, D.; Alabdalrahim, H.A.; Schwab, S.; Weih, M.; Maier, A. Automatic dementia screening and scoring by applying
deep learning on clock-drawing tests. Sci. Rep. 2020, 10, 1–11. [CrossRef]
7. Park, I.; Kim, Y.J.; Kim, Y.J.; Lee, U. Automatic, Qualitative Scoring of the Interlocking Pentagon Drawing Test (PDT) Based on
U-Net and Mobile Sensor Data. Sensors 2020, 20, 1283. [CrossRef]
8. Mann, D.L. Heart Failure: A Companion to Braunwald’s Heart Disease; Elsevier: Amsterdam, The Netherlands, 2011. [CrossRef]
9. Spenciere, B.; Alves, H.; Charchat-Fichman, H. Scoring systems for the Clock Drawing Test: A historical review. Dement.
Neuropsychol. 2017, 11, 6–14. [CrossRef]
10. Eknoyan, D.; Hurley, R.A.; Taber, K.H. The Clock Drawing Task: Common Errors and Functional Neuroanatomy. J. Neuropsychiatry
Clin. Neurosci. 2012, 24, 260–265. [CrossRef]
11. Talwar, N.A.; Churchill, N.W.; Hird, M.A.; Pshonyak, I.; Tam, F.; Fischer, C.E.; Graham, S.J.; Schweizer, T.A. The Neural Correlates
of the Clock-Drawing Test in Healthy Aging. Front. Hum. Neurosci. 2019, 13, 25. [CrossRef]
12. Yuan, J.; Libon, D.J.; Karjadi, C.; Ang, A.F.; Devine, S.; Auerbach, S.H.; Au, R.; Lin, H. Association Between the Digital Clock
Drawing Test and Neuropsychological Test Performance: Large Community-Based Prospective Cohort (Framingham Heart
Study). J. Med. Internet Res. 2021, 23, e27407. [CrossRef] [PubMed]
13. Shulman, K. Clock-drawing: Is it the ideal cognitive screening test? Int. J. Geriatr. Psychiatry 2000, 15, 548–561. [CrossRef]
14. Sunderland, T.; Hill, J.L.; Mellow, A.M.; Lawlor, B.A.; Gundersheimer, J.; Newhouse, P.A.; Grafman, J.H. Clock drawing in Alzheimer's disease: A novel measure of dementia severity. J. Am. Geriatr. Soc. 1989, 37, 725–729. [CrossRef] [PubMed]
15. Nasreddine, Z.S.; Phillips, N.A.; Bédirian, V.; Charbonneau, S.; Whitehead, V.; Collin, I. The Montreal Cognitive Assessment,
MoCA: A brief screening tool for mild cognitive impairment. J. Am. Geriatr. Soc. 2005, 53, 695–699. [CrossRef]
16. Souillard-Mandar, W.; Davis, R.; Rudin, C.; Au, R.; Libon, D.J.; Swenson, R.; Price, C.C.; Lamar, M.; Penney, D.L. Learning
Classification Models of Cognitive Conditions from Subtle Behaviors in the Digital Clock Drawing Test. Mach. Learn. 2016, 102,
393–441. [CrossRef]
17. Nirjon, S.; Emi, I.A.; Mondol, A.S.; Salekin, A.; Stankovic, J.A. MOBI-COG: A Mobile Application for Instant Screening of
Dementia Using the Mini-Cog Test. In Proceedings of the Wireless Health 2014 on National Institutes of Health, Bethesda, MD,
USA, 1–7 October 2014. [CrossRef]
18. Fabricio, A.T.; Aprahamian, I.; Yassuda, M.S. Qualitative analysis of the Clock Drawing Test by educational level and cognitive
profile. Arq. Neuropsiquiatr. 2014, 72, 289–295. [CrossRef]
19. Borson, S.; Scanlan, J.; Brush, M.; Vitaliano, P.; Dokmak, A. The mini-cog: A cognitive “vital signs” measure for dementia
screening in multi-lingual elderly. Int. J. Geriatr. Psychiatry 2000, 15, 1021–1027. [CrossRef]
20. Harbi, Z.; Hicks, Y.; Setchi, R. Clock Drawing Test Interpretation System. Procedia Comput. Sci. 2017, 112, 1641–1650. [CrossRef]
21. Kim, H.; Cho, Y.S.; Do, E.I. Computational clock drawing analysis for cognitive impairment screening. In Proceedings of the Fifth
International Conference on Tangible, Embedded, and Embodied Interaction; Gross, M.D., Ed.; ACM: New York, NY, USA, 2011; pp.
297–300.
22. Caffarraa, P.; Gardinia, S.; Diecib, F.; Copellib, S.; Maseta, L.; Concaria, L.; Farinac, E.; Grossid, E. The qualitative scoring MMSE
pentagon test (QSPT): A new method for differentiating dementia with Lewy Body from Alzheimer’s Disease. Behav. Neurol.
2013, 27, 213–220. [CrossRef]
23. Davis, R.; Libon, D.; Au, R.; Pitman, D.; Penney, D. Think: Inferring cognitive status from subtle behaviors. IEEE Int. Conf. Robot.
Autom. 2014, 2898–2905. Available online: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4825804/ (accessed on 2 August
2021). [CrossRef]
24. Manos, P.J.; Wu, R. The Ten Point Clock Test: A Quick Screen and Grading Method for Cognitive Impairment in Medical and
Surgical Patients. Int. J. Psychiatry Med. 1994, 24, 229–244. [CrossRef] [PubMed]
25. Royall, D.R.; Cordes, J.A.; Polk, M. CLOX: An executive clock drawing task. J. Neurol. Neurosurg. Psychiatry 1998, 64, 588–594.
[CrossRef] [PubMed]
26. Rouleau, I.; Salmon, D.P.; Butters, N.; Kennedy, C.; McGuire, K. Quantitative and qualitative analyses of clock drawings in
Alzheimer’s and Huntington’s disease. Brain Cogn. 1992, 18, 70–87. [CrossRef]
27. Muayqil, T.A.; Tarakji, A.R.; Khattab, A.M.; Balbaid, N.T.; Al-Dawalibi, A.M.; AlQarni, S.A.; Hazazi, R.A.; Alanazy, M.H.
Comparison of Performance on the Clock Drawing Test Using Three Different Scales in Dialysis Patients. Behav. Neurol. 2020,
2020, 1–7. [CrossRef] [PubMed]
28. Shao, K.; Dong, F.; Guo, S.; Wang, W.; Zhao, Z.; Yang, Y.; Wang, P.; Wang, J. Clock-drawing test: Normative data of three
quantitative scoring methods for Chinese-speaking adults in Shijiazhuang City and clinical utility in patients with acute ischemic
stroke. Brain Behav. 2020, 10, e01806. [CrossRef]
29. De Pandis, M.F.; Galli, M.; Vimercati, S.; Cimolin, V.; De Angelis, M.V.; Albertini, G. A New Approach for the Quantitative
Evaluation of the Clock Drawing Test: Preliminary Results on Subjects with Parkinson’s Disease. Neurol. Res. Int. 2010, 2010, 1–6.
[CrossRef]
30. Guha, A.; Kim, H.; Do, E.Y. Automated Clock Drawing Test through Machine Learning and Geometric Analysis. In Proceedings
of the 16th International Conference on Distributed Multimedia Systems, DMS 2010, Hyatt Lodge at McDonald’s Campus, Oak
Brook, IL, USA, 14–16 October 2010.
31. William, S.; Randall, D.; Cynthia, R.; Rhoda, A.; Dana, L.P. Interpretable Machine Learning Models for the Digital Clock Drawing
Test. In Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY,
USA, 23 June 2016.
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
33. Huang, G.; Liu, Z.; Weinberger, K.Q. Densely Connected Convolutional Networks. 2016. Available online: http://arxiv.org/abs/
1608.06993 (accessed on 2 August 2021).
34. Folstein, M.F.; Folstein, S.E.; McHugh, P.R. “Mini-mental state”: A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 1975, 12, 189–198. [CrossRef]
35. Available online: https://docs.opencv.org/3.4/df/d0d/tutorial_find_contours.html (accessed on 2 August 2021).
36. Available online: https://www.kite.com/python/docs/difflib.SequenceMatcher (accessed on 2 August 2021).
37. Available online: https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html (accessed on 2 August 2021).
38. Available online: https://pythonprogramming.net/how-to-program-best-fit-line-machine-learning-tutorial/ (accessed on 2
August 2021).
39. Freedman, M.; Leach, L.; Kaplan, E.; Winocur, G.; Shulman, K.I.; Delis, D. Clock Drawing: A Neuropsychological Analysis; Oxford
University Press, Inc.: Oxford, UK, 1994.
40. Mendes-Santos, L.C.; Mograbi, D.; Spenciere, B.; Charchat-Fichman, H. Specific algorithm method of scoring the Clock Drawing
Test applied in cognitively normal elderly. Dement. Neuropsychol. 2015, 9, 128–135. [CrossRef]