Predictive Modeling Toward the Design of a Forensic Decision Support System Using Cheiloscopy for Identification from Lip Prints
Applied Informatics
5th International Conference, ICAI 2022
Arequipa, Peru, October 27–29, 2022
Proceedings
Communications in Computer and Information Science 1643
Editors
Hector Florez, Universidad Distrital Francisco Jose de Caldas, Bogota, Colombia
Henry Gomez, Universidad Continental, Arequipa, Peru
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2022
1 Introduction
Forensic identification is the application of forensic science and technology to identify
specific objects from the trace evidence they leave, often at a crime scene or the scene of
an accident. Modern forensic science relies on several identification techniques in which
digital tools play a pivotal role. Identification methods based on anthropometry,
fingerprints, sex determination, age estimation, height measurement, and differentiation by
blood groups, DNA and odontology are traditionally used in forensics. In this respect,
biometrics plays a fundamental role: it is a verification mechanism
that identifies individuals based on their physiological and behavioral features. Biometric
techniques are readily observable in different forensic identification areas, e.g.,
face, fingerprint, iris, voice and handwriting. The effectiveness of a biometric system lies
in its recognition processes, which include feature extraction, feature robustness
and feature matching. However, traditional methods face several challenges, such as
insufficiency of available evidence, concealment of identity from traditional models, time
consumption, and lack of standardization and interoperability [1].
In this respect, newer techniques are being considered, coupled with the power of data
science. Cheiloscopy is the forensic investigation technique dealing with the identification
of humans based on lip traces [6–9]. Several algorithms, such as the top-hat transform, vote
counting, time warping, and the Hough transform, are efficient methods to automate the
process of identification from lip prints based on cheiloscopy [12]. However, there remains
significant scope for research in this area, since information other than identity, such as
sex or age, has received little investigation [2, 3].
This work is aimed at designing the architecture for a Forensic Decision Support
System based on cheiloscopy, using predictive modeling through machine learning. It
focuses on the use of supervised learning algorithms with the aim of identifying a per-
son in terms of their biological sex using their lip prints, illustrated through a cohort of
43 subjects, performing predictive analysis on 40 labial features for forensic identifica-
tion. Subsequently, an architecture for a forensic decision support system implementing
identification from cheiloscopic techniques is presented in this work.
In the domain of biometrics, machine learning has shown its capability to improve
the precision of the identification process [4, 9]. Biometric features collected at the
first instance are not necessarily the same as those in subsequent samples, and machine
learning provides significant support in this respect. A biometric system that aims to
predict the identifying information of a person automatically from a biometric sample,
or to check whether the sample matches existing information in the database, usually
follows the structure shown in Fig. 1.
[Fig. 1: Input Data → Segmentation → Pre-processing → Feature Extraction → Classification]
This study includes 43 subjects (26 females and 17 males). The lip impressions used as
input data were collected by pressing the subjects' lipstick-colored lips against
paper.
In this respect, a generalized architecture (Fig. 2) has been designed using the Ama-
zon Web Services (AWS) platform, aimed at a Forensic Decision Support System based
on cheiloscopy.
370 A. Sabelli et al.
The architecture is based on AWS tools which are designed to follow a streamlined
flow of the entire process from collection of images to solving the decision support tasks
in an agile way. The principal modules of the designed architecture are as follows:
Input Image
The image is sent through a POST request to an API Gateway, which communicates with the
lambdas that start the analysis process. The image is uploaded to S3 prior
to executing Lambda 1.
a. The image is passed to the recognition tool, which generates a crop of the necessary area.
b. The resulting image is uploaded to S3 to continue the analysis process.
SageMaker
If the image does not require any treatment, the processing and the call to
SageMaker are executed directly.
Modeling in SageMaker
Implementing the model in SageMaker allows its consumption to be more
distributed and its version management more uniform. The
model receives as input the URL parameters indicating where to obtain the image to be
processed, which results in a request to S3.
The model reads its dataset from the database indicated in the architecture to
balance the information, and returns a result to the lambda either by listening for a record
in the database or as a direct response. On receiving the response from the SageMaker
process, the lambda executes two processes: it returns the response to the frontend
through the API Gateway and, if necessary, sends a notification via Amazon Simple Queue
Service (SQS).
The flow diagram (Fig. 3) consists of AWS tools designed to speed up and improve
the extraction and insertion of information into the dataset consumed by the model.
The process starts with loading an image through the frontend, which is then
processed through lambdas.
Image upload: Uploading the image to an S3 bucket triggers the execution of the
indicated lambda, which starts the process.
Extraction: The extraction process consists of the method or script used for the local
extraction of information from the training image, either implemented directly in the
lambda or as an alternative process.
Storage of information: The information resulting from the previous
process is stored in the dataset in the database.
Completion notification: The lambda process returns a result, if necessary, through
the API Gateway to the frontend, indicating that the process was successful.
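As an illustration of the image upload step, the following minimal sketch shows how a lambda triggered by an S3 upload could read the bucket and object key from the incoming event. It assumes the standard S3 event notification layout; the function names, the bucket name and the returned dictionary are hypothetical and are not taken from the implemented system.

```python
from urllib.parse import unquote_plus


def extract_uploaded_images(event):
    """Return (bucket, key) pairs for every object named in an S3 event."""
    uploads = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            continue  # not an S3 record; ignore it
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        uploads.append((bucket, key))
    return uploads


def handler(event, context):
    """Hypothetical Lambda entry point triggered by the image upload."""
    uploads = extract_uploaded_images(event)
    # The extraction and storage steps would be invoked here (not shown).
    return {"processed": len(uploads), "objects": uploads}


# Hypothetical event with a single uploaded lip-print image.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "lip-prints"},
                "object": {"key": "raw/subject+01.png"}}}
    ]
}
result = handler(sample_event, None)
```

Keeping the event parsing in a separate function allows it to be tested locally without any AWS dependency.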
Once the architecture has been designed, the dataset of 43 subjects is considered and
modeled through the following steps.
Data Segmentation
In the region of interest (ROI) of the lip traces, a high level of noise is common
due to the presence of undesired elements like facial hair and fingerprints in the image
sample (Fig. 4). Therefore, the data were segmented manually.
Subsequently, the ROI was extracted from the lip prints with the objective of
separating the lower lip from the upper lip (Fig. 5).
Once the segmentation had been performed, the prints of the upper and lower lips
were stored as separate files, since the next stage of the modeling process treats them
differently depending on whether the print is of an upper or a lower lip.
Preprocessing
Since the algorithm for feature selection primarily considers the lower lip traces, the
images of the upper lips were vertically flipped, whereas the images of lower lip traces
were kept intact.
The transparent pixels (with a zero value for the alpha channel) were replaced by white
pixels (i.e. [255, 255, 255, 0] in [R, G, B, A] notation). The images were then
aligned horizontally and converted to grayscale. For the horizontal
alignment, the corners (extremities) of the lip prints were marked in order to calculate the
anchor point (the midpoint between the corners) for the rotation to the horizontal (Fig. 6).
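The preprocessing steps described so far can be sketched as follows. This is a minimal illustration, not the code used in the study: images are represented as plain lists of rows, the grayscale weights (ITU-R BT.601) are an assumption because the conversion formula is not stated, and all function names are hypothetical.

```python
import math


def flip_vertical(image):
    """Flip an image (a list of rows) top to bottom, as done for upper lips."""
    return image[::-1]


def fill_transparent(pixel):
    """Replace a fully transparent RGBA pixel with the white value from the text."""
    return (255, 255, 255, 0) if pixel[3] == 0 else pixel


def to_gray(pixel):
    """Convert an RGBA pixel to a gray level (BT.601 weights, an assumption)."""
    r, g, b, _ = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def alignment(left_corner, right_corner):
    """Return the anchor point (corner midpoint) and the tilt angle in degrees."""
    (x1, y1), (x2, y2) = left_corner, right_corner
    anchor = ((x1 + x2) / 2, (y1 + y2) / 2)
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return anchor, angle


# Two hypothetical corner points marked on a tilted lip print.
anchor, angle = alignment((0, 0), (100, 100))
```

Rotating the image about the anchor point by the opposite of the returned angle would bring both corners onto the same horizontal line; the sign convention depends on the image library used.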
Fig. 6. (a) The corners and anchor points of the lip print. (b) Horizontal alignment of the lip print.
Next, the minimum bounding rectangle enclosing the lip print was
calculated, and the image was cropped to this rectangle. The image was then
resized to a common size for all the prints (1500 × 500 pixels) as a normalization
step. Finally, the image was binarized (Fig. 7).
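A minimal sketch of the cropping, resizing and binarization steps is given below for a grayscale image stored as a list of rows. The axis-aligned bounding box, the nearest-neighbour resizing and the fixed threshold of 128 are assumptions made for illustration, since the interpolation and binarization methods are not specified here; the tiny 4 × 4 image stands in for a real print, which would be resized to 1500 × 500 pixels.

```python
def bounding_box(gray, background=255):
    """Return (top, bottom, left, right) of the non-background pixels, inclusive."""
    rows = [r for r, row in enumerate(gray) if any(v != background for v in row)]
    cols = [c for c in range(len(gray[0]))
            if any(row[c] != background for row in gray)]
    if not rows:
        raise ValueError("image contains only background")
    return rows[0], rows[-1], cols[0], cols[-1]


def crop(gray, box):
    """Keep only the pixels inside the bounding box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


def resize_nearest(gray, width, height):
    """Nearest-neighbour resize to a common size."""
    src_h, src_w = len(gray), len(gray[0])
    return [[gray[y * src_h // height][x * src_w // width] for x in range(width)]
            for y in range(height)]


def binarize(gray, threshold=128):
    """Map dark (ink) pixels to 1 and light pixels to 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


image = [
    [255, 255, 255, 255],
    [255,  40, 200, 255],
    [255,  90,  30, 255],
    [255, 255, 255, 255],
]
cropped = crop(image, bounding_box(image))
resized = resize_nearest(cropped, width=4, height=2)
binary = binarize(resized)
```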
Fig. 7. (a) Removal of white space and normalization. (b) Binarized lip print.
For all the lip prints of the sample set, the same set of biometric features has been
determined. Each set of lip-print-based features is denoted as $f_n = [f_1, \ldots, f_n]$ (Table 1).
For use in the classification algorithms, 40 features on each side of the lips were
extracted, giving 80 features in total for each whole lip trace, after which
standardization was performed on the features.
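The exact form of standardization is not specified; a common choice, assumed in the sketch below, is the z-score $z = (x - \mu) / \sigma$ computed per feature with the population standard deviation. The function name and the toy feature matrix are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Three samples with two features; the second feature is constant.
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = standardize(features)
```

In practice the means and standard deviations would be estimated on the training data only and then reused for the test data, so that no information from the test set enters the model.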
2.3 Classification
The first phase of feature extraction produced a substantially high number of features,
creating the need for dimensionality reduction to preserve only the relevant ones for a
smooth modeling procedure. The dimensionality reduction was performed using an Extra
Trees classifier, also known as Extremely Randomized Trees, an ensemble
learning technique that builds many decision trees and combines the prediction of each
tree to reach the final decision. In this classifier, features and splits are selected at
random, and Gini importance is used to measure the relevance of
a feature. The features are then ranked in descending order of this value to
fit the high-dimensional data into a low-dimensional space, selecting the top k features
(43) from the original set of 80 features. Following that, the dataset was prepared for
modeling with supervised algorithms, given that the data are labeled and the
objective is to classify the observations. After evaluating different train-test partitions
through cross-validation, the dataset was finally split into 80% of the data for training
and the remaining 20% for testing.
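The ranking and splitting logic can be sketched as follows. The importance scores here are hypothetical placeholders; in the study they would be the Gini importances produced by the Extra Trees classifier. The shuffled hold-out split, the seed and all function names are likewise assumptions for illustration.

```python
import random


def top_k_features(importances, k):
    """Return the indices of the k features with the largest importance."""
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return sorted(order[:k])


def keep_columns(rows, indices):
    """Reduce every sample to the selected feature columns."""
    return [[row[i] for i in indices] for row in rows]


def split_80_20(rows, labels, seed=0):
    """Shuffle the samples and hold out the last 20% for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical importances for four features; keep the best two.
selected = top_k_features([0.05, 0.40, 0.10, 0.45], k=2)

# 43 placeholder samples, matching the cohort size (26 females, 17 males).
samples = [[float(i), float(i % 3), 0.0, float(i % 5)] for i in range(43)]
labels = ["F"] * 26 + ["M"] * 17
reduced = keep_columns(samples, selected)
x_train, y_train, x_test, y_test = split_80_20(reduced, labels)
```

With 43 subjects, this split yields 34 training samples and 9 test samples, which illustrates how small the test set is.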
Considering the nature of the dataset, five algorithms have been used for the
predictive analysis: logistic regression, support vector machine, naïve Bayes, multilayer
perceptron (MLP), and k-nearest neighbors (k-NN).
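To illustrate one of these algorithms, the following is a minimal sketch of a k-nearest neighbors classifier with Euclidean distance and majority voting. The value k = 3, the distance metric and the toy data are assumptions; the configuration used in the study is not stated here.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data with two well-separated classes.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["F", "F", "F", "M", "M", "M"]
near_first_group = knn_predict(train_x, train_y, [0.5, 0.5])
near_second_group = knn_predict(train_x, train_y, [5.5, 5.5])
```

Because k-NN relies on distances, it is sensitive to feature scale, which is one reason the features are standardized before classification.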
3 Results
The results and the respective performance metrics for the classification algorithms are
shown in Table 2. In each case, the optimal combination of features is reported.
The optimal number of features varies across the classifiers; however, an
average of 25 features has been considered from the original 80 features.
Among the considered metrics, the accuracy illustrates how often the algorithm
classifies a data point correctly. The F1 score summarizes the recall
and precision of the prediction models, whereas the
AUC values illustrate the capability of the models to distinguish between the two classes
(female or male). Among all the considered models, k-NN provided the highest accuracy
(0.82) in predicting the biological sex of the person, accompanied by the highest AUC (0.80)
and F1 score (0.86) as well. On the other hand, MLP showed the worst performance in
terms of accuracy (0.65), AUC (0.59), and F1 score (0.66).
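For reference, the metrics discussed above can be computed as in the following sketch. The labels and scores are invented solely to show the calculation and are not the study's data; the AUC is computed here as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def confusion(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 score."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


def auc(y_true, y_score, positive=1):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: eight samples, four per class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
accuracy, precision, recall, f1 = scores(y_true, y_pred)
area = auc(y_true, y_score)
```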
4 Conclusions
This work illustrates the architectural design of a forensic decision support system
through predictive modeling with supervised algorithms, aimed at identifying the
biological sex of a person from their lip prints. On the original dataset of lip prints,
and using five machine learning models, k-NN provided the highest performance, while
all the models provided reasonably good accuracy in determining the
biological sex of a person from their lip traces. However, within the forensic decision
support system, the key challenge is the image segmentation module, since the presence
of noise and unwanted elements in the image samples led to manual preprocessing for
basic cleaning and extraction of the ROI. With respect to the supervised algorithms,
the scarcity of image samples was a challenge in terms of training and test data. Another
significant challenge has been feature selection for the classification models,
considering the large number of features associated with each sample, which was resolved
using a hyperparameter tuning function [5].
This work provides a generalized structure for the cheiloscopy-based forensic decision
support system, along with a specific module for predictive modeling. On the one
hand, it provides an integral structure to receive the samples, preprocess them and finally
predict the identifying attributes of the subjects. On the other hand, it opens the pathway
to re-engineering a similar architecture for other biometric features, for a more
comprehensive and integral identification system. Further work along this line
is aimed at predictive modeling using unsupervised algorithms, provided that
sufficiently large datasets are available. Subsequently, other significant information, such as
age and other identifying attributes, could be extracted from the lip prints through a
comprehensive decision support system.
Acknowledgements. This work was supported and financed by the Cloudgenia group through its
technical and operational capabilities for the design of the decision support system architecture.
References
1. Saini, M., Kumar Kapoor, A.: Biometrics in forensic identification: applications and
challenges. J Forensic Med. 1(108), 2 (2016). https://doi.org/10.4172/2472-1026.1000108
2. Kumar, A., Prasad, S.N., Kamal, V., Priya, S., Kumar, M., Kumar, A.: Importance of
cheiloscopy. IJOCR 4, 48–52 (2016). https://doi.org/10.5005/jp-journals-10051-0012
3. Sandhya, S., Fernandes, R.: Lip Print: An emerging biometrics technology - a review. In:
2017 IEEE International Conference on Computational Intelligence and Computing Research
(ICCIC), pp. 1–5 (2017). https://doi.org/10.1109/ICCIC.2017.8524457
4. Akulwar, P., Vijapur, N.A.: Secured multi modal biometric system: a review. In: 2019 Third
International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC),
pp. 396–403 (2019). https://doi.org/10.1109/I-SMAC47947.2019.9032628
5. Tuning the hyper-parameters of an estimator. scikit-learn 0.24.2 documentation.
https://scikit-learn.org/stable/modules/grid_search.html. Accessed 23 Aug 2021
6. Caldas, I.M., Magalhães, T., Afonso, A.: Establishing identity using cheiloscopy and
palatoscopy. Forensic Sci Int. 165, 1–9 (2007)
7. Tsuchihashi, Y.: Studies on personal identification by means of lip prints. Forensic Sci. 3,
233–248 (1974)
8. Cheiloscopy, K.J.: In: Siegel, J.A., Saukko, P.J., Knupfer, G.C. (eds.) Encyclopedia of Forensic
Sciences. 2nd edi. I, pp. 358–361. Academic Press, London (2000)
9. Acharya, A.B.: Teaching forensic odontology: an opinion on its content and format. Eur J
Dent Educ. 10, 137–141 (2006)
10. Velosa, F., Florez, H.: Edge solution with machine learning and open data to interpret signs
for people with visual disability. CEUR Workshop Proceedings 2714, 15–26 (2020)
11. Sabelli, A.F., Chatterjee, P., Pollo-Cattaneo, M.F.: Predictive modeling toward identification
of sex from lip prints-machine learning in cheiloscopy. CEUR Workshop Proceedings 2992,
29–43 (2021)