Personality Prediction System Based On Graphology Using Machine Learning
Team Members: Lucy Hwang, Yashaswini Kalva, Hyeon Gu Kim, Kaushik Kumaran,
Archit Patel
Abstract
Graphology is a method of identifying, evaluating and understanding human
personality traits through the strokes and patterns revealed by handwriting.
Handwriting reveals the true personality including emotional outlay, fears, honesty,
defenses and many others. Professional handwriting examiners called graphologists
often identify the writer with a piece of handwriting. Accuracy of handwriting
analysis depends on how skilled the analyst is. Although human intervention in
handwriting analysis has been effective, it is costly and prone to error. Hence the
proposed methodology focuses on developing a system that can predict personality
traits with the aid of machine learning without human intervention. To make this
happen, we considered seven handwriting features: (i) size of letters, (ii) slant of the
writing, (iii) baseline, (iv) pen pressure, (v) spacing between letters, (vi) spacing
between words and (vii) top margin in a document to predict eight personality traits
of a writer as shown in Figure 1.0.
Figure 1.0: Handwriting attributes and respective personality behavior
After extracting all these features from the images containing the handwriting we
applied a Random Forest classifier for each personality trait of the writer. We also
built ANN and CNN models on the raw image data.
Introduction
Graphology is defined as the analysis of the physical characteristics and patterns of
the handwriting of an individual to understand his or her psychological state at the
time of writing. Handwriting is a kind of projective test where the unconscious
comes to the fore and expresses itself in the conscious [1]. A Graphologist can
roughly interpret an individual’s character and personality traits by analysing the
handwriting. We can use graphology to determine the personality and character
profile of a person.
Objective
The objective of this project is to develop a system that takes an image document
containing the handwriting of a person and outputs a few of his/her personality
traits based on some selected handwriting features. Manually analysing all the
significant characteristics of handwriting is not only time consuming
but also prone to errors. Automating the analysis of a few selected
characteristics of handwriting will speed up the process and reduce errors.
Motivation
Handwriting analysis is one among several methods to understand the psychology
of a person. Graphology can be used for below two areas:
Handwriting analysis with a computer is fast and can identify patterns more
consistently than visual inspection. Moreover, machine learning assisted analysis
is efficient and less exposed to human error.
Literature Review
The project focuses on development of a system to predict some psychological traits
of a person by analyzing his or her handwriting using machine learning. Several
researchers have done similar work on computer-aided graphology.
A similar work was done by Shitala Prasad, Vivek Kumar Singh and Akshay Sapre of
Department of Information Technology, Indian Institute of Information Technology
Allahabad, India to predict human personality through handwriting using support
vector machines [4]. Another similar work was done by Navin Karanth, Vijay Desai
and S. M. Kulkarni of Mechanical Engineering Department, National Institute of
Technology Karnataka, India to predict a writer’s personality through graphology,
without any machine learning [5]. Another similar work was done by Champa H N,
Assistant Professor of Department of Computer Science and Engg., University
Visvesvaraya College of Engineering, Karnataka, India and Dr. K R Ananda Kumar,
Professor of Department of Computer Science and Engg., SJB Institute of
Technology, Karnataka, India on computer aided graphology using artificial neural
networks [6]. All these research works have fundamental differences in selection of
handwriting features, extraction methods, classification and output, etc.
Problem Statement
A system is proposed to automate the basic handwriting analysis tasks of
graphology and determine a few important personality traits. Seven
features of handwriting are extracted from a sample handwriting image.
Each of the seven resulting raw values is then assigned to the matching
category of its feature. From these categories, the classifiers predict the
personality traits of the writer. An overview is represented
below:
Figure 1.1: The proposed system — A handwriting sample is taken and the personality traits are predicted.
Data Acquisition
We obtained the data from the IAM Handwriting Database of the Research Group on
Computer Vision and Artificial Intelligence (INF), University of Bern, Switzerland.
The data was readily available for download for non-profit research purposes. The
database contains 1538 pages of scanned text for which 657 writers contributed
samples of their handwriting. Each handwriting sample is labelled with the
corresponding psychological traits by manually studying each document.
Pre Processing
The handwriting images we obtained contain unwanted noise, printed text and
lines. The aim of pre-processing is to make the image data suitable for feature
extraction. We adopted the following methods:
1. Image resizing
These images were cropped and saved as PNG images with an automatic action
script. Now the width of all the images is 850 pixels and the height is according to
the content of the handwriting in the image. PNG format is used instead of JPEG
because the former is a lossless format and is more suitable for storing text images,
printed or handwriting.
Figure 1.21: Original image data sample with 850px width
Figure 1.22: Cropped and normalized image data from the IAM Handwriting Database.
2. Noise Removal
Image noise is defined as random variation of brightness or color information in
images, and is usually an aspect of electronic noise.
The two images below show that a bilateral filter preserves the edges of the
subjects in the image.
Figure 1.5: The sample image after applying dilation with a 5x100 kernel. The foreground pixels are spread
horizontally.
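The horizontal spreading described in the Figure 1.5 caption can be reproduced with a small NumPy sketch of binary dilation. The function name `dilate` and the toy image are illustrative assumptions; the text does not say which image-processing library the project used for this step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilate(binary, kernel_h=5, kernel_w=100):
    """Binary dilation with a rectangular kernel of ones.

    binary: 2D array where nonzero pixels are foreground (ink).
    A pixel becomes foreground if any pixel under the kernel
    placed around it is foreground.
    """
    fg = binary.astype(bool)
    pad_top, pad_left = kernel_h // 2, kernel_w // 2
    pad_bottom = kernel_h - 1 - pad_top
    pad_right = kernel_w - 1 - pad_left
    # Pad with background so the output keeps the input shape.
    padded = np.pad(fg, ((pad_top, pad_bottom), (pad_left, pad_right)))
    windows = sliding_window_view(padded, (kernel_h, kernel_w))
    return windows.any(axis=(2, 3))


# Toy example: one ink pixel in an otherwise empty image.
image = np.zeros((20, 300), dtype=np.uint8)
image[10, 150] = 1
dilated = dilate(image)
print(dilated.sum())  # number of foreground pixels after dilation
```

Because the kernel is much wider than it is tall, neighbouring letters and words on the same line merge into one blob while separate lines stay apart, which is why this step is commonly used before line segmentation.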
Feature Extraction
The features used for building the Random Forest are: Baseline; Line; Letter Size;
Line Spacing; Word Spacing; Top Margin; Pen Pressure; Slant of Letters.
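The article does not describe how each feature was measured. As a rough illustration only, two of them can be approximated directly from pixel values: the top margin as the number of blank rows above the first row containing ink, and pen pressure as the average darkness of the ink pixels. The function name, the threshold of 128 and the darkness-based pressure proxy are all assumptions made for this sketch.

```python
import numpy as np


def top_margin_and_pressure(gray, ink_threshold=128):
    """Estimate two handwriting features from a grayscale page.

    gray: 2D uint8 array with dark ink on light paper.
    Returns (top_margin_in_rows, pressure_proxy), or (None, None)
    if the page contains no ink.
    """
    ink = gray < ink_threshold                  # assumed ink/paper threshold
    rows_with_ink = np.flatnonzero(ink.any(axis=1))
    if rows_with_ink.size == 0:
        return None, None
    top_margin = int(rows_with_ink[0])          # blank rows above the first ink row
    # Darker strokes are treated as heavier pressure (an assumption).
    pressure = float(255 - gray[ink].mean())
    return top_margin, pressure


# Toy page: white background with one dark band of "writing".
page = np.full((50, 40), 255, dtype=np.uint8)
page[12:15, 5:30] = 40
margin, pressure = top_margin_and_pressure(page)
print(margin, pressure)
```

Raw values like these would still need to be binned into categories (for example small, medium or large margin) before they match the categorical inputs described in the problem statement.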
Classification Labels
1. Openness
2. Conscientiousness
3. Agreeableness
4. Neuroticism
Random Forest
Random forest is used for prediction modeling and behavior analysis because it does
not require feature scaling and is less affected by noise.
Given below are the steps followed for predicting personality traits using Random
Forest:
Figure 1.6: Steps for predicting personality traits using Random Forest
For predicting each personality trait a separate random forest classifier was built.
Given below is a snippet of the input data fed into the models:
Figure 1.7: Input data fed into the models
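The idea of one classifier per trait can be sketched with scikit-learn as follows. The data here is synthetic, and the column names, the binary trait labels and `n_estimators=50` are assumptions for illustration rather than the values used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical feature and trait names, loosely following the text.
feature_names = ["baseline", "letter_size", "line_spacing", "word_spacing",
                 "top_margin", "pen_pressure", "slant"]
traits = ["openness", "conscientiousness", "agreeableness", "neuroticism"]

# Synthetic stand-in data: categorical feature codes and binary trait labels.
X = rng.integers(0, 3, size=(200, len(feature_names)))
Y = {trait: rng.integers(0, 2, size=200) for trait in traits}

# Train one independent random forest per personality trait.
models = {}
for trait in traits:
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X, Y[trait])
    models[trait] = clf

# Predict every trait for a single handwriting sample.
sample = X[:1]
predictions = {trait: int(models[trait].predict(sample)[0]) for trait in traits}
print(predictions)
```

Keeping the models separate means each trait can have its own hyperparameters and its own feature-importance ranking.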
Hyperparameter Tuning
We used randomized search to find well-performing hyperparameters for the
Random Forest classifier. The following hyperparameters were tuned:
n_estimators
max_features
max_depth
min_samples_split
ccp_alpha
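A minimal sketch of such a search with scikit-learn's `RandomizedSearchCV` is shown below. The candidate values, `n_iter=5`, the 3-fold cross-validation and the synthetic data are assumptions; the article does not list the ranges that were searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(120, 7))   # synthetic feature codes
y = rng.integers(0, 2, size=120)        # synthetic labels for one trait

# Assumed candidate values for the five tuned hyperparameters.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_features": ["sqrt", "log2", None],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
    "ccp_alpha": [0.0, 0.001, 0.01],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # number of random combinations to try
    cv=3,              # 3-fold cross-validation per combination
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
print(best_params, search.best_score_)
```

Unlike an exhaustive grid search, this samples only `n_iter` combinations, which keeps the cost manageable when a separate model is tuned for every trait.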
Feature Importance
Using the Random Forest models, we were able to understand the importance of the
features that we extracted in the pre-processing step, because each model assigns
an importance score to every feature based on how much the trees in the forest
rely on it when splitting the data.
Figure 1.9: The most important features for each personality type
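Assuming the models were built with scikit-learn, a ranking like the one in Figure 1.9 can be read from the fitted model's `feature_importances_` attribute, which is impurity-based and sums to 1. In this sketch the data is synthetic and the label is deliberately driven by a single hypothetical feature so that the ranking is easy to check.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["baseline", "letter_size", "line_spacing", "word_spacing",
                 "top_margin", "pen_pressure", "slant"]

X = rng.integers(0, 3, size=(300, len(feature_names)))
# By construction, the label depends only on the pen_pressure column.
y = (X[:, 5] > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each feature name with its importance and sort from high to low.
importances = dict(zip(feature_names, clf.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)
for name in ranked:
    print(f"{name}: {importances[name]:.3f}")
```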
Results
Below are the results obtained from Random Forest:
ANN
Before we dive into the art of neural networks, we first need to understand what
an ANN is. In short, an Artificial Neural Network (ANN) is a machine learning
algorithm that mimics the way the brain processes information. In other words, an
ANN enables machines to process data in a way similar to the human brain. The
figure below shows how a biological neuron and an ANN process data in similar ways:
This is the simplest form of ANN, which consists of inputs (x1, x2, …, xn), weights
(w1, w2, …, wn) and an activation function. Similar to how a neuron takes inputs
through its dendrites, processes them from the nucleus to the axon and outputs the
results at the axon terminals, an ANN takes input data, gives a weight to each
input, passes the weighted inputs through an activation function and outputs the result.
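The single-neuron computation described above fits in a few lines of NumPy. The sigmoid activation, the optional bias term `b` and the example numbers are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def neuron(x, w, b=0.0, activation=sigmoid):
    """Weighted sum of the inputs passed through an activation function."""
    return activation(np.dot(w, x) + b)


x = np.array([0.5, -1.0, 2.0])    # inputs x1..x3
w = np.array([0.4, 0.3, -0.2])    # weights w1..w3
y = neuron(x, w)
print(y)  # sigmoid of the weighted sum 0.2 - 0.3 - 0.4 = -0.5
```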
Because of the vast amount of complex data coming out of the preprocessing steps,
the simplest form of ANN above is not enough. For this reason, we decided to
include two hidden layers, which distill redundant data and make the
process faster and more efficient. This is called a Multi Layer Perceptron (MLP),
which consists of an input layer, one or more hidden layers and an output layer
(Figure 2.1).
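To make the layer structure concrete, here is a NumPy sketch of a forward pass through an MLP with two hidden layers. The layer sizes, the ReLU and softmax activations and the random weights are assumptions; this is not the project's trained Keras model.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_mlp(layer_sizes, seed=0):
    """Create one (weights, bias) pair per connection between layers."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]


def forward(params, X):
    """Hidden layers use ReLU; the output layer uses softmax."""
    a = X
    for W, b in params[:-1]:
        a = relu(a @ W + b)
    W, b = params[-1]
    return softmax(a @ W + b)


# Input layer of 64 values, two hidden layers (32 and 16 units), 2 output classes.
params = init_mlp([64, 32, 16, 2])
batch = np.random.default_rng(1).random((5, 64))
probs = forward(params, batch)
print(probs.shape)
```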
Now that we have a better understanding of ANN, let’s see how we implemented
one for predicting personality from handwriting. The overall process is quite
simple: convert the pre-processed data into arrays of pixels and feed the arrays
into the ANN. The figure below shows a high-level view of
the ANN process in this project.
Figure 2.3: High-level view of the ANN process: With datasets of handwriting images, we converted them
into arrays of pixels and put them into ANN model
Although the data had already been preprocessed, we still needed a data
transformation step in which we encoded the categorical variables (the personality
labels, which are our target variable), reshaped the data matrices for the ANN, and
split the data into train, validation and test sets (70%, 15% and 15%, respectively).
Then we used Keras from TensorFlow for ANN:
RMSprop optimizer
Hyperparameter Tuning
Epochs: 60
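The transformation step described above (label encoding, reshaping and a 70/15/15 split) can be sketched as follows. The use of scikit-learn's `LabelEncoder` and `train_test_split`, the image size, the label names and the stratified split are assumptions; the article does not say which tools were used for this step.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
# Synthetic stand-ins: 100 grayscale images of 32x32 pixels and string labels.
images = rng.integers(0, 256, size=(100, 32, 32), dtype=np.uint8)
labels = rng.choice(["high", "low"], size=100)

# Encode the categorical target as integer codes.
y = LabelEncoder().fit_transform(labels)
# Flatten each image into one row of pixels and scale to [0, 1].
X = images.reshape(len(images), -1).astype("float32") / 255.0

# 70% train, then split the remaining 30% evenly into validation and test.
n_holdout = int(round(0.30 * len(X)))
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=n_holdout, random_state=0, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=n_holdout // 2, random_state=0, stratify=y_rest)

print(len(X_train), len(X_val), len(X_test))
```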
We can see from Figure 2.5 that the ANN is performing well by looking at the train
and test accuracy graph above. One interesting fact is that the test accuracy starts to
outperform train accuracy after the 34th epoch. Next, let’s see the relationship
between accuracy and loss.
Figure 2.6: Relationship between the test loss and accuracy
Similarly, in Figure 2.6, we can observe the equilibrium between the accuracy and
the loss at the 34th epoch and the accuracy continues to increase as the loss
continues to decrease.
Figure 2.7: Train loss vs Test loss
The above graph shows a comparison between the train loss and the test loss.
Interestingly enough, the test loss diverges from the train loss at epoch 20.
Robust to data with heteroskedasticity (data with highly volatile, non-constant
variance)
However, ANN is not an all-mighty algorithm. Recall that our objective is to predict
personality from handwriting, and the data consists of images! Unfortunately, an ANN
cannot take image data as it is; the images first have to be converted to numbers,
which could lead to the loss of important information. Furthermore, the high test
accuracy score could point to a problem with overfitting. Therefore, we decided to
try another popular neural network model, the Convolutional Neural Network (CNN).
CNN
Inspired by the way human visual perception recognizes things, a CNN follows a
hierarchical model that builds a network shaped like a funnel and finally
gives out a fully-connected layer, where all the neurons are connected to each other
and the output is processed. The input image is fed into the CNN layers, which
are trained to extract relevant features from the image. A CNN convolves learned
features with input data and uses 2D convolutional layers, making this architecture
well suited to processing 2D data such as images.
Figure 2.9: How CNN classifies handwritten digits
CNN Methodology
Data Preprocessing
As a first step, we separated the data into training, validation and test sets in the
ratio of 70%, 15% and 15% respectively.
Since the training set had only 657 images, Data Augmentation was used in an effort
to increase the number of samples.
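The article does not say which augmentations were applied. As one hedged example, small random translations are a common choice for handwriting because they keep the strokes intact; mirroring, by contrast, would change the direction of the slant. The function names, the shift range and the white fill value below are assumptions.

```python
import numpy as np


def random_shift(image, max_shift, rng, fill=255):
    """Translate a 2D image by a random offset, filling exposed pixels with `fill`."""
    h, w = image.shape
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = np.full_like(image, fill)
    # Copy the part of the image that stays inside the frame after shifting.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out


def augment(images, copies, max_shift=3, seed=0):
    """Return `copies` randomly shifted versions of every input image."""
    rng = np.random.default_rng(seed)
    return np.stack([random_shift(img, max_shift, rng)
                     for img in images for _ in range(copies)])


# Toy example: two white pages, each with a dark block of "writing".
pages = np.full((2, 20, 30), 255, dtype=np.uint8)
pages[:, 8:12, 10:20] = 0
augmented = augment(pages, copies=4)
print(augmented.shape)
```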
Model Building
Since the number of available images was limited even after augmentation, we
used Transfer Learning so that the model could take its lower-level
features from a pre-trained network. The base model used was InceptionResNetV2
with weights pre-trained on the ImageNet dataset.
Batch Normalization was used to scale the inputs and thereby make the network
more stable.
Model checkpoints were incorporated to store the best weights of the model.
Hyperparameter Tuning
Epochs: set the number of epochs to 30
CNN Results
Accuracy on the training set — 78.9%
Precision : 66.3%
Recall : 62.5%
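For reference, precision and recall are computed from the counts of true positives, false positives and false negatives. The sketch below shows the calculation for a single positive class on made-up labels; it does not reproduce the figures reported above, and the article does not state how the scores were averaged across classes.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels: 3 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```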
Figure 3.0: Train accuracy vs Validation accuracy
Unfreeze certain layers and try retraining the model with our dataset for those
layers
Conclusion
We used machine learning to automate the graphology process and determine
important personality traits through different classifiers: Random Forest,
ANN and CNN. Features were extracted after image preprocessing. The feature
importance we obtained for each trait from the classifiers was similar to the
importance a graphologist gives when determining the personality traits. Random
Forest performed better than CNN and ANN because subject knowledge was
incorporated into the pre-processing phase.
However, we are aware that additional resources are available to better understand
human personality. The samples were not standardized for pen type or ink
color. With standardization of pen, paper and margins, as well as guiding
personality questions, we could further enhance our automated handwriting process
and obtain more accurate results.
References
[1] D. J. Antony. Personality Profile Through Handwriting Analysis. Anugraha
Publications, 2008.
[2] Karen Amend and Mary S. Ruiz. Handwriting Analysis The Complete Basic Book.
New Page Books, 1980.
[4] Shitala Prasad, Vivek Kumar Singh, Akshay Sapre. Handwriting Analysis based
on Segmentation Method for Prediction of Human Personality using Support Vector
Machine. International Journal of Computer Applications (0975–8887) Volume 8,
No. 12, October 2010.
[5] Vikram Kamath, Nikhil Ramaswamy, P. Navin Karanth, Vijay Desai and S. M.
Kulkarni. Development of an Automated Handwriting Analysis System. ARPN
Journal of Engineering and Applied Sciences Vol. 6, No. 9, September 2011.
[6] Champa H N, K R AnandaKumar. Artificial Neural Network for Human Behavior
Prediction through Handwriting Analysis. International Journal of Computer
Applications (0975–8887) Volume 2, No. 2, May 2010.