
Received April 24, 2021, accepted May 2, 2021, date of publication May 6, 2021, date of current version May 17, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3078117

Empirical Analysis for Crime Prediction and Forecasting Using Machine Learning and Deep Learning Techniques
WAJIHA SAFAT1, SOHAIL ASGHAR1, (Member, IEEE), AND SAIRA ANDLEEB GILLANI2
1 Department of Computer Science, COMSATS University, Islamabad 44000, Pakistan
2 Department of Computer Science, Bahria University Karachi Campus, Karachi 75260, Pakistan

Corresponding author: Wajiha Safat ([email protected]; [email protected])

ABSTRACT Crimes and violations are threats to justice and must be controlled. Accurate crime prediction and forecasting of future trends can help enhance metropolitan safety computationally. The limited ability of humans to process complex information from big data hinders the early and accurate prediction and forecasting of crime. Accurately estimating crime rates, types and hot spots from past patterns creates many computational challenges and opportunities. Despite considerable research efforts, there is still a need for a better predictive algorithm that directs police patrols toward criminal activities. Previous studies have fallen short of achieving accurate crime forecasting and prediction with learning models. Therefore, this study applied different machine learning algorithms, namely logistic regression, support vector machine (SVM), Naïve Bayes, k-nearest neighbors (KNN), decision tree, multilayer perceptron (MLP), random forest, and eXtreme Gradient Boosting (XGBoost), together with time series analysis by long short-term memory (LSTM) and autoregressive integrated moving average (ARIMA) models, to better fit the crime data. The performance of LSTM for time series analysis was reasonably adequate in terms of root mean square error (RMSE) and mean absolute error (MAE) on both datasets. Exploratory data analysis covers more than 35 crime types and suggests a yearly decline in the Chicago crime rate and a slight increase in the Los Angeles crime rate, with fewer crimes occurring in February than in other months. The overall crime rate in Chicago will continue to increase moderately in the future, with a probable decline in later years. The ARIMA model suggests a sharp decline in the Los Angeles crime rate and crime counts. Moreover, crime forecasting results were further identified for the main regions of both cities. Overall, these results provide early identification of crimes, hot spots with higher crime rates, and future trends, with better predictive accuracy than other methods, and are useful for directing police practice and strategies.

INDEX TERMS LSTM and ARIMA based crime prediction, analysis and forecast.
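The abstract evaluates the LSTM forecasts with RMSE and MAE. As a minimal sketch of the standard definitions (not the authors' code), RMSE = sqrt(mean((y - ŷ)²)) and MAE = mean(|y - ŷ|) can be computed as below; the function names `rmse` and `mae` and the monthly counts are hypothetical illustrations, not values from the Chicago or Los Angeles datasets.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: mean of the absolute residuals."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly crime counts and forecasts (illustrative only).
actual = [120, 95, 130, 140]
forecast = [110, 100, 128, 150]
print(f"RMSE = {rmse(actual, forecast):.3f}")
print(f"MAE  = {mae(actual, forecast):.3f}")
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE and penalizes a few large misses more heavily, which is why the two are usually reported together.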

The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Tang.

I. INTRODUCTION

Criminality is a negative phenomenon that occurs worldwide, in both developed and underdeveloped countries. Criminal activities can severely strike the economy and affect the quality of life and well-being of residents, leading to social and societal issues [1]. Crimes and criminal acts can incur costs to both the public and private sectors [2]. Public safety is an important factor for secure environments when people travel or move to new places [3]. In reality, different kinds of crimes may be associated with distinct consequences [4]. Overall, crimes take place due to various circumstances, including specific motives, human nature and behavior, critical situations and poverty [5]. Furthermore, multiple factors such as unemployment, gender inequality, high population density, child labor and illiteracy can cause an increase in violent crimes [6]. Growing, densely populated cities also show a strong correlation with higher crime rates across multiple types of environments, such as commercial buildings and municipal housing areas [7]. A socially sustainable community relies heavily on minimizing crime so that people can live peacefully and actively, whereas corrupt societies cannot prosper socially or economically in the absence of peace.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
70080 VOLUME 9, 2021
W. Safat et al.: Empirical Analysis for Crime Prediction and Forecasting

Consequently, analyzing crime reports and statistics is essential to improving the safety and security of society while maintaining sustainable development.
Crime prediction has gained popularity in recent years because it enables investigating authorities to handle crime computationally. Better predictive algorithms are needed to direct police patrols toward criminals [8]. Several studies have predicted crime types, crime rates and crime hot spots using crime datasets from different areas, for example, South Korea and the U.S. (including Portland) [9], [10]. Furthermore, pilot projects have been extended to identify the geographical locations of crime, such as residential and commercial areas, using a dataset from Canada [11]. Research has been dedicated to implementing innovative methodologies, such as machine learning and deep learning techniques, to predict crimes as a rigorous approach and maintain a safe and secure environment [4]. Recent examples of machine learning and deep learning algorithms used successfully for crime prediction and analysis are Naïve Bayes, random forest, SVM, decision tree, and regression techniques [12], [13].
Accurate crime prediction is complicated but necessary for preventing criminal acts. Accurately estimating crime rates, types and hot spots from past patterns creates many computational challenges and opportunities. Machine learning is currently the mainstream approach to crime prediction; however, only a few studies systematically compare different machine learning methods. The ability of machine learning algorithms to process non-linear relational data has been confirmed in many fields, including crime prediction; these algorithms can handle very high-dimensional data with fast training and can extract the characteristics of the data [14]. Despite considerable research efforts, the literature lacks comparative accuracy results for crime prediction on large datasets from multiple cities; for example, the Los Angeles and Chicago datasets have rarely been used. Recent literature further highlights challenges concerning the accuracy of predicting and forecasting violent acts, mainly in high crime density areas, using different models [15]. Given that crime data are usually time series data that show seasonality and reflect how criminal activity has evolved over the years, time series analysis is required to generate visual patterns, along with a deep learning algorithm, specifically LSTM, which provides better classification of crimes over time based on adequate measures [16]. Additionally, recent research highly recommends forecasting crime trends with the ARIMA model [17].
Therefore, this study analyzes crime prediction on the Chicago and Los Angeles datasets [18] by (1) improving predictive accuracy relative to results in the recent literature using the logistic regression, SVM, Naïve Bayes, KNN, decision tree, MLP, random forest and XGBoost algorithms; (2) performing time series analysis with LSTM; (3) creating a visual summary through exploratory data analysis; and (4) forecasting the crime rate and high-intensity crime areas for subsequent years using an ARIMA model. Because the Chicago and Los Angeles datasets have been collected over many years, it is no surprise that machine learning and deep learning methods may be useful for predicting crime types and forecasting future crime [19]. The overall crime rate forecasts would help the police use the identified likely crime areas to allocate additional resources and protective measures against criminals.
This study reports improved accuracy in crime prediction compared with previously reported results, along with further analysis based on different machine learning algorithms. Besides crime prediction accuracy, the LSTM time series analysis is evaluated using different performance metrics. Moreover, the study provides a visual summary through exploratory data analysis to portray crime types and counts. Finally, the future crime rate and crime density areas for the next five years are examined using ARIMA.
The structure of this paper is organized as follows: Section 2 reviews the literature related to crime prediction. Section 3 presents preliminary classification methods, prediction and performance evaluation measures. Section 4 introduces the data and preprocessing. Section 5 explains the major findings with a detailed comparative analysis of crime prediction and future forecasting on the Chicago and Los Angeles datasets. Section 6 covers the discussions and future directions, with additional considerations and key points about the models. Finally, concluding remarks are given in Section 7.

II. LITERATURE REVIEW

The recent literature on crime prediction can be categorized into different research domains [20]. For example, several studies highlight ecological factors behind crimes, such as education, income level and unemployment, while others focus on spatio-temporal crime events [21], [22]. The recent literature also suggests that crime prediction and analysis draw on new types of data, such as posts from online platforms like Twitter and mobile phone data [23]. Nevertheless, these studies mainly focus on the causes of crimes and their consequences [24]. Herein, we particularly emphasize implementing multiple techniques to achieve substantial accuracy on two large datasets.
This section reviews related studies on crime prediction based on the Chicago and Los Angeles datasets and further highlights the classification, prediction and forecasting of crimes. Different aspects of crime detection have been analyzed using different research methods. However, the overall prediction depends directly or indirectly on the information available within the given dataset. Chicago and Los Angeles are both populous and iconic U.S. cities, and their datasets, which record multiple attributes, are publicly available from authorized repositories and have been a great source of attraction for analysts. With a specific goal to the brief, there have been