Empirical Analysis For Crime Prediction and Forecasting Using Machine
17, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3078117
ABSTRACT Crime and violations are a threat to justice and must be controlled. Accurate crime
prediction and forecasting of future trends can help to enhance metropolitan safety computationally. The
limited ability of humans to process complex information from big data hinders the early and accurate
prediction and forecasting of crime. The accurate estimation of the crime rate, types and hot spots from past
patterns creates many computational challenges and opportunities. Despite considerable research efforts,
there is still a need for a better predictive algorithm that directs police patrols toward criminal activities.
Previous studies fall short of achieving accurate crime forecasting and prediction with learning models.
Therefore, this study applied different machine learning algorithms, namely, the logistic regression, support
vector machine (SVM), Naïve Bayes, k-nearest neighbors (KNN), decision tree, multilayer perceptron
(MLP), random forest, and eXtreme Gradient Boosting (XGBoost), and time series analysis by long
short-term memory (LSTM) and the autoregressive integrated moving average (ARIMA) model to better fit
the crime data. The performance of LSTM for time series analysis was reasonably adequate in terms of the
magnitude of the root mean square error (RMSE) and mean absolute error (MAE) on both datasets.
Exploratory data analysis identifies more than 35 crime types and suggests a yearly decline in the Chicago
crime rate and a slight increase in the Los Angeles crime rate, with fewer crimes occurring in February
than in other months. The overall crime rate in Chicago will continue to increase moderately in the future,
with a probable decline in future years. The Los Angeles crime rate and crimes sharply declined, as
suggested by the ARIMA model. Moreover, crime forecasting results were further identified in the main
regions for both cities. Overall, these results provide early identification of crime, hot spots with higher
crime rates, and future trends with improved predictive accuracy compared with other methods, and are
useful for directing police practice and strategies.
INDEX TERMS LSTM and ARIMA based crime prediction, analysis and forecast.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
70080 VOLUME 9, 2021
W. Safat et al.: Empirical Analysis for Crime Prediction and Forecasting
Consequently, analyzing crime reports and statistics is essential to improve the safety and security of
humanity while maintaining sustainable development.

Crime prediction has gained popularity in recent years because it supports the ability of investigation
authorities to handle crime computationally. There is a need for better predictive algorithms, which direct
police patrols toward criminals [8]. Several studies have been carried out to predict crime types, crime
rates and hot spots of crime by using crime datasets for different areas, for example, South Korea and the
U.S. (including Portland) [9], [10]. Furthermore, different pilot projects have also been extended to
identify geographical crime locations, such as residential and commercial areas, using the Canada
dataset [11]. Research has been dedicated to implementing innovative methodologies such as machine
learning and deep learning techniques to predict crimes as a rigid approach and maintain a safe and secure
environment [4]. Recent examples of machine learning and deep learning algorithms for successful crime
prediction and analysis are the Naïve Bayes, random forest, SVM, decision tree, and regression
techniques [12], [13].

Accurate crime prediction is complicated but necessary for the prevention of criminal acts. The accurate
estimation of the crime rate, types and hot spots from past patterns creates many computational challenges
and opportunities. Crime prediction based on machine learning is the current mainstream for prediction
analysis; however, only a few studies systematically compare different machine learning methods. The
ability of machine learning algorithms to process non-linear rational data has been confirmed in many
fields, including crime prediction. They can handle very high-dimensional data with faster training speed
and can extract the characteristics of the data [14]. Despite considerable research efforts, the literature
lacks the relative accuracy for crime prediction from large datasets for multiple cities; datasets such as
those of Los Angeles and Chicago have rarely been used. Recent literature further suggests that challenges
remain concerning the accuracy of prediction and forecasting of violent acts, mainly in high crime density
areas, when implementing different models [15]. Given that, crime data is usually time series data, which
shows seasonality and suggests the potential significance of how crime activities evolved over the years.
Therefore, time series analysis is required to generate visual patterns, along with a deep learning
algorithm, specifically LSTM, which provides better classification of crimes over time based on adequate
measures [16]. Additionally, forecasting the crime trends through the ARIMA model is highly recommended
in recent research [17].

Therefore, this study aims to analyze crime prediction in the Chicago and Los Angeles datasets [18] by
(1) improving the predictive accuracy compared to results in the recent literature by implementing the
Logistic Regression, SVM, Naïve Bayes, KNN, Decision Tree, MLP, Random Forest, and XGBoost algorithms,
(2) time-series analysis by LSTM, (3) creating a visual summary through exploratory data analysis, and
(4) crime forecasting for the crime rate and high intensity crime areas for subsequent years by using an
ARIMA model. The Chicago and Los Angeles datasets have been collected throughout the years; it is no
surprise that machine learning and deep learning methods may be useful in the prediction of crime types
and forecasting future benefit [19]. The overall crime rate forecasting results would benefit the police by
using identified alleged crime areas to allocate additional resources and protective measures against
criminals.

This study reports an improved efficiency for accurate crime prediction as compared with that previously
achieved, with further analysis based on different machine learning algorithms. Besides crime prediction
accuracy, the LSTM for time series analysis was reported using different performance metrics. Moreover,
the study also provides a visual summary through exploratory data analysis to portray crime types and
counts. Finally, the future crime rate and crime density areas for the next five years were examined
through ARIMA.

The structure of this paper is organized as follows: Section 2 discusses the literature review related to
crime prediction. Section 3 presents preliminary classification methods, prediction and performance
evaluation measures. Section 4 introduces the data and preprocessing. Section 5 explains the major findings
with a detailed comparative analysis of the Chicago and Los Angeles datasets about crime prediction and
future forecasting. Section 6 covers the discussions and future directions with additional considerations
and key points about models. Finally, concluding remarks are given in Section 7.

II. LITERATURE REVIEW
The recent literature regarding crime prediction can be categorized into different research domains [20].
For example, several studies highlight the ecological factors behind crimes, such as education, income
level, and unemployment, to name a few, while spatial-temporal crime events have also been a
focus [21], [22]. The recent literature also suggests that crime prediction and analysis are based on new
types of data taken from online forums such as Twitter and from mobile phone data [23]. Nevertheless, all
these studies mainly focus only on the cause of crimes followed by their consequence [24]. Herein, we
particularly emphasize the implementation of multiple techniques to achieve substantial accuracy on two
large datasets.

The literature review section specifically reveals the related studies on crime prediction based on the
Chicago and Los Angeles datasets. This section further highlights the classification, prediction and
forecasting of crimes. Different aspects of crime detection have been analyzed by different research
methods. However, the overall prediction depends directly or indirectly on the information available
within the given dataset for crime prediction. Chicago and Los Angeles are both populous and iconic cities
of the U.S., and their datasets are publicly available at authorized repositories, relating multiple traits
that have been a great source of attraction for analysts. With a specific goal to the brief, there have been
different studies in recent years based on these datasets to predict accuracy and hotspot crime regions by
applying multiple machine-learning algorithms, and kinds of expectation accomplished. Some of the recent
studies on both cities are summarized below.

A. CHICAGO
Chicago is the third most populous city of the U.S., and crime rates are more often distinct as compared to
less populated areas. Most crimes are associated with location, properties and distribution of people,
rather than patterns of past crimes. Some recent studies for crime analytics from the Chicago city dataset
are discussed below:

Kang et al. used environmental context information to improve the prediction of models by proposing a
feature-level data fusion method on deep neural networks [6]. This study used four demographic datasets
(City of Chicago Data Portal, American FactFinder, Weather Underground, and Google Street View) for the
year 2014 and showed improved results in terms of area under the curve, precision and recall. Stec and
Klabjan utilized the neural network idea by merging two techniques, the convolutional neural network (CNN)
and the recurrent neural network (RNN), and achieved 75.6% accuracy [10]. The study was conducted on
multiple datasets including Portland, public transportation, weather, census and the Chicago dataset with
6 million records. It predicted the top three crime types, namely violence, theft and narcotic crimes, for
Chicago after implementing Feed Forward, CNN and RNN with 71.3%, 72.7% and 74.1% accuracy, respectively.
Another recent study conducted on the Chicago dataset from the year 2001 to 2019 also forecasts future
crimes by using the ARIMA model [15]. They proposed their own model, LFSNBC, and achieved 97.47% accuracy,
along with SVM 67.01%, deep neural network (DNN) 84.25% and kernel density estimation (KDE) 66.33%.
Najjar et al. used 12,000 satellite images to inquire about crime rates from data and reports gathered by
the police department [12]. Their finding predicts 79% accuracy by executing CNN using the deep learning
concept. Wang et al. implemented Linear Regression and Negative Binomial Regression to figure out the MAE
and mean relative error (MRE) for two Chicago datasets: the point of interest (POI) data and taxi
flow [13]. POI was applied to aid the demographic features, while taxi flow was used as a hyperlink to help
the neighbors by seeking geographical knowledge. Results anticipate the rapid decline in the overall crime
rate.

Statistical analysis was conducted to evaluate violent and non-violent crimes by using Chicago arrest data
for the social criminal network [25]. K–S tests were executed to model the exact-repeat and near-repeat
effects of the arrest. Catlett et al. applied two algorithms, DBSCAN clustering and ARIMA, to detect
high-risk crime regions and to forecast future crime trends by using a spatial-temporal approach [26].
Catlett et al. proposed a spatial-temporal approach that discovers high-risk crime areas, mostly urban, and
dependable trends for crime forecasting in every region by using a clustering technique [27]. Christian
et al. relate socioeconomic and sustainable development indicators like poverty rate and unemployment to
crime by implementing Linear Regression Analysis from the year 2008 to 2012 for Chicago [28]. A detailed
analysis report cited by Schnell et al. addressed 359,786 incidents that were geocoded to 41,926 street
segments nested within 342 neighborhood clusters, within 76 communities from 2001 to 2014 [29]. There
have been multiple studies performed by using geographical locations, meta-association rules and a specific
detection system introduced to examine the crime rate in Chicago [30].

B. LOS ANGELES
Data from Los Angeles corroborate almost identical percentages and indicate the involvement of long-term
dependency on additional systems and subsequent higher costs. Different studies highlight more
comprehensively the crime prediction among the dually involved population in Los Angeles. Young et al.
studied the report of the Los Angeles Times and the Data Desk (a team of reporters and Web developers) to
inspect the technological changes in the newsroom at the start of the twenty-first century [31]. This study
suggested that computational schemes appear in a discontinuous advancement of practices, identities and
norms. A study conducted by Contreras further analyzes the connection between dispensaries for medical
marijuana and crime rates in Los Angeles [32]. Their outcome indicates that dispensaries for marijuana are
considered as an assailant of crime. Another similar study conducted by Dierkhising et al. reveals an
intense female involvement among the sample to predict rearrest rates and child welfare histories [33].
Brantingham et al. analyzed racial biases using arrests for predictive policing experiments. The findings
anticipate that the total numbers of arrests at algorithmically predicted locations were numerically
higher [8]. Ridgeway et al. further evaluate the impact of rail transit on crime from 1988 to 2014 in
neighborhoods near transit stations [34]. With permutation tests, results revealed that there was no
appreciable crime effect of rail transit. Valasik et al. inspect the environmental risk factors in East
Los Angeles for the year 2012 that spatially influence gang assaults and gang violence [35]. RTM (risk
terrain modeling) was used as an analytic tool that greatly aided the local law enforcement, stakeholders,
and policymakers by presenting anti-gang efforts to high-risk areas.

Orsogna et al. addressed the complex data analysis issues by using the modeling tools for research,
mathematicians and scientists to predict crime and safety measures [36]. Almanie et al. used the dataset
for the year 2014 to predict the potential crime type and applied the Apriori algorithm, Naïve Bayesian and
Decision Tree [37]. The result achieved 54% prediction accuracy with ‘robbery’ as a major attempted crime.
Wang et al. predicted the spatial-temporal crime distribution in Los Angeles over the last six months of
2015 [4]. Results provide reliable guidance for crime
control after applying ST-ResNet and CNN on 104,957 crimes. Sungyong et al. analyzed the classification of
crime, whether the crime is gang-oriented or not, through a Generative Neural Network (GNN) [38]. The model
is capable of classifying gang-oriented crimes when complete information is available from 2014 to 2016 in
the Los Angeles dataset. For crowd-sourcing crime prediction, the Hawkes technique was introduced on Los
Angeles crime reports, which requires no previous history [39]. This method illustrates a real-time crime
model with an online k-means type algorithm.

Overall, studies on crime prediction and forecast highlight multiple research aspects, based on multiple
cities worldwide. All these studies mainly involve different types of models including socio-economic
factors such as education, income level, and unemployment, to name a few. In addition to socio-economic
factors, multiple computational models have been proposed to enhance crime prediction, classification and
forecast, as well as spatial-temporal models, which specifically assess the hotspot crime regions.
Different methodologies have been analyzed for crime prediction in different areas such as South Korea,
the U.S. (including Portland), Canada, and many others [9]–[13]. Significant research effort has been made
in different aspects, yet the literature still points to a major concern regarding better prediction
accuracy, forecast and hotspot detection in large datasets such as those of the cities of Chicago and Los
Angeles. The results and discussion part is divided into prediction accuracy, time series analysis and time
series forecasting as shown in Fig. 1.

III. PREDICTION AND FORECASTING
Crime prediction and forecasting approaches have transformed dramatically in recent years since the
introduction of commercial software packages. Crime prediction refers to the accuracy of reported crimes in
the past, whereas forecasting is directed toward future crime trends. However, a quick overview of criminal
activities has been achieved by investigation authorities through the available software packages, whereas
for deep analysis, only learning approaches may ensure the optimum solution. Therefore, different machine
learning techniques can be used to predict crime patterns and thus may assist in further necessary actions
based on historical data. Therefore, this study is divided into two sections: i) crime prediction and
ii) crime forecasting. Eight different machine learning algorithms are implemented to achieve highly
accurate predictions in both the Chicago and Los Angeles datasets. The machine-learning algorithms
implemented in this study to obtain the crime prediction accuracy were logistic regression, decision tree,
random forest, MLP, Naïve Bayes, SVM, XGBoost, and KNN. Detailed information about the model architecture
of these machine learning algorithms is given in the supplementary information (SI), and an experimental
flow chart is given in Fig. 2. The prediction results further identify areas with high crime density, all
crime types and the crime rate over the past years. Additionally, the statistical model ARIMA for time
series analyses was applied to foresee future crime trends and analytics.

Crime forecasting based on time series data was also implemented in a later part of this study. A
time-series analysis involves forecasting based on a sequence of events or data points that forms a series
with respect to time [40]. Research groups around the globe have recently used different approaches,
including unsupervised models such as the bilinear model, the threshold autoregressive (TAR) model, the
autoregressive conditional heteroscedastic (ARCH) model and deep learning approaches, to identify future
trends [41]. Real-time crime forecasting is always critical, especially in unknown
circumstances; when and where the next crime will happen remains difficult to predict accurately [42].
Therefore, we used an ARIMA model for future forecasting and calculated the RMSE to aggregate the
magnitudes of the errors and crime predictions. The details of the ARIMA model are discussed in the SI.
The forecasting results illustrate future crime trends by highlighting the crime hot spots, top five
crimes and overall crime rates until 2024.

IV. DATA AND PREPROCESSING
The data used in this study consists of criminal records for the cities of Chicago and Los Angeles, and is
the most decisive part in achieving crime prediction accuracy. Herein, we used two big datasets, namely
Chicago and Los Angeles, which were obtained from open access data portals and are easily downloadable in
CSV format [18], [19].

The dataset of Chicago city contains the crime history (reports and social factors) from 2001 to November
2019. With a population of about 2.7 million, Chicago appears to be higher in crime density, and the crime
rate has been reported to double during the 2005 to 2008 period as compared to the rest of the U.S., where
approximately 16% circulation rate was predicted by 2012 [43]. Given that, the situation drove the police
officials to revise their policies, which consequently showed a decreasing trend in recent years. The
freshly available dataset contains detailed information regarding time, location (i.e., latitude and
longitude), and types of crime, with 22 attributes along with more than 7 million instances.

Los Angeles: The dataset of Los Angeles city contains the criminal history from 2010 to 2018. With a
population of about 3.9 million, crime reports declined significantly until 2015, but with an increasing
trend after 2015. The Los Angeles dataset is reported by the Los Angeles police department, and contains
17 attributes with more than 2.6 million instances.

Initially there were 7019734 crime instances within the Chicago dataset, and 16913 crimes were removed due
to invalid formatting (missing data, dates, values etc.). The experiment is performed on 7002821 instances
of the Chicago dataset. In the Los Angeles dataset, there were 2651233 instances initially and 4770
instances were removed during data pre-processing. Finally, there were 2646463 instances for Los Angeles
for experiments. The common attributes were chosen in both datasets for better comparative analysis, which
were named as ID, date, crime primary type, description of the crime, location, year, zip code, and police
district. Both Chicago and Los Angeles datasets have 35 different crime types.

The accumulated raw data from online repositories usually contains irrelevant information and errors. The
overall data will likely have noise, inconsistencies, outliers, errors, and missing values or, more
fundamentally, the data is inconsistent from the start. Therefore, the selection of meaningful data is
necessary to eliminate anomalies such as outliers, noise, missing values, and other discrepancies, and thus
to convert unusable data into a form that is manageable for information handling. Additionally, collection
for the more mind-boggling framework is always required keeping in view the current growth rate of data in
business, industry applications, science, and research networks. Data preprocessing comprises data
preparation, compounded by integration, cleaning, normalization, and transformation of data, and data
reduction tasks, thereby reducing the complexity of the data, detecting or removing irrelevant and noisy
components from the data through feature selection, instance selection, or discretization processes, and
thus finally assists to generate statistically significant data to make accurate crime predictions [44].
Therefore, the bootstrap random sampling method shown in Fig. 3 was used; it is an over feature selection
method, which is also common since it is the least biased method
to generate estimates of population parameters, specifically when the dataset is big [45]. Initially the
datasets were examined from different sources to take common attributes. In total, there are 9 common
attributes in both datasets, and data cleaning was assured by removing all missing values. For
implementation, the Python (version 3.6.3) framework was used with different libraries, mainly for data
transformation, e.g., imblearn and sklearn. The final attributes considered for this study were named as
ID, date, crime primary type, description of the crime, location, year, zip code and police district. The
data is then divided into a training set (70%) and a test set (30%). Finally, there were 7002821 instances
for Chicago and 2646463 instances for Los Angeles after the pre-processing step. Accuracy, precision,
recall and F1-score are the main parameters used for performance evaluation in this study.

FIGURE 3. Bootstrap random sampling method.

V. RESULTS
The results and discussion part is divided into four sections based on methodology as shown in Fig. 1:
predictive accuracy, time series analysis through LSTM, exploratory data analysis, and forecasting with an
ARIMA model. The experimental results are also shown and discussed in each section. First, the predictive
accuracy is discussed based on different algorithms. In the second part, time series analysis was performed
through LSTM to measure the performance of the model. Thereafter, crime particulars are thoroughly
discussed in the exploratory data analysis section, and finally, crime forecasting and future crime trends
are shown through the ARIMA model. Different Python libraries were applied, including Keras with
TensorFlow, scikit-learn, Pandas, NumPy, Seaborn, SciPy, and many others, to generate the results.

A. PREDICTIVE ACCURACY
This study used different parameters to assess the performance of multiple algorithms, which better
reflect the real dataset application. Eight different algorithms were applied to the Chicago and Los
Angeles datasets to investigate the detailed predictive accuracy of the trained models, as shown in Fig. 4.
To the best of our knowledge, these algorithms have not been implemented together for the Chicago and Los
Angeles datasets. The main reason to choose these cities is population density, which reported higher
crime rates in the past with big data. The implemented algorithms have different methodologies to refine
the data that involve supervised, unsupervised and reinforcement learning approaches. Additionally, Random
Forest and XGBoost were also implemented, which follow an ensemble learning approach. Decision Tree lays
out the significant decisions, while SVM and Naïve Bayes are used for better classification and KNN for
advanced regression. To handle dependent variables, logistic regression is implemented along with MLP,
which refers to a network of multiple layers of perceptrons. All these methods seek improved accuracy to
the best of their proficiency, along with other performance metrics such as precision, recall and F1-score,
as listed in Table 1. The accuracy estimates the proportion of instances that are correctly classified to
obtain the optimum threshold for crime prediction. XGBoost performs better than other algorithms with 94%
and 88% accuracy on the Chicago and Los Angeles datasets, respectively, as multiple innovative algorithms
work behind XGBoost. The Naïve Bayes, MLP (with hidden layer sizes of 24, 28, 30, and 34), and SVM
algorithms also achieve a better performance on the Chicago dataset than on the Los Angeles dataset with
maximum accuracy. The decision tree algorithm achieves an accuracy of approximately 66% (Chicago) and 60%
(Los Angeles). The MLP (87 and 84%) and KNN (88 and 89%) algorithms also approach the maximum accuracy on
both datasets. The logistic regression model determines the statistical relationship between variables to
achieve optimal results; here, it depicts consistent performance with 90% accuracy on the Chicago dataset and
achieves below average results on the Los Angeles dataset. All these reported accuracy results are higher
as compared with the literature.

Conversely, the random forest model achieves 77% accuracy on the Chicago dataset, while the Naïve Bayes
algorithm achieves almost the same results on the Los Angeles dataset. The accuracy also depends on how
often the crime happened in the past, and predicting rare crimes in the population of interest might result
in low accuracy. However, the SVM algorithm achieves average results; the random forest model achieves the
worst results on the Los Angeles dataset. Overall, the performances of these machine-learning algorithms
are more consistent in the Chicago dataset than in the Los Angeles dataset.

The classification quality is usually evaluated on the performance of objective functions such as
precision, recall and F1-score. The recall is the proportion of relevant instances that are retrieved by
the classifier, whereas the precision is the proportion of instances assigned to a class that truly belong
to it. Both functions are optimized simultaneously, although the two objectives have an inverse
relationship, whereas the F1-score is the harmonic mean of recall and precision. The Chicago dataset
yielded the highest performance metrics compared with the Los Angeles dataset and suggests better and
stable algorithm performance. The general performance parameters, i.e., precision, recall, and F1-score,
for the Los Angeles dataset are not stable enough, thereby suggesting moderate performance. XGBoost
exhibits better results for precision, recall, and F1-score than the other models.

B. TIME SERIES ANALYSIS THROUGH LSTM
LSTM is an elegant variation of the RNN architecture, which is an approach that can be applied to model
sequential data. The structure of LSTM makes it an effective solution to combat the vanishing gradient
problem of RNNs. It uses memory capable of representing the long-term dependencies in sequential data.
LSTM ensures improved learning for time series by capturing the structure of sequential data more naturally
and even performs hierarchical processing for complex temporal tasks. Time series classification tasks are
different from traditional classification and regression predictive modeling problems and have been
considered challenging in terms of data mining for the last two decades [46]. From electronic health
records to cybersecurity, almost all real-world applications require time-series data for
classification [47]. A detailed description of LSTM is provided in the SI [48].

Prior to LSTM implementation, the data were preprocessed to reduce noise and then transformed into
stationary data. Time series data are usually in non-stationary form and must be transformed into
stationary form for easier handling and better classification [49]. Therefore, the Dickey-Fuller test is
conducted to check for stationary data in a standard way and to further evaluate the appropriate error
scores [50]. The results provide in-depth guidance from data processing and training of the LSTM model for
a set of time-series data.

For time-series data, different types of errors are usually measured, such as the scale-dependent error and
percentage error. Herein, two known scale-dependent error measures were used, namely, the RMSE and the MAE,
along with the number of epochs and batch size. The RMSE measures the average magnitude of the errors.
Specifically, it is the square root of the average of the squared differences between the predicted and
actual observations. Therefore, the RMSE will be more useful when large errors are particularly undesirable.
The MAE measures the average magnitude of the errors in a set of predictions, regardless of their
direction. Therefore, it is the average across the test sample of the absolute differences between the
predicted and actual observations, where all the individual differences have equal weight. The performance
metrics of LSTM are listed in Table 2, which indicates the performance of the corresponding model on the
testing data rather than the training data.

The outcome of the epochs showed the same loss value after the 13th iteration for the Chicago dataset,
whereas for Los Angeles, the loss value started repeating after the 18th iteration. There is no evidence
that training the network with the same dataset more than once would improve the accuracy of the
prediction. In some cases, the performance even worsens, indicating that the trained models are
overfitting. However, apparently setting the number of epochs to 1 generates a reasonable prediction
model [51]. The performance of LSTM seems to be adequate for time series analysis, especially for RMSE and
MAE, where it can classify the data focusing on their variations.
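As an illustration of the two scale-dependent error measures defined in this section, the following minimal sketch computes the RMSE and the MAE in plain Python. It is not the implementation used in this study; the function names and the monthly crime counts are hypothetical and serve only to show why the RMSE penalizes large errors more heavily than the MAE.

```python
import math

def rmse(actual, predicted):
    """Square root of the mean of the squared prediction errors."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean of the absolute prediction errors, ignoring their direction."""
    absolute = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Hypothetical monthly crime counts (illustrative values, not from the datasets).
actual = [100, 120, 90, 110]
pred_even = [105, 115, 95, 105]   # four moderate errors of 5 each
pred_spiky = [100, 120, 90, 130]  # a single large error of 20

# Both forecasts have the same total absolute error, so their MAE is equal,
# but the single large miss gives the second forecast a higher RMSE.
for name, pred in [("even", pred_even), ("spiky", pred_spiky)]:
    print(name, "MAE =", mae(actual, pred), "RMSE =", rmse(actual, pred))
```

In this sketch both forecasts share the same MAE, while the forecast with one large miss has the larger RMSE, which matches the statement that the RMSE is more useful when large errors are particularly undesirable.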
Fig. 5 shows the approximate distribution of the mean crime density areas in different periods after LSTM implementation. The different frequencies include the daily, weekly, monthly, quarterly, and yearly results. The mean crime density area for Chicago has an intense variation trend mainly in daily and weekly data, whereas the monthly and quarterly data have moderate variation trends (Fig. 5A). However, the mean crime type for Los Angeles presents some variations initially and then a decreasing trend in recent years, finally becoming stable (Fig. 5B). The overall process involves developing a function that calculates and presents the moving average of the events in the neighborhood of the events. In recent years, the majority of mean crime types demonstrate a downward trend, which suggests a further decline in the majority of forecasts in the overall time intervals (Fig. 5). However, it is not applicable when the historically upward trend is related to other criminal offenses. The time series classification is potentially a direct indicator; it should be treated not as an approximation of specific values but rather as a data-driven model.
spot districts for the crime with their corresponding numbers of crime incidents (Fig. 7). The 24 crime regions with the highest crime rates in Chicago and Los Angeles are listed with further extensive insights in Table 3. Fig. 7A and 7B display the hot spot regions for Chicago and Los Angeles with their respective crime counts. Additionally, future crime density areas were also studied by using an ARIMA model, which will be discussed in the next section. The crime types and their estimated intensities are even more important to determine the anticipated chance of crime occurrences. The visual frequencies of each crime type with the corresponding crime count are shown in Fig. 8. Theft, battery, criminal damage, narcotics, offense, robbery, motor vehicle theft, deceptive practice, burglary, and assault were the main crimes observed in Chicago (Fig. 8A). Miscellaneous offenses, larceny-theft, assault, narcotics, burglary, grand theft auto, juvenile theft, kidnapping, vehicle loss, vandalism, and accidents were the main crime types in Los Angeles (Fig. 8B). The visual representation allows investigation authorities to take special measures against these violations.
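The aggregation into different periods and the moving average mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `counts_per_period` and `moving_average`, the trailing window, and the sample events are all hypothetical.

```python
from collections import Counter
from datetime import date


def counts_per_period(event_dates, period):
    """Count events per calendar bucket ('daily', 'monthly', or 'yearly')."""
    def key(d):
        if period == "daily":
            return d
        if period == "monthly":
            return (d.year, d.month)
        if period == "yearly":
            return d.year
        raise ValueError("unknown period: " + period)

    # Buckets with zero events are simply absent from the result.
    return dict(sorted(Counter(key(d) for d in event_dates).items()))


def moving_average(values, window):
    """Trailing moving average; the first window - 1 points yield no output."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical incident dates (one entry per reported crime).
events = (
    [date(2020, 1, 1)] * 3
    + [date(2020, 1, 2)] * 6
    + [date(2020, 1, 3)] * 3
    + [date(2020, 2, 1)] * 12
)

daily = counts_per_period(events, "daily")
print(list(daily.values()))                     # per-day counts
print(moving_average(list(daily.values()), 2))  # smoothed daily series
print(counts_per_period(events, "monthly"))     # coarser aggregation
```

Coarser buckets and wider windows both smooth the series, which is consistent with the daily and weekly curves varying more strongly than the monthly and quarterly ones.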
FIGURE 5. Time series analysis with respect to mean crime density area for daily, weekly, monthly, quarterly, and yearly.
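The ARIMA forecasting discussed in this section relies on differencing the series until it is stationary and then fitting an autoregressive model to the differences. The sketch below shows the simplest such case, an ARIMA(1,1,0) model without a constant, fitted by least squares in plain Python. It is an assumption-laden illustration only: the model order, the function names, and the sample series are hypothetical, the paper does not state which order was used, and a real analysis would normally run a stationarity test (such as the Dickey-Fuller test) and use a statistics library rather than this hand-rolled fit.

```python
def difference(series):
    """First-order differencing: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept) on the differences."""
    if len(diffs) < 2:
        raise ValueError("need at least two differences to fit AR(1)")
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    if den == 0:
        raise ValueError("differenced series has no variation")
    return num / den


def forecast_arima_110(series, steps):
    """Forecast with ARIMA(1,1,0): AR(1) on differences, then integrate back."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level += last_diff           # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical yearly mean crime counts (illustrative values only).
yearly_counts = [100, 110, 115, 117.5]
print(forecast_arima_110(yearly_counts, 2))
```

With a coefficient below 1 in absolute value, the predicted differences shrink at each step, so the forecast levels off rather than growing without bound.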
Initially, it was assumed that the time series data were stable after differencing with bounded fluctuation. The ARIMA model was used for forecasting after passing the noise test, and later, a Dickey-Fuller test was conducted to examine the stationarity of the data. The prediction results of the ARIMA model for Chicago and Los Angeles are shown in Fig. 9. The objective of an ARIMA analysis is to determine the best predictive performance for the data of interest. The ARIMA model compares favorably with the alternative models. It presents the distribution of the results obtained for each dataset with all architectures depending on the historical window length.
Finally, the study forecasts the crime rate and hotspots for both Chicago and Los Angeles to ultimately support proactive policing strategies. The mean crime count is calculated to forecast the five-year crime trend. The RMSEs of the forecasted crime rate for Chicago and Los Angeles were 31.8 and 24.65, and the MAEs were 29.8 and 20.83, respectively. The Chicago crime rate pattern had intense variations in recent years, and the variation will continue to increase moderately in the future, probably followed by a stable decline in subsequent years, as observed in Fig. 9A. The Los Angeles crime rate has been stable over the last few years, and forecasts suggest a sharp decline in the future (Fig. 9B). After taking the mean of the high crime density areas identified as crime hot spots (in Fig. 9A and B, the x-axis is the number of crimes and the y-axis is the year), the Chicago crime intensity for hot spots increased slightly (Fig. 9C), where the x-axis is the top ID locations having higher crime rates in the past and the y-axis is the year. The Los Angeles crime intensity for the hot spot declined sharply (Fig. 9D).
FIGURE 9. Forecast analysis of crime rates and crime density areas by ARIMA model.
VI. DISCUSSION
Criminality is a phenomenon that occurs seemingly at random, and multiple research efforts have been made to develop rigorous and independent assessments. However, this study highlights the practical perspective of criminology by introducing predictive analysis through possible methods based on real-time data. Therefore, the implementation of different machine learning algorithms was examined, including LSTM and ARIMA modeling. First, the performance of different machine learning algorithms, namely logistic regression, SVM, Naïve Bayes, KNN, decision tree, MLP, random forest, and XGBoost, was examined on the datasets of Chicago and Los Angeles. The prediction accuracy achieved by the different algorithms is comparatively better than that reported earlier. The performance of the machine learning algorithms is more consistent for the Chicago dataset than for the Los Angeles dataset; XGBoost achieves improved prediction accuracy (around 94% and 88%), followed by KNN (around 88% and 89%), on the two crime datasets. Herein, this study reports better prediction accuracy for Chicago and Los Angeles, 94% and 88%, respectively, including all types of crimes, whereas previous literature reports 75.6% accuracy for Chicago by using the dataset until the year 2014 with only three types of crimes, namely, violence, theft, and narcotics [10]. Also, the Los Angeles dataset has rarely been used, and just a few studies were conducted, such as a permutation test and K-S test for gang assaults and gang violence, while recently Almanie et al. reported 54% prediction accuracy with ‘robbery’ as a major crime [37], [39]. Second, LSTM further classifies the crimes over different periods (yearly, quarterly, monthly, weekly, and daily). LSTM performance was evaluated based on RMSE, MAE, number of epochs, and batch size. In addition to crime prediction accuracy and LSTM classification, exploratory data analysis provides a visual summary for better comparative analysis between both cities. The results identify the different crime counts and crime types in different classified locations, with 35 crime types for both Chicago and Los Angeles. The annual crime trend shows a significant decrease in the Chicago crime rate, whereas Los Angeles shows an increase in recent years. Furthermore, theft, battery, criminal damage, narcotics, and offense were the top five crimes observed in Chicago, whereas miscellaneous offenses, larceny-theft, assault, and narcotics were the main crime types reported in Los Angeles. Finally, the crime rate and high-density crime areas were forecast for the next five years by using an ARIMA model. The ARIMA model suggests that the Chicago crime rate will continue to increase moderately in the future, whereas it suggests a sharp decline for Los Angeles. This study reports the five-year crime trend and high crime density areas until 2024 with ARIMA, as compared with previous reports by using ARIMA. The Chicago crime density in hot spots increased slightly, whereas it will sharply decline in Los Angeles. The ARIMA model performs better than LSTM based on RMSE and MAE. Overall, the proposed aims and objectives of the study are fulfilled and portray a clear picture of machine learning and deep learning techniques and their implementation, with potential for different types of big datasets. All these results could benefit situational awareness with the help of descriptive graphs that depict the trend analysis with future forecasts. The findings will further assist law enforcement agencies and investigation departments in determining policies and meaningful insights, such as high crime density areas, and help the government and city management to ensure public safety. As a future augmentation, we intend to apply hybrid models to expand crime prediction accuracy and to enhance the overall performance. In addition, future work plans to build up visual images and location maps creating effectual anticipation from
the foreseen crime event providing a chance to upgrade the regulation of the patrolling system by police.
VII. CONCLUSION
Crimes are serious threats to human society, safety, and sustainable development and are thus meant to be controlled. Investigation authorities often demand computational predictions and predictive systems that improve crime analytics to further enhance the safety and security of cities and help to prevent crimes. Herein, we achieved an improved predictive accuracy for crimes by implementing different machine learning algorithms on Chicago and Los Angeles crime datasets. Among the different algorithms, XGBoost achieves the maximum accuracy on the Chicago dataset and KNN achieves the maximum accuracy on the Los Angeles dataset. Data preprocessing was followed by splitting the dataset into training and testing sets, and later the performance parameters were examined. This study further applied a deep learning architecture for time series analysis through LSTM, by which the Chicago crime count had intense variations compared with Los Angeles, as shown by the RMSE and MAE. Also, the exploratory data analysis exhibited extensive visualizations regarding crime particulars, including crime rates in different periods from daily to yearly trends, crime types, and high-intensity areas based on historical patterns. Moreover, the implementation of an ARIMA model to predict the five-year trends regarding the crime rate and hot spots having high crime density suggests moderate variations for Chicago and a decline for Los Angeles. For future work, this study will be expanded by using satellite imagery data and the implementation of different learning techniques with corresponding visual data for different crime datasets.
APPENDIX
The machine-learning algorithms implemented in this study (Logistic Regression, SVM, Naïve Bayes, KNN, Decision Tree, MLP, Random Forest, XGBoost) and the LSTM and ARIMA models are detailed in SI.
ACKNOWLEDGMENT
Wajiha Safat acknowledges the financial support for M.S. study from COMSATS University, Islamabad. She especially thanks Dr. Abdul Ghaffar (Institute of Metal Research, Chinese Academy of Sciences, Shenyang) for fruitful discussions.
COMPLIANCE WITH ETHICAL STANDARDS
Conflicts of Interest: The authors declare no conflict of interest.
REFERENCES
[1] G. Mohler, ‘‘Marked point process hotspot maps for homicide and gun crime prediction in Chicago,’’ Int. J. Forecasting, vol. 30, no. 3, pp. 491–497, Jul. 2014.
[2] A. Iriberri and G. Leroy, ‘‘Natural language processing and e-government: Extracting reusable crime report information,’’ in Proc. IEEE Int. Conf. Inf. Reuse Integr., Las Vegas, NV, USA, Aug. 2007, pp. 221–226.
[3] V. Pinheiro, V. Furtado, T. Pequeno, and D. Nogueira, ‘‘Natural language processing based on semantic inferentialism for extracting crime information from text,’’ in Proc. IEEE Int. Conf. Intell. Secur. Informat., Vancouver, BC, Canada, May 2010, pp. 19–24.
[4] B. Wang, P. Yin, A. L. Bertozzi, P. J. Brantingham, S. J. Osher, and J. Xin, ‘‘Deep learning for real-time crime forecasting and its ternarization,’’ Chin. Ann. Math., B, vol. 40, no. 6, pp. 949–966, Nov. 2019.
[5] S. Chackravarthy, S. Schmitt, and L. Yang, ‘‘Intelligent crime anomaly detection in smart cities using deep learning,’’ in Proc. IEEE 4th Int. Conf. Collaboration Internet Comput. (CIC), Philadelphia, PA, USA, Oct. 2018, pp. 399–404.
[6] H.-W. Kang and H.-B. Kang, ‘‘Prediction of crime occurrence from multimodal data using deep learning,’’ PLoS ONE, vol. 12, no. 4, Apr. 2017, Art. no. e0176244.
[7] A. Fidow, M. Hassan, M. Imran, X. Cheng, C. Petridis, and C. Sule, ‘‘Suggesting a hybrid approach mobile apps with big data analysis to report and prevent crimes,’’ in Social Media Strategy in Policing (Security Informatics and Law Enforcement), B. Akhgar, P. S. Bayeri, and G. Leventakis, Eds. Cham, Switzerland: Springer, 2019, pp. 177–195.
[8] P. J. Brantingham, M. Valasik, and G. O. Mohler, ‘‘Does predictive policing lead to biased arrests? Results from a randomized controlled trial,’’ Statist. Public Policy, vol. 5, no. 1, pp. 1–6, Jan. 2018.
[9] A. Nasridinov and Y.-H. Park, ‘‘A study on performance evaluation of machine learning algorithms for crime dataset,’’ Adv. Sci. Technol. Lett., vol. 90, pp. 90–92, Dec. 2014.
[10] A. Stec and D. Klabjan, ‘‘Forecasting crime with deep learning,’’ 2018, arXiv:1806.01486. [Online]. Available: http://arxiv.org/abs/1806.01486
[11] J. Fitterer, T. A. Nelson, and F. Nathoo, ‘‘Predictive crime mapping,’’ Police Pract. Res., vol. 16, no. 2, pp. 121–135, Mar. 2015.
[12] A. Najjar, S. Kaneko, and Y. Miyanaga, ‘‘Crime mapping from satellite imagery via deep learning,’’ 2018, arXiv:1812.06764. [Online]. Available: http://arxiv.org/abs/1812.06764
[13] H. Wang, D. Kifer, C. Graif, and Z. Li, ‘‘Crime rate inference with big data,’’ in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, Aug. 2016, pp. 635–644.
[14] X. Zhang, L. Liu, L. Xiao, and J. Ji, ‘‘Comparison of machine learning algorithms for predicting crime hotspots,’’ IEEE Access, vol. 8, pp. 181302–181310, 2020.
[15] G. R. Nitta, B. Y. Rao, T. Sravani, N. Ramakrishiah, and M. BalaAnand, ‘‘LASSO-based feature selection and Naïve Bayes classifier for crime prediction and its type,’’ Service Oriented Comput. Appl., vol. 13, no. 3, pp. 187–197, Sep. 2019.
[16] A. L’Heureux, K. Grolinger, H. F. Elyamany, and M. A. M. Capretz, ‘‘Machine learning with big data: Challenges and approaches,’’ IEEE Access, vol. 5, pp. 7776–7797, 2017.
[17] Z. Zhang, D. Sha, B. Dong, S. Ruan, A. Qiu, Y. Li, J. Liu, and C. Yang, ‘‘Spatiotemporal patterns and driving factors on crime changing during black lives matter protests,’’ ISPRS Int. J. Geo-Inf., vol. 9, no. 11, p. 640, Oct. 2020.
[18] Chicago Data Portal. Accessed: Nov. 2, 2019. [Online]. Available: https://data.cityofchicago.org/Public-Safety/Crimes-2001-topresent-Dashboard/5cd6-ry5g
[19] Los Angeles County GIS Data Portal. Accessed: Nov. 2, 2019. [Online]. Available: http://egis3.lacounty.gov/dataportal/?s=crime
[20] L. Lochner, ‘‘Education and crime,’’ in The Economics of Education: A Comprehensive Overview, S. Bradley and G. Green, Eds. New York, NY, USA: Academic, 2020, pp. 109–117.
[21] G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita, ‘‘Self-exciting point process modeling of crime,’’ J. Amer. Stat. Assoc., vol. 106, no. 493, pp. 100–108, Mar. 2011.
[22] J. H. Ratcliffe, ‘‘A temporal constraint theory to explain opportunity-based spatial offending patterns,’’ J. Res. Crime Delinquency, vol. 43, no. 3, pp. 261–291, Aug. 2006.
[23] M. S. Gerber, ‘‘Predicting crime using Twitter and kernel density estimation,’’ Decis. Support Syst., vol. 61, pp. 115–125, May 2014.
[24] M. Traunmueller, G. Quattrone, and C. Capra, ‘‘Mining mobile phone data to investigate urban crime theories at scale,’’ in Social Informatics (Lecture Notes in Computer Science), L. M. Aiello and D. McFarland, Eds. Cham, Switzerland: Springer, 2014, pp. 396–411.
[25] P. Kump, D. H. Alonso, Y. Yang, J. Candella, J. Lewin, and M. N. Wernick, ‘‘Measurement of repeat effects in Chicago’s criminal social network,’’ Appl. Comput. Informat., vol. 12, no. 2, pp. 154–160, Jul. 2016.
[26] C. Catlett, E. Cesario, D. Talia, and A. Vinci, ‘‘Spatio-temporal crime predictions in smart cities: A data-driven approach and experiments,’’ Pervasive Mobile Comput., vol. 53, pp. 62–74, Feb. 2019.
[27] C. Catlett, E. Cesario, D. Talia, and A. Vinci, ‘‘A data-driven approach for spatio-temporal crime predictions in smart cities,’’ in Proc. IEEE Int. Conf. Smart Comput. (SMARTCOMP), Taormina, Italy, Jun. 2018, pp. 17–24.
[28] S. N. Christian, K. R. Majeed, and S. O. Etinosa, ‘‘Application of data analytics techniques in analyzing crimes,’’ in Proc. SAIS, vol. 40, 2018, pp. 1–7.
[29] C. Schnell, A. A. Braga, and E. L. Piza, ‘‘The influence of community areas, neighborhood clusters, and street segments on the spatial variability of violent crime in Chicago,’’ J. Quant. Criminol., vol. 33, pp. 469–496, Sep. 2017.
[30] G. Rosser and T. Cheng, ‘‘Improving the robustness and accuracy of crime prediction with the self-exciting point process through isotropic triggering,’’ Appl. Spatial Anal. Policy, vol. 12, no. 1, pp. 5–25, Mar. 2019.
[31] M. L. Young and A. Hermida, ‘‘From Mr. and Mrs. Outlier to central tendencies: Computational journalism and crime reporting at the Los Angeles times,’’ Digit. Journalism, vol. 3, no. 3, pp. 381–397, May 2015.
[32] C. Contreras, ‘‘A block-level analysis of medical marijuana dispensaries and crime in the city of Los Angeles,’’ Justice Quart., vol. 34, no. 6, pp. 1069–1095, Sep. 2017.
[33] C. B. Dierkhising, D. Herz, R. A. Hirsch, and S. Abbott, ‘‘System backgrounds, psychosocial characteristics, and service access among dually involved youth: A Los Angeles case study,’’ Youth Violence Juvenile Justice, vol. 17, no. 3, pp. 309–329, 2018.
[34] G. Ridgeway and J. M. MacDonald, ‘‘Effect of rail transit on crime: A study of Los Angeles from 1988 to 2014,’’ J. Quant. Criminol., vol. 33, no. 2, pp. 277–291, Jun. 2017.
[35] M. Valasik, ‘‘Gang violence predictability: Using risk terrain modeling to study gang homicides and gang assaults in east Los Angeles,’’ J. Criminal Justice, vol. 58, pp. 10–21, Sep. 2018.
[36] M. R. D’Orsogna and M. Perc, ‘‘Physics for better human societies: Reply to comments on ‘statistical physics of crime: A review,’’’ Phys. Life Rev., vol. 12, pp. 40–43, Mar. 2015.
[37] T. Almanie, R. Mirza, and E. Lor, ‘‘Crime prediction based on crime types and using spatial and temporal criminal hotspots,’’ Int. J. Data Mining Knowl. Manage. Process, vol. 5, pp. 1–19, Aug. 2015.
[38] S. Seo, H. Chan, P. J. Brantingham, J. Leap, P. Vayanos, M. Tambe, and Y. Liu, ‘‘Partially generative neural networks for gang crime classification with partial information,’’ in Proc. AAAI/ACM Conf. AI, Ethics, Soc. (AIES), New Orleans, LA, USA, Dec. 2018, pp. 257–263.
[39] G. Mohler and P. J. Brantingham, ‘‘Privacy preserving, crowd sourced crime hawkes processes,’’ in Proc. Int. Workshop Social Sens. (SocialSens), Orlando, FL, USA, Apr. 2018, pp. 14–19.
[40] D. S. de O. Santos Júnior, J. F. L. de Oliveira, and P. S. G. de Mattos Neto, ‘‘An intelligent hybridization of ARIMA with machine learning models for time series forecasting,’’ Knowl.-Based Syst., vol. 175, pp. 72–86, Jul. 2019.
[41] J. C. B. Gamboa, ‘‘Deep learning for time-series analysis,’’ 2017, arXiv:1701.01887. [Online]. Available: http://arxiv.org/abs/1701.01887
[42] M. Khashei and M. Bijari, ‘‘A novel hybridization of artificial neural networks and ARIMA models for time series forecasting,’’ Appl. Soft Comput., vol. 11, no. 2, pp. 2664–2675, Mar. 2011.
[43] L. W. Kennedy, J. M. Caplan, E. L. Piza, and H. Buccine-Schraeder, ‘‘Vulnerability and exposure to crime: Applying risk terrain modeling to the study of assault in Chicago,’’ Appl. Spatial Anal. Policy, vol. 9, no. 4, pp. 529–548, Dec. 2016.
[44] T. Altameem and M. Amoon, ‘‘Crime activities prediction using hybridization of firefly optimization technique and fuzzy cognitive map neural networks,’’ Neural Comput. Appl., vol. 31, no. 5, pp. 1263–1273, May 2019.
[45] H. Wang, H. Yao, D. Kifer, C. Graif, and Z. Li, ‘‘Non-stationary model for crime rate inference using modern urban data,’’ IEEE Trans. Big Data, vol. 5, no. 2, pp. 180–194, Jun. 2019.
[46] H. I. Fawaz, G. Forestier, J. Weber, L. Idoumghar, and P.-A. Muller, ‘‘Deep learning for time series classification: A review,’’ Data Mining Knowl. Discovery, vol. 33, no. 4, pp. 917–963, Jul. 2019.
[47] S. L. Hyland, M. Faltys, M. Hüser, X. Lyu, T. Gumbsch, C. Esteban, C. Bock, M. Horn, M. Moor, B. Rieck, M. Zimmermann, D. Bodenham, K. Borgwardt, G. Rätsch, and T. M. Merz, ‘‘Early prediction of circulatory failure in the intensive care unit using machine learning,’’ Nature Med., vol. 26, no. 3, pp. 364–373, Mar. 2020.
[48] M. Abdel-Nasser and K. Mahmoud, ‘‘Accurate photovoltaic power forecasting models using deep LSTM-RNN,’’ Neural Comput. Appl., vol. 31, no. 7, pp. 2727–2740, Jul. 2019.
[49] P. Filonov, A. Lavrentyev, and A. Vorontsov, ‘‘Multivariate industrial time series with cyber-attack simulation: Fault detection using an LSTM-based predictive data model,’’ 2016, arXiv:1612.06676. [Online]. Available: https://arxiv.org/abs/1612.06676
[50] M. Alsharif, M. Younes, and J. Kim, ‘‘Time series ARIMA model for prediction of daily and monthly average global solar radiation: The case study of Seoul, South Korea,’’ Symmetry, vol. 11, no. 2, p. 240, Feb. 2019.
[51] S. Siami-Namini and A. S. Namin, ‘‘Forecasting economics and financial time series: ARIMA vs. LSTM,’’ 2018, arXiv:1803.06386. [Online]. Available: http://arxiv.org/abs/1803.06386
[52] A. J. Hussain, P. Liatsis, M. Khalaf, H. Tawfik, and H. Al-Asker, ‘‘A dynamic neural network architecture with immunology inspired optimization for weather data forecasting,’’ Big Data Res., vol. 14, pp. 81–92, Dec. 2018.
[53] S. Benabderrahmane, N. Mellouli, M. Lamolle, and P. Paroubek, ‘‘Smart4Job: A big data framework for intelligent job offers broadcasting using time series forecasting and semantic classification,’’ Big Data Res., vol. 7, pp. 16–30, Mar. 2017.
SOHAIL ASGHAR (Member, IEEE) received the degree (Hons.) in computer science from the University of Wales, U.K., in 1994, and the Ph.D. degree from the Faculty of Information Technology, Monash University, Melbourne, Australia, in 2006. In 2011, he joined the University Institute of Information Technology, PMAS-Arid Agriculture University, Rawalpindi, as a Director. He is currently a Professor and the Chairman of computer science with COMSATS University Islamabad. He has taught and researched in data mining, including structural learning, classification, and privacy preservation in data mining and text and web mining, big data analytics, data science, and information technology areas. He has published more than 150 publications in international journals and conference proceedings. He has consulted widely on information technology matters, especially in the framework of data mining and data science. He is a member of the Australian Computer Society (ACS) and the Higher Education Commission Approved Supervisor. He has served as a Program Committee Member of numerous international conferences and regularly speaks at international conferences, seminars, and workshops. In 2004, he received the Australian Postgraduate Award for Industry. He is on the Editorial Team of well-reputed scientific journals.