
Correlation_sample

Uploaded by Nicat Nazarov
Copyright © All Rights Reserved

Finding Correlation between crime and poverty in 50 countries

How will you ensure that the data you collect on crime rates and poverty
levels in each country are reliable and comparable?

Introduction:

When I was a kid, I was a fan of crime movies. At the time I didn't pay attention to which
country a film took place in. Later I realized that most of the movies were filmed in countries
with widespread poverty. For that reason I wanted to research how poverty affects crime in a
country, and in this exploration I investigate the relationship between crime and poverty.

Crime rates appear to have increased in many parts of the world, including well-developed
countries such as Canada, Switzerland and Portugal. Many people assume that crime rates are
increasing because of poverty. For that reason I wanted to find out whether there really is a
connection between poverty and crime.

This investigation delves into the connection between crime and poverty across 50 countries,
aiming to uncover how these two factors relate to each other. Crime and poverty are major issues
affecting many people around the world. By looking at countries such as Latvia, Pakistan, Canada
and Finland, we can see how different places deal with crime and poverty. I want to understand
whether there is a link between how poor a country is and how much crime happens there.

I am looking into this because understanding this link can help us make better decisions about
how to deal with crime and poverty. If I find a strong connection, it could suggest that reducing
poverty might also help reduce crime in some places.

To do this, I will use statistical data to look for a pattern. I will analyze crime rates and
poverty levels in each country and, by comparing these two factors, find out whether there is a
relationship between poverty and crime.

My study is important because it could help governments and organizations create better
strategies for solving crime and poverty. By understanding the connection between the two, I
might be able to find more effective ways to make communities safer and improve people's lives
around the world.
Hypothesis:
The hypothesis of this study is that there is a relationship between crime rates and poverty rates
across 50 different countries. In the 1993 science-fiction movie Demolition Man, a rebel named
Edgar says that being poor may lead to an increase in criminal activity; accordingly, higher
levels of poverty are expected to be linked to higher crime rates. To test this hypothesis and
identify any noteworthy trends or relationships, data on crime rates and poverty will be analyzed
statistically using linear correlation.¹

Plan:
1. Using random sampling to choose 50 countries
2. Finding the crime rates of the 50 countries
3. Finding the poverty rates of the 50 countries
4. Entering the data into a TI-84 calculator and drawing a scatter plot
5. Finding the line of best fit
6. Finding the linear regression equation
7. Choosing countries outside my data set
8. Checking how realistic the regression line is by testing it 5 times
9. Finding the percentage error
10. Considering outliers
11. Investigating the effect of outliers on the linear regression
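As a hedged illustration of step 1, simple random sampling without replacement can be sketched in Python as below. The pool of country names and the seed are placeholders I made up for the example; in practice the pool would be the real list of countries for which both a poverty figure and a crime rate are available. Fixing the seed makes the same 50 countries come out every time, so the selection can be reproduced.

```python
import random


def sample_countries(pool, k=50, seed=None):
    """Simple random sample of k distinct countries (sampling without replacement)."""
    if k > len(pool):
        raise ValueError("sample size is larger than the pool of countries")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(pool, k)


# Hypothetical placeholder pool; replace with real country names.
pool = [f"Country {i}" for i in range(1, 196)]
chosen = sample_countries(pool, k=50, seed=2023)
print(len(chosen), chosen[:3])
```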

1 Louise Gaille, “How Poverty Influences Crime Rates,” Vittana.org, December 16, 2019,
https://vittana.org/how-poverty-influences-crime-rates.
Dependent and Independent Variables

A dependent variable depends on other variables, whereas an independent variable is not affected
by any other variable that the study measures.² The independent variable is the cause; in my
investigation it is the poverty rate. The dependent variable is the effect, and its value depends
on changes in the independent variable; in my investigation it is the crime rate.³

Country Poverty (x) Crime rate (y)

Latvia 22.5 37.30

Tajikistan 26.3 44.03

Pakistan 24.3 42.80

Andorra 8 12.87

Hungary 12.3 20.18

Algeria 5.5 51.40

Antigua and Barbuda 28.5 56.39

Sierra Leone 56.8 64.71

Ukraine 34.4 46.85

Suriname 39.4 53.03

Jordan 15.7 40.41

Kuwait 2 32.97

Iraq 18.9 44.93

Nicaragua 24.9 51.77

2 “WCU Faqs: Research Help,” LibAnswers,
https://westcoastuniversitylibrary.libanswers.com/research/faq/295836#:~:text=Dependent%20variables%3A&text=Dependent%20variables%20depend%20on%20other,would%20be%20the%20dependent%20variable.
3 Pritha Bhandari, “Independent vs. Dependent Variables: Definition & Examples,” Scribbr, June 22,
2023, https://www.scribbr.com/methodology/independent-and-dependent-variables/.
Ivory Coast 39.5 57.40

Grenada 28.2 26.5

Dominica 28.8 53.58

El Salvador 22.8 62.09

Finland 12.2 26.33

Canada 11.6 45.22

South Korea 15 24.94

Nigeria 40.1 66.25

Somalia 73 65.20

Tuvalu 26.3 18.48

Portugal 17.2 31.41

Cambodia 17.7 53.09

Aruba 15.9 31.39

Taiwan 9 16.71

Moldova 7.3 46.43

Bolivia 37.2 64.56

India 21.9 44.32

Peru 20.2 67.74

Anguilla 23 20.10

Senegal 46.7 44.95

Serbia 23.2 38.17

Iceland 8.8 25.51

Denmark 12.5 26.22

Qatar 0.4 15.88

Belgium 14.8 49.32

Libya 40 60.42
Iran 48.6 49.50

Zimbabwe 38.3 60.6

Lebanon 27.4 46.54

Indonesia 9.4 46.01

Switzerland 16 25.26

Samoa 20.03 42.06

Belarus 5 51.02

Kazakhstan 4.3 45.91

Myanmar 24.8 50.40

Congo 63.9 67.65

Scatter graph
A scatter plot uses dots to represent the values of two different variables. The position of each
point on the horizontal and vertical axes gives the values for one data point. Scatter plots are
used to observe relationships between variables.⁴

The strength of the relationship in a scatter plot is usually described as weak or strong. The
more spread out the data points are, the weaker the relationship. If the points are tightly
clustered, or closely follow a line or curve, the relationship is described as strong.⁵

4 “Describing Scatter Plots,” Describing Scatter Plots - Introduction to Google Sheets and SQL,
https://runestone.academy/ns/books/published/ac1/scatter_plots_and_correlation/describing_scatter_plots.html#.

5 “A Complete Guide to Scatter Plots,” Chartio, https://chartio.com/learn/charts/what-is-a-scatter-plot/.


[Figure: scatter plot of crime rate (y) against poverty (x) for the 50 countries]⁶

Line of Best Fit


The line of best fit is the line through a scatter plot of data points that best shows the
relationship between those data points. The method of least squares is used to find the equation
of the line, either by manual calculation or with technology such as a graphing calculator or a
spreadsheet.⁷

6 “Untitled Spreadsheet,” Google Sheets,
https://docs.google.com/spreadsheets/d/193OA2vv5KmksWVB6LCbYKCLsiHmY1guwOyRF9J7udU4/edit#gid=0.
7 James Chen, “Line of Best Fit: Definition, How It Works, and Calculation,” Investopedia,
https://www.investopedia.com/terms/l/line-of-best-fit.asp#toc-how-to-calculate-the-line-of-best-fit.
Pearson Coefficient
Linear regression is a way to make predictions based on the relationship between one variable,
plotted on the x-axis, and another, plotted on the y-axis. The basic idea is to build a
mathematical model that describes the dependent (unknown) variable y through its linear
relationship with the known, independent variable x. For example, if data shows your expenses and
income from last year, linear regression can tell you whether and how your expenses grow as your
income does. If you then put this year's income into the model, you can predict your expenses.⁸

The equation of a linear regression line is y = mx + c, where the gradient m and the y-intercept c
are constants for all values of x and y.

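$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
$$

The formula above gives Pearson's correlation coefficient r, which can be calculated by hand
without a calculator.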

n = Number of values or elements


∑x = Sum of 1st values list

∑y = Sum of 2nd values list

∑xy = Sum of the product of 1st and 2nd values

∑x² = Sum of squares of 1st values

∑y² = Sum of squares of 2nd values

Using poverty as x and crime rate as y, the regression line for all 50 countries is:

y = 0.59x + 29.5

In a regression analysis, R (or r) is called the correlation coefficient; it measures the strength
and direction of the linear relationship between an independent and a dependent variable. It
ranges from −1 to +1. An r-value of −1 or +1 indicates, respectively, a perfect negative or a
perfect positive relationship between the independent and dependent variable, while an r-value of
0 shows that there is no linear relationship between these variables. So the closer the r-value
is to −1 or +1, the stronger the relationship. It is sometimes expressed as a percentage.¹⁰

8 “What is linear regression? - linear regression explained,” AWS,
https://aws.amazon.com/what-is/linear-regression/
9 “Correlation Coefficient - Definition, Formula, Properties and Examples,” BYJUS, August 29, 2023,
https://byjus.com/jee/correlation-coefficient/

For this result I used linear regression, with poverty on the x-axis and crime rate on the y-axis,
in the correlation formula for r given above.

Percentage errors
Percent error is the difference between an approximate or measured value and an exact or known
value, expressed as a percentage of the exact value. It indicates how large our errors are when we
measure or estimate something in an analysis. A smaller percent error indicates that we are closer
to the accepted or original value.¹¹

The formula for calculating percent error:

$$
\%\,\text{Error} = \left|\frac{T - E}{T}\right| \times 100
$$

T = True or actual value

E = Estimated value

10 “The Meaning of R, R Square, Adjusted R Square, R Square Change and F Change in a Regression
Analysis,” Analysis INN., March 13, 2020,
https://www.analysisinn.com/post/the-meaning-of-r-r-square-adjusted-r-square-r-square-change-and-f-change-in-a-regression-analysis/#:~:text=R%20in%20a%20regression%20analysis,independent%20and%20a%20dependent%20variable.

Country¹²    Poverty (x)    Crime (y)¹³    Crime rate from regression line    Percentage error (%)
Mauritius    11             43.82          35.99                              17.87
Italy        7.5            44.66          33.925                             24.04
Argentina    53             74.82          60.77                              18.78
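The two calculated columns of this table can be reproduced with a short Python sketch. It assumes the full-data regression line y = 0.59x + 29.5 stated earlier and the actual crime values from the table; the helper names `predict_crime` and `percent_error` are my own. Substituting each poverty value into the line gives the predicted crime rate (for example, 0.59 × 11 + 29.5 = 35.99 for Mauritius), which is then compared with the actual value using the percent error formula.

```python
def predict_crime(poverty, m=0.59, c=29.5):
    """Crime rate predicted by the regression line y = m*x + c (default: y = 0.59x + 29.5)."""
    return m * poverty + c


def percent_error(true_value, estimate):
    """Percent error |T - E| / |T| * 100, where T is the actual value and E the estimate."""
    if true_value == 0:
        raise ValueError("percent error is undefined when the true value is 0")
    return abs(true_value - estimate) / abs(true_value) * 100


# (country, poverty x, actual crime rate y) for the three test countries
TEST_COUNTRIES = [("Mauritius", 11, 43.82), ("Italy", 7.5, 44.66), ("Argentina", 53, 74.82)]

for country, x, actual in TEST_COUNTRIES:
    estimate = predict_crime(x)
    print(f"{country:<10} predicted {estimate:7.3f}  error {percent_error(actual, estimate):6.2f}%")
```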

Effect of Outliers
An outlier is a data point that is very different from the rest of a group. To some extent it is
up to the person looking at the data to decide what counts as very different, so before we can
identify which data points are very different, we need to understand what the normal data points
look like.¹⁴
Outlier formula:
Low outlier: any value below Q1 − 1.5 × IQR
Upper outlier: any value above Q3 + 1.5 × IQR
11 “Percent Error - Definition, Formula, and Solved Examples,” BYJUS, January 6, 2020,
https://byjus.com/maths/percent-error/
12 “Random Country Generator - Test Where You Land and Learn about It.,” Random Country - Explore
the World, May 17, 2022, https://random.country/
13 “Crime,” Cost of Living, https://www.numbeo.com/crime/
14 “7.1.6. What are outliers in the data?,” https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
Calculations

Low outlier: 12.35 − 1.5 × 16.375 = −12.2

Upper outlier: 28.725 + 1.5 × 16.375 = 53.3

For the poverty data there is no low outlier, but there are upper outliers: Sierra Leone (56.8),
Congo (63.9) and Somalia (73) all have poverty rates above 53.3.

To find Q1 and Q3, we first need to know what they are. Q1 is the middle value of the lower half
of the data; I found it by taking the median of the data below the median of the whole set. Q3 is
the middle value of the upper half, between the median and the highest value; it is found by
taking the median of the data above the median of the whole set. The IQR is the difference
between Q3 and Q1.

I calculated Q1, Q3 and the IQR from the poverty rate data:

Mean: 23.81
Q1: 12.35
Q3: 28.725
IQR: 16.375

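Repeating the calculation for the crime rate data (Q1 = 31.395, Q3 = 53.075, IQR = 21.68):

Low outlier: 31.395 − 1.5 × 21.68 = −1.125

Upper outlier: 53.075 + 1.5 × 21.68 = 85.595

All crime rates lie between these limits, so the crime rate data contain no outliers.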
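A Python sketch of the 1.5 × IQR rule applied to both columns of the data table is given below. It computes quartiles with `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between neighbouring sorted values; with this convention the poverty data give exactly Q1 = 12.35 and Q3 = 28.725, and the crime rates give Q1 = 31.395 and Q3 = 53.075. The median-of-halves method described above would instead give Q1 = 12.3 and Q3 = 28.8 for poverty, so the quoted values appear to come from an interpolating quartile function such as a spreadsheet's. The helper name `iqr_fences` is my own.

```python
import statistics

# (country, poverty x, crime rate y) as listed in the data table
DATA = [
    ("Latvia", 22.5, 37.30), ("Tajikistan", 26.3, 44.03), ("Pakistan", 24.3, 42.80),
    ("Andorra", 8, 12.87), ("Hungary", 12.3, 20.18), ("Algeria", 5.5, 51.40),
    ("Antigua and Barbuda", 28.5, 56.39), ("Sierra Leone", 56.8, 64.71), ("Ukraine", 34.4, 46.85),
    ("Suriname", 39.4, 53.03), ("Jordan", 15.7, 40.41), ("Kuwait", 2, 32.97),
    ("Iraq", 18.9, 44.93), ("Nicaragua", 24.9, 51.77), ("Ivory Coast", 39.5, 57.40),
    ("Grenada", 28.2, 26.5), ("Dominica", 28.8, 53.58), ("El Salvador", 22.8, 62.09),
    ("Finland", 12.2, 26.33), ("Canada", 11.6, 45.22), ("South Korea", 15, 24.94),
    ("Nigeria", 40.1, 66.25), ("Somalia", 73, 65.20), ("Tuvalu", 26.3, 18.48),
    ("Portugal", 17.2, 31.41), ("Cambodia", 17.7, 53.09), ("Aruba", 15.9, 31.39),
    ("Taiwan", 9, 16.71), ("Moldova", 7.3, 46.43), ("Bolivia", 37.2, 64.56),
    ("India", 21.9, 44.32), ("Peru", 20.2, 67.74), ("Anguilla", 23, 20.10),
    ("Senegal", 46.7, 44.95), ("Serbia", 23.2, 38.17), ("Iceland", 8.8, 25.51),
    ("Denmark", 12.5, 26.22), ("Qatar", 0.4, 15.88), ("Belgium", 14.8, 49.32),
    ("Libya", 40, 60.42), ("Iran", 48.6, 49.50), ("Zimbabwe", 38.3, 60.6),
    ("Lebanon", 27.4, 46.54), ("Indonesia", 9.4, 46.01), ("Switzerland", 16, 25.26),
    ("Samoa", 20.03, 42.06), ("Belarus", 5, 51.02), ("Kazakhstan", 4.3, 45.91),
    ("Myanmar", 24.8, 50.40), ("Congo", 63.9, 67.65),
]


def iqr_fences(values, k=1.5):
    """Return (Q1, Q3, IQR, low limit, high limit) for the k * IQR outlier rule.

    Quartiles use the "inclusive" method, which interpolates linearly
    between neighbouring sorted values.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, q3, iqr, q1 - k * iqr, q3 + k * iqr


for label, column in (("Poverty", 1), ("Crime rate", 2)):
    values = [row[column] for row in DATA]
    q1, q3, iqr, low, high = iqr_fences(values)
    outliers = [row[0] for row in DATA if not low <= row[column] <= high]
    print(f"{label}: mean={statistics.fmean(values):.2f} Q1={q1:.3f} Q3={q3:.3f} "
          f"IQR={iqr:.3f} limits=({low:.3f}, {high:.3f}) outliers={outliers}")
```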

Data after removing the outliers (Sierra Leone and Somalia omitted):

Country Poverty (x) Crime rate (y)

Latvia 22.5 37.30

Tajikistan 26.3 44.03

Pakistan 24.3 42.80

Andorra 8 12.87

Hungary 12.3 20.18

Algeria 5.5 51.40

Antigua and Barbuda 28.5 56.39

Ukraine 34.4 46.85


Suriname 39.4 53.03

Jordan 15.7 40.41

Kuwait 2 32.97

Iraq 18.9 44.93

Nicaragua 24.9 51.77

Ivory Coast 39.5 57.40

Grenada 28.2 26.5

Dominica 28.8 53.58

El Salvador 22.8 62.09

Finland 12.2 26.33

Canada 11.6 45.22

South Korea 15 24.94

Nigeria 40.1 66.25

Tuvalu 26.3 18.48

Portugal 17.2 31.41

Cambodia 17.7 53.09

Aruba 15.9 31.39

Taiwan 9 16.71

Moldova 7.3 46.43

Bolivia 37.2 64.56

India 21.9 44.32

Peru 20.2 67.74

Anguilla 23 20.10

Senegal 46.7 44.95


Serbia 23.2 38.17

Iceland 8.8 25.51

Denmark 12.5 26.22

Qatar 0.4 15.88

Belgium 14.8 49.32

Libya 40 60.42

Iran 48.6 49.50

Zimbabwe 38.3 60.6

Lebanon 27.4 46.54

Indonesia 9.4 46.01

Switzerland 16 25.26

Samoa 20.03 42.06

Belarus 5 51.02

Kazakhstan 4.3 45.91

Myanmar 24.8 50.40

Congo 63.9 67.65


Line of Best Fit

15 “Untitled Spreadsheet,” Google Sheets,
https://docs.google.com/spreadsheets/d/193OA2vv5KmksWVB6LCbYKCLsiHmY1guwOyRF9J7udU4/edit#gid=539304974.

y = 0.61x + 28.55

After removing the outliers, the r-value increases by 0.2, which indicates a stronger linear
correlation between poverty and crime.

Evaluation
The correlation between poverty and crime is positive, but my percentage errors were higher than
expected, so the regression line gives only approximate predictions. While researching, I found
multiple sources for each country; if I had chosen different sources, my percentage errors might
have been lower. A second issue could be the sample size: choosing more than 50 countries might
reduce the percentage error. From this investigation, I conclude that in future investigations I
should use more than 50 countries to obtain better results. I would also cross-check the TI-84
calculations with a newer graphing calculator or with software, since newer tools make the
analysis easier to check.

Conclusion

It is important to check the level of correlation between the poverty rate and the crime rate. The
scatter diagram and the correlation coefficient (r-value) show a strong positive linear
correlation. Pearson's correlation coefficient, calculated on the GDC, supports the reliability of
this result. The Demolition Man movie suggested that being poor may lead to an increase in criminal
activity, so higher levels of poverty would be expected to be linked to higher crime rates, and my
investigation supports this statement.

Bibliography

1. “A Complete Guide to Scatter Plots,” Chartio, https://chartio.com/learn/charts/what-is-a-scatter-plot/.
Accessed on October 2023.
2. “Crime,” Cost of Living, https://www.numbeo.com/crime/
3. “Correlation Coefficient - Definition, Formula, Properties and Examples,” BYJUS,
August 29, 2023, https://byjus.com/jee/correlation-coefficient/. Accessed on October 2023.
4. “Describing Scatter Plots,” Describing Scatter Plots - Introduction to Google Sheets and SQL,
https://runestone.academy/ns/books/published/ac1/scatter_plots_and_correlation/describing_scatter_plots.html#.
Accessed on October 2023.
5. James Chen, “Line of Best Fit: Definition, How It Works, and Calculation,” Investopedia,
https://www.investopedia.com/terms/l/line-of-best-fit.asp#toc-how-to-calculate-the-line-of-best-fit.
Accessed on October 2023.
6. Louise Gaille, “How Poverty Influences Crime Rates,” Vittana.org, December 16, 2019,
https://vittana.org/how-poverty-influences-crime-rates. Accessed on October 2023.
7. “Percent Error - Definition, Formula, and Solved Examples,” BYJUS, January 6, 2020,
https://byjus.com/maths/percent-error/. Accessed on November 2023.
8. Pritha Bhandari, “Independent vs. Dependent Variables: Definition & Examples,” Scribbr,
June 22, 2023, https://www.scribbr.com/methodology/independent-and-dependent-variables/.
Accessed on October 2023.
9. “Random Country Generator - Test Where You Land and Learn about It.,” Random Country - Explore
the World, May 17, 2022, https://random.country/. Accessed on November 2023.
10. “The Meaning of R, R Square, Adjusted R Square, R Square Change and F Change in a Regression
Analysis,” Analysis INN., March 13, 2020,
https://www.analysisinn.com/post/the-meaning-of-r-r-square-adjusted-r-square-r-square-change-and-f-change-in-a-regression-analysis/#:~:text=R%20in%20a%20regression%20analysis,independent%20and%20a%20dependent%20variable.
Accessed on November 2023.
11. “Untitled Spreadsheet,” Google Sheets,
https://docs.google.com/spreadsheets/d/193OA2vv5KmksWVB6LCbYKCLsiHmY1guwOyRF9J7udU4/edit#gid=0.
Accessed on October 2023.
12. “Untitled Spreadsheet,” Google Sheets,
https://docs.google.com/spreadsheets/d/193OA2vv5KmksWVB6LCbYKCLsiHmY1guwOyRF9J7udU4/edit#gid=539304974.
Accessed on November 2023.
13. “What is linear regression? - linear regression explained,” AWS,
https://aws.amazon.com/what-is/linear-regression/. Accessed on October 2023.
14. “WCU Faqs: Research Help,” LibAnswers,
https://westcoastuniversitylibrary.libanswers.com/research/faq/295836#:~:text=Dependent%20variables%3A&text=Dependent%20variables%20depend%20on%20other,would%20be%20the%20dependent%20variable.
Accessed on October 2023.
15. “7.1.6. What are outliers in the data?,”
https://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm. Accessed on November 2023.