911 Calls Data Capstone Project .HTML
Read in the CSV file of 911 calls over a year for Montgomery County, PA (from Kaggle).
In [4]: df = pd.read_csv('911.csv')
In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99492 entries, 0 to 99491
Data columns (total 9 columns):
lat 99492 non-null float64
lng 99492 non-null float64
desc 99492 non-null object
zip 86637 non-null float64
title 99492 non-null object
timeStamp 99492 non-null object
twp 99449 non-null object
addr 98973 non-null object
e 99492 non-null int64
dtypes: float64(3), int64(1), object(5)
memory usage: 6.8+ MB
In [6]: df.head()
Out[6]:
   lat        lng         desc                                               zip      title                   timeStamp            twp               addr                      e
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY  2015-12-10 17:40:00  NEW HANOVER       REINDEER CT & DEAD END    1
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK     2015-12-10 17:40:00  NORRISTOWN        HAWS AVE                  1
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01  NORRISTOWN        AIRY ST & SWEDE ST        1
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS          2015-12-10 17:40:01  LOWER POTTSGROVE  CHERRYWOOD CT & DEAD END  1
In [7]: df['zip'].value_counts().head(5)
In [8]: df['twp'].value_counts().head(5)
Out[8]: LOWER MERION 8443
ABINGTON 5977
NORRISTOWN 5890
UPPER MERION 5227
CHELTENHAM 4575
Name: twp, dtype: int64
I want to break out the reason for each 911 call into a new column.
Out[9]: 0 EMS
1 EMS
2 Fire
3 EMS
4 EMS
Name: Reason, dtype: object
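The cell that created the Reason column is not part of this export. One plausible way to do it, assuming every title follows the "Reason: detail" pattern seen in df.head(), is to split on the colon and keep the first piece. The sample frame and the Traffic title below are made up for illustration.

```python
import pandas as pd

# Small stand-in for the real DataFrame; the third title is a made-up example.
sample = pd.DataFrame({'title': ['EMS: BACK PAINS/INJURY',
                                 'Fire: GAS-ODOR/LEAK',
                                 'Traffic: EXAMPLE INCIDENT']})

# Keep everything before the first colon as the call reason.
sample['Reason'] = sample['title'].str.split(':').str[0]
print(sample['Reason'].tolist())
```

An equivalent form is `sample['title'].apply(lambda t: t.split(':')[0])`; the `.str` accessor version also tolerates missing titles.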
What is the most common Reason for a 911 call based off of this new column?
In [10]: df['Reason'].value_counts().head(3)
Out[10]: EMS 48877
Traffic 35695
Fire 14920
Name: Reason, dtype: int64
In [11]: type(df['timeStamp'][0])
Out[11]: str
I want to break out the timestamp into hours, days, and months.
In [12]: df['timeStamp'] = pd.to_datetime(df['timeStamp'])
type(df['timeStamp'][0])
Out[12]: pandas.tslib.Timestamp
   lat        lng         desc                                               zip      title                   timeStamp            twp               addr                      e  Reason  Hour  Month  Day of Week
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY  2015-12-10 17:40:00  NEW HANOVER       REINDEER CT & DEAD END    1  EMS     17    12     3
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK     2015-12-10 17:40:00  NORRISTOWN        HAWS AVE                  1  Fire    17    12     3
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01  NORRISTOWN        AIRY ST & SWEDE ST        1  EMS     17    12     3
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS          2015-12-10 17:40:01  LOWER POTTSGROVE  CHERRYWOOD CT & DEAD END  1  EMS     17    12     3
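The cell that produced the Hour, Month and Day of Week columns is missing from this export. A minimal sketch of one way to derive them, assuming timeStamp has already been converted with pd.to_datetime, uses the `.dt` accessor. The two timestamps are taken from the rows shown above.

```python
import pandas as pd

# Two timestamps copied from the rows shown in the table.
sample = pd.DataFrame({'timeStamp': ['2015-12-10 17:40:00',
                                     '2015-12-10 17:40:01']})
sample['timeStamp'] = pd.to_datetime(sample['timeStamp'])

# The .dt accessor exposes the datetime components of each row.
sample['Hour'] = sample['timeStamp'].dt.hour
sample['Month'] = sample['timeStamp'].dt.month
sample['Day of Week'] = sample['timeStamp'].dt.dayofweek  # Monday=0 ... Sunday=6
print(sample)
```

With Monday as 0, a Thursday comes out as 3, which matches the Day of Week value in the table.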
Now I want to convert the numeric values of the days of the week into strings.
Out[14]:
   lat        lng         desc                                               zip      title                   timeStamp            twp               addr                      e  Reason  Hour  Month  Day of Week
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY  2015-12-10 17:40:00  NEW HANOVER       REINDEER CT & DEAD END    1  EMS     17    12     Thu
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK     2015-12-10 17:40:00  NORRISTOWN        HAWS AVE                  1  Fire    17    12     Thu
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01  NORRISTOWN        AIRY ST & SWEDE ST        1  EMS     17    12     Thu
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS          2015-12-10 17:40:01  LOWER POTTSGROVE  CHERRYWOOD CT & DEAD END  1  EMS     17    12     Thu
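The number-to-name conversion shown in the table can be sketched with a dictionary and Series.map. The dictionary name dmap and the sample values are assumptions for illustration; the original cell is not in this export.

```python
import pandas as pd

# Hypothetical lookup table: pandas numbers the days Monday=0 ... Sunday=6.
dmap = {0: 'Mon', 1: 'Tue', 2: 'Wed', 3: 'Thu', 4: 'Fri', 5: 'Sat', 6: 'Sun'}

# Made-up numeric day values standing in for df['Day of Week'].
days = pd.Series([3, 0, 6], name='Day of Week')

# map() replaces each number with its label from the dictionary.
labels = days.map(dmap)
print(labels.tolist())
```

Any value missing from the dictionary would become NaN, so it is worth checking that all seven days are listed.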
Now I can plot out the type of 911 call by the day of the week
It is easy to see that EMS has the highest volume of calls regardless of the day of the week,
closely followed by Traffic calls. Fire calls are very consistent throughout the week.
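The plotting cell itself is not in this export. The numbers behind such a plot are just counts of calls per day and reason, which can be tabulated with pd.crosstab. This is a sketch on made-up rows, not the notebook's own code.

```python
import pandas as pd

# Made-up rows standing in for the real DataFrame.
sample = pd.DataFrame({
    'Day of Week': ['Thu', 'Thu', 'Thu', 'Fri', 'Fri'],
    'Reason': ['EMS', 'EMS', 'Fire', 'Traffic', 'EMS'],
})

# One row per day, one column per reason, each cell a call count.
counts = pd.crosstab(sample['Day of Week'], sample['Reason'])
print(counts)

# With seaborn installed, the same counts can be drawn as grouped bars:
#   sns.countplot(x='Day of Week', data=sample, hue='Reason')
```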
Again, we get roughly the same outcome, although I noticed that some months are
missing. A line plot might help fill in this information.
I can group by Month and use the count() method so that each row is a month.
Because count() skips missing values, any column with no missing values (lat, for
example) then yields the number of calls for that month, and I can draw a line plot.
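A minimal sketch of the grouping step on made-up rows. It also shows why the columns in the grouped table can disagree: count() only counts non-null entries, so a column with gaps (zip here) comes out lower than a complete one. The names by_month and calls_per_month are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up rows; one zip value is missing on purpose.
sample = pd.DataFrame({
    'Month': [1, 1, 1, 2],
    'lat': [40.29, 40.12, 40.11, 40.25],
    'zip': [19525.0, 19401.0, np.nan, 19401.0],
})

# count() gives the number of non-null values per column for each month.
by_month = sample.groupby('Month').count()
print(by_month)

# size() counts rows directly, so it does not depend on which column is picked.
calls_per_month = sample.groupby('Month').size()
print(calls_per_month)
```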
Out[17]:
         lat    lng   desc    zip  title  timeStamp    twp   addr      e  Reason   Hour  Day of Week
Month
1      13205  13205  13205  11527  13205      13205  13203  13096  13205   13205  13205        13205
2      11467  11467  11467   9930  11467      11467  11465  11396  11467   11467  11467        11467
3      11101  11101  11101   9755  11101      11101  11092  11059  11101   11101  11101        11101
4      11326  11326  11326   9895  11326      11326  11323  11283  11326   11326  11326        11326
5      11423  11423  11423   9946  11423      11423  11420  11378  11423   11423  11423        11423
In [18]: byMonth['addr'].plot()
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0x203265aa400>
This graph tells us that the most calls were made in January, with a spike in the
otherwise downward trend around July. Notice that the y-axis starts at 8000, so the
call volume stays large even though the graph suggests a steep drop. To create a
linear fit of the number of calls per month, I'll need to reset the index so that Month
becomes a column. Again, any column with no missing values will work for "y" to get
the number of calls.
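The fitting cell is not in this export, so the sketch below swaps in numpy.polyfit to show the idea: reset_index() turns Month into an ordinary column, and a degree-1 fit gives the slope of calls per month. The counts are the lat values for months 1 to 5 from the table above; the variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Monthly counts for months 1-5, copied from the grouped table above.
by_month_sample = pd.DataFrame(
    {'lat': [13205, 11467, 11101, 11326, 11423]},
    index=pd.Index([1, 2, 3, 4, 5], name='Month'),
)

# Move Month out of the index so it can be used as the x variable.
monthly = by_month_sample.reset_index()

# Degree-1 polynomial fit: returns the slope and intercept of the best line.
slope, intercept = np.polyfit(monthly['Month'], monthly['lat'], 1)
print(slope, intercept)
```

A negative slope corresponds to the downward trend described above. A seaborn regression plot such as `sns.lmplot(x='Month', y='lat', data=monthly)` would draw the same kind of fit with a confidence band.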
This plot confirms what the line plot suggested: the trend is downward, with July as
the outlier where we saw a peak in calls.
Out[20]:
   lat        lng         desc                                               zip      title                   timeStamp            twp               addr                      e  Reason  Hour  Month  Day of Week  Date
0  40.297876  -75.581294  REINDEER CT & DEAD END; NEW HANOVER; Station ...   19525.0  EMS: BACK PAINS/INJURY  2015-12-10 17:40:00  NEW HANOVER       REINDEER CT & DEAD END    1  EMS     17    12     Thu          2015-12-10
2  40.121182  -75.351975  HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...  19401.0  Fire: GAS-ODOR/LEAK     2015-12-10 17:40:00  NORRISTOWN        HAWS AVE                  1  Fire    17    12     Thu          2015-12-10
3  40.116153  -75.343513  AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...   19401.0  EMS: CARDIAC EMERGENCY  2015-12-10 17:40:01  NORRISTOWN        AIRY ST & SWEDE ST        1  EMS     17    12     Thu          2015-12-10
4  40.251492  -75.603350  CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S...   NaN      EMS: DIZZINESS          2015-12-10 17:40:01  LOWER POTTSGROVE  CHERRYWOOD CT & DEAD END  1  EMS     17    12     Thu          2015-12-10
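The Date column in the table above is not created by any cell in this export. One way to build it, assuming timeStamp is already a datetime column, is the `.dt.date` accessor, after which calls can be counted per calendar day. The third timestamp and the name ems_per_day are made up for illustration.

```python
import pandas as pd

# Two timestamps from the table plus one made-up call on the following day.
sample = pd.DataFrame({
    'timeStamp': pd.to_datetime(['2015-12-10 17:40:00',
                                 '2015-12-10 17:40:01',
                                 '2015-12-11 08:15:00']),
    'Reason': ['EMS', 'Fire', 'EMS'],
})

# Drop the time of day, keeping only the calendar date.
sample['Date'] = sample['timeStamp'].dt.date

# Count EMS calls per date; calling .plot() on this would give a daily line plot.
ems_per_day = sample[sample['Reason'] == 'EMS'].groupby('Date').size()
print(ems_per_day)
```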
Hmmm... I'd rather look at this by the type of incident
In [187]: df[df['Reason']=='Traffic'].groupby('Date').count()['twp'].plot()
plt.title('Traffic')
Out[187]: <matplotlib.text.Text at 0x2ab6b1312e8>
The graph above suggests that more traffic incidents are reported in the winter
months, which makes sense given that driving conditions are at their worst then.
In [188]: df[df['Reason']=='Fire'].groupby('Date').count()['twp'].plot()
plt.title('Fire')
Out[188]: <matplotlib.text.Text at 0x2ab6b1bfb38>
I would have expected more fire calls during the summer months. Although we see a
spike around July, we know July to be a high-volume month overall. This graph
suggests more incidents in the winter.
In [22]: df[df['Reason']=='EMS'].groupby('Date').count()['twp'].plot()
plt.title('EMS')
Out[22]: <matplotlib.text.Text at 0x203275e5518>
EMS seems to be fairly consistent throughout the year, with few spikes.
A heatmap could be useful to determine what time of day most 911 calls were made. I'll
first have to arrange the data frame into a matrix.
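The cell that builds the matrix is missing from this export. A common way to get a day-by-hour grid, sketched here on made-up rows, is to group by both columns, count, and unstack the hour level into columns. The name day_hour is an assumption.

```python
import pandas as pd

# Made-up rows standing in for the real DataFrame.
sample = pd.DataFrame({
    'Day of Week': ['Thu', 'Thu', 'Fri', 'Thu'],
    'Hour': [17, 17, 8, 9],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic'],
})

# Count calls per (day, hour) pair, then pivot Hour from the index into columns.
# fill_value=0 puts a zero where a day/hour combination has no calls.
day_hour = (sample.groupby(['Day of Week', 'Hour'])
                  .count()['Reason']
                  .unstack(fill_value=0))
print(day_hour)
```

The result has one row per day and one column per hour, which is the shape that sns.heatmap expects.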
In [24]: dayHour.head()
Out[24]:
Hour           0    1    2    3    4    5    6    7    8    9  ...   14   15    16    17   18   19   20   21   22   23
Day of Week
Fri          275  235  191  175  201  194  372  598  742  752  ...  932  980  1039   980  820  696  667  559  514  474
Mon          282  221  201  194  204  267  397  653  819  786  ...  869  913   989   997  885  746  613  497  472  325
Sat          375  301  263  260  224  231  257  391  459  640  ...  789  796   848   757  778  696  628  572  506  467
Sun          383  306  286  268  242  240  300  402  483  620  ...  684  691   663   714  670  655  537  461  415  330
Thu          278  202  233  159  182  203  362  570  777  828  ...  876  969   935  1013  810  698  617  553  424  354
5 rows × 24 columns
In [219]: sns.heatmap(dayHour,cmap='coolwarm')
This shows that most calls are made between 7am and 7pm. The highest volume of calls
occurs around 4 or 5pm on nearly every day of the week. It is notable that the
weekend has the lowest volume of calls in general.
This cluster map supports what I determined from the last graph. The weekend days are
grouped together, showing they have the lowest volume, and the normal sleeping hours
sit at the left of the x-axis, showing 911 calls were less likely to be made then. That all
makes sense.
Now I want to manipulate the DataFrame to show the Month as the column.
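The reshaping cell is not in this export. As a sketch on made-up rows, pivot_table can build the day-by-month grid in one call, and reindex can put the days in calendar order instead of the alphabetical order pandas uses by default. The names day_month and order are assumptions.

```python
import pandas as pd

# Made-up rows standing in for the real DataFrame.
sample = pd.DataFrame({
    'Day of Week': ['Sat', 'Sat', 'Mon', 'Thu'],
    'Month': [1, 1, 2, 12],
    'Reason': ['EMS', 'Fire', 'EMS', 'Traffic'],
})

# Rows are days, columns are months, each cell counts the calls.
day_month = sample.pivot_table(index='Day of Week', columns='Month',
                               values='Reason', aggfunc='count', fill_value=0)

# Reorder the rows Monday-first, keeping only the days present in the data.
order = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
day_month = day_month.reindex([d for d in order if d in day_month.index])
print(day_month)
```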
Out[25]:
Month 1 2 3 4 5 6 7 8 12
Day of Week
Fri 1970 1581 1525 1958 1730 1649 2045 1310 1065
Mon 1727 1964 1535 1598 1779 1617 1692 1511 1257
Sat 2291 1441 1266 1734 1444 1388 1695 1099 978
Sun 1960 1229 1102 1488 1424 1333 1672 1021 907
Thu 1584 1596 1900 1601 1590 2065 1646 1230 1266
This is interesting. Saturdays in January had the highest volume of calls, despite the
weekend having the lowest number of 911 calls overall. We can also surmise that the
summer months had the lowest volume of calls.