The document outlines a comprehensive analysis of New York City's 311 service request dataset, detailing its structure, relevance, and the analytical goals aimed at optimizing public services. It covers data preparation steps, including cleaning, handling missing values, and feature creation, as well as the tools used for analysis, such as Pandas and Matplotlib. The analysis seeks to identify patterns in complaint trends, response times, and geographic distributions to enhance operational efficiency and service delivery.


Contents

1. Introduction
   1.1 Background of the Dataset
   1.2 Relevance of Data Analysis in Public Services
   1.3 Goals and Scope of Analysis
   1.4 Tools Used
2. Data Understanding
   2.1 Overview of Data Structure
   2.2 Description of Key Columns
   2.3 Data Types and Missing Data Analysis
3. Data Preparation
   3.1 Import the Dataset and Clean It
   3.2 Converting Date-Time Columns and Creating New Features
   3.3 Dropping Irrelevant Columns
   3.4 Handling Missing Data
   3.5 Display Unique Values from All Columns
4. Data Analysis
   4.1 Show Summary Statistics such as Sum, Mean, Standard Deviation
   4.2 Calculate and Show Correlation of All Variables
5. Data Exploration
   5.1 Four Major Insights from Visualization
   5.2 Group Complaint Types by Average Request Closing Time and Location
   Statistical Testing: Test 1 - Average Response Time Across Complaint Types
   Test 2: Whether the Type of Complaint or Service Requested and Location Are Related
Conclusion
Bibliography
1. Introduction

1.1 Background of the Dataset

The dataset is a collection of service requests registered with New York City's 311 system. 311 is a non-emergency customer service hotline that the city's residents, businesses, and visitors call to report problems such as noise, illegal parking, sanitation, maintenance, and other public service issues. The data on these service requests is valuable because it provides insight into the demand for public services, the distribution of resources, and the problems faced across the urban area.

Each service request record contains a number of attributes, such as:

Complaint Type: The type of complaint made. It can be about noise, a blocked
driveway, or illegal parking, among other things.

Agency: The department or agency responsible for the complaint, such as the NYPD or the Department of Sanitation.

Incident Location: The location where the incident was reported, including details such
as zip codes and neighborhood names.

Created Date and Closed Date: The time when the request was opened and closed.

Resolution Status: Closed, Pending, Resolved.

By analyzing such data, the chronology of complaints in different areas can be established, the efficacy with which problems are solved can be monitored, and patterns in how complaint types change over time can be observed. (NYC.gov, n.d.)

1.2 Relevance of Data Analysis in Public Services

Data analysis is an important tool for optimizing public services, particularly in metropolises as large as New York. The huge volume of data created through 311 service requests is of great value to city authorities and policymakers.

Effective data analytics can therefore increase operational efficiency, improve the dispersion of resources, and raise the overall quality of the services the city offers its residents.

For instance, data analysis helps identify geographic hotspots for particular categories of complaints and allows interventions to be targeted at those areas. The city can also analyze response times and resolution rates to determine how efficiently various agencies operate and adjust workflows accordingly. Public service agencies can even plan ahead to prevent recurring problems. Moreover, analysis of citizen complaints can reveal systemic problems that require long-term solutions, such as infrastructure upgrades or policy changes. Acting on these insights moves public service management from reactive to proactive, which is in the best interest of residents, whose concerns are ultimately resolved more promptly and efficiently. (Goldsmith, 2023)

1.3 Goals and Scope of Analysis

This analysis aims primarily to understand patterns and trends in 311 service request data, with the following goals:

Complaint Trends: Establish which complaints are most prevalent in New York City and whether some complaint types increase over time while others decrease.

Response Time Analysis: Examine how promptly the city's agencies respond to complaints and whether response times vary across complaint types and geographic locations.

Geographic Patterns: Test whether complaints differ between different parts of the city and examine spatial trends in the data.

Service Optimization: Provide insights that support more efficient resource allocation within New York City, prioritizing complaints so that the 311 service system is optimized.

The analysis includes several data preparation steps to clean and transform the data into a state appropriate for statistical analysis, followed by visual exploration of important variables and hypothesis tests that examine relationships between factors such as complaint type and geographic location, or whether response times vary across complaint types. It also aims to pinpoint actionable insights that support informed decisions and service improvements, helping the city optimize the way it responds to the complaints of its citizens and ultimately improve public satisfaction.

1.4 Tools Used

1. Pandas:
Pandas is a widely used Python library for data manipulation and analysis. In this
project, it was essential for reading the CSV dataset into a structured DataFrame. It
allowed easy handling of missing values, renaming columns, and filtering rows based
on conditions. We used pandas to extract date and time components, group data by
categories (like month or borough), and compute summary statistics. Its intuitive
syntax and integration with other tools made it ideal for managing and exploring our
service request data. (Alriksson, 2020)
2. Matplotlib:
Matplotlib is a foundational plotting library in Python used to create static, interactive,
and animated visualizations. In this project, it was used to build bar charts, line plots,
and scatter plots to visually interpret complaint trends. It helped visualize the
distribution of complaints across boroughs, time, and complaint types. With
customization features such as color, label formatting, and saving plots as images,
Matplotlib made our visual outputs presentation-ready. These visuals were key to
revealing trends that would otherwise remain hidden in raw numbers.

3. Seaborn:
Seaborn is a high-level data visualization library built on top of Matplotlib that provides
an easier and more aesthetically pleasing way to create plots. We used Seaborn to
generate statistical plots like boxplots and histograms, which helped analyze the
spread and distribution of complaint response times. Its integration with pandas made
it simple to plot directly from DataFrames. Seaborn automatically handles visual
themes, legends, and color palettes, making plots clearer and more professional. It
significantly improved the visual storytelling aspect of our data analysis. (Solomon,
2022)

4. Jupyter Notebook:
Jupyter Notebook served as the interactive development environment where all the
coding and visualization tasks were performed. It allowed us to write and execute code
in cells, visualize outputs immediately, and document our process alongside the code.
This interactivity helped test small portions of code step by step and adjust parameters
quickly. Additionally, markdown cells enabled us to add explanations, titles, and
structured reports directly in the notebook. Jupyter’s ability to combine code, output,
and narrative made it ideal for both development and presentation.

5. CSV File:
The dataset used in this project was stored in a CSV (Comma-Separated Values)
format. CSV files are simple text files that store tabular data and are widely used for
sharing structured information. Using pandas, we imported the CSV file into a
DataFrame to begin our analysis. Despite being large and complex, the CSV format
allowed us to access and process the data efficiently. It served as the foundation for
all our analysis, containing the complaint details, timestamps, locations, and status
information required for exploration.

2. Data Understanding

2.1 Overview of Data Structure


The dataset contains records of customer service requests entered into the New York City 311 system. It covers complaints raised by residents, businesses, and visitors about urban issues such as noise, parking violations, sanitation, and other public maintenance matters.

It represents a fairly comprehensive record of how citizens have interacted with 311 over time. The dataset contains 300,698 rows and 53 columns, and each column captures one aspect of a service request. The attributes fall into several groups:

• Complaint details: type and description
• Location details: incident address, ZIP code, borough
• Dates and times: creation, closure, response times
• Agency responsible for handling the complaint: NYPD, Department of Sanitation, etc.

Each row represents a particular complaint, while the columns record the complaint's type, the agency responsible, the location of the incident, and the dates when the request was created and resolved.
2.2 Description of Key Columns

This dataset has a number of columns, each capturing information on the request for
services. A few of the primary columns in the dataset are:

• Unique Key: A unique identifier that is given to every service request. This is the
primary key of this dataset.

• Created Date: The date and time when the service request was created. This column is important for establishing when a complaint was received.

• Closed Date: The date and time when the service request was closed. It is used to derive the resolution time, i.e. the time taken to resolve the complaint.

• Agency: The agency or department responsible for addressing the complaint (for example, NYPD or the Department of Sanitation).

• Agency Name: The full name of the agency handling the request (for example, "New York City Police Department").

• Complaint Type: The specific complaint type, such as Blocked Driveway, Noise - Street/Sidewalk, or Illegal Parking.

• Descriptor: A further specification of the complaint that adds context, for example, Loud Music/Party for Noise - Street/Sidewalk.

• Location Type: The type of location, such as Street/Sidewalk, Building, or Park.

• Incident Zip: The ZIP code where the incident took place, providing general information about the complainant's location.

• Incident Address: The exact street address where a complaint was lodged, if available.

• Resolution Action: Text describing the action taken, for instance Completed or No Access.

Together these fields capture the nature of each complaint, where it occurred, and how it was resolved. The two key dates, Created Date and Closed Date, make it straightforward to compute the response time for any complaint as the difference between them, while Complaint Type and Location Type help classify and understand the issues raised by citizens.

2.3 Data Types and Missing Data Analysis

Integer: Used for numerical data, such as Unique Key (the unique identifier for a request).

Object (String): Used for textual data, such as Agency, Complaint Type, and Incident Address.

Datetime: Used for date and time data, such as the values in the Created Date and Closed Date columns.

Float: Used for numeric data that may contain decimals or empty values; Incident Zip is an example.

Missing Data Analysis: Closed Date has 2,164 missing values, indicating requests that are still open and have not yet been closed. Some other columns, such as Descriptor, Incident Address, and Resolution Action, also contain missing values, but these are less important because they do not interfere with the basic analysis or the response-time calculation.

Handling Missing Data: Rows corresponding to open requests (missing Closed Date) can simply be removed, or the missing values can be filled with a placeholder. For other columns with missing values, imputation or removal of incomplete entries is chosen depending on how relevant the column is to the analysis.

S.No  Column Name           Data Type  Description
1     Unique Key            Integer    A unique identifier assigned to each service request.
2     Created Date          String     The date and time when the request was created.
3     Closed Date           String     The date and time when the request was closed.
4     Agency                String     The agency handling the request (e.g., NYPD, DOT).
5     Complaint Type        String     The type of complaint (e.g., Noise, Illegal Parking).
6     Descriptor            String     Additional details about the complaint (e.g., type of noise, location specifics).
7     Location Type         String     The type of location (e.g., Street, Residential).
8     Incident Zip          String     The ZIP code of the incident location.
9     City                  String     The city where the incident occurred.
10    Status                String     The status of the request (e.g., Open, Closed).
11    Borough               String     The borough where the incident occurred (e.g., Manhattan, Brooklyn).
12    Latitude              Float      The latitude coordinate of the incident location.
13    Longitude             Float      The longitude coordinate of the incident location.
14    Address Type          String     The type of address (e.g., Residential, Business).
15    Park Facility Name    String     The name of the park facility, if the incident occurred in a park.
16    School Name           String     The name of the school, if the incident occurred near or in a school.
17    Taxi Company Borough  String     The borough associated with the taxi company, if the complaint is related to a taxi.
18    Bridge Highway Name   String     The name of the bridge or highway, if the incident occurred there.
19    Road Ramp             String     The specific ramp involved in the incident, if applicable.
20    Community Board       String     The community board associated with the incident location.
3. Data Preparation
Data preparation is one of the most important steps in the data analysis pipeline. It is the process of converting raw data into a clean dataset ready for further analysis. In this project, preparing the New York City 311 service request dataset consists of a few primary steps, described below:
3.1 Import the Dataset and Clean It
The very first step in data preparation is to import the dataset and do some initial
cleaning. This is performed by loading the data into a DataFrame, ensuring its
structure is proper, and making it analysis-ready.

• Loading the dataset: To begin, we import the data into pandas from its source file format (in this case, CSV).

• As soon as the data is loaded, we carry out preliminary cleaning checks, detecting missing values, duplicate entries, and any other data integrity issues.

Figure 1 Importing and Cleaning the Dataset


Figure 2 Output of Importing and Cleaning the Dataset
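
The notebook code for this step appears above only as figure captions. As a rough sketch of what it involves, assuming the CSV file name shown below (the actual file name is not stated in the report), the import and initial checks could look like this:

import pandas as pd

# Load the 311 service request data into a DataFrame.
# The file name is illustrative; the actual file used in the notebook is not named here.
df = pd.read_csv("311_Service_Requests.csv", low_memory=False)

# Preliminary checks: shape, missing values, duplicate rows
print(df.shape)                                   # expected to be roughly (300698, 53)
print(df.isnull().sum().sort_values(ascending=False).head(10))
print("Duplicate rows:", df.duplicated().sum())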

3.2 Converting Date-Time Columns and Creating New Features


Attributes like Created Date and Closed Date are typically stored as string or object data types. For any time-based analysis, such as calculating the time taken to resolve a complaint, these columns need to be converted into datetime objects.

Next, a new feature named Request_Closing_Time is created. It captures the time taken to close a complaint, computed as the difference between Closed Date and Created Date.

Figure 3 Converting Date-Time Columns and Creating New Features

Explanation: The code shown above converts the string columns to datetime using the pd.to_datetime() function. It sets errors='coerce' so that any invalid date entry is converted to NaT. The Created Date is then subtracted from the Closed Date to obtain the time taken, expressed in hours.
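
Since the figure itself is a screenshot, a minimal sketch of the same conversion, using the Created Date and Closed Date column names described earlier, is:

import pandas as pd

# Convert the string date columns to datetime; invalid entries become NaT
df["Created Date"] = pd.to_datetime(df["Created Date"], errors="coerce")
df["Closed Date"] = pd.to_datetime(df["Closed Date"], errors="coerce")

# New feature: time taken to close a request, expressed in hours
df["Request_Closing_Time"] = (
    df["Closed Date"] - df["Created Date"]
).dt.total_seconds() / 3600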

3.3 Dropping Irrelevant Columns

In any given dataset, there may be fields that have no significance for the analysis. Such columns can safely be removed to make the data more manageable. For example, in this dataset there is no need to know precise addresses, school names, or vehicle-related details in order to understand the pattern of complaints and the duration of response times.

We therefore drop the columns that are irrelevant to the type of analysis at hand.

Figure 4 Dropping Irrelevant Columns
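
A sketch of the dropping step is shown below. The exact columns removed in the original notebook are not listed in this report, so the list here is only illustrative of the kinds of fields described above:

# Columns judged irrelevant to complaint patterns and response times
# (illustrative list only)
cols_to_drop = [
    "Incident Address", "School Name", "Vehicle Type",
    "Taxi Company Borough", "Bridge Highway Name", "Road Ramp",
]
df = df.drop(columns=cols_to_drop, errors="ignore")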

3.4 Handling Missing Data


In real-world data, handling missing values mainly involves dealing with NaNs. In some cases missing data is simply dropped, while in other cases it is imputed with a default or computed value, that is, the mean, median, or mode.

For this dataset, the approach is:

Dropping all rows with missing data in critical columns like Created Date and Closed Date, as they are essential for computing response times. For rows with missing data in less critical columns, values may be imputed or the rows dropped depending on the column's significance. (Jain, 2021)
Figure 5 Handling Missing Data

Explanation: The dropna() method is used to remove rows with missing values in the
specified columns, while the fillna() method fills missing values in the Complaint Type
column with the most frequent value (mode).
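
In code form, following the explanation above, this step might look like:

# Drop rows missing the dates needed for the response-time calculation
df = df.dropna(subset=["Created Date", "Closed Date"])

# Fill missing complaint types with the most frequent value (the mode)
df["Complaint Type"] = df["Complaint Type"].fillna(df["Complaint Type"].mode()[0])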

3.5 Display Unique Values from All Columns


After cleaning, it is useful to display the unique values of every column to verify the results. An overview of the data cleaning methods applied is as follows:

Dropping Rows with Missing Values: Missing data is common in real-world datasets, especially in columns important to the analysis, such as Created Date and Closed Date in this dataset. The approach is implemented as follows:

Drop rows with missing values in critical columns like Created Date and Closed Date.

For less critical columns, do either of the following:

Impute missing values with the mode in the case of a categorical column.

Drop the rows if the column is not crucial for the analysis.

Figure 6 Display Unique Values from All Columns
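
A simple sketch of how the unique values can be displayed for every column is:

# Print the number of distinct values and a small sample for each column
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")
    print(df[col].unique()[:5])        # first few unique values only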

4. Data Analysis
Data analysis is the process of investigating and interpreting data in order to identify patterns, trends, and relationships between variables. At this stage it requires the application of statistical techniques such as summary statistics, correlation analysis, and data visualization. Summary statistics describe principal measures such as the mean, standard deviation, skewness, and kurtosis. Correlation analysis helps identify relationships between numeric variables. Visualizing the data makes it possible to understand patterns, outliers, and trends more effectively. It is through this process that informed decisions can be made based on insights from the data.

4.1 Show Summary Statistics such as Sum, Mean, Standard Deviation
To generate summary statistics such as sum, mean, standard deviation, skewness,
and kurtosis for the numerical columns in the DataFrame, we can use a combination
of pandas functions and scipy.stats functions.

Figure 7 Show Summary Statistics such as sum, mean, standard deviation
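
A sketch of this computation, combining pandas with scipy.stats as described above, is:

from scipy import stats

# Restrict to the numeric columns
numeric_df = df.select_dtypes(include=["float64", "int64"])

# Basic summary statistics with pandas
print(numeric_df.agg(["sum", "mean", "std"]))

# Skewness and kurtosis with scipy.stats (missing values removed first)
for col in numeric_df.columns:
    values = numeric_df[col].dropna()
    print(col, "skew:", stats.skew(values), "kurtosis:", stats.kurtosis(values))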


4.2 Calculate and Show Correlation of All Variables
In this task, we compute the correlation matrix of all numerical variables in the dataset. It shows the magnitude and direction of the relationships between pairs of numeric variables.
Figure 8 Data Analysis: Calculate and Show Correlation of All Variables

Explanation: A correlation value can vary from -1 (perfect negative) to +1 (perfect positive). Calling corr() on the full DataFrame initially raised an error because it attempted to compute correlations against non-numeric columns. We therefore first filtered out only the numerical columns using select_dtypes() with the data types float64 and int64, and then constructed the correlation matrix for those columns to show the relationships between the numeric variables.
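
A sketch of the filtering and correlation step described above:

# Keep only the numeric columns before computing correlations
numeric_df = df.select_dtypes(include=["float64", "int64"])

# Pairwise (Pearson) correlation between the numeric variables
corr_matrix = numeric_df.corr()
print(corr_matrix)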

5. Data Exploration
EDA (exploratory data analysis) is an approach to analyzing a dataset in which its structure, patterns, and relationships between variables are examined before further analysis. It is typically the first stage of data analysis and consists of a number of techniques applied to gain insight into the data before advanced modeling. The principal objectives of data exploration are as follows: (Kumar, 2021)

1. Data Understanding: Checking columns, data types, missing values, and basic statistics such as the mean, median, and standard deviation, to get an idea of the general characteristics of the dataset.

2. Missing or Anomalous Data Identification: Looking for missing, duplicated, or outlier values that need to be handled before analysis.

3. Data Distribution: Understanding how the data is distributed through histograms, box plots, and descriptive statistics, in order to understand the central tendency, spread, and skewness of variables.

4. Correlation and Relationships: Checking for correlations and relationships between different features, whether linear or non-linear, using techniques such as scatter plots or correlation matrices.

5. Visualization: Producing visual representations such as histograms, bar charts, and scatter plots that bring out hidden patterns and trends in the dataset.

Data exploration helps form hypotheses, guides the choice of appropriate analytical techniques, and prepares the data for more advanced statistical models, machine learning models, or decision-making processes. It is the critical starting point of any data science project.
5.1 Four Major Insights from Visualization
Data mining and analysis of large datasets will yield several valuable insights through
visualizations. Here are four key insights which stem from various types of
visualizations:

1. Distribution of Complaints by Type

Insight: A bar chart of the most frequent complaint types quickly reveals which categories dominate. For instance, if Noise or Blocked Driveway complaints occur with high frequency, this signals a recurring problem in the affected areas.

Visualization:
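
The chart itself is not reproduced in this text. A sketch of how such a bar chart could be produced with pandas and Matplotlib is:

import matplotlib.pyplot as plt

# Ten most frequent complaint types
top_complaints = df["Complaint Type"].value_counts().head(10)

top_complaints.plot(kind="bar", color="steelblue")
plt.title("Most Frequent Complaint Types")
plt.xlabel("Complaint Type")
plt.ylabel("Number of Complaints")
plt.tight_layout()
plt.show()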
2. Geographical Distribution of Complaints

Insight: A scatter plot or heatmap based on latitude and longitude can highlight the areas where complaints are most concentrated. This shows where demand on resources is heaviest and where more attention or intervention is needed. (Singh, 2020)

Visualization:
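
As a sketch, a simple latitude/longitude scatter plot of complaint locations could be drawn as follows (the Latitude and Longitude columns appear in the data dictionary above):

import matplotlib.pyplot as plt

# Drop rows without coordinates, then plot complaint locations
geo = df.dropna(subset=["Latitude", "Longitude"])

plt.figure(figsize=(8, 8))
plt.scatter(geo["Longitude"], geo["Latitude"], s=1, alpha=0.3)
plt.title("Geographic Distribution of Complaints")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()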
3. Request_Closing_Time by Complaint Type

This concept refers to identifying trends or regularities in how service requests occur
over time by leveraging NumPy, a numerical computing library in Python.

In practice, service request data often includes timestamps indicating when each
request was made. By analyzing these timestamps with NumPy arrays, you can detect
temporal patterns—such as peaks in activity during certain hours, days, or seasons.

Visualization:
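
The referenced figure is not included in the text. One plausible way to visualize Request_Closing_Time by complaint type, matching the section heading and assuming Request_Closing_Time is stored in hours as in Section 3.2, is a Seaborn boxplot:

import seaborn as sns
import matplotlib.pyplot as plt

# Compare closing times (in hours) for the five most common complaint types
top5 = df["Complaint Type"].value_counts().head(5).index
subset = df[df["Complaint Type"].isin(top5)]

sns.boxplot(data=subset, x="Complaint Type", y="Request_Closing_Time")
plt.title("Request Closing Time by Complaint Type (hours)")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()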
4. Association between Variables

Insight: A correlation matrix heatmap can be used to show the relationships between variables. For instance, it can reveal positive correlations between specific complaint types and their response times, or associations between geographic variables and the likelihood of certain complaints.
Visualization:
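
A sketch of the correlation heatmap described here, using Seaborn:

import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the numeric columns, drawn as a heatmap
corr = df.select_dtypes(include=["float64", "int64"]).corr()

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation Between Numeric Variables")
plt.show()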

5.2 Group Complaint Types by Average Request Closing Time and Location
To arrange complaint types according to their average Request_Closing_Time and
categorize them by locations (e.g., Borough), we first group the data by Complaint
Type and Location. Then, we calculate the average Request_Closing_Time for each
group, sort them in descending order, and visualize the results using a bar chart.
Figure 9 Group Complaint Types by Average Request Closing Time and Location
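
A sketch of the grouping step, assuming Borough is used as the location column:

import matplotlib.pyplot as plt

# Average closing time (hours) for each complaint type within each borough
avg_closing = (
    df.groupby(["Complaint Type", "Borough"])["Request_Closing_Time"]
      .mean()
      .sort_values(ascending=False)
)
print(avg_closing.head(10))

# Bar chart of the slowest complaint type / borough combinations
avg_closing.head(10).plot(kind="bar")
plt.ylabel("Average Request_Closing_Time (hours)")
plt.tight_layout()
plt.show()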

Statistical Testing: Test 1 - Average Response Time Across Complaint Types

A One-Way ANOVA is conducted to test whether the average response time is equal across all complaint types. The analysis of variance checks whether there is a significant difference in mean Request_Closing_Time among the different Complaint Types.

Hypotheses:
Null Hypothesis (H0): The average response times are equal for all complaint types,
so there is no significant difference between them.

Alternative Hypothesis (H1): The average response times are not equal across all
complaint types, indicating that at least one group is different.

Figure 10 Average Response Time Across Complaint Types

Explanation:

• f_oneway(): This function performs the ANOVA test comparing the means of multiple groups (complaint types). It returns two values: the F-statistic (the ratio of between-group variance to within-group variance) and the p-value (which indicates whether the difference is statistically significant).

• F-statistic: A larger value indicates a higher likelihood that at least one group
mean is different.

• p-value: If p < 0.05, we reject the null hypothesis and conclude that at least one
complaint type has a significantly different average response time. If p ≥ 0.05,
we fail to reject the null hypothesis, meaning no significant difference exists.
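
A sketch of this test using scipy.stats, assuming one group of closing times is formed per complaint type:

from scipy.stats import f_oneway

# One array of closing times per complaint type (missing values removed)
groups = [
    grp["Request_Closing_Time"].dropna().values
    for _, grp in df.groupby("Complaint Type")
    if grp["Request_Closing_Time"].notna().sum() > 1
]

f_stat, p_value = f_oneway(*groups)
print("F-statistic:", f_stat)
print("p-value:", p_value)

if p_value < 0.05:
    print("Reject H0: average response times differ across complaint types.")
else:
    print("Fail to reject H0: no significant difference detected.")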

Test 2: Whether the Type of Complaint or Service Requested and Location Are Related
Null Hypothesis (H0):
The type of complaint or service requested and the location (e.g., Borough) are not
related. (No relationship between complaint type and location.)

Alternative Hypothesis (H1):

The type of complaint or service requested and the location (Borough) are related.
(There is a significant relationship between complaint type and location.)

Statistical Test: Chi-Square Test for Independence

A Chi-Square test helps determine if two categorical variables (complaint type and
location) are independent or related.

Figure 11 Chi-Square Test: Type of Complaint or Service Requested and Location

Explanation:

• Chi2 Stat: The Chi-Square statistic, which helps measure the association
between two categorical variables.

• p-value: If the p-value < 0.05, we reject the Null Hypothesis and conclude that
there is a significant relationship between the complaint type and location. If the
p-value ≥ 0.05, we fail to reject the Null Hypothesis and conclude that there is
no significant relationship between the two.
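
A sketch of the Chi-Square test, building the contingency table of complaint type versus borough with pandas:

import pandas as pd
from scipy.stats import chi2_contingency

# Contingency table: counts of each complaint type per borough
contingency = pd.crosstab(df["Complaint Type"], df["Borough"])

chi2_stat, p_value, dof, expected = chi2_contingency(contingency)
print("Chi2 Stat:", chi2_stat)
print("p-value:", p_value)

if p_value < 0.05:
    print("Reject H0: complaint type and location are related.")
else:
    print("Fail to reject H0: no significant relationship detected.")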
Conclusion
In this milestone of the project, we aimed to explore, clean, and analyze the 311
Customer Service Requests dataset to extract meaningful insights that can be used
for process optimization, resource allocation, and improving service efficiency. The
dataset provided a wealth of information, including the Complaint Type, Borough,
Created Date, Closed Date, and other relevant details that help in understanding the
patterns and trends in customer complaints. The first stage of this project involved the
data understanding phase, where we identified the key variables, such as Complaint
Type, Borough, and Request_Closing_Time. We recognized early on that the dataset
required significant preprocessing to ensure data quality and completeness.

In the data preparation phase, we handled missing values, converted the Created Date
and Closed Date to the proper datetime format, and engineered the new feature,
Request_Closing_Time, which calculated the time it took to resolve each complaint.
We also dropped irrelevant columns and handled any remaining missing values by
imputing the Complaint Type with the most frequent category. This step ensured that
our dataset was clean and ready for deeper analysis, and we could now focus on the
core aspects of the dataset.

Once the data was prepared, we moved into the exploratory data analysis (EDA)
phase, where we sought to uncover patterns, trends, and insights from the data.
Through visualizations such as bar charts, box plots, and histograms, we discovered
key findings:

• Complaint Type Distribution: We identified that Noise, Illegal Parking, and Blocked Driveway complaints were the most frequent types, highlighting the areas where public services are most needed.

• Geographic Distribution: Complaints were heavily concentrated in urban areas, particularly Brooklyn and Manhattan, indicating that certain boroughs face a higher volume of complaints and might require more resources.

• Request_Closing_Time Variability: We found that Request_Closing_Time varied greatly across different complaint types. Some complaints, like Noise, were resolved quickly, while others, such as Blocked Driveway, took significantly longer, pointing to inefficiencies in the resolution process that need further investigation.

• Correlation Matrix: The correlation analysis showed weak relationships between numerical variables, and it indicated that Request_Closing_Time did not have strong associations with other variables, suggesting that it operates independently as a key measure of service efficiency.

To validate these findings, we employed statistical tests:

• ANOVA Test: The One-Way ANOVA test confirmed that average response times
differ significantly across different complaint types (with a p-value of 0.0). This
validated our earlier observation from visualizations, emphasizing that certain
complaint types require more time to resolve, and indicating potential areas for
process improvement.

• Chi-Square Test: The Chi-Square test showed a significant relationship between Complaint Type and Borough, supporting the idea that certain complaint types are more likely to occur in specific geographic regions, which could help with targeted resource allocation.

The insights gained from this analysis provide significant value:

• Geographic Resource Allocation: Urban boroughs like Brooklyn and Manhattan are hotspots for complaints, suggesting that more resources and services should be allocated to these regions.

• Inefficiencies in Service: There are significant variations in how long it takes to resolve different types of complaints, which suggests inefficiencies in the service process. Complaints like Blocked Driveway or Abandoned Vehicle require much longer resolution times, which may point to the need for process optimization in these categories.

• Predictive Potential: With the identification of geographic and complaint type patterns, future analysis could focus on building predictive models to forecast which complaint types are likely to arise in different areas and help allocate resources proactively.
In conclusion, the coursework provided valuable insights into the 311 Customer
Service Requests dataset by exploring the data, cleaning it, and conducting in-depth
analysis. The findings pointed out areas that need attention, such as complaint types
with longer resolution times and regions with higher complaint volumes. The statistical
tests confirmed that there are significant differences in response times across
complaint types and that Complaint Type is related to Borough. These insights lay a
solid foundation for the next stages of the project, where we will further explore
predictive models and process improvements to optimize complaint handling and
response times. This milestone not only helped in identifying the areas for
improvement but also provided the data-driven insights needed to enhance the public
service experience for the citizens.
Bibliography
Alriksson, T. (2020, 08 19). Retrieved from Towards Data Science: https://towardsdatascience.com/why-you-should-use-jupyter-notebook-and-pandas-2b60f8b1b6fe

Goldsmith, S. (2023, 11 14). Retrieved from https://datasmart.ash.harvard.edu/news/article/why-cities-need-data-smart-culture

Jain, S. (2021, 10 02). Retrieved from Towards Data Science: https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4

Kumar, G. P. (2021, 05 20). Retrieved from Analytics Vidhya: https://www.analyticsvidhya.com/blog/2021/06/exploratory-data-analysis-eda-in-python/

NYC.gov. (n.d.). Retrieved from https://data.cityofnewyork.us/Social-Services/311-Service-Requests/erm2-nwe9

Singh, N. (2020, 12 04). Retrieved from Medium (The Startup): https://medium.com/swlh/geospatial-data-analysis-using-python-f7f4b2c0ecf2

Solomon, B. (2022, 06 07). Retrieved from Real Python: https://realpython.com/python-data-visualization-seaborn/
