1. Introduction
1.1 Background of the Dataset
1.2 Relevance of Data Analysis in Public Services
1.3 Goals and Scope of Analysis
1.4 Tools Used
2. Data Understanding
2.1 Overview of Data Structure
2.2 Description of Key Columns
2.3 Data Types and Missing Data Analysis
3. Data Preparation
3.1 Import the Dataset and Clean It
3.2 Converting Date-Time Columns and Creating New Features
3.3 Dropping Irrelevant Columns
3.4 Handling Missing Data
3.5 Display Unique Values from All Columns
4. Data Analysis
4.1 Show Summary Statistics such as Sum, Mean, and Standard Deviation
4.2 Calculate and Show Correlation of All Variables
5. Data Exploration
5.1 Four Major Insights through Visualization after Data Mining
5.2 Group Complaint Types by Average Request Closing Time and Location
Statistical Testing: Test 1 - Average Response Time Across Complaint Types
Test 2: Whether the Type of Complaint or Service Requested and Location Are Related
Conclusion
Bibliography
1. Introduction
1.1 Background of the Dataset
The dataset is a collection of service requests registered with New York City's 311 line. 311 is a non-emergency customer service hotline that the city's residents, businesses, and visitors call to report noise, illegal parking, and other sanitation, maintenance, and public service problems. The data on these service requests is critical because it provides insight into the demand for public services, resource distribution, and problems in an urban area.
Each service request also carries several attributes, such as:
Complaint Type: The type of complaint made, for example noise, a blocked driveway, or illegal parking.
Agency: The department or agency responsible for the complaint, for example the NYPD or the Department of Sanitation.
Incident Location: The location where the incident was reported, including details such as ZIP codes and neighborhood names.
Created Date and Closed Date: The times when the request was opened and closed.
Analyzing such data makes it possible to trace the chronology of complaints in different areas, monitor how effectively problems are solved, and observe how the mix of complaint types changes over time. (NYC.gov, n.d.)
1.2 Relevance of Data Analysis in Public Services
Data analysis is an important tool for optimizing public services, particularly in metropolises the size of New York. The large volume of data created through 311 service requests is therefore of great value to city authorities and policymakers.
For instance, data analysis helps identify geographic hotspots for particular categories of complaints and allows targeted interventions in those areas. The city can also analyze response times and resolution rates to gauge the operational efficiency of its agencies and adjust workflows accordingly. Public service agencies can even plan to prevent recurring problems. Moreover, analysis of citizen complaints can reveal systemic problems that call for long-term solutions such as infrastructure upgrades or policy changes. Acting on these insights shifts public service management from reactive to proactive, which is in the best interest of residents because their concerns are resolved more promptly and efficiently. (Goldsmith, 2023)
1.3 Goals and Scope of Analysis
This analysis aims primarily to understand patterns and trends in 311 service request data with respect to the following goals:
Complaint Trends: Establish which complaints are most prevalent in New York City and whether some complaint types increase over time while others decrease.
Response Time Analysis: Examine how promptly the city's agencies respond to complaints and whether response time varies across complaint types and geographic locations.
The analysis includes multiple data preparation steps to clean and transform the data into a state appropriate for statistical analysis, visual exploration of the important variables, and hypothesis tests that check for relationships between factors such as complaint type and geographic location, or whether response times differ across complaint types. It also aims to pinpoint actionable insights that support informed decisions and service improvements, helping the city optimize the way it responds to citizen complaints and ultimately improve public satisfaction.
1.4 Tools Used
1. Pandas:
Pandas is a widely used Python library for data manipulation and analysis. In this
project, it was essential for reading the CSV dataset into a structured DataFrame. It
allowed easy handling of missing values, renaming columns, and filtering rows based
on conditions. We used pandas to extract date and time components, group data by
categories (like month or borough), and compute summary statistics. Its intuitive
syntax and integration with other tools made it ideal for managing and exploring our
service request data. (Alriksson, 2020)
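As a rough illustration of the pandas operations described above, the sketch below loads the CSV, parses the creation date, and produces a grouped count per borough. The file name 311_Service_Requests.csv is an assumed placeholder, and the column names follow the dataset description rather than the project's exact code.

```python
import pandas as pd

# Load the 311 service request CSV into a DataFrame
# (the file name is an assumed placeholder).
df = pd.read_csv("311_Service_Requests.csv", low_memory=False)

# Parse the creation timestamp and derive a simple date component.
df["Created Date"] = pd.to_datetime(df["Created Date"], errors="coerce")
df["Month"] = df["Created Date"].dt.month

# Count complaints per borough as an example of a grouped summary statistic.
complaints_per_borough = df.groupby("Borough")["Unique Key"].count()
print(complaints_per_borough)
```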
2. Matplotlib:
Matplotlib is a foundational plotting library in Python used to create static, interactive,
and animated visualizations. In this project, it was used to build bar charts, line plots,
and scatter plots to visually interpret complaint trends. It helped visualize the
distribution of complaints across boroughs, time, and complaint types. With
customization features such as color, label formatting, and saving plots as images,
Matplotlib made our visual outputs presentation-ready. These visuals were key to
revealing trends that would otherwise remain hidden in raw numbers.
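A minimal Matplotlib sketch of the kind of trend plot described above, assuming the DataFrame df from the previous step with Created Date already parsed to datetime:

```python
import matplotlib.pyplot as plt

# Line plot of complaint volume per month.
monthly_counts = df["Created Date"].dt.to_period("M").value_counts().sort_index()

plt.figure(figsize=(10, 4))
plt.plot(monthly_counts.index.astype(str), monthly_counts.values, marker="o")
plt.xticks(rotation=45, ha="right")       # readable month labels
plt.xlabel("Month")
plt.ylabel("Number of Complaints")
plt.title("311 Complaints per Month")
plt.tight_layout()
plt.savefig("complaints_per_month.png")   # save a presentation-ready image
plt.show()
```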
3. Seaborn:
Seaborn is a high-level data visualization library built on top of Matplotlib that provides
an easier and more aesthetically pleasing way to create plots. We used Seaborn to
generate statistical plots like boxplots and histograms, which helped analyze the
spread and distribution of complaint response times. Its integration with pandas made
it simple to plot directly from DataFrames. Seaborn automatically handles visual
themes, legends, and color palettes, making plots clearer and more professional. It
significantly improved the visual storytelling aspect of our data analysis. (Solomon,
2022)
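A small Seaborn sketch of a boxplot of response times by complaint type, assuming the engineered Request_Closing_Time column (in hours) described later in the data preparation section:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Restrict to the five most common complaint types to keep the plot readable.
top5 = df["Complaint Type"].value_counts().head(5).index
subset = df[df["Complaint Type"].isin(top5)]

plt.figure(figsize=(10, 5))
sns.boxplot(data=subset, x="Complaint Type", y="Request_Closing_Time")
plt.xticks(rotation=30, ha="right")
plt.ylabel("Request closing time (hours)")
plt.title("Spread of Response Times by Complaint Type")
plt.tight_layout()
plt.show()
```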
4. Jupyter Notebook:
Jupyter Notebook served as the interactive development environment where all the
coding and visualization tasks were performed. It allowed us to write and execute code
in cells, visualize outputs immediately, and document our process alongside the code.
This interactivity helped test small portions of code step by step and adjust parameters
quickly. Additionally, markdown cells enabled us to add explanations, titles, and
structured reports directly in the notebook. Jupyter’s ability to combine code, output,
and narrative made it ideal for both development and presentation.
5. CSV File:
The dataset used in this project was stored in a CSV (Comma-Separated Values)
format. CSV files are simple text files that store tabular data and are widely used for
sharing structured information. Using pandas, we imported the CSV file into a
DataFrame to begin our analysis. Despite being large and complex, the CSV format
allowed us to access and process the data efficiently. It served as the foundation for
all our analysis, containing the complaint details, timestamps, locations, and status
information required for exploration.
2. Data Understanding
2.1 Overview of Data Structure
The dataset maintains a fairly comprehensive record of how citizens have interacted with 311 over a period of time. It contains 300,698 rows and 53 columns, and every column in a row contributes to the picture of a single service request. The main groups of attributes are:
• Complaint details: type and description
• Location details: incident address, ZIP code, borough
• Dates and times: creation, closure, response times
• Agency responsible for handling the complaint: NYPD, Sanitation Department, etc.
Each row represents a particular complaint, while the columns record the complaint's type, the agency responsible, the location of the incident, and the dates when the request was created and resolved.
2.2 Description of Key Columns
This dataset has a number of columns, each capturing information on the request for
services. A few of the primary columns in the dataset are:
• Unique Key: A unique identifier that is given to every service request. This is the
primary key of this dataset.
• Created Date: date and time of the creation of the service request. It is one of the
important columns to understand the timestamp of when any complaint was received.
• Closed Date: date and time of closing out for the service request. It will help to find
out resolution time - time taken to resolve the complaint.
• Agency: The agency or department that is responsible for addressing the complaint (examples are the NYPD and the Department of Sanitation).
• Agency Name: The complete name of the agency that is handling the request (for example, "New York City Police Department").
• Complaint Type: The specific complaint type, such as Blocked Driveway, Noise - Street/Sidewalk, or Illegal Parking.
• Descriptor: A further specification of the complaint that adds context; for example, Loud Music/Party for Noise - Street/Sidewalk.
• Incident Zip: The ZIP code where the incident took place, which provides general information about the location of the complaint.
• Incident Address: The exact street address where the complaint was lodged, if available.
These fields capture a great deal of information about the nature of complaints, their location, and the way they were resolved. In particular, the response time for any complaint can be calculated quickly as the difference between two key dates, Created Date and Closed Date, while Complaint Type and Location Type help classify and understand the issues raised by citizens.
2.3 Data Types and Missing Data Analysis
Integer: Used for numerical data such as Unique Key (the unique identifier of a request).
Object (String): Used for textual data such as Agency, Complaint Type, and Incident Address.
Datetime: Used for dates and times, for example the values in the Created Date and Closed Date columns.
Float: Used for numeric data that may contain decimals or empty values; Incident Zip is an example.
Missing Data: Several columns contain some missing or null values, only a few of which affect the analysis.
Missing Data Analysis: Closed Date has 2,164 missing values, which indicates that those requests are still open and not yet closed. Some other columns, such as Descriptor, Incident Address, and Resolution Action, also have a few missing values, but these are not critical because they do not interfere with the basic analysis or the time-to-respond calculation.
Handling Missing Data: The rows representing still-open requests (missing Closed Date) can simply be removed, or the missing values can be imputed with a placeholder. For other columns with missing values, imputation or removal of incomplete entries is chosen depending on how relevant the column is to the analysis.
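The checks described above can be reproduced with a few pandas calls. This is a sketch, assuming df is the DataFrame loaded earlier:

```python
# Inspect the data type of every column.
print(df.dtypes)

# Count missing values per column, largest first.
missing_counts = df.isnull().sum().sort_values(ascending=False)
print(missing_counts.head(10))

# Share of missing values per column, useful when deciding
# whether to impute values or drop incomplete rows.
print((missing_counts / len(df)).head(10))
```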
No.  Column Name           Description                                                           Data Type
2    Created Date          The date and time when the request was created.                      String
3    Closed Date           The date and time when the request was closed.                       String
15   Park Facility Name    The name of the park facility, if the incident occurred in a park.   String
3. Data Preparation
Data preparation is one of the most important steps in a data analysis pipeline. It is the process of converting raw data into a clean dataset that is ready for further analysis, whatever form that analysis takes. In this scenario, preparing the dataset of 311 service requests from New York City consists of a few primary steps, as described below:
3.1 Import the dataset and clean it
The very first step in data preparation is to import the dataset and do some initial
cleaning. This is performed by loading the data into a DataFrame, ensuring its
structure is proper, and making it analysis-ready.
• Loading the dataset: To begin with, we import the data from its source file format into pandas (in this instance, a CSV file).
• As soon as the data loads, we run a few preliminary cleaning checks, looking for missing values, duplicate entries, or other data integrity issues, as sketched below.
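A sketch of this loading-and-checking step; the file name is an assumed placeholder and the expected shape comes from the dataset description above:

```python
import pandas as pd

# Load the raw CSV (the file name is an assumed placeholder).
df = pd.read_csv("311_Service_Requests.csv", low_memory=False)

# Preliminary integrity checks.
print(df.shape)                      # expected: (300698, 53)
print(df.duplicated().sum())         # number of fully duplicated rows
print(df.isnull().sum().head(10))    # missing values in the first columns

# Remove exact duplicate rows, if any.
df = df.drop_duplicates()
```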
3.2 Converting Date-Time Columns and Creating New Features
Explanation: The code snippet below converts the string date columns into datetime using the pd.to_datetime() function. It also sets errors='coerce', so that every invalid date entry is converted to NaT. Subtracting the Created Date from the Closed Date then gives the time taken to close each request, expressed in hours.
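The original snippet was included as a figure; a reconstruction consistent with the explanation above might look like this:

```python
import pandas as pd

# Convert the string date columns to datetime; invalid entries become NaT.
df["Created Date"] = pd.to_datetime(df["Created Date"], errors="coerce")
df["Closed Date"] = pd.to_datetime(df["Closed Date"], errors="coerce")

# New feature: time taken to close each request, expressed in hours.
df["Request_Closing_Time"] = (
    df["Closed Date"] - df["Created Date"]
).dt.total_seconds() / 3600
```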
3.3 Dropping Irrelevant Columns
In any given dataset there may be fields that have no significance for the analysis. Such columns can safely be removed to make the data more manageable and easier to view. For example, in this dataset there is no need to know precise addresses, school names, or vehicle-related details in order to understand the pattern of complaints and the duration of response times. We therefore drop the columns that are irrelevant for the type of analysis at hand, as sketched below.
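A sketch of the dropping step; the exact column list here is illustrative, chosen from the kinds of columns mentioned above rather than taken from the project's code:

```python
# Columns judged irrelevant for complaint-pattern and response-time analysis
# (the exact list is illustrative).
columns_to_drop = [
    "Incident Address",
    "School Name",
    "Vehicle Type",
]

# errors="ignore" keeps the call safe if a listed column is absent.
df = df.drop(columns=columns_to_drop, errors="ignore")
print(df.shape)
```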
3.4 Handling Missing Data
We drop all rows with missing data in critical columns such as Created Date and Closed Date, since these are essential for computing response times. For rows with missing data in less critical columns, values may be imputed or dropped depending on their significance. (Jain, 2021)
Figure 5: Handling Missing Data.
Explanation: The dropna() method removes rows with missing values in the specified columns, while the fillna() method fills missing values in the Complaint Type column with the most frequent value (the mode), as sketched below.
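Based on that explanation, the handling step could be sketched as follows (assuming the DataFrame df from the previous steps):

```python
# Drop rows that lack the dates needed for the response-time calculation.
df = df.dropna(subset=["Created Date", "Closed Date"])

# Fill missing complaint types with the most frequent value (the mode).
most_frequent = df["Complaint Type"].mode()[0]
df["Complaint Type"] = df["Complaint Type"].fillna(most_frequent)

print(df.isnull().sum().head(10))   # verify the critical columns are complete
```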
4. Data Analysis
Data analysis is the process of investigating and interpreting data with a view to identifying patterns, trends, and relationships between variables. It requires applying statistical techniques such as summary statistics, correlation analysis, and data visualization. Summary statistics highlight principal measures such as the mean, standard deviation, skewness, and kurtosis. Correlation analysis helps identify relationships between numeric variables. Visualizing the data then helps in understanding patterns, outliers, and trends more effectively. It is this process that allows informed decisions to be made based on insights from the data.
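As a sketch of how such summary statistics can be produced with pandas, assuming the prepared DataFrame df with the engineered Request_Closing_Time column:

```python
# Summary statistics (count, mean, standard deviation, quartiles)
# for the numeric columns, including Request_Closing_Time.
print(df.describe())

# Skewness and kurtosis of the response-time distribution.
print("Skewness:", df["Request_Closing_Time"].skew())
print("Kurtosis:", df["Request_Closing_Time"].kurtosis())

# Sum and mean of the closing time in hours.
print("Total hours:", df["Request_Closing_Time"].sum())
print("Mean hours:", df["Request_Closing_Time"].mean())
```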
Explanation: A correlation value can vary from -1 (perfect negative) to +1 (perfect positive). Running corr() on the full DataFrame initially raised an error because the calculation was attempted on non-numeric columns. We therefore first filtered out only the numeric columns using select_dtypes() with the data types float64 and int64, and then constructed the correlation matrix for those columns to show the relationships between the numeric variables.
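A sketch of the numeric-only correlation computation described above:

```python
# Keep only numeric columns so corr() does not fail on text data.
numeric_df = df.select_dtypes(include=["float64", "int64"])

# Pearson correlation matrix of the numeric variables.
correlation_matrix = numeric_df.corr()
print(correlation_matrix)
```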
5. Data Exploration
EDA is an approach to analyzing a dataset in which its structure, patterns, and relationships between variables are examined before any deeper analysis. It is typically the first stage of data analysis and consists of a number of techniques applied to gain insight into the data before advanced modeling. The principal objectives of data exploration are as follows: (Kumar, 2021)
2. Missing or Anomalous Data Identification: Look for the missing, duplicated, or outlier
values that need to be handled before analysis.
Data exploration helps form hypotheses, choose the proper analytical techniques, and prepare the data for more advanced statistical models, machine learning models, or decision-making processes. It is the most critical starting point of any data science project.
5.1 Four Major Insights through Visualization after Data Mining
Data mining and analysis of large datasets will yield several valuable insights through
visualizations. Here are four key insights which stem from various types of
visualizations:
1. Most Frequent Complaint Types
Insight: A bar chart of the most frequent complaints within each category easily brings out their frequency. For instance, if complaints related to noise or blocked driveways occur very often, this signifies a repeated problem in that area.
Visualization:
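One way such a bar chart could be produced (a sketch; column names are taken from the dataset description, not the project's exact code):

```python
import matplotlib.pyplot as plt

# Ten most frequent complaint types across the whole dataset.
top_complaints = df["Complaint Type"].value_counts().head(10)

plt.figure(figsize=(10, 5))
top_complaints.plot(kind="bar", color="steelblue")
plt.xlabel("Complaint Type")
plt.ylabel("Number of Complaints")
plt.title("Most Frequent Complaint Types")
plt.tight_layout()
plt.show()
```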
2. Geographical Distribution of Complaints
Insight: A scatter plot or heatmap based on latitude and longitude can be plotted to point out the areas where complaints are most concentrated. This highlights areas where resource allocation may be skewed and where more attention or intervention is needed. (Singh, 2020)
Visualization:
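A simple location scatter plot could be sketched as below, assuming Latitude and Longitude columns as referenced in the insight above:

```python
import matplotlib.pyplot as plt

# Scatter plot of complaint locations; a small alpha value reveals dense hotspots.
geo = df.dropna(subset=["Latitude", "Longitude"])

plt.figure(figsize=(7, 7))
plt.scatter(geo["Longitude"], geo["Latitude"], s=1, alpha=0.1)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Geographical Distribution of Complaints")
plt.tight_layout()
plt.show()
```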
3. Request_Closing_Time by Complaint Type
This concept refers to identifying trends or regularities in how service requests occur
over time by leveraging NumPy, a numerical computing library in Python.
In practice, service request data often includes timestamps indicating when each
request was made. By analyzing these timestamps with NumPy arrays, you can detect
temporal patterns—such as peaks in activity during certain hours, days, or seasons.
Visualization:
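A small sketch of such a temporal check, combining the parsed timestamps with a NumPy array of hours (assuming the datetime conversion from the data preparation step):

```python
import numpy as np

# Hour of day for every request, as a NumPy array.
hours = df["Created Date"].dt.hour.to_numpy()

# Count requests per hour (0-23) to expose daily peaks in activity.
valid_hours = hours[~np.isnan(hours)].astype(int)
counts = np.bincount(valid_hours, minlength=24)
for hour, count in enumerate(counts):
    print(f"{hour:02d}:00  {count}")
```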
4. Association between Variables
Insight: A correlation matrix heatmap can be used to detail the relationships between variables. For instance, it can reveal positive correlations between specific complaint types and their response times, or correlations driven by geographic location, where certain areas are more likely to generate particular complaints.
Visualization:
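A sketch of such a heatmap, reusing the numeric-only correlation matrix from the data analysis step:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Heatmap of the correlation matrix for the numeric columns.
numeric_df = df.select_dtypes(include=["float64", "int64"])

plt.figure(figsize=(8, 6))
sns.heatmap(numeric_df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between Numeric Variables")
plt.tight_layout()
plt.show()
```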
Statistical Testing: Test 1 - Average Response Time Across Complaint Types
A One-Way ANOVA is conducted to test whether response times are equal across all types of complaints. The analysis of variance checks whether there is a significant difference in mean Request_Closing_Time among the different complaint types.
Hypotheses:
Null Hypothesis (H0): The average response times are equal for all complaint types,
so there is no significant difference between them.
Alternative Hypothesis (H1): The average response times are not equal across all
complaint types, indicating that at least one group is different.
Explanation (a code sketch follows this list):
• f_oneway(): This function performs the ANOVA test comparing the means of multiple groups (complaint types). It returns two values: the F-statistic (the ratio of between-group variance to within-group variance) and the p-value (which tells whether the difference is statistically significant).
• F-statistic: A larger value indicates a higher likelihood that at least one group
mean is different.
• p-value: If p < 0.05, we reject the null hypothesis and conclude that at least one
complaint type has a significantly different average response time. If p ≥ 0.05,
we fail to reject the null hypothesis, meaning no significant difference exists.
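A sketch of how this test can be run with SciPy, assuming Request_Closing_Time in hours and grouping by complaint type:

```python
from scipy.stats import f_oneway

# One array of closing times per complaint type (missing values dropped).
groups = []
for _, grp in df.groupby("Complaint Type"):
    values = grp["Request_Closing_Time"].dropna().values
    if len(values) > 1:          # each group needs at least two observations
        groups.append(values)

f_stat, p_value = f_oneway(*groups)
print("F-statistic:", f_stat)
print("p-value:", p_value)

if p_value < 0.05:
    print("Reject H0: at least one complaint type has a different mean response time.")
else:
    print("Fail to reject H0: no significant difference in mean response times.")
```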
Test 2: Whether the type of complaint or service requested and location are related
Null Hypothesis (H0): The type of complaint or service requested and the location (Borough) are independent; there is no significant relationship between them.
Alternative Hypothesis (H1): The type of complaint or service requested and the location (Borough) are related; there is a significant relationship between complaint type and location.
A Chi-Square test helps determine if two categorical variables (complaint type and
location) are independent or related.
Explanation (a code sketch follows this list):
• Chi2 Stat: The Chi-Square statistic, which helps measure the association
between two categorical variables.
• p-value: If the p-value < 0.05, we reject the Null Hypothesis and conclude that
there is a significant relationship between the complaint type and location. If the
p-value ≥ 0.05, we fail to reject the Null Hypothesis and conclude that there is
no significant relationship between the two.
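A sketch of the Chi-Square test of independence with SciPy, using a contingency table of complaint type against borough:

```python
from scipy.stats import chi2_contingency
import pandas as pd

# Contingency table: complaint types (rows) by borough (columns).
contingency = pd.crosstab(df["Complaint Type"], df["Borough"])

chi2_stat, p_value, dof, expected = chi2_contingency(contingency)
print("Chi2 Stat:", chi2_stat)
print("p-value:", p_value)

if p_value < 0.05:
    print("Reject H0: complaint type and borough are related.")
else:
    print("Fail to reject H0: no significant relationship found.")
```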
Conclusion
In this milestone of the project, we aimed to explore, clean, and analyze the 311
Customer Service Requests dataset to extract meaningful insights that can be used
for process optimization, resource allocation, and improving service efficiency. The
dataset provided a wealth of information, including the Complaint Type, Borough,
Created Date, Closed Date, and other relevant details that help in understanding the
patterns and trends in customer complaints. The first stage of this project involved the
data understanding phase, where we identified the key variables, such as Complaint
Type, Borough, and Request_Closing_Time. We recognized early on that the dataset
required significant preprocessing to ensure data quality and completeness.
In the data preparation phase, we handled missing values, converted the Created Date
and Closed Date to the proper datetime format, and engineered the new feature,
Request_Closing_Time, which calculated the time it took to resolve each complaint.
We also dropped irrelevant columns and handled any remaining missing values by
imputing the Complaint Type with the most frequent category. This step ensured that
our dataset was clean and ready for deeper analysis, and we could now focus on the
core aspects of the dataset.
Once the data was prepared, we moved into the exploratory data analysis (EDA)
phase, where we sought to uncover patterns, trends, and insights from the data.
Through visualizations such as bar charts, box plots, and histograms, we discovered
key findings:
• ANOVA Test: The One-Way ANOVA test confirmed that average response times
differ significantly across different complaint types (with a p-value of 0.0). This
validated our earlier observation from visualizations, emphasizing that certain
complaint types require more time to resolve, and indicating potential areas for
process improvement.
Bibliography
Jain, S. (2021, October 2). Towards Data Science. Retrieved from Towards Data Science: https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4