Exploratory Data Analysis Report: Electric Vehicle Dataset

The exploratory data analysis report on an electric vehicle dataset reveals 232,230 entries with various attributes, including vehicle specifications and geographical data. Key findings indicate Tesla's dominance in the market, a significant number of vehicles with an electric range of 0 miles, and notable geographic concentration in Washington State. The report suggests further analysis on pricing discrepancies, electric range distributions, and the need for additional data to enhance insights into EV adoption trends.

Uploaded by

Yash Tiwari
Exploratory Data Analysis Report: Electric Vehicle Dataset - Yash Tiwari


Dataset Overview

The dataset comprises 232,230 rows and 17 columns, offering a detailed snapshot of electric
vehicles. Each entry represents a unique vehicle, identifiable by its Vehicle Identification
Number (VIN), accompanied by a variety of attributes detailing its technical specifications,
geographical context, and administrative information.

Key Attributes

Identifiers & Location:

● VIN: Unique identifier for each vehicle.
● County, City, State, Postal Code: Provide the geographical location of the vehicle.
●​ Legislative District, 2020 Census Tract, Vehicle Location: Offer additional
administrative and precise regional details.

Vehicle Details:

●​ Model Year, Make, Model: Describe the vehicle's manufacturing year, brand, and
specific model.
●​ Electric Vehicle Type: Categorizes the vehicle as either Battery Electric Vehicle (BEV)
or Plug-in Hybrid Electric Vehicle (PHEV).
●​ CAFV Eligibility: Indicates if the vehicle is eligible for Clean Alternative Fuel Vehicle
incentives.
●​ Electric Range: Specifies the distance a vehicle can travel solely on electric power.
Notably, 25% of the records report an electric range of 0 miles, which could signify hybrid
models or potential data recording inconsistencies.
●​ Base MSRP: Represents the manufacturer's suggested retail price. A significant number
of entries are recorded as $0, suggesting missing data or placeholder values.

Additional Information:

● DOL Vehicle ID: An identifier used by the Department of Licensing.
● Electric Utility: Indicates the electric utility provider for the vehicle's location.

Data Quality Insights

Missing Values:
Most location-based fields have very few missing values: County (4), City (4), Postal Code (4),
Vehicle Location (11), Electric Utility (4), and 2020 Census Tract (4). Legislative District has
considerably more missing entries (481). Electric Range and Base MSRP each have 27 missing
values.

Duplicates:

The dataset contains no duplicate rows, so every vehicle record is distinct.
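The checks above can be sketched in pandas roughly as follows. This is a minimal illustration, not the author's code: the small DataFrame is an invented stand-in for the real file, and the column names are assumed to match the attribute names used in this report.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; all values are invented for illustration.
df = pd.DataFrame({
    "VIN": ["A1", "B2", "C3", "D4"],
    "County": ["King", None, "Pierce", "King"],
    "Legislative District": [43.0, np.nan, np.nan, 11.0],
    "Electric Range": [0.0, 215.0, np.nan, 38.0],
})

# Count missing values per column, largest first.
missing_counts = df.isna().sum().sort_values(ascending=False)

# Count fully identical rows; duplicated() flags every repeat after the first.
duplicate_rows = int(df.duplicated().sum())

print(missing_counts)
print("duplicate rows:", duplicate_rows)
```

On the full dataset the same two calls would produce the per-column counts and the duplicate check reported in this section.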

Preliminary Observations:

●​ Tesla Dominance: Tesla vehicles, particularly the Model Y and Model 3, are the most
prevalent in the dataset, suggesting a strong market presence or a focus in the data
collection process.
●​ Electric Range Distribution: The high percentage of vehicles with an electric range of 0
miles warrants further investigation to differentiate between hybrid vehicles and potential
data quality issues.
●​ Geographic Concentration: The majority of vehicles are registered in Washington
State, with a notable concentration in King County and the city of Seattle. This regional
bias should be considered during analysis.

1. Initial Plan for Data Exploration

●​ Top Vehicle Makes & Models: Visualize the frequency distribution of the top 10 vehicle
makes (e.g., TESLA, CHEVROLET) and the top 10 models (e.g., MODEL Y, MODEL 3)
to understand market share and popularity.
●​ Temporal Analysis: Analyze the count of electric vehicles by model year to identify
trends in EV adoption over time and potential growth trajectories.
●​ Electric Range Analysis: Examine the distribution of the electric range to understand
the typical range capabilities and investigate the significant number of zero-range
entries.
●​ Vehicle Range by Electric Vehicle Type: Compare the electric range of Battery Electric
Vehicles (BEVs) and Plug-in Hybrid Electric Vehicles (PHEVs) to assess performance
differences between these categories.
●​ Pricing Insights: Investigate the Base MSRP across different vehicle makes to identify
pricing patterns and the impact of brand on price, while acknowledging and addressing
the presence of $0 values.
●​ Geographic Distribution: Map the distribution of electric vehicles across different
counties and cities within Washington State to identify areas with high EV adoption.
●​ Relationship between Range and Model Year: Explore if there's a correlation between
the vehicle's model year and its electric range, potentially indicating technological
improvements over time.
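
Several items in this plan reduce to simple frequency counts. The sketch below shows one way to compute them before plotting. It assumes the column names "Make", "Model", and "Model Year" and uses invented rows, so the printed numbers are not results from the real dataset.

```python
import pandas as pd

# Toy rows; makes and models are real names but the counts are invented.
df = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "NISSAN", "TESLA", "CHEVROLET", "NISSAN"],
    "Model": ["MODEL Y", "MODEL 3", "LEAF", "MODEL Y", "BOLT EV", "LEAF"],
    "Model Year": [2023, 2022, 2018, 2023, 2020, 2019],
})

# Top 10 makes by number of records.
top_makes = df["Make"].value_counts().head(10)

# Top 10 models, grouped with their make so that identical model names
# from different manufacturers are not merged.
top_models = (
    df.groupby(["Make", "Model"]).size().sort_values(ascending=False).head(10)
)

# Records per model year in chronological order, ready for a line or bar chart.
by_year = df["Model Year"].value_counts().sort_index()

print(top_makes, top_models, by_year, sep="\n\n")
```

Each resulting Series can be passed directly to a plotting call such as `Series.plot(kind="bar")`.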

2. Actions Taken for Data Cleaning and Feature Engineering


Data Cleaning:

●​ Missing Values: Identified and documented the missing values in County, City, Postal
Code (4 each), Electric Range (27), Base MSRP (27), Legislative District (481), Vehicle
Location (11), Electric Utility (4), and 2020 Census Tract (4). While the number is
relatively small for most fields, the Legislative District has a notable amount of missing
data.
●​ Zero Values in Electric Range: Noticed that 25% of the Electric Range entries are 0.0.
This was flagged as a potential indicator of hybrid vehicles or a data entry issue requiring
further investigation.
● Zero and Missing Values in Base MSRP: Base MSRP has 27 missing values, and a
significant number of the remaining entries are recorded as $0. Both suggest missing or
placeholder data that needs to be addressed for accurate pricing analysis.
●​ Duplicates: Confirmed that the dataset contains no duplicate rows, ensuring the
uniqueness of each vehicle record.
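
Zero values and missing values are easy to conflate, so it helps to count them separately. The following is a minimal sketch under stated assumptions: the helper `zero_and_missing` is hypothetical, the data is invented, and treating $0 as missing is one possible cleaning choice rather than the step the author necessarily took.

```python
import numpy as np
import pandas as pd

# Toy values, invented for illustration.
df = pd.DataFrame({
    "Electric Range": [0.0, 215.0, np.nan, 38.0, 0.0],
    "Base MSRP": [0.0, 0.0, np.nan, 36900.0, 69900.0],
})

def zero_and_missing(series: pd.Series) -> dict:
    """Count missing and zero entries separately (hypothetical helper)."""
    is_zero = series.eq(0)  # NaN compares as False, so it is not counted as zero
    return {
        "missing": int(series.isna().sum()),
        "zero": int(is_zero.sum()),
        "zero_share": float(is_zero.mean()),  # share of all rows, missing included
    }

summary = {col: zero_and_missing(df[col]) for col in ["Electric Range", "Base MSRP"]}

# One possible treatment: recode $0 as missing so it cannot distort price averages.
msrp_clean = df["Base MSRP"].replace(0, np.nan)

print(summary)
```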

Feature Engineering:

● Vehicle Type Categorization: Created a new categorical variable by extracting and
grouping the information from the "Electric Vehicle Type" column to clearly distinguish
between BEVs and PHEVs. This will facilitate comparisons between the two types.
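
One way such a variable could be derived is sketched below. It assumes the source column holds strings ending in "(BEV)" or "(PHEV)", as the attribute description suggests. The new column name "EV Category" is a hypothetical choice, not taken from the report.

```python
import pandas as pd

# Example strings in the assumed format of the "Electric Vehicle Type" column.
df = pd.DataFrame({"Electric Vehicle Type": [
    "Battery Electric Vehicle (BEV)",
    "Plug-in Hybrid Electric Vehicle (PHEV)",
    "Battery Electric Vehicle (BEV)",
]})

# Pull the abbreviation out of the parentheses and store it as a categorical.
# Rows that match neither pattern become NaN rather than being guessed.
df["EV Category"] = (
    df["Electric Vehicle Type"]
    .str.extract(r"\((BEV|PHEV)\)", expand=False)
    .astype("category")
)

print(df["EV Category"].value_counts())
```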

3. Key Findings and Insights

●​ Make and Model Trends:

Tesla Dominance: Tesla is the leading manufacturer in the dataset with 13,451
vehicles.

Top Tesla Models: Within the Tesla brand, the Model Y (6,392 records) and
Model 3 (4,875 records) are the most frequently recorded models.

Other Popular Brands: Following Tesla, other significant makes include
CHEVROLET, NISSAN, BMW, KIA, FORD, TOYOTA, HYUNDAI, JEEP, and
RIVIAN, indicating that EV adoption is spread across a diverse set of manufacturers.
●​ Temporal Trends:

The dataset shows a higher number of records for more recent model years, with
2023 having the highest count (8,099 records), followed by 2024 and 2022. This
trend likely reflects the increasing adoption of electric vehicles and/or the focus of
the data collection efforts.

●​ Electric Range Distribution:

A substantial portion of the vehicles (25%) report an electric range of 0.0 miles.
This could imply a significant presence of plug-in hybrid vehicles in the dataset,
or it might indicate data entry errors where the range was not recorded for some
battery electric vehicles. Further investigation is needed to clarify this.
●​ Performance by Vehicle Type:

Preliminary analysis suggests that Battery Electric Vehicles (BEVs) generally
have a higher electric range compared to Plug-in Hybrid Electric Vehicles
(PHEVs), aligning with the expectation that BEVs rely solely on electric power for
propulsion.

●​ Pricing Discrepancies:

The analysis of Base MSRP by vehicle make revealed a significant number of
entries with a value of $0. This could represent missing data or placeholder
values. Accurate pricing insights will require addressing these discrepancies,
potentially through imputation or by excluding these records from price-related
analyses.

●​ Geographic Concentration:

The majority of the electric vehicles in the dataset are located in Washington
State, with a notable concentration in King County and the city of Seattle. This
indicates a strong adoption of EVs in this region, possibly due to state incentives,
infrastructure, or other regional factors.
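
The open question about zero-range records can be narrowed by splitting them by vehicle type. The sketch below shows the mechanics only. The rows are invented and say nothing about how the real zeros are distributed, and the "EV Category" column is the hypothetical BEV/PHEV label described under Feature Engineering.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "EV Category": ["BEV", "BEV", "BEV", "PHEV", "PHEV"],
    "Electric Range": [0, 0, 215, 38, 21],
})

# Flag zero-range records, then compute their share within each vehicle type.
df["Zero Range"] = df["Electric Range"].eq(0)
zero_share_by_type = df.groupby("EV Category")["Zero Range"].mean()

# Compare typical range by type using only records with a recorded, non-zero range.
median_range = (
    df.loc[~df["Zero Range"]].groupby("EV Category")["Electric Range"].median()
)

print(zero_share_by_type)
print(median_range)
```

If the zeros turned out to sit mostly in one vehicle type, that would point toward either a type-specific recording practice or a genuine property of that type, which is the distinction this finding leaves open.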
4. Formulating Hypotheses

Based on the initial exploration of the data, we can formulate the following testable hypotheses:

● Hypothesis 1: Battery Electric Vehicles (BEVs) have a significantly higher
electric range than Plug-in Hybrid Electric Vehicles (PHEVs).
○​ Rationale: Initial observations of the data suggest a difference in electric range
between these two vehicle types, with BEVs expected to have a greater range
due to their full reliance on electric power.
●​ Hypothesis 2: There is a positive correlation between the model year of an electric
vehicle and its electric range. Newer model years tend to have a higher electric range
compared to older model years.
○​ Rationale: Technological advancements in battery technology and vehicle design
over time likely contribute to increased electric range in newer EV models.
●​ Hypothesis 3: The average Base MSRP varies significantly across different vehicle
makes. Specifically, premium brands like Tesla and BMW will have a higher average
MSRP compared to more mainstream brands such as Chevrolet and Ford.
○​ Rationale: Brand perception, technological features, and market positioning
often lead to significant price differences between vehicle manufacturers.
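
Hypotheses 1 and 2 could be tested along the following lines. This is a sketch under assumptions, not the report's method: it uses a one-sided Mann-Whitney U test and a Spearman rank correlation because neither requires normally distributed ranges, it drops zero-range records first, and the data is invented.

```python
import pandas as pd
from scipy import stats

# Invented sample; "EV Category" is a hypothetical BEV/PHEV label column.
df = pd.DataFrame({
    "EV Category": ["BEV"] * 6 + ["PHEV"] * 6,
    "Model Year": [2014, 2016, 2018, 2019, 2020, 2021] * 2,
    "Electric Range": [84, 107, 151, 238, 259, 291, 19, 25, 26, 32, 38, 42],
})

# Assumption: zero-range records are excluded until their meaning is understood.
valid = df[df["Electric Range"] > 0]

# Hypothesis 1: BEV range tends to be greater than PHEV range (one-sided test).
bev = valid.loc[valid["EV Category"] == "BEV", "Electric Range"]
phev = valid.loc[valid["EV Category"] == "PHEV", "Electric Range"]
u_stat, p_h1 = stats.mannwhitneyu(bev, phev, alternative="greater")

# Hypothesis 2: monotonic association between model year and electric range.
rho, p_h2 = stats.spearmanr(valid["Model Year"], valid["Electric Range"])

print(f"H1: U={u_stat:.1f}, p={p_h1:.4g}")
print(f"H2: rho={rho:.3f}, p={p_h2:.4g}")
```

Because BEVs and PHEVs differ so much in range, it may be more informative to compute the Hypothesis 2 correlation within each vehicle type separately.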

5. Conducting a Formal Significance Test

Hypothesis Testing for Base MSRP Variation Across Vehicle Makes

● Null Hypothesis (H0): The mean Base MSRP is the same across all vehicle makes.
● Alternative Hypothesis (H1): The mean Base MSRP of at least one vehicle make differs
from that of the others.
●​ Test Used: One-way ANOVA (Analysis of Variance)
●​ Reasoning: ANOVA is the appropriate statistical test to compare the means of a
continuous variable (Base MSRP) across multiple independent groups (vehicle makes).
It assesses whether the observed differences in means are likely due to chance or a real
effect of the vehicle make.

Results:

●​ F-statistic: 11.8574
●​ P-value: 3.5405578232248146e-73

Interpretation:

The calculated p-value (approximately 3.54e-73) is far below the conventional significance
level of 0.05, which is very strong statistical evidence against the null hypothesis.
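
For reference, a one-way ANOVA of this kind can be run with `scipy.stats.f_oneway`. The sketch below is illustrative only. The prices are invented, so it does not reproduce the F-statistic above, and it drops $0 entries before testing, which is an assumption since the report does not state how those entries were handled.

```python
import pandas as pd
from scipy import stats

# Invented prices; the $0 entries mimic the placeholder values seen in the data.
df = pd.DataFrame({
    "Make": ["TESLA"] * 4 + ["BMW"] * 4 + ["CHEVROLET"] * 4,
    "Base MSRP": [69900, 0, 110950, 59900,
                  44100, 52650, 0, 54950,
                  33950, 0, 34995, 32995],
})

# Assumption: $0 is a placeholder, so those rows are excluded from the test.
priced = df[df["Base MSRP"] > 0]

# One array of prices per make; makes with fewer than two prices are skipped
# because they contribute no within-group variance.
groups = [
    g["Base MSRP"].to_numpy()
    for _, g in priced.groupby("Make")
    if len(g) >= 2
]

f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.4f}, p = {p_value:.4g}")
```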

Conclusion:

Based on the results of the one-way ANOVA test, we reject the null hypothesis. We conclude
that there is a statistically significant difference in the Base MSRP across different vehicle
makes in this dataset. This supports the alternative hypothesis that the average Base MSRP
varies significantly depending on the vehicle manufacturer. Further post-hoc analysis could be
conducted to identify which specific vehicle makes have significantly different average MSRPs
from each other.
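
One option for that post-hoc step is Tukey's HSD test, available as `scipy.stats.tukey_hsd` in SciPy 1.8 and later. The following is a sketch with invented prices and a hypothetical `msrp_by_make` dictionary. Other post-hoc procedures would be equally valid choices.

```python
import numpy as np
from scipy import stats

# Hypothetical non-zero MSRP samples per make (invented values).
msrp_by_make = {
    "TESLA": np.array([69900, 110950, 59900, 79500]),
    "BMW": np.array([44100, 52650, 54950, 51500]),
    "CHEVROLET": np.array([33950, 34995, 32995, 33500]),
}
makes = list(msrp_by_make)

# Pairwise comparison of group means with family-wise error control.
result = stats.tukey_hsd(*msrp_by_make.values())

# Collect each unordered pair once: (make A, make B, mean difference A - B, p-value).
pairs = []
for i in range(len(makes)):
    for j in range(i + 1, len(makes)):
        pairs.append(
            (makes[i], makes[j],
             float(result.statistic[i, j]), float(result.pvalue[i, j]))
        )

for make_a, make_b, diff, p in pairs:
    print(f"{make_a} vs {make_b}: mean difference = {diff:.1f}, p = {p:.4g}")
```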

6. Suggestions for Next Steps in Analyzing This Data

To further analyze this electric vehicle dataset and gain deeper insights, consider the following
steps:

●​ Address Missing MSRP Values: Investigate the reasons behind the missing Base
MSRP values. If possible, explore imputation techniques based on vehicle make, model,
and year, or consider using external data sources to fill these gaps.
●​ Investigate Zero Electric Range: Further analyze the vehicles with an electric range of
0.0 miles. Determine if they are primarily plug-in hybrid vehicles or if there are potential
data errors. If they are PHEVs, consider creating a separate category for more granular
analysis.
●​ Geographic Analysis: Conduct a more in-depth geographic analysis to understand the
factors driving EV adoption in specific regions. This could involve looking at correlations
with population density, income levels, charging infrastructure, and state incentives.
●​ Time Series Analysis: If more historical data is available, perform a time series analysis
to understand the growth trends of different EV makes and models over time.
●​ Correlation Analysis: Explore the correlations between different attributes, such as the
relationship between model year and electric range, or between Base MSRP and electric
range.
●​ Regression Analysis: Build a regression model to predict the electric range or Base
MSRP based on other vehicle attributes like make, model, and vehicle type.
●​ CAFV Eligibility Analysis: Investigate the characteristics of vehicles that are eligible for
CAFV incentives versus those that are not.
●​ Electric Utility Analysis: Explore the distribution of electric vehicles across different
utility providers and see if there are any patterns or correlations.
●​ Hypothesis Testing: Conduct formal significance tests for the remaining formulated
hypotheses (comparison of electric range between BEVs and PHEVs, and the
correlation between model year and electric range).
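
As an illustration of the imputation idea in the first suggestion, the sketch below fills missing or $0 prices with the median of vehicles sharing the same make, model, and model year. It is a minimal example with invented rows. The output column names are hypothetical, and groups with no known price are deliberately left missing rather than guessed.

```python
import numpy as np
import pandas as pd

# Invented rows for illustration.
df = pd.DataFrame({
    "Make": ["TESLA", "TESLA", "TESLA", "BMW", "BMW"],
    "Model": ["MODEL S", "MODEL S", "MODEL S", "I3", "I3"],
    "Model Year": [2014, 2014, 2014, 2017, 2017],
    "Base MSRP": [69900.0, 0.0, 69900.0, np.nan, 0.0],
})

# Assumption: $0 is a placeholder and is treated the same as a missing value.
msrp = df["Base MSRP"].replace(0, np.nan)

# Median known price within each (make, model, year) group, aligned to the rows.
group_median = msrp.groupby(
    [df["Make"], df["Model"], df["Model Year"]]
).transform("median")

# Fill gaps where a group median exists and keep a flag so that imputed
# prices can be separated from recorded ones in later analysis.
df["Base MSRP Imputed"] = msrp.fillna(group_median)
df["MSRP Was Imputed"] = msrp.isna() & df["Base MSRP Imputed"].notna()

print(df)
```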

7. Data Quality Summary and Request for Additional Data

The dataset appears to be of reasonably good quality, with a large number of records and
minimal missing values in most key fields. The absence of duplicate entries is also a positive
aspect. However, the significant number of zero values in the Electric Range and Base MSRP
fields raises some concerns and requires further investigation. These zero values could
represent different scenarios (e.g., hybrid vehicles, missing data) that need to be clearly
understood to avoid skewing the analysis.

To enhance the analysis and gain a more comprehensive understanding, the following
additional data points would be beneficial:

● Detailed Specifications of Hybrid Vehicles: If the 0 electric range indicates hybrid
vehicles, having a specific attribute that distinguishes between different types of hybrids
(e.g., plug-in hybrid with limited electric range vs. traditional hybrids) would be valuable.
●​ Reason Codes for Missing MSRP: Understanding why the MSRP is missing for certain
vehicles (e.g., not applicable, data entry error) would help in deciding how to handle
these missing values.
●​ Information on Charging Infrastructure: Data on the availability and density of
charging stations in different geographic areas could provide valuable context for
understanding EV adoption patterns.
●​ Vehicle Usage Data: Information on how these vehicles are actually used (e.g., average
miles driven, charging frequency) could provide deeper insights into their real-world
performance and impact.
●​ Economic Indicators: Incorporating economic data at the regional level (e.g., average
income, cost of electricity) could help explain variations in EV adoption and preferences.
