Data Analytics Project PDF


Project Name - Used Cars Price Analysis

Objective: Analyse car price variation based on various car brands and identify
the best cars offering good quality at lower prices

About Dataset:
1. The dataset comprises diverse car attributes including model, year, kilometers
driven, fuel type, and ownership history.

2. It provides a comprehensive collection of data points to facilitate accurate
prediction of the selling price of used cars.

3. Our analysis aims to uncover intricate relationships between these features
and the selling price, enhancing understanding within the used car market.

Steps to be Followed:
1. Web scraping to collect data
2. Data cleaning and manipulation
3. Visualization and analysis on various factors
4. Challenges
5. Conclusion

Step 1: Web scraping to collect data


For this project, we will use Python libraries such as:

• Pandas: Data manipulation and analysis library.

• NumPy: Numerical computing library.

• Matplotlib: Data visualization library.

• Seaborn: Statistical data visualization library.

The data is collected by web scraping from reputed used car selling websites like Car Wale.
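
The report does not show its scraping code, so the following is only a minimal sketch of the parsing step, using Python's standard-library html.parser. The HTML snippet, the class names ("listing", "name", "price") and the ListingParser class are all hypothetical; a real site's markup will differ, and fetching the page is left out here.

```python
from html.parser import HTMLParser

# Hypothetical listing markup, used only for illustration.
SAMPLE_HTML = """
<div class="listing"><span class="name">Swift</span><span class="price">350000</span></div>
<div class="listing"><span class="name">City</span><span class="price">725000</span></div>
"""


class ListingParser(HTMLParser):
    """Collects the text of the name and price spans, one dict per listing."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per <div class="listing">
        self._field = None  # span currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "listing":
            self.rows.append({})
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that sits inside a span we are tracking.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ListingParser()
parser.feed(SAMPLE_HTML)
rows = parser.rows
print(rows)
```

The resulting list of dictionaries can be passed straight to pandas.DataFrame for the cleaning steps that follow.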
➢ View the column names (alternatively, we can also use data.columns)

➢ View the shape of the data, number of rows & columns
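
A minimal sketch of these two checks is given here. The small DataFrame stands in for the scraped data; its column names and values are assumptions made for illustration, not the project's actual dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "car_name": ["swift", "city", "ritz", "city"],
    "year": [2014, 2016, 2017, 2019],
    "selling_price": [3.0, 4.5, 6.0, 8.5],
    "present_price": [5.0, 7.0, 9.0, 12.0],
    "kms_driven": [42000, 35000, 18000, 9000],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol"],
    "owner": [0, 0, 1, 0],
})

print(data.columns.tolist())  # column names
print(data.shape)             # (number of rows, number of columns)
```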

Step 2: Data cleaning and manipulation


➢ Checking for missing values
• In this data there are no missing values.
Fig 1 A box plot to visually inspect the missing values
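
One common way to run this check is to count nulls per column, sketched below. The sample DataFrame and its column names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "car_name": ["swift", "city", "ritz"],
    "present_price": [5.0, 7.0, 9.0],
    "fuel_type": ["Petrol", "Diesel", "Petrol"],
})

# Number of missing values in each column.
missing = data.isnull().sum()
print(missing)
```

A total of zero across all columns matches the observation that the data has no missing values.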

➢ Concise summary of the DataFrame


➢ Summary Statistics
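
The concise summary and the summary statistics are usually obtained with DataFrame.info() and DataFrame.describe(). The sketch below assumes that is what was used here; the sample data and column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "car_name": ["swift", "city", "ritz", "city"],
    "year": [2014, 2016, 2017, 2019],
    "selling_price": [3.0, 4.5, 6.0, 8.5],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol"],
})

# Concise summary: column dtypes, non-null counts, memory usage.
data.info()

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
stats = data.describe()
print(stats)
```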

➢ Top 10 Car Names


Fig 2 Top 10 car names
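
A minimal sketch of how the top 10 car names can be found by counting listings per name. The car_name column and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "car_name": ["city", "swift", "city", "ritz", "swift", "city"],
})

# Count listings per car name and keep the ten most frequent.
top10 = data["car_name"].value_counts().head(10)
print(top10)
```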

➢ Checking and displaying outliers using box plot

Fig 3 A box plot to visually inspect the present_price column for outliers

➢ Removing Outliers using IQR


Fig 4 A box plot to visually inspect the present_price column after removing outliers
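
The IQR step can be sketched as follows. The 1.5 × IQR fences are the conventional rule and are an assumption here, since the report does not state the multiplier it used; the sample prices are hypothetical.

```python
import pandas as pd

# Hypothetical present_price values with one obvious outlier.
data = pd.DataFrame({
    "present_price": [4.5, 5.6, 6.9, 7.2, 8.1, 9.4, 92.6],
})

# Interquartile range of the column.
q1 = data["present_price"].quantile(0.25)
q3 = data["present_price"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3 (assumed multiplier).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences.
cleaned = data[data["present_price"].between(lower, upper)]
print(len(data), "rows before,", len(cleaned), "rows after")
```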
Step 3: Visualization and analysis on various factors
• We’ll use libraries like matplotlib, seaborn and plotly for visualization

➢ Visualize distribution of numerical variables

Fig 5 Visualize distribution of numerical variables
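
A minimal sketch of plotting one histogram per numerical column with matplotlib. The report may have used seaborn for this figure; plain matplotlib is used here to keep the example short. The sample data and column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "year": [2014, 2016, 2017, 2019, 2015],
    "selling_price": [3.0, 4.5, 6.0, 8.5, 3.8],
    "present_price": [5.0, 7.0, 9.0, 12.0, 6.1],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol", "CNG"],
})

# Pick the numerical columns and draw one histogram for each.
numeric_cols = data.select_dtypes(include="number").columns
fig, axes = plt.subplots(1, len(numeric_cols), squeeze=False,
                         figsize=(4 * len(numeric_cols), 3))
for ax, col in zip(axes[0], numeric_cols):
    ax.hist(data[col], bins=5)
    ax.set_title(col)
fig.tight_layout()
# Call plt.show() or fig.savefig(...) to view or export the figure.
```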

➢ Visualize categorical variables


Fig 6 Visualize distribution of categorical variables
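
Before plotting categorical variables, it helps to compute the category counts that a bar or count plot would display. The sketch below does only that counting step; the sample data and column names are assumptions.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol", "CNG"],
    "seller_type": ["Dealer", "Dealer", "Individual", "Dealer", "Individual"],
    "selling_price": [3.0, 4.5, 6.0, 8.5, 3.8],
})

# Treat every non-numeric column as categorical and count its categories.
cat_cols = data.select_dtypes(exclude="number").columns
counts = {col: data[col].value_counts() for col in cat_cols}
for col, series in counts.items():
    print(col, series.to_dict())
```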

➢ Re-visualize distribution after removing outliers


Fig 7 Visualize distribution of numerical variables after removing outliers

➢ Plot correlation matrix

Fig 8 Visualize correlation matrix of our data
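
The matrix behind such a plot can be computed with DataFrame.corr() on the numerical columns, as in this minimal sketch. The sample data and column names are hypothetical; the result can then be drawn as a heatmap.

```python
import pandas as pd

# Hypothetical sample standing in for the scraped data.
data = pd.DataFrame({
    "year": [2014, 2016, 2017, 2019],
    "selling_price": [3.0, 4.5, 6.0, 8.5],
    "present_price": [5.0, 7.0, 9.0, 12.0],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Petrol"],
})

# Pearson correlation between every pair of numerical columns.
corr = data.select_dtypes(include="number").corr()
print(corr.round(2))
```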


Step 4: Challenges
1. Data Quality: Ensuring the data collected is accurate and complete.
2. Website Blocking: Websites may block scraping requests.
3. Dynamic Content: Handling JavaScript-loaded content.
4. Data Cleaning: Handling inconsistencies in data formats and missing values.
5. Analysis Complexity: Dealing with multivariate data and deriving meaningful
insights.

Step 5: Conclusion
From the analysis, we can conclude the following:

1. Price Variation: There are significant variations in prices across different
brands. Luxury brands tend to have higher prices.
2. Best value cars: Some brands offer good quality cars at relatively lower
prices.
3. EMI vs Price: A positive correlation between car price and EMI indicates
that higher-priced cars have higher EMIs.

Final Thoughts:
• Insights: The insights from this analysis can help buyers make informed
decisions when purchasing used cars.
• Further Research: Additional factors like mileage, year of manufacture and
car condition could provide deeper insights.

Next Steps:
• Improve Data Collection: Use more advanced scraping techniques or APIs if
available.
• Deploy Dashboard: Create an interactive dashboard using tools like Dash or
Tableau for better visualization and decision-making.
