
Zomato Data Analysis with Python

Zomato data analysis

Uploaded by

bkdanusri27

Zomato data analysis using Python

August 24, 2024
Name:[Link]
Project Overview: This project uses Python to extract insights from data on Zomato, a popular restaurant
platform. Pandas is used to load and clean the Zomato data in a structured format, and Matplotlib and
Seaborn are used to visualize it. Exploring the data reveals trends such as which restaurant types are
most common, how ratings are distributed, and how online ordering relates to ratings. These findings can
benefit both restaurants and diners.

Objectives:

Collect and preprocess Zomato data.
Perform exploratory data analysis (EDA) to identify trends and patterns.
Visualize data using Matplotlib or Seaborn to uncover insights.

Skills Demonstrated:

Data wrangling and preprocessing using Pandas.
Exploratory data analysis (EDA).

Python and the following libraries are used to analyze the Zomato data.
NumPy:
NumPy arrays make numerical computations fast and handle large calculations efficiently.
Matplotlib:
It offers a wide range of features for creating high-quality plots, charts, histograms, scatter plots, and
more.
Pandas:
It loads tabular data into DataFrames (two-dimensional labeled tables) and provides functions for
cleaning, grouping, and summarizing that data.
Seaborn:
It offers a high-level interface for creating visually appealing and informative statistical graphics.

The analysis sets out to answer the following questions:

Do more restaurants accept online orders than not?
Which types of restaurants are the most favored by the general public?
What price range do couples prefer for dinner at restaurants?

The following steps are carried out before starting the analysis.
Step 1: Import necessary Python libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


Step 2: Create the data frame.

Download the file containing the data using the link.

dataframe = pd.read_csv("Zomato data .csv")
print(dataframe.head())

output:

   name                   online_order  book_table  rate   votes  approx_cost(for two people)  listed_in(type)
0  Jalsa                  Yes           Yes         4.1/5  775    800                          Buffet
1  Spice Elephant         Yes           No          4.1/5  787    800                          Buffet
2  San Churro Cafe        Yes           No          3.8/5  918    800                          Buffet
3  Addhuri Udupi Bhojana  No            No          3.7/5  88     300                          Buffet
4  Grand Village          No            No          3.8/5  166    600                          Buffet

The rate column stores ratings as strings such as "4.1/5". Convert it to float by dropping the "/5" part.

def handleRate(value):
    # Keep the text before "/" (for example "4.1/5" -> "4.1") and convert it to float
    value = str(value).split('/')
    value = value[0]
    return float(value)

dataframe['rate'] = dataframe['rate'].apply(handleRate)
print(dataframe.head())

output:

   name                   online_order  book_table  rate  votes  approx_cost(for two people)  listed_in(type)
0  Jalsa                  Yes           Yes         4.1   775    800                          Buffet
1  Spice Elephant         Yes           No          4.1   787    800                          Buffet
2  San Churro Cafe        Yes           No          3.8   918    800                          Buffet
3  Addhuri Udupi Bhojana  No            No          3.7   88     300                          Buffet
4  Grand Village          No            No          3.8   166    600                          Buffet
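The same string-to-float conversion can also be written without a row-by-row function. The following is a minimal sketch, assuming every rating has the "x/5" shape seen in the head() output; the small `sample` frame is built by hand from three of those rows, and `errors="coerce"` is an added safeguard that is not part of the original code.

```python
import pandas as pd

# Three rows copied from the head() output; the real file has 148 rows.
sample = pd.DataFrame({
    "name": ["Jalsa", "Spice Elephant", "San Churro Cafe"],
    "rate": ["4.1/5", "4.1/5", "3.8/5"],
})

# Keep the text before "/" and convert it to a number.
# errors="coerce" turns anything unparseable into NaN instead of raising.
sample["rate"] = pd.to_numeric(
    sample["rate"].astype(str).str.split("/").str[0].str.strip(),
    errors="coerce",
)

print(sample["rate"].tolist())
```

Coercing to NaN is a design choice: it keeps the load from failing on an unexpected entry, at the cost of needing a null check afterwards.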

dataframe.info()

output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 148 entries, 0 to 147
Data columns (total 7 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   name                         148 non-null    object
 1   online_order                 148 non-null    object
 2   book_table                   148 non-null    object
 3   rate                         148 non-null    float64
 4   votes                        148 non-null    int64
 5   approx_cost(for two people)  148 non-null    int64
 6   listed_in(type)              148 non-null    object
dtypes: float64(1), int64(2), object(4)
memory usage: 8.2+ KB

We now check the data frame for null values, that is, for missing values or empty cells in any column.
This reveals data gaps that would have to be handled before analysis.
The info() output shows 148 non-null entries in each of the 7 columns, so the data frame contains no
null values.
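A null check can also be made explicit with `isnull()`. The sketch below uses a small hypothetical frame (`demo`) with one missing rating inserted on purpose, so the result is visibly non-zero; on the Zomato data every count would be 0.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one deliberately missing rating
demo = pd.DataFrame({
    "name": ["Jalsa", "Spice Elephant", "Grand Village"],
    "rate": [4.1, np.nan, 3.8],
    "votes": [775, 787, 166],
})

# Number of missing values in each column
null_counts = demo.isnull().sum()
print(null_counts)

# Single yes/no answer for the whole frame
has_nulls = bool(demo.isnull().values.any())
print(has_nulls)
```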

Let's explore the listed_in(type) column.

sns.countplot(x=dataframe['listed_in(type)'])
plt.xlabel("Type of restaurant")

output:

Conclusion: The majority of the restaurants fall into the dining category.
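The counts behind a count plot can be read directly with `value_counts()`. This is a minimal sketch on hypothetical rows (the `demo` frame and its values are made up for illustration and are not taken from the Zomato file).

```python
import pandas as pd

# Hypothetical rows for illustration only
demo = pd.DataFrame({
    "listed_in(type)": ["Dining", "Dining", "Cafes", "Buffet", "Dining"],
})

# Absolute number of restaurants per type, most frequent first
type_counts = demo["listed_in(type)"].value_counts()

# The same counts as a share of all rows
type_share = demo["listed_in(type)"].value_counts(normalize=True)

print(type_counts)
print(type_share.round(2))
```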

Next, total the votes for each restaurant type and plot them.

grouped_data = dataframe.groupby('listed_in(type)')['votes'].sum()
result = pd.DataFrame({'votes': grouped_data})
plt.plot(result, c="green", marker="o")
plt.xlabel("Type of restaurant", c="red", size=20)
plt.ylabel("Votes", c="red", size=20)

output:
Conclusion: Dining restaurants are preferred by a larger number of individuals.
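To name the leading type without reading it off a plot, the grouped totals can be sorted and queried. The rows below are hypothetical and only illustrate the technique.

```python
import pandas as pd

# Hypothetical rows for illustration only
demo = pd.DataFrame({
    "listed_in(type)": ["Dining", "Cafes", "Dining", "Buffet"],
    "votes": [100, 40, 60, 25],
})

# Total votes per restaurant type, largest first
votes_by_type = (
    demo.groupby("listed_in(type)")["votes"]
    .sum()
    .sort_values(ascending=False)
)

# Label of the type with the highest total
top_type = votes_by_type.idxmax()

print(votes_by_type)
print(top_type)
```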

Now we find the name of the restaurant that received the most votes.

max_votes = dataframe['votes'].max()
# Select the name of every row whose vote count equals the maximum (handles ties)
restaurant_with_max_votes = dataframe.loc[dataframe['votes'] == max_votes, 'name']
print("Restaurant(s) with the maximum votes:")
print(restaurant_with_max_votes)

output:

Restaurant(s) with the maximum votes:


38 Empire Restaurant
Name: name, dtype: object
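When more than the single top entry is of interest, `nlargest` gives a short ranking. This sketch uses only the five rows shown in the head() output, so its result differs from the full-data answer (Empire Restaurant, row 38).

```python
import pandas as pd

# The five rows shown in the head() output
sample = pd.DataFrame({
    "name": ["Jalsa", "Spice Elephant", "San Churro Cafe",
             "Addhuri Udupi Bhojana", "Grand Village"],
    "votes": [775, 787, 918, 88, 166],
})

# Three most-voted restaurants, highest first
top3 = sample.nlargest(3, "votes")[["name", "votes"]]
print(top3)
```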

Let’s explore the online_order column.

sns.countplot(x=dataframe['online_order'])

output:
Conclusion: This suggests that a majority of the restaurants do not accept online orders.
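The share of restaurants accepting online orders can be expressed as one number by mapping Yes/No to booleans. The sketch below uses only the five rows from the head() output, which are not representative of the full 148 rows, so the value it prints does not contradict the conclusion above.

```python
import pandas as pd

# online_order values of the five rows shown in the head() output
sample = pd.DataFrame({
    "online_order": ["Yes", "Yes", "Yes", "No", "No"],
})

# Map the text flags to booleans; the mean of booleans is the share of True
accepts_online = sample["online_order"].map({"Yes": True, "No": False})
share_online = float(accepts_online.mean())

print(f"Share accepting online orders in this sample: {share_online:.0%}")
```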

Let’s explore the rate column.

plt.hist(dataframe['rate'], bins=5)
plt.title("Ratings Distribution")
plt.show()

output:
Conclusion: The majority of restaurants received ratings ranging from 3.5 to 4.
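A histogram's bars can be reproduced as a table with `pd.cut`. The bin edges and labels below are assumptions chosen for readability (the histogram above uses five equal-width bins computed from the data), and the ratings are the five values from the head() output.

```python
import pandas as pd

# Ratings of the five rows shown in the head() output
rates = pd.Series([4.1, 4.1, 3.8, 3.7, 3.8], name="rate")

# Assumed half-point bin edges; each bin includes its right edge
edges = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
labels = ["2.5-3.0", "3.0-3.5", "3.5-4.0", "4.0-4.5", "4.5-5.0"]

binned = pd.cut(rates, bins=edges, labels=labels)

# Number of ratings falling in each bin, in bin order
rating_counts = binned.value_counts().sort_index()
print(rating_counts)
```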

Let’s explore the approx_cost(for two people) column.


Conclusion: The majority of couples prefer restaurants with an approximate cost of 300 rupees.
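One way to arrive at a most-common cost is to count how often each value of approx_cost(for two people) occurs. This is a sketch on hypothetical values, not the original analysis code, and the numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical cost values for illustration only
costs = pd.Series(
    [300, 300, 800, 600, 300, 800],
    name="approx_cost(for two people)",
)

# How many restaurants fall at each cost, most frequent first
cost_counts = costs.value_counts()

# The cost value that occurs most often
most_common_cost = cost_counts.idxmax()

print(cost_counts)
print(most_common_cost)
```

On the real data the same counts could be drawn with a Seaborn count plot of the column.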

Now we examine whether restaurants that accept online orders receive higher ratings than those that do not.

plt.figure(figsize=(6, 6))
sns.boxplot(x='online_order', y='rate', data=dataframe)

output:

CONCLUSION: Restaurants that do not accept online orders received lower ratings than restaurants
that accept online orders.
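The comparison of ratings by ordering mode can be backed with summary statistics per group. The sketch uses the five rows from the head() output after rating conversion; with so few rows it only illustrates the technique.

```python
import pandas as pd

# The five rows shown in the head() output, after rating conversion
sample = pd.DataFrame({
    "online_order": ["Yes", "Yes", "Yes", "No", "No"],
    "rate": [4.1, 4.1, 3.8, 3.7, 3.8],
})

# Median, mean, and row count of ratings for each ordering mode
summary = sample.groupby("online_order")["rate"].agg(["median", "mean", "count"])
print(summary.round(2))
```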

pivot_table = dataframe.pivot_table(index='listed_in(type)', columns='online_order', aggfunc='size', fill_value=0)
sns.heatmap(pivot_table, annot=True, cmap="YlGnBu", fmt='d')
plt.title("Heatmap")
plt.xlabel("Online Order")
plt.ylabel("Listed In (Type)")
plt.show()

CONCLUSION: Dining restaurants primarily accept offline orders, whereas cafes primarily
receive online orders. This suggests that clients prefer to place orders in person at restaurants,
but prefer online ordering at cafes.
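A type-by-ordering-mode count table can also be built with `pd.crosstab`, and normalizing by row makes types of different sizes comparable. The rows below are hypothetical and chosen only to show the shape of the result.

```python
import pandas as pd

# Hypothetical rows for illustration only
demo = pd.DataFrame({
    "listed_in(type)": ["Dining", "Dining", "Dining", "Cafes", "Cafes", "Buffet"],
    "online_order": ["No", "No", "Yes", "Yes", "Yes", "No"],
})

# Count of restaurants for each (type, online_order) pair
counts = pd.crosstab(demo["listed_in(type)"], demo["online_order"])

# Share of each ordering mode within a type (each row sums to 1)
row_share = pd.crosstab(
    demo["listed_in(type)"], demo["online_order"], normalize="index"
)

print(counts)
print(row_share.round(2))
```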
