Zomato Data Analysis
https://doi.org/10.22214/ijraset.2021.39303
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.429
Volume 9 Issue XII Dec 2021- Available at www.ijraset.com
Abstract: Whenever we want to visit a new place in Delhi-NCR, we often search for the best restaurant, or for the most
cost-effective restaurant of decent quality. To find good restaurants we frequently consult various websites and apps to get
an overall idea of a restaurant's service. The most important criteria are the ratings and reviews of people who have already
visited these restaurants. People look at the ratings, compare restaurants with one another, and choose the one that suits
them best. We restrict our data to Delhi-NCR. The Zomato dataset provides enough information to decide which type of
restaurant is suitable at which place and what kind of food it should serve to maximize profit. The dataset has 9552 rows
and 22 columns. We would like to find the most affordable restaurants in Delhi-NCR. We discuss relationships between
various columns of the dataset, such as between rating and cuisine type, or between locality and cuisine. Since it is real-time
data, we start with data cleaning (removing stray spaces, garbage text, etc.), followed by data exploration (handling None
and null values, dropping duplicates, and other transformations), then randomization of the dataset, and finally analysis.
Our target variable is the "Aggregate Rating" column. We explore the relationship of the other features in the dataset with
respect to ratings. We then visualize the relation of all the other features with our target variable, and hence find the
features most correlated with it.
Keywords: Online food delivery, Marketing mix strategies, Competitive analysis, Pre-processing, Data Cleaning, Data Mining,
Exploratory data analysis, Classification, Pandas, Matplotlib.
I. INTRODUCTION
Digitalization has impacted the whole world, and India is no exception. Many activities, from classrooms to ordering food,
have moved to the internet. Customers use the Internet not only to buy products online, but also to compare costs, features,
quality, and any available sales [1]. This helps them decide whether to buy a product from a specific store. The Internet is
becoming a pervasive platform for searching for, choosing, and buying products. Online food ordering companies offer a range
of options and conveniences that put consumers' favourite food at their fingertips [2].
Recently, online food delivery has become a new trend among food lovers, and Zomato is the biggest name that comes to mind
in the Indian context. Zomato helps restaurants increase their customer base, and the concept of the cloud kitchen, which
offers delivery only and no dine-in facility, has also found its way to India. Many people across the country want to enter
the profitable food delivery business and open restaurants and cloud kitchens in different parts of India.
The objective of this project is to get an idea of the following:
1) What type of food people like in various places.
2) Which restaurants they like most.
3) Which type of restaurant is profitable to open in which area.
II. OBJECTIVES
The study has the following objectives:
V. DATASET DESCRIPTION
The Zomato dataset is a real-time dataset that gives information about restaurants, their cuisines, localities, ratings, etc.
The data is taken from the URL: https://drive.google.com/file/d/1FSa_x3COvCoMODa44qXufO9CQb3ydqKw/view
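As a first check on a dataset of this kind, one can load it with Pandas and inspect its shape, column types, and missing values. The sketch below is illustrative only: the inline sample rows and the column names ("Restaurant Name", "Average Cost for two", "Aggregate rating", etc.) are assumptions standing in for the real file, and load_restaurants is a hypothetical helper, not part of the original project.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Zomato CSV; the real file has more rows and columns.
SAMPLE_CSV = """Restaurant Name,City,Locality,Cuisines,Average Cost for two,Aggregate rating,Votes
Cafe A,New Delhi,Connaught Place,"North Indian, Chinese",800,4.1,250
Diner B,Gurgaon,Sector 29,"Fast Food",400,3.6,90
Bistro C,Noida,Sector 18,"Italian, Cafe",1200,,0
"""


def load_restaurants(source):
    """Read the restaurant CSV from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_restaurants(io.StringIO(SAMPLE_CSV))
print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # inferred type of each column
print(df.isna().sum())  # count of missing values per column
```

With the downloaded file, the same helper could be called with the local path of the CSV instead of the in-memory sample.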
VI. METHODS
A. Data Collection
The data obtained from the URL above forms the basis for the results, which are further analysed to find relationships
among the various factors. There are approximately 21000 data points in total, and the calculations are performed on them.
B. Data Pre-Processing
The following pre-processing steps were applied to the dataset:
1) Records with null values in the ratings column were dropped; null values in the other columns were replaced with a numerical value.
2) Stray spaces at various places were removed to clean the dataset.
4) Data Visualization: Data visualization is the technique of converting large data sets and numbers into charts, graphs and other
visuals that represent the data pictorially [12]. This visual representation makes it easier to identify and share real-time
comparisons, trends, and new insights about the information in the data. It helps to keep track of different events or
activities at a glance by presenting insights on one or more pages or screens. Various data science techniques can be used to
identify what is affecting what, why, and what will happen next. As the size of databases increases, more people require data
visualization tools to process their data [13].
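The pre-processing steps described in this section (dropping records without a rating, replacing other nulls with a numerical value, removing stray spaces, and dropping duplicates) can be sketched in Pandas as follows. This is a minimal sketch under stated assumptions: clean_restaurants is a hypothetical helper, the column names are assumed, the placeholder value 0 is an assumption (the paper does not state which value was used), and only numeric columns are filled here.

```python
import pandas as pd


def clean_restaurants(df, rating_col="Aggregate rating", fill_value=0):
    """Return a cleaned copy of df (hypothetical helper, assumed column names)."""
    out = df.copy()
    # Remove stray spaces from column names and from text cells.
    out.columns = out.columns.str.strip()
    for col in out.columns:
        if pd.api.types.is_string_dtype(out[col].dtype):
            out[col] = out[col].str.strip()
    # Drop records whose rating is missing.
    out = out.dropna(subset=[rating_col])
    # Replace remaining nulls with a numeric placeholder (numeric columns only;
    # the fill value of 0 is an assumption).
    num_cols = out.select_dtypes(include="number").columns
    out = out.fillna({col: fill_value for col in num_cols})
    # Remove exact duplicate rows and renumber the index.
    return out.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame({
    " Restaurant Name": ["  Cafe A ", "Diner B", "Diner B", "Bistro C"],
    "Aggregate rating": [4.1, 3.6, 3.6, None],
    "Votes": [250, None, None, 12],
})
clean = clean_restaurants(raw)
print(clean)
```

In this toy input, the row without a rating is dropped, the missing vote counts become 0, and the repeated "Diner B" row is removed, leaving two records.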
VIII. RESULTS
1) The dataset is heavily skewed towards Delhi-NCR restaurants.
2) BBQ, German, Malwani and Cajun cuisines are not available in the NCR region.
3) North Indian (3597), Chinese (2448) and Fast Food (1866) are the most served cuisines in both NCR and non-NCR regions.
4) When we print the data, we see that the NON NCR and NON NCR EX values are the same except in 3 cases. We print only the
top 10 because beyond that the counts are very small, which means the chance of error is high. The first unexpected value, in
column 5, is "Continental", which suggests that in the non-NCR region local food is popular. The second unexpected value, in
column 6, is "Italian", which we believe can be explained by people's liking for pizza and pasta. The third unexpected value,
in column 8, is "Cafe", which shows the growth of coffee culture in other parts of India.
5) When we print the data, we see that most of the cuisines are from other countries. As the data for this segment is very
limited, it might not represent the actual facts. In this data, most of the exotic cuisines from other countries are found in
the NCR region, while the non-NCR count is 0 in 7 out of 10 cases.
6) Restaurants serving 6 cuisines have the highest rating. The rating first increases from 1 to 6 cuisines and then decreases from 6 to 8.
7) The graph shows that once average_cost_for_two reaches Rs. 7000, the average rating is usually above 4.0, which is a
good rating. We can say that the more premium a restaurant is, the higher its rating.
8) The more exotic the cuisine, the higher its rating. In addition, if we print the average cost for each cuisine, we see
that a high average cost is not necessary for a good rating. All costs are in Indian rupees (Rs.).
9) Restaurant chains such as BBQ and AB (Absolute Barbeque) have the highest number of votes.
10) Cities such as Delhi, Gurgaon and Noida have a larger number of restaurants and a higher weighted rating.
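Results of the kind listed above (cuisine counts, rating versus number of cuisines, rating versus cost) can be derived with a few Pandas aggregations. The following is a sketch on made-up rows, not the paper's actual code or data: the column names, the helper functions, and the sample values are all assumptions for illustration. Matplotlib is imported lazily inside the plotting helper so that the tabular steps run without it.

```python
import pandas as pd


def cuisine_counts(df, cuisines_col="Cuisines"):
    """Count how many restaurants serve each cuisine (one row may list several)."""
    cuisines = df[cuisines_col].str.split(",").explode().str.strip()
    return cuisines.value_counts()


def mean_rating_by_cuisine_count(df, cuisines_col="Cuisines",
                                 rating_col="Aggregate rating"):
    """Average rating grouped by the number of cuisines a restaurant serves."""
    n_cuisines = df[cuisines_col].str.split(",").str.len().rename("n_cuisines")
    return df.groupby(n_cuisines)[rating_col].mean()


def plot_mean_rating(series, path="rating_by_cuisine_count.png"):
    """Save a bar chart of the grouped ratings (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, writes to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    series.plot(kind="bar", ax=ax)
    ax.set_xlabel("Number of cuisines served")
    ax.set_ylabel("Mean aggregate rating")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Made-up sample rows with assumed column names.
sample = pd.DataFrame({
    "Cuisines": ["North Indian, Chinese", "Chinese",
                 "North Indian, Chinese, Fast Food", "Fast Food"],
    "Average Cost for two": [800, 400, 1500, 300],
    "Aggregate rating": [4.0, 3.0, 4.5, 3.5],
})

counts = cuisine_counts(sample)
by_count = mean_rating_by_cuisine_count(sample)
# Pearson correlation between cost and the target variable.
cost_rating_corr = sample["Average Cost for two"].corr(sample["Aggregate rating"])
print(counts)
print(by_count)
print(cost_rating_corr)
```

Calling plot_mean_rating(by_count) would write the bar chart to a PNG file; the same pattern applies to plotting average rating against average cost.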
IX. CONCLUSION
This paper analyses various characteristics of existing restaurants in different localities of a city in a particular country
in order to predict restaurant ratings related to particular foods. Ratings are an important factor to take into consideration
before making a dine-in or online ordering decision. Before starting a venture such as a restaurant, this kind of research is
an important part of planning, and this paper has done it at least for people in Delhi-NCR. There has been a lot of research
into the variables that affect profits and competition in the restaurant industry when someone opens a restaurant. To improve
customer satisfaction rates, various dine-scape variables have been analyzed. If data were also collected from other reviewers,
such predictions could be made more accurate.
X. ACKNOWLEDGMENT
I completed this project and research paper under the guidance of my mentor, Prof. Nidhi Sengar, who constantly supported me in
every sphere. I would also like to thank my friends, family and teachers, who helped me in completing this project.
REFERENCES
[1] Das, J. (2018). Consumer Perception Towards “Online Food Ordering and Delivery Services”:
[2] Kapoor, A. P., & Vij, M. (2018). Technology at the dinner table: Ordering food online through mobile apps. Journal of Retailing and Consumer Services,
43(1), 342–351
[3] Atharva Kulkarni, Divya Bhandari, Sachin Bhoite. A study of Restaurants Rating Prediction using Machine
[4] Kuhlman, Dave. "A Python Book: Beginning Python, Advanced Python, and Python Exercises". Section 1.1. Archived from the original (PDF) on 23 June
2012.
[5] Python Software Foundation. Archived from the original on 20 April 2012. Retrieved 24 April 2012., second section "Fans of Python use the phrase
"batteries included" to describe the standard library, which covers everything from asynchronous processing to zip files."
[6] Rossum, Guido Van (20 January 2009). "The History of Python: A Brief Timeline of Python". The History of Python. Archived from the original on 5 June
2020. Retrieved 5 March 2021.
[7] "License – Package overview – pandas 1.0.0 documentation". pandas. 28 January 2020. Retrieved 30 January 2020.
[8] matplotlib.org.
[9] "NumFOCUS Sponsored Projects". NumFOCUS. Retrieved 2021-10-25.
[10] "Installing – Matplotlib 2.0.2 documentation". Retrieved 2017-06-23.
[11] "Matplotlib: Python plotting — Matplotlib 3.2.0 documentation". matplotlib.org. Retrieved 2020-03-14.
[12] Nussbaumer Knaflic, Cole (2 November 2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. ISBN 978-1-119-00225-3.
[13] Gershon, Nahum; Page, Ward (1 August 2001). "What storytelling can do for information visualization". Communications of the ACM. 44 (8): 31–37.
doi:10.1145/381641.381653. S2CID 7666107.