Understanding Indian Cuisine: A Data-Driven Study
BEGINNER DATA VISUALIZATION PYTHON
Indians love to eat, and the variety of food ensures that each part of India will surprise you with a new dish.
The diversity in soil, climate, rainfall patterns, farming methods, ethnic groups, culture, occupation, etc. of
people across the country makes the cuisine vast and varied. People use a wide variety of
vegetables, dairy products, spices, herbs, etc. to cook their food.
Image 1
The iconic biryani is also cooked in different ways across the nation. The famous Hyderabadi Biryani is
made by marinating the meat and cooking it along with the rice. The cooking is done over a slow fire to
give it aroma and fragrance. The uniqueness of Kolkata biryani lies in using potatoes along with meat.
Kolkata Biryani uses fewer spices but has more flavours. The wide variety of spices and raw materials
available all over the Indian subcontinent makes it easy for each region to cook a dish in its own special
way.
Understanding Indian cuisine can be a complex study. The wide variety of food types and recipes makes it
difficult to group food dishes into a specific class. Indian food history is thousands of years old.
Indians consider a healthy breakfast very important. North Indians tend to prefer Roti and Parathas over Rice.
People in Gujarat might prefer Dhokla, and people in South India tend to prefer Idli and Dosa. Bengali cuisine
has many varied types of fish and sweets. Indians consider evening snacks an important part of the day,
when family members gather to chat over tea. Desserts also hold an important place in Indian cuisine,
similar to the Western concept of desserts. Sweets like Gulab Jamun, Rasgulla, Laddu, and Jalebi are popular
all across India.
To understand Indian cuisine, we need to analyse the geographic distribution of Indian dishes and the
ingredients used in making the food.
Dataset for Indian Cuisine Analysis
The dataset is taken from Kaggle and contains information on Indian dishes: the name of the dish, main
ingredients, diet type, preparation time, cooking time, flavour profile, meal course, state of origin, and
the region of that state. The dataset also has many missing values. Even so, it holds enough information
about Indian dishes to support a study.
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
For this job, we will only need basic plotting tools and libraries. Now, we import the data.
df = pd.read_csv("/kaggle/input/indian-food-101/indian_food.csv")
df.head()
Output:
So, we can see that the data has loaded correctly and the columns are as expected.
Let us have a look at the data types and amount of data points.
df.info()
Output:
So, there are 255 dishes in the data. Quite a large number!
Only the prep_time and cook_time are numbers, which denote the dish preparation time and dish cooking
time respectively. Others are all string fields, which is quite normal.
Let us have a look at the data distribution of the two numeric fields, that is prep_time and cook_time.
df.describe()
Output:
The mean preparation time and the mean cook time are both over 30 minutes. This fits the common experience
that Indian dishes take a long time to cook. Since our childhood days, we (in India) have seen our mothers
take a long time to cook our favourite dishes. The complex preparation and the wide variety of ingredients
make it a long process.
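The averages above depend on how unknown times are stored. The following is a minimal sketch, assuming that a value of -1 in prep_time or cook_time stands for "unknown" (an assumption, not something the output above confirms). The toy rows are made up for illustration and are not taken from the Kaggle file.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; these are NOT values from the dataset.
toy = pd.DataFrame({
    "prep_time": [10, -1, 20, 30],
    "cook_time": [30, 40, -1, 60],
})

# Assumption: -1 is a placeholder for "unknown", so treat it as missing.
cleaned = toy.replace(-1, np.nan)

raw_means = toy.mean()        # the -1 placeholders pull the averages down
clean_means = cleaned.mean()  # NaN values are skipped by default

print(raw_means)
print(clean_means)
```

If the real data does use such placeholders, the cleaned means would be somewhat higher than the ones reported by df.describe().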
px.bar(df, y="state")
Output:
The largest number of entries here is from Gujarat.
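To check a claim like this numerically rather than by reading the chart, one option is to count rows per state. This is a sketch on made-up rows; only the column name state comes from the code above, and the dish names and counts are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the real data comes from indian_food.csv.
toy = pd.DataFrame({
    "name": ["dish_a", "dish_b", "dish_c", "dish_d", "dish_e"],
    "state": ["Gujarat", "Punjab", "Gujarat", "West Bengal", "Gujarat"],
})

# value_counts() sorts in descending order, so the first row is the top state.
state_counts = (
    toy["state"]
    .value_counts()
    .rename_axis("state")
    .reset_index(name="dishes")
)

top_state = state_counts.iloc[0]["state"]
print(state_counts)
print("Most entries:", top_state)
```

A table shaped like state_counts can also be passed to px.bar with x="state" and y="dishes" to get one bar per state.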
Output:
The entire table is longer. I will share the link to the notebook at the end; do have a look there.
Output:
Let us see some more data visuals.
Output:
We can see that the majority of the dishes listed here are from the western region.
Output:
We can see that the majority of the dishes are vegetarian and a few of them are non-vegetarian.
Output:
The majority of items are spicy, followed by sweet items. The “-1” entries are wrong or mislabeled
data. Very few items are bitter or sour.
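One way to keep the “-1” entries out of the flavour counts is to mask them before counting. This is a sketch under two assumptions: that the flavour column is called flavor_profile, and that "-1" is stored as text in that column. The rows are invented for illustration.

```python
import pandas as pd

# Invented rows; "flavor_profile" is an assumed column name.
toy = pd.DataFrame({
    "flavor_profile": ["spicy", "sweet", "-1", "spicy",
                       "bitter", "-1", "sweet", "spicy"],
})

# Replace the "-1" placeholder with a missing value.
flavour = toy["flavor_profile"].mask(toy["flavor_profile"] == "-1")

flavour_counts = flavour.value_counts()  # missing values are dropped by default
unknown = int(flavour.isna().sum())      # how many rows had no usable label

print(flavour_counts)
print("Unknown flavour:", unknown)
```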
As we can see, the majority of the dishes are main course items, followed by desserts.
px.bar(
    df,
    x='region',
    color='diet',
    title='Different Diet per Region',
    labels={'region': 'Region', 'diet': 'Diet'},
    color_discrete_sequence=['#3CB14C', '#35612D']
)
Output:
The majority of the non-veg dishes (as a ratio of total dishes) are in the Northeast region. Northern and
western regions have very few non-veg dishes.
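Because this observation is about a ratio rather than a raw count, a row-normalised cross-tabulation makes it easier to verify than a stacked bar chart. The sketch below uses invented rows, and the diet labels "vegetarian" and "non vegetarian" are assumptions about how the column is encoded.

```python
import pandas as pd

# Invented rows for illustration; labels in "diet" are assumed.
toy = pd.DataFrame({
    "region": ["North", "North", "North", "North",
               "North East", "North East", "West", "West"],
    "diet": ["vegetarian", "vegetarian", "vegetarian", "non vegetarian",
             "non vegetarian", "vegetarian", "vegetarian", "vegetarian"],
})

# normalize="index" turns each row into shares that sum to 1 per region.
share = pd.crosstab(toy["region"], toy["diet"], normalize="index")

# Regions ranked by the share of non-vegetarian dishes.
non_veg_share = share["non vegetarian"].sort_values(ascending=False)
print(non_veg_share)
```

Using shares avoids the trap where a region with many dishes overall looks more non-vegetarian simply because its bar is taller.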
Let us try to see the broader picture of the data, with many distributions including Region, State and
Flavour profile. This will give us a better understanding of the data.
aspect=2)
Output:
Now, let us replace the flavor profile with the meal course.
Output:
hue="diet", data=df)
Output:
We see that all the non-vegetarian items are spicy. The “-1” entries are wrongly labeled data. None of the
sweet or sour dishes are non-vegetarian. The cook times of sweet dishes are in the higher range.
This is understandable, as these dishes have many special requirements and preparation methods.
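The comparison of cook times across flavours can also be summarised in a small table. This is a sketch on invented numbers; cook_time appears in the data, while flavor_profile is an assumed column name.

```python
import pandas as pd

# Invented rows; the numbers are not from the dataset.
toy = pd.DataFrame({
    "flavor_profile": ["sweet", "sweet", "spicy", "spicy", "spicy"],
    "cook_time": [60, 40, 20, 30, 25],
})

# Count, median and maximum cook time for each flavour.
summary = (
    toy.groupby("flavor_profile")["cook_time"]
    .agg(["count", "median", "max"])
)
print(summary)
```

The median is used here instead of the mean so that a few very long recipes do not dominate the comparison.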
Output:
Now, we replace the preparation time data with cooking time data.
Output:
There are different types of food items with different cook times. Some of the inferences we can derive
from all the data exploration and visuals:
3. There is also a wide variety of sweets, indicating that Indians love sweets a lot.
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
wordcloud = WordCloud(width=1000, height=500).generate(" ".join(flat_list))
plt.figure(figsize=(15,8))
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
Output:
WordCloud Observation
1. Sugar stands out here. A large number of Indian dishes, such as Gajar ka Halwa, have sugar as an
ingredient.
3. Milk is also very important in the daily diet and is an important part of Indian cuisine. Paneer, Rasmalai,
Dahi, Phirni, etc. all have milk as an ingredient. Milk is not just a complete meal, but also an important
ingredient in many Indian dishes.
4. We will find a large number of spices in the image, as Indian food is rich in spices. Garam masala is one
such component. Most of the Indian spicy dishes have Garam Masala as an ingredient.
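The word cloud is drawn from a list called flat_list, whose construction is not shown in this article. The sketch below is one plausible way to build such a list, assuming an ingredients column that holds comma-separated text; it is not necessarily how the notebook does it. It also counts the ingredients, which gives numbers to back the observations above. The rows are invented.

```python
from collections import Counter

import pandas as pd

# Invented rows; "ingredients" is an assumed column of comma-separated text.
toy = pd.DataFrame({
    "name": ["dish_a", "dish_b", "dish_c"],
    "ingredients": ["Milk, Sugar, Ghee", "sugar, Carrot, milk", "Rice, Garam masala"],
})

# Split each row on commas, trim whitespace, and lowercase so that
# "Sugar" and "sugar" are counted as the same ingredient.
flat_list = [
    item.strip().lower()
    for row in toy["ingredients"].dropna()
    for item in row.split(",")
    if item.strip()
]

# The two most frequent ingredients with their counts.
top = Counter(flat_list).most_common(2)
print(top)
```

Multi-word ingredients such as "garam masala" stay together in the list, although a word cloud built from the joined string will generally split them into separate words.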
Output:
Some final study can be done, considering the regions and using bar charts to count and visualize data.
Bar charts are a simple and great way to understand a lot of things.
First, we see the distribution of veg and non-veg foods for all Indian States.
plt.ylabel('States')
plt.xlabel('counts')
plt.title("Number of Food Items from Each State - Veg & Non-Veg")
plt.show()
Output:
Some inferences:
West Bengal, Assam, and Punjab have the largest number of non-veg dishes. In Goa, non-veg dishes also make
up a sizeable share of the total dishes.
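These inferences mix two different measures: the number of non-veg dishes and their share of a state's dishes. A small pivot table can show both side by side. This is a sketch with invented rows and assumed diet labels, not the code behind the chart above.

```python
import pandas as pd

# Invented rows; diet labels are assumed.
toy = pd.DataFrame({
    "state": ["West Bengal", "West Bengal", "West Bengal",
              "Goa", "Goa", "Gujarat", "Gujarat"],
    "diet": ["non vegetarian", "non vegetarian", "vegetarian",
             "non vegetarian", "vegetarian", "vegetarian", "vegetarian"],
})

# One row per state, one column per diet type, missing combinations filled with 0.
counts = toy.groupby(["state", "diet"]).size().unstack(fill_value=0)

# Share of non-veg dishes; the row totals are computed before the new column is added.
counts["non_veg_share"] = counts["non vegetarian"] / counts.sum(axis=1)
print(counts)
```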
Output:
And finally, we look at the meal course by region.
Output:
Let us see what we can understand:
Spicy and sweet dishes are available in almost all states of India. The majority of Indian dishes are main
course items. There are also a large number of desserts.
The entire study was an interesting way to understand the Food Map of India. The Indian food map is a wide
and vast area. To understand it in depth, we need more data and information. This limited amount of data
serves the purpose of a basic study and analysis, and gives a brief overview of Indian dishes as a whole.
Even with this limited information, we learned a lot about Indian cuisine.
The code is in the Kaggle Notebook. Do consider upvoting if you like the work.
About me
Prateek Majumder
Thank You.
Image Sources
1. Image 1 – https://www.pexels.com/photo/white-and-brown-cooked-dish-on-white-ceramic-bowls-958545/
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.