Foodhub Analysis

The dataset consists of 1,898 records detailing food orders, including variables such as order ID, customer ID, restaurant name, cuisine type, cost, day of the week, rating, food preparation time, and delivery time. Initial analysis reveals that most orders are placed on weekends, with American cuisine being the most popular, and a significant portion of orders remain unrated. The methodology includes univariate and multivariate analyses to explore relationships between variables, with findings indicating weak correlations among numerical variables and varying delivery times based on the day of the week.

Uploaded by

eng.yosrahasan

Methodology and Data

Data Dictionary

The dataset contains 1,898 records with the following columns:

- Order ID: Unique ID of the order
- Customer ID: ID of the customer who ordered the food
- Restaurant name: Name of the restaurant
- Cuisine type: Cuisine ordered by the customer
- Cost: Cost of the order (in dollars)
- Day of the week: Indicates whether the order was placed on a weekday or a weekend
- Rating: Rating given by the customer, out of 5
- Food preparation time: Time (in minutes) taken by the restaurant to prepare the food
- Delivery time: Time (in minutes) taken by the delivery person to deliver the food package

Classification of the data:

- Categorical variables are restaurant name, cuisine type, rating and day of the week.
- Numerical variables are cost of the order, preparation time and delivery time.

Initial Statistical Observations:

- The data frame has 9 columns, as listed in the Data Dictionary. Each row corresponds to
one order placed by a customer.
- Order ID and Customer ID serve only as identifiers.
- The cost of an order ranges from 4.47 to 35.41 dollars, with an average of ~16.49 dollars.
- Food preparation times range from 20 to 35 minutes, with an average of ~27.4 minutes
and a standard deviation of 4.6 minutes.
- Delivery times range from 15 to 33 minutes, with an average of ~24.2 minutes and a
standard deviation of 5.0 minutes.
- Based on the rating frequencies, 736 orders are not rated, which is 38.78% of total
orders. Among the rated orders, a rating of 5 is the most frequent, accounting for 30.98%
of total orders.
- The correlation matrix shows no significant correlation between any pair of numerical
variables. In other words, delivery time, order cost, and preparation time show no strong
linear relationship with one another. A weak correlation alone does not prove that the
variables are fully independent.
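
The rating shares quoted above can be reproduced with a simple frequency count. The following is a minimal sketch, not the SAS Viya or Excel workflow used in the project; the sample list and the label "Not given" for unrated orders are assumptions made for illustration. Only the final line uses figures from the report (736 unrated orders out of 1,898).

```python
from collections import Counter


def rating_shares(ratings):
    """Return each rating label's share of all orders, in percent."""
    counts = Counter(ratings)
    total = len(ratings)
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Hypothetical mini-sample; "Not given" is an assumed label for unrated orders.
sample = ["5", "Not given", "4", "5", "Not given", "3", "5", "4"]
shares = rating_shares(sample)

# Figure reported in the text: 736 unrated orders out of 1,898.
unrated_share = round(100 * 736 / 1898, 2)
print(shares)
print(unrated_share)
```

Expressing each share as a percentage of all orders, rather than of rated orders only, matches how the report states both the unrated share and the 5-star share.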

Methodology:

The analysis is divided into two sections: univariate analysis and multivariate analysis.
Univariate analysis explores each variable independently, providing observations on its
distribution and summary statistics. Multivariate analysis explores the relationships between
variables to determine whether they depend on one another. Bar charts are used for categorical
variables, while histograms and box plots are used for numerical variables. Scatter plots and
word clouds are also used to support the analysis. The tools used for this project are SAS Viya
and Excel.

Univariate Analysis

Step 1: Identify and check the number of unique values per variable (Uniqueness)

Observations:

- Total unique order IDs = 1898, as expected
- Total unique customer IDs = 1200, meaning there are repeat customers
- Total unique restaurants = 178
- Total unique cuisine types = 14
- Total unique days of the week = 2 (weekday and weekend)
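
A uniqueness check of this kind amounts to counting distinct values per column. The sketch below assumes the data is available as a list of dictionaries; the column names and the three sample rows are hypothetical and only illustrate the idea.

```python
def unique_counts(records):
    """Count distinct values in each column of a list of dict records."""
    columns = records[0].keys()
    return {col: len({row[col] for row in records}) for col in columns}


# Hypothetical rows; column names are assumptions, not the dataset's actual headers.
orders = [
    {"order_id": 1, "customer_id": 10, "cuisine_type": "American", "day": "Weekend"},
    {"order_id": 2, "customer_id": 10, "cuisine_type": "Japanese", "day": "Weekday"},
    {"order_id": 3, "customer_id": 11, "cuisine_type": "American", "day": "Weekend"},
]

counts = unique_counts(orders)
print(counts)
```

When the number of unique customer IDs is lower than the number of orders, as in this sample, at least one customer has ordered more than once.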
Step 2: Univariate analysis of Cuisine type

Observations:

There are 14 unique cuisine types, and their distribution reveals uneven popularity. American
cuisine is the most common type, followed by Japanese and Italian, while Vietnamese
cuisine is the least popular. Beyond Chinese food, the order counts for the remaining
cuisines drop significantly.
Step 3: Univariate analysis of Restaurants

Observations:

The dataset contains 178 distinct restaurants. Shake Shack has the most orders, with over
200, followed by The Meatball Shop and Blue Ribbon Sushi. The top ten restaurants (~6% of
all restaurants) account for approximately 47% of the total orders, 887 orders to be exact.
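
The concentration of orders among the top restaurants can be measured by ranking restaurants by order count and summing the top N. This is a rough sketch on a hypothetical order list; only the last two lines use figures from the report (887 of 1,898 orders, 10 of 178 restaurants).

```python
from collections import Counter


def top_n_share(restaurant_names, n):
    """Return the n most frequent restaurants and their combined share of orders (%)."""
    counts = Counter(restaurant_names)
    top = counts.most_common(n)
    top_orders = sum(count for _, count in top)
    return top, round(100 * top_orders / len(restaurant_names), 1)


# Hypothetical order list: one restaurant name per order.
names = (["Shake Shack"] * 5 + ["The Meatball Shop"] * 3
         + ["Blue Ribbon Sushi"] * 2 + ["Parm", "TAO"])
top, share = top_n_share(names, 2)

# Figures reported in the text.
order_share = round(100 * 887 / 1898, 1)      # share of orders from the top ten
restaurant_share = round(100 * 10 / 178, 1)   # top ten as a share of all restaurants
print(top, share, order_share, restaurant_share)
```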
Step 4: Univariate analysis of Ratings

Observations:

About 39% of all orders were not given a rating. Among the rated orders, a slight majority
received a 5-star rating, which corresponds to 31% of all orders.
Step 5: Univariate analysis of day of the week
Observations:

The variable takes two distinct values: Weekday and Weekend. The bar chart shows that
more than twice as many orders are placed on weekends as on weekdays; weekend orders
account for 71% of the total. American cuisine is the top cuisine on weekends, with 415
orders, and also on weekdays, with 169 orders.

Step 6: Univariate analysis of Cost of the order


Observations:

- The box plot shows a median cost of around 15 dollars, meaning that half of the orders
cost less than this and half cost more.
- The minimum order cost is approximately 5 dollars, and the maximum order cost is
approximately 35 dollars.
- The average cost is 16.49 dollars.
- First Quartile (Q1): 25% of the orders cost 12.08 dollars or less.
- Third Quartile (Q3): 75% of the orders cost 22.31 dollars or less.
- Standard Deviation: The order costs have a standard deviation of 7.48 dollars,
indicating moderate variability. Most orders are clustered around the mean, but there is
still a noticeable spread in the data.
- The histogram shows that the distribution is slightly right-skewed. This is consistent with
the mean exceeding the median: a tail of higher-cost orders pulls the mean up.
- Most orders fall within the 10-to-20-dollar range, which suggests a price range preferred
by a significant portion of customers.
- Approximately 555 orders cost more than 20 dollars (about 29% of total orders),
indicating the presence of expensive meal options in the data set. However, there are no
outliers, so no extreme values distort the distribution.
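
The no-outlier observation can be checked numerically against the quartiles above. The sketch below assumes the box plot flags outliers with the common 1.5 × IQR rule (Tukey fences); the report does not state which rule the tool applied, so treat that as an assumption.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles and extremes of order cost as reported in the text (dollars).
lower, upper = tukey_fences(12.08, 22.31)
min_cost, max_cost = 4.47, 35.41

# Outliers exist only if an observed extreme falls outside the fences.
has_outliers = min_cost < lower or max_cost > upper
print(round(lower, 3), round(upper, 3), has_outliers)
```

Under this rule both the minimum and the maximum cost lie inside the fences, which agrees with the box plot showing no outliers.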
Step 7: Univariate analysis of Delivery time

Observations:
- The box plot shows a median delivery time of 25 minutes, meaning that half of the
deliveries take 25 minutes or less.
- The minimum delivery time is 15 minutes, and the maximum delivery time is 33
minutes.
- First Quartile (Q1): 25% of the deliveries take 20 minutes or less.
- Average (Mean): The average delivery time is approximately 24.16 minutes.
- Third Quartile (Q3): 75% of the deliveries take 28 minutes or less.
- Standard Deviation: The delivery times have a standard deviation of approximately
4.97 minutes, indicating moderate variability.
- The histogram shows that the majority of delivery times fall within the range of 24 to 31
minutes.
- The mean is slightly lower than the median, so the distribution is slightly left-skewed:
a tail of shorter delivery times pulls the mean down.
- There are no outliers, so no extreme values distort the distribution.
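
The direction of the skew can be quantified from the summary statistics alone. One rough option, which is not a measure used in the original analysis, is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. The sketch applies it to the delivery-time figures listed above.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std_dev."""
    return 3 * (mean - median) / std_dev


# Delivery-time statistics as reported in the text (minutes).
delivery_skew = pearson_median_skewness(mean=24.16, median=25, std_dev=4.97)
print(round(delivery_skew, 2))
```

A negative value indicates a left skew and a positive value a right skew. The coefficient is only an approximation based on three summary numbers, so it should be read alongside the histogram rather than in place of it.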

Step 8: Univariate analysis of Food preparation time


Observations:

- The box plot shows a median food preparation time of 27 minutes, meaning that half of
the preparations take 27 minutes or less.
- The minimum food preparation time is 20 minutes, while the maximum is 35 minutes.
- First Quartile (Q1): 25% of the preparations take 23 minutes or less.
- Average (Mean): The average preparation time is approximately 27.37 minutes.
- Third Quartile (Q3): 75% of the preparations take 31 minutes or less.
- Standard Deviation: The preparation times have a standard deviation of approximately
4.63 minutes, indicating moderate variability.
- The mean is very close to the median, so the distribution is nearly symmetrical.
- The histogram shows that the times are evenly distributed across the intervals, with a
slight peak in the 20.8 to 21.6 range (135 occurrences).
- No orders fall in the intervals 23.2 to 24 minutes, 27.2 to 28 minutes, and 31.2 to 32
minutes. This could suggest that these preparation times are rare in practice, but it is more
likely an effect of the binning: if times are recorded in whole minutes, a 0.8-minute
interval such as 23.2 to 24 contains no whole-minute value.
- There are no outliers, so no extreme values distort the distribution.
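
One way to investigate the empty histogram intervals is to test whether they could arise from the binning alone. The sketch below rests on three assumptions not stated in the report: preparation times are recorded in whole minutes, the bins start at 20 minutes with a width of 0.8 minutes, and each bin includes its lower edge but not its upper edge. The input is a hypothetical list with one order per whole-minute value.

```python
def empty_bins(values, start, width_tenths, n_bins):
    """Return the half-open bins [low, high) that contain none of the values.

    Bin edges are handled in tenths of a minute (integers) so that
    floating-point rounding cannot move a value into the wrong bin.
    """
    start_tenths = round(start * 10)
    counts = [0] * n_bins
    for value in values:
        index = (round(value * 10) - start_tenths) // width_tenths
        if 0 <= index < n_bins:
            counts[index] += 1
    return [
        ((start_tenths + i * width_tenths) / 10,
         (start_tenths + (i + 1) * width_tenths) / 10)
        for i, count in enumerate(counts)
        if count == 0
    ]


# Hypothetical input: one order at every whole minute from 20 to 35.
whole_minutes = list(range(20, 36))
gaps = empty_bins(whole_minutes, start=20, width_tenths=8, n_bins=19)
print(gaps)
```

Under these assumptions the empty bins are exactly the three intervals named above, which supports reading the gaps as a binning effect rather than as an operational pattern. Choosing a bin width of one minute would avoid the effect.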

Multivariate Analysis

Step 1: Correlation between numerical variables

Observations:

Based on the above correlation matrix, there seems to be no strong correlation between cost of
the order, delivery time and food preparation time.

- The correlation between delivery time and food preparation time is 0.011, which is
weak.
- The correlation between the cost of the order and food preparation time is 0.04, which
is also weak.
- The correlation between delivery time and cost of the order is -0.03, which is weak as
well.
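
For reference, each entry of a correlation matrix like this one is typically a Pearson correlation coefficient; the report does not name the coefficient used, so that is an assumption. The sketch computes it from scratch on hypothetical cost and preparation-time values. The data values are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical values, not taken from the dataset.
cost = [12.5, 30.0, 8.9, 22.4, 16.1]
prep_time = [25, 31, 28, 21, 34]

r = pearson_r(cost, prep_time)
print(round(r, 3))
```

Values near 0, such as those reported above, indicate little linear association. The coefficient does not capture non-linear relationships, so the scatter plots remain a useful complement.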
Step 2: Delivery time by the day of the week relationship

Observations:

The box plots show that the average delivery time is 22.5 minutes on weekends but 28.3
minutes on weekdays. Possible reasons include:

1. Traffic: weekday traffic jams may cause delivery delays.
2. Order volume: weekday demand may be high relative to delivery capacity, which could
result in longer waiting times, even though weekends account for 71% of all orders.
3. Delivery workers: more drivers on weekends could expedite delivery.
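
As a consistency check, the two group averages can be combined using the 71% weekend share reported in the day-of-the-week analysis. The weighted result should land near the overall mean delivery time of about 24.2 minutes. This is a small sketch of that arithmetic; the inputs are rounded figures from the report, so only approximate agreement is expected.

```python
def weighted_mean(values, weights):
    """Weighted average of values, with weights that need not sum to 1."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Reported figures: mean delivery time (minutes) and share of orders per group.
weekend_mean, weekday_mean = 22.5, 28.3
weekend_share, weekday_share = 0.71, 0.29

overall = weighted_mean([weekend_mean, weekday_mean], [weekend_share, weekday_share])
print(round(overall, 1))
```

The combined value rounds to 24.2 minutes, in line with the overall average reported in the univariate analysis.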
Step 3: Food Preparation time by cuisine type relation

Observations:

- Most cuisines have an average preparation time between 25 and 28 minutes.
- Korean cuisine is the fastest to prepare, with an average of 25.46 minutes, followed by
Vietnamese with an average of 25.71 minutes. Korean orders do show extreme points
(outliers) at 32 and 33 minutes, possibly due to large orders.
- Southern cuisine is the slowest to prepare, with an average of 27.59 minutes, followed
by Italian (27.48 minutes).
- Preparation times are generally stable, with most cuisines having a standard deviation
between 4.5 and 5.5 minutes. Southern (5.52) and Thai (5.49) show the most variability,
while Korean (3.97) and Middle Eastern (4.01) show the least.
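
Per-cuisine averages and standard deviations like these come from grouping the orders by cuisine and summarizing each group. The sketch below shows one way to do that with the Python standard library; the rows and column names are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption about how the reported values were computed.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(rows, group_key, value_key):
    """Return {group: (mean, sample standard deviation)} for a numeric column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: (round(mean(values), 2), round(stdev(values), 2))
        for group, values in groups.items()
    }


# Hypothetical rows; the values are not taken from the dataset.
rows = [
    {"cuisine": "Korean", "prep_time": 24},
    {"cuisine": "Korean", "prep_time": 26},
    {"cuisine": "Korean", "prep_time": 25},
    {"cuisine": "Southern", "prep_time": 22},
    {"cuisine": "Southern", "prep_time": 28},
    {"cuisine": "Southern", "prep_time": 34},
]

summary = summarize_by_group(rows, "cuisine", "prep_time")
print(summary)
```

Each group needs at least two orders for the sample standard deviation to be defined, which matters for cuisines with very few orders.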

Step 4: Average Cost of the order by cuisine type

Observations:

The average cost is similar across most cuisine types. Vietnamese and Korean appear to be
the cheapest, and French is the most expensive.

Step 5: Delivery time by rating


Observations:

There appears to be no direct relationship between delivery time and rating. For example,
orders rated 3 have an average delivery time of 25 minutes, while orders rated 4 and orders
rated 5 both have an average delivery time of 24 minutes. We can assume that there are other
factors influencing the ratings. The scatter plot also indicates that ratings remain consistent
across different delivery times.

Step 6: Food Preparation time by rating


Observations:

The distribution of food preparation time is essentially the same regardless of rating, so
rating does not appear to be affected by food preparation time. The scatter plot supports
this observation.

Step 7: Cost of the order by rating


Observations:

The scatter plot shows that most orders, rated and unrated alike, are clustered among
lower-cost orders, between 5 dollars and about 17 dollars. 5-star ratings are dense in the
low-cost area but not dominant, meaning there is room for improvement in customer
satisfaction. 3-star and 4-star ratings are also present there, which may indicate some
dissatisfaction even for lower-cost orders. Many unrated orders are clustered in the low-cost
area as well; therefore, we can assume that cost does not directly affect ratings. Further,
significantly fewer ratings are given to more expensive orders, most likely because of
rating bias, consumer hesitancy, or lower order volume.

Step 8: Ratings grouped by top 10 restaurant names


Observations:

Shake Shack and The Meatball Shop dominate in high ratings. For mid-level ratings of 3 and 4,
Shake Shack leads again, followed by Blue Ribbon Sushi. TAO and Parm could benefit from
customer experience improvements to push more 4-star ratings to 5.

However, there is a significant number of unrated orders, which may affect the analysis. If these
orders were to receive a rating, it could potentially change the results.

The word clouds below support these observations.
