Foodhub Analysis
Data Dictionary
- Categorical variables are restaurant name, cuisine type, rating and day of the week.
- Numerical variables are cost of the order, preparation time and delivery time.
The Data Frame has 9 columns as mentioned in the Data Dictionary. Data in each row
corresponds to the order placed by a customer.
Order ID and Customer ID are identifiers for the order and the customer, respectively.
The cost of an order ranges from 4.47 to 35.41 dollars, with an average order costing
~16.49 dollars.
Food preparation times range from 20 to 35 minutes, with an average of ~27.4 minutes
and a standard deviation of 4.6 minutes.
Delivery times range from 15 to 33 minutes, with an average of ~24.2 minutes and a
standard deviation of 5.0 minutes.
Based on the rating frequencies, 736 orders are not rated, which accounts for 38.78%
of total orders. Among the rated orders, a rating of 5 is the most frequent, accounting
for 30.98% of total orders.
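The rating percentages can be cross-checked with a little arithmetic. The sketch below uses only the figures quoted above; the total order count and the 5-star order count are derived values, not numbers stated in the report.

```python
# Figures quoted in the text above.
unrated_orders = 736
unrated_share = 0.3878      # 38.78% of all orders
five_star_share = 0.3098    # 30.98% of all orders

# Total order count implied by the unrated figures (derived, not stated in the text).
total_orders = round(unrated_orders / unrated_share)

# Orders that carry a rating, and the implied number of 5-star orders.
rated_orders = total_orders - unrated_orders
five_star_orders = round(five_star_share * total_orders)

# Share of 5-star ratings among rated orders only.
five_star_share_of_rated = five_star_orders / rated_orders

print(total_orders, rated_orders, five_star_orders)
print(f"{five_star_share_of_rated:.1%}")
```

Under these derived counts, 5-star ratings make up roughly half of the rated orders, even though they are about 31% of all orders.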
The correlation matrix shows no strong linear correlations between any of the available
numerical variables. In other words, delivery time, order cost, and preparation time do
not strongly influence one another, at least not linearly.
Methodology:
The analysis is divided into two sections: univariate analysis and multivariate analysis.
Univariate analysis explores each variable independently, providing observations on
distributions and summary statistics. Multivariate analysis explores the correlation
between variables to determine whether they depend on one another. Bar charts are used
for categorical variables, whereas histograms and box plots are used for numerical
variables. Scatter plots and a word cloud are also used to enhance the analysis. SAS Viya
and Excel are the tools used for this project.
Univariate Analysis
Observations:
There are 1,200 unique customer IDs, which means some customers placed more than one order.
The day-of-the-week variable has only two unique values: Weekday and Weekend.
Step 2: Univariate analysis of Cuisine type
Observations:
There are 14 unique cuisine types, and their distribution reveals uneven popularity. American
cuisine is the most common type, followed by Japanese and Italian, while Vietnamese cuisine
is the least popular of all the food choices. Beyond Chinese food, the order counts for the
remaining cuisines drop significantly.
Step 3: Univariate analysis of Restaurants
Observations:
The dataset contains 178 distinct restaurants. Shake Shack has the most orders, with over
200, followed by The Meatball Shop and Blue Ribbon Sushi. The top ten restaurants (~6% of
all restaurants) account for approximately 47% of the total orders (887 orders).
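The concentration figure above is a top-N share: count orders per restaurant, keep the N largest counts, and divide by the total number of orders. A minimal sketch follows, assuming one restaurant name per order; the toy order list and the helper name `top_n_share` are hypothetical and only illustrate the calculation.

```python
from collections import Counter

def top_n_share(restaurant_per_order, n):
    """Share of all orders that belong to the n most-ordered restaurants."""
    counts = Counter(restaurant_per_order)
    top_orders = sum(count for _, count in counts.most_common(n))
    return top_orders / len(restaurant_per_order)

# Hypothetical toy data: one entry per order (not the real dataset).
toy_orders = (
    ["Shake Shack"] * 5
    + ["The Meatball Shop"] * 3
    + ["Blue Ribbon Sushi"] * 2
    + ["Parm", "TAO"]
)
toy_share = top_n_share(toy_orders, n=2)

# Figure from the text: ten restaurants out of 178.
restaurant_share = 10 / 178

print(f"top-2 share of toy orders: {toy_share:.1%}")
print(f"ten of 178 restaurants: {restaurant_share:.1%}")
```

On the real data, the same function with `n=10` would be expected to return roughly 0.47, the share reported above.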
Step 4: Univariate analysis of Ratings
Observations:
A significant number of orders, accounting for 39% of the total orders, were not given a rating.
Among the rated orders, the majority received a 5-star rating, which accounts for 31% of the
total orders.
Step 5: Univariate analysis of day of the week
Observations:
The variable takes two distinct values: Weekday and Weekend. The bar chart shows that
more than twice as many orders are placed on weekends as on weekdays: weekend orders
account for 71% of the total orders. American cuisine is the top cuisine on weekends, with
415 orders, and also on weekdays, with 169 orders.
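The weekend and weekday figures can be turned into a ratio and compared with the American-cuisine counts. This is a short arithmetic sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Figures quoted in the text.
weekend_share = 0.71        # weekend orders as a share of all orders
american_weekend = 415      # American-cuisine orders on weekends
american_weekday = 169      # American-cuisine orders on weekdays

# Weekend-to-weekday ratio implied by the 71% share.
weekend_to_weekday_ratio = weekend_share / (1 - weekend_share)

# Share of American-cuisine orders that were placed on weekends.
american_weekend_share = american_weekend / (american_weekend + american_weekday)

print(f"{weekend_to_weekday_ratio:.2f}")   # 2.45
print(f"{american_weekend_share:.1%}")     # 71.1%
```

The weekend share for American cuisine is essentially the same as the overall weekend share, so its lead on weekends reflects overall order volume rather than a weekend-specific preference.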
Step 6: Univariate analysis of Cost of the order
Observations:
- We can observe from the box plot that the median cost is around 15 dollars, meaning
that half of the orders cost less than that and half cost more.
- The minimum order cost is approximately 5 dollars, and the maximum order cost is
approximately 35 dollars.
- The average cost is 16.49 dollars.
- First Quartile (Q1): 25% of the orders cost 12.08 dollars or less.
- Third Quartile (Q3): 75% of the orders cost 22.31 dollars or less.
- Standard Deviation: The order costs have a standard deviation of 7.48 dollars,
indicating moderate variability in the data. This means that while most orders are
clustered around the mean, there is still a noticeable spread in the data.
- Moving to the histogram, the distribution is slightly right-skewed, which is
consistent with the mean exceeding the median: higher cost values pull the mean up.
- Most orders fall within the 10-to-20-dollar range, suggesting a price range preferred
by a significant portion of customers.
- Lastly, approximately 555 orders cost more than 20 dollars (about 29% of total orders),
indicating the presence of expensive meal options in the data set. However, the box plot
shows no outliers, so no extreme values distort the distribution.
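The "no outliers" observation can be checked against the quartiles listed above. The sketch below assumes the box plot uses the common 1.5 x IQR (Tukey) rule, which the report does not state; the function name `tukey_fences` is illustrative.

```python
def tukey_fences(q1, q3, k=1.5):
    """Lower and upper outlier fences under the k * IQR rule (k=1.5 is an assumption)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Order-cost statistics quoted in the text (dollars).
q1, q3 = 12.08, 22.31
minimum, maximum = 4.47, 35.41

lower, upper = tukey_fences(q1, q3)

# A value is flagged as an outlier only if it falls outside the fences.
has_outliers = minimum < lower or maximum > upper

print(f"fences: [{lower:.2f}, {upper:.2f}]")
print("outliers present:", has_outliers)
```

With an IQR of 10.23 dollars, the upper fence sits near 37.66 dollars, above the 35.41-dollar maximum, which agrees with the box plot showing no outliers.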
Step 7: Univariate analysis of Delivery time
Observations:
- From the box plot, we can observe that the median delivery time is 25 minutes,
indicating that half of the deliveries take 25 minutes or less.
- The minimum delivery time is 15 minutes, and the maximum delivery time is 33
minutes.
- First Quartile (Q1): 25% of the delivery times are at 20 minutes or less.
- Average (Mean): The average delivery time is approximately 24.16 minutes
- Third Quartile (Q3): 75% of the delivery times are at 28 minutes or less.
- Standard Deviation: The delivery times have a standard deviation of approximately
4.97 minutes, indicating moderate variability in delivery times.
- Moving to the histogram, the majority of delivery times fall within the range of 24 to 31
minutes.
- The average is slightly lower than the median, so the distribution is slightly
left-skewed: a tail of shorter delivery times pulls the average down.
- There are no outliers present, so no extreme values distort the distribution.
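The mean-versus-median comparison can be summarized with Pearson's median skewness coefficient, 3 x (mean - median) / standard deviation. This is a rough heuristic computed from the summary statistics above, not the sample skewness that a statistics package would report from the raw data.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's second skewness coefficient: negative means left-skewed."""
    return 3 * (mean - median) / std_dev

# Delivery-time statistics quoted in the text (minutes).
delivery_skew = pearson_median_skewness(mean=24.16, median=25, std_dev=4.97)

print(f"{delivery_skew:.2f}")  # -0.51
```

The negative sign matches a left skew: the longer tail lies on the side of shorter delivery times.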
Step 8: Univariate analysis of Food preparation time
Observations:
- From the box plot, it is observed that the median food preparation time is 27 minutes,
indicating that half of the preparations take 27 minutes or less.
- The minimum food preparation time is 20 minutes while the maximum food preparation
time is 35 minutes.
- First Quartile (Q1): 25% of the food preparation times are at 23 minutes or less.
- Average (Mean): The average preparation time is approximately 27.37 minutes.
- Third Quartile (Q3): 75% of the preparation times are at 31 minutes or less.
- Standard Deviation: The preparation times have a standard deviation of approximately
4.63 minutes, indicating moderate variability in preparation times.
- The average is very close to the median, so the distribution is nearly symmetrical.
- Moving to the histogram, the timings are evenly distributed across the intervals, with a
slight peak in the 20.8 to 21.6 range (135 occurrences).
- Further, the histogram shows no orders in the following intervals of food preparation
time: 23.2 to 24 minutes, 27.2 to 28 minutes, and 31.2 to 32 minutes. This is most likely
a binning artifact rather than an operational pattern: if preparation times are recorded
in whole minutes, a 0.8-minute-wide bin that ends just before a whole number contains no
possible value.
- There are no outliers present, so no extreme values distort the distribution.
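The empty histogram intervals reported above can be reproduced without any real data. The sketch below rests on two assumptions that the report does not state: preparation times are recorded in whole minutes from 20 to 35, and the histogram uses 0.8-minute bins starting at 20 (inferred from the quoted intervals). Bin edges are handled in tenths of a minute to avoid floating-point rounding.

```python
def empty_bins(values, start_tenths, width_tenths, n_bins):
    """Return (low, high) for every half-open bin [low, high) that holds no value."""
    counts = [0] * n_bins
    for value in values:
        index = (round(value * 10) - start_tenths) // width_tenths
        if 0 <= index < n_bins:
            counts[index] += 1
    return [
        ((start_tenths + i * width_tenths) / 10,
         (start_tenths + (i + 1) * width_tenths) / 10)
        for i, count in enumerate(counts)
        if count == 0
    ]

# Assumption: every whole minute from 20 to 35 occurs at least once.
whole_minutes = list(range(20, 36))

# 19 bins of 0.8 minutes cover 20.0 to 35.2.
gaps = empty_bins(whole_minutes, start_tenths=200, width_tenths=8, n_bins=19)

print(gaps)  # [(23.2, 24.0), (27.2, 28.0), (31.2, 32.0)]
```

The three empty bins are exactly the intervals listed in the observations, which suggests the gaps come from the bin width rather than from how restaurants prepare food. Re-binning at a 1-minute width would be one way to confirm this.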
Multivariate Analysis
Observations:
Based on the above correlation matrix, there seems to be no strong correlation between cost of
the order, delivery time and food preparation time.
- The correlation between delivery time and food preparation time is 0.011, which is
weak.
- The correlation between the cost of the order and food preparation time is 0.04, which
is also weak.
- The correlation between delivery time and cost of the order is -0.03, which is weak as
well.
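For reference, each entry in such a matrix is a Pearson correlation coefficient. The sketch below computes a small matrix in plain Python; the column names and the toy values are hypothetical and are not taken from the dataset, so the resulting numbers will not match the coefficients above.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients for a dict of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy columns (illustrative values only).
toy_columns = {
    "cost_of_the_order": [12.1, 30.8, 8.5, 16.2, 24.3, 9.9],
    "food_preparation_time": [25, 23, 31, 28, 20, 34],
    "delivery_time": [20, 28, 17, 30, 24, 22],
}
matrix = correlation_matrix(toy_columns)

for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Coefficients close to 0, such as the 0.011, 0.04, and -0.03 reported above, indicate almost no linear relationship; they do not rule out non-linear dependence.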
Step 2: Delivery time by the day of the week relationship
Observations:
It is observed from the above box plots that the average delivery time on weekends is 22.5
minutes, whereas it is 28.3 minutes on weekdays. This could be due to different reasons
such as:
Observations:
The average cost is similar for most cuisine types, although Vietnamese and Korean are the
cheapest and French is the most expensive.
There seems to be no direct relation between delivery times and ratings. For example, a
rating of 3 has an average delivery time of 25 minutes, while ratings of 4 and 5 both have
an average delivery time of 24 minutes. We can assume that there are other factors
influencing the ratings. The scatter plot also indicates that the ratings remain consistent
across different delivery times.
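The per-rating averages quoted above come from grouping orders by rating and averaging delivery time within each group. A minimal sketch of that grouping follows; the field names, the "Not given" label, and the toy rows are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(rows, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}

# Hypothetical toy rows (illustrative values only).
toy_rows = [
    {"rating": "5", "delivery_time": 24},
    {"rating": "5", "delivery_time": 22},
    {"rating": "4", "delivery_time": 26},
    {"rating": "4", "delivery_time": 22},
    {"rating": "3", "delivery_time": 25},
    {"rating": "Not given", "delivery_time": 23},
]
averages = mean_by_group(toy_rows, "rating", "delivery_time")

for rating, average in sorted(averages.items()):
    print(rating, average)
```

The same helper could be reused for food preparation time or order cost by changing `value_key`.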
The distribution of food preparation time is also the same regardless of rating, so rating
does not appear to be affected by food preparation time. The scatter plot also supports
this observation.
The scatter plot shows that most orders, rated and unrated alike, are clustered among
lower-cost orders between 5 dollars and about 17 dollars. 5-star ratings are dense in the
low-cost area but not dominant, meaning there is room for improvement in customer
satisfaction. 3- and 4-star ratings are also present there, which may indicate some level
of dissatisfaction, even for lower-cost orders. Many unrated orders are also clustered in
the low-cost area; therefore, we can assume that cost does not directly affect the ratings.
Further, significantly fewer ratings are given to more expensive orders, most likely because
of rating bias, consumer hesitancy, or reduced order volume.
Shake Shack and The Meatball Shop dominate in high ratings. For mid-level ratings of 3 and 4,
Shake Shack leads again, followed by Blue Ribbon Sushi. TAO and Parm could benefit from
customer experience improvements to push more 4-star ratings to 5.
However, there is a significant number of unrated orders, which may affect the analysis. If these
orders were to receive a rating, it could potentially change the results.