
Operation Analytics and Investigating Metric Spike

Description:
In this project, we analyze various datasets and derive insights for operational improvements. The analysis focuses on two main case studies: Job Data Analysis and Investigating Metric Spikes. The goal is to use advanced SQL skills to analyze the data and provide actionable insights that help different departments within the company.
Operational Analytics is a crucial process that involves analyzing a company's end-to-end operations. One of its key aspects is investigating metric spikes: understanding and explaining sudden changes in key metrics, such as a dip in daily user engagement or a drop in sales.

Approach:
For this project, we first create the database and tables. In the first case study the data is inserted into the tables manually, while for the second case study the provided CSV files are imported into MySQL Workbench to create the necessary tables. We then verify that the tables are correctly structured and populated.
Next, we use SQL queries to answer the questions posed in the case studies, paying attention to the table structures and the meaning of each column. Finally, we prepare a comprehensive report summarizing the approach, tech stack, findings, insights, and results.

Tech Stack:
• MySQL Workbench: For database creation, table management,
and executing SQL queries.
• MS-Word: To write the report.
• Google Drive: To store and share the final report.

Case Study I: Job Data Analysis


We will be working with a table named job_data with the following
columns:
1. job_id: Unique identifier of jobs
2. actor_id: Unique identifier of actor
3. event: The type of event (decision/skip/transfer).
4. language: The Language of the content
5. time_spent: Time spent to review the job in seconds.
6. org: The Organization of the actor
7. ds: The date in the format yyyy/mm/dd (stored as text).
The table contents are provided in the .csv file of Case Study-I. Using this data, create the job_data table in MySQL Workbench and insert the records manually.
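A minimal table-definition sketch (the column types are assumptions inferred from the descriptions above; ds is kept as text to match the yyyy/mm/dd format):

CREATE TABLE job_data (
    job_id     INT,          -- unique identifier of the job
    actor_id   INT,          -- unique identifier of the actor
    event      VARCHAR(20),  -- decision / skip / transfer
    language   VARCHAR(20),  -- language of the content
    time_spent INT,          -- review time in seconds
    org        VARCHAR(20),  -- organization of the actor
    ds         VARCHAR(10)   -- date stored as text, yyyy/mm/dd
);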
Tasks:
A. Jobs Reviewed Over Time:
Calculate the number of jobs reviewed per hour for each day in November 2020, and write an SQL query to compute this metric.

The number of jobs reviewed per hour for each day in November 2020 is 0.0111.
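The report does not show its exact query. The single reported value suggests the count was averaged over the whole month (for instance, 8 jobs over 30 days × 24 hours = 720 hours would give 0.0111). A hedged sketch of both variants:

-- per-day breakdown: jobs reviewed per hour for each day
-- (ds is text in yyyy/mm/dd form, so string comparison orders correctly)
SELECT ds AS review_date,
       COUNT(job_id) / 24.0 AS jobs_reviewed_per_hour
FROM job_data
WHERE ds BETWEEN '2020/11/01' AND '2020/11/30'
GROUP BY ds
ORDER BY ds;

-- month-level average: jobs per hour across all of November 2020
SELECT COUNT(job_id) / (30 * 24) AS jobs_reviewed_per_hour
FROM job_data
WHERE ds BETWEEN '2020/11/01' AND '2020/11/30';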

B. Throughput Analysis:
Calculate the 7-day rolling average of throughput (number of events per second), and write an SQL query to compute it. Additionally, explain whether you prefer the daily metric or the 7-day rolling average for throughput, and why.
The 7-day rolling average of throughput is highest for 30th November 2020, when 2 jobs were reviewed. I would prefer the 7-day rolling average over the daily metric because it smooths out daily fluctuations and provides a more stable view of trends over time, helping to make better-informed decisions.
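A hedged sketch of such a query, assuming throughput is the daily event count divided by the total seconds spent that day, averaged over a 7-row window:

SELECT ds,
       daily_throughput,
       AVG(daily_throughput) OVER (
           ORDER BY ds
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_throughput
FROM (
    -- events per second for each day
    SELECT ds,
           COUNT(event) / SUM(time_spent) AS daily_throughput
    FROM job_data
    GROUP BY ds
) AS daily
ORDER BY ds;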

C. Language Share Analysis:

Calculate the percentage share of each language over the last 30 days, and write an SQL query to compute it.
The Persian language has the highest share at 37.5%, three times the share of any other language; each of the remaining languages has a 12.5% share.
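One way to write this, assuming "last 30 days" is measured relative to the latest date in the table (ds is parsed with STR_TO_DATE because it is stored as text):

SELECT language,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percent_share
FROM job_data
WHERE STR_TO_DATE(ds, '%Y/%m/%d') >= (
      SELECT DATE_SUB(MAX(STR_TO_DATE(ds, '%Y/%m/%d')), INTERVAL 30 DAY)
      FROM job_data)
GROUP BY language
ORDER BY percent_share DESC;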

D. Duplicate Rows Detection:


Identify duplicate rows in the data. Write an SQL query to display
duplicate rows from the job_data table.
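A straightforward approach is to group on every column and keep the groups that occur more than once (assuming a duplicate means all columns match):

SELECT job_id, actor_id, event, language, time_spent, org, ds,
       COUNT(*) AS occurrences
FROM job_data
GROUP BY job_id, actor_id, event, language, time_spent, org, ds
HAVING COUNT(*) > 1;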

Insights:
• Jobs Reviewed Over Time: Peaks in the number of jobs reviewed indicate high-activity periods, which can help allocate resources more effectively. The number of jobs reviewed per hour is 0.0111.
• Throughput Analysis: A stable 7-day rolling average of throughput ensures that operational performance is consistently monitored and maintained. It is highest on 30-11-2020, at 1.34.
• Language Share Analysis: Identifying dominant languages helps in localizing content and improving user engagement. Persian has the highest language share; all other languages have the same share.
• Duplicate Rows Detection: Detecting and eliminating duplicate rows ensures data accuracy and integrity.
Case Study II: Investigating Metric Spike
We will be working with three tables:
1. users: Contains one row per user, with descriptive information
about that user’s account.
2. events: Contains one row per event, where an event is an action
that a user has taken (e.g., login, messaging, search).
3. email_events: Contains events specific to the sending of emails.
The data for Case Study-II is provided as .csv files on Google Drive. We need to create three tables, i.e. users, events, and email_events. The data is loaded into the tables using the import function in MySQL Workbench.

Tasks:
A. Weekly User Engagement:
Measure the activeness of users on a weekly basis. Write an SQL
query to calculate the weekly user engagement.
Weekly engagement is highest in week 33 and lowest in week 21.
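A hedged sketch, assuming the events table has occurred_at, user_id, and event_type columns and that engagement events are tagged 'engagement' (these column and tag names are assumptions):

SELECT EXTRACT(YEAR FROM occurred_at) AS year_num,
       EXTRACT(WEEK FROM occurred_at) AS week_num,
       COUNT(DISTINCT user_id)        AS weekly_active_users
FROM events
WHERE event_type = 'engagement'
GROUP BY year_num, week_num
ORDER BY year_num, week_num;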

B. User Growth Analysis:


Analyze the growth of users over time for a product. Write an SQL
query to calculate the user growth for the product.

year_num  week_num  num_active_users  cum_active_users
2001      2         23                23
2001      7         13                36
2001      11        14                50
2001      15        29                79
2001      20        44                123
2001      24        15                138
2001      28        46                184
2001      33        54                238
2001      37        4                 242
2001      41        16                258
2001      46        17                275
2001      50        5                 280
https://drive.google.com/file/d/1mXQSS9Zf8B2nwkmlz4Fr34aXRON71uDm/view?usp=sharing
Here we extracted the year and week from the occurred_at column of the users table using the EXTRACT function with YEAR and WEEK. We then grouped the results by year and week number and ordered them the same way. Finally, we computed cum_active_users as a running total using SUM with an OVER clause and a frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, as sketched below.
User Growth = Number of active users per week.
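A sketch of the query described above (the state = 'active' filter is an assumption about how active accounts are marked in the users table):

SELECT year_num,
       week_num,
       num_active_users,
       SUM(num_active_users) OVER (
           ORDER BY year_num, week_num
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cum_active_users
FROM (
    SELECT EXTRACT(YEAR FROM occurred_at) AS year_num,
           EXTRACT(WEEK FROM occurred_at) AS week_num,
           COUNT(DISTINCT user_id)        AS num_active_users
    FROM users
    WHERE state = 'active'   -- assumed marker for active accounts
    GROUP BY year_num, week_num
) AS weekly
ORDER BY year_num, week_num;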

C. Weekly Retention Analysis:


Analyze the retention of users on a weekly basis after signing up
for a product. Write an SQL query to calculate the weekly
retention of users based on their sign-up cohort.
user_id  total_engagements  per_week_retention
11768    1                  0
11770    1                  0
11775    1                  0
11778    1                  0
11779    1                  0
11780    1                  0
11785    1                  0
11787    1                  0
11791    1                  0
11793    1                  0

https://drive.google.com/file/d/1JXqXrOYALV0sk1oehIFcKRl6lFPazMIF/view?usp=sharing
The full output table is provided in the given link.
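One hedged way to produce the columns shown above, measuring retention as the number of weeks between a user's sign-up week and their most recent engagement (this simple week difference assumes both fall in the same calendar year):

SELECT u.user_id,
       COUNT(*) AS total_engagements,
       EXTRACT(WEEK FROM MAX(e.occurred_at)) -
       EXTRACT(WEEK FROM u.occurred_at) AS per_week_retention
FROM users AS u
JOIN events AS e
     ON e.user_id = u.user_id
    AND e.event_type = 'engagement'   -- assumed engagement tag
GROUP BY u.user_id, u.occurred_at
ORDER BY u.user_id;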

D. Weekly Engagement Per Device:


Measure the activeness of users on a weekly basis per device.
Write an SQL query to calculate the weekly engagement per
device.

year_num  week_num  device                  no_of_users
2001      20        acer aspire notebook    1
2001      20        asus chromebook         2
2001      20        dell inspiron notebook  3
2001      20        hp pavilion desktop     1
2001      20        htc one                 1
2001      20        ipad air                2
2001      20        iphone 4s               3

https://drive.google.com/file/d/1IIzkAxJJP5UvTI-4HSQModwrWo7-MU/view?usp=sharing
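A sketch of the per-device variant of the weekly engagement query, assuming the events table also carries a device column (as the output above suggests):

SELECT EXTRACT(YEAR FROM occurred_at) AS year_num,
       EXTRACT(WEEK FROM occurred_at) AS week_num,
       device,
       COUNT(DISTINCT user_id)        AS no_of_users
FROM events
WHERE event_type = 'engagement'
GROUP BY year_num, week_num, device
ORDER BY year_num, week_num, device;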

E. Email Engagement Analysis:


Analyze how users are engaging with the email service. Write an
SQL query to calculate the email engagement metrics.

The email open rate is 32.11% and the email click-through rate is 13.18%.
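A hedged sketch of the rate calculation, assuming email_events has an action column with values such as 'sent_weekly_digest', 'sent_reengagement_email', 'email_open', and 'email_clickthrough' (these action names are assumptions; adjust to the actual data):

SELECT 100.0 * SUM(action = 'email_open') /
       SUM(action IN ('sent_weekly_digest', 'sent_reengagement_email'))
           AS email_open_rate,
       100.0 * SUM(action = 'email_clickthrough') /
       SUM(action IN ('sent_weekly_digest', 'sent_reengagement_email'))
           AS email_click_rate
FROM email_events;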

Insights:
• Weekly User Engagement:
By analyzing weekly user engagement, we can identify trends in user activity. Weekly engagement is highest in week 33 and lowest in week 21.
• User Growth Analysis:
Tracking the growth of users over time helps in understanding the success of marketing efforts and product acceptance in the market. The cumulative number of active users is 9381.

• Weekly Retention Analysis:


Weekly retention analysis helps in understanding how well the
product retains users after they sign up. High initial retention with
a gradual decline is common; identifying the drop-off point can
help in targeted interventions. Consistently low retention rates
suggest potential issues with the onboarding process or the
product's value proposition.

• Weekly Engagement Per Device:


Measuring weekly engagement per device helps in understanding
how users interact with the product across different devices.
Higher engagement on specific devices can guide device-focused
optimizations. Lower engagement on certain devices may indicate
performance or compatibility issues that need addressing.

• Email Engagement Analysis:


Analyzing how users engage with email communications helps in understanding the effectiveness of email campaigns. Emails with higher engagement rates indicate successful content strategies, while low engagement might call for content adjustments or better targeting. The email open rate is 32.11% and the click-through rate is 13.18%.

Result:
This project enhanced my understanding of using SQL for operational
analytics, providing valuable insights for improving company operations
and addressing metric spikes. The analysis led to actionable
recommendations for resource allocation, performance monitoring,
content localization, and data accuracy.
