Instagram User Analysis
User analysis is the process by which we track how users engage and interact with our digital
product (software or mobile application) in an attempt to derive business insights for
marketing, product & development teams.
These insights are then used by teams across the business to launch a new marketing
campaign, decide on features to build for an app, track the success of the app by measuring
user engagement and improve the experience altogether while helping the business grow.
Problem statement
In this project we have a dataset of Instagram users. We have to analyze this data to derive
useful insights for the marketing team and for the investors. The marketing team wants to
track user engagement on the Instagram app in order to decide when and how to launch an ad
campaign. The investors want metrics on how active and how genuine the user base is. The
analysis supports better decision making for both groups.
Approach
• I used MySQL to carry out this project.
• First, I created a database named ig_clone.
• To select the database, I executed the command use ig_clone.
• Then I created the tables. The first table is users, which holds user details: id, username
and the time the account was created.
• The next table is photos, which holds the id, image URL, user id and the time the photo was
posted.
• Then I created the comments table for comment details: comment id, user id, photo id,
comment text and time.
• In the same manner I created the likes table and the follows table. The follows table records
which user (the follower) follows which other user (the followee).
• The tags table holds the tag names.
• The last table is photo_tags, which links each photo to the tags used with it.
• After creating the schema, the next step is to load the dataset into the database. I used
INSERT statements to populate all the tables. With the data loaded, the user analysis is
performed in SQL.
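The schema described in the steps above can be sketched roughly as follows. This is a minimal
sketch, not the exact DDL used in the project: the table names and the columns users.id,
users.username, photos.id, photos.user_id, likes.user_id, likes.photo_id, tags.id, tags.tag_name,
photo_tags.photo_id and photo_tags.tag_id come from the queries in this report, while created_at,
image_url, comment_text, follower_id, followee_id, the column types and the keys are assumptions.

```sql
-- Sketch of the ig_clone schema (column types and keys are assumptions).
create database if not exists ig_clone;
use ig_clone;

create table users (
    id int auto_increment primary key,
    username varchar(255) not null unique,
    created_at timestamp default current_timestamp  -- assumed name for the signup time
);

create table photos (
    id int auto_increment primary key,
    image_url varchar(255) not null,                -- assumed name for the image URL
    user_id int not null,
    created_at timestamp default current_timestamp,
    foreign key (user_id) references users(id)
);

create table comments (
    id int auto_increment primary key,
    comment_text varchar(255) not null,             -- assumed name for the comment text
    user_id int not null,
    photo_id int not null,
    created_at timestamp default current_timestamp,
    foreign key (user_id) references users(id),
    foreign key (photo_id) references photos(id)
);

create table likes (
    user_id int not null,
    photo_id int not null,
    created_at timestamp default current_timestamp,
    primary key (user_id, photo_id),                -- one like per user per photo
    foreign key (user_id) references users(id),
    foreign key (photo_id) references photos(id)
);

create table follows (
    follower_id int not null,                       -- assumed column names
    followee_id int not null,
    created_at timestamp default current_timestamp,
    primary key (follower_id, followee_id),
    foreign key (follower_id) references users(id),
    foreign key (followee_id) references users(id)
);

create table tags (
    id int auto_increment primary key,
    tag_name varchar(255) not null unique,
    created_at timestamp default current_timestamp
);

create table photo_tags (
    photo_id int not null,
    tag_id int not null,
    primary key (photo_id, tag_id),
    foreign key (photo_id) references photos(id),
    foreign key (tag_id) references tags(id)
);
```

The composite primary key on likes is a design choice worth noting: if each user can like a
photo at most once, counting a user's rows in likes gives the number of distinct photos that
user has liked.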
Tech-Stack Used:
For this project, I wrote and ran the SQL queries in MySQL Workbench 8.0 CE.
MARKETING
Insight: From the result we can see that Darby_Herzog, Emilio_Bernier52, Elenor88,
Nicole71 and Jordyn.Jacobson2 are the five oldest users.
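The task and query behind this insight are not shown in this report. A minimal sketch of a
query that lists the five oldest accounts, assuming the users table stores the account-creation
time in a column named created_at (the column name is an assumption):

```sql
-- Five oldest accounts: sort by signup time, earliest first.
-- created_at is an assumed column name for the account-creation time.
select id, username, created_at
from users
order by created_at asc
limit 5;
```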
Task: Find the users who have never posted a single photo on Instagram.
select users.id, users.username, photos.id from users left join photos on
users.id=photos.user_id where photos.id is null;
Insight: Out of 100 users, 26 have never posted a photo on Instagram. The query returns their
usernames, so the marketing team can send these users promotional emails encouraging them to
post their first photo.
The team started a contest in which the user who gets the most likes on a single photo wins.
They now wish to declare the winner.
Task: Identify the winner of the contest and provide their details to the team.
Insight: The user “Zack_Kemmer93” has won the contest. Their photo with photo ID 145 received
the most likes of any photo, with a like count of 48.
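The query for this task is not shown in this report. One way to find the most-liked photo and
its owner is sketched below; it uses only columns that appear in the other queries of this
report, and like_count is an illustrative alias.

```sql
-- Count likes per photo, attach the photo's owner, and keep the top row.
select users.id as user_id,
       users.username,
       likes.photo_id,
       count(*) as like_count
from likes
inner join photos on likes.photo_id = photos.id
inner join users on photos.user_id = users.id
group by users.id, users.username, likes.photo_id
order by like_count desc
limit 1;
```

If two photos tie for the most likes, limit 1 returns only one of them, so the team may want
to inspect the top few rows before declaring a winner.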
Hashtag Research
A partner brand wants to know which hashtags to use in a post to reach the most people on
the platform.
Task: Identify and suggest the top 5 most commonly used hashtags on the platform.
select tags.tag_name, count(photo_tags.photo_id) as usage_count
from tags inner join photo_tags on tags.id=photo_tags.tag_id
group by tags.tag_name
order by usage_count desc
limit 5;
Insight: The output shows that “smile, beach, party, fun and concert” are the five most
commonly used hashtags on the platform.
Launch Ad Campaign
The team wants to know which day of the week would be best for launching ads.
Task: On which day of the week do most users register? Provide insights on when to schedule
an ad campaign.
Insight: The DAYOFWEEK() function returns the weekday index for a given date (a number
from 1 to 7). Note: 1=Sunday, 2=Monday, 3=Tuesday, 4=Wednesday, 5=Thursday,
6=Friday, 7=Saturday.
In our output, the most registrations fall on Thursday (5) and Sunday (1). So, either
Thursday or Sunday would be an appropriate day for the campaign launch.
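The query for this task is not shown in this report. A minimal sketch using DAYOFWEEK(),
assuming the registration time is stored in users.created_at (an assumed column name);
weekday_index and registrations are illustrative aliases.

```sql
-- Registrations per weekday; dayofweek() returns 1 (Sunday) through 7 (Saturday).
-- created_at is an assumed column name for the account-creation time.
select dayofweek(created_at) as weekday_index,
       count(*) as registrations
from users
group by weekday_index
order by registrations desc;
```

Replacing dayofweek(created_at) with dayname(created_at) would return the weekday name
instead of the index, which can make the output easier to read.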
INVESTOR METRICS
User Engagement
Are users still as active on Instagram, or are they making fewer posts?
Task: Provide how many times the average user posts on Instagram. Also, provide the total
number of photos on Instagram divided by the total number of users.
select users.id as user_id, count(photos.id) as photo_count
from users left join photos on users.id=photos.user_id
group by users.id;
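The query above returns one row per user with that user's number of photos. The overall
average asked for in the task is the total number of photos divided by the total number of
users, which can be sketched as a single statement (the aliases are illustrative):

```sql
-- Total photos, total users, and their ratio (average photos per user).
select (select count(*) from photos) as total_photos,
       (select count(*) from users) as total_users,
       (select count(*) from photos) / (select count(*) from users) as avg_photos_per_user;
```

Because users who never posted are included in the denominator, this is the average over all
registered users, not only over users who have posted at least once.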
The investors want to know if the platform is crowded with fake and dummy accounts.
Task: Provide data on users (bots) who have liked every single photo on the site (since any
normal user would not be able to do this).
with base as (
select users.username, count(likes.photo_id) as photo_likes
from likes inner join users on likes.user_id=users.id
group by users.username
)
select username from base
where photo_likes=(select count(*) from photos)
order by username;
Insight: The query returns the usernames of the accounts that have liked every photo on the
app. There are 13 such accounts, which are likely bots or dummy accounts.
Result
In this project we derived multiple insights for the marketing team and for the investors.
The most suitable day to launch a campaign is either Thursday or Sunday. We identified the
five oldest users and the 26 inactive users whom the company will target through promotional
emails. We also saw that “smile, beach, party, fun and concert” are the most popular hashtags.
Lastly, we found 13 accounts that are likely fake. These insights will be useful for the
respective stakeholders and will support better decision making.
This project has helped me understand Structured Query Language better by applying it to a
real-life case study. I have a better grasp of aggregation, sorting and filtering, and the
concept of joins helped me retrieve values from two or more tables.