SMA_Expt_4

The lab manual outlines an experiment focused on Exploratory Data Analysis (EDA) and visualization of social media data for business purposes. Students will learn to collect, monitor, and analyze social media data using various visualization techniques such as univariate plots, histograms, and heat maps. The document emphasizes the importance of EDA in understanding data patterns, detecting anomalies, and informing data cleaning processes.

Uploaded by

Laukik Pawar


LAB MANUAL

PART A
(PART A: TO BE REFERRED BY STUDENTS)

Experiment No-04

A.1 Aim:
Exploratory Data Analysis and visualization of Social Media Data for business.

Lab Objective: To understand the fundamental concepts of social media networks

Lab Outcome: Collect, monitor, store and track social media data

A.2 Prerequisite:
Data Mining, Data Analytics

A.3 Outcome:
Students will be able to perform exploratory data analysis and visualization on the chosen social
media data.

A.4 Theory:

What is Exploratory Data Analysis?

We can define exploratory data analysis as the essential data investigation process before the
formal analysis to spot patterns and anomalies, discover trends, and test hypotheses with summary
statistics and visualizations. It gives an idea about the data we will be digging deep into while
analyzing. It aids in formulating how we can handle data during analysis, like choosing models,
handling outliers, deciding model accuracy parameters, etc. Visualization helps to infer insights
easily from massive datasets.

Need for visualizing data:

● Understand the trends and patterns of data

● Analyze the frequency and other such characteristics of data
● Know the distribution of the variables in the data.
● Visualize the relationship that may exist between different variables

The number of variables of interest in the data classifies it as univariate, bivariate, or
multivariate. For example, data featuring only one variable of interest is univariate.
Further, based on its characteristics, data can be classified as categorical (discrete) or
continuous.
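
As a minimal sketch of this classification, the snippet below splits the columns of a small DataFrame into continuous and categorical groups by dtype. The column names mirror those used in Part B, but `df_demo` and all of its values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
df_demo = pd.DataFrame({
    "CHANNEL": ["A", "B", "A", "C"],
    "VIEWS": [1200, 560, 98000, 4300],
    "DURATION": [3.5, 12.0, 7.25, 1.0],
})

# Treat numeric columns as continuous and all remaining columns as categorical.
continuous_cols = df_demo.select_dtypes(include="number").columns.tolist()
categorical_cols = df_demo.select_dtypes(exclude="number").columns.tolist()

print("Continuous:", continuous_cols)
print("Categorical:", categorical_cols)
```

Splitting by dtype is only a heuristic: a category stored as integer codes would be counted as continuous, so the result should be checked against what each column actually means.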

Types of Exploratory Data Analysis

1. Univariate Plots
Univariate plots show the frequency or the distribution shape of a variable.

2. Swarm Plot

The swarm-plot, similar to a strip-plot, provides a visualization technique for univariate data to
view the spread of values in a continuous variable. The swarm-plot spreads out the data points of
the variable automatically to avoid overlap and hence provides a better visual overview of the
data.

3. Histograms
Histograms are two-dimensional plots in which the x-axis is divided into a range of numerical bins or
time intervals. The y-axis shows the frequency values, which are counts of occurrences of values
for each bin. Bar graphs have gaps between the bars to indicate that they compare distinct groups,
but there are no gaps in histograms. Histograms tell us whether the distribution is left/negatively
skewed (long tail on the left; most of the data falls to the right side), right/positively skewed
(long tail on the right; most of the data falls to the left side), bi-modal (two distinct peaks),
symmetric (for example, the bell-shaped normal distribution), or uniform (almost all the bins have
similar frequency).
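
The direction of skew can also be checked numerically. The sketch below uses an invented series of view counts with a few very large values, computes the sample skewness with pandas, and bins the values the way a histogram would. The data and the choice of 5 bins are assumptions made for illustration.

```python
import pandas as pd

# Invented values with a long right tail (a few very large view counts).
views = pd.Series([10, 12, 15, 18, 20, 22, 25, 30, 400, 900])

# Sample skewness: a positive value indicates right/positive skew.
skewness = views.skew()

# Histogram-style counts: split the value range into 5 equal-width bins.
counts = pd.cut(views, bins=5).value_counts(sort=False)

print(f"skewness = {skewness:.2f}")
print(f"mean = {views.mean():.1f}, median = {views.median():.1f}")
print(counts)
```

In right-skewed data like this, the mean is pulled towards the long tail and ends up above the median, while most observations fall into the lowest bin.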

Density Plots:
A density plot is like a smoother version of a histogram. Generally, a kernel density estimate is
used in density plots to approximate the probability density function of the variable. A smooth
kernel (commonly a Gaussian) is centred on each data point, and the kernels are combined to give
one continuous density curve for the whole data.
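
To make the idea concrete, here is a minimal sketch of a one-dimensional Gaussian kernel density estimate written with NumPy. The function name `gaussian_kde_1d`, the sample values, and the hand-picked bandwidth of 0.8 are assumptions for illustration; plotting libraries such as seaborn normally choose the bandwidth automatically.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian kernel per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # Shape (len(grid), len(samples)): scaled distance from grid point to sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

samples = [1.0, 1.5, 2.0, 6.0, 6.5]     # invented values
grid = np.linspace(-5.0, 13.0, 1801)    # points at which the curve is evaluated
density = gaussian_kde_1d(samples, grid, bandwidth=0.8)

# A density curve should enclose an area of about 1 over a wide enough range.
area = np.sum(density) * (grid[1] - grid[0])
print(f"area under curve = {area:.3f}")
```

A smaller bandwidth produces a bumpier curve that follows individual points, while a larger one smooths over the two clusters in the sample.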
Bar Graphs
Bar charts can be used to compare nominal or ordinal data. They are helpful for recognizing trends.

Violin Plots:
The Violin plot is very much similar to a box plot, with the addition of a rotated kernel density
plot on each side. It shows the distribution of quantitative data across several levels of one (or
more) categorical variables such that those distributions can be compared.

Box Plots
These charts show the distribution of values along an axis. A rectangular box spans the first
quartile (Q1) to the third quartile (Q3), so it contains the middle half of the data, and a line
inside the box marks the median. The quartiles divide the sorted data into four parts, each holding
a quarter of the data set. Boxes can be drawn vertically or horizontally.
Box plots are suitable for identifying outliers. The figure below shows the structure of a box plot.
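
The quantities a box plot draws can be computed directly. The sketch below finds the quartiles of an invented series and flags outliers with the common 1.5 x IQR rule, which is the default whisker convention in matplotlib and seaborn. The data values are assumptions for illustration.

```python
import pandas as pd

# Invented view counts; two values are far larger than the rest.
views = pd.Series([10, 12, 15, 18, 20, 22, 25, 30, 400, 900])

# Quartiles: the box spans q1 to q3, with the median drawn inside it.
q1, median, q3 = views.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are conventionally drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = views[(views < lower_fence) | (views > upper_fence)]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print("Outliers:", outliers.tolist())
```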
Heat Maps
Heat maps shade areas according to the data's values. For instance, a correlation heat map shows
the interrelationship between variables. Colour differences make it easy to spot similar and
different values and to make sense of the data variation. Heat maps are usually helpful when you
have a large amount of data. They are also used in web analytics, for example during A/B testing,
to see which parts of a web page users interact with.
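
A correlation heat map is a coloured view of a correlation matrix. The sketch below builds that matrix with pandas for a small invented DataFrame; the `LIKES` column and all values are hypothetical and are not taken from the lab dataset.

```python
import pandas as pd

# Hypothetical numeric data; LIKES rises with VIEWS, DURATION falls.
df_demo = pd.DataFrame({
    "VIEWS": [100, 200, 300, 400, 500],
    "LIKES": [12, 18, 33, 41, 48],
    "DURATION": [9.0, 7.5, 6.0, 6.5, 3.0],
})

# Pairwise Pearson correlation; every entry lies between -1 and 1.
corr = df_demo.corr()
print(corr.round(2))
```

Each cell of the matrix becomes one coloured square in the heat map, so strongly positive and strongly negative pairs stand out at a glance.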

PART B
(PART B: TO BE COMPLETED BY STUDENTS)

(Students must submit the soft copy as per the following segments within two hours of the practical.
The soft copy must be uploaded on Blackboard, or emailed to the concerned lab in-charge
faculty at the end of the practical if no Blackboard access is available)

Roll. No.: A17 Name: Laukik Pawar

Class: BE_A Batch: A1
Date of Experiment: Date of Submission:
Grade:
B.1.Study the fundamentals of social media platform and implement data cleaning, pre-
processing, filtering and storing social media data for business:
(Paste your Search material completed during the 2 hours of practical in the lab here)

● Students need to use the previous social media dataset to perform exploratory data analysis and
visualization.

B.2 Input and Output:

(Command and its output)
# prompt: code to visualize the dataset, i.e. Univariate Plots, Swarm Plot,
# Histograms, Density Plots, Bar Graphs, Violin Plots, Box Plots, Heat Maps
# Assumes 'df' is the pandas DataFrame loaded from the previous experiment's dataset.

import matplotlib.pyplot as plt

import seaborn as sns

# Univariate Plots
# Histograms
plt.figure(figsize=(10, 6))
df['VIEWS'].hist(bins=20)
plt.title('Distribution of Views')
plt.xlabel('Views')
plt.ylabel('Frequency')
plt.show()

# Density Plots
plt.figure(figsize=(10, 6))
sns.kdeplot(df['VIEWS'])
plt.title('Density Plot of Views')
plt.xlabel('Views')
plt.ylabel('Density')
plt.show()

# Box Plots
plt.figure(figsize=(10, 6))
sns.boxplot(y=df['VIEWS'])
plt.title('Box Plot of Views')
plt.show()

# Violin Plots
plt.figure(figsize=(10, 6))
sns.violinplot(y=df['VIEWS'])
plt.title('Violin Plot of Views')
plt.show()

# Swarm Plots (for smaller datasets, can be slow for large ones)
plt.figure(figsize=(10, 6))
sns.swarmplot(y=df['VIEWS']) # Consider sampling for large datasets
plt.title('Swarm Plot of Views')
plt.show()

# Bar Graphs (for categorical data - example using 'CHANNEL' if it is categorical)
if 'CHANNEL' in df.columns:  # Check that the column exists before plotting
    plt.figure(figsize=(12, 6))
    df['CHANNEL'].value_counts().plot(kind='bar')
    plt.title('Number of videos per channel')
    plt.xlabel('Channel Name')
    plt.ylabel('Number of Videos')
    plt.show()

# Additional plots you can explore

#Scatter plot
plt.figure(figsize=(10, 6))
plt.scatter(df['VIEWS'], df['DURATION'])
plt.title('Views vs Duration')
plt.xlabel('Views')
plt.ylabel('Duration')
plt.show()

# prompt: code to plot heat map for numeric data only

import matplotlib.pyplot as plt

import seaborn as sns
# Assuming 'df' is your DataFrame and it's already preprocessed

# Select numeric columns for the heatmap

numeric_cols = df.select_dtypes(include=['number']).columns

# Create the heatmap

plt.figure(figsize=(10, 8))
sns.heatmap(df[numeric_cols].corr(), annot=True, cmap='coolwarm',
fmt=".2f")
plt.title('Correlation Heatmap of Numeric Features')
plt.show()

B.3 Observations and learning:

(Students are expected to comment on the output obtained with clear observations and
learning for each task/ sub part assigned)
We performed Exploratory Data Analysis and visualization of Social Media Data for business.

B.4 Conclusion:
(Students must write the conclusion as per the attainment of individual outcome listed above
and learning/observation noted in section B.3)
We performed Exploratory Data Analysis and visualization of Social Media Data for business.

B.5 Question of Curiosity

(To be answered by student based on the practical performed and learning/observations)
Q1. What is EDA? Explain Its Importance.
Exploratory Data Analysis (EDA) is an analytical process used to summarize the main
characteristics of a dataset, often using visual methods. EDA allows data scientists and analysts to
explore data without having a specific hypothesis in mind, helping to uncover patterns, spot
anomalies, test assumptions, and gain insights that inform future analyses.

Importance of EDA:

Data Understanding: EDA provides in-depth insight into the data's structure, including
distributions, relationships, and trends, enabling analysts to understand what the data represents.

Identifying Patterns and Trends: It helps in uncovering underlying trends or patterns that may not
be immediately obvious, which can guide further data exploration and analysis.

Detecting Anomalies: Through visualization and summary statistics, EDA can identify outliers or
anomalies that could affect subsequent analysis or modeling.

Hypothesis Generation: EDA can help generate hypotheses that can be tested in further analysis
by revealing insights that might not have been considered originally.

Feature Selection: Understanding the relationships between different variables can aid in
identifying which features are most relevant for predictive modeling.

Informing Data Cleaning and Pre-processing: EDA highlights issues in the data, such as missing
values, skewed distributions, or irrelevant features, influencing necessary data cleaning steps.

Q2. What is the Importance of Visualization?

Data Visualization is the graphical representation of information and data. Using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data.

Importance of Visualization:
Improved Comprehension: Visualizations make complex data more understandable, summarizing
large amounts of information quickly and effectively.

Better Communication: Visual representations make it easier to convey findings to others,
especially non-technical stakeholders, facilitating discussions and decision-making.

Enhanced Pattern Recognition: Humans are generally good at recognizing patterns visually, so
visualizations can highlight correlations and trends that might not be discernible through raw data
alone.

Time Efficiency: Visual tools help analysts quickly grasp the significance of the data without
digging deeply into the numbers, thus saving time.

Exploration in EDA: In the context of EDA, visual tools help in immediate feedback and iterative
analysis, allowing stakeholders to explore data more freely and flexibly.

Q3. Explain the Steps Involved in EDA.

The EDA process typically involves several cohesive steps that provide a thorough understanding
of the dataset:

Data Collection: Gather data from various sources to create a comprehensive dataset for analysis.

Data Cleaning: Address issues such as missing values, duplicates, and inconsistencies to ensure
the quality of the dataset.

Descriptive Statistics: Compute summary statistics (mean, median, mode, variance, etc.) to gain
insights into the central tendency and dispersion of data.

Data Visualization: Create visual representations (histograms, scatter plots, box plots, etc.) to
explore distributions and relationships between variables.

Variable Analysis:

● Univariate Analysis: Analyze each variable individually to understand its distribution and
characteristics.
● Bivariate/Multivariate Analysis: Explore the relationships between two or more variables to
identify correlations and dependencies.

Outlier Detection: Identify and analyze outliers or anomalies in the data that could impact the
analysis.

Correlation Analysis: Examine the correlation between features to understand relationships and
dependencies, using correlation matrices or heatmaps.

Hypothesis Generation: Use insights derived from EDA to formulate hypotheses for further testing
in subsequent analyses.

Documentation: Document findings, visualizations, and initial impressions to provide context and
reference for future analysis or stakeholders.

By following these steps, analysts can approach their data in a structured manner, ensuring that
they derive maximum insights while also preparing the data for further predictive modeling or
analysis as needed.
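
The non-plotting steps above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the DataFrame `raw` and its values are invented, and dropping rows with missing values is one cleaning choice among several (imputation is another).

```python
import numpy as np
import pandas as pd

# Hypothetical raw data containing one duplicate row and one missing value.
raw = pd.DataFrame({
    "CHANNEL": ["A", "B", "B", "C", "A", "C"],
    "VIEWS": [1200.0, 560.0, 560.0, np.nan, 98000.0, 4300.0],
    "DURATION": [3.5, 12.0, 12.0, 7.25, 1.0, 5.0],
})

# Data cleaning: drop exact duplicate rows, then rows with missing values.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# Descriptive statistics for the numeric columns (count, mean, std, quartiles).
summary = clean.describe()

# Univariate analysis of a categorical column: frequency of each category.
channel_counts = clean["CHANNEL"].value_counts()

# Correlation analysis restricted to the numeric columns.
corr = clean.select_dtypes(include="number").corr()

print(summary)
print(channel_counts)
print(corr.round(2))
```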
