BDA assignment

The document discusses three data visualization tools: Tableau, Power BI, and QlikView, detailing their merits and demerits. It also explains statistical inference and its importance in data analytics, particularly through the chi-square test method with an example related to education level and job satisfaction. The chi-square test demonstrates how to analyze observed versus expected frequencies to determine relationships between categorical variables.

Uploaded by

Sukriti Agrawal
Copyright
© All Rights Reserved

(First Assignment of Big Data Analysis)

Submitted by
Prashi Garg 2116257(27)
Priya Kumari Gupta 2116259(29)
Purvi Gangwar 2116263(33)
Shambhawi Singh 2116296(66)

For the award of the degree of

B-Tech (Computer Science)

Submitted To
Dr. Manisha Jailia

Faculty of Mathematics and Computing


Banasthali Vidyapith
Banasthali - 304022
Session: 2025
Ques 1)
Discuss at least three visualization tools in detail, and give the merits and demerits of each.

Data visualization software turns raw data into charts and dashboards that support analysis.
Examples include Tableau, QlikView, and Power BI. These tools provide advanced data
visualization capabilities, including interactive dashboards, heat maps, and network diagrams.

Data Visualization tools

1. Tableau

Tableau is a powerful data visualization tool used by data analysts, scientists, statisticians,
and others to visualize data and draw clear conclusions from their analysis. It is popular
because it can take in data and produce the required visualization in a very short time, turning
data into insights that can guide future action. Tableau also emphasizes security, with security
issues addressed as they arise or are reported by users.

Merits:

User-Friendly Interface: Users with little technical background can build complex
visualizations by drag and drop.

Data Exploration: Lets users drill into specific data instead of viewing only aggregates.

Customizability: Can be highly customizable for visual aspects like colors, fonts, and module
layouts for visualization requirements.

Handles Large Data Sets: Large data sets can be utilized, allowing for a more complex
analysis than spreadsheet programs.

Demerits:

Cost: One of the more expensive options, especially for enterprise-level usage.

Learning Curve: While generally user-friendly, advanced features may require a steeper
learning curve for complex data manipulation.

Deployment: Can be challenging to embed Tableau visualizations within other applications.

2. Power BI

Power BI is a data visualization and business intelligence tool by Microsoft that combines data
from different sources to create business intelligence reports. Its interactive visualizations
let end users build reports and dashboards by themselves. It is worth installing Power BI
before exploring these features hands-on.

Merits:

Microsoft Ecosystem Integration: It is integrated with other Microsoft products such as Excel
and Azure, making it easy for users already in the Microsoft ecosystem.

Scalability: It can scale effectively for large organizations with diverse data sources.

Real-time Data Updates: Dashboards can display data updates in real time.

Demerits:

Complexity: It can be overwhelming for new users due to a large number of features and
options.

Customization Limitations: While customization options exist, some users might find them less
flexible compared to Tableau.

Data Preparation: May require more data preparation upfront compared to other tools.

3. QlikView

QlikView is a business intelligence tool used to convert raw data into knowledge. It excels at
visually analyzing the relationships between data. Like the human brain, it works by
"association," so users can move in any direction through the data in search of answers.

Merits:

Associative Data Model: With its unique associative model, QlikView provides unprecedented
freedom to users to explore the data in any direction they choose, without restricting them to
predefined queries or paths.

Fast In-Memory Processing: Built-in in-memory data processing delivers fast results even on
large datasets.

Self-Service BI: Users can create visualizations and explore data without advanced
programming or database knowledge, making it well suited to non-technical staff.

Interactive Dashboards: It offers interactive dashboards where users can filter, drill down, and
explore data dynamically.

Data Integration: Data from different sources can be unified and analyzed in one place.

Customization: Provides considerable flexibility in building custom reports and dashboards for
specific business needs.

Demerits:

Steep Learning Curve: Making simple reports is straightforward, but the advanced
functionality, scripting, and data modeling in QlikView can overwhelm inexperienced users.

High Licensing Costs: The cost of licensing for QlikView can be high, which can make it less
accessible for small and mid-sized businesses or individual users.

Limited Visual Customization: While QlikView does provide interactive dashboards, its visual
customization capabilities are somewhat limited compared to tools such as Tableau or
Power BI.

Hardware Dependency: QlikView can be memory- and processing-intensive, especially for
large-scale data processing.

Less User-Friendly for Beginners: For users new to BI tools, the interface and features might
seem complicated.

Outdated Interface: QlikView's interface looks a little dated in comparison to its successor,
Qlik Sense, and its rivals.

Line Chart: BMI & Glucose Trends
● BMI (blue line): Remains stable across age groups with minor fluctuations, suggesting
no significant age-related changes.
● Glucose (red line): Increases steadily with age, in line with the higher stroke risk seen in
older populations.

Correlation Heatmap
● Shows correlations between numeric variables such as age, blood pressure, heart rate, and
cholesterol level.
● Key insight: strong positive or negative correlations (e.g., age vs. cholesterol) can guide
targeted health interventions.

Scatter Plot: Cholesterol vs. Heart Rate
● Examines the relationship between cholesterol levels and heart rate, differentiated by gender.
● Suggests that higher cholesterol may correlate with elevated heart rates.
● Helps assess gender-specific cardiovascular risks.

Bar Chart: Count of Heart Disease by Hypertension
● Shows heart disease cases by hypertension status, highlighting the difference in cases with
and without hypertension.
● Heart disease is more prevalent in hypertensive groups, which marks hypertension as a key
risk factor.
● Reinforces blood pressure management as a preventive measure.

Sum of Stroke by Age
● Stroke cases rise with age, identifying older individuals as a higher-risk group.
● Supports prioritizing geriatric healthcare initiatives.
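
Each cell of the correlation heatmap described above is a correlation coefficient between two numeric variables. The following is a minimal sketch of how one such cell could be computed, assuming the Pearson coefficient is the measure used (the assignment does not say which one the dashboard applies). The age and cholesterol values are made up purely for illustration and are not taken from the dataset behind the charts.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, invented for this example only.
ages = [25, 35, 45, 55, 65, 75]
cholesterol = [180, 190, 205, 215, 230, 240]

r = pearson_correlation(ages, cholesterol)
print(f"age vs. cholesterol: r = {r:.3f}")
```

A value near +1 or -1 would appear as a strongly colored cell in the heatmap, while a value near 0 would appear as a neutral one.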
Ques 2)
What do you mean by statistical inference? Why does statistical inference play an
important role in data analytics? Explain the chi-square test method with a suitable
example.

Statistical inference provides a systematic approach to understanding broader population
characteristics through careful analysis of sample data. By applying mathematical and
probabilistic techniques, researchers can extrapolate insights, test hypotheses, and generate
predictive models that extend beyond immediate observations.

Statistical Inference in Data Analytics

Data analytics often involves understanding large, complex systems. However, studying entire
populations is usually impossible due to time and cost constraints. This is where statistical
inference becomes crucial.

Statistical inference allows researchers to draw conclusions about a large population by
studying a smaller, representative sample. It helps transform limited data into meaningful
insights that can guide important decisions. By using mathematical techniques, analysts can
predict trends, identify patterns, and understand relationships within data.

For example, imagine a company wanting to know customer satisfaction. Instead of asking every
single customer, they can survey a smaller group and use statistical methods to estimate overall
satisfaction. This approach saves time and resources while providing reliable information.

The method works by:

● Selecting an appropriate sample
● Analyzing the data carefully
● Generalizing the findings to the larger population
● Measuring the reliability of those findings
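
The steps above can be sketched for the customer satisfaction example. This is only an illustration under stated assumptions: the population of 100,000 customers, the 72% satisfaction rate, and the sample size of 500 are all invented, and the 95% confidence interval uses the normal approximation for a proportion, which is one common way to measure reliability but not the only one.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = satisfied, 0 = not satisfied (values are made up).
POPULATION_SIZE = 100_000
ASSUMED_SATISFACTION_RATE = 0.72
population = [
    1 if random.random() < ASSUMED_SATISFACTION_RATE else 0
    for _ in range(POPULATION_SIZE)
]


def estimate_proportion(sample, z=1.96):
    """Estimate a population proportion and an approximate 95% confidence interval."""
    n = len(sample)
    p_hat = sum(sample) / n
    # Standard error of a sample proportion (normal approximation).
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    low = max(0.0, p_hat - z * std_error)
    high = min(1.0, p_hat + z * std_error)
    return p_hat, (low, high)


# Step 1: select a simple random sample instead of surveying everyone.
sample = random.sample(population, 500)

# Steps 2-4: analyze the sample, generalize, and quantify the uncertainty.
p_hat, (low, high) = estimate_proportion(sample)
print(f"Estimated satisfaction: {p_hat:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

The width of the interval shrinks as the sample grows, which is how the reliability of the generalization is measured in this sketch.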

Statistical inference is used in many fields like business, healthcare, and social research. It helps
organizations make smart choices by turning limited information into valuable insights. By
understanding uncertainty and detecting significant patterns, it provides a powerful tool for
decision-makers.

Essentially, statistical inference bridges the gap between small-scale observations and broader
understanding, making it a key technique in modern data analysis.
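
Chi-Square Test Method

The chi-square test of independence checks whether two categorical variables are related. It
compares the observed frequencies with the frequencies that would be expected if the
variables were independent. If the overall discrepancy, measured by the chi-square statistic,
exceeds the critical value, independence is rejected, suggesting that a relationship exists.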

Problem: Test if education level relates to job satisfaction

Observed Data:

               Very Satisfied   Satisfied   Dissatisfied   Total
High School          20             30           25          75
Bachelor's           35             50           15         100
Master's             15             20           10          45
Total                70            100           50         220

Step 1: Calculate Expected Frequencies
Formula: Expected = (Row Total × Column Total) ÷ Grand Total

For High School, Very Satisfied:

● Row Total: 75
● Column Total: 70
● Grand Total: 220
● Expected = (75 × 70) ÷ 220 = 23.86

Repeat for all cells:

● High School, Very Satisfied: Expected = 23.86
● High School, Satisfied: Expected = 34.09
● High School, Dissatisfied: Expected = 17.05
● Bachelor's, Very Satisfied: Expected = 31.82
● Bachelor's, Satisfied: Expected = 45.45
● Bachelor's, Dissatisfied: Expected = 22.73
● Master's, Very Satisfied: Expected = 14.32
● Master's, Satisfied: Expected = 20.45
● Master's, Dissatisfied: Expected = 10.23
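
The expected frequencies listed above can be reproduced with a short script. This is a minimal sketch in plain Python; the dictionary layout and the function name `expected_frequencies` are choices made for this example, not part of the assignment.

```python
# Observed counts from the table above. Each list holds the counts for
# Very Satisfied, Satisfied, Dissatisfied, in that order.
observed = {
    "High School": [20, 30, 25],
    "Bachelor's": [35, 50, 15],
    "Master's": [15, 20, 10],
}


def expected_frequencies(table):
    """Apply (row total x column total) / grand total to every cell."""
    row_totals = {row: sum(counts) for row, counts in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        row: [row_totals[row] * col_total / grand_total for col_total in col_totals]
        for row in table
    }


expected = expected_frequencies(observed)
for row, values in expected.items():
    print(row, [round(v, 2) for v in values])
```

A useful sanity check is that the expected frequencies in each row and each column add up to the same totals as the observed counts.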

Step 2: Chi-Square Calculation
Formula: χ² = Σ [(Observed - Expected)² ÷ Expected]

Calculation for each cell:

● (20 - 23.86)² ÷ 23.86 = 0.63
● (30 - 34.09)² ÷ 34.09 = 0.49
● (25 - 17.05)² ÷ 17.05 = 3.71
● (35 - 31.82)² ÷ 31.82 = 0.32
● (50 - 45.45)² ÷ 45.45 = 0.45
● (15 - 22.73)² ÷ 22.73 = 2.63
● (15 - 14.32)² ÷ 14.32 = 0.03
● (20 - 20.45)² ÷ 20.45 = 0.01
● (10 - 10.23)² ÷ 10.23 = 0.01

Total Chi-Square: 8.28

Step 3: Degrees of Freedom
(Rows - 1) × (Columns - 1) = (3 - 1) × (3 - 1) = 4

Step 4: Significance
At the 5% significance level with 4 degrees of freedom, the critical value is 9.49.

Conclusion: Since 8.28 < 9.49, we fail to reject the null hypothesis of independence. The data
do not show a statistically significant relationship between education level and job satisfaction.
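
The whole test can be checked end to end with a short script. The sketch below uses only the standard library. Because it works with unrounded expected frequencies, the statistic comes out near 8.28, and comparing it with the 9.49 critical value leads to the same conclusion. The p-value helper relies on a closed form that holds only for an even number of degrees of freedom, which is an assumption that happens to fit this example (4 degrees of freedom). The function names are illustrative.

```python
import math

# Observed counts: rows are High School, Bachelor's, Master's;
# columns are Very Satisfied, Satisfied, Dissatisfied.
observed = [
    [20, 30, 25],
    [35, 50, 15],
    [15, 20, 10],
]

CRITICAL_VALUE = 9.49  # 5% significance level, 4 degrees of freedom (from the text)


def chi_square_statistic(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_even_dof(statistic, dof):
    """Upper-tail chi-square probability; this closed form is valid only for even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive, even dof")
    half = statistic / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


statistic, dof = chi_square_statistic(observed)
p_value = p_value_even_dof(statistic, dof)
reject_independence = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.2f}, dof = {dof}, p-value = {p_value:.3f}")
print("Reject independence" if reject_independence else "Fail to reject independence")
```

In practice, a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would normally be used instead of a hand-written p-value, since it handles any number of degrees of freedom.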

Limitations:

● Cannot determine relationship strength
● Only shows association, not causation
● Requires categorical data
