BDA assignment
Submitted by
Prashi Garg 2116257(27)
Priya Kumari Gupta 2116259(29)
Purvi Gangwar 2116263(33)
Shambhawi Singh 2116296(66)
Submitted To
Dr. Manisha Jailia
Data visualization software refers to specialized tools for visualizing and analyzing data.
Examples of data visualization software include Tableau, QlikView, and Power BI. These tools
provide advanced data visualization capabilities, including interactive dashboards, heat maps,
and network diagrams.
1. Tableau
Tableau is a powerful data visualization tool used by data analysts, data scientists,
statisticians, and others to visualize data and draw clear conclusions from it. Tableau
is popular because it can take in data and produce the required visualizations in a
short time, turning data into insights that can drive future action. Tableau also
emphasizes security, with a commitment to handle security issues as soon as they arise or are
reported by users.
Merits:
Data Exploration: Lets users drill into specific data instead of seeing only aggregates.
Customizability: Visual aspects such as colors, fonts, and layouts can be highly customized to
meet visualization requirements.
Handles Large Data Sets: Can work with large data sets, allowing more complex analysis than
spreadsheet programs.
Demerits:
Cost: One of the more expensive options, especially for enterprise-level usage.
Learning Curve: While generally user-friendly, its advanced features for complex data
manipulation have a steeper learning curve.
2. Power BI
Power BI is a data visualization and business intelligence tool from Microsoft that transforms data
from different sources into business intelligence reports. Its interactive visualizations let end
users create reports and interactive dashboards themselves.
Merits:
Microsoft Ecosystem Integration: It is integrated with other Microsoft products such as Excel
and Azure, making it easy for users already in the Microsoft ecosystem.
Scalability: It can scale effectively for large organizations with diverse data sources.
Real-time Data Updates: The ability to show real-time data updates on dashboards.
Demerits:
Complexity: Its large number of features and options can overwhelm new users.
Customization Limitations: While customization options exist, some users might find them less
flexible than Tableau's.
Data Preparation: May require more data preparation upfront compared to other tools.
3. QlikView
QlikView is a business intelligence tool used to convert raw data into knowledge. It
excels at visually analyzing the relationships within data. Like the human brain, it works by
"association," and it can move in any direction to search for answers.
Merits:
Associative Data Model: Its associative model gives users the freedom to explore the data in any
direction they choose, without restricting them to predefined queries or paths.
Fast In-Memory Processing: Built-in in-memory data processing delivers fast results, even on
large datasets.
Self-Service BI: Users can create visualizations and explore data without advanced programming
or database knowledge, making it well suited to non-technical staff.
Interactive Dashboards: It offers interactive dashboards where users can filter, drill down, and
explore data dynamically.
Data Integration: Brings data from different sources together for unified analysis in one place.
Customization: Provides considerable flexibility in building custom reports and dashboards for
specific business needs.
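The associative model described above can be illustrated with a simplified sketch: selecting a value in any one field reveals which values in the other fields are associated with it. This is a conceptual illustration only, not how QlikView's engine is implemented; the function `associated_values` and the sales records are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, str]


def associated_values(rows: List[Record], field: str, value: str) -> Dict[str, List[str]]:
    """Hypothetical helper: for every other field, list the values that co-occur
    with rows where `field` equals `value` (a toy version of an associative selection)."""
    matching = [row for row in rows if row.get(field) == value]
    related: Dict[str, set] = {}
    for row in matching:
        for other_field, other_value in row.items():
            if other_field != field:
                related.setdefault(other_field, set()).add(other_value)
    # Sort so the output is stable and readable.
    return {f: sorted(vals) for f, vals in related.items()}


# Made-up sales records, for illustration only.
sales = [
    {"region": "North", "product": "Tea", "rep": "Asha"},
    {"region": "North", "product": "Coffee", "rep": "Ravi"},
    {"region": "South", "product": "Tea", "rep": "Meera"},
]

# A selection can start from any field, not only along a predefined path.
print(associated_values(sales, "product", "Tea"))
# {'region': ['North', 'South'], 'rep': ['Asha', 'Meera']}
print(associated_values(sales, "region", "North"))
# {'product': ['Coffee', 'Tea'], 'rep': ['Asha', 'Ravi']}
```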
Demerits:
Steep Learning Curve: Making simple reports is straightforward, but advanced functionality,
scripting, and data modeling in QlikView can challenge inexperienced users.
High Licensing Costs: The cost of licensing for QlikView can be high, which can make it less
accessible for small and mid-sized businesses or individual users.
Limited Visual Customization: Although QlikView provides interactive dashboards, its visual
customization capabilities are somewhat limited compared with tools such as Tableau or
Power BI.
Less User-Friendly for Beginners: For users new to BI tools, the interface and features can
seem complicated.
Outdated Interface: QlikView’s interface looks somewhat dated compared with its successor,
Qlik Sense, and its rivals.
Line Chart: BMI & Glucose Trends
● BMI (Blue Line): Remains stable across age groups with minor fluctuations, suggesting
no significant age-related changes.
● Glucose (Red Line): Increases steadily with age, consistent with the higher stroke risk seen in
older populations.
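A line chart like this plots one aggregated value per age group. The sketch below computes the mean BMI and mean glucose per age band that the two lines would trace. It assumes made-up `(age, bmi, glucose)` records and a 20-year banding; the report does not state its dataset columns or age groups, and `trend_by_age` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up (age, bmi, glucose) records, for illustration only.
patients = [
    (25, 24.1, 88.0), (34, 26.3, 92.5),
    (47, 27.0, 101.2), (52, 25.8, 110.4),
    (63, 26.1, 121.7), (71, 25.5, 130.9),
]


def trend_by_age(rows, band_width=20):
    """Mean BMI and mean glucose per age band: the points a two-line chart would plot."""
    groups = defaultdict(list)
    for age, bmi, glucose in rows:
        lower = (age // band_width) * band_width  # e.g. 47 -> 40 for 20-year bands
        groups[lower].append((bmi, glucose))
    trend = {}
    for lower in sorted(groups):  # numeric sort keeps bands in age order
        values = groups[lower]
        label = f"{lower}-{lower + band_width - 1}"
        trend[label] = (mean(b for b, _ in values), mean(g for _, g in values))
    return trend


for band, (avg_bmi, avg_glucose) in trend_by_age(patients).items():
    print(f"{band}: BMI {avg_bmi:.1f}, glucose {avg_glucose:.1f}")
```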
Correlation Heatmap: Indicates correlations between numeric variables like age, blood pressure,
heart rate, and cholesterol level.
Correlation Heatmap
● Visualizes relationships between variables (age, blood pressure, cholesterol).
● Key Insight: Strong positive or negative correlations (e.g., age vs. cholesterol) can guide
targeted health interventions.
1. Heart Disease by Hypertension:
● Higher heart disease prevalence in hypertensive groups.
● Reinforces blood pressure management as a preventive measure.
2. Stroke by Age:
● Stroke cases rise with age, highlighting older demographics as high-risk.
● Supports prioritizing geriatric healthcare initiatives.
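Both findings compare how common an outcome is across groups. Comparing rates (cases divided by group size) rather than raw counts keeps unequal group sizes from skewing the comparison. A minimal sketch, assuming 0/1 outcome flags; the field names, records, and helper `prevalence_by_group` are hypothetical.

```python
from collections import defaultdict


def prevalence_by_group(records, group_key, outcome_key):
    """Share of records with outcome == 1 within each group."""
    totals = defaultdict(int)
    cases = defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        cases[g] += r[outcome_key]  # outcome flags are 1 (yes) or 0 (no)
    return {g: cases[g] / totals[g] for g in totals}


# Made-up records, for illustration only.
records = [
    {"hypertension": 1, "heart_disease": 1, "age_band": "60+", "stroke": 1},
    {"hypertension": 1, "heart_disease": 0, "age_band": "60+", "stroke": 1},
    {"hypertension": 0, "heart_disease": 0, "age_band": "<40", "stroke": 0},
    {"hypertension": 0, "heart_disease": 1, "age_band": "40-59", "stroke": 0},
    {"hypertension": 0, "heart_disease": 0, "age_band": "40-59", "stroke": 1},
    {"hypertension": 0, "heart_disease": 0, "age_band": "<40", "stroke": 0},
]

print(prevalence_by_group(records, "hypertension", "heart_disease"))
print(prevalence_by_group(records, "age_band", "stroke"))
```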
Scatter Plot: Cholesterol vs. Heart Rate
● Analyzes gender-based trends: higher cholesterol may correlate with elevated heart rates.
● Helps assess gender-specific cardiovascular risks.
Cholesterol vs. Heart Rate Scatter Plot: Examines the relationship between cholesterol levels and
heart rate, differentiated by gender.
Count of Heart Disease by Hypertension
A bar chart showing heart disease cases by hypertension status.
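The bar heights in such a chart are simple counts of heart-disease cases within each hypertension group. A minimal sketch using `collections.Counter`, assuming made-up 0/1 flags per patient; `heart_disease_counts` and the text-bar output are hypothetical stand-ins for the original chart.

```python
from collections import Counter

# Made-up (hypertension, heart_disease) flags per patient (1 = yes, 0 = no); illustrative only.
patients = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0), (1, 1), (0, 0)]


def heart_disease_counts(rows):
    """Number of heart-disease cases for each hypertension status (the bar heights)."""
    counts = Counter()
    for hypertension, heart_disease in rows:
        label = "Hypertension" if hypertension else "No hypertension"
        counts[label] += heart_disease  # adds 0 or 1, so every group still appears
    return counts


# A plain-text stand-in for the bar chart.
for label, cases in heart_disease_counts(patients).items():
    print(f"{label:<16} {'#' * cases} {cases}")
```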
Data analytics often involves understanding large, complex systems. However, studying entire
populations is usually impossible due to time and cost constraints. This is where statistical
inference becomes crucial.
Statistical inference allows researchers to draw conclusions about a large population by studying a
smaller, representative sample. It helps transform limited data into meaningful insights that can
guide important decisions. By using mathematical techniques, analysts can predict trends,
identify patterns, and understand relationships within data.
For example, imagine a company wanting to know customer satisfaction. Instead of asking every
single customer, they can survey a smaller group and use statistical methods to estimate overall
satisfaction. This approach saves time and resources while providing reliable information.
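The customer-satisfaction example can be sketched as estimating a population proportion from a sample and attaching a margin of error. This assumes a simple random sample and uses the normal-approximation (Wald) 95% interval, which is reasonable when the sample contains at least roughly ten satisfied and ten unsatisfied responses; the survey figures (320 satisfied out of 400) and the helper name are hypothetical.

```python
import math


def proportion_confidence_interval(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a population proportion.

    z = 1.96 gives an approximately 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                               # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)     # z times the standard error
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical survey: 320 of 400 sampled customers report being satisfied.
p, low, high = proportion_confidence_interval(320, 400)
print(f"Estimated satisfaction: {p:.1%} (95% CI {low:.1%} to {high:.1%})")
# prints: Estimated satisfaction: 80.0% (95% CI 76.1% to 83.9%)
```

Surveying 400 customers instead of the whole customer base still pins overall satisfaction down to within about four percentage points, which is the trade-off the paragraph above describes.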
Statistical inference is used in many fields like business, healthcare, and social research. It helps
organizations make informed decisions by turning limited information into valuable insights. By
understanding uncertainty and detecting significant patterns, it provides a powerful tool for
decision-makers.
Essentially, statistical inference bridges the gap between small-scale observations and broader
understanding, making it a key technique in modern data analysis.
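The chi-square test of independence compares observed frequencies with the frequencies expected if
the two variables were independent. If the observed frequencies differ from the expected
frequencies by more than chance would explain (the test statistic exceeds the critical value),
independence is rejected, suggesting a relationship exists.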
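In symbols, the test statistic sums, over every cell, the squared gap between observed and expected counts divided by the expected count. The notation here is introduced for illustration: $O_{ij}$ and $E_{ij}$ are the observed and expected counts in row $i$, column $j$; $R_i$ and $C_j$ are the row and column totals; $N$ is the grand total; and $r$ and $c$ are the numbers of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i \times C_j}{N},
\qquad \text{df} = (r - 1)(c - 1)
```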
Observed Data:
Step 1: Calculate Expected Frequencies
Formula: (Row Total × Column Total) ÷ Grand Total
● Row Total: 75
● Column Total: 70
● Grand Total: 220
● Expected = (75 × 70) ÷ 220 = 23.86
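The expected-frequency formula is simple enough to check in code. A minimal sketch; `expected_frequency` and `expected_table` are hypothetical helpers, and the table form assumes the observed counts are given as a list of rows. It reproduces the 23.86 computed above from the stated totals.

```python
def expected_frequency(row_total: float, column_total: float, grand_total: float) -> float:
    """Expected cell count under independence: (row total × column total) ÷ grand total."""
    return row_total * column_total / grand_total


def expected_table(observed):
    """Expected counts for every cell of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[expected_frequency(r, c, grand) for c in col_totals] for r in row_totals]


# The cell worked out above: row total 75, column total 70, grand total 220.
print(round(expected_frequency(75, 70, 220), 2))  # 23.86
```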
Step 4: Significance
At the 5% significance level with 4 degrees of freedom, the critical value is 9.49.
Conclusion: Since the chi-square statistic (8.35) is less than the critical value (9.49), we cannot
reject the null hypothesis. There is no statistically significant evidence of a relationship
between education level and job satisfaction.
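The whole procedure (expected counts, statistic, degrees of freedom, comparison with the critical value) fits in one function. A minimal sketch: `chi_square_test` is a hypothetical helper, the critical value is passed in because the Python standard library has no chi-square quantile function (9.49 is the value the report uses for 4 degrees of freedom at 5%), and the 3×3 table below is made up because the report's observed data is not reproduced here. SciPy users could cross-check results with `scipy.stats.chi2_contingency`.

```python
def chi_square_test(observed, critical_value):
    """Chi-square test of independence on a contingency table given as a list of rows.

    Returns (statistic, degrees_of_freedom, reject_null).
    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected count under independence
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df, statistic > critical_value


# Made-up 3 x 3 table (e.g. education level by satisfaction level); not the report's data.
table = [[12, 18, 10], [20, 25, 15], [8, 12, 30]]
stat, df, reject = chi_square_test(table, critical_value=9.49)
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject}")
```

Applied to the report's own figures, the final comparison is 8.35 > 9.49, which is false, so the null hypothesis is not rejected, matching the conclusion above.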
Limitations: