UNIT-1

The document provides an overview of data analytics and visualization using R, detailing the nature, sources, types, and forms of data. It discusses the importance of data quality, the analytics workflow, and various applications across different sectors such as healthcare, marketing, and finance. Additionally, it highlights skills and tools necessary for business analytics and data science.

Uploaded by

chimneyhey
Copyright
© All Rights Reserved

Data Analytics & Visualization using R
What is Data?
• Data refers to information that is collected,
stored, and used for various purposes. It can
come in many forms, such as numbers, text,
images, and sounds, and is often used to
analyze, understand, and make decisions
about different aspects of the world.
Real Facts about data
• Data Explosion:
  • 97 Zettabytes by 2024: The world is projected to generate 97
    zettabytes of data by 2024, showcasing the massive growth since
    2020.
• Healthcare Data:
  • 2.3 Zettabytes by 2024: The healthcare sector is expected to
    manage 2.3 zettabytes of data, reflecting the extensive data
    accumulation from medical records, research, and patient care.
• Social Media Activity:
  • 500 Million Tweets Daily: About 500 million tweets are sent each
    day by roughly 330 million active users on Twitter, highlighting the
    significant data generation from social media interactions.
• Here are a few key points about data:

1. Sources of Data:

• Primary Data: Collected directly from original
sources through surveys, experiments, or
observations.

• Secondary Data: Gathered from existing
sources such as books, articles, databases, and
previously conducted studies.
2. Types of Data:

• Quantitative Data: Numerical information
that can be measured and quantified, such as
height, weight, temperature, and sales figures.

• Qualitative Data: Descriptive information that
cannot be easily measured, such as opinions,
feelings, and descriptions.
Types of Data:

• Labeled Data: Includes a label or
target variable that the model is trying to
predict.

• Unlabeled Data: Does not
include a label or target variable.
3. Forms of Data:

• Structured Data: Organized in a predefined
manner, often in tables or spreadsheets,
making it easy to search and analyze.

• Unstructured Data: Not organized in a
predefined way, such as emails, social media
posts, and multimedia files.
Data, Information, Knowledge

• INFORMATION: Data that has been
interpreted and processed so that it now
carries a meaningful inference for its users.

• KNOWLEDGE: A combination of inferred
information, experience, learning, and
insight. It results in awareness or concept
building for an individual or organization.
4. Uses of Data:

• Analysis: Examining data to uncover patterns,
trends, and insights.

• Decision Making: Using data to inform
business strategies, policy decisions, and
personal choices.

• Research: Conducting studies to answer
questions and test hypotheses.
Issues associated with data:
• Data Quality: Accurate, complete, and representative data is
crucial. Poor quality data leads to inaccurate or biased models.
• Data Quantity: Insufficient data can prevent training accurate
models, especially for complex problems needing large datasets.
• Bias and Fairness: Biased or unrepresentative training data can
result in unfair models, perpetuating discrimination.
• Overfitting and Underfitting: Overfitting occurs when a model is
too complex and fits training data too closely, while underfitting
happens when a model is too simple and misses important
patterns.
• Privacy and Security: Models can infer sensitive information, raising
privacy and security concerns.
CRISP-DM
ANALYTICS WORKFLOW
Business Understanding
For the business problem listed below, draft the business objectives
and constraints.
Business Problem:
Smart data platforms can bring together customer transaction data
and data from real-time communication streams to reveal
insights into customers' feelings about the services, which
helps address satisfaction-related issues and prevent
churn.

• Business Objective:
• Minimize: Churn rate (churn means customers moving to another
company for their needs)
• Maximize: Customer satisfaction (satisfied customers are
more loyal to the brand)

Business Constraints: Lack of data coverage for all customers
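The objective and constraint above can be made measurable. The following is a minimal Python sketch, assuming churn rate is defined as customers lost during a period divided by customers at the start of the period, and data coverage as the share of customers for whom data exists. The function names and all figures are hypothetical.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of customers lost during a period (assumed definition)."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def data_coverage(customers_with_data: int, total_customers: int) -> float:
    """Share of customers for whom transaction or feedback data exists."""
    if total_customers <= 0:
        raise ValueError("total_customers must be positive")
    return customers_with_data / total_customers


# Hypothetical figures for one quarter.
rate = churn_rate(customers_at_start=2000, customers_lost=150)
coverage = data_coverage(customers_with_data=1400, total_customers=2000)

print(f"Churn rate: {rate:.1%}")        # objective: minimize
print(f"Data coverage: {coverage:.1%}")  # constraint: below 100%
```

Tracking both numbers together shows how much of the customer base the churn estimate actually rests on.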


Data Understanding
• Data understanding is a crucial early phase
of any data analytics or machine learning
project, following business understanding.
• It involves exploring, describing, and
assessing the data to ensure its quality and
suitability for analysis.
1. Identifying Data Sources
• Before analysis, it is essential to determine where
the data comes from:
✅ Structured Data (Databases, Spreadsheets, APIs)
✅ Unstructured Data (Social Media, Text, Images,
Videos)
✅ Semi-Structured Data (JSON, XML, Web Logs)
2. Data Collection & Formats
• Data can be stored in different formats, such as:
✔ CSV (Comma-Separated Values) – Used for tabular
data
✔ Excel (XLSX) – Popular for business reporting
✔ SQL Databases – Structured storage for relational
data
✔ JSON/XML – Used for web and API data
✔ Big Data Storage & Processing – Hadoop, Spark, Cloud
Platforms
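As an illustration of two of the formats listed above, here is a small Python sketch that reads CSV and JSON using only the standard library. The field names and values are hypothetical, and the data is held in strings rather than files so the example is self-contained.

```python
import csv
import io
import json

# Hypothetical tabular data in CSV form.
csv_text = "customer_id,plan,monthly_spend\n1,basic,20.5\n2,premium,49.0\n"

# Hypothetical semi-structured record in JSON form.
json_text = '{"customer_id": 3, "plan": "basic", "tags": ["new", "mobile"]}'

# csv.DictReader yields one dict per row; every value arrives as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
total_spend = sum(float(r["monthly_spend"]) for r in rows)

# json.loads keeps nested structure and native types (int, list, ...).
record = json.loads(json_text)

print(rows[0]["plan"], total_spend, record["tags"])
```

Note the difference: CSV values need explicit type conversion, while JSON preserves numbers and nested lists.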
Data Preparation
• Handling Missing Values
• Removal of Duplicate Records
• Fixing Data Types
• Handling Outliers
• Standardizing & Normalizing the Data
• Feature Engineering
• Data Encoding
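Several of the preparation steps above can be sketched in a few lines of Python. This is a minimal illustration on hypothetical records, covering duplicate removal, type fixing, median imputation of missing values, min-max normalization, and one-hot encoding. Outlier handling and feature engineering are not shown, and the choice of median imputation is an assumption, not the only option.

```python
import statistics

# Hypothetical raw records: ages stored as strings, one missing, one duplicate.
raw = [
    {"id": 1, "age": "34", "city": "Pune"},
    {"id": 2, "age": None, "city": "Delhi"},
    {"id": 2, "age": None, "city": "Delhi"},  # duplicate of id 2
    {"id": 3, "age": "29", "city": "Pune"},
    {"id": 4, "age": "41", "city": "Mumbai"},
]

# Removal of duplicate records: keep the first record seen for each id.
seen = set()
rows = []
for r in raw:
    if r["id"] not in seen:
        seen.add(r["id"])
        rows.append(dict(r))

# Fixing data types and handling missing values (median imputation).
known_ages = [int(r["age"]) for r in rows if r["age"] is not None]
median_age = statistics.median(known_ages)
for r in rows:
    r["age"] = int(r["age"]) if r["age"] is not None else median_age

# Normalizing: min-max scaling of age into the range [0, 1].
lo = min(r["age"] for r in rows)
hi = max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

# Data encoding: one-hot encode the categorical city column.
cities = sorted({r["city"] for r in rows})
for r in rows:
    for c in cities:
        r[f"city_{c}"] = int(r["city"] == c)

for r in rows:
    print(r)
```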
Data Analysis
• Descriptive Analytics
• Diagnostic Analytics
• Predictive Analytics
• Prescriptive Analytics
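To make the first and third types above concrete, the sketch below summarizes a hypothetical sales series (descriptive) and then extrapolates a least-squares trend line one month ahead (a very simple form of predictive). The data and variable names are assumptions for illustration; real predictive work would normally use a richer model and proper validation.

```python
import statistics

# Hypothetical monthly sales figures (units sold).
monthly_sales = [120, 135, 128, 150, 160, 155]

# Descriptive: summarize what happened.
summary = {
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# Predictive: fit y = intercept + slope * x by least squares,
# where x is the month index 0, 1, 2, ...
n = len(monthly_sales)
xs = list(range(n))
x_mean = statistics.mean(xs)
y_mean = summary["mean"]
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, monthly_sales))
sxx = sum((x - x_mean) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Extrapolate to the next month (index n).
next_month_forecast = intercept + slope * n

print(summary)
print(f"Trend: {slope:.2f} units/month, forecast: {next_month_forecast:.1f}")
```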
Reporting / Visualization
• Executive Summary: High-level overview of key findings.
• Data & Methodology: Description of data sources, data cleaning, and
analysis techniques.
• Visualizations & Insights: Charts, graphs, and tables to present insights
effectively.
• Findings & Interpretations: Explanation of trends, patterns, and key takeaways.
• Recommendations & Conclusion: Business implications and suggested actions based
on data.
Validation
• Types of Validation
1. Data Validation
2. Model Validation
3. Business Validation
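The first two validation types can be illustrated with a short Python sketch. The rules in `validate_record` (required ID, plausible age range, non-negative spend) and the hold-out labels are hypothetical; business validation depends on stakeholder review and is not something code alone can show.

```python
def validate_record(record):
    """Data validation: return a list of problems (empty list means valid)."""
    problems = []
    if record.get("customer_id") is None:
        problems.append("missing customer_id")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age missing or out of range")
    spend = record.get("monthly_spend")
    if not isinstance(spend, (int, float)) or spend < 0:
        problems.append("monthly_spend missing or negative")
    return problems


def accuracy(actual, predicted):
    """Model validation: share of hold-out labels predicted correctly."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical records: the second one violates every rule.
records = [
    {"customer_id": 1, "age": 34, "monthly_spend": 20.5},
    {"customer_id": None, "age": 150, "monthly_spend": -5},
]
report = [validate_record(r) for r in records]

# Hypothetical hold-out labels (1 = churned) and model predictions.
holdout_accuracy = accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0])

print(report)
print(f"Hold-out accuracy: {holdout_accuracy:.0%}")
```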
Applications of Analytics
Customer Analytics
📊 Understanding customer behavior, preferences,
and retention

✅ Applications:
• Customer segmentation for targeted marketing
• Predicting customer churn and loyalty
• Sentiment analysis from social media & reviews
• 🔹 Example: Netflix uses analytics to recommend
shows based on user preferences.
Marketing Analytics
📢 Optimizing marketing strategies through data-
driven insights

✅ Applications:
• Analyzing campaign performance (Google Ads,
Facebook Ads)
• A/B testing for ad creatives & landing pages
• Customer lifetime value (CLV) prediction
• 🔹 Example: Amazon uses predictive analytics to
recommend products and improve conversions.
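A/B testing, listed among the applications above, is commonly evaluated with a two-proportion z-test. The sketch below is one way to do that in Python with the standard library; the conversion counts are hypothetical, and the pooled-variance z-test is an assumed choice of method rather than something prescribed by any particular ad platform.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: landing page A vs. landing page B.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone, under the test's assumptions.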
Financial Analytics
💰 Risk assessment, fraud detection, and
financial forecasting

✅ Applications:
• Credit Scoring for Loan Approvals
• Fraud Detection in Banking Transactions
• Stock Market Trend Prediction
Example: Banks like JPMorgan Chase use AI-
driven analytics to detect fraudulent
transactions in real time.
Supply Chain & Logistics Analytics
🚚 Optimizing inventory, reducing costs, and
improving logistics

✅ Applications:
• Route optimization for faster deliveries
• Demand forecasting to avoid
overstocking/understocking
• Predictive maintenance of delivery vehicles
• Example: Walmart uses analytics to predict
product demand and optimize its supply
chain.
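Demand forecasting can start from something as simple as a moving average. The Python sketch below forecasts the next period as the mean of the last few observations. The weekly figures and the `window` size are hypothetical, and this baseline is offered only as an illustration, not as the method any particular retailer uses.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` observations")
    return sum(history[-window:]) / window


# Hypothetical weekly demand for one product (units).
weekly_demand = [100, 110, 105, 120, 125, 130]
forecast = moving_average_forecast(weekly_demand, window=3)

# Compare against current stock to flag a possible understock.
stock_on_hand = 110  # hypothetical
needs_reorder = forecast > stock_on_hand

print(f"Forecast: {forecast:.1f} units, reorder: {needs_reorder}")
```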
HR & Workforce Analytics
👥 Enhancing employee productivity and
engagement

✅ Applications:
• Predicting employee attrition
• Recruitment analytics to hire the best talent
• Employee performance evaluation using KPIs
Example: Google uses people analytics to
optimize hiring and retain top talent.
Healthcare & Medical Analytics

🏥 Improving patient care and operational
efficiency

✅ Applications:
• Predictive analytics for disease outbreaks
• AI-driven diagnosis from medical imaging
• Hospital resource optimization (beds, staff,
equipment)
• Example: IBM Watson uses AI to assist doctors
in diagnosing diseases more accurately.
• Retail & Ecommerce Analytics
• Manufacturing & Operations Analytics
• Sports & Performance Analytics
• Fraud detection & Cybersecurity Analytics
and many more…
Text Analytics
• What is Text Analytics?
• Text analytics (also known as text mining) is the
process of deriving meaningful insights from
unstructured textual data. It involves techniques
such as natural language processing (NLP),
machine learning, and statistical methods to
extract patterns, trends, and sentiments from
text sources like customer reviews, emails, social
media, and more.
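A very small Python sketch of the idea: tokenize some review text, count word frequencies, and score sentiment against a word list. The reviews and the tiny `POSITIVE` / `NEGATIVE` lexicons are hypothetical; production text analytics would rely on NLP libraries and trained models rather than a hand-written word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"great", "good", "love", "fast"}
NEGATIVE = {"bad", "slow", "poor", "terrible"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Positive word count minus negative word count."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


# Hypothetical customer reviews.
reviews = [
    "Great product, fast delivery, love it",
    "Terrible support and slow delivery",
    "Good value but poor packaging",
]

scores = [sentiment_score(r) for r in reviews]
word_counts = Counter(t for r in reviews for t in tokenize(r))

print(scores)
print(word_counts.most_common(3))
```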
Applications of Text Analytics in
Business
• Sentiment Analysis
• Customer Feedback Analysis
• Chatbot & Virtual Assistants
• Fraud Detection & Cybersecurity
• Healthcare & Medical Text Mining
• Legal & Compliance Monitoring
• Market Research & Competitive Analysis
• Topic Modeling & Document Clustering
Web Analytics
• What is Web Analytics?
• Web analytics is the process of collecting,
analyzing, and interpreting web data to
optimize website performance, enhance user
experience, and improve digital marketing
strategies. It helps businesses track visitor
behavior, measure engagement, and make
data-driven decisions.
Key Metrics in Web Analytics
📊 Traffic Metrics:
✔ Sessions & Users – Number of visits and unique visitors
✔ Pageviews – Total number of pages viewed
✔ Bounce Rate – Percentage of users who leave after visiting
one page
🎯 Engagement Metrics:
✔ Average Session Duration – Time spent per visit
✔ Pages per Session – Average pages viewed in one visit
✔ Click-Through Rate (CTR) – Percentage of users clicking on
links
💰 Conversion Metrics:
✔ Conversion Rate – Percentage of visitors who complete a
goal (purchase, sign-up)
✔ Cart Abandonment Rate – Percentage of users leaving items
in the cart
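Several of the metrics above can be computed directly from session records. The Python sketch below uses a hypothetical session log. It assumes bounce rate is measured as the share of single-page sessions and conversion rate as the share of unique visitors with at least one converting session; analytics tools may define these slightly differently.

```python
# Hypothetical session log: one dict per visit.
sessions = [
    {"user": "u1", "pages": 1, "seconds": 10, "converted": False},
    {"user": "u2", "pages": 4, "seconds": 180, "converted": True},
    {"user": "u1", "pages": 3, "seconds": 95, "converted": False},
    {"user": "u3", "pages": 1, "seconds": 5, "converted": False},
]

# Traffic metrics.
total_sessions = len(sessions)
unique_users = len({s["user"] for s in sessions})
pageviews = sum(s["pages"] for s in sessions)
bounce_rate = sum(s["pages"] == 1 for s in sessions) / total_sessions

# Engagement metrics.
avg_session_duration = sum(s["seconds"] for s in sessions) / total_sessions
pages_per_session = pageviews / total_sessions

# Conversion metric: visitors with at least one converting session.
converting_users = {s["user"] for s in sessions if s["converted"]}
conversion_rate = len(converting_users) / unique_users

print(f"Sessions: {total_sessions}, users: {unique_users}, pageviews: {pageviews}")
print(f"Bounce rate: {bounce_rate:.0%}")
print(f"Avg. session duration: {avg_session_duration:.1f} s")
print(f"Pages per session: {pages_per_session:.2f}")
print(f"Conversion rate: {conversion_rate:.1%}")
```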
Applications of Web Analytics in
Business
• E-Commerce Optimization
• Digital Marketing Performance
• User Experience (UX) Improvement
• Customer Retention & Personalization
• Fraud Detection & Security
Skills for Business Analytics
• Data analytics & Statistics
• Data Visualization
• Database Management & SQL
• Machine learning & Predictive Analytics
• Business & Analytical Thinking
• Communication & Storytelling with data
BA Tools
• Excel
• SQL
• Python / R
• Tableau / Power BI
• Google Analytics
Concepts of DS
Skills Required for DS
• Programming
• Statistics & Mathematics
• Data handling & processing
• ML & AI
• Data Visualization & Storytelling
• Business & Domain Knowledge
Thank You
