Da Imp Qna Cleaned

Uploaded by

sreejaravinderreddy

DA IMP QNA

### 1. Define variable rationalization. What techniques are applied in rationalization?

- **Definition**:

Variable rationalization is a method for reducing the number of variables (columns) in a dataset. It keeps only the variables that contribute to analysis or prediction and removes unimportant or redundant ones. This makes the data easier to work with and can improve the performance of machine learning models.

- **Why It's Important**:

1. Reduces the size of the dataset, making it faster to analyze.

2. Lowers the risk of overfitting in machine learning models.

3. Makes the data easier to understand.

- **Techniques**:

1. **Feature Selection**:

This involves selecting the variables that have the most influence on the output.

**Example**: In predicting house prices, variables like "Size of the House" and "Location" are important, but "Color of the Walls" may not be.

2. **Removing Correlated Variables**:

If two variables carry the same information, one can be removed.

**Example**: "Monthly Income" and "Annual Income" are almost perfectly correlated, so we can keep only one.

3. **Principal Component Analysis (PCA)**:

This combines multiple related variables into a smaller number of uncorrelated components that retain most of the variance in the data.

**Example**: In an exam dataset, 10 test scores might be reduced to 2 key components summarizing performance.

4. **Eliminating Low-Variance Variables**:

Variables that barely change provide little useful information.

**Example**: A column where most values are the same, like "Country: India" for 99% of entries, can be removed.
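
Two of the techniques above (dropping low-variance columns and dropping one of each highly correlated pair) can be sketched in plain Python. This is a minimal illustration, not a standard implementation: the `rationalize` function, the dict-of-lists layout, the sample values, and both thresholds are assumptions made for the example, and the constant column is numeric here rather than a text label like "Country: India".

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rationalize(columns, var_threshold=1e-9, corr_threshold=0.95):
    """Return the names of the columns kept after two simple filters.

    columns: dict mapping column name -> list of numbers (hypothetical layout).
    The thresholds are illustrative defaults, not recommended values.
    """
    # Step 1: drop columns whose values barely change.
    kept = [name for name, vals in columns.items()
            if pvariance(vals) > var_threshold]

    # Step 2: keep a column only if it is not highly correlated
    # with a column that was already selected.
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[other])) < corr_threshold
               for other in selected):
            selected.append(name)
    return selected


data = {
    "monthly_income": [3000, 4500, 5200, 6100, 7000],
    "annual_income": [36000, 54000, 62400, 73200, 84000],  # 12 x monthly
    "house_size": [70, 120, 65, 150, 90],
    "country_code": [1, 1, 1, 1, 1],  # constant, so zero variance
}
kept_columns = rationalize(data)
print(kept_columns)
```

With this sample data, `country_code` is dropped for having zero variance and `annual_income` is dropped because it duplicates `monthly_income`.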

---

### 2. What is data visualization? List the various tools used in data visualization.

- **Definition**:

Data visualization is the process of creating graphs, charts, and other visuals that make data easier to understand. It helps people see patterns, trends, and outliers in data.

- **Why Use Data Visualization?**

1. To quickly understand large datasets.

2. To identify trends, like sales increasing over time.

3. To compare categories, such as revenue from different cities.

4. To explain complex data in a simple and attractive way.

- **Types of Visualizations**:

1. **Line Chart**: Used to show trends over time (e.g., monthly sales).

2. **Bar Chart**: Used to compare categories (e.g., revenue from different products).

3. **Pie Chart**: Used to show proportions (e.g., market share of companies).

4. **Scatter Plot**: Used to show relationships (e.g., age vs. income).

- **Tools**:

1. Tableau: Advanced dashboards for businesses.

2. Power BI: Simple and interactive visualizations.

3. Matplotlib & Seaborn: Python libraries for charts and plots.

4. Excel: Basic charts for simple analysis.

5. Looker Studio (formerly Google Data Studio): Online tool to connect to data sources and display data.

---

### 3. Construct a decision tree. Why is pruning done?

- **What is a Decision Tree?**

A decision tree is a flowchart-like model used to make decisions or predictions. It works by splitting data step by step based on questions or rules.

- **Steps to Build a Decision Tree**:

1. Start with all the data.

2. Ask the question that best splits the data into two groups.

3. Keep splitting each group until you reach a decision at the bottom (a leaf).

- **Example**:

A bank deciding loan approval:

1. Is Income > $50,000?

- Yes → Check Credit Score.

- No → Loan Not Approved.

2. Is Credit Score > 700?

- Yes → Loan Approved.

- No → Loan Not Approved.
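
The loan example above can be written directly as nested conditions, which is what a trained decision tree amounts to once its rules are fixed. This is only a sketch: the function name `approve_loan` is made up for illustration, and the thresholds are taken from the example rather than learned from data.

```python
def approve_loan(income, credit_score):
    """Follow the two-question tree from the loan example."""
    if income > 50_000:          # Question 1: Is Income > $50,000?
        if credit_score > 700:   # Question 2: Is Credit Score > 700?
            return "Loan Approved"
        return "Loan Not Approved"
    return "Loan Not Approved"


print(approve_loan(60_000, 720))
print(approve_loan(60_000, 650))
print(approve_loan(40_000, 800))
```

Only the first applicant passes both questions. The third is rejected at the first question, so the credit score is never checked.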

- **Why Is Pruning Done?**

1. **Definition**: Pruning means removing unnecessary branches from the decision tree.

2. **Reason**: If a tree becomes too complex, it learns noise in the training data (overfitting) and performs poorly on new data.

- **Example**: A branch uses "Day of the Week" to decide loan approval, but this is not useful. Pruning removes it.

3. **Benefits**: Makes the tree simpler, faster, and often more accurate on new data.
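
One common way to prune is reduced-error pruning: replace a subtree with a single leaf whenever that does not lower accuracy on held-out validation data. The sketch below applies it to a loan tree that contains a spurious "Day of the Week" branch. The nested-dict tree layout, the `day_of_week` encoding (0 to 6, with values above 4 meaning the weekend), and the validation rows are all assumptions invented for this example.

```python
from collections import Counter


def predict(tree, row):
    """Walk the tree until a leaf (a plain string label) is reached."""
    while isinstance(tree, dict):
        branch = "yes" if row[tree["feature"]] > tree["threshold"] else "no"
        tree = tree[branch]
    return tree


def accuracy(tree, rows):
    """Fraction of rows whose label the tree predicts correctly."""
    return sum(predict(tree, r) == r["label"] for r in rows) / len(rows)


def prune(tree, rows):
    """Reduced-error pruning using the validation rows that reach this node."""
    if not isinstance(tree, dict) or not rows:
        return tree
    yes_rows = [r for r in rows if r[tree["feature"]] > tree["threshold"]]
    no_rows = [r for r in rows if r[tree["feature"]] <= tree["threshold"]]
    # Prune the children first, then decide whether this node is worth keeping.
    pruned = dict(tree,
                  yes=prune(tree["yes"], yes_rows),
                  no=prune(tree["no"], no_rows))
    majority = Counter(r["label"] for r in rows).most_common(1)[0][0]
    # Ties go to the single leaf, so the simpler tree wins.
    if accuracy(majority, rows) >= accuracy(pruned, rows):
        return majority
    return pruned


full_tree = {
    "feature": "income", "threshold": 50_000,
    "yes": {
        "feature": "credit_score", "threshold": 700,
        "yes": {  # spurious branch learned from noise
            "feature": "day_of_week", "threshold": 4,
            "yes": "Rejected", "no": "Approved",
        },
        "no": "Rejected",
    },
    "no": "Rejected",
}

validation_rows = [
    {"income": 60_000, "credit_score": 750, "day_of_week": 2, "label": "Approved"},
    {"income": 80_000, "credit_score": 720, "day_of_week": 6, "label": "Approved"},
    {"income": 55_000, "credit_score": 650, "day_of_week": 1, "label": "Rejected"},
    {"income": 30_000, "credit_score": 780, "day_of_week": 5, "label": "Rejected"},
    {"income": 70_000, "credit_score": 710, "day_of_week": 5, "label": "Approved"},
    {"income": 45_000, "credit_score": 600, "day_of_week": 3, "label": "Rejected"},
]

pruned_tree = prune(full_tree, validation_rows)
print(pruned_tree)
print(accuracy(full_tree, validation_rows), accuracy(pruned_tree, validation_rows))
```

On these rows the `day_of_week` branch is replaced by an "Approved" leaf, while the income and credit-score questions survive because removing them would lower validation accuracy.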

---

### 4. Explain the applications of analytics in various business domains.

- **What is Analytics?**

Analytics means studying data to find useful patterns, trends, and insights. Different industries use analytics to make better decisions.

- **Applications**:

1. **Healthcare**:

- Predict diseases (e.g., identifying people at risk of heart attacks).

- Track patient recovery and hospital performance.

- **Example**: Hospitals use data to predict how many beds will be needed during flu season.

2. **Finance**:

- Detect fraud in transactions.

- Predict stock prices or market trends.

- **Example**: Banks use analytics to decide whether a customer is eligible for a loan.

3. **Retail**:

- Analyze customer behavior to recommend products.

- Optimize inventory to avoid overstock or shortages.

- **Example**: Amazon recommends products based on browsing history.

4. **Manufacturing**:

- Predict when machines will break down (predictive maintenance).

- Improve product quality using defect analysis.

- **Example**: Factories use analytics to schedule machine repairs before a failure occurs.

5. **Marketing**:

- Target the right customers with ads.

- Measure how well ads perform.

- **Example**: A company sends offers to customers who haven't shopped recently.

---

### 5. Difference between supervised and unsupervised learning.

| **Feature** | **Supervised Learning** | **Unsupervised Learning** |
|-------------|-------------------------|---------------------------|
| Definition | Uses labeled data (data with answers). | Uses unlabeled data (data without answers). |
| Goal | Predict outcomes or classify data. | Find hidden patterns or groups. |
| Examples | Regression, Classification (e.g., Predicting sales). | Clustering (e.g., Grouping customers by spending). |
| Data Needed | Labeled (e.g., Age, Income, Loan Approved Yes/No). | Unlabeled (e.g., Customer Purchase History). |
| Algorithms | Decision Trees, Neural Networks, Linear Regression. | K-Means, PCA, Hierarchical Clustering. |

- **Example of Supervised Learning**:

Predicting house prices based on features like size, location, and number of bedrooms.

- **Example of Unsupervised Learning**:

Grouping customers into high, medium, and low spenders based on their shopping history.
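
The two examples above can be contrasted in a few lines of plain Python. The supervised half fits a straight line to labeled (size, price) pairs and predicts a new price. The unsupervised half runs a simple one-dimensional k-means on spending amounts with no labels at all. All numbers, the starting centers, and the helper names `fit_line` and `kmeans_1d` are made up for illustration. Real work would more likely use a library such as scikit-learn.

```python
from statistics import mean

# --- Supervised: every example comes with its answer (the price). ---
sizes = [50, 80, 100, 120, 150]       # feature: house size (made-up values)
prices = [150, 240, 300, 360, 450]    # label: price in thousands (made-up)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 110 + intercept   # predict for an unseen house
print(predicted_price)


# --- Unsupervised: only the data, no answers; the groups are discovered. ---
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on a list of numbers, starting from the given centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers


spending = [20, 25, 30, 200, 220, 210, 900, 950, 1000]
start = [min(spending), mean(spending), max(spending)]
groups, centers = kmeans_1d(spending, start)
print(groups)
```

The line fit needs the `prices` labels to learn from, while `kmeans_1d` never sees a label and still separates low, medium, and high spenders.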

---

### 6. STL Approach

- **What is STL?**

STL (Seasonal and Trend decomposition using Loess) is a method for analyzing time-series data. It breaks the data into three parts:

1. Seasonal: Repeating patterns (e.g., sales peak every weekend).

2. Trend: Long-term movement (e.g., sales gradually increasing).

3. Residual: What remains after the seasonal and trend parts are removed (random noise or outliers).

- **Why Use STL?**

1. To understand patterns in time-series data.

2. To forecast future values.

3. To detect anomalies (e.g., sudden drops or spikes).

- **Example**:

A shop tracks daily sales:

1. Seasonal: Sales increase on weekends.

2. Trend: Sales are rising month by month.

3. Residual: Random spikes during holidays or discounts.
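
The shop example can be illustrated with a much simpler stand-in for STL. The sketch below is a classical additive decomposition (a centered moving average for the trend, and per-weekday averages for the seasonal part), not STL itself, which uses Loess smoothing and is more robust to outliers. The synthetic sales series, the weekend bump, and the one-off spike on day 17 are invented for the example.

```python
from statistics import mean


def decompose(series, period):
    """Classical additive decomposition for an odd period (simplified, not STL)."""
    half = period // 2
    n = len(series)

    # Trend: centered moving average; left as None at the edges.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])

    # Seasonal: average detrended value at each position in the cycle.
    detrended = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            detrended[i % period].append(series[i] - trend[i])
    pattern = [mean(d) for d in detrended]
    offset = mean(pattern)
    pattern = [p - offset for p in pattern]   # make the pattern average to zero
    seasonal = [pattern[i % period] for i in range(n)]

    # Residual: whatever trend and seasonal parts do not explain.
    residual = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual


# Four weeks of synthetic daily sales: rising trend plus a weekend bump.
sales = [100 + 2 * d + (30 if d % 7 in (5, 6) else 0) for d in range(28)]
sales[17] += 50   # one-off spike, e.g. a discount day

trend, seasonal, residual = decompose(sales, period=7)
defined = [i for i in range(len(sales)) if residual[i] is not None]
spike_day = max(defined, key=lambda i: abs(residual[i]))
print(spike_day)
```

The day with the largest residual is the discount day, which is how a decomposition helps flag anomalies. For real data, the `STL` class in `statsmodels.tsa.seasonal` provides an actual STL implementation.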

- **Applications**:

1. Predicting sales for the next month.

2. Analyzing website traffic trends.

3. Detecting unusual behavior in electricity usage.

