
DA IMP QNA

### 1. Define variable rationalization. What techniques are applied in rationalization?

- **Definition**:

Variable rationalization is a method used to reduce the number of variables (or columns) in a dataset. It keeps only the variables that help in analysis or prediction and removes the unimportant or redundant ones. This makes the data easier to work with and can improve the performance of machine learning models.

- **Why It's Important**:

1. Reduces the size of the dataset, making it faster to analyze.

2. Reduces the risk of overfitting in machine learning models.

3. Simplifies the understanding of the data.

- **Techniques**:

1. **Feature Selection**:

This involves selecting the variables that have the most influence on the output.

**Example**: In predicting house prices, variables like "Size of the House" and "Location" are important, but "Color of the Walls" may not be.

2. **Removing Correlated Variables**:

If two variables are highly correlated, they carry nearly the same information, so one can be removed.

**Example**: "Monthly Income" and "Annual Income" are almost perfectly correlated; we can keep only one.

3. **Principal Component Analysis (PCA)**:

This combines multiple related variables into fewer uncorrelated variables (principal components) that retain most of the variation in the data.

**Example**: In an exam dataset, 10 test scores can be reduced to 2 key components summarizing performance.

4. **Eliminating Low-Variance Variables**:

Variables that don't change much don't provide useful information.

**Example**: A column where most values are the same, like "Country: India" for 99% of entries, can be removed.
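
Two of the techniques above (eliminating low-variance variables and removing correlated variables) can be sketched in plain Python. This is a minimal illustration, not a standard library routine: the function names, both thresholds, and the toy data are assumptions made for the example.

```python
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def rationalize(columns, var_threshold=1e-9, corr_threshold=0.95):
    """Return the names of the columns to keep.

    columns: dict mapping column name -> list of numbers.
    Both thresholds are illustrative defaults, not recommended values.
    """
    # Step 1: drop columns whose variance is (almost) zero.
    kept = [name for name, values in columns.items()
            if pvariance(values) > var_threshold]
    # Step 2: drop a column if it is highly correlated with one already selected.
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[other])) < corr_threshold
               for other in selected):
            selected.append(name)
    return selected

# Hypothetical toy dataset
data = {
    "monthly_income": [3000, 4500, 5200, 6100, 7000],
    "annual_income": [36000, 54000, 62400, 73200, 84000],  # 12 x monthly income
    "house_size": [90, 60, 120, 80, 150],
    "country_code": [1, 1, 1, 1, 1],  # constant column
}
print(rationalize(data))  # ['monthly_income', 'house_size']
```

Here "annual_income" is dropped because it duplicates "monthly_income", and "country_code" is dropped because it never changes.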

---

### 2. What is data visualization? List the various tools used in data visualization.

- **Definition**:

Data visualization is the process of creating graphs, charts, and other visuals to make data easy to understand. It helps people see patterns, trends, or outliers in data.

- **Why Use Data Visualization?**

1. To quickly understand large datasets.

2. To identify trends, like sales increasing over time.

3. To compare categories, such as revenue from different cities.

4. To explain complex data in a simple and attractive way.

- **Types of Visualizations**:
1. **Line Chart**: Used to show trends over time (e.g., monthly sales).

2. **Bar Chart**: Used to compare categories (e.g., revenue from different products).

3. **Pie Chart**: Used to show proportions (e.g., market share of companies).

4. **Scatter Plot**: Used to show relationships (e.g., age vs. income).

- **Tools**:

1. Tableau: Advanced dashboards for businesses.

2. Power BI: Simple and interactive visualizations.

3. Matplotlib & Seaborn: Python libraries for charts and plots.

4. Excel: Basic charts for simple analysis.

5. Google Data Studio (now Looker Studio): Online tool to connect and display data.

---

### 3. Construct a decision tree. Why is pruning done?

- **What is a Decision Tree?**

A decision tree is a flowchart-like model used to make decisions or predictions. It works by splitting data step by step based on questions or rules.

- **Steps to Build a Decision Tree**:

1. Start with all the data at the root.

2. Ask the question (a feature and a threshold) that best splits the data into two groups.

3. Keep splitting each group until you reach a decision (a leaf) at the bottom.
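
Step 2 needs a way to compare candidate questions. One common measure is Gini impurity (the notes do not name a criterion, so this choice is an assumption). The sketch below tries every threshold on a single numeric feature and keeps the one whose two groups are purest; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_threshold(values, labels):
    """Try each question 'value > t' and return (t, weighted impurity)
    for the threshold that gives the lowest weighted Gini impurity."""
    best_t, best_score = None, float("inf")
    # Skip the largest value: 'value > max' would leave one group empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Hypothetical training data: income in thousands and the loan decision
incomes = [30, 40, 45, 55, 60, 80]
approved = ["No", "No", "No", "Yes", "No", "Yes"]
threshold, impurity = best_threshold(incomes, approved)
print(threshold)  # 45
```

Building a full tree repeats this search on each resulting group until the groups are pure or a stopping rule is reached.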

- **Example**:

A bank deciding loan approval:

1. Is Income > $50,000?

- Yes → Check Credit Score.

- No → Loan Not Approved.

2. Is Credit Score > 700?

- Yes → Loan Approved.

- No → Loan Not Approved.
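
The loan example above maps directly onto nested conditions. A minimal sketch, where the function name `approve_loan` is hypothetical and the thresholds are the ones from the example:

```python
def approve_loan(income, credit_score):
    """Follow the two questions of the example tree from top to bottom."""
    if income > 50_000:             # Question 1: Is Income > $50,000?
        if credit_score > 700:      # Question 2: Is Credit Score > 700?
            return "Loan Approved"
        return "Loan Not Approved"
    return "Loan Not Approved"

print(approve_loan(60_000, 750))  # Loan Approved
print(approve_loan(60_000, 650))  # Loan Not Approved
print(approve_loan(40_000, 800))  # Loan Not Approved
```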

- **Why Is Pruning Done?**

1. **Definition**: Pruning means removing unnecessary branches from the decision tree.

2. **Reason**: If a tree becomes too complex, it learns noise in the training data and performs poorly on new data (overfitting).

- **Example**: A branch uses "Day of the Week" to decide loan approval, but this is not useful. Pruning removes it.

3. **Benefits**: Makes the tree simpler, faster, and usually more accurate on new data.
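
One simple way to prune is reduced-error pruning: working from the bottom up, replace a branch with a single leaf whenever that does not lower accuracy on held-out validation data. The notes do not say which pruning method is meant, so treat this as one possible sketch. The nested-dict tree format, the function names, and the validation rows are all assumptions for illustration.

```python
def predict(node, row):
    """Follow the rules from the root down to a leaf label."""
    while isinstance(node, dict):
        branch = "right" if row[node["feature"]] > node["threshold"] else "left"
        node = node[branch]
    return node

def accuracy(tree, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(tree, r) == r["label"] for r in rows) / len(rows)

def prune(node, rows):
    """Reduced-error pruning using the validation rows that reach this node."""
    if not isinstance(node, dict) or not rows:
        return node  # leaves, and branches no validation row reaches, stay as they are
    left_rows = [r for r in rows if r[node["feature"]] <= node["threshold"]]
    right_rows = [r for r in rows if r[node["feature"]] > node["threshold"]]
    # Prune the children first, building a copy so the input tree is untouched.
    node = dict(node, left=prune(node["left"], left_rows),
                right=prune(node["right"], right_rows))
    labels = [r["label"] for r in rows]
    majority = max(dict.fromkeys(labels), key=labels.count)
    leaf_accuracy = labels.count(majority) / len(labels)
    # Collapse the branch if a single leaf does at least as well.
    return majority if leaf_accuracy >= accuracy(node, rows) else node

# Hypothetical overgrown tree: the deepest branch splits on day of the week.
tree = {
    "feature": "income", "threshold": 50_000,
    "left": "Not Approved",
    "right": {
        "feature": "credit_score", "threshold": 700,
        "left": "Not Approved",
        "right": {"feature": "day_of_week", "threshold": 3,
                  "left": "Approved", "right": "Not Approved"},
    },
}
validation = [
    {"income": 60_000, "credit_score": 750, "day_of_week": 2, "label": "Approved"},
    {"income": 70_000, "credit_score": 720, "day_of_week": 5, "label": "Approved"},
    {"income": 80_000, "credit_score": 650, "day_of_week": 1, "label": "Not Approved"},
    {"income": 40_000, "credit_score": 800, "day_of_week": 4, "label": "Not Approved"},
    {"income": 90_000, "credit_score": 780, "day_of_week": 6, "label": "Approved"},
    {"income": 30_000, "credit_score": 600, "day_of_week": 2, "label": "Not Approved"},
]
pruned = prune(tree, validation)
print(pruned["right"]["right"])  # Approved
```

On this data the "day_of_week" branch is replaced by a single "Approved" leaf, while the income and credit-score questions are kept because removing them would lower validation accuracy.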

---

### 4. Explain the applications of analytics in various business domains.

- **What is Analytics?**

Analytics means studying data to find useful patterns, trends, and insights. Different industries use analytics to make better decisions.

- **Applications**:

1. **Healthcare**:
- Predict diseases (e.g., identifying people at risk of heart attacks).

- Track patient recovery and hospital performance.

- **Example**: Hospitals use data to predict how many beds will be needed during flu season.

2. **Finance**:

- Detect fraud in transactions.

- Predict stock prices or market trends.

- **Example**: Banks use analytics to decide whether a customer is eligible for a loan.

3. **Retail**:

- Analyze customer behavior to recommend products.

- Optimize inventory to avoid overstock or shortages.

- **Example**: Amazon recommends products based on browsing history.

4. **Manufacturing**:

- Predict when machines will break down (predictive maintenance).

- Improve product quality using defect analysis.

- **Example**: Factories use analytics to schedule machine repairs.

5. **Marketing**:

- Target the right customers with ads.

- Measure how well ads perform.

- **Example**: A company sends offers to customers who haven't shopped recently.

---

### 5. Difference between supervised and unsupervised learning.


| **Feature** | **Supervised Learning** | **Unsupervised Learning** |
|---|---|---|
| **Definition** | Uses labeled data (data with answers). | Uses unlabeled data (data without answers). |
| **Goal** | Predict outcomes or classify data. | Find hidden patterns or groups. |
| **Examples** | Regression, Classification (e.g., Predicting sales). | Clustering (e.g., Grouping customers by spending). |
| **Data Needed** | Labeled (e.g., Age, Income, Loan Approved Yes/No). | Unlabeled (e.g., Customer Purchase History). |
| **Algorithms** | Decision Trees, Neural Networks, Linear Regression. | K-Means, PCA, Hierarchical Clustering. |

- **Example of Supervised Learning**:

Predicting house prices based on features like size, location, and number of bedrooms.

- **Example of Unsupervised Learning**:

Grouping customers into high, medium, and low spenders based on their shopping history.
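
The contrast can be seen in a few lines of plain Python. The sketch below fits a straight line to labeled (size, price) pairs for the supervised case and runs a one-dimensional K-Means on unlabeled spending amounts for the unsupervised case. The function names, the toy numbers, and the starting centers are assumptions; real projects would normally use a library such as scikit-learn.

```python
from statistics import mean

def fit_line(xs, ys):
    """Supervised: least-squares line y = a*x + b learned from labeled pairs."""
    mx, my = mean(xs), mean(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group numbers around the given starting centers.
    No labels are used; the groups come from the data alone."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Supervised: house size (square metres) with the known price as the label
sizes = [50, 80, 100, 120]
prices = [100, 160, 200, 240]
a, b = fit_line(sizes, prices)
print(a * 90 + b)  # predicted price for a 90 square-metre house: 180.0

# Unsupervised: monthly spending with no labels; starting centers are a guess
spending = [20, 25, 30, 200, 220, 210, 900, 950]
centers, groups = kmeans_1d(spending, centers=[20, 200, 900])
print(groups)  # [[20, 25, 30], [200, 220, 210], [900, 950]]
```

The regression needs the prices (the answers) to learn from, whereas K-Means discovers the low, medium, and high spender groups without being told which customer belongs where. K-Means results depend on the starting centers, so the ones used here are chosen by hand for the example.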

---

### 6. STL Approach


- **What is STL?**

STL (Seasonal and Trend decomposition using Loess) is a method for analyzing time-series data. It breaks the data into three parts:

1. **Seasonal**: Repeating patterns (e.g., sales peak every weekend).

2. **Trend**: Long-term movement (e.g., sales gradually increasing).

3. **Residual**: Random noise or outliers.

- **Why Use STL?**

1. To understand patterns in time-series data.

2. To forecast future values.

3. To detect anomalies (e.g., sudden drops or spikes).

- **Example**:

A shop tracks daily sales:

1. Seasonal: Sales increase on weekends.

2. Trend: Sales are rising month by month.

3. Residual: Random spikes during holidays or discounts.
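
The shop example can be imitated with a simplified additive decomposition. Note the swap: real STL smooths with Loess, while this sketch uses a plain centered moving average for the trend and per-weekday averages for the seasonal part, so it only illustrates the idea of splitting a series into three components. The function name and the synthetic sales data are assumptions; for actual STL, a library implementation such as the one in statsmodels is the usual route.

```python
from statistics import mean

def decompose(series, period):
    """Additive split: series = trend + seasonal + residual.
    Assumes an odd period and at least two full cycles of data."""
    half = period // 2
    n = len(series)
    # Trend: centered moving average; undefined (None) near the edges.
    trend = [mean(series[i - half:i + half + 1]) if half <= i < n - half else None
             for i in range(n)]
    # Seasonal: average detrended value at each position in the cycle.
    detrended = [(i % period, series[i] - trend[i])
                 for i in range(n) if trend[i] is not None]
    raw = [mean(d for pos, d in detrended if pos == p) for p in range(period)]
    offset = mean(raw)  # center the seasonal effects around zero
    seasonal = [raw[i % period] - offset for i in range(n)]
    # Residual: what is left after removing the trend and seasonal parts.
    residual = [series[i] - trend[i] - seasonal[i] if trend[i] is not None else None
                for i in range(n)]
    return trend, seasonal, residual

# Synthetic daily sales for 4 weeks: rising trend, weekend boost, one spike
weekend_boost = [0, 0, 0, 0, 0, 30, 30]  # positions 5 and 6 are the weekend
sales = [100 + 2 * day + weekend_boost[day % 7] + (50 if day == 17 else 0)
         for day in range(28)]

trend, seasonal, residual = decompose(sales, period=7)
# The day with the largest residual is the likeliest anomaly.
anomaly_day = max((i for i in range(len(sales)) if residual[i] is not None),
                  key=lambda i: residual[i])
print(anomaly_day)  # 17
```

The moving average is pulled upward by the spike on day 17, which slightly distorts the seasonal estimate for that weekday; STL's Loess smoothing with robustness weights is designed to limit that kind of distortion.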

- **Applications**:

1. Predicting sales for the next month.

2. Analyzing website traffic trends.

3. Detecting unusual behavior in electricity usage.
