
Q1

a) What is the motivation behind ensemble methods? Give your answer in probabilistic terms.

Ensemble methods are motivated by the idea of leveraging probabilistic principles to enhance the predictive power of machine learning models. In probabilistic terms, they aim to exploit the law of large numbers and the central limit theorem to produce more accurate and reliable predictions.

Consider a single machine learning model as an estimator of a probability distribution. This model may have biases or limited expressiveness, leading to errors and uncertainties in its predictions. Ensemble methods address this by aggregating the outputs of multiple models, each of which captures a different aspect of the data.

When these diverse models are combined probabilistically, through techniques like bagging (bootstrap aggregating) or boosting, their individual errors tend to cancel out as the number of models increases. The result is a more robust estimate of the underlying probability distribution of the data, with lower variance and better predictive accuracy than any single model.
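
A simple way to see this, under the simplifying assumption that the models' errors are independent with zero mean and equal variance: if each of n models makes an error e_k with E[e_k] = 0 and Var(e_k) = sigma^2, the averaged ensemble prediction has error (1/n) * (e_1 + ... + e_n), whose variance is

    Var((1/n) * sum_k e_k) = sigma^2 / n.

So the variance of the averaged prediction shrinks as more models are added, which is exactly the law-of-large-numbers intuition behind bagging. In practice the errors are only partially decorrelated, so the reduction is smaller, but the principle is the same.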

b) What are the main strengths and weaknesses of Random Forest?

Strengths:

1. Handles Both Classification and Regression: Random Forest can be used for both classification and regression tasks, making it versatile and applicable to a wide range of problems (see the sketch after this list).
2. Outlier Robustness: Random Forest is generally robust to outliers in the data. Outliers
do not have a significant impact on the ensemble's performance, as they might with
some other algorithms.
3. Parallelization: The individual decision trees in a Random Forest can be trained in
parallel, making it computationally efficient, especially when dealing with large datasets.

Weaknesses:

1. Limited Extrapolation Capability: Random Forest is less suitable for extrapolation
tasks, where you need to make predictions outside the range of the training data. It
tends to make flat predictions beyond the observed data.
2. Bias Toward Features with Many Categories: Random Forest may have a bias towards
features with many categories or levels, as they can be more likely to appear in
individual trees. This can affect feature importance scores.
3. Large Memory Footprint: Storing a large Random Forest model can require a
significant amount of memory, especially when dealing with a large number of trees and
features.
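
To illustrate strengths 1 and 3 concretely, here is a minimal sketch using scikit-learn's standard estimators (the synthetic datasets are illustrative stand-ins):

# Random Forest for classification and regression, trained in parallel (n_jobs=-1)
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification task
Xc, yc = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(Xc, yc)

# Regression task handled by the same algorithm family
Xr, yr = make_regression(n_samples=500, n_features=10, random_state=0)
reg = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0).fit(Xr, yr)

print("Classification accuracy:", clf.score(Xc, yc))
print("Regression R^2:", reg.score(Xr, yr))
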
(c) What are the hyperparameters of the Random Forest model? How do you find these values?

- max_depth
- min_samples_split
- max_leaf_nodes
- min_samples_leaf
- n_estimators
- max_samples (size of each bootstrap sample)
- max_features

These values are not learned from the data directly; they are typically found by searching over candidate settings, for example with grid search or randomized search scored by cross-validation, as in the sketch below.
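
A minimal sketch of such a search with scikit-learn's GridSearchCV (the parameter grid and the synthetic dataset are illustrative assumptions, not part of the original answer):

# Cross-validated grid search over Random Forest hyperparameters
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", 0.5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)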

(d) How do Random Forest training and inference work? Give pseudocode.

The working of the Random Forest algorithm is quite intuitive. It is implemented in two phases: the first is to build the random forest by combining N decision trees, and the second is to make a prediction with every tree created in the first phase.

The following steps describe the working process:

Step 1: Draw a bootstrap sample of M data points (chosen at random with replacement) from the training set.

Step 2: Build a decision tree on each of these samples (subsets).

Step 3: Each decision tree produces its own prediction.

Step 4: The final output is obtained by majority voting for classification or by averaging for regression.

# Import necessary libraries
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Function to create a bootstrap sample (sampling with replacement) from the dataset
def bootstrap_sample(X, y):
    n_samples = X.shape[0]
    indices = np.random.choice(n_samples, n_samples, replace=True)
    return X[indices], y[indices]

# Function to train a single decision tree
def train_single_tree(X, y, max_depth):
    tree = DecisionTreeClassifier(max_depth=max_depth)
    tree.fit(X, y)
    return tree

# Function to train a Random Forest
def train_random_forest(X, y, n_trees, max_depth):
    forest = []
    for _ in range(n_trees):
        # Create a bootstrap sample
        X_sample, y_sample = bootstrap_sample(X, y)
        # Train a decision tree on the sample
        tree = train_single_tree(X_sample, y_sample, max_depth)
        # Add the trained tree to the forest
        forest.append(tree)
    return forest

# Main code
if __name__ == "__main__":
    # Load your dataset (X, y); a small synthetic dataset is used here as a stand-in
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    # Define hyperparameters
    n_trees = 100   # Number of trees in the forest
    max_depth = 10  # Maximum depth of each decision tree

    # Train the Random Forest
    forest = train_random_forest(X, y, n_trees, max_depth)
    # Now you have a trained Random Forest (forest) ready for making predictions.
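
The code above covers training only. A minimal sketch of inference by majority voting over the trained forest might look like the following (predict_random_forest is a hypothetical helper, assuming non-negative integer class labels):

# Predict by majority vote across all trees in the forest
def predict_random_forest(forest, X):
    # Collect each tree's predictions: shape (n_trees, n_samples)
    all_preds = np.array([tree.predict(X) for tree in forest])
    # For every sample, return the class predicted by the most trees
    return np.apply_along_axis(
        lambda votes: np.bincount(votes.astype(int)).argmax(),
        axis=0, arr=all_preds)

# Example usage: y_pred = predict_random_forest(forest, X)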

Q2

a) What is a support vector? Derive the objective function of support vector machines (SVM) for linearly separable data.

Support vectors are the data points that lie closest to the separating hyperplane and influence its position and orientation. The margin of the classifier is maximized with respect to these points; deleting a support vector changes the position of the hyperplane, whereas deleting other points does not. These are the points that define the SVM.
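
A brief sketch of the standard hard-margin derivation, which the question asks for: for a linear classifier f(x) = w^T x + b with labels y_i in {-1, +1}, linear separability means w and b can be rescaled so that every training point satisfies y_i (w^T x_i + b) >= 1, with equality exactly at the support vectors. The distance from the hyperplane to these closest points is then 1 / ||w||, so the margin between the two classes is 2 / ||w||. Maximizing the margin is therefore equivalent to minimizing (1/2) ||w||^2, which gives the hard-margin SVM objective:

    minimize over w, b:  (1/2) ||w||^2
    subject to:          y_i (w^T x_i + b) >= 1  for all i.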

(b) Differentiate between soft margin and hard margin classifiers.

Hard margin SVM:


In a hard margin SVM, the goal is to find the hyperplane that can
perfectly separate the data into two classes without any
misclassification. However, this is not always possible when the data is
not linearly separable or contains outliers. In such cases, the hard
margin SVM will fail to find a hyperplane that can perfectly separate
the data, and the optimization problem will have no solution.

Soft margin SVM:

In a soft margin SVM, we allow some misclassification by introducing slack variables that permit some data points to lie on the wrong side of the margin, at a penalty controlled by a regularization parameter.
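
Concretely, the two formulations differ only in the slack terms (standard objectives, stated here for completeness):

    Hard margin:  minimize (1/2) ||w||^2
                  subject to  y_i (w^T x_i + b) >= 1  for all i

    Soft margin:  minimize (1/2) ||w||^2 + C * sum_i xi_i
                  subject to  y_i (w^T x_i + b) >= 1 - xi_i,  xi_i >= 0  for all i

where the xi_i are the slack variables and C > 0 controls the trade-off between a wide margin and few margin violations.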

(d) What is the kernel trick in SVM? How and why is it used?

Kernel methods are techniques used to deal with linearly inseparable (non-linear) data sets. The idea is to create nonlinear combinations of the original features and project them onto a higher-dimensional space via a mapping function φ, where the data becomes linearly separable. In practice the mapping is never computed explicitly: the kernel function K(x_i, x_j) = φ(x_i) · φ(x_j) returns the inner product in that higher-dimensional space directly, which keeps training and prediction computationally feasible; this shortcut is the kernel trick.
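
A small illustration (a sketch using scikit-learn's SVC on a synthetic, non-linearly-separable dataset; the specific settings are illustrative):

# An RBF-kernel SVM separating data that no straight line can separate
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

# A linear kernel struggles, while the RBF kernel implicitly maps the data
# into a higher-dimensional space where a separating hyperplane exists
linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("Linear kernel training accuracy:", linear_svm.score(X, y))
print("RBF kernel training accuracy:  ", rbf_svm.score(X, y))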
