
MACHINE LEARNING

Machine Learning is a field of computer science that enables computers to learn from data without being explicitly programmed. It involves the development of algorithms that can learn from data and make decisions or predictions based on data.
Types of Learning

Data Type:
• Supervised Learning: Labeled
• Unsupervised Learning: Unlabeled

Learning Goal:
• Supervised Learning: Make predictions for new data
• Unsupervised Learning: Discover hidden patterns and structures within data

Analogy:
• Supervised Learning: Learning with a teacher's guidance
• Unsupervised Learning: Exploring and making sense of the world on your own

Model Training:
• Supervised Learning: The model is trained on a dataset where each data point has a corresponding label (desired output). The model learns the relationship between the input features and the labels.
• Unsupervised Learning: The model is trained on a dataset where data points have no predefined labels. The model identifies similarities and differences between data points to group them or uncover underlying structures.
Evaluation:
• Supervised Learning: The model's performance is evaluated on unseen data to assess its ability to generalize and make accurate predictions.
• Unsupervised Learning: The effectiveness of the model is evaluated based on how well it achieves the desired outcome, such as identifying distinct clusters or meaningful patterns.

Common Algorithms:
• Supervised Learning: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines (SVM)
• Unsupervised Learning: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), Apriori algorithm

Example:
• Supervised Learning: Spam filter: classifies emails as spam or not spam based on labeled training data. Stock price prediction: predicts future stock prices based on historical labeled data.
• Unsupervised Learning: Customer segmentation: groups customers with similar characteristics based on unlabeled customer data. Anomaly detection: identifies unusual patterns in data, potentially indicating fraud or system failures.
SUPERVISED LEARNING

Linear Regression
Linear Regression models the relationship between a
dependent variable and one or more independent
variables by fitting a linear equation to observed data. It
is used to predict a continuous outcome based on input
features.

• Assumptions on Data:
• Linearity: The relationship between the
independent and dependent variables is linear.
• Independence: Observations are independent of
each other.
• Homoscedasticity: The variance of the errors is
constant across all levels of the independent
variables.
• Normality: The errors are normally distributed.
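
As a minimal sketch (assuming scikit-learn and a small synthetic dataset, not part of the original slides), fitting and evaluating an ordinary least-squares model might look like this:

```python
# Minimal linear regression sketch (scikit-learn assumed; data is synthetic).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                                         # three input features
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)  # linear target + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("coefficients:", model.coef_)                                   # estimated slopes
print("R^2 on test data:", r2_score(y_test, model.predict(X_test)))
```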
Definition:
• Ridge Regression: Linear regression with a regularization term that penalizes the L2 norm of the coefficients.
• Lasso Regression: Linear regression with a regularization term that penalizes the L1 norm of the coefficients.
• Elastic Net Regression: Linear regression with a combination of L1 and L2 regularization terms.

Objective Function:
• Ridge Regression: $\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$
• Lasso Regression: $\min_\beta \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$
• Elastic Net Regression: $\min_\beta \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$

Shrinkage Effect:
• Ridge Regression: Shrinks coefficients towards zero, but does not set them exactly to zero.
• Lasso Regression: Induces sparsity by setting some coefficients to exactly zero.
• Elastic Net Regression: Combines benefits of Ridge and Lasso, allowing for both shrinkage and sparsity.

Feature Selection:
• Ridge Regression: Less effective for feature selection, as it does not force coefficients to be exactly zero.
• Lasso Regression: Effective for feature selection, as it can eliminate irrelevant features.
• Elastic Net Regression: Balances between Ridge and Lasso in terms of feature selection.

Suitable For:
• Ridge Regression: When all features are potentially relevant and multicollinearity is present.
• Lasso Regression: When there are many irrelevant features or when a sparse solution is desired.
• Elastic Net Regression: When there are many features and multicollinearity is present, but also when feature selection is desired.
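
A small illustrative comparison of the shrinkage behaviour described above (scikit-learn assumed; the alpha values and the synthetic data are arbitrary choices, not from the slides):

```python
# Compare how many coefficients each regularized model keeps non-zero.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 informative features

for name, est in [("Ridge", Ridge(alpha=1.0)),
                  ("Lasso", Lasso(alpha=0.1)),
                  ("ElasticNet", ElasticNet(alpha=0.1, l1_ratio=0.5))]:
    est.fit(X, y)
    n_zero = int(np.sum(est.coef_ == 0))          # Lasso / Elastic Net can zero out coefficients
    print(f"{name}: non-zero coefficients = {10 - n_zero}")
```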
Logistic Regression

Logistic regression is used for binary classification problems. It models the probability of an event occurring by fitting data to a logistic function (sigmoid function).

• Assumptions on Data:
• Linearity: The log odds of the outcome is a linear combination of the predictor variables.
• Independence: Observations are independent of each other.
• Large Sample Size: Logistic regression requires a large sample size to provide a good estimate of the model parameters.
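
A minimal binary-classification sketch (scikit-learn assumed; the synthetic dataset is illustrative):

```python
# Logistic regression: predicted probabilities come from the sigmoid of a linear score.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print("predicted probabilities:", clf.predict_proba(X_test[:3]))  # per-class sigmoid outputs
print("accuracy:", clf.score(X_test, y_test))
```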
Naive Bayes

Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem with the assumption of independence between features. It is commonly used for classification tasks, especially in text classification and spam filtering.

Working:
• Naive Bayes calculates the probability of a data point belonging to each class based on the feature values.
• It assumes that the features are conditionally independent given the class label.
• To classify a new data point, Naive Bayes selects the class with the highest posterior probability using Bayes' theorem.

Key Formulas:
• Bayes' Theorem: $P(C_k \mid X) = \frac{P(X \mid C_k)\, P(C_k)}{P(X)}$, where $C_k$ is the class label, $X$ is the input features, $P(C_k \mid X)$ is the posterior probability of class $C_k$ given features $X$, $P(X \mid C_k)$ is the likelihood of the features given class $C_k$, $P(C_k)$ is the prior probability of class $C_k$, and $P(X)$ is the probability of the features $X$.
• Independence Assumption: $P(X \mid C_k) = \prod_{i=1}^{n} P(X_i \mid C_k)$, assuming features $X_1, X_2, \ldots, X_n$ are conditionally independent given class $C_k$.
Types:
• Gaussian Naive Bayes: Assumes that continuous features follow a Gaussian distribution.
• Multinomial Naive Bayes: Suitable for discrete features, commonly used in text classification with word counts or TF-
IDF values.
• Bernoulli Naive Bayes: Similar to Multinomial Naive Bayes but assumes binary features (e.g., presence or absence of
words).
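
A minimal sketch of the Gaussian variant on synthetic continuous features (scikit-learn assumed):

```python
# Gaussian Naive Bayes: learns a per-class mean/variance for each feature.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_train, y_train)
print("posterior P(C_k | x) for one sample:", nb.predict_proba(X_test[:1]))
print("accuracy:", nb.score(X_test, y_test))
```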
Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It finds a hyperplane in an N-dimensional space (N is the number of features) that best separates the data points into classes.

Working:
• SVM aims to find the hyperplane that maximizes the margin between the classes.
• The margin is the distance between the hyperplane and the nearest data point from each class; these nearest points are known as support vectors.
• Support vectors are the critical data points that determine the position and orientation of the hyperplane.
• SVM seeks to maximize this margin, making it robust to outliers and generalizable to unseen data.

Kernel Trick:
• SVM can be extended to non-linearly separable data using a kernel function, such as a polynomial, radial basis function (RBF), or sigmoid kernel.
• Kernel functions transform the input space into a higher-dimensional space where the data becomes linearly separable.
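
A minimal sketch of an RBF-kernel SVM on data that is not linearly separable (scikit-learn assumed; the make_moons dataset and C/gamma values are illustrative):

```python
# SVM with an RBF kernel: the kernel trick handles the non-linear class boundary.
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print("support vectors per class:", svm.n_support_)
print("test accuracy:", svm.score(X_test, y_test))
```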
K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a simple, non-parametric supervised learning algorithm used for classification and regression tasks. It classifies a data point based on the majority class of its k nearest neighbors in the feature space.

Working:
• Given a new data point, KNN calculates the distance (e.g., Euclidean distance) to all other data points in the training set.
• It selects the k nearest neighbors based on the calculated distances.
• For classification, KNN assigns the class label that is most common among the k neighbors.
• For regression, KNN predicts the average value of the target variable among the k neighbors.

Key Formulas:
• Euclidean Distance: $d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
• Majority Voting: For classification, the class of a data point is determined by the majority class among its k nearest neighbors.
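
A minimal classification sketch with k = 5 and Euclidean distance (scikit-learn defaults; the iris dataset is an illustrative choice):

```python
# KNN: majority vote among the 5 nearest neighbours of each test point.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```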
Decision Tree

Decision Tree is a supervised machine learning algorithm used for both classification and regression tasks. It partitions the feature space into regions, assigning a label or value to each region based on the majority class or average target value of the training instances within that region.

Working:
• Decision Tree recursively splits the feature space into subsets, based on the values of features, in a hierarchical manner.
• At each node of the tree, it selects the feature that best splits the data into homogeneous subsets, using a criterion such as Gini impurity, entropy, or information gain.
• The splitting process continues until a stopping criterion is met, such as reaching a maximum tree depth, a minimum number of samples in a node, or no further improvement in impurity reduction.

Key Formulas:
• Gini Impurity: $G = 1 - \sum_{i} p_i^2$, where $p_i$ is the proportion of samples in class $i$ at a particular node.
• Entropy: $H = -\sum_{i} p_i \log_2 p_i$, where $p_i$ is the proportion of samples in class $i$ at a particular node.
• Information Gain: $IG(D, A) = I(D) - \sum_{v} \frac{|D_v|}{|D|}\, I(D_v)$, where $D$ is the dataset, $A$ is a feature, $D_v$ is the subset of $D$ where feature $A$ takes value $v$, and $I$ is the impurity measure.
Decision Tree

Splitting Criteria:
• Gini Impurity: Measures the probability of a randomly chosen sample being incorrectly classified.
• Entropy: Measures the average amount of information needed to classify a sample.
• Information Gain: Measures the reduction in impurity achieved by splitting the data on a particular feature.
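
A minimal sketch (scikit-learn assumed; the gini helper below is an illustrative hand-written version of the impurity formula, not part of the library):

```python
# Compute Gini impurity for a node's labels, then fit a shallow tree with the Gini criterion.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity G = 1 - sum_i p_i^2 over the class proportions at a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

X, y = load_iris(return_X_y=True)
print("Gini impurity of the root node:", gini(y))

tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
print("depth:", tree.get_depth(), "| leaves:", tree.get_n_leaves())
```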
ENSEMBLE LEARNING
Bagging

Bootstrapping refers to a sampling method that involves drawing random samples from a dataset with replacement.

Bootstrap aggregating, more commonly known as bagging, is a technique that directly leverages the concept of bootstrapping to create and train a weak learner on each of the individual subsets (sometimes referred to as "bags"). A weak learner in this context is defined as an algorithm that performs just slightly better than random guessing.

Bagging is a parallel process, meaning that the models are created in parallel on these subsets and are independent of one another.

After fitting a model to each of the bootstrapped subsets, their respective results are combined, or aggregated, in order to obtain the final results.
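
A minimal bagging sketch (scikit-learn assumed; the estimator keyword is used by recent versions, it was base_estimator in older releases; the shallow tree stands in for the "weak learner"):

```python
# Bagging: many weak learners trained on bootstrap samples, predictions aggregated.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=2),  # deliberately weak base learner
    n_estimators=50,
    bootstrap=True,                                 # sample each "bag" with replacement
    random_state=0,
)
print("mean CV accuracy:", cross_val_score(bag, X, y, cv=5).mean())
```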
Boosting

Compared to bagging, boosting generally does not make use of bootstrapping and follows a sequential process, where each subsequent model tries to correct the errors of the previous model and reduce its bias.

First, a weak learner is fitted on the original training data and subsequently evaluated by comparing its predictions to the actual values. During this initial iteration, all samples are given equal weights.

Next, we increase the weights of the misclassified samples and decrease the weights of the correctly classified ones. The samples that were misclassified will thus have higher weights in the next iteration, thereby "boosting" their importance.

This process is then repeated for a pre-defined number of iterations, or until the models' predictions reach a desired level of accuracy. Once all the models are trained, their predictions are combined to produce the final output. Typically, the prediction of each individual model is weighted based on its accuracy, with more accurate models contributing more to the final prediction.
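
A minimal sketch using AdaBoost as one concrete boosting algorithm (scikit-learn assumed; the estimator keyword is the recent spelling of base_estimator; decision stumps play the role of the weak learner):

```python
# AdaBoost: stumps fitted sequentially, with misclassified samples re-weighted each round.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a "stump", slightly better than guessing
    n_estimators=100,
    learning_rate=0.5,
    random_state=0,
)
print("mean CV accuracy:", cross_val_score(boost, X, y, cv=5).mean())
```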
Random Forest

Random Forest is an ensemble learning method used for both classification and regression tasks. It operates by constructing a multitude of decision trees during training and outputs the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

Working:
• Random Forest builds multiple decision trees using a technique called bootstrap aggregation (bagging).
• Each tree is trained on a random subset of the training data (bootstrap sample) and a random subset of features at each split.
• During prediction, each tree in the forest independently predicts the class label (or regression value) of a new data point.
• The final prediction is determined by averaging (in regression) or taking a majority vote (in classification) over the predictions of all trees.

Key Formulas:
• Random Forest does not have specific key formulas, but it utilizes the same formulas as Decision Trees for splitting criteria, such as Gini impurity, entropy, or information gain.

Hyperparameters:
• Number of Trees: The number of decision trees in the forest.
• Max Depth: The maximum depth allowed for each decision tree.
• Minimum Samples Split: The minimum number of samples required to split a node.
• Minimum Samples Leaf: The minimum number of samples required to be at a leaf node.
• Max Features: The number of features to consider when looking for the best split.
• Bootstrap Sampling: Whether to use bootstrap sampling (with replacement) when building trees.
Random Forest

Splitting Criteria:
• Gini Impurity: Measures the probability of a randomly chosen sample being incorrectly classified.
• Entropy: Measures the average amount of information needed to classify a sample.
• Information Gain: Measures the reduction in impurity achieved by splitting the data on a particular feature.

Ensemble Technique:
• Random Forest is an ensemble technique that combines the predictions of multiple weak learners (decision trees) to improve overall performance and robustness.
• It reduces overfitting and variance by averaging or voting over multiple independent models.
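
A minimal sketch wiring up the hyperparameters listed above (scikit-learn assumed; the specific values are illustrative, not recommendations):

```python
# Random forest: bagged decision trees with a random feature subset at each split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=200,       # number of trees
    max_depth=None,         # grow each tree until leaves are pure
    min_samples_split=2,
    min_samples_leaf=1,
    max_features="sqrt",    # features considered at each split
    bootstrap=True,         # bootstrap sampling per tree
    random_state=0,
).fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```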
UNSUPERVISED LEARNING
K-Means

K-Means clustering is an unsupervised machine learning algorithm used for partitioning a dataset into k distinct, non-overlapping clusters. It aims to minimize the within-cluster variance, or inertia, by iteratively assigning data points to the nearest cluster centroid and updating the centroids.

Working:
• Initialize k centroids randomly or based on some heuristic.
• Assign each data point to the nearest centroid, forming k clusters.
• Update the centroids by computing the mean of all data points assigned to each cluster.
• Repeat the assignment and update steps until convergence, i.e., when the centroids no longer change significantly or a maximum number of iterations is reached.

Key Formulas:
• Distance Metric: The Euclidean distance is commonly used, but other metrics such as Manhattan distance or cosine similarity can also be used.
• Within-Cluster Variance: Inertia is often used as a measure of clustering quality, defined as the sum of squared distances of samples to their closest cluster center: $\sum_{j=1}^{k} \sum_{x \in C_j} \| x - \mu_j \|^2$.
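
A minimal sketch (scikit-learn assumed; the blob data and k = 4 are illustrative):

```python
# K-Means: fit centroids on synthetic blobs and report the resulting inertia.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
print("cluster centroids:\n", km.cluster_centers_)
print("inertia (within-cluster sum of squares):", km.inertia_)
```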
K-Means

The elbow plot is a graphical tool used to determine the optimal number of clusters (k) in the K-Means clustering algorithm. It helps identify the point where increasing the number of clusters yields diminishing returns, meaning the improvement in clustering performance slows down, forming an "elbow" shape in the plot.

Working:
• Perform K-Means clustering on the dataset for a range of k values (typically from 1 to a chosen upper limit).
• For each k value, compute the within-cluster variance or inertia (sum of squared distances from each point to its assigned centroid).
• Plot the value of inertia (on the y-axis) against the number of clusters k (on the x-axis).
• The "elbow" point on the curve indicates the optimal k value. After this point, adding more clusters provides only a minimal reduction in inertia.

Key Terms:
• Inertia: The sum of squared distances of each data point to its nearest centroid. It indicates how tightly the clusters are packed.
• Diminishing Returns: The concept that, after a certain point, adding more clusters does not significantly improve the clustering outcome.
• Elbow Point: The k value where inertia starts to decrease at a slower rate, forming a sharp bend in the plot.
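
A minimal elbow-plot sketch (scikit-learn and matplotlib assumed; data and the k range 1-10 are illustrative):

```python
# Elbow plot: run K-Means for several k and look for the bend in the inertia curve.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
ks = range(1, 11)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.title("Elbow plot")
plt.show()
```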
Hierarchical Clustering

Hierarchical Clustering is an unsupervised clustering algorithm that builds a hierarchy of clusters. It does not require the number of clusters k to be specified in advance. It can produce either a dendrogram (tree-like structure) or a set of nested clusters.

Types:
• Agglomerative Hierarchical Clustering: Starts with individual data points as clusters and iteratively merges the closest pairs of clusters until the desired number of clusters is reached.
• Divisive Hierarchical Clustering: Starts with a single cluster containing all data points and recursively splits the cluster into smaller clusters until each cluster contains only one data point.

Working:
• Start with each data point as its own cluster, treating N data points as N clusters.
• Merge the two closest clusters into a single cluster based on a distance metric (e.g., Euclidean distance).
• Repeat the merging process until only a single cluster remains or until a stopping criterion is met.

Hyperparameters:
• Distance Metric: The choice of distance metric can significantly affect the clustering results.
• Linkage Method: The method used to calculate the distance between clusters can impact the resulting cluster structure.
• Stopping Criterion: Criteria such as the maximum number of clusters or a threshold distance can be used to stop the clustering process.
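
A minimal agglomerative (bottom-up) sketch (scikit-learn assumed; Ward linkage and n_clusters = 3 are illustrative choices):

```python
# Agglomerative clustering: repeatedly merge the closest clusters under Ward linkage.
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
print("cluster sizes:", [int((labels == c).sum()) for c in range(3)])
```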
DBSCAN

DBSCAN is a density-based clustering algorithm used to partition a dataset into clusters of varying shapes and sizes. It does not require the number of clusters k to be specified beforehand. DBSCAN is capable of identifying noise points, which do not belong to any cluster.

Working:
• DBSCAN defines clusters as dense regions of data points separated by regions of lower density.
• It requires two parameters: ε, the maximum distance between two points to be considered neighbors, and minPts,
the minimum number of points required to form a dense region (core point).
• The algorithm starts by randomly selecting a point from the dataset. If it has at least minPts neighbors within
distance ε, it becomes a core point, and a new cluster is formed.
• The algorithm expands the cluster by adding all reachable points (including core points and border points) within
distance ε to the cluster.
• If a core point is not reachable from any existing cluster, it becomes a new cluster.
• Points that are not core points and are not reachable from any cluster are considered noise points.
DBSCAN

Key Formulas:
• Reachability Distance: The reachability distance between two data points p and q is defined as the maximum of the core distance of q and the distance between p and q: reachability-distance(p, q) = max(core-distance(q), dist(p, q)), where the core distance of q is the distance to its minPts-th nearest neighbor.
• Core Distance: The core distance of a data point p is the distance to its minPts-th nearest neighbor, represented as core-distance(p).
• Border Points: Border points are data points that are not core points themselves but are within the ε-neighborhood of a core point.
Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that simplifies complex datasets by transforming the original variables into a smaller set of new variables called principal components. These components capture the most important patterns, or variance, in the data while reducing the number of features.

Working:
• Standardize the data: Ensure that each feature (like age, income, etc.) is on the same scale to avoid larger values dominating the analysis.
• Compute the covariance matrix: Calculate the relationships between features to see how much they vary together.
• Find principal components: Use mathematical methods to find the directions (called principal components) along which the data shows the most variation.
• Rank the components: The components are ranked based on how much of the total variance they capture, using eigenvalues.
• Reduce dimensions: Select the top k components that capture the most variance, and project the original data onto these components.

Key Terms:
• Principal Components: New variables created by combining the original ones, capturing the main patterns in the data.
• Covariance Matrix: A table showing how much each pair of features varies together.
• Eigenvalues: Values that indicate how much information (variance) each principal component contains.
• Eigenvectors: The directions or axes along which the principal components lie.
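
A minimal sketch following the steps above (scikit-learn assumed; the iris dataset and k = 2 components are illustrative):

```python
# PCA: standardize the features, then keep the top 2 principal components.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)        # put every feature on the same scale
pca = PCA(n_components=2).fit(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)  # derived from the eigenvalues
X_reduced = pca.transform(X_scaled)                  # project data onto the top 2 components
print("reduced shape:", X_reduced.shape)
```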
EVALUATION METRICS
Regression Metrics

Basic metrics: Given a regression model f, the following metrics are commonly used to assess the performance of the model:

• Coefficient of determination: The coefficient of determination, often noted $R^2$ or $r^2$, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows: $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$.
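
A minimal sketch of computing $R^2$ for a set of predictions (scikit-learn assumed; the values are made up for illustration):

```python
# Coefficient of determination: R^2 = 1 - SS_res / SS_tot.
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print("R^2:", r2_score(y_true, y_pred))
```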
Classification Metrics

Confusion matrix: The confusion matrix is used to have a more complete picture when assessing the performance of a model.

The following metrics are commonly used to assess the performance of classification models:
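
A minimal sketch of the metrics typically derived from the confusion matrix (accuracy, precision, recall, and F1 score are assumed here as the usual set; scikit-learn assumed, labels are made up):

```python
# Common classification metrics computed from true vs. predicted labels.
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```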
Classification Metrics

ROC: The receiver operating curve, also noted ROC, is the plot of TPR (true positive rate) versus FPR (false positive rate) obtained by varying the decision threshold.

AUC: The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC curve.
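
A minimal sketch (scikit-learn assumed; logistic regression on synthetic data is only an illustrative probability model):

```python
# ROC and AUC: sweep the decision threshold over predicted probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
proba = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, proba)  # TPR vs FPR at each threshold
print("AUC:", roc_auc_score(y_test, proba))
```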
Model Selection

When selecting a model, we distinguish three different parts of the data, as follows:

Training Set:
• The model is trained on this set.
• Usually 80% of the dataset.

Validation Set:
• The model is assessed on this set.
• Usually 20% of the dataset.
• Also called the hold-out or development set.

Testing Set:
• The model gives predictions on this set.
• Unseen data.

Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set.
Cross-validation, also noted CV, is a method used to select a model that does not rely too much on the initial training set. The main types are summed up below:

K-fold:
• Training on K-1 folds and assessment on the remaining one.
• Generally K = 5 or 10.

Leave-p-out:
• Training on n-p observations and assessment on the p remaining ones.
• The case p = 1 is called leave-one-out.

The most commonly used method is K-fold cross-validation, which splits the training data into K folds, validating the model on one fold while training it on the K-1 other folds, and repeating this K times. The error is then averaged over the K folds and is called the cross-validation error.
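
A minimal K-fold sketch with K = 5 (scikit-learn assumed; the model and dataset are illustrative):

```python
# 5-fold cross-validation: average the validation score over the folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("per-fold accuracy:", scores)
print("cross-validation estimate:", scores.mean())
```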
Bias-Variance Tradeoff

• Bias: The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.
• Variance: The variance of a model is the variability of the model prediction for given data points.
• Bias/variance tradeoff: The simpler the model, the higher the bias; the more complex the model, the higher the variance.
Thank you!
