AIMl TA2

The document explains Cross-Validation (CV) as a resampling technique that improves model performance evaluation by systematically splitting datasets into multiple training and testing sets, particularly beneficial for small datasets. It also outlines key concepts in Machine Learning, including types, steps, terminology, tradeoffs, and feature extraction techniques, emphasizing the importance of balancing model complexity, accuracy, and interpretability. Additionally, it highlights various applications of Machine Learning across different industries.

Uploaded by

dhananjay


Cross-Validation Explained

When you have a limited number of samples, splitting your dataset once into training and testing sets can give biased or unstable results, depending on how the split was done. This is where Cross-Validation (CV) comes in: it is a technique that gives a more robust and reliable estimate of how well your model generalizes.

What is Cross-Validation?

Cross-validation is a resampling technique that splits your dataset into multiple training and testing sets in a systematic way. The most common type is K-Fold Cross-Validation.
Here's how it works:
- Split the dataset into k equal-sized folds (subsets).
- For each of the k iterations:
  - Use k-1 folds for training.
  - Use the remaining fold for testing.
- Repeat this process k times, with a different fold used for testing each time.
- Average the results (accuracy, precision, etc.) to get a final performance estimate.

Why It Helps (Especially with Small Datasets)
- Less bias: every data point is used for testing exactly once and for training k-1 times.
- More reliable evaluation: reduces the effect of randomness from a single train-test split.
- Better generalization: helps choose a model that performs well across different splits.

Example: if you have 100 data points and use 5-Fold CV:
- Each fold will have 20 data points.
- The model trains on 80 and tests on 20, repeated 5 times.
- You get 5 performance scores → average them for the final evaluation.
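The K-Fold procedure above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the names `k_fold_indices` and `cross_validate` and the `score_fn` callback are hypothetical, introduced only for this example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(score_fn, n_samples, k):
    """Average the score returned by score_fn(train_idx, test_idx) over k folds."""
    scores = [score_fn(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores

# The 100-point, 5-fold example from the notes.
for train_idx, test_idx in k_fold_indices(100, 5):
    print(len(train_idx), len(test_idx))  # 80 20 on every iteration
```

In practice `score_fn` would train a model on the training indices and return a metric measured on the test indices; here it is left abstract so the splitting logic stays visible.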
What is Machine Learning (ML)?

ML is a method where computers learn patterns from data and make predictions or decisions without being explicitly programmed for the task.

High-Level Steps in Machine Learning

Step | Description
1. Data Collection | Gather raw data
2. Preprocessing | Clean and format data
3. Splitting | Divide into training/testing sets
4. Choose Model | Select an appropriate algorithm
5. Training | Fit the model to training data
6. Evaluation | Test performance on unseen data
7. Prediction | Use model to predict new data
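As a rough end-to-end sketch of these steps, the example below runs the whole workflow on synthetic data with a simple linear regression fitted by least squares. The data, the 80/20 split, the choice of model, and the helper `fit_line` are all assumptions made for illustration.

```python
import random

# Data collection: synthetic (x, y) pairs following y = 2x + 1 plus noise.
rng = random.Random(42)
raw = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in range(50)]
raw.append((None, 3.0))  # a deliberately broken record

# Preprocessing: drop records with missing values.
data = [(x, y) for x, y in raw if x is not None and y is not None]

# Splitting: shuffle, then hold out 20% for testing.
rng.shuffle(data)
cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]

# Choose model: simple linear regression, y = slope * x + intercept.
# Training: closed-form least-squares fit on the training set only.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

# Evaluation: mean squared error on data the model has not seen.
test_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)

# Prediction: apply the fitted model to a new input.
prediction = slope * 100 + intercept
print(round(slope, 2), round(intercept, 2), test_mse, prediction)
```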
Types of Machine Learning
- Supervised Learning → Learn from labeled data (e.g., classification, regression)
- Unsupervised Learning → Find patterns in unlabeled data (e.g., clustering)
- Semi-supervised Learning → Mix of labeled and unlabeled data
- Reinforcement Learning → Learn by interacting with an environment (e.g., games, robotics)

Applications of Machine Learning
- Email: Spam detection
- Social Media: Personalized recommendations, content ranking
- E-commerce: Product recommendations, dynamic pricing
- Banking/Finance: Fraud detection, credit scoring
- Healthcare: Disease prediction, medical imaging analysis
- Automotive: Self-driving cars, traffic prediction
- Entertainment: Movie/music recommendations (Netflix, Spotify)
- NLP: Chatbots, language translation, sentiment analysis
- Manufacturing: Predictive maintenance, quality control
- Weather: Forecasting, disaster prediction
- Computer Vision: Face recognition, object detection
Key Machine Learning Terminology
1. Algorithm
A set of rules or mathematical procedures that a machine follows to learn patterns from data. It is the engine that powers model training.
Examples: Linear Regression, Decision Tree, Support Vector Machine (SVM), Neural Networks.

2. Model
The output generated after a machine learning algorithm trains on data. It represents the learned relationship between inputs (features) and outputs (labels). Once trained, the model is used to make predictions on new data.

3. Feature Set (Input Variables / Independent Variables)
The input data used to make predictions. Features are the measurable properties or characteristics of the data.
Example: for a house price model, features could be:
- Size (sq ft)
- Number of bedrooms
- Location

4. Predictor Variable
Another name for a feature: a variable used to predict the outcome. All predictor variables together form the feature set.

5. Response Variable (Target Variable / Output Variable / Dependent Variable)
The outcome you are trying to predict or classify.
Example: in predicting house prices, the price is the response variable.

6. Training Data
The portion of the dataset used to train the model. The model learns patterns from this data, using both the inputs (features) and the known outputs (targets). Typically around 70–80% of the dataset.

7. Testing Data
The portion of the dataset used to evaluate the model's performance. The model has not seen this data during training, so its score on this data estimates how the model will perform on new, real-world data. Typically around 20–30% of the dataset.
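To make the terminology concrete, the sketch below builds a small feature set and response variable for the house price example and divides them into training and testing data. The house records are made up, and `train_test_split` here is a hypothetical helper written for this example, not a library function.

```python
import random

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle rows, then split features X and targets y into train and test parts."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    n_test = max(1, round(test_fraction * len(X)))
    test_rows, train_rows = order[:n_test], order[n_test:]
    X_train = [X[i] for i in train_rows]
    X_test = [X[i] for i in test_rows]
    y_train = [y[i] for i in train_rows]
    y_test = [y[i] for i in test_rows]
    return X_train, X_test, y_train, y_test

# Made-up house data: each row of X is one set of predictor variables,
# and y holds the response variable (price) for the same row.
X = [[600 + 100 * i, 1 + i % 4] for i in range(10)]  # [size_sqft, bedrooms]
y = [50_000 + 120 * size for size, _ in X]           # price

X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before the split matters when the rows are ordered (here by size); without it, the test set would contain only the largest houses.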
i) Precision
Definition: Precision tells us what fraction of the predicted positives are actually positive.
Use Case: When false positives are costly (e.g., spam detection: don't mark important emails as spam).

ii) Recall (Sensitivity or True Positive Rate)
Definition: Recall tells us what fraction of the actual positives were correctly predicted.
Use Case: When missing positives is risky (e.g., disease detection: don't miss people who are actually sick).

iii) F1-Score
Definition: F1-score is the harmonic mean of Precision and Recall. It balances both when you need a single performance measure.
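The three definitions can be computed directly from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below is a minimal version; the function name `classification_metrics` and the toy labels are assumptions for illustration, and returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # precision = TP / (TP + FP): of everything predicted positive, how much was right
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall = TP / (TP + FN): of everything actually positive, how much was found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
print(precision, recall, f1)  # TP=2, FP=1, FN=2 -> 2/3, 1/2, 4/7
```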
Significance of Errors (Residuals)

Residuals are critical in evaluating and improving your model. Here's why:

Model Accuracy
- Smaller residuals = better predictions.
- A model whose residuals are small on average (in absolute or squared terms) is usually more accurate.

Model Diagnosis
Plotting residuals helps detect:
- Non-linearity in data
- Outliers or noise
- Bias in predictions (e.g., consistently over- or under-predicting)

Error Metrics Are Based on Residuals
Common performance metrics, such as mean squared error (MSE), are derived from residuals.

Model Optimization
- During training, algorithms try to minimize residuals by adjusting internal parameters (e.g., weights in linear regression or neural nets).
- This is done using loss functions like MSE (in regression) or cross-entropy (in classification).

Helps in Feature Selection
High residuals could mean:
- Missing important features
- Irrelevant or noisy data
- Need for transformation (e.g., polynomial features)
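A short sketch of how error metrics follow from residuals, taking a residual to be the actual value minus the predicted value. The notes name only MSE; MAE and RMSE are included here as closely related metrics, and the function names and numbers are illustrative assumptions.

```python
import math

def residuals(y_true, y_pred):
    """Residual = actual value minus predicted value, one per data point."""
    return [t - p for t, p in zip(y_true, y_pred)]

def error_metrics(y_true, y_pred):
    res = residuals(y_true, y_pred)
    n = len(res)
    mse = sum(r * r for r in res) / n
    return {
        "mean_residual": sum(res) / n,          # far from 0 suggests systematic bias
        "mae": sum(abs(r) for r in res) / n,    # mean absolute error
        "mse": mse,                             # mean squared error
        "rmse": math.sqrt(mse),                 # same units as the target
    }

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]
print(residuals(y_true, y_pred))      # [1.0, 0.0, -1.0, -2.0]
print(error_metrics(y_true, y_pred))
```

Here the mean residual is -0.5, which indicates that the predictions run high on average, the kind of consistent over-prediction that a residual plot would also reveal.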
What is a Tradeoff in Machine Learning?

A tradeoff in machine learning means that improving one aspect of a model's performance often comes at the cost of another. You can't have it all: increasing one thing might hurt another, so finding the right balance is key.

Common Tradeoffs in Machine Learning

Bias-Variance Tradeoff
- Bias: Error due to simplified assumptions in the model (e.g., using a linear model for non-linear data). High bias → Underfitting
- Variance: Error due to too much complexity, causing the model to fit the noise in the training data. High variance → Overfitting
- Goal: Find the sweet spot where the combined error from bias and variance is lowest, i.e., the model generalizes well. Reducing one usually increases the other, so both cannot be driven to zero at once.
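For squared-error loss, the tradeoff can be written out explicitly. The decomposition below is the standard one, stated under assumptions that are not in the notes: the data follow y = f(x) + ε with zero-mean noise of variance σ², the fitted model is written \hat{f}, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A simple model tends to make the first term large and the second small; a very flexible model does the opposite. The noise term is a floor that no model can remove.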
Precision vs. Recall Tradeoff
In classification tasks:
- Precision: How many predicted positives are correct
- Recall: How many actual positives were correctly predicted
Increasing precision may reduce recall, and vice versa.

Example: In medical diagnosis, you might want high recall (catch all actual sick patients), even if that means lower precision (some false alarms).
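One common way this tradeoff shows up is through the decision threshold applied to a classifier's scores. The sketch below uses made-up scores and labels; `precision_recall_at` is a hypothetical helper, and reporting precision as 1.0 when nothing is predicted positive is a convention chosen for this example.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up model scores (higher = more likely sick) and true labels (1 = sick).
scores = [0.95, 0.85, 0.70, 0.60, 0.45, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold in (0.8, 0.5, 0.25):
    precision, recall = precision_recall_at(threshold, scores, labels)
    print(threshold, round(precision, 2), round(recall, 2))
# 0.8  -> precision 1.0,  recall 0.5
# 0.5  -> precision 0.75, recall 0.75
# 0.25 -> precision 0.67, recall 1.0
```

Lowering the threshold catches every sick patient in this toy data (recall 1.0) at the cost of more false alarms (precision about 0.67), which matches the medical diagnosis example.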
Model Complexity vs. Interpretability
- Complex models (e.g., deep neural networks) may give better accuracy, but they are harder to interpret ("black boxes").
- Simple models like decision trees or linear regression are easy to understand, but may perform worse on complex problems.

Training Time vs. Accuracy
- More training time (e.g., more epochs, larger datasets) usually improves performance.
- But after a point, returns diminish, and it may not be worth the extra cost.
- Sometimes you have to give up some accuracy for speed, especially in real-time systems.

Amount of Data vs. Model Performance
- More data typically improves performance.
- But collecting and processing data is costly and time-consuming.
- Sometimes it's better to improve the features or algorithm instead of just gathering more data.
What is Feature Extraction in Machine Learning?

Feature Extraction is a technique used to transform raw data into a set of meaningful features that can be used by a machine learning algorithm. It involves creating new features from the existing data, often by reducing dimensionality or extracting hidden patterns.

Example: Suppose you have an image. Raw pixels (say 100x100 = 10,000 features) are hard to use directly. With feature extraction, you might transform the image into:
- Color histograms
- Edge maps
- Shape descriptors
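As a simplified illustration of the histogram idea, the sketch below reduces a 100x100 grayscale image to a 4-bin intensity histogram. It uses a single intensity channel rather than color, and the image, the bin count, and the function name `intensity_histogram` are assumptions made for this example.

```python
def intensity_histogram(image, bins=4, max_value=255):
    """Summarize a grayscale image (rows of ints in 0..max_value) as a normalized histogram."""
    counts = [0] * bins
    n_pixels = 0
    for row in image:
        for pixel in row:
            # Map the pixel value to a bin index; clamp so max_value lands in the last bin.
            index = min(pixel * bins // (max_value + 1), bins - 1)
            counts[index] += 1
            n_pixels += 1
    return [c / n_pixels for c in counts]

# A made-up 100x100 image: left half dark (30), right half bright (220).
image = [[30] * 50 + [220] * 50 for _ in range(100)]
features = intensity_histogram(image, bins=4)
print(len(image) * len(image[0]), "raw pixels ->", len(features), "features")
print(features)  # [0.5, 0.0, 0.0, 0.5]
```

The 10,000 raw pixel values become 4 numbers that still capture that the image is half dark and half bright, although all spatial layout is discarded.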
Common Feature Extraction Techniques:
Data Type | Techniques
Text | TF-IDF, Word2Vec, BERT embeddings
Images | SIFT, HOG, CNN feature maps
Audio | MFCC (Mel-frequency cepstral coefficients)
General | PCA (Principal Component Analysis), Autoencoders
How is Feature Extraction Different from Feature Selection?
Aspect | Feature Extraction | Feature Selection
What it does | Creates new features from raw data | Chooses the best features from the existing ones
Output | Transformed data (possibly lower dimensions) | Subset of original features
Techniques | PCA, LDA, Autoencoders | Chi-Square, Mutual Info, RFE
Example | Combine multiple sensor signals into 1 summary | Pick top 10 features from 100
Goal | Compress, reduce noise, find hidden patterns | Improve performance, reduce overfitting
Advantages of Feature Extraction
Improves Model Performance
- Removes noise and irrelevant data
- Highlights patterns, trends, and important signals

Reduces Dimensionality
- Helps tackle the "curse of dimensionality"
- Speeds up training and reduces overfitting

Makes Data More Manageable
- Easier to visualize and interpret
- Enables use of algorithms that require fewer inputs

Boosts Generalization
- Helps models perform better on unseen data
Real-World Example
- Sentiment Analysis on movie reviews: Raw text → Cleaned text → TF-IDF or Word2Vec vectors → Used for classification
- Face Recognition: Raw pixels → PCA → Eigenfaces → Classifier
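The text-to-vector step of the sentiment analysis pipeline can be sketched with a bare-bones TF-IDF. This version uses term frequency times log(N / document frequency) with whitespace tokenization and assumes every document is non-empty; library implementations typically add smoothing and normalization, so their numbers will differ. The reviews and the name `tfidf_vectors` are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) using tf * log(N / df) weights."""
    tokenized = [doc.lower().split() for doc in documents]  # minimal text cleaning
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] / len(tokens) * idf[word] for word in vocabulary])
    return vocabulary, vectors

reviews = ["great movie great acting", "boring movie", "great fun"]
vocabulary, vectors = tfidf_vectors(reviews)
print(vocabulary)  # ['acting', 'boring', 'fun', 'great', 'movie']
for vector in vectors:
    print([round(weight, 3) for weight in vector])
```

Each review becomes a fixed-length numeric vector that a classifier can consume. Words that appear in few reviews, such as "boring", receive higher weights than words shared by most of them.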
