UNIT I-Part 1
Introduction to Machine
Learning and Supervised
Learning
Code: U18CST7002
Presented by: Nivetha R
Department: CSE
Introduction to Machine Learning
Classification
2. Pattern Recognition
• Character recognition
• Input: character image, classes: one class per character
• Task: recognizing the character code from its image
• ML is used to learn character sequences and model dependencies between them
• Face recognition
• Input: image, classes: people to be recognized
• ML algorithm should learn to associate the face image to identities
• Medical diagnosis
• Input: information about the patient, classes: various illnesses
• The model may identify the type of disease
• Speech recognition
• Input: acoustic signal, classes: words
• Model should learn the association from an acoustic signal to a word of some language
• Biometrics
• Input: physiological (face, fingerprint, iris, palm) or behavioural
characteristics (voice, gait, keystroke)
• Decision: Accept/Reject
Regression
Predicting the output
1. Price of used car
Input: car attributes (brand, year, mileage, engine, etc.)
Output: price
x denotes the attributes of the car,
y denotes the price of the car.
Using training data (past transactions), the ML model fits a function to this
data to learn y as a function of x:
y = wx + w₀
2. Price of House
Input: size in square feet
Output: price
3. Navigation of Mobile Robot
Input: sensor readings (GPS, camera, etc.)
Output: angle by which steering wheel should be turned
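The used-car example above can be sketched as a least-squares fit of y = wx + w₀; the training pairs below (age vs. price) are invented for illustration, with a single attribute standing in for the full attribute vector.

```python
import numpy as np

# Made-up training data: x = car age in years, y = price in thousands
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
y = np.array([18.0, 16.5, 14.8, 11.9, 7.5])

# Least-squares fit of y = w*x + w0: stack [x, 1] so lstsq solves for [w, w0]
A = np.column_stack([x, np.ones_like(x)])
(w, w0), *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(age):
    """Predicted price for a car of the given age."""
    return w * age + w0
```

With more attributes, x becomes a vector and w a weight vector, but the fitting step is the same.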
Unsupervised Learning
• Only input data is available (unlabeled)
• Aim is to find the regularities/pattern/structure in the input data
• Problems
• Clustering
• Association
Clustering
Places the data into different clusters
1. Customer Dataset analysis
• Company wants to know the distribution of the profile of its
customers
• Clustering model allocates customers similar in their attributes to the
same group
• It is also possible to identify outliers - customers who are different from
the rest
2. Image Compression
• Input: image pixels represented as RGB values
• The clustering program groups pixels with similar colors into the same
cluster (colors that occur frequently in the image)
3. Document clustering
• Ex: news reports; clusters: politics, sports, entertainment, etc.
• Each document is represented as a bag of words over a predefined lexicon of N words
4. Bioinformatics (DNA is a sequence of bases, protein is a
sequence of amino acids)
• Alignment - matching one sequence to another
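The customer-grouping example can be sketched with a minimal k-means loop; the two customer groups below are synthetic, and the initialisation heuristic (spread-out starting points) is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up customer groups: (annual spend in kEUR, visits per month)
data = np.vstack([rng.normal([10, 2], 1.0, (20, 2)),
                  rng.normal([50, 12], 1.0, (20, 2))])

def kmeans(X, k, iters=20):
    # Initialise centres from k spread-out data points (simple heuristic)
    centres = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centre (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centres) ** 2).sum(-1), axis=1)
        # Move each centre to the mean of the points assigned to it
        centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centres

labels, centres = kmeans(data, k=2)
```

The same loop, run on RGB pixel values instead of customer attributes, is the basis of the image-compression example.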
Association
Finding association between products
• Basket Analysis
• Finding associations between products bought by customer
• If people who buy X typically also buy Y, then a
customer who buys X but not Y is a potential Y
customer
• Target such potential customers for cross-selling
• Association rule - a conditional probability
• P(Y|X)
• Ex: P(jam|bread) = 0.8 -> 80% of customers who
buy bread also buy jam
• To make a distinction among customers, estimate
• P(Y|X,D), where D is a set of attributes such as gender, age, marital
status, etc.
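The rule confidence P(Y|X) can be estimated directly from transaction data; the baskets below are made up, echoing the bread/jam example.

```python
# Sketch: estimating the association rule confidence P(Y|X) from baskets
baskets = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"bread", "jam", "eggs"},
    {"milk", "eggs"},
]

def confidence(x, y, baskets):
    """P(Y|X): fraction of baskets containing x that also contain y."""
    with_x = [b for b in baskets if x in b]
    return sum(y in b for b in with_x) / len(with_x) if with_x else 0.0

print(confidence("bread", "jam", baskets))  # 3 of 4 bread baskets -> 0.75
```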
Reinforcement Learning
Closest to human learning
Aspect               | Supervised                                          | Unsupervised                                                  | Reinforcement
Definition           | Machine is trained with labelled data               | Machine is trained with unlabelled data, without any guidance | Agent interacts with the environment through actions and discovers errors or rewards
Types of problems    | Regression, classification                          | Association, clustering, dimensionality reduction             | Trial and error
Approach             | Map input to output                                 | Understand patterns and discover output                       | Follow trial-and-error method
Popular algorithms   | Linear/Logistic regression, SVM, KNN, Random Forest | K-means, Apriori                                              | Q-learning, SARSA
Example applications | Risk evaluation, sales forecasting                  | Recommendation systems, anomaly detection                     | Self-driving cars, gaming
Reference: Supervised vs Unsupervised vs Reinforcement Learning | Data Science Certification Training | Edureka - YouTube
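The Q-learning algorithm named in the table can be sketched on a tiny made-up environment: a 5-state corridor where the agent moves left or right and earns reward 1 at the right end. All parameter values below are illustrative choices.

```python
import random

n_states = 5
actions = [0, 1]                       # 0 = move left, 1 = move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.3      # learning rate, discount, exploration
random.seed(0)

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1   # next state, reward, done?

for episode in range(300):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection: trial and error
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# After training, the greedy policy moves right in every non-terminal state
policy = [max(actions, key=lambda act: Q[s][act]) for s in range(n_states)]
```

Note how no labels are given: the agent discovers the rewarding behaviour purely through interaction, which is the defining trait of the third column above.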
Examples of Machine Learning
Applications
Learning Associations:
• Retail (Basket Analysis): Identifying
associations between products bought together,
e.g., customers who buy beer also buy chips
(70% association).
• Booksellers: Cross-selling based on purchase
history of books or authors.
• Web Portals: Predicting links a user is likely to
click, preloading pages for faster access.
Classification:
• Predicting loan default risk based on customer
attributes like income, savings, and past financial
history.
• Classifying patient symptoms to diagnose
diseases.
Regression:
• Steering angle prediction for navigating
without hitting obstacles.
• Optimizing coffee quality based on various
settings (temperature, time, bean type).
Unsupervised Learning:
• Grouping similar data points without labeled
outcomes.
• Analyzing customer demographics and
transactions to identify common profiles.
Reinforcement Learning:
• Learning optimal strategies for games by trial
and error.
• Developing sequences of actions to achieve a goal.
Use Case in Smartphones
Did you know that machine learning powers most of the
features on your smartphone?
Voice Assistants
Input Representation
• Attributes: Choose relevant features to represent each
example.
• Example: Price (x₁) and Engine Power (x₂).
• Data Points: Each car is represented by a pair of
values (price, engine power).
• Example Plot: Training set plotted in a 2D space with
x₁ (Price) and x₂ (Engine Power).
Hypothesis Class:
A set of possible hypotheses to describe the class.
For family cars, the hypothesis class can be a
rectangle in the price-engine power space:
Hypothesis Class (H): (p₁ ≤ price ≤ p₂) AND (e₁ ≤ engine power ≤ e₂)
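This rectangle hypothesis can be sketched as a simple predicate; the price and engine-power bounds below are hypothetical.

```python
# Sketch of the rectangle hypothesis class for "family car":
# h returns 1 iff (p1 <= price <= p2) AND (e1 <= engine_power <= e2)
def make_rectangle_hypothesis(p1, p2, e1, e2):
    def h(price, engine_power):
        return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0
    return h

# Hypothetical bounds in the price-engine power space
h = make_rectangle_hypothesis(p1=10_000, p2=20_000, e1=100, e2=200)
print(h(15_000, 150))  # inside the rectangle -> 1
print(h(35_000, 300))  # outside -> 0
```

Learning then amounts to choosing the corner values (p₁, p₂, e₁, e₂) that best separate the positive and negative examples.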
Learning a Class from Examples
Learning Process
• Finding Hypothesis: The goal is to find the
hypothesis h ∈ H that best approximates the class C.
• Empirical Error: Proportion of training instances
where the predictions of h do not match the true
labels.
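Empirical error is just the fraction of mismatches on the training set, which is straightforward to compute; the hypothesis and data below are hypothetical one-dimensional stand-ins.

```python
# Sketch: empirical error of hypothesis h on a labelled training set -
# the proportion of instances where h's prediction differs from the label
def empirical_error(h, data):
    return sum(h(x) != label for x, label in data) / len(data)

h = lambda x: 1 if 2 <= x <= 5 else 0            # predicts positive on [2, 5]
data = [(1, 0), (3, 1), (4, 1), (6, 1), (7, 0)]  # (instance, true label)
print(empirical_error(h, data))  # one mismatch (x=6) out of five -> 0.2
```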
• Variance
Variance refers to the model's sensitivity to small
fluctuations in the training data. A model with high
variance pays too much attention to the training
data, including the noise, and might not perform well
on new, unseen data. This usually leads to
overfitting.
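High variance can be seen in a small fit-the-noise experiment. The data below are synthetic (a linear relation plus noise), with a degree-9 polynomial standing in for an overly flexible model.

```python
import numpy as np

# Synthetic data: a linear relation plus noise
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(0, 0.1, x.shape)

def train_error(degree):
    # Mean squared error of a degree-'degree' polynomial fit on the training set
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree-9 polynomial interpolates the 10 noisy points (near-zero
# training error): it has memorised the noise, the signature of high variance
print(train_error(1), train_error(9))
```

The flexible model looks better on the training set, but because it chases the noise it will typically do worse on new data drawn from the same linear relation.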
Noise in Data
Aspect                | Bias                                               | Variance
Definition            | Error introduced by a simplified model             | Error due to the model's sensitivity to data
Effect                | Systematic errors in predictions                   | High variability in predictions
Characteristics       | Misses underlying patterns                         | Captures noise along with patterns
Strategy to reduce    | Use more complex models                            | Use simpler models
Techniques            | Cross-validation, Regularization, Ensemble Methods | Cross-validation, Regularization, Ensemble Methods
Visual representation | Underfitting (high bias, low variance)             | Overfitting (low bias, high variance)