L3 - Supervised and Unsupervised Learning
When to use Machine Learning
Any field that needs to interpret and act on data can benefit from Machine Learning techniques.
Supervised Learning
• The output for each given input is known beforehand, and the machine must learn to map each input to its correct output.
• Supervised learning is mainly used to solve regression and classification problems.
• Labelled data is used for training.
• Popular Algorithms: Linear Regression, Support Vector Machines (SVM), Neural Networks, Decision Trees, Naive
Bayes, Nearest Neighbor.
• It is mainly used in Predictive Modelling.
• Supervised learning is often described as task-driven. It focuses on a single task, feeding more and more
examples to the algorithm until it can perform that task accurately. Examples include:
Advertisement Popularity: Many of the ads you see as you browse the internet are placed there because a learning
algorithm said that they were of reasonable popularity (and clickability).
Spam Classification: Your email spam filter is a supervised learning system. Fed example emails and their labels (spam/not spam),
such systems learn to filter out malicious emails preemptively so that users are not harassed by them.
Face Recognition: Do you use Facebook? Most likely your face has been used in a supervised learning algorithm that is
trained to recognize your face. A system that takes a photo, finds faces, and guesses who is in the photo
(suggesting a tag) is a supervised process. It has multiple layers, first finding faces and then identifying them, but it is still
supervised.
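To make the spam example concrete, here is a minimal sketch of a supervised classifier using Naive Bayes, one of the algorithms listed above. It assumes whitespace tokenization and a tiny hand-made training set; the function names `train` and `predict` and the example emails are hypothetical, not taken from any library or real dataset.

```python
import math
from collections import Counter

def train(texts, labels):
    """Count words per class from labelled examples (multinomial Naive Bayes)."""
    word_counts = {label: Counter() for label in set(labels)}
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    class_counts = Counter(labels)
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability for `text`."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        score = math.log(class_counts[label] / total)  # log prior P(label)
        for word in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy, hand-made labelled training set (hypothetical)
texts = ["win money now", "free prize click now", "claim your free money",
         "meeting at noon", "project report attached", "lunch tomorrow at noon"]
labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]
model = train(texts, labels)
print(predict(model, "free money prize"))  # spam
print(predict(model, "lunch at noon"))     # not spam
```

The labels are what make this supervised: the model only learns which words indicate spam because every training email came with the correct answer.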
Unsupervised Learning
• Unsupervised learning uses unlabeled data: the output for the given inputs is unknown. Only the input data is given,
and the model is run on it.
• The inputs (for example, images) are grouped together, and insights are drawn from the inputs themselves.
Most of the real-world data available is of this unlabeled kind.
• Typical problems involve finding relationships within the given data: clustering (grouping), anomaly detection
(e.g., flagging unusual transactions in banks), and association (e.g., people who buy X also tend to buy Y).
• Popular Algorithms: k-means clustering, association rule learning.
• It is mainly used in Descriptive Modelling.
• Because unsupervised learning is based upon the data and its properties, we can
say that unsupervised learning is data-driven.
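The k-means algorithm listed above can be sketched in a few lines. This is a minimal version of Lloyd's algorithm for 2-D points, assuming Euclidean distance and random initialization from the data; the function names and the sample points are illustrative only.

```python
import random

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Unlabeled data: two visibly separate groups of points (made up for illustration)
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

No labels are supplied; the grouping emerges purely from the distances between points, which is why this counts as data-driven.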
Unsupervised Learning Examples
• Recommender Systems: If you’ve ever used YouTube or Netflix, you’ve most likely encountered a
video recommendation system. These systems are often placed in the unsupervised domain.
We know things about the videos, such as their length and genre, and we also know the watch history
of many users. By looking at users who have watched the same videos as you and then enjoyed
other videos that you have yet to see, a recommender system can detect this relationship in the data and
suggest those videos to you.
• Buying Habits: It is likely that your buying habits are contained in a database somewhere and that
data is being bought and sold actively at this time. These buying habits can be used in unsupervised
learning algorithms to group customers into similar purchasing segments. This helps companies
market to these grouped segments and can even resemble recommender systems.
• Grouping User Logs: Less user-facing but still very relevant, unsupervised learning can be used to
group user logs and issues. This can help companies identify central themes in the issues their customers
face and fix them, by improving a product or designing an FAQ to handle common
issues. Either way, it is something that is actively done and if you’ve ever submitted an issue with a
product or submitted a bug report, it is likely that it was fed to an unsupervised learning algorithm to
cluster it with other similar issues.
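The recommender idea above can be sketched as simple user-based collaborative filtering. This is one possible approach, not how YouTube or Netflix actually work: it assumes Jaccard similarity between watch histories, and all user names, video names, and functions here are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two watch histories: shared videos / videos watched by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, histories, top_n=3):
    """Score unseen videos by how similar their viewers are to `target`."""
    seen = histories[target]
    scores = {}
    for user, watched in histories.items():
        if user == target:
            continue
        similarity = jaccard(seen, watched)
        if similarity == 0:
            continue  # users with nothing in common contribute no evidence
        for video in watched - seen:
            scores[video] = scores.get(video, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda v: (-scores[v], v))[:top_n]

histories = {
    "alice": {"cooking101", "baking_basics", "knife_skills"},
    "bob":   {"cooking101", "baking_basics", "sourdough"},
    "carol": {"cooking101", "knife_skills", "sourdough", "pasta"},
    "dave":  {"football_highlights", "f1_recap"},
}
print(recommend("alice", histories))  # ['sourdough', 'pasta']
```

No video is ever labelled "good for Alice"; the suggestion comes only from patterns in the unlabeled watch histories.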
Semi-supervised Learning
• The most basic disadvantage of any Supervised Learning algorithm is that the
dataset has to be hand-labeled either by a Machine Learning Engineer or a Data
Scientist. This is a very costly process, especially when dealing with large volumes
of data.
• The most basic disadvantage of any Unsupervised Learning algorithm is that its range of
applications is limited, since without labels it cannot be trained to predict a specific output.
• Semi-supervised learning addresses both of these drawbacks.
• It sits in between Supervised and Unsupervised Learning.
• The basic procedure involved is that first, the programmer will cluster similar data
using an unsupervised learning algorithm and then use the existing labeled data to
label the rest of the unlabeled data.
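The cluster-then-label procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the data is one-dimensional, the clustering step is plain k-means, and each cluster's unlabeled points receive the majority label of that cluster's labeled points. The function names and data are hypothetical.

```python
import random
from collections import Counter

def kmeans_1d(values, k, iterations=100, seed=0):
    """Plain k-means on numbers; returns the cluster index of each value."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda i: abs(v - centroids[i])) for v in values]
        new_centroids = []
        for i in range(k):
            members = [v for v, a in zip(values, assignment) if a == i]
            new_centroids.append(sum(members) / len(members) if members else centroids[i])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return assignment

def cluster_then_label(values, labels, k):
    """labels[i] is a known label or None; unknown entries get their cluster's majority label."""
    assignment = kmeans_1d(values, k)  # step 1: unsupervised clustering of ALL data
    majority = {}
    for cluster in set(assignment):    # step 2: use the few known labels per cluster
        known = [lab for lab, a in zip(labels, assignment) if a == cluster and lab is not None]
        majority[cluster] = Counter(known).most_common(1)[0][0] if known else None
    return [lab if lab is not None else majority[a] for lab, a in zip(labels, assignment)]

values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
labels = ["low", None, None, None, "high", None, None, None]  # only two hand-labelled points
print(cluster_then_label(values, labels, k=2))
```

Only two of the eight points needed hand-labelling; the clustering step carries those labels to the rest.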
Summary
• Supervised: All data is labeled and the algorithms learn to predict the output from
the input data.
• Unsupervised: All data is unlabeled and the algorithms learn the inherent structure
from the input data.
Summary: Types of Machine Learning
• Supervised learning (also called inductive learning): Training data includes
desired outputs, e.g., “this is spam, this is not”.
• Unsupervised learning: Training data does not include desired outputs. An example
is clustering. It is hard to tell what is good learning and what is not.
• Semi-supervised learning: Training data includes a few desired outputs.
• Reinforcement learning: The learner receives rewards from a sequence of actions. It is popular
in AI research and is often considered the most ambitious type of learning.
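To illustrate "rewards from a sequence of actions", here is a minimal sketch of tabular Q-learning, one standard reinforcement learning method. The environment is a hypothetical toy corridor: the agent starts in state 0, can step left or right, and receives a reward of 1 only when it reaches the last state. All parameter values are illustrative assumptions.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: start in state 0, reward 1 on reaching the last state."""
    rng = random.Random(seed)
    actions = (-1, 1)  # step left or step right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                best = max(q[(state, a)] for a in actions)
                # exploit current estimates, breaking ties randomly
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            next_state = min(max(state + action, 0), goal)  # walls at both ends
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(q[(next_state, a)] for a in actions)
            # Move the estimate toward reward + discounted value of the next state
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(4)]
print(policy)  # [1, 1, 1, 1]: always step right toward the reward
```

Unlike the supervised case, no one tells the agent which move is correct; it discovers that stepping right pays off only because the reward eventually propagates back through the sequence of actions.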
Resources:
Datasets
• UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
• UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
• Statlib: http://lib.stat.cmu.edu/
• Delve: http://www.cs.utoronto.ca/~delve/
Importance of AI and Machine Learning
• https://www.youtube.com/watch?v=czVeSFH4dWc
• https://www.youtube.com/watch?v=f_uwKZIAeM0
Commonly Used Terms
• Labelled data: Data in which each input is paired with its known output (label), for example a folder of
images each labelled as cat or dog, or house sizes each paired with the house's price.
• Classification: Assigning inputs to groups with definite (discrete) values, e.g., 0 or 1; cat, dog,
or orange.
• Regression: Estimating the most probable value of a variable, or the relationship among variables,
e.g., estimating the price of a house based on its size.
• Association: Discovering interesting relations between variables in large databases, e.g., that
people who buy X also tend to buy Y.
• Prediction: Once the model is trained, it can be fed a set of inputs, for which it will provide a
predicted output (label).
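The regression and prediction terms above can be shown together with ordinary least squares for a straight line. This is a minimal sketch assuming a linear relationship between size and price; the sizes and prices are made-up numbers, not real housing data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labelled data: house size (square metres) -> price (thousands)
sizes = [50, 70, 90, 110]
prices = [150, 190, 230, 270]
slope, intercept = fit_line(sizes, prices)  # training: learn from labelled examples
print(slope, intercept)                     # 2.0 50.0
print(slope * 100 + intercept)              # prediction for a 100 m² house: 250.0
```

Fitting the line is the regression step; plugging a new size into the fitted line is the prediction step.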