Chapter 1
Lecture 1
1. Machine Learning is great for problems for which existing solutions require
a lot of hand-tuning or long lists of rules: one Machine Learning algorithm
can often simplify code and perform better.
2. Whether or not they can learn incrementally on the fly (online versus batch
learning).
3. Whether they work by simply comparing new data points to known data
points, or instead detect patterns in the training data and build a predictive
model, much like scientists do (instance-based versus model-based
learning).
For example, a spam filter may learn on the fly using a deep neural network
model trained on examples of spam and ham; this makes
it an online, model-based, supervised learning system.
Supervised, Unsupervised, Semisupervised, and Reinforcement Learning
Supervised Learning
• The training data you feed to the algorithm includes the desired solutions,
called labels.
• Classification: The Spam Filter is trained with many example emails along
with their class (spam or ham), and it must learn how to classify new emails.
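The classification idea above can be sketched in a few lines. This is a toy illustration only, not how a production spam filter works: the four training emails, the word-count scoring rule, and the names `TRAINING_EMAILS`, `train`, and `classify` are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical labeled training set: each email comes with its class (the label).
TRAINING_EMAILS = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def classify(counts, text):
    """Label a new email by the class in which its words were seen more often."""
    words = text.lower().split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"


model = train(TRAINING_EMAILS)
print(classify(model, "claim your free money"))      # spam
print(classify(model, "agenda for the team lunch"))  # ham
```

The labels are what make this supervised: the algorithm never decides on its own what "spam" means, it only generalizes from the examples it was given.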
https://www.mdpi.com/2078-2489/10/4/150/htm
Unsupervised Learning
• The training data is unlabeled. The system tries to learn without a teacher.
Notice:
1. how animals are rather well separated from vehicles.
2. how horses are close to deer but far from birds, and so on.
Dimensionality Reduction
The goal is to simplify the data without losing too much information.
• Merge several correlated features into one (called feature extraction).
• For example, a car’s mileage may be strongly correlated with its age.
• The dimensionality reduction algorithm will merge mileage and age into one
feature that represents the car’s wear and tear.
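A minimal sketch of the mileage-and-age merge, assuming made-up values for five cars. Both features are standardized and then combined as (z1 + z2) / √2; for two standardized, positively correlated features this is the projection onto the first principal component. The slides do not name a specific algorithm, so treat this choice as an illustration rather than the intended method.

```python
from statistics import mean, pstdev

# Made-up illustrative values: (mileage in km, age in years) for five cars.
cars = [(20_000, 1), (45_000, 3), (80_000, 5), (120_000, 8), (150_000, 10)]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


mileage_z = standardize([mileage for mileage, _ in cars])
age_z = standardize([age for _, age in cars])

# Merge the two correlated features into a single "wear and tear" feature.
wear_and_tear = [(m + a) / 2 ** 0.5 for m, a in zip(mileage_z, age_z)]

for (mileage, age), score in zip(cars, wear_and_tear):
    print(f"{mileage:>7} km, {age:>2} years -> wear and tear {score:+.2f}")
```

Two columns have become one, and the ordering of the cars from least to most worn is preserved.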
Anomaly Detection
Examples include detecting unusual credit card transactions to prevent fraud,
catching manufacturing defects, and automatically removing outliers from a
dataset before feeding it to another learning algorithm.
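One simple way to flag unusual values is a robust outlier test based on the median. The sketch below is an assumption, not a method named in the slides: it uses the modified z-score built on the median absolute deviation (MAD), with the commonly used constant 0.6745 and cutoff 3.5. The transaction amounts and the name `find_outliers` are hypothetical.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; this test cannot rank them.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical credit card transaction amounts, one of them suspicious.
amounts = [12.5, 9.9, 14.2, 11.0, 13.7, 10.4, 950.0, 12.1]
print(find_outliers(amounts))  # [950.0]
```

The median is used instead of the mean because a single extreme value would otherwise inflate the very statistics used to detect it.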
Semi-supervised Learning
• The training data is partially labeled: usually a lot of unlabeled data and a
little labeled data.
• Example: photo-hosting services such as Google Photos.
Reinforcement Learning
• Many robots implement Reinforcement Learning algorithms to learn how to walk.
Batch and Online Learning
Batch Learning
• The system is incapable of learning incrementally: it must be trained using all the
available data.
• It takes a lot of time and computing resources.
• It is typically done offline.
• Offline learning: the system is trained, launched into production, and then runs
without learning anymore; it just applies what it has learned.
• The whole process of training, evaluating, and launching a Machine Learning system
can be automated fairly easily.
Online Learning
• The system is trained incrementally by feeding it data instances sequentially,
either individually or in small groups called mini-batches.
• Each learning step is fast and cheap, so the system can learn about new data
on the fly, as it arrives.
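The incremental update described above can be sketched with a one-parameter linear model trained by stochastic gradient descent, one mini-batch at a time. Everything here is an assumption for illustration: the simulated stream (where y = 3x), the learning rate, and the names `sgd_step` and `stream_of_mini_batches`.

```python
def sgd_step(weight, batch, learning_rate=0.05):
    """One cheap incremental update of y ≈ weight * x from a single mini-batch."""
    gradient = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
    return weight - learning_rate * gradient


def stream_of_mini_batches(steps=200):
    """Hypothetical data stream: yields mini-batches of (x, y) pairs with y = 3x."""
    xs = [0.5, 1.0, 1.5, 2.0]
    for step in range(steps):
        x1 = xs[step % 4]
        x2 = xs[(step + 1) % 4]
        yield [(x1, 3 * x1), (x2, 3 * x2)]


weight = 0.0  # the model starts out knowing nothing
for batch in stream_of_mini_batches():
    # Each batch is used once and can then be discarded.
    weight = sgd_step(weight, batch)

print(round(weight, 3))  # approaches 3.0
```

The model never sees the whole dataset at once; it only keeps its current weight, which is why each step stays fast no matter how much data has already arrived.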
Instance-based Learning
• The system learns the examples by heart, then generalizes to new cases
using a similarity measure.
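A minimal sketch of this idea is a nearest-neighbor classifier: memorize the examples, then give a new point the label of the most similar stored one. Euclidean distance is used as the similarity measure here, which is an assumption; the points, labels, and the name `predict` are hypothetical.

```python
from math import dist

# Hypothetical examples the system has "learned by heart": (point, label).
known_points = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((8.0, 8.0), "B"),
    ((9.0, 7.5), "B"),
]


def predict(new_point, examples=known_points):
    """Return the label of the stored example closest to new_point."""
    _, nearest_label = min(examples, key=lambda ex: dist(ex[0], new_point))
    return nearest_label


print(predict((2.0, 1.5)))  # A
print(predict((7.0, 9.0)))  # B
```

There is no training step beyond storing the data; all the work happens at prediction time, when the new point is compared against every known point.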
Model-based Learning
• Another way to generalize from a set of examples is to build a model of these
examples, then use that model to make predictions.
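In contrast to memorizing examples, a model-based learner compresses them into a few parameters. The sketch below fits a straight line by ordinary least squares and then predicts from the line alone. The choice of a linear model, the data values, and the names `fit_line` and `predict` are assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y ≈ intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Use the fitted parameters, not the training examples, to predict."""
    intercept, slope = model
    return intercept + slope * x


# Made-up training examples, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

model = fit_line(xs, ys)
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}")  # intercept=1.15, slope=1.94
print(f"prediction for x=5: {predict(model, 5.0):.2f}")   # 10.85
```

Once the two parameters are learned, the training examples can be thrown away, which is the main practical difference from instance-based learning.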
• Suppose you want to know if money makes people happy.
• Question: Does money make people happy?
References
• Pages 3-18 of Hands-On Machine Learning with Scikit-Learn and
TensorFlow.
• Pages 1-13 of Introduction to Machine Learning with Python.
• https://learning.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ch01.html