Machine Learning
Faculty of Computing
JiT
Jimma University
Introduction
It has long been understood that
learning is a key element of
intelligence.
This holds both for natural
intelligence (we all get smarter by learning)
and for artificial intelligence.
The types of machine
learning
Handwritten digits are a classic
case that is often used when
discussing
why we use machine learning, and we
will be no exception.
Below you can see examples of
images of handwritten digits from the
very commonly used MNIST dataset.
The correct label (what digit the
writer was supposed to write) is
shown above each image.
Note that some of the correct
class labels are questionable:
see for example the second image
from left: is that really a 7, or actually
a 4?
MNIST stands for "Modified NIST", where NIST
is the National Institute of Standards and Technology.
Three types of machine learning
The roots of machine learning are
in statistics, which can also be
thought of as the art of extracting
knowledge from data.
Especially methods such as
linear regression and Bayesian
statistics,
which are both already more than two
centuries old (!), are even today at
the heart of machine learning.
The area of machine learning is
often divided into subareas
according to the kinds of problems
being attacked.
A rough categorization is as
follows:
Supervised learning
We are given an input,
for example
a photograph with a traffic sign, and
the task is to predict the correct output or
label,
for example which traffic sign is in the
picture (speed limit, stop sign, etc.).
In the simplest cases, the answers
are in the form of
yes/no (we call these binary
classification problems).
Unsupervised learning
There are
no labels or correct outputs.
The task is to discover the
structure of the data:
for example,
grouping similar items to form
clusters, or
reducing the data to a small number
of important dimensions.
Data visualization can also be
considered unsupervised learning.
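The dimension-reduction idea above can be sketched with principal component analysis (PCA). This is only an illustrative sketch, assuming scikit-learn is installed; the iris dataset is used simply as convenient example data:

```python
# Minimal sketch of unsupervised dimensionality reduction with PCA.
# No labels are used: PCA only looks at the structure of the inputs.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data            # 150 samples, 4 features each
pca = PCA(n_components=2)       # keep the 2 most important directions
X_2d = pca.fit_transform(X)     # project every sample onto them

print(X_2d.shape)               # the same samples, now in 2 dimensions
print(pca.explained_variance_ratio_)  # variance kept by each direction
```

The two-dimensional result could then be plotted as a scatter plot, which is exactly the kind of visualization mentioned above.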
Reinforcement learning
Commonly used in situations
where
an AI agent like a self-driving car
must operate in an environment and
where feedback about good or bad
choices is available with some delay.
Also used in games where the
outcome may be decided only at
the end of the game.
Semisupervised learning
As the name suggests, semisupervised
learning is partly supervised and
partly unsupervised.
Note that these categories are somewhat
overlapping and fuzzy,
so a particular method can
sometimes be hard to place in one
category.
Classification
When it comes to machine
learning,
we will focus primarily on supervised
learning, and in particular,
classification tasks.
In classification,
we observe an input,
such as a photograph of a traffic sign,
and
try to infer its “class”, such as the type
of sign (speed limit 80 km/h, pedestrian
crossing, stop sign, etc.).
Other examples of classification
tasks include:
identification of fake Twitter accounts
(input includes the list of followers, and
the rate at which they have started
following the account, and
the class is either fake or real account)
and handwritten digit recognition (input
is an image, class is 0,...,9).
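As a minimal sketch of the digit-recognition task, scikit-learn ships a small MNIST-like dataset of 8x8 digit images. This assumes scikit-learn is installed, and the nearest-neighbor classifier is just one possible method choice:

```python
# Classification sketch: input is an image (64 pixel values),
# class is one of the digits 0..9.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()                     # small MNIST-like dataset
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

clf = KNeighborsClassifier(n_neighbors=3)  # classify by nearest examples
clf.fit(X_train, y_train)                  # "train" on labeled examples
test_accuracy = clf.score(X_test, y_test)  # accuracy on unseen digits
print(test_accuracy)
```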
Humans teaching machines:
supervised learning
Instead of manually writing down
exact rules to do the classification,
the point in
supervised machine learning is to
take a number of examples, label
each one by the
correct label, and use them to
“train” an AI method to
automatically recognize the correct
label for the training examples as
well as for new, previously unseen
examples. This of course requires
that the correct
labels are provided, which is why
we talk about
supervised learning. The user who
provides the correct labels is a
supervisor who guides
the learning algorithm towards
correct answers so that eventually,
the algorithm can
independently produce them.
In addition to learning how to
predict the correct label in a
classification problem,
supervised learning can also be
used in situations where the
predicted outcome is a
number. Examples include
predicting the number of people
who will click a Google ad
based on the ad content and other
data about the user.
Example
Suppose we have a data set
consisting of apartment sales data.
For each purchase, we
would obviously have the price
that was paid, together with the
size of the apartment in
square meters (or square feet, if
you like), and the number of
bedrooms, the year of
construction, the condition (on a
scale from poor to excellent), and so on.
Learning to predict the sales price of a
new apartment based on these features
would be a supervised learning task.
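A sketch of such a price prediction: the apartment sizes and prices below are invented purely for illustration, and scikit-learn's linear regression (assumed installed) is just one simple choice of method:

```python
# Regression sketch: predict a number (price) from an input (size).
# The data are hypothetical, invented for illustration only.
from sklearn.linear_model import LinearRegression

sizes = [[30], [45], [60], [75], [90]]    # input: size in square meters
prices = [90_000, 130_000, 175_000, 220_000, 260_000]  # output: price paid

model = LinearRegression()
model.fit(sizes, prices)                  # learn the price-from-size rule
predicted = model.predict([[70]])[0]      # predict price of a 70 m^2 flat
print(round(predicted))
```

A real model would of course use many more features (bedrooms, year, condition) and far more data.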
Caveat: careful with that
machine learning algorithm
There are a couple of potential
mistakes that we'd like to make
you aware of. They are
related to the fact that unless you
are careful with the way you apply
machine learning
methods, you could become too
confident about the accuracy of
your predictions, and be
heavily disappointed when the
predictions fail on new, unseen data.
The first thing to keep in mind in
order to avoid big mistakes, is to
split your data set into
two parts: the training data and
the test data. We first train the
algorithm using only the
training data. This gives us a
model or a rule that predicts the
output based on the input
variables.
To assess how well we can actually
predict the outputs, we can't count
on the training
data. While a model may be a very
good predictor in the training data,
it is no proof that it
can generalize to any other data.
This is where the test data comes
in handy: we can
apply the trained model to predict
the outputs for the test data and
compare the predictions to the
actual outputs.
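The split-train-evaluate workflow just described might be sketched like this (assuming scikit-learn is installed; the iris dataset and the decision tree are arbitrary illustrative choices):

```python
# Sketch of the train/test split workflow:
# train on one part of the data, evaluate on the held-out rest.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)   # keep 30% hidden for testing

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(test_accuracy)   # accuracy on data the model has never seen
```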
Too fit to be true! Overfitting
alert
It is very important to keep in mind
that the accuracy of a predictor
learned by machine
learning can be quite different in
the training data and in separate
test data. This is the so-called
overfitting phenomenon, and a lot
of machine learning research is
focused on avoiding
it one way or another. Intuitively,
overfitting means trying to be too smart.
Suppose you notice that every recent
top-20 hit is a love song with a catchy
chorus, so you predict that any new love
song with a catchy chorus will be a hit.
However, maybe there are two love
songs with catchy choruses
that
didn't make the top-20, so you
decide to amend your rule by adding
"...except if Sweden or yoga are
mentioned".
This could make your rule fit the
past data perfectly, but it
could in fact make it work worse on
new, future data.
Machine learning methods are
especially prone to overfitting
because they can try a huge
number of different “rules” until
one that fits the training data
perfectly is found. Especially
methods that are very flexible and
can adapt to almost any pattern in
the data can overfit
unless the amount of data is enormous.
Learning to avoid overfitting and
choose a model that is not too
restricted, nor too
flexible, is one of the most
essential skills of a data scientist.
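One way to see overfitting concretely is to compare a very flexible model against a more restricted one. The sketch below (assuming scikit-learn is installed) uses synthetic data with 20% of the labels deliberately flipped, so that a rule fitting the training data perfectly cannot be entirely right:

```python
# Sketch of overfitting on synthetic, deliberately noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10,
                           flip_y=0.2,       # flip 20% of labels: noise
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# An unrestricted tree can memorize every training example...
flexible = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# ...while a depth-limited tree is forced to stay simpler.
restricted = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X_train, y_train)

print(flexible.score(X_train, y_train))   # perfect (1.0) on training data
print(flexible.score(X_test, y_test))     # clearly lower on test data
print(restricted.score(X_test, y_test))   # simpler model, often no worse
```

The flexible tree's perfect training accuracy is exactly the "too good to be true" signal: it has fitted the noise, not just the pattern.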
Learning without a teacher:
unsupervised learning
Above we discussed supervised
learning where the correct answers
are available, and the
task of the machine learning
algorithm is to find a model that
predicts them based on the
input data.
In unsupervised learning, the
correct answers are not provided.
This makes the situation
quite different since we can't build
the model by making it fit the
correct answers on
training data. It also makes the
evaluation of performance more
complicated since we
can't check whether the learned
model is doing well or not.
Typical unsupervised learning
methods attempt to learn some
kind of “structure”
underlying the data. This can
mean, for example, visualization
where similar items are
placed near each other and
dissimilar items further away from
each other. It can also
mean clustering, where we use the
data to identify groups or "clusters"
of items that are similar to each other
but dissimilar from items in other clusters.
Example
As a concrete example, grocery
store chains collect data about
their customers' shopping
behavior (that's why you have all
those loyalty cards). To better
understand their customers,
the store can either visualize the
data using a graph where each
customer is represented by a
dot and customers who tend to buy
similar products are placed near each
other, or it can cluster the customers
into groups with similar purchasing
behavior.
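Clustering like this can be sketched with the classic k-means algorithm; the six 2-D points below are invented stand-ins for customer data, and scikit-learn is assumed to be installed:

```python
# Clustering sketch: k-means groups unlabeled points into clusters.
# The points are invented, forming two obvious groups.
from sklearn.cluster import KMeans

points = [[1, 1], [1, 2], [2, 1],        # one group near (1, 1)
          [8, 8], [8, 9], [9, 8]]        # another group near (8, 8)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)    # cluster id assigned to each point
```

No correct answers were given: the algorithm discovered the two groups purely from the structure of the data.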
Yet another example of
unsupervised learning can be
termed generative modeling. This
has become a prominent approach
in the last few years, as a deep
learning technique
called generative adversarial
networks (GANs) has led to great
advances. Given some
data, for example, photographs of people's
faces, a generative model can generate more
of the same: more real-looking but artificial
images of people's faces.
We will return to GANs and the implications of
being able to produce high-quality
artificial image content a bit later in the course,
but next we will take a closer look at
supervised learning and discuss some specific
methods in more detail.