Supervised Learning
All content following this page was uploaded by Qiong Liu on 01 April 2015.
Synonyms
Supervised Machine Learning; Learning with a Teacher; Learning from Labeled Data; Regression;
Classification; Inductive Machine Learning; Active Learning; Semi-supervised Learning;
Definition
Supervised Learning is a machine learning paradigm for acquiring the input-output relationship of a
system from a given set of paired input-output training samples. Because the output is regarded as the
label of the input data, or the supervision, an input-output training sample is also called labelled
training data or supervised data. Supervised Learning is occasionally also referred to as Learning with a
Teacher (Haykin, 1998), Learning from Labelled Data, or Inductive Machine Learning (Kotsiantis, 2007).
The goal of supervised learning is to build an artificial system that can learn the mapping between the
input and the output and can predict the output of the system given new inputs. If the output takes a
finite set of discrete values that indicate the class labels of the input, the learned mapping leads to the
classification of the input data. If the output takes continuous values, it leads to a regression of the
input. The input-output relationship is frequently represented with learning-model parameters. When
these parameters are not directly available from the training samples, a learning system needs to go
through an estimation process to obtain them. Unlike Unsupervised Learning, Supervised Learning
requires training data that carry supervised (labelled) information, whereas the training data for
Unsupervised Learning are unlabelled (i.e., merely the inputs). If an algorithm uses both supervised and
unsupervised training data, it is called a Semi-supervised Learning algorithm. If an algorithm actively
queries a user or teacher for labels during the training process, the resulting iterative supervised
learning is called Active Learning.
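As a minimal sketch of the Active Learning idea, assume a one-dimensional input whose class label switches from 0 to 1 at some unknown threshold. The learner below asks a teacher for a label only where it is still uncertain, instead of labelling the whole pool. The names `active_learn_threshold`, `oracle`, and `pool`, and the teacher's threshold of 0.62, are hypothetical and chosen only for illustration; this is one simple querying strategy, not a definitive algorithm.

```python
def active_learn_threshold(pool, oracle):
    """Locate a 1-D class boundary by querying labels only where uncertain.

    Assumes (for illustration) that inputs below some unknown threshold
    have label 0 and inputs above it have label 1.
    """
    lo = float("-inf")  # largest input known to have label 0
    hi = float("inf")   # smallest input known to have label 1
    labelled = {}
    while True:
        # Points strictly between the known classes are still uncertain.
        candidates = sorted(x for x in pool if lo < x < hi)
        if not candidates:
            break
        x = candidates[len(candidates) // 2]  # query the middle candidate
        y = oracle(x)                         # ask the teacher for the label
        labelled[x] = y
        if y == 0:
            lo = x
        else:
            hi = x
    return labelled, (lo, hi)


# Hypothetical unlabelled pool and teacher.
pool = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def oracle(x):
    return 1 if x > 0.62 else 0


labelled, boundary = active_learn_threshold(pool, oracle)
print(len(labelled), "queries; boundary lies between", boundary)
```

In this sketch the learner needs labels for only four of the nine pool points to bracket the boundary, which is the practical appeal of querying actively.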
Overview
[Figure 1: Block diagram of Supervised Learning]
Figure 1 shows a block diagram that illustrates the form of Supervised Learning. In this diagram, (xi,yi) is a
supervised training sample, where ‘x’ represents the system input, ‘y’ represents the system output (i.e., the
supervision or labelling of the input x), and ‘i’ is the index of the training sample. During a Supervised
Learning process, a training input xi is fed to the Learning System, and the Learning System generates an
output ỹi. The Learning System output ỹi is then compared with the ground-truth label yi by an arbitrator
that computes the difference between them. The difference, termed the Error Signal in this diagram, is then
sent to the Learning System to adjust the parameters of the learner. The goal of this learning process is to
obtain a set of optimal Learning System parameters that minimize the differences between ỹi and yi for all i,
i.e., that minimize the total error over the entire training data set.
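The loop described above can be sketched for one very simple learner. The example assumes a linear Learning System ỹ = w·x + b and a per-sample update proportional to the error signal (the least-mean-squares rule); both choices, along with the function name, learning rate, epoch count, and toy data, are illustrative assumptions rather than something the entry prescribes.

```python
def train_linear_learner(samples, learning_rate=0.05, epochs=1000):
    """Fit y ~ w*x + b by feeding the error signal back into the learner."""
    w, b = 0.0, 0.0  # learner parameters, adjusted during training
    for _ in range(epochs):
        for x, y in samples:
            y_pred = w * x + b           # Learning System output
            error = y - y_pred           # arbitrator: error signal
            w += learning_rate * error * x  # adjust parameters to shrink the error
            b += learning_rate * error
    return w, b


# Toy supervised samples (x_i, y_i) generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]

w, b = train_linear_learner(samples)
total_error = sum((y - (w * x + b)) ** 2 for x, y in samples)
print(f"w={w:.3f}, b={b:.3f}, total squared error={total_error:.2e}")
```

On this noise-free data the parameters settle close to w = 2 and b = 1, so the total squared error over the training set becomes negligible.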
A notable phenomenon is that a minimum training error does not necessarily indicate good performance in
testing. Training refers to the learning process that estimates the parameters of the learner from the
ground-truth supervised data seen, while testing evaluates the predictions of the learner on unseen data,
i.e., data that were not included in the training process. Therefore, even if a learner achieves a minimum
error on the training data, it is not guaranteed to perform well on unseen data. The main reason is possible
overfitting to the training data, i.e., the learner has an unnecessarily high order of complexity for
learning the mapping. This issue is referred to as generalizability. A good learning algorithm must have
good generalizability. To take generalizability into consideration in designing the learner, a learning
algorithm needs to balance the objective of minimizing the training error against the complexity of the
learner (e.g., the structure and the order of the learner). For example, in the Support Vector Machine, the
generalizability of the learner is characterized by the margin of the learned discrimination boundary. The
larger the margin, the better the generalization. The Support Vector Machine learns a maximum-margin
classifier over the training set, and thus it naturally leads to good generalization performance
(Vapnik, 1995).
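The gap between training error and testing error can be illustrated with a small, hand-made data set. The sketch below is not a Support Vector Machine; it contrasts two much simpler learners, chosen here as assumptions for illustration: a 1-nearest-neighbour rule that memorizes every training sample, and a single-threshold classifier with far less freedom. The data, the two flipped training labels, and all function names are hypothetical.

```python
def error_count(predict, data):
    """Number of samples whose predicted label differs from the true label."""
    return sum(1 for x, y in data if predict(x) != y)


def nearest_neighbor_predict(train, x):
    """Complex learner: return the label of the closest training input."""
    return min(train, key=lambda s: abs(s[0] - x))[1]


def fit_threshold(train):
    """Simple learner: pick the threshold t (predict 1 if x > t) with fewest training errors."""
    candidates = [i / 10 for i in range(11)]
    return min(candidates, key=lambda t: error_count(lambda x: int(x > t), train))


# True rule: label 1 if x > 0.5. Two training labels (x=0.25, x=0.75) are flipped as noise.
train = [(0.05, 0), (0.15, 0), (0.25, 1), (0.35, 0), (0.45, 0),
         (0.55, 1), (0.65, 1), (0.75, 0), (0.85, 1), (0.95, 1)]
# Unseen test data with clean labels.
test = [(0.02, 0), (0.12, 0), (0.22, 0), (0.32, 0), (0.42, 0),
        (0.52, 1), (0.62, 1), (0.72, 1), (0.82, 1), (0.92, 1)]

nn_train_errors = error_count(lambda x: nearest_neighbor_predict(train, x), train)
nn_test_errors = error_count(lambda x: nearest_neighbor_predict(train, x), test)

threshold = fit_threshold(train)
stump_train_errors = error_count(lambda x: int(x > threshold), train)
stump_test_errors = error_count(lambda x: int(x > threshold), test)

print("1-NN      train/test errors:", nn_train_errors, nn_test_errors)
print("threshold train/test errors:", stump_train_errors, stump_test_errors)
```

The memorizing learner reaches zero training error but reproduces the label noise on unseen inputs, while the simpler learner accepts some training error and generalizes better on this data.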
The Supervised Learning paradigm does not restrict the sources of the input or output data. The input or
output may belong to a vector space or a set of discrete values. The learning paradigm does not have special
restrictions on the arbitrator either. If yi is drawn from a continuous space, the error signal is usually
computed as yi - ỹi. If yi belongs to a set of discrete values, the arbitrator usually outputs the error signal
based on whether yi and ỹi are equal. For example, the arbitrator may output 0 when yi and ỹi are equal and
1 when they differ.
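The two kinds of arbitrator described above can be written as small functions. This is a minimal sketch: the function names are hypothetical, and summing absolute error signals to get a total error is an assumption made here for illustration, since the entry does not fix how individual errors are aggregated.

```python
def continuous_error(y, y_pred):
    """Arbitrator for continuous outputs: the signed difference y - y_pred."""
    return y - y_pred


def discrete_error(y, y_pred):
    """Arbitrator for discrete outputs: 0 if the labels are equal, 1 otherwise."""
    return 0 if y == y_pred else 1


def total_error(error_fn, targets, predictions):
    """Aggregate error over a data set (assumed here: sum of absolute error signals)."""
    return sum(abs(error_fn(y, p)) for y, p in zip(targets, predictions))


# Regression-style outputs.
regression_total = total_error(continuous_error, [3.0, 1.0], [2.5, 1.5])

# Classification-style outputs.
classification_total = total_error(discrete_error, ["cat", "dog", "cat"], ["cat", "cat", "cat"])

print(regression_total, classification_total)
```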
There are different approaches to the design of the Learning System in Supervised Learning. Some
well-known approaches include the logic-based approach, the multi-layer perceptron approach, the
statistical-learning approach, the instance-based learning approach, Support Vector Machines, and Boosting.
Applications
Supervised Learning enables a machine to learn human behaviour or object behaviour in certain tasks.
The learned knowledge can then be used by the machine to perform similar actions on these tasks. Since
computing machinery may perform some input-output mappings much faster and more persistently than a
human, machines equipped with a good supervised learner can perform certain tasks much faster and more
accurately than a human. On the other hand, because of limitations in hardware, software, and algorithm
designs, existing Supervised Learning algorithms still cannot match human learning ability on many
complicated tasks.
Supervised Learning has been successfully used in areas such as Information Retrieval, Data Mining,
Computer Vision, Speech Recognition, Spam Detection, Bioinformatics, Cheminformatics, and Market Analysis
(Wikipedia, 2010).
Cross-Reference
Interactive learning, Machine Learning from pairwise relationships, Adaptation and unsupervised learning,
Feature selection (unsupervised learning), Classification of learning objects.
References
Haykin, S. (1998). Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall PTR. ISBN 0-132-73350-1
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag. ISBN 0-387-98780-0
Kotsiantis, S. (2007). Supervised Machine Learning: A Review of Classification Techniques. Informatica Journal 31, 249-268
Wikipedia (2010). Supervised learning. http://en.wikipedia.org/wiki/Supervised_learning