Machine Learning Ebook
Machine Learning
What is Machine Learning?
Machine learning algorithms find natural patterns in data that generate insight and help you make better decisions and predictions. They are used every day to make critical decisions in medical diagnosis, stock trading, energy load forecasting, and more. Media sites rely on machine learning to sift through millions of options to give you song or movie recommendations. Retailers use it to gain insight into their customers’ purchasing behavior.

Real World Applications

With the rise in big data, machine learning has become particularly important for solving problems in areas like these:

• Computational finance, for credit scoring and algorithmic trading

• Image processing and computer vision, for face recognition, motion detection, and object detection
[Figure: Machine learning techniques. Unsupervised learning uses clustering to group and interpret data based only on input data; supervised learning uses classification and regression to develop predictive models based on both input and output data.]
The aim of supervised machine learning is to build a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data (output) and trains a model to generate reasonable predictions for the response to new data.

Supervised learning uses classification and regression techniques to develop predictive models.

Using Supervised Learning to Predict Heart Attacks

Suppose clinicians want to predict whether someone will have a heart attack within a year. They have data on previous patients, including age, weight, height, and blood pressure. They know whether the previous patients had heart attacks within a year. So the problem is combining the existing data into a model that can predict whether a new person will have a heart attack within a year.
• Classification techniques predict discrete responses—for
example, whether an email is genuine or spam, or
whether a tumor is cancerous or benign. Classification
models classify input data into categories. Typical
applications include medical imaging, speech recognition,
and credit scoring.
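Classification can be made concrete with a toy example. The following pure-Python nearest-centroid classifier is an illustrative sketch only (the ebook’s own examples use MATLAB); the “email” feature vectors and labels are invented for the sketch.

```python
# Minimal sketch of classification: nearest-centroid assignment of toy
# "email" feature vectors (word counts) to 'spam' or 'genuine'.
# All data here is made up for illustration.

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def nearest_centroid_fit(samples):
    """samples: dict mapping label -> list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in samples.items()}

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], x))

training = {
    "spam":    [[8, 1], [9, 0], [7, 2]],   # e.g. [count("free"), count("meeting")]
    "genuine": [[0, 5], [1, 6], [0, 4]],
}
model = nearest_centroid_fit(training)
print(predict(model, [6, 1]))   # a "spam-looking" email
```

The point of the sketch is only the shape of the problem: discrete labels out, feature vectors in.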
[Figure: Clustering patterns in the data.]
There is no best method or one size fits all. Finding the right algorithm is partly just trial and error; even highly experienced data scientists can’t tell whether an algorithm will work without trying it out. But algorithm selection also depends on the size and type of data you’re working with, the insights you want to get from the data, and how those insights will be used.
[Figure: Algorithm selection for supervised and unsupervised learning, including discriminant analysis, SVR, GPR, hierarchical clustering, hidden Markov models, and neural networks.]
• The nature of the data keeps changing, and the program needs
to adapt—as in automated trading, energy demand forecasting,
and predicting shopping trends.
With more than 8 million members, the RAC is one of the UK’s
largest motoring organizations, providing roadside assistance,
insurance, and other services to private and business motorists.
Ready for a deeper dive? Explore these resources to learn more about
machine learning methods, examples, and tools.
Watch
Machine Learning Made Easy 34:34
Signal Processing and Machine Learning Techniques for Sensor Data Analytics 42:45
Read
Machine Learning Blog Posts: Social Network Analysis, Text Mining, Bayesian Reasoning, and more
The Netflix Prize and Production Machine Learning Systems: An Insider Look
Machine Learning Challenges: Choosing the Best Model and Avoiding Overfitting
Explore
MATLAB Machine Learning Examples
Machine Learning Solutions
Classify Data with the Classification Learner App
© 2016 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for a list of additional trademarks.
Other product or brand names may be trademarks or registered trademarks of their respective holders.
92991v00
Getting Started with
Machine Learning
Rarely a Straight Line
It takes time to find the best model to fit the data. Choosing the
right model is a balancing act. Highly flexible models tend to overfit
data by modeling minor variations that could be noise. On the
other hand, simple models may assume too much. There are always
tradeoffs between model speed, accuracy, and complexity.
In the next sections we’ll look at the steps in more detail, using a
health monitoring app for illustration. The entire workflow will be
completed in MATLAB®.
The trained model (or classifier) will be integrated into an app to help users track their activity levels throughout the day.
1. Sit down holding the phone, log data from the phone,
and store it in a text file labeled “Sitting.”
We store the labeled data sets in a text file. A flat file format such
as text or CSV is easy to work with and makes it straightforward to
import data.
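As a sketch of this flat-file approach (illustrative, not from the ebook; an in-memory buffer stands in for a file on disk, and the column layout is an assumption), Python’s csv module makes the round trip straightforward:

```python
# Write labeled accelerometer-style samples to CSV, then read them back.
# io.StringIO stands in for a text file on disk.
import csv
import io

rows = [("Sitting", 0.02, 0.01, 0.98), ("Sitting", 0.03, 0.00, 0.97)]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["label", "ax", "ay", "az"])   # header row
writer.writerows(rows)

buf.seek(0)
reader = csv.DictReader(buf)
data = [(r["label"], float(r["ax"]), float(r["ay"]), float(r["az"]))
        for r in reader]
print(data[0])
```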
We import the data into MATLAB and plot each labeled set.

[Figure: Raw data and outliers in the activity-tracking data.]

To preprocess the data, we check for missing values and outliers. We could simply ignore the missing values, but this would reduce the size of the data set. Alternatively, we could substitute approximations for the missing values by interpolating or using comparable data from another sample.
4. Divide the data into two sets. We save part of the data for
testing (the test set) and use the rest (the training set) to build
models. This is referred to as holdout, and is a useful cross-
validation technique.
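The holdout idea in step 4 can be sketched in a few lines of Python (an illustrative stand-in for the MATLAB workflow; the 25% test fraction and fixed seed are assumptions):

```python
# Holdout: shuffle the data, reserve a fraction for testing,
# and train on the rest.
import random

def holdout_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(100))
train, test = holdout_split(samples)
print(len(train), len(test))   # 75 25
```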
For the activity tracker, we want to extract features that capture the
frequency content of the accelerometer data. These features will
help the algorithm distinguish between walking (low frequency)
and running (high frequency). We create a new table that includes
the selected features.
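To illustrate a frequency-content feature of the kind described here, this Python sketch finds the dominant frequency of a signal with a naive DFT. The sketch and its synthetic “walking”/“running” sine waves are assumptions for illustration, not the app’s actual feature code:

```python
# Dominant-frequency feature via a naive discrete Fourier transform.
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the largest DFT magnitude, excluding DC."""
    n = len(signal)
    mags = []
    for k in range(1, n // 2):          # skip DC, keep positive frequencies
        s = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        mags.append((abs(s), k))
    _, k = max(mags)
    return k * sample_rate / n

fs = 64                                  # samples per second
t = [i / fs for i in range(64)]
walking = [math.sin(2 * math.pi * 2 * x) for x in t]   # low frequency, ~2 Hz
running = [math.sin(2 * math.pi * 6 * x) for x in t]   # high frequency, ~6 Hz
print(dominant_frequency(walking, fs), dominant_frequency(running, fs))
```

A low-frequency signal and a high-frequency signal map to clearly separated feature values, which is exactly what the classifier needs.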
The number of features that you could derive is limited only by your imagination. However, certain techniques are commonly used for different types of data.
When building a model, it’s a good idea to start with something simple; it will be faster to run and easier to interpret. We start with a basic decision tree.

To see how well it performs, we plot the confusion matrix, a table that compares the classifications made by the model with the actual class labels that we created in step 1.
[Figure: A basic decision tree. The first split is on feat53 at 335.449; the feat53 < 335.449 branch predicts Sitting with >99% accuracy, and further splits separate Standing, Walking, Running, and Dancing. The accompanying confusion matrix compares predicted class with true class.]
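The confusion matrix itself is simple to compute: count, for each true class, how often each predicted class occurred. A hedged Python sketch (labels and predictions invented for illustration):

```python
# Confusion matrix: rows are true classes, columns are predicted classes.
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

labels = ["Sitting", "Walking", "Running"]
y_true = ["Sitting", "Sitting", "Walking", "Running", "Running"]
y_pred = ["Sitting", "Walking", "Walking", "Running", "Running"]

for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(f"{label:>8}: {row}")
```

Off-diagonal counts are the misclassifications; a perfect model has a purely diagonal matrix.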
Next we try a K-nearest neighbors (KNN) classifier, a simple algorithm that stores all the training data, compares new points to the training data, and returns the most frequent class of the “K” nearest points. That gives us 98% accuracy compared to 94.1% for the simple decision tree. The confusion matrix looks better, too.

However, KNNs take a considerable amount of memory to run, since they require all the training data to make a prediction.

We try a linear discriminant model, but that doesn’t improve the results. Finally, we try a multiclass support vector machine (SVM). The SVM does very well; we now get 99% accuracy.
[Figure: Confusion matrices for the KNN and SVM models, comparing predicted class with true class for Sitting, Standing, Walking, Running, and Dancing.]
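The KNN idea described above — store all the training points, find the K closest to a query, and vote — can be sketched from scratch (toy 2-D data, not the activity-tracker features):

```python
# From-scratch K-nearest-neighbors classification on toy 2-D points.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; returns majority label
    among the k training points nearest to query."""
    by_dist = sorted(train, key=lambda fl: math.dist(fl[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0.1, 0.2), "Walking"), ((0.2, 0.1), "Walking"),
         ((0.9, 0.8), "Running"), ((1.0, 0.9), "Running"),
         ((0.15, 0.15), "Walking")]
print(knn_predict(train, (0.2, 0.2)))   # "Walking"
```

The memory cost the text mentions is visible here: `train` must be kept around in full, and every prediction scans all of it.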
Improving a model can take two different directions: make the model simpler or add complexity.

A good model includes only the features with the most predictive power. A simple model that generalizes well is better than a complex model that may not generalize or train well to new data.

Simplify
[Sidebar: In machine learning, as in many other computational processes, simplifying the model makes it easier to understand, more robust, and more computationally efficient.]

First, we look for opportunities to reduce the number of features. Popular feature reduction techniques include:

• Correlation matrix – shows the relationship between variables, so that variables (or features) that are not highly correlated can be removed.
• Principal component analysis (PCA) – eliminates
redundancy by finding a combination of features that
captures key distinctions between the original features and
brings out strong patterns in the dataset.
• Sequential feature reduction – reduces features
iteratively on the model until there is no improvement
in performance.
Next, we look at ways to reduce the model itself.
Add Complexity
If the model can reliably classify activities on the test data, we’re
ready to move it to the phone and start tracking.
Ready for a deeper dive? Explore these resources to learn more about
machine learning methods, examples, and tools.
Watch
Machine Learning Made Easy 34:34
Signal Processing and Machine Learning Techniques for Sensor Data Analytics 42:45
Read
Supervised Learning Workflow and Algorithms
Data-Driven Insights with MATLAB Analytics: An Energy Load Forecasting Case Study
Explore
MATLAB Machine Learning Examples
Classify Data with the Classification Learner App
Applying
Unsupervised Learning
When to Consider
Unsupervised Learning
Unsupervised learning is useful when you want to explore your data but
don’t yet have a specific goal or are not sure what information the data
contains. It’s also a good way to reduce the dimensions of your data.
Unsupervised Learning Techniques
k-Means

How It Works
Partitions data into k mutually exclusive clusters. How well a point fits into a cluster is determined by the distance from that point to the cluster’s center.

Best Used...
• When the number of clusters is known
• For fast clustering of large data sets

k-Medoids

How It Works
Similar to k-means, but with the requirement that the cluster centers coincide with points in the data.

Best Used...
• When the number of clusters is known
• For fast clustering of categorical data
• To scale to large data sets
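The k-means loop can be sketched in pure Python for 1-D data (an illustrative toy, not a production implementation; in the ebook’s setting MATLAB’s built-in clustering functions would be used):

```python
# Minimal k-means for 1-D points: alternate between assigning each
# point to its nearest center and moving each center to the mean of
# its assigned points.

def kmeans_1d(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:                      # assignment step
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update step
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)    # roughly [1.0, 9.5]
```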
Result: Dendrogram showing the hierarchical relationship between clusters.

Result: Lower-dimensional (typically 2D) representation.
After running the algorithm, the team can accurately determine the
results of partitioning the data into three and four clusters.
After preprocessing the data to remove noise, they cluster the data.
Because the same genes can be involved in several biological
processes, no single gene is likely to belong to one cluster only.
The researchers apply a fuzzy c-means algorithm to the data. They
then visualize the clusters to identify groups of genes that behave in
a similar way.
Machine learning is an effective method for finding patterns in big datasets. But bigger data brings added complexity. As datasets get bigger, you frequently need to reduce the number of features, or dimensionality.

In datasets with many variables, groups of variables often move together. PCA takes advantage of this redundancy of information by generating new variables via linear combinations of the original variables so that a small number of new variables captures most of the information. Each principal component is a linear combination of the original variables. Because all the principal components are orthogonal to each other, there is no redundant information.
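The PCA description above can be illustrated for two variables: center the data, form the covariance matrix, and extract the first principal component by power iteration. This is a simplified sketch under the assumption of 2-D data; the numbers are invented.

```python
# First principal component of 2-D data via power iteration on the
# 2x2 covariance matrix.
import math

def first_pc(data, iters=100):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    v = (1.0, 0.0)
    for _ in range(iters):                # power iteration
        w = (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])
        norm = math.hypot(*w)
        v = (w[0] / norm, w[1] / norm)
    return v

# two correlated variables: the first PC points roughly along y = x
data = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
pc = first_pc(data)
print(pc)
```

Because the two variables move together, nearly all the variance lies along one direction, so a single component captures most of the information.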
Your dataset might contain measured variables that overlap, meaning that they are dependent on one another. Factor analysis lets you fit a model to multivariate data to estimate this sort of interdependence.

In a factor analysis model, the measured variables depend on a smaller number of unobserved (latent) factors. Because each factor might affect several variables, it is known as a common factor. Each variable is assumed to be dependent on a linear combination of the common factors.
Over the course of 100 weeks, the percent change in stock prices
has been recorded for ten companies. Of these ten, four are
technology companies, three are financial, and a further three
are retail. It seems reasonable to assume that the stock prices
for companies in the same sector will vary together as economic
conditions change. Factor analysis can provide quantitative
evidence to support this premise.
This dimension reduction technique is based on a low-rank approximation of the feature space. In addition to reducing the number of features, it guarantees that the features are nonnegative, producing models that respect features such as the nonnegativity of physical quantities.
Iris Clustering
Self-Organizing Maps
Cluster Data with a
Self-Organizing Map
Applying
Supervised Learning
When to Consider
Supervised Learning
When choosing an algorithm, consider these tradeoffs:

• Speed of training
• Memory usage
• Predictive accuracy on new data
• Transparency or interpretability (how easily you can
understand the reasons an algorithm makes its predictions)
Let’s take a closer look at the most commonly used classification
and regression algorithms.
Support Vector Machine (SVM)

Best Used...
• For data that has exactly two classes (you can also use it
for multiclass classification with a technique called error-
correcting output codes)
• For high-dimensional, nonlinearly separable data
• When you need a classifier that’s simple, easy to interpret,
and accurate
Discriminant Analysis
How It Works
Discriminant analysis classifies data by finding linear combinations
of features. Discriminant analysis assumes that different classes
generate data based on Gaussian distributions. Training a
discriminant analysis model involves finding the parameters for a
Gaussian distribution for each class. The distribution parameters
are used to calculate boundaries, which can be linear or
quadratic functions. These boundaries are used to determine the
class of new data.
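That recipe can be sketched for a single feature with a per-class Gaussian fit (an illustrative toy with invented “benign”/“malignant” measurements; real discriminant analysis handles multivariate Gaussians, class priors, and linear or quadratic boundaries):

```python
# Discriminant-analysis sketch for one feature: fit a Gaussian per
# class, then classify by the higher class log-likelihood.
import math

def fit_gaussian(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def log_likelihood(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def fit(classes):
    """classes: dict mapping label -> list of feature values."""
    return {label: fit_gaussian(xs) for label, xs in classes.items()}

def predict(model, x):
    return max(model, key=lambda c: log_likelihood(x, *model[c]))

model = fit({"benign":    [1.0, 1.2, 0.9, 1.1],
             "malignant": [3.0, 3.3, 2.8, 3.1]})
print(predict(model, 1.05), predict(model, 2.9))
```

With equal class variances, the decision boundary implied by the two Gaussians is linear, matching the “linear or quadratic” boundary distinction in the text.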
Best Used...
After collecting, cleaning, and logging data from all the machines
in the plant, the engineers evaluate several machine learning
techniques, including neural networks, k-nearest neighbors,
bagged decision trees, and support vector machines (SVMs). For
each technique, they train a classification model using the logged
machine data and then test the model’s ability to predict machine
problems. The tests show that an ensemble of bagged decision
trees is the most accurate model for predicting the production
quality.
Common Regression Algorithms
Improving Models
Feature transformation is a form of dimensionality reduction. As we saw in section 3, the three most commonly used dimensionality reduction techniques are principal component analysis (PCA), factor analysis, and nonnegative matrix factorization.

Common hyperparameter optimization methods include:

• Bayesian optimization
• Grid search
• Gradient-based optimization
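Grid search, listed above, is the easiest of these to sketch: score every combination of candidate parameter values and keep the best. The “model” below is a stand-in objective with a known optimum, invented for illustration.

```python
# Exhaustive grid search over a dictionary of candidate values.
import itertools

def grid_search(score_fn, grid):
    """grid: dict mapping parameter name -> list of candidate values.
    Returns (best_score, best_params)."""
    best = None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        s = score_fn(**params)
        if best is None or s > best[0]:
            best = (s, params)
    return best

# toy objective with its maximum at k=5, weight=0.5
def score(k, weight):
    return -((k - 5) ** 2) - (weight - 0.5) ** 2

best_score, best_params = grid_search(score, {"k": [1, 3, 5, 7],
                                              "weight": [0.25, 0.5, 1.0]})
print(best_params)   # {'k': 5, 'weight': 0.5}
```

The cost grows multiplicatively with each added parameter, which is why Bayesian and gradient-based methods become attractive for larger search spaces.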