
1. Chapter 1 Introduction to ML

The document provides an introduction to machine learning (ML), defining it as a subset of artificial intelligence (AI) that enables computers to learn from data without explicit programming. It discusses the importance of ML in organizations for decision-making and value creation, highlighting various applications such as spam filtering, image classification, and fraud detection. Additionally, it categorizes ML systems into supervised, unsupervised, semi-supervised, and reinforcement learning, explaining their respective algorithms and use cases.

Uploaded by

ajay200457

INTRODUCTION TO MACHINE LEARNING
INTRODUCTION TO ANALYTICS AND MACHINE LEARNING

• Analytics is a collection of techniques and tools used for
creating value from data. Techniques include concepts such as
artificial intelligence (AI), machine learning (ML), and deep
learning (DL) algorithms.
• AI, ML, and DL are defined as follows:
1. Artificial Intelligence: Algorithms and systems that exhibit human-like
intelligence.
2. Machine Learning: Subset of AI that can learn to perform a task with
extracted data and/or models.
3. Deep Learning: Subset of machine learning that imitates the
functioning of the human brain to solve problems.
• The relationship between AI, ML, and DL shown in the figure is
not accepted by all.
• There is another school of thought that believes that AI and
ML are different (ML is not a subset of AI) with some overlap.
• The important point is that all of them are algorithms, which
are nothing but a set of instructions used for solving business
and social problems.
• Machine learning is a set of algorithms that have the capability
to learn to perform tasks such as prediction and classification
effectively using data.
• Learning is achieved using additional data and/or additional
models.
• An algorithm can be called a learning algorithm when it
improves on a performance metric while performing a task, for
example, the accuracy of classifying fraud, customer churn,
and so on.
What Is Machine Learning?
Machine Learning is the science (and art) of
programming computers so they can learn from data.

Here is a slightly more general definition:


[Machine Learning is the] field of study that gives
computers the ability to learn without being explicitly
programmed.
—Arthur Samuel, 1959
And a more engineering-oriented one:
A computer program is said to learn from experience E
with respect to some task T and some performance measure P, if
its performance on T, as measured by P, improves with
experience E.
—Tom Mitchell, 1997
• Your spam filter is a Machine Learning program that, given
examples of spam emails (e.g., flagged by users) and examples
of regular (nonspam, also called “ham”) emails, can learn to flag
spam.
• The examples that the system uses to learn are called the
training set.
• Each training example is called a training instance (or sample).
• In this case, the task T is to flag spam for new emails, the
experience E is the training data, and the performance measure
P needs to be defined; for example, you can use the ratio of
correctly classified emails. This particular performance measure
is called accuracy, and it is often used in classification tasks.
• If you just download a copy of Wikipedia, your computer has a
lot more data, but it is not suddenly better at any task. Thus,
downloading a copy of Wikipedia is not Machine Learning.
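The accuracy measure described above (the ratio of correctly classified emails) can be sketched in a few lines of Python. The label lists here are made-up illustration data, and the function name `accuracy` is a hypothetical helper, not part of any library.

```python
# Hypothetical labels for six emails: what users flagged vs. what the filter predicted.
true_labels = ["spam", "ham", "spam", "ham", "ham", "spam"]
predicted_labels = ["spam", "ham", "ham", "ham", "ham", "spam"]


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Performance measure P for the task T (flagging spam).
score = accuracy(true_labels, predicted_labels)
print(f"accuracy = {score:.2f}")
```

In Mitchell's terms, a filter "learns" if this number goes up as it is given more training examples (experience E).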
WHY MACHINE LEARNING?
• Organizations across the world use several performance
measures such as return on investment (ROI), market share,
customer retention, sales growth, customer satisfaction, and
so on for quantifying, monitoring, benchmarking, and
improving.
• Organizations would like to understand the association
between key performance indicators (KPIs) and factors that
have a significant impact on the KPIs for effective
management. Knowledge of the relationship between KPIs
and factors would provide the decision maker with
appropriate actionable items (U D Kumar, 2017).
• Machine learning algorithms can be used for identifying the
factors that influence the key performance indicators, which
can be further used for decision making and value creation.
• Organizations such as Amazon, Apple, Capital One, General
Electric, Google, IBM, Facebook, Procter and Gamble and so
on use ML algorithms to create new products and solutions.
• ML can create significant value for organizations if used
properly. MacKenzie et al. (2013) reported that Amazon’s
recommender systems resulted in a sales increase of 35%.
• A typical ML project follows these steps:
1. Identify the problem or opportunity for value creation.
2. Identify sources of data (primary as well as secondary data sources) and
create a data lake (integrated data set from different sources).
3. Pre-process the data for issues such as missing and incorrect data.
Generate derived variables (feature engineering) and transform the data
if necessary. Prepare the data for ML model building.
4. Divide the datasets into subsets of training and validation datasets.
5. Build ML models and identify the best model(s) using model
performance in validation data.
6. Implement Solution/Decision/Develop Product.
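Steps 3 to 5 above can be sketched as follows. This is a minimal illustration under stated assumptions: the records, the feature names `amount` and `n_items`, and the helpers `fit_stump` and `stump_accuracy` are all hypothetical, and the "model" is a simple one-feature threshold rule chosen only to keep the example short.

```python
# Hypothetical raw records: (amount, n_items, label); None marks a missing value.
raw = [
    (12.0, 1, 0), (15.0, 2, 0), (None, 1, 0), (260.0, 1, 1),
    (18.0, 3, 0), (250.0, 2, 1), (30.0, 1, 0), (310.0, 3, 1),
    (25.0, 2, 0), (270.0, None, 1), (19.0, 1, 0), (290.0, 2, 1),
]

# Step 3: pre-process by dropping records with missing values.
clean = [r for r in raw if None not in r]

# Step 4: deterministic split, every third record goes to the validation set.
train = [r for i, r in enumerate(clean) if i % 3 != 2]
valid = [r for i, r in enumerate(clean) if i % 3 == 2]


def stump_accuracy(rows, feature, threshold):
    """Accuracy of the rule 'predict 1 when the feature is >= threshold'."""
    return sum((r[feature] >= threshold) == bool(r[2]) for r in rows) / len(rows)


def fit_stump(rows, feature):
    """Pick the threshold on one feature that classifies the training rows best."""
    thresholds = sorted({r[feature] for r in rows})
    return max(thresholds, key=lambda t: stump_accuracy(rows, feature, t))


# Step 5: build one candidate model per feature, then compare them on validation data.
candidates = {}
for feature, name in [(0, "amount"), (1, "n_items")]:
    threshold = fit_stump(train, feature)
    candidates[name] = (threshold, stump_accuracy(valid, feature, threshold))

best_name = max(candidates, key=lambda n: candidates[n][1])
print(best_name, candidates[best_name])
```

The key design point is that thresholds are chosen on the training subset only, while the choice between candidate models uses the validation subset.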
Why Use Machine Learning?

• Consider how you would write a spam filter using traditional
programming techniques.
• Since the problem is difficult, your program will likely become
a long list of complex rules, which is pretty hard to maintain.
• In contrast, a spam filter based on Machine Learning
techniques automatically learns which words and phrases are
good predictors of spam by detecting unusually frequent
patterns of words in the spam examples compared to the ham
examples.

• The program is much shorter, easier to maintain, and most
likely more accurate.
• What if spammers notice that all their emails containing “4U”
are blocked? They might start writing “For U” instead. A spam
filter using traditional programming techniques would need to
be updated to flag “For U” emails.
• If spammers keep working around your spam filter, you will
need to keep writing new rules forever.
• In contrast, a spam filter based on Machine Learning
techniques automatically notices that “For U” has become
unusually frequent in spam flagged by users, and it starts
flagging them without your intervention.
• Another area where Machine Learning shines is for problems
that either are too complex for traditional approaches or
have no known algorithm.
• For example, consider speech recognition. Say you want to
start simple and write a program capable of distinguishing the
words “one” and “two.” You might notice that the word “two”
starts with a high-pitch sound (“T”), so you could hardcode an
algorithm that measures high-pitch sound intensity and use
that to distinguish ones and twos—but obviously this
technique will not scale to thousands of words spoken by
millions of very different people in noisy environments and in
dozens of languages.
• The best solution (at least today) is to write an algorithm that
learns by itself, given many example recordings for each word.
• Finally, Machine Learning can help humans learn (Figure). ML
algorithms can be inspected to see what they have learned
(although for some algorithms this can be tricky).
• For instance, once a spam filter has been trained on enough
spam, it can easily be inspected to reveal the list of words and
combinations of words that it believes are the best predictors
of spam.
• Sometimes this will reveal unsuspected correlations or new
trends, and thereby lead to a better understanding of the
problem.
• Applying ML techniques to dig into large amounts of data can
help discover patterns that were not immediately apparent.
This is called data mining.
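The idea of a spam filter that finds words that are unusually frequent in spam compared to ham, and that can then be inspected to list its best predictors, can be sketched as below. The example messages and the scoring rule (difference between the share of spam and the share of ham messages containing a word) are assumptions made for illustration; real filters use more robust statistics.

```python
from collections import Counter

# Hypothetical training examples.
spam_examples = ["win cash 4U now", "cheap pills 4U", "win prize now"]
ham_examples = ["meeting notes attached", "lunch now or later", "see you at the meeting"]


def word_rates(messages):
    """Fraction of messages that contain each word."""
    counts = Counter()
    for message in messages:
        counts.update(set(message.lower().split()))
    return {word: n / len(messages) for word, n in counts.items()}


spam_rate = word_rates(spam_examples)
ham_rate = word_rates(ham_examples)

# Score: how much more often a word shows up in spam than in ham.
scores = {w: spam_rate[w] - ham_rate.get(w, 0.0) for w in spam_rate}

# Inspect what was learned: the words with the highest scores.
top_words = sorted(scores, key=lambda w: (-scores[w], w))[:2]
print(top_words)
```

If spammers switch from “4U” to “For U”, rerunning the same code on newly flagged messages would pick up the new wording without any rule being rewritten by hand.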
To summarize, Machine Learning is great for:
• Problems for which existing solutions require a lot of fine-
tuning or long lists of rules: one Machine Learning algorithm
can often simplify code and perform better than the
traditional approach.
• Complex problems for which using a traditional approach
yields no good solution: the best Machine Learning
techniques can perhaps find a solution.
• Fluctuating environments: a Machine Learning system can
adapt to new data.
• Getting insights about complex problems and large amounts
of data.
Examples of Applications
• Analyzing images of products on a production line to
automatically classify them
This is image classification, typically performed using
convolutional neural networks (CNNs).
• Detecting tumors in brain scans
This is semantic segmentation, where each pixel in the
image is classified (as we want to determine the exact
location and shape of tumors), typically using CNNs as
well.
• Automatically classifying news articles
This is natural language processing (NLP), and more
specifically text classification, which can be tackled using
recurrent neural networks (RNNs), CNNs, or Transformers.
• Automatically flagging offensive comments on discussion forums
This is also text classification, using the same NLP tools.
• Summarizing long documents automatically
This is a branch of NLP called text summarization, again using the
same tools.
• Creating a chatbot or a personal assistant
This involves many NLP components, including natural language
understanding (NLU) and question-answering modules.
• Forecasting your company’s revenue next year, based on many
performance metrics
This is a regression task (i.e., predicting values) that may be
tackled using any regression model, such as a Linear Regression or
Polynomial Regression model, a regression SVM, a regression
Random Forest, or an artificial neural network. If you want to take
into account sequences of past performance metrics, you may
want to use RNNs, CNNs, or Transformers.
• Making your app react to voice commands
This is speech recognition, which requires processing
audio samples: since they are long and complex
sequences, they are typically processed using RNNs, CNNs,
or Transformers.
• Detecting credit card fraud
This is anomaly detection.
• Segmenting clients based on their purchases so that you can
design a different marketing strategy for each segment
This is clustering.
• Representing a complex, high-dimensional dataset in a clear
and insightful diagram
This is data visualization, often involving dimensionality
reduction techniques.
• Recommending a product that a client may be interested in,
based on past purchases
This is a recommender system. One approach is to feed
past purchases (and other information about the client) to
an artificial neural network and get it to output the most
likely next purchase. This neural net would typically be
trained on past sequences of purchases across all clients.
• Building an intelligent bot for a game
This is often tackled using Reinforcement Learning (RL),
which is a branch of Machine Learning that trains agents
(such as bots) to pick the actions that will maximize their
rewards over time (e.g., a bot may get a reward every time
the player loses some life points), within a given
environment (such as the game). The famous AlphaGo
program that beat the world champion at the game of Go
was built using RL.
Types of Machine Learning Systems
• There are so many different types of Machine Learning
systems that it is useful to classify them in broad categories,
based on the following criteria:
– Whether or not they are trained with human supervision (supervised,
unsupervised, semisupervised, and Reinforcement Learning)
– Whether or not they can learn incrementally on the fly (online versus
batch learning)
– Whether they work by simply comparing new data points to known
data points, or instead by detecting patterns in the training data and
building a predictive model, much like scientists do (instance-based
versus model-based learning)
• These criteria are not exclusive; you can combine them in any
way you like. For example, a state-of-the-art spam filter may
learn on the fly using a deep neural network model trained
using examples of spam and ham; this makes it an online,
model-based, supervised learning system.
Supervised/Unsupervised Learning
Machine Learning systems can be classified according to
the amount and type of supervision they get during training.
There are four major categories, plus evolutionary learning, which is
sometimes listed alongside them:
• Supervised learning
• Unsupervised learning
• Semisupervised learning
• Reinforcement Learning
• Evolutionary learning
Supervised learning:
• In supervised learning, the training set you feed to the
algorithm includes the desired solutions, called labels.
Figure: A labeled training set for spam classification (an example of
supervised learning)
• A typical supervised learning task is classification. The spam
filter is a good example of this: it is trained with many
example emails along with their class (spam or ham), and it
must learn how to classify new emails.
• Another typical task is to predict a target numeric value, such
as the price of a car, given a set of features (mileage, age,
brand, etc.) called predictors. This sort of task is called
regression.
• To train the system, you need to give it many examples of
cars, including both their predictors and their labels (i.e., their
prices).
Note: In Machine Learning an attribute is a data type (e.g.,
“mileage”), while a feature has several meanings, depending on
the context, but generally means an attribute plus its value (e.g.,
“mileage = 15,000”). Many people use the words attribute and
feature interchangeably.
Note that some regression algorithms can be used for
classification as well, and vice versa. For example, Logistic
Regression is commonly used for classification, as it can output a
value that corresponds to the probability of belonging to a given
class (e.g., 20% chance of being spam).
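To illustrate how Logistic Regression turns a numeric score into a class probability, here is a minimal sketch. The weights, the bias, and the feature names `contains_4u` and `known_sender` are made-up values for illustration; in practice they would be learned from labeled training data.

```python
import math

# Hypothetical, already-learned parameters of a tiny spam model.
WEIGHTS = {"contains_4u": 2.0, "known_sender": -3.0}
BIAS = -1.0


def logistic(score):
    """Logistic (sigmoid) function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def spam_probability(features):
    """Linear score of the features, squashed into a probability."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return logistic(score)


def classify(features, threshold=0.5):
    """Turn the probability into a class label."""
    return "spam" if spam_probability(features) >= threshold else "ham"


email = {"contains_4u": 1, "known_sender": 0}
print(f"{spam_probability(email):.2f}", classify(email))
```

The regression part produces the score; thresholding the resulting probability (0.5 here, an assumed default) is what makes it usable for classification.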
Here are some of the most important supervised learning
algorithms:
• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks.
Unsupervised learning:
• In unsupervised learning, as you might guess, the training data
is unlabeled. The system tries to learn without a teacher.
Here are some of the most important unsupervised learning
algorithms:
• Clustering
— K-Means
— DBSCAN
— Hierarchical Cluster Analysis (HCA)
• Anomaly detection and novelty detection
— One-class SVM
— Isolation Forest
• Visualization and dimensionality reduction
— Principal Component Analysis (PCA)
— Kernel PCA
— Locally Linear Embedding (LLE)
— t-Distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
— Apriori
— Eclat
For example, say you have a lot of data about your blog’s visitors. You may
want to run a clustering algorithm to try to detect groups of similar visitors. At
no point do you tell the algorithm which group a visitor belongs to: it finds
those connections without your help. For example, it might notice that 40%
of your visitors are males who love comic books and generally read your blog
in the evening, while 20% are young sci-fi lovers who visit during the
weekends. If you use a hierarchical clustering algorithm, it may also subdivide
each group into smaller groups. This may help you target your posts for each
group.
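The blog-visitor example can be sketched with a plain K-Means loop. The visitor data (age, usual visiting hour) is invented, and the two starting centroids are picked by hand to keep the run deterministic; library implementations normally choose them for you.

```python
# Hypothetical visitors described by (age, usual visiting hour). No group labels are given.
visitors = [(16, 10), (18, 11), (17, 9), (35, 20), (40, 21), (38, 22)]


def kmeans(points, centroids, n_iters=10):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    clusters = []
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Assumed starting centroids: one younger visitor and one older visitor.
centroids, clusters = kmeans(visitors, [visitors[0], visitors[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The algorithm only ever sees the coordinates; the two groups emerge from distances between visitors.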
• Visualization algorithms are also good examples of unsupervised learning
algorithms: you feed them a lot of complex and unlabeled data, and they
output a 2D or 3D representation of your data that can easily be plotted.
These algorithms try to preserve as much structure as they can (e.g.,
trying to keep separate clusters in the input space from overlapping in the
visualization) so that you can understand how the data is organized and
perhaps identify unsuspected patterns.
• A related task is dimensionality reduction, in which the goal is
to simplify the data without losing too much information. One
way to do this is to merge several correlated features into
one. For example, a car’s mileage may be strongly correlated
with its age, so the dimensionality reduction algorithm will
merge them into one feature that represents the car’s wear
and tear. This is called feature extraction.
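Merging mileage and age into a single "wear and tear" feature can be sketched with a two-feature version of Principal Component Analysis. The car data is invented, and the closed-form angle used for the principal direction applies only to the two-feature case; this is a sketch of the idea, not a general PCA implementation.

```python
import math

# Hypothetical cars: mileage and age are strongly correlated.
mileage_km = [20000, 45000, 60000, 90000, 120000]
age_years = [1, 3, 4, 6, 9]


def standardize(values):
    """Rescale to zero mean and unit variance so both features are comparable."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


x = standardize(mileage_km)
y = standardize(age_years)
n = len(x)

# Entries of the 2x2 covariance matrix of the standardized features.
var_x = sum(v * v for v in x) / n
var_y = sum(v * v for v in y) / n
cov_xy = sum(a * b for a, b in zip(x, y)) / n

# Direction of largest variance (first principal component) in the 2D case.
theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
direction = (math.cos(theta), math.sin(theta))

# One merged feature per car: the projection onto that direction.
wear_and_tear = [a * direction[0] + b * direction[1] for a, b in zip(x, y)]
print([round(v, 2) for v in wear_and_tear])
```

Each car is now described by one number instead of two, and cars that are both older and more driven get higher values.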
Note:
• It is often a good idea to try to reduce the dimension of your
training data using a dimensionality reduction algorithm
before you feed it to another Machine Learning algorithm
(such as a supervised learning algorithm). It will run much
faster, the data will take up less disk and memory space, and
in some cases it may also perform better.
• Yet another important unsupervised task is anomaly
detection—for example, detecting unusual credit card
transactions to prevent fraud, catching manufacturing defects,
or automatically removing outliers from a dataset before
feeding it to another learning algorithm. The system is shown
mostly normal instances during training, so it learns to
recognize them; then, when it sees a new instance, it can tell
whether it looks like a normal one or whether it is likely an
anomaly.
• A very similar task is novelty detection: it aims to detect new
instances that look different from all instances in the training
set. This requires having a very “clean” training set, devoid of
any instance that you would like the algorithm to detect. For
example, if you have thousands of pictures of dogs, and 1% of
these pictures represent Chihuahuas, then a novelty detection
algorithm should not treat new pictures of Chihuahuas as
novelties. On the other hand, anomaly detection algorithms
may consider these dogs as so rare and so different from
other dogs that they would likely classify them as anomalies
(no offense to Chihuahuas).
• Finally, another common unsupervised task is association rule
learning, in which the goal is to dig into large amounts of data
and discover interesting relations between attributes. For
example, suppose you own a supermarket. Running an
association rule learning algorithm on your sales logs may
reveal that people who purchase barbecue sauce and potato
chips also tend to buy steak. Thus, you may want to place
these items close to one another.
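The supermarket rule above can be checked with the two standard measures used in association rule learning, support and confidence. The baskets are invented, and this sketch only evaluates one given rule; algorithms such as Apriori search for such rules automatically.

```python
# Hypothetical sales log: one set of items per purchase.
baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"potato chips", "soda"},
    {"steak", "salad"},
]


def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Among baskets with the antecedent, the fraction that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)


# Rule under test: {barbecue sauce, potato chips} -> {steak}
rule_support = support({"barbecue sauce", "potato chips", "steak"})
rule_confidence = confidence({"barbecue sauce", "potato chips"}, {"steak"})
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

A rule is usually considered interesting only if both numbers exceed thresholds chosen by the analyst.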
Semisupervised learning:

• Since labeling data is usually time-consuming and costly, you
will often have plenty of unlabeled instances, and few labeled
instances. Some algorithms can deal with data that’s partially
labeled. This is called semisupervised learning.
• Some photo-hosting services, such as Google Photos, are
good examples of this. Once you upload all your family photos
to the service, it automatically recognizes that the same
person A shows up in photos 1, 5, and 11, while another
person B shows up in photos 2, 5, and 7. This is the
unsupervised part of the algorithm (clustering). Now all the
system needs is for you to tell it who these people are. Just
add one label per person and it is able to name everyone in
every photo, which is useful for searching photos.
• Most semisupervised learning algorithms are combinations of
unsupervised and supervised algorithms. For example, deep
belief networks (DBNs) are based on unsupervised
components called restricted Boltzmann machines (RBMs)
stacked on top of one another. RBMs are trained sequentially
in an unsupervised manner, and then the whole system is
fine-tuned using supervised learning techniques.
Reinforcement Learning:
• Reinforcement Learning is a very different beast. The learning
system, called an agent in this context, can observe the
environment, select and perform actions, and get rewards in
return (or penalties in the form of negative rewards, as shown
in the figure).
• It must then learn by itself what is the best strategy, called a
policy, to get the most reward over time. A policy defines
what action the agent should choose when it is in a given
situation.
• For example, many robots implement Reinforcement Learning
algorithms to learn how to walk. DeepMind’s AlphaGo
program is also a good example of Reinforcement Learning: it
made the headlines in May 2017 when it beat the world
champion Ke Jie at the game of Go. It learned its winning
policy by analyzing millions of games, and then playing many
games against itself.
• Note that learning was turned off during the games against
the champion; AlphaGo was just applying the policy it had
learned.
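The notions of agent, action, reward, and policy can be illustrated with a toy example. The sketch below uses the Q-learning update rule, a technique not named in the text, on an invented five-cell corridor. To keep the run deterministic it sweeps over every state and action instead of letting the agent explore randomly, which is a simplification of how an agent would normally interact with its environment.

```python
N_STATES = 5                 # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")
ALPHA, GAMMA = 0.5, 0.9      # learning rate and discount factor (assumed values)


def step(state, action):
    """Environment: move one cell; reward 1 when the goal cell is reached."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


# Q-table: estimated long-term reward of taking an action in a state.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(200):
    for s in range(N_STATES - 1):            # the goal state is terminal
        for a in ACTIONS:
            s_next, r = step(s, a)
            if s_next == N_STATES - 1:
                best_next = 0.0               # nothing to gain after the goal
            else:
                best_next = max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

# The policy: in each state, choose the action with the highest estimated value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Once the table has been learned, applying the policy requires no further learning, which mirrors the note about AlphaGo playing with learning turned off.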
Evolutionary Learning:
• Evolutionary algorithms are algorithms that imitate natural
evolution to solve a problem. Techniques such as genetic
algorithms and ant colony optimization fall under the category
of evolutionary learning.
Batch and Online Learning
• Another criterion used to classify Machine Learning systems is
whether or not the system can learn incrementally from a
stream of incoming data.
Batch Learning:
• In batch learning, the system is incapable of learning
incrementally: it must be trained using all the available data.
This will generally take a lot of time and computing resources,
so it is typically done offline. First the system is trained, and
then it is launched into production and runs without learning
anymore; it just applies what it has learned. This is called
offline learning.
• If you want a batch learning system to know about new data
(such as a new type of spam), you need to train a new version
of the system from scratch on the full dataset (not just the
new data, but also the old data), then stop the old system and
replace it with the new one.
• Fortunately, the whole process of training, evaluating, and
launching a Machine Learning system can be automated fairly
easily, so even a batch learning system can adapt to change.
Simply update the data and train a new version of the system
from scratch as often as needed.
• This solution is simple and often works fine, but training using
the full set of data can take many hours, so you would typically
train a new system only every 24 hours or even just weekly. If
your system needs to adapt to rapidly changing data (e.g., to
predict stock prices), then you need a more reactive solution.
• Also, training on the full set of data requires a lot of computing
resources (CPU, memory space, disk space, disk I/O, network
I/O, etc.). If you have a lot of data and you automate your
system to train from scratch every day, it will end up costing
you a lot of money. If the amount of data is huge, it may even
be impossible to use a batch learning algorithm.
• Finally, if your system needs to be able to learn autonomously
and it has limited resources (e.g., a smartphone application or
a rover on Mars), then carrying around large amounts of
training data and taking up a lot of resources to train for hours
every day is a showstopper.
• Fortunately, a better option in all these cases is to use
algorithms that are capable of learning incrementally.
Online learning:
• In online learning, you train the system incrementally by
feeding it data instances sequentially, either individually or in
small groups called mini-batches. Each learning step is fast
and cheap, so the system can learn about new data on the fly,
as it arrives.
• Online learning is great for systems that receive data as a
continuous flow (e.g., stock prices) and need to adapt to
change rapidly or autonomously. It is also a good option if
you have limited computing resources: once an online
learning system has learned about new data instances, it
does not need them anymore, so you can discard them
(unless you want to be able to roll back to a previous state
and “replay” the data). This can save a huge amount of
space.
• Online learning algorithms can also be used to train systems
on huge datasets that cannot fit in one machine’s main
memory (this is called out-of-core learning). The algorithm
loads part of the data, runs a training step on that data, and
repeats the process until it has run on all of the data.
Note: Out-of-core learning is usually done offline (i.e., not on the
live system), so online learning can be a confusing name. Think
of it as incremental learning.
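The out-of-core loop described above (load part of the data, run a training step on it, repeat) can be sketched as follows. The generator `stream_batches` is a hypothetical stand-in for reading chunks from disk, the data is synthetic (y = 2x), and the model is a single weight trained by mini-batch gradient descent.

```python
def stream_batches(n_samples, batch_size):
    """Yield small chunks of (x, y) pairs, standing in for reads from disk."""
    batch = []
    for i in range(n_samples):
        x = (i % 10 + 1) / 10           # synthetic feature in 0.1 .. 1.0
        batch.append((x, 2.0 * x))      # synthetic target: y = 2x
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


weight = 0.0            # the model y = weight * x starts out knowing nothing
LEARNING_RATE = 0.5     # assumed value

for batch in stream_batches(n_samples=2000, batch_size=50):
    # One training step on this chunk only; the chunk can be discarded afterwards.
    gradient = sum((weight * x - y) * x for x, y in batch) / len(batch)
    weight -= LEARNING_RATE * gradient

print(f"learned weight = {weight:.3f}")
```

At no point does the full dataset sit in memory: only the current mini-batch and the model's single parameter are kept.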
• One important parameter of online learning systems is how
fast they should adapt to changing data: this is called the
learning rate. If you set a high learning rate, then your system
will rapidly adapt to new data, but it will also tend to quickly
forget the old data (you don’t want a spam filter to flag only
the latest kinds of spam it was shown).
• Conversely, if you set a low learning rate, the system will have
more inertia; that is, it will learn more slowly, but it will also
be less sensitive to noise in the new data or to sequences of
nonrepresentative data points (outliers).
• A big challenge with online learning is that if bad data is fed to
the system, the system’s performance will gradually decline. If
it’s a live system, your clients will notice.
• For example, bad data could come from a malfunctioning
sensor on a robot, or from someone spamming a search
engine to try to rank high in search results.
• To reduce this risk, you need to monitor your system closely
and promptly switch learning off (and possibly revert to a
previously working state) if you detect a drop in performance.
• You may also want to monitor the input data and react to
abnormal data (e.g., using an anomaly detection algorithm).
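The trade-off controlled by the learning rate can be seen on a very small online learner that tracks a running estimate of a data stream. The update rule (move the estimate a fraction of the way toward each new value) and the two streams are assumptions chosen for illustration.

```python
def online_estimate(stream, learning_rate, start):
    """Update an estimate one value at a time; return the estimate after each step."""
    estimate = start
    history = []
    for value in stream:
        estimate += learning_rate * (value - estimate)
        history.append(estimate)
    return history


# Case 1: the data really changes (level shifts from 10 to 20).
shift = [10.0] * 5 + [20.0] * 5
fast_shift = online_estimate(shift, learning_rate=0.9, start=10.0)
slow_shift = online_estimate(shift, learning_rate=0.1, start=10.0)

# Case 2: a single bad reading (outlier) in otherwise stable data.
noisy = [10.0] * 5 + [100.0] + [10.0] * 5
fast_noisy = online_estimate(noisy, learning_rate=0.9, start=10.0)
slow_noisy = online_estimate(noisy, learning_rate=0.1, start=10.0)

print(f"after shift:   fast={fast_shift[-1]:.2f}  slow={slow_shift[-1]:.2f}")
print(f"at outlier:    fast={fast_noisy[5]:.2f}  slow={slow_noisy[5]:.2f}")
```

The high learning rate follows the real shift almost immediately but is also dragged far off by the single bad value, while the low learning rate shows the opposite behavior.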
Instance-Based Versus Model-Based Learning
• One more way to categorize Machine Learning systems is by
how they generalize.
• Most Machine Learning tasks are about making predictions.
This means that given a number of training examples, the
system needs to be able to make good predictions for
(generalize to) examples it has never seen before.
• Having a good performance measure on the training data is
good, but insufficient; the true goal is to perform well on new
instances.
• There are two main approaches to generalization: instance-
based learning and model-based learning.
Instance-based learning:
• Possibly the most trivial form of learning is simply to learn by
heart. If you were to create a spam filter this way, it would
just flag all emails that are identical to emails that have
already been flagged by users—not the worst solution, but
certainly not the best.
• Instead of just flagging emails that are identical to known
spam emails, your spam filter could be programmed to also
flag emails that are very similar to known spam emails. This
requires a measure of similarity between two emails.
• A (very basic) similarity measure between two emails could be
to count the number of words they have in common. The
system would flag an email as spam if it has many words in
common with a known spam email.
• This is called instance-based learning: the system learns the
examples by heart, then generalizes to new cases by using a
similarity measure to compare them to the learned examples
(or a subset of them).
• For example, in the figure, the new instance would be classified
as a triangle because the majority of the most similar instances
belong to that class.
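The instance-based spam filter described above can be sketched with the very basic similarity measure from the text (number of words in common) and a majority vote among the most similar known emails. The stored emails, the function names, and the choice of k = 3 are assumptions for illustration.

```python
from collections import Counter

# The "learned by heart" examples: (email text, label).
known_emails = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("claim your cash prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def similarity(a, b):
    """Very basic similarity: number of distinct words two emails share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def classify_email(email, k=3):
    """Label a new email by majority vote among its k most similar known emails."""
    ranked = sorted(known_emails, key=lambda item: similarity(email, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(classify_email("win a cash prize today"))
print(classify_email("agenda for the monday meeting"))
```

No model is built here: all the work happens at prediction time, when the new email is compared against the stored examples.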
Model-based learning:
• Another way to generalize from a set of examples is to build a
model of these examples and then use that model to make
predictions. This is called model-based learning.
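As a contrast with the instance-based approach, here is a minimal sketch of model-based learning: fit a straight line to a few examples, keep only its two parameters, and use them to predict. The car data is synthetic, and a one-variable linear model fitted by least squares is just one possible choice of model.

```python
# Hypothetical training examples: mileage (thousands of km) and price (thousands of dollars).
mileage = [10.0, 30.0, 50.0, 70.0, 90.0]
price = [19.0, 15.0, 11.0, 7.0, 3.0]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Training: the examples are summarized by two model parameters.
intercept, slope = fit_line(mileage, price)


def predict(x):
    """Prediction for a new instance uses the model, not the stored examples."""
    return intercept + slope * x


print(f"price = {intercept:.1f} + ({slope:.2f}) * mileage")
print(f"predicted price at 60 thousand km: {predict(60.0):.1f}")
```

After training, the original examples could be thrown away; everything the system knows is contained in `intercept` and `slope`.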
