
Machine learning with R language
Conducted by Ratnam and Ratnam Training Zone, Kakinada
What is Machine learning?
• Machine Learning is a part of Artificial Intelligence that focuses on the study of computing and mathematical algorithms and datasets to make decisions without manually written code.
• In other words, machine learning is writing code that lets machines make decisions based on pre-defined algorithms applied to provided datasets.
• If machines can learn from previous experiences without being explicitly programmed, this is known as Machine Learning.
“Previous experience” in Machine learning

• Providing our system the previous experience helps it to understand the relationship between inputs and outputs.
• Or, we may also say, it helps machines learn how the output should be calculated for a given input.
Difference between Conventional and
Machine learning programs
Conventional programs
• As we give input to the computer, the computer performs a function on it and gives the output
Machine learning program
• Here the output and the program switch positions: we give the computer the inputs and the expected outputs, and it produces a new program (the model)
Machine learning concepts
• For instance, if you provide a machine learning model with a lot of songs that you enjoy, along with their corresponding audio statistics (danceability, instrumentality, tempo or genre), it will be able (depending on the supervised machine learning model used) to generate a recommender system that suggests music you will enjoy in the future (with a high probability), similar to what Netflix, Spotify, and other companies do. The following image shows a sample of Netflix recommendations with movie suggestions
Netflix example
X-ray example
• In a simple example, if you load a machine learning program with a considerably large dataset of x-ray pictures along with their descriptions (symptoms, items to consider, etc.), it will have the capacity to assist (or perhaps automate) the analysis of x-ray pictures later on
• The machine learning model will look at each one of the pictures in the diverse dataset and find common patterns in pictures that have been labeled with comparable indications.
• When you load the model with new pictures, it will compare their parameters with the examples it has gathered before, in order to tell you how likely it is that the pictures contain any of the indications it has analyzed previously.
Examples of AI and Machine Learning In Practice

Recommendation engines
For example, suppose I am looking for a Syska power bank on Amazon.com; the results page usually looks like this.
Now let's scroll down: we can find two more sections on the same page, "Customers who viewed this item also viewed" and "Customers who bought this item also bought", as shown in the images below.
Types of Machine learning

Supervised learning
• Supervised learning algorithms try to model the relationships and dependencies between the target prediction output and the input features, such that we can predict the output values for new data based on those relationships, which the model has learned from previously supplied datasets.
Supervised learning
• The aim of supervised machine learning is to build a
model that makes predictions based on evidence in
the presence of uncertainty.
• A supervised learning algorithm takes a known set
of input data and known responses to the data
(output) and trains a model to generate reasonable
predictions for the response to new data.
• Supervised learning uses classification and
regression techniques to develop predictive models
Supervised learning

• Supervised machine learning can further be classified into:
– Classification
– Regression
Classification techniques

• Classification techniques predict discrete (separate or individual) responses, for example, whether an email is genuine or spam, or whether a tumor is cancerous or benign.
• Classification models classify input data into
categories.
• Typical applications include medical imaging,
speech recognition, and credit scoring.
Classification

• In this type of problem, we seek a yes or no response.
• It forms a conclusion from observed values about whether an event will occur or not.
Classification example
Regression techniques

• Regression techniques predict continuous responses, for example, changes in temperature or fluctuations in power demand.
• Typical applications include electricity load forecasting and algorithmic trading
• A regression algorithm may also predict a discrete value, but only in the form of an integer quantity
Regression Example

• Let's go through an example of car insurance fraud:
• What are we trying to predict?
– This is the label: the amount of fraud
• What are the "if questions" or properties that you
can use to predict?
– These are the features: to build a classifier model, you
extract the features of interest that most contribute to
the classification.
– In this simple example, we will use the claimed amount.
Difference between Classification and
Regression
Using Supervised Learning to Predict Heart
Attacks
• Suppose clinicians want to predict whether
someone will have a heart attack within a year.
• They have data on previous patients, including
age, weight, height, and blood pressure.
• They know whether the previous patients had
heart attacks within a year.
• So the problem is combining the existing data
into a model that can predict whether a new
person will have a heart attack within a year.
Unsupervised learning
Unsupervised learning is another type of machine learning: a family of algorithms mainly used in pattern detection and descriptive modeling. These algorithms do not have output categories or labels on the data (the model is trained with unlabeled data).
Comparison
• Based on the response, i.e. either spam or not, the output will be stored by the machine.
• If a similar type of email comes in, it will be labeled as spam without additional checking.
Regression

• In this type of problem, the predicted value lies somewhere in a continuous spectrum; regression is often used for forecasting and finding relations between variables.
• From our previous example, the number of runs that will be scored and the salary of a new employee are case scenarios for regression.
• The following are common regression algorithms
– Nearest Neighbors
– Linear Regression
– Decision Tree
– Naive Bayes
Unsupervised learning

• Let's come back to this after discussing datasets and how to understand them
Machine learning languages

Most popular programming languages for machine learning include
– Python
– C++
– Java
– JavaScript
– C#
– R
– Julia
– GO
– TypeScript
– Scala
Datasets

• Data is of two types, i.e. structured data and unstructured data.
• Structured data: spreadsheets, Excel or CSV (Comma Separated Values) files with columns and rows, and tables in Oracle or MySQL. It is written in a format that's easy for machines to understand, and it is easily searchable by basic algorithms
• Unstructured data: unstructured data is more like human language. Examples include emails, text documents (Word docs, PDFs, etc.), social media posts, videos, audio files, and images.
• Note: most of the data we use for data science is unstructured
Few terms of datasets
• Instance: A single row of data is called an instance. It is an
observation from the domain.
• Feature: A single column of data is called a feature. It is a
component of an observation and is also called an attribute of a
data instance. Some features may be inputs to a model (the
predictors) and others may be outputs or the features to be
predicted.
• Data Type: Features have a data type. They may be real or
integer-valued or may have a categorical or ordinal value. You
can have strings, dates, times, and more complex types, but
typically they are reduced to real or categorical values when
working with traditional machine learning methods.
• Datasets: A collection of instances is a dataset and
when working with machine learning methods we
typically need a few datasets for different purposes
• Training Dataset: A dataset that we feed into our
machine learning algorithm to train our model
• Testing Dataset: A dataset that we use to validate
the accuracy of our model but is not used to train
the model. It may be called the validation dataset
• Example of raw adult dataset
https://archive.ics.uci.edu/ml/datasets/adult
The figure below shows our dataset separated into features to the left of the red line and targets to the right.
Features for a particular instance are grouped together into a feature vector, an example of which is outlined in the figure.
An instance is made up of a feature vector and a corresponding target, as shown in the figure.
Difference between labeled data and
unlabelled data
Labeled data

label     gender   age
healthy   m        18
healthy   f        29
healthy   f        34
healthy   m        21
...
sick      m        68

Unlabelled data

gender   age
f        65
m        21
...
m        23
f        18
f        75
• Let's go back to unsupervised learning now
• As defined earlier, unsupervised learning algorithms are mainly used in pattern detection and descriptive modeling, and they do not have output categories or labels on the data (the model is trained with unlabeled data).
• This machine learning approach finds the patterns in the data. It learns from data that has not been labeled, which means the input values are given but no corresponding output values.
• The system is not provided with the correct answer and is left to discover interesting structure in the data.
• Unsupervised machine learning helps us perform tasks like:
– Clustering
– Association
• Some common algorithms used in unsupervised machine learning are:
– K-means Clustering
– Hierarchical Clustering
– DBSCAN Clustering
– Anomaly Detection
Understanding variables of this dataset

• From the above dataset, there are two types of variables:
– Dependent variables
– Independent variables
Machine learning Algorithms
• Broadly, there are 3 types of Machine Learning Algorithms
– Supervised Learning
– Unsupervised learning
– Reinforcement learning
Supervised learning algorithms

• Linear Regression
• Logistic Regression
• Support Vector Machines (SVM)
• K Nearest Neighbors (KNN)
• Random Forest
• Decision Trees
Unsupervised learning algorithms

• K-Means
• Apriori
• C-Means
Reinforcement Learning

• Q-Learning
• SARSA (State Action Reward State Action)
Popular Machine learning algorithms

• Linear Regression
• Logistic Regression
• Decision Tree
• SVM
• Naive Bayes
• kNN
• K-Means
• Random Forest
• Dimensionality Reduction Algorithms
• Gradient Boosting algorithms
– GBM
– XGBoost
– LightGBM
– CatBoost
Linear Regression Algorithm basics

• Linear regression is one of the popular machine learning regression algorithms
• It uses a statistical model and mathematical algorithm to show the relation between two variables
• We can use linear regression for solving problems like:
– What is the relation between age and income?
– What is the relationship between sales and marketing spending?
– How much additional income will I get for each thousand dollars spent on marketing?
– What will be the price of my shares in the next six months?
• In linear regression we find the correlation between the X and Y variables, or features, of a dataset
• So, every value in X has a corresponding value in Y, if both values are continuous
• In linear regression, the data is modeled using a straight line
• We should check the accuracy and goodness of fit to validate the linear regression model
• Linear regression is a linear model, e.g. a model that assumes a
linear relationship between the input variables (x) and the single
output variable (y). More specifically, that y can be calculated
from a linear combination of the input variables (x).
• When there is a single input variable (x), the method is referred
to as simple linear regression.
• When there are multiple input variables, literature from statistics
often refers to the method as multiple linear regression.
Linear regression algorithm steps

• Mathematically, the relationship can be represented with the help of the following equation:
• Y = mX + b
• Here, Y is the dependent variable we are trying to predict
• X is the independent variable we are using to make predictions
• m is the slope of the regression line, which represents the effect X has on Y
• b is a constant, known as the Y-intercept; if X = 0, Y would be equal to b
• Here the function we use is weight = B0 + B1 * height
• For example, let's use B0 = 0.1 and B1 = 0.5. Let's plug them in and calculate the weight (in kilograms) for a person with a height of 182 centimeters.
• weight = 0.1 + 0.5 * 182
• weight = 91.1
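As a minimal sketch (not from the original slides), the same calculation can be reproduced with scikit-learn; the synthetic height/weight data below is assumed purely for illustration, and B0/B1 correspond to intercept_ and coef_:

  import numpy as np
  from sklearn.linear_model import LinearRegression

  # Hypothetical data generated from weight = 0.1 + 0.5 * height, for illustration only
  heights = np.array([[150.0], [160.0], [170.0], [182.0], [190.0]])
  weights = 0.1 + 0.5 * heights.ravel()

  model = LinearRegression().fit(heights, weights)
  print(model.intercept_)        # ~0.1, i.e. B0
  print(model.coef_[0])          # ~0.5, i.e. B1
  print(model.predict([[182]]))  # ~91.1 kg, matching the manual calculation above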
Another example
Linear regression steps with sample dataset

• The dataset contains tips data from different customers: females and males, smokers and non-smokers, on days Thursday to Sunday, at dinner or lunch, and from different table sizes
• We want to predict how much tip the waiter will earn based on the other parameters
• Now let's answer a few questions to get the linear regression outputs
• What is the hardest day to work? (Based on the number of tables served)
• Let's find out the best day to work, i.e. maximum tips (sum and percentages)
• We can see that the tips are around 15% of the bill
• Next we will analyze who eats more (and tips more): smokers or non-smokers?
Transform and clean the data

• Before we start building our model, we need to convert all the text values into numbers. We can do it in many ways:
– Using update statements
– Using the replace method
– Iterating over the rows
– Using dummy variables
• For example, using Python code we replace or convert the sex and smoker columns to numeric values, i.e. 0 and 1
• df.replace({'sex': {'Male': 0, 'Female': 1}, 'smoker': {'No': 0, 'Yes': 1}}, inplace=True)
• Now we will use dummy variables to replace the day values. The values in the day column are Thu, Fri, Sat, Sun. We could convert them to 1, 2, 3, 4, but to get a good model it is better to use Boolean variables.
• We can achieve this by converting the column into 4 columns, one for each day, with 0 or 1 as values; a sketch follows below
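A minimal sketch of that conversion (the column name 'day' and its values are assumed from the description above):

  import pandas as pd

  # Hypothetical frame standing in for the tips data
  df = pd.DataFrame({'day': ['Thu', 'Fri', 'Sat', 'Sun', 'Sat']})
  dummies = pd.get_dummies(df['day'])    # one 0/1 column per day
  df = pd.concat([df, dummies], axis=1)  # Fri, Sat, Sun, Thu columns added
  print(df)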
Building our Machine Learning model

• Now we are ready to build the linear regression model:
• We create a list of features as X and the predicted value as Y
• X = df[['sex','smoker','size','Fri','Sat','Sun','Dinner']]
• Y = df[['tip']]
• Now let's split the data into test and train sets so we can test our model before we use it; the code below uses a 75% – 25% split:
• from sklearn.model_selection import train_test_split
• from sklearn.linear_model import LinearRegression
• X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=26)
• Now let's train the model with X_train and y_train:
• model = LinearRegression()
• model.fit(X_train, y_train)
• And predict the X_test values:
• predictions = model.predict(X_test)
• We can now look at the predictions and compare them with y_test
• We can draw a graph to see the distribution of the differences (assuming seaborn is imported as sb):
• import seaborn as sb
• sb.distplot(y_test - predictions)
Another example

• We will predict the sales based on advertising on TV, Radio and Newspaper
• What are the features?
– TV: advertising dollars spent on TV for a single
product in a given market (in thousands of dollars)
– Radio: advertising dollars spent on Radio
– Newspaper: advertising dollars spent on
Newspaper
• What is the response?
– Sales: sales of a single product in a given market
(in thousands of items)
Linear regression interpretations

• Strong relationship between TV ads and sales
• Weak relationship between Radio ads and sales
• Very weak to no relationship between Newspaper ads and sales
Logistic Regression

• Logistic regression is a predictive modeling algorithm that is used when the Y variable is binary categorical.
• That is, it can take only two values like 1 or 0.
• The goal is to determine a mathematical
equation that can be used to predict the
probability of event 1.
• Once the equation is established, it can be used
to predict the Y when only the X’s are known
Some examples of binary classification
• Spam Detection : Predicting if an email is Spam or not
• Credit Card Fraud : Predicting if a given credit card
transaction is fraud or not
• Health : Predicting if a given mass of tissue is benign
or malignant
• Marketing : Predicting if a given user will buy an
insurance product or not
• Banking : Predicting if a customer will default on a
loan.
Difference between linear regression and logistic regression

• In logistic regression, the response variable has only 2 possible values, so it is desirable to have a model that predicts the value either as 0 or 1, or as a probability score that ranges between 0 and 1
• Linear regression does not have this capability
• If you use linear regression to model a binary response variable, the resulting model may not restrict the predicted Y values within 0 and 1
• Dependent variable in Linear regression is
continuous and it is categorical in logistic
regression
• Dependent variable or output of this regression
type is in binary format
• Linear regression assumes that the data follows a
linear function
• Logistic regression models the data using the
sigmoid function
Logistic regression Model
• The curve here is known as S-curve or sigmoid
curve. This curve maps the relation between
dependent and independent variables.
• Here 0 and 1 represent the probability of an event happening
• In logistic regression we don't need values below 0 or above 1
Mathematical equation
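The slide's equation image is not reproduced here; the standard logistic (sigmoid) model it refers to, written in plain text, is

p = 1 / (1 + e^-(b0 + b1*X))

where b0 + b1*X is the linear combination of the inputs and p is the predicted probability of event 1; the threshold described below turns p into a class.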
• Here the threshold value is 0.5. The threshold value indicates the probability of winning or losing
• Values above 0.5 are treated as probability 1, and values below 0.5 as probability 0
Example program for Logistic regression

• Let's consider the Titanic dataset
Steps to build the logistic regression
Collecting the data
Analyzing the data

• Here we should analyze the relations between variables.
• We have to check how one variable affects the other variables.
• For example, we analyze the number of passengers who survived and who did not survive using the Python code sketched below
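The slide shows this code as an image; a minimal equivalent sketch, assuming the data is loaded into a DataFrame named titanic with a Survived column (file name hypothetical):

  import pandas as pd
  import seaborn as sns
  import matplotlib.pyplot as plt

  titanic = pd.read_csv('titanic.csv')       # hypothetical file name
  sns.countplot(x='Survived', data=titanic)  # one bar per class: survived vs not
  plt.show()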
• Now let's analyze gender: how many males survived and how many females survived
• Now let's analyze the passenger class
• Now let's check the age distribution, i.e. the range of ages of the passengers
• Now let's check the fare distribution, i.e. the fares of the tickets purchased by the passengers
• Let's analyze the siblings and spouse feature
Data Wrangling

• Cleaning the data is known as data wrangling
• Large datasets will have null or NaN values.
• For better performance of machine learning algorithms we have to clean the datasets by removing these values, and this entire process is known as data wrangling
• Let's perform data wrangling on the Titanic dataset.
• We will check whether the dataset has null values using the following code, which results in Boolean values.
• Here, True means null and False means not null
• We can see that the Cabin value of the 1st instance is null, as the result holds True
• Now we will check the count of null values for each feature using the following code, sketched below
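A sketch of those two checks (the original code is an image):

  print(titanic.isnull().head())  # Boolean frame: True marks a null value
  print(titanic.isnull().sum())   # count of null values per feature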
• To handle these null values, we can drop them, replace them, or insert some dummy values
• The Cabin column has a lot of null values, so we can drop it completely as we can't use it for predicting; a sketch of the code follows
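A sketch of that cleanup (the column name 'Cabin' is assumed from the standard Titanic dataset, and dropping the remaining null rows is one common option):

  titanic.drop('Cabin', axis=1, inplace=True)  # too many nulls to be useful
  titanic.dropna(inplace=True)                 # drop the remaining rows with nulls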
• Now we will convert the text-based data to categorical data to perform logistic regression.
• In general, machine learning algorithms need numerical categories to build the models or run the algorithms
• In this case, we will convert the Sex feature to numbers using the following code
• Here we don't require two columns to check whether the gender is male or female, so we keep only one dummy column
• The Embarked feature also has text values like S and C, so we insert dummy values using the same procedure
• We remove the first dummy column to keep the data simple
• Similarly, we use dummy values for the passenger class feature as well
• The next step is to concatenate all the new columns into a new dataset
• Let's drop the irrelevant columns, i.e. PassengerId, Pclass, Name, Ticket, Sex and Embarked; a consolidated sketch of all these steps follows
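A consolidated sketch of those steps (column names assumed from the standard Titanic dataset):

  sex = pd.get_dummies(titanic['Sex'], drop_first=True)         # one 0/1 column instead of two
  embark = pd.get_dummies(titanic['Embarked'], drop_first=True) # dummies for the ports, first dropped
  pclass = pd.get_dummies(titanic['Pclass'], drop_first=True)   # dummies for classes 2 and 3

  titanic = pd.concat([titanic, sex, embark, pclass], axis=1)   # concatenate the new columns
  titanic.drop(['PassengerId', 'Pclass', 'Name', 'Ticket', 'Sex', 'Embarked'],
               axis=1, inplace=True)                            # drop the irrelevant columns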
Training and Testing the data

• We will split the entire dataset into a train dataset and a test dataset
• Training
– Here we build the model on the train dataset and predict the output on the test dataset
– Let's define the dependent variable and independent variables, i.e. Y and X
– Y is what should be predicted; in this case it is the 'Survived' feature of the Titanic data (a sketch follows below)
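A minimal sketch of the split and training step (the split ratio and random_state are assumed, since the slide does not state them):

  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LogisticRegression

  X = titanic.drop('Survived', axis=1)  # independent variables
  y = titanic['Survived']               # dependent variable to be predicted

  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

  model = LogisticRegression(max_iter=1000)
  model.fit(X_train, y_train)
  predictions = model.predict(X_test)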
Testing the accuracy with testing data

• Now we test the logistic model, created with the training data, on the test data
• We use the classification report and confusion matrix to check the accuracy of the model, as sketched below
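A sketch of those two checks:

  from sklearn.metrics import classification_report, confusion_matrix

  print(classification_report(y_test, predictions))
  print(confusion_matrix(y_test, predictions))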
Classification report
Confusion matrix

• The confusion matrix is a 2x2 matrix which has 4 outcomes.
• This tells how accurate the predictions are

                 PN (Predicted No)   PY (Predicted Yes)
AN (Actual No)          105                  21
AY (Actual Yes)          25                  63
• Here the (AN, PN) cell is 105 and the (AY, PY) cell is 63
• To calculate the accuracy we add 105 + 63 = 168 and divide it by the total, (105 + 21 + 25 + 63) = 214, which gives 168/214 ≈ 0.785
• Here 105 is also called the True Negatives
• 21 is called the False Positives
• 63 is called the True Positives
• 25 is called the False Negatives
Decision Trees
• Decision tree is a type of classification algorithm that comes under the supervised learning technique
• Decision trees are graphical representations of all the possible solutions to a decision
• Decisions are made based on some conditions
• The decisions made can be easily explained
• Decision trees have a root node, decision nodes and leaf nodes
Decision Tree Terminologies

• Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two
or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree
cannot be segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root
node into sub-nodes according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted
branches from the tree.
• Parent/Child node: The root node of the tree is called the parent
node, and other nodes are called the child nodes.
Decision Tree Steps
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain possible values for the best attribute.
• Step-4: Generate the decision tree node, which contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final nodes are called leaf nodes. (A scikit-learn sketch follows below.)
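For reference, a minimal sketch of these steps automated with scikit-learn (the tiny encoded dataset is hypothetical; criterion='entropy' selects attributes by information gain, as described in the following slides):

  from sklearn.tree import DecisionTreeClassifier

  # Hypothetical integer-encoded features (e.g. outlook, temp, humidity, windy)
  # and play / don't-play labels, purely for illustration
  X = [[0, 1, 1, 0], [1, 0, 0, 1], [2, 1, 0, 0], [0, 0, 1, 1]]
  y = [0, 1, 1, 0]

  tree = DecisionTreeClassifier(criterion='entropy')  # split on information gain
  tree.fit(X, y)
  print(tree.predict([[1, 1, 0, 0]]))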
Example
Simple examples
• From the above figure
• Root node is Salary at least $50000
• Decision nodes are outcomes of Yes and No
• Leaf nodes are final decisions made by the
model i.e. Accept offer or decline offer
Attribute Selection Measures
• There are two popular techniques for ASM,
which are:
• Information Gain
• Gini Index
Building decision tree manually

• Let's build a decision tree manually using the CART (Classification and Regression Trees) algorithm and consider the below dataset
• Out of these features, which should be picked to start?
• Choose the feature that best classifies the training data
• For this purpose we use a few measures, i.e. Gini index, information gain, reduction in variance and Chi-square
• The attribute or feature with the highest information gain is considered first
• To understand what information gain is, let’s
understand what Entropy is
Entropy

• Entropy is a metric used to measure impurity
• So, what is impurity?
• Suppose we have a basket full of apples on one side and labels reading 'apple' on the other side
• The probability of picking an apple from the basket and matching it with an apple label is 1, and in this case the impurity is 0
• Now consider a bag with 4 different fruits on one side and the corresponding labels on the other side
• Here, the probability of matching a fruit with its label is not 1; it would be less than 1
• In this case, the impurity is non-zero
• So, we can define impurity as the degree of randomness
• Mathematically, entropy is calculated using the following formula
• Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
• Where,
• S = the set of samples
• P(yes) = probability of yes
• P(no) = probability of no
• Let's check the first condition, where the probability is 0.5
• Entropy(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)
• = -0.5 log2 0.5 - 0.5 log2 0.5 = -0.5(log2 0.5 + log2 0.5)
• = -0.5(-1 + -1)
• = -0.5 × -2
• = 1
• So here Entropy(S) = 1
• Let's see the other condition, where the probability of Yes (or No) is 1
• Entropy(S) = -P(Yes) log2 P(Yes) - P(No) log2 P(No)
• = -1 log2 1 - 0 log2 0 = -(0 + 0) = 0 (taking 0 log2 0 = 0 by convention)
• So here Entropy(S) = 0
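A small sketch of the same calculation in Python (binary case only; zero probabilities are skipped since 0 log2 0 is taken as 0):

  import math

  def entropy(p_yes, p_no):
      # Binary entropy; skip zero probabilities (0 * log2(0) -> 0 by convention)
      total = 0.0
      for p in (p_yes, p_no):
          if p > 0:
              total -= p * math.log2(p)
      return total

  print(entropy(0.5, 0.5))  # 1.0
  print(entropy(1.0, 0.0))  # 0.0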
Information Gain
• Information gain measures the reduction in entropy
• It also decides which attribute or feature should be selected as the decision node
• The mathematical formula for calculating the information gain is given below
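The formula image on the slide is not reproduced here; the standard form it refers to, in plain text, is

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) × Entropy(Sv)

where the sum runs over the values v of attribute A and Sv is the subset of S for which A = v. The weighted sum on the right is the I(·) quantity computed on the following slides.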
Information gain for our dataset
• Now let's calculate the information gain for the previous dataset
• For instance, we will consider the Outlook feature from the previous dataset as the root node
• Outlook has 3 parameters or values, i.e. Sunny, Overcast and Rainy
• We will compute the number of Yes and No values of the Play feature for Sunny, Overcast and Rainy, and it will look something like this
• Now, let's calculate the entropy for each value
• E(Outlook=Sunny) = -2/5 log2 2/5 - 3/5 log2 3/5 = 0.971
• E(Outlook=Overcast) = -4/4 log2 4/4 - 0/4 log2 0/4 = 0
• E(Outlook=Rainy) = -3/5 log2 3/5 - 2/5 log2 2/5 = 0.971
• Now we calculate the amount of information we are getting from Outlook
• I(Outlook) = (5/14 × 0.971) + (4/14 × 0) + (5/14 × 0.971) = 0.693
• So the information for the Outlook feature is 0.693
• Now, the information gained from Outlook is calculated
• Gain(Outlook) = E(S) - I(Outlook)
• So how do we calculate E(S), i.e. for the entire population?
• E(S) = -9/14 log2 9/14 - 5/14 log2 5/14 = 0.94
• So the gain is calculated as below
• Gain(Outlook) = E(S) - I(Outlook) = 0.94 - 0.693 = 0.247
• So the information gain for Outlook is 0.247
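A short sketch that reproduces these numbers, reusing the entropy helper sketched earlier (the yes/no counts are taken from the slides: 9 yes / 5 no overall; Sunny 2/3, Overcast 4/0, Rainy 3/2):

  def info(splits, total):
      # Weighted average entropy of a split; splits is a list of (yes, no) counts
      return sum((y + n) / total * entropy(y / (y + n), n / (y + n))
                 for y, n in splits)

  E_S = entropy(9 / 14, 5 / 14)                   # ~0.940
  I_outlook = info([(2, 3), (4, 0), (3, 2)], 14)  # ~0.693
  print(round(E_S - I_outlook, 3))                # ~0.247 -> Gain(Outlook)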
• Now, let's assume Windy is the root node
• E(Windy=True) = 1 (as there are an equal number of yes and no)
• E(Windy=False) = 0.811
• So, what is the information from Windy?
• I(Windy) = (8/14 × 0.811) + (6/14 × 1) = 0.892
• Now the information gain from Windy is calculated
• Gain(Windy) = E(S) - I(Windy) = 0.94 - 0.892 = 0.048
• Similarly we will calculate for Temperature and Humidity
• Information for Temp = 0.911
• Gain(Temp) = 0.029
• Information for Humidity = 0.788
• Gain(Humidity) = 0.152
• We will select the feature with the maximum gain, i.e. Outlook in this case (0.247)
• So Outlook is the root node for building this decision tree
• For Outlook = Overcast, we have all YES, so it is the final or leaf node
• For Sunny and Rainy, we again have to choose the nodes, as they have both yes and no values
• Again, we have to prioritize these decision nodes based on their information gain values, i.e. Humidity and Windy, and the decision tree looks like
Pruning

• Pruning is cutting down some nodes to get an optimal solution and reduce the complexity
Associations: Apriori Algorithm
Market Basket Analysis
• Market basket analysis is one of the techniques used by many retailers to uncover associations between items
• For example, a customer who purchases bread has a 60% chance of purchasing jam
• Customers who purchased laptops could also purchase laptop bags
• So, it's all about finding associations between items or products that can be sold together
Association rule mining

• Association rule mining is like developing simple If and Then conditions
• If a customer buys bread, then the chances of
buying jam, butter and eggs are high
• Association rule mining is used to determine the relation between A and B in this case
• A can be an item or a set of items; it is called the Antecedent, and B is called the Consequent
• For example, if Tea and Milk are the antecedent, then Sugar can be the consequent
• What about a customer who bought both A and B?
• Here, we should be able to propose another product C to the customer
• So, a lot of combinations and associations should be learned to attract customers and also improve sales
• In such cases, a lot of rules should be developed
• Suppose there are 10,000 or more items; building association rules for all of them would end up with a huge number of relations and combinations
• But we can easily build the association rules for huge volumes of data using the Apriori algorithm
Definition
• Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It identifies frequent if-then associations, called association rules, which consist of an antecedent (if) and a consequent (then).
Apriori algorithm

• The Apriori algorithm is all about measuring the associations between any number of items using mathematical equations or formulas
• There are three metrics used for calculating or
measuring the associations
– Support
– Confidence
– Lift
• Support is an indication of how frequently the
items appear in the data. Mathematically,
support is the fraction of the total number of
transactions in which the item set occurs.
• Confidence indicates the number of times the
if-then statements are found true. Confidence
is the conditional probability of occurrence of
consequent given the antecedent.
• Lift can be used to compare confidence with
expected confidence. This says how likely item
Y is purchased when item X is purchased,
while controlling for how popular item Y is.
Mathematically,
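The formula images on the slides are not reproduced here; in standard notation the three metrics are

Support(A) = (number of transactions containing A) / (total number of transactions)
Confidence(A → B) = Support(A ∪ B) / Support(A)
Lift(A → B) = Confidence(A → B) / Support(B)

(The worked example later in this deck uses raw transaction counts for support, which is equivalent up to division by the total number of transactions.)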
• With the support value we can compute how frequently items A and B are bought together, so we can filter out the items which are bought less frequently
• Confidence gives us how frequently items A and B are bought together, relative to the number of times item A was bought
• Suppose customers are buying items A and B together and not buying C; we can filter out item C using the confidence value
• With the confidence and support values we can filter out many items and rules
• Suppose, even after filtering many items, we still need to create 5,000 more rules for the items left; in this case, we can use the lift value
• Lift can be defined as the strength of a rule
• Lift uses the independent support values of items A and B in the denominator
• This gives the independent occurrence probability of the items or item sets A and B
Example of Association rule mining

• Suppose we have a few items and the corresponding transactions as below
• Now we will create some rules for these transactions, and these might look like
Apriori algorithm
Definition
Apriori algorithms use frequent item sets to generate association rules. They are based on the concept that a subset of a frequent item set must also be a frequent item set.
• Now let's consider the following transactions
• We will calculate the support for each item, i.e. {1}, {2}, {3}, {4} and {5}, for item sets of size 1
• Support {1} = 3
• Support {2} = 3
• Support {3} = 4
• Support {4} = 1
• Support {5} = 4
• We can notice that the support for item 4 is only 1, which is less than the minimum support value of 2, so we can eliminate item 4
• Let's prepare the final table
• Now, let's calculate the item sets of size 2 from the F1 table
• {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5} and call this C2, as shown
• Let's check these item sets against the original transactions and compute the support for the item sets of size 2
• {1, 2} = 1, i.e. in T3
• {1, 3} = 3, i.e. in T1, T3 and T5
• {1, 5} = 2, i.e. in T3 and T5
• {2, 3} = 2, i.e. in T2 and T3
• {2, 5} = 3, i.e. in T2, T3 and T4
• {3, 5} = 3, i.e. in T2, T3 and T5
• Let's put them in another table and call it F2, as shown below. {1, 2} has support 1, which is less than the minimum support count, so we will discard it
• Now consider F2 and form all the possible combinations of item sets of size 3
• {1, 3, 2}
• {1, 3, 5}
• {2, 3, 5}
• {2, 5, 1}
• Now we check whether any of these contain a discarded item set of size 2
• We can discard {1, 3, 2} and {2, 5, 1}, since both contain the discarded pair {1, 2}
• So the final item sets remaining are {1, 3, 5} and {2, 3, 5}
• Let's calculate the support for these item sets
• {1, 3, 5} = 2, i.e. in T3 and T5
• {2, 3, 5} = 2, i.e. in T2 and T3
• Let's create another table C4 with item sets of size 4
• {1, 3, 5, 2}, and its support is 1, i.e. in T3, as shown
• As the support of {1, 2, 3, 5} is 1, we stop the iterations here and go back to the previous table, i.e.
• {1, 3, 5} and {2, 3, 5}
• We have to create subsets for these two item sets, and they will look like
• For I = {1, 3, 5}, the subsets are {1, 3}, {1, 5}, {3, 5}, {1}, {3}, {5}
• For I = {2, 3, 5}, the subsets are {2, 3}, {2, 5}, {3, 5}, {2}, {3}, {5}
• Here S is one of the subsets of I
• Each rule has the form S → (I - S), i.e. S recommends the rest of I
• Let's assume the minimum confidence value is 60%
• Let's apply the rules to these two item sets and the corresponding subsets
{1, 3, 5}
• Rule 1: {1, 3} → ({1, 3, 5} - {1, 3}), so 1 and 3 recommend 5
• Calculate the confidence now
• Confidence = Support(1, 3, 5) / Support(1, 3) = 2/3 = 66.66% > 60%
• Rule 1 is selected
• Rule 2: {1, 5} → ({1, 3, 5} - {1, 5}), so 1 and 5 recommend 3
• Confidence = Support(1, 3, 5) / Support(1, 5) = 2/2 = 100% > 60%
• Rule 2 is selected
• Rule 3: {3, 5} → ({1, 3, 5} - {3, 5}), so 3 and 5 recommend 1
• Confidence = Support(1, 3, 5) / Support(3, 5) = 2/3 = 66.66% > 60%
• Rule 3 is selected
• Similarly we will calculate the rest of the rules
• Similarly we will calculate the rules for the second item set, i.e. {2, 3, 5}; a sketch that automates the whole computation follows below
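A compact sketch that automates the whole rule computation. The five transactions are assumed to be T1 = {1, 3, 4}, T2 = {2, 3, 5}, T3 = {1, 2, 3, 5}, T4 = {2, 5}, T5 = {1, 3, 5}, which is consistent with every support count quoted above:

  from itertools import combinations

  transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]

  def support(itemset):
      # Raw support count: number of transactions containing the item set
      return sum(itemset <= t for t in transactions)

  # Confidence of every rule S -> (I - S) for the two surviving 3-item sets
  for I in ({1, 3, 5}, {2, 3, 5}):
      for size in (1, 2):
          for S in map(set, combinations(sorted(I), size)):
              confidence = support(I) / support(S)
              if confidence >= 0.6:  # minimum confidence of 60%
                  print(sorted(S), '->', sorted(I - S), f'{confidence:.0%}')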
Task for next session
• Compute the rules for the below transactions
