Module 4
Dr. S. Ilankumaran
AP/ IT, TCE.
Introduction
• The aim of descriptive analysis is to describe the patterns of
customer behavior
• Descriptive analytics is often referred to as unsupervised learning,
because there is no target variable to steer the learning process
• In supervised learning, input data is provided to the model along
with the output.
• In unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can
predict the output when it is given new data
• The main goal of unsupervised learning is to discover hidden and
interesting patterns in unlabeled data.
Types of Descriptive Analysis
• Sequence rules detect sequences of events, for example:
• Detecting sequences of purchase behavior in a supermarket context
• Detecting sequences of web page visits in a web mining context
• Detecting sequences of words in a text document
Sequence Rule Mining
• Data Type: Sequence rule mining is applied to sequential data, where
each data point is a sequence of events or items ordered by time or
by some other ordering criterion.
• Objective: The primary goal of sequence rule mining is to discover
sequential patterns or rules that describe the sequential
relationships between events or items.
• Pattern: Sequence rules describe the order in which events or items
occur over time.
• Example: An example of a sequence rule could be "if {login, browse,
add to cart} then {purchase}," indicating the sequence of actions
leading to a purchase in an online shopping session.
• Algorithms: Common algorithms for sequence rule mining include
PrefixSpan, GSP (Generalized Sequential Pattern), and SPADE
(Sequential PAttern Discovery using Equivalence classes).
Types of Descriptive Analysis
• Confidence measures how often a given rule X ⇒ Y turns out to be
true in practice:
confidence(X ⇒ Y) = support(X ∪ Y) / support(X)
Association Rule Mining
• Mining association rules from data is essentially a two-step process
as follows:
1. Identification of all item sets having support above the minimum
support threshold (i.e., "frequent" item sets)
2. Discovery of all derived association rules having confidence
above the minimum confidence threshold.
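The two steps above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not an optimized Apriori implementation. The toy transactions, the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE`, and all function names are assumptions made for this example.

```python
from itertools import combinations

# Toy transactions, invented for illustration only.
transactions = [
    {"A", "B", "C"},
    {"B", "C"},
    {"A", "C", "D"},
    {"A", "B", "D"},
    {"A", "C", "D"},
]
MIN_SUPPORT = 0.4      # assumed threshold
MIN_CONFIDENCE = 0.7   # assumed threshold


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate item sets with support >= min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s >= min_support:
                frequent[frozenset(combo)] = s
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive rules X => Y with confidence >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                x = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent,
                # so its support is already stored in the dictionary.
                confidence = sup / frequent[x]
                if confidence >= min_confidence:
                    rules.append((set(x), set(itemset - x), sup, confidence))
    return rules


frequent = frequent_itemsets(transactions, MIN_SUPPORT)
rules = association_rules(frequent, MIN_CONFIDENCE)
for x, y, sup, conf in rules:
    print(sorted(x), "=>", sorted(y), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

The early `break` in step 1 relies on the property that an item set cannot be frequent unless all of its subsets are frequent, which is also what lets step 2 look up the antecedent's support directly.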
SEQUENCE RULES
• Given a database D of customer transactions, the problem of mining
sequential rules is to find the maximal sequences among all
sequences that have a certain user-specified minimum support and
confidence.
• Example
• Home page ⇒ Electronics ⇒ Cameras and Camcorders ⇒ Digital
Cameras ⇒ Shopping cart ⇒ Order confirmation ⇒ Return to
shopping
• It is important to note that a transaction time or sequence field will
now be included in the analysis.
• Association rules are concerned with what items appear together at
the same time
• Sequence rules are concerned with what items appear at different
times
• To mine the sequence rules, one can again make use of the a priori
property
Sequence Rules
• Consider an example transactions data set in a web analytics
setting, where the letters A, B, C, … refer to web pages
• A sequential version, listing the pages of each session in the order
visited, can then be obtained as follows:
• Session 1: A, B, C
• Session 2: B, C
• Session 3: A, C, D
• Session 4: A, B, D
• Session 5: D, C, A
• Support can be calculated in two ways; consider the sequence rule
A ⇒ C
• A first approach is to calculate the support whereby the
consequent can appear in any subsequent stage of the sequence
• In this case, the support becomes 2/5 (40%)
• Another approach is to only consider sessions in which the
consequent appears right after the antecedent
• In this case, the support becomes 1/5 (20%)
• The confidence will be 2/4 (50%) in the first case and 1/4 (25%) in
the second case, because four of the five sessions contain A
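The support and confidence figures above can be checked with a short Python sketch over the five sessions. The rule is taken to be A ⇒ C, which is the rule that reproduces the stated numbers. The function names and the `immediate` flag are assumptions introduced for this illustration.

```python
# The five sessions from the example, pages in the order visited.
sessions = [
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["D", "C", "A"],
]


def follows(session, antecedent, consequent, immediate):
    """True if consequent occurs after antecedent within the session.

    immediate=False: consequent may appear at any later position.
    immediate=True:  consequent must appear right after the antecedent.
    """
    for i, page in enumerate(session):
        if page != antecedent:
            continue
        later = session[i + 1:i + 2] if immediate else session[i + 1:]
        if consequent in later:
            return True
    return False


def support_and_confidence(sessions, antecedent, consequent, immediate=False):
    """Return (support, confidence) of the rule antecedent => consequent."""
    matches = sum(follows(s, antecedent, consequent, immediate) for s in sessions)
    with_antecedent = sum(antecedent in s for s in sessions)
    support = matches / len(sessions)
    confidence = matches / with_antecedent if with_antecedent else 0.0
    return support, confidence


print(support_and_confidence(sessions, "A", "C"))                  # (0.4, 0.5)
print(support_and_confidence(sessions, "A", "C", immediate=True))  # (0.2, 0.25)
```

Sessions 1 and 3 match under the first approach, only session 3 matches under the second, and session 5 never matches because C is visited before A.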
SEGMENTATION
• The aim of segmentation is to split up a set of customer observations into
segments such that
• the homogeneity within a segment is maximized (cohesive) and
• the heterogeneity between segments is maximized (separated)
• Well-known applications include
• Understanding a customer population (e.g., targeted marketing or
advertising)
• Efficiently allocating marketing resources
• Differentiating between brands in a portfolio
• Identifying the most profitable customers
• Identifying shopping patterns
• Identifying the need for new products
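One way to make the cohesion and separation criteria above concrete is to compare the within-segment and between-segment sums of squared distances for two candidate segmentations. The sketch below is only an illustration: the six two-dimensional customer points, both segmentations, and the function names are hypothetical, and sums of squares are just one of several possible measures.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def within_ss(segments):
    """Cohesion: squared distance of each point to its own segment centroid."""
    return sum(sq_dist(p, centroid(seg)) for seg in segments for p in seg)


def between_ss(segments):
    """Separation: size-weighted squared distance of each segment centroid
    to the overall centroid."""
    overall = centroid([p for seg in segments for p in seg])
    return sum(len(seg) * sq_dist(centroid(seg), overall) for seg in segments)


# Hypothetical customers described by two numeric features.
good = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
poor = [[(1, 1), (8, 8), (1, 2)], [(2, 1), (9, 8), (8, 9)]]

for name, segments in (("good", good), ("poor", poor)):
    print(name, round(within_ss(segments), 2), round(between_ss(segments), 2))
```

The better segmentation has the smaller within-segment value and the larger between-segment value; for a fixed data set the two values add up to the same total, so improving one improves the other.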
Summary
• Association rule mining focuses on discovering co-occurrence
patterns in transactional data.
• Sequence rule mining focuses on uncovering sequential patterns in
sequential data.
• Clustering focuses on grouping similar data points together without
predefined classes.
• Each technique serves a different purpose and is applied to different
types of data to extract useful insights.
Hierarchical versus Nonhierarchical
Clustering Techniques
Hierarchical Clustering
• Divisive hierarchical clustering starts from the whole data set in one
cluster, and then breaks it up into progressively smaller clusters
until one observation per cluster remains
• Agglomerative clustering works the other way around, starting from
each observation in its own cluster and continuing to merge the
clusters that are most similar until all observations make up one big
cluster
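The agglomerative procedure can be sketched as follows. This is a naive illustration, not a production implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), neither of which is prescribed above, and the sample points are made up.

```python
import math


def single_linkage(c1, c2):
    """Distance between two clusters: closest pair of members (assumed linkage)."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[p] for p in points]  # start: one observation per cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges  # end: all observations in one big cluster


# Hypothetical one-dimensional observations, written as 1-tuples.
points = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
merges = agglomerative(points)
for step, (left, right) in enumerate(merges, start=1):
    print(step, left, "+", right)
```

With n observations the loop always performs n - 1 merges. Reading the merge history backwards gives the sequence of splits that a divisive method would aim to produce from the top down.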