
Module 4

Descriptive analysis aims to describe customer behavior patterns without supervision. Association rule mining finds frequently co-occurring item patterns, sequence rule mining finds sequential event patterns over time, and clustering groups similar observations into segments without labels.

Uploaded by

Mhd Aslam

Descriptive Analysis

Dr. S. Ilankumaran
AP/ IT, TCE.
Introduction
• The aim of descriptive analysis is to describe the patterns of
customer behavior
• Descriptive analytics is often referred to as unsupervised learning,
because no output (target) variable guides the learning
• In supervised learning, input data is provided to the model along
with the output.
• In unsupervised learning, only input data is provided to the model.
• The goal of supervised learning is to train the model so that it can
predict the output when it is given new data
• The main goal of unsupervised learning is to discover hidden and
interesting patterns in unlabeled data.
Types of Descriptive Analysis
• Type of descriptive analytics: Association rules
• Explanation: Detect frequently occurring patterns between items
• Example: Detecting what products are frequently purchased together in a
supermarket context
• Example: Detecting what words frequently co-occur in a text document
• Example: Detecting what elective courses are frequently chosen together
in a university setting
Association Rule Mining
• Data Type: Association rule mining typically works with transactional
data, where each transaction consists of a set of items.
• Objective: The main objective of association rule mining is to
discover interesting relationships or associations between items in
large datasets.
• Pattern: Association rules are in the form of "if-then" statements,
where certain items appearing together in transactions imply the
presence of other items.
• Example: An example of an association rule could be "if {milk, bread}
then {eggs}," indicating that customers who buy milk and bread are
likely to also buy eggs.
• Algorithms: Common algorithms for association rule mining include
Apriori and FP-Growth.
Types of Descriptive Analysis
• Type of descriptive analytics: Sequence rules
• Explanation: Detect sequences of events
• Example: Detecting sequences of purchase behavior in a supermarket
context
• Example: Detecting sequences of web page visits in a web mining context
• Example: Detecting sequences of words in a text document
Sequence Rule Mining
• Data Type: Sequence rule mining is applied to sequential data, where
each data point is a sequence of events or items ordered by time or
by another ordering criterion.
• Objective: The primary goal of sequence rule mining is to discover
sequential patterns or rules that describe the sequential
relationships between events or items.
• Pattern: Sequence rules describe the order in which events or items
occur over time.
• Example: An example of a sequence rule could be "if {login, browse,
add to cart} then {purchase}," indicating the sequence of actions
leading to a purchase in an online shopping session.
• Algorithms: Common algorithms for sequence rule mining include
PrefixSpan, GSP (Generalized Sequential Pattern), and SPADE
(Sequential PAttern Discovery using Equivalence classes).
Types of Descriptive Analysis
• Type of descriptive analytics: Clustering
• Explanation: Detect homogeneous segments of observations
• Example: Differentiate between brands in a marketing portfolio
• Example: Segment customer population for targeted marketing
Clustering
• Objective: Clustering is used to group similar data points together
based on certain criteria, without any predefined classes or labels.
• Data Type: It can be applied to various types of data, including
numerical, categorical, or mixed data.
• Pattern: Clustering aims to partition the data into clusters or groups
so that data points within the same cluster are more similar to each
other than to those in other clusters.
• Example: In customer segmentation, clustering can be used to group
customers with similar purchasing behaviors together.
• Algorithms: Common algorithms include K-means, DBSCAN (Density-
Based Spatial Clustering of Applications with Noise), and hierarchical
clustering.
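As a rough illustration of the partitioning idea above, here is a minimal K-means sketch in plain Python. The data points, the starting centroids, and the function name `kmeans` are assumptions made for this example and do not come from the slides. In practice a library implementation would normally be used.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means: alternate assignment and centroid update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers described by two numeric features.
points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

The result depends on the starting centroids, which is why they are passed in explicitly here rather than chosen at random.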
Association Rule
• Association rules typically start from a database of transactions, D
• Each transaction consists of a transaction identifier and a set of items
{i1, i2, …, in} selected from all possible items (I).
• Two key measures to quantify the strength of an association rule are
1. Support
2. Confidence
• The support of an item set is defined as the percentage of total
transactions in the database that contain the item set.
• Hence, the rule X ⇒ Y has support s if 100s% of the transactions in
D contain X ∪ Y. It can be formally defined as follows:
$$\text{support}(X \cup Y) = \frac{\text{number of transactions supporting } (X \cup Y)}{\text{total number of transactions}}$$
• Support refers to how often a given rule appears in the database
being mined
• The confidence measures the strength of the association and is
defined as the conditional probability of the rule consequent, given
the rule antecedent.
• The rule X ⇒ Y has confidence c if 100c% of the transactions in D
that contain X also contain Y.
• It can be defined as
$$\text{confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}$$
• Confidence refers to how often a given rule turns out to be true
among the transactions that contain its antecedent
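The two measures can be sketched in a few lines of Python. The transactions below are hypothetical and chosen only to illustrate the "if {milk, bread} then {eggs}" rule mentioned earlier. The function names are likewise assumptions for this example.

```python
# Hypothetical transactions (illustrative data, not from the slides).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(X ∪ Y) / support(X), an estimate of P(Y | X).

    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)

rule_support = support({"milk", "bread", "eggs"}, transactions)
rule_confidence = confidence({"milk", "bread"}, {"eggs"}, transactions)
print(rule_support, rule_confidence)
```

With this data, two of the five transactions contain all three items (support 2/5), and two of the three transactions containing milk and bread also contain eggs (confidence 2/3).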
Association Rule Mining
• Mining association rules from data is essentially a two-step process
as follows:
1. Identification of all item sets having support above a minimum
support threshold (i.e., "frequent" item sets)
2. Discovery of all derived association rules having confidence
above a minimum confidence threshold.
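The two steps can be sketched as follows. This is a simplified level-by-level enumeration, not the full Apriori algorithm with candidate pruning. The transactions, the thresholds, and the function names are assumptions made for this example.

```python
from itertools import combinations

# Hypothetical transactions (illustrative data, not from the slides).
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Step 1: collect all item sets whose support reaches min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
                found_any = True
        if not found_any:
            break  # no frequent set of this size, so none of a larger size
    return frequent

def derive_rules(frequent, min_confidence):
    """Step 2: split each frequent item set into antecedent => consequent."""
    rules = []
    for itemset, s in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent item set is itself frequent.
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, s, conf))
    return rules

frequent = frequent_itemsets(transactions, min_support=0.4)
rules = derive_rules(frequent, min_confidence=0.6)
for antecedent, consequent, s, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(s, 2), round(conf, 2))
```

Enumerating every candidate is exponential in the number of items, which is why algorithms such as Apriori and FP-Growth are used on real data.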
SEQUENCE RULES
• Given a database D of customer transactions, the problem of mining
sequential rules is to find the maximal sequences among all
sequences that have a certain user-specified minimum support and
confidence.
• Example
• Home page ⇒ Electronics ⇒ Cameras and Camcorders ⇒ Digital
Cameras ⇒ Shopping cart ⇒ Order confirmation ⇒ Return to
shopping
• It is important to note that a transaction time or sequence field will
now be included in the analysis.
• Association rules are concerned about what items appear together at
the same time
• Sequence rules are concerned about what items appear at different
times
• To mine the sequence rules, one can again make use of the a priori
property
Sequence Rules
• Consider the following example of a transactions data set in a web
analytics setting. The letters A, B, C, … refer to web pages
• A sequential version can then be obtained as follows:
• Session 1: A, B, C
• Session 2: B, C
• Session 3: A, C, D
• Session 4: A, B, D
• Session 5: D, C, A
• Support can be calculated in two ways. The figures below are consistent
with the rule A ⇒ C evaluated on the five sessions above.
• A first approach counts a session whenever the consequent appears at
any later stage of the sequence, after the antecedent
• In this case, the support becomes 2/5 (40%): sessions 1 and 3
• Another approach only counts sessions in which the consequent
appears right after the antecedent
• In this case, the support becomes 1/5 (20%): session 3 only
• The confidence is 2/4 (50%) in the first case and 1/4 (25%) in the
second case, because A occurs in four of the five sessions.
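Both counting approaches can be checked with a short Python sketch over the five sessions listed above. The rule A ⇒ C is an assumption: it is the rule that reproduces the quoted figures, but the slides do not name it explicitly. The function names are also illustrative.

```python
# The five sessions from the sequential version of the data set.
sessions = [
    ["A", "B", "C"],
    ["B", "C"],
    ["A", "C", "D"],
    ["A", "B", "D"],
    ["D", "C", "A"],
]

def follows_anywhere(session, antecedent, consequent):
    """True if `consequent` occurs at any position after `antecedent`."""
    if antecedent not in session:
        return False
    first = session.index(antecedent)
    return consequent in session[first + 1:]

def follows_immediately(session, antecedent, consequent):
    """True if `consequent` is the very next page after `antecedent`."""
    return any(
        a == antecedent and b == consequent
        for a, b in zip(session, session[1:])
    )

def support_and_confidence(sessions, antecedent, consequent, matcher):
    """Support over all sessions; confidence over sessions with the antecedent."""
    matches = sum(matcher(s, antecedent, consequent) for s in sessions)
    with_antecedent = sum(antecedent in s for s in sessions)
    return matches / len(sessions), matches / with_antecedent

anywhere = support_and_confidence(sessions, "A", "C", follows_anywhere)
immediate = support_and_confidence(sessions, "A", "C", follows_immediately)
print(anywhere)   # (0.4, 0.5)
print(immediate)  # (0.2, 0.25)
```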
SEGMENTATION
• The aim of segmentation is to split up a set of customer observations into
segments such that
• the homogeneity within a segment is maximized (cohesive) and
• the heterogeneity between segments is maximized (separated)
• Well-known applications include
• Understanding a customer population (e.g., targeted marketing or
advertising)
• Efficiently allocating marketing resources
• Differentiating between brands in a portfolio
• Identifying the most profitable customers
• Identifying shopping patterns
• Identifying the need for new products
Summary
• Association rule mining focuses on discovering co-occurrence
patterns in transactional data.
• Sequence rule mining focuses on uncovering sequential patterns in
sequential data.
• Clustering focuses on grouping similar data points together without
predefined classes.
• Each technique serves a different purpose and is applied to different
types of data to extract useful insights.
Hierarchical versus Nonhierarchical Clustering Techniques
Hierarchical Clustering
• Divisive hierarchical clustering starts from the whole data set in one
cluster, and then repeatedly splits it into smaller clusters until
one observation per cluster remains
• Agglomerative clustering works the other way around, starting from
each observation in its own cluster and continuing to merge the ones
that are most similar until all observations make up one big cluster
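A minimal sketch of the agglomerative (bottom-up) direction is given below, assuming the usual formulation in which every observation starts as its own cluster. Single linkage (the distance between the closest pair of members) is an assumption chosen for brevity, as are the data points and the function name. The loop stops at a requested number of clusters rather than merging all the way to one.

```python
import math

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until `n_clusters` remain."""
    clusters = [[p] for p in points]  # every observation starts alone
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    math.dist(a, b) for a in clusters[i] for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical observations with two numeric features.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters = agglomerative(points, n_clusters=3)
print(clusters)
```

Setting `n_clusters=1` reproduces the full merge sequence described above, ending with all observations in one cluster.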
