Customer Analytics at Flipkart
E-commerce has given many retail web stores a competitive advantage. Flipkart, one of the
older players, tries to maintain its competitive edge by capturing customer interest and
buying patterns through recommendation engines.
In our case, we can implement a collaborative filtering algorithm in Flipkart's
recommendation system.
Take the historical data of all users for a particular period, along with the demographics of
the particular location.
The input data for collaborative filtering can be the users and the target items (which
users usually purchase).
The data is stored in matrix form, in rows and columns, where the rows are the users and the
columns are the targeted items of a particular section, say apparel.
In our case we can see how each user purchased 5 items of that section, where each mark
indicates that the user “purchased” that particular item.
Hence the criterion that matches a user with a target item is also important.
Here the criterion is “purchased”; it can also be “added to cart”, “ratings given”, “clicks
per view”, etc.
Based on the above parameters we use a “utility matrix” to find the relation between users
and targeted items.
Utility Matrix: C × S → R, where C = set of customers; S = set of targeted items;
R = targeted criteria.
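As a minimal sketch of the utility matrix, assuming a binary “purchased” criterion and a made-up purchase log (the customer and item names below are hypothetical, not taken from the case), the matrix can be built like this:

```python
# Hypothetical purchase log: (customer, item) pairs, illustrative only.
purchases = [
    ("u1", "shirt"), ("u1", "jeans"),
    ("u2", "shirt"), ("u2", "jacket"),
    ("u3", "jeans"),
]

def build_utility_matrix(events):
    """Return (customers, items, matrix) where matrix[i][j] is 1 if
    customer i purchased item j and 0 otherwise."""
    customers = sorted({c for c, _ in events})
    items = sorted({s for _, s in events})
    row = {c: i for i, c in enumerate(customers)}
    col = {s: j for j, s in enumerate(items)}
    matrix = [[0] * len(items) for _ in customers]
    for c, s in events:
        matrix[row[c]][col[s]] = 1  # R = "purchased"
    return customers, items, matrix

customers, items, R = build_utility_matrix(purchases)
for name, values in zip(customers, R):
    print(name, dict(zip(items, values)))
```

Other criteria such as ratings or cart additions would replace the 0/1 entries with the corresponding values.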
Collaborative Filtering Algorithm:
Step 1: Find the set of N users whose purchases are similar to those of the targeted customer X.
Step 2: Estimate X's likely purchases based on the set of N users.
Step 3: If the purchase similarity is high enough, suggest the items purchased by the N
users to user X.
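The three steps above can be sketched as follows. This is only an illustration: the choice of cosine similarity, the parameter names n_neighbours and top_k, and the tiny matrix are assumptions, not details from the case.

```python
import math

# Hypothetical binary utility matrix: one row per customer, one column per item.
items = ["jacket", "jeans", "shirt", "shoes"]
R = {
    "X":  [0, 1, 1, 0],
    "u1": [1, 1, 1, 0],
    "u2": [0, 1, 1, 1],
    "u3": [1, 0, 0, 0],
}

def cosine(a, b):
    """Cosine similarity between two purchase vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(target, matrix, n_neighbours=2, top_k=2):
    # Step 1: find the N users most similar to the target customer.
    neighbours = sorted(
        ((cosine(matrix[target], row), user)
         for user, row in matrix.items() if user != target),
        reverse=True,
    )[:n_neighbours]
    # Step 2: score items the target has not bought, weighted by similarity.
    scores = {}
    for sim, user in neighbours:
        for j, bought in enumerate(matrix[user]):
            if bought and not matrix[target][j]:
                scores[items[j]] = scores.get(items[j], 0.0) + sim
    # Step 3: suggest the highest-scoring items (ties broken alphabetically).
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_k]

print(recommend("X", R))
```

Here u1 and u2 are the nearest neighbours of X, so the items they bought that X has not are the ones suggested.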
Content-based filtering: based on factors such as the articles/blogs the customer clicked,
liked, viewed, etc.
Repeat purchases are common at bigbasket, so along with collaborative filtering
we use evaluation tools like “support” and “confidence” while applying
association rule mining.
Association is measured using support, confidence and lift:
Support(A) = n(A)/N
Support(A,B) = n(A∩B)/N, where n(A∩B) is the number of times both A and B are purchased
together and N is the total number of purchases.
Confidence(A,B) = n(A∩B)/n(A) = Support(A,B)/Support(A)
Lift(A,B) = Confidence(A,B)/Support(B) = Support(A,B)/(Support(A) × Support(B))
= N × n(A∩B)/(n(A) × n(B))
Lift > 1: B is likely to be bought with A; Lift = 1: no association; Lift < 1: B is
unlikely to be bought with A.
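To make the three measures concrete, here is a small sketch that computes them on a made-up set of baskets (the basket contents are hypothetical and chosen only for illustration):

```python
# Hypothetical baskets, one set of items per purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "egg"},
    {"bread", "butter", "egg"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Support(A,B) / Support(A): how often B appears when A does."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Confidence(A,B) / Support(B): values above 1 mean positive association."""
    return confidence(a, b, transactions) / support(b, transactions)

A, B = {"bread"}, {"butter"}
print("support(A,B):", support(A | B, baskets))
print("confidence(A,B):", confidence(A, B, baskets))
print("lift(A,B):", lift(A, B, baskets))
```

In this toy data bread appears in 4 of 5 baskets and butter in 3, always together with bread, so the lift comes out at about 1.25, a positive association.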
Association Rule Mining – Algorithm:
1. The Apriori algorithm is used to find frequent patterns, keeping the “frequent item
set” condition in mind.
2. Association rules are generated from the frequent item sets.
E.g.: a person who buys bread will purchase milk and egg with 100% confidence, or when
milk and egg are bought there is a 100% chance that bread is bought.
Whether an association rule is popular enough to keep is checked against a threshold limit
(for example, a minimum support and a minimum confidence).
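The two stages above (frequent item sets, then rules) can be sketched as a simplified Apriori. The function names apriori and rules, the thresholds, and the transactions are assumptions made for this example, and a production system would normally use an established library instead.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent item set."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([i]) for t in transactions for i in t}
    current = {s for s in current if sup(s) >= min_support}
    frequent, k = {}, 1
    while current:
        for s in current:
            frequent[s] = sup(s)
        k += 1
        # Join: merge frequent (k-1)-item sets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if sup(c) >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting the confidence threshold."""
    out = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = s / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

# Hypothetical transactions, illustrative only.
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk", "egg"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "egg", "butter"},
]
frequent = apriori(transactions, min_support=0.4)
strong = rules(frequent, min_confidence=1.0)
for lhs, rhs, conf in strong:
    print(sorted(lhs), "->", sorted(rhs), conf)
```

With these transactions, every basket containing milk and egg also contains bread, so the rule {milk, egg} -> {bread} has 100% confidence.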
One of the product recommendation techniques is “market basket analysis”.
“Lift” is a measure of correlation and of association. In the bigbasket
case, if lift = 1 there is no association between bread and butter.
If lift < 1 there is a negative association: when bread is bought, the
chance of buying butter decreases.
If lift > 1 there is a positive association: when bread is bought, the chance of buying
butter increases.
In the bigbasket case study, I suggest recommending item pairs with lift > 1.
https://towardsdatascience.com/a-complete-logistic-regression-algorithm-from-scratch-in-python-
step-by-step-ce33eae7d703
A confusion matrix is a visual representation which shows the counts of four important
classification outcomes:
True Positives (TP): The number of observations where the model predicted the
customer will churn (1), and they actually do churn (1).
True Negatives (TN): The number of observations where the model predicted the
customer would not churn (0), and they actually do not churn (0).
False Positives (FP): The number of observations where the model predicted the
customer will churn (1), but in real life they do not churn (0).
False Negatives (FN): The number of observations where the model predicted the
customer will not churn (0), but in real life they do churn (1).
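As a small illustration, the four counts can be tallied directly from actual and predicted churn labels. The label lists below are hypothetical, and confusion_counts is a helper name introduced only for this sketch.

```python
def confusion_counts(actual, predicted):
    """Tally TP, TN, FP and FN for binary labels (1 = churn, 0 = no churn)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1   # predicted churn, did churn
        elif p == 0 and a == 0:
            counts["TN"] += 1   # predicted no churn, did not churn
        elif p == 1 and a == 0:
            counts["FP"] += 1   # predicted churn, did not churn
        else:
            counts["FN"] += 1   # predicted no churn, did churn
    return counts

# Hypothetical ground truth and model output for eight customers.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)
```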
One axis of a confusion matrix will represent the ground-truth value, while the other will represent
the predicted values. At this step, it is very important to have business domain knowledge. Certain
metrics are more relevant to our model than others. For example, if we were modeling whether a
patient has a disease, it would be much worse to have a high number of false negatives than a high
number of false positives. If there are many false positives, then that just means some patients
would need to undergo some unnecessary testing and maybe an annoying doctor visit or two. But a
high number of false negatives means that many patients would actually be sick but diagnosed as
healthy, potentially with dire consequences. For our purposes of churn, it is worse for us to predict a
customer not churning when that customer actually churns in reality, meaning that our False
Negatives are the more costly error.
Overall
Bigbasket: Collaborative filtering >> Concept of repeat purchases >> Association rule mining >>
Support and confidence as evaluation tools or metrics.
QWE: Content filtering >> Why? Based on the articles and blogs which mention the services
provided by the org.
Flipkart, QWE, HR >> Use a confusion matrix (precision, accuracy, specificity, etc.) as evaluation
metrics to assess the churn model.
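The evaluation metrics named above follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts passed in are hypothetical, and recall is included as an addition because it is the metric that penalises false negatives.

```python
def churn_metrics(tp, tn, fp, fn):
    """Standard classification metrics derived from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),  # share of correct predictions
        "precision":   tp / (tp + fp),  # predicted churners who really churned
        "recall":      tp / (tp + fn),  # actual churners the model caught
        "specificity": tn / (tn + fp),  # non-churners correctly identified
    }

# Hypothetical counts for 100 customers.
metrics = churn_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a churn model where missed churners are costly, recall is usually watched alongside precision and accuracy.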