Introduction To Business Analytics: DR Sandipan Karmakar Department of Management Studies MNIT Jaipur
Introduction To Business Analytics: DR Sandipan Karmakar Department of Management Studies MNIT Jaipur
Analytics
Dr Sandipan Karmakar
Department of Management Studies
MNIT Jaipur
Some Real World Challenges
You apply for a loan for the first time. How does the bank assess the riskiness of the loan it might make
to you?
How does Amazon.com know which books and other products to recommend to you whenever you log
in to their web site?
How do airlines determine what price to quote to you when you are shopping for a plane ticket?
How can doctors better diagnose and treat you when you are ill or injured?
Test
Set
Learn
Training Model
Set Classifier
Classification – Application 1
Direct Marketing
Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product
Approach
• Use the data for a similar product introduced before.
• We know which customers decided to buy and which decided otherwise. This {buy, don’t buy}
decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction related information about all
such customers.
• Type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.
Classification – Application 2
Fraud Detection
Goal: Predict fraudulent cases in credit card transactions
Approach
• Use credit card transactions and the information on its account-holder as attributes.
• When does a customer buy, what does he buy, how often he pays on time, etc
• Label past transactions as fraud or fair transactions. This forms the class attribute.
• Learn a model for the class of the transactions.
• Use this model to detect fraud by observing credit card transactions on an account.
Classification – Application 3
Customer Attrition/Churn
Goal: To predict whether a customer is likely to be lost to a competitor
Approach
• Use detailed record of transactions with each of the past and present customers, to find
attributes.
• How often the customer calls, where he calls, what time-of-the day he calls most, his
financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty
Clustering - Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find
clusters such that
Data points in one cluster are more similar to one another
Data points in separate clusters are less similar to one another
Similarity Measures:
Euclidean Distance if attributes are continuous
Other Problem-specific Measures.
Illustrative Clustering
Euclidean Distance Based Clustering in 3-D space
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
Association Rule Discovery – Application 1
Marketing and Sales Promotion
Let the rule discovered be
{Bagels, … } --> {Potato Chips}
Potato Chips as consequent => Can be used to determine what should be done to boost its sales
Bagels in the antecedent => Can be used to see which products would be affected if the store
discontinues selling bagels
Bagels in antecedent and Potato chips in consequent => Can be used to see what products should be
sold with Bagels to promote sale of Potato chips!
Association Rule Discovery – Application 2
Supermarket shelf management
Goal: To identify items that are bought together by sufficiently many customers
Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among
items.
A classic rule --
• If a customer buys diaper and milk, then he is very likely to buy beer.
• So, don’t be surprised if you find six-packs stacked next to diapers!
Association Rule Discovery – Application 3
Inventory Management
Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer
products and keep the service vehicles equipped with right parts to reduce on number of visits to
consumer households.
Approach: Process the data on tools and parts required in previous repairs at different consumer
locations and discover the co-occurrence patterns
Regression
Predict a value of a given continuous valued variable based on the values of other variables, assuming a
linear or nonlinear model of dependency.
Greatly studied in statistics, neural network fields.
Examples:
Predicting sales amounts of new product based on advertising expenditure
Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
Time series prediction of stock market indices
Why Python
Python is an interpreted, high-level, general purpose programming language.
Some of the key design philosophy are the code readability, ease of use and high productivity and gained
significant popularity since 2012.
It has an amazing ecosystem and is excellent for developing prototypes very quickly
It has a comprehensive set of core libraries for data analysis and visualization
Python, unlike R is not only built for only data analysis purpose but is a general-purpose language
It can be used to build web applications, enterprise applications and easier to integrate with existing
systems in an enterprise for data collection and preparation
Moreover, Python’s strong community continuously evolves its data science libraries and keeps it
cutting edge
It has libraries for Linear Algebra, Statistics, Machine Learning, Visualization, Optimization, Stochastic
Models etc.