CRM Data Points
How do you make sense of sales data generated by transactions with 100,000 customers and 10,000 products, sold from 15 different sites through 10 different channels? Easy, you might say: there are now plenty of tools for this, and it is not even big data. Yet you don't need big data to be overwhelmed; for most of us, even a hundredth of the volume in this example is enough to get the better of you. For instance, look at this data.
This is for a single product over three years: the average weekly price offered and the quantity purchased, aggregated across all sites and channels. We can see the expected price-quantity relationship, but can we tell the customer what price to offer, to which customer, through which channel and site combination, and in which weeks, so that the customer can be assured of maximizing returns? Moreover, this is only one product. How does one do this at scale?
So, that is our challenge. Beyond the technical challenge, our customer projects show that project teams need guidance on (1) how to get started with simple analysis when there are many possible perspectives, KPIs, and so on, and (2) how to root the tools and analysis in the context of a business problem to build a compelling narrative, which is what matters when communicating with executives. We try to address some of these issues in this series of posts. Here we present an example: how to take your 'ordinary' sales data from the Dynamics 365 for Operations sales order table and structure your exploration to provide deeper analysis to your executives.
We present this in Q&A format so you can follow along with us.
We start with very basic questions that can be answered with simple descriptive analytics and don't need machine learning, and then progress towards deeper analysis. This series of posts is aimed at a business analyst audience, so we keep you away from the technical complexity underneath. First, however, let's look at the approach we use to solve this technical challenge.
Pricing is a core business function for every single company. Yet, there are no one-size-fits-all
solutions because every business has a unique history and position in the market. Enormous human,
marketing, and technical resources are spent on getting pricing right. Today, price setting is informed by a combination of the experience of long-time employees, competitive market intelligence, experiments and promotions, and supplier forecasts.
Given the abundance of historical data, pricing seems ripe for the application of machine learning.
However, the domain is complex, and there are many confounding effects like holidays and
promotions. Machine learning tradition prioritizes prediction performance over causal consistency,
but pricing decisions need high-quality inference about causation, which is only guaranteed with
randomized controlled experiments. Experimentation, also known as A/B testing, is extremely difficult
to set up on existing operations systems. Observational approaches, by contrast, can be "bolted on" to existing data, but they require tools that guard against the many statistical pitfalls.
In this post, we talk about how to leverage historical data with AzureML and python to build a
pricing decision-support application using an advanced observational approach. We call the
resulting set of related services the Microsoft Pricing Engine (MPE). We created an Excel-centric
workflow interfacing with the services to support interactive use. The demonstration uses the
public orange juice data, a popular benchmark dataset in pricing econometrics.
System integrators will be able to use the MPE's web service interface to build pricing decision support applications on Azure, targeting mid-size companies whose small pricing teams lack the extensive data science support needed for customized pricing models.
We will give a bit of pricing theory, discuss how price elasticity is modeled and which potential confounders are included, and describe the architecture of the pricing engine.
A little pricing theory
The core concept is that of price elasticity of demand, a measure of how sensitive the aggregate
demand is to price.
Lowering the price will increase sales quantities, and hence revenues, for products with elasticity e < -1 (elasticities are negative since the demand curve slopes down). If your product is inelastic (e > -1), you should increase the price: only a little demand will be lost and your revenue will go up.
The above condition holds if you are optimizing only revenue, which is the case if your marginal cost is negligible (e.g. you sell electronic media). More often, you have substantial marginal costs and want to optimize your gross margin (price minus marginal cost). Then the constant -1 in the rule above is replaced by -(Price)/(Gross Margin), i.e. by minus the inverse Lerner index. For example, suppose your marginal cost of oranges is $4 and you are selling them for $5; the threshold becomes -(5)/(1) = -5. You should increase prices when demand is less elastic than that (e > -5), and decrease them otherwise. (This does not account for fixed costs; you may well face a more complex optimization problem if your marginal costs are already burdened by a share of fixed cost.)
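To make the rule concrete, here is a minimal Python sketch of the decision and the implied optimal margin (an illustration of the arithmetic above, not code from the MPE; it assumes marginal cost is strictly below price):

```python
def pricing_signal(elasticity, price, marginal_cost):
    """Suggest a price direction from the self-price elasticity.

    Uses the rule above: raise the price when elasticity > -price/gross_margin,
    lower it when elasticity is below that threshold. Also returns the optimal
    gross-margin fraction implied by the Lerner condition, -1/elasticity.
    """
    gross_margin = price - marginal_cost          # per-unit gross margin (> 0 assumed)
    threshold = -price / gross_margin             # reduces to -1 when marginal cost is 0
    optimal_margin_fraction = -1.0 / elasticity   # e.g. 20% when e = -5
    if elasticity > threshold:
        direction = "increase price"
    elif elasticity < threshold:
        direction = "decrease price"
    else:
        direction = "hold price"
    return direction, optimal_margin_fraction

# The orange example from the text: cost $4, price $5, threshold -5.
print(pricing_signal(elasticity=-4.0, price=5.0, marginal_cost=4.0))
# ('increase price', 0.25)
```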
In addition to self-price elasticity, we recognize cross-price elasticity of demand, where the demand for a product changes in response to the price of another product. Substitute products have positive cross-elasticity: if the price of Tropicana orange juice increases, sales of Minute Maid orange juice increase as well, as customers substitute toward Minute Maid. Complementary products have negative cross-elasticities: lowering the price of peanut butter may increase sales of jam even if the price of jam stays constant.
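As an illustrative sketch of how self- and cross-elasticities combine (the brands match the example, but the numbers are made up, not MPE output):

```python
import numpy as np

# Illustrative elasticity matrix: rows = demand for a product,
# columns = the product whose price changes. Diagonal entries are
# self-price elasticities, off-diagonal entries are cross-elasticities.
products = ["tropicana", "minute_maid", "value_brand"]
elasticity = np.array([
    [-5.0,  1.2,  0.2],   # demand for Tropicana
    [ 1.5, -4.8,  0.3],   # demand for Minute Maid
    [ 0.1,  0.2, -3.5],   # demand for the value brand
])

# A 10% price cut on Minute Maid, all other prices unchanged.
price_change_pct = np.array([0.0, -0.10, 0.0])

# First-order approximation: %ΔQ_i ≈ sum_j e_ij * %ΔP_j
demand_change_pct = elasticity @ price_change_pct
for name, change in zip(products, demand_change_pct):
    print(f"{name}: {change:+.1%} demand")
# Minute Maid demand rises (+48%), Tropicana falls (-12%), the value brand barely moves.
```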
There should be exactly one row of data per time period for each Item, Site (and Channel, if provided) combination. It is best to have at least two years' worth of data so that seasonal effects can be captured. You prepare your data into this format in Excel. For example, if your data covers weekly sales of various brands of orange juice, it would look like this:
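In pandas terms, a minimal sketch of that shape looks like the following (the column names are illustrative, not necessarily the exact headers the MPE expects):

```python
import pandas as pd

# One row per week per Item/Site (and Channel) combination.
oj_sales = pd.DataFrame({
    "week_start": ["2016-01-04", "2016-01-04", "2016-01-11", "2016-01-11"],
    "item":       ["tropicana",  "minute_maid", "tropicana", "minute_maid"],
    "site":       [101,          101,           101,         101],
    "channel":    ["in_store",   "in_store",    "in_store",  "in_store"],
    "price":      [3.19,         2.79,          2.99,        2.79],
    "quantity":   [412,          538,           476,         521],
})
print(oj_sales)
```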
While we used the Orange Juice dataset, you can use your own data!
The Elasticity service returns one elasticity per Location (store number) and Product (three brands of OJ); note that it refers to the same dataset, "OJ_Demo". We see that the elasticity estimates cluster around -5, which is fairly typical for competitive consumer products and corresponds to optimal gross margins of approximately 1/5 = 20%. The pricing analyst concludes that any products on which they are earning less than a 20% gross margin are probably underpriced, and vice versa.
Interpreting Cross-Elasticities
On the next tab, we consume cross-price elasticities from the PE_CrossPrice service.
Here, the analysis shows that the products are indeed mostly substitutes; some know this effect as "cannibalization". Note that the visualization is not a standard Excel chart but a python heat map generated by the engine in the cloud; you receive a link to the visual.
We see that the predicted effect is an increase in demand for Minute Maid, with customers substituting primarily from the other premium brand, Tropicana, while the value brand is mostly unaffected.
The MPE currently offers these services, each corresponding to a tab in the Excel "app":
The first (Build) stage of the python code is compute-intensive and takes a few seconds to a few
minutes to process, depending on the amount of data. This stage leaves behind .npy (numpy) files in
Azure Blob Storage which contain the estimated elasticities, the baseline forecasts, and the
intermediate features for each item, site and channel.
The second-stage services query the data left in storage. The files are stored as blobs, and each service first loads the .npy file it needs. It then answers the question through a lookup in the data structures contained in the file and/or a simple calculation from the elasticity model. This method currently yields request-response service (RRS) latency that is acceptable for interactive use (under 2 seconds). As the data grows, we are preparing to output the intermediate data structures into an Azure SQL database, where they will be indexed for constant-time access. Having the data in a database will also allow for easier joining and detailed analysis that goes beyond the pre-implemented python services.
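As a rough sketch of such a lookup service (the container and blob names are made up, the snippet uses the current azure-storage-blob SDK, and the MPE's actual wrapper code and storage layout may differ):

```python
from io import BytesIO

import numpy as np
from azure.storage.blob import BlobServiceClient

def load_build_output(conn_str, container="mpe-output", blob_name="elasticities.npy"):
    """Download a .npy file left behind by the Build stage and load it."""
    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container=container, blob=blob_name)
    payload = blob.download_blob().readall()
    # Assumes the Build stage saved a dict with np.save; .item() unwraps it.
    return np.load(BytesIO(payload), allow_pickle=True).item()

def get_elasticity(elasticity_table, item, site, channel=None):
    """Answer a query with a simple lookup, as the second-stage services do."""
    return elasticity_table[(item, site, channel)]
```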
The services themselves are created by deploying a web service from AzureML Studio. We created wrapper code around the "legacy python" so that only very short scripts need to be deployed into AzureML Execute Python modules in order to create the web services. Most of the code is contained in the PricingEngine python package. The package is included in the script bundle PricingEngine_Python_Dependencies.zip, together with its dependencies (mostly the Azure SDK, since sklearn and scipy are already available in the AzureML Python container). The figure below shows the AzureML experiment for GetPromotionEffect; all services are similarly architected.
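For illustration, such a wrapper script might look like the sketch below, which uses the azureml_main entry point that the Execute Python Script module expects (the promotion_effect function name is hypothetical; the actual PricingEngine API may differ):

```python
import pandas as pd

# The PricingEngine package ships in the script bundle
# PricingEngine_Python_Dependencies.zip; the function name below is illustrative.
from PricingEngine import promotion_effect

def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point required by the Execute Python Script module.

    dataframe1 carries the web service request (item/site/channel filters);
    the returned tuple of DataFrames becomes the web service response.
    """
    result = promotion_effect(dataframe1)   # delegate to the package
    return (pd.DataFrame(result),)
```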
Deployment
The solution is deployed into your Azure subscription through Cortana Intelligence Quick Start (CIQS), currently in private testing. CIQS is a user-friendly wrapper around an Azure Resource Manager template. Please contact us if you would like to try it.
The data science: elasticity model
Core methodology
The core of the engine is an implementation of the idea of "double-residual effect estimation" applied to price elasticity. The price elasticity of an item is simply the derivative of demand with respect to price (in percentage terms), other things being equal. Since other things are never equal, we need counterfactuals for both demand and price. Today the engine supplies those as predictions from a carefully elastic-net-regularized linear model, with regularization parameters picked automatically by cross-validation. The elasticity then comes from regressing the demand residual on the price residual.
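A minimal sketch of the double-residual idea in scikit-learn terms (a simplification of what the engine does; the full feature set and model stacking are described below):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LinearRegression

def estimate_elasticity(X, log_price, log_demand):
    """Double-residual elasticity estimate.

    X: numpy confounder matrix (seasonality, lags, category indicators, ...).
    Fits counterfactual models for log demand and log price, then regresses
    the demand residual on the price residual; the slope is the elasticity.
    """
    demand_model = ElasticNetCV(cv=5).fit(X, log_demand)
    price_model = ElasticNetCV(cv=5).fit(X, log_price)

    demand_resid = log_demand - demand_model.predict(X)
    price_resid = log_price - price_model.predict(X)

    final = LinearRegression().fit(price_resid.reshape(-1, 1), demand_resid)
    return final.coef_[0]   # price elasticity of demand
```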
The counterfactual models for demand and price include confounders such as:
Seasonal indicators
The price itself, both in absolute terms and relative to other products in the same category
Lagged changes in price and demand
The recent trend in the product's category prices and demands
Indicator variables for the product category, site, and channel
The recent demand and pricing trends at the site and in the channel
Various interactions among these variables, especially between price change and site, channel, and category
To control the dimensionality of the model when we have large product hierarchies with many categories up to 5 levels deep, we use a model stacking approach. We first estimate a model M1 with only the top-level category indicators enabled. We capture the residuals of M1 and, for each level-1 category, fit a model of those residuals, using only transactions on items from that level-1 category and adding level-2 category indicators as predictors. This process continues recursively, in effect creating a tree of models. Heavy elastic net regularization occurs at each step. Since the hierarchy is up to 5 levels deep, each demand prediction involves running 5 models along the branch of the category tree.
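A rough sketch of that stacking recursion (simplified; the column names and the way category indicators enter the feature matrix are illustrative):

```python
from sklearn.linear_model import ElasticNetCV

def fit_stacked_models(df, feature_cols, target_col, levels, depth=0, max_depth=5):
    """Fit one elastic-net model per node of the category tree.

    df:           training rows for this node (all data at the root)
    feature_cols: base features; category indicators for the current level are
                  assumed to already be one-hot columns in df
    levels:       category columns, e.g. ["cat_l1", "cat_l2", ..., "cat_l5"]
    Returns a nested dict: {"model": ..., "children": {category_value: subtree}}
    """
    model = ElasticNetCV(cv=5).fit(df[feature_cols], df[target_col])
    node = {"model": model, "children": {}}

    if depth < max_depth and depth < len(levels):
        # Residuals of this model become the target for the next level down.
        residual = df[target_col] - model.predict(df[feature_cols])
        df = df.assign(_residual=residual)
        for category, rows in df.groupby(levels[depth]):
            node["children"][category] = fit_stacked_models(
                rows, feature_cols, "_residual", levels, depth + 1, max_depth
            )
    return node
```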
There is a lot more going on. The engine also estimates pull-forward effects (demand shifting into the promotion period from subsequent ones). We'll blog more about that in a detailed future post.
Even when the price history is on hand, it can be surprisingly complicated to answer the seemingly simple question of what "the price" of an item is, especially in a B2B context. In a B2C context, most customers face one price (but consider coupons and co-branded credit cards!). In B2B, the same item can and does sell at many different prices. The following chart illustrates the sales of one item at one site over time. The x axis is days, the y axis is the volume, and the colors represent different price points.
The prices differ because of channels (delivery vs. pick-up, internet vs. in-person), preexisting pricing agreements, and so on. Price variation opens us up to the treacherous modeling issue of "composition of demand": when multiple customer segments pay different prices, the average price is a function of the proportion of customers in each segment and can change even if no individual customer faces a different price. The average sales price is therefore a very poor covariate to use for modeling!
With guidance from the customer, we solved both the problem of the historical record and the problem of price variation by taking "the price" to be the weekly customer-modal price (the price at which the most customers bought). We exclude from the modeling items averaging fewer than 3 sales a week.
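A sketch of that preparation step in pandas (column names are illustrative and the MPE's own preprocessing may differ in detail):

```python
import pandas as pd

def weekly_modal_prices(transactions: pd.DataFrame, min_weekly_sales: int = 3) -> pd.DataFrame:
    """Collapse transaction-level data to one modal price per item/site/week.

    transactions: one row per sale, with columns
        ["week", "item", "site", "price", "customer_id"].
    The modal price is the price at which the most customers bought.
    Items averaging fewer than `min_weekly_sales` sales per week are dropped.
    """
    # Price paid by the largest number of customers in each item/site/week.
    modal = (
        transactions
        .groupby(["item", "site", "week", "price"])["customer_id"]
        .nunique()
        .reset_index(name="n_customers")
        .sort_values("n_customers", ascending=False)
        .drop_duplicates(["item", "site", "week"])
        .rename(columns={"price": "modal_price"})
    )

    # Average weekly sales per item; drop slow movers.
    weekly_sales = transactions.groupby(["item", "week"]).size().groupby(level="item").mean()
    keep = weekly_sales[weekly_sales >= min_weekly_sales].index
    return modal[modal["item"].isin(keep)].drop(columns="n_customers")
```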
Conclusions
The engine combines the strengths of machine learning, which is good at prediction, with those of econometrics, which is good at preparing training data to mitigate biases and at producing causally interpretable models. As a cloud application, it is easy to deploy to your subscription and try out.