
Foundations and Trends® in Marketing

Machine Learning in Marketing


Overview, Learning Strategies, Applications,
and Future Developments

Suggested Citation: Vinicius Andrade Brei (2020), “Machine Learning in Marketing”,
Foundations and Trends® in Marketing: Vol. 14, No. 3, pp 173–236. DOI:
10.1561/1700000065.

Vinicius Andrade Brei


Universidade Federal do Rio Grande do Sul (UFRGS)
Brazil
[email protected]

This article may be used only for the purpose of research, teaching,
and/or private study. Commercial use or systematic downloading
(by robots or other automatic processes) is prohibited without
explicit Publisher approval.
Boston — Delft
Contents

1 Introduction

2 Overview of Machine Learning
2.1 Types of Machine Learning and the Most Relevant Algorithms
2.2 The Relevance of Machine Learning for Marketing

3 The Machine Learning Workflow

4 How to Learn Machine Learning

5 Analysis of Machine Learning Applications in Marketing
5.1 Choice Modeling
5.2 Consumer Behavior
5.3 Internet/Digital Marketing and Recommender Systems
5.4 Marketing Strategy
5.5 Relationship Marketing
5.6 Methodological Developments of Machine Learning
5.7 Combination of Theory-Driven Frameworks with Machine Learning Methods
5.8 ML Methods for Causal Effects and Policy Evaluation

6 Trends and Future Developments of Machine Learning in Marketing
6.1 Automated Machine Learning
6.2 Data Privacy and Security
6.3 Model Interpretability
6.4 Algorithm Fairness
6.5 Computer Vision
6.6 Bayesian Machine Learning

7 Conclusions

Acknowledgements

References
Machine Learning in Marketing
Vinicius Andrade Brei
Universidade Federal do Rio Grande do Sul (UFRGS), Brazil;
[email protected]

ABSTRACT
The widespread impacts of artificial intelligence (AI) and
machine learning (ML) in many segments of society have not
yet been felt strongly in the marketing field. Despite this
shortfall, ML offers a variety of potential benefits, including
the opportunity to apply more robust methods for the
generalization of scientific discoveries. To help reduce this
shortfall, this monograph has four goals. First, to provide
marketing with an overview of ML, including a review of
its major types (supervised, unsupervised, and reinforce-
ment learning) and algorithms, relevance to marketing, and
general workflow. Second, to analyze two potential learning
strategies for marketing researchers to learn ML: the bottom-
up (which requires a strong background in general math and
calculus, statistics, and programming languages) and the
top-down (focused on the implementation of ML algorithms
to improve explanations and/or predictions given within the
domain of the researcher’s knowledge). The third goal is to
analyze the ML applications published in top-tier marketing
and management journals, books, book chapters, as well
as recent working papers on a few promising marketing re-
search sub-fields. Finally, the last goal of the monograph is to
discuss possible impacts of trends and future developments
of ML to the field of marketing.

1 Introduction

The widespread impacts of artificial intelligence (AI) and machine


learning (ML) in all segments of society have driven researchers to term
the present day as the “AI Revolution” (Makridakis, 2017). This AI
revolution has sparked multidisciplinary research. In the business world,
such processes have been impactful as a significant source of innovation
(Huang and Rust, 2018). Despite their relevance, for many marketing
researchers and practitioners, terms such as artificial intelligence and
machine learning may seem akin to terms of a foreign language (Conick,
2017). This monograph attempts to change this scenario by discussing
the central role that AI and, more specifically, ML can play as a research
method in the marketing field.
Why should ML be applied to marketing? There are many possible
answers to this question rooted both in academic and applied prac-
tices of the discipline. For practitioners, for example, ML is disrupting
many industries with new business models, products, and services. In
academia, the impact appears to be equally substantial. For example, the
lack of generalization of scientific discoveries is at the center of the
so-called “replication crisis,” which has affected many of the life and
social sciences, including the fields of management and marketing. This


crisis has occurred because researchers have found that many of the
most important scientific studies are difficult or impossible to replicate
or reproduce (see, for example, Camerer et al., 2018). As this monograph
will discuss, the fundamental goal of machine learning is to generalize
beyond the examples provided by the training data (Domingos,
2012). Thus, one of the potential contributions of
ML to marketing (and to management in general) lies in its robustness
for the generation, testing, and generalization of scientific discoveries.
With these different academic and practical perspectives in mind, the
goal of this monograph is to provide marketing with an overview of ML
and to analyze required learning, applications, and future developments
involved in applying ML to marketing.
This monograph progresses as follows. The following section pro-
vides an overview of ML, including a review of its most relevant types,
algorithms, and relevance to marketing. The following section presents
a typical ML workflow, followed by a section that proposes two dif-
ferent learning strategies that can be used by management/marketing
researchers interested in ML. That section is followed by a descriptive
analysis of applications of ML published in top-tier marketing and
management journals, books, book chapters, and recent working papers
that explore a few of the most promising marketing research sub-fields.
The following section discusses how trends and future developments
of ML can impact the field of marketing. The last section summarizes
the monograph’s contributions, limitations, and suggestions for future
research.
2 Overview of Machine Learning

It is not uncommon to find references utilizing AI, ML, and deep


learning (DL) as synonyms (Garbade, 2018). Although all three are
closely related, they are not synonyms. AI is a system’s ability to
correctly interpret external data, to learn from such data, and to use the
learnings to achieve specific goals and tasks through flexible adaptation
(Kaplan and Haenlein, 2019). ML is a subset of AI that addresses ways
to build computers that improve automatically through experience
(i.e., data) (Jordan and Mitchell, 2015). ML refers to the study of
methods or algorithms designed to learn the underlying patterns in
the data and make predictions based on these patterns (Dzyabura
and Yoganarasimhan, 2018). DL is an approach to ML that involves
gathering knowledge from data, thus forming a hierarchy of concepts
that enables a computer to learn complex concepts from simpler ones.
Graphically, these concepts are layered on top of one another, and the
developed graph is deep with many layers. Therefore, this ML approach
is referred to as deep learning (Goodfellow et al., 2016). Figure 2.1
illustrates the relationships between the three concepts.
ML is implemented through the use of algorithms, which are unam-
biguous specifications of ways to solve a class of problems by learning


Figure 2.1: Relationships between artificial intelligence, machine learning, and deep
learning.

from data. An ML algorithm learns from experience E with respect to


some class of tasks T and performance measure P if its performance on
task T as measured by P improves with experience E (Mitchell, 1997).
There are three standard features of a learning algorithm: repre-
sentation, evaluation, and optimization. Representation involves the
presentation of a sequence of tasks in a way a computer can manage.
Evaluation (also referred to as the objective function) consists in dis-
tinguishing between good and bad actions/consequences. Optimization
involves searching for and selecting the best solutions (Domingos, 2012).

2.1 Types of Machine Learning and the Most Relevant Algorithms

The literature broadly divides ML into three major types: supervised,


unsupervised, and reinforcement learning (Kaplan and Haenlein, 2019).
Some authors also divide ML into branches, such as self-supervised
(Chollet and Allaire, 2018), representational (Goodfellow et al., 2016),
and evolutionary learning (Marsland, 2015). However, the former
classification is much more widely used than the latter ones.
Supervised (or predictive) learning is the most commonly used form
of ML (Jordan and Mitchell, 2015) and usually involves classification
and regression. Its purpose is to map from inputs x to outputs y based
on a labeled set of input-output pairs called the training set, which
is often annotated by humans. The algorithm generalizes to respond
correctly to all possible inputs. Inputs x can be as simple as a vector

of numbers but can also include more complex and structured objects
such as images, sentences, texts, time series, etc. When an output (or
a response variable or target) is categorical or nominal, the problem
refers to classification or pattern recognition. When y is a real value,
the problem is known as a regression (Murphy, 2012). In supervised
learning, a comparative metric is used to evaluate how accurate the
prediction of y is, given x. Most ML algorithms (including DL) are used
for supervised learning. Examples of supervised learning algorithms
frequently used in marketing include linear and logistic regressions,
random forest, and support vector machines. Almost all applications
of DL also belong to this branch of ML, including those of object/face
detection, image segmentation/classification, speech recognition, and
language translation (Chollet and Allaire, 2018).
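As a minimal sketch of supervised learning (the numbers below are made up), the following fits a regression on labeled input–output pairs and then predicts the output for an input not seen during training:

```python
import numpy as np

# Toy labeled training set (made-up numbers): weekly ad spend (input x)
# and sales (output y). Because y is real-valued, this is a regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# "Learn" the mapping y ≈ a*x + b by ordinary least squares.
a, b = np.polyfit(x, y, deg=1)

# Generalization: predict the output for an input not in the training set.
y_new = a * 6.0 + b
print(round(float(y_new), 2))  # close to 11.9 for these data
```

Classification follows the same logic, except that y would hold categorical labels and the comparative metric would be, for example, accuracy rather than squared error.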
The unsupervised (or descriptive) learning goal is to find ‘interesting
patterns’ in the data without the use of any targets (or outputs).
Rather, no responses (or labeled data) are provided. For this approach,
the problem is much less specifically defined, as one does not know
which patterns to search for, and there is no straightforward metric
for determining the success of a learning task (Murphy, 2012). The
algorithm is designed to identify similarities between inputs so that
inputs with commonalities are categorized together (Marsland, 2015).
It is not uncommon to use unsupervised learning to understand data
patterns before attempting to solve a supervised learning problem. Some
of the most widely used unsupervised learning algorithms in the field
of marketing include principal components analysis (for dimensionality
reduction or the discovery of latent factors) and k-means clustering (for
segmentation).
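A minimal sketch of unsupervised learning, using a toy k-means on hypothetical customer spend data: no labels are provided, and the algorithm simply groups similar inputs together:

```python
import numpy as np

# Toy unlabeled data (made-up): customers' annual spend, in $1,000s.
spend = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])

# Minimal k-means (k = 2): alternate between assigning each point to its
# nearest centroid and recomputing each centroid as its cluster's mean.
centroids = np.array([0.0, 10.0])  # crude initial guesses
for _ in range(10):
    labels = np.abs(spend[:, None] - centroids[None, :]).argmin(axis=1)
    centroids = np.array([spend[labels == k].mean() for k in range(2)])

print(np.round(np.sort(centroids), 2))  # two discovered segment centers
```

The two recovered centers correspond to a low-spend and a high-spend segment, even though no segment labels were ever supplied.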
Reinforcement learning (RL) involves mapping situations to actions
to maximize a numerical reward signal. The learner algorithm is not told
which actions to take. Instead, it must identify which actions maximize
rewards through trial and error. The basic premise is to capture the
most critical aspects of the problem faced by the learning agent, which
interacts with its environment over time to achieve a goal. This learning
agent must be able to sense the state of its environment and to engage
in actions that affect the state. RL as a system applies four key elements:
a policy (the agent’s way of behaving), a reward signal (the goal of a

reinforcement problem), a value function (what is effective over the


long-term), and, optionally, a model of the environment. RL differs
from supervised learning in that there is no predefined correct answer
(or label). It also differs from unsupervised learning because it does
not involve uncovering or finding a hidden structure, as reinforcement
learning attempts to maximize a reward signal by trading-off exploration
over exploitation.
Thus, to obtain numerous rewards, the agent must prefer actions
that it has attempted in the past and that have been found to be
effective in producing rewards. However, to discover such actions, the
agent must try actions that it has not selected before. In other words,
the agent must exploit what it has already experienced to obtain rewards,
but it must also explore to make better action selections in the future. In
short, the agent must try a variety of actions and progressively favor
those that appear to be the best (Sutton and Barto, 2018).
most relevant RL algorithms include the following: Q-Learning, State-
Action-Reward-State-Action (SARSA), and Deep Q Network (DQN).
RL is likely to be the form of ML that most closely approximates AI.
Some applications of RL include robots, games, and self-driving cars.
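The exploration/exploitation trade-off can be sketched with a minimal two-armed bandit agent; the payout probabilities below are hypothetical, and the agent must estimate action values from trial and error:

```python
import random

# A two-armed bandit sketch of exploration vs. exploitation. The payout
# probabilities are made up; the agent does not know them and must
# estimate each action's value (q) from experience.
random.seed(0)
true_reward = [0.3, 0.7]   # arm 1 is better, but the agent must discover this
q = [0.0, 0.0]             # estimated value of each action
counts = [0, 0]
epsilon = 0.1              # probability of exploring a random arm

for _ in range(5000):
    # Epsilon-greedy policy: usually exploit the best estimate, sometimes explore.
    if random.random() < epsilon:
        arm = random.randrange(2)
    else:
        arm = 0 if q[0] >= q[1] else 1
    reward = 1.0 if random.random() < true_reward[arm] else 0.0
    counts[arm] += 1
    q[arm] += (reward - q[arm]) / counts[arm]  # incremental mean update

best_arm = 0 if q[0] >= q[1] else 1
print(best_arm)  # the agent ends up favoring the better arm
```

Full RL algorithms such as Q-Learning generalize this value-updating idea to settings with many states, where actions also change the environment.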

2.2 The Relevance of Machine Learning for Marketing

ML offers several advantages and new perspectives for knowledge
generation in marketing. It can be applied to solve regression and
classification problems and to perform clustering, visualization,
dimensionality reduction, association-rule mining, and the development
of reinforcement-learning agents, among other uses. Despite these
manifold applications, ML methods are usually compared to the
traditional econometric methods commonly applied in marketing.
Dzyabura and Yoganarasimhan (2018) highlight some of the key
differences between the two approaches. First, ML methods
are focused on obtaining the best out-of-sample predictions, whereas
causal econometric methods are aimed at deriving the best unbiased
estimators. Thus, techniques for causal inference often do not perform
well when making out-of-sample predictions. This happens because the
best unbiased estimator does not always provide the best out-of-sample

predictions. In the ML literature, this problem is known as the bias-


variance tradeoff. Second, in contrast to econometric methods, many
ML approaches were developed in a situation where one does not have
a priori theory about the process through which outcomes observed in
the data were generated. Third, unlike several empirical methods used
in marketing, ML methods can handle an extremely large number of
variables and select which variables should be retained and which should
be dropped from the analysis. Last, ML can be a strong approach to
deal with scalability in marketing. Many ML approaches apply feature
selection and optimization to achieve scale and efficiency, an increasingly
important goal for many different real-time marketing problems.
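The bias-variance tradeoff can be illustrated on simulated data: a highly flexible model achieves a near-perfect in-sample fit yet typically predicts held-out data worse than a simpler one:

```python
import numpy as np

# Simulated data: the true process is y = 2x + noise. A degree-9 polynomial
# (flexible, low bias, high variance) is compared with a straight line
# (simple, higher bias, low variance) on training vs. held-out samples.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.3, 10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(0, 0.3, 10)

def mse(degree):
    """In-sample and out-of-sample mean squared error of a polynomial fit."""
    coefs = np.polyfit(x_train, y_train, degree)
    in_sample = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    out_sample = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return in_sample, out_sample

# The flexible model always wins in-sample; out-of-sample it typically loses.
print("line:", mse(1), "degree-9:", mse(9))
```

This is the sense in which the best in-sample fit (or the best unbiased estimator) need not deliver the best out-of-sample predictions.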
Most of the ML applications in marketing adopt either a descrip-
tive/exploratory or a predictive perspective. The predictive modeling
perspective, based on the intersection of ML and causal modeling,
offers a more powerful basis for the development and testing of marketing
theories than the descriptive/exploratory one does (see Shmueli, 2010,
for a review on explanation and prediction).
3 The Machine Learning Workflow

Despite the many approaches, algorithms, and programming languages


used, it is possible to identify a typical ML workflow. Chollet and Allaire
(2018) specify the following stages (Figure 3.1) involved in solving an
ML problem:

(1) Problem definition and dataset assembly. In this stage, the re-
searcher should determine which type of problem she is facing
(e.g., binary or multiclass classification, regression, clustering, gen-
eration, or reinforcement learning). To solve the problem, the
researcher should identify inputs and outputs and required data.
Depending on the task at hand, the researcher may also wish to
identify hypotheses to be tested later.

(2) Selection of success metrics. The researcher must select which


measure of success is appropriate for her problem (Zinkevich,
2019). For example, accuracy is a traditional measure used in clas-
sification problems, while for ranking or multilabel classification,
one can use mean average precision. The metric for success will
guide the selection of a loss function that the model will optimize.
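For instance, accuracy for a binary classifier is simply the share of predictions that match the labels (the labels and predictions below are made up):

```python
import numpy as np

# Made-up labels and predictions for a hypothetical binary classifier.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# Accuracy: the share of predictions that match the true labels.
accuracy = (y_true == y_pred).mean()
print(accuracy)  # 6 of 8 correct -> 0.75
```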


Figure 3.1: Typical machine learning workflow.

(3) Evaluation protocol. Once the researcher knows what she is aiming
for, the next step is to establish ways to measure current progress.
Three commonly used protocols involve maintaining a hold-out
validation dataset on which the model is not trained,
running K-fold cross-validation, or performing iterated K-fold
validation for highly accurate model evaluation when little data
are available. K-fold cross-validation uses some of the available
data to fit the model and a different proportion to test it (Trevor
et al., 2009).
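A minimal K-fold cross-validation sketch on simulated data, where each fold serves once as the hold-out set; the "model" here is just the training-fold mean, to keep the protocol itself in focus:

```python
import numpy as np

# Simulated outcome data; the "model" is simply the mean of the training
# folds, which keeps the evaluation protocol itself in focus.
rng = np.random.default_rng(42)
y = rng.normal(10, 2, 100)

folds = np.array_split(np.arange(100), 5)  # K = 5 folds of 20 indices each
scores = []
for k in range(5):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    prediction = y[train_idx].mean()               # "fit" on the training folds
    scores.append(np.mean((y[test_idx] - prediction) ** 2))  # score on hold-out

print(round(float(np.mean(scores)), 2))  # cross-validated MSE estimate
```

Averaging the per-fold scores gives an estimate of out-of-sample error that uses every observation for both fitting and testing, just never at the same time.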

(4) Data preparation. Once the researcher knows what the model
is being trained for, what should be optimized, and ways to
evaluate the chosen approach, she may start to train the models.
However, it is recommended to first format the data so that they
can be fed into an ML model. Different models have different
requirements. For example, deep neural network data should be
formatted as tensors, while decision-tree models do not typically
require previous rescaling or feature engineering.
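A common formatting step is standardizing each feature to zero mean and unit variance, so that inputs on very different scales (here, hypothetical age and income columns) become comparable to the model:

```python
import numpy as np

# Hypothetical feature matrix: one row per customer, columns = age, income.
# The raw scales differ by three orders of magnitude.
X = np.array([[25.0, 40_000.0],
              [35.0, 60_000.0],
              [45.0, 80_000.0]])

# Standardize each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.round(3))  # both columns now live on the same scale
```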

(5) Developing a model that performs better than a baseline level. The
goal of this stage is to achieve statistical power by crafting a small
model capable of surmounting an initial “dumb” baseline. For
example, a binary classification model with balanced classes should
achieve an accuracy greater than 0.5, the expected value of a random
guess. When it
is not possible to beat the “dumb” baseline, this may be a signal
that outputs cannot be predicted from the input data or that it is
not possible to test the hypothesis with the available data. When
the model passes an initial test, the researcher should choose a loss
or cost function as a measure of how well the model performs in
predicting the expected outcomes. At this point, one must define
the optimizer (or optimization algorithm), which ties together the
loss function and model parameters by updating the model in
response to the output of the loss function.
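The "dumb" baseline check can be sketched as follows: on simulated balanced labels, even a simple model should beat always predicting the majority class:

```python
import numpy as np

# Simulated balanced binary labels plus one noisy but informative feature.
rng = np.random.default_rng(1)
y = (rng.random(1000) < 0.5).astype(int)
signal = y + rng.normal(0, 0.8, 1000)

model_pred = (signal > 0.5).astype(int)      # a crude threshold classifier
baseline_acc = max(y.mean(), 1 - y.mean())   # always guess the majority class
model_acc = (model_pred == y).mean()

print(model_acc > baseline_acc)  # the model should beat the "dumb" baseline
```

If the model could not beat the baseline, that would suggest the feature carries no usable signal for the outcome.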
(6) Scaling up: developing a model that overfits. Once the researcher
obtains a model that has statistical power, the next step is to de-
termine whether the model generalizes well. The universal tension
in ML lies between optimization and generalization. That is, the
ideal model is that which straddles the line between underfitting
(when the model cannot capture the underlying trend observed in
the data) and overfitting (when the model captures noise in the
data). The model must also balance undercapacity with overcapac-
ity, i.e., its ability to fit a wide variety of functions. To determine
how large a model should be, the researcher should develop a
model that overfits and then adjust its architecture while monitoring
training and validation/test losses. Once the validation/test loss stops
improving and begins to rise, this serves as evidence that the model has
begun to overfit. At this point, one must start regularizing
and tuning the model so that it is as similar as possible to the
ideal model.
(7) Model regularization and tuning. In this stage, the researcher
modifies, trains, and re-evaluates the model until it is optimized.
Regularization involves constraining the model so that it fits the
given training set only roughly, thereby avoiding overfitting (Trevor
et al., 2009). A set of tuning (hyper)parameters governs the model’s

adaptability, allowing each model to discover predictive patterns


and structures within the data (Kuhn and Johnson, 2013). Some of
the most recent developments in ML, those of automated machine
learning (AutoML), are helping to reduce the amount of time
and effort often required to regularize and tune a model. Once
the researcher has achieved an acceptable model configuration,
it can be tested and used. That is, to check its performance in
the real world, the model must perform well on unseen data. The
maintenance of a model in production involves another set of steps
and precautions (see Zinkevich, 2019, for a summary of best
practices).
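Stages 6 and 7 can be sketched with a toy tuning loop: a closed-form ridge regression on simulated data, where the regularization penalty alpha is chosen by validation error:

```python
import numpy as np

# Simulated data: y depends on only 2 of 10 features, plus noise.
rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, 60)
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^(-1) X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# Tuning loop: pick the penalty with the lowest validation error.
best_alpha, best_err = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    w = ridge_fit(X_tr, y_tr, alpha)
    err = float(np.mean((X_val @ w - y_val) ** 2))
    if err < best_err:
        best_alpha, best_err = alpha, err

print(best_alpha, round(best_err, 2))
```

A very large penalty shrinks the coefficients too far (underfitting), while a very small one leaves the model free to chase noise; the validation set arbitrates between the two.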

Of course, a considerable amount of knowledge is applied in imple-


menting the ML workflow. The following section analyzes two poten-
tial approaches that management/marketing researchers may use to
learn ML.
4 How to Learn Machine Learning

ML involves the aggregation of knowledge from different fields and


especially from the fields of mathematics, statistics, and computer sci-
ence. The most important features of mathematics used for ML include
linear algebra, analytic geometry, matrix operations, vector calculus,
and optimization. From statistics, probability and distribution theory
are primarily relevant. From computer science, programming
languages such as Python (developed by a computer scientist) and R
(developed by statisticians) are employed, as are approaches to dataflows
and parallel computing.
Given its complexity, it is apparent that there is no single approach
to learning ML. However, two main methods may be used: bottom-up
and top-down (Brownlee, 2016). The bottom-up approach is the most
traditional and robust, where the student learning process starts with
theories of math/calculus, statistics, and computer science. It provides a
solid foundation for those interested in mastering the knowledge required
to become a researcher of ML. For some of these researchers, applications
of ML may be of secondary importance relative to innovation and
efficiency or to the algorithms they will create or develop. Computer
Science students usually follow this bottom-up learning approach


that often requires several years of training before the first results start
to emerge.
The bottom-up approach usually starts with the mastering of general
math and calculus, statistics, and programming languages. This goal
can be achieved by completing traditional academic courses or by using
online learning platforms. KhanAcademy.org and specialized websites
focused on the math used for ML, such as Mathematics for Machine
Learning (Deisenroth et al., 2019), are good examples of high-quality
online learning platforms. YouTube channels like Mathematicalmonk
(2019) can also be used to learn about this approach. Understanding the
bottom-up approach also involves the study of classical (e.g., Blitzstein
and Hwang, 2014) and Bayesian probability (e.g., Hoff, 2009), as well as
ML books that use more formal mathematical and probabilistic notation,
such as Bishop (2016), Trevor et al. (2009), Murphy (2012), and
Goodfellow et al. (2016).
The bottom-up approach also requires strong programming
language skills. At the time of writing, the most popular ML language
is Python. New scientific discoveries related to ML are published in
specialized conference proceedings (e.g., the International Conference
on Machine Learning (ICML)), on websites (e.g., arXiv.org) and in aca-
demic journals (e.g., IEEE Transactions on Evolutionary Computation
or Foundations and Trends in Machine Learning). As in any academic
field, the closer one gets to achieving specialized communication (e.g.,
articles published in academic journals on ML), the harder it becomes
for outsiders to learn ML using an approach other than the bottom-up.
In pursuing a different goal from that of the bottom-up approach, the
top-down approach is designed to implement ML algorithms to improve
explanations or predictions given within the domain of the researcher’s
knowledge. Such applications can be used in the realms of business,
health, law, communication, art, etc. With such an approach, researchers
are not necessarily striving to improve existing algorithms or to advance
the knowledge frontier of ML. Instead, their goal is to apply ML to
solve specific problems related to their disciplines. Given the applied
nature of the management and marketing disciplines, many students and
researchers of these areas prefer the top-down approach over the bottom-
up approach. However, the choice of the former approach may bring

negative consequences. In many cases, the top-down approach can lead


the (marketing) researcher to incorrect choices and inappropriate use of
the methods due to the lack of a strong background in the foundations of
ML. Because of such perils, it is recommended to learn the foundations
of ML (i.e., the bottom-up approach) before starting to use it in academic
or empirical work.
After these considerations about the risks of using ML without a
solid background, if one still chooses to follow the top-down approach,
s/he should know that there is no specific way of learning ML using
this approach. However, some steps may be applied (Brownlee, 2016).
The marketing researcher should start by defining why she would like
to engage in ML. Goals may include a desire to make predictions, to
solve complex marketing problems, to learn new ways of solving old
problems by learning new algorithms, to create software, or to become
a data scientist. Next, it is recommended to follow a systematic process,
such as that reflected in the workflow presented in the previous section.
To facilitate the ML learning process, it is essential to use appropriate
tools and resources. Although many of the popular ML algorithms
are available from proprietary point-and-click software (e.g., SPSS or
MATLAB), the vast majority of ML tools are first developed and
launched in open-source programming languages such as Python or R.
Depending on the application involved, the researcher will need to use
specialized packages, such as Keras for Deep Learning (Chollet, 2017) or
XGBoost for Gradient Boosting (He, 2018). After identifying a process
and tool, it is suggested to practice on datasets through the use of toy
examples or real-world ML problems relevant to her research. Even
when a large dataset for practicing ML is not available, there are many
free data sources, such as Kaggle’s (2019) datasets.
Many free resources are available to those wishing to learn ML.
Table 4.1 summarizes some of the best available.
After the above discussion of learning strategies, the next section
explores the applications of ML in marketing.

Table 4.1: Learning resources for studying machine learning

Programming languages:
Python – An interpreted, high-level, general-purpose programming language.
R – A programming language focused on statistical computing and graphics.

Introductory books:
James et al. (2013) – Introduces statistical learning methods with several examples in R.
Géron (2019) – Provides an understanding of concepts and tools for ML using examples in Python.
Chollet (2017); Chollet and Allaire (2018) – Introduce the field of deep learning using the Keras library (Python and R versions).

Online courses:
Datacamp – Data science website offering video tutorials and coding exercises.
Udemy – Online learning and teaching marketplace.
Coursera – Online learning platform.
Google ML Crash Course – Google's fast-paced and practical introduction to ML.

Websites and blogs:
Kaggle – An online community of data scientists that allows users to find and publish datasets and to explore and build models.
Machine Learning Mastery – Step-by-step guides for applied ML.
Fast.ai – Website rendering deep learning as accessible as possible.
Stackoverflow – Question-and-answer website for programmers.

Free datasets:
Kaggle – Datasets from Kaggle data science competitions.
GitHub/awesomedata – GitHub repository of public datasets.
UCI Machine Learning Repository – Collection of databases, domain theories, and data generators.
The Dataset Collection – Extensive free data archives, from both sites and individuals.
Apertio.io – Global database and search engine for open government datasets.
Google Dataset Search – Google's search engine for free datasets.

IDEs (integrated development environments):
RStudio – Most widely used free and open-source IDE for R.
Spyder and PyCharm – Open-source Python IDEs optimized for data science workflows.

Popular ML packages/libraries:
For R – lattice, DataExplorer, Dalex, dplyr, Esquisse, caret, janitor, rpart, prophet, plotly, randomForest, e1071, nnet, kernlab, gbm, and RWeka.
For Python – numpy, scipy, scikit-learn, theano, matplotlib, pytorch, pandas, and seaborn.
Both R and Python – keras, tensorflow, xgb, and h2o.

Distribution platform:
Anaconda – Free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.

Reusable code:
Model Depot – Platform of pre-trained ML models.
The Neural Network Zoo – Repository of deep learning models.
TensorFlow Hub – Library of reusable ML modules.

Open-source collaborative platforms:
Jupyter – An open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations, and narrative text.
Colab – Free Google research project created to help disseminate ML education and research in a "Jupyter lab-like" environment.

Podcasts:
This Week in Machine Learning & AI (TWiML&AI) – ML and AI podcast targeted to a broad community of data scientists, engineers, and tech-savvy business and IT leaders.
Data Skeptic – Interviews and discussion of topics related to data science, statistics, ML, and AI.
DataFramed – Discusses the problems that data science attempts to solve rather than which definitions fit best.
Artificial Intelligence: AI Podcast – Conversations about technology, science, and the human condition.
Talk Python to Me – Covers a wide array of Python and AI topics.
5 Analysis of Machine Learning Applications in Marketing

To develop an analysis of the ML applications in marketing, I searched


both for consolidated work (i.e., published articles and books) and
for promising working papers in the recent streams that combine
theory-driven frameworks with ML methods and causal inference. To
build the consolidated corpus of ML applications in the knowledge
base of the marketing field, I searched for published papers in four
top marketing journals (Journal of Marketing, Journal of Marketing
Research, Journal of Consumer Research, and Marketing Science) and
one top management journal, Management Science. Although Management
Science publishes papers from management sub-fields other than
marketing, I included it in the search due to its characteristic
of publishing state-of-the-art quantitative marketing papers. I have
also searched for relevant books and book chapters that analyzed ML
applications in marketing.
I also searched for more recent streams of work about ML appli-
cations that attempt to combine theory-driven frameworks with ML
methods. These applications address large-scale personalization
problems and ML methods for causal effects and policy evaluation. They
are among the most promising research areas due to their strength in

building new scientific knowledge in marketing. Some of the papers in
those streams are still in working paper format and have not been
published yet.
The search period started in 1950 and ended in September 2019.
Two historical facts guided the starting date: first, the origins of the
term Artificial Intelligence trace back to Alan Turing’s theory of
computation (Turing, 1950); second, the field of AI research is considered
to have been initiated at a workshop at Dartmouth College in 1956
(Russell and Norvig, 2016). With these two facts in mind, I established
the year 1950 as the start date for the search for academic marketing
papers. September 2019 was the date of the finalization of the first
version of this paper.
The thematic aggregation of the applications of ML in marketing
papers analyzed in this section is based on a classification of each of
them in a unique subject group by affinity with a primary marketing
subject domain. This aggregation resulted in the following subject cate-
gories: choice modeling; consumer behavior; internet/digital marketing
and recommender systems; marketing strategy; relationship marketing;
methodological developments of ML; the combination of theory-driven
frameworks with ML methods; and ML methods for causal effects and
policy evaluation.
In many cases, however, a paper’s contribution can be classified in
more than one subject category. For example, Chung and Rao’s (2012)
paper can be framed as a choice modeling, digital marketing, or
recommender system contribution. Whenever such ambiguity arose, I
classified the paper under the category to which it contributed most.
The applications of ML in marketing under these subject categories are
presented next.

5.1 Choice Modeling

Many papers have studied different phenomena related to choice models,
such as discrete (Chiong and Shum, 2019; Mishra et al., 2014) and
hierarchical choice models (Currim et al., 1988), choice interdependence
(Wang et al., 2013), endogeneity and heterogeneity (Li and Ansari, 2014),
limits (Jagabathula and Rusmevichientong, 2019), and decision-making
(Armbruster and Delage, 2015; Cavagnaro et al., 2013).
Chiong and Shum (2019) introduced a random projection ML tool
to estimate aggregate discrete choice models with high-dimensional
choice sets. Their tool supports the decision-making process by
facilitating consumer choice modeling when the number of possible
combinations is very large.
Mishra et al. (2014) applied a semiparametric discrete choice
model (also called the marginal distribution model, or MDM) to
incorporate consumer- and product-level heterogeneity in both partworths
and scale parameters in the choice model. The MDM’s efficacy was
demonstrated against the classical multinomial logit, mixed logit, and a
machine learning approach that accounts for partworth heterogeneity.
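For readers who want a concrete point of reference for the multinomial logit benchmarks mentioned above, the following is a minimal predictive sketch on simulated choice data. The simulation and its “partworths” are invented for illustration; this is not the MDM of Mishra et al., only a generic ML-style baseline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate 5,000 choice occasions among 3 alternatives. Each alternative
# has two attributes (say, price and quality); utility is a linear index
# plus Gumbel noise, and the highest-utility alternative is chosen.
beta = np.array([-1.0, 2.0])              # invented "true" partworths
X_alt = rng.normal(size=(5000, 3, 2))     # occasion x alternative x attribute
utility = X_alt @ beta + rng.gumbel(size=(5000, 3))
choice = utility.argmax(axis=1)

# Predictive baseline: a multinomial logistic regression on the flattened
# attribute matrix (not a structural conditional logit, but a common
# machine learning benchmark for choice prediction).
X_flat = X_alt.reshape(5000, -1)
model = LogisticRegression(max_iter=1000).fit(X_flat, choice)
print("in-sample accuracy:", round(model.score(X_flat, choice), 3))
```

Because choices are generated by argmax of noisy utilities, even this simple benchmark recovers most of the predictable variation; structural models like the MDM aim to recover the partworths themselves, not just predictions.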
Currim et al. (1988) proposed a new approach to inferring hierarchi-
cal models based on a classification algorithm that estimated decision
trees at individual levels without requiring prior assumptions about
the tree form. Their approach permitted the study of interactive or
conditional attribute effects at a disaggregate level.
The work of Wang et al. (2013) used a Markov random field (MRF)
model to investigate how individuals’ product choices are influenced
by the product choices of their connected others and how the influence
mechanism may differ for fashion- versus technology-related products.
They showed that experts exert asymmetrically greater influence on a
technology-related product, whereas popular individuals exert a greater
influence on a fashion-related product. Their model provided influence
predictions for a technology-related product.
Li and Ansari (2014) proposed a heterogeneous Bayesian semipara-
metric approach for modeling choice endogeneity. Their approach offers
a flexible and robust alternative to parametric methods. It was based on
a centered Dirichlet process mixtures (CDPM) that modeled consumer
preference heterogeneity nonparametrically. The work provides a robust
alternative to models that rely on the normal distribution for handling
both the endogeneity and heterogeneity errors.
Jagabathula and Rusmevichientong (2019) took into account
the fact that consumers’ preferences are not always rational and de-
veloped a method to quantify the limit of rationality (LoR) in choice
modeling applications. Their introduction of rational separation and
choice graph concepts provided methods to compute LoR efficiently for
applications such as grocery sales data and identification of product
categories for which going beyond rational choice models is necessary
to obtain acceptable performance.
Armbruster and Delage (2015) developed an approach to under-
standing optimal decision making under uncertainty when preference
information is incomplete. Their work is inspired by the AI literature
on utility elicitation using optimization for problems that only involve
a finite, although possibly large, set of outcomes.
Finally, Cavagnaro et al. (2013) proposed a methodological frame-
work for the adaptive selection of optimal stimuli for discriminating
among models of risky choice. Their adaptive design optimization ap-
proach uses active learning to choose the most informative decision
stimuli adaptively.

5.2 Consumer Behavior

A multitude of consumer behavior papers has applied ML to different
stages of the consumer decision process, such as search (Hu et al.,
2019), knowledge acquisition (O’Leary, 1998), idea generation (Toubia
and Netzer, 2017), preference assessment (Chen et al., 2017; De Bruyn
et al., 2008; Dzyabura and Hauser, 2011; Evgeniou et al., 2005, 2007;
Hauser et al., 2010; Huang and Luo, 2016; Liu and Dzyabura, 2016;
Liu and Toubia, 2018; Toubia et al., 2013), consumer sorting tasks
(Blanchard et al., 2017), purchase prediction and decision (De Matos
et al., 2018; Toubia et al., 2019), shopping patterns (Xia et al., 2019),
consumer responses (Puranam et al., 2017; Xiao and Ding, 2014), post-
consumption evaluations (Dzyabura et al., 2019), and idea screening
(Toubia and Florès, 2007).
The initial stages of the consumer decision-making process have
been the object of study of different authors. Hu et al. (2019) studied
consumers’ search on daily deal websites using individual clickstream
data. They proposed a dynamic model of search and Dirichlet learning
that can replicate consumer search patterns. O’Leary (1998) investigated
the quality of probability knowledge when it is acquired from groups
or individuals. Studying the ability of subjects to account for Bayes’
theorem reversals, he found that knowledge acquisition from groups is
more likely to obtain correct probability knowledge than from individu-
als. Toubia and Netzer (2017) created semantic networks to represent
word stems in a particular idea generation topic. Their model captured
the degree of novelty versus familiarity of word stem combinations,
evidencing which ideas were judged as more creative.
The study of consumer preferences is one of the most frequent appli-
cations of ML; many of them concentrated on conjoint analysis methods.
Evgeniou et al. (2005) applied methods from statistical learning theory
to improve conjoint analysis for preference modeling. Their approach
used computationally efficient optimization techniques. It achieved supe-
rior performance compared to standard logistic regression, hierarchical
Bayes, and the polyhedral methods for handling noise and estimating
nonlinearities in preference models. De Bruyn et al. (2008) compared
three algorithms – cluster classification, Bayesian treed regression, and
stepwise componential regression – to develop an optimal sequence
of questions and predict online visitors’ preferences. They empirically
demonstrated that a stepwise componential regression with fewer and
easier-to-answer questions achieved predictive accuracy equivalent to a
traditional conjoint approach. Evgeniou et al. (2007) proposed a new
approach for modeling consumer heterogeneity in conjoint estimation
based on convex optimization and statistical ML. Their method outper-
formed the standard hierarchical Bayes approach, both with metric and
choice data. Hauser et al. (2010) tested methods, based on cognitively
simple decision rules, that predicted which products consumers select
for their consideration sets. They proposed two ML methods to esti-
mate cognitively simple disjunctions-of-conjunctions (DOC) rules. The
cognitively simple DOC-based methods predicted better than the ten
benchmark methods on an information-theoretic measure and hit rates.
Their contribution is relevant to managerial decisions about considera-
tion sets in product categories. Dzyabura and Hauser (2011) developed
an active-ML method to select questions adaptively when consumers
use heuristic decision rules. Adaptive questions outperformed market-
based questions when estimating heuristic decision rules. Their method
relevance is anchored in the fact that heuristic decision rules predicted
validation decisions better than compensatory rules. Toubia et al. (2013)
introduced a method that dynamically (i.e., adaptively) designs elicita-
tion questions for estimating risk and time preference parameters. In
risk preferences settings, their approach predicted willingness to pay
more accurately at about the same time as the standard approach. For
time preferences setting, their model predicted preferences equally well
as the standard approach, but in less time. Liu and Dzyabura (2016)
developed an algorithm for estimating multi-taste consumer preferences.
Their algorithm initializes individual taste parameters with a cluster-
ing algorithm combined with regularized logistic regression. Next, they
follow an iterative optimization procedure similar to the Concave-
Convex procedure to update the parameters corresponding to each taste,
shuffling products between tastes. Their model was able to capture
consumer taste heterogeneity. Huang and Luo (2016) developed an
adaptive decompositional framework to elicit consumers’ preferences for
complex products. Their method used a fuzzy support vector machine
active learning algorithm to adaptively refine the individual-specific
preference estimate after each question. This approach suited complex
product categories with as many as 100 attribute levels. Chen
et al. (2017) dealt with the fact that consumers’ preferences can
often be represented by a multimodal continuous heterogeneity
distribution, which is difficult to model. They applied a sparse
learning model to conjoint analysis, where consumer heterogeneity plays
a critical role in determining optimal marketing decisions. Their model
showed superior performance compared to benchmark models (e.g.,
finite mixture and Bayesian normal component mixture models) to
understand consumer preference distribution. More recently, Liu and
Toubia (2018) developed a semantic approach for estimating consumer
content preferences from online search queries. They introduced a topic
model, hierarchically dual latent Dirichlet allocation (HDLDA). This
model is appropriate for contexts in which one type of document (e.g.,
search queries) is semantically related to another kind of document (e.g.,
search results). The output of HDLDA provides a basis for estimating
consumers’ content preferences on the fly from their search queries,
given a set of assumptions on how consumers translate their content
preferences into search queries.
Later stages of the consumer decision process were also explored. For
example, Blanchard et al. (2017) analyzed consumer sorting tasks, a tra-
ditional subject in exploratory marketing research in brand positioning
and categorization. Based on Monte Carlo simulations and an empirical
application, they created a computationally-efficient, flexible framework
for analyzing sorting task data and a new optimization approach to
identify summary piles. This approach provided an easy way to explore
associations consumers make among a set of items.
Besides consumer preferences, another recently explored subject in
ML applications on consumer research was purchase prediction and
decision. De Matos et al. (2018) analyzed the effect of subscription video-
on-demand (SVoD) services on digital piracy. They used ML algorithms
to discriminate covariates more strongly associated with household use
of BitTorrent and also to check for their results’ robustness. They found
that the use of SVoD services to curtail piracy will require, at the
minimum, offering content much earlier and at much lower prices than
those currently offered in the marketplace. Toubia et al. (2019) proposed
a quantitative approach for describing entertainment products, in a
way that allows for improving the predictive performance of consumer
choice models for these products. They developed a natural language
processing tool, guided latent Dirichlet allocation (LDA), that automat-
ically extracted a set of features of entertainment products from their
descriptions. The tool successfully predicted movie-watching behavior
at the individual level. Also recently, Xia et al. (2019) developed a
computationally efficient modeling and estimation method to analyze
large consumer shopping data sets that show complex relationships
across categories and over time. They explored the nature of intertem-
poral cross-product patterns in an enormous consumer purchase data
set using a model that adopts the structure of conditional restricted
Boltzmann machines (CRBMs). Their approach used the efficient esti-
mation algorithm embodied in the CRBM, enabling them to process
very large data sets and capture the consumer decision patterns for
both predictive and descriptive purposes that might not otherwise be
apparent.
Different papers applied ML methods to understand consumer re-
sponses. Xiao and Ding (2014) explored the effects of facial features in
print advertising, relying on the eigenface method, a holistic approach
for face recognition. They empirically demonstrated that different faces
do affect people’s attitudes toward the advertisement, attitude toward
the brand, and purchase intention. Puranam et al. (2017) proposed
a scalable Bayesian topic model to measure and understand changes
in consumer opinion about health and other topics. They developed a
Latent Dirichlet Allocation (LDA) probabilistic topic model to analyze
online reviews of restaurants. Using a difference-in-differences estima-
tion approach, they isolated the potentially causal effect of a health
regulation on consumer opinion.
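As a generic illustration of the kind of LDA topic model used in this stream, the sketch below fits a two-topic LDA to a handful of made-up restaurant “reviews” (toy data invented here; this is not the authors’ scalable model or their difference-in-differences design):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four made-up restaurant "reviews": two about taste, two about health.
docs = [
    "great burger and tasty fries huge portions",
    "tasty shake cheap fries great value",
    "calories posted on the menu healthy salad options",
    "fresh healthy salad low calories clear menu",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each review gets a distribution over latent topics; shifts in these
# topic proportions before vs. after a policy change are the kind of
# quantity a difference-in-differences design would compare.
doc_topics = lda.transform(counts)
print(doc_topics.round(2))
```

At scale, the same pipeline runs over millions of reviews, with the topic proportions aggregated by restaurant and time period.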
Focusing their attention at the end of the consumer decision process,
Dzyabura et al. (2019) analyzed discrepancies between online and offline
product evaluations. They fused data from a large online study with
data from a smaller set of participants who completed both an online
and an offline study, estimating parameters using two ML methods,
a hierarchical Bayesian approach and a k-nearest-neighbors approach.
Their method achieved better out-of-sample predictive performance
on individual choices and aggregate market shares. Finally, Toubia
and Florès (2007) applied idea-screening algorithms that performed
idea selection adaptively based on the evaluations made by previous
consumers. They used simulations to compare and analyze the perfor-
mance and behavior of the algorithms. Their best-performing algorithm
focused on the ideas that were the most likely to have been misclassified
(as “top” or “bottom” ideas) based on the previous evaluations, and
avoided discarding ideas too fast by adding random perturbations to
the misclassification probabilities.

5.3 Internet/Digital Marketing and Recommender Systems

Many papers have applied ML to understand and develop recommen-
dation system-related research, such as recommendation agents (Wang
et al., 2018), hybrid recommender systems (Ansari et al., 2018), rec-
ommender system for experience products (Chung and Rao, 2012),
for online social networks (Li et al., 2017), for customer analysis
(Huang et al., 2007), product recommendation model based on cus-
tomer preference similarity (Moon and Russell, 2008), and collaborative
filtering (Van Roy and Xiang, 2010).
Many other papers have implemented ML methods in internet/digital
marketing applications, such as website (Hauser et al., 2009) and ad-
vertising morphing (Urban et al., 2014), media schedules (Naik et al.,
1998), assortment personalization (Bernstein et al., 2019), online sales
(Sonnier et al., 2011), multi-armed bandit to study the exploration
versus exploitation trade-off (Kim and Lim, 2016) and dynamic pricing
using multi-armed bandit experiments (Misra et al., 2019); learning
from users’ profiles (Atahan and Sarkar, 2011) and from consumers’
preferences (Farias and Li, 2019), and user profiling in customer-base
analysis (Trusov et al., 2016).

5.3.1 Recommendation Systems


Recommendation systems based on ML models have been developed in
marketing for more than a decade now. Huang et al. (2007) applied a
random graph modeling methodology to analyze bipartite consumer-
product graphs that represent sales transactions to understand consumer
purchase behavior in e-commerce settings better. They found that the
recommendation algorithm-induced graphs generally provided a bet-
ter match with the real-world consumer-product graphs than purely
random graphs. The former algorithms outperformed representative
standard collaborative filtering algorithms. Moon and Russell (2008)
developed a product recommendation model based on the principle that
customer preference similarity stemming from prior purchase behavior is
a crucial element in predicting current product purchases. Their model
was based on joint space mapping (placing customers and products
on the same psychological map) and spatial choice modeling (allowing
observed choices to be correlated across customers). It achieved superior
forecast performance of purchase behavior compared to benchmark
models. Van Roy and Xiang (2010) demonstrated that the traditional
nearest neighbors algorithms are highly susceptible to manipulation.
They proved that linear and asymptotically linear algorithms could
be more robust to manipulation than nearest neighbors algorithms.
As a result, their proposed approach shows that it is possible to design
recommendation systems algorithms that achieve accuracy alongside
robustness. Chung and Rao (2012) developed a general consumer pref-
erence model for experience products that overcomes the limitations of
consumer choice models. By using Bayesian estimation methods and
Markov chain Monte Carlo simulation inference and testing the model
in online consumer ratings and offline consumer viewership for movies,
their approach outperformed several alternative collaborative filtering
and attribute-based preference models. Wang et al. (2018) analyzed
neutral recommendation agents (RAs) and those that lack recommen-
dation neutrality and are biased toward sponsors. They examined the
effects of recommendation neutrality on users’ trust and distrust in
RAs. Based on experiments in two countries, they discovered that user
trust in a biased RA increases, but only when explanations for organic
recommendations and sponsorship disclosure are both provided. Finally,
Ansari et al. (2018) developed a novel covariate-guided, heterogeneous
supervised topic model that uses product covariates, user ratings, and
product tags to characterize products in terms of latent topics and
specifies consumer preferences via these topics. Based on a stochastic
variational Bayesian framework, their model achieved fast, scalable, and
accurate estimation and prediction about movie preferences.
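For readers unfamiliar with the nearest-neighbors collaborative filtering that several of these papers benchmark against or improve upon, a minimal item-based sketch on a toy ratings matrix (all numbers invented; cosine similarity used for simplicity):

```python
import numpy as np

# Toy user x item ratings matrix (rows = users, columns = items; 0 = unrated).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Item-item cosine similarity.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict user 0's rating for item 2 as a similarity-weighted average of
# the items that user has already rated.
rated = R[0] > 0
pred = sim[2, rated] @ R[0, rated] / sim[2, rated].sum()
print("predicted rating:", round(pred, 2))  # prints 2.09: a low rating,
# since user 0 dislikes item 3, the item most similar to item 2
```

Van Roy and Xiang’s (2010) point is that exactly this kind of neighborhood averaging can be swayed by injected ratings, which motivates their more robust linear alternatives.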

5.3.2 Internet/Digital Marketing


A growing literature considers the use of ML in digital marketing.
The most relevant papers address topics
related to online surfing and morphing. Trusov et al. (2016) proposed
a modeling approach grounded in the Correlated Topic Model that
uncovers individual user profiles from online surfing data and allowed
online businesses to make profile predictions when limited information
is available. The other relevant research stream of digital marketing an-
alyzes the personalization and morphing related to assortment, website,
and advertising applications. Hauser et al. (2009) developed a “morph-
ing” strategy to automatically match the basic “look and feel” of a
website, not just the content, to cognitive styles. They inferred cognitive
styles from clickstream data with Bayesian updating and then balanced
exploration (learning how morphing affects purchase probabilities) with
exploitation (maximizing short-term sales) by solving a dynamic pro-
gram (partially observable Markov decision process). In another paper,
Urban et al. (2014) extended the morphing method to make banners to
be matched to consumers based on posterior probabilities of latent seg-
ment membership, which were identified from consumers’ clickstreams.
A field test and an experiment demonstrated that matching banners
to cognitive-style segments is feasible and provides significant benefits
above and beyond traditional targeting.

5.4 Marketing Strategy

Many managerial applications of ML have been published in marketing
journals in the last two decades. They can be grouped into two major
subjects: marketing strategy and relationship management. Marketing
strategy papers that have applied ML focused on understanding unmet
demand (Chandukala et al., 2011), prediction (Cui and Curry, 2005;
Jacobs et al., 2016; Tam and Kiang, 1992), and forecasting (Grushka-
Cockayne et al., 2017).
The most frequent application of ML in marketing strategy was
related to prediction/forecasting. One of the seminal applications of
ML in the marketing and management literature was Tam and Kiang’s
(1992) paper predicting bank failure. They developed a neural-
net approach to perform discriminant analysis that outperformed other
traditional methods, such as linear classifier, logistic regression, and
kNN. Cui and Curry (2005) investigated the ability of the support
vector machine (SVM) algorithm to predict outcomes in emerging
environments in marketing, such as automated modeling, mass-produced
models, intelligent software agents, and data mining. They compared
the differences of the SVM’s prediction hit-rates against those from the
multinomial logit model. Chandukala et al. (2011) proposed a model for
identifying unmet demand, separating out weak preferences from strong
category interest and heterogeneous taste effects within the context of
conjoint analysis. They carried out a Bayesian estimation of the model
using data augmentation, following a heterogeneous variable selection
model algorithm (Gilbride et al., 2006). Jacobs et al. (2016) introduced
model-based methods for predicting purchases in very large assortments,
adapting topic models (latent Dirichlet allocation) from machine learning
to summarize customers’ purchase histories and predict which products
each customer is likely to buy next.
Grushka-Cockayne et al. (2017) introduced a theoretical model
to examine the combined effect of overfitting and overconfidence on
the average forecast. They argued about the necessity of such a model
because firms usually average forecasts collected from multiple experts
and models. Because of cognitive biases, strategic incentives, or the
structure of ML algorithms, these forecasts are often overfitted to
sample data and are overconfident. They identified the random
forest, a leading ML algorithm that pools hundreds of overfitting and
overconfident regression trees, as an ideal environment for trimming
probabilities. Using several known data sets, they showed that trimmed
ensembles could significantly improve the random forest’s predictive
accuracy.
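The trimming idea can be sketched as follows: rather than averaging all tree-level predictions, pull the extremes in first (a simplified, winsorized illustration on synthetic data, not the authors’ exact trimming procedure):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X[:300], y[:300])

# Each tree's prediction on the holdout set: shape (n_trees, n_obs).
per_tree = np.stack([tree.predict(X[300:]) for tree in forest.estimators_])
plain_mean = per_tree.mean(axis=0)

# "Trimmed" ensemble: clip the most extreme tree-level predictions to the
# 10th/90th percentiles before averaging (a winsorized variant of trimming).
lo, hi = np.percentile(per_tree, [10, 90], axis=0)
trimmed_mean = np.clip(per_tree, lo, hi).mean(axis=0)

def rmse(p):
    return float(np.sqrt(((p - y[300:]) ** 2).mean()))

print("plain mean RMSE:  ", round(rmse(plain_mean), 2))
print("trimmed mean RMSE:", round(rmse(trimmed_mean), 2))
```

Whether trimming helps depends on how overconfident the individual trees are; the authors’ results concern probability forecasts, where the effect is most pronounced.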

5.5 Relationship Marketing

Relationship marketing problems are among the most frequent appli-
cations of ML, including customer acquisition (Schwartz et al., 2017;
Tillmanns et al., 2017), engagement (Lee et al., 2018), and retention
(Ascarza, 2018; Ryzhov et al., 2016), churn management (Lemmens and
Croux, 2006; Neslin et al., 2006), customer valuation (Schweidel et al.,
2014; Tirenni et al., 2007), customer similarities identification (Braun
and Bonfrer, 2011), customer relationship management (Sun, 2006),
firms interventions in online communities (Homburg et al., 2015), word-
of-mouth (WOM) management (Chica and Rand, 2017), and online
reputation mechanisms (Hollenbeck, 2018).
Tillmanns et al. (2017) proposed a Bayesian variable selection ap-
proach to optimally select targets for new customer acquisition. Their
approach outperformed non-selection methods and selection methods
based on expert judgment as well as benchmarks based on principal
component analysis and bootstrap aggregation of classification trees.
Schwartz et al. (2017) answered the question of how to decide
what percentage of impressions firms should allocate to each of the
multiple versions of an online advertising strategy. They resolved this
“learn-and-earn” trade-off using multi-armed bandit (MAB) methods,
which improved the customer acquisition rate by 8%, against an esti-
mated 10% decrease had the firm directly optimized click-through rates
instead of conversions.
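The “learn-and-earn” logic of a MAB can be illustrated with a Thompson sampling sketch for allocating impressions among ad versions. The conversion rates and priors below are invented for illustration; this is not the authors’ field implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
true_rates = np.array([0.020, 0.035, 0.030])  # hypothetical conversion rates
successes = np.ones(3)   # Beta(1, 1) prior for each ad version
failures = np.ones(3)

conversions = 0
for _ in range(20_000):                    # each iteration = one impression
    draws = rng.beta(successes, failures)  # sample a plausible rate per arm
    arm = int(draws.argmax())              # show the best-looking ad version
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
    conversions += reward

pulls = (successes + failures - 2).astype(int)
print("total conversions:", conversions)
print("impressions per ad version:", pulls)
```

In a field setting, each “impression” is a real ad exposure and the reward a conversion; over time the allocation shifts toward the versions with the highest posterior conversion rates while still exploring the others.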
Lee et al. (2018) described the effect of social media advertising
content on customer engagement using data from Facebook using a com-
bination of Amazon Mechanical Turk and natural language processing
algorithms. They studied the association of various kinds of social media
marketing content with user engagement with the messages. The results
showed that there are benefits to content engineering that combines
informative characteristics that help in obtaining immediate leads with
brand personality and branding on the social media site.
Ryzhov et al. (2016) proposed an empirical model to analyze the ef-
fectiveness of several design approaches used in direct-mail marketing to
cultivate one-time disaster donors and convert them into recurring con-
tributors. Their data-driven study of disaster donor cultivation
used different analytics tools, such as the development of a statistical
model estimated using maximum-likelihood, regularization techniques,
model selection to reduce the feature space, and feature generation.
Ascarza (2018) also analyzed customer retention. She demonstrated
that customers identified as having the highest risk of churning are not
necessarily the best targets for proactive churn programs. Her work com-
bined two field experiments with the algorithm proposed by Guelman
et al. (2015) to obtain estimates for the treatment effect and confidence
intervals of the treatment effect in different groups of customers. She
concluded that retention programs are sometimes futile not because
firms offer the wrong incentives but because they do not apply the right
targeting rules.
Lemmens and Croux (2006) and Neslin et al. (2006) applied ML to
predict churn. The former paper advocated the application of bagging
and boosting classification techniques with bias correction for unbal-
anced samples as a way to correctly predict churn and increase profits.
The latter paper provided a descriptive analysis of how methodological
factors contribute to the accuracy of customer churn predictive models.
After applying a variety of modeling “approaches” with different esti-
mation techniques, variable selection procedures, number of variables
included, and time allocated to steps in the model-building process, the
authors found important differences in performance among the tested
approaches, suggesting that the choice of the method is a major decision
when implementing a churn management strategy.
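A minimal churn-scoring sketch in the spirit of this literature, using synthetic imbalanced data and generic gradient boosting (not the authors’ specific bagging/boosting setup or their bias correction):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic customer records with roughly 5% churners, mimicking the
# class imbalance these papers correct for.
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]   # churn propensity per customer
print("holdout AUC:", round(roc_auc_score(y_te, scores), 3))
```

Scores like these would then feed a targeting rule; Ascarza’s (2018) point, discussed above, is that the highest-scoring customers are not necessarily the most responsive targets.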
In the 2005 ISMS Practice Prize Winner paper, Tirenni et al. (2007)
introduced the Customer Equity and Lifetime Management (CELM).
CELM is a solution based on a decision-support system that offers
marketing managers a scientific framework for the optimal planning
and budgeting of targeted marketing campaigns to maximize return on
marketing investments. The CELM technology combines different ML-
related methods, such as Markov decision processes (to model customer
dynamics), Monte Carlo simulation (to simulate the financial impact
of a given marketing policy), and lifetime value portfolio optimization
through dynamic programming. Schweidel et al. (2014) analyzed extant
models that often focus on a single activity (such as purchases from
a retailer). They proposed a framework to generalize extant models
for customer base analysis to multiple activities (such as purchasing in
multiple brands or contributing with UGC). They developed a flexible
“buy ‘til you die” model to empirically examine how the two activities
(purchase digital content and/or post digital content at no charge) are
related. Compared with benchmarks, their model predicted both types
of activities more accurately.
Braun and Bonfrer (2011) developed a probabilistic framework for
modeling customer interactions grounded in the theory of homophily
(i.e., people who are similar to one another are more likely to interact
with one another). By applying a Bayesian nonparametric approach,
using Dirichlet processes, they moderated the scalability problems that
marketing researchers encounter when working with networked data.
Their framework can be used for segmentation and targeting activities.
In a conceptual paper, Sun (2006) analyzed how technology inno-
vation could affect Customer Relationship Management (CRM) and
its implications for marketing models. Among other propositions, he
discussed how data-mining technology demanded research to develop
statistical learning rules for adaptive ML and automated implementation
of CRM.
Homburg et al. (2015) analyzed how consumers react to firms’ ac-
tive participation in consumer-to-consumer conversations in an online
community setting. They applied support vector machine models to
classify consumers’ posts. Results evidenced that consumers show di-
minishing returns to active firm engagement, which, at very high levels,
can undermine consumer sentiment.
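A generic sketch of the kind of SVM text classifier used to label posts (the posts and sentiment labels below are invented for illustration; they are not the authors’ features or coding scheme):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented community posts with hand-assigned sentiment labels.
posts = ["love this product", "great support thanks", "terrible service",
         "very disappointed", "amazing quality love it", "awful never again"]
labels = [1, 1, 0, 0, 1, 0]   # 1 = positive, 0 = negative

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(posts, labels)
print(clf.predict(["love the quality", "awful service"]))
```

In practice, a small hand-coded training set like this is scaled up so the fitted classifier can label millions of community posts automatically.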
Chica and Rand (2017) proposed an agent-based framework that
aggregates social network–level individual interactions to guide the con-
struction of a successful decision support system for WOM. To develop
the framework, they applied different algorithms, such as generalized
random networks (Viger and Latapy, 2005) and genetic algorithms
(Goldberg and Holland, 1988). The results show how the decision sup-
port system can help managers by forecasting premium conversions
of a freemium app and increasing the number of premiums through
targeting and implementing reward policies.
Finally, Hollenbeck (2018) investigated the value of business format
franchising and how it is changing in response to a large increase in
consumer information provided by online reputation mechanisms. He
used an ML analysis of online review text to recover the latent dimen-
sions of quality. More specifically, he applied latent Dirichlet allocation
(LDA) to model high-dimensional text data and naive Bayes to probabilistically classify features of the text into predefined categories.
Measures of the information content of online reviews showed that, as information has increased, independent hotel revenue has grown substantially more than chain hotel revenue.

5.6 Methodological Developments of Machine Learning

A handful of papers in marketing journals have focused on methodological contributions related to ML. In a theoretical note extending the work of Tam and Kiang (1992), Wang (1995) discussed the weakness of the standard backpropagation neural network (NN) algorithm in classification applications. He argued that a
valuable property of NNs is their flexibility in generating input/output
relationships. However, because of this flexibility, they can generate arbitrarily complicated classification boundaries after learning from
a training sample. The solution proposed was to use a form of local
linear fitting, called k-neighbor interpolation training. Wang’s paper
clarifies that the monotonic NN model can be applied to non-monotonic
classification when knowledge about inflection is available.
Piramuthu et al. (1998) also developed a solution to improve the
performance of NNs. Their work was based on the fact that NN learning by algorithms such as backpropagation is slow
due to the size of the search space and iterative manner in which
the algorithm works. To overcome these problems, they proposed the
feature construction methodology to improve the learning speed and
the classification accuracy of NNs.
Ansari et al. (2000) developed a hierarchical Bayesian framework for
modeling general forms of heterogeneity in partially recursive structural
equation models. Their framework clarifies the motivations for accommo-
dating heterogeneity and illustrates theoretically the types of misleading
inferences that can result when unobserved heterogeneity is ignored.
The work also detailed different choices that researchers can make to
incorporate different forms of measurement and structural heterogeneity.
They also developed Markov Chain Monte Carlo (MCMC) procedures
to perform Bayesian inference in partially recursive, random-coefficient
structural equation models. The relevance of these procedures is to
provide individual-specific estimates of the factor scores, structural
coefficients, and other model parameters.
Finally, Toubia et al. (2003) proposed and tested new adaptive question design and estimation algorithms for partial-profile conjoint analysis. Their polyhedral question design focuses questions so as to reduce the feasible set of parameters as rapidly as possible. They evaluated the
proposed methods relative to established benchmarks for question design
(e.g., random selection, D-efficient designs, adaptive conjoint analysis)
and estimation (hierarchical Bayes).
5.7 Combination of Theory-Driven Frameworks with Machine Learning Methods

This recent research stream has focused on large-scale personalization problems, including pricing (Dubé and Misra, 2019), mobile in-app
advertising (Rafieian, 2019a,b; Rafieian and Yoganarasimhan, 2020),
and search rankings (Yoganarasimhan, 2020).
Dubé and Misra (2019) analyzed the welfare implications of person-
alized pricing (an extreme form of third-degree price discrimination)
implemented with ML for a large, digital firm. They conducted a ran-
domized controlled pricing field experiment to train a demand model
and to conduct inferences about the effects of personalized pricing on
firm and customer surplus. In a second experiment, they validated their
predictions out of sample. Their results show that personalized pricing
substantially improves the firm’s profits. However, on the demand side,
customer surplus declines slightly. Importantly, they find that customer
surplus increases when the firm is allowed to condition on more cus-
tomer features and classify customers into more granular segments,
a conclusion that raises issues of data privacy. Overall, their work contributes to important debates on policy decisions regarding the outcomes of
personalized pricing to companies’ profits and consumer privacy and
welfare.
Rafieian (2019a) developed a unified dynamic framework for adaptive
ad sequencing that optimizes user engagement in the session (e.g., the
number of clicks or length of stay). His framework comprised two
components, a Markov Decision Process that captures the domain
structure and incorporates inter-temporal trade-offs in ad interventions,
and an empirical framework that combines ML methods with ideas
from the causal inference literature to obtain counterfactual estimates
of user behavior. By adopting a dynamic framework, the author found
significant gains in user engagement that were heterogeneous across sessions. That is, adaptive forward-looking ad sequencing was most
effective when users were new to the platform.
Rafieian (2019b) extended his previous work on dynamic sequencing
of mobile in-app ads (Rafieian, 2019a). In this paper, he examined the
revenue gains from adopting a revenue-optimal dynamic auction to
sequence ads. He proposed a unified framework with a theoretical framework to derive the revenue-optimal dynamic auction and an empirical
framework that involved the structural estimation of advertisers’ click
valuations as well as personalized estimation of users’ behavior using
ML. The framework incorporates both advertisers’ and users’ behavior
and examines the market outcomes under the optimal dynamic auction.
The revenue-optimal dynamic auction yielded significantly higher revenue than the revenue-optimal static auction.
Rafieian and Yoganarasimhan (2020) proposed a unified modeling
framework to address the issues of the value of different types of target-
ing information, the incentives of ad-networks to engage in behavioral
targeting, and the role of regulation in mobile in-app advertising markets.
Their framework consisted of two components – an ML framework
for targeting and an analytical auction model for examining market
outcomes under counterfactual targeting regimes. The ML component
consisted of a filtering procedure, a feature generation framework, and
a learning algorithm. They found that an efficient targeting policy substantially improves the average click-through rate, with the gains stemming mostly from behavioral rather than contextual information. However,
while total surplus grew with more granular targeting, the most effi-
cient targeting did not maximize ad-network revenues. Rather, it was
maximized when the ad-network did not allow advertisers to engage in
behavioral targeting. Their results suggest that ad-networks may have
incentives to preserve users’ privacy without external regulation.
Finally, Yoganarasimhan (2020) analyzed the personalization prob-
lem related to search rankings in the context of online search. She pro-
posed a personalized ranking mechanism based on a user’s search and
click history. The ML framework consisted of three modules: feature gen-
eration, normalized discounted cumulative gain-based Lambda-MART
algorithm, and feature selection wrapper. She showed that personal-
ization based on short-term history or within-session behavior is less
valuable than long-term or across-session personalization. This is be-
cause there is significant heterogeneity in returns to personalization
as a function of user history and query type. She demonstrated that
informational or navigational queries benefit more from personaliza-
tion than transactional ones, and that returns to personalization are
negatively correlated with a query's past average performance. Also,
Yoganarasimhan’s approach showed how Big Data can be leveraged
to improve marketing outcomes without compromising the speed or
real-time performance of digital applications.

5.8 ML Methods for Causal Effects and Policy Evaluation

The last of the recent streams of ML applications in marketing is likely to be impactful in the coming years. It involves the use of ML methods
to find causal effects and evaluate policy (for more details, see Athey
and Imbens, 2016; Wager and Athey, 2018). Recent manuscripts of this
research stream have studied the use of heterogeneous treatment effects
and optimal targeting policy evaluation (Hitsch and Misra, 2018), the
design and evaluation of personalized free trials (Yoganarasimhan et al.,
2019), the effects of personalized pricing on customer welfare (Dubé
and Misra, 2019), and how targeting and privacy issues arise in mobile
in-app advertising (Rafieian and Yoganarasimhan, 2020).
Hitsch and Misra (2018) analyzed how to construct optimal target-
ing policies and documented the difference in profits from alternative
targeting policies by using estimation approaches that are based on
recent advances in causal inference and ML. They introduced an approach that allows the comparison of many alternative optimal targeting
policies that are constructed based on different estimates of the condi-
tional average treatment effect. Their approach allows the arbitrary comparison of many different targeting policies without incurring the
cost of an equal number of large-scale field experiments.
Yoganarasimhan et al. (2019) also used the strategy of combining
ideas from the ML literature with causal inference paradigms. Their
problem was to show how to personalize the length of the free trial pro-
motion that a user receives to optimize conversions. They concentrated
on the problem of how to derive heterogeneous treatment effects that
can be used to develop personalized targeting policies. By using data
from a large-scale field experiment, they developed seven personalized
targeting policies based on seven ML algorithms, and evaluated their
performances using the Inverse Propensity Score (IPS) estimator. They
showed how to link a method’s effectiveness in designing a policy with
its ability to personalize the treatment sufficiently without over-fitting. Also, they demonstrated that policies designed to maximize short-run
conversions also perform well on long-run outcomes.
The work of Dubé and Misra (2019), reviewed in the previous section on theory-driven frameworks with ML methods, is an interesting, recent example of how personalized pricing can increase companies'
profits at the cost of raising concerns on consumer privacy and welfare.
Also reviewed in the previous section, the work of Rafieian and Yoga-
narasimhan (2020) points out that ad-networks should have incentives
to preserve users’ privacy even in the absence of external regulation.
After analyzing the past applications of ML in marketing, in the
next section I discuss the trends and future developments of ML that
may impact the field.
6 Trends and Future Developments of Machine Learning in Marketing

Several new areas of ML promise to radically transform, popularize, or even undermine its use in the field of marketing. With no pretense of
being exhaustive, a shortlist of these areas includes automated machine
learning (AutoML), data privacy and security, model interpretability,
algorithm fairness, computer vision, and Bayesian machine learning.
Each of these approaches is briefly analyzed next.

6.1 Automated Machine Learning

Automated machine learning (AutoML) offers methods and processes for making ML available to nonexperts by automating end-to-end processes
of applying ML to real-world problems. For most ML applications, re-
searchers must perform a variety of tasks to develop efficient and precise
models. These tasks include data preprocessing, feature engineering,
extraction and selection, algorithm selection, and hyperparameter opti-
mization. As most of these tasks require the use of expert knowledge,
the growth of ML has generated a demand for methods that can be
used easily by nonexperts (Hutter et al., 2014).


The subfields of marketing analytics and relationship marketing can benefit considerably from AutoML in helping to deepen relationships with customers. Today, customers expect highly contextual and
individualized offers, which is becoming more and more challenging to
provide, given the multitude of devices that each customer can use to
consume. Contact with companies may occur through smartphones or
computers or even through visits to brick and mortar stores. Customer
data must be analyzed by resolving identity management across devices,
IDs, types of data (text, images, and audio), and spelling and style
conventions. AutoML is introducing new approaches to marketing by
having companies engage with customers on a deeper and more per-
sonal basis (Yadav, 2018), as this helps to map the consumer path to
purchases. In developing marketing attribution models that perform
complex “what-if” analyses, AutoML can quantify the effectiveness of
various marketing activities and different combinations of marketing
touchpoints (Geer, 2018).

6.2 Data Privacy and Security

Privacy and security of data are gaining momentum in the research community, mainly due to emerging technologies like cloud computing,
analytics engines, and social networks. It is not uncommon now for big data sectors, such as healthcare or financial systems, to involve multiple organizations that have different privacy policies. The problem is that they may not be able to share their data openly while joint data processing is done (Xu et al., 2015).
There are different challenges and perspectives involving the pri-
vacy and security of data, such as social network mining, the safety of
outsourced databases, privacy-preserving analytics, data exchange pro-
cedures, privacy-preserving big graph analysis and mining, and querying
cloud-enabled database management systems (Cuzzocrea, 2014). Most
of those issues became even more relevant after May 2018, the implementation date of the European General Data Protection Regulation (GDPR, 2018), the first extensive rewrite of privacy law in Europe.
Different anonymization techniques have conventionally been used
to shield individual privacy in the context of ML applications. Three of
the most relevant are k-anonymity, l-diversity, and t-closeness (Dorschel, 2019). In k-anonymity (see Samarati and Sweeney, 1998), personal data
(e.g., name, religion) of specific data points are removed or generalized
(e.g., replacing a particular age with an age span). The result is that
every combination of identity-revealing characteristics occurs in at least
k different rows of the dataset. L-diversity and t-closeness are extensions
of this concept (Li et al., 2007). These modifications are applied before
data is shared. However, with the rise of AI and ML, these forms of protection are no longer sufficient.
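The generalization step behind k-anonymity can be illustrated with a short Python sketch; the records, quasi-identifiers, and decade-wide age bands below are hypothetical choices:

```python
# Minimal sketch of k-anonymity via generalization: exact ages are
# replaced by age spans so that every quasi-identifier combination
# occurs at least k times. Records are hypothetical.
from collections import Counter

records = [
    {"age": 23, "zip": "02139"}, {"age": 27, "zip": "02139"},
    {"age": 31, "zip": "02142"}, {"age": 38, "zip": "02142"},
    {"age": 24, "zip": "02139"}, {"age": 35, "zip": "02142"},
]

def generalize(rec):
    # Replace a specific age with a decade-wide span; keep zip as-is here.
    lo = (rec["age"] // 10) * 10
    return (f"{lo}-{lo + 9}", rec["zip"])

generalized = [generalize(r) for r in records]
group_sizes = Counter(generalized)

# k is the size of the smallest group of indistinguishable records
k = min(group_sizes.values())
print(generalized[0], "k =", k)
```

Here every generalized record is indistinguishable from at least two others, so the release satisfies 3-anonymity; l-diversity and t-closeness add further constraints on the sensitive values within each such group.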
Dorschel (2019) detailed that the first approaches to increase data
privacy were centered around the use of transfer learning, i.e., the use
of a pre-trained model for a new problem. By removing access to the original data, privacy concerns are naturally reduced. However,
conventional transfer learning lacks a privacy guarantee when it comes to
sensitive data (such as health data), mainly because many technologies
of re-identification have been developed (see El Emam et al., 2011).
Today it is possible to re-identify personal information using large
datasets, from minimal amounts of personal information, such as credit
card purchase data (see Montjoye et al., 2015). This possibility brings
significant consequences for ML applications in marketing because more
restrictive privacy regulations may affect the future use of data.
The solutions for the re-identification problem have evolved around the goal of removing access to the original data and decentralizing it. New
approaches have been developed recently. At the time of writing, a few
of the most promising were federated machine learning (McMahan and
Ramage, 2017), differential privacy (based on Dwork et al., 2006), and
split learning (Vepakomma et al., 2018).

6.3 Model Interpretability

In addition to the complex tasks that AutoML aims to automate and the
privacy/security data concerns, another barrier to the adoption of ML
relates to the fact that computers usually do not explain their predictions.
Such ML models are typically called “black-box models” (Molnar, 2019).
A company may build a model that correlates marketing campaign data
with financial data to evaluate whether a given campaign was effective.
However, correlation does not prove causation. Without using a causal
methodological approach, such as experimental design, it can be difficult,
if not impossible, to explain predictions. Experiments, on the other
hand, are expensive, time-consuming, and sometimes impossible to run
without disturbing customers. ML thus proposes model interpretability
as a necessary means of identifying and mitigating bias, improving
generalizations, and even avoiding ethical and legal issues produced
from incorrect model predictions (Hulstaert, 2018).
Some algorithms naturally create interpretable models, such as linear
regressions, logistic regressions, and decision trees. However, many of
the most robust recently developed ML advances, such as deep learning,
involve the use of black-box models, which require specific interpretation strategies. Nevertheless, recent improvements in model interpretability
have resulted in the development of model-agnostic methods that can be
used to separate an explanation from an ML model (Ribeiro et al., 2016).
There are at least ten different types of model interpretability methods
(for a more in-depth review of this subject, see Molnar, 2019). Three of
these are of particular importance to marketing: feature importance,
Shapley values, and local surrogates (LIME).
Feature importance measures how much the prediction error of a given model increases after the researcher permutes a feature’s values (Breiman, 2001; Fisher et al., 2019). By applying
feature importance, a marketer may determine which variables may be
excluded from her predictive model without a significant loss, therefore
simplifying the model and enhancing model reliability. The method has
broad applications to marketing. For example, about 42% of current
iPhone users will upgrade to a newer iPhone model (Munster, 2018).
Many of these customers are not necessarily replacing broken devices
but are instead shortening the lifespan of their functioning phones.
Using feature importance to analyze a consumer panel, for example,
may help one understand which variables more heavily affect consumers’
upgrade decisions (see Brei et al., 2020).
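Permutation feature importance takes only a few lines to sketch; the synthetic data and the linear model below are illustrative assumptions, not a real consumer panel:

```python
# Sketch of permutation feature importance (Breiman, 2001; Fisher et
# al., 2019): the increase in prediction error after shuffling one
# feature. Data are synthetic; the linear model is an assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                 # three candidate predictors
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)
# X[:, 2] is pure noise: it should receive near-zero importance

beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # fit a linear model
base_mse = np.mean((y - X @ beta) ** 2)

importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])    # break the feature-target link
    importance.append(np.mean((y - Xp @ beta) ** 2) - base_mse)

print([round(v, 3) for v in importance])
```

The noise column's importance hovers near zero, signaling that it could be dropped from the predictive model without a significant loss.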
The Shapley value – a concept from cooperative game theory developed by Shapley (1953) – explains a prediction by assuming that each feature
value is a “player” in a game for which the prediction is the payout.
The Shapley values determine how to equitably distribute the “payout”
across features and each of their levels (Molnar, 2019). The logic of
this distribution echoes that of the traditional conjoint analysis method
(Rao, 2014) developed by marketing researchers and widely used in the
field since the 1970s. However, conjoint analysis involves estimating
the relevance of each feature value based on a relatively limited set
of features. In contrast, the Shapley value can substantially scale up
a feature value analysis for large samples and complex choice models.
The knowledge derived from the Shapley values can be applied to a
broad range of marketing phenomena, including those related to product
development, customer segmentation, and relationship marketing.
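The "payout" logic can be computed exactly for a tiny three-feature "game"; the coalition payoffs below are illustrative numbers, not the output of a real model:

```python
# Sketch of exact Shapley values for a 3-feature prediction "game".
# v(S) is the model's expected prediction lift when only features in S
# are known; the payoff numbers below are illustrative assumptions.
from itertools import permutations

features = ("price", "brand", "reviews")
v = {  # value of each feature coalition
    (): 0.0,
    ("price",): 4.0, ("brand",): 2.0, ("reviews",): 1.0,
    ("price", "brand"): 7.0, ("price", "reviews"): 5.0,
    ("brand", "reviews"): 3.0,
    ("price", "brand", "reviews"): 8.0,
}

def coalition(names):
    # Canonical (feature-order) key into the value table
    return tuple(f for f in features if f in names)

shapley = dict.fromkeys(features, 0.0)
orders = list(permutations(features))
for order in orders:                 # average marginal contributions
    seen = set()
    for f in order:
        before = v[coalition(seen)]
        seen.add(f)
        shapley[f] += (v[coalition(seen)] - before) / len(orders)

print({f: round(s, 3) for f, s in shapley.items()})
# → {'price': 4.5, 'brand': 2.5, 'reviews': 1.0}
```

The values sum exactly to v(full coalition) = 8, the "efficiency" property that makes the payout distribution equitable; practical libraries approximate this computation for models with many features, since exact enumeration grows factorially.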
Finally, local interpretable model-agnostic explanations (LIME)
(Ribeiro et al., 2016) are used to explain the individual predictions of
black-box ML models. Rather than training a global surrogate model,
LIME fits a local surrogate model around the individual prediction of interest. It
generates a new dataset of permuted samples and the corresponding new
predictions. From this new dataset, the LIME then trains a new model,
which is weighted by the proximity of sampled instances to the instance
of interest (Molnar, 2019). A LIME has a broad range of applications
to the field of marketing. Any problem that requires the evaluation of
individual customer behaviors or decisions (in B2B or B2C contexts)
may be explained by a LIME, including those of churning (e.g., Dancho,
2018), product replacement, or adherence to loyalty programs.
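The local-surrogate loop just described can be sketched without the LIME package itself; the "black-box" churn score, the kernel width, and the sampling scale below are illustrative assumptions:

```python
# Sketch of LIME's core loop: sample perturbations around one instance,
# weight them by proximity, and fit a weighted linear surrogate.
# The "black-box" churn score below is a stand-in, not a real model.
import numpy as np

def black_box(X):
    # Hypothetical nonlinear churn score in two features
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1] ** 2)))

rng = np.random.default_rng(1)
x0 = np.array([0.5, 1.0])                        # instance to explain

Z = x0 + rng.normal(scale=0.3, size=(2000, 2))   # perturbed neighborhood
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.1) # proximity kernel weights

# Weighted least squares: intercept plus one linear term per feature
A = np.column_stack([np.ones(len(Z)), Z - x0])
W = np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A * W, black_box(Z) * W[:, 0], rcond=None)

# coef[1], coef[2] approximate the local effect of each feature at x0
print([round(c, 2) for c in coef])
```

The surrogate's coefficients recover the local slopes of the black box around x0 (here, a positive effect of the first feature and a negative effect of the second), which is exactly the kind of per-customer explanation a churn analyst needs.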

6.4 Algorithm Fairness

Fairness has emerged as a matter of serious concern within Artificial Intelligence (Verma and Rubin, 2018). One reason is that AI now replaces
humans at many critical decision points, such as which customers will
get a loan and who will get hired for a job. Advocates of algorithmic
techniques argue that these techniques eliminate human biases from
the decision-making process. However, this is not the case because an algorithm is only as good as the data it works with. As Barocas and
Selbst (2016) showed, data is frequently imperfect in ways that allow
algorithms to inherit the prejudices of prior decision-makers. Or, more
simply, data may reflect the widespread biases that persist in society
at large. The reason for the growing concern with algorithm fairness is
that there is more recognition that even models developed with the best
of intentions may exhibit discriminatory biases, perpetuate inequality,
or perform less well for historically disadvantaged groups (Barocas and
Hardt, 2017).
Barocas and Selbst (2016) and Zhong (2019) summarized a list of
possible causes of bias in ML systems:

• Skewed sample: If by some chance, some initial bias happens, it may compound over time. In this type of bias, future observations
generally will confirm the prediction, and there will be fewer
opportunities to make observations that contradict the predictions.
One example is police records. The records of crimes only come
from those crimes observed by police. The police department tends
to dispatch more officers to the place that was found to have a
higher crime rate initially and is thus more likely to record crimes
in such regions.

• Tainted examples: Any ML system keeps the bias existing in the old data caused by human bias. For example, Bolukbasi et al. (2016) found that an algorithm trained using word embeddings from Google News articles exhibited female/male gender
stereotypes. In those articles, the relationship between “man” and
“computer programmers” was highly similar to the relationship
between “woman” and “homemaker”.

• Limited features: Minority groups’ features may be less informative or less reliably collected than those of a majority group. In such
situations, the algorithm tends to have lower precision for the
prediction of the minority group.

• Sample size disparity: If the training data sample coming from the minority group is smaller than the one coming from the majority group, the model is unlikely to represent the minority group correctly.

• Proxies: Even if the researcher does not include sensitive attributes to train an ML model (e.g., race), there can be other features that
are proxies of those attributes (e.g., neighborhood).

6.5 Computer Vision

Computer vision is an interdisciplinary field that studies how computers can be trained to acquire, process, analyze, understand, and describe
the visual world to automate tasks that the human visual system can
process. In performing these complex tasks, computer vision tries to
extract high-dimensional data from the real world to support decisions
(Huang, 1996; Klette, 2014). ML has played a crucial role in advances
and applications of computer vision to marketing by improving the
speed and accuracy of image recognition. New classes of DL networks
such as convolutional neural networks (CNNs) have revolutionized the
computing landscape of computer vision due to their efficiency in the
realms of object detection and identification (Chahal and Dey, 2018).
Deep CNNs have won some of the most prestigious international computer vision challenges, such as the ImageNet Large Scale Visual Recognition
Challenge (He et al., 2015).
The range of computer vision and ML/DL applications available for
marketing is vast. They can help customers discover products based on
visual traits; scrape data and content from social and video channels;
track consumer behaviors, emotions, and attention in real-time; and use
visual data for product personalization, contextual and programmatic
advertising among other tasks (see more applications in Annalect, 2018;
Davis, 2017; Martech, 2017).
Recent developments of computer vision using data freely available
from Google Maps and Google Street View may also have substantial
applications for the field of marketing. For example, Gebru et al. (2017)
have estimated socioeconomic attributes such as income, race, education,
and voting patterns based on inferences drawn from cars detected from
Google Street View images using DL. These applications offer substan-
tial advantages over the traditional use of census data for marketers
because such data are expensive, and their collection is time-consuming
and may be outdated, especially in the years preceding a new census.
Another promising further use of computer vision involves estimating
demographics from satellite imagery. From DL/CNNs and object de-
tection technologies (see Cheng and Han, 2016 for a review), satellite
imagery has been used to estimate income (e.g., Jean et al., 2016), a key
variable used by marketers to predict demand and to estimate market


potential.

6.6 Bayesian Machine Learning

As a very simplified representation, a typical ML procedure involves the following three steps: defining a model based on previously described
functions or distributions specified from unknown model parameters,
selecting data, and running a learning algorithm based on data to choose
a value for unknown model parameters. The Bayesian machine learning
(BML) approach follows these same steps but with a few modifications
(see Neiswanger, 2019 for an overview of Bayesian learning). The model
definition is based on a “generative process” for the employed data,
i.e., a sequence of steps describing how data were created (including
unknown model parameters). The model incorporates prior beliefs about
these parameters, which take the form of distributions over values that
the parameters may take. The data are defined as observations made
through the generative process. Finally, after running the learning
algorithm, an updated belief regarding the parameters emerges based
on their new distribution. BML is especially useful in the following
cases:
• When the researcher has prior beliefs about the unknown model
parameters or regarding data generation.

• When little data are available, as Bayesian inference methods require the computation of posterior probabilities, often analytically
intractable when datasets are large (Sambasivan et al., 2018). How-
ever, new BML solutions are available for large dataset estimation
(see Braun and Damien, 2015).

• When there are many unknown model parameters, making it difficult to obtain accurate results based on raw data (without
added structure or information).

• When the researcher wishes to measure the distribution of uncertainty for the results rather than obtaining the single “best”
result (Neiswanger, 2019).
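The three steps above (generative model with a prior belief, observed data, updated belief) can be illustrated with the simplest conjugate case, a Beta-Binomial model of a conversion rate; the prior parameters and the data below are illustrative:

```python
# Sketch of the Bayesian learning loop with a conjugate Beta-Binomial
# model of a conversion rate: prior belief -> observed data -> posterior.
# Prior parameters and data are illustrative assumptions.

# 1) Model: conversions ~ Binomial(n, p), with prior belief p ~ Beta(a, b)
a, b = 2.0, 2.0          # weakly informative prior centered at 0.5

# 2) Data: observations from the assumed generative process
conversions, trials = 30, 100

# 3) Learning: the conjugate update yields the posterior Beta(a', b')
a_post = a + conversions
b_post = b + (trials - conversions)
posterior_mean = a_post / (a_post + b_post)

print(f"posterior: Beta({a_post:.0f}, {b_post:.0f}), mean = {posterior_mean:.3f}")
```

The full posterior distribution, rather than a single point estimate, is what allows the researcher to quantify uncertainty; non-conjugate models replace this closed-form update with sampling methods such as MCMC.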

BML presents some advantages for marketing research, including prediction accuracy, transparent procedures, interpretable results, and
analytic insights. Such benefits have become evident in different pub-
lications and a variety of marketing domains. For example, Cui et al.
(2006) demonstrated that BML could serve as a better method for
modeling consumers’ purchase histories over a long period. They show
how BML achieves a high level of predictive accuracy and offers a good
representation of the underlying distributions of purchase probabilities.
Sprigg et al. (2014) showed how BML overcomes the limitations of
traditional statistical learning to estimate the effects of promotional
treatments of datasets that are highly dimensional and sparse. Finally,
Braun and Damien (2015) proposed a new method for sampling from
the posterior distributions of Bayesian models whereby samples are
independent and can be collected in parallel, rendering them applicable
to large datasets, which is a limitation of traditional Bayesian methods.
In summary, these recent developments in ML are already trans-
forming the marketing landscape. However, such impacts will become
substantially more prominent in the future with the rapid adoption of
ML by marketing researchers.
7 Conclusions

This monograph aimed to provide an introductory discussion of the requirements, contributions, and impacts of ML for the field of marketing. Further, it gives an overview of AI and ML, its main approaches,
most important algorithms, relevance to marketing, and how market-
ing researchers can learn ML. It also analyzes published articles and
promising working papers about ML applications, followed by future
developments of ML for the field of marketing.
Despite its enormous potential to facilitate the creation of new
marketing knowledge, researchers should not expect ML to change
the essence of marketing. It will not. However, ML has the potential
to mitigate some of the field’s weaknesses. For example, the adverse
effects of the “replication crisis” (see Camerer et al., 2018) may be
addressed using ML. One of the causes of such a crisis relates to a lack
of statistical power, poor experimental designs, and a lack of robust
analytical strategies used in previous scientific studies. These problems
are common to many of the social sciences, including marketing. For
example, it is common in consumer behavior research to fit models using
all available data rather than splitting them into training and validation/test
sets. The simple practice of testing the capacity of a model to generalize to unseen data (as the fundamental goal of ML) (Domingos, 2012) may
alone improve the generalizability of many marketing models.
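This holdout practice takes only a few lines; the synthetic data and the 75/25 split below are illustrative:

```python
# Sketch of the holdout practice: fit on a training split, then check
# generalization on data the model has never seen. Synthetic data.
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.2, size=200)

idx = rng.permutation(200)
train, test = idx[:150], idx[150:]        # 75/25 holdout split

beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
test_mse = np.mean((y[test] - X[test] @ beta) ** 2)

# A test error close to the irreducible noise level (0.2**2) indicates
# that the model generalizes rather than memorizes.
print(round(test_mse, 3))
```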
Although this monograph discussed some of the most critical ap-
plications and advances of ML for marketing, many others were not
analyzed due to space limitations. This list includes applications and
advances for voice recognition, natural language processing, augmented
and virtual reality, and the use of so-called “notebooks” (open-source
web applications for sharing codes and texts) that may popularize ML,
among many other features. Advances in hardware development, such as
those related to quantum computing, will also substantially impact the
field (see Biamonte et al., 2017), but such a discussion extends beyond
the scope of this monograph.
The marketing applications of machine learning already affect the
everyday lives of millions of consumers across the globe via recommenda-
tion systems, collaborative filtering, gaming, digital personal assistants,
logistics, distribution systems, etc. Such impacts tend to enhance in cer-
tain areas, such as those of self-driving and automated transportation,
environmental protection, health care, banking, and smart home and
city development, among many others. The associated challenges and
opportunities facing marketers are thus considerable.
Acknowledgements

The author expresses his gratitude to the National Council for Scientific
and Technological Development (CNPq, Brazil) for the scholarship
200863/2018-5 that supported his Visiting Scholar appointment at
Harvard University, during which the first draft of this text was written.
He is also grateful to the Chair Tramontina Eletrik for the research
funding 4458-X, to the Massachusetts Institute of Technology (MIT) for
his Visiting Scholar appointment and Connection Science Fellowship,
and to the Editorial team for its helpful suggestions on previous
versions of this monograph.

References

Annalect (2018). “7 ways computer vision helps marketers see better
performance”. Annalect. April 27. https://www.annalect.com/7-ways-computer-vision-helps-marketers-see-better-performance/.
Ansari, A., K. Jedidi, and S. Jagpal (2000). “A hierarchical Bayesian
methodology for treating heterogeneity in structural equation mod-
els”. Marketing Science. 19(4): 328–347.
Ansari, A., Y. Li, and J. Z. Zhang (2018). “Probabilistic topic model
for hybrid recommender systems: A stochastic variational Bayesian
approach”. Marketing Science. 37(6): 987–1008.
Armbruster, B. and E. Delage (2015). “Decision making under uncer-
tainty when preference information is incomplete”. Management
Science. 61(1): 111–128.
Ascarza, E. (2018). “Retention futility: Targeting high-risk customers
might be ineffective”. Journal of Marketing Research. 55(1): 80–98.
Atahan, P. and S. Sarkar (2011). “Accelerated learning of user profiles”.
Management Science. 57(2): 215–239.
Athey, S. and G. Imbens (2016). “Recursive partitioning for hetero-
geneous causal effects”. Proceedings of the National Academy of
Sciences. 113(27): 7353–7360.

Barocas, S. and M. Hardt (2017). “Fairness in machine learning”. In:
Proceedings of the Thirty-First Conference on Neural Information
Processing Systems. December 4. https://nips.cc/Conferences/2017/Schedule?showEvent=8734.
Barocas, S. and A. Selbst (2016). “Big data’s disparate impact”. Cali-
fornia Law Review. 104(3): 1–671.
Bernstein, F., S. Modaresi, and D. Sauré (2019). “A dynamic clustering
approach to data-driven assortment personalization”. Management
Science. 65(5): 2095–2115.
Biamonte, J., P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and
S. Lloyd (2017). “Quantum machine learning”. Nature. 549(7671):
195–202.
Bishop, C. M. (2016). Pattern Recognition and Machine Learning. New
York: Springer.
Blanchard, S. J., D. Aloise, and W. S. DeSarbo (2017). “Extracting sum-
mary piles from sorting task data”. Journal of Marketing Research.
54(3): 398–414.
Blitzstein, J. K. and J. Hwang (2014). Introduction to Probability. CRC
Press.
Bolukbasi, T., K.-W. Chang, J. Zou, V. Saligrama, and A. Kalai (2016).
“Man is to computer programmer as woman is to homemaker? Debi-
asing word embeddings”. ArXiv: 1607.06520 [Cs, Stat]. url: http://
arxiv.org/abs/1607.06520.
Braun, M. and A. Bonfrer (2011). “Scalable inference of customer simi-
larities from interactions data using dirichlet processes”. Marketing
Science. 30(3): 513–531.
Braun, M. and P. Damien (2015). “Scalable rejection sampling for
Bayesian hierarchical models”. Marketing Science. 35(3): 427–444.
Brei, V. A., L. Nicolao, M. A. Pasdiora, and R. C. Azambuja (2020).
“An integrative model to predict product replacement using deep
learning on longitudinal data”. Brazilian Administration Review
(BAR). 17(2): 1–33.
Breiman, L. (2001). “Random forests”. Machine Learning. 45(1): 5–32.

Brownlee, J. (2016). “The machine learning mastery method”. Machine
Learning Mastery. October 9. https://machinelearningmastery.com/machine-learning-mastery-method/.
Camerer, C. F., A. Dreber, F. Holzmeister, T.-H. Ho, J. Huber, M.
Johannesson, M. Kirchler, G. Nave, B. A. Nosek, T. Pfeiffer, A.
Altmejd, N. Buttrick, T. Chan, Y. Chen, E. Forsell, A. Gampa, E.
Heikensten, L. Hummer, T. Imai, . . ., and H. Wu (2018). “Evaluating
the replicability of social science experiments in nature and science
between 2010 and 2015”. Nature Human Behaviour. 2(9): 637–644.
Cavagnaro, D. R., R. Gonzalez, J. I. Myung, and M. A. Pitt (2013).
“Optimal decision stimuli for risky choice experiments: An adaptive
approach”. Management Science. 59(2): 358–375.
Chahal, K. S. and K. Dey (2018). “A survey of modern object detection
literature using deep learning”. ArXiv: 1808.07256 [Cs]. url: http://
arxiv.org/abs/1808.07256.
Chandukala, S. R., Y. D. Edwards, and G. M. Allenby (2011). “Identi-
fying unmet demand”. Marketing Science. 30(1): 61–73.
Chen, Y., R. Iyengar, and G. Iyengar (2017). “Modeling multimodal
continuous heterogeneity in conjoint analysis—A sparse learning
approach”. Marketing Science. 36(1): 140–156.
Cheng, G. and J. Han (2016). “A survey on object detection in optical
remote sensing images”. ISPRS Journal of Photogrammetry and
Remote Sensing. 117: 11–28.
Chica, M. and W. Rand (2017). “Building agent-based decision support
systems for word-of-mouth programs: A freemium application”.
Journal of Marketing Research. 54(5): 752–767.
Chiong, K. X. and M. Shum (2019). “Random projection estimation of
discrete-choice models with large choice sets”. Management Science.
65(1): 256–271.
Chollet, F. (2017). Deep Learning with Python. Manning Publications
Company.
Chollet, F. and J. J. Allaire (2018). Deep Learning with R. Manning
Publications Company.
Chung, J. and V. R. Rao (2012). “A general consumer preference model
for experience products: Application to internet recommendation”.
Journal of Marketing Research. 49(3): 289–305.

Conick, H. (2017). The Past, Present and Future of AI in Marketing –
American Marketing Association. January 12. https://www.ama.org/marketing-news/the-past-present-and-future-of-ai-in-marketing/.
Cui, D. and D. Curry (2005). “Prediction in marketing using the support
vector machine”. Marketing Science. 24(4): 595–615.
Cui, G., M. L. Wong, and H.-K. Lui (2006). “Machine learning for direct
marketing response models: Bayesian networks with evolutionary
programming”. Management Science. 52(4): 597–612.
Currim, I. S., R. J. Meyer, and N. T. Le (1988). “Disaggregate tree-
structure modeling of consumer choice data”. Journal of Marketing
Research. 25(3): 253–265.
Cuzzocrea, A. (2014). “Privacy and security of big data: Current chal-
lenges and future research perspectives”. Proceedings of the First
International Workshop on Privacy and Security of Big Data: 45–47.
Dancho, M. (2018). TensorFlow for R: Deep Learning with Keras to
Predict Customer Churn. url: https://blogs.rstudio.com/tensorflow/posts/2018-01-11-keras-customer-churn/.
Davis, B. (2017). “10 uses of computer vision in marketing & customer
experience”. Econsultancy. February 3. https://econsultancy.com/10-uses-of-computer-vision-in-marketing-customer-experience/.
De Bruyn, A., J. C. Liechty, E. K. R. E. Huizingh, and G. L. Lilien
(2008). “Offering online recommendations with minimum customer
input through conjoint-based decision aids”. Marketing Science.
27(3): 443–460.
De Matos, M. G., P. Ferreira, and M. D. Smith (2018). “The effect of
subscription video-on-demand on piracy: Evidence from a household-
level randomized experiment”. Management Science: 5610–5630.
Deisenroth, M. P., A. A. Faisal, and C. S. Ong (2019). Mathematics
for Machine Learning. url: https://mml-book.com/.
Domingos, P. (2012). “A few useful things to know about machine
learning”. Communications of the ACM. 55(10): 78–87.
Dorschel, A. (2019). “Data privacy in machine learning: A technical
deep-dive”. Medium. April 25. https://medium.com/luminovo/data-privacy-in-machine-learning-a-technical-deep-dive-f7f0365b1d60.

Dubé, J.-P. and S. Misra (2019). Personalized Pricing and Customer
Welfare (SSRN Scholarly Paper ID 2992257). Social Science Research
Network.
Dwork, C., F. McSherry, K. Nissim, and A. Smith (2006). “Calibrating
noise to sensitivity in private data analysis”. Proceedings of the
Third Conference on Theory of Cryptography: 265–284.
Dzyabura, D. and J. R. Hauser (2011). “Active machine learning for
consideration heuristics”. Marketing Science. 30(5): 801–819.
Dzyabura, D., S. Jagabathula, and E. Muller (2019). “Accounting for
discrepancies between online and offline product evaluations”. Mar-
keting Science. 38(1): 88–106.
Dzyabura, D. and H. Yoganarasimhan (2018). “Machine learning and
marketing”. In: Handbook of Marketing Analytics. Edward Elgar
Publishing.
El Emam, K., E. Jonker, L. Arbuckle, and B. Malin (2011). “A system-
atic review of re-identification attacks on health data”. PLoS ONE.
6(12).
Evgeniou, T., C. Boussios, and G. Zacharia (2005). “Generalized robust
conjoint estimation”. Marketing Science. 24(3): 415–429.
Evgeniou, T., M. Pontil, and O. Toubia (2007). “A convex optimization
approach to modeling consumer heterogeneity in conjoint estima-
tion”. Marketing Science. 26(6): 805–818.
Farias, V. F. and A. A. Li (2019). “Learning preferences with side
information”. Management Science. 65(7): 3131–3149.
Fisher, A., C. Rudin, and F. Dominici (2019). “All models are wrong, but
many are useful: Learning a variable’s importance by studying an
entire class of prediction models simultaneously”. ArXiv: 1801.01489
[Stat]. url: http://arxiv.org/abs/1801.01489.
Garbade, D. M. J. (2018). “Clearing the confusion: AI vs. machine learn-
ing vs. deep learning differences”. Towards Data Science. September
14. https://towardsdatascience.com/clearing-the-confusion-ai-vs-machine-learning-vs-deep-learning-differences-fce69b21d5eb.

Gebru, T., J. Krause, Y. Wang, D. Chen, J. Deng, E. L. Aiden, and
L. Fei-Fei (2017). “Using deep learning and Google street view
to estimate the demographic makeup of neighborhoods across the
United States”. Proceedings of the National Academy of Sciences.
114(50): 13108–13113.
Geer, A. (2018). “How machine learning can help marketers measure
multi-touch attribution – American marketing association”. February
19. https://www.ama.org/2018/02/19/how-machine-learning-can-help-marketers-measure-multi-touch-attribution/.
Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn, Keras,
and TensorFlow: Concepts, Tools, and Techniques to Build Intelli-
gent Systems. (2nd edition). O’Reilly Media.
Gilbride, T. J., G. M. Allenby, and J. D. Brazell (2006). “Models for
heterogeneous variable selection”. Journal of Marketing Research.
43(3): 420–430.
Goldberg, D. E. and J. H. Holland (1988). “Genetic algorithms and
machine learning”. Machine Learning. 3(2): 95–99.
Goodfellow, I., Y. Bengio, and A. Courville (2016). Deep Learning.
Cambridge: MIT Press.
GDPR (2018). “General Data Protection Regulation (GDPR) – Official
Legal Text”. General Data Protection Regulation (GDPR). url:
https://gdpr-info.eu/.
Grushka-Cockayne, Y., V. R. R. Jose, and K. C. Lichtendahl (2017).
“Ensembles of overfit and overconfident forecasts”. Management
Science. 63(4): 1110–1130.
Guelman, L., M. Guillén, and A. M. Pérez-Marín (2015). “Uplift random
forests”. Cybernetics and Systems. 46(3–4): 230–248.
Hauser, J. R., G. L. Urban, G. Liberali, and M. Braun (2009). “Website
morphing”. Marketing Science. 28(2): 202–223.
Hauser, J. R., O. Toubia, T. Evgeniou, R. Befurt, and D. Dzyabura
(2010). “Disjunctions of conjunctions, cognitive simplicity, and con-
sideration sets”. Journal of Marketing Research. 47(3): 485–496.
He, T. (2018). “Xgboost v0.71.2. R Documentation”. url: https://www.rdocumentation.org/packages/xgboost/versions/0.71.2.

He, K., X. Zhang, S. Ren, and J. Sun (2015). “Deep residual learning for
image recognition”. ArXiv: 1512.03385 [Cs]. url: http://arxiv.org/abs/1512.03385.
Hitsch, G. J. and S. Misra (2018). Heterogeneous Treatment Effects
and Optimal Targeting Policy Evaluation (SSRN Scholarly Paper
ID 3111957). Social Science Research Network. doi: 10.2139/ssrn.
3111957.
Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods.
Springer Science & Business Media.
Hollenbeck, B. (2018). “Online reputation mechanisms and the decreas-
ing value of chain affiliation”. Journal of Marketing Research. 55(5):
636–654.
Homburg, C., L. Ehm, and M. Artz (2015). “Measuring and managing
consumer sentiment in an online community environment”. Journal
of Marketing Research. 52(5): 629–641.
Hu, M. (Mandy), C. (Ivy) Dang, and P. K. Chintagunta (2019). “Search
and learning at a daily deals website”. Marketing Science. 38(4):
609–642.
Huang, T. (1996). Computer Vision: Evolution and Promise. CERN.
Huang, D. and L. Luo (2016). “Consumer preference elicitation of com-
plex products using fuzzy support vector machine active learning”.
Marketing Science. 35(3): 445–464.
Huang, M.-H. and R. T. Rust (2018). “Artificial intelligence in service”.
Journal of Service Research. 21(2): 155–172.
Huang, Z., D. D. Zeng, and H. Chen (2007). “Analyzing consumer-
product graphs: Empirical findings and applications in recommender
systems”. Management Science. 53(7): 1146–1164.
Hulstaert, L. (2018). “Interpreting machine learning models”. Towards
Data Science. February 20. https://towardsdatascience.com/interpretability-in-machine-learning-70c30694a05f.
Hutter, F., R. Caruana, R. Bardenet, M. Bilenko, I. Guyon, B. Kégl,
and H. Larochelle (2014). AutoML workshop @ ICML’14. url:
https://sites.google.com/site/automlwsicml14/.
Jacobs, B. J. D., B. Donkers, and D. Fok (2016). “Model-based purchase
predictions for large assortments”. Marketing Science. 35(3): 389–
404.

Jagabathula, S. and P. Rusmevichientong (2019). “The limit of rationality
in choice modeling: Formulation, computation, and implications”.
Management Science. 65(5): 2196–2215.
James, G., D. Witten, T. Hastie, and R. Tibshirani (2013). An Introduc-
tion to Statistical Learning: With Applications in R. Springer-Verlag.
url: http://www.springer.com/us/book/9781461471370.
Jean, N., M. Burke, M. Xie, W. M. Davis, D. B. Lobell, and S. Ermon
(2016). “Combining satellite imagery and machine learning to predict
poverty”. Science. 353(6301): 790–794.
Jordan, M. I. and T. M. Mitchell (2015). “Machine learning: Trends,
perspectives, and prospects”. Science. 349(6245): 255–260.
Kaggle (2019). Datasets. Datasets | Kaggle. url: https://www.kaggle.com/datasets.
Kaplan, A. and M. Haenlein (2019). “Siri, Siri, in my hand: Who’s
the fairest in the land? On the interpretations, illustrations, and
implications of artificial intelligence”. Business Horizons. 62(1): 15–
25.
Kim, M. J. and A. E. B. Lim (2016). “Robust multiarmed bandit
problems”. Management Science. 62(1): 264–285.
Klette, R. (2014). Concise Computer Vision: An Introduction into
Theory and Algorithms. Springer.
Kuhn, M. and K. Johnson (2013). “Over-Fitting and Model Tuning”.
In: Applied Predictive Modeling. Ed. by M. Kuhn and K. Johnson.
New York: Springer. 61–92.
Lee, D., K. Hosanagar, and H. S. Nair (2018). “Advertising content and
consumer engagement on social media: Evidence from facebook”.
Management Science. 64(11): 5105–5131.
Lemmens, A. and C. Croux (2006). “Bagging and boosting classification
trees to predict churn”. Journal of Marketing Research. 43(2): 276–
286.
Li, Y. and A. Ansari (2014). “A Bayesian semiparametric approach
for endogeneity and heterogeneity in choice models”. Management
Science. 60(5): 1161–1179.
Li, N., T. Li, and S. Venkatasubramanian (2007). “t-closeness: Privacy
beyond k-anonymity and l-diversity”. 2007 IEEE 23rd International
Conference on Data Engineering: 106–115.

Li, Z., X. Fang, X. Bai, and O. R. L. Sheng (2017). “Utility-based link
recommendation for online social networks”. Management Science.
63(6): 1938–1952.
Liu, L. and D. Dzyabura (2016). “Capturing multi-taste preferences:
A machine learning approach”. SSRN Electronic Journal. doi: 10.
2139/ssrn.2729468.
Liu, J. and O. Toubia (2018). “A semantic approach for estimating
consumer content preferences from online search queries”. Marketing
Science. 37(6): 930–952.
Makridakis, S. (2017). “The forthcoming artificial intelligence (AI)
revolution: Its impact on society and firms”. Futures. 90: 46–60.
Marsland, S. (2015). Machine Learning: An Algorithmic Perspective.
(2nd edition). CRC Press.
Martech (2017). “How computer vision may impact the future of mar-
keting”. MarTech Today. September 6. https://martechtoday.com/
computer-vision-may-impact-future-marketing-203605.
Mathematicalmonk (2019). “Mathematicalmonk”. url: https://www.youtube.com/channel/UCcAtD_VYwcYwVbTdvArsm7w.
McMahan, B. and D. Ramage (2017). “Federated learning: Collaborative
machine learning without centralized training data”. Google AI Blog.
April 6. http://ai.googleblog.com/2017/04/federated-learning-collaborative.html.
Mishra, V. K., K. Natarajan, D. Padmanabhan, C.-P. Teo, and X. Li
(2014). “On theoretical and empirical aspects of marginal distribution
choice models”. Management Science. 60(6): 1511–1531.
Misra, K., E. M. Schwartz, and J. Abernethy (2019). “Dynamic on-
line pricing with incomplete information using multiarmed bandit
experiments”. Marketing Science. 38(2): 226–252.
Mitchell, T. M. (1997). Machine Learning. McGraw-Hill.
Molnar, C. (2019). “Interpretable machine learning: A guide for making
black box models explainable”. url: https://christophm.github.io/interpretable-ml-book/.
Montjoye, Y.-A. de, L. Radaelli, V. K. Singh, and A. “Sandy” Pentland
(2015). “Unique in the shopping mall: On the reidentifiability of
credit card metadata”. Science. 347(6221): 536–539.

Moon, S. and G. J. Russell (2008). “Predicting product purchase from
inferred customer similarity: An autologistic model approach”.
Management Science. 54(1): 71–82.
Munster, G. (2018). “IPhone intent to upgrade survey suggests more
predictable demand”. Loup Ventures. March 12. https://loupventures.com/iphone-intent-to-upgrade-survey-suggests-more-predictable-demand/.
Murphy, K. P. (2012). Machine Learning: A Probabilistic Perspective.
MIT Press.
Naik, P. A., M. K. Mantrala, and A. G. Sawyer (1998). “Planning media
schedules in the presence of dynamic advertising quality”. Marketing
Science. 17(3): 214–235.
Neiswanger, W. (2019). “Intro to modern Bayesian learning and
probabilistic programming”. Medium. January 15. https://medium.com/@Petuum/intro-to-modern-bayesian-learning-and-probabilistic-programming-c61830df5c50.
Neslin, S. A., S. Gupta, W. Kamakura, L. U. Junxiang, and C. H.
Mason (2006). “Defection detection: Measuring and understanding
the predictive accuracy of customer churn models”. Journal of
Marketing Research. 43(2): 204–211.
O’Leary, D. E. (1998). “Knowledge acquisition from multiple experts:
An empirical study”. Management Science. 44(8): 1049–1058.
Piramuthu, S., H. Ragavan, and M. J. Shaw (1998). “Using feature
construction to improve the performance of neural networks”. Man-
agement Science. 44(3): 416–430.
Puranam, D., V. Narayan, and V. Kadiyali (2017). “The effect of calorie
posting regulation on consumer opinion: A flexible latent dirichlet
allocation model with informative priors”. Marketing Science. 36(5):
726–746.
Rafieian, O. (2019a). Optimizing user engagement through adaptive ad
sequencing. Technical report, Working paper. url: http://staff .
washington.edu/rafieian/data/part1.pdf.
Rafieian, O. (2019b). “Revenue-Optimal Dynamic Auctions for Adaptive
Ad Sequencing”. url: https://staff.washington.edu/rafieian/data/part2.pdf.

Rafieian, O. and H. Yoganarasimhan (2020). Targeting and Privacy
in Mobile Advertising (SSRN Scholarly Paper ID 3163806). Social
Science Research Network. doi: 10.2139/ssrn.3163806.
Rao, V. R. (2014). Applied Conjoint Analysis. Springer.
Ribeiro, M. T., S. Singh, and C. Guestrin (2016). “Why should I trust
you?: Explaining the predictions of any classifier”. In: Proceedings
of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. 1135–1144.
Russell, S. and P. Norvig (2016). Artificial Intelligence: A Modern
Approach. CreateSpace Independent Publishing Platform.
Ryzhov, I. O., B. Han, and J. Bradić (2016). “Cultivating disaster
donors using data analytics”. Management Science. 62(3): 849–866.
Samarati, P. and L. Sweeney (1998). Protecting Privacy When Dis-
closing Information: K-Anonymity and Its Enforcement Through
Generalization and Suppression.
Sambasivan, R., S. Das, and S. K. Sahu (2018). “A Bayesian perspective
of statistical machine learning for big data”. ArXiv: 1811.04788 [Cs,
Stat]. url: http://arxiv.org/abs/1811.04788.
Schwartz, E. M., E. T. Bradlow, and P. S. Fader (2017). “Customer
acquisition via display advertising using multi-armed bandit experi-
ments”. Marketing Science. 36(4): 500–522.
Schweidel, D. A., Y.-H. Park, and Z. Jamal (2014). “A multiactivity
latent attrition model for customer base analysis”. Marketing Science.
33(2): 273–286.
Shapley, L. S. (1953). “A value for n-person games”. Contributions to
the Theory of Games. 2(28): 307–317.
Shmueli, G. (2010). “To explain or to predict?” Statistical Science. 25(3):
289–310.
Sonnier, G. P., L. McAlister, and O. J. Rutz (2011). “A dynamic model
of the effect of online communications on firm sales”. Marketing
Science. 30(4): 702–716.
Sprigg, J., K. Becker, and A. Cosmas (2014). “Estimating individual
promotional campaign impacts through Bayesian inference”. Journal
of Consumer Marketing. 31(6/7): 541–552.
Sun, B. (2006). “Technology innovation and implications for customer
relationship management”. Marketing Science. 25(6): 594–597.

Sutton, R. S. and A. G. Barto (2018). Reinforcement Learning: An
Introduction. MIT Press.
Tam, K. Y. and M. Y. Kiang (1992). “Managerial applications of neural
networks: The case of bank failure predictions”. Management Science.
38(7): 926–947.
Tillmanns, S., F. Ter Hofstede, M. Krafft, and O. Goetz (2017). “How
to separate the wheat from the chaff: Improved variable selection
for new customer acquisition”. Journal of Marketing. 81(2): 99–113.
Tirenni, G., A. Labbi, C. Berrospi, A. Elisseeff, T. Bhose, K. Pauro, and
S. Pöyhönen (2007). “Customer equity and lifetime management
(CELM) finnair case study”. Marketing Science. 26(4): 553–565.
Toubia, O. and L. Florès (2007). “Adaptive idea screening using con-
sumers”. Marketing Science. 26(3): 342–360.
Toubia, O. and O. Netzer (2017). “Idea generation, creativity, and
prototypicality”. Marketing Science. 36(1): 1–20.
Toubia, O., E. Johnson, T. Evgeniou, and P. Delquié (2013). “Dynamic
experiments for estimating preferences: An adaptive method of
eliciting time and risk parameters”. Management Science. 59(3):
613–640.
Toubia, O., G. Iyengar, R. Bunnell, and A. Lemaire (2019). “Extracting
features of entertainment products: A guided latent dirichlet alloca-
tion approach informed by the psychology of media consumption”.
Journal of Marketing Research. 56(1): 18–36.
Toubia, O., D. I. Simester, J. R. Hauser, and E. Dahan (2003). “Fast
polyhedral adaptive conjoint estimation”. Marketing Science. 22(3):
273–303.
Hastie, T., R. Tibshirani, and J. H. Friedman (2009). The Elements of
Statistical Learning: Data Mining, Inference, and Prediction. New
York, NY: Springer.
Trusov, M., L. Ma, and Z. Jamal (2016). “Crumbs of the cookie: User
profiling in customer-base analysis and behavioral targeting”. Mar-
keting Science. 35(3): 405–426.
Turing, A. M. (1950). “Computing machinery and intelligence”. Mind.
LIX(236): 433–460.

Urban, G. L., G. (Gui) Liberali, E. MacDonald, R. Bordley, and J. R.
Hauser (2014). “Morphing banner advertising”. Marketing Science.
33(1): 27–46.
Van Roy, B. and Y. Xiang (2010). “Manipulation robustness of collabo-
rative filtering”. Management Science. 56(11): 1911–1929.
Vepakomma, P., O. Gupta, T. Swedish, and R. Raskar (2018). “Split
learning for health: Distributed deep learning without sharing raw
patient data”. ArXiv: 1812.00564 [Cs, Stat]. url: http://arxiv.org/abs/1812.00564.
Verma, S. and J. Rubin (2018). “Fairness definitions explained”. Pro-
ceedings of the International Workshop on Software Fairness: 1–7.
doi: 10.1145/3194770.3194776.
Viger, F. and M. Latapy (2005). “Efficient and simple generation of
random simple connected graphs with prescribed degree sequence”.
In: Computing and Combinatorics. Ed. by L. Wang. Springer. 440–
449.
Wager, S. and S. Athey (2018). “Estimation and inference of hetero-
geneous treatment effects using random forests”. Journal of the
American Statistical Association. 113(523): 1228–1242.
Wang, S. (1995). “The unpredictability of standard back propagation
neural networks in classification applications”. Management Science.
41(3): 555–560.
Wang, J., A. Aribarg, and Y. F. Atchade (2013). “Modeling choice
interdependence in a social network”. Marketing Science. 32(6):
977–997.
Wang, W., J. Xu, and M. Wang (2018). “Effects of recommendation
neutrality and sponsorship disclosure on trust vs. Distrust in online
recommendation agents: Moderating role of explanations for organic
recommendations”. Management Science. 64(11): 5198–5219.
Xia, F., R. Chatterjee, and J. H. May (2019). “Using conditional re-
stricted boltzmann machines to model complex consumer shopping
patterns”. Marketing Science. 38(4): 711–727.
Xiao, L. and M. Ding (2014). “Just the faces: Exploring the effects
of facial features in print advertising”. Marketing Science. 33(3):
338–352.

Xu, K., H. Yue, L. Guo, Y. Guo, and Y. Fang (2015). “Privacy-preserving
machine learning algorithms for big data systems”. 2015 IEEE 35th
International Conference on Distributed Computing Systems: 318–327.
Yadav, A. (2018). “AutoML can be the next game-changing technology
in 2018 for marketing operations”. Forbes. March 9. https://www.forbes.com/sites/forbestechcouncil/2018/03/09/automl-can-be-the-next-game-changing-technology-in-2018-for-marketing-operations/.
Yoganarasimhan, H. (2020). “Search personalization using machine
learning”. Management Science. 66(3): 1045–1070.
Yoganarasimhan, H., E. Barzegary, and A. Pani (2019). “Design and
evaluation of personalized targeting policies: Application to free
trials”. url: https://faculty.washington.edu/hemay/Free_Trial.pdf.
Zhong, Z. (2019). “A tutorial on fairness in machine learning”. Medium.
July 27. https://towardsdatascience.com/a-tutorial-on-fairness-in-machine-learning-3ff8ba1040cb.
Zinkevich, M. (2019). “Rules of machine learning: Best practices for ML
engineering”. Google Developers. url: https://developers.google.com/machine-learning/guides/rules-of-ml/.
