Brei 2020
This article may be used only for the purpose of research, teaching, and/or private study. Commercial use or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher approval.
Boston — Delft
Contents
1 Introduction
7 Conclusions
Acknowledgements
References
Machine Learning in Marketing
Vinicius Andrade Brei
Universidade Federal do Rio Grande do Sul (UFRGS), Brazil;
[email protected]
ABSTRACT
The widespread impacts of artificial intelligence (AI) and
machine learning (ML) in many segments of society have not
yet been felt strongly in the marketing field. Despite this shortfall, ML offers a variety of potential benefits, including the opportunity to apply more robust methods for the generalization of scientific discoveries. To help reduce this gap, this monograph has four goals. First, to provide
marketing with an overview of ML, including a review of
its major types (supervised, unsupervised, and reinforce-
ment learning) and algorithms, relevance to marketing, and
general workflow. Second, to analyze two potential learning
strategies for marketing researchers to learn ML: the bottom-
up (that requires a strong background in general math and
calculus, statistics, and programming languages) and the
top-down (focused on the implementation of ML algorithms
to improve explanations and/or predictions given within the
domain of the researcher’s knowledge). The third goal is to
analyze the ML applications published in top-tier marketing
and management journals, books, book chapters, as well
as recent working papers on a few promising marketing re-
search sub-fields. Finally, the last goal of the monograph is to
discuss possible impacts of trends and future developments
of ML to the field of marketing.
crisis has occurred because researchers have found that many of the
most important scientific studies are difficult or impossible to replicate
or reproduce (see, for example, Camerer et al., 2018). As this monograph
will discuss, the fundamental goal of machine learning is to generalize beyond the examples provided by the training data (Domingos, 2012). Thus, one of the potential contributions of
ML to marketing (and to management in general) lies in its robustness
for the generation, testing, and generalization of scientific discoveries.
With these different academic and practical perspectives in mind, the
goal of this monograph is to provide marketing with an overview of ML
and to analyze required learning, applications, and future developments
involved in applying ML to marketing.
This monograph progresses as follows. The next section provides an overview of ML, including a review of its most relevant types and algorithms and their relevance to marketing. A typical ML workflow is then presented, followed by a section that proposes two learning strategies for management/marketing researchers interested in ML. That section is followed by a descriptive analysis of ML applications published in top-tier marketing and management journals, books, book chapters, and recent working papers that explore a few of the most promising marketing research sub-fields. The next section discusses how trends and future developments of ML can impact the field of marketing. The last section summarizes the monograph's contributions, limitations, and suggestions for future research.
2
Overview of Machine Learning
2.1. Machine Learning and the Most Relevant Algorithms
Figure 2.1: Relationships between artificial intelligence, machine learning, and deep
learning.
of numbers but can also include more complex and structured objects
such as images, sentences, texts, time series, etc. When an output (or
a response variable or target) is categorical or nominal, the problem
refers to classification or pattern recognition. When y is a real value,
the problem is known as a regression (Murphy, 2012). In supervised
learning, a comparative metric is used to evaluate how accurate the
prediction of y is, given x. Most ML algorithms (including DL) are used
for supervised learning. Examples of supervised learning algorithms
frequently used in marketing include linear and logistic regressions,
random forest, and support vector machines. Almost all applications
of DL also belong to this branch of ML, including those of object/face
detection, image segmentation/classification, speech recognition, and
language translation (Chollet and Allaire, 2018).
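The classification-versus-regression distinction described above can be made concrete with a minimal sketch. It assumes scikit-learn (the monograph does not prescribe a library), uses synthetic data, and the churn/lifetime-value framings are hypothetical marketing examples:

```python
# Illustrative sketch: supervised learning as classification vs. regression.
# Assumes scikit-learn; data are synthetic, not from the monograph.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # inputs x (features)

# Classification: categorical target y, e.g., "did the customer churn?"
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
acc = clf.score(X, y_class)              # comparative metric: accuracy

# Regression: real-valued target y, e.g., customer lifetime value.
y_reg = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_reg)
r2 = reg.score(X, y_reg)                 # comparative metric: R-squared
```

The only structural difference between the two problems is the type of the target; the comparative metric used to evaluate the prediction of y given x changes accordingly.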
The goal of unsupervised (or descriptive) learning is to find 'interesting patterns' in the data without the use of any targets (or outputs); that is, no responses (or labeled data) are provided.
the problem is much less specifically defined, as one does not know
which patterns to search for, and there is no straightforward metric
for determining the success of a learning task (Murphy, 2012). The
algorithm is designed to identify similarities between inputs so that
inputs with commonalities are categorized together (Marsland, 2015).
It is not uncommon to use unsupervised learning to understand data
patterns before attempting to solve a supervised learning problem. Some
of the most widely used unsupervised learning algorithms in the field
of marketing include principal components analysis (for dimensionality
reduction or the discovery of latent factors) and k-means clustering (for
segmentation).
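The two unsupervised algorithms named above can be sketched together on synthetic "segments" (assuming scikit-learn; the segment structure is invented for illustration):

```python
# Sketch of PCA (dimensionality reduction) and k-means (segmentation).
# Assumes scikit-learn; the two "customer segments" are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two well-separated synthetic segments in a 5-dimensional feature space.
segment_a = rng.normal(loc=0.0, size=(100, 5))
segment_b = rng.normal(loc=5.0, size=(100, 5))
X = np.vstack([segment_a, segment_b])

# PCA: compress 5 features into 2 latent components.
X_2d = PCA(n_components=2).fit_transform(X)

# k-means: recover the two segments without any labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_2d)
```

Note that no labeled responses enter either step: the clusters are judged only by the similarity structure of the inputs, which is exactly why success metrics for unsupervised tasks are less straightforward.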
Reinforcement learning (RL) involves mapping situations to actions
to maximize a numerical reward signal. The learner algorithm is not told
which actions to take. Instead, it must identify which actions maximize
rewards through trial and error. The basic premise is to capture the
most critical aspects of the problem faced by the learning agent, which
interacts with its environment over time to achieve a goal. This learning
agent must be able to sense the state of its environment and to engage
in actions that affect the state. RL as a system applies four key elements:
a policy (the agent’s way of behaving), a reward signal (the goal of a
2.2. The Relevance of Machine Learning for Marketing
(1) Problem definition and dataset assembly. In this stage, the re-
searcher should determine which type of problem she is facing
(e.g., binary or multiclass classification, regression, clustering, gen-
eration, or reinforcement learning). To solve the problem, the
researcher should identify inputs and outputs and required data.
Depending on the task at hand, the researcher may also wish to
identify hypotheses to be tested later.
The Machine Learning Workflow
(3) Evaluation protocol. Once the researcher knows what she is aiming
for, the next step is to establish ways to measure current progress.
Three commonly used protocols involve maintaining a hold-out validation dataset on which the model was not trained, running K-fold cross-validation, or performing iterated K-fold validation for highly accurate model evaluation when little data are available. K-fold cross-validation uses some of the available data to fit the model and a different portion to test it (Trevor et al., 2009).
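The K-fold protocol common to these evaluation schemes can be sketched as follows (assuming scikit-learn, which the monograph does not prescribe; data are synthetic):

```python
# Minimal sketch of K-fold cross-validation: each of K=5 folds is held
# out once while the model is fit on the remaining folds.
# Assumes scikit-learn; data are synthetic.
import numpy as np
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = (X[:, 0] > 0).astype(int)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), X, y, cv=cv)
mean_score = scores.mean()   # progress measure averaged over the K folds
```

Averaging over folds gives a less noisy progress measure than a single hold-out split, at the cost of fitting the model K times.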
(4) Data preparation. Once the researcher knows what the model
is being trained for, what should be optimized, and ways to
evaluate the chosen approach, she may start to train the models.
However, it is recommended to first format the data so that they
can be fed into an ML model. Different models have different requirements. For example, data for deep neural networks should be formatted as tensors, while decision-tree models typically require no prior rescaling or feature engineering.
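A common preparation step for neural-network-style models is rescaling features to comparable ranges, which tree models generally do not need. A minimal sketch, assuming scikit-learn and invented feature scales:

```python
# Sketch of data preparation: standardize raw features on very different
# scales before feeding a neural-network-style model.
# Assumes scikit-learn; the feature framing (age, spend) is hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
# Raw features on very different scales, e.g., age vs. annual spend.
X_raw = np.column_stack([rng.uniform(18, 80, 100),
                         rng.uniform(0, 50_000, 100)])

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_raw)
```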
(5) Developing a model that performs better than a baseline level. The
goal of this stage is to achieve statistical power by crafting a small
model capable of surmounting an initial “dumb” baseline. For
example, a balanced binary classification model should achieve an accuracy greater than 0.5, the expected value of a random guess. When it is not possible to beat the "dumb" baseline, this may be a signal that the outputs cannot be predicted from the input data or that the hypothesis cannot be tested with the available data. When the model passes this initial test, the researcher should choose a loss (or cost) function as a measure of how well the model performs in predicting the expected outcomes. At this point, one must define
the optimizer (or optimization algorithm), which ties together the
loss function and model parameters by updating the model in
response to the output of the loss function.
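Stage (5) can be sketched as follows, assuming scikit-learn (not prescribed by the monograph) and synthetic data; the "dumb" baseline is a majority-class predictor:

```python
# Sketch of stage (5): beat a "dumb" baseline before scaling the model up.
# Assumes scikit-learn; data are synthetic.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 3))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "Dumb" baseline: always predict the most frequent class (~0.5 here).
baseline = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
baseline_acc = baseline.score(X_te, y_te)

# Small model with statistical power: its log-loss (cross-entropy) is
# minimized by an optimizer that updates the parameters in response to
# the loss value (LBFGS by default in scikit-learn).
model_acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
```

If `model_acc` did not clear `baseline_acc`, that would signal that the outputs may not be predictable from these inputs, which is exactly the diagnostic described above.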
(6) Scaling up: developing a model that overfits. Once the researcher
obtains a model that has statistical power, the next step is to de-
termine whether the model generalizes well. The universal tension
in ML lies between optimization and generalization. That is, the
ideal model is that which straddles the line between underfitting
(when the model cannot capture the underlying trend observed in
the data) and overfitting (when the model captures noise in the
data). The model must also balance undercapacity with overcapac-
ity, i.e., its ability to fit a wide variety of functions. To determine
how large a model should be, the researcher should develop a
model that overfits and then adjust its architecture while monitoring training and validation/test losses. Once the validation/test loss stops improving and begins to rise while the training loss continues to fall, this serves as evidence that the model has started to overfit. At this point, one must start regularizing
and tuning the model so that it is as similar as possible to the
ideal model.
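The training-versus-validation dynamic of stage (6) can be illustrated with plain NumPy polynomial fits (a toy of this writer's construction, not the monograph's example), where polynomial degree stands in for model capacity:

```python
# Toy illustration of stage (6): increase capacity until the model
# overfits, monitoring training vs. validation error. Pure NumPy.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=40)
x_tr, y_tr, x_va, y_va = x[:20], y[:20], x[20:], y[20:]

def errors(degree):
    """Fit a polynomial of the given degree on the training half and
    return (training MSE, validation MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_va, y_va)

# Degree stands in for capacity: underfit -> good fit -> overfit.
train_err, val_err = zip(*(errors(d) for d in (1, 3, 15)))
# Training error keeps shrinking as capacity grows, while validation
# error typically deteriorates once the model starts fitting noise.
```

Plotting both error curves against capacity makes the divergence point visible; that point is where the regularization and tuning of stage (7) should begin.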
(7) Model regularization and tuning. In this stage, the researcher
modifies, trains, and re-evaluates the model until it is optimized.
Regularization involves accepting a slightly rougher fit of the function to the training set in order to reduce generalization error and avoid overfitting (Trevor et al., 2009). A set of tuning (hyper)parameters governs the model's
How to Learn Machine Learning
that often requires several years of training before the first results start
to emerge.
The bottom-up approach usually starts with the mastering of general
math and calculus, statistics, and programming languages. This goal
can be achieved by completing traditional academic courses or by using
online learning platforms. KhanAcademy.org and specialized websites
focused on the math used for ML, such as Mathematics for Machine
Learning (Deisenroth et al., 2019), are good examples of high-quality
online learning platforms. YouTube channels like Mathematicalmonk
(2019) can also be used to learn about this approach. Understanding the
bottom-up approach also involves the study of classical (e.g., Blitzstein
and Hwang, 2014) and Bayesian probability (e.g., Hoff, 2009), as well as
ML books that use more formal mathematical and probabilistic notation,
such as Bishop (2016), Trevor et al. (2009), Murphy (2012), and
Goodfellow et al. (2016).
The bottom-up approach also requires strong programming skills. At the time of writing, the most popular ML language
is Python. New scientific discoveries related to ML are published in
specialized conference proceedings (e.g., the International Conference
on Machine Learning (ICML)), on websites (e.g., arXiv.org) and in aca-
demic journals (e.g., IEEE Transactions on Evolutionary Computation
or Foundations and Trends in Machine Learning). As in any academic
field, the closer one gets to achieving specialized communication (e.g.,
articles published in academic journals on ML), the harder it becomes
for outsiders to learn ML using an approach other than the bottom-up.
In pursuing a different goal from that of the bottom-up approach, the
top-down approach is designed to implement ML algorithms to improve
explanations or predictions given within the domain of the researcher’s
knowledge. Such applications can be used in the realms of business,
health, law, communication, art, etc. With such an approach, researchers
are not necessarily striving to improve existing algorithms or to advance
the knowledge frontier of ML. Instead, their goal is to apply ML to
solve specific problems related to their disciplines. Given the applied
nature of the management and marketing disciplines, many students and
researchers of these areas prefer the top-down approach over the bottom-
up approach. However, the choice of the former approach may bring
Distribution platform:
Anaconda: Free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment.
Analysis of Machine Learning Applications in Marketing
Trends and Future Developments of ML in Marketing
In addition to the complex tasks that AutoML aims to automate and the
privacy/security data concerns, another barrier to the adoption of ML
relates to the fact that computers usually do not explain their predictions.
Such ML models are typically called “black-box models” (Molnar, 2019).
A company may build a model that correlates marketing campaign data
with financial data to evaluate whether a given campaign was effective.
across features and each of their levels (Molnar, 2019). The logic of
this distribution echoes that of the traditional conjoint analysis method
(Rao, 2014) developed by marketing researchers and widely used in the
field since the 1970s. However, conjoint analysis involves estimating
the relevance of each feature value based on a relatively limited set
of features. In contrast, the Shapley value can substantially scale up
a feature value analysis for large samples and complex choice models.
The knowledge derived from the Shapley values can be applied to a
broad range of marketing phenomena, including those related to product
development, customer segmentation, and relationship marketing.
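To make the Shapley value itself concrete, here is a self-contained toy computation over three hypothetical marketing "features" (the names, worth function, and synergy bonus are all invented for illustration); it averages each player's marginal contribution over every order in which players can join a coalition:

```python
# Exact Shapley values for a toy 3-player cooperative game, stdlib only.
from itertools import permutations
from math import factorial

players = ("price", "brand", "ad_spend")      # hypothetical feature names
base = {"price": 10.0, "brand": 20.0, "ad_spend": 5.0}

def worth(coalition):
    """Toy worth of a coalition: individual contributions plus a synergy
    bonus when "price" and "brand" appear together (all values invented)."""
    v = sum(base[p] for p in coalition)
    if "price" in coalition and "brand" in coalition:
        v += 6.0
    return v

def shapley(player):
    """Average the player's marginal contribution over every join order."""
    total = 0.0
    for order in permutations(players):
        before = set()
        for p in order:
            if p == player:
                total += worth(before | {p}) - worth(before)
                break
            before.add(p)
    return total / factorial(len(players))

phi = {p: shapley(p) for p in players}
# The attributions sum to worth(players): the "efficiency" property.
```

Exact computation enumerates all orders, which grows factorially; libraries such as SHAP approximate this at scale, which is what makes Shapley-based feature value analysis feasible for large samples and complex choice models.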
Finally, local interpretable model-agnostic explanations (LIME)
(Ribeiro et al., 2016) are used to explain the individual predictions of
black-box ML models. Rather than training a global surrogate model, LIME bases its predictions on local surrogate model estimations. It generates a new dataset of permuted samples and the corresponding new predictions. From this new dataset, LIME then trains a new model, which is weighted by the proximity of the sampled instances to the instance of interest (Molnar, 2019). LIME has a broad range of applications in the field of marketing. Any problem that requires the evaluation of individual customer behaviors or decisions (in B2B or B2C contexts) may be explained by LIME, including those of churn (e.g., Dancho, 2018), product replacement, or adherence to loyalty programs.
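The LIME procedure just described (permuted samples, black-box predictions, proximity weighting, local linear surrogate) can be sketched in a few lines of NumPy. The `black_box` function, perturbation scale, and kernel width below are hypothetical stand-ins, not the cited implementation:

```python
# Minimal LIME-style local surrogate, pure NumPy.
import numpy as np

rng = np.random.default_rng(6)

def black_box(X):
    # Stand-in for an opaque model, e.g., predicted churn probability.
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

x0 = np.array([0.5, -0.5])                 # instance to explain

# 1) Generate permuted samples around the instance of interest.
Z = x0 + rng.normal(scale=0.3, size=(500, 2))
# 2) Query the black box for the corresponding new predictions.
y = black_box(Z)
# 3) Weight samples by proximity to x0 (RBF kernel, width assumed).
w = np.exp(-np.sum((Z - x0) ** 2, axis=1) / 0.25)
# 4) Fit a proximity-weighted linear surrogate via weighted least squares.
A = np.column_stack([np.ones(len(Z)), Z])
W = np.sqrt(w)[:, None]
coef, *_ = np.linalg.lstsq(A * W, y * W[:, 0], rcond=None)
intercept, w1, w2 = coef                   # local feature attributions
```

The signs of `w1` and `w2` explain, locally, which features push this particular prediction up or down, even though the black box itself remains opaque.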
that there is more recognition that even models developed with the best
of intentions may exhibit discriminatory biases, perpetuate inequality,
or perform less well for historically disadvantaged groups (Barocas and
Hardt, 2017).
Barocas and Selbst (2016) and Zhong (2019) summarized a list of
possible causes of bias in ML systems:
• Sample size disparity: if the training data sample coming from the minority group is smaller than the one coming from the majority group, the model is unlikely to represent the minority group correctly.
7
Conclusions
unseen data (as the fundamental goal of ML) (Domingos, 2012) may
alone improve the generalizability of many marketing models.
Although this monograph discussed some of the most critical ap-
plications and advances of ML for marketing, many others were not
analyzed due to space limitations. This list includes applications and
advances for voice recognition, natural language processing, augmented
and virtual reality, and the use of so-called "notebooks" (open-source web applications for sharing code and text) that may popularize ML,
among many other features. Advances in hardware development, such as those related to quantum computing, will also substantially impact the
field (see Biamonte et al., 2017), but such a discussion extends beyond
the scope of this monograph.
The marketing applications of machine learning already affect the
everyday lives of millions of consumers across the globe via recommenda-
tion systems, collaborative filtering, gaming, digital personal assistants,
logistics, distribution systems, etc. Such impacts tend to intensify in certain areas, such as those of self-driving and automated transportation,
environmental protection, health care, banking, and smart home and
city development, among many others. The associated challenges and
opportunities facing marketers are thus considerable.
Acknowledgements
The author expresses his gratitude to the National Council for Scientific
and Technological Development (CNPq, Brazil) for the scholarship
200863/2018-5 that supported his Visiting Scholar appointment at Harvard University, during which the first draft of this text was written. He is also grateful to the Chair Tramontina Eletrik for the research funding 4458-X, to the Massachusetts Institute of Technology (MIT) for his Visiting Scholar appointment and Connection Science Fellowship, and for the Editorial team's helpful suggestions on previous versions of this monograph.
References
He, K., X. Zhang, S. Ren, and J. Sun (2015). “Deep residual learning for
image recognition”. ArXiv: 1512.03385 [Cs]. url: http://arxiv.org/
abs/1512.03385.
Hitsch, G. J. and S. Misra (2018). Heterogeneous Treatment Effects
and Optimal Targeting Policy Evaluation (SSRN Scholarly Paper
ID 3111957). Social Science Research Network. doi: 10.2139/ssrn.
3111957.
Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods.
Springer Science & Business Media.
Hollenbeck, B. (2018). “Online reputation mechanisms and the decreas-
ing value of chain affiliation”. Journal of Marketing Research. 55(5):
636–654.
Homburg, C., L. Ehm, and M. Artz (2015). “Measuring and managing
consumer sentiment in an online community environment”. Journal
of Marketing Research. 52(5): 629–641.
Hu, M. (Mandy), C. (Ivy) Dang, and P. K. Chintagunta (2019). “Search
and learning at a daily deals website”. Marketing Science. 38(4):
609–642.
Huang, T. (1996). Computer Vision: Evolution and Promise. CERN.
Huang, D. and L. Luo (2016). “Consumer preference elicitation of com-
plex products using fuzzy support vector machine active learning”.
Marketing Science. 35(3): 445–464.
Huang, M.-H. and R. T. Rust (2018). “Artificial intelligence in service”.
Journal of Service Research. 21(2): 155–172.
Huang, Z., D. D. Zeng, and H. Chen (2007). “Analyzing consumer-
product graphs: Empirical findings and applications in recommender
systems”. Management Science. 53(7): 1146–1164.
Hulstaert, L. (2018). "Interpreting machine learning models". Towards Data Science. February 20. https://towardsdatascience.com/interpretability-in-machine-learning-70c30694a05f.
Hutter, F., R. Caruana, R. Bardenet, M. Bilenko, I. Guyon, B. Kégl,
and H. Larochelle (2014). AutoML workshop @ ICML’14. url:
https://sites.google.com/site/automlwsicml14/.
Jacobs, B. J. D., B. Donkers, and D. Fok (2016). “Model-based purchase
predictions for large assortments”. Marketing Science. 35(3): 389–
404.