MODERN BUSINESS
ANALYTICS
Practical Data Science for Decision Making
Matt Taddy
Amazon, Inc.
Leslie Hendrix
University of South Carolina
Matthew C. Harding
University of California, Irvine
All credits appearing on page or at the end of the book are considered to be an extension of the copyright page.
The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a website
does not indicate an endorsement by the authors or McGraw Hill Education, and McGraw Hill Education
does not guarantee the accuracy of the information presented at these sites.
mheducation.com/highered
BRIEF CONTENTS
1 Regression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2 Uncertainty Quantification. . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3 Regularization and Selection . . . . . . . . . . . . . . . . . . . . . . . 100
4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
CONTENTS
Guided Tour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Practical Data Science for Decision Making xi
An Introductory Example xii
Machine Learning xiv
Computing with R xv
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Linear Regression 3
Residuals 15
Logistic Regression 21
Likelihood and Deviance 26
Time Series 30
Spatial Data 46
2 Uncertainty Quantification. . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Frequentist Uncertainty 56
False Discovery Rate Control 67
The Bootstrap 72
More on Bootstrap Sampling 86
Bayesian Inference 91
3 Regularization and Selection. . . . . . . . . . . . . . . . . . . . . . . . 100
Out-of-Sample Performance 101
Building Candidate Models 108
Model Selection 130
Uncertainty Quantification for the Lasso 144
4 Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Nearest Neighbors 152
Probability, Cost, and Classification 158
Classification via Regression 160
Multinomial Logistic Regression 163
Bibliography 419
Glossary 424
Acronyms 433
Index 435
PREFACE
GUIDED TOUR
This book is based on the Business Data Science text by Taddy (2019), which was itself developed
as part of the MBA data science curriculum at the University of Chicago Booth School of Business.
This new adaptation creates a more accessible and course-ready textbook, and includes a major
expansion of the examples and content (plus an appendix tutorial on computing with R). Visit Connect for digital assignments, code, datasets, and additional resources.
It is also important to recognize that data science can be learned only by doing. This means
writing the code to run analysis routines on really messy data. We will use the R scripting lan-
guage for all of our examples. All example code and data is available online, and one of the
most important skills you will get out of this book will be an advanced education in this pow-
erful and widely used statistical software. For those who are completely new to R, we have also
included an extensive R primer. The skills you learn here will also prepare you well for learning
how to program in other languages, such as Python, which you will likely encounter in your
business analysis career.
This is a book about how to do modern business analytics. We will lay out a set of core
principles and best practices that come from statistics, machine learning, and economics. You
will be working through many real data analysis examples as you learn by doing. It is a book
designed to prepare scientists, engineers, and business professionals to use data science to
improve their decisions.
An Introductory Example
Before diving into the core material, we will work through a simple finance example to illus-
trate the difference between data processing or description and a deeper business analysis.
Consider the graph in Figure 0.1. This shows seven years of monthly returns for stocks in the
S&P 500 index (a return is the difference between the current and previous price divided by
the prior value). Each line ranging from bright yellow to dark red denotes an individual stock’s
return series. Their weighted average—the value of the S&P 500—is marked with a bold line.
Returns on three-month U.S. treasury bills are in gray.
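The return definition in parentheses is easy to compute directly. The following R sketch uses made-up prices for a single stock; the numbers are illustrative assumptions, not values from the S&P 500 data.

```r
# Made-up monthly closing prices for one stock (illustrative values)
prices <- c(100, 105, 102, 110)

# Return = (current price - previous price) / previous price
returns <- diff(prices) / head(prices, -1)

round(returns, 4)  # 0.0500 -0.0286 0.0784
```

The first return, for example, is (105 - 100)/100 = 0.05, a 5% gain for the month.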
This is a fancy plot. It looks cool, with lots of different lines. It is the sort of plot that you
might see on a computer screen in a TV ad for some online brokerage platform. If only I had
that information, I’d be rich!
FIGURE 0.1 A fancy plot: monthly stock returns for members of the S&P 500 and their average (the bold line). What can you learn?
But what can you actually learn from Figure 0.1? You can see that returns do tend to
bounce around near zero (although the long-term average is reliably much greater than zero).
You can also pick out periods of higher volatility (variance) where the S&P 500 changes more
from month to month and the individual stock returns around it are more dispersed. That’s
about it. You don’t learn why these periods are more volatile or when they will occur in the
future. More important, you can’t pull out useful information about any individual stock. There
is a ton of data on the graph but little useful information.
Instead of plotting raw data, let’s consider a simple market model that relates individual
stock returns to the market average. The capital asset pricing model (CAPM) regresses the
returns of an individual asset onto a measure of overall market returns, as shown here:
rjt = αj + βj mt + εjt    (0.1)
The output rjt is equity j return at time t. The input mt is a measure of the average return—the
“market”—at time t. We take mt as the return on the S&P 500 index that weights 500 large
companies according to their market capitalization (the total value of their stock). Finally, εjt is
an error that has mean zero and is uncorrelated with the market.
Equation (0.1) is the first regression model in this book. You’ll see many more. This is a
simple linear regression that should be familiar to most readers. The Greek letters define a line
relating each individual equity return to the market, as shown in Figure 0.2. A small βj, near zero,
indicates an asset with low market sensitivity. In the extreme, fixed-income assets like treasury
bills have βj = 0. On the other hand, a βj > 1 indicates a stock that is more volatile than the mar-
ket, typically meaning growth and higher-risk stocks. The αj is free money: assets with αj > 0 are
adding value regardless of wider market movements, and those with αj < 0 destroy value.
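In R, the CAPM line of Equation (0.1) is a one-line regression. The sketch below simulates an asset with an assumed α of 0.005 and β of 1.2 and recovers them with lm(); all numbers here are illustrative assumptions, not estimates from the S&P 500 data.

```r
set.seed(1)
m <- rnorm(84, mean = 0.01, sd = 0.04)       # 7 years of monthly "market" returns
r <- 0.005 + 1.2 * m + rnorm(84, sd = 0.02)  # asset returns following the CAPM model

fit <- lm(r ~ m)  # intercept estimates alpha, slope estimates beta
coef(fit)
```

Running the regression on each member of the index and plotting the fitted [α, β] pairs is exactly how a picture like Figure 0.3 is produced.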
Figure 0.3 represents each stock “ticker” in the two-dimensional space implied by the mar-
ket model’s fit on the seven years of data in Figure 0.1. The tickers are sized proportional to
each firm’s market capitalization. The two CAPM parameters—[α, β]—tell you a huge amount
about the behavior and performance of individual assets. This picture immediately allows you
to assess market sensitivity and arbitrage opportunities. For example, the big tech stocks of
Facebook (FB), Amazon (AMZN), Apple (AAPL), Microsoft (MSFT), and Google (GOOGL)
all have market sensitivity β values close to one. However, Facebook, Amazon, and Apple
generated more money independent of the market over this time period compared to Micro-
soft and Google (which have nearly identical α values and are overlapped on the plot). Note
FIGURE 0.2 A scatterplot of a single stock's returns against market returns, with the fitted regression line for the model of Equation (0.1) shown in red.
FIGURE 0.3 Stocks positioned according to their fitted market model, where α is money you make regardless of what the market does and β summarizes sensitivity to market movements. The tickers are sized proportional to market capitalization.
that Facebook’s CAPM parameters are estimated from a shorter time period, since it did not
have its IPO until May of 2012. Some of the older technology firms, such as Oracle (ORCL),
Cisco (CSCO), and IBM, appear to have destroyed value over this period (negative alpha).
Such information can be used to build portfolios that maximize mean returns and minimize
variance in the face of uncertain future market conditions. It can also be used in strategies
like pairs-trading where you find two stocks with similar betas and buy the higher alpha while
“shorting” the other.
CAPM is an old tool in financial analysis, but it serves as a great illustration of what to strive
toward in practical data science. An interpretable model translates raw data into information that
is directly relevant to decision making. The challenge in data science is that the data you’ll be
working with will be larger and less structured (e.g., it will include text and image data). Moreover,
CAPM is derived from assumptions of efficient market theory, and in many applications you won’t
have such a convenient simplifying framework on hand. But the basic principles remain the same:
you want to turn raw data into useful information that has direct relevance to business policy.
Machine Learning
Machine learning (ML) is the field of using algorithms to automatically detect and predict pat-
terns in complex data. The rise of machine learning is a major driver behind data science and a
big part of what differentiates today’s analyses from those of the past. ML is closely related to
modern statistics, and indeed many of the best ideas in ML have come from statisticians. But
whereas statisticians have often focused on model inference—on understanding the parameters
of their models (e.g., testing on individual coefficients in a regression)—the ML community
has historically been more focused on the single goal of maximizing predictive performance
(i.e., predicting future values of some response of interest, like sales or prices).
A focus on prediction tasks has allowed ML to quickly push forward and work with larger
and more complex data. If all you care about is predictive performance, then you don’t need
to worry about whether your model is “true” but rather just test how well it performs when
predicting future values. This single-minded focus allows rapid experimentation on alternative
models and estimation algorithms. The result is that ML has seen massive success, to the point
that, for almost any type of data, you can now expect an algorithm to be available that will work out of the box to recognize patterns and give high-quality predictions.
However, this focus on prediction means that ML on its own is less useful for many decision-
making tasks. ML algorithms learn to predict a future that is mostly like the past. Suppose that
you build an ML algorithm that looks at how customer web browser history predicts how much
they spend in your e-commerce store. A purely prediction-focused algorithm will discern which patterns of web traffic are associated with spending more or less money. It will not tell you what will happen to the
spending if you change a group of those websites (or your prices) or perhaps make it easier for
people to browse the Web (e.g., by subsidizing broadband). That is where this book comes in:
we will use tools from economics and statistics in combination with ML techniques to create a
platform for using data to make decisions.
Some of the material in this book will be focused on pure ML tasks like prediction and
pattern recognition. This is especially true in the earlier chapters on regression, classification,
and regularization. However, in later chapters you will use these prediction tools as parts of
more structured analyses, such as understanding subject-specific treatment effects, fitting
consumer demand functions, or as part of an artificial intelligence system. This typically
involves a mix of domain knowledge and analysis tools, which is what makes the data scientist
such a powerful figure. The ML tools are useless for policy making without an understanding
of the business problems, but a policy maker who can deploy ML as part of their analysis
toolkit will be able to make better decisions faster.
Computing with R
You don’t need to be a software engineer to work as a data scientist, but you need to be able
to write and understand computer code. To learn from this book, you will need to be able to
read and write in a high-level scripting language, in other words, flexible code that can be used
to describe recipes for data analysis. In particular, you will need to have a familiarity with R
(r-project.org).
The ability to interact with computers in this way—by typing commands rather than click-
ing buttons or choosing from a menu—is a basic data analysis skill. Having a script of com-
mands allows you to rerun your analyses for new data without any additional work. It also
allows you to make small changes to existing scripts to adapt them for new scenarios. Indeed,
making small changes is how we recommend you work with the material in this book. The
code for every in-text example is available on-line, and you can alter and extend these scripts
to suit your data analysis needs. In the examples for this book, all of the analysis will be con-
ducted in R. This is an open-source high-level language for data analysis. R is used widely
throughout industry, government, and academia. Companies like RStudio sell enterprise prod-
ucts built around R. This is not a toy language used simply for teaching purposes—R is the real
industrial-strength deal.
For the fundamentals of statistical analysis, R is tough to beat: all of the tools you need for
linear modeling and uncertainty quantification are mainstays. R is also relatively forgiving for
the novice programmer. A major strength of R is its ecosystem of contributed packages. These
are add-ons that increase the capability of core R. For example, almost all of the ML tools that
you will use in this book are available via packages. The quality of the packages is more varied
than it is for R’s core functionality, but if a package has high usage you should be confident that
it works as intended.
The Appendix of this book contains a tutorial that is dedicated to getting you started in R.
It focuses on the topics and algorithms that are used in the examples in this book. You don’t
need to be an expert in R to learn from this book; you just need to be able to understand the
fundamentals and be willing to mess around with the coded examples. If you have no formal
background in coding, worry not: many in the field started out in this position. The learning
curve can be steep initially, but once you get the hang of it, the rest will come fast. The tutorial
in the Appendix should help you get started. We also provide extensive examples throughout
the book, and all code, data, and homework assignments are available through Connect. Every
chapter ends with a Quick Reference section containing the basic R recipes from that chapter.
When you are ready to learn more, there are many great places where you can supplement your
understanding of the basics of R. If you simply search for R or R statistics books on-line, you
will find a huge variety of learning resources.
ACKNOWLEDGMENTS
We are grateful for the reviewers who provided feedback on this first edition:
Top: Jenner Images/Getty Images, Left: Hero Images/Getty Images, Right: Hero Images/Getty Images
Proctorio
Remote Proctoring & Browser-Locking
Capabilities
Remote proctoring and browser-locking capabilities, hosted by
Proctorio within Connect, provide control of the assessment
environment by enabling security options and verifying the identity of the student.
Seamlessly integrated within Connect, these services allow instructors to control students’
assessment experience by restricting browser activity, recording students’ activity, and verify-
ing students are doing their own work.
Instant and detailed reporting gives instructors an at-a-glance view of potential academic
integrity concerns, thereby avoiding personal bias and supporting evidence-based claims.
ReadAnywhere
Read or study when it’s convenient for you with McGraw Hill’s free ReadAnywhere app. Avail-
able for iOS or Android smartphones or tablets, ReadAnywhere gives users access to McGraw
Hill tools including the eBook and SmartBook 2.0 or Adaptive Learning Assignments in Con-
nect. Take notes, highlight, and complete assignments offline–all of your work will sync when
you open the app with WiFi access. Log in with your McGraw Hill Connect username and
password to start learning–anytime, anywhere!
OLC-Aligned Courses
Implementing High-Quality Online Instruction and Assessment through Preconfigured
Courseware
In consultation with the Online Learning Consortium (OLC) and our certified Faculty
Consultants, McGraw Hill has created pre-configured courseware using OLC’s quality score-
card to align with best practices in online course delivery. This turnkey courseware contains
a combination of formative assessments, summative assessments, homework, and application
activities, and can easily be customized to meet an individual’s needs and course outcomes. For
more information, visit https://www.mheducation.com/highered/olc.
efficiently find what they need, when they need it, across an entire semester of class record-
ings. Help turn your students’ study time into learning moments immediately supported by
your lecture. With Tegrity, you also increase intent listening and class participation by easing
students’ concerns about note-taking. Using Tegrity in Connect will make it more likely you
will see students’ faces, not the tops of their heads.
Writing Assignment
Available within Connect and Connect Master, the Writing Assignment tool delivers a learning
experience to help students improve their written communication skills and conceptual under-
standing. As an instructor you can assign, monitor, grade, and provide feedback on writing
more efficiently and effectively.
McGraw Hill’s comprehensive, cross-disciplinary content. Choose what you want from our
high-quality textbooks, articles, and cases. Combine it with your own content quickly and eas-
ily, and tap into other rights-secured, third-party content such as readings, cases, and articles.
Content can be arranged in a way that makes the most sense for your course and you can
include the course name and information as well. Choose the best format for your course: color
print, black-and-white print, or eBook. The eBook can be included in your Connect course and
is available on the free ReadAnywhere app for smartphone or tablet access as well. When you
are finished customizing, you will receive a free digital copy to review in just minutes! Visit
McGraw Hill Create®—www.mcgrawhillcreate.com—today and begin building!
1 Regression
This chapter develops the framework and language of regression: building models
that predict response outputs from feature inputs.
Section 1.1 Linear Regression: Specify, estimate, and predict from a linear
regression model for a quantitative response y as a function of inputs x. Use
log transforms to model multiplicative relationships and elasticities, and use
interactions to allow the effect of inputs to depend on each other.
Section 1.2 Residuals: Calculate the residual errors for your regression fit, and
understand the key fit statistics deviance, R2, and degrees of freedom.
Section 1.3 Logistic Regression: Build logistic regression models for a binary
response variable, and understand how logistic regression is related to linear
regression as a generalized linear model. Translate the concepts of deviance,
likelihood, and R2 to logistic regression, and be able to interpret logistic
regression coefficients as effects on the log odds that y = 1.
Section 1.4 Likelihood and Deviance: Relate likelihood maximization and
deviance minimization, use generalized linear models to determine residual
deviance, and use the predict function to integrate new data with the same
variable names as the data used to fit your regression.
Section 1.5 Time Series: Adapt your regression models to allow for
dependencies in data that has been observed over time, and understand time
series concepts including seasonal trends, autoregression, and panel data.
Section 1.6 Spatial Data: Add spatial fixed effects to your regression mod-
els and use Gaussian process models to estimate spatial dependence in your
observations.
The vast majority of problems in applied data science require regression modeling. You
have a response variable (y) that you want to model or predict as a function of a vector
of input features, or covariates (x). This chapter introduces the basic framework and lan-
guage of regression. We will build on this material throughout the rest of the book.
Regression is all about understanding the conditional probability distribution for “y given
x,” which we write as p(y|x). Figure 1.1 illustrates the conditional distribution in contrast to a
marginal distribution, which is so named because it corresponds to the unconditional distribu-
tion for a single margin (i.e., column) of a data matrix.
A variable that has a probability distribution (e.g., number of bathrooms in Figure 1.1) is
called a random variable. The mean for a random variable is the average of random draws from
its probability distribution. While the marginal mean is a simple number, the conditional mean
is a function. For example, from Figure 1.1b, you can see that the average home selling price
takes different values indexed by the number of bathrooms. The data is distributed randomly
around these means, and the way that you model these distributions drives your estimation and
prediction strategies.
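The contrast between a marginal and a conditional mean can be seen in one line of R each. This sketch uses made-up home prices (in $1000s), not the data behind Figure 1.1.

```r
# Made-up home prices (in $1000s) and bathroom counts (illustrative values)
price <- c(200, 250, 240, 400, 380, 650)
baths <- c(1, 1, 2, 2, 3, 3)

mean(price)                 # marginal mean: a single number
tapply(price, baths, mean)  # conditional mean: one average per bathroom count
```

Here the marginal mean is about 353, while the conditional means (225, 320, and 515 for one, two, and three bathrooms) form a function of the input, just as the text describes.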
Conditional Expectation
A basic but powerful regression strategy is to build models in terms of averages and lines. That
is, we will model the conditional mean for our output variable as a linear function of inputs.
Other regression strategies can sometimes be useful, such as quantile regression that models
percentiles of the conditional distribution. However for the bulk of applications you will find
that mean regression is a good approach.
There is some important notation that you need to familiarize yourself with for the rest of
the book. We model the conditional mean for y given x as
𝔼[y | x] = f (x′β) (1.1)
where
• 𝔼[⋅]denotes the taking of the expectation or average of whatever random variable is inside
the brackets. It is an extremely important operation, and we will use this notation to define
many of our statistical models.
(a) Marginal Distribution   (b) Conditional Distribution
FIGURE 1.1 Illustration of marginal versus conditional distributions for home prices. On the left, we have the marginal distribution for all of the home prices. On the right, home price distributions are conditional on the number of bathrooms.
• The vertical bar | means “given” or “conditional upon,” so that 𝔼[y|x]is read as “the
average for y given inputs x.”
• f (·) is a “link” function that transforms from the linear model to your response.
• x = [1, x1, x2, . . . xp] is the vector of covariates and β = [β0, β1, β2, . . . βp] are the
corresponding coefficients.
The vector notation, x′β, is shorthand for the sum of elementwise products:
x′β = [1 x1 x2 ⋯ xp] [β0 β1 β2 ⋯ βp]′ = β0 + x1β1 + x2β2 + … + xpβp    (1.2)
This shorthand notation will be used throughout the book. Here we have used the convention
that x0 = 1, such that β0 is the intercept.
The link function, f(·), defines the relationship between your linear function x′β and the
response. The link function gives you a huge amount of modeling flexibility. This is why mod-
els of the kind written in Equation (1.1) are called generalized linear models (GLMs). They
allow you to make use of linear modeling strategies after some simple transformations of your
output variable of interest. In this chapter we will outline the two most common GLMs: linear
regression and logistic regression. These two models will serve you well for the large majority
of analysis problems, and through them you will become familiar with the general principles
of GLM analysis.
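In R, both of these GLMs are fit with the glm() function, which pairs the linear function x′β with a link. A minimal sketch, using toy data that is purely illustrative:

```r
# Toy data (illustrative values, not from the book's examples)
x      <- c(1, 2, 3, 4, 5, 6)
y_cont <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)  # quantitative response
y_bin  <- c(0, 0, 1, 0, 1, 1)               # binary response

# Linear regression: Gaussian family with the identity link, f(z) = z
linear_fit <- glm(y_cont ~ x, family = gaussian)

# Logistic regression: binomial family with the logit link
logistic_fit <- glm(y_bin ~ x, family = binomial)
```

The formula syntax is the same in both calls; only the family (and hence the link f) changes, which is the practical payoff of the GLM framework.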
FIGURE 1.2 Simple linear regression with a positive slope β1. The plotted line corresponds to 𝔼[y|x].
FIGURE 1.3 Using simple linear regression to picture the Gaussian conditional distribution for y|x. Here 𝔼[y|x] are the values on the line and the variation parallel to the y axis (i.e., within each narrow vertical strip) is assumed to be described by a Gaussian distribution.
This says that the distribution for y as a function of x is normally distributed around
𝔼[y|x] = x′βwith variance σ2. The same model is often written with an additive error term:
y = x′β + ε,  ε ~ N(0, σ²)    (1.5)
where ε are the “independent” or “idiosyncratic” errors. These errors contain the variations in
y that are not correlated with x. Equations (1.4) and (1.5) describe the same model. Figure 1.3
illustrates this model for single-input simple linear regression. The line is the average 𝔼[y|x]
and vertical variation around the line is what is assumed to have a normal distribution.
You will often need to transform your data to make the linear model of Equation (1.5)
realistic. One common transform is that you need to take a logarithm of the response, say, “r,”
such that your model becomes
log(r) = x′β + ε,  ε ~ N(0, σ²)    (1.6)
Of course this is the same as the model in Equation (1.5), but we have just made the replace-
ment y = log(r). You will likely also consider transformations for the input variables, such that
elements of x include logarithmic and other functional transformations. This is often referred
to as feature engineering.
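A small simulation shows why the log transform matters: when the response is generated multiplicatively, fitting lm() on log(r) recovers the linear coefficients. The true values here (intercept 1, slope 0.5) are assumptions of the sketch, not numbers from the book.

```r
set.seed(2)
x <- runif(100, 0, 2)
r <- exp(1 + 0.5 * x + rnorm(100, sd = 0.1))  # multiplicative data-generating process

fit <- lm(log(r) ~ x)  # the model of Equation (1.6), with y = log(r)
coef(fit)              # intercept near 1, slope near 0.5
```

Fitting lm(r ~ x) directly on these data would chase a curved, heteroskedastic relationship; working on the log scale is what makes the linear model realistic.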
Example 1.1 Orange Juice Sales: Exploring Variables and the Need for a log-log Model As
a concrete example, consider sales data for orange juice (OJ) from Dominick’s grocery
stores. Dominick’s was a Chicago-area chain. This data was collected in the 1990s and
is publicly available from the Kilts Center at the University of Chicago’s Booth School of
FIGURE 1.4 Sales and log(sales) against price, by brand.
price. This makes sense: demand is downward sloping, and if you charge more, you sell less.
More specifically, it appears that log sales has a roughly linear relationship with log price. This
is an important point. Whenever you are working with linear (i.e., additive) models, it is crucial
that you try to work in the space where you expect to find linearity. For variables that change
multiplicatively with other factors, this is usually the log scale (see the nearby box for a quick
review on logarithms). For comparison, the raw (without log) values in Figure 1.4b show a
nonlinear relationship between prices and sales.
FIGURE 1.5 National GDP against imports, in original and log scale.
Example 1.2 Orange Juice Sales: Linear Regression Now that we have established what a
log-log model will do for us, let’s add a bit of complexity to the model from (1.7) to make it
more realistic. If you take a look at Figure 1.4c, it appears that the three brands have log-log
sales-price relationships that are concentrated around three separate lines. If you suspect that
each brand has the same β1 elasticity but a different intercept (i.e., if all brands have sales that
move with price the same way but at the same price some brands will sell more than others),
then you would use a slightly more complex model that incorporates both brand and price:
log(sales) = αbrand + β log(price) + ε    (1.13)
Here, αbrand is shorthand for a separate intercept for each OJ brand, which we could write out
more fully as
αbrand = α0 1[dominicks] + α1 1[minute.maid] + α2 1[tropicana].    (1.14)
The indicator functions, 1[v], are one if v is the true factor level and zero otherwise. Hence,
Equation (1.13) says that, even though their sales all have the same elasticity to price, the
brands can have different sales at the same price due to brand-specific intercepts.
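In R, a factor variable on the right-hand side of a formula expands into exactly the indicator functions of Equation (1.14). The sketch below uses simulated OJ-style data; the brand intercepts and the elasticity of -2 are illustrative assumptions, not estimates from the Dominick's data.

```r
set.seed(3)
brand    <- factor(rep(c("dominicks", "minute.maid", "tropicana"), each = 50))
logprice <- runif(150, 0, 1.5)

# Assumed brand intercepts and a shared price elasticity of -2
alpha    <- c(dominicks = 10, minute.maid = 10.5, tropicana = 11)
logsales <- alpha[as.character(brand)] - 2 * logprice + rnorm(150, sd = 0.3)

# R expands the factor into brand indicators: separate intercepts, one elasticity
fit <- lm(logsales ~ brand + logprice)
coef(fit)
```

In the output, (Intercept) corresponds to the first factor level (dominicks), and the brand coefficients are shifts from that baseline, so all three fitted lines share the logprice slope.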