
Modeling population growth in the United States

Luke Paulsen
OpenIntro
openintro.org
CC BY-SA∗

1 Introduction

Population growth can define a community. Communities that grow rapidly may see increased
investment, while contracting communities may find local assets, such as homes and businesses,
falling in value relative to those of nearby communities. In this investigation, we wish to determine
which demographic factors relate most closely to population growth in U.S. counties from 2000
to 2010.

2 Data Exploration

Data for each county are available from the US Census website, including age, gender, race, and
education, along with other relevant demographics such as homeownership, employment, and income.
Five counties from these data are summarized in Table 1 (see footnote 1 for the data source). This
investigation considers only a subset of variables and is limited to counties where those variables
are complete. The resulting data set covers 3,083 counties and 23 variables. A complete list of the
variables under consideration, along with variable descriptions, is available at
www.openintro.org/stat/data/cc.php

        growth   pop2000   age under 5   age under 18   female   black   hs grad   bachelors
1        24.96     43671           6.6           26.8     51.3    17.7      85.3        21.7
2        29.80    140415           6.1           23.0     51.1     9.4      87.6        26.8
3        -5.44     29038           6.2           21.9     46.9    46.9      71.9        13.5
4        10.03     20826           6.0           22.7     46.3    22.0      74.5        10.0
...        ...       ...           ...            ...      ...     ...       ...         ...
3143      8.49      6644           5.7           21.8     47.4     0.3      91.1        17.9

Table 1: Five rows from the countyComplete data set with 8 of the 23 variables.
∗ This document is released under a Creative Commons Attribution-ShareAlike 3.0 license.
1 These data were collected from the US Census website. The data are available in the openintro R package and
also as a tab-delimited text file at openintro.org/stat.

The growth variable in Table 1 is the population growth rate for each county from 2000 to
2010, and it serves as the response variable for the analysis. It is summarized in Table 2
and Figure 3. Growth rates for the ten-year period average 5.4%, with the middle half
ranging from -2.2% to 10.4%.

Mean     Median   St. Dev.   IQR      Min      Max
5.42%    3.29%    13.18%     12.70%   -46.6%   110.40%

Table 2: Statistical summaries of population growth in US counties from 2000 to 2010.
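
The summaries in Table 2 are standard ones, and the following is a minimal sketch of how they could be computed. It uses only the five growth values printed in Table 1, so its results will not match Table 2, which is based on all 3,083 counties. The function name `summarize` and the choice of the "inclusive" quantile method are assumptions made for illustration.

```python
import statistics

# The five growth rates (percent) printed in Table 1. The full data set has
# 3,083 counties, so these numbers will NOT reproduce Table 2.
growth = [24.96, 29.80, -5.44, 10.03, 8.49]

def summarize(values):
    """Return the same kinds of summaries reported in Table 2."""
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "st_dev": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
        "min": min(values),
        "max": max(values),
    }

summary = summarize(growth)
for name, value in summary.items():
    print(f"{name:>7}: {value:.2f}")
```

Different software packages use slightly different quartile conventions, so an IQR computed elsewhere may differ a little from this one.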

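[Figure 3 image not reproduced: a histogram with Growth (percent) on the horizontal axis and
Frequency on the vertical axis, alongside a color scale running from <−30 to >30.]

Figure 3: Population growth across the United States from 2000 to 2010.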

Several other variables in the data set are worth exploring. The pop2000 variable measures the
population of the county during the 2000 census and is shown in Figure 4. The variable is very
skewed, so we will use the natural logarithm of population in the model. Taking the natural
logarithm of population allows us to measure population differences in terms of multiplication
rather than addition. For example, a difference of 1,000 people would be important in a county
with population 10,000 but less so in a county with population 1,000,000. Using the natural
logarithm for population means that differences are compared geometrically, e.g. comparing counties
with populations of 1,000 and 10,000 will be analogous to comparing two counties with 10,000
and 100,000 people in the model.
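
The effect of the transformation can be checked with a few lines of arithmetic. This is only an illustrative sketch; the helper name `log_gap` is made up for the example.

```python
import math

def log_gap(pop_a, pop_b):
    """Difference in natural-log population between two counties."""
    return math.log(pop_b) - math.log(pop_a)

# A tenfold difference gives the same gap on the log scale at any size.
small_pair = log_gap(1_000, 10_000)
large_pair = log_gap(10_000, 100_000)
print(small_pair, large_pair)

# Adding 1,000 people is a far bigger step on the log scale for a county
# of 10,000 than for a county of 1,000,000.
print(log_gap(10_000, 11_000))
print(log_gap(1_000_000, 1_001_000))
```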

[Figure 4 image not reproduced: two histograms with Frequency on the vertical axis. Left panel:
Population, Year 2000, with a note that 34 counties with a population greater than 1,000,000
are not shown. Right panel: Natural Log of Population, Year 2000.]

Figure 4: Distribution of populations. Left: original populations. Right: log-transformed populations.

[Figure 5 image not reproduced: three scatterplots. Left: Percent under Age 18 against Percent
under Age 5. Middle: Percent White (not Hispanic) against Percent Black. Right: Percent With
Bachelors Degree against Percent With High School Degree.]

Figure 5: Three figures that highlight the collinearity of several predictors.

There are also several groups of variables that divide the population with respect to a particular
statistic: age, race, or education level. We expect these variables to be related to one another,
and this relationship must be considered when interpreting the results. Figure 5 highlights the
relationships among some of these variables.
The first plot in Figure 5 suggests that the variables age under 5 and age under 18 are strongly
correlated. The diagonal line in the second plot represents the fact that the percentages of each
racial group in the population cannot sum to more than 100%. The percentage of the population
that self-identifies as some race other than black or non-Hispanic white is represented by the
distance of a point from the downward-trending diagonal. The relationship in the third plot is
somewhat weaker, but it shows that the percentage of the population with a bachelor’s degree is
always smaller than the percentage that completed high school, as would be expected.
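
One way to put a number on the strength of such a relationship is the correlation coefficient. The sketch below computes it for the five rows shown in Table 1 only, so the result is an illustration of the calculation rather than the correlation for the full data set. The function name `pearson_r` is an assumption for this example.

```python
import math

# Values copied from the five rows of Table 1 (not the full data set).
age_under_5 = [6.6, 6.1, 6.2, 6.0, 5.7]
age_under_18 = [26.8, 23.0, 21.9, 22.7, 21.8]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(age_under_5, age_under_18)
print(f"r = {r:.2f}")
```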

3 Analysis

Two variables can be linearly related. For example, the left panel of Figure 5 shows a positive trend
relating age under 5 and age under 18. This trend looks linear, and it can be modeled, even if
imperfectly, by using a straight line. Such a line would have error for individual observations, but
it would capture the overall structure of the relationship.

3.1 Modeling population growth

When working with many variables, the principles of the linear model can be generalized: we
simultaneously fit many variables against a response rather than one variable at a time.
We begin by writing a formula that models the growth rate as a linear combination of all the other
variables that we are considering:
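$$
\widehat{\text{growth}} = \beta_0 + \beta_1 \times \log(\text{pop2000}) + \beta_2 \times \text{female} + \cdots + \beta_{21} \times \text{poverty} + \beta_{22} \times \text{sales per capita}
$$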

Statistical software can be used to find the best-fitting model, that is, to compute point
estimates of β0, β1, ..., β22.
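
As a rough illustration of what such software does, the sketch below fits a small multiple regression by least squares. It is not the analysis from this report: the data are simulated, only two predictors are used, and the "true" coefficients are invented for the example. It assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated stand-ins for two predictors (NOT the census data).
log_pop = rng.normal(10.0, 1.5, n)
female = rng.normal(50.0, 2.0, n)

# Invented coefficients: intercept, log_pop slope, female slope.
true_beta = np.array([5.0, 1.5, -0.3])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), log_pop, female])
growth = X @ true_beta + rng.normal(0.0, 1.0, n)

# Least-squares point estimates of the coefficients.
beta_hat, *_ = np.linalg.lstsq(X, growth, rcond=None)
fitted = X @ beta_hat
residuals = growth - fitted

print("estimates:", np.round(beta_hat, 3))
```

Because the model includes an intercept, the residuals from a least-squares fit sum to zero up to rounding error.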
To improve the model, we perform model selection, eliminating variables one at a time using
backward selection until all remaining variables are statistically significant. The model
following backward selection is summarized in Table 6.
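
The report does not state the exact elimination rule, so the following is only a sketch of one common version of backward selection: refit, drop the predictor with the smallest absolute t value, and stop once every remaining |t| exceeds a cutoff. The cutoff of 1.96 (roughly a 5% significance level for large samples), the function names, and the simulated data are all assumptions. It assumes NumPy is available.

```python
import numpy as np

def ols_t_values(X, y):
    """Least-squares coefficients and t values (X includes the intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

def backward_select(X, y, names, t_cutoff=1.96):
    """Drop the weakest predictor until all remaining |t| >= t_cutoff."""
    keep = list(range(X.shape[1]))  # column 0 is the intercept, never dropped
    while len(keep) > 1:
        _, t = ols_t_values(X[:, keep], y)
        abs_t = np.abs(t[1:])                   # skip the intercept
        weakest = int(np.argmin(abs_t))
        if abs_t[weakest] >= t_cutoff:
            break
        del keep[weakest + 1]                   # +1 offsets the intercept
    return [names[i] for i in keep]

# Simulated data: x1 and x2 matter, "noise" does not.
rng = np.random.default_rng(1)
n = 2000
x1, x2, noise = rng.normal(size=(3, n))
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, noise])

selected = backward_select(X, y, ["(Intercept)", "x1", "x2", "noise"])
print(selected)
```

An irrelevant predictor will still survive this procedure some of the time by chance, which is one reason the selected model should be interpreted with care.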

                            Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)                   5.3168       6.3879      0.83     0.4053
log(pop2000)                  1.6556       0.2007      8.25     0.0000
age under 5                   3.3118       0.4179      7.92     0.0000
age under 18                 -0.6160       0.1605     -3.84     0.0001
age over 65                  -0.2666       0.0816     -3.26     0.0011
female                       -0.3026       0.1080     -2.80     0.0051
hispanic                      0.1320       0.0327      4.03     0.0001
white not hispanic            0.0803       0.0145      5.54     0.0000
no move in one plus year     -0.5614       0.0520    -10.79     0.0000
foreign born                  0.2509       0.0632      3.97     0.0001
foreign spoken at home       -0.1744       0.0471     -3.70     0.0002
bachelors                     0.5281       0.0431     12.24     0.0000
mean work travel              0.7156       0.0434     16.48     0.0000
housing multi unit           -0.4930       0.0347    -14.22     0.0000
median val owner occupied     0.0000       0.0000      4.20     0.0000
persons per household         9.9882       1.3261      7.53     0.0000
per capita income            -0.0003       0.0001     -3.73     0.0002
poverty                      -0.3266       0.0506     -6.46     0.0000
sales per capita              0.0002       0.0000      5.74     0.0000

Table 6: Model summary for the regression model predicting population growth
after model selection. See Section 2 for a link that provides variable descriptions.
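
Each t value in Table 6 is the estimate divided by its standard error. The short check below recomputes a few of them from the printed figures; small gaps of about 0.01 are expected because the table shows rounded estimates and standard errors. The dictionary layout and the function name `t_value` are just for this illustration.

```python
# (estimate, standard error, reported t value) copied from Table 6
rows = {
    "log(pop2000)": (1.6556, 0.2007, 8.25),
    "age under 5": (3.3118, 0.4179, 7.92),
    "bachelors": (0.5281, 0.0431, 12.24),
    "mean work travel": (0.7156, 0.0434, 16.48),
}

def t_value(estimate, std_error):
    """t statistic for testing whether a coefficient is zero."""
    return estimate / std_error

for name, (est, se, reported) in rows.items():
    print(f"{name:<18} computed {t_value(est, se):6.2f}   reported {reported:6.2f}")
```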

The variables black, hs grad, and density were eliminated during model selection. However,
variables that we would expect to be closely correlated with these variables – hispanic and
white not hispanic with black, and bachelors with hs grad – still appear in the model. As we
saw in the Data Exploration section, some variables are highly correlated, i.e. they are collinear.
When predictors are collinear, having one in a multiple regression model may be about as good as
having both, and this may explain why black and hs grad were eliminated during model
selection.
In the age variables there is a surprise of a different type. The variables age under 5 and age under 18
are highly collinear, but both are still included in the model, and the model suggests they have
opposing effects on population growth. It may be tempting to make a standard interpretation of
the coefficients, but that could be misleading because the two variables are collinear (see Figure 5).
For example, dropping age under 5 results in the coefficient of age under 18 changing from -0.62
to 0.40. The practical interpretation of each of these variables is complicated by the other
variables in the model.
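
A sign change of this kind is easy to reproduce with simulated data. In the sketch below, predictor `b` is nearly a copy of predictor `a`; the response and all coefficients are invented, and nothing here uses the census data. It assumes NumPy is available.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(2)
n = 3000
a = rng.normal(size=n)                      # stands in for one predictor
b = a + 0.3 * rng.normal(size=n)            # nearly a copy of a (collinear)
y = 3.0 * a - 1.0 * b + rng.normal(size=n)  # invented response

ones = np.ones(n)
full = fit(np.column_stack([ones, a, b]), y)   # both predictors
reduced = fit(np.column_stack([ones, b]), y)   # a dropped

print("coefficient of b with a in the model:", round(full[2], 2))
print("coefficient of b after dropping a:   ", round(reduced[1], 2))
```

With `a` in the model the coefficient of `b` is negative, but once `a` is dropped, `b` absorbs most of the positive effect of `a` and its coefficient turns positive.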

3.2 Diagnostics

In order to assess the multiple regression model, we check conditions on the model’s residuals. The
general requirements are that the residuals are roughly normal, have approximately the same
variance, and are independent. We leave it to the reader to check whether any nonlinear
relationships exist between the predictors and the growth variable.
Figure 7 is a normal probability plot of the model’s residuals. There is clear curvature, and the tails
at the corner of the graph indicate that some of the observations have unusually distant residuals
from zero. While this would be a substantial concern for a model with only a small number of data
points, over 3,000 counties are being used here, so the influence of these outlying residuals should
be very limited.

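[Figure 7 image not reproduced: a normal probability plot with Sample Quantiles on the vertical
axis (about −50 to 50) and horizontal-axis ticks from −3 to 3.]

Figure 7: Normal probability plot for the residuals following model selection.
There is clear curvature, but the outliers are probably reasonable for the size
of this data set.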
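
The points in a normal probability plot pair each sorted residual with the standard normal quantile expected at its rank. The sketch below builds those pairs without drawing anything. The residuals are made up, and the plotting positions (i + 0.5) / n are one common convention among several; the function name is an assumption.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # (i + 0.5) / n is one common choice of plotting position
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(residuals)))

# Made-up residuals with two distant values in the tails.
residuals = [-12.0, -3.1, -1.4, -0.2, 0.5, 1.1, 2.3, 4.0, 15.0]
points = normal_plot_points(residuals)

for theo, sample in points:
    print(f"{theo:6.2f}  {sample:6.1f}")
```

If the residuals were close to normal, the pairs would fall near a straight line; tails that bend away from the line correspond to residuals farther from zero than a normal model predicts.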

Figure 8 shows the residuals plotted against their fitted values. The variance is approximately
constant, with perhaps a small increase in variability at larger predicted values. One county,
Kalawao County in Hawaii, had a predicted value far from the cloud, at (-59.4%, 20.6%). This
small and isolated county was previously a quarantine site for leprosy patients; no new residents
are allowed to move to this county. Due to the unusual nature of this county, this observation
should be excluded in future analyses.
Figure 9 is useful for checking spatial independence of the residuals. In a model that fully explained
the observations, we would expect the residual values to be randomly distributed geographically;
instead there are definite geographic patterns and clusters of similar residuals. For example, the
model fails to account for variables such as climate, which may help explain why adjacent counties
tend to have similar residual values. This figure indicates there are additional features remaining
within the data that were not captured by the multiple regression model presented here, violating
the independence condition for the residuals.
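
The report judges spatial clustering by eye from the map. One standard numerical summary, which is not used in this report and is introduced here only as an illustration, is Moran's I: values well above zero indicate that neighbors tend to have similar residuals. The sketch below uses a toy chain of six regions with invented residuals; the function name and the neighbor structure are assumptions.

```python
def morans_i(values, neighbors):
    """Moran's I with weight 1 for each listed neighbor pair."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of products of deviations over all (region, neighbor) pairs
    cross = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    weight_total = sum(len(js) for js in neighbors.values())
    return (n / weight_total) * cross / sum(d * d for d in dev)

# Toy map: six regions in a row, each bordering the next.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

clustered = [5.0, 4.0, 3.0, -3.0, -4.0, -5.0]    # similar values adjacent
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # neighbors always differ

print(morans_i(clustered, chain))
print(morans_i(alternating, chain))
```

For real county data the neighbor lists would have to come from county boundary information, which is not part of the data set described here.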

[Figure 8 image not reproduced: a scatterplot of Residuals against Predicted Values, with the
point for Kalawao County, Hawaii labeled.]

Figure 8: Residuals versus fitted values from the regression model.

[Figure 9 image not reproduced: a map of residuals with a color scale running from <−18 to >18.]

Figure 9: Residuals plotted by their location. Empty spaces represent counties
that had missing data and were not included in the analysis.

3.3 Practical interpretation of model coefficients

We will proceed to estimate the impact of many variables on population growth, but we want to
highlight that these findings may be somewhat unreliable due to the violation of the independence
condition for the residuals. Each variable’s coefficient was multiplied by the variable’s IQR to get
a scaled impact for the variable, shown in Table 10. The proper way to interpret each value is,
“The growth rate for a county at the 75th versus the 25th percentile in this variable, other things
being equal, would be estimated as this much higher over ten years.”

log(pop2000)               age under 5                  age under 18
2.8%                       4.3%                         -2.3%

age over 65                female                       hispanic
-1.3%                      -0.4%                        0.9%

white not hispanic         no move in one plus year     foreign born
2.2%                       -3.2%                        1.0%

foreign spoken at home     bachelors                    mean work travel
-1.3%                      5.0%                         5.1%

housing multi unit         median val owner occupied    persons per household
-4.8%                      1.1%                         2.6%

per capita income          poverty                      sales per capita
-1.9%                      -2.6%                        1.4%

Table 10: The values in this table represent the estimated difference in growth
rate for a county at the 75th versus the 25th percentile in each variable, other
things being equal.
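
The scaling behind Table 10 is a single multiplication. The sketch below shows it in both directions: computing a scaled impact from a coefficient and two quartiles, and backing out the IQR implied by a coefficient in Table 6 and its scaled value in Table 10. The quartiles in the first call are hypothetical round numbers, not values from the data, and both function names are assumptions.

```python
def scaled_impact(coefficient, q25, q75):
    """Coefficient times IQR: estimated change in growth (percent)
    between the 25th and 75th percentile of a predictor."""
    return coefficient * (q75 - q25)

def implied_iqr(scaled, coefficient):
    """IQR implied by a reported scaled impact and coefficient."""
    return scaled / coefficient

# Hypothetical quartiles, for illustration only.
print(scaled_impact(0.5, q25=10.0, q75=20.0))

# bachelors: coefficient 0.5281 (Table 6), scaled impact 5.0% (Table 10).
# The result is approximate because both inputs are rounded.
print(round(implied_iqr(5.0, 0.5281), 1))
```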

4 Conclusion

In this investigation, we attempted to model a U.S. county’s population growth based on readily
available demographic data, a potentially useful tool for economic and other applications. We
found strong statistical evidence that many of the demographic variables measured by the 2010 U.S.
Census (including age, racial, and demographic distribution, economic conditions, and household
makeup) were important in modeling a county’s population growth between 2000 and 2010. Taken
together in a multiple regression model, the measured variables appear to explain nearly half of the
variation in growth rate among counties.
Of the variables measured, the percentage of the population with a bachelor’s degree may be
especially important in terms of population growth. We suspect that mean commute time, despite
its modestly large estimated coefficient, is not a driver of population growth but a result of
migration to suburbs, which often require longer commute times. It is also important to consider
that many of the variables examined are related to one another, which complicates the
interpretation of many model coefficients. This makes it especially difficult to draw causal
conclusions from the current model.
Further analysis of how the model’s variables are related to one another, possibly including
transformations of some variables in the model, may be helpful in eliminating this source of error
and in providing more definite results. In addition, this model includes no information on geographic
location and does not distinguish between urban, suburban, and rural areas. These types of
information appear to be important in determining county growth rates and should likely be included
as variables in future investigations.
