Assignment Questions
A1 (21101-107)
The data describe housing prices in the Philadelphia area. Each of the 110 rows of this data
table describes a region of the metropolitan area. (Several make up the city of Philadelphia.)
One column, labeled Selling Price, gives the median price for homes sold in that area during
1999, in thousands of dollars. Another, labeled Crime Rate, gives the number of crimes
committed in that area per 100,000 residents. [PhiladelphiaHousing.xls]
a. Make a scatter plot of the selling price on the crime rate. Which observation stands
out from the others? Is this outlier unusual in terms of either marginal distribution?
b. Find the correlation using all of the data as shown in the prior scatter plot.
c. Exclude the distinct outlier and redraw the scatter plot, focusing on the rest of the data.
Does your impression of the relationship between the crime rate and selling price
change?
d. Compute the correlation without the outlier. Does it change much?
e. Can we conclude from the correlation that crimes in the Philadelphia area cause a rise
or fall in the value of real estate?
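As an illustration of the mechanics behind parts (b) and (d), the following is a minimal sketch of computing a Pearson correlation with and without one flagged observation. The function names and the five data points are made up for demonstration and are not taken from PhiladelphiaHousing.xls; the index of the outlier is assumed to have been identified from the scatter plot.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r_with_and_without(x, y, drop_index):
    """Return (r using all rows, r with the row at drop_index excluded)."""
    x_kept = [v for i, v in enumerate(x) if i != drop_index]
    y_kept = [v for i, v in enumerate(y) if i != drop_index]
    return pearson_r(x, y), pearson_r(x_kept, y_kept)

# Hypothetical values for illustration only (NOT the assignment data).
crime = [10.0, 20.0, 30.0, 40.0, 300.0]      # last point is far from the rest
price = [200.0, 180.0, 160.0, 140.0, 190.0]

r_all, r_trimmed = r_with_and_without(crime, price, drop_index=4)
print(f"all rows: r = {r_all:.3f}; without outlier: r = {r_trimmed:.3f}")
```

In this toy example a single distant point is enough to change the sign of the correlation, which is why the two values should be reported side by side.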
A2 (21108-114)
The owner of Showtime Movie Theatres Inc. would like to estimate weekly gross revenue as
a function of advertising expenditures. Historical data for a sample of eight weeks are given in
[Showtime.xls].
A3 (21115-121)
[TourDeFrance.xls]: These data give the times in minutes for the 140 cyclists who competed
in the 2006 Tour de France and raced in both time trial events. In a time trial, each cyclist
individually rides a set course. The rider with the shortest time wins the event. The courses in
the 2006 tour were each about 35 miles long. (Stage 7 was 52 km long and Stage 19 was 57
km long.)
a. Would you expect the times to be associated?
b. Make a scatter plot for these data, with the first set of times (from Stage 7) along the
x-axis. Do the variables appear to be associated? Describe the relationship, if any.
c. Find the correlation between these variables, if appropriate.
d. Identify the times for Floyd Landis in the scatter plot. Did he perform as well in the
second race, with the championship on the line, as in the first race?
A4 (21122-128)
a. Make a scatter plot of weight & highway mileage. Which variable do you think makes
the most sense to put on the x-axis and which belongs on the y-axis?
b. Describe any pattern that you see in the plot. Be sure to identify any outliers.
c. Find the correlation between these two variables.
d. Interpret the correlation in the context of these data. Does the correlation provide a
good summary of the strength of the relationship?
e. Describe the marginal distribution of MPG Highway and Weight. Include the mean &
SD of both variables.
f. Use the correlation line to estimate the mileage of a car that weighs 4000 pounds.
Does this seem like a sensible procedure?
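For part (f), the correlation line predicts the z-score of y as r times the z-score of x. The sketch below assumes that definition; the function name and the summary statistics passed to it are placeholders chosen for easy arithmetic, not values from the assignment data.

```python
def correlation_line_estimate(x_new, mean_x, sd_x, mean_y, sd_y, r):
    """Estimate y at x_new using the correlation line z_y = r * z_x."""
    z_x = (x_new - mean_x) / sd_x   # standardize the new x value
    z_y = r * z_x                   # predicted z-score of y
    return mean_y + z_y * sd_y      # convert back to the units of y

# Placeholder summary statistics (hypothetical, NOT from the data set).
estimate = correlation_line_estimate(
    x_new=4000.0,      # weight in pounds
    mean_x=3500.0, sd_x=500.0,
    mean_y=28.0, sd_y=5.0,
    r=-0.8,
)
print(estimate)  # 24.0 for these placeholder inputs
```

Substitute the mean, SD, and correlation found in parts (c) and (e) to obtain the actual estimate.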
A5 (21129-135)
An economist wishes to predict the market value of owner-occupied homes in small mid-
western cities. She has collected a set of data from 45 small cities for a 2-year period and
wants you to use this as the data source for the analysis. She wants you to develop two
prediction equations: one that uses the size of the house as a predictor and a second that uses
the tax rate as a predictor. [Citydat.xls]
a. Plot the market value of houses (hseval) versus the size of houses (sizehse), and then
versus the tax rates (taxrate). Note any unusual patterns in the data.
b. Prepare regression analyses for the two predictor variables. Which variable is the
stronger predictor of the value of houses?
c. A business developer in a mid-western state has stated that local property tax rates in
small towns need to be lowered because if they are not, no one will purchase a house
in these towns. Based on your analysis in this problem, evaluate the business
developer’s claim.
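One way to compare the two predictors in part (b) is to fit a least-squares line for each and compare the coefficients of determination. The following is a minimal sketch under that assumption; the function name `fit_line` and the toy numbers are hypothetical and do not come from Citydat.xls.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Toy values for illustration only (NOT the assignment data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
intercept, slope, r_squared = fit_line(toy_x, toy_y)
print(intercept, slope, r_squared)
```

Calling `fit_line` once with sizehse and once with taxrate as `x`, each against hseval as `y`, gives two R-squared values that can be compared directly.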
A6 (21136-142)
The following table [Auto1.xls] reports the horsepower, curb weight and the speed at ¼ mile
for 16 sports and GT cars (1998 Road & Track Sports & GT Cars).
a. Use curb weight as the independent variable and the speed at ¼ mile as the dependent
variable and draw the scatter plot.
b. Find the correlation between these two variables and interpret it in the context of these
data. Does the correlation provide a good summary of the strength of the relationship?
c. What is the estimated regression equation?
d. Use curb weight and horsepower as two independent variables and the speed at ¼
mile as the dependent variable. What is the estimated regression equation?
e. The 1999 Porsche 911 Carrera has been advertised as having a curb weight of 2990
pounds and an engine with 296 horsepower. Use the results in (d) to predict the speed
at ¼ mile for the Porsche 911.
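Parts (d) and (e) call for a regression with two independent variables followed by a prediction. A minimal sketch using NumPy's least-squares solver is shown below; the helper names are hypothetical, and the toy data are generated from a known equation (y = 1 + 2*x1 + 3*x2) purely to show the mechanics. They are not the values in Auto1.xls.

```python
import numpy as np

def fit_two_predictor_regression(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns [b0, b1, b2]."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the predictors.
    X = np.column_stack([np.ones(len(y)), x1, x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, x1, x2):
    """Evaluate the fitted equation at a single (x1, x2) pair."""
    return coef[0] + coef[1] * x1 + coef[2] * x2

# Toy data built from y = 1 + 2*x1 + 3*x2 (illustration only).
x1 = [0.0, 1.0, 0.0, 1.0, 2.0]
x2 = [0.0, 0.0, 1.0, 1.0, 1.0]
y = [1.0, 3.0, 4.0, 6.0, 8.0]

coef = fit_two_predictor_regression(x1, x2, y)
print(coef)
print(predict(coef, 2.0, 2.0))
```

With the actual data, `x1` and `x2` would hold curb weight and horsepower, and `predict` would be called with 2990 and 296.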
A7 (21143-149)
[Coffee.xls]:
The temperature in December in Buffalo, New York is often below 40 degrees Fahrenheit (4
degrees Celsius). Not surprisingly, when the National Football League's Buffalo Bills play at
home in December, coffee is a popular item at the concession stand. The concession manager
would like to acquire more information so that he can manage inventories more efficiently.
The numbers of cups of coffee sold during 50 games played in December in Buffalo were
recorded. Suppose that in addition to recording the coffee sales, the manager also recorded
the average temperature (measured in degrees Fahrenheit) during the game. These data, together
with the number of cups of coffee sold, were recorded.
A8 (21150-156)
If the airline flight that you are on is 20 minutes late departing, can you expect the pilot to
make up these minutes by, say, flying faster than usual? These data summarize the status of a
sample of 984 flights during 2006 (Bureau of Transportation Statistics). [FlightDelays.xls]
a. Do you expect the number of minutes that the flight is delayed departing to be
associated with the arrival delay?
b. Make a scatter plot of the arrival delay (in minutes) on the departure delay (also in
minutes). Summarize the association present in the scatter plot, if any.
c. Find the correlation between arrival delay and departure delay and interpret the result.
d. How is the correlation affected by the evident outlier, a flight with very long delays?
e. How would the correlation change if delays were measured in hours rather than
minutes?
A9 (21157-163)
A10 (21164-171)
The table in [ForFunds] gives the annual return, the safety rating (0 = riskiest, 10 = safest)
and the annual expense ratio for 20 foreign funds (Mutual Fund, March 2000).
a. Use annual expense ratio as the independent variable and the annual return as the
dependent variable to draw the scatter plot. Compute the coefficient of determination
and describe what it tells you.
b. What is the estimated regression equation?
c. Develop an estimated regression equation relating the annual return to the safety
rating and the annual expense ratio.
d. Estimate the annual return for a firm that has a safety rating of 7.5 and an annual expense
ratio of 2.