Assignments123 2013
Homework Assignments 1, 2 & 3
Instructions:
1. This document has three homework assignments, one each for the first three weeks of
classes. All of the homework assignments use the same HousePrices.jmp dataset which is
available on the course website alongside this homework.
2. Each week's assignment requires you to perform some data analysis using JMP and turn in a
brief report of no more than 2 pages (sides). All submissions are due by 5 pm on the
due date and should be uploaded to Turnitin before that time. Instructions for uploading
to Turnitin will be forwarded by your AA separately.
3. Please remember to include your name, section number and ID at the top of the assignment.
4. The homework submissions will not be graded for correctness, but rather for effort. The
scoring will be binary: you will get full credit if your effort is deemed satisfactory; otherwise
you will get no credit. No credit will be given for late submissions. A timely submission that
responds to all questions and shows your thinking will be considered satisfactory whether or
not your solution is correct.
5. The solutions for the homework assignments will be made available to you after the due
date.
6. Honor code category 1 applies. This means that you may refer to any resource within or
outside of the course material, but only with appropriate citations. You may also discuss the
assignments with other students, but the submitted write-up should be entirely your own work.
Significant overlaps with other submissions may be considered possible violations of the
honor code.
7. All assignments are individual assignments and each assignment is worth 3 points.
8. Have fun.
Description of the dataset:
A real estate agent is trying to understand the nature of housing stock and home prices in
and around a medium-sized town in upstate New York. She has collected data from a
random sample of 1047 homes sold in the last 12 months. Data was collected on the
following variables, and is available in the attached HomePrices.jmp file.
Price: the sale price of the house, in $
Living Area: in sq. ft.
Bathrooms: number of bathrooms in the house (powder rooms with no tub or
shower area are counted as 0.5 baths)
Bedrooms: the number of bedrooms
Lot Size: size of the property on which the house sits, in acres
Age: age of the house, in years
Fireplace: whether or not the house has a fireplace (Yes = 1, No = 0)
Your task over the three assignments is to analyze this dataset in order to gain some
understanding of this particular real estate market: the values of homes, their
characteristics in terms of size and other features, and the relationships between these. This
understanding will prove immensely helpful to the real estate agent in advising her
clients. Since all of the homes are from the same geographical area, location (which
usually has a huge bearing on home values) is not a major concern here.
Most of the analysis will be done in response to the specific questions posed on the
homework assignments. But feel free to explore and play around with the data set to
enhance your own understanding of how to make sense of data.
Assignment 1
Due: Sunday, April 28, 2013
1. Prepare a brief report summarizing the home values (prices) in this area. Use both
graphical and numerical summaries. Your report should briefly describe what those
summaries tell you, and anything of particular note/interest.
2. Does the normal model provide a good description of the prices? Use a Normal
Quantile plot to frame your response.
3. Irrespective of your response to Q2, assume that Price ~ N(164K, (68K)^2). Given this:
A. Calculate the following probabilities: P(Price > 92.8K) and P(Price < 255.5K). Do
these numbers agree with what you see in the data?
B. Once again, assuming the above normal distribution, what percentage of
houses should have a value less than 232K? Does that agree with the data?
C. Based on the theoretical model, what do you expect should be the price of a
house that is exactly on the 3rd quartile (75th percentile)? How does that
compare to the actual?
4. Create a histogram and boxplot for the Living Area variable. What does the histogram
tell you that the boxplot does not, and vice-versa? Is the distribution symmetric?
Check the skewness measure to see if it is consistent with your observation.
5. Create a new column in the dataset by taking the logarithm of the Living Area
variable. Is the normal distribution a better fit for this variable or the original (Living
Area) variable? Why do you think this is the case?
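The normal-model calculations in Q3 can be cross-checked outside JMP. The following is a minimal Python sketch, not a required part of the assignment. It assumes the model stated in Q3, Price ~ N(164K, (68K)^2), with prices expressed in thousands of dollars; the function names are illustrative only. The printed values are theoretical and still need to be compared against the proportions and quantiles you observe in the data.

```python
from statistics import NormalDist

# Assumed model from Q3: mean 164 and standard deviation 68,
# both in thousands of dollars.
price_model = NormalDist(mu=164, sigma=68)


def prob_above(model, x):
    """P(X > x) under the given normal model."""
    return 1 - model.cdf(x)


def prob_below(model, x):
    """P(X < x) under the given normal model."""
    return model.cdf(x)


def quantile(model, p):
    """Value below which a fraction p of the distribution lies."""
    return model.inv_cdf(p)


print("P(Price > 92.8K): ", round(prob_above(price_model, 92.8), 4))
print("P(Price < 255.5K):", round(prob_below(price_model, 255.5), 4))
print("P(Price < 232K):  ", round(prob_below(price_model, 232), 4))
print("75th percentile (in $K):", round(quantile(price_model, 0.75), 1))
```

The same three helpers cover parts A, B, and C: two tail probabilities, one cumulative percentage, and one inverse lookup.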
Assignment 2
Due: Sunday, May 5, 2013
1. Create the 90%, 95%, and 99% confidence intervals for the average home price and
explain what these mean. How do the margins of error for these three confidence
intervals compare? Does that make sense? Before creating the confidence intervals,
be sure to check the conditions necessary to create confidence intervals (and briefly
describe this in your submission).
2. Your friend has asked you to provide an estimate for the 95th percentile of home
prices in this market. Which (if any) of the above confidence intervals can you use to
give an answer? Describe briefly.
3. The sample data given to you all come from home sales within the past 12 months.
Suppose you had sample data of the same size each year going back several years,
and calculated the average sale price for each year. What kind of distribution do you
expect to see for these averages and why? (Include the parameters of the distribution
in your response, assuming that the house prices don't change, i.e. go up or down,
over time. Clearly this is not a great assumption, but make it anyway.)
4. The architecture changed significantly in this geographical area about 30 years ago.
So any houses aged more than 30 years are considered "old" houses. What
proportion of the houses in the sample are old? Provide the 95% and 99% confidence
intervals for the proportion of old houses in this area, and interpret them. Once
again, make sure that the necessary conditions are satisfied before creating
confidence intervals.
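The interval formulas behind Q1 and Q4 can be sketched in a few lines of Python, as a way to check your understanding of what JMP reports. This is a rough sketch under stated assumptions: it uses the large-sample normal (z) approximation, whereas JMP's interval for a mean is based on the t distribution (with n = 1047 the two are nearly identical). The sample mean, standard deviation, and proportion below are made-up placeholders, not values from the dataset, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(level):
    """Two-sided critical value, e.g. about 1.96 for level=0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def mean_ci(xbar, s, n, level):
    """Large-sample interval for a mean: xbar +/- z * s / sqrt(n)."""
    margin = z_critical(level) * s / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_ci(p_hat, n, level):
    """Large-sample interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    margin = z_critical(level) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Placeholder summary statistics (hypothetical, NOT from the dataset).
# Only n = 1047 comes from the dataset description.
n = 1047
for level in (0.90, 0.95, 0.99):
    low, high = mean_ci(xbar=160.0, s=70.0, n=n, level=level)
    print(f"{level:.0%} CI for mean price ($K): ({low:.1f}, {high:.1f})")

for level in (0.95, 0.99):
    low, high = proportion_ci(p_hat=0.40, n=n, level=level)
    print(f"{level:.0%} CI for proportion old: ({low:.3f}, {high:.3f})")
```

Because only the critical value changes with the confidence level, the margin of error grows as the level rises from 90% to 99%, which is the comparison Q1 asks you to explain.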
Assignment 3
Due: Wednesday, May 8, 2013
1. Your friend claims that the average house price in this area is above $150K. Do you
agree? He also claims that the average living area is more than 1800 sq. ft. Do you
agree with this? (Use a 5% significance level for both.) Briefly explain what the
p-values in these cases mean.
2. Are the average home prices higher for houses with fireplaces as compared to those
without? Create side by side boxplots of the two groups and comment on what you
see. Now formulate an appropriate hypothesis and test it.
3. Your friend claims that old houses, on average, have larger lot sizes than new houses
(refer to Assignment 2 for the definition of "old"). Do you agree? Explain. Use a
significance level of 5% for your test.
4. Considering the situation in Q3 again, if your friend had claimed that the lot sizes for
old houses are on average different from those for new ones, without specifying
anything about the direction of difference, how would that change your response?
Use a significance level of 5% for your test.
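The tests in this assignment all follow the same pattern: compute a standardized test statistic, then convert it to a p-value according to the direction of the alternative hypothesis. Below is a minimal Python sketch of that pattern, assuming the large-sample normal approximation (JMP's t-tests should give nearly the same p-values for samples this large). All numeric inputs are made-up placeholders rather than values from the dataset, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(xbar, mu0, s, n):
    """Test statistic for H0: mu = mu0 from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))


def two_sample_z(xbar1, s1, n1, xbar2, s2, n2):
    """Unpooled test statistic for H0: mu1 = mu2."""
    return (xbar1 - xbar2) / sqrt(s1**2 / n1 + s2**2 / n2)


def p_value(z, alternative):
    """p-value for a z statistic; alternative is 'greater', 'less' or 'two-sided'."""
    std = NormalDist()
    if alternative == "greater":
        return 1 - std.cdf(z)
    if alternative == "less":
        return std.cdf(z)
    if alternative == "two-sided":
        return 2 * (1 - std.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical summary statistics for two groups (NOT from the dataset).
z = two_sample_z(xbar1=0.62, s1=0.80, n1=400, xbar2=0.50, s2=0.70, n2=647)
print("z statistic:        ", round(z, 3))
print("one-sided p-value:  ", round(p_value(z, "greater"), 4))
print("two-sided p-value:  ", round(p_value(z, "two-sided"), 4))
```

For the same statistic, the two-sided p-value is twice the smaller one-sided p-value, so a result that is significant at 5% for a directional claim may not be significant for a non-directional one.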