StatCrunch Practice: Z-Scores & Regression

The document outlines a practice exercise using StatCrunch for statistical analysis, including normal distribution calculations and linear regression. It covers various tasks such as finding z-scores, estimating percentages, and creating scatterplots based on given datasets. Additionally, it involves interpreting regression results and evaluating the appropriateness of linear models for specific data sets.


Test 2 StatCrunch In Class Practice

• To access the Normal calculator in StatCrunch, use: Stat → Calculators → Normal


• To access the Linear Regression tool, use: Stat → Regression → Simple Linear
1. The number of pages contained in books in a certain collection follows a normal distribution with a
mean of 300 and a standard deviation of 72.
(a) Find and interpret the z-score for a book that contains 358 pages.
(b) Use StatCrunch to estimate the percentage of books in the collection with more than 358 pages.
Include the P(Inequality) statement you are having StatCrunch compute.
(c) Use the Empirical (68 − 95 − 99.7) Rule to estimate the percentage of books between 300 and 444
pages.
(d) Use StatCrunch to find a more precise answer. Include the P(Inequality) statement you are having
StatCrunch compute.
(e) If a book has a z-score of z = −2.1, how many pages does it contain? (Round to the nearest
whole number).
2. An analysis was done of the time taken to beat the new video game “The Legend of Zelda: Echoes of
Wisdom.” On average, it took players 20.8 hours to complete the main story, with a standard deviation
of 1.2 hours.
(a) What percentage of players finished the main story between 19 and 23 hours? Include the
P(Inequality) statement you are having StatCrunch compute.
(b) What was the cutoff for the fastest 15% of times?
(c) What is the z-score associated with that time?
(d) What are the 1st and 3rd quartile values? What is the IQR?
3. Open the Starbucks data set in our class StatCrunch group. We want to predict the number of
calories in an item based on the amount of fat it contains.
(a) What are the explanatory and response variables?
(b) Create an appropriate scatterplot, and describe the Direction, Form, Strength, and Unusual
Features.
(c) Compute the Correlation coefficient, r. Does it match your description in part b?
(d) Find the equation of the regression line, and use it to predict the calories in an item with 13 grams
of fat.
(e) The “Marble Pound Cake” item has 13 grams of fat and 350 calories. Compute the residual for
this data point.
(f) Examine the “Residuals vs X-values” graph. What does it look like?
(g) Is a linear model appropriate for this data set? Why or Why not?
4. Open the Cotton Quality data set in our class StatCrunch group. We’d like to use Soil pH to predict
the quality of cotton grown in a field.
(a) What are the explanatory and response variables?
(b) Find the equation of the regression line.
(c) Interpret the slope and y-intercept of this line in complete sentences in context of the data.
(d) Find the R2 value for this data. Give a complete sentence interpreting it in context of the data.
(e) Is a linear model appropriate for this data set? Why or Why not?

Common questions


Using StatCrunch, you calculate the percentage by finding P(X > 358) with a mean of 300 and standard deviation of 72. This corresponds to finding P(z > 0.81), which is approximately 20.9% (using normal distribution tables or StatCrunch).
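This calculation can also be checked outside StatCrunch with Python's standard-library normal distribution — a verification sketch, not part of the worksheet:

```python
from statistics import NormalDist

# Pages ~ Normal(mean=300, sd=72); find P(X > 358)
pages = NormalDist(mu=300, sigma=72)
p_more = 1 - pages.cdf(358)
print(round(p_more, 3))  # about 0.21, matching the ~20.9% from the table value z = 0.81
```

The tiny difference from 20.9% comes from rounding the z-score to two decimals before using a table; StatCrunch works with the unrounded value.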

The R2 value indicates the proportion of variance in the response variable explained by the explanatory variable. A high R2 value, say 0.85, suggests that 85% of the variability in cotton quality is accounted for by soil pH, reinforcing the linear model’s strength, assuming all assumptions hold.

To decide if a linear model is appropriate, consider scatterplot patterns (should show a clear linear trend), correlation coefficients (high values close to 1), residual patterns (should be randomly scattered without patterns), and R2 values (high indicates a good fit). If conditions match these criteria, the model is likely suitable.

By examining the scatterplot, you assess direction, form, strength, and unusual features. If the scatterplot shows a positive, linear pattern with few outliers, this indicates a strong positive correlation, often supported by a high correlation coefficient, r, like 0.9 or above. The analysis should confirm these observations, indicating a robust linear relationship.

To determine this percentage, you calculate P(19 ≤ X ≤ 23) using a mean of 20.8 and a standard deviation of 1.2. This involves finding the z-scores for 19 and 23, which are −1.5 and about 1.83, respectively, resulting in approximately 90% of players finishing within this time span.
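The same between-two-values probability can be verified with a short Python sketch (standard library only):

```python
from statistics import NormalDist

# Completion time ~ Normal(mean=20.8, sd=1.2); find P(19 <= X <= 23)
times = NormalDist(mu=20.8, sigma=1.2)
p_between = times.cdf(23) - times.cdf(19)
print(round(p_between, 3))  # about 0.90
```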

According to the Empirical (68-95-99.7) Rule, approximately 95% of the data in a normal distribution lie within two standard deviations of the mean. Since 444 is exactly two standard deviations above the mean (300 + 2 × 72), the interval from 300 to 444 covers the upper half of that middle 95%, so about 47.5% of the books have between 300 and 444 pages. This can be validated using StatCrunch, where P(300 ≤ X ≤ 444) ≈ 0.477 confirms the estimate.
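The half-of-95% reasoning can be checked numerically — a sketch using Python's standard library rather than StatCrunch:

```python
from statistics import NormalDist

# Pages ~ Normal(mean=300, sd=72); 444 = mean + 2*sd
pages = NormalDist(mu=300, sigma=72)
p_exact = pages.cdf(444) - pages.cdf(300)
print(round(p_exact, 4))  # about 0.4772, close to the Empirical Rule's 47.5%
```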

The residual is computed by finding the difference between the actual calorie count (350) and the count predicted by the regression equation. If the predicted value is 340, then residual = 350 − 340 = 10. A positive residual means the actual value lies above the regression line, i.e., the model underestimated the calories for this item.
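The residual arithmetic looks like this in code. The intercept and slope below are hypothetical placeholders chosen so the predicted value matches the 340 used above — substitute the coefficients StatCrunch reports for the actual Starbucks data:

```python
# Hypothetical regression coefficients for illustration only;
# use the intercept and slope from StatCrunch's output.
intercept, slope = 184.0, 12.0

fat_grams = 13          # Marble Pound Cake
actual_calories = 350

predicted = intercept + slope * fat_grams   # 184 + 12*13 = 340
residual = actual_calories - predicted      # 350 - 340 = 10
print(residual)  # 10.0 (positive: model underestimated)
```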

To find the cutoff for the fastest 15%, you need to find the z-score corresponding to the 15th percentile (about −1.04) and use it in the formula X = μ + zσ. Given a mean of 20.8 and SD of 1.2, the cutoff time is approximately X = 20.8 + (−1.04 × 1.2) ≈ 19.55 hours.
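The percentile lookup that StatCrunch does internally can be reproduced with the standard library's inverse CDF — again just a verification sketch:

```python
from statistics import NormalDist

# Completion time ~ Normal(mean=20.8, sd=1.2); fastest 15% = 15th percentile
times = NormalDist(mu=20.8, sigma=1.2)
cutoff = times.inv_cdf(0.15)
print(round(cutoff, 2))  # about 19.56 hours
```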

The z-score is calculated using the formula z = (X − μ) / σ, where X is the observed value (358 pages), μ is the mean (300 pages), and σ is the standard deviation (72). For a book with 358 pages, the z-score is (358 − 300) / 72 ≈ 0.81. This indicates that the book's page count is about 0.81 standard deviations above the mean.

To find the number of pages using a z-score of −2.1, use the formula X = μ + zσ. Here, X is the page count, μ is the mean (300), z is the z-score (−2.1), and σ is the standard deviation (72). Thus, X = 300 + (−2.1 × 72) = 148.8, which rounds to 149 pages.
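Converting a z-score back to a raw value is a one-line rearrangement, sketched here for the values above:

```python
# Invert z = (X - mu) / sigma  ->  X = mu + z * sigma
mu, sigma, z = 300, 72, -2.1
pages = mu + z * sigma   # 300 - 151.2 = 148.8
print(round(pages))  # 149
```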
