
FDS UNIT 3 QB

The document is a question bank for the course AD3491 - Fundamentals of Data Science and Analytics for the academic year 2024-25. It covers various topics in descriptive analytics, including frequency distributions, measures of central tendency (mean, median, mode), and correlation, along with guidelines for constructing frequency distributions and interpreting data. Additionally, it discusses concepts such as outliers, standard deviation, and linear regression, providing examples and solutions for practical understanding.

DEPARTMENT OF INFORMATION TECHNOLOGY

Academic Year: 2024-25 (Even)


QUESTION BANK
Course Code & Name: AD3491-Fundamentals of Data Science and
Analytics
Degree/Branch/Year/Semester: B.Tech/AI & DS/II/IV

UNIT II- DESCRIPTIVE ANALYTICS


Part A

1. During their first swim through a water maze, 15 laboratory rats made the
following number of errors (blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2,
12, 10, 4, 3. Find the mode and median for these data.
Mode: The mode of a data set is the number that occurs most frequently in the set.
Data Frequency
2 2
3 2
4 1
5 3
6 1
7 1
8 1
10 1
12 1
17 1
28 1

Here the value 5 occurs most frequently (3 times) compared to the other values. Hence the
mode of these data is 5.
Median: The median M is the midpoint of the distribution.
The ordered list of the given data set is 2,2,3,3,4,5,5,5,6,7,8,10,12,17,28
Total number of observations in the data set: 15
Here n = 15, which is odd. So the median is located in the (15+1)/2 = 8th spot in the ordered
list, which is 5.
So, the median is 5.
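As a quick check, the mode and median found above can be reproduced with Python's standard `statistics` module. This is only a minimal sketch; the variable names are illustrative.

```python
import statistics

# Number of errors made by the 15 rats (data from the question above)
errors = [2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3]

mode_value = statistics.mode(errors)      # most frequently occurring value
median_value = statistics.median(errors)  # middle value of the ordered list

print("Mode:", mode_value)
print("Median:", median_value)
```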
2. Mention the essential and optional guidelines to be followed for frequency
distributions, or state the “Guidelines for frequency distribution”.
 Each observation should be included in one, and only one, class.
 List all classes, even those with zero frequencies.
 All classes should have equal intervals.
 All classes should have both an upper boundary and a lower boundary.
 Select the class interval from convenient numbers, particularly 5 and 10 or
multiples of 5 and 10.
 The lower boundary of each class should be a multiple of the class interval.
 Aim for a total of approximately 10 classes.
3. GRE scores for a group of graduate school applicants are distributed as follows:

Convert to a relative frequency distribution. When calculating proportions, round
numbers to two digits to the right of the decimal point, using the rounding
procedure
(a) Convert the distribution of GRE scores shown in above table to a cumulative
frequency distribution.
(b) Convert the distribution of GRE scores obtained in above table to a cumulative
percent frequency distribution
Solution:
The total number of applicants is 200. Relative frequency = f/200. The cumulative frequency of a class is
its own frequency plus the frequencies of all lower-ranked classes, and the cumulative percent is the
cumulative frequency divided by 200, expressed as a percent.
GRE       Frequency (f)   Relative frequency   Percent   Cumulative frequency   Cumulative percent
725-749   1               1/200=0.005          0.5%      200                    100%
700-724   3               3/200=0.015          1.5%      199                    99.5%
675-699   14              14/200=0.07          7%        196                    98%
650-674   30              30/200=0.15          15%       182                    91%
625-649   34              34/200=0.17          17%       152                    76%
600-624   42              42/200=0.21          21%       118                    59%
575-599   30              30/200=0.15          15%       76                     38%
550-574   27              27/200=0.135         13.5%     46                     23%
525-549   13              13/200=0.065         6.5%      19                     9.5%
500-524   4               4/200=0.02           2%        6                      3%
475-499   2               2/200=0.01           1%        2                      1%
a) The cumulative frequency distribution is given by the "Cumulative frequency" column.
b) The cumulative percent frequency distribution is given by the "Cumulative percent" column.
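Starting from the class frequencies, the cumulative frequency and cumulative percent distributions asked for in (a) and (b) could be computed as sketched below. The frequencies are read from the numerators in the solution; the list names are illustrative.

```python
from itertools import accumulate

# Class intervals from lowest to highest, with their frequencies
classes = ["475-499", "500-524", "525-549", "550-574", "575-599", "600-624",
           "625-649", "650-674", "675-699", "700-724", "725-749"]
freqs = [2, 4, 13, 27, 30, 42, 34, 30, 14, 3, 1]

total = sum(freqs)
cum_freqs = list(accumulate(freqs))                  # running totals from the lowest class
cum_percents = [100 * c / total for c in cum_freqs]  # cumulative percent of all applicants

# Print with the highest class first, as frequency tables are usually laid out
rows = list(zip(classes, freqs, cum_freqs, cum_percents))
for label, f, cf, cp in reversed(rows):
    print(f"{label}  f={f:3d}  cum f={cf:3d}  cum %={cp:5.1f}")
```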
4. Write a short note on stem-and-leaf display. Represent the following data in a
stem-and-leaf display: 67, 74, 63, 88, 82, 97, 65, 79
 A stem-and-leaf display is used to present quantitative data in a graphical format,
similar to a histogram, to assist in visualizing the shape of a distribution.
 A stem and leaf plot displays data by splitting up each value in a dataset into a
“stem” and a “leaf.”
Raw Data: 67, 74, 63, 88, 82, 97, 65, 79
Stem | Leaf
6    | 7 3 5
7    | 4 9
8    | 8 2
9    | 7
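The same display can be built programmatically. The following is a minimal sketch that assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

data = [67, 74, 63, 88, 82, 97, 65, 79]

# Split each value into a stem (tens digit) and a leaf (units digit)
stems = defaultdict(list)
for value in data:
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

# Print one row per stem, leaves in the order they appear in the raw data
for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```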

5. Why Frequency Distribution is important in Data Science?


 Frequency distribution is an organized tabulation/graphical representation of the
number of individuals in each category on the scale of measurement.
 The reasons for constructing a frequency distribution are as follows:
• To organize the data in a meaningful, intelligible way
• To determine the shape of the distribution
• To facilitate computational procedures for measures of average and
spread
• To draw charts and graphs for the presentation of data
• To enable the reader to make comparisons among different data sets
6. How can the skewness of a data distribution be identified?

7. Define frequency distribution.
A frequency distribution is a collection of observations produced by sorting observations
into classes and showing their frequency (f ) of occurrence in each class.
8. The IQ scores for a group of 35 high school dropouts are given in the table:
a) Construct a frequency distribution for grouped data.
(b) Specify the real limits for the lowest class interval in this frequency distribution.

Solution
(a) Calculating the class width: (123 - 69)/10 = 54/10 = 5.4
Round off to a convenient number, such as 5.

(b) 64.5–69.5
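The arithmetic behind this solution can be sketched as follows. It assumes that the scores are whole numbers (so the unit of measurement is 1) and that the lowest class starts at a multiple of the class width, as the guidelines in question 2 suggest; the variable names are illustrative.

```python
import math

lowest, highest = 69, 123   # smallest and largest IQ scores used in the solution
target_classes = 10         # guideline: aim for approximately 10 classes

raw_width = (highest - lowest) / target_classes   # 54 / 10
width = 5                                         # rounded to a convenient number

# Lower boundary of the lowest class: largest multiple of the width not above the minimum
lower = math.floor(lowest / width) * width
upper = lower + width - 1

# Real limits extend half a unit of measurement beyond the class boundaries
real_limits = (lower - 0.5, upper + 0.5)
print(raw_width, (lower, upper), real_limits)
```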

9. What are some possible poor features of the following frequency distribution?

Solution:
 Not all observations can be assigned to one and only one class (because of the
gap between 20–22 and 25–30 and the overlap between 25–30 and 30–34).
 Not all classes are equal in width (25–30 versus 30–34).
 Not all classes have both boundaries (35–above).
10. Define Outlier.
 An outlier is an observation that lies an abnormal distance from other values in
a random sample from a population.
 Such an observation is considered abnormal, i.e., it is a very extreme score
compared with the rest of the data.
11. Identify any outliers in each of the following sets of data collected from nine
college students.

Solution:
Outliers are a summer income of $25,700; an age of 61; and a family size of 18. No outliers
for GPA.
12. List out the typical shapes of smoothed frequency distribution.
Normal
Bimodal
Positively skewed
Negatively skewed
13. Define mean.
 The mean is the average of a set of observations.
 i.e., the sum of the observations divided by the number of observations.
 If the n observations are written as $x_1, x_2, \ldots, x_n$, their mean can be written mathematically as:
 $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
 We read the symbol $\bar{x}$ as “x-bar.”
 The bar notation is commonly used to represent the sample mean, i.e. the mean of
the sample.

14. Find the sample mean value for the best actress Oscar winner data set: 34 34 26 37
42 41 35 31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33.
Solution:
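The sample mean is the sum of the observations divided by their number. A minimal sketch of that calculation for the data set above (variable names are illustrative):

```python
values = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
          21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

n = len(values)          # number of observations
total = sum(values)      # sum of the observations
sample_mean = total / n  # x-bar

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
```

With 32 observations summing to 1233, the sample mean is 1233/32, which is approximately 38.5.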

15. Define Median.


 The median M is the midpoint of the distribution.
 It is the number such that half of the observations fall above, and half fall below.
16. State the steps to find the median value.
 Order the data from smallest to largest.
 Consider whether n, the number of observations, is even or odd.
 If n is odd, the median M is the center observation in the ordered list.
 This observation is the one “sitting” in the (n + 1) / 2 spot in the ordered list.
 If n is even, the median M is the mean of the two center observations in the
ordered list.
 These two observations are the ones “sitting” in the (n / 2) and (n / 2) + 1 spots in
the ordered list.
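The steps above can be sketched as a small function. The function name is illustrative; the two test data sets are taken from Part A question 1 and Part B question 9 of this unit.

```python
def median(values):
    ordered = sorted(values)              # order the data from smallest to largest
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the observation in the (n + 1) / 2 spot
        # (subtract 1 because Python lists start at index 0)
        return ordered[(n + 1) // 2 - 1]
    # even n: mean of the observations in the (n / 2) and (n / 2) + 1 spots
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_result = median([2, 17, 5, 3, 28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3])  # n = 15
even_result = median([7, 9, 5, 13, 3, 11, 15, 9])                        # n = 8
print(odd_result, even_result)
```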
17. Compare mean and median

 The mean and the median are the most common measures of center.
 Each describes the center of a distribution of values in a different way.
 The mean describes the center as an average value, in which the actual values of
the data points play an important role.
 The median, on the other hand, locates the middle value as the center, and the
order of the data is the key.
18. Define mode.
 The mode of a data set is the number that occurs most frequently in the set.
• If no value appears more than once in the data set, the data set has no
mode.
• If two or more values appear in the data set an equal number of times, more
often than any other value, they are all modes.
19. When to use mean/ median?

 Use the sample mean as a measure of center for symmetric distributions with no
outliers.
 Otherwise, the median will be a more appropriate measure of the center of our
data.
20. What do you mean by range?
 The range measures the spread of the data between the limits of a data set.
 It is calculated as a difference between the highest and lowest values in the data
set.
 The larger the range, the greater the spread of the data.
 The range covered by the data is the most intuitive measure of variability.
 The range is exactly the distance between the smallest data point (min) and the
largest one (Max).
 Range = Max – min
21. Define standard deviation.
 The idea behind the standard deviation is to quantify the spread of a distribution by
measuring how far the observations are from their mean.
 The standard deviation gives the average (or typical) distance between a data point and the mean.
 Standard deviation is a measure of the overall spread (variability) of the values in a
data set around the mean.
 The more spread out a data set is, the greater are the distances from the mean and
the standard deviation.
 There are many notations for the standard deviation: SD, s, Sd, StDev.

22. Compute the standard deviation of the sample data: 3, 5, 7 with a sample mean
of 5.
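A minimal sketch of the calculation, assuming the sample standard deviation is wanted (sum of squared deviations divided by n - 1, since the data are described as a sample):

```python
import math

data = [3, 5, 7]
n = len(data)
mean = sum(data) / n                                   # sample mean, given as 5

squared_deviations = [(x - mean) ** 2 for x in data]   # (-2)^2, 0^2, 2^2
variance = sum(squared_deviations) / (n - 1)           # divide by n - 1 for a sample
std_dev = math.sqrt(variance)

print(variance, std_dev)
```

The squared deviations sum to 8, so the sample variance is 8/2 = 4 and the sample standard deviation is 2.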

23. What do you mean by degrees of freedom?

 Degrees of freedom (df) refers to the number of values that are free to vary,
given one or more mathematical restrictions, in a sample being used to estimate a
population characteristic.
 For the sample standard deviation, df = n – 1, because the n deviations from the
sample mean must sum to zero, so only n – 1 of them are free to vary.
24. Define Inter-Quartile Range (IQR).

 The Inter-Quartile Range or IQR measures the variability of a distribution by
giving us the range covered by the middle 50% of the data.
 To find the interquartile range (IQR), first find the median (middle value) of the
lower and upper half of the data.
 These values are quartile 1 (Q1) and quartile 3 (Q3).
 The IQR is the difference between Q3 and Q1
 IQR = Q3 – Q1
 Q3 = 3rd Quartile = 75th Percentile
 Q1 = 1st Quartile = 25th Percentile
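The procedure above can be sketched as follows, using the best actress Oscar winner data from question 14. One convention has to be assumed: when n is odd, the overall median is left out of both halves. Other quartile conventions exist and can give slightly different values; the function names are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles_and_iqr(values):
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two observations")
    half = len(ordered) // 2
    lower_half = ordered[:half]                  # observations below the median position
    upper_half = ordered[len(ordered) - half:]   # observations above the median position
    q1 = median(lower_half)
    q3 = median(upper_half)
    return q1, q3, q3 - q1

values = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
          21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]
q1, q3, iqr = quartiles_and_iqr(values)
print(q1, q3, iqr)
```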
25. How to measure/interpret the strength of a relationship based on the absolute value
of ‘r’?
Absolute value of r      Strength of relationship
|r| < 0.3                None or very weak
0.3 < |r| < 0.5          Weak
0.5 < |r| < 0.7          Moderate
|r| > 0.7                Strong

26. Define Correlation.


 Correlation is a statistical term describing the degree to which two variables
move in coordination with one another.
 If the two variables move in the same direction, then those variables are said to
have a positive correlation.
 If they move in opposite directions, then they have a negative correlation.
27. Define correlation coefficient.
 It is a number between -1 and 1.
 It tells you the strength and direction of a relationship between variables.
 i.e., it reflects how similar the measurements of two or more variables are across a
dataset.
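As an illustration, Pearson's r can be computed as the sum of products of deviations divided by the square root of the product of the two sums of squared deviations. The data below are hypothetical and chosen only so that one pair moves in the same direction and the other in opposite directions; the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y_same_direction = [2, 4, 6, 8, 10]
y_opposite_direction = [10, 8, 6, 4, 2]

r_pos = pearson_r(x, y_same_direction)      # perfect positive correlation
r_neg = pearson_r(x, y_opposite_direction)  # perfect negative correlation
print(r_pos, r_neg)
```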

28. What are the 4 things to describe the relationship between the variables?

 Strength
• The strength of the relationship is given by the correlation coefficient.
 Direction
• It can be positive or negative, based on the sign of the correlation coefficient.
 Shape
• It must be linear to compute a Pearson correlation coefficient.
 Statistical significance
• It is based on the p-value.
29. What does the correlation coefficient tell you?

 It summarizes the data
 It helps you to compare the results between studies.
30. State the guidelines for interpreting correlation strength.

31. List out the types of correlation coefficients


 Pearson’s r Correlation coefficient
 Spearman’s rho Correlation coefficient

32. What is a Linear Regression?


In simple terms, linear regression is a linear approach to modeling the relationship between
a dependent variable (scalar response) and one or more independent variables (explanatory
variables). With one explanatory variable, it is called simple linear regression. With
more than one independent variable, the process is referred to as multiple linear
regression.
33. What are the disadvantages of the linear regression model?
One of the most significant demerits of the linear model is that it is sensitive to
outliers, which can affect the overall result. Another notable demerit of the linear model is
overfitting. Similarly, underfitting is also a significant disadvantage of the linear model.
34. What are the different types of least squares?
Least squares problems fall into two categories: linear or ordinary least squares and
nonlinear least squares, depending on whether or not the residuals are linear in all
unknowns. The linear least-squares problem occurs in statistical regression analysis; it
has a closed-form solution.
35. What is the difference between least squares regression and multiple regression?
The goal of multiple linear regression is to model the linear relationship between the
explanatory (independent) variables and response (dependent) variables. In essence,
multiple regression is the extension of ordinary least-squares (OLS) regression because
it involves more than one explanatory variable.
36. What is the principle of least squares?
The "Principle of Least Squares" states that the most probable values of a system of unknown
quantities upon which observations have been made, are obtained by making the sum of
the squares of the errors a minimum.
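For a straight line, the principle leads to a closed-form solution for the slope and intercept. The sketch below uses hypothetical (x, y) data and illustrative function names; it fits the line and confirms that nudging the slope or intercept only increases the sum of squared errors.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted y values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best_sse = sum_squared_errors(xs, ys, slope, intercept)
print(slope, intercept, best_sse)

# Any other line gives a larger sum of squared errors
print(sum_squared_errors(xs, ys, slope + 0.1, intercept) > best_sse)
print(sum_squared_errors(xs, ys, slope, intercept - 0.1) > best_sse)
```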
Part B
1. Explain the step by step procedure to construct the frequency distribution with an
example of data set of the following table

2. In a survey, a question was asked: “During your lifetime, how often have you
changed your permanent residence?” A group of 18 college students replied as
follows: 1,3,4,1,0,2,5,8,0,2,3,4,7,11,0,2,3,3. Find the mode, median, and standard
deviation [April/May 2023]
3. Consider an example. Tom who is the owner of a retail shop, found the price of
different T-shirts vs the number of T-shirts sold at his shop over a period of one
week. He tabulated this as shown below:
Price of T-Shirt Number of T-Shirt Sold

10
15
Explain the concept of least squares regression to find the line of best fit for the above data
4. The following frequency distribution shows the annual incomes in dollars for a
group of college graduates.

i. Construct a histogram.
ii. Construct a frequency polygon.
iii. Is this distribution balanced or lopsided?
5. Consider the best actress Oscar winners dataset given below, construct the stem plot
for the above dataset.
34 34 26 37 42 41 35 31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39
34 26 25 35 33

6. Explain the multiple linear regression model with the prediction of sales through
various attributes like the budget for TV advertisement, radio advertisement and newspaper
advertisement, using a statistical model
7. Consider the following x and y set of values, create least square linear regression and check
the result of model fitting to know whether the model is satisfactory
8. Discuss in detail the various typical shapes of frequency distribution. Analyze its
characteristics with an example
9. The following are the number of customers who entered a video store in 8 consecutive
hours: 7,9,5,13,3,11,15,9. Find the standard deviation of the number of hourly customers.
Summarize about the aforementioned data with the help of standard deviation
10. Explain the steps to calculate IQR with an example of best actress Oscar winners
11. For each of the following pairs of distributions, first decide whether their standard
deviations are about the same or different. If their standard deviations are different, indicate
which distribution should have the larger standard deviation. Note that the distribution with
the more dissimilar set of scores or individuals should produce the larger standard deviation
regardless of whether, on average, scores or individuals in one distribution differ from
those in other distribution.
12. The IQ scores for a group of 35 high school dropouts are as follows:

i. Construct a frequency distribution for grouped data (4)


ii. Relative Frequency distribution (3)
iii. Cumulative Frequency distribution (3)
13. Discuss in detail about “Measures of Central Tendency” and calculate each measure for the
following retirement ages data:
60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63
Is it possible to calculate “Mean” for qualitative data? Justify your answer.
Is the above data following “Bimodal”? Justify your answer.
14. Discuss the following measures and calculate them with the given “residence changes”
data.
1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 4
i. Range
ii. Variance
iii. Standard Deviation
iv. Inter Quartile Range (IQR)
v. Z-Score

UNIT III- INFERENTIAL STATISTICS
Part A

37. What is population?


In statistics, population is the entire set of items from which you draw data for a statistical study.
It can be a group of individuals, a set of items, etc. It makes up the data pool for a study.
38. What is a sample?
A sample represents the group of interest from the population, which you will use to represent
the data. The sample is an unbiased subset of the population that best represents the whole
data.
39. When are samples used?
 The population is too large to collect data.
 The data collected is not reliable.
 The population is hypothetical and is unlimited in size. Take the example of a study that
documents the results of a new medical procedure. It is unknown how the procedure will
affect people across the globe, so a test group is used to find out how people react to it.
40. Difference Between Population and Sample? (April/May 2023)
Population: All residents of a country would constitute the Population set
Sample: All residents who live above the poverty line would be the Sample

Population: All residents above the poverty line in a country would be the Population
Sample: All residents who are millionaires would make up the Sample

Population: All employees in an office would be the Population
Sample: Out of all the employees, all managers in the office would be the Sample

41. Define Hypothetical Population


A population containing a finite number of individuals, members or units is a class. ... All
the 400 students of the 10th class of a particular school are an example of an existent type of
population, and the population of heads and tails obtained by tossing a coin an infinite
number of times is an example of a hypothetical population.
42. What is Random Sampling?
Random sampling occurs if, at each stage of sampling, the selection process guarantees that
all potential observations in the population have an equal chance of being included in the
sample
43. What is Sampling Distribution?
The sampling distribution of the mean refers to the probability distribution of means for all
possible random samples of a given size from some population.
44. What are the types of Sampling Distribution?
 Sampling distribution of mean
 Sampling distribution of proportion
 T-distribution
45. Define Sampling distribution of mean
The most common type of sampling distribution is that of the mean. It focuses on calculating the
mean of every sample group chosen from the population and plotting the data points. The
graph shows a normal distribution where the center is the mean of the sampling distribution,
which represents the mean of the entire population.
46. What is meant by Sampling distribution of proportion?
This sampling distribution focuses on proportions in a population. Samples are selected and
their proportions are calculated. The mean of the sample proportions from each group
represents the proportion of the entire population.
47. Define T-distribution
T-distribution is a sampling distribution that involves a small population or one where
not much is known about it. It is used to estimate the mean of the population and other
statistics such as confidence intervals, statistical differences and linear regression. The
T-distribution uses a t- score to evaluate data that wouldn't be appropriate for a normal
distribution
The formula for t-score is:
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
In the formula, $\bar{x}$ is the sample mean, $\mu$ is the population mean, $s$ signifies the
sample standard deviation, and $n$ is the sample size.
48. Define mean of all the sample means
The mean of the sampling distribution of the mean always equals the mean of the
population.
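This property can be checked by brute force on a tiny population. The sketch below uses the four observations 2, 3, 4, 5 from Part B and assumes samples of size two drawn with replacement, so that all 16 ordered samples are equally likely; it also tabulates the relative frequency of each sample mean.

```python
from collections import Counter
from itertools import product

population = [2, 3, 4, 5]
sample_size = 2

# All possible samples of size two (assumption: sampling with replacement)
samples = list(product(population, repeat=sample_size))
sample_means = [sum(s) / sample_size for s in samples]

# Relative frequency table for the sampling distribution of the mean
counts = Counter(sample_means)
for m in sorted(counts):
    print(f"mean = {m:.1f}  relative frequency = {counts[m]}/{len(samples)}")

mean_of_means = sum(sample_means) / len(sample_means)
population_mean = sum(population) / len(population)
print(mean_of_means, population_mean)
```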
49. Standard Error Of The Mean
The standard error of the mean equals the standard deviation of the population divided
by the square root of the sample size
50. What is the Special Type Of Standard Deviation
You might find it helpful to think of the standard error of the mean as a rough measure
of the average amount by which sample means deviate from the mean of the sampling
distribution or from the population mean
51. What is hypothesis testing?
Hypothesis testing is a form of statistical inference that uses data from a sample to draw
conclusions about a population parameter or a population probability distribution. First,
a tentative assumption is made about the parameter or distribution. This assumption is
called the null hypothesis and is denoted by H0.
52. Define Hypothesized Sampling Distribution
When you perform a hypothesis test of a single population mean μ using a normal
distribution (often called a z-test), you take a simple random sample from the
population. ... Then the binomial distribution of a sample (estimated) proportion can be
approximated by the normal distribution with $\mu = p$ and $\sigma = \sqrt{pq/n}$.
53. Define Decision Rule
A decision rule specifies precisely when H0 should be rejected (because the observed z
qualifies as a rare outcome). There are many possible decision rules, as will be seen in
Section 11.3. A very common one, already introduced in Figure 10.3, specifies that H0
should be rejected if the observed z equals or is more positive than 1.96 or if the
observed z equals or is more negative than –1.96. Conversely, H0 should be retained if
the observed z falls between ± 1.96.
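The decision rule with critical values of ±1.96 can be sketched as a small function. The function name and return format are illustrative; the two calls use the figures from Part B questions 29 and 31 of this unit.

```python
import math

def z_test(sample_mean, pop_sd, n, mu_hyp, critical_z=1.96):
    """Two-tailed z test: reject H0 if z is at or beyond +/- critical_z."""
    standard_error = pop_sd / math.sqrt(n)       # standard error of the mean
    z = (sample_mean - mu_hyp) / standard_error
    decision = "reject H0" if abs(z) >= critical_z else "retain H0"
    return z, decision

# IQ example: sample mean 105, sigma 15, n 25, hypothesized mean 100
z_iq, decision_iq = z_test(105, 15, 25, 100)

# Salary example: sample mean 80100, sigma 6000, n 100, hypothesized mean 82500
z_salary, decision_salary = z_test(80100, 6000, 100, 82500)

print(round(z_iq, 2), decision_iq)
print(round(z_salary, 2), decision_salary)
```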
54. Define null hypothesis
The null hypothesis is a statistical hypothesis which states that no statistical
relationship or significance exists in a given single observed variable, or between
two sets of observed data or measured phenomena.
55. What is Level of Significance
Total area that is identified with rare outcomes. Often referred to as the level of
significance of the statistical test, this proportion is symbolized by the Greek letter α
(alpha). In the present example, the level of significance, α, equals .05.
56. Define One-Tailed And Two-Tailed Tests
A one-tailed test is a statistical test in which the critical area of a distribution is one-
sided so that it is either greater than or less than a certain value, but not both. If the
sample being tested falls into the one-sided critical area, the alternative hypothesis will
be accepted instead of the null hypothesis.
In statistics, a two-tailed test is a method in which the critical area of a distribution is
two-sided and tests whether a sample is greater or less than a range of values. It is used
in null-hypothesis testing and testing for statistical significance.
57. State Addition Rule and Multiplication Rule
Addition rule states that add together the separate probabilities of several mutually
exclusive events to find the probability that any one of these events will occur

Multiplication rule states that multiply together the separate probabilities of several
independent events to find the probability that these events will occur together.

58. Imagine a very simple population consisting of only four observations: 2, 4, 6, 8.


List all possible samples of size two.
For given sample size 2, list of possible samples that can be taken from above
observations are
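The samples can be enumerated mechanically. The sketch below assumes sampling with replacement and treats the order of selection as meaningful, which gives 4 × 4 = 16 possible samples; without replacement the list would be shorter.

```python
from itertools import product

population = [2, 4, 6, 8]

# Every ordered pair of observations (assumption: sampling with replacement)
samples = list(product(population, repeat=2))

for sample in samples:
    print(sample)
print("Number of samples:", len(samples))
```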

59. What is One-Tailed Test (Lower Tail Critical)


Now let’s assume that the research hypothesis for the investigation of SAT math scores
was based on complaints from instructors about the poor preparation of local freshmen.
Assume also that if the investigation supports these complaints, a remedial program will
be instituted. Under these circumstances, the investigator might prefer a hypothesis test
that is specially designed to detect only whether the population mean math score for all
local freshmen is less than the national average. This alternative hypothesis reads:

60. What are four possible outcomes for any hypothesis test?
 If H0 really is true, it is a correct decision to retain the true H0.
 If H0 really is true, it is a type I error to reject the true H0.
 If H0 really is false, it is a type II error to retain the false H0.
 If H0 really is false, it is a correct decision to reject the false H0.
61. Define Point Estimate
A point estimate for μ uses a single value to represent the unknown population mean
62. What is meant by confidence interval (CI) for μ?
A confidence interval for μ uses a range of values that, with a known degree of certainty,
includes the unknown population mean.
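When the population standard deviation is known, a 95% confidence interval for μ is the sample mean plus and minus 1.96 standard errors. A minimal sketch with an illustrative function name, using the reading achievement figures from Part B (sample mean 3.82, σ = 0.4, n = 64):

```python
import math

def confidence_interval(sample_mean, pop_sd, n, z_conf=1.96):
    """Confidence interval for the population mean when sigma is known."""
    standard_error = pop_sd / math.sqrt(n)   # convert the standard deviation to a standard error
    margin = z_conf * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = confidence_interval(3.82, 0.4, 64)
print(f"{low:.3f} to {high:.3f}")
```

Here the standard error is 0.4/8 = 0.05, so the interval is 3.82 ± 0.098, that is, roughly 3.72 to 3.92.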
63. What do you mean by Hypothesis? Name at least 4 of its types.
Hypothesis is a statement about the nature of a population. It is often stated in terms of a
population parameter. Hypothesis testing is a form of statistical inference that uses data
from a sample to draw conclusions about a population parameter or a population
probability distribution. Some types of hypothesis statements are
Directional Hypothesis,
Non-Directional Hypothesis,
Null hypothesis,
Alternative hypothesis,
Associative Hypothesis
64. State Central Limit Theorem (April/May 2023)
Central Limit Theorem states that regardless of the population shape, the shape of the
sampling distribution of the mean approximates a normal curve if the sample size is
sufficiently large.
According to this theorem, it doesn’t matter whether the shape of the parent population
is normal, positively skewed or negatively skewed, as long as the sample size is
sufficiently large.
If the shape of the parent population is normal, then any sample size will be sufficiently
large. Otherwise, depending on the degree of non-normality in the parent population, a
sample size between 25 and 100 is sufficiently large
65. Indicate whether the following statements are True or False with proper
justification. The mean of all sample means,
(a) always equals the value of a particular sample mean.
(b) equals 100 if, in fact, the population mean equals 100.
(c) usually equals the value of a particular sample mean.
(d) is interchangeable with the population mean.
 (a) FALSE. The mean of all sample means does not represent a particular sample mean.
 (b) TRUE. The mean of all sample means equals the population mean.
 (c) FALSE. The mean of all sample means does not represent a particular sample mean.
 (d) TRUE. The mean of all sample means equals the population mean.
66. Indicate what’s wrong with each of the following statistical hypothesis:

(c) A null hypothesis and its alternative hypothesis cannot have different
anchor point values. In the given scenario, the two hypotheses do not cover any of the
values strictly between 155 and 160.
(d) A hypothesis statement must refer to a population parameter,
but the given scenario refers to the sample mean X̄.
67. Define Effect Of Sample Size
The larger the sample size, the smaller the standard error and, hence, the more precise
(narrower) the confidence interval will be. Indeed, as the sample size grows larger, the
standard error will approach zero and the confidence interval will shrink to a point
estimate. Given this perspective, the sample size for a confidence interval, unlike that for
a hypothesis test, never can be too large.
Part B
15. Explain population and samples, and the difference between them.
16. Describe random sampling
17. Explain sampling distribution and types
18. Describe null hypothesis test in detail
19. Explain in detail hypothesis testing and examples
20. Does the mean SAT math score for all local freshmen differ from the national
average of 500? (z test for a population mean)
21. Explain one tailed and two tailed test
22. Define estimation. Explain in detail about point estimation.
23. Discuss about the following with suitable example:
i. Random Sampling vs Random Assignments

ii. Independent vs Dependent Events
iii. Independent vs Mutually Exclusive Events
iv. Conditional Probability
v. Sampling Distribution of the Mean
24. Imagine a very simple population consisting of only four observations: 2, 3, 4, 5
i. Explain the process of constructing a relative frequency table showing
the sampling distribution of the mean.
ii. Construct a relative frequency table showing the sampling distribution of the
mean for the above observations.
25. Define Hypothesis. Discuss in detail about at least 5 types of hypothesis
statement with an example.
26. Calculate the value of the z test for each of the following situations. Also, given
critical z score of +/- 1.96, calculate the critical confidence level.
i. X=12; σ=9; n=25; µhyp=15
ii. X=3600; σ=4000; n=100; µhyp=3500
iii. X=0.25; σ=0.10; n=36; µhyp=0.22
27. Reading achievement scores are obtained for a group of fourth graders. A score
of 4.0 indicates a level of achievement appropriate for fourth graders, a score
below 4.0 indicates underachievement, and a score above 4.0 indicates
overachievement. Assume that the population standard deviation equals 0.4. A
random sample of 64 fourth graders reveals a mean achievement score of 3.82.
Construct a 95% confidence interval for the unknown population mean.
(Remember to convert the standard deviation to a standard error). Interpret this
confidence interval; that is, do you find any consistent evidence either of
overachievement or of underachievement?
28. Illustrate in detail about estimation method and confidence interval.
29. For the population at large, the Wechsler Adult Intelligence Scale is designed to
yield a normal distribution of test score with a mean of 100 and a standard
deviation of 15. School district officials wonder whether, on the average, an IQ
score different from 100 describes the intellectual aptitudes of all students in
their district. Wechsler IQ scores are obtained for random sample of 25 of their
students, and the mean IQ is found to equal 105. Using the step-by-step
procedure, test the null hypothesis at the .05 level of significance.
30. Imagine a simple population consisting of only 5 observations: 2, 4, 6, 8, 10. List
all possible samples of size two. Construct a relative frequency table showing the
sampling distribution of the mean.
31. According to the American Psychological Association, members with a
doctorate and a full-time teaching appointment earn, on the average, $82,500
per year, with a standard deviation of $6,000. An investigator wishes to
determine whether $82,500 is also the mean salary for all female members with
a doctorate and a full-time teaching appointment. Salaries are obtained for a
random sample of 100 women from this population, and the mean salary equals
$80,100.
i. Someone claims that the observed difference between $80,100 and $82,500 is large

enough by itself to support the conclusion that female members earn less than male
members. Explain why it is important to conduct a hypothesis test.
ii. The investigator wishes to conduct a hypothesis test for what population?
iii. What is the null hypothesis, H0?
iv. What is the alternative hypothesis, H1?
v. Specify the decision rule, using the .05 level of significance.
vi. Calculate the value of z. (Remember to convert the standard deviation to
a standard error.)
vii. What is your decision about H0?
viii. Using words, interpret this decision in terms of the original problem.
32. According to the California Educational Code
(http://www.cde.ca.gov/ls/fa/sf/peguidemidhi.asp), students in grades 7 through 12
should receive 400 minutes of physical education every 10 school days. A random
sample of 48 students has a mean of 385 minutes and a standard deviation of 53
minutes. Test the hypothesis at the .05 level of significance that the sampled
population satisfies the requirement.
33. According to a 2009 survey based on the United States census
(http://www.census.gov/prod/2011pubs/acs-15.pdf), the daily one-way commute time of U.S. workers
averages 25 minutes with, we’ll assume, a standard deviation of 13 minutes. An
investigator wishes to determine whether the national average describes the mean
commute time for all workers in the Chicago area. Commute times are obtained for a
random sample of 169 workers from this area, and the mean time is found to be 22.5
minutes. Test the null hypothesis at the .05 level of significance.
34. Each of the following statements could represent the point of departure for a
hypothesis test. Given only the information in each statement, would you use a two-
tailed (or nondirectional) test, a one-tailed (or directional) test with the lower tail
critical, or a one-tailed (or directional) test with the upper tail critical? Indicate your
decision by specifying the appropriate H0 and H1. Furthermore, whenever you
conclude that the test is one-tailed, indicate the precise word (or words) in the
statement that justifies the one-tailed test.
i. An investigator wishes to determine whether, for a sample of drug addicts, the mean
score on the depression scale of a personality test differs from a score of 60, which,
according to the test documentation, represents the mean score for the general
population.
ii. To increase rainfall, extensive cloud-seeding experiments are to be conducted, and
the results are to be compared with a baseline figure of 0.54 inch of rainfall (for
comparable periods when cloud seeding was not done).
iii. Public health statistics indicate, we will assume, that American males gain an
average of 23 lbs during the 20-year period after age 40. An ambitious weight-
reduction program, spanning 20 years, is being tested with a sample of 40-year-old
men.
iv. When untreated during their lifetimes, cancer-susceptible mice have an
average life span of 134 days. To determine the effects of a potentially life-
prolonging (and cancer-retarding) drug, the average life span is determined for a
group of mice that receives this drug.
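35. For each of the following situations, indicate whether H0 should be retained or
rejected. Given a one-tailed test, lower tail critical with α = .01, and
(a) z = –2.34 (b) z = –5.13 (c) z = 4.04
Given a one-tailed test, upper tail critical with α = .05, and
(d) z = 2.00 (e) z = –1.80 (f) z = 1.61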

36. Specify the decision rule for each of the following situations (referring to Table 11.1
to find critical z values):
37. Each of the following statements could represent the point of departure for a hypothesis test.
Given only the information in each statement, would you use a two-tailed (or
nondirectional) test, a one-tailed (or directional) test with the lower tail critical, or a one-
tailed (or directional) test with the upper tail critical? Indicate your decision by specifying
the appropriate H0 and H1. Furthermore, whenever you conclude that the test is one-tailed,
indicate the precise word (or words) in the statement that justifies the one-tailed test.
i. An investigator wishes to determine whether, for a sample of drug addicts, the mean
score on the depression scale of a personality test differs from a score of 60, which,
according to the test documentation, represents the mean score for the general population.
ii. To increase rainfall, extensive cloud-seeding experiments are to be conducted, and
the results are to be compared with a baseline figure of 0.54 inch of rainfall (for
comparable periods when cloud seeding was not done).
iii. Public health statistics indicate, we will assume, that American males gain an
average of 23 lbs during the 20-year period after age 40. An ambitious weight-reduction
program, spanning 20 years, is being tested with a sample of 40-year-old men.
iv. When untreated during their lifetimes, cancer-susceptible mice have an average life
span of 134 days. To determine the effects of a potentially life-prolonging (and
cancer-retarding) drug, the average life span is determined for a group of mice that receives this
drug.
38. For each of the following situations, indicate whether H0 should be retained or rejected.
Given a one-tailed test, lower tail critical with α = .01, and
(a) z = –2.34 (b) z = –5.13 (c) z = 4.04
Given a one-tailed test, upper tail critical with α = .05, and
(d) z = 2.00 (e) z = –1.80 (f) z = 1.61
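The retain-or-reject rule for one-tailed tests can be sketched as follows. The function `one_tailed_decision` and its `tail` parameter are hypothetical names for this illustration. Critical values are computed from the standard normal distribution, so they differ slightly from rounded table values such as –2.33 and 1.65.

```python
from statistics import NormalDist

def one_tailed_decision(z, alpha, tail):
    """Return 'reject' or 'retain' for H0, given an observed z.

    tail is 'lower' or 'upper' (lower or upper tail critical).
    """
    if tail == "lower":
        z_crit = NormalDist().inv_cdf(alpha)       # negative critical value
        return "reject" if z <= z_crit else "retain"
    if tail == "upper":
        z_crit = NormalDist().inv_cdf(1 - alpha)   # positive critical value
        return "reject" if z >= z_crit else "retain"
    raise ValueError("tail must be 'lower' or 'upper'")

lower_tail = {z: one_tailed_decision(z, 0.01, "lower") for z in (-2.34, -5.13, 4.04)}
upper_tail = {z: one_tailed_decision(z, 0.05, "upper") for z in (2.00, -1.80, 1.61)}
print(lower_tail)
print(upper_tail)
```

Note that a large z in the wrong direction, such as z = 4.04 with the lower tail critical, still leads to retaining H0.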

39. Reading achievement scores are obtained for a group of fourth graders. A score of
4.0 indicates a level of achievement appropriate for fourth grade, a score below 4.0
indicates underachievement, and a score above 4.0 indicates overachievement. Assume that
the population standard deviation equals 0.4. A random sample of 64 fourth graders reveals
a mean achievement score of 3.82.
i. Construct a 95 percent confidence interval for the unknown population mean.
(Remember to convert the standard deviation to a standard error.)
ii. Interpret this confidence interval; that is, do you find any consistent evidence either of
overachievement or of underachievement?
