FDA CIA 2 Qs Answers
PART A
Sample
Any subset of observations from a population.
The sample size is small relative to the population size.
Example 1
For each of the following pairs, indicate with a Yes or No
whether the relationship between the first and second
expressions could describe that between a sample and its
population, respectively.
(a) students in the last row; students in class
(b) citizens of Wyoming; citizens of New York
(c) 20 lab rats in an experiment; all lab rats, similar to
those used, that could undergo the same experiment
(d) all U.S. presidents; all registered Republicans
(e) two tosses of a coin; all possible tosses of a coin
Solution
(a) Yes
(b) No. Citizens of Wyoming aren’t a subset of citizens of New York.
(c) Yes
(d) No. All U.S. presidents aren’t a subset of all registered Republicans.
(e) Yes
Example 2
Identify all of the expressions from Example 1 that involve a
hypothetical population.
Solution
Expressions (c) and (e) in Example 1 involve hypothetical populations.
Random Sampling
A selection process that guarantees all potential observations in
the population have an equal chance of being selected.
Inferential statistics requires that samples be random.
Example 3
Indicate whether each of the following statements is True or False.
A random selection of 10 playing cards from a deck of 52 cards implies that
(a) the random sample of 10 cards accurately represents the
important features of the whole deck.
(b) each card in the deck has an equal chance of being selected.
(c) it is impossible to get 10 cards from the same suit (for example,
10 hearts).
(d) any outcome, however unlikely, is possible.
Solution
a. False. Sometimes, just by chance, a random sample of 10 cards fails
to represent the important features of the whole deck.
b. True
c. False. Although unlikely, 10 hearts could appear in a random sample of
10 cards.
d. True
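The answer to (c) can be checked by counting. The sketch below uses only Python's standard library to compute the probability that a random draw of 10 cards, without replacement, consists entirely of hearts. The function name and its parameters are illustrative and not part of the original example.

```python
from fractions import Fraction
from math import comb

def prob_all_from_one_suit(cards_drawn, suit_size=13, deck_size=52):
    """Probability that every drawn card comes from one specified suit."""
    favorable = comb(suit_size, cards_drawn)  # ways to pick the cards from that suit
    possible = comb(deck_size, cards_drawn)   # ways to pick the cards from the whole deck
    return Fraction(favorable, possible)

p_hearts = prob_all_from_one_suit(10)
print(float(p_hearts))  # very small, but greater than zero
```

The result is tiny but not zero, which is why statement (c) is false and statement (d) is true.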
Example 4
Describe how you would use the table of random numbers to take
a. a random sample of five statistics students in a classroom
where each of nine rows consists of nine seats.
b. a random sample of size 40 from a large directory consisting
of 3041 pages, with 480 lines per page.
Solution
a. There are many ways. For instance, consult the tables of random
numbers, using the first digit of each 5-digit random number to identify
the row (previously labeled 1, 2, 3, and so on), and the second digit of the
same random number to locate a particular student’s seat within that
row. Repeat this process until five students have been identified. (If the
classroom is larger, use additional digits so that every student can be
sampled.)
b. Once again, there are many ways. For instance, use the initial 4
digits of each random number (between 0001 and 3041) to identify
the page number of the telephone directory and the next 3 digits
(between 001 and 480) to identify the particular line on that
page. Repeat this process, using 7-digit numbers, until 40 telephone
numbers have been identified.
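The same two selections can be sketched in Python. This swaps the printed table of random numbers for the standard library's pseudo-random generator, so it illustrates the idea rather than the table-based procedure itself. The function names and the `seed` parameter are assumptions made for this example.

```python
import random

def sample_seats(k, rows=9, seats_per_row=9, seed=None):
    """Pick k distinct (row, seat) pairs; every seat is equally likely."""
    rng = random.Random(seed)
    all_seats = [(row, seat)
                 for row in range(1, rows + 1)
                 for seat in range(1, seats_per_row + 1)]
    return rng.sample(all_seats, k)  # sampling without replacement

def sample_directory_lines(k, pages=3041, lines_per_page=480, seed=None):
    """Pick k distinct (page, line) pairs from the directory."""
    rng = random.Random(seed)
    picks = rng.sample(range(pages * lines_per_page), k)
    # Convert each flat index back into a 1-based page number and line number
    return [(idx // lines_per_page + 1, idx % lines_per_page + 1) for idx in picks]

seats = sample_seats(5, seed=1)
directory_lines = sample_directory_lines(40, seed=1)
print(seats)
```

Passing a seed makes the selection repeatable; leaving it as `None` gives a different sample on each run.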
Probability
The proportion or fraction of times that a particular event is likely to
occur.
Example 5
Assuming that people are equally likely to be born during
any one of the months, what is the probability of Jack
being born during
(a) June?
(b) any month other than June?
(c) either May or June?
Solution
(a) 1/12
(b) 11/12
(c) 1/12 + 1/12 = 2/12 = 1/6
Independent Events
The occurrence of one event has no effect on the probability that
the other event will occur.
Multiplication Rule
Multiply together the separate probabilities of several
independent events to find the probability that these events will
occur together.
P(A and B) = P(A) × P(B)
where A and B are independent events.
Example 6
Assuming that people are equally likely to be born during any of the
months, and also assuming (possibly over the objections of
astrology fans) that the birthdays of married couples are
independent, what’s the probability of
(a) the husband being born during January and the wife being born
during February?
(b) both husband and wife being born during December?
(c) both husband and wife being born during the spring (April or
May)? (Hint: First, find the probability of just one person being born
during April or May.)
Solution
(a) (1/12)(1/12) = 1/144
(b) (1/12)(1/12) = 1/144
(c) (2/12)(2/12) = 4/144 = 1/36
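The multiplication rule in Example 6 can be checked with exact fractions. This is a minimal sketch under the example's own assumptions (all months equally likely, spouses' birthdays independent); the helper name `prob_all` is hypothetical.

```python
from fractions import Fraction

P_MONTH = Fraction(1, 12)  # probability of any one month, assuming equal likelihood

def prob_all(*probs):
    """Multiplication rule: probability that all independent events occur."""
    result = Fraction(1)
    for p in probs:
        result *= p
    return result

p_a = prob_all(P_MONTH, P_MONTH)    # husband in January and wife in February
p_b = prob_all(P_MONTH, P_MONTH)    # both in December
p_spring = P_MONTH + P_MONTH        # April or May (mutually exclusive months)
p_c = prob_all(p_spring, p_spring)  # both in April or May
print(p_a, p_b, p_c)  # 1/144 1/144 1/36
```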
Dependent Events
When the occurrence of one event affects the probability of the
other event, these events are dependent.
Although the heights of randomly selected pairs of men are
independent, the heights of brothers are dependent.
Conditional Probability
The probability of one event, given the occurrence of another event.
Defining Hypotheses
Null hypothesis (H0):
In statistics, the null hypothesis is a general statement or default
position that there is no relationship between two measured cases or
no relationship among groups. In other words, it is a basic
assumption made on the basis of knowledge of the problem.
Example:
A company’s mean production is 50 units per day.
H0: μ = 50.
Alternative hypothesis (H1):
The alternative hypothesis is the hypothesis used in hypothesis
testing that is contrary to the null hypothesis.
Example:
A company’s mean production is not equal to 50 units per day, i.e. H1: μ ≠ 50.
Key Terms of Hypothesis Testing
Level of significance:
o It is the threshold used to decide whether to reject the null
hypothesis. Because 100% certainty is not possible when testing a
hypothesis, a level of significance is chosen in advance, usually
5%.
o It is normally denoted by α and is generally 0.05 or 5%,
which means accepting a 5% risk of rejecting a null hypothesis
that is actually true.
P-value:
o The P-value, or calculated probability, is the probability of obtaining
results at least as extreme as those observed when the null hypothesis (H0)
of the given problem is true.
o If the P-value is less than the chosen significance level, reject the
null hypothesis, i.e. conclude that the sample supports the
alternative hypothesis.
Test Statistic:
o The test statistic is a numerical value calculated from sample data
during a hypothesis test, used to determine whether to reject the
null hypothesis.
o It is compared with a critical value, or converted to a p-value, to
decide whether the observed results are statistically significant.
Critical value:
o The critical value in statistics is a threshold or cutoff point used to
determine whether to reject the null hypothesis in a hypothesis
test.
Degrees of freedom:
o Degrees of freedom are the number of values that are free to vary
when estimating a parameter.
o The degrees of freedom are related to the sample size and
determine the shape of the test statistic's sampling distribution
(for example, the t distribution).
Figure 3.6 - One possible set of common and rare outcomes (values of X).
Figure 3.6 shows one possible set of boundaries for common and rare
outcomes, expressed in values of X.
If the one observed sample mean is located between 478 and 522, it will
qualify as a common outcome, and the null hypothesis will be retained.
If, however, the one observed sample mean is greater than 522 or less than 478,
it will qualify as a rare outcome, and the null hypothesis will be rejected.
UNIT-4
PART-A
1. State the null hypothesis in a one-sample t-test.
The null hypothesis states that the population mean is equal to a specified
value.
2. What is the sampling distribution of t?
The sampling distribution of t is the distribution of the t-statistic under the null
hypothesis.
3. Specify the purpose of a t-test for two independent samples.
It is used to compare the means of two independent groups to determine if they
are significantly different.
4. Define p-value.
The p-value is the probability of obtaining test results at least as extreme as the
observed results, under the assumption that the null hypothesis is true.
5. What is statistical significance?
Statistical significance indicates that the observed result is unlikely to have
occurred by chance under the null hypothesis.
6. What is a t-test for two related samples?
It compares the means of two related groups, such as paired observations or
repeated measures on the same subjects.
7. Define F-test.
The F-test is used to compare the variances of two populations or to test the
overall significance in ANOVA.
8. What is ANOVA?
ANOVA, or Analysis of Variance, is a statistical method used to compare the
means of three or more groups.
9. List the purpose of two-factor ANOVA.
Two-factor ANOVA analyzes the impact of two independent variables on a
dependent variable.
10. What is a chi-square test used for?
The chi-square test is used to determine if there is a significant association
between two categorical variables.
PART-B
1. Explain the procedure for conducting a one-sample t-test with an example.
The one-sample t-test is a statistical test used to determine whether the
mean of a single sample is significantly different from a known or
hypothesized population mean. This test is particularly useful when the
population standard deviation is unknown, and the sample size is relatively
small.
Detailed procedure:
1. **State the Hypotheses**:
- Null Hypothesis (H0): μ = μ0 (The population mean is equal to the
hypothesized value).
- Alternative Hypothesis (H1): μ ≠ μ0 (The population mean is not equal to the
hypothesized value).
This can also be a one-tailed test if a specific direction of difference is
hypothesized.
2. **Collect Sample Data**:
Gather a random sample from the population of interest and compute the
sample mean (x̄) and standard deviation (s).
3. **Calculate the Test Statistic**:
Use the formula: t = (x̄ - μ0) / (s / √n), where:
- x̄ is the sample mean,
- μ0 is the hypothesized population mean,
- s is the sample standard deviation,
- n is the sample size.
4. **Determine Degrees of Freedom (df)**:
Degrees of freedom for a one-sample t-test are calculated as df = n - 1.
5. **Find the Critical Value or P-Value**:
Using the t-distribution table or statistical software, find the critical value for
the chosen significance level (e.g., α = 0.05). Alternatively, calculate the
p-value corresponding to the computed t-statistic.
6. **Decision Rule**:
- If |t| > critical value, or if the p-value < α, reject the null hypothesis (H0).
- Otherwise, fail to reject H0.
7. **Interpret the Results**:
Clearly state whether the sample mean is significantly different from the
hypothesized population mean.
Example:
Suppose a nutritionist claims that the average weight of a type of apple is 150
grams. A sample of 20 apples has a mean weight of 155 grams and a
standard deviation of 10 grams. Using a one-sample t-test:
- Hypotheses: H0: μ = 150, H1: μ ≠ 150
- Test statistic: t = (155 - 150) / (10 / √20) ≈ 2.236
- Degrees of freedom: df = 20 - 1 = 19
Using a t-table at α = 0.05 (two-tailed), the critical value is approximately
±2.093.
Since |t| > 2.093, we reject H0 and conclude that the mean weight of the
apples is significantly different from 150 grams.
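The apple example can be reproduced from its summary statistics. The sketch below uses only the standard library, so it takes the critical value 2.093 from the t-table as quoted in the example rather than computing it. The function name is illustrative.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return (t, df) for a one-sample t-test from summary statistics."""
    standard_error = s / math.sqrt(n)  # estimated standard error of the mean
    t = (sample_mean - mu0) / standard_error
    return t, n - 1

t_stat, df = one_sample_t(sample_mean=155, mu0=150, s=10, n=20)
T_CRITICAL = 2.093  # two-tailed, alpha = 0.05, df = 19 (t-table value quoted in the example)
reject_h0 = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, reject_h0)  # 2.236 19 True
```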
2. Describe the t-test for two independent samples and its assumptions.
The t-test for two independent samples, also known as the independent t-test,
is used to compare the means of two unrelated groups to determine if they
are significantly different. This test is commonly applied in controlled
experiments, such as comparing the effect of two treatments on different
groups.
Key Assumptions:
1. **Independence**: Observations within each group must be independent.
2. **Normality**: The data in each group should approximately follow a
normal distribution.
3. **Homogeneity of Variances**: The variances of the two groups should be
equal. If this assumption is violated, a modified version of the test, such as
Welch’s t-test, can be used.
Procedure:
1. **State the Hypotheses**:
- Null Hypothesis (H0): μ1 = μ2 (The means of the two groups are equal).
- Alternative Hypothesis (H1): μ1 ≠ μ2 (The means of the two groups are not
equal).
2. **Collect Sample Data**:
Calculate the mean and standard deviation for each group.
3. **Compute the Test Statistic**:
The formula for the t-statistic depends on whether the variances are
assumed equal:
- For equal variances:
t = (x̄1 - x̄2) / √(s_p² * (1/n1 + 1/n2)),
where s_p² is the pooled variance:
s_p² = [(n1 - 1)s1² + (n2 - 1)s2²] / (n1 + n2 - 2).
- For unequal variances (Welch’s t-test):
t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2).
4. **Determine Degrees of Freedom (df)**:
- For equal variances: df = n1 + n2 - 2.
- For unequal variances, use the Welch–Satterthwaite approximation:
df ≈ (s1²/n1 + s2²/n2)² / [(s1²/n1)²/(n1 - 1) + (s2²/n2)²/(n2 - 1)].
5. **Compare with Critical Value or Compute P-Value**:
Use the t-distribution table or software to find the critical value or p-value.
6. **Decision Rule**:
- If |t| > critical value or p-value < α, reject H0.
- Otherwise, fail to reject H0.
Example:
A study compares the effectiveness of two teaching methods on students’ test
scores. Group 1 (n1 = 30) has a mean score of 75 (s1 = 8), and Group 2 (n2 =
30) has a mean score of 70 (s2 = 6). Assuming equal variances:
- s_p² = [(29 × 8²) + (29 × 6²)] / 58 = 50.
- t = (75 - 70) / √(50 * (1/30 + 1/30)) = 5 / √3.333 ≈ 2.739.
With df = 58, the critical value at α = 0.05 (two-tailed) is ±2.002. Since |t| >
2.002, we reject H0 and conclude that the two teaching methods have
significantly different effects on test scores.
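The teaching-methods example can be recomputed from its summary statistics. This is a minimal standard-library sketch with illustrative function names. It evaluates both the pooled (equal-variance) statistic and Welch's statistic; the two coincide here because the group sizes are equal.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance t statistic, degrees of freedom, and pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, pooled_var

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's t statistic, which does not assume equal variances."""
    return (mean1 - mean2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

t_stat, df, sp2 = pooled_t(75, 8, 30, 70, 6, 30)
t_welch = welch_t(75, 8, 30, 70, 6, 30)
print(round(sp2, 1), df, round(t_stat, 3))  # 50.0 58 2.739
```

Because 2.739 exceeds the critical value 2.002, the decision to reject H0 stands.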
2. Statistical Significance
Statistical significance provides a formal decision rule for hypothesis testing.
It is determined by comparing the p-value to a predefined threshold called the
significance level (α). Common choices for α include 0.05, 0.01, and 0.10,
although the choice depends on the context of the study and the potential
consequences of errors.
Significance Level (α): This is the maximum probability of making a Type I
error—rejecting a true null hypothesis. If the p-value is less than α, the result
is deemed statistically significant.
For example, if α is set at 0.05 and the p-value is 0.03, the result is
statistically significant, suggesting sufficient evidence to reject the null
hypothesis. This significance implies that the observed effect or relationship is
unlikely to be due to random chance alone.
5. Practical Applications
a. Scientific Research: P-values and statistical significance are integral to
research across disciplines, from medicine to economics. They help determine
whether observed effects are genuine or likely due to random variation.
b. Policy and Decision Making: In applied settings, such as public health or
business, statistically significant results guide actionable decisions. For
instance, a pharmaceutical trial might rely on p-values to evaluate the
efficacy of a new drug.
c. Replicability and Reliability: Consistent findings of statistical
significance across multiple studies enhance the credibility and
generalizability of results. Researchers often replicate experiments to confirm
findings, emphasizing the role of p-values in verifying reliability.
Conclusion
The p-value and statistical significance are indispensable in hypothesis
testing, providing a rigorous framework for evaluating evidence and making
decisions. However, they must be interpreted carefully, considering their
limitations and the broader research context. By combining these metrics with
other statistical tools, researchers can draw robust and meaningful
conclusions, advancing knowledge across disciplines.
Introduction to ANOVA
Analysis of Variance (ANOVA) is a statistical technique used to determine whether there are
significant differences among the means of three or more groups. It is particularly useful when
comparing multiple datasets to identify variations attributable to different factors or
experimental conditions. Unlike a simple t-test that compares means between two groups,
ANOVA expands this capability to multiple groups, offering a robust methodology for complex
experimental designs.
Types of ANOVA
There are several types of ANOVA, tailored to different experimental scenarios:
One-Way ANOVA: Used when one factor with multiple levels is under consideration.
Two-Way ANOVA: Suitable when two factors are studied simultaneously.
Multivariate ANOVA (MANOVA): Extends ANOVA to handle multiple dependent
variables.
Two-Way ANOVA
Two-way ANOVA is a statistical method employed to evaluate the effect of two independent
factors on a dependent variable. It also examines the interaction between these factors,
providing insights into whether the combined effect of the factors differs from their individual
effects.
Key Components
1. Factors and Levels:
o Each factor represents an independent variable.
o Each factor has multiple levels (e.g., different treatments, groups, or categories).
2. Main Effects:
o These are the effects of each factor independently on the dependent variable.
3. Interaction Effects:
o These occur when the effect of one factor depends on the level of the other
factor.
Model Structure
A two-way ANOVA model can be expressed as:
Y_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk
Where:
Y_ijk: Observed value for the k-th observation in the i-th level of factor A and j-th level of
factor B.
μ: Overall mean.
α_i: Effect of the i-th level of factor A.
β_j: Effect of the j-th level of factor B.
(αβ)_ij: Interaction effect between factors A and B.
ε_ijk: Random error.
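Observed frequencies (exercise frequency by diet preference):

| Exercise level    | Vegetarian | Non-Vegetarian | Total |
|-------------------|------------|----------------|-------|
| Low Exercise      | 30         | 50             | 80    |
| Moderate Exercise | 40         | 30             | 70    |
| High Exercise     | 20         | 30             | 50    |
| Total             | 90         | 110            | 200   |
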
Step 1: Hypotheses
Null Hypothesis (H0): Exercise frequency and diet preference are
independent.
Alternative Hypothesis (H1): Exercise frequency and diet preference are
dependent.
Step 2: Calculate Expected Frequencies
The expected frequency (E) for each cell is calculated as:
E = (Row Total × Column Total) / Grand Total
For example, the expected frequency for Low Exercise and Vegetarian is:
E = (80 × 90) / 200 = 36
Step 3: Compute the Chi-Square Statistic
Using the formula, calculate χ² for all cells:
χ² = Σ [(O_i - E_i)^2 / E_i]
Step 4: Compare with Critical Value
Determine degrees of freedom (df):
df = (Number of Rows - 1) × (Number of Columns - 1)
df = (3 - 1) × (2 - 1) = 2
At α = 0.05, the critical value for df = 2 is 5.991.
If the calculated χ² exceeds 5.991, reject the null hypothesis.
Step 5: Interpret Results
If the null hypothesis is rejected, conclude that
exercise frequency and diet preference are significantly associated.
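Steps 2 to 4 can be carried out numerically. The sketch below uses the observed counts from the table (Low 30 and 50, Moderate 40 and 30, High 20 and 30) and only the standard library, so the critical value 5.991 is taken from the text rather than computed. The function name is illustrative.

```python
observed = [
    [30, 50],  # Low Exercise: Vegetarian, Non-Vegetarian
    [40, 30],  # Moderate Exercise
    [20, 30],  # High Exercise
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
CRITICAL_VALUE = 5.991  # alpha = 0.05, df = 2 (value quoted in Step 4)
print(round(chi2, 2), df, chi2 > CRITICAL_VALUE)  # 6.49 2 True
```

With these counts χ² is about 6.49, which exceeds 5.991, so the null hypothesis of independence is rejected at α = 0.05.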
UNIT-5
PART-A
1. What Is Predictive Analytics?
The term predictive analytics refers to the use of statistics and modeling techniques
to make predictions about future outcomes and performance.
Predictive analytics looks at current and historical data patterns to determine if those
patterns are likely to emerge again.
It allows businesses and investors to adjust where they use their resources to take
advantage of possible future events.
a. Weather forecasts
b. Creating video games
c. Translating voice to text for mobile phone messaging
d. Customer service
3. Define Credit
Credit scoring makes extensive use of predictive analytics.
Example: When a consumer or business applies for credit, data on the applicant's
credit history and the credit record of borrowers with similar characteristics are used
to predict the risk that the applicant might fail to perform on any credit extended.
PART-B
11. Explain the linear least squares method in detail.
Least squares method:
Now that we have determined the loss function, the only thing left to do is minimize
it.
This is done by finding the partial derivatives of L with respect to m and c,
equating them to 0, and then solving for m and c.
After we do the math, we are left with these equations:
m = Σ (x_i - x̄)(y_i - ȳ) / Σ (x_i - x̄)²
c = ȳ - m·x̄
Here x̄ is the mean of all the values in the input X and ȳ is the mean of all the
values in the desired output Y.
This is the Least Squares method.
Now we will implement this in Python and make predictions.
Implementing the model:
This is the example for implementing the model using Python
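The notes do not include the fitting code itself, so the following is a minimal sketch of the least squares method for the line y = m*x + c, assuming the usual closed-form solution in terms of the means of X and Y. It uses plain Python lists to stay dependency-free. The function name and the demo data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of co-deviations divided by the sum of squared x deviations
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    m = numerator / denominator
    c = y_mean - m * x_mean  # the fitted line passes through (x_mean, y_mean)
    return m, c

# Hypothetical demo data lying exactly on y = 2x + 1
xs_demo = [1, 2, 3, 4, 5]
ys_demo = [3, 5, 7, 9, 11]
m, c = least_squares_fit(xs_demo, ys_demo)
print(m, c)  # 2.0 1.0
```

With real, noisy data the fitted line will not pass through every point; m and c are simply the values that minimize the sum of squared vertical distances.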
Making Predictions:
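# Making predictions
# Assumes X and Y are NumPy arrays and m, c are the fitted slope and intercept
import matplotlib.pyplot as plt

Y_pred = m * X + c  # predicted values on the fitted line

plt.scatter(X, Y)  # actual
# plt.scatter(X, Y_pred, color='red')
# Draw the line through its endpoints; this also works when the slope is negative
x_ends = [min(X), max(X)]
plt.plot(x_ends, [m * x + c for x in x_ends], color='red')  # predicted
plt.show()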
There won’t be much accuracy because we are simply taking a straight line and
forcing it to fit into the given data in the best possible way.
But you can use this to make simple predictions or get an idea about the
magnitude/range of the real value.
Also this is a good first step for beginners in Machine Learning.
Linear Regression:
Linear models with independently and identically distributed errors, and for errors
with heteroscedasticity or autocorrelation.
This module allows estimation by ordinary least squares (OLS), weighted least squares
(WLS), generalized least squares (GLS), and feasible generalized least squares with
autocorrelated AR(p) errors.
13. Explain in detail about Multiple regression.
In simple linear regression, there is only one independent variable and one dependent
variable.
In multiple regression, however, a set of independent variables is used to better
explain or predict the dependent variable y.
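As an illustration, a multiple regression model with k independent variables can be written as y = b0 + b1*x1 + ... + bk*xk + error. The sketch below estimates the coefficients by solving the normal equations in plain Python. This is one standard approach chosen to keep the example dependency-free, not necessarily what a production library does internally. All names and the demo data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_multiple_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical demo data generated exactly from y = 1 + 2*x1 + 3*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coeffs = fit_multiple_regression(rows, y)
print([round(b, 6) for b in coeffs])  # approximately [1.0, 2.0, 3.0]
```

The approach assumes the independent variables are not perfectly collinear; otherwise the normal equations have no unique solution.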
Predictive analytics determines the likelihood of future outcomes using techniques like
data mining, statistics, data modeling, artificial intelligence, and machine learning.
Put simply, predictive analytics interprets an organization’s historical data to make
predictions about the future.
Today’s predictive analytics techniques can discover patterns in the data to identify
upcoming risks and opportunities for an organization.
Statistical analysis
Once the final form of the data is obtained, statistical analysis of the parameters can
proceed, so that earlier hypotheses are tested directly or insights are extracted
through visualization of metrics.
Modeling
Once the data has been thoroughly prepared, predictive models can be tested and the
necessary experiments carried out to obtain a model with satisfactory predictive
performance.
Implementation
This is the actual deployment stage. After performing all the required tests, evaluating
the quality of the models, and validating the output data, the predictive analytics tool
can be put into production, so that it provides predictions that address the
problem stated in the first point.