DAV Short Notes
T-test:
A t-test is a statistical test used to compare the means of two groups and determine whether
they are significantly different from each other. It is commonly used in hypothesis testing
when the sample size is small and the population standard deviation is unknown.
Types of t-tests:
1. One-Sample t-test – Compares the mean of a single group to a known population
mean.
2. Independent (Unpaired) t-test – Compares the means of two independent groups.
○ Example: Comparing the test scores of students from two different schools.
3. Paired (Dependent) t-test – Compares the means of the same group before and
after a treatment.
○ Example: Measuring blood pressure before and after taking a new drug.
● If p < 0.05, we reject the null hypothesis, meaning there is a significant difference.
● If p ≥ 0.05, we fail to reject the null hypothesis, meaning the difference is not
statistically significant.
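The independent t-test above (e.g., comparing scores from two schools) can be sketched in Python using scipy. The data here are randomly generated for illustration, not real scores:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical test scores from two different schools (assumed data)
school_a = rng.normal(70, 8, 30)
school_b = rng.normal(75, 8, 30)

# Independent (unpaired) two-sample t-test on the group means
t_stat, p_value = stats.ttest_ind(school_a, school_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Fail to reject the null hypothesis.")
```

For the paired case (e.g., blood pressure before and after a drug), `stats.ttest_rel` would be used instead.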
F-test:
An F-test is used to compare the variances of two or more groups to determine if they are
significantly different. It is used in Analysis of Variance (ANOVA) and regression analysis.
1. One-Way ANOVA – Compares the means of multiple groups based on one
independent variable.
2. Two-Way ANOVA – Compares the means of multiple groups based on two
independent variables.
○ Example: Examining how gender and diet type affect weight loss.
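A one-way ANOVA F-test can be sketched with scipy's `f_oneway`. The three diet groups below are simulated data, assumed only for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical weight-loss values (kg) for three diet groups (assumed data)
diet_a = rng.normal(5.0, 1.5, 20)
diet_b = rng.normal(6.0, 1.5, 20)
diet_c = rng.normal(5.5, 1.5, 20)

# One-way ANOVA: F-test of whether the three group means are equal
f_stat, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

A small p-value would suggest that at least one diet's mean differs from the others.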
Factorial Experiments:
● Each factor has multiple levels, and the experiment evaluates their individual and
combined effects.
1. Main effect of Factor A – Checks if different levels of Factor A affect the dependent
variable.
2. Main effect of Factor B – Checks if different levels of Factor B affect the dependent
variable.
3. Interaction effect of A and B – Checks if the combination of Factor A and Factor B
has a unique effect.
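The two main effects and the interaction in a 2×2 factorial design can be illustrated directly from cell means. The numbers below are assumed for the sketch; a nonzero interaction term means the effect of B differs across levels of A:

```python
import numpy as np

# Cell means for a hypothetical 2x2 factorial design (assumed numbers):
# rows = levels of Factor A, columns = levels of Factor B
cell_means = np.array([[10.0, 14.0],
                       [12.0, 22.0]])

# Main effect of A: difference between the two row means
main_a = cell_means.mean(axis=1)[1] - cell_means.mean(axis=1)[0]   # 5.0
# Main effect of B: difference between the two column means
main_b = cell_means.mean(axis=0)[1] - cell_means.mean(axis=0)[0]   # 7.0

# Interaction: does the effect of B change across levels of A?
effect_b_at_a0 = cell_means[0, 1] - cell_means[0, 0]   # 4.0
effect_b_at_a1 = cell_means[1, 1] - cell_means[1, 0]   # 10.0
interaction = effect_b_at_a1 - effect_b_at_a0          # 6.0, so A and B interact
print(main_a, main_b, interaction)
```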
3) Explain linear least squares, goodness of fit, model testing, and the
importance of weighted resampling in predictive analytics.
● A method used to find the best-fitting line in linear regression by minimizing the sum
of squared differences between actual and predicted values.
● Formula: y = mx + b, where m is the slope and b is the intercept.
Goodness of Fit:
● Common metrics:
○ RMSE (Root Mean Square Error): Measures the average prediction error.
○ R² (coefficient of determination): Measures the proportion of variance in the dependent variable explained by the model.
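Least squares fitting and RMSE can be sketched with numpy. The data below are generated from an assumed true line y = 2x + 1 plus noise, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data roughly following y = 2x + 1 with noise (assumed)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)

# Linear least squares: fit y = m*x + b by minimizing the sum of squared residuals
m, b = np.polyfit(x, y, 1)
y_pred = m * x + b

# Goodness of fit: RMSE, the typical size of a prediction error
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"slope = {m:.2f}, intercept = {b:.2f}, RMSE = {rmse:.2f}")
```

The fitted slope and intercept should land close to the true values 2 and 1, with RMSE near the noise level.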
Model Testing:
● Evaluates how well a model performs on data it was not trained on, typically by splitting the data into training and test sets.
Weighted Resampling:
● Used when dealing with imbalanced datasets to ensure all classes are fairly represented.
● Techniques: oversampling the minority class, undersampling the majority class.
Multiple Regression:
● Formula:
y = b0 + b1x1 + b2x2 + ... + bnxn + ε
where b0 is the intercept, b1 … bn are the coefficients, and ε is the error
term.
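Multiple regression can be sketched by solving the least-squares problem directly with numpy. The predictors and response below are simulated from assumed true coefficients (b0 = 1, b1 = 2, b2 = 3):

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: y = 1 + 2*x1 + 3*x2 + noise (assumed coefficients)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 0.5, n)

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones(n), x1, x2])
# Solve for [b0, b1, b2] by least squares
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)
```

With enough data and modest noise, the estimated coefficients should be close to the assumed true values.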
Nonlinear Relationships:
● If data doesn’t fit a straight line, nonlinear models like polynomial regression and
exponential regression are used.
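Polynomial regression can be sketched by fitting a degree-2 polynomial where a straight line would miss the curvature. The data come from an assumed curve y = 0.5x² − x + 2 plus noise:

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical curved data: y = 0.5*x^2 - x + 2 plus noise (assumed)
x = np.linspace(-5, 5, 60)
y = 0.5 * x**2 - x + 2 + rng.normal(0, 0.5, 60)

# A straight line cannot capture the curvature; a degree-2 polynomial can
coefs = np.polyfit(x, y, 2)        # [a, b, c] for a*x^2 + b*x + c
y_pred = np.polyval(coefs, x)
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(coefs, rmse)
```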
Logistic Regression:
● Used for classification problems where the output is categorical (e.g., 0 or 1, Yes or
No).
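Logistic regression can be sketched from scratch with numpy: a sigmoid maps a linear combination of features to a probability in (0, 1), and simple gradient descent on the log-loss fits the weights. The binary labels below are simulated data, assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical binary classification data: the label depends on one feature
x = rng.normal(size=200)
y = (x + rng.normal(0, 0.5, 200) > 0).astype(float)   # 0/1 labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit weights [b0, b1] by gradient descent on the log-loss
X = np.column_stack([np.ones_like(x), x])
w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(X @ w)                   # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)    # gradient step on the log-loss

preds = (sigmoid(X @ w) >= 0.5).astype(float)
accuracy = (preds == y).mean()
print(w, accuracy)
```

Thresholding the predicted probability at 0.5 turns the model into a 0/1 classifier.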
Time Series Analysis:
● Analyzes data points collected over time (e.g., stock prices, temperature records).
Moving Averages:
● Simple Moving Average (SMA): Computes the average of the last ‘n’ observations.
● Serial Correlation: When past values influence future values in a time series.
● Autocorrelation: Measures how a time series is correlated with its past values at
different time lags.
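A simple moving average and a lag-1 autocorrelation can be sketched with numpy. The series below is a simulated random walk standing in for assumed time-series data such as temperatures:

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical time series: a random walk (assumed stand-in for real data)
series = np.cumsum(rng.normal(0, 1, 200))

# Simple Moving Average (SMA) over the last n = 7 observations
n = 7
sma = np.convolve(series, np.ones(n) / n, mode="valid")

# Lag-1 autocorrelation: correlation of the series with itself shifted one step
lag1 = np.corrcoef(series[:-1], series[1:])[0, 1]
print(len(sma), lag1)
```

A random walk is strongly serially correlated, so the lag-1 autocorrelation comes out close to 1; white noise would give a value near 0.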
Applications:
● Weather Prediction