
DAV Short Notes

The document provides an overview of various statistical tests and methods including t-tests (one-sample, independent, paired), F-tests, ANOVA (one-way, two-way), and linear regression techniques. It explains concepts such as p-value, statistical significance, goodness of fit, and the importance of weighted resampling in predictive analytics. Additionally, it covers time series analysis, moving averages, handling missing values, and applications in fields like medical research, marketing, and forecasting.

Uploaded by

RUDHRESH S

1) Explain the t-test, its types (one-sample, independent, paired), p-value, and statistical significance with applications.

T-test:

A t-test is a statistical test used to compare means (a single group against a reference value, or two groups against each other) and determine whether the difference is statistically significant. It is commonly used in hypothesis testing when the sample size is small and the population standard deviation is unknown.

Types of t-tests:

1. One-Sample t-test – Compares the mean of a single group to a known population mean.

○ Example: Checking if the average height of students in a class is different from the national average.

2. Independent (Unpaired) t-test – Compares the means of two independent groups.

○ Example: Comparing the test scores of students from two different schools.

3. Paired (Dependent) t-test – Compares the means of the same group before and after a treatment.

○ Example: Measuring blood pressure before and after taking a new drug.

P-value and Statistical Significance:

● P-value is the probability of obtaining results at least as extreme as the observed ones, assuming the null hypothesis is true.

● If p < 0.05, we reject the null hypothesis, meaning there is a statistically significant difference.

● If p ≥ 0.05, we fail to reject the null hypothesis, meaning the difference is not statistically significant.

Applications:

●​ Medical research (testing drug effectiveness).​

●​ A/B testing in marketing.​

●​ Quality control in manufacturing.
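As an illustrative sketch, an independent (unpaired) t-test can be run with SciPy; the two sets of test scores below are invented for demonstration:

```python
from scipy import stats

# Made-up test scores from two different schools (illustrative data)
school_a = [78, 85, 90, 72, 88, 76, 81, 79]
school_b = [70, 75, 80, 68, 74, 72, 77, 71]

# Independent t-test: are the two group means significantly different?
t_stat, p_value = stats.ttest_ind(school_a, school_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Decision at the 0.05 significance level
if p_value < 0.05:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Fail to reject the null hypothesis.")
```

The same module provides `stats.ttest_1samp` for the one-sample case and `stats.ttest_rel` for the paired case.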


2) Describe the F-test, ANOVA (one-way, two-way), factorial experiments, and the role of three F-tests in two-factor ANOVA.

F-test:

An F-test is used to compare the variances of two or more groups to determine if they are
significantly different. It is used in Analysis of Variance (ANOVA) and regression analysis.

ANOVA (Analysis of Variance):

ANOVA is used to compare the means of three or more groups.

1. One-Way ANOVA – Compares the means of multiple groups based on one independent variable.

○ Example: Testing the effectiveness of three different teaching methods on student performance.

2. Two-Way ANOVA – Compares the means of multiple groups based on two independent variables.

○ Example: Examining how gender and diet type affect weight loss.

Factorial Experiments:

● Factorial experiments study the effect of two or more factors (independent variables) simultaneously.

● Each factor has multiple levels, and the experiment evaluates their individual and combined effects.

Three F-tests in Two-Factor ANOVA:

1. Main effect of Factor A – Checks whether different levels of Factor A affect the dependent variable.

2. Main effect of Factor B – Checks whether different levels of Factor B affect the dependent variable.

3. Interaction effect of A and B – Checks whether the combination of Factor A and Factor B has an effect beyond the two main effects.

3) Explain linear least squares, goodness of fit, model testing, and the importance of weighted resampling in predictive analytics.

Linear Least Squares:

●​ A method used to find the best-fitting line in linear regression by minimizing the sum
of squared differences between actual and predicted values.​

● Formula: y = mx + b, where m is the slope and b is the intercept.
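A minimal least-squares sketch with NumPy; the points below are invented to lie near the line y = 2x + 1:

```python
import numpy as np

# Illustrative points scattered around the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Degree-1 least-squares fit: minimizes the sum of squared
# residuals (y_i - (m*x_i + b))^2 over m and b
m, b = np.polyfit(x, y, deg=1)
print(f"slope m = {m:.3f}, intercept b = {b:.3f}")
```

The fitted slope and intercept land close to the underlying 2 and 1, as expected.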

Goodness of Fit:

●​ Measures how well the model explains the observed data.​

●​ Common metrics:​

○ R² (coefficient of determination): the proportion of variance explained by the model.

○ RMSE (Root Mean Square Error): the average size of the prediction error.
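Both metrics can be computed directly from their definitions; the actual/predicted values below are invented for illustration:

```python
import numpy as np

# Actual observations and a model's predictions (illustrative values)
actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
predicted = np.array([2.8, 5.1, 7.3, 8.9, 11.2])

# RMSE: square root of the mean squared prediction error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((actual - predicted) ** 2)
ss_tot = np.sum((actual - np.mean(actual)) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, R² = {r2:.3f}")
```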

Model Testing:

●​ Cross-validation (e.g., k-fold cross-validation) is used to check model accuracy.​

● Hypothesis testing is performed to ensure model parameters are statistically significant.

Weighted Resampling in Predictive Analytics:

● Used when dealing with imbalanced datasets to ensure all classes are fairly represented.

● Techniques:

○ Bootstrap Sampling: random sampling with replacement.

○ Stratified Sampling: maintaining proportional representation in each category.
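A sketch of weighted resampling on an invented imbalanced dataset: the rare class is given a higher sampling probability so the resampled set is roughly balanced.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Imbalanced labels: 90 examples of class 0, 10 of class 1 (made up)
labels = np.array([0] * 90 + [1] * 10)

# Weight the rare class 9x so both classes have equal total weight,
# then resample with replacement using those probabilities
weights = np.where(labels == 1, 9.0, 1.0)
probs = weights / weights.sum()
resampled = rng.choice(labels, size=100, replace=True, p=probs)

print("class 1 share before:", (labels == 1).mean())
print("class 1 share after: ", (resampled == 1).mean())
```

After resampling, class 1 makes up roughly half the sample instead of 10%.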
4) Discuss multiple regression, nonlinear relationships, logistic regression, and parameter estimation using StatsModels.

Multiple Regression:

●​ Extends simple linear regression to multiple independent variables.​

● Formula: y = b0 + b1·x1 + b2·x2 + ... + bn·xn + ε, where b0 is the intercept, b1...bn are the coefficients, and ε is the error term.

Nonlinear Relationships:

●​ If data doesn’t fit a straight line, nonlinear models like polynomial regression and
exponential regression are used.​

●​ Example: y = ax² + bx + c (quadratic relationship).​

Logistic Regression:

●​ Used for classification problems where the output is categorical (e.g., 0 or 1, Yes or
No).​

● Uses the sigmoid function to predict probabilities: p = 1 / (1 + e^(-(b0 + b1x))), which maps any real-valued input to a probability between 0 and 1.
Parameter Estimation using StatsModels:

●​ StatsModels is a Python library used for statistical modeling.​

● Used to estimate regression coefficients, check p-values, and generate statistical summaries.

Example usage in Python (assuming X and y already hold the predictors and response):

import statsmodels.api as sm

X = sm.add_constant(X)        # add the intercept column
model = sm.OLS(y, X).fit()    # ordinary least squares fit
print(model.summary())        # coefficients, p-values, R², etc.
5) Explain time series analysis, including moving averages, handling missing values, serial correlation, and autocorrelation with applications.

Time Series Analysis:

●​ Analyzes data points collected over time (e.g., stock prices, temperature records).​

Moving Averages:

●​ Simple Moving Average (SMA): Computes the average of the last ‘n’ observations.​

●​ Exponential Moving Average (EMA): Gives more weight to recent values.​
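Both averages are one-liners in pandas; the price series below is invented for illustration:

```python
import pandas as pd

# Illustrative daily price series
prices = pd.Series([10, 12, 11, 13, 15, 14, 16, 18, 17, 19])

# Simple Moving Average over the last 3 observations
sma = prices.rolling(window=3).mean()

# Exponential Moving Average: recent values weighted more heavily
ema = prices.ewm(span=3, adjust=False).mean()

print(sma.tail(3))
print(ema.tail(3))
```

The first two SMA entries are NaN because a 3-point window needs three observations before it produces a value.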

Handling Missing Values:

●​ Forward Fill: Use previous values to fill missing data.​

●​ Interpolation: Estimate missing values using linear methods.​

●​ Mean/Median Imputation: Replace missing values with the mean or median.​
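The three strategies side by side on a small invented series with gaps:

```python
import numpy as np
import pandas as pd

# Illustrative series with two missing readings
s = pd.Series([20.0, np.nan, 24.0, np.nan, 28.0])

ffilled = s.ffill()           # forward fill: carry the last value forward
interp = s.interpolate()      # linear interpolation between neighbours
imputed = s.fillna(s.mean())  # replace gaps with the mean of observed values

print(ffilled.tolist())
print(interp.tolist())
print(imputed.tolist())
```

Forward fill repeats 20 and 24 into the gaps, interpolation produces the midpoints 22 and 26, and mean imputation inserts 24 in both gaps.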

Serial Correlation and Autocorrelation:

● Serial Correlation: correlation between successive observations in a time series, i.e., past values influence future values.

●​ Autocorrelation: Measures how a time series is correlated with its past values at
different time lags.​
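Lag-k autocorrelation is the correlation of a series with itself shifted by k steps; pandas exposes it directly. A steadily trending series (here, a plain ramp) has lag-1 autocorrelation of essentially 1:

```python
import numpy as np
import pandas as pd

# A steadily increasing series: adjacent values are strongly related
s = pd.Series(np.arange(20, dtype=float))

# Correlation of the series with itself shifted by one time step
ac1 = s.autocorr(lag=1)
print("lag-1 autocorrelation:", ac1)
```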

Applications:

●​ Stock Market Forecasting​

●​ Weather Prediction​

●​ Demand Forecasting in Supply Chain
