STATISTICS AND ECONOMETRICS EXPECTED QUESTIONS
1. What is Probability?
Probability is a fundamental concept in mathematics and statistics that quantifies the
likelihood or chance of an event occurring. Probability is typically expressed as a number
between 0 and 1, where 0 represents an impossible event (no chance of occurring), and
1 represents a certain event (guaranteed to occur).
2. Explain Probability to a 10 year old
Probability is like a way of measuring how likely something is to happen. Imagine you have a
bag of colorful marbles, and some of them are red, some are blue, and some are green.
Now, let's say you want to figure out the probability of picking a red marble from the bag. To do
that, you need to know two things:
How many red marbles are there in the bag?
How many marbles are there in total (including all the colors)?
So, if you have 5 red marbles in the bag, and there are 20 marbles in total, the probability of
picking a red marble would be:
Probability of picking red = (Number of red marbles) / (Total number of marbles)
Probability of picking red = 5 / 20
Now, simplify that fraction:
Probability of picking red = 1/4
That's the basic idea of probability, and it's used in all sorts of situations, like predicting
the weather, playing games, or even in science to understand random events.
3. Classic vs General Probability
Classical Probability:
● Definition: Classical probability, also known as "classical or theoretical
probability," is based on the assumption that all outcomes in a sample space are
equally likely. It's often used in situations where we have a good understanding of
the underlying processes and can assume that each possible outcome has an
equal chance of occurring.
● Example: If you roll a fair six-sided die, there are six possible outcomes (numbers
1 through 6), and each outcome is equally likely. In this case, classical probability
can be used because we know that the probability of getting any specific number
(e.g., 3) is 1/6, as there's one favorable outcome out of six possible outcomes.
● Applicability: Classical probability is most suitable for situations where there is a
clear and known set of equally likely outcomes.
General Probability:
● Definition: General probability, also known as "empirical probability" or
"experimental probability," is based on observed data or experiments. Instead of
assuming that all outcomes are equally likely, general probability calculates
probabilities based on real-world data or observations.
● Example: Suppose you want to find the probability of it raining tomorrow in a
particular city. You can gather historical weather data and see how many times it
rained on similar days in the past. The probability of rain tomorrow would then be
based on the observed frequency of rain in similar conditions.
● Applicability: General probability is used when we don't have equal likelihood
assumptions or when we want to make predictions based on real-world data. It's
often used in situations where classical probability assumptions don't apply.
4. Bayes Theorem
Bayes' Theorem, named after the Reverend Thomas Bayes, is a fundamental concept in
probability theory and statistics. It provides a way to update our beliefs or probabilities about an
event based on new evidence or information. It's particularly useful in situations where we have
some prior knowledge or beliefs, and we want to incorporate new data to make more accurate
predictions or assessments.
The theorem can be expressed mathematically as:
P(A|B) = [P(B|A) · P(A)] / P(B)
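A minimal Python sketch of the formula, using a hypothetical medical-test example (the prevalence and accuracy numbers below are assumed purely for illustration):

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01            # prior P(A): assumed 1% prevalence
p_pos_given_disease = 0.95  # P(B|A): assumed test sensitivity
p_pos_given_healthy = 0.05  # assumed false-positive rate

# Total probability of a positive test, P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior: P(disease | positive test)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))    # about 0.161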
5. Explain it to a 10-year kid
Imagine you have a big jar filled with different colored balls—red, blue, and green. You want to
figure out the chance of picking a red ball. Now, you can't see inside the jar, so you're not sure
how many red balls there are.
Bayes' Theorem helps you guess the chance of getting a red ball based on some clues. Here's
how it works:
First, you make a guess about how many red balls there might be, but you're not very
sure (that's your "prior" guess).
Then, you get some new clues. Maybe you close your eyes and a friend tells you if the
ball you picked is red or not (that's your "new evidence").
Using Bayes' Theorem, you can adjust your guess about the number of red balls based
on this new information.
In simple terms, Bayes' Theorem helps you update your guess when you learn something new.
It's like a smart way to change your mind when you get more information.
So, if you guess there are lots of red balls, but you keep picking blue and green ones, Bayes'
Theorem can help you realize that maybe there aren't as many red balls as you thought.
It's a bit like solving a mystery by collecting clues and changing your ideas as you learn more.
Bayes' Theorem helps you do that with numbers and math.
6. Random Variables
1. What are Random Variables
A random variable is a variable whose value is unknown or a function that
assigns values to each of an experiment's outcomes.
A random variable can be either discrete (having specific values) or continuous
(any value in a continuous range).
7. All kinds of distribution (their mean, pdf, variance, real life examples)
Discrete: Discrete random variables take on a countable number of distinct values. Consider an
experiment where a coin is tossed three times. If X represents the number of times that the coin
comes up heads, then X is a discrete random variable that can only have the values 0, 1, 2, or 3
(from no heads in three successive coin tosses to all heads). No other value is possible for X.
Continuous: Continuous random variables can represent any value within a specified range or
interval and can take on an infinite number of possible values. An example of a continuous
random variable would be an experiment that involves measuring the amount of rainfall in a city
over a year or the average height of a random group of 25 people.
Drawing on the latter, if Y represents the random variable for the average height of a random
group of 25 people, you will find that the resulting outcome is a continuous figure since height
may be 5 ft or 5.01 ft or 5.0001 ft. Clearly, there is an infinite number of possible values for
height.
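A small numpy sketch (illustrative only) contrasting the two types: a discrete variable (number of heads in three coin tosses) and a continuous one (average height of 25 people, with heights assumed roughly normal just for the example):

import numpy as np

rng = np.random.default_rng(0)

# Discrete: number of heads in 3 fair coin tosses -> values in {0, 1, 2, 3}
heads = rng.binomial(n=3, p=0.5, size=10)
print(heads)

# Continuous: average height (in feet) of a random group of 25 people,
# assuming heights ~ Normal(mean 5.5 ft, sd 0.3 ft) purely for illustration
avg_height = rng.normal(loc=5.5, scale=0.3, size=25).mean()
print(avg_height)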
Poisson: The Poisson distribution is a discrete probability distribution, meaning the variable can
only take specific values from a given list of numbers, possibly infinite. A Poisson distribution
measures how many times an event is likely to occur within a fixed period of time. In other words,
we can define it as the probability distribution that results from a Poisson experiment. A Poisson
experiment counts occurrences of an event (a "success") over a fixed interval; the Poisson
distribution arises as a limiting case of the binomial distribution (many trials, each with a small
probability of success).
The formula for the Poisson distribution function is given by: f(x) = (e^(−λ) · λ^x) / x!
In the Poisson distribution, the mean is E(X) = λ. For a Poisson distribution, the
mean and the variance are equal, i.e. E(X) = V(X) = λ.
Examples: Number of Network Failures per Week.
● Number of Bankruptcies Filed per Month.
● Number of Website Visitors per Hour.
● Number of Arrivals at a Restaurant.
● Number of Calls per Hour at a Call Center.
● Number of Books Sold per Week.
● Average Number of Storms in a City.
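A short scipy sketch of the Poisson distribution, using an assumed rate of λ = 4 website visitors per hour:

from scipy.stats import poisson

lam = 4                         # assumed average number of visitors per hour
dist = poisson(mu=lam)

print(dist.pmf(2))              # P(exactly 2 visitors) = e^(-4) * 4^2 / 2!
print(dist.mean(), dist.var())  # both equal lambda = 4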
Bernoulli Distribution
This distribution is generated when we perform an experiment once and it has only two possible
outcomes – success and failure. The trials of this type are called Bernoulli trials, which form the
basis for many distributions discussed below. Let p be the probability of success and 1 – p is the
probability of failure.
The PMF is given as P(X = x) = p^x · (1 − p)^(1 − x) for x ∈ {0, 1}; the mean is p and the variance is p(1 − p).
One example of this would be flipping a coin once. p is the probability of getting a head and 1 − p
is the probability of getting a tail. Please note that success and failure are subjective and
are defined by us depending on the context.
Binomial Distribution
This is generated for random variables with only two possible outcomes. Let p denote the
probability of an event is a success which implies 1 – p is the probability of the event being a
failure. Performing the experiment repeatedly and plotting the probability each time gives us the
Binomial distribution.
The most common example given for Binomial distribution is that of flipping a coin n number of
times and calculating the probabilities of getting a particular number of heads. More real-world
examples include the number of successful sales calls for a company or whether a drug works
for a disease or not.
The PMF is given as P(X = x) = C(n, x) · p^x · (1 − p)^(n − x),
where p is the probability of success, n is the number of trials and x is the number of times we
obtain a success.
The mean of the binomial distribution is np, and the variance of the binomial distribution is np (1
− p).
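A scipy sketch covering both distributions; a Bernoulli trial is simply a Binomial with n = 1, and the p = 0.3 below is an arbitrary illustrative value:

from scipy.stats import bernoulli, binom

p = 0.3                        # assumed probability of success
print(bernoulli(p).pmf(1))     # P(success) = p
print(bernoulli(p).pmf(0))     # P(failure) = 1 - p

n = 10                         # number of independent trials
b = binom(n, p)
print(b.pmf(3))                # P(exactly 3 successes in 10 trials)
print(b.mean(), b.var())       # np = 3.0 and np(1 - p) = 2.1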
Hypergeometric Distribution
Consider an event of drawing a red marble from a box of marbles with different colors. The
event of drawing a red ball is a success and not drawing it is a failure. But each time a marble is
drawn it is not returned to the box and hence this affects the probability of drawing a ball in the
next trial. The hypergeometric distribution models the probability of k successes over n trials
where each trial is conducted without replacement. This is unlike the binomial distribution where
the probability remains constant through the trials.
The PMF is given as P(X = x) = [C(k, x) · C(N − k, n − x)] / C(N, n),
where k is the number of possible successes, x is the desired number of successes, N is the
size of the population and n is the number of trials.
● Deck of Cards: A deck of cards contains 20 cards: 6 red cards and 14 black cards. ...
● Inspection for Defective Items: A young, growing company is making products in small
lots. ...
● Companies accepting the order: A company buys batches of N = 1000 components.
The mean of the hypergeometric distribution is nk/N, and the variance (square of the standard
deviation) is nk(N − k)(N − n) / [N²(N − 1)].
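A scipy sketch of the deck-of-cards example above (20 cards, 6 red), drawing 5 cards without replacement; the draw size of 5 is an assumption made for illustration:

from scipy.stats import hypergeom

N_pop, k_red, n_draws = 20, 6, 5
dist = hypergeom(M=N_pop, n=k_red, N=n_draws)  # note: scipy's parameter names differ from the text

print(dist.pmf(2))             # P(exactly 2 red cards among the 5 drawn)
print(dist.mean())             # n*k/N = 5*6/20 = 1.5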
Negative Binomial Distribution
Sometimes we want to check how many Bernoulli trials we need to make in order to get a
particular outcome. The desired outcome is specified in advance and we continue the
experiment until it is achieved. Let us consider the example of rolling a dice. Our desired
outcome, defined as a success, is rolling a 4. We want to know the probability of getting this
outcome thrice. This is interpreted as the number of failures (other numbers apart from 4) that
will occur before we see the third success.
The PMF is given as P(X = k) = C(k + r − 1, k) · p^r · (1 − p)^k,
where p is the probability of success, k is the number of failures observed and r is the desired
number of successes until the experiment is stopped.
Like in Binomial distribution, the probability through the trials remains constant and each trial is
independent of the other.
● If we flip a coin a fixed number of times and count the number of times the coin turns out
heads is a binomial distribution. If we continue flipping the coin until it has turned a
particular number of heads say the third head-on flipping 5 times, then this is a case of
the negative binomial distribution.
● For a situation involving three glasses to be hit with 7 balls, the probability of hitting the
third glass successfully with the seventh ball can be obtained with the help of negative
binomial distribution.
● In a class, if there is a rumor that there is a math test, and the fifth is the second person
to believe the rumor, then the probability of this fifth person to be the second person to
believe the rumor can be computed using the negative binomial distribution.
The mean of the negative binomial distribution with parameters r and p is rq / p, where q = 1 − p.
The variance is rq / p².
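A scipy sketch of the dice example above: counting failures (non-4 rolls) before the third success (rolling a 4), with p = 1/6:

from scipy.stats import nbinom

r, p = 3, 1/6                  # desired number of successes and P(rolling a 4)
dist = nbinom(r, p)            # scipy counts failures before the r-th success

print(dist.pmf(5))             # P(exactly 5 failures before the 3rd four)
print(dist.mean())             # r(1 - p)/p = 15
print(dist.var())              # r(1 - p)/p^2 = 90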
Geometric Distribution
This is a special case of the negative binomial distribution where the desired number of
successes is 1. It measures the number of failures we get before one success. Using the same
example given in the previous section, we would like to know the number of failures we see
before we get the first 4 on rolling the dice.
The PMF is given as P(X = k) = (1 − p)^k · p,
where p is the probability of success and k is the number of failures. Here, r = 1.
Examples: cost-benefit analysis of repeated attempts; number of people asked before finding a
supporter of a law.
The mean of the geometric distribution is m = (1 − p) / p and the variance is v = (1 − p) / p².
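A sketch for the geometric case, again with p = 1/6. scipy's geom counts trials up to and including the first success, so the failures-before-first-success version used here is obtained from nbinom with r = 1:

from scipy.stats import nbinom

p = 1/6
dist = nbinom(1, p)            # failures before the first success
print(dist.pmf(4))             # P(4 non-fours before the first 4) = (5/6)^4 * (1/6)
print(dist.mean())             # (1 - p)/p = 5
print(dist.var())              # (1 - p)/p^2 = 30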
1. Normal Distribution: The normal distribution is the proper term for a probability bell curve.
2. In the standard normal distribution, the mean is zero and the standard deviation is 1. A normal
distribution has zero skew and a kurtosis of 3.
3. Normal distributions are symmetrical, but not all symmetrical distributions are normal.
4. Many naturally-occurring phenomena tend to approximate the normal distribution.
a. In finance, most pricing distributions are not, however, perfectly normal.
b. For example, heights, blood pressure, measurement error, and IQ scores follow
the normal distribution
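A scipy sketch of the normal distribution; the IQ parameters (mean 100, sd 15) are the conventional illustrative values:

from scipy.stats import norm

iq = norm(loc=100, scale=15)                # assumed IQ distribution, for illustration
print(iq.pdf(100))                          # density at the mean
print(iq.cdf(130) - iq.cdf(70))             # P(70 < IQ < 130), roughly 0.954
print(norm(0, 1).mean(), norm(0, 1).std())  # standard normal: mean 0, sd 1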
6. Binomial vs Bernoulli
The Bernoulli distribution represents the success or failure of a single Bernoulli trial. The
Binomial Distribution represents the number of successes and failures in n independent
Bernoulli trials for some given value of n.
7. Hypothesis Testing: Null and alternative hypotheses are used in statistical hypothesis
testing. The null hypothesis of a test always predicts no effect or no relationship between
variables, while the alternative hypothesis states your research prediction of an effect or
relationship.
● Difference between two-tailed and one tailed
A one-tailed test looks for an “increase” or “decrease” in the parameter, whereas a two-tailed test
looks for a “change” (could be an increase or a decrease) in the parameter. The main difference
between one-tailed and two-tailed tests is that one-tailed tests have only one critical region
whereas two-tailed tests have two critical regions.
● T test
A t test is a statistical test that is used to compare the means of two groups. It is often used in
hypothesis testing to determine whether a process or treatment actually has an effect on the
population of interest, or whether two groups are different from one another.
For example, a t test can be used to check whether the mean petal length of flowers from two
different species is the same (see the scipy sketch after this list).
● F test: the F-test is based on the F distribution, a continuous probability distribution that
arises frequently as the null distribution of a test statistic, most notably in the analysis of variance.
Chi-square
● A chi-square (χ2) statistic is a measure of the difference between the observed
and expected frequencies of the outcomes of a set of events or variables.
● Chi-square is useful for analyzing such differences in categorical variables,
especially those nominal in nature.
● χ2 depends on the size of the difference between actual and observed values,
the degrees of freedom, and the sample size.
● χ2 can be used to test whether two variables are related or independent from
each other (see the scipy sketch after this list).
● It can also be used to test the goodness of fit between an observed distribution
and a theoretical distribution of frequencies.
● P value: the p-value is defined as the lowest significance level at which a null
hypothesis can be rejected.
● Level of Significance: The level of significance (α) is the threshold used to judge
statistical significance. It is the probability of rejecting the null hypothesis when it
is in fact true; a result is declared statistically significant, and the null hypothesis
rejected, when the p-value falls below this threshold.
● Power of Test: The power of a test is the probability of rejecting the null
hypothesis when it is false; in other words, it is the probability of avoiding a type
II error. The power may also be thought of as the likelihood that a particular
study will detect a deviation from the null hypothesis given that one exists.
● Degree of Freedom: Degrees of freedom, often represented by v or df, is the
number of independent pieces of information used to calculate a statistic. It's
calculated as the sample size minus the number of restrictions.
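A minimal scipy sketch of the t test and chi-square test referred to above; the petal lengths and the contingency table are made-up illustrative numbers:

import numpy as np
from scipy import stats

# Two-sample t test: are the mean petal lengths of two species equal?
species_a = np.array([1.4, 1.5, 1.3, 1.6, 1.4])   # hypothetical petal lengths
species_b = np.array([4.7, 4.5, 4.9, 4.6, 4.8])
t_stat, p_val = stats.ttest_ind(species_a, species_b)
print(t_stat, p_val)               # small p-value -> reject H0 of equal means

# Chi-square test of independence on a 2x2 contingency table (hypothetical counts)
table = np.array([[30, 10],
                  [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)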
8. Central Limit Theorem: The CLT states that, given a sufficiently large sample size, the
sampling distribution of the mean for a variable will approximate a normal distribution regardless
of that variable’s distribution in the population.
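A small numpy simulation sketch of the CLT: sample means drawn from a heavily skewed (exponential) population still cluster in an approximately normal way once the sample size is reasonably large:

import numpy as np

rng = np.random.default_rng(42)
# Population: exponential with mean 2.0 (heavily skewed, clearly not normal)
sample_means = rng.exponential(scale=2.0, size=(10_000, 50)).mean(axis=1)

# The 10,000 sample means are approximately normal, centred near the
# population mean (2.0) with standard deviation about 2/sqrt(50)
print(sample_means.mean(), sample_means.std())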
9. Type 1 and Type 2 error: A type I error (false-positive) occurs if an investigator rejects a null
hypothesis that is actually true in the population; a type II error (false-negative) occurs if the
investigator fails to reject a null hypothesis that is actually false in the population.
● Which is more problematic? = depends on cases equally harmful.
● Explain with real life Examples = Card Industry or industry specific
10. MLE: Maximum Likelihood Estimation (MLE) is a statistical method used to estimate the
parameters of a probability distribution that best describe a given dataset. The fundamental idea
behind MLE is to find the values of the parameters that maximize the likelihood of the observed
data, assuming that the data are generated by the specified distribution.
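A minimal sketch of MLE, estimating the λ of a Poisson sample by numerically maximising the log-likelihood; the data are simulated, and for the Poisson the MLE is simply the sample mean, which the optimiser should recover:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(0)
data = rng.poisson(lam=3.5, size=500)    # simulated data with true lambda = 3.5

def neg_log_likelihood(lam):
    return -poisson.logpmf(data, mu=lam).sum()

result = minimize_scalar(neg_log_likelihood, bounds=(0.01, 20), method="bounded")
print(result.x, data.mean())             # MLE of lambda vs the sample mean (they agree)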
10. Econometrics: Econometrics is the use of statistical and mathematical models to develop
theories or test existing hypotheses in economics and to forecast future trends from historical
data. It subjects real-world data to statistical trials and then compares the results against the
theory being tested.
Depending on whether you are interested in testing an existing theory or in using existing data
to develop a new hypothesis, econometrics can be subdivided into two major categories:
theoretical and applied.
11. Causation Vs Correlation: A correlation between variables, however, does not
automatically mean that the change in one variable is the cause of the change in the values of
the other variable. Causation indicates that one event is the result of the occurrence of the other
event; i.e. there is a causal relationship between the two events.
Correlation means there is a statistical association between variables. Causation means that a
change in one variable causes a change in another variable.
12. Correlation Vs Covariance: covariance and correlation measure the relationship and the
dependency between two variables. Covariance indicates the direction of the linear relationship
between variables while correlation measures both the strength and direction of the linear
relationship between two variables.
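A numpy sketch of the distinction: covariance depends on the scale of the variables, while correlation is standardised to lie between −1 and 1 (the data are arbitrary illustrative numbers):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

print(np.cov(x, y)[0, 1])       # covariance: direction, but scale-dependent magnitude
print(np.corrcoef(x, y)[0, 1])  # correlation: strength and direction, in [-1, 1]

# Rescaling x changes the covariance but leaves the correlation unchanged
print(np.cov(10 * x, y)[0, 1], np.corrcoef(10 * x, y)[0, 1])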
13. CLRM for multiple regression: CLRM stands for the Classical Linear Regression Model.
The CLRM is also known as the standard linear regression model. Three sets of assumptions
define the multiple CLRM -- essentially the same three sets of assumptions that defined the
simple CLRM, with one modification to assumption A8.
1. Assumptions respecting the formulation of the population regression equation, or PRE.
Assumption A1
2. Assumptions respecting the statistical properties of the random error term and the dependent
variable. Assumptions A2-A4
• Assumption A2: The Assumption of Zero Conditional Mean Error
• Assumption A3: The Assumption of Constant Error Variances
• Assumption A4: The Assumption of Zero Error Covariances
3. Assumptions respecting the properties of the sample data. Assumptions A5-A8
• Assumption A5: The Assumption of Independent Random Sampling
• Assumption A6: The Assumption of Sufficient Sample Data (N > K)
• Assumption A7: The Assumption of Nonconstant Regressors
• Assumption A8: The Assumption of No Perfect Multicollinearity
14. When the errors are dependent,we can use generalized least squares (GLS). When the
errors are independent, but not identically distributed, we can use weighted least squares
(WLS), which is a special case of GLS.
Weighted Least Squares Regression (WLS) regression is an extension of the ordinary least
squares (OLS) regression that weights each observation unequally. The additional scale factor
(weight), included in the fitting process, improves the fit and allows handling cases with data of
varying quality.
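A statsmodels sketch of WLS on simulated heteroscedastic data; the weights 1/x² reflect an assumed error variance proportional to x², an illustrative choice rather than a general rule:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(1, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)     # error spread grows with x

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()  # weight = 1/assumed variance
print(ols_fit.params, wls_fit.params)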
16. Why do we take squared deviations rather than the absolute value (modulus) of the errors,
although both of them give us a positive value?
ANS: The absolute value function is not differentiable at zero, whereas the squared deviation is
differentiable everywhere, which makes minimisation (e.g. deriving the OLS estimator) analytically
tractable.
17.Outlier:
In simple terms, an outlier is an extremely high or extremely low data point relative to the
nearest data point and the rest of the neighboring co-existing values in a data graph or dataset
you're working with. Outliers are extreme values that stand out greatly from the overall pattern of
values in a dataset or graph.
Detection of Outlier? Given a data set how would you figure out which one is an outlier?
1. ANS Box plot (IQR) method
2. Graph it
How to remove outliers?
ANS = 1. Standardisation
2. Normalisation
How do you deal with outliers once detected?
One method is to remove outliers as a means of trimming the data set.
Another method involves replacing the values of outliers or reducing the influence of outliers
through outlier weight adjustments.
The third method is used to estimate the values of outliers using robust techniques.
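A short numpy sketch of the box-plot (IQR) rule for flagging outliers; the 1.5×IQR fence is the conventional choice and the data are made up:

import numpy as np

data = np.array([10, 12, 11, 13, 12, 11, 14, 13, 12, 48])   # 48 looks suspicious
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)   # -> [48]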
1. R^2
● Explain R^2 to a layman
The coefficient of determination, or R², is a measure that provides information
about the goodness of fit of a model. In the context of regression it is a statistical
measure of how well the regression line approximates the actual data.
The R² tells us the percentage of variance in the outcome that is explained by
the predictor variables (i.e., the information we do know). A perfect R² of 1.00
means that our predictor variables explain 100% of the variance in the outcome
we are trying to predict.
● formula: R² = ESS / TSS = 1 − RSS / TSS (explained sum of squares over total sum of squares)
● R square vs correlation
Correlation measures the strength of the relationship between two variables,
while R-squared measures the amount of variation in the data that is explained
by the model.
● If we change the functional form, how do we compare the R²?
Use the adjusted R²
Good substitutes of R^2
To complement R-squared, you can use error metrics such as root mean
squared error (RMSE) and mean absolute error (MAE). These metrics measure
the average distance between the actual and predicted values, and they can
help you compare different models or evaluate how well your model performs on
new data
18. Adjusted R2: The adjusted R-squared is a modified version of R-squared that penalises the
addition of predictors that do not improve the model. In other words, the adjusted R-squared
shows whether adding additional predictors improves a regression model or not.
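A numpy/sklearn sketch computing R², adjusted R², RMSE and MAE for a toy fit (all numbers are illustrative, and the predictions are assumed to come from some previously fitted model):

import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])    # hypothetical model predictions

n, k = len(y_true), 1                            # sample size and number of predictors (assumed)
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
print(r2, adj_r2, rmse, mae)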
19. Why are error terms normally distributed?
In developing our models we make the assumption that the error terms are normally distributed.
This assumption helps us in the sense that the work required to calculate estimates, test
statistics and others becomes relatively easier.
20. Modelling based questions that come for econometrics and statistics
● Example: Suppose you are a bank and you need to prepare an econometric
model to give loan to customers. You need to identity whom to give a loan and
whom not to give.
21. Heteroscedasticity, Multicollinearity, Autocorrelation
● Kind of Tests (2 each)
Heteroscedasticity:
1. Nature of data- cross section data
2. Graphical method- graph error squares against y hat square or any explanatory variable
3. PARK test
Functional form: ln σi² = B1 + B2 ln Xi + vi
Steps:
1. Estimate the regression despite heteroscedasticity and obtain the residuals
2. Regress ln ei² on each explanatory variable (ln Xi)
3. t test on B2: if the null is rejected, there is heteroscedasticity
Drawback: the error term vi may itself be heteroscedastic (the heteroscedasticity may be coming
from some omitted variable)
4. GLEJSER test
Functional form: |ei| = B1 + B2 Xi + vi (or other functions of Xi)
Steps:
1. Estimate the regression despite heteroscedasticity and obtain the residuals
2. Regress |ei| on each explanatory variable
3. t test on B2: if the null is rejected, there is heteroscedasticity
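A minimal statsmodels sketch of the Park test on simulated heteroscedastic data (the Glejser version is identical except that |ei| replaces ln ei² as the dependent variable); the data-generating process is assumed purely for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.linspace(1, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)     # error variance grows with x

# Step 1: run the original regression and keep the residuals
resid = sm.OLS(y, sm.add_constant(x)).fit().resid

# Step 2: Park's auxiliary regression of ln(e_i^2) on ln(X_i)
park = sm.OLS(np.log(resid**2), sm.add_constant(np.log(x))).fit()

# Step 3: t test on the slope; a small p-value suggests heteroscedasticity
print(park.params[1], park.pvalues[1])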
Autocorrelation:
Durbin-Watson d test
Assumptions: AR(1) error scheme, intercept included, normality (if not, use the runs test), X
non-stochastic, no lagged dependent variables, no missing observations (d around 2 suggests, but
does not prove, no autocorrelation)
Great advantage: it is based on the estimated residuals
Great drawback: if d falls in the indecisive zone, one cannot conclude whether autocorrelation
exists or not
Normal z or t test
The Breusch-Godfrey test
To avoid some of the pitfalls of the d test, the BG test is more general: it allows for lagged
dependent variables and higher-order autoregressive schemes.
Under the null hypothesis (no autocorrelation), (n − p)R² follows a chi-square distribution.
Drawback: the length of the lag cannot be specified a priori; use the Akaike and Schwarz
information criteria to select the lag length.
Check for specification bias.
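A statsmodels sketch of the Breusch-Godfrey test on a simulated regression with AR(1) errors; the lag order of 2 is chosen arbitrarily here, which is exactly the lag-length issue noted above:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                    # AR(1) errors: e_t = 0.7 e_{t-1} + u_t
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(fit, nlags=2)
print(lm_stat, lm_pvalue)                # small p-value -> reject H0 of no autocorrelation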
● To remove multicollinearity – PCA – Principal component analysis
Principal component analysis, or PCA, is a dimensionality reduction method that is often used to
reduce the dimensionality of large data sets, by transforming a large set of variables into a
smaller one that still contains most of the information in the large set.
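A brief sklearn sketch of PCA on deliberately collinear simulated variables; the resulting principal components are uncorrelated, which is why PCA can be used to sidestep multicollinearity:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)    # nearly collinear with x1
X = np.column_stack([x1, x2])

pca = PCA(n_components=2)
components = pca.fit_transform(X)
print(pca.explained_variance_ratio_)            # almost all variance in the first component
print(np.corrcoef(components.T)[0, 1])          # ~0: the components are uncorrelated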
22. VIF
A variance inflation factor (VIF) is a measure of the amount of multicollinearity in
regression analysis. Multicollinearity exists when there is a correlation between multiple
independent variables in a multiple regression model. This can adversely affect the
regression results. Thus, the variance inflation factor can estimate how much the
variance of a regression coefficient is inflated due to multicollinearity.
The VIF for regressor j is VIFj = 1 / (1 − Rj²), where Rj² is the R² from regressing Xj on all the
other regressors.
To ensure the model is properly specified and functioning correctly, there are tests that can be
run for multicollinearity. The variance inflation factor is one such measuring tool. Using variance
inflation factors helps to identify the severity of any multicollinearity issues so that the model can
be adjusted. Variance inflation factor measures how much the behavior (variance) of an
independent variable is influenced, or inflated, by its interaction/correlation with the other
independent variables.
Variance inflation factors allow a quick measure of how much a variable is contributing to the
standard error in the regression. When significant multicollinearity issues exist, the variance
inflation factor will be very large for the variables involved. After these variables are identified,
several approaches can be used to eliminate or combine collinear variables, resolving the
multicollinearity issue.
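A statsmodels sketch computing VIFs for simulated regressors, two of which are made highly collinear; a VIF well above 10 is the usual informal warning sign:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)   # highly correlated with x1
x3 = rng.normal(size=200)                    # independent regressor
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF_j = 1 / (1 - R_j^2); column 0 is the constant, so start at 1
for j in range(1, X.shape[1]):
    print(j, variance_inflation_factor(X, j))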
23. Dummy Variable
● When incorporating qualitative variables into regression models, statistical
programs create dummy variables that take the value of 0 or 1. For instance, a
dummy variable created from fuelType with levels of gas and diesel would take
the value 1 if the car uses gas and 0 if it uses diesel.
● how to create dummies from many variables
There are two steps to successfully set up dummy variables in a multiple
regression: (1) create dummy variables that represent the categories of your
categorical independent variable; and (2) enter values into these dummy
variables – known as dummy coding – to represent the categories of the
categorical independent variable.
● What do you think when you introduce dummy variables?
Dummy variables are useful because they allow us to include categorical
variables in our analysis, which would otherwise be difficult to include due to
their non-numeric nature. They can also help us to control for confounding
factors and improve the validity of our results.
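A pandas sketch of dummy coding for the fuelType example above; the tiny DataFrame is made up for illustration:

import pandas as pd

cars = pd.DataFrame({
    "price": [12000, 15000, 13500, 18000],
    "fuelType": ["gas", "diesel", "gas", "diesel"],
})

# drop_first=True keeps one dummy fewer than the number of categories,
# avoiding the dummy-variable trap (perfect multicollinearity)
dummies = pd.get_dummies(cars, columns=["fuelType"], drop_first=True)
print(dummies)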
24. Functional Form:(read yourself)
● Interpretations
● log
● Double log
● Double linear
● elasticities
25. Specification Bias
● How do you detect whether the model is doing fine?
● What tests are used?
https://rlacollege.edu.in/pdf/Statistics/specification-bias.pdf
26.
27. Logit Model (YouTube)
28. Logistic Distribution (YouTube)
29. Time Series
● What is Autocorrelation?
Autocorrelation refers to the degree of correlation of the same variable between
two successive time intervals. It measures how the lagged version of the value
of a variable is related to the original version of it in a time series.
Autocorrelation, as a statistical concept, is also known as serial correlation (see
the sketch at the end of this section).
● What improvements would you make to your academic project – Time series
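As referenced in the autocorrelation bullet above, a small statsmodels sketch computing autocorrelations of a simulated AR(1) series (the 0.8 coefficient is an arbitrary illustrative value):

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(6)
n = 500
y = np.zeros(n)
for t in range(1, n):              # AR(1): y_t = 0.8 * y_{t-1} + noise
    y[t] = 0.8 * y[t - 1] + rng.normal()

print(acf(y, nlags=5))             # lag-k autocorrelations decay roughly like 0.8**k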