
Economics Notes - Sem 4 (Made by Arya Naga only with Sakshi notes)

Unit 1. Introduction to statistics

Statistics: a way to get information from collected data; it presents facts in numerical figures. The objective is to extract information from data and present it, in factual form, as an interpretation.

- Descriptive Statistics - deals with organizing, summarizing, and presenting data in a convenient and informative manner. (Can describe a population, e.g., its average or mean.) Conveys information in two ways/types:

a) Graphical techniques - an easy method to extract information from the presented data. Ex. bar graph, pie chart, etc.

b) Numerical techniques - use numerical measures to summarize data for further interpretation. Ex. mean calculation, standard deviation, etc.

- Parameter - a descriptive measure of a population is called a parameter.
- Histogram - shows the number of observations falling within each particular range.
- Sample - a set of data drawn from the studied population. A descriptive measure of a sample is called a statistic. Sample statistics are used to make inferences about the whole population.
- Population - the entire set of observations under study / the group of items of interest, usually very large.
- Statistical Inference - the process of making an estimate, prediction, or decision about a population based on sample data. It involves the following:

Inferential Statistics - methods used to draw conclusions or inferences about characteristics of populations based on sample data.

Confidence level - the proportion of times that an estimating procedure will be correct.

Significance level - measures how frequently the conclusion will be wrong.

EXIT POLLS: a method used during elections to make predictions from voters' responses as they exit the polling booth (used only for elections).

Scope of Statistics

Statistics increases the field of mental vision just as binoculars increase the field of physical vision.
Simplifies unwieldy and complex masses of data, and presents them in a form and manner that makes them understandable.
Converts data into information, making it better suited for decision making.
Quantifies and measures uncertainty and variability, and thereby helps in decision making.
Discovers past and emerging patterns in data; using such analysis, it helps in forecasting. Sometimes statistics can even suggest possible reasons for such a pattern.
Helps in estimation and in validating assumptions.
Quantitative data carry conviction, and add force and credibility to the issue being discussed; they are used to convince or win an argument.

How does it help others?

Helps managers (in all sectors of the economy) summarize and organize the data they receive on a massive scale, which further helps in decision making.

Decision making is often performed with the help of descriptive statistics. These methods are straightforward. Statistics helps in economic planning as well.

Most management, business, and economics students will encounter numerous opportunities to make valuable use of graphical and numerical descriptive techniques when preparing reports and presentations in the workplace.

Unit 2. Classification and Types of data

Type of data

Some important terms:

Variable - some characteristic of a population or sample. E.g., the mark on a stats exam of a student or the price of a stock. Represented by uppercase letters like X, Y, Z.

Values - the values of a variable are the possible observations of the variable. E.g., prices of stocks (real numbers) ranging from 0 to 100 dollars.

Data - the observed values of a variable. (Datum is the singular of data.)

Three Types of data

1. Interval - real numbers, such as heights, weights, incomes, and distances.
2. Nominal - values which are categories, such as the marital status of people. Codes
are assigned to these non-numerical values. Nominal data are also called qualitative or
categorical.

3. Ordinal- appear to be nominal but the difference is the order of their values has
meaning. Such as poor, fair, good, very good. It indicates a higher rating. Accordingly,
codes are assigned in ascending order.

* Difference between ordinal and interval is that differences in interval data are
consistent and meaningful and in ordinal data codes are assigned so it's impossible
to compute and interpret differences.

* Difference between nominal and ordinal data is that the order of the values of the
latter indicate a higher rating.

* In nominal and ordinal data you cannot perform descriptive statistical calculations,
except for the frequency of each value.

* In ordinal data only ranking can be assigned and then calculated.

Describing a set of Nominal Data.

Frequency Distribution- Summarising the data in a table, which presents categories and
their counts.

Relative frequency distribution- lists the categories and the proportion with which each
occurs.

Function: COUNTIF(Input range, Criteria)

E.g., =COUNTIF(X1:X250, 1)
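The same counts COUNTIF produces can be sketched in Python; a minimal sketch, assuming a small made-up list of coded nominal responses (1 = single, 2 = married):

```python
# A minimal sketch of a frequency and relative frequency distribution,
# assuming made-up nominal codes (1 = single, 2 = married).
from collections import Counter

responses = [1, 2, 2, 1, 2, 1, 1, 2, 2, 2]   # hypothetical coded data

counts = Counter(responses)                   # frequency distribution
n = len(responses)
for category in sorted(counts):
    # category, frequency, relative frequency (proportion)
    print(category, counts[category], counts[category] / n)
```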

Describing relationship between two nominal variables or two or more data sets.

One of the methods used is to create a cross-classification table and to produce a table
showing the row relative frequencies. Using a Pivot Table (pg 36), follow the instructions;
for the "Summarize values by" step, go to Values, right-click, and choose Count.
To convert into percentages, right-click and choose % of row.
Interpretation= If the two variables are unrelated, then the patterns exhibited in the bar
charts should be approximately the same. If some relationship exists, then some bar
charts will differ from others.
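A minimal sketch of the same cross-classification in Python, assuming pandas is available; the column names and data below are hypothetical, not from the notes:

```python
# A minimal sketch of a cross-classification table with row relative
# frequencies; data and column names are made up for illustration.
import pandas as pd

df = pd.DataFrame({
    "newspaper": ["G&M", "Post", "Star", "G&M", "Star", "Post", "G&M"],
    "occupation": ["blue", "white", "blue", "prof", "white", "prof", "blue"],
})

table = pd.crosstab(df["newspaper"], df["occupation"])     # counts
row_pct = pd.crosstab(df["newspaper"], df["occupation"],
                      normalize="index")                   # % of row
print(table)
print(row_pct)
```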

Unit 3. Data presentation and Visualization

Graphical techniques to describe interval data

The frequency distribution provides information about how the numbers are distributed,
the information is more easily understood and imparted by drawing a picture or graph.
The graph is called a histogram. A histogram is created by drawing rectangles whose
bases are the intervals and whose heights are the frequencies.

Steps for histogram- go to data, then data analysis, click histogram and then put in
Bin range and output range, choose labels and then Chart Output.

Class interval width = (Largest observation − Smallest observation) / Number of classes
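A minimal sketch of the class-width formula and manual binning in Python, on a made-up interval data set with 5 classes:

```python
# A minimal sketch of the class-width rule and manual binning, assuming
# a small made-up interval data set and 5 classes.
data = [12, 15, 22, 25, 31, 34, 41, 44, 52, 58]
k = 5                                      # chosen number of classes
lo, hi = min(data), max(data)
width = (hi - lo) / k                      # (largest − smallest) / no. of classes

counts = [0] * k
for x in data:
    i = min(int((x - lo) / width), k - 1)  # index of the class x falls in
    counts[i] += 1
print(width, counts)                       # class width and frequencies
```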

Shapes of Histogram-

Symmetry- the two sides are identical in shape and size.

Positively skewed - the long tail extends to the right (most observations are bunched toward the left).

Negatively skewed - the long tail extends to the left (most observations are bunched toward the right).

A skewed histogram is one with a long tail extending to either the right or the left.

Unimodal - single peak. Bimodal - two peaks.

Time series- Time-series data are often graphically depicted on a line chart, which is a
plot of the variable over time. It is created by plotting the value of the variable on the
vertical axis and the time periods on the horizontal axis.

Scatter Diagram - economists develop statistical techniques to describe the relationship
between variables such as unemployment rates and inflation. The technique is called a
scatter diagram. (Pg 69 for graphs.)

Interpretation: the relationship shown can be positive, negative, nonexistent, or nonlinear.

Ogives - an ogive is a freehand curve drawn to show the cumulative frequency
distribution. It is also known as a cumulative frequency polygon.

Unit 4. Data Collection and Sampling

Direct observation - the simplest method of obtaining data is by direct observation. When data are gathered in this way, they are said to be observational.

Experimental data - produced through experiments; this method is more expensive.

Surveys- One of the most familiar methods of collecting data is the survey, which
solicits information from people concerning such things as their income, family size, and
opinions on various issues.

Personal Interview - involves an interviewer soliciting information from a respondent by asking prepared questions. A personal interview has the advantage of a higher expected response rate than other methods of data collection.

Telephone Interview- A telephone interview is usually less expensive, but it is also less
personal and has a lower expected response rate. Unless the issue is of interest, many
people will refuse to respond to telephone surveys.

Simple Random Sampling - a sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen.

A stratified random sample is obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum. (Making categories.)

A cluster sample is a simple random sample of groups or clusters of elements.

Sampling Error refers to differences between the sample and the population that exist
only because of the observations that happened to be selected for the sample. Sampling
error is an error that we expect to occur when we make a statement about a population
based only on the observations contained in a sample taken from that population.
The difference between the true (unknown) value of the population mean and its
estimate, the sample mean, is the sampling error. (Ex. the population example.)

Non sampling Error is more serious than sampling error because taking a larger
sample will not diminish the size, or the possibility of occurrence, of this error. Even a
census can (and probably will) contain non-sampling errors. Nonsampling errors result
from mistakes made in the acquisition of data or from the sample observations being
selected improperly.

1. Errors in data acquisition- arises from the recording of incorrect responses.

2. Nonresponse error- refers to error (or bias) introduced when responses are not
obtained from some members of the sample.

3. Selection bias- occurs when the sampling plan is such that some members of the
target population cannot possibly be selected for inclusion in the sample.

Unit. 5 Measures of Central Tendency and Dispersion

Measures of central location-

Arithmetic mean= mean is computed by summing the observations and dividing by the
number of observations.

Population mean - μ; sample mean - x̄

Function= AVERAGE ([Input Range]) or Descriptive Analysis.

Median= The median is calculated by placing all the observations in order (ascending or
descending). The observation that falls in the middle is the median.

Function= MEDIAN (Input Range)

The mode is defined as the observation (or observations) that occurs with the greatest
frequency. Both the statistic and parameter are computed in the same way. Pg-93

Range- Largest obv-smallest obv

Variance - how the given data vary around the arithmetic mean of the data set. Sample
variance denotes the variation between your sample data and the mean of your sample
data. A sample is drawn from the huge pool of the population (the entire available data).
Function: VAR(Input Range)

Population variance and sample variance: pg. 97

VARIANCE CANNOT BE NEGATIVE BECAUSE THE DEVIATIONS ARE SQUARED, WHICH ELIMINATES ALL POSSIBILITY OF A NEGATIVE VALUE.

Standard deviation: Shows how much your data deviates or varies from the average or
the mean of the data. Low SD means data is clustered around mean and high means
there is a high dispersion of data from mean.

Interpret= info depends on the shape of histogram and if the histogram is bell shaped,
use the empirical rule.

1. Approximately 68% of all observations fall within one standard deviation of the mean.

2. Approximately 95% of all observations fall within two standard deviations of the mean.

3. Approximately 99.7% of all observations fall within three standard deviations of the mean.

The coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean.

Population coefficient of variation: CV = σ/μ

Sample coefficient of variation: cv = s/x̄
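A minimal sketch tying the measures above together, using Python's statistics module on a made-up sample:

```python
# A minimal sketch of the central-tendency and dispersion measures above,
# computed on a made-up sample with Python's statistics module.
import statistics as st

sample = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = st.mean(sample)       # arithmetic mean (AVERAGE)
median = st.median(sample)   # middle value of the ordered data (MEDIAN)
mode = st.mode(sample)       # most frequent value
s2 = st.variance(sample)     # sample variance, divides by n − 1 (VAR)
s = st.stdev(sample)         # sample standard deviation
cv = s / mean                # sample coefficient of variation, cv = s/x̄

print(mean, median, mode, s2, s, cv)
```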

Percentile

The Pth percentile is the value for which P % are less than that value and (100 – P)%
are greater than that value.

Because these three statistics divide the set of data into quarters, these measures of
relative standing are also called quartiles. The first or lower quartile is labeled Q1. It is
equal to the 25th percentile. The second quartile, Q2, is equal to the 50th percentile,
which is also the median. The third or upper quartile, Q3, is equal to the 75th percentile.

Quintiles divide the data into fifths, and deciles divide the data into tenths.

LP = (n + 1) P/100

The interquartile range measures the spread of the middle 50% of the observations.
Large values of this statistic mean that the first and third quartiles are far apart,
indicating a high level of variability.

=Q3-Q1
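A minimal sketch of the location formula LP = (n + 1)P/100 in Python, interpolating when the location falls between two ordered observations; the data are made up:

```python
# A minimal sketch of percentiles via L_P = (n + 1)P/100, with linear
# interpolation between adjacent ordered observations.
def percentile(data, p):
    xs = sorted(data)
    n = len(xs)
    lp = (n + 1) * p / 100          # location of the Pth percentile
    lo = int(lp)                    # ordered position just below L_P
    frac = lp - lo
    if lo < 1:
        return xs[0]
    if lo >= n:
        return xs[-1]
    return xs[lo - 1] + frac * (xs[lo] - xs[lo - 1])

data = [3, 7, 8, 12, 13, 15, 18, 21]
q1, q2, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
print(q1, q2, q3, q3 - q1)          # quartiles and interquartile range
```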
- Mean: The arithmetic average of the given data set
- Median: The middle value of the given data set after arranging it in ascending
order
- Mode: The value in a data set that has the most repetitive frequency.
- Standard error of the mean: the standard deviation of sample means is called
the standard error (it is calculated by taking the means of multiple samples and
then computing their standard deviation). The bigger the sample size, the smaller
the standard error. It tells how far your sample mean is likely to be from the actual
population mean; a measure of the mean's accuracy.
- Standard deviation: Shows how much your data deviates or varies from the
average or the mean of the data. (data’s dispersion) Low SD means data is
clustered around mean and high means there is a high dispersion of data from
mean.
- Sample Variance: How given data varies to the arithmetic mean of the data set.
Sample variance denotes the variation between your sample data and the mean
of your sample data. Sample is plucked from the huge pool of population (entire
available data). It is the squared value of standard deviation.
- Range: Difference between the highest and the smallest number of a given data
set / subtracting the smallest from the largest number.

Unit. 6-7 Probability 1

Random Experiment - a random experiment is an action or process that leads to one of several possible outcomes.

Features:

The first step in assigning probabilities is to produce a list of the outcomes. The listed
outcomes must be exhaustive, which means that all possible outcomes must be
included. In addition, the outcomes must be mutually exclusive, which means that no
two outcomes can occur at the same time.

Sample Space: A sample space of a random experiment is a list of all possible


outcomes of the experiment. The outcomes must be exhaustive and mutually exclusive.

Requirements:

1. The probability of any outcome must lie between 0 and 1; that is, 0 ≤ P(Oi ) ≤ 1

2. The sum of the probabilities of all the outcomes in a sample space must be 1.
3 approaches to assigning probabilities:

1. Classical approach - equal probabilities assigned based on the number of possible outcomes. Games of chance (guessing).
2. Relative frequency approach - defines probability as the long-run relative frequency with which an outcome occurs. (Assumes that there is an infinite number of observations.)

Formula: P(A) = n(A) / n(S), where n(S) is the total number of observations in the sample space.

3. Subjective approach - defines probability as the degree of belief that we hold in the occurrence of an event.

Interpreting probability: 6-1d, pg 158. Ex. first interpret using the relative frequency approach, then the subjective approach.

Event: An event is a collection or set of one or more simple events in a sample space.

An individual outcome of a sample space is called a simple event. All other events are
composed of simple events in a sample space.

Probability of an Event is the sum of the probabilities of the simple events that
constitute the event.

Intersection of Events A and B

The intersection of events A and B is the event that occurs when both A and B occur. It
is denoted as A and B. The probability of the intersection is called the joint probability.
(Central space in the Venn Diagram).

Marginal probabilities, computed by adding across rows or down columns, are so named because they are calculated in the margins of the table.

Conditional probabilities (used to know how two events are related) - the probability of one event given the occurrence of another related event; this probability applies only if the condition occurs.

| = "given" (the condition), for conditional probability

∩ = "and" = intersection (inverted U), for joint probability

∪ = "or" = union

Independence: two events A and B are independent if and only if their joint probability is the product of the marginal probabilities P(A) and P(B); P(A ∩ B) = P(A) × P(B). This is also the multiplication theorem - the joint probability of two events.

Two events A and B are said to be mutually exclusive or disjoint if they don't have any common outcomes: P(A and B) = 0; P(A ∩ B) = 0.

Complements ("reciprocals"): when the probabilities of the two events sum to 1.

Sum is 1: when the sample space is the same (same denominator).

Addition theorem: if A and B are not mutually exclusive events, then:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Addition rule for mutually exclusive events: P(A ∪ B) = P(A) + P(B)

Use the addition theorem whenever the question says "or"; use the multiplication theorem whenever the question says "and". (See the sketch below.)
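A minimal sketch of these rules in Python; the probabilities below are assumptions chosen for illustration:

```python
# A minimal sketch of the addition and multiplication rules; the
# probabilities are made up for illustration.
import math

p_a = 0.40
p_b = 0.30
p_a_and_b = 0.12                                  # joint probability P(A ∩ B)

p_a_or_b = p_a + p_b - p_a_and_b                  # addition theorem: P(A ∪ B)
p_a_given_b = p_a_and_b / p_b                     # conditional: P(A | B)
independent = math.isclose(p_a_and_b, p_a * p_b)  # multiplication-theorem check

print(p_a_or_b, p_a_given_b, independent)         # ≈ 0.58, ≈ 0.4, True
```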


Bayes theorem: given by Thomas Bayes, it describes the probability of an event based on prior knowledge of conditions that might be related to the event.
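The notes do not state the formula; for reference, the standard form (a known fact about the theorem, not a quote from the notes) is:

P(A | B) = P(B | A) · P(A) / P(B)

where P(A) is the prior probability of A and P(A | B) is the updated (posterior) probability after observing B.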

Complement Rule:

The complement of event A is the event that occurs when event A does not occur. The complement of event A is denoted by A^c. The complement rule derives from the fact that the probability of an event and the probability of the event's complement must sum to 1: P(A^c) = 1 − P(A).

Unit. 8 - 9. Random Variable and Probability Distribution 1

Random Variable - function or a rule that assigns real numbers to the outcome of a
random experiment.

A random experiment is one whose possible outcomes are known in advance, but whose exact outcome is unknown.

Discrete rv - one that can take on a countable number of values (e.g., test scores, number of heads).

Continuous rv - one whose values are uncountable (e.g., time, petrol consumed, height).

A probability distribution is a table, formula, or graph that describes the values of a random variable and the probability associated with these values.

Requirements for a distribution of a discrete random variable:

1. 0 ≤ P(x) ≤ 1 for all x (each probability lies between 0 and 1)

2. Σ over all x of P(x) = 1 (the sum of all probabilities = 1)

Population mean (mu, μ), sample mean, population variance, and sample variance: refer to notebook.

Standard deviation = square root of variance.

E(X) = Σ x · P(X = x)

The expected value is the sum of the values taken by the random variable, each weighted by its probability.

BINOMIAL DISTRIBUTION

Fixed number of trials.

Has only two outcomes - success and failure.

Probability of success - p

Probability of failure - 1 − p

Trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.

Bernoulli trial - a single trial with only two outcomes, success (probability p) and failure (probability 1 − p); the trials are independent.

P(x) = [n! / (x!(n − x)!)] · p^x · (1 − p)^(n − x)

Function = BINOM.DIST (see the sketch below)
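A minimal sketch of the binomial probability in Python, mirroring what BINOM.DIST(x, n, p, FALSE) returns; the values of n, x, and p are made up:

```python
# A minimal sketch of the binomial probability formula above,
# mirroring Excel's BINOM.DIST(x, n, p, FALSE).
from math import comb

def binom_pmf(x, n, p):
    # comb(n, x) is n! / (x!(n − x)!)
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(2, 10, 0.3))   # P(X = 2) for n = 10 trials, p = 0.3
```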

POISSON DISTRIBUTION

If we want to find the probability of a number of successes in an interval of time or space, we use the Poisson distribution.

Function = POISSON.DIST(x, mu, t/f)

P(x) = e^(−μ) · μ^x / x! (e.g., substituting x = 0 gives P(0) = e^(−μ))
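A minimal sketch of the Poisson probability in Python, mirroring POISSON.DIST(x, mean, FALSE); the mean μ = 2.5 is an assumption for illustration:

```python
# A minimal sketch of the Poisson probability P(x) = e^(−μ) μ^x / x!,
# mirroring Excel's POISSON.DIST(x, mean, FALSE).
from math import exp, factorial

def poisson_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)

print(poisson_pmf(0, 2.5))     # P(X = 0) when the mean rate is μ = 2.5
```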

NORMAL DISTRIBUTION

Based on continuous random variables, values are within a particular interval.

Heights, Weights, and distance covered.

It is bell shaped and symmetric around the mean.

Functions = NORM.DIST(x, mu, SD, TRUE)

NORM.S.DIST(Z, TRUE)

NORM.INV

NORM.S.INV
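A minimal sketch of the normal CDF in Python via the error function, mirroring NORM.DIST(x, mu, SD, TRUE) after standardizing to Z:

```python
# A minimal sketch of the normal cumulative probability via the error
# function, mirroring Excel's NORM.DIST(x, mu, sd, TRUE).
from math import erf, sqrt

def norm_cdf(x, mu, sd):
    z = (x - mu) / sd                      # standardize to Z
    return 0.5 * (1 + erf(z / sqrt(2)))    # P(X ≤ x)

print(norm_cdf(1.96, 0, 1))                # ≈ 0.975 for the standard normal
```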
Unit. 10 Inferential Statistics

Inferential Statistics: methods used to draw conclusions or inferences about characteristics of populations based on sample data.

Point Estimates: Point Estimator - a point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.

Interval Estimates: Interval Estimator - an interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

Confidence Intervals: refers to the probability that a population parameter will fall
between a set of values for a certain proportion of times. Analysts often use confidence
intervals that contain either 95% or 99% of expected observations.

Error in Estimation: an error made by using the equation of a regression line to estimate the values of the dependent variable from those of the independent variable.

Sample Size:
Statistical Distribution: Sampling Distribution -

x: 1 2 3 4 5 6

P(x): 1/6 1/6 1/6 1/6 1/6 1/6

Population mean -

μ = ΣxP(x)

= 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6)

= 3.5

Population variance -

σ² = Σ (x − μ)² P(x)

= (1 − 3.5)²(1/6) + (2 − 3.5)²(1/6) + (3 − 3.5)²(1/6) + (4 − 3.5)²(1/6) + (5 − 3.5)²(1/6) + (6 − 3.5)²(1/6)

= 2.92

Population standard deviation -

σ = √σ² = √2.92 = 1.71

Σ (sigma) = sum over all observations
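A minimal sketch verifying the die calculations above in Python:

```python
# A minimal sketch verifying μ = 3.5, σ² ≈ 2.92, and σ ≈ 1.71 for a fair die.
from math import sqrt

xs = [1, 2, 3, 4, 5, 6]
p = 1 / 6

mu = sum(x * p for x in xs)                  # population mean, Σ x·P(x)
var = sum((x - mu) ** 2 * p for x in xs)     # population variance, Σ (x−μ)²·P(x)
sd = sqrt(var)                               # population standard deviation
print(mu, var, sd)
```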


Unit. 11 Hypothesis Testing

The purpose of this type of inference is to determine whether enough statistical evidence exists to enable us to conclude that a belief or hypothesis about a parameter is supported by the data (or not).

H0: The null hypothesis (negative about the study/ absence of phenomena)

H1: The alternate hypothesis/ research hypothesis (positive about the study/what we
want to know/ presence of phenomena)

//phrased as not rejecting the null hypothesis in favor of the alternative.

There are two possible errors. A Type I error occurs when we reject a true null
hypothesis. A Type II error is defined as not rejecting a false null hypothesis. In the
criminal trial, a Type I error is made when an innocent person is wrongly convicted. A
Type II error occurs when a guilty defendant is acquitted. The probability of a Type I
error is denoted by α which is also called the significance level (alpha). The probability
of a Type II error is denoted by β (Greek letter beta). The error probabilities α and β are
inversely related, meaning that any attempt to reduce one will increase the other.

Critical Concepts of HT

The testing procedure begins with the assumption that the null hypothesis is true.

The goal of the process is to determine whether there is enough evidence to infer that
the alternative hypothesis is true.

There are two possible decisions:

- Conclude that there is enough evidence to support the alternative hypothesis.


- Conclude that there is not enough evidence to support the alternative hypothesis.

Two possible errors can be made in any test. A Type I error occurs when we reject a
true null hypothesis, and a Type II error occurs when we don’t reject a false null
hypothesis.

The probabilities of Type I and Type II errors are

P(Type I error) = α

P(Type II error) = β
Test Statistic: Randomly sample the population and calculate the sample mean. It is
the criterion on which we base our decision about the hypotheses. If the test statistic’s
value is inconsistent with the null hypothesis, we reject the null hypothesis and infer that
the alternative hypothesis is true.

Steps involved:

1. Define hypotheses (Null = H0, Alternate = H1)
2. Identify and note down
a. Standard deviation = sd
b. Mean = x
c. No of observation = n
d. Confidence level
3. Calculations:

Z_cal => either by using the normal distribution (NORMDIST) or by memorizing the table above.

Compute x̄ − μ and sd/√n, then:

Z_cal = (x̄ − μ) / [sd / √n]

Now compare the value of Z_cal with the table value and evaluate whether to accept or not: accept if it falls within the confidence-level z range from −z to +z, e.g., −1.96 < Z_cal < 1.96. (See the sketch below.)

4. Then mention which type of error you have eliminated.
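A minimal sketch of these steps in Python; the sample numbers are made up, and 1.96 is the two-sided critical value at 95% confidence:

```python
# A minimal sketch of the z-test steps above on made-up numbers.
from math import sqrt

mu0 = 100       # hypothesized population mean (H0)
xbar = 103.5    # sample mean
sd = 12         # standard deviation
n = 50          # number of observations

z_cal = (xbar - mu0) / (sd / sqrt(n))
print(z_cal)
if -1.96 < z_cal < 1.96:
    print("Z_cal falls inside the acceptance region: do not reject H0")
else:
    print("Z_cal falls outside: reject H0 in favor of H1")
```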

Unit. 12 Correlation

CORRELATION - a statistical technique which analyzes the relationship between 2 variables. The correlation coefficient lies between −1 and +1.

Positive (both rise and fall together), Negative (they move in opposite directions), Linear (the ratio of change between the variables stays constant throughout), Curvilinear (correlation in which the ratio of change varies).

Two variables= simple correlation, Partial = more than two variables but studying only 2
after making all other constants, Multiple = Studying multiple variables simultaneously.
Degree of Correlation:

Perfect: positive = 1; negative = −1

High degree: positive = 0.75 to 1; negative = −0.75 to −1

Moderate: positive = 0.25 to 0.75; negative = −0.25 to −0.75

Low: positive = 0 to 0.25; negative = 0 to −0.25

It is a measure of linear association.

Step by Step on excel:

Click Data > Data Analysis > select Correlation > Input Range: both columns (e.g., A1:B10).

Function:

= CORREL (A1: A10, B1: B10)

= PEARSON (A1: A10, B1: B10)

Karl Pearson Correlation Coefficient:

r_xy = covariance(x, y) / (σ_x · σ_y)

or r_xy = Σ(x − x̄)(y − ȳ) / √[ Σ(x − x̄)² · Σ(y − ȳ)² ]

Standard deviation: σ_x = √[ Σ(x − x̄)² / n ]

Covariance: cov(x, y) = Σ(x − x̄)(y − ȳ) / n
Spearman's Rank Correlation Coefficient:

ρ = 1 − [6 Σ d_i²] / [n(n² − 1)]

where
ρ = Spearman's rank correlation coefficient
d_i = difference between the two ranks of each observation
n = number of observations

Given observations on x and y, rank them from highest to lowest, assigning numbers. Obtain d_i and d_i², substitute into the formula, and get the coefficient.

Cross check with the Pearson/CORREL function, as done in the sketch below.
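A minimal sketch computing both coefficients in Python on made-up data, so the Spearman result can be cross-checked against Pearson/CORREL:

```python
# A minimal sketch of Pearson's r and Spearman's ρ = 1 − 6Σd²/(n(n² − 1))
# on made-up data with no ties.
from math import sqrt

x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
r = sxy / sqrt(sxx * syy)                    # Pearson's r

def ranks(v):                                # rank 1 = smallest value
    order = sorted(range(len(v)), key=lambda i: v[i])
    rk = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        rk[i] = rank
    return rk

d = [a - b for a, b in zip(ranks(x), ranks(y))]
rho = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))  # Spearman's ρ
print(r, rho)
```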

Unit. 13 - 16 REGRESSION
Regression: Study of the relationship between independent variable and dependent
variable. Regression helps in predicting possible values of dependent variables based
on independent ones. Regression helps us predict how dependent variables will act if
the independent increases or decreases.

Always has causation and effect relationship.

Regression analysis is used to predict the value of one variable on the basis of another variable. It develops a mathematical equation or model that describes the relation between the dependent variable (the variable to be forecast) and the independent variable (the variable believed by the practitioner to be related to it).

Simple regression has only 2 variables, multiple regression = at least 3 variables.

Always insert a trendline of a scatter diagram, plus add the linear equation and R
square value.

Steps -

1. Select data > Recommended Charts > Scatter chart
2. Click on the scatter chart > select "+" > Add Trendline > More > Display linear equation > Display R-squared value - write their interpretation

Simple Linear Regression Model

The straight-line model with one independent variable. This model is called the
first-order linear model—sometimes called the simple linear regression model.
First-Order Linear Model
y = β0 + β1x + ε

where,
y = dependent variable
x = independent variable
β0 = y-intercept
β1 = slope of the line (defined as rise/run)
ε = error variable

Can write the linear equation as follows (in the exam):

y = a + bx

(The slope is also the coefficient of the independent variable in the ANOVA table.)


The slope is defined as rise/run, which means that it is the change in y (rise) for a one
unit increase in x (run). Put less mathematically, the slope measures the marginal rate
of change in the dependent variable. The marginal rate of change refers to the effect of
increasing the independent variable by one additional unit.
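A minimal sketch of estimating the intercept and slope by least squares in Python on made-up data; the slope formula used is the standard least-squares estimator, which the notes reference only via textbook pages:

```python
# A minimal sketch of fitting y = b0 + b1·x by least squares on made-up data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
b1 = sxy / sxx                 # slope (rise/run)
b0 = ybar - b1 * xbar          # y-intercept

fitted = [b0 + b1 * a for a in x]
residuals = [b - f for b, f in zip(y, fitted)]   # e_i = y_i − ŷ_i
print(b0, b1, residuals)
```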
Intercept - the point at which the regression line and the y-axis intersect.

Coefficient of Correlation - tells us whether the linear relationship is strong or weak.

Refer to pg 636 and 637 of the textbook for all formulae and examples.

Error variable - accounts for all the variables, measurable and immeasurable, that are
not part of the model.

The problem objective addressed by the model is to analyze the relationship between
two variables, x and y, both of which must be interval. To define the relationship
between x and y, we need to know the value of the coefficients β0 and β1. However,
these coefficients are population parameters, which are almost always unknown.

Residual output - The residuals are observations of the error variable. The deviations
between the actual data points and the line are called residuals, denoted ei ; that is, ei =
yi − y^i

Function on Excel :-
1. Click, Data, Data Analysis and Regression
2. Specify the Input Y Range (A1:A101) and the Input X Range (B1:B101).
3. Draw scatter diagram - select for line fit plot

Interpretation of Regression

Multiple Regression (Multiple R) - measures the strength of the linear relationship between the independent variables and the dependent variable. Ex. suggests a strong positive relationship, etc.

R Square - which represents the proportion of variance in the response variable that is
explained by the independent variables.

Adjusted R square - takes into account the number of independent variables in the
model (suggests whether the explained variability is sufficient or not). It indicates
whether the addition of other independent variables might improve the model's ability to
explain the variability in the dependent variable.
Standard error - indicates the variability or dispersion of the data points around the
regression line. Represents the average distance that the observed values fall from the
regression line.

Observations - sample size/number of data points used in analysis.

ANOVA table - it is the Analysis of Variance (ANOVA) shows the sources of variation in
the regression model.

df: Degrees of freedom.

SS: Sum of squares. It represents the variation explained by the regression model.

MS: Mean square. It is calculated by dividing the sum of squares by its respective
degrees of freedom.

F: The F-statistic is a ratio of the mean square values and tests the overall significance
of the regression model.

Significance F: The p-value associated with the F-statistic. (ex. In this case, it is very
small (1.83945E-05), indicating that the regression model is statistically significant) -
smaller than 5%

Coefficients: ANOVA Table provides the estimated coefficients for the intercept and the
independent variable(s).

- Intercept: the estimated intercept. It represents the predicted value of the
dependent variable when all independent variables are zero.
- X value (independent variable): the estimated coefficient for the independent
variable. It indicates that for each unit increase in the "independent" variable, the
predicted value of the dependent variable increases by the amount of the coefficient.

Ex. Intercept: the estimated intercept is 26.917. It represents the
predicted value of the dependent variable when all independent variables
are zero. Overweight: the estimated coefficient for the independent
variable "Overweight" is 0.794. It indicates that for each unit increase in
the "Overweight" variable, the predicted value of the dependent variable
increases by 0.794.
Residual Output: This table shows the predicted values, residuals, and observed
values for each observation in the dataset.

- Predicted values (of the dependent variable): the predicted values of the
dependent variable based on the regression model.
- Residuals: the differences between the observed values and the predicted
values. Positive values indicate that the actual values are higher than predicted,
while negative values indicate the opposite.

// Correlation vs. regression: correlation measures the degree of relation; regression describes the nature of the relationship between the variables.

Regression is for cause and effect; correlation is not.

Correlation cannot predict values; it measures a linear relationship. Correlation can be computed between any two variables, but regression should not be run between unrelated variables.

Covariance: a measure of how two variables vary together, computed using their means. Establishes a positive or negative relationship between two variables.

When covariance is standardized to a defined scale, it is called correlation, which also limits the range to −1 to +1. Covariance gives only the direction; correlation gives the strength as well. Covariance is expressed in units.

Sample Answer: Xm 16-06 solved sheet

Unit. 17 - 18 Time Series

Time series- Time-series data are often graphically depicted on a line chart, which is a
plot of the variable over time. It is created by plotting the value of the variable on the
vertical axis and the time periods on the horizontal axis.

- Any variable that is measured over time in sequential order is called a time series.
- Time series analysis helps to detect patterns that will enable us to forecast
future values of the time series.
- Any data which are indexed with respect to time are a time series.

Applications: there is an almost unlimited number of such applications in management and economics. Some examples:
1. Governments want to know future values of interest rates, unemployment rates, and
percentage increases in the cost of living.

2. Housing industry economists must forecast mortgage interest rates, demand for
housing, and the cost of building materials.

3. Many companies attempt to predict the demand for their products and their share of
the market.

4. Universities and colleges often try to forecast the number of students who will be
applying for acceptance at post secondary-school institutions

Time-Series Components (4 variations/features)

1. Long-term trend - a trend (also known as a secular trend) is a long-term, relatively smooth pattern or direction exhibited by a series. Its duration is more than 1 year. [TSCI]

For example, the population of the United States exhibited a trend of relatively steady
growth from 157 million in 1952 to 314 million in 2012.

2. Cyclical variation - a wavelike pattern describing a long-term trend that is generally apparent over a number of years, resulting in a cyclical effect. By definition, it has a duration of more than 1 year. However, cyclical patterns that are consistent and predictable are quite rare. For practical purposes, we will ignore this type of variation.

Examples include business cycles that record periods of economic recession and
inflation, long-term product-demand cycles, and cycles in monetary and financial
sectors.

3. Seasonal variation - refers to cycles that occur over short repetitive calendar periods
and, by definition, have a duration of less than 1 year.

For example, the term seasonal variation may refer to the four traditional seasons or to
systematic patterns that occur during a month, a week, or even one day. Demand for
restaurants feature “seasonal” variation throughout the day.

4. Random variation/irregular - caused by irregular and unpredictable changes in a time series that are not caused by any other components. It tends to mask the existence of the other, more predictable components. Because random variation exists in almost all time series, we study ways to reduce the random variation, which allows us to describe and measure the other components. By doing so, we hope to be able to make accurate predictions of the time series.

A moving average for a time period is the arithmetic mean of the values in that time
period and those close to it. It is a series of averages, calculated from historic data.

Computation in Excel:

1. Click Data, Data Analysis, and Moving Average.

2. Specify the Input Range (A1:A17), the number of periods / interval (3 or 5), and the Output Range (B1).

3. Delete the cells containing N/A.

4. Draw the line charts.

In Excel: add and divide by 3 for a three-quarter moving average; the first and last values get no average.

In Excel: add and divide by 5 for a five-quarter moving average; the first two and last two values get no average. (See the sketch below.)
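A minimal sketch of a three-period moving average in Python; as in the notes, the first and last periods get no value. The series is made up:

```python
# A minimal sketch of a centered k-period moving average; endpoints that
# lack enough neighbors get None, matching the N/A cells in Excel.
def moving_average(series, k=3):
    half = k // 2
    out = [None] * len(series)
    for i in range(half, len(series) - half):
        out[i] = sum(series[i - half : i + half + 1]) / k
    return out

sales = [20, 24, 22, 28, 30, 27, 33]   # made-up quarterly series
print(moving_average(sales, 3))
```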
