What Is Correlation Analysis
Types of Variables
Variables may be discrete or continuous.
Describing Data
It is important to transform a mass of raw data into a meaningful form.
Descriptive Statistics: Frequency distributions and graphical
representations such as the histogram or the frequency polygon.
Numerical Statistics: two important numerical ways to represent data.
Measures of location, often referred to as averages. The purpose of a
measure of location is to pinpoint the center of a set of values. The
five most common measures of location are the arithmetic mean, the
weighted mean, the geometric mean, the median, and the mode.
Measures of dispersion, often called the variation or the spread.
Weighted Mean: $\bar{X}_w = \dfrac{\sum wX}{\sum w}$
Mean Deviation: $MD = \dfrac{\sum |X - \bar{X}|}{n}$
Population Variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
Population SD: $\sigma = \sqrt{\dfrac{\sum (X - \mu)^2}{N}}$
Sample Variance: $s^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
Sample SD: $s = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n - 1}}$
AM of grouped data: $\bar{X} = \dfrac{\sum fM}{n}$ (M = class midpoint, f = class frequency)
SD of grouped data: $s = \sqrt{\dfrac{\sum f(M - \bar{X})^2}{n - 1}}$
Mean of a PD: $\mu = \sum [xP(x)]$
Variance of a PD: $\sigma^2 = \sum [(x - \mu)^2 P(x)]$
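As a quick illustration, a minimal Python sketch of these measures, using a small hypothetical data set (the values and weights are invented for the example; note the n - 1 divisor for sample statistics versus N for population statistics):

```python
import math

data = [12, 15, 11, 18, 14]          # hypothetical sample values
weights = [1, 2, 1, 3, 1]            # hypothetical weights

n = len(data)
mean = sum(data) / n

# Weighted mean: sum(w * x) / sum(w)
weighted_mean = sum(w * x for w, x in zip(weights, data)) / sum(weights)

# Mean deviation: average absolute deviation from the mean
md = sum(abs(x - mean) for x in data) / n

# Population variance/SD divide by N; sample variance/SD divide by n - 1
pop_var = sum((x - mean) ** 2 for x in data) / n
samp_var = sum((x - mean) ** 2 for x in data) / (n - 1)
pop_sd, samp_sd = math.sqrt(pop_var), math.sqrt(samp_var)

print(mean, weighted_mean, md, pop_sd, samp_sd)
```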
The Binomial PD is a special case of a discrete probability distribution
with only two possible outcomes per trial.
Binomial PD: $P(x) = \binom{n}{x}\pi^x(1-\pi)^{n-x}$
Mean of a BPD: $\mu = n\pi$
Variance of a BPD: $\sigma^2 = n\pi(1-\pi)$
Poisson PD: $P(x) = \dfrac{\mu^x e^{-\mu}}{x!}$
Where:
$\mu$ is the mean number of occurrences (successes) in a particular
interval; $e = 2.71828$; $x$ is the number of occurrences (successes); $P(x)$ is
the probability for a specified value of $x$.
Mean of a Poisson PD: $\mu = n\pi$
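Both distributions are available in scipy.stats, which makes a quick numerical check easy; the parameters below (n = 10, π = 0.3, μ = 2) are arbitrary illustrative choices:

```python
from scipy.stats import binom, poisson

n, p = 10, 0.3        # hypothetical number of trials and success probability
mu = 2.0              # hypothetical mean occurrences per interval

# Binomial: P(x) = C(n, x) * p^x * (1-p)^(n-x); mean = n*p, variance = n*p*(1-p)
print(binom.pmf(3, n, p))            # P(X = 3)
print(binom.mean(n, p), binom.var(n, p))

# Poisson: P(x) = mu^x * e^(-mu) / x!; mean and variance both equal mu
print(poisson.pmf(3, mu))            # P(X = 3)
print(poisson.mean(mu), poisson.var(mu))
```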
Normal PD
The normal probability distribution is a continuous PD.
The number of normal distributions is unlimited, each having a
different mean (μ), standard deviation (σ), or both. While it is
possible to provide probability tables for discrete distributions
such as the binomial and the Poisson, providing tables for the
infinite number of normal distributions is impossible. Fortunately,
one member of the family can be used to determine the
probabilities for all normal distributions. It is called the standard
normal distribution, and it is unique because it has a mean of 0
and a standard deviation of 1.
Any normal distribution can be converted into a standard
normal distribution by subtracting the mean from each
observation and dividing the difference by the SD: $z = \dfrac{X - \mu}{\sigma}$.
The results are called z values. They are also referred to as z scores,
the z statistic, the standard normal deviates, the standard normal
values, or just the normal deviate.
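A one-line illustration of the conversion, with hypothetical values for μ, σ, and X:

```python
# Convert an observation to a z value: z = (X - mu) / sigma
mu, sigma = 100.0, 15.0      # hypothetical population mean and SD
x = 130.0                    # hypothetical observation
z = (x - mu) / sigma
print(z)                     # 2.0: the observation is 2 SDs above the mean
```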
The CLT states that for large random samples, the shape of the
sampling distribution of the sample mean is close to a normal
probability distribution. This approximation is more accurate for
large samples than for small samples. This is one of the most
useful conclusions in statistics. We can reason about the
distribution of the sample mean with absolutely no information
about the shape of the population distribution from which the
sample is taken. In other words, the CLT is true for all
distributions.
Sampling Distribution of the Sample Mean: the means of samples of
a specified size vary from sample to sample.
Sampling Distribution of the Sample Mean: A probability
distribution of all possible sample means of a given sample size.
Central Limit Theorem
If all samples of a particular size are selected from any
population, the sampling distribution of the sample mean is
approximately a normal distribution. This approximation
improves with larger samples.
If the population follows a normal distribution, then for any
sample size the sampling distribution of the sample mean will
also be normal. If the population distribution is symmetrical
(but not normal), the normal shape of the distribution of the
sample mean emerges with samples as small as 10. On the other
hand, if you start with a distribution that is skewed or has thick
tails, it may require samples of 30 or more to observe the
normality feature. A sample size of 30 or more is considered
large enough for the CLT to be employed.
The CLT indicates that, regardless of the shape of the
population distribution, the sampling distribution of the sample
mean will move towards the normal probability distribution.
The larger the number of observations in each sample, the
stronger the convergence.
The mean of the sampling distribution equals the population
mean, i.e., $\mu_{\bar{X}} = \mu$, and if the standard deviation of the
population is $\sigma$, the standard error of the sample mean is
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$. When $\sigma$ is unknown, the sample SD $s$ is used as an
estimate of $\sigma$.
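A short simulation illustrates the CLT; the exponential population (a strongly skewed distribution with μ = σ = 1) and the sample size of 30 are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
pop_mean, n, draws = 1.0, 30, 10_000   # exponential(mean=1), n = 30 per sample

# Draw 10,000 samples of size n from a skewed population and average each
sample_means = rng.exponential(pop_mean, size=(draws, n)).mean(axis=1)

# Mean of the sampling distribution ~ mu; its SD ~ sigma / sqrt(n)
print(sample_means.mean())             # close to 1.0
print(sample_means.std(ddof=1))        # close to 1.0 / sqrt(30) ~ 0.183
```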
Hypothesis:
A hypothesis is a statement about a population. Data are then
used to check the reasonableness of the statement.
X̄ = ∑ X / n = 220 / 10 = 22
Ȳ = ∑ Y / n = 450 /10 = 45
As most of the data points (all except the 8th, Harish's) lie in the 1st
or 3rd quadrant, we may assume a positive relationship, because in
both these quadrants the product (X - X̄)(Y - Ȳ) is positive: (X - X̄)
and (Y - Ȳ) have the same sign, both positive in the 1st quadrant or
both negative in the 3rd quadrant, as observed in the table below.
Calculate the deviations from the means:
Correlation Coefficient: $r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{(n - 1)\, s_X\, s_Y}$
$r = 900 / [(10 - 1)(9.189)(14.337)] = 0.759$
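The same value of r can be reproduced directly from the ten (X, Y) pairs of the example, for instance in Python:

```python
import numpy as np

# Sales calls (X) and servers sold (Y) for the ten representatives
X = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30])
Y = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70])

n = len(X)
sx, sy = X.std(ddof=1), Y.std(ddof=1)          # 9.189 and 14.337

# r = sum((X - Xbar)(Y - Ybar)) / ((n - 1) * sx * sy)
r = ((X - X.mean()) * (Y - Y.mean())).sum() / ((n - 1) * sx * sy)
print(r)                                        # ~0.759
print(np.corrcoef(X, Y)[0, 1])                  # same value
```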
We then perform a hypothesis test.
Hypothesis: A statement about a population parameter
developed for the purpose of testing.
Null Hypothesis: A statement about the value of the
population parameter
Alternate Hypothesis: A statement that is accepted if the
sample data provide sufficient evidence that the Null
Hypothesis is false.
Steps in Hypothesis Testing
1. Establish the null hypothesis (H0) and the alternate
hypothesis (H1),
2. Select the level of significance, that is α,
(Rejecting the null hypothesis when it is in fact true is called a Type I error)
Errors in Making Decisions
Type I Error (H0 rejected when true):
A true null hypothesis is rejected.
The probability of a Type I error is α, called the level of
significance of the test, which is set by the researcher in advance.
Type II Error (failure to reject H0 when it is false, i.e., a false H0
is accepted):
A false null hypothesis is not rejected.
The probability of a Type II error is β.
Using the 0.05 level of significance, the decision rule states that
if the computed t falls in the region between -2.306 and +2.306,
the null hypothesis is not rejected.
Compute the test statistic (here t): $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, with $n - 2$
degrees of freedom. Here $t = \dfrac{0.759\sqrt{8}}{\sqrt{1 - 0.759^2}} = 3.297$, which falls
outside ±2.306, so the null hypothesis is rejected: the correlation is significant.
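A sketch of this computation in Python, using the r and n from the example (scipy's t distribution stands in for the printed t-table):

```python
import math
from scipy.stats import t as t_dist

r, n = 0.759, 10

# t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
t_crit = t_dist.ppf(0.975, df=n - 2)     # two-tailed, alpha = 0.05

print(t_stat)                            # ~3.30
print(t_crit)                            # ~2.306
print(abs(t_stat) > t_crit)              # True -> reject H0
```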
Slope of the regression line: $b = r\dfrac{s_Y}{s_X} = 0.759 \times \dfrac{14.337}{9.189} = 1.1842$
where:
r is the correlation coefficient;
$s_Y$ is the SD of Y (the dependent variable);
$s_X$ is the SD of X (the independent variable).
Intercept: $a = \bar{Y} - b\bar{X}$
where:
Ȳ is the mean of Y (the dependent variable);
X̄ is the mean of X (the independent variable).
$a = \bar{Y} - b\bar{X} = 45 - (1.1842)(22) = 18.9476$
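The slope and intercept can be checked with an ordinary least-squares fit, for example with numpy.polyfit, which gives the same b and a for this data:

```python
import numpy as np

X = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30])
Y = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70])

# b = r * (sy / sx); a = Ybar - b * Xbar -- equivalent to least squares
b, a = np.polyfit(X, Y, deg=1)           # degree-1 fit returns [slope, intercept]
print(b)                                 # ~1.1842
print(a)                                 # ~18.9476
print(a + b * 25)                        # predicted Y for 25 calls: ~48.55
```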
Confidence and prediction intervals for Y at a given X:
CI for the mean of Y: $\hat{Y} \pm t\, s_{y \cdot x}\sqrt{\dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$
PI for an individual Y: $\hat{Y} \pm t\, s_{y \cdot x}\sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}}$
where:
t is the value of t from the t-table with n - 2 df;
$s_{y \cdot x}$ is the standard error of estimate.
William Gosset (1908) noticed that $\bar{X} \pm z(s)$ was not precisely
correct for small samples. For small samples (n < 30), the
variations around $\bar{X}$ are more than $\pm z(s)$, and we need to
compensate for this. It is observed that the statistic follows
a t-distribution (a flatter distribution than the z-distribution).
But with sample sizes n ≥ 30, the t-values and z-values are almost
equal.
For a sales representative making X = 25 calls, the predicted sales are:
$\hat{Y} = a + bX = 18.9476 + 1.1842 \times 25 = 48.5526$
t for n - 2 = 8 df, 95 percent confidence, is 2.306.
Confidence interval for the mean sales of all representatives making 25 calls:
$48.5526 \pm 2.306 \times 9.901 \times 0.334428 = 48.5526 \pm 7.6356$,
i.e., from 40.917 to 56.188.
Prediction interval for an individual representative:
$48.5526 \pm 2.306 \times 9.901 \times 1.054439 = 48.5526 \pm 24.0746$
Thus the interval is from 24.478 to 72.627 (≈ 24 to 73).
We conclude that the number of servers sold by a particular sales
representative making 25 calls will be between 24 and 73. This
interval is quite large, much larger than the confidence interval
for the mean sales of all representatives who made 25 calls. It is
logical, however, that there should be more variation in the sales
estimate for an individual than for a group.
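A sketch of both intervals at X = 25, plugging in the quantities computed above (Ŷ = 48.5526, s_{y·x} = 9.901, t = 2.306, Σ(X - X̄)² = 760):

```python
import math

n, x, x_bar = 10, 25, 22
sum_sq_dev_x = 760            # sum of (X - Xbar)^2 from the data
y_hat, see, t_val = 48.5526, 9.901, 2.306

# Shared term: 1/n + (x - xbar)^2 / sum((X - Xbar)^2)
h = 1 / n + (x - x_bar) ** 2 / sum_sq_dev_x

ci = t_val * see * math.sqrt(h)          # CI for the MEAN of Y at X = 25
pi = t_val * see * math.sqrt(1 + h)      # PI for an INDIVIDUAL Y at X = 25

print(y_hat - ci, y_hat + ci)            # ~40.92 to ~56.19
print(y_hat - pi, y_hat + pi)            # ~24.48 to ~72.63
```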
More on Coefficient of Determination
Representative   Sales Calls (X)   Sold (Y)   Y - Ȳ   (Y - Ȳ)²
Hari                  20              30        -15       225
Rama                  40              60         15       225
Shivani               20              40         -5        25
Ravi                  30              60         15       225
Gautam                10              30        -15       225
Manish                10              40         -5        25
Pandu                 20              40         -5        25
Harish                20              50          5        25
Venktesh              20              30        -15       225
Binny                 30              70         25       625
Total                220             450          0      1850
Coefficient of Determination, r²
The coefficient of determination is the portion of the
total variation in the dependent variable that is explained
by variation in the independent variable.
It is also called r-squared and is denoted as r².
$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$
Note: $0 \le r^2 \le 1$
The Relationship among the Coefficient of Correlation, the
Coefficient of Determination, and the Standard Error of
Estimate
The standard error of estimate measures how close the actual
values are to the regression line: $s_{y \cdot x} = \sqrt{\dfrac{\sum (Y - Y')^2}{n - 2}}$. When
the SEE is small, it indicates that the two variables are closely
related. In the calculation of the SEE, the key term is $\sum (Y - Y')^2$:
if the value of this term is small, the SEE will also be small.
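Both r² and the SEE follow directly from the fitted line; a short check using the example's data and the a and b computed earlier:

```python
import numpy as np

X = np.array([20, 40, 20, 30, 10, 10, 20, 20, 20, 30])
Y = np.array([30, 60, 40, 60, 30, 40, 40, 50, 30, 70])
a, b = 18.9476, 1.1842

y_hat = a + b * X
sse = ((Y - y_hat) ** 2).sum()           # sum of squares error
sst = ((Y - Y.mean()) ** 2).sum()        # total sum of squares (1850)

r_squared = 1 - sse / sst                # equivalently SSR / SST; ~0.576
see = np.sqrt(sse / (len(X) - 2))        # standard error of estimate; ~9.901
print(r_squared, see)
```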
Model Diagnosis:
Before using the model (for prediction, etc.) to estimate the
sales units (y), several questions should be answered.
1. Is the model significant?
2. Are the individual variable(s) significant?
3. Is the SD of the model error too large to provide
meaningful results?
4. Is multicollinearity a problem?
5. Have the regression analysis assumptions been satisfied?
1. Is the model significant? Test H0: $\beta_j = 0$ for all j against H1:
at least one $\beta_j \ne 0$, using the F statistic
$F = \dfrac{SSR/k}{SSE/(n-k-1)}$
Where:
SSR = sum of squares regression.
Reject H0 if $F > F_\alpha$, or if the p-value < α; rejecting H0 means
at least one regression coefficient is significant.
Is multicollinearity a problem?
Multicollinearity - a high correlation between the independent
variables such that the two variables contribute redundant
information to the model.
Some obvious problems and indications of severe
multicollinearity:
i) Unexpected/incorrect signs on the coefficients.
ii) A sizeable change in the values of the previously
estimated coefficients when a new variable is added to
the model.
iii) The estimate of the SD of the model error increases
when a variable is added to the model.
iv) Low t-values for significant variables.
One measure of multicollinearity is the Variance Inflation Factor
(VIF): $VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of
determination obtained by regressing the j-th independent variable
on all the other independent variables.
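A minimal sketch of the VIF computation, assuming a hypothetical design matrix with three predictors (the data are simulated so that two columns are nearly collinear):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])   # add intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r_sq = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1 / (1 - r_sq))
    return vifs

rng = np.random.default_rng(0)                 # hypothetical simulated data
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)       # nearly collinear with x1
x3 = rng.normal(size=50)
print(vif(np.column_stack([x1, x2, x3])))      # large VIFs flag x1 and x2
```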
Where:
SSR = sum of squares regression = $\sum (\hat{Y} - \bar{Y})^2$
SSE = sum of squares error = $\sum (Y - \hat{Y})^2$
n = sample size
k = number of independent variables
df of regression = number of independent variables, k
df of errors = n - k - 1
total df of MLR = n - 1
If $F > F_\alpha$, reject H0; or if the p-value < α, reject H0.
Coefficient of Determination, R²
The coefficient of determination is the portion of the
total variation in the dependent variable that is explained
by variation in the independent variables.
It is also called R-squared and is denoted as R².
$R^2 = \dfrac{SSR}{SST} = \dfrac{\text{regression sum of squares}}{\text{total sum of squares}}$
Note: $0 \le R^2 \le 1$
The Durbin-Watson test statistic: $d = \dfrac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
The possible range is 0 ≤ d ≤ 4.
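The statistic is a one-liner given the residuals; the residual vector below is hypothetical:

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    return (np.diff(e) ** 2).sum() / (e ** 2).sum()

e = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.3, 0.7, -0.6])  # hypothetical residuals
print(durbin_watson(e))   # near 2 -> no autocorrelation; near 0 or 4 -> positive/negative
```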