
Statistical Tools For Data Analysis

1. The document discusses various statistical tools that can be used for data analysis, categorized as univariate, bivariate, and multivariate tools. 2. Univariate tools like frequency tables, histograms, and measures of central tendency and dispersion are used to analyze individual variables. Bivariate tools like cross tables, scatter plots, and regression analyze relationships between two variables. Multivariate tools include multiple regression and cluster analysis for more complex analysis. 3. Examples of commonly used univariate tools provided are frequency distributions, measures of central tendency and dispersion. Bivariate tools discussed are cross tables, correlation, and linear regression analysis.

Uploaded by

Sidharth Ray
Copyright
© Attribution Non-Commercial (BY-NC)

STATISTICAL TOOLS FOR DATA ANALYSIS

Tools for data analysis differ with the type of data.
Several statistical tools can be used to analyse data, which may be a) qualitative or quantitative and b) time series, cross-section or panel data.
The procedure and analytical tool to be used depend on the objectives of the study.
Tools for data analysis are classified into three broad categories:
Univariate Tools
Bivariate Tools
Multivariate Tools
This classification is based on the number of variables a tool uses.
Classification of statistical Tools (Contd)

Univariate Tools for Data Analysis


Frequency Tables & Distribution
Histogram
Ogive or Cumulative frequency curve
Pie-Charts
Measures of Central Tendency (MCTs)
Measures of Dispersion
Kurtosis etc
Some of these tools are visual aids.
Classification of statistical Tools (Contd)

Bivariate Tools for Data Analysis


Cross Tables
Scatter Plots
Correlation
Bivariate Regression
Trend Lines
Binary Choice (Logistic Regression, Linear
Probability Models using two variables)
Etc
Classification of statistical Tools (Contd)

A few Multivariate Tools are as follows


Multiple Regression
Factor Analysis
Cluster Analysis
Discriminant analysis
Multivariate Analysis of variance (MANOVA)
Conjoint Analysis
Canonical Correlation
Multi Dimensional Scaling (MDS)
Structural Equation Modeling (SEM)
Logistic Regression using more than 2 variables etc
Master Table for survey Research

When data are collected through a questionnaire in social research, a master table is prepared to summarize the data and conduct further analysis.
A master table can be prepared either manually or using a software package such as Excel.
Code numbers can be used.
After summarizing the data, different statistical tools can be used to analyse them.
1. Univariate Tools
The primary objectives of use of univariate tools are a)
to introduce the sample to the reader and b) to examine
the nature of the variable in terms of its distribution.
Uses of some of these tools are as follows:
Frequency Tables.
The objective of a frequency table is to summarize the
raw data in a concise, systematic and meaningful way.
Cumulative frequency tables, Histogram, Pie-Charts,
Distributions can be prepared from this.
Broad conclusions on the nature of the distribution of the
data can be drawn from these tools.
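A frequency table and its cumulative counterpart can be built in a few lines. The following is a minimal sketch using only the Python standard library; the `incomes` values and the class width of 10 are made-up illustration data, not taken from the slides.

```python
from collections import Counter

# Hypothetical raw data: monthly income (in thousands) of 12 respondents.
incomes = [12, 18, 25, 31, 14, 22, 27, 35, 19, 24, 29, 16]
width = 10  # assumed class width

# Count observations per class interval, keyed by the lower class limit.
freq = Counter((x // width) * width for x in incomes)

# Build rows of (class label, frequency, cumulative frequency).
table = []
cumulative = 0
for lower in sorted(freq):
    cumulative += freq[lower]
    table.append((f"{lower}-{lower + width - 1}", freq[lower], cumulative))

for label, count, cum_count in table:
    print(f"{label:>7}  {count:>3}  {cum_count:>3}")
```

The frequency column is what a histogram or pie chart would plot, and the cumulative column is what an ogive would plot.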
Univariate Tools (Contd)

Summary Statistics (Descriptive Statistics)


Often the researcher wants to represent a set of data on a variable by a single number/figure.
For example: A researcher has a set of observations on the income of a group of persons.
He wants to summarize the variable for the group in terms of the average and the deviation from the average.
Such statistical tools are known as descriptive statistics since the number/figure describes the distribution of the variable.
Univariate Tools (Contd)

Some of the Summary Statistics are:


Measures of Central Tendency, Measures of Dispersion and Measures of Peakedness.
a) Measures of Central Tendency (MCT):
Arithmetic Mean: $\mathrm{AM} = \sum X_i / N$ (the simple average).
Weighted Arithmetic Mean: $\mathrm{WAM} = \sum W_i X_i / \sum W_i$ (takes into account the importance of each value to the overall total).
Geometric Mean: We use the GM when we need to know the average rate of growth of a series of numbers.
$\mathrm{GM} = (X_1 X_2 \cdots X_N)^{1/N}$, i.e. the Nth root of the product of the N values of X.
Univariate Tools (Contd)

Harmonic Mean: It is used where a series contains extreme values (usually high values).
For example, consider the series 12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99.
The arithmetic mean may not represent this series well. In such cases we use the harmonic mean, which gives less weight to the higher values.
$\mathrm{HM} = N / \sum (1/X_i)$, i.e. the reciprocal of the average of the reciprocals of the N values of X (1/AM of 1/12, 1/13, …).
Median: The middle item when the values are arranged in order.
Mode: The value repeated most often.
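The measures of central tendency can be compared on the example series. This is a minimal sketch using Python's standard `statistics` module (the geometric mean and `multimode` need Python 3.8 or later); it is an illustration, not part of the original slides.

```python
import statistics

# The series from the harmonic mean example; 89 and 99 are the extreme values.
series = [12, 13, 16, 18, 11, 16, 19, 20, 18, 17, 14, 89, 99]

am = statistics.mean(series)            # arithmetic mean
gm = statistics.geometric_mean(series)  # N-th root of the product
hm = statistics.harmonic_mean(series)   # reciprocal of the mean of reciprocals
median = statistics.median(series)      # middle value of the sorted series
modes = statistics.multimode(series)    # all values tied for most frequent

print(f"AM={am:.2f}  GM={gm:.2f}  HM={hm:.2f}  median={median}  modes={modes}")
```

For any series of positive values that are not all equal, HM < GM < AM, which is why the harmonic mean is pulled least by the two large values.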
Univariate Tools (Contd)

Measures of Dispersion:
Range, Mean Deviation, Variance and Standard Deviation.
They have several implications and uses in analysing a set of observations.
For example, the mean ± one standard deviation covers about 68% of the observations in a normal distribution.
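The dispersion measures listed here can be computed directly. The sketch below uses the standard `statistics` module on a made-up sample (an assumption for illustration) and also computes, rather than assumes, the share of a normal distribution lying within one standard deviation of the mean.

```python
import statistics

# Hypothetical sample of incomes (illustration only).
data = [12, 15, 18, 20, 22, 25, 28]

data_range = max(data) - min(data)
mean = statistics.mean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)
variance = statistics.pvariance(data)   # population variance, divisor N
std_dev = statistics.pstdev(data)       # population standard deviation

# Share of a normal distribution lying within one standard deviation of the mean.
z = statistics.NormalDist()             # standard normal: mean 0, sd 1
within_one_sd = z.cdf(1) - z.cdf(-1)

print(data_range, mean, round(mean_deviation, 3), round(variance, 3), round(std_dev, 3))
print(f"within one SD: {within_one_sd:.4f}")
```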
Statistical Tests
Z and 't' tests are used to examine the significance of the difference between sample and population means.
Similarly, χ² and 'F' tests are used to examine differences in variances: χ² compares a sample variance with a population variance, and F compares two variances.
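As an illustration of a test on means, the following sketch computes a one-sample 't' statistic by hand. The sample, the hypothesised population mean `mu0` and the 5% significance level are assumptions made for the example; the critical value is the standard tabulated two-tailed value for 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical sample and hypothesised population mean (illustration only).
sample = [52, 48, 55, 50, 53, 47, 51, 54, 49, 51]
mu0 = 48

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)            # sample standard deviation, divisor n - 1

# One-sample t statistic: distance of the sample mean from mu0 in standard errors.
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Tabulated two-tailed critical value for 9 degrees of freedom at the 5% level.
t_critical = 2.262
reject_h0 = abs(t_stat) > t_critical
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```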
2. Bivariate Tools

Bivariate Tools are used to highlight the relationship between two variables.
Some of the bivariate tools are
a) Cross Tables, Graphs and Scatter Plots
b) Correlations (Rank and Simple)
c) Bivariate linear and non-linear regression
d) Binary Choice (Logistic Regression, Linear Probability Models using two variables)
e) Trend Lines
Bivariate Tools (Contd..)
Scatter Plots (give an idea about the nature of the relationship between the variables)
Correlation
Rank (Spearman) and simple (Karl Pearson) correlations are used in bivariate data analysis.
These two types of correlation differ in the type of data used. Ordinal scale (rank order) data are used for rank correlation, whereas metric data are used for simple correlation.
Each uses a specific formula to calculate the correlation coefficient, which ranges from -1 to 1.
The sign of the correlation coefficient gives the direction of the relationship and its magnitude gives the extent.
Correlation does not examine any cause and effect relationship, but the relationship should have construct validity.
Redundant relationships should be avoided.
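The difference between the two coefficients can be seen on a small example. The sketch below implements both by hand, with Spearman computed as the Pearson correlation of the ranks. The data are made up, and the `ranks` helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Karl Pearson (simple) correlation coefficient for metric data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: Y rises with X, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

r_simple = pearson(x, y)
r_rank = spearman(x, y)
print(f"Pearson = {r_simple:.4f}, Spearman = {r_rank:.4f}")
```

Because Y always increases with X, the rank correlation is exactly 1, while the simple correlation is slightly below 1 since the relationship is not linear.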
Bivariate Tools (Contd..)

Bivariate Linear and Non-Linear Regression


1. Linear Regression:
The simplest relationship between two variables is a linear one, which can be specified as follows:
$Y_i = \alpha + \beta X_i + U_i$, where
Y - Dependent variable
X - Independent variable
U - Error term or disturbance term.
Bivariate Tools (Contd..)

A scatter plot gives us some idea about the relationship between two variables.
There could be alternative lines representing the relationship between the variables.
Consider $\sum e_i$ and $\sum e_i^2$ about the alternative lines.
$\sum e_i^2$ will be non-negative and will vary with the spread of the points about each line.
Now, each line has its own α and β values.
Therefore $\sum e_i^2$ is a function of α and β.
We therefore minimize it with respect to α and β to identify the line that gives the least sum of squared errors.
Bivariate Tools (Contd..)

What is $e_i$?
$e_i$ = Actual observation on Y - Estimated Y, i.e. $e_i = Y_i - \hat{Y}_i$
Therefore, $e_i^2 = (Y_i - \hat{Y}_i)^2 = [Y_i - (\alpha + \beta X_i)]^2$

$\sum e_i^2 = \sum [Y_i - (\alpha + \beta X_i)]^2$

This has to be minimized with respect to α and β to get the best fitted line, which represents the relationship between X and Y.
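The minimization step can be written out explicitly. As a sketch for the model Yi = α + β Xi + Ui, setting the partial derivatives of the sum of squared errors with respect to α and β equal to zero gives two first-order conditions:

```latex
\begin{aligned}
S(\alpha,\beta) &= \sum_i \bigl[Y_i - (\alpha + \beta X_i)\bigr]^2 \\
\frac{\partial S}{\partial \alpha} &= -2\sum_i \bigl[Y_i - \alpha - \beta X_i\bigr] = 0
  \;\Longrightarrow\; \sum_i Y_i = N\alpha + \beta \sum_i X_i \\
\frac{\partial S}{\partial \beta} &= -2\sum_i X_i\bigl[Y_i - \alpha - \beta X_i\bigr] = 0
  \;\Longrightarrow\; \sum_i X_i Y_i = \alpha \sum_i X_i + \beta \sum_i X_i^2
\end{aligned}
```

Dividing the first condition by N shows that the fitted line passes through the point of means (Mean X, Mean Y).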
Bivariate Tools (Contd..)

The process of minimization gives two normal equations in two unknowns.
By solving these equations we get the formulae for estimating α and β.
$\hat{\beta} = \sum x_i y_i / \sum x_i^2$ (in deviation form, where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$)
$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$
These estimates are known as Least Squares Estimates.
With these estimated values of the intercept and the slope we can write the equation of the line of best fit.
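The two estimating formulae translate directly into code. This is a minimal sketch on made-up observations; the function name `ols` and the data are assumptions for illustration.

```python
def ols(x, y):
    """Least squares intercept and slope via the deviation-form formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx                  # slope: sum(x*y) / sum(x^2) in deviations
    alpha = y_bar - beta * x_bar      # intercept: Mean Y - beta * Mean X
    return alpha, beta

# Hypothetical observations (illustration only).
X = [1, 2, 3, 4, 5]
Y = [2.2, 4.1, 6.0, 8.1, 9.9]

alpha_hat, beta_hat = ols(X, Y)
print(f"Y = {alpha_hat:.3f} + {beta_hat:.3f} X")
```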
Bivariate Tools (Contd..)
The null hypothesis (Ho):
A null hypothesis that is commonly tested is
Ho : β = 0
This means that there is no linear relationship between X and Y, i.e. the regression line is a straight line parallel to the X axis.
This null hypothesis (Ho) is rejected if the absolute value of the computed 't' exceeds the tabulated 't' value for the given degrees of freedom and significance level.
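The test of Ho : β = 0 can be sketched as follows. The data are made up, and the critical value is the standard tabulated two-tailed 5% value for n - 2 = 4 degrees of freedom; the 5% level itself is an assumption of the example.

```python
import math

# Hypothetical observations (illustration only).
X = [1, 2, 3, 4, 5, 6]
Y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sxx
alpha = y_bar - beta * x_bar

# Residual variance uses n - 2 degrees of freedom (two estimated parameters).
rss = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(X, Y))
sigma2 = rss / (n - 2)
se_beta = math.sqrt(sigma2 / sxx)     # standard error of the slope

t_stat = beta / se_beta               # computed 't' for Ho: beta = 0
t_critical = 2.776                    # tabulated two-tailed 5% value for 4 d.f.
reject_h0 = abs(t_stat) > t_critical
print(f"beta = {beta:.4f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```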
Bivariate Tools (Contd..)

The Coefficient of Determination ($R^2$)

Three quantities can be calculated from the line of regression with respect to the given Y and X values.
TSS: Total Sum of Squares of the deviations of Y from its mean
ESS: Explained Sum of Squares
RSS: Residual Sum of Squares
These satisfy TSS = ESS + RSS.
$R^2$ = Explained Sum of Squares / Total Sum of Squares. (When RSS declines, ESS tends to TSS and $R^2$ approaches 1.)
This is known as the explanatory power of the model.
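The three sums of squares and the resulting coefficient of determination can be computed from a fitted line. This is a minimal sketch on made-up data (an assumption for illustration).

```python
# Hypothetical observations (illustration only).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
beta = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
        / sum((x - x_bar) ** 2 for x in X))
alpha = y_bar - beta * x_bar
fitted = [alpha + beta * x for x in X]

tss = sum((y - y_bar) ** 2 for y in Y)               # total variation in Y
ess = sum((f - y_bar) ** 2 for f in fitted)          # variation explained by the line
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # unexplained (residual) variation
r_squared = ess / tss

print(f"TSS={tss:.2f}  ESS={ess:.2f}  RSS={rss:.2f}  R2={r_squared:.2f}")
```

Here the line explains 60% of the variation in Y, and the explained and residual parts add up to the total.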
Forms of Bivariate Regression Models and their uses
Various forms of two variable regression models have different objectives/uses.
A few examples:
1. Simple Linear Model
$Y_i = \alpha + \beta X_i + U_i$, where
Y - Dependent variable, X - Independent variable and U - Error term
It highlights the linear relationship between Y and X as discussed earlier.
2. Linear Trend

The linear growth of a variable can be calculated using a simple regression model such as $Y = \alpha + \beta t + u$, where Y is the variable under consideration and t is the time or trend variable.
Whether the variable has a positive or negative trend over the time period is determined by the sign of the slope β.
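A linear trend is fitted by regressing the variable on a time index. The sketch below uses a made-up annual sales series (an assumption for illustration) and reads the direction of the trend from the sign of the slope.

```python
# Hypothetical annual sales for six consecutive years (illustration only).
sales = [100, 108, 115, 121, 130, 138]
t = list(range(1, len(sales) + 1))   # trend variable: 1, 2, ..., 6

n = len(t)
t_bar, y_bar = sum(t) / n, sum(sales) / n
beta = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, sales))
        / sum((ti - t_bar) ** 2 for ti in t))
alpha = y_bar - beta * t_bar

# The sign of the slope gives the direction of the trend.
trend = "positive" if beta > 0 else "negative" if beta < 0 else "flat"
print(f"Y = {alpha:.2f} + {beta:.2f} t  ({trend} trend)")
```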
3. Log Linear Model: $Y_i = \alpha X_i^{\beta} e^{U_i}$. Taking logs,
$\ln Y_i = \ln \alpha + \beta \ln X_i + U_i$

It is an exponential regression model (known as the double log or log linear model).
This model is popular in applied work since the slope coefficient measures the elasticity of Y with respect to X (the % change in Y for a 1% change in X).
Example: To estimate the advertising elasticity of a product we may use this model, specifying Sales Volume = f(Advertising Expenditure). The slope will give the advertising elasticity.
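The elasticity interpretation can be checked on data with a known answer. In this sketch the sales figures are generated from Sales = 2 × Adv^0.5 (a made-up relationship with no error term, assumed for illustration), so the double log regression should recover a slope of 0.5.

```python
import math

# Hypothetical data generated from Sales = 2 * Adv^0.5 (illustration only).
adv = [1, 4, 9, 16, 25]
sales = [2, 4, 6, 8, 10]

# Regress ln(Sales) on ln(Adv).
ln_x = [math.log(x) for x in adv]
ln_y = [math.log(y) for y in sales]

n = len(ln_x)
x_bar, y_bar = sum(ln_x) / n, sum(ln_y) / n
elasticity = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ln_x, ln_y))
              / sum((x - x_bar) ** 2 for x in ln_x))
ln_alpha = y_bar - elasticity * x_bar
alpha = math.exp(ln_alpha)            # back-transform the intercept

print(f"elasticity = {elasticity:.3f}, alpha = {alpha:.3f}")
```

An elasticity of 0.5 means a 1% rise in advertising expenditure is associated with a 0.5% rise in sales volume.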
4. Semi-log Regression Model.

The semi log model can be used to measure the growth rate of a variable over a time period.
This model is specified as
$\ln Y_i = \ln \alpha + \beta t + u$
It is known as the semi log model since only one variable appears in log form. It is also known as the log-lin model.
Semi-log Regression Model (Contd)

In the semi log model the slope coefficient measures the constant proportional or relative change in Y for a given absolute change in X ('t' in the above equation).
Slope x 100 gives the instantaneous (point in time) percentage growth rate of Y per unit change in X.
The compound growth rate can be found by the formula: [Antilog β - 1] x 100
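Both growth rates can be illustrated on a series with a known growth rate. In this sketch the series grows at exactly 10% per period (made-up data, assumed for illustration). Natural logs are used, so the antilog is `math.exp`.

```python
import math

# Hypothetical series growing at 10% per period (illustration only).
t = [0, 1, 2, 3, 4, 5]
y = [100 * 1.10 ** ti for ti in t]

# Regress ln(Y) on t.
ln_y = [math.log(v) for v in y]
n = len(t)
t_bar, ly_bar = sum(t) / n, sum(ln_y) / n
beta = (sum((ti - t_bar) * (li - ly_bar) for ti, li in zip(t, ln_y))
        / sum((ti - t_bar) ** 2 for ti in t))

instantaneous_growth = beta * 100               # slope x 100
compound_growth = (math.exp(beta) - 1) * 100    # [antilog(beta) - 1] x 100

print(f"instantaneous = {instantaneous_growth:.2f}%, compound = {compound_growth:.2f}%")
```

The compound rate recovers the 10% used to build the series, while the instantaneous rate is slightly lower (about 9.53%).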
5. Quadratic/ Cubic Model:
The forms of a quadratic or a cubic model could be
$Y = a + bX + cX^2 + u$ or $Y = a + bX + cX^2 + dX^3 + u$
Since these models use one independent variable they can be categorized under the two variable regression equations.
Quadratic models are used to examine whether a minimum or maximum exists in the curve depicting the relationship between X and Y. Examples: Total Revenue curve, Average Cost curves, etc. Cubic models are used in Total Cost functions, etc.
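A quadratic model can be fitted by least squares and its turning point located where the slope b + 2cX equals zero. The sketch below solves the normal equations with a small hand-written elimination routine, using only the standard library. The function name `fit_quadratic` and the total revenue data (generated from TR = 10Q - Q² with no error term) are assumptions for illustration.

```python
def fit_quadratic(x, y):
    """Least squares fit of Y = a + b*X + c*X^2 via the normal equations."""
    # Design matrix columns: 1, X, X^2.
    rows = [[1.0, xi, xi ** 2] for xi in x]
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]

# Hypothetical total revenue data generated from TR = 10*Q - Q^2 (illustration only).
q = [0, 1, 2, 3, 4, 5, 6, 7, 8]
tr = [10 * qi - qi ** 2 for qi in q]

a, b, c = fit_quadratic(q, tr)
turning_point = -b / (2 * c)          # where dY/dX = b + 2cX = 0
kind = "maximum" if c < 0 else "minimum"
print(f"TR = {a:.2f} + {b:.2f} Q + {c:.2f} Q^2, {kind} at Q = {turning_point:.2f}")
```

A negative coefficient on the squared term indicates a maximum, as expected for a total revenue curve; a positive one would indicate a minimum, as for an average cost curve.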
Assignment 2
Collect relevant data and estimate the Five
Forms of Two Variable Regression Models
explained above.
Use SPSS package for estimation.
Interpret the results.
Exercises of randomly selected groups will be
discussed in the next class.
One class will be assigned for discussion of the
results and interpretation.
