Data Analysis and Interpretation

The document provides an overview of data analysis and interpretation, emphasizing the importance of statistics in collecting, presenting, and analyzing data. It distinguishes between primary and secondary data, outlines various statistical tests, and explains how to choose between parametric and non-parametric tests. Additionally, it covers regression analysis, including how to interpret results and present regression equations.

Uploaded by Drake Tsuchiguri
Copyright © All Rights Reserved

DATA ANALYSIS AND INTERPRETATION

SEVERINO B. SALERA JR., Ph.D.
Associate Professor 5
Statistics
Definition: Statistics is the science of collecting, presenting, analyzing, and reasonably interpreting data.
• Statistics provides a rigorous scientific method for gaining insight into data.
• For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the raw numbers does not give an informative account.
• However, statistics can give an immediate overall picture of the data through graphical presentation or numerical summaries, regardless of the number of data points.
• Besides summarizing data, another important task of statistics is to make inferences and to predict relationships between variables.
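As a minimal sketch of numerical summarization, the Python snippet below condenses 100 weights into a few measures of center and spread. The weights are simulated with `random.gauss` and are hypothetical, not real patient data; the mean of 70 kg, standard deviation of 12 kg, and the function name `summarize` are illustrative assumptions.

```python
import random
import statistics

def summarize(values):
    """Condense a list of numbers into measures of center and spread."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical data: 100 simulated weights in kg (not real patients).
rng = random.Random(42)  # fixed seed so the simulation is reproducible
weights = [round(rng.gauss(70, 12), 1) for _ in range(100)]

for name, value in summarize(weights).items():
    print(f"{name}: {round(value, 2)}")
```

Six numbers replace a hundred, which is the "instant overall picture" described above; a histogram of the same list would give the graphical counterpart.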
What is Data?
Definition: Facts or figures, numerical or otherwise, that are collected with a definite purpose are called data.
• Every day we come across a lot of information in the form of facts, numerical figures, tables, graphs, etc.
• This information is provided by newspapers, television, magazines, and other means of communication.
• It may relate to cricket batting or bowling averages, the profits of a company, the temperatures of cities, expenditures in the various sectors of a five-year plan, polling results, and so on.
• These facts or figures, numerical or otherwise, collected with a definite purpose are called data.
Primary Data Vs Secondary Data
Primary Data
• Primary data is data collected for the first time, through personal experience or direct evidence, specifically for the research at hand.
• It is also described as raw data or first-hand information.
• Collecting it is costly.
• It is mostly collected through observations, physical testing, mailed questionnaires, surveys, personal interviews, telephone interviews, case studies, focus groups, etc.
Primary Data Vs Secondary Data
Secondary Data
• Secondary data is second-hand data that has already been collected and recorded by other researchers for their own purposes, not for the current research problem.
• It is available from a variety of sources, such as government publications, censuses, internal records of the organisation, books, journal articles, websites, and reports.
• Gathering data this way is affordable, and the data is readily available, saving both cost and time.
• However, the main disadvantage is that the information was assembled for some other purpose, so it may not meet the present research purpose or may not be accurate.
Data Presentation
• There are two types of statistical presentation of data: graphical and numerical.
• Graphical presentation: We look for the overall pattern and for striking deviations from that pattern. The overall pattern is usually described by the shape, center, and spread of the data. An individual value that falls outside the overall pattern is called an outlier.
• Bar diagrams and pie charts are used for categorical variables.
• Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
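The outlier idea can be made concrete with the rule most box plots use to draw their whiskers: values more than 1.5 × IQR below the first quartile or above the third quartile are flagged. This is a minimal sketch assuming Tukey's 1.5 × IQR convention, which the slides do not state; the function name and data are hypothetical, and quartiles can differ slightly between software packages because they use different quartile methods.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles uses the "exclusive" quartile method by default.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with one value far outside the overall pattern.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(data))  # → [100]
```

Here Q1 = 2.75 and Q3 = 8.25, so the upper fence is 16.5 and only 100 lies beyond it.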
Choosing the right statistical test
Type of statistical test
Parametric statistical tests are a group of statistical tests that make certain assumptions about the data. These tests are used to make inferences about a population based on a sample. Their main assumption is that the data is normally distributed; most also assume independent observations, and comparison tests typically assume similar variances across groups.
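Normality is usually checked with a histogram or a formal test such as Shapiro–Wilk. As a rough screen only, the sketch below computes the moment-based sample skewness; values far from 0 hint that the normality assumption may be doubtful. The function name and data are hypothetical, and the check is an illustrative assumption, not a method from the slides.

```python
import statistics

def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for perfectly symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5  # raises ZeroDivisionError if all values are equal

# Hypothetical samples: one symmetric, one with a long right tail.
print(sample_skewness([1, 2, 3, 4, 5]))   # → 0.0
print(sample_skewness([1, 1, 1, 2, 10]))  # positive: right-skewed
```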
Different Parametric Tests
• Regression tests
  Regression tests look for cause-and-effect relationships, although a causal interpretation depends on the study design rather than on the regression itself. They can be used to estimate the effect of one or more continuous variables on another variable.
• Comparison tests
  Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.
  T-tests are used when comparing the means of exactly two groups (e.g., the average heights of men and women). ANOVA is used when comparing the means of more than two groups (e.g., the average heights of children, teenagers, and adults); MANOVA extends this to two or more outcome variables at once.
• Correlation tests
  Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.
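The rules of thumb above can be encoded as a small decision function. This is a hypothetical helper written for illustration, not a standard API; the argument names are assumptions, and the MANOVA branch follows the MANOVA row of the comparison-test table (two or more outcome variables).

```python
def choose_parametric_test(purpose, n_groups=2, paired=False, n_outcomes=1, n_predictors=1):
    """Hypothetical helper: map a research design to a parametric test.

    purpose is one of "regression", "comparison", or "correlation".
    """
    if purpose == "regression":
        # Effect of one or more continuous predictors on a continuous outcome.
        return "simple linear regression" if n_predictors == 1 else "multiple linear regression"
    if purpose == "comparison":
        if n_outcomes >= 2:
            return "MANOVA"  # several outcome variables compared at once
        if n_groups == 2:
            return "paired t-test" if paired else "independent t-test"
        return "ANOVA"  # means of more than two groups
    if purpose == "correlation":
        return "Pearson's r"  # association without a cause-and-effect hypothesis
    raise ValueError(f"unknown purpose: {purpose!r}")

print(choose_parametric_test("comparison", n_groups=3))  # → ANOVA
```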
Regression Tests

| Test | Predictor variable | Outcome variable | Research question example |
|---|---|---|---|
| Simple linear regression | Continuous (1 predictor) | Continuous (1 outcome) | What is the effect of income on longevity? |
| Multiple linear regression | Continuous (2 or more predictors) | Continuous (1 outcome) | What is the effect of income and minutes of exercise per day on longevity? |
| Logistic regression | Continuous | Binary | What is the effect of drug dosage on the survival of a test subject? |
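Simple linear regression fits a straight line by ordinary least squares. A minimal sketch, assuming one continuous predictor and no missing values; the function name and the toy points (which lie exactly on y = 2 + 3x) are hypothetical. Multiple and logistic regression need matrix algebra or iterative fitting and are beyond this sketch.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1*x with one continuous predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)  # zero if all x are equal
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1

# Hypothetical points lying exactly on y = 2 + 3x.
b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)  # → 2.0 3.0
```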
Comparison Tests

| Test | Predictor variable | Outcome variable | Research question example |
|---|---|---|---|
| Paired t-test | Categorical (1 predictor) | Quantitative (groups come from the same population) | What is the effect of two different test prep programs on the average exam scores for students from the same class? |
| Independent t-test | Categorical (1 predictor) | Quantitative (groups come from different populations) | What is the difference in average exam scores for students from two different schools? |
| ANOVA | Categorical (1 or more predictors) | Quantitative (1 outcome) | What is the difference in average pain levels among post-surgical patients given three different painkillers? |
| MANOVA | Categorical (1 or more predictors) | Quantitative (2 or more outcomes) | What is the effect of flower species on petal length, petal width, and stem length? |
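The t statistics behind the two t-test rows can be sketched with Python's standard library alone. This is a minimal illustration assuming the pooled-variance (Student's) form of the independent t-test; the function names `independent_t` and `paired_t` and the toy values are hypothetical. The standard library has no t distribution, so the sketch stops at the statistic and its degrees of freedom rather than a p-value.

```python
import math
import statistics

def independent_t(a, b):
    """Student's t for two independent groups (pooled variance); returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))  # standard error of the mean difference
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-pair differences; returns (t, df)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical toy values, for illustration only.
print(independent_t([1, 2, 3], [4, 5, 6]))
print(paired_t([10, 12, 14], [11, 14, 15]))
```

In practice, `scipy.stats.ttest_ind` (pooled variance by default) and `scipy.stats.ttest_rel` return both the statistic and the p-value.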
Correlation Test

| Test | Variables | Research question example |
|---|---|---|
| Pearson's r | 2 continuous variables | How are latitude and temperature related? |
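Pearson's r can be computed directly from its definition: the covariance of the two variables divided by the product of their standard deviations. A minimal sketch, assuming two equal-length lists of continuous values; the function name and toy data are hypothetical (no latitude or temperature data is given in the slides). Python 3.10+ also offers `statistics.correlation` for the same purpose.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two continuous variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: a perfect increasing linear relationship.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

The result always lies between -1 and +1; values near either end indicate a strong linear relationship, and values near 0 indicate little linear association.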
Choosing a nonparametric test
• Non-parametric tests don't make as many assumptions about the data and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren't as strong as those of parametric tests: when the parametric assumptions do hold, non-parametric tests generally have less statistical power.
Non-parametric Tests

| Test | Predictor variable | Outcome variable | Use in place of… |
|---|---|---|---|
| Spearman's r | Quantitative | Quantitative | Pearson's r |
| Chi-square test of independence | Categorical | Categorical | Pearson's r |
| Sign test | Categorical | Quantitative | One-sample t-test |
| Kruskal–Wallis H | Categorical (3 or more groups) | Quantitative | ANOVA |
| ANOSIM | Categorical (3 or more groups) | Quantitative (2 or more outcome variables) | MANOVA |
| Wilcoxon rank-sum test | Categorical (2 groups) | Quantitative (groups come from different populations) | Independent t-test |
| Wilcoxon signed-rank test | Categorical (2 groups) | Quantitative (groups come from the same population) | Paired t-test |
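Several tests in this table work on ranks instead of raw values. As one example, Spearman's r is Pearson's r computed on the ranks of the data, which is why it can replace Pearson's r when the relationship is monotonic but not linear, or when the data are not normal. The sketch below is a minimal standard-library version assuming tied values receive the average of their ranks; the function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monotonic but non-linear data: y = x cubed.
x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # → 1.0
```

`scipy.stats.spearmanr` provides the same coefficient together with a p-value.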
Regression Analysis
Click Analyze > Regression > Linear... on the top menu.
You will be presented with the Linear Regression dialogue box.
Transfer the independent variable, Income, into the Independent(s): box and the dependent variable, Price, into the Dependent: box.
Click the OK button. This will generate the results.
Output of Linear Regression Analysis
• The Model Summary table provides the R and R² values. The R value represents the simple correlation and is 0.873 (the "R" column), which indicates a high degree of correlation. The R² value (the "R Square" column) indicates how much of the total variation in the dependent variable, Price, can be explained by the independent variable, Income. In this case, 76.2% can be explained, which is very large.
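With a single predictor, R² is simply the square of R, which ties the two reported values together:

```latex
R^2 = (0.873)^2 = 0.762129 \approx 0.762 \quad (\text{i.e., } 76.2\%)
```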
• The next table is the ANOVA table, which reports how well the regression equation fits the data (i.e., predicts the dependent variable).
• This table indicates that the regression model predicts the dependent variable significantly well. How do we know this? Look at the "Regression" row and go to the "Sig." column, which gives the statistical significance of the regression model that was run. Here, p < 0.0005, which is less than 0.05 and indicates that, overall, the regression model statistically significantly predicts the outcome variable (i.e., it fits the data better than a model with no predictors).
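The F statistic behind the "Sig." value compares explained and unexplained variation. For a model with k predictors fitted to n cases (n is not reported in the slides; k = 1 for Income alone), the standard form is:

```latex
F = \frac{SS_{\text{regression}} / k}{SS_{\text{residual}} / (n - k - 1)}
  = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```

The "Sig." value is the probability of obtaining an F at least this large from an F(k, n - k - 1) distribution if the predictors had no effect.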
• The Coefficients table provides the information needed to predict price from income, as well as to determine whether income contributes statistically significantly to the model (by looking at the "Sig." column). Furthermore, we can use the values in the "B" column under "Unstandardized Coefficients" to present the regression equation as:
Price = 8287 + 0.564(Income)
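To use the fitted equation, substitute an income value. The sketch below hard-codes the intercept (8287) and slope (0.564) from the unstandardized B column; the function name `predict_price` and the income of 10,000 are hypothetical, and the units of Price and Income are not stated in the source.

```python
def predict_price(income, intercept=8287, slope=0.564):
    """Apply the fitted equation Price = 8287 + 0.564(Income)."""
    return intercept + slope * income

# Hypothetical income value, chosen only to illustrate the arithmetic.
print(predict_price(10_000))
```

Each additional unit of Income raises the predicted Price by 0.564 units. Predictions are most trustworthy within the range of incomes used to fit the model; extrapolating far outside it assumes the linear relationship continues, which the data cannot confirm.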

You might also like