Lecture 8 Data Analysis

The document discusses various topics related to research methodology and data analysis. It covers preparing data for analysis through activities like editing, handling missing data, coding, and categorization. It then discusses analyzing the data through understanding basic objectives, getting a feel for the data, and testing hypotheses. It also introduces various statistical tests and interpreting the results. The document provides information on data collection methods and data analysis software tools. It includes diagrams and examples to illustrate key concepts.

Research Methodology

Data Analysis
Agenda

– Getting data ready for analysis
• Editing data
• Handling missing data
• Coding data
• Categorization
• Entering data
– Data analysis
• Basic objectives of data analysis
• Feel for data
• Testing goodness of data
• Hypothesis testing
– Statistics
– Statistical tests
– Interpretation of data (continued in next session)
Data
Data collection
– Questionnaires (online, email, distributed, ...)
– Interviews (direct, telephone, computer-based)
– Observation
– Secondary sources (central bank data, ...)

Data analysis software tools


– SPSS
– SAS
– STATPAK
– SYSTAT
– Excel
Flow diagram of data analysis

Hypotheses Testing

• Examples

Class Work

• Select a research article
• Examine whether the researcher has used
hypothesis testing
• Are the data quantitative or qualitative?
• How many hypotheses are there?
• What are the areas?
• Why does the researcher set several hypotheses?
Editing data (1)
• Responses to open-ended questions need to be edited
• Information noted down in a hurry by the interviewer,
observer, or researcher must be clearly
deciphered (lack of clarity → confusion)
• It is recommended to edit the data on the very same day that
they are collected
– Before details are forgotten
– While respondents can still be contacted for further information
or clarification
• Edits should be identifiable through the use of a
different colour of pencil or ink, so that the original
information is still available in case of further doubts
later.
Editing data (2)
• Data have to be checked for incompleteness
and inconsistencies
– Some inconsistencies can be corrected logically (e.g., marital
status versus number of years married)
– Where possible, check before making corrections
(otherwise bias may affect the goodness of the data)
• Some cases may not be simple to edit, and
omissions could go unnoticed and remain
unrectified (this affects the goodness of the data)
Handling missing data
• Reasons for blank responses: the respondent
– Could not understand the question
– Did not know the answer
– Was not willing to answer
– Was simply indifferent to the need to respond to the
entire questionnaire
• If 25% or more of the questionnaire is unanswered, it is better to
discard it than to use it for data analysis
• It is important to mention the number of returned but unused
responses due to excessive missing data in the final
report submitted to the sponsor
Blank responses (minor scale)
• For an interval-scaled item with a midpoint (neutral
value), assign the midpoint of the scale
as the response to that particular item
• Allow the computer to ignore the blank responses
when analyzing (this reduces the sample size)
• Assign to the item the mean value of the responses
of all other respondents who answered that particular item
• Assign the mean of this particular respondent's responses
to all other questions measuring the same variable
• Assign a random number within the range of the scale
• Use linear interpolation from adjacent points
• Use the linear trend option in SPSS
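
A minimal Python sketch of the 25% screening rule and three of the options above (scale midpoint, item mean, respondent mean). The 5-point scale, the sample responses, and all function names are assumptions made for illustration; None marks a blank response.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # assumption: items use a 5-point interval scale

# Hypothetical responses: one row per respondent, one column per item.
# None marks a blank response.
responses = [
    [4, 5, None, 4, 3],        # 20% blank
    [2, None, None, None, 1],  # 60% blank
    [3, 3, 4, 2, 2],
    [5, 4, 2, None, 4],        # 20% blank
]

def is_usable(row, cutoff=0.25):
    """Keep a questionnaire only if less than `cutoff` of its items are blank."""
    return row.count(None) / len(row) < cutoff

def fill_midpoint(row):
    """Replace blanks with the neutral midpoint of the scale."""
    midpoint = (SCALE_MIN + SCALE_MAX) / 2
    return [midpoint if v is None else v for v in row]

def fill_item_mean(rows):
    """Replace blanks with the mean of the other respondents' answers to that item."""
    item_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[item_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def fill_respondent_mean(row):
    """Replace blanks with the mean of this respondent's own answers.

    Assumes every item in the row measures the same variable.
    """
    own_mean = mean(v for v in row if v is not None)
    return [own_mean if v is None else v for v in row]

kept = [row for row in responses if is_usable(row)]
print(len(responses) - len(kept), "questionnaire(s) returned but unused")
print(fill_midpoint(kept[0]))
print(fill_item_mean(kept))
print(fill_respondent_mean(kept[0]))
```

Whichever rule is chosen, it should be applied consistently and reported, since each one changes the mean and the variance of the item in a different way.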
Coding
• It is more convenient to input data with scanner sheets than to key them in.
When scanning is not possible for some reason, it is better to
use a coding sheet first to transcribe the data from the questionnaire and
then key in the data (this avoids flipping through large questionnaires)
• Coding is more efficient when some thought is given to it at the time of
designing the questionnaire
• Human errors can occur while coding. At least 10% of the coded
questionnaires should therefore be checked for coding accuracy
• Rating scales
– Dichotomous scale
– Category scale
– Likert scale
– Numerical scale
– Semantic differential scale (via meaning)
– Itemized rating scale
– Fixed or constant sum rating scale
– Stapel scale
– Graphic rating scale
– Consensus scale
Categorization

• Set up a scheme for categorizing the variables such
that the several items measuring a concept are all
grouped together
• Responses to negatively worded questions also
have to be reversed so that all items run in the same
direction
• When the questions measuring a concept are not contiguous
(close together), care has to be taken to include all the items
without any omission or wrong inclusion
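
A minimal Python sketch of a categorization scheme with reverse scoring. The concept name, the item ids, the choice of negatively worded item, and the 5-point scale are all hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumption: 5-point Likert-type items

# Hypothetical scheme: concept -> the items that measure it.
# The items do not have to be contiguous in the questionnaire.
scheme = {"job_satisfaction": ["q2", "q7", "q11"]}
reverse_items = {"q7"}  # assumption: q7 is negatively worded

def reverse(score):
    """Flip a score so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - score

def concept_score(answers, concept):
    """Average the items of one concept after reversing negative items."""
    items = scheme[concept]
    missing = [q for q in items if q not in answers]
    if missing:
        # Guard against omitting an item that belongs to the concept.
        raise KeyError(f"items missing for {concept}: {missing}")
    scores = [reverse(answers[q]) if q in reverse_items else answers[q]
              for q in items]
    return sum(scores) / len(scores)

answers = {"q2": 4, "q7": 2, "q11": 5}  # hypothetical respondent
score = concept_score(answers, "job_satisfaction")  # (4 + 4 + 5) / 3
print(round(score, 2))
```

Keeping the scheme in one place makes it easier to check that every item of a concept is included and that no item is assigned to the wrong concept.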

Entering data

• Data are scanned in or keyed manually
into the computer
• Any software can be used, but a spreadsheet
type may be preferred (row → case,
column → variable)
• All missing values will appear as a dot in
the cell
Data analysis
Objectives:
– Getting feel for the data
– Testing the goodness of data
– Testing the hypothesis developed for research

Getting feel for data
– It is better to obtain
• Frequency distributions for the demographic variables
• Central tendency measures (mean, median, mode) and
dispersion measures (standard deviation, range, variance,
absolute deviation) for the other dependent and independent
variables
• An intercorrelation matrix of the variables,
irrespective of whether or not the hypotheses are directly
related to these analyses
– If an item shows little variation, the researcher may suspect that the
question was not properly worded (respondents did not understand the
intent of the question), unless the lack of variation can be explained otherwise
– Graphical charts may make this much easier
• Histograms, bar charts
– It is good to know how the dependent and independent variables in
the study are related to each other (intercorrelation matrix)
– If the correlation between two variables happens to be high
(>0.75), check whether they really measure two different concepts
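
A minimal Python sketch of an intercorrelation matrix with the 0.75 check. The variable names and values are hypothetical, and taking the absolute value of r (so that strong negative correlations are flagged too) is an assumption, not something the slide states.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical variables, one value per respondent.
data = {
    "satisfaction": [1, 2, 3, 4, 5],
    "morale": [2, 4, 6, 8, 10],
    "tenure": [3, 1, 4, 1, 5],
}

# Upper triangle of the intercorrelation matrix: one entry per pair.
matrix = {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}

# Pairs that may be measuring the same concept.
flagged = [pair for pair, r in matrix.items() if abs(r) > 0.75]

for (a, b), r in matrix.items():
    print(f"{a:>12} x {b:<8} r = {r:.2f}")
print("check:", flagged)
```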
Statistics
• Descriptive statistics
– Statistics that describe the phenomenon of
interest
• Inferential statistics
– Statistical results that let us draw inferences
from a sample to population
– Can be categorized into two:
• Parametric
– Based on the assumption that the population from which
the sample is drawn is normally distributed and that data are collected on an
interval or ratio scale
• Non-parametric
– No explicit assumption about the distribution
– Used when data are collected on a nominal or ordinal
scale
Descriptive statistics (1)
• Frequency
– Simply refers to the number of times various subcategories of
a certain phenomenon occur (percentages and
cumulative percentages can be calculated)
– The information can be represented graphically as a histogram
or bar chart
– E.g.: a marketing manager wants to know how many units
of each brand of coffee are sold
– Frequencies are obtained on a nominally scaled
variable (grouped into non-overlapping subcategories)
– In management research, frequencies are generally obtained for
nominal variables such as gender and educational level
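
A minimal Python sketch of a frequency table with percentages and cumulative percentages for the coffee example. The brand names and the sales records are hypothetical.

```python
from collections import Counter

# Hypothetical sales records: one entry per unit sold.
sales = ["Brand A", "Brand B", "Brand A", "Brand C",
         "Brand A", "Brand B", "Brand C", "Brand A"]

counts = Counter(sales)
total = sum(counts.values())

table = []
cumulative = 0.0
for brand, n in counts.most_common():  # most frequent category first
    pct = 100 * n / total
    cumulative += pct
    table.append((brand, n, pct, cumulative))

print(f"{'Brand':<8} {'Freq':>4} {'%':>6} {'Cum %':>6}")
for brand, n, pct, cum in table:
    print(f"{brand:<8} {n:>4} {pct:>6.1f} {cum:>6.1f}")
```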

Descriptive statistics (2)
Central tendency
• Mean:
– The mean is the sum of the data points divided by the number of
data points.
– The mean is the value that is most commonly referred to as the
average.
– We will use the term average as a synonym for the mean and the
term typical value to refer generically to measures of location.
• Median
– The median is the value of the point which has half the data smaller
than that point and half the data larger than that point.
• Mode
– The mode is the most frequently occurring value
– It is not necessarily unique.
– The mode is typically used in a qualitative fashion
– For grouped data, it is the midpoint of the class interval of the histogram with the
highest peak
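
A minimal Python sketch of the three measures using the standard library's statistics module. The scores are hypothetical and were chosen so that the mode is not unique.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 4, 5, 5, 6]  # hypothetical data

average = mean(scores)     # sum 28 divided by 7 data points
middle = median(scores)    # half the data below, half above
modes = multimode(scores)  # every value tied for most frequent

print(average, middle, modes)
```

Here the mean and the median coincide, while the mode has two values (3 and 5), which illustrates why it is not necessarily unique.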
A few of the more common alternative measures of central tendency:

1. Mid-Mean - computes a mean using the data between the 25th
and 75th percentiles.

2. Trimmed Mean - similar to the mid-mean except different
percentile values are used. A common choice is to trim 5% of the
points in both the lower and upper tails, i.e., calculate the mean
for data between the 5th and 95th percentiles.

3. Winsorized Mean - similar to the trimmed mean. However,
instead of trimming the points, they are set to the lowest (or
highest) retained value. For example, all data below the 5th percentile are
set equal to the value of the 5th percentile and all data greater
than the 95th percentile are set equal to the 95th percentile.

4. Mid-range = (smallest + largest)/2.
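
A minimal Python sketch of the four measures. As a simplifying assumption, it trims a count of points from each tail (the proportion times n, rounded down) instead of interpolating exact percentiles, so results can differ slightly from a statistics package. The data and function names are hypothetical.

```python
def trimmed_mean(data, proportion):
    """Mean after dropping `proportion` of the points from each tail."""
    xs = sorted(data)
    k = int(len(xs) * proportion)  # points dropped per tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def winsorized_mean(data, proportion):
    """Mean after setting the tail points to the nearest retained value."""
    xs = sorted(data)
    k = int(len(xs) * proportion)
    if k:
        xs[:k] = [xs[k]] * k        # pull the lower tail up
        xs[-k:] = [xs[-k - 1]] * k  # pull the upper tail down
    return sum(xs) / len(xs)

def mid_mean(data):
    """Mean of the data between the 25th and 75th percentiles."""
    return trimmed_mean(data, 0.25)

def mid_range(data):
    return (min(data) + max(data)) / 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 10, 100]  # hypothetical, with one outlier

print("mean           ", sum(data) / len(data))
print("mid-mean       ", mid_mean(data))
print("trimmed (10%)  ", trimmed_mean(data, 0.10))
print("winsorized (10%)", winsorized_mean(data, 0.10))
print("mid-range      ", mid_range(data))
```

With only 10 points a 5% trim would remove nothing, so the example uses 10%. The outlier pulls the plain mean and the mid-range far more than the other three measures.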

Descriptive statistics (3)
Dispersion (i)
• Range:
– the range is the largest value minus the smallest value in a data
set.
– Note that this measure is based only on the lowest and highest
extreme values in the sample.
– The spread near the center of the data is not captured at all.
• Variance
– The variance is roughly the arithmetic average of the squared
distance from the mean.
– Squaring the distance from the mean has the effect of giving
greater weight to values that are further from the mean.
– Although the variance is intended to be an overall measure of
spread, it can be greatly affected by the tail behavior.
• Standard deviation
– The square root of the variance; a measure of dispersion for
interval and ratio scaled data
– Offers an index of the spread of a distribution or the variability in
the data
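
A minimal Python sketch of the three dispersion measures using the standard library. The values are hypothetical. The variance is only "roughly" the average squared distance because the sample version divides by n - 1 instead of n; both versions are shown.

```python
from statistics import pstdev, pvariance, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

value_range = max(values) - min(values)  # uses only the two extremes
pop_var = pvariance(values)    # squared distances averaged over n
pop_sd = pstdev(values)        # square root of pop_var
sample_var = variance(values)  # squared distances divided by n - 1
sample_sd = stdev(values)      # square root of sample_var

print(value_range, pop_var, pop_sd)
print(round(sample_var, 3), round(sample_sd, 3))
```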
Descriptive statistics (4)
Dispersion (ii)

• Other measures
– When the median is the measure of central tendency, percentiles,
deciles, and quartiles become meaningful
– Just as the median divides the total realm of observations into two equal
halves, the quartiles divide it into four equal parts, the deciles into
10 and the percentiles into 100.
– Inter-quartile range: the spread of the middle 50% of observations,
i.e., excluding the bottom and top 25% (Q3 - Q1)
– Box-and-whisker plot
• A graphical device that portrays central tendency, percentiles and
variability
• A box is drawn extending from the first to the third quartile, and lines are
drawn from either side of the box to the extreme scores
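
A minimal Python sketch of the quartiles, the inter-quartile range, and the five numbers a box-and-whisker plot is drawn from. The data are hypothetical, and the quartile values depend on the interpolation method (here the default of statistics.quantiles), so other software may report slightly different cut points.

```python
from statistics import quantiles

scores = list(range(1, 12))  # hypothetical data: 1, 2, ..., 11

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # spread of the middle 50% of observations

# Box from q1 to q3, line at the median, whiskers to the extreme scores.
five_number_summary = (min(scores), q1, q2, q3, max(scores))

print(five_number_summary, "IQR =", iqr)
```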

Inferential statistics (1)
• Used when we want to know or infer something from the data through
analysis
– The relationship between two variables
• Between advertisement and sales
– Differences in a variable among different
subgroups
• Whether women or men buy more of the product
– How several independent variables might
explain the variance in a dependent variable
• How investments in the stock market are influenced
by the level of unemployment, perceptions of the
economy, disposable incomes, and dividend
expectations
Inferential statistics (2)
• Correlation
– To know how one variable is related to another
– A Pearson correlation matrix indicates the direction, strength and
significance of the bivariate relationships
– The correlation is derived by assessing the variations in one
variable as another variable varies (a scatter diagram helps to
identify the variation)
– Perfect positive correlation is +1 and perfect negative correlation
is -1.
– For example, a correlation of r = 0.56 indicates that the
variables explain the variance in one another to the extent
of 31.4% (0.56² = 0.3136)
– The strength of the relationship between two variables can be computed
for variables measured on an interval or ratio scale
– Non-parametric tests are also available to assess the relationship
between variables not measured on an interval or ratio scale,
for example to examine the relationship between two ordinal variables
• Spearman’s rank correlation
• Kendall’s rank correlation
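
A minimal Python sketch of the Pearson correlation, the shared variance r squared, and Spearman's rank correlation (computed as the Pearson correlation of the ranks, with tied values given their average rank). The advertising and sales figures are hypothetical; Kendall's rank correlation is not shown.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

advertising = [1, 2, 3, 4, 5]  # hypothetical, interval scaled
sales = [1, 4, 9, 16, 25]      # rises with advertising, but not linearly

r = pearson(advertising, sales)
rho = spearman(advertising, sales)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, rho = {rho:.3f}")

# The slide's example: r = 0.56 gives a shared variance of about 31.4%.
print(f"{0.56 ** 2:.4f}")
```

Because the relationship in the example is monotonic but curved, the rank correlation is perfect while the Pearson correlation is slightly below 1.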
Hypothesis testing

• Once the data are ready for analysis, the
researcher is ready to test the hypotheses
already developed for the study
• The different tests that are appropriate for
different hypotheses and for data obtained on
different scales will be discussed.
Inferential statistics (5)
• Significant mean differences : ANOVA
– The t-test applies only to two groups, but ANOVA helps to examine the
significant mean differences among more than two groups on an
interval or ratio scaled dependent variable
• E.g., examine the significant mean differences in the amount of sales by
those who are sent to training schools, those who are given on-the-job training (OJT),
and those who have been tutored by the sales manager
– The F-distribution is the probability distribution of the ratio of two sample variances, and
the family of distributions changes with the
sample size
– Post hoc tests can be used to detect exactly where the mean differences lie
• Duncan multiple range test
• Tukey’s test
• Student-Newman-Keuls test
– Kruskal-Wallis one-way analysis of variance is the non-parametric
test used when the dependent variable is on an ordinal scale and
the independent variable is nominally scaled
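
A minimal Python sketch of the one-way ANOVA F statistic for the three training groups. The sales figures are hypothetical. The sketch stops at F and its degrees of freedom; the significance level would come from an F table or from a statistics package (for example SPSS, or scipy.stats.f_oneway in Python).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical sales per salesperson under each kind of training.
training_school = [10, 12, 14]
on_the_job = [13, 15, 17]
tutored = [19, 21, 23]

f_value, df1, df2 = one_way_anova([training_school, on_the_job, tutored])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

A large F only says that at least one group mean differs; the post hoc tests listed above are what locate the difference.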
Inferential statistics (6)
• Multiple regression analysis (1):
– While the correlation coefficient r indicates the strength of the relationship
between two variables, it gives us no idea of how much of the
variance in the dependent or criterion variable will be explained
when several variables are theorized to simultaneously influence it
– The independent variables may be correlated to the dependent
variable in varying degrees, but they might also be inter-correlated
• E.g., task difficulty is likely to be related to supervisory support, pay might be
correlated to task difficulty, and all three (task difficulty, supervisory
support and pay) might influence the organizational culture
– When the variables are jointly regressed against the dependent
variable in an effort to explain the variance in it, the individual
correlations collapse into what is called a multiple r or multiple
correlation
– The square of multiple r, R-square as it is commonly known, is the
amount of variance explained in the dependent variable by the
predictors (multiple regression analysis)
Inferential statistics (7)
• Multiple regression analysis (2):
– When the R-square value, the F statistic, and its significance level
are known, it is possible to interpret the results. For example:
• R-squared value: 0.63
• F value: 25.56
• Significance level: p<0.001
– It is possible to say that 63% of the variance has been
significantly explained by the set of predictors. There is less than
a 0.1% chance (p<0.001) of this not holding true
– Multiple regression analysis is done to examine the simultaneous
effects of several independent variables on a dependent variable
that is interval scaled.
– If we want to know which among the set of predictors is most
important in explaining the variance, a stepwise multiple
regression can be done
– Multiple regression analysis is also done to trace the sequential
antecedents that cause the dependent variable through what is
known as path analysis.
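
A minimal Python sketch of how R-square and the F statistic relate. It does not fit a regression: the fitted values, the sample size n, and the number of predictors k are hypothetical inputs. The slide does not report n or k, so its F value of 25.56 cannot be reproduced here.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_total = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_residual / ss_total

def f_statistic(r2, n, k):
    """Overall F for a regression with k predictors and n cases.

    Degrees of freedom are (k, n - k - 1).
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical observed values and fitted values from some regression.
y = [1, 2, 3, 4, 5]
y_hat = [1.2, 1.9, 3.0, 4.1, 4.8]
r2 = r_squared(y, y_hat)
print(f"R-square = {r2:.2f}")

# R-square of 0.63 with an assumed n = 50 cases and k = 3 predictors.
f_value = f_statistic(0.63, n=50, k=3)
print(f"F(3, 46) = {f_value:.2f}")
```

The same R-square gives a larger F as n grows or k shrinks, which is why R-square, F, and the significance level are read together.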
Selection of statistical techniques

Use of non parametric tests

Thank you

Next: Data Analysis - Continuing
