VI-Data Analysis: College of Business and Economics Bahir Dar University, 2006E.C
Data Preparation
Data Editing
Data Coding
Data Entry
Dealing with Missing values
Transcribing interviews and other audio/tape-recorded interactions
Typing Field Notes
Preparing electronic textual and optically scanned materials
I DATA PREPARATION IN QUANTITATIVE DATA ANALYSIS
1st step: Editing
Editing is the process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct them when possible.
It is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding.
2nd step: Data coding
Data coding is the process of converting data into a numeric format.
A codebook should be created to guide the coding process. It contains a detailed description of:
- each variable in the research study;
- the items or measures for that variable and the format of each item (numeric, etc.);
- the response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale) and how to code each value into a numeric format.
3rd step: Data entry
Coded data can be entered into a spreadsheet, database or text file, or directly into a statistical program such as SPSS or STATA.
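A codebook can be thought of as a lookup table from each variable's answers to numeric codes. The sketch below is a minimal illustration under stated assumptions: the variables `gender` and `satisfaction`, their labels, and the missing-value code 99 are all hypothetical, chosen only to show how coding and data entry fit together.

```python
# Hypothetical codebook: variable -> (measurement scale, {response label: numeric code})
CODEBOOK = {
    "gender": ("Nominal", {"Male": 1, "Female": 2}),
    "satisfaction": ("Ordinal", {
        "Very dissatisfied": 1,
        "Dissatisfied": 2,
        "Neutral": 3,
        "Satisfied": 4,
        "Very satisfied": 5,
    }),
}

MISSING_CODE = 99  # assumed code for a blank or unreadable answer


def code_record(record):
    """Convert one questionnaire's raw answers into numeric codes using CODEBOOK."""
    coded = {}
    for variable, (_scale, codes) in CODEBOOK.items():
        # A missing or unrecognised answer receives the missing-value code.
        coded[variable] = codes.get(record.get(variable), MISSING_CODE)
    return coded


print(code_record({"gender": "Female", "satisfaction": "Satisfied"}))  # {'gender': 2, 'satisfaction': 4}
print(code_record({"gender": "Male"}))  # {'gender': 1, 'satisfaction': 99}
```

Keeping the missing-value code in the codebook, rather than leaving cells blank, makes it explicit later which values must be excluded from the analysis.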
2nd step: Typing field notes
For focus groups and interviews, after transcribing all relevant recordings, the transcriptionist types up the interviewer’s or focus group moderator’s corresponding handwritten field notes.
These typed field notes can either be appended to the transcript or kept in a separate file.
They provide contextual information that can enhance the researchers’ understanding of the transcript.
3rd step: Preparing electronic textual materials
Textual data, such as email interviews or electronic versions of documents that have been captured electronically, need to be prepared for analysis.
6.2. Quantitative Analysis
Data collected in a research project can be analyzed
quantitatively using statistical tools in different ways.
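Identifying the measurement scale of a variable:
Are the data categories, or quantifiable numbers?
- Categories: Can the categories be meaningfully rank ordered in some way? No: Nominal. Yes: Ordinal.
- Quantifiable numbers (continuous/scale variable): Does the scale variable have a true and absolute zero? No: Interval. Yes: Ratio.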
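The two questions in the decision chart can be written as a small function. This is only a sketch: the argument names `is_numeric`, `rankable` and `true_zero` are hypothetical, and in practice the researcher answers the questions by judgment, not by code.

```python
def measurement_scale(is_numeric, rankable=False, true_zero=False):
    """Classify a variable using the two questions of the decision chart.

    is_numeric -- True for quantifiable (continuous/scale) data, False for categories
    rankable   -- for categories: can they be meaningfully rank ordered?
    true_zero  -- for scale data: is there a true and absolute zero?
    """
    if not is_numeric:
        return "Ordinal" if rankable else "Nominal"
    return "Ratio" if true_zero else "Interval"


# Hypothetical variables
print(measurement_scale(False))                  # religion: Nominal
print(measurement_scale(False, rankable=True))   # level of satisfaction: Ordinal
print(measurement_scale(True))                   # temperature in degrees Celsius: Interval
print(measurement_scale(True, true_zero=True))   # firm size (number of employees): Ratio
```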
Tabulating Data (Frequency Distributions) and Graphs
The frequency distribution of a variable is a summary of the frequencies (or percentages) of individual values or ranges of values for that variable.
It is usually presented as a table that shows, for each value of the variable, the number of cases and their percentage of the total number of cases (relative frequency).
Frequency distributions can also be depicted in graphs; which graph is appropriate depends on the variable type.
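As a minimal sketch of how such a table is produced, the function below counts each value and expresses it as a percentage of all cases. The survey responses are hypothetical.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, frequency, percentage of total) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(value, freq, 100 * freq / total) for value, freq in counts.most_common()]


# Hypothetical answers to a single survey item
responses = ["Yes", "No", "Yes", "Undecided", "Yes", "No"]
for value, freq, pct in frequency_distribution(responses):
    print(f"{value:<10}{freq:>3}{pct:>8.1f}%")
```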
Histogram: It presents the counts of observations (frequencies) of a continuous variable (interval/ratio).
Frequency polygon: It displays the same data as the simple histogram except that it uses a line instead of bars to show the frequency, and the area below the line is shaded.
Pie chart: It presents the frequencies for a categorical variable (nominal), showing the values of the variable as sections of the total cases (like slices of a pie).
Tabulating Data (Frequency Distributions) and Graphs: Histogram (Continuous Data, Interval Data)
[Figure: Histogram with frequency on the vertical axis, shown as a "Distributions" output for CK concentration (U/l) with a quantiles panel]
Firm size ('000)   Frequency   Relative frequency   Cumulative relative frequency
20-39              1           0.028                0.028
40-59              4           0.111                0.139
60-79              7           0.194                0.333
80-99              8           0.222                0.555
100-119            8           0.222                0.777
120-139            3           0.083                0.860
140-159            2           0.056                0.916
Total              36          1.000
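A grouped frequency table like the one above can be built from raw data as sketched below. The firm sizes are hypothetical (not the data behind the table), values are assumed to be integers, and the classes follow the same 20-39, 40-59, ... layout. Cumulative values are computed from cumulative counts rather than by adding rounded relative frequencies, which avoids rounding drift.

```python
def grouped_frequency_table(data, lower, width, n_classes):
    """Rows of (class, frequency, relative frequency, cumulative relative frequency).

    Classes are equal-width integer intervals lower..lower+width-1, and so on.
    """
    n = len(data)
    rows = []
    cumulative = 0
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width - 1
        freq = sum(1 for x in data if lo <= x <= hi)
        cumulative += freq
        rows.append((f"{lo}-{hi}", freq, round(freq / n, 3), round(cumulative / n, 3)))
    return rows


# Hypothetical firm sizes, in '000
firm_sizes = [25, 45, 52, 61, 68, 75, 83, 88, 95, 104, 110, 118, 125, 142]
rows = grouped_frequency_table(firm_sizes, lower=20, width=20, n_classes=7)
for row in rows:
    print(row)
```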
Tabulating Data (Frequency Distributions) and Graphs: Pie Chart (Nominal Data)
Religion      Frequency   Percent
Orthodox      24          40
Muslims       18          30
Protestants   12          20
Others        6           10
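A pie chart turns each frequency into a share of 360 degrees. The short sketch below reuses the four religion frequencies from the table; the function name `pie_slices` is hypothetical.

```python
def pie_slices(frequencies):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(frequencies.values())
    return {label: (100 * f / total, 360 * f / total) for label, f in frequencies.items()}


religion = {"Orthodox": 24, "Muslims": 18, "Protestants": 12, "Others": 6}
for label, (pct, angle) in pie_slices(religion).items():
    print(f"{label:<12}{pct:6.1f}%{angle:8.1f} degrees")
```

For example, Orthodox accounts for 24 of 60 cases, i.e. 40% of the total and a slice of 144 degrees.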
Tabulating Data (Frequency Distributions) and Graphs: Bar Chart (Nominal Data)
Retail 65 42
Warehouse 60 39
Accounts 20 10
Personnel 25 9
Measures of Central Tendency (Location)
Measures of central tendency (location) indicate where on the number line the data are found.
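The three usual measures (mean, median and mode) can be computed with Python's standard `statistics` module, as in this sketch. The firm sizes and religion responses are hypothetical.

```python
import statistics

# Hypothetical firm sizes ('000 employees), a ratio-scale variable
firm_sizes = [45, 52, 61, 68, 68, 75, 83]
print(statistics.mean(firm_sizes))    # arithmetic mean
print(statistics.median(firm_sizes))  # middle value of the sorted data: 68
print(statistics.mode(firm_sizes))    # most frequent value: 68

# For a nominal variable only the mode is meaningful
religion = ["Orthodox", "Muslims", "Orthodox", "Protestants"]
print(statistics.mode(religion))      # Orthodox
```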
Standard Deviation
It is the most commonly used measure of variation.
It shows variation about the mean.
It is the square root of the variance:
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} $$
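The formula can be followed step by step: compute the mean, sum the squared deviations from it, divide by n - 1 to get the sample variance, and take the square root. A minimal sketch with hypothetical observations:

```python
import math


def sample_standard_deviation(values):
    """s = sqrt(sum((x_i - mean)^2) / (n - 1)), the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    variance = squared_deviations / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(round(sample_standard_deviation(data), 3))  # 2.138
```

Here the squared deviations sum to 32, so s = sqrt(32 / 7).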
3. The inter-quartile range (IQR): It is the distance between the value that has a quarter of the values less than it (the first quartile, Q1, or 25th percentile) and the value that has three-quarters of the values less than it (the third quartile, Q3, or 75th percentile), i.e. IQR = Q3 - Q1.
It measures the spread in the middle 50% of the data, which is why it is also called the midspread.
Because it ignores the lowest and highest 25% of the values, the IQR is not influenced by outliers or extreme values.
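The sketch below computes the IQR with `statistics.quantiles` and shows its resistance to an outlier, compared with the range. The data are hypothetical, and note that statistical packages use different quartile conventions (this uses Python's default "exclusive" method), so values can differ slightly between software.

```python
import statistics


def interquartile_range(values):
    """IQR = Q3 - Q1, using statistics.quantiles (default 'exclusive' method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = list(range(1, 12))            # 1, 2, ..., 11 (hypothetical)
with_outlier = data[:-1] + [1000]    # replace the largest value by an extreme one

print(interquartile_range(data), interquartile_range(with_outlier))  # 6.0 6.0
print(max(data) - min(data), max(with_outlier) - min(with_outlier))  # 10 999
```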
Summary guide to the appropriate use of descriptive statistics
No.  Descriptive statistic            Measurement scale
A) Appropriate ways to display data
1    Histogram                        Ratio or Interval scale
2    Frequency polygon                Ratio or Interval scale
3    Bar chart and frequency table    Ordinal or Nominal scale
4    Pie chart and frequency table    Nominal scale
B) Measures of central tendency
1    Mean                             Ratio or Interval scale
2    Median                           Ordinal, Interval and Ratio scale
3    Mode                             Nominal, Ordinal, Interval and Ratio scale
C) Measures of dispersion
1    Standard deviation               Ratio or Interval scale
2    Range                            Ordinal, Interval and Ratio scale
3    Inter-quartile range             Ordinal, Interval and Ratio scale
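The summary guide can be encoded as a lookup table so that the permissible statistics for a given scale can be listed automatically. This is a hypothetical sketch (`APPROPRIATE` and `suitable_statistics` are illustrative names); the mode is included for ordinal data because it only requires counting how often each category occurs.

```python
# Lookup table mirroring the summary guide; "Mode" is also listed for ordinal
# data because it only requires counting how often each category occurs.
APPROPRIATE = {
    "display": {
        "Histogram": {"Interval", "Ratio"},
        "Frequency polygon": {"Interval", "Ratio"},
        "Bar chart": {"Nominal", "Ordinal"},
        "Pie chart": {"Nominal"},
    },
    "central tendency": {
        "Mean": {"Interval", "Ratio"},
        "Median": {"Ordinal", "Interval", "Ratio"},
        "Mode": {"Nominal", "Ordinal", "Interval", "Ratio"},
    },
    "dispersion": {
        "Standard deviation": {"Interval", "Ratio"},
        "Range": {"Ordinal", "Interval", "Ratio"},
        "Inter-quartile range": {"Ordinal", "Interval", "Ratio"},
    },
}


def suitable_statistics(scale):
    """Return, per purpose, the descriptive statistics suitable for a measurement scale."""
    return {
        purpose: [name for name, scales in options.items() if scale in scales]
        for purpose, options in APPROPRIATE.items()
    }


print(suitable_statistics("Ordinal"))
```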
Measures of Association: Bivariate Analysis
Bivariate analysis considers the properties of two variables in relation to each other.
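One common measure of association between two metric (interval/ratio) variables is Pearson's correlation coefficient r; ordinal data would instead call for a rank-based measure such as Spearman's rho. The sketch below computes r from its definition; the advertising and sales figures are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two metric (interval/ratio) variables."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: advertising spending and sales for five branches
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(round(pearson_r(advertising, sales), 3))  # 0.775
```

A value near +1 or -1 indicates a strong linear association; a value near 0 indicates little linear association.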
[Figure: Flowchart for choosing a multivariate technique. It asks whether the independent variables are metric or non-metric; for interdependence methods, metric (ratio/interval) data lead to factor analysis, cluster analysis and multidimensional scaling (MDS), and non-metric (nominal/ordinal) data lead to non-metric multidimensional scaling.]
2. Multiple discriminant analysis: It is a classificatory technique that aims to place a given observation in one of several nominal categories based on a linear combination of predictor variables.
3. Logit: It is suitable for assessing the influence of independent variables on a dependent variable measured on a nominal scale.
4. Structural equation modeling (SEM): SEM enables the researcher to test structural (regression) relationships between factors (i.e., between latent variables).
5. Multivariate analysis of variance (MANOVA): It is an extension of bivariate analysis of variance in which the ratio of among-groups variance to within-groups variance is calculated on a set of variables instead of a single variable.
6. Canonical correlation: It can be used with both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables.
7. Conjoint analysis: This technique is considered appropriate when several non-metric dependent variables are involved in a research study along with many metric or non-metric explanatory variables.
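To illustrate how a logit model links independent variables to a nominal (here binary) dependent variable, the sketch below converts a linear combination of predictors into a probability with the logistic function. The coefficients are hypothetical, not estimated; in practice they are estimated by maximum likelihood in a package such as SPSS or STATA.

```python
import math


def logit_probability(intercept, coefficients, predictors):
    """P(Y = 1) from a logit model: 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-z))


# Hypothetical fitted model: does a firm export (1) or not (0)?
# Predictors: firm size in hundreds of employees, ISO certification (1 = yes, 0 = no)
b0 = -3.0
b = [0.5, 1.2]

p = logit_probability(b0, b, [4, 1])
print(round(p, 3))                # predicted probability of exporting
print(round(math.exp(b[1]), 2))   # odds ratio for certification, holding size constant
```

Exponentiating a coefficient gives an odds ratio, which is the usual way of stating the influence of each independent variable.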
II. Interdependence Methods
8. Factor analysis: It is a data reduction technique that is used to statistically aggregate a large number of observed measures (items) into a smaller set of unobserved (latent) variables, called factors.
Inferential Statistics: Hypothesis Testing
Inferential statistics, or hypothesis testing, involves using data collected in a sample to make statements (inferences) about unknown population parameters.
Hypothesis testing helps to assess the evidence provided by the data in favor of some claim about the population.
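2nd. Specification of the significance level, i.e. the risk the researcher is willing to take of rejecting a null hypothesis that is actually true (commonly 0.05); it shows how safe it is to accept or reject the hypothesis.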
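Choice of statistical test by measurement scale and sample case:
Nominal
- One sample: Binomial test; Chi-square one-sample test
- Two related samples: McNemar test
- Two independent samples: Fisher exact test; Chi-square two-sample test
- K related samples: Cochran Q
- K independent samples: Chi-square for k samples
Ordinal
- One sample: Kolmogorov-Smirnov one-sample test; Runs test
- Two related samples: Sign test; Wilcoxon matched-pairs test
- Two independent samples: Median test; Mann-Whitney U test; Kolmogorov-Smirnov test; Wald-Wolfowitz runs test
- K related samples: Friedman two-way ANOVA
- K independent samples: Median extension; Kruskal-Wallis one-way ANOVA
Interval and Ratio
- One sample: t-test; z-test
- Two related samples: t-test for paired samples
- Two independent samples: t-test; z-test
- K related samples: Repeated-measures ANOVA
- K independent samples: One-way ANOVA; n-way ANOVA
V) Calculation of the test statistic and acceptance or rejection of the hypothesis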
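Once the test statistic is calculated, the final task is to compare it with the critical value for the chosen significance level.
If the test statistic does not reach this value, the null hypothesis is retained (not rejected); if it reaches or exceeds it, the null hypothesis is rejected.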
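The calculation-and-decision step can be sketched for a one-sample t-test: compute t, then compare its absolute value with the tabulated two-tailed critical value. The sample, the hypothesized mean of 50 and the function names are hypothetical; 2.262 is the critical t for a 0.05 significance level with 9 degrees of freedom.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def decide(test_statistic, critical_value):
    """Two-tailed rule: reject H0 only if |test statistic| reaches the critical value."""
    if abs(test_statistic) >= critical_value:
        return "reject H0"
    return "retain H0"


# Hypothetical sample of 10 scores; H0: population mean = 50
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47, 56]
t = one_sample_t(sample, mu0=50)
# 2.262 is the tabulated two-tailed critical t for alpha = 0.05 and df = 9
print(round(t, 3), decide(t, critical_value=2.262))
```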
6.3. Qualitative Data Analysis
Qualitative analysis is the analysis of qualitative data such as
text data from interview transcripts.
Unlike quantitative analysis, which is statistics-driven and largely independent of the researcher, qualitative analysis is heavily dependent on the researcher’s analytic and integrative skills and on personal knowledge of the social context where the data are collected.
The emphasis in qualitative analysis is “sense making” or understanding
a phenomenon, rather than predicting or explaining.
Common approaches include thematic analysis (generic), grounded theory, content analysis, discourse analysis and narrative analysis.
I) Thematic Analysis
It is a data reduction and analysis strategy by which :-
Qualitative data are segmented, categorized, summarized, and reconstructed in a
way that captures the important concepts within the data set.
This involves coding the data and “clustering” the themes that emerge from the data.
It is primarily a descriptive strategy that facilitates the search for patterns
of experience within a qualitative data set; the product of a thematic
analysis is a description of those patterns
Data Display
Systematic analysis is advanced when codes are put into “data displays”, which reflect the researcher’s judgments about the data.
Many qualitative researchers use matrices or networks.
Matrices are two-dimensional arrangements of rows and columns that summarize a substantial amount of information.
Networks are maps and charts used to display data. They are made up of blocks (nodes) connected by links.
Such arrangements help researchers to:
“dimensionalize”, or recognize dimensions of similar thoughts;
connect codes in more sophisticated ways; and
document patterns in “user-friendly” ways (never rely on memory).
Explore Themes:
Why do some codes co-occur? Why are some dimensions related to other codes while others
are not?
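A simple matrix display for exploring co-occurrence can be produced by counting how often each pair of codes is applied to the same segment. The sketch below assumes hypothetical coded segments and code names; the counts only prompt the researcher's questions (why do these codes co-occur?) and do not replace interpretation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded interview segments: each set holds the codes applied to one segment
segments = [
    {"workload", "stress"},
    {"workload", "supervisor support"},
    {"stress", "turnover intention"},
    {"workload", "stress", "turnover intention"},
]


def co_occurrence(segments):
    """Count how many segments each pair of codes appears in together."""
    pairs = Counter()
    for codes in segments:
        for a, b in combinations(sorted(codes), 2):
            pairs[(a, b)] += 1
    return pairs


def co_occurrence_matrix(segments):
    """Symmetric code-by-code matrix (nested dict) for a simple data display."""
    codes = sorted(set().union(*segments))
    pairs = co_occurrence(segments)
    return {
        row: {col: pairs[tuple(sorted((row, col)))] if row != col else 0 for col in codes}
        for row in codes
    }


matrix = co_occurrence_matrix(segments)
for row, cells in matrix.items():
    print(f"{row:<20}", " ".join(f"{n:>3}" for n in cells.values()))
```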
Narratives
This approach uses a narrative passage to convey the findings of the analysis.
This might be a discussion that mentions a chronology of events, the detailed discussion of
several themes or a discussion with interconnecting themes.
Step 6. Interpretation or Meaning of the data-
A final step in data analysis involves making an interpretation or meaning of the data.
Asking “What were the lessons learned?” captures the essence of this idea.
These lessons could be the researcher’s personal interpretation, couched in the
understanding that the inquirer brings to the study from her or his own culture, history,
and experiences.
It could also be a meaning derived from comparing the findings with information gleaned from the literature or theories. In this way, the findings may confirm past information or diverge from it.
It can also suggest new questions that need to be asked—questions raised by the data and
analysis that the inquirer had not foreseen earlier in the study.
End !