
VI-Data Analysis: College of Business and Economics, Bahir Dar University, 2006 E.C.

The document outlines data analysis methods used in research, including data preparation, quantitative analysis, and qualitative analysis. It details steps for preparing quantitative and qualitative data, as well as various statistical techniques for analyzing data, such as measures of central tendency and dispersion. Additionally, it discusses bi-variate and multi-variate analysis methods to explore relationships between variables.


VI-Data Analysis

College of Business and Economics


Bahir Dar University, 2006 E.C.
OUTLINE

 Data Preparation

 Quantitative Data Analysis

 Qualitative Data Analysis


Introduction
 The analysis of data represents the application of deductive and inductive logic to the research process.

 The data are often classified by division into subgroups, and are then analyzed and synthesized in such a way that hypotheses may be verified or rejected.

=> Quantitative Data Analysis

 The final result may be a new principle or generalization.

=> Qualitative Data Analysis


6.1. Data Preparation
 In research projects, data may be collected from a variety of sources: questionnaires/surveys, interviews, observational data, and so forth. => They need to be prepared before analysis.

 Data preparation in Quantitative Data Analysis usually follows these steps:
 Data Editing
 Data Coding
 Data Entry
 Dealing with Missing Values

 Data preparation in Qualitative Data Analysis usually follows these steps:
 Transcribing interviews and other audio/tape-recorded interactions
 Typing field notes
 Preparing electronic textual and optically scanned materials
I. DATA PREPARATION IN QUANTITATIVE DATA ANALYSIS

Step 1 - Editing
 It is the process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct them where possible.
 It is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding.
Step 2 - Data Coding
 Coding is the process of converting data into numeric format.
 A codebook should be created to guide the coding process, containing a detailed description of:
 Each variable in the research study,
 The items or measures for that variable,
 The format of each item (numeric, etc.),
 The response scale for each item (i.e., whether it is measured on a nominal, ordinal, interval, or ratio scale), and how to code each value into a numeric format.
Step 3 - Data Entry
 Coded data can be entered into a spreadsheet, database, text file, or directly into a statistical program such as SPSS or STATA.
 The entered data should be checked frequently for accuracy, via occasional spot checks on a set of items or observations, during and after entry.
Step 4 - Dealing with Missing Values
 Missing data is an inevitable part of any empirical data set. => Respondents may not answer certain questions if they are
II. DATA PREPARATION IN QUALITATIVE DATA ANALYSIS

Step 1 - What do we do with audio recordings? (Transcribing)
 Mostly, interviews and focus groups are audio-recorded whenever possible.
 Preparing these recorded data for analysis requires transcribing => that is, each recording should be reproduced as a written (word-processed) account using the actual words.
 To transcribe an audio recording, the transcriptionist listens to the tape and simultaneously writes down everything that is said on it.
 Nonverbal sounds (such as laughter or someone knocking on the door) are also often noted in the transcript.
 Before beginning the analysis, it is vital to check the transcripts for accuracy, and also to ensure that participants' anonymity is preserved by removing identifying information.
Step 2 - Typing field notes
 Participant observers, focus group facilitators, and interviewers take handwritten notes to document a wide range of information, e.g., paraphrases of participant responses.
 For focus groups and interviews, after transcribing all relevant recordings, the transcriptionist types up the interviewer's or focus group moderator's corresponding handwritten field notes.
 These typed field notes can either be appended to the transcript or kept in a separate file.
 Typed field notes provide contextual information that can enhance the researchers' understanding of the transcript.
Step 3 - Preparing electronic materials
 Textual data, such as email interviews or electronic versions of documents that have been captured electronically, need to be prepared for analysis.
6.2. Quantitative Analysis
Data collected in a research project can be analyzed quantitatively using statistical tools in different ways:

 Descriptive Analysis refers to a set of statistical techniques that help present/illustrate the sample data and are used to describe sample characteristics:
 Tabulating the data (frequencies) and graphs
 Measures of central tendency
 Measures of dispersion

 Measures of Association refer to statistically examining associations between the constructs/variables:
 Bi-variate analysis
 Multivariate analysis

 Inferential Analysis refers to the statistical testing of hypotheses.

Recall! - Categorization of Data
All statistical tests depend on the data type, or level of measurement:

Does the variable consist of values expressed as numbers or categories?
 Categories => Can the categories be meaningfully rank ordered in some way?
  No => Nominal
  Yes => Ordinal
 Numbers (continuous/scale variable) => Has the scale variable got a true and absolute zero?
  No => Interval
  Yes => Ratio
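The decision tree above can be sketched as a small function; the two boolean arguments mirror the two questions in the tree:

```python
def measurement_scale(is_numeric, rank_ordered=False, true_zero=False):
    """Follow the decision tree: categories -> nominal/ordinal;
    numbers -> interval/ratio depending on a true, absolute zero."""
    if not is_numeric:
        return "ordinal" if rank_ordered else "nominal"
    return "ratio" if true_zero else "interval"

print(measurement_scale(False))                      # nominal
print(measurement_scale(False, rank_ordered=True))   # ordinal
print(measurement_scale(True))                       # interval
print(measurement_scale(True, true_zero=True))       # ratio
```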
Tabulating Data (Frequency Distributions) and Graphs
The frequency distribution of a variable is a summary of the frequency (or percentage) of individual values, or ranges of values, for that variable.
 A frequency distribution is usually presented as a table that shows the values of a variable expressed as a number and as a percentage of the total cases (relative frequency).
Frequency distributions can also be depicted in graphs; the appropriate graph depends on the variable type.
Histogram
 It presents the counts of observations (frequencies) of a continuous variable (interval/ratio).
 It plots a single continuous variable (x-axis) against the frequency of scores (y-axis).
Frequency Polygon
 It is a line that connects the tops of the bars of a histogram to provide a pure shape illustrating the distribution.
 It displays the same data as the simple histogram, except that it uses a line instead of bars to show the frequency, and the area below the line is shaded.
Pie Chart
 It presents the frequencies for a categorical (nominal) variable.
 It shows the values of a variable as sections of the total cases (like slices of a pie).

Tabulating Data (Frequency Distributions) and Graphs - Histogram (Continuous Data - Interval Data)

Firm size ('000)   Frequency   Relative Frequency   Cumulative Rel. Frequency
20-39              1           0.028                0.028
40-59              4           0.111                0.139
60-79              7           0.194                0.333
80-99              8           0.222                0.555
100-119            8           0.222                0.777
120-139            3           0.083                0.860
140-159            2           0.056                0.916
160-179            1           0.028                0.944
180-199            0           0.000                0.944
200-219            2           0.056                1.000
Total              36          1.000

[Histogram: firm size ('000), 20-220 on the x-axis, plotted against frequency on the y-axis]

Tabulating Data (Frequency Distributions) and Graphs - Frequency Polygon (Continuous Data - Interval Data)

[Frequency polygon drawn from the same firm-size frequency distribution as in the histogram table above]
Tabulating Data (Frequency Distributions) and Graphs - Pie Chart (Nominal Data)

Religion      Frequency   Percent
Orthodox      24          40
Muslims       18          30
Protestants   12          20
Others        6           10
Tabulating Data (Frequency Distributions) and Graphs - Bar Chart (Nominal Data)

Department Frequency Percent

Retail 65 42

Warehouse 60 39

Accounts 20 10

Personnel 25 9
Measures of Central Tendency (Location)
 Measures of central tendency (location) indicate where on the number line the data are to be found.

 Common measures of central tendency (location) are:
 (i) the Arithmetic Mean,
 (ii) the Median, and
 (iii) the Mode
Measures of Central Tendency (Location)

1. Arithmetic Mean (often simply called the 'mean')
 This is the arithmetic average, calculated by adding all the values and dividing by their number.
 For a sample of size n, the mean is: x̄ = (x1 + x2 + … + xn) / n = (Σ xi) / n, with the sum running from i = 1 to n.
2. Median
 In an ordered array, the median is the "middle" number (50% above, 50% below).
 Hence, with the sample data arranged in increasing order:
 If there is an odd number of observations, the median is the numerical value corresponding to the middle value.
 If there is an even number of observations, the middle lies between two observations, and the median is the average of those two values.
 It is not affected by extreme values.
3. Mode
 It represents the value that occurs most often.
 It is not affected by extreme values.

Measures of Dispersion
 Measures of Dispersion (also called Measures of
Variability or Spread) characterise how spread out
the distribution is, i.e., how variable the data are.

 Commonly used measures of dispersion include:


1. Range
2. Variance & Standard deviation
3. Inter-quartile range
Measures of Dispersion

1. Range
 The simplest measure of variation.
 It is the difference between the largest and the smallest values: Range = Xlargest - Xsmallest.
 Easy to calculate; useful for "best" or "worst" case scenarios; sensitive to extreme values.
2. Variance and Standard Deviation
 Variance: the sample variance is roughly the average of the squared differences between each observation in a set of data and the mean: s² = Σ (xi - x̄)² / (n - 1), with the sum running from i = 1 to n.
 Standard deviation: the most commonly used measure of variation; it shows variation about the mean and is the square root of the variance: s = √[ Σ (xi - x̄)² / (n - 1) ].
3. The inter-quartile range (IQR)
 It is the distance between the value that has a quarter of the values below it (first quartile, Q1, or 25th percentile) and the value that has three-quarters of the values below it (third quartile, Q3, or 75th percentile).
 i.e., IQR = Q3 - Q1, which measures the spread of the middle 50% of the data.
 The IQR is also called the mid-spread because it covers the middle 50% of the data.
 The IQR is a measure of variability that is not influenced by outliers or extreme values.
…cont’d
Summary guide to the appropriate use of descriptive statistics:

A) Appropriate ways to display data
1. Histogram - Ratio or Interval scale
2. Frequency Polygon - Ratio or Interval scale
3. Bar Chart and Frequency Table - Ordinal or Nominal scale
4. Pie Chart and Frequency Table - Nominal scale
B) Measures of central tendency
1. Mean - Ratio or Interval scale
2. Median - Ordinal, Interval and Ratio scale
3. Mode - Nominal, Interval and Ratio scale
C) Measures of dispersion
1. Standard deviation - Ratio or Interval scale
2. Range - Ordinal, Interval and Ratio scale
3. Inter-quartile range - Ordinal, Interval and Ratio scale
Measures of Association - Bi-variate Analysis
 Bi-variate analysis considers the properties of two variables in relation to each other.
 The choice of appropriate statistical methods of bi-variate analysis depends on the levels of measurement used in the variables:

Nominal scale
 Chi-square: used to test the null hypothesis of statistical independence, or to determine significant differences between the observed frequencies in the data and the frequencies that were expected.
 Phi (Φ): used to determine relationships between two dichotomous (e.g., yes/no) nominal variables.
 Cramer's V: a modification of Phi for contingency tables larger than 2 x 2; used when both variables are nominal; takes positive values.
 Contingency coefficient: used with two nominal variables; the most commonly used of the chi-square-based measures of association.
Ordinal scale
 Kendall's Tau: determines correlation when both variables are at the ordinal level.
 Spearman's rank-order correlation: correlation for ranked data, used to examine relationships among variables in a study.
Interval and Ratio scale
 Pearson: used to determine the relationship between variables (values range from -1 to +1).
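As an illustration of the chi-square measure for nominal data, the statistic can be computed by hand from a contingency table; the 2 x 2 counts below are hypothetical:

```python
# Hand-rolled chi-square statistic for a contingency table:
# sum over cells of (observed - expected)^2 / expected, where the
# expected count is (row total * column total) / grand total.
def chi_square(table):
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            stat += (obs - exp) ** 2 / exp
    return stat

observed = [[20, 30],   # e.g. yes/no answers split across two groups
            [30, 20]]
print(round(chi_square(observed), 2))  # 4.0
```

The statistic is then compared against the chi-square distribution (here with 1 degree of freedom) to decide whether to reject independence.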
Measures of Association - Multivariate Analysis
 Multivariate analysis looks at the relationships among more than two variables.
 It is defined as those statistical techniques which focus upon, and bring out in bold relief, the structure of simultaneous relationships among three or more variables.

 Today there exists a great variety of multivariate techniques => the technique to be used in a given situation depends upon the answers to these questions:

 Are some of the involved variables dependent upon others?
 If the answer is 'yes', we have dependence methods; if the answer is 'no', we have interdependence methods.
 If some variables are dependent, how many variables are dependent?
 Are the data metric (interval or ratio scale) or non-metric (nominal or ordinal scale)?

 The selection of multivariate techniques based on these questions is shown in the figure below.
Are there dependent variables (DVs) in the problem?
 No => interdependence methods. Are the variables metric or non-metric?
  Metric => Factor Analysis; Cluster Analysis; Multidimensional Scaling (MDS)
  Non-metric => Non-metric Multidimensional Scaling (MDS)
 Yes => dependence methods. Is there more than one DV?
  One DV. Is the DV metric or non-metric?
   Metric DV (ratio/interval). Are the independent variables metric or non-metric?
    Metric IVs => Multiple Regression Analysis
    Non-metric IVs => Multiple Regression Analysis with dummy variables; Conjoint Analysis
   Non-metric DV (nominal/ordinal). Are the independent variables metric or non-metric?
    Metric IVs => Logit; Discriminant Analysis
    Non-metric IVs => Discriminant Analysis with dummy variables
  More than one DV. Are the DVs metric or non-metric?
   Metric DVs => Multivariate Analysis of Variance (MANOVA); Canonical Analysis
   Non-metric DVs => Structural Equation Modeling (SEM)
Summary of Multivariate Techniques

I. Dependence Methods
1. Multiple regression: its objective is to predict the variability of the dependent variable based on its covariance with all the independent variables.
2. Multiple discriminant analysis: a classificatory technique that aims to place a given observation in one of several nominal categories based on a linear combination of predictor variables.
3. Logit: suitable for assessing the influence of independent variables on a dependent variable measured on a nominal scale.
4. Structural Equation Modeling (SEM): enables the researcher to test structural (regression) relationships between factors (i.e., between latent variables).
5. Multivariate analysis of variance (MANOVA): an extension of bi-variate analysis of variance in which the ratio of among-groups variance to within-groups variance is calculated on a set of variables instead of a single variable.
6. Canonical correlation: can be used with both measurable and non-measurable variables for the purpose of simultaneously predicting a set of dependent variables from their joint covariance with a set of independent variables.
7. Conjoint analysis: considered appropriate when several non-metric dependent variables are involved in a research study along with many metric/non-metric explanatory variables.

II. Interdependence Methods
8. Factor analysis: a data reduction technique that is used to statistically aggregate a large number of observed measures (items) into a smaller set of unobserved (latent) factors.
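As a sketch of technique 1 (multiple regression), the following pure-Python ordinary least squares estimator solves the normal equations; the tiny data set is made up so that the coefficients are known exactly:

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved with Gaussian elimination. Each row of X starts with a 1
    for the intercept."""
    k = len(X[0])
    # Build X'X and X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(k)]
    # Forward elimination with partial pivoting
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    # Back substitution
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(A[i][c] * coef[c] for c in range(i + 1, k))) / A[i][i]
    return coef

# y = 1 + 2*x1 + 3*x2 exactly, so OLS should recover [1, 2, 3]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * r[1] + 3 * r[2] for r in X]
print([round(c, 6) for c in ols(X, y)])  # [1.0, 2.0, 3.0]
```

In practice one would use a statistical package (SPSS, STATA) rather than solving the normal equations by hand, but the sketch shows what "predicting the DV from its covariance with all the IVs" computes.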
Inferential Statistics: Hypothesis Testing
 Inferential statistics, or hypothesis testing, involves using data collected in a sample to make statements (inferences) about unknown population parameters.
 Hypothesis testing helps to assess the evidence provided by the data in favor of some claim about the population.

 Stages in hypothesis testing:
 1st. Hypothesis formulation.
 2nd. Specification of the significance level (to see how safe it is to accept or reject the hypothesis).
 3rd. Identification of the probability distribution and definition of the region of rejection.
 4th. Selection of appropriate statistical tests.
 5th. Calculation of the test statistic and acceptance or rejection of the hypothesis.

…..Cont’d
 I) Hypothesis Formulation
 A hypothesis is a statement concerning a population (or populations) that may or may not be true, and constitutes an inference (or inferences) about a population, drawn from sample information.
 Hypotheses come in essentially three forms, those that:
 Examine the characteristics of a single population (and may involve calculating the mean and standard deviation, and the shape of the distribution);
 Explore contrasts and comparisons between groups;
 Examine associations and relationships between groups.
 A hypothesis may be formulated as the null hypothesis or the alternative hypothesis:
 Null Hypothesis (H0): a hypothesis which states that there is no difference between the procedures.
 Alternative Hypothesis (HA): a hypothesis which states that there is a difference between the procedures.

 II) Specification of significance level
 Having formulated the null hypothesis, we must next decide on the circumstances in which it will be accepted or rejected.
 Since there is no such thing as absolute certainty (especially in the real world!), there is always a chance of rejecting the null hypothesis when in fact it is true (called a Type I error), and of accepting it when it is in fact false (a Type II error).
 What are the chances of making a Type I error? This is measured by what is called the significance level, which measures the probability of making such a mistake.
 The significance level is always set before a test is carried out, and is traditionally set at 0.05, 0.01, or 0.001.
 Thus, if we set our significance level at 5 per cent (p = 0.05), we are willing to take the risk of rejecting the null hypothesis when in fact it is correct 5 times out of 100.
…..Cont’d
 III) Identification of the probability distribution and rejection region
 All statistical tests are based on an area of acceptance and an area of rejection.
 Rejection region: the part of the sample space (critical region) where the null hypothesis H0 is rejected.
 For example, for the z distribution with p = 0.05 and a two-tailed test, statistical tables show that the area of acceptance for the null hypothesis is the central 95 per cent of the distribution, and the areas of rejection are the 2.5 per cent in each tail (see Figure below).

 IV) Selection of appropriate statistical tests
 The type of statistical test to be used will depend on a range of factors:
 Type of hypothesis: tests of hypotheses can involve one sample, or two or k (more than two) samples.
 If two samples or k samples are involved, are the individual cases independent or related?
 Assumptions about the distribution of the populations, and the level of measurement of the variables in the hypothesis: different tests are appropriate for nominal, ordinal, interval, and ratio data.
…..Cont’d
=> The table below summarizes the statistical tests available in the variety of circumstances just described.

Nominal scale
 One sample: Binomial test; Chi-square one-sample test
 Two related samples: McNemar test
 Two independent samples: Fisher exact test; Chi-square two-sample test
 K related samples: Cochran Q
 K independent samples: Chi-square for k samples
Ordinal scale
 One sample: Kolmogorov-Smirnov one-sample test; Runs test
 Two related samples: Sign test; Wilcoxon matched-pairs test
 Two independent samples: Median test; Mann-Whitney U test; Kolmogorov-Smirnov test; Wald-Wolfowitz runs test
 K related samples: Friedman two-way ANOVA
 K independent samples: Median extension; Kruskal-Wallis one-way ANOVA
Interval and Ratio scales
 One sample: t-test; z-test
 Two related samples: t-test for paired samples
 Two independent samples: t-test; z-test
 K related samples: Repeated-measures ANOVA
 K independent samples: One-way ANOVA; n-way ANOVA

 V) Calculation of the test statistic and acceptance or rejection of the hypothesis
 Once the test statistic is calculated, the final task is to compare it with the hypothesized (critical) value.
 If the test statistic does not reach this value, then the null hypothesis must be accepted.
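The z-test listed for interval/ratio one-sample problems can be illustrated with a minimal implementation using only the standard library; the sample values and null mean mu0 = 50 below are hypothetical:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z(data, mu0):
    """One-sample z test of H0: population mean = mu0 (two-tailed).
    Uses the sample stdev in place of sigma, so this is a
    large-sample approximation."""
    z = (mean(data) - mu0) / (stdev(data) / sqrt(len(data)))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
    return z, p

sample = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical measurements
z, p = one_sample_z(sample, mu0=50)
print(round(z, 2), round(p, 3))  # 1.73 0.083 -> fail to reject H0 at 0.05
```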
6.3.Qualitative Data
Analysis
Qualitative analysis is the analysis of qualitative data, such as text data from interview transcripts.

 Unlike quantitative analysis, which is statistics-driven and largely independent of the researcher, qualitative analysis is heavily dependent on:
 The researcher's analytic and integrative skills, and
 The researcher's personal knowledge of the social context where the data are collected.

 The emphasis in qualitative analysis is "sense making", or understanding a phenomenon, rather than predicting or explaining.

 Of the various methods of qualitative data analysis, the most common techniques include: Thematic Analysis (generic), Grounded Theory, Content Analysis, Discourse Analysis, and Narrative Analysis.
…..Cont’d
 I) Thematic Analysis
 It is a data reduction and analysis strategy by which:
 Qualitative data are segmented, categorized, summarized, and reconstructed in a way that captures the important concepts within the data set.
 This involves coding the data and "clustering" themes that emerge from the data.
 It is primarily a descriptive strategy that facilitates the search for patterns of experience within a qualitative data set; the product of a thematic analysis is a description of those patterns.

 II) Grounded Theory
 Aims to derive theory from systematic analysis of data.
 It is based on a categorization approach (called 'coding' here).
 Three levels of 'coding':
 - Open coding: identifying, uncovering, and naming concepts that are hidden within the data, and then identifying categories by grouping similar concepts.
 - Axial coding: fleshing out categories and linking them to subcategories => selecting one of the categories and positioning it within a theoretical model.
 - Selective coding: forming a theoretical scheme => explicating a story from the interconnection of these categories.
…..Cont’d
 III) Content Analysis
 It is the systematic analysis of the content of a text, typically conducted as follows:
 1st. Select a sample of texts from the population of texts for analysis.
 This process is not random; instead, texts that have more pertinent content should be chosen selectively.
 2nd. Identify and apply rules to divide each text into segments that can be treated as separate units of analysis. This process is called unitizing.
 3rd. Construct and apply one or more concepts to each unitized text segment, in a process called coding.
 4th. Finally, the coded data are analyzed to determine which themes occur most frequently, in what contexts, and how they are related to each other.
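The final step, counting which coded themes occur most frequently and which co-occur, can be sketched with Python's standard library; the segments and codes below are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded segments: each unitized text segment carries the
# set of concept codes assigned to it in the coding step.
coded_segments = [
    {"cost", "quality"}, {"quality"}, {"delivery", "cost"},
    {"quality", "delivery"}, {"cost"},
]

# Frequency of each theme across all segments.
theme_counts = Counter(code for seg in coded_segments for code in seg)

# How often pairs of themes appear together in the same segment.
pairs = Counter(
    frozenset(p) for seg in coded_segments for p in combinations(sorted(seg), 2)
)
print(theme_counts.most_common())
```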

 IV) Discourse Analysis



The focus of discourse analysis is on how both spoken and written language is used
in social contexts.

Attention is given to the structure and organization of language with an emphasis on
how participants’ versions of events are constructed.
 V) Narrative Analysis
 The analyst focuses on how respondents impose order on the flow of experience in their lives and thus make sense of events and actions in which they have participated.
 It focuses on "the story itself" and seeks to preserve the integrity of personal narratives.
…..Cont’d
 Despite these analytic differences depending on the type of qualitative strategy used, qualitative inquirers often follow a general procedure and convey in the proposal the steps in data analysis.
 An overview of the qualitative data analysis process is shown in the figure below.
Fig 6.3. Process of qualitative data analysis
…..Cont’d
 Step 1. Organize and Prepare the data for analysis
 This involves transcribing interviews, optically scanning material, typing up field notes, or sorting and arranging the data into different types depending on the sources of information.
 Step 2. Read through all the data
 Most qualitative researchers would agree that qualitative analysis begins with data immersion.
 This means reading and rereading each set of notes or transcripts until you are intimately familiar with the content.
 This is done to obtain a general sense of the information and to reflect on its overall meaning.
 What general ideas are the participants expressing? What is the tone of the ideas? What is the impression of the overall depth, credibility, and use of the information?
 Step 3. Coding Data
 Coding is the process of identifying categories and meanings in text data, pictures, or images gathered during data collection, and labeling those categories with a term.
 Coding permits systematic retrieval of categories and meanings during analysis.
 Codes help researchers identify patterns in data.
 Step 4. Generate Descriptions/Themes
 Use the coding process to generate a description of the setting or people, as well as categories or themes for analysis.
 Description involves a detailed rendering of information about people, places, or events in a setting.
 Themes involve using the coding to generate a small number of themes or categories, perhaps five to seven for a research study.
…..Cont’d
 Step 5. Interrelating Themes/Descriptions and Display

Advance how the description and themes will be represented in the findings
qualitative study based on:


Data Display

Systematic analysis is advanced when codes are put into “data displays” which reflect the
researcher’s judgments about the data

Many qualitative researchers mostly use matrices or network
 Matrices- are two-dimensional arrangements of rows and columns that summarize a
substantial amount of information.
 Network are maps and charts used to display data. They are made up of blocks
(nodes) connected by links.

Such arrangements help researchers to :
 “Dimensionalize,” or recognize dimensions of similar thoughts , Connect codes in more
sophisticated ways and Document patterns in “user-friendly” ways (never rely on memory)
 Explore Themes:

Why do some codes co-occur? Why are some dimensions related to other codes while others
are not?

 Narratives
 This approach uses a narrative passage to convey the findings of the analysis.
 This might be a discussion that mentions a chronology of events, a detailed discussion of several themes, or a discussion with interconnecting themes.
…..Cont’d
 Step 6. Interpretation or Meaning of the data-
 A final step in data analysis involves making an interpretation or meaning of the data.

 Asking, “What were the lessons learned?” captures the essence of this idea


These lessons could be the researcher’s personal interpretation, couched in the
understanding that the inquirer brings to the study from her or his own culture, history,
and experiences.

 It could also be a meaning derived from a comparison of the findings with information gleaned from the literature or from theories. In this way:
 The findings confirm past information or diverge from it.
 They can also suggest new questions that need to be asked: questions raised by the data and analysis that the inquirer had not foreseen earlier in the study.
End !
