
Data Analysis

Uploaded by Usama Mushtaq
Copyright © All Rights Reserved

Analyzing Data

Data Analysis
 The process through which data are organized so that
comparisons can be made and conclusions drawn
Presenting Your Data
 Describe your methods – tell your audience exactly
what it is that you did to come up with your results.
 Be descriptive and analytical (so that your conclusions are accepted)
 Use quotes liberally (as evidence, so readers can see them for themselves)
 Consider tables, charts, figures, models & diagrams
 Contextualize – where do these data come from, and to whom do they apply (particularity, applicability)?
 Speak with confidence about your data
Data Analytic Strategies
 Recursive analytic strategies:
analyze cases → generate findings → draw conclusions from grounded theory → write report
Analyzing Survey Data
Do you want to report…
 how many people answered a, b, c, d?
 the average number or score?
 a change in score between two points in time?
 how people compared?
 how many people reached a certain level?
Common descriptive statistics
 Count (frequencies)
 Percentage
 Mean
 Mode
 Median
 Range
 Standard deviation
 Variance
 Ranking
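As a sketch of how these statistics can be computed, the Python snippet below uses only the standard library `statistics` module on the five ages (Q2) and smoking answers (Q1) from the data entry example in this deck. It assumes sample formulas (dividing by n − 1) for variance and standard deviation; a population formula would divide by n instead.

```python
import statistics

# Ages (Q2) and smoking answers (Q1: 1 = YES, 2 = NO) from the data entry example
ages = [24, 18, 36, 48, 26]
smokes = [1, 1, 2, 2, 1]

def describe(values):
    """Return common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

def rank(values):
    """Rank each value from smallest (1) to largest; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

stats = describe(ages)
smoker_pct = 100 * smokes.count(1) / len(smokes)  # percentage answering YES
q1_mode = statistics.mode(smokes)                 # most frequent Q1 code

print(stats)
print(f"Smokers: {smoker_pct:.0f}%, mode of Q1: {q1_mode}")
print("Age ranks:", rank(ages))  # Age ranks: [2, 1, 4, 5, 3]
```

For these ages the mean is 30.4, the median 26, and the range 30.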
Getting your data ready
 Assign a unique identifier
 Organize and keep all forms (questionnaires,
interviews, testimonials)
 Check for completeness and accuracy
 Remove those that are incomplete or do not make
sense
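These steps can be sketched in Python, assuming each returned form has been typed in as a dictionary. The field names (`q1_smoke`, `q2_age`, `q3_support`), the validity rules, and the forms themselves are hypothetical: each form receives a zero-padded unique identifier, and forms that are incomplete or contain impossible values are set aside rather than analyzed.

```python
# Hypothetical raw forms; None marks an unanswered question
raw_forms = [
    {"q1_smoke": 1, "q2_age": 24, "q3_support": 2},
    {"q1_smoke": 1, "q2_age": None, "q3_support": 2},  # incomplete
    {"q1_smoke": 2, "q2_age": 36, "q3_support": 1},
    {"q1_smoke": 7, "q2_age": 48, "q3_support": 1},    # 7 is not a valid Q1 code
]

# Assumed validity rules for each question
VALID = {
    "q1_smoke": lambda v: v in (1, 2),
    "q2_age": lambda v: 0 < v < 120,
    "q3_support": lambda v: v in (1, 2, 3),
}

def prepare(forms):
    """Assign unique IDs, then split forms into usable and rejected lists."""
    kept, rejected = [], []
    for number, form in enumerate(forms, start=1):
        record = {"id": f"{number:03d}", **form}  # unique identifier: 001, 002, ...
        ok = all(record[q] is not None and check(record[q]) for q, check in VALID.items())
        (kept if ok else rejected).append(record)
    return kept, rejected

kept, rejected = prepare(raw_forms)
print([r["id"] for r in kept])      # ['001', '003']
print([r["id"] for r in rejected])  # ['002', '004']
```

Keeping the rejected forms (instead of discarding them) lets you go back to the paper originals later.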
Data entry computer screen
Smoking: 1 (YES) 2 (NO)

Survey ID   Q1 Do you smoke?   Q2 Age   Q3 Support ordinance?
001         1                  24       2
002         1                  18       2
003         2                  36       1
004         2                  48       1
005         1                  26       1
Respondents | Supports restaurant ordinance | Opposes restaurant ordinance | Undecided/declined to comment
Current smokers (n=55) | 8 (15% of smokers) | 33 (60% of smokers) | 14 (25% of smokers)
Non-smokers (n=200) | 170 (86% of non-smokers) | 16 (8% of non-smokers) | 12 (6% of non-smokers)
Total (N=255) | 178 (70% of all respondents) | 49 (19% of all respondents) | 26 (11% of all respondents)
Summarizing data
 Tables
 Simplest way to summarize data
 Data are presented as absolute numbers or percentages
 Charts and graphs
 Visual representation of data
 Data are presented as absolute numbers or percentages
Basic guidance when summarizing data
 Ensure graphic has a title
 Label the components of your graphic
 Indicate source of data with date
 Provide number of observations (n=xx) as a reference
point
 Add footnote if more information is needed
Tables: Frequency distribution

Set of categories with numerical counts

Year   Number of births
1900   61
1901   58
1902   75
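A frequency distribution is a tally of how many cases fall into each category. As a minimal sketch, Python's `collections.Counter` can build one from raw responses; the example tallies the Q3 codes (support for the ordinance) from the data entry example, assuming they are stored as integers.

```python
from collections import Counter

# Q3 responses from the data entry example (codes as entered)
q3_support = [2, 2, 1, 1, 1]

def frequency_table(values):
    """Return (category, count) pairs sorted by category."""
    return sorted(Counter(values).items())

for code, count in frequency_table(q3_support):
    print(code, count)
# 1 3
# 2 2
```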
Tables: Relative frequency

$$\text{Relative frequency (\%)} = \frac{\text{number of values within an interval}}{\text{total number of values in the table}} \times 100$$

Year        # births (n)   Relative frequency (%)
1900–1909   35             27
1910–1919   46             34
1920–1929   51             39
Total       132            100.0
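The formula translates directly into code. A short Python sketch using the decade counts from this table: divide each interval's count by the table total and multiply by 100. It prints shares to one decimal place (26.5, 34.8, 38.6), which keeps the column summing to 100.0; rounding each share independently to a whole number can leave the total at 99 or 101.

```python
# Births per decade, from the table above
births = {"1900–1909": 35, "1910–1919": 46, "1920–1929": 51}

def relative_frequency(counts):
    """Percentage of all values that fall within each interval."""
    total = sum(counts.values())
    return {interval: 100 * n / total for interval, n in counts.items()}

rel = relative_frequency(births)
for interval, pct in rel.items():
    print(f"{interval}  {births[interval]:>3}  {pct:5.1f}")
print(f"{'Total':<9}  {sum(births.values()):>3}  {sum(rel.values()):5.1f}")
```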
Tables

Percentage of births by decade between 1900 and 1929

Year        Number of births (n)   Relative frequency (%)
1900–1909   35                     27
1910–1919   46                     34
1920–1929   51                     39
Total       132                    100.0

Source: U.S. Census data, 1900–1929.


Charts and graphs
 Charts and graphs are used to portray:
 Trends, relationships, and comparisons
 The most informative are simple and self-explanatory
Use the right type of graphic
 Charts and graphs
 Bar chart: comparisons, categories of data
 Line graph: display trends over time
 Pie chart: show percentages or proportional share
Bar chart
Comparing categories

[Figure: Bar chart, “Percentage of new enrollees tested for HIV at each site, by quarter”; x-axis: Quarter 1 to Quarter 4; y-axis: % of new enrollees tested for HIV; bars for Site 1, Site 2, and Site 3.]
Data Source: Program records, AIDS Relief, January 2009 – December 2009. Quarterly Country Summary: Nigeria, 2008
Has the program met its goal?
[Figure: Bar chart, “Percentage of new enrollees tested for HIV at each site, by quarter”; x-axis: Quarter 1 to Quarter 4; y-axis: % of new enrollees tested for HIV (0% to 60%); bars for Site 1, Site 2, and Site 3, plus a Target series.]
Data Source: Program records, AIDS Relief, January 2009 – December 2009. Quarterly Country Summary: Nigeria, 2008
Stacked bar chart
Represent components of whole & compare wholes
[Figure: Stacked bar chart, “Number of Months Female and Male Patients Have Been Enrolled in HIV Care, by Age Group”; bars for Females (0-14 years: 4; 15+ years: 10) and Males (0-14 years: 3; 15+ years: 6); x-axis: Number of months patients have been enrolled in HIV care (0 to 15).]
Data source: AIDSRelief program records January 2009 - 20011
Line graph
Displays trends over time
Number of Clinicians Working in Each Clinic During Years 1–4*

[Figure: Line graph; x-axis: Year 1 to Year 4; y-axis: Number of clinicians (0 to 5); one line each for Clinic 1, Clinic 2, and Clinic 3.]
*Includes doctors and nurses
Line graph
Number of Clinicians Working in Each Clinic During Years 1-4*

[Figure: Line graph; x-axis: Year 1 to Year 4; y-axis: Number of clinicians (0 to 5); one line each for Clinic 1, Clinic 2, and Clinic 3.]
Zambia Service Provision Assessment, 2007.
*Includes doctors and nurses
Pie chart
Contribution to the total = 100%

Percentage of All Patients Enrolled by Quarter


[Figure: Pie chart with one slice per quarter (1st Qtr to 4th Qtr); slice labels 59%, 23%, 10%, and 8%.]
N=150
Interpreting data
 Adding meaning to information by making connections
and comparisons and exploring causes and consequences
Interpretation – relevance of finding
 Does the indicator meet the target?
 How far from the target is it?
 How does it compare (to other time periods, other
facilities)?
 Are there any extreme highs and lows in the data?
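As a sketch of these questions in code: compare each quarter's value with the target, report the gap, and note the highest and lowest values at each site. The 50% target and the site figures below are hypothetical, not values read from the charts in this section.

```python
TARGET = 50.0  # hypothetical program target (% of new enrollees tested)

# Hypothetical quarterly percentages for two sites
site_values = {
    "Site 1": [42.0, 48.0, 55.0, 51.0],
    "Site 2": [30.0, 35.0, 33.0, 40.0],
}

def compare_to_target(values, target):
    """Return (quarter, value, meets_target, gap) for each quarter."""
    report = []
    for quarter, value in enumerate(values, start=1):
        gap = value - target  # negative means below target
        report.append((f"Q{quarter}", value, gap >= 0, gap))
    return report

for site, values in site_values.items():
    for quarter, value, met, gap in compare_to_target(values, TARGET):
        status = "meets target" if met else f"{-gap:.1f} points below target"
        print(f"{site} {quarter}: {value:.1f}% ({status})")
    print(f"{site}: lowest {min(values):.1f}%, highest {max(values):.1f}%")
```

Printing both sites side by side also answers the "how does it compare to other facilities?" question.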
Interpretation – possible causes?

• Supplement with expert opinion


• Others with knowledge of the program or target
population
Interpretation – consider other data
Use routine service data to clarify questions
• Calculate nurse-to-client ratio, review
commodities data against client load, etc.
Use other data sources
Interpretation – other data sources
 Situation analyses
 Demographic and health surveys
 Performance improvement data
Interpretation – conduct further research

 Data gap → conduct further research
 Methodology depends on the questions being asked and the resources available
Key messages
 Use the right graph for the right data
 Tables – can display a large amount of data
 Graphs/charts – visual, easier to detect patterns
 Label the components of your graphic
 Interpreting data adds meaning by making connections and comparisons to the program
 Service data are good at tracking progress & identifying concerns, but they do not show causality
Planning for Analysis

 Type of data
 Type of formatting
 Type of analysis
Type of Data & Formatting Technique
 Quantitative Data
 Must “quantify” the data
 Convert (“data reduce”) from collection format into numeric
database
 Qualitative Data
 Must process the data (type/enter/describe)
 Convert from audio/video to text
 Combination
 Process each element as appropriate
Type of Data & Analysis
 Quantitative Data
 Counts, frequencies, tallies
 Statistical analyses (as appropriate)
 Qualitative Data
 Coding
 Patterns, themes, theory building
 Combination
 Process each element as appropriate
Quantifying Data
 Before we can do any kind of analysis, we need to
quantify our data

 “Quantification” is the process of converting data to a numeric format
 Convert social science data into a “machine-readable” form, a
form that can be read & manipulated by computer programs
Quantifying Data

Some transformations are simple:


 Assign numeric representations to nominal or ordinal
variables:
 Turning male into “1” and female into “2”
 Assigning “3” to Very Interested, “2” to Somewhat
Interested, “1” to Not Interested
 Assign numeric values to continuous variables:
 Turning a birth year of 1973 into an age of “35”
 Number of children = “02”
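These transformations amount to lookup tables plus simple arithmetic. A minimal Python sketch, assuming a survey year of 2008 (the year implied by turning a 1973 birth year into 35) and a fixed two-digit field for number of children:

```python
# Numeric codes for nominal and ordinal variables (from the examples above)
GENDER_CODES = {"male": 1, "female": 2}
INTEREST_CODES = {"Not Interested": 1, "Somewhat Interested": 2, "Very Interested": 3}

SURVEY_YEAR = 2008  # assumption: the year the survey was fielded

def quantify(response):
    """Convert one raw response into numeric, machine-readable fields."""
    return {
        "gender": GENDER_CODES[response["gender"].lower()],
        "interest": INTEREST_CODES[response["interest"]],
        "age": SURVEY_YEAR - response["birth_year"],  # approximate: ignores birthday
        "children": f"{response['children']:02d}",    # fixed two-digit field
    }

row = quantify({"gender": "Female", "interest": "Very Interested",
                "birth_year": 1973, "children": 2})
print(row)  # {'gender': 2, 'interest': 3, 'age': 35, 'children': '02'}
```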
Developing Code Categories
Some data are more challenging. Open-ended responses
must be coded.

 Two basic approaches:


 Begin with a coding scheme derived from the research purpose.
 Generate codes from the data.
Coding Quantitative Data
 Goal – reduce a wide variety of information to a more
limited set of variable attributes:
 “What is your occupation?”
 Use pre-established scheme: Professional, Managerial, Clerical, Semi-
skilled, etc.
 Create a scheme after reviewing the data
 Assign value to each category in the scheme: Professional = 1,
Managerial = 2, etc.
 Classify the response: “Secretary” is “clerical” and is coded as “3”
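A sketch of the occupation example in Python. The keyword lookup from verbatim answers to categories is hypothetical, and so is Semi-skilled = 4 (it assumes the numbering simply continues). In practice a coder applies the codebook by hand; here, answers the lookup does not recognize are returned as `None` and set aside for review instead of being forced into a category.

```python
# Category scheme from the slide; Semi-skilled = 4 assumes numbering continues in order
CATEGORY_CODES = {"Professional": 1, "Managerial": 2, "Clerical": 3, "Semi-skilled": 4}

# Hypothetical lookup from verbatim answers to categories
OCCUPATION_TO_CATEGORY = {
    "lawyer": "Professional",
    "engineer": "Professional",
    "office manager": "Managerial",
    "secretary": "Clerical",
    "receptionist": "Clerical",
    "machine operator": "Semi-skilled",
}

def code_occupation(answer):
    """Return the numeric code for an open-ended answer, or None if it needs review."""
    category = OCCUPATION_TO_CATEGORY.get(answer.strip().lower())
    return CATEGORY_CODES[category] if category else None

print(code_occupation("Secretary"))  # 3
print(code_occupation("Astronaut"))  # None (review by hand)
```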
Coding Quantitative Data
 Points to remember:
 If the data are coded to maintain a good amount of detail, they
can always be combined (reduced) later
 However, if you start off with too little detail, you can’t get it
back
 If you’re using a survey / questionnaire, it’s a good idea to do your coding on the form so that it can be entered properly (i.e., create a “codebook”)
Codebook Construction
Purposes:
 Primary guide used in the coding process.
 Should note the value assigned to each variable attribute
(response)
 Guide for locating variables and interpreting codes in the
data file during analysis.
 If you’re doing your own input, this will also guide data
set construction
Hands-on Exercise 1
 Create a mini-codebook by coding the survey instrument
 Note column spaces / locations
 Note variable attribute values
 Pay attention to the special instructions in the box at the bottom
Entering Data
 Optical scan sheets (usually ASCII output).
 Limits possible responses

 CATI system / online: data are entered as they are collected

 Data entry specialists enter the data into an SPSS data matrix, Excel spreadsheet, or ASCII file.
 Typically, they work from a coded questionnaire
Entering Data
 In Excel or Access, follow procedures from class:
 Format tables with proper variable columns
 Enter data for each case
 In SPSS
 Import an ASCII file and name variables/column headings
 Or, create variables/column headings & enter each case
Entering Data
 ASCII files are useful because they can be transformed or
used in almost all analysis programs
 Import into SPSS or Excel, or use directly with SAS
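Because an ASCII data file is plain text, any program can read it once the codebook says where each variable sits. The sketch below assumes a hypothetical fixed-column layout (ID in columns 1–3, Q1 in column 4, Q2 in columns 5–6, Q3 in column 7) holding the five cases from the data entry example; the layout is illustrative, not a format required by SPSS or SAS.

```python
import io

# Hypothetical codebook: variable name -> (first column, last column), 1-based inclusive
CODEBOOK = {"id": (1, 3), "q1_smoke": (4, 4), "q2_age": (5, 6), "q3_support": (7, 7)}

# Five cases from the data entry example, one fixed-width line per case
ascii_file = io.StringIO("0011242\n0021182\n0032361\n0042481\n0051261\n")

def read_fixed_width(handle, codebook):
    """Parse each line into a dict using the codebook's column locations."""
    cases = []
    for line in handle:
        line = line.rstrip("\n")
        case = {}
        for name, (start, end) in codebook.items():
            field = line[start - 1:end]  # convert 1-based columns to a slice
            case[name] = field if name == "id" else int(field)
        cases.append(case)
    return cases

data = read_fixed_width(ascii_file, CODEBOOK)
print(data[0])  # {'id': '001', 'q1_smoke': 1, 'q2_age': 24, 'q3_support': 2}
```

Keeping the ID as text preserves its leading zeros.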
Hands-on Exercise 2
 Complete the survey (fill in your answers)
 Create a ‘dataset’
 Enter the data from your survey using either Notepad or
the Edit program from the Command prompt
Quantitative Analysis
 You should choose a level of analysis that is appropriate
for your research question

 You should choose the type of statistical analysis appropriate for the variables you have
 Nominal/Categorical, Ordinal, or Continuous
Quantitative Levels of Analysis
 Univariate - simplest form; describe a case in terms of a single variable.
 Bivariate - subgroup comparisons; describe a case in terms of two variables simultaneously.
 Multivariate - analysis of more than two variables simultaneously.
Univariate Analysis
Describing a case in terms of the distribution of attributes
that comprise it.
Example:
 Gender - number of women, number of men.

 You should always begin your analysis by running the basic univariate frequencies and checking to be sure data were entered properly
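Running frequencies first doubles as a data-entry check, because any code the codebook does not allow stands out immediately. A minimal sketch, assuming Q1 permits only 1 (YES) and 2 (NO) as on the data entry screen; the 9 is a hypothetical keying error.

```python
from collections import Counter

q1_smoke = [1, 1, 2, 2, 1, 9]  # the 9 is a hypothetical keying error
ALLOWED_Q1 = {1, 2}            # codebook: 1 = YES, 2 = NO

freqs = Counter(q1_smoke)
for code in sorted(freqs):
    flag = "" if code in ALLOWED_Q1 else "  <-- not in codebook, check the form"
    print(f"{code}: {freqs[code]}{flag}")

invalid = sorted(code for code in freqs if code not in ALLOWED_Q1)
print("Invalid codes:", invalid)  # Invalid codes: [9]
```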
Univariate Analysis
 Frequency distributions

 Measures of central tendency


 Mean, Median, Mode
Presenting Univariate Data
Goals:
 Provide reader with the fullest degree of detail regarding
the data.
 Present data in a manageable form.
 Simple and straightforward
Subgroup Comparisons
 Describe subsets of cases, subjects or respondents.
Examples
 "Collapsing" response categories:
 Age categories, Open responses, etc.
 Handling "don't knows“
 Code separately, make missing if appropriate
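Both operations are recodes. The sketch below assumes a hypothetical five-point agreement item coded 1–5, with 8 meaning “don't know”: it collapses the scale into three categories and, by default, turns “don't know” into missing (`None`) so it drops out of percentages. Whether to keep “don't know” as its own category instead depends on the research question, so the choice is a parameter.

```python
DONT_KNOW = 8  # hypothetical code for "don't know"

# Collapse 1-2 into "disagree", 3 into "neutral", 4-5 into "agree"
COLLAPSE = {1: "disagree", 2: "disagree", 3: "neutral", 4: "agree", 5: "agree"}

def recode(response, dont_know_as_missing=True):
    """Collapse a five-point response; treat "don't know" as missing if requested."""
    if response == DONT_KNOW:
        return None if dont_know_as_missing else "don't know"
    return COLLAPSE[response]

responses = [1, 4, 5, 8, 3, 2]
print([recode(r) for r in responses])
# ['disagree', 'agree', 'agree', None, 'neutral', 'disagree']
```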
Bivariate Analysis
 Describe a case in terms of two variables simultaneously.
 Example:
 Gender
 Attitudes toward equality for men and women
 How does a respondent’s gender affect his or her attitude toward
equality for men and women?
 Crosstabulations / Correlations
Constructing Bivariate Tables
 Divide cases into groups according to the attributes of
the independent variable.
 Describe each subgroup in terms of attributes of the
dependent variable.
 Read the table by comparing the independent variable
subgroups in terms of a given attribute of the dependent
variable.
 The dependent variable (DV) goes in the rows, the independent variable (IV) in the columns
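A minimal Python sketch of these steps, using the smoking-status and ordinance counts from the restaurant ordinance table earlier in this deck (smoking status as the independent variable, ordinance position as the dependent variable). Percentages are computed within each independent-variable group, with group totals summed from the cell counts. The table is printed with the dependent variable in the rows so groups can be compared across each row.

```python
# Cell counts: {independent-variable group: {dependent-variable attribute: count}}
counts = {
    "Smokers": {"Supports": 8, "Opposes": 33, "Undecided": 14},
    "Non-smokers": {"Supports": 170, "Opposes": 16, "Undecided": 12},
}

def percent_within_groups(table):
    """Describe each IV group by the % of its cases in each DV attribute."""
    result = {}
    for group, cells in table.items():
        group_total = sum(cells.values())  # summed from the cell counts
        result[group] = {attr: 100 * n / group_total for attr, n in cells.items()}
    return result

pct = percent_within_groups(counts)
attributes = ["Supports", "Opposes", "Undecided"]

# DV attributes in the rows, IV groups in the columns; compare across each row
print(f"{'':<10}" + "".join(f"{group:>13}" for group in pct))
for attr in attributes:
    print(f"{attr:<10}" + "".join(f"{pct[group][attr]:>12.0f}%" for group in pct))
```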
Bivariate Analysis
 Bivariate Tables / Crosstabs are appropriate for all types
of variables, but the proper inferential statistic will vary
by variable type

 Continuous variables are typically made into categorical variables for this type of analysis
 Recode variables
 Example: Create “Age” (18-34, 35-50, 51-65, 66+)
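A sketch of the Age recode in Python. Treating ages under 18 as out of range (missing) is an assumption, since the example categories start at 18.

```python
# Upper bounds (inclusive) and labels for the categories in the example
AGE_GROUPS = [(34, "18-34"), (50, "35-50"), (65, "51-65")]

def age_category(age):
    """Recode a continuous age into the example's categories; under 18 is out of range."""
    if age < 18:
        return None
    for upper, label in AGE_GROUPS:
        if age <= upper:
            return label
    return "66+"

print([age_category(a) for a in [18, 24, 36, 48, 51, 70]])
# ['18-34', '18-34', '35-50', '35-50', '51-65', '66+']
```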
Appropriate Types of Analysis
Bivariate Analysis: Correlations
 Bivariate correlation analysis is appropriate for continuous
variables (interval, ratio)
 Other types of variables are often recoded into ‘Dummy’
variables (value 0 or 1) for these purposes
 Example: Gender becomes two variables ‘Male’ (1=yes) &
‘Female’ (1=yes)
 Present results in a correlation matrix
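A standard-library sketch of dummy coding followed by a Pearson correlation matrix. The respondents and the attitude score are hypothetical. The correlation is written out by hand here to show the formula; statistical packages compute the same coefficient.

```python
import math
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical respondents
gender = ["male", "female", "female", "male", "female"]
age = [24, 18, 36, 48, 26]
score = [3.0, 4.5, 4.0, 2.5, 5.0]  # hypothetical attitude score

# Dummy coding: each gender category becomes its own 0/1 variable
male = [1 if g == "male" else 0 for g in gender]
female = [1 if g == "female" else 0 for g in gender]

# Correlation matrix for the variables of interest (only one gender dummy is needed)
variables = {"male": male, "age": age, "score": score}
names = list(variables)
print(" " * 6 + "".join(f"{n:>8}" for n in names))
for a in names:
    cells = "".join(f"{pearson(variables[a], variables[b]):>8.2f}" for b in names)
    print(f"{a:<6}{cells}")
```

Only one of the two gender dummies goes into the matrix: because female = 1 − male, the two are perfectly negatively correlated (r = −1) and carry the same information.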
Multivariate Analysis
 Analysis of more than two variables simultaneously.
 Can be used to understand the relationship between
multiple variables more fully.
 Most typical: Regression analysis
Multivariate Analysis
 Ordinal (technically inappropriate but it happens),
continuous, dummy variables

 Type of regression analysis will depend on the type of variables
 OLS (continuous dependent variable)
 Logistic (categorical dependent variable, most often binary)
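To show what OLS computes, here is a standard-library sketch that solves the normal equations (X'X)b = X'y for one continuous predictor and one dummy variable. The data are hypothetical and built so that y = 1 + 0.5 × age + 2 × male exactly, so the fitted intercept and slopes should come back as approximately 1, 0.5, and 2. A statistics package (SPSS, SAS, or similar) would normally be used instead, since it also reports standard errors and fit statistics; logistic regression needs an iterative fit and is not shown.

```python
def ols(x_columns, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    n = len(y)
    X = [[1.0] + [col[i] for col in x_columns] for i in range(n)]  # intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Hypothetical data: one continuous predictor (age) and one dummy (male)
age = [24, 18, 36, 48, 26]
male = [1, 0, 0, 1, 0]
y = [1 + 0.5 * a + 2 * m for a, m in zip(age, male)]

coefficients = ols([age, male], y)
print([round(c, 3) for c in coefficients])  # [1.0, 0.5, 2.0]
```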
Sampling
 What is your population of interest?
 To whom do you want to generalize your results?
 All students (18 and over)
 Undergraduates only
 Greeks
 Athletes
 Other
 Can you sample the entire population?
Sampling
 A sample is “a smaller (but hopefully representative)
collection of units from a population used to
determine truths about that population” (Field, 2005)
 Why sample?
 Resources (time, money) and workload
 Gives results with known accuracy that can be calculated
mathematically
 The sampling frame is the list from which the
potential respondents are drawn
 Registrar’s office
 Class rosters
 Must assess sampling frame errors
Types of Samples
 Probability (Random) Samples 
 Simple random sample
 Systematic random sample
 Stratified random sample
 Proportionate
 Disproportionate
 Cluster sample
 Non-Probability Samples
 Convenience sample
 Purposive sample
 Quota
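The probability designs differ mainly in how units are drawn from the sampling frame. A sketch of three of them using Python's `random` module, assuming a hypothetical frame of 1,000 student IDs tagged by level (the kind of list a registrar's office might supply). The `seed` argument only makes a draw reproducible.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Every unit in the sampling frame has an equal chance of selection."""
    return random.Random(seed).sample(frame, n)

def systematic_sample(frame, n, seed=None):
    """Take every k-th unit after a random start, where k = len(frame) // n."""
    k = len(frame) // n
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]

def proportionate_stratified_sample(frame, stratum_of, n, seed=None):
    """Sample each stratum in proportion to its share of the frame."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        share = round(n * len(units) / len(frame))  # rounding can shift the total slightly
        sample.extend(rng.sample(units, share))
    return sample

# Hypothetical frame: 750 undergraduates and 250 graduate students
frame = [(f"S{i:04d}", "undergraduate" if i % 4 else "graduate") for i in range(1000)]

srs = simple_random_sample(frame, 100, seed=1)
sys_sample = systematic_sample(frame, 100, seed=1)
strat = proportionate_stratified_sample(frame, lambda unit: unit[1], 100, seed=1)
```

A disproportionate stratified sample would replace the proportional `share` with a fixed number per stratum, so small groups can be oversampled.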
Sample Size
 Depends on expected response rate
 Average 85% for paper
 FINAL SAMPLE DESIRED / .85 = SAMPLE
 Average 25% for web
 FINAL SAMPLE DESIRED / .25 = SAMPLE

Size of Campus    Final Desired N
<600              All students
600-2,999         600
3,000-9,999       700
10,000-19,999     800
20,000-29,999     900
≥30,000           1,000
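The response-rate adjustment and the campus-size table combine into a short calculation. A sketch in Python, assuming the sample is rounded up to a whole person and, as an added assumption, capped at the campus population (everyone is surveyed when the adjusted number exceeds enrollment).

```python
import math

# (largest campus size in the row, final desired N), following the table above
DESIRED_N = [(2_999, 600), (9_999, 700), (19_999, 800), (29_999, 900)]
RESPONSE_RATES = {"paper": 0.85, "web": 0.25}  # average response rates from the slide

def final_desired_n(campus_size):
    """Look up the final number of completed surveys wanted."""
    if campus_size < 600:
        return campus_size  # survey all students
    for upper, n in DESIRED_N:
        if campus_size <= upper:
            return n
    return 1_000  # campuses of 30,000 or more

def sample_to_draw(campus_size, mode):
    """FINAL SAMPLE DESIRED / response rate, rounded up and capped at the population."""
    needed = math.ceil(final_desired_n(campus_size) / RESPONSE_RATES[mode])
    return min(needed, campus_size)

print(sample_to_draw(15_000, "paper"))  # 942
print(sample_to_draw(15_000, "web"))    # 3200
```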
Bias and Error
 Systematic Error or Bias: unknown or unacknowledged
error created during the design, measurement, sampling,
procedure, or choice of problem studied
 Error tends to go in one direction
 Examples: Selection, Recall, Social desirability
 Random error
 Unrelated to true measures
 Example: Momentary fatigue
Levels of Measurement
 Nominal  Interval
 Gender  Body Mass Index (BMI)
 Male, Female
 Vaccinations
 Yes, No, Unsure
 Ordinal  Ratio
 Personal health status  Number of drinks
 Excellent, Very good, Good,  Number of sexual partners
Fair, Poor  Perception percentages
 Last 30 days  Blood alcohol concentration
 Never used, Not in last 30 days, (BAC)
1-2 days, 3-5 days, 6-9 days, 10-
19 days, 20-29 days, All 30 days
