Session 3

The document outlines the process of data preparation in advanced data analysis, focusing on handling missing values, outliers, and creating new variables. It emphasizes the importance of accurately identifying and managing missing data, detecting outliers through graphical methods, and employing summated scales for data reduction. Additionally, it covers the reliability analysis of scales to ensure internal consistency in measuring constructs.

Uploaded by

Rajasekharan Kuntheti Gopalakrishna

Advanced Data Analysis

Session 3: Data preparation

Today’s session
Data preparation
• Dealing with missing values
• Dealing with outliers
• Compute new variables
• Research on a subset of observations
• Summated scales
  - Recoding variables
  - Reliability analysis
• Transforming metric variables into categorical variables
DEALING WITH MISSING VALUES
File: CLASS missing [Link]
• Course website > Session 3
• Save on computer

Variable view

Name     Variable name. Restrictions: no spaces, no underscore at the end, no duplicates, no reserved keywords (ALL, AND, BY, EQ, GE, LE, LT, …)
Type     String data (~text) vs. numeric data
Label    Variable label: fewer restrictions than the name
Values   Labels assigned to the levels of the variable
Missing  Assign missing values
Measure  Measurement level: scale, ordinal or nominal
Dealing with missing values
• Missing value = No response or “Don’t know” “No opinion” (NOT: “Neutral”)
• Can significantly influence results!
• Need to be assigned explicitly
• Represented by a Dot (.) in the dataset

In large datasets: not easy to identify by sight!

Dealing with missing values
Detecting missing values

Analyze > Descriptive Statistics > Frequencies

Step 1: Select ALL variables and move them into the « Variable(s) » box
Step 2: Check whether missing values are reported for any of the variables
Step 3: For each variable with missing values, check its frequency table
Step 4: Go into DATA VIEW and scroll down to identify where the value is missing for that specific variable
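The detection steps above can also be sketched outside SPSS. The following is a minimal Python illustration, not part of the course material: the records, the variable names and the helper `missing_report` are hypothetical, and `None` stands in for the dot (.) that SPSS shows for a missing value.

```python
# Hypothetical survey records; None plays the role of the dot (.) in SPSS.
records = [
    {"id": 1, "weight": 72.0, "length": 1.80},
    {"id": 2, "weight": None, "length": 1.65},
    {"id": 3, "weight": 58.5, "length": None},
    {"id": 4, "weight": None, "length": 1.72},
]


def missing_report(rows):
    """Return {variable: [row numbers with a missing value]}.

    Rows are numbered from 1, like the lines in Data view.
    """
    report = {}
    for row_number, row in enumerate(rows, start=1):
        for variable, value in row.items():
            if value is None:
                report.setdefault(variable, []).append(row_number)
    return report


report = missing_report(records)
for variable, row_numbers in report.items():
    print(f"{variable}: {len(row_numbers)} missing, rows {row_numbers}")
```

The report answers the same two questions as Steps 2 to 4: which variables have missing values, and in which rows.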
Dealing with missing values
Specify missing values in SPSS

Rule:
Assign a value that is
NOT an answer option
(e.g., 99 or -1)

Step 5: Tell SPSS that a value is missing

Having missing values for one or more participants doesn’t mean we have to ignore the
data we do have for those participants!!
But, we need to tell SPSS that a value is missing for those participants!
2 possibilities:
1) Leave the cell blank with a dot
→ Not 100% clear whether the answer still needs to be filled out or whether the value
is missing; sometimes SPSS cannot work with that
2) Assign a value that clearly indicates that the value is missing
→ Better option!
Dealing with missing values
Specify missing values in SPSS
Step 5.1: In Data view, enter the missing-value code (here: -1)

Illustration: Calculate the mean weight
Analyze > Descriptive Statistics > Descriptives

Step 5.2: In Variable view, declare that code as missing in the « Missing » column
Do this for EVERY variable with missing values!
Without this specification, the mean could be very different!
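To see why the missing code has to be declared, here is a small Python sketch with made-up weights (the values are hypothetical; only the code -1 comes from the slides). If -1 is treated as a real weight, it pulls the mean down.

```python
from statistics import mean

MISSING_CODE = -1  # code used on the slides; any value that is not an answer option works

# Hypothetical weights in kg; two respondents did not answer.
weights = [72.0, -1, 58.5, 80.0, -1, 65.5]

# Mean when the missing code is (wrongly) treated as a real weight.
naive_mean = mean(weights)

# Mean when the missing code is excluded, as SPSS does once the code is declared.
valid_weights = [w for w in weights if w != MISSING_CODE]
correct_mean = mean(valid_weights)

print(f"mean with -1 counted as a weight: {naive_mean:.2f}")
print(f"mean over valid weights only:     {correct_mean:.2f}")
```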
DEALING WITH OUTLIERS
Dealing with outliers

Outlier (reminder)
• A case that is very different from the rest of the data
• Can significantly influence results:
  - Bias the mean
  - Inflate the standard deviation
Dealing with outliers

… the mean increases (it increases by 0.4). This example shows how a single score, from some mean-spirited badger turd, can bias a parameter such as the mean: the first rating of 2 drags the average down. Based on this biased estimate new customers might erroneously conclude that my book is worse than the population actually thinks it is. Although I am consumed with bitterness about this whole affair, it has at least given me a great example of an outlier.

Figure 5.2: The first 7 customer ratings of this book on [Link] (in about 2002). The first score biases the mean

→ Importance of screening your data!

… slated every aspect of the data analysis in a very pedantic way. Imagine my horror when my supervisor came …
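The two effects of an outlier (a biased mean and an inflated standard deviation) can be checked with a few lines of Python. The seven ratings below are hypothetical: they were chosen so that dropping the first rating of 2 raises the mean by about 0.4, as in the quoted passage, but they are not taken from the figure.

```python
from statistics import mean, stdev

# Hypothetical ratings on a 1-5 scale; the first one is the outlier.
ratings = [2, 5, 4, 5, 5, 5, 5]
without_outlier = ratings[1:]

mean_with = mean(ratings)
mean_without = mean(without_outlier)
sd_with = stdev(ratings)
sd_without = stdev(without_outlier)

print(f"mean: {mean_with:.2f} with the outlier, {mean_without:.2f} without")
print(f"SD:   {sd_with:.2f} with the outlier, {sd_without:.2f} without")
```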
Dealing with outliers
Screen your data for outliers

1) Graph the data with a frequency distribution (histogram)

2) If an outlier appears to be present for a variable, graph the data for that variable with a boxplot

Example
A biologist was worried about the potential health effects of music festivals.
He measured the hygiene of 810 concert-goers over the three days of the
festival. Hygiene was measured using a scale with scores ranging from 0 to 4:
• 0 = you smell like a corpse rotting up a skunk’s arse
• 4 = you smell like sweet roses on a fresh spring day
Dealing with outliers
Screen your data for outliers

Step 1: Analyze > Descriptive Statistics > Frequencies

Step 2: Drag all the variables in your dataset from the left box to the Variable(s) box.
Step 3: Click on Charts and select Histograms as the Chart Type.

Step 4: Click on Continue and then OK.

Dealing with outliers
Reminder: Spotting outliers with graphs

Hygiene rating at a festival: interval scale of 0 - 4

Particularly odd because it has a score of 20 on a scale from 0 to 4!

TIP: Always check that there are no scores BELOW the minimum or ABOVE the maximum of the scale used!
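The tip above amounts to a simple range check. A minimal Python sketch, assuming the 0 to 4 hygiene scale from the example; the scores and the helper `out_of_range` are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 0, 4  # hygiene scale from the example

# Hypothetical hygiene scores; None marks a missing value.
hygiene = [2.5, 1.0, 3.2, 20.0, None, 4.0]


def out_of_range(values, low, high):
    """Return (row number, value) for every score outside [low, high]."""
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value is not None and not (low <= value <= high)
    ]


flagged = out_of_range(hygiene, SCALE_MIN, SCALE_MAX)
print(flagged)
```

Row numbers start at 1 so that they match the line numbers in Data view.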
Dealing with outliers
Reminder: Spotting outliers with graphs

Boxplots tell you WHETHER there are extreme scores & WHERE to look in your dataset

Case 611 seems to be an extreme score!
(LOOK OUT: it is line 611 in Data view, not a value of 611 for the variable!)
Check some hints on outlier detection for the group work in a document online
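A boxplot conventionally flags scores that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule in Python; it is an illustration under assumptions, since the data are hypothetical and `statistics.quantiles` may compute quartiles slightly differently from SPSS.

```python
from statistics import quantiles

# Hypothetical hygiene scores in their original row order.
hygiene = [2.1, 1.2, 20.0, 2.6, 1.8, 3.0, 2.0, 2.4]


def boxplot_outliers(values, factor=1.5):
    """Return (row number, value) for scores beyond the boxplot fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return [
        (row, value)
        for row, value in enumerate(values, start=1)
        if value < lower_fence or value > upper_fence
    ]


outliers = boxplot_outliers(hygiene)
print(outliers)  # each pair is (row in the data, score), not just the score
```

As on the slide, the first number in each pair is the row to look up, not a value of the variable.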
Dealing with outliers
What to do with outliers once detected?
Discuss the outliers very specifically:

• Are they mistakes (e.g., typos)? Then, fix them!!

e.g., length = 317 cm → 173 cm

• Delete the person from the dataset

ONLY IF you think the entire person is an outlier:
1) not representative of the population you want to investigate
(e.g., student from Paris while testing students from Lille)
2) the same person is responsible for outliers in many questions in
the survey
Only delete when you are SURE that it is an outlier! Sometimes, people may have very
good reasons for responding with an extreme score (compared to the rest of the data)!

• Change the score: CAUTION!!

Dealing with outliers
What to do with outliers once detected?
CHANGING THE SCORE

« What did you say? CHEATING???!!! »

2 options
• Replace it with the next highest/lowest non-outlier value
(e.g., a score of 12 on a scale of 1-10 → change to 10)
• Replace that particular value with a missing code (e.g., -1)

REPORT!!! Don’t delete/change outliers automatically, without disclosure and discussion

Explain who was deleted and on what grounds!

We detected one outlier (case …) because […].
We left that value out of the analysis OR we changed that value to … because …
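The two ways of changing a score, together with the reporting requirement, can be sketched as follows. This is a hypothetical helper, not an SPSS feature: `treat_outliers` either clamps a score to the nearest scale limit (as in the 12 → 10 example) or replaces it with the missing code, and it always returns a log of what was changed so that the change can be disclosed.

```python
MISSING_CODE = -1  # missing code used on the slides


def treat_outliers(values, low, high, strategy):
    """Treat scores outside [low, high].

    strategy "clamp":   replace by the nearest scale limit
    strategy "missing": replace by MISSING_CODE
    Returns (treated values, log of (row number, old value, new value)).
    """
    if strategy not in ("clamp", "missing"):
        raise ValueError("strategy must be 'clamp' or 'missing'")
    treated, log = [], []
    for row, value in enumerate(values, start=1):
        if value == MISSING_CODE or low <= value <= high:
            treated.append(value)  # valid score or already missing: keep
            continue
        if strategy == "missing":
            new_value = MISSING_CODE
        else:
            new_value = high if value > high else low
        treated.append(new_value)
        log.append((row, value, new_value))
    return treated, log


scores = [7, 12, 3, 9]  # hypothetical answers on a 1-10 scale
clamped, clamp_log = treat_outliers(scores, 1, 10, "clamp")
as_missing, missing_log = treat_outliers(scores, 1, 10, "missing")
print(clamped, clamp_log)
print(as_missing, missing_log)
```

Clamping to the scale limit is a simplification: when the outlier lies inside the scale, the "next highest/lowest non-outlier value" would be the nearest observed score that is not itself an outlier.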
File: [Link]
• Course website > Session 3
• Save on computer

COMPUTE NEW VARIABLES

Compute new variables
Exercise

Calculate a new variable BMI (Body Mass Index)

BMI = Weight (in kg)/(Length (in m))²

Do not do it by hand!
  - Cumbersome task
  - Prone to mistakes
Use SPSS to help you with that…
Compute new variables
Exercise
BMI = Weight (in kg)/(Length (in m))²

Transform > Compute Variable

Predefined formulas are available (e.g., square, to calculate means, etc.)
Compute new variables
Exercise
Data view

Check quickly whether the right calculation is made!

DO NOT FORGET to specify the characteristics of this new variable in Variable view!
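For a quick check of the SPSS result, the BMI formula can be reproduced in a few lines of Python. The respondents and the helper `bmi` are hypothetical; a missing weight or length gives a missing BMI.

```python
def bmi(weight_kg, length_m):
    """BMI = weight (kg) / (length (m)) squared; None if an input is missing."""
    if weight_kg is None or length_m is None:
        return None
    return weight_kg / length_m ** 2


# Hypothetical respondents.
people = [
    {"weight": 70.0, "length": 1.75},
    {"weight": None, "length": 1.80},
]

for person in people:
    person["bmi"] = bmi(person["weight"], person["length"])
    print(person)
```

Note that the length must be in metres: a length entered in centimetres (e.g., 175) has to be divided by 100 first.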
RESEARCH ON A SUBSET OF
OBSERVATIONS
Research on a subset of observations
File: [Link]

Sometimes, you may be interested in performing an analysis for a specific group of people only → You need to tell SPSS; otherwise, it uses everybody in the dataset!

Data > Select cases

Research on a subset of observations

e.g., you want to exclude group 1 (assuming the groups are coded 1, 2, 3, …):
Variable > 1

e.g., you want to include only groups 1 and 3:
Variable = 1 | Variable = 3
Research on a subset of observations

Data view

Check quickly whether the right selection is made!

Don’t forget to switch off the selection after your analysis!

Data > Select cases > Check ‘All cases’
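The two selection conditions above can be mirrored in Python as list filters. The cases and the variable name `group` are hypothetical; the conditions are the same as in the Select cases dialog.

```python
# Hypothetical cases with a grouping variable coded 1, 2, 3.
cases = [
    {"id": 1, "group": 1},
    {"id": 2, "group": 2},
    {"id": 3, "group": 3},
    {"id": 4, "group": 1},
    {"id": 5, "group": 2},
]

# Exclude group 1: keep cases where group > 1.
without_group_1 = [case for case in cases if case["group"] > 1]

# Include only groups 1 and 3: group = 1 OR group = 3.
only_groups_1_and_3 = [
    case for case in cases if case["group"] == 1 or case["group"] == 3
]

print([case["id"] for case in without_group_1])
print([case["id"] for case in only_groups_1_and_3])
```

The original list `cases` is left untouched, which corresponds to switching the selection off again after the analysis.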
REDUCING DATA:
SUMMATED SCALES
File: [Link]
• Course website > Session 3
• Save on computer

« Candy Preference Scale »
When I watch television in the evening, I eat candy on a regular basis.
If I'm hungry between meals, I will eat fruit more often than candy.
I always like to add extra sugar to my dessert.
When I take a snack, I prefer the sweetest one.

→ Four questions to measure the same preference for candy

Two options:
  - Repeat the same analyses for the four questions = cumbersome task (imagine e.g. 15 questions)
  - Reduce the data by making a SUMMATED SCALE = one variable that combines several variables that measure the same construct (here: candy preference)
Summated scales

3 requirements
1. All questions need to be measured on the same scale
(e.g., Likert scale from 1 to 5)
2. All questions need to be scaled in the same direction
3. The new variable should contain only variables that
measure the same construct (here: candy preference)
Summated scales

Three requirements
1. All questions need to be measured on the same scale
Candy Preference Scale
Please indicate the extent to which you agree with the following statements
from 1 « completely disagree » to 7 « completely agree »
When I watch television in the evening, I eat candy on a regular basis.
□1 □2 □3 □4 □5 □6 □7
If I'm hungry between meals, I will eat fruit more often than candy.
□1 □2 □3 □4 □5 □6 □7
I always like to add extra sugar to my dessert.
□1 □2 □3 □4 □5 □6 □7
When I take a snack, I prefer the sweetest one.
□1 □2 □3 □4 □5 □6 □7
Summated scales

Three requirements
2. All questions need to be scaled in the same direction

Candy Preference Scale

Please indicate the extent to which you agree with the following statements
from 1 « completely disagree » to 7 « completely agree »
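When I watch television in the evening, I eat candy on a regular basis.
If I'm hungry between meals, I will eat fruit more often than candy.
I always like to add extra sugar to my dessert.
When I take a snack, I prefer the sweetest one.
→ The second item is worded in the opposite direction (agreeing means preferring fruit), so it needs to be reverse coded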
Summated scales
Recoding variables

Transform > Recode into Different Variables

Helpful hint: Display the variable name instead of the label:
1. Right-click on the variable of interest
2. Select « Display Variable Names »
Summated scales
Recoding variables
1. Click on the variable to recode (« CandyPref2 ») in the list of variables on the left and click the arrow
2. Under Output Variable: enter the name of the new recoded variable (« CandyPref2_recoded ») + label
3. Click on « Old and New Values »
Summated scales
Recoding variables
1. Specify how missing values are handled
OLD: System- or user-missing → NEW: System-missing
2. Specify the old and new values for reverse coding
(here, on a 7-point scale: 1 → 7, 2 → 6, …, 7 → 1)

ATTENTION!!
Different scales need different recodings (old-new values)

Summated scales
Recoding variables
Data view

Check quickly whether the right recoding is made!
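Reverse coding follows one rule on any scale: new value = scale minimum + scale maximum − old value. A minimal Python sketch of that rule (the helper `reverse_code` is hypothetical; missing values stay missing):

```python
def reverse_code(value, scale_min=1, scale_max=7):
    """Reverse a score on a scale running from scale_min to scale_max."""
    if value is None:  # missing stays missing
        return None
    if not scale_min <= value <= scale_max:
        raise ValueError(f"{value} is outside the scale {scale_min}-{scale_max}")
    return scale_min + scale_max - value


# 7-point scale, as used for the Candy Preference Scale.
seven_point = [reverse_code(v) for v in range(1, 8)]
print(seven_point)

# A different scale needs a different recoding: 5-point scale.
five_point = [reverse_code(v, 1, 5) for v in range(1, 6)]
print(five_point)
```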

Summated scales

Three requirements
3. Summated scale = one variable that combines several variables
that measure the same construct (here: candy preference)

Create your summated scale ONLY IF sufficiently reliable! (i.e., acceptable Cronbach’s α)
Summated scales
Reliability analysis
Internal consistency reliability
= a measure that indicates whether several items that purport to measure the same general construct produce similar scores
→ When you have a large scale, first make groups of items that logically fit together in terms of interpretation before you test their internal consistency

Example for construct “attitude toward cycling”
Item                                        Response
I like to ride bicycles                     Totally agree
I’ve enjoyed riding bicycles in the past    Totally agree
I hate bicycles                             Totally disagree

→ good internal consistency of the scale

Measured by Cronbach’s alpha (normally 0 < α < 1)
Summated scales
Reliability analysis
Analyze > Scale > Reliability Analysis
Summated scales
Reliability analysis

If you have a RECODED variable: use it instead of the original one, otherwise Cronbach’s alpha will produce a strange score!!!
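To illustrate why the recoded item must be used, here is a minimal Python sketch of Cronbach's alpha, computed with the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The five respondents and their answers are hypothetical; only the variable names follow the slides.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: list of item columns (same respondents, same order in each)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of 5 respondents on the 7-point scale.
candy_pref1 = [6, 2, 5, 7, 3]
candy_pref2 = [3, 6, 3, 1, 4]          # reverse-worded item, original coding
candy_pref3 = [5, 1, 5, 6, 2]
candy_pref4 = [6, 3, 4, 7, 2]
candy_pref2_recoded = [8 - v for v in candy_pref2]  # 1 -> 7, ..., 7 -> 1

alpha_recoded = cronbach_alpha(
    [candy_pref1, candy_pref2_recoded, candy_pref3, candy_pref4]
)
alpha_not_recoded = cronbach_alpha(
    [candy_pref1, candy_pref2, candy_pref3, candy_pref4]
)
print(f"alpha with the recoded item:  {alpha_recoded:.2f}")
print(f"alpha with the original item: {alpha_not_recoded:.2f}")
```

With these made-up data, alpha drops sharply when the original (non-recoded) item is used, because that item runs against the other three.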
Summated scales
Reliability analysis
OUTPUT

Alpha could be increased by deleting CandyPref2_recoded

BUT is it necessary? Cronbach’s alpha is already very high
and the increase is only marginal!
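The "alpha if item deleted" idea can be sketched in Python by recomputing alpha with each item left out in turn. The data are hypothetical and the helper names are not SPSS functions; the sketch needs at least three items, so that two remain after a deletion.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(c) for c in items) / variance(totals))


def alpha_if_item_deleted(named_items):
    """Return {item name: alpha of the scale without that item}."""
    result = {}
    for name in named_items:
        remaining = [col for other, col in named_items.items() if other != name]
        result[name] = cronbach_alpha(remaining)
    return result


# Hypothetical answers of 5 respondents (reverse-worded item already recoded).
scale = {
    "CandyPref1": [6, 2, 5, 7, 3],
    "CandyPref2_recoded": [5, 2, 5, 7, 4],
    "CandyPref3": [5, 1, 5, 6, 2],
    "CandyPref4": [6, 3, 4, 7, 2],
}

overall_alpha = cronbach_alpha(list(scale.values()))
per_item = alpha_if_item_deleted(scale)
print(f"alpha of the full scale: {overall_alpha:.3f}")
for name, alpha in per_item.items():
    print(f"alpha without {name}: {alpha:.3f}")
```

Comparing each value with the alpha of the full scale shows whether dropping an item would help, and by how much.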
Summated scales
Reliability analysis
Rule of thumb
• α ≥ 0.9: excellent
• 0.8 ≤ α < 0.9: very good
• 0.7 ≤ α < 0.8: good
• 0.6 ≤ α < 0.7: acceptable
• 0.5 ≤ α < 0.6: poor
• α < 0.5: unacceptable
Cronbach’s alpha > 0.7 (good or better)

Warning!
• Items should logically match according to interpretation (garbage in, garbage out) → first fit items together based on their meaning
• If there is only a marginal difference, opt for more items
• Min. 3 items
• Max. 10 items (α increases as the number of items increases)
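The rule of thumb can be written down as a small lookup, which is handy when several scales have to be judged. A minimal Python sketch (the function name is hypothetical; the thresholds are the ones listed above):

```python
def interpret_alpha(alpha):
    """Label a Cronbach's alpha according to the rule of thumb."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "very good"
    if alpha >= 0.7:
        return "good"
    if alpha >= 0.6:
        return "acceptable"
    if alpha >= 0.5:
        return "poor"
    return "unacceptable"


for value in (0.95, 0.80, 0.65, 0.30):
    print(value, interpret_alpha(value))
```

The label says nothing about whether the items belong together in terms of meaning; that check still has to be done first.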
Summated scales
Reliability analysis
What if only 2 items?

No Cronbach’s alpha
BUT Pearson correlation

Analyze > Correlate > Bivariate

Summated scales
Reliability analysis
What if only 2 items?
OUTPUT

= correlation between the two variables
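For two items, the Pearson correlation can be computed directly from its definition. The sketch below is a plain Python illustration with hypothetical item scores; it is not the SPSS procedure itself.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical answers of 5 respondents on two items of the same scale.
item_a = [6, 2, 5, 7, 3]
item_b = [5, 2, 5, 7, 4]

r = pearson_r(item_a, item_b)
print(f"r = {r:.2f}")
```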
Summated scales
How to report reliability analysis?
Report about Cronbach’s alpha by using the symbol α when
you write about your measures:

« Candy preference was measured with four items: “(1) When I watch television in the evening, I eat candy on a regular basis; (2) If I'm hungry between meals, I will eat fruit more often than candy; (3) I always like to add extra sugar to my dessert; (4) When I take a snack, I prefer the sweetest one.” The second item was reverse coded. After reversing the second item, the candy preference scale had a high reliability (α = .95). »

Same for Pearson correlation, but use the symbol r (r = .95)

When reporting statistics below 1 (such as α or r), always drop the 0 before the decimal place!!!
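The reporting convention (symbol, two decimals, no leading zero) is easy to get wrong by hand. A small Python sketch of a formatter; the function name `format_stat` is hypothetical.

```python
def format_stat(symbol, value, decimals=2):
    """Format a statistic such as alpha or r without the leading zero."""
    text = f"{value:.{decimals}f}"
    if text.startswith("0."):
        text = text[1:]            # "0.95" -> ".95"
    elif text.startswith("-0."):
        text = "-" + text[2:]      # "-0.40" -> "-.40"
    return f"{symbol} = {text}"


print(format_stat("α", 0.95))
print(format_stat("r", -0.4))
```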
Exercise on Cronbach’s Alpha

• Open dataset ‘[Link]’

• Do an appropriate reliability analysis on the items
Summated scales
Last stage: Creating the new, summated scale
3 requirements
1. All questions need to be measured on the same scale
(e.g., Likert scale from 1 to 5)
2. All questions need to be scaled in the same direction
3. The new variable should contain only variables that
measure the same construct (here: candy preference)

If those 3 requirements are met:

Create a new variable, which is the MEAN of the scores on
the different questions
1. Go to ‘Transform’ > ‘Compute Variable’
2. Give a name to your new variable (« MEAN… »)
3. Compute the mean of all the scores on the different questions
Summated scales
Last stage: Creating the new, summated scale

MEAN_CandyPreference = (CandyPref1 + CandyPref3 + CandyPref4 + CandyPref2_recoded) / 4

The divisor (here: 4) depends on the number of items you sum up

Again, you can check the newly created variable in the data view!
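The same computation, sketched in Python for one hypothetical respondent. The helper `mean_scale` returns a missing result as soon as one item is missing, which is also what a formula built with + does when one of its terms is missing.

```python
def mean_scale(*item_scores):
    """Mean of the item scores; None if any item is missing."""
    if any(score is None for score in item_scores):
        return None
    return sum(item_scores) / len(item_scores)


# Hypothetical respondent; the second item is already reverse coded.
respondent = {
    "CandyPref1": 6,
    "CandyPref2_recoded": 5,
    "CandyPref3": 5,
    "CandyPref4": 6,
}

mean_candy_preference = mean_scale(
    respondent["CandyPref1"],
    respondent["CandyPref3"],
    respondent["CandyPref4"],
    respondent["CandyPref2_recoded"],
)
print(mean_candy_preference)
```

Because the function divides by the number of scores it receives, the divisor adapts automatically to the number of items.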
TRANSFORMING METRIC
VARIABLES
Transforming metric variable into categorical variable

File: [Link]
• Course website > Session 3
• Save on computer

IMPORTANT QUESTION
How many categories do we want?

Two groups (= median split):
People are split into two equal categories based on the median:
1) <(=) the median
2) >(=) the median

Three groups:
People are split into three equal categories based on the % of people in each group:
In each category, +/- 33% of the total sample should be present
Transforming metric variable
into 2 groups variable
Step 1: Ask for frequencies + median
Analyze > Descriptive Statistics > Frequencies

Look for the BEST split point to divide the sample in half (= split after the value whose cumulative percent is closest to 50)
HERE: Include the median in the highest group:
• Group 1: < median
• Group 2: ≥ median
Transforming metric variable
into 2 groups variable
Step 2: Make a new variable: Transform > Recode into Different Variables

Check quickly whether the right coding is made!
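A median split with the median included in the highest group can be sketched in Python as follows. The ages are hypothetical; the coding (1 = below the median, 2 = median or above) follows the slide.

```python
from statistics import median


def median_split(values):
    """Return (median, group codes): 1 if below the median, 2 otherwise."""
    split_point = median(values)
    groups = [1 if value < split_point else 2 for value in values]
    return split_point, groups


ages = [21, 35, 28, 40, 28, 19, 52]  # hypothetical metric variable
split_point, groups = median_split(ages)
print(split_point)
print(groups)
```

When many respondents share the median value, the two groups can end up quite unequal, which is why the frequency table should be checked for the best split point.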

Transforming metric variable into categorical variable

IMPORTANT QUESTION
How many categories do we want?

Two categories (= median split):
People are split into two equal categories based on the median:
1) <(=) the median
2) >(=) the median

Three categories:
People are split into three equal categories based on the % of people in each group:
In each category, +/- 33% of the total sample should be present
Transforming metric variable
into 3 groups variable

Step 1: Ask for frequencies
Analyze > Descriptive Statistics > Frequencies

Look at the cumulative percent:
Group 1: < ± 33%
Group 2: ± 33% < (…) < ± 66%
Group 3: > ± 66%
Transforming metric variable
into 3 groups variable
Step 2: Make a new variable: Transform > Recode into Different Variables

Check quickly whether the right coding is made (in Data view)!
Analyze > Descriptive Statistics > Frequencies (for the new variable)
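The three-group split can be sketched in Python by building the cumulative percent column of a frequency table and picking the values closest to 33% and 66% as cut points. The scores and helper names are hypothetical; each cut value is included in the lower of the two groups it separates.

```python
from collections import Counter


def cumulative_percent(values):
    """Return [(value, cumulative percent)] sorted by value."""
    counts = Counter(values)
    running, table = 0, []
    for value in sorted(counts):
        running += counts[value]
        table.append((value, 100 * running / len(values)))
    return table


def closest_cut(table, target_percent):
    """Value whose cumulative percent is closest to the target."""
    return min(table, key=lambda row: abs(row[1] - target_percent))[0]


def three_groups(values):
    """Return ((cut1, cut2), group codes 1-3) based on cumulative percent."""
    table = cumulative_percent(values)
    cut1 = closest_cut(table, 100 / 3)
    cut2 = closest_cut(table, 200 / 3)
    groups = [1 if v <= cut1 else 2 if v <= cut2 else 3 for v in values]
    return (cut1, cut2), groups


scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical metric variable
cuts, groups = three_groups(scores)
print(cuts)
print(groups)
```

With very few distinct values both cut points can coincide, in which case the middle group stays empty and three groups are not a sensible choice.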
Transforming metric variable into categorical variable
Exercise

Useful hint: the value that you mention is always included in that
particular category
CHECKLIST
Checklist Session 3
1) Missing values?
2) Outliers?
3) Are scales used (= data that can be reduced)?
• Check requirements:
  1) Same scale?
  2) Reversed items to be recoded?
  3) Sufficiently reliable scale?
     - Cronbach’s alpha (more than 2 items)
     - Pearson correlation (for only 2 items)
• Make a summated scale (Transform > Compute Variable > Mean)
4) Other variables that need to be created or transformed?
5) Do some cases need to be excluded for some analyses
(Data > Select cases)?
6) Do we need to transform a metric variable into a categorical
variable? If yes, how many categories do we need?
GROUP PROJECT

If you have not done so yet: clean up the dataset!!! (cf. session 1 – Part II)
Prepare your dataset for further analyses
Go over the checklist of session 3
