Week #3 Data Manipulation and Transformation

1. The document discusses various data manipulation techniques in software including sorting cases, splitting files, selecting cases, merging files, and transforming data. 2. Sorting cases allows sorting the rows of a data file based on one or more variables, either in ascending or descending order. Splitting a file splits it into groups for analysis based on grouping variables. 3. Merging files combines two or more data files into one by either adding cases if they have the same variables or adding variables if they have the same cases. Data transformation changes existing variable values or creates new variables from existing ones.

BS-VII (A&B): Fall Semester 2023 Data Analysis through Software’s(PY:405)

DATA MANIPULATION

Data manipulation changes the layout of the data and does not change its values. All data manipulation commands are listed in the “Data” pull-down menu.
– Insert variable: inserts a new variable into the existing file
– Insert case: inserts a new case into the existing file
– Go to case: goes to a particular case
– Sort cases: sorts cases on the values of one or more variables
– Transpose: converts rows into columns and vice versa
– Split File: splits the data file based on the values of one or more grouping variables
– Merging Files: merges two or more files into one data file
– Select Cases: selects a subset of cases without discarding the unselected cases

Data Manipulation: Sort Cases

– This dialog box sorts the cases (rows) of the data file based on the values of one or more sorting variables. You can sort cases in ascending or descending order.
– If you select multiple sort variables, cases are sorted by each variable within categories of the preceding variable on the Sort list. For example, if you select gender as the first sorting variable and minority as the second sorting variable, cases will be sorted by minority classification within each gender category.

Sorting the data
• Click ‘Data’ and then click ‘Sort Cases’.
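The multi-key ordering described under Sort Cases can be illustrated outside the dialog box. The following is a minimal Python sketch, not the software's own procedure; the four cases and their values are hypothetical, and only the variable names gender and minority come from the slide.

```python
# Hypothetical cases; only the variable names come from the slide's example.
cases = [
    {"id": 1, "gender": "male", "minority": 1},
    {"id": 2, "gender": "female", "minority": 0},
    {"id": 3, "gender": "male", "minority": 0},
    {"id": 4, "gender": "female", "minority": 1},
]

# Sort by gender first, then by minority within each gender (both ascending).
sorted_cases = sorted(cases, key=lambda c: (c["gender"], c["minority"]))
sorted_ids = [c["id"] for c in sorted_cases]

# A descending sort on the same keys only needs reverse=True.
descending_ids = [
    c["id"]
    for c in sorted(cases, key=lambda c: (c["gender"], c["minority"]), reverse=True)
]
```

The tuple key is what makes the second variable act only within categories of the first: ties on gender are broken by minority.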

Muhammad Usman, NIP, Quaid-i-Azam University, Islamabad 30 October 2023 - Page# 1



Sorting the data (cont’d)
• Click a single variable or multiple variables in the source box, move them to the target box “Sort by”, then select the sort order option and click “OK”.

Data Manipulation: SPLIT FILE

– Split File splits the data file into separate groups for analysis based on the values of one or more grouping variables. If you select multiple grouping variables, cases are grouped by each variable within categories of the preceding variable on the Groups Based On list. For example, if you select gender as the first grouping variable and minority as the second grouping variable, cases will be grouped by minority classification within each gender category.
– You can specify up to eight grouping variables.
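The effect of Split File, running an analysis separately for each combination of grouping values, can be sketched in Python as follows. This is an illustration under assumptions: the salary variable and all data values are hypothetical, and the per-group mean merely stands in for whatever analysis would be run on the split file.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases; "salary" is an assumed analysis variable.
cases = [
    {"gender": "female", "minority": 0, "salary": 30000},
    {"gender": "female", "minority": 1, "salary": 28000},
    {"gender": "male", "minority": 0, "salary": 35000},
    {"gender": "male", "minority": 0, "salary": 37000},
]

# Group cases by gender, then by minority within each gender.
groups = defaultdict(list)
for case in cases:
    groups[(case["gender"], case["minority"])].append(case)

# Run the same analysis (here: mean salary) separately for every group.
group_means = {
    key: mean(c["salary"] for c in members)
    for key, members in sorted(groups.items())
}
```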

Data Manipulation: SELECT CASES

– Select Cases provides several methods for selecting a subgroup of cases based on criteria that include variables and complex expressions.
– You can also select a random sample of cases. The criteria used to define a subgroup can include:
• Variable values and ranges
• Arithmetic expressions
• Relational expressions
• Logical expressions

MERGE FILES

You can combine two or more data files into one working data file. There are two possible cases:
– Add Cases
– Add Variables
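Returning to Select Cases: the four kinds of criteria listed for it (value ranges, arithmetic, relational, and logical expressions) can be combined in a single condition. The sketch below is a hypothetical Python illustration; the variables, the thresholds, and the name of the 0/1 flag are assumptions, chosen only to show that unselected cases are flagged rather than discarded.

```python
import random

# Hypothetical cases; variable names and values are illustrative only.
cases = [
    {"id": 1, "sex": "female", "age": 30, "income": 36000},
    {"id": 2, "sex": "male", "age": 35, "income": 48000},
    {"id": 3, "sex": "female", "age": 45, "income": 60000},
    {"id": 4, "sex": "female", "age": 26, "income": 18000},
]

def is_selected(case):
    in_range = 25 <= case["age"] <= 40       # variable values and ranges
    monthly = case["income"] / 12            # arithmetic expression
    well_paid = monthly > 2000               # relational expression
    return case["sex"] == "female" and in_range and well_paid  # logical expression

# Flag every case instead of deleting it, so unselected cases stay in the file.
for case in cases:
    case["selected"] = 1 if is_selected(case) else 0

selected_ids = [c["id"] for c in cases if c["selected"] == 1]

# Random sample of cases, seeded so the draw is reproducible.
rng = random.Random(0)
sample = rng.sample(cases, k=2)
```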


MERGING OF TWO OR MORE FILES

Add Cases
– Add Cases merges the working data file with a second data file that contains the same variables but different cases.
– Conditions:
• The variable names should be the same in both files.
• The type of each shared variable should also be the same.
• The width of shared string variables should be equal.

Add Variables
– Add Variables merges the working data file with an external data file that contains the same cases but different variables.
– Conditions:
• Both data files should be sorted on the key variable.
• The key variable is used to match the cases, so both files should be sorted in ascending order on it.
• Variables in the external file that have the same names as variables in the working file are excluded; the working file's versions are kept.
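The two merge types can be sketched in Python. This is an illustration, not the software's algorithm: the data are hypothetical, and the Add Variables step uses a dictionary lookup on the key variable, which (unlike the dialog described above) does not need the files to be sorted.

```python
# Hypothetical working file and second file with the same variables.
working = [{"id": 1, "age": 25}, {"id": 2, "age": 31}]
second = [{"id": 3, "age": 40}]

# Add Cases: both files must hold the same variables; rows are appended.
if set(working[0]) != set(second[0]):
    raise ValueError("Add Cases needs the same variables in both files")
added_cases = working + second

# Hypothetical external file: same cases, a new variable ("score"),
# plus a duplicate variable name ("age") that should be ignored.
external = [{"id": 1, "score": 80, "age": 99}, {"id": 2, "score": 75, "age": 99}]

# Add Variables: match cases on the key variable "id".
lookup = {row["id"]: row for row in external}
added_vars = []
for row in working:
    extra = lookup.get(row["id"], {})
    # Working-file values override duplicate names from the external file.
    added_vars.append({**extra, **row})
```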

DATA TRANSFORMATION

Data transformation means changing the values of an existing variable, or creating a new variable on the basis of one or more existing variables. All data transformation commands are listed under the Transform menu.
– Recode
– Compute
– Replace Missing Values


RECODE PROCEDURE

• Recode is a very useful command. Using it, you can handle missing values or create new categorical/ordinal variables.
• You can modify data values by recoding them. This is particularly useful for collapsing or combining categories.
• You can recode the values within an existing variable, or you can create new variables based on the recoded values of an existing variable.

• RECODE INTO SAME VARIABLES
– Used to reassign the values of existing variables or to combine ranges of existing values into new values.
– You can recode numeric and string variables. If you select multiple variables, they must all be of the same type, either numeric or string.
• RECODE INTO DIFFERENT VARIABLES
– Used to reassign the values of existing variables or to collapse ranges of existing values into new values for a new variable.
– If you select multiple variables, they must all be of the same type.
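The difference between the two recode options can be sketched in Python. The example is hypothetical: the score variable, its cut points, and the category labels are assumptions made for illustration, and missing values are represented by None.

```python
def recode_score(score):
    """Collapse a 0-100 score into three ordered categories."""
    if score is None:
        return None   # a missing value stays missing
    if score < 50:
        return 1      # Low
    if score < 75:
        return 2      # Medium
    return 3          # High

labels = {1: "Low", 2: "Medium", 3: "High"}
cases = [{"score": 42}, {"score": 68}, {"score": 91}, {"score": None}]

# Recode into a different variable: "score" is kept, "score_cat" is added.
for case in cases:
    case["score_cat"] = recode_score(case["score"])

categories = [case["score_cat"] for case in cases]
category_labels = [labels.get(c) for c in categories]
```

Recoding into the same variable would instead assign the result back to case["score"], which overwrites the original values.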

COMPUTE PROCEDURE

Before you can begin to analyze your data, you may have to rearrange them. For example, your data may consist of different scales or test items, and you may want the total scores on them.
– You can create a new variable by calculating it from the values of existing variables, e.g. the total score on specific test items or the scores on subscales.
– You can compute values selectively for subsets of data based on logical conditions.
– You can use over 70 built-in functions, including arithmetic functions, statistical functions, distribution functions, and string functions, for the data transformation.
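Both uses of Compute mentioned above, a total score and a value computed only for a subset of cases, can be sketched in Python. The item names, the group variable, and the data are hypothetical.

```python
# Hypothetical respondents with three test items and a grouping variable.
respondents = [
    {"item1": 3, "item2": 4, "item3": 5, "group": "A"},
    {"item1": 1, "item2": 2, "item3": 2, "group": "B"},
]

for r in respondents:
    # New variable computed from existing ones: the total score.
    r["total"] = r["item1"] + r["item2"] + r["item3"]

    # Conditional compute: only group A gets a value; others stay missing.
    r["mean_a"] = r["total"] / 3 if r["group"] == "A" else None

totals = [r["total"] for r in respondents]
means_a = [r["mean_a"] for r in respondents]
```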


Transforming data
• Click ‘Transform’ and then click ‘Compute Variable…’.

Transforming data (cont’d)
• Enter the new variable name in “Target Variable”, build the arithmetic expression in “Numeric Expression” by using the calculator pad, and then click OK.
• A new variable will be added to the data file as the last column.

REPLACING MISSING VALUES

• Missing observations can be problematic in analysis, and some time series measures cannot be computed if there are missing values in the series. Sometimes the value for a particular observation is simply not known.
• The Replace Missing Values dialog box allows you to create new variables from existing ones, replacing missing values with estimates computed with one of several methods.
• Default new variable names are the first six characters of the existing variable used to create it, followed by an underscore and a sequential number. For example, for the variable price, the new variable name would be price_1. The new variables retain any defined value labels from the original variables.

Estimation Methods for Replacing Missing Values

• Series mean
– Replaces missing values with the mean for the entire series.
• Mean of nearby points
– Replaces missing values with the mean of valid surrounding values. The span of nearby points is the number of valid values above and below the missing value used to compute the mean.
• Median of nearby points
– Replaces missing values with the median of valid surrounding values. The span of nearby points is the number of valid values above and below the missing value used to compute the median.
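The three estimation methods can be sketched in Python as follows. This is an illustration under assumptions rather than the software's implementation: the series is hypothetical, missing values are represented by None, and when fewer valid neighbours exist than the span asks for, the sketch simply uses the ones it finds.

```python
from statistics import mean, median

def series_mean(series):
    """Replace each missing value with the mean of all valid values."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def nearby(series, index, span):
    """Up to `span` valid values above and below position `index`."""
    above = [v for v in reversed(series[:index]) if v is not None][:span]
    below = [v for v in series[index + 1:] if v is not None][:span]
    return above + below

def fill_nearby(series, span, stat):
    """Replace missing values with `stat` (mean or median) of nearby points."""
    filled = []
    for i, v in enumerate(series):
        if v is None:
            neighbours = nearby(series, i, span)
            v = stat(neighbours) if neighbours else None
        filled.append(v)
    return filled

price = [10, 12, None, 20, 50, 70]           # hypothetical series
price_1 = series_mean(price)                 # series mean
price_2 = fill_nearby(price, 2, mean)        # mean of nearby points, span 2
price_3 = fill_nearby(price, 2, median)      # median of nearby points, span 2
```

With this series the three methods give different estimates for the missing value, because the series mean is pulled up by the large values at the end while the nearby-point methods only look at the two valid values on each side.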


LAB ACTIVITY FOR DATA MANIPULATION & TRANSFORMATION

USE THE “GSS93 for Fall Semester 2023.sav” DATA FILE TO ANSWER THE FOLLOWING QUESTIONS.

• Create four age categories from the variable “Age of Respondent” (18-29 years, 30-39 years, 40-49 years, 50 and above) by using the Recode into Different Variables method.
• Compute the family size by adding the following variables:
Household Members Less Than 6 Yrs Old
Household Members 6 Thru 12 Yrs Old
Household Members 13 Thru 17 Yrs Old
Household Members 18 Yrs and Older
• Divide Respondent Socioeconomic Index into three equal groups and assign the value labels 1 “Low SES”, 2 “Middle SES”, and 3 “High SES”.
• Select from the data file the sample of never-married women who belong to High SES and whose age is greater than 25 years. Save the selected cases to a new data file named after yourself.
