
Transformation 1

This document discusses data transformations that can be used to satisfy assumptions of normality, homogeneity of variance, and linearity when conducting statistical analyses. It describes four common transformations: logarithmic, square root, inverse, and square transformations. The document explains how to compute each transformation in SPSS and discusses how the transformations adjust variable values depending on whether the original variable is positively or negatively skewed. Examples are provided to illustrate how to apply the transformations.

Uploaded by Mina Bhatta
Copyright © All Rights Reserved

SW388R7 Data Analysis & Computers II

Slide 1
Computing Transformations

- Transforming variables
- Transformations for normality
- Transformations for linearity



Slide 2
Transformations: Transforming variables to satisfy assumptions

- When a metric variable fails to satisfy the assumption of normality, homogeneity of variance, or linearity, we may be able to correct the deficiency by using a transformation.
- We will consider three transformations for normality, homogeneity of variance, and linearity:
  - the logarithmic transformation,
  - the square root transformation, and
  - the inverse transformation,
- plus a fourth that may be useful for problems of linearity:
  - the square transformation.

Slide 3
Transformations change the measurement scale

In the diagram to the right, the values 5 through 20 are plotted on the different scales used in the transformations. These scales would be used for the horizontal axis of a histogram depicting the distribution.

When we compare these scales with the ordinary decimal scale to which we are accustomed, we see that each transformation changes the distance between the benchmark measurements. The logarithmic, square root, and inverse transformations all stretch the distances between small values relative to the distances between large values, compressing the upper end of the scale. This pulls the long right tail of a positively skewed distribution in toward the rest of the values, reducing the effect of the skewing and producing a distribution that more closely resembles a normal distribution.
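
The effect on spacing can be checked numerically. The Python sketch below is only an illustration (Python stands in for SPSS, and `TRANSFORMS` and `gap` are hypothetical names). It compares the distance between 5 and 6 with the distance between 19 and 20 on the logarithmic, square root, and inverse scales.

```python
import math

# The three transformations for normality, written as plain functions.
# -1/x is used for the inverse so that larger values stay larger.
TRANSFORMS = {
    "log10": math.log10,
    "sqrt": math.sqrt,
    "inverse": lambda x: -1.0 / x,
}

def gap(f, a, b):
    """Distance between a and b after both are transformed by f."""
    return f(b) - f(a)

for name, f in TRANSFORMS.items():
    small = gap(f, 5, 6)    # spacing between two small values
    large = gap(f, 19, 20)  # spacing between two large values
    # On the ordinary decimal scale both gaps equal 1, so the ratio would be 1.
    print(f"{name:8s} gap(5,6)={small:.4f}  gap(19,20)={large:.4f}  ratio={small / large:.2f}")
```

Every ratio comes out greater than 1. It is largest for the inverse and smallest for the square root, so each scale spreads small values apart relative to large ones.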

Slide 4
Transformations: Computing transformations in SPSS

- In SPSS, transformations are obtained by computing a new variable. SPSS functions are available for the logarithmic (LG10) and square root (SQRT) transformations. The inverse transformation uses a formula that divides negative one by the (adjusted) value for each case. The negative sign keeps the cases in their original order.
- For each of these calculations, some data values may not be mathematically permissible. For example, the log of zero is not defined, division by zero is not permitted, and the square root of a negative number results in an “imaginary” value. We will usually adjust the values passed to the function to make certain that these illegal operations do not occur.
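
The following is a minimal Python sketch of the permissibility problem. It assumes Python's `math` module as a stand-in for the SPSS functions, and `safe_apply` is a hypothetical helper. In Python, each illegal operation raises an exception rather than returning a value. Shifting the argument so that its smallest value is 1 avoids all three problems.

```python
import math

def safe_apply(f, x):
    """Return f(x), or None if the operation is not mathematically permitted."""
    try:
        return f(x)
    except (ValueError, ZeroDivisionError):
        return None

log_of_zero = safe_apply(math.log10, 0)            # log of zero is undefined
inverse_of_zero = safe_apply(lambda x: -1 / x, 0)  # division by zero
sqrt_of_negative = safe_apply(math.sqrt, -4)       # would be an imaginary value

# Adjusted arguments: minimum 0 -> add 1; minimum -4 -> add 4 + 1.
adjusted = [safe_apply(math.log10, 0 + 1),
            safe_apply(lambda x: -1 / x, 0 + 1),
            safe_apply(math.sqrt, -4 + 5)]

print(log_of_zero, inverse_of_zero, sqrt_of_negative)  # None None None
print(adjusted)                                        # [0.0, -1.0, 1.0]
```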

Slide 5
Transformations: Two forms for computing transformations

- There are two forms for each of the transformations to induce normality, depending on whether the distribution is skewed negatively to the left or skewed positively to the right.
- Both forms use the same SPSS functions and formula to calculate the transformations.
- The two forms differ in the value, or argument, passed to the functions and formula. The argument is an adjustment to the original value of the variable. It makes certain that all of the calculations are mathematically correct and, for a negatively skewed variable, also reverses the values.

Slide 6
Transformations: Functions and formulas for transformations

- Symbolically, if we let x stand for the argument passed to the function or formula, the calculations for the transformations are:
  - Logarithmic transformation: compute log = LG10(x).
  - Square root transformation: compute sqrt = SQRT(x).
  - Inverse transformation: compute inv = -1 / (x).
  - Square transformation: compute s2 = x * x.
- For the logarithmic, square root, and inverse transformations, the argument must be greater than zero to guarantee that the calculations are mathematically legitimate. For the square transformation, the argument may be zero but should not be negative.
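
The four formulas can be sketched as Python functions that check their argument before computing. The function names are hypothetical, and Python is only a stand-in for the SPSS COMPUTE statements. The checks follow the mathematics: the logarithm and the inverse need an argument above zero, while the square root and the square accept zero but not negative values. Requiring a positive argument is therefore a safe convention for all four.

```python
import math

def log_transform(x):
    """compute log = LG10(x): base-10 logarithm, defined only for x > 0."""
    if x <= 0:
        raise ValueError("LG10 needs an argument greater than zero")
    return math.log10(x)

def sqrt_transform(x):
    """compute sqrt = SQRT(x): defined for x >= 0."""
    if x < 0:
        raise ValueError("SQRT needs an argument that is not negative")
    return math.sqrt(x)

def inverse_transform(x):
    """compute inv = -1 / (x): x must not be zero; x > 0 also keeps cases in order."""
    if x <= 0:
        raise ValueError("the inverse needs an argument greater than zero")
    return -1.0 / x

def square_transform(x):
    """compute s2 = x * x: negative x is avoided so that -4 and 4 stay distinct."""
    if x < 0:
        raise ValueError("shift negative values before squaring")
    return x * x

for x in (1, 4, 10):
    print(x, log_transform(x), sqrt_transform(x), inverse_transform(x), square_transform(x))
```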

Slide 7
Transformations: Transformation of positively skewed variables

- For positively skewed variables, the argument is an adjustment to the original value based on the minimum value for the variable.
- If the minimum value for a variable is zero, the adjustment requires that we add one to each value, e.g. x + 1.
- If the minimum value for a variable is a negative number (e.g. -6), the adjustment requires that we add the absolute value of the minimum value (e.g. 6) plus one (e.g. x + 6 + 1, which equals x + 7).

Slide 8
Transformations: Example of a positively skewed variable

- Suppose our dataset contains the number of books read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the distribution is positively skewed.
- The minimum value for the variable books is 0. The adjustment for each case is books + 1.
- The transformations would be calculated as follows:
  - Compute logBooks = LG10(books + 1).
  - Compute sqrBooks = SQRT(books + 1).
  - Compute invBooks = -1 / (books + 1).
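
The adjustment rule and the books example can be sketched in Python as follows. `positive_skew_argument` is a hypothetical helper. When the minimum is zero or below, adding the absolute value of the minimum plus one is the same as computing x - minimum + 1, so the smallest argument becomes 1. The slides do not say how to treat a variable whose minimum is already above zero; the sketch assumes it is left unchanged.

```python
import math

def positive_skew_argument(values):
    """Shift values so the smallest argument is 1:
    minimum 0 -> add 1; negative minimum -> add abs(minimum) + 1.
    Assumption: a minimum already above zero is left unchanged."""
    lowest = min(values)
    if lowest > 0:
        return list(values)
    constant = abs(lowest) + 1
    return [x + constant for x in values]

books = [1, 3, 0, 5, 2]
arg = positive_skew_argument(books)       # books + 1 -> [2, 4, 1, 6, 3]
log_books = [math.log10(a) for a in arg]  # Compute logBooks = LG10(books + 1)
sqr_books = [math.sqrt(a) for a in arg]   # Compute sqrBooks = SQRT(books + 1)
inv_books = [-1 / a for a in arg]         # Compute invBooks = -1 / (books + 1)

print(arg)                               # [2, 4, 1, 6, 3]
print([round(v, 3) for v in log_books])  # [0.301, 0.602, 0.0, 0.778, 0.477]
print([round(v, 3) for v in sqr_books])  # [1.414, 2.0, 1.0, 2.449, 1.732]
print([round(v, 3) for v in inv_books])  # [-0.5, -0.25, -1.0, -0.167, -0.333]
```

The same helper turns the slide's negative-minimum case into x + 7: for example, `positive_skew_argument([-6, 0, 4])` gives `[1, 7, 11]`.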



Slide 9
Transformations: Transformation of negatively skewed variables

- If the distribution of a variable is negatively skewed, the adjustment of the values reverses, or reflects, the distribution so that it becomes positively skewed. The transformations are then computed on the values in the positively skewed distribution.
- Reflection is computed by subtracting each value of the variable from one plus the absolute value of the variable's maximum value. This results in a positively skewed distribution with all values larger than zero.
- When an analysis uses a transformation involving reflection, we must remember that reflection reverses the direction of every relationship in which the variable is involved. Our interpretation of relationships must be adjusted accordingly.
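
The following Python sketch illustrates that reflection reverses relationships, using made-up paired scores. The data and the helper names `reflect` and `pearson_r` are hypothetical. Reflection follows the rule above and subtracts each value from one plus the absolute value of the maximum. Because reflection is a linear change with slope -1, the correlation keeps its size but changes sign.

```python
import math

def reflect(values):
    """Subtract each value from abs(maximum) + 1; every result is at least 1."""
    constant = abs(max(values)) + 1
    return [constant - v for v in values]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores, made up for illustration only.
x = [2, 7, 8, 9, 9, 10]
y = [1, 4, 5, 7, 6, 9]

r_original = pearson_r(x, y)
r_reflected = pearson_r(reflect(x), y)
print(reflect(x))                                   # [9, 4, 3, 2, 2, 1]
print(round(r_original, 3), round(r_reflected, 3))  # same size, opposite sign
```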

Slide 10
Transformations: Example of a negatively skewed variable

- Suppose our dataset contains the number of books read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the distribution is negatively skewed.
- The maximum value for the variable books is 5, so the reflection constant is 5 + 1 = 6. The adjustment for each case is 6 - books.
- The transformations would be calculated as follows:
  - Compute logBooks = LG10(6 - books).
  - Compute sqrBooks = SQRT(6 - books).
  - Compute invBooks = -1 / (6 - books).
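
The negatively skewed books example can be sketched in Python in the same way. This is an illustration, not SPSS output. Notice that the subject who read the most books (5) ends up with the smallest reflected value (1), which is the reversal of direction described earlier.

```python
import math

books = [1, 3, 0, 5, 2]
constant = abs(max(books)) + 1             # maximum is 5, so the constant is 6
reflected = [constant - b for b in books]  # 6 - books -> [5, 3, 6, 1, 4]

log_books = [math.log10(v) for v in reflected]  # Compute logBooks = LG10(6 - books)
sqr_books = [math.sqrt(v) for v in reflected]   # Compute sqrBooks = SQRT(6 - books)
inv_books = [-1 / v for v in reflected]         # Compute invBooks = -1 / (6 - books)

print(reflected)                         # [5, 3, 6, 1, 4]
print([round(v, 3) for v in log_books])  # [0.699, 0.477, 0.778, 0.0, 0.602]
```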



Slide 11
Transformations: The square transformation for linearity

- The square transformation is computed by multiplying the value for the variable by itself.
- It does not matter whether the distribution is positively or negatively skewed.
- It does matter if the variable has negative values, since we would not be able to distinguish their squares from the squares of the corresponding positive values (e.g. the square of -4 is equal to the square of +4). If the variable has negative values, we add the absolute value of the minimum value to each score before squaring it.

Slide 12
Transformations: Example of the square transformation

- Suppose our dataset contains change scores (chg) for 5 subjects that indicate the difference between test scores at the end of a semester and test scores at mid-term: -10, 0, 10, 20, and 30.
- The minimum score is -10. The absolute value of the minimum score is 10.
- The transformation would be calculated as follows:
  - Compute squarChg = (chg + 10) * (chg + 10).
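
Here is a Python sketch of the square transformation with the negative-value adjustment, applied to the change scores from the example. `square_argument` is a hypothetical helper. The sketch also shows the problem that the adjustment avoids: -4 and +4 have the same square.

```python
def square_argument(values):
    """Add abs(minimum) to every value when the minimum is negative,
    so that no two distinct scores share a square. Zero needs no adjustment."""
    lowest = min(values)
    shift = abs(lowest) if lowest < 0 else 0
    return [v + shift for v in values]

# Without adjustment, -4 and +4 collapse to the same square.
print((-4) * (-4) == 4 * 4)  # True

chg = [-10, 0, 10, 20, 30]         # change scores from the example
arg = square_argument(chg)         # chg + 10 -> [0, 10, 20, 30, 40]
square_chg = [a * a for a in arg]  # Compute squarChg = (chg + 10) * (chg + 10)
print(square_chg)                  # [0, 100, 400, 900, 1600]
```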

Slide 13
Transformations: Transformations for normality

Both the histogram and the normality plot for Total Time Spent on the Internet (netime) indicate that the variable is not normally distributed.

[Figure: Histogram of TOTAL TIME SPENT ON THE INTERNET (Mean = 10.7, Std. Dev = 15.35, N = 93) and Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET (Observed Value vs. Expected Normal)]



Slide 14
Transformations: Determine whether reflection is required

Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                        Statistic   Std. Error
Mean                                    10.73       1.59
95% Confidence Interval    Lower Bound  7.57
for Mean                   Upper Bound  13.89
5% Trimmed Mean                         8.29
Median                                  5.50
Variance                                235.655
Std. Deviation                          15.35
Minimum                                 0
Maximum                                 102
Range                                   102
Interquartile Range                     10.20
Skewness                                3.532       .250
Kurtosis                                15.614      .495

Skewness, in the table of Descriptive Statistics, indicates whether or not reflection (reversing the values) is required in the transformation.

If Skewness is positive, as it is in this problem, reflection is not required. If Skewness is negative, reflection is required.
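
The decision rule can be sketched in Python. The `skewness` function below computes the adjusted Fisher-Pearson coefficient G1. This sketch assumes, without confirming, that SPSS reports this same statistic, so treat it as an approximation of the Descriptives output. The books values from the earlier example are used because the individual netime values are not listed.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness:
    G1 = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3),
    where s is the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def needs_reflection(values):
    """Positive skew: no reflection. Negative skew: reflect before transforming."""
    return skewness(values) < 0

books = [1, 3, 0, 5, 2]
print(round(skewness(books), 3))  # 0.59
print(needs_reflection(books))    # False
```

Reflecting a variable flips the sign of its skewness: the reflected books values, 6 - books, have a skewness of about -0.59.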

Slide 15
Transformations: Compute the adjustment to the argument

Descriptives: TOTAL TIME SPENT ON THE INTERNET (excerpt)
Minimum    0
Maximum    102

In this problem, the minimum value is 0, so 1 will be added to each value in the formula. That is, the argument to the SPSS functions and to the formula for the inverse will be:

netime + 1.

Slide 16
Transformations: Computing the logarithmic transformation

To compute the transformation, select the Compute… command from the Transform menu.

Slide 17
Transformations: Specifying the transform variable name and function

First, in the Target Variable text box, type a name for the log transformation variable, e.g. “lgnetime”.

Second, scroll down the list of functions to find LG10, which calculates base-10 logarithms. (The logarithm is the power to which 10 is raised to produce the original number.)

Third, click on the up arrow button to move the highlighted function to the Numeric Expression text box.

Slide 18
Transformations: Adding the variable name to the function

First, scroll down the list of variables to locate the variable we want to transform. Click on its name so that it is highlighted.

Second, click on the right arrow button. SPSS will replace the highlighted text in the function (?) with the name of the variable.

Slide 19
Transformations: Adding the constant to the function

Following the rules stated for determining the constant that needs to be included in the function, either to prevent mathematical errors or to do reflection, we include the constant in the function argument. In this case, we add 1 to the netime variable.

Click on the OK button to complete the compute request.

Slide 20
Transformations: The transformed variable

The transformed variable that we requested SPSS to compute is shown in the data editor in a column to the right of the other variables in the dataset.
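
The result of the compute request can be mimicked in Python, with an insertion-ordered dictionary of columns standing in for the data editor. The netime values are hypothetical, apart from the minimum (0) and maximum (102) reported in the Descriptives table. In SPSS syntax, the dialog steps correspond to `COMPUTE lgnetime = LG10(netime + 1).` followed by `EXECUTE.`

```python
import math

# A tiny stand-in for the SPSS data editor: variable name -> column of values.
# Python dicts keep insertion order, so a new variable lands in the last column.
# The netime values are hypothetical (only 0 and 102 come from the Descriptives table).
dataset = {
    "id": [1, 2, 3, 4],
    "netime": [0, 3, 12, 102],
}

# Equivalent of COMPUTE lgnetime = LG10(netime + 1).
dataset["lgnetime"] = [math.log10(t + 1) for t in dataset["netime"]]

print(list(dataset))  # ['id', 'netime', 'lgnetime']
```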

Slide 21
Transformations: Computing the square root transformation

To compute the transformation, select the Compute… command from the Transform menu.

Slide 22
Transformations: Specifying the transform variable name and function

First, in the Target Variable text box, type a name for the square root transformation variable, e.g. “sqnetime”.

Second, scroll down the list of functions to find SQRT, which calculates the square root of a variable.

Third, click on the up arrow button to move the highlighted function to the Numeric Expression text box.

Slide 23
Transformations: Adding the variable name to the function

First, scroll down the list of variables to locate the variable we want to transform. Click on its name so that it is highlighted.

Second, click on the right arrow button. SPSS will replace the highlighted text in the function (?) with the name of the variable.

Slide 24
Transformations: Adding the constant to the function

Following the rules stated for determining the constant that needs to be included in the function, either to prevent mathematical errors or to do reflection, we include the constant in the function argument. In this case, we add 1 to the netime variable.

Click on the OK button to complete the compute request.

Slide 25
Transformations: The transformed variable

The transformed variable that we requested SPSS to compute is shown in the data editor in a column to the right of the other variables in the dataset.

Slide 26
Transformations: Computing the inverse transformation

To compute the transformation, select the Compute… command from the Transform menu.

Slide 27
Transformations: Specifying the transform variable name and formula

First, in the Target Variable text box, type a name for the inverse transformation variable, e.g. “innetime”.

Second, there is no function for computing the inverse, so we type the formula directly into the Numeric Expression text box: -1 / (netime + 1).

Third, click on the OK button to complete the compute request.
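
A short Python sketch shows why the inverse formula divides -1 rather than 1. It uses hypothetical netime values sorted from smallest to largest and assumes the argument netime + 1 determined earlier. Dividing 1 by the argument reverses the order of the cases, while dividing -1 preserves it, so relationships keep their original direction.

```python
netime = [0, 3, 12, 102]  # hypothetical values, sorted from smallest to largest

plain = [1 / (t + 1) for t in netime]     # 1 / (netime + 1)
negated = [-1 / (t + 1) for t in netime]  # -1 / (netime + 1)

print(plain == sorted(plain, reverse=True))  # True: 1/x reverses the order of cases
print(negated == sorted(negated))            # True: -1/x keeps the original order
```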

Slide 28
Transformations: The transformed variable

The transformed variable that we requested SPSS to compute is shown in the data editor in a column to the right of the other variables in the dataset.

Slide 29
Transformations: Adjustment to the argument for the square transformation

It is mathematically correct to square a value of zero, so the adjustment to the argument for the square transformation is different. What we need to avoid are negative numbers, since the square of a negative number produces the same value as the square of the corresponding positive number.

Descriptives: TOTAL TIME SPENT ON THE INTERNET (excerpt)
Minimum    0
Maximum    102

In this problem, the minimum value is 0, so no adjustment is needed for computing the square. If the minimum were a number less than zero, we would add the absolute value of the minimum (dropping the sign) to the variable as an adjustment.

Slide 30
Transformations: Computing the square transformation

To compute the transformation, select the Compute… command from the Transform menu.

Slide 31
Transformations: Specifying the transform variable name and formula

First, in the Target Variable text box, type a name for the square transformation variable, e.g. “s2netime”.

Second, there is no function for computing the square, so we type the formula directly into the Numeric Expression text box: netime * netime.

Third, click on the OK button to complete the compute request.

Slide 32
Transformations: The transformed variable

The transformed variable that we requested SPSS to compute is shown in the data editor in a column to the right of the other variables in the dataset.

Slide 33
Using the script to compute transformations

When the script tests assumptions, it will create the transformations that are checked.

If you want to retain the transformed variable to use in an analysis, clear the checkbox that tells the script to delete the transformed variables it created.

Slide 34
The transformed variables

The transformed variables are added to the data editor. The variable names attempt to identify the transformation.

The variable labels fully identify the transformation, including the function and formula used to compute it.

Slide 35
Which transformation to use

The recommendation of which transformation to use is often summarized in a pictorial chart like the one above. In practice, it is difficult to determine which distribution is most like your variable. It is often more efficient to compute all of the transformations and examine the statistical properties of each.
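
Computing every transformation and examining each can be sketched in Python. The code below computes the logarithmic, square root, and inverse versions of a made-up, positively skewed sample and reports the skewness of each. Picking a winner by skewness alone is an assumption of this sketch; in practice you would also inspect histograms and normality plots. The square transformation is left out because it targets linearity rather than normality.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def compare_transformations(values):
    """Transform a positively skewed variable every way and return the
    skewness of each version, including the untransformed values."""
    lowest = min(values)
    constant = abs(lowest) + 1 if lowest <= 0 else 0  # minimum-based adjustment
    arg = [v + constant for v in values]
    versions = {
        "raw": list(values),
        "log10": [math.log10(a) for a in arg],
        "sqrt": [math.sqrt(a) for a in arg],
        "inverse": [-1 / a for a in arg],
    }
    return {name: skewness(v) for name, v in versions.items()}

# Hypothetical, positively skewed sample for illustration only.
sample = [0, 1, 1, 2, 2, 3, 5, 8, 20, 60]
results = compare_transformations(sample)
best = min(results, key=lambda name: abs(results[name]))
for name, g1 in results.items():
    print(f"{name:8s} skewness = {g1:.3f}")
print("closest to symmetric:", best)
```

A transformation can also overcorrect. With this sample, the inverse turns the long right tail into a long left tail, which is why the sketch compares the absolute size of the skewness rather than its sign.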
