SPSS Notes

1. Basic definitions
Descriptive statistics are statistics that describe a variable's central tendency and dispersion.
Central tendency represents the values contained in the center of the data, such as the Mean, the
Median, the Mode, etc.
Mean = average value.
Median = middle value when data set is ordered from the smallest to the greatest value.
Mode = most occurring value.
Dispersion represents the distribution of the variable's responses.
Minimum = smallest value.
Maximum = greatest value.
Range = Maximum – Minimum.
Standard deviation (écart-type) = spread of data around the Mean, aka how far away values tend
to deviate from the Mean. For example, if the Mean = 18 and the Std. Dev. = 3.8, then most of the
data spreads between 14.2 and 21.8 (Mean ± 1 Std. Dev.).
Outliers (Extreme Values) = observations that are distant from the other observations in the
dataset. In the Explore output, SPSS lists the 5 lowest and the 5 highest values as Extreme Values.
A Percentile is the value below which a given percentage of observations in a group of
observations falls. For example, the 25th percentile is the value below which 25% of the
observations may be found. If the Weighted Average of the 25th Percentile is 164cm, then 25% of
the population has a height below 164cm.
A Confidence Interval has a Lower Limit (or Bound) and an Upper Limit. When a 95%
Confidence Interval is bound between 164 (Lower) and 167 (Upper), this means I’m 95%
confident that, if I revisit the same or a similar sample, the Mean will be situated between 164
and 167.
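As a rough illustration of these definitions outside SPSS, here is a minimal Python sketch (numpy/scipy) that computes the same quantities on an invented sample of heights; the data and the resulting values are assumptions for the example, not real measurements:

```python
import numpy as np
from scipy import stats

# Invented sample of heights in cm, purely for illustration
heights = np.array([158, 160, 162, 164, 164, 165, 166, 167, 168, 170, 172, 175])

mean = heights.mean()                          # Mean = average value
median = np.median(heights)                    # Median = middle value of the ordered data
values, counts = np.unique(heights, return_counts=True)
mode = values[counts.argmax()]                 # Mode = most occurring value
minimum, maximum = heights.min(), heights.max()
data_range = maximum - minimum                 # Range = Maximum - Minimum
std_dev = heights.std(ddof=1)                  # sample standard deviation, as SPSS reports it
p25 = np.percentile(heights, 25)               # 25th percentile: 25% of the heights fall below it

# 95% confidence interval for the mean, based on the t distribution
ci_low, ci_high = stats.t.interval(0.95, len(heights) - 1,
                                   loc=mean, scale=stats.sem(heights))

print(mean, median, mode, data_range, std_dev, p25, (ci_low, ci_high))
```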

2. Variable measures
 Scale: values represent ordered categories with a meaningful metric, so that distance
comparisons between values are appropriate. Examples of scale variables include age
in years, income, or test score. For example, if each of the 60 students in a classroom takes
a test, the score is a scale variable, and I can use it to determine the average score for the
class, the highest and lowest scores, etc.
 Nominal: values represent categories with no intrinsic ranking. Where do you live? 1-
Suburbs, 2- City, 3- Town. What is your gender? M- Male, F-Female.
 Ordinal: values represent categories with some intrinsic ranking. How satisfied are
you with our services? Very Unsatisfied – 1, Unsatisfied – 2, Neutral – 3, Satisfied –
4, Very Satisfied – 5.

3. Transform tab
This tab helps me transform data into meaningful values.
 Compute Variable acts like a calculator. The Compute Variable window lists various
functions, organized by function group; the Statistical group contains the Mean, the Median,
the Mode, etc. The resulting value is stored in a new variable that appears in the SPSS data
table.
 Recode into Different Variables: it transforms an original variable into a new
variable, but the changes do not overwrite the original variable; they are instead
applied to a copy of the original variable under a new variable name. I can also go to
Old and New Variables to specify how I wish to recode the values for the selected
variable as a way to classify data.
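Outside SPSS, roughly the same Compute Variable and Recode into Different Variables logic can be sketched with pandas; the column names, values, and age groups below are invented for the illustration:

```python
import pandas as pd

# Illustrative dataset; the column names are invented for this example
df = pd.DataFrame({"score1": [12, 15, 9, 18],
                   "score2": [14, 11, 16, 10],
                   "age": [19, 34, 52, 41]})

# Compute Variable: build a new variable from existing ones (here, the mean of two scores)
df["mean_score"] = df[["score1", "score2"]].mean(axis=1)

# Recode into Different Variables: classify age into groups under a NEW column,
# leaving the original "age" column untouched
bins = [0, 29, 49, 120]
labels = [1, 2, 3]            # 1 = young, 2 = middle-aged, 3 = older
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels)

print(df)
```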

4. Edit tab
This tab helps me easily navigate my data and make changes.
 Go to Variable.
 Go to Case.
 Find and Replace: it’s used, for example, when I want to replace all the 5s in my
dataset with 6s instead.

5. Data tab
 Sort Cases.
 Sort Variables: I can put variables in an Ascending or a Descending order. I can also
select Transpose, aka turn rows into columns and vice versa.
 Select Cases: a window will open, from which I can do many things. For example, if I
want to focus only on the data regarding females, I can insert an "if" condition that says
"value = 2" (where 2 = Female), and SPSS will select only the cases regarding females.
I can also select Random Sample of Cases and SPSS will choose cases randomly for me
(for purposes of sampling).
 Split File: for example, I can separate the data of males and females and display the
resulting data as either separate groups or comparative groups.
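The same Select Cases and Split File logic can be sketched outside SPSS with pandas; the column names and the gender coding (2 = Female, as in the example above) are assumptions for illustration:

```python
import pandas as pd

# Invented data: gender coded 1 = Male, 2 = Female
df = pd.DataFrame({"gender": [1, 2, 2, 1, 2],
                   "height": [178, 165, 162, 181, 168]})

# Select Cases with an "if" condition: keep only the females (gender = 2)
females = df[df["gender"] == 2]

# Select a random sample of cases (here 3 cases, for sampling purposes)
sample = df.sample(n=3, random_state=1)

# Split File: compute the same statistics separately for each gender group
by_gender = df.groupby("gender")["height"].describe()

print(females, sample, by_gender, sep="\n\n")
```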

6. Analyze tab and Output


I go to Descriptive Statistics, then to:
 Frequencies. I add the variables I want in my Output, and I tick Display frequency tables.
In the Statistics window, I tick everything in Central tendency and in Dispersion except
for the S.E. mean. In the Charts window, I tick Histogram and Show normal curve on
histogram.
 Descriptives. I add my variables, then I go to Options to select my functions. Here I can
also change the percentage of my Confidence Interval.
 Explore. I add my variables to the Dependent list, then I go to Options. I tick the Mean,
the Sum, everything in Distribution except for the S.E. mean, and the Variable list. I can
also go to Statistics, and I tick Descriptives, Outliers and Percentiles. In Plots, I un-tick
Stem-and-leaf.

7. Output: definitions
In a frequency table, the Frequency column reports the number of cases that fall into each
category of the variable being analyzed (number of times something occurs in the population).
The Percent column provides a percentage of the total cases that fall into each category.
The Valid Percent column is a percentage that does not include missing cases.
The Cumulative Percent column adds the percentages of each category from the top of the table
to the bottom, culminating in 100%. This is more useful when the variable of analysis is ranked
or ordinal, as it makes it easy to get a sense of what percentage of cases fall below each rank, aka
percentiles.
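To make the four columns concrete, here is a small pandas sketch with an invented satisfaction variable (coded 1–5, with one missing answer); it only illustrates the arithmetic behind the frequency table, not actual SPSS output:

```python
import pandas as pd
import numpy as np

# Invented satisfaction ratings (1-5) with one missing answer
satisfaction = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, np.nan])

n_total = len(satisfaction)                       # all cases, missing included
valid = satisfaction.value_counts().sort_index()  # Frequency per category (missing excluded)

table = pd.DataFrame({
    "Frequency": valid,
    "Percent": valid / n_total * 100,             # share of ALL cases
    "Valid Percent": valid / valid.sum() * 100,   # share of non-missing cases only
})
table["Cumulative Percent"] = table["Valid Percent"].cumsum()   # adds up to 100%

print(table)
print("Missing cases:", satisfaction.isna().sum())
```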

8. Missing values
There are 2 types of missing values: system and user.
System missing values are values that are completely absent from the data. They are shown as
periods (.) in data view. System missing values are only found in numeric variables. A
respondent skipped some questions, some values weren't recorded, etc.
User missing values are values that the SPSS user specifically excludes. For categorical
variables, answers such as “don't know” or “no answer” are typically excluded from analysis. For
metric variables, unlikely values (a reaction time of 50ms or a monthly salary of € 9,999,999) are
usually set as user missing. These values remain visible in data view but are excluded from analyses.
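The distinction can be imitated outside SPSS; in the pandas sketch below, NaN plays the role of a system missing value, and the code 9999999 is treated as a user missing value (both the data and the code are assumptions for the example):

```python
import pandas as pd
import numpy as np

# NaN = system missing (value simply absent, like the period shown in SPSS data view)
salary = pd.Series([2500, 3100, np.nan, 2800, 9999999])

# 9999999 is used here as a user-missing code (an unlikely value the analyst excludes)
user_missing_codes = [9999999]
salary_clean = salary.replace(user_missing_codes, np.nan)

# Analyses then ignore both kinds of missing values
print("Mean ignoring missing values:", salary_clean.mean())
print("Number of missing values:", salary_clean.isna().sum())
```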
9. Analyzing the normal distribution curve to write a report

 99.7% of the data lies between Mean – 3 Std. Dev. and Mean + 3 Std. Dev.
 95.4% of the data lies between Mean – 2 Std. Dev. and Mean + 2 Std. Dev.
 68.3% of the data lies between Mean – 1 Std. Dev. and Mean + 1 Std. Dev.
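Filling in these ranges is simple arithmetic; a minimal Python sketch, assuming the Mean of 18 and Std. Dev. of 3.8 from the earlier example:

```python
mean, sd = 18.0, 3.8   # assumed values from the example in section 1

for k, share in [(1, "68.3%"), (2, "95.4%"), (3, "99.7%")]:
    low, high = mean - k * sd, mean + k * sd
    print(f"{share} of the data lies between {low:.1f} and {high:.1f} (Mean ± {k} Std. Dev.)")
```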

10. Good to know


In the DSM, pathology criteria start from ±3 Std. Dev. (±3 sigma).
Histograms are used for scale variables, and bar charts for nominal variables.
Before I enter data into SPSS, I have to make sure it's appropriately scaled. Not everything in
SPSS can be applied to any data. The results in the Output show whether the data is appropriately
scaled or not. There also has to be some standard by which I select my population (a consistent
sample).
Outliers can be one of three things: extreme values, inappropriately scaled data, or data entry
errors (I was tired and typed a wrong value, people were too tired to answer accurately, etc.).

11. Boxplot
A Boxplot contains the Median (middle dash), the Minimum (bottom dash) and Maximum (top
dash) values, and the Quartiles.
 The 1st Quartile goes from the Minimum to the bound of the bottom box (25th
Percentile),
 the 2nd from that bound to the Median (50th Percentile),
 the 3rd from the Median to the bound of the top box (75th Percentile),
 and the 4th from that bound to the Maximum.
Interquartile Range (IQR) = the value of the 75th Percentile – the value of the 25th Percentile.
A spread-out Boxplot (large IQR) has wide outlier fences, so it tends to show few Outliers, while a
small Boxplot (small IQR) has narrow fences and tends to show more Outliers.

12. Finding Suspected Outliers


Suspected Outliers lie beyond these two fences:
3rd Quartile + 1.5 × IQR
1st Quartile – 1.5 × IQR
If the IQR is 4, the 3rd Quartile stops at 168, and the 1st at 164, then we have:
168 + 6 = 174
164 – 6 = 158
Values above 174 or below 158 are Suspected Outliers. SPSS marks them in the Boxplot using small
circles.

13. Finding Certain Outliers


We follow the same procedure as above, but use 3 × IQR instead of 1.5 × IQR. SPSS marks Certain
Outliers in the Boxplot using small stars (asterisks).
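Both rules reduce to a few lines of arithmetic; a minimal Python sketch using the assumed quartiles from the example above (1st Quartile = 164, 3rd Quartile = 168):

```python
q1, q3 = 164, 168            # assumed 25th and 75th percentiles from the example
iqr = q3 - q1                # Interquartile Range = 4

# Suspected outliers: beyond 1.5 x IQR from the box
suspected_low, suspected_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 158 and 174

# Certain (extreme) outliers: beyond 3 x IQR from the box
certain_low, certain_high = q1 - 3 * iqr, q3 + 3 * iqr           # 152 and 180

print(f"Suspected outliers: below {suspected_low} or above {suspected_high}")
print(f"Certain outliers:   below {certain_low} or above {certain_high}")
```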

14. Pearson Correlation


The correlation coefficient (R) varies between -1 (inverse correlation) and +1 (direct correlation);
the closer it is to -1 or +1, the stronger the correlation, and if it is 0, there is no correlation.
To perform the operation: Analyze tab → Correlate → Bivariate.
In the report, I must write the sentence that appears under the table in the Output. What does that
sentence mean? The significance threshold (the P value, i.e. alpha) is either 0.01 or 0.05. The
observed significance (Sig.) is compared to it in this way:
 If Sig. = 0.000 < 0.01, we don't need to compare it to 0.05.
 If Sig. = 0.3 > 0.01, we compare it to 0.05.
The closer R is to -1 or +1, the closer the Sig. is to 0.000. The farther away R is from -1 or +1 (the
closer R is to 0), the bigger the Sig.; it can reach, for example, 0.864 when no correlation exists.
The Sig. helps me determine whether a correlation exists or not, especially when R doesn't have a
telling value. If Sig. > P value (even at 0.05), no significant correlation exists.
To make a Scatter chart: Graphs tab → Chart Builder. The more the points line up along a straight
line, the stronger the correlation.
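The same test can be reproduced outside SPSS; a minimal Python sketch with scipy.stats.pearsonr, using invented caffeine/IQ data (the variables only echo the example in the next section and are not real measurements):

```python
import numpy as np
from scipy import stats

# Invented example data: caffeine dose (mg) and IQ score
caffeine = np.array([0, 50, 100, 150, 200, 250, 300, 350])
iq       = np.array([98, 101, 99, 104, 103, 107, 106, 110])

r, sig = stats.pearsonr(caffeine, iq)   # r = correlation coefficient, sig = observed significance

print(f"r = {r:.3f}, Sig. = {sig:.3f}")
if sig < 0.05:
    print("Significant correlation (Sig. below the 0.05 threshold)")
else:
    print("No significant correlation (Sig. above the 0.05 threshold)")
```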

15. Regression test


The regression test cannot be applied if no correlation exists; this is why it builds off the Pearson
Correlation test. It helps quantify the degree of relationship existing between two variables. It also
helps in making predictions about other values of these two variables:
Y = AX + B.
I have to find the degree of relationship between Y (vertical graph line) and X (horizontal graph
line). If X (caffeine dose) is 0 and I want to find Y (IQ score), I also need to know what A and B
stand for so I can recognize the Y value, which is what the regression test is used for.
Analyze tab → Regression → Linear. Y is the dependent variable (the one I'm trying to find). X
is the independent variable for which I set the value (for prediction's sake).
R Square indicates how well the regression line fits the data (the proportion of the variation in Y
explained by X).
The value of B in the regression equation above is the value in the “B” column “Constant” row
in the Coefficients output table. The value of A is right under it.
So, if X = 0 and A = 0.134 (according to the output), the regression equation will look like this:
Y = 0.134 x 0 + B (in which case Y = B because here AX = 0).
Increasing the caffeine dose (X = 1, 2, 3…) means that for every 1mg of caffeine consumed, the
IQ score increases by 0.134.
The more perfect the correlation is, the more accurate and less error-prone the prediction.
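A minimal sketch of the same Y = AX + B logic with scipy.stats.linregress, reusing the invented caffeine/IQ data from the correlation sketch above (A is the slope, B the constant):

```python
import numpy as np
from scipy import stats

caffeine = np.array([0, 50, 100, 150, 200, 250, 300, 350])   # X, independent variable
iq       = np.array([98, 101, 99, 104, 103, 107, 106, 110])  # Y, dependent variable

result = stats.linregress(caffeine, iq)
a, b = result.slope, result.intercept       # A and B in Y = AX + B
r_square = result.rvalue ** 2               # R Square: how well the line fits the data

# Prediction: expected IQ for a caffeine dose of 0 mg (here Y = B, since AX = 0)
predicted = a * 0 + b
print(f"Y = {a:.3f} * X + {b:.3f}, R Square = {r_square:.3f}, prediction at X = 0: {predicted:.1f}")
```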

16. T-test
A t-test is used to compare the means of two groups. To perform the operation:
Analyze tab → Compare Means. There are 3 types: one-sample, paired-samples, and
independent-samples.
The one-sample t-test determines whether the sample mean is statistically different from a known
or hypothesized population mean (the “test value”). Example: I’d like to find the difference
between the average IQ score of my sample (105) and the average IQ score of the general
population (100).
In the output table, Mean Difference = variable mean – test value = 105 – 100 = 5, which I will
get in 95% of the cases (within the Lower and Upper bounds of the 95% Confidence Interval).
df = degrees of freedom = N – 1 (as in 1 sample). The result represents the number of values I
can change while keeping the same mean.
To interpret the Output table, we need to keep the t distribution table handy. We commonly use
alpha = 0.05 in the two-tailed test column.
First part of the interpretation: I check the t value and the df value in the output table, then go to
the corresponding df row in the t distribution table, and compare the value found there (the critical
value) to the t value.
Second: I compare the output Sig value to the corresponding P value (0.01 or 0.05).
Third: I check whether my confidence interval crosses 0. If it doesn’t, then my data’s mean will
never reach the test value. If it does, then it may.
The means are significantly different when 1) the t value is bigger than the critical value, 2) the
sig is smaller than the P value, and 3) the 95% CI doesn’t cross 0.
In the paired-samples t-test, I have 2 groups (before and after) within the same sample. Example:
the IQ score of the same people before and after college: After Mean – Before Mean.
A positive result means that After Mean is greater than Before Mean. A negative result means
that After Mean is smaller than Before Mean.
Paired measures: I’m measuring two independent variables within the same sample, such as
height and weight.
In the independent-samples t-test (which compares two separate groups), I look at the
equal-variances assumed row in the output table when the groups have similar variances. The rule:
if the Sig of Levene's test is greater than the P value (0.05), I opt for the equal-variances assumed
row; if not, I opt for the equal-variances not assumed row. Here, df = N – 2 (the total number of
cases minus one per group).
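All three variants can be sketched outside SPSS with scipy; the samples below are invented, and the test value of 100 follows the IQ example:

```python
import numpy as np
from scipy import stats

# One-sample t-test: sample IQ scores against a known population mean of 100
iq_sample = np.array([102, 107, 99, 110, 104, 108, 101, 109])
t1, sig1 = stats.ttest_1samp(iq_sample, popmean=100)

# Paired-samples t-test: IQ of the SAME people before and after college
before = np.array([100, 98, 105, 101, 97, 103])
after  = np.array([104, 101, 106, 105, 99, 107])
t2, sig2 = stats.ttest_rel(after, before)      # positive t: After Mean > Before Mean

# Independent-samples t-test: two separate groups (equal variances assumed)
group_a = np.array([100, 103, 98, 105, 102])
group_b = np.array([95, 97, 99, 94, 96])
t3, sig3 = stats.ttest_ind(group_a, group_b, equal_var=True)

print(f"One-sample:  t = {t1:.2f}, Sig. = {sig1:.3f}")
print(f"Paired:      t = {t2:.2f}, Sig. = {sig2:.3f}")
print(f"Independent: t = {t3:.2f}, Sig. = {sig3:.3f}")
```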

17. Interpretation of the 95% CI


Positive Bounds: The IQ Mean is greater than the Reference Mean by [value of Lower Bound] at
least and by [value of Upper Bound] at most.
Negative Bounds: The IQ Mean is smaller… by [positive values of Upper and Lower Bounds].

18. ANOVA test


It’s commonly used to test statistical differences among the means of two or more groups, and
statistical differences among the means of two or more interventions.
Analyze tab → Compare Means → One-Way ANOVA.
Post Hoc → I tick Tukey and Games-Howell.
Options → I tick Descriptive, Homogeneity of variance test (Levene's test, which checks whether
the groups have equal variances: a non-significant result means the variances can be treated as
equal) and Welch.
In the ANOVA output table, I look at the Sig of the F statistic: if Sig (e.g. 0.000) is smaller than the
P value (0.05), the difference Between Groups is significant; otherwise it isn't.
In the Multiple Comparisons output table (which is similar to the Crosstabs in the Descriptive
Statistics), I compare the Sig to the P value:
If Sig = 0.008 < P value (0.05), a significant difference does exist.
If Sig = 0.937 > P value (0.05), no significant difference exists.
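A minimal sketch of a one-way ANOVA on three invented groups; scipy provides the F test, and the Tukey post-hoc comes from statsmodels (an extra assumption, since SPSS handles both in the same dialog):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented scores for three groups
g1 = np.array([12, 14, 11, 13, 15])
g2 = np.array([16, 18, 17, 19, 15])
g3 = np.array([13, 12, 14, 13, 12])

# One-way ANOVA: is at least one group mean different from the others?
f_stat, sig = stats.f_oneway(g1, g2, g3)
print(f"F = {f_stat:.2f}, Sig. = {sig:.3f}")   # Sig. < 0.05 => significant difference between groups

# Tukey post-hoc: which pairs of groups differ?
scores = np.concatenate([g1, g2, g3])
groups = ["g1"] * len(g1) + ["g2"] * len(g2) + ["g3"] * len(g3)
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```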
