
A ready reckoner for analytic techniques

First Level techniques

1. Frequencies:

For ordinal and scale variables, look at the cumulative frequencies. They hold a lot of
information for deriving hypotheses.
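
As an illustration, here is a minimal cumulative-frequency sketch in Python (pandas); the ratings data are made up for the example:

import pandas as pd

# e.g. a 1-to-5 satisfaction rating (an ordinal variable); illustrative values
ratings = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 5, 5])

freq = ratings.value_counts().sort_index()              # frequency of each rating
cum_pct = freq.cumsum() / freq.sum() * 100              # cumulative percentage
print(pd.DataFrame({"Frequency": freq, "Cumulative %": cum_pct}))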

2. Cross tabulations (Xtabs):

This is a bivariate technique, meaning that we can look at only two variables at a time. Both
variables should be categorical.
Critical element & interpretation: If the chi-square significance value (not to be confused
with the chi-square value itself) is less than .05 (at a 95% confidence level), then there is a
relation between the variables; if the chi-square significance value is more than .05, then no
relation between the variables can be established.

(Screenshot: the SPSS output table, with the chi-square significance value highlighted.)
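
A minimal sketch of the same test in Python (scipy), assuming a 2x2 table of counts; the numbers are illustrative:

from scipy.stats import chi2_contingency

# e.g. counts of gender (rows) by purchase yes/no (columns); illustrative data
table = [[30, 10],
         [20, 40]]

chi2, p, dof, expected = chi2_contingency(table)
print(p)    # the chi-square significance value; p < .05 suggests a relation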

Second Level techniques

1. Hierarchical clustering:

Used for forming a natural clustering of up to 50 objects.


The most frequently used distance measure for interval-type variables is Euclidean distance.
For count-type variables (which we come across rarely in business applications) the
preferred distance measure is chi-square, and for binary variables it is either simple
matching (total of yes and no matches divided by total responses), Jaccard (yes matches
divided by total responses less the no matches) or Russell and Rao (yes matches divided by
total responses).
Critical Element & interpretation: The critical element is the dendrogram, which allows us to
figure out how many clusters we should go for. The general rule is to draw a vertical line at
the point where the next object joins the cluster at a relatively longer distance, or in simple
words, at the first instance where the joining distance suddenly increases.
The interpretation will depend on the distance measure used. Basically, the interpretation
should look at why the objects are clustering together: is it because there are more yes
matches?
Sample size: Less than 50. A larger sample can be used where a clear association is expected.

Steps for carrying out hierarchical cluster analysis in SPSS:


1. Analyze -> Classify -> Hierarchical Cluster…
2. Push the variables which you want to cluster or which you want to use for clustering the
cases in the ‘Variable(s)’ box.
3. Under ‘cluster’ select the appropriate radio button, i.e. select the ‘Cases’ button if you wish
to cluster the cases based on the selected variables or the ‘Variables’ button, if you wish to
cluster the variables themselves.
4. Click on the ‘Statistics…’ button. Check the ‘Proximity Matrix’ box. Click on ‘Continue’
button.
5. Click on ‘Plots…’ button. Check ‘Dendrogram’. Click ‘Continue’.
6. Click on ‘Method…’ button and select the appropriate measure depending on the type of
variables selected (interval or binary). If the variables are interval type, select Euclidean
distance, and if the variables are binary, select ‘Jaccard’, ‘Simple matching’ or ‘Russell &
Rao’ depending on your end purpose.
Click ‘Continue’ and OK to generate the output.
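
For orientation, a minimal equivalent sketch in Python (scipy), assuming interval-type variables and Euclidean distance; the data are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))          # 20 objects, 4 interval-type variables

# Euclidean distance with Ward linkage (one common choice among several)
Z = linkage(X, method="ward", metric="euclidean")

dendrogram(Z)                         # look for the first sudden jump in joining distance
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)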

2. K Means clustering

Used for clustering a large number of objects (more than 50).


Clusters are formed based on means, and hence care should be taken to see that there are no
outliers, as these will skew the means. Also, the variables selected for clustering should be
scale or interval variables.
Critical element & interpretation: The critical elements are the “Final cluster centres” table
and the “Number of cases in each cluster” table. Good clustering is said to have taken place
when there are substantial numbers in each cluster and when the clusters are distinctly
different, as may be seen from the final cluster centres table.
Once the clusters are formed to one’s satisfaction, the next step is to derive a profile of each
cluster. Usually the profile is a demographic profile, but it could also be psychographic or
based on any other information that we may have about the members of the clusters.
Interpretation will involve explaining what makes the clusters different from each other in
terms of the variables used for clustering, and whether the profiles of the clusters are also
different.
Sample size: More than 50.

Steps for carrying out k-Means cluster analysis in SPSS:


1. Analyze -> Classify -> K-Means Cluster…
2. Push the variables selected for clustering in the variables box.
3. Set number of Clusters to 3 (Default is 2).
4. Click ‘OK’ to generate output.
5. Once you are satisfied that the clusters are good, click on the ‘Save’ button and check the
‘Cluster membership’ box. Click ‘Continue’ & ‘OK’.
For generating the profile of clusters: Data -> Split File…
6. Click on the ‘Compare Groups’ radio button. Scroll to the last variable in the panel (Cluster
Number of Case (QCL_1)) and push the variable in the ‘Groups Based on’ box. Click ‘OK’.
Analyze -> Descriptive Statistics -> Frequencies…
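
A minimal equivalent sketch in Python (scikit-learn); the data and the choice of 3 clusters are illustrative:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))         # 200 cases, 4 scale/interval variables

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)            # analogue of the 'Final cluster centres' table
print(np.bincount(km.labels_))        # analogue of the 'Number of cases in each cluster' table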

3. MDS (Multi-dimensional Scaling), carried out with Permap

Overall similarity: The advantage of this method is that it may (and very often does) throw
up latent attributes which may not be identified by other methods. The disadvantage is that
the person interpreting the results needs thorough knowledge of the objects being mapped
in order to derive the attributes.
In the overall similarity method, we only ask the respondents to state the extent of similarity
or dissimilarity that they see within pairs of objects; we do not specify any attributes.
Rule of thumb: If the objective function value drops substantially on changing the number of
dimensions from 2 to 3, then we can be reasonably sure that there are three dimensions in
the data that we are looking at.
Critical element & interpretation: The critical element is the map (a screenshot of the map).
Interpretation will depend on what possible reasons we can attribute to the arrangement of
the objects in the map.
Sample size: Depends on the distance method being used; usually 30 or more. Too many
objects tend to introduce higher error.

Attribute based: Its advantage is that it is simple for the respondent, as the attributes are
already specified. The one big disadvantage is that we may miss out on an important attribute.
Keep in mind when interpreting the vectors (attributes) that the arrowhead of the vector
always points towards the higher value. Hence any object closer to the arrowhead
corresponds to the higher end of the scale on which the attribute has been measured.
Critical element & interpretation: The critical element is again the map. In addition to looking
at the arrangement of the objects in the map, one must also look at the relationship between
attributes to arrive at an interpretation.
Sample size: Depends on the distance method being used; usually 30 or more.
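
As an illustrative stand-in for Permap, a minimal MDS sketch in Python (scikit-learn) on a precomputed dissimilarity matrix; the ratings are made up:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
D = rng.uniform(1, 9, size=(8, 8))    # 8 objects, pairwise dissimilarity ratings
D = (D + D.T) / 2                     # make the matrix symmetric
np.fill_diagonal(D, 0)                # zero self-dissimilarity

# Compare the objective function (stress) for 2 vs 3 dimensions
for k in (2, 3):
    mds = MDS(n_components=k, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)
    print(k, "dimensions, stress:", round(mds.stress_, 2))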

Steps for creating table in SPSS and transferring same to Permap input file:
For Overall similarity

1. We may generate the distance matrix using the Hierarchical procedure.


2. Copy the distance matrix to excel for cleaning.

For Attribute based mapping

1. Analyze -> Tables -> Custom Tables…


2. ‘OK’ the pop-up box and drag-drop the set of objects to be mapped into the ‘Rows’ and the
set of variables (parameters) into the columns. Click ‘OK’ and copy the table generated in the
output file to Excel for cleaning.
3. Clean the table by adjusting decimal points and suitably shortening the variable names
where required.
Common for both Overall similarity & Attribute based method
1. Copy the table (selecting only row labels and all values) into notepad. Do not copy column
headers.
Permap syntax
2. Use the following syntax above the table in notepad.
For attribute based:
Title=
Nobjects=
Nattributes=
Attributelist
For overall similarity:
Title=
Nobjects=
Similaritylist or Dissimilaritylist

3. Save the file and open it through Permap. Click on the start button. After the objects have
arranged themselves, for an attribute-based Permap, click on the ‘Map Evaluation’ menu and
select ‘Attributes’.
From the ‘Attributes Evaluation’ pop-up select ‘All Active Vectors…’

4. Factor analysis

Used to reduce the number of variables by combining them into factors, dimensions or
components. Variables which are correlated get clubbed into a factor. The factors so
obtained will usually have an underlying common theme, and if that is the case, it may
enable labelling the factor with a suitable name.
Rule of thumb: Variables should be selected in such a way that they are correlated. Nominal
variables cannot be used in factor analysis; scale variables are best suited. Some variables
which are measured on a scale of 1 to 5, or similar scales, may be treated as scale variables
for factor analysis. The correlation can be tested from the communalities table in the output:
one must make sure that at least 50% of the variance of each variable can be extracted, in
other words that at least 50% of the variance of each variable is explained by the factors.
Critical element: The rotated component matrix. The rule of thumb to use here, in order to
identify the dominant variable in each factor, is that the variable should have a high loading
on that factor and low loadings on all other factors; 0.7 and 0.4 are generally the values
considered for high and low respectively, i.e. a loading above 0.7 is considered high and a
loading below 0.4 is considered low. There is no interpretation of the factors by themselves;
factors will need to be used further in other techniques.
Sample size: The generally accepted rule of thumb is to take a sample size of 5 times the
number of variables being used in the factor analysis. Therefore, if we are trying to carry
out factor analysis on 10 variables, the minimum sample size would be 50. In all cases, the
larger the sample size the better.

Steps for carrying out factor analysis in SPSS:


1. Analyze -> Data Reduction -> Factor…
2. Push the variables which you want to use for factor analysis in the ‘Variables’ box.
3. You may use the ‘Selection Variable’ box to carry out factor analysis for only one value of a
variable. For example, if you want to perform factor analysis for only males, which may be
coded as value 1 of the variable Gender, push the Gender variable in the ‘Selection Variable’
box, click on the ‘Value…’ button next to the box and enter 1.
4. Click on ‘Rotation’ button and select ‘Varimax’ option. Click ‘OK’ button to generate output.
5. Once you are satisfied with the output, open the factor analysis menu again and click on the
‘Scores…’ button.
6. In the ‘Factor Scores’ box which appears, check the box against ‘Save as variables’. Click
‘Continue’ and ‘OK’.
Locate the factors, which are saved as new variables in the Variable View of the data file,
and type the names for the factors in the Labels column.
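
A minimal sketch in Python (scikit-learn) with varimax rotation, assuming standardised scale variables; the data are illustrative:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                    # 100 respondents, 6 scale variables

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

loadings = fa.components_.T                      # rows = variables, columns = factors
print(loadings.round(2))                         # analogue of the rotated component matrix

communalities = (loadings ** 2).sum(axis=1)      # variance of each variable explained by the factors
print(communalities.round(2))                    # look for at least ~0.5 per variable

scores = fa.transform(X)                         # factor scores, analogue of 'Save as variables'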

5. Regression

Regression is a predictive analytic technique where we try to predict the value of an outcome
variable (dependent) based on the values of the causative (independent) input variable(s).
As in all predictive models, the historical or prior behaviour of the causative variables with
respect to the outcome variable needs to be known. Using these historical values we build
the regression model, the most common and simplest being linear regression.
For a good model fit, the selection of proper causative variables is important.

Critical element & interpretation: The critical elements are the coefficients and the R-square
value. The R-square value ranges from 0 to 1; the closer it is to 1, the better the model fit and
consequently the better the accuracy of prediction. The magnitudes of the coefficients of the
causative variables indicate their respective influence on the outcome variable (provided the
variables are on comparable scales), and their polarity (positive or negative) indicates the
direction in which the influence pulls the outcome.
Sample size: As in all predictive models, the larger the better, but only if there is consistency
in the trend or behaviour of the causative variables with respect to their influence on the
outcome variable.
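
A minimal linear regression sketch in Python (scikit-learn); the data and the true coefficients are made up:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 3 causative (independent) variables
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)   # outcome variable

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)             # magnitude and polarity of influence
print(model.score(X, y))                         # R-square; closer to 1 means a better fit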

6. Discriminant analysis
Used to find what distinguishes or differentiates two or more values of a variable.
It is based on regression and has a dependent variable and one or more independent
variables. As in regression, the independent variables should be interval or scale type, but
unlike in regression analysis, the dependent variable should be a categorical variable.
Critical element and interpretation: The unstandardized coefficients are used to derive the
discriminant equation, which provides the discriminant score. The variable with the highest
coefficient is the one which will have the largest effect in determining the group to which a
case will belong. A negative coefficient will pull down the discriminant score with every unit
increase in the value of the variable, and vice versa.
The ‘Functions at Group Centroids’ table helps in deciding which group a case will belong to.
This will depend on which side of the average the discriminant score of the case falls.
The summary table shows the percentage of values correctly predicted, which will in turn
determine the accuracy of the discriminant model. The accuracy should be more than 50%;
the closer it is to 100%, the better the accuracy, and hence the acceptability of the
discriminant model as a predictive tool.
Sample size: Same as for regression, but we must ensure that each of the groups has at least
30 cases.
Steps for carrying out discriminant analysis in SPSS:
1. Analyze -> Classify -> Discriminant…
2. Push the variable which contains the groups into which you wish to classify the cases in the
‘Grouping Variable’ box. Click on the ‘Define Range’ button below the box and enter the
range of values for the grouping variable. The grouping variable should be a binary variable.
3. Select the independent variables and push them in the Independents box.
4. Click on the ‘Statistics…’ button. Check the boxes against ‘Mean’ under ‘Descriptives’ and
‘Unstandardized’ under ‘Function Coefficients’. Click on ‘Continue’ button.
5. Click on ‘Classify…’. Check the box against ‘Summary Table’ under ‘Display’.
6. Click ‘Continue’ and ‘OK’
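
A minimal sketch in Python (scikit-learn's linear discriminant analysis); the data and the binary grouping variable are illustrative:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # interval-type independent variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)  # binary groups

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                                 # analogue of the discriminant coefficients
print(lda.transform(X)[:5])                      # discriminant scores for the first five cases
print((lda.predict(X) == y).mean())              # fraction correctly classified (summary table)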
