
A ready reckoner for analytic techniques

First Level techniques

1. Frequencies:

For ordinal and scale variables, look at the cumulative frequencies. They hold a lot of
information for deriving hypotheses.
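
As an illustration, here is a minimal cumulative-frequency sketch in Python (pandas); the ratings data are made up for the example:

import pandas as pd

# e.g. a 1-to-5 satisfaction rating (an ordinal variable); illustrative values
ratings = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 5, 5])

freq = ratings.value_counts().sort_index()              # frequency of each rating
cum_pct = freq.cumsum() / freq.sum() * 100              # cumulative percentage
print(pd.DataFrame({"Frequency": freq, "Cumulative %": cum_pct}))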

2. Cross tabulations (Xtabs):

This is a bivariate technique, meaning that we can look at only two variables at a time. Both
variables should be categorical.
Critical element & interpretation: If the chi-square significance value (not to be confused
with the chi-square value itself) is less than .05 (at a 95% confidence level), then there is a
relation between the variables; if the chi-square significance value is more than .05, then no
relation between the variables can be established.

(Screenshot: the SPSS output table, with the chi-square significance value highlighted.)
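
A minimal sketch of the same test in Python (scipy), assuming a 2x2 table of counts; the numbers are illustrative:

from scipy.stats import chi2_contingency

# e.g. counts of gender (rows) by purchase yes/no (columns); illustrative data
table = [[30, 10],
         [20, 40]]

chi2, p, dof, expected = chi2_contingency(table)
print(p)    # the chi-square significance value; p < .05 suggests a relation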

Second Level techniques

1. Hierarchical clustering:

Used for forming a natural clustering of up to 50 objects.


The most frequently used distance measure for interval-type variables is Euclidean distance.
For count-type variables (which we come across rarely in business applications) the
preferred distance measure is chi-square, and for binary variables it is either simple
matching (total of yes and no matches divided by total responses), Jaccard (yes matches
divided by total responses less the no matches) or Russell and Rao (yes matches divided by
total responses).
Critical Element & interpretation: The critical element is the dendrogram, which allows us to
figure out how many clusters we should go for. The general rule is to draw a vertical line at
the point where the next object joins the cluster at a relatively longer distance, or in simple
words, at the first instance where the joining distance suddenly increases.
The interpretation will depend on the distance measure used. Basically, the interpretation
should look at why the objects are clustering together: is it because there are more yes
matches?
Sample size: Less than 50. A larger sample can be used where a clear association is expected.

Steps for carrying out hierarchical cluster analysis in SPSS:


1. Analyze -> Classify -> Hierarchical Cluster…
2. Push the variables which you want to cluster or which you want to use for clustering the
cases in the ‘Variable(s)’ box.
3. Under ‘cluster’ select the appropriate radio button, i.e. select the ‘Cases’ button if you wish
to cluster the cases based on the selected variables or the ‘Variables’ button, if you wish to
cluster the variables themselves.
4. Click on the ‘Statistics…’ button. Check the ‘Proximity Matrix’ box. Click on ‘Continue’
button.
5. Click on ‘Plots…’ button. Check ‘Dendrogram’. Click ‘Continue’.
6. Click on ‘Method…’ button and select the appropriate measure depending on the type of
variables selected (interval or binary). If the variables are interval type, select Euclidean
distance, and if the variables are binary, select ‘Jaccard’, ‘Simple matching’ or ‘Russell &
Rao’ depending on your end purpose.
Click ‘Continue’ and OK to generate the output.
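
For orientation, a minimal equivalent sketch in Python (scipy), assuming interval-type variables and Euclidean distance; the data are illustrative:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))          # 20 objects, 4 interval-type variables

# Euclidean distance with Ward linkage (one common choice among several)
Z = linkage(X, method="ward", metric="euclidean")

dendrogram(Z)                         # look for the first sudden jump in joining distance
plt.show()

labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
print(labels)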

2. K Means clustering

Used for clustering a large number of objects (more than 50).


Clusters are formed based on means, and hence care should be taken to see that there are no
outliers, as these will skew the means. Also, the variables selected for clustering should be
scale or interval variables.
Critical element & interpretation: The critical elements are the “Final cluster centres” table
and the “Number of cases in each cluster” table. Good clustering is said to have taken place
when there are substantial numbers in each cluster and when the clusters are distinctly
different, as may be seen from the final cluster centres table.
Once the clusters are formed to one’s satisfaction, the next step is to derive a profile of each
cluster. Usually the profile is a demographic profile, but it could also be psychographic or
based on any other information that we may have about the members of the clusters.
Interpretation will involve explaining what makes the clusters different from each other in
terms of the variables used for clustering, and whether the profiles of the clusters are also
different.
Sample size: More than 50.

Steps for carrying out k-Means cluster analysis in SPSS:


1. Analyze -> Classify -> K-Means Cluster…
2. Push the variables selected for clustering in the variables box.
3. Set number of Clusters to 3 (Default is 2).
4. Click ‘OK’ to generate output.
5. Once you are satisfied that the clusters are good, click on the ‘Save’ button and check the
‘Cluster membership’ box. Click ‘Continue’ & ‘OK’.
For generating the profile of clusters: Data -> Split File…
6. Click on the ‘Compare Groups’ radio button. Scroll to the last variable in the panel (Cluster
Number of Case (QCL_1)) and push the variable in the ‘Groups Based on’ box. Click ‘OK’.
Analyze -> Descriptive Statistics -> Frequencies…
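
A minimal equivalent sketch in Python (scikit-learn); the data and the choice of 3 clusters are illustrative:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))         # 200 cases, 4 scale/interval variables

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)            # analogue of the 'Final cluster centres' table
print(np.bincount(km.labels_))        # analogue of the 'Number of cases in each cluster' table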

3. MDS (Multi-dimensional Scaling), carried out with Permap

Overall similarity: The advantage of this method is that it may (and very often does) throw
up latent attributes which may not be identified by other methods. The disadvantage is that
the person interpreting the results needs thorough knowledge of the objects being mapped
in order to derive the attributes.
In the overall similarity method, we only ask the respondents to state the extent of similarity
or dissimilarity that they see within pairs of objects; we do not specify any attributes.
Rule of thumb: If the objective function value drops substantially on changing the number of
dimensions from 2 to 3, then we can be reasonably sure that there are three dimensions in
the data that we are looking at.
Critical element & interpretation: The critical element is the map (a screenshot of the map).
Interpretation will depend on what possible reasons we can attribute to the arrangement of
the objects in the map.
Sample size: Depends on the distance method being used; usually 30 or more. Too many
objects tend to introduce higher error.

Attribute based: Its advantage is that it is simple for the respondent, as the attributes are
already specified. The one big disadvantage is that we may miss out on an important attribute.
Keep in mind when interpreting the vectors (attributes) that the arrowhead of the vector
always points towards the higher value. Hence any object closer to the arrowhead
corresponds to the higher end of the scale on which the attribute has been measured.
Critical element & interpretation: The critical element is again the map. In addition to looking
at the arrangement of the objects in the map, one must also look at the relationship between
attributes to arrive at an interpretation.
Sample size: Depends on the distance method being used; usually 30 or more.
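
As an illustrative stand-in for Permap, a minimal MDS sketch in Python (scikit-learn) on a precomputed dissimilarity matrix; the ratings are made up:

import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
D = rng.uniform(1, 9, size=(8, 8))    # 8 objects, pairwise dissimilarity ratings
D = (D + D.T) / 2                     # make the matrix symmetric
np.fill_diagonal(D, 0)                # zero self-dissimilarity

# Compare the objective function (stress) for 2 vs 3 dimensions
for k in (2, 3):
    mds = MDS(n_components=k, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)
    print(k, "dimensions, stress:", round(mds.stress_, 2))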

Steps for creating table in SPSS and transferring same to Permap input file:
For Overall similarity

1. We may generate the distance matrix using the Hierarchical procedure.


2. Copy the distance matrix to excel for cleaning.

For Attribute based mapping

1. Analyze -> Tables -> Custom Tables…


2. ‘OK’ the pop-up box and drag-drop the set of objects to be mapped into the ‘Rows’ and the
set of variables (parameters) into the columns. Click ‘OK’ and copy the table generated in the
output file to Excel for cleaning.
3. Clean the table by adjusting decimal points and suitably shortening the variable names
where required.
Common for both Overall similarity & Attribute based method
1. Copy the table (selecting only row labels and all values) into notepad. Do not copy column
headers.
Permap syntax
2. Use the following syntax above the table in notepad.
For attribute based:
Title=
Nobjects=
Nattributes=
Attributelist
For overall similarity:
Title=
Nobjects=
Similaritylist or Dissimilaritylist

3. Save the file and open it through Permap. Click on the start button. After the objects have
arranged themselves, for an attribute-based Permap, click on the ‘Map Evaluation’ menu and
select ‘Attributes’.
From the ‘Attributes Evaluation’ pop-up select ‘All Active Vectors…’

4. Factor analysis

Used to reduce the number of variables by combining them into factors, dimensions or
components. Variables which are correlated get clubbed into a factor. The factors so
obtained will usually have an underlying common theme, and if that is the case, it may
enable labelling the factor with a suitable name.
Rule of thumb: Variables should be selected in such a way that they are correlated. Nominal
variables cannot be used in factor analysis; scale variables are best suited. Some variables
which are measured on a scale of 1 to 5, or similar scales, may be treated as scale variables
for factor analysis. The correlation can be tested from the communalities table in the output:
one must make sure that at least 50% of the variance of each variable can be extracted, in
other words that at least 50% of the variance of each variable is explained by the factors.
Critical element: The rotated component matrix. The rule of thumb to use here, in order to
identify the dominant variable in each factor, is that the variable should have a high loading
on that factor and low loadings on all other factors; 0.7 and 0.4 are generally the values
considered for high and low respectively, i.e. a loading above 0.7 is considered high and a
loading below 0.4 is considered low. There is no interpretation of the factors by themselves;
factors will need to be used further in other techniques.
Sample size: The generally accepted rule of thumb is to take a sample size of 5 times the
number of variables being used in the factor analysis. Therefore, if we are trying to carry
out factor analysis on 10 variables, the minimum sample size would be 50. In all cases, the
larger the sample size the better.

Steps for carrying out factor analysis in SPSS:


1. Analyze -> Data Reduction -> Factor…
2. Push the variables which you want to use for factor analysis in the ‘Variables’ box.
3. You may use the ‘Selection Variable’ box to carry out factor analysis for only one value of a
variable. For example, if you want to perform factor analysis for only males, which may be
coded as value 1 of the variable Gender, push the Gender variable in the ‘Selection Variable’
box, click on the ‘Value…’ button next to the box and enter 1.
4. Click on ‘Rotation’ button and select ‘Varimax’ option. Click ‘OK’ button to generate output.
5. Once you are satisfied with the output, open the factor analysis menu again and click on the
‘Scores…’ button.
6. In the ‘Factor Scores’ box which appears, check the box against ‘Save as variables’. Click
‘Continue’ and ‘OK’.
Locate the factors, which are saved as new variables in the Variable View of the data file,
and type the names for the factors in the Labels column.
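
A minimal sketch in Python (scikit-learn) with varimax rotation, assuming standardised scale variables; the data are illustrative:

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))                    # 100 respondents, 6 scale variables

fa = FactorAnalysis(n_components=2, rotation="varimax").fit(X)

loadings = fa.components_.T                      # rows = variables, columns = factors
print(loadings.round(2))                         # analogue of the rotated component matrix

communalities = (loadings ** 2).sum(axis=1)      # variance of each variable explained by the factors
print(communalities.round(2))                    # look for at least ~0.5 per variable

scores = fa.transform(X)                         # factor scores, analogue of 'Save as variables'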

5. Regression

Regression is a predictive analytic technique where we try to predict the value of an outcome
variable (dependent) based on the values of the causative (independent) input variable(s).
As in all predictive models, the historical or prior behaviour of the causative variables with
respect to the outcome variable needs to be known. Using these historical values we build
the regression model, the most common and simplest being linear regression.
For a good model fit, the selection of proper causative variables is important.

Critical element & interpretation: The critical elements are the coefficients and the R-square
value. The R-square value ranges from 0 to 1; the closer it is to 1, the better the model fit and
consequently the better the accuracy of prediction. The magnitudes of the coefficients of the
causative variables indicate their respective influence on the outcome variable (provided the
variables are on comparable scales), and their polarity (positive or negative) indicates the
direction in which the influence pulls the outcome.
Sample size: As in all predictive models, the larger the better, but only if there is consistency
in the trend or behaviour of the causative variables with respect to their influence on the
outcome variable.
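
A minimal linear regression sketch in Python (scikit-learn); the data and the true coefficients are made up:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 3 causative (independent) variables
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)   # outcome variable

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)             # magnitude and polarity of influence
print(model.score(X, y))                         # R-square; closer to 1 means a better fit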

6. Discriminant analysis
Used to find what distinguishes or differentiates two or more values of a variable.
It is based on regression and has a dependent variable and one or more independent
variables. As in regression, the independent variables should be interval or scale type, but
unlike in regression analysis, the dependent variable should be a categorical variable.
Critical element and interpretation: The unstandardized coefficients are used to derive the
discriminant equation, which provides the discriminant score. The variable with the highest
coefficient is the one which will have the largest effect in determining the group to which a
case will belong. A negative coefficient will pull down the discriminant score with every unit
increase in the value of the variable, and vice versa.
The ‘Functions at Group Centroids’ table helps in deciding which group a case will belong to.
This will depend on which side of the average the discriminant score of the case falls.
The summary table shows the percentage of values correctly predicted, which will in turn
determine the accuracy of the discriminant model. The accuracy should be more than 50%;
the closer it is to 100%, the better the accuracy, and hence the acceptability of the
discriminant model as a predictive tool.
Sample size: Same as for regression, but we must ensure that each of the groups has at least
30 cases.
Steps for carrying out discriminant analysis in SPSS:
1. Analyze -> Classify -> Discriminant…
2. Push the variable which contains the groups into which you wish to classify the cases in the
‘Grouping Variable’ box. Click on the ‘Define Range’ button below the box and enter the
range of values for the grouping variable. The grouping variable should be a binary variable.
3. Select the independent variables and push them in the Independents box.
4. Click on the ‘Statistics…’ button. Check the boxes against ‘Mean’ under ‘Descriptives’ and
‘Unstandardized’ under ‘Function Coefficients’. Click on ‘Continue’ button.
5. Click on ‘Classify…’. Check the box against ‘Summary Table’ under ‘Display’.
6. Click ‘Continue’ and ‘OK’
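
A minimal sketch in Python (scikit-learn's linear discriminant analysis); the data and the binary grouping variable are illustrative:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # interval-type independent variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)  # binary groups

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                                 # analogue of the discriminant coefficients
print(lda.transform(X)[:5])                      # discriminant scores for the first five cases
print((lda.predict(X) == y).mean())              # fraction correctly classified (summary table)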
