A Ready Reckoner For Analytic Techniques With Proc
1. Frequencies:
For ordinal and scale variables, look at cumulative frequencies. They hold a lot of information
for deriving hypotheses.
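As a rough illustration only, here is a minimal Python sketch (using pandas) of a frequency table with cumulative percentages; the variable name 'satisfaction' and the data are invented for the example:

import pandas as pd

# Hypothetical ordinal variable measured on a 1-5 scale
df = pd.DataFrame({"satisfaction": [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]})

freq = df["satisfaction"].value_counts().sort_index()   # frequency of each value
pct = 100 * freq / freq.sum()                           # percentage of cases
cum_pct = pct.cumsum()                                   # cumulative percentage

print(pd.DataFrame({"Frequency": freq, "Percent": pct,
                    "Cumulative Percent": cum_pct}))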
2. Crosstabs (Chi-square test):
This is a bivariate technique, meaning that we can only look at two variables at a time. The
two variables should be category variables.
Critical element & interpretation: If the chi-square significance value (not to be confused
with the chi-square value) is less than .05 (at the 95% confidence level), then there is a relation
between the variables; if the chi-square significance value is more than .05, then there is
no relation between the variables.
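The same decision rule can be sketched in Python with scipy; the contingency table below is invented purely for illustration:

import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical crosstab of two category variables (e.g. gender x preferred brand)
observed = np.array([[20, 30, 10],
                     [25, 15, 20]])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"Chi-square value: {chi2:.3f}, significance: {p_value:.3f}")

if p_value < 0.05:   # 95% confidence level
    print("There is a relation between the two variables")
else:
    print("There is no relation between the two variables")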
Second Level Techniques
1. Hierarchical clustering:
2. K-means clustering
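Neither clustering method is described further here; purely as an illustrative sketch, both can be run in Python with scikit-learn (the respondent data below are invented):

import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Invented data: rows are respondents, columns are two scale variables
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 5.2],
              [4.8, 5.1], [9.0, 1.0], [8.7, 1.2]])

# K-means: the number of clusters has to be chosen in advance
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering builds clusters bottom-up
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print("K-means clusters:     ", kmeans_labels)
print("Hierarchical clusters:", hier_labels)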
3. Multidimensional scaling (MDS) with Permap:
Overall similarity: The advantage of this method is that it may (and very often does) throw up
latent attributes which may otherwise not be identified by other methods. The
disadvantage is that the person interpreting the results needs to have thorough knowledge
of the objects being mapped in order to be able to derive the attributes.
In the overall similarity method, we only ask the respondents to state the extent of similarity or
dissimilarity that they see within pairs of objects. We do not specify any attributes.
Rule of thumb: If the objective function value drops substantially on changing the
dimensions from 2 to 3, then we can be reasonably sure that there are three dimensions in
the data that we are looking at.
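A minimal Python sketch of this rule of thumb, using scikit-learn's MDS as a stand-in for Permap (the dissimilarity matrix is fabricated, and Permap's objective function may be defined differently):

import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# Fabricate a symmetric dissimilarity matrix for 6 objects from invented 3-D positions
rng = np.random.default_rng(0)
D = pairwise_distances(rng.random((6, 3)))

for dims in (2, 3):
    mds = MDS(n_components=dims, dissimilarity="precomputed", random_state=0)
    mds.fit(D)
    print(f"{dims} dimensions: stress = {mds.stress_:.3f}")

# A substantial drop in stress from 2 to 3 dimensions suggests a three-dimensional solution.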
Critical element & interpretation: The critical element is the map (screen shot of the map).
Interpretation will depend on what possible reasons we can attribute to the arrangement of
objects in the map.
Sample size: Depends on the distance method being used. Usually 30 or more. Too many
objects tend to introduce higher error.
Attribute based: Its advantage is that it is simple for the respondent, as the attributes are
already specified. The one big disadvantage is that we may miss out on an important attribute.
Keep in mind that when interpreting the vectors (attributes), the arrow head of the vector
always points towards the higher values of the attribute. Hence any object closer to the arrow
head corresponds to higher values on the scale on which that attribute has been measured.
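One common way such attribute vectors are obtained is property fitting, i.e. regressing each attribute on the map coordinates; the Python sketch below (with invented coordinates and attribute scores) illustrates the idea, though Permap's exact procedure may differ:

import numpy as np
from sklearn.linear_model import LinearRegression

# Invented 2-D map coordinates for 5 objects and their mean scores on one attribute (1-5 scale)
coords = np.array([[-1.0, 0.5], [0.2, 1.1], [1.3, -0.4], [-0.6, -1.0], [0.9, 0.8]])
attribute = np.array([2.1, 3.0, 4.5, 1.8, 4.0])

# Regress the attribute on the coordinates: the coefficient vector points towards
# increasing attribute values, i.e. towards the arrow head of the attribute vector
reg = LinearRegression().fit(coords, attribute)
direction = reg.coef_ / np.linalg.norm(reg.coef_)
print("Arrow head (direction of higher attribute values):", direction)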
Critical element & interpretation: The critical element is again the map. In addition to looking
at the arrangement of the objects in the map, one must also look at the relationship between
attributes to arrive at an interpretation.
Sample size: Depends on the distance method being used. Usually 30 or more.
Steps for creating table in SPSS and transferring same to Permap input file:
For Overall similarity
3. Save and open the notepad file through Permap. Click on the Start button. After the objects
have arranged themselves, for attribute-based Permap, click on the 'Map Evaluation' menu and
select 'Attributes'. From the 'Attributes Evaluation' pop-up, select 'All Active Vectors…'
4. Factor analysis
Used to reduce the number of variables by combining them into factors or dimensions or
components. Variables which are correlated get clubbed into a factor. The factors so
obtained will usually have an underlying common theme and if that is the case, it may
enable naming or labelling the factor with a suitable name.
Rule of thumb: Selection of variables should be done in such a way that the variables are
correlated. Nominal variables cannot be used in factor analysis. Scale variables are best
suited for factor analysis. Some variables which are measured on a scale of 1 to 5, or similar
scales, may be treated as scale variables for factor analysis. The correlation can be
tested from the communalities table in the output. One must make sure that we are able to
extract at least 50% of the variance from each of the variables; in other words, at least 50%
of the variance of each variable should be explained by the extracted factors.
Critical element: Rotated component matrix. The rule of thumb used here to identify the
dominant variable in each factor is that the variable should have a high value in
that factor and a low value in all other factors. 0.7 and 0.4 are generally the values
considered for high and low respectively, i.e. a value higher than 0.7 is considered high
and a value lower than 0.4 is considered low. There is no interpretation for factors.
Factors will need to be used further in other techniques.
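Both checks (communalities of at least 0.5, and the 0.7/0.4 loading rule in the rotated matrix) can be sketched in Python with the factor_analyzer package; the data below are random placeholders, so the actual numbers are meaningless:

import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Placeholder data: 100 respondents x 6 scale variables (random, for illustration only)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 6)), columns=[f"v{i+1}" for i in range(6)])

fa = FactorAnalyzer(n_factors=2, rotation="varimax")   # varimax gives the rotated matrix
fa.fit(df)

# Communalities: each variable should have at least about 50% of its variance extracted
print("Communalities:", np.round(fa.get_communalities(), 2))

# Rotated loadings: a dominant variable loads above 0.7 on one factor and below 0.4 on the rest
print(pd.DataFrame(fa.loadings_, index=df.columns, columns=["Factor1", "Factor2"]).round(2))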
Sample size: The generally accepted rule of thumb is to consider a sample size which is 5 times
the number of variables being used in the factor analysis. Therefore, if we are trying to carry
out factor analysis on 10 variables, the minimum sample size would be 50. In all cases, the
larger the sample size, the better.
5. Regression
Critical element & interpretation: The critical elements are coefficients and r square value.
The R square value ranges from 0 to 1; the closer it is to 1, the better the model fit and
consequently the better the accuracy of prediction. The magnitudes of the coefficients of the
causative variables indicate their respective influence on the outcome variable, and their
polarity (positive or negative) indicates the direction in which the influence pulls the
outcome.
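These critical elements can be read off in Python with statsmodels; the variable names ('ad_spend', 'price', 'sales') and the data are invented:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Invented data: two causative variables and one outcome variable
rng = np.random.default_rng(0)
df = pd.DataFrame({"ad_spend": rng.uniform(10, 100, 50),
                   "price": rng.uniform(5, 20, 50)})
df["sales"] = 3.0 * df["ad_spend"] - 8.0 * df["price"] + rng.normal(0, 10, 50)

X = sm.add_constant(df[["ad_spend", "price"]])   # add the intercept term
model = sm.OLS(df["sales"], X).fit()

print("R square:", round(model.rsquared, 3))     # closer to 1 = better model fit
print(model.params)                              # magnitude and sign of each coefficient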
Sample size: As in all predictive models, the larger the better, but only if there is consistency
in the trend or behaviour of the causative variables with respect to their influence on the
outcome variable.
6. Discriminant analysis
Used to find what distinguishes or differentiates between two or more values of a variable.
It is based on regression and has a dependent variable and one or more independent
variables. Like in the case of regression the independent variables should be interval or scale
type but unlike in the case of regression analysis, the dependent variable should be a
category variable.
Critical element and interpretation: Unstandardized coefficients are used to derive the
discriminant equation, which provides the discriminant score. The variable with the highest
coefficient is the one which will have the largest effect on determining the group to which a
case will belong. A negative coefficient will pull down the discriminant score with every unit
increase in the value of that variable, and vice versa.
The 'Functions at Group Centroids' table helps in deciding which group the case will belong to.
This will depend on which side of the average of the group centroids the discriminant score of
the case falls.
The summary table shows the percentage of values correctly predicted, which will in turn
determine the accuracy of the discriminant model. The accuracy should be more than 50%.
The closer it is to 100% the better is the accuracy and hence the acceptability of the
discriminant model as a predictive tool.
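A compact Python sketch of the same logic using scikit-learn's LinearDiscriminantAnalysis; the data are invented, and the coefficients it reports are scaled differently from SPSS's unstandardized coefficients, though sign and relative magnitude are read the same way:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Invented data: two scale predictors, binary grouping variable (0/1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 5.0], 1.0, size=(40, 2)),    # group 0
               rng.normal([4.0, 3.0], 1.0, size=(40, 2))])   # group 1
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
print("Coefficients:", lda.coef_)        # sign and magnitude -> pull on the discriminant score
print("Intercept:", lda.intercept_)

# Counterpart of the summary table: percentage of cases classified correctly
print("Accuracy:", accuracy_score(y, lda.predict(X)))   # should be well above 50%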
Sample size: Same as for regression, but we must ensure that each of the groups has at least
30 cases.
Steps for carrying out discriminant analysis in SPSS:
1. Analyze -> Classify -> Discriminant…
2. Push the variable which contains the groups (into which you wish to classify the cases) into
the 'Grouping Variable' box. Click on the 'Define Range' button below the box and enter the
range of values for the grouping variable. The grouping variable should be a binary variable.
3. Select the independent variables and push them into the 'Independents' box.
4. Click on the ‘Statistics…’ button. Check the boxes against ‘Mean’ under ‘Descriptives’ and
‘Unstandardized’ under ‘Function Coefficients’. Click on ‘Continue’ button.
5. Click on ‘Classify…’. Check the box against ‘Summary Table’ under ‘Display’.
6. Click ‘Continue’ and ‘OK’