final assignment

The document discusses Multi-Dimensional Scaling (MDS) as a statistical technique for representing high-dimensional data in a lower-dimensional space while maintaining pairwise distances. It outlines key assumptions of MDS, its applicability, handling of non-Euclidean distances, and comparisons with PCA, as well as challenges in interpreting MDS outputs in contexts like market research and social science. Additionally, it addresses the importance of preprocessing data with missing values and evaluating the robustness of MDS solutions.

Uploaded by

CHARLES

SCHOOL OF BUSINESS AND ECONOMICS

DEPARTMENT OF MANAGEMENT
GROUP ASSIGNMENT

UNIT CODE: BMCU 006


UNIT TITLE: MULTIVARIATE STATISTICAL ANALYSIS

NAME ADMIN NO
JAMES N. GICHURU PHDBA/2025/31657
LEILA WAITHIRA PHDBA/2025/30149
KEVIN KANYARI WACHIRA PHDBA/2025/60213
WABENGA BASHILWANGO PHDBA/2025/53783
CHARLES CHEGE GITAU PHDBA/2025/68950

Multi-Dimensional Scaling (MDS) in research

1. What are the key assumptions behind MDS, and how do these assumptions affect its
applicability in real-world datasets?
Definition of Multidimensional Scaling
• Multidimensional Scaling (MDS) is a statistical technique used to represent
high-dimensional data in a lower-dimensional space while preserving pairwise distances as
closely as possible.
• More broadly, MDS covers any multidimensional technique in which the qualitative and
quantitative relationships in the data are matched by geometric relationships in the
representation.
• MDS is therefore important for measuring human perceptions of, and preferences for,
particular products, because it represents relationships among behavioral data spatially.
This paper explores the key aspects of MDS.

Key assumptions behind MDS


• Similarity or Dissimilarity Data: MDS assumes that the distance/dissimilarity matrix
accurately represents the relationships between the data points. If the input distances
are misrepresented or measured incorrectly, the resulting low-dimensional representation
will be misleading.
• Continuity & Metric Consistency: Distances must be comparable, and relationships
should be preserved.
• Dimensionality Interpretation: The output space should provide a meaningful
lower-dimensional representation. MDS assumes that the structure of the data can be
captured in a lower-dimensional space. Where the dataset has a non-linear or highly
complex structure, MDS may not yield an interpretable representation.
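The metric-consistency assumption can be checked before running MDS. The sketch below is
illustrative only: the function name and the two small matrices are hypothetical, and the
checks (zero diagonal, symmetry, non-negativity, triangle inequality) describe a true
metric. Non-metric MDS can tolerate violations, so the output is best read as a list of
warnings rather than a pass/fail test.

```python
from itertools import combinations, permutations


def check_dissimilarity_matrix(d, tol=1e-9):
    """Return a list of metric-property problems found in a square matrix."""
    n = len(d)
    problems = []
    # Each object should be at distance zero from itself.
    for i in range(n):
        if abs(d[i][i]) > tol:
            problems.append(f"non-zero diagonal at {i}")
    # Dissimilarities should be symmetric and non-negative.
    for i, j in combinations(range(n), 2):
        if abs(d[i][j] - d[j][i]) > tol:
            problems.append(f"asymmetric pair ({i}, {j})")
        if d[i][j] < 0:
            problems.append(f"negative entry ({i}, {j})")
    # A direct path should never be longer than a detour through a third object.
    for i, j, k in permutations(range(n), 3):
        if d[i][j] > d[i][k] + d[k][j] + tol:
            problems.append(
                f"triangle inequality violated: d[{i}][{j}] > d[{i}][{k}] + d[{k}][{j}]"
            )
    return problems


# Hypothetical matrices, for illustration only.
consistent = [[0, 2, 3], [2, 0, 2], [3, 2, 0]]
inconsistent = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1

print(check_dissimilarity_matrix(consistent))
print(check_dissimilarity_matrix(inconsistent))
```

An empty list suggests the matrix behaves like a metric; any reported problem is a prompt
to revisit how the dissimilarities were collected or computed.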

How do these assumptions affect its applicability in real-world datasets?

• The choice of distance metric is critical. The applicability of MDS relies heavily on
selecting a distance measure that reflects the meaningful relationships within the data.
With an unsuitable measure, MDS struggles to capture those relationships accurately,
which leads to misrepresentation in the output visualization.
• Missing data, or data that is not aligned with the rest of the data (outliers and noise),
can distort the distance matrix, leading to an incorrect embedding. Methods such as
imputation or filtering out noisy data can help improve results.
• MDS works well only where the data structure can be captured in two or three
dimensions. Some data have complex relationships that require more dimensions for
meaningful interpretation. In such circumstances, techniques such as t-SNE or UMAP may be
more suitable for visualization.

2. How does MDS handle non-Euclidean distances, and what are the implications of
using different distance metrics (e.g., Minkowski, Mahalanobis)?
• MDS can work with distance measures beyond the standard Euclidean metric, making it
adaptable to different types of data.

MDS handles non-Euclidean distances by taking a precomputed dissimilarity matrix as input,
so any of the following measures can be used:

• Minkowski Distance: Generalizes Euclidean and Manhattan distances, allowing for
different scaling effects.
• Mahalanobis Distance: Accounts for correlations among variables, making it useful
when data dimensions are correlated.
• Cosine Similarity: Measures angular differences (used as a dissimilarity, for example
1 minus the similarity); common in text mining applications.
• Jaccard Distance: Suitable for categorical or binary data.

Implications: The choice of distance metric affects the MDS solution, potentially altering
the perceived structure of relationships among data points.
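To make the effect of the metric concrete, the following minimal sketch computes three
of the measures listed above for small, made-up vectors. The function names and the
example values are assumptions for illustration; Mahalanobis distance is left out because
it also needs an estimated covariance matrix.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def cosine_distance(x, y):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)


def jaccard_distance(x, y):
    """Jaccard distance between two binary (0/1) sequences."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return 1 - both / either if either else 0.0


# Hypothetical points: the same pair is 7 units apart under Manhattan
# distance but 5 units apart under Euclidean distance.
u, v = (0, 0), (3, 4)
print(minkowski(u, v, 1), minkowski(u, v, 2))
print(cosine_distance((1, 0), (0, 1)))
print(jaccard_distance((1, 1, 0, 0), (1, 0, 1, 0)))
```

Because the same pair of objects receives a different dissimilarity under each measure,
the resulting MDS configuration can differ as well.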

3. In what ways can MDS be considered a dimensionality reduction technique, and how
does it compare to PCA in terms of interpretation and usage?

• MDS reduces a high-dimensional distance matrix to a lower-dimensional space while
maintaining pairwise relationships.
• It is particularly useful as a dimensionality reduction technique when the dataset is
based on subjective distances, for instance in psychological studies where subjects
provide their own opinions or judgments.
• It is also used, particularly in its non-metric form, where the true relationships in the
data are non-linear, making linear methods like PCA unsuitable.

How does MDS compare to PCA?

• In terms of data type, MDS works with any dissimilarity measure, while PCA requires
numerical data.
• MDS is based on pairwise distances, while PCA is based on variance and covariance.
• In terms of interpretability, MDS preserves relative distances, while PCA preserves
maximum variance.
• MDS outputs a low-dimensional embedding, whereas PCA outputs principal components
(orthogonal axes).
• MDS is most useful in psychology and in non-Euclidean spaces, while PCA is used for
feature extraction, finance, etc.

4. How would you determine the optimal number of dimensions to retain in an MDS
analysis, and what are the risks of over- or under-dimensioning?

Determining the optimal number of dimensions in Multi-Dimensional Scaling (MDS) is
crucial for balancing accuracy and interpretability. The key considerations are as follows:

• Use of the stress function, particularly Kruskal’s stress: This measures the fit between
the original high-dimensional distances and the lower-dimensional representation. The
stress plot, following the elbow criterion, helps identify the point where adding more
dimensions no longer significantly reduces stress.
• Evaluating the proportion of variance explained (R²): Select the number of
dimensions that captures substantial variance while avoiding unnecessary complexity.
• Interpretability is also a key factor: the retained dimensions should provide
meaningful insights relevant to the study context (Cox & Cox, 2001).
• Cross-validation can also be employed by analyzing different subsets of the data to
determine whether the selected dimensionality remains consistent across samples.

What are the risks of over- or under-dimensioning?

Under-Dimensioning

• Under-dimensioning can result in information loss, where important structures are
omitted, leading to distorted relationships in the visualization.
• It also causes misinterpretation, as key patterns may be lost, leading to inaccurate
conclusions.
• It compromises reliability: a poor model fit, shown by high stress values, indicates that
the lower-dimensional representation fails to adequately capture the original data
structure.

Over-Dimensioning

• Over-dimensioning poses several risks, including overfitting, where excessive dimensions
model noise rather than actual patterns, reducing the model’s generalizability (Borg &
Groenen, 2005).
• It also leads to computational inefficiencies, increasing processing time and memory
requirements (Cox & Cox, 2001).
• Furthermore, too many dimensions can diminish interpretability, making visualization and
pattern recognition difficult (Kruskal & Wish, 1978).
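The elbow criterion discussed in this section can be sketched as a simple rule: keep
adding dimensions until the next one reduces stress by less than some threshold. The
function name, the threshold of 0.05, and the stress values below are all hypothetical;
in practice the elbow is usually judged visually from the stress plot, so this is only one
possible way to formalize it.

```python
def elbow_dimension(stress_by_dim, min_gain=0.05):
    """Return the smallest dimensionality after which one more dimension
    lowers stress by less than min_gain."""
    dims = sorted(stress_by_dim)
    for current, nxt in zip(dims, dims[1:]):
        gain = stress_by_dim[current] - stress_by_dim[nxt]
        if gain < min_gain:
            return current
    # No elbow found: every added dimension still helped noticeably.
    return dims[-1]


# Hypothetical stress values for solutions in 1 to 4 dimensions.
hypothetical_stress = {1: 0.32, 2: 0.11, 3: 0.09, 4: 0.08}
print(elbow_dimension(hypothetical_stress))
```

With these made-up values the rule stops at two dimensions, since moving from two to
three dimensions lowers stress by only about 0.02. A stricter or looser threshold would
trade interpretability against fit, mirroring the under- and over-dimensioning risks above.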
5. Given a dataset with missing values, how would you preprocess the data before
applying MDS, and what are the implications of different imputation strategies?
Before applying Multi-Dimensional Scaling (MDS) to a dataset with missing values, it is
essential to handle missing data appropriately to prevent bias and ensure meaningful
distance calculations. The steps are as follows:
STEPS
I. The first step is assessing the missing data pattern by determining whether values
are missing completely at random (MCAR), missing at random (MAR), or missing
not at random (MNAR).
II. Visualization tools, such as heatmaps or missingness matrices, can help identify
patterns in the missing data.
III. If certain rows or columns contain excessive missing values, they may need to be
removed to preserve data integrity.
IV. Once missing values are identified, an appropriate imputation method should be
selected before computing distances.
V. Additionally, standardization or normalization may be necessary to ensure
consistency in scaling and prevent biases in distance-based calculations.
VI. After imputation, it is important to check for bias by comparing distributions before
and after the imputation process to ensure that data structure and variability remain
intact.
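The steps above can be sketched as a small pipeline. This is a minimal illustration under
stated assumptions, not a recommended default: the function name, the 50% missingness
cut-off, the choice of median imputation, and the example rows are all hypothetical, and
missing values are represented by None.

```python
import statistics


def preprocess(rows, max_missing=0.5):
    """Drop sparse columns, median-impute the rest, then z-score each column."""
    n_rows = len(rows)
    kept_columns = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        # Step III: remove columns with excessive missing values.
        if 1 - len(observed) / n_rows > max_missing:
            continue
        # Step IV: impute (here with the column median, as one simple option).
        median = statistics.median(observed)
        filled = [median if v is None else v for v in column]
        # Step V: standardize so no variable dominates the distances.
        mean = statistics.fmean(filled)
        sd = statistics.pstdev(filled)
        kept_columns.append([(v - mean) / sd if sd else 0.0 for v in filled])
    return [list(row) for row in zip(*kept_columns)]


# Hypothetical data: the third column is 75% missing and is dropped.
rows = [
    [1.0, 10.0, None],
    [2.0, None, None],
    [3.0, 30.0, None],
    [4.0, 40.0, 7.0],
]
clean = preprocess(rows)
for row in clean:
    print([round(v, 2) for v in row])
```

The cleaned rows can then be passed to a distance function to build the dissimilarity
matrix. Step VI (comparing distributions before and after imputation) would be carried
out on the filled columns before standardization.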
What are the implications of different imputation strategies?
Different imputation strategies affect MDS in various ways:
• Mean or median imputation can distort variance and reduce data diversity, leading
to biased distance calculations.
• K-nearest neighbors (KNN) imputation preserves local data structure but may introduce
bias if the nearest neighbors are not well distributed.
• Multiple Imputation (MI) reduces bias and maintains variability, though it is
computationally intensive and may introduce noise.
• Regression-based imputation helps maintain relationships among variables but
assumes linear dependencies, which may not always hold.
• Deletion methods, such as listwise or pairwise deletion, are simple but can lead to
the loss of valuable data, reducing sample size and affecting generalizability.
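The variance distortion caused by mean imputation is easy to demonstrate. In the sketch
below the observed values and the number of missing entries are hypothetical; the point
is only that filling gaps with the mean adds observations at the center of the
distribution, which shrinks the variance and therefore the distances computed from it.

```python
import statistics

# Hypothetical variable with four observed values and two missing ones.
observed = [2.0, 4.0, 6.0, 8.0]
n_missing = 2

# Mean imputation: every missing entry is replaced by the observed mean.
mean_imputed = observed + [statistics.fmean(observed)] * n_missing

var_before = statistics.pvariance(observed)
var_after = statistics.pvariance(mean_imputed)

print(f"variance of observed values:  {var_before:.2f}")
print(f"variance after mean imputing: {var_after:.2f}")
```

The same comparison, run on real variables before and after imputation, is one way to
carry out the bias check described in step VI.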

6. How does the choice of dissimilarity measure impact MDS results? Can MDS be used
effectively with categorical data, and if so, how?

The choice of dissimilarity measure plays a critical role in the results of Multi-Dimensional
Scaling (MDS), as follows:
• It directly influences how the distances (or dissimilarities) between data points are
quantified and how these points are positioned in the low-dimensional space.
• Different dissimilarity measures affect the interpretation of MDS outputs in varying
ways. For continuous data, Euclidean distance is commonly used, as it reflects the
geometrical closeness of data points in the original high-dimensional space.
• Correlation-based dissimilarity, often applied in biological or behavioral data,
focuses on the relationship between variables rather than their absolute positions.

Can MDS be used effectively with categorical data, and if so, how?

MDS can also be used with categorical data, but special techniques are required to handle
the lack of natural ordering in such data.

• To compute dissimilarities for categorical data, measures such as Hamming
distance, Jaccard distance, or Gower’s dissimilarity can be applied, with Gower’s
being particularly suitable for mixed data types.
• A dissimilarity matrix is created from these pairwise distances and used as input
for MDS. Non-metric MDS (NMDS) is an alternative that can handle non-Euclidean
dissimilarities and is effective for categorical data.
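As an illustration of the first step, the sketch below builds a dissimilarity matrix for
purely categorical records using the proportion of mismatching attributes (a normalized
Hamming distance). The function names and the three respondent records are hypothetical.
Gower’s dissimilarity would extend the same idea to mixed data by adding range-scaled
differences for numeric variables.

```python
def simple_matching_dissimilarity(a, b):
    """Proportion of attributes on which two categorical records differ."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def dissimilarity_matrix(records):
    """Square matrix of pairwise dissimilarities, suitable as MDS input."""
    return [[simple_matching_dissimilarity(r, s) for s in records] for r in records]


# Hypothetical respondents described by three categorical attributes.
respondents = [
    ("urban", "female", "student"),
    ("urban", "male", "student"),
    ("rural", "male", "employed"),
]
matrix = dissimilarity_matrix(respondents)
for row in matrix:
    print([round(value, 2) for value in row])
```

The resulting matrix is symmetric with a zero diagonal, so it can be passed to a metric
or non-metric MDS routine in place of Euclidean distances.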

7. Explain how MDS can be used in market research to analyze brand positioning. What
challenges arise in interpreting the output?
• Multidimensional Scaling (MDS) is used in market research to analyze brand
positioning by visually representing how consumers perceive different brands in
relation to each other. It reduces complex brand similarity data into a low-dimensional
space (usually 2D or 3D), making it easier to interpret.

How MDS is Used in Brand Positioning Analysis

• Collecting Perceptual Data: Consumers are asked to rate the similarity between
brands based on attributes like quality, price, taste, innovation, etc. Alternatively, they
may rank brands based on preferences or perceptions.
• Constructing a Dissimilarity Matrix: If brands are compared in pairs, the researcher
records a dissimilarity score (e.g., on a scale of 1 to 10, where 1 means very similar and
10 means very different).

Example for five brands of Tusker

Brand   A     B     C     D     E
A       0     3.2   1.5   4.0   2.8
B       3.2   0     2.1   3.7   3.5
C       1.5   2.1   0     3.9   3.2
D       4.0   3.7   3.9   0     1.2
E       2.8   3.5   3.2   1.2   0

Applying MDS Algorithm

• MDS converts the dissimilarity scores into a spatial representation where brands are
plotted on a map.
• Brands that are perceived as similar will be closer together, while those that are
perceived as different will be farther apart.
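A minimal sketch of this conversion, applied to the five-brand example matrix, is given
below. It uses classical (Torgerson) MDS, which double-centers the squared dissimilarities
and takes the leading eigenvectors. This technique is chosen here only because it is
compact; it is not the PROXSCAL procedure used elsewhere in this paper, and the function
name is an assumption. NumPy is assumed to be available.

```python
import numpy as np


def classical_mds(dissimilarities, n_components=2):
    """Classical (Torgerson) MDS: coordinates from a dissimilarity matrix."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)


brands = ["A", "B", "C", "D", "E"]
dissimilarity = [
    [0.0, 3.2, 1.5, 4.0, 2.8],
    [3.2, 0.0, 2.1, 3.7, 3.5],
    [1.5, 2.1, 0.0, 3.9, 3.2],
    [4.0, 3.7, 3.9, 0.0, 1.2],
    [2.8, 3.5, 3.2, 1.2, 0.0],
]
coords = classical_mds(dissimilarity)
for name, (x, y) in zip(brands, coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

The signs and orientation of the axes are arbitrary, so only the relative positions of
the brands carry meaning. Plotting the two columns of `coords` gives the perceptual map.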

Interpreting the Brand Positioning Map

• Clusters of brands indicate market segments (e.g., premium brands vs. budget brands).
• Gaps on the map may reveal opportunities for new products or rebranding.
• Axes can represent latent dimensions (e.g., "Luxury vs. Budget" or "Innovative vs.
Traditional").

Challenges in Interpreting MDS Output

• Lack of Defined Axes: MDS does not label the axes automatically, so researchers must
interpret the dimensions based on brand attributes. This can lead to subjective
interpretations.
• Choosing the Right Number of Dimensions: While 2D plots are easy to visualize, they
may oversimplify brand perceptions. More dimensions (e.g., 3D) improve accuracy but
make visualization difficult.
• Influence of Data Quality: If consumer similarity ratings are inconsistent or biased, the
MDS output may be misleading. Ensuring a large and representative sample is crucial.
• Interpretation Variability: Different MDS solutions (classical MDS vs. non-metric MDS)
can yield different brand maps. Results may change based on the scaling techniques or
transformations used.
• Limited Predictive Power: MDS shows relative perceptions but does not explain why
consumers prefer certain brands. It must be combined with factor analysis, regression, or
cluster analysis for deeper insights.
8. What role does MDS play in social science research, and how does it help in
visualizing complex relationships?
• MDS as a statistical technique is used in social science research to analyze and
visualize complex relationships among objects, individuals, or concepts.
• It helps researchers understand perceptions, preferences, and similarities among
entities by representing them in a low-dimensional space (typically 2D or 3D).

How does it help in visualizing complex relationships?

• Uncovering Hidden Patterns: MDS helps researchers identify underlying
structures in data, such as how people group concepts or how social attitudes cluster. For
example, in public opinion research, MDS can reveal how different political ideologies
are perceived relative to one another.
• Visualizing Complex Relationships: MDS converts complex high-dimensional data
(e.g., dissimilarity matrices) into a visual map. This is useful in brand
perception studies, where brands are mapped based on consumer similarity ratings.
• Measuring Perceptions and Attitudes: In psychology and sociology, MDS is
used to study attitudes, emotions, and stereotypes by showing how closely different
concepts are related in people’s minds. Example: It can map how people associate different
personality traits with specific social groups.
• Social Network Analysis: MDS can visualize relationships in social networks, showing
how individuals or groups are connected based on communication patterns or shared
affiliations. Example: Mapping relationships between politicians, influencers, or
community leaders based on their interactions.
• Marketing and Consumer Research: MDS helps in understanding how consumers perceive
and differentiate between products, brands, or services. Example: If brands A, B, and C are
close in an MDS plot, it suggests consumers see them as similar, indicating direct
competition.

9. How would you evaluate the robustness and reliability of an MDS solution? What
statistical tests can be used to validate the derived dimensions?
• To ensure that a Multidimensional Scaling (MDS) solution is both robust and reliable,
researchers must assess its goodness-of-fit, stability, and interpretability using statistical
and methodological techniques, as follows:

1. Goodness-of-Fit Measures: These help determine how well the MDS solution
represents the original dissimilarities. The following measures apply:

a) Stress Value (Kruskal’s Stress): Measures the discrepancy between the
original dissimilarities and the distances in the MDS solution. Lower values indicate a
better fit.

b) R-Squared (RSQ): Measures how much variance in the original dissimilarities is
explained by the MDS solution. Higher R² values (≥ 0.80) indicate a strong fit.
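Both goodness-of-fit measures can be computed directly from two matched lists: the
dissimilarities and the distances in the fitted configuration. The sketch below assumes
the metric case, where the raw dissimilarities are used in place of the disparities that
non-metric MDS would obtain by monotone regression. The function names and the example
values are hypothetical.

```python
import math


def stress_1(disparities, distances):
    """Kruskal's Stress-1: sqrt(sum((d_hat - d)^2) / sum(d^2))."""
    numerator = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)


def r_squared(dissimilarities, distances):
    """Squared Pearson correlation between dissimilarities and distances."""
    n = len(distances)
    mean_x = sum(dissimilarities) / n
    mean_y = sum(distances) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dissimilarities, distances))
    sxx = sum((x - mean_x) ** 2 for x in dissimilarities)
    syy = sum((y - mean_y) ** 2 for y in distances)
    return sxy ** 2 / (sxx * syy)


# Hypothetical values: one entry per pair of objects.
original = [1.0, 2.0, 3.0]
fitted = [1.1, 1.9, 3.2]

print(f"Stress-1: {stress_1(original, fitted):.4f}")
print(f"RSQ:      {r_squared(original, fitted):.4f}")
```

A stress near zero together with an RSQ near one indicates that the configuration
reproduces the input dissimilarities closely.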

2. Stability and Reliability Checks

These checks determine whether the MDS results are consistent across different conditions.
a) Bootstrapping & Resampling: Recompute the MDS solution on different subsets of
the data. If the results remain consistent, the solution is stable.

b) Split-Half Reliability: Randomly split the dataset into two halves, perform MDS
separately on each half, and compare the results. High correlation between the two
solutions indicates reliability.

c) Repeating MDS with Different Initial Conditions: MDS solutions can sometimes
converge to local minima. Running MDS multiple times with different starting
configurations helps confirm that the solution is stable.

3. Dimension Validation Tests

These assess whether the chosen number of dimensions (e.g., 2D, 3D) is appropriate.

a) Scree Plot (Elbow Method): Plots stress values vs. the number of dimensions. Look for
an "elbow point" where adding more dimensions does not significantly reduce stress.

b) Shepard Diagram: Plots original dissimilarities vs. MDS-reproduced distances. A
strong monotonic relationship (i.e., smooth curve) suggests a well-fitted MDS model.

c) Procrustes Analysis: Compares two MDS solutions (e.g., different datasets, different
subsets) to check similarity. If the Procrustes distance is low, the solution is reliable.
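Procrustes analysis matters because MDS configurations are only defined up to shifting,
rescaling, rotating, and reflecting. Two solutions can look different and still be the same
map. The sketch below computes a Procrustes disparity with NumPy (assumed to be
available) after removing those differences. The function name is an assumption, the
example configuration is the set of final coordinates reported under Question 10, and the
transformed copy is constructed purely for illustration.

```python
import numpy as np


def procrustes_disparity(reference, other):
    """Residual sum of squares after optimally translating, scaling, and
    rotating/reflecting `other` onto `reference` (0 = identical shape)."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(other, dtype=float)
    # Remove location and overall size from both configurations.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    # With unit-norm inputs, the best orthogonal fit leaves 1 - (sum of
    # singular values)^2 unexplained.
    singular_values = np.linalg.svd(y.T @ x, compute_uv=False)
    return max(0.0, float(1.0 - singular_values.sum() ** 2))


config = np.array([
    [0.696, 0.094], [0.665, 0.135], [0.462, -0.229],
    [-0.465, -0.172], [-0.707, 0.302], [-0.650, -0.130],
])

# Same map, but rotated by 30 degrees, enlarged, and shifted.
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
transformed = 2.5 * config @ rotation + np.array([1.0, -2.0])

# Different map: the same points assigned to the wrong objects.
shuffled = np.roll(config, 1, axis=0)

print(procrustes_disparity(config, transformed))
print(procrustes_disparity(config, shuffled))
```

A disparity near zero means the two solutions agree once arbitrary orientation is
removed, while a disparity close to one signals genuinely different configurations.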
Question 10

PROXSCAL VARIABLES=Toyota Honda Ford BMW Tesla Mercedes
  /SHAPE=BOTH
  /INITIAL=SIMPLEX
  /TRANSFORMATION=INTERVAL
  /PROXIMITIES=DISSIMILARITIES
  /ACCELERATION=NONE
  /CRITERIA=DIMENSIONS(2,2) MAXITER(100) DIFFSTRESS(.0001) MINSTRESS(.0001)
  /PRINT=COMMON STRESS
  /PLOT=STRESS COMMON.

Proxscal

Credit
Proxscal Version 1.0 by Leiden SPSS Group, Leiden University, The Netherlands

Case Processing Summary

Cases                              6
Sources                            1
Objects                            6
Proximities  Total Proximities     30 (a)
             Missing Proximities   0
             Active Proximities    15 (b)

a. Sum of all strictly lower-triangular and strictly upper-triangular proximities.
b. Active proximities include all non-missing proximities.

Goodness of Fit

Stress and Fit Measures

Normalized Raw Stress                  .00128
Stress-I                               .03575 (a)
Stress-II                              .07593 (a)
S-Stress                               .00430 (b)
Dispersion Accounted For (D.A.F.)      .99872
Tucker's Coefficient of Congruence     .99936

PROXSCAL minimizes Normalized Raw Stress.
a. Optimal scaling factor = 1.001.
b. Optimal scaling factor = 1.000.

Common Space

Final Coordinates

            Dimension
            1        2
Toyota      .696     .094
Honda       .665     .135
Ford        .462    -.229
BMW        -.465    -.172
Tesla      -.707     .302
Mercedes   -.650    -.130

Question: Analyze the MDS map to understand brand positioning and comment on the
stress value
In MDS we evaluate the fit of the model using the stress values. In essence, the stress value
is a badness-of-fit measure of the deviation between the fitted distances and the observed
distances, which the MDS algorithm seeks to minimize (Ding, 2018). Typically, stress values
below 0.15 indicate a good fit (Dugard et al., 2010).
Normalized Raw Stress: The value of 0.00128 indicates that the fit of the model is very good.
Normalized raw stress values close to zero indicate that distances in the reduced-dimensional
space accurately reflect distances in the original space.
Stress-I: The value of 0.03575 is within the acceptable range of below 0.15, indicating that
the model is a good fit.
Stress-II: The value of 0.07593 is also below 0.15, confirming that the model is a good fit.
S-Stress: The value of 0.00430 is likewise close to zero.
Dispersion Accounted For (DAF): This represents the proportion of variance explained by
MDS. Values close to one indicate a good fit. Our model has a value of 0.99872, which is a
nearly perfect fit.
Tucker’s Coefficient of Congruence: This is the square root of DAF and also measures goodness
of fit. Our value of 0.99936 shows near-perfect congruence.
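The reported fit measures are mutually consistent, which can be verified with a short
calculation. The sketch assumes that Dispersion Accounted For equals one minus the
Normalized Raw Stress. That assumption matches the reported figures, but it is stated
here as an assumption rather than taken from the output itself. The square-root relation
for Tucker’s coefficient is the one described above.

```python
import math

# Value taken from the Stress and Fit Measures table.
normalized_raw_stress = 0.00128

# Assumed relation: DAF = 1 - Normalized Raw Stress.
daf = 1 - normalized_raw_stress

# Tucker's coefficient of congruence is the square root of DAF.
tucker = math.sqrt(daf)

print(f"DAF:    {daf:.5f}")
print(f"Tucker: {tucker:.5f}")
```

Both recomputed values agree with the table to five decimal places, so the table has
been transcribed consistently.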
Interpretation of MDS map
Honda and Toyota are very close on the map, indicating similar market perceptions, which
may relate to reliability or affordability. Ford is also close to Honda and Toyota, indicating
that it may be perceived as an affordable brand, though not as affordable as Honda and Toyota.
At the other end, Mercedes and BMW are close together, which might indicate that both are
perceived as premium brands. Tesla is also located near BMW and Mercedes, but slightly
further away, indicating the perception of a premium brand with a unique identity. This may
be because Tesla is the only electric car brand in our sampled data.
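The closeness judgments in this interpretation can be backed by the Euclidean distances
between the final coordinates. The sketch below uses only the coordinates reported in the
Common Space table; the variable names are illustrative.

```python
import math
from itertools import combinations

# Final coordinates from the Common Space table (dimension 1, dimension 2).
coordinates = {
    "Toyota": (0.696, 0.094),
    "Honda": (0.665, 0.135),
    "Ford": (0.462, -0.229),
    "BMW": (-0.465, -0.172),
    "Tesla": (-0.707, 0.302),
    "Mercedes": (-0.650, -0.130),
}

# Distance on the map for every pair of brands.
distances = {
    (a, b): math.dist(coordinates[a], coordinates[b])
    for a, b in combinations(coordinates, 2)
}

# List pairs from most similar to least similar.
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a:8s} {b:8s} {d:.3f}")
```

Toyota and Honda form the closest pair, followed by BMW and Mercedes. Tesla sits
further from both premium brands than they sit from each other, which is in line with the
reading of Tesla as a premium brand with a distinct identity.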

References

• Borg, I., & Groenen, P. J. (2005). Modern Multidimensional Scaling: Theory and
Applications.
• Cox, T. F., & Cox, M. A. (2001). Multidimensional Scaling. Chapman & Hall/CRC.
• de Leeuw, J. (2020, March). Multidimensional Scaling. University of California, Los
Angeles. https://www.researchgate.net/publication/2634627_Multidimensional_Scaling
• Deza, E., & Deza, M. (2009). Encyclopedia of Distances. Springer.
• Ding, C. S. (2018). Fundamentals of Applied Multidimensional Scaling for Educational and
Psychological Research. Springer International Publishing.
https://doi.org/10.1007/978-3-319-78172-3
• Dugard, P., Todman, J. B., & Staines, H. (2010). Approaching multivariate analysis: A
practical introduction (2nd edition). Routledge. https://doi.org/10.4324/9781003343097
• Kruskal, J. B., & Wish, M. (1978). Multidimensional Scaling. Sage Publishers.
• Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proceedings of the
National Institute of Sciences of India.
