final assignment
DEPARTMENT OF MANAGEMENT
GROUP ASSIGNMENT
NAME ADMIN NO
JAMES N. GICHURU PHDBA/2025/31657
LEILA WAITHIRA PHDBA/2025/30149
KEVIN KANYARI WACHIRA PHDBA/2025/60213
WABENGA BASHILWANGO PHDBA/2025/53783
CHARLES CHEGE GITAU PHDBA/2025/68950
1. What are the key assumptions behind MDS, and how do these assumptions affect its
applicability in real-world datasets?
Definition of Multidimensional Scaling
Multidimensional Scaling (MDS) is a statistical technique used to represent high-dimensional data in a lower-dimensional space while preserving the pairwise distances between observations as closely as possible.
More broadly, MDS refers to any technique in which qualitative or quantitative relationships in the data are represented by geometric relationships in a spatial representation.
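For the classical (Torgerson) form of metric MDS, the mapping from dissimilarities to coordinates takes a few lines of linear algebra. This is the standard textbook construction, shown here as a sketch. Non-metric MDS instead fits the configuration iteratively by minimizing stress. The notation is introduced only for this sketch: δ_ij is the dissimilarity between objects i and j, n is the number of objects, and k is the number of retained dimensions.

```latex
\begin{aligned}
D^{(2)} &= \big[\delta_{ij}^{2}\big]_{n\times n}, \qquad
J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},\\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J = V \Lambda V^{\top},\\
X_k &= V_k\, \Lambda_k^{1/2}.
\end{aligned}
```

V_k holds the eigenvectors belonging to the k largest eigenvalues Λ_k, and the rows of X_k are the object coordinates. When the δ_ij are Euclidean distances, B is positive semidefinite. With enough dimensions, the distances between the rows of X_k then reproduce the δ_ij exactly.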
MDS is therefore important for measuring human perceptions of, and preferences for, particular products, because it provides a spatial representation of the relationships among behavioral data. This paper explores the key aspects of MDS.
The choice of distance metric is critical. The applicability of MDS depends heavily on selecting a distance measure that reflects the meaningful relationships within the data. With an unsuitable measure, MDS may fail to capture those relationships accurately, which leads to a misleading visualization.
Missing data, or observations that are out of line with the rest of the data (outliers), can distort the distance matrix and lead to an incorrect embedding. Methods such as imputation or filtering out noisy data can help improve results.
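The following is a minimal sketch of the imputation idea. It assumes a symmetric dissimilarity matrix in which missing pairs are stored as NaN. The helper `impute_missing_dissimilarities` is a hypothetical name, not part of any library. It replaces each missing value with the mean of the observed dissimilarities. Mean imputation is only one simple choice and is used here purely for illustration.

```python
import numpy as np

def impute_missing_dissimilarities(D):
    """Hypothetical helper: fill NaN entries of a symmetric dissimilarity
    matrix with the mean of the observed off-diagonal dissimilarities."""
    D = np.array(D, dtype=float)                 # work on a copy
    n = D.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    observed = D[off_diag & ~np.isnan(D)]        # observed pairs only
    D[np.isnan(D)] = observed.mean()             # fill every missing cell
    np.fill_diagonal(D, 0.0)                     # self-dissimilarity stays zero
    return D

# Made-up 4x4 matrix with one missing pair: entries (0, 3) and (3, 0).
D = np.array([
    [0.0,    2.0, 3.0, np.nan],
    [2.0,    0.0, 1.0, 4.0],
    [3.0,    1.0, 0.0, 2.0],
    [np.nan, 4.0, 2.0, 0.0],
])
D_filled = impute_missing_dissimilarities(D)
print(D_filled)
```

The filled matrix can then be passed to any MDS routine that accepts a precomputed dissimilarity matrix.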
MDS works well only when the structure of the data can be captured in two or three dimensions. Some data have complex relationships that require more dimensions for a meaningful interpretation. In such circumstances, techniques such as t-SNE or UMAP may be more suitable for visualization.
2. How does MDS handle non-Euclidean distances, and what are the implications of
using different distance metrics (e.g., Minkowski, Mahalanobis)?
MDS can work with distance measures other than the standard Euclidean metric, which makes it adaptable to different types of data.
Implications: The choice of distance metric affects the MDS solution and can alter the perceived structure of relationships among the data points.
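The classical (Torgerson) variant of MDS shows how the method copes with non-Euclidean distances, because it assumes Euclidean input. After the squared distances are double-centered, a Euclidean matrix gives only non-negative eigenvalues. A non-Euclidean matrix produces negative eigenvalues, which no real configuration can reproduce. The sketch below uses made-up data (the four corners of a unit square) to compare three metrics: Minkowski with p = 2 (Euclidean), Minkowski with p = 1 (city-block), and Mahalanobis. The helper names are illustrative only.

```python
import numpy as np

def pairwise(X, dist):
    """Square matrix holding dist(x_i, x_j) for every pair of rows of X."""
    n = len(X)
    return np.array([[dist(X[i], X[j]) for j in range(n)] for i in range(n)])

def minkowski(p):
    """Minkowski distance of order p (p=2 is Euclidean, p=1 is city-block)."""
    return lambda a, b: float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def mahalanobis(VI):
    """Mahalanobis distance, given the inverse covariance matrix VI."""
    return lambda a, b: float(np.sqrt((a - b) @ VI @ (a - b)))

def double_center(D):
    """Classical-MDS double-centering: B = -1/2 * J D^2 J."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

# Made-up data: the four corners of a unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse sample covariance

metrics = {
    "euclidean": minkowski(2),
    "cityblock": minkowski(1),
    "mahalanobis": mahalanobis(VI),
}
eigs = {}
for name, dist in metrics.items():
    # Eigenvalues of the double-centered matrix, in ascending order.
    eigs[name] = np.linalg.eigvalsh(double_center(pairwise(X, dist)))
    print(f"{name:12s}", np.round(eigs[name], 3))
```

The city-block matrix yields a negative eigenvalue, so classical MDS can only approximate it. Non-metric MDS, which fits only the rank order of the dissimilarities, is a common alternative. The Mahalanobis distance is a Euclidean distance after rescaling by the inverse covariance, so its eigenvalues stay non-negative.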
3. In what ways can MDS be considered a dimensionality reduction technique, and how
does it compare to PCA in terms of interpretation and usage?
In terms of data type, MDS works with any dissimilarity measure, while PCA requires numerical data.
In terms of assumptions, MDS is based on pairwise distances, while PCA is based on variance and covariance.
In terms of interpretation, MDS preserves relative distances, while PCA preserves maximum variance.
In terms of output, MDS produces a low-dimensional embedding, while PCA produces principal components (orthogonal axes).
In terms of usage, MDS is most useful in psychology and with non-Euclidean spaces, while PCA is used for feature extraction, finance, and similar applications.
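The link between the two methods can be checked numerically. When the dissimilarities are Euclidean distances, classical MDS returns the same coordinates as the PCA scores, up to the sign of each axis. The sketch below uses synthetic random data and plain NumPy. The function names are illustrative.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: top-k eigenvectors of the double-centered matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]             # indices of the k largest
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def pca_scores(X, k):
    """Projections of the centered data onto the first k principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
# Synthetic data with clearly different spreads along the four axes.
X = rng.normal(size=(20, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

mds = classical_mds(D, 2)
pca = pca_scores(X, 2)
# Each axis may be flipped in sign, so compare absolute values.
print(np.allclose(np.abs(mds), np.abs(pca), atol=1e-6))
```

The equivalence holds only for Euclidean input. With other dissimilarities, MDS and PCA no longer coincide, which is where MDS offers something PCA cannot.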
4. How would you determine the optimal number of dimensions to retain in an MDS
analysis, and what are the risks of over- or under-dimensioning?
Use of the stress function, particularly Kruskal's stress: this measures the fit between the original high-dimensional distances and the lower-dimensional representation. A stress plot, read with the elbow criterion, helps identify the point at which adding more dimensions no longer reduces stress significantly.
Evaluating the proportion of variance explained (R²): select the number of dimensions that captures substantial variance while avoiding unnecessary complexity.
Interpretability is also a key factor, ensuring that the retained dimensions provide
meaningful insights relevant to the study context (Cox & Cox, 2001).
Cross-validation can also be employed by analyzing different subsets of data to
determine whether the selected dimensionality remains consistent across samples.
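The stress-based dimension check can be sketched as follows. The code computes Kruskal's Stress-1 for solutions with 1 to 4 dimensions on synthetic data. For simplicity and determinism, it uses classical MDS and treats the input dissimilarities as the disparities, which corresponds to a metric fit. In a non-metric analysis, the disparities would instead be monotone transformations of the dissimilarities. The elbow is where the printed stress stops dropping sharply.

```python
import numpy as np

def classical_mds(D, k):
    """Classical (Torgerson) MDS: top-k eigenvectors of the double-centered matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def distances(X):
    """Euclidean distance matrix between the rows of X."""
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

def kruskal_stress1(D, X):
    """Stress-1, using the input dissimilarities D as disparities (metric fit)."""
    d = distances(X)
    iu = np.triu_indices_from(D, k=1)            # each pair i < j once
    return np.sqrt(((d[iu] - D[iu]) ** 2).sum() / (d[iu] ** 2).sum())

rng = np.random.default_rng(1)
# Synthetic data: most of the structure lies in the first two of four axes.
X = rng.normal(size=(30, 4)) * np.array([4.0, 2.0, 0.5, 0.25])
D = distances(X)

stresses = [kruskal_stress1(D, classical_mds(D, k)) for k in range(1, 5)]
for k, s in enumerate(stresses, start=1):
    print(f"{k} dimension(s): Stress-1 = {s:.4f}")
```

With these synthetic data, most of the drop happens by two dimensions, and the fourth dimension reproduces the distances essentially exactly. Retaining it, however, would mostly model minor variation, which is the over-dimensioning risk discussed below.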
Over-Dimensioning:
Over-dimensioning poses several risks, including overfitting, where excessive dimensions model noise rather than actual patterns, reducing the model's generalizability (Borg & Groenen, 2005). It also leads to computational inefficiencies, increasing processing time and memory requirements (Cox & Cox, 2001). Furthermore, too many dimensions can diminish interpretability, making visualization and pattern recognition difficult (Kruskal & Wish, 1978).
6. How does the choice of dissimilarity measure impact MDS results? Can MDS be used
effectively with categorical data, and if so, how?
The choice of dissimilarity measure plays a critical role in the results of Multidimensional Scaling (MDS), as follows:
It directly influences how the distances (or dissimilarities) between data points are quantified and how these points are positioned in the low-dimensional space.
Different dissimilarity measures affect the interpretation of MDS outputs in different ways. For continuous data, Euclidean distance is commonly used, as it reflects the geometric closeness of data points in the original high-dimensional space.
Correlation-based dissimilarity, often applied in biological or behavioral data,
focuses on the relationship between variables rather than their absolute positions.
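A short contrast makes the difference concrete. Take two hypothetical rating profiles with the same shape but different levels. Their Euclidean distance is large, while their correlation-based dissimilarity is zero. The sketch defines that dissimilarity as 1 minus Pearson's r, which is one common convention.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical rating profile
b = a + 10.0                          # same shape, shifted level

euclidean = np.linalg.norm(a - b)
correlation_dissim = 1.0 - np.corrcoef(a, b)[0, 1]
print(euclidean, correlation_dissim)
```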
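Can MDS be used effectively with categorical data, and if so, how?
MDS can also be used with categorical data, but special techniques are required to handle the lack of a natural ordering in categorical data. For example, one can compute a matching-based dissimilarity, such as the simple matching coefficient or Gower's distance, and then apply MDS to the resulting matrix.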
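The sketch below assumes simple matching as the dissimilarity. It computes the share of attributes on which each pair of made-up survey respondents differs, then embeds the resulting matrix with classical MDS. A matching dissimilarity need not be Euclidean, so any negative eigenvalues are clipped at zero. Non-metric MDS would be a natural alternative.

```python
import numpy as np

# Made-up categorical responses: colour, region, plan.
rows = np.array([
    ["red",  "urban", "basic"],
    ["red",  "urban", "premium"],
    ["blue", "rural", "premium"],
    ["blue", "rural", "basic"],
])

# Simple matching dissimilarity: share of attributes on which two cases differ.
D = (rows[:, None, :] != rows[None, :, :]).mean(axis=2)

# Classical MDS on the matching dissimilarities.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]                  # two largest eigenvalues
coords = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

print(np.round(D, 3))
print(np.round(coords, 3))
```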
7. Explain how MDS can be used in market research to analyze brand positioning. What
challenges arise in interpreting the output?
Multidimensional Scaling (MDS) is used in market research to analyze brand
positioning by visually representing how consumers perceive different brands in
relation to each other. It reduces complex brand similarity data into a low-dimensional
space (usually 2D or 3D), making it easier to interpret.
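Example: perceived dissimilarity between five brands (0 = identical; larger values = less similar)
Brand   A     B     C     D     E
A       0     3.2   1.5   4.0   2.8
B       3.2   0     2.1   3.7   3.5
C       1.5   2.1   0     3.9   3.2
D       4.0   3.7   3.9   0     1.2
E       2.8   3.5   3.2   1.2   0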
MDS converts the dissimilarity scores into a spatial representation where brands are plotted
on a map.
Brands that are perceived as similar will be closer together, while those that are perceived
as different will be farther apart.
Clusters of brands indicate market segments (e.g., premium brands vs. budget brands).
Gaps on the map may reveal opportunities for new products or rebranding.
Axes can represent latent dimensions (e.g., "Luxury vs. Budget" or "Innovative vs.
Traditional").
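Applying classical (Torgerson) MDS to the brand dissimilarity matrix above produces a two-dimensional map. This is a minimal NumPy sketch: the study may have used a different MDS variant or software. The axes it produces carry no labels of their own.

```python
import numpy as np

brands = ["A", "B", "C", "D", "E"]
# Dissimilarity ratings from the brand table.
D = np.array([
    [0.0, 3.2, 1.5, 4.0, 2.8],
    [3.2, 0.0, 2.1, 3.7, 3.5],
    [1.5, 2.1, 0.0, 3.9, 3.2],
    [4.0, 3.7, 3.9, 0.0, 1.2],
    [2.8, 3.5, 3.2, 1.2, 0.0],
])

# Classical MDS: double-center the squared dissimilarities, keep two axes.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
vals, vecs = np.linalg.eigh(B)
idx = np.argsort(vals)[::-1][:2]                  # two largest eigenvalues
coords = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

for brand, (x, y) in zip(brands, coords):
    print(f"{brand}: ({x:6.2f}, {y:6.2f})")

# Distances on the map, to compare with the original ratings.
mapped = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(np.round(mapped, 2))
```

On the resulting map, D and E sit close together and apart from A, B and C, and C is the nearest brand to A. This matches the small dissimilarities (1.2 and 1.5) in the table. Naming what each axis means is still left to the researcher.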
Lack of defined axes: MDS does not label the axes automatically, so researchers must interpret the dimensions on the basis of brand attributes. This can lead to subjective interpretations.
Choosing the right number of dimensions: While 2D plots are easy to visualize, they may oversimplify brand perceptions. More dimensions (e.g., 3D) improve accuracy but make visualization difficult.
Influence of data quality: If consumer similarity ratings are inconsistent or biased, the MDS output may be misleading. Ensuring a large and representative sample is crucial.
Interpretation variability: Different MDS solutions (classical MDS vs. non-metric MDS) can yield different brand maps. Results may change depending on the scaling technique or transformations used.
Limited predictive power: MDS shows relative perceptions but does not explain why consumers prefer certain brands. It must be combined with factor analysis, regression, or cluster analysis for deeper insights.
8. What role does MDS play in social science research, and how does it help in
visualizing complex relationships?
As a statistical technique, MDS is used in social science research to analyze and visualize complex relationships among objects, individuals, or concepts.
It helps researchers understand perceptions, preferences, and similarities among entities by representing them in a low-dimensional space (typically 2D or 3D).
9. How would you evaluate the robustness and reliability of an MDS solution? What
statistical tests can be used to validate the derived dimensions?
To ensure that a Multidimensional Scaling (MDS) solution is both robust and reliable,
researchers must assess its goodness-of-fit, stability, and interpretability using statistical
and methodological techniques. Here’s how:
Goodness-of-fit measures:
a) Stress value (Kruskal's stress): measures the discrepancy between the original dissimilarities and the distances in the MDS solution; lower values indicate a better fit.
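Kruskal's Stress-1, the most commonly reported version, can be written as follows. Here d_ij is the distance between points i and j in the MDS configuration. The disparity d̂_ij is the dissimilarity itself in metric MDS, or a monotone transformation of it in non-metric MDS.

```latex
\text{Stress-1} = \sqrt{\frac{\sum_{i<j}\big(d_{ij}-\hat{d}_{ij}\big)^{2}}{\sum_{i<j} d_{ij}^{2}}}
```

A value of 0 indicates a perfect fit; larger values indicate a poorer fit.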
Stability tests:
These tests determine whether the MDS results are consistent across different conditions.
a) Bootstrapping and resampling: recompute the MDS solution on different subsets of the data. If the results remain consistent, the solution is stable.
b) Split-half reliability: randomly split the dataset into two halves, perform MDS separately on each half, and compare the results. A high correlation between the two solutions indicates reliability.
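The split-half check can be sketched as follows, under these assumptions: synthetic ratings of 8 objects from 40 hypothetical respondents, classical MDS for each half, and an orthogonal Procrustes rotation before the comparison. The rotation is needed because MDS configurations are only defined up to rotation, reflection and translation. Here the correlation of the aligned coordinates serves as the reliability index. That is one reasonable choice, not the only one.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: top-k eigenvectors of the double-centered matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def procrustes_align(A, B):
    """Center both configurations, then rotate/reflect B onto A."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(B.T @ A)
    return A, B @ (U @ Vt)

rng = np.random.default_rng(42)
n_objects, n_respondents = 8, 40
true_pos = rng.normal(scale=2.0, size=(n_objects, 2))    # hypothetical "true" map
true_D = np.sqrt(((true_pos[:, None] - true_pos[None, :]) ** 2).sum(-1))

def respondent_ratings():
    """One respondent's noisy, symmetric dissimilarity ratings."""
    noise = rng.normal(scale=0.3, size=(n_objects, n_objects))
    noise = (noise + noise.T) / 2                 # keep the matrix symmetric
    np.fill_diagonal(noise, 0.0)
    return np.maximum(true_D + noise, 0.0)

ratings = np.array([respondent_ratings() for _ in range(n_respondents)])
order = rng.permutation(n_respondents)            # random split of respondents
half1 = ratings[order[: n_respondents // 2]].mean(axis=0)
half2 = ratings[order[n_respondents // 2 :]].mean(axis=0)

X1, X2 = procrustes_align(classical_mds(half1), classical_mds(half2))
r = np.corrcoef(X1.ravel(), X2.ravel())[0, 1]
print(f"split-half congruence (r) = {r:.3f}")
```

Bootstrapping follows the same pattern. Repeatedly resample respondents with replacement, rerun MDS, align each solution to a reference configuration, and inspect how much the coordinates vary.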
Dimensionality tests:
These assess whether the chosen number of dimensions (e.g., 2D, 3D) is appropriate.
a) Scree plot (elbow method): plot stress values against the number of dimensions and look for an "elbow" point beyond which adding more dimensions does not significantly reduce stress.
Example MDS output (fit summary): Cases: 6; Missing proximities: 0; Active proximities: 15; Stress-II: .07593.
[Table: Common Space, Final Coordinates on Dimensions 1 and 2]
Borg, I., & Groenen, P. J. (2005). Modern Multidimensional Scaling: Theory and Applications.
Cox, T. F., & Cox, M. A. (2001). Multidimensional Scaling. Chapman & Hall/CRC.
de Leeuw, J. (2020, March). Multidimensional Scaling. University of California, Los Angeles. https://www.researchgate.net/publication/2634627_Multidimensional_Scaling
Deza, E., & Deza, M. (2009). Encyclopedia of Distances. Springer.
Dugard, P., Todman, J. B., & Staines, H. (2010). Approaching multivariate analysis: A practical