Assignment 5
ASSIGNMENT – 5
Submitted By – Shubhangi Mittal, 23/UMBA/103, MBA Section – B
AMESHOUSING PROJECT
Executive Summary
This project focuses on uncovering patterns in housing sales by categorizing houses into
distinct clusters based on their characteristics. Through detailed descriptive and cluster
analyses, key variables such as Sale Price, Gr_Liv_Area (above ground living area), Lot Area,
and Overall Quality have been identified as critical factors influencing these groupings.
Leveraging these variables, the data has been systematically divided into four distinct
clusters, providing a clearer understanding of the relationships between housing attributes
and their impact on market segmentation.
Introduction
The real estate market is inherently complex, influenced by a multitude of factors that
determine property values and buyer preferences. To make informed decisions, it is essential
to analyze and group houses based on shared characteristics. This project addresses this
need by employing statistical techniques to explore the relationships between various
housing attributes and categorizing the dataset into clusters.
Using descriptive analysis, critical variables such as Sale Price, Gr_Liv_Area, Lot Area, and
Overall Quality were identified as significant determinants in shaping housing clusters.
Subsequently, a cluster analysis was conducted to group the houses into four categories,
each representing a unique combination of features. This segmentation provides actionable
insights, aiding stakeholders such as developers, real estate professionals, and policymakers
in understanding market trends, targeting specific buyer segments, and making data-driven
decisions.
Methodology
To understand the data, a descriptive analysis was carried out first. Its main findings are:
Sale Price is less skewed than Lot Area.
Overall Quality ratings range from 1 to 9, and Overall Condition ratings range from 3 to 9.
A strong positive correlation exists between Sale Price and Gr_Liv_Area, and between Sale Price and Overall Quality.
A weak positive correlation exists between Lot Area and Overall Quality, and between Lot Area and Overall Condition.
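The skewness and correlation checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not say which tool was used, the rows below are made-up values rather than records from the Ames Housing data, the key name Overall_Qual is assumed, and the helper functions skewness and pearson are hypothetical names introduced here.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up illustrative rows, NOT taken from the Ames Housing dataset.
# "Overall_Qual" is an assumed column name.
houses = [
    {"SalePrice": 105000, "Gr_Liv_Area": 900, "Lot_Area": 7000, "Overall_Qual": 4},
    {"SalePrice": 120000, "Gr_Liv_Area": 1000, "Lot_Area": 7500, "Overall_Qual": 5},
    {"SalePrice": 135000, "Gr_Liv_Area": 1100, "Lot_Area": 8000, "Overall_Qual": 5},
    {"SalePrice": 150000, "Gr_Liv_Area": 1250, "Lot_Area": 8500, "Overall_Qual": 6},
    {"SalePrice": 165000, "Gr_Liv_Area": 1400, "Lot_Area": 9000, "Overall_Qual": 7},
    {"SalePrice": 180000, "Gr_Liv_Area": 1500, "Lot_Area": 25000, "Overall_Qual": 7},
]

sale_price = [h["SalePrice"] for h in houses]
gr_liv_area = [h["Gr_Liv_Area"] for h in houses]
lot_area = [h["Lot_Area"] for h in houses]
overall_qual = [h["Overall_Qual"] for h in houses]

# Compare how asymmetric each distribution is.
print("skew(SalePrice):", round(skewness(sale_price), 3))
print("skew(Lot_Area): ", round(skewness(lot_area), 3))

# Compare the strength of the linear relationships.
print("corr(SalePrice, Gr_Liv_Area): ", round(pearson(sale_price, gr_liv_area), 3))
print("corr(SalePrice, Overall_Qual):", round(pearson(sale_price, overall_qual), 3))
print("corr(Lot_Area, Overall_Qual): ", round(pearson(lot_area, overall_qual), 3))
```

On the real dataset the same two functions would be applied to the full columns. A skewness near zero indicates a roughly symmetric distribution, while a large positive value indicates a long right tail, as is typical when a few very large lots are present.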
[Figure: SalePrice vs Gr_Liv_Area (x-axis: Gr_Liv_Area; y-axis: Sale Price)]
[Figure: SalePrice vs Lot_Area (x-axis: Lot_Area; y-axis: Sale Price)]
These findings will help the client group sold houses into distinct categories for analysis
and develop strategies for selling similar houses in the future.
Conclusion
Cluster analysis is a statistical technique used to group similar data points or objects into
clusters based on their characteristics. It helps identify patterns or groupings within a
dataset, which can be useful for understanding relationships, segmentation, and making
predictions.
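As an illustration of the technique, the following is a minimal k-means sketch that splits houses into four clusters. It rests on several assumptions: the report does not state which clustering algorithm produced its four clusters, so k-means on standardized features is a stand-in; the rows are made-up values, not Ames Housing records; and the function names (standardize, farthest_point_init, kmeans) are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column so large-valued features (e.g. prices) do not dominate distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # guard constant columns
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Made-up illustrative rows, NOT taken from the Ames Housing dataset.
# Columns: Sale Price, Gr_Liv_Area, Lot Area, Overall Quality.
rows = [
    [90000, 800, 6000, 3],
    [95000, 850, 6500, 3],
    [150000, 1200, 9000, 5],
    [155000, 1250, 9500, 5],
    [220000, 1500, 12000, 7],
    [225000, 1550, 12500, 7],
    [320000, 1600, 28000, 9],
    [325000, 1650, 28500, 9],
]

labels, centroids = kmeans(standardize(rows), k=4)
for row, label in zip(rows, labels):
    print(label, row)
```

Standardizing first matters because Sale Price and Lot Area are measured in the tens of thousands while Overall Quality is a single-digit rating; without it, the distance calculation would be driven almost entirely by price.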
Using this analysis, dealers and other stakeholders can group customers into clusters
according to their preferences, and can make customized offers to any customer who wants
a unique combination of features.