Assignment 5

The AmesHousing Project analyzes housing sales by clustering houses based on key characteristics like Sale Price, Gr_Liv_Area, Lot Area, and Overall Quality. Through descriptive and cluster analyses, four distinct clusters were identified, providing insights for real estate stakeholders to understand market trends and buyer preferences. The findings facilitate targeted strategies for housing sales and customer segmentation.

Uploaded by

anshulpuri50

BUSINESS INTELLIGENCE

ASSIGNMENT – 5
Submitted By – Shubhangi Mittal, 23/UMBA/103, MBA Section – B

AMESHOUSING PROJECT
Executive Summary
This project focuses on uncovering patterns in housing sales by categorizing houses into
distinct clusters based on their characteristics. Through detailed descriptive and cluster
analyses, key variables such as Sale Price, Gr_Liv_Area (above ground living area), Lot Area,
and Overall Quality have been identified as critical factors influencing these groupings.
Leveraging these variables, the data has been systematically divided into four distinct
clusters, providing a clearer understanding of the relationships between housing attributes
and their impact on market segmentation.

Introduction
The real estate market is inherently complex, influenced by a multitude of factors that
determine property values and buyer preferences. To make informed decisions, it is essential
to analyze and group houses based on shared characteristics. This project addresses this
need by employing statistical techniques to explore the relationships between various
housing attributes and categorizing the dataset into clusters.

Using descriptive analysis, critical variables such as Sale Price, Gr_Liv_Area, Lot Area, and
Overall Quality were identified as significant determinants in shaping housing clusters.
Subsequently, a cluster analysis was conducted to group the houses into four categories,
each representing a unique combination of features. This segmentation provides actionable
insights, aiding stakeholders such as developers, real estate professionals, and policymakers
in understanding market trends, targeting specific buyer segments, and making data-driven
decisions.

By simplifying complex relationships into manageable categories, this study contributes to a
more structured and insightful understanding of the housing market dynamics.

Methodology
To understand the data, the following steps were taken:

A) FOR DESCRIPTIVE ANALYSIS


• Microsoft Excel was used to load the data and compute descriptive statistics with the
Data Analysis add-in.
• Descriptive statistics such as mean, median, mode, standard deviation, variance,
standard error, and range were computed for the numeric variables.
• Using these statistics, a correlation analysis was carried out to measure the strength of
the relationships between variables, and the important variables were identified for
use in the cluster analysis.
• Scatter plots were also created to better analyze the relationships between
variables.
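
The descriptive measures listed above can be sketched in Python as a rough counterpart to the Excel output. This is a minimal illustration, not the workflow used in this report: the input values are the four cluster-centroid Sale Price figures from the cluster table in this document, used only as stand-in data because the raw records are not reproduced here, and the function name `describe` is hypothetical.

```python
import math
import statistics

# Stand-in data (assumption): the four cluster-centroid Sale Price values from
# this report's cluster table, NOT the raw AmesHousing records.
sale_price = [96034, 145929, 124433, 167730]


def describe(values):
    """Return summary measures similar to a spreadsheet's descriptive statistics."""
    n = len(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": std_dev,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_error": std_dev / math.sqrt(n),      # standard error of the mean
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
    }


summary = describe(sale_price)
for name, value in summary.items():
    print(f"{name:>10}: {value:,.2f}")
```

Mode is left out because every stand-in value is distinct; on the full dataset it could be added with `statistics.mode`.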

B) FOR CLUSTER ANALYSIS


• Weka was used to perform cluster analysis on the dataset.
• After loading the dataset, the k-means technique was used to identify
clusters in the data.
• Parameters such as the number of clusters and the distance function (Euclidean and
Manhattan distance) were varied to experiment with different clusterings.
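
To make the clustering step concrete, the sketch below implements a basic k-means loop in plain Python with a pluggable distance function. It is an illustration under stated assumptions, not Weka's implementation: the six houses are made-up toy values, the starting centroids are simply the first k rows, features are min-max scaled so that Sale Price does not dominate the distance, and centroids are always updated as means. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single feature dominates the distance."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def k_means(points, k, distance=euclidean, max_iter=100):
    """Basic k-means: assign each point to its nearest centroid, then recompute means."""
    centroids = list(points[:k])  # assumption: first k points as starting centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy rows (hypothetical values): (Lot Area, Overall Quality, Gr_Liv_Area, Sale Price)
houses = [
    (7000, 4, 950, 95000),
    (8500, 6, 1250, 170000),
    (7200, 4, 980, 99000),
    (8600, 7, 1230, 165000),
    (6900, 5, 940, 92000),
    (8400, 6, 1260, 172000),
]

scaled = min_max_scale(houses)
labels, centroids = k_means(scaled, k=2)
print(labels)
```

Passing `distance=manhattan` swaps the distance measure while leaving the rest of the loop unchanged, which mirrors the kind of parameter experiment described above.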

Body and Analysis

Figure 1 – Descriptive analysis done in Excel using the Data Analysis add-in


Figure 2 – Correlation Analysis Summary

Findings from Descriptive Analysis

• The Sale Price distribution is less skewed than the Lot Area distribution.
• Overall quality ratings range from 1 to 9, and overall condition ratings range from 3 to 9.
• A strong positive correlation exists between sale price and above-ground living area
(Gr_Liv_Area), and between sale price and overall quality.
• A weak positive correlation exists between lot area and overall quality, and between
lot area and overall condition.
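
Skewness and correlation, the two measures behind these findings, can be computed as in the sketch below. It is illustrative only: the inputs are the four cluster-centroid values from the cluster table in this document, used as stand-in data, so the outputs will not match the figures computed on the full dataset. The skewness formula is the adjusted Fisher-Pearson form used by common spreadsheet tools, and the function names are hypothetical.

```python
import math

# Stand-in data (assumption): cluster-centroid values from this report's cluster
# table, NOT the raw AmesHousing records.
lot_area = [7140, 8398, 8420, 8524]
gr_liv_area = [956, 1208, 1070, 1240]
sale_price = [96034, 145929, 124433, 167730]


def mean(values):
    return sum(values) / len(values)


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness; requires at least 3 values."""
    n = len(values)
    m = mean(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print("skew(Lot Area):  ", round(sample_skewness(lot_area), 3))
print("skew(Sale Price):", round(sample_skewness(sale_price), 3))
print("r(Gr_Liv_Area, Sale Price):", round(pearson_r(gr_liv_area, sale_price), 3))
```

A skewness near zero indicates a roughly symmetric distribution, and a correlation coefficient near +1 indicates a strong positive linear relationship.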

[Scatter plot: SalePrice vs Gr_Liv_Area (x-axis: Gr_Liv_Area, 200 to 1,600; y-axis: Sale Price, 0 to 350,000)]

[Scatter plot: SalePrice vs Lot_Area (x-axis: Lot_Area, 0 to 30,000; y-axis: Sale Price, 0 to 350,000)]

Figure 3-4 – Scatter Plot representations

Figure 5 – Cluster Analysis done using Euclidean Distance


Figure 6 – Cluster Analysis done using Manhattan Distance

Findings from Cluster Analysis

Parameters        Cluster 0   Cluster 1   Cluster 2   Cluster 3
Lot Area          7140        8398        8420        8524
Overall Quality   4.11        5.28        5.08        6.29
Gr_Liv_Area       956         1208        1070        1240
Sale Price        96034       145929      124433      167730
House Style       1 story     2 story     1 story     1 story

These findings will help the client group sold houses into distinct categories for analysis
and develop strategies for selling similar houses in the future.
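
One way such grouping could be applied to a future sale is to assign a house to the cluster whose centroid is nearest. The sketch below does this with the numeric centroids from the table above. It rests on several assumptions that are not part of the original analysis: House Style is left out because it is categorical, each feature is scaled by its range across the four centroids, Euclidean distance is used, and the example house is hypothetical.

```python
import math

# Numeric centroids copied from the cluster table above:
# (Lot Area, Overall Quality, Gr_Liv_Area, Sale Price)
CENTROIDS = {
    0: (7140, 4.11, 956, 96034),
    1: (8398, 5.28, 1208, 145929),
    2: (8420, 5.08, 1070, 124433),
    3: (8524, 6.29, 1240, 167730),
}


def feature_ranges(centroids):
    """(min, max) of each feature across the centroids, used for scaling."""
    cols = list(zip(*centroids.values()))
    return [(min(c), max(c)) for c in cols]


def scale(row, ranges):
    """Scale a row by the centroid ranges (assumption: a simple min-max scheme)."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(row, ranges))


def nearest_cluster(house, centroids=CENTROIDS):
    """Return the id of the cluster whose scaled centroid is closest to the house."""
    ranges = feature_ranges(centroids)
    target = scale(house, ranges)
    return min(
        centroids,
        key=lambda c: math.dist(target, scale(centroids[c], ranges)),
    )


# Hypothetical new listing: (Lot Area, Overall Quality, Gr_Liv_Area, Sale Price)
new_house = (8400, 6, 1250, 165000)
print("Nearest cluster:", nearest_cluster(new_house))
```

Scaling matters here: without it, Sale Price would dominate the distance because its values are far larger than those of the other features.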
Conclusion
Cluster analysis is a statistical technique used to group similar data points or objects into
clusters based on their characteristics. It helps identify patterns or groupings within a
dataset, which can be useful for understanding relationships, segmentation, and making
predictions.

Using this analysis, dealers and other stakeholders can group customers into clusters
according to their preferences, and can make customized offers to any customer who
wants a unique combination of features.
