Exploratory Data Analysis
4 authors, including:
Mahendra Patil
Atharva College of Engineering, Malad
All content following this page was uploaded by Mahendra Patil on 12 May 2024.
DOI: 10.35629/5252-050413881392 |Impact Factorvalue 6.18| ISO 9001: 2008 Certified Journal Page 1388
International Journal of Advances in Engineering and Management (IJAEM)
Volume 5, Issue 4 April 2023, pp: 1388-1392 www.ijaem.net ISSN: 2395-5252
algorithms, to presenting the results.

III. DBSCAN CLUSTERING
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm used in machine learning to group similar data points based on their spatial proximity and density. Unlike clustering algorithms that rely on a predetermined number of clusters, DBSCAN can find clusters of arbitrary shapes and sizes, making it a flexible and versatile tool for clustering data.
The DBSCAN algorithm starts by selecting an unvisited data point and examining its neighborhood, defined by the eps parameter. If there are at least minPts points in the neighborhood, the point is considered a core point and a cluster is formed around it. The algorithm then expands the cluster by adding all neighboring points and repeating the expansion from every added point that also has at least minPts neighbors in its own neighborhood. Points that cannot be reached from any core point are labeled as noise.
The result of DBSCAN is a set of clusters, each containing a group of data points that are closely packed together and separated from other clusters by areas of lower density. The algorithm is capable of detecting clusters of arbitrary shapes and sizes, and it can handle noisy datasets because isolated points are left unassigned as noise.
By analyzing the data using DBSCAN, it is possible to identify clusters of students who have similar preferences and needs. This information can be used to make better decisions about the design and location of student accommodation facilities.

IV. OBJECTIVE
When people migrate to a new city for purposes such as education or employment, they have to deal with issues such as finding a house or a place to stay, meeting food necessities in that location, the environment, and many others.
Finding accommodation is a basic necessity when migrating to a new city. If properly analyzed data about rental houses and food preferences for a preferred location is available, a migrant no longer has to search for a rental house manually by visiting place after place, and these difficulties can be reduced. This need led us to the idea of providing such properly analyzed, clustered data for a given location, which can be helpful while looking for a place to stay.
We chose to use a clustering method to organize this unanalysed data and present it to the client. In this analysis, the main problem is the proper clustering of the available data and the use of that clustered data to plot the data on a geolocational map according to the clusters for a better understanding.
The objective is to use the K-means and DBSCAN algorithms, as both are unsupervised machine learning methods. K-means is relatively simple to implement and understand, is guaranteed to converge, and can be generalized to clusters of different shapes and sizes.

V. PROPOSED SOLUTION
The existing system lists hostels and apartments for rent and offers buy and sell options. It does not recommend accommodation within the user's budget, and it rarely offers rental houses that match the user's preferences. It also does not recommend restaurants, gyms, etc., based on users' preferences. Previous research lacks accuracy in its recommendations.
The proposed system recommends hostels, apartments, and houses, and it also displays their details. It recommends accommodation within the user's budget and based on the preferences given, and it offers many houses within that budget. It also recommends restaurants, gyms, etc., based on users' budgets. It provides accurate recommendations with little lacking.
We are using the K-means algorithm in this project, but it has a drawback when two circular clusters centered at the same mean have different radii. K-means uses mean values to define the cluster center and cannot differentiate between the two clusters. It also fails when the clusters are non-circular. To overcome this drawback, we use the DBSCAN algorithm along with K-means.
By using both K-means and DBSCAN, we can take advantage of the strengths of both algorithms. K-means can be used to identify initial clusters, which can then be refined using DBSCAN. This hybrid approach can help to overcome the limitations of K-means while still maintaining its efficiency, as K-means can be computationally faster than DBSCAN.
Overall, combining K-means and DBSCAN can lead to more accurate and robust clustering results, especially when dealing with complex and non-circular clusters.

1. Get datasets from the pertinent locations. (Data Collection)
2. Clean the datasets to prepare them for analysis. (Data Cleaning via Pandas)
3. Visualize the data using boxplots. (Using Matplotlib/Seaborn/Pandas)
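Steps 2 and 3 of the list above can be sketched as follows. This is a minimal illustration that uses only the Python standard library instead of Pandas and Matplotlib, so that it runs without extra dependencies. The records in raw_records, the locality names, the rent values, and the helper names clean and box_stats are all hypothetical and are not taken from the project's dataset. The sketch drops incomplete and duplicate rows and then computes the quartiles, whisker ends, and outliers that a box plot would draw.

```python
import statistics

# Hypothetical raw records: (locality, monthly_rent); None marks a missing value.
raw_records = [
    ("Malad", 12000), ("Malad", 15000), ("Malad", None),
    ("Andheri", 22000), ("Andheri", 18000), ("Malad", 12000),
    ("Andheri", 95000), ("Andheri", 20000), ("Andheri", 21000),
]

def clean(records):
    """Drop rows with missing values and exact duplicate rows, keeping order."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None for value in row) or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def box_stats(values):
    """Return the numbers a box plot draws: quartiles, whisker ends, outliers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Conventional 1.5 * IQR fences; points beyond them are drawn as outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

rows = clean(raw_records)
andheri_rents = [rent for locality, rent in rows if locality == "Andheri"]
summary = box_stats(andheri_rents)
print(summary)
```

In this made-up sample, the single very high rent falls outside the upper fence and is reported as an outlier, which is the kind of record a box plot makes visible before clustering.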
2. BoxPlot By K-Means
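The K-means drawback named in Section V (two circular clusters that share a center but have different radii) and the DBSCAN procedure described in Section III can be illustrated together. The sketch below is a plain-Python illustration under stated assumptions, not the project's implementation: the helper names make_ring, kmeans, and dbscan, the ring radii, and the values eps=0.5 and min_pts=3 are all hypothetical choices made for this example. The kmeans function is ordinary Lloyd iteration, and the dbscan function follows the core-point expansion from Section III.

```python
import math
import random

def make_ring(radius, count):
    """Evenly spaced points on a circle centered at the origin."""
    return [(radius * math.cos(2 * math.pi * i / count),
             radius * math.sin(2 * math.pi * i / count)) for i in range(count)]

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd iteration: assign to nearest centroid, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Two concentric rings: 40 points at radius 1 and 80 points at radius 5.
inner, outer = make_ring(1.0, 40), make_ring(5.0, 80)
points = inner + outer
kmeans_labels = kmeans(points, k=2)
dbscan_labels = dbscan(points, eps=0.5, min_pts=3)

# K-means separates its two clusters with a straight boundary,
# so the outer ring ends up split between both labels.
print("K-means labels on outer ring:", sorted(set(kmeans_labels[40:])))
# DBSCAN follows density, so each ring becomes its own cluster.
print("DBSCAN labels on inner ring:", sorted(set(dbscan_labels[:40])))
print("DBSCAN labels on outer ring:", sorted(set(dbscan_labels[40:])))
```

Because both rings have the same mean, no pair of centroids can separate them, while a density-based neighborhood search keeps each ring together. On real accommodation data the choice of eps and minPts would need tuning, which this sketch does not attempt.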
VIII. APPLICATIONS
The project model could help students and workers identify areas with a high concentration of accommodations that fit their budget and preferences, allowing them to make more informed decisions about where to live.
The project model can be used to analyze and predict the demand for accommodation in a specific location, which can be useful for businesses in the hospitality industry.
The clustering algorithms used in the model can also be applied to other datasets with similar features, such as restaurant or retail store locations.

IX. FUTURE SCOPE
The project model can be further refined and expanded by incorporating additional features, such as pricing data or customer reviews.
The model can be integrated with existing booking platforms to provide real-time recommendations for users based on their preferences and location.
The project can be extended to include predictive analytics for seasonal fluctuations in demand, which can help businesses optimize pricing and inventory management.

ACKNOWLEDGEMENT
We owe sincere thanks to our college, Atharva College of Engineering, for giving us a platform to prepare a project on the topic "Exploratory Analysis on Data", and we would like to thank our Principal, Dr. Ramesh Kulkarni, for instilling in us the need for this research and giving us the opportunity and time to conduct and present research on the topic.
We are sincerely grateful to have had Prof. Mahendra Patil as our guide, and we thank him and Prof. Suvarna Pansambal, Head of the Computer Engineering Department, for their encouragement, constant support and valuable suggestions. Moreover, the completion of this research would have been impossible without the cooperation, suggestions and help of our friends and family.

REFERENCES
[1]. Exploratory Data Analysis Using Dimension Reduction [Tejas Nanaware, Prashant Mahajan, Ravi Chandak, Pratik Deshpande, Prof. Mahendra Patil]
[2]. Automating Exploratory Data Analysis via Machine Learning [Tova Milo, Amit Somech]
[3]. Visualization Methods for Exploratory Data Analysis [IEEE A. Nasser, D. Hamad, C. Sar]