❑ Machine Learning Flowchart:-
Define Project Objectives | Data Collection | Data Preprocessing
Model Building | Model Selection | Data Visualization
Train Model | Test Model | Improve Efficiency | Deployment
Clustering in Machine Learning
Introduction to Clustering: Clustering is a type of unsupervised learning method. An unsupervised learning
method is one in which we draw inferences from datasets consisting of input data without labeled
responses. Generally, it is used to find meaningful structure, explanatory underlying processes,
generative features, and groupings inherent in a set of examples.
Clustering is the task of dividing the population or data points into a number of groups such that data points in
the same group are more similar to each other than to the data points in other groups. In other words, it
groups objects on the basis of the similarity and dissimilarity between them.
For example, the data points in the graph below that are clustered together can be classified into a single
group. We can distinguish the clusters and identify that there are 3 clusters in the picture below.
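The idea of "similar within a group, dissimilar across groups" can be made concrete with distances. The following is a minimal sketch, assuming Euclidean distance as the similarity measure and a hypothetical set of 2-D points that are already split into two groups; the point values and the helper names `mean_within` and `mean_between` are illustrative and do not come from the notes.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points, already split into two groups for illustration.
group_a = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
group_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

def mean_within(group):
    """Average Euclidean distance over all pairs of points inside one group."""
    pairs = list(combinations(group, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(g1, g2):
    """Average Euclidean distance over all pairs that span the two groups."""
    return sum(dist(p, q) for p in g1 for q in g2) / (len(g1) * len(g2))

within_a = mean_within(group_a)
within_b = mean_within(group_b)
between = mean_between(group_a, group_b)

# A good grouping has small within-group distances and a large between-group distance.
print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For a sensible grouping, both within-group averages should come out much smaller than the between-group average.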
Applications of Clustering in different fields:
1.Marketing: It can be used to characterize & discover customer segments for marketing purposes.
2.Biology: It can be used for classification among different species of plants and animals.
3.Libraries: It is used in clustering different books on the basis of topics and information.
4.Insurance: It is used to group customers and their policies and to identify fraud.
5.City Planning: It is used to make groups of houses and to study their values based on their
geographical locations and other factors present.
6.Earthquake studies: By clustering earthquake-affected areas, we can determine the dangerous
zones.
7.Image Processing: Clustering can be used to group similar images together, classify images based
on content, and identify patterns in image data.
8.Genetics: Clustering is used to group genes that have similar expression patterns and identify gene
networks that work together in biological processes.
9.Finance: Clustering is used to identify market segments based on customer behavior, identify
patterns in stock market data, and analyze risk in investment portfolios.
10.Customer Service: Clustering is used to group customer inquiries and complaints into categories,
identify common issues, and develop targeted solutions.
11.Traffic analysis: Clustering is used to group similar patterns of traffic data, such as peak hours,
routes, and speeds, which can help in improving transportation planning and infrastructure.
12.Social network analysis: Clustering is used to identify communities or groups within social
networks, which can help in understanding social behavior, influence, and trends.
13.Cybersecurity: Clustering is used to group similar patterns of network traffic or system
behavior, which can help in detecting and preventing cyberattacks.
14.Climate analysis: Clustering is used to group similar patterns of climate data, such as
temperature, precipitation, and wind, which can help in understanding climate change and its
impact on the environment.
15.Sports analysis: Clustering is used to group similar patterns of player or team performance
data, which can help in analyzing player or team strengths and weaknesses and making strategic
decisions.
16.Crime analysis: Clustering is used to group similar patterns of crime data, such as location,
time, and type, which can help in identifying crime hotspots, predicting future crime trends, and
improving crime prevention strategies.
Clustering Methods:
1.K-Means Clustering:
1. Divides the dataset into a predefined number of clusters (k) based on the mean of data
points in each cluster.
2.Hierarchical Clustering:
1. Builds a tree of clusters by either repeatedly merging smaller clusters (agglomerative) or
splitting larger clusters (divisive).
3.DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
1. Groups together data points that are close to each other and have a sufficient number of
neighbors, while marking outliers as noise.
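Two of the methods above, K-Means and DBSCAN, can be sketched in plain Python. This is a simplified illustration and not a production implementation: it assumes Euclidean distance, seeds K-Means with the first k points (real implementations usually pick random or smarter seeds), and uses hypothetical sample points and parameter values (`eps`, `min_pts`). Hierarchical clustering is left out to keep the sketch short.

```python
from math import dist

def k_means(points, k, iterations=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    return labels, centroids

def dbscan(points, eps, min_pts):
    """Density-based clustering; returns one label per point, -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (the point itself is included).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, so keep expanding
        cluster += 1
    return labels

# Hypothetical data: two tight groups plus one far-away outlier.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
kmeans_labels, centroids = k_means(points, k=2)
dbscan_labels = dbscan(points + [(50.0, 50.0)], eps=2.0, min_pts=3)
print(kmeans_labels)
print(dbscan_labels)
```

Note the difference in inputs: K-Means needs the number of clusters up front, while DBSCAN needs a neighborhood radius and a minimum neighbor count and decides the number of clusters itself.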
Association Rule Learning
Association rule learning is a type of unsupervised learning technique that checks for the
dependency of one data item on another and maps these dependencies so that they can be
exploited, for example to make sales more profitable. It tries to find interesting relations
or associations among the variables of a dataset, using different rules to discover these
relations between variables in the database.
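The strength of such a dependency is commonly measured with support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). These two measures are not defined in the notes above, so the following is only a minimal sketch under that assumption; the transactions and the function names `support` and `confidence` are hypothetical.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

bread_support = support({"bread"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support(bread) = {bread_support:.2f}")
print(f"confidence(bread -> butter) = {rule_confidence:.2f}")
```

A rule such as bread -> butter is usually considered interesting only when both its support and its confidence exceed thresholds chosen by the analyst.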
Association rule learning is one of the important concepts of Machine Learning, and
it is employed in market basket analysis, web usage mining, continuous production,
etc. Market basket analysis is a technique used by many large retailers to discover
the associations between items. We can understand it through the example of a
supermarket, where products that are frequently purchased together are placed together.
For example, if a customer buys bread, they are likely to also buy butter, eggs, or milk, so
these products are placed on the same shelf or close by. Consider the diagram below: