Open navigation menu
Close suggestions
Search
Search
en
Change Language
Upload
Sign in
Sign in
Download free for days
0 ratings
0% found this document useful (0 votes)
10 views
Data Mining and Data Warehouse
Data mining and data warehouse
Uploaded by
hhc407319
AI-enhanced title
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
Download now
Download
Save Data mining and data warehouse For Later
Download
Save
Save Data mining and data warehouse For Later
0%
0% found this document useful, undefined
0%
, undefined
Embed
Share
Print
Report
0 ratings
0% found this document useful (0 votes)
10 views
Data Mining and Data Warehouse
Data mining and data warehouse
Uploaded by
hhc407319
AI-enhanced title
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
Download now
Download
Save Data mining and data warehouse For Later
Carousel Previous
Carousel Next
Save
Save Data mining and data warehouse For Later
0%
0% found this document useful, undefined
0%
, undefined
Embed
Share
Print
Report
Download now
Download
You are on page 1
/ 9
Search
Fullscreen
fat Home > Coding Ground np E> euorialencine = — Data Mining - Cluster Analysis Advertisements7:36 9 ® LO 1D Yo Si © Previous Page Next Page © Cluster is a group of objects that belongs to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster. What is Clustering? Clustering is the process of making a group of abstract objects into classes of similar objects. Points to Remember ® A cluster of data objects can be treated as one group. While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign the labels to the groups. o The main advantage of clustering over classification is that, it is adaptable to changes and helps single out useful features that distinguish different groups. Applications of Cluster Analysis736 39 LO 10 fo i © Applications of Cluster Analysis = Clustering analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing. = Clustering can also help marketers discover distinct groups in their customer base. And _ they can characterize their customer groups based on the purchasing patterns. 2 In the field of biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionalities and gain insight into structures inherent to populations. Clustering also helps in identification of areas of similar land use in an earth observation database. It also helps in the identification of groups of houses in a city according to house type, value, and geographic location. o Clustering also helps in classifying documents on the web for information discovery. 5 2 Clusterina is also used in outlier737 38 1O 01 te 4) a analysis serves as a tool to gain insignt into the distribution of data to observe characteristics of each cluster. Requirements of Clustering in Data Mining The following points throw light on why clustering is required in data mining - * Scalability - We need highly scalable o o clustering algorithms to deal with large databases. Ability to deal with different kinds of attributes - Algorithms should be capable to be applied on any kind of data such as interval-based (numerical) data, categorical, and binary data. Discovery of clusters with attribute shape - The clustering algorithm should be capable of detecting clusters of arbitrary shape. They should not be bounded to only distance measures that tend to find spherical cluster of small sizes. High dimensionality - The clustering algorithm should not only be able to handle low-dimensional data but Bw the high dimensional space.737 3 ¥1@Q 10! ta “i High dimensionality - The clustering algorithm should not only be able to handle low-dimensional data but also the high dimensional space. ci Ability to deal with noisy data - Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such data and may lead to poor quality clusters. Interpretability - The clustering results should be interpretable, comprehensible, and usable. Clustering Methods Clustering methods can be classified into the following categories - = Partitioning Method = Hierarchical Method = Density-based Method 2 Grid-Based Method = Model-Based Method ' Constraint-based Method Partitioning Method Suppose we are given a database of ‘n’ objects and the partitioning met eanctricte ‘l! nartitinn af data Fach nartitianPartitioning Method Suppose we are given a database of ‘n’ objects and the partitioning method constructs ‘k’ partition of data. Each partition will represent a cluster and k < n. It means that it will classify the data into k groups, which satisfy the following requirements - = Each group contains at least one object. = Each object must belong to exactly one group. Points to remember - = For a given number of partitions (say k), the partitioning method will create an initial partitioning. = Then it uses the iterative relocation technique to improve the partitioning by moving objects from one group to other. Hierarchical Methods This method creates a_ hierarchical decomposition of the given set of ‘a objects. We can classify hierarchi737 3 9 1O 10 ft fi methods on the basis of how the hierarchical decomposition is formed. There are two approaches here - = Agglomerative Approach 2 Divisive Approach Agglomerative Approach This approach is also known as the bottom- up approach. In this, we start with each object forming a separate group. It keeps on merging the objects or groups that are close to one another. It keep on doing so until all of the groups are merged into one or until the termination condition holds. Divisive Approach This approach is also known as the top-down approach. In this, we start with all of the objects in the same cluster. In the continuous iteration, a cluster is split up into smaller clusters. It is down until each object in one cluster or the termination condition holds. This method is rigid, i.e., once a merging or splitting is done, it can never be undone. Approaches to Improve Quality of Hierarchical Clustering a Lane ae thn han annem bran that nen enna tn7:37 3 9 1O Of Si OO Approaches to Improve Quality of Hierarchical Clustering Here are the two approaches that are used to improve the quality of hierarchical clustering = Perform careful analysis of object linkages at each hierarchical partitioning. 5 Integrate hierarchical agglomeration by first using a hierarchical agglomerative algorithm to group objects into micro- clusters, and then performing macro- clustering on the micro-clusters. Density-based Method This method is based on the notion of density. The basic idea is to continue growing the given cluster as long as the density in the neighborhood exceeds some threshold, i.e., for each data point within a given cluster, the radius of a given cluster has to contain at least a minimum number of points. Grid-based Method In this, the objects together form a grid. a obiect space is auantized into finite number737 3 ® LOS 10! Me © Grid-based Method In this, the objects together form a grid. The object space is quantized into finite number of cells that form a grid structure. Advantages = The major advantage of this method is fast processing time. 2 It is dependent only on the number of cells in each dimension in the quantized space. Model-based methods In this method, a model is hypothesized for each cluster to find the best fit of data for a given model. This method locates the clusters by clustering the density function. It reflects spatial distribution of the data points. This method also provides a way to automatically determine the number of clusters based on standard statistics, taking outlier or noise into account. It therefore yields robust clustering methods. Constraint-based Method g In this method the clusterina is nerfarmed hv
You might also like
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
From Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
4/5 (6132)
Principles: Life and Work
From Everand
Principles: Life and Work
Ray Dalio
4/5 (627)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
From Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brene Brown
4/5 (1148)
Never Split the Difference: Negotiating As If Your Life Depended On It
From Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
4.5/5 (935)
The Glass Castle: A Memoir
From Everand
The Glass Castle: A Memoir
Jeannette Walls
4/5 (8215)
Grit: The Power of Passion and Perseverance
From Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
4/5 (631)
Sing, Unburied, Sing: A Novel
From Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
4/5 (1253)
The Perks of Being a Wallflower
From Everand
The Perks of Being a Wallflower
Stephen Chbosky
4/5 (8365)
Shoe Dog: A Memoir by the Creator of Nike
From Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
4.5/5 (860)
Her Body and Other Parties: Stories
From Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
4/5 (877)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
From Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
4.5/5 (361)
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
From Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
4/5 (954)
Steve Jobs
From Everand
Steve Jobs
Walter Isaacson
4/5 (2923)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
From Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
4.5/5 (484)
The Emperor of All Maladies: A Biography of Cancer
From Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
4.5/5 (277)
A Man Called Ove: A Novel
From Everand
A Man Called Ove: A Novel
Fredrik Backman
4.5/5 (4972)
Angela's Ashes: A Memoir
From Everand
Angela's Ashes: A Memoir
Frank McCourt
4.5/5 (444)
Brooklyn: A Novel
From Everand
Brooklyn: A Novel
Colm Toibin
3.5/5 (2061)
The Art of Racing in the Rain: A Novel
From Everand
The Art of Racing in the Rain: A Novel
Garth Stein
4/5 (4281)
The Yellow House: A Memoir (2019 National Book Award Winner)
From Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
4/5 (100)
The Little Book of Hygge: Danish Secrets to Happy Living
From Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
3.5/5 (447)
Yes Please
From Everand
Yes Please
Amy Poehler
4/5 (1988)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
From Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
3.5/5 (2283)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
From Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
4.5/5 (278)
Bad Feminist: Essays
From Everand
Bad Feminist: Essays
Roxane Gay
4/5 (1068)
The Woman in Cabin 10
From Everand
The Woman in Cabin 10
Ruth Ware
3.5/5 (2641)
The Outsider: A Novel
From Everand
The Outsider: A Novel
Stephen King
4/5 (1993)
A Tree Grows in Brooklyn
From Everand
A Tree Grows in Brooklyn
Betty Smith
4.5/5 (1936)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
From Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
4.5/5 (125)
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
From Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
3.5/5 (692)
Team of Rivals: The Political Genius of Abraham Lincoln
From Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
4.5/5 (1912)
Wolf Hall: A Novel
From Everand
Wolf Hall: A Novel
Hilary Mantel
4/5 (4074)
On Fire: The (Burning) Case for a Green New Deal
From Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
4/5 (75)
Fear: Trump in the White House
From Everand
Fear: Trump in the White House
Bob Woodward
3.5/5 (830)
Manhattan Beach: A Novel
From Everand
Manhattan Beach: A Novel
Jennifer Egan
3.5/5 (901)
Rise of ISIS: A Threat We Can't Ignore
From Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
3.5/5 (143)
John Adams
From Everand
John Adams
David McCullough
4.5/5 (2544)
The Light Between Oceans: A Novel
From Everand
The Light Between Oceans: A Novel
M L Stedman
4.5/5 (790)
The Unwinding: An Inner History of the New America
From Everand
The Unwinding: An Inner History of the New America
George Packer
4/5 (45)
Little Women
From Everand
Little Women
Louisa May Alcott
4/5 (105)
The Constant Gardener: A Novel
From Everand
The Constant Gardener: A Novel
John le Carré
3.5/5 (109)