UNIT 4 NOTES

The document provides detailed explanations of various data analysis techniques including K-means clustering, Apriori algorithm for association analysis, FP-growth for frequent itemset mining, and Principal Component Analysis (PCA). It outlines the steps involved in each algorithm and highlights their applications in fields like marketing, computer vision, and bioinformatics. Additionally, it discusses Singular Value Decomposition (SVD) and its salient features such as dimensionality reduction, numerical stability, data compression, and noise reduction.


1) Explain the k-means clustering algorithm in detail

K-means clustering is a popular unsupervised learning algorithm used for data clustering and
partitioning. The goal of k-means clustering is to group similar data points together in a way that
maximizes the similarity within each group (called a cluster) while minimizing the similarity between
different groups.

The algorithm works as follows:

1. Select the number of clusters (k) you want to create.
2. Randomly initialize k points in the dataset as the centroids of the clusters.
3. Assign each data point to the nearest centroid based on a distance metric (usually Euclidean distance).
4. Update the centroid of each cluster by taking the mean of all the data points assigned to that cluster.
5. Repeat steps 3 and 4 until convergence (i.e., the assignment of data points to clusters no longer changes or some other stopping criterion is reached).

The final result of the k-means algorithm is a set of k clusters, each with its centroid, and each data point
belonging to one of the clusters. The quality of the clustering depends on the initial position of the
centroids and the distance metric used. K-means is a computationally efficient algorithm and can be
used on large datasets, making it a popular choice for clustering applications in various fields such as
computer science, biology, and social sciences.
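
As a small illustration of these steps, here is a minimal from-scratch sketch in Python with NumPy (a sketch written for these notes, assuming NumPy is available; the function name kmeans and the toy data are our own):

import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    # Minimal k-means sketch: returns cluster labels and final centroids.
    rng = np.random.default_rng(seed)
    # Steps 1-2: randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each point to the nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the centroids no longer move (convergence).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on two artificial groups of 2-D points.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)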

2) Explain the clustering concept using the k-means algorithm

Clustering is the process of grouping similar objects together in a way that objects in the same group are more similar to each other than to those in other groups. The k-means algorithm is a popular clustering algorithm that partitions data into k clusters based on the similarity of the objects in the data.

The key concepts of clustering using the k-means algorithm are:

Distance metric: A measure of similarity between two data points. Euclidean distance is commonly used in k-means, where the distance between two points is the straight-line distance between them.

Centroids: The center of each cluster is represented by a centroid. In the initial stage, k centroids are randomly chosen from the data points. During the algorithm, the centroids are updated to be the mean of the data points assigned to the cluster.

Assignment of data points: Each data point is assigned to the closest centroid based on the chosen distance metric. This assignment is done iteratively until the centroids no longer move.

Number of clusters: The number of clusters (k) is determined by the user. A larger number of clusters may lead to overfitting, while a smaller number of clusters may not capture all the variation in the data.

Initialization: The initial positions of the centroids have a significant impact on the final clustering result. Random initialization is commonly used, but other methods such as k-means++ can be used to improve the quality of the clustering.

Convergence: The algorithm stops when the centroids no longer move. This indicates that the clustering has reached a stable state, and further iterations will not change the result.

The k-means algorithm is widely used for clustering tasks in various fields, such as customer segmentation in marketing, image segmentation in computer vision, and gene expression analysis in bioinformatics.
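
For comparison, the same clustering can be done with the KMeans class in scikit-learn (assuming scikit-learn is installed; k-means++ is the improved initialization mentioned above):

import numpy as np
from sklearn.cluster import KMeans

# Two artificial groups of 2-D points.
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 6])

# k-means++ initialization and k = 2 clusters.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)

print(km.cluster_centers_)   # final centroids
print(km.inertia_)           # within-cluster sum of squared distances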

3) Same as 4th answer


4) How is association analysis done using the Apriori algorithm? Explain
Ans) Association analysis using the Apriori algorithm is a popular
technique for identifying frequent itemsets in a dataset and generating
association rules between them. Here are the steps involved in the
Apriori algorithm:
1. Define minimum support: Set a minimum support threshold that determines the minimum number of times an itemset needs to appear in the dataset to be considered frequent.
2. Generate frequent itemsets: Scan the dataset to identify all the individual items and count their occurrences. Generate candidate itemsets from these items, count their occurrences, and keep only the itemsets that meet the minimum support threshold; these are the frequent itemsets.
3. Generate candidate itemsets: Using the frequent itemsets generated in the previous step, generate new candidate itemsets by joining pairs of frequent itemsets. Discard any candidate that contains a subset that is not frequent.
4. Repeat steps 2 and 3 until no more frequent itemsets can be generated: keep generating candidate itemsets and counting their support until no new frequent itemsets appear.
5. Generate association rules: Using the frequent itemsets generated in the previous steps, generate association rules by splitting each itemset into an antecedent and a consequent. Calculate the confidence of each rule, which measures the probability of the consequent itemset appearing given that the antecedent itemset appears. Only the rules that meet a minimum confidence threshold are considered strong rules.

By following these steps, the Apriori algorithm can efficiently identify frequent itemsets and generate strong association rules between them.
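
As an illustration, a compact from-scratch sketch of this level-wise procedure in Python might look as follows (the function names apriori and rules, and the overall structure, are our own and not taken from any particular library):

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise Apriori sketch: returns {frozenset(itemset): support count}.
    transactions = [set(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 2: frequent individual items.
    items = {i for t in transactions for i in t}
    frequent = {frozenset([i]): support(frozenset([i])) for i in items
                if support(frozenset([i])) >= min_support}
    all_frequent, k = dict(frequent), 2

    # Steps 3-4: join, prune candidates with infrequent subsets, count support, repeat.
    while frequent:
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = {c: support(c) for c in candidates
                    if all(frozenset(s) in all_frequent for s in combinations(c, k - 1))
                    and support(c) >= min_support}
        all_frequent.update(frequent)
        k += 1
    return all_frequent

def rules(all_frequent, min_confidence):
    # Step 5: split each frequent itemset into antecedent => consequent.
    for itemset, sup in all_frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = sup / all_frequent[antecedent]
                if confidence >= min_confidence:
                    yield antecedent, itemset - antecedent, sup, confidence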
5) Explain with an example dataset, how to find frequent itemsets and
generate association rules using Apriori algorithm.
Ans) Suppose we have a transaction dataset of a retail store that contains
information about customer purchases. Here is a sample of the dataset:
Transaction ID   Items Purchased
1                Bread, Milk, Cheese, Eggs
2                Bread, Milk, Diapers
3                Bread, Cheese
4                Bread, Milk, Diapers, Eggs
5                Milk, Cheese, Diapers
We will apply the Apriori algorithm to this dataset to identify frequent itemsets
and generate association rules between them. Here are the steps involved:
Step 1: Define minimum support
We will set the minimum support to 2, which means that an itemset must appear
at least twice in the dataset to be considered frequent.
Step 2: Generate frequent itemsets
Scan the dataset to identify all the individual items and count their occurrences.
Then, generate all possible combinations of these items, called itemsets, and
count their occurrences. Only the itemsets that meet the minimum support
threshold are considered frequent. Here are the frequent itemsets for our
dataset:
Itemset              Support
{Bread}              4
{Milk}               4
{Cheese}             3
{Diapers}            3
{Eggs}               2
{Bread, Milk}        3
{Bread, Cheese}      2
{Bread, Diapers}     2
{Bread, Eggs}        2
{Milk, Cheese}       2
{Milk, Diapers}      3
{Milk, Eggs}         2
Step 3: Generate candidate itemsets
Using the frequent itemsets generated in the previous step, generate new candidate itemsets by joining pairs of frequent itemsets, and discard any candidate that contains a subset that is not frequent (for example, {Milk, Cheese, Diapers} is pruned immediately because its subset {Cheese, Diapers} is not frequent). Counting the support of the remaining candidate 3-itemsets gives:

Itemset                   Support
{Bread, Milk, Cheese}     1 (below minimum support, discarded)
{Bread, Milk, Diapers}    2
{Bread, Milk, Eggs}       2

So the frequent 3-itemsets are {Bread, Milk, Diapers} and {Bread, Milk, Eggs}.
Step 4: Repeat steps 2 and 3 until no more frequent itemsets can be generated
The only candidate 4-itemset, {Bread, Milk, Diapers, Eggs}, appears in just one transaction, so no further frequent itemsets can be generated and we stop here.
Step 5: Generate association rules
Using the frequent itemsets generated in the previous step, generate association
rules between itemsets by splitting them into antecedents and consequents.
Calculate the confidence of each rule, which measures the probability of the
consequent itemset appearing given that the antecedent itemset appears. Only
the rules that meet a minimum confidence threshold are considered strong rules.
Let's set the minimum confidence threshold to 0.5. Here are some of the strong association rules between single items for our dataset:

Rule                  Support   Confidence
{Bread} => {Milk}     3         3/4 = 0.75
{Milk} => {Bread}     3         3/4 = 0.75
{Bread} => {Cheese}   2         2/4 = 0.50
{Cheese} => {Bread}   2         2/3 ≈ 0.67
{Milk} => {Cheese}    2         2/4 = 0.50
{Cheese} => {Milk}    2         2/3 ≈ 0.67
{Milk} => {Diapers}   3         3/4 = 0.75
{Diapers} => {Milk}   3         3/3 = 1.00

Other rules also qualify, for example {Eggs} => {Milk} with confidence 2/2 = 1.00 and {Bread} => {Diapers} with confidence 2/4 = 0.50.

For a more detailed explanation, refer to these links:

https://www.geeksforgeeks.org/apriori-algorithm/
https://www.javatpoint.com/apriori-algorithm-in-machine-learning

A small script to verify the support and confidence values above is given below.
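
As a quick check of the supports and confidences above, here is a small standalone Python snippet (written for these notes, not taken from any library) that counts them directly from the example transactions:

transactions = [
    {"Bread", "Milk", "Cheese", "Eggs"},
    {"Bread", "Milk", "Diapers"},
    {"Bread", "Cheese"},
    {"Bread", "Milk", "Diapers", "Eggs"},
    {"Milk", "Cheese", "Diapers"},
]

def support(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(1 for t in transactions if set(itemset) <= t)

# Supports used in the tables above.
for items in [("Bread",), ("Milk",), ("Cheese",), ("Diapers",), ("Eggs",),
              ("Bread", "Milk"), ("Milk", "Diapers"), ("Bread", "Milk", "Diapers")]:
    print(set(items), support(items))

# Confidence of X => Y is support(X and Y together) / support(X).
print("conf(Bread => Milk)   =", support({"Bread", "Milk"}) / support({"Bread"}))
print("conf(Diapers => Milk) =", support({"Milk", "Diapers"}) / support({"Diapers"}))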

6) Same as the 5th question


7) Explain how to efficiently find frequent itemsets with FP-growth (frequent pattern growth).
FP-Growth is a popular algorithm for finding frequent itemsets in a large dataset.
The algorithm is efficient because it avoids generating candidate itemsets, as in
traditional Apriori-based algorithms, and instead uses a compact data structure called
an FP-tree to efficiently mine frequent itemsets.
Here are the steps to efficiently find frequent itemsets with FP-growth:
1. Scan the dataset and count the frequency of each item. Discard items that do not meet the minimum support, and sort the remaining items in descending order of frequency, creating a frequent item list.
2. Construct an FP-tree by inserting the transactions into the tree, one at a time. Before inserting a transaction, remove its infrequent items and order the remaining items according to the frequent item list (most frequent first). Each transaction is then added to the tree as a path from the root, sharing prefixes with previously inserted transactions.
3. Create a header table that stores the head of each item's linked list of nodes in the FP-tree. The header table is sorted in descending order of frequency.
4. For each item in the frequent item list, create a conditional pattern base and a conditional FP-tree. The conditional pattern base is the set of prefix paths in the FP-tree that end at that item, and the conditional FP-tree is constructed from the conditional pattern base in the same way as the original FP-tree.
5. Recursively mine the conditional FP-trees to find frequent itemsets. Start with the item that has the lowest frequency in the frequent item list and mine its conditional FP-tree; if the conditional FP-tree is not empty, repeat the process recursively, treating the conditional FP-tree as the new dataset.
6. Combine the frequent itemsets found in step 5 with the frequent itemsets found in previous iterations to obtain the complete set of frequent itemsets.
By using the compact data structure of an FP-tree and avoiding the generation of
candidate itemsets, FP-growth is an efficient algorithm for mining frequent itemsets
in large datasets.
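
As an illustration, the retail example from question 5 can be mined with the FP-growth implementation in the mlxtend library (this assumes mlxtend and pandas are installed; the API names follow mlxtend's documentation, and min_support is given as a fraction of the transactions):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["Bread", "Milk", "Cheese", "Eggs"],
    ["Bread", "Milk", "Diapers"],
    ["Bread", "Cheese"],
    ["Bread", "Milk", "Diapers", "Eggs"],
    ["Milk", "Cheese", "Diapers"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Minimum support of 2 out of 5 transactions = 0.4.
frequent_itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)
print(frequent_itemsets)

# Strong rules with confidence >= 0.5.
strong_rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
print(strong_rules[["antecedents", "consequents", "support", "confidence"]])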
8) Describe how frequent itemsets are found efficiently using the FP-growth (frequent pattern) algorithm.
Same as above.

9) What is principal component analysis? Explain.


Principal Component Analysis (PCA) is a statistical method used for dimensionality
reduction in data analysis. The goal of PCA is to identify the most important features
or components in a dataset, and transform the data to a new coordinate system that
captures the maximum amount of variation in the data using fewer dimensions.
PCA works by finding a set of orthogonal vectors, called principal components, that
can represent the data in a lower-dimensional space. The first principal component
is the direction of maximum variance in the data, and each subsequent principal
component is orthogonal to the previous one and captures as much variance as
possible. The number of principal components is equal to the number of dimensions
in the original dataset.
Here are the steps to perform PCA:
1. Standardize the data: PCA is sensitive to the scale of the data, so it is important
to standardize the data by subtracting the mean and dividing by the standard
deviation for each feature.
2. Compute the covariance matrix: The covariance matrix describes the relationships between the features in the data. For a standardized data matrix X with n observations as rows, it is computed as C = (1/(n-1)) X^T X.
3. Compute the eigenvectors and eigenvalues of the covariance matrix: The
eigenvectors are the principal components, and the eigenvalues describe the
amount of variance captured by each principal component.
4. Choose the number of principal components: The number of principal
components to retain depends on the amount of variance explained by each
component. A common approach is to choose the number of components that
explain a certain percentage of the total variance in the data, such as 95%.
5. Project the data onto the principal components: The data is transformed into a
new coordinate system defined by the principal components. Each data point
is projected onto the principal components to obtain a reduced-dimensional
representation of the data.
PCA is widely used in data analysis, machine learning, and data visualization to
reduce the dimensionality of high-dimensional datasets, identify the most important
features in the data, and improve computational efficiency.
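
A minimal NumPy sketch of these five steps (written for these notes; the function name pca and the random data are illustrative only):

import numpy as np

def pca(X, n_components):
    # 1. Standardize the data (zero mean, unit variance per feature).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix of the standardized data: (1/(n-1)) X^T X.
    cov = np.cov(Xs, rowvar=False)
    # 3. Eigenvectors and eigenvalues of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort by eigenvalue in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the leading components and report the variance they explain.
    components = eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    # 5. Project the data onto the principal components.
    return Xs @ components, components, explained

X = np.random.randn(200, 5)                        # stand-in for a real dataset
Z, components, explained = pca(X, n_components=2)
print(Z.shape, explained)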

10) Define and explain principal component analysis.


Ans:

• Principal Component Analysis (PCA) is a statistical method used for identifying patterns in high-dimensional data.
• It is a dimensionality reduction technique that reduces the number of variables in a dataset while retaining as much information as possible.
• In PCA, the original data is transformed into a new coordinate system defined by the principal components.
• These principal components are linear combinations of the original variables and are orthogonal to each other.
• The first principal component captures the maximum variance in the data, the second principal component captures the maximum remaining variance after the first, and so on.

PCA mainly consists of the following steps:

• Standardize the data
• Calculate the covariance matrix
• Compute the eigenvectors and eigenvalues
• Sort the eigenvectors by their corresponding eigenvalues
• Select the number of principal components
• Transform the data into the new coordinate system

1. Standardize the data: If the variables in the dataset have different scales, standardize the data by subtracting the mean from each variable and dividing by its standard deviation.
2. Calculate the covariance matrix: Compute the covariance matrix
of the standardized data.
3. Compute the eigenvectors and eigenvalues: Calculate the
eigenvectors and eigenvalues of the covariance matrix.
4. Sort the eigenvectors by their corresponding eigenvalues: Sort
the eigenvectors by their corresponding eigenvalues in
descending order. The eigenvectors with the highest eigenvalues
are the most important and correspond to the principal
components.
5. Select the number of principal components: Decide on the
number of principal components to retain based on the amount
of variance they explain.
6. Transform the data into the new coordinate system: Project the original data onto the selected principal components to get the transformed data (a short library-based example follows this list).
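
For comparison with the manual steps above, the same workflow with scikit-learn would look roughly like this (assuming scikit-learn is installed; passing n_components=0.95 asks the library to keep enough components to explain 95% of the variance):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.randn(300, 10)             # stand-in for a real dataset

# Steps 1-2: standardize; PCA handles the covariance/eigen-decomposition internally.
X_std = StandardScaler().fit_transform(X)

# Steps 3-6: keep enough components to explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                    # reduced-dimensional data
print(pca.explained_variance_ratio_)      # variance explained per component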

11) Define and explain Singular Value Decomposition.

Ans :

Singular Value Decomposition (SVD) is a matrix factorization technique that decomposes a matrix A into three constituent matrices, A = U Σ V^T. The resulting factorization can be useful for a variety of tasks, such as reducing dimensionality, denoising, and data compression.


Steps of SVD:

1. Calculate the transpose of A: Compute the transpose A^T of the original matrix A (whether or not A is square).

2. Compute the product A^T A: Multiply the transpose by A to obtain the square matrix A^T A.

3. Calculate the eigenvectors and eigenvalues of A^T A: Compute the eigenvectors and eigenvalues of the matrix A^T A.

4. Compute the singular values: Take the square root of the eigenvalues to get the singular values.

5. Calculate the matrix V: Construct the matrix V by using the eigenvectors of A^T A as columns.

6. Compute the matrix U: Calculate the matrix U by normalizing the columns of the product A V by their corresponding singular values (i.e., u_i = A v_i / σ_i).

7. Assemble the diagonal matrix Σ: Create the diagonal matrix Σ from the singular values, so that A = U Σ V^T (see the short numerical check after these steps).
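
A short NumPy check of these steps on a small matrix (np.linalg.svd computes the factorization directly; the eigen-based route below mirrors the manual steps and is shown for illustration only):

import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

# Steps 2-5: eigen-decomposition of A^T A gives V and the squared singular values.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]              # sort in descending order
eigvals, V = eigvals[order], V[:, order]
singular_values = np.sqrt(eigvals)             # step 4: square roots of eigenvalues
U = (A @ V) / singular_values                  # step 6: u_i = A v_i / sigma_i
Sigma = np.diag(singular_values)               # step 7: diagonal matrix of singular values

# A is recovered as U Sigma V^T (up to floating-point error).
print(np.allclose(A, U @ Sigma @ V.T))

# Library route: NumPy returns U, the singular values, and V^T directly.
U2, s2, Vt2 = np.linalg.svd(A, full_matrices=False)
print(np.allclose(s2, singular_values))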

12) Explain the salient features of Singular Value Decomposition in detail.

Ans:

Salient Features of Singular Value Decomposition:

1. Dimensionality reduction: SVD can be used to reduce the dimensionality of a dataset by selecting a subset of the most important singular values and corresponding singular vectors. This makes it possible to represent the data in a lower-dimensional space without losing much information.

2. Numerical stability: SVD is numerically stable, meaning it is less susceptible to rounding errors and other numerical instabilities that can arise in other matrix factorization techniques.

3. Data compression: SVD can be used for data compression by approximating a high-dimensional dataset with a low-dimensional representation that captures the most important features of the data.

4. Noise reduction: SVD can be used to denoise a signal by removing low-energy components and retaining the high-energy components that correspond to the signal.

5. Robustness: SVD is a numerically robust factorization and copes well with ill-conditioned matrices; however, like other least-squares techniques, extreme outliers can still influence the resulting factors, which is why robust variants are used when outliers are a major concern.

6. Interpretable factors: The singular vectors and values obtained from SVD have clear geometric interpretations, making it easier to understand and interpret the factors that underlie the data.

7. Widely applicable: SVD is applicable to a wide range of problems in various fields, such as image processing, signal processing, text mining, and recommendation systems. (A small numerical illustration of the dimensionality-reduction and compression features is given below.)
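
To make the dimensionality-reduction, compression, and noise-reduction points concrete, here is a small NumPy sketch of a rank-k (truncated) SVD approximation (the synthetic matrix and the choice k = 2 are arbitrary illustrations):

import numpy as np

# Synthetic "data" matrix: 100 x 20, with most of its structure in 2 directions plus noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20)) + 0.01 * rng.normal(size=(100, 20))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: a low-rank, compressed, denoised approximation.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The relative reconstruction error is tiny because the discarded singular values are small.
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
print(s[:4])    # the first two singular values dominate the rest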
13) Explain the feature ranking method of dimensionality reduction.
Ans ) Feature ranking is a technique used in dimensionality reduction to
identify the most important features in a dataset. It involves ranking
the features according to their importance or relevance to the outcome
variable.
The process of feature ranking involves the following steps:
• Select a set of features: The first step is to select the set of features that will
be used for the analysis.
• Calculate feature importance: The next step is to calculate the importance
of each feature. This can be done using a variety of techniques, including
correlation analysis, mutual information, and statistical tests.
• Rank the features: Once the importance of each feature has been
calculated, they can be ranked in order of importance.
• Select the top features: Finally, the top-ranked features can be selected for
further analysis, while the less important features can be discarded.

Feature ranking is a useful technique for reducing the dimensionality of a dataset, as it can help to identify the most important features while discarding the less important ones. This can lead to more accurate and efficient models, as well as a better understanding of the underlying data.
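
A brief scikit-learn sketch of these steps, using mutual information as the ranking criterion (assuming scikit-learn is installed; the synthetic dataset is illustrative only):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic dataset: 20 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# Calculate feature importance and rank the features.
scores = mutual_info_classif(X, y, random_state=0)
ranking = scores.argsort()[::-1]
print("features ranked by importance:", ranking)

# Select the top 5 features for further analysis.
selector = SelectKBest(score_func=mutual_info_classif, k=5)
X_top = selector.fit_transform(X, y)
print(X_top.shape)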

14) What are the important features of the feature ranking method of dimensionality reduction?
Ans) Feature ranking is a popular method of dimensionality
reduction that involves selecting a subset of features from a larger
set of variables. The goal is to identify the most important features
that contribute to the prediction or classification task. Here are some
important features of feature ranking methods:
• Ranking Criteria: Feature ranking methods use various criteria to rank
the importance of features, such as correlation, mutual information,
entropy, and so on. The ranking criteria should reflect the relevance
of each feature to the problem at hand.
• Selection Method: Once the features are ranked, a selection method
is used to choose a subset of the top-ranked features. The selection
method can be based on a threshold value, such as selecting the top
k features, or it can be a more sophisticated method, such as a
greedy algorithm or a genetic algorithm.
• Performance Evaluation: Feature ranking methods should be
evaluated based on their ability to improve the performance of the
prediction or classification model. The selected subset of features
should lead to better performance compared to using all the
features.
• Robustness: Feature ranking methods should be robust to noise and
outliers in the data. They should also be able to handle missing
values and deal with redundant or correlated features.
• Interpretability: Feature ranking methods should provide
interpretable results that can be easily understood by domain
experts. The importance of each feature should be explained in
terms of its relevance to the problem at hand.
• Computational Efficiency: Feature ranking methods should be
computationally efficient, especially for large datasets with a large
number of features. They should be able to rank and select features
in a reasonable amount of time.

Overall, feature ranking is a useful technique for reducing the dimensionality of high-dimensional data and improving the performance of machine learning models.
15) Explain the filter method of dimensionality reduction in detail.
A) The filter method is a dimensionality reduction technique used in machine learning and data science to identify and remove irrelevant or redundant features from a dataset. This method works by ranking the features based on a specific metric and then selecting a subset of the top-ranked features to be used in the model.
The filter method consists of three main steps:
1. Feature selection: This step involves selecting a subset of the
most relevant features from the dataset. The goal is to reduce
the number of features in the dataset while retaining as much
relevant information as possible.
2. Ranking the features: The next step is to rank the selected
features based on a specific metric. There are several metrics
that can be used to rank the features, including correlation,
mutual information, chi-squared, and ANOVA F-test.
• Correlation: measures the linear relationship
between two variables. Features with high
correlation to the target variable are considered
more relevant.
• Mutual information: measures the amount of
information that one feature provides about
another feature. Features with high mutual
information are considered more relevant.
• Chi-squared: measures the dependence between two categorical variables. Features with high chi-squared values are considered more relevant.
• ANOVA F-test: measures the difference in means
between groups in a categorical variable. Features
with high F-values are considered more relevant.

3. Selecting the top-ranked features: Finally, the top-ranked features are selected and used in the model. The number of features selected depends on the specific problem and the performance of the model with different feature subsets. A short code sketch of this workflow is given below.
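
A minimal sketch of the filter method using the ANOVA F-test as the ranking metric (assuming scikit-learn is installed; the Iris data and k = 2 are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Rank all features with the ANOVA F-test, independently of any particular model.
selector = SelectKBest(score_func=f_classif, k=2)
X_filtered = selector.fit_transform(X, y)

print("F-scores per feature:", selector.scores_)
print("selected feature indices:", selector.get_support(indices=True))
print("reduced data shape:", X_filtered.shape)

Because the ranking is computed from the data alone, the selected features can then be passed to any downstream model, which is the independence property discussed in the next answer.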

16) What are the important features of the filter method of dimensionality reduction?

A) The filter method of dimensionality reduction is a technique that helps to identify and select the most relevant features from a dataset. The important features of the filter method are as follows:
1. Simplicity: The filter method is a simple and easy-to-understand technique that does not require much computational power. It can handle large datasets with many features efficiently.

2. Independence: The filter method is independent of the machine learning algorithm used for classification or regression. It can be applied to any dataset without requiring specific assumptions about the underlying distribution.

3. Speed: The filter method is a fast technique as it requires only a single pass through the data to identify the relevant features. It can process large datasets with many features quickly.

4. Scalability: The filter method is scalable and can handle datasets with a high number of features. It is particularly useful for datasets where the number of features is much larger than the number of observations.

5. Interpretable results: The filter method provides interpretable results as it ranks the features based on a specific metric such as correlation, mutual information, chi-squared, or the ANOVA F-test. This allows for a better understanding of the relationship between the features and the target variable.
