
Dimensionality Reduction and Hierarchical Clustering in Machine Learning

Prerequisites
1. Install Python: Make sure you have Python installed. You can download it
from Python's official website (https://www.python.org/downloads/).

2. Install Required Libraries: This walkthrough uses 'pandas', 'numpy',
'matplotlib', 'scikit-learn', and 'scipy'. You can install them using pip.

pip install pandas numpy matplotlib scikit-learn scipy

3. Set Up Your IDE: You can use any Python IDE or text editor (like Jupyter
Notebook, VS Code, or PyCharm).
Step 1: Gather Data
For demonstration, let’s create a sample dataset in CSV format. Save the
following data in a file named 'business_data.csv'.

CustomerID,Name,Email,JoinDate,AmountSpent
1,John Doe,[email protected],2024-01-15,150.00
2,Jane Smith,[email protected],2024-02-20,200.00
3,Bob Johnson,,2024-03-05,150.00
4,Mary Johnson,[email protected],2024-02-29,300.00
5,Tom Brown,[email protected],2024-03-15,400.00
6,Emily Davis,[email protected],2024-01-25,
1,John Doe,[email protected],2024-01-15,150.00
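
Before moving on, you can sanity-check the file with pandas (a quick sketch; 'business' is just an illustrative variable name, and the file is assumed to be in your working directory):

import pandas as pd

# Load the CSV created above and look for gaps in the data
business = pd.read_csv('business_data.csv')
print(business.head())
business.info()  # reveals the missing Email and AmountSpent entries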

Step 2: Load the Data


For the dimensionality reduction and clustering steps that follow, this guide uses the built-in Iris dataset from scikit-learn, whose four numeric features are well suited to PCA and clustering. Load it into a pandas DataFrame and inspect its contents.

from sklearn.datasets import load_iris
import pandas as pd

# Load the built-in Iris dataset into a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
print(df.head())
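
Sample Output:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2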
Step 3: Dimensionality Reduction Techniques

Dimensionality reduction reduces the number of features while retaining the
essential patterns in the data.

a. Principal Component Analysis (PCA)

Sample Code:

from sklearn.decomposition import PCA

# Project the four Iris features onto the two directions of greatest variance
pca = PCA(n_components=2)
df_pca = pca.fit_transform(df)
print(df_pca[:5])

Sample Output:

[[-2.68412563  0.31939725]
 [-2.71414169 -0.17700123]
 [-2.88899057 -0.14494943]
 [-2.74534286 -0.31829898]
 [-2.72871654  0.32675451]]
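
To see how much information the two components keep, inspect the fitted PCA object's explained_variance_ratio_ attribute (a standard scikit-learn attribute, not shown in the original snippet):

# Fraction of the total variance captured by each component
print(pca.explained_variance_ratio_)

For the Iris data this prints roughly [0.9246 0.0531], meaning the two components retain close to 98% of the variance.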
b. t-Distributed Stochastic Neighbor Embedding (t-SNE)

Sample Code:

from sklearn.manifold import TSNE

# Embed the data in two dimensions; random_state makes the run repeatable
tsne = TSNE(n_components=2, random_state=42)
df_tsne = tsne.fit_transform(df)
print(df_tsne[:5])

Sample Output:

[[ 1.2379045  12.769159 ]
 [ 8.755232    7.7505245]
 [ 9.419792    8.941869 ]
 [ 9.378086    7.217551 ]
 [ 2.849782    6.5989175]]
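
Note that t-SNE results can vary across runs and library versions even with a fixed random_state, so your numbers may differ. A quick scatter plot makes the structure easier to see (a sketch that reuses data.target, the species labels loaded in Step 2):

import matplotlib.pyplot as plt

# Color each embedded point by its true species label
plt.scatter(df_tsne[:, 0], df_tsne[:, 1], c=data.target, cmap='viridis')
plt.title("t-SNE Projection of the Iris Data")
plt.show()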

Step 4: Hierarchical Clustering

Sample Code:

from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Build the linkage matrix with Ward's method, which merges the pair of
# clusters that gives the smallest increase in within-cluster variance
linked = linkage(df, method='ward')

plt.figure(figsize=(10, 7))
dendrogram(linked, truncate_mode='lastp')
plt.title("Hierarchical Clustering Dendrogram")
plt.show()

Expected Output: A dendrogram showing the hierarchical relationships between
the data points.
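
The dendrogram is mainly a visual aid; to turn the same linkage matrix into concrete cluster labels, you can cut the tree with scipy's fcluster (a standard function, shown here as a sketch):

from scipy.cluster.hierarchy import fcluster

# Cut the tree so that at most 3 flat clusters remain
labels_hc = fcluster(linked, t=3, criterion='maxclust')
print(labels_hc[:10])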

Step 5: Evaluation and Visualization

Sample Code:

from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

# Fit agglomerative clustering with 3 clusters (Iris has 3 species)
cluster = AgglomerativeClustering(n_clusters=3)
labels = cluster.fit_predict(df)

# Silhouette score summarizes how well each point fits its assigned cluster
score = silhouette_score(df, labels)
print("Silhouette Score:", score)

Sample Output:

Silhouette Score: 0.554323

This score evaluates clustering quality; higher values indicate better-defined
clusters.
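
As a final check, you can plot the agglomerative labels on the 2-D PCA projection from Step 3 to see how well the clusters separate (a sketch reusing df_pca and labels from the steps above):

# Color the PCA projection by the cluster assignments
plt.scatter(df_pca[:, 0], df_pca[:, 1], c=labels, cmap='viridis')
plt.title("Agglomerative Clusters on the PCA Projection")
plt.show()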
