Cluster R

The document performs k-means clustering on customer transaction data to segment customers into distinct groups. It loads data, standardizes variables, determines optimal k by analyzing within-cluster sums of squares, runs k-means clustering with k=3 clusters, calculates cluster means, and visualizes the resulting clusters.

Cluster.R
Administrator
2024-03-29

# Load required libraries
library(dplyr)      # data manipulation (not used by the code below)

## Warning: package 'dplyr' was built under R version 4.3.1

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

library(cluster)    # cluster analysis methods (not called directly below)
library(factoextra) # fviz_cluster() for visualising k-means results

## Warning: package 'factoextra' was built under R version 4.3.3

## Loading required package: ggplot2

## Warning: package 'ggplot2' was built under R version 4.3.1

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(readxl)     # read_excel() for .xls/.xlsx files

## Warning: package 'readxl' was built under R version 4.3.1

# Load dataset
mark5827_T1_24_Ass_2 <- read_excel("C:/Users/Administrator/Downloads/mark5827 T1_ 24 Ass 2.xls")
head(mark5827_T1_24_Ass_2)

## # A tibble: 6 × 21
##      ID Year_Birth Kidhome Teenhome Dt_Customer MntWines MntFruits
##   <dbl>      <dbl>   <dbl>    <dbl> <chr>          <dbl>     <dbl>
## 1  5524       1957       0        0 21/11/2020      1587      2024
## 2  2174       1954       1        1 25/05/2022       253        23
## 3  5324       1981       1        0 7/04/2022       3979       989
## 4   965       1971       0        1 30/01/2021      5405      1495
## 5  6177       1985       1        0 25/07/2021      1748       230
## 6   387       1976       0        0 30/01/2021       138       368
## # ℹ 14 more variables: MntMeatProducts <dbl>, MntFishProducts <dbl>,
## #   MntSweetProducts <dbl>, MntOtherProds <dbl>, NumStorePurchases <dbl>,
## #   NumDealsPurchases <dbl>, NumWebPurchases <dbl>, NumCatalogPurchases <dbl>,
## #   NumWebVisitsMonth <dbl>, AcceptedCmp2 <dbl>, AcceptedCmp1 <dbl>,
## #   AcceptedCmp3 <dbl>, AcceptedCmp4 <dbl>, AcceptedCmp5 <dbl>

# Standardize the data
# Keep the six spending variables (Mnt*) used for segmentation
vars <- c("MntWines", "MntFruits", "MntMeatProducts", "MntFishProducts",
          "MntSweetProducts", "MntOtherProds")
data <- mark5827_T1_24_Ass_2[, vars]
# scale() centres each column to mean 0 and rescales it to standard deviation 1
scaled_data <- scale(data)
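scale() standardizes each column as z = (x - mean(x)) / sd(x), using the sample standard deviation, so that spending categories with very different magnitudes contribute comparably to Euclidean distances. The small sketch below uses a made-up matrix (illustrative values, not the assignment data) to check the manual formula against scale() and to show the centring and scaling constants that scale() keeps as attributes.

```r
# Toy spending matrix (illustrative values only, not from the dataset)
x <- cbind(MntWines  = c(120, 450, 80, 900, 300),
           MntFruits = c(10, 60, 5, 150, 40))

z <- scale(x)

# Manual z-score: subtract each column mean, then divide by each column's sample sd
z_manual <- sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")

all.equal(as.vector(z), as.vector(z_manual))  # TRUE
colMeans(z)                                   # both ~0 (floating-point error only)
apply(z, 2, sd)                               # both 1

# Constants stored by scale(); useful for converting results back to original units
attr(z, "scaled:center")
attr(z, "scaled:scale")
```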

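# Determine the number of clusters (elbow method)
# Fit k-means for k = 1..10 and record the total within-cluster sum of squares
set.seed(123)  # k-means starts from random centres; fix the seed for reproducibility
wss <- numeric(10)
for (i in 1:10) {
  kmeans_model <- kmeans(scaled_data, centers = i, nstart = 25)  # keep best of 25 starts
  wss[i] <- kmeans_model$tot.withinss
}
# Look for the "elbow" where adding clusters stops reducing WSS sharply
plot(1:10, wss, type = "b", xlab = "Number of Clusters", ylab = "Within Sum of Squares")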
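The elbow plot relies on tot.withinss, the total within-cluster sum of squares: the sum over all clusters of the squared Euclidean distance from each observation to its cluster centre. The sketch below recomputes it by hand on simulated two-group data (an assumption for illustration, not the customer data) and checks it against kmeans(). It also shows that for k = 1 the WSS equals the total sum of squares, and that WSS generally keeps falling as k grows, which is why the choice looks for a bend rather than the smallest value.

```r
set.seed(1)
# Simulated data: two well-separated groups in two dimensions (illustrative only)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))

fit <- kmeans(x, centers = 2, nstart = 10)

# Squared distance from each row to the centre of the cluster it was assigned to
sq_dist <- rowSums((x - fit$centers[fit$cluster, ])^2)
wss_manual <- sum(sq_dist)

all.equal(wss_manual, fit$tot.withinss)  # TRUE

# WSS for k = 1..6; the first value is the total sum of squares around the overall mean
wss <- sapply(1:6, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
round(wss, 1)
```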

# Perform k-means clustering
k <- 3  # number of clusters, chosen from the elbow plot
set.seed(123)
kmeans_model <- kmeans(scaled_data, centers = k, nstart = 25)
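kmeans() uses the Hartigan-Wong algorithm by default. The simpler Lloyd iteration below is a swapped-in illustration of the same objective, not what R runs by default: assign each point to its nearest centre, move each centre to the mean of its points, and repeat until no assignment changes. The helper name lloyd_kmeans and the simulated three-group data are hypothetical, and the sketch assumes no cluster ever becomes empty.

```r
# Hypothetical helper: plain Lloyd's k-means (assumes no cluster becomes empty)
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  centres <- x[sample(nrow(x), k), , drop = FALSE]  # random data points as initial centres
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: n x k matrix of squared Euclidean distances to each centre
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centres[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break  # converged: no point changed cluster
    cluster <- new_cluster
    # Update step: each centre becomes the mean of the points assigned to it
    for (j in seq_len(k)) {
      centres[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
    }
  }
  list(cluster = cluster, centers = centres,
       tot.withinss = sum((x - centres[cluster, ])^2))
}

# Usage on simulated data: three groups in two dimensions (illustrative only)
set.seed(42)
x <- rbind(cbind(rnorm(30, 0), rnorm(30, 0)),
           cbind(rnorm(30, 6), rnorm(30, 0)),
           cbind(rnorm(30, 0), rnorm(30, 6)))
res <- lloyd_kmeans(x, k = 3)
table(res$cluster)
res$tot.withinss
```

The same iteration is also available directly as kmeans(x, centers = 3, algorithm = "Lloyd").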

# Add cluster assignments to the original (unstandardized) data
data$cluster <- kmeans_model$cluster

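# Calculate unstandardized means for each cluster
# (`data` holds the raw Mnt* columns plus the cluster label)
data <- as.data.frame(data)
# `. ~ cluster` averages every other column within each cluster
cluster_means <- aggregate(. ~ cluster, data = data, mean)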
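Because k-means was fitted on standardized data, kmeans_model$centers are in z-score units. Multiplying each centre by the column standard deviation and adding the column mean (both stored by scale() as attributes) should reproduce the aggregate() means of the unstandardized data, since each returned centre is the mean of its cluster. The sketch below checks this on simulated two-column spending data, an assumption standing in for scaled_data and kmeans_model.

```r
set.seed(7)
# Simulated spending data on very different scales (illustrative only)
raw <- data.frame(MntWines  = c(rnorm(40, 300, 50), rnorm(40, 1500, 200)),
                  MntFruits = c(rnorm(40, 20, 5),   rnorm(40, 150, 30)))

scaled <- scale(raw)
fit <- kmeans(scaled, centers = 2, nstart = 25)

# Undo the standardization: multiply by each column's sd, then add its mean
centres_raw <- sweep(sweep(fit$centers, 2, attr(scaled, "scaled:scale"), "*"),
                     2, attr(scaled, "scaled:center"), "+")

# Same numbers obtained by averaging the original data within each cluster
raw$cluster <- fit$cluster
means_raw <- aggregate(. ~ cluster, data = raw, mean)

centres_raw
means_raw
```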

# Print the cluster means
print(cluster_means)

##   cluster  MntWines MntFruits MntMeatProducts MntFishProducts MntSweetProducts
## 1       1 11983.070 1963.7217      10531.6789       2646.6881        1995.5841
## 2       2  2095.429  156.1426        818.6008        205.6156         154.2482
## 3       3 15255.970  586.9752       5734.9835        918.3526         638.1074
##   MntOtherProds
## 1     1858.9205
## 2      411.4256
## 3     1823.8430

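# Plot the clusters
# fviz_cluster() shows the observations on the first two principal components.
# Pass the standardized matrix that was clustered (`data` now also contains the
# cluster column), so stand = FALSE is appropriate here.
fviz_cluster(kmeans_model, data = scaled_data, geom = "point", stand = FALSE,
             ellipse.type = "convex", ellipse.level = 0.68, main = "Cluster Analysis")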
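With more than two variables, fviz_cluster() places the observations on the first two principal components. A rough base-R counterpart is sketched below, assuming prcomp() on the already standardized matrix; factoextra's exact internal settings and styling may differ. The six-variable, three-group data are simulated as a stand-in for scaled_data.

```r
set.seed(11)
# Simulated stand-in for scaled_data: 3 groups, 6 variables (illustrative only)
group_means <- c(0, 3, 6)
x <- do.call(rbind, lapply(group_means, function(m)
  matrix(rnorm(50 * 6, mean = m), ncol = 6)))
x <- scale(x)

fit <- kmeans(x, centers = 3, nstart = 25)

# Principal components of the standardized data; keep the first two score columns
pca <- prcomp(x)
scores <- pca$x[, 1:2]
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Colour points by cluster; axis labels report the share of variance per component
plot(scores, col = fit$cluster, pch = 19,
     xlab = sprintf("Dim1 (%.1f%%)", 100 * var_explained[1]),
     ylab = sprintf("Dim2 (%.1f%%)", 100 * var_explained[2]),
     main = "Cluster Analysis (base R sketch)")
```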
