K-Means Data Clustering from Scratch Using C# -- Visual Studio Magazine

The document outlines a demo program for K-Means data clustering implemented in C#, detailing how to load data, create a KMeans object, and display clustering results. It discusses the elbow technique for determining the optimal number of clusters and various methods for displaying clustering information. Additionally, it provides insights into potential modifications to the K-means implementation and the author's background.

THE DATA SCIENCE LAB

K-Means Data Clustering from Scratch Using C#


12/01/2023

The demo program displays the loaded data using the MatShow() function:

Console.WriteLine("Data: ");
MatShow(X, 2, 6); // 2 decimals, 6 wide
In a non-demo scenario, you might want to display all the data to make sure it was loaded
properly.
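
A display helper along the lines of MatShow() might look like the following minimal sketch. Only the call MatShow(X, 2, 6) appears above, so the parameter types, the MatrixDisplay class name and the MatToString() helper are assumptions made for illustration, not the demo's actual code.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class MatrixDisplay
{
    // Hypothetical helper: formats each value with `dec` decimals,
    // right-aligned in a field `wid` characters wide, one row per line.
    public static string MatToString(double[][] m, int dec, int wid)
    {
        var sb = new StringBuilder();
        foreach (double[] row in m)
        {
            foreach (double v in row)
            {
                string s = v.ToString("F" + dec, CultureInfo.InvariantCulture);
                sb.Append(s.PadLeft(wid));
            }
            sb.AppendLine();
        }
        return sb.ToString();
    }

    // Assumed signature, inferred from the call MatShow(X, 2, 6).
    public static void MatShow(double[][] m, int dec, int wid)
    {
        Console.Write(MatToString(m, dec, wid));
    }
}
```

Building the text in a separate method makes the formatting easy to check without capturing console output.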

Clustering the Data


The KMeans clustering object is created and used like so:

Console.WriteLine("Clustering with k=3 ");
KMeans km = new KMeans(X, 3);
int[] clustering = km.Cluster();
Console.WriteLine("Done ");

The number of clusters, k, must be supplied. There is no good way to programmatically
determine an optimal number of clusters. A somewhat subjective approach for determining a
good number of clusters is called the elbow technique. You try different values of k and plot
the associated within-cluster sum of squares (WCSS) values. The value of k at which WCSS
stops decreasing sharply is a good candidate. See the article, "Determining the Number of
Clusters Using the Elbow and the Knee Techniques."
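
To apply the elbow technique you need a WCSS value for each candidate k. The sketch below computes WCSS directly from the data and a clustering array. The ElbowSketch class and the Wcss() function are hypothetical names, and the code assumes cluster IDs run from 0 to k-1, as in the demo output.

```csharp
using System;

public static class ElbowSketch
{
    // Within-cluster sum of squares: for each item, the squared Euclidean
    // distance to the mean of its assigned cluster, summed over all items.
    public static double Wcss(double[][] X, int[] clustering, int k)
    {
        int dim = X[0].Length;
        double[][] means = new double[k][];
        int[] counts = new int[k];
        for (int c = 0; c < k; ++c) means[c] = new double[dim];

        // accumulate per-cluster sums and counts
        for (int i = 0; i < X.Length; ++i)
        {
            int c = clustering[i];
            ++counts[c];
            for (int j = 0; j < dim; ++j) means[c][j] += X[i][j];
        }

        // convert sums to means (an empty cluster keeps a zero mean)
        for (int c = 0; c < k; ++c)
            if (counts[c] > 0)
                for (int j = 0; j < dim; ++j) means[c][j] /= counts[c];

        double sum = 0.0;
        for (int i = 0; i < X.Length; ++i)
        {
            double[] mean = means[clustering[i]];
            for (int j = 0; j < dim; ++j)
            {
                double diff = X[i][j] - mean[j];
                sum += diff * diff;
            }
        }
        return sum;
    }
}
```

You would call this once per candidate k, passing the result of new KMeans(X, k).Cluster(), and then plot or print the values to look for the bend.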

The demo concludes by displaying the clustering results:

Console.WriteLine("clustering = ");
VecShow(clustering, 3);

The results look like:

1 1 2 0 0 1 . . .

The clustering information can be displayed in several ways. For example, instead of
displaying by data item, you can display by cluster ID. Or, you can display each data item and
its associated cluster ID on the same line.
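
Both alternative displays can be sketched in a few lines. The ClusterDisplay class and its method names are hypothetical, and the code assumes cluster IDs run from 0 to k-1.

```csharp
using System;
using System.Collections.Generic;

public static class ClusterDisplay
{
    // Groups data item indices by cluster ID: result[c] lists the items in cluster c.
    public static List<int>[] ItemsByCluster(int[] clustering, int k)
    {
        var result = new List<int>[k];
        for (int c = 0; c < k; ++c) result[c] = new List<int>();
        for (int i = 0; i < clustering.Length; ++i)
            result[clustering[i]].Add(i);
        return result;
    }

    // Display by cluster ID: one line per cluster, listing its item indices.
    public static void ShowByCluster(int[] clustering, int k)
    {
        List<int>[] groups = ItemsByCluster(clustering, k);
        for (int c = 0; c < k; ++c)
            Console.WriteLine("cluster " + c + ": " + string.Join(" ", groups[c]));
    }

    // Display each data item index and its cluster ID on the same line.
    public static void ShowByItem(int[] clustering)
    {
        for (int i = 0; i < clustering.Length; ++i)
            Console.WriteLine("item " + i + " -> cluster " + clustering[i]);
    }
}
```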

Wrapping Up
The k-means implementation presented in this article can be used as-is or modified in several
ways. The demo uses random partition initialization. The order of the N data items is
randomized, then the first k data items are assigned to the first k cluster IDs, and then the
remaining N-k data items are each assigned to a random cluster ID. This approach guarantees
that there is at least one data item assigned to each cluster ID but doesn't guarantee that the
initial cluster ID counts are equal. Instead of assigning random cluster IDs, you can
cycle through the cluster IDs sequentially.
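
The initialization described above, along with the sequential alternative, can be sketched as follows. This is an illustration of the idea rather than the demo's code: the InitSketch class, the RandomPartition() name and the cycle flag are all assumptions.

```csharp
using System;

public static class InitSketch
{
    // Returns an initial clustering for n items and k clusters (requires n >= k >= 1).
    // cycle == false: first k shuffled items get IDs 0..k-1, the rest get random IDs.
    // cycle == true:  shuffled items get IDs 0, 1, .., k-1, 0, 1, .. in turn.
    public static int[] RandomPartition(int n, int k, Random rnd, bool cycle = false)
    {
        if (k < 1 || n < k) throw new ArgumentException("need n >= k >= 1");

        // Fisher-Yates shuffle of the item indices
        int[] indices = new int[n];
        for (int i = 0; i < n; ++i) indices[i] = i;
        for (int i = n - 1; i > 0; --i)
        {
            int r = rnd.Next(i + 1); // 0 <= r <= i
            int tmp = indices[i]; indices[i] = indices[r]; indices[r] = tmp;
        }

        int[] clustering = new int[n];
        for (int i = 0; i < n; ++i)
        {
            if (i < k || cycle)
                clustering[indices[i]] = i % k;       // guarantees every ID is used
            else
                clustering[indices[i]] = rnd.Next(k); // random ID in [0, k)
        }
        return clustering;
    }
}
```

With cycle set to true, the cluster counts differ by at most one, which is as close to equal as N and k allow.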

The Cluster() method calls ClusterOnce() this.trials times looking for the clustering that has
the smallest within-cluster sum of squares. Instead of iterating a fixed number of times, you
can track previous-wcss and current-wcss and exit the loop when the value has not changed
for a threshold number of consecutive iterations (such as N).
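
One possible reading of that early-exit idea is sketched below. It tracks the best WCSS seen so far rather than only the previous value, and stops after a set number of consecutive trials without improvement. The runTrial delegate stands in for one ClusterOnce()-style trial that returns its WCSS. The delegate, the patience parameter and the EarlyStopSketch class are assumptions, not part of the demo.

```csharp
using System;

public static class EarlyStopSketch
{
    // Runs trials until maxTrials is reached or until `patience` consecutive
    // trials fail to improve on the best (smallest) WCSS seen so far.
    public static double BestWcss(Func<double> runTrial, int maxTrials, int patience)
    {
        double best = double.MaxValue;
        int stale = 0; // consecutive trials without improvement

        for (int t = 0; t < maxTrials && stale < patience; ++t)
        {
            double current = runTrial();
            if (current < best)
            {
                best = current;
                stale = 0;
            }
            else
            {
                ++stale;
            }
        }
        return best;
    }
}
```

A real version would also keep the clustering that produced the best WCSS, not just the value.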

The demo implementation assumes the source data is normalized. An alternative is to
integrate the data normalization into the k-means system.
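
If you do integrate normalization, it could be a preprocessing step like the sketch below. The text above doesn't say which kind of normalization is intended, so min-max scaling of each column to [0, 1] is an assumption here, as are the NormalizeSketch and MinMaxNormalize() names.

```csharp
using System;

public static class NormalizeSketch
{
    // Returns a copy of X with each column scaled to [0, 1] via (x - min) / (max - min).
    // A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    public static double[][] MinMaxNormalize(double[][] X)
    {
        int n = X.Length;
        int dim = X[0].Length;
        double[][] result = new double[n][];
        for (int i = 0; i < n; ++i) result[i] = new double[dim];

        for (int j = 0; j < dim; ++j)
        {
            double min = X[0][j];
            double max = X[0][j];
            for (int i = 1; i < n; ++i)
            {
                if (X[i][j] < min) min = X[i][j];
                if (X[i][j] > max) max = X[i][j];
            }
            double range = max - min;
            for (int i = 0; i < n; ++i)
                result[i][j] = (range == 0.0) ? 0.0 : (X[i][j] - min) / range;
        }
        return result;
    }
}
```

Returning a copy leaves the source data untouched, so cluster results can still be displayed in the original units.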

The UpdateClustering() method uses Euclidean distance to assign each data item to the cluster
ID that has the closest mean. There are alternatives to Euclidean distance, such as Manhattan
distance and Minkowski distance. However, Euclidean distance works fine in most scenarios.
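
The three distance functions are closely related, as the sketch below shows: Minkowski distance with p = 1 is Manhattan distance, and with p = 2 it is Euclidean distance. The DistanceSketch class and its method names are hypothetical, not the demo's.

```csharp
using System;

public static class DistanceSketch
{
    // Euclidean distance: square root of the sum of squared differences.
    public static double Euclidean(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; ++i)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }

    // Manhattan distance: sum of absolute differences.
    public static double Manhattan(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; ++i)
            sum += Math.Abs(a[i] - b[i]);
        return sum;
    }

    // Minkowski distance of order p (p >= 1): (sum |a_i - b_i|^p)^(1/p).
    public static double Minkowski(double[] a, double[] b, double p)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; ++i)
            sum += Math.Pow(Math.Abs(a[i] - b[i]), p);
        return Math.Pow(sum, 1.0 / p);
    }
}
```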
About the Author


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several
Microsoft products including Azure and Bing. James can be reached at [email protected].
