11 Most Common Machine Learning Algorithms Explained in A Nutshell
by Soner Yıldırım, Towards Data Science
In this post, I will cover the most common algorithms in the first two
categories.
Let’s start.
1. Linear Regression
2. Support Vector Machine (SVM)
The data points are not always linearly separable like in the figure above. In these cases, SVM uses the kernel trick, which measures the similarity (or closeness) of data points in a higher-dimensional space in order to make them linearly separable.
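To make this concrete, here is a minimal sketch assuming scikit-learn (the article does not name a library), comparing a linear kernel with an RBF kernel on data shaped like two concentric circles, where no straight line can separate the classes. The dataset and parameters are purely illustrative.

```python
# Minimal sketch (assuming scikit-learn): a kernel can make
# non-linearly-separable data separable for an SVM.
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: no straight line separates the two classes.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=42)

linear_svm = SVC(kernel="linear")
rbf_svm = SVC(kernel="rbf")  # RBF kernel measures similarity of points

print("linear kernel:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF kernel:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```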
3. Naive Bayes
The conditional probability for a single feature given the class label
(i.e. p(x1 | yi) ) can be more easily estimated from the data. The
algorithm needs to store probability distributions of features for
each class independently. For example, if there are 5 classes and
10 features, 50 different probability distributions need to be stored.
Putting all of this together, it becomes an easy task for the naive Bayes algorithm to calculate the probability of observing a class given the feature values (p(yi | x1, x2, …, xn)).
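As a rough illustration (assuming scikit-learn, which the article does not prescribe), GaussianNB estimates one distribution per feature and per class, and predict_proba combines them into p(yi | x1, …, xn). The toy data below is made up for the example.

```python
# Minimal sketch (assuming scikit-learn): GaussianNB fits one Gaussian
# per feature and per class, then combines them with the class priors
# to get p(class | features).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Illustrative toy data: 2 classes, 10 features.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 10)),
               rng.normal(1.5, 1.0, size=(100, 10))])
y = np.array([0] * 100 + [1] * 100)

model = GaussianNB().fit(X, y)
print(model.theta_.shape)          # (2, 10): one mean per class and feature
print(model.predict_proba(X[:3]))  # p(yi | x1, ..., x10) for three samples
```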
4. Logistic Regression
We can use the calculated probability ‘as is’. For example, the
output can be “the probability that this email is spam is 95%” or “the
probability that this customer will click on this ad is 70%”. However, in most cases, probabilities are used to classify data points. For instance, if the probability is greater than 50%, the prediction is the positive class (1). Otherwise, the prediction is the negative class (0).
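A minimal sketch of both uses of the probability, assuming scikit-learn; the synthetic dataset is only for illustration.

```python
# Minimal sketch (assuming scikit-learn): use the predicted probability
# directly, or threshold it at 0.5 to get a class label.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)

model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X[:1])[0, 1]  # probability of the positive class
label = int(proba > 0.5)                  # 1 if probability > 50%, else 0
print(f"p(positive) = {proba:.2f} -> predicted class {label}")
```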
6. Decision Trees
7. Random Forest
Feature randomness
Random forest is a highly accurate model on many different problems and does not require normalization or scaling. However, it is not a good choice for high-dimensional data sets (e.g. text classification) compared to fast linear models (e.g. naive Bayes).
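A minimal sketch, assuming scikit-learn: the forest is trained directly on raw, unscaled features; n_estimators and max_features are illustrative values, not recommendations from the article.

```python
# Minimal sketch (assuming scikit-learn): a random forest trained on raw,
# unscaled features. n_estimators and max_features control the ensemble
# size and the feature randomness mentioned above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)  # features on very different scales

forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```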
Note on overfitting
8. K-Means Clustering
Real-life datasets are much more complex, and their clusters are not so clearly separated. However, the algorithm works in the same way.
K-means is an iterative process. It is built on the expectation-maximization algorithm. After the number of clusters is determined, it works by executing the following steps:
1. Randomly select centroids (cluster centers) for each cluster.
2. Calculate the distance of all data points to the centroids.
3. Assign each data point to the cluster of the closest centroid.
4. Find the new centroids of each cluster by taking the mean of all data points in the cluster.
5. Repeat steps 2, 3 and 4 until all points converge and the cluster centers stop moving.
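A minimal sketch of this loop, assuming scikit-learn's KMeans; the blob dataset and the choice of 4 clusters are illustrative.

```python
# Minimal sketch (assuming scikit-learn): KMeans repeats the
# assign-points / recompute-centroids steps until the centers stop moving.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)  # final centroids
print(kmeans.labels_[:10])      # cluster assignment of the first 10 points
```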
9. Hierarchical Clustering
• Agglomerative clustering
• Divisive clustering
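As a small illustration of the agglomerative (bottom-up) variant, here is a sketch assuming scikit-learn; the dataset and the choice of 3 clusters are illustrative.

```python
# Minimal sketch (assuming scikit-learn): agglomerative (bottom-up)
# clustering starts with every point as its own cluster and keeps
# merging the closest pair of clusters.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

agg = AgglomerativeClustering(n_clusters=3, linkage="ward").fit(X)
print(agg.labels_[:10])  # cluster label of the first 10 points
```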
10. DBSCAN
• eps: The distance that specifies the neighborhoods. Two points are considered to be neighbors if the distance between them is less than or equal to eps.
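A minimal sketch, assuming scikit-learn's DBSCAN; the eps and min_samples values are illustrative, not recommendations.

```python
# Minimal sketch (assuming scikit-learn): points within eps of each other
# are neighbors; dense groups of neighbors form clusters, and isolated
# points are labeled -1 (noise).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster ids; -1 marks noise points
```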
11. Principal Component Analysis (PCA)
The aim of PCA is to explain the variance within the original dataset as much as possible by using fewer features (or columns). The new derived features are called principal components. The order of the principal components is determined by the fraction of the variance of the original dataset that they explain.
The principal components are linear combinations of the features of the original dataset.
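A minimal sketch, assuming scikit-learn: PCA on the 4-feature iris dataset, keeping 2 principal components and inspecting the fraction of variance each one explains.

```python
# Minimal sketch (assuming scikit-learn): PCA derives new features
# (principal components) ordered by the fraction of the original
# variance they explain.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)       # 4 original features

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)            # 2 derived features
print(pca.explained_variance_ratio_)    # variance explained by each component
```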
Thank you for reading. Please let me know if you have any
feedback.