Machine Learning Cheat Sheet
Linear Regression
What it is: A simple algorithm that models a linear relationship between inputs and a continuous numerical output variable.
Pros: explainable method; interpretable results through its output coefficients; faster to train than other machine learning models.
Cons: assumes linearity between inputs and outputs; sensitive to outliers; can underfit with small, high-dimensional data.
Use cases: stock price prediction; predicting housing prices; predicting customer lifetime value.
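The coefficient interpretability noted above can be seen directly in code. A minimal sketch using scikit-learn (assumed available); the toy dataset and numbers are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy dataset following y = 2*x + 1 exactly
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression().fit(X, y)

# The fitted slope and intercept are directly interpretable
print(model.coef_[0], model.intercept_)  # approximately 2.0 and 1.0
print(model.predict([[5.0]])[0])         # approximately 11.0
```

Because the model is just a weighted sum of inputs, each coefficient reads as "change in output per unit change in that feature", which is what makes the method explainable.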
Logistic Regression
What it is: A simple algorithm that models a linear relationship between inputs and a categorical output (1 or 0).
Pros: interpretable and explainable; less prone to overfitting when using regularization; applicable for multi-class predictions.
Cons: assumes linearity between inputs and output; can overfit with small, high-dimensional data.
Use cases: credit risk score prediction; customer churn prediction; predictions from health data.
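A minimal sketch of binary classification with scikit-learn, assuming a toy one-feature dataset where the classes separate around zero. The `C` value is illustrative; it is scikit-learn's inverse regularization strength, the regularization knob mentioned above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: class 0 for negative x, class 1 for positive x
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Smaller C = stronger regularization, which helps against overfitting
clf = LogisticRegression(C=1.0).fit(X, y)

print(clf.predict([[-1.2], [1.2]]))      # predicted class labels
print(clf.predict_proba([[1.2]])[0, 1])  # probability of class 1
```

Multi-class problems use the same estimator: scikit-learn handles more than two labels in `y` automatically.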
Clustering

K-Means Clustering
What it is: K-Means is the most widely used clustering approach: it determines K clusters by assigning each point to the nearest of K cluster centroids.
Pros: 1. Scales to large datasets. 2. Simple to implement and interpret.
Cons: 1. Requires the expected number of clusters from the beginning.
Use cases: customer segmentation; fraud detection.

Hierarchical Clustering
What it is: A bottom-up approach in which each data point is initially treated as its own cluster, and then the closest two clusters are merged together iteratively.
Pros: 1. No need to fix the number of clusters up front. 2. The resulting dendrogram is informative.
Cons: not suitable for large datasets due to high computational complexity.
Use cases: document clustering based on similarity.
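Both clustering approaches above can be sketched side by side with scikit-learn; the two-blob toy data is illustrative. Note that K-Means needs `n_clusters` up front, while for agglomerative (hierarchical) clustering the cut point can alternatively be chosen later from the dendrogram:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Two well-separated toy blobs
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# K-Means: the expected number of clusters must be given from the beginning
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)

# Agglomerative clustering: bottom-up, each point starts as its own
# cluster and the closest pairs are merged iteratively
agg = AgglomerativeClustering(n_clusters=2).fit(X)
print(agg.labels_)
```

On data this cleanly separated, both methods recover the same two groups; they diverge on large datasets, where hierarchical clustering's higher complexity becomes the limiting factor.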
Gaussian Mixture Models
What it is: A probabilistic model for modeling normally distributed clusters within a dataset.
Pros: 1. Computes a probability for an observation belonging to a cluster. 2. Can identify overlapping clusters. 3. More accurate results compared to K-Means.
Cons: 1. Requires complex tuning. 2. Requires setting the number of expected mixture components or clusters.
Use cases: customer segmentation; recommendation systems.
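The soft (probabilistic) assignments that distinguish a Gaussian mixture from K-Means can be sketched as follows, again with illustrative one-dimensional toy data; `n_components` is the "number of expected mixture components" the cons mention:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data drawn from two well-separated groups
X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [5.1]])

# The number of mixture components must be set in advance
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Unlike K-Means, predict_proba returns a probability per cluster,
# not just a hard label
probs = gmm.predict_proba([[0.15]])
print(probs)                 # heavily weighted toward the cluster near 0
print(gmm.predict([[5.05]]))
```

With overlapping clusters, those probabilities stay spread across components instead of collapsing to 0 or 1, which is where the model adds value over K-Means.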
Association
What it is: A rule-based approach that identifies the most frequent itemsets in a given dataset, using prior knowledge of frequent itemset properties.
Pros: exhaustive, as it finds all rules that satisfy the confidence and support thresholds.
Cons: computationally and memory intensive; results in many overlapping itemsets.
Use cases: product placements; promotion optimization.
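The support and confidence measures that association rules are built on can be sketched without any library; the transaction data below is a hypothetical set of shopping baskets:

```python
from itertools import combinations

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 2 of 4 baskets -> 0.5
print(confidence({"bread"}, {"milk"}))  # 2 of 3 bread-baskets -> ~0.67

# Exhaustively enumerate all item pairs meeting a minimum support of 0.5;
# this brute-force scan is what makes the approach memory intensive at scale
items = set().union(*transactions)
frequent_pairs = [set(p) for p in combinations(sorted(items), 2)
                  if support(set(p)) >= 0.5]
print(frequent_pairs)
```

Real implementations prune this enumeration using the fact that every subset of a frequent itemset must itself be frequent, rather than scanning all combinations.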