
Machine Learning Algorithms Cheat Sheet

1. Linear Regression

**Overview**: Linear Regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables.

**Key Hyperparameters**:

- `fit_intercept`: Whether to calculate the intercept for the model. Default is `True`.
- `normalize`: Previously normalized the regressors X before regression (default `False`); removed in scikit-learn 1.2. Scale features with `StandardScaler` instead (see the pipeline sketch after the example code).

**Example Code**:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_regression

# Example data (synthetic, for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization (`normalize` was removed in scikit-learn 1.2; see the sketch below)
lr = LinearRegression(fit_intercept=True)

# Model fitting
lr.fit(X_train, y_train)

# Predictions
y_pred = lr.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
```
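
The `normalize` argument shown in older tutorials no longer exists in current scikit-learn. A minimal sketch of the modern equivalent, reusing `X_train`, `X_test`, `y_train`, and `y_test` from above, is to standardize inside a pipeline:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Standardize features, then fit the linear model as one estimator
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
print(f'Test R^2: {pipe.score(X_test, y_test)}')
```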

2. Logistic Regression

**Overview**: Logistic Regression is used for binary classification problems. It models the probability of a binary outcome using a logistic function.

**Key Hyperparameters**:

- `penalty`: Specifies the norm used in the penalization (`'l1'`, `'l2'`, `'elasticnet'`, or `None`; recent scikit-learn versions use `None` rather than the string `'none'`).
- `C`: Inverse of regularization strength; smaller values specify stronger regularization.
- `solver`: Algorithm to use in the optimization problem (`'newton-cg'`, `'lbfgs'`, `'liblinear'`, `'sag'`, `'saga'`).

**Example Code**:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Example data (synthetic binary classification, for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization
log_reg = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)

# Model fitting
log_reg.fit(X_train, y_train)

# Predictions
y_pred = log_reg.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
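
Since logistic regression models a probability, `predict_proba` is often more useful than the hard labels from `predict`. A short sketch reusing `log_reg` and `X_test` from above (the 0.7 threshold is an arbitrary illustration):

```python
# Probability estimates: one column per class, each row sums to 1
proba = log_reg.predict_proba(X_test)
print(proba[:5])

# Applying a stricter decision threshold than the default 0.5
y_pred_strict = (proba[:, 1] >= 0.7).astype(int)
print(f'Positives at 0.7 threshold: {y_pred_strict.sum()}')
```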

3. Decision Tree

**Overview**: Decision Tree is a non-parametric supervised learning method used for classification and regression.

**Key Hyperparameters**:

- `criterion`: The function to measure the quality of a split (`'gini'` for Gini impurity, `'entropy'` for information gain).
- `max_depth`: The maximum depth of the tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node.

**Example Code**:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Example data (synthetic binary classification, for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization
dt = DecisionTreeClassifier(criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1)

# Model fitting
dt.fit(X_train, y_train)

# Predictions
y_pred = dt.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
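
The hyperparameters above are usually tuned rather than set by hand. A brief sketch using `GridSearchCV` over illustrative values, reusing `X_train` and `y_train` from above:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search over the hyperparameters listed above (grid values are illustrative)
param_grid = {
    'max_depth': [3, 5, None],
    'min_samples_split': [2, 10],
    'min_samples_leaf': [1, 5],
}
search = GridSearchCV(DecisionTreeClassifier(criterion='gini'), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```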

4. Random Forest

**Overview**: Random Forest is an ensemble method that combines multiple decision trees to improve classification or regression results.

**Key Hyperparameters**:

- `n_estimators`: The number of trees in the forest.
- `criterion`: The function to measure the quality of a split (`'gini'`, `'entropy'`).
- `max_depth`: The maximum depth of the tree.
- `min_samples_split`: The minimum number of samples required to split an internal node.
- `min_samples_leaf`: The minimum number of samples required to be at a leaf node.

**Example Code**:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Example data (synthetic binary classification, for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization
rf = RandomForestClassifier(n_estimators=100, criterion='gini', max_depth=None,
                            min_samples_split=2, min_samples_leaf=1)

# Model fitting
rf.fit(X_train, y_train)

# Predictions
y_pred = rf.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
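
After fitting, a random forest exposes impurity-based feature importances, which can help interpret the model. A small sketch reusing `rf` from above:

```python
import numpy as np

# Impurity-based importances: one value per feature, summing to 1
importances = rf.feature_importances_
for i in np.argsort(importances)[::-1]:
    print(f'feature {i}: {importances[i]:.3f}')
```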

5. AdaBoost

**Overview**: AdaBoost is an ensemble method that combines multiple weak classifiers to create a strong classifier.

**Key Hyperparameters**:

- `n_estimators`: The maximum number of estimators at which boosting is terminated.
- `learning_rate`: Weight applied to each classifier at each boosting iteration.
- `estimator`: The base estimator from which the boosted ensemble is built (e.g., `DecisionTreeClassifier`); called `base_estimator` before scikit-learn 1.2.

**Example Code**:

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Example data (synthetic binary classification, for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization (`estimator` was called `base_estimator` before scikit-learn 1.2)
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50,
                         learning_rate=1.0)

# Model fitting
ada.fit(X_train, y_train)

# Predictions
y_pred = ada.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
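
`staged_predict` yields the ensemble's predictions after each boosting iteration, which shows how accuracy evolves as estimators are added. A sketch reusing `ada`, `X_test`, and `y_test` from above:

```python
# Accuracy after each boosting iteration, sampled every 10 estimators
for i, y_stage in enumerate(ada.staged_predict(X_test), start=1):
    if i % 10 == 0:
        print(f'{i} estimators: accuracy = {accuracy_score(y_test, y_stage):.3f}')
```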

6. K-Nearest Neighbors (KNN)

**Overview**: KNN is a non-parametric method used for classification and regression by finding the k most similar instances in the training data.

**Key Hyperparameters**:

- `n_neighbors`: Number of neighbors to use.
- `weights`: Weight function used in prediction (`'uniform'`, `'distance'`).
- `algorithm`: Algorithm used to compute the nearest neighbors (`'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`).

**Example Code**:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Example data (synthetic binary classification, for illustration)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model initialization
knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto')

# Model fitting
knn.fit(X_train, y_train)

# Predictions
y_pred = knn.predict(X_test)

# Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
```
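
Because KNN is distance-based, features on very different scales can dominate the neighbor search. A minimal sketch that standardizes features first, reusing the same train/test split:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Scale features before computing neighbor distances
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)
print(f'Scaled-KNN accuracy: {knn_scaled.score(X_test, y_test)}')
```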
