ML Theory
2. **Identify outliers:**
- Use statistical methods such as z-scores or IQR (Interquartile Range) to identify outliers.
- Remove, cap, or transform outliers, depending on whether they are data-entry errors or genuine
extreme values.
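The two detection rules above can be sketched with the Python standard library. This is a minimal illustration, not a definitive implementation: the function names, the thresholds (3.0 for z-scores, 1.5 for the IQR multiplier), and the sample data are assumptions chosen for the example.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold (assumed default 3.0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs((v - mean) / std) > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed default k=1.5)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: nine ordinary values and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 13, 12, 95]

print(iqr_outliers(data))                    # [95]
print(zscore_outliers(data, threshold=2.0))  # [95]
print(zscore_outliers(data))                 # [] with the default threshold of 3.0
```

In a sample this small, the extreme value inflates the standard deviation enough that its own z-score stays just under 3, so the z-score rule misses it at the default threshold while the IQR rule still flags it. This is one reason the IQR rule is often preferred for small or skewed samples.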
**Definition of Terms:**
- **R-squared (R2):** A measure of how well the independent variables explain the variability in the
dependent variable. R2 values typically range from 0 to 1, where 1 indicates a perfect fit; on
held-out data, R2 can be negative when the model fits worse than always predicting the mean.
- **Root Mean Squared Error (RMSE):** The square root of the average of the squared differences
between predicted and actual values. It is expressed in the units of the dependent variable and,
when the residuals have zero mean, equals their standard deviation.
- **Mean Absolute Error (MAE):** The average of the absolute differences between predicted and actual
values. It is less sensitive to outliers than RMSE because errors are not squared.
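The three definitions above translate directly into code. The sketch below computes them from first principles; the function name `regression_metrics` and the sample values are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, and MAE from paired actual and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)             # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot                            # undefined if y_true is constant

    return {"r2": r2, "rmse": rmse, "mae": mae}

# Hypothetical values: residuals are 1, 0, -1, 0.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)  # r2 = 0.9, rmse is about 0.707, mae = 0.5
```

Here RMSE (about 0.707) is larger than MAE (0.5) because squaring gives the two non-zero errors more weight, which is the same mechanism that makes RMSE more sensitive to outliers.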
**3. Implementation:**
- **K-Nearest Neighbors (KNN):**
- Train the KNN classifier on the training set.
- Tune hyperparameters, such as the number of neighbors (k).
- Evaluate the model on the testing set.
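As a rough sketch of the KNN steps above, the following uses only the standard library so the mechanics stay visible; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used. The toy data, the candidate values of k, and the use of a separate validation split for tuning are assumptions of this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def knn_accuracy(train_X, train_y, eval_X, eval_y, k):
    """Fraction of evaluation points whose predicted label matches the true label."""
    correct = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(eval_X, eval_y))
    return correct / len(eval_y)

# Hypothetical toy data: two well-separated classes in two dimensions.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["a", "a", "a", "b", "b", "b"]
val_X, val_y = [(2, 2), (6, 5)], ["a", "b"]      # used only to choose k
test_X, test_y = [(0, 1), (7, 7)], ["a", "b"]    # used only for the final evaluation

# Tune k on the validation split (odd values avoid ties between two classes).
scores = {k: knn_accuracy(train_X, train_y, val_X, val_y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

print(scores)
print("test accuracy:", knn_accuracy(train_X, train_y, test_X, test_y, best_k))
```

Keeping the test set out of the tuning loop means the final accuracy is an estimate on data that played no part in choosing k.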
**5. Interpretation:**
- **Accuracy:** The percentage of correctly classified instances.
- **Precision:** The ratio of true positives to the sum of true positives and false positives. It
measures the accuracy of positive predictions.
- **Recall (Sensitivity):** The ratio of true positives to the sum of true positives and false negatives.
It measures the ability of the model to capture all the relevant instances.
- **F1 Score:** The harmonic mean of precision and recall, providing a balance between the two.
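- **Confusion Matrix:** A table of actual versus predicted class counts (true positives, false
positives, true negatives, and false negatives) that summarizes the performance of a classification
algorithm.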
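The metrics above all derive from the four confusion-matrix counts. The sketch below shows this for a binary problem; the function name `binary_report`, the label encoding (1 = positive), and the sample predictions are hypothetical.

```python
def binary_report(y_true, y_pred, positive=1):
    """Derive the confusion-matrix counts and the metrics built on them."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {
        # Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

report = binary_report(y_true, y_pred)
print(report)  # TP=3, FP=2, FN=1, TN=4
```

With these counts, accuracy is 0.7, precision is 3/5 = 0.6, recall is 3/4 = 0.75, and F1 is about 0.667, which illustrates how a single accuracy figure can hide the difference between the two error types.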
**6. Conclusion:**
- Compare the performance of K-Nearest Neighbors and Support Vector Machine.
- Choose the model with the better overall performance on the selected metrics.
Practical 3: Build neural networks
The steps are broken down below, with brief explanations of the terms:
**6. Interpretation:**
- Analyze the results of the confusion matrix, accuracy, error rate, precision, and recall to evaluate
the performance of the KNN model.
**Note:** Implementation specifics, such as the choice of k in KNN, may vary based on the dataset
and problem. Additionally, libraries like scikit-learn in Python provide functions to compute these
metrics.
Practical: K-Means clustering on sales data
This practical implements K-Means clustering and hierarchical clustering, using the elbow method to
choose the number of clusters:
**6. Interpretation:**
- Analyze the clusters obtained from K-Means and hierarchical clustering.
- Understand the characteristics of each cluster and how well they represent distinct groups in the
data.
**Note:** The actual implementation details may vary based on the programming language and
libraries used (e.g., Python with scikit-learn for K-Means and SciPy for hierarchical clustering). The
choice of features and pre-processing steps will depend on the characteristics of the dataset.
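To make the elbow method concrete, here is a minimal standard-library sketch of K-Means that records the within-cluster sum of squares (inertia) for several values of k. It covers only the K-Means part. The deterministic farthest-point initialization, the "largest change in slope" rule for picking the elbow, and the toy points are assumptions of this sketch; library implementations such as scikit-learn's `KMeans` use randomized initialization instead.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centers(points, k):
    """Deterministic farthest-point initialization (an assumption of this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns the final centers and the inertia."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, inertia

# Hypothetical 2-D points forming three well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 10), (1, 11), (2, 10), (2, 11)]

ks = list(range(1, 5))
inertias = [kmeans(points, k)[1] for k in ks]

# Heuristic elbow: the k where the drop in inertia slows down the most.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]
best_i = max(range(1, len(drops)), key=lambda i: drops[i - 1] - drops[i])
elbow_k = ks[best_i]

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 2))
print("elbow at k =", elbow_k)
```

On this toy data the inertia falls sharply up to k = 3 and barely changes afterwards, so the heuristic selects three clusters. On real sales data the bend is usually less pronounced and is often judged from a plot of inertia against k.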