Machine Learning Project Report: Regression & Classification
The main difference between Ridge Regression and LASSO Linear Regression lies in their regularization techniques. Ridge Regression uses L2 regularization, penalizing the sum of squared coefficients, which shrinks coefficient values but never to zero, thus retaining all features. LASSO Regression uses L1 regularization, penalizing the sum of absolute coefficients, which can shrink coefficients to zero, performing feature selection by excluding non-informative variables. These differences affect performance: Ridge is better for datasets with multicollinearity and when preserving all features is necessary, while LASSO is preferred for models needing interpretability with a subset of predictors.
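As a minimal sketch of this difference (using synthetic data and illustrative alpha values, not the project's dataset), Lasso's L1 penalty typically drives the coefficients of uninformative features exactly to zero, while Ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: only the first 2 of 10 features are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks, keeps all features
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can zero out coefficients

n_zero_ridge = int(np.sum(np.isclose(ridge.coef_, 0.0)))
n_zero_lasso = int(np.sum(np.isclose(lasso.coef_, 0.0)))
```

Here Lasso excludes the eight noise features by setting their coefficients to exactly zero, while every Ridge coefficient remains small but nonzero.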
Supervised machine learning algorithms use datasets that pair inputs with correct outputs, allowing them to learn a mapping from inputs to outputs. This mapping is then used to predict outputs for new inputs similar to the training data. In contrast, unsupervised learning works with data that has no labeled responses, focusing on discovering patterns or clustering data based on similarities and differences. This fundamental difference means supervised learning is typically used for prediction and estimation tasks, whereas unsupervised learning is used for exploratory data analysis.
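The contrast can be sketched with scikit-learn; the two-blob data and model choices here are illustrative, not taken from the project:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs of 2-D points
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(3, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels exist only for the supervised case

# Supervised: fit on (inputs, correct outputs), then predict for a new input
clf = LogisticRegression().fit(X, y)
pred = clf.predict([[3.1, 2.9]])

# Unsupervised: only inputs; the algorithm discovers the two clusters itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```

The supervised model returns a class label learned from the paired outputs, while KMeans assigns cluster memberships it inferred purely from the input geometry.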
Using CART for regression tasks can lead to challenges such as high variance and overfitting, evident from poor cross-validation accuracy in datasets like cetane number estimation and cooling requirement modeling. CART's tendency to create complex models capturing noise in training data reduces reliability on unseen data, leading to discrepancies between training and cross-validation results. Its zero validation accuracy in some tasks suggests it failed to generalize, highlighting the need for strategies like pruning or hybrid approaches to address overfitting and improve predictability.
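A hedged illustration of this training-versus-validation gap on synthetic data (the depth limit below stands in for pruning; none of these figures come from the report):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(150, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=150)  # noisy target

deep = DecisionTreeRegressor(random_state=0)                 # unpruned: fits noise
pruned = DecisionTreeRegressor(max_depth=3, random_state=0)  # depth limit ~ pruning

deep_train = deep.fit(X, y).score(X, y)              # R^2 on the training data
deep_cv = cross_val_score(deep, X, y, cv=10).mean()  # R^2 under cross-validation
pruned_cv = cross_val_score(pruned, X, y, cv=10).mean()
```

The unpruned tree scores a near-perfect R^2 on its own training data but drops under cross-validation, exactly the overfitting discrepancy described above; constraining depth narrows the gap.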
Standard scaling converts data to have a mean of 0 and a standard deviation of 1, aligning features on a similar scale, thus helping models use features without bias due to scale variations, improving convergence rates and interpretation. PCA complements this by reducing data dimensions while preserving variance, making models less memory-intensive and improving computational efficiency. Together, they prepare datasets of varying scales and dimensions to improve model performance, mitigate risks of overfitting, and enhance model robustness in data interpretation.
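A minimal sketch of the two steps chained together, assuming synthetic features on deliberately mismatched scales:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three features on very different scales (e.g. metres vs. milligrams)
X = np.column_stack([rng.normal(0, 1, 500),
                     rng.normal(0, 1000, 500),
                     rng.normal(0, 0.01, 500)])

Xs = StandardScaler().fit_transform(X)  # each feature: mean 0, std 1
pca = PCA(n_components=2).fit(Xs)       # keep 2 of 3 dimensions
ratios = pca.explained_variance_ratio_  # variance retained, in descending order
```

Without the scaling step, the second feature's large raw variance would dominate the principal components; after scaling, each feature contributes on equal footing.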
Principal Component Analysis (PCA) enhances data processing by reducing the dimensionality of data while retaining as much variance as possible. It simplifies data by transforming it into principal components arranged in descending order of information content, making models more efficient and less prone to overfitting. PCA is particularly useful for high-dimensional data because it highlights the underlying data structure and reduces computational costs by focusing on the most informative features without losing critical information.
The Naïve Bayes classifier is considered a strong benchmark in medical diagnosis because of its simplicity, efficiency, and ability to produce results comparable or superior to those of more complex algorithms across a variety of diagnostic tasks. It outperformed other algorithms in five out of eight medical diagnoses, making it a reliable baseline before applying advanced algorithms. Recent advancements have produced more specialized branches of the Naïve Bayes algorithm, further enhancing its effectiveness while maintaining its core advantages of simplicity and efficiency.
Ridge Regression is preferred over Linear Regression for predicting power load because it includes an L2 regularization term that penalizes large coefficients, which tend to indicate overfitting. By shrinking coefficients, Ridge Regression reduces model complexity, leading to improved generalization on unseen data. In the given dataset, Ridge Regression showed a better generalization capability with a lower cross-validation mean squared error (4149.7) than Linear Regression (4387.6), hence considered more reliable for predictions.
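The comparison can be reproduced in outline as follows; the data here are synthetic, nearly collinear stand-ins, so the MSE values will not match the report's 4149.7 and 4387.6:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in for the power-load data: five nearly collinear predictors
base = rng.normal(size=(60, 1))
X = base @ np.ones((1, 5)) + rng.normal(scale=0.01, size=(60, 5))
y = base[:, 0] + rng.normal(scale=0.5, size=60)

# Negate because sklearn reports errors as negative scores
lin_mse = -cross_val_score(LinearRegression(), X, y, cv=10,
                           scoring="neg_mean_squared_error").mean()
ridge_mse = -cross_val_score(Ridge(alpha=1.0), X, y, cv=10,
                             scoring="neg_mean_squared_error").mean()
```

With collinear predictors, OLS coefficient estimates become unstable and inflate the cross-validation error, while the L2 penalty keeps the Ridge coefficients small and the model's out-of-sample error lower, which is the pattern the report observed.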
For modeling the need for cooling in an H2O2 process, the SVM model was chosen for its superior 10-fold cross-validation accuracy of 76.312. Although Elastic Net Regression achieved a higher training accuracy of 84.232, its cross-validation accuracy fell below that of the SVM, a sign of overfitting. The trade-off between model complexity and generalizability therefore favored SVM, suggesting a more balanced model for unseen data.
In thyroid classification, Naïve Bayes achieves comparable accuracy to SVM and KNN due to its ability to independently assess the contribution of each feature to the likelihood of each class. It manages class probabilities using Bayes' theorem, computing the likelihood of a data point belonging to each class, factoring in feature independence, which simplifies complex relationships into manageable probabilities. In this specific instance, the model's flexibility in handling diverse feature types and computation efficiency makes it competitive, achieving a cross-validation accuracy of 93.52%.
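A hedged sketch of this mechanism with scikit-learn's GaussianNB (synthetic two-class data standing in for the thyroid features; the 93.52% figure is specific to the report's dataset and will not be reproduced here):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Two classes whose four features differ in mean, treated as independent
X = np.vstack([rng.normal(0, 1, size=(100, 4)),
               rng.normal(2, 1, size=(100, 4))])
y = np.array([0] * 100 + [1] * 100)

nb = GaussianNB().fit(X, y)
proba = nb.predict_proba(X[:1])  # per-class probabilities via Bayes' theorem
cv_acc = cross_val_score(nb, X, y, cv=10).mean()
```

`predict_proba` exposes exactly the quantity described above: each feature's likelihood contribution is computed independently, multiplied under the independence assumption, and combined with the class prior to give a probability per class.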
The k-Nearest-Neighbour (KNN) algorithm classifies a new data point by computing the distances from that point to every data point in the training set. It then selects the 'k' closest points (neighbours) and assigns the class most common among those neighbours to the new point. The parameter 'k' is crucial as it determines how many neighbours are considered: a smaller 'k' captures more local patterns but is sensitive to noise, while a larger 'k' smooths the decision but may misrepresent the local context, influencing the classification decision.
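The steps above can be sketched directly (the tiny training set below is illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest neighbours
    votes = Counter(y_train[nearest])                # count class labels among them
    return votes.most_common(1)[0][0]                # most common class wins

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array([0, 0, 1, 1])
label = knn_classify(X_train, y_train, np.array([0.95, 0.9]), k=3)
```

For the query point (0.95, 0.9), the three nearest neighbours are both class-1 points plus one class-0 point, so the majority vote assigns class 1; with a different 'k' the vote, and hence the decision, can change.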