ML Theory

Uploaded by

Shweta Bagade

Practical 1 – uber ride

Sure, let's break down each task:

1. Pre-process the dataset:

- Handle missing values: Remove or impute missing values in the dataset.
- Convert categorical variables: Convert categorical variables (like pickup point, drop-off location)
into numerical representations using techniques like one-hot encoding.
- Normalize/Scale: Normalize or scale numerical features to ensure they are on a similar scale.
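The three pre-processing steps above can be sketched with pandas. This is a minimal illustration on a made-up table: the column names (`distance_km`, `passengers`, `pickup_point`) and the choice of median/mode imputation are assumptions, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical ride data; column names and values are illustrative only.
rides = pd.DataFrame({
    "distance_km": [2.5, None, 7.1, 3.3],
    "passengers": [1, 2, 1, 4],
    "pickup_point": ["Airport", "City", None, "City"],
})

# Handle missing values: median for numeric, most frequent value for categorical.
rides["distance_km"] = rides["distance_km"].fillna(rides["distance_km"].median())
rides["pickup_point"] = rides["pickup_point"].fillna(rides["pickup_point"].mode()[0])

# Convert the categorical column into one-hot indicator columns.
rides = pd.get_dummies(rides, columns=["pickup_point"])

# Min-max scale the numeric columns to the range [0, 1].
for col in ["distance_km", "passengers"]:
    col_min, col_max = rides[col].min(), rides[col].max()
    rides[col] = (rides[col] - col_min) / (col_max - col_min)

print(rides)
```

In a real workflow the imputation and scaling statistics would normally be computed on the training split only and then reused on the test split.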

2. **Identify outliers:**
- Use statistical methods such as z-scores or IQR (Interquartile Range) to identify outliers.
- Remove or handle outliers appropriately, depending on the nature of the data and the outliers.
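As a sketch of the IQR approach, the helper below flags values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR. The function name `iqr_outlier_mask`, the multiplier default of 1.5, and the fare values are assumptions chosen for illustration.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

fares = np.array([8.0, 9.5, 10.0, 11.0, 12.5, 250.0])  # made-up fares
mask = iqr_outlier_mask(fares)
cleaned = fares[~mask]  # keep only the values inside the fences
print(mask, cleaned)
```

Whether to drop, cap, or keep the flagged rows remains a judgment call that depends on why the extreme values occur.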

3. Check the correlation:

- Use correlation matrices to understand the linear relationship between variables.
- Identify highly correlated features and consider removing one of them to avoid multicollinearity.
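One possible way to carry out this check with pandas is shown below. The synthetic columns and the 0.9 cut-off are assumptions for illustration; `distance_miles` is deliberately a rescaled copy of `distance_km` so that one highly correlated pair exists.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
distance = rng.uniform(1, 20, size=200)
df = pd.DataFrame({
    "distance_km": distance,
    "distance_miles": distance * 0.621371,  # same information in other units
    "passengers": rng.integers(1, 5, size=200),
})

corr = df.corr()  # Pearson correlation matrix

threshold = 0.9  # assumed cut-off for "highly correlated"
high_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(high_pairs)
```

Each reported pair is a candidate for dropping one of its two features.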

4. Implement linear regression and random forest regression models:

- Split the dataset into training and testing sets.
- Train a linear regression model and a random forest regression model on the training set.
- Use libraries like scikit-learn in Python for implementation.

5. Evaluate the models and compare their respective scores:

- Use metrics like R-squared (R2), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) to
evaluate model performance.
- For R-squared, a higher value indicates a better fit. For RMSE and MAE, lower values are desirable.
- Compare the performance of the linear regression and random forest regression models to choose
the one with better predictive capabilities.
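Steps 4 and 5 can be sketched together with scikit-learn. The data here is synthetic (a noisy linear target), so the numbers say nothing about the real ride data; the split ratio, `n_estimators`, and random seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two features and a noisy linear target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(500, 2))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 1.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": mean_absolute_error(y_test, pred),
    }

for name, s in scores.items():
    print(f"{name}: R2={s['r2']:.3f} RMSE={s['rmse']:.3f} MAE={s['mae']:.3f}")
```

Because the synthetic target is linear by construction, linear regression is expected to do well here; on real data either model may come out ahead.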

**Definition of Terms:**
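- **R-squared (R2):** A measure of how well the independent variables explain the variability in the
dependent variable. A value of 1 indicates a perfect fit and 0 indicates a model no better than always predicting the mean; on held-out data R2 can be negative when the model does worse than that.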

- **Root Mean Squared Error (RMSE):** The square root of the average of the squared differences
between predicted and actual values. It is expressed in the same units as the target and equals the standard deviation of the residuals when their mean is zero.

- **Mean Absolute Error (MAE):** The average absolute differences between predicted and actual
values. It is less sensitive to outliers compared to RMSE.

- **Linear Regression:** A linear approach to modeling the relationship between a dependent
variable and one or more independent variables by fitting a linear equation to the observed data.

- **Random Forest Regression:** An ensemble learning method that constructs a multitude of
decision trees during training and outputs the average prediction of the individual trees for regression
problems.
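
For reference, the three metrics defined above can be written as formulas. The notation is an assumption introduced here: $n$ is the number of samples, $y_i$ the actual value, $\hat{y}_i$ the predicted value, and $\bar{y}$ the mean of the actual values.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```

Squaring the errors in RMSE weights large errors more heavily, which is why MAE is less sensitive to outliers.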
Practical 2 – email
Certainly, let's go through the steps:

1. Definition of Terms:

- **Binary Classification:** A type of classification task where the goal is to categorize items into
two classes or groups (e.g., spam or not spam).
- **K-Nearest Neighbors (KNN):** A supervised machine learning algorithm for classification that
classifies a data point based on the majority class of its k-nearest neighbors.
- **Support Vector Machine (SVM):** A supervised machine learning algorithm that builds a
hyperplane to separate data into different classes, maximizing the margin between them.

2. Email Spam Detection:

- **Dataset Preparation:** Split the dataset into training and testing sets.
- **Feature Extraction:** Extract relevant features from emails, such as word frequency, presence
of certain keywords, etc.

**3. Implementation:**
- **K-Nearest Neighbors (KNN):**
- Train the KNN classifier on the training set.
- Tune hyperparameters, such as the number of neighbors (k).
- Evaluate the model on the testing set.

- Support Vector Machine (SVM):

- Train the SVM classifier on the training set.
- Tune hyperparameters, such as the choice of kernel and regularization parameters.
- Evaluate the model on the testing set.
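
A minimal sketch of the two classifiers with scikit-learn follows. It uses synthetic features from `make_classification` in place of extracted email features, and the values `n_neighbors=5`, `kernel="rbf"`, and `C=1.0` are assumed starting points rather than tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary data standing in for numeric email features.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both models are distance/margin based, so each is paired with a scaler.
classifiers = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

test_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    test_accuracy[name] = clf.score(X_test, y_test)  # mean accuracy on the test set

print(test_accuracy)
```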

4. Performance Analysis:

- **Metrics:**
- Use metrics like accuracy, precision, recall, F1 score, and confusion matrix to analyze the
performance of each model.
- **Cross-Validation:**
- Perform cross-validation to ensure the robustness of the models.
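
Cross-validation could look like the sketch below, which scores a KNN classifier over 5 folds. The synthetic data, the fold count, and the use of F1 as the scoring metric are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# Each fold is held out once while the model trains on the remaining folds.
cv_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, cv=5, scoring="f1"
)
print(f"F1 per fold: {cv_scores}")
print(f"mean={cv_scores.mean():.3f} std={cv_scores.std():.3f}")
```

A small spread across folds suggests the score does not depend heavily on one particular split.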

**5. Interpretation:**
- **Accuracy:** The percentage of correctly classified instances.
- **Precision:** The ratio of true positives to the sum of true positives and false positives. It
measures the accuracy of positive predictions.
- **Recall (Sensitivity):** The ratio of true positives to the sum of true positives and false negatives.
It measures the ability of the model to capture all the relevant instances.
- **F1 Score:** The harmonic mean of precision and recall, providing a balance between the two.
- **Confusion Matrix:** A table that summarizes the performance of a classification algorithm.
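
To make these definitions concrete, the sketch below computes them with scikit-learn on ten made-up labels (1 = spam, 0 = not spam). The labels are invented so that the counts are easy to check by hand: 3 true positives, 1 false negative, 1 false positive, and 5 true negatives.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # made-up predictions

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

print(cm)
print(accuracy, precision, recall, f1)
```

With these labels, accuracy is 8/10 and precision, recall, and F1 are each 3/4.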

**6. Conclusion:**
- Compare the performance of K-Nearest Neighbors and Support Vector Machine.
- Choose the model with better overall performance based on the chosen metrics.
Practical 3 – Build neural networks

1. Definition of Terms:

- **Neural Network:** A computational model composed of layers of interconnected nodes (neurons) that
can learn patterns and relationships in data.
- **Classifier:** A model that assigns a label or category to input data.
- **Normalization:** Scaling features to a standard range, often between 0 and 1, to ensure consistent and
effective learning.

2. Reading the Dataset:

- Utilize a library like pandas to read and load the dataset.

3. Distinguish Feature and Target Set:

- Identify features (CreditScore, Geography, Gender, Age, Tenure, Balance, etc.).
- Define the target variable, indicating whether the customer will leave or not in the next 6 months.

4. Divide the Dataset:

- Split the dataset into training and testing sets using tools like scikit-learn.

5. Normalize the Data:

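- Normalize both the training and testing sets to bring features to a standard scale. Fit the scaling parameters on the training set only and apply them unchanged to the testing set, so that no test information leaks into training.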
- Common methods include Min-Max scaling or Standardization.
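
A short sketch of standardization with scikit-learn is given below. The two synthetic columns loosely imitate a credit score and an age; those values and the split ratio are assumptions. The scaler learns its mean and standard deviation from the training split and reuses them on the test split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two made-up numeric features on very different scales.
X = rng.normal(loc=[650, 40], scale=[100, 10], size=(200, 2))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```

Replacing `StandardScaler` with `MinMaxScaler` gives Min-Max scaling with the same fit/transform pattern.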

6. Build the Neural Network Model:

- Use a deep learning framework like TensorFlow or PyTorch.
- Define the architecture, specifying the number of layers, nodes, and activation functions.
- Compile the model with an appropriate loss function (e.g., binary cross-entropy for binary classification)
and optimizer.
- Train the model on the training set, specifying the number of epochs and batch size.
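
As a lightweight stand-in for a TensorFlow or PyTorch model, the sketch below uses scikit-learn's `MLPClassifier`, which trains a feed-forward network with a cross-entropy (log-loss) objective. This is a substitution made for brevity, not the framework named above. The layer sizes, batch size, iteration limit, and synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the customer features.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)

model = MLPClassifier(
    hidden_layer_sizes=(16, 8),  # two hidden layers with 16 and 8 nodes
    activation="relu",
    solver="adam",
    batch_size=32,
    max_iter=200,                # upper bound on training epochs
    early_stopping=True,         # hold out part of the training data to stop early
    random_state=0,
)
model.fit(scaler.transform(X_train), y_train)

nn_accuracy = model.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {nn_accuracy:.3f}")
```

The same structure (define layers, choose loss and optimizer, fit for a number of epochs with a batch size) carries over to Keras or PyTorch with their own APIs.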

7. Identify Points of Improvement:

- Monitor the training process for signs of overfitting or underfitting.
- Experiment with hyperparameters (learning rate, batch size, number of layers) and consider techniques like
dropout or regularization to improve generalization.

8. Evaluate the Model:

- Assess the model's performance on the testing set using metrics such as accuracy, precision, recall, and F1
score.

9. Print Accuracy Score and Confusion Matrix:

- Calculate and print the accuracy score of the model on the test set.
- Generate and print the confusion matrix to evaluate true positives, true negatives, false positives, and false
negatives.

10. Iterative Improvement:

- Based on evaluation results, make iterative improvements to the model, adjusting hyperparameters or
model architecture.
Practical 4 – k nearest neighbours on diabetes

Certainly, let's break down the steps and provide brief explanations of the terms:

1. Definition of Terms:

- **K-Nearest Neighbors (KNN):** A supervised machine learning algorithm used for classification.
It classifies a data point based on the majority class of its k-nearest neighbors.
- **Confusion Matrix:** A table used to evaluate the performance of a classification algorithm. It
shows the counts of true positives, true negatives, false positives, and false negatives.
- **Accuracy:** The ratio of correctly predicted instances to the total instances.
- **Error Rate:** The ratio of incorrectly predicted instances to the total instances.
- **Precision:** The ratio of true positives to the sum of true positives and false positives. It
measures the accuracy of positive predictions.
- **Recall (Sensitivity):** The ratio of true positives to the sum of true positives and false negatives.
It measures the ability of the model to capture all the relevant instances.

2. Implementing K-Nearest Neighbors on diabetes.csv:

- Read the 'diabetes.csv' dataset.
- Separate features (independent variables) and the target variable (dependent variable).
- Split the dataset into training and testing sets.

3. Normalize the Data:

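- Normalize the feature values to ensure they are on a similar scale. This step is essential for KNN because it relies on distances between points, so features with larger ranges would otherwise dominate.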

4. Train and Predict:

- Train the KNN model on the training set.
- Predict the target variable on the testing set.

5. Compute Metrics:

- Use the predictions and actual values to calculate:
- Confusion matrix: Counts of true positives, true negatives, false positives, and false negatives.
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Error Rate: (FP + FN) / (TP + TN + FP + FN)
- Precision: TP / (TP + FP)
- Recall: TP / (TP + FN)
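
The formulas above translate directly into a small helper. The function name `classification_metrics` and the example counts are hypothetical; the guard against a zero denominator is an added assumption for the case where no positives are predicted or present.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, error rate, precision and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Made-up counts for illustration.
metrics = classification_metrics(tp=40, tn=90, fp=10, fn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy and error rate always sum to 1, which is a quick sanity check on the counts.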

**6. Interpretation:**
- Analyze the results of the confusion matrix, accuracy, error rate, precision, and recall to evaluate
the performance of the KNN model.

**Note:** Implementation specifics, such as the choice of k in KNN, may vary based on the dataset
and problem. Additionally, libraries like scikit-learn in Python provide functions to compute these
metrics.
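
One common way to choose k is a cross-validated grid search, sketched below with scikit-learn. The candidate values of k, the 5-fold setting, and the synthetic data (used here instead of diabetes.csv) are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)

# Scaling sits inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
candidate_k = [1, 3, 5, 7, 9, 11]
search = GridSearchCV(
    pipeline,
    param_grid={"knn__n_neighbors": candidate_k},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
print(f"best k={best_k}, mean CV accuracy={search.best_score_:.3f}")
```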
Practical K means clustering on sales data

Certainly, let's proceed with implementing K-Means clustering and hierarchical clustering using the
elbow method:

1. Definition of Terms:

- **K-Means Clustering:** A partitioning method that divides a dataset into K distinct,
non-overlapping subsets (clusters), where each data point belongs to the cluster with the nearest mean.
- **Hierarchical Clustering:** A method that builds a hierarchy of clusters. It can be agglomerative
(bottom-up) or divisive (top-down), merging or splitting clusters based on certain criteria.
- **Elbow Method:** A technique used to determine the optimal number of clusters for K-Means
clustering. It involves plotting the explained variation as a function of the number of clusters and
identifying the "elbow" point where the rate of improvement slows.

2. Implementing K-Means Clustering:

- Read the 'sales_data_sample.csv' dataset.
- Pre-process the data if necessary (handle missing values, encode categorical variables).
- Identify relevant features for clustering.
- Normalize the data if needed.
- Implement the K-Means algorithm with varying values of K.
- Use the elbow method to determine the optimal number of clusters.

3. Implementing Hierarchical Clustering:

- Similar to K-Means, preprocess the data and identify relevant features.
- Implement hierarchical clustering using an agglomerative or divisive approach.
- Use a dendrogram to visualize the hierarchical clustering structure.
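
A minimal agglomerative example with SciPy is sketched below. The two synthetic point clouds, Ward linkage, and the cut into two clusters are assumptions; `no_plot=True` is used so the dendrogram structure is computed without drawing it.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 20 points each.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(20, 2)),
])

Z = linkage(X, method="ward")                     # bottom-up merge history
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
tree = dendrogram(Z, no_plot=True)                # dendrogram layout as a dict

print(labels)
```

Dropping `no_plot=True` draws the dendrogram with matplotlib, where long vertical gaps between merges hint at a natural number of clusters.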

4. Elbow Method:

- For K-Means, run the algorithm for different values of K.
- Calculate the sum of squared distances (inertia) for each K.
- Plot the inertia values against the number of clusters (K).
- Identify the "elbow" point where the rate of decrease in inertia slows down. This point indicates a
good balance between the number of clusters and model performance.
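
The elbow procedure can be sketched with scikit-learn as follows. The blobs are synthetic (three centres by construction), and the range of K and `n_init=10` are assumptions. The code prints the inertia values instead of plotting them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

k_values = list(range(1, 9))
inertias = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to the nearest centre

# Drop in inertia gained by adding one more cluster.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]

for k, inertia in zip(k_values, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

Plotting `inertias` against `k_values` shows the curve; the elbow is where the entries of `drops` become small compared with the earlier ones.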

5. Determine the Number of Clusters:

- Based on the elbow method results, determine the optimal number of clusters for K-Means.
- For hierarchical clustering, the optimal number of clusters might be determined based on the
dendrogram.

**6. Interpretation:**
- Analyze the clusters obtained from K-Means and hierarchical clustering.
- Understand the characteristics of each cluster and how well they represent distinct groups in the
data.

**Note:** The actual implementation details may vary based on the programming language and
libraries used (e.g., Python with scikit-learn for K-Means and SciPy for hierarchical clustering). The
choice of features and pre-processing steps will depend on the characteristics of the dataset.
