W07 - Intro Basic ML
Contents
1. Introduction to Machine Learning
2. Machine Learning Core Concepts
3. Practical Examples and Demonstrations
4. Solutions
1. INTRODUCTION TO MACHINE LEARNING
1. Supervised Learning
2. Unsupervised Learning
3. Semi-supervised Learning
4. Reinforcement Learning
2. MACHINE LEARNING CORE CONCEPTS
1. Supervised Learning: Using labeled data to predict outcomes (e.g., house price prediction).
• Supervised Learning is a type of machine learning where the model is trained using labeled data.
This means that each training example is paired with an output label. The goal is for the model to
learn a mapping from inputs to outputs, so it can predict the label for new, unseen data.
Key Concepts:
• Labeled Data: Every training example is paired with the correct output label.
• Mapping: The model learns a function from input features to output labels.
• Generalization: The trained model is evaluated on new, unseen data.
• Applications: Regression and classification tasks such as house price prediction.
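A minimal illustration of the labeled-data idea above, assuming scikit-learn and using its bundled Iris dataset as the labeled examples; the choice of classifier is illustrative.

```python
# Minimal supervised-learning example: learn a mapping from labeled
# measurements to species labels, then predict on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)             # features paired with labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)   # learn from labeled data
print("Accuracy on unseen data:", model.score(X_test, y_test))
```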
3. Semi-supervised Learning: Uses both labeled and unlabeled data, often applied to data that is neither fully structured nor fully unstructured.
• Semi-supervised learning trains a model on a small amount of labeled data together with a larger amount of unlabeled data. It is commonly applied to semi-structured data such as HTML, JSON, and XML files, where tags or markers provide partial structure.
Key Concepts:
• Mixed Data: Combines labeled and unlabeled data.
• Flexibility: Handles data with some structure but not rigidly formatted.
• Iterative Refinement: Improves models iteratively with both data types.
• Graph-Based Models: Uses graphs to represent and learn from data relationships.
• Semi-Supervised Algorithms: Includes methods like self-training and co-training.
• Applications: Useful in NLP, information retrieval, and bioinformatics.
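A small sketch of the graph-based idea listed above, assuming scikit-learn's LabelSpreading, where unlabeled points are marked with -1; the two-moons dataset and the number of retained labels are illustrative.

```python
# Graph-based semi-supervised learning: propagate the few known labels
# to unlabeled points (-1) through a similarity graph.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelSpreading

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
labels = np.full(len(y), -1)          # start with everything unlabeled
labels[:10] = y[:10]                  # keep only 10 labeled examples

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, labels)
print("Accuracy on the originally unlabeled points:",
      (model.transduction_[10:] == y[10:]).mean())
```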
3. PRACTICAL EXAMPLES AND DEMONSTRATIONS
1. Expand the Dataset: Add five more house sizes and their corresponding prices to the dataset. Ensure the new data
points make logical sense in the context of the existing data.
2. Calculate Model Accuracy: Add code to calculate and print the Mean Absolute Error (MAE) of the model's
predictions on the test set. Provide a brief explanation of what MAE indicates about the model's performance.
3. Modify Train-Test Split: Change the test size to 30% and rerun the model training and evaluation. Observe how
this change affects the model's performance and MAE.
4. Customize the Plot: Customize the plot by changing the color of the data points to green and the regression line to
black. Add grid lines to the plot to enhance readability.
5. Implement Polynomial Regression: Modify the code to fit a polynomial regression model of degree 2 instead of a linear regression model. Visualize the new regression line and compare it with the linear regression line. Discuss the differences observed in the predictions.
3. PRACTICAL EXAMPLES AND DEMONSTRATIONS
1. Data Preparation: a. Generate a synthetic dataset simulating social network data with features such as the number
of friends, likes, and posts, b. Print the first few rows to understand its structure.
2. K-Means Clustering: a. Perform K-Means clustering on the dataset, b. Use the Elbow Method to determine the
optimal number of clusters, and c. Visualize clusters with a scatter plot, coloring data points by cluster labels.
3. Dimensionality Reduction with PCA: a. Apply Principal Component Analysis (PCA) to reduce the dataset to 2
components, and b. Visualize the dataset in 2D, coloring data points by K-Means cluster labels.
4. Hierarchical Clustering: a. Perform Hierarchical clustering on the dataset, b. Visualize the dendrogram and identify
an appropriate number of clusters, and c. Compare clusters from Hierarchical clustering with those from K-Means.
5. Anomaly Detection: a. Use the GaussianMixture model to detect anomalies in the dataset, and b. Identify and
visualize anomalies using a scatter plot.
3. PRACTICAL EXAMPLES AND DEMONSTRATIONS
Tasks:
1. Environment Setup: a. Create a simple Grid World environment where an agent can move up, down, left, or right.
b. Define the grid size, start position, goal position, and obstacles.
2. Q-Learning Implementation: a. Implement the Q-Learning algorithm for the agent to learn the optimal policy to
reach the goal. b. Define the reward structure, learning rate, and discount factor. c. Train the agent by allowing it to
interact with the environment for a specified number of episodes.
3. Policy Visualization: a. Visualize the learned policy by showing the optimal path from the start to the goal on the
grid. b. Display the Q-values for each state-action pair on the grid.
4. Performance Evaluation: a. Plot the total rewards per episode to observe the learning progress. b. Evaluate the
performance by calculating the average reward over multiple test episodes.
5. Exploration vs. Exploitation: a. Implement an epsilon-greedy strategy for the agent to balance exploration and exploitation. b. Experiment with different values of epsilon and observe the impact on the learning process.
4. SOLUTIONS
4. SOLUTIONS: SUPERVISED LEARNING
House Price Prediction Using Machine Learning (code and output)
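A minimal sketch of one way the supervised learning tasks from section 3 could be implemented with scikit-learn and matplotlib; the house sizes, prices, and random_state below are illustrative assumptions, not the dataset used on the slides.

```python
# Hypothetical house-price data (sizes in square metres, prices in $1000s).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

sizes = np.array([50, 60, 70, 80, 90, 100, 110, 120, 130, 140]).reshape(-1, 1)
prices = np.array([150, 180, 200, 240, 260, 300, 320, 360, 380, 420])

# Task 3: hold out 30% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    sizes, prices, test_size=0.3, random_state=42)

# Linear regression baseline.
linear = LinearRegression().fit(X_train, y_train)

# Task 2: MAE = average absolute difference between predicted and true prices.
print(f"Linear model MAE: {mean_absolute_error(y_test, linear.predict(X_test)):.2f}")

# Task 5: degree-2 polynomial regression for comparison.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_train, y_train)
print(f"Degree-2 model MAE: {mean_absolute_error(y_test, poly.predict(X_test)):.2f}")

# Task 4: green data points, black regression lines, grid lines.
grid = np.linspace(sizes.min(), sizes.max(), 100).reshape(-1, 1)
plt.scatter(sizes, prices, color="green", label="Houses")
plt.plot(grid, linear.predict(grid), color="black", label="Linear fit")
plt.plot(grid, poly.predict(grid), color="black", linestyle="--", label="Degree-2 fit")
plt.xlabel("Size (m²)")
plt.ylabel("Price ($1000s)")
plt.grid(True)
plt.legend()
plt.show()
```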
4. SOLUTIONS: UNSUPERVISED LEARNING
Exploring Unsupervised Learning with Social Network Data (code and output)
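A minimal sketch of the unsupervised tasks from section 3, assuming scikit-learn, SciPy, and matplotlib; the synthetic social-network features and the choice of three clusters are illustrative assumptions.

```python
# Synthetic "social network" data: number of friends, likes, and posts per user.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "friends": rng.normal(200, 50, 300).clip(0),
    "likes":   rng.normal(1000, 300, 300).clip(0),
    "posts":   rng.normal(50, 15, 300).clip(0),
})
print(df.head())

# Elbow Method: plot within-cluster inertia for k = 1..9.
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(df).inertia_
            for k in range(1, 10)]
plt.plot(range(1, 10), inertias, marker="o")
plt.xlabel("k"); plt.ylabel("Inertia"); plt.title("Elbow Method"); plt.show()

# K-Means with the chosen k (3 is an assumption for this synthetic data).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df)

# PCA to 2 components, coloured by K-Means cluster labels.
pts = PCA(n_components=2).fit_transform(df)
plt.scatter(pts[:, 0], pts[:, 1], c=labels)
plt.title("PCA projection coloured by K-Means cluster"); plt.show()

# Hierarchical clustering dendrogram (Ward linkage).
dendrogram(linkage(df.values, method="ward"))
plt.title("Hierarchical clustering dendrogram"); plt.show()

# Anomaly detection: flag the lowest-likelihood points under a Gaussian mixture.
gmm = GaussianMixture(n_components=3, random_state=0).fit(df)
scores = gmm.score_samples(df)
anomaly = scores < np.percentile(scores, 5)   # bottom 5% treated as anomalies
plt.scatter(pts[:, 0], pts[:, 1], c="grey")
plt.scatter(pts[anomaly, 0], pts[anomaly, 1], c="red", label="Anomalies")
plt.legend(); plt.title("GaussianMixture anomaly detection"); plt.show()
```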
4. SOLUTIONS: SEMI-SUPERVISED LEARNING
Exploring Semi-Supervised Learning with an Email Dataset (code and output)
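A minimal sketch of semi-supervised self-training on a toy email-like dataset, assuming scikit-learn's SelfTrainingClassifier; the messages, labels, and confidence threshold are illustrative assumptions.

```python
# Tiny synthetic "email" dataset: a few labeled spam/ham messages plus
# unlabeled ones (label -1).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

emails = [
    "win a free prize now", "cheap meds limited offer",      # spam
    "meeting agenda for monday", "lunch tomorrow?",           # ham
    "free offer click now", "project report attached",        # unlabeled
    "prize waiting claim today", "see you at the meeting",    # unlabeled
]
labels = np.array([1, 1, 0, 0, -1, -1, -1, -1])  # -1 = unlabeled

X = TfidfVectorizer().fit_transform(emails)

# Self-training: the base classifier labels confident unlabeled examples
# and is retrained on them, iterating until no confident predictions remain.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.6)
model.fit(X, labels)

print("Predicted labels:", model.predict(X))
```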
4. SOLUTIONS: REINFORCEMENT LEARNING
Exploring Reinforcement Learning with a Grid World Environment (code and output)
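A compact sketch of the Grid World tasks from section 3; the grid layout, reward values, and hyperparameters (alpha, gamma, epsilon) are illustrative assumptions. Reward-per-episode plotting and Q-value display are omitted for brevity.

```python
# Minimal Grid World + Q-Learning sketch with an epsilon-greedy policy.
import numpy as np

SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
rng = np.random.default_rng(0)

def step(state, action):
    """Apply an action; invalid moves (walls/obstacles) leave the state unchanged."""
    r, c = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in OBSTACLES:
        return state, -1.0, False          # bump penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True          # goal reward
    return (r, c), -0.1, False             # small step cost

for episode in range(500):
    state, done, steps = START, False, 0
    while not done and steps < 200:
        steps += 1
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = int(rng.integers(len(ACTIONS)))
        else:
            action = int(np.argmax(Q[state]))
        nxt, reward, done = step(state, action)
        # Q-Learning update rule.
        Q[state][action] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state][action])
        state = nxt

# Greedy rollout of the learned policy from start to goal.
state, path = START, [START]
for _ in range(50):
    state, _, done = step(state, int(np.argmax(Q[state])))
    path.append(state)
    if done:
        break
print("Learned path:", path)
```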
Thank You!
If you have any questions, please feel free to reach out!