Machine learning 2
1. Explain PCA and the common terms used in the PCA algorithm.
The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl Pearson in 1901. It works on the condition that while the data in a higher-dimensional space is mapped to data in a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximum.
• Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is one of the most widely used tools in exploratory data analysis and in building predictive machine learning models.
• Principal Component Analysis (PCA) is an unsupervised learning technique used to examine the interrelations among a set of variables. It is sometimes described as a general form of factor analysis, where regression determines a line of best fit.
• The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a
dataset while preserving the most important patterns or relationships between the variables
without any prior knowledge of the target variables.
In short, PCA reduces the dimensionality of a data set by finding a new, smaller set of variables that retains most of the sample’s information and remains useful for the regression and classification of data. The common terms and steps involved in PCA are listed below (a short code sketch is given after the list):
1. Dataset/Features:
o The data you start with is usually in the form of a matrix with rows as observations (data
points) and columns as features (variables).
o For example, if you're analyzing images, each image could be represented as a row, and the
pixel values as features.
2. Mean Centering:
o PCA starts by centering the data around the origin. This means subtracting the mean of each
feature from the data, so the new mean of each feature becomes zero.
o Centering helps ensure that PCA captures the true relationships between the features.
3. Eigenvalues and Eigenvectors:
o The covariance matrix of the centered data is computed and then decomposed to find eigenvalues and eigenvectors.
o Eigenvectors indicate the direction of maximum variance (principal components) in the data.
o Eigenvalues indicate the amount of variance explained by each eigenvector.
4. Dimensionality Reduction:
o By selecting the top k principal components (based on eigenvalues), you can reduce the
number of dimensions while retaining most of the information (variance).
o For example, instead of using all 100 original features, you might use the top 10 principal
components.
5. Projection:
o The original data is projected onto the selected principal components to obtain the transformed
data in the reduced space.
o This projection converts the data into a new coordinate system defined by the principal
components.
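The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a small synthetic data matrix and keeping the top k = 2 components; the data, the value of k, and the variable names are choices made only for the example.

import numpy as np

# Synthetic data: 100 observations, 5 features, with two features deliberately correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=100)

# 1-2. Mean centering
X_centered = X - X.mean(axis=0)

# 3. Covariance matrix and its eigen decomposition (eigh suits symmetric matrices)
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Dimensionality reduction: keep the top k principal components
k = 2
components = eigvecs[:, :k]

# 5. Projection: transform the data into the reduced space
X_reduced = X_centered @ components
print(X_reduced.shape)                        # (100, 2)
print(eigvals / eigvals.sum())                # fraction of variance explained by each component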
2. Explain Simple Linear Regression and Multiple Linear Regression with examples.
2. Data Visualization
• Purpose: Transform high-dimensional data into 2D or 3D for easier visualization and pattern
recognition.
• Example: Visualizing clusters in datasets with 10+ dimensions, such as in customer segmentation.
3. Noise Reduction
• Purpose: Eliminate less important features (noise) to improve data quality.
• Example: In signal processing, removing noise from audio or EEG signals by keeping only the
significant components.
4. Feature Extraction
• Purpose: Generate new features (principal components) that are linear combinations of the original
features.
• Example: In natural language processing, PCA is used to extract key topics from text datasets.
5. Preprocessing for Machine Learning
• Purpose: Simplify datasets and reduce multicollinearity, leading to better model performance.
• Example: Preparing datasets with many correlated features (e.g., in regression or classification
problems).
6. Image Compression
• Purpose: Compress high-resolution images by keeping the most significant principal components.
• Example: Reducing storage requirements for images while maintaining visual quality.
9. Anomaly Detection
• Purpose: Detect unusual patterns in data by reducing dimensionality and identifying outliers.
• Example: Fraud detection in credit card transactions.
3. Predictive Analytics
• Purpose: Use historical data to make predictions about future events.
• Example:
o Stock Price Prediction: Predicting future stock prices based on market trends.
o Weather Forecasting: Predicting weather conditions using past climate data.
4. Recommendation Systems
• Purpose: Suggest items based on user preferences or behavior.
• Example:
o E-commerce: Amazon recommending products based on past purchases.
o Streaming Platforms: Netflix recommending movies or TV shows based on viewing history.
5. Fraud Detection
• Purpose: Identify unusual patterns or anomalies to detect fraudulent activities.
• Example:
o Banking: Identifying fraudulent credit card transactions.
o Insurance: Detecting fake claims using anomaly detection algorithms.
6. Healthcare
• Purpose: Enhance diagnostics, treatment, and patient care.
• Example:
o Disease Prediction: Using ML to predict diseases like diabetes based on patient data.
o Drug Discovery: Accelerating the development of new drugs using deep learning.
7. Autonomous Systems
• Purpose: Enable systems to operate without human intervention.
• Example:
o Self-Driving Cars: Autonomous vehicles using ML for navigation and obstacle detection.
o Drones: Delivery drones determining optimal flight paths.
9. Robotics
• Purpose: Enhance robot perception, decision-making, and control.
• Example:
o Industrial Automation: Robots sorting items in warehouses like those used by Amazon.
o Humanoids: Robots like Sophia interacting with humans in real-time.
10. Gaming
• Purpose: Develop intelligent and adaptive game characters.
• Example:
o AI Opponents: Games like Chess or Go using AI to challenge players.
o Dynamic Difficulty Adjustment: ML algorithms adjusting the difficulty based on player
performance.
6. What do you mean by Reinforcement Learning?
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by
interacting with an environment to maximize a reward. The agent learns through trial and error, receiving
feedback in the form of rewards or penalties for its actions.
Real-World Analogy
Imagine training a dog. The dog (agent) is given commands (actions) and learns to respond based on feedback:
• A treat (reward) for good behavior.
• No treat or a mild correction (penalty) for incorrect behavior.
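To make the reward-driven loop concrete, here is a minimal tabular Q-learning sketch on a hypothetical one-dimensional corridor (five states, with a reward for reaching the right end). The environment, the number of episodes, and the hyperparameters are illustrative assumptions, not part of the notes above.

import numpy as np

# Hypothetical corridor: states 0..4; reaching state 4 ends the episode with reward +1
n_states, n_actions = 5, 2                   # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2        # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != 4:
        # Epsilon-greedy action selection: the trial-and-error part
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0      # the "treat" for good behavior
        # Q-learning update: nudge the estimate toward reward + discounted future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q)   # the learned table favours moving right in every state

After training, reading the greedy action from each row of Q gives the learned policy: always move right, toward the reward.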
7. How does Machine Learning work?
Machine learning uses a systematic approach to predict new values. Each step is important and cannot be skipped if high accuracy is to be achieved. A typical implementation follows the steps below (a short code sketch is given after the list):
• Data collection
• Data Preprocessing
• Model Training
• Model Evaluation
• Model Deployment
1. Data Collection: Data collection is an important part as discussed above. The quality of data
determines the accuracy of the predictions. We can collect datasets from APIs, websites, social
media, etc. We can also use in-built datasets that are provided in the programming languages for
learning purposes. Ethical use of data should be kept in mind. We must uphold fairness and
privacy while using these datasets to achieve our goal.
2. Data Preprocessing: Before the data is fed into a model, it is pre-processed: duplicate and missing values are removed, outliers are dealt with, and formats are standardized. This enhances the quality of the dataset and improves accuracy by dealing with possible error sources before modeling.
3. Model Training: Once the dataset is ready, we choose an algorithm suited to the problem and fit it to the data. The dataset is usually divided into two parts: a training set and a testing set. Various models are used, for example, linear regression, logistic regression, decision trees, etc. Hyperparameter tuning is also done to improve accuracy; techniques like grid search and random search are used for tuning the parameters.
4. Model Evaluation: This is a crucial step in determining whether the model works accurately or not. Metrics such as accuracy, precision, recall, F1-score, and AUC guide the assessment of model performance, and cross-validation techniques like k-fold and leave-one-out help in judging how well the model generalizes.
5. Model Deployment: In this step, the trained model is integrated into a real-world application so that it can make predictions on new data. This is the practical outcome of model building and training.
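As an illustration of steps 1–4, here is a minimal scikit-learn sketch using the built-in Iris dataset; the dataset and the choice of logistic regression are assumptions made for the example, not requirements of the workflow.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: an in-built dataset used for learning purposes
X, y = load_iris(return_X_y=True)

# 2. Data preprocessing: split into training and testing sets, then standardize
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 3. Model training
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# 4. Model evaluation
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))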
➢ The Apriori Algorithm is a classic algorithm used in market basket analysis to identify frequent
itemsets and generate association rules from large transactional datasets. It is an essential algorithm in
data mining and is used to find relationships or patterns between different items in large datasets.
➢ The name Apriori comes from the idea that the algorithm uses prior knowledge of frequent itemset
properties. The algorithm works by iteratively identifying itemsets that meet a minimum support
threshold and then generating rules that show how the presence of one item can imply the presence of
another.
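A minimal plain-Python sketch of the frequent-itemset part of Apriori follows. The four transactions and the minimum support of 2 are illustrative assumptions; rule generation from the frequent itemsets is omitted for brevity.

# Hypothetical transactional dataset
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
min_support = 2                      # an itemset must appear in at least 2 transactions

def count_support(itemset):
    # Number of transactions that contain every item of the itemset
    return sum(1 for t in transactions if itemset <= t)

# Level 1: frequent single items
items = {item for t in transactions for item in t}
frequent = [{frozenset([i]) for i in items if count_support(frozenset([i])) >= min_support}]

# Level k: candidates are unions of frequent (k-1)-itemsets, pruned by the support threshold
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if count_support(c) >= min_support})
    k += 1

for level in frequent:
    for itemset in level:
        print(set(itemset), "support =", count_support(itemset))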
o Determines the best value for K center points or centroids through an iterative process.
o Assigns each data point to its closest k-center; the data points near a particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is kept apart from the other clusters.
The sketch below illustrates the working of the K-Means Clustering Algorithm:
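This is a minimal NumPy sketch of the two alternating K-Means steps (assign points to the nearest centroid, then move each centroid to the mean of its points). The synthetic 2-D data and K = 3 are assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data drawn around three well-separated centers
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in ([0, 0], [5, 5], [0, 5])])

K = 3
centroids = X[rng.choice(len(X), K, replace=False)]      # random initial centroids

for _ in range(100):
    # Assignment step: label each point with its closest centroid
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: move each centroid to the mean of its assigned points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):             # stop once the centroids settle
        break
    centroids = new_centroids

print(centroids)   # approximately recovers the three true centers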
1. Hierarchical Clustering:
Hierarchical clustering builds a hierarchy of clusters: each data point starts as its own cluster, and the algorithm progressively merges or splits clusters based on their similarity.
Types of Hierarchical Clustering:
1. Agglomerative (Bottom-up):
o Starts with each data point as a separate cluster.
o Iteratively merges the two closest clusters until only one cluster remains or a stopping
criterion is reached.
2. Divisive (Top-down):
o Starts with all data points in a single cluster.
o Iteratively splits the clusters into smaller sub-clusters.
Steps in Agglomerative Hierarchical Clustering:
1. Each data point is considered a cluster.
2. Compute the distance between all clusters. A common distance metric is Euclidean distance.
3. Merge the closest two clusters.
4. Recalculate the distance matrix for the new cluster formed by merging.
5. Repeat steps 3 and 4 until the desired number of clusters is achieved or all points are in one cluster.
Dendrogram:
• Hierarchical clustering produces a dendrogram, which is a tree-like diagram that represents the
hierarchy of clusters.
• The height of the branches in the dendrogram represents the distance between clusters when they are
merged.
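A minimal SciPy sketch of agglomerative clustering and its dendrogram, on a small synthetic dataset (the data, the Ward linkage, and the cut into 2 clusters are illustrative assumptions):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
# Two small synthetic groups of 2-D points
X = np.vstack([rng.normal(0, 0.3, size=(10, 2)), rng.normal(3, 0.3, size=(10, 2))])

# Agglomerative clustering: repeatedly merge the closest clusters (Ward linkage)
Z = linkage(X, method="ward")

# Cut the hierarchy to obtain 2 flat clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# The dendrogram shows the merge order; branch heights are the merge distances
dendrogram(Z)
plt.show()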
Advantages of Hierarchical Clustering:
• Does not require specifying the number of clusters (K) beforehand.
• Produces a dendrogram that can give insight into the data structure.
• Works well for small datasets.
Disadvantages of Hierarchical Clustering:
• Computationally expensive and slow for large datasets.
• Difficult to adjust after the algorithm has begun, as the merging or splitting process is sequential.
• Sensitive to noise and outliers.
2. Density-Based Clustering:
Density-Based Clustering groups together data points that are closely packed together, based on the density of
data points in a region. This method is particularly effective at identifying clusters of arbitrary shapes and
handling outliers.
The most popular density-based clustering algorithm is DBSCAN (Density-Based Spatial Clustering of
Applications with Noise).
Key Concepts of DBSCAN:
1. Core Points:
A point is a core point if it has more than a specified minimum number of points (MinPts) within a
given radius (epsilon, ε). These points form the dense regions of the cluster.
2. Border Points:
A point is a border point if it is within the ε-neighborhood of a core point but does not have enough
points in its ε-neighborhood to be a core point.
3. Noise (Outliers):
Points that do not belong to any cluster are called noise points or outliers.
4. Epsilon (ε): A predefined radius within which neighbors are considered to be part of the same cluster.
5. MinPts: The minimum number of neighboring points required to form a dense region.
Steps in DBSCAN Algorithm:
1. For each unvisited data point, check if it is a core point by counting how many points are within ε
distance.
2. If a point is a core point, form a cluster by adding all reachable points (core points and border points) to
the cluster.
3. Repeat for each unvisited core point until all points are assigned to clusters or marked as noise.
4. Points that are not part of any cluster are considered outliers (noise).
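A minimal scikit-learn sketch of DBSCAN on synthetic data; the data as well as the eps (ε) and min_samples (MinPts) values are illustrative assumptions that would normally be tuned to the dataset:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense synthetic blobs plus a few scattered points likely to end up as noise
X = np.vstack([
    rng.normal(0, 0.2, size=(40, 2)),
    rng.normal(4, 0.2, size=(40, 2)),
    rng.uniform(-2, 6, size=(5, 2)),
])

# eps plays the role of the radius ε, min_samples the role of MinPts
db = DBSCAN(eps=0.5, min_samples=5).fit(X)

labels = db.labels_                  # cluster index per point; -1 marks noise/outliers
print("Clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
print("Noise points:", int(np.sum(labels == -1)))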
Advantages of Density-Based Clustering (DBSCAN):
• Can detect clusters of arbitrary shapes (not limited to spherical shapes).
• Can handle noise and outliers effectively.
• Does not require specifying the number of clusters in advance.
Disadvantages of DBSCAN:
• Performance is sensitive to the choice of parameters (ε and MinPts).
• Not suitable for datasets with varying densities (i.e., if clusters have very different densities).
• Struggles with high-dimensional data.