Unit 3 & 4
Usage of EM algorithm –
It can be used to fill in missing data in a sample.
It can be used as the basis of unsupervised learning of clusters.
It can be used to estimate the parameters of a Hidden Markov Model (HMM).
It can be used for discovering the values of latent variables.
Advantages of EM algorithm –
The likelihood is guaranteed to be non-decreasing with each iteration.
The E-step and M-step are often straightforward to implement for many problems.
Solutions to the M-step often exist in closed form.
Disadvantages of EM algorithm –
Convergence can be slow.
It converges only to a local optimum.
It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).
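A minimal sketch of EM in practice, assuming scikit-learn and NumPy are available: GaussianMixture fits a Gaussian mixture model with EM, where the E-step computes responsibilities for the latent cluster variable and the M-step re-estimates the means, covariances, and weights. The data below is synthetic and purely illustrative.

```python
# Illustrative EM sketch (assumes scikit-learn and NumPy are installed):
# GaussianMixture fits a Gaussian mixture model using the EM algorithm.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic 2-D clusters; the cluster label is the latent (unobserved) variable
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

gmm = GaussianMixture(n_components=2, max_iter=100, random_state=0)
gmm.fit(X)                       # alternates E-steps and M-steps until convergence

print(gmm.means_)                # estimated component means
print(gmm.predict(X[:5]))        # hard cluster assignments
print(gmm.predict_proba(X[:5]))  # E-step responsibilities for the latent variable
```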
Long questions:
Partitioning Methods: These methods partition the objects into k groups, and each group forms one cluster. They optimize an objective similarity criterion, typically one based on distance. Examples include K-means and CLARANS (Clustering Large Applications based upon Randomized Search).
Suppose we have two variables, M1 and M2, whose values form an x-y scatter plot.
Let's take the number of clusters K = 2; that is, we will try to group the dataset into two clusters. We first need to choose k random points, or centroids, to seed the clusters. These points can be taken from the dataset or chosen elsewhere; here we select two points that are not part of the dataset.
Now we assign each data point to its closest centroid by computing the distance between the point and each centroid. Equivalently, we can draw the perpendicular bisector (a median line) between the two centroids: points on the left side of this line are nearer to the K1 (blue) centroid, and points on the right side are nearer to the yellow centroid. We color them blue and yellow accordingly for clearer visualization.
To refine the clusters, we repeat the process with new centroids. Each new centroid is the center of gravity (mean) of the points currently assigned to that cluster. Next, we reassign each data point to its nearest new centroid by drawing the median line again. After this step, one yellow point lies on the left side of the line and two blue points lie on the right side, so these three points are assigned to the other centroid.
Since reassignment has taken place, we return to the centroid-update step: we again compute the center of gravity of each cluster to obtain new centroids, draw the median line once more, and reassign the data points. At this point no data point changes sides of the line, meaning no reassignment occurs, so the model has converged. As the model is ready, we can remove the assumed centroid markers, leaving the two final clusters.
Algorithm:
Step 1: Choose the number of clusters K.
Step 2: Select K random points as the initial centroids (from the dataset or otherwise).
Step 3: Assign each data point to its closest centroid, forming K clusters.
Step 4: Compute a new centroid for each cluster as the mean (center of gravity) of its points.
Step 5: If any data point changed its cluster, repeat from Step 3; otherwise the model is ready.
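Below is a small from-scratch sketch of this procedure in Python (toy data and K = 2 are assumed), showing the assign-then-recompute-centroids loop described above.

```python
# Minimal K-means sketch (assumed toy data, K = 2), following the steps above.
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])  # two blobs

K = 2
centroids = X[rng.choice(len(X), K, replace=False)]  # random initial centroids

for _ in range(100):
    # Assignment step: each point goes to its nearest centroid (Euclidean distance)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: new centroid = center of gravity (mean) of each cluster's points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):   # no reassignment => converged
        break
    centroids = new_centroids

print(centroids)
```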
In PCA, we then calculate the new features: we multiply the P* matrix by Z. In the resultant matrix Z*, each observation is a linear combination of the original features, and each column of Z* is independent of the others.
Finally, we remove the less important features from the new dataset: having obtained the new feature set, we decide what to keep and what to remove, retaining only the relevant or important features and discarding the unimportant ones.
Applications of Principal Component Analysis
PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision and image compression.
It can also be used for finding hidden patterns when the data has high dimensionality. Some fields where PCA is used are finance, data mining, and psychology.
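A brief illustrative sketch of these final PCA steps with scikit-learn; the dataset, the standardization step, and the number of retained components below are assumptions made for this example.

```python
# Illustrative PCA sketch (assumes scikit-learn): reduce 5 features to 2 components.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))             # assumed dataset: 200 observations, 5 features

Z_std = StandardScaler().fit_transform(Z)  # standardize the dataset
pca = PCA(n_components=2)                  # keep only the most important components
Z_star = pca.fit_transform(Z_std)          # Z*: observations expressed in the new features

print(pca.explained_variance_ratio_)  # variance captured by each retained component
print(Z_star.shape)                   # (200, 2): unimportant directions removed
```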
Problem: To extract independent sources’ signals from a mixed signal composed of the signals from those sources.
Given: Mixed signal from five different independent sources.
Aim: To decompose the mixed signal into independent sources:
(Figure: the five independent source signals, Source 1 through Source 5.)
Solution: Independent Component Analysis (ICA).
Consider the Cocktail Party Problem, or Blind Source Separation problem, to understand the problem that independent component analysis solves.
Here, a party is going on in a room full of people. There are 'n' speakers in the room, all speaking simultaneously. In the same room there are also 'n' microphones, placed at different distances from the speakers, recording the 'n' speakers' voice signals. Hence, the number of speakers is equal to the number of microphones in the room.
Now, using these microphone recordings, we want to separate all 'n' speakers' voice signals, given that each microphone recorded the voice signals of every speaker at a different intensity owing to the differing distances between them. Decomposing each microphone's mixed recording into the independent sources' speech signals can be done using the machine learning technique of independent component analysis.
[ X1, X2, ….., Xn ] => [ Y1, Y2, ….., Yn ]
where X1, X2, …, Xn are the original signals present in the mixed signal, and Y1, Y2, …, Yn are the new features: the independent components, which are independent of each other.
Restrictions on ICA –
The independent components generated by ICA are assumed to be statistically independent of each other.
The independent components generated by ICA must have a non-Gaussian distribution.
The number of independent components generated by ICA is equal to the number of observed mixtures (here, the number of microphones).
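A short sketch of blind source separation with FastICA from scikit-learn; the source signals and the mixing matrix below are assumed, synthetic stand-ins for the speakers and the room.

```python
# Cocktail-party style sketch: unmix synthetic sources with FastICA (assumes scikit-learn).
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
# Assumed independent, non-Gaussian sources (the "speakers")
S = np.c_[np.sin(2 * t),
          np.sign(np.sin(3 * t)),
          np.random.default_rng(0).laplace(size=2000)]
A = np.array([[1.0, 1.0, 1.0],
              [0.5, 2.0, 1.0],
              [1.5, 1.0, 2.0]])          # assumed mixing matrix (the "room")
X = S @ A.T                              # microphone recordings: mixed signals X1..Xn

ica = FastICA(n_components=3, random_state=0)
Y = ica.fit_transform(X)                 # recovered independent components Y1..Yn
print(Y.shape)                           # (2000, 3): one estimated source per column
```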
q.7 Explain multidimensional scaling in brief?
Ans. Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is
used to translate "information about the pairwise 'distances' among a set of objects or individuals" into a configuration
of points mapped into an abstract Cartesian space.[1]
More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to
display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.
Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, N,
an MDS algorithm places each object into N-dimensional space (a lower-dimensional representation) such that the
between-object distances are preserved as well as possible. For N = 1, 2, and 3, the resulting points can be visualized on a scatter plot.
Types
Classical multidimensional scaling
It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input
matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss
function called strain.
Metric multidimensional scaling (mMDS)
It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input
matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often
minimized using a procedure called stress majorization.
Non-metric multidimensional scaling (nMDS)
In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities
in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional
space.
Generalized multidimensional scaling (GMDS)
An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth non-Euclidean space. In
cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the
minimum-distortion embedding of one surface into another.
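A compact sketch of metric MDS with scikit-learn, embedding objects in two dimensions from a precomputed distance matrix; the distance values below are assumed for illustration.

```python
# Metric MDS sketch (assumes scikit-learn): embed objects in 2-D from pairwise distances.
import numpy as np
from sklearn.manifold import MDS

# Assumed symmetric distance matrix for 4 objects (zeros on the diagonal)
D = np.array([[0.0, 1.0, 2.0, 3.0],
              [1.0, 0.0, 1.5, 2.5],
              [2.0, 1.5, 0.0, 1.0],
              [3.0, 2.5, 1.0, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)   # minimizes the "stress" loss (SMACOF majorization)
print(coords)                   # one 2-D point per object, distances preserved as well as possible
print(mds.stress_)              # final stress value
```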
Example (Linear Discriminant Analysis):
Suppose we have two sets of data points belonging to two different classes that we want to classify. As shown in the
given 2D graph, when the data points are plotted on the 2D plane, there’s no straight line that can separate the two
classes of the data points completely. Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the
2D graph into a 1D graph in order to maximize the separability between the two classes.
Here, Linear Discriminant Analysis uses both the axes (X and Y) to create a new axis and projects data onto a new axis in
a way to maximize the separation of the two categories and hence, reducing the 2D graph into a 1D graph.
In the above graph, it can be seen that a new axis (in red) is generated and plotted in the 2D graph such that it
maximizes the distance between the means of the two classes and minimizes the variation within each class. In simple
terms, this newly generated axis increases the separation between the data points of the two classes. After generating
this new axis using the above-mentioned criteria, all the data points of the classes are plotted on this new axis and are
shown in the figure given below.
But Linear Discriminant Analysis fails when the means of the distributions are shared (the class means coincide), as it then becomes impossible for LDA to find a new axis that makes both classes linearly separable. In such cases, we use non-linear discriminant analysis.
Extensions to LDA:
Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are
multiple input variables).
Flexible Discriminant Analysis (FDA): Where non-linear combinations of inputs are used such as splines.
Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually
covariance), moderating the influence of different variables on LDA.
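A small sketch of LDA as a 2D-to-1D projection with scikit-learn; the two well-separated classes below are assumed, synthetic data.

```python
# LDA sketch (assumes scikit-learn): project 2-D, two-class data onto one discriminant axis.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (100, 2)),     # class 0
               rng.normal([3, 3], 1, (100, 2))])    # class 1
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)   # new axis maximizing between-class vs. within-class separation
print(X_1d.shape)                # (200, 1): the 2D graph reduced to a 1D graph
print(lda.score(X, y))           # classification accuracy on the training data
```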
q.9 Difference between PCA and ICA?
Ans. Difference between PCA and ICA –
Principal Component Analysis:
It deals with the principal components.
It focuses on the mutual orthogonality property of the principal components.
It doesn't focus on the mutual independence of the components.
Independent Component Analysis:
It deals with the independent components.
It doesn't focus on the mutual orthogonality of the components.
It focuses on the mutual independence of the components.
UNIT 4
Short questions:
Information Gain is calculated for a split by subtracting the weighted entropies of each branch from the original entropy.
When training a Decision Tree using these metrics, the best split is chosen by maximizing Information Gain.
Gain(A) = Info(D) − Info_A(D)
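A tiny worked sketch of this calculation in Python; the class counts for the parent node and for the branches of the split are assumed purely for illustration.

```python
# Information gain sketch: Gain(A) = Info(D) - Info_A(D), with assumed toy counts.
from math import log2

def entropy(counts):
    """Info(D) for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Assumed parent node D: 9 positive and 5 negative examples
info_D = entropy([9, 5])

# Assumed split on attribute A into two branches with class counts [6, 2] and [3, 3]
branches = [[6, 2], [3, 3]]
n = sum(sum(b) for b in branches)
info_A_D = sum((sum(b) / n) * entropy(b) for b in branches)  # weighted branch entropies

gain = info_D - info_A_D
print(round(gain, 3))   # information gain obtained by splitting on A
```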
Long question:
https://www.youtube.com/watch?v=coOTEc-0OGw
q.4 What is the difference between machine learning and deep learning?
Machine Learning is a subset of artificial intelligence focusing on a specific goal: setting computers up to be able to
perform tasks without the need for explicit programming.
Computers are fed structured data (in most cases) and ‘learn’ to become better at evaluating and acting on that data over
time.
Think of ‘structured data’ as data inputs you can put in columns and rows. You might create a category column in Excel
called ‘food’, and have row entries such as ‘fruit’ or ‘meat’. This form of ‘structured’ data is very easy for computers to
work with, and the benefits are obvious (It’s no coincidence that one of the most important data programming languages
is called ‘structured query language’).
Once programmed, a computer can take in new data indefinitely, sorting and acting on it without the need for further
human intervention.
Over time, the computer may be able to recognize that ‘fruit’ is a type of food even if you stop labeling your data. This
‘self-reliance’ is so fundamental to machine learning that the field breaks down into subsets based on how much ongoing
human help is involved.
Ans. https://www.youtube.com/watch?v=coOTEc-0OGw
https://youtu.be/coOTEc-0OGw
https://youtu.be/NsmAEmoSRjk
https://youtu.be/jBb8I9BpJrU
https://youtu.be/3EQw8awLQJ4
https://youtu.be/XzSlEA4ck2I
https://youtu.be/CJjSPCslxqQ
https://youtu.be/P2KZisgs4A4
https://youtu.be/K2sBRVCXZqs
https://youtu.be/v9tWTiMd0iE
https://youtu.be/i_bx7LI_h_4
https://youtu.be/7fnarUsRMO0
https://youtu.be/87oLR76aK2g
https://youtu.be/Y1dxfValzY0
https://youtu.be/wzm1NqZSpys
https://youtu.be/hODHKaSv1n0