
M. Sathya Sundaram
PA2213003013027
Assignment
MA3067 - Linear Algebra & Statistics for ML

Topics:
1. Explain the applications of PCA in Machine Learning with suitable examples
2. What are visualization tools? Explain with Python code how to draw suitable graphs, charts, and
bar plots, with suitable examples
1. Explain the applications of PCA in Machine Learning with suitable examples
Principal Component Analysis
 Principal Component Analysis (PCA) is an unsupervised learning algorithm that is used for
dimensionality reduction in machine learning.
 It is a statistical process that converts observations of correlated features into a set
of linearly uncorrelated features with the help of an orthogonal transformation. These
new transformed features are called the Principal Components. PCA is one of the
popular tools used for exploratory data analysis and predictive modeling. It draws out
strong patterns from a dataset by reducing the number of dimensions while retaining as
much of the variance as possible.
 PCA generally tries to find the lower-dimensional surface to project the high-
dimensional data.
 Correlated features are very common in data analysis and data mining. Correlation
among a set of features can cause significant problems when fitting a model. Data with
uncorrelated features has many benefits, such as:
1. The learning algorithm will be faster
2. Interpretability will be higher
3. Bias will be lower
 PCA works by considering the variance of each attribute, because attributes with high
variance tend to give a good split between the classes; keeping only those directions reduces the dimensionality.
 Some real-world applications of PCA are image processing, movie recommendation
systems, and optimizing power allocation in various communication channels.
 The PCA algorithm is based on mathematical concepts such as:
o Variance and covariance
o Eigenvalues and eigenvectors
HOW DO YOU DO A PRINCIPAL COMPONENT ANALYSIS?
1. Standardize the range of continuous initial variables
2. Compute the covariance matrix to identify correlations
3. Compute the eigenvectors and eigenvalues of the covariance matrix to identify the
principal components
4. Create a feature vector to decide which principal components to keep
5. Recast the data along the principal components axes
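The five steps above can be sketched directly in NumPy. The dataset here is synthetic and the choice of k = 2 components is purely illustrative:

```python
import numpy as np

# Toy data: 100 observations of 3 correlated variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))

# 1. Standardize each variable to mean 0, variance 1
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data
cov = np.cov(Xs, rowvar=False)

# 3. Eigenvectors and eigenvalues (eigh handles symmetric matrices)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort from highest to lowest eigenvalue and keep k components
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
k = 2
W = eigvecs[:, :k]          # the "feature vector" of kept components

# 5. Recast the data along the principal component axes
scores = Xs @ W
print(scores.shape)         # (100, 2)
```

Note that after step 5 the covariance matrix of the scores is diagonal: the principal components are uncorrelated by construction.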
Imagine several points plotted on a 2-D plane. There are two principal
components: PC1, the primary principal component, points in the direction that explains the
maximum variance in the data; PC2 is orthogonal to PC1 and explains the remaining variance.
The mathematical representation of dimensionality reduction in the context of PCA is as
follows:
Given a dataset with n observations and p variables represented by the n x p data matrix X,
the goal of PCA is to transform the original variables into a new set of k variables called
principal components that capture the most significant variation in the data. The principal
components are defined as linear combinations of the original variables given by:
PC_1 = a_11 * x_1 + a_12 * x_2 + ... + a_1p * x_p
PC_2 = a_21 * x_1 + a_22 * x_2 + ... + a_2p * x_p
...
PC_k = a_k1 * x_1 + a_k2 * x_2 + ... + a_kp * x_p

How Does Principal Component Analysis Work?


1. Normalize the Data
Standardize the data before performing PCA. This will ensure that each feature has a mean =
0 and variance = 1.

2. Build the Covariance Matrix


Construct a square matrix to express the correlation between two or more features in a
multidimensional dataset.

3. Find the Eigenvectors and Eigenvalues


Calculate the eigenvectors (unit vectors) and eigenvalues of the covariance matrix. An
eigenvalue is the scalar by which the covariance matrix stretches its eigenvector; it measures
the variance captured along that direction.
4. Sort the Eigenvectors in Highest to Lowest Order and Select the Number of Principal
Components.
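In practice these four steps are usually delegated to a library. A minimal sketch using scikit-learn's PCA on the Iris dataset (the choice of two components is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = load_iris().data                      # 150 flowers, 4 measurements each

Xs = StandardScaler().fit_transform(X)    # step 1: normalize
pca = PCA(n_components=2)                 # steps 2-4 happen inside fit()
scores = pca.fit_transform(Xs)

print(scores.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)      # share of variance per component
```

`explained_variance_ratio_` shows how much of the total variance each kept component captures, which is how the number of components is usually chosen.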
Applications of PCA in Machine Learning

 PCA is used to visualize multidimensional data.


 It is used to reduce the number of dimensions in healthcare data.
 PCA can help resize an image.
 It can be used in finance to analyze stock data and forecast returns.
 PCA helps to find patterns in high-dimensional datasets.

1. Neuroscience:
 A technique known as spike-triggered covariance analysis uses a variant of
Principal Component Analysis in neuroscience to identify the specific properties of
a stimulus that increase a neuron's probability of generating an action potential.
 PCA is also used to find the identity of a neuron from the shape of its action
potential.

 PCA as a dimension-reduction technique is used to detect coordinated activities of large
neuronal ensembles. It has been used to determine collective variables, that is, order
parameters, during phase transitions in the brain.
2. Quantitative Finance
PCA is a methodology to reduce the dimensionality of a complex problem. Say a fund
manager has 200 stocks in his portfolio. To analyze these stocks quantitatively, he would
require a correlation matrix of size 200 x 200, which makes the problem very complex.

However, if he were to extract 10 principal components that best represent the variance in
the stocks, this would reduce the complexity of the problem while still explaining the
movement of all 200 stocks. Some other applications of PCA include:

 Analyzing the shape of the yield curve

 Hedging fixed income portfolios

 Implementation of interest rate models

 Forecasting portfolio returns

 Developing asset allocation algorithms

 Developing long short equity trading algorithms


3. Image Compression
PCA is also used for image compression: an image can be projected onto its top principal components, stored using far fewer numbers, and reconstructed with only a small loss of detail.
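A rough sketch of the idea, using scikit-learn's small built-in digits images (keeping 16 of 64 components is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data                       # 1797 8x8 images, 64 pixels each

pca = PCA(n_components=16).fit(X)            # keep 16 of 64 components
X_small = pca.transform(X)                   # each image now stored as 16 numbers
X_restored = pca.inverse_transform(X_small)  # approximate reconstruction

mse = np.mean((X - X_restored) ** 2)
kept = pca.explained_variance_ratio_.sum()   # fraction of variance retained
print(X_small.shape, round(kept, 3))
```

The storage drops to a quarter of the original, while most of the variance in the images is retained.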

2. What are visualization tools? Explain with Python code how to draw suitable graphs, charts, and
bar plots, with suitable examples
Data visualization is a crucial aspect of machine learning that enables analysts to understand
and make sense of data patterns, relationships, and trends. Through data visualization,
insights and patterns in data can be easily interpreted and communicated to a wider audience,
making it a critical component of machine learning.
1. Line Chart
2. Scatter Plot
3. Bar Chart
4. Pie Chart
5. Box Plot
6. Histogram
 Matplotlib and Seaborn are Python libraries that are used for data visualization.
Line Charts
A Line chart is a graph that represents information as a series of data points connected by a
straight line. In line charts, each data point or marker is plotted and connected with a line or
curve. 
Let's consider the apple yield (tons per hectare) in Kanto. Let's plot a line graph using this
data and see how the yield of apples changes over time. We start by importing Matplotlib and
Seaborn.

Using Matplotlib
We are using random data points to represent the yield of apples. 

To better understand the graph and its purpose, we can add the x-axis values too.
Let's add labels to the axes so that we can show what each axis represents.  

  
To plot multiple datasets on the same graph, just use the plt.plot function once for each
dataset. Let's use this to compare the yields of apples vs. oranges on the same graph.
We can add a legend which tells us what each line in our graph means. To understand what
we are plotting, we can add a title to our graph.

  
To show each data point on our graph, we can highlight it with a marker using the marker
argument. Matplotlib provides many different marker shapes, such as circle, cross, square,
and diamond.

You can use the plt.figure function to change the size of the figure.
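Putting these steps together, a minimal sketch follows. The yield numbers are made up for illustration, standing in for the random data points mentioned above:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit these two lines in a notebook
import matplotlib.pyplot as plt

years = list(range(2010, 2017))
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934]    # made-up yields
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.907, 0.904]  # made-up yields

plt.figure(figsize=(9, 5))                 # change the size of the figure
plt.plot(years, apples, marker="o")        # markers highlight each data point
plt.plot(years, oranges, marker="x")
plt.xlabel("Year")                         # label the axes
plt.ylabel("Yield (tons per hectare)")
plt.title("Crop Yields in Kanto")          # title explains what we are plotting
plt.legend(["Apples", "Oranges"])          # legend tells us what each line means
plt.savefig("yields.png")
```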
Using Seaborn
An easy way to make your charts look beautiful is to use some default styles from the
Seaborn library. These can be applied globally using the sns.set_style function.
We can also use the darkgrid option to change the background color to a darker shade.
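A short sketch of applying the darkgrid style before plotting:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")        # grey background with white grid lines, set globally
plt.plot([2010, 2011, 2012, 2013], [0.91, 0.92, 0.93, 0.93])
plt.savefig("styled.png")
```

Once `set_style` is called, every subsequent Matplotlib figure picks up the Seaborn styling.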
Bar Graphs
When you have categorical data, you can represent it with a bar graph. A bar graph plots data
with the help of bars, which represent value on the y-axis and category on the x-axis. Bar
graphs use bars with varying heights to show the data which belongs to a specific category.
We can also stack bars on top of each other. Let's plot the data for apples and oranges.
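A sketch of a stacked bar chart, reusing the made-up yield numbers from the line chart example; the `bottom` argument places one set of bars on top of the other:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2010, 2017)
apples = np.array([0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934])
oranges = np.array([0.962, 0.941, 0.930, 0.923, 0.918, 0.907, 0.904])

plt.bar(years, apples)
plt.bar(years, oranges, bottom=apples)   # stack oranges on top of apples
plt.xlabel("Year")
plt.ylabel("Yield (tons per hectare)")
plt.legend(["Apples", "Oranges"])
plt.savefig("stacked_bars.png")
```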

Let’s use the tips dataset in Seaborn next. The dataset consists of :
 Information about the sex (gender)
 Time of day
 Total bill
 Tips given by customers visiting the restaurant for a week

We can draw a bar chart to visualize how the average bill amount varies across different days
of the week. We can do this by computing the day-wise averages and then using plt.bar. The
Seaborn library also provides a barplot function that can automatically compute averages.
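A sketch with Seaborn's barplot follows. The DataFrame here is a small hand-made stand-in for the real tips dataset, since `sns.load_dataset("tips")` downloads it from the internet:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hand-made stand-in for Seaborn's `tips` dataset (same column names)
tips = pd.DataFrame({
    "day":        ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [17.5,   20.1,   15.0,  13.4,  25.3,  30.1,  22.4,  28.8],
    "sex":        ["Male", "Female"] * 4,
})

# barplot computes the day-wise average bill automatically
sns.barplot(data=tips, x="day", y="total_bill")
plt.ylabel("Average total bill")
plt.savefig("bills.png")
```

With `plt.bar` we would have had to call `tips.groupby("day")["total_bill"].mean()` ourselves first; `sns.barplot` does that aggregation for us.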
Histograms
A Histogram is a bar representation of data that varies over a range. It plots the number of
data points that fall within each range (bin) along the y-axis, with the ranges along the
x-axis. Let's again use the ‘Iris’ data, which contains information about
flowers, to plot histograms.

Now, let’s plot a histogram using the hist() function.


We can control the number or size of bins too.

We can change the number and size of bins using numpy too.
We can create bins of unequal size too.

Similar to line charts, we can draw multiple histograms in a single chart. We can reduce each
histogram's opacity so that one histogram's bars don't hide the others'. Let's draw separate
histograms for each species of flowers.
Multiple histograms can be stacked on top of one another by setting the stacked parameter to
True.
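The histogram variations above can be sketched as follows, loading Iris through scikit-learn so the example is self-contained:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
sepal_width = iris.data[:, 1]    # column 1 holds sepal width (cm)

plt.hist(sepal_width, bins=10)   # control the number of bins
plt.xlabel("Sepal width (cm)")
plt.savefig("hist.png")

# bins of unequal size, given as explicit edges
plt.figure()
plt.hist(sepal_width, bins=[2.0, 2.5, 2.75, 3.0, 3.5, 4.5])

# one semi-transparent histogram per species, overlaid in a single chart
plt.figure()
for sp in range(3):
    plt.hist(iris.data[iris.target == sp, 1], alpha=0.5)
plt.legend(iris.target_names)
plt.savefig("hist_species.png")
```

Passing the three per-species arrays as a list to a single `plt.hist` call with `stacked=True` would stack them instead of overlaying them.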

Scatter Plots
Scatter plots are used when we have to plot two or more variables present at different
coordinates. The data is scattered all over the graph and is not confined to a range. Two or
more variables are plotted in a Scatter Plot, with each variable being represented by a
different color. Let's use the ‘Iris’ dataset to plot a Scatter Plot.

First, let’s see how many different species of flowers we have.

Let’s try plotting the data with the help of a line chart first. This is not very informative;
we cannot figure out the relationship between different data points.

A scatter plot works much better, but we still cannot differentiate data points belonging to
different categories. We can color the dots using the flower species as a hue.
Since Seaborn uses Matplotlib's plotting functions internally, we can use functions like
plt.figure and plt.title to modify the figure.
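A sketch of the colored scatter plot, again loading Iris through scikit-learn so no download is needed:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
iris = data.frame                                    # measurements + target code
iris["species"] = data.target_names[iris["target"]]  # readable species names

plt.figure(figsize=(8, 5))                           # Matplotlib functions still work
plt.title("Sepal Length vs. Sepal Width")
sns.scatterplot(data=iris, x="sepal length (cm)", y="sepal width (cm)",
                hue="species")                       # one color per species
plt.savefig("scatter.png")
```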

Heat Maps
Heatmaps are used to see changes in behavior or gradual changes in data. They use different
colors to represent different values; the way the colors range in hue and intensity tells us
how the phenomenon varies. Let's use heatmaps to visualize monthly passenger
footfall at an airport over 12 years, using the flights dataset in Seaborn.
The above dataset, flights_df shows us the monthly footfall in an airport for each year, from
1949 to 1960. The values represent the number of passengers (in thousands) that passed
through the airport. Let’s use a heatmap to visualize the above data.

 
The brighter the color, the higher the footfall at the airport. By looking at the graph, we can
infer that:
1. The footfall in any given year is highest around July and August.
2. The footfall grows annually: any month of a year has a higher footfall than the same
month in previous years.
Let's display the actual values in our heatmap and change the hue to blue.           
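A sketch of the annotated heatmap follows. The pivoted table here is a synthetic stand-in for the real flights data (monthly passengers in thousands, 1949-1960), since `sns.load_dataset("flights")` requires internet access; the growth and seasonality numbers are invented to mimic its shape:

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the pivoted `flights` dataset: one row per month,
# one column per year, values = passengers in thousands
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
years = list(range(1949, 1961))
growth = np.linspace(100, 450, len(years))[:, None]            # annual growth
season = 1 + 0.3 * np.exp(-((np.arange(12) - 6.5) ** 2) / 8)   # Jul/Aug peak
flights_df = pd.DataFrame((growth * season).astype(int),
                          index=years, columns=months)

# annot=True prints each value in its cell; cmap="Blues" changes the hue to blue
sns.heatmap(flights_df.T, annot=True, fmt="d", cmap="Blues")
plt.title("Passengers (thousands)")
plt.savefig("heatmap.png")
```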
