Lecture 3 Introduction to Linear Algebra (Part 2)

The document outlines the course structure for 'Introduction to Linear Algebra (Part 2)' taught by Jing Li at The Hong Kong Polytechnic University, detailing weekly topics and teaching plans. Key concepts include matrix properties, eigenvectors, eigenvalues, and their applications in data analytics such as K-Means and Principal Component Analysis (PCA). The course also covers determinants, invertibility, and practical methods for solving linear equations using eigenvalues and eigenvectors.


Introduction to Linear Algebra (Part 2)

Jing Li, Assistant Professor


Department of Computing
The Hong Kong Polytechnic University

1
Teaching Plan

Week  Topic                                          Instructor
1     Data Analytics: An Introduction                Jing/Lotto
2     Introduction to Linear Algebra (Part 1)        Jing/Lotto
3     Introduction to Linear Algebra (Part 2)        Jing/Lotto
4     Introduction to Calculus (Part 1) (+ Quiz 1)   Jing/Lotto
5     Introduction to Calculus (Part 2)              Jing/Lotto
6     In-class Midterm Test                          Jing/Lotto
7     Programming with R (Part 1)                    Jibin
8     Programming with R (Part 2)                    Jibin
9     Data Visualization                             Jibin
10    Monte-Carlo Simulation                         Jibin
11    Linear Regression (Assignment out)             Jibin
12    Time-series Analysis (+ Quiz 2)                Jibin
13    Review and Exam Q&A (Assignment due)           Jibin & Jing/Lotto
Course Structure
[Diagram] Mathematical Basics (4 lectures): Linear Algebra, Calculus. R Programming (3 lectures): Environment, Data Manipulation. Advanced Data Analytics (3 lectures): Regression, Simulation, Time-Series Analysis. Together these build up to Data Analytics.

3
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

4
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

5
Span (a space spanned by some vectors)

The span is the set of all linear combinations of a set of vectors.

Given a set of vectors 𝒮 = {𝒗1, 𝒗2, 𝒗3, …}, where 𝒗𝑖 ∈ ℝ^𝑀 ∀ 𝑖:
span(𝒮) = {𝑤1𝒗1 + 𝑤2𝒗2 + 𝑤3𝒗3 + ⋯ ∣ 𝑤𝑖 ∈ ℝ}

Note: 𝒗1, 𝒗2, 𝒗3, … can be viewed as the coordinate axes of the space, and 𝑤1, 𝑤2, 𝑤3, … as the coordinates of the corresponding vector.

The span of a set of vectors is an example of a vector space (a more general concept).
The dimension of the span is called the rank.
6
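As a quick numerical check of span and rank, one can stack the vectors and ask for the rank of the resulting matrix. A minimal sketch in Python/NumPy (the vectors are made up for illustration):

```python
import numpy as np

# Two made-up vectors in R^3; the third is a linear combination of them,
# so it adds nothing to the span.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, -1.0])
v3 = 2 * v1 + 3 * v2             # lies in span{v1, v2}

S = np.vstack([v1, v2, v3])      # stack the vectors as rows of a matrix
print(np.linalg.matrix_rank(S))  # 2: the span is a 2-dimensional subspace of R^3
```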
What do matrices do to vectors with multiplication?

[Figure: multiplying by the matrix maps the vector (2,1) to (3,5).]
• The new vector is:
  1) rotated
  2) scaled

7
Geometric definition of the determinant: The “magnitude”
of the transformation made by multiplying a matrix.

[Figure: the unit square spanned by (1,0) and (0,1) and its image under the matrix transformation.]
The determinant of a square
matrix is a scalar that provides
information about the matrix.
- E.g., Invertibility of the matrix.
8
Determinant vs. Invertibility
If a matrix 𝑀 is invertible, then we'll find 𝑀⁻¹ with 𝑀𝑀⁻¹ = 𝑀⁻¹𝑀 = 𝐼. Then we have det(𝑀𝑀⁻¹) = det(𝑀) det(𝑀⁻¹) = 1, which indicates that det(𝑀) ≠ 0.

This uses the property det(AB) = det(A) det(B). In other words, when you transform a geometric shape first by matrix A and then by matrix B, you scale the volume of the shape by det(A) due to the first transformation, and then by det(B) due to the second transformation. The total effect on the volume is the product of these two scale factors, which is why det(AB) = det(A) det(B).
9
Recall: Determinants in High School

2×2:
A = [a b; c d]
det(A) = ad − bc

3×3:
A = [a1 a2 a3; a4 a5 a6; a7 a8 a9]
det(A) = a1·a5·a9 + a2·a6·a7 + a3·a4·a8 − a3·a5·a7 − a2·a4·a9 − a1·a6·a8

10
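A short Python/NumPy sketch (with arbitrary example matrices, not taken from the slides) that checks the 2×2 formula above and the product rule det(AB) = det(A)det(B) discussed earlier:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = np.array([[0.0, 4.0],
              [1.0, 2.0]])

# 2x2 formula: det(A) = ad - bc
a, b = A[0]
c, d = A[1]
print(a * d - b * c, np.linalg.det(A))      # both ~5.0

# Product rule: det(AB) = det(A) det(B)
print(np.linalg.det(A @ B))                 # ~ -20.0
print(np.linalg.det(A) * np.linalg.det(B))  # ~ -20.0
```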
Recall: Determinants in High School
• 2×2: |det(A)| is the area of the parallelogram spanned by the rows (a, b) and (c, d).
• 3×3: |det(A)| is the volume V of the parallelepiped spanned by the rows (a1, a2, a3), (a4, a5, a6), (a7, a8, a9).
11
Example: solve the algebraic equation

If the determinant is non-zero, then the inverse exists, and we can simply multiply both sides of the equation by the inverse; the only conclusion is that x = 0.

12
Example of an underdetermined system

• This is because det(M) = 0, so M is not invertible. If det(M) isn't 0, the only solution is x = 0.
• Recall that the two vectors are anti-aligned and cannot determine a 2D space; the rank is 1 (not full rank).
13
What do matrices do to vectors?

[Figure: multiplying by the matrix maps the vector (2,1) to (3,5).]
• The new vector is:
  1) rotated
  2) scaled

Are there any special vectors that only get scaled?
14
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

15
Are there any special vectors that only get scaled?

[Figure: M(1,1) = (3,3) = 3·(1,1).]
• For this special vector, multiplying by M is like multiplying by a scalar.
• (1,1) is called an eigenvector of M.
• 3 (the scaling factor) is called the eigenvalue associated with this eigenvector.
16
Are there any other eigenvectors?

• Yes! Let us try others!
• Exercise: verify that (-1.5, 1) is also an eigenvector of M.
• Note: eigenvectors are only defined up to a scale factor: if 𝑀𝑒 = 𝜆𝑒, then for any non-zero 𝑐, we'll have 𝑀(𝑐𝑒) = 𝜆(𝑐𝑒).
  – Conventions are either to make e's unit vectors (with length 1), or to make one of the elements 1.
17
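The matrix M itself is an image on the slide and does not survive in this text, but a 2×2 matrix consistent with the examples shown (it maps (2,1) to (3,5) and has (1,1) as an eigenvector with eigenvalue 3) is M = [[0, 3], [2, 1]]. Taking that as an assumption, a Python/NumPy sketch of the exercise:

```python
import numpy as np

# Assumed matrix, reconstructed from the slide's examples (not shown in the extracted text).
M = np.array([[0.0, 3.0],
              [2.0, 1.0]])

print(M @ np.array([1.0, 1.0]))    # [3. 3.]   = 3 * (1, 1):     eigenvalue 3
print(M @ np.array([-1.5, 1.0]))   # [ 3. -2.] = -2 * (-1.5, 1): eigenvalue -2

# np.linalg.eig returns the eigenvalues and unit-length eigenvectors (as columns).
vals, vecs = np.linalg.eig(M)
print(vals)   # 3 and -2 (order may vary)
print(vecs)   # columns proportional to (1, 1) and (-1.5, 1)
```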
Step back:
Eigenvectors obey this equation

𝑀𝑒 = 𝜆𝑒, i.e., (𝑀 − 𝜆𝐼)𝑒 = 0, which has a non-zero solution 𝑒 only if det(𝑀 − 𝜆𝐼) = 0.
• This is called the characteristic equation for 𝜆.
• In general, for an N × N matrix, there are N eigenvectors.

18
Practical program for approaching equations
coupled through a term Mx=b
• Step 1: Find the eigenvalues and eigenvectors of M. (Suppose that we already know them!)
• Step 2: Decompose x into its eigenvector components.
• Step 3: Stretch/scale each eigenvalue component.
• Step 4: (Solve for c and) transform back to original coordinates.

19
Practical program for approaching equations
coupled through a term Mx=b
• Step 1: Find the eigenvalues and eigenvectors of M.
• Step 2: Decompose x into its eigenvector components.
• Step 3: Stretch/scale each eigenvalue component.
• Step 4: (Solve for c and) transform back to original coordinates.

20
Practical program for approaching equations
coupled through a term Mx=b
• Step 1: Find the eigenvalues and eigenvectors of M.
• Step 2: Decompose x into its eigenvector components.
• Step 3: Stretch/scale each eigenvalue component.
• Step 4: (Solve for c and) transform back to original coordinates.

Because 𝑀𝑥 = 𝑏, we can likewise decompose 𝑏 into the eigenvector components and solve for the coordinates c:
𝑏 = 𝑏1𝑒^(1) + 𝑏2𝑒^(2) + ⋯ + 𝑏𝑛𝑒^(𝑛)
21
Practical program for approaching equations
coupled through a term Mx=b
• Step 1: Find the eigenvalues and eigenvectors of M.
• Step 2: Decompose x into its eigenvector components.
• Step 3: Stretch/scale each eigenvalue component.
• Step 4: (Solve for c and) transform back to original coordinates.

So, 𝑐𝑖 = 𝑏𝑖/𝜆𝑖, where 𝑖 = 1, 2, …, 𝑛. We have solved for 𝑐; now we put it back to form the solution for 𝑥.

22
Putting it all together…

𝑀 = 𝐸Λ𝐸⁻¹, where (step 1) the columns of 𝐸 are the eigenvectors of 𝑀 and Λ is the diagonal matrix of the corresponding eigenvalues.

MATLAB: the eig function computes E and Λ.
23
Putting it all together…

Step 2: Transform into eigencoordinates.
Step 3: Scale by 𝜆𝑖 along the 𝑖th eigencoordinate.
Step 4: Transform back to the original coordinate system.
24
It makes solving Mx = b easier, because 𝐸⁻¹𝑀𝑥 = Λ𝐸⁻¹𝑥, and Λ is a diagonal matrix (allowing easy solving of 𝐸⁻¹𝑥).

Left eigenvectors
• The rows of E inverse are
called the left eigenvectors
• because they satisfy
𝐸 −1 𝑀 = Λ 𝐸 −1 .
• Together with the eigenvalues,
they determine how x is
decomposed into each of its
eigenvector components.
25
Putting it all together…

Original matrix 𝑀 ⟷ matrix in the eigencoordinate system: Λ = 𝐸⁻¹𝑀𝐸, where the columns of 𝐸 are the eigenvectors of 𝑀.
26
Putting it all together…

Original matrix 𝑀 vs. matrix in the eigencoordinate system Λ:
• Note: M and Λ look very different.
Q: Are there any properties that are preserved between them?
A: Yes, 2 very important ones:
1. The sum of the diagonal entries, tr(𝑀) = 𝜆1 + 𝜆2 + ⋯ + 𝜆𝑛. Its name is trace!
2. The determinant, det(𝑀) = 𝜆1 𝜆2 ⋯ 𝜆𝑛.
27
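Both invariants are easy to check numerically. A sketch in Python/NumPy, reusing the 2×2 matrix assumed for the earlier eigenvector slides (a reconstruction, since the slide's matrix is an image): the trace equals the sum of the eigenvalues and the determinant equals their product, so both are shared by M and Λ.

```python
import numpy as np

M = np.array([[0.0, 3.0],     # assumed matrix from the earlier eigenvector slides
              [2.0, 1.0]])

vals, E = np.linalg.eig(M)
Lam = np.diag(vals)           # Lambda: the matrix in the eigencoordinate system

print(np.trace(M), np.trace(Lam), vals.sum())              # all ~1.0
print(np.linalg.det(M), np.linalg.det(Lam), vals.prod())   # all ~ -6.0
```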
Example: Solving linear equations
A convenient and concise way to work with math involving linear operations.

3𝜃1 + 0𝜃2 + 5𝜃3 = 36
4𝜃1 + 3𝜃2 + 2𝜃3 = 46
2𝜃1 + 2𝜃2 + 1𝜃3 = 25

In matrix form, 𝑉𝜽 = 𝒖 with
𝑉 = [3 0 5; 4 3 2; 2 2 1],  𝒖 = (36, 46, 25),  𝜽 = (𝜃1, 𝜃2, 𝜃3).

𝑉𝐸 = 𝐸Λ (Λ and 𝐸 gather the eigenvalues and eigenvectors) ⟹ 𝑉 = 𝐸Λ𝐸⁻¹ ⟹ Λ𝐸⁻¹𝜽 = 𝐸⁻¹𝑉𝜽 = 𝐸⁻¹𝒖 (i.e., transform into eigencoordinates). Here, we can first solve for 𝐲 = 𝐸⁻¹𝜽 (the solution in eigencoordinates), and then put the solution back into the original coordinates (𝜽 = 𝐸𝐲). Because Λ is a diagonal matrix, solving for 𝐲 is easier than solving for 𝜽.
28
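A Python/NumPy sketch of this eigendecomposition route on the system above (in practice one would simply call a linear solver; the eigencoordinate detour mirrors the steps on the slide, and for this particular V two of the eigenvalues happen to be complex, so the intermediate arithmetic is complex even though the final solution is real):

```python
import numpy as np

V = np.array([[3.0, 0.0, 5.0],
              [4.0, 3.0, 2.0],
              [2.0, 2.0, 1.0]])
u = np.array([36.0, 46.0, 25.0])

lam, E = np.linalg.eig(V)       # V E = E Lambda  (here lam = 7, i, -i)

rhs = np.linalg.solve(E, u)     # E^{-1} u: the right-hand side in eigencoordinates
y = rhs / lam                   # Lambda y = E^{-1} u is diagonal, so just divide
theta = E @ y                   # transform back: theta = E y

print(theta.real)               # ~ [7. 4. 3.]
print(np.linalg.solve(V, u))    # same answer from a direct solver
```

Substituting 𝜽 = (7, 4, 3) back into the three equations reproduces (36, 46, 25).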
Special Matrices

Symmetric Matrix:

• e.g., Covariance matrices


• Properties:
– Eigenvalues are real
– Eigenvectors are orthogonal (i.e. it’s a normal matrix)

29
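A quick check of these two properties, assuming Python/NumPy and an arbitrary symmetric (covariance-like) example matrix:

```python
import numpy as np

Q = np.array([[2.0, 0.8],       # arbitrary symmetric example
              [0.8, 1.0]])

vals, vecs = np.linalg.eigh(Q)  # eigh is specialized for symmetric/Hermitian matrices
print(vals)                     # real eigenvalues
print(vecs.T @ vecs)            # ~ identity matrix: the eigenvectors are orthonormal
```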
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

30
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

31
Recall: Vectors’ Distance
The Euclidean distance (or distance) of two 𝑛-vectors 𝑥 and 𝑦 is:
‖𝑥 − 𝑦‖ = √[(𝑥1 − 𝑦1)² + (𝑥2 − 𝑦2)² + ⋯ + (𝑥𝑛 − 𝑦𝑛)²]

Length of the subtraction of the two vectors.

The straight-line distance from


one point to the other point
represented by their vectors.

32
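In Python/NumPy this is just the norm of the difference of the two vectors; a minimal example with made-up 3-vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# sqrt((1-4)^2 + (2-6)^2 + (3-3)^2) = sqrt(9 + 16 + 0) = 5
print(np.linalg.norm(x - y))   # 5.0
```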
Recall: Example: Document Distance

5 Wikipedia articles:
Veterans Day, Memorial Day, Academy Awards, Golden
Globe Awards, Super Bowl
Word count vectors with 4,423 words in dictionary.
Pairwise
Distance

33
Clustering
• Given 𝑁 𝑛-vectors,
𝑥1 , 𝑥2 , … , 𝑥𝑁
• Partition (cluster) them
into 𝑘 clusters
• Our goal is for vectors in the same cluster to be close to each other.

34
Clustering Objective
• Given 𝑁 𝑛-vectors, 𝑥1 , 𝑥2 , … , 𝑥𝑁
• Partition (cluster) them into 𝑘 clusters: 𝐺1 , 𝐺2 , … , 𝐺𝑘
• Group assignment: 𝑐𝑖 is the index of the group assigned to vector 𝑥𝑖 ,
i.e., 𝑥𝑖 ∈ 𝐺𝑐𝑖
• Group representatives:
• 𝑛-vectors 𝑧1 , 𝑧2 , … , 𝑧𝑘
• Clustering objective is:
𝐽cluster = (1/𝑁) Σᵢ₌₁ᴺ ‖𝑥𝑖 − 𝑧𝑐𝑖‖²
• The smaller, the better!
35
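The objective translates directly into code. A minimal sketch (assuming Python/NumPy, with hypothetical array names X for the data, c for the assignments, and Z for the representatives):

```python
import numpy as np

def j_cluster(X, c, Z):
    """Mean squared distance from each x_i to its representative z_{c_i}.
    X: (N, n) data vectors, c: (N,) cluster indices, Z: (k, n) representatives."""
    return np.mean(np.sum((X - Z[c]) ** 2, axis=1))

# Tiny made-up example with two clusters of 2-vectors.
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [4.9, 5.1]])
c = np.array([0, 0, 1, 1])
Z = np.array([X[c == 0].mean(axis=0), X[c == 1].mean(axis=0)])
print(j_cluster(X, c, Z))   # small value: each point sits close to its representative
```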
Clustering Objective
• Given 𝑁 𝑛-vectors, 𝑥1 , 𝑥2 , … , 𝑥𝑁
• Partition (cluster) them into 𝑘 clusters: 𝐺1 , 𝐺2 , … , 𝐺𝑘
• Group assignment: 𝑐𝑖 is the index of the group assigned to vector
𝑥𝑖 , i.e., 𝑥𝑖 ∈ 𝐺𝑐𝑖
• Group representatives:
• 𝑛-vectors 𝑧1 , 𝑧2 , … , 𝑧𝑘
• Clustering objective is:
• 𝐽cluster = (1/𝑁) Σᵢ₌₁ᴺ ‖𝑥𝑖 − 𝑧𝑐𝑖‖²
• 𝑧𝑐𝑖 should be the mean of 𝐺𝑐𝑖
36
K-means Clustering Algorithm

• Alternately update the group assignments, then the representatives.
• 𝐽cluster goes down in each step.
• No guarantee to minimize 𝐽cluster.
38
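A minimal sketch of this alternating scheme in Python/NumPy (for real use, a library implementation such as scikit-learn's KMeans would normally be preferred):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Alternate between assigning each point to its nearest representative
    and moving each representative to the mean of its group."""
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=k, replace=False)]          # initial representatives
    for _ in range(iters):
        # Assignment step: index of the closest representative for every point.
        d = np.linalg.norm(X[:, None, :] - Z[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # Update step: each representative becomes the mean of its cluster.
        Z = np.array([X[c == j].mean(axis=0) if np.any(c == j) else Z[j]
                      for j in range(k)])
    return c, Z

# Toy data: two loose blobs of 2-vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 5.0])
c, Z = kmeans(X, k=2)
print(Z)   # representatives near (0, 0) and (5, 5)
```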
Running K-means Clustering (at beginning)

39
Running K-means Clustering (Iteration 1)

40
Running K-means Clustering (Iteration 2)

41
Running K-means Clustering (Iteration 3)

42
Running K-means Clustering (Iteration 10)

43
Running K-means Clustering (At last)

44
Example: Topic Discovery
• 𝑁 = 500 Wikipedia articles
• Dictionary size 𝑛 = 4423
• Run K-means algorithm with 𝑘 = 9.
• Results:
• Top words in the cluster representatives, mean of word
vectors in the cluster.
• Titles of articles closest to the representatives.

45
Example: Topic Discovery (C1-3)

Top 5 words in the cluster representatives --- the mean of word vectors in the cluster (in the normalized form).
Titles of articles closest to the representatives.

46
Example: Topic Discovery (C1-3)

47
Roadmap
• Matrix Properties
• Eigenvectors and Eigenvalues
• Linear Algebra Applications for Data Analytics
• K-Means
• Principal Component Analysis (PCA)

48
Data Compression

49
Principal Component Analysis (PCA)

• Given a set of points, how do we know if they can be compressed like in the previous example?
  – The answer is to look into the correlation between the points.
  – The tool for doing this is called Principal Component Analysis (PCA).
50
The intuition of PCA
• By finding the eigenvalues and eigenvectors of the
covariance matrix, we find that the eigenvectors with
the largest eigenvalues correspond to the dimensions
that have the strongest correlation in the dataset.
• This is the principal component.
• PCA is a useful statistical technique that has found application in:
  • fields that need to compress data
  • finding patterns in high-dimensional data.

51
Example of PCA Implementation (Step 0)

• Imagine you have a dataset 𝐷 with two features, 𝑥 and 𝑦, representing some measurements.
• The dataset can be represented as a matrix D = (𝑑1, 𝑑2, …, 𝑑𝑛), where the 𝑖-th column vector 𝑑𝑖 represents the 𝑖-th data sample. Their mean is denoted as 𝑑̄.

Sample No.   𝒙     𝒚
1            2.5   2.4
2            0.5   0.7
3            2.2   2.9
4            1.9   2.2
5            3.1   3.0
6            2.3   2.7
7            2.0   1.6
8            1.0   1.1
9            1.5   1.6
10           1.1   0.9
52
Example of PCA Implementation (Step 1)

• Standardize the dataset so that each feature has a mean of 0. This involves subtracting the mean of each feature from the data points. 𝑆 is the standardized matrix with the mean subtracted.
• Calculate the covariance matrix of 𝐷, denoted as 𝑄:
  𝑄 = [0.616555556  0.615444444; 0.615444444  0.716555556]

Several properties of the covariance matrix:
• 𝑄 is square
• 𝑄 is symmetric
• 𝑄 can be seen as the scatter matrix
• 𝑄 can be very large (in vision, N is often the number of pixels in an image!)
53
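A sketch reproducing this Q in Python/NumPy from the Step 0 table (np.cov uses the same n−1 normalization that yields these numbers):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

D = np.vstack([x, y])                    # 2 x 10: each column is one sample d_i
S = D - D.mean(axis=1, keepdims=True)    # standardized data (per-feature mean subtracted)

Q = np.cov(D)
print(Q)   # [[0.61655556 0.61544444]
           #  [0.61544444 0.71655556]]
```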
Example of PCA Implementation (Step 2)

• Compute the eigenvectors and eigenvalues of the covariance matrix. These will help us determine the principal components.
  • Eigenvectors: directions of the axes of the new feature space.
  • Eigenvalues: magnitudes of variance along the eigenvectors.
• For example, you might find:
  • Eigenvector 1: (−0.672, −0.741) with eigenvalue 1.284
  • Eigenvector 2: (−0.741, 0.672) with eigenvalue 0.049

[Figure: the data points with Eigenvector 1 and Eigenvector 2 drawn through them.]
• The two eigenvectors are orthogonal.
• One of the eigenvectors goes through the middle of the points, like drawing a line of best fit.
• The second eigenvector gives us the other, less important pattern in the data: all the points follow the main line, but are off to the side of it by some amount.
54
Example of PCA Implementation (Step 3)

• Decide how many principal components to keep. Say we keep 𝑘 dimensions; then we take the top 𝑘 eigenvectors 𝑒1, 𝑒2, …, 𝑒𝑘, sorted by their corresponding eigenvalues.
• Then we measure the projection on the eigenvectors by 𝑆ᵀ(𝑒1 𝑒2 … 𝑒𝑘).
• For this example, let's keep just the first principal component, as it explains most of the variance. Here, the 𝑖-th data sample will have the projection on 𝑒1 given by (𝑑𝑖 − 𝑑̄)ᵀ𝑒1.
• By projecting the original data onto the new feature space using the selected eigenvectors, we reduce the dimensionality of the data.
55
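Continuing the Python/NumPy sketch with Steps 2 and 3 on the same data: eigendecompose Q, keep the eigenvector with the largest eigenvalue, and project the mean-subtracted samples onto it (eigenvector signs may differ from the slide, since eigenvectors are only defined up to a scale factor):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])
D = np.vstack([x, y])
S = D - D.mean(axis=1, keepdims=True)      # mean-subtracted data, one sample per column

# Step 2: eigenvectors/eigenvalues of the covariance matrix (eigh, since Q is symmetric).
vals, vecs = np.linalg.eigh(np.cov(D))
order = np.argsort(vals)[::-1]             # sort by decreasing eigenvalue
vals, vecs = vals[order], vecs[:, order]
print(vals)                                # ~ [1.284 0.049]

# Step 3: keep k = 1 component and project each sample: (d_i - d_bar)^T e_1.
e1 = vecs[:, 0]
proj = S.T @ e1                            # ten 1-D coordinates: the compressed data
print(proj)
```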
Final PCA Results (2D→1D)

PCA aims to find the projection that maximizes the variance of the data samples.

56
A slide to take away
• The determinant of a matrix, det(A), is a scalar value that reflects
the volume scaling factor of the linear transformation
represented by the matrix.
• A matrix is invertible if and only if its determinant is non-zero.
• Eigenvalues and eigenvectors of a matrix represent scalars and
non-zero vectors such that the matrix transformation scales the
eigenvectors by their corresponding eigenvalues.
• Linear Algebra applications: K-means and PCA
• K-means is a clustering algorithm that partitions a dataset
into K distinct, non-overlapping groups (clusters) by assigning
each data point to the cluster with the nearest mean.
• PCA is a dimensionality reduction technique that transforms
data into a new coordinate system, where the greatest
variance lies on the first axis, the second greatest variance on
the second axis, and so on, effectively compressing the data
while preserving its most important features.
57
