Math Camp 2024 – Linear Algebra∗

Seonmin Will Heo†

Department of Economics, UC Santa Barbara

August 16, 2024

1 Vectors
A vector can be defined in many different ways, but let us broadly define a vector as a list of objects that
is used to represent and conceptualize ideas.

Definition 1.1. A column vector is an n × 1 matrix:

    [a11 · · · an1]^T   or   [a1 · · · an]^T

while a row vector is a 1 × m matrix:

    [a11 · · · a1m]   or   [a1 · · · am]
Example 1.2. Consider the 2 × 1 vectors u = [2, 1]^T and v = [1, 2]^T. These vectors "live" in R2 (the set of ordered pairs of real numbers). Geometrically, we can interpret R2 as a 2-dimensional plane and vectors as points on the plane.

Definition 1.3. Two vectors in Rn are equal if and only if their corresponding elements are equal.

Example 1.4. Our vectors from above, u = [2, 1]^T and v = [1, 2]^T, are not equal, since their corresponding elements differ.
Definition 1.5. Given two vectors u and v, their sum is the vector u + v, obtained by adding the
corresponding elements.

This lecture note is for personal use only and is not intended for reproduction, distribution, or citation.

This lecture note was originally written by James Banovetz.

Example 1.6. Consider the vectors u = [2, 1]^T and v = [1, 2]^T. Then

    u + v = [2, 1]^T + [1, 2]^T = [3, 3]^T

Definition 1.7. Given a vector u and a real number c, the scalar multiple of u by c is the vector
obtained by multiplying each element in u by c.

Example 1.8. Consider the vector u = [2, 1]^T and the scalar c = 3. Then

    cu = 3 [2, 1]^T = [6, 3]^T

Aside. Try to imagine each element as a scalar weight on a basis vector. For example, the vector [2, 3] can be represented as 2i + 3j, where i and j are the "basis vectors" of the xy coordinate system. This exercise will come in handy later.

Definition 1.9. Two vectors u and v in Rn are parallel if and only if there exists a real number c ∈ R\{0}
such that
u = cv
Definition 1.10. Given vectors v1 , v2 , · · · , vk and scalars c1 , c2 , · · · , ck , the vector y defined by

y = c1 v1 + · · · + ck vk

is called a linear combination of v1 , · · · , vk with weights c1 , · · · , ck .


Definition 1.11. If v1 , · · · , vk ∈ Rn , then the set of all linear combinations of v1 , · · · , vk is called the span of v1 , · · · , vk . That is, the span is the collection of all vectors that can be written in the form

    { w ∈ Rn : w = c1 v1 + c2 v2 + · · · + ck vk for some [c1 · · · ck ]′ ∈ Rk }

Definition 1.12. v1 , · · · , vk ∈ Rn span W if, given any w ∈ W,

    ∃ [c1 · · · ck ]′ ∈ Rk ∋ w = Σ_{i=1}^{k} ci vi

Definition 1.13. A set of vectors v1 , · · · , vk ∈ Rn is linearly dependent if and only if at least one of the vectors can be written as a linear combination of the others. If no vector is a linear combination of the others, the set of vectors is linearly independent.
Remark A set of vectors v1 , · · · , vk ∈ Rn is linearly independent if and only if the vector equation

c1 v1 + c2 v2 + · · · + ck vk = 0n

has only the trivial solution.


Definition 1.14 (trivial solution). From the vector equation

    c1 v1 + c2 v2 + · · · + ck vk = 0n

it is clear that c1 = 0, c2 = 0, . . . , ck = 0 is a solution to such a system; it is called the trivial solution. Any solution in which at least one coefficient has a nonzero value is called a non-trivial solution.

Aside. Suppose that v1 , · · · , vk ∈ Rn are linearly dependent, so that the equation c1 v1 + · · · + ck vk = 0n has a non-trivial solution; WLOG, there exists i ∈ {1, · · · , k} such that ci ≠ 0. Then

    ci vi + Σ_{j≠i} cj vj = 0n    and    vi = − Σ_{j≠i} (cj / ci) vj

Example 1.15. Consider the vectors:

    v1 = [1, 2, 3]^T,   v2 = [4, 5, 6]^T,   v3 = [2, 1, 0]^T

By inspection, v3 = −2v1 + v2 , so the vectors are linearly dependent.

Definition 1.16. v1 , · · · , vk ∈ W ⊂ Rn forms a basis of W if

1. v1 , · · · , vk spans W, and

2. v1 , · · · , vk are linearly independent.

Example 1.17. Consider the vectors [1, 0]^T and [0, −2]^T. These vectors span R2 ; any vector in R2 can be written as a linear combination of these two vectors. Since they are also linearly independent, they form a basis of R2.

Aside. At this point, I highly encourage everyone to watch “Lecture 3: Essence of linear algebra” by 3b1b
in case you haven’t watched it. If you have ever been confused about some notions in linear algebra, this
video may provide you a different way to interpret and understand them.

2 Systems of Linear Equations and Matrix Equations
Definition 2.1. A linear equation in the variables x1 , x2 , · · · , xn is an equation that can be written in
the form
a1 x 1 + a2 x 2 + · · · + an x n = b
where b and the coefficients a1 , . . . , an are real or complex numbers. A system of linear equations is a
collection of one or more linear equations involving the same variables.

Example 2.2. Consider the following system of linear equations:

x1 − 2x2 + x3 = 0
2x2 − 8x3 = 8
−4x1 + 5x2 + 9x3 = −9

Definition 2.3. The compact rectangular array containing the coefficients of each variable aligned in
columns is called the coefficient matrix.

Example 2.4. From the system of linear equations above, the coefficient matrix is

    [  1  −2   1 ]
    [  0   2  −8 ]
    [ −4   5   9 ]

Definition 2.5. The coefficient matrix, concatenated with a column consisting of the right-hand-side
constants is called the augmented matrix.

Example 2.6. From the system of linear equations above, the augmented matrix is

    [  1  −2   1 |  0 ]
    [  0   2  −8 |  8 ]
    [ −4   5   9 | −9 ]

Definition 2.7. The size of a matrix tells how many rows and columns it has. An m × n matrix has m
rows and n columns:

    [ a11  a12  a13  . . .  a1n ]
    [ a21  a22  a23  . . .  a2n ]
    [ a31  a32  a33  . . .  a3n ]
    [  ⋮    ⋮    ⋮    ⋱     ⋮  ]
    [ am1  am2  am3  . . .  amn ]

Definition 2.8. The element aij is the value in row i, column j.

Definition 2.9. The three basic operations that are used to solve the system of equations are known as
elementary row operations. These operations consist of:

• Scaling - Multiply all entries in a row by a non-zero constant

• Replacement - Replace one row by the sum of itself and the multiple of another row

• Interchange - Interchange any two rows

Example 2.10. Consider the augmented matrix from our system above:

    [  1  −2   1 |  0 ]
    [  0   2  −8 |  8 ]
    [ −4   5   9 | −9 ]

Replacing R3 with R3 + 4R1:

    [ 1  −2   1 |  0 ]
    [ 0   2  −8 |  8 ]
    [ 0  −3  13 | −9 ]

Multiplying R2 by 1/2:

    [ 1  −2   1 |  0 ]
    [ 0   1  −4 |  4 ]
    [ 0  −3  13 | −9 ]

Replacing R3 with R3 + 3R2:

    [ 1  −2   1 | 0 ]
    [ 0   1  −4 | 4 ]
    [ 0   0   1 | 3 ]

Replacing R1 with 2R2 + R1:

    [ 1  0  −7 | 8 ]
    [ 0  1  −4 | 4 ]
    [ 0  0   1 | 3 ]

Replacing R1 with 7R3 + R1:

    [ 1  0   0 | 29 ]
    [ 0  1  −4 |  4 ]
    [ 0  0   1 |  3 ]

Replacing R2 with 4R3 + R2:

    [ 1  0  0 | 29 ]
    [ 0  1  0 | 16 ]
    [ 0  0  1 |  3 ]

The simplified system: x1 = 29, x2 = 16, x3 = 3.

Aside. This matrix is now in what we call reduced row echelon form. Don't worry about the terminology; what is important is the row operations themselves, which may come in handy a few times throughout the year. Also note that not all linear systems have unique solutions; a system may have a unique solution, infinitely many solutions, or no solution.

Definition 2.11. If A is an m × n matrix with columns a1 , · · · , an ∈ Rm , and if x ∈ Rn , then the product of A and x, denoted Ax, is the linear combination of the columns of A using the corresponding entries in x as weights:

    Ax = [a1 · · · an] [x1 · · · xn]^T = x1 a1 + · · · + xn an        (Column Expansion)

and the equation Ax = b is called a matrix equation.

Theorem 2.12. If A is an m × n matrix with columns a1 , . . . , an ∈ Rm and if b ∈ Rm , then the matrix equation Ax = b has the same solution set as the system of linear equations whose augmented matrix is

    [ a1 · · · an  b ]

Example 2.13. Consider our first example, the linear system:

x1 − 2x2 + x3 = 0
2x2 − 8x3 = 8
−4x1 + 5x2 + 9x3 = −9

Let A be the coefficient matrix:

    A = [  1  −2   1 ]
        [  0   2  −8 ]
        [ −4   5   9 ]

Then we can rewrite the linear system as a matrix equation Ax = b:

    [  1  −2   1 ] [ x1 ]   [  0 ]
    [  0   2  −8 ] [ x2 ] = [  8 ]
    [ −4   5   9 ] [ x3 ]   [ −9 ]

Aside. We will work with matrix equations quite a bit during the first year, especially when we learn a regression framework. However, we will almost never apply elementary row operations by hand to convert matrices into reduced row echelon form. We are revisiting them because they are taught in a linear algebra course, and we would be remiss not to know what they are. The key takeaway is that these elementary row operations do not alter the solution set. It is a good time to start thinking about how matrix equations are written and what they mean.

    yi = xi1 β1 + xi2 β2 + · · · + xik−1 βk−1 + βk + ui    ∀ i = 1, · · · , n
       = xi^T β + ui                                       ∀ i = 1, · · · , n
    =⇒ y = Xβ + u

3 Matrix Operations
Definition 3.1. Let A and B be m × n matrices. Let aij and bij denote the elements in the ith row and
jth column of the matrices A and B, respectively. Then A and B are equal, denoted A = B, if and only
if aij = bij for all i and all j.

Aside. Note that this is simply an extension of our definition for vector equality; indeed, we could define
matrix equality in terms of vector equality, i.e., two matrices are equal if and only if their corresponding
vectors are equal.

Definition 3.2. If A and B are both m × n matrices, then the sum A + B is the m × n matrix whose
elements are the sums of the corresponding elements from A and B. The difference A − B is the m × n
matrix whose elements are the differences of the corresponding elements of A and B.

    A + B = [ a11  · · ·  a1n ]   [ b11  · · ·  b1n ]   [ a11+b11  · · ·  a1n+b1n ]
            [  ⋮    ⋱     ⋮  ] + [  ⋮    ⋱     ⋮  ] = [    ⋮       ⋱       ⋮    ]
            [ am1  · · ·  amn ]   [ bm1  · · ·  bmn ]   [ am1+bm1  · · ·  amn+bmn ]

Definition 3.3. If k is a scalar and A is a matrix, then the scalar multiple kA is the matrix whose
elements are k times the corresponding elements in A.

    kA = [ ka11  ka12  · · ·  ka1n ]
         [ ka21  ka22  · · ·  ka2n ]
         [   ⋮     ⋮     ⋱     ⋮  ]
         [ kam1  kam2  · · ·  kamn ]

Definition 3.4. If A is a m×n matrix and B is a n×k matrix with columns b1 , · · · , bk , then the product
AB is the m × k matrix whose columns are Ab1 , · · · , Abk . Note that the number of columns of A must
match the number of rows in B for the matrices to be conformable.

Example 3.5. Suppose we have two matrices: A1×2 and B2×3 . Then AB is permissible, but BA is not. If we have A1×2 B2×3 = C1×3 , then each entry of C is the inner product of the corresponding row of A and column of B:

• For c11 : c11 = a11 b11 + a12 b21

• For c12 : c12 = a11 b12 + a12 b22
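To make the conformability rule concrete, here is a small sketch (assuming NumPy; the entries of A and B are made up for illustration):

import numpy as np

A = np.array([[1.0, 2.0]])            # 1 x 2
B = np.array([[3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0]])       # 2 x 3

C = A @ B                             # conformable: (1 x 2)(2 x 3) = 1 x 3
print(C)                              # [[15. 18. 21.]]; c11 = 1*3 + 2*6 = 15, as in the formula above
# B @ A would raise a ValueError, since (2 x 3)(1 x 2) is not conformable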

Aside. There are a few special matrices with which you should be familiar. For example:
1. Identity Matrix (a diagonal matrix of ones)

    In = [ 1  0  · · ·  0 ]
         [ 0  1  · · ·  0 ]
         [ ⋮  ⋮    ⋱   ⋮ ]
         [ 0  0  · · ·  1 ]

Note that for all n × n matrices A:

    A In = In A = A

2. Null Matrix (a matrix of zeros)

    0m×n = [ 0  0  · · ·  0 ]
           [ 0  0  · · ·  0 ]
           [ ⋮  ⋮    ⋱   ⋮ ]
           [ 0  0  · · ·  0 ]

Note that for all matrices A and “conformable” matrices 0:

A+0=A 0+A=A
0A = 0 A0 = 0

3. Idempotent Matrix (a matrix that equals the product of itself with itself)

    A = AA

Theorem 3.6. Let A be a m × n matrix, and let B and C have dimensions for which the indicated sums
and products are defined. Then

A(BC) = (AB)C = ABC (Associative Law)


A(B + C) = AB + AC (Left Distributive Law)
(B + C)A = BA + CA (Right Distributive Law)

Aside. Note that in matrix algebra, we do not have commutativity; that is, AB ̸= BA in general. It
may be true of specific matrices, but in most cases (as in an example above), the matrices won’t even be
conformable in both orders.
Definition 3.7. Given a m × n matrix A, the transpose of A is the n × m matrix, denoted AT or A′ ,
whose columns are formed from the corresponding rows of A.
Example 3.8. Consider the following matrices:

    A = [ 3  8  −9 ]        B = [ 3  4 ]
        [ 1  0   4 ]            [ 1  7 ]

    A^T = [  3  1 ]         B^T = [ 3  1 ]
          [  8  0 ]               [ 4  7 ]
          [ −9  4 ]

Remark. A property of matrices that occasionally comes in handy is symmetry, which means A = AT .
Later on, we’ll talk about Hessian matrices, which have this property.

Theorem 3.9. Let A and B denote matrices whose sizes are appropriate for the following sums and
products.

(AT )T = A
(A + B)T = AT + B T
(AB)T = B T AT

Definition 3.10. The determinant of a square matrix A, denoted by |A| or det(A), is a uniquely defined
scalar associated with the matrix. For matrices of various sizes:
1. 1 × 1 Matrix: A = [a]

    |A| = a

2. 2 × 2 Matrix: A = [ a  b ]
                     [ c  d ]

    |A| = | a  b | = ad − bc
          | c  d |

3. Larger matrices: check Appendix 1.


Theorem 3.11.

1. Taking the transpose does not affect the determinant:

    |A| = |A^T|

2. Scaling a row by k will change the value of the determinant k-fold, e.g.:

    | ka  kb | = k | a  b |
    |  c   d |     | c  d |

3. Replacement – adding a multiple of a row (or column) to another row (or column) – leaves the determinant unchanged, e.g.:

    |   a      b   | = | a  b |
    | c+ka   d+kb  |   | c  d |

4. Interchanging any two rows (or columns) reverses the sign of the determinant (but does not change the absolute value), e.g.:

    | a  b | = − | c  d |
    | c  d |     | a  b |
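These determinant rules are easy to verify numerically; here is a minimal sketch (NumPy assumed, with an arbitrary 2 × 2 example):

import numpy as np

A = np.array([[3.0, 8.0],
              [1.0, 4.0]])
d = np.linalg.det(A)                                  # 3*4 - 8*1 = 4

print(np.isclose(np.linalg.det(A.T), d))              # transpose leaves |A| unchanged: True
B = A.copy()
B[0, :] *= 5                                          # scale the first row by k = 5
print(np.isclose(np.linalg.det(B), 5 * d))            # determinant scales k-fold: True
C = A[[1, 0], :]                                      # interchange the two rows
print(np.isclose(np.linalg.det(C), -d))               # sign is reversed: True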
Definition 3.12. A matrix A is singular if and only if

|A| = 0

which occurs exactly when the rows (or columns) are linearly dependent.
Remark. I’m bringing this to your attention primarily for econometric reasons; perfect collinearity means that one of your explanatory variables is a linear combination of the others. In this case, STATA, R, or whatever program you ultimately use to run regressions will fail (as they use matrix algebra to do OLS).
Definition 3.13. An n × n matrix A is said to be invertible if there is an n × n matrix C such that

    CA = In    and    AC = In

In this case, C is the inverse of A. In fact, the inverse is unique, so it is typically denoted A^{-1}.

Theorem 3.14. If A is a 2 × 2 matrix

    A = [ a  b ]
        [ c  d ]

and ad − bc ≠ 0 (i.e., the determinant is non-zero/the matrix is invertible), then the inverse is given by the formula:

    A^{-1} = ( 1/(ad − bc) ) [  d  −b ]
                             [ −c   a ]

Check Appendix 2 for larger matrices.

Example 3.15. Consider the matrix

    A = [ 3  4 ]
        [ 5  6 ]

Since the determinant is |A| = (3 · 6) − (4 · 5) = −2 ≠ 0, A is invertible:

    |A| = −2                                    (the determinant)

    A^{-1} = (1/−2) [  6  −4 ]                  (by Theorem 3.14)
                    [ −5   3 ]

    A^{-1} = [  −3     2   ]                    (simplifying)
             [ 5/2  −3/2   ]

Checking to make sure we did everything properly:

    A A^{-1} = [ 3  4 ] [  −3     2  ]          (multiplying matrices)
               [ 5  6 ] [ 5/2  −3/2 ]

    A A^{-1} = [ 1  0 ]                         (simplifying)
               [ 0  1 ]
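As a sanity check on Example 3.15, NumPy's built-in inverse (assumed available) reproduces the same matrix:

import numpy as np

A = np.array([[3.0, 4.0],
              [5.0, 6.0]])
A_inv = np.linalg.inv(A)
print(A_inv)                               # [[-3.   2. ]
                                           #  [ 2.5 -1.5]]
print(np.allclose(A @ A_inv, np.eye(2)))   # True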

Theorem 3.16. If A and B are n × n invertible matrices, then

1. A−1 is invertible and


(A−1 )−1 = A

2. AB is invertible and
(AB)−1 = B −1 A−1

3. AT is invertible and
(AT )−1 = (A−1 )T

Example 3.17. Given an n × k matrix X where n > k, assume that X^T X is invertible. Then

    ( X (X^T X)^{-1} X^T )^T = (X^T)^T ( (X^T X)^{-1} )^T X^T
                             = X ( (X^T X)^T )^{-1} X^T
                             = X (X^T X)^{-1} X^T                          (Symmetric)

    X (X^T X)^{-1} X^T X (X^T X)^{-1} X^T = X (X^T X)^{-1} Ik X^T
                                          = X (X^T X)^{-1} X^T             (Idempotent)
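A quick numerical illustration of these two properties (a sketch assuming NumPy; X is an arbitrary simulated matrix):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                 # n = 6, k = 2; full column rank with probability one
P = X @ np.linalg.inv(X.T @ X) @ X.T        # the n x n matrix from Example 3.17

print(np.allclose(P, P.T))                  # symmetric: True
print(np.allclose(P @ P, P))                # idempotent: True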

4 Connecting Concepts
Theorem 4.1. Let A be a n × n matrix. Then the following statements are equivalent:
1. |A| ≠ 0

2. A is invertible

3. The equation Ax = b has a unique solution x = A−1 b for each b ∈ Rn

4. The equation Ax = 0 has only the trivial (i.e., x = 0n ) solution

5. The columns of A form a linearly independent set

6. The columns of A span Rn

7. A has full rank


Aside. With the determinant and inverse, we now have a few different ways to solve linear systems that
we may come across during the first year:
1. Standard substitution (potentially very time consuming)

2. Row operations on an augmented matrix

3. Inverting A to find x = A−1 b


In reality, however, you aren’t asked to solve entire linear systems all that frequently; the first two tools
frequently suffice. Understanding how inverses work, however, is important for OLS. There is one additional
solution concept that you may also find helpful.
Theorem 4.2. Given a system of equations Ax = b, where A is an n × n invertible matrix and b ∈ Rn, let Ai(b) be the matrix obtained from A by replacing column i with b:

    Ai(b) = [ a1  . . .  ai−1  b  ai+1  . . .  an ]

Then for any b ∈ Rn, the unique solution x of Ax = b has entries given by:

    xi = |Ai(b)| / |A| ,    i = 1, 2, . . . , n

Check Appendix 3 (Cramer's Rule) for an example in economics.
Definition 4.3. The rank of a matrix is the number of nonzero rows in its row echelon form.
Theorem 4.4. Let A be a m × n matrix and B be a n × k matrix. Then
1. rank(A) = rank(AT )

2. rank(A) ≤ min{m, n}

3. rank(AB) ≤ min{rank(A), rank(B)}

4. rank(A) = rank(AT A) = rank(AAT )


Aside. Given a n × k matrix X where n > k, suppose that rank(X) = k (full rank). Then
rank(X) = rank(XT X) = k
Therefore, XT X is invertible.
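This fact can be illustrated numerically (a sketch assuming NumPy; X is simulated for illustration):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                   # n = 10, k = 3
print(np.linalg.matrix_rank(X))                # 3 (full column rank)
print(np.linalg.matrix_rank(X.T @ X))          # 3, so the 3 x 3 matrix X'X is invertible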

5 Quadratic Forms
Definition 5.1. A quadratic form on Rn is a function Q defined on Rn whose value at a vector x ∈ Rn
can be written in the form
Q(x) = xT Ax
where A is a n × n symmetric matrix.

Example 5.2. Consider a two-variable quadratic form

    Q(x) = a x^2 + 2b xy + c y^2                    (a two-variable quadratic form)

    Q(x) = [ x  y ] [ a  b ] [ x ]                  (rewriting in matrix form)
                    [ b  c ] [ y ]

Note that the matrix of constants (A) is not unique!

    [ x  y ] [ 1  0 ] [ x ]  =  [ x  y ] [  1  1 ] [ x ]
             [ 0  1 ] [ y ]              [ −1  1 ] [ y ]

Aside. The matrix associated with a quadratic form need not be symmetric. However, there is no loss of generality in assuming A is symmetric. We can always take definite and semidefinite matrices to be symmetric since they are defined by a quadratic form. Specifically, consider a nonsymmetric matrix A and replace it with (1/2)(A + A^T). The new matrix is symmetric and yields the same Q(x).

Aside. Frequently, you will also see quadratic forms written in double-sum notation:

    x^T A x = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj

where aij are elements of A, which is symmetric, i.e., aij = aji ∀ i, j. If the off-diagonal elements are all zero, then

    x^T A x = Σ_{i=1}^{n} aii xi^2

Definition 5.3. Let Q(x) be a quadratic form. Then Q(x) is:

1. Positive definite if Q(x) > 0 for all x ̸= 0

2. Negative definite if Q(x) < 0 for all x ̸= 0

3. Indefinite if Q(x) assumes both positive and negative values

Quadratic forms can also be positive semi-definite or negative semi-definite if we replace the strict inequalities with weak inequalities. A matrix A that satisfies one of these definite or semi-definite conditions is called a positive (semi)definite or negative (semi)definite matrix.

Example 5.4. Consider the quadratic form:

    x^T A x = [ x1  x2 ] [  2  −1 ] [ x1 ]
                         [ −1   2 ] [ x2 ]

This is a positive definite quadratic form, e.g., if x^T = [ 1  1 ]:

    x^T A x = [ 1  1 ] [  2  −1 ] [ 1 ]             (plugging in values)
                       [ −1   2 ] [ 1 ]
    x^T A x = 2

Similarly, if x^T = [ −3  2 ]:

    x^T A x = [ −3  2 ] [  2  −1 ] [ −3 ]           (plugging in values)
                        [ −1   2 ] [  2 ]
    x^T A x = 38

Indeed, any vector in R2 (besides the zero vector) will produce a strictly positive value.

Example 5.5. We can prove that a positive definite matrix An×n is nonsingular. In other words,

    Positive Definite =⇒ Nonsingular

An easy way to do this is a proof by contradiction (from the logic class).

• P =⇒ Q is equivalent to ¬P ∨ Q

• ¬(P =⇒ Q) is equivalent to ¬(¬P ∨ Q), or P ∧ ¬Q

• Def. of Positive Definite: ∀ x ≠ 0, x^T A x > 0

• Theorem 4.1: if A is singular, ∃ x ≠ 0 ∋ Ax = 0

So, towards a contradiction, we suppose both that ∀ x ≠ 0, x^T A x > 0 and that ∃ y ≠ 0 ∋ y^T A y = 0.

Proof.

    Let A be positive definite                  (by hypothesis)
    =⇒ ∀ x ≠ 0, x^T A x > 0                     (by def. of positive definite)
    Suppose that A is singular                  (towards a contradiction)
    =⇒ ∃ y ≠ 0 ∋ Ay = 0                         (by Theorem 4.1)
    =⇒ y^T A y = 0                              (premultiplying by y^T)
    Thus, a contradiction
    =⇒ A is nonsingular

Definition 5.6. Let A be an n × n matrix with elements aij. A principal minor is a minor produced by deleting a row and the column with the same index (i.e., the row and column associated with a diagonal element aii). A kth-order principal minor is produced by deleting n − k rows and the corresponding n − k columns, leaving a kth-order determinant.

Example 5.7. Consider the matrix


 
a11 a12 a13
A = a21 a22 a23 
a31 a32 a33

The 1st-order principal minors are:

    |a11| = a11        (deleting the 2nd and 3rd rows/columns)
    |a22| = a22        (deleting the 1st and 3rd rows/columns)
    |a33| = a33        (deleting the 1st and 2nd rows/columns)

The 2nd-order principal minors are:

    | a11  a12 | = a11 a22 − a12 a21       (deleting the 3rd row/column)
    | a21  a22 |

    | a11  a13 | = a11 a33 − a13 a31       (deleting the 2nd row/column)
    | a31  a33 |

    | a22  a23 | = a22 a33 − a23 a32       (deleting the 1st row/column)
    | a32  a33 |

The 3rd-order principal minor is:

    | a11  a12  a13 |
    | a21  a22  a23 | = |A|                (deleting no rows/columns)
    | a31  a32  a33 |

Definition 5.8. The leading principal minors are the principal minors that are associated with con-
tiguous, upper-left elements of the matrix.

Example 5.9. For the 3 × 3 matrix A in the preceding example, the leading principal minors are

    |A1| = |a11|                      (the 1st-order LPM)

    |A2| = | a11  a12 |               (the 2nd-order LPM)
           | a21  a22 |

    |A3| = | a11  a12  a13 |          (the 3rd-order LPM)
           | a21  a22  a23 |
           | a31  a32  a33 |

Aside. Note that in the example, we are using Simon and Blume’s notation. This is by no means consistent
throughout the various mathematical economics textbooks available; indeed, notation and terminology vary
quite a bit. We always start enumerating leading principal minors, however, with element a11 , which is
the upper-left corner of every leading principal minor.

Theorem 5.10. Let A be a n × n symmetric matrix. then the quadratic form xT Ax is:

• positive definite if and only if all of its leading principal minors are strictly positive

• positive semi-definite if and only if all of its principal minors are non-negative

• negative definite if and only if all of its leading principal minors follow the signing convention

|A1 | < 0 |A2 | > 0 |A3 | < 0 . . .

• negative semi-definite if and only if all of its principal minors of odd order are ≤ 0 and all of its principal minors of even order are ≥ 0.

• indefinite if none of the above hold
Example 5.11. Consider the matrix

    A = [  1  −1   0 ]
        [ −1   6  −2 ]
        [  0  −2   3 ]

We can determine the definiteness of the matrix by first evaluating the leading principal minors:

    |1| = 1

    |  1  −1 | = 5
    | −1   6 |

    |  1  −1   0 |
    | −1   6  −2 | = 11
    |  0  −2   3 |

All the LPMs are positive, so A is positive definite. Note that if one of the leading principal minors was equal to zero, we'd need to check all of the other PMs for positive semi-definiteness.
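A numerical cross-check of Example 5.11 (a sketch assuming NumPy): the leading principal minors are determinants of the upper-left blocks, and (anticipating Theorem 6.13) positive definiteness can also be confirmed from the eigenvalues:

import numpy as np

A = np.array([[ 1.0, -1.0,  0.0],
              [-1.0,  6.0, -2.0],
              [ 0.0, -2.0,  3.0]])

lpms = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]   # upper-left 1x1, 2x2, 3x3 determinants
print(np.round(lpms, 6))                                 # [ 1.  5. 11.] -> all positive

print(np.all(np.linalg.eigvalsh(A) > 0))                 # True: all eigenvalues positive, so A is positive definite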
The material below (connecting definiteness to the Hessian) is going to be covered in the Analysis session.

Aside. This has a close relationship to optimization. Consider a function f : A ⊂ Rn → R which is of class C^2. Then the Hessian matrix is given by

    H(x) = [ f11  f12  . . .  f1n ]
           [ f21  f22  . . .  f2n ]
           [  ⋮    ⋮     ⋱    ⋮  ]
           [ fn1  fn2  . . .  fnn ]

where fij denotes the second derivative ∂^2 f(x)/∂xi ∂xj. Consider a two-variable function z = f(x, y). Then the second-order total differential is given by:

    dz = fx dx + fy dy                                       (1st-order total differential)
    d^2 z = fxx dx^2 + fxy dxdy + fyx dydx + fyy dy^2        (2nd-order total differential)
    d^2 z = fxx dx^2 + 2 fxy dxdy + fyy dy^2                 (by Young's Theorem)

We can re-write this in matrix notation:

    d^2 z = [ dx  dy ] [ fxx  fxy ] [ dx ]                   (the quadratic form)
                       [ fyx  fyy ] [ dy ]

We can rewrite the 2nd-order total differential in quadratic form, using the Hessian.
Theorem 5.12. Let f be a twice-continuously differentiable function. If the associated Hessian is

1. positive definite, then f(·) is strictly convex

2. positive semi-definite, then f(·) is convex

3. negative definite, then f(·) is strictly concave

4. negative semi-definite, then f(·) is concave

Basically, the intuition is that if we're at a point where the derivatives equal zero, then for any small movements dx and dy, we will get a positive movement d^2 z if the Hessian is positive definite, which means that the function is convex. We will talk about these topics more when we talk about constrained optimization.

6 Eigenvalues and Eigenvectors
Definition 6.1. For an n × n matrix A, the trace is the sum of the elements along the main diagonal:
    tr(A) = Σ_{i=1}^{n} aii

Example 6.2. Consider the identity matrix:


 
1 0 0
I3 = 0 1 0
0 0 1

The trace is the sum along the main diagonal, so tr(I3 ) = 3 (indeed, the trace of any identity matrix is
the dimensionality of that matrix).

Theorem 6.3. Traces have the following properties:

1. (Linearity) Let A and B be n × n matrices and c ∈ R.

tr(A + B) = tr(A) + tr(B)


c · tr(A) = tr(cA)

2. (Transpose) Let A be a n × n matrix.

tr(AT ) = tr(A)

3. (Product) Let A be a m × n matrix and let B be a n × m matrix.

tr(AB) = tr(BA)

Example 6.4. Given an n × k matrix X where n > k, suppose that rank(X) = k (full rank). Then

    tr( In − X (X^T X)^{-1} X^T ) = tr(In) − tr( X (X^T X)^{-1} X^T )
                                  = n − tr( (X^T X)^{-1} X^T X )
                                  = n − tr(Ik)
                                  = n − k

Example 6.5. Given a vector û ∈ Rn,

    û^T û = tr( û û^T )

Definition 6.6. An eigenvector of a n × n matrix A is a nonzero vector x such that Ax = λx for some
scalar λ. A scalar λ is called an eigenvalue of A if there is a nontrivial solution x of Ax = λx. Note that
we can rewrite this equation to be:
(A − λIn )x = 0

Example 6.7. Consider the matrix and vector:

    A = [ 1  6 ]      and      u = [  6 ]
        [ 5  2 ]                   [ −5 ]

We can show that u is an eigenvector of A:

    Au = [ 1  6 ] [  6 ]                (the product)
         [ 5  2 ] [ −5 ]

       = [ (6 − 30)  ]                  (multiplying)
         [ (30 − 10) ]

       = [ −24 ]                        (simplifying)
         [  20 ]

    Au = −4 [  6 ]                      (factoring)
            [ −5 ]

Thus u and −4 are an eigenvector and an eigenvalue, respectively.

Aside. Recall that one of our theorems states that Bx = 0 has a non-trivial solution x if and only if B is
singular. Thus, for there to be a non-trivial eigenvector, it must be that (A − λIn ) is singular (even if A
itself is not singular).

Definition 6.8. For a n × n matrix A, the scalar equation

|(A − λIn )| = 0

is called the characteristic equation of A.

Theorem 6.9. A scalar λ is an eigenvalue of an n × n matrix A if and only if λ satisfies the characteristic equation of A.

Example 6.10. Suppose we have the following 2 × 2 matrix:

    [ a  b ]
    [ c  d ]

To find the eigenvalues, we want to pick the λs such that:

    | a−λ    b  | = (a − λ)(d − λ) − bc = 0
    |  c    d−λ |

This is the characteristic equation for the matrix.

Example 6.11. Consider the matrix

    A = [ 2   2 ]
        [ 2  −1 ]

We can find the associated eigenvalues by employing the characteristic equation:

    |A − λI2| = | 2−λ    2   |           (the characteristic equation)
                |  2   −1−λ  |
              = (2 − λ)(−1 − λ) − 4      (the determinant)
              = λ^2 − λ − 6              (simplifying)

    λ^2 − λ − 6 = 0                      (setting equal to zero)
    (λ − 3)(λ + 2) = 0                   (factoring)
Thus, λ1 = 3 and λ2 = −2 are the eigenvalues. To find the associated eigenvectors, recall that we need (A − λI2)x = 02. Thus, for our first eigenvalue:

    (A − λ1 In) x = 0                               (our condition)

    [ 2−3     2  ] [ x1 ] = [ 0 ]                   (plugging in values)
    [  2   −1−3  ] [ x2 ]   [ 0 ]

    [ −1   2 ] [ x1 ] = [ 0 ]                       (simplifying)
    [  2  −4 ] [ x2 ]   [ 0 ]

Note that the rows are linear combinations of one another; this reduces to the equation:

    x1 = 2x2                                        (reducing the system)

Note that eigenvectors are not unique; to force a standardized solution, we frequently normalize the vectors (i.e., make them length one):

    √(x1^2 + x2^2) = 1                              (normalizing)
    (2x2)^2 + x2^2 = 1                              (plugging in for x1)
    5 x2^2 = 1                                      (simplifying)
    x2 = 1/√5                                       (solving for x2 using the positive root)

Thus, our eigenvector associated with λ1 is

    x1 = [ 2/√5 ]
         [ 1/√5 ]

If we performed the same steps for our second eigenvalue, λ2:

    x2 = [ −1/√5 ]
         [  2/√5 ]

Note that in general, we may have n distinct eigenvalues and n associated eigenvectors, all of which we normalize to length one.
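The same eigenvalues and unit-length eigenvectors can be recovered numerically (a sketch assuming NumPy):

import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, -1.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # eigenvalues 3 and -2 (order may vary)
print(eigvecs)    # columns are unit-length eigenvectors; the one for lambda = 3 is [2, 1]/sqrt(5), up to sign
# eigenvectors are only pinned down up to scale (and sign), so signs may differ from the hand calculation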
Definition 6.12. A square n × n matrix A is called diagonalizable if there exists an n × n invertible
matrix P such that P −1 AP is a diagonal matrix.

A : diagonalizable ⇐⇒ ∃P : P −1 AP is diagonal

Theorem 6.13. Let A be an n × n matrix. If λ1 , · · · , λn are the eigenvalues of A, then:

1. The trace equals the sum of the eigenvalues

    tr(A) = Σ_{i=1}^{n} λi

2. The determinant of A equals the product of the eigenvalues

    |A| = Π_{i=1}^{n} λi

3. The trace of A^k is the sum of the eigenvalues raised to the kth power

    tr(A^k) = Σ_{i=1}^{n} λi^k

4. Let AT = A.

(a) Eigenvalues of A are real.


(b) Eigenvectors corresponding to distinct eigenvalues are pairwise orthogonal.

Proof. Pick i, j ∈ {1, · · · , n} such that λi ̸= λj .

λi xTj xi = xTj (λi xi ) = xTj (Axi ) = xTj (AT xi ) = (xTj AT )xi = (Axj )T xi = (λj xj )T xi = λj xTj xi

(λi − λj )xTj xi = 0 ⇒ xTj xi = 0

(c) A can be diagonalized.


AX = XΛ where X T X = In
A = XΛX T

5. Let AT = A.

(a) Positive (semi)definite if and only if the eigenvalues of A are all (weakly) positive.

Proof. Given any v ∈ Rn \ {0}, vT Av = vT XΛX T v = (X T v)T Λ(X T v).

(b) Negative (semi)definite if and only if the eigenvalues of A are all (weakly) negative.
(c) Indefinite if and only if A has both positive and negative eigenvalues.

6. Let AT = A and A is idempotent.

(a) Eigenvalues of A are either 0 or 1.

Proof. Λ2 = X T A2 X = X T AX = Λ Therefore, λ ∈ {0, 1}.

(b) A is positive semidefinite.

Proof. Given any v ∈ Rn \ {0}, vT Av = vT (XΛX T )v = (X T v)T Λ(X T v) ≥ 0.

(c) rank(A) = tr(A).

7. Let A^T = A and A be positive definite. Then there exists C such that A = C^T C.

    Proof. A = XΛX^T = XΛ^{1/2} Λ^{1/2} X^T = (Λ^{1/2} X^T)^T (Λ^{1/2} X^T)

Example 6.14. Let X be a n × k matrix with rank(X)= k (n ≥ k).

MX := In − X(XT X)−1 XT

1. MX is symmetric.

2. MX is idempotent.

7 Vector Spaces and Norms
Aside. Building on what we already know about vectors, we can start thinking about vector spaces. Recall that earlier, we mentioned that two linearly independent vectors v, u ∈ R2 span R2. Indeed, R2 is the vector space defined by the various linear combinations of v and u. In practice, Rn is the vector space we usually deal with; however, it is probably a good idea for us to know a more formal definition.
Definition 7.1. A vector space is a set of vectors V equipped with addition and scalar multiplication
that satisfies the following properties for vectors u, v, and w and scalars a and b:
• Associativity of addition: u + (v + w) = (u + v) + w

• Commutativity of addition: u + v = v + u

• The identity element of addition: ∃ 0 ∈ V ∋ u + 0 = u ∀ u ∈ V

• The inverse element of addition: ∀ u ∈ V ∃ − u ∈ V ∋ u + (−u) = 0

• Compatibility of scalar and field multiplication: a(bu) = (ab)u

• The identity element of scalar multiplication: 1u = u

• Distributivity of scalar multiplication with respect to vector addition: a(u + v) = au + av

• Distributivity of scalar multiplication with respect to scalar addition: (a + b)u = au + bu


Definition 7.2. Given a vector space V , a subset W ⊂ V is a subspace if and only if given any λ, µ ∈ R
and v, w ∈ W ,
λv + µw ∈ W
Remark
• The zero vector 0 is in W

• If u, v ∈ W , then u + v ∈ W

• If u is in W and k is a scalar, then ku is in W


Aside. Again, the concepts of vector spaces and subspaces will probably not be particularly relevant to
you in terms of your first-year coursework, but they are concepts with which you should be familiar. In
particular, there are a few subspaces which may come up, particularly in econometrics.
Definition 7.3. The column space of a matrix A, denoted Col(A), is the set of all linear combinations
of the columns of A.
Definition 7.4. The null space of a matrix A, denoted Nul(A), is the set of all solutions to the homo-
geneous equation Ax = 0.
Aside. Let A be an m × n matrix.

    Col(A) = {b ∈ Rm : Ax = b for some x ∈ Rn}
    Nul(A) = {x ∈ Rn : Ax = 0}
    Col(A^T) = {b ∈ Rn : A^T x = b for some x ∈ Rm}
    Nul(A^T) = {x ∈ Rm : A^T x = 0}

Example 7.5. Consider the matrix

    B = [ 1  0  −3   5  0 ]
        [ 0  1   2  −1  0 ]
        [ 0  0   0   0  1 ]
        [ 0  0   0   0  0 ]

A basis for the column space of this matrix is the set of vectors

    b1 = [1, 0, 0, 0]^T    b2 = [0, 1, 0, 0]^T    b3 = [0, 0, 1, 0]^T

These are linearly independent, and all other columns of the matrix are linear combinations of these.

Definition 7.6. A metric space (X, d) is a set X and a function, d : X × X → R, such that for any
x, y, z ∈ X, it satisfies the following properties:

• d(x, y) ≥ 0

• d(x, y) = 0 ⇐⇒ x = y

• d(x, y) = d(y, x)

• d(x, z) ≤ d(x, y) + d(y, z)

Definition 7.7. A normed vector space (V, || · ||) is a vector space V and a function, || · || : V → R,
such that for any x, y ∈ V and for any α ∈ R, it satisfies the following properties:

• ||x|| ≥ 0

• ||x|| = 0 if and only if x = 0

• ||αx|| = |α| · ||x||

• ||x + y|| ≤ ||x|| + ||y||

Aside. Norms are the first step in getting us a notion of distance in a space. While we won’t end up using
norms explicitly all that frequently during the first year, they come about a great deal implicitly, any time
we talk about things like minimum distance, or minimum distance estimators. We typically deal with one
norm, but there are a few others.

Example 7.8. The following are all norms, which may be familiar:

1. Euclidean Norm (occasionally denoted ||x||2):

    ||x|| = ( Σ_{i=1}^{n} xi^2 )^{1/2}

This is the "as the crow flies" distance from the origin in R2, and is (by far) the most frequent norm we will use.

2. Manhattan Norm (occasionally denoted ||x||1):

    ||x|| = Σ_{i=1}^{n} |xi|

This is akin to measuring distance from the origin on a map laid out as a grid of streets.
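Both norms are available through np.linalg.norm (NumPy assumed), via its ord argument:

import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, ord=2))   # Euclidean norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(x, ord=1))   # Manhattan norm: |3| + |-4| = 7.0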

Example 7.9. Consider x, y ∈ R2. The most common metric we use is the notion of Euclidean distance, i.e.,

    d(x, y) = ( (x1 − y1)^2 + (x2 − y2)^2 )^{1/2}

This is equivalent to stating ||x − y||2.
Example 7.10. Let X ⊂ Rn and S = C(X) be the set of all continuous and bounded functions C : X → R.
Define the metric d : S × S → R such that given any g, h ∈ S
d(g, h) = sup |g(x) − h(x)|
x∈X

Then (S, d) is a metric space.


Proof. We are going to prove that this metric satisfies the four axioms in Definition 7.6. (from proofwiki.org)

1. As d is the supremum of an absolute value, d(g, h) ≥ 0.

2. d(g, g) = sup_{x∈X} |g(x) − g(x)| = sup_{x∈X} |0| = 0

3. d(g, h) = sup_{x∈X} |g(x) − h(x)| = sup_{x∈X} |h(x) − g(x)| = d(h, g)

4. Let f, g, h ∈ S. For any c ∈ X, the absolute value satisfies the triangle inequality, so

    |f(c) − h(c)| ≤ |f(c) − g(c)| + |g(c) − h(c)|
                 ≤ sup_{x∈X} |f(x) − g(x)| + sup_{x∈X} |g(x) − h(x)|    (definition of the supremum)
                 = d(f, g) + d(g, h)                                     (definition of d)

Therefore, d(f, g) + d(g, h) is an upper bound for Y := {|f(c) − h(c)| : c ∈ X}.
Thus, d(f, g) + d(g, h) ≥ sup Y = d(f, h).

Aside. As we move forward in econometrics, micro theory, or anything else during the first year sequences,
we will implicitly assume that any “minimum distance” is minimizing Euclidean distance unless specifically
told otherwise. A notion of distance, however, will be used implicitly (or explicitly) in classes, talks, papers,
etc., so it is a concept with which you should be familiar.

8 Orthogonal Projections
Definition 8.1. Given a nonzero vector x ∈ Rn and a vector y ∈ Rn , then y may be written as a sum of
two vectors
y = ŷ + û
where ŷ = αx for some scalar α and û is orthogonal to x. Then ŷ is the orthogonal projection of y
onto x, and û is the component of y that is orthogonal to x.

Aside. Alternative terminology would state that ŷ is the orthogonal projection of y onto the span of x.
For us, it does not matter which terminology we use. The first is probably more common, but the second
is perhaps a bit easier to understand graphically.

Theorem 8.2. The α that satisfies the definition of the orthogonal projection is given by

    α = (x^T y) / (x^T x)

Proof. Minimizing the squared distance between ax and y:

    f(a) = ||ax − y||^2 = (ax − y)^T (ax − y) = a^2 x^T x − 2a x^T y + y^T y

    ∂f(α)/∂a = 0
    2α x^T x − 2 x^T y = 0
    α = (x^T y) / (x^T x)

The orthogonal projection ŷ is given by the equation:

    ŷ = ( (x^T y) / (x^T x) ) x
Theorem 8.3. Let θ ∈ (0, π) be the angle between two vectors x, y. Then

    x^T y = ||x|| · ||y|| · cos(θ)

Proof.

    αx = ( ||y|| · cos(θ) / ||x|| ) x          (from the trigonometric functions)
    αx = ( x^T y / ||x||^2 ) x                 (from Theorem 8.2 and ||x||^2 = x^T x)

Aside. Two vectors x, y ∈ Rn are orthogonal when the angle between them is right (θ = π/2).
Aside. Note that these formulas look virtually identical to our typical matrix formula for the coefficient estimates in OLS. This is not a coincidence; indeed, if we have only one explanatory variable and no constant, this is the exact formula for β̂ and the fitted values.

Example 8.4. Consider the two vectors

    y = [ 7 ]      and      x = [ 4 ]
        [ 6 ]                   [ 2 ]

We can find the orthogonal projection of y onto the span of x, then write y as the sum of two orthogonal vectors, one in span(x) and one orthogonal to x:

    x^T y = [ 4  2 ] [ 7 ] = 40                    (the numerator)
                     [ 6 ]

    x^T x = [ 4  2 ] [ 4 ] = 20                    (the denominator)
                     [ 2 ]

    ŷ = ( x^T y / x^T x ) x = (40/20) [ 4 ]        (the projection formula)
                                      [ 2 ]

    ŷ = [ 8 ]                                      (simplifying)
        [ 4 ]

So the projection of y onto x is [8, 4]^T. The orthogonal component is û = y − ŷ:

    û = y − ŷ = [ 7 ] − [ 8 ]                      (by def. of û)
                [ 6 ]   [ 4 ]

    û = [ −1 ]                                     (simplifying)
        [  2 ]

Thus, the vector y can be written as

    y = [ 8 ] + [ −1 ]
        [ 4 ]   [  2 ]
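The same projection in a few lines (a sketch assuming NumPy):

import numpy as np

y = np.array([7.0, 6.0])
x = np.array([4.0, 2.0])

alpha = (x @ y) / (x @ x)          # 40 / 20 = 2.0
y_hat = alpha * x                  # [8., 4.]
u_hat = y - y_hat                  # [-1., 2.]
print(y_hat, u_hat, x @ u_hat)     # the last number is 0.0: u_hat is orthogonal to x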

Example 8.5. Consider the projection of y onto x where

    y = [ 2 ]      and      x = [ 1 ]
        [ 2 ]                   [ 0 ]

By our formula, we know that

    ŷ = ( x^T y / x^T x ) x = [ 2 ]
                              [ 0 ]

Graphically, ŷ is the point on span(x) (the horizontal axis) directly below y.
Aside. This idea extends to vectors that don’t span the x-axis (it’s just a bit easier to draw). Similarly,
we can think about projections in an analogous way for multiple dimensions (although it is much harder to
draw and much harder to picture mentally). Indeed, higher dimensionality is important–the fitted values
from OLS, for example, are the projection of y onto the column space of the data matrix X. Luckily, we
have theorems to help us handle more dimensions.

9 Orthogonality
Definition 9.1. Let u and v be vectors in Rn. Then the inner product of the vectors is a scalar given by u^T v or u · v:

    u^T v = [ u1  u2  . . .  un ] [ v1  v2  · · ·  vn ]^T = u1 v1 + u2 v2 + · · · + un vn

This is also known as the dot product.
Aside. The outer product of the vectors is uvT .
Aside. Note that we can write our usual notion of length, the Euclidean norm, in terms of the inner product:

    ||x||2 = √(x^T x) = ( Σ_{i=1}^{n} xi^2 )^{1/2}

Definition 9.2. Two vectors u and v in Rn are orthogonal if and only if uT v = 0.


Theorem 9.3. (Lay THM 6.2). Two vectors u and v in Rn are orthogonal if and only if

||u + v||2 = ||u||2 + ||v||2

This is the linear algebra version of the Pythagorean theorem.


Aside. We can work through some of the algebra to show that this must be true if we're using the Euclidean norm. Note that x^T y = y^T x = 0 if and only if x and y are orthogonal.

    ||x + y||^2 = (x + y)^T (x + y)
                = x^T x + y^T x + x^T y + y^T y
                = ||x||^2 + 0 + 0 + ||y||^2
    ||x + y||^2 = ||x||^2 + ||y||^2

Now, the same steps can be used to show that ||x − y||^2 = ||x||^2 + ||y||^2. This implies that ||x + y|| = ||x − y||, or that x is equidistant from y and −y (picture x on one axis and ±y on the other). In other words, orthogonal vectors are perpendicular to each other.


Definition 9.4. Nonempty “sets” V, W ⊂ Rn are orthogonal if and only if given any v ∈ V , w ∈ W

vT w = 0

Remark The collection of vectors is not necessarily a vector space.


Theorem 9.5. If V, W ⊂ Rn are orthogonal, then

V ∩ W = ∅ ∨ V ∩ W = {0}

Definition 9.6. Let V be a subset of Rn . The orthogonal complement of V , denoted by V ⊥ , is the set of
all vectors w ∈ Rn that are orthogonal to the set V .
Aside.

1. V⊥ is a subspace of Rn.

2. Span(V) = (V⊥)⊥

Proof. Let S = {v1, · · · , vn} be a set of vectors in V. If k ∈ Span(S), then k = a1 v1 + · · · + an vn. For every vi ∈ S, vi^T w = 0 for all w ∈ S⊥.
∴ k^T w = Σ ai vi^T w = 0
∴ k ∈ (S⊥)⊥ by definition.
∴ Span(S) ⊂ (S⊥)⊥.

Conversely, if k ∈ (S⊥)⊥, then k^T w = 0 for all w ∈ S⊥, where S⊥ := {w ∈ Rn : v^T w = 0 for all v ∈ S}.
∴ k is a linear combination of the vectors in S.
∴ k ∈ Span(S)
∴ (S⊥)⊥ ⊂ Span(S)

Theorem 9.7. (Lay THM 6.3). Let A be a m × n matrix. The orthogonal complement of the row space
of A is the null space of A, and the orthogonal complement of the column space of A is the null space of
AT :
(Col(A))⊥ = Nul(AT ) and (Row(A))⊥ = Nul(A)
Proof.

    Nul(A) = {u ∈ Rn : Au = 0m}
           = {u ∈ Rn : Ai · u = 0 for all i = 1, · · · , m}

where Ai are the row vectors of A. Therefore, u is in the orthogonal complement of Row(A).
∴ Nul(A) = (Row(A))⊥.
Similarly,

    Nul(A^T) = {u ∈ Rm : A^T u = 0n}
             = {u ∈ Rm : ai^T u = 0 for all i = 1, · · · , n}

where ai are the column vectors of A. Therefore, u is in the orthogonal complement of Col(A).
∴ Nul(A^T) = (Col(A))⊥.

Definition 9.8. A set of vectors {x1 , . . . , xk } in Rn is an orthogonal set if each pair of distinct vectors
from the set are orthogonal. An orthogonal basis for a subspace W ⊂ Rn is a basis for W that is also
an orthogonal set.
Theorem 9.9. (Lay THM 6.8) Orthogonal decomposition theorem.
Let W be a subspace of Rn. Then each y ∈ Rn can be written uniquely in the form

    y = ŷ + û

where ŷ is in W and û ∈ W⊥. If {x1, . . . , xk} is any orthogonal basis of W, then

    ŷ = ( y′x1 / x1′x1 ) x1 + · · · + ( y′xk / xk′xk ) xk

and û = y − ŷ. Here, ŷ is the orthogonal projection of y onto W.

Aside. Two quick points are in order. First, this gives us a tool to understand how we can project y onto a higher-dimensional space than the line spanned by a single vector. Second, note that the theorem states that any orthogonal basis can be used; it is the projection itself (not the basis) that is unique. Scale any vector in the basis (or analogously, any column in the data matrix X) and it does not change the fitted values.

Theorem 9.10. (Lay THM 6.9). Let W be a subspace of Rn , let y ∈ Rn , and let ŷ be the orthogonal
projection of y onto W . Then ŷ is the point in W closest to y:

||y − ŷ|| < ||y − v|| for all v ∈ W ∋ v ̸= ŷ

Proof.

    ||y − v|| = ||(y − ŷ) + (ŷ − v)||                     (adding 0)

Note that ŷ − v ∈ W (both vectors "live" in W, so their difference does as well). Further, note that y − ŷ ∈ W⊥ (by definition of the orthogonal projection). Thus:

    ||y − v||^2 = ||y − ŷ||^2 + ||ŷ − v||^2               (by the Pythagorean Theorem)
    ||ŷ − v|| > 0                                          (v ≠ ŷ and by def. of a norm)
    =⇒ ||y − v||^2 > ||y − ŷ||^2                          (by ||ŷ − v|| > 0)
    =⇒ ||y − v|| > ||y − ŷ||                              (by non-negativity of norms)

10 OLS as a Projection
Definition 10.1. If X is an n × k matrix and y ∈ Rn, a least-squares solution of y = Xβ + u is a β̂ ∈ Rk such that

    ||y − Xβ̂|| ≤ ||y − Xb||    for all b ∈ Rk

Aside. Note that no matter what, the answer we pick is going to "live" in the column space of X, Col(X). Thus, we're trying to find the point in Col(X) closest to y. To do this, we'll use our best approximation theorem, which tells us that the closest point is the orthogonal projection. Since ŷ ∈ Col(X), a solution to the system Xβ̂ = ŷ will exist.

Theorem 10.2. (Lay THM 6.14). Let X be a n × k matrix. The following statements are logically
equivalent:

1. The equation y = Xβ + u has a unique least-squares solution for each y ∈ Rn

2. the columns of X are linearly independent

3. the matrix XT X is invertible

When these statements are true, the least-squares solution β̂ is given by

    β̂ = (X^T X)^{-1} X^T y

Proof.

    0k = X^T (y − ŷ)                    (by def. of orthogonal)
    0k = X^T (y − Xβ̂)                   (plugging in for ŷ)
    0k = X^T y − X^T X β̂                (distributing)
    β̂ = (X^T X)^{-1} X^T y              (solving for β̂)

Definition 10.3. Let X be a n × k matrix of full rank and let y ∈ Rn . Then the matrix

PX = X(XT X)−1 XT

is the projection matrix that projects y onto Col(X).

Theorem 10.4. Let PX be the projection matrix where PX = X(XT X)−1 XT . Then PX is

1. Symmetric

2. Idempotent

3. Positive Semidefinite

Definition 10.5. Let PX be the projection matrix associated with the column space of an n × k matrix X. Then the annihilator matrix is defined as

MX = In − PX

MX projects values of y ∈ Rn onto Col(X)⊥ .

Remark Given y = Xβ + u,

1. MX is symmetric, idempotent, and positive semidefinite.

2. MX X = 0n×k

3. û = MX y = MX u and ûT û = uT MX u

4. û ∈ Nul(XT ) and XT û = 0

5. rank(MX ) = tr(MX ) = n − k (n − k degrees of freedom)

Aside. Depending on the presentation of OLS, the projection and annihilator matrices are important tools
in developing the theory behind econometrics. Even if we focus on a more statistics-motivated approach
to OLS, it is valuable to understand these matrices and what they do.
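A short numerical illustration of the projection and annihilator matrices (a sketch assuming NumPy; the data are simulated purely for illustration):

import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T      # projects onto Col(X)
M = np.eye(n) - P                         # annihilator: projects onto the orthogonal complement of Col(X)

print(np.allclose(M, M.T), np.allclose(M @ M, M))    # symmetric and idempotent: True True
print(np.allclose(M @ X, 0))                         # M_X X = 0: True
print(round(np.trace(M)), np.linalg.matrix_rank(M))  # both equal n - k = 47

u_hat = M @ y                                        # residuals from regressing y on X
print(np.allclose(X.T @ u_hat, 0))                   # residuals orthogonal to the columns of X: True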

11 Matrix Differentiation
Aside. We won't get into the details, but it is occasionally helpful to employ differentiation at a matrix level; for example, we can derive an estimator β̂ for the regression equation y = Xβ + u by minimizing u′u via matrix differentiation. Here are a few of the formulas you'll likely run across:

1. Scalar-by-Vector: y ∈ R and x ∈ Rn

(a) In general:

    ∂y/∂x = [ ∂y/∂x1 ]      and      ∂y/∂x^T = [ ∂y/∂x1   ∂y/∂x2   · · ·   ∂y/∂xn ]
            [ ∂y/∂x2 ]
            [   ⋮    ]
            [ ∂y/∂xn ]

(b) For a constant a ∈ R:

    ∂a/∂x = 0n
2. Vector-by-Vector: y ∈ Rm and x ∈ Rn

(a) In general:

    ∂y/∂x = [ ∂y1/∂x1   ∂y1/∂x2   · · ·   ∂y1/∂xn ]
            [ ∂y2/∂x1   ∂y2/∂x2   · · ·   ∂y2/∂xn ]
            [    ⋮          ⋮       ⋱        ⋮    ]
            [ ∂ym/∂x1   ∂ym/∂x2   · · ·   ∂ym/∂xn ]

(b) For the derivative of a vector with respect to itself:

    ∂x/∂x = In
3. Matrix-by-Vector (from Hansen Econometrics Appendix)

(a) ∂(a^T x)/∂x = ∂(x^T a)/∂x = a

(b) ∂(x^T A)/∂x = A   and   ∂(Ax)/∂x^T = A
(c) Quadratic form:

    ∂(x^T A x)/∂x = (A + A^T) x      (if A is not symmetric)
    ∂(x^T A x)/∂x = 2Ax              (if A is symmetric)
Example 11.1. Ordinary Least Squares.

    β̂ ∈ arg min_{b ∈ Rk} S(b)    where    S(b) = (y − Xb)^T (y − Xb)

    S(b) = y^T y − b^T X^T y − y^T X b + b^T X^T X b
         = y^T y − 2 b^T X^T y + b^T X^T X b

    ∂S(b)/∂b = −2 X^T y + 2 X^T X b

Setting ∂S(β̂)/∂b = 0:

    (X^T X) β̂ − X^T y = 0
    β̂ = (X^T X)^{-1} X^T y
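A numerical check of this formula against NumPy's built-in least-squares routine (a sketch with simulated data, purely for illustration):

import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)               # y = X beta + u

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y          # the formula derived above
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]    # NumPy's least-squares solver
print(np.allclose(beta_hat, beta_lstsq))             # True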

12 Appendices
12.1 Appendix 1: determinant

1. 3 × 3 Matrix: A = [ a11  a12  a13 ]
                     [ a21  a22  a23 ]
                     [ a31  a32  a33 ]

First method: multiply the three elements along each right-leaning diagonal and sum; multiply the three elements along each left-leaning diagonal and subtract. (In the usual diagram, solid lines mark the positive diagonals and dashed lines the negative ones.)

|A| = a11 a22 a33 + a12 a23 a31 + a13 a32 a21 − a13 a22 a31 − a12 a21 a33 − a11 a32 a23

2. For matrices 3 × 3 and larger, use the Laplace Expansion.


Definition 12.1. The determinant of the matrix that results from deleting rows and columns asso-
ciated with element aij is the minor |Mij |.
Example 12.2. Consider again the 3 × 3 matrix A (defined above). The minor associated with the element a11 is

    |M11| = | a22  a23 |
            | a32  a33 |

(visually, delete the first row and first column of A). The minor associated with a22 is

    |M22| = | a11  a13 |
            | a31  a33 |

(delete the second row and second column).

Definition 12.3. The cofactor |Cij| is a minor modified by a prescribed algebraic sign that follows the convention:

    |Cij| ≡ (−1)^{i+j} |Mij|

Definition 12.4. The determinant found by Laplace Expansion is the value found by "expanding" along any row or column:

    |A| = Σ_{j=1}^{n} aij |Cij|        (expansion by the ith row)

    |A| = Σ_{i=1}^{n} aij |Cij|        (expansion by the jth column)

Example 12.5. Considering once again the 3 × 3 matrix A (defined above), performing the Laplace Expansion using the first row:

    |A| = a11 |C11| + a12 |C12| + a13 |C13|

    |A| = (−1)^{1+1} a11 | a22  a23 | + (−1)^{1+2} a12 | a21  a23 | + (−1)^{1+3} a13 | a21  a22 |
                         | a32  a33 |                  | a31  a33 |                  | a31  a32 |

Example 12.6. Consider the 3 × 3 matrix

    A = [ −7  0  3 ]
        [  9  1  4 ]
        [  0  6  5 ]

Expanding along the top row:

    |A| = (−7) | 1  4 | − 0 | 9  4 | + 3 | 9  1 |
               | 6  5 |     | 0  5 |     | 0  6 |
        = (−7)(5 − 24) − 0 + 3(54 − 0)
        = 133 + 162 = 295

12.2 Appendix 2: inverse
Example 12.7. Suppose we have an invertible matrix

    A = [ a11  a12  a13 ]
        [ a21  a22  a23 ]
        [ a31  a32  a33 ]

Then we could find the inverse of A by performing row operations on the augmented matrix

    [ A | I3 ] = [ a11  a12  a13 | 1  0  0 ]
                 [ a21  a22  a23 | 0  1  0 ]
                 [ a31  a32  a33 | 0  0  1 ]

until we arrived at a matrix of the form

    [ I3 | A^{-1} ] = [ 1  0  0 | b11  b12  b13 ]
                      [ 0  1  0 | b21  b22  b23 ]
                      [ 0  0  1 | b31  b32  b33 ]

Theorem 12.8. If A is a n × n invertible matrix, the inverse may be found by using the determinant and
cofactors of the matrix A.

Definition 12.9. The cofactor matrix of A is the matrix defined by replacing the elements of A with their associated cofactors:

    C = [ |C11|  |C12|  . . .  |C1n| ]
        [ |C21|  |C22|  · · ·  |C2n| ]
        [   ⋮      ⋮      ⋱      ⋮  ]
        [ |Cn1|  |Cn2|  . . .  |Cnn| ]

Definition 12.10. The adjoint of A is the transpose of the cofactor matrix:

    adj(A) = C^T = [ |C11|  |C21|  . . .  |Cn1| ]
                   [ |C12|  |C22|  · · ·  |Cn2| ]
                   [   ⋮      ⋮      ⋱      ⋮  ]
                   [ |C1n|  |C2n|  . . .  |Cnn| ]

The inverse of an n × n matrix A can be found via the formula:

    A^{-1} = (1/|A|) adj(A)

Example 12.11. Consider the 3 × 3 matrix A:

    A = [ 0  1  0 ]
        [ 2  1  2 ]
        [ 4  0  0 ]

First, we can take the determinant to establish invertibility:

    |A| = 0 | 1  2 | − 1 | 2  2 | + 0 | 2  1 |      (expanding on the 1st row)
            | 0  0 |     | 4  0 |     | 4  0 |
        = 0 − 1(0 − 8) + 0 = 8                       (simplifying)

Since |A| ≠ 0, we can invert this matrix. Next, find the cofactor matrix CA:

    CA = [ + | 1 2 |   − | 2 2 |   + | 2 1 | ]       (the cofactor matrix)
         [   | 0 0 |     | 4 0 |     | 4 0 | ]
         [ − | 1 0 |   + | 0 0 |   − | 0 1 | ]
         [   | 0 0 |     | 4 0 |     | 4 0 | ]
         [ + | 1 0 |   − | 0 0 |   + | 0 1 | ]
         [   | 1 2 |     | 2 2 |     | 2 1 | ]

    CA = [ 0  8  −4 ]                                (simplifying)
         [ 0  0   4 ]
         [ 2  0  −2 ]

Recall that the adjoint of A is the transpose of CA:

    adj(A) = CA^T                                    (by def. of the adjoint)

    adj(A) = [  0  0   2 ]                           (the adjoint matrix)
             [  8  0   0 ]
             [ −4  4  −2 ]

Finally, we can calculate the inverse of A:

    A^{-1} = (1/|A|) adj(A)                          (the inverse formula)

           = (1/8) [  0  0   2 ]                     (plugging in values)
                   [  8  0   0 ]
                   [ −4  4  −2 ]

    A^{-1} = [   0     0    1/4 ]                    (the inverse)
             [   1     0     0  ]
             [ −1/2   1/2  −1/4 ]

12.3 Appendix 3: Cramer’s Rule
Example 12.12. Consider a two-commodity, linear market model. For the first market:

Qd1 = a0 + a1 P1 + a2 P2 (demand 1)
Qs1 = b0 + b1 P1 + b2 P2 (supply 1)
0 = Qd1 − Qs1 (S = D)

For the second:

Qd2 = α0 + α1 P1 + α2 P2 (demand 2)
Qs2 = β0 + β1 P1 + β2 P2 (supply 2)
0 = Qd2 − Qs2 (S = D)

This reduces into a two-equation, two-unknown system:

0 = (a0 −b0 ) + (a1 −b1 )P1 + (a2 −b2 )P2 (for market 1)
0 = (α0 −β0 ) + (α1 −β1 )P1 + (α2 −β2 )P2 (for market 2)

Rewriting for expediency, defining ci = ai − bi (and analogously for greek letters):

c1 P1 + c2 P2 = −c0 (market 1)
γ1 P1 + γ2 P2 = −γ0 (market 2)

Rewriting in matrix notation:

    [ c1  c2 ] [ P1 ] = [ −c0 ]              (the linear system)
    [ γ1  γ2 ] [ P2 ]   [ −γ0 ]

Solving for P1* (the market clearing price for good 1):

    P1* = | −c0  c2 | / | c1  c2 | = (c2 γ0 − c0 γ2) / (c1 γ2 − c2 γ1)        (by Cramer's Rule)
          | −γ0  γ2 |   | γ1  γ2 |

Solving for P2* (the market clearing price for good 2):

    P2* = | c1  −c0 | / | c1  c2 | = (c0 γ1 − c1 γ0) / (c1 γ2 − c2 γ1)        (by Cramer's Rule)
          | γ1  −γ0 |   | γ1  γ2 |
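As a numerical sketch (NumPy assumed; the coefficient values are hypothetical, chosen only to illustrate the mechanics), Cramer's Rule returns the same prices as solving the 2 × 2 system directly:

import numpy as np

# hypothetical reduced-form coefficients: c1*P1 + c2*P2 = -c0 and g1*P1 + g2*P2 = -g0
c0, c1, c2 = 10.0, -2.0, 1.0     # market 1 (illustrative values only)
g0, g1, g2 = 15.0, 0.5, -3.0     # market 2 (illustrative values only)

A = np.array([[c1, c2],
              [g1, g2]])
b = np.array([-c0, -g0])

# Cramer's Rule: replace column i of A with b and take the ratio of determinants
P1 = np.linalg.det(np.column_stack([b, A[:, 1]])) / np.linalg.det(A)
P2 = np.linalg.det(np.column_stack([A[:, 0], b])) / np.linalg.det(A)
print(P1, P2)                    # approximately 8.18 and 6.36
print(np.linalg.solve(A, b))     # the same solution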

Glossary
adjoint ... 33
annihilator matrix ... 28
augmented matrix ... 4
basis ... 3
characteristic equation ... 17
coefficient matrix ... 4
cofactor ... 32
cofactor matrix ... 33
column space ... 20
column vector ... 1
Cramer's Rule ... 11, 35
determinant ... 9, 31
diagonalizable ... 18
difference ... 7
dot product ... 25
eigenvalue ... 16
eigenvector ... 16
element ... 4
elementary row operations ... 4
equal ... 1, 7
Euclidean Norm ... 21
Hessian ... 15
Idempotent Matrix ... 8
Identity Matrix ... 8
Indefinite ... 12
inner product ... 25
inverse ... 9, 33
invertible ... 9
Laplace Expansion ... 31, 32
leading principal minors ... 14
least-squares solution ... 28
linear combination ... 2
linear equation ... 4
linearly dependent ... 2
linearly independent ... 2
matrix equation ... 5
metric space ... 21
minor ... 31
Negative definite ... 12
negative semi-definite ... 12
normed vector space ... 21
Null Matrix ... 8
null space ... 20
orthogonal ... 25
orthogonal basis ... 26
Orthogonal decomposition theorem ... 26
orthogonal projection ... 23, 26
orthogonal set ... 26
parallel ... 2
Positive definite ... 12
positive semi-definite ... 12
principal minor ... 13
product ... 5, 7
projection matrix ... 28
Pythagorean theorem ... 25
quadratic form ... 12
rank ... 11
row echelon form ... 5, 11
row vector ... 1
scalar multiple ... 2, 7
singular ... 9
size ... 4
span ... 2
subspace ... 20
sum ... 1, 7
symmetry ... 8
total differential ... 15
trace ... 16
transpose ... 8
trivial solution ... 2
vector space ... 20
