
Math 208 Course Notes Pack

Mathematics for Economics

Notes By William Thompson


Contents

1 Linear Equations 6
1.1 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Linear Equations in n-Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1 Gauss-Jordan Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Matrix Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Algebra of Matrices 16
2.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Basic Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.1 Matrix Addition and Scalar Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.2 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.3 Systems of Matrix Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3 Matrix Transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Some Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.1 Idempotent and Partitioned Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.2 Trace of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Inverse Matrices and Determinants 27


3.1 The Inverse Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Determinant and Inverse of a 3 × 3 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Determinant of a 3 by 3 Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2 Inverse of a 3 by 3 Matrix by Cofactors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.3 Inverse of a 3 by 3 Matrix by Row Reductions . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2.4 Solving Systems of Linear Equations Using the Inverse . . . . . . . . . . . . . . . . . . . . . 33
3.3 Properties and Higher Order Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.1 Higher Order Determinants and Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Properties of the Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Determinant Properties and Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4 Advanced Topics in Linear Algebra 41


4.1 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1.1 Introducing Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1.2 Visualizing Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.1.3 The Dot Product and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.1.4 Unit Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.1.5 Linear Combinations and Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

1
4.1.6 Linear Independence and Dependence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.7 Vector Subspaces and Basis of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.1.8 Rank and Nullity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 The Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Introducing the Eigenvalue Problem and Characteristic Equation . . . . . . . . . . . . . . . 56
4.2.2 Solving the Eigenvalue Problem and Multiplicity . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.2.4 The Principal Axis Theorem and Orthogonal Matrices . . . . . . . . . . . . . . . . . . . . . 61
4.2.5 Properties of Similar Matrices and High Powers . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3.1 Introducing Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.3.2 Definiteness of a Symmetric Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.3 The Eigenvalue Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.3.4 Minors, Principal Minors, and Leading Principal Minors . . . . . . . . . . . . . . . . . . . . 70
4.3.5 Sylvester’s Criterion and the Method of Minors . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.4 The Inverse Matrix Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

5 Multivariable Regions and Functions 76


5.1 Point Sets in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.1 Open and Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.1.2 Convex Combinations and Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.2 Convexity and Concavity of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2.1 Convex and Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2.2 Level Curves and Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.2.3 Quasiconvexity and Quasiconcavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

6 Calculus for Functions of n-Variables 94


6.1 Partial Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.1 Introducing Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.2 The Law of Diminishing Productivity for Production Functions . . . . . . . . . . . . . . . . 98
6.1.3 The Multivariable Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1.4 Implicit Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
6.2 Second Order Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.1 Higher Order Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2.2 Second Order Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2.3 The Gradient and the Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3 First Order Total Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.3.1 Introducing the First Order Total Differential . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.3.2 The Differential on Level Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
6.3.3 The MRS and MRTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.3.4 Convexity to the Origin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.4 Curvature Properties of Concavity and Convexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.4.1 The Second Order Differential and Convexity for Differentiable Functions . . . . . . . . . . 112
6.4.2 Examples of Curvature Part One . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.5 Differentiable Quasiconvex Functions and Homogeneity . . . . . . . . . . . . . . . . . . . . . . . . 118
6.5.1 The Bordered Hessian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.5.2 Quasiconvexity of Differentiable Non-Convex Functions . . . . . . . . . . . . . . . . . . . . 119
6.5.3 Homogeneous Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

2
6.5.4 Euler’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.6 Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.6.1 Single Variable Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.6.2 Multivariable Taylor Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

7 Optimization of Functions of n-Variables 129


7.1 First Order Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.1.1 Critical Points and Extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.1.2 Linear Equations of Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
7.1.3 Using Substitution to Obtain Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.1.4 Using Cases to Obtain Critical Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.1.5 Multiproduct Monopoly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
7.1.6 Cournot Duopoly . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.2 Second Order Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
7.2.1 Saddle Points and Classifying Critical Points Locally . . . . . . . . . . . . . . . . . . . . . . 139
7.2.2 Examples of Local Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

8 Constrained Optimization 143


8.1 First Order Conditions of Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.1.1 Introducing Constrained Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.1.2 The Lagrange Function and First Order Conditions . . . . . . . . . . . . . . . . . . . . . . 145
8.2 Second Order Conditions of Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

3
Overview of the Course
Welcome to Math 208! This course is divided into two main topics of mathematics:

• First Half: Linear Algebra

• Second Half: Multivariable Calculus and Optimization

These two topics tie together quite nicely. Calculus can be viewed as two things: approximating and analyzing functions linearly (one might use tangent lines as an analogy in a lower dimensional setting), or the study of rates of change. The former is a geometric interpretation while the latter is a physical interpretation. Linear algebra is the foundation and theory of linear expressions, and it will hence prepare us for the study of calculus in a higher dimensional setting.

The progression of the course follows this simple guideline:

Solving Linear Equations
↓
Matrix Algebra
↓
Analysis of Matrices
↓
Multivariable Real Space and Functions
↓
Representing Multivariable Calculus Through Matrices
↓
Optimization of Multivariable Functions
We’ll do our best to flavor the content of these notes with discussion and exercises that pertain to the reader’s interest in Economics, but we do remind the reader that these notes have been created for a Mathematics course. We will discuss production functions, monopolies, competing markets, and utility functions, but the main focus is to introduce and develop the tools used in Economics in a more general mathematical setting.

These notes are what we call “skeleton” notes, in that exercises are incomplete and are intended to be filled in by the reader while following the lecturer. The notes house several examples that give students a practical, hands-on learning experience.

4
Common Notation Used
• The absolute value of x is denoted |x|.
• The factorial of a positive integer n is denoted n! = n(n − 1)(n − 2) · · · (2)(1).
• The maximum of a list of numbers a1 , ..., an is denoted max(a1 , ..., an ) and the minimum of this list is denoted
min(a1 , ..., an ).
• An open interval is denoted (a, b) and a closed interval is denoted [a, b]. Half open or half closed intervals
are denoted [a, b) or (a, b].
• The space of all real numbers is denoted by R. The space of all n-tuples of real numbers (x1 , ..., xn ) is denoted Rn .
• An element x of a set X is denoted as x ∈ X.
• A subset X of a set Y is denoted X ⊂ Y .
• A function f that maps from a domain X to a codomain Y is denoted f : X → Y .
• The range of a function f is denoted Range(f ).
• The collection of all elements x in a set X that obey a rule r(x) is denoted by the set-builder notation
{x ∈ X | r(x)}.
• Vectors in Rn are denoted either by boldface v or with an arrow ~v .
• The magnitude of a vector v is denoted kvk.
• The dot product of two vectors u and v is denoted u · v = uT v.
• A = [aij ] represents a matrix A with elements aij in row i and column j.
• The rank of a matrix A is denoted Rank(A) and the nullity of a matrix A is denoted Nullity(A).
• The determinant of A is denoted by either |A| or det(A).
• The trace of A is denoted trace(A).
• The transpose of a matrix A is denoted AT .
• If A is similar to B this is denoted A ∼ B.
• Given a matrix A the leading principal minor of order k is denoted |Ak | and the collection of all principal minors of order k is denoted A∗k .
• Let A be a matrix. We denote A ≻ 0 if A is positive definite, A ⪰ 0 if A is positive semi-definite, A ≺ 0 if A is negative definite, and A ⪯ 0 if A is negative semi-definite.
• The open ball centered at x of radius ε > 0 is denoted Bε (x).
• The partial derivative of f (x) with respect to xi is denoted ∂f/∂xi (x) or fi (x).
• The gradient of f is denoted ∇f , the Hessian of f is denoted ∇2 f (x), and the bordered Hessian of f is denoted ∇̄2 f (x).
• The first order differential of f is denoted df and the second order differential is denoted d2 f .

5
Chapter 1

Linear Equations

6
1.1 Systems of Linear Equations
Definition

R2 is the collection of all points (x, y). Specifically, it is the xy-plane. Equations of the form f (x, y) = 0
represent curves in the plane.

Definition

There are three main forms of a line in R2, given by

• Slope Intercept Form: With slope m and y-intercept (0, b) it is given by y = mx + b

• Point Slope Form: With slope m and point (x0 , y0 ) on the line it is given by y − y0 = m(x − x0 )

• General Form: This is given by ax + by = c

Example: Graph x + y = 2 and 2x − y = 1.

From the graph we see the lines intersect. How do we find this point of intersection?

Solve the “system of linear equations”


(
x+y =2 (1)
2x − y = 1 (2)

7
Example: Solve the above system.

Theorem

A system of two linear equations in R2 has either exactly one solution, no solution, or infinitely many
solutions.

Example: Solve the system


(
2x + 4y = 6 (1)
x + 2y = 3 (2)

8
Example: Solve the system
(
4x + 6y = 5 (1)
2x + 3y = 7 (2)

Example: Solve the system


(
2x − y = 4 (1)
3x + y = 9 (2)

9
1.2 Linear Equations in n-Variables
1.2.1 Gauss-Jordan Elimination
We will expand on the elimination method of Section 1.1. The new method is called Gaussian Elimination, which comes in two forms: Gauss-Jordan elimination and Gaussian elimination with back substitution. We use it to solve systems in any dimension.

Elimination Rules (Row Operations)

• Multiply a row (equation) by a non-zero constant

• Add a multiple of one row (equation) to another

• Swap two rows (equations)
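
As a small illustration, the three operations translate directly into code. A minimal sketch in Python (the helper names are ours, not standard):

def scale_row(M, i, c):
    # Multiply row i by a non-zero constant c.
    assert c != 0
    M[i] = [c * x for x in M[i]]

def add_multiple(M, i, j, c):
    # Add c times row j to row i.
    M[i] = [x + c * y for x, y in zip(M[i], M[j])]

def swap_rows(M, i, j):
    # Swap rows i and j.
    M[i], M[j] = M[j], M[i]

# Augmented rows of the example below: -x - y + z = -2, 3x = 9, 2y - 2z = -2.
M = [[-1, -1, 1, -2], [3, 0, 0, 9], [0, 2, -2, -2]]
swap_rows(M, 0, 1)        # bring 3x = 9 to the top
scale_row(M, 0, 1/3)      # create a leading 1
add_multiple(M, 1, 0, 1)  # eliminate x from the second row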

Example: Solve the following system through Gauss Jordan elimination



−x − y + z = −2 (1)
3x = 9 (2)
2y − 2z = −2 (3)

10
Example: Solve the system through elimination with back-substitution

4x − y + 2z = 13 (1)

x + 2y − 2z = 0 (2)

−x + y + z = 5 (3)

Definition
A system with no solutions is called inconsistent. A system with at least one solution is called consistent.

11
1.2.2 Matrix Notation
Definition
A matrix is a rectangular array of data. Representing a system of equations by the following association

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
        ...
am1 x1 + am2 x2 + · · · + amn xn = bm

⇐⇒

[ a11  a12  · · ·  a1n | b1 ]
[ a21  a22  · · ·  a2n | b2 ]
[  ..   ..   ..    ..  | .. ]
[ am1  am2  · · ·  amn | bm ]

is called representing the system through an augmented matrix.

Example: Solve the system of equations through use of Gauss Jordan elimination and setting it up as an
augmented matrix

2x + z = 10

2y − z = 0

−6x − 3y + z = 0

12
Definition
A matrix is in Reduced Row Echelon Form (RREF) provided the following hold:

• Rows of all zeroes are moved/switched to the bottom (provided they exist)

• The first non-zero entry in each non-zero row is a 1 (called the leading 1)

• The leading 1 of a row occurs to the right of the leading 1 of the preceding row

• Each column with a leading 1 has only the leading 1 as a non-zero entry

Example: Which of the following are in RREF?

 
[ 1 0 0 −1 ]    [ 1 −2 3 −1 ]    [ 0 0 1 ]    [ 1 6 8 −1 ]
[ 0 2 1  0 ]    [ 0  0 0  0 ]    [ 0 0 1 ]    [ 0 0 0  1 ]
                                              [ 0 0 1  0 ]

13
Definition
A column with a leading 1 is called a pivot column. The columns without a leading one are called free
columns.

Definition
If a system is consistent the general solution is a formula which gives all solutions. If there are infinite
solutions then the general solution is created by denoting the variables corresponding to free columns by
parameters, called free variables, and representing the variables of the pivot columns in terms of those
free variables.

Example: Solve the following system by reducing to RREF.

4x − 2y + 2z = 6
−y + z = 1
2x = 2
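
For checking hand computations like this one, SymPy (assuming it is available) computes the reduced row echelon form directly; a minimal sketch:

from sympy import Matrix

# Augmented matrix of 4x - 2y + 2z = 6, -y + z = 1, 2x = 2.
M = Matrix([[4, -2, 2, 6],
            [0, -1, 1, 1],
            [2, 0, 0, 2]])
R, pivot_cols = M.rref()   # RREF and the indices of the pivot columns
print(R)
print(pivot_cols)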

14
Definition
A system of linear equations is called homogeneous if each constant on the right hand side is zero. That
is they are of the form

a11 x1 + a12 x2 + · · · + a1n xn = 0
a21 x1 + a22 x2 + · · · + a2n xn = 0
        ...
am1 x1 + am2 x2 + · · · + amn xn = 0

Theorem
A homogeneous system has either infinitely many solutions or exactly one solution. In either case

x1 = x2 = x3 = · · · = xn = 0

is a solution (called the trivial solution). Specifically, if a homogeneous system of linear equations has
fewer rows than variables, then there are infinitely many solutions.

15
Chapter 2

Algebra of Matrices

16
2.1 Matrices
Definition
A matrix is a rectangular array of data. They are often denoted by a capital letter A, B, C, ...

Notation
Given a matrix A we denote the entry in the i’th row and j’th column by the lower case letter with
appropriate ordered indexes aij . To put emphasis on the entries of A we may denote it by A = [aij ].

Definition
We say a matrix is of size m × n if it has m rows and n columns. If A is such a matrix we denote this by
Am×n .
 
Example: Determine the size of

A = [  1 2 −1 0  7 ]
    [  0 3  5 4  2 ]
    [ −1 0  8 4 −2 ]

and determine the entry a24 .

Definition
A matrix of size m × 1 is called a column matrix (or column vector). Likewise, a matrix of size 1 × n
is called a row matrix (or row vector). Such matrices are usually denoted by a lowercase letter with an
arrow overset or boldface (e.g. ~v or v) instead of a capital letter.

Definition
A matrix of size n × n (where the number of rows are equal to the number of columns) is called a square
matrix. A square matrix where all non-diagonal entries are zero is called a diagonal matrix.

Example: Amongst the list below, determine which matrices are column vectors, row vectors, square, diagonal,
or none of these.

 
A = [ 1 2 −1 ]    B = [  1 2  1  4 ]    C = [  2 ]    D = [ 2 1 7 9 −3 ]    E = [ 2  0 ]
    [ 2 3  0 ]        [  0 1 −3 −2 ]        [  1 ]                             [ 0 −1 ]
                      [ −2 3  0  1 ]        [ −1 ]
                      [  3 7  9 −5 ]

17
Definition
Two matrices A = [aij ] and B = [bij ] of equal size are said to be equal if aij = bij for all i and j (that is, all corresponding entries are the same).
   
Example: For what values of x and y are A = [ 1 2 ; x−y 2 ] and B = [ 1 y ; 0 2 ] equal?

Definition
The identity matrix of size n × n is the diagonal matrix with 1’s on every position of the diagonal and is
denoted In . The zero matrix of size m × n is the matrix where all elements are zero and is denoted Om×n .

Example: Construct the identity matrix of size 4 × 4 and the zero matrix of size 2 × 3.

18
2.2 Basic Matrix Operations
2.2.1 Matrix Addition and Scalar Multiplication
Definition
Let A and B be size m × n matrices. The sum or difference of two matrices is defined component-wise by

A ± B = [aij ] ± [bij ] = [aij ± bij ]


and is called matrix addition and matrix subtraction respectively.
     
Example: Let A = [ 2 −1 0 ; 3 2 4 ], B = [ 2 −1 5 ; 1 3 0 ], C = [ 4 2 ; 7 −1 ].
If possible, compute A + B, A + C, and C − A.

Definition
Numbers (in the context of matrices) are called scalars.

Definition
The multiplication of a scalar r and a matrix A = [aij ] is defined as rA = [raij ]. This is called scalar
multiplication.

Example: Compute the following

[ 2 −1 ] + 3 [ −3 4 ]
[ 1  3 ]     [  7 5 ]

19
2.2.2 Matrix Multiplication
Definition
Let A be a matrix of size m × k and B be a matrix of size k × n. The product of A and B is defined as
AB = [ai1 b1j + ai2 b2j + · · · + aik bkj ] and is of size m × n.

Notes on Matrix Multiplication

• The order of multiplication matters. You can’t expect AB to be equal to BA (or even defined).

• Matrix multiplication is defined only if the left matrix has as many columns as the right matrix has
rows. If a size m × k matrix is multiplying a matrix of size k × n on the left the result is a matrix of
size m × n.

• To obtain the new element in the ith row and j th column you multiply the corresponding order of
elements in the ith row in the left matrix with those of the j th column in the right matrix and add
them up.
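
The definition translates directly into a triple loop. A sketch in Python (plain lists, no libraries), using the matrices of the example below:

def matmul(A, B):
    m, k = len(A), len(A[0])
    assert k == len(B), "columns of A must equal rows of B"
    n = len(B[0])
    # Entry (i, j) is a_i1*b_1j + a_i2*b_2j + ... + a_ik*b_kj.
    return [[sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(n)] for i in range(m)]

A = [[3, 2, 1, 5],
     [9, 1, 3, 0]]              # size 2 x 4
B = [[2, 9, 0], [1, 3, 5],
     [2, 4, 7], [8, 1, 5]]      # size 4 x 3
print(matmul(A, B))             # size 2 x 3; matmul(B, A) is undefined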
 
Example: Let

A = [ 3 2 1 5 ]    and    B = [ 2 9 0 ]
    [ 9 1 3 0 ]              [ 1 3 5 ]
                             [ 2 4 7 ]
                             [ 8 1 5 ]

If possible, compute AB and BA.

20


Example: Let

A = [ 6 3 ]    and    B = [ 7 4 2 ]
    [ 2 5 ]              [ 6 7 3 ]
    [ 9 8 ]

If possible, compute AB and BA.

Definition
If n is a positive integer and A is square we define exponentiation by

An = A × A × · · · × A    (n times)
 
Example: If A = [ 2 1 ; 3 2 ] compute A2 and A3 .

21
2.2.3 Systems of Matrix Linear Equations
Theorem
Every system of equations given by the form

a11 x1 + · · · + a1n xn = b1
        ...
am1 x1 + · · · + amn xn = bm

is equivalent to the matrix equation

[ a11 · · · a1n ]        [ x1 ]        [ b1 ]
[  ..  ..   ..  ]        [ .. ]   =    [ .. ]
[ am1 · · · amn ]m×n     [ xn ]n×1     [ bm ]m×1

This is commonly denoted by A~x = ~b or AX = B.

Example: Write the following system of linear equations in matrix multiplication notation.

2x + y + 3z = 2

y−z =1

5x + z = 0

Example: Write the following system of linear equations in matrix multiplication notation.
(
2x − y + 5z = 5
x+y−z =4

22
2.3 Matrix Transposition
Definition

Let A be a matrix of size m × n. The transpose of A, denoted AT , is the size n × m matrix obtained from A by interchanging the rows and columns of A. Component-wise, if AT = [aTij ] and A = [aij ] then aTij = aji .
 
Example: Compute the transpose of A = [ 2 −1 3 ; 0 1 7 ]

Definition

A square matrix A is called symmetric if AT = A. A square matrix B is called skew-symmetric if


B T = −B.
   
Example: Given the matrices

C = [  0 −1 2 ]    and    D = [  0 −1 3 ]
    [  1  0 3 ]              [ −1  2 1 ]
    [ −2 −3 0 ]              [  3  1 4 ]

which is symmetric and which is skew-symmetric?

Theorem
Let A be a size m × k matrix and B be a size k × n matrix. Then the following identities hold...

• (AT )T = A

• (A + B)T = AT + B T

• (AB)T = B T AT

Example: Let A be symmetric and B be skew-symmetric both of size n × n. Simplify ((AB)T + BA)T .

23
2.4 Some Special Matrices
2.4.1 Idempotent and Partitioned Matrices
Definition

A square matrix is idempotent if A = A2 .


 
Example: Let A = [ 4 −1 ; 12 −3 ]. Compute A2 , A3 and A37,142 .

 
Example: Find all values x and y such that [ x y ; y x ] is idempotent.

24
Definition
A block matrix or a partitioned matrix is a matrix that is interpreted as having been broken into
sections called blocks or submatrices.

Theorem
Addition and multiplication of block matrices is carried out the same as it is defined in terms of ordinary
matrices but with entries as matrices instead of scalars. This is allowed so long as the operations between
these submatrices are defined.

Example: Carry out the multiplication of the two matrices given below

[ −1  2  4  1 ]  [ −2  2 −3 ]
[  1  0 −1 −2 ]  [  0  1 −1 ]
[  2 −1  3  1 ]  [ −2 −1  0 ]
[  1  2  3  4 ]  [  4  0  1 ]

in the manner of using the highlighted partitions.

25
2.4.2 Trace of a Matrix
Definition
Let A be a size n × n matrix. The trace of A is the sum of diagonal elements. That is,
trace(A) = a11 + a22 + · · · + ann = Σ_{k=1}^{n} akk

Example: Compute the trace of


 
[ 3 1 0 ]
[ 2 5 3 ]
[ 4 6 8 ]

Cyclic Property

If all the listed multiplication of matrices A, B and C are defined then

• trace(AB) = trace(BA)

• trace(ABC) = trace(CAB) = trace(BCA)


 
Example: Consider A = [ 2 1 ; 3 2 ; 4 5 ] and B = [ 1 0 3 ; 2 1 5 ]. Demonstrate that trace(AB) = trace(BA).

26
Chapter 3

Inverse Matrices and Determinants

27
3.1 The Inverse Matrix
Definition
Let A and B be square matrices of size n × n. We say that B is the inverse of A (and vice versa) provided
that AB = BA = I. We denote this by B = A−1 .

Definition
Any matrix for which the inverse does not exist is called singular. If the inverse exists we say the matrix
is invertible or non-singular.
 −1  
1 2 −1 1 1 2
Example: Show that  −2 0 1  =  1 1 1 .
1 −1 0 2 3 4

Definition
 
The determinant of a matrix A = [ a11 a12 ; a21 a22 ] is defined as |A| = a11 a22 − a21 a12 .

Theorem
   
If A = [ a11 a12 ; a21 a22 ] and |A| ≠ 0 then A−1 = (1/|A|) [ a22 −a12 ; −a21 a11 ].
   
Example: Compute, if possible, the inverse of A = [ 3 1 ; 6 2 ] and B = [ 3 4 ; 1 5 ].
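
As a numerical check of the formula above, a sketch in Python (our own helper, for verification only):

def inv2x2(a11, a12, a21, a22):
    det = a11 * a22 - a21 * a12
    if det == 0:
        return None  # |A| = 0: the matrix is singular
    return [[ a22 / det, -a12 / det],
            [-a21 / det,  a11 / det]]

print(inv2x2(3, 1, 6, 2))  # the matrix A above
print(inv2x2(3, 4, 1, 5))  # the matrix B above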

28
3.2 Determinant and Inverse of a 3 × 3 Matrix
3.2.1 Determinant of a 3 by 3 Matrix
Definition
Let A be a size 3 × 3 matrix. We define its minor in position (i, j) as the 2 × 2 determinant of the matrix
obtained by deleting row i and column j. This is denoted Mij .
 
Example: Let A = [ 3 1 2 ; 4 1 7 ; 3 6 9 ]. Compute M23 .

Definition

Let A be a size 3 × 3 matrix. We define its cofactor in position (i, j) as the quantity Cij = (−1)i+j Mij .
 
Example: Let A = [ 2 −1 3 ; 4 1 5 ; 2 −2 3 ]. Compute C12 .

29
Definition
Let A be a size 3 × 3 matrix. We define its determinant as |A| = a11 C11 + a12 C12 + a13 C13 .

Example: Compute the determinant of A = [ 0 2 3 ; −1 0 4 ; 2 3 0 ].

Theorem
Let A be a size 3 × 3 matrix. The value of the determinant is independent of the row or column expanded
along. That is, |A| = ai1 Ci1 + ai2 Ci2 + ai3 Ci3 along any i’th row and |A| = a1j C1j + a2j C2j + a3j C3j along
any j’th column.
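
Cofactor expansion also gives a short recursive routine. A sketch in Python, expanding along the first row (any row or column would give the same value by the theorem above):

def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor in position (1, j): delete the first row and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += A[0][j] * (-1) ** j * det(minor)  # a_1j * C_1j
    return total

print(det([[0, 2, 3], [-1, 0, 4], [2, 3, 0]]))  # matrix from the previous example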

Example: Compute the determinant of A = [ 1 2 0 ; 4 5 3 ; 5 −4 0 ]

30
3.2.2 Inverse of a 3 by 3 Matrix by Cofactors
Definition
Let A be a size 3 × 3 matrix. The cofactor matrix of A is defined as cof(A) = [Cij ] where Cij is the (i, j)
cofactor of A. The adjoint matrix of A is defined as adj(A) = (cof(A))T .

Theorem
If A is a size 3 × 3 matrix and |A| ≠ 0 then A−1 = (1/|A|) adj(A).
 
Example: Find A−1 if A = [ 3 1 2 ; 4 1 5 ; 7 6 2 ].

31
3.2.3 Inverse of a 3 by 3 Matrix by Row Reductions
Theorem
Let A be a size 3 × 3 matrix such that |A| ≠ 0. If the partitioned matrix [I|B] is obtained from [A|I] through a sequence of row operations, then B = A−1 .
 
Example: Find A−1 if A = [ 3 0 2 ; 2 0 −2 ; 0 1 1 ]

32
3.2.4 Solving Systems of Linear Equations Using the Inverse
Theorem
IA = A and CI = C for all matrices A and C where multiplication is defined.

Example: Solve the following system of linear equations using the inverse matrix.


2x + 3y + z = 4

3x + 3y + z = −2

2x + 4y + z = 1

33
3.3 Properties and Higher Order Matrices
3.3.1 Higher Order Determinants and Inverses
Definition
Let A be a size n × n matrix. We define its minor in position (i, j) as the (n − 1) × (n − 1) determinant of
the matrix obtained by deleting row i and column j. This is denoted Mij .

Definition

Let A be a size n × n matrix. We define its cofactor in position (i, j) as the quantity Cij = (−1)i+j Mij .
 
Example: Let

A = [  1 2  0  3 ]
    [  2 3  0  4 ]
    [ −1 2  2  3 ]
    [  0 4 −1 −1 ]

Compute C34 .

Definition
Let A be a size n × n matrix. We define its determinant as |A| = a11 C11 + a12 C12 + · · · + a1n C1n .
 
Example: Let

A = [  1 0 0  0 ]
    [  2 3 0  0 ]
    [ −1 7 3  0 ]
    [  0 0 2 −2 ]

Compute |A|.

34
Theorem
Let A be a size n × n matrix. The value of the determinant is independent of the row or column expanded
along. That is, |A| = ai1 Ci1 + · · · + ain Cin along any i’th row and |A| = a1j C1j + · · · + anj Cnj along any
j’th column.

Definition
Let A be a size n × n matrix. The cofactor matrix of A is defined as cof(A) = [Cij ] where Cij is the (i, j)
cofactor of A. The adjoint matrix of A is defined as adj(A) = (cof(A))T .

Theorem
If A is a size n × n matrix and |A| ≠ 0 then A−1 = (1/|A|) adj(A).

Theorem
Let A be a size n × n matrix such that |A| ≠ 0. If the partitioned matrix [I|B] is obtained from [A|I]
through a sequence of row operations, then B = A−1 .
 −1
4 0 0 0
 0 0 2 0 
Example: Compute 
 0

1 2 0 
1 0 0 1

35
3.3.2 Properties of the Inverse
Theorem
A square matrix A is invertible if and only if |A| ≠ 0.

Theorem
Let A and B be size n × n matrices with |A| ≠ 0 and |B| ≠ 0. Then...

• (AB)−1 = B −1 A−1

• (A−1 )−1 = A

• (AT )−1 = (A−1 )T


• If A is diagonal with diagonal entries a11 , ..., ann then A−1 is the diagonal matrix with diagonal entries 1/a11 , ..., 1/ann .

Example: Let A be symmetric and B be skew-symmetric invertible matrices of size n × n. Simplify the
expression ((AB)T )−1 (A + B)T .

36
3.3.3 Determinant Properties and Operations
Definition
A square matrix U = [uij ] is called upper triangular if uij = 0 whenever i > j. A square matrix L = [lij ]
is called lower triangular if lij = 0 whenever i < j. If a matrix is either upper or lower triangular we say
it is a triangular matrix. Elements on the diagonal are allowed to be zero.
 
Example: The matrix U = [ 2 1 −1 ; 0 −3 4 ; 0 0 5 ] is upper triangular.

Example: The matrix L = [ −3 0 0 ; 2 0 0 ; 4 1 3 ] is lower triangular.
Theorem
Let A and B be size n × n matrices and let k be a non-zero real scalar. Then the following hold...

• |AT | = |A|

• |A−1 | = |A|−1

• If B is obtained from A by swapping two rows or columns then |A| = −|B|.


• If B is obtained from A by multiplying a row or column by k then |A| = (1/k)|B|.
• If B is obtained by adding a multiple of a row (or column) to another row (or respectively column)
from A then |A| = |B|.

• If A is a triangular or diagonal matrix then |A| is the product of the diagonal elements.

Example: Compute |U | and |L| of the previous example.

37
Example: Using determinant rules compute the following.

| 1 4  4  1 |
| 0 1 −2  2 |
| 3 3  1  4 |
| 0 1 −3 −2 |

38
3.4 Cramer’s Rule
Cramer’s Rule
Let Ax = b represent a system of linear equations where A is a size n × n matrix and x and b are both size n × 1 matrices. Let Aj denote the matrix obtained by taking the j’th column of A and replacing it with the column matrix b. Provided |A| ≠ 0 the system has a unique solution given by

x1 = |A1 |/|A|,    x2 = |A2 |/|A|,    · · · ,    xn = |An |/|A|
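
A sketch of the rule in Python with NumPy (assuming it is available); fine for small systems, though row reduction is preferred for large ones:

import numpy as np

def cramer(A, b):
    A, b = np.asarray(A, float), np.asarray(b, float)
    d = np.linalg.det(A)
    if np.isclose(d, 0):
        raise ValueError("|A| = 0: Cramer's rule does not apply")
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                  # replace column j of A with b
        x[j] = np.linalg.det(Aj) / d  # x_j = |A_j| / |A|
    return x

print(cramer([[2, 4], [3, -1]], [7, 2]))  # the first example below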

Example: Using Cramer’s rule, solve


(
2x + 4y = 7
3x − y = 2

39
Example: Using Cramer’s rule, solve

2x + 4y − z = 15

x − 3y + 2z = −5

6x + 5y + z = 28

for only z.

40
Chapter 4

Advanced Topics in Linear Algebra

41
4.1 Vector Spaces
4.1.1 Introducing Vectors
Definition
A vector is a column or row matrix. In the space of Rn the vector v is graphed as an arrow where the base
of the arrow is chosen at a point P and the tip is located at v1 units along the x1 -axis, v2 units along the
x2 -axis, ...,and vn units along the xn -axis starting from P .

Definition
If a vector is graphed with its base at the origin 0 we say that the vector is in standard position.

Example: Draw the following vectors in R2 in standard position.

   
v = [ 1 ; 2 ]    w = [ 3 ; −2 ]

Definition
The length or magnitude of a vector v is ‖v‖ = √(v1^2 + v2^2 + · · · + vn^2 )


Example: Compute the length of u = [ 2 ; 1 ; 4 ].

42
4.1.2 Visualizing Vector Operations
• Parallelogram Law: When adding two vectors the result is the diagonal of the corresponding parallelogram.

• Vector Subtraction: Given two vectors u and v the vector u − v is the resulting vector connecting the
tip of v to the tip of u.

• Scalar Multiplication: When multiplying a vector by a scalar k you lengthen the vector by a scale |k|. If
k > 0 you extend this in the original direction of the vector, while if k < 0 you extend this in the opposite
direction.

43
4.1.3 The Dot Product and Orthogonality
Definition

The dot product of two vectors v and w in Rn is v · w = vT w = v1 w1 + v2 w2 + · · · + vn wn .


  
Example: Find the dot product of v = [ 2 ; −1 ; 3 ] and w = [ 0 ; 4 ; −3 ].

Dot Product Properties

Let u, v and w be vectors in Rn and let k be a real scalar. Then...

• v · w = w · v (Commutative)

• u · (v + w) = u · v + u · w (Distributive)

• (kv) · w = k(v · w) (Associative)

• v · v = ‖v‖^2

Example: If ‖u‖ = 3 and v · u = 3 compute u · (3u + v).

44
Theorem
Let u and v be vectors in Rn and let θ be the angle between them. Then u · v = ‖u‖‖v‖ cos(θ).
   
Example: Determine the cosine of the angle between u = [ 3 ; 4 ] and v = [ 4 ; 3 ].

Definition
Two vectors are said to be orthogonal if their dot product is zero. Consequently, if they are non-zero
vectors, then graphically they are perpendicular if they are orthogonal.

Example: Find all values k for which the two vectors are orthogonal
   
v = [ −5 ; k ; 1 ; 2k ]    w = [ 2 ; 4 ; −1 ; k ]

Example: Let u and v be orthogonal vectors in Rn . Show that ‖u + v‖ = √(‖u‖^2 + ‖v‖^2 ).

45
4.1.4 Unit Vectors
Definition
We say v is a unit vector provided ‖v‖ = 1. Sometimes a unit vector is denoted with an additional “hat” to emphasize it is a unit vector. For example v = v̂ if v is a unit vector.

Example: Find all values k such that

v = [ k^2/√2 ; 0 ; k/√2 ]

is a unit vector.

Definition

The standard basis vectors in Rn are the unit vectors ei with 1 in the i’th position and zeroes elsewhere.
The notation e1 = i, e2 = j, e3 = k, ... is sometimes used.

Theorem
Let v be a vector in Rn and let k be a real scalar. Then...

• ‖kv‖ = |k|‖v‖; and consequently

• If v ≠ 0 then v̂ = (1/‖v‖) v is a unit vector that points in the same direction as v. The process of constructing v̂ is called normalization.

Example: Compute a unit vector in the same direction as


 
v = [ 2 ; −1 ; 5 ]

46
4.1.5 Linear Combinations and Spanning Sets
Definition
We say that v is a linear combination of the vectors v1 , ..., vn if it is of the form

v = a 1 v1 + · · · + a n vn
for some scalars a1 , ..., an .
Example: Express the vectors u = [ 1 2 −1 ]T and w = [ −1 2 0 ]T as a linear combination of v1 = [ 1 2 3 ]T and v2 = [ 1 0 2 ]T if it is possible to do so.

47
Definition
The span of a collection of vectors v1 , ..., vn is the collection of all vectors expressible as a linear combination
of v1 , ..., vn . This is denoted Span({v1 , ..., vn }). We say that a collection of vectors v1 , ..., vn span a
region of vectors U if U = Span({v1 , ..., vn }).

    
Example: Show that [ 12 ; −11 ; 7 ] lies in the region Span{ [ 2 ; −1 ; 3 ] , [ −3 ; 4 ; 1 ] }.

Example: Show that Span{ [ 1 ; 1 ] , [ 2 ; 1 ] } = R2 .

48
4.1.6 Linear Independence and Dependence
Theorem
Solving the system a1 v1 + · · · + an vn = b for a1 , ..., an is equivalent to solving the system Aa = b where A = [v1 | · · · |vn ] and a = [ a1 · · · an ]T .

Definition
Given a collection of vectors v1 , ..., vn they are said to be linearly independent if the equation

a1 v1 + · · · + an vn = 0
has only the unique trivial solution a1 = · · · = an = 0. Otherwise the vectors are said to be linearly
dependent. If they are linearly dependent, a non-trivial linear combination that satisfies the equation is
called the nature of dependence.

Theorem
Given a collection of vectors, it is only possible to express one of them as a linear combination of the other
vectors if the collection is linearly dependent.

  
Example: Determine whether the vectors v1 = [ 1 ; 2 ; 3 ] and v2 = [ 1 ; 0 ; 2 ] are linearly dependent or independent.

49
Theorem
Any collection of n vectors v1 , ..., vn in Rn is linearly independent if and only if |A| ≠ 0 where A = [v1 | · · · |vn ].

Example: Determine whether the vectors


     
v1 = [ 1 ; 1 ; 0 ]    v2 = [ 1 ; 0 ; 1 ]    v3 = [ 3 ; 1 ; 2 ]
are linearly dependent or independent. If they are linearly dependent, determine the nature of dependence.

50
4.1.7 Vector Subspaces and Basis of a Vector Space
Definition
A vector subspace of Rn is a subset U lying in Rn that satisfies:

• Closure Under Addition: The vector sum u + v lies in U for all vectors u and v in U .

• Closure Under Scalar Multiplication: The vector ku lies in U for all vectors u in U .

Example: Demonstrate that the space consisting of all vectors


 
[ x ; y ]

where xy ≥ 0 is not a subspace of R2 .

Theorem
The spanning set of any collection of vectors is a vector subspace.

Definition
We say that a collection of vectors v1 , ..., vn is a basis of a given vector space if the vectors span the vector space and are linearly independent. That is, a basis of a vector space is a coordinate representation of the space with the least number of coordinate axes required to represent it through linear combinations.

Definition
The dimension of a vector space is the number of basis vectors of that space. This number is unique.

Theorem
Every collection of n linearly independent vectors in an n-dimensional vector space is a basis for that space.

51
Definition
Let v1 , ..., vm be a collection of m vectors in Rn and let U be a subregion of Rn . We say that...

• the collection is orthogonal if vi · vj = 0 whenever i 6= j.

• the collection is an orthogonal basis of U if it is a basis of U and orthogonal.

• the collection is an orthonormal basis of U if it is an orthogonal basis of U and each vector in the
collection is a unit vector.

Theorem
Every orthogonal collection of vectors is linearly independent.

Example: Verify that the following vectors form an orthogonal basis of R3 .


     
u = [ 1 ; 1 ; 1 ]    v = [ 1 ; −1 ; 0 ]    w = [ 1 ; 1 ; −2 ]

52
Gram Schmidt Process
Let U be a vector space with basis v1 , ..., vn . Then the following vectors

u1 = v1

u2 = v2 − ((v2 · u1 )/(u1 · u1 )) u1

u3 = v3 − ((v3 · u1 )/(u1 · u1 )) u1 − ((v3 · u2 )/(u2 · u2 )) u2

· · ·

un = vn − ((vn · u1 )/(u1 · u1 )) u1 − ((vn · u2 )/(u2 · u2 )) u2 − · · · − ((vn · un−1 )/(un−1 · un−1 )) un−1

form an orthogonal basis for U .
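
A sketch of the process in Python with NumPy, subtracting from each vk its projections onto the previously built vectors:

import numpy as np

def gram_schmidt(vectors):
    us = []
    for v in vectors:
        u = np.asarray(v, float)
        for w in us:
            u = u - ((u @ w) / (w @ w)) * w  # remove the component along w
        us.append(u)
    return us

# The spanning set of W from the example below.
for u in gram_schmidt([[1, 0, 1, 0], [1, 1, 0, 1], [0, 1, 2, 3]]):
    print(u)
# Any two of the printed vectors have dot product 0 (up to rounding).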
     
 1
 1 0 
 0   1   1 
 1  ,  0
Example: Consider the vector space W = Span      ,   . Construct an orthogonal basis
  2 
 
0 1 3
 
for this space.

53
4.1.8 Rank and Nullity
Definition
Let A be a size m × n matrix. We call...

• the span of rows of A the rowspace and is denoted Row(A).

• the span of columns of A the columnspace and is denoted Col(A).

• all vector solutions to Ax = 0 the nullspace and is denoted Null(A).

Theorem
Let A be a size m × n matrix with reduced row echelon matrix B. Then...

• Row(A) is a subspace of Rn with basis given by the non-zero rows in B.

• Col(A) is a subspace of Rm with basis given by the columns in A corresponding to those with leading
1’s in B.

• Null(A) is a subspace of Rn with basis given by the vectors multiplied by the free variables in the
general solution.
 
Example: Let A = [ 2 4 6 8 ; 1 3 0 5 ; 1 1 6 3 ]. If you are given that the reduced row echelon form of A is given by

        [ 1 0  9 2 ]
A  ∼    [ 0 1 −3 1 ]    (RREF)
        [ 0 0  0 0 ]

determine a basis for the rowspace, columnspace and nullspace of A.

54
Theorem
For any matrix A, the dimension of Row(A) is equivalent to the dimension of Col(A). This quantity is
called the rank of A and denoted Rank(A). Consequently, this is given by the number of leading 1’s of A.

Definition
The dimension of the nullspace of any matrix is called the Nullity of A and is denoted Nullity(A). Conse-
quently, this is given by the number of free variables in the general solution.

Rank-Nullity Theorem

Let A be a size m × n matrix. Then Nullity(A)+Rank(A) = n.

Example: Demonstrate that the Rank-Nullity theorem holds in the previous example.
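
A sketch of this verification with SymPy (exact arithmetic), using the matrix A from the previous example:

from sympy import Matrix

A = Matrix([[2, 4, 6, 8],
            [1, 3, 0, 5],
            [1, 1, 6, 3]])
rank = A.rank()
nullity = len(A.nullspace())   # dimension of Null(A)
print(rank, nullity, A.cols)   # Rank(A) + Nullity(A) = n = 4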

55
4.2 The Eigenvalue Problem
4.2.1 Introducing the Eigenvalue Problem and Characteristic Equation
Definition
Let A be a size n × n matrix. Let λ be an unknown scalar and v be a non-zero unknown column vector of size n × 1. Solving the problem Av = λv for the unknowns is called the eigenvalue problem. The value λ is called an eigenvalue of A and v is called the corresponding eigenvector of A.

Example: Consider the following,


   
A = [ 4 2 ; 2 1 ]    v = [ 2 ; 1 ]
Demonstrate that v is an eigenvector of A and find its corresponding eigenvalue.

Theorem
Any eigenvalue λ that solves the eigenvalue problem associated to a matrix A must satisfy |A − λI| = 0
(called the characteristic equation). If A − λI were invertible, then the only eigenvector solutions would
be v = 0.

Definition
Let A be a size n × n matrix. The n’th degree polynomial p(λ) = |A − λI| is called the characteristic
polynomial of A.
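
Numerically, NumPy solves the eigenvalue problem in one call; a sketch using the matrix of the example below (by hand we instead solve the characteristic equation):

import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
eigvals, eigvecs = np.linalg.eig(A)     # columns of eigvecs are eigenvectors
print(eigvals)                          # roots of p(lambda) = |A - lambda I|
for lam, v in zip(eigvals, eigvecs.T):
    print(np.allclose(A @ v, lam * v))  # each pair satisfies Av = lambda v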

Example: Find the eigenvalues of


 
A = [ 3 1 ; 2 4 ]

56
Example: Find the eigenvalues of
 
A = [ −1 1 0 ; 1 2 1 ; 0 3 −1 ]

57
4.2.2 Solving the Eigenvalue Problem and Multiplicity
Theorem
Let A be a size n × n matrix with eigenvalue λ. The eigenvectors of A associated to λ are given by the basis
of the nullspace of A − λI.
 
Example: Solve the eigenvalue problem if A = [ 2 2 2 ; 0 2 0 ; 0 1 3 ] provided that all the eigenvalues are given by λ = 2 and λ = 3.

58
Definition
The number of times an eigenvalue is repeated as a root of the characteristic polynomial is called its algebraic multiplicity. The number of linearly independent eigenvectors corresponding to an eigenvalue is called its geometric multiplicity.

Theorem
Let A be a matrix with eigenvalue λ. Then the following inequality holds,

1 ≤ Geometric Multiplicity ≤ Algebraic Multiplicity


Furthermore, the geometric multiplicity of λ is given by Nullity(A − λI).
 
Example: State the algebraic and geometric multiplicity of each eigenvalue corresponding to A = [ 0 1 0 ; 0 0 1 ; 1 −3 3 ].

59
4.2.3 Diagonalization
Definition
Suppose A and B are size n × n matrices. Then A and B are similar, denoted A ∼ B if there exists a
nonsingular matrix S of size n such that A = SBS −1 . We say a matrix is diagonalizable if it is similar to
a diagonal matrix.

Spectral Theorem

Let A be a size n × n matrix. A is diagonalizable if and only if the algebraic and geometric multiplicity of each eigenvalue are equal. Furthermore, denote the pairs of solutions to the eigenvalue problem as λi and vi . Then A = P DP −1 where P is the matrix P = [ v1 | · · · | vn ] and D is the diagonal matrix with diagonal elements in the order λ1 , ..., λn .

Example: Form the diagonalization of the following matrix


 
A = [ 4 0 −2 ; 2 5 4 ; 0 0 5 ]

given that λ1 = 4, v1 = [ −1 ; 2 ; 0 ] and λ2 = λ3 = 5, v2 = [ 0 ; 1 ; 0 ], v3 = [ −2 ; 0 ; 1 ] are all solutions to the eigenvalue problem.
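
A sketch with NumPy verifying A = P DP −1 for this example, assembling P from the given eigenvectors and D from the eigenvalues:

import numpy as np

A = np.array([[4.0, 0.0, -2.0],
              [2.0, 5.0,  4.0],
              [0.0, 0.0,  5.0]])
P = np.array([[-1.0, 0.0, -2.0],
              [ 2.0, 1.0,  0.0],
              [ 0.0, 0.0,  1.0]])    # columns are v1, v2, v3
D = np.diag([4.0, 5.0, 5.0])         # eigenvalues in matching order
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True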

60
4.2.4 The Principal Axis Theorem and Orthogonal Matrices
Principal Axis Theorem

Let A be a symmetric matrix of size n × n. Then the following results hold:

• A has exactly n (not necessarily distinct) eigenvalues.

• The eigenvalues of A are real numbers.

• There exists a set of n linearly independent eigenvectors.

• The eigenvectors corresponding to distinct eigenvalues of A are orthogonal.


 
Example: The solution to the eigenvalue problem associated to A = [ 3 2 4 ; 2 0 2 ; 4 2 3 ] is given by λ1 = λ2 = −1, v1 = [ −1 ; 2 ; 0 ], v2 = [ −1 ; 0 ; 1 ] and λ3 = 8, v3 = [ 2 ; 1 ; 2 ]. Demonstrate that eigenvectors corresponding to different eigenvalues are orthogonal.

Definition
A matrix Q of size n × n is called an orthogonal matrix if the columns of Q form an orthonormal set.

Theorem

Let Q be an orthogonal matrix, then Q−1 = QT .

Theorem

Let A be a symmetric matrix, then there exists an orthogonal matrix Q such that A = QDQT . This
representation is called orthogonal diagonalization.

61
Example: Following the previous example, use the Gram Schmidt Process to construct an orthogonal basis for the nullspace of A + I (associated to λ = −1). Use this to orthogonally diagonalize the matrix.

62
4.2.5 Properties of Similar Matrices and High Powers
Theorem

Let A and B be similar matrices given by A = SBS −1 . Then both matrices share the same...

• Rank

• Determinant

• Trace

• Eigenvalues and both their algebraic and geometric multiplicities

Furthermore, for any integer n ≥ 1 we have An = SB n S −1 .

Theorem
Let A be a matrix whose eigenvalues are λ1 , ..., λn (not necessarily distinct). Then Trace(A) = λ1 + · · · + λn
and |A| = λ1 × · · · × λn .

Example: Let A be a 3 × 3 matrix with characteristic polynomial p(λ) = (2 − λ)(λ + 4)^2 . Compute the trace and determinant of A.

Theorem
Let B be a size n × n diagonal matrix with diagonal elements given by b1 , ..., bn . Then B m is the diagonal matrix with diagonal elements b1^m , ..., bn^m .
Example: Given the diagonalization

[ 2  6 ]    [ −2 1 ] [ −1 0 ] [ 0 1 ]
[ 0 −1 ] =  [  1 0 ] [  0 2 ] [ 1 2 ]

find a formula for [ 2 6 ; 0 −1 ]^n as a single matrix.
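
A symbolic sketch with SymPy of A^n = P D^n P −1 (the power of the diagonal factor is immediate by the theorem above):

from sympy import Matrix, Symbol

n = Symbol('n')
P = Matrix([[-2, 1], [1, 0]])
Dn = Matrix([[(-1)**n, 0], [0, 2**n]])  # D^n for D = diag(-1, 2)
print(P * Dn * P.inv())                 # each entry is a formula in n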

63
4.3 Quadratic Forms
4.3.1 Introducing Quadratic Forms
Definition

A function g : Rn → R of the form g(x) = xT Ax for some size n × n matrix A is called a quadratic form.
 
Example: Expand the matrix multiplication in g(x) = xT Ax where A = [ 3 1 ; 4 2 ] and x is in R2 . Then compute g(2, −1).

Definition

Let g : Rn → R be a quadratic form given by g(x) = xT Ax where A = [aij ]. When the quadratic form is
expressed as g(x) = xT Ax we say that it is in matrix form. When expanded to
g(x) = Σ_{j=1}^{n} Σ_{i=1}^{n} aij xi xj

we say it is in scalar form.

Theorem
Let g(x) = xT Ax be a quadratic form. Define the matrix S = (1/2)(A + AT ); then S is symmetric and g(x) = xT Sx. This representation is called the symmetric representation.
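
A quick numerical sketch that A and its symmetric part S produce the same quadratic form, using the matrix from the first example of this section:

import numpy as np

A = np.array([[3.0, 1.0],
              [4.0, 2.0]])
S = (A + A.T) / 2            # the symmetric representation
x = np.array([2.0, -1.0])
print(x @ A @ x, x @ S @ x)  # the two values agree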

Example: Find a symmetric representation (in matrix form) of the quadratic form

g(x) = 3x1^2 + 2x1 x2 + 5x1 x3 + 8x3^2

64
4.3.2 Definiteness of a Symmetric Matrix
Definition

Let g(x) = xT Ax be a quadratic form in symmetric representation. We say that A is...

Definiteness:

• Positive definite if g(x) > 0 for all x ≠ 0 and is denoted A ≻ 0.

• Negative definite if g(x) < 0 for all x ≠ 0 and is denoted A ≺ 0.

Semi-Definiteness:

• Positive semi-definite if g(x) ≥ 0 for all x ≠ 0 and is denoted A ⪰ 0.

• Negative semi-definite if g(x) ≤ 0 for all x ≠ 0 and is denoted A ⪯ 0.

and is called indefinite if it is none of the above.

Example: Determine the definiteness of the matrix


 
A = [ 1 1 ; 1 1 ]

Example: Determine the definiteness of the matrix


 
A = [ −3 0 ; 0 −1 ]

65
Example: Determine the definiteness of the matrix
 
A = [ 2 −1 0 ; −1 2 −1 ; 0 −1 2 ]

Theorem
Every positive definite matrix is also positive semi-definite and every negative definite matrix is also negative
semi-definite. The reverse inclusion is not necessarily true.

66
Theorem
Let A and B be size n × n symmetric matrices. If A and B are the same definiteness, then A + B is a
matrix with the same definiteness.

Definition
Let f : U ⊆ Rn → R be a function. We say that x∗ is a...

• global maximum or absolute maximum on U if f (x∗ ) ≥ f (x) for all x in U .

• global minimum or absolute minimum on U if f (x∗ ) ≤ f (x) for all x in U .

Theorem

Let g(x) = xT Ax be a quadratic form in symmetric representation. Then x = 0 is a...

Definiteness:

• unique global minimum of g(x) if A ≻ 0.

• unique global maximum of g(x) if A ≺ 0.

Semi-Definiteness:

• global minimum of g(x) if A ⪰ 0.

• global maximum of g(x) if A ⪯ 0.

Note
For functions of two variables you can visualize quadratic forms and other surfaces using GeoGebra’s 3D
graphing calculator. All of Geogebra’s calculators can be used in-browser by clicking on the appro-
priate program at the following link: Geogebra’s Programs.

67
4.3.3 The Eigenvalue Method
Lemma

Let g(x) = xT Ax be a quadratic form in symmetric representation. Let λ1 ,...,λn be the eigenvalues of A and let A = QDQT be the orthogonal diagonalization. Then

g(x) = yT Dy = λ1 y1^2 + · · · + λn yn^2

where y = QT x.

Theorem
Let A be a symmetric size n × n matrix whose eigenvalues are λ1 , ..., λn . Then...

Definiteness:

• A ≻ 0 if and only if λi > 0 for i = 1, ..., n.

• A ≺ 0 if and only if λi < 0 for i = 1, ..., n.

Semi-Definiteness:

• A ⪰ 0 if and only if λi ≥ 0 for i = 1, ..., n.

• A ⪯ 0 if and only if λi ≤ 0 for i = 1, ..., n.

and A is indefinite if and only if some signs of the eigenvalues are different.
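
A sketch of the eigenvalue method in NumPy (eigvalsh handles symmetric matrices; the small tolerance guards against rounding error):

import numpy as np

def definiteness(A, tol=1e-12):
    lam = np.linalg.eigvalsh(A)    # real eigenvalues, in ascending order
    if all(lam > tol):
        return "positive definite"
    if all(lam < -tol):
        return "negative definite"
    if all(lam >= -tol):
        return "positive semi-definite"
    if all(lam <= tol):
        return "negative semi-definite"
    return "indefinite"

print(definiteness(np.array([[4.0, -2.0], [-2.0, 1.0]])))  # first example below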

Example: Determine the definiteness of


 
A = [ 4 −2 ; −2 1 ]

68
Example: Determine the definiteness of
 
A = [ 2 1 ; 1 5 ]

Example: Determine the definiteness of


 
A = [ −2 1 ; 1 −2 ]

Example: Determine the definiteness of


 
A = [ 1 2 ; 2 1 ]

69
4.3.4 Minors, Principal Minors, and Leading Principal Minors
Definition
Let A be a size n × n matrix. A minor of order n − k is a sub-determinant obtained from A by deleting
k rows and k columns of A. If the columns deleted are i1 , ..., ik and the rows are j1 , ..., jk the minors are
denoted Mi1 ...ik ;j1 ...jk .
 
Example: Let A = [ 1 2 0 −1 ; 2 −1 3 1 ; 0 3 −2 0 ; −1 1 0 2 ]. Compute M13;24 . What is the order of this minor?

Definition
Let A be a size n × n matrix. A principal minor of order n − k is a minor with the same k rows and
k columns deleted from A. Specifically, if the columns and rows deleted are given by r1 , ..., rk then the
principal minors are given by Mr1 ...rk ;r1 ...rk . The collection of all principal minors of order m is denoted A∗m .
 
Example: Compute all principal minors of order 2 for

A = [ 2  0  1  2 ]
    [ 0 −2 −3  3 ]
    [ 1 −3  1  0 ]
    [ 2  3  0 −2 ]

70
Theorem
 
Let A be a size n × n matrix. The number of principal minors of A of order m is given by the binomial coefficient (n choose m) = n!/(m!(n − m)!).

Example: Show that the above theorem is consistent with the previous example.

Definition
Let A be a size n × n matrix. The leading principal minor of order n − k is the principal minor obtained
by deleting the last k rows and columns of A. The leading principal minor of order m is denoted |Am |.

Note
You can visualize the leading principal minors by drawing an expanding box starting from the top left
corner of a matrix.

Example: Compute all leading principal minors of the following matrix

A = [ 2  3 4 0 ]
    [ 3 −1 0 0 ]
    [ 4  0 3 0 ]
    [ 0  0 0 2 ]

71
4.3.5 Sylvester’s Criterion and the Method of Minors
Notation
Consider a collection of real numbers S = {s1 , ..., sm }. We denote the sign of S as...

• S > 0 provided s1 > 0, ..., sm > 0.

• S ≥ 0 provided s1 ≥ 0, ..., sm ≥ 0.

• S < 0 provided s1 < 0, ..., sm < 0.

• S ≤ 0 provided s1 ≤ 0, ..., sm ≤ 0.

Sylvester’s Criterion

Let A be a symmetric size n × n matrix. If...

Definiteness:

• |Ai | > 0 for all i = 1, ..., n then A ≻ 0.

• (−1)^i |Ai | > 0 for all i = 1, ..., n then A ≺ 0.

Semi-Definiteness:

• A∗i ≥ 0 for all i = 1, ..., n then A ⪰ 0.

• (−1)^i A∗i ≥ 0 for all i = 1, ..., n then A ⪯ 0.

and A is indefinite if it fits none of the patterns above. Moreover, if any even order principal minor of A is negative then A is indefinite.
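
A sketch computing the leading principal minors |A1 |, ..., |An | with NumPy, applied to the positive definite matrix from an earlier example (for semi-definiteness one would need all principal minors, not just the leading ones):

import numpy as np

def leading_principal_minors(A):
    n = len(A)
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
minors = leading_principal_minors(A)
print(minors)  # approximately 2, 3, 4: all positive
if all(m > 0 for m in minors):
    print("A is positive definite by Sylvester's criterion")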

Example: Determine the definiteness of


 
A = [ 2 4 ; 4 8 ]

72
Example: Determine the definiteness of
 
A = [ 2 0 2 ; 0 4 4 ; 2 4 6 ]

Example: Determine the definiteness of


 
A = [ −2 1 −1 ; 1 −2 −1 ; −1 −1 −2 ]

73
Example: Determine all values of k such that
 
A = [ k −4 −4 ; −4 k −4 ; −4 −4 k ]
is positive definite.

74
4.4 The Inverse Matrix Theorem
Inverse Matrix Theorem
Let A be a square size n × n matrix. All of the following statements are equivalent: if any one of them is true, then all of them are true, and if any one of them is false, then all of them are false.

1. A can be reduced to the identity matrix I through a sequence of row reductions.

2. A has n pivot columns.

3. The equation Ax = 0 has only the trivial solution x = 0.

4. For each column vector b ∈ Rn the equation Ax = b has a unique solution.

5. The columns of A form a linearly independent set.

6. The columns of A span Rn .

7. The columns of A form a basis of Rn .

8. Col(A) = Rn .

9. The dimension of the column space of A is n.

10. The rows of A form a linearly independent set.

11. The rows of A span Rn .

12. The rows of A form a basis of Rn .

13. Row(A) = Rn .

14. The dimension of the row space of A is n.

15. Rank(A) = n.

16. Nullity(A) = 0.

17. Nullspace(A) = {0}.

18. There is an n × n matrix C such that CA = I where I is the identity matrix.

19. There is an n × n matrix D such that AD = I where I is the identity matrix.

20. The transpose matrix AT is invertible.

21. Zero is not an eigenvalue of A.

22. |A| ≠ 0.

23. A−1 exists.

75
Chapter 5

Multivariable Regions and Functions

76
5.1 Point Sets in Rn
5.1.1 Open and Closed Sets
Definition
Given two points a and b in Rn the Euclidean distance between them is d(a, b) = ‖a − b‖ = √((a1 − b1 )^2 + · · · + (an − bn )^2 ).

Note
In the context of point sets in Rn people use both the standard point notation X = (x1 , x2 , ..., xn ) and the vector notation x = [ x1 x2 · · · xn ]T .

Example: Show that the points (−2, −3), (2, 1) and (5, −2) form the vertices of a right angle triangle.

Definition
An ε-neighbourhood of a point x ∈ Rn is given by the set Bε (x) = {y ∈ Rn : d(x, y) < ε}. Simply, it is the collection of points lying within a distance ε of x.

77
Notation
If a point p lies in a region U we denote this p ∈ U . If U is a subregion of Rn we denote this U ⊂ Rn .

Definition
Let U ⊂ Rn . Then...

• A point p ∈ U is called an interior point of U if there exists an ε > 0 such that Bε (p) ⊂ U . The collection of all interior points of U is called the interior of U .

• A point q ∈ U is called a boundary point of U if for every ε > 0 we have that Bε (q) contains both points in U and points not in U . The collection of all boundary points is called the boundary of U .

Definition
Let U ⊂ Rn . Then U is called...

• open if every point in U is an interior point.

• closed if U contains all of its boundary points.

Theorem
Let U ⊂ Rn , then U is open if and only if it contains none of its boundary.

78
Note
The only sets that are both open and closed are Rn (all points) and the empty set {} (no points). Every other set is either open, closed, or neither (never both).

Example: Graph the regions

S1 = {(x, y) ∈ R2 | x + y ≥ 1}    S2 = {(x, y) ∈ R2 | 1 < x^2 + y^2 < 4}    S3 = {(x, y) ∈ R2 | −2 ≤ x ≤ 1, −2 < y < 3}

Which are open, closed, or neither?

79
(Continued...)

Definition
Let U ⊂ Rn and p ∈ U . If there exists an ε > 0 such that U ⊂ Bε (p) then U is called bounded. If U is not bounded we say it is unbounded.

Example: Determine which of the regions in the previous example are bounded and which are unbounded.

80
Extreme Value Theorem
Let U ⊂ Rn and let f : U → R be a continuous function. If U is closed and bounded then f obtains both
an absolute maximum and an absolute minimum on U .

Example: Argue that the function f (x1 , x2 ) = √(x1 − x2^2 ) + √(1 − x1 − x2 ) has an absolute maximum and minimum on its natural domain.

81
5.1.2 Convex Combinations and Convex Sets
Definition
Let p, q ∈ Rn . A convex combination of p and q is a linear combination of the form

x = tp + (1 − t)q

for some 0 ≤ t ≤ 1. The collection of all such points is also called the convex combination of p and q. If
we instead consider 0 < t < 1 we call it a strict convex combination.

Notation
The convex combination of two points p, q ∈ Rn is denoted [p, q]. The strict convex combination is denoted
(p, q).

Example: Determine the convex combination of the points (1, −2, 3) and (2, 4, −1). Does the point (7/4, 5, 0)
lie on this convex combination?

82
Definition
Let U ⊂ Rn . We say that the region U is...

• convex if [p, q] ⊂ U for every p, q ∈ U .

• strictly convex if (p, q) is a subset of the interior of U for every p, q ∈ U .

Example: Prove that the closed half-space {x ∈ Rn : aT x ≥ b} is a convex set for any a ∈ Rn and b ∈ R.

Example: Graph the regions S1 = {(x, y) ∈ R2 | y ≥ x2 } and S2 = {(x, y) ∈ R2 | 1 < x2 + y 2 < 4}. Determine
whether they are strictly convex, non-strict convex, or non-convex regions.

Example: Graph the regions S3 = {(x, y) ∈ R2 | y ≥ |x|} and S4 = {(x, y) ∈ R2 | y > |x|}. Determine whether
they are strictly convex, non-strict convex, or non-convex regions.

Example: Let p ∈ Rn and let ε > 0. Prove that $B_\varepsilon(p)$ is a strictly convex region. In your argument you will need to use the famous triangle inequality:

$\|x + y\| \le \|x\| + \|y\|$
Hint: tx + (1 − t)y − p = (1 − t)(y − p) + t(x − p).

5.2 Convexity and Concavity of Functions
5.2.1 Convex and Concave Functions
Definition
Let U ⊂ Rn be a convex region. Then the function f : U → R is...

Non-Strict

• convex if f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y) for every x, y ∈ U and 0 ≤ t ≤ 1.

• concave if f (tx + (1 − t)y) ≥ tf (x) + (1 − t)f (y) for every x, y ∈ U and 0 ≤ t ≤ 1.

Strict

• strictly convex if f (tx + (1 − t)y) < tf (x) + (1 − t)f (y) for every x, y ∈ U and 0 < t < 1.

• strictly concave if f (tx + (1 − t)y) > tf (x) + (1 − t)f (y) for every x, y ∈ U and 0 < t < 1.

Example: Prove that the function $f(x) = x^2$ is a strictly convex function.
Hint: It is easier to show $tf(x) + (1 - t)f(x_0) - f(tx + (1 - t)x_0) > 0$.
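
For reference, the algebra the hint points to works out cleanly: for x ≠ x₀ and 0 < t < 1,

$t x^2 + (1 - t)x_0^2 - \big(tx + (1 - t)x_0\big)^2 = t(1 - t)(x - x_0)^2 > 0,$

which is exactly the strict convexity inequality rearranged.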

Example: Prove that the function f(x) = −|x| is concave but not strictly concave.
Hint: You may use the triangle inequality |a + b| ≤ |a| + |b|.

Theorem
Let U ⊂ Rn be a convex region and let f : U → R. Then f (x) is convex if and only if −f (x) is concave.
The same result holds for strict convexity and concavity.

Definition
Let U ⊂ Rn and let f : U → R. The function f is called an additive function if

f (x) = f1 (x1 ) + f2 (x2 ) + · · · + fn (xn )

for some single variable functions f1 , f2 , ..., fn .

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be an additive function given by f (x) = f1 (x1 ) + · · · +
fn (xn ). Then f (x) is convex provided each of f1 , ..., fn are convex functions. Similarly, f (x) is concave
provided each of f1 , ..., fn are concave functions. The same result holds for strict convexity and concavity.

Example: Argue that the function f : Rn → R given by $f(x) = x_1^2 + \cdots + x_n^2$ is a strictly convex function.

Theorem
Let I ⊂ R be an interval and let f : I → R be a twice differentiable function of a single variable. Then f
is...

• convex on I if f″(x) ≥ 0 on I. It is strictly convex on I if f″(x) > 0 on I.

• concave on I if f″(x) ≤ 0 on I. It is strictly concave on I if f″(x) < 0 on I.



Example: Argue that the function $f(x, y) = \sqrt{x} + \ln(y)$ where x, y > 0 is strictly concave.

5.2.2 Level Curves and Sets
Definition
Let U ⊂ Rn and let f : U → R. The level set associated to C ∈ Range(f ) is the set

$L_C(f) = \{x \in U \mid f(x) = C\}$
In the two dimensional case of U ⊂ R2 this is often called a level curve.

Example: Suppose that a firm produces two goods in quantities x1 and x2 that sell at prices $4 and $2 per unit. Construct the revenue function and the level curves associated to R = $8 and R = $16.

Note
The level set gives all the points of input that correspond to some fixed output. This is clearly of interest
to us. In economics level sets are encountered in consumer theory (where they are called indifference
curves) and producer theory (where they are called isoquants).

Definition
We define the positive region of Rn as Rn+ = {x ∈ Rn | xi > 0 for each i = 1, ..., n}.

Definition
A Cobb-Douglas function is a function f : Rn+ → R of the form $f(x) = Ax_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$.

Example: Below are some level curves of the Cobb-Douglas function $y = 2x_1^{1/4} x_2^{1/2}$.

You can visualize this as the collection of all values in the domain of the function that fixes a given output in the
range.

Definition
 
A Leontief utility function is a function f : Rn+ → R of the form $f(x) = \min\left(\dfrac{x_1}{w_1}, \dfrac{x_2}{w_2}, \ldots, \dfrac{x_n}{w_n}\right)$.

Example: Below are some level curves of the Leontief utility function $y = \min\left(\dfrac{x_1}{0.5},\, x_2\right)$.

You can visualize this as the collection of all values in the domain of the function that fixes a given output in
the range.

5.2.3 Quasiconvexity and Quasiconcavity
Definition
Let U ⊂ Rn and let f : U → R. Let C ∈ Range(f ) then we call...

• BC = {x ∈ U | f (x) ≥ C} the better set associated to C.

• WC = {x ∈ U | f (x) ≤ C} the worse set associated to C.

Example: Determine and graph both the better and worse set of the output associated to (3, 4) for the function $f(x, y) = e^{x^2 + y^2}$.

Definition
Let U ⊂ Rn be a convex region and let f : U → R. We say that f is...

• quasiconvex provided WC is a convex region for every C ∈ Range(f).

• quasiconcave provided BC is a convex region for every C ∈ Range(f).

Note
If we think of convex functions as bowl-like shapes and concave functions as dome-like shapes, then quasiconvex and quasiconcave functions are general valley and single-hump hill shapes that need not be specifically bowl or dome shaped.

Example: Argue that the function $f(x) = x^{2/3}$ is quasiconvex.

Theorem
Every convex function is quasiconvex and every concave function is quasiconcave. The reverse implication
is not necessarily true.

Chapter 6

Calculus for Functions of n-Variables

6.1 Partial Differentiation
6.1.1 Introducing Partial Derivatives
Definition
The partial derivative of a function y = f (x1 , ..., xn ) with respect to the variable xi is

$\dfrac{\partial f}{\partial x_i} = \lim_{h \to 0} \dfrac{f(x_1, x_2, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_n)}{h}$
In economics, the partial derivatives are called marginals.

Example: Consider the revenue function R(x1, x2) = p1x1 + p2x2 for a multiproduct, competitive firm. Compute the marginals of the function and interpret them.

Note
Taking the partial derivative with respect to a variable is equivalent to differentiating with respect to that
variable (in the single variable calculus sense) while holding all other variables constant.

Example: Let $f(x_1, x_2, x_3) = \dfrac{x_1}{x_2^3 + 3x_3}$. Compute $\dfrac{\partial f}{\partial x_3}$.

Example: Let $f(x_1, x_2, x_3) = \ln(x_1 + x_2 x_3)$. Compute $\dfrac{\partial f}{\partial x_2}$.

Example: Let $f(x_1, x_2) = x_1 e^{x_1^2 x_2} + \dfrac{x_2}{x_1}$. Compute $\dfrac{\partial f}{\partial x_1}$.
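
All three marginals can be checked with a computer algebra system. A sketch using SymPy (the three functions are the examples above, as reconstructed):

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)

f_a = x1 / (x2**3 + 3*x3)                 # first example
f_b = sp.log(x1 + x2*x3)                  # second example
f_c = x1*sp.exp(x1**2 * x2) + x2/x1       # third example

print(sp.diff(f_a, x3))                   # -3*x1/(x2**3 + 3*x3)**2
print(sp.diff(f_b, x2))                   # x3/(x1 + x2*x3)
print(sp.simplify(sp.diff(f_c, x1)))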

Notation
All of the following are equivalent notations for partial derivatives:

$\dfrac{\partial f}{\partial x_i} = D_{x_i} f = D_i f = \partial_{x_i} f = \partial_i f = f_{x_i} = f_i$

6.1.2 The Law of Diminishing Productivity for Production Functions
Law of Diminishing Marginal Productivity

If successive units of a variable input are added to a fixed amount of other inputs, the overall output may initially grow at an increasing rate, then at a steady rate, but ultimately it will grow at a declining rate.

Example: Consider the Cobb-Douglas function $y = 10x_1^{1/2} x_2^{1/2}$. Demonstrate that this satisfies the law of diminishing marginal productivity.
diminishing marginal productivity.

6.1.3 The Multivariable Chain Rule
Multivariable Chain Rule
Let f (x) be a differentiable function and let x(u) = (x1 (u), ..., xn (u)) be differentiable, then the derivative
of the function g(u) = f (x(u)) with respect to ui is

$\dfrac{\partial g}{\partial u_i} = \dfrac{\partial f}{\partial x_1}(x)\,\dfrac{\partial x_1}{\partial u_i}(u) + \dfrac{\partial f}{\partial x_2}(x)\,\dfrac{\partial x_2}{\partial u_i}(u) + \cdots + \dfrac{\partial f}{\partial x_n}(x)\,\dfrac{\partial x_n}{\partial u_i}(u)$

Note
There is a nice visual aid that can be used to derive the multivariable chain rule formula given any function.
This is given by a branch diagram below.

Example: Let $g(t) = f(x_1(t), x_2(t), x_3(t))$ where $f(x_1, x_2, x_3) = x_1^2 - x_2^2 + e^{2x_3}$ and $x_1(t) = e^{2t}$, $x_2(t) = e^{-t}$, $x_3(t) = 2t$. Compute $\dfrac{dg}{dt}$.
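
A chain-rule answer can always be verified by substituting first and differentiating the composite directly; both routes must agree. A sketch in SymPy for the example above:

import sympy as sp

t = sp.symbols('t')
x1, x2, x3 = sp.exp(2*t), sp.exp(-t), 2*t

# Substitute into f(x1, x2, x3) = x1^2 - x2^2 + exp(2*x3) and differentiate.
g = x1**2 - x2**2 + sp.exp(2*x3)
print(sp.simplify(sp.diff(g, t)))   # 8*exp(4*t) + 2*exp(-2*t)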

Example: Let $g(t) = f(t, x_1(t), x_2(t))$ where $f(t, x_1, x_2) = \dfrac{x_1 + x_2}{x_2 + t}$ and $x_1(t) = t^2$, $x_2(t) = t^3$. Compute $\dfrac{dg}{dt}$ at t = 3.

Example: Let $g(u, v) = f(x(u, v), y(u, v))$ where $f(x, y) = 3x^2 - 2xy + y^2$, $x(u, v) = 3u + 2v$ and $y(u, v) = 4u - v$. Compute $\dfrac{\partial g}{\partial u}$.

Example: Let $g(u, v) = f(x(u, v), y(u, v))$ where f(x, y) is a differentiable function with x(u, v) = 3u + 2v and y(u, v) = uv. Given the table of values below:

(x, y)        (3, −1)   (2, 2)   (4, −2)   (2, −1)   (−2, 1)
fx(x, y)         7         2       −3         5        −4
fy(x, y)        −6         3       −2        −2         8

use the multivariable chain rule to compute $\dfrac{\partial g}{\partial v}$ when (u, v) = (2, −1).

6.1.4 Implicit Differentiation
Implicit Differentiation

Let x ∈ Rn, y ∈ R and F(x, y) = 0 where F : Rn+1 → R is a differentiable function. Then provided $F_y \neq 0$ we have

$\dfrac{\partial y}{\partial x_i} = -\dfrac{F_{x_i}}{F_y}$

Example: Consider the surface $x_1^2 e^{x_2 y} = x_2^2 y e^{x_1}$. Compute $\dfrac{\partial y}{\partial x_2}$.

Example: Consider a level curve of the Cobb-Douglas function given by $Ax_1^a x_2^b = C$. Compute $\dfrac{dx_2}{dx_1}$.
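
SymPy's idiff implements exactly this implicit-differentiation formula, so the level-curve slope can be checked mechanically. A sketch (A, a, b, C are treated as positive symbols):

import sympy as sp

x1, x2, A, a, b, C = sp.symbols('x1 x2 A a b C', positive=True)

F = A * x1**a * x2**b - C        # the level curve written as F(x1, x2) = 0
slope = sp.idiff(F, x2, x1)      # dx2/dx1 along the curve
print(sp.simplify(slope))        # -a*x2/(b*x1)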

6.2 Second Order Partial Derivatives
6.2.1 Higher Order Partial Derivatives
Definition
The second order partial derivative of f (x) is defined as

$\dfrac{\partial^2 f}{\partial x_j \partial x_i} = \dfrac{\partial}{\partial x_j}\left(\dfrac{\partial f}{\partial x_i}\right)$

and higher order derivatives satisfy the same recursion. In the subscript notation the order of indices is reversed: $\dfrac{\partial^m f}{\partial x_{i_m} \partial x_{i_{m-1}} \cdots \partial x_{i_1}} = f_{i_1 \cdots i_{m-1} i_m}$.

Example: Compute $f_{12}$ and $f_{21}$ for the function $f(x_1, x_2) = x_1 e^{-x_1^2 x_2^2}$.

Young’s Theorem

Let U ⊂ Rn be convex and let f : U → R be twice differentiable with continuous second order partial derivatives. Then the order of differentiation in computing the mixed partials is irrelevant. That is, fij = fji for any pair i, j = 1, 2, ..., n.

Example: Compute all second order partial derivatives of the function $f(x_1, x_2, x_3) = x_3^3 x_2^2 \ln(x_1)$.
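
Young's theorem is easy to spot-check symbolically before computing all the second order partials by hand. A sketch in SymPy using the function from this example:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
f = x3**3 * x2**2 * sp.log(x1)

# Mixed partials taken in either order agree, as Young's theorem predicts.
print(sp.diff(f, x1, x2) - sp.diff(f, x2, x1))   # 0
print(sp.diff(f, x2, x3))                        # 6*x2*x3**2*log(x1)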

Example: Compute $f_{211}$ for the function $f(x_1, x_2) = \dfrac{e^{x_1^2 x_2}}{x_2}$.

6.2.2 Second Order Chain Rule
Example: Let $g(t) = f(x_1, x_2)$ be a twice differentiable function where $x_1 = 2t + 1$ and $x_2 = 2 - 3t$. Compute the second derivative $\dfrac{d^2 g}{dt^2}$.

6.2.3 The Gradient and the Hessian
Definition
Let U ⊂ Rn be convex and let f : U → R be a twice differentiable function. The gradient of f is defined
as
 
$\nabla f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}$

The Hessian of f is defined as

$\nabla^2 f(x) = \begin{bmatrix} f_{11}(x) & \cdots & f_{1n}(x) \\ \vdots & \ddots & \vdots \\ f_{n1}(x) & \cdots & f_{nn}(x) \end{bmatrix}$

Example: Compute the gradient and Hessian of the Cobb-Douglas production function $f(x_1, x_2) = Ax_1^{\alpha} x_2^{\beta}$.

Example: Compute the gradient and Hessian of $f(x, y, z) = xye^{xz}$.
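
Both objects can be generated mechanically, which is useful for checking hand computations. A sketch in SymPy for the second example:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x * y * sp.exp(x*z)

grad = sp.Matrix([sp.diff(f, v) for v in (x, y, z)])   # gradient column vector
hess = sp.hessian(f, (x, y, z))                        # symmetric 3x3 Hessian

print(grad.T)
print(hess)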

6.3 First Order Total Differential
6.3.1 Introducing the First Order Total Differential
Definition
Let U ⊂ Rn be a convex region and let f : U → R be differentiable. The first order differential of f (x)
on U is defined to be

df = ∇f (x)T dx = f1 (x)dx1 + f2 (x)dx2 + · · · + fn (x)dxn

where $dx^T = \begin{bmatrix} dx_1 & \cdots & dx_n \end{bmatrix}$.

Example: Compute the first order differential of $f(x_1, x_2, x_3) = x_1 e^{x_1 x_2} + 2x_3^2$ at (2, 0, −3).

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be differentiable. Let p ∈ U be in the interior, then
provided ‖h‖ is small we have

∆f ≈ f1 (p)∆x1 + f2 (p)∆x2 + · · · + fn (p)∆xn


where ∆f = f (p + h) − f (p) and ∆xi = hi .
In other words, for small increments the differential approximates the change in f .
Example: Use the above to approximate the value of f(0.4, 3.7) where $f(x, y) = \sqrt{y(x + 4)}\, e^x$ and compare your results to the actual answer.
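
A numeric sketch of this approximation in SymPy. It assumes the base point p = (0, 4), which is not stated explicitly above but makes f(p) = 4 come out exactly, with increments Δx = 0.4 and Δy = −0.3:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.sqrt(y*(x + 4)) * sp.exp(x)

p = {x: 0, y: 4}                                   # assumed base point
dx, dy = sp.Rational(2, 5), -sp.Rational(3, 10)    # steps to reach (0.4, 3.7)

approx = f.subs(p) + sp.diff(f, x).subs(p)*dx + sp.diff(f, y).subs(p)*dy
actual = f.subs({x: sp.Rational(2, 5), y: sp.Rational(37, 10)})
print(sp.N(approx), sp.N(actual))                  # differential estimate vs exact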

6.3.2 The Differential on Level Sets
Example: Suppose a firm uses a production function of the form $f(x_1, x_2) = x_1^2 x_2 + x_1$ for goods 1 and 2. Suppose they are currently producing $(x_1, x_2) = (300, 500)$ goods. If we decrease production of good 1 by 5 units, approximately how many units must production of good 2 change to keep the output the same?

6.3.3 The MRS and MRTS
Definition
Consumer Theory

A utility function measures the welfare or satisfaction of a consumer as a function of the consumption of real goods. Level curves of the utility function are called indifference curves because they represent all combinations of two products which give the same level of utility, that is, points for which the consumer is indifferent.

Production Theory

A production function measures the amount of product obtained as a function of the quantities of
productive factors (such as labour and capital). Level curves of the production function are called isoquants
because the name means "equal quantity", representing combinations of productive factors for which the production output is the same.

Theorem
Let I ⊂ R be an interval and let x2 : I → R be differentiable. Let p ∈ I be in the interior, then provided
that |h| is small we have that

$dx_2 = \dfrac{dx_2}{dx_1}\, dx_1 \quad \Longleftrightarrow \quad \Delta x_2 \approx \dfrac{dx_2}{dx_1}(p)\, \Delta x_1$
where ∆x2 = x2 (p + h) − x2 (p) and ∆x1 = h.

Definition

Let U ⊂ R2 be a convex region and let y : U → R be differentiable. Consider a level curve y(x) = C where
C ∈ Range(y). We say that the absolute value of the slope, |dx2 /dx1 |, is called the...

• Marginal Rate of Substitution (MRS) if y(x) = C is an indifference curve.

• Marginal Rate of Technical Substitution (MRTS) if y(x) = C is an isoquant.

Example: Suppose that a firm has found that the following pairs of Capital and Labor yield a production of 10 units

Labor(L) 100.00 75.00 56.25 42.19 31.64 23.73 17.80 13.35 10.01 7.51 5.63
Capital(K) 10.00 12.00 14.40 17.28 20.74 24.88 29.86 35.83 43.00 51.60 61.92

with an interpolated graph between these data values below. The MRTS has been approximated by MRTS ≈ |ΔK/ΔL| between a few of these points. Approximate the values of A and B below.

Example: Suppose that an agent has a utility function given by $u(x_1, x_2) = \dfrac{x_1^\delta}{\delta} + \dfrac{x_2^\delta}{\delta}$ where 0 < δ < 1 is a constant. Determine the MRS of the indifference curves of this utility function.

6.3.4 Convexity to the Origin
Definition

Let U ⊂ R2 be a convex region and let y : U → R be differentiable. We say that the curve y(x) = C is...

• convex to the origin (MRS/MRTS is diminishing) if |dx2/dx1| is decreasing as x1 increases.

• concave to the origin (MRS/MRTS is increasing) if |dx2/dx1| is increasing as x1 increases.

• linear to the origin (MRS/MRTS is constant) if |dx2/dx1| is constant as x1 increases.

Theorem

Let U ⊂ R2 be a convex region and let y : U → R be twice differentiable. Suppose that the level curve
y(x) = C satisfies dx2 /dx1 < 0 and...

• $d^2x_2/dx_1^2 > 0$ then it is convex to the origin.

• $d^2x_2/dx_1^2 < 0$ then it is concave to the origin.

• $d^2x_2/dx_1^2 = 0$ then it is linear to the origin.

Example: Determine whether the indifference curves of the utility function $U(x_1, x_2) = x_1^2 + x_2^2$ are convex, concave or linear to the origin.

6.4 Curvature Properties of Concavity and Convexity
6.4.1 The Second Order Differential and Convexity for Differentiable Functions
Definition
Let U ⊂ Rn be a convex region and let f : U → R be twice differentiable. The second order differential
of f (x) on U is defined to be
$d^2 f = dx^T \nabla^2 f(x)\, dx = \sum_{j=1}^{n} \sum_{i=1}^{n} f_{ij}(x)\, dx_i\, dx_j$

where $dx^T = \begin{bmatrix} dx_1 & \cdots & dx_n \end{bmatrix}$.

Example: Compute the second order differential of f (x) = ln(x1 x2 x3 ) + x1 x3 at the point (2, 1, 3).

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be twice differentiable. Let p ∈ U be in the interior, then provided ‖h‖ is small we have

$\Delta f \approx \nabla f(p)^T \Delta x + \frac{1}{2}\, \Delta x^T \nabla^2 f(p)\, \Delta x$

where $\Delta f = f(p + h) - f(p)$ and $\Delta x = \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix}^T$. In other words, for small increments the expression $df + \frac{1}{2} d^2 f$ approximates the change in f.
2

Example: Use your previous example to approximate ln(2.1 · 0.9 · 2.8) + 2.1 · 2.8 = ln(5.292) + 5.88.
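
A sketch of that computation in SymPy, with p = (2, 1, 3) and h = (0.1, −0.1, −0.2) so that p + h = (2.1, 0.9, 2.8):

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
f = sp.log(x1*x2*x3) + x1*x3
xs = (x1, x2, x3)

p = {x1: 2, x2: 1, x3: 3}
h = sp.Matrix([sp.Rational(1, 10), -sp.Rational(1, 10), -sp.Rational(1, 5)])

grad = sp.Matrix([sp.diff(f, v) for v in xs]).subs(p)
hess = sp.hessian(f, xs).subs(p)

approx = f.subs(p) + (grad.T*h)[0] + sp.Rational(1, 2)*(h.T*hess*h)[0]
actual = f.subs({x1: sp.Rational(21, 10), x2: sp.Rational(9, 10), x3: sp.Rational(14, 5)})
print(sp.N(approx), sp.N(actual))   # second order estimate vs ln(5.292) + 5.88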

Definition
Let U ⊂ Rn be a convex region and let f : U → R be a twice differentiable function. For p ∈ U in the
interior and ‖h‖ small the approximation

$f(p + h) \approx f(p) + \nabla f(p)^T \Delta x$

is called the first order approximation and

$f(p + h) \approx f(p) + \nabla f(p)^T \Delta x + \frac{1}{2}\, \Delta x^T \nabla^2 f(p)\, \Delta x$

is called the second order approximation where $\Delta x = \begin{bmatrix} h_1 & \cdots & h_n \end{bmatrix}^T$.

Note
The first order approximation gives a linear approximation to the function. The second order approximation
improves this approximation and gives a quadratic approximation. We can then also approximate the
curvature of the function by the curvature of the quadratic form.

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be a twice differentiable function. Then f is...

Strict:

• strictly convex on U provided ∇²f(x) ≻ 0 (positive definite) for all x ∈ U in the interior.

• strictly concave on U provided ∇²f(x) ≺ 0 (negative definite) for all x ∈ U in the interior.

Non-Strict:

• convex on U provided ∇²f(x) ⪰ 0 (positive semidefinite) for all x ∈ U in the interior.

• concave on U provided ∇²f(x) ⪯ 0 (negative semidefinite) for all x ∈ U in the interior.

6.4.2 Examples of Curvature Part One
Example: Determine the curvature of the function $y = 5 - (x_1 + x_2)^2$.

Example: Determine the curvature of the function $y = x_1^{1/2} x_2^{1/2}$ for $(x_1, x_2) \in \mathbb{R}^2_+$.

Example: Determine the curvature of the function y = ln(3x1 + 4x2 ).

Example: Determine the curvature of the function $y = 3x_1^2 + 2x_1x_2 + 5x_2^2 + 4x_3^2$.
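
For a quadratic like this the Hessian is constant, so checking its leading principal minors settles the curvature globally. A sketch in SymPy:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
y = 3*x1**2 + 2*x1*x2 + 5*x2**2 + 4*x3**2

H = sp.hessian(y, (x1, x2, x3))
minors = [H[:k, :k].det() for k in (1, 2, 3)]
print(minors)   # [6, 56, 448]: all positive, so H is positive definite
                # and the function is strictly convex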

Example: Determine the curvature of the function $y = \ln(x_1 x_2 x_3)$ on $(x_1, x_2, x_3) \in \mathbb{R}^3_+$.

6.5 Differentiable Quasiconvex Functions and Homogeneity
6.5.1 The Bordered Hessian
Definition
Let U ⊂ Rn be a convex region and let f : U → R be twice differentiable. The bordered Hessian is
defined as
 
$\overline{\nabla}^2 f(x) = \begin{bmatrix} 0 & \nabla f(x)^T \\ \nabla f(x) & \nabla^2 f(x) \end{bmatrix} = \begin{bmatrix} 0 & f_1(x) & \cdots & f_n(x) \\ f_1(x) & f_{11}(x) & \cdots & f_{1n}(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_n(x) & f_{n1}(x) & \cdots & f_{nn}(x) \end{bmatrix}$

Example: Compute the bordered Hessian of the function $f(x_1, x_2, x_3) = x_1^{1/2} x_2 + x_2^{1/3} x_3$.

6.5.2 Quasiconvexity of Differentiable Non-Convex Functions
Theorem

Let U ⊂ Rn be a convex region and let f : U → R be twice differentiable. Let $\overline{H} = \overline{\nabla}^2 f$ represent the bordered Hessian of f. If the leading principal minors satisfy

• $|\overline{H}_k(x)| < 0$ for k = 2, ..., n and all x ∈ U in the interior then f is quasiconvex.

• $(-1)^k |\overline{H}_k(x)| > 0$ for k = 2, ..., n and all x ∈ U in the interior then f is quasiconcave.

Note
Conditions on the Hessian give us convexity or concavity and conditions on the bordered Hessian give us
quasiconvexity or quasiconcavity. Every convex (or concave) function is quasiconvex (respectively quasi-
concave) and checking the Hessian is easier than the bordered Hessian. Thus in determining quasiconvexity
(or quasiconcavity) you should first check convexity (or concavity) and then move to the bordered Hessian
only if that fails.

Example: Determine whether the function $f(x_1, x_2) = x_2 e^{-x_1}$ is quasiconvex, quasiconcave or neither on $\mathbb{R}^2_+$.
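
A sketch of the bordered-Hessian computation in SymPy. With n = 2 only the k = 2 minor, the full 3 × 3 determinant, needs checking against the sign test above:

import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = x2 * sp.exp(-x1)

g = [sp.diff(f, v) for v in (x1, x2)]   # gradient entries for the border
H = sp.hessian(f, (x1, x2))
B = sp.Matrix([[0,    g[0],    g[1]],
               [g[0], H[0, 0], H[0, 1]],
               [g[1], H[1, 0], H[1, 1]]])

print(sp.simplify(B.det()))   # x2*exp(-3*x1), positive on R^2_+,
                              # the quasiconcave sign pattern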

Example: Determine whether the function $f(x_1, x_2) = x_1 x_2^2$ is quasiconvex, quasiconcave or neither on $\mathbb{R}^2_+$.

6.5.3 Homogeneous Functions
Definition

Let U ⊂ Rn and let f : U → R. We say that f is a homogeneous function of degree k if $f(sx) = s^k f(x)$ for every s > 0.

Note
Homogeneous functions are functions with a scaling relationship between their input factors and output
factor. In a sense, they measure a level of return based on increasing the inputs.

Example: Show that the production function $f(x_1, x_2) = x_1 x_2^2$ is homogeneous and use this to argue that doubling the inputs increases the output 8-fold.

Definition
Let U ⊂ Rn+ and let f : U → R be a homogeneous function of degree k. We say that f displays...

• increasing returns to scale if k > 1.

• constant returns to scale if k = 1.

• decreasing returns to scale if k < 1.

Example: Show that $f(x_1, x_2) = x_1^{1/4} x_2^{1/2}$ is homogeneous and determine the type of return.
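
Homogeneity checks of this kind reduce to a single simplification: dividing f(sx) by f(x) should isolate the factor s^k. A sketch in SymPy:

import sympy as sp

x1, x2, s = sp.symbols('x1 x2 s', positive=True)
f = x1**sp.Rational(1, 4) * x2**sp.Rational(1, 2)

ratio = sp.simplify(f.subs({x1: s*x1, x2: s*x2}, simultaneous=True) / f)
print(ratio)   # s**(3/4): homogeneous of degree 3/4 < 1,
               # so decreasing returns to scale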

Example: Show that the function $f(x_1, x_2) = \dfrac{x_1^2 - x_2^2}{x_1 + x_2}$ is homogeneous and determine the type of return.

Example: Show that the production function $f(x_1, x_2) = x_1^{1/2} x_2^{1/3} + x_2^{3/2}$ is not homogeneous.

Theorem

Let U ⊂ R2 be a convex region and let y : U → R be a homogeneous function. Then along any given ray
from the origin, the slopes of the level curves of y(x1 , x2 ) are the same.

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be a differentiable homogeneous function of degree k.
The partial derivatives of f are homogeneous of degree k − 1.

Example: Consider the Cobb-Douglas function $f(x_1, x_2) = Ax_1^a x_2^b$ on $\mathbb{R}^2_+$. Show that the function is homogeneous of degree a + b and that the marginals are homogeneous of degree a + b − 1.

6.5.4 Euler’s Theorem
Euler’s Theorem
Let U ⊂ Rn be a convex region and let f : U → R be a differentiable homogeneous function of degree k.
Then the following identity holds

∇f (x) · x = kf (x)
for every x ∈ U in the interior.

Example: Prove Euler’s Theorem by differentiating both sides of f (sx) = sk f (x) using the multivariable
chain rule and then evaluating at s = 1.

Example: Illustrate that Euler’s Theorem holds for the Cobb-Douglas function f (x1 , x2 ) = Kxa1 xb2 where
K, a, b > 0.
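
A symbolic check of the identity for this function (a sketch in SymPy; the ratio of ∇f(x) · x to f(x) should reduce to the degree):

import sympy as sp

x1, x2, K, a, b = sp.symbols('x1 x2 K a b', positive=True)
f = K * x1**a * x2**b

euler_lhs = sp.diff(f, x1)*x1 + sp.diff(f, x2)*x2   # grad f dotted with x
print(sp.simplify(euler_lhs / f))                   # a + b, the degree k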

Note
If f represents the production function of a firm and has constant returns to scale (homogeneous of degree
1) then Euler’s theorem shows that if the price of each input is its marginal product then the total cost and
output are equal, i.e.
   
$\dfrac{\partial f}{\partial x_1}(x)\, x_1 + \cdots + \dfrac{\partial f}{\partial x_n}(x)\, x_n = f(x)$

6.6 Taylor Series
6.6.1 Single Variable Taylor Series
Definition
Let I ⊂ R be an interval and let f : I → R be a function that is differentiable of every order (i.e. all derivatives f′(x), f″(x), f‴(x), ... exist). Then the Taylor polynomial of f(x) centered at a ∈ I in the interior is

$P(x) = f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2 + \cdots + \frac{1}{k!} f^{(k)}(a)(x - a)^k + \cdots$
If we take this polynomial up to degree m we say it is the Taylor polynomial of order m and denote it
Pm (x).

Example: Compute the Taylor polynomial of f (x) = ex centered at a = 0.

Example: Compute the Taylor polynomial of f (x) = x ln(x) centered at a = 1.

Note
Through a combination of using a high enough order Taylor polynomial and restricting yourself close enough
to a ∈ I the Taylor polynomial often gives a good approximation of f (x).

Example: Approximate the value of e2 using the Taylor polynomial of f (x) = ex centered at a = 0 using
orders m = 1, 2, 3, 4, 5 and 6.
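
A numeric sketch of that convergence in plain Python (P_m is the order-m Taylor polynomial at a = 0 evaluated at x = 2):

import math

x, target = 2, math.exp(2)
for m in range(1, 7):
    P_m = sum(x**k / math.factorial(k) for k in range(m + 1))
    print(m, P_m, target - P_m)   # order, estimate, remaining error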

6.6.2 Multivariable Taylor Series
Definition
Let U ⊂ Rn be a convex region and let f : U → R be a function that is differentiable of every order. Then
the Taylor polynomial of f (x) centered at a ∈ U in the interior is

$P(x) = f(a) + \sum_{i=1}^{n} f_i(a)(x_i - a_i) + \frac{1}{2} \sum_{j=1}^{n} \sum_{i=1}^{n} f_{ij}(a)(x_i - a_i)(x_j - a_j) + \cdots + \frac{1}{k!} \sum_{i_k=1}^{n} \cdots \sum_{i_1=1}^{n} f_{i_1 \cdots i_k}(a)(x_{i_1} - a_{i_1}) \cdots (x_{i_k} - a_{i_k}) + \cdots$

If we take this polynomial up to degree m we say it is the Taylor polynomial of order m and denote it
Pm (x).

Example: Find the third order Taylor approximation of the function $f(x_1, x_2) = e^{x_1 x_2}$ about the point a = (0, 2).

Note
The first and second order polynomials are the same as the first and second order approximations respec-
tively. That is,

$P_1(p + h) = f(p) + \nabla f(p)^T \Delta x$

$P_2(p + h) = f(p) + \nabla f(p)^T \Delta x + \frac{1}{2}\, \Delta x^T \nabla^2 f(p)\, \Delta x$
and the Taylor polynomial just generalizes this approximation.

Example: Find the second order Taylor approximation of the function f (x1 , x2 ) = ln(x1 + x1 x2 ) about the
point (1, 0). Use this to approximate f (1.1, 0.2).

Chapter 7

Optimization of Functions of n-Variables

7.1 First Order Conditions
7.1.1 Critical Points and Extrema
Definition
Let U ⊂ Rn be a convex region and let f : U → R be differentiable. A point x∗ ∈ U in the interior is called
a critical point or stationary point if ∇f (x∗ ) = 0.

Note
Critical points don't necessarily correspond to maximum or minimum values of a function but are potential locations of interest. Global extrema were defined earlier. Below we define local extrema.

Definition
Let U ⊂ Rn be a convex region and let f : U → R. We say that x∗ is a...

• local maximum or relative maximum of f(x) if there exists an ε > 0 such that f(x∗) ≥ f(x) for all $x \in B_\varepsilon(x^*)$.

• local minimum or relative minimum of f(x) if there exists an ε > 0 such that f(x∗) ≤ f(x) for all $x \in B_\varepsilon(x^*)$.

Collectively, maxima and minima are called extrema.

Theorem
Every global extremum is also a local extremum. The reverse is not necessarily true unless you know some information about the curvature of the function. Note that the following theorem does not require the function f to be differentiable.

Theorem
Let U ⊂ Rn be a convex region and let f : U → R. Provided that f is...

• convex or quasiconvex then any local minimum is also a global minimum.

• concave or quasiconcave then any local maximum is also a global maximum.

Example: Consider the function $f(x_1, x_2) = e^{-x_1^2 - x_2^2}$. Given that x∗ = 0 is a local maximum, show it is global by verifying f is quasiconcave.

Note
If you have a critical point, there are no conclusions you can make about it unless you know some information
about the overall curvature of the function, i.e. the convexity or concavity. Note that the following theorem
requires f to be differentiable.

Theorem
Let U ⊂ Rn be a convex region and let f : U → R be a differentiable function. Let x∗ be a critical point of
f , then if...

• f is convex then x∗ is a global minimum. If f is strictly convex then this global minimum is unique.

• f is concave then x∗ is a global maximum. If f is strictly concave then this global maximum is unique.

Example: Let $f(x_1, x_2) = x_1^2 + x_2^2 - x_1 - x_2 + 1$. Given that x∗ = (1/2, 1/2) is a critical point of f determine the convexity or concavity of f to classify it.

First Order Conditions


Let U ⊂ Rn be a convex region and let f : U → R be a differentiable function. If x∗ ∈ U is a relative extremum in the interior then ∇f(x∗) = 0.

7.1.2 Linear Equations of Critical Points
Example: Determine all critical points of $f(x_1, x_2) = 4x_1 + 2x_2 - x_1^2 - x_2^2 + x_1x_2$.
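
Since ∇f = 0 is a linear system here, the critical point can be found with a direct solve. A sketch in SymPy:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 4*x1 + 2*x2 - x1**2 - x2**2 + x1*x2

foc = [sp.diff(f, x1), sp.diff(f, x2)]        # first order conditions
print(sp.solve(foc, [x1, x2], dict=True))     # [{x1: 10/3, x2: 8/3}]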

7.1.3 Using Substitution to Obtain Critical Points
Example: Determine all critical points of $f(x_1, x_2) = \dfrac{1}{x_1} + \dfrac{1}{x_2} + x_1x_2$.

Example: Determine all critical points of $f(x_1, x_2) = 4 + x_1^3 + x_2^3 - 3x_1x_2$.

7.1.4 Using Cases to Obtain Critical Points
Example: Determine all critical points of $f(x_1, x_2) = 3x_1^2 x_2 + x_2^3 - 3x_1^2 + 3x_2^2 + 2$.

7.1.5 Multiproduct Monopoly
Definition
A demand function is a mathematical equation which expresses the demand for a product or service as a function of its price and (potentially) other factors.

Definition
Suppose that a firm owns a monopoly on and produces n goods in quantities x ∈ Rn with respective prices
p ∈ Rn . The revenue function is given by R(x) = pT x = p1 x1 + · · · + pn xn . Given a cost function C(x)
the profit function is defined to be

π(x) = R(x) − C(x) = pT x − C(x)

Note
The price is not always constant and is sometimes dependent on the quantity of goods (and vice versa) through a demand function! If you want to differentiate the profit function, you must turn the expression into a function of the goods first by solving the demand functions and substituting.

Example: Suppose the (linear) demand functions of a multiproduct monopoly are given by
x1 = 100 − 2p1 + p2
x2 = 120 + 3p1 − 5p2

with a cost function C(x1 , x2 ) = 50 + 10x1 + 20x2 . Determine the critical points (x∗1 , x∗2 ) of the profit function
π(x1 , x2 ).
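
The whole procedure described in the note (solve the demand functions, substitute, then differentiate) can be scripted end to end. A sketch in SymPy:

import sympy as sp

x1, x2, p1, p2 = sp.symbols('x1 x2 p1 p2')

# Invert the demand system to express prices in terms of quantities.
prices = sp.solve([sp.Eq(x1, 100 - 2*p1 + p2),
                   sp.Eq(x2, 120 + 3*p1 - 5*p2)], [p1, p2])

C = 50 + 10*x1 + 20*x2
profit = sp.expand(prices[p1]*x1 + prices[p2]*x2 - C)

foc = [sp.diff(profit, x1), sp.diff(profit, x2)]
print(sp.solve(foc, [x1, x2]))   # {x1: 25, x2: 75}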

Example: A firm sells some output in a perfectly competitive market, where the price is $60 per unit, and
some on a market in which it has a monopoly, with a demand function p2 = 100 − x2 where x2 is output in the
monopoly market. Its total-cost function is $C = (x_1 + x_2)^2$, where x1 is output in the competitive market. Find
the critical points of the profit function.

7.1.6 Cournot Duopoly
Definition
Suppose that two firms produce identical outputs and sell into a market with a demand function p(x1 , x2 )
where x1 and x2 represent the output of each firm respectively. Given that each firm has a cost function
C1(x1) and C2(x2), each firm has a profit function given by

$\pi_1(x) = p\, x_1 - C_1(x_1) \qquad \pi_2(x) = p\, x_2 - C_2(x_2)$


The curves that satisfy $\dfrac{\partial \pi_1}{\partial x_1} = 0$ and $\dfrac{\partial \pi_2}{\partial x_2} = 0$ are called the best response functions for firm 1 and firm 2 respectively. The intersection point of these two curves is called the Cournot-Nash equilibrium.

Example: Two firms, Firm 1 and Firm 2, are competing in an oligopolistic industry. They produce an identical product, but Firm 1 does it at a lower cost than Firm 2. Firm 1 has a constant marginal cost of $15
and firm 2 has a constant marginal cost of $30. The market demand for the commodity is p = 120 − (x1 + x2 )
(here x1 + x2 is the aggregate output). Suppose that firms choose quantities. Find both best-response functions.
Then find the Cournot-Nash equilibrium quantities and illustrate your results in a diagram.
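
A sketch of the equilibrium computation in SymPy: each first order condition defines a best-response line, and solving them together gives the Cournot-Nash quantities:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
p = 120 - (x1 + x2)            # market (inverse) demand

pi1 = p*x1 - 15*x1             # firm 1: constant marginal cost 15
pi2 = p*x2 - 30*x2             # firm 2: constant marginal cost 30

br = [sp.diff(pi1, x1), sp.diff(pi2, x2)]   # best-response conditions
print(sp.solve(br, [x1, x2]))               # {x1: 40, x2: 25}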

7.2 Second Order Conditions
7.2.1 Saddle Points and Classifying Critical Points Locally
Definition
Let U ⊂ Rn be a convex region and let f : U → R be a continuous function. We say that x∗ ∈ U in the
interior is called a saddle point if for every ε > 0 there are points $x, y \in B_\varepsilon(x^*)$ such that f(x∗) < f(x)
and f (x∗ ) > f (y).

Second Order Conditions


Let U ⊂ Rn be a convex region and let f : U → R be a twice differentiable function. Suppose that x∗ ∈ U
in the interior is a critical point of f . Then provided that...

• ∇²f(x∗) ≻ 0 the critical point is a local minimum.

• ∇2 f (x∗ ) ≺ 0 the critical point is a local maximum.

• ∇2 f (x∗ ) is indefinite the critical point is a saddle point.

The test is inconclusive if ∇2 f (x∗ ) is semi-definite.

Note
If the test is inconclusive, that means this theorem will not assist you in solving your problem! It is not
a conclusive answer! Your goal is to conclude one of the three options: maximum, minimum, or saddle
point. You must change techniques such as determining convexity, quasiconvexity, etc. If you conclude
“inconclusive” you will fail the question.

7.2.2 Examples of Local Optimization
Example: Find and classify all critical points of the function $f(x_1, x_2) = x_1^3 + x_2^3 - 3x_1x_2$.
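
The local classification can be scripted: solve the first order conditions, then evaluate the Hessian at each critical point. A sketch in SymPy (for 2 × 2 Hessians, det < 0 means indefinite, i.e. a saddle; det > 0 with f11 > 0 means positive definite, i.e. a local minimum):

import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 + x2**3 - 3*x1*x2

crit = sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2], dict=True)
H = sp.hessian(f, (x1, x2))
for pt in crit:
    Hp = H.subs(pt)
    print(pt, Hp.det(), Hp[0, 0])
# (0, 0): det -9, a saddle point; (1, 1): det 27 with f11 = 6, a local minimum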

Example: Find and classify all critical points of the function f (x1 , x2 ) = (x1 − 4) ln(x1 x2 ).

Example: Suppose that $f(x_1, x_2, x_3)$ is a function with critical points $(0, 1, 4)$, $(0, -1, 4)$, $(-2/\sqrt{5}, 1/\sqrt{5}, 4)$ and $(2/\sqrt{5}, -1/\sqrt{5}, 4)$. Classify these critical points given that

$\nabla^2 f(x) = \begin{bmatrix} 6x_1 + 6x_2 & 6x_1 & 0 \\ 6x_1 & 6x_2 & 0 \\ 0 & 0 & -4 \end{bmatrix}$

Chapter 8

Constrained Optimization

8.1 First Order Conditions of Lagrange Multipliers
8.1.1 Introducing Constrained Problems
Definition
Let U ⊂ Rn and g1, ..., gk, f : U → R. The region U∗ ⊂ U representing the collection of all points p such that g1(p) = 0, ..., gk(p) = 0 is called the constrained space. The constrained optimization problem of optimizing f(x) subject to the constraints g1(x) = 0, ..., gk(x) = 0 is denoted

max/min f(x)   sb.t.   g1(x) = 0, ..., gk(x) = 0
and the solutions to this problem are all the local extrema of f (x) on U ∗ .

Example: The constraint problem

max/min $f(x_1, x_2) = x_1^2 + x_2^2$   sb.t.   $x_2 = x_1 + 1.5$

is to look for all relative extrema of $f(x_1, x_2) = x_1^2 + x_2^2$ but only along the curve $x_2 = x_1 + 1.5$. This is detailed in the following picture. Explain why this problem has a minimum solution but not a maximum.

8.1.2 The Lagrange Function and First Order Conditions
Definition
Let U ⊂ Rn and consider the constraint problem of optimizing f : U → R subject to the constraints
g1, ..., gk : U → R. We define the Lagrange Function to be the function L : U × Rk → R given by

L(x; λ) = f (x) − λ1 g1 (x) − λ2 g2 (x) − · · · − λk gk (x)


The variables x are called the spatial variables and the variables λ are called the Lagrange multipliers.

Theorem
At a relative extremum of the constrained optimization problem the level sets of f are tangent to the constrained space.

First Order Conditions of Constrained Optimization

Let U ⊂ Rn be a convex region and let g1 , ..., gk , f : U → R be differentiable functions forming the
constrained problem
max/min f(x)   sb.t.   g1(x) = 0, ..., gk(x) = 0

and let L(x; λ) = f(x) − λT g(x) be the associated Lagrange function. Let U∗ be the constrained space and let x∗ ∈ U∗ represent a relative extremum where ∇g1(x∗) ≠ 0, ..., ∇gk(x∗) ≠ 0. Then there is a λ∗ ∈ Rk such that ∇L(x∗; λ∗) = 0.

Example: Find all critical points of the Lagrange function associated to optimizing $f(x_1, x_2) = x_1x_2$ subject to $x_1 + x_2 = 6$.
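
The first order conditions are just ∇L = 0 in all variables, so they can be solved directly. A sketch in SymPy for this example:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
L = x1*x2 - lam*(x1 + x2 - 6)                    # Lagrange function

foc = [sp.diff(L, v) for v in (x1, x2, lam)]     # grad L = 0
print(sp.solve(foc, [x1, x2, lam], dict=True))   # [{x1: 3, x2: 3, lambda: 3}]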

Example: Find all critical points of the Lagrange function associated to optimizing $f(x_1, x_2) = x_1^2 x_2$ subject to $2x_1^2 + x_2^2 = 3$.

Example: Find all critical points of the Lagrange function associated to optimizing $f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2$ subject to $2 = x_3 - x_1x_2$.

Example: Find all critical points of the Lagrange function associated to optimizing $f(x_1, x_2, x_3) = x_3$ subject to $x_1 + x_2 + x_3 = 12$ and $x_3 = x_1^2 + x_2^2$.

8.2 Second Order Conditions of Lagrange Multipliers
Definition
Let U ⊂ Rn and let g1 , ..., gk , f : U → R be twice differentiable. Given the associated Lagrange function
L(x; λ) = f (x) − λT g(x) its bordered Hessian is defined to be the partitioned matrix

$\overline{\nabla}^2 L(x; \lambda) = \begin{bmatrix} O & -Dg(x)^T \\ -Dg(x) & \nabla_x^2 L(x; \lambda) \end{bmatrix}$

where $Dg(x) = \begin{bmatrix} \nabla g_1(x) & \cdots & \nabla g_k(x) \end{bmatrix}$ is called the Jacobian matrix of $g_1(x), \ldots, g_k(x)$ and

$\nabla_x^2 L(x; \lambda) = \begin{bmatrix} \dfrac{\partial^2 L}{\partial x_1^2}(x; \lambda) & \cdots & \dfrac{\partial^2 L}{\partial x_1 \partial x_n}(x; \lambda) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 L}{\partial x_n \partial x_1}(x; \lambda) & \cdots & \dfrac{\partial^2 L}{\partial x_n^2}(x; \lambda) \end{bmatrix}$

is called the spatial Hessian of the Lagrange function.

Second Order Conditions of Constrained Optimization

Let U ⊂ Rn be a convex region and let g1 , ..., gk , f : U → R be twice differentiable functions. Given the
optimization problem of optimizing f subject to the constraints g1 (x) = 0, ..., gk (x) = 0 with constrained
space U ∗ and let L(x; λ) be the associated Lagrange function. Let H j denote the upper left j × j submatrix
of the bordered Hessian of the Lagrange function and assume that (x∗ ; λ∗ ) satisfies the first order conditions.
Further assume that Dg(x∗) has rank k (some k × k subdeterminant of Dg(x∗) is non-zero). If the minors at
(x∗ ; λ∗ ) satisfy

• (−1)k |H j | > 0 for j = 2k + 1, 2k + 2, ..., n + k then the function f has a local minimum at x = x∗ on
U ∗.

• (−1)j−k |H j | > 0 for j = 2k + 1, 2k + 2, ..., n + k then the function f has a local maximum at x = x∗
on U ∗ .

If the determinants fall into a different non-zero pattern of signs then the critical point is a saddle point. If
any of the determinants are zero then the result is inconclusive.

Note
In specific cases... we have the following sign patterns for principal leading minors of the bordered Hessian:

One constraint case (k = 1):

• |H 3 | < 0, |H 4 | < 0, |H 5 | < 0, ..., |H n+1 | < 0 then f has a local minimum at x = x∗ on U ∗ .

• |H3| > 0, |H4| < 0, |H5| > 0, ..., (−1)n|Hn+1| > 0 then f has a local maximum at x = x∗ on U∗.

Two constraint case (k = 2):

• |H 5 | > 0, |H 6 | > 0, |H 7 | > 0, ..., |H n+2 | > 0 then the function f has a local minimum at x = x∗ on
U ∗.

• |H5| < 0, |H6| > 0, |H7| < 0, ..., (−1)n|Hn+2| > 0 then the function f has a local maximum at x = x∗
on U ∗ .

Example: Classify all critical points obtained in optimizing $f(x_1, x_2) = x_1^2 + 4x_2^2$ subject to $x_1^2 + x_2^2 = 1$.
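
A sketch of the full pipeline in SymPy. With one constraint and two variables only the minor |H3| (the full bordered Hessian determinant) is needed: by the one-constraint sign pattern above, a negative value marks a constrained local minimum and a positive value a constrained local maximum:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + 4*x2**2
g = x1**2 + x2**2 - 1
L = f - lam*g

crit = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)],
                [x1, x2, lam], dict=True)

Dg = [sp.diff(g, x1), sp.diff(g, x2)]
H = sp.hessian(L, (x1, x2))
B = sp.Matrix([[0,      -Dg[0],  -Dg[1]],
               [-Dg[0],  H[0, 0], H[0, 1]],
               [-Dg[1],  H[1, 0], H[1, 1]]])

for pt in crit:
    print(pt, B.det().subs(pt))   # (+-1, 0): -24, local minima; (0, +-1): 24, local maxima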

Example: Classify all critical points obtained in optimizing $f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2$ subject to $2 = x_3 - x_1x_2$.

Example: Classify all critical points obtained in optimizing $f(x_1, x_2, x_3) = x_3$ subject to $x_1 + x_2 + x_3 = 12$ and $x_3 = x_1^2 + x_2^2$.

