InvFunc
Inverse Functions
Please read this pdf in place of Section 6.5 in the text. The text uses the term “inverse of a function”
and the notation $f^{-1}$ in the most general possible way, and this can be confusing. We will use this
pdf in place of Section 6.5 to avoid some causes of confusion in working with the notation $f^{-1}$.
To refer to results in this pdf, label them as “InF Theorem 1,” “InF Lemma 2,” etc.
Proof. Assume first that g is an inverse function for f. We need to show that both (1) and (2)
are satisfied. Let a ∈ A be arbitrary, and let b = f(a). Then by definition of an inverse function,
f(a) = b implies g(b) = a, so we can compute
$$g(f(a)) = g(b) = a.$$
This proves (1). To prove (2), let b ∈ B be arbitrary, and let a = g(b). Once again, the fact that g
is an inverse function for f tells us that g(b) = a implies f(a) = b, and therefore,
$$f(g(b)) = f(a) = b,$$
which proves (2).
Conversely, assume that f and g satisfy (1) and (2). Let a ∈ A and b ∈ B be arbitrary; we need
to show that f(a) = b if and only if g(b) = a. On the one hand, if f(a) = b, then equation (1) gives
$$g(b) = g(f(a)) = a,$$
as desired. Conversely, if g(b) = a, then a similar argument using equation (2) shows that
f(a) = f(g(b)) = b, and we are done.
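The equivalence just proved can be checked on a small example. The following is a minimal Python sketch, not part of the original text: the finite sets `A` and `B`, the dictionaries `f`, `g`, and `g_bad`, and both helper functions are illustrative assumptions, with dictionaries standing in for functions f : A → B and g : B → A. Condition (1) is read as g(f(a)) = a for all a ∈ A and condition (2) as f(g(b)) = b for all b ∈ B, as used in the proof.

```python
# Illustrative finite example; the sets and maps below are assumptions.
A = {1, 2, 3}
B = {"x", "y", "z"}
f = {1: "x", 2: "y", 3: "z"}        # stands in for f : A -> B
g = {"x": 1, "y": 2, "z": 3}        # candidate inverse g : B -> A
g_bad = {"x": 1, "y": 3, "z": 2}    # a map B -> A that is not an inverse of f


def is_inverse_by_definition(f, g, A, B):
    """f(a) = b if and only if g(b) = a, for all a in A and b in B."""
    return all((f[a] == b) == (g[b] == a) for a in A for b in B)


def satisfies_1_and_2(f, g, A, B):
    """(1) g(f(a)) = a for all a in A, and (2) f(g(b)) = b for all b in B."""
    return all(g[f[a]] == a for a in A) and all(f[g[b]] == b for b in B)


# The two tests agree, both for a genuine inverse and for a non-inverse.
print(is_inverse_by_definition(f, g, A, B), satisfies_1_and_2(f, g, A, B))
print(is_inverse_by_definition(f, g_bad, A, B), satisfies_1_and_2(f, g_bad, A, B))
```

A finite check like this only illustrates the statement; the proof above is what establishes it for arbitrary sets.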
There is a nice way to rephrase this result in terms of compositions. Before we do so, let us
note some important facts about compositions.
If A is any nonempty set, one of the simplest functions we can write down is the identity
function of A, denoted by $I_A$ : A → A, and defined by $I_A(x) = x$ for every x ∈ A.
Lemma 2. Suppose A, B, C, D are any nonempty sets, and f : A → B, g : B → C, and h : C → D
are any functions. Then the following identities hold:
$$f \circ I_A = I_B \circ f = f,$$
$$(h \circ g) \circ f = h \circ (g \circ f).$$
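Both identities can be tried out on finite examples. The sketch below is an illustration under stated assumptions: functions are modelled as Python dictionaries, and the names `compose` and `identity`, together with the particular sets and maps, are hypothetical choices made for this example.

```python
def compose(outer, inner):
    """Return outer ∘ inner for dict-based functions: x -> outer[inner[x]]."""
    return {x: outer[inner[x]] for x in inner}


def identity(S):
    """Identity function on a finite set S."""
    return {x: x for x in S}


# Hypothetical finite sets and functions f : A -> B, g : B -> C, h : C -> D.
A = {1, 2, 3}
B = {"p", "q"}
f = {1: "p", 2: "q", 3: "p"}
g = {"p": 10, "q": 20}                   # C = {10, 20, 30}
h = {10: "even", 20: "odd", 30: "even"}  # D = {"even", "odd"}

# f ∘ I_A = I_B ∘ f = f
assert compose(f, identity(A)) == f == compose(identity(B), f)

# (h ∘ g) ∘ f = h ∘ (g ∘ f)
assert compose(compose(h, g), f) == compose(h, compose(g, f))
```

Two dictionaries compare equal exactly when they have the same keys and the same value at each key, which matches equality of functions with a common domain.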
$$f \circ g = I_B \quad\text{and}\quad g \circ f = I_A. \tag{3}$$
Proof. You can check using the definitions of composition and identity functions that (3) is true if
and only if both (1) and (2) are true, and then the result follows from Theorem 1.
$$f \circ g_1 = I_B, \qquad g_1 \circ f = I_A,$$
$$f \circ g_2 = I_B, \qquad g_2 \circ f = I_A.$$
Using these formulas together with the results of Lemma 2, we compute as follows:
$$\begin{aligned}
g_1 &= g_1 \circ I_B \\
&= g_1 \circ (f \circ g_2) \\
&= (g_1 \circ f) \circ g_2 \\
&= I_A \circ g_2 \\
&= g_2.
\end{aligned}$$
Definition. Let A and B be nonempty sets. A function f : A → B is said to be invertible if
it has an inverse function.
Notation: If f : A → B is invertible, we denote the (unique) inverse function by $f^{-1}$ : B → A.
Using this notation, we can rephrase some of our previous results as follows.
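Corollary 5. Suppose f : A → B is an invertible function. Then
$$f^{-1}(f(a)) = a \quad \text{for all } a \in A, \tag{4}$$
$$f(f^{-1}(b)) = b \quad \text{for all } b \in B, \tag{5}$$
$$f \circ f^{-1} = I_B \quad\text{and}\quad f^{-1} \circ f = I_A. \tag{6}$$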
Proof. These are just the results of Theorem 1 and Corollary 3 with g replaced by $f^{-1}$.
It is important to be careful with the notation $f^{-1}$: The superscript in this case does not
represent a multiplicative inverse; instead it represents a different function, one that satisfies (4),
(5), and (6). Also, the notation $f^{-1}$ should never be used to represent an inverse function unless
you have verified that f is invertible. Important note: We will see in §6.6 that there is a different
way that $f^{-1}$ is used even when f is not invertible. In that section we will apply $f^{-1}$ to a set, and
it will not indicate that f is invertible or that there is an inverse function.
Here is a simple criterion for deciding which functions are invertible.
Theorem 6. A function is invertible if and only if it is bijective.
Proof. Let f : A → B be a function, and assume first that f is invertible. Then it has a unique
inverse function $f^{-1}$ : B → A. To show that f is surjective, let b ∈ B be arbitrary, and let
$a = f^{-1}(b)$. Then (5) implies $f(a) = f(f^{-1}(b)) = b$.
To show that f is injective, let $a_1, a_2 \in A$ and suppose $f(a_1) = f(a_2)$. Applying the function
$f^{-1}$ to both sides yields $f^{-1}(f(a_1)) = f^{-1}(f(a_2))$, and then (4) shows that $a_1 = a_2$.
Conversely, assume f is bijective. We define a function g : B → A as follows: Given b ∈ B,
because f is surjective there is an element a ∈ A such that f(a) = b, and because f is injective, a
is the unique such element. Thus we can unambiguously define g(b) to be this particular value of
a. By our definition of g, we see that g(b) = a if and only if f(a) = b. Thus by the definition of an
inverse function, g is an inverse function of f, so f is invertible.
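For finite sets, the construction of g in the second half of the proof can be carried out mechanically. The following Python sketch assumes functions are stored as dictionaries; the helper names `is_injective`, `is_surjective`, and `inverse_of`, and the sample maps, are hypothetical and chosen only for illustration.

```python
def is_injective(f):
    """No two inputs share an output."""
    return len(set(f.values())) == len(f)


def is_surjective(f, codomain):
    """Every element of the codomain is the image of some input."""
    return set(f.values()) == set(codomain)


def inverse_of(f, codomain):
    """Build g : B -> A with g(b) = a exactly when f(a) = b.

    Raises ValueError when f is not bijective, since then no such g exists.
    """
    if not (is_injective(f) and is_surjective(f, codomain)):
        raise ValueError("f is not bijective, so it has no inverse function")
    return {b: a for a, b in f.items()}


B = {"x", "y", "z"}
f = {1: "x", 2: "y", 3: "z"}               # bijective onto B
not_injective = {1: "x", 2: "x", 3: "y"}   # neither injective nor onto B

g = inverse_of(f, B)
print(g)
```

Calling `inverse_of(not_injective, B)` raises `ValueError`, mirroring the "only if" direction of Theorem 6: a function that is not bijective has no inverse function.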
These theorems yield a streamlined method that can often be used for proving that a function
is bijective and thus invertible. Given a function f : A → B, if we can (by any convenient means)
come up with a function g : B → A and prove that it satisfies both $f \circ g = I_B$ and $g \circ f = I_A$, then
Corollary 3 implies g is an inverse function for f, and thus Theorem 6 implies that f is bijective.
Moreover, since the inverse is unique, we can conclude that $g = f^{-1}$.
Consider for example the function F : R → R given by F (x) = 5x + 3, which we studied above.
Preliminary computations suggest that G : R → R given by G(y) = (y − 3)/5 ought to be an inverse
function for it. We can verify this by showing that $F \circ G = I_{\mathbb{R}}$ and $G \circ F = I_{\mathbb{R}}$, which is just a
couple of computations:
$$F \circ G(y) = F\left(\frac{y-3}{5}\right) = 5\left(\frac{y-3}{5}\right) + 3 = (y - 3) + 3 = y,$$
$$G \circ F(x) = G(5x + 3) = \frac{(5x + 3) - 3}{5} = \frac{5x}{5} = x.$$
Once G is defined, this computation is all that is needed to prove that F is bijective and invertible,
and that $G = F^{-1}$. (Of course, in general, you have to take care when defining G to ensure that
it gives a well-defined element of A for every element of B, or in other words that it is indeed a
function.)
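The two computations for F and G can also be spot-checked numerically. This is a minimal Python sketch, assuming exact rational arithmetic via the standard `fractions` module to avoid floating-point rounding; the list `samples` is an arbitrary illustrative choice. A finite check of this kind does not replace the algebraic verification above, which covers every real number.

```python
from fractions import Fraction


def F(x):
    return 5 * x + 3


def G(y):
    return (y - 3) / 5


# Exact rational sample points (an arbitrary, illustrative choice).
samples = [Fraction(n, d) for n in range(-6, 7) for d in (1, 2, 3, 7)]

assert all(F(G(y)) == y for y in samples)   # F ∘ G fixes every sample
assert all(G(F(x)) == x for x in samples)   # G ∘ F fixes every sample
```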
Next we have a formula for the inverse of an inverse.
Proof. Assume f and g are invertible. Then they are both bijective by Theorem 6 above, so g ◦ f
is bijective by Theorem 6.20 and thus invertible.
To show that the inverse is equal to $f^{-1} \circ g^{-1}$, by Corollary 3 we just need to show that the
following two compositions are identity maps:
$$(g \circ f) \circ (f^{-1} \circ g^{-1}) = I_C,$$
$$(f^{-1} \circ g^{-1}) \circ (g \circ f) = I_A.$$
To prove the first identity, we compute as follows using the results of Lemma 2:
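$$\begin{aligned}
(g \circ f) \circ (f^{-1} \circ g^{-1}) &= g \circ (f \circ (f^{-1} \circ g^{-1})) \\
&= g \circ ((f \circ f^{-1}) \circ g^{-1}) \\
&= g \circ (I_B \circ g^{-1}) \\
&= g \circ g^{-1} \\
&= I_C.
\end{aligned}$$
The second identity follows from a similar computation:
$$\begin{aligned}
(f^{-1} \circ g^{-1}) \circ (g \circ f) &= f^{-1} \circ (g^{-1} \circ (g \circ f)) \\
&= f^{-1} \circ ((g^{-1} \circ g) \circ f) \\
&= f^{-1} \circ (I_B \circ f) \\
&= f^{-1} \circ f \\
&= I_A.
\end{aligned}$$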
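The order reversal in this formula is easy to see on a finite example. The sketch below is an illustration only: functions are modelled as Python dictionaries, and `compose`, `invert`, and the sample maps `f` : A → B and `g` : B → C are hypothetical names introduced here.

```python
def compose(outer, inner):
    """Return outer ∘ inner for dict-based functions."""
    return {x: outer[inner[x]] for x in inner}


def invert(f):
    """Inverse of a dict-based function, taking its set of values as codomain.

    Raises ValueError if f is not injective.
    """
    inv = {b: a for a, b in f.items()}
    if len(inv) != len(f):
        raise ValueError("function is not injective")
    return inv


f = {1: "a", 2: "b", 3: "c"}        # f : A -> B, bijective
g = {"a": 30, "b": 10, "c": 20}     # g : B -> C, bijective

lhs = invert(compose(g, f))            # inverse of g ∘ f, a map C -> A
rhs = compose(invert(f), invert(g))    # f^{-1} ∘ g^{-1}, also a map C -> A
assert lhs == rhs
```

The opposite order, `compose(invert(g), invert(f))`, fails here with a `KeyError`, because the values of `invert(f)` lie in A while `invert(g)` expects inputs from C.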
Functions as Sets of Ordered Pairs
We end this section with a brief discussion of a way of looking at functions as certain kinds of sets.
Reading this discussion is optional.
Recall that we defined a function as a pair of sets (the domain A and the codomain B), together
with a “rule” that associates with each element of A one and only one element of B. You might
have noticed that we were a little vague about exactly what a “rule” is supposed to be. Does it
have to be a formula? Or can it be a set of formulas for different parts of the domain? Or can it be
some other type of algorithm? Does such an algorithm need to produce an exact value after finitely
many steps, or is something like an infinite series acceptable? Can a function assign “random”
values (whatever that might mean)?
These are questions that vexed mathematicians for centuries as the foundations of calculus were
being worked out. Shortly after calculus was invented, most mathematicians believed a function
had to be defined by an explicit formula. Gradually, the requirements were relaxed to allow more
things to be considered functions. Finally, in the mid-twentieth century, mathematicians settled on
a definition that is general enough to accommodate anything anybody wanted to call a “function.”
It is, ultimately, a definition of a function as a kind of set, like just about everything else in modern
mathematics. Like the definition of ordered pairs as sets (see Exercise 10 on p. 263 of the textbook),
it is not very practical to work with, but it is important to know that there is such a definition,
because it ensures that the theory of functions rests on the same solid logical foundation as set
theory.
The key idea is that once the domain A and codomain B have been chosen, whatever techniques
we might use to specify a function f : A → B, the function is completely determined by the pairing
of each element of A with a specific element of B. In other words, if we know all of the ordered
pairs (a, b) ∈ A × B, where a ranges over all elements of A and b = f (a) for each such a, then we
know the function. Thus a function determines a certain subset of A × B, which we denote by $\Gamma_f$:
$$\Gamma_f = \{(a, b) \in A \times B : b = f(a)\}.$$
(iv) f is injective if for all $a_1, a_2 \in A$ and b ∈ B, if $(a_1, b)$ and $(a_2, b)$ are in $\Gamma_f$, then $a_1 = a_2$.
Every function satisfies (i) and (ii), but only some functions satisfy (iii) and/or (iv).
Now let’s try to go back the other way—suppose we are given two sets A and B, and some subset
Γ ⊆ A × B. Can Γ be the graph of a function from A to B? Clearly it has to satisfy conditions (i)
and (ii) above (with $\Gamma_f$ replaced by Γ). But the remarkable thing is that nothing more is needed:
If Γ is any subset of A × B that satisfies these two conditions, then we get a well-defined function
f : A → B by declaring that for every a ∈ A, f (a) is the unique element b ∈ B such that (a, b) ∈ Γ.
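For finite sets this test can be written out directly. The Python sketch below takes the two conditions to mean that every a ∈ A appears as the first coordinate of at least one pair in Γ and of at most one pair; that reading, the helper names `is_function_graph` and `function_from_graph`, and the sample sets are assumptions made for illustration.

```python
def is_function_graph(gamma, A, B):
    """True when gamma ⊆ A × B pairs each a in A with exactly one b in B."""
    if not all(a in A and b in B for a, b in gamma):
        return False    # gamma is not a subset of A × B
    return all(sum(1 for (x, _) in gamma if x == a) == 1 for a in A)


def function_from_graph(gamma):
    """Recover f as a dict: f[a] is the unique b with (a, b) in gamma."""
    return dict(gamma)


A = {1, 2, 3}
B = {"u", "v"}
good = {(1, "u"), (2, "u"), (3, "v")}
missing = {(1, "u"), (2, "v")}                        # no pair for a = 3
ambiguous = {(1, "u"), (1, "v"), (2, "u"), (3, "v")}  # two pairs for a = 1

print(is_function_graph(good, A, B))        # True
print(is_function_graph(missing, A, B))     # False
print(is_function_graph(ambiguous, A, B))   # False
```

Note that `function_from_graph` is only meaningful after `is_function_graph` has returned `True`; applied to `ambiguous` it would silently keep just one of the two pairs for a = 1.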
Based on this observation, mathematicians in the mid-twentieth century reached consensus that
a function should be formally defined as a subset of the Cartesian product satisfying these two
properties. Thus here is the official definition:
Because this definition is simply a description of a certain kind of set, it does not depend on
any particular method for specifying how f (a) is determined from a, thus resolving the ambiguities
about what kinds of rules are legitimate ways to define functions.
Since a function by this definition is the subset Γ ⊆ A × B, it would make sense to give the subset
the same name as the function itself, and many authors do so, speaking of “a function f ⊆ A × B.”
But in practice, mathematicians generally work with functions using the usual functional notation
(b = f (a)), and think about functions in terms of rules for producing an output value from an
input value, rather than as sets of ordered pairs. When it is useful to work directly with the set
of ordered pairs, it is usually referred to as the graph of the function as we did above, not as the
function itself.
The fine print: If a function f : A → B is invertible, then the graph of its inverse has a very simple description
in terms of the graph of f. You can check easily that in that case the graph of the inverse function
$f^{-1}$ is the subset of B × A given by
$$\Gamma_{f^{-1}} = \{(b, a) \in B \times A : (a, b) \in \Gamma_f\}. \tag{7}$$
In other words, the graph of $f^{-1}$ is just the set of ordered pairs obtained by switching the first
and second coordinates in all the ordered pairs in the graph of f.
Now it happens that the subset of B × A defined in (7) makes sense even if f is not invertible.
This leads some authors (including Sundstrom, the author of our textbook) to define the inverse
relation of f to be that subset of B × A, whether f is invertible or not, and to call that set
“$f^{-1}$.” This is a dangerous practice, because that set is typically not a function, and does not
behave in the way the actual inverse function behaves when f is invertible; so in this class (as in
most mathematics books) we will never refer to the inverse relation, and we will use the notation
$f^{-1}$ only to refer to the inverse of an invertible function. (Full disclosure: In Section 6.6, we
will see another common meaning for the symbol $f^{-1}$ when applied to a set, and we will use it
in that context; but we will never use it to refer to the inverse relation.)
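To see concretely why the swapped set of pairs can fail to be a function, here is a small Python sketch on finite sets. The helper names `swap` and `is_function_graph` and the two sample graphs are hypothetical, introduced only for this illustration.

```python
def swap(gamma):
    """Swap coordinates: the subset of B × A obtained from gamma ⊆ A × B."""
    return {(b, a) for a, b in gamma}


def is_function_graph(gamma, domain):
    """Each element of the domain is the first coordinate of exactly one pair."""
    firsts = [x for x, _ in gamma]
    return set(firsts) == set(domain) and len(firsts) == len(set(firsts))


A = {1, 2, 3}
B = {"x", "y", "z"}
bijective = {(1, "x"), (2, "y"), (3, "z")}
not_injective = {(1, "x"), (2, "x"), (3, "y")}

print(is_function_graph(swap(bijective), B))       # True
print(is_function_graph(swap(not_injective), B))   # False
```

In the second case the swapped set pairs "x" with both 1 and 2 and pairs "z" with nothing, so it is not the graph of any function from B to A.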