Course Notes roughly up to 4/6
Math 235.9 Spring 2015 Course Notes
Andrew J. Havens
April 15, 2015

1 Systems of Real Linear Equations

Let's consider two geometric problems:

(1) Find the intersection point, if it exists, for the pair of lines whose equations in "standard form" are given as
\[ 2x + 4y = 6, \qquad x - y = 0. \]
More generally, can we solve the two dimensional linear system
\[ ax + by = e, \qquad cx + dy = f, \]
provided a solution exists? Can we develop criteria to understand when there is a unique solution, or multiple solutions, or no solution at all?

(2) Consider the vectors
\[ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 \\ -1 \end{pmatrix}. \]
We can depict these as arrows in the plane as follows:

Figure 1: The two vectors above depicted as "geometric arrows" in the Cartesian coordinate plane.

Imagine that we can only take "steps" corresponding to these vectors, i.e. we can only move parallel to these vectors, and a valid move consists of adding one of these two vectors to our position to obtain our next position. Can we make it from the origin $O = (0, 0)$ to the point $(6, 0)$?

We will see that these two kinds of problems are related more closely than they would initially appear (though the second has a restriction that the first does not require, namely that we seek an integer solution; nonetheless, there is an underlying algebraic formalism which allows us to consider this problem one of linear algebra).

First, we solve problem (1). There are many ways to solve the given numerical problem. Among them: solving one equation for either $x$ or $y$ (the second is ripe for this) and substituting the result into the other equation; writing both equations in slope-intercept form and setting them equal (this is clearly equivalent to the substitution just described); or eliminating variables by multiplying the equations by suitable constants and adding the resulting left- and right-hand sides respectively to obtain a single-variable equation:
\[ \begin{cases} 2x + 4y = 6 \\ x - y = 0 \end{cases} \longleftrightarrow \begin{cases} x + 2y = 3 \\ 2x - 2y = 0 \end{cases} \implies 3x = 3. \]
From this we see that $x = 1$, and substituting into the second of the two original equations, we see that $y = 1$ as well.

Figure 2: The two lines plotted in the Cartesian coordinate plane.

The motivation for these manipulations will become clearer when we see higher-dimensional linear systems (more variables and more equations motivate a systematic approach, which we will develop in subsequent lectures). One often notates this kind of problem and the manipulations involved by writing down only the coefficients and constants in what is called an augmented matrix:
\[ \left[ \begin{array}{cc|c} 2 & 4 & 6 \\ 1 & -1 & 0 \end{array} \right]. \]
The square portion of the matrix is the coefficient matrix, and the final column contains the constants from the standard forms of our linear equations. This notation generalizes nicely when encoding large systems of linear equations in many unknowns. Let us describe what the manipulations of the equations correspond to in this matrix notation:

(i) A row may be scaled by a nonzero number, since equations may be multiplied or divided on their left and right sides by a nonzero number.

(ii) A nonzero multiple of a row may be added to another row, and the sum may replace that row, since we can recombine equations by addition as above.

(iii) Two rows may be swapped, since the order in which the equations are written down does not determine or affect their solutions.

The above are known as elementary row operations.
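To make these operations concrete, here is a minimal computational sketch in plain Python (the list-of-rows representation and the helper names scale and add_multiple are mine, not notation from the notes) that applies operations of types (i) and (ii) to the augmented matrix above.

    # Augmented matrix [A | b] for the system 2x + 4y = 6, x - y = 0.
    M = [[2.0, 4.0, 6.0],
         [1.0, -1.0, 0.0]]

    def scale(M, i, s):
        # operation (i): replace row i with s times row i (s nonzero)
        M[i] = [s * entry for entry in M[i]]

    def add_multiple(M, i, j, s):
        # operation (ii): replace row i with (row i) + s * (row j)
        M[i] = [a + s * b for a, b in zip(M[i], M[j])]

    scale(M, 0, 1 / 2)            # R1 -> (1/2) R1 : [1,  2,  3]
    add_multiple(M, 1, 0, -1)     # R2 -> R2 - R1  : [0, -3, -3]
    scale(M, 1, -1 / 3)           # R2 -> (-1/3) R2: [0,  1,  1]
    add_multiple(M, 0, 1, -2)     # R1 -> R1 - 2 R2: [1,  0,  1]

    # M is now [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]] (up to a harmless -0.0),
    # i.e. x = 1 and y = 1.
    print(M[0][2], M[1][2])   # 1.0 1.0

These four moves carry the augmented matrix to a form with ones on the diagonal and zeros elsewhere in the coefficient part, which (as noted next) displays the solution directly.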
Note that for constants $p, q \in \mathbb{R}$ an augmented matrix of the form
\[ \left[ \begin{array}{cc|c} 1 & 0 & p \\ 0 & 1 & q \end{array} \right] \]
corresponds to a solution $x = p$, $y = q$. Further, note that we can combine operations (i) and (ii) into a more general and powerful row operation: we may replace a row by any nontrivial linear combination of that row and other rows, i.e. we may take a nonzero multiple of a row, add multiples of other rows, and replace the original row with this sum.

Let us apply row operations to attempt to solve the abstract system
\[ \begin{cases} ax + by = e \\ cx + dy = f \end{cases} \longleftrightarrow \left[ \begin{array}{cc|c} a & b & e \\ c & d & f \end{array} \right]. \]
We assume temporarily that $a \neq 0$. We will discuss this assumption in more depth later. Since our goal is to make the coefficient matrix have ones along the diagonal from top left to bottom right, and zeros elsewhere, we work first to zero out the bottom left entry. This can be done, for example, by taking $a$ times the second row, subtracting $c$ times the first row, and replacing the second row with the result. We denote this by writing $aR_2 - cR_1 \mapsto R_2'$ (I may get lazy and stop writing the primes, where it will be understood that $R_2$ after the arrow represents a row replacement by the quantity on the left). The effect on the augmented matrix is
\[ \left[ \begin{array}{cc|c} a & b & e \\ c & d & f \end{array} \right] \longmapsto \left[ \begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array} \right]. \]
We see that if $ad - bc = 0$, then either there is no solution, or we must have $af - ce = 0$. Let's press on assuming that $ad - bc \neq 0$. We may eliminate the upper right position held by $b$ in the coefficient matrix by $(ad - bc)R_1 - bR_2 \mapsto R_1'$, yielding
\[ \left[ \begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array} \right] \longmapsto \left[ \begin{array}{cc|c} a(ad - bc) & 0 & (ad - bc)e - b(af - ce) \\ 0 & ad - bc & af - ce \end{array} \right] = \left[ \begin{array}{cc|c} a(ad - bc) & 0 & ade - abf \\ 0 & ad - bc & af - ce \end{array} \right]. \]
Since we assumed $a$ and $ad - bc$ nonzero, we may apply the final row operations $\frac{1}{a(ad - bc)} R_1 \mapsto R_1'$ and $\frac{1}{ad - bc} R_2 \mapsto R_2'$ to obtain
\[ \left[ \begin{array}{cc|c} 1 & 0 & (de - bf)/(ad - bc) \\ 0 & 1 & (af - ce)/(ad - bc) \end{array} \right], \]
so we obtain the solution as
\[ x = \frac{de - bf}{ad - bc}, \qquad y = \frac{af - ce}{ad - bc}. \]
Note that if $a = 0$ but $bc \neq 0$, the solutions are still well defined, and one can obtain the corresponding expressions with $a = 0$ substituted in by instead performing elimination on
\[ \left[ \begin{array}{cc|c} 0 & b & e \\ c & d & f \end{array} \right], \]
where the first step might be a simple row swap. However, if $ad - bc = 0$, there is no hope for the unique solution expressions we obtained, though there may still be solutions, or there may be none at all. We will characterize this failure geometrically eventually.

First, we turn to problem (2). Problem (2) is best rephrased in the language of linear combinations of vectors. Recall that addition of real vectors, which we are representing as arrows in the plane, has both geometric and algebraic definitions. The geometric definition is of course the parallelogram rule: the sum of two vectors $\mathbf{a}$ and $\mathbf{b}$ is the diagonal of the parallelogram completed by parallel translating $\mathbf{a}$ along $\mathbf{b}$ and $\mathbf{b}$ along $\mathbf{a}$:

Figure 3: Vector addition with arrows.

The corresponding algebraic operation is merely addition of components: if
\[ \mathbf{a} = \begin{pmatrix} a_x \\ a_y \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_x \\ b_y \end{pmatrix}, \]
then define
\[ \mathbf{a} + \mathbf{b} := \begin{pmatrix} a_x + b_x \\ a_y + b_y \end{pmatrix}. \]
It is left to the reader to see that these two notions of addition are equivalent, and that they satisfy properties such as commutativity and associativity. Moreover, one can iterate addition, and thus define for any positive integer $n \in \mathbb{Z}$
\[ n\mathbf{a} = \underbrace{\mathbf{a} + \mathbf{a} + \dots + \mathbf{a}}_{n \text{ times}}. \]
Similarly, one can define subtraction, which regards $-\mathbf{a} := \begin{pmatrix} -a_x \\ -a_y \end{pmatrix}$ as a natural additive inverse to $\mathbf{a}$.
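These componentwise operations are immediate to express in code. Here is a small plain-Python sketch (NumPy would provide all of this natively; the explicit helper functions are mine and simply mirror the definitions) for the two vectors from problem (2).

    a = (1, 1)       # the vector a
    b = (2, -1)      # the vector b

    def add(u, v):
        # componentwise addition: (u_x + v_x, u_y + v_y)
        return (u[0] + v[0], u[1] + v[1])

    def neg(u):
        # the additive inverse -u = (-u_x, -u_y)
        return (-u[0], -u[1])

    def times(n, u):
        # n u = u + u + ... + u  (n a positive integer)
        total = (0, 0)
        for _ in range(n):
            total = add(total, u)
        return total

    print(add(a, b))        # (3, 0)
    print(times(2, a))      # (2, 2)
    print(add(a, neg(a)))   # (0, 0)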
In fact, geometrically, we need not restrict ourselves to integer multiples, for we can scale a vector by any real number (reversing direction if negative), and algebraically this corresponds to simply multiplying each component by that real number. (For the math majors among you, we are giving the space R2 of vectors, thought of either as pairs of real numbers or as arrows in the plane, an abelian group structure but also a structure as a free R-module; we will see many of these properties later when we define vector spaces formally, but a further generalization is to study groups and modules; an elementary theory of groups is treated in introductory abstract algebra– math 411 here at UMass, while more advanced group theory, ring theory and module theory are left to more advanced abstract algebra courses, such as math 412 and math 611.) We restrict our attention to integral linear combinations of the vectors Ç a := 1 1 å Ç , b := 4 2 −1 å , Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens i.e. combinations of the form xa + yb, where x, y ∈ Z. Then problem (2) is easily rephrased Ç å as 6 follows: does there exist an integral linear combination of a and b equal to the vector ? 0 Visually, it would seem quite plausible (make two parallelograms as shown below!) Figure 4: The two vectors above depicted as “geometric arrows” in the Cartesian coordinate plane. Algebraically, we can apply the definitions of vector scaling and addition to unravel the meaning of the question: we are seeking integers x and y such that Ç å 6 0 Ç =x Ç = å 1 1 Ç 2 −1 +y å x + 2y x−y å ,. This is equivalent to a linear system as seen in problem (1)! In fact, we can use the solution of (1) to slickly obtain a solution to (2): since (1, 1) = (x, y) is a solution to Ç 3 0 å Ç x + 2y x−y = å , we can multiply both sides by 2 to obtain Ç 6 0 å Ç = 2x + 4y 2x − 2y Ç = 2(1) 1 1 å Ç = 2x å Ç + 2(1) 1 1 å Ç + 2y 2 −1 å 2 −1 å . Thus, taking two steps along a and two steps along b lands on the desired point (6, 0). Let’s summarize what we’ve seen in these two problems. We have two dual perspectives: Intersection problem: find the intersection of two lines / solve a linear system of two equations: ( ax + by cx + dy Linear combination problem: Find a linear combination Ç å of two vectors Ç å a b a= and b = : c d Ç =e =f x 5 a c å Ç +y b d å Ç = e f å Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Let’s return to studying the intersection problem to fill in the gap: what can we say about existence or uniqueness of solutions if the quantity ad − bc is equal to zero? Proposition 1.1. For a given two variable linear system described by the equations ( ax + by cx + dy =e =f the quantity ad − bc = 0 if and only if the lines described by the equations have the same slope. Proof. We must show two directions, since this is an if and only if statement. Namely, we must show that if the lines have the same slopes, then ad − bc = 0, and conversely, if we know only that ad − bc = 0, we must deduce the corresponding lines possess the same slopes. Let’s prove the former. We have several cases we need to consider. First, let’s suppose that none of the coefficients are zero, in which case we can write each equation in slope-intercept form: a e ax + by = e ←→ y = − x + , b b f c cx + dy = f ←→ y = − x + , d d and applying the assumption that the lines have identical slopes, we obtain − a c = − =⇒ ad = bc =⇒ ad − bd = 0 . 
b d (1) On the other hand, if for example, a = 0, then the first equation is by = e, which describes a horizontal line (we must have b 6= 0 if this equation is meaningful). This tells us that the other equation is also for a horizontal line, so c = 0 and consequently ad − bc = 0 · d − b · 0 = 0. A nearly identical argument works when the lines are vertical, which happens if and only if b = 0 = d. It now remains to show the converse, that if ad − bc = 0, we can deduce the equality of the lines’ slopes. Provided neither a nor d are zero, we can work backwards in the equation (??): ad − bc = 0 =⇒ − a c =− . b d Else, if a = 0 or d = 0 and ad − bc = 0, then since ad − bc = bc, either b = 0 or c = 0. But a and b cannot both be zero if we have a meaningful system (or indeed, the equations of lines). Thus if a = 0 and ad − bc = 0, then c = 0 and the lines are both horizontal. Similarly, if d = 0 and ad − bc = 0, then b = 0 we are faced with two vertical lines. There are thus three pictures, dependent on ad − bc, e and f : 1. If ad − bc 6= 0, there is a unique solution (x, y) for any e and f we choose, and this pair (x, y) corresponds to the unique intersection point of two non-parallel lines. 2. If ad − bc = 0, but af − ec = 0 = bf − ed, then one equation is a multiple of the other, and geometrically we are looking at redundant equations for a single line. There are infinitely many solutions (x, y) corresponding to all ordered pairs lying on this line. 3. ad − bc = 0 but af 6= ec. We have two parallel lines, which never intersect. There are no solutions to the linear system. 6 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens While there is much more that can be done with two dimensional linear algebra, we have a fairly complete idea of how to solve each of the basic problems posed. We now will explore the analogous problems in three dimensions, as a way to build up to solving general linear systems. Thus, consider the following problems from three dimensional geometry: 1. Given three “generic” planes in R3 which intersect in a unique point, can we locate their point of intersection? 2. Given two planes intersecting along a line, can we describe the line “parametrically”? 3. Given three vectors u, v, w in R3 , can we describe a fourth vector b as a linear combination of the other three? Before approaching this, we review some important properties of the real numbers, and the description of Cartesian coordinates on Cartesian products of the reals. R denotes the real numbers, which has some additional structure such as a notion of distance given by absolute value, a notion of partial ordering (≤). With these notions, together with ordinary real number arithmetic, we can view R as a normed, ordered scalar field. The properties of R which make it a field are: (i.) R comes equipped with a notion of associative, commutative addition: for any real numbers a, b, and c, a + b = b + a is also a real number, and (a + b) + c = a + b + c = a + (b + c). Moreover, there is a unique element 0 ∈ R which acts as an identity for the addition of real numbers: 0 + a = a for any a ∈ R. Every a ∈ R has a unique additive inverse (−a) such that a + (−a) = 0. (ii.) R comes equipped with a notion of associative, commutative, and distributive multiplication: for any a, b, c ∈ R, ab = ba determines a real number, a(bc) = abc = (ab)c, and a(b + c) = ab + ac = (b + c)a. Moreover, 0a = 0 for any a ∈ R, and there is a unique number a ∈ R which acts as an identity for multiplication of real numbers: 1a = a for any a ∈ R. 
(iii.) To any nonzero a ∈ R there corresponds a multiplicative inverse 1 a := a−1 satisfying aa−1 = 1. A mathematical set with a structure as above is called a field. We will encounter other fields later on. We’ve already seen examples of “vectors” in the plane, utilizing the coordinates coming from a Cartesian product: R2 = R × R := {(x, y) | x, y, z ∈ R} . When we wish to emphasize that we are talking about vectors, we write them not as ordered pairs horizontally, but as vertical tuples: Ç å x x= ∈ R2 . y We can regard such a vector as the position vector of the point (x, y), which means it is geometrically the arrow pointing from the origin (0, 0) to the point (x, y). It has a notion of geometric length coming from the pythagorean theorem: kxk = » x2 + y 2 . We can extend the ideas of this construction to create “higher dimensional ” spaces. The geometry we are working with here is called Euclidean (vector) geometry. We define R3 analogously: R3 = R × R × R := {(x, y, z) | x, y, z ∈ R} . 7 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens In R3 , we can carve out subsets called planes. They have equations with general form: ax + by + cz = d , a, b, c, d ∈ R , x, y, z are real variables for coordinates on the plane. Let’s try to find an intersection point for a system of three planes. Example 1.1. Consider the 3 × 3 system x+ y+ z x − 2y + 3z 4x − 5y + 6z =6 1 1 1 6 = 6 ←→ 1 −2 3 6 . 4 −5 6 12 = 12 Our goal is to manipulate the system via operations corresponding to adding or scaling the equations, in order to obtain 1 0 0 p 0 1 0 q , 0 0 1 r which corresponds to a solution (x, y, z) = (p, q, r) for some p, q, r ∈ R. A simple list of valid manipulations corresponds to the following elementary row operations: 1. We may swap two rows, just as we may write the equations in any order we please. We notate a swap of the ith and jth rows of an augmented matrix by Ri ↔ Rj . 2. We may replace a row Ri with the row obtained by scaling the original row by a nonzero real number. We notate this by sRi 7→ Ri . 3. We may replace a row Ri by the difference of that row and a multiple of another row. We notate this by Ri − sRj 7→ Ri . Before we proceed to apply these row operations to try to solve our system, I remark that combining these elementary operations allows us to describe a more general valid manipulation: we may replace a row by a linear combination of rows, where the original row is weighted by a nonzero real number. E.g., if s 6= 0, then the following is the most general row operation (up to row swapping) involving the rows R1 , R2 , R3 : sR1 + tR2 + uR3 7→ R1 . Now, to create our solution with row operations. Notice that the top left entry of the matrix is already a 1, which is good news! We want 1s on the main diagonal, and zeros elsewhere on the coefficient side of the augmented matrix. So if the top left entry was a 0, we’d swap rows to get a nonzero entry there, and then if it was not 1 we’d scale the first row by the multiplicative inverse of that entry. Once we’ve got a nonzero entry there, we call this position the first pivot, and our goal is to use it to create a column of zeroes beneath that position. Focusing on that first column, we have: 1 ... 6 1 . . . 6 . 4 . . . 12 It is clear that we can eliminate the second entry in the first column by the row operation R2 − R1 7→ R2 . Similarly, we can create a zero in the first entry of the third row by R3 − 4R1 7→ R3 . This yields 1 1 1 6 1 1 1 6 0 1 −2 3 6 7−→ 0 −3 2 4 −5 6 12 0 −9 2 −12 8 Math 235.9 - Lin. Alg. 
Course Notes 2015 Andrew J. Havens Next, we want to make the middle entry from a −3 into a 1. This is readily accomplished by a row operation of the second type: − 31 R2 7→ R2 . One should check that after performing in sequence the moves R3 − 9R2 7→ R3 , 14 R3 7→ R3 , R2 + 32 R3 7→ R3 , R1 − 13 R3 7→ R1 , and R1 − R2 7→ R1 , the matrix reduces to 1 0 0 1 0 1 0 2 . 0 0 1 3 Thus the solution to our system is (1, 2, 3), which is the point where these planes intersect. The process where we used a pivot to make zeroes below that entry is called pivoting down, while the process where we eliminated entries above a pivot position is called pivoting up. Exercise 1.1. Show that the row operations are invertible, by producing for a given elementary row operation, another elementary operation which applied either before or after the given one will result in the final matrix being unchanged. Example 1.2. Let us turn to the second geometric problem, regarding the description of a line of intersection of two planes. Take, for instance, the two planes ( x+ y+ z x − 2y + 3z =6 . =6 By applying the row operations in the preceding example together with a few more (which ones?), we see that we can get the system to reduce to ô ñ 1 0 5/3 6 . 0 1 −2/3 0 Notice that there can be at most two pivots, since there are only two rows! We rewrite the matrix rows as equations to try to parametrize the line: x = 6 − (5/3)z , y = (2/3)z , whence x −5/3 6 6 − (5/3)z y = (2/3)z = 0 + z 2/3 . 1 z z 0 Thus the line can be parametrized by z ∈ R, which is the height along the line which begins at (6, 0, 0) on the xy-plane in R3 when z = 0, and travels with velocity −5/3 v = 2/3 . 1 Note that above we wrote the solution as a linear combination of the vectors for the starting position and the velocity. It will be common to solve systems where the final solution is an arbitrary linear combination dependent on some scalar weights coming from undetermined variables. By convention, we often choose different letters from the variable designations, such as s and t, to represent the scalings in such a solution. Thus we would write x 6 −5/3 y = 0 + s 2/3 , s ∈ R , z 0 1 where we’ve taken z = s as a free variable. 9 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens For the third problem, the key observation is that it is essentially the same as the first problem, dualized. We can write down the equation xu + yv + zw = b , for some unknowns x, y, z ∈ R, and after scaling the vectors entry by entry, and adding entry by entry, we have two vectors which are ostensibly equal. Thus setting their entries equal, we obtain a system of three equations, which can be solved via elimination/row operations on the corresponding augmented matrix. Example 1.3. Let 1 −2 3 u = 2 , v = −1 , w = 2 . 3 −3 5 Can the vector 0 b= 1 4 be written as a linear combination of u, v, and w? The claim is that this is not possible. Observe that if such a linear combination exists, then there’s a solution to the vector equation xu + yv + zw = b . We can rewrite this as a system as follows: x − 2y + 3z 2x − y + 2z 3x − 3y + 5z =0 1 −2 3 0 = 1 ←→ 2 −1 2 1 3 −3 5 4 =4 We apply the row operations R2 − 2R1 7→ R2 and R3 − 3R1 7→ R3 to obtain 1 −2 3 0 0 3 −4 1 , 0 3 −4 4 and then R3 − R2 7→ R3 leaves us with 1 −2 3 0 0 3 −4 1 . 0 0 0 3 The last row corresponds to the impossible equation 0z = 3 =⇒ 0 = 3, so there is no possible solution! We call such a system inconsistent. 
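A computational cross-check of Example 1.3 is possible if NumPy is available (a sketch; the rank comparison used here is a standard criterion which the notes have not yet developed): the vector b can be written as a linear combination of the columns of A exactly when appending b as an extra column does not increase the rank.

    import numpy as np

    A = np.array([[1, -2, 3],
                  [2, -1, 2],
                  [3, -3, 5]], dtype=float)
    b = np.array([0.0, 1.0, 4.0])

    augmented = np.column_stack([A, b])

    # rank(A) = 2, but rank([A | b]) = 3: appending b raises the rank,
    # so b is not a linear combination of the columns of A and the
    # equation x u + y v + z w = b has no solution.
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(augmented))   # 2 3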
Otherwise, if the equation can be solved (even if the solution is not unique), we refer to the system as consistent. Some possible practice problems: Problems 1-18 in section 1.1 – Introduction to Linear systems in Otto Bretscher’s textbook Linear Algebra with Applications. These problems generalize easily into higher dimensions, and it will be nice to see that our procedure illustrated in the above examples works just as well in those settings. Thus, it seems fitting that we study the general algorithm which allows us to reduce systems and solve either for an explicit solution, or to realize a system is inconsistent. As we will use this algorithm extensively, I devote several lectures to its details and implementation. 10 Math 235.9 - Lin. Alg. Course Notes 2 2015 Andrew J. Havens Gauss-Jordan Elimination In this section we describe the general algorithm which takes a matrix and reduces it in order to solve a system or determine that it is inconsistent. Let us begin with some language and notations. Definition 2.1. A matrix is said to be in Row Echelon Form (REF) if the following conditions hold: 1. All rows containing only zeros appear below rows with nonzero entries. 2. The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row, and any such initial nonzero entry is a 1. The columns with leading 1s are called pivot columns, and the entries containing leading 1s are called pivots. If, in addition, all entries other than the pivot entries are zero we say the matrix is in Reduced Row Echelon Form (RREF). Example 2.1. ñ is a matrix in row echelon form, while 1 0 5/3 0 1 −2/3 ñ 1 0 0 0 1 0 ô ô is a matrix in reduced row echelon form. We write elementary row ops as follows: let s ∈ R \ 0 be a nonzero scalar, A ∈ Matm×n (R) a matrix which contains m rows and n columns of real entries. Let Ri denote the ith row of A for any integer i, 1 ≤ i ≤ m. Then the elementary row operations are 1. Row swap: Ri ↔ Rj swaps the ith and jth rows. 2. Rescaling: sRi 7→ Ri scales Ri by s. 3. Row combine: Ri − sRj 7→ Ri combines Ri with the scalar multiple sRj of Rj . We are ready to describe the procedure for pivoting downward : Definition 2.2. Let aij denote the entry in the ith row and jth column of A ∈ Matm×n (R). To pivot downward on the (i,j)th entry is to perform the following operations: 1 (i.) Ri 7→ Ri , aij (ii.) For each integer k > i, Ri+k − ai+k,j Ri 7→ Ri+k . In words, make aij into a 1, and use this one to eliminate (make 0) all other entries directly below the (i,j)th entry. Let’s give a brief overview of what the Gauss-Jordan algorithm accomplishes. First, given an input matrix, it searches for the leftmost nonzero column. Then, after finding this column, and after exchanging rows if necessary, it brings the first nonzero entry up to the top. It then pivots downwards on this entry. It subsequently narrows its view to the submatrix with the first row and column removed, and repeats the procedure. Once it has located all pivot columns and pivoted down in each one, it starts from the rightmost pivot and pivot up, then move left to the next pivot and pivot up. It then continues pivoting up and moving left until the matrix is in row echelon form. The descriptions and charts I gave in class are largely taken from a textbook which is in my office (the name escapes me). 
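For readers who want to see the whole algorithm spelled out in code, here is a compact sketch of Gauss-Jordan elimination in plain Python (the choice of the largest available pivot in each column and the tolerance tol are implementation details of mine, not part of the description above).

    def rref(M, tol=1e-12):
        # Return the reduced row echelon form of M, given as a list of rows.
        M = [row[:] for row in M]          # work on a copy
        rows, cols = len(M), len(M[0])
        pivot_row = 0
        for j in range(cols):              # scan columns left to right
            if pivot_row == rows:
                break
            # pick the row at or below pivot_row with the largest entry in column j
            k = max(range(pivot_row, rows), key=lambda r: abs(M[r][j]))
            if abs(M[k][j]) < tol:
                continue                   # no pivot in this column
            M[pivot_row], M[k] = M[k], M[pivot_row]        # row swap
            p = M[pivot_row][j]
            M[pivot_row] = [x / p for x in M[pivot_row]]   # scale the pivot to 1
            for r in range(rows):                          # pivot down and up
                if r == pivot_row:
                    continue
                factor = M[r][j]
                if abs(factor) > tol:
                    M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
            pivot_row += 1
        return M

    print(rref([[1, 1, 1, 6], [1, -2, 3, 6], [4, -5, 6, 12]]))

Running it on the augmented matrix of Example 1.1 should print the identity coefficient block with the solution column (1, 2, 3), up to floating point rounding.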
However, the technical details given in class are not a principal focus, and in particular, will not appear on the exam in any formal capacity (as long as you can perform the algorithm in practice, then you’ve got what you need for the remainder of the course). I may come back and include these details at a future date. 11 Math 235.9 - Lin. Alg. Course Notes 3 2015 Andrew J. Havens Matrices and Linear Maps of Rn → Rm Now that we have an algorithm for solving systems, let’s return to the vector picture again. Here, we review some basic vector algebra in two and thee dimensions: Regard R2 as the set of vectors ®Ç 2 R = x y å ´ x, y ∈ R , and similarly regard R3 as the set of vectors Ö è x 3 y R = x, y, z ∈ R . z Recall the dot product, which I define in R3 (for R2 simply forget the last coordinate): Ö a b c è Ö · x y z è = ax + by + cz . Notice that the right hand side is in fact identical to the expression appearing on the left hand side of our general equation for a plane in R3 ! This is not a coincidence. One way to geometrically determine a plane is to fix a vector Ö è a b n= c and find the set of all points such that the displacement vector from a fixed point (x0 , y0 , z0 ) is perpendicular to i.. The key fact (which we will prove later in the course) is that u·v = kukkvk cos θ for any vectors u, v ∈ R3 , where θ ∈ [0, π] is the angle made between the two vectors (which can always be chosen to be in the interval [0, π]). Thus, a plane equation has the form n · (x − x0 ) = 0 , Ö where x= x y z è Ö , x0 = x0 y0 z0 è . It is simple algebra of real numbers which turns this into the equation ax + by + cz = d, where d = n · x0 is a constant determined by the choices of n and x0 . One refers to the function f (x, y, z) = ax + by + cz, with a, b, c ∈ R known and x, y, z ∈ R variable, as a linear function. So another viewpoint is that a plane in R3 is a level set of a linear function in three variables. We can regard the dot product in another way: as a 1 × 3 matrix acting on a 3 × 1 matrix by matrix multiplication: x [ a b c ] y = [ax + by + cz] = n · x , z where I’ve abused notation slightly by taking the 1 × 1 resulting matrix, and regarding it as merely the real number it contains. We take this as the definition of matrix multiplication in the case where we are given a 3 × 1 matrix (a row vector ) and a 1 × 3 matrix (a column vector ). We wish to extend this definition to matrices acting on column vectors, and we will see that the definition is powerful enough to capture both the concepts of linear systems and linear combinations. 12 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens The idea is simple: we’ll let rows of a matrix be dotted with a vector, as above, which gives us a new vector consisting of the real numbers resulting from each row-column product. Formally, we can define it in Rn , which we think of as the space of column vectors with n real entries: Ö è x1 . n x1 , . . . , x n ∈ R . .. R = x n Definition 3.1. Let A ∈ Matm×n (R) be a matrix given by ··· .. . am 1 · · · a11 .. . a1n .. . . amn Let vi denote the vector whose entries are taken from the ith row of A: ai1 .. vi := . . ain Then define the matrix-vector product as a map Matm×n (R) × Rn → Rm given by the formula v1 · x .. m x 7→ Ax := ∈R . . vn · x Example 3.1. Compute the matrix vector product Au where 1 1 1 1 A = 1 −2 3 , u = 2 . 3 4 −5 6 To compute this, we need to dot each row with the column vector u. For example, the first row gives 1 [ 1 1 1 ] 2 = 1(1) + 1(2) + 1(3) = 6 . 
3 Note that dotting a vector u with a vector v consisting entirely of ones simply sums the components u. Computing the remaining rows this way, we obtain the vector 1 1 1 1 6 Au = 1 −2 3 2 = 6 . 4 −5 6 3 12 î ó Let’s call this vector b. Recall that u was a solution to the system with augmented matrix A b ! This is no coincidence. We can view the system of equations as being equivalent to solving the following problem: find a vector x such that Ax = b. In this case we’d solved that system for x = u, and just checked via matrix-vector multiplication that indeed, it is a solution! We have one last perspective on this, which is that we found a linear combination of the columns of A: 1 1 1 6 x 1 + y −2 + z 3 = 6 4 −5 6 12 13 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens is solved by x = 1, y = 2, and z = 3. Thus, we’ve explored numerous ways to understand the solution of the equation 1 1 1 x 6 Ax = 1 −2 3 y = 6 . 4 −5 6 z 12 Let us remark on some basic properties of matrix-vector products. We know that we can view them as giving maps between Euclidean spaces of vectors. We have the following observations: 1. For any matrix A ∈ Matm×n (R) and any vectors x, y ∈ Rn , A(x + y) = Ax + Ay. 2. For any A ∈ Matm×n (R), any vector x, ∈ Rn , and scalar s ∈ R, A(sx) = s(Ax). Are these not familiar properties? Consider, for example, limits, derivatives, integrals. Another way of stating these properties is to say we have discovered operators which, upon acting on linear combinations of inputs, output a linear combination of the sub-outputs. That is, matrices take linear combinations of vectors to linear combinations of matrix-vector products, derivatives take linear combinations of differentiable functions to linear combinations of the derivatives of the simpler functions, and integrals act analogously on integrable functions. Both derivatives and integrals behave this way because limits do, so the linearity was somehow inherited. We’d gradually like to come to an understanding of the word linear describing the commonality among these various operations, which behave well with respect to linear combinations. To do this, we need to see what spaces of objects have the right properties to form linear combinations, and to ensure that we consider maps of such spaces which respect this structure in a way analogous to the above two properties. Practice: Exercise 3.1. Let A be the matrix 0 2 −1 3 . A = −2 0 1 −3 0 Compute Ax for 1. 1 x= 1 , 1 2. 3 x= 1 , 2 3. 1 0 0 x = 0 , 1 , or 0 . 0 0 1 Can you interpret the results geometrically? We will eventually have a good understanding of the geometry of the transformation x 7→ Ax for the above matrix, and others which share a certain property which it possess. (Preview: it is a skew symmetric matrix, and represents a certain cross-product operation). 14 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens We now investigate so called linear maps from Rn to Rm . Definition 3.2. A map T : Rn → Rm is called a linear transformation, or a linear map if the following properties hold: 1. For all s ∈ R and any x ∈ Rn , T(sx) = s(Tx). 2. For any pair of vectors x, y ∈ Rn , T(x + y) = Tx + Ty. We refer to T as a linear operator if these properties hold. Note the convention of often omitting parentheses between the operator T and the vector input x: Tx := T(x). Clearly, the operator TA : Rn → Rm defined by TA x = Ax defines a linear map. Let us see how linear systems fit into this framework. First, a formal description of linear systems: Definition 3.3. 
A system of linear equations in n variables is a set of m ≥ 1 equations of the form a11 x1 + a12 x2 + . . . + a1n xn a21 x1 + a22 x2 + . . . + a2n xn .. . .. . am1 x1 + am2 x2 + . . . + amn xn .. . = b1 = b2 .. . . = bm Observation 3.1. A system of linear equations can be captured by the linear transformation TA associated to a matrix A = (aij ) ∈ Matm×n (R). Thus, a linear system can be written as Ax = b for x ∈ Rn unknown. The system Ax = b is solvable if and only if b is in the image of TA . We need to recall what is meant by this terminology, so what follows is a handful of definitions regarding functions (not necessarily just linear functions; these definitions are standard and are usually introduced in high school or college algebra and precalculus courses). Definition 3.4. Let X and Y be mathematical sets. A function f : X → Y assigns to each x ∈ X precisely one y ∈ Y . X is called the domain or source, and Y is called the codomain or target. Note that one y may be assigned to multiple xs, but each x can be assigned no more than one y... this is a distinction which often trips folk up when first learning about functions. To better understand this distinction, let’s view functions as triples consisting of a domain X (the inputs), a codomain Y (the possible outputs), and a rule f assigning outputs to inputs. Note htat we need to specify all of these to completely identify a function. Now, if the domain were the keys on a keyboard, and the outputs the symbols on your screen in a basic word processing environment, you’d declare your keyboard “broken” if after pushing the same key several times, your screen displayed various unexpected results. On the other hand, if your function was determined by a preset font, you could imagine pushing many different keys, and having all of the outputs be the same. In this latter case, the keyboard is functioning, but the rule assigning outputs happens to be a silly one (every keystroke produces a ‘k’, for example). Thus, a function may assign at most one output per input, but may reuse outputs as often as it pleases. Sometimes a function is also called a map, especially if the sets involved are thought of as “spaces” in some way. We will later define structures on a set which turn them into something called a vector space, and we will study linear maps on them, which are just functions with properties analogous to those for linear functions from Rn to Rm . Definition 3.5. Given sets X and Y and a function f : X → Y , the set f (x) := Im(f ) = {y ∈ Y | y = f (x) for some x ∈ X} ⊂ Y is called the image of f . 15 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 3.6. Given sets X and Y and a function f : X → Y , and given a subset V ⊂ Y , the set f −1 (V ) := {x ∈ X | f (x) ∈ V } ⊂ X is called the preimage of V . Be warned: the preimage of a subset is merely the set of things being mapped to that subset, but is not necessarily constructed by “inverting a function” since not every function is invertible (but any subset of a codomain has defined for it a preimage by any function mapping to the codomain; that preimage may be empty!) If, on the other hand, for every y ∈ Y there is a unique x ∈ X, such that y = f (x), then we use the same notation f −1 to describe the inverse function. We will talk more about inverses after a few more definitions. Definition 3.7. 
A map f : X → Y is called surjective or onto if and only if for every y ∈ Y , there is an x ∈ X such that y = f (x); equivalently, if the preimage f −1 {y} = 6 ∅ for all y ∈ Y , f is a surjection from X to Y . A common shorthand is to write f : X Y to indicate a surjection; in class I avoid this shorthand because it is easy to miss until one becomes quite comfortable with the notion. However, in these notes, I will from time to time use it, while also reminding the reader that a particular map is a surjection by declaring it “onto” or ”surjective” in the commentary. Note that a map f : X → Y is a surjective map if and only if the image is equal to the codomain: f (X) = Y . In our keyboard analogy, we’d want to be able to produce any symbol capable of being displayed in a word processing program by finding an appropriate keystroke in order to declare that our typing with a particular font was “surjective”. Thus, the rule for producing outputs has to be powerful relative to the set of outputs: any output can be achieved by an appropriate input into a surjective function. Another remark is that if we start with some function f : X → Y , and then restrict our codomain to the image f (X) ⊆ Y , we obtain a new function, which we abusively might still label f . This function is surjective! Said another way, any function surjects onto its image, because we’ve thrown out anything in the codomain which wasn’t in the image when we restricted the codomain! So, in our typing analogy, perhaps we can’t produce all symbols with a given font, but if we declare our codomain to be only the symbols that display in that font with regular typing inputs (no fancy stuff, multiple keys at once, sequences of keystrokes, etc1 ), then we have automatically built an onto map between keys and displayable symbols in the given font. Definition 3.8. A map f : X → Y is called injective or one-to-one if and only if for every distinct pair of points x1 , x2 ∈ X, they possess distinct images: x1 6= x2 =⇒ f (x1 ) 6= f (x2 ) for all x1 , x2 ∈ X . Equivalently, for any y ∈ f (X), the preimage of y, f −1 ({y}) contains precisely one element from X. As a shorthand, one often writes f : X ,→ Y , and refers to f as an injection. Exercise 3.2. Show that a function f : X → Y is injective if and only if whenever f (x1 ) = f (x2 ), one has that x1 = x2 . Definition 3.9. A map f : X → Y is called a bijection if and only if it is both injective and surjective. Definition 3.10. Given a map f : X → Y , a map f −1 : Y → X is called an inverse for f if and only if 1 If we define our domain to be the set of all sequences of keystrokes which can produce a single symbol output, and our codomain to be all possible outputs in the font, then we have a bijection between keystroke sequences and outputs if and only if the font contains no repeated characters, and the hardcoding contains no redundant input sequences. 16 Math 235.9 - Lin. Alg. Course Notes Ä 2015 Andrew J. Havens ä (i.) f −1 ◦ f = IdX , i.e. f −1 f (x) = x for every x ∈ X, Ä ä (ii.) f ◦ f −1 = IdY , i.e. f f (y) = y for every y ∈ Y . If such a function exists, we say f is invertible. Exercise 3.3. A function f : X → Y is invertible if and only if it is a bijection. Note that there are two ways to show that some map f : X → Y is a bijection. You can show that it is both injective and surjective separately, or you can prove that an inverse exists. We’d now like to return to doing linear algebra, a little brighter with our language of functions. 
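Before doing so, a toy illustration of these definitions on small finite sets may help (plain Python; the particular sets and the rule f are made up purely for illustration).

    # A function f : X -> Y given as a rule (here, a dictionary).
    X = {1, 2, 3, 4}
    Y = {'a', 'b', 'c'}
    f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}

    image = set(f.values())                      # f(X) = {'a', 'b', 'c'}
    preimage_a = {x for x in X if f[x] == 'a'}   # f^{-1}({'a'}) = {1, 2}

    injective  = len(image) == len(X)            # False: 1 and 2 share the output 'a'
    surjective = image == Y                      # True: every element of Y is hit

    print(image, preimage_a, injective, surjective)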
Consider the following questions: Question 1: If Ax = b possess a solution x for every v ∈ Rm , then what can we say about the linear map TA : Rn → Rm ? Question 2: If Ax = b possess a unique solution x for every v ∈ Im(TA ) =: ARn , then what can we say about the linear map TA : Rn → Rm ? These answers give us a surprising motivation to study specific properties of linear maps, such as which vectors they send to the zero vector. Here I provide incomplete answers to these questions. For the first, we know that the map is surjective, though we need to discover what that means in terms of our matrix; in particular, we’d like to answer “what property must a matrix have for the associated matrix-vector multiplication map to be surjective?” Similarly, for the second question, we know that the map must be injective, and would hope to characterize injectivity in an easily computable way for a given map coming from multiplying vectors by matrices. Surjectivity, recall, is equivalent to the image being the entire codomain. So for a linear map T : Rn → Rm to be surjective, we merely require that T(Rn ) = Rm . To know when a given matrix can accomplish this, we’ll need to do more matrix algebra, and come to an understanding of the concept of dimension. For now, I’ll state without argument that there’s certainly no hope if n < m. But it’s also possible to have n >> m and still produce a map which doesn’t cover Rm (e.g. by mapping everything onto 0, or onto some linear set carved out by a linear equation system). Injectivity is more subtle. Begin first by observing that if T : Rn → Rm is linear, then T0Rn = 0Rm where 0Rn is the zero vector, consisting of n zeroes for components, and similarly for 0Rm (I will often drop the subscript when it is clear which zero vector is being invoked). This is because of the first property in the definition of linearity: 0 = 0(Tx) = T(0x) = T0 for any x ∈ Rn . So certainly, the preimage of 0 by a linear map contains 0. If it contains anything else, then the map is not injective by definition. I claim that the converse is true: if there’s only the zero vector in the preimage of the zero vector, then the linear map is an injection. The proof is constructed as a solution to the first question on the second written assignment (HW quiz 2, problem 1), in greater generality (the result, correctly stated, holds for vector spaces). We’ll discuss this proposition more later. Generally, we want to know about solutions to the homogeneous equation Ax = 0, and in particular, when there are nontrivial solutions (which means the matrix-vector multiplication map is not injective). It seems clear that this information comes from applying Gauss-Jordan to the matrix, and counting the pivots. If there are no free variables, then the homogenous system is solved uniquely, and the map is injective. If it is also surjective, we’d like to be able to find an inverse function which solves the general, inhomogeneous system Ax = b once and for all! We need a little more information about matrix algebra if we wish to accomplish this. Along the way, we will further motivate the development of abstract vector spaces. 17 Math 235.9 - Lin. Alg. Course Notes 4 2015 Andrew J. Havens Matrix algebra Suppose we wanted to compose a pair of linear maps induced by matrix multiplication: T T B A Rk −→ Rn −→ Rm , where B ∈ Matn×k (R) and A ∈ Matm×n (R). Let TAB = TA ◦ TB denote the composition obtained by first applying TB and then applying TA . Exercise 4.1. Check that TAB above is a linear map. 
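A numerical spot-check of Exercise 4.1 (not a proof) can be done if NumPy is available: for randomly chosen matrices, vectors, and a scalar, the composition x ↦ A(Bx) respects linear combinations up to rounding error.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))   # A in Mat_{2x3}(R)
    B = rng.standard_normal((3, 4))   # B in Mat_{3x4}(R), so T_AB : R^4 -> R^2

    T = lambda x: A @ (B @ x)         # the composition T_A after T_B

    x, y = rng.standard_normal(4), rng.standard_normal(4)
    s = rng.standard_normal()

    # T(x + s y) should agree with T(x) + s T(y) up to floating point error.
    print(np.allclose(T(x + s * y), T(x) + s * T(y)))   # True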
We want to know if we can represent TAB by a matrix-vector multiplication. It turns out we can, and the corresponding matrix can be though of as a matrix product of A and B. Let us do an example before defining this product in full generality. Example 4.1. Let ñ A= 3 2 1 6 5 4 ô 1 2 ∈ Mat2×3 (R) , and B = 3 4 inMat3×2 (R) . 5 6 Thus, : R3 → R2 is given by TA y = Ay and TB : R2 → R3 is given by TB x = Bx. Given ñ TA ô x1 ∈ R2 , the map TAB : R2 → R2 sends x to A(Bx). Let y = Bx. Then x= x2 ô x1 + 2x2 1 2 ñ x1 = 3x1 + 4x2 . y= 3 4 x2 5x1 + 6x2 5 6 We can then compute Ay: ñ TAB x = Ay = A(Bx) = ñ = 3 2 1 6 5 4 ñ = ñ = x1 + 2x2 3x 1 + 4x2 5x1 + 6x2 3(x1 + 2x2 ) + 2(3x1 + 4x2 ) + 5x1 + 6x2 6(x1 + 2x2 ) + 5(3x1 + 4x2 ) + 4(5x1 + 6x2 ) " Ä = ô ä Ä ô ä 3(1) + 2(3) + 1(5)äx1 + Ä3(2) + 2(4) + 1(6)äx2 Ä 6(1) + 5(3) + 4(5) x1 + 6(2) + 5(4) + 4(6) x2 3(1) + 2(3) + 1(5) 3(2) + 2(4) + 1(6) 6(1) + 5(3) + 4(5) 6(2) + 5(4) + 4(6) 14 20 41 56 ôñ x1 x2 ô ñ = 14x1 + 20x2 41x1 + 56x2 ô ôñ x1 x2 # ô . Notice that the matrix in the penultimate line above is obtained by forming dot products from the row vectors of A with the column vectors of B to obtain each entry. This is how we will define matrix multiplication in general: we treat the columns of the second matrix as vectors, and compute matrix-vector products in order to obtain new column vectors. We are now ready to define the matrix product as the matrix which successfully captures a composition of two linear maps coming from matrix-vector multiplication. Let’s return to the setup. 18 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 4.1. Suppose we have linear maps T T B A Rk −→ Rn −→ Rm , where B ∈ Matn×k (R) and A ∈ Matm×n (R). Let TAB = TA ◦ TB : Rk → Rm denote the composition obtained by first applying TB and then applying TA . Then there is a matrix M such that TAB x = Mx for any x ∈ Rk , and we wish to define AB := M. Following the ideas of the above example, we can (exercise!) realize M = (mij ) ∈ Matm×k (R) as the matrix whose entries are given by the formula n mij = X ail blj . l=1 Thus, the columns of AB are precisely the matrix-vector products Avj where vj is the jth column of B. We refer to AB ∈ Matm×k (R) as the matrix product of A and B. Several remarks are in order. First, note that there is a distinguished identity matrix In ∈ Matn×n (R) such that for any A ∈ Matm×n , AIn = A and for any B ∈ Matn×k , In B = B. This matrix consists of entries δij which are 1 if i = j and 0 if i 6= j: In = 1 0 0 ... 0 0 1 0 ... 0 .. .. .. ∈ Matn×n (R) . . . . 0 0 ... 0 1 Clearly, for any vector x ∈ Rn , In x = x, whence it also acts as an identity for matrix multiplication, when products are defined. Notice also that the number of columns of the first matrix must match the number rows of the second matrix. In particular, if A ∈ Matm×n and B ∈ Matn×k (R), then AB is well defined, but BA is well defined if and only if k = m. Worse yet, like function composition, matrix multiplication, even if it can be defined in both orders, is in general not commutative, as the maps of the two differently ordered compositions may land in different spaces altogether! Example 4.2. Suppose A ∈ Mat2×3 (R), and B ∈ Mat3×2 (R). Then both AB and BA are defined, but AB ∈ Mat2×2 (R), while BA ∈ Mat3×3 (R)! We may hope that things are nicer if we deal with square matrices only, so that products of matrices stay in the same space. Alas, even here, commutativity is in general lost, as the next example illustrates. Example 4.3. 
Consider the following matrices: ñ 1 2 0 1 ô ñ , 0 −1 1 0 ô . We compute the products in each order: ñ ñ 1 2 0 1 ôñ 0 −1 1 0 0 −1 1 0 ôñ 1 2 0 1 ô ñ = ô ñ = 2 −1 1 0 0 −1 1 2 ô ô . Thus, matrix multiplication isn’t generally commutative, even for 2 × 2 square matrices where all products are always defined. 19 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Another remark, which would require some work to prove, is that multiplication of real matrices is associative. In particular, if A, B and C are matrices for which the products A(BC) and (AB)C are defined, then in fact these are the same and thus without ambiguity we have A(BC) = ABC = (AB)C . There are several other important constructions in matrix algebra, which rely on the structure of the Euclidean spaces of vectors we’ve been working with. Note that we can define sums of images of vectors under a linear map. This allows us to also define sums of matrices. Definition 4.2. Given A, B ∈ Matm×n (R), we can define the sum A + B to be the matrix such that for any x ∈ Rn , (A + B)x = Ax + Bx. Using the indicial notation for entries, we have then that n n n X aij xj + j=1 X X bij xj = j=1 (aij + bij )xj , j=1 which implies that A + B is obtained by adding corresponding entries of A and B. Matrices can also be scaled, by simply scaling all the entries: sA = (saij ) for any s ∈ R. In particular, we may also subtract matrices, and each matrix has an additive inverse. There’s a unique zero matrix in any given matrix space Matm×n (R), consisting of all zero entries. Denote this zero matrix by 0m×n . We define a few more operations with matrices. If A ∈ Matm×n (R), then we can define a new matrix called it’s transpose, which lives in Matn×m (R): Definition 4.3. The matrix A = (aij ) has transpose Aτ = (aji ), in other words, the transpose matrix is the matrix obtained by exchanging the rows of A for columns. Example 4.4. ñ 1 2 3 4 5 6 ôτ 1 4 = 2 5 . 3 6 Finally, we discuss, for square matrices, the notion of a matrix inverse. The inverse matrix of a matrix A ∈ Matn×n (R) is one which, if it exists, undoes the action of the linear map x 7→ Ax. In particular, we seek a matrix A−1 such that A−1 A = In = AA−1 . Recall, that the map must be bijective for it to be fully invertible. Proposition 4.1. If an inverse matrix for a matrix A ∈ Matn×n (R) exists, then one can compute it by solving the system with augmented matrix î ó A In . This can be done if and only if the reduced row echelon form of A is the n × n identity, that is, RREF(A) = In . In this case, after applying Gauss-Jordan to this augmented matrix, one has the matrix ó î In A−1 . Proof. The condition AA−1 = In gives us n systems of n equations in n variables, corresponding to the systems Avj = ej for vj a column of A−1 , and ej the jth column of the identity matrix In . The row operations to put A into RREF do not depend on ej , so applying these operations to the matrix î ó î ó A e1 . . . en = A In simultaneously solves all n systems, provided that RREF(A) = In . If RREF(A) 6= In , then there are free variables, and the columns of our hypothetical inverse cannot be uniquely determined, and 20 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens in fact, at least one of the systems will consequently be inconsistent. This latter statement will be more carefully proved when we discuss linear independence. Assuming the reduction can be completed to solve for A−1 , then the final form of the augmented matrix is clearly î ó î ó In v1 . . . 
vn = In A−1 , which gives the desired matrix inverse. Example 4.5. Let’s compute ñ 1 2 1 3 ô−1 . The augmented matrix system we need is ñ 1 2 1 0 1 3 0 1 ô . Applying the row operations R2 − R1 7→ R2 followed by R1 − 2R2 7→ R2 , one obtains ñ 1 0 3 −2 0 1 −1 1 ô . We can check easily by multiplying, in either order, to obtain the identity matrix. Exercise 4.2. Find −1 1 2 3 4 5 6 7 8 9 , if it exists. Exercise 4.3. Show that A(B + C) = AB + AC whenever the products and sums are defined. Convince yourself that s(AB) = A(sB) for any scalar s ∈ R, provided the matrix product is defined. What can you say about (A + B)τ and (AB)τ ? 5 5.1 Vector Spaces Indulging a Motivation In the previous section, we saw that matrices have algebraic properties identical in some sense to the algebraic properties of vectors in a Euclidean vector space: we can add them and scale them, and we can form linear combinations of matrices if we so please, with all these operations being commutative and associative. Matrix multiplication, on the other hand, defines linear maps of Euclidean vectors. But since we can also multiply matrices by each other under the right (dimensional) conditions, we may want a way to regard matrices as determining linear maps on the spaces of matrices. More specifically, given M ∈ Matm×n (R) and A ∈ Matn×k (R), we can define a map TM : Matn×k (R) → Matm×k (R) , given by the rule A 7→ MA . By the exercise at the end of last section, we have that TM (sA + tB) = M(sA + tB) = s(MA) + t(MB) = sTM A + tTM B . 21 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Thus, we want to be able to regard this map as a linear map since it shares the properties which defined linear maps from Rn to Rm . One way to easily realize this is to actually identify the spaces Matn×k (R) with some Euclidean vector space. By concatenating the columns of matrices in some chosen order, we can create a bijective map from Matn×k (R) to Rn×k . Of course, there’s not a single natural way to do this; we could also concatenate rows, or scramble the entries up somehow, as long as we do it consistently for all matrices. Example 5.1. We can identify Mat2×2 (R) with R4 as follows: given a matrix ñ a b c d ô , we can map it to the 4-vector a c b d , obtained by concatenating the first and second columns but we can also map it to the 4-vector a b c d , obtained by concatenating rows. Neither choice is better than the other, so we say that our identification, whichever we choose, is non-canonical, since there’s not a particularly more natural choice. Exercise 5.1. Given A ∈ Matn×k (R), how many different ways can one identify A with a vector in Rnk which contains the same entries as A? How many ways can we bijectively map Matn×k (R) and Rnk ? 5.2 The Big Definition Another approach, which is quite fruitful, is to investigate spaces which have the appropriate general algebraic structure to support a notion of “linear map”. This brings us to the study of vector spaces. Definition 5.1. A vector space is a set V whose elements will be called vectors, together with additional structure depending on a pair of operations and a choice of a scalar field F (for now, mentally picture F = R ,the field real numbers, or F = Q, the field of rational numbers; other examples will be given later including complex numbers C and finite fields.) The operations are vector addition and scalar multiplication. 
Vector addition takes two vectors x, y ∈ V and produces a (possibly new) vector x + y ∈ V , while scalar multiplication takes a scalar s ∈ F and a vector x ∈ V and produces a (possibly new) vector sx ∈ V . These operations are required to satisfy 8 axioms: Axiom 1: Commutativity of vector addition: for any x, y ∈ V , x + y = y + x. Axiom 2: Associativity of vector addition: for any x, y, z ∈ V , x + (y + z) = x + y + z = (x + y) + z. 22 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Axiom 3: Identity for vector addition: there exists a vector 0 ∈ V such that for any x ∈ V , x + 0 = x. Claim. The zero vector 0 ∈ V is unique. ˜ ∈ V is a Proof. This follows from the preceding axioms: Assume we have found 0. Then if 0 ˜ ˜= vector such that x + 0 = x for any x ∈ V as well, then taking x = 0 one has 0 = 0 + 0 ˜ + 0 = 0, ˜ showing that our new candidate was in fact the same as the zero vector. 0 Axiom 4: Inverse for vector addition: for any x ∈ V , there is an inverse element (−x) such that x + (−x) = 0. Axiom 5: Scalar distributivity over vector addition: for any s ∈ F and any x, y ∈ V , s(x + y) = sx + sy. Axiom 6: vector distributivity over scalar addition: for any x ∈ V and any scalars r, s ∈ F, (r + s)x = rx + sx. Axiom 7: Associativity of scaling: for any x ∈ V and any scalars r, s ∈ F, s(rx) = (sr)x. Axiom 8: Scalar identity: for any x ∈ V , 1x = x, where 1 ∈ F is the multiplicative identity for the field. A set V with vector addition and scalar multiplication satisfying the above eight axioms for a field F is called a “vector space over F” of simply “an F-vector space”. Exercise 5.2. Let V be an F-vector space. Prove that for any given x ∈ V , the inverse (−x) is unique, and equals −1(x). Given the abstraction of the above definition, let us convince ourselves that it is a worthwhile definition by exhibiting a plethora of examples. The longer one studies math, the more one discovers many ubiquitous vector spaces, which vindicate the choices made in crafting such a long, abstract definition. After a while, one potentially becomes disappointed when one encounters something that’s almost a vector space (modules over commutative rings with zero divisors: I’m looking at you!), but rest assured, there are plenty of vector spaces out there to become acquainted with! The following examples are also “thought exercises” where you should convince yourself that the examples meet the conditions set forth in the above axioms. Example 5.2. The obvious example is Rn : every axiom seems to have been picked from observing the essential structure of Rn as a vector space over R. Example 5.3. It doesn’t take much work at this point to show that Matm×n (R) is an R-vector space for any positive integers m and n. Convince yourself that all eight axioms are met if we take matrix addition as the vector addition, and scaling a matrix as the scalar multiplication operation. Example 5.4. Let Pn (R) denote the space of all polynomials of degree less than or equal to n with real coefficients: Pn (R) = {a0 + a1 x + . . . an xn | a0 , . . . an ∈ R} . Then I claim this is naturally a vector space over R with the vector addition given by usual addition of polynomials, and the scalar multiplication given by scaling polynomials in the usual way. 23 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.5. The complex numbers C := {a + bi | a, b ∈ R, i2 = −1} are naturally a vector space over the real numbers, but since C is also a field, C can be regarded as a C-vector space. 
in general, any field F is itself a vector space over F, and Fn may be defined as it was for Rn . Fn inherits a natural vector space structure, much as R did, by allowing componentwise addition of vectors using the additive structure of F, and allowing the multiplicative structure of F to determine the scalar action componentwise. Example 5.6. Let p be a prime number. Then there exists a field Fp which has p elements. We can regard this field as the set of remainder classes modulo the prime p, and so we write Fp = {0, 1, . . . , p − 1} as a set. The additive structure is determined by taking the remainder of addition modulo p, and the multiplicative structure is determined likewise. For example, if p = 3, one has F3 = {0, 1, 2} as a set, and the operations are 0 + 0 = 0, 0 + 1 = 1, 0 + 2 = 2 1 + 1 = 2, 1 + 2 = 0 0(1) = 0(2) = 0(0) = 0, 1(1) = 1, 1(2) = 2, 2(2) = 1 . Given any Fp , we can construct Fnp which is certainly an Fp -vector space, but it will contain only pn elements. We can also construct the space Pn (Fp ) of polynomials of degree less than or equal to n with Fp coefficients. These spaces are interesting in their own right within the study of number theory. However, a simple example shows that these are not so abstract: let p = 2. F2 is called the binary field. Recall that any given integer m possess a binary expansion, which is an expression of the form m = a0 20 + a1 21 + a2 22 + . . . an 2n for some integer n, where a0 , . . . an ∈ F2 are equal either 0 or 1. This is just a polynomial in Pn (F2 ) evaluated with x = 2! Thus, there is a correspondence between binary expansions of integers and polynomials in the vector space Pn (F2 ). As an example, consider the integer 46. We know that 46 = 32 + 8 + 4 + 2 = 25 + 23 + 22 + 21 . The corresponding polynomial is then 0 + 1x + 1x2 + 1x3 + 0x4 + 1x5 ∈ P5 (F2 ), while the binary expansion is just the list of these coefficients (with highest degree first): 4610 = 1011102 . Example 5.7. Fix an interval I ∈ R, and let C 0 (I, R) denote the set of all continuous R-valued functions on I. Convince yourself that this is indeed a vector space. One can also give a vector space structure to continuously differentiable functions C 1 (I, R) defined over an open interval I ∈ R. 5.3 Linear Maps and Machinery We now can proceed to define and study linear maps between vector spaces. What we will see is that the phenomena in Rn aren’t particularly special to Rn , but rather a consequence of vector space structure. We will have the power to prove facts for all vector spaces and linear maps, which gives us the power to transfer ideas about how to solve problems in one space to other spaces. Our definition of linear map won’t appear any different, but we see that it is truly the two properties we’ve settled on which create much of the rigidity in the study of linear algebra. Definition 5.2. A map T : V → W of F-vector spaces is called an F-linear map or an F-linear transformation if (i.) for any u, v ∈ V , T(u + v) = Tu + Tv, (ii.) for any s ∈ F and any v ∈ V , T(sv) = sTv. 24 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens If the field is understood, one simply says ”linear map”, ”linear function”, or ”linear transformation.” The symbol T is often referred to as a linear operator on V . In analogy to how in elementary algebra, one studies roots of polynomials, i.e. points which a polynomial maps to 0, one may concern oneself with solutions v to the homogeneous linear equation Tv = 0W for a linear map T : V → W . 
We have a special name for the set of solutions to such an equation: Definition 5.3. The kernel of an F-linear map T : V → W of F-vector spaces is the preimage of the zero vector 0W ∈ W : ker T := T−1 {0W } = {v ∈ V | Tv = 0W } . Thus, the kernel of a linear map is the set of solutions to the homogeneous equation determined by that map: v ∈ ker T ⇐⇒ Tv = 0W . Proposition 5.1. A linear map T : V → W is an injection if and only if the kernel is trivial, i.e. ker T = {0V }. Proof. The proof is built in HW quiz 2, problem 1. Example 5.8. We’ve already encountered linear maps of R-vector spaces extensively, and in the case of a linear map given by matrix-vector multiplication, we can easily characterize injectivity. In particular, if A ∈ Matm×n (R) is a matrix determining a linear map TA : Rn → Rm , it’s injective if and only if the homogeneous equation Ax = 0 ∈ Rm is uniquely solved by the zero vector 0 ∈ Rn . This occurs if and only if A has n pivots. If there are fewer than n pivots, we have free variables, and can write the solution to the homogeneous equation as a linear combination of vectors which generate or span the kernel. We’ve seen this basic procedure performed when solving for the intersection of two planes, though in that case there was an additional vector with scalar weight 1, since we were solving an inhomogeneous equation of the form Ax = b for b ∈ R2 . So by the above discussion, we can detect injectivity of the map x 7→ Ax by examining the row reduction of A and counting the pivot entries. Note also that this implies that if n > m, there is no hope for injectivity, as there can be at most as many pivots as the minimum of n and m. We will often abuse notation and write ker A for the kernel of the linear map TA , and refer to this kernel as the null space of A. This language will be better justified when we study subspaces and the rank-nullity theorem in coming lectures. We also have a special name for bijective linear maps, owing to the fact that linear maps preserve vector space structure well: Definition 5.4. Given two vector spaces V and W over a field F, an F-linear map T : V → W is called a linear isomorphism or a vector space isomorphism if it is a bijection. In this case we say that V and W are isomorphic as F-vector spaces, and we write V ∼ =W. If it is clear we are dealing with two vector spaces over a common field, we may simply say that the map is an isomorphism and that the vector spaces are isomorphic. Exercise 5.3. Show that Pn (F) ∼ = Fn+1 by exhibiting a linear isomorphism. Exercise 5.4. Deduce that if A ∈ Matn×n (R) is an invertible matrix, it determines a selfisomorphism of Rn . We call such a self-isomorphism a linear automorphism. 25 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Exercise 5.5. Compute the kernel of the linear map TA with matrix 4 1 4 A= 1 1 1 . 4 1 4 Describe the general solution to Ax = b in terms of the components b1 , b2 , b3 of b and the elements of the kernel (in particular, you should be able to express the solution as a linear combination of some vectors; what is this geometrically?) 5.4 Subspaces An important concept in the study of vector spaces is that of a subspace. The idea is that linear equations carve out smaller vector spaces within larger ones, and vector spaces nest well in other vector spaces. Definition 5.5. Let V be an F-vector space, U ⊂ V a nonempty subset. We call U a vector subspace or linear subspace if and only if the following two conditions hold: (i.) for any u, v ∈ U , u + v ∈ U , (ii.) 
for any s ∈ F and any u ∈ U , su ∈ U . Exercise 5.6. Verify that a subset U of a vector space V over F is a vector subspace if and only if it is itself a vector space with the operations it inherits from V . Exercise 5.7. Convince yourself (and me, if you care) that U ⊂ V is a vector subspace if and only if it passes the following subspace test: For any u, v ∈ U and any s ∈ F, u + sv ∈ U . This is analogous to the statement that a map T : V → W is F-linear if and only if T(u + sv) = Tu + sTv for any u, v ∈ U and any s ∈ F. Example 5.9. Given any vector space V , V is a vector subspace of itself, called the improper subspace. A subspace U ⊂ V is called proper if and only if it is not all of V . Example 5.10. For any vector space V , {0} is a vector subspace of V , called the trivial subspace. This justifies the language “the kernel is trivial”, as the kernel is trivial if and only if it equals the trivial subspace. We often drop the braces and write 0 for the subspace as well as the element. Example 5.11. If T : V → W is a linear map, then the kernel ker T ⊂ V is a subspace and similarly the image T (V ) ⊂ W is a subspace. Let us prove the former, and leave the latter as an exercise. We have to check the two conditions of being a subspace, namely, whether it is closed under addition and scalar multiplication. By some trickery, one can claim that it suffices to check that for any u, v ∈ ker T, and any scalar s, u + sv ∈ ker T. (Why?) This is readily verified: T(u + sv) = Tu + sTv = 0 + s0 = 0 =⇒ u + sv ∈ ker T . Thus the kernel of the map T is a subspace of V . Exercise 5.8. Check that Pk (F) ⊂ Pn (F) is naturally a subspace so long as k ≤ n. Define the space of all polynomials over F P(F) := {p(x) ∈ Pn (F) | some n ∈ N} = ∪n Pn (F) . Then convince yourself that Pn (F) ⊂ P(F) is a subspace for any nonnegative integer n. 26 Math 235.9 - Lin. Alg. Course Notes 2015 Ä Andrew J. Havens ä Example 5.12. We can view the set C 1 (a, b), R of continuously differentiable functions on an Ä ä open interval (a, b) as sitting inside of continuous functions C 0 (a, b), R , indeed, as a vector subspace (prove this to yourself!) The derivative map provides a linear map Ä ä Ä ä d : C 1 (a, b), R → C 0 (a, b), R , dx and since the kernel of this map is nontrivial (itäconsists of all the constant functions, which as Ä a vector subspace is R sitting inside C 1 (a, b), R ), we know the map is not injective, and so in Ä ä Ä ä particular, it is not the map giving us the inclusion C 1 (a, b), R ,→ C 0 (a, b), R . On the other hand, by the fundamental theorem of calculus, the map is surjective, since we can always integrate Ä ä 0 a continuous function f ∈ C (a, b), R to obtain a continuously differentiable function Z x g(x) := a Ä ä f (t)d t , g(x) ∈ C 1 (a, b), R , d g(x) = f (x) . dx Thus, we’ve furnished an example of a proper vector subspace which possesses a surjective but not injective linear map onto its parent vector space. This is possible because the spaces are infinite dimensional – a notion we will make precise soon! We will also show that these oddities don’t occur in the finite dimensional cases. Before we can define dimension properly, we must carefully come to understand the role played by linear combinations in building subspaces, and in describing elements of vector spaces. Thus, we will define linear combinations and linear independence for a general vector space V over a field F. Definition 5.6. Let V be an F-vector space. Given a finite collection of vectors {v1 , . . . 
, vk } ⊂ V , and a collection of scalars (not necessarily distinct) a1 , . . . , ak ∈ F, the expression a 1 v1 + . . . + a k vk = k X a i vi i=1 is called an F-linear combination of the vectors v1 , . . . , vk with scalar weights a1 , . . . ak . It is called nontrivial if at least one ai 6= 0, otherwise it is called trivial. As alluded to, one major use of linear combinations is to construct new subspaces. Consider looking at the collection of all linear combinations made from a collection of vectors. We will call this their span: Definition 5.7. The linear span of a finite collection {v1 , . . . , vk } ⊂ V of vectors is the set of all linear combinations of those vectors: span {v1 , . . . , vk } := ( k X i=1 ) ai vi ai ∈ F, i = 1, . . . , k . If S ⊂ V is an infinite set of vectors, the span is defined to be the set of finite linear combinations made from finite collections of vectors in S. Proposition 5.2. Let V be an F-vector space. Given a finite collection of vectors S ⊂ V , the span span (S) is a vector subspace of V . Proof. A sketch was given in class. You are encouraged to go through a careful argument and determine which axioms of being a vector space are applied where. 27 Math 235.9 - Lin. Alg. Course Notes 5.5 2015 Andrew J. Havens Linear Independence and Bases Definition 5.8. A collection {v1 , . . . , vk } ⊂ V of vectors in an F-vector space V are called linearly independent if and only if the only linear combination of v1 , . . . , vk equal to 0 ∈ V is the trivial linear combination: {v1 , . . . , vk } linearly independent ⇐⇒ k ÄX ä ai vi = 0 =⇒ a1 = . . . = ak = 0 . i=1 Otherwise we say that {v1 , . . . , vk } is linearly dependent. Proposition 5.3. {v1 , . . . , vk } is linearly dependent if and only if there is some vi ∈ {v1 , . . . , vk } which can be expressed as a linear combination of the vectors vj for j 6= i. Proof. Suppose {v1 , . . . , vk } is linearly dependent . After possibly relabeling we can assume that P there’s a tuple (a1 , . . . , ak ) ∈ Fk such that a1 6= 0, and ki=1 ai vi = 0. Then rearranging, one has v1 = k X Å − i=2 ã ai vi , a1 and thus we have expressed one of the vectors as a linear combination of the others. Conversely, if there’s a vector vi ∈ {v1 , . . . , vk } such that it can be expressed as a linear P combination of the other vectors, then we have vi = i6=j aj vj for some constants aj ∈ F, and P rearranging one has vi − i6=j aj vj = 0, which is a nontrivial linear combination equal to the zero vector. This establishes that {v1 , . . . , vk } is linearly dependent. Example 5.13. Let V = Rn , and suppose {v1 , . . . , vk } ⊂ Rn is a collection of k ≤ n vectors. Then we have the following proposition: Proposition 5.4. The set of vectors {v1 , . . . , vk } is linearly independent if and only if the matrix A = [v1 . . . vk ] has k pivots. Proof. Consider the system Ax = 0. If 0 6= ker A := ker T, then there’s some nonzero x ∈ Rn such P that ni=1 xi vi = 0, which implies that {v1 , . . . , vk } is linearly dependent. Thus, {v1 , . . . , vk } is linearly independent if and only if ker A is trivial, which is true if and only if there k pivots. Definition 5.9. A vector space V over F is called finite dimensional if and only if there exists a finite collection S = {v1 , . . . , vk } ⊂ V such that the F-linear span of S is V . If no finite collection of vectors spans V , we say V is infinite dimensional. Proposition 5.5. 
Any finite dimensional F-vector space V contains a linearly independent set B ⊂ V such that span B = V , and moreover, any other such set B 0 ⊂ V such that span B 0 = V has the same number of elements as B. Proof. Let V be a finite dimensional F-vector space. Observe that because the V is finite dimensional, by definition there exists a subset S ⊂ V such that span S = V . If S is linearly independent then we merely have to show that no other linearly independent set has a different number of elements. On the other hand, if S is linearly dependent, then since S is finite, we can remove at most finitely many vectors in S without changing the span. The claim is that removing a vector which is a linear combination of the remaining vectors does not alter the span. This is obvious, since the span is the set of linear combinations of the vectors, so if we throw some vector w out of S, the set S \ {w} still contains w in its span, and hence any other linear combination which potentially involved w can be constructed using only S \ {w}. Thus, after throwing out finitely many vectors, we have a set B which is linearly independent, such that span B = span S = V . It now remains to show that the size of any linearly independent set B 0 which also spans V is the same as that of B. To do this we need the following lemma: 28 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Lemma 5.1. If S ⊂ V is a finite set and B ⊂ span S is a linearly independent set, then |B| ≤ |S|. Assuming the lemma, let’s finish the proof of the proposition. Suppose |B| = n and |B 0 | = m. From the lemma, since span B = V ⊃ B 0 and B 0 is linearly independent, we deduce that m ≤ n from the lemma. We similarly conclude that since span B 0 = V ⊃ B and B is linearly independent, n ≤ m. Thus m = n and we are done. We now prove the lemma: Proof. Let S = {v1 . . . vm } and suppose B ⊂ span S is a linearly independent set. Choose some finite subset E ⊂ B. Since B is linearly independent, so is E. Suppose E = {u1 , . . . uk }. Since E ⊂ span S, there’s a linear relation uk = a1 v1 + . . . am vm . Since uk 6= 0 by linear independence of E, we deduce that at least one aj 6= 0. We may assume it is a1 whence we can write v1 as a linear combination of {uk , v2 . . . vm }. Note that E is also in the span of this new set. We readily conclude that uk−1 is in the span of this new set, and repeating the argument above we can claim v2 ∈ span {uk , uk−1 , v3 . . . vm }. Note that E is also in the span of this new set. We can repeat this procedure until either we’ve used up E, in which case k ≤ m, or until we run out of elements of S. If we were to run out of elements of S, without running out of elements of E, then since E is in the span of each of the sets we are building, we’d be forced to conclude that there are elements of E which are linear combinations of other elements in E, which contradicts its linear independence. Thus, it must be the case that k ≤ m, as desired. Definition 5.10. Given a vector space V over F, we say that a linearly independent set B such that V = span F B is a basis of V . Thus, the above proposition amounts to stating that we can always provide a basis for a finite dimensional vector space, and moreover, any basis will have the same number of elements. Definition 5.11. Given a finite dimensional vector space V over F, the dimension of V is the size of any F-basis of V : dimF V := |B| . 
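As a quick computational illustration of Proposition 5.4 and the basis-extraction argument above (this sketch is not part of the original notes, and the vectors are invented; it assumes the Python library sympy is available), one can let a computer do the row reduction, read off the pivot columns, and keep only the corresponding vectors to obtain a basis of the span:

    from sympy import Matrix

    # Three made-up vectors in R^3; v3 = 2*v1 + 3*v2, so the set is linearly dependent.
    v1 = Matrix([1, 0, 2])
    v2 = Matrix([0, 1, 1])
    v3 = Matrix([2, 3, 7])
    A = Matrix.hstack(v1, v2, v3)

    rref_form, pivot_cols = A.rref()
    print(pivot_cols)                   # (0, 1): only two pivots, so the three vectors are dependent

    # Discarding the non-pivot columns leaves a linearly independent set with the
    # same span -- a basis of span{v1, v2, v3}, which is therefore 2-dimensional.
    basis = [A.col(j) for j in pivot_cols]
    print(len(basis))                   # 2

This is exactly the "throw out redundant vectors" procedure from the proof above, carried out by Gauss-Jordan elimination.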
A remark: the subscript F is necessary at times, since a given set V may have different vector space structures over different fields, and consequently different dimensions. Specifying the field removes ambiguity. We will see examples of this shortly. Example 5.14. The standard basis of Fn is the set BS := {e1 , . . . , en } consisting of the vectors which are columns of In . In particular, for any x ∈ Fn : x1 n X .. x = . = x1 e1 + . . . + xn en = xi ei . i=1 xn Clearly, the vectors of BS are linearly independent since they are columns of the identity matrix. Exercise 5.9. Show that if A ∈ Matn×n (R) is an invertible matrix, then the columns of A form a basis of Rn . Note that dimR Rn = n as expected, either by the previous example or this one. Example 5.15. A choice of basis for Pn (F) can be given by the set of monomials of degree less than n: {1, x, . . . , xn }. Clearly, any polynomial with coefficients in F is an F-linear combination of these, as indeed, that is how one defines polynomials! We merely need to check linear independence. This is clear since the only polynomial equal to the zero polynomial is the zero polynomial, and so any F-linear combination of the monomials equal to the zero polynomial necessarily has all zero coefficients, and thus is the trivial linear combination. Note that there are n + 1 monomials in the basis, so dimF Pn (F) = n + 1. 29 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.16. The complex numbers C, regarded as a real vector space, have a basis with two elements: {1, i}, and thus dimR C = 2. But as a vector space over the field C, a basis choice could be any nonzero complex number, and in particular, {1} is a basis of C as a vector space over C, so dimC C = 1. More generally, dimR Cn = 2n while dimC Cn = n. Note that for any field, dimF Fn = n, which is established for example by looking at the standard basis. Example 5.17. Let us examine an analogue of the standard basis in the case that our vector space is the space of real m × n matrices, Matm×n (R). Define a basis BS = {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n, i, j ∈ N} , such that eij is the matrix containing a single 1 in the (i, j)-th entry, and zeros in all other entries. It is easy to check that this is an R-basis of Matm×n (R), and thus that Matm×n (R) is an mndimensional real vector space. Exercise 5.10. Consider the set of all matrices Eij ∈ Matn×n (F) defined by Eij := In − eii − ejj + eij + eji . (a) Given an n × n matrix A, what is Eij A? (b) Describe the vector space span F {Eij | 1 ≤ i, j ≤ n, i, j ∈ N} ⊂ Matn×n (F), and give a basis for this vector space. (Hint: first figure out what happens for 2 × 2 matrices and 3 × 3 matrices, then generalize). The notion of basis is useful in describing linear maps, in addition to giving us a notion of “linear coordinates”. Let us examine the connection between bases and linear maps. The first result in this direction is the following theorem: Theorem 5.1. Let V be a finite vector space over F and B = {v1 , . . . , vn } a basis of V . Let W be a vector space and {w1 , . . . wn } ⊂ W a collection of not necessarily distinct vectors. Then there is a unique linear map T : V → W such that Tv1 = w1 , . . . Tvn = wn . Proof. For any v ∈ V we can write v as a linear combination of the basis vectors. Thus, let P v = ni=1 ai vi . Suppose T : V → W is a linear map which satisfies the conditions Tv1 = w1 , . . . Tvn = wn . Then the claim is that the value Tv is determined uniquely. 
Indeed, since T is linear, one has

Tv = T(a_1 v_1 + . . . + a_n v_n) = a_1 Tv_1 + . . . + a_n Tv_n = a_1 w_1 + . . . + a_n w_n .

Moreover, we may construct a unique T from the data Tv_i = w_i by the above formula, and define this to be the linear extension of the map on the basis.

This proposition tells us that if we determine the values to which basis vectors transform, then we can linearly extend to describe a linear map on all of V, and so the following corollary should come as no surprise (we've alluded to the fact before in comments and exercises):

Corollary 5.1. Let V be a finite dimensional vector space over a field F. Then V is non-canonically isomorphic to F^n, where n = dim_F V.

(The term "non-canonical" in mathematics refers to the fact that the construction depends on choices in such a way that there is no natural preference. In this case, there are many isomorphisms that may exist between V and F^n, and we have no reason to prefer a specific choice outside of specific applications.)

Proof. Since V is finite dimensional, we may find some basis B = {v_1, . . . , v_n}, where dim_F V = n. Then define L_B on B by specifying

L_B v_i = e_i , i = 1, . . . , n ,

where {e_1, . . . , e_n} = B_S ⊂ F^n is the standard basis. Then by the above proposition, we may linearly extend L_B to a linear map L_B : V → F^n. It is clearly an isomorphism: the assignment e_i ↦ v_i on B_S determines a unique linear map L_B^{-1} : F^n → V, which clearly satisfies L_B ◦ L_B^{-1} = Id_{F^n} and L_B^{-1} ◦ L_B = Id_V.

Example 5.18. Regarding C^n as a real vector space, we have an isomorphism C^n ≅ R^{2n}. Similarly, we have P_n(R) ≅ R^{n+1} and Mat_{m×n}(R) ≅ R^{mn}. This latter fact justifies the notation that many authors (including Bretscher) exploit of writing R^{m×n} instead of Mat_{m×n}(R).

Exercise 5.11. For each of the above examples, write down explicit isomorphisms (in particular, produce a basis and describe how to map it to a basis of an appropriate model vector space R^k).

Exercise 5.12. Explain why there can be no invertible linear map T : R^3 → R^2. (This will be clarified more deeply in the discussion of the Rank-Nullity theorem; try to prove this using a simple argument about bases!)

We will later explore the use of the maps L_B : V → R^n of real vector spaces to discuss linear coordinates and change of basis matrices. For now, we finish with another important example: using a basis to describe a linear map via a matrix. Since any n-dimensional vector space over R is isomorphic to R^n, it suffices to understand how to write matrices for maps T : R^n → R^m.

Theorem 5.2. Let T : R^n → R^m be a linear map. Then there is a matrix A ∈ Mat_{m×n}(R), called the matrix of T relative to the standard basis, or simply the standard matrix of T, such that Tx = Ax. The matrix has columns given by the effect of the map T on the standard basis:

A = [Te_1 . . . Te_n] ∈ Mat_{m×n}(R) .

Proof. The proof is a simple computation. Let x = x_1 e_1 + . . . + x_n e_n. Then

Tx = T(x_1 e_1 + . . . + x_n e_n) = x_1 Te_1 + . . . + x_n Te_n = [Te_1 . . . Te_n] (x_1, . . . , x_n)^τ = Ax .

Remark 5.1. If we think about the one line proof above, it should be clear that the image of the linear map T : R^n → R^m is nothing more than the span of the columns of the matrix A representing the map:

T(R^n) = span {Te_1, . . . , Te_n} =: Col A .

The last notation is new: for any matrix A ∈ Mat_{m×n}(R), Col A is the subspace of R^m spanned by the columns of A. This is called the column space of A, though we may also just refer to it as the image of the matrix map x ↦ Ax. We'll see the column space again shortly.
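Before turning to a concrete example, here is a small computational sketch of Theorem 5.2 (not part of the original notes; the particular map T below is invented for illustration, and sympy is assumed to be available). The standard matrix is assembled column by column from the images of the standard basis vectors:

    from sympy import Matrix, eye

    def T(x):
        # An invented linear map R^3 -> R^2: T(x1, x2, x3) = (x1 + 2*x2, 3*x3).
        return Matrix([x[0] + 2*x[1], 3*x[2]])

    n = 3
    E = eye(n)                                   # the columns of E are e_1, ..., e_n
    A = Matrix.hstack(*[T(E.col(j)) for j in range(n)])
    print(A)                                     # Matrix([[1, 2, 0], [0, 0, 3]])

    # Sanity check: A*x agrees with T(x) on an arbitrary vector.
    x = Matrix([5, -1, 2])
    print(A * x == T(x))                         # True

The design point is exactly the one made in the proof: a linear map is completely determined by its values on a basis, so recording those values as columns records the whole map.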
31 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.19. Let us demonstrate how to construct a matrix representing the linear map d : P2 (R) → P1 (R) . dx Since matrices describe maps between vector Euclidean vector spaces, we need to exploit the isomorphisms ϕ2 : P2 (R) → R3 , ϕ2 (a0 + a1 x + a2 x2 ) = a0 e1 + a1 e2 + a3 e3 ∈ R3 , ϕ1 : P1 (R) → R2 , ϕ1 (a0 + a1 x) = a0 e1 + a1 e2 ∈ R2 The matrix we desire will actually then be the standard matrix of the map ϕ1 ◦ d 3 2 ◦ ϕ−1 2 : R →R dx which completes the diagram: d/dx P2 (R) P1 (R) ϕ2 ϕ1 R3 Note that since d dx p(x) ϕ1 ◦ d dx ◦ ϕ−1 2 R2 = a1 + 2a2 x, one has that the bottom map in the diagram is defined by ãÄ ä d −1 ϕ1 ◦ ◦ ϕ2 p(x) = a1 e1 + 2a2 e2 , dx Å and by applying our theorem we find that the desired matrix representing the derivative is ñ A= 0 1 0 0 0 2 ô . Observe that the first column is a zero column, and this is entirely sensible since the derivative of a constant is 0. Exercise 5.13. Expand on the above example and describe matrices representing the derivative of polynomials in Pn (R), and do the same for the integral. (This is part of exercise 4 on HW quiz 3.) Exercise 5.14. Fix a real number a ∈ R and a positive R ∈ R and denote by I the open interval (a − R, a + R). Denote by C ω (I, R) the space of power series centered at a and convergent on I. (a) Show that C ω (I, R) is a vector space over R with vector addition and scalar multiplication defined in the natural ways. (b) Is this vector space finite dimensional? (c) Describe a basis of C ω (I, R). (d) Give an example of a linear transformation T : C ω (I, R) → C ω (I, R) that is surjective but not injective. Can you find an example of a linear transformation of C ω (I, R) which is injective, has image of the same dimension as C ω (I, R), but is not surjective? 32 Math 235.9 - Lin. Alg. Course Notes 6 2015 Andrew J. Havens Rank and Nullity, and the General Solution to Ax = b This section introduces us to the notions of rank and nullity, and will also give us the relation between them. The theorem relating them, called the rank-nullity theorem, is also sometimes affectionately referred to as the fundamental theorem of linear algebra. This is because it gives us a rigid relationship between the dimensions of the domain of a linear map, the dimension of its image, and the dimension of its kernel, effectively telling us that linear maps can at worst collapse a subspace (the kernel, if it is nontrivial), leaving the image as a possibly lower dimensional shadow of the source vector space, sitting inside the target vector space. We will then discuss the general solution of linear systems. 6.1 Images, Kernels and their Dimensions Let us introduce the main definitions and their elementary properties. Throughout, let V be a finite dimensional vector space of a field F, and let T : V → W be a linear map. Definition 6.1. The rank of the linear map T : V → W is the dimension of the image: rank T := dimF T (V ) . It is sometimes abbreviated as rk T. Remark 6.1. Note that rank T ≤ dimF V and rank T ≤ dim W . Exercise 6.1. Explain the above remark about bounds on the rank of a linear map. Definition 6.2. The nullity of the linear map T : V → W is the dimension of the kernel: null T := dimF ker T . Remark 6.2. Observe that null T ≤ dimF V , but it need not be bounded by the dimension of W . Exercise 6.2. Explain the above remark about the bound on the nullity of a linear map. 
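To connect the rank and nullity just defined with Example 5.19, here is a small sympy sketch (an illustration added here, not part of the course material) that rebuilds the matrix of d/dx : P_2(R) → P_1(R) in the monomial coordinates used above and reads off its rank and nullity:

    from sympy import Matrix

    def D(coeffs):
        # coeffs = (a0, a1, a2) represents a0 + a1*x + a2*x^2 in P_2(R);
        # its derivative a1 + 2*a2*x has coordinates (a1, 2*a2) in P_1(R).
        a0, a1, a2 = coeffs
        return Matrix([a1, 2*a2])

    # Columns are the images of the coordinate vectors of the monomials 1, x, x^2.
    A = Matrix.hstack(D((1, 0, 0)), D((0, 1, 0)), D((0, 0, 1)))
    print(A)                      # Matrix([[0, 1, 0], [0, 0, 2]]), matching Example 5.19

    print(A.rank())               # 2: the image is all of P_1(R)
    print(len(A.nullspace()))     # 1: the kernel consists of the constant polynomials

Note that rank (2) is bounded by both dimensions, while the nullity (1) is only bounded by the dimension of the domain, as the remarks above indicate.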
Let us consider how nullity and rank are computed when a linear map is given by matrix multiplication. Consider, for example, a linear map T : R^n → R^m given by the rule Tx = Ax for A ∈ Mat_{m×n}(R). Recall that the image of the map T is the same as the set of all vectors which can be written as linear combinations of the columns of A (this is why some books call it the column space of A). Thus, the rank is the number of linearly independent columns, as a collection of linearly independent columns of A is a basis for the image. But we know that a set of k vectors is linearly independent if and only if the matrix whose columns are the k vectors has k pivots, and so we deduce that the rank of the map T is precisely the number of pivots of A. The nullity is the dimension of the kernel, and each free variable of A contributes a vector which is in a basis of the kernel (think about using Gauss-Jordan to solve Ax = 0). It is thus clear that the nullity of the map T can be computed by counting free variables, or equivalently by subtracting the number of pivots from the total number of columns of A. We then have the obvious relationship: rank plus nullity gives the number of columns, which is just the dimension of the domain R^n. This is the rank-nullity theorem, as stated for matrices. We will show it generally:

Theorem 6.1 (Rank-Nullity). Let V be a finite dimensional F-vector space, and let T : V → W be a linear map. Then

dim_F V = dim_F T(V) + dim_F ker T = rank T + null T .

Proof. Since V is finite dimensional, there exists a basis B of V. Moreover, since ker T ⊂ V is a subspace, it is itself a finite dimensional vector space, and it thus possesses a basis. Let B = {u_1, . . . , u_k, v_1, . . . , v_r} be a basis of V such that ker T = span {u_1, . . . , u_k}. We claim several things: that we can indeed procure a basis of V satisfying this property, and that {Tv_1, . . . , Tv_r} is a basis of the image.

For the first claim, note that we can start with any basis B̃ of V and some basis {u_1, . . . , u_k} of K = ker T ⊂ V, where k = dim_F K. Assume that dim_F V = n. Then to produce a basis of the form above, we start by replacing a vector of B̃ by u_1. If the resulting set is linearly dependent, then we choose a different vector in B̃ to be replaced by u_1. I claim there is a choice such that the modified set is still a basis. For if not, then u_1 is in the span of any n − 1 vectors in B̃. But then we have a pair of distinct linear relations involving u_1, and by subtracting these we obtain a nontrivial linear relation involving the elements of B̃, contradicting the linear independence of the vectors in B̃. Thus, we may choose to replace a vector of the basis with u_1 to form a different basis. The remaining n − 1 vectors of this new basis (those other than u_1) span an (n − 1)-dimensional subspace complementary to span {u_1}, and we can iterate the process of replacement by elements of the basis of K, until we've exhausted ourselves of the u_i, i = 1, . . . , k. The final set B is a basis of the form given above, where n = k + r, and we know that k = null T is the nullity.

For the second claim, observe that the image of T satisfies

T(V) = span {Tu_1, . . . , Tu_k, Tv_1, . . . , Tv_r} = span {0, . . . , 0, Tv_1, . . . , Tv_r} = span {Tv_1, . . . , Tv_r} .

Thus the set {Tv_1, . . . , Tv_r} spans the image. We need to show that this set is linearly independent. We prove this by contradiction. Suppose that there is a nontrivial relation a_1 Tv_1 + . . . + a_r Tv_r = 0. Then

T(a_1 v_1 + . . . + a_r v_r) = 0 =⇒ a_1 v_1 + . . . + a_r v_r ∈ ker T .

Since {u_1, . . . , u_k} is a basis of ker T, we can then express this linear combination of the v_i's as a linear combination of the u_j's:

a_1 v_1 + . . . + a_r v_r = b_1 u_1 + . . . + b_k u_k .

We thus obtain a relation

a_1 v_1 + . . . + a_r v_r − b_1 u_1 − . . . − b_k u_k = 0 ,

and since at least one of the a_i's is nonzero, this relation is nontrivial. This contradicts the linear independence of the elements of B. Thus, the assumption that there exists a nontrivial linear relation on the set {Tv_1, . . . , Tv_r} is untenable. We conclude that {Tv_1, . . . , Tv_r} is a basis of the image, so the rank is then r. It is therefore clear that

dim_F V = n = r + k = dim_F T(V) + dim_F ker T = rank T + null T .

Let's examine the consequences of this theorem briefly. First, note that if a map T : V → W is an injection from a finite dimensional vector space V, then the kernel has dimension 0, and by rank-nullity we have that the dimension of the image is the same as the dimension of the domain. In particular, if a linear map is injective, its image is an "isomorphic copy" of the domain, and one may refer to such maps as linear embeddings, since we can imagine that we are identifying the domain with its image as a subspace of the target space.

If we have a surjective map T : V → W from a finite dimensional vector space V, then the image has the same dimension as W. We see that the dimensions then satisfy

dim_F ker T = dim_F V − dim_F W ,

whence we see that the nullity is the difference of the dimensions of the domain and codomain for a surjective map. We can interpret this as follows: to cover the space W linearly by V, we have to squish extra dimensions, nullifying a subspace (the kernel) whose dimension is complementary to that of W.

Finally, of course, in a linear isomorphism T : V → W we have injectivity and surjectivity, and so in particular we have null T = 0 and dim_F V = dim_F W = rank T.

6.2 Column Space, Null Space, Row Space

This section introduces some language which is seen in many linear algebra textbooks for talking about the various subspaces associated to a linear map defined by matrix multiplication. We will presume a linear map T : R^n → R^m throughout, given by Tx = Ax for some matrix A ∈ Mat_{m×n}(R).

Definition 6.3. The column space of the matrix A is the span of the columns of A.

Observe that the column space is thus a subspace of R^m; indeed, it is just another name for the image of the map T, i.e. Col A = T(R^n) ⊆ R^m.

Definition 6.4. The row space of a matrix A is the span of the rows of A, and is denoted Row A. Technically, this is a subspace of Mat_{1×n}(R), but often one identifies the row space with a corresponding subspace of R^n (via the isomorphism ·^τ : Mat_{1×n}(R) → R^n sending a row vector to the corresponding column vector).

Definition 6.5. The null space (or right null space, as it is sometimes called) of the matrix A is the space of vectors x such that Ax = 0. Note this is just another term for the kernel of the map T. There is a notion of a "left null space" of A, which is the kernel of the map whose matrix is A^τ. The right nullity is just the nullity (i.e. the dimension of the kernel of T), and the left nullity is the dimension of the left null space. I will tend to use the term kernel instead of null space, except when dealing with both left and right null spaces of a given matrix.
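As a computational aside (the matrix below is invented for illustration, and sympy is assumed to be available), a computer algebra system can produce bases for each of these subspaces directly, and the dimensions behave exactly as the rank-nullity theorem predicts:

    from sympy import Matrix

    # An invented 3x4 example; its third row is the sum of the first two.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])

    col_basis  = A.columnspace()     # basis of Col A (the image), a subspace of R^3
    row_basis  = A.rowspace()        # basis of Row A, identified with a subspace of R^4
    null_basis = A.nullspace()       # basis of ker A (the right null space)
    left_null  = A.T.nullspace()     # basis of the left null space, ker of A transpose

    print(len(col_basis), len(row_basis), len(null_basis), len(left_null))   # 2 2 2 1

    # Rank-nullity for the map x |-> Ax from R^4 to R^3:
    print(A.rank() + len(null_basis) == A.cols)                              # True

Notice in passing that the column space and row space have the same dimension (the rank), while the right and left null spaces generally do not.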
One can naturally identify rows with linear functions from Rn to R, and so there is a more formal viewpoint on the row space: it is a subspace of the dual vector space to Rn . We develop this idea with a few exercises. We first define duals in general: Definition 6.6. Let V be a vector space over F. Then V ∗ = {f : V → R | f is F-linear} has a natural vector space structure induced by scaling and addition of functions, and when endowed with this structure is called the dual vector space to V , or the “space of linear functionals on V ”. Exercise 6.3. Show that for any finite dimensional F-vector space V , V ∗ ∼ = V (non-canonically). Exercise 6.4. What geometric objects give a model of the dual vector space to R3 ? By the preceding exercise, we see that the space of linear functionals on Rn is isomorphic to Rn . By fixing the standard basis as our basis, we can realize linear functionals as row vectors, and their action by the matrix product. Thus, we see that the row space of a matrix is a subspace of (Rn )∗ , and we can pass through the aforementioned transposition isomorphism to Rn . Exercise 6.5. What is the relationship between the row space of A and the column space of Aτ ? What does rank nullity tell us about the relationships of the dimension of the row space, the dimension of the column space, and the right and left nullities? 35 Math 235.9 - Lin. Alg. Course Notes 6.3 2015 Andrew J. Havens The General Solution At Last We now will discuss the general solution to a linear system. We’ve already seen how to algorithmically solve a matrix equation of an inhomogeneous linear system Ax = b, where A ∈ Matm×n (R), x ∈ Rn and constant b ∈ Rm , using Gauss-Jordan. We wish to more deeply interpret these results in light of our knowledge of the various subspaces associated to a linear map (or to a matrix), and the rank-nullity theorem. Throughout, assume A ∈ Matm×n (R), and b ∈ Rm fixed. We begin with a few observations. Observation 6.1. Let K = ker(x 7→ Ax). Note this is precisely the space of solutions to the homogeneous linear system Ax = 0. Suppose x0 ∈ K, and that xp solves the inhomogeneous system Ax = b. Then note that xp + x0 is also a solution of the inhomogeneous system: A(xp + x0 ) = Axp + Ax0 = b + 0 = b . ˜ p both solve the inhomogeneous system, then they differ by an Observation 6.2. If xp and x element of K: ˜ p ) = Axp − A˜ A(xp − x xp = b − b = 0 , ˜p ∈ K . =⇒ xp − x These two observations together imply the following: given any particular solution xp to the inhomogeneous linear system Ax = b, we can obtain any other solution by adding elements of the kernel of the map x 7→ Ax. In particular, we can describe the general solution to Ax = b as being of the form x = xp + x0 , for x0 ∈ K. î ó When we reduce the augmented matrix A b and write the solution as a sum of a constant vector with coefficient 1 and a linear combination of vectors with coefficients coming from the free variables, we are in fact describing a general solution of the above form. The constant vector is an example of a particular solution, while the remaining vectors which are scaled by free variables give a basis of the null space. We thus know how to solve a general linear system and produce a basis for the null space. How do we find a basis of the column space? The procedure is remarkably simple once we’ve reduced the matrix A: simply look for the pivot columns, and then take the corresponding columns of the original matrix A, and this collection gives a basis of the image of the map x 7→ Ax. 
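The following sympy sketch (the same invented matrix as in the earlier aside, with an invented consistent right hand side) makes the decomposition x = x_p + x_0 concrete: Gauss-Jordan produces a particular solution together with the kernel directions, and shifting the particular solution by any kernel element again solves the system.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])
    b = Matrix([3, 2, 5])            # chosen so that b lies in Col A (row3 = row1 + row2)

    sol, params = A.gauss_jordan_solve(b)
    x_p = sol.subs({t: 0 for t in params})    # a particular solution: set all free variables to 0
    print(A * x_p == b)                        # True

    # Every solution is x_p plus an element of the kernel:
    for v in A.nullspace():
        print(A * (x_p + 7*v) == b)            # True for any multiple of any kernel basis vector

Here the constant vector produced by row reduction plays the role of x_p, and the vectors attached to the free variables are precisely a basis of the null space, as described above.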
6.4 Excercises Recommended exercises from Bretscher’s text: • Any (really many) of the problems at the end of section of 3.1. Especially 9-12,19, 20, 22-31, 35, 36, 42, 43, 48-50. • Problems 28, 29, 34-44 at the end of section 3.2 • Problems 33-39 at the end of section 3.3 • Problems 1-10 and 16-39 at the end of section 4.1 36 Math 235.9 - Lin. Alg. Course Notes 7 2015 Andrew J. Havens A Tour of Linear Geometry in R2 and R3 This section was covered in class primarily on the dates 3/6, 3/9, and 3/11. Please read Bretscher, chapter 2, section 2. I covered more than the contents of Bretscher, providing a number of pictures, proofs and examples. The notes will be updated to more completely reflect what was stated in class at some point, but in the interim, please find a classmate’s notes if you were unable to attend, or attempt to prove the given formulae by constructing your own compelling geometric arguments. The outline of what as covered in class and the statements of the main formulae may be found below, with propositions, theorems, and definitions generalized to Rn where applicable. 7.1 Geometry of linear transformations of the plane Before exploring linear transformations of the plane, we need to understand the Euclidean structure of R2 . As it happens, this structure comes from the dot product, and indeed the dot product gives a Euclidean structure to any Euclidean vector space Rn . Proposition 7.1 (Bilinearity of the dot product). Given a fixed vector u ∈ Rn , x 7→ u · x gives a linear map from Rn to R. Since the dot product is commutative, we have in particular that the map · : Rn × Rn → R is bilinear (linear in each factor). Theorem 7.1 (Geometric interpretation of the dot product). Let u and v be vectors in Rn . Then u · v = kukkvk cos θ , where θ ∈ [0, π] is the (lesser) angle between the vectors u and v as measured in the plane they span. Remark 7.1. It suffices to prove the above in R2 , since the angle is always measured in the two dimensional subspace span {u, v} ∼ = R2 . We used elementary trigonometry to deduce this. Proposition 7.2 (Euclidean orthogonality from the dot product). Two vectors u, v ∈ Rn are orthogonal if and only if u · v = 0. Definition 7.1. Given u, v ∈ Rn , the orthogonal projection of v onto u is the vector proju v := u·v u. kuk2 ˆ ∈ S1 := {x | kxk = 1}, then the formula simplifies Remark 7.2. If instead we take a unit vector u to projuˆ v = (ˆ u · v)ˆ u. Exercise 7.1. Prove the above remark using the formula in the definition of orthogonal projection. Then give a matrix for the operator proju for u ∈ R2 , and show that this is the same as the matrix ˆ := u/kuk is the normalization of u. Find also the corresponding matrices if for projuˆ where u 3 u∈R . We may use the above construction to understand reflections through 1-dimensional subspaces of R2 (namely, reflections across lines through the origin). The remaining theorems exercises of this subsection concern linear automorphisms of R2 , i.e. bijective linear maps of R2 to itself. In particular, rotations and reflections are explored through the following exercises. Exercise 7.2. Prove the following theorems for the rotation and reflection formulae in the plane (this was done in class!): 37 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Theorem 7.2. Given an angle θ ∈ [0, 2π), the operator for counter-clockwise rotation of R2 by the angle θ has standard matrix ñ ô cos(θ) − sin(θ) Rθ = . 
sin(θ) cos(θ) Using the isomorphism C ∼ = R2 given by mapping the basis (1, i) to (e1 , e2 ), the operator Rθ corresponds to the 1D C-linear operation Rθ (z) = eiθ z . Theorem 7.3. Let L ⊂ R2 be a line through 0, and suppose u is a vector spanning L. Then the operator giving reflection through L is ML = (2proju − I2 ) : R2 → R2 , and it is well defined independently of the choice of u spanning L. If θ ∈ [0, π) is the angle made by L with the x-axis, then the matrix of ML in the standard basis of R2 is ñ cos(2θ) sin(2θ) sin(2θ) − cos(2θ) ô . We can thus determine a reflection by the angle θ ∈ [0, π) made by the line L with the x-axis, and may also write Mθ to indicate the dependence on this parameter. Moreover, if ñ ô a b A= b −a for a, b ∈ R such that a2 + b2 = 1, then A represents a reflection through the line L = span (u) where u is any vector lying on the line bisecting the angle between the first column vector of A and e1 . Using the isomorphism C ∼ = R2 given by mapping the basis (1, i) to (e1 , e2 ), the operator Mθ corresponds to the operation Mθ (z) = e2iθ z¯ , where z¯ = <z − i=z is the complex conjugate of z. (Note this operation is not, strictly speaking, complex linear, since complex conjugation is not C-linear.) Exercise 7.3. Given an arbitrary nonzero-complex number a ∈ C∗ = C − {0}, what is the effect of the map z 7→ az? Give a matrix representation when this is viewed as a map of R2 . One then has the following conclusion about the relation between complex and real representations of rigid linear motions in the plane: “rigid linear motions of R2 are captured by C-linear motions of C together with conjugation; that is, C-linear motions of C are more restricted (they preserve orientation), but including the complex conjugation operation recovers R-linear motions of C as an R-vector space.” Example 7.1. Let L be the line in R2 through the origin making angle 3π/4 with the x-axis, and let M be the line in R2 through the origin making angle π/6 with the x axis. Find the standard matrix for the composition T = MM ◦ ML of reflections through the lines L and M . What is the geometric interpretation of this composition? Write a formula for it using complex numbers. Solution: By the above theorems, if a line L is spanned by a unit vector u = cos θe1 + sin θe2 , then we can compute the reflection through L as ML (x) = 2(u · x)u − x = (2proju − I)x , 38 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens and the matrix (2proju − I) is given as ñ (2proju − I) = 2 cos2 θ − 1 2 sin θ cos θ 2 sin θ cos θ 2 sin2 θ − 1 ô ñ = cos(2θ) sin(2θ) sin(2θ) − cos(2θ) ô . Thus, first we determine the unit vectors associated to each line: ´ ®Ç √ ® å´ −√ 2/2 cos(3π/4) = span L = span sin(3π/4) 2/2 ® ´ ®Ç √ å´ cos(π/6) 3/2 M = span = span sin(π/6) 1/2 Let A be the matrix such that ML (x) = Ax and let B be the matrix such that MM (x) = Bx. We then have ñ ô 0 −1 A= , −1 0 √ ñ ô 1/2 3/2 √ B= . 3/2 −1/2 The composition of the maps T = MM ◦ ML has matrix equal to the matrix product ñ √ ô − 3/2 −1/2 √ BA = . 1/2 − 3/2 √ Note that this matrix is the matrix of a rotation! Since sin θ = 1/2 and cos θ = − 3/2, we conclude that the angle of the associated counterclockwise rotation is θ = 5π/6, and we conclude MM ◦ ML = R5π/6 . As a complex linear map T can be realized by z 7→ e5πi/6 z. Exercise 7.4. Give matrix-vector formulae for rotation about an arbitrary point of R2 and reflection through an arbitrary line (not necessary containing 0). Exercise 7.5. 
Characterize all bijective linear maps of R2 which do not decompose as a composition involving rotations or reflections. Exercise 7.6. (Hard!) Describe an algorithm which, for a given matrix A describing a bijective linear map x 7→ Ax of R2 , produces a decomposition in terms of reflections, rotations, and the maps described in the previous exercise. Can one decompose any linear automorphism of R2 using just reflections and the maps from the previous exercise (i.e., can we exclude rotations in our decompositions)? 7.2 Geometry of linear transformations of three-dimensional space Below is a summary of the contents of the two lectures given on the geometry of linear transformations of R3 . If you missed those lectures, then it is advised you copy notes and discuss the material with a classmate or myself during office hours. The essential points, such as computing 3 × 3 determinants, are reviewed in future sections. • Projections - the formula for projection onto a line appears the same. Can you find a formula for projection onto a plane? 39 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens • Planes and normals - This is largely overlap material with math 233; I chose to present it from a linear algebra perspective in class as a point of unification (e.g. deriving the equation of a plane, which we’ve used for a while without justification) • Reflections in planes - the visual argument for this is analogous to the argument used to derive reflections across a line in R2 . • Cross products and determinants/Triple Scalar Products - The 3 × 3 determinant was introduced and used as a mnemonic for the computation of the 3D cross product. Note that there is no cross product in dimensions other than 3 and 7 (though there’s a pseudo-cross product in R2 which returns the signed area of the parallelogram spanned by the pair of vectors being multiplied). It was observed that the 3×3 determinant is in fact the signed volume of a parallelepiped spanned by the (column or) row vectors. This construction is equivalent to dotting the vector corresponding to the first row with the cross product of the vectors corresponding to the second and third rows. • Spatial Rotations - Using the cross product and projections, we obtained a beautiful formula for rotation of R3 about an axis by an angle θ. 8 Coordinates, Basis changes, and Matrix Similarity Please read sections 3.4 and 4.3, and 4.4 in Bretscher for the presentation and examples of the following topics. 8.1 Linear Coordinates in Rn 8.2 Coordinates on a finite dimensional vector space 8.3 Change of Basis 9 Determinants and Invertibility Please read Bretscher, chapter 6; this section of the notes will include definitions and proofs auxiliary to those provided by the text. 9.1 Review of Determinants in 2 and 3 Dimensions Recall that we defined the determinant of a 2 × 2 matrix A as follows: det A := a11 a22 − a21 a12 , where A = (aij ) ∈ Mat2×2 (F) . Note that this definition can be applied for matrices over any field (or more generally, even over a ring, such as the integers). Note also that det A = det Aτ . For 2 × 2 matrices over a field, we know that invertibility of the matrix is equivalent to nonvanishing of its determinant. A natural question is whether we can generalize this to square matrices of any size. Recall, the geometric interpretation of the 2 × 2 determinant for matrices with real entries: 40 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 9.1. 
(HW 1 Bonus 1) Show that ad − bc is the signed area of the parallelogram spanned by u and v, where the sign is positive if rotating u counter-clockwise to be colinear to v sweeps into the parallelogram, and is negative otherwise. Solution: First, let us suppose u and v are unit vectors, i.e. a2 + c2 = 1 = b2 + d2 . Geometrically, they are vectors lying on the unit circle, and so we can express their components as trigonometric functions of the angles they make with the x axis. Let u make an angle of α with the x axis and v make an angle of β with the x axis. Then the angle between the vectors is β − α, and from the sine subtraction formula: sin(β − α) = cos(α) sin(β) − cos(β) sin(α) = ad − bc. Recall that the area of a parallelogram is the base times an altitude, formed by taking an orthogonal line segment from one side to an opposite side. From a picture, one sees that the area of a parallelogram can be expressed as the product of side lengths times the sine of the internal angle between adjacent sides. If the sides are the unit vectors u and v, then the area is | sin(β − α)|. Thus, for unit vectors, ad − bc is ±area, with the sign positive if the angle β − α ∈ (0, π), negative if β −α ∈ (π, 2π), and 0 if the angle β −α = 0 or π (the colinear case). Thus, for the non-colinear case, if u sweeps into the parallelogram when rotated counterclockwise towards v, the sign is positive. Note that switching the order of the vectors switches the sign of the determinant ad − bc, and this is consistently reflected in the convention regarding the vectors’ orientations. For general vectors, one scales the area of the parallelogram as well as the components, and discovers that the scale factors for the area and the equation ad − bc are identical: e.g. if we scale u by λ, then the area scales by λ, and so do the components: Ç λu = λa λc å , so the determinant scales to (λa)d − b(λc) = λ(ad − bc). Thus, the determinant is the signed area, accounting for the orientation/ordering of the two vectors. We also defined determinants for 3 × 3 matrices, and discovered that our generalization has an analogous geometric interpretation as a signed volume in R3 of the parallelepiped whose sides are determined by the column vectors (or row vectors) of the matrix: a 11 a21 a31 a12 a13 a22 a23 a32 a33 = a11 (a22 a33 − a32 a23 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a31 a23 ) . See Brestcher, section 6.1, for a discussion of Sarrus’s rule, and why it fails to generalize to give determinants for n > 3. 9.2 Defining a General Determinant For the definition provided in class, please read Bretscher, section 6.1. Here, I rephrase his definition (which uses “patterns” and “inversions”) in the modern, standard language. We need to define a very important object, called a permutation group, in order to give the modern definition of the determinant. This definition is very formal, and is not necessary for the kinds of computations we will be doing (see instead the discussions of computing determinants by row reduction, or via expansion by minors.) It is recommended you read Bretscher’s treatment or the in class notes regarding patterns and signatures first, before approaching this section. The end of the section describes how to define determinants in the general setting of finite vector spaces over a field, where instead of matrices we consider maps of the vector space to itself, called endomorphisms. 41 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 9.1. Consider a set of n symbols, e.g. 
the standard collection of integers 1 through n: {1, . . . , n}. Define the permutation group on n symbols to be the set of all bijective maps of {1, . . . , n} to itself, with group operation given by composition. See HW 4 for the definition of a group and an exercise realizing a representation of this group. Denote this permutation group by Sn . A common notation for the above group’s elements is cycle notation. For example, let us consider S3 , the permutation group of the symbols {1, 2, 3}. Consider the map which sends 1 to 2, 2 to 3 and 3 to 1. We notate this element as (1 2 3). We interpret the symbol as telling us where to send each element as follows: if an integer m appears directly to the right of k, then k is mapped to n, and the last integer on the right in the cycle is mapped to the first listed on the left. The cycle (1 2 3) clearly gives a bijection, so we can regard (1 2 3) ∈ S3 . This is called a cyclic permutation, as it consists of a single cycle. Another special type of permutation is a cyclic permutation with just two elements, which is called a transposition. An example would be the map which sends 1 to itself, but swaps 2 and 3. This is notated (2 3) ∈ S3 . The lack of the appearance of 1 tells us that 1 is mapped to itself (sometime, this transposition would be denoted (1)(2 3) to emphasize this.) The convention I will follow is that if an integer is missing from a cycle, then it is sent to itself by that cycle. To see the effect of the map determined by a cycle, we’ll denote it’s action sometimes by writing how it permutes the ordered tuple (1, . . . , n), e.g. if σ = (1 3) ∈ S3 , then σ (1, 2, 3) 7−→ (3, 2, 1) . One can cyclically reorder any cycle and it will represent the same map, e.g. (1 2 3) = (2 3 1) = (3 1 2). By convention one usually starts the cycle with the lowest integer on which the cycle acts nontrivially. The empty cycle () represents the identity map on the set of symbols. One can “multiply” cycles to compute a composition of permutations as follows: 1. Two adjacent cycles represent applying one cycle after another, from right to left. For example, in permutations of 6 symbols, S6 , the cycles σ = (1 2 3) and σ 0 = (3 5 4 6) can be composed in two ways: σσ 0 σσ 0 = (1 2 3)(3 5 4 6), which acts as (1, 2, 3, 4, 5, 6) 7−→ (2, 3, 5, 6, 4, 1) , σ0 σ σ 0 σ = (3 5 4 6)(1 2 3), which acts as (1, 2, 3, 4, 5, 6) 7−→ (2, 5, 1, 6, 4, 3) . 2. Any cycle product can be rewritten as a product of disjoint cycles. Disjoint cycles commute with each other, e.g. (1 2)(3 4) = (3 4)(1 2) ∈ S4 represents the map (1, 2, 3, 4) 7→ (2, 1, 4, 3) . If cycles are not disjoint, to write them as disjoint cycles, one reads where the rightmost cycle sends a given symbol, then scans left to find its image in the cycles to the left, then follows this image to the left, etc. E.g. using the examples from (1): σσ 0 = (1 2 3)(3 5 4 6) = (3 5 4 6 1 2) = (1 2 3 5 4 6) . σ 0 σ = (3 5 4 6)(1 2 3) = (1 2 5 4 6 3) . In these cases the result is a single cycle (which is therefore a product of disjoint ones). A more interesting example is the product (1 3 5)(5 6)(1 4 2 6) = (1 4 2)(3 5 6) . 42 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens 3. Any cycle can be decomposed as a product of (not necessarily disjoint) transpositions. E.g. (1 2 3) = (1 2)(2 3) σσ 0 = (1 2)(2 3)(3 5)(5 4)(4 6) A permutation is called even if it can be decomposed into an even number of transpositions, otherwise it is said to be odd. Exercise 9.1. 
Argue that the notions of evenness and oddness of a permutation are well defined. Thus you must show that if a permutation has one decomposition into evenly many transpositions, then any decomposition into transpositions has an even number of transpositions, and similarly if it admits an odd decomposition, then all decompositions are odd. Definition 9.2. A given permutation has signature sgn σ = 1 if σ is even and −1 if σ is odd. By the above exercise, this is well defined, and in fact determines a unique map sgn : Sn → {−1, 1} such that sgn (σ1 σ2 ) = sgn (σ1 ) sgn (σ2 ) and with sgn (τ ) = −1 for any single transposition τ ∈ Sn . The “patterns” Bretscher speaks of are actually the result of applying permutations to the indices of entries in the matrix. In particular, one can define a pattern as follows. Let’s assume we are given a matrix A ∈ Matn×n (F). Fix a permutation σ ∈ Sn . Then we obtain a pattern Pσ = (a1,σ(1) , a2,σ(2) , . . . , anσ(n) ) . The claim is that all patterns are of this form and that the signature of the pattern is equal to the signature of the associated permutation. Given this fact, one can realize Bretcher’s definition as the more common Lagrange formula for the determinant: Definition 9.3. The determinant of A ∈ Matn×n (F) is the scalar det A ∈ F given by det A := X sgn (σ) n Y i=1 σ∈Sn Ä X aiσ(i) = ä sgn (σ) a1σ(1) · · · anσ(n) . σ∈Sn One can readily recover some basic properties of the determinant from this definition. For example, suppose one were to swap the ith and jth columns of a matrix A. This is equivalent to acting on the matrix by the transposition τij = (i j) ∈ Sn . Denote the image matrix as τij A and let A = (akl ). Note that sgn τ = −1 and sgn (σ) = −sgn (στ ) for any σ ∈ Sn . Moreover, since Sn is a group, the map τ : Sn → Sn is a bijection. Thus X det(τij A) = det(akτ (l) ) = = X σ∈Sn n Y −sgn (στ ) σ∈Sn =− X X n Y akσ(τ (l)) k=1 akσ(τ (l)) k=1 n Y sgn (στ ) στ ∈Sn =− sgn (σ) akστ (l) k=1 sgn (σ) σ∈Sn n Y akσ(l) k=1 = − det(A) . Exercise 9.2. Use the above definition to show that det A = det Aτ for any matrix A ∈ Matn×n (R). Exercise 9.3. Use the above definition to describe the effect of the other elementary row/column operations on the determinant of a square matrix. 43 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Let us now generalize our definition of determinants to a suitable class of maps of abstract finite dimensional vector spaces. Given a finite dimensional vector space V over a field F, we can consider endomorphisms of V and their determinants: Definition 9.4. Let V be a vector space over F. Then a linear endomorphism, vector space endomorphism, or simply endomorphism of V as an F-vector space is an F-linear map T : V → V . We denote the space of all endomorphisms of the F-vector space V by EndF (V ). Let us consider a finite dimensional vector space V , with dimension n. Thus, there is a basis A of V consisting of n vectors, giving us a coordinate system in Fn . If T ∈ EndF (V ) is an endomorphism of V , we can find a matrix A representing T relative to the basis A . Definition 9.5. For V an F-vector space with dimF V = n, the determinant of an endomorphism T : V → V is the determinant of any matrix A ∈ Matn×n (F) representing T in coordinates determined by some basis A of V : det T := det A, where A ∈ Matn×n (F) such that [Tv]A = A[v]A . We need to check that this is a reasonable definition. Mathematicians speak of checking if a given construction or definition is “well-defined”. 
In this case, that means we need to check that the determinant depends only on the endomorphism T , and not on the choice of basis A of V . Claim 9.1. The determinant of an endomorphism T : V → V of a finite vector space is well defined. Proof. Suppose A and B are bases of V , and A and B are the coordinate matrices of T ∈ EndF (V ) relative to A and B respectively. It suffices to show that det A = det B. We know that A and B are similar, for if S is the change of basis matrix from A to B, i.e. the standard matrix of the isomorphism LB ◦ LA−1 : Fn → Fn , then AS = SB, whence B = S−1 AS. Then by properties of the determinant of a square matrix, we have: det B = det(S−1 AS) = (det S−1 )(det A)(det S) = (det S−1 )(det S)(det A) Ä ä = det(S−1 S) (det A) = (det In )(det A) = det A . An alternative definition of general determinants of endomorphisms of a finite vector space is to define the determinant of a map as the product of its eigenvalues (see the next section). This alternative definition has the advantage of being completely coordinate free; one need not invoke coordinates directly in the definition, and it is clearly well defined since the eigenspectrum is determined only by the map itself. We now consider the properties of the determinant. Proposition 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. Then there are isomorphisms EndF (V ) ∼ . . × V} ∼ = Matn×n (F) ∼ = Fn×n . = V n := |V × .{z n times Exercise 9.4. Prove the above proposition. 44 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 9.6. Given a product of vector spaces V1 × V2 × . . . × Vn , a map T : V1 × V2 × . . . × Vn → F is said to be multilinear if it is linear in each factor, i.e., if for any i ∈ {1, . . . , n}, any α, β ∈ F, and any pair x1 , yi ∈ Vi , T(x1 , x2 , . . . , αxi + βyi , . . . , xn ) = αT(x1 , x2 , . . . , xi , . . . , xn ) + βT(x1 , x2 , . . . , yi , . . . xn ) . Definition 9.7. A multilinear map T : : V × V × . . . × V → F is called alternating if and only if for any pair of indices i, j ∈ {1, . . . , n} T(x1 , x2 , . . . , xi , . . . , xj , . . . , xn ) = −T(x1 , x2 , . . . , xj , . . . , xi , . . . , xn ) , i.e. after swapping any pair of inputs, the map is scaled by −1 ∈ F. A multilinear map is called symmetric if and only if such a swap does not change the value of the map on its inputs. Remark 9.1. Note that if F is of characteristic 2, then a map is alternating if and only if it is symmetric. Otherwise (e.g. the fields we’ve worked with most, such as R, C, Q, or Fp , p 6= 2) a map might be one but not the other, or might be neither. Exercise 9.5. Show that any alternating multilinear map T : V × . . . × V → F evaluates to zero if it has repeated inputs. E.g. for an alternating bilinear map B : V × V → F, B(x, x) = 0 necessarily. Theorem 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. There is a unique map D : EndF (V ) → F satisfying the following properties: (i.) D is multilinear and alternating when viewed as a map D : V n → F, (ii.) For any endomorphisms T, S ∈ EndF (V ), D(T ◦ S) = D(T)D(S), (iii.) D(IdV ) = 1 Exercise 9.6. Prove the above theorem and show that the map D is indeed the determinant as defined above. Note in particular that the multilinearity and alternativity of D should be independent of the choice of isomorphism EndF (V ) ∼ = V n. 9.3 Expansion by Minors We now show that one can recursively compute the determinant. 
9.3 Expansion by Minors

We now show that one can compute the determinant recursively. It suffices to demonstrate that a recursive formula can be produced for a given A ∈ Mat_{n×n}(F). We work from the definition

    \det A := \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.

Fix a particular index i ∈ {1, ..., n} =: [n], and observe that

    \prod_{k=1}^{n} a_{k\sigma(k)} = a_{i\sigma(i)} \prod_{k \in [n]\setminus\{i\}} a_{k\sigma(k)}.

Let j := σ(i). The following exercise supplies the details leading to our recursive formula for determinant computation.

Exercise 9.7. Let P_σ = \prod_{k=1}^{n} a_{kσ(k)}, take a_{ij} as above, and let P_σ^{ij} = \prod_{k \in [n]\setminus\{i\}} a_{kσ(k)}. Let P_σ and P_σ^{ij} also denote the respective patterns corresponding to these products (taken in order of the first index). Show that

a. sgn(σ) = sgn(P_σ),
b. sgn(P_σ) = (−1)^{i+j} sgn(P_σ^{ij}),
c. sgn(σ) P_σ = (−1)^{i+j} a_{ij} P_σ^{ij}.

Theorem 9.2 (Expansion by Minors/The Laplace Expansion). Let A = (a_{ij}) ∈ Mat_{n×n}(F). Fix a column (or row), with index j (or i respectively). Denote by A_{ij} the submatrix of A obtained by removing the i-th row and j-th column. Then

    \det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}).

The rightmost formula is an expansion by minors along the i-th row, and the middle formula is an expansion by minors down the j-th column. Note that the pattern for choosing the signs (−1)^{i+j}, as shown in the preceding exercise, is a checkerboard with the upper left corner positive:

    \begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}

Exercise 9.8. Let A ∈ Mat_{n×n}(R), and suppose that k is a positive integer such that A has a k × k minor with nonzero determinant, and such that there are no larger minors of A with nonzero determinant (note: the minor might be A itself). Show that rk A = k. Moreover, show that if rk A = k for some k, then the largest size of a minor of A with nonzero determinant is k × k.

9.4 Cramer's Rule and the Inverse Matrix Theorem

Theorem 9.3 (Cramer's Rule). Consider the linear system Ax = b, where A ∈ Mat_{n×n}(R) and b ∈ R^n. Suppose x is the unique solution to the system, and x_i = e_i · x is the i-th component of x. Then

    x_i = \frac{\det(A_{b,i})}{\det(A)},

where A_{b,i} is the matrix obtained from A by replacing the i-th column with the vector b.

Proof. We compute det(A_{b,i}) assuming that Ax = b. Write A = [v_1, ..., v_n], where v_j is the j-th column of A, as usual. Using linearity of the determinant in the i-th column, together with the alternating property (any term in which a column is repeated, up to scale, contributes zero), we compute

    \det(A_{b,i}) = \det[\, v_1 \; v_2 \; \cdots \; v_{i-1} \; b \; v_{i+1} \; \cdots \; v_n \,]
                  = \det[\, v_1 \; v_2 \; \cdots \; Ax \; \cdots \; v_n \,]
                  = \det[\, v_1 \; \cdots \; (x_1 v_1 + \cdots + x_i v_i + \cdots + x_n v_n) \; \cdots \; v_n \,]
                  = \det[\, v_1 \; \cdots \; x_i v_i \; \cdots \; v_n \,]
                  = x_i \det[\, v_1 \; \cdots \; v_i \; \cdots \; v_n \,]
                  = x_i \det(A).

Since x is the unique solution, A has nonzero determinant (as it must be invertible), and we conclude that for each i ∈ {1, ..., n},

    x_i = \frac{\det(A_{b,i})}{\det(A)}. ∎

An interesting corollary of this is the following algorithm for computing the inverse of an invertible matrix. Define the (i, j)-th cofactor of A to be c_{ij} = det(A_{ij}), where A_{ij} is the matrix obtained from A by removing the i-th row and j-th column, and let C = ((−1)^{i+j} c_{ij}) be the signed cofactor matrix. Then the classical adjoint is A^* := C^τ, the transpose of C.

Corollary 9.1. If A ∈ Mat_{n×n}(F) is invertible, then the inverse of A is given by

    A^{-1} = \frac{1}{\det A} A^*.

Exercise 9.9. Prove the above corollary, using Cramer's rule. (The proof was given in class and can be found in Bretscher, but see if you can reproduce it without referencing anything other than Cramer's rule!)
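The following sketch (not part of the original notes) illustrates Theorem 9.2 and Theorem 9.3 computationally: a recursive Laplace expansion down the first column, and a Cramer's-rule solver built on it. Both take exponential time and are meant only to mirror the formulas for small matrices; the 2×2 system at the end is an arbitrary illustrative example, and the helper names are not from the notes.

    # Recursive Laplace expansion (Theorem 9.2) and Cramer's rule (Theorem 9.3).

    def minor(A, i, j):
        """Submatrix A_ij obtained by deleting row i and column j (0-based)."""
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        """Expansion down the first column: det A = sum_i (-1)^i a_{i0} det(A_{i0})
        (0-based indices, so the sign is (-1)^i)."""
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

    def cramer_solve(A, b):
        """Solve Ax = b via x_i = det(A_{b,i}) / det(A), assuming det(A) != 0."""
        n, d = len(A), det(A)
        x = []
        for i in range(n):
            # A_{b,i}: replace the i-th column of A by the vector b.
            Abi = [row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)]
            x.append(det(Abi) / d)
        return x

    # Example: solve 2x + y = 5, x + 3y = 10; the unique solution is (1, 3).
    print(cramer_solve([[2, 1], [1, 3]], [5, 10]))  # prints [1.0, 3.0]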
10 Eigenvectors, Eigenvalues, and the Characteristic Equation

10.1 The concepts of eigenvectors and eigenvalues

Consider the following puzzle, whose solution is intuitive. We have three friends sitting around a table, and each is given some amount of putty: at time t = 0 minutes one of them has a > 0 grams of putty, another has b > 0 grams, and the last individual has c > 0 grams. They play with their respective wads of putty for nearly a minute, and then divide their wads into perfect halves. Exactly at the one minute mark, each person passes one half to the friend to their left, and the other half to the friend to their right. They then play with their wads of putty for nearly another minute before agreeing to again divide and pass exactly as they did at t = 1. For each integer number of minutes n, at exactly t = n they pass half of the putty in their possession at the time to the adjacent friends. What happens in the long term? Does any one friend end up with all of the putty, or most of the putty, or does the distribution instead approach an equilibrium?

What we have described is an example of a discrete dynamical system. In this particular case, it is in fact a linear system: you can check that if x_t is the vector describing the putty in the possession of our three friends at time t, then at time t = n we have x_n = A x_{n−1}, where

    A = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}.

It is easy to see that we can define a function, for nonnegative integral t,

    x_• : Z_{≥0} → R^3, \qquad x_t = A^t x_0,

where x_0 = a e_1 + b e_2 + c e_3 is the initial vector describing the putty held by each friend at time t = 0. The question of long term behavior is then stated mathematically as: "Find

    \lim_{n \to \infty} x_n = \lim_{n \to \infty} A^n x_0,

if it exists."

One observation about this putty problem: at each step, the total amount of putty in the hands of the collective of friends is conserved. This might give us some hope that the limit exists, but of course we need to understand what it means for this system to converge for a given initial value x_0, and actually show that it does (if this is the case). Before we analyze this system in full, let us explore two-dimensional systems, and define an incredibly useful tool which will allow us to solve such linear discrete dynamical systems (LDDSs for short).

Exercise 10.1. Generalize the putty problem in two different ways to feature n friends. Intuitively, can you argue that the long term behavior of each such system is qualitatively the same as what we expect in the original putty problem?

Example 10.1. Let us consider the matrix

    A = \begin{pmatrix} 1 & 2 \\ 0 & -1 \end{pmatrix}.

Suppose we want to understand the action of the map x ↦ Ax on the plane R^2. One natural question is "does the map T(x) = Ax admit any invariant proper subspaces (in this case, lines) in R^2?" That is, are there lines L such that the image T(L) of L is L itself?

Suppose that L ⊂ R^2 is a 1-dimensional subspace fixed by the map T. Then there is some nonzero vector v ∈ R^2 such that L = span{v}, and Tv = Av = λv for some scalar λ ∈ R, since Tv ∈ span{v}. We can rearrange this equation as

    Av − λv = (A − λI_2)v = 0.

Thus v ∈ ker(A − λI_2). Since we assumed v ≠ 0, it follows that det(A − λI_2) = 0. This gives us a polynomial equation, which should determine λ. We call this the characteristic equation of the matrix A. Using the given values, we have

    \det\left( \begin{pmatrix} 1 & 2 \\ 0 & -1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \right) = \det \begin{pmatrix} 1-\lambda & 2 \\ 0 & -1-\lambda \end{pmatrix} = (1 - \lambda)(-1 - \lambda) = 0 \iff \lambda = 1 \text{ or } \lambda = -1.
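A quick numerical check of this computation (a sketch assuming NumPy, not part of the original notes):

    import numpy as np

    # The matrix of Example 10.1.
    A = np.array([[1.0,  2.0],
                  [0.0, -1.0]])

    print(np.poly(A))            # [ 1.  0. -1.]: coefficients of lambda^2 - 1 = (1 - lambda)(-1 - lambda)
    print(np.linalg.eigvals(A))  # the eigenvalues 1 and -1 (in some order)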
That we get two such scalars λ suggests that there are two subspaces invariant with respect to our map T. The λs are called eigenvalues, and the corresponding invariant subspaces are called eigenspaces ("eigen" means "own" or "self" in German, though it has come to mean "characteristic" or "self-similar" owing to its extensive appearance in modern mathematics as a prefix for gadgets coming from linear operators). We can find a pair of eigenvectors describing our two eigenlines. Indeed, we can use the values of λ we found to solve the vector equations

    (A − (1)I_2)v_1 = 0, \qquad (A − (−1)I_2)v_2 = 0.

Exercise 10.2. Find the vectors v_1 and v_2 above.

Note that in class we deduced that we could read off the eigenvalues from the main diagonal of the matrix in this case, since the matrix is upper triangular. In general, an upper triangular or lower triangular matrix has eigenvalues precisely equal to the entries along the main diagonal. In class we used our eigenvectors to form a basis, and rewrote the linear map in eigencoordinates, exhibiting that in the appropriate coordinates it is merely a reflection across one axis.

Please read Bretscher 7.1 - 7.3, which will cover much of the following topics:

10.2 The characteristic equation
10.3 Eigenvalue formulae for Traces and Determinants
10.4 Eigenspaces and Eigenbases
10.5 Diagonalization
10.6 Jordan Canonical form

11 Orthogonality and Inner Product Spaces

Will there be time?
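As a closing computational aside (not part of the original notes), here is a minimal simulation of the putty problem from Section 10.1, assuming NumPy and an arbitrary illustrative initial distribution. It suggests that the distribution approaches the equilibrium in which each friend holds (a + b + c)/3 grams, a claim that the eigenvalue and diagonalization tools above (and in Bretscher 7.1 - 7.3) will let us prove.

    # Simulation of the putty problem (Section 10.1): x_n = A^n x_0.
    import numpy as np

    A = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

    x = np.array([6.0, 1.0, 2.0])   # a, b, c grams of putty at t = 0
    for n in range(30):
        x = A @ x
    print(x)        # each entry is close to (6 + 1 + 2) / 3 = 3
    print(x.sum())  # the total amount of putty is conserved: 9.0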