Course Notes roughly up to 4/6
Math 235.9 Spring 2015 Course Notes
Andrew J. Havens
April 15, 2015

1 Systems of Real Linear Equations

Let's consider two geometric problems:

(1) Find the intersection point, if it exists, for the pair of lines whose equations in "standard form" are given as
\[ 2x + 4y = 6, \qquad x - y = 0. \]
More generally, can we solve the two dimensional linear system
\[ ax + by = e, \qquad cx + dy = f, \]
provided a solution exists? Can we develop criteria to understand when there is a unique solution, or multiple solutions, or no solution at all?

(2) Consider the vectors
\[ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 \\ -1 \end{pmatrix}. \]
We can depict these as arrows in the plane as follows:

Figure 1: The two vectors above depicted as "geometric arrows" in the Cartesian coordinate plane.

Imagine that we can only take "steps" corresponding to these vectors, i.e. we can only move parallel to these vectors, and a valid move consists of adding one of these two vectors to our position to obtain our next position. Can we make it from the origin $O = (0, 0)$ to the point $(6, 0)$?

We will see that these two kinds of problems are related more closely than they would initially appear (though the second has a restriction that the first does not require, namely that we seek an integer solution; nonetheless, there is an underlying algebraic formalism which allows us to consider this problem one of linear algebra).

First, we solve problem (1). There are many ways to solve the given numerical problem. Among them: solving one equation for either $x$ or $y$ (the second is ripe for this) and substituting the result into the other equation; writing both equations in slope-intercept form and setting them equal (this is clearly equivalent to the substitution just described); or eliminating variables by multiplying the equations by suitable constants and adding the resulting left- and right-hand sides respectively to obtain a single-variable equation:
\[ \begin{cases} 2x + 4y = 6 \\ x - y = 0 \end{cases} \longleftrightarrow \begin{cases} x + 2y = 3 \\ 2x - 2y = 0 \end{cases} \implies 3x = 3. \]
From this we see that $x = 1$, and substituting into the second of the two original equations, we see that $y = 1$ as well.

Figure 2: The two lines plotted in the Cartesian coordinate plane.

The motivation for these manipulations will become clearer when we see higher-dimensional linear systems (more variables and more equations motivate a systematic approach, which we will develop in subsequent lectures). One often notates this kind of problem and the manipulations involved by writing down only the coefficients and constants in what is called an augmented matrix:
\[ \left[ \begin{array}{cc|c} 2 & 4 & 6 \\ 1 & -1 & 0 \end{array} \right]. \]
The square portion of the matrix is the coefficient matrix, and the final column contains the constants from the standard forms of our linear equations. This notation generalizes nicely when encoding large systems of linear equations in many unknowns. Let us describe what the manipulations of the equations correspond to in this matrix notation:

(i) A row may be scaled by a nonzero number, since equations may be multiplied or divided on their left and right sides by a nonzero number.

(ii) A nonzero multiple of a row may be added to another row, and the sum may replace that row, since we can recombine equations by addition as above.

(iii) Two rows may be swapped, since the order in which the equations are written down does not determine or affect their solutions.

The above are known as elementary row operations.
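To make these operations concrete, here is a minimal computational sketch in plain Python (the list-of-rows representation and the helper names scale and add_multiple are mine, not notation from the notes) that applies operations of types (i) and (ii) to the augmented matrix above.

    # Augmented matrix [A | b] for the system 2x + 4y = 6, x - y = 0.
    M = [[2.0, 4.0, 6.0],
         [1.0, -1.0, 0.0]]

    def scale(M, i, s):
        # operation (i): replace row i with s times row i (s nonzero)
        M[i] = [s * entry for entry in M[i]]

    def add_multiple(M, i, j, s):
        # operation (ii): replace row i with (row i) + s * (row j)
        M[i] = [a + s * b for a, b in zip(M[i], M[j])]

    scale(M, 0, 1 / 2)            # R1 -> (1/2) R1 : [1,  2,  3]
    add_multiple(M, 1, 0, -1)     # R2 -> R2 - R1  : [0, -3, -3]
    scale(M, 1, -1 / 3)           # R2 -> (-1/3) R2: [0,  1,  1]
    add_multiple(M, 0, 1, -2)     # R1 -> R1 - 2 R2: [1,  0,  1]

    # M is now [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]] (up to a harmless -0.0),
    # i.e. x = 1 and y = 1.
    print(M[0][2], M[1][2])   # 1.0 1.0

These four moves carry the augmented matrix to a form with ones on the diagonal and zeros elsewhere in the coefficient part, which (as noted next) displays the solution directly.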
Note that for constants $p, q \in \mathbb{R}$ an augmented matrix of the form
\[ \left[ \begin{array}{cc|c} 1 & 0 & p \\ 0 & 1 & q \end{array} \right] \]
corresponds to a solution $x = p$, $y = q$. Further, note that we can combine operations (i) and (ii) into a more general and powerful row operation: we may replace a row by any nontrivial linear combination of that row and other rows, i.e. we may take a nonzero multiple of a row, add multiples of other rows, and replace the original row with this sum.

Let us apply row operations to attempt to solve the abstract system
\[ \begin{cases} ax + by = e \\ cx + dy = f \end{cases} \longleftrightarrow \left[ \begin{array}{cc|c} a & b & e \\ c & d & f \end{array} \right]. \]
We assume temporarily that $a \neq 0$. We will discuss this assumption in more depth later. Since our goal is to make the coefficient matrix have ones along the diagonal from top left to bottom right, and zeros elsewhere, we work first to zero out the bottom left entry. This can be done, for example, by taking $a$ times the second row, subtracting $c$ times the first row, and replacing the second row with the result. We denote this by writing $aR_2 - cR_1 \mapsto R_2'$ (I may get lazy and stop writing the primes, where it will be understood that $R_2$ after the arrow represents a row replacement by the quantity on the left). The effect on the augmented matrix is
\[ \left[ \begin{array}{cc|c} a & b & e \\ c & d & f \end{array} \right] \longmapsto \left[ \begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array} \right]. \]
We see that if $ad - bc = 0$, then either there is no solution, or we must have $af - ce = 0$. Let's press on assuming that $ad - bc \neq 0$. We may eliminate the upper right position held by $b$ in the coefficient matrix by $(ad - bc)R_1 - bR_2 \mapsto R_1'$, yielding
\[ \left[ \begin{array}{cc|c} a & b & e \\ 0 & ad - bc & af - ce \end{array} \right] \longmapsto \left[ \begin{array}{cc|c} a(ad - bc) & 0 & (ad - bc)e - b(af - ce) \\ 0 & ad - bc & af - ce \end{array} \right] = \left[ \begin{array}{cc|c} a(ad - bc) & 0 & ade - abf \\ 0 & ad - bc & af - ce \end{array} \right]. \]
Since we assumed $a$ and $ad - bc$ nonzero, we may apply the final row operations $\frac{1}{a(ad - bc)} R_1 \mapsto R_1'$ and $\frac{1}{ad - bc} R_2 \mapsto R_2'$ to obtain
\[ \left[ \begin{array}{cc|c} 1 & 0 & (de - bf)/(ad - bc) \\ 0 & 1 & (af - ce)/(ad - bc) \end{array} \right], \]
so we obtain the solution as
\[ x = \frac{de - bf}{ad - bc}, \qquad y = \frac{af - ce}{ad - bc}. \]
Note that if $a = 0$ but $bc \neq 0$, the solutions are still well defined, and one can obtain the corresponding expressions with $a = 0$ substituted in by instead performing elimination on
\[ \left[ \begin{array}{cc|c} 0 & b & e \\ c & d & f \end{array} \right], \]
where the first step might be a simple row swap. However, if $ad - bc = 0$, there is no hope for the unique solution expressions we obtained, though there may still be solutions, or there may be none at all. We will characterize this failure geometrically eventually.

First, we turn to problem (2). Problem (2) is best rephrased in the language of linear combinations of vectors. Recall that addition of real vectors, which we are representing as arrows in the plane, has both geometric and algebraic definitions. The geometric definition is of course the parallelogram rule: the sum of two vectors $\mathbf{a}$ and $\mathbf{b}$ is the diagonal of the parallelogram completed by parallel translating $\mathbf{a}$ along $\mathbf{b}$ and $\mathbf{b}$ along $\mathbf{a}$:

Figure 3: Vector addition with arrows.

The corresponding algebraic operation is merely addition of components: if
\[ \mathbf{a} = \begin{pmatrix} a_x \\ a_y \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} b_x \\ b_y \end{pmatrix}, \]
then define
\[ \mathbf{a} + \mathbf{b} := \begin{pmatrix} a_x + b_x \\ a_y + b_y \end{pmatrix}. \]
It is left to the reader to see that these two notions of addition are equivalent, and that they satisfy properties such as commutativity and associativity. Moreover, one can iterate addition, and thus define for any positive integer $n \in \mathbb{Z}$
\[ n\mathbf{a} = \underbrace{\mathbf{a} + \mathbf{a} + \dots + \mathbf{a}}_{n \text{ times}}. \]
Similarly, one can define subtraction, which regards $-\mathbf{a} := \begin{pmatrix} -a_x \\ -a_y \end{pmatrix}$ as a natural additive inverse to $\mathbf{a}$.
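These componentwise operations are immediate to express in code. Here is a small plain-Python sketch (NumPy would provide all of this natively; the explicit helper functions are mine and simply mirror the definitions) for the two vectors from problem (2).

    a = (1, 1)       # the vector a
    b = (2, -1)      # the vector b

    def add(u, v):
        # componentwise addition: (u_x + v_x, u_y + v_y)
        return (u[0] + v[0], u[1] + v[1])

    def neg(u):
        # the additive inverse -u = (-u_x, -u_y)
        return (-u[0], -u[1])

    def times(n, u):
        # n u = u + u + ... + u  (n a positive integer)
        total = (0, 0)
        for _ in range(n):
            total = add(total, u)
        return total

    print(add(a, b))        # (3, 0)
    print(times(2, a))      # (2, 2)
    print(add(a, neg(a)))   # (0, 0)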
In fact, geometrically, we need not restrict ourselves to integer multiples, for we can scale a vector by any real number (reversing direction if negative), and algebraically this corresponds to simply multiplying each component by that real number. (For the math majors among you, we are giving the space R2 of vectors, thought of either as pairs of real numbers or as arrows in the plane, an abelian group structure but also a structure as a free R-module; we will see many of these properties later when we define vector spaces formally, but a further generalization is to study groups and modules; an elementary theory of groups is treated in introductory abstract algebra– math 411 here at UMass, while more advanced group theory, ring theory and module theory are left to more advanced abstract algebra courses, such as math 412 and math 611.) We restrict our attention to integral linear combinations of the vectors Ç a := 1 1 å Ç , b := 4 2 −1 å , Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens i.e. combinations of the form xa + yb, where x, y ∈ Z. Then problem (2) is easily rephrased Ç å as 6 follows: does there exist an integral linear combination of a and b equal to the vector ? 0 Visually, it would seem quite plausible (make two parallelograms as shown below!) Figure 4: The two vectors above depicted as “geometric arrows” in the Cartesian coordinate plane. Algebraically, we can apply the definitions of vector scaling and addition to unravel the meaning of the question: we are seeking integers x and y such that Ç å 6 0 Ç =x Ç = å 1 1 Ç 2 −1 +y å x + 2y x−y å ,. This is equivalent to a linear system as seen in problem (1)! In fact, we can use the solution of (1) to slickly obtain a solution to (2): since (1, 1) = (x, y) is a solution to Ç 3 0 å Ç x + 2y x−y = å , we can multiply both sides by 2 to obtain Ç 6 0 å Ç = 2x + 4y 2x − 2y Ç = 2(1) 1 1 å Ç = 2x å Ç + 2(1) 1 1 å Ç + 2y 2 −1 å 2 −1 å . Thus, taking two steps along a and two steps along b lands on the desired point (6, 0). Let’s summarize what we’ve seen in these two problems. We have two dual perspectives: Intersection problem: find the intersection of two lines / solve a linear system of two equations: ( ax + by cx + dy Linear combination problem: Find a linear combination Ç å of two vectors Ç å a b a= and b = : c d Ç =e =f x 5 a c å Ç +y b d å Ç = e f å Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Let’s return to studying the intersection problem to fill in the gap: what can we say about existence or uniqueness of solutions if the quantity ad − bc is equal to zero? Proposition 1.1. For a given two variable linear system described by the equations ( ax + by cx + dy =e =f the quantity ad − bc = 0 if and only if the lines described by the equations have the same slope. Proof. We must show two directions, since this is an if and only if statement. Namely, we must show that if the lines have the same slopes, then ad − bc = 0, and conversely, if we know only that ad − bc = 0, we must deduce the corresponding lines possess the same slopes. Let’s prove the former. We have several cases we need to consider. First, let’s suppose that none of the coefficients are zero, in which case we can write each equation in slope-intercept form: a e ax + by = e ←→ y = − x + , b b f c cx + dy = f ←→ y = − x + , d d and applying the assumption that the lines have identical slopes, we obtain − a c = − =⇒ ad = bc =⇒ ad − bd = 0 . 
b d (1) On the other hand, if for example, a = 0, then the first equation is by = e, which describes a horizontal line (we must have b 6= 0 if this equation is meaningful). This tells us that the other equation is also for a horizontal line, so c = 0 and consequently ad − bc = 0 · d − b · 0 = 0. A nearly identical argument works when the lines are vertical, which happens if and only if b = 0 = d. It now remains to show the converse, that if ad − bc = 0, we can deduce the equality of the lines’ slopes. Provided neither a nor d are zero, we can work backwards in the equation (??): ad − bc = 0 =⇒ − a c =− . b d Else, if a = 0 or d = 0 and ad − bc = 0, then since ad − bc = bc, either b = 0 or c = 0. But a and b cannot both be zero if we have a meaningful system (or indeed, the equations of lines). Thus if a = 0 and ad − bc = 0, then c = 0 and the lines are both horizontal. Similarly, if d = 0 and ad − bc = 0, then b = 0 we are faced with two vertical lines. There are thus three pictures, dependent on ad − bc, e and f : 1. If ad − bc 6= 0, there is a unique solution (x, y) for any e and f we choose, and this pair (x, y) corresponds to the unique intersection point of two non-parallel lines. 2. If ad − bc = 0, but af − ec = 0 = bf − ed, then one equation is a multiple of the other, and geometrically we are looking at redundant equations for a single line. There are infinitely many solutions (x, y) corresponding to all ordered pairs lying on this line. 3. ad − bc = 0 but af 6= ec. We have two parallel lines, which never intersect. There are no solutions to the linear system. 6 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens While there is much more that can be done with two dimensional linear algebra, we have a fairly complete idea of how to solve each of the basic problems posed. We now will explore the analogous problems in three dimensions, as a way to build up to solving general linear systems. Thus, consider the following problems from three dimensional geometry: 1. Given three “generic” planes in R3 which intersect in a unique point, can we locate their point of intersection? 2. Given two planes intersecting along a line, can we describe the line “parametrically”? 3. Given three vectors u, v, w in R3 , can we describe a fourth vector b as a linear combination of the other three? Before approaching this, we review some important properties of the real numbers, and the description of Cartesian coordinates on Cartesian products of the reals. R denotes the real numbers, which has some additional structure such as a notion of distance given by absolute value, a notion of partial ordering (≤). With these notions, together with ordinary real number arithmetic, we can view R as a normed, ordered scalar field. The properties of R which make it a field are: (i.) R comes equipped with a notion of associative, commutative addition: for any real numbers a, b, and c, a + b = b + a is also a real number, and (a + b) + c = a + b + c = a + (b + c). Moreover, there is a unique element 0 ∈ R which acts as an identity for the addition of real numbers: 0 + a = a for any a ∈ R. Every a ∈ R has a unique additive inverse (−a) such that a + (−a) = 0. (ii.) R comes equipped with a notion of associative, commutative, and distributive multiplication: for any a, b, c ∈ R, ab = ba determines a real number, a(bc) = abc = (ab)c, and a(b + c) = ab + ac = (b + c)a. Moreover, 0a = 0 for any a ∈ R, and there is a unique number a ∈ R which acts as an identity for multiplication of real numbers: 1a = a for any a ∈ R. 
(iii.) To any nonzero a ∈ R there corresponds a multiplicative inverse 1 a := a−1 satisfying aa−1 = 1. A mathematical set with a structure as above is called a field. We will encounter other fields later on. We’ve already seen examples of “vectors” in the plane, utilizing the coordinates coming from a Cartesian product: R2 = R × R := {(x, y) | x, y, z ∈ R} . When we wish to emphasize that we are talking about vectors, we write them not as ordered pairs horizontally, but as vertical tuples: Ç å x x= ∈ R2 . y We can regard such a vector as the position vector of the point (x, y), which means it is geometrically the arrow pointing from the origin (0, 0) to the point (x, y). It has a notion of geometric length coming from the pythagorean theorem: kxk = » x2 + y 2 . We can extend the ideas of this construction to create “higher dimensional ” spaces. The geometry we are working with here is called Euclidean (vector) geometry. We define R3 analogously: R3 = R × R × R := {(x, y, z) | x, y, z ∈ R} . 7 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens In R3 , we can carve out subsets called planes. They have equations with general form: ax + by + cz = d , a, b, c, d ∈ R , x, y, z are real variables for coordinates on the plane. Let’s try to find an intersection point for a system of three planes. Example 1.1. Consider the 3 × 3 system x+ y+ z x − 2y + 3z 4x − 5y + 6z =6 1 1 1 6 = 6 ←→ 1 −2 3 6 . 4 −5 6 12 = 12 Our goal is to manipulate the system via operations corresponding to adding or scaling the equations, in order to obtain 1 0 0 p 0 1 0 q , 0 0 1 r which corresponds to a solution (x, y, z) = (p, q, r) for some p, q, r ∈ R. A simple list of valid manipulations corresponds to the following elementary row operations: 1. We may swap two rows, just as we may write the equations in any order we please. We notate a swap of the ith and jth rows of an augmented matrix by Ri ↔ Rj . 2. We may replace a row Ri with the row obtained by scaling the original row by a nonzero real number. We notate this by sRi 7→ Ri . 3. We may replace a row Ri by the difference of that row and a multiple of another row. We notate this by Ri − sRj 7→ Ri . Before we proceed to apply these row operations to try to solve our system, I remark that combining these elementary operations allows us to describe a more general valid manipulation: we may replace a row by a linear combination of rows, where the original row is weighted by a nonzero real number. E.g., if s 6= 0, then the following is the most general row operation (up to row swapping) involving the rows R1 , R2 , R3 : sR1 + tR2 + uR3 7→ R1 . Now, to create our solution with row operations. Notice that the top left entry of the matrix is already a 1, which is good news! We want 1s on the main diagonal, and zeros elsewhere on the coefficient side of the augmented matrix. So if the top left entry was a 0, we’d swap rows to get a nonzero entry there, and then if it was not 1 we’d scale the first row by the multiplicative inverse of that entry. Once we’ve got a nonzero entry there, we call this position the first pivot, and our goal is to use it to create a column of zeroes beneath that position. Focusing on that first column, we have: 1 ... 6 1 . . . 6 . 4 . . . 12 It is clear that we can eliminate the second entry in the first column by the row operation R2 − R1 7→ R2 . Similarly, we can create a zero in the first entry of the third row by R3 − 4R1 7→ R3 . This yields 1 1 1 6 1 1 1 6 0 1 −2 3 6 7−→ 0 −3 2 4 −5 6 12 0 −9 2 −12 8 Math 235.9 - Lin. Alg. 
Course Notes 2015 Andrew J. Havens Next, we want to make the middle entry from a −3 into a 1. This is readily accomplished by a row operation of the second type: − 31 R2 7→ R2 . One should check that after performing in sequence the moves R3 − 9R2 7→ R3 , 14 R3 7→ R3 , R2 + 32 R3 7→ R3 , R1 − 13 R3 7→ R1 , and R1 − R2 7→ R1 , the matrix reduces to 1 0 0 1 0 1 0 2 . 0 0 1 3 Thus the solution to our system is (1, 2, 3), which is the point where these planes intersect. The process where we used a pivot to make zeroes below that entry is called pivoting down, while the process where we eliminated entries above a pivot position is called pivoting up. Exercise 1.1. Show that the row operations are invertible, by producing for a given elementary row operation, another elementary operation which applied either before or after the given one will result in the final matrix being unchanged. Example 1.2. Let us turn to the second geometric problem, regarding the description of a line of intersection of two planes. Take, for instance, the two planes ( x+ y+ z x − 2y + 3z =6 . =6 By applying the row operations in the preceding example together with a few more (which ones?), we see that we can get the system to reduce to ô ñ 1 0 5/3 6 . 0 1 −2/3 0 Notice that there can be at most two pivots, since there are only two rows! We rewrite the matrix rows as equations to try to parametrize the line: x = 6 − (5/3)z , y = (2/3)z , whence x −5/3 6 6 − (5/3)z y = (2/3)z = 0 + z 2/3 . 1 z z 0 Thus the line can be parametrized by z ∈ R, which is the height along the line which begins at (6, 0, 0) on the xy-plane in R3 when z = 0, and travels with velocity −5/3 v = 2/3 . 1 Note that above we wrote the solution as a linear combination of the vectors for the starting position and the velocity. It will be common to solve systems where the final solution is an arbitrary linear combination dependent on some scalar weights coming from undetermined variables. By convention, we often choose different letters from the variable designations, such as s and t, to represent the scalings in such a solution. Thus we would write x 6 −5/3 y = 0 + s 2/3 , s ∈ R , z 0 1 where we’ve taken z = s as a free variable. 9 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens For the third problem, the key observation is that it is essentially the same as the first problem, dualized. We can write down the equation xu + yv + zw = b , for some unknowns x, y, z ∈ R, and after scaling the vectors entry by entry, and adding entry by entry, we have two vectors which are ostensibly equal. Thus setting their entries equal, we obtain a system of three equations, which can be solved via elimination/row operations on the corresponding augmented matrix. Example 1.3. Let 1 −2 3 u = 2 , v = −1 , w = 2 . 3 −3 5 Can the vector 0 b= 1 4 be written as a linear combination of u, v, and w? The claim is that this is not possible. Observe that if such a linear combination exists, then there’s a solution to the vector equation xu + yv + zw = b . We can rewrite this as a system as follows: x − 2y + 3z 2x − y + 2z 3x − 3y + 5z =0 1 −2 3 0 = 1 ←→ 2 −1 2 1 3 −3 5 4 =4 We apply the row operations R2 − 2R1 7→ R2 and R3 − 3R1 7→ R3 to obtain 1 −2 3 0 0 3 −4 1 , 0 3 −4 4 and then R3 − R2 7→ R3 leaves us with 1 −2 3 0 0 3 −4 1 . 0 0 0 3 The last row corresponds to the impossible equation 0z = 3 =⇒ 0 = 3, so there is no possible solution! We call such a system inconsistent. 
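A computational cross-check of Example 1.3 is possible if NumPy is available (a sketch; the rank comparison used here is a standard criterion which the notes have not yet developed): the vector b can be written as a linear combination of the columns of A exactly when appending b as an extra column does not increase the rank.

    import numpy as np

    A = np.array([[1, -2, 3],
                  [2, -1, 2],
                  [3, -3, 5]], dtype=float)
    b = np.array([0.0, 1.0, 4.0])

    augmented = np.column_stack([A, b])

    # rank(A) = 2, but rank([A | b]) = 3: appending b raises the rank,
    # so b is not a linear combination of the columns of A and the
    # equation x u + y v + z w = b has no solution.
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(augmented))   # 2 3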
Otherwise, if the equation can be solved (even if the solution is not unique), we refer to the system as consistent. Some possible practice problems: Problems 1-18 in section 1.1 – Introduction to Linear systems in Otto Bretscher’s textbook Linear Algebra with Applications. These problems generalize easily into higher dimensions, and it will be nice to see that our procedure illustrated in the above examples works just as well in those settings. Thus, it seems fitting that we study the general algorithm which allows us to reduce systems and solve either for an explicit solution, or to realize a system is inconsistent. As we will use this algorithm extensively, I devote several lectures to its details and implementation. 10 Math 235.9 - Lin. Alg. Course Notes 2 2015 Andrew J. Havens Gauss-Jordan Elimination In this section we describe the general algorithm which takes a matrix and reduces it in order to solve a system or determine that it is inconsistent. Let us begin with some language and notations. Definition 2.1. A matrix is said to be in Row Echelon Form (REF) if the following conditions hold: 1. All rows containing only zeros appear below rows with nonzero entries. 2. The first nonzero entry in any row appears in a column to the right of the first nonzero entry in any preceding row, and any such initial nonzero entry is a 1. The columns with leading 1s are called pivot columns, and the entries containing leading 1s are called pivots. If, in addition, all entries other than the pivot entries are zero we say the matrix is in Reduced Row Echelon Form (RREF). Example 2.1. ñ is a matrix in row echelon form, while 1 0 5/3 0 1 −2/3 ñ 1 0 0 0 1 0 ô ô is a matrix in reduced row echelon form. We write elementary row ops as follows: let s ∈ R \ 0 be a nonzero scalar, A ∈ Matm×n (R) a matrix which contains m rows and n columns of real entries. Let Ri denote the ith row of A for any integer i, 1 ≤ i ≤ m. Then the elementary row operations are 1. Row swap: Ri ↔ Rj swaps the ith and jth rows. 2. Rescaling: sRi 7→ Ri scales Ri by s. 3. Row combine: Ri − sRj 7→ Ri combines Ri with the scalar multiple sRj of Rj . We are ready to describe the procedure for pivoting downward : Definition 2.2. Let aij denote the entry in the ith row and jth column of A ∈ Matm×n (R). To pivot downward on the (i,j)th entry is to perform the following operations: 1 (i.) Ri 7→ Ri , aij (ii.) For each integer k > i, Ri+k − ai+k,j Ri 7→ Ri+k . In words, make aij into a 1, and use this one to eliminate (make 0) all other entries directly below the (i,j)th entry. Let’s give a brief overview of what the Gauss-Jordan algorithm accomplishes. First, given an input matrix, it searches for the leftmost nonzero column. Then, after finding this column, and after exchanging rows if necessary, it brings the first nonzero entry up to the top. It then pivots downwards on this entry. It subsequently narrows its view to the submatrix with the first row and column removed, and repeats the procedure. Once it has located all pivot columns and pivoted down in each one, it starts from the rightmost pivot and pivot up, then move left to the next pivot and pivot up. It then continues pivoting up and moving left until the matrix is in row echelon form. The descriptions and charts I gave in class are largely taken from a textbook which is in my office (the name escapes me). 
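For readers who want to see the whole algorithm spelled out in code, here is a compact sketch of Gauss-Jordan elimination in plain Python (the choice of the largest available pivot in each column and the tolerance tol are implementation details of mine, not part of the description above).

    def rref(M, tol=1e-12):
        # Return the reduced row echelon form of M, given as a list of rows.
        M = [row[:] for row in M]          # work on a copy
        rows, cols = len(M), len(M[0])
        pivot_row = 0
        for j in range(cols):              # scan columns left to right
            if pivot_row == rows:
                break
            # pick the row at or below pivot_row with the largest entry in column j
            k = max(range(pivot_row, rows), key=lambda r: abs(M[r][j]))
            if abs(M[k][j]) < tol:
                continue                   # no pivot in this column
            M[pivot_row], M[k] = M[k], M[pivot_row]        # row swap
            p = M[pivot_row][j]
            M[pivot_row] = [x / p for x in M[pivot_row]]   # scale the pivot to 1
            for r in range(rows):                          # pivot down and up
                if r == pivot_row:
                    continue
                factor = M[r][j]
                if abs(factor) > tol:
                    M[r] = [x - factor * y for x, y in zip(M[r], M[pivot_row])]
            pivot_row += 1
        return M

    print(rref([[1, 1, 1, 6], [1, -2, 3, 6], [4, -5, 6, 12]]))

Running it on the augmented matrix of Example 1.1 should print the identity coefficient block with the solution column (1, 2, 3), up to floating point rounding.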
However, the technical details given in class are not a principal focus, and in particular, will not appear on the exam in any formal capacity (as long as you can perform the algorithm in practice, then you’ve got what you need for the remainder of the course). I may come back and include these details at a future date. 11 Math 235.9 - Lin. Alg. Course Notes 3 2015 Andrew J. Havens Matrices and Linear Maps of Rn → Rm Now that we have an algorithm for solving systems, let’s return to the vector picture again. Here, we review some basic vector algebra in two and thee dimensions: Regard R2 as the set of vectors ®Ç 2 R = x y å ´ x, y ∈ R , and similarly regard R3 as the set of vectors Ö è x 3 y R = x, y, z ∈ R . z Recall the dot product, which I define in R3 (for R2 simply forget the last coordinate): Ö a b c è Ö · x y z è = ax + by + cz . Notice that the right hand side is in fact identical to the expression appearing on the left hand side of our general equation for a plane in R3 ! This is not a coincidence. One way to geometrically determine a plane is to fix a vector Ö è a b n= c and find the set of all points such that the displacement vector from a fixed point (x0 , y0 , z0 ) is perpendicular to i.. The key fact (which we will prove later in the course) is that u·v = kukkvk cos θ for any vectors u, v ∈ R3 , where θ ∈ [0, π] is the angle made between the two vectors (which can always be chosen to be in the interval [0, π]). Thus, a plane equation has the form n · (x − x0 ) = 0 , Ö where x= x y z è Ö , x0 = x0 y0 z0 è . It is simple algebra of real numbers which turns this into the equation ax + by + cz = d, where d = n · x0 is a constant determined by the choices of n and x0 . One refers to the function f (x, y, z) = ax + by + cz, with a, b, c ∈ R known and x, y, z ∈ R variable, as a linear function. So another viewpoint is that a plane in R3 is a level set of a linear function in three variables. We can regard the dot product in another way: as a 1 × 3 matrix acting on a 3 × 1 matrix by matrix multiplication: x [ a b c ] y = [ax + by + cz] = n · x , z where I’ve abused notation slightly by taking the 1 × 1 resulting matrix, and regarding it as merely the real number it contains. We take this as the definition of matrix multiplication in the case where we are given a 3 × 1 matrix (a row vector ) and a 1 × 3 matrix (a column vector ). We wish to extend this definition to matrices acting on column vectors, and we will see that the definition is powerful enough to capture both the concepts of linear systems and linear combinations. 12 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens The idea is simple: we’ll let rows of a matrix be dotted with a vector, as above, which gives us a new vector consisting of the real numbers resulting from each row-column product. Formally, we can define it in Rn , which we think of as the space of column vectors with n real entries: Ö è x1 . n x1 , . . . , x n ∈ R . .. R = x n Definition 3.1. Let A ∈ Matm×n (R) be a matrix given by ··· .. . am 1 · · · a11 .. . a1n .. . . amn Let vi denote the vector whose entries are taken from the ith row of A: ai1 .. vi := . . ain Then define the matrix-vector product as a map Matm×n (R) × Rn → Rm given by the formula v1 · x .. m x 7→ Ax := ∈R . . vn · x Example 3.1. Compute the matrix vector product Au where 1 1 1 1 A = 1 −2 3 , u = 2 . 3 4 −5 6 To compute this, we need to dot each row with the column vector u. For example, the first row gives 1 [ 1 1 1 ] 2 = 1(1) + 1(2) + 1(3) = 6 . 
3 Note that dotting a vector u with a vector v consisting entirely of ones simply sums the components u. Computing the remaining rows this way, we obtain the vector 1 1 1 1 6 Au = 1 −2 3 2 = 6 . 4 −5 6 3 12 î ó Let’s call this vector b. Recall that u was a solution to the system with augmented matrix A b ! This is no coincidence. We can view the system of equations as being equivalent to solving the following problem: find a vector x such that Ax = b. In this case we’d solved that system for x = u, and just checked via matrix-vector multiplication that indeed, it is a solution! We have one last perspective on this, which is that we found a linear combination of the columns of A: 1 1 1 6 x 1 + y −2 + z 3 = 6 4 −5 6 12 13 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens is solved by x = 1, y = 2, and z = 3. Thus, we’ve explored numerous ways to understand the solution of the equation 1 1 1 x 6 Ax = 1 −2 3 y = 6 . 4 −5 6 z 12 Let us remark on some basic properties of matrix-vector products. We know that we can view them as giving maps between Euclidean spaces of vectors. We have the following observations: 1. For any matrix A ∈ Matm×n (R) and any vectors x, y ∈ Rn , A(x + y) = Ax + Ay. 2. For any A ∈ Matm×n (R), any vector x, ∈ Rn , and scalar s ∈ R, A(sx) = s(Ax). Are these not familiar properties? Consider, for example, limits, derivatives, integrals. Another way of stating these properties is to say we have discovered operators which, upon acting on linear combinations of inputs, output a linear combination of the sub-outputs. That is, matrices take linear combinations of vectors to linear combinations of matrix-vector products, derivatives take linear combinations of differentiable functions to linear combinations of the derivatives of the simpler functions, and integrals act analogously on integrable functions. Both derivatives and integrals behave this way because limits do, so the linearity was somehow inherited. We’d gradually like to come to an understanding of the word linear describing the commonality among these various operations, which behave well with respect to linear combinations. To do this, we need to see what spaces of objects have the right properties to form linear combinations, and to ensure that we consider maps of such spaces which respect this structure in a way analogous to the above two properties. Practice: Exercise 3.1. Let A be the matrix 0 2 −1 3 . A = −2 0 1 −3 0 Compute Ax for 1. 1 x= 1 , 1 2. 3 x= 1 , 2 3. 1 0 0 x = 0 , 1 , or 0 . 0 0 1 Can you interpret the results geometrically? We will eventually have a good understanding of the geometry of the transformation x 7→ Ax for the above matrix, and others which share a certain property which it possess. (Preview: it is a skew symmetric matrix, and represents a certain cross-product operation). 14 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens We now investigate so called linear maps from Rn to Rm . Definition 3.2. A map T : Rn → Rm is called a linear transformation, or a linear map if the following properties hold: 1. For all s ∈ R and any x ∈ Rn , T(sx) = s(Tx). 2. For any pair of vectors x, y ∈ Rn , T(x + y) = Tx + Ty. We refer to T as a linear operator if these properties hold. Note the convention of often omitting parentheses between the operator T and the vector input x: Tx := T(x). Clearly, the operator TA : Rn → Rm defined by TA x = Ax defines a linear map. Let us see how linear systems fit into this framework. First, a formal description of linear systems: Definition 3.3. 
A system of linear equations in n variables is a set of m ≥ 1 equations of the form a11 x1 + a12 x2 + . . . + a1n xn a21 x1 + a22 x2 + . . . + a2n xn .. . .. . am1 x1 + am2 x2 + . . . + amn xn .. . = b1 = b2 .. . . = bm Observation 3.1. A system of linear equations can be captured by the linear transformation TA associated to a matrix A = (aij ) ∈ Matm×n (R). Thus, a linear system can be written as Ax = b for x ∈ Rn unknown. The system Ax = b is solvable if and only if b is in the image of TA . We need to recall what is meant by this terminology, so what follows is a handful of definitions regarding functions (not necessarily just linear functions; these definitions are standard and are usually introduced in high school or college algebra and precalculus courses). Definition 3.4. Let X and Y be mathematical sets. A function f : X → Y assigns to each x ∈ X precisely one y ∈ Y . X is called the domain or source, and Y is called the codomain or target. Note that one y may be assigned to multiple xs, but each x can be assigned no more than one y... this is a distinction which often trips folk up when first learning about functions. To better understand this distinction, let’s view functions as triples consisting of a domain X (the inputs), a codomain Y (the possible outputs), and a rule f assigning outputs to inputs. Note htat we need to specify all of these to completely identify a function. Now, if the domain were the keys on a keyboard, and the outputs the symbols on your screen in a basic word processing environment, you’d declare your keyboard “broken” if after pushing the same key several times, your screen displayed various unexpected results. On the other hand, if your function was determined by a preset font, you could imagine pushing many different keys, and having all of the outputs be the same. In this latter case, the keyboard is functioning, but the rule assigning outputs happens to be a silly one (every keystroke produces a ‘k’, for example). Thus, a function may assign at most one output per input, but may reuse outputs as often as it pleases. Sometimes a function is also called a map, especially if the sets involved are thought of as “spaces” in some way. We will later define structures on a set which turn them into something called a vector space, and we will study linear maps on them, which are just functions with properties analogous to those for linear functions from Rn to Rm . Definition 3.5. Given sets X and Y and a function f : X → Y , the set f (x) := Im(f ) = {y ∈ Y | y = f (x) for some x ∈ X} ⊂ Y is called the image of f . 15 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 3.6. Given sets X and Y and a function f : X → Y , and given a subset V ⊂ Y , the set f −1 (V ) := {x ∈ X | f (x) ∈ V } ⊂ X is called the preimage of V . Be warned: the preimage of a subset is merely the set of things being mapped to that subset, but is not necessarily constructed by “inverting a function” since not every function is invertible (but any subset of a codomain has defined for it a preimage by any function mapping to the codomain; that preimage may be empty!) If, on the other hand, for every y ∈ Y there is a unique x ∈ X, such that y = f (x), then we use the same notation f −1 to describe the inverse function. We will talk more about inverses after a few more definitions. Definition 3.7. 
A map f : X → Y is called surjective or onto if and only if for every y ∈ Y , there is an x ∈ X such that y = f (x); equivalently, if the preimage f −1 {y} = 6 ∅ for all y ∈ Y , f is a surjection from X to Y . A common shorthand is to write f : X Y to indicate a surjection; in class I avoid this shorthand because it is easy to miss until one becomes quite comfortable with the notion. However, in these notes, I will from time to time use it, while also reminding the reader that a particular map is a surjection by declaring it “onto” or ”surjective” in the commentary. Note that a map f : X → Y is a surjective map if and only if the image is equal to the codomain: f (X) = Y . In our keyboard analogy, we’d want to be able to produce any symbol capable of being displayed in a word processing program by finding an appropriate keystroke in order to declare that our typing with a particular font was “surjective”. Thus, the rule for producing outputs has to be powerful relative to the set of outputs: any output can be achieved by an appropriate input into a surjective function. Another remark is that if we start with some function f : X → Y , and then restrict our codomain to the image f (X) ⊆ Y , we obtain a new function, which we abusively might still label f . This function is surjective! Said another way, any function surjects onto its image, because we’ve thrown out anything in the codomain which wasn’t in the image when we restricted the codomain! So, in our typing analogy, perhaps we can’t produce all symbols with a given font, but if we declare our codomain to be only the symbols that display in that font with regular typing inputs (no fancy stuff, multiple keys at once, sequences of keystrokes, etc1 ), then we have automatically built an onto map between keys and displayable symbols in the given font. Definition 3.8. A map f : X → Y is called injective or one-to-one if and only if for every distinct pair of points x1 , x2 ∈ X, they possess distinct images: x1 6= x2 =⇒ f (x1 ) 6= f (x2 ) for all x1 , x2 ∈ X . Equivalently, for any y ∈ f (X), the preimage of y, f −1 ({y}) contains precisely one element from X. As a shorthand, one often writes f : X ,→ Y , and refers to f as an injection. Exercise 3.2. Show that a function f : X → Y is injective if and only if whenever f (x1 ) = f (x2 ), one has that x1 = x2 . Definition 3.9. A map f : X → Y is called a bijection if and only if it is both injective and surjective. Definition 3.10. Given a map f : X → Y , a map f −1 : Y → X is called an inverse for f if and only if 1 If we define our domain to be the set of all sequences of keystrokes which can produce a single symbol output, and our codomain to be all possible outputs in the font, then we have a bijection between keystroke sequences and outputs if and only if the font contains no repeated characters, and the hardcoding contains no redundant input sequences. 16 Math 235.9 - Lin. Alg. Course Notes Ä 2015 Andrew J. Havens ä (i.) f −1 ◦ f = IdX , i.e. f −1 f (x) = x for every x ∈ X, Ä ä (ii.) f ◦ f −1 = IdY , i.e. f f (y) = y for every y ∈ Y . If such a function exists, we say f is invertible. Exercise 3.3. A function f : X → Y is invertible if and only if it is a bijection. Note that there are two ways to show that some map f : X → Y is a bijection. You can show that it is both injective and surjective separately, or you can prove that an inverse exists. We’d now like to return to doing linear algebra, a little brighter with our language of functions. 
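Before doing so, a toy illustration of these definitions on small finite sets may help (plain Python; the particular sets and the rule f are made up purely for illustration).

    # A function f : X -> Y given as a rule (here, a dictionary).
    X = {1, 2, 3, 4}
    Y = {'a', 'b', 'c'}
    f = {1: 'a', 2: 'a', 3: 'b', 4: 'c'}

    image = set(f.values())                      # f(X) = {'a', 'b', 'c'}
    preimage_a = {x for x in X if f[x] == 'a'}   # f^{-1}({'a'}) = {1, 2}

    injective  = len(image) == len(X)            # False: 1 and 2 share the output 'a'
    surjective = image == Y                      # True: every element of Y is hit

    print(image, preimage_a, injective, surjective)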
Consider the following questions: Question 1: If Ax = b possess a solution x for every v ∈ Rm , then what can we say about the linear map TA : Rn → Rm ? Question 2: If Ax = b possess a unique solution x for every v ∈ Im(TA ) =: ARn , then what can we say about the linear map TA : Rn → Rm ? These answers give us a surprising motivation to study specific properties of linear maps, such as which vectors they send to the zero vector. Here I provide incomplete answers to these questions. For the first, we know that the map is surjective, though we need to discover what that means in terms of our matrix; in particular, we’d like to answer “what property must a matrix have for the associated matrix-vector multiplication map to be surjective?” Similarly, for the second question, we know that the map must be injective, and would hope to characterize injectivity in an easily computable way for a given map coming from multiplying vectors by matrices. Surjectivity, recall, is equivalent to the image being the entire codomain. So for a linear map T : Rn → Rm to be surjective, we merely require that T(Rn ) = Rm . To know when a given matrix can accomplish this, we’ll need to do more matrix algebra, and come to an understanding of the concept of dimension. For now, I’ll state without argument that there’s certainly no hope if n < m. But it’s also possible to have n >> m and still produce a map which doesn’t cover Rm (e.g. by mapping everything onto 0, or onto some linear set carved out by a linear equation system). Injectivity is more subtle. Begin first by observing that if T : Rn → Rm is linear, then T0Rn = 0Rm where 0Rn is the zero vector, consisting of n zeroes for components, and similarly for 0Rm (I will often drop the subscript when it is clear which zero vector is being invoked). This is because of the first property in the definition of linearity: 0 = 0(Tx) = T(0x) = T0 for any x ∈ Rn . So certainly, the preimage of 0 by a linear map contains 0. If it contains anything else, then the map is not injective by definition. I claim that the converse is true: if there’s only the zero vector in the preimage of the zero vector, then the linear map is an injection. The proof is constructed as a solution to the first question on the second written assignment (HW quiz 2, problem 1), in greater generality (the result, correctly stated, holds for vector spaces). We’ll discuss this proposition more later. Generally, we want to know about solutions to the homogeneous equation Ax = 0, and in particular, when there are nontrivial solutions (which means the matrix-vector multiplication map is not injective). It seems clear that this information comes from applying Gauss-Jordan to the matrix, and counting the pivots. If there are no free variables, then the homogenous system is solved uniquely, and the map is injective. If it is also surjective, we’d like to be able to find an inverse function which solves the general, inhomogeneous system Ax = b once and for all! We need a little more information about matrix algebra if we wish to accomplish this. Along the way, we will further motivate the development of abstract vector spaces. 17 Math 235.9 - Lin. Alg. Course Notes 4 2015 Andrew J. Havens Matrix algebra Suppose we wanted to compose a pair of linear maps induced by matrix multiplication: T T B A Rk −→ Rn −→ Rm , where B ∈ Matn×k (R) and A ∈ Matm×n (R). Let TAB = TA ◦ TB denote the composition obtained by first applying TB and then applying TA . Exercise 4.1. Check that TAB above is a linear map. 
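A numerical spot-check of Exercise 4.1 (not a proof) can be done if NumPy is available: for randomly chosen matrices, vectors, and a scalar, the composition x ↦ A(Bx) respects linear combinations up to rounding error.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3))   # A in Mat_{2x3}(R)
    B = rng.standard_normal((3, 4))   # B in Mat_{3x4}(R), so T_AB : R^4 -> R^2

    T = lambda x: A @ (B @ x)         # the composition T_A after T_B

    x, y = rng.standard_normal(4), rng.standard_normal(4)
    s = rng.standard_normal()

    # T(x + s y) should agree with T(x) + s T(y) up to floating point error.
    print(np.allclose(T(x + s * y), T(x) + s * T(y)))   # True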
We want to know if we can represent TAB by a matrix-vector multiplication. It turns out we can, and the corresponding matrix can be though of as a matrix product of A and B. Let us do an example before defining this product in full generality. Example 4.1. Let ñ A= 3 2 1 6 5 4 ô 1 2 ∈ Mat2×3 (R) , and B = 3 4 inMat3×2 (R) . 5 6 Thus, : R3 → R2 is given by TA y = Ay and TB : R2 → R3 is given by TB x = Bx. Given ñ TA ô x1 ∈ R2 , the map TAB : R2 → R2 sends x to A(Bx). Let y = Bx. Then x= x2 ô x1 + 2x2 1 2 ñ x1 = 3x1 + 4x2 . y= 3 4 x2 5x1 + 6x2 5 6 We can then compute Ay: ñ TAB x = Ay = A(Bx) = ñ = 3 2 1 6 5 4 ñ = ñ = x1 + 2x2 3x 1 + 4x2 5x1 + 6x2 3(x1 + 2x2 ) + 2(3x1 + 4x2 ) + 5x1 + 6x2 6(x1 + 2x2 ) + 5(3x1 + 4x2 ) + 4(5x1 + 6x2 ) " Ä = ô ä Ä ô ä 3(1) + 2(3) + 1(5)äx1 + Ä3(2) + 2(4) + 1(6)äx2 Ä 6(1) + 5(3) + 4(5) x1 + 6(2) + 5(4) + 4(6) x2 3(1) + 2(3) + 1(5) 3(2) + 2(4) + 1(6) 6(1) + 5(3) + 4(5) 6(2) + 5(4) + 4(6) 14 20 41 56 ôñ x1 x2 ô ñ = 14x1 + 20x2 41x1 + 56x2 ô ôñ x1 x2 # ô . Notice that the matrix in the penultimate line above is obtained by forming dot products from the row vectors of A with the column vectors of B to obtain each entry. This is how we will define matrix multiplication in general: we treat the columns of the second matrix as vectors, and compute matrix-vector products in order to obtain new column vectors. We are now ready to define the matrix product as the matrix which successfully captures a composition of two linear maps coming from matrix-vector multiplication. Let’s return to the setup. 18 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 4.1. Suppose we have linear maps T T B A Rk −→ Rn −→ Rm , where B ∈ Matn×k (R) and A ∈ Matm×n (R). Let TAB = TA ◦ TB : Rk → Rm denote the composition obtained by first applying TB and then applying TA . Then there is a matrix M such that TAB x = Mx for any x ∈ Rk , and we wish to define AB := M. Following the ideas of the above example, we can (exercise!) realize M = (mij ) ∈ Matm×k (R) as the matrix whose entries are given by the formula n mij = X ail blj . l=1 Thus, the columns of AB are precisely the matrix-vector products Avj where vj is the jth column of B. We refer to AB ∈ Matm×k (R) as the matrix product of A and B. Several remarks are in order. First, note that there is a distinguished identity matrix In ∈ Matn×n (R) such that for any A ∈ Matm×n , AIn = A and for any B ∈ Matn×k , In B = B. This matrix consists of entries δij which are 1 if i = j and 0 if i 6= j: In = 1 0 0 ... 0 0 1 0 ... 0 .. .. .. ∈ Matn×n (R) . . . . 0 0 ... 0 1 Clearly, for any vector x ∈ Rn , In x = x, whence it also acts as an identity for matrix multiplication, when products are defined. Notice also that the number of columns of the first matrix must match the number rows of the second matrix. In particular, if A ∈ Matm×n and B ∈ Matn×k (R), then AB is well defined, but BA is well defined if and only if k = m. Worse yet, like function composition, matrix multiplication, even if it can be defined in both orders, is in general not commutative, as the maps of the two differently ordered compositions may land in different spaces altogether! Example 4.2. Suppose A ∈ Mat2×3 (R), and B ∈ Mat3×2 (R). Then both AB and BA are defined, but AB ∈ Mat2×2 (R), while BA ∈ Mat3×3 (R)! We may hope that things are nicer if we deal with square matrices only, so that products of matrices stay in the same space. Alas, even here, commutativity is in general lost, as the next example illustrates. Example 4.3. 
Consider the following matrices: ñ 1 2 0 1 ô ñ , 0 −1 1 0 ô . We compute the products in each order: ñ ñ 1 2 0 1 ôñ 0 −1 1 0 0 −1 1 0 ôñ 1 2 0 1 ô ñ = ô ñ = 2 −1 1 0 0 −1 1 2 ô ô . Thus, matrix multiplication isn’t generally commutative, even for 2 × 2 square matrices where all products are always defined. 19 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Another remark, which would require some work to prove, is that multiplication of real matrices is associative. In particular, if A, B and C are matrices for which the products A(BC) and (AB)C are defined, then in fact these are the same and thus without ambiguity we have A(BC) = ABC = (AB)C . There are several other important constructions in matrix algebra, which rely on the structure of the Euclidean spaces of vectors we’ve been working with. Note that we can define sums of images of vectors under a linear map. This allows us to also define sums of matrices. Definition 4.2. Given A, B ∈ Matm×n (R), we can define the sum A + B to be the matrix such that for any x ∈ Rn , (A + B)x = Ax + Bx. Using the indicial notation for entries, we have then that n n n X aij xj + j=1 X X bij xj = j=1 (aij + bij )xj , j=1 which implies that A + B is obtained by adding corresponding entries of A and B. Matrices can also be scaled, by simply scaling all the entries: sA = (saij ) for any s ∈ R. In particular, we may also subtract matrices, and each matrix has an additive inverse. There’s a unique zero matrix in any given matrix space Matm×n (R), consisting of all zero entries. Denote this zero matrix by 0m×n . We define a few more operations with matrices. If A ∈ Matm×n (R), then we can define a new matrix called it’s transpose, which lives in Matn×m (R): Definition 4.3. The matrix A = (aij ) has transpose Aτ = (aji ), in other words, the transpose matrix is the matrix obtained by exchanging the rows of A for columns. Example 4.4. ñ 1 2 3 4 5 6 ôτ 1 4 = 2 5 . 3 6 Finally, we discuss, for square matrices, the notion of a matrix inverse. The inverse matrix of a matrix A ∈ Matn×n (R) is one which, if it exists, undoes the action of the linear map x 7→ Ax. In particular, we seek a matrix A−1 such that A−1 A = In = AA−1 . Recall, that the map must be bijective for it to be fully invertible. Proposition 4.1. If an inverse matrix for a matrix A ∈ Matn×n (R) exists, then one can compute it by solving the system with augmented matrix î ó A In . This can be done if and only if the reduced row echelon form of A is the n × n identity, that is, RREF(A) = In . In this case, after applying Gauss-Jordan to this augmented matrix, one has the matrix ó î In A−1 . Proof. The condition AA−1 = In gives us n systems of n equations in n variables, corresponding to the systems Avj = ej for vj a column of A−1 , and ej the jth column of the identity matrix In . The row operations to put A into RREF do not depend on ej , so applying these operations to the matrix î ó î ó A e1 . . . en = A In simultaneously solves all n systems, provided that RREF(A) = In . If RREF(A) 6= In , then there are free variables, and the columns of our hypothetical inverse cannot be uniquely determined, and 20 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens in fact, at least one of the systems will consequently be inconsistent. This latter statement will be more carefully proved when we discuss linear independence. Assuming the reduction can be completed to solve for A−1 , then the final form of the augmented matrix is clearly î ó î ó In v1 . . . 
vn = In A−1 , which gives the desired matrix inverse. Example 4.5. Let’s compute ñ 1 2 1 3 ô−1 . The augmented matrix system we need is ñ 1 2 1 0 1 3 0 1 ô . Applying the row operations R2 − R1 7→ R2 followed by R1 − 2R2 7→ R2 , one obtains ñ 1 0 3 −2 0 1 −1 1 ô . We can check easily by multiplying, in either order, to obtain the identity matrix. Exercise 4.2. Find −1 1 2 3 4 5 6 7 8 9 , if it exists. Exercise 4.3. Show that A(B + C) = AB + AC whenever the products and sums are defined. Convince yourself that s(AB) = A(sB) for any scalar s ∈ R, provided the matrix product is defined. What can you say about (A + B)τ and (AB)τ ? 5 5.1 Vector Spaces Indulging a Motivation In the previous section, we saw that matrices have algebraic properties identical in some sense to the algebraic properties of vectors in a Euclidean vector space: we can add them and scale them, and we can form linear combinations of matrices if we so please, with all these operations being commutative and associative. Matrix multiplication, on the other hand, defines linear maps of Euclidean vectors. But since we can also multiply matrices by each other under the right (dimensional) conditions, we may want a way to regard matrices as determining linear maps on the spaces of matrices. More specifically, given M ∈ Matm×n (R) and A ∈ Matn×k (R), we can define a map TM : Matn×k (R) → Matm×k (R) , given by the rule A 7→ MA . By the exercise at the end of last section, we have that TM (sA + tB) = M(sA + tB) = s(MA) + t(MB) = sTM A + tTM B . 21 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Thus, we want to be able to regard this map as a linear map since it shares the properties which defined linear maps from Rn to Rm . One way to easily realize this is to actually identify the spaces Matn×k (R) with some Euclidean vector space. By concatenating the columns of matrices in some chosen order, we can create a bijective map from Matn×k (R) to Rn×k . Of course, there’s not a single natural way to do this; we could also concatenate rows, or scramble the entries up somehow, as long as we do it consistently for all matrices. Example 5.1. We can identify Mat2×2 (R) with R4 as follows: given a matrix ñ a b c d ô , we can map it to the 4-vector a c b d , obtained by concatenating the first and second columns but we can also map it to the 4-vector a b c d , obtained by concatenating rows. Neither choice is better than the other, so we say that our identification, whichever we choose, is non-canonical, since there’s not a particularly more natural choice. Exercise 5.1. Given A ∈ Matn×k (R), how many different ways can one identify A with a vector in Rnk which contains the same entries as A? How many ways can we bijectively map Matn×k (R) and Rnk ? 5.2 The Big Definition Another approach, which is quite fruitful, is to investigate spaces which have the appropriate general algebraic structure to support a notion of “linear map”. This brings us to the study of vector spaces. Definition 5.1. A vector space is a set V whose elements will be called vectors, together with additional structure depending on a pair of operations and a choice of a scalar field F (for now, mentally picture F = R ,the field real numbers, or F = Q, the field of rational numbers; other examples will be given later including complex numbers C and finite fields.) The operations are vector addition and scalar multiplication. 
Vector addition takes two vectors x, y ∈ V and produces a (possibly new) vector x + y ∈ V , while scalar multiplication takes a scalar s ∈ F and a vector x ∈ V and produces a (possibly new) vector sx ∈ V . These operations are required to satisfy 8 axioms: Axiom 1: Commutativity of vector addition: for any x, y ∈ V , x + y = y + x. Axiom 2: Associativity of vector addition: for any x, y, z ∈ V , x + (y + z) = x + y + z = (x + y) + z. 22 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Axiom 3: Identity for vector addition: there exists a vector 0 ∈ V such that for any x ∈ V , x + 0 = x. Claim. The zero vector 0 ∈ V is unique. ˜ ∈ V is a Proof. This follows from the preceding axioms: Assume we have found 0. Then if 0 ˜ ˜= vector such that x + 0 = x for any x ∈ V as well, then taking x = 0 one has 0 = 0 + 0 ˜ + 0 = 0, ˜ showing that our new candidate was in fact the same as the zero vector. 0 Axiom 4: Inverse for vector addition: for any x ∈ V , there is an inverse element (−x) such that x + (−x) = 0. Axiom 5: Scalar distributivity over vector addition: for any s ∈ F and any x, y ∈ V , s(x + y) = sx + sy. Axiom 6: vector distributivity over scalar addition: for any x ∈ V and any scalars r, s ∈ F, (r + s)x = rx + sx. Axiom 7: Associativity of scaling: for any x ∈ V and any scalars r, s ∈ F, s(rx) = (sr)x. Axiom 8: Scalar identity: for any x ∈ V , 1x = x, where 1 ∈ F is the multiplicative identity for the field. A set V with vector addition and scalar multiplication satisfying the above eight axioms for a field F is called a “vector space over F” of simply “an F-vector space”. Exercise 5.2. Let V be an F-vector space. Prove that for any given x ∈ V , the inverse (−x) is unique, and equals −1(x). Given the abstraction of the above definition, let us convince ourselves that it is a worthwhile definition by exhibiting a plethora of examples. The longer one studies math, the more one discovers many ubiquitous vector spaces, which vindicate the choices made in crafting such a long, abstract definition. After a while, one potentially becomes disappointed when one encounters something that’s almost a vector space (modules over commutative rings with zero divisors: I’m looking at you!), but rest assured, there are plenty of vector spaces out there to become acquainted with! The following examples are also “thought exercises” where you should convince yourself that the examples meet the conditions set forth in the above axioms. Example 5.2. The obvious example is Rn : every axiom seems to have been picked from observing the essential structure of Rn as a vector space over R. Example 5.3. It doesn’t take much work at this point to show that Matm×n (R) is an R-vector space for any positive integers m and n. Convince yourself that all eight axioms are met if we take matrix addition as the vector addition, and scaling a matrix as the scalar multiplication operation. Example 5.4. Let Pn (R) denote the space of all polynomials of degree less than or equal to n with real coefficients: Pn (R) = {a0 + a1 x + . . . an xn | a0 , . . . an ∈ R} . Then I claim this is naturally a vector space over R with the vector addition given by usual addition of polynomials, and the scalar multiplication given by scaling polynomials in the usual way. 23 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.5. The complex numbers C := {a + bi | a, b ∈ R, i2 = −1} are naturally a vector space over the real numbers, but since C is also a field, C can be regarded as a C-vector space. 
in general, any field F is itself a vector space over F, and Fn may be defined as it was for Rn . Fn inherits a natural vector space structure, much as R did, by allowing componentwise addition of vectors using the additive structure of F, and allowing the multiplicative structure of F to determine the scalar action componentwise. Example 5.6. Let p be a prime number. Then there exists a field Fp which has p elements. We can regard this field as the set of remainder classes modulo the prime p, and so we write Fp = {0, 1, . . . , p − 1} as a set. The additive structure is determined by taking the remainder of addition modulo p, and the multiplicative structure is determined likewise. For example, if p = 3, one has F3 = {0, 1, 2} as a set, and the operations are 0 + 0 = 0, 0 + 1 = 1, 0 + 2 = 2 1 + 1 = 2, 1 + 2 = 0 0(1) = 0(2) = 0(0) = 0, 1(1) = 1, 1(2) = 2, 2(2) = 1 . Given any Fp , we can construct Fnp which is certainly an Fp -vector space, but it will contain only pn elements. We can also construct the space Pn (Fp ) of polynomials of degree less than or equal to n with Fp coefficients. These spaces are interesting in their own right within the study of number theory. However, a simple example shows that these are not so abstract: let p = 2. F2 is called the binary field. Recall that any given integer m possess a binary expansion, which is an expression of the form m = a0 20 + a1 21 + a2 22 + . . . an 2n for some integer n, where a0 , . . . an ∈ F2 are equal either 0 or 1. This is just a polynomial in Pn (F2 ) evaluated with x = 2! Thus, there is a correspondence between binary expansions of integers and polynomials in the vector space Pn (F2 ). As an example, consider the integer 46. We know that 46 = 32 + 8 + 4 + 2 = 25 + 23 + 22 + 21 . The corresponding polynomial is then 0 + 1x + 1x2 + 1x3 + 0x4 + 1x5 ∈ P5 (F2 ), while the binary expansion is just the list of these coefficients (with highest degree first): 4610 = 1011102 . Example 5.7. Fix an interval I ∈ R, and let C 0 (I, R) denote the set of all continuous R-valued functions on I. Convince yourself that this is indeed a vector space. One can also give a vector space structure to continuously differentiable functions C 1 (I, R) defined over an open interval I ∈ R. 5.3 Linear Maps and Machinery We now can proceed to define and study linear maps between vector spaces. What we will see is that the phenomena in Rn aren’t particularly special to Rn , but rather a consequence of vector space structure. We will have the power to prove facts for all vector spaces and linear maps, which gives us the power to transfer ideas about how to solve problems in one space to other spaces. Our definition of linear map won’t appear any different, but we see that it is truly the two properties we’ve settled on which create much of the rigidity in the study of linear algebra. Definition 5.2. A map T : V → W of F-vector spaces is called an F-linear map or an F-linear transformation if (i.) for any u, v ∈ V , T(u + v) = Tu + Tv, (ii.) for any s ∈ F and any v ∈ V , T(sv) = sTv. 24 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens If the field is understood, one simply says ”linear map”, ”linear function”, or ”linear transformation.” The symbol T is often referred to as a linear operator on V . In analogy to how in elementary algebra, one studies roots of polynomials, i.e. points which a polynomial maps to 0, one may concern oneself with solutions v to the homogeneous linear equation Tv = 0W for a linear map T : V → W . 
We have a special name for the set of solutions to such an equation: Definition 5.3. The kernel of an F-linear map T : V → W of F-vector spaces is the preimage of the zero vector 0W ∈ W : ker T := T−1 {0W } = {v ∈ V | Tv = 0W } . Thus, the kernel of a linear map is the set of solutions to the homogeneous equation determined by that map: v ∈ ker T ⇐⇒ Tv = 0W . Proposition 5.1. A linear map T : V → W is an injection if and only if the kernel is trivial, i.e. ker T = {0V }. Proof. The proof is built in HW quiz 2, problem 1. Example 5.8. We’ve already encountered linear maps of R-vector spaces extensively, and in the case of a linear map given by matrix-vector multiplication, we can easily characterize injectivity. In particular, if A ∈ Matm×n (R) is a matrix determining a linear map TA : Rn → Rm , it’s injective if and only if the homogeneous equation Ax = 0 ∈ Rm is uniquely solved by the zero vector 0 ∈ Rn . This occurs if and only if A has n pivots. If there are fewer than n pivots, we have free variables, and can write the solution to the homogeneous equation as a linear combination of vectors which generate or span the kernel. We’ve seen this basic procedure performed when solving for the intersection of two planes, though in that case there was an additional vector with scalar weight 1, since we were solving an inhomogeneous equation of the form Ax = b for b ∈ R2 . So by the above discussion, we can detect injectivity of the map x 7→ Ax by examining the row reduction of A and counting the pivot entries. Note also that this implies that if n > m, there is no hope for injectivity, as there can be at most as many pivots as the minimum of n and m. We will often abuse notation and write ker A for the kernel of the linear map TA , and refer to this kernel as the null space of A. This language will be better justified when we study subspaces and the rank-nullity theorem in coming lectures. We also have a special name for bijective linear maps, owing to the fact that linear maps preserve vector space structure well: Definition 5.4. Given two vector spaces V and W over a field F, an F-linear map T : V → W is called a linear isomorphism or a vector space isomorphism if it is a bijection. In this case we say that V and W are isomorphic as F-vector spaces, and we write V ∼ =W. If it is clear we are dealing with two vector spaces over a common field, we may simply say that the map is an isomorphism and that the vector spaces are isomorphic. Exercise 5.3. Show that Pn (F) ∼ = Fn+1 by exhibiting a linear isomorphism. Exercise 5.4. Deduce that if A ∈ Matn×n (R) is an invertible matrix, it determines a selfisomorphism of Rn . We call such a self-isomorphism a linear automorphism. 25 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Exercise 5.5. Compute the kernel of the linear map TA with matrix 4 1 4 A= 1 1 1 . 4 1 4 Describe the general solution to Ax = b in terms of the components b1 , b2 , b3 of b and the elements of the kernel (in particular, you should be able to express the solution as a linear combination of some vectors; what is this geometrically?) 5.4 Subspaces An important concept in the study of vector spaces is that of a subspace. The idea is that linear equations carve out smaller vector spaces within larger ones, and vector spaces nest well in other vector spaces. Definition 5.5. Let V be an F-vector space, U ⊂ V a nonempty subset. We call U a vector subspace or linear subspace if and only if the following two conditions hold: (i.) for any u, v ∈ U , u + v ∈ U , (ii.) 
for any s ∈ F and any u ∈ U , su ∈ U . Exercise 5.6. Verify that a subset U of a vector space V over F is a vector subspace if and only if it is itself a vector space with the operations it inherits from V . Exercise 5.7. Convince yourself (and me, if you care) that U ⊂ V is a vector subspace if and only if it passes the following subspace test: For any u, v ∈ U and any s ∈ F, u + sv ∈ U . This is analogous to the statement that a map T : V → W is F-linear if and only if T(u + sv) = Tu + sTv for any u, v ∈ U and any s ∈ F. Example 5.9. Given any vector space V , V is a vector subspace of itself, called the improper subspace. A subspace U ⊂ V is called proper if and only if it is not all of V . Example 5.10. For any vector space V , {0} is a vector subspace of V , called the trivial subspace. This justifies the language “the kernel is trivial”, as the kernel is trivial if and only if it equals the trivial subspace. We often drop the braces and write 0 for the subspace as well as the element. Example 5.11. If T : V → W is a linear map, then the kernel ker T ⊂ V is a subspace and similarly the image T (V ) ⊂ W is a subspace. Let us prove the former, and leave the latter as an exercise. We have to check the two conditions of being a subspace, namely, whether it is closed under addition and scalar multiplication. By some trickery, one can claim that it suffices to check that for any u, v ∈ ker T, and any scalar s, u + sv ∈ ker T. (Why?) This is readily verified: T(u + sv) = Tu + sTv = 0 + s0 = 0 =⇒ u + sv ∈ ker T . Thus the kernel of the map T is a subspace of V . Exercise 5.8. Check that Pk (F) ⊂ Pn (F) is naturally a subspace so long as k ≤ n. Define the space of all polynomials over F P(F) := {p(x) ∈ Pn (F) | some n ∈ N} = ∪n Pn (F) . Then convince yourself that Pn (F) ⊂ P(F) is a subspace for any nonnegative integer n. 26 Math 235.9 - Lin. Alg. Course Notes 2015 Ä Andrew J. Havens ä Example 5.12. We can view the set C 1 (a, b), R of continuously differentiable functions on an Ä ä open interval (a, b) as sitting inside of continuous functions C 0 (a, b), R , indeed, as a vector subspace (prove this to yourself!) The derivative map provides a linear map Ä ä Ä ä d : C 1 (a, b), R → C 0 (a, b), R , dx and since the kernel of this map is nontrivial (itäconsists of all the constant functions, which as Ä a vector subspace is R sitting inside C 1 (a, b), R ), we know the map is not injective, and so in Ä ä Ä ä particular, it is not the map giving us the inclusion C 1 (a, b), R ,→ C 0 (a, b), R . On the other hand, by the fundamental theorem of calculus, the map is surjective, since we can always integrate Ä ä 0 a continuous function f ∈ C (a, b), R to obtain a continuously differentiable function Z x g(x) := a Ä ä f (t)d t , g(x) ∈ C 1 (a, b), R , d g(x) = f (x) . dx Thus, we’ve furnished an example of a proper vector subspace which possesses a surjective but not injective linear map onto its parent vector space. This is possible because the spaces are infinite dimensional – a notion we will make precise soon! We will also show that these oddities don’t occur in the finite dimensional cases. Before we can define dimension properly, we must carefully come to understand the role played by linear combinations in building subspaces, and in describing elements of vector spaces. Thus, we will define linear combinations and linear independence for a general vector space V over a field F. Definition 5.6. Let V be an F-vector space. Given a finite collection of vectors {v1 , . . . 
, vk } ⊂ V , and a collection of scalars (not necessarily distinct) a1 , . . . , ak ∈ F, the expression a 1 v1 + . . . + a k vk = k X a i vi i=1 is called an F-linear combination of the vectors v1 , . . . , vk with scalar weights a1 , . . . ak . It is called nontrivial if at least one ai 6= 0, otherwise it is called trivial. As alluded to, one major use of linear combinations is to construct new subspaces. Consider looking at the collection of all linear combinations made from a collection of vectors. We will call this their span: Definition 5.7. The linear span of a finite collection {v1 , . . . , vk } ⊂ V of vectors is the set of all linear combinations of those vectors: span {v1 , . . . , vk } := ( k X i=1 ) ai vi ai ∈ F, i = 1, . . . , k . If S ⊂ V is an infinite set of vectors, the span is defined to be the set of finite linear combinations made from finite collections of vectors in S. Proposition 5.2. Let V be an F-vector space. Given a finite collection of vectors S ⊂ V , the span span (S) is a vector subspace of V . Proof. A sketch was given in class. You are encouraged to go through a careful argument and determine which axioms of being a vector space are applied where. 27 Math 235.9 - Lin. Alg. Course Notes 5.5 2015 Andrew J. Havens Linear Independence and Bases Definition 5.8. A collection {v1 , . . . , vk } ⊂ V of vectors in an F-vector space V are called linearly independent if and only if the only linear combination of v1 , . . . , vk equal to 0 ∈ V is the trivial linear combination: {v1 , . . . , vk } linearly independent ⇐⇒ k ÄX ä ai vi = 0 =⇒ a1 = . . . = ak = 0 . i=1 Otherwise we say that {v1 , . . . , vk } is linearly dependent. Proposition 5.3. {v1 , . . . , vk } is linearly dependent if and only if there is some vi ∈ {v1 , . . . , vk } which can be expressed as a linear combination of the vectors vj for j 6= i. Proof. Suppose {v1 , . . . , vk } is linearly dependent . After possibly relabeling we can assume that P there’s a tuple (a1 , . . . , ak ) ∈ Fk such that a1 6= 0, and ki=1 ai vi = 0. Then rearranging, one has v1 = k X Å − i=2 ã ai vi , a1 and thus we have expressed one of the vectors as a linear combination of the others. Conversely, if there’s a vector vi ∈ {v1 , . . . , vk } such that it can be expressed as a linear P combination of the other vectors, then we have vi = i6=j aj vj for some constants aj ∈ F, and P rearranging one has vi − i6=j aj vj = 0, which is a nontrivial linear combination equal to the zero vector. This establishes that {v1 , . . . , vk } is linearly dependent. Example 5.13. Let V = Rn , and suppose {v1 , . . . , vk } ⊂ Rn is a collection of k ≤ n vectors. Then we have the following proposition: Proposition 5.4. The set of vectors {v1 , . . . , vk } is linearly independent if and only if the matrix A = [v1 . . . vk ] has k pivots. Proof. Consider the system Ax = 0. If 0 6= ker A := ker T, then there’s some nonzero x ∈ Rn such P that ni=1 xi vi = 0, which implies that {v1 , . . . , vk } is linearly dependent. Thus, {v1 , . . . , vk } is linearly independent if and only if ker A is trivial, which is true if and only if there k pivots. Definition 5.9. A vector space V over F is called finite dimensional if and only if there exists a finite collection S = {v1 , . . . , vk } ⊂ V such that the F-linear span of S is V . If no finite collection of vectors spans V , we say V is infinite dimensional. Proposition 5.5. 
Any finite dimensional F-vector space V contains a linearly independent set B ⊂ V such that span B = V , and moreover, any other such set B 0 ⊂ V such that span B 0 = V has the same number of elements as B. Proof. Let V be a finite dimensional F-vector space. Observe that because the V is finite dimensional, by definition there exists a subset S ⊂ V such that span S = V . If S is linearly independent then we merely have to show that no other linearly independent set has a different number of elements. On the other hand, if S is linearly dependent, then since S is finite, we can remove at most finitely many vectors in S without changing the span. The claim is that removing a vector which is a linear combination of the remaining vectors does not alter the span. This is obvious, since the span is the set of linear combinations of the vectors, so if we throw some vector w out of S, the set S \ {w} still contains w in its span, and hence any other linear combination which potentially involved w can be constructed using only S \ {w}. Thus, after throwing out finitely many vectors, we have a set B which is linearly independent, such that span B = span S = V . It now remains to show that the size of any linearly independent set B 0 which also spans V is the same as that of B. To do this we need the following lemma: 28 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Lemma 5.1. If S ⊂ V is a finite set and B ⊂ span S is a linearly independent set, then |B| ≤ |S|. Assuming the lemma, let’s finish the proof of the proposition. Suppose |B| = n and |B 0 | = m. From the lemma, since span B = V ⊃ B 0 and B 0 is linearly independent, we deduce that m ≤ n from the lemma. We similarly conclude that since span B 0 = V ⊃ B and B is linearly independent, n ≤ m. Thus m = n and we are done. We now prove the lemma: Proof. Let S = {v1 . . . vm } and suppose B ⊂ span S is a linearly independent set. Choose some finite subset E ⊂ B. Since B is linearly independent, so is E. Suppose E = {u1 , . . . uk }. Since E ⊂ span S, there’s a linear relation uk = a1 v1 + . . . am vm . Since uk 6= 0 by linear independence of E, we deduce that at least one aj 6= 0. We may assume it is a1 whence we can write v1 as a linear combination of {uk , v2 . . . vm }. Note that E is also in the span of this new set. We readily conclude that uk−1 is in the span of this new set, and repeating the argument above we can claim v2 ∈ span {uk , uk−1 , v3 . . . vm }. Note that E is also in the span of this new set. We can repeat this procedure until either we’ve used up E, in which case k ≤ m, or until we run out of elements of S. If we were to run out of elements of S, without running out of elements of E, then since E is in the span of each of the sets we are building, we’d be forced to conclude that there are elements of E which are linear combinations of other elements in E, which contradicts its linear independence. Thus, it must be the case that k ≤ m, as desired. Definition 5.10. Given a vector space V over F, we say that a linearly independent set B such that V = span F B is a basis of V . Thus, the above proposition amounts to stating that we can always provide a basis for a finite dimensional vector space, and moreover, any basis will have the same number of elements. Definition 5.11. Given a finite dimensional vector space V over F, the dimension of V is the size of any F-basis of V : dimF V := |B| . 
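As a quick computational illustration of Proposition 5.4 and the basis-extraction argument above (this sketch is not part of the original notes, and the vectors are invented; it assumes the Python library sympy is available), one can let a computer do the row reduction, read off the pivot columns, and keep only the corresponding vectors to obtain a basis of the span:

    from sympy import Matrix

    # Three made-up vectors in R^3; v3 = 2*v1 + 3*v2, so the set is linearly dependent.
    v1 = Matrix([1, 0, 2])
    v2 = Matrix([0, 1, 1])
    v3 = Matrix([2, 3, 7])
    A = Matrix.hstack(v1, v2, v3)

    rref_form, pivot_cols = A.rref()
    print(pivot_cols)                   # (0, 1): only two pivots, so the three vectors are dependent

    # Discarding the non-pivot columns leaves a linearly independent set with the
    # same span -- a basis of span{v1, v2, v3}, which is therefore 2-dimensional.
    basis = [A.col(j) for j in pivot_cols]
    print(len(basis))                   # 2

This is exactly the "throw out redundant vectors" procedure from the proof above, carried out by Gauss-Jordan elimination.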
A remark: the subscript F is necessary at times, since a given set V may have different vector space structures over different fields, and consequently different dimensions. Specifying the field removes ambiguity. We will see examples of this shortly. Example 5.14. The standard basis of Fn is the set BS := {e1 , . . . , en } consisting of the vectors which are columns of In . In particular, for any x ∈ Fn : x1 n X .. x = . = x1 e1 + . . . + xn en = xi ei . i=1 xn Clearly, the vectors of BS are linearly independent since they are columns of the identity matrix. Exercise 5.9. Show that if A ∈ Matn×n (R) is an invertible matrix, then the columns of A form a basis of Rn . Note that dimR Rn = n as expected, either by the previous example or this one. Example 5.15. A choice of basis for Pn (F) can be given by the set of monomials of degree less than n: {1, x, . . . , xn }. Clearly, any polynomial with coefficients in F is an F-linear combination of these, as indeed, that is how one defines polynomials! We merely need to check linear independence. This is clear since the only polynomial equal to the zero polynomial is the zero polynomial, and so any F-linear combination of the monomials equal to the zero polynomial necessarily has all zero coefficients, and thus is the trivial linear combination. Note that there are n + 1 monomials in the basis, so dimF Pn (F) = n + 1. 29 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.16. The complex numbers C, regarded as a real vector space, have a basis with two elements: {1, i}, and thus dimR C = 2. But as a vector space over the field C, a basis choice could be any nonzero complex number, and in particular, {1} is a basis of C as a vector space over C, so dimC C = 1. More generally, dimR Cn = 2n while dimC Cn = n. Note that for any field, dimF Fn = n, which is established for example by looking at the standard basis. Example 5.17. Let us examine an analogue of the standard basis in the case that our vector space is the space of real m × n matrices, Matm×n (R). Define a basis BS = {eij | 1 ≤ i ≤ m, 1 ≤ j ≤ n, i, j ∈ N} , such that eij is the matrix containing a single 1 in the (i, j)-th entry, and zeros in all other entries. It is easy to check that this is an R-basis of Matm×n (R), and thus that Matm×n (R) is an mndimensional real vector space. Exercise 5.10. Consider the set of all matrices Eij ∈ Matn×n (F) defined by Eij := In − eii − ejj + eij + eji . (a) Given an n × n matrix A, what is Eij A? (b) Describe the vector space span F {Eij | 1 ≤ i, j ≤ n, i, j ∈ N} ⊂ Matn×n (F), and give a basis for this vector space. (Hint: first figure out what happens for 2 × 2 matrices and 3 × 3 matrices, then generalize). The notion of basis is useful in describing linear maps, in addition to giving us a notion of “linear coordinates”. Let us examine the connection between bases and linear maps. The first result in this direction is the following theorem: Theorem 5.1. Let V be a finite vector space over F and B = {v1 , . . . , vn } a basis of V . Let W be a vector space and {w1 , . . . wn } ⊂ W a collection of not necessarily distinct vectors. Then there is a unique linear map T : V → W such that Tv1 = w1 , . . . Tvn = wn . Proof. For any v ∈ V we can write v as a linear combination of the basis vectors. Thus, let P v = ni=1 ai vi . Suppose T : V → W is a linear map which satisfies the conditions Tv1 = w1 , . . . Tvn = wn . Then the claim is that the value Tv is determined uniquely. 
Indeed, since T is linear, one has

Tv = T(a_1 v_1 + . . . + a_n v_n) = a_1 Tv_1 + . . . + a_n Tv_n = a_1 w_1 + . . . + a_n w_n .

Moreover, we may construct a unique T from the data Tv_i = w_i by the above formula, and define this to be the linear extension of the map on the basis.

This proposition tells us that if we determine the values to which basis vectors transform, then we can linearly extend to describe a linear map on all of V, and so the following corollary should come as no surprise (we've alluded to the fact before in comments and exercises):

Corollary 5.1. Let V be a finite dimensional vector space over a field F. Then V is non-canonically isomorphic to F^n, where n = dim_F V.

(The term "non-canonical" in mathematics refers to the fact that the construction depends on choices in such a way that there is no natural preference. In this case, there are many isomorphisms that may exist between V and F^n, and we have no reason to prefer a specific choice outside of specific applications.)

Proof. Since V is finite dimensional, we may find some basis B = {v_1, . . . , v_n}, where dim_F V = n. Then define L_B on B by specifying

L_B v_i = e_i , i = 1, . . . , n ,

where {e_1, . . . , e_n} = B_S ⊂ F^n is the standard basis. Then by the above proposition, we may linearly extend L_B to a linear map L_B : V → F^n. It is clearly an isomorphism: the assignment e_i ↦ v_i on B_S determines a unique linear map L_B^{-1} : F^n → V, which clearly satisfies L_B ◦ L_B^{-1} = Id_{F^n} and L_B^{-1} ◦ L_B = Id_V.

Example 5.18. Regarding C^n as a real vector space, we have an isomorphism C^n ≅ R^{2n}. Similarly, we have P_n(R) ≅ R^{n+1} and Mat_{m×n}(R) ≅ R^{mn}. This latter fact justifies the notation that many authors (including Bretscher) exploit of writing R^{m×n} instead of Mat_{m×n}(R).

Exercise 5.11. For each of the above examples, write down explicit isomorphisms (in particular, produce a basis and describe how to map it to a basis of an appropriate model vector space R^k).

Exercise 5.12. Explain why there can be no invertible linear map T : R^3 → R^2. (This will be clarified more deeply in the discussion of the Rank-Nullity theorem; try to prove this using a simple argument about bases!)

We will later explore the use of the maps L_B : V → R^n of real vector spaces to discuss linear coordinates and change of basis matrices. For now, we finish with another important example: using a basis to describe a linear map via a matrix. Since any n-dimensional vector space over R is isomorphic to R^n, it suffices to understand how to write matrices for maps T : R^n → R^m.

Theorem 5.2. Let T : R^n → R^m be a linear map. Then there is a matrix A ∈ Mat_{m×n}(R), called the matrix of T relative to the standard basis, or simply the standard matrix of T, such that Tx = Ax. The matrix has columns given by the effect of the map T on the standard basis:

A = [Te_1 . . . Te_n] ∈ Mat_{m×n}(R) .

Proof. The proof is a simple computation. Let x = x_1 e_1 + . . . + x_n e_n. Then

Tx = T(x_1 e_1 + . . . + x_n e_n) = x_1 Te_1 + . . . + x_n Te_n = [Te_1 . . . Te_n] (x_1, . . . , x_n)^τ = Ax .

Remark 5.1. If we think about the one line proof above, it should be clear that the image of the linear map T : R^n → R^m is nothing more than the span of the columns of the matrix A representing the map:

T(R^n) = span {Te_1, . . . , Te_n} =: Col A .

The last notation is new: for any matrix A ∈ Mat_{m×n}(R), Col A is the subspace of R^m spanned by the columns of A. This is called the column space of A, though we may also just refer to it as the image of the matrix map x ↦ Ax. We'll see the column space again shortly.
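Before turning to a concrete example, here is a small computational sketch of Theorem 5.2 (not part of the original notes; the particular map T below is invented for illustration, and sympy is assumed to be available). The standard matrix is assembled column by column from the images of the standard basis vectors:

    from sympy import Matrix, eye

    def T(x):
        # An invented linear map R^3 -> R^2: T(x1, x2, x3) = (x1 + 2*x2, 3*x3).
        return Matrix([x[0] + 2*x[1], 3*x[2]])

    n = 3
    E = eye(n)                                   # the columns of E are e_1, ..., e_n
    A = Matrix.hstack(*[T(E.col(j)) for j in range(n)])
    print(A)                                     # Matrix([[1, 2, 0], [0, 0, 3]])

    # Sanity check: A*x agrees with T(x) on an arbitrary vector.
    x = Matrix([5, -1, 2])
    print(A * x == T(x))                         # True

The design point is exactly the one made in the proof: a linear map is completely determined by its values on a basis, so recording those values as columns records the whole map.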
31 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 5.19. Let us demonstrate how to construct a matrix representing the linear map d : P2 (R) → P1 (R) . dx Since matrices describe maps between vector Euclidean vector spaces, we need to exploit the isomorphisms ϕ2 : P2 (R) → R3 , ϕ2 (a0 + a1 x + a2 x2 ) = a0 e1 + a1 e2 + a3 e3 ∈ R3 , ϕ1 : P1 (R) → R2 , ϕ1 (a0 + a1 x) = a0 e1 + a1 e2 ∈ R2 The matrix we desire will actually then be the standard matrix of the map ϕ1 ◦ d 3 2 ◦ ϕ−1 2 : R →R dx which completes the diagram: d/dx P2 (R) P1 (R) ϕ2 ϕ1 R3 Note that since d dx p(x) ϕ1 ◦ d dx ◦ ϕ−1 2 R2 = a1 + 2a2 x, one has that the bottom map in the diagram is defined by ãÄ ä d −1 ϕ1 ◦ ◦ ϕ2 p(x) = a1 e1 + 2a2 e2 , dx Å and by applying our theorem we find that the desired matrix representing the derivative is ñ A= 0 1 0 0 0 2 ô . Observe that the first column is a zero column, and this is entirely sensible since the derivative of a constant is 0. Exercise 5.13. Expand on the above example and describe matrices representing the derivative of polynomials in Pn (R), and do the same for the integral. (This is part of exercise 4 on HW quiz 3.) Exercise 5.14. Fix a real number a ∈ R and a positive R ∈ R and denote by I the open interval (a − R, a + R). Denote by C ω (I, R) the space of power series centered at a and convergent on I. (a) Show that C ω (I, R) is a vector space over R with vector addition and scalar multiplication defined in the natural ways. (b) Is this vector space finite dimensional? (c) Describe a basis of C ω (I, R). (d) Give an example of a linear transformation T : C ω (I, R) → C ω (I, R) that is surjective but not injective. Can you find an example of a linear transformation of C ω (I, R) which is injective, has image of the same dimension as C ω (I, R), but is not surjective? 32 Math 235.9 - Lin. Alg. Course Notes 6 2015 Andrew J. Havens Rank and Nullity, and the General Solution to Ax = b This section introduces us to the notions of rank and nullity, and will also give us the relation between them. The theorem relating them, called the rank-nullity theorem, is also sometimes affectionately referred to as the fundamental theorem of linear algebra. This is because it gives us a rigid relationship between the dimensions of the domain of a linear map, the dimension of its image, and the dimension of its kernel, effectively telling us that linear maps can at worst collapse a subspace (the kernel, if it is nontrivial), leaving the image as a possibly lower dimensional shadow of the source vector space, sitting inside the target vector space. We will then discuss the general solution of linear systems. 6.1 Images, Kernels and their Dimensions Let us introduce the main definitions and their elementary properties. Throughout, let V be a finite dimensional vector space of a field F, and let T : V → W be a linear map. Definition 6.1. The rank of the linear map T : V → W is the dimension of the image: rank T := dimF T (V ) . It is sometimes abbreviated as rk T. Remark 6.1. Note that rank T ≤ dimF V and rank T ≤ dim W . Exercise 6.1. Explain the above remark about bounds on the rank of a linear map. Definition 6.2. The nullity of the linear map T : V → W is the dimension of the kernel: null T := dimF ker T . Remark 6.2. Observe that null T ≤ dimF V , but it need not be bounded by the dimension of W . Exercise 6.2. Explain the above remark about the bound on the nullity of a linear map. 
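To connect the rank and nullity just defined with Example 5.19, here is a small sympy sketch (an illustration added here, not part of the course material) that rebuilds the matrix of d/dx : P_2(R) → P_1(R) in the monomial coordinates used above and reads off its rank and nullity:

    from sympy import Matrix

    def D(coeffs):
        # coeffs = (a0, a1, a2) represents a0 + a1*x + a2*x^2 in P_2(R);
        # its derivative a1 + 2*a2*x has coordinates (a1, 2*a2) in P_1(R).
        a0, a1, a2 = coeffs
        return Matrix([a1, 2*a2])

    # Columns are the images of the coordinate vectors of the monomials 1, x, x^2.
    A = Matrix.hstack(D((1, 0, 0)), D((0, 1, 0)), D((0, 0, 1)))
    print(A)                      # Matrix([[0, 1, 0], [0, 0, 2]]), matching Example 5.19

    print(A.rank())               # 2: the image is all of P_1(R)
    print(len(A.nullspace()))     # 1: the kernel consists of the constant polynomials

Note that rank (2) is bounded by both dimensions, while the nullity (1) is only bounded by the dimension of the domain, as the remarks above indicate.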
Let us consider how nullity and rank are computed when a linear map is given by matrix multiplication. Consider, for example, a linear map T : R^n → R^m given by the rule Tx = Ax for A ∈ Mat_{m×n}(R). Recall that the image of the map T is the same as the set of all vectors which can be written as linear combinations of the columns of A (this is why some books call it the column space of A). Thus, the rank is the number of linearly independent columns, as a collection of linearly independent columns of A is a basis for the image. But we know that a set of k vectors is linearly independent if and only if the matrix whose columns are the k vectors has k pivots, and so we deduce that the rank of the map T is precisely the number of pivots of A. The nullity is the dimension of the kernel, and each free variable of A contributes a vector which is in a basis of the kernel (think about using Gauss-Jordan to solve Ax = 0). It is thus clear that the nullity of the map T can be computed by counting free variables, or equivalently by subtracting the number of pivots from the total number of columns of A. We then have the obvious relationship: rank plus nullity gives the number of columns, which is just the dimension of the domain R^n. This is the rank-nullity theorem, as stated for matrices. We will show it generally:

Theorem 6.1 (Rank-Nullity). Let V be a finite dimensional F-vector space, and let T : V → W be a linear map. Then

dim_F V = dim_F T(V) + dim_F ker T = rank T + null T .

Proof. Since V is finite dimensional, there exists a basis B of V. Moreover, since ker T ⊂ V is a subspace, it is itself a finite dimensional vector space, and it thus possesses a basis. Let B = {u_1, . . . , u_k, v_1, . . . , v_r} be a basis of V such that ker T = span {u_1, . . . , u_k}. We claim several things: that we can indeed procure a basis of V satisfying this property, and that {Tv_1, . . . , Tv_r} is a basis of the image.

For the first claim, note that we can start with any basis B̃ of V and some basis {u_1, . . . , u_k} of K = ker T ⊂ V, where k = dim_F K. Assume that dim_F V = n. Then to produce a basis of the form above, we start by replacing a vector of B̃ by u_1. If the resulting set is linearly dependent, then we choose a different vector in B̃ to be replaced by u_1. I claim there is a choice such that the modified set is still a basis. For if not, then u_1 is in the span of any n − 1 vectors in B̃. But then we have a pair of distinct linear relations involving u_1, and by subtracting these we obtain a nontrivial linear relation involving the elements of B̃, contradicting the linear independence of the vectors in B̃. Thus, we may choose to replace a vector of the basis with u_1 to form a different basis. The remaining n − 1 vectors of this new basis (those other than u_1) span an (n − 1)-dimensional subspace complementary to span {u_1}, and we can iterate the process of replacement by elements of the basis of K, until we've exhausted ourselves of the u_i, i = 1, . . . , k. The final set B is a basis of the form given above, where n = k + r, and we know that k = null T is the nullity.

For the second claim, observe that the image of T satisfies

T(V) = span {Tu_1, . . . , Tu_k, Tv_1, . . . , Tv_r} = span {0, . . . , 0, Tv_1, . . . , Tv_r} = span {Tv_1, . . . , Tv_r} .

Thus the set {Tv_1, . . . , Tv_r} spans the image. We need to show that this set is linearly independent. We prove this by contradiction. Suppose that there is a nontrivial relation a_1 Tv_1 + . . . + a_r Tv_r = 0. Then

T(a_1 v_1 + . . . + a_r v_r) = 0 =⇒ a_1 v_1 + . . . + a_r v_r ∈ ker T .

Since {u_1, . . . , u_k} is a basis of ker T, we can then express this linear combination of the v_i's as a linear combination of the u_j's:

a_1 v_1 + . . . + a_r v_r = b_1 u_1 + . . . + b_k u_k .

We thus obtain a relation

a_1 v_1 + . . . + a_r v_r − b_1 u_1 − . . . − b_k u_k = 0 ,

and since at least one of the a_i's is nonzero, this relation is nontrivial. This contradicts the linear independence of the elements of B. Thus, the assumption that there exists a nontrivial linear relation on the set {Tv_1, . . . , Tv_r} is untenable. We conclude that {Tv_1, . . . , Tv_r} is a basis of the image, so the rank is then r. It is therefore clear that

dim_F V = n = r + k = dim_F T(V) + dim_F ker T = rank T + null T .

Let's examine the consequences of this theorem briefly. First, note that if a map T : V → W is an injection from a finite dimensional vector space V, then the kernel has dimension 0, and by rank-nullity we have that the dimension of the image is the same as the dimension of the domain. In particular, if a linear map is injective, its image is an "isomorphic copy" of the domain, and one may refer to such maps as linear embeddings, since we can imagine that we are identifying the domain with its image as a subspace of the target space.

If we have a surjective map T : V → W from a finite dimensional vector space V, then the image has the same dimension as W. We see that the dimensions then satisfy

dim_F ker T = dim_F V − dim_F W ,

whence we see that the nullity is the difference of the dimensions of the domain and codomain for a surjective map. We can interpret this as follows: to cover the space W linearly by V, we have to squish extra dimensions, nullifying a subspace (the kernel) whose dimension is complementary to that of W.

Finally, of course, in a linear isomorphism T : V → W we have injectivity and surjectivity, and so in particular we have null T = 0 and dim_F V = dim_F W = rank T.

6.2 Column Space, Null Space, Row Space

This section introduces some language which is seen in many linear algebra textbooks for talking about the various subspaces associated to a linear map defined by matrix multiplication. We will presume a linear map T : R^n → R^m throughout, given by Tx = Ax for some matrix A ∈ Mat_{m×n}(R).

Definition 6.3. The column space of the matrix A is the span of the columns of A.

Observe that the column space is thus a subspace of R^m; indeed, it is just another name for the image of the map T, i.e. Col A = T(R^n) ⊆ R^m.

Definition 6.4. The row space of a matrix A is the span of the rows of A, and is denoted Row A. Technically, this is a subspace of Mat_{1×n}(R), but often one identifies the row space with a corresponding subspace of R^n (via the isomorphism ·^τ : Mat_{1×n}(R) → R^n sending a row vector to the corresponding column vector).

Definition 6.5. The null space (or right null space, as it is sometimes called) of the matrix A is the space of vectors x such that Ax = 0. Note this is just another term for the kernel of the map T. There is a notion of a "left null space" of A, which is the kernel of the map whose matrix is A^τ. The right nullity is just the nullity (i.e. the dimension of the kernel of T), and the left nullity is the dimension of the left null space. I will tend to use the term kernel instead of null space, except when dealing with both left and right null spaces of a given matrix.
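As a computational aside (the matrix below is invented for illustration, and sympy is assumed to be available), a computer algebra system can produce bases for each of these subspaces directly, and the dimensions behave exactly as the rank-nullity theorem predicts:

    from sympy import Matrix

    # An invented 3x4 example; its third row is the sum of the first two.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])

    col_basis  = A.columnspace()     # basis of Col A (the image), a subspace of R^3
    row_basis  = A.rowspace()        # basis of Row A, identified with a subspace of R^4
    null_basis = A.nullspace()       # basis of ker A (the right null space)
    left_null  = A.T.nullspace()     # basis of the left null space, ker of A transpose

    print(len(col_basis), len(row_basis), len(null_basis), len(left_null))   # 2 2 2 1

    # Rank-nullity for the map x |-> Ax from R^4 to R^3:
    print(A.rank() + len(null_basis) == A.cols)                              # True

Notice in passing that the column space and row space have the same dimension (the rank), while the right and left null spaces generally do not.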
One can naturally identify rows with linear functions from Rn to R, and so there is a more formal viewpoint on the row space: it is a subspace of the dual vector space to Rn . We develop this idea with a few exercises. We first define duals in general: Definition 6.6. Let V be a vector space over F. Then V ∗ = {f : V → R | f is F-linear} has a natural vector space structure induced by scaling and addition of functions, and when endowed with this structure is called the dual vector space to V , or the “space of linear functionals on V ”. Exercise 6.3. Show that for any finite dimensional F-vector space V , V ∗ ∼ = V (non-canonically). Exercise 6.4. What geometric objects give a model of the dual vector space to R3 ? By the preceding exercise, we see that the space of linear functionals on Rn is isomorphic to Rn . By fixing the standard basis as our basis, we can realize linear functionals as row vectors, and their action by the matrix product. Thus, we see that the row space of a matrix is a subspace of (Rn )∗ , and we can pass through the aforementioned transposition isomorphism to Rn . Exercise 6.5. What is the relationship between the row space of A and the column space of Aτ ? What does rank nullity tell us about the relationships of the dimension of the row space, the dimension of the column space, and the right and left nullities? 35 Math 235.9 - Lin. Alg. Course Notes 6.3 2015 Andrew J. Havens The General Solution At Last We now will discuss the general solution to a linear system. We’ve already seen how to algorithmically solve a matrix equation of an inhomogeneous linear system Ax = b, where A ∈ Matm×n (R), x ∈ Rn and constant b ∈ Rm , using Gauss-Jordan. We wish to more deeply interpret these results in light of our knowledge of the various subspaces associated to a linear map (or to a matrix), and the rank-nullity theorem. Throughout, assume A ∈ Matm×n (R), and b ∈ Rm fixed. We begin with a few observations. Observation 6.1. Let K = ker(x 7→ Ax). Note this is precisely the space of solutions to the homogeneous linear system Ax = 0. Suppose x0 ∈ K, and that xp solves the inhomogeneous system Ax = b. Then note that xp + x0 is also a solution of the inhomogeneous system: A(xp + x0 ) = Axp + Ax0 = b + 0 = b . ˜ p both solve the inhomogeneous system, then they differ by an Observation 6.2. If xp and x element of K: ˜ p ) = Axp − A˜ A(xp − x xp = b − b = 0 , ˜p ∈ K . =⇒ xp − x These two observations together imply the following: given any particular solution xp to the inhomogeneous linear system Ax = b, we can obtain any other solution by adding elements of the kernel of the map x 7→ Ax. In particular, we can describe the general solution to Ax = b as being of the form x = xp + x0 , for x0 ∈ K. î ó When we reduce the augmented matrix A b and write the solution as a sum of a constant vector with coefficient 1 and a linear combination of vectors with coefficients coming from the free variables, we are in fact describing a general solution of the above form. The constant vector is an example of a particular solution, while the remaining vectors which are scaled by free variables give a basis of the null space. We thus know how to solve a general linear system and produce a basis for the null space. How do we find a basis of the column space? The procedure is remarkably simple once we’ve reduced the matrix A: simply look for the pivot columns, and then take the corresponding columns of the original matrix A, and this collection gives a basis of the image of the map x 7→ Ax. 
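The following sympy sketch (the same invented matrix as in the earlier aside, with an invented consistent right hand side) makes the decomposition x = x_p + x_0 concrete: Gauss-Jordan produces a particular solution together with the kernel directions, and shifting the particular solution by any kernel element again solves the system.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])
    b = Matrix([3, 2, 5])            # chosen so that b lies in Col A (row3 = row1 + row2)

    sol, params = A.gauss_jordan_solve(b)
    x_p = sol.subs({t: 0 for t in params})    # a particular solution: set all free variables to 0
    print(A * x_p == b)                        # True

    # Every solution is x_p plus an element of the kernel:
    for v in A.nullspace():
        print(A * (x_p + 7*v) == b)            # True for any multiple of any kernel basis vector

Here the constant vector produced by row reduction plays the role of x_p, and the vectors attached to the free variables are precisely a basis of the null space, as described above.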
6.4 Excercises Recommended exercises from Bretscher’s text: • Any (really many) of the problems at the end of section of 3.1. Especially 9-12,19, 20, 22-31, 35, 36, 42, 43, 48-50. • Problems 28, 29, 34-44 at the end of section 3.2 • Problems 33-39 at the end of section 3.3 • Problems 1-10 and 16-39 at the end of section 4.1 36 Math 235.9 - Lin. Alg. Course Notes 7 2015 Andrew J. Havens A Tour of Linear Geometry in R2 and R3 This section was covered in class primarily on the dates 3/6, 3/9, and 3/11. Please read Bretscher, chapter 2, section 2. I covered more than the contents of Bretscher, providing a number of pictures, proofs and examples. The notes will be updated to more completely reflect what was stated in class at some point, but in the interim, please find a classmate’s notes if you were unable to attend, or attempt to prove the given formulae by constructing your own compelling geometric arguments. The outline of what as covered in class and the statements of the main formulae may be found below, with propositions, theorems, and definitions generalized to Rn where applicable. 7.1 Geometry of linear transformations of the plane Before exploring linear transformations of the plane, we need to understand the Euclidean structure of R2 . As it happens, this structure comes from the dot product, and indeed the dot product gives a Euclidean structure to any Euclidean vector space Rn . Proposition 7.1 (Bilinearity of the dot product). Given a fixed vector u ∈ Rn , x 7→ u · x gives a linear map from Rn to R. Since the dot product is commutative, we have in particular that the map · : Rn × Rn → R is bilinear (linear in each factor). Theorem 7.1 (Geometric interpretation of the dot product). Let u and v be vectors in Rn . Then u · v = kukkvk cos θ , where θ ∈ [0, π] is the (lesser) angle between the vectors u and v as measured in the plane they span. Remark 7.1. It suffices to prove the above in R2 , since the angle is always measured in the two dimensional subspace span {u, v} ∼ = R2 . We used elementary trigonometry to deduce this. Proposition 7.2 (Euclidean orthogonality from the dot product). Two vectors u, v ∈ Rn are orthogonal if and only if u · v = 0. Definition 7.1. Given u, v ∈ Rn , the orthogonal projection of v onto u is the vector proju v := u·v u. kuk2 ˆ ∈ S1 := {x | kxk = 1}, then the formula simplifies Remark 7.2. If instead we take a unit vector u to projuˆ v = (ˆ u · v)ˆ u. Exercise 7.1. Prove the above remark using the formula in the definition of orthogonal projection. Then give a matrix for the operator proju for u ∈ R2 , and show that this is the same as the matrix ˆ := u/kuk is the normalization of u. Find also the corresponding matrices if for projuˆ where u 3 u∈R . We may use the above construction to understand reflections through 1-dimensional subspaces of R2 (namely, reflections across lines through the origin). The remaining theorems exercises of this subsection concern linear automorphisms of R2 , i.e. bijective linear maps of R2 to itself. In particular, rotations and reflections are explored through the following exercises. Exercise 7.2. Prove the following theorems for the rotation and reflection formulae in the plane (this was done in class!): 37 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Theorem 7.2. Given an angle θ ∈ [0, 2π), the operator for counter-clockwise rotation of R2 by the angle θ has standard matrix ñ ô cos(θ) − sin(θ) Rθ = . 
sin(θ) cos(θ) Using the isomorphism C ∼ = R2 given by mapping the basis (1, i) to (e1 , e2 ), the operator Rθ corresponds to the 1D C-linear operation Rθ (z) = eiθ z . Theorem 7.3. Let L ⊂ R2 be a line through 0, and suppose u is a vector spanning L. Then the operator giving reflection through L is ML = (2proju − I2 ) : R2 → R2 , and it is well defined independently of the choice of u spanning L. If θ ∈ [0, π) is the angle made by L with the x-axis, then the matrix of ML in the standard basis of R2 is ñ cos(2θ) sin(2θ) sin(2θ) − cos(2θ) ô . We can thus determine a reflection by the angle θ ∈ [0, π) made by the line L with the x-axis, and may also write Mθ to indicate the dependence on this parameter. Moreover, if ñ ô a b A= b −a for a, b ∈ R such that a2 + b2 = 1, then A represents a reflection through the line L = span (u) where u is any vector lying on the line bisecting the angle between the first column vector of A and e1 . Using the isomorphism C ∼ = R2 given by mapping the basis (1, i) to (e1 , e2 ), the operator Mθ corresponds to the operation Mθ (z) = e2iθ z¯ , where z¯ = <z − i=z is the complex conjugate of z. (Note this operation is not, strictly speaking, complex linear, since complex conjugation is not C-linear.) Exercise 7.3. Given an arbitrary nonzero-complex number a ∈ C∗ = C − {0}, what is the effect of the map z 7→ az? Give a matrix representation when this is viewed as a map of R2 . One then has the following conclusion about the relation between complex and real representations of rigid linear motions in the plane: “rigid linear motions of R2 are captured by C-linear motions of C together with conjugation; that is, C-linear motions of C are more restricted (they preserve orientation), but including the complex conjugation operation recovers R-linear motions of C as an R-vector space.” Example 7.1. Let L be the line in R2 through the origin making angle 3π/4 with the x-axis, and let M be the line in R2 through the origin making angle π/6 with the x axis. Find the standard matrix for the composition T = MM ◦ ML of reflections through the lines L and M . What is the geometric interpretation of this composition? Write a formula for it using complex numbers. Solution: By the above theorems, if a line L is spanned by a unit vector u = cos θe1 + sin θe2 , then we can compute the reflection through L as ML (x) = 2(u · x)u − x = (2proju − I)x , 38 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens and the matrix (2proju − I) is given as ñ (2proju − I) = 2 cos2 θ − 1 2 sin θ cos θ 2 sin θ cos θ 2 sin2 θ − 1 ô ñ = cos(2θ) sin(2θ) sin(2θ) − cos(2θ) ô . Thus, first we determine the unit vectors associated to each line: ´ ®Ç √ ® å´ −√ 2/2 cos(3π/4) = span L = span sin(3π/4) 2/2 ® ´ ®Ç √ å´ cos(π/6) 3/2 M = span = span sin(π/6) 1/2 Let A be the matrix such that ML (x) = Ax and let B be the matrix such that MM (x) = Bx. We then have ñ ô 0 −1 A= , −1 0 √ ñ ô 1/2 3/2 √ B= . 3/2 −1/2 The composition of the maps T = MM ◦ ML has matrix equal to the matrix product ñ √ ô − 3/2 −1/2 √ BA = . 1/2 − 3/2 √ Note that this matrix is the matrix of a rotation! Since sin θ = 1/2 and cos θ = − 3/2, we conclude that the angle of the associated counterclockwise rotation is θ = 5π/6, and we conclude MM ◦ ML = R5π/6 . As a complex linear map T can be realized by z 7→ e5πi/6 z. Exercise 7.4. Give matrix-vector formulae for rotation about an arbitrary point of R2 and reflection through an arbitrary line (not necessary containing 0). Exercise 7.5. 
Characterize all bijective linear maps of R2 which do not decompose as a composition involving rotations or reflections. Exercise 7.6. (Hard!) Describe an algorithm which, for a given matrix A describing a bijective linear map x 7→ Ax of R2 , produces a decomposition in terms of reflections, rotations, and the maps described in the previous exercise. Can one decompose any linear automorphism of R2 using just reflections and the maps from the previous exercise (i.e., can we exclude rotations in our decompositions)? 7.2 Geometry of linear transformations of three-dimensional space Below is a summary of the contents of the two lectures given on the geometry of linear transformations of R3 . If you missed those lectures, then it is advised you copy notes and discuss the material with a classmate or myself during office hours. The essential points, such as computing 3 × 3 determinants, are reviewed in future sections. • Projections - the formula for projection onto a line appears the same. Can you find a formula for projection onto a plane? 39 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens • Planes and normals - This is largely overlap material with math 233; I chose to present it from a linear algebra perspective in class as a point of unification (e.g. deriving the equation of a plane, which we’ve used for a while without justification) • Reflections in planes - the visual argument for this is analogous to the argument used to derive reflections across a line in R2 . • Cross products and determinants/Triple Scalar Products - The 3 × 3 determinant was introduced and used as a mnemonic for the computation of the 3D cross product. Note that there is no cross product in dimensions other than 3 and 7 (though there’s a pseudo-cross product in R2 which returns the signed area of the parallelogram spanned by the pair of vectors being multiplied). It was observed that the 3×3 determinant is in fact the signed volume of a parallelepiped spanned by the (column or) row vectors. This construction is equivalent to dotting the vector corresponding to the first row with the cross product of the vectors corresponding to the second and third rows. • Spatial Rotations - Using the cross product and projections, we obtained a beautiful formula for rotation of R3 about an axis by an angle θ. 8 Coordinates, Basis changes, and Matrix Similarity Please read sections 3.4 and 4.3, and 4.4 in Bretscher for the presentation and examples of the following topics. 8.1 Linear Coordinates in Rn 8.2 Coordinates on a finite dimensional vector space 8.3 Change of Basis 9 Determinants and Invertibility Please read Bretscher, chapter 6; this section of the notes will include definitions and proofs auxiliary to those provided by the text. 9.1 Review of Determinants in 2 and 3 Dimensions Recall that we defined the determinant of a 2 × 2 matrix A as follows: det A := a11 a22 − a21 a12 , where A = (aij ) ∈ Mat2×2 (F) . Note that this definition can be applied for matrices over any field (or more generally, even over a ring, such as the integers). Note also that det A = det Aτ . For 2 × 2 matrices over a field, we know that invertibility of the matrix is equivalent to nonvanishing of its determinant. A natural question is whether we can generalize this to square matrices of any size. Recall, the geometric interpretation of the 2 × 2 determinant for matrices with real entries: 40 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Example 9.1. 
(HW 1 Bonus 1) Show that ad − bc is the signed area of the parallelogram spanned by u and v, where the sign is positive if rotating u counter-clockwise to be colinear to v sweeps into the parallelogram, and is negative otherwise. Solution: First, let us suppose u and v are unit vectors, i.e. a2 + c2 = 1 = b2 + d2 . Geometrically, they are vectors lying on the unit circle, and so we can express their components as trigonometric functions of the angles they make with the x axis. Let u make an angle of α with the x axis and v make an angle of β with the x axis. Then the angle between the vectors is β − α, and from the sine subtraction formula: sin(β − α) = cos(α) sin(β) − cos(β) sin(α) = ad − bc. Recall that the area of a parallelogram is the base times an altitude, formed by taking an orthogonal line segment from one side to an opposite side. From a picture, one sees that the area of a parallelogram can be expressed as the product of side lengths times the sine of the internal angle between adjacent sides. If the sides are the unit vectors u and v, then the area is | sin(β − α)|. Thus, for unit vectors, ad − bc is ±area, with the sign positive if the angle β − α ∈ (0, π), negative if β −α ∈ (π, 2π), and 0 if the angle β −α = 0 or π (the colinear case). Thus, for the non-colinear case, if u sweeps into the parallelogram when rotated counterclockwise towards v, the sign is positive. Note that switching the order of the vectors switches the sign of the determinant ad − bc, and this is consistently reflected in the convention regarding the vectors’ orientations. For general vectors, one scales the area of the parallelogram as well as the components, and discovers that the scale factors for the area and the equation ad − bc are identical: e.g. if we scale u by λ, then the area scales by λ, and so do the components: Ç λu = λa λc å , so the determinant scales to (λa)d − b(λc) = λ(ad − bc). Thus, the determinant is the signed area, accounting for the orientation/ordering of the two vectors. We also defined determinants for 3 × 3 matrices, and discovered that our generalization has an analogous geometric interpretation as a signed volume in R3 of the parallelepiped whose sides are determined by the column vectors (or row vectors) of the matrix: a 11 a21 a31 a12 a13 a22 a23 a32 a33 = a11 (a22 a33 − a32 a23 ) − a12 (a21 a33 − a23 a31 ) + a13 (a21 a32 − a31 a23 ) . See Brestcher, section 6.1, for a discussion of Sarrus’s rule, and why it fails to generalize to give determinants for n > 3. 9.2 Defining a General Determinant For the definition provided in class, please read Bretscher, section 6.1. Here, I rephrase his definition (which uses “patterns” and “inversions”) in the modern, standard language. We need to define a very important object, called a permutation group, in order to give the modern definition of the determinant. This definition is very formal, and is not necessary for the kinds of computations we will be doing (see instead the discussions of computing determinants by row reduction, or via expansion by minors.) It is recommended you read Bretscher’s treatment or the in class notes regarding patterns and signatures first, before approaching this section. The end of the section describes how to define determinants in the general setting of finite vector spaces over a field, where instead of matrices we consider maps of the vector space to itself, called endomorphisms. 41 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 9.1. Consider a set of n symbols, e.g. 
the standard collection of integers 1 through n: {1, . . . , n}. Define the permutation group on n symbols to be the set of all bijective maps of {1, . . . , n} to itself, with group operation given by composition. See HW 4 for the definition of a group and an exercise realizing a representation of this group. Denote this permutation group by Sn . A common notation for the above group’s elements is cycle notation. For example, let us consider S3 , the permutation group of the symbols {1, 2, 3}. Consider the map which sends 1 to 2, 2 to 3 and 3 to 1. We notate this element as (1 2 3). We interpret the symbol as telling us where to send each element as follows: if an integer m appears directly to the right of k, then k is mapped to n, and the last integer on the right in the cycle is mapped to the first listed on the left. The cycle (1 2 3) clearly gives a bijection, so we can regard (1 2 3) ∈ S3 . This is called a cyclic permutation, as it consists of a single cycle. Another special type of permutation is a cyclic permutation with just two elements, which is called a transposition. An example would be the map which sends 1 to itself, but swaps 2 and 3. This is notated (2 3) ∈ S3 . The lack of the appearance of 1 tells us that 1 is mapped to itself (sometime, this transposition would be denoted (1)(2 3) to emphasize this.) The convention I will follow is that if an integer is missing from a cycle, then it is sent to itself by that cycle. To see the effect of the map determined by a cycle, we’ll denote it’s action sometimes by writing how it permutes the ordered tuple (1, . . . , n), e.g. if σ = (1 3) ∈ S3 , then σ (1, 2, 3) 7−→ (3, 2, 1) . One can cyclically reorder any cycle and it will represent the same map, e.g. (1 2 3) = (2 3 1) = (3 1 2). By convention one usually starts the cycle with the lowest integer on which the cycle acts nontrivially. The empty cycle () represents the identity map on the set of symbols. One can “multiply” cycles to compute a composition of permutations as follows: 1. Two adjacent cycles represent applying one cycle after another, from right to left. For example, in permutations of 6 symbols, S6 , the cycles σ = (1 2 3) and σ 0 = (3 5 4 6) can be composed in two ways: σσ 0 σσ 0 = (1 2 3)(3 5 4 6), which acts as (1, 2, 3, 4, 5, 6) 7−→ (2, 3, 5, 6, 4, 1) , σ0 σ σ 0 σ = (3 5 4 6)(1 2 3), which acts as (1, 2, 3, 4, 5, 6) 7−→ (2, 5, 1, 6, 4, 3) . 2. Any cycle product can be rewritten as a product of disjoint cycles. Disjoint cycles commute with each other, e.g. (1 2)(3 4) = (3 4)(1 2) ∈ S4 represents the map (1, 2, 3, 4) 7→ (2, 1, 4, 3) . If cycles are not disjoint, to write them as disjoint cycles, one reads where the rightmost cycle sends a given symbol, then scans left to find its image in the cycles to the left, then follows this image to the left, etc. E.g. using the examples from (1): σσ 0 = (1 2 3)(3 5 4 6) = (3 5 4 6 1 2) = (1 2 3 5 4 6) . σ 0 σ = (3 5 4 6)(1 2 3) = (1 2 5 4 6 3) . In these cases the result is a single cycle (which is therefore a product of disjoint ones). A more interesting example is the product (1 3 5)(5 6)(1 4 2 6) = (1 4 2)(3 5 6) . 42 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens 3. Any cycle can be decomposed as a product of (not necessarily disjoint) transpositions. E.g. (1 2 3) = (1 2)(2 3) σσ 0 = (1 2)(2 3)(3 5)(5 4)(4 6) A permutation is called even if it can be decomposed into an even number of transpositions, otherwise it is said to be odd. Exercise 9.1. 
Argue that the notions of evenness and oddness of a permutation are well defined. Thus you must show that if a permutation has one decomposition into evenly many transpositions, then any decomposition into transpositions has an even number of transpositions, and similarly if it admits an odd decomposition, then all decompositions are odd. Definition 9.2. A given permutation has signature sgn σ = 1 if σ is even and −1 if σ is odd. By the above exercise, this is well defined, and in fact determines a unique map sgn : Sn → {−1, 1} such that sgn (σ1 σ2 ) = sgn (σ1 ) sgn (σ2 ) and with sgn (τ ) = −1 for any single transposition τ ∈ Sn . The “patterns” Bretscher speaks of are actually the result of applying permutations to the indices of entries in the matrix. In particular, one can define a pattern as follows. Let’s assume we are given a matrix A ∈ Matn×n (F). Fix a permutation σ ∈ Sn . Then we obtain a pattern Pσ = (a1,σ(1) , a2,σ(2) , . . . , anσ(n) ) . The claim is that all patterns are of this form and that the signature of the pattern is equal to the signature of the associated permutation. Given this fact, one can realize Bretcher’s definition as the more common Lagrange formula for the determinant: Definition 9.3. The determinant of A ∈ Matn×n (F) is the scalar det A ∈ F given by det A := X sgn (σ) n Y i=1 σ∈Sn Ä X aiσ(i) = ä sgn (σ) a1σ(1) · · · anσ(n) . σ∈Sn One can readily recover some basic properties of the determinant from this definition. For example, suppose one were to swap the ith and jth columns of a matrix A. This is equivalent to acting on the matrix by the transposition τij = (i j) ∈ Sn . Denote the image matrix as τij A and let A = (akl ). Note that sgn τ = −1 and sgn (σ) = −sgn (στ ) for any σ ∈ Sn . Moreover, since Sn is a group, the map τ : Sn → Sn is a bijection. Thus X det(τij A) = det(akτ (l) ) = = X σ∈Sn n Y −sgn (στ ) σ∈Sn =− X X n Y akσ(τ (l)) k=1 akσ(τ (l)) k=1 n Y sgn (στ ) στ ∈Sn =− sgn (σ) akστ (l) k=1 sgn (σ) σ∈Sn n Y akσ(l) k=1 = − det(A) . Exercise 9.2. Use the above definition to show that det A = det Aτ for any matrix A ∈ Matn×n (R). Exercise 9.3. Use the above definition to describe the effect of the other elementary row/column operations on the determinant of a square matrix. 43 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Let us now generalize our definition of determinants to a suitable class of maps of abstract finite dimensional vector spaces. Given a finite dimensional vector space V over a field F, we can consider endomorphisms of V and their determinants: Definition 9.4. Let V be a vector space over F. Then a linear endomorphism, vector space endomorphism, or simply endomorphism of V as an F-vector space is an F-linear map T : V → V . We denote the space of all endomorphisms of the F-vector space V by EndF (V ). Let us consider a finite dimensional vector space V , with dimension n. Thus, there is a basis A of V consisting of n vectors, giving us a coordinate system in Fn . If T ∈ EndF (V ) is an endomorphism of V , we can find a matrix A representing T relative to the basis A . Definition 9.5. For V an F-vector space with dimF V = n, the determinant of an endomorphism T : V → V is the determinant of any matrix A ∈ Matn×n (F) representing T in coordinates determined by some basis A of V : det T := det A, where A ∈ Matn×n (F) such that [Tv]A = A[v]A . We need to check that this is a reasonable definition. Mathematicians speak of checking if a given construction or definition is “well-defined”. 
In this case, that means we need to check that the determinant depends only on the endomorphism T , and not on the choice of basis A of V . Claim 9.1. The determinant of an endomorphism T : V → V of a finite vector space is well defined. Proof. Suppose A and B are bases of V , and A and B are the coordinate matrices of T ∈ EndF (V ) relative to A and B respectively. It suffices to show that det A = det B. We know that A and B are similar, for if S is the change of basis matrix from A to B, i.e. the standard matrix of the isomorphism LB ◦ LA−1 : Fn → Fn , then AS = SB, whence B = S−1 AS. Then by properties of the determinant of a square matrix, we have: det B = det(S−1 AS) = (det S−1 )(det A)(det S) = (det S−1 )(det S)(det A) Ä ä = det(S−1 S) (det A) = (det In )(det A) = det A . An alternative definition of general determinants of endomorphisms of a finite vector space is to define the determinant of a map as the product of its eigenvalues (see the next section). This alternative definition has the advantage of being completely coordinate free; one need not invoke coordinates directly in the definition, and it is clearly well defined since the eigenspectrum is determined only by the map itself. We now consider the properties of the determinant. Proposition 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. Then there are isomorphisms EndF (V ) ∼ . . × V} ∼ = Matn×n (F) ∼ = Fn×n . = V n := |V × .{z n times Exercise 9.4. Prove the above proposition. 44 Math 235.9 - Lin. Alg. Course Notes 2015 Andrew J. Havens Definition 9.6. Given a product of vector spaces V1 × V2 × . . . × Vn , a map T : V1 × V2 × . . . × Vn → F is said to be multilinear if it is linear in each factor, i.e., if for any i ∈ {1, . . . , n}, any α, β ∈ F, and any pair x1 , yi ∈ Vi , T(x1 , x2 , . . . , αxi + βyi , . . . , xn ) = αT(x1 , x2 , . . . , xi , . . . , xn ) + βT(x1 , x2 , . . . , yi , . . . xn ) . Definition 9.7. A multilinear map T : : V × V × . . . × V → F is called alternating if and only if for any pair of indices i, j ∈ {1, . . . , n} T(x1 , x2 , . . . , xi , . . . , xj , . . . , xn ) = −T(x1 , x2 , . . . , xj , . . . , xi , . . . , xn ) , i.e. after swapping any pair of inputs, the map is scaled by −1 ∈ F. A multilinear map is called symmetric if and only if such a swap does not change the value of the map on its inputs. Remark 9.1. Note that if F is of characteristic 2, then a map is alternating if and only if it is symmetric. Otherwise (e.g. the fields we’ve worked with most, such as R, C, Q, or Fp , p 6= 2) a map might be one but not the other, or might be neither. Exercise 9.5. Show that any alternating multilinear map T : V × . . . × V → F evaluates to zero if it has repeated inputs. E.g. for an alternating bilinear map B : V × V → F, B(x, x) = 0 necessarily. Theorem 9.1. Let V be a finite dimensional vector space over the field F, dimF V = n. There is a unique map D : EndF (V ) → F satisfying the following properties: (i.) D is multilinear and alternating when viewed as a map D : V n → F, (ii.) For any endomorphisms T, S ∈ EndF (V ), D(T ◦ S) = D(T)D(S), (iii.) D(IdV ) = 1 Exercise 9.6. Prove the above theorem and show that the map D is indeed the determinant as defined above. Note in particular that the multilinearity and alternativity of D should be independent of the choice of isomorphism EndF (V ) ∼ = V n. 9.3 Expansion by Minors We now show that one can recursively compute the determinant. 
9.3 Expansion by Minors

We now show that one can compute the determinant recursively. It suffices to demonstrate that a recursive formula can be produced for a given A ∈ Mat_{n×n}(F). We work from the definition

    \det A := \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma) \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, a_{1\sigma(1)} \cdots a_{n\sigma(n)}.

Fix a particular index i ∈ {1, ..., n} =: [n], and observe that

    \prod_{k=1}^{n} a_{k\sigma(k)} = a_{i\sigma(i)} \prod_{k \in [n]\setminus\{i\}} a_{k\sigma(k)}.

Let j := σ(i). The following exercise supplies the details leading to our recursive formula for determinant computation.

Exercise 9.7. Let P_σ = \prod_{k=1}^{n} a_{kσ(k)}, take a_{ij} as above, and let P_σ^{ij} = \prod_{k \in [n]\setminus\{i\}} a_{kσ(k)}. Let P_σ and P_σ^{ij} also denote the respective patterns corresponding to these products (taken in order of the first index). Show that

a. sgn(σ) = sgn(P_σ),
b. sgn(P_σ) = (−1)^{i+j} sgn(P_σ^{ij}),
c. sgn(σ) P_σ = (−1)^{i+j} a_{ij} P_σ^{ij}.

Theorem 9.2 (Expansion by Minors/The Laplace Expansion). Let A = (a_{ij}) ∈ Mat_{n×n}(F). Fix a column (or row), with index j (or i respectively). Denote by A_{ij} the submatrix of A obtained by removing the i-th row and j-th column. Then

    \det A = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(A_{ij}).

The rightmost formula is an expansion by minors along the i-th row, and the middle formula is an expansion by minors down the j-th column. Note that the pattern for choosing the signs (−1)^{i+j}, as shown in the preceding exercise, is a checkerboard with the upper left corner positive:

    \begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}

Exercise 9.8. Let A ∈ Mat_{n×n}(R), and suppose that k is a positive integer such that A has a k × k minor with nonzero determinant, and such that there are no larger minors of A with nonzero determinant (note: the minor might be A itself). Show that rk A = k. Moreover, show that if rk A = k for some k, then the largest size of a minor of A with nonzero determinant is k × k.

9.4 Cramer's Rule and the Inverse Matrix Theorem

Theorem 9.3 (Cramer's Rule). Consider the linear system Ax = b, where A ∈ Mat_{n×n}(R) and b ∈ R^n. Suppose x is the unique solution to the system, and x_i = e_i · x is the i-th component of x. Then

    x_i = \frac{\det(A_{b,i})}{\det(A)},

where A_{b,i} is the matrix obtained from A by replacing the i-th column with the vector b.

Proof. We compute det(A_{b,i}) assuming that Ax = b. Write A = [v_1, ..., v_n], where v_j is the j-th column of A, as usual. Using linearity of the determinant in the i-th column, together with the alternating property (any term in which a column is repeated, up to scale, contributes zero), we compute

    \det(A_{b,i}) = \det[\, v_1 \; v_2 \; \cdots \; v_{i-1} \; b \; v_{i+1} \; \cdots \; v_n \,]
                  = \det[\, v_1 \; v_2 \; \cdots \; Ax \; \cdots \; v_n \,]
                  = \det[\, v_1 \; \cdots \; (x_1 v_1 + \cdots + x_i v_i + \cdots + x_n v_n) \; \cdots \; v_n \,]
                  = \det[\, v_1 \; \cdots \; x_i v_i \; \cdots \; v_n \,]
                  = x_i \det[\, v_1 \; \cdots \; v_i \; \cdots \; v_n \,]
                  = x_i \det(A).

Since x is the unique solution, A has nonzero determinant (as it must be invertible), and we conclude that for each i ∈ {1, ..., n},

    x_i = \frac{\det(A_{b,i})}{\det(A)}. ∎

An interesting corollary of this is the following algorithm for computing the inverse of an invertible matrix. Define the (i, j)-th cofactor of A to be c_{ij} = det(A_{ij}), where A_{ij} is the matrix obtained from A by removing the i-th row and j-th column, and let C = ((−1)^{i+j} c_{ij}) be the signed cofactor matrix. Then the classical adjoint is A^* := C^τ, the transpose of C.

Corollary 9.1. If A ∈ Mat_{n×n}(F) is invertible, then the inverse of A is given by

    A^{-1} = \frac{1}{\det A} A^*.

Exercise 9.9. Prove the above corollary, using Cramer's rule. (The proof was given in class and can be found in Bretscher, but see if you can reproduce it without referencing anything other than Cramer's rule!)
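The following sketch (not part of the original notes) illustrates Theorem 9.2 and Theorem 9.3 computationally: a recursive Laplace expansion down the first column, and a Cramer's-rule solver built on it. Both take exponential time and are meant only to mirror the formulas for small matrices; the 2×2 system at the end is an arbitrary illustrative example, and the helper names are not from the notes.

    # Recursive Laplace expansion (Theorem 9.2) and Cramer's rule (Theorem 9.3).

    def minor(A, i, j):
        """Submatrix A_ij obtained by deleting row i and column j (0-based)."""
        return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

    def det(A):
        """Expansion down the first column: det A = sum_i (-1)^i a_{i0} det(A_{i0})
        (0-based indices, so the sign is (-1)^i)."""
        n = len(A)
        if n == 1:
            return A[0][0]
        return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

    def cramer_solve(A, b):
        """Solve Ax = b via x_i = det(A_{b,i}) / det(A), assuming det(A) != 0."""
        n, d = len(A), det(A)
        x = []
        for i in range(n):
            # A_{b,i}: replace the i-th column of A by the vector b.
            Abi = [row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(A)]
            x.append(det(Abi) / d)
        return x

    # Example: solve 2x + y = 5, x + 3y = 10; the unique solution is (1, 3).
    print(cramer_solve([[2, 1], [1, 3]], [5, 10]))  # prints [1.0, 3.0]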
10 Eigenvectors, Eigenvalues, and the Characteristic Equation

10.1 The concepts of eigenvectors and eigenvalues

Consider the following puzzle, whose solution is intuitive. We have three friends sitting around a table, and each is given some amount of putty: at time t = 0 minutes one of them has a > 0 grams of putty, another has b > 0 grams, and the last individual has c > 0 grams. They play with their respective wads of putty for nearly a minute, and then divide their wads into perfect halves. Exactly at the one minute mark, each person passes one half to the friend to their left, and the other half to the friend to their right. They then play with their wads of putty for nearly another minute before agreeing to again divide and pass exactly as they did at t = 1. For each integer number of minutes n, at exactly t = n they pass half of the putty in their possession at the time to the adjacent friends. What happens in the long term? Does any one friend end up with all of the putty, or most of the putty, or does the distribution instead approach an equilibrium?

What we have described is an example of a discrete dynamical system. In this particular case, it is in fact a linear system: you can check that if x_t is the vector describing the putty in the possession of our three friends at time t, then at time t = n we have x_n = A x_{n−1}, where

    A = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}.

It is easy to see that we can define a function, for nonnegative integral t,

    x_• : Z_{≥0} → R^3, \qquad x_t = A^t x_0,

where x_0 = a e_1 + b e_2 + c e_3 is the initial vector describing the putty held by each friend at time t = 0. The question of long term behavior is then stated mathematically as: "Find

    \lim_{n \to \infty} x_n = \lim_{n \to \infty} A^n x_0,

if it exists."

One observation about this putty problem: at each step, the total amount of putty in the hands of the collective of friends is conserved. This might give us some hope that the limit exists, but of course we need to understand what it means for this system to converge for a given initial value x_0, and actually show that it does (if this is the case). Before we analyze this system in full, let us explore two-dimensional systems, and define an incredibly useful tool which will allow us to solve such linear discrete dynamical systems (LDDSs for short).

Exercise 10.1. Generalize the putty problem in two different ways to feature n friends. Intuitively, can you argue that the long term behavior of each such system is qualitatively the same as what we expect in the original putty problem?

Example 10.1. Let us consider the matrix

    A = \begin{pmatrix} 1 & 2 \\ 0 & -1 \end{pmatrix}.

Suppose we want to understand the action of the map x ↦ Ax on the plane R^2. One natural question is "does the map T(x) = Ax admit any invariant proper subspaces (in this case, lines) in R^2?" That is, are there lines L such that the image T(L) of L is L itself?

Suppose that L ⊂ R^2 is a 1-dimensional subspace fixed by the map T. Then there is some nonzero vector v ∈ R^2 such that L = span{v}, and Tv = Av = λv for some scalar λ ∈ R, since Tv ∈ span{v}. We can rearrange this equation as

    Av − λv = (A − λI_2)v = 0.

Thus v ∈ ker(A − λI_2). Since we assumed v ≠ 0, it follows that det(A − λI_2) = 0. This gives us a polynomial equation, which should determine λ. We call this the characteristic equation of the matrix A. Using the given values, we have

    \det\left( \begin{pmatrix} 1 & 2 \\ 0 & -1 \end{pmatrix} - \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \right) = \det \begin{pmatrix} 1-\lambda & 2 \\ 0 & -1-\lambda \end{pmatrix} = (1 - \lambda)(-1 - \lambda) = 0 \iff \lambda = 1 \text{ or } \lambda = -1.
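A quick numerical check of this computation (a sketch assuming NumPy, not part of the original notes):

    import numpy as np

    # The matrix of Example 10.1.
    A = np.array([[1.0,  2.0],
                  [0.0, -1.0]])

    print(np.poly(A))            # [ 1.  0. -1.]: coefficients of lambda^2 - 1 = (1 - lambda)(-1 - lambda)
    print(np.linalg.eigvals(A))  # the eigenvalues 1 and -1 (in some order)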
That we get two such scalars λ suggests that there are two subspaces invariant with respect to our map T. The λs are called eigenvalues, and the corresponding invariant subspaces are called eigenspaces ("eigen" means "own" or "self" in German, though it has come to mean "characteristic" or "self-similar" owing to its extensive appearance in modern mathematics as a prefix for gadgets coming from linear operators). We can find a pair of eigenvectors describing our two eigenlines. Indeed, we can use the values of λ we found to solve the vector equations

    (A − (1)I_2)v_1 = 0, \qquad (A − (−1)I_2)v_2 = 0.

Exercise 10.2. Find the vectors v_1 and v_2 above.

Note that in class we deduced that we could read off the eigenvalues from the main diagonal of the matrix in this case, since the matrix is upper triangular. In general, an upper triangular or lower triangular matrix has eigenvalues precisely equal to the entries along the main diagonal. In class we used our eigenvectors to form a basis, and rewrote the linear map in eigencoordinates, exhibiting that in the appropriate coordinates it is merely a reflection across one axis.

Please read Bretscher 7.1 - 7.3, which will cover much of the following topics:

10.2 The characteristic equation
10.3 Eigenvalue formulae for Traces and Determinants
10.4 Eigenspaces and Eigenbases
10.5 Diagonalization
10.6 Jordan Canonical form

11 Orthogonality and Inner Product Spaces

Will there be time?
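As a closing computational aside (not part of the original notes), here is a minimal simulation of the putty problem from Section 10.1, assuming NumPy and an arbitrary illustrative initial distribution. It suggests that the distribution approaches the equilibrium in which each friend holds (a + b + c)/3 grams, a claim that the eigenvalue and diagonalization tools above (and in Bretscher 7.1 - 7.3) will let us prove.

    # Simulation of the putty problem (Section 10.1): x_n = A^n x_0.
    import numpy as np

    A = np.array([[0.0, 0.5, 0.5],
                  [0.5, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

    x = np.array([6.0, 1.0, 2.0])   # a, b, c grams of putty at t = 0
    for n in range(30):
        x = A @ x
    print(x)        # each entry is close to (6 + 1 + 2) / 3 = 3
    print(x.sum())  # the total amount of putty is conserved: 9.0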