1
00:00:01,410 --> 00:00:07,260
Okay. Good morning, everybody. You'll see there is a questionnaire; please fill that out and we'll collect it at the end.
2
00:00:07,770 --> 00:00:14,460
Also, assignment three in the usual pattern is due Tuesday, the week after next.
3
00:00:15,810 --> 00:00:19,230
I guess assignment two will be handed back on next Tuesday.
4
00:00:21,600 --> 00:00:24,750
McHale, is there anything you want to tell people about assignments?
5
00:00:25,530 --> 00:00:34,850
Okay, we're fine. All right. So today's our last day on linear algebra, and then the final few lectures of this term will be related to optimisation.
6
00:00:35,490 --> 00:00:41,910
And what I want to do today is first talk at a high level about matrix factorisation and algorithms,
7
00:00:42,270 --> 00:00:48,330
and then talk about the singular value decomposition, one of the most important of all the matrix factorisations.
8
00:00:48,690 --> 00:01:01,810
So: matrix factorisation. I guess that's Section 2.8.
9
00:01:12,330 --> 00:01:16,350
Linear algebra was more or less born in the 19th century.
10
00:01:17,340 --> 00:01:22,650
And then after computers came along this way of thinking in terms of matrix factorisation came along.
11
00:01:22,920 --> 00:01:32,490
It turned out that a fruitful way to think of algorithms was as taking a matrix A and turning it into some product of simpler matrices.
12
00:01:32,730 --> 00:01:39,720
And in fact, I like to say that this is the central dogma of numerical linear algebra.
13
00:01:42,000 --> 00:01:50,670
So the central dogma of numerical linear algebra is that algorithms, at least the classical algorithms,
14
00:01:52,460 --> 00:02:07,810
correspond to matrix factorisations. This expression, the central dogma, is one that Francis Crick made famous in biology,
15
00:02:07,810 --> 00:02:12,650
where the central dogma is that DNA turns into RNA, which turns into proteins.
16
00:02:12,670 --> 00:02:17,860
So I think it's a good idea that every field should have a central dogma; that's ours.
17
00:02:18,310 --> 00:02:27,730
It sort of came along gradually over the decades, 1950s, 1960s, and is now just absolutely the way that numerical people think about these algorithms.
18
00:02:28,240 --> 00:02:36,400
The general flavour is that you take an arbitrary matrix and you turn it into pieces that might be diagonal or triangular or orthogonal.
19
00:02:36,610 --> 00:02:42,460
Those are the three big ones: diagonal, triangular or orthogonal.
20
00:02:43,000 --> 00:02:48,010
You can work with pieces like that more easily than with general matrices.
21
00:02:48,340 --> 00:02:54,210
For example, if you want to find eigenvalues of a matrix, the algorithms make the matrix diagonal.
22
00:02:54,220 --> 00:02:57,940
And then of course, you know, the eigenvalues of a diagonal matrix.
23
00:02:57,940 --> 00:03:07,670
They're the entries on the diagonal. Now what I want to do is summarise the seven most important factorisations in numerical linear algebra.
24
00:03:07,690 --> 00:03:11,620
So here they are. The first one is the QR factorisation.
25
00:03:14,370 --> 00:03:24,460
We've seen most, but not all, of these. And in my summary, I'm going to suppose that A is a real square matrix.
26
00:03:24,700 --> 00:03:31,509
So that's of course not always true, but let's imagine that A is a real matrix of dimension
27
00:03:31,510 --> 00:03:41,589
m by m. In that case the QR factorisation is a factorisation of a matrix A, an arbitrary real square matrix
28
00:03:41,590 --> 00:03:46,720
A, into Q times R, where Q is orthogonal
29
00:03:49,840 --> 00:04:03,120
and R is upper triangular. Let me remind you that if A were rectangular, then Q would have orthonormal columns.
30
00:04:03,120 --> 00:04:08,970
But when A is square, Q is square, so those orthonormal columns mean that it's an orthogonal matrix.
31
00:04:09,420 --> 00:04:13,499
Now, what is this used for? Well, it's used a lot. It's used all over the place.
32
00:04:13,500 --> 00:04:18,060
But in particular, it's important in least squares calculations.
33
00:04:19,710 --> 00:04:22,920
It's the starting point of anything you do in least squares, really.
34
00:04:23,610 --> 00:04:31,470
And it's also a step in iterative algorithms for eigenvalues and related things.
35
00:04:34,470 --> 00:04:41,010
So in particular for eigenvalues, which I like to write like that, and the singular value decomposition,
36
00:04:41,250 --> 00:04:47,310
the way those things are computed is by an iterative algorithm where at each step you do a QR factorisation.
37
00:04:47,640 --> 00:04:56,490
So this really transformed numerical linear algebra in the 1960s: the use of the QR factorisation as proposed by Householder.
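(A minimal MATLAB sketch, for reference, of what the QR factorisation looks like in practice; the matrix here is just an illustrative example, not the lecturer's M16 code.)

    A = magic(5);          % an arbitrary real square matrix
    [Q, R] = qr(A);        % A = Q*R
    norm(Q'*Q - eye(5))    % ~ machine precision: Q is orthogonal
    norm(R - triu(R))      % exactly 0: R is upper triangular
    norm(Q*R - A)          % ~ machine precision: the factorisation holds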
38
00:04:58,520 --> 00:05:01,700
The second one I'd like to mention is the LU factorisation.
39
00:05:03,980 --> 00:05:09,320
Now, this is the oldest one in some sense. This is Gaussian elimination, basically.
40
00:05:09,590 --> 00:05:12,740
This is what we all learn when we first encounter linear algebra.
41
00:05:12,980 --> 00:05:18,350
You subtract multiples of one row of a matrix from other rows to introduce zeros.
42
00:05:18,680 --> 00:05:25,340
You make a matrix triangular by Gaussian elimination; that process can be regarded as an LU factorisation.
43
00:05:25,580 --> 00:05:32,360
Now, in the simplest form, it would be A equals LU, where L is lower triangular and U is upper triangular.
44
00:05:32,720 --> 00:05:36,830
But more generally, there's a permutation matrix on the left.
45
00:05:37,310 --> 00:05:41,000
So PA equals LU would be the standard version of Gaussian elimination.
46
00:05:41,480 --> 00:05:44,480
L is unit lower triangular.
47
00:05:47,060 --> 00:05:57,890
That means that L is lower triangular with ones on the diagonal; U is upper triangular.
48
00:06:02,640 --> 00:06:18,590
And P is a permutation matrix. Now a permutation matrix is a matrix whose rows are permutations of the rows of the identity, or equivalently,
49
00:06:18,590 --> 00:06:22,040
whose columns are permutations of the columns of the identity.
50
00:06:22,310 --> 00:06:25,400
When you take P times A, you're permuting its rows.
51
00:06:25,610 --> 00:06:33,050
Gaussian elimination typically is set up to permute rows in order to make sure that the numbers we divide by are not too small.
52
00:06:33,530 --> 00:06:42,170
So this is the standard version of Gaussian elimination with partial pivoting, which is the workhorse of solution of systems of equations.
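(A minimal MATLAB sketch of the factorisation just described, assuming a small square example matrix; partial pivoting gives P*A = L*U.)

    A = magic(5);
    [L, U, P] = lu(A);      % Gaussian elimination with partial pivoting
    norm(P*A - L*U)         % ~ machine precision
    norm(L - tril(L))       % 0: L is lower triangular, with ones on the diagonal
    norm(U - triu(U))       % 0: U is upper triangular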
53
00:06:42,800 --> 00:06:52,550
Now, one of the handouts maybe you can look at now is an essay I wrote a few years ago about Gaussian elimination as an iterative algorithm.
54
00:06:53,960 --> 00:06:55,820
There's some fascinating history here.
55
00:06:56,990 --> 00:07:07,820
Let me remind you that conjugate gradient was invented in 1952, and the authors regarded it as both an iterative and a direct algorithm.
56
00:07:08,060 --> 00:07:12,980
But somehow most of the world interpreted it mostly in those early years as a direct algorithm.
57
00:07:13,220 --> 00:07:21,590
And then around 1971, with the work of John Reid and others, people began to interpret it as mostly an iterative algorithm.
58
00:07:23,710 --> 00:07:26,800
Now, Gaussian elimination, weirdly,
59
00:07:28,940 --> 00:07:35,150
has a slightly analogous history. It's, of course, in some sense ancient; it certainly goes back to Gauss.
60
00:07:35,180 --> 00:07:38,360
Actually, it goes back to the Chinese. 2000 years ago.
61
00:07:38,930 --> 00:07:46,010
And of course, its main use is as a direct algorithm, which means you have a matrix.
62
00:07:46,220 --> 00:07:49,220
You do the elimination and you get to the end and then you stop.
63
00:07:50,870 --> 00:07:56,030
Now, that's still true. That's still the main use. But sort of in the last ten years.
64
00:07:59,300 --> 00:08:04,760
There's a big business going around where people do low rank compressions of things.
65
00:08:05,930 --> 00:08:12,649
So this is a big theme in large scale linear algebra these days where you take a matrix that
66
00:08:12,650 --> 00:08:18,950
might be a million by a million and you try to represent it by something of rank 100 or 200.
67
00:08:19,280 --> 00:08:22,790
Now, what does that mean to be rank 100 or 200? It means.
68
00:08:24,090 --> 00:08:33,720
u1 v1 transpose plus u2 v2 transpose, and so on up to some modest number, 200 let's say.
69
00:08:36,640 --> 00:08:44,200
So there's a picture: a rank-one matrix plus another rank-one matrix plus another rank-one matrix and so on,
70
00:08:44,210 --> 00:08:47,830
200 of them. That's a rank 200 matrix.
71
00:08:49,080 --> 00:08:55,740
That's a fruitful way to approximate matrices of dimensions in the millions somehow, sometimes.
72
00:08:56,130 --> 00:09:01,650
For example, the famous Netflix competition, where people were trying to predict movie preferences,
73
00:09:01,860 --> 00:09:07,620
is in this general area. You have this vast matrix where one dimension is movies.
74
00:09:07,620 --> 00:09:09,630
The other dimension is viewers of movies.
75
00:09:10,080 --> 00:09:15,570
You only have a few of the entries, but if you make the approximation that the Matrix has reasonably low rank,
76
00:09:15,840 --> 00:09:18,840
you can predict the next movie that you're going to like.
77
00:09:19,890 --> 00:09:23,580
That turns out to be intimately related with Gaussian elimination.
78
00:09:23,580 --> 00:09:31,320
And let me just draw the picture of why that is. Suppose you have a million by a million matrix or a 10 million by a million matrix.
79
00:09:32,040 --> 00:09:38,850
Suppose you find the biggest entry in it and then consider that row and that column.
80
00:09:40,350 --> 00:09:45,360
Well, the product of that column times that row is a rank-one approximation.
81
00:09:46,800 --> 00:09:52,560
If you take your matrix and subtract off that rank-one approximation, suitably scaled,
82
00:09:53,130 --> 00:09:56,610
you get something which is now zero in that column and that row.
83
00:09:57,240 --> 00:09:59,250
Then you do it again. So that was the first step.
84
00:10:00,480 --> 00:10:13,470
At step two, you find the biggest entry of what remains, and you draw a row and a column through there, and then you subtract off that rank one term.
85
00:10:14,070 --> 00:10:18,060
So we've now approximated this huge matrix by a rank-two approximation.
86
00:10:18,360 --> 00:10:24,900
If you continue that process, that turns out to be a pretty good way to find low rank approximations to matrices.
87
00:10:25,260 --> 00:10:28,950
And lo and behold, that's exactly Gaussian elimination.
88
00:10:28,950 --> 00:10:34,320
It's actually Gaussian elimination with not only row interchanges, but also column interchanges.
89
00:10:34,530 --> 00:10:39,120
But basically, that's one way of interpreting the Gaussian elimination process.
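(A minimal MATLAB sketch of that rank-one peeling process; the matrix A and the target rank k here are illustrative assumptions, and this is just Gaussian elimination with complete pivoting in disguise, not anyone's production code.)

    A = randn(200, 150);                 % stand-in for a much larger matrix
    k = 20;                              % target rank
    E = A;                               % residual
    Ak = zeros(size(A));                 % rank-k approximation being built
    for step = 1:k
        [~, idx] = max(abs(E(:)));       % biggest remaining entry (the pivot)
        [i, j] = ind2sub(size(E), idx);
        piece = E(:, j) * E(i, :) / E(i, j);   % scaled outer product, rank one
        Ak = Ak + piece;
        E = E - piece;                   % row i and column j of E are now zero
    end
    norm(A - Ak)                         % quality of the rank-k approximation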
90
00:10:40,500 --> 00:10:45,750
So I'm telling you on this panel that every algorithm should be interpreted as a matrix factorisation.
91
00:10:46,200 --> 00:10:52,740
But I'm contradicting myself over here because the easiest way to draw this picture is not as a matrix factorisation,
92
00:10:52,950 --> 00:10:57,270
but just as a decomposition into a sum of rank one pieces.
93
00:10:59,820 --> 00:11:07,870
So just to finish off the history: conjugate gradients as a direct algorithm is essentially dead, and hardly anybody would ever use it that way.
94
00:11:07,890 --> 00:11:16,140
Gaussian elimination remains mostly a direct algorithm, but this iterative side of it is growing in a dramatic way,
95
00:11:18,280 --> 00:11:23,950
an important part of large-scale data manipulation. Okay.
96
00:11:23,960 --> 00:11:30,230
That's the end of the digression. Let's return to the picture.
97
00:11:30,410 --> 00:11:33,800
Of course, I should summarise what LU is used for.
98
00:11:34,010 --> 00:11:38,030
So it's used for systems of equations.
99
00:11:43,240 --> 00:11:44,710
And low rank compression.
100
00:11:54,480 --> 00:12:04,590
Now the next one is called Cholesky factorisation, and that's Gaussian elimination in the special case of a symmetric positive definite matrix.
101
00:12:09,040 --> 00:12:13,209
You remember my rule of thumb that half of the problems that arise in practice are symmetric.
102
00:12:13,210 --> 00:12:25,030
positive definite. If A is symmetric positive definite, then we can write A as an LU factorisation with a symmetry in it.
103
00:12:25,240 --> 00:12:29,770
So we write A equals R transpose R.
104
00:12:31,660 --> 00:12:39,239
So now R is upper triangular. You can actually do this
105
00:12:39,240 --> 00:12:45,480
if A is semi-definite; it might have zero eigenvalues, but it can't have negative eigenvalues.
106
00:12:45,750 --> 00:12:51,870
And this, of course, is used for symmetric positive definite systems.
107
00:12:53,790 --> 00:13:04,130
Of equations. So in MATLAB, for example, if you type backslash, it actually figures out whether your matrix is symmetric or not.
108
00:13:04,400 --> 00:13:08,300
If it's not symmetric, it uses LU factorisation.
109
00:13:08,660 --> 00:13:15,980
If it is symmetric, it tries Cholesky factorisation, and if that's successful, then it knows the matrix was not only symmetric but positive definite.
110
00:13:16,250 --> 00:13:19,760
If it's unsuccessful, it knows that it was symmetric but not positive.
111
00:13:19,760 --> 00:13:25,700
definite, and it goes back and uses LU factorisation.
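(A minimal MATLAB sketch of that decision logic, not how backslash is actually implemented internally; the two-output form of chol reports failure instead of erroring, and the matrix A and right-hand side b are illustrative assumptions.)

    A = randn(5); A = A'*A + eye(5);   % an illustrative symmetric positive definite matrix
    b = randn(5, 1);
    [R, p] = chol(A);                  % p == 0 means A was symmetric positive definite
    if p == 0
        x = R \ (R' \ b);              % solve via A = R'*R, two triangular solves
    else
        [L, U, P] = lu(A);             % fall back to Gaussian elimination
        x = U \ (L \ (P*b));
    end
    norm(A*x - b)                      % ~ machine precision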
112
00:13:25,700 --> 00:13:29,810
The fourth one I'd like to mention is the eigenvalue factorisation.
113
00:13:42,140 --> 00:13:45,890
So when you do an eigenvalue factorisation, you're making a matrix diagonal.
114
00:13:47,120 --> 00:13:53,630
So we write it like this: A equals V D V inverse.
115
00:13:54,080 --> 00:14:00,230
Now, this is not always possible, because not every matrix is diagonalisable, but generically every matrix is.
116
00:14:00,530 --> 00:14:03,680
If you take a matrix that isn't and perturb it a little bit, it will be.
117
00:14:04,370 --> 00:14:08,510
So here what we have is: A has to be a diagonalisable square matrix.
118
00:14:12,220 --> 00:14:21,400
And V has to be nonsingular. And D is diagonal.
119
00:14:26,220 --> 00:14:29,850
So this is called a similarity transformation. And as you know,
120
00:14:29,850 --> 00:14:37,320
I'm sure, the eigenvalues of a matrix are the same as the eigenvalues of another matrix related to it by a similarity transformation.
121
00:14:37,590 --> 00:14:43,770
So whatever V is, so long as it's not singular, the eigenvalues of A are the same as the eigenvalues of D,
122
00:14:44,100 --> 00:14:52,890
and if D is diagonal, those are the diagonal entries. So let me just remind you of the point here, which is algorithms in all of these cases.
123
00:14:54,370 --> 00:14:58,780
The algorithm that is the standard one consists of constructing the factorisation
124
00:14:59,050 --> 00:15:03,400
to simplify the problem and then doing what you need to do in that simpler form.
125
00:15:03,730 --> 00:15:07,540
So for example, if you want to solve a system of equations,
126
00:15:07,540 --> 00:15:15,369
you turn the matrix into a lower triangular and an upper triangular product that then amounts to a succession of two problems,
127
00:15:15,370 --> 00:15:20,469
each of which is triangular. It's easy to solve a triangular system by substitution.
128
00:15:20,470 --> 00:15:22,720
You do one variable, then the next, then the next.
129
00:15:23,320 --> 00:15:31,000
Similarly, in this case, if you can make the matrix diagonal, then of course it's trivial to solve the eigenvalue problem.
130
00:15:31,150 --> 00:15:37,660
So that's how it's done. Somehow or other you make the matrix diagonal, then you've got an easy eigenvalue problem.
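(A minimal MATLAB sketch of the first of those patterns: factorise once, then solve a system of equations with two easy triangular solves by substitution; A and b here are illustrative.)

    A = magic(5); b = ones(5, 1);
    [L, U, P] = lu(A);      % factorise: P*A = L*U
    y = L \ (P*b);          % forward substitution with unit lower triangular L
    x = U \ y;              % back substitution with upper triangular U
    norm(A*x - b)           % ~ machine precision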
131
00:15:39,840 --> 00:15:44,070
I have three more to mention, so I guess I'll leave those there and move here.
132
00:15:49,770 --> 00:15:55,679
The fifth one is again eigenvalue factorisation, but in the special case of a symmetric matrix.
133
00:15:55,680 --> 00:16:02,040
So I'll call it the orthogonal eigenvalue factorisation.
134
00:16:12,930 --> 00:16:23,729
I remind you that if you have a real symmetric matrix then it has a complete set of orthogonal eigenvectors. In the continuous analogue,
135
00:16:23,730 --> 00:16:27,330
if you have a Hermitian operator like the Schrödinger operator,
136
00:16:27,540 --> 00:16:34,440
it has a complete set of orthogonal eigenfunctions. So that's the case where everything works as beautifully as possible.
137
00:16:35,070 --> 00:16:43,800
If you think of it in terms of matrices and factorisations, this becomes Q D Q transpose,
138
00:16:44,040 --> 00:16:47,640
a special case of this in which V is orthogonal.
139
00:16:48,840 --> 00:16:56,650
So in this case, Q is orthogonal. And D is diagonal.
140
00:17:00,900 --> 00:17:10,020
So the algorithms for solving symmetric matrix eigenvalue problems consist of orthogonal operations that reduce it to diagonal form.
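(A minimal MATLAB sketch of this symmetric case; the symmetrised matrix is an illustrative assumption.)

    A = magic(5); B = A + A';       % an illustrative real symmetric matrix
    [Q, D] = eig(B);                % for symmetric input, Q comes out orthogonal
    norm(Q'*Q - eye(5))             % ~ machine precision
    norm(B - Q*D*Q')                % ~ machine precision: B = Q*D*Q'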
141
00:17:12,240 --> 00:17:19,560
There's another eigenvalue factorisation, which in fact is closer to what the algorithms do than what I described here.
142
00:17:20,850 --> 00:17:22,980
And that's called the Schur factorisation.
143
00:17:32,010 --> 00:17:39,570
So if you have a diagonalisable matrix, then there exists some similarity transformation which takes it to diagonal form.
144
00:17:39,900 --> 00:17:46,500
But you can imagine there must be something a little suspect about that operation, because if your matrix is nearly non-diagonalisable,
145
00:17:46,500 --> 00:17:49,870
something's got to go wrong here. And that is indeed true:
146
00:17:49,890 --> 00:18:00,070
the inverse becomes huge, typically. It turns out that every matrix, diagonalisable or not, can be made triangular as opposed to diagonal.
147
00:18:00,660 --> 00:18:07,440
So the Schur factorisation is A equals Q T Q transpose,
148
00:18:09,290 --> 00:18:17,930
where Q is orthogonal and T is upper triangular.
149
00:18:26,390 --> 00:18:35,090
So let me just emphasise again that for this kind of eigenvalue factorisation where you make the matrix diagonal,
150
00:18:35,390 --> 00:18:44,750
that only works for matrices that are diagonalisable, whereas the Schur factorisation works for any matrix, any square matrix.
151
00:18:47,380 --> 00:18:54,220
So that in fact is what algorithms really do normally for eigenvalues of non symmetric matrices.
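(A minimal MATLAB sketch; note that for a real matrix MATLAB's schur returns the real Schur form, which is only quasi-triangular when there are complex eigenvalues, while schur(A,'complex') gives a strictly upper triangular T.)

    A = magic(5);
    [Q, T] = schur(A, 'complex');   % A = Q*T*Q', T upper triangular
    norm(A - Q*T*Q')                % ~ machine precision
    norm(T - triu(T))               % ~ 0: T is upper triangular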
152
00:18:56,470 --> 00:18:59,500
And then the final one is the singular value decomposition.
153
00:19:04,810 --> 00:19:07,540
The SVD, the singular value decomposition.
154
00:19:10,710 --> 00:19:21,120
The roots of this are a century or more old, but it became big computationally in the 1960s, and the famous man who made it big was Gene Golub.
155
00:19:22,260 --> 00:19:25,770
What you have here is a factorisation of a matrix A
156
00:19:35,550 --> 00:19:46,230
into a product that's normally written U Sigma V transpose, where U and V are orthogonal
157
00:19:50,450 --> 00:19:54,980
and Sigma is diagonal. And more than that:
158
00:19:56,090 --> 00:20:06,590
if we write it in the usual form, it would have entries sigma 1, sigma 2, down to sigma n, and these are real numbers that are sorted.
159
00:20:06,860 --> 00:20:13,790
So sigma 1 is bigger than or equal to sigma 2, and so on down to sigma n, which is bigger than or equal to zero.
160
00:20:15,860 --> 00:20:18,920
So that's the singular value decomposition.
161
00:20:19,430 --> 00:20:22,730
It reduces A to a diagonal matrix.
162
00:20:23,450 --> 00:20:29,769
You can always do that. But the catch is that it's using different matrices
163
00:20:29,770 --> 00:20:33,430
on the two sides: U and V are both orthogonal, but they're not the same in general.
164
00:20:33,640 --> 00:20:40,360
So that's not a similarity transformation. You can do it for any A, doesn't have to be real.
165
00:20:40,720 --> 00:20:46,960
That doesn't have to be symmetric. So these are really seven of the glories of our field.
166
00:20:48,070 --> 00:20:49,240
Every one is important.
167
00:20:49,240 --> 00:20:58,930
This one is hugely important, both for classical dense matrices and in all sorts of ways in the large scale world, in data analysis and big data.
168
00:20:59,260 --> 00:21:02,530
People are always doing things related to the SVD.
169
00:21:02,890 --> 00:21:06,790
It's related to principal component analysis, and the proper orthogonal
170
00:21:06,790 --> 00:21:14,260
decomposition, and the Karhunen-Loève transform; there are many ways that people have discovered things related to the SVD over the years.
171
00:21:15,790 --> 00:21:21,910
Now I want to draw your attention to the ones here that generalise to rectangular matrices.
172
00:21:22,450 --> 00:21:29,950
So QR Factorisation certainly does generalise to a rectangular matrix.
173
00:21:30,200 --> 00:21:35,410
LU factorisation also generalises to a rectangular matrix.
174
00:21:37,540 --> 00:21:44,470
Cholesky does not, because it assumes symmetry. Eigenvalue factorisation does not, because eigenvalues only make sense for square matrices.
175
00:21:44,740 --> 00:21:49,690
So of course, that one also doesn't and that one also doesn't, but the SVD generalises.
176
00:21:50,290 --> 00:21:56,439
So those three apply to non symmetric and non square matrices.
177
00:21:56,440 --> 00:22:03,910
Also, classically, nobody would think of LU for a non-square matrix, because classically we think of it as solving a system of equations.
178
00:22:04,210 --> 00:22:10,000
But as soon as you get into these rank compression things, you are certainly using non square matrices.
179
00:22:12,100 --> 00:22:17,650
Okay. So what we're going to do now is play with these factorisations with the code called M16.
180
00:22:32,370 --> 00:22:40,230
So all I'm going to do here is pick a five-by-five matrix that I like and then play with it with the seven factorisations.
181
00:22:40,440 --> 00:22:49,079
So let's say A equals magic of five, which is a magic square consisting of numbers that all add up to the same thing.
182
00:22:49,080 --> 00:22:52,409
Rows and columns. You probably know what that number is.
183
00:22:52,410 --> 00:22:55,800
If I take the sums of the columns, I get 65.
184
00:22:55,980 --> 00:23:01,260
Each column adds up to 65. If I take the sums of the rows, I get 65.
185
00:23:01,270 --> 00:23:07,080
Each row adds up to 65. It's a convenient non-symmetric matrix to play with.
186
00:23:08,250 --> 00:23:12,959
So let's look at the QR factorisation of A; there it is.
187
00:23:12,960 --> 00:23:18,960
Let's make format short. So there's A and there's its QR factorisation.
188
00:23:20,340 --> 00:23:31,320
Of course you can't really see that Q is orthogonal, but you can see that R is upper triangular. To machine precision, A should be equal to QR, so
189
00:23:31,590 --> 00:23:38,870
if I say Q times R minus A, you see a matrix whose entries are of size ten to the minus 14.
190
00:23:41,720 --> 00:23:53,000
So that's the QR factorisation. Let's do LU. So there's A again; let's say L, U, P equals lu of A.
191
00:23:56,910 --> 00:24:05,730
That's wrong. Now we have it right. So you see there that L is lower triangular, and moreover it has ones on the diagonal.
192
00:24:05,970 --> 00:24:09,000
U is upper triangular and P is a permutation matrix.
193
00:24:09,330 --> 00:24:15,570
I didn't say that: normally P would be chosen at each step to bring the biggest entry in a column to the diagonal.
194
00:24:15,810 --> 00:24:25,290
If that's done, then L is not only unit lower triangular, but all the entries below the diagonal have size at most one in absolute value.
195
00:24:25,830 --> 00:24:28,890
Notice all the entries of L are positive. That's
196
00:24:29,890 --> 00:24:34,590
totally not typical. And I presume that's a property.
197
00:24:34,600 --> 00:24:38,460
I don't know. I'm guessing that's the property of magic squares. I've never noticed it before.
198
00:24:38,470 --> 00:24:44,620
Is it? Here we have a statistical question: is a five-by-five example enough evidence?
199
00:24:45,550 --> 00:24:49,690
I can't resist trying it. Let's see what happens. Let's, just for fun,
200
00:24:49,690 --> 00:24:56,200
say L, U, P equals lu of magic of ten.
201
00:24:59,560 --> 00:25:04,990
And we'll then print L. Okay. Now, I don't have an opinion, but you all must have one.
202
00:25:05,050 --> 00:25:08,800
Let's have votes. Who thinks that all the entries below the diagonal will be positive?
203
00:25:10,120 --> 00:25:13,720
Who thinks that they will be a mix of positive and negative. Okay.
204
00:25:14,660 --> 00:25:19,460
So most people think mixed. Maybe I'm inclined towards the other, but I really don't know.
205
00:25:20,010 --> 00:25:23,930
Um. You're right. Okay. So the five-by-five case was a coincidence.
206
00:25:26,140 --> 00:25:32,870
Okay. Back to good old A. Oh, we didn't confirm that.
207
00:25:35,830 --> 00:25:43,780
Sorry. There's A once more, and there is its LU factorisation.
208
00:25:44,680 --> 00:25:51,270
We didn't confirm that the factorisation gives the right thing, so let's do that: P times A minus L times
209
00:25:51,280 --> 00:25:55,560
U should be zero. And sure enough it's ten to the minus 14.
210
00:25:57,130 --> 00:26:03,160
Okay. Now let's try a Cholesky factorisation. So there's A, and it's not symmetric, let alone positive
211
00:26:03,160 --> 00:26:08,620
definite. If I try chol of A, it says error, which is the right answer.
212
00:26:09,370 --> 00:26:14,410
Now suppose I try R equals chol of A transpose A.
213
00:26:15,630 --> 00:26:21,810
That is symmetric positive definite. So it proceeds happily, and indeed R transpose times R
214
00:26:23,220 --> 00:26:32,010
minus A transpose times A is of size machine precision. That's actually larger than I'd like it to be,
215
00:26:32,010 --> 00:26:36,830
ten to the minus 12. I don't know why it's so big. Let's do another one.
216
00:26:37,490 --> 00:26:41,000
So there's A again; let's look at its eigenvalues.
217
00:26:42,040 --> 00:26:48,970
They happen to be all real numbers. A priori they might have been complex, perhaps, but they're all real numbers.
218
00:26:50,740 --> 00:26:57,220
We could say V, D equals eig of A, and then we have the matrix of eigenvectors.
219
00:26:57,640 --> 00:27:01,600
The five columns are the eigenvectors and then the matrix of eigenvalues.
220
00:27:03,250 --> 00:27:11,040
So if A equals V D V inverse, that's the same as saying A V equals V D.
221
00:27:11,080 --> 00:27:14,130
So the columns of V are the eigenvectors of a.
222
00:27:14,470 --> 00:27:20,560
And let's confirm that: A times V minus V times D should be zero.
223
00:27:21,760 --> 00:27:28,450
So make sure you're with me on this. A times V amounts to multiplying a by each of the five columns of V,
224
00:27:28,780 --> 00:27:34,630
and the results should be those columns scaled by the eigenvalues on the diagonal of D.
225
00:27:35,020 --> 00:27:38,140
So if these are really eigenvectors, we should get zero.
226
00:27:38,680 --> 00:27:44,040
And sure enough, ten to the minus 13. Okay.
227
00:27:44,430 --> 00:27:50,640
Let's try A plus A transpose.
228
00:27:50,880 --> 00:27:55,680
That's real but not positive definite. So it's symmetric but not positive
229
00:27:55,680 --> 00:27:59,820
definite. It has real eigenvalues and some of them are negative.
230
00:28:01,570 --> 00:28:16,030
So I could say Q, D equals eig of A plus A transpose, and there you have the eigenvalues, and you also have orthonormal eigenvectors.
231
00:28:18,580 --> 00:28:27,970
130. Why 130? Well, the magic square has its biggest eigenvalue 65, and the eigenvector is the constant vector, all ones.
232
00:28:28,210 --> 00:28:35,530
Here, you see, that eigenvector is constant. And because we have A plus A transpose, we have 65 plus 65.
233
00:28:40,130 --> 00:28:44,630
And to confirm that that works, I could say A plus A transpose minus
234
00:28:44,630 --> 00:28:50,070
Q times D times Q transpose, and that is not zero.
235
00:28:50,090 --> 00:28:58,230
What did I do wrong? Oh I'm sorry.
236
00:28:58,240 --> 00:29:02,110
Because I didn't do A plus A transpose, did I? Thank you.
237
00:29:03,790 --> 00:29:11,019
That should have been A plus A transpose. And now 130 is on the bottom.
238
00:29:11,020 --> 00:29:14,410
Right. That's okay. And let's now confirm.
239
00:29:15,820 --> 00:29:20,630
Yeah. Machine precision. Okay.
240
00:29:21,250 --> 00:29:25,899
Well, let me remind you again that the eigenvalues of A look like that, and that every
241
00:29:25,900 --> 00:29:30,520
matrix has a Schur factorisation, an orthogonal reduction to triangular form.
242
00:29:30,850 --> 00:29:42,250
So if I say Q, T equals schur of A, then you see we've got an orthogonal matrix and a triangular matrix.
243
00:29:42,580 --> 00:29:47,469
So I could say A minus Q times T times
244
00:29:47,470 --> 00:29:50,620
Q prime: machine precision.
245
00:29:52,730 --> 00:30:00,620
Finally, we're going to talk about the SVD. Now, the SVD of A: if you just type that, you get the numbers, which are its singular values.
246
00:30:00,890 --> 00:30:07,370
If you type U, S, V equals svd of A, you get the three matrices U, S and V.
247
00:30:07,910 --> 00:30:13,940
So there we have the diagonal of singular values, all positive real numbers, and the two orthogonal matrices
248
00:30:13,940 --> 00:30:18,469
U and V. I could say A minus U
249
00:30:18,470 --> 00:30:23,260
times S times V transpose.
250
00:30:25,180 --> 00:30:31,749
So this kind of stuff is happening all the time. Whenever you compute with matrices, virtually anything you do will involve a matrix.
251
00:30:31,750 --> 00:30:36,829
Factorisation. Okay.
252
00:30:36,830 --> 00:31:22,470
So let's talk about the SVD in particular. The SVD is a very pretty topic.
253
00:31:23,160 --> 00:31:26,960
It is more important than you could possibly imagine in practice.
254
00:31:26,970 --> 00:31:37,360
It's used all over the place. And one of the prevailing facts decade after decade, especially at Oxford,
255
00:31:37,750 --> 00:31:42,220
is that mathematics students are not taught as much about the SVD as they should be.
256
00:31:42,610 --> 00:31:48,549
At more applied universities the balance is better, but at a really high-powered,
257
00:31:48,550 --> 00:31:53,890
pure mathematical bastion like Oxford University, students are not taught enough about the SVD.
258
00:31:54,970 --> 00:31:59,080
I don't fully understand why that is. Of course, it's a newer concept in practice.
259
00:32:00,340 --> 00:32:07,160
Eigenvalues, students learn all about. Eigenvalues are essentially an algebraic concept.
260
00:32:07,820 --> 00:32:12,710
Singular values are more of a concept of analysis, so they almost have a completely different flavour.
261
00:32:12,980 --> 00:32:16,850
And the algebra always wins at the undergraduate mathematics level.
262
00:32:17,420 --> 00:32:24,710
So here, as in so many places, you probably find that engineers know more about singular values than mathematicians.
263
00:32:26,400 --> 00:32:29,550
Okay. Let's get a sense of what's involved here.
264
00:32:30,600 --> 00:32:36,899
So as I mentioned, the SVD consists of a factorisation into orthogonal times,
265
00:32:36,900 --> 00:32:44,820
diagonal times orthogonal, and the diagonal entries are non-negative real numbers, in order.
266
00:32:45,330 --> 00:32:49,410
And let's now allow A to be rectangular.
267
00:32:49,800 --> 00:33:00,090
So if A is a rectangular matrix, m by n, for simplicity I'll assume that m is bigger than or equal to n.
268
00:33:01,990 --> 00:33:07,720
Then what we have here, in the form that people would most often use it in practice,
269
00:33:09,530 --> 00:33:21,169
is: a rectangular matrix A is equal to a rectangular matrix U, times a diagonal matrix Sigma, times a square matrix
270
00:33:21,170 --> 00:33:37,440
V. So this is A equals U Sigma V transpose: m by n equals m by n, times n by n, times n by n. U —
271
00:33:38,240 --> 00:33:43,130
when we talked about square matrices, that was orthogonal, but now it will have orthonormal columns.
272
00:33:43,460 --> 00:33:47,030
So U has orthonormal columns.
273
00:33:48,560 --> 00:33:56,799
And of course V is orthogonal. So that's the standard way that the SVD is used.
274
00:33:56,800 --> 00:34:06,280
The idea is to take a matrix which maybe has a lot of rows and find new bases for the range and the domain.
275
00:34:06,700 --> 00:34:12,430
So in effect, this is a basis for the domain and this is the basis for the range of a.
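(A minimal MATLAB sketch of that rectangular, "economy size" form; the tall random matrix is an illustrative assumption.)

    A = randn(8, 3);                 % m-by-n with m >= n
    [U, S, V] = svd(A, 'econ');      % U is m-by-n, S is n-by-n, V is n-by-n
    norm(U'*U - eye(3))              % ~ machine precision: orthonormal columns
    norm(V'*V - eye(3))              % ~ machine precision: V is orthogonal
    norm(A - U*S*V')                 % ~ machine precision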
276
00:34:12,790 --> 00:34:18,700
And here's the picture that is at the heart of everything. Take any matrix.
277
00:34:21,330 --> 00:34:25,770
If it's an m-by-n matrix, then it's a map from R^n
278
00:34:27,260 --> 00:34:35,780
to R^m. Let's see what happens when A is applied to a ball.
279
00:34:36,080 --> 00:34:42,350
So take the unit ball, defined in the obvious way in R^n: all the vectors of size at most one.
280
00:34:43,910 --> 00:34:52,750
Every matrix, without exception, always maps the unit ball onto an ellipse, or more precisely, I should say, a hyperellipsoid.
281
00:34:53,500 --> 00:34:58,260
What's a hyperellipsoid? It's a ball that gets stretched in a linear fashion.
282
00:34:58,720 --> 00:35:08,740
So every matrix, without exception, will map 0 to 0, of course, and then it will map this ball onto some hyperellipsoid.
283
00:35:10,860 --> 00:35:17,969
This is an n-dimensional object. This would generically be an n-dimensional hyperellipsoid, but it might be flat in some dimensions.
284
00:35:17,970 --> 00:35:23,520
So its actual dimension could be less than n in those degenerate cases.
285
00:35:24,330 --> 00:35:27,390
Now, the SVD is all about describing that map.
286
00:35:30,720 --> 00:35:38,460
Suppose you have an ellipse. It has a major semi-axis and a minor semi-axis.
287
00:35:39,900 --> 00:35:44,160
Generically speaking, you have no choice here except plus or minus signs.
288
00:35:44,430 --> 00:35:48,300
If it happened that the two were of equal length and this was a circle, then you'd have a choice.
289
00:35:48,600 --> 00:35:54,570
But generically, they're uniquely defined. So let's call this one.
290
00:35:55,870 --> 00:35:58,900
sigma 1 times u1.
291
00:36:01,680 --> 00:36:05,610
The length of this is the first singular value.
292
00:36:06,420 --> 00:36:11,010
The direction of it is the first singular vector, which means the first column of U.
293
00:36:11,220 --> 00:36:20,320
So u1 is the first column of U. And let's pretend this is a picture in n dimensions.
294
00:36:20,380 --> 00:36:24,220
So I'm going to call this one the last one. This will be sigma n.
295
00:36:25,150 --> 00:36:33,700
u n. So I really should have used the red pen for consistency.
296
00:36:40,980 --> 00:36:46,320
So we take our hyperellipsoid and we look at its principal axes.
297
00:36:47,700 --> 00:36:55,260
Those are the singular vectors u sub j times the corresponding singular values sigma sub j,
298
00:36:55,680 --> 00:36:58,890
and it's clear geometrically that that's a sensible thing you can do.
299
00:36:59,370 --> 00:37:04,950
And now the remarkable fact, which of course, can be proved and may not be obvious at first,
300
00:37:05,280 --> 00:37:11,100
is that the pre-images of these principal axes are always orthogonal over here.
301
00:37:11,580 --> 00:37:16,920
So we don't know where the pre-image of that line is, who knows where;
302
00:37:16,950 --> 00:37:19,620
let's say there, and let's call that v1.
303
00:37:20,670 --> 00:37:29,910
That's the first column of the orthogonal matrix V, and we don't know where the pre-image of that one will be, but it will be orthogonal to v1.
304
00:37:30,300 --> 00:37:34,330
So let's suppose it's here. That's v2 —
305
00:37:36,680 --> 00:37:46,059
sorry, v n; we're thinking of that as the last one, aren't we. So that's what the SVD does.
306
00:37:46,060 --> 00:37:54,400
That's the picture you should always have in your head. The SVD is about describing the orthogonal bases that correspond to each other
307
00:37:54,640 --> 00:38:03,670
when you map a unit ball onto a hyperellipsoid. And it's perhaps intuitively plausible that, because of the orthogonality of those bases,
308
00:38:03,670 --> 00:38:11,980
these are numerically a very good way to work. If you want to study the properties of A or of A inverse, this is a magnificent way to do that.
309
00:38:13,120 --> 00:38:17,709
The singular value decomposition has no relevance to powers of A. If A is square
310
00:38:17,710 --> 00:38:24,780
and you want to talk about A to the 10th or A to the millionth, the SVD is irrelevant, because the left and the right have different bases.
311
00:38:24,790 --> 00:38:28,030
You can't square it or cube it. You need eigenvalues for that.
312
00:38:28,450 --> 00:38:36,540
The SVD is always about A itself, not powers. Okay.
313
00:38:36,630 --> 00:38:44,740
Let me. I think I've actually said all this already, but let's just spell it out a little more fully.
314
00:38:45,010 --> 00:38:54,700
So the picture I'm talking about is that you have a matrix A and we are thinking in terms of multiplying it by
315
00:38:57,480 --> 00:39:02,820
the vectors v1 through vn. That's equal to
316
00:39:06,080 --> 00:39:11,380
a matrix u1 through un, times
317
00:39:11,830 --> 00:39:17,820
the diagonal matrix sigma 1 through sigma n.
318
00:39:21,290 --> 00:39:25,850
So I got this equation by taking that picture and multiplying it on the right by V.
319
00:39:26,690 --> 00:39:34,550
So the SVD — this is the algebra corresponding to that picture: A times a singular vector v
320
00:39:35,390 --> 00:39:39,680
gives you the corresponding singular vector u, scaled by a singular value.
321
00:39:40,280 --> 00:39:57,320
So the picture and the algebra are the same. Now let's imagine that you have a vector b which is equal to A times a coefficient vector x.
322
00:40:00,680 --> 00:40:07,790
Then another way to say that is that b is equal to U Sigma V transpose x.
323
00:40:09,290 --> 00:40:12,380
And another way to say that is that.
324
00:40:14,240 --> 00:40:23,550
U transpose b is equal to Sigma V transpose x.
325
00:40:26,280 --> 00:40:36,590
That's not a sum; that's the matrix Sigma. And now that neurone in your brain fires and you see that this is the vector
326
00:40:38,000 --> 00:40:41,450
of coefficients of x
327
00:40:43,910 --> 00:40:50,360
in the basis of columns of V.
328
00:40:55,170 --> 00:41:01,650
Now an arbitrary m-vector could not be represented as a linear combination of these n
329
00:41:03,310 --> 00:41:07,900
vectors u1 through un, but if it happens to be of this form, then it can be.
330
00:41:07,910 --> 00:41:11,500
So this is the vector of coefficients
331
00:41:14,310 --> 00:41:17,400
of b in the basis
332
00:41:19,820 --> 00:41:23,240
of columns of U.
333
00:41:25,270 --> 00:41:37,090
So in other words, what we're saying here is that if you take a matrix A and then change bases in both the domain and the range in this SVD fashion,
334
00:41:37,690 --> 00:41:46,000
then the problem becomes diagonal. Every matrix is diagonal in some suitably chosen orthogonal bases.
335
00:41:47,320 --> 00:41:55,370
And that's why the SVD is so useful. Okay.
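(A minimal MATLAB sketch of that change of bases; the matrix and the coefficient vector x, and hence b = A*x, are illustrative.)

    A = randn(8, 3);
    [U, S, V] = svd(A, 'econ');
    x = randn(3, 1);
    b = A * x;
    norm(U'*b - S*(V'*x))    % ~ machine precision: U'*b = Sigma*(V'*x)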
336
00:41:55,370 --> 00:42:00,770
I'm going to play with it in a minute on the computer. But first, I want to tell you some key algebraic facts.
337
00:42:03,800 --> 00:42:14,960
So here are some facts. And this is for rectangular A, which of course as a special case includes square A.
338
00:42:17,160 --> 00:42:21,690
So one fact: every A has a singular value decomposition,
339
00:42:24,110 --> 00:42:32,110
in contrast to an eigenvalue decomposition. Singular values are unique,
340
00:42:35,320 --> 00:42:47,620
but not singular vectors. The singular vectors always have plus-or-minus sign choices, and in non-generic cases they have further choices
341
00:42:47,630 --> 00:43:00,320
beyond that. The two-norm of a matrix is something we haven't defined, but it's a standard concept, and it's equal to the first singular value of A.
342
00:43:09,970 --> 00:43:13,930
The Frobenius norm of a matrix is also something we haven't defined.
343
00:43:14,890 --> 00:43:18,219
People write it as A sub F, but it's easy to define that one.
344
00:43:18,220 --> 00:43:20,380
It's just the square root of the sum, of the squares,
345
00:43:21,430 --> 00:43:27,040
of the entries of A, and it's also equal to the square root of the sum of the squares of the singular values.
346
00:43:28,900 --> 00:43:34,330
So the definition is that it's the sum of the entries squared.
347
00:43:36,020 --> 00:43:44,030
square-rooted. Well, that's equal to the sum of the singular values squared, and then square-rooted.
348
00:43:47,190 --> 00:43:50,100
It's also true that the singular values of A
349
00:43:54,280 --> 00:44:02,140
are equal to — or rather, the set of singular values of A is equal to the set of square roots of the eigenvalues of A transpose A.
350
00:44:10,540 --> 00:44:17,290
So mathematically that means you can relate the SVD to an eigenvalue problem, but it's an eigenvalue problem for a different matrix.
351
00:44:17,590 --> 00:44:24,490
And moreover, numerically, that's not something to do if you can avoid it because it sort of squares the condition number.
352
00:44:24,700 --> 00:44:27,630
So that's not the standard way people compute the SVD,
353
00:44:28,030 --> 00:44:33,670
but you could in principle compute the SVD by computing the eigenvalue decomposition of A transpose A.
354
00:44:35,780 --> 00:44:44,900
A couple more facts. The rank of A is equal to the number of non-zero singular values.
355
00:44:51,800 --> 00:44:57,920
So you can see that's a very practically important thing. When MATLAB computes the rank of a matrix, it finds the SVD.
356
00:44:58,550 --> 00:45:05,900
It ignores singular values that are down at machine precision and tells you how many singular values are well above machine precision.
357
00:45:06,980 --> 00:45:13,459
The range of A is equal to the span of u
358
00:45:13,460 --> 00:45:18,590
1 through u r, where r is the rank.
359
00:45:20,490 --> 00:45:27,060
So the singular vectors u sub j give you an orthogonal basis for the range of A.
360
00:45:29,610 --> 00:45:33,660
And then I also want to write down three facts for square matrices.
361
00:45:45,400 --> 00:45:50,830
So the first fact is that the product of the singular values, so the product,
362
00:45:50,830 --> 00:46:02,590
let's say of sigma sub j is equal to the product of the absolute values of the eigenvalues, which in turn is the absolute value of the determinant.
363
00:46:06,320 --> 00:46:10,160
So singular values and eigenvalues have some connections.
364
00:46:15,120 --> 00:46:16,350
The norm of the inverse —
365
00:46:17,970 --> 00:46:26,610
and again, that would be the two-norm of the inverse, which we haven't defined — is equal to one over the smallest singular value.
366
00:46:27,900 --> 00:46:37,260
And finally, the condition number of A, which is defined as the product of the norm of A
367
00:46:38,370 --> 00:46:43,880
and the norm of A inverse. Well, you know what that is,
368
00:46:44,060 --> 00:46:49,520
because the norm of A is sigma 1 and the norm of A inverse is one over sigma n.
369
00:46:50,570 --> 00:46:55,340
So that's the ratio of the largest to the smallest singular value.
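(A minimal MATLAB sketch checking several of these facts for an illustrative square matrix; each printed quantity should come out around machine precision.)

    A = randn(5);
    s = svd(A);                     % singular values, sorted and non-negative
    norm(A) - s(1)                  % two-norm = largest singular value
    norm(A, 'fro') - norm(s)        % Frobenius norm = sqrt(sum of sigma_j^2)
    prod(s) - abs(det(A))           % product of the sigma_j = |det(A)|
    norm(inv(A)) - 1/s(end)         % norm of the inverse = 1/sigma_min
    cond(A) - s(1)/s(end)           % condition number = sigma_1/sigma_n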
370
00:47:04,440 --> 00:47:08,220
So the SVD is everywhere in numerical linear algebra.
371
00:47:10,290 --> 00:47:15,420
There isn't really time to talk about how it's used for low rank approximation, but let me just say a word or two.
372
00:47:15,630 --> 00:47:21,960
I mentioned that Gaussian elimination can be thought of as a way to compress a matrix, to find a low rank approximation.
373
00:47:22,320 --> 00:47:26,070
You have a million by million matrix. You find a rank 100 approximation.
374
00:47:26,520 --> 00:47:34,110
Well, the SVD gives the optimal such approximation in the two-norm and also in the Frobenius norm.
375
00:47:34,380 --> 00:47:39,690
So if you want the optimal rank 100 approximation to a million by million matrix,
376
00:47:39,690 --> 00:47:47,280
then mathematically you get that by taking the first hundred singular values and singular vectors u and singular vectors
377
00:47:47,280 --> 00:47:51,090
v. That's optimal in the mathematical sense.
378
00:47:51,180 --> 00:47:56,160
The trouble is, it's hard to compute. There's no way to compute the SVD of a million by million matrix.
379
00:47:56,400 --> 00:48:04,740
So if the matrix is small enough that you can afford to compute it, the SVD gives you optimal algorithms for low-rank approximation.
380
00:48:05,190 --> 00:48:10,560
If the matrix is huge, people use approximations to the SVD, such as Gaussian elimination.
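(A minimal MATLAB sketch of the optimal low-rank approximation by a truncated SVD; the matrix size and the rank k are illustrative.)

    A = randn(100, 80);
    k = 10;                                     % target rank
    [U, S, V] = svd(A);
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % best rank-k approximation
    s = diag(S);
    norm(A - Ak) - s(k+1)                       % ~ 0: the 2-norm error is sigma_{k+1}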
381
00:48:12,930 --> 00:48:16,440
And I just want to finish by playing around with some two by two matrices.
382
00:48:21,060 --> 00:48:25,560
So this is now the code called M17.
383
00:48:29,470 --> 00:48:37,060
So if you glance at the code, you'll see it's an infinite loop. And at each step, it finds a random matrix of dimension two.
384
00:48:38,170 --> 00:48:45,370
And then it computes its SVD. And then it draws a picture of the map from the unit ball to the hyperellipsoid.
385
00:48:45,520 --> 00:48:51,610
Except it's not a hyperellipsoid; it's just an ellipse. So I'll say M17.
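(A minimal sketch, not the lecturer's M17 code, of the same kind of picture: map the unit circle under a random 2-by-2 matrix and draw the singular vectors on both sides.)

    A = randn(2);
    [U, S, V] = svd(A);
    t = linspace(0, 2*pi, 200);
    circle = [cos(t); sin(t)];              % the unit ball in the plane
    ellipse = A * circle;                   % its image: an ellipse
    subplot(1, 2, 1), plot(circle(1,:), circle(2,:)), axis equal, hold on
    plot([0 V(1,1)], [0 V(2,1)], 'b', [0 V(1,2)], [0 V(2,2)], 'r')   % v1, v2
    subplot(1, 2, 2), plot(ellipse(1,:), ellipse(2,:)), axis equal, hold on
    plot([0 S(1,1)*U(1,1)], [0 S(1,1)*U(2,1)], 'b', ...              % sigma1*u1
         [0 S(2,2)*U(1,2)], [0 S(2,2)*U(2,2)], 'r')                  % sigma2*u2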
386
00:48:58,020 --> 00:49:06,800
So there is a matrix, a two-by-two matrix, and it maps the unit ball in the plane to that.
387
00:49:07,130 --> 00:49:12,410
And you can see we've drawn the singular vector u1 times the singular value.
388
00:49:12,650 --> 00:49:16,620
And similarly over here. Those seem to be on the axes.
389
00:49:16,630 --> 00:49:20,250
That's an astonishing coincidence. It's completely random.
390
00:49:20,260 --> 00:49:25,900
How did that happen? It's totally misleading. You should not expect the blue and the red lines to be on the axes.
391
00:49:26,230 --> 00:49:29,260
What property of A made that the case? Can we spot that?
392
00:49:29,890 --> 00:49:34,040
Michael, are you quick enough to spot that? Oh. Anyway.
393
00:49:34,040 --> 00:49:39,320
I'm astonished. Somebody tell me later what property of A leads to that.
394
00:49:42,140 --> 00:49:46,760
Let's do another one, which I hope will be more generic. Okay, that's more generic.
395
00:49:48,740 --> 00:49:52,850
Let me draw your attention to the fact that there's no right hand rule here.
396
00:49:52,850 --> 00:49:55,970
The red one is on the right of the blue one, and from the left over there,
397
00:49:56,120 --> 00:50:01,729
it could go either way because some matrices are rotations and others are reflections.
398
00:50:01,730 --> 00:50:06,110
And you can predict it has to do with whether the determinant is positive or negative.
399
00:50:07,310 --> 00:50:12,590
So let's do a few more. That one also came out with a negative determinant.
400
00:50:12,600 --> 00:50:17,670
I can tell by the orientation of the blue and the red. Let's find one with a positive determinant there.
401
00:50:17,690 --> 00:50:21,860
Okay. So that must have a positive determinant because they're oriented the same way.
402
00:50:22,100 --> 00:50:26,390
This is a very well conditioned matrix because the ellipse is nearly a circle.
403
00:50:27,050 --> 00:50:31,310
Let's press the button a few times until we find an ill-conditioned matrix.
404
00:50:33,100 --> 00:50:36,850
I'm looking for an ellipse that's very narrow. So that's ill-conditioned.
405
00:50:39,460 --> 00:50:43,720
Now, what does it mean to be ill-conditioned? It means you're nearly rank-deficient.
406
00:50:43,730 --> 00:50:47,410
If that were actually just a segment, you'd be exactly rank-deficient.
407
00:50:47,590 --> 00:50:53,350
So if we look at that matrix, we should find that the two columns are nearly multiples of one another.
408
00:50:53,620 --> 00:50:58,390
And sure enough, it looks as if the second column is about three and a half times the first column.
409
00:50:58,630 --> 00:51:02,680
Similarly, the second row is about nine.
410
00:51:02,710 --> 00:51:06,230
Did I say that wrong? No, that's right.
411
00:51:06,350 --> 00:51:12,800
And the second row is about -1.7 times the first row, or something like that.
412
00:51:13,100 --> 00:51:18,920
So every matrix has this property: in 2D it's just an ellipse, but in multiple dimensions
413
00:51:19,160 --> 00:51:22,910
it's an n-dimensional ellipsoid. Okay. See you on Tuesday.