First of all, on the quiz — I hope you've done that. The first and the third vectors are orthogonal, and the three vectors have norms square root of 102, square root of 100 and square root of 101. So it's the first one that has the largest norm.

I thought I'd begin, just for fun, with something that has no legitimate connection with this course whatsoever, which is the code called m9. So let's have some fun with that one. We live in a world where almost anything can be explored easily, with or without mathematical content. So let's see here. m9 is a little MATLAB script that's designed to play with colours. So if I type m9 — the colour quiz — what happens is that a bar of colour appears. Now, the default way that MATLAB handles colours is RGB: red, green, blue. So every colour is a linear combination of an amount of red between zero and one, an amount of green between zero and one, and an amount of blue between zero and one. Now, this quiz allows you to choose one of 27 options in an attempt to match that colour. So you've got to decide: is the red 0, 0.5 or 1? And then is the green 0, 0.5 or 1? And is the blue 0, 0.5 or 1?

So for example, suppose we conjecture that that colour is red. Then I could type one red, zero green, zero blue. And I discover we're wrong, because that's red — it's different. So now I turn it over to you. You tell me, what should I try next? How much red? 0.5, I hear. How much green? Zero. How much blue? 0.5. Wrong again. Okay, what should I try next? More red. Okay, so should we try 1, 0, 0.5? Nope. Well, what's that? 1, 0, 1. Yay! Okay, you win. So that's good.

Let's just do a few more. There are only 27 possibilities — pretty soon you'll have them all. Okay, let's try it again. Oh, that's rather the same, but maybe that's good. What do you think? What was the last one? Remind me, was it 1, 0, 1? So what's this, do you suppose? How much red? One. How much green? 0.5. How much blue? Hey, we're getting good.

I'll try a couple more. Okay, what's that? Just for fun, let's try a pure blue. So if I said 0, 0, 1 — now we're pure blue. And this is clearly not a pure blue. So what do you think it is? 0, 0, 0.5, I think I hear. No. 0.5, 0.5, 1. Hey — well done. Okay, we'll do one more. Some of the most fun colours are orange and yellow, and the trouble is, they rarely appear.
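For what it's worth, here is a minimal sketch of the idea — not the actual m9 script, just the flavour of it: pick a colour whose red, green and blue amounts are each 0, 0.5 or 1, display it, and ask for a guess.

    % Minimal sketch of a colour quiz (not the actual m9 script).
    rgb = 0.5*randi([0 2],1,3);             % random colour from the 27 possibilities
    image(reshape(rgb,[1 1 3])), axis off   % display a bar of that colour
    guess = input('red, green, blue?  e.g. [1 0 0.5]:  ');
    if isequal(guess(:),rgb(:)), disp('Yay!'), else, disp('Nope.'), end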
It's interesting how our perception has much sharper gradients in some regions than others. So among these 27 colours, loosely speaking, most of them seem to be green. Maybe we'll be lucky and get yellow or orange. We could, of course, keep going until we get yellow or orange. Oh — we got yellow. Okay, who knows what yellow is? One red — I think I heard one red. How much green? One. How much blue? 0.5. Wow, that was impressive. I'm very impressed; I couldn't have done that, even though I wrote this code years ago. I'll do one more and then we'll stop. Oh, that's boring — let's do a more interesting one. Oh, okay, that's a good one to start with. How much red, green and blue? 1, 1, 1. I think that's right. Actually, I can do that one in my head. Let's see: one, one... Yeah, you win. Okay, good, we're done. Thank you for allowing me to do that.

So, we're talking about orthogonality and least squares. Let me bring us back to where we were in the last lecture, where I rather hastily mentioned least squares. We talked about orthogonal vectors, which are defined by having an inner product of zero. If you have a whole set of vectors, they're said to be orthogonal if they're pairwise orthogonal. If you have a matrix whose columns are orthonormal, there's no name for such a matrix, even though we use them all the time — unless it's square. If you have a square matrix whose columns are orthonormal, then it's called an orthogonal matrix, and then its transpose is equal to its inverse.

Now, just to remind you, the framework for least squares problems is usually m by n matrices. So let A be m by n, and let's suppose that m is bigger than n. In applications, it might be much bigger than n. To remind you, a least-squares solution to Ax = b is a vector x that minimises the two-norm of the residual: it's an x such that Ax minus b, measured in our standard square-root-of-the-sum-of-the-squares norm, is as small as possible.

I'm not giving you a theoretical course on least-squares fitting; I'm just quickly going over some key points in MATLAB. We haven't talked about algorithms yet — we'll begin to do that today. In MATLAB, if you type x = A\b, it somehow computes this x. Let's remind ourselves of the kind of stuff you can get that way; a rough sketch of the kind of fit I mean is below.
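A minimal sketch — not the actual m8 script, just the same pattern: noisy data, a tall thin matrix whose columns are the fitting functions, and backslash.

    % Minimal least-squares sketch in the spirit of m8 (not the actual script).
    m = 200;  t = linspace(0,1,m)';          % m sample points, m much bigger than n
    b = abs(t-0.4) + 0.05*randn(m,1);        % piecewise linear data plus random noise
    A = [ones(m,1) t t.^2 t.^3];             % 200-by-4 matrix: fit a cubic polynomial
    x = A\b;                                 % least-squares coefficients
    plot(t,b,'.',t,A*x,'r')                  % the data and the fitted curve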
I'm going to again run the code m8 from last time — m8, least squares. Just to remind you, what happened with that code is we made a piecewise linear function and then we added some random noise to it, and then we did exactly this kind of thing to fit the data by least squares. So in fact, every time I run the code I'll get different random numbers but a similar fit. There we do it again. There we do it again — different random numbers, similar coefficients, similar fit. So this is a very small-scale least squares problem that could have been done in 1949. Of course, we're way beyond that now.

One reason for doing this is that, if you've got your code handy, the last couple of lines of the code were showing off a little bit of MATLAB just for fun. Suppose you click on the red curve. That has the effect of making the red curve into the so-called current object in your MATLAB graphics. So if I now say set(gco, ...) — gco being the handle of the current object — I can then do things like say 'linewidth', 20, and suddenly I've made the current object have a line width of 20. Or I could say 'color' — what was yellow again? One, one, 0.5. What's that? Okay, I'm not pretending not to know; by the way, I really don't remember what yellow is. Okay, so, good. Very impressive.

Okay, so that's just a reminder that we're talking about least squares, and I want to continue down that road. In particular, the main thing today is to talk about the way that numerical linear algebra people think of least squares, which is in terms of a QR factorisation. Traditionally — in physics, for example — people often think of a Gram-Schmidt factorisation. This is a way to regard that more algebraically, and to connect it with an algorithm that is more numerically stable. So I want to talk about QR factorisation — section 2.3. Q and R stand for matrices: Q is orthogonal and R is triangular.

So, just to remind you — oh, these patterns are annoying, aren't they — just to remind you that when we write Q in this field, it almost always signifies a matrix whose columns are orthonormal. So if we have n columns, then Q is probably going to look like that, and typically the length of the columns will be greater than the number of columns, so a matrix like that is going to be rectangular. And in such a case, if we look at Q transpose Q, that will be the n by n matrix of inner products of these columns, so it will be equal to the n by n identity. So, pictorially, Q transpose Q looks like this.
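In symbols, that picture is

\[
Q^{\mathsf T} Q \;=\;
\begin{pmatrix} q_1^{\mathsf T} \\ \vdots \\ q_n^{\mathsf T} \end{pmatrix}
\begin{pmatrix} q_1 & \cdots & q_n \end{pmatrix}
\;=\; \bigl(\, q_i^{\mathsf T} q_j \,\bigr)_{i,j=1}^{n} \;=\; I_n ,
\]

since \(q_i^{\mathsf T} q_j\) is 1 when \(i = j\) and 0 otherwise.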
On the other hand, if we look at Q Q transpose, which also makes sense as a matrix, it's an m by m matrix, and it is not equal to the m by m identity. So let's see it pictorially: it has the right dimensions, but of course it can't possibly be the m by m identity, because if you think about it, it has rank n. So whatever this is, it's a matrix of rank n, whereas the m by m identity would have rank m. It's a rank-n square matrix.

Now, anyway: a QR factorisation, also known as a reduced QR factorisation, of our matrix A is a factorisation of A into a product of a Q times an R. I'll just draw the picture and also the formula. We have a matrix A, which we think of as consisting of n columns, from a1 to an. A reduced QR factorisation writes that as a matrix Q with orthonormal columns times an upper triangular matrix R, with diagonal entries going from r11 to rnn — so that would be r11, r22 and so on — and it's upper triangular, so the first row runs r12 up to r1n, with zeros down below.

So this is a factorisation of the matrix A into the matrix Q times the matrix R, and this is the basic way that we formulate things when talking about least squares problems. It's a change of basis, in some sense, which I will talk about from various angles. You start with a basis for an n-dimensional space consisting of the n columns of A, and you change it to a basis which is orthonormal; orthonormality has computational advantages, and it's also directly tied to least squares.

So here — of course I've said it, but let me write it down — the columns of Q are required to be orthonormal, and R is required to be upper triangular. If you have a thing like that, let me remind you that I've been urging you to think of matrix times vector as a linear combination of the columns of the matrix. Here we have the matrix times this vector, and this vector, and this vector. So the first column on the left will be q1 times r11; the second column on the left will be a linear combination of q1 and q2 with these coefficients; and so on. Let's write that down. a1 is a linear combination of just one vector: a1 = r11 q1. a2 is a linear combination of two vectors: a2 = r12 q1 + r22 q2. And so on: a3 = r13 q1 + r23 q2 + r33 q3. So each column on the left is a linear combination of columns on the right, with coefficients coming from one column of the matrix R.
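In general, the \(j\)th column of \(A\) involves only the first \(j\) columns of \(Q\):

\[
a_j \;=\; r_{1j}\,q_1 + r_{2j}\,q_2 + \cdots + r_{jj}\,q_j \;=\; \sum_{i=1}^{j} r_{ij}\,q_i ,
\qquad j = 1,\dots,n .
\]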
So we can see there how it's a change of basis, where we're representing the columns of A in the successive columns of Q, which are orthonormal. Generically, if a1 through an are linearly independent, then we've simply changed basis, and the span of those vectors is the same as the span of the Q vectors — by the span I mean the set of all linear combinations. So generically this is an n-dimensional space and this is a different basis for it. If they're linearly dependent, then the span of the a vectors is of lower dimension, and so it is properly contained in the span of the q vectors.

Okay, so that's a QR factorisation. Now let's play around with the algebra a little more, and then talk about the ways that these things are calculated. Let's multiply on the left by Q transpose. I'm going to multiply that equation on the left by Q transpose; Q transpose is a wide, short matrix. What I get is Q transpose A, and then Q transpose times Q is the n by n identity, so that goes away, and I just get R. Q transpose consists of rows, from q1 transpose down to qn transpose; A, as always, consists of columns from a1 to an; and R, as always, has r11 down to rnn, upper triangular.

So that's another way to see what's going on here. The entries of R are inner products of the vectors q with the vectors a: r_ij, we see from here, is the inner product of q_i with a_j.

Now, another way to work with the algebra. Let's suppose that we have some x — any n-vector — and let's consider the corresponding b = A times x. So I could say that b = Ax is a linear combination of the columns of A. If we have our QR factorisation — if A = QR — then we can use the algebra and write it like this: b = Ax, so that's QRx, and we could write that as Q times (Rx). Now I could multiply on the left by Q transpose, and I would get that Q transpose b is equal to Rx. What I'm doing here is stuff you've all seen, I'm sure, from different angles, but I'm trying to urge you to think of it in terms of columns of matrices, that's all.
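In formulas, using \(Q^{\mathsf T} Q = I\):

\[
Q^{\mathsf T} A \;=\; Q^{\mathsf T} Q\,R \;=\; R , \qquad r_{ij} \;=\; q_i^{\mathsf T} a_j \quad (i \le j),
\]
\[
b \;=\; A x \;=\; Q\,(R x) \;\Longrightarrow\; Q^{\mathsf T} b \;=\; R\,x .
\]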
And this factorisation — when you see a formula like this, I hope you remember to interpret it in terms of vectors of coefficients. We know that x is the vector of coefficients of an expansion of b in the basis of columns of A, and the same neurone in your brain should see that Q transpose b, which plays the role of Q inverse times b, must be a vector of coefficients of b in the basis of columns of Q. So, to be precise: x itself is the coefficient vector for b in the basis of columns of A, and then this thing — an inverse of something times a vector — is the coefficient vector for b in the basis q1 through qn. So this formula shows us how you can convert from the A basis to the Q basis. And in fact we could do it fully by inverting one of those matrices: I could say that R inverse Q transpose b equals x, and so on. Many ways to slice it.

Okay. When I was a student, like a lot of us, I learned different things in different subjects with completely different language. I saw this stuff in physics courses talked about one way, then I saw it in mathematics courses talked about another way. It took me years to be comfortable with all the connections and to see that everything is really the same in the end.

Okay, now I want to talk about how people compute QR factorisations. So there's our QR factorisation; let's talk about its computation. This is kind of neat from a structural point of view. We're given a matrix — a rectangular matrix — and we want to find Q and R. One way to view that is that we're trying to find Q, which is an orthogonal basis; another way to view it is that we're trying to find R, which is the sort of equivalent triangular matrix. And it turns out those two ways of looking at it lead to the two great classes of algorithms. One of them is called Gram-Schmidt and the other is called Householder.

So let me write the heading for Gram-Schmidt — the way I think of Gram-Schmidt, which is a pre-computer idea. Well, there's another word: Gram-Schmidt orthogonalisation. The word orthogonalisation says what it's all about: it's an algorithm in which we orthogonalise the columns of A. The way I like to think of it is that it's a process of triangular orthogonalisation, by which I mean — as you'll see — it's making A orthogonal by a sequence of triangular operations.

And let me skip ahead, leaving some blank space here, to the other one, which is Householder.
Householder triangularisation is the dual of this idea, and it's a post-computer idea: it's a process of orthogonal triangularisation. Big words — they take a while to write. Now, those words are nearly enough for you to figure out, given enough time, what's going on. You know your goal is to take A and convert it into Q times R, and I'm in effect telling you there are two natural ways to do that. You can either try to make the columns of A orthogonal — write it down, you'll get Gram-Schmidt — or you can try to make A upper triangular using orthogonal operations — write that down, you'll get Householder. Let me say a little more.

So the Gram-Schmidt idea, viewed from an algebraic point of view, consists of taking your matrix A and multiplying it on the right by a sequence of triangular matrices. I haven't really told you what we're doing at an operational level, but algebraically, Gram-Schmidt consists of taking your matrix and doing things to it until it becomes orthogonal — until the columns become orthonormal. Now, what does that really mean? When you multiply a matrix on the right by something, you're taking linear combinations of columns. And the Gram-Schmidt idea is to construct the first column of Q by normalising the first column of A; to construct the second column of Q as a linear combination of the first two columns of A; the third column of Q as a linear combination of the first three columns of A; and so on. So it's doing a succession of operations on the columns of A to make them orthonormal.

I'm not going to write down the formulas, partly because most of us have seen them in one context or another — it's essentially these formulas. You use them one by one to construct the entries of R, such that the columns become orthonormal. You go through these operations acting on column one, then columns one and two, then one, two, three, all the way up to an operation acting on all the columns. This, then, is a product of upper triangular matrices, which is upper triangular itself because it's a product of them. If we call that R inverse, then, algebraically speaking, we've found A = QR.

Okay. Many of you know Gram-Schmidt, I'm sure; some of you don't. But anyway, if you're curious about the details, look them up and then have some fun interpreting what you find as multiplications of a matrix on the right by elementary matrices; a minimal sketch of the process is below.
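Here is a minimal sketch of the classical process, written as a reduced QR factorisation — for illustration only; as I'll say in a moment, this is not the most numerically stable way to do it.

    function [Q,R] = cgs_qr(A)
    % Classical Gram-Schmidt: a minimal sketch of the reduced QR factorisation.
    [m,n] = size(A);
    Q = zeros(m,n);  R = zeros(n,n);
    for j = 1:n
        v = A(:,j);
        for i = 1:j-1
            R(i,j) = Q(:,i)'*A(:,j);   % r_ij = inner product of q_i with a_j
            v = v - R(i,j)*Q(:,i);     % subtract off components along earlier q's
        end
        R(j,j) = norm(v);              % what is left is orthogonal to q_1,...,q_{j-1}
        Q(:,j) = v/R(j,j);             % normalise it to get the next column of Q
    end
    end

Up to rounding errors, this returns Q and R with A = Q*R and Q'*Q equal to the n by n identity.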
The one I want to emphasise more is the Householder approach, which is less obvious, but in fact is the standard approach for moderate-sized matrices. If you type qr in MATLAB, it's this method that's used. If you do least squares in MATLAB, it's this method that's used. It's the best way to deal accurately with matrices that are not so big as to pose sparsity challenges.

So I want to talk about Householder, and before I do that, let's look at the paper, which is one of the handouts. This is a greatest hit of numerical analysis. It's from 1958: "Unitary Triangularization of a Nonsymmetric Matrix", by Alston Householder, who was one of the great figures of numerical linear algebra after the war. A real character — smart guy, whisky drinker, good man. He died some years ago, but he had a huge influence on generations of people. I love this paper. It's so readable and it's so short — it's essentially three pages long — and it really did change the world.

Nobody, to a first approximation, had noticed this duality. The whole world had been thinking in terms of orthogonalising things. Householder noticed that you could equally well triangularise things, and moreover that it was numerically more stable: rounding errors cause less trouble with his method than with Gram-Schmidt. Quite a remarkable observation. The previous thing that had come along is called Givens rotations, which has a bit of the same flavour, but Householder did it in really the ultimate way, I think.

So we're going to say a bit about what he does — and maybe more next lecture, but not right now. What he does, formally, is take a matrix A, and instead of acting on its columns, which would be multiplying by stuff on the right, he acts on its rows, which means multiplying by stuff on the left. So the picture is this. Householder starts from a matrix, which we imagine being a dense, rectangular matrix, and he wants to make it upper triangular. What he does is, first of all, multiply it on the left by something that introduces zeros in the first column below the diagonal. That step consists of multiplying it on the left by an orthogonal matrix Q1. Then at the next step, zeros are introduced in the second column below the diagonal — that's a multiplication on the left by another orthogonal matrix. So by a sequence of orthogonal operations, Householder triangularisation takes the dense matrix and turns it into an upper triangular matrix. Finally, for this one, we have a third multiplication, and now it would be upper triangular. So we're done — we've done it n times — and now we have an upper triangular matrix called R. And, same thing as before, if we call that product of orthogonal matrices Q inverse, or Q transpose, then we've got A = QR.
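And here is a minimal sketch of the process just described, for illustration only — reflectors applied on the left, zeroing one column at a time. A serious implementation, such as the one behind MATLAB's qr, stores the reflectors rather than forming Q explicitly.

    function [Q,R] = house_qr(A)
    % Householder triangularisation: a minimal sketch (full QR, illustration only).
    [m,n] = size(A);
    R = A;  Q = eye(m);
    for k = 1:n
        x = R(k:m,k);
        v = x;                                          % Householder reflector I - 2vv'
        v(1) = v(1) + sign(x(1) + (x(1)==0))*norm(x);   % sign chosen to avoid cancellation
        nv = norm(v);
        if nv > 0, v = v/nv; end                        % skip a column that is already zero
        R(k:m,k:n) = R(k:m,k:n) - 2*v*(v'*R(k:m,k:n));  % zero column k below the diagonal
        Q(:,k:m)   = Q(:,k:m) - 2*(Q(:,k:m)*v)*v';      % accumulate Q = Q1*Q2*...*Qn
    end
    end

This produces an m by m orthogonal Q and an m by n upper triangular R with A = Q*R; for the reduced factorisation you would keep Q(:,1:n) and R(1:n,:).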
When I was your age and studying numerical analysis, I was told that this was simply better than that: more accurate, less subject to rounding errors. The truth is a little more subtle. In many basic applications that's true, and it certainly gives you more nearly orthogonal matrices Q. However, it turns out that orthogonality sometimes matters and other times it doesn't, so it's not always true that the accuracy makes a difference. And there is an advantage to this approach, which is that it generalises more readily to sparse problems. So often you'll find that if your matrix is a thousand by a thousand, you're doing this, and if it's a billion by a million, you may find yourself closer to that. Lots of possibilities.

I want to play around with this stuff and see what happens. So let's go to the code called m10; this is just illustrating QR factorisation. I'll say A equals — a random — sorry, a zeros matrix of dimension seven by three: A = zeros(7,3). There we are. Now I'm going to put some numbers into the matrix. I'll say A(:) — a notation we haven't used before, but you can see what it does — will be (1:21).^2. So that takes the vector 1 to 21, squared, and puts it into A without changing the shape, in the usual order that we've talked about several times. So there we have an arbitrary matrix.

Let's now say [Q,R] = qr(A,0), which is MATLAB's notation for the reduced QR factorisation. If we do that, we get a Q and an R. What you see here is that Q — you can't tell that its columns are orthonormal, but they are — and R is obviously upper triangular. You can immediately see, by the way, why this way of thinking is kind of a post-computer way of thinking. Anything to do with orthonormality is not something you're going to write down on paper: there are square roots all over the place. These are numbers that are easy to compute and hard to work with on paper. So orthogonal linear algebra is really a post-computer technology.

But anyway, A is equal to QR, and I could do all sorts of things. I could say A - Q*R, for example. Mathematically, that should be the zero matrix; of course, with rounding errors it won't be exactly zero — there it is. That's actually bigger than you might think, but the reason is that A itself is pretty big, because it had entries in the hundreds.
So ten to the minus twelfth, relative to the scale of A, is fine. Here's A again. If we look at the norm of the first column of A — the square root of the sum of the squares of those numbers — there it is. Let me remind you, there's the matrix R. Because Q is an orthonormalised version of A, the first column of Q is a normalisation of the first column of A, and therefore that scalar has to come out there. So that's why 68 equals 68. Similarly, if you look at the norm of the second column of A and compare it with the norm of the second column of R, you find they're the same, because we're taking all the directions of A and putting them into Q, and all the magnitudes of A and putting them into R.

Now, Q is orthonormal: it's a seven by three matrix with orthonormal columns. So if I take Q'*Q, I get the three by three identity. If I take Q*Q', that's a seven by seven matrix, which can't possibly be the identity, because it only has rank three — and indeed, if I say rank(ans), it's three.

I guess the final thing in this demo is to do a big one, to see how nice it is to live in the 21st century. Suppose I say A = randn(1e4,1e3). There we have a random matrix with 10,000 rows and 1,000 columns — pretty big. And let's do a QR factorisation: I'll say [Q,R] = qr(A,0). Tick, tock — on this laptop, it takes something like 4 seconds. But of course we're dealing with 10 million entries, and in fact it's a cubic-order operation: the amount of work scales as 10,000 times 1,000 squared. So we're doing a lot of computing.

Let's see how accurate it was. Suppose I ask for the maximum entry of the absolute value of Q'*Q minus the identity of size 1e3. Okay, so we've just made a Q matrix that's 10,000 by 1,000, and its columns are supposed to be orthonormal. Are they really orthonormal? Pretty good, right? If we had done it with Gram-Schmidt, they would be not even close to orthonormal, but Householder achieves this even with these enormous matrices.

It's interesting to think historically about that. This theory was developed by Householder and Wilkinson and others in an era when even 100 by 100 was too big for most experiments — those guys would have been playing around with matrices of size 20 or 30. And that's the power of theory, right? They proved theorems about stability which showed that the method really works. And here we are, 50 years later, doing it on matrices a thousand times bigger. Just amazing.
Okay. So I want to get back to least squares problems. We've already mentioned them a bit; I now want to connect them with QR factorisation. This is section 2.5, linear least squares. There are, of course, also nonlinear least squares problems, which we'll say a little bit about in a couple of weeks.

The history here is wonderful. This was discovered by Gauss and by Legendre independently — and boy, they had a big fight. I think Gauss maybe did it first but, typically, didn't publish; this was in the 1790s. And when they found out about each other, they were furious. Gauss was not a very nice man, but he was pretty bright.

There's a great book — sort of the standard numerical book about this stuff. Of course, least squares is everywhere; it's all over statistics and so on. But from a numerical linear algebra point of view, the standard book is by Björck, who is not a rock star except in the mathematical sense.

So, just to remind you, we're always dealing here with rectangular matrices: A is m by n and m is bigger than n. And we're looking for solutions x such that Ax minus b in the two-norm is minimal. The picture I drew last lecture — let me draw it again. b lies in the space R^m, and the picture I like to draw consists of a subspace of R^m which is the range of A, also known as the column space of A. Whenever we do least squares, we're trying to find a vector in the range of A that's as close as possible to b. So we think of b as being a vector out here somewhere, and we're trying to find that point there — that's not x, that's Ax. That's the point in the range of A as close as possible to b.

It's obvious, intuitively at least, that there is a unique such point. There's not necessarily a unique x, but there's a unique Ax. And our job is to find it; the way we find it mathematically is to exploit this orthogonality property. The least-squares norm has the property that the residual — which is this vector — is characterised by being orthogonal to the range of A, and I mentioned all of this at the end of last lecture. So that orthogonality is the condition: the range of A must be orthogonal to the residual. That leads us to the condition we identified at the end of the last lecture, which is that A transpose times (b minus Ax) should be the zero vector.
This is the residual, and it must be orthogonal to each column of A, which is enough to make it orthogonal to the range of A. So these are the normal equations: A transpose A x = A transpose b.

As a rule, you shouldn't use this method as long as your matrices are small enough to do dense linear algebra with — don't solve a least squares problem by solving this system of equations. The reason is that forming A transpose A sort of squares the ill-conditioning of the problem, so you'll get a less accurate answer. On the other hand, for very large problems with sparsity issues, this is often the best way to go, because it maintains sparsity, to a certain degree.

Let me mention a theorem about this stuff, which is that the rank of A being n — A having full rank — is equivalent to A transpose A being symmetric positive definite. A transpose A will always be symmetric, and it will always be positive semi-definite, so its eigenvalues can never be negative, but they might be zero. However, if the rank of A is equal to n, rather than less than n, then all the eigenvalues are positive, which means it's nonsingular. So A transpose A is nonsingular if and only if the rank of A is n, which means that the solution vector x is unique. Ax is always unique — the picture sort of makes that obvious — but the vector x corresponding to it is unique if and only if A has full rank.

So there are three ways we can easily talk about computing x. We've just, in effect, shown one of them: we could solve the normal equations — it's a symmetric positive definite system, and we could solve that system and find x. But let's do it the other way; let's do it via QR factorisation. So how do we solve least squares via QR? Well, the idea is that if we have an orthonormal basis for the range of A, then the problem should be an easy one: we won't have to invert anything, we'll just have to take some inner products. That's what the QR factorisation gives us — and we're a little short on time, so I'm going to write it down rather quickly.

We're trying to project b orthogonally onto that range space. Let's write down an equation for that. Ax is the orthogonal projection of b onto the range of A. That means that Ax — I'll write it down and then explain why it's right — can be written as Q Q transpose b. Now, why is that? Well, this piece, Q transpose b, is going to give me expansion coefficients — I'm speaking fast and loose here, I'm afraid — and then this Q is going to take those expansion coefficients and multiply them by the appropriate vectors.
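In formulas, using \(A = QR\) and \(Q^{\mathsf T} Q = I\):

\[
A x \;=\; Q\,Q^{\mathsf T} b
\;\Longrightarrow\;
Q R\,x \;=\; Q\,Q^{\mathsf T} b
\;\Longrightarrow\;
Q^{\mathsf T} Q\,R\,x \;=\; Q^{\mathsf T} Q\,Q^{\mathsf T} b
\;\Longrightarrow\;
R\,x \;=\; Q^{\mathsf T} b .
\]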
So I'm afraid I am speaking quickly here, but it's a projection process: you take b and you take inner products in order to find coefficients, and then you use those coefficients in the appropriate basis. Now, if you multiply on the left by Q transpose, then you get — let me make sure I've got this right; I want to assume that A is QR — so we have QRx = Q Q transpose b, and multiplying by Q transpose we get Rx = Q transpose b. That's the key equation that we can solve to do a least squares problem by the QR factorisation. So — I'm sorry this is going too quickly — the idea is: you take A, you factor it into Q times R, you multiply Q transpose by b to get your coefficients, as it were, in that basis, and then you solve this triangular equation to translate back to the original A basis. So that's the algorithm. Let's do it.

The final demo here is the one called m11. Following the script, I'm going to say t equals 2 pi times a vector of length a thousand — so this is now a set of a thousand points scattered between zero and 2 pi. And I'll say b equals t over 5, plus — using a logical operation — (t less than pi); that vector will be zero when t is bigger than pi and one when t is less than pi. So if I plot it — I've just constructed that vector — this is a vector of length 1000 plotted with straight lines.

Now let's make a matrix which has five columns that are trigonometric functions. So A is equal to the thousand-by-five matrix with columns cos(0*t), sin(t), cos(t), sin(2*t), cos(2*t) — a thousand-by-five matrix.

Now let's solve the least squares problem in three different ways. One way is to just use backslash, so I could say x1 = A\b. This is doing whatever MATLAB does — which is in fact a QR factorisation — and we could plot the result: I can plot A times x1 as a red line, and there you see the least squares fit.

On the other hand, we could explicitly use the QR factorisation and do it ourselves, rather than relying on MATLAB's backslash. So let's do that: I could say [Q,R] = qr(A,0), and then I could say x2 = R\Q'*b. Now, that's a hazardous way to write it, because who knows whether the backslash is done before the multiplication by the vector. This is a better way to write it: x2 = R\(Q'*b).
Written like that, the operation means: first we multiply b by Q transpose, and then we solve the triangular system Rx equals that vector. If I do that, I have another version of x. This was the first one we computed, this is the second one we computed, and they're pretty much the same: x1 - x2 — there they are. They're not identical, notice, so obviously MATLAB did something slightly different from what we did.

Finally, let's do the normal equations — and I don't want to tell you these are bad, merely that they're different. I could solve the problem by (A'*A)\(A'*b). This is mathematically different; this is not what backslash is doing. It's really a different algorithm — advantageous sometimes, disadvantageous other times. If I look at the error here, I see that in this case it's fine; nothing went wrong in this case. The commands from this demo are collected below. Okay — I think that's it for today. Thanks.
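For reference, the commands from this demo, collected. The exact way m11 constructs t isn't shown here, so the sort(rand(...)) line is only a stand-in for a thousand points scattered in [0, 2 pi]:

    t = 2*pi*sort(rand(1000,1));                     % stand-in: 1000 scattered points in [0,2*pi]
    b = t/5 + (t<pi);                                % ramp plus a jump at t = pi
    plot(t,b)                                        % the data, joined by straight lines
    A = [cos(0*t) sin(t) cos(t) sin(2*t) cos(2*t)];  % 1000-by-5 matrix of trigonometric columns
    x1 = A\b;  plot(t,b,t,A*x1,'r')                  % 1: backslash, and the least-squares fit
    [Q,R] = qr(A,0);  x2 = R\(Q'*b);                 % 2: explicit reduced QR factorisation
    x3 = (A'*A)\(A'*b);                              % 3: the normal equations
    [x1-x2, x1-x3]                                   % small differences between the three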