Okay. Good morning, everybody. Let's get started. I hope you've done the quiz; let's quickly go through it. The answer to problem one is no output, because there's a semicolon. The answer to problem two is true, which in MATLAB appears as 1. In problem three we have the determinant of a two-by-two matrix, and it's minus 2. The answer to problem four is about 4.5, depending on how accurately you count, because that's the square root of the condition number of the matrix, which is 20. And number five, how long does it take if you increase the dimension from 4000 to 6000: this one should go quickly; the answer is around 5.4 seconds, and actually, on my machine, it was 4.7 seconds. The answer to number six, what are some names of nonsymmetric matrix iterations: well, there are lots of them; there's CGS, there's LSQR, there's BiCG, there's BiCGstab, and so on and so on, QMR, they go on and on. The answer to number seven, what's the nonsymmetric analogue of the Lanczos iteration: that's the Arnoldi iteration. And then the answer to number eight, what is my first name: it's Lloyd, actually. I'm Lloyd Nicholas Trefethen. I hope you all knew that.

Michael, would you be willing to go get me a bottle of water? I have a kind of a cold, and that might help. Yeah.

Okay, so today we're switching to part two of the three parts of this term, and the heading here is dense linear algebra. So we started with sparse linear algebra, and the dense stuff is, of course, a huge field that I'm only going to touch on certain very basic aspects of. Of course, all of you have encountered linear algebra in many ways over the years, and you know some of this, but I hope that the way we look at things will be a little different from the way you're used to. One place to read about this point of view is in my book, Trefethen and Bau, right at the beginning: chapters one, two and three talk about the point of view that I'm going to express, which is all about thinking about matrices and vectors via their columns.

So let's start with what I'll call section 2.1, and that's matrices, vectors and expansions. Of course, you know what matrices are and you know what vectors are. But the basic point of view I want to recommend is that often in basic numerical linear algebra, it's good to think of a matrix as consisting of a collection of columns.
So this is the picture we can often draw for a matrix: the first column is a_1, the second is a_2, and so on up to a_n. And when I draw that picture, I'm certainly not assuming it's a square matrix. Typically I'm assuming that it has at least as many rows as columns, and maybe many more rows than columns. Now a typical matrix-times-vector product I would then draw like this: x_1, x_2, down to x_n. Of course you know the formula for a matrix times a vector; we learned that in school. But the way I want you to think of it is as a linear combination of the columns. So if Ax = b, notice the way I'm drawing the picture: I'm not telling you anything about the individual entries of these columns or about the individual entries of b. Moreover, A is rectangular; b has the same length as the columns of A, whereas the length of x corresponds to the width of A. So that is the way I want you to think of Ax = b.

So in other words, b is a linear combination. Let's write down some things here. a_j is the jth column of the matrix A, and x_j is the jth entry of the vector x. Then b is a linear combination of the columns of A, with coefficients x_j. So whenever you have a matrix-vector product, you may interpret it that way, and often that's the right way to interpret it. We could write the formula: b is the sum from j = 1 to n of x_j times a_j. Of course x_j is a number and a_j is a vector.

Okay, now some standard terminology. We say that the range of A, which is also known as the column space of A... thank you very much, that's above and beyond the call of duty... the range or the column space of A is the set of all things you can get when you multiply A by some x like this. In other words, it's the set of all linear combinations of the columns of A. And that's a vector space, and all sorts of things are known about such spaces. It's a vector space of dimension at most n: it will be of dimension exactly n if the columns are linearly independent, and of dimension less than n if they are linearly dependent, which means there's some linear combination of them with nonzero coefficients that gives you the zero vector. And this number, the dimension, is the rank of A, also known as the column rank. Now, if the rank of A equals n, then we also say that A has full rank, and if the rank is less than n, we often say that A is rank deficient.
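Here is a minimal MATLAB sketch of this column point of view; the matrix and vector are just made-up illustrative values, not from the lecture:

    % illustrative 4-by-3 example: more rows than columns
    A = [1 0 2; 3 1 0; 0 2 1; 1 1 1];
    x = [2; -1; 3];
    b1 = A*x;                                        % the usual matrix-vector product
    b2 = x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3);    % linear combination of the columns
    disp(norm(b1 - b2))                              % zero: the two views agree
    disp(rank(A))                                    % dimension of the range (column space)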
So these are standard terms that people like me use all the time. Now, of course, one of the things we want to do with matrices and vectors is to try to find expansion coefficients to match a particular right-hand side. That's the basic problem of linear algebra: given b, find x. And that makes sense even in this rectangular case, where we're not assuming that the matrix is square; it's still a reasonable question to ask. So let me move over to here.

So let's ask the question: given b, is there an x such that, well, Ax = b would be nice, or maybe Ax is in some sense close to b? If A is square, of course, the first is something we hope for; if A is rectangular, we're more likely to hope for the second. So that's the basic problem of linear algebra. And when I pose it this way, I'm thinking of b as the data vector, so b goes from b_1 up to b_m, if you like; it's the data. And the x vector we're looking for is the coefficient vector.

So of course the case we know best is when A is square and invertible. In that case, if A is square and invertible, then we all know the solution to this problem: it is x = A inverse b, with the inverse of the matrix. And I want you to think about that expression. It can be interpreted as a statement that to get the coefficient vector for an expansion, you apply A inverse. So A inverse is an operation that maps you from data to coefficients: A inverse maps from data to expansion coefficients. And indeed, I recommend that you set up a neurone in your brain that fires whenever it sees that kind of thing. It should fire and say: aha, coefficient vector; somebody is interested in a coefficient vector for a vector b.

So let's write that down yet another way. Whenever we solve a system of equations, another way to say it is that we're finding a coefficient vector for an expansion of a vector b in a basis of columns of A. Now that's going to work most cleanly if A is square and nonsingular, but it also makes sense when A is rectangular, and then we can talk about trying to get close to b: we can try to approximately solve a system of equations, to find a coefficient vector that gives a pretty good expansion, even if it's not exactly right.

So let's say a little more about that. Suppose A has more rows than columns. Then in general we're not going to be able to solve Ax = b exactly, unless b happens to be in the range of A.
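A minimal sketch of this idea, with made-up matrices: in the square nonsingular case backslash recovers the expansion coefficients exactly, and in the rectangular case it returns the least-squares coefficients that are about to be discussed:

    A = [2 1; 1 3];            % square and nonsingular
    b = [4; 7];                % data vector
    x = A\b;                   % expansion coefficients of b in the columns of A
    disp(norm(A*x - b))        % essentially zero: b is exactly a combination of the columns

    A2 = [1 0; 0 1; 1 1];      % rectangular: 3 rows, 2 columns
    b2 = [1; 2; 0];            % generally not in the range of A2
    x2 = A2\b2;                % coefficients making A2*x2 as close to b2 as possible
    disp(norm(A2*x2 - b2))     % nonzero residual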
So unless b happens to be in the range of A, Ax = b has no solution; by definition, the range of A is exactly the set of b's for which there is such a solution. So in this case we can look for approximations, vectors x with the property that Ax is, in some measure, close to b. And indeed one of the axioms of numerical linear algebra is that almost any problem that makes sense with a square matrix can be generalised to a problem with a rectangular matrix, and often that generalisation is a more robust formulation of the problem. So this is what will lead us in a moment to least squares, which is one particular natural choice of what it means for Ax to be close to b.

I want to say a word about the continuous analogue of these things. The normal way that people like me think about matrix problems is as problems in which there are a certain number of rows and a certain number of columns. But when you draw this picture, nothing in the picture requires these columns to be discrete. They could be continuous; they could be functions. So the very same picture would apply if we're trying to expand this function as a linear combination of these functions. If you're thinking matrices, that's the case where m, the number of rows, goes to infinity: you can imagine a very long, skinny matrix, and as m goes to infinity, you get some kind of a function analogue. And that's, of course, a very old idea. So as the number of rows goes to infinity, we can think of continuous functions. We can think of each column of A as a function of some independent variable, let's say t, and similarly b would be a function; so these could be functions of t on some interval, if you like.

All the linear algebra that we do has analogues in this continuous case; everything is still defined. So for example, the range of A is the same as before: if A is an object whose columns are functions, the range of A is the space of linear combinations of those functions. Again it will have dimension at most n, and if it has dimension exactly n, we can say it's of full rank, and so on. The rank of A is defined the same as before: it's the dimensionality of the set of all products Ax. So this, as I say, is an old point of view, although there isn't that much literature on it. In this case, the term that seems to have taken hold is that A is called a quasimatrix. So why does this stuff interest me?
Well, one reason it interests me is that I think all of computational science, all of numerical linear algebra, is about this interplay between the continuous and the discrete. A typical problem we try to solve with matrices might be obtained by discretising something continuous, so ultimately it's the continuous level that we care about. It's also of interest to me because of the Chebfun software project, which is something I'm engaged with. So at a software level this turns into Chebfun, which I'll show you a little bit about, maybe a little today and more later on.

So let's see. I want to show an example, discrete and then quasimatrix, and this, on the reverse of the quiz, is M7. So if we go to M7, let's type A = [1 2 3; 4 5 6; 2 2 0]. So there you have a three-by-three matrix A, and I just want you to practise thinking of A times a vector as a linear combination of the columns. So, for example, suppose I say A times the column vector [1; 0; 1]. Now of course you can calculate that any number of ways, but I want you to think of it instantly as the first column plus the third column, so that should give us 4, 10, 2. Similarly, let's do one more. Suppose I say A times, what do I have here, [1; -2; 0]. So that's the first column minus twice the second column, so that should give me... somebody's murmuring; did I mistype? No. What will that give me? Minus 3, minus 6, minus 2.

So linear combinations of columns is a fruitful way to think. Now, of course, we can type something like inv(A) times [2; 2; 2]. And what I ask you to do is think of that not just as some matrix operation, but as the extraction of a coefficient vector. So that is the vector of coefficients when you expand the vector [2; 2; 2] in the columns of A. In MATLAB, and indeed in numerical computation generally, you don't normally construct an inverse; this inv(A) is not an object that people would really construct, although you can, of course. It's more accurate and more efficient to bypass the inverse, so in MATLAB the way you'd really solve this is A backslash [2; 2; 2]. It's an interesting coincidence that when we did it with the inverse, obviously there were some small rounding errors, and when we did it this way, there were no rounding errors. That's not typical; of course, normally there will be rounding errors. We were just lucky.

Now, continuing on: the kind of thing people do with matrices all the time is work with long skinny matrices. So let's do that.
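Before the long skinny case, here is the small square demo above gathered into one runnable script, reconstructed from the spoken commands:

    A = [1 2 3; 4 5 6; 2 2 0];
    A*[1; 0; 1]        % first column plus third column: [4; 10; 2]
    A*[1; -2; 0]       % first column minus twice the second: [-3; -6; -2]
    inv(A)*[2; 2; 2]   % coefficient vector for expanding [2; 2; 2] in the columns of A
    A\[2; 2; 2]        % the preferred way: same coefficients without forming the inverse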
I'll shrink the font a little bit, or maybe too much. Okay. So suppose I say t = (0:0.1:2)', the vector from zero in steps of 0.1 up to two, made into a column vector. So there's t. This is some kind of a discrete approximation to the function t on the interval from 0 to 2. So I could now say plot(t, exp(-t)), and I could make that, let's say, a red line. So there we have a picture that looks like e to the minus t, but of course it's not really: it's actually a vector with 21 entries in it, plotted by connecting the dots piecewise linearly. It only looks so good because it's a nice smooth function.

Now let's do a little linear algebra and interpret the linear algebra in terms of that picture. Let's make A a matrix whose first column is t.^0, whose second column is t.^1, and whose third column is t.^2. And let's see, I'll detach this window so we can see the matrix. Okay. So the thing I've just constructed is a so-called Vandermonde matrix: the first column is t to the zero, the second is t to the one, the third is t squared. Obviously it's some kind of a discrete approximation to a continuous object, a very basic matrix that we might use for some kind of data-fitting problem.

So let's do a little data fitting. Let's make a coefficient vector x, which will be [1; -1; 1/2]. Now, why did I pick that vector? Because those are the first three Taylor coefficients of e to the minus t: e to the minus t equals 1 minus t plus a half t squared, and so on. So that coefficient vector corresponds to a Taylor approximation to the function e to the minus t at t = 0. So if we multiply A by that vector, we get that approximation. Let's see: I say hold on, and then I'll say plot(t, A*x). So this will show me the Taylor approximation to the function e to the minus t, at least on this discrete grid of 21 points. Notice it does just what you would expect of a Taylor approximation: it matches the function beautifully at zero, and then it wanders off in some other direction.

Now, we're going to do least squares properly in a moment, but first let's just illustrate it. Suppose I say x = A\exp(-t). Let's think about that syntactically, before I type return: exp(-t) is a vector of length 21. I've got a spurious parenthesis there; I'll have to get rid of it. A backslash: well, that neurone should fire in your head, right?
So you immediately know that that is going to compute the expansion vector, the coefficient vector x, in some sense; but it's going to be in a least-squares sense, because A is this long skinny matrix. So if I do that, it's actually giving you a vector of coefficients which will give a least-squares approximation to the data. And if we plot that, we can see it. So let's say plot(t, A*x), and I'll make this one black. So you see the big difference: the Taylor approximation is all at the origin, while the least-squares fit is global, and it's obviously done a good job of fitting the function. So this function can be very well approximated by a parabola on the interval from 0 to 2.

And just to play with MATLAB, as we always do, I hope you know the legend command. You can do things like this: I'll say legend, e to the minus t, Taylor, least squares. The legend command puts a legend on your plot, and you can put it in whatever corner you like, so I say Location SouthWest, and there you see it's put the legend on the plot. Oops. Okay. So you see that least squares obviously has something going for it. Let's finish this demo by just plotting the error for the least-squares approximation. So if I say plot(t, A*x - exp(-t)), that's the error. If you look at the axes, you can see the error is nice and small; we begin to see the discreteness, the 20 segments, and the error is about 0.02.

Okay. Any comments? Now, since we do seem to have enough time, let me just give you a hint of how Chebfun does continuous analogues of this. So in Chebfun I can do things like this: I'll create a chebfun variable t on the interval from 0 to 2, and then I could say A = [t.^0 t.^1 t.^2]. And now we have a truly continuous object. It reports that it's a chebfun quasimatrix consisting of three columns, and the three columns do the right thing. So, for example, if I plot A, you'll see the three columns: 1, t, and t squared. And I can do all sorts of things, like least squares; let's try that. I can even do the continuous analogue of least squares. Let's see, one of our data vectors was e to the minus t, so I could say f = exp(-t), and I could say plot(f). And then I could say hold on, and then I could say plot(A*x) as a blue line. Blue looks, well, blue's a boring colour.
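For comparison with the continuous version being built here, this is the discrete fitting demo from a moment ago gathered into one runnable script, reconstructed from the spoken commands (the plot colours are approximate):

    t = (0:0.1:2)';                    % 21 points on [0, 2]
    A = [t.^0 t.^1 t.^2];              % Vandermonde matrix: columns 1, t, t.^2
    x_taylor = [1; -1; 1/2];           % Taylor coefficients of exp(-t) at t = 0
    x_ls = A\exp(-t);                  % least-squares coefficients for the same data
    plot(t, exp(-t), 'r'), hold on
    plot(t, A*x_taylor)                % good near t = 0, wanders off elsewhere
    plot(t, A*x_ls, 'k')               % small error over the whole interval
    legend('e^{-t}', 'Taylor', 'least squares', 'Location', 'SouthWest')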
So here we're doing exactly the same kind of thing, except we've taken the number of rows to infinity, as it were; in some sense we're dealing with actual functions. So there's e to the minus t, there's the Taylor approximation, and let's do the least-squares analogue. If I say x = A\f, I can now say plot(A*x) as a red line. So there you can see we've actually done a continuous least-squares approximation rather than a discrete one. Everything looks the same because this isn't a very exciting function, but it's truly the continuous analogue of the discrete stuff. If I plot the error, that's A*x - f, it looks about as it did before, except now it's smooth instead of discrete. So we've had a lot of fun in Chebfun doing all kinds of neat stuff, which I will show you more of later.

Okay. Well, we're about halfway through. Yes? Is it finding the quadratic which is closest in the least-squares sense? Yes, and the reason it's a quadratic is that the quasimatrix had three columns, which were 1 and t and t squared, so a linear combination of them is a quadratic. Okay. So I haven't really told you what least squares is, though of course you all know, I'm sure. But in order to talk about that, we need to talk about orthogonality.

So, 2.2 is orthogonal vectors and matrices. Now, you know me, I always like history. One of the interesting bits of history here is that orthogonality is, of course, a very old idea mathematically, but in the 1960s it moved to centre stage algorithmically. Before that, people had been solving systems of equations and so on using non-orthogonal methods, and in the sixties they pretty much discovered that most things could be done more accurately using orthogonal methods. So for 50 years now, orthogonal linear algebra has been a very central part of the field.

Now, let me remind you of some standard definitions. We start with the idea of an inner product between two vectors. So if v and w are vectors, then we write it like this: v transpose w is equal to the sum of v_j times w_j. A couple of comments here. The default notation in this field is always that a vector is a column vector, so v transpose w means we have a row vector multiplied by a column vector. Another comment is that I could write that j goes from 1 to m, or something, if I wanted to emphasise the row index, but I prefer this way because, well, it could be anything. It could be continuous.
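A quick illustration of the definition, with made-up vectors (norm here is MATLAB's 2-norm, defined formally a little further on):

    v = [1; 2; 2];
    w = [2; -1; 0];
    v'*w               % inner product: 1*2 + 2*(-1) + 2*0 = 0, so v and w are orthogonal
    norm(v)            % the 2-norm, sqrt(v'*v) = 3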
In the continuous analogue, the inner product would turn into an integral of v(t) times w(t). Another comment I could make is that the formulas I'm writing down are for real numbers; if you're dealing with complex numbers, then an inner product would have a complex conjugate in it.

Okay, so that's the standard idea. And we say that v and w are orthogonal if the inner product is zero. And the norm of a vector, which is also called the 2-norm, is defined like this: we write it like that, or if we want to emphasise that it's the 2-norm, we can write it with a subscript 2, and it's the square root of v transpose v.

If we have a set of vectors, that is to say more than two of them, and when they're orthogonal, they're often going to be written as q, so let me do that. Suppose I have a set of vectors q_j. We say that they are orthogonal if each pair is orthogonal; so we could say, if they are pairwise orthogonal. When I write q, usually it's going to have the further property of being normalised, of having length one. So we say that a set of vectors is orthonormal, which is abbreviated ON, if in addition each one has norm 1. If you have a vector, you can always normalise it: if it's nonzero, you can normalise it by dividing by its norm, and then you'll get a normalised vector. Or you could divide by minus its norm, so there's always that plus or minus sign choice.

So let's suppose that we have a set of orthonormal vectors; I'll imagine that q_1 through q_n are orthonormal. Then why is orthonormality so interesting? Because then inverses turn into multiplications, and to get expansion coefficients all we need are inner products. So let me write that down: in this case, we get expansion coefficients via inner products. And I want to write that down in a matrix fashion. So first of all, in a non-matrix fashion: if a vector b is expanded as a linear combination, the sum of x_j q_j, then that's the same as saying, if the q_j are orthonormal, that x_j = q_j transpose b. And why is that? Well, I multiply this equation on the left by q_j transpose. Of course that gives q_j transpose b on one side, and on the other side we get the linear combination of x_i times q_j transpose q_i, and you see that all the inner products of q_i and q_j with i different from j go to zero.
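A small numerical check of this fact; the orthonormal columns here are manufactured with MATLAB's qr, purely for illustration:

    [Q, ~] = qr(randn(6, 3), 0);   % 6-by-3 matrix with orthonormal columns q_1, q_2, q_3
    x = [2; -1; 0.5];              % some expansion coefficients
    b = Q*x;                       % b = x(1)*q_1 + x(2)*q_2 + x(3)*q_3
    x_recovered = [Q(:,1)'*b; Q(:,2)'*b; Q(:,3)'*b];   % x_j = q_j' * b
    disp(norm(x - x_recovered))    % essentially zero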
So this is the great benefit of orthonormality: you get expansion coefficients via inner products. Let's write it via matrices. So in the matrix picture, we as usual think of a long skinny matrix whose columns are now q_1 up to q_n, orthonormal, and I can imagine multiplying that by x_1, x_2 up to x_n and getting a vector b. So this equation, as a matrix equation, could be written capital Q times x equals b, where Q is a whatever-by-n matrix.

Now we multiply on the left by the transpose of Q, which is an n-by-whatever matrix. So imagine multiplying this thing on the left; I'm just going to draw it schematically. We have a tall rectangle times a short vector equals a long vector. But now we multiply on the left, so we get the transpose of the rectangle, times the rectangle, times the short vector, equals the transpose of the rectangle times the long vector. So multiplying on the left gives us the equation that x_1 down to x_n is equal to, schematically, this wide matrix, whose first row is q_1 transpose, whose second row is q_2 transpose, and so on down to the nth row, which is q_n transpose, times b. So as a matrix equation, this is the equation x = Q transpose b. And notice here I'm writing equations involving rectangular matrices: I most certainly have not assumed that Q is square. What I've done makes sense in general.

It's interesting how different fields use different notations. In physics, which is the other field I would know best, I guess, exactly the same formulas are used all the time, but they would typically be written as summations rather than in this kind of matrix notation. But numerical people have found that the matrix notation is a very simple way to go towards algorithms, so that's how we prefer to write things.

Okay. That's all for arbitrary matrices with orthonormal columns, but a very important special case, let's move over here, is when they are square matrices with orthonormal columns. In fact, there's no name for a rectangular matrix with orthonormal columns, which is a great shame; it might be called an orthonormal matrix, but that's not a standard term. The standard term only arises once the matrix becomes square, and then it's called an orthogonal matrix. So let's now talk about an orthogonal matrix, which we can describe as follows.
Q is a square matrix with orthonormal columns. So everything I've written down applies in this case too, but then more also applies, because we can talk about the inverse. So in this case, let's draw the square picture of Q transpose times Q. Here's Q, and here's Q transpose. Q consists of column vectors q_1 through q_n; Q transpose consists of row vectors q_1 transpose through q_n transpose. Now when you multiply that by that, you're getting inner products of the various q_i transposes with the various q_j, and since they're orthonormal, all of those inner products are zero except on the diagonal, where you get one, because it's not just orthogonal, it's orthonormal. So we get ones down the diagonal, which is the identity.

So just from this notion of orthogonality, you can see that in the square case Q transpose is the inverse of Q, because you multiply them together and you get the identity. So that is: if you have the same number of rows and columns, a square matrix with orthonormal columns, we have Q transpose Q equals the identity, so Q transpose equals Q inverse. And it's a standard fact of linear algebra that this also implies that Q times Q transpose is the identity.

So in this case your neurone can fire whenever it sees Q transpose: whenever you see x = Q transpose b, that's the same as saying x = Q inverse b. So in the case of a square orthogonal matrix, that's how you extract coefficients. And that's a hint about the numerical algorithms: multiplying something by this matrix is much easier than computing a whole inverse matrix. Okay. So this is how you get expansion coefficients when you're dealing with an orthogonal matrix. And just to hammer the point home ad nauseam, x in this case would be written as this matrix, q_1 transpose down to q_n transpose, times b.

Okay. So this gives us a convenient notation in which a lot of things are easy to write down. Let's give an example of that. It's too simple to call it a theorem, so let's call it a proposition: if Q is orthogonal, then for any vector x, the norm of Qx is equal to the norm of x. Well, that's plausible enough, because Q is going to rotate you, or change basis, in some kind of a non-stretchy way, intuitively; but let's verify it algebraically, and the matrices are very convenient for that.
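First, a quick numerical check of these identities and of the proposition; Q here is just a random orthogonal matrix manufactured with qr, purely for illustration:

    [Q, ~] = qr(randn(5));       % a random 5-by-5 orthogonal matrix
    disp(norm(Q'*Q - eye(5)))    % Q'Q = I, up to rounding
    disp(norm(Q*Q' - eye(5)))    % Q Q' = I as well
    b = randn(5, 1);
    disp(norm(Q'*b - Q\b))       % Q'b gives the same coefficients as solving Qx = b
    x = randn(5, 1);
    disp(norm(Q*x) - norm(x))    % the norm is preserved: ||Qx|| = ||x||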
So, the norm of Qx: well, the square of the norm of Qx is the inner product of Qx with itself, Qx transpose times Qx. Remember, my definition of the norm was the square root of that inner product, so the norm squared is that. And you all know that when you have the transpose of a product, that's the same as the product of the transposes in reverse order. So that is equal to x transpose times Q transpose times Q times x. And now Q transpose Q is the identity, so this is equal to x transpose x, and there we have the norm of x squared. It's interesting practice, if you're thinking about this stuff, and I'm sure you all know these things from some angle, maybe not using this notation, to take the angle you're most familiar with and see how it translates into this, to see what is really being said here by each of these terms.

Okay. So it remains to write down least squares, and basically, with all these orthogonal tools in hand, the algebra of least squares comes almost immediately. Let's suppose we have a matrix A, and it has more rows than columns; if we want to be explicit, we could say it's m by n with m bigger than n. Then in general we can't expect to be able to solve Ax = b, but we might be able to find a good approximation. And we say that a least-squares solution to Ax = b is a vector x with the property that the norm of Ax minus b is as small as possible. That quantity, Ax minus b, by the way, is called the residual.

And all it takes is a picture to see how to solve this problem. This is a picture I draw a lot: you imagine, in some abstract space, the set of all possible m-vectors, so this is the space R^m, an m-dimensional vector space. And now you imagine the subspace of R^m which is the range of A. Now A only has n columns, so the range of A is a subspace of dimension at most n; it could be less, or it could be exactly n. So the range of A is a subspace of R^m of dimension at most n. And now we have a point b somewhere over here, and our goal is to find an x such that Ax minus b is minimal; in other words, we want to find an x corresponding to the point in this subspace closest to b. The solution is immediate from the picture; obviously one can prove it algebraically. The closest approximation to b in this space is the point characterised by this right angle here: orthogonality. I'm just using your geometric intuition; of course one can prove that algebraically.
So to solve the least-squares problem, you want to find that point there, such that the range of A is orthogonal to the residual. So to get the solution to our least-squares problem, we want the range of A to be orthogonal to Ax minus b. Now, another way to say that is that we want each column of A to be orthogonal to Ax minus b; so we want a_j transpose times Ax minus b to be zero for each column a_j of A. So the picture tells us what algebraic problem we need to solve to do least squares. There's our algebraic problem. Another way to say it is that A transpose times Ax minus b should be zero; or, another way again, A transpose A x should equal A transpose b. So that's the simplest derivation of an algebraic condition for solving the least-squares problem. This set of equations is called the normal equations.

So we've defined orthogonality, we've used a picture to characterise a least-squares approximation, and then from that picture we've derived a formula: a linear system of equations that is symmetric positive definite, or at least positive semidefinite, that is, non-negative definite, in the rank-deficient case.

We still have a minute, so let's do the other demo; this is M8. Suppose I now say t equals 2 pi times a vector of equally spaced numbers between zero and one, so t is now a vector that basically goes from 0 to 2 pi. And suppose I say A equals, it's in my computer somewhere, there we are: A is now a matrix of five columns, which are 1, and then sin t and cos t and sin 2t and cos 2t. So you see, we're doing a trigonometric type of approximation here. If I compute A transpose times A, there it is: it's a bunch of inner products, and you can see the orthogonality of sines and cosines. If I plot A, you can see sines and cosines discretised in 30 points; or if I want to get the x-axis right, I should really say plot of t against A. And now let's make a particular right-hand-side vector, which is already in my computer, and I'll plot it: I'll say plot t against b, as black circles. So that's a right-hand-side vector which I've picked to be a piecewise linear function. Let's do least squares. So if I type x = A backslash b in MATLAB, it goes through some algorithm which we haven't actually talked about, which is mathematically equivalent to solving this system of equations, and it finds the vector x corresponding to the least-squares solution.
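Incidentally, here is a small sketch, on a made-up rectangular system, checking that backslash agrees with the normal equations just derived; MATLAB's backslash typically works via a QR factorisation rather than forming A transpose A, but for a well-conditioned A the two agree to rounding error:

    A = randn(30, 5);              % a random long skinny matrix
    b = randn(30, 1);
    x1 = A\b;                      % least squares via backslash
    x2 = (A'*A)\(A'*b);            % least squares via the normal equations
    disp(norm(x1 - x2))            % agreement up to rounding
    disp(norm(A'*(A*x1 - b)))      % the residual is orthogonal to the columns of A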
And if I now say hold on, and plot t against A times x, you can see we have a nice least-squares fit to that data. Continuing with this example, let's perturb the data: I'll say b equals b plus some random perturbations, and now I'll plot that, plot t against b. Okay, so now we've got the same data, but perturbed. If I do the same thing, well, we know least squares is pretty good at dealing with noise, so we should get a similar coefficient vector, and indeed it's not so different. And if I do hold on and then plot t against A times x, we see what the least-squares solution has done. Of course, all of this could also be done in a continuous setting. Okay, that's it for now. Thanks.