Okay. Good morning, everybody. I hope you've done the quiz, so let's just look at it and see if you got the right answers.

Every square matrix has an eigenvalue decomposition: false, and the simplest example of one that doesn't would be that one. In other words, not every matrix is diagonalizable. Every matrix, rectangular or square, has an SVD: yes, that's true. If A and B are related by a similarity transformation, so everything is square and A = XBX^{-1}, then they have the same eigenvalues: yes, that's true. So we have false, true, true. Number four: a square matrix is singular if and only if it has a zero eigenvalue. True. Number five: the singular values of an orthogonal matrix are all equal to one. That's true. Remember, the singular values tell you about a hyperellipsoid; an orthogonal matrix maps a ball to a ball, so all the singular values are one. Finally, the eigenvalues of an orthogonal matrix are all equal to one. That's false. Because an orthogonal matrix preserves lengths, its eigenvalues all have absolute value one, but that doesn't make them equal to one: the simplest counterexample is minus one, and complex eigenvalues on the unit circle are possible too. So plus or minus one, say, not necessarily one.

Okay. So today we begin the last bit of this term, which is about optimisation. We've been talking about linear algebra; now optimisation. This is a subject that I mentioned at the beginning is one of the pillars of scientific computing. Remember, I drew a tree in which I identified three traditional main areas of numerical analysis and scientific computing: one was linear algebra, one was differential equations, PDEs and ODEs, and the other was optimisation. I think, if anything, over the years this one just keeps getting more and more important. It's an amazingly exciting, flourishing area that's everywhere in science and engineering.

Let me mention a couple of facts about optimisation that make it diverse and interesting. One is that we're typically dealing with functions of several variables, or many variables; optimisation is all about minimising functions of many variables. And the other thing, without which the field wouldn't be so variegated, is that typically there are constraints.

So what is optimisation? It's the subject of minimising or maximising functions, typically with several or many variables and typically with constraints. So it's minimisation and maximisation, and more or less in the same domain is finding roots of functions. That's what optimisation is all about. Nobody ever maximises in practice; you always minimise, so if you want a maximum of something, multiply it by minus one.
The standard is always to talk about minimisation.

So this is a field where there is a totally dominant, great textbook, and that's the book Numerical Optimization by Nocedal and Wright. The second edition came out nine years ago. I think it's a beautiful, straightforward, knowledgeable book that covers all the basic, and a little more than basic, ideas of numerical optimisation. There's absolutely no doubt that if you want to learn about something in this area, the place to start is with this book. Let me pass that around. Oh, before I pass it around, let me read the first two sentences. It's a big, thick book, but well organised; don't worry, the first two paragraphs are just these little paragraphs. The first paragraph begins "People optimize." What they mean by that is, well, let me say what they mean: investors seek to create portfolios, and so on; manufacturers aim for maximum efficiency; engineers adjust parameters to optimize performance; and so on. So people optimize. Then the second paragraph says "Nature optimizes." Physical systems tend to a state of minimum energy; the molecules in an isolated chemical system react with each other until the total potential energy of their electrons is minimized; rays of light follow paths that minimize their travel time. I think that's an intellectually fascinating beginning for the book, because it really is true that optimisation is at the heart of science, because so many processes minimise energy or something like it, and at the heart of engineering, for even more obvious reasons.

Okay. So the beginning point of optimisation, and it's not just the beginning, it's a pervasive idea throughout everything, is Newton's method, which of course we've all heard of. That's certainly where we're going to start, and we're going to begin with Newton's method in one dimension, which is the kind of thing I hope you all learned in high school. So: Newton's method, and I'll put it like this, Newton's method for a single equation. The reason for putting it that way is to emphasise that I'm beginning with root finding rather than minimisation.

I remember once a teacher told me that really mathematics boils down to two things: one is Taylor series, the other is linear algebra, and everything else is variations on those themes. Well, this is Taylor series. Newton's great idea is Taylor series.
Indeed, I think Arnold wrote a book about the history of mathematics in which he essentially claimed that Newton's great contribution was Taylor series: exploiting the smoothness of a function to approximate it by a constant plus a slope plus a curvature, that sort of thing.

So let's write down the notation we're going to use. Suppose we're given a function; the notation I'll use for root finding is capital F of x. When we turn to minimisation, it'll be lowercase f. So suppose we have a function F, which we assume is nonlinear, and our goal is to find a zero of F, also called a root of F. The notation we'll use is x*: we're looking for an x* such that F(x*) = 0.

You all know Newton's method; let me write it down in the format I'll use for a bunch of these iterations. Suppose we're given an initial guess, which I'll call, of course, x_0. Then we enter the loop, and I won't ever talk about termination criteria, so I'll just say: for k = 0, 1, 2, ..., define s_k = -F(x_k)/F'(x_k). Now, s stands for step, because we're going to take a step in that direction. The new iterate is then x_{k+1} = x_k + s_k. So there's the iteration that you all learned in high school.

Now, let me ask, I'm curious: raise your hand if you did learn that in high school. Oh, okay, pretty good. And of course you know the picture associated with this: we have some kind of function, we're trying to find x*, we have some guess x_0, we evaluate the function, then extrapolate along the tangent, and that process converges quickly. In fact, just to say a word about how fast it converges: I won't state it formally as a theorem, but this is the idea. If F is twice differentiable, so the second derivative exists, and moreover the derivative evaluated at a root x* is nonzero, then if x_0 is sufficiently close to x*, we have convergence, and it's quadratic. So we say the convergence is quadratic, which means the number of correct digits doubles at each step, essentially. Later I may write that down more precisely: the error at one step is asymptotically proportional to the square of the error at the previous step.

This is a prototype for the kind of theorem you find all over optimisation. Notice that some key assumptions always turn up. You need some smoothness in your function; if you lose that, then there may be no convergence, or slower convergence.
You need some kind of generic behaviour at the solution you're looking for; otherwise you may have no convergence or slower convergence. And especially with Newton's method, you certainly need to be close enough to the solution. So of course, if the function keeps doing crazy things, then depending where we start, we may converge to nothing, or we may converge to different roots. It's obvious from the picture that if we start near this root we'll converge to it, and if we start near that one we'll converge to that one. Things get more complicated if your function is less smooth or if the derivative is zero at a root. And of course a root is by no means necessarily unique; Newton's method is all about finding one root, not necessarily all of them.

Now I want to show you, as usual, some stuff, so let's go to MATLAB. The first code is the one called m18, which is not a program you're really supposed to look at, because all it does is fiddle around with graphics; but I was giving a public lecture once, so it was worthwhile making such a code. It's called pretty newton, and what it does is take a particular function and colourfully illustrate Newton's method. So there you have the function, which is the blue curve, and we're trying to find that point. There's our initial guess, and you can see that the initial guess is 0.8. None of the digits are correct, so they're all in red. You evaluate the function at your current guess and then extrapolate along the tangent line, and that gives you a new point, 0.208. Still none of the digits are correct, so they're still all in red. Then we evaluate the function there and extrapolate along the tangent. Still none of the digits are correct, so they're still all in red; for the remainder, of course, you won't see very much change, you're mostly going to see the red digits. We evaluate and extrapolate, and now suddenly we have three digits of accuracy: .373 is correct. From now on, every time we keep going, the number of correct digits will approximately double. So I evaluate and then extrapolate, and it exactly doubled, from three digits to six digits. Do it again and we expect about 12 digits: I evaluate and then extrapolate, and as it happens we got exactly 12 digits. On my machine it won't go beyond that; I might be able to nail the final one, but of course I'm working in 16-digit arithmetic, so that's as far as I'll get on this computer.
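Stripped of the graphics, a code like that boils down to just a few lines. Here is a minimal sketch, my own illustration rather than one of the course codes, using f(x) = x^3 - 2 as the test function (the example we'll come to in a moment):

```matlab
f  = @(x) x^3 - 2;            % function whose root we want (root is 2^(1/3))
fp = @(x) 3*x^2;              % its derivative
x = 0.8;                      % initial guess x0
for k = 1:8
    s = -f(x)/fp(x);          % Newton step s_k = -f(x_k)/f'(x_k)
    x = x + s;                % x_{k+1} = x_k + s_k
    fprintf('%20.15f\n', x)   % the number of correct digits roughly doubles each time
end
```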
Great idea. Newton, by the way, didn't do as much with Newton's method as you probably think; I believe he mainly applied it only to polynomials, not to general functions. Simpson, as in Simpson's rule, was the person who really put Newton's method on its feet: Simpson, and Raphson too.

Okay. Now I also want to look at the other codes, the ones called m19. There are three of these, just to explore one problem from different angles. Excuse me. Okay, so here we have the problem, and I might as well write it down: in m19 we're solving f(x) = x^3 - 2 = 0, so we're trying to find the cube root of two. Incidentally, on computers Newton's method is sometimes used even for elementary things like square roots; it's an algorithm so useful that sometimes it's even implemented in the hardware.

So let's look at this from various angles. The first of the codes, m19 newton A, is just straightforward MATLAB. So I'll close that, and let's just type m19 newton A. What you see there is that we've taken our initial guess, 1, and we're trying to find the cube root of two. We iterate and get 4/3; we iterate again and again. You could guess now that we have several digits of accuracy; it looks clear now that we have four or five digits; now it would seem we have maybe nine; and from now on it just keeps agreeing. It's not a proof, of course, but it's pretty clear that we have more or less full precision in the cube root of two. Let's see what happens if I type 2^(1/3). Looks good. That doesn't mean they're the same, by the way; I could say ans minus x, and that might be zero, or it might be some small multiple of machine epsilon. It turns out it is zero: we were lucky.

The next code does the same thing, but symbolically. Now, I would love to show you this properly, but my laptop doesn't have the Symbolic Math Toolbox, so you can try it at home; what I did instead was print a couple of the outputs on the sheet of paper here. So here we're running MATLAB's Symbolic Math Toolbox, which is by no means the world's most beautiful symbolic calculator, but it works for something like this. In MATLAB you can declare a variable to be sym of a string, and so 1 becomes a symbolic 1, and you can go through this whole computation in symbolic mode. And Newton's method for this problem has a succession of iterates that are all rational numbers.
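A minimal sketch of what that symbolic run amounts to, my own reconstruction assuming you have the Symbolic Math Toolbox, is something like this:

```matlab
x = sym(1);                      % symbolic 1: everything from here on is exact rational arithmetic
for k = 1:4
    x = x - (x^3 - 2)/(3*x^2);   % Newton step, carried out exactly
    disp(x)                      % prints 4/3, then 91/72, then 1126819/894348, then a much longer fraction
end
```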
So you see, the first iterate is 1 and then 4/3, which we identified immediately when we ran the floating-point code; but they keep being rational numbers at every stage, 91/72 and so on.

Now, the striking thing about these numbers is that they keep getting longer. If you quantify it, roughly speaking the numerator triples its number of digits at each step, and so does the denominator, because this is a cubic. If it had been a quintic, the numerator would have become five times as long at each step; for a 20th-degree polynomial, 20 times as long. So what you're seeing here is the combinatorial explosion of symbolic computing. Rational arithmetic is, sadly, just not a good idea for scientific computing, because this happens all the time: it explodes combinatorially. That's why rational arithmetic has such a small niche in computational science; usually it just isn't feasible. And if you look at this further: we have 1, 4/3, 91/72, and then, going down three more steps, that approximation to the cube root of two will surely be accurate to 16 digits or so, but it won't be accurate to anything like this many digits. Most of that exactness in the rational numbers is irrelevant to the problem you're trying to solve, and that is very typical of symbolic or exact-arithmetic computing.

Now the third one, m19 newton C, is also something I can't run on my laptop because I don't have the Symbolic Math Toolbox, but it's the other variation on the theme. Standard numerical computing means floating-point arithmetic; one variation is symbolic, and the other variation is extended precision, another important, beautiful idea. So in this code, m19 newton C, I said let's run in 75-digit arithmetic. There you see, printed out, the first few steps in 75-digit arithmetic, and sure enough the second one has one digit, the third one has one digit, the fourth one has three digits; and if you follow along, you see that the last two lines of the output are identical. So sure enough, Newton is doubling the number of digits at each step.

Variable precision is a beautiful, wonderful thing, and I think everyone in my field wishes it were effortlessly available all the time; if only every computer had a knob you could turn to change the precision. Sadly, from an engineering point of view that doesn't seem to be worth the cost, but for an academic like me it would be delightful. Maybe in 50 years computers will do that. In practice, if you want to do variable-precision arithmetic, you go into some rather different mode, and usually there's a significant cost associated with that.
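For reference, the 75-digit computation amounts to roughly the following; again this is my own sketch, assuming the Symbolic Math Toolbox, whose digits and vpa commands control variable-precision arithmetic:

```matlab
digits(75)                        % work to 75 significant digits
x = vpa(1);                       % initial guess, stored as a 75-digit number
for k = 1:7
    x = x - (x^3 - 2)/(3*x^2);    % Newton step in 75-digit arithmetic
    disp(x)                       % correct digits roughly double until all 75 are right
end
```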
When I was your age, quadruple precision was more common than it is now. It's still out there, but back then single, double and quadruple were all very much part of the landscape, so we computed with 8 digits, 16 digits or 32 digits. That's now kind of compressed, and most people most of the time are just working in 16 digits.

Okay. So that's Newton's method in the scalar case. But of course optimisation is about systems of equations, so let's talk about that: Newton's method for a system of equations. Again, I'm writing that out explicitly to emphasise that we're not yet minimising; we're still finding roots of equations.

So now we're going to have a function that maps from a vector space to a vector space. Let's say that capital F maps from R^n to R^n, so we have n variables and n equations, if you like. We're seeking a vector x* in R^n such that F(x*) = 0, and these are vectors, so the zero here is the n-dimensional zero vector. Another way to say it is that we have a system of equations; if you want, we could write them out: the first would be f_1(x_1, ..., x_n) = 0, and that's now the scalar zero, and so on, up to f_n(x_1, ..., x_n) = 0. So a system of equations is the same thing as a single equation in a vector space.

Now, Newton's method in principle carries over exactly, although Newton himself never did that; he never did systems of equations, and I don't remember to what extent it's thought that he understood the idea of calculus in more than one variable. Probably he did; he was smart. The idea is to do exactly the same thing: we need to extrapolate along a tangent, but now the tangent will be defined by some linear algebra rather than just a scalar derivative, so we're going to have to solve a linear system of equations at each step. And for that to make sense, we have to talk about a linearization of the function F.

So here's the definition, and I'll give it in the general setting of a function from one finite-dimensional space to another, say F mapping R^n to R^m. The key thing we need is the derivative of F, which one usually calls the Jacobian, or the Jacobian matrix. The derivative of F at a point x in R^n is a matrix, the m-by-n matrix of partial derivatives. So J for Jacobian: I could say J(x), that's my matrix.
The (i,j) component of that matrix, which you could also write as F'(x), is the partial derivative of the i-th component of F with respect to the j-th component of x, evaluated at the point x: J_ij(x) = ∂f_i/∂x_j (x).

By the way, you notice I don't use special symbols to distinguish vectors. Of course you can: you can write them in boldface, or put arrows or tildes over them. In mathematics one often doesn't do that, so people get used to using the same notation for a vector as for a scalar; I realise that in other fields you may have different habits.

So let's just do an example. You all know partial derivatives, and presumably most of you know Jacobian matrices. The example I'll pick goes from R^2 to R^3. Suppose we had, for example, f_1(x_1, x_2) = x_1^2, f_2(x_1, x_2) = x_1 + x_1 x_2, and f_3(x_1, x_2) = x_1 e^{x_2}. Then the Jacobian matrix F'(x) is a 3-by-2 matrix; let me just write it down. We differentiate the first component with respect to x_1 and x_2, which gives 2x_1 and 0; we differentiate the second with respect to x_1 and x_2, which gives 1 + x_2 and x_1; and we differentiate the third with respect to x_1 and x_2, which gives e^{x_2} and x_1 e^{x_2}.

So Newton's method is going to have to do this kind of thing all the time, though usually the derivative will be calculated numerically rather than symbolically.

Okay. To continue for just a second on the general idea of a Jacobian: this is Newton's idea that locally, any function, if it's smooth, looks linear. The motivation for Jacobians is that if you have this function capital F mapping R^n to R^m and you evaluate it at x plus a small perturbation δx, so that's an n-vector plus another n-vector, then the result should be equal to F(x) plus a suitable perturbation, and to leading order that perturbation is the matrix F'(x) times δx:

F(x + δx) ≈ F(x) + F'(x) δx.

That's the whole point of vector calculus, if you like. F(x + δx) and F(x) are both m-vectors, and the correction term is an m-vector too, arising by multiplying the m-by-n matrix F'(x) by the n-vector δx. So that's what vector calculus starts from, and Newton's method is simply exploiting this property in the case of a square capital F.
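Since the Jacobian is usually computed numerically rather than symbolically, here is a minimal sketch of what that can look like for the little R^2-to-R^3 example above; this is my own illustration, with an arbitrary evaluation point and step size, not one of the course codes:

```matlab
F  = @(x) [x(1)^2; x(1) + x(1)*x(2); x(1)*exp(x(2))];          % the example map from R^2 to R^3
Jx = @(x) [2*x(1) 0; 1+x(2) x(1); exp(x(2)) x(1)*exp(x(2))];   % its exact 3-by-2 Jacobian
x = [0.7; -0.3];                 % an arbitrary point at which to compare
h = 1e-7;                        % finite-difference step size
J = zeros(3,2);
for j = 1:2
    e = zeros(2,1); e(j) = 1;
    J(:,j) = (F(x + h*e) - F(x))/h;   % forward-difference approximation to column j
end
disp(norm(J - Jx(x)))            % agreement to roughly 7 digits, as expected for this h
```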
So let's write that down: Newton's method for a system of equations, in its purest form. We are given an initial guess, which is now a vector, x_0, and then we enter our loop: for k = 0, 1, 2, ..., we evaluate the function, F(x_k), and also the derivative, the Jacobian F'(x_k). I'm assuming here that m equals n, the square case; and you can imagine that in practice evaluating the Jacobian is a non-trivial matter, and there is a huge range of ways to do it, exactly or approximately. Now we want our update. We need a step, and the step is now defined by a linear system of equations: F'(x_k) s_k = -F(x_k), which we solve for the step s_k. Before, in the scalar case, we just said s_k = -F/F'; now it's a system of equations, and we have to solve this n-by-n system to get our step. And then x_{k+1} = x_k + s_k.

So that's a pretty straightforward idea, and a lot of optimisation consists of how to make that idea practical for big problems. Now let's look at it in its simplest form. What I'm going to do here is look at the code called m20 newton system, which you'll find on the back of the other sheets. I might as well write the problem down; well, actually it's not important, since you can see the function in the code. The function F we're working with here has two variables and two equations: sin(x_1 + x_2) - e^{-x_1^2} = 0, and the second equation is 3x_1 - x_1 x_2^2 - 1 = 0. So we're trying to find a root of those two equations in two unknowns. And in this case I've explicitly written down a formula for the Jacobian, which of course would not typically be possible in a realistic numerical calculation. So this little code will just explicitly do Newton's iteration for that system of two equations.
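For the record, here is a minimal sketch of what such a code amounts to; this is my reconstruction of the idea, not the course code itself:

```matlab
F = @(x) [sin(x(1)+x(2)) - exp(-x(1)^2);                          % the two equations
          3*x(1) - x(1)*x(2)^2 - 1];
J = @(x) [cos(x(1)+x(2)) + 2*x(1)*exp(-x(1)^2), cos(x(1)+x(2));   % Jacobian, written out by hand
          3 - x(2)^2,                           -2*x(1)*x(2)];
x = [0; 0];                               % initial guess
for k = 1:8
    s = J(x)\(-F(x));                     % solve J(x_k) s_k = -F(x_k) for the step
    x = x + s;                            % x_{k+1} = x_k + s_k
    fprintf('%18.14f %18.14f\n', x)       % watch both components converge
end
```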
Let's see what happens: I'll type m20 newton system. Oh dear, what did I do wrong here? Hmm, that's interesting; I tested this on my desktop but not on this laptop. Let's see what's going on at line 32, column one. Oh, sorry: I added some comments at the last minute and forgot to make them into comments. My apologies; they're in your printout.

Okay. So the code is set up to start from the initial guess (0, 0), and then it solves this two-by-two system of equations and gets (1/3, 2/3), and it goes on. Now you can guess at this point that we probably have a couple of digits of accuracy. You can guess now that we still probably have a couple of digits, or maybe your best guess would be about four digits, because this iterate agrees with the last one to a couple of places, so it's probably accurate to about four places. Sure enough, we had about five digits of accuracy there, and then, numerically, beautiful convergence. So that was an easy problem. Now, of course, looking at that you don't know whether that's the unique solution; you're pretty certain it is a solution, but there might be others, and you simply can't tell from looking at it.

Let me do the thing I added in those comments. Chebfun, of course, has a lot of fun with minimising and finding roots and so on, and Chebfun2 handles functions of two variables. Let's look at this function in Chebfun2. It's two equations in two unknowns, so one way to do it would be to declare a chebfun2 corresponding to x on the unit square, the default domain, and another corresponding to y on the unit square, and then I could create F1 as sin(x + y) - e^{-x^2} and F2 as 3x - x y^2 - 1; if I have a typo, please let me know. Chebfun2 is doing its usual thing of representing functions by polynomials that match them to about 16 digits of precision. And then there's the roots command, which in principle aims to find all the roots of this system of two equations in the region where they're defined, which was the unit square, -1 to 1 by -1 to 1. One would hope we'll find the root that we just found; maybe we'll find others too, because Newton's method didn't tell us whether there are others. So we proceed, and we seem to find just one. That's not a proof, of course, but it's maybe some evidence that that's the unique root in the unit square. Of course you could try enlarging: suppose I change it to the interval from -3 to 3, by -3 to 3, so now we're in a box nine times as big. I haven't tried this, so I don't know what will happen. I now define F as before, so I've created a global representation of these functions on this bigger domain, and I don't know what roots we'll find. Oh, okay, so that's new to me. Hopefully those are all correct. We could get some numerical evidence of whether they're correct: suppose I say R equals roots, and then evaluate F1 at those points. Hmm, I'm doing something wrong, so maybe I won't waste your time on that, but probably those are all valid roots.
Do we recognise among them the one that Newton's method got? I think that's the fourth one in the list, right, .387. So I currently believe there are six roots of this pair of equations in the box -3 to 3 by -3 to 3.

Chebfun also lets you work with vector calculus more or less directly. Another way to do it would be to say F equals chebfun2v, v for vector, and then we simply make a vector function with both components. So the first component is sin(x + y) - e^{-x^2}, and the second component is the other function, 3x - x y^2, stop me if you see an error, minus 1. So capital F is now a proper vector-valued function, and you can ask for the roots of that vector function. And there they are. Let's do the experiment of doing it on -3 to 3; we hope this matches what we got before. Right, it seems to. Whether that's further evidence or just the same old evidence is an interesting question.

Since I'm doing this Chebfun stuff, let me just go to the more trivial one-dimensional case to highlight the significance of local versus global root finding. Suppose we take an example we looked at before: I'll make f a one-dimensional chebfun, and I'll pick a Bessel function, besselj(0, x) say, on the interval from 0 to 20. So if I plot f; hmm, that's not going to be interesting, thank you; besselj of zero and x, right. So if I plot f, I find a function with several zeros. Now, Newton's method, started at any point, would probably converge to one of these zeros, but only one. If you wanted to find all of them, you would need to enlarge the discussion somehow. In the olden days, when I was your age, that was almost a taboo subject: global optimisation was considered provably impossible, so nobody talked about it. But that's all sort of changed; there are now various contexts in which you can meaningfully do global optimisation. Of course, a one-dimensional problem like this is very easy, and Chebfun will do it. Let me remind you how: if I say r equals roots of f, I get all of them, so I can plot them. But let's talk about why; wait, what did I do wrong here? That's worrisome. What did I do wrong? Oh, oops, sorry, hold on: plot r against f(r), is that what I want to do? Thank you. Yes. Okay.
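Cleaned up, and assuming Chebfun (from chebfun.org) is installed and on your MATLAB path, the commands I was fumbling with amount to roughly this:

```matlab
% Roots of the 2-by-2 system on [-3,3] x [-3,3], via Chebfun2:
x = chebfun2(@(x,y) x, [-3 3 -3 3]);
y = chebfun2(@(x,y) y, [-3 3 -3 3]);
F = chebfun2v(sin(x+y) - exp(-x.^2), 3*x - x.*y.^2 - 1);
r = roots(F)                      % one row per common root; Newton's root should be among them

% Global roots of a one-dimensional function, via Chebfun:
f = chebfun(@(x) besselj(0,x), [0 20]);
rr = roots(f);                    % all the zeros of J_0 in [0,20]
plot(f), hold on, plot(rr, f(rr), '.r'), hold off
```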
So, of course, we see some kind of global root finding. But let's talk for a moment about why this was global. The answer is that Chebfun happens to represent functions by polynomials and to find roots as eigenvalues of matrices, and it's a fact of scientific computing that the standard way of finding eigenvalues of matrices is global: you find all of them. So there's some matrix involved here which has a lot of eigenvalues, six of which are those dots. It's because the standard algorithms for eigenvalue problems get all of them that it becomes easy to do global root finding, at least in one dimension, for Chebfun.

And as I think I demonstrated before, it's pretty powerful. Suppose I made that the interval from 0 to 20,000, for example; I haven't tried one this big, so let's see what happens. I'll plot f: there's your Bessel function, and that's a polynomial of degree around 10,000. Let's say r equals roots of f and see how long it takes; I predict about half a minute. We're finding the roots of a polynomial of degree around 10,000 in order to do global root finding. Fifteen seconds, maybe. So suppose I say plot f and restrict it to a small interval, say from 8082 to 8175, and now I'll say plot r against f(r). Hold on, plot r against f(r); will that work? Yeah, okay. So in 15 seconds we found all the roots of the Bessel function out to 20,000. There they are. I don't know exactly how accurate they are, but probably 14 or 15 digits. And that's an example of global zero finding.

I think as the decades go on, optimisation has a larger and larger part that's global. It's still a very small minority: for serious scientific problems, protein folding let's say, you don't have any realistic hope of global optimisation in a provable sense, but of course people are always striving for that.
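A footnote on that "roots as eigenvalues" remark: plain MATLAB does the same thing for ordinary polynomials, since its roots command computes the eigenvalues of a companion matrix, which is why it returns all the roots at once. A tiny illustration of my own, not from the lecture:

```matlab
p = [1 0 0 -2];           % coefficients of x^3 - 2
roots(p)                  % all three cube roots of 2: one real, two complex
eig(compan(p))            % the same numbers, obtained explicitly as companion-matrix eigenvalues
```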
Okay, so for the last few minutes I want to begin to talk about minimisation. For whatever reason, minimisation is in practice a much vaster scientific enterprise than zero finding; it just comes up more of the time. So this is really the heart of the field of optimisation. Let's talk about Newton's method for minimising, and we'll start in one dimension.

Well, there's no great mystery mathematically about what's involved here: to minimise, you want to find a zero of the derivative, and that's what Newton's method does. So we're now going to use a lowercase letter. Suppose we have lowercase f, a function of a scalar variable, and our goal is to find an x* such that f(x*) is a minimum.

It's interesting already, if you think about it, that you can begin to see a slight difference between minimisation and zero finding. Suppose we have some function and we want to find its minima. There are various local minima, and they all have the property that the derivative is zero there. But from a global point of view these are not all equal: that one is a better minimum than this one. Whereas in the zero-finding problem, by definition, all zeros are equally zero. So minimisation has a complication, global versus local, that somehow goes deeper than the analogous situation with zero finding.

So we want to find a minimum, and usually, 90% of the time, we seek a local minimum. It's obvious what that means, but let's write it down: f(x) >= f(x*) for all x in a neighbourhood of x*. That's our definition of a local minimum. The obvious idea is to use Newton's method to find a zero of the derivative, so f'(x*) = 0 is the condition we apply. If you'll excuse my being so methodical, I'm going to write it down. This is Newton's method for minimisation of a scalar function. We're given an initial guess, x_0. We enter our loop, for k = 0, 1, 2, ..., and we define our step; we're now finding a zero of the derivative, so it's the same formula as before but differentiated: s_k = -f'(x_k)/f''(x_k). Then we update exactly as before: x_{k+1} = x_k + s_k.

From that you can perhaps begin to see that minimisation is going to have some problems. It's a whole further derivative beyond what we had before; this pure Newton method depends on the curvature of the function. First of all, that's a lot more information; second, it demands more smoothness of the function for this to really work. In many applications it will be hard to evaluate that second derivative, or maybe it won't even exist if your function isn't very smooth. And that gives a hint as to the complexity of the problem.
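Here is a minimal sketch of that iteration on a made-up example of my own, f(x) = e^x - 2x, whose minimiser is x* = log 2, just to show the shape of the loop:

```matlab
fp  = @(x) exp(x) - 2;        % f'(x)  for f(x) = exp(x) - 2*x
fpp = @(x) exp(x);            % f''(x)
x = 0;                        % initial guess
for k = 1:6
    s = -fp(x)/fpp(x);        % Newton step for minimisation: s_k = -f'(x_k)/f''(x_k)
    x = x + s;                % x_{k+1} = x_k + s_k
    fprintf('%20.16f\n', x)   % converges quadratically to log(2) = 0.693147...
end
```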
Let me finish by mentioning another way one would interpret this. Call it an equivalent formulation, while at the same time hinting that it's in some sense a better formulation, because it generalises very broadly. The equivalent way of looking at Newton's method is that we approximate f(x) locally by a parabola, the particular parabola being determined by the first and second derivatives at our current point, and then we set x_{k+1} equal to the minimum of that parabola.

The reason that's somehow a better formulation is that it immediately generalises, and this, in some sense, is really the picture for all kinds of optimisation algorithms. You do a local approximation, and what changes is that instead of just a parabola you have some model of your function based on whatever data you've got, and then you minimise that model. At least for unconstrained optimisation, this is sort of the universal picture of the way most algorithms work: you locally build some model of what your function looks like, you use that model to get the next step, and as you go, you keep improving the accuracy and probably the model too.

So next time we'll go into multiple dimensions, but also talk about more realistic aspects beyond Newton's method.