We have an assignment which is due on Tuesday, week after next. It involves downloading a piece of software which, as I say this, I realise I haven't quite put on the website yet, but I'll do it right after this lecture. It's called BFO, which stands for Brute Force Optimisation, and it's a nice example of a serious piece of software for optimisation that is not large scale. It's for maybe 1 to 10 variables, but if you have a problem with 1 to 10 variables, BFO is a very remarkable, convenient piece of work. It's all in MATLAB, freely available.

I have a few miscellaneous things. One is that section 3.6 of our outline is to mention NEOS and COIN-OR. Last lecture we already took a brief look at NEOS, the network-enabled optimisation software. Let's briefly now look at the other one, just so you're aware of it. COIN-OR is a funny name; anyway, it stands for Computational Infrastructure for Operations Research. This is another portal to a great deal of information about optimisation. I'm not going to explore it now, but one reason I wanted to mention it to you is as a reminder that operations research, or as we would say in Britain, operational research, has been a key source of optimisation technology. In the early years, more or less during and after the war, this was where optimisation came from. Now optimisation is much broader in its impact, but this is the classical root of it. It's even connected with Soviet history: the planned economy was one of the early, great examples in which people attempted to use optimisation to achieve good things. It didn't work very well, but it was a noble effort, and indeed one of the great early figures in optimisation, Kantorovich, was from the Soviet Union. None of you were even born when there was a Soviet Union. Okay.

The other thing I wanted to mention is this handout from the Top500 computer site. I think I've mentioned the Top500 a couple of times. They update their Top500 twice a year, once, as always, in November, so this is the update from last week. You can see that the fastest machines on earth are rated in the tens of petaflops; they're incredibly fast. The top ten are listed here, and you can see they're from China, the US, Japan, the US, the US, Switzerland, Germany, Saudi Arabia. They're a remarkable collection of machines that are very remote from what most of us actually do day to day, but if you take the trouble to use them with all their cores effectively, they can be incredible.

One thing to remind you of is that all of these ratings spring from the numerical linear algebra community.
The Top500 numbers are computed by running benchmarks which effectively spring from the old LINPACK subroutine libraries from the 1970s. So it's all about numerical linear algebra, used to rate computers as well as to solve problems. There's a fair amount of scepticism about this list, and yet nobody can bring themselves to stop compiling it.

Okay. Well, today is the last lecture, and the theme of the lecture is optimisation with constraints. As usual, I won't be able to go into things in depth, but I want to mention some of the features that turn up here.

So 3.7 is Constraints and linear programming.

Now, in general an optimisation problem has something you're trying to minimise and then various constraints. The thing you're minimising could be linear or nonlinear, and the constraints could be linear or nonlinear. That description applies to continuous optimisation, and a further complication is that you might have discrete variables on top of the constraints, which I'm not talking about now. Linear programming is the case where everything's linear, and in practice the challenge is really to handle the constraints efficiently. The kind of picture we have in mind is that we're in some high-dimensional space and we have to lie to a certain side of various hyperplanes. So maybe we have to lie to that side of that hyperplane, and to this side of this hyperplane, and to this side of this hyperplane; we're constrained to a feasible region of legal positions. And within that region we want to find an optimum, which is defined by minimising a linear function. It sounds trivial, and of course it is trivial on the whiteboard, but once you get into millions of constraints and millions of variables, it's very far from trivial.

So let's write down the notation. We seek to find a vector x in R^n such that we minimise f^T x subject to the constraints A x <= b. So I have to define what all of those things are. x, as I said, is an unknown n-vector, and f is a given n-vector also. A is a given m by n matrix, and b is a given m-vector. So the picture here is that I'm talking about a system with n unknowns and m constraints; let's write that down. The matrix A defines the constraints, which are linear constraints defined by this condition that A x <= b. Now what does that mean? This is a vector and that's a vector. When I say less than or equal, I mean componentwise: each component of A x, and there are m of them, has to be less than or equal to the corresponding component of b.
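Just to make the componentwise meaning concrete, here is a tiny MATLAB fragment with made-up numbers (my own illustration, not from the lecture codes) that tests whether a candidate point satisfies all m inequalities:

A = [1 2; -1 0; 0 3];      % a made-up 3-by-2 constraint matrix, so m = 3, n = 2
b = [4; 1; 6];
x = [1; 1];                % a candidate point
feasible = all(A*x <= b)   % logical 1 only if every one of the m components satisfies its inequality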
So this problem, which would have been considered too trivial to say anything about in the 19th century, began to seem important in the 1930s, 1940s, 1950s, and has just become more and more so over the years. It's absolutely at the heart of a lot of optimisation.

So I'm going to run a baby example; let me write it down. I like to be able to draw it, so it will be two-dimensional. Here's our example. Let's take the x-y plane. We're going to take three constraints: x <= y, x >= -y, and y <= 1. So that defines a triangle: y <= 1 would be that constraint, x <= y would be that constraint, and x >= -y would be that constraint. So we're looking in this region in the x-y plane; that's our feasible region. When you draw it on the board it's so trivial, but numerically, in many dimensions, even finding a point in that region is not necessarily an easy problem. And of course there may be no point in that region if the constraints are inconsistent. Now we need to minimise something. The particular vector I'm going to show in my code sort of points in that direction, or its negative points in that direction; it will be the vector f = (1, 2). So the baby problem we're trying to solve is to find that point, because that's the point that minimises the inner product of x with f subject to lying in that triangle.

Well, there's a long history here. I mentioned Kantorovich, and then another key name is Dantzig. George Dantzig was a great genius who worked during the war on secret things, which were certainly related to this, and then was at Berkeley right after the war, and more or less invented the modern field of linear programming right after the war. He also became a key person in the RAND Corporation, which was the original think tank, more or less set up to exploit academics for military purposes. Dantzig invented the so-called simplex algorithm, which in my simplistic history was one of the two biggest events, perhaps, of the many decades of linear programming. The simplex algorithm, pictorially, is an algorithm which recognises that the optimum will always be attained at some vertex, so in principle all we have to do is check all the vertices. Now, there will be combinatorially many vertices, so in the worst case it can be combinatorially slow, but in practice it converges very quickly.
The simplex algorithm checks one vertex and then moves in a downward direction to the next vertex, and so on. That makes it sound easier than it is; it's a brilliant idea of moving from one vertex to another until you find an optimal vertex.

The other legendary thing that happened was in 1984. So let's say Dantzig was in the forties and then Karmarkar was 1984. This was an Indian mathematician from Berkeley and Bell Laboratories who proposed a completely different method. He proposed that you could do better by not hopping from one vertex to another, but by doing something in the interior. So, for a year or two it was called Karmarkar's method, but now everybody speaks of interior point methods.

Now, you know, one of my philosophical interests is the interplay of continuous and discrete things, and finite and infinite things. This is a fascinating example where you have a problem which in principle is finite: in principle, to find the red dot, all you have to do is check the vertices, whose number is finite. The interior point approach is infinite: you're solving a continuous optimisation problem, somehow trying to track along a curve that's getting closer and closer to the red dot. In principle that takes an infinite amount of work, but that infinite amount of work involves a very rapid convergence, so in a practical sense it's finite, and indeed often quicker than the simplex method. This whole approach was born in tremendous controversy. Karmarkar made claims that his method was 50 times faster than the simplex method, which really annoyed people, because he didn't release all the code and it was hard to verify the claims. As things settled down, it turned out that in many contexts these methods really are superior. It would be rare for them to be 50 times superior, but these days both classes of methods are very important: interior point methods and boundary methods.

So let's see: if we're going to solve this particular problem, our constraints need to be formulated in the standard way. For our example, the constraints can be written in the form x - y <= 0, -x - y <= 0, and y <= 1. Now I can write that as a matrix problem: a 3-by-2 matrix times the 2-vector (x, y) should be less than or equal to a 3-by-1 vector. The first constraint gives the row 1, -1 with right-hand side 0, the second constraint gives -1, -1 with 0, and the third constraint gives 0, 1 with 1. So there's my matrix A and my constraint vector b.
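Here is a minimal MATLAB sketch of the kind of call involved for this baby example. It is not the lecture's M23 LP file (the plotting is omitted), just the data above passed to the Optimization Toolbox routine linprog:

f = [1; 2];                 % objective: minimise f'*x
A = [1 -1; -1 -1; 0 1];     % rows encode x - y <= 0, -x - y <= 0, y <= 1
b = [0; 0; 1];
x = linprog(f, A, b);       % linear programming: min f'*x subject to A*x <= b
disp(x')                    % the optimal vertex, which for this data is (0, 0)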
In a moment we're going to run that on the computer and see the very exciting result of the red dot. Let me say a little more about the history. Of course it's not just these two remarkable men; there's been a huge amount of development over many years. Dantzig is reputed to have muttered under his breath that five Nobel Prizes were won by the simplex method. I've only heard that indirectly, I never heard him mutter it, but the impact has been enormous, and I tried to figure out which Nobel Prizes those might be, for the simplex method or for linear programming. Of course Dantzig didn't get one, because mathematicians don't get Nobel Prizes, but scientists who use the products of mathematicians may get Nobel Prizes. So, depending how you analyse things, one of them was an early economics Nobel Prize, in 1970, to Paul Samuelson, which was certainly related to linear programming. Another one was Leontief in 1973; that was input-output analysis, again in economics, which relates to the Soviet Union and all that stuff, trying to optimise planning for an economy or for a production facility. Another one is Kantorovich and Koopmans, in 1975; they were working on optimal resource allocation. Another one is Markowitz, who was doing portfolio optimisation, in 1990. I don't know whether it's fair to say that the simplex algorithm won those Nobel Prizes, but all of them certainly have something to do with linear programming. If there's a fifth that Dantzig was muttering about, I don't know what the fifth is.

Okay, let's run a code here. This is another one that relies on something in the Optimisation Toolbox, so I'll have to switch to the other computer. So I'm now looking at the one called M23 LP. If you look at that code, you see the main thing it does is call linprog, which is a MATLAB linear programming code. So I set up precisely this matrix A and the vector f and the right-hand side b, and then I call linprog and I draw a picture. You recognise that picture from before? Now, I haven't tried it on this computer; let's hope it works. Okay. MATLAB. Okay, so I'll say M23 LP. So there is the region. Very exciting. And when I press return, you see a red dot. So isn't that impressive? That's what graduate study at Oxford will do for you.

I have a couple of variants on the code to give a little bit of the flavour of things at larger scale. So the next one, M23 RAND, increases it, so we have 40 constraints, picked at random. So let's see what happens with M23 RAND. Oh, well. So let's see.
I'll type it again: M23 RAND. So there we just pick 40 constraints at random. Now of course, because it's still two-dimensional, it's easy for you to see where the optimum is, but you can imagine, if it were in many more dimensions, we're beginning to get into a real computational problem. Let's do that one more time. Okay. Let's do it a little bigger. There's another one on the computer called M23 RAND BIG, so here 40 is changed to 400. Oops. Hmm. What's going on there? Again, I didn't check it on this computer. Oops. I'm not sure what just happened; we seem to have a problem, Houston. Let me just take a quick look at that in case it's obvious what the problem is. Okay. So. Oh, I see: I've tried to do 10,000, and I meant to do 400. That's the problem. Oops. Right. Okay, that looks a little better. Let's see if that works. Okay. So there you have 400 constraints, and it finds the dot. Well, that's exciting. Do it again: 400 constraints, and it finds the dot. Now, I've set it up so that obviously there is a feasible region; I've cooked up the constraints to have that property. But of course, if you took 400 constraints at random without some kind of intent like that, the feasible region would be empty. Let's at least try one more, and I'll change 400 to 2000. You can imagine that there's a huge amount known about the complexity of these operations, these algorithms, and it's been a fascinating interplay of theory and practice. Here we have 2000 constraints, and it still finds the red dot. In computer science there has been tremendous interest over the last 40 years in the theoretical complexity of linear programming, and this is a case where theory and practice have dovetailed in a very powerful way. So it's a great example of theory not being just theory, but really affecting what people do. Okay, I think that's all I'll show you for that thing.

Just continuing on the theme of Nobel Prizes, I mentioned last lecture that we have a bunch of interesting people at Oxford in optimisation: there's Nick Gould, there's Coralia Cartis, there's Raphael Hauser and Philippe Toint. The last of these is the visitor from Belgium who wrote the code that you're going to use on the assignment. And I was asking them yesterday what Nobel Prizes have been won by optimisation more generally, not just linear programming. Unfortunately they didn't come up with a good list, but one that does seem pretty clear is
177 00:21:21,740 --> 00:21:32,060 DFT now to a mathematician, DFT usually means discrete Fourier transform, but to a chemist it means density functional theory. 178 00:21:38,180 --> 00:21:41,690 And let me just say a very quick word about this. First, let me ask. 179 00:21:42,020 --> 00:21:45,920 We have physicists and chemists here. Has anyone here used this stuff? 180 00:21:46,280 --> 00:21:49,969 Fantastic. Okay, that's interesting. That's four or five of you. 181 00:21:49,970 --> 00:21:55,790 I'm impressed. So I've never had any connection with this. But as I understand it, the story goes. 182 00:21:55,790 --> 00:21:59,630 Around 1926, quantum mechanics was figured out and. 183 00:22:01,410 --> 00:22:05,100 Paul Dirac famously made the observation, well, that that's chemistry. 184 00:22:05,100 --> 00:22:12,570 We've now solved chemistry. It's all done. All that remains is a computational problem to take Schrodinger's equation and solve it. 185 00:22:13,080 --> 00:22:21,900 The trouble is, if you have any particles, the Schrödinger equation is a partial differential equation in a state space of three and dimensions, 186 00:22:22,140 --> 00:22:27,120 and nobody can solve the partial differential equation in that kind of huge dimensional space. 187 00:22:27,510 --> 00:22:30,239 So right from the beginning, around 1926, 188 00:22:30,240 --> 00:22:37,080 there's been a huge effort in trying to find ways to approximate Schrodinger's equation and retain the key physics. 189 00:22:37,380 --> 00:22:43,950 And then I guess a key step with happened with these people and they're sort of three people. 190 00:22:43,950 --> 00:22:47,700 There's Walter Cohn is the one who got the Nobel Prize. 191 00:22:49,350 --> 00:22:53,160 So that was the chemistry prize in 1998, 1998. 192 00:22:57,190 --> 00:23:02,380 But there are two key papers from this field from the beginning of this field. 193 00:23:02,560 --> 00:23:10,750 One of them is by him with a sham and the other is with Hohenberg. 194 00:23:13,750 --> 00:23:21,280 So what these people discovered back then was that in in a certain precise sense, this vast, 195 00:23:21,550 --> 00:23:28,990 high dimensional problem could be reduced to a high dimensional optimisation problem where they were able to show that 196 00:23:28,990 --> 00:23:38,380 actually the key thing was to study a certain energy that depended on a configuration vector for where your electrons are. 197 00:23:38,590 --> 00:23:46,600 So all you had to do was minimise this energy and then you could find initially ground states of complicated molecules. 198 00:23:47,080 --> 00:23:49,510 So now I'm curious, since we have some people in the room, 199 00:23:49,840 --> 00:23:54,970 I don't have much of a sense of how much that's really done with numerical optimisation methods. 200 00:23:55,210 --> 00:23:59,080 Can any of you tell me, do you use what happens when you solve these things? 201 00:24:04,600 --> 00:24:09,700 I've not had much experience of it. Yeah. Okay. How many, like, how big is n have. 202 00:24:09,720 --> 00:24:15,040 Has anyone in here actually solved the problem like this on the computer? And how many unknowns were there? 203 00:24:16,510 --> 00:24:23,530 Below ten. Below ten? Yeah. And are you in principle finding a minimum in a ten or 13 dimensional space? 204 00:24:23,530 --> 00:24:34,360 Is that the idea? Um, yeah. 
So obviously I don't know much about this, but you cannot overstate how important the impact is. Indeed, my very simple view of the world of difficult PDEs is that there are two of them that are most amazing historically: the Navier-Stokes equations for fluid mechanics, and the Schrodinger equation for chemistry and physics. These are ancient things, you know; what is this, 1846, 1926? These are great examples where we've sort of known the equations for a century or two centuries, and yet solving the equations is a complete challenge that just goes on and on and on, century after century. Okay.

So that's all I'm going to say about linear things, and now I want to turn to the quadratic case. This is not linear, I hasten to add. So 3.8, let's say a word about quadratic programming. And when you're in that subject, in the constrained context, you always find yourself talking about Lagrange multipliers.

So just to remind you once again about the big picture: we have linear problems and we have nonlinear problems. Now, nonlinear problems, of course, could have arbitrary nonlinearities, but locally, at least, a smooth arbitrary nonlinearity will look quadratic. So one way to think of a minimum is that it's the bottom of a parabolic dish. Now, a point with a zero gradient might not be a minimum; it might be a saddle point, so then you have something like a parabola, but going up in some directions and down in other directions. Quadratic programming is the part of optimisation where you actually deal with a quadratic function, but it has this broader relevance because locally every function looks quadratic. When there are no constraints, that's sort of least-squares fitting, which is huge, of course. When there are constraints, then we tend to use this word programming. It's a kind of old-fashioned word, and it always hints at optimisation with constraints. So quadratic programming usually means minimisation of a quadratic function subject to constraints, and it's the constraints that lead you to introduce Lagrange multipliers.

So I'm just going, as usual, to show you a simple example of how a Lagrange multiplier might appear. So suppose we are looking for a vector x to minimise an energy functional, and I'm going to write that as J. So we have an energy functional J(x) and we're trying to minimise it. Now, in density functional theory that would be something very complicated, and perhaps controversial, as people try to figure out how to do the physics right.
But in quadratic programming it's simply quadratic. So let's write it in this form: J(x) = (1/2) x^T A x - f^T x. And then there's also a constraint, which I'm going to make an equality constraint: b^T x = c. So what are these things? x is an unknown vector that we're trying to find. f is a given vector. A is a given matrix, and the key thing here is that it's symmetric positive definite. If it's not positive definite, then you have descent directions, and so the minimum is going to be minus infinity; you need some kind of lower bound. b is a given vector, and c, I guess, we'll do this as a given scalar. So I'm showing you a simple version. With linear programming I imagined we had m inequality constraints; for this one, because I want to show you Lagrange multipliers in their simplest context, I'm imagining that we have one equality constraint. Of course, that generalises to situations with more constraints. So we're about to derive a Lagrange multiplier, which will be a scalar because there's one constraint; more generally, a Lagrange multiplier would be a vector if you have m constraints.

Okay. So the Lagrange idea is amazing. I mean, this was hundreds of years ago, long before we had a field called optimisation, but I guess Lagrange was studying mechanics and various constraints on how things moved, and he saw the right thing to do. The right thing to do is to set up a Lagrangian (I don't know whether he called it L; probably not), but anyway, the right thing to do is to introduce a new variable, which will be a scalar since we have only one constraint, and we make a new function called the Lagrangian, which consists of J, that is to say (1/2) x^T A x - f^T x, plus an additional term, which is (b^T x - c) times a scalar lambda. Now, let's think about what that means. The first part is the thing we're trying to minimise. The quantity b^T x - c is the constraint we're trying to satisfy; the constraint is that it should be zero.

Now, there are lots of ways to think about this, and in a minute I'll take a derivative and we'll see it mathematically. But before doing that, let's sort of draw a picture. We have a constraint here, which is the hyperplane b^T x = c; we're forced to lie on that line, that space, that hyperplane, and we're trying to minimise this quadratic function subject to that constraint.
So we're trying to find not the global minimum of J, but the minimum of J on that constraint. So let's imagine that the level curves of J sort of look like this. We're trying to find this point; it's the point where J is as good as possible on the constraint. Now, the Lagrange multiplier idea is essentially the geometric observation that at this point, the direction you would go to improve the value of J is orthogonal to the constraint. Anything you do along the constraint won't help; the only way to improve the objective function would be to violate the constraint and go in an orthogonal direction. And then Lagrange had this brilliant idea that this constrained point could be thought of as an unconstrained point, if only the wind were blowing strongly enough in that direction. Just the right amount of wind would sort of tilt the objective function so that this point became a global minimum. And lambda is the wind speed, if you like. Okay, enough of that; let's write down the mathematics. The idea is to turn a constrained problem into an unconstrained problem with one more variable.

So what we do is compute partial derivatives. The partial derivative of the Lagrangian with respect to lambda: well, lambda only appears in that one term, so that's b^T x - c. The other partial derivative would be with respect to the x variables, and that's a gradient. So the partial derivative with respect to all the x variables is the gradient of L with respect to x, and that involves all of these terms: it's A x - f + lambda b. Now, Lagrange's idea is to find a point where all of these things are zero. So we know that optimisation involves setting some kind of gradient to zero; that's when you're at a flat point. Thanks to having the wind blowing, that flat point will even satisfy the constraint, and here's why. Suppose we find a stationary point: suppose we find x and lambda such that both of these things are zero, so the gradient is zero and the derivative with respect to the Lagrange multiplier is zero. Then what does that imply? Well, first of all, it implies that the constraint is satisfied, because we just figured out that the partial derivative of the Lagrangian with respect to lambda is that quantity, and if that's zero the constraint is satisfied. But what it also implies is that A x - f = -lambda b.
That's what we get from this condition, which tells us that, how do I want to say this, the gradient of J is equal to -lambda b. So that condition says we satisfy the constraint; this condition says that the gradient is equal to -lambda b. But that's just this picture, because it's telling you that the gradient of the objective function is orthogonal to the constraint. So if you believe the picture, it's got to be optimal: if you move along the constraint, you won't improve the objective function. So I'm saying it very quickly, but this tells us that the solution of this unconstrained optimisation problem gives us a solution of the original constrained problem.

So the mathematics is just linear algebra. And the key idea, which perhaps Lagrange didn't have to have, is to think of it as a system of linear equations of size n+1 by n+1. Here's what it looks like:

[ A    b ] [   x    ]   [ f ]
[ b^T  0 ] [ lambda ] = [ c ]

So this is an (n+1)-by-(n+1) system of equations. It's linear because we've assumed that we started not with an arbitrary nonlinear function, but with a quadratic nonlinear function. And what are these two equations? It's n+1 by n+1, but you could also say it's block two by two; it has the block structure of a 2-by-2 matrix. The first row of that block structure would be A x + b lambda = f, and the second row is b^T x = c. So you see, by solving this (n+1)-by-(n+1) linear system of equations, we satisfy all the Lagrange multiplier conditions and find our global optimum.

So this idea generalises to a huge part of the technology of quadratic programming, and more generally of nonlinear programming. Let me mention that. Of course, if we have m constraints, then we get m Lagrange multipliers, a vector of length m, and we get a system of dimension n+m by n+m. So again, if you have a quadratic objective function and a finite number of linear equality constraints, you can solve the whole problem by solving a system of equations.
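To make that concrete, here is a minimal MATLAB sketch of the bordered system for the one-equality-constraint case; the data A, f, b, c are made-up values, not from the lecture:

A = [3 1; 1 2];            % symmetric positive definite (made-up)
f = [1; 1];
b = [1; 1];                % the single equality constraint is b'*x = c
c = 1;
n = length(f);
KKT = [A b; b' 0];         % the (n+1)-by-(n+1) bordered matrix
sol = KKT \ [f; c];        % solve the linear system
x = sol(1:n);              % minimiser of (1/2)x'*A*x - f'*x on the constraint
lambda = sol(n+1);         % the Lagrange multiplier
disp(b'*x - c)             % constraint residual: should be 0
disp(A*x - f + lambda*b)   % stationarity residual: should be the zero vector

With its sign convention, the Optimization Toolbox routine quadprog(A, -f, [], [], b', c) should return the same x.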
If it's more general, if it's a nonlinear but not necessarily quadratic objective function, well, there are many ways to solve that, but one of the ways is to locally approximate it by a quadratic and then sequentially use successive approximations. And that method is called SQP, which stands for successive, or sequential, quadratic programming. So this 200-year-old idea of Lagrange multipliers has morphed, or evolved, into one of the key ideas in nonlinear programming generally.

I wanted to finish by telling you about a particular problem I've been involved in over the last couple of years, and that gets to this final handout about the Faraday cage. So who's heard of the Faraday cage? Yeah, everyone's heard of it; it's amazing. Let me remind you what a Faraday cage is. Suppose you have a metal shell. Metal is a conductor, and therefore the voltage, the potential, let's call it u, on a metal shell will be constant. And that means that inside a metal shell, if you have an electrostatic potential on the boundary, you just have that same constant potential inside. So the solution to Laplace's equation inside a metal shell is a constant, so the gradient is zero. So inside a metal shell there are no electric fields, because the gradient of the voltage has to be zero. Now, what Faraday noted in 1836 was that the same seemed to work if you had a discrete approximation to a shell. I'm drawing it in two dimensions, but of course he would have been experimenting in three dimensions. If you take a wire mesh, it seems to block electric fields as well as a solid metal shell does. So in 2D we can imagine we have a bunch of wires; these are the cross-sections of the wires, and they're all being held at the same voltage. So the Faraday effect is that if you have a bunch of points in 2D, or lines in 3D, held at the same voltage, then the gradient inside is approximately zero (of course nobody would say it's exactly zero). The famous example is a microwave oven, where you have these microwaves inside that aren't allowed to get out. How is that achieved? Well, there's a lot of metal, but the front door is metal with holes in it, because you want light to get out, and light has a much smaller wavelength.

So I got interested in that a couple of years ago, and this is a clipping from a talk I just gave the other day on the subject. And it turns out that there's no literature on the Faraday cage. Well, there's a little, and what little literature there is is wrong. The conspicuous example of a bit of literature in this area is in Feynman's Lectures on Physics, Volume 2, where you can find a brief one- or two-page discussion. He doesn't call it the Faraday cage, but he talks about shielding; so I've given you a quote there. Feynman does an analysis, and then he says that inside a screen the fields are zero.
355 00:43:33,510 --> 00:43:41,040 And of course, he means exponentially small. So he's saying that inside a screen like this, the fields are nearly the same as there. 356 00:43:41,670 --> 00:43:47,840 So that turns out to be not true. It took us a long time to realise that with confidence. 357 00:43:48,980 --> 00:43:55,160 If you look at Faraday's argument, you discover, rather than setting the voltages to be equal on the wires, 358 00:43:55,400 --> 00:43:58,540 he's actually assuming that each wire has the same charge on it. 359 00:43:58,550 --> 00:44:01,550 And that's not right. It's the voltage that's equal. 360 00:44:01,550 --> 00:44:08,720 The point is that they're connected. So let me explain to you how that's a constrained quadratic programming problem. 361 00:44:10,560 --> 00:44:20,010 So the way we look at that is imagine for simplicity that your cage consists of a bunch of a bunch of circles. 362 00:44:21,450 --> 00:44:27,379 Or little desk, if you like. These are cross-sections of the wires. Out here. 363 00:44:27,380 --> 00:44:31,640 You have some drive in some field that you're trying to shield. 364 00:44:31,910 --> 00:44:37,850 So in two dimensions, that might be something like the log of Z minus some special point. 365 00:44:38,000 --> 00:44:41,360 That's a two dimensional point charge in 3-D. 366 00:44:41,360 --> 00:44:52,310 It would be analogous. The goal of the Faraday cage is to make the voltage nearly constant of here so that the electric field is nearly zero. 367 00:44:54,120 --> 00:44:58,079 So if you think physically about what happens here, these are just passive wires. 368 00:44:58,080 --> 00:45:04,950 They don't have any charge on them in a net sense, but the charge will nevertheless move around so that if this, 369 00:45:04,950 --> 00:45:15,030 for example, is plus, then it will attract some minus charges here and repel some plus charges here. 370 00:45:15,840 --> 00:45:26,070 Always neutral on balance. So the question becomes how will charges distribute themselves around this cage to minimise energy? 371 00:45:28,570 --> 00:45:32,770 And that may explain why it's well, let's just stare at this thing. 372 00:45:37,150 --> 00:45:48,110 The equations are at the bottom of the back. And what you see is that the quadratic energy term that we're trying to minimise to model the Faraday 373 00:45:48,110 --> 00:45:53,290 cage consists of three pieces and two of them are sort of obvious and it's the third that's the surprise. 374 00:45:53,300 --> 00:45:59,840 So let's go from back to front. The third piece is the energy with respect to the external field. 375 00:46:00,140 --> 00:46:07,100 So if you have a certain charge here, then there's an energy associated with its interaction with that. 376 00:46:07,880 --> 00:46:11,270 Obviously there's an inverse square force, an inverse linear energy. 377 00:46:13,220 --> 00:46:19,970 So that's the third term in the equation. The middle term in the equation is the interaction between the charges. 378 00:46:19,970 --> 00:46:24,290 So if there's some charge here and some charge here, they would repel or attract each other. 379 00:46:24,290 --> 00:46:30,010 There's an energy associated with that. And then the first term is the surprising one. 380 00:46:30,610 --> 00:46:34,689 So the way we're modelling this is where imagining that this disk has charged. 381 00:46:34,690 --> 00:46:38,470 Q One on it and this one has charged. Q Two, these are just numbers. 382 00:46:39,100 --> 00:46:43,090 Q So our unknown is a vector of charges. 
And those charges will be distributed in such a way as to minimise the energy. But the third piece to describe, which is the first one as written down, is the energy involved with having a lot of charge on a small wire, because it takes work to put charge on a small wire; it's repelling itself. So the smaller the wire is, the more work it takes, and the more charge there is, the more work it takes, quadratically. So you go through all this and you find you have a quadratic objective function. You notice the first two terms are the quadratic, the squared part: the first term is the diagonal of a matrix, q_k squared, and the second term is the off-diagonal part of the matrix, q_j q_k, and it's symmetric. The third term is the linear part of the problem. So we have exactly the format that I wrote down here: this is the energy we're trying to minimise, two terms go into there, one term goes there, and then there's a constraint. The total amount of charge is zero, so the sum of the q_k should be zero, because there's no net charge on your passive Faraday cage. In other words, the inner product of c with q is zero, where c is a vector of all ones. So it's a perfect example of a one-constraint quadratic programming problem. And of course, once you see that, you can solve it at great speed.

I have a little code that hopefully will do that. Let's see if it works; I'll try it on this computer. Chris, does it matter which computer I use? No. Okay, let's try this one. So suppose I try M24 FARADAY, which is something I wrote up at great speed the other day. So it's an ugly bit of code, but it does exactly this: it solves this (n+1)-by-(n+1) system. Let's give it a radius for each wire. And the way it's set up is that I click, and every time I click it sets up this matrix problem and solves it. So these are wires that I'm just putting into my cage. You see, the cage isn't at all closed yet, so it's not doing much shielding, but as I close it off, the shielding will get better. So every time I click, it's just solving another linear algebra problem, and what could be easier than that? The time is spent in drawing the plot, of course. So there you see the Faraday shielding effect, and you see it's pretty good. But you also see, from the fact that there are a few of those lines penetrating into the cage, that the shielding is nothing like exponential. And whether your microwave oven is as safe as it ought to be, I don't know. A fact about microwave ovens is that it's hard to see into them.
You know, there's a lot of metal in that front door. And obviously the engineers who measured this stuff were not reading their Feynman; thank goodness, they realised you needed a lot of metal to have a good Faraday cage. Okay, so we'll turn to PDEs and ODEs next term. Thanks very much.