1
00:00:11,930 --> 00:00:15,710
We have an assignment which is due on Tuesday, week after next.
2
00:00:16,940 --> 00:00:20,839
It involves downloading a piece of software which, as I say this,
3
00:00:20,840 --> 00:00:25,960
I realise I haven't quite put on the website yet, but I'll do it right after this lecture. It's called BFO.
4
00:00:25,970 --> 00:00:36,049
It stands for Brute Force Optimisation. And it's a nice example of a serious piece of software for optimisation that is not large scale.
5
00:00:36,050 --> 00:00:45,140
It's for maybe 1 to 10 variables. But if you have a problem with 1 to 10 variables, BFO is a very remarkable, convenient piece of work.
6
00:00:45,350 --> 00:00:50,720
It's all in MATLAB, freely available. I have a few miscellaneous things.
7
00:00:50,990 --> 00:00:58,760
One is actually section 3.6 of our outline, which is to mention NEOS and COIN-OR.
8
00:00:58,940 --> 00:01:05,960
Now, last lecture we already took a look at NEOS briefly, which is the Network-Enabled Optimization System.
9
00:01:06,320 --> 00:01:14,310
Let's briefly now look at the other one, COIN-OR, just so you're aware of it.
10
00:01:14,360 --> 00:01:19,760
COIN-OR is a funny name. Anyway, it stands for Computational Infrastructure for Operations Research.
11
00:01:20,000 --> 00:01:23,960
This is another portal to a great deal of information about optimisation.
12
00:01:24,170 --> 00:01:32,600
I'm not going to explore it now, but one reason I wanted to mention it to you is a reminder that operations research or in Britain,
13
00:01:32,600 --> 00:01:38,360
where we would say operational research, has been a key source of optimisation technology.
14
00:01:38,570 --> 00:01:44,060
In the early years, more or less during and after the war, this was where optimisation came from.
15
00:01:44,270 --> 00:01:49,100
Now optimisation is much broader in its impact, but this is the classical root of it.
16
00:01:49,420 --> 00:01:52,970
It's even connected with Soviet history.
17
00:01:53,120 --> 00:02:00,920
The planned economy was one of the early, great examples in which people attempted to use optimisation to achieve good things.
18
00:02:01,130 --> 00:02:09,950
It didn't work very well, but it was a noble effort, and indeed one of the great early figures in optimisation, Kantorovich, was from the Soviet Union.
19
00:02:12,490 --> 00:02:16,830
None of you were even born when there was a Soviet Union. Okay.
20
00:02:19,430 --> 00:02:27,350
The other thing I wanted to mention is this handout of the TOP500 computer list.
21
00:02:27,860 --> 00:02:36,710
I think I've mentioned TOP500 a couple of times. They update their TOP500 list twice a year, once, as always, in November.
22
00:02:37,370 --> 00:02:40,490
So this is the update from last week.
23
00:02:40,790 --> 00:02:48,169
And you can see that the fastest machines on earth are rated in the tens of petaflops.
24
00:02:48,170 --> 00:02:49,370
They're incredibly fast.
25
00:02:49,610 --> 00:02:56,420
The top ten are listed here and you can see they're from China, the US, Japan, the US, US, Switzerland, Germany, Saudi Arabia.
26
00:02:56,900 --> 00:03:03,410
They're a remarkable collection of machines that are very remote from what we actually do day to day, most of us.
27
00:03:03,710 --> 00:03:09,200
But if you take the trouble to use them with all their cores effectively, they can be incredible.
28
00:03:11,330 --> 00:03:17,690
One thing to remind you of is that all of these ratings spring from the numerical linear algebra community.
29
00:03:17,930 --> 00:03:28,700
The TOP500 numbers are computed by running benchmarks, which effectively spring from the old LINPACK software subroutine libraries from the 1970s.
30
00:03:29,030 --> 00:03:34,400
So it's all about numerical linear algebra used to rate computers as well as to solve problems.
31
00:03:37,220 --> 00:03:42,770
There's a fair amount of scepticism about this list, and yet nobody can bring themselves to stop compiling it.
32
00:03:47,700 --> 00:03:53,490
Okay. Well, today is the last lecture, and the theme of the lecture is optimisation with constraints.
33
00:03:54,150 --> 00:04:02,010
As usual, I won't be able to go into things in depth, but I want to mention some of the features that turn up here.
34
00:04:04,620 --> 00:04:12,390
So 3.7 in our outline is constraints and linear programming.
35
00:04:27,990 --> 00:04:33,120
Now in general an optimisation problem has something you're trying to minimise and then various constraints.
36
00:04:33,450 --> 00:04:38,850
The thing you're minimising could be linear or nonlinear, and the constraints could be linear or nonlinear.
37
00:04:40,260 --> 00:04:46,770
That description applies to continuous optimisation, and a further complication is that you might have discrete variables on top of constraints,
38
00:04:46,770 --> 00:04:48,630
which I'm not talking about now.
39
00:04:49,470 --> 00:04:56,130
Linear programming is the case where everything's linear, and in practice the challenge is really to handle the constraints efficiently.
40
00:04:56,490 --> 00:05:06,570
The kind of picture we have in mind is that we're in some high dimensional space and we have to lie to a certain side of various hyperplanes.
41
00:05:06,840 --> 00:05:14,280
So maybe we have to lie to that side of that hyperplane and to this side of this hyperplane and to this side of this hyperplane,
42
00:05:14,700 --> 00:05:19,170
we're constrained to a feasible region of legal positions.
43
00:05:19,650 --> 00:05:25,860
And within that region, we want to find an optimum, which is defined by minimising a linear function.
44
00:05:26,100 --> 00:05:29,220
It sounds trivial and of course it is trivial on the whiteboard,
45
00:05:29,460 --> 00:05:34,530
but once you get into millions of constraints and millions of variables, it's very far from trivial.
46
00:05:34,950 --> 00:05:50,280
So let's write down the notation. We seek to find a vector x in R^n such that:
47
00:05:52,820 --> 00:05:57,860
We minimise f transpose x.
48
00:05:59,810 --> 00:06:07,070
Subject to the constraints that A times x is less than or equal to b.
49
00:06:07,070 --> 00:06:19,490
So I have to define what all of those things are. x, as I said, is an unknown n-vector, and f is a given n-vector also.
50
00:06:22,100 --> 00:06:28,130
And then A is a given m-by-n matrix.
51
00:06:31,840 --> 00:06:36,250
And b is a given m-vector.
52
00:06:39,290 --> 00:06:45,640
So the picture here is that I'm talking about a system with n unknowns and m constraints.
53
00:06:45,650 --> 00:07:05,980
Let's write that down. And the matrix A defines the constraints; they are linear constraints defined by this condition that Ax is less than or equal to
54
00:07:06,010 --> 00:07:09,040
b. Now what does that mean? Ax is a vector and b is a vector.
55
00:07:09,250 --> 00:07:11,920
When I say less than or equal, I mean componentwise.
56
00:07:11,920 --> 00:07:19,780
So each component of Ax, and there are m components, has to be less than or equal to the corresponding component of b.
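Collecting the definitions just stated in one display:

$$\min_{x \in \mathbb{R}^n} f^T x \quad \text{subject to} \quad Ax \le b,$$

where $f \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ are given, $A \in \mathbb{R}^{m \times n}$ is given, and the inequality is understood componentwise.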
57
00:07:24,570 --> 00:07:30,750
So this problem, which would have been considered too trivial to say anything about in the 19th century,
58
00:07:30,990 --> 00:07:37,709
began to seem important in the 1930s, 1940s, 1950s, and has just become more and more so over the years.
59
00:07:37,710 --> 00:07:41,160
It's absolutely at the heart of a lot of optimisation.
60
00:07:47,400 --> 00:07:51,570
So I'm going to run a baby example. So let me write down that example.
61
00:07:53,670 --> 00:07:57,750
I like to be able to draw it so it will be two dimensional. So here's our example.
62
00:08:01,330 --> 00:08:05,830
Let's take the x-y plane.
63
00:08:05,860 --> 00:08:06,760
So here's our plane.
64
00:08:06,770 --> 00:08:18,610
We're going to take three constraints: x less than or equal to y, x greater than or equal to minus y, and y less than or equal to 1.
65
00:08:19,690 --> 00:08:24,970
So that defines a triangle. y
66
00:08:25,080 --> 00:08:33,150
less than or equal to 1 would be that constraint, and x less than or equal to y would be that constraint.
67
00:08:35,060 --> 00:08:40,430
And x greater than or equal to minus y would be that constraint.
68
00:08:40,850 --> 00:08:45,030
So we're looking in this region in the x-y plane.
69
00:08:45,770 --> 00:08:57,630
That's our feasible region. When you draw on the board, it's so trivial, but numerically, for many dimensions,
70
00:08:57,810 --> 00:09:02,100
even finding a point in that region is not necessarily an easy problem.
71
00:09:02,250 --> 00:09:06,240
And of course, there may be no point in that region if the constraints are inconsistent.
72
00:09:06,780 --> 00:09:08,370
Now we need to minimise something.
73
00:09:08,370 --> 00:09:15,150
So the particular vector I'm going to show in my code sort of points in that direction, or its negative points in that direction.
74
00:09:15,360 --> 00:09:18,980
It will be the vector f = (1, 2).
75
00:09:20,670 --> 00:09:28,230
So the baby problem we're trying to solve is to find that point,
76
00:09:28,830 --> 00:09:35,940
because that's the point that minimises the inner product of x with f subject to lying in that triangle.
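Since the example is tiny, we can check the answer by hand. The triangle's vertices, and the values of the objective there, are

$$(0,0),\ (1,1),\ (-1,1); \qquad f^T x = x + 2y = 0,\ 3,\ 1,$$

so the minimiser, the red dot we are about to compute, is the vertex at the origin.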
77
00:09:40,700 --> 00:09:49,810
Well, there's a long history here. I mentioned Kantorovich. And then another key name is Dantzig.
78
00:09:50,410 --> 00:09:58,149
So George Dantzig was a great genius who worked during the war on secret things, which were certainly related to this,
79
00:09:58,150 --> 00:10:04,840
and then was at Berkeley right after the war, and more or less invented the modern field of linear programming.
80
00:10:05,080 --> 00:10:09,850
He also became a key person in the RAND Corporation, which was the original think tank,
81
00:10:10,150 --> 00:10:13,960
more or less set up to exploit academics for military purposes.
82
00:10:15,250 --> 00:10:18,760
Dantzig invented the so-called simplex algorithm.
83
00:10:22,410 --> 00:10:30,960
Which in my simplistic history was one of the two biggest events, perhaps of the many decades of linear programming.
84
00:10:31,110 --> 00:10:38,700
The simplex algorithm pictorially is an algorithm which recognises that the optimum will always be attained at some vertex.
85
00:10:39,000 --> 00:10:42,360
So in principle all we have to do is check all the vertices.
86
00:10:42,630 --> 00:10:45,300
Now there will be combinatorially many vertices.
87
00:10:45,540 --> 00:10:52,380
So actually in the worst case it can be combinatorially slow, but in practice it converges very quickly.
88
00:10:52,530 --> 00:10:58,410
The simplex algorithm checks one vertex and then moves in a downward direction to the next vertex and then so on.
89
00:10:58,710 --> 00:11:05,940
That makes it sound easier than it is. It's a brilliant idea of moving from one vertex to another until you find an optimal vertex.
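To make the "check all the vertices" idea concrete, here is a brute-force sketch in MATLAB for the baby triangle problem from earlier. This is for illustration only; the real simplex method moves cleverly from vertex to vertex rather than enumerating them all.

```matlab
% Brute-force vertex check for min f'*x subject to A*x <= b in 2-D.
% Illustration only; the simplex method is far cleverer than this.
A = [1 -1; -1 -1; 0 1];  b = [0; 0; 1];  f = [1; 2];
bestval = inf;  xbest = [];
m = size(A,1);
for i = 1:m-1
    for j = i+1:m
        Aij = A([i j],:);
        if abs(det(Aij)) > 1e-12          % the two constraint lines intersect
            v = Aij \ b([i j]);           % candidate vertex
            if all(A*v <= b + 1e-12) && f'*v < bestval
                bestval = f'*v;  xbest = v;   % new best feasible vertex
            end
        end
    end
end
xbest                                     % the optimal vertex, here (0,0)
```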
90
00:11:08,310 --> 00:11:13,770
The other legendary thing that happened was in 1984.
91
00:11:14,970 --> 00:11:20,520
So let's say Dantzig was in the forties and then Karmarkar was 1984.
92
00:11:20,850 --> 00:11:27,990
This was an Indian mathematician from Berkeley and Bell Laboratories who proposed a completely different method.
93
00:11:28,560 --> 00:11:38,520
He proposed that you could do better by not hopping from one vertex to another, but by doing something in the interior.
94
00:11:40,220 --> 00:11:45,140
So Karmarkar's... well, for a year or two, it was called Karmarkar's method.
95
00:11:45,470 --> 00:11:48,620
But now everybody speaks of interior point methods.
96
00:11:55,460 --> 00:12:02,030
Now, you know, one of my philosophical interests is the interplay of continuous and discrete things and finite and infinite things.
97
00:12:02,330 --> 00:12:06,920
This is a fascinating example where you have a problem, which in principle is finite.
98
00:12:06,980 --> 00:12:12,530
So in principle, to find the red dot, all you have to do is check the vertices whose number is finite.
99
00:12:14,000 --> 00:12:18,740
The interior point approach is infinite. You're solving a continuous optimisation problem.
100
00:12:18,980 --> 00:12:23,390
You're somehow trying to track along a curve that's getting closer and closer to the red dot.
101
00:12:23,630 --> 00:12:30,690
In principle, that takes an infinite amount of work, but that infinite amount of work involves a very rapid convergence.
102
00:12:30,710 --> 00:12:36,950
So in a practical sense, it's finite and indeed often quicker than the simplex method.
103
00:12:37,460 --> 00:12:40,850
This whole approach was born in tremendous controversy.
104
00:12:41,180 --> 00:12:45,680
Karmarkar made claims that his method was 50 times faster than the simplex method,
105
00:12:45,920 --> 00:12:51,710
which really annoyed people because he didn't release all the code and it was hard to verify the claims.
106
00:12:52,820 --> 00:12:57,110
As things settled down, it turned out that in many contexts these methods really are superior.
107
00:12:57,500 --> 00:13:06,650
It would be rare for them to be 50 times superior. But these days, both classes of methods are very important: interior point and boundary methods.
108
00:13:08,740 --> 00:13:17,980
So let's see: we're going to solve this particular problem. Our constraints need to be formulated in the standard way.
109
00:13:19,180 --> 00:13:22,210
So for our example, the constraints can be written in the form.
110
00:13:22,570 --> 00:13:32,050
x minus y is less than or equal to 0, minus x minus y is less than or equal to 0, and y is less than or equal to 1.
111
00:13:34,090 --> 00:13:41,470
Now I can write that as a matrix problem. It's a three-by-two matrix times a two-vector (x, y).
112
00:13:43,290 --> 00:13:51,900
should be less than or equal to a three-by-one vector. The first constraint would be the row (1, -1) with right-hand side 0.
113
00:13:53,180 --> 00:14:01,880
The second constraint would be (-1, -1) with 0, and the third constraint would be (0, 1) with 1.
114
00:14:03,240 --> 00:14:07,020
So there's my matrix A and my constraint vector b.
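Written out, that is

$$A = \begin{pmatrix} 1 & -1 \\ -1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$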
115
00:14:11,090 --> 00:14:16,820
So in a moment we're going to run that on the computer and see the very exciting result of the red dot.
116
00:14:17,780 --> 00:14:22,390
Let me say a little more about the history. Of course, it's not just these two remarkable men.
117
00:14:22,400 --> 00:14:25,820
There's been a huge amount of development over many years.
118
00:14:27,680 --> 00:14:34,610
Dantzig is reputed to have muttered under his breath that five Nobel Prizes were won by the simplex method.
119
00:14:34,850 --> 00:14:37,490
I've only heard that indirectly. I never heard him mutter it,
120
00:14:37,730 --> 00:14:47,420
but the impact has been enormous and I tried to figure out which Nobel prizes those might be for the simplex method or for linear programming.
121
00:14:48,920 --> 00:14:52,820
Of course, Dantzig didn't get one because mathematicians don't get Nobel Prizes,
122
00:14:53,120 --> 00:14:57,980
but scientists who use the products of mathematicians may get Nobel Prizes.
123
00:14:58,190 --> 00:15:07,670
So depending how you analyse things, one of them was an early economics Nobel Prize in 1970 by Paul Samuelson,
124
00:15:08,360 --> 00:15:13,760
which was certainly related to linear programming. Another one was Leontief.
125
00:15:16,040 --> 00:15:25,419
In 1973. That was input-output analysis again in economics, which relates to the Soviet Union and all that stuff,
126
00:15:25,420 --> 00:15:29,860
trying to optimise planning for an economy or for a production facility.
127
00:15:31,570 --> 00:15:48,740
Another one is Kantorovich and Koopmans. And that was 1975, and they were working on optimal resource allocation.
128
00:15:48,950 --> 00:15:57,970
Another one is Markowitz. Who was doing portfolio optimisation in 1990.
129
00:15:58,240 --> 00:16:02,830
I don't know whether it's fair to say that the simplex algorithm won those Nobel Prizes,
130
00:16:02,830 --> 00:16:05,950
but all of them certainly have something to do with linear programming.
131
00:16:06,250 --> 00:16:10,120
If there's a fifth that Dantzig was muttering about, I don't know what the fifth is.
132
00:16:13,040 --> 00:16:23,300
Okay. Let's run a code here. This is another one that relies on something in the optimisation toolbox, so I'll have to switch to the other computer.
133
00:16:29,770 --> 00:16:33,040
So I'm now looking at the one called M23 LP.
134
00:16:33,790 --> 00:16:42,220
If you look at that code, you see the main thing it does is call linprog, which is a MATLAB linear programming code.
135
00:16:42,820 --> 00:16:53,320
So I set up precisely this matrix A and the vector f and the right-hand side b, and then I call linprog and I draw a picture.
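Here is a minimal sketch of the kind of thing such a code does (my reconstruction, not the actual M23 LP file):

```matlab
% Sketch of the demo (not the actual M23 LP file): minimise f'*x
% subject to A*x <= b, using linprog from the Optimization Toolbox.
A = [ 1 -1;                  % x - y  <= 0
     -1 -1;                  % -x - y <= 0
      0  1];                 % y      <= 1
b = [0; 0; 1];
f = [1; 2];                  % objective: minimise x + 2y
x = linprog(f, A, b);        % the optimal vertex of the triangle
plot(x(1), x(2), 'r.', 'MarkerSize', 24)   % the exciting red dot
```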
136
00:16:54,670 --> 00:17:02,979
You recognise that picture from before? Now.
137
00:17:02,980 --> 00:17:11,750
I haven't tried it on this computer. Let's hope it works. Okay.
138
00:17:14,660 --> 00:17:21,980
MATLAB. Okay. So I'll say M23 LP.
139
00:17:24,550 --> 00:17:27,580
So there is the region. Very exciting.
140
00:17:28,030 --> 00:17:31,540
And when I press return, you see a red dot. So isn't that impressive?
141
00:17:32,350 --> 00:17:34,720
That's what graduate study at Oxford will do for you.
142
00:17:35,800 --> 00:17:42,910
I have a couple of variants on the code to give a little bit of the flavour of things with larger scale.
143
00:17:43,150 --> 00:17:46,629
So the next one, M23 rand, increases it.
144
00:17:46,630 --> 00:17:53,710
So we have 40 constraints, picked at random. So let's see what happens with M23 rand.
145
00:17:57,610 --> 00:18:04,420
Oh, well. So let's see. I'll type M23 rand.
146
00:18:05,860 --> 00:18:13,930
So there we just pick 40 constraints at random. So now, of course, because it's still two dimensional, it's easy for you to see where the optimum is.
147
00:18:14,140 --> 00:18:19,690
But you can imagine if it were in many more dimensions. We're beginning to get into a real computational problem.
148
00:18:20,050 --> 00:18:26,770
Let's do that one more time. Okay.
149
00:18:27,160 --> 00:18:30,459
Let's do it a little bigger. There's another one on the computer called M23
150
00:18:30,460 --> 00:18:36,900
randbig. So here, 40 is changed to 400.
151
00:18:38,130 --> 00:18:44,230
Oops. Hmm.
152
00:18:44,740 --> 00:18:48,330
What's going on there? Again.
153
00:18:48,330 --> 00:18:51,840
I didn't check it on this computer. Oops.
154
00:18:52,320 --> 00:18:57,990
I'm not sure what just happened. We seem to have a problem.
155
00:18:59,660 --> 00:19:04,520
Houston. Let me just take a quick look at that in case it's obvious what the problem is.
156
00:19:08,600 --> 00:19:12,950
Okay. So. Oh, I see.
157
00:19:12,950 --> 00:19:18,950
I've tried to do 10,000. I meant to do 400. That's the problem.
158
00:19:21,640 --> 00:19:25,260
Oops. Right. Okay. That looks a little better.
159
00:19:26,530 --> 00:19:32,010
Let's see if that works. Okay.
160
00:19:32,160 --> 00:19:36,080
So there you have 400 constraints. And it finds the dot.
161
00:19:36,090 --> 00:19:40,260
Well, that's exciting. Do it again. 400 constraints and it finds the dot.
162
00:19:40,500 --> 00:19:47,280
Now I've set it up so that obviously there is a feasible region and I've cooked up the constraints to have that property.
163
00:19:47,280 --> 00:19:53,880
But of course, if you took 400 constraints at random without some kind of intent like that, there would be an empty feasible region.
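One simple way to cook up random constraints with a guaranteed nonempty feasible region (a guess at the flavour of the demo, not the actual M23 rand code) is to make every half-plane contain a fixed point, say the origin:

```matlab
% Random constraints with a guaranteed nonempty feasible region
% (a guess at the flavour of the demo, not the actual M23 rand code).
m = 40;                        % number of constraints
theta = 2*pi*rand(m,1);        % random directions
A = [cos(theta) sin(theta)];   % rows are random unit normals
b = 0.5 + rand(m,1);           % positive offsets, so the origin satisfies
                               % A*0 <= b and the region is nonempty
f = [1; 2];
x = linprog(f, A, b);          % still finds the optimal vertex
```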
164
00:19:54,180 --> 00:19:59,100
Let's at least try it one more time, and I'll change 400 to 2000.
165
00:20:04,830 --> 00:20:10,540
You can imagine that there's a huge amount known about the complexity of these operations, these algorithms.
166
00:20:11,040 --> 00:20:14,800
And it's been a fascinating interplay of theory and practice.
167
00:20:14,910 --> 00:20:23,190
Ah, we have 2000 constraints. It still finds the red dot. In computer science,
168
00:20:23,220 --> 00:20:29,520
there has been tremendous interest over the last 40 years in the theoretical complexity of linear programming.
169
00:20:29,790 --> 00:20:33,720
And this is a case where theory and practice have dovetailed in a very powerful way.
170
00:20:33,750 --> 00:20:39,450
So it's a great example of theory not being just theory, of really affecting what people do.
171
00:20:41,300 --> 00:20:43,970
Okay. I think that's all I'll show you for that thing.
172
00:20:48,170 --> 00:20:57,590
Just continuing on the theme of Nobel Prizes, I mentioned last lecture that we have a bunch of interesting people at Oxford in optimisation.
173
00:20:57,590 --> 00:21:01,870
There's Nick Gould, there's Coralia Cartis, there's Raphael Hauser, and there's Philippe Toint.
174
00:21:02,420 --> 00:21:06,770
The last of these is the visitor from Belgium who wrote the code that you're going to use on the assignment.
175
00:21:07,400 --> 00:21:13,760
And I was asking them yesterday what Nobel Prizes have been won by optimisation more generally, not just linear programming.
176
00:21:13,940 --> 00:21:19,130
And unfortunately they didn't come up with a good list, but one that does seem pretty clear is.
177
00:21:21,740 --> 00:21:32,060
DFT. Now, to a mathematician, DFT usually means discrete Fourier transform, but to a chemist it means density functional theory.
178
00:21:38,180 --> 00:21:41,690
And let me just say a very quick word about this. First, let me ask.
179
00:21:42,020 --> 00:21:45,920
We have physicists and chemists here. Has anyone here used this stuff?
180
00:21:46,280 --> 00:21:49,969
Fantastic. Okay, that's interesting. That's four or five of you.
181
00:21:49,970 --> 00:21:55,790
I'm impressed. So I've never had any connection with this. But as I understand it, the story goes.
182
00:21:55,790 --> 00:21:59,630
Around 1926, quantum mechanics was figured out and.
183
00:22:01,410 --> 00:22:05,100
Paul Dirac famously made the observation: well, that's chemistry.
184
00:22:05,100 --> 00:22:12,570
We've now solved chemistry. It's all done. All that remains is a computational problem to take Schrödinger's equation and solve it.
185
00:22:13,080 --> 00:22:21,900
The trouble is, if you have n particles, the Schrödinger equation is a partial differential equation in a state space of 3n dimensions,
186
00:22:22,140 --> 00:22:27,120
and nobody can solve the partial differential equation in that kind of huge dimensional space.
187
00:22:27,510 --> 00:22:30,239
So right from the beginning, around 1926,
188
00:22:30,240 --> 00:22:37,080
there's been a huge effort in trying to find ways to approximate Schrödinger's equation and retain the key physics.
189
00:22:37,380 --> 00:22:43,950
And then I guess a key step happened with these people, and there are sort of three people.
190
00:22:43,950 --> 00:22:47,700
There's Walter Kohn; he's the one who got the Nobel Prize.
191
00:22:49,350 --> 00:22:53,160
So that was the chemistry prize in 1998.
192
00:22:57,190 --> 00:23:02,380
But there are two key papers from the beginning of this field.
193
00:23:02,560 --> 00:23:10,750
One of them is by him with Sham, and the other is with Hohenberg.
194
00:23:13,750 --> 00:23:21,280
So what these people discovered back then was that in a certain precise sense, this vast,
195
00:23:21,550 --> 00:23:28,990
high dimensional problem could be reduced to a high dimensional optimisation problem where they were able to show that
196
00:23:28,990 --> 00:23:38,380
actually the key thing was to study a certain energy that depended on a configuration vector for where your electrons are.
197
00:23:38,590 --> 00:23:46,600
So all you had to do was minimise this energy and then you could find initially ground states of complicated molecules.
198
00:23:47,080 --> 00:23:49,510
So now I'm curious, since we have some people in the room,
199
00:23:49,840 --> 00:23:54,970
I don't have much of a sense of how much that's really done with numerical optimisation methods.
200
00:23:55,210 --> 00:23:59,080
Can any of you tell me what happens when you solve these things?
201
00:24:04,600 --> 00:24:09,700
I've not had much experience of it. Yeah. Okay. How many, like, how big is n?
202
00:24:09,720 --> 00:24:15,040
Has anyone in here actually solved a problem like this on the computer? And how many unknowns were there?
203
00:24:16,510 --> 00:24:23,530
Below ten. Below ten? Yeah. And are you in principle finding a minimum in a ten or 13 dimensional space?
204
00:24:23,530 --> 00:24:34,360
Is that the idea? Um, yeah. So obviously I don't know much about this, but the impact, well, you cannot overstate how important this is. Indeed,
205
00:24:34,900 --> 00:24:43,180
my very simple view of the world of difficult PDEs is that there are two of them that are most amazing historically.
206
00:24:43,600 --> 00:24:50,020
The Navier-Stokes equations for fluid mechanics and the Schrödinger equation for chemistry and physics.
207
00:24:50,590 --> 00:24:55,600
These are ancient things, you know. What is this? 1846, 1926?
208
00:24:56,200 --> 00:25:02,229
These are great examples where we've sort of known the equations for a century or two centuries,
209
00:25:02,230 --> 00:25:07,180
and yet solving the equations is a complete challenge that just goes on and on and on.
210
00:25:07,540 --> 00:25:11,600
Century after century. Okay.
211
00:25:11,780 --> 00:25:15,469
So that's all I'm going to say about linear things.
212
00:25:15,470 --> 00:25:24,870
And now I want to turn to the quadratic case. This is not linear, I hasten to add.
213
00:25:25,410 --> 00:25:30,260
So 3.8, let's say a word about quadratic programming.
214
00:25:37,750 --> 00:25:45,070
And when you're in that subject in the constrained context, you always find yourself talking about Lagrange multipliers.
215
00:25:55,070 --> 00:25:59,660
So just to remind you once again about the big picture, we have linear problems.
216
00:25:59,660 --> 00:26:05,660
We have nonlinear problems. Now, nonlinear problems, of course, could have arbitrary nonlinearities,
217
00:26:05,660 --> 00:26:11,120
but locally, at least, a smooth arbitrary nonlinearity will look quadratic.
218
00:26:11,420 --> 00:26:15,710
So one way to think of a minimum is it's the bottom of a parabolic dish.
219
00:26:16,490 --> 00:26:20,330
Now a point with a zero gradient might not be a minimum, it might be a saddle point.
220
00:26:20,330 --> 00:26:25,700
So then you have a sort of like a parabola, but going up in some directions, down in other directions.
221
00:26:26,510 --> 00:26:33,290
Quadratic programming is the part of optimisation where you actually deal with a quadratic function,
222
00:26:33,560 --> 00:26:38,540
but it has this broader relevance because locally every function looks quadratic.
223
00:26:39,710 --> 00:26:45,470
When there are constraints, well, when there are no constraints, that's sort of least squares fitting, which is huge.
224
00:26:45,470 --> 00:26:49,430
Of course, when there are constraints, then we tend to use this word programming.
225
00:26:49,670 --> 00:26:54,410
It's a kind of old fashioned word, and it always hints at optimisation with constraints.
226
00:26:55,190 --> 00:27:01,040
So quadratic programming usually means minimisation of a quadratic function subject to constraints,
227
00:27:01,370 --> 00:27:05,000
and it's the constraints that lead you to introduce Lagrange multipliers.
228
00:27:06,200 --> 00:27:12,290
So I'm just going, as usual, to show you a simple example of how a Lagrange multiplier might appear.
229
00:27:14,540 --> 00:27:19,890
So suppose we are looking for a vector x
230
00:27:21,950 --> 00:27:28,190
to minimise an energy functional. And I'm going to write that as J.
231
00:27:30,810 --> 00:27:34,260
So we have an energy functional J of X and we're trying to minimise it.
232
00:27:34,620 --> 00:27:36,299
Now in density functional theory,
233
00:27:36,300 --> 00:27:42,330
that would be something very complicated and perhaps controversial as people try to figure out how to do the physics right.
234
00:27:43,080 --> 00:27:47,190
But in quadratic programming, it's simply quadratic.
235
00:27:47,700 --> 00:27:59,170
So let's write it in this form. It's one half x transpose A x minus f transpose x.
236
00:28:00,150 --> 00:28:09,060
And then there's also a constraint, which will be that b transpose x, and I'm going to make it an equality constraint, is equal to c.
237
00:28:11,870 --> 00:28:16,760
So what are these things? X is an unknown vector that we're trying to find.
238
00:28:22,240 --> 00:28:34,520
F is a given vector. A is a given matrix.
239
00:28:34,520 --> 00:28:40,670
And the key thing here is that it's symmetric positive definite.
240
00:28:44,300 --> 00:28:47,390
If it's not positive definite, then you have descent directions.
241
00:28:47,570 --> 00:28:52,580
And so the minimum is going to be minus infinity. So you need some kind of lower bound.
242
00:28:53,810 --> 00:28:58,340
Positive definite. And b is a given vector.
243
00:29:01,640 --> 00:29:06,590
And c, let's do this as a given scalar.
244
00:29:07,280 --> 00:29:10,370
So I'm showing you a simple version. With linear programming
245
00:29:10,370 --> 00:29:18,800
I imagined we had m inequality constraints; for this one, because I want to show you Lagrange multipliers in their simplest context,
246
00:29:19,040 --> 00:29:25,040
I'm imagining that we have one equality constraint.
247
00:29:28,620 --> 00:29:32,430
Of course, that generalises to situations with more constraints.
248
00:29:33,180 --> 00:29:37,920
So we're about to derive a Lagrange multiplier, which will be a scalar because there's one constraint.
249
00:29:38,340 --> 00:29:42,540
More generally, a Lagrange multiplier would be a vector if you have m constraints.
250
00:30:13,890 --> 00:30:29,170
Okay. So the Lagrange idea is amazing.
251
00:30:29,170 --> 00:30:33,520
I mean, this was hundreds of years ago, long before we had a field called optimisation.
252
00:30:33,520 --> 00:30:38,290
But I guess La Grange was studying mechanics and various constraints on how things moved.
253
00:30:38,560 --> 00:30:48,760
And he saw the right thing to do. So the right thing to do is to set up a Lagrangian. I don't know whether he called it L.
254
00:30:48,790 --> 00:30:56,259
Probably not. But anyway, the right thing to do is to introduce a new variable, which will be a scalar.
255
00:30:56,260 --> 00:31:05,169
Since we have only one constraint, we make a new function called the Lagrangian, which consists of J, that is to say one half x
256
00:31:05,170 --> 00:31:20,500
transpose A x minus f transpose x, plus an additional term, which is b transpose x minus c, times a scalar.
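In symbols, with a scalar multiplier $\lambda$:

$$L(x, \lambda) = \tfrac{1}{2} x^T A x - f^T x + \lambda \, (b^T x - c).$$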
257
00:31:22,000 --> 00:31:30,130
Now, let's think about what that means. This is the thing we're trying to minimise.
258
00:31:39,050 --> 00:31:43,220
This is the constraint we're trying to satisfy.
259
00:31:43,520 --> 00:31:47,330
So the constraint is that this should be zero.
260
00:31:57,570 --> 00:32:02,820
Now. There are lots of ways to think about this. And in a minute I'll take a derivative and we'll see it mathematically.
261
00:32:03,000 --> 00:32:05,190
But before doing that, let's sort of draw a picture.
262
00:32:07,390 --> 00:32:17,080
We have a constraint here, which is the hyperplane b transpose x equals c; we're forced to lie on that line,
263
00:32:17,080 --> 00:32:25,149
that space, that hyperplane, and we're trying to minimise this quadratic function subject to that constraint.
264
00:32:25,150 --> 00:32:31,150
So we're trying to find not the global minimum of J, but the minimum of J on that constraint.
265
00:32:31,420 --> 00:32:35,980
So let's imagine that the level curves of J
266
00:32:38,280 --> 00:32:46,860
sort of look like this. So we're trying to find this point.
267
00:32:47,850 --> 00:32:51,390
It's the point where J is as good as possible on the constraint.
268
00:32:51,660 --> 00:32:59,460
Now, the Lagrange multiplier idea is essentially the geometric observation that at this point.
269
00:33:02,020 --> 00:33:08,470
The direction you would go to improve the value of J is orthogonal to the constraint.
270
00:33:09,130 --> 00:33:11,860
Anything you do along the constraint won't help.
271
00:33:13,060 --> 00:33:20,740
The only way to improve the objective function would be to violate the constraint and go in an orthogonal direction.
272
00:33:21,010 --> 00:33:24,490
And then Lagrange had this brilliant idea:
273
00:33:26,380 --> 00:33:30,100
This constrained point could be thought of as an unconstrained point.
274
00:33:30,100 --> 00:33:33,880
If only the wind were blowing strongly enough in that direction.
275
00:33:34,210 --> 00:33:40,450
Just the right amount of wind would sort of tilt the objective function so that this point became a global, unconstrained minimum.
276
00:33:41,140 --> 00:33:45,250
And this is the wind speed, if you like. Okay, enough of that.
277
00:33:45,280 --> 00:33:54,520
Let's write down the mathematics. But the idea is to turn a constrained problem into an unconstrained problem with one more variable.
278
00:33:54,880 --> 00:33:58,120
So what we do is compute partial derivatives.
279
00:33:58,750 --> 00:34:03,550
So the partial derivative of the Lagrangian with respect to the multiplier.
280
00:34:04,420 --> 00:34:12,250
Well, the multiplier only appears here, so that's b transpose x minus c.
281
00:34:14,270 --> 00:34:19,069
The other partial derivative would be with respect to the x variables, and that's a gradient.
282
00:34:19,070 --> 00:34:31,340
So the partial derivative with respect to all the x variables is the gradient of L with respect to x, and that involves all of these terms.
283
00:34:31,340 --> 00:34:34,340
That's Ax minus f
284
00:34:35,420 --> 00:34:41,870
plus lambda b. Now,
285
00:34:41,870 --> 00:34:45,800
Lagrange's idea is to find a point where all of these things are zero.
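Collected in one place, the two stationarity conditions just computed are

$$\frac{\partial L}{\partial \lambda} = b^T x - c = 0, \qquad \nabla_x L = Ax - f + \lambda b = 0.$$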
286
00:34:47,460 --> 00:34:51,930
So we know that optimisation involves setting some kind of gradient to zero.
287
00:34:52,200 --> 00:34:56,970
That's when you're at a flat point thanks to having the wind blowing.
288
00:34:57,780 --> 00:35:01,379
That flat point will even satisfy the constraint.
289
00:35:01,380 --> 00:35:05,400
And here's why. Well, let's see.
290
00:35:06,540 --> 00:35:16,650
Suppose we find a stationary point. So suppose we find x and lambda such that both of these things are zero,
291
00:35:16,890 --> 00:35:27,240
so the gradient is zero and the derivative with respect to the Lagrange multiplier is zero.
292
00:35:31,800 --> 00:35:40,620
Then what does that imply? So well, first of all, it implies that the constraint is satisfied.
293
00:35:47,060 --> 00:35:52,040
Because we just figured out that the partial derivative of the Lagrangian with respect to lambda
294
00:35:52,040 --> 00:35:58,370
is that, and if that's zero, the constraint is satisfied. But what it also implies is that
295
00:36:02,100 --> 00:36:08,240
Ax minus f equals minus lambda b.
296
00:36:08,250 --> 00:36:15,060
That's what we get from this condition, which tells us that... how do I want to say this?
297
00:36:17,430 --> 00:36:25,840
The gradient of J is equal to minus lambda b.
298
00:36:27,350 --> 00:36:34,490
So that condition says we satisfy the constraint. That condition says that the gradient is equal to minus lambda b.
299
00:36:34,640 --> 00:36:43,520
But that's just this picture, because that's telling you that the gradient of the objective function is orthogonal to the constraint.
300
00:36:43,910 --> 00:36:47,540
So if you believe the picture, it's got to be optimal.
301
00:36:47,780 --> 00:36:52,520
If you move along the constraint, you won't improve the objective function.
302
00:36:54,530 --> 00:36:56,450
So I'm saying it very quickly,
303
00:36:56,450 --> 00:37:04,370
but this tells us that the solution of this unconstrained optimisation problem gives us a solution of the original constrained problem.
304
00:37:06,470 --> 00:37:08,840
So the mathematics is just linear algebra.
305
00:37:17,140 --> 00:37:27,340
And the key idea, which perhaps Lagrange didn't have to have, is to think of it as a system of linear equations of size n plus 1 by n
306
00:37:27,340 --> 00:37:32,360
plus 1. Here's what it looks like: a block matrix with
307
00:37:33,820 --> 00:37:37,450
A and b on top, and b transpose and 0 below,
308
00:37:39,030 --> 00:37:44,640
times the vector (x, lambda), equals (f, c).
309
00:37:47,040 --> 00:37:50,340
So this is an n plus one by n plus one system of equations.
310
00:37:56,140 --> 00:38:03,070
It's linear because we've assumed that we started not with an arbitrary nonlinear function, but with a quadratic nonlinear function.
311
00:38:03,460 --> 00:38:10,330
And what are these two equations? It's n plus 1 by n plus 1.
312
00:38:10,330 --> 00:38:15,850
But you could also say it's block two by two. So it has a block structure of a two by two matrix.
313
00:38:16,360 --> 00:38:29,860
The first row of that block structure would be Ax plus lambda b equals f; the second row is b transpose x equals c.
314
00:38:34,100 --> 00:38:38,150
So you see, by solving this n plus 1 by n plus 1 linear system of equations,
315
00:38:38,570 --> 00:38:43,460
we satisfy all the Lagrange multiplier conditions and find our global optimum.
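Here is a sketch of that solve in MATLAB, with small illustrative data of my own; the particular matrix and vectors are assumptions, not from the lecture.

```matlab
% Sketch: solve  min 0.5*x'*A*x - f'*x  subject to  b'*x = c
% via the (n+1)-by-(n+1) KKT system. Data here are illustrative only.
n = 3;
A = eye(n) + 0.1*ones(n);    % a symmetric positive definite matrix
f = [1; 0; -1];  b = ones(n,1);  c = 2;
K = [A, b; b', 0];           % the block two-by-two matrix
sol = K \ [f; c];
x = sol(1:n);                % the constrained minimiser
lambda = sol(end);           % the Lagrange multiplier
% checks: A*x - f + lambda*b should be ~0, and b'*x should be ~c
```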
316
00:38:45,430 --> 00:38:54,250
So this idea generalises to a huge part of the technology of quadratic programming and, more generally, nonlinear programming.
317
00:38:54,910 --> 00:39:02,310
Let me mention that. So of course, if we have m constraints,
318
00:39:06,480 --> 00:39:10,860
then we get m Lagrange multipliers, or a vector of length m.
319
00:39:10,860 --> 00:39:14,970
Then we get a system of dimension
320
00:39:16,690 --> 00:39:22,930
n plus m by n plus m.
321
00:39:25,390 --> 00:39:31,330
So again, if you have a quadratic objective function and a finite number of linear equality constraints,
322
00:39:31,510 --> 00:39:36,490
you can solve the whole problem by solving a system of equations. If it's more general,
323
00:39:38,160 --> 00:39:45,510
if it's a nonlinear but not necessarily quadratic objective function,
324
00:39:51,440 --> 00:39:52,999
Well, there are many ways to solve that,
325
00:39:53,000 --> 00:40:00,050
but one of the ways is to locally approximate it by a quadratic and then sequentially use successive approximations.
326
00:40:00,290 --> 00:40:09,020
And that method is called SQP, which stands for successive, or sequential, quadratic programming.
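In outline (a standard formulation, not spelled out in the lecture): at an iterate $x_k$ with multiplier estimate $\lambda_k$, and with constraint function $c(x) = 0$, one solves the quadratic subproblem

$$\min_p \ \tfrac{1}{2} p^T \nabla_{xx}^2 L(x_k, \lambda_k) \, p + \nabla J(x_k)^T p \quad \text{subject to} \quad \nabla c(x_k)^T p + c(x_k) = 0,$$

and sets $x_{k+1} = x_k + p$.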
327
00:40:21,590 --> 00:40:30,980
So this 200-year-old idea of Lagrange multipliers has morphed or evolved into one of the key ideas in nonlinear programming generally.
328
00:40:32,420 --> 00:40:37,879
I wanted to finish by telling you about a particular problem I've been involved in the last couple of years,
329
00:40:37,880 --> 00:40:41,240
and that gets to this final handout about the Faraday cage.
330
00:40:42,740 --> 00:40:45,840
So who's heard of the Faraday cage? Yeah.
331
00:40:46,440 --> 00:40:51,960
Everyone's heard of it. It's amazing. Let me remind you what a Faraday cage is.
332
00:40:55,950 --> 00:41:01,940
So suppose you have a metal shell. Metal is a conductor.
333
00:41:02,180 --> 00:41:08,840
And therefore the voltage, the potential, let's call it u, the potential on a metal shell will be constant.
334
00:41:10,160 --> 00:41:19,940
And that means that inside a metal shell, if you have an electrostatic potential on the boundary, you just have that same constant potential inside.
335
00:41:20,450 --> 00:41:25,930
So the solution to the Laplace equation inside a metal shell is a constant.
336
00:41:25,940 --> 00:41:35,270
So the gradient is zero. So inside a metal shell there are no electric fields because the gradient of the voltage has to be zero.
337
00:41:35,930 --> 00:41:43,670
Now, what Faraday noted in 1836 was that the same seemed to work if you had.
338
00:41:45,870 --> 00:41:54,120
A discrete approximation to a shell. I'm drawing it in two dimensions, but of course he would have been experimenting in three dimensions.
339
00:41:54,330 --> 00:42:03,750
If you take a wire mesh, it seems to block electric fields as well as a solid metal shell.
340
00:42:04,470 --> 00:42:07,830
So in two D we can imagine we have a bunch of wires.
341
00:42:07,980 --> 00:42:12,090
These are the cross-sections of the wires and they're all being held at the same voltage.
342
00:42:18,210 --> 00:42:25,560
So the Faraday effect is that if you have a bunch of points in 2D or lines in 3D held at the same voltage,
343
00:42:25,770 --> 00:42:32,100
then the gradient inside, of course nobody would say it's exactly zero, but it's approximately zero.
344
00:42:33,570 --> 00:42:38,610
The famous example is a microwave oven where you have these microwaves inside that aren't allowed to get out.
345
00:42:38,970 --> 00:42:41,010
How is that achieved? Well, there's a lot of metal.
346
00:42:41,220 --> 00:42:46,890
But the front door is metal with holes in it because you want light to get out and light has a much smaller wavelength.
347
00:42:48,060 --> 00:42:55,320
So I got interested in that a couple of years ago, and this is a clipping from a talk I just gave the other day on the subject,
348
00:42:56,790 --> 00:43:00,300
and it turns out that there's no literature on the Faraday cage.
349
00:43:00,570 --> 00:43:04,170
And, well, there's a little. And what little literature there is is wrong.
350
00:43:04,740 --> 00:43:10,620
So the famous example, or the conspicuous example, of a bit of
351
00:43:10,620 --> 00:43:15,000
literature in this area is in Feynman's Lectures on Physics, Volume Two.
352
00:43:16,050 --> 00:43:19,400
You can find a brief one or two page discussion.
353
00:43:19,410 --> 00:43:24,420
He doesn't call it the Faraday cage, but he talks about shielding. So I've given you a quote there.
354
00:43:24,840 --> 00:43:33,510
Feynman does an analysis, and then he says: inside a screen, the fields are zero.
355
00:43:33,510 --> 00:43:41,040
And of course, he means exponentially small. So he's saying that inside a screen like this, the fields are nearly the same as there.
356
00:43:41,670 --> 00:43:47,840
So that turns out to be not true. It took us a long time to realise that with confidence.
357
00:43:48,980 --> 00:43:55,160
If you look at Feynman's argument, you discover that, rather than setting the voltages to be equal on the wires,
358
00:43:55,400 --> 00:43:58,540
he's actually assuming that each wire has the same charge on it.
359
00:43:58,550 --> 00:44:01,550
And that's not right. It's the voltage that's equal.
360
00:44:01,550 --> 00:44:08,720
The point is that they're connected. So let me explain to you how that's a constrained quadratic programming problem.
361
00:44:10,560 --> 00:44:20,010
So the way we look at that is to imagine for simplicity that your cage consists of a bunch of circles.
362
00:44:21,450 --> 00:44:27,379
Or little disks, if you like. These are cross-sections of the wires. Out here,
363
00:44:27,380 --> 00:44:31,640
you have some driving field that you're trying to shield.
364
00:44:31,910 --> 00:44:37,850
So in two dimensions, that might be something like the log of Z minus some special point.
365
00:44:38,000 --> 00:44:41,360
That's a two-dimensional point charge; in 3-D
366
00:44:41,360 --> 00:44:52,310
it would be analogous. The goal of the Faraday cage is to make the voltage nearly constant in here so that the electric field is nearly zero.
367
00:44:54,120 --> 00:44:58,079
So if you think physically about what happens here, these are just passive wires.
368
00:44:58,080 --> 00:45:04,950
They don't have any charge on them in a net sense, but the charge will nevertheless move around so that if this,
369
00:45:04,950 --> 00:45:15,030
for example, is plus, then it will attract some minus charges here and repel some plus charges here.
370
00:45:15,840 --> 00:45:26,070
Always neutral on balance. So the question becomes how will charges distribute themselves around this cage to minimise energy?
371
00:45:28,570 --> 00:45:32,770
And that may explain why... well, let's just stare at this thing.
372
00:45:37,150 --> 00:45:48,110
The equations are at the bottom of the back of the handout. And what you see is that the quadratic energy term that we're trying to minimise to model the Faraday
373
00:45:48,110 --> 00:45:53,290
cage consists of three pieces and two of them are sort of obvious and it's the third that's the surprise.
374
00:45:53,300 --> 00:45:59,840
So let's go from back to front. The third piece is the energy with respect to the external field.
375
00:46:00,140 --> 00:46:07,100
So if you have a certain charge here, then there's an energy associated with its interaction with that.
376
00:46:07,880 --> 00:46:11,270
Obviously there's an inverse square force, an inverse linear energy.
377
00:46:13,220 --> 00:46:19,970
So that's the third term in the equation. The middle term in the equation is the interaction between the charges.
378
00:46:19,970 --> 00:46:24,290
So if there's some charge here and some charge here, they would repel or attract each other.
379
00:46:24,290 --> 00:46:30,010
There's an energy associated with that. And then the first term is the surprising one.
380
00:46:30,610 --> 00:46:34,689
So the way we're modelling this is we're imagining that this disk has charge
381
00:46:34,690 --> 00:46:38,470
q1 on it and this one has charge q2; these are just numbers.
382
00:46:39,100 --> 00:46:43,090
So our unknown is a vector of charges q.
383
00:46:55,690 --> 00:47:00,249
And those charges will be distributed in such a way as to minimise the energy.
384
00:47:00,250 --> 00:47:05,260
But the third piece of the energy, which is the first one as written down, is the energy involved with
385
00:47:05,260 --> 00:47:10,389
having a lot of charge on a small wire because it takes work to put charge on the small wire.
386
00:47:10,390 --> 00:47:16,719
It's repelling itself. So the smaller the wire is, the more work it takes, and the more charge there is, the more work it takes.
387
00:47:16,720 --> 00:47:22,660
Quadratically. So you go through all this and you find you have a quadratic objective function.
388
00:47:23,050 --> 00:47:27,130
You notice the first two terms are the quadratic, the squared part.
389
00:47:27,670 --> 00:47:30,690
The first term is the diagonal of the matrix: q_k squared.
390
00:47:30,700 --> 00:47:33,990
The second term is the off-diagonal part of the matrix: q_j times
391
00:47:34,000 --> 00:47:39,690
q_k. And it's symmetric. The third term is the linear part of the problem.
392
00:47:39,700 --> 00:47:46,030
So we have exactly the format that I wrote down here.
393
00:47:46,300 --> 00:47:56,050
This is the energy we're trying to minimise: two terms go into there, one term goes there, and then there's a constraint.
394
00:47:56,650 --> 00:48:06,100
The total amount of charge is zero. So the sum of the q_k's should be zero, because there's no net charge on your passive Faraday cage.
395
00:48:06,310 --> 00:48:12,580
In other words, the inner product of c with q is zero,
396
00:48:13,120 --> 00:48:20,260
if c is a vector of all ones. So it's a perfect example of a one-constraint
397
00:48:20,590 --> 00:48:25,960
quadratic programming problem. And of course, once you see that, you can solve it at great speed.
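Here is a rough MATLAB sketch of that model as I understand it: wires as disks of radius r centred at points z_k in the complex plane, two-dimensional logarithmic potentials, and an external unit point charge at a point zs. The kernel and signs are my reconstruction; the handout's formula is the authority.

```matlab
% Rough reconstruction of the Faraday-cage QP (the exact energy is on
% the handout; the kernel and signs here are my assumptions).
n = 12;  r = 0.01;             % 12 wires of radius 0.01
z = exp(2i*pi*(1:n)'/n);       % wire centres on the unit circle
zs = 2;                        % external point charge location
% Quadratic term: self-energy -log(r) on the diagonal, pairwise
% interaction -log|z_j - z_k| off the diagonal.
A = -log(abs(z - z.') + r*eye(n));
f = log(abs(z - zs));          % linear term from the external field
% Minimise 0.5*q'*A*q - f'*q subject to sum(q) = 0 (no net charge):
e = ones(n,1);
sol = [A, e; e', 0] \ [f; 0];  % the (n+1)-by-(n+1) KKT system
q = sol(1:n);                  % equilibrium charges on the wires
```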
398
00:48:26,200 --> 00:48:33,490
I have a little code that hopefully will do that. Let's see if it works. I'll try it on this computer.
399
00:48:34,240 --> 00:48:39,400
Chris, does it matter which computer I use? No. Okay, let's try this one.
400
00:48:49,630 --> 00:48:56,020
So suppose I try M24 Faraday, which is something I wrote up at great speed the other day.
401
00:48:56,020 --> 00:49:00,280
So it's an ugly bit of code. But it does exactly this.
402
00:49:00,290 --> 00:49:06,110
It solves this n plus 1 by n plus 1 system. Let's give it a radius for each wire.
403
00:49:08,130 --> 00:49:15,630
And the way it's set up is I click and every time I click it sets up this matrix problem and solves it.
404
00:49:16,350 --> 00:49:22,950
So these are wires that I'm just putting into my cage. You see, the cage isn't at all closed yet, so it's not doing much shielding.
405
00:49:23,460 --> 00:49:27,390
But as I close it off. The shielding will get better.
406
00:49:29,220 --> 00:49:32,520
So every time I click, it's just solving another linear algebra problem.
407
00:49:32,520 --> 00:49:36,570
And what could be easier than that? The time is spent in drawing the plot, of course.
408
00:49:39,190 --> 00:49:42,640
So there you see the Faraday shielding effect.
409
00:49:43,090 --> 00:49:44,560
And you see it's pretty good.
410
00:49:45,130 --> 00:49:52,930
But you also see that there are a few of those lines penetrating into the cage, which shows that the shielding is nothing like exponential.
411
00:49:52,930 --> 00:49:56,080
And whether your microwave oven is as safe as it ought to be,
412
00:49:56,290 --> 00:50:01,390
I don't know. A fact about microwave ovens is that it's hard to see into them.
413
00:50:01,960 --> 00:50:08,080
You know, there's a lot of metal in that front door. And obviously the engineers who measured this stuff were not reading their Feynman.
414
00:50:08,080 --> 00:50:11,860
Thank goodness they realised you needed a lot of metal to have a good Faraday cage.
415
00:50:12,310 --> 00:50:15,530
Okay, so we'll turn to PDEs and ODEs next term.
416
00:50:15,550 --> 00:50:16,180
Thanks very much.