1
00:00:02,700 --> 00:00:06,450
Okay. Good morning, everybody. I hope you've done the quiz.
2
00:00:06,450 --> 00:00:13,650
So let's just look at that and see if you got the right answers. Every square matrix has an eigenvalue decomposition: false.
3
00:00:14,040 --> 00:00:18,180
And the simplest example of one that doesn't would be that.
4
00:00:19,830 --> 00:00:25,400
In other words, not every matrix is diagonalisable. Every matrix, rectangular or square, has an SVD.
5
00:00:25,710 --> 00:00:31,300
Yes, that's true. If A and B are square, etc., etc., then they have the same eigenvalues.
6
00:00:31,320 --> 00:00:34,740
Yes, that's true. That's a similarity transformation. So let's see.
7
00:00:34,740 --> 00:00:43,080
We have false, true, true. Number four: a square matrix is singular if and only if it has a zero eigenvalue.
8
00:00:43,290 --> 00:00:49,280
True. Number five, the singular values of an orthogonal matrix are all equal to one.
9
00:00:49,550 --> 00:00:55,280
That's true. Remember, the singular values tell you about a hyperellipsoid.
10
00:00:55,280 --> 00:01:00,500
Well, an orthogonal matrix maps a ball to a ball, so all the singular values are one.
11
00:01:01,610 --> 00:01:04,880
Finally, the eigenvalues of an orthogonal matrix are all equal to one.
12
00:01:05,060 --> 00:01:11,120
That's false. The eigenvalues of an orthogonal matrix have to be equal to those of its inverse.
13
00:01:11,390 --> 00:01:15,410
But they could be minus one. So they're plus or minus one, not necessarily one.
14
00:01:17,750 --> 00:01:24,170
Okay. So today we begin the last bit of this term, which is about optimisation.
15
00:01:24,380 --> 00:01:27,350
We've been talking about linear algebra. Now optimisation.
16
00:01:27,770 --> 00:01:32,990
This is a subject that I mentioned at the beginning was one of the pillars of scientific computing.
17
00:01:33,200 --> 00:01:42,530
Remember, I drew a tree in which I identified three traditional main areas of numerical analysis, or scientific computing.
18
00:01:42,680 --> 00:01:51,350
One was linear algebra, one was differential equations, PDEs and ODEs, and the other was optimisation.
19
00:01:53,690 --> 00:01:58,630
I think if anything, over the years, this one just keeps getting more and more important.
20
00:01:58,640 --> 00:02:04,430
It's just an amazingly exciting, flourishing area that's everywhere in science and engineering.
21
00:02:05,720 --> 00:02:13,820
Let me mention a couple of facts about optimisation that make it diverse and interesting.
22
00:02:14,300 --> 00:02:20,840
One is that, of course, we're typically dealing with functions of several variables or many variables.
23
00:02:22,280 --> 00:02:28,670
So optimisation is all about minimising functions of many variables.
24
00:02:30,830 --> 00:02:38,180
And the other thing without which the field wouldn't be so variegated is that typically there are constraints.
25
00:02:40,700 --> 00:02:42,260
So what is optimisation?
26
00:02:42,560 --> 00:02:54,360
Well, it's the subject of minimising or maximising functions, typically with several or many variables and typically with constraints.
27
00:02:54,370 --> 00:03:06,500
So it's minimisation and maximisation; also, more or less in the same domain, is finding roots of functions.
28
00:03:13,720 --> 00:03:18,660
So that's what optimisation is all about. Nobody ever maximises; in practice you always minimise.
29
00:03:18,660 --> 00:03:22,600
So if you want a maximum of something, multiply it by minus one.
30
00:03:23,050 --> 00:03:31,330
The standard is always to talk about minimisation. So this is a field where there is a totally dominant, great textbook.
31
00:03:31,600 --> 00:03:36,850
And that's this book by Nocedal and Wright. The second edition came out nine years ago.
32
00:03:36,850 --> 00:03:39,309
I think it's just a beautiful, straightforward,
33
00:03:39,310 --> 00:03:45,940
knowledgeable book that talks about all the basic and a little more than basic ideas of numerical optimisation.
34
00:03:46,150 --> 00:03:52,300
So there's absolutely no doubt that if you want to learn about something in this area, the place to start is with this book.
35
00:03:52,720 --> 00:03:58,600
Let me pass that around. Oh, before I pass it around, let me read the first two sentences.
36
00:04:02,720 --> 00:04:05,750
So it's a big, thick book, but well organised.
37
00:04:05,750 --> 00:04:09,580
Don't worry. The first two paragraphs are just these little paragraphs.
38
00:04:09,590 --> 00:04:13,700
The first paragraph begins, "People optimise."
39
00:04:15,010 --> 00:04:19,960
What they mean by that is, well, let me say what they mean by that. Investors seek to create portfolios.
40
00:04:19,990 --> 00:04:23,290
Da da da. Manufacturers aim for maximum efficiency.
41
00:04:23,500 --> 00:04:26,740
Engineers adjust parameters to optimise performance and so on.
42
00:04:26,890 --> 00:04:31,960
So people optimise. Then the second paragraph says, "Nature optimises."
43
00:04:32,230 --> 00:04:34,840
Physical systems tend to a state of minimum energy.
44
00:04:35,020 --> 00:04:41,470
The molecules in an isolated chemical system react with each other until the total potential energy of their electrons is minimised.
45
00:04:41,680 --> 00:04:45,160
Rays of light follow paths that minimise their travel time.
46
00:04:45,700 --> 00:04:51,309
I think that's an intellectually fascinating beginning of the book because it really is true that optimisation is
47
00:04:51,310 --> 00:04:56,860
something that's both at the heart of science because of so many processes that minimise energy or something.
48
00:04:57,070 --> 00:05:00,280
And at the heart of engineering, for even more obvious reasons.
49
00:05:07,100 --> 00:05:16,009
Okay. So the beginning point of optimisation, and it's not just the beginning,
50
00:05:16,010 --> 00:05:21,200
it's a pervasive idea throughout everything, is Newton's method, which of course we've all heard of.
51
00:05:21,380 --> 00:05:24,140
But that's certainly where we're going to start discussing the matter.
52
00:05:24,380 --> 00:05:32,090
And we're going to begin with Newton's method in one dimension, which is the kind of thing I hope you all learned in high school.
53
00:05:32,840 --> 00:05:39,400
So, Newton's method, and I'll put it like this: Newton's method for a single equation.
54
00:05:40,660 --> 00:05:46,870
And the reason for putting it like that is to emphasise that I'm beginning with root finding rather than minimisation.
55
00:05:51,150 --> 00:05:55,830
I remember once a teacher told me that really mathematics boiled down to two things.
56
00:05:56,280 --> 00:05:59,520
One is Taylor series. The other is linear algebra.
57
00:05:59,640 --> 00:06:04,110
Everything else is variations on those themes. Well, this is Taylor series.
58
00:06:04,110 --> 00:06:06,900
Newton's great idea is Taylor series. Indeed,
59
00:06:07,110 --> 00:06:14,250
I think Arnold wrote a book about the history of mathematics in which he essentially claimed that Newton's great contribution was Taylor series,
60
00:06:15,720 --> 00:06:22,530
exploiting the smoothness of a function to approximate it by a constant plus slope plus curvature, that sort of thing.
61
00:06:23,190 --> 00:06:25,620
So let's write down the notation we're going to use.
62
00:06:27,720 --> 00:06:35,070
Let's suppose we're given a function, and the notation I'll use for root finding is capital F of x.
63
00:06:35,580 --> 00:06:42,810
When we turn to minimisation, it'll be lowercase f. So suppose we have a function which we assume is nonlinear.
64
00:06:47,280 --> 00:06:58,140
And we want to find a zero. So our goal is to find a zero of F, also called a root of F.
65
00:07:00,820 --> 00:07:04,450
And the notation we'll use is that that's going to be x star.
66
00:07:06,700 --> 00:07:11,680
So we're looking for an x star such that F of x star is equal to zero.
67
00:07:13,990 --> 00:07:21,819
And you all know Newton's method. Let me write it down in this format, since I'll write down a bunch of these iterations.
68
00:07:21,820 --> 00:07:28,080
So suppose we're given an initial guess, which I'll call, of course, x nought.
69
00:07:30,990 --> 00:07:36,450
And then we enter the loop. And I won't ever talk about termination criteria.
70
00:07:36,450 --> 00:07:56,970
So I'll just say: for k = 0, 1, 2, ..., define s_k to be minus F evaluated at x_k, divided by F prime evaluated at x_k.
71
00:07:58,180 --> 00:08:04,790
Now s, okay, stands for step, because we're going to take a step in that direction.
72
00:08:06,440 --> 00:08:13,730
The new iterate is then x_{k+1} = x_k + s_k.
73
00:08:17,010 --> 00:08:20,940
So there's the iteration that you all learned in high school.
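In MATLAB the whole iteration is just a few lines. Here is a minimal sketch, not one of the lecture's codes, with the illustrative choice F(x) = x^2 - 2, whose root is the square root of 2:

    F  = @(x) x.^2 - 2;      % illustrative example; root x* = sqrt(2)
    Fp = @(x) 2*x;           % the derivative F'
    x = 1;                   % initial guess x0
    for k = 1:8
        s = -F(x)/Fp(x);     % step s_k = -F(x_k)/F'(x_k)
        x = x + s;           % x_{k+1} = x_k + s_k
        fprintf('%20.16f\n', x)
    end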
74
00:08:22,710 --> 00:08:25,710
Now, let me ask, I'm curious: raise your hand if you did learn that in high school.
75
00:08:26,700 --> 00:08:36,270
Oh, okay. Pretty good. And of course, you know, the picture associated with this, we have some kind of function.
76
00:08:37,170 --> 00:08:46,110
We're trying to find x star: we have some guess x nought, we evaluate the function, then extrapolate along the tangent,
77
00:08:46,710 --> 00:08:52,830
and that process converges quickly. In fact, just to say a word about how fast it does converge:
78
00:08:56,610 --> 00:09:03,630
I won't state it formally as a theorem, but this is the idea: if F is twice differentiable,
79
00:09:05,450 --> 00:09:11,970
so that means the second derivative exists, and moreover
80
00:09:15,310 --> 00:09:22,180
the derivative evaluated at a root x star is nonzero,
81
00:09:25,990 --> 00:09:32,260
then, if x nought is sufficiently close to x star,
82
00:09:44,180 --> 00:09:50,390
we have convergence, and it's quadratic. So we say the convergence is quadratic.
83
00:09:53,790 --> 00:09:58,530
Which means the number of digits doubles at each step, essentially.
84
00:09:59,670 --> 00:10:07,950
Later, I may write that down. More precisely, the error at one step is proportional asymptotically to the square of the error at the previous step.
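In symbols, writing e_k = x_k - x_* for the error, the quadratic convergence statement is, asymptotically,

    |e_{k+1}| \approx \left| \frac{F''(x_*)}{2\,F'(x_*)} \right| |e_k|^2,

which is why the number of correct digits roughly doubles at each step.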
85
00:10:08,550 --> 00:10:12,630
And this is a prototype for the kind of theorem you find all over optimisation.
86
00:10:13,170 --> 00:10:18,240
Notice that some key assumptions always turn up. You need some smoothness in your function.
87
00:10:18,360 --> 00:10:21,780
If you lose that, then there may be no convergence or slower convergence.
88
00:10:22,350 --> 00:10:26,610
You need some kind of generic behaviour at the solution you're looking for.
89
00:10:26,910 --> 00:10:30,120
Otherwise you may have no convergence or slower convergence.
90
00:10:30,660 --> 00:10:35,040
And especially with Newton's method, you certainly need to be close enough to the solution.
91
00:10:36,000 --> 00:10:39,240
So of course, if this function keeps doing crazy things.
92
00:10:42,670 --> 00:10:48,760
Depending where we start, we may converge to nothing or we may converge to different roots.
93
00:10:49,060 --> 00:10:52,570
It's obvious from the picture that if we start near this root, we'll converge to it.
94
00:10:52,780 --> 00:11:00,670
Start near that one, we'll converge to that. Things get more complicated if your function is less smooth or the derivative is zero at the root.
95
00:11:02,260 --> 00:11:06,550
So of course, a root is by no means necessarily unique.
96
00:11:06,910 --> 00:11:10,630
Newton's method is all about finding one root, not necessarily all of them.
97
00:11:11,770 --> 00:11:16,510
Now I want to show you, as usual, some stuff. So let's go to MATLAB.
98
00:11:21,620 --> 00:11:30,830
The first code is the one called M18, which is not a program you're really supposed to look at, because all it does is fiddle around with graphics.
99
00:11:32,120 --> 00:11:36,830
But I was giving a public lecture once, so it was worthwhile making such a code.
100
00:11:37,040 --> 00:11:52,490
So it's called Pretty Newton. So what Pretty Newton does is take a particular function and just colourfully illustrate Newton's method.
101
00:11:52,700 --> 00:11:58,120
So there you have the function, which of course is the blue curve, and we're trying to find that point.
102
00:11:58,520 --> 00:12:03,650
And there's our initial guess, and you can see that the initial guess is 0.8.
103
00:12:04,400 --> 00:12:12,410
None of the digits are correct, so they're all in red. You evaluate the function at your current guess.
104
00:12:14,140 --> 00:12:18,670
And then you extrapolate along the tangent line. So that gives you a new point.
105
00:12:18,670 --> 00:12:21,730
0.208. Still, none of the digits are correct.
106
00:12:22,420 --> 00:12:27,160
So they're still all in red. Then we evaluate the function there.
107
00:12:27,610 --> 00:12:32,519
Extrapolate along the tangent. Still none of the digits are correct.
108
00:12:32,520 --> 00:12:38,649
So they're still all in red. Of course, the remainder you won't see very well.
109
00:12:38,650 --> 00:12:45,280
You're mostly going to see the red digits. We evaluate and extrapolate and now suddenly we have three digits of accuracy.
110
00:12:45,280 --> 00:12:54,310
.373 are correct. And from now on, every time we keep going, the number of correct digits will approximately double.
111
00:12:55,990 --> 00:12:59,680
So I evaluate and then I extrapolate.
112
00:13:00,250 --> 00:13:03,430
And it exactly doubled from three digits to six digits.
113
00:13:04,520 --> 00:13:09,140
Do it again. We expect to get about 12 digits. I evaluate and then extrapolate.
114
00:13:09,980 --> 00:13:13,820
And as it happens, we got exactly 12 digits on my machine.
115
00:13:13,820 --> 00:13:14,840
It won't go beyond that.
116
00:13:14,840 --> 00:13:22,820
I might be able to nail the final one, but of course, I'm working in 16-digit arithmetic, so that's as far as I'll get on this computer.
117
00:13:26,160 --> 00:13:31,560
Great idea. Newton, by the way, didn't do as much with Newton's method as you probably think.
118
00:13:31,860 --> 00:13:35,790
I think he mainly applied it only to polynomials, not to general functions.
119
00:13:36,060 --> 00:13:42,030
And Simpson, as in Simpson's rule, was the person who really put Newton's method on its feet.
120
00:13:42,210 --> 00:13:46,940
Simpson and Raphson both. Okay.
121
00:13:46,970 --> 00:13:51,140
Now, I also want to look at the other codes, the ones called M19.
122
00:13:52,550 --> 00:13:57,170
So there are three of these just to explore this problem from different angles.
123
00:13:58,070 --> 00:14:01,330
Excuse me. Okay.
124
00:14:01,330 --> 00:14:06,580
So here we have a problem. I might as well write it down.
125
00:14:07,870 --> 00:14:22,580
In M19 we're solving F of x equals x cubed minus two equals zero.
126
00:14:22,940 --> 00:14:25,220
So we're trying to find the cube root of two.
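As a sketch of what a code like M19 Newton A presumably contains (the file itself isn't reproduced here), the whole computation is:

    x = 1;                            % initial guess
    for k = 1:8
        x = x - (x^3 - 2)/(3*x^2);    % Newton step for F(x) = x^3 - 2
        fprintf('%20.16f\n', x)
    end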
127
00:14:26,120 --> 00:14:32,300
And incidentally, on computers, sometimes Newton's method is used even for elementary things like square roots.
128
00:14:32,570 --> 00:14:36,920
It's an algorithm so useful that sometimes it will even be implemented in the hardware.
129
00:14:37,910 --> 00:14:44,810
So let's look at this from various angles. The first of the codes is M19 Newton A.
130
00:14:44,820 --> 00:14:48,139
It's just straightforward MATLAB.
131
00:14:48,140 --> 00:14:54,570
So I will close that. And let's just type M19
132
00:15:01,890 --> 00:15:07,370
Newton A. So what you see there is we've taken our initial guess:
133
00:15:07,380 --> 00:15:16,320
one; we're trying to find the cube root of two. We iterate, we get 4/3; we iterate again and again.
134
00:15:16,830 --> 00:15:19,860
So you could guess now that we have several digits of accuracy.
135
00:15:20,850 --> 00:15:27,660
It looks clear now that we have four or five digits of accuracy; now it would seem we have maybe nine.
136
00:15:28,620 --> 00:15:31,110
And from now on, it just keeps agreeing.
137
00:15:31,500 --> 00:15:38,430
So, of course, it's not a proof, but it's pretty clear that we have more or less full precision in the cube root of two.
138
00:15:40,300 --> 00:15:43,860
Let's see what happens if I type two to the one third.
139
00:15:46,180 --> 00:15:50,710
Looks good. That doesn't mean they're the same. By the way, I could say ans minus x.
140
00:15:51,430 --> 00:15:55,740
That might be zero, or it might be some small multiple of machine epsilon.
141
00:15:55,750 --> 00:16:02,960
It turns out it's zero; we were lucky. The next one does the same,
142
00:16:02,980 --> 00:16:08,740
but symbolically. Now, I would love to show you this properly, but my laptop doesn't have the symbolic toolbox.
143
00:16:09,040 --> 00:16:17,110
So you can try this at home. And therefore what I did was printed out a couple of the outputs on the sheet of paper here.
144
00:16:17,380 --> 00:16:24,160
So here we're running in MATLAB's symbolic toolbox, which is by no means the world's most beautiful symbolic calculator.
145
00:16:24,550 --> 00:16:31,680
But anyway, it works for something like this. In MATLAB, you can declare a variable to be sym of a string,
146
00:16:31,680 --> 00:16:37,860
and so 1 then becomes a symbolic 1, and you can go through this whole thing in symbolic mode.
147
00:16:39,570 --> 00:16:45,810
And Newton's method for this problem has a succession of iterates that are all rational numbers.
148
00:16:46,050 --> 00:16:51,330
So you see, the first is one and then 4/3, which we identified immediately when we ran Newton.
149
00:16:52,650 --> 00:16:55,740
But they keep being rational numbers at every stage.
150
00:16:55,740 --> 00:16:59,640
So 91/72, and then 1126819 over 894348. Yeah, whatever.
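A minimal symbolic sketch in the spirit of M19 Newton B, assuming the Symbolic Math Toolbox is available; starting from sym(1), every iterate is an exact rational number:

    x = sym(1);                       % a symbolic 1, so all arithmetic is exact
    for k = 1:5
        x = x - (x^3 - 2)/(3*x^2);    % same Newton step, now in rational arithmetic
        disp(x)                       % 4/3, 91/72, ... with fast-growing digits
    end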
151
00:17:02,000 --> 00:17:05,360
Now, the striking thing about these numbers is that they keep getting longer.
152
00:17:05,540 --> 00:17:11,569
And if you quantify that, roughly speaking, the numerator is tripling its number of digits at each step.
153
00:17:11,570 --> 00:17:14,810
And so is the denominator because this is a cubic.
154
00:17:15,290 --> 00:17:23,930
If it had been a quintic, the numerator would have become five times as long at each step; for a 20th-degree polynomial, 20 times as long.
155
00:17:24,560 --> 00:17:29,420
So what you're seeing here is the combinatorial explosion of symbolic computing.
156
00:17:29,690 --> 00:17:38,690
Rational arithmetic is sadly just not a good idea for scientific computing, because this happens all the time: it combinatorially explodes.
157
00:17:39,140 --> 00:17:44,060
So that's why rational arithmetic has such a small niche in computational science.
158
00:17:44,420 --> 00:17:53,330
Usually it just isn't feasible. And if you look at this further: we said we have 1, 4/3, 91/72, and then we go down three more steps.
159
00:17:55,180 --> 00:18:02,860
That approximation to the cube root of two will surely be accurate to 16 digits or so, but it won't be accurate to this many digits.
160
00:18:03,040 --> 00:18:09,460
So most of that exactness in the rational numbers is irrelevant to the problem you're trying to solve.
161
00:18:10,060 --> 00:18:16,210
And that would be very typical of symbolic or exact arithmetic computing.
162
00:18:18,650 --> 00:18:26,450
Now the third one, M19 Newton C, is also something I can't run on my laptop, because I don't have the symbolic toolbox.
163
00:18:26,660 --> 00:18:31,010
But it's the other variation on the theme of standard numerical computing.
164
00:18:31,250 --> 00:18:35,300
So standard is floating point arithmetic. One variation is symbolic.
165
00:18:35,780 --> 00:18:39,970
The other variation is extended precision. Another important beautiful idea.
166
00:18:39,980 --> 00:18:47,510
So here in this code, M19 Newton C, I said let's run in 75-digit arithmetic.
167
00:18:49,100 --> 00:18:53,059
So there you see, printed out, the first few steps in 75-digit arithmetic.
168
00:18:53,060 --> 00:18:58,940
And sure enough, the second one has one digit, the third one has one digit, the fourth one has three digits.
169
00:18:59,180 --> 00:19:03,860
And then if you follow along, you see that the last two lines of the output are identical.
170
00:19:04,280 --> 00:19:08,450
So sure enough, Newton is doubling the number of digits at each step.
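And a sketch in the spirit of M19 Newton C, again assuming the Symbolic Math Toolbox, this time in 75-digit variable precision arithmetic:

    digits(75)                        % work in 75-digit arithmetic
    x = vpa(1);
    for k = 1:8
        x = x - (x^3 - 2)/(3*x^2);    % the number of correct digits doubles
        disp(x)
    end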
171
00:19:09,800 --> 00:19:18,040
Variable precision is a beautiful, wonderful thing. And I think everyone in my field wishes it were effortlessly available all the time.
172
00:19:18,050 --> 00:19:21,950
If only every computer had a knob you could turn to just change the precision.
173
00:19:22,970 --> 00:19:26,690
Sadly, from an engineering point of view, that doesn't seem to be worth the cost.
174
00:19:27,410 --> 00:19:31,430
But for an academic like me, it would be delightful to be able to do that.
175
00:19:31,730 --> 00:19:38,420
Maybe in 50 years computers will do that. So in practice, if you want to do variable precision arithmetic,
176
00:19:38,570 --> 00:19:44,870
you go into some rather different mode and usually there's a significant cost associated with that.
177
00:19:46,980 --> 00:19:51,890
When I was your age, quadruple precision was more common than it is now.
178
00:19:51,900 --> 00:19:58,290
It's still out there. But when I was your age, single, double and quadruple were all very much part of the landscape.
179
00:19:58,290 --> 00:20:01,980
So we would compute in 8 digits, 16 digits or 32 digits.
180
00:20:02,280 --> 00:20:07,140
Now that's kind of compressed, where most people, most of the time, are just working in 16 digits.
181
00:20:10,140 --> 00:20:16,920
Okay. So that's Newton's method in the scalar case.
182
00:20:17,130 --> 00:20:20,190
But of course, optimisation is about systems of equations.
183
00:20:20,550 --> 00:20:30,520
So let's talk about that. So Newton's method for a system of equations.
184
00:20:46,360 --> 00:20:52,810
And again, I'm writing that out explicitly to emphasise that we're not yet minimising; we're still finding roots of equations.
185
00:20:54,310 --> 00:21:00,870
So now we're going to have a function that maps from a vector space to a vector space.
186
00:21:00,880 --> 00:21:06,280
So let's say that capital F maps from R^n to R^n.
187
00:21:09,400 --> 00:21:28,690
So we have n variables and n equations, if you like. We're seeking a vector x star in R^n such that F of x star is zero, and these are vectors.
188
00:21:28,690 --> 00:21:31,900
So zero here would be the n-dimensional zero vector.
189
00:21:32,710 --> 00:21:37,750
So another way to say that is we have a system of equations.
190
00:21:41,940 --> 00:21:50,519
And if you want, we could write them out. The first one would be F1 of x1 through xn equals zero,
191
00:21:50,520 --> 00:22:01,500
and that's now the scalar zero; and so on, up to Fn of x1 through xn equals zero.
192
00:22:03,390 --> 00:22:07,980
So a system of equations is the same as a single equation in a vector space.
193
00:22:09,780 --> 00:22:12,780
Now, Newton's method in principle carries over exactly.
194
00:22:13,380 --> 00:22:17,910
Although Newton himself never did that, he never did systems of equations.
195
00:22:20,490 --> 00:22:27,180
And I don't remember to what extent it's thought that he sort of understood the idea of calculus with more than one variable.
196
00:22:27,270 --> 00:22:32,589
Probably he did. He was smart. The idea is to do exactly the same.
197
00:22:32,590 --> 00:22:39,850
We need to extrapolate along a tangent, but now a tangent will be defined by some linear algebra rather than just a scalar derivative.
198
00:22:40,060 --> 00:22:44,020
So we're going to have to solve a linear system of equations at each step.
199
00:22:44,500 --> 00:22:50,680
And for that to make sense, we have to talk about a linearisation of the function F.
200
00:22:51,130 --> 00:22:58,780
So here's the definition, and I'll define this in the general sense of a function from one finite-dimensional space to another.
201
00:23:01,280 --> 00:23:07,730
The key thing we need is the derivative of F, which usually one would call the Jacobian, the Jacobian matrix.
202
00:23:08,060 --> 00:23:21,460
So, the derivative of F at a point x in R^n:
203
00:23:25,380 --> 00:23:28,980
it's a matrix, the m by n matrix
204
00:23:34,350 --> 00:23:45,050
of partial derivatives. So, J for Jacobian; I could say J of x, that's my matrix.
205
00:23:45,290 --> 00:23:56,149
And the i,j component of that matrix, which you could also write as the i,j component of F prime of x, is the partial
206
00:23:56,150 --> 00:24:07,610
derivative of the ith component of F with respect to the jth component of x, evaluated at the point x.
207
00:24:11,750 --> 00:24:15,920
By the way, you notice I don't use special symbols to distinguish vectors.
208
00:24:16,190 --> 00:24:24,120
Of course you can. You can write them in boldface, or you can put arrows over them or tildes over them. In mathematics,
209
00:24:24,140 --> 00:24:30,290
often one doesn't do that, so people get used to just using the same notation for a vector as for a scalar.
210
00:24:31,070 --> 00:24:37,580
I realise in other fields you may have different habits. So let's just do an example.
211
00:24:39,290 --> 00:24:43,610
You all know partial derivatives, and presumably most of you know Jacobian matrices.
212
00:24:44,390 --> 00:24:47,360
The example I'll pick will go from R^2 to R^3.
213
00:24:52,640 --> 00:25:13,879
So suppose we had, for example, F1 of x1, x2 equal to x1 squared, and F2 of x1, x2 equal to x1 plus x1 x2, and F3 of x1,
214
00:25:13,880 --> 00:25:20,090
x2 equal to x1 e to the x2.
215
00:25:23,140 --> 00:25:26,200
So the Jacobian matrix is a three by two matrix.
216
00:25:27,370 --> 00:25:32,020
Let me just write it down: F prime of x. So it's three by two.
217
00:25:33,670 --> 00:25:37,990
So we differentiate the first component with respect to x1 and x2,
218
00:25:38,020 --> 00:25:45,999
so we get 2 x1 and 0. Then we differentiate the second one with respect to x1 and x2,
219
00:25:46,000 --> 00:25:56,100
so we get 1 plus x2 and x1. Then we differentiate the third with respect to x1 and x2,
220
00:25:56,310 --> 00:25:59,940
so we get e to the x2 and x1 e to the x2.
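In LaTeX form, the example and its Jacobian read

    F(x) = \begin{pmatrix} x_1^2 \\ x_1 + x_1 x_2 \\ x_1 e^{x_2} \end{pmatrix},
    \qquad
    F'(x) = J(x) = \begin{pmatrix} 2x_1 & 0 \\ 1 + x_2 & x_1 \\ e^{x_2} & x_1 e^{x_2} \end{pmatrix}.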
221
00:26:07,310 --> 00:26:10,730
So Newton's method is going to have to do this kind of thing all the time.
222
00:26:11,030 --> 00:26:15,830
Of course, usually the derivative will be calculated numerically rather than symbolically.
223
00:26:45,240 --> 00:27:08,790
Okay, I guess, to continue for just a second on the general idea of a Jacobian:
224
00:27:09,060 --> 00:27:13,500
this is Newton's idea that locally any function, if it's smooth, looks linear.
225
00:27:15,060 --> 00:27:18,930
So that idea, the motivation for Jacobians,
226
00:27:22,260 --> 00:27:32,400
is that if you have this function capital F mapping R^n to R^m, and you evaluate it at x plus a small perturbation delta x,
227
00:27:33,390 --> 00:27:36,930
so that's an n-vector plus another n-vector,
228
00:27:38,670 --> 00:27:45,150
then that should be equal to F of x plus some suitable perturbation.
229
00:27:46,020 --> 00:27:53,040
And that perturbation will be the matrix F prime times the perturbation vector delta x.
230
00:27:54,360 --> 00:27:57,480
So that's the whole point of vector calculus, if you like.
231
00:27:57,750 --> 00:28:06,640
These are both n-vectors, as I said. And these...
232
00:28:08,300 --> 00:28:12,140
...are both m-vectors. Sorry.
233
00:28:15,520 --> 00:28:27,440
Yeah, right. So this m-vector arises from this one by multiplying an m by n matrix by an n-vector.
234
00:28:27,770 --> 00:28:36,770
So that's what vector calculus starts from. And Newton's method is simply exploiting this property in the case of a square capital F.
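Written out, the linearisation behind all of this is

    F(x + \Delta x) \approx F(x) + F'(x)\,\Delta x,

where x and \Delta x are n-vectors, F(x) and F(x + \Delta x) are m-vectors, and F'(x) is the m-by-n Jacobian.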
235
00:28:37,340 --> 00:28:53,060
So let's write that down. So now it's Newton's method for a system of equations in its purest form.
236
00:28:53,480 --> 00:28:59,540
So we would be given an initial guess, which is now a vector.
237
00:29:01,390 --> 00:29:11,530
x nought, and then we enter into our loop: for k = 0, 1, 2, ..., we evaluate
238
00:29:14,680 --> 00:29:27,580
the function, so that's F of x_k, and also the derivative, the Jacobian, which is F prime of x_k.
239
00:29:29,720 --> 00:29:42,830
Now, I'm assuming here that m equals n, that we're in the square case. You can imagine that in practice, evaluating the Jacobian is a non-trivial matter.
240
00:29:43,280 --> 00:29:46,549
That's a huge issue, with different ways to do it, exactly
241
00:29:46,550 --> 00:29:52,700
or approximately. And now we want to do our update.
242
00:29:54,350 --> 00:29:59,750
We need a step. And the step is now defined by a linear system of equations.
243
00:30:00,170 --> 00:30:09,530
So here's the system of equations: F prime of x_k times s_k is equal to minus F of x_k.
244
00:30:11,460 --> 00:30:22,330
And we're solving that for the step. Before, in the scalar case, we just said s_k equals minus F over F prime.
245
00:30:22,330 --> 00:30:28,660
But now it's a system of equations. We have to solve this n by n system of equations, and that gives us our step.
246
00:30:29,200 --> 00:30:33,220
And then x_{k+1} is equal to x_k plus s_k.
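As a MATLAB sketch, assuming function handles F and J for the function and its Jacobian are given (an illustration, not the lecture's actual code):

    x = x0;                     % initial guess, a column vector
    for k = 1:20
        s = -J(x)\F(x);         % solve the linear system J(x)*s = -F(x)
        x = x + s;              % x_{k+1} = x_k + s_k
    end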
247
00:30:43,600 --> 00:30:50,290
So that's a pretty straightforward idea. A lot of optimisation consists of how to make that idea practical for big problems.
248
00:30:50,560 --> 00:31:01,340
Now let's look at it in its simplest form. So what I'm going to do here is look at the code called M20 Newton system.
249
00:31:02,960 --> 00:31:09,700
So if you look at that, it's on the back of the other ones. I guess once again I might as well write it down.
250
00:31:10,330 --> 00:31:17,710
Well, no, it's not important; you can see the function. So the function F that we're working with here has two variables and two equations.
251
00:31:18,070 --> 00:31:21,940
So sin of x1 plus x2, minus e to the minus x1 squared, equals zero.
252
00:31:22,340 --> 00:31:27,670
Then the second equation is 3 x1 minus x1 x2 squared minus one equals zero.
253
00:31:27,970 --> 00:31:31,480
So we're trying to find a root of those two equations in two unknowns.
254
00:31:31,870 --> 00:31:40,330
And in this case, I've explicitly written down a formula for the Jacobian, which of course would not typically be possible in a numerical calculation.
255
00:31:40,720 --> 00:31:46,870
So this little code will just explicitly do Newton's iteration for that system of two equations.
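For this particular system the handles might look as follows; a sketch rather than the actual M20 file, with the second equation in the form 3*x1 - x1*x2^2 - 1 = 0 that reappears in the Chebfun version later in the lecture:

    F = @(x) [sin(x(1)+x(2)) - exp(-x(1)^2);
              3*x(1) - x(1)*x(2)^2 - 1];
    J = @(x) [cos(x(1)+x(2)) + 2*x(1)*exp(-x(1)^2),  cos(x(1)+x(2));
              3 - x(2)^2,                           -2*x(1)*x(2)];
    x = [0; 0];                 % initial guess
    for k = 1:10
        x = x - J(x)\F(x);      % Newton step; the first iterate is (1/3, 2/3)
    end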
256
00:31:48,190 --> 00:31:56,440
Let's see what happens. I'll type M20 Newton system.
257
00:31:57,700 --> 00:32:01,620
Oh, dear. Why is it not running? Yeah.
258
00:32:03,920 --> 00:32:09,600
What did I do wrong here? Hmm.
259
00:32:09,900 --> 00:32:15,300
That's interesting. I tested this on my desktop, but not on this laptop.
260
00:32:15,720 --> 00:32:26,940
Let's see what's going on in line 32. Column one. Oh sorry.
261
00:32:28,480 --> 00:32:32,170
I added some comments, that's the problem, and forgot to make them into comments.
262
00:32:32,200 --> 00:32:37,880
My apologies. So, they're in your printout there.
263
00:32:38,120 --> 00:32:43,150
And I added them at the last minute. Right.
264
00:32:49,930 --> 00:32:55,659
Okay. So the code is set up to start from the initial guess of zero, zero, and then it solves
265
00:32:55,660 --> 00:33:01,870
this two by two system of equations, and gets one third, two thirds, and it goes on.
266
00:33:01,900 --> 00:33:05,440
Now you can guess at this point, we probably have a couple of digits of accuracy.
267
00:33:06,730 --> 00:33:10,090
You can guess now that we still probably have a couple of digits of accuracy.
268
00:33:10,090 --> 00:33:16,149
Or maybe your best guess would be about four digits now, because this iterate agrees with the last one to a couple of places,
269
00:33:16,150 --> 00:33:25,900
so it's probably accurate to about four places. Sure enough, we had about five digits of accuracy and of course, numerically beautiful convergence.
270
00:33:26,140 --> 00:33:27,730
So that was an easy problem.
271
00:33:28,000 --> 00:33:34,780
Now, of course, you don't know, looking at that, whether that's the unique solution; you're pretty certain that it is a solution.
272
00:33:34,780 --> 00:33:44,200
But there might be others; you simply don't know from looking at that. Let me now run the commands that I added there.
273
00:33:45,160 --> 00:33:50,379
So of course, Chebfun has a lot of fun with minimising and finding roots and so on.
274
00:33:50,380 --> 00:33:58,000
With Chebfun2 functions, let's look at this system in Chebfun2. It's two equations and two unknowns, so I'll do it one way first.
275
00:33:58,000 --> 00:34:01,360
One way to do that would be to declare
276
00:34:04,960 --> 00:34:17,110
a function corresponding to x on the unit square, by default, and another function corresponding to y on the unit square.
277
00:34:18,400 --> 00:34:36,600
And then I could create f1 to be sin of x plus y minus e to the minus x squared, and f2 to be three times x minus x times y squared, minus one.
278
00:34:36,630 --> 00:34:38,400
If I have a typo, please let me know.
279
00:34:38,670 --> 00:34:46,980
So Chebfun2 is doing its usual thing of representing functions by polynomials that match them to about 16 digits of precision.
280
00:34:47,790 --> 00:34:54,090
And then there's the roots command, which in principle aims to find all the roots.
281
00:34:55,880 --> 00:35:00,800
Of this system of two equations in the region where they are defined, which was the unit square.
282
00:35:00,800 --> 00:35:07,730
So -1 to 1, -1 to 1. So one would hope that we'll find the root that we just found.
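In code, a sketch of those commands, assuming Chebfun (www.chebfun.org) is on the path:

    x = chebfun2(@(x,y) x);            % the coordinate function x on [-1,1]^2
    y = chebfun2(@(x,y) y);            % the coordinate function y on [-1,1]^2
    f1 = sin(x+y) - exp(-x.^2);
    f2 = 3*x - x.*y.^2 - 1;
    r = roots(f1, f2)                  % all common zeros in the unit square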
283
00:35:08,180 --> 00:35:11,470
Maybe we'll find others, too, because Newton's method didn't tell us:
284
00:35:11,480 --> 00:35:16,510
are there others? So we proceed, and we seem to find just one.
285
00:35:16,510 --> 00:35:21,820
That's not a proof, of course, but it's maybe some evidence that that's the unique root in the unit square.
286
00:35:21,970 --> 00:35:26,860
Of course you could try enlarging. Suppose I change it to the interval from
287
00:35:28,630 --> 00:35:33,580
minus 3 to 3, minus 3 to 3. So now we're in a box nine times as big.
288
00:35:37,580 --> 00:35:43,430
I haven't tried this, so I don't know what will happen if I now define F as before.
289
00:35:45,090 --> 00:35:49,980
I've now created a global representation of these functions on this bigger domain.
290
00:35:50,520 --> 00:35:56,400
I don't know what roots we'll find. Oh, okay.
291
00:35:56,690 --> 00:36:03,829
So that's new to me. Hopefully those are all correct. We could get some numerical evidence of whether they're correct.
292
00:36:03,830 --> 00:36:09,020
So suppose I say f1 of... or, I'll say r equals...
293
00:36:09,590 --> 00:36:21,020
And so then I'll say f1 of r. Ah, I'm doing something wrong, so maybe I won't waste your time on that, but probably those are all valid roots.
294
00:36:21,110 --> 00:36:24,110
Do we recognise among them the one that Newton's method got?
295
00:36:25,090 --> 00:36:28,420
I think that's the fourth one in the list, right? 0.387.
296
00:36:28,750 --> 00:36:35,200
So I currently believe that there are six roots of this pair of equations in the box
297
00:36:35,200 --> 00:36:38,739
minus three to three, minus three to three. Now, Chebfun2
298
00:36:38,740 --> 00:36:42,340
also lets you work with vector calculus sort of directly.
299
00:36:42,350 --> 00:36:48,130
So another way to do it would be to say F equals chebfun2v,
300
00:36:49,550 --> 00:36:53,870
v for vector, and then we could simply make a vector function of both things.
301
00:36:53,870 --> 00:36:58,130
So I could say sin of x plus y minus e to the minus x squared.
302
00:36:58,370 --> 00:37:07,220
That's the first component. And then the second component is the other function, three x minus x times y
303
00:37:08,730 --> 00:37:12,510
squared, minus one. Stop me if you see an error.
304
00:37:14,100 --> 00:37:21,780
So that's now a vector: capital F is now a proper vector function, and you can ask for the roots of the vector.
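A sketch of the chebfun2v version, again assuming Chebfun, here on the bigger box:

    F = chebfun2v(@(x,y) sin(x+y) - exp(-x.^2), ...
                  @(x,y) 3*x - x.*y.^2 - 1, [-3 3 -3 3]);
    r = roots(F)                       % all roots of the vector function in the box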
305
00:37:23,580 --> 00:37:27,940
And there they are. Let's do the experiment of doing it on minus 3 to 3.
306
00:37:30,270 --> 00:37:39,400
We hope this matches what we got before. Right. Seems to.
307
00:37:40,570 --> 00:37:45,580
So whether that's further evidence or just the same old evidence is an interesting question.
308
00:37:46,180 --> 00:37:49,509
Since I'm doing this Chebfun stuff, let me just go to the more trivial,
309
00:37:49,510 --> 00:37:55,930
one-dimensional case to highlight the significance of local versus global root finding.
310
00:37:56,110 --> 00:38:05,250
So suppose we take an example we looked at before. Suppose I say f equals a one-dimensional function, and I'll pick a Bessel function.
311
00:38:05,260 --> 00:38:11,350
So suppose I say besselj, and I'll do that on the interval from 0 to 20.
312
00:38:12,820 --> 00:38:16,650
So if I plot f. Yup.
313
00:38:16,710 --> 00:38:24,960
That's not going to be interesting. Thank you. Uh, besselj of 0 and x, right.
314
00:38:26,220 --> 00:38:32,440
So if I plot f, I find a function with several zeros.
315
00:38:33,460 --> 00:38:39,280
Now, of course, Newton's method, if you started at any point, probably will converge to one of these zeros, but only one.
316
00:38:39,490 --> 00:38:44,080
If you wanted to find all of them, you would need to enlarge your approach somehow.
317
00:38:44,410 --> 00:38:48,700
In the olden days, when I was your age, that was almost a taboo subject.
318
00:38:48,700 --> 00:38:53,500
Global optimisation was considered provably impossible, so nobody talked about it.
319
00:38:53,770 --> 00:38:59,950
But that's all sort of changed. There are now various contexts in which you can meaningfully do global optimisation.
320
00:39:00,160 --> 00:39:04,840
Of course, a one-dimensional problem like this is very easy, and Chebfun will do that.
321
00:39:05,320 --> 00:39:12,160
Let me remind you how if I say R equals roots of F, of course I get all of them so I could plot them.
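A sketch of that one-dimensional experiment, assuming Chebfun:

    f = chebfun(@(x) besselj(0,x), [0 20]);
    r = roots(f);                      % all zeros of J_0 in [0,20], found globally
    plot(f), hold on
    plot(r, f(r), '.r')                % mark the roots on the curve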
322
00:39:13,390 --> 00:39:16,750
But let's talk about why.
323
00:39:19,110 --> 00:39:23,210
What did I do wrong here? That's worrisome.
324
00:39:23,540 --> 00:39:29,900
Did I? What did I do wrong? Oh.
325
00:39:32,730 --> 00:39:36,760
Oops. Sorry. Hold on.
326
00:39:36,780 --> 00:39:40,860
Plot r against f of r. Is that what I want to do? Thank you.
327
00:39:41,160 --> 00:39:46,410
Yes. Okay. So, of course, we see some kind of global root finding.
328
00:39:46,410 --> 00:39:49,590
But let's talk for a moment. Why was this global?
329
00:39:50,840 --> 00:39:59,600
The answer is that Chebfun happens to represent functions by polynomials and finds roots as eigenvalues of matrices.
330
00:39:59,900 --> 00:40:06,410
And it's a fact of scientific computing that the standard way of finding eigenvalues of matrices is global.
331
00:40:06,410 --> 00:40:13,220
You find all of them. So there's some matrix involved here, which has a lot of eigenvalues, six of which are those dots.
332
00:40:13,640 --> 00:40:21,140
It's because the standard algorithms for eigenvalue problems get all of them that it becomes easy to do global root finding,
333
00:40:21,140 --> 00:40:27,170
at least in one dimension, for Chebfun. And as I think I demonstrated before, it's pretty powerful.
334
00:40:27,170 --> 00:40:33,559
So if I made that the interval from 0 to 20,000, for example... I haven't tried one this big.
335
00:40:33,560 --> 00:40:41,490
Let's see what happens. I'll plot f. So there's your Bessel function.
336
00:40:41,790 --> 00:40:45,390
That's a polynomial of degree 10,000.
337
00:40:46,380 --> 00:40:49,560
Let's say R equals roots of F and see how long it takes.
338
00:40:49,560 --> 00:40:58,600
I predict about half a minute, actually. We're finding the roots of a polynomial of degree
339
00:40:58,600 --> 00:41:01,600
10,000 in order to do global root finding.
340
00:41:01,810 --> 00:41:10,150
Well, 15 seconds, maybe. So suppose I say plot f, and I'll restrict it to a small interval; I'll go from
341
00:41:12,500 --> 00:41:18,860
8082 to 8175.
342
00:41:21,270 --> 00:41:26,910
Okay. And now I'll say plot r against f of r.
343
00:41:27,940 --> 00:41:33,470
Hold on. Plot r against f of r.
344
00:41:33,490 --> 00:41:39,480
Will that work? Yeah.
345
00:41:39,690 --> 00:41:44,550
Okay. So in 15 seconds, we found all the roots of the Bessel function out to 20,000.
346
00:41:45,210 --> 00:41:50,070
There they are. I don't know exactly how accurate they are, but probably 14 or 15 digits.
347
00:41:51,330 --> 00:41:59,340
And that's an example of global zero finding. And I think as the decades go on, optimisation has a larger and larger part that's global.
348
00:41:59,460 --> 00:42:04,740
It's still a very small minority. For serious scientific problems like protein folding,
349
00:42:04,740 --> 00:42:12,750
let's say, you don't have any realistic hope of global optimisation in a provable sense, but of course people are always striving for that.
350
00:42:13,950 --> 00:42:18,540
Okay, so for the last few minutes, I want to begin to talk about minimisation.
351
00:42:31,960 --> 00:42:38,860
And for whatever reason, minimisation is in practice a much vaster scientific enterprise than zero
352
00:42:38,860 --> 00:42:44,410
finding. It just comes up more of the time. So this is really the heart of the field of optimisation.
353
00:42:44,620 --> 00:42:55,900
So let's talk about Newton's method for minimising, and we'll get started in one dimension.
354
00:43:11,750 --> 00:43:17,989
Well, there's no great mystery mathematically about what's involved here: to minimise, you want to find a zero of the derivative,
355
00:43:17,990 --> 00:43:23,900
and that's what Newton's method does. So we're now going to use a lowercase letter.
356
00:43:24,260 --> 00:43:31,820
So suppose now we have lowercase f, a function of a scalar variable.
357
00:43:32,780 --> 00:43:38,840
And our goal is to find x star
358
00:43:41,640 --> 00:43:45,000
such that f of x star is a minimum.
359
00:43:51,780 --> 00:44:01,860
So it's interesting already, if you think about it, you can begin to see a slight difference between minimisation and zero finding.
360
00:44:02,070 --> 00:44:06,720
So suppose we have some function and we want to find its minima. Well.
361
00:44:08,850 --> 00:44:18,270
Of course, these are some local minima, and they all have the property that the derivative is zero there.
362
00:44:19,620 --> 00:44:24,239
So it's interesting that from a global point of view, these are not all equal.
363
00:44:24,240 --> 00:44:30,200
That's a better minimum than this one. Whereas in the zero finding problem, of course, by definition all zeros are equal.
364
00:44:30,450 --> 00:44:34,620
So you can see that minimisation has a complication there,
365
00:44:34,630 --> 00:44:39,930
global versus local, that goes deeper somehow than the analogous situation with zero finding.
366
00:44:41,580 --> 00:44:46,340
So we want to find a minimum. So:
367
00:44:47,250 --> 00:44:51,510
usually, 90% of the time, we're going to seek a local minimum.
368
00:44:54,460 --> 00:44:58,120
And it's obvious what that means. Let's write it down.
369
00:44:58,120 --> 00:45:05,710
We'll say f of x should be bigger than or equal to f of x star for all
370
00:45:08,020 --> 00:45:14,600
x in a neighbourhood of x star.
371
00:45:16,910 --> 00:45:25,400
So that's our definition of a local minimum. And the obvious idea is to use Newton's method to find a zero of the derivative.
372
00:45:32,440 --> 00:45:40,230
So f prime of x star equals zero would be the condition we apply.
373
00:45:40,240 --> 00:45:44,320
And if you'll excuse my being so methodical here, I'm going to write it down.
374
00:45:44,650 --> 00:45:48,430
So this is Newton's method for minimisation of a scalar function.
375
00:45:48,640 --> 00:45:54,330
We're given an initial guess, which would be x nought.
376
00:45:55,230 --> 00:46:00,910
We enter our loop: for k = 0, 1, 2, ...
377
00:46:01,630 --> 00:46:12,560
We define our step s_k. We're now finding a zero of the derivative, so it's the same formula as before, but differentiated.
378
00:46:12,560 --> 00:46:18,770
So it's minus f prime of x_k divided by f double prime
379
00:46:19,710 --> 00:46:28,320
of x_k, and then we update exactly as before: x_{k+1} equals x_k plus s_k.
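As a minimal MATLAB sketch, not a lecture code, with the illustrative function f(x) = exp(x) - 2x, whose minimiser is x* = log 2:

    fp  = @(x) exp(x) - 2;      % f'(x) for f(x) = exp(x) - 2*x
    fpp = @(x) exp(x);          % f''(x)
    x = 0;                      % initial guess
    for k = 1:8
        x = x - fp(x)/fpp(x);   % step s_k = -f'(x_k)/f''(x_k)
    end
    x                           % converges to log(2) = 0.693147...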
380
00:46:36,640 --> 00:46:45,070
So from that, you can perhaps begin to see that minimisation is going to have some problems.
381
00:46:45,400 --> 00:46:49,030
It's a whole other derivative beyond what we had before.
382
00:46:49,390 --> 00:46:54,970
This pure Newton method depends on the curvature of the function. First of all, that's a lot more information.
383
00:46:55,270 --> 00:47:01,660
Second of all, it demands more smoothness of the function for that to really work.
384
00:47:01,990 --> 00:47:09,100
So, of course, in many applications, it will be hard to evaluate that, or maybe the derivative won't even exist if your function isn't very smooth.
385
00:47:09,400 --> 00:47:13,300
And that gives a hint as to the complexity of the problem.
386
00:47:14,020 --> 00:47:18,550
Let me finish by saying another way that one would interpret this.
387
00:47:22,790 --> 00:47:27,450
So, let me call it an equivalent formulation,
388
00:47:27,630 --> 00:47:35,760
while at the same time hinting that it's in some sense a better formulation because it generalises very broadly,
389
00:47:41,400 --> 00:47:46,500
the equivalent way of looking at Newton's method is that we approximate f of x
390
00:47:50,990 --> 00:48:04,220
locally by a parabola. And of course the particular parabola we choose is determined by the derivative and the second derivative at our current point,
391
00:48:04,730 --> 00:48:13,580
and then we set x_{k+1} equal to the minimum of the parabola.
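In symbols, the local quadratic model at x_k is

    m_k(x) = f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2}\, f''(x_k)(x - x_k)^2,

and setting m_k'(x) = 0 recovers x_{k+1} = x_k - f'(x_k)/f''(x_k), exactly the Newton step above.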
392
00:48:23,050 --> 00:48:28,660
Now the reason that's somehow a better formulation is that it immediately generalises.
393
00:48:28,660 --> 00:48:34,270
And this in some sense is really a picture for all kinds of optimisation algorithms.
394
00:48:34,720 --> 00:48:40,390
You do a local approximation, and what changes is that, instead of just a parabola,
395
00:48:40,930 --> 00:48:45,130
you have some model of your function based on whatever data you've got.
396
00:48:47,420 --> 00:48:59,640
And then you minimise that model. So at least for unconstrained optimisation, this is sort of the universal picture of the way most algorithms work.
397
00:48:59,910 --> 00:49:03,780
You locally find some model of what your function looks like.
398
00:49:04,080 --> 00:49:12,090
Use that model to get the next step. And as you go, keep improving the accuracy and probably improving the model.
399
00:49:12,690 --> 00:49:19,020
So next time we'll go into multiple dimensions, but also talk about more realistic aspects beyond Newton's method.