Okay. Good morning, everybody. I hope you've done the quiz, so let's just look at it and see if you got the right answers.

Every square matrix has an eigenvalue decomposition: false, and the simplest example of one that doesn't would be that one. In other words, not every matrix is diagonalizable. Every matrix, rectangular or square, has an SVD: yes, that's true. If A and B are related by a similarity transformation, so everything is square and A = XBX^{-1}, then they have the same eigenvalues: yes, that's true. So we have false, true, true. Number four: a square matrix is singular if and only if it has a zero eigenvalue. True. Number five: the singular values of an orthogonal matrix are all equal to one. That's true. Remember, the singular values tell you about a hyperellipsoid; an orthogonal matrix maps a ball to a ball, so all the singular values are one. Finally, the eigenvalues of an orthogonal matrix are all equal to one. That's false. Because an orthogonal matrix preserves lengths, its eigenvalues all have absolute value one, but that doesn't make them equal to one: the simplest counterexample is minus one, and complex eigenvalues on the unit circle are possible too. So plus or minus one, say, not necessarily one.

Okay. So today we begin the last bit of this term, which is about optimisation. We've been talking about linear algebra; now optimisation. This is a subject that I mentioned at the beginning is one of the pillars of scientific computing. Remember, I drew a tree in which I identified three traditional main areas of numerical analysis and scientific computing: one was linear algebra, one was differential equations, PDEs and ODEs, and the other was optimisation. I think, if anything, over the years this one just keeps getting more and more important. It's an amazingly exciting, flourishing area that's everywhere in science and engineering.

Let me mention a couple of facts about optimisation that make it diverse and interesting. One is that we're typically dealing with functions of several variables, or many variables; optimisation is all about minimising functions of many variables. And the other thing, without which the field wouldn't be so variegated, is that typically there are constraints.

So what is optimisation? It's the subject of minimising or maximising functions, typically with several or many variables and typically with constraints. So it's minimisation and maximisation, and more or less in the same domain is finding roots of functions. That's what optimisation is all about. Nobody ever maximises in practice; you always minimise, so if you want a maximum of something, multiply it by minus one.
The standard is always to talk about minimisation.

So this is a field where there is a totally dominant, great textbook, and that's the book Numerical Optimization by Nocedal and Wright. The second edition came out nine years ago. I think it's a beautiful, straightforward, knowledgeable book that covers all the basic, and a little more than basic, ideas of numerical optimisation. There's absolutely no doubt that if you want to learn about something in this area, the place to start is with this book. Let me pass that around. Oh, before I pass it around, let me read the first two sentences. It's a big, thick book, but well organised; don't worry, the first two paragraphs are just these little paragraphs. The first paragraph begins "People optimize." What they mean by that is, well, let me say what they mean: investors seek to create portfolios, and so on; manufacturers aim for maximum efficiency; engineers adjust parameters to optimize performance; and so on. So people optimize. Then the second paragraph says "Nature optimizes." Physical systems tend to a state of minimum energy; the molecules in an isolated chemical system react with each other until the total potential energy of their electrons is minimized; rays of light follow paths that minimize their travel time. I think that's an intellectually fascinating beginning for the book, because it really is true that optimisation is at the heart of science, because so many processes minimise energy or something like it, and at the heart of engineering, for even more obvious reasons.

Okay. So the beginning point of optimisation, and it's not just the beginning, it's a pervasive idea throughout everything, is Newton's method, which of course we've all heard of. That's certainly where we're going to start, and we're going to begin with Newton's method in one dimension, which is the kind of thing I hope you all learned in high school. So: Newton's method, and I'll put it like this, Newton's method for a single equation. The reason for putting it that way is to emphasise that I'm beginning with root finding rather than minimisation.

I remember once a teacher told me that really mathematics boils down to two things: one is Taylor series, the other is linear algebra, and everything else is variations on those themes. Well, this is Taylor series. Newton's great idea is Taylor series.
Indeed, I think Arnold wrote a book about the history of mathematics in which he essentially claimed that Newton's great contribution was Taylor series: exploiting the smoothness of a function to approximate it by a constant plus a slope plus a curvature, that sort of thing.

So let's write down the notation we're going to use. Suppose we're given a function; the notation I'll use for root finding is capital F of x. When we turn to minimisation, it'll be lowercase f. So suppose we have a function F, which we assume is nonlinear, and our goal is to find a zero of F, also called a root of F. The notation we'll use is x*: we're looking for an x* such that F(x*) = 0.

You all know Newton's method; let me write it down in the format I'll use for a bunch of these iterations. Suppose we're given an initial guess, which I'll call, of course, x_0. Then we enter the loop, and I won't ever talk about termination criteria, so I'll just say: for k = 0, 1, 2, ..., define s_k = -F(x_k)/F'(x_k). Now, s stands for step, because we're going to take a step in that direction. The new iterate is then x_{k+1} = x_k + s_k. So there's the iteration that you all learned in high school.

Now, let me ask, I'm curious: raise your hand if you did learn that in high school. Oh, okay, pretty good. And of course you know the picture associated with this: we have some kind of function, we're trying to find x*, we have some guess x_0, we evaluate the function, then extrapolate along the tangent, and that process converges quickly. In fact, just to say a word about how fast it converges: I won't state it formally as a theorem, but this is the idea. If F is twice differentiable, so the second derivative exists, and moreover the derivative evaluated at a root x* is nonzero, then if x_0 is sufficiently close to x*, we have convergence, and it's quadratic. So we say the convergence is quadratic, which means the number of correct digits doubles at each step, essentially. Later I may write that down more precisely: the error at one step is asymptotically proportional to the square of the error at the previous step.

This is a prototype for the kind of theorem you find all over optimisation. Notice that some key assumptions always turn up. You need some smoothness in your function; if you lose that, then there may be no convergence, or slower convergence.
You need some kind of generic behaviour at the solution you're looking for; otherwise you may have no convergence or slower convergence. And especially with Newton's method, you certainly need to be close enough to the solution. So of course, if the function keeps doing crazy things, then depending where we start, we may converge to nothing, or we may converge to different roots. It's obvious from the picture that if we start near this root we'll converge to it, and if we start near that one we'll converge to that one. Things get more complicated if your function is less smooth or if the derivative is zero at a root. And of course a root is by no means necessarily unique; Newton's method is all about finding one root, not necessarily all of them.

Now I want to show you, as usual, some stuff, so let's go to MATLAB. The first code is the one called m18, which is not a program you're really supposed to look at, because all it does is fiddle around with graphics; but I was giving a public lecture once, so it was worthwhile making such a code. It's called pretty newton, and what it does is take a particular function and colourfully illustrate Newton's method. So there you have the function, which is the blue curve, and we're trying to find that point. There's our initial guess, and you can see that the initial guess is 0.8. None of the digits are correct, so they're all in red. You evaluate the function at your current guess and then extrapolate along the tangent line, and that gives you a new point, 0.208. Still none of the digits are correct, so they're still all in red. Then we evaluate the function there and extrapolate along the tangent. Still none of the digits are correct, so they're still all in red; for the remainder, of course, you won't see very much change, you're mostly going to see the red digits. We evaluate and extrapolate, and now suddenly we have three digits of accuracy: .373 is correct. From now on, every time we keep going, the number of correct digits will approximately double. So I evaluate and then extrapolate, and it exactly doubled, from three digits to six digits. Do it again and we expect about 12 digits: I evaluate and then extrapolate, and as it happens we got exactly 12 digits. On my machine it won't go beyond that; I might be able to nail the final one, but of course I'm working in 16-digit arithmetic, so that's as far as I'll get on this computer.
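Stripped of the graphics, a code like that boils down to just a few lines. Here is a minimal sketch, my own illustration rather than one of the course codes, using f(x) = x^3 - 2 as the test function (the example we'll come to in a moment):

```matlab
f  = @(x) x^3 - 2;            % function whose root we want (root is 2^(1/3))
fp = @(x) 3*x^2;              % its derivative
x = 0.8;                      % initial guess x0
for k = 1:8
    s = -f(x)/fp(x);          % Newton step s_k = -f(x_k)/f'(x_k)
    x = x + s;                % x_{k+1} = x_k + s_k
    fprintf('%20.15f\n', x)   % the number of correct digits roughly doubles each time
end
```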
Great idea. Newton, by the way, didn't do as much with Newton's method as you probably think; I believe he mainly applied it only to polynomials, not to general functions. Simpson, as in Simpson's rule, was the person who really put Newton's method on its feet: Simpson, and Raphson too.

Okay. Now I also want to look at the other codes, the ones called m19. There are three of these, just to explore one problem from different angles. Excuse me. Okay, so here we have the problem, and I might as well write it down: in m19 we're solving f(x) = x^3 - 2 = 0, so we're trying to find the cube root of two. Incidentally, on computers Newton's method is sometimes used even for elementary things like square roots; it's an algorithm so useful that sometimes it's even implemented in the hardware.

So let's look at this from various angles. The first of the codes, m19 newton A, is just straightforward MATLAB. So I'll close that, and let's just type m19 newton A. What you see there is that we've taken our initial guess, 1, and we're trying to find the cube root of two. We iterate and get 4/3; we iterate again and again. You could guess now that we have several digits of accuracy; it looks clear now that we have four or five digits; now it would seem we have maybe nine; and from now on it just keeps agreeing. It's not a proof, of course, but it's pretty clear that we have more or less full precision in the cube root of two. Let's see what happens if I type 2^(1/3). Looks good. That doesn't mean they're the same, by the way; I could say ans minus x, and that might be zero, or it might be some small multiple of machine epsilon. It turns out it is zero: we were lucky.

The next code does the same thing, but symbolically. Now, I would love to show you this properly, but my laptop doesn't have the Symbolic Math Toolbox, so you can try it at home; what I did instead was print a couple of the outputs on the sheet of paper here. So here we're running MATLAB's Symbolic Math Toolbox, which is by no means the world's most beautiful symbolic calculator, but it works for something like this. In MATLAB you can declare a variable to be sym of a string, and so 1 becomes a symbolic 1, and you can go through this whole computation in symbolic mode. And Newton's method for this problem has a succession of iterates that are all rational numbers.
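A minimal sketch of what that symbolic run amounts to, my own reconstruction assuming you have the Symbolic Math Toolbox, is something like this:

```matlab
x = sym(1);                      % symbolic 1: everything from here on is exact rational arithmetic
for k = 1:4
    x = x - (x^3 - 2)/(3*x^2);   % Newton step, carried out exactly
    disp(x)                      % prints 4/3, then 91/72, then 1126819/894348, then a much longer fraction
end
```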
So you see, the first iterate is 1 and then 4/3, which we identified immediately when we ran the floating-point code; but they keep being rational numbers at every stage, 91/72 and so on.

Now, the striking thing about these numbers is that they keep getting longer. If you quantify it, roughly speaking the numerator triples its number of digits at each step, and so does the denominator, because this is a cubic. If it had been a quintic, the numerator would have become five times as long at each step; for a 20th-degree polynomial, 20 times as long. So what you're seeing here is the combinatorial explosion of symbolic computing. Rational arithmetic is, sadly, just not a good idea for scientific computing, because this happens all the time: it explodes combinatorially. That's why rational arithmetic has such a small niche in computational science; usually it just isn't feasible. And if you look at this further: we have 1, 4/3, 91/72, and then, going down three more steps, that approximation to the cube root of two will surely be accurate to 16 digits or so, but it won't be accurate to anything like this many digits. Most of that exactness in the rational numbers is irrelevant to the problem you're trying to solve, and that is very typical of symbolic or exact-arithmetic computing.

Now the third one, m19 newton C, is also something I can't run on my laptop because I don't have the Symbolic Math Toolbox, but it's the other variation on the theme. Standard numerical computing means floating-point arithmetic; one variation is symbolic, and the other variation is extended precision, another important, beautiful idea. So in this code, m19 newton C, I said let's run in 75-digit arithmetic. There you see, printed out, the first few steps in 75-digit arithmetic, and sure enough the second one has one digit, the third one has one digit, the fourth one has three digits; and if you follow along, you see that the last two lines of the output are identical. So sure enough, Newton is doubling the number of digits at each step.

Variable precision is a beautiful, wonderful thing, and I think everyone in my field wishes it were effortlessly available all the time; if only every computer had a knob you could turn to change the precision. Sadly, from an engineering point of view that doesn't seem to be worth the cost, but for an academic like me it would be delightful. Maybe in 50 years computers will do that. In practice, if you want to do variable-precision arithmetic, you go into some rather different mode, and usually there's a significant cost associated with that.
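For reference, the 75-digit computation amounts to roughly the following; again this is my own sketch, assuming the Symbolic Math Toolbox, whose digits and vpa commands control variable-precision arithmetic:

```matlab
digits(75)                        % work to 75 significant digits
x = vpa(1);                       % initial guess, stored as a 75-digit number
for k = 1:7
    x = x - (x^3 - 2)/(3*x^2);    % Newton step in 75-digit arithmetic
    disp(x)                       % correct digits roughly double until all 75 are right
end
```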
When I was your age, quadruple precision was more common than it is now. It's still out there, but back then single, double and quadruple were all very much part of the landscape, so we computed with 8 digits, 16 digits or 32 digits. That's now kind of compressed, and most people most of the time are just working in 16 digits.

Okay. So that's Newton's method in the scalar case. But of course optimisation is about systems of equations, so let's talk about that: Newton's method for a system of equations. Again, I'm writing that out explicitly to emphasise that we're not yet minimising; we're still finding roots of equations.

So now we're going to have a function that maps from a vector space to a vector space. Let's say that capital F maps from R^n to R^n, so we have n variables and n equations, if you like. We're seeking a vector x* in R^n such that F(x*) = 0, and these are vectors, so the zero here is the n-dimensional zero vector. Another way to say it is that we have a system of equations; if you want, we could write them out: the first would be f_1(x_1, ..., x_n) = 0, and that's now the scalar zero, and so on, up to f_n(x_1, ..., x_n) = 0. So a system of equations is the same thing as a single equation in a vector space.

Now, Newton's method in principle carries over exactly, although Newton himself never did that; he never did systems of equations, and I don't remember to what extent it's thought that he understood the idea of calculus in more than one variable. Probably he did; he was smart. The idea is to do exactly the same thing: we need to extrapolate along a tangent, but now the tangent will be defined by some linear algebra rather than just a scalar derivative, so we're going to have to solve a linear system of equations at each step. And for that to make sense, we have to talk about a linearization of the function F.

So here's the definition, and I'll give it in the general setting of a function from one finite-dimensional space to another, say F mapping R^n to R^m. The key thing we need is the derivative of F, which one usually calls the Jacobian, or the Jacobian matrix. The derivative of F at a point x in R^n is a matrix, the m-by-n matrix of partial derivatives. So J for Jacobian: I could say J(x), that's my matrix.
The (i,j) component of that matrix, which you could also write as F'(x), is the partial derivative of the i-th component of F with respect to the j-th component of x, evaluated at the point x: J_ij(x) = ∂f_i/∂x_j (x).

By the way, you notice I don't use special symbols to distinguish vectors. Of course you can: you can write them in boldface, or put arrows or tildes over them. In mathematics one often doesn't do that, so people get used to using the same notation for a vector as for a scalar; I realise that in other fields you may have different habits.

So let's just do an example. You all know partial derivatives, and presumably most of you know Jacobian matrices. The example I'll pick goes from R^2 to R^3. Suppose we had, for example, f_1(x_1, x_2) = x_1^2, f_2(x_1, x_2) = x_1 + x_1 x_2, and f_3(x_1, x_2) = x_1 e^{x_2}. Then the Jacobian matrix F'(x) is a 3-by-2 matrix; let me just write it down. We differentiate the first component with respect to x_1 and x_2, which gives 2x_1 and 0; we differentiate the second with respect to x_1 and x_2, which gives 1 + x_2 and x_1; and we differentiate the third with respect to x_1 and x_2, which gives e^{x_2} and x_1 e^{x_2}.

So Newton's method is going to have to do this kind of thing all the time, though usually the derivative will be calculated numerically rather than symbolically.

Okay. To continue for just a second on the general idea of a Jacobian: this is Newton's idea that locally, any function, if it's smooth, looks linear. The motivation for Jacobians is that if you have this function capital F mapping R^n to R^m and you evaluate it at x plus a small perturbation δx, so that's an n-vector plus another n-vector, then the result should be equal to F(x) plus a suitable perturbation, and to leading order that perturbation is the matrix F'(x) times δx:

F(x + δx) ≈ F(x) + F'(x) δx.

That's the whole point of vector calculus, if you like. F(x + δx) and F(x) are both m-vectors, and the correction term is an m-vector too, arising by multiplying the m-by-n matrix F'(x) by the n-vector δx. So that's what vector calculus starts from, and Newton's method is simply exploiting this property in the case of a square capital F.
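Since the Jacobian is usually computed numerically rather than symbolically, here is a minimal sketch of what that can look like for the little R^2-to-R^3 example above; this is my own illustration, with an arbitrary evaluation point and step size, not one of the course codes:

```matlab
F  = @(x) [x(1)^2; x(1) + x(1)*x(2); x(1)*exp(x(2))];          % the example map from R^2 to R^3
Jx = @(x) [2*x(1) 0; 1+x(2) x(1); exp(x(2)) x(1)*exp(x(2))];   % its exact 3-by-2 Jacobian
x = [0.7; -0.3];                 % an arbitrary point at which to compare
h = 1e-7;                        % finite-difference step size
J = zeros(3,2);
for j = 1:2
    e = zeros(2,1); e(j) = 1;
    J(:,j) = (F(x + h*e) - F(x))/h;   % forward-difference approximation to column j
end
disp(norm(J - Jx(x)))            % agreement to roughly 7 digits, as expected for this h
```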
So let's write that down: Newton's method for a system of equations, in its purest form. We are given an initial guess, which is now a vector, x_0, and then we enter our loop: for k = 0, 1, 2, ..., we evaluate the function, F(x_k), and also the derivative, the Jacobian F'(x_k). I'm assuming here that m equals n, the square case; and you can imagine that in practice evaluating the Jacobian is a non-trivial matter, and there is a huge range of ways to do it, exactly or approximately. Now we want our update. We need a step, and the step is now defined by a linear system of equations: F'(x_k) s_k = -F(x_k), which we solve for the step s_k. Before, in the scalar case, we just said s_k = -F/F'; now it's a system of equations, and we have to solve this n-by-n system to get our step. And then x_{k+1} = x_k + s_k.

So that's a pretty straightforward idea, and a lot of optimisation consists of how to make that idea practical for big problems. Now let's look at it in its simplest form. What I'm going to do here is look at the code called m20 newton system, which you'll find on the back of the other sheets. I might as well write the problem down; well, actually it's not important, since you can see the function in the code. The function F we're working with here has two variables and two equations: sin(x_1 + x_2) - e^{-x_1^2} = 0, and the second equation is 3x_1 - x_1 x_2^2 - 1 = 0. So we're trying to find a root of those two equations in two unknowns. And in this case I've explicitly written down a formula for the Jacobian, which of course would not typically be possible in a realistic numerical calculation. So this little code will just explicitly do Newton's iteration for that system of two equations.
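For the record, here is a minimal sketch of what such a code amounts to; this is my reconstruction of the idea, not the course code itself:

```matlab
F = @(x) [sin(x(1)+x(2)) - exp(-x(1)^2);                          % the two equations
          3*x(1) - x(1)*x(2)^2 - 1];
J = @(x) [cos(x(1)+x(2)) + 2*x(1)*exp(-x(1)^2), cos(x(1)+x(2));   % Jacobian, written out by hand
          3 - x(2)^2,                           -2*x(1)*x(2)];
x = [0; 0];                               % initial guess
for k = 1:8
    s = J(x)\(-F(x));                     % solve J(x_k) s_k = -F(x_k) for the step
    x = x + s;                            % x_{k+1} = x_k + s_k
    fprintf('%18.14f %18.14f\n', x)       % watch both components converge
end
```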
Let's see what happens: I'll type m20 newton system. Oh dear, what did I do wrong here? Hmm, that's interesting; I tested this on my desktop but not on this laptop. Let's see what's going on at line 32, column one. Oh, sorry: I added some comments at the last minute and forgot to make them into comments. My apologies; they're in your printout.

Okay. So the code is set up to start from the initial guess (0, 0), and then it solves this two-by-two system of equations and gets (1/3, 2/3), and it goes on. Now you can guess at this point that we probably have a couple of digits of accuracy. You can guess now that we still probably have a couple of digits, or maybe your best guess would be about four digits, because this iterate agrees with the last one to a couple of places, so it's probably accurate to about four places. Sure enough, we had about five digits of accuracy there, and then, numerically, beautiful convergence. So that was an easy problem. Now, of course, looking at that you don't know whether that's the unique solution; you're pretty certain it is a solution, but there might be others, and you simply can't tell from looking at it.

Let me do the thing I added in those comments. Chebfun, of course, has a lot of fun with minimising and finding roots and so on, and Chebfun2 handles functions of two variables. Let's look at this function in Chebfun2. It's two equations in two unknowns, so one way to do it would be to declare a chebfun2 corresponding to x on the unit square, the default domain, and another corresponding to y on the unit square, and then I could create F1 as sin(x + y) - e^{-x^2} and F2 as 3x - x y^2 - 1; if I have a typo, please let me know. Chebfun2 is doing its usual thing of representing functions by polynomials that match them to about 16 digits of precision. And then there's the roots command, which in principle aims to find all the roots of this system of two equations in the region where they're defined, which was the unit square, -1 to 1 by -1 to 1. One would hope we'll find the root that we just found; maybe we'll find others too, because Newton's method didn't tell us whether there are others. So we proceed, and we seem to find just one. That's not a proof, of course, but it's maybe some evidence that that's the unique root in the unit square. Of course you could try enlarging: suppose I change it to the interval from -3 to 3, by -3 to 3, so now we're in a box nine times as big. I haven't tried this, so I don't know what will happen. I now define F as before, so I've created a global representation of these functions on this bigger domain, and I don't know what roots we'll find. Oh, okay, so that's new to me. Hopefully those are all correct. We could get some numerical evidence of whether they're correct: suppose I say R equals roots, and then evaluate F1 at those points. Hmm, I'm doing something wrong, so maybe I won't waste your time on that, but probably those are all valid roots.
Do we recognise among them the one that Newton's method got? I think that's the fourth one in the list, right, .387. So I currently believe there are six roots of this pair of equations in the box -3 to 3 by -3 to 3.

Chebfun also lets you work with vector calculus more or less directly. Another way to do it would be to say F equals chebfun2v, v for vector, and then we simply make a vector function with both components. So the first component is sin(x + y) - e^{-x^2}, and the second component is the other function, 3x - x y^2, stop me if you see an error, minus 1. So capital F is now a proper vector-valued function, and you can ask for the roots of that vector function. And there they are. Let's do the experiment of doing it on -3 to 3; we hope this matches what we got before. Right, it seems to. Whether that's further evidence or just the same old evidence is an interesting question.

Since I'm doing this Chebfun stuff, let me just go to the more trivial one-dimensional case to highlight the significance of local versus global root finding. Suppose we take an example we looked at before: I'll make f a one-dimensional chebfun, and I'll pick a Bessel function, besselj(0, x) say, on the interval from 0 to 20. So if I plot f; hmm, that's not going to be interesting, thank you; besselj of zero and x, right. So if I plot f, I find a function with several zeros. Now, Newton's method, started at any point, would probably converge to one of these zeros, but only one. If you wanted to find all of them, you would need to enlarge the discussion somehow. In the olden days, when I was your age, that was almost a taboo subject: global optimisation was considered provably impossible, so nobody talked about it. But that's all sort of changed; there are now various contexts in which you can meaningfully do global optimisation. Of course, a one-dimensional problem like this is very easy, and Chebfun will do it. Let me remind you how: if I say r equals roots of f, I get all of them, so I can plot them. But let's talk about why; wait, what did I do wrong here? That's worrisome. What did I do wrong? Oh, oops, sorry, hold on: plot r against f(r), is that what I want to do? Thank you. Yes. Okay.
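Cleaned up, and assuming Chebfun (from chebfun.org) is installed and on your MATLAB path, the commands I was fumbling with amount to roughly this:

```matlab
% Roots of the 2-by-2 system on [-3,3] x [-3,3], via Chebfun2:
x = chebfun2(@(x,y) x, [-3 3 -3 3]);
y = chebfun2(@(x,y) y, [-3 3 -3 3]);
F = chebfun2v(sin(x+y) - exp(-x.^2), 3*x - x.*y.^2 - 1);
r = roots(F)                      % one row per common root; Newton's root should be among them

% Global roots of a one-dimensional function, via Chebfun:
f = chebfun(@(x) besselj(0,x), [0 20]);
rr = roots(f);                    % all the zeros of J_0 in [0,20]
plot(f), hold on, plot(rr, f(rr), '.r'), hold off
```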
So, of course, we see some kind of global root finding. But let's talk for a moment about why this was global. The answer is that Chebfun happens to represent functions by polynomials and to find roots as eigenvalues of matrices, and it's a fact of scientific computing that the standard way of finding eigenvalues of matrices is global: you find all of them. So there's some matrix involved here which has a lot of eigenvalues, six of which are those dots. It's because the standard algorithms for eigenvalue problems get all of them that it becomes easy to do global root finding, at least in one dimension, for Chebfun.

And as I think I demonstrated before, it's pretty powerful. Suppose I made that the interval from 0 to 20,000, for example; I haven't tried one this big, so let's see what happens. I'll plot f: there's your Bessel function, and that's a polynomial of degree around 10,000. Let's say r equals roots of f and see how long it takes; I predict about half a minute. We're finding the roots of a polynomial of degree around 10,000 in order to do global root finding. Fifteen seconds, maybe. So suppose I say plot f and restrict it to a small interval, say from 8082 to 8175, and now I'll say plot r against f(r). Hold on, plot r against f(r); will that work? Yeah, okay. So in 15 seconds we found all the roots of the Bessel function out to 20,000. There they are. I don't know exactly how accurate they are, but probably 14 or 15 digits. And that's an example of global zero finding.

I think as the decades go on, optimisation has a larger and larger part that's global. It's still a very small minority: for serious scientific problems, protein folding let's say, you don't have any realistic hope of global optimisation in a provable sense, but of course people are always striving for that.
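A footnote on that "roots as eigenvalues" remark: plain MATLAB does the same thing for ordinary polynomials, since its roots command computes the eigenvalues of a companion matrix, which is why it returns all the roots at once. A tiny illustration of my own, not from the lecture:

```matlab
p = [1 0 0 -2];           % coefficients of x^3 - 2
roots(p)                  % all three cube roots of 2: one real, two complex
eig(compan(p))            % the same numbers, obtained explicitly as companion-matrix eigenvalues
```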
Okay, so for the last few minutes I want to begin to talk about minimisation. For whatever reason, minimisation is in practice a much vaster scientific enterprise than zero finding; it just comes up more of the time. So this is really the heart of the field of optimisation. Let's talk about Newton's method for minimising, and we'll start in one dimension.

Well, there's no great mystery mathematically about what's involved here: to minimise, you want to find a zero of the derivative, and that's what Newton's method does. So we're now going to use a lowercase letter. Suppose we have lowercase f, a function of a scalar variable, and our goal is to find an x* such that f(x*) is a minimum.

It's interesting already, if you think about it, that you can begin to see a slight difference between minimisation and zero finding. Suppose we have some function and we want to find its minima. There are various local minima, and they all have the property that the derivative is zero there. But from a global point of view these are not all equal: that one is a better minimum than this one. Whereas in the zero-finding problem, by definition, all zeros are equally zero. So minimisation has a complication, global versus local, that somehow goes deeper than the analogous situation with zero finding.

So we want to find a minimum, and usually, 90% of the time, we seek a local minimum. It's obvious what that means, but let's write it down: f(x) >= f(x*) for all x in a neighbourhood of x*. That's our definition of a local minimum. The obvious idea is to use Newton's method to find a zero of the derivative, so f'(x*) = 0 is the condition we apply. If you'll excuse my being so methodical, I'm going to write it down. This is Newton's method for minimisation of a scalar function. We're given an initial guess, x_0. We enter our loop, for k = 0, 1, 2, ..., and we define our step; we're now finding a zero of the derivative, so it's the same formula as before but differentiated: s_k = -f'(x_k)/f''(x_k). Then we update exactly as before: x_{k+1} = x_k + s_k.

From that you can perhaps begin to see that minimisation is going to have some problems. It's a whole further derivative beyond what we had before; this pure Newton method depends on the curvature of the function. First of all, that's a lot more information; second, it demands more smoothness of the function for this to really work. In many applications it will be hard to evaluate that second derivative, or maybe it won't even exist if your function isn't very smooth. And that gives a hint as to the complexity of the problem.
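Here is a minimal sketch of that iteration on a made-up example of my own, f(x) = e^x - 2x, whose minimiser is x* = log 2, just to show the shape of the loop:

```matlab
fp  = @(x) exp(x) - 2;        % f'(x)  for f(x) = exp(x) - 2*x
fpp = @(x) exp(x);            % f''(x)
x = 0;                        % initial guess
for k = 1:6
    s = -fp(x)/fpp(x);        % Newton step for minimisation: s_k = -f'(x_k)/f''(x_k)
    x = x + s;                % x_{k+1} = x_k + s_k
    fprintf('%20.16f\n', x)   % converges quadratically to log(2) = 0.693147...
end
```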
Let me finish by mentioning another way one would interpret this. Call it an equivalent formulation, while at the same time hinting that it's in some sense a better formulation, because it generalises very broadly. The equivalent way of looking at Newton's method is that we approximate f(x) locally by a parabola, the particular parabola being determined by the first and second derivatives at our current point, and then we set x_{k+1} equal to the minimum of that parabola.

The reason that's somehow a better formulation is that it immediately generalises, and this, in some sense, is really the picture for all kinds of optimisation algorithms. You do a local approximation, and what changes is that instead of just a parabola you have some model of your function based on whatever data you've got, and then you minimise that model. At least for unconstrained optimisation, this is sort of the universal picture of the way most algorithms work: you locally build some model of what your function looks like, you use that model to get the next step, and as you go, you keep improving the accuracy and probably the model too.

So next time we'll go into multiple dimensions, but also talk about more realistic aspects beyond Newton's method.