1
00:00:01,020 --> 00:00:06,280
I think everybody's here, so let's get going. I've made a few more copies of the sign-up sheet.
2
00:00:07,060 --> 00:00:11,560
Please keep them circulating. So anybody who didn't sign last lecture, please put your name down.
3
00:00:11,740 --> 00:00:14,170
Now, a comment about that.
4
00:00:14,860 --> 00:00:21,070
I normally put up a list on the web of people in the course by department because it's interesting to see who's in the course.
5
00:00:21,490 --> 00:00:27,700
If for any reason you don't want your name on that list, send me an email and I will omit your name from the list.
6
00:00:28,090 --> 00:00:35,270
I think, by the way, the filming here probably doesn't include your faces, so you're fairly anonymous anyway.
7
00:00:35,740 --> 00:00:42,580
I don't think we're yet available online, but the hope is that before too long you'll be able to look at lectures online if you want to.
8
00:00:43,780 --> 00:00:46,480
This thing that I've put up here has nothing to do with this course,
9
00:00:47,020 --> 00:00:51,220
but it's just something I've been interested in recently that I thought might interest you.
10
00:00:51,910 --> 00:01:00,940
So the centrepiece is this graph. I've been involved for many years with writing papers and serving as a referee and an editor and so on.
11
00:01:01,090 --> 00:01:04,420
And I got interested in the phenomenon that papers are getting longer.
12
00:01:04,840 --> 00:01:09,640
In particular, my organisation is SIAM, the Society for Industrial and Applied Mathematics,
13
00:01:09,940 --> 00:01:17,680
and I took a look at three of the main journals from SIAM: the SIAM Journals on Applied Mathematics, Numerical Analysis, and Scientific Computing.
14
00:01:18,070 --> 00:01:24,520
And this plot shows as a function of time the average length of a paper in these SIAM journals.
15
00:01:24,520 --> 00:01:31,840
It's just astonishing how clean the data is. So basically when my career was getting going, a paper was 12 pages long.
16
00:01:32,110 --> 00:01:36,249
And as your career is getting going, a paper is 24 pages long.
17
00:01:36,250 --> 00:01:38,290
It's a doubling of the length of papers.
18
00:01:38,560 --> 00:01:44,110
There are many things going on here, and indeed it's fairly controversial what the essential explanations are.
19
00:01:44,470 --> 00:01:48,160
But it all happened unintentionally. Nobody planned this.
20
00:01:48,160 --> 00:01:52,780
It's just one of these cultural shifts. I think it's happened in other fields too, to some extent.
21
00:01:52,780 --> 00:01:56,889
And actually, if any of you have views on this matter, please send me email.
22
00:01:56,890 --> 00:02:01,420
I'm very interested in the subject and I plan to write some kind of summary of that subject.
23
00:02:01,600 --> 00:02:05,379
I know, for example, it's also happened in economics, in pure mathematics.
24
00:02:05,380 --> 00:02:09,430
The papers are three times as long and they've been growing in proportion.
25
00:02:10,600 --> 00:02:21,880
Okay. So we're talking about conjugate gradients.
26
00:02:21,890 --> 00:02:25,820
That's what the last lecture was about. And today we're going to continue on that subject.
27
00:02:25,820 --> 00:02:33,740
It's such an important subject, and we'll talk in particular about preconditioning. But we didn't really do things properly last time.
28
00:02:33,740 --> 00:02:34,910
It all went rather quickly.
29
00:02:34,910 --> 00:02:43,880
So I want to bring you back to the last lecture and emphasise this picture which captures the essence of the behaviour of conjugate gradients.
30
00:02:44,240 --> 00:02:49,280
Now you remember that conjugate gradient is an iterative algorithm which
31
00:02:49,700 --> 00:02:56,060
constructs a succession of approximations x sub j to an exact solution x star.
32
00:02:56,510 --> 00:03:02,360
And those approximations lie in these so-called Krylov subspaces, which are getting bigger and bigger.
33
00:03:02,720 --> 00:03:10,400
So at each step you're in a space of dimension one greater, and as a result, your approximations can't get worse.
34
00:03:10,400 --> 00:03:17,630
They have to get better, because conjugate gradients magically finds the optimal approximation in the Krylov space.
35
00:03:17,990 --> 00:03:22,220
So as time goes by, it gets better and better. And the question is how fast?
36
00:03:22,820 --> 00:03:28,790
Now we showed last time you can relate that to a polynomial and it's a polynomial in the matrix.
37
00:03:28,790 --> 00:03:38,570
We're solving Ax = b; A is a very large matrix, and a polynomial of A is involved.
38
00:03:39,770 --> 00:03:47,270
But that in turn we related to the polynomial applied to the eigenvalues of A, and then these
39
00:03:47,270 --> 00:03:53,090
are the crucial pictures. The red lines are supposed to suggest the eigenvalues of your matrix.
40
00:03:53,510 --> 00:03:59,840
And here's a case where the eigenvalues are all positive, but they come close to zero.
41
00:04:01,310 --> 00:04:06,110
And here's a case where the eigenvalues are all positive and they don't come so close to zero.
42
00:04:07,190 --> 00:04:15,230
Now remember that the key fact is how small can a polynomial p be at the eigenvalues
43
00:04:15,800 --> 00:04:19,910
subject to the constraint that it has to take the value one at the origin?
44
00:04:21,240 --> 00:04:30,690
So conjugate gradients is implicitly finding an optimal polynomial, in some sense, that is small on the spectrum but takes the value one at the origin.
45
00:04:31,050 --> 00:04:36,270
Now it's intuitively plausible that if the spectrum comes very close to the origin, that's hard.
46
00:04:36,630 --> 00:04:43,980
So your polynomial is relatively big, whereas if the spectrum is well separated from the origin,
47
00:04:43,980 --> 00:04:47,250
then it's easy and your polynomial is relatively small.
48
00:04:47,640 --> 00:04:53,610
So this is the case where things converge quickly. All of that can be made very precise.
49
00:04:55,380 --> 00:05:03,210
And in particular, the standard thing one says is based on comparing the maximum and the minimum eigenvalue.
50
00:05:03,450 --> 00:05:08,310
So if that's the maximum eigenvalue and that's the minimum eigenvalue.
51
00:05:09,730 --> 00:05:13,180
A is a symmetric positive definite matrix.
52
00:05:13,180 --> 00:05:23,380
So its eigenvalues are real and positive. We define the condition number of A to be the ratio.
53
00:05:24,130 --> 00:05:30,250
So Kappa of A is the ratio of Lambda Max to Lambda Min.
54
00:05:33,310 --> 00:05:37,090
And that's a measure of how difficult this approximation problem is.
55
00:05:37,270 --> 00:05:45,880
If that ratio is high, then you're in this situation with slow convergence and if the ratio is small, you're in this situation with fast convergence.
56
00:05:46,960 --> 00:05:52,030
Now let me write down the theorem we didn't quite get to in the last lecture on that subject.
57
00:05:57,320 --> 00:06:04,040
So the theorem goes like this. It's really a corollary of the theorem I did write down.
58
00:06:05,930 --> 00:06:17,210
And it says this. If you look at the A-norm of the error at the nth step of conjugate gradients and divide it by the A-norm of the initial error,
59
00:06:18,680 --> 00:06:22,760
then that ratio is bounded by twice
60
00:06:24,570 --> 00:06:31,620
the quantity square root of kappa minus one, over square root of kappa plus one, all raised to the power n.
61
00:06:34,550 --> 00:06:42,330
So what's going on there is that we have geometric convergence.
62
00:06:42,330 --> 00:06:49,319
That is to say, as you take more and more steps with each step, you get at least a constant factor improvement in the error.
63
00:06:49,320 --> 00:06:55,469
The error goes down by at least a constant at each step, and the constant depends on the condition number.
64
00:06:55,470 --> 00:06:58,710
If kappa is large, this is close to one, so that's not so good.
65
00:06:58,980 --> 00:07:02,190
If Kappa is small, it's well below one and that's very good.
66
00:07:02,640 --> 00:07:09,360
And incidentally, if kappa is large, then this is approximately two times e
67
00:07:10,680 --> 00:07:14,640
to the minus 2n over the square root of kappa.
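In symbols, the bound just stated, together with its large-kappa approximation (a LaTeX transcription of the spoken formulas):

    \frac{\|e_n\|_A}{\|e_0\|_A} \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{n} \;\approx\; 2\,e^{-2n/\sqrt{\kappa}} \quad (\kappa \gg 1).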
68
00:07:19,270 --> 00:07:27,220
So you get at worst exponential convergence e to the minus constant n as you take more and more steps.
69
00:07:27,610 --> 00:07:30,820
And the timescale of that convergence is the square root of Kappa.
70
00:07:31,240 --> 00:07:37,390
So loosely speaking, it takes square root of Kappa iterations of conjugate gradient to converge.
71
00:07:37,870 --> 00:07:42,070
More precisely, it takes square root of Kappa iterations to get a constant factor improvement.
72
00:07:42,310 --> 00:07:45,820
And then, depending how far you want to go, you have to act accordingly.
73
00:07:46,780 --> 00:07:55,570
It's a convenient fact that E to the minus two is approximately 0.1.
74
00:07:56,260 --> 00:08:03,100
It's about 0.14. So thanks to the two here, what that means is this:
75
00:08:04,610 --> 00:08:12,470
Basically when you take square root of Kappa steps of conjugate gradient, you get one digit improvement or nearly one digit improvement in your error.
76
00:08:13,580 --> 00:08:19,790
So that's an amazing, beautiful fact about conjugate gradients and it's a starting point of all sorts of things.
77
00:08:20,840 --> 00:08:25,090
And let me illustrate that now coming back to what we did in the last lecture.
78
00:08:25,100 --> 00:08:28,930
So there's no new handout here. Let's see.
79
00:08:33,390 --> 00:08:36,450
So we'll go back to MATLAB.
80
00:08:37,480 --> 00:08:44,080
There was a code called M3, the CG convergence code.
81
00:08:44,080 --> 00:08:47,440
And let me run that again. Okay. So just to remind you, it looked like this.
82
00:08:47,740 --> 00:08:56,800
We set A equal to randn(500), a 500 by 500 matrix with random, normally distributed entries.
83
00:08:57,310 --> 00:09:07,360
And then we took A equal to A transpose times A, which gives us a 500 by 500 symmetric positive definite matrix.
84
00:09:07,750 --> 00:09:13,600
And then we set b equal to ones(500,1) to give us a right-hand side.
85
00:09:15,150 --> 00:09:23,850
And now the point is that the smallest eigenvalue of this matrix A is very close to the origin, and we're in this situation.
86
00:09:24,060 --> 00:09:32,250
Let's see how small it is. e equals eig(A), and lambda min equals min(e). Very small.
87
00:09:32,910 --> 00:09:37,330
And lambda max equals max(e). Rather big.
88
00:09:38,110 --> 00:09:44,110
So the condition number is the ratio of Lambda Max divided by Lambda Min.
89
00:09:44,290 --> 00:09:52,300
And that's a bad number, right? So ten to the fifth indicates we're going to take a thousand steps even to get one digit of improvement.
90
00:09:52,630 --> 00:09:59,709
The square root is about 1000. If I type in MATLAB cond(A), I get the same number.
91
00:09:59,710 --> 00:10:03,340
MATLAB computes the condition number using whatever methods it uses.
92
00:10:04,000 --> 00:10:08,770
So that's a bad situation. The square root of kappa
93
00:10:10,760 --> 00:10:21,500
is 656. So that means that if we take 656 steps, we can expect to reduce the error by about a factor of 0.14.
94
00:10:21,530 --> 00:10:32,440
That's awful, right? And sure enough, if you run cg(A,b) using the code I showed you before, it eventually converges.
95
00:10:33,280 --> 00:10:37,750
But that's actually because the matrix is only 500 by 500.
96
00:10:37,760 --> 00:10:43,180
It's not really converging due to this effect, but due to the finite dimensionality of the matrix.
97
00:10:43,420 --> 00:10:49,030
The initial phase is awful. So there are the first steps.
98
00:10:49,060 --> 00:10:54,760
Not much is happening. So, more steps. More steps. You can see the error is going down very, very slowly.
99
00:10:55,090 --> 00:11:00,340
That's an example of an ill-conditioned matrix, for which conjugate gradients is essentially useless.
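Here is a hedged reconstruction of the session so far (cg is the simple demo code from the previous lecture's handout, not library software, and the exact script differs from this sketch):

    A = randn(500);            % 500-by-500 matrix with normally distributed entries
    A = A'*A;                  % make it symmetric positive definite
    b = ones(500,1);           % trivial right-hand side
    e = eig(A);                % eigenvalues: real, positive, some very close to zero
    kappa = max(e)/min(e)      % condition number, around 1e5 for this matrix
    sqrt(kappa)                % roughly how many CG steps per digit of improvement
    cg(A,b)                    % demo conjugate gradients: converges only very slowly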
100
00:11:01,420 --> 00:11:06,720
Now let me do again what I did last time. I then shifted the matrix.
101
00:11:06,730 --> 00:11:12,190
I said A equals A plus 100 times the identity of size 500.
102
00:11:13,390 --> 00:11:21,010
So now, of course, the eigenvalues shift accordingly, and the minimum eigenvalue is now 100 bigger than it used to be.
103
00:11:21,910 --> 00:11:28,030
The maximum eigenvalue is also 100 bigger. So now we have a much better ratio.
104
00:11:28,540 --> 00:11:32,050
The ratio of the biggest to the smallest eigenvalues is only 20.
105
00:11:32,560 --> 00:11:35,740
The square root of 20 is 4.5.
106
00:11:36,460 --> 00:11:44,380
So now we're in this situation, and if we take 4.5 steps of conjugate gradients, we can expect an improvement by a factor of 0.14.
107
00:11:46,230 --> 00:11:49,740
Let's do that. So I'll say cg(A,b).
108
00:11:53,100 --> 00:11:56,790
Now you can see one screenful. We're already getting a couple of digits.
109
00:11:57,060 --> 00:12:05,610
Let's take another screenful. So after two screenfuls, far fewer than 500 steps, I've already got six or eight digits of convergence.
110
00:12:05,910 --> 00:12:12,490
That's conjugate gradients when things are going well. Okay.
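A hedged sketch of the shifted experiment (again using the lecture's demo cg code):

    A = A + 100*eye(500);      % shift the whole spectrum 100 away from the origin
    e = eig(A);
    max(e)/min(e)              % condition number drops to about 20
    cg(A,b)                    % now a couple of screenfuls give six to eight digits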
111
00:12:12,500 --> 00:12:15,500
I meant to say all that last lecture, and I didn't quite get around to it.
112
00:12:18,200 --> 00:12:25,790
Now, let's digress for a moment and play with the handout you did get today, which is called M4, indices.
113
00:12:26,000 --> 00:12:28,310
This is just sort of exploring MATLAB,
114
00:12:30,230 --> 00:12:35,510
just to remind you of some of the stuff you can do in MATLAB and in other languages too, if they're at a high level.
115
00:12:35,960 --> 00:12:42,770
The point being to encourage you to think at the matrix and vector level rather than the individual entry level.
116
00:12:43,070 --> 00:12:47,240
Of course, ultimately computers are mostly working with individual entries,
117
00:12:47,510 --> 00:12:52,640
but you shouldn't be thinking at that level and you shouldn't be writing triply nested for loops ever.
118
00:12:53,740 --> 00:12:57,820
So let's just play with that handout. If I say one colon six,
119
00:12:59,020 --> 00:13:04,060
I get a vector of length six. If I say one colon two colon six,
120
00:13:05,060 --> 00:13:15,770
it's spaced by two. When MATLAB was invented in the late seventies, there existed one or two things that had similar structure, like APL maybe,
121
00:13:16,010 --> 00:13:19,430
but this way of user interaction was really non-standard.
122
00:13:19,700 --> 00:13:23,210
It seems so elementary to us now, but it was a big transition.
123
00:13:24,170 --> 00:13:31,190
A few more examples. So if I say six colon minus one colon one, I get a vector of decreasing numbers.
124
00:13:31,970 --> 00:13:40,790
Let's say a equals 1 colon 16. Let's say a of end minus three colon end.
125
00:13:42,380 --> 00:13:49,680
Oops — end minus three. So end always refers to the last entry in a vector.
126
00:13:51,200 --> 00:14:00,770
Let's suppose I say ans of 1 colon 3. ans always refers to the most recent thing that you've computed and not given a name to.
127
00:14:02,210 --> 00:14:05,600
Suppose I say a of two colon five colon 16.
128
00:14:07,070 --> 00:14:16,390
Well: two, seven, twelve. Suppose I say A equals reshape of little a, four, four. Little a is
129
00:14:19,830 --> 00:14:23,220
a vector, but if you reshape it, you can turn it into a matrix.
130
00:14:23,400 --> 00:14:28,110
So I've taken that data and strung it out column wise into a four by four matrix.
131
00:14:29,250 --> 00:14:33,840
Suppose I say A of two colon five colon 16,
132
00:14:33,900 --> 00:14:38,100
now with a capital A. Well, I get the same numbers as before,
133
00:14:38,280 --> 00:14:44,700
which means that in MATLAB you can index a two dimensional matrix by a single index if you wish to do that.
134
00:14:47,710 --> 00:14:52,870
Suppose I say repmat — repeat matrix — of ans, six,
135
00:14:53,850 --> 00:14:55,839
one. Well,
136
00:14:55,840 --> 00:15:08,140
ans is this most recent thing we've computed, this row vector, and repmat will take that matrix or vector and tile it in a six by one array.
137
00:15:08,500 --> 00:15:15,660
So we get that. Now let's take a and change it a little bit.
138
00:15:15,670 --> 00:15:20,200
I'll say A of two colon five colon 16 equals
139
00:15:22,440 --> 00:15:25,530
97, 98, 99.
140
00:15:26,430 --> 00:15:35,160
So there you see we've modified some entries of the matrix. Or suppose I say A of two colon five colon 16 equals zero.
141
00:15:35,460 --> 00:15:41,040
So the point here is that if you're setting a bunch of entries to a single number, you don't have to make that a vector.
142
00:15:41,040 --> 00:15:48,460
On the right-hand side you can just write the number. Suppose I say A of colon comma two,
143
00:15:49,390 --> 00:15:55,920
then you get the second column of the matrix. Similarly, A of two comma colon would give you the second row.
144
00:15:56,620 --> 00:16:01,360
Or suppose I say A of colon comma two equals the empty vector.
145
00:16:01,900 --> 00:16:06,580
Then you see A has now been changed. It has removed the second column.
146
00:16:09,470 --> 00:16:13,520
Suppose I say find of A double-equals zero?
147
00:16:14,120 --> 00:16:23,929
A logical equals. So it finds those indices where A takes the value zero and notice by default it's doing that in the single index mode.
148
00:16:23,930 --> 00:16:28,850
So the second entry is a zero and also the eighth entry in the matrix is a zero.
149
00:16:30,790 --> 00:16:34,510
Suppose I say A of ans equals 15?
150
00:16:34,720 --> 00:16:40,420
Well, it will take these two indices we've just found and change the value of a at those points to 15.
151
00:16:43,580 --> 00:16:46,670
Suppose I say find of A double-equals 15.
152
00:16:46,970 --> 00:16:55,940
There are three of them now. On the other hand, suppose I want two indices; I could say [i,j] equals find of A double-equals 15.
153
00:16:57,110 --> 00:17:03,890
And then you see it's giving us the row and column indices of the three entries of a that are 15.
154
00:17:05,690 --> 00:17:08,899
Or suppose I say A double-equals 15 —
155
00:17:08,900 --> 00:17:16,340
so a logical condition that will give me a matrix of zeros and ones with a one in each position where a equals 15.
156
00:17:18,930 --> 00:17:23,460
Suppose I say a greater than 15? That's another logical condition.
157
00:17:23,850 --> 00:17:28,740
So again, I'll get a matrix of zeros and ones showing me where A is bigger than 15.
158
00:17:29,670 --> 00:17:39,390
Suppose I say find of — sorry — find of mod of A comma two, double-equals zero. That will find the entries of A
159
00:17:40,420 --> 00:17:45,420
that are zero mod two — in other words, even. So there they are.
160
00:17:45,580 --> 00:17:49,780
Here's a again. So those are the indices of the even entries of a.
161
00:17:52,340 --> 00:17:58,400
Suppose I say A of ans equals 100 times A of ans.
162
00:17:59,950 --> 00:18:04,090
We've now taken the even entries of the matrix and multiplied them by 100.
163
00:18:05,050 --> 00:18:08,080
Suppose I say A equals rand of five?
164
00:18:10,030 --> 00:18:18,400
There we have some numbers. Let me make it format short. So there we have a five by five array of random numbers between zero and one.
165
00:18:19,700 --> 00:18:25,729
Suppose I say A of five colon four colon 21 equals one
166
00:18:25,730 --> 00:18:37,160
colon five, squared. Now what I've done is I've taken the cross-diagonal entries and I've changed them from random numbers into 1, 4, 9, 16, 25.
167
00:18:39,070 --> 00:18:45,790
Now suppose I say A of A not-equal round of A, equals 999.
168
00:18:46,870 --> 00:18:56,440
Round of a of course rounds a number to an integer. So the places where A is not equal to its rounded value are the places where it's not an integer.
169
00:18:57,040 --> 00:19:00,970
So you see, that's turned all the random numbers into 999.
170
00:19:01,750 --> 00:19:07,210
This is the way in high-level languages: you can do a lot of stuff, and it's good always to try,
171
00:19:07,450 --> 00:19:11,830
as a matter of principle, to do things the slick way. Okay.
172
00:19:11,900 --> 00:19:16,180
That was the end of the digression. I know there's a lot of variety in this room.
173
00:19:16,190 --> 00:19:19,690
Some of you have used MATLAB for years, others not so long.
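A condensed, hedged sketch of the indexing session just demonstrated (the values match what was spoken; the live session did a few more steps, and the M4 handout itself may differ in detail):

    a = 1:16;                      % row vector 1..16
    a(end-3:end)                   % 'end' is the last index: 13 14 15 16
    A = reshape(a,4,4);            % string the data out column-wise into a 4-by-4 matrix
    A(2:5:16)                      % single-index access into a matrix: 2 7 12
    A(2:5:16) = [97 98 99];        % modify those entries
    A(:,2) = [];                   % delete the second column
    find(A==0)                     % linear indices where a logical condition holds
    k = find(mod(A,2)==0);         % indices of the even entries
    A(k) = 100*A(k);               % multiply just those entries by 100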
174
00:19:26,470 --> 00:19:36,400
Okay. So back to our main story, Preconditioning.
175
00:19:38,860 --> 00:19:47,680
The lesson here is that if you have a matrix problem whose spectrum is well separated from the origin, then conjugate gradients will converge quickly.
176
00:19:50,170 --> 00:19:55,480
So that suggests, given a matrix problem Ax = b, we should add 100 times the identity to it.
177
00:19:56,200 --> 00:20:03,189
The trouble is, you can't do that. If Ax = b, you can't change A and still have an equivalent problem.
178
00:20:03,190 --> 00:20:07,240
There's no way automatically to turn that picture into that picture.
179
00:20:07,420 --> 00:20:09,610
If there were, it would be a very different world.
180
00:20:10,570 --> 00:20:22,780
So adding a multiple of the identity is not a successful idea in general. But multiplying is different — preconditioning is a multiplicative idea
181
00:20:26,980 --> 00:20:32,469
which sort of came along in the 1970s, and there are variations on this.
182
00:20:32,470 --> 00:20:35,830
And the one I'll talk about is what you could call left preconditioning.
183
00:20:36,100 --> 00:20:41,050
Suppose Ax = b is the problem you want to solve.
184
00:20:42,310 --> 00:20:46,540
Well, that's of course equivalent to M inverse
185
00:20:47,230 --> 00:20:54,010
A x equals M inverse b, for any matrix M that's nonsingular.
186
00:20:55,660 --> 00:21:00,280
And the whole idea of preconditioning is to find matrices M
187
00:21:01,830 --> 00:21:10,080
which have two properties. One is that you can effectively compute the inverse, or at least solve matrix problems involving them reasonably fast.
188
00:21:10,680 --> 00:21:16,110
And the other is that this improves the picture to a better conditioned matrix problem.
189
00:21:17,070 --> 00:21:21,320
The surprising thing is that in almost all applications there are ways to do this.
190
00:21:21,330 --> 00:21:27,120
So there's a huge business of finding good preconditioners, and that's more or less what I'm talking about today.
191
00:21:27,660 --> 00:21:38,760
So let me say more. Now, of course, if A is symmetric positive definite, which it had better be for conjugate gradients, then
192
00:21:40,750 --> 00:21:44,990
if we take M and factor it
193
00:21:46,210 --> 00:21:48,940
in that form, that's called a Cholesky factorisation.
194
00:21:49,120 --> 00:21:55,359
So if we have a symmetric positive definite preconditioner and we factor it into a product of one matrix,
195
00:21:55,360 --> 00:22:05,020
transpose times that matrix, then another way to write the equation turns out to be c inverse.
196
00:22:05,410 --> 00:22:08,530
A C minus transpose.
197
00:22:11,010 --> 00:22:16,470
C transpose x equals C inverse.
198
00:22:17,440 --> 00:22:22,180
b. I'm not going to go through that, and indeed I'm not going to use this formulation.
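In symbols, the symmetry-preserving form just alluded to (a reconstruction of the board formula, assuming the factorisation M = C C^T):

    \left(C^{-1} A\, C^{-T}\right)\left(C^{T} x\right) \;=\; C^{-1} b.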
199
00:22:22,330 --> 00:22:26,770
But the point is that in practice, symmetry can often be preserved.
200
00:22:27,010 --> 00:22:34,120
So if you have a symmetric problem and you want to precondition it, there are ways to do that, to preserve the symmetry and.
201
00:22:35,320 --> 00:22:40,060
That's just mechanics. I'm not going to talk about it. Symmetry can be
202
00:22:42,150 --> 00:22:53,610
preserved when you precondition. But I'm just going to talk about the fundamental idea, using this basic mathematical idea.
203
00:22:54,980 --> 00:23:00,170
So we want to improve the conditioning of a matrix by pre multiplication by something.
204
00:23:00,770 --> 00:23:06,200
Let's take a look, just for fun, at the paper that's among the handouts, by Meijerink and van der Vorst.
205
00:23:07,610 --> 00:23:10,910
So this is truly one of the greatest hits of numerical analysis.
206
00:23:16,470 --> 00:23:19,620
There's a lot of controversy about who invented Preconditioning.
207
00:23:19,620 --> 00:23:24,270
And without a doubt, these were not the first people to have ideas in this direction.
208
00:23:24,630 --> 00:23:31,110
On the other hand, they were the ones who put it on the map. This paper really did change computational science slowly.
209
00:23:31,410 --> 00:23:36,660
It was an agonising story for the authors. I think they wrote the paper in like 72 or something.
210
00:23:37,020 --> 00:23:45,750
Had a lot of trouble with referees. There was one particularly famous referee whose name I will not utter who tried to squelch the paper.
211
00:23:46,500 --> 00:23:51,240
Eventually it got published years later, and slowly it took off.
212
00:23:51,480 --> 00:23:53,340
So starting in the late seventies,
213
00:23:53,340 --> 00:24:00,840
everybody was talking about preconditioned conjugate gradients, and I'll say a word in a moment about their preconditioner,
214
00:24:01,800 --> 00:24:07,110
which is just one of many methods, but a particularly good one, called incomplete factorisation.
215
00:24:07,440 --> 00:24:13,049
So van der Vorst in particular has been one of the great figures in iterative linear algebra for many years.
216
00:24:13,050 --> 00:24:17,070
Henk van der Vorst, from the University of Utrecht in the Netherlands.
217
00:24:18,690 --> 00:24:20,969
So you've now seen two papers that changed the world.
218
00:24:20,970 --> 00:24:28,800
One was Hestenes and Stiefel on conjugate gradients, and then this one on preconditioned conjugate gradients 25 years later.
219
00:24:33,460 --> 00:24:41,170
Okay. So what I want to do is illustrate a bit of the effect a preconditioner can have.
220
00:24:41,470 --> 00:24:46,450
So we're now going to look at the codes called prcg and M5.
221
00:24:55,490 --> 00:25:02,840
So let me remind you that MATLAB has built into it a preconditioned conjugate gradient code called pcg.
222
00:25:03,860 --> 00:25:08,570
And then in addition, I've written a few elementary ones which are not software,
223
00:25:08,580 --> 00:25:12,680
they're just illustrations of how simple the preconditioning idea can be.
224
00:25:12,980 --> 00:25:18,650
So you look at my codes in order to get sort of the essential mathematical idea, but you would never use them as software.
225
00:25:20,030 --> 00:25:26,540
So cg is unpreconditioned, and prcg is preconditioned with a preconditioner M, just as written there.
226
00:25:26,750 --> 00:25:29,780
And let's play with that. So let's say.
227
00:25:30,020 --> 00:25:34,310
Suppose A is the diagonal matrix with entries from 1 to 100.
228
00:25:37,970 --> 00:25:43,280
And let's suppose b is the usual trivial all-ones right-hand side.
229
00:25:45,920 --> 00:25:52,969
And let's try cg of A and b. So that's an example of not-so-good convergence.
230
00:25:52,970 --> 00:26:00,270
It's going, but not so great. Let's take I to be the identity of size 100.
231
00:26:02,130 --> 00:26:11,850
And now let's try various preconditioners. So first of all, suppose I try preconditioned conjugate gradients and my M matrix is the identity.
232
00:26:13,270 --> 00:26:19,120
So I precondition by replacing that by this, where M is the identity; of course it will have no effect.
233
00:26:20,080 --> 00:26:24,730
So that's one extreme: you get exactly — what have I done?
234
00:26:24,970 --> 00:26:28,630
Oops. The same numbers as before. Here's a plot of the convergence.
235
00:26:28,900 --> 00:26:32,050
Eventually it gets there, but it's not especially great.
236
00:26:37,090 --> 00:26:45,010
Let's go to the other extreme. So suppose I say prcg of A, b,
237
00:26:45,400 --> 00:26:57,380
A. So now I'm going to precondition this problem by setting M equal to A, so it becomes x equals A inverse b. It will converge in one step.
238
00:27:02,160 --> 00:27:06,300
Oops. What am I doing wrong here? Okay, it converged in one step.
239
00:27:06,320 --> 00:27:10,430
The x-axis should really say 0 to 1 instead of 1 to 2.
240
00:27:10,440 --> 00:27:16,700
Basically in one step it converges. So that's the perfect preconditioner, when M equals A itself.
241
00:27:16,710 --> 00:27:21,360
But of course, the whole point is that we assume A is hard to invert.
242
00:27:21,360 --> 00:27:25,470
So although it's a perfect preconditioner, it's an unusable preconditioner.
243
00:27:26,070 --> 00:27:33,180
The whole point is to find a compromise: some matrix M which can be inverted quickly, and which improves the conditioning.
244
00:27:34,760 --> 00:27:41,750
So in this simple illustration, let's just try M as the square root of the diagonal.
245
00:27:42,350 --> 00:27:51,190
So let's say D equals diag of square root of 1 to 100.
246
00:27:54,010 --> 00:28:01,330
Now, our matrix happened to be diagonal. So if I took D equal to the diagonal of the matrix, it would be A itself.
247
00:28:01,510 --> 00:28:04,420
That's why I've put in the square root just to mess things up a little bit.
248
00:28:04,720 --> 00:28:11,850
So here's an example of a matrix that captures some of the flavour of A but not all of it, and it's a pretty good preconditioner.
249
00:28:12,190 --> 00:28:16,280
So if I say prcg
250
00:28:16,290 --> 00:28:19,530
of A, b, D —
251
00:28:22,800 --> 00:28:27,690
This is more typical of the kind of behaviour you strive for in using conjugate gradients.
252
00:28:27,870 --> 00:28:31,950
You see, we're having good smooth convergence at a good geometric rate.
253
00:28:32,250 --> 00:28:38,370
In 30 steps we've got ten digits of accuracy. This matrix was only 100 by 100, so that's not too exciting.
254
00:28:38,370 --> 00:28:45,260
But it might have been 10,000 by 10,000. Let's do a bigger problem.
255
00:28:50,660 --> 00:28:57,770
Oh, I'm sorry. I forgot that I was going to do the same thing using the built-in MATLAB preconditioned conjugate gradients.
256
00:28:57,770 --> 00:29:07,160
So this is now the built-in MATLAB code. If I say pcg of A, b, and then I give it a tolerance and I tell it how many steps it can take,
257
00:29:08,660 --> 00:29:13,130
Then eventually it converges in an automated way.
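A hedged sketch of this small experiment (prcg is the lecture's demo code; pcg is MATLAB's built-in solver, and the tolerance and step limit are illustrative):

    A = diag(1:100);               % diagonal test matrix, condition number 100
    b = ones(100,1);
    D = diag(sqrt(1:100));         % preconditioner: square root of the diagonal
    prcg(A,b,D)                    % demo code: roughly ten digits in about 30 steps
    x = pcg(A,b,1e-10,100);        % built-in code, given a tolerance and a step limit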
258
00:29:14,060 --> 00:29:29,340
So let's now do a bigger problem. I'll say n equals 30,000 and I'll allocate some memory — sparse allocation of memory, spalloc of n, n, 4n.
259
00:29:29,880 --> 00:29:39,140
So we've just allocated — we've told MATLAB that we're going to be working with an n by n sparse matrix, and give it 4n
260
00:29:41,360 --> 00:29:53,150
double precision memory locations. And I'm going to set I to be an index set going from n, 30,000, in steps of n minus eight, up to n squared.
261
00:29:54,140 --> 00:30:02,030
And I'm going to set A of I equal to one. So you see I'm now doing the kind of fancy stuff with indices that I showed you a moment ago.
262
00:30:02,330 --> 00:30:06,590
Fancy fooling around. We won't know what that looks like until we draw a picture.
263
00:30:07,770 --> 00:30:16,890
But first let me say A equals A plus A transpose plus diag of sparse of 1 to n.
264
00:30:18,210 --> 00:30:21,870
So what I'm doing here is just fooling around and constructing an example matrix.
265
00:30:22,110 --> 00:30:29,640
It's 30,000 by 30,000. Unless you're very quick, you probably can't visualise its sparsity pattern. On the diagonal
266
00:30:29,940 --> 00:30:36,420
it has 1 to n, and on the off-diagonal it has the number one appearing in various positions.
267
00:30:37,290 --> 00:30:41,680
So it's a sparse matrix and it's symmetric. Here's what it looks like.
268
00:30:44,290 --> 00:30:48,730
To me that looks like some kind of a cake that people have decorated with frosting or something.
269
00:30:48,970 --> 00:30:56,110
So on the diagonal it goes from 1 to 30000 and all the off diagonal blue dots are the numbers one.
270
00:30:57,070 --> 00:31:03,760
So a typical sparse matrix, and it's not at all clear what kind of structure it's got, whether it's a usable structure or what.
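A hedged reconstruction of the construction just described (the stride n-8 in the index vector is a guess at the garbled spoken value; any stride that scatters ones off the diagonal gives the same flavour):

    n = 30000;
    A = spalloc(n,n,4*n);              % preallocate a sparse n-by-n matrix, room for 4n nonzeros
    I = n:n-8:n^2;                     % scattered linear indices (the stride is an assumption)
    A(I) = 1;                          % put ones at those positions
    A = A + A' + diag(sparse(1:n));    % symmetrise and put 1..n on the diagonal
    spy(A)                             % look at the sparsity pattern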
271
00:31:06,040 --> 00:31:07,659
Now let's solve a problem with that.
272
00:31:07,660 --> 00:31:22,660
So I'll say b equals ones of 30,000 comma 1, and I'll say cg of A, b. So we're now trying unpreconditioned conjugate gradients.
273
00:31:24,880 --> 00:31:28,270
The first screenful immediately tells you that it's not doing very well.
274
00:31:28,270 --> 00:31:37,360
We've got about a factor of four in one screenful. It's converging slowly, so without a preconditioner, conjugate gradients is not satisfactory.
275
00:31:40,900 --> 00:31:44,830
Now let's precondition, and here's what I'll do. I'm going to take a diagonal preconditioner.
276
00:31:45,130 --> 00:31:51,220
And this is a very standard, important idea. In fact, this is almost the basic first preconditioner
277
00:31:51,220 --> 00:31:58,330
you should always try for every problem. A is some complicated matrix with nonzeros all over the place.
278
00:31:59,630 --> 00:32:06,770
This constructs a diagonal matrix with the diagonal entries of A, and surprisingly often that's a pretty good preconditioner.
279
00:32:08,220 --> 00:32:18,220
So let's try it. Let's say x equals prcg of A, b with the diagonal preconditioner.
280
00:32:24,040 --> 00:32:27,579
So look at that. It's beautiful behaviour for this matrix.
281
00:32:27,580 --> 00:32:32,379
In seven steps we got 12 digits of accuracy so that's better than usual.
282
00:32:32,380 --> 00:32:41,230
If you achieve this you are in very good shape, spectacularly effective preconditioning just by using the diagonal of the matrix.
283
00:32:41,410 --> 00:32:45,440
Even this idea wasn't really around until the 1970s.
284
00:32:45,550 --> 00:32:51,670
There are precursors, of course, but nobody was really taking advantage of this kind of thing until the seventies.
285
00:32:55,390 --> 00:33:03,370
Of course one can also do it using the built-in software. So suppose I say pcg of A, b, 1e —
286
00:33:04,750 --> 00:33:10,620
Oh, my. If you press the wrong thing, it can be annoying.
287
00:33:10,740 --> 00:33:21,510
1e minus 10, 100, D. So using the built-in MATLAB pcg code, you see in five steps it's happy.
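A hedged sketch of the diagonal (Jacobi) preconditioning step (prcg is the lecture's demo code; pcg is the built-in, called with the tolerance and step limit as spoken):

    b = ones(n,1);
    D = diag(diag(A));                 % diagonal preconditioner: just the diagonal of A
    x = prcg(A,b,D);                   % demo code: about 12 digits in 7 steps
    x = pcg(A,b,1e-10,100,D);          % built-in preconditioned CG: happy in a few steps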
288
00:33:25,630 --> 00:33:31,480
So that looks like a great example of a matrix where iterative methods are spectacularly good.
289
00:33:31,690 --> 00:33:35,860
We could even do a tic and toc on that. Suppose I say tic —
290
00:33:38,030 --> 00:33:44,690
— toc. So you see, in about a tenth of a second we get this 12-digit accuracy.
291
00:33:45,530 --> 00:33:51,050
Let's try, for contrast, doing A backslash b.
292
00:33:56,170 --> 00:34:01,330
So this is MATLAB's standard built-in Ax = b solver, backslash.
293
00:34:01,990 --> 00:34:06,490
And exactly what that does is an interesting story that we haven't talked about.
294
00:34:07,960 --> 00:34:12,790
So. You might expect it to take a staggeringly long amount of time.
295
00:34:12,910 --> 00:34:18,520
Remember the first lecture we did a 4000 by 4000 matrix and it took several seconds.
296
00:34:18,700 --> 00:34:22,089
This is a 30,000 by 30,000 matrix. So who knows?
297
00:34:22,090 --> 00:34:27,010
It may take minutes, or not. Let's put a tic and toc around it to see how long it takes.
298
00:34:29,250 --> 00:34:32,310
Well, look at that. That's also spectacularly fast.
299
00:34:33,510 --> 00:34:37,500
This, it turns out, is using a completely different method. It's not iterative.
300
00:34:37,710 --> 00:34:44,340
It's using a method that cleverly reorders the rows and columns of the matrix to bring those ones closer to the diagonal.
301
00:34:44,610 --> 00:34:47,940
So this is a surprise. When I tried it, it surprised me.
302
00:34:48,210 --> 00:34:52,530
I thought I had cooked up an example in which you really needed an iterative method.
303
00:34:52,830 --> 00:34:57,000
But this experiment shows that for this example, you don't need an iterative method.
304
00:34:57,330 --> 00:35:01,200
And in general, you should never assume a problem needs an iterative method.
305
00:35:01,230 --> 00:35:08,100
It's always worth trying these more elementary so-called direct methods, which in MATLAB are encoded with backslash.
306
00:35:08,550 --> 00:35:17,980
A rule of thumb is that often problems that come from a two dimensional spatial domain don't need iterative methods.
307
00:35:18,000 --> 00:35:25,620
Often, backslash type algorithms work fine. If you come from a three dimensional spatial domain more often, you do need iterative methods.
308
00:35:29,250 --> 00:35:34,050
Let's compare the accuracy. Suppose I say max of abs of x minus x two.
309
00:35:34,320 --> 00:35:39,510
So I'm now comparing the result of preconditioned conjugate gradients to the result of backslash,
310
00:35:39,510 --> 00:35:44,280
which is a direct, non-iterative method, and I find perfectly good agreement.
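A hedged sketch of the timing and accuracy comparison (x is the preconditioned CG result from above, x2 the backslash result; timings will vary by machine):

    tic, x = pcg(A,b,1e-10,100,D); toc     % iterative solve: on the order of a tenth of a second
    tic, x2 = A\b; toc                     % sparse direct solve with backslash: also very fast
    max(abs(x - x2))                       % the two answers agree to many digits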
311
00:35:48,460 --> 00:35:54,030
Okay. So what can one do in a course like this with preconditioners?
312
00:35:54,130 --> 00:35:58,709
I can tell you they're incredibly important. It's kind of embarrassing how important they are.
313
00:35:58,710 --> 00:36:03,720
Computational science is not in principle about linear algebra, and yet in practice it often is.
314
00:36:04,170 --> 00:36:09,870
And linear algebra is not in principle about preconditioning. And yet in practice it often is.
315
00:36:10,200 --> 00:36:15,690
It's just remarkable how much of computational science often comes down to having a good preconditioner.
316
00:36:16,110 --> 00:36:20,340
And what I want to do in the rest of this lecture is just sort of talk about
317
00:36:20,850 --> 00:36:27,450
14 classes of preconditioners that have proved important in the last 40 years.
318
00:36:27,630 --> 00:36:35,940
Just to give you a sense of some of the breadth of this. Now, maybe I can get away with even shutting that down.
319
00:36:35,940 --> 00:36:39,540
If I shut down the computer, will that cause trouble? Yes.
320
00:36:39,540 --> 00:36:42,600
I won't get away with that then. So. Okay, we'll stick with this screen.
321
00:36:51,430 --> 00:36:56,320
So this is what I'm going to call Section 1.7, Whistle-Stop Survey
322
00:36:58,810 --> 00:37:13,379
of preconditioners. And before we do our survey, let me mention that Oxford, as it happens,
323
00:37:13,380 --> 00:37:19,800
has an expert in the field, and that's Andy Wathen in the Numerical Analysis Group upstairs.
324
00:37:20,970 --> 00:37:25,950
So he is in the field of numerical linear algebra and in particular, he's an expert in Preconditioning.
325
00:37:26,160 --> 00:37:33,510
So if you are trying to precondition your problem, this is really one of the world's most knowledgeable people right here in this building.
326
00:37:35,290 --> 00:37:42,360
Okay. Oh, another remark to make is: for reading, you might try
327
00:37:42,390 --> 00:37:49,800
Chapter 40 of Trefethen and Bau, which is more or less what I'm just about to mention to you, minus a few things.
328
00:37:50,930 --> 00:38:00,050
That's a discussion of a number of preconditioners. Okay, so, our whistle-stop tour: we have about a minute per preconditioner.
329
00:38:02,060 --> 00:38:05,150
The first one to mention is diagonal scaling.
330
00:38:09,540 --> 00:38:12,620
And there's another word that's often used for that.
331
00:38:12,630 --> 00:38:19,380
It's called Jacobi, because the great mathematician Jacobi used ideas like this.
332
00:38:19,800 --> 00:38:26,850
He was amazingly modern in the middle of the 19th century. He was doing things which still have numerical application today.
333
00:38:27,940 --> 00:38:31,250
So this is absolutely the first thing to try for almost any problem.
334
00:38:31,270 --> 00:38:44,000
It's often good. At the other extreme in complexity is the idea that Meijerink and van der Vorst introduced, namely incomplete factorisation.
335
00:38:48,060 --> 00:38:59,070
Now, let me just say a word about what that means. So we haven't talked about Gaussian elimination because it's too familiar, basically.
336
00:38:59,310 --> 00:39:07,410
But the standard method by which people solve Ax = b is to factor A into a product of a lower triangular and an upper triangular matrix.
337
00:39:07,620 --> 00:39:11,430
That's Gaussian elimination. Now, if you can do that, that's great.
338
00:39:12,270 --> 00:39:17,760
The trouble is that even if A is sparse, the factors L and U are typically not so sparse.
339
00:39:17,970 --> 00:39:22,440
So you destroy a lot of the sparsity of your problem when you factor it.
340
00:39:22,650 --> 00:39:27,930
Moreover, if L and U aren't so sparse, a lot of work is going to be involved.
341
00:39:28,440 --> 00:39:36,020
So the idea that Meijerink and van der Vorst had was to do a factorisation in which you force a lot of the entries to be zero,
342
00:39:36,030 --> 00:39:44,240
even though mathematically they shouldn't be. Instead of A equals L U, which would be too expensive and not sparse enough,
343
00:39:46,270 --> 00:39:52,989
they do some kind of A equals an approximate L times an approximate U, where it's not equal —
344
00:39:52,990 --> 00:39:59,560
it's approximately equal — where L tilde and U tilde are forced to have a sparsity pattern.
345
00:40:00,010 --> 00:40:04,650
That turns out to be a very powerful technique. It's much more complicated.
346
00:40:04,900 --> 00:40:11,440
Of course, you have to decide exactly what your strategy will be, and the analysis of it also turns out to be not at all trivial.
347
00:40:11,440 --> 00:40:17,920
So it took many years for people to be comfortable with exactly why it worked as well as it does.
348
00:40:18,100 --> 00:40:21,280
But it really is a powerful technique. And in MATLAB.
349
00:40:23,800 --> 00:40:27,220
There are some built in capabilities for this, in particular.
350
00:40:27,220 --> 00:40:39,820
luinc, which is a nonsymmetric incomplete factorisation code, and then cholinc — chol
351
00:40:39,820 --> 00:40:45,160
stands for Cholesky, and that refers to the symmetric case of incomplete factorisation.
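A hedged sketch of using an incomplete factorisation as a preconditioner (in current MATLAB the relevant functions are ilu and ichol rather than luinc and cholinc; the call below assumes A is sparse symmetric positive definite, and the tolerance and step limit are illustrative):

    L = ichol(A);                      % incomplete Cholesky: A is approximately L*L', with L forced to stay sparse
    x = pcg(A,b,1e-10,100,L,L');       % pass the two triangular factors to pcg as the preconditioner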
352
00:40:49,120 --> 00:40:53,650
The third one I'd like to mention is what I call coarse grid approximation.
353
00:40:58,080 --> 00:41:05,940
So in computational science, very often we have a big matrix problem because there's some grid that we've constructed in two or three dimensions.
354
00:41:06,210 --> 00:41:10,950
And of course we want the grid to be as fine as possible if it's 100 by 100 by 100,
355
00:41:10,950 --> 00:41:14,160
and there's just a scalar variable, that's a million unknowns already.
356
00:41:14,970 --> 00:41:18,299
If there are three variables at each point, that's 3 million unknowns.
357
00:41:18,300 --> 00:41:22,080
If it's 200 by 200 by 200, that's 24 million unknowns.
358
00:41:22,080 --> 00:41:28,110
So you very quickly get big matrix problems. However, as you might imagine.
359
00:41:30,020 --> 00:41:38,840
if you have a very coarse grid, that might give you some kind of crude approximation to the solution on the grid you ultimately really care about.
360
00:41:39,500 --> 00:41:45,080
So just to draw the simplest picture: the four by four problem
361
00:41:46,660 --> 00:41:51,459
might be some kind of approximation to the eight by eight problem, and so on.
362
00:41:51,460 --> 00:41:55,240
That might be some kind of an approximation to the 16 by 16 problem.
363
00:41:55,540 --> 00:41:59,410
So this might be a preconditioner for that.
364
00:42:00,160 --> 00:42:05,050
At each step, we might somehow solve a problem involving this relatively easy matrix.
365
00:42:05,290 --> 00:42:09,560
And if we do things right, use that approximation as our preconditioner.
366
00:42:10,030 --> 00:42:18,460
That's a great idea in many cases. And indeed, that's the idea which eventually led to what's called multi grid methods.
367
00:42:21,580 --> 00:42:25,770
So if you do this just once — precondition a fine grid by a coarser grid —
368
00:42:25,780 --> 00:42:33,880
well, that's a preconditioner. But if you recurse on that idea, going down a level to coarser and coarser grids, that's called multigrid iteration.
369
00:42:34,060 --> 00:42:38,440
And that's another one of the greatest-hits ideas of the last few decades: multigrid
370
00:42:38,440 --> 00:42:45,370
iterations for solving the linear algebra problems that arise in computational science.
371
00:42:50,920 --> 00:42:59,020
One more thing to say about that. If you think about what's involved, it's intuitively clear that this has a hope of getting,
372
00:42:59,020 --> 00:43:02,440
as it were, the long-range structure, the big wavelengths.
373
00:43:02,890 --> 00:43:06,580
Obviously, you're not going to resolve the small wavelengths on the coarse grid.
374
00:43:06,790 --> 00:43:13,720
So this preconditioning idea is one of preconditioning the problem by getting the large scale structure somehow.
375
00:43:14,950 --> 00:43:19,720
Now the next idea is almost a dual of that. I could call it local approximation.
376
00:43:26,960 --> 00:43:30,050
So any scientific problem with a spatial aspect to it
377
00:43:31,130 --> 00:43:35,870
probably has some physics that's essentially local, and some other things going on that are reasonably global.
378
00:43:36,020 --> 00:43:39,020
The boundaries may be affecting you in fundamental ways,
379
00:43:39,200 --> 00:43:44,140
but also just whatever locally is diffusing or convecting or whatever will of course, be important.
380
00:43:44,600 --> 00:43:48,950
And another idea for Preconditioning is that if you can do everything sort of locally,
381
00:43:48,950 --> 00:43:54,800
correctly, ignoring the long range, that may correspond to a close-to-diagonal matrix.
382
00:43:55,160 --> 00:44:00,740
So this may give you a preconditioner which is close to diagonal and therefore implementable.
383
00:44:01,430 --> 00:44:09,350
So in some sense, if you omit long-range interactions, that may be an idea for a preconditioner.
384
00:44:17,650 --> 00:44:22,480
Let me mention then, as number five, what I'll call block preconditioners
385
00:44:26,050 --> 00:44:32,290
and, more or less the same idea, domain decomposition.
386
00:44:42,190 --> 00:44:49,269
This comes up in various contexts. Physically, you can imagine a system — let's say you have an aeroplane: it's got a left wing,
387
00:44:49,270 --> 00:44:52,150
it's got a right wing, it's got the body, it's got the engines and so on.
388
00:44:53,530 --> 00:44:58,900
Maybe you can solve the body relatively easily, but then coupling it in with the others leads to complexity.
389
00:44:59,110 --> 00:45:05,890
So the idea here is to use as your preconditioner these various subsystems of the complicated system.
390
00:45:06,520 --> 00:45:15,639
Another way to say that, though, is you've got a giant matrix and maybe the first few columns correspond to the body of the aeroplane,
391
00:45:15,640 --> 00:45:23,140
and the next few columns sort of correspond to the left wing and the next few columns correspond to the right wing and so on and so on.
392
00:45:23,890 --> 00:45:28,150
You can imagine there are all sorts of connections between these different systems.
393
00:45:28,960 --> 00:45:36,160
But if we approximate by pretending that the body is only connected with the body and the left wing is only connected with the left wing,
394
00:45:36,580 --> 00:45:45,050
and the right wing is only connected with the right wing and so on. Then we're getting some kind of a block diagonal structure.
395
00:45:45,320 --> 00:45:51,830
We're omitting crucial parts of the physics that are out here, but hopefully of a lower importance.
396
00:45:52,830 --> 00:45:57,659
So that's another generalisation, if you like, of the idea of a diagonal preconditioner.
397
00:45:57,660 --> 00:46:04,020
All of these ideas are just unbelievably big in practice.
398
00:46:04,020 --> 00:46:09,690
For example, there's a domain decomposition conference that has happened every two years for the last 25 years.
399
00:46:09,690 --> 00:46:14,940
You know, little bits of this have whole research communities and big scientific impact.
400
00:46:19,140 --> 00:46:22,560
The next one I want to mention is low order discretisation.
401
00:46:28,730 --> 00:46:37,070
Now what I mean by that, if you take a differential equation and approximate it by finite differences,
402
00:46:38,450 --> 00:46:42,709
if you just look for second order accuracy, you probably get a reasonably sparse matrix.
403
00:46:42,710 --> 00:46:49,070
But bad accuracy. If you look for fourth or sixth order accuracy, you get a worse matrix, but better accuracy.
404
00:46:49,280 --> 00:46:56,660
So a natural idea is to formulate your discretisation at a reasonably high order to get accuracy on
405
00:46:56,720 --> 00:47:02,450
not too fine a grid, but then use the sparser low order matrix as your preconditioner.
406
00:47:04,380 --> 00:47:09,630
And if you take that idea to the limit, there are things called spectral methods which are sort of of infinite order.
407
00:47:09,840 --> 00:47:14,880
And some people would approximate a spectral method by a finite element method or something.
408
00:47:15,060 --> 00:47:20,460
So you can have an infinite order method preconditioned by a second order method, for example.
409
00:47:26,330 --> 00:47:33,980
Okay. Next thing, let's mention the idea of taking a problem and a constant coefficient approximation to it.
410
00:47:36,000 --> 00:47:41,130
And I'll put in the same category a symmetric approximation.
411
00:47:47,360 --> 00:47:54,830
So in many cases, if your problem had constant coefficients or were symmetric, there might be special fast ways to solve it.
412
00:47:55,010 --> 00:47:59,450
For example, the Poisson equation with constant coefficients.
413
00:48:00,770 --> 00:48:06,860
There are things called fast Poisson solvers which take advantage of the fast Fourier transform to get spectacular speed.
414
00:48:07,400 --> 00:48:12,080
If only my problem had constant coefficients, one would say I could solve it very quickly.
415
00:48:12,350 --> 00:48:17,240
Well, whenever you're in an "if only" situation, then maybe you've got a good preconditioner.
416
00:48:17,420 --> 00:48:20,300
So use the constant coefficient thing to precondition.
417
00:48:20,600 --> 00:48:28,580
Or if your problem is dominated by some symmetric physics that can be encoded in a symmetric matrix, and yet it also has some non symmetric physics.
418
00:48:29,330 --> 00:48:33,660
Maybe the symmetric part can be solved more quickly and can be used as a preconditioner.
419
00:48:37,830 --> 00:48:41,070
The next one is splitting. Related to that.
420
00:48:43,820 --> 00:48:47,270
So I guess the term we like these days would be multi physics.
421
00:48:51,010 --> 00:48:56,500
Suppose you have a problem, as people so often do, which has more than one bit of physics going on.
422
00:48:56,680 --> 00:49:02,800
And the classic example would be in fluid mechanics, you have convection and you also have diffusion viscosity.
423
00:49:03,700 --> 00:49:07,660
So these are different bits of physics and they turn into different bits of mathematics.
424
00:49:07,900 --> 00:49:13,180
When discretised, they turn into different linear algebra problems that all combine into one.
425
00:49:14,380 --> 00:49:18,490
It may be that if you only had one of the problems or the other, you could solve it quickly.
426
00:49:18,640 --> 00:49:21,910
Maybe if you only had the diffusion, you could use a fast Poisson solver.
427
00:49:22,060 --> 00:49:28,630
But the convection makes that difficult. Well, that's a very standard way of devising a preconditioner.
428
00:49:28,750 --> 00:49:34,330
You've set up the big problem; hive off part of it that you can solve as your preconditioner.
429
00:49:38,050 --> 00:49:52,810
The next idea is dimensional splitting, and related to that is the idea of ADI, which stands for alternating direction implicit methods.
430
00:49:53,410 --> 00:49:57,520
Here, as you can guess from the name. You have a problem in 2D or 3D.
431
00:49:57,910 --> 00:50:02,160
If only it had one dimension, then it would be a diagonal matrix.
432
00:50:02,170 --> 00:50:05,110
It's the multi dimensionality that makes it non diagonal.
433
00:50:05,470 --> 00:50:14,049
So the preconditioning idea is to only look at the interactions in x, for example, as your preconditioner — to solve the x part of the problem
434
00:50:14,050 --> 00:50:17,740
exactly. That's not the whole problem, but maybe it's a good preconditioner.
435
00:50:25,140 --> 00:50:32,280
Another idea is to take, as your preconditioner, one step of a classical iteration.
436
00:50:33,150 --> 00:50:43,350
Now, by a classical iteration, I mean the things that go by the names of Gauss-Seidel,
437
00:50:45,820 --> 00:50:51,610
Or successive over relaxation or symmetric successive over relaxation.
438
00:50:52,240 --> 00:50:57,820
These are things that were big in the 1950s before people realised how powerful conjugate gradients was.
439
00:50:58,180 --> 00:51:09,280
They're much less important now. It's much rarer these days for people to solve a problem just by using successive over-relaxation.
440
00:51:09,520 --> 00:51:13,780
However, these things make great preconditioners for higher order problems,
441
00:51:13,990 --> 00:51:18,400
and this has to do again with separating the local effects from the global effects.
442
00:51:18,670 --> 00:51:22,990
These can sort of take your problem and globalise it in a good way.
443
00:51:28,740 --> 00:51:36,020
Number 11 is the idea of a periodic approximation to a non-periodic problem.
444
00:51:43,240 --> 00:51:46,930
This was invented by Gil Strang at MIT.
445
00:51:47,170 --> 00:51:54,190
He was actually visiting Oxford this term. I'm curious — who here has seen lectures by Gil Strang on the web?
446
00:51:55,440 --> 00:52:00,570
Okay. One or two? Yeah. He has these linear algebra and other things that have been very popular at MIT.
447
00:52:01,080 --> 00:52:03,690
If you Google him, you'll find them. So Strang,
448
00:52:03,690 --> 00:52:10,470
years ago, proposed this idea: take a non-periodic problem, like a Toeplitz matrix, which has constant entries along diagonals.
449
00:52:11,190 --> 00:52:16,680
Imagine the associated periodic version of the problem, which in that example would be a circulant matrix.
450
00:52:17,070 --> 00:52:23,970
It may be you have special methods for the periodic problem, like a fast Fourier transform, so you can solve that one very quickly.
451
00:52:24,450 --> 00:52:27,870
It turns out in some cases it's a spectacularly good preconditioner.
452
00:52:29,730 --> 00:52:33,450
We're a little short on time, but let me quickly at least name the final three.
453
00:52:34,770 --> 00:52:45,620
One idea is to use an unstable method. We haven't talked about stability yet, but some numerical methods are fast but unstable.
454
00:52:45,630 --> 00:52:50,940
They amplify rounding errors. The classic example would be Gaussian elimination without pivoting.
455
00:52:51,240 --> 00:52:57,000
If you don't interchange rows, you get an unstable method of solving Ax = b.
456
00:52:57,720 --> 00:53:01,180
Probably not good by itself, but it might make a great preconditioner.
457
00:53:01,770 --> 00:53:09,240
And Bob Skeel is a famous name in that area. Another idea is polynomial preconditioners.
458
00:53:13,770 --> 00:53:22,550
And the idea here is to approximate A inverse —
459
00:53:22,820 --> 00:53:30,680
to approximate A inverse somehow by a polynomial of A — and that has the advantage of being a beautifully parallelisable idea.
460
00:53:30,690 --> 00:53:37,339
So it's an idea that somehow is never the best preconditioner, but it's always the most parallelisable one.
461
00:53:37,340 --> 00:53:39,320
So the idea keeps having importance.
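As a minimal sketch of the idea, here is a truncated Neumann series used as a polynomial preconditioner (the test matrix, degree, and scaling are illustrative assumptions, not from the lecture; polyprec would live in its own file, or at the end of a script in newer MATLAB):

    A = gallery('poisson',30);             % a sparse SPD test matrix
    b = ones(size(A,1),1);
    alpha = normest(A);                    % rough upper bound on the largest eigenvalue
    papply = @(r) polyprec(A,r,alpha,5);   % applies p(A)*r, an approximation to A\r
    x = pcg(A,b,1e-8,200,papply);          % pcg accepts a function handle as the preconditioner

    function z = polyprec(A,r,alpha,m)
    % z = (1/alpha) * sum_{k=0}^{m} (I - A/alpha)^k * r  (truncated Neumann series for A\r)
    z = r/alpha;  t = r;
    for k = 1:m
        t = t - (A*t)/alpha;               % t is now (I - A/alpha)^k * r
        z = z + t/alpha;
    end
    end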
462
00:53:39,590 --> 00:53:46,639
And actually one of the guys in the Numerical Analysis Group is working with things along these lines to solve eigenvalue problems.
463
00:53:46,640 --> 00:53:55,370
Jared Aurentz is working on that. And finally, let me mention the idea of sparse approximate inverses.
464
00:54:04,690 --> 00:54:09,100
Well, the idea is, if we could do the inverse exactly, that would be great — at least if it were sparse enough.
465
00:54:09,520 --> 00:54:17,140
In this business, people solved least squares problems to explicitly construct a matrix, which is a pretty good inverse.
466
00:54:17,170 --> 00:54:20,350
Not exact, but close to an inverse and is also quite sparse.
467
00:54:20,350 --> 00:54:23,470
So least squares technology turns into making a preconditioner.
468
00:54:23,710 --> 00:54:28,480
So I think this list probably covers more than half of the big ideas out there.
469
00:54:28,480 --> 00:54:33,520
It's just a huge, huge subject. And that's pretty much all we're going to say about it except for the next assignment.
470
00:54:33,940 --> 00:54:34,660
Okay. Thanks.