1
00:00:01,020 --> 00:00:06,280
I think everybody's here, so let's get going. I've made a few more copies of the sign-up sheet.
2
00:00:07,060 --> 00:00:11,560
Please keep them circulating. So anybody who didn't sign last lecture, please put your name down.
3
00:00:11,740 --> 00:00:14,170
Now, a comment about that.
4
00:00:14,860 --> 00:00:21,070
I normally put up a list on the web of people in the course by department because it's interesting to see who's in the course.
5
00:00:21,490 --> 00:00:27,700
If for any reason you don't want your name on that list, send me an email and I will omit your name from the list.
6
00:00:28,090 --> 00:00:35,270
I think, by the way, the filming here probably doesn't include your faces, so you're fairly anonymous anyway.
7
00:00:35,740 --> 00:00:42,580
I don't think we're yet available online, but the hope is that before too long you'll be able to look at lectures online if you want to.
8
00:00:43,780 --> 00:00:46,480
This thing that I've put up here has nothing to do with this course,
9
00:00:47,020 --> 00:00:51,220
but it's just something I've been interested in recently that I thought might interest you.
10
00:00:51,910 --> 00:01:00,940
So the centrepiece is this graph. I've been involved for many years with writing papers and serving as a referee and an editor and so on.
11
00:01:01,090 --> 00:01:04,420
And I got interested in the phenomenon that papers are getting longer.
12
00:01:04,840 --> 00:01:09,640
In particular, my organisation is SIAM, the Society for Industrial and Applied Mathematics,
13
00:01:09,940 --> 00:01:17,680
and I took a look at three of the main journals from SIAM: the SIAM Journals on Applied Mathematics, Numerical Analysis, and Scientific Computing.
14
00:01:18,070 --> 00:01:24,520
And this plot shows as a function of time the average length of a paper in these SIAM journals.
15
00:01:24,520 --> 00:01:31,840
It's just astonishing how clean the data is. So basically when my career was getting going, a paper was 12 pages long.
16
00:01:32,110 --> 00:01:36,249
And as your career is getting going, a paper is 24 pages long.
17
00:01:36,250 --> 00:01:38,290
It's a doubling of the length of papers.
18
00:01:38,560 --> 00:01:44,110
There are many things going on here, and indeed it's fairly controversial what the essential explanations are.
19
00:01:44,470 --> 00:01:48,160
But it all happened unintentionally. Nobody planned this.
20
00:01:48,160 --> 00:01:52,780
It's just one of these cultural shifts. I think it's happened in other fields too, to some extent.
21
00:01:52,780 --> 00:01:56,889
And actually, if any of you have views on this matter, please send me email.
22
00:01:56,890 --> 00:02:01,420
I'm very interested in the subject and I plan to write some kind of summary of that subject.
23
00:02:01,600 --> 00:02:05,379
I know, for example, it's also happened in economics, in pure mathematics.
24
00:02:05,380 --> 00:02:09,430
The papers are three times as long and they've been growing in proportion.
25
00:02:10,600 --> 00:02:21,880
Okay. So we're talking about conjugate gradients.
26
00:02:21,890 --> 00:02:25,820
That's what the last lecture was about. And today we're going to continue on that subject.
27
00:02:25,820 --> 00:02:33,740
It's such an important subject, and we'll talk in particular about preconditioning. But we didn't really do things properly last time.
28
00:02:33,740 --> 00:02:34,910
It all went rather quickly.
29
00:02:34,910 --> 00:02:43,880
So I want to bring you back to the last lecture and emphasise this picture which captures the essence of the behaviour of conjugate gradients.
30
00:02:44,240 --> 00:02:49,280
Now you remember that conjugate gradient is an iterative algorithm which
31
00:02:49,700 --> 00:02:56,060
constructs a succession of approximations x sub j to an exact solution x star.
32
00:02:56,510 --> 00:03:02,360
And those approximations lie in these so-called Krylov subspaces, which are getting bigger and bigger.
33
00:03:02,720 --> 00:03:10,400
So at each step you're in a space of dimension one greater, and as a result, your approximations can't get worse.
34
00:03:10,400 --> 00:03:17,630
They have to get better, because conjugate gradients magically finds the optimal approximation in the Krylov space.
35
00:03:17,990 --> 00:03:22,220
So as time goes by, it gets better and better. And the question is how fast?
36
00:03:22,820 --> 00:03:28,790
Now we showed last time you can relate that to a polynomial and it's a polynomial in the matrix.
37
00:03:28,790 --> 00:03:38,570
We're solving Ax = b; A is a very large matrix, and a polynomial of A is involved.
38
00:03:39,770 --> 00:03:47,270
But that in turn we related to the polynomial applied to the eigenvalues of A, and then these
39
00:03:47,270 --> 00:03:53,090
are the crucial pictures. The red lines are supposed to suggest the eigenvalues of your matrix.
40
00:03:53,510 --> 00:03:59,840
And here's a case where the eigenvalues are all positive, but they come close to zero.
41
00:04:01,310 --> 00:04:06,110
And here's a case where the eigenvalues are all positive and they don't come so close to zero.
42
00:04:07,190 --> 00:04:15,230
Now remember that the key fact is how small can a polynomial p be at the eigenvalues
43
00:04:15,800 --> 00:04:19,910
subject to the constraint that it has to take the value one at the origin?
44
00:04:21,240 --> 00:04:30,690
So conjugate gradients is implicitly finding an optimal polynomial, in some sense, that is small on the spectrum but takes the value one at the origin.
45
00:04:31,050 --> 00:04:36,270
Now it's intuitively plausible that if the spectrum comes very close to the origin, that's hard.
46
00:04:36,630 --> 00:04:43,980
So your polynomial is relatively big, whereas if the spectrum is well separated from the origin,
47
00:04:43,980 --> 00:04:47,250
then it's easy and your polynomial is relatively small.
48
00:04:47,640 --> 00:04:53,610
So this is the case where things converge quickly. All of that can be made very precise.
49
00:04:55,380 --> 00:05:03,210
And in particular, the standard thing one says is based on comparing the maximum and the minimum eigenvalue.
50
00:05:03,450 --> 00:05:08,310
So if that's the maximum eigenvalue and that's the minimum eigenvalue.
51
00:05:09,730 --> 00:05:13,180
A is a symmetric positive definite matrix.
52
00:05:13,180 --> 00:05:23,380
So its eigenvalues are real and positive. We define the condition number of A to be the ratio.
53
00:05:24,130 --> 00:05:30,250
So Kappa of A is the ratio of Lambda Max to Lambda Min.
54
00:05:33,310 --> 00:05:37,090
And that's a measure of how difficult this approximation problem is.
55
00:05:37,270 --> 00:05:45,880
If that ratio is high, then you're in this situation with slow convergence and if the ratio is small, you're in this situation with fast convergence.
56
00:05:46,960 --> 00:05:52,030
Now let me write down the theorem we didn't quite get to in the last lecture on that subject.
57
00:05:57,320 --> 00:06:04,040
So the theorem goes like this. It's really a corollary of the theorem I did write down.
58
00:06:05,930 --> 00:06:17,210
And it says this. If you look at the A-norm of the error at the nth step of conjugate gradients and divide it by the A-norm of the initial error,
59
00:06:18,680 --> 00:06:22,760
then that ratio is bounded by twice
60
00:06:24,570 --> 00:06:31,620
the quantity square root of kappa minus one, over square root of kappa plus one, all raised to the power n.
61
00:06:34,550 --> 00:06:42,330
So what's going on there is that we have geometric convergence.
62
00:06:42,330 --> 00:06:49,319
That is to say, as you take more and more steps with each step, you get at least a constant factor improvement in the error.
63
00:06:49,320 --> 00:06:55,469
The error goes down by at least a constant at each step, and the constant depends on the condition number.
64
00:06:55,470 --> 00:06:58,710
If kappa is large, this is close to one, so that's not so good.
65
00:06:58,980 --> 00:07:02,190
If Kappa is small, it's well below one and that's very good.
66
00:07:02,640 --> 00:07:09,360
And incidentally, if kappa is large, then this is approximately two times e
67
00:07:10,680 --> 00:07:14,640
to the minus 2n over the square root of kappa.
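In symbols, the bound just stated, together with its large-kappa approximation (a LaTeX transcription of the spoken formulas):

    \frac{\|e_n\|_A}{\|e_0\|_A} \;\le\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{n} \;\approx\; 2\,e^{-2n/\sqrt{\kappa}} \quad (\kappa \gg 1).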
68
00:07:19,270 --> 00:07:27,220
So you get at worst exponential convergence e to the minus constant n as you take more and more steps.
69
00:07:27,610 --> 00:07:30,820
And the timescale of that convergence is the square root of Kappa.
70
00:07:31,240 --> 00:07:37,390
So loosely speaking, it takes square root of Kappa iterations of conjugate gradient to converge.
71
00:07:37,870 --> 00:07:42,070
More precisely, it takes square root of Kappa iterations to get a constant factor improvement.
72
00:07:42,310 --> 00:07:45,820
And then, depending how far you want to go, you have to act accordingly.
73
00:07:46,780 --> 00:07:55,570
It's a convenient fact that E to the minus two is approximately 0.1.
74
00:07:56,260 --> 00:08:03,100
It's about 0.14. So thanks to the two here, what that means is this:
75
00:08:04,610 --> 00:08:12,470
Basically when you take square root of Kappa steps of conjugate gradient, you get one digit improvement or nearly one digit improvement in your error.
76
00:08:13,580 --> 00:08:19,790
So that's an amazing, beautiful fact about conjugate gradients and it's a starting point of all sorts of things.
77
00:08:20,840 --> 00:08:25,090
And let me illustrate that now coming back to what we did in the last lecture.
78
00:08:25,100 --> 00:08:28,930
So there's no new handout here. Let's see.
79
00:08:33,390 --> 00:08:36,450
So we'll go back to MATLAB.
80
00:08:37,480 --> 00:08:44,080
There was a code called M3, the CG convergence code.
81
00:08:44,080 --> 00:08:47,440
And let me run that again. Okay. So just to remind you, it looked like this.
82
00:08:47,740 --> 00:08:56,800
We set A equal to randn(500), a 500 by 500 matrix with random, normally distributed entries.
83
00:08:57,310 --> 00:09:07,360
And then we took A equal to A transpose times A, which gives us a 500 by 500 symmetric positive definite matrix.
84
00:09:07,750 --> 00:09:13,600
And then we set b equal to ones(500,1) to give us a right-hand side.
85
00:09:15,150 --> 00:09:23,850
And now the point is that the smallest eigenvalue of this matrix A is very close to the origin, and we're in this situation.
86
00:09:24,060 --> 00:09:32,250
Let's see how small it is. e equals eig(A), and lambda min equals min(e). Very small.
87
00:09:32,910 --> 00:09:37,330
And lambda max equals max(e). Rather big.
88
00:09:38,110 --> 00:09:44,110
So the condition number is the ratio of Lambda Max divided by Lambda Min.
89
00:09:44,290 --> 00:09:52,300
And that's a bad number, right? So ten to the fifth indicates we're going to take a thousand steps even to get one digit of improvement.
90
00:09:52,630 --> 00:09:59,709
The square root is about 1000. If I type in MATLAB cond(A), I get the same number.
91
00:09:59,710 --> 00:10:03,340
MATLAB computes the condition number using whatever methods it uses.
92
00:10:04,000 --> 00:10:08,770
So that's a bad situation. The square root of kappa
93
00:10:10,760 --> 00:10:21,500
is 656. So that means that if we take 656 steps, we can expect to reduce the error by about a factor of 0.14.
94
00:10:21,530 --> 00:10:32,440
That's awful, right? And sure enough, if you run cg(A,b) using the code I showed you before, it eventually converges.
95
00:10:33,280 --> 00:10:37,750
But that's actually because the matrix is only 500 by 500.
96
00:10:37,760 --> 00:10:43,180
It's not really converging due to this effect, but due to the finite dimensionality of the matrix.
97
00:10:43,420 --> 00:10:49,030
The initial phase is awful. So there are the first steps.
98
00:10:49,060 --> 00:10:54,760
Not much is happening. So, more steps. More steps. You can see the error is going down very, very slowly.
99
00:10:55,090 --> 00:11:00,340
That's an example of an ill-conditioned matrix, for which conjugate gradients is essentially useless.
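Here is a hedged reconstruction of the session so far (cg is the simple demo code from the previous lecture's handout, not library software, and the exact script differs from this sketch):

    A = randn(500);            % 500-by-500 matrix with normally distributed entries
    A = A'*A;                  % make it symmetric positive definite
    b = ones(500,1);           % trivial right-hand side
    e = eig(A);                % eigenvalues: real, positive, some very close to zero
    kappa = max(e)/min(e)      % condition number, around 1e5 for this matrix
    sqrt(kappa)                % roughly how many CG steps per digit of improvement
    cg(A,b)                    % demo conjugate gradients: converges only very slowly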
100
00:11:01,420 --> 00:11:06,720
Now let me do again what I did last time. I then shifted the matrix.
101
00:11:06,730 --> 00:11:12,190
I said A equals A plus 100 times the identity of size 500.
102
00:11:13,390 --> 00:11:21,010
So now, of course, the eigenvalues shift accordingly, and the minimum eigenvalue is now 100 bigger than it used to be.
103
00:11:21,910 --> 00:11:28,030
The maximum eigenvalue is also 100 bigger. So now we have a much better ratio.
104
00:11:28,540 --> 00:11:32,050
The ratio of the biggest to the smallest eigenvalues is only 20.
105
00:11:32,560 --> 00:11:35,740
The square root of 20 is 4.5.
106
00:11:36,460 --> 00:11:44,380
So now we're in this situation, and if we take 4.5 steps of conjugate gradients, we can expect an improvement by a factor of 0.14.
107
00:11:46,230 --> 00:11:49,740
Let's do that. So I'll say cg(A,b).
108
00:11:53,100 --> 00:11:56,790
Now you can see one screenful. We're already getting a couple of digits.
109
00:11:57,060 --> 00:12:05,610
Let's take another screenful. So after two screenfuls, far fewer than 500 steps, I've already got six or eight digits of convergence.
110
00:12:05,910 --> 00:12:12,490
That's conjugate gradients when things are going well. Okay.
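A hedged sketch of the shifted experiment (again using the lecture's demo cg code):

    A = A + 100*eye(500);      % shift the whole spectrum 100 away from the origin
    e = eig(A);
    max(e)/min(e)              % condition number drops to about 20
    cg(A,b)                    % now a couple of screenfuls give six to eight digits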
111
00:12:12,500 --> 00:12:15,500
I meant to say all that last lecture, and I didn't quite get around to it.
112
00:12:18,200 --> 00:12:25,790
Now, let's digress for a moment and play with the handout you did get today, which is called M4, indices.
113
00:12:26,000 --> 00:12:28,310
This is just sort of exploring MATLAB,
114
00:12:30,230 --> 00:12:35,510
just to remind you of some of the stuff you can do in MATLAB and in other languages too, if they're at a high level.
115
00:12:35,960 --> 00:12:42,770
The point being to encourage you to think at the matrix and vector level rather than the individual entry level.
116
00:12:43,070 --> 00:12:47,240
Of course, ultimately computers are mostly working with individual entries,
117
00:12:47,510 --> 00:12:52,640
but you shouldn't be thinking at that level and you shouldn't be writing triply nested for loops ever.
118
00:12:53,740 --> 00:12:57,820
So let's just play with that handout. If I say one colon six,
119
00:12:59,020 --> 00:13:04,060
I get a vector of length six. If I say one colon two colon six,
120
00:13:05,060 --> 00:13:15,770
it's spaced by two. When MATLAB was invented in the late seventies, there existed one or two things that had similar structure, like APL maybe,
121
00:13:16,010 --> 00:13:19,430
but this way of user interaction was really non-standard.
122
00:13:19,700 --> 00:13:23,210
It seems so elementary to us now, but it was a big transition.
123
00:13:24,170 --> 00:13:31,190
A few more examples. So if I say six colon minus one colon one, I get a vector of decreasing numbers.
124
00:13:31,970 --> 00:13:40,790
Let's say a equals 1 colon 16. Let's say a of end minus three colon end.
125
00:13:42,380 --> 00:13:49,680
Oops — end minus three. So end always refers to the last entry in a vector.
126
00:13:51,200 --> 00:14:00,770
Let's suppose I say ans of 1 colon 3. ans always refers to the most recent thing that you've computed and not given a name to.
127
00:14:02,210 --> 00:14:05,600
Suppose I say a of two colon five colon 16.
128
00:14:07,070 --> 00:14:16,390
Well: two, seven, twelve. Suppose I say A equals reshape of little a, four, four. Little a is
129
00:14:19,830 --> 00:14:23,220
a vector, but if you reshape it, you can turn it into a matrix.
130
00:14:23,400 --> 00:14:28,110
So I've taken that data and strung it out column wise into a four by four matrix.
131
00:14:29,250 --> 00:14:33,840
Suppose I say A of two colon five colon 16,
132
00:14:33,900 --> 00:14:38,100
now with a capital A. Well, I get the same numbers as before,
133
00:14:38,280 --> 00:14:44,700
which means that in MATLAB you can index a two dimensional matrix by a single index if you wish to do that.
134
00:14:47,710 --> 00:14:52,870
Suppose I say repmat — repeat matrix — of ans, six,
135
00:14:53,850 --> 00:14:55,839
one. Well,
136
00:14:55,840 --> 00:15:08,140
ans is this most recent thing we've computed, this row vector, and repmat will take that matrix or vector and tile it in a six by one array.
137
00:15:08,500 --> 00:15:15,660
So we get that. Now let's take a and change it a little bit.
138
00:15:15,670 --> 00:15:20,200
I'll say A of two colon five colon 16 equals
139
00:15:22,440 --> 00:15:25,530
97, 98, 99.
140
00:15:26,430 --> 00:15:35,160
So there you see we've modified some entries of the matrix. Or suppose I say A of two colon five colon 16 equals zero.
141
00:15:35,460 --> 00:15:41,040
So the point here is that if you're setting a bunch of entries to a single number, you don't have to make that a vector.
142
00:15:41,040 --> 00:15:48,460
On the right-hand side you can just write the number. Suppose I say A of colon comma two,
143
00:15:49,390 --> 00:15:55,920
then you get the second column of the matrix. Similarly, A of two comma colon would give you the second row.
144
00:15:56,620 --> 00:16:01,360
Or suppose I say A of colon comma two equals the empty vector.
145
00:16:01,900 --> 00:16:06,580
Then you see A has now been changed. It has removed the second column.
146
00:16:09,470 --> 00:16:13,520
Suppose I say find of A double-equals zero?
147
00:16:14,120 --> 00:16:23,929
A logical equals. So it finds those indices where A takes the value zero and notice by default it's doing that in the single index mode.
148
00:16:23,930 --> 00:16:28,850
So the second entry is a zero and also the eighth entry in the matrix is a zero.
149
00:16:30,790 --> 00:16:34,510
Suppose I say A of ans equals 15?
150
00:16:34,720 --> 00:16:40,420
Well, it will take these two indices we've just found and change the value of a at those points to 15.
151
00:16:43,580 --> 00:16:46,670
Suppose I say find of A double-equals 15.
152
00:16:46,970 --> 00:16:55,940
There are three of them now. On the other hand, suppose I want two indices; I could say [i,j] equals find of A double-equals 15.
153
00:16:57,110 --> 00:17:03,890
And then you see it's giving us the row and column indices of the three entries of a that are 15.
154
00:17:05,690 --> 00:17:08,899
Or suppose I say A double-equals 15 —
155
00:17:08,900 --> 00:17:16,340
so a logical condition that will give me a matrix of zeros and ones with a one in each position where a equals 15.
156
00:17:18,930 --> 00:17:23,460
Suppose I say a greater than 15? That's another logical condition.
157
00:17:23,850 --> 00:17:28,740
So again, I'll get a matrix of zeros and ones showing me where A is bigger than 15.
158
00:17:29,670 --> 00:17:39,390
Suppose I say find of — sorry — find of mod of A comma two, double-equals zero. That will find the entries of A
159
00:17:40,420 --> 00:17:45,420
that are zero mod two — in other words, even. So there they are.
160
00:17:45,580 --> 00:17:49,780
Here's a again. So those are the indices of the even entries of a.
161
00:17:52,340 --> 00:17:58,400
Suppose I say A of ans equals 100 times A of ans.
162
00:17:59,950 --> 00:18:04,090
We've now taken the even entries of the matrix and multiplied them by 100.
163
00:18:05,050 --> 00:18:08,080
Suppose I say A equals rand of five?
164
00:18:10,030 --> 00:18:18,400
There we have some numbers. Let me make it format short. So there we have a five by five array of random numbers between zero and one.
165
00:18:19,700 --> 00:18:25,729
Suppose I say A of five colon four colon 21 equals one
166
00:18:25,730 --> 00:18:37,160
colon five, squared. Now what I've done is I've taken the cross-diagonal entries and I've changed them from random numbers into 1, 4, 9, 16, 25.
167
00:18:39,070 --> 00:18:45,790
Now suppose I say A of A not-equal round of A, equals 999.
168
00:18:46,870 --> 00:18:56,440
Round of a of course rounds a number to an integer. So the places where A is not equal to its rounded value are the places where it's not an integer.
169
00:18:57,040 --> 00:19:00,970
So you see, that's turned all the random numbers into 999.
170
00:19:01,750 --> 00:19:07,210
This is the way in high-level languages: you can do a lot of stuff, and it's good always to try,
171
00:19:07,450 --> 00:19:11,830
as a matter of principle, to do things the slick way. Okay.
172
00:19:11,900 --> 00:19:16,180
That was the end of the digression. I know there's a lot of variety in this room.
173
00:19:16,190 --> 00:19:19,690
Some of you have used MATLAB for years, others not so long.
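A condensed, hedged sketch of the indexing session just demonstrated (the values match what was spoken; the live session did a few more steps, and the M4 handout itself may differ in detail):

    a = 1:16;                      % row vector 1..16
    a(end-3:end)                   % 'end' is the last index: 13 14 15 16
    A = reshape(a,4,4);            % string the data out column-wise into a 4-by-4 matrix
    A(2:5:16)                      % single-index access into a matrix: 2 7 12
    A(2:5:16) = [97 98 99];        % modify those entries
    A(:,2) = [];                   % delete the second column
    find(A==0)                     % linear indices where a logical condition holds
    k = find(mod(A,2)==0);         % indices of the even entries
    A(k) = 100*A(k);               % multiply just those entries by 100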
174
00:19:26,470 --> 00:19:36,400
Okay. So back to our main story, Preconditioning.
175
00:19:38,860 --> 00:19:47,680
The lesson here is that if you have a matrix problem whose spectrum is well separated from the origin, then conjugate gradients will converge quickly.
176
00:19:50,170 --> 00:19:55,480
So that suggests, given a matrix problem Ax = b, we should add 100 times the identity to it.
177
00:19:56,200 --> 00:20:03,189
The trouble is, you can't do that. If Ax = b, you can't change A and still have an equivalent problem.
178
00:20:03,190 --> 00:20:07,240
There's no way automatically to turn that picture into that picture.
179
00:20:07,420 --> 00:20:09,610
If there were, it would be a very different world.
180
00:20:10,570 --> 00:20:22,780
So adding a multiple of the identity is not a successful idea in general. But multiplying is different — preconditioning is a multiplicative idea
181
00:20:26,980 --> 00:20:32,469
which sort of came along in the 1970s, and there are variations on this.
182
00:20:32,470 --> 00:20:35,830
And the one I'll talk about is what you could call left preconditioning.
183
00:20:36,100 --> 00:20:41,050
Suppose Ax = b is the problem you want to solve.
184
00:20:42,310 --> 00:20:46,540
Well, that's of course equivalent to M inverse
185
00:20:47,230 --> 00:20:54,010
A x equals M inverse b, for any matrix M that's nonsingular.
186
00:20:55,660 --> 00:21:00,280
And the whole idea of preconditioning is to find matrices M
187
00:21:01,830 --> 00:21:10,080
which have two properties. One is that you can effectively compute the inverse, or at least solve matrix problems involving them reasonably fast.
188
00:21:10,680 --> 00:21:16,110
And the other is that this improves the picture to a better conditioned matrix problem.
189
00:21:17,070 --> 00:21:21,320
The surprising thing is that in almost all applications there are ways to do this.
190
00:21:21,330 --> 00:21:27,120
So there's a huge business of finding good preconditioners, and that's more or less what I'm talking about today.
191
00:21:27,660 --> 00:21:38,760
So let me say more. Now, of course, if A is symmetric positive definite, which it had better be for conjugate gradients, then
192
00:21:40,750 --> 00:21:44,990
if we take M and factor it
193
00:21:46,210 --> 00:21:48,940
in that form, that's called a Cholesky factorisation.
194
00:21:49,120 --> 00:21:55,359
So if we have a symmetric positive definite preconditioner and we factor it into a product of one matrix,
195
00:21:55,360 --> 00:22:05,020
transpose times that matrix, then another way to write the equation turns out to be c inverse.
196
00:22:05,410 --> 00:22:08,530
A C minus transpose.
197
00:22:11,010 --> 00:22:16,470
C transpose x equals C inverse.
198
00:22:17,440 --> 00:22:22,180
b. I'm not going to go through that, and indeed I'm not going to use this formulation.
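In symbols, the symmetry-preserving form just alluded to (a reconstruction of the board formula, assuming the factorisation M = C C^T):

    \left(C^{-1} A\, C^{-T}\right)\left(C^{T} x\right) \;=\; C^{-1} b.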
199
00:22:22,330 --> 00:22:26,770
But the point is that in practice, symmetry can often be preserved.
200
00:22:27,010 --> 00:22:34,120
So if you have a symmetric problem and you want to precondition it, there are ways to do that, to preserve the symmetry and.
201
00:22:35,320 --> 00:22:40,060
That's just mechanics. I'm not going to talk about it. Symmetry can be
202
00:22:42,150 --> 00:22:53,610
preserved when you precondition. But I'm just going to talk about the fundamental idea, using this basic mathematical idea.
203
00:22:54,980 --> 00:23:00,170
So we want to improve the conditioning of a matrix by pre multiplication by something.
204
00:23:00,770 --> 00:23:06,200
Let's take a look, just for fun, at the paper that's among the handouts, by Meijerink and van der Vorst.
205
00:23:07,610 --> 00:23:10,910
So this is truly one of the greatest hits of numerical analysis.
206
00:23:16,470 --> 00:23:19,620
There's a lot of controversy about who invented Preconditioning.
207
00:23:19,620 --> 00:23:24,270
And without a doubt, these were not the first people to have ideas in this direction.
208
00:23:24,630 --> 00:23:31,110
On the other hand, they were the ones who put it on the map. This paper really did change computational science slowly.
209
00:23:31,410 --> 00:23:36,660
It was an agonising story for the authors. I think they wrote the paper in like 72 or something.
210
00:23:37,020 --> 00:23:45,750
Had a lot of trouble with referees. There was one particularly famous referee whose name I will not utter who tried to squelch the paper.
211
00:23:46,500 --> 00:23:51,240
Eventually it got published years later, and slowly it took off.
212
00:23:51,480 --> 00:23:53,340
So starting in the late seventies,
213
00:23:53,340 --> 00:24:00,840
everybody was talking about preconditioned conjugate gradients, and I'll say a word in a moment about their preconditioner,
214
00:24:01,800 --> 00:24:07,110
which is just one of many methods, but a particularly good one, called incomplete factorisation.
215
00:24:07,440 --> 00:24:13,049
So van der Vorst in particular has been one of the great figures in iterative linear algebra for many years.
216
00:24:13,050 --> 00:24:17,070
Henk van der Vorst, from the University of Utrecht in the Netherlands.
217
00:24:18,690 --> 00:24:20,969
So you've now seen two papers that changed the world.
218
00:24:20,970 --> 00:24:28,800
One was Hestenes and Stiefel on conjugate gradients, and then this one on preconditioned conjugate gradients 25 years later.
219
00:24:33,460 --> 00:24:41,170
Okay. So what I want to do is illustrate a bit of the effect a preconditioner can have.
220
00:24:41,470 --> 00:24:46,450
So we're now going to look at the codes called prcg and M5.
221
00:24:55,490 --> 00:25:02,840
So let me remind you that MATLAB has built into it a preconditioned conjugate gradient code called pcg.
222
00:25:03,860 --> 00:25:08,570
And then in addition, I've written a few elementary ones which are not software,
223
00:25:08,580 --> 00:25:12,680
they're just illustrations of how simple the preconditioning idea can be.
224
00:25:12,980 --> 00:25:18,650
So you look at my codes in order to get sort of the essential mathematical idea, but you would never use them as software.
225
00:25:20,030 --> 00:25:26,540
So cg is unpreconditioned, and prcg is preconditioned with a preconditioner M, just as written there.
226
00:25:26,750 --> 00:25:29,780
And let's play with that. So let's say.
227
00:25:30,020 --> 00:25:34,310
Suppose A is the diagonal matrix with entries from 1 to 100.
228
00:25:37,970 --> 00:25:43,280
And let's suppose b is the usual trivial all-ones right-hand side.
229
00:25:45,920 --> 00:25:52,969
And let's try cg of A and b. So that's an example of not-so-good convergence.
230
00:25:52,970 --> 00:26:00,270
It's going, but not so great. Let's take I to be the identity of size 100.
231
00:26:02,130 --> 00:26:11,850
And now let's try various preconditioners. So first of all, suppose I try preconditioned conjugate gradients and my M matrix is the identity.
232
00:26:13,270 --> 00:26:19,120
So I precondition by replacing that by this, where M is the identity; of course it will have no effect.
233
00:26:20,080 --> 00:26:24,730
So that's one extreme: you get exactly — what have I done?
234
00:26:24,970 --> 00:26:28,630
Oops. The same numbers as before. Here's a plot of the convergence.
235
00:26:28,900 --> 00:26:32,050
Eventually it gets there, but it's not especially great.
236
00:26:37,090 --> 00:26:45,010
Let's go to the other extreme. So suppose I say prcg of A, b,
237
00:26:45,400 --> 00:26:57,380
A. So now I'm going to precondition this problem by setting M equal to A, so it becomes x equals A inverse b. It will converge in one step.
238
00:27:02,160 --> 00:27:06,300
Oops. What am I doing wrong here? Okay, it converged in one step.
239
00:27:06,320 --> 00:27:10,430
The x-axis should really say 0 to 1 instead of 1 to 2.
240
00:27:10,440 --> 00:27:16,700
Basically in one step it converges. So that's the perfect preconditioner, when M equals A itself.
241
00:27:16,710 --> 00:27:21,360
But of course, the whole point is that we assume A is hard to invert.
242
00:27:21,360 --> 00:27:25,470
So although it's a perfect preconditioner, it's an unusable preconditioner.
243
00:27:26,070 --> 00:27:33,180
The whole point is to find a compromise: some matrix M which can be inverted quickly, and which improves the conditioning.
244
00:27:34,760 --> 00:27:41,750
So in this simple illustration, let's just try M as the square root of the diagonal.
245
00:27:42,350 --> 00:27:51,190
So let's say D equals diag of square root of 1 to 100.
246
00:27:54,010 --> 00:28:01,330
Now, our matrix happened to be diagonal. So if I took D equal to the diagonal of the matrix, it would be A itself.
247
00:28:01,510 --> 00:28:04,420
That's why I've put in the square root just to mess things up a little bit.
248
00:28:04,720 --> 00:28:11,850
So here's an example of a matrix that captures some of the flavour of A but not all of it, and it's a pretty good preconditioner.
249
00:28:12,190 --> 00:28:16,280
So if I say prcg
250
00:28:16,290 --> 00:28:19,530
of A, b, D —
251
00:28:22,800 --> 00:28:27,690
This is more typical of the kind of behaviour you strive for in using conjugate gradients.
252
00:28:27,870 --> 00:28:31,950
You see, we're having good smooth convergence at a good geometric rate.
253
00:28:32,250 --> 00:28:38,370
In 30 steps we've got ten digits of accuracy. This matrix was only 100 by 100, so that's not too exciting.
254
00:28:38,370 --> 00:28:45,260
But it might have been 10,000 by 10,000. Let's do a bigger problem.
255
00:28:50,660 --> 00:28:57,770
Oh, I'm sorry. I forgot that I was going to do the same thing using the built-in MATLAB preconditioned conjugate gradients.
256
00:28:57,770 --> 00:29:07,160
So this is now the built-in MATLAB code. If I say pcg of A, b, and then I give it a tolerance and I tell it how many steps it can take,
257
00:29:08,660 --> 00:29:13,130
Then eventually it converges in an automated way.
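A hedged sketch of this small experiment (prcg is the lecture's demo code; pcg is MATLAB's built-in solver, and the tolerance and step limit are illustrative):

    A = diag(1:100);               % diagonal test matrix, condition number 100
    b = ones(100,1);
    D = diag(sqrt(1:100));         % preconditioner: square root of the diagonal
    prcg(A,b,D)                    % demo code: roughly ten digits in about 30 steps
    x = pcg(A,b,1e-10,100);        % built-in code, given a tolerance and a step limit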
258
00:29:14,060 --> 00:29:29,340
So let's now do a bigger problem. I'll say n equals 30,000 and I'll allocate some memory — sparse allocation of memory, spalloc of n, n, 4n.
259
00:29:29,880 --> 00:29:39,140
So we've just allocated — we've told MATLAB that we're going to be working with an n by n sparse matrix, and give it 4n
260
00:29:41,360 --> 00:29:53,150
double precision memory locations. And I'm going to set I to be an index set going from n, 30,000, in steps of n minus eight, up to n squared.
261
00:29:54,140 --> 00:30:02,030
And I'm going to set A of I equal to one. So you see I'm now doing the kind of fancy stuff with indices that I showed you a moment ago.
262
00:30:02,330 --> 00:30:06,590
Fancy fooling around. We won't know what that looks like until we draw a picture.
263
00:30:07,770 --> 00:30:16,890
But first let me say A equals A plus A transpose plus diag of sparse of 1 to n.
264
00:30:18,210 --> 00:30:21,870
So what I'm doing here is just fooling around and constructing an example matrix.
265
00:30:22,110 --> 00:30:29,640
It's 30,000 by 30,000. Unless you're very quick, you probably can't visualise its sparsity pattern. On the diagonal
266
00:30:29,940 --> 00:30:36,420
it has 1 to n, and on the off-diagonal it has the number one appearing in various positions.
267
00:30:37,290 --> 00:30:41,680
So it's a sparse matrix and it's symmetric. Here's what it looks like.
268
00:30:44,290 --> 00:30:48,730
To me that looks like some kind of a cake that people have decorated with frosting or something.
269
00:30:48,970 --> 00:30:56,110
So on the diagonal it goes from 1 to 30000 and all the off diagonal blue dots are the numbers one.
270
00:30:57,070 --> 00:31:03,760
So a typical sparse matrix, and it's not at all clear what kind of structure it's got, whether it's a usable structure or what.
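A hedged reconstruction of the construction just described (the stride n-8 in the index vector is a guess at the garbled spoken value; any stride that scatters ones off the diagonal gives the same flavour):

    n = 30000;
    A = spalloc(n,n,4*n);              % preallocate a sparse n-by-n matrix, room for 4n nonzeros
    I = n:n-8:n^2;                     % scattered linear indices (the stride is an assumption)
    A(I) = 1;                          % put ones at those positions
    A = A + A' + diag(sparse(1:n));    % symmetrise and put 1..n on the diagonal
    spy(A)                             % look at the sparsity pattern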
271
00:31:06,040 --> 00:31:07,659
Now let's solve a problem with that.
272
00:31:07,660 --> 00:31:22,660
So I'll say b equals ones of 30,000 comma 1, and I'll say cg of A, b. So we're now trying unpreconditioned conjugate gradients.
273
00:31:24,880 --> 00:31:28,270
The first screenful immediately tells you that it's not doing very well.
274
00:31:28,270 --> 00:31:37,360
We've got about a factor of four in one screenful. It's converging slowly, so without a preconditioner, conjugate gradients is not satisfactory.
275
00:31:40,900 --> 00:31:44,830
Now let's precondition, and here's what I'll do. I'm going to take a diagonal preconditioner.
276
00:31:45,130 --> 00:31:51,220
And this is a very standard, important idea. In fact, this is almost the basic first preconditioner
277
00:31:51,220 --> 00:31:58,330
you should always try for every problem. A is some complicated matrix with nonzeros all over the place.
278
00:31:59,630 --> 00:32:06,770
This constructs a diagonal matrix with the diagonal entries of A, and surprisingly often that's a pretty good preconditioner.
279
00:32:08,220 --> 00:32:18,220
So let's try it. Let's say x equals prcg of A, b with the diagonal preconditioner.
280
00:32:24,040 --> 00:32:27,579
So look at that. It's beautiful behaviour for this matrix.
281
00:32:27,580 --> 00:32:32,379
In seven steps we got 12 digits of accuracy so that's better than usual.
282
00:32:32,380 --> 00:32:41,230
If you achieve this you are in very good shape, spectacularly effective preconditioning just by using the diagonal of the matrix.
283
00:32:41,410 --> 00:32:45,440
Even this idea wasn't really around until the 1970s.
284
00:32:45,550 --> 00:32:51,670
There are precursors, of course, but nobody was really taking advantage of this kind of thing until the seventies.
285
00:32:55,390 --> 00:33:03,370
Of course one can also do it using the built-in software. So suppose I say pcg of A, b, 1e —
286
00:33:04,750 --> 00:33:10,620
Oh, my. If you press the wrong thing, it can be annoying.
287
00:33:10,740 --> 00:33:21,510
1e minus 10, 100, D. So using the built-in MATLAB pcg code, you see in five steps it's happy.
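A hedged sketch of the diagonal (Jacobi) preconditioning step (prcg is the lecture's demo code; pcg is the built-in, called with the tolerance and step limit as spoken):

    b = ones(n,1);
    D = diag(diag(A));                 % diagonal preconditioner: just the diagonal of A
    x = prcg(A,b,D);                   % demo code: about 12 digits in 7 steps
    x = pcg(A,b,1e-10,100,D);          % built-in preconditioned CG: happy in a few steps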
288
00:33:25,630 --> 00:33:31,480
So that looks like a great example of a matrix where iterative methods are spectacularly good.
289
00:33:31,690 --> 00:33:35,860
We could even do a tic and toc on that. Suppose I say tic —
290
00:33:38,030 --> 00:33:44,690
— toc. So you see, in about a tenth of a second we get this 12-digit accuracy.
291
00:33:45,530 --> 00:33:51,050
Let's try, for contrast, doing A backslash b.
292
00:33:56,170 --> 00:34:01,330
So this is MATLAB's standard built-in Ax = b solver, backslash.
293
00:34:01,990 --> 00:34:06,490
And exactly what that does is an interesting story that we haven't talked about.
294
00:34:07,960 --> 00:34:12,790
So. You might expect it to take a staggeringly long amount of time.
295
00:34:12,910 --> 00:34:18,520
Remember the first lecture we did a 4000 by 4000 matrix and it took several seconds.
296
00:34:18,700 --> 00:34:22,089
This is a 30,000 by 30,000 matrix. So who knows?
297
00:34:22,090 --> 00:34:27,010
It may take minutes, or not. Let's put a tic and toc around it to see how long it takes.
298
00:34:29,250 --> 00:34:32,310
Well, look at that. That's also spectacularly fast.
299
00:34:33,510 --> 00:34:37,500
This, it turns out, is using a completely different method. It's not iterative.
300
00:34:37,710 --> 00:34:44,340
It's using a method that cleverly reorders the rows and columns of the matrix to bring those ones closer to the diagonal.
301
00:34:44,610 --> 00:34:47,940
So this is a surprise. When I tried it, it surprised me.
302
00:34:48,210 --> 00:34:52,530
I thought I had cooked up an example in which you really needed an iterative method.
303
00:34:52,830 --> 00:34:57,000
But this experiment shows that for this example, you don't need an iterative method.
304
00:34:57,330 --> 00:35:01,200
And in general, you should never assume a problem needs an iterative method.
305
00:35:01,230 --> 00:35:08,100
It's always worth trying these more elementary so-called direct methods, which in MATLAB are encoded with backslash.
306
00:35:08,550 --> 00:35:17,980
A rule of thumb is that often problems that come from a two dimensional spatial domain don't need iterative methods.
307
00:35:18,000 --> 00:35:25,620
Often, backslash type algorithms work fine. If you come from a three dimensional spatial domain more often, you do need iterative methods.
308
00:35:29,250 --> 00:35:34,050
Let's compare the accuracy. Suppose I say max of abs of x minus x two.
309
00:35:34,320 --> 00:35:39,510
So I'm now comparing the result of preconditioned conjugate gradients to the result of backslash,
310
00:35:39,510 --> 00:35:44,280
which is a direct, non-iterative method, and I find perfectly good agreement.
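A hedged sketch of the timing and accuracy comparison (x is the preconditioned CG result from above, x2 the backslash result; timings will vary by machine):

    tic, x = pcg(A,b,1e-10,100,D); toc     % iterative solve: on the order of a tenth of a second
    tic, x2 = A\b; toc                     % sparse direct solve with backslash: also very fast
    max(abs(x - x2))                       % the two answers agree to many digits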
311
00:35:48,460 --> 00:35:54,030
Okay. So what can one do in a course like this with preconditioners?
312
00:35:54,130 --> 00:35:58,709
I can tell you they're incredibly important. It's kind of embarrassing how important they are.
313
00:35:58,710 --> 00:36:03,720
Computational science is not in principle about linear algebra, and yet in practice it often is.
314
00:36:04,170 --> 00:36:09,870
And linear algebra is not in principle about preconditioning. And yet in practice it often is.
315
00:36:10,200 --> 00:36:15,690
It's just remarkable how much of computational science often comes down to having a good preconditioner.
316
00:36:16,110 --> 00:36:20,340
And what I want to do in the rest of this lecture is just sort of talk about
317
00:36:20,850 --> 00:36:27,450
14 classes of preconditioners that have proved important in the last 40 years.
318
00:36:27,630 --> 00:36:35,940
Just to give you a sense of some of the breadth of this. Now, maybe I can get away with even shutting that down.
319
00:36:35,940 --> 00:36:39,540
If I shut down the computer, will that cause trouble? Yes.
320
00:36:39,540 --> 00:36:42,600
I won't get away with that then. So. Okay, we'll stick with this screen.
321
00:36:51,430 --> 00:36:56,320
So this is what I'm going to call Section 1.7, Whistle-Stop Survey
322
00:36:58,810 --> 00:37:13,379
of preconditioners. And before we do our survey, let me mention that Oxford, as it happens,
323
00:37:13,380 --> 00:37:19,800
has an expert in the field, and that's Andy Wathen in the Numerical Analysis Group upstairs.
324
00:37:20,970 --> 00:37:25,950
So he is in the field of numerical linear algebra and in particular, he's an expert in Preconditioning.
325
00:37:26,160 --> 00:37:33,510
So if you are trying to precondition your problem, this is really one of the world's most knowledgeable people right here in this building.
326
00:37:35,290 --> 00:37:42,360
Okay. Oh, another remark to make is: for reading, you might try
327
00:37:42,390 --> 00:37:49,800
Chapter 40 of Trefethen and Bau, which is more or less what I'm just about to mention to you, minus a few things.
328
00:37:50,930 --> 00:38:00,050
That's a discussion of a number of preconditioners. Okay, so, our whistle-stop tour: we have about a minute per preconditioner.
329
00:38:02,060 --> 00:38:05,150
The first one to mention is diagonal scaling.
330
00:38:09,540 --> 00:38:12,620
And there's another word that's often used for that.
331
00:38:12,630 --> 00:38:19,380
It's called Jacobi, because the great mathematician Jacobi used ideas like this.
332
00:38:19,800 --> 00:38:26,850
He was amazingly modern in the middle of the 19th century. He was doing things which still have numerical application today.
333
00:38:27,940 --> 00:38:31,250
So this is absolutely the first thing to try for almost any problem.
334
00:38:31,270 --> 00:38:44,000
It's often good. At the other extreme in complexity is the idea that Meijerink and van der Vorst introduced, namely incomplete factorisation.
335
00:38:48,060 --> 00:38:59,070
Now, let me just say a word about what that means. So we haven't talked about Gaussian elimination because it's too familiar, basically.
336
00:38:59,310 --> 00:39:07,410
But the standard method by which people solve Ax = b is to factor A into a product of a lower triangular and an upper triangular matrix.
337
00:39:07,620 --> 00:39:11,430
That's Gaussian elimination. Now, if you can do that, that's great.
338
00:39:12,270 --> 00:39:17,760
The trouble is that even if A is sparse, the factors L and U are typically not so sparse.
339
00:39:17,970 --> 00:39:22,440
So you destroy a lot of the sparsity of your problem when you factor it.
340
00:39:22,650 --> 00:39:27,930
Moreover, if L and U aren't so sparse, a lot of work is going to be involved.
341
00:39:28,440 --> 00:39:36,020
So the idea that Meijerink and van der Vorst had was to do a factorisation in which you force a lot of the entries to be zero,
342
00:39:36,030 --> 00:39:44,240
even though mathematically they shouldn't be. Instead of A equals L U, which would be too expensive and not sparse enough,
343
00:39:46,270 --> 00:39:52,989
they do some kind of A equals an approximate L times an approximate U, where it's not equal —
344
00:39:52,990 --> 00:39:59,560
it's approximately equal — where L tilde and U tilde are forced to have a sparsity pattern.
345
00:40:00,010 --> 00:40:04,650
That turns out to be a very powerful technique. It's much more complicated.
346
00:40:04,900 --> 00:40:11,440
Of course, you have to decide exactly what your strategy will be, and the analysis of it also turns out to be not at all trivial.
347
00:40:11,440 --> 00:40:17,920
So it took many years for people to be comfortable with exactly why it worked as well as it does.
348
00:40:18,100 --> 00:40:21,280
But it really is a powerful technique. And in MATLAB.
349
00:40:23,800 --> 00:40:27,220
There are some built in capabilities for this, in particular.
350
00:40:27,220 --> 00:40:39,820
luinc, which is a nonsymmetric incomplete factorisation code, and then cholinc — chol
351
00:40:39,820 --> 00:40:45,160
stands for Cholesky, and that refers to the symmetric case of incomplete factorisation.
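A hedged sketch of using an incomplete factorisation as a preconditioner (in current MATLAB the relevant functions are ilu and ichol rather than luinc and cholinc; the call below assumes A is sparse symmetric positive definite, and the tolerance and step limit are illustrative):

    L = ichol(A);                      % incomplete Cholesky: A is approximately L*L', with L forced to stay sparse
    x = pcg(A,b,1e-10,100,L,L');       % pass the two triangular factors to pcg as the preconditioner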
352
00:40:49,120 --> 00:40:53,650
The third one I'd like to mention is what I call coarse grid approximation.
353
00:40:58,080 --> 00:41:05,940
So in computational science, very often we have a big matrix problem because there's some grid that we've constructed in two or three dimensions.
354
00:41:06,210 --> 00:41:10,950
And of course we want the grid to be as fine as possible if it's 100 by 100 by 100,
355
00:41:10,950 --> 00:41:14,160
and there's just a scalar variable, that's a million unknowns already.
356
00:41:14,970 --> 00:41:18,299
If there are three variables at each point, that's 3 million unknowns.
357
00:41:18,300 --> 00:41:22,080
If it's 200 by 200 by 200, that's 24 million unknowns.
358
00:41:22,080 --> 00:41:28,110
So you very quickly get big matrix problems. However, as you might imagine.
359
00:41:30,020 --> 00:41:38,840
if you have a very coarse grid, that might give you some kind of crude approximation to the solution on the grid you ultimately really care about.
360
00:41:39,500 --> 00:41:45,080
So just to draw the simplest picture: the four by four problem
361
00:41:46,660 --> 00:41:51,459
might be some kind of approximation to the eight by eight problem, and so on.
362
00:41:51,460 --> 00:41:55,240
That might be some kind of an approximation to the 16 by 16 problem.
363
00:41:55,540 --> 00:41:59,410
So this might be a preconditioner for that.
364
00:42:00,160 --> 00:42:05,050
At each step, we might somehow solve a problem involving this relatively easy matrix.
365
00:42:05,290 --> 00:42:09,560
And if we do things right, use that approximation as our preconditioner.
366
00:42:10,030 --> 00:42:18,460
That's a great idea in many cases. And indeed, that's the idea which eventually led to what's called multi grid methods.
367
00:42:21,580 --> 00:42:25,770
So if you do this just once — precondition a fine grid by a coarser grid —
368
00:42:25,780 --> 00:42:33,880
well, that's a preconditioner. But if you recurse on that idea, going down a level to coarser and coarser grids, that's called multigrid iteration.
369
00:42:34,060 --> 00:42:38,440
And that's another one of the greatest-hits ideas of the last few decades: multigrid
370
00:42:38,440 --> 00:42:45,370
iterations for solving the linear algebra problems that arise in computational science.
371
00:42:50,920 --> 00:42:59,020
One more thing to say about that. If you think about what's involved, it's intuitively clear that this has a hope of getting,
372
00:42:59,020 --> 00:43:02,440
as it were, the long-range structure, the big wavelengths.
373
00:43:02,890 --> 00:43:06,580
Obviously, you're not going to resolve the small wavelengths on the coarse grid.
374
00:43:06,790 --> 00:43:13,720
So this preconditioning idea is one of preconditioning the problem by getting the large scale structure somehow.
375
00:43:14,950 --> 00:43:19,720
Now the next idea is almost a dual of that. I could call it local approximation.
376
00:43:26,960 --> 00:43:30,050
So any scientific problem with a spatial aspect to it
377
00:43:31,130 --> 00:43:35,870
probably has some physics that's essentially local, and some other things going on that are reasonably global.
378
00:43:36,020 --> 00:43:39,020
The boundaries may be affecting you in fundamental ways,
379
00:43:39,200 --> 00:43:44,140
but also just whatever locally is diffusing or convecting or whatever will of course, be important.
380
00:43:44,600 --> 00:43:48,950
And another idea for Preconditioning is that if you can do everything sort of locally,
381
00:43:48,950 --> 00:43:54,800
correctly, ignoring the long range, that may correspond to a close-to-diagonal matrix.
382
00:43:55,160 --> 00:44:00,740
So this may give you a preconditioner which is close to diagonal and therefore implementable.
383
00:44:01,430 --> 00:44:09,350
So in some sense, if you omit long-range interactions, that may be an idea for a preconditioner.
384
00:44:17,650 --> 00:44:22,480
Let me mention then, as number five, what I'll call block preconditioners
385
00:44:26,050 --> 00:44:32,290
and, more or less the same idea, domain decomposition.
386
00:44:42,190 --> 00:44:49,269
This comes up in various contexts. Physically, you can imagine a system — let's say you have an aeroplane: it's got a left wing,
387
00:44:49,270 --> 00:44:52,150
it's got a right wing, it's got the body, it's got the engines and so on.
388
00:44:53,530 --> 00:44:58,900
Maybe you can solve the body relatively easily, but then coupling it in with the others leads to complexity.
389
00:44:59,110 --> 00:45:05,890
So the idea here is to use as your preconditioner these various subsystems of the complicated system.
390
00:45:06,520 --> 00:45:15,639
Another way to say that, though, is you've got a giant matrix and maybe the first few columns correspond to the body of the aeroplane,
391
00:45:15,640 --> 00:45:23,140
and the next few columns sort of correspond to the left wing and the next few columns correspond to the right wing and so on and so on.
392
00:45:23,890 --> 00:45:28,150
You can imagine there are all sorts of connections between these different systems.
393
00:45:28,960 --> 00:45:36,160
But if we approximate by pretending that the body is only connected with the body and the left wing is only connected with the left wing,
394
00:45:36,580 --> 00:45:45,050
and the right wing is only connected with the right wing and so on. Then we're getting some kind of a block diagonal structure.
395
00:45:45,320 --> 00:45:51,830
We're omitting crucial parts of the physics that are out here, but hopefully of a lower importance.
396
00:45:52,830 --> 00:45:57,659
So that's another generalisation, if you like, of the idea of a diagonal preconditioner.
397
00:45:57,660 --> 00:46:04,020
All of these ideas are just unbelievably big in practice.
398
00:46:04,020 --> 00:46:09,690
For example, there's a domain decomposition conference that has happened every two years for the last 25 years.
399
00:46:09,690 --> 00:46:14,940
You know, little bits of this have whole research communities and big scientific impact.
400
00:46:19,140 --> 00:46:22,560
The next one I want to mention is low order discretisation.
401
00:46:28,730 --> 00:46:37,070
Now what I mean by that, if you take a differential equation and approximate it by finite differences,
402
00:46:38,450 --> 00:46:42,709
if you just look for second order accuracy, you probably get a reasonably sparse matrix.
403
00:46:42,710 --> 00:46:49,070
But bad accuracy. If you look for fourth or sixth order accuracy, you get a worse matrix, but better accuracy.
404
00:46:49,280 --> 00:46:56,660
So a natural idea is to formulate your discretisation at a reasonably high order to get accuracy on
405
00:46:56,720 --> 00:47:02,450
not too fine a grid, but then use the sparser low order matrix as your preconditioner.
406
00:47:04,380 --> 00:47:09,630
And if you take that idea to the limit, there are things called spectral methods which are sort of of infinite order.
407
00:47:09,840 --> 00:47:14,880
And some people would approximate a spectral method by a finite element method or something.
408
00:47:15,060 --> 00:47:20,460
So you can have an infinite order method preconditioned by a second order method, for example.
409
00:47:26,330 --> 00:47:33,980
Okay. Next thing, let's mention the idea of taking a problem and a constant coefficient approximation to it.
410
00:47:36,000 --> 00:47:41,130
And I'll put in the same category a symmetric approximation.
411
00:47:47,360 --> 00:47:54,830
So in many cases, if your problem had constant coefficients or were symmetric, there might be special fast ways to solve it.
412
00:47:55,010 --> 00:47:59,450
For example, the Poisson equation with constant coefficients.
413
00:48:00,770 --> 00:48:06,860
There are things called fast Poisson solvers which take advantage of the fast Fourier transform to get spectacular speed.
414
00:48:07,400 --> 00:48:12,080
If only my problem had constant coefficients, one would say I could solve it very quickly.
415
00:48:12,350 --> 00:48:17,240
Well, whenever you're in an "if only" situation, then maybe you've got a good preconditioner.
416
00:48:17,420 --> 00:48:20,300
So use the constant coefficient thing to precondition.
417
00:48:20,600 --> 00:48:28,580
Or if your problem is dominated by some symmetric physics that can be encoded in a symmetric matrix, and yet it also has some non symmetric physics.
418
00:48:29,330 --> 00:48:33,660
Maybe the symmetric part can be solved more quickly and can be used as a preconditioner.
419
00:48:37,830 --> 00:48:41,070
The next one is splitting. Related to that.
420
00:48:43,820 --> 00:48:47,270
So I guess the term we like these days would be multi physics.
421
00:48:51,010 --> 00:48:56,500
Suppose you have a problem, as people so often do, which has more than one bit of physics going on.
422
00:48:56,680 --> 00:49:02,800
And the classic example would be in fluid mechanics, you have convection and you also have diffusion viscosity.
423
00:49:03,700 --> 00:49:07,660
So these are different bits of physics and they turn into different bits of mathematics.
424
00:49:07,900 --> 00:49:13,180
When discretised, they turn into different linear algebra problems that all combine into one.
425
00:49:14,380 --> 00:49:18,490
It may be that if you only had one of the problems or the other, you could solve it quickly.
426
00:49:18,640 --> 00:49:21,910
Maybe if you only had the diffusion, you could use a fast Poisson solver.
427
00:49:22,060 --> 00:49:28,630
But the convection makes that difficult. Well, that's a very standard way of devising a preconditioner.
428
00:49:28,750 --> 00:49:34,330
You've set up the big problem; hive off part of it that you can solve as your preconditioner.
429
00:49:38,050 --> 00:49:52,810
The next idea is dimensional splitting, and related to that is the idea of ADI, which stands for alternating direction implicit methods.
430
00:49:53,410 --> 00:49:57,520
Here, as you can guess from the name. You have a problem in 2D or 3D.
431
00:49:57,910 --> 00:50:02,160
If only it had one dimension, then it would be a diagonal matrix.
432
00:50:02,170 --> 00:50:05,110
It's the multi dimensionality that makes it non diagonal.
433
00:50:05,470 --> 00:50:14,049
So the preconditioning idea is to only look at the interactions in x, for example, as your preconditioner — to solve the x part of the problem
434
00:50:14,050 --> 00:50:17,740
exactly. That's not the whole problem, but maybe it's a good preconditioner.
435
00:50:25,140 --> 00:50:32,280
Another idea is to take, as your preconditioner, one step of a classical iteration.
436
00:50:33,150 --> 00:50:43,350
Now, by a classical iteration, I mean the things that go by the names of Gauss-Seidel,
437
00:50:45,820 --> 00:50:51,610
Or successive over relaxation or symmetric successive over relaxation.
438
00:50:52,240 --> 00:50:57,820
These are things that were big in the 1950s before people realised how powerful conjugate gradients was.
439
00:50:58,180 --> 00:51:09,280
They're much less important now. It's much rarer these days for people to solve a problem just by using successive over-relaxation.
440
00:51:09,520 --> 00:51:13,780
However, these things make great preconditioners for higher order problems,
441
00:51:13,990 --> 00:51:18,400
and this has to do again with separating the local effects from the global effects.
442
00:51:18,670 --> 00:51:22,990
These can sort of take your problem and globalise it in a good way.
443
00:51:28,740 --> 00:51:36,020
Number 11 is the idea of a periodic approximation to a non-periodic problem.
444
00:51:43,240 --> 00:51:46,930
This was invented by Gil Strang at MIT.
445
00:51:47,170 --> 00:51:54,190
He was actually visiting Oxford this term. I'm curious — who here has seen lectures by Gil Strang on the web?
446
00:51:55,440 --> 00:52:00,570
Okay. One or two? Yeah. He has these linear algebra and other things that have been very popular at MIT.
447
00:52:01,080 --> 00:52:03,690
If you Google him, you'll find them. So Strang,
448
00:52:03,690 --> 00:52:10,470
years ago, proposed this idea: take a non-periodic problem, like a Toeplitz matrix, which has constant entries along diagonals.
449
00:52:11,190 --> 00:52:16,680
Imagine the associated periodic version of the problem, which in that example would be a circulant matrix.
450
00:52:17,070 --> 00:52:23,970
It may be you have special methods for the periodic problem, like a fast Fourier transform, so you can solve that one very quickly.
451
00:52:24,450 --> 00:52:27,870
It turns out in some cases it's a spectacularly good preconditioner.
452
00:52:29,730 --> 00:52:33,450
We're a little short on time, but let me quickly at least name the final three.
453
00:52:34,770 --> 00:52:45,620
One idea is to use an unstable method. We haven't talked about stability yet, but some numerical methods are fast but unstable.
454
00:52:45,630 --> 00:52:50,940
They amplify rounding errors. The classic example would be Gaussian elimination without pivoting.
455
00:52:51,240 --> 00:52:57,000
If you don't interchange rows, you get an unstable method of solving Ax = b.
456
00:52:57,720 --> 00:53:01,180
Probably not good by itself, but it might make a great preconditioner.
457
00:53:01,770 --> 00:53:09,240
And Bob Skeel is a famous name in that area. Another idea is polynomial preconditioners.
458
00:53:13,770 --> 00:53:22,550
And the idea here is to approximate A inverse —
459
00:53:22,820 --> 00:53:30,680
to approximate A inverse somehow by a polynomial of A — and that has the advantage of being a beautifully parallelisable idea.
460
00:53:30,690 --> 00:53:37,339
So it's an idea that somehow is never the best preconditioner, but it's always the most parallelisable one.
461
00:53:37,340 --> 00:53:39,320
So the idea keeps having importance.
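As a minimal sketch of the idea, here is a truncated Neumann series used as a polynomial preconditioner (the test matrix, degree, and scaling are illustrative assumptions, not from the lecture; polyprec would live in its own file, or at the end of a script in newer MATLAB):

    A = gallery('poisson',30);             % a sparse SPD test matrix
    b = ones(size(A,1),1);
    alpha = normest(A);                    % rough upper bound on the largest eigenvalue
    papply = @(r) polyprec(A,r,alpha,5);   % applies p(A)*r, an approximation to A\r
    x = pcg(A,b,1e-8,200,papply);          % pcg accepts a function handle as the preconditioner

    function z = polyprec(A,r,alpha,m)
    % z = (1/alpha) * sum_{k=0}^{m} (I - A/alpha)^k * r  (truncated Neumann series for A\r)
    z = r/alpha;  t = r;
    for k = 1:m
        t = t - (A*t)/alpha;               % t is now (I - A/alpha)^k * r
        z = z + t/alpha;
    end
    end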
462
00:53:39,590 --> 00:53:46,639
And actually one of the guys in the Numerical Analysis Group is working with things along these lines to solve eigenvalue problems.
463
00:53:46,640 --> 00:53:55,370
Jared Aurentz is working on that. And finally, let me mention the idea of sparse approximate inverses.
464
00:54:04,690 --> 00:54:09,100
Well, the idea is, if we could do the inverse exactly, that would be great — at least if it were sparse enough.
465
00:54:09,520 --> 00:54:17,140
In this business, people solved least squares problems to explicitly construct a matrix, which is a pretty good inverse.
466
00:54:17,170 --> 00:54:20,350
Not exact, but close to an inverse and is also quite sparse.
467
00:54:20,350 --> 00:54:23,470
So least squares technology turns into making a preconditioner.
468
00:54:23,710 --> 00:54:28,480
So I think this list probably covers more than half of the big ideas out there.
469
00:54:28,480 --> 00:54:33,520
It's just a huge, huge subject. And that's pretty much all we're going to say about it except for the next assignment.
470
00:54:33,940 --> 00:54:34,660
Okay. Thanks.