1
00:00:01,410 --> 00:00:07,260
Okay. Good morning, everybody. You'll see there is a questionnaire; please fill that out and we'll collect it at the end.
2
00:00:07,770 --> 00:00:14,460
Also, assignment three in the usual pattern is due Tuesday, the week after next.
3
00:00:15,810 --> 00:00:19,230
I guess assignment two will be handed back on next Tuesday.
4
00:00:21,600 --> 00:00:24,750
McHale, is there anything you want to tell people about assignments?
5
00:00:25,530 --> 00:00:34,850
Okay, we're fine. All right. So today's our last day on linear algebra, and then the final few lectures of this term will be related to optimisation.
6
00:00:35,490 --> 00:00:41,910
And what I want to do today is first talk at a high level about matrix factorisation and algorithms,
7
00:00:42,270 --> 00:00:48,330
and then talk about the singular value decomposition, one of the most important of all the matrix factorisations.
8
00:00:48,690 --> 00:01:01,810
So: matrix factorisation. I guess that's Section 2.8.
9
00:01:12,330 --> 00:01:16,350
Linear algebra was more or less born in the 19th century.
10
00:01:17,340 --> 00:01:22,650
And then after computers came along this way of thinking in terms of matrix factorisation came along.
11
00:01:22,920 --> 00:01:32,490
It turned out that a fruitful way to think of algorithms was as taking a matrix A and turning it into some product of simpler matrices.
12
00:01:32,730 --> 00:01:39,720
And in fact, I like to say that this is the central dogma of numerical linear algebra.
13
00:01:42,000 --> 00:01:50,670
So the central dogma of numerical linear algebra is that algorithms, at least the classical algorithms,
14
00:01:52,460 --> 00:02:07,810
correspond to matrix factorisations. This expression, the central dogma, is one that Francis Crick made famous in biology,
15
00:02:07,810 --> 00:02:12,650
where the central dogma is that DNA turns into RNA, which turns into proteins.
16
00:02:12,670 --> 00:02:17,860
So I think it's a good idea that every field should have a central dogma; that's ours.
17
00:02:18,310 --> 00:02:27,730
It sort of came along gradually over the decades, 1950s, 1960s, and is now just absolutely the way that numerical people think about these algorithms.
18
00:02:28,240 --> 00:02:36,400
The general flavour is that you take an arbitrary matrix and you turn it into pieces that might be diagonal or triangular or orthogonal.
19
00:02:36,610 --> 00:02:42,460
Those are the three big ones: diagonal, triangular or orthogonal.
20
00:02:43,000 --> 00:02:48,010
You can work with pieces like that more easily than with general matrices.
21
00:02:48,340 --> 00:02:54,210
For example, if you want to find eigenvalues of a matrix, the algorithms make the matrix diagonal.
22
00:02:54,220 --> 00:02:57,940
And then of course, you know, the eigenvalues of a diagonal matrix.
23
00:02:57,940 --> 00:03:07,670
They're the entries on the diagonal. Now what I want to do is summarise the seven most important factorisations in numerical linear algebra.
24
00:03:07,690 --> 00:03:11,620
So here they are. The first one is the QR factorisation.
25
00:03:14,370 --> 00:03:24,460
We've seen most, but not all, of these. And in my summary, I'm going to suppose that A is a real square matrix.
26
00:03:24,700 --> 00:03:31,509
So that's of course not always true, but let's imagine that A is a real matrix of dimension
27
00:03:31,510 --> 00:03:41,589
m by m. In that case the QR factorisation is a factorisation of a matrix A, an arbitrary real square matrix
28
00:03:41,590 --> 00:03:46,720
A, into Q times R, where Q is orthogonal
29
00:03:49,840 --> 00:04:03,120
and R is upper triangular. Let me remind you that if A were rectangular, then Q would have orthonormal columns.
30
00:04:03,120 --> 00:04:08,970
But when A is square, Q is square, so those orthonormal columns mean that it's an orthogonal matrix.
31
00:04:09,420 --> 00:04:13,499
Now, what is this used for? Well, it's used a lot. It's used all over the place.
32
00:04:13,500 --> 00:04:18,060
But in particular, it's important in least squares calculations.
33
00:04:19,710 --> 00:04:22,920
It's the starting point of anything you do in least squares, really.
34
00:04:23,610 --> 00:04:31,470
And it's also a step in iterative algorithms for eigenvalues and related things.
35
00:04:34,470 --> 00:04:41,010
So in particular for eigenvalues, which I like to write like that, and the singular value decomposition,
36
00:04:41,250 --> 00:04:47,310
the way those things are computed is by an iterative algorithm where at each step you do a QR factorisation.
37
00:04:47,640 --> 00:04:56,490
So this really transformed numerical linear algebra in the 1960s: the use of the QR factorisation as proposed by Householder.
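(A minimal MATLAB sketch, for reference, of what the QR factorisation looks like in practice; the matrix here is just an illustrative example, not the lecturer's M16 code.)

    A = magic(5);          % an arbitrary real square matrix
    [Q, R] = qr(A);        % A = Q*R
    norm(Q'*Q - eye(5))    % ~ machine precision: Q is orthogonal
    norm(R - triu(R))      % exactly 0: R is upper triangular
    norm(Q*R - A)          % ~ machine precision: the factorisation holds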
38
00:04:58,520 --> 00:05:01,700
The second one I'd like to mention is the LU factorisation.
39
00:05:03,980 --> 00:05:09,320
Now, this is the oldest one in some sense. This is Gaussian elimination, basically.
40
00:05:09,590 --> 00:05:12,740
This is what we all learn when we first encounter linear algebra.
41
00:05:12,980 --> 00:05:18,350
You subtract multiples of one row of a matrix from other rows to introduce zeros.
42
00:05:18,680 --> 00:05:25,340
You make a matrix triangular by Gaussian elimination; that process can be regarded as an LU factorisation.
43
00:05:25,580 --> 00:05:32,360
Now, in the simplest form, it would be A equals LU, where L is lower triangular and U is upper triangular.
44
00:05:32,720 --> 00:05:36,830
But more generally, there's a permutation matrix on the left.
45
00:05:37,310 --> 00:05:41,000
So PA equals LU would be the standard version of Gaussian elimination.
46
00:05:41,480 --> 00:05:44,480
L is unit lower triangular.
47
00:05:47,060 --> 00:05:57,890
That means that L is lower triangular with ones on the diagonal; U is upper triangular.
48
00:06:02,640 --> 00:06:18,590
And P is a permutation matrix. Now a permutation matrix is a matrix whose rows are permutations of the rows of the identity, or equivalently,
49
00:06:18,590 --> 00:06:22,040
whose columns are permutations of the columns of the identity.
50
00:06:22,310 --> 00:06:25,400
When you take P times A, you're permuting its rows.
51
00:06:25,610 --> 00:06:33,050
Gaussian elimination typically is set up to permute rows in order to make sure that the numbers we divide by are not too small.
52
00:06:33,530 --> 00:06:42,170
So this is the standard version of Gaussian elimination with partial pivoting, which is the workhorse of solution of systems of equations.
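(A minimal MATLAB sketch of the factorisation just described, assuming a small square example matrix; partial pivoting gives P*A = L*U.)

    A = magic(5);
    [L, U, P] = lu(A);      % Gaussian elimination with partial pivoting
    norm(P*A - L*U)         % ~ machine precision
    norm(L - tril(L))       % 0: L is lower triangular, with ones on the diagonal
    norm(U - triu(U))       % 0: U is upper triangular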
53
00:06:42,800 --> 00:06:52,550
Now, one of the handouts maybe you can look at now is an essay I wrote a few years ago about Gaussian elimination as an iterative algorithm.
54
00:06:53,960 --> 00:06:55,820
There's some fascinating history here.
55
00:06:56,990 --> 00:07:07,820
Let me remind you that conjugate gradient was invented in 1952, and the authors regarded it as both an iterative and a direct algorithm.
56
00:07:08,060 --> 00:07:12,980
But somehow most of the world interpreted it mostly in those early years as a direct algorithm.
57
00:07:13,220 --> 00:07:21,590
And then around 1971, with the work of John Reid and others, people began to interpret it as mostly an iterative algorithm.
58
00:07:23,710 --> 00:07:26,800
Now, Gaussian elimination, weirdly,
59
00:07:28,940 --> 00:07:35,150
has a slightly analogous history. It's, of course, in some sense ancient; it certainly goes back to Gauss.
60
00:07:35,180 --> 00:07:38,360
Actually, it goes back to the Chinese. 2000 years ago.
61
00:07:38,930 --> 00:07:46,010
And of course, its main use is as a direct algorithm, which means you have a matrix.
62
00:07:46,220 --> 00:07:49,220
You do the elimination and you get to the end and then you stop.
63
00:07:50,870 --> 00:07:56,030
Now, that's still true. That's still the main use. But sort of in the last ten years.
64
00:07:59,300 --> 00:08:04,760
There's a big business going around where people do low rank compressions of things.
65
00:08:05,930 --> 00:08:12,649
So this is a big theme in large scale linear algebra these days where you take a matrix that
66
00:08:12,650 --> 00:08:18,950
might be a million by a million and you try to represent it by something of rank 100 or 200.
67
00:08:19,280 --> 00:08:22,790
Now, what does that mean to be rank 100 or 200? It means.
68
00:08:24,090 --> 00:08:33,720
u1 v1 transpose plus u2 v2 transpose, and so on up to some modest number, 200 let's say.
69
00:08:36,640 --> 00:08:44,200
So there's a picture: a rank-one matrix plus another rank-one matrix plus another rank-one matrix and so on,
70
00:08:44,210 --> 00:08:47,830
200 of them. That's a rank 200 matrix.
71
00:08:49,080 --> 00:08:55,740
That's a fruitful way to approximate matrices of dimensions in the millions somehow, sometimes.
72
00:08:56,130 --> 00:09:01,650
For example, the famous Netflix competition, where people were trying to predict movie preferences,
73
00:09:01,860 --> 00:09:07,620
is in this general area. You have this vast matrix where one dimension is movies.
74
00:09:07,620 --> 00:09:09,630
The other dimension is viewers of movies.
75
00:09:10,080 --> 00:09:15,570
You only have a few of the entries, but if you make the approximation that the Matrix has reasonably low rank,
76
00:09:15,840 --> 00:09:18,840
you can predict the next movie that you're going to like.
77
00:09:19,890 --> 00:09:23,580
That turns out to be intimately related with Gaussian elimination.
78
00:09:23,580 --> 00:09:31,320
And let me just draw the picture of why that is. Suppose you have a million by a million matrix or a 10 million by a million matrix.
79
00:09:32,040 --> 00:09:38,850
Suppose you find the biggest entry in it and then consider that row and that column.
80
00:09:40,350 --> 00:09:45,360
Well, the product of that column times that row is a rank-one approximation.
81
00:09:46,800 --> 00:09:52,560
If you take your matrix and subtract off that rank-one approximation, suitably scaled,
82
00:09:53,130 --> 00:09:56,610
you get something which is now zero in that column and that row.
83
00:09:57,240 --> 00:09:59,250
Then you do it again. So that was the first step.
84
00:10:00,480 --> 00:10:13,470
At step two, you find the biggest entry of what remains, and you draw a row and a column through there, and then you subtract off that rank one term.
85
00:10:14,070 --> 00:10:18,060
So we've now approximated this huge matrix by a rank-two approximation.
86
00:10:18,360 --> 00:10:24,900
If you continue that process, that turns out to be a pretty good way to find low rank approximations to matrices.
87
00:10:25,260 --> 00:10:28,950
And lo and behold, that's exactly Gaussian elimination.
88
00:10:28,950 --> 00:10:34,320
It's actually Gaussian elimination with not only row interchanges, but also column interchanges.
89
00:10:34,530 --> 00:10:39,120
But basically, that's one way of interpreting the Gaussian elimination process.
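(A minimal MATLAB sketch of that rank-one peeling process; the matrix A and the target rank k here are illustrative assumptions, and this is just Gaussian elimination with complete pivoting in disguise, not anyone's production code.)

    A = randn(200, 150);                 % stand-in for a much larger matrix
    k = 20;                              % target rank
    E = A;                               % residual
    Ak = zeros(size(A));                 % rank-k approximation being built
    for step = 1:k
        [~, idx] = max(abs(E(:)));       % biggest remaining entry (the pivot)
        [i, j] = ind2sub(size(E), idx);
        piece = E(:, j) * E(i, :) / E(i, j);   % scaled outer product, rank one
        Ak = Ak + piece;
        E = E - piece;                   % row i and column j of E are now zero
    end
    norm(A - Ak)                         % quality of the rank-k approximation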
90
00:10:40,500 --> 00:10:45,750
So I'm telling you on this panel that every algorithm should be interpreted as a matrix factorisation.
91
00:10:46,200 --> 00:10:52,740
But I'm contradicting myself over here because the easiest way to draw this picture is not as a matrix factorisation,
92
00:10:52,950 --> 00:10:57,270
but just as a decomposition into a sum of rank one pieces.
93
00:10:59,820 --> 00:11:07,870
So just to finish off the history: conjugate gradients as a direct algorithm is essentially dead, and hardly anybody would ever use it that way.
94
00:11:07,890 --> 00:11:16,140
Gaussian elimination remains mostly a direct algorithm, but this iterative side of it is growing in a dramatic way,
95
00:11:18,280 --> 00:11:23,950
an important part of large-scale data manipulation. Okay.
96
00:11:23,960 --> 00:11:30,230
That's the end of the digression. Let's return to the picture.
97
00:11:30,410 --> 00:11:33,800
Of course, I should summarise what LU is used for.
98
00:11:34,010 --> 00:11:38,030
So it's used for systems of equations.
99
00:11:43,240 --> 00:11:44,710
And low rank compression.
100
00:11:54,480 --> 00:12:04,590
Now the next one is called Cholesky factorisation, and that's Gaussian elimination in the special case of a symmetric positive definite matrix.
101
00:12:09,040 --> 00:12:13,209
You remember my rule of thumb that half of the problems that arise in practice are symmetric.
102
00:12:13,210 --> 00:12:25,030
positive definite. If A is symmetric positive definite, then we can write A as an LU factorisation with a symmetry in it.
103
00:12:25,240 --> 00:12:29,770
So we write A equals R transpose R.
104
00:12:31,660 --> 00:12:39,239
So now R is upper triangular. You can actually do this
105
00:12:39,240 --> 00:12:45,480
if A is semi-definite; it might have zero eigenvalues, but it can't have negative eigenvalues.
106
00:12:45,750 --> 00:12:51,870
And this, of course, is used for symmetric positive definite systems.
107
00:12:53,790 --> 00:13:04,130
Of equations. So in MATLAB, for example, if you type backslash, it actually figures out whether your matrix is symmetric or not.
108
00:13:04,400 --> 00:13:08,300
If it's not symmetric, it uses LU factorisation.
109
00:13:08,660 --> 00:13:15,980
If it is symmetric, it tries Cholesky factorisation, and if that's successful, then it knows the matrix was not only symmetric but positive definite.
110
00:13:16,250 --> 00:13:19,760
If it's unsuccessful, it knows that it was symmetric but not positive.
111
00:13:19,760 --> 00:13:25,700
definite, and it goes back and uses LU factorisation.
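(A minimal MATLAB sketch of that decision logic, not how backslash is actually implemented internally; the two-output form of chol reports failure instead of erroring, and the matrix A and right-hand side b are illustrative assumptions.)

    A = randn(5); A = A'*A + eye(5);   % an illustrative symmetric positive definite matrix
    b = randn(5, 1);
    [R, p] = chol(A);                  % p == 0 means A was symmetric positive definite
    if p == 0
        x = R \ (R' \ b);              % solve via A = R'*R, two triangular solves
    else
        [L, U, P] = lu(A);             % fall back to Gaussian elimination
        x = U \ (L \ (P*b));
    end
    norm(A*x - b)                      % ~ machine precision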
112
00:13:25,700 --> 00:13:29,810
The fourth one I'd like to mention is the eigenvalue factorisation.
113
00:13:42,140 --> 00:13:45,890
So when you do an eigenvalue factorisation, you're making a matrix diagonal.
114
00:13:47,120 --> 00:13:53,630
So we write it like this: A equals V D V inverse.
115
00:13:54,080 --> 00:14:00,230
Now, this is not always possible, because not every matrix is diagonalisable, but generically every matrix is.
116
00:14:00,530 --> 00:14:03,680
If you take a matrix that isn't and perturb it a little bit, it will be.
117
00:14:04,370 --> 00:14:08,510
So here what we have is: A has to be a diagonalisable square matrix.
118
00:14:12,220 --> 00:14:21,400
And V has to be nonsingular. And D is diagonal.
119
00:14:26,220 --> 00:14:29,850
So this is called a similarity transformation. And as you know,
120
00:14:29,850 --> 00:14:37,320
I'm sure, the eigenvalues of a matrix are the same as the eigenvalues of another matrix related to it by a similarity transformation.
121
00:14:37,590 --> 00:14:43,770
So whatever V is, so long as it's not singular, the eigenvalues of A are the same as the eigenvalues of D,
122
00:14:44,100 --> 00:14:52,890
and if D is diagonal, those are the diagonal entries. So let me just remind you of the point here, which is algorithms in all of these cases.
123
00:14:54,370 --> 00:14:58,780
The algorithm that is the standard one consists of constructing the factorisation
124
00:14:59,050 --> 00:15:03,400
to simplify the problem and then doing what you need to do in that simpler form.
125
00:15:03,730 --> 00:15:07,540
So for example, if you want to solve a system of equations,
126
00:15:07,540 --> 00:15:15,369
you turn the matrix into a lower triangular and an upper triangular product that then amounts to a succession of two problems,
127
00:15:15,370 --> 00:15:20,469
each of which is triangular. It's easy to solve a triangular system by substitution.
128
00:15:20,470 --> 00:15:22,720
You do one variable, then the next, then the next.
129
00:15:23,320 --> 00:15:31,000
Similarly, in this case, if you can make the matrix diagonal, then of course it's trivial to solve the eigenvalue problem.
130
00:15:31,150 --> 00:15:37,660
So that's how it's done. Somehow or other you make the matrix diagonal, then you've got an easy eigenvalue problem.
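(A minimal MATLAB sketch of the first of those patterns: factorise once, then solve a system of equations with two easy triangular solves by substitution; A and b here are illustrative.)

    A = magic(5); b = ones(5, 1);
    [L, U, P] = lu(A);      % factorise: P*A = L*U
    y = L \ (P*b);          % forward substitution with unit lower triangular L
    x = U \ y;              % back substitution with upper triangular U
    norm(A*x - b)           % ~ machine precision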
131
00:15:39,840 --> 00:15:44,070
I have three more to mention, so I guess I'll leave those there and move here.
132
00:15:49,770 --> 00:15:55,679
The fifth one is again eigenvalue factorisation, but in the special case of a symmetric matrix.
133
00:15:55,680 --> 00:16:02,040
So I'll call it the orthogonal eigenvalue factorisation.
134
00:16:12,930 --> 00:16:23,729
I remind you that if you have a real symmetric matrix then it has a complete set of orthogonal eigenvectors. In the continuous analogue,
135
00:16:23,730 --> 00:16:27,330
if you have a Hermitian operator like the Schrödinger operator,
136
00:16:27,540 --> 00:16:34,440
it has a complete set of orthogonal eigenfunctions. So that's the case where everything works as beautifully as possible.
137
00:16:35,070 --> 00:16:43,800
If you think of it in terms of matrices and factorisations, this becomes Q D Q transpose,
138
00:16:44,040 --> 00:16:47,640
a special case of this in which V is orthogonal.
139
00:16:48,840 --> 00:16:56,650
So in this case, Q is orthogonal. And D is diagonal.
140
00:17:00,900 --> 00:17:10,020
So the algorithms for solving symmetric matrix eigenvalue problems consist of orthogonal operations that reduce it to diagonal form.
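(A minimal MATLAB sketch of this symmetric case; the symmetrised matrix is an illustrative assumption.)

    A = magic(5); B = A + A';       % an illustrative real symmetric matrix
    [Q, D] = eig(B);                % for symmetric input, Q comes out orthogonal
    norm(Q'*Q - eye(5))             % ~ machine precision
    norm(B - Q*D*Q')                % ~ machine precision: B = Q*D*Q'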
141
00:17:12,240 --> 00:17:19,560
There's another eigenvalue factorisation, which in fact is closer to what the algorithms do than what I described here.
142
00:17:20,850 --> 00:17:22,980
And that's called the Schur factorisation.
143
00:17:32,010 --> 00:17:39,570
So if you have a diagonalisable matrix, then there exists some similarity transformation which takes it to diagonal form.
144
00:17:39,900 --> 00:17:46,500
But you can imagine there must be something a little suspect about that operation, because if your matrix is nearly non-diagonalisable,
145
00:17:46,500 --> 00:17:49,870
something's got to go wrong here. And that is indeed true:
146
00:17:49,890 --> 00:18:00,070
the inverse becomes huge, typically. It turns out that every matrix, diagonalisable or not, can be made triangular as opposed to diagonal.
147
00:18:00,660 --> 00:18:07,440
So the Schur factorisation is A equals Q T Q transpose,
148
00:18:09,290 --> 00:18:17,930
where Q is orthogonal and T is upper triangular.
149
00:18:26,390 --> 00:18:35,090
So let me just emphasise again that for this kind of eigenvalue factorisation where you make the matrix diagonal,
150
00:18:35,390 --> 00:18:44,750
that only works for matrices that are diagonalisable, whereas the Schur factorisation works for any matrix, any square matrix.
151
00:18:47,380 --> 00:18:54,220
So that in fact is what algorithms really do normally for eigenvalues of non symmetric matrices.
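(A minimal MATLAB sketch; note that for a real matrix MATLAB's schur returns the real Schur form, which is only quasi-triangular when there are complex eigenvalues, while schur(A,'complex') gives a strictly upper triangular T.)

    A = magic(5);
    [Q, T] = schur(A, 'complex');   % A = Q*T*Q', T upper triangular
    norm(A - Q*T*Q')                % ~ machine precision
    norm(T - triu(T))               % ~ 0: T is upper triangular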
152
00:18:56,470 --> 00:18:59,500
And then the final one is the singular value decomposition.
153
00:19:04,810 --> 00:19:07,540
The SVD, the singular value decomposition.
154
00:19:10,710 --> 00:19:21,120
The roots of this are a century or more old, but it became big computationally in the 1960s, and the famous man who made it big was Gene Golub.
155
00:19:22,260 --> 00:19:25,770
What you have here is a factorisation of a matrix A
156
00:19:35,550 --> 00:19:46,230
into a product that's normally written U Sigma V transpose, where U and V are orthogonal
157
00:19:50,450 --> 00:19:54,980
and Sigma is diagonal. And more than that:
158
00:19:56,090 --> 00:20:06,590
if we write it in the usual form, it would have entries sigma 1, sigma 2, down to sigma n, and these are real numbers that are sorted.
159
00:20:06,860 --> 00:20:13,790
So sigma 1 is bigger than or equal to sigma 2, and so on down to sigma n, which is bigger than or equal to zero.
160
00:20:15,860 --> 00:20:18,920
So that's the singular value decomposition.
161
00:20:19,430 --> 00:20:22,730
It reduces A to a diagonal matrix.
162
00:20:23,450 --> 00:20:29,769
You can always do that. But the catch is that it's using different matrices
163
00:20:29,770 --> 00:20:33,430
on the two sides: U and V are both orthogonal, but they're not the same in general.
164
00:20:33,640 --> 00:20:40,360
So that's not a similarity transformation. You can do it for any A, doesn't have to be real.
165
00:20:40,720 --> 00:20:46,960
That doesn't have to be symmetric. So these are really seven of the glories of our field.
166
00:20:48,070 --> 00:20:49,240
Every one is important.
167
00:20:49,240 --> 00:20:58,930
This one is hugely important, both for classical dense matrices and in all sorts of ways in the large scale world, in data analysis and big data.
168
00:20:59,260 --> 00:21:02,530
People are always doing things related to the SVD.
169
00:21:02,890 --> 00:21:06,790
It's related to principal component analysis, and the proper orthogonal
170
00:21:06,790 --> 00:21:14,260
decomposition, and the Karhunen-Loève transform; there are many ways that people have discovered things related to the SVD over the years.
171
00:21:15,790 --> 00:21:21,910
Now I want to draw your attention to the ones here that generalise to rectangular matrices.
172
00:21:22,450 --> 00:21:29,950
So QR Factorisation certainly does generalise to a rectangular matrix.
173
00:21:30,200 --> 00:21:35,410
LU factorisation also generalises to a rectangular matrix.
174
00:21:37,540 --> 00:21:44,470
Cholesky does not, because it assumes symmetry. Eigenvalue factorisation does not, because eigenvalues only make sense for square matrices.
175
00:21:44,740 --> 00:21:49,690
So of course, that one also doesn't and that one also doesn't, but the SVD generalises.
176
00:21:50,290 --> 00:21:56,439
So those three apply to non symmetric and non square matrices.
177
00:21:56,440 --> 00:22:03,910
Also, classically, nobody would think of LU for a non-square matrix, because classically we think of it as solving a system of equations.
178
00:22:04,210 --> 00:22:10,000
But as soon as you get into these rank compression things, you are certainly using non square matrices.
179
00:22:12,100 --> 00:22:17,650
Okay. So what we're going to do now is play with these factorisations with the code called M16.
180
00:22:32,370 --> 00:22:40,230
So all I'm going to do here is pick a five-by-five matrix that I like and then play with it with the seven factorisations.
181
00:22:40,440 --> 00:22:49,079
So let's say A equals magic of five, which is a magic square consisting of numbers that all add up to the same thing.
182
00:22:49,080 --> 00:22:52,409
Rows and columns. You probably know what that number is.
183
00:22:52,410 --> 00:22:55,800
If I take the sums of the columns, I get 65.
184
00:22:55,980 --> 00:23:01,260
Each column adds up to 65. If I take the sums of the rows, I get 65.
185
00:23:01,270 --> 00:23:07,080
Each row adds up to 65. It's a convenient non-symmetric matrix to play with.
186
00:23:08,250 --> 00:23:12,959
So let's look at the QR factorisation of A; there it is.
187
00:23:12,960 --> 00:23:18,960
Let's make format short. So there's A and there's its QR factorisation.
188
00:23:20,340 --> 00:23:31,320
Of course you can't really see that Q is orthogonal, but you can see that R is upper triangular. To machine precision, A should be equal to QR, so
189
00:23:31,590 --> 00:23:38,870
if I say Q times R minus A, you see a matrix whose entries are of size ten to the minus 14.
190
00:23:41,720 --> 00:23:53,000
So that's the QR factorisation. Let's do LU. So there's A again; let's say L, U, P equals lu of A.
191
00:23:56,910 --> 00:24:05,730
That's wrong. Now we have it right. So you see there that L is lower triangular, and moreover it has ones on the diagonal.
192
00:24:05,970 --> 00:24:09,000
U is upper triangular and P is a permutation matrix.
193
00:24:09,330 --> 00:24:15,570
I didn't say that: normally P would be chosen at each step to bring the biggest entry in a column to the diagonal.
194
00:24:15,810 --> 00:24:25,290
If that's done, then L is not only unit lower triangular, but all the entries below the diagonal have size at most one in absolute value.
195
00:24:25,830 --> 00:24:28,890
Notice all the entries of L are positive. That's
196
00:24:29,890 --> 00:24:34,590
totally not typical. And I presume that's a property.
197
00:24:34,600 --> 00:24:38,460
I don't know. I'm guessing that's the property of magic squares. I've never noticed it before.
198
00:24:38,470 --> 00:24:44,620
Is it? Here we have a statistical question: is a five-by-five example enough evidence?
199
00:24:45,550 --> 00:24:49,690
I can't resist trying it. Let's see what happens. Let's, just for fun,
200
00:24:49,690 --> 00:24:56,200
say L, U, P equals lu of magic of ten.
201
00:24:59,560 --> 00:25:04,990
And we'll then print L. Okay. Now, I don't have an opinion, but you all must have one.
202
00:25:05,050 --> 00:25:08,800
Let's have votes. Who thinks that all the entries below the diagonal will be positive?
203
00:25:10,120 --> 00:25:13,720
Who thinks that they will be a mix of positive and negative. Okay.
204
00:25:14,660 --> 00:25:19,460
So most people think mixed. Maybe I'm inclined towards the other, but I really don't know.
205
00:25:20,010 --> 00:25:23,930
Um. You're right. Okay. So the five-by-five case was a coincidence.
206
00:25:26,140 --> 00:25:32,870
Okay. Back to good old A. Oh, we didn't confirm that.
207
00:25:35,830 --> 00:25:43,780
Sorry. There's A once more, and there is its LU factorisation.
208
00:25:44,680 --> 00:25:51,270
We didn't confirm that the factorisation gives the right thing, so let's do that: P times A minus L times
209
00:25:51,280 --> 00:25:55,560
U should be zero. And sure enough it's ten to the minus 14.
210
00:25:57,130 --> 00:26:03,160
Okay. Now let's try a Cholesky factorisation. So there's A, and it's not symmetric, let alone positive
211
00:26:03,160 --> 00:26:08,620
definite. If I try chol of A, it says error, which is the right answer.
212
00:26:09,370 --> 00:26:14,410
Now suppose I try R equals chol of A transpose A.
213
00:26:15,630 --> 00:26:21,810
That is symmetric positive definite. So it proceeds happily, and indeed R transpose times R
214
00:26:23,220 --> 00:26:32,010
minus A transpose times A is of size machine precision. That's actually larger than I'd like it to be,
215
00:26:32,010 --> 00:26:36,830
ten to the minus 12. I don't know why it's so big. Let's do another one.
216
00:26:37,490 --> 00:26:41,000
So there's A again; let's look at its eigenvalues.
217
00:26:42,040 --> 00:26:48,970
They happen to be all real numbers. A priori they might have been complex, perhaps, but they're all real numbers.
218
00:26:50,740 --> 00:26:57,220
We could say V, D equals eig of A, and then we have the matrix of eigenvectors.
219
00:26:57,640 --> 00:27:01,600
The five columns are the eigenvectors and then the matrix of eigenvalues.
220
00:27:03,250 --> 00:27:11,040
So if A equals V D V inverse, that's the same as saying A V equals V D.
221
00:27:11,080 --> 00:27:14,130
So the columns of V are the eigenvectors of a.
222
00:27:14,470 --> 00:27:20,560
And let's confirm that: A times V minus V times D should be zero.
223
00:27:21,760 --> 00:27:28,450
So make sure you're with me on this. A times V amounts to multiplying a by each of the five columns of V,
224
00:27:28,780 --> 00:27:34,630
and the results should be those columns scaled by the eigenvalues on the diagonal of D.
225
00:27:35,020 --> 00:27:38,140
So if these are really eigenvectors, we should get zero.
226
00:27:38,680 --> 00:27:44,040
And sure enough, ten to the minus 13. Okay.
227
00:27:44,430 --> 00:27:50,640
Let's try A plus A transpose.
228
00:27:50,880 --> 00:27:55,680
That's real but not positive definite. So it's symmetric but not positive
229
00:27:55,680 --> 00:27:59,820
definite. It has real eigenvalues and some of them are negative.
230
00:28:01,570 --> 00:28:16,030
So I could say Q, D equals eig of A plus A transpose, and there you have the eigenvalues, and you also have orthonormal eigenvectors.
231
00:28:18,580 --> 00:28:27,970
130. Why 130? Well, the magic square has its biggest eigenvalue 65, and the eigenvector is the constant vector, all ones.
232
00:28:28,210 --> 00:28:35,530
Here, you see, that eigenvector is constant. And because we have A plus A transpose, we have 65 plus 65.
233
00:28:40,130 --> 00:28:44,630
And to confirm that that works, I could say A plus A transpose minus
234
00:28:44,630 --> 00:28:50,070
Q times D times Q transpose, and that is not zero.
235
00:28:50,090 --> 00:28:58,230
What did I do wrong? Oh I'm sorry.
236
00:28:58,240 --> 00:29:02,110
Because I didn't do A plus A transpose, did I? Thank you.
237
00:29:03,790 --> 00:29:11,019
That should have been A plus A transpose. And now 130 is on the bottom.
238
00:29:11,020 --> 00:29:14,410
Right. That's okay. And let's now confirm.
239
00:29:15,820 --> 00:29:20,630
Yeah. Machine precision. Okay.
240
00:29:21,250 --> 00:29:25,899
Well, let me remind you again that the eigenvalues of A look like that, and that every
241
00:29:25,900 --> 00:29:30,520
matrix has a Schur factorisation, an orthogonal reduction to triangular form.
242
00:29:30,850 --> 00:29:42,250
So if I say Q, T equals schur of A, then you see we've got an orthogonal matrix and a triangular matrix.
243
00:29:42,580 --> 00:29:47,469
So I could say A minus Q times T times
244
00:29:47,470 --> 00:29:50,620
Q prime: machine precision.
245
00:29:52,730 --> 00:30:00,620
Finally, we're going to talk about the SVD. Now, the SVD of A: if you just type that, you get the numbers, which are its singular values.
246
00:30:00,890 --> 00:30:07,370
If you type U, S, V equals svd of A, you get the three matrices U, S and V.
247
00:30:07,910 --> 00:30:13,940
So there we have the diagonal of singular values, all positive real numbers, and the two orthogonal matrices
248
00:30:13,940 --> 00:30:18,469
U and V. I could say A minus U
249
00:30:18,470 --> 00:30:23,260
times S times V transpose.
250
00:30:25,180 --> 00:30:31,749
So this kind of stuff is happening all the time. Whenever you compute with matrices, virtually anything you do will involve a matrix.
251
00:30:31,750 --> 00:30:36,829
Factorisation. Okay.
252
00:30:36,830 --> 00:31:22,470
So let's talk about the SVD in particular. The SVD is a very pretty topic.
253
00:31:23,160 --> 00:31:26,960
It is more important than you could possibly imagine in practice.
254
00:31:26,970 --> 00:31:37,360
It's used all over the place. And one of the prevailing facts decade after decade, especially at Oxford,
255
00:31:37,750 --> 00:31:42,220
is that mathematics students are not taught as much about the SVD as they should be.
256
00:31:42,610 --> 00:31:48,549
At more applied universities the balance is better, but at a really high-powered,
257
00:31:48,550 --> 00:31:53,890
pure mathematical bastion like Oxford University, students are not taught enough about the SVD.
258
00:31:54,970 --> 00:31:59,080
I don't fully understand why that is. Of course, it's a newer concept in practice.
259
00:32:00,340 --> 00:32:07,160
Eigenvalues, students learn all about. Eigenvalues are essentially an algebraic concept.
260
00:32:07,820 --> 00:32:12,710
Singular values are more of a concept of analysis, so they almost have a completely different flavour.
261
00:32:12,980 --> 00:32:16,850
And the algebra always wins at the undergraduate mathematics level.
262
00:32:17,420 --> 00:32:24,710
So here, as in so many places, you probably find that engineers know more about singular values than mathematicians.
263
00:32:26,400 --> 00:32:29,550
Okay. Let's get a sense of what's involved here.
264
00:32:30,600 --> 00:32:36,899
So as I mentioned, the SVD consists of a factorisation into orthogonal times,
265
00:32:36,900 --> 00:32:44,820
diagonal times orthogonal, and the diagonal entries are non-negative real numbers, in order.
266
00:32:45,330 --> 00:32:49,410
And let's now allow A to be rectangular.
267
00:32:49,800 --> 00:33:00,090
So if A is a rectangular matrix, m by n, for simplicity I'll assume that m is bigger than or equal to n.
268
00:33:01,990 --> 00:33:07,720
Then what we have here, in the form that people would most often use it in practice,
269
00:33:09,530 --> 00:33:21,169
is: a rectangular matrix A is equal to a rectangular matrix U, times a diagonal matrix Sigma, times a square matrix
270
00:33:21,170 --> 00:33:37,440
V. So this is A equals U Sigma V transpose: m by n equals m by n, times n by n, times n by n. U —
271
00:33:38,240 --> 00:33:43,130
when we talked about square matrices, that was orthogonal, but now it will have orthonormal columns.
272
00:33:43,460 --> 00:33:47,030
So U has orthonormal columns.
273
00:33:48,560 --> 00:33:56,799
And of course V is orthogonal. So that's the standard way that the SVD is used.
274
00:33:56,800 --> 00:34:06,280
The idea is to take a matrix which maybe has a lot of rows and find new bases for the range and the domain.
275
00:34:06,700 --> 00:34:12,430
So in effect, this is a basis for the domain and this is the basis for the range of a.
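(A minimal MATLAB sketch of that rectangular, "economy size" form; the tall random matrix is an illustrative assumption.)

    A = randn(8, 3);                 % m-by-n with m >= n
    [U, S, V] = svd(A, 'econ');      % U is m-by-n, S is n-by-n, V is n-by-n
    norm(U'*U - eye(3))              % ~ machine precision: orthonormal columns
    norm(V'*V - eye(3))              % ~ machine precision: V is orthogonal
    norm(A - U*S*V')                 % ~ machine precision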
276
00:34:12,790 --> 00:34:18,700
And here's the picture that is at the heart of everything. Take any matrix.
277
00:34:21,330 --> 00:34:25,770
If it's an m-by-n matrix, then it's a map from R^n
278
00:34:27,260 --> 00:34:35,780
to R^m. Let's see what happens when A is applied to a ball.
279
00:34:36,080 --> 00:34:42,350
So take the unit ball, defined in the obvious way in R^n: all the vectors of size at most one.
280
00:34:43,910 --> 00:34:52,750
Every matrix, without exception, always maps the unit ball onto an ellipse, or more precisely, I should say, a hyperellipsoid.
281
00:34:53,500 --> 00:34:58,260
What's a hyperellipsoid? It's a ball that gets stretched in a linear fashion.
282
00:34:58,720 --> 00:35:08,740
So every matrix, without exception, will map 0 to 0, of course, and then it will map this ball onto some hyperellipsoid.
283
00:35:10,860 --> 00:35:17,969
This is an n-dimensional object. This would generically be an n-dimensional hyperellipsoid, but it might be flat in some dimensions.
284
00:35:17,970 --> 00:35:23,520
So its actual dimension could be less than n in those degenerate cases.
285
00:35:24,330 --> 00:35:27,390
Now, the SVD is all about describing that map.
286
00:35:30,720 --> 00:35:38,460
Suppose you have an ellipse. It has a major semi-axis and a minor semi-axis.
287
00:35:39,900 --> 00:35:44,160
Generically speaking, you have no choice here except plus or minus signs.
288
00:35:44,430 --> 00:35:48,300
If it happened that the two were of equal length and this was a circle, then you'd have a choice.
289
00:35:48,600 --> 00:35:54,570
But generically, they're uniquely defined. So let's call this one.
290
00:35:55,870 --> 00:35:58,900
sigma 1 times u1.
291
00:36:01,680 --> 00:36:05,610
The length of this is the first singular value.
292
00:36:06,420 --> 00:36:11,010
The direction of it is the first singular vector, which means the first column of U.
293
00:36:11,220 --> 00:36:20,320
So u1 is the first column of U. And let's pretend this is a picture in n dimensions.
294
00:36:20,380 --> 00:36:24,220
So I'm going to call this one the last one. This will be sigma n.
295
00:36:25,150 --> 00:36:33,700
u n. So I really should have used the red pen for consistency.
296
00:36:40,980 --> 00:36:46,320
So we take our hyperellipsoid and we look at its principal axes.
297
00:36:47,700 --> 00:36:55,260
Those are the singular vectors u sub j times the corresponding singular values sigma sub j,
298
00:36:55,680 --> 00:36:58,890
and it's clear geometrically that that's a sensible thing you can do.
299
00:36:59,370 --> 00:37:04,950
And now the remarkable fact, which of course, can be proved and may not be obvious at first,
300
00:37:05,280 --> 00:37:11,100
is that the pre-images of these principal axes are always orthogonal over here.
301
00:37:11,580 --> 00:37:16,920
So we don't know where the pre-image of that line is, who knows where;
302
00:37:16,950 --> 00:37:19,620
let's say there, and let's call that v1.
303
00:37:20,670 --> 00:37:29,910
That's the first column of the orthogonal matrix V, and we don't know where the pre-image of that one will be, but it will be orthogonal to v1.
304
00:37:30,300 --> 00:37:34,330
So let's suppose it's here. That's v2 —
305
00:37:36,680 --> 00:37:46,059
sorry, v n; we're thinking of that as the last one, aren't we. So that's what the SVD does.
306
00:37:46,060 --> 00:37:54,400
That's the picture you should always have in your head. The SVD is about describing the orthogonal bases that correspond to each other
307
00:37:54,640 --> 00:38:03,670
when you map a unit ball onto a hyperellipsoid. And it's perhaps intuitively plausible that, because of the orthogonality of those bases,
308
00:38:03,670 --> 00:38:11,980
these are numerically a very good way to work. If you want to study the properties of A or of A inverse, this is a magnificent way to do that.
309
00:38:13,120 --> 00:38:17,709
The singular value decomposition has no relevance to powers of A. If A is square
310
00:38:17,710 --> 00:38:24,780
and you want to talk about A to the 10th or A to the millionth, the SVD is irrelevant, because the left and the right have different bases.
311
00:38:24,790 --> 00:38:28,030
You can't square it or cube it. You need eigenvalues for that.
312
00:38:28,450 --> 00:38:36,540
The SVD is always about A itself, not powers. Okay.
313
00:38:36,630 --> 00:38:44,740
Let me. I think I've actually said all this already, but let's just spell it out a little more fully.
314
00:38:45,010 --> 00:38:54,700
So the picture I'm talking about is that you have a matrix A and we are thinking in terms of multiplying it by
315
00:38:57,480 --> 00:39:02,820
the vectors v1 through vn. That's equal to
316
00:39:06,080 --> 00:39:11,380
a matrix u1 through un, times
317
00:39:11,830 --> 00:39:17,820
the diagonal matrix sigma 1 through sigma n.
318
00:39:21,290 --> 00:39:25,850
So I got this equation by taking that picture and multiplying it on the right by V.
319
00:39:26,690 --> 00:39:34,550
So the SVD — this is the algebra corresponding to that picture: A times a singular vector v
320
00:39:35,390 --> 00:39:39,680
gives you the corresponding singular vector u, scaled by a singular value.
321
00:39:40,280 --> 00:39:57,320
So the picture and the algebra are the same. Now let's imagine that you have a vector b which is equal to A times a coefficient vector x.
322
00:40:00,680 --> 00:40:07,790
Then another way to say that is that b is equal to U Sigma V transpose x.
323
00:40:09,290 --> 00:40:12,380
And another way to say that is that.
324
00:40:14,240 --> 00:40:23,550
U transpose b is equal to Sigma V transpose x.
325
00:40:26,280 --> 00:40:36,590
That's not a sum; that's the matrix Sigma. And now that neurone in your brain fires and you see that this is the vector
326
00:40:38,000 --> 00:40:41,450
of coefficients of x
327
00:40:43,910 --> 00:40:50,360
in the basis of columns of V.
328
00:40:55,170 --> 00:41:01,650
Now an arbitrary m-vector could not be represented as a linear combination of these n
329
00:41:03,310 --> 00:41:07,900
vectors u1 through un, but if it happens to be of this form, then it can be.
330
00:41:07,910 --> 00:41:11,500
So this is the vector of coefficients
331
00:41:14,310 --> 00:41:17,400
of b in the basis
332
00:41:19,820 --> 00:41:23,240
of columns of U.
333
00:41:25,270 --> 00:41:37,090
So in other words, what we're saying here is that if you take a matrix A and then change bases in both the domain and the range in this SVD fashion,
334
00:41:37,690 --> 00:41:46,000
then the problem becomes diagonal. Every matrix is diagonal in some suitably chosen orthogonal bases.
335
00:41:47,320 --> 00:41:55,370
And that's why the SVD is so useful. Okay.
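(A minimal MATLAB sketch of that change of bases; the matrix and the coefficient vector x, and hence b = A*x, are illustrative.)

    A = randn(8, 3);
    [U, S, V] = svd(A, 'econ');
    x = randn(3, 1);
    b = A * x;
    norm(U'*b - S*(V'*x))    % ~ machine precision: U'*b = Sigma*(V'*x)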
336
00:41:55,370 --> 00:42:00,770
I'm going to play with it in a minute on the computer. But first, I want to tell you some key algebraic facts.
337
00:42:03,800 --> 00:42:14,960
So here are some facts. And this is for rectangular A, which of course as a special case includes square A.
338
00:42:17,160 --> 00:42:21,690
So one fact: every A has a singular value decomposition,
339
00:42:24,110 --> 00:42:32,110
in contrast to an eigenvalue decomposition. Singular values are unique,
340
00:42:35,320 --> 00:42:47,620
but not singular vectors. The singular vectors always have plus-or-minus sign choices, and in non-generic cases they have further choices
341
00:42:47,630 --> 00:43:00,320
beyond that. The two-norm of a matrix is something we haven't defined, but it's a standard concept, and it's equal to the first singular value of A.
342
00:43:09,970 --> 00:43:13,930
The Frobenius norm of a matrix is also something we haven't defined.
343
00:43:14,890 --> 00:43:18,219
People write it as A sub F, but it's easy to define that one.
344
00:43:18,220 --> 00:43:20,380
It's just the square root of the sum, of the squares,
345
00:43:21,430 --> 00:43:27,040
of the entries of A, and it's also equal to the square root of the sum of the squares of the singular values.
346
00:43:28,900 --> 00:43:34,330
So the definition is that it's the sum of the entries squared.
347
00:43:36,020 --> 00:43:44,030
square-rooted. Well, that's equal to the sum of the singular values squared, and then square-rooted.
348
00:43:47,190 --> 00:43:50,100
It's also true that the singular values of A
349
00:43:54,280 --> 00:44:02,140
are equal to — or rather, the set of singular values of A is equal to the set of square roots of the eigenvalues of A transpose A.
350
00:44:10,540 --> 00:44:17,290
So mathematically that means you can relate the SVD to an eigenvalue problem, but it's an eigenvalue problem for a different matrix.
351
00:44:17,590 --> 00:44:24,490
And moreover, numerically, that's not something to do if you can avoid it because it sort of squares the condition number.
352
00:44:24,700 --> 00:44:27,630
So that's not the standard way people compute the SVD,
353
00:44:28,030 --> 00:44:33,670
but you could in principle compute the SVD by computing the eigenvalue decomposition of A transpose A.
354
00:44:35,780 --> 00:44:44,900
A couple more facts. The rank of A is equal to the number of non-zero singular values.
355
00:44:51,800 --> 00:44:57,920
So you can see that's a very practically important thing. When MATLAB computes the rank of a matrix, it finds the SVD.
356
00:44:58,550 --> 00:45:05,900
It ignores singular values that are down at machine precision and tells you how many singular values are well above machine precision.
357
00:45:06,980 --> 00:45:13,459
The range of A is equal to the span of u
358
00:45:13,460 --> 00:45:18,590
1 through u r, where r is the rank.
359
00:45:20,490 --> 00:45:27,060
So the singular vectors u sub j give you an orthogonal basis for the range of A.
360
00:45:29,610 --> 00:45:33,660
And then I also want to write down three facts for square matrices.
361
00:45:45,400 --> 00:45:50,830
So the first fact is that the product of the singular values, so the product,
362
00:45:50,830 --> 00:46:02,590
let's say of sigma sub j is equal to the product of the absolute values of the eigenvalues, which in turn is the absolute value of the determinant.
363
00:46:06,320 --> 00:46:10,160
So singular values and eigenvalues have some connections.
364
00:46:15,120 --> 00:46:16,350
The norm of the inverse —
365
00:46:17,970 --> 00:46:26,610
and again, that would be the two-norm of the inverse, which we haven't defined — is equal to one over the smallest singular value.
366
00:46:27,900 --> 00:46:37,260
And finally, the condition number of A, which is defined as the product of the norm of A
367
00:46:38,370 --> 00:46:43,880
and the norm of A inverse. Well, you know what that is,
368
00:46:44,060 --> 00:46:49,520
because the norm of A is sigma 1 and the norm of A inverse is one over sigma n.
369
00:46:50,570 --> 00:46:55,340
So that's the ratio of the largest to the smallest singular value.
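(A minimal MATLAB sketch checking several of these facts for an illustrative square matrix; each printed quantity should come out around machine precision.)

    A = randn(5);
    s = svd(A);                     % singular values, sorted and non-negative
    norm(A) - s(1)                  % two-norm = largest singular value
    norm(A, 'fro') - norm(s)        % Frobenius norm = sqrt(sum of sigma_j^2)
    prod(s) - abs(det(A))           % product of the sigma_j = |det(A)|
    norm(inv(A)) - 1/s(end)         % norm of the inverse = 1/sigma_min
    cond(A) - s(1)/s(end)           % condition number = sigma_1/sigma_n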
370
00:47:04,440 --> 00:47:08,220
So the SVD is everywhere in numerical linear algebra.
371
00:47:10,290 --> 00:47:15,420
There isn't really time to talk about how it's used for low rank approximation, but let me just say a word or two.
372
00:47:15,630 --> 00:47:21,960
I mentioned that Gaussian elimination can be thought of as a way to compress a matrix, to find a low rank approximation.
373
00:47:22,320 --> 00:47:26,070
You have a million by million matrix. You find a rank 100 approximation.
374
00:47:26,520 --> 00:47:34,110
Well, the SVD gives the optimal such approximation in the two-norm and also in the Frobenius norm.
375
00:47:34,380 --> 00:47:39,690
So if you want the optimal rank 100 approximation to a million by million matrix,
376
00:47:39,690 --> 00:47:47,280
then mathematically you get that by taking the first hundred singular values and singular vectors u and singular vectors
377
00:47:47,280 --> 00:47:51,090
v. That's optimal in the mathematical sense.
378
00:47:51,180 --> 00:47:56,160
The trouble is, it's hard to compute. There's no way to compute the SVD of a million by million matrix.
379
00:47:56,400 --> 00:48:04,740
So if the matrix is small enough that you can afford to compute it, the SVD gives you optimal algorithms for low-rank approximation.
380
00:48:05,190 --> 00:48:10,560
If the matrix is huge, people use approximations to the SVD, such as Gaussian elimination.
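(A minimal MATLAB sketch of the optimal low-rank approximation by a truncated SVD; the matrix size and the rank k are illustrative.)

    A = randn(100, 80);
    k = 10;                                     % target rank
    [U, S, V] = svd(A);
    Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % best rank-k approximation
    s = diag(S);
    norm(A - Ak) - s(k+1)                       % ~ 0: the 2-norm error is sigma_{k+1}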
381
00:48:12,930 --> 00:48:16,440
And I just want to finish by playing around with some two by two matrices.
382
00:48:21,060 --> 00:48:25,560
So this is now the code called M17.
383
00:48:29,470 --> 00:48:37,060
So if you glance at the code, you'll see it's an infinite loop. And at each step, it finds a random matrix of dimension two.
384
00:48:38,170 --> 00:48:45,370
And then it computes its SVD. And then it draws a picture of the map from the unit ball to the hyperellipsoid.
385
00:48:45,520 --> 00:48:51,610
Except it's not a hyperellipsoid; it's just an ellipse. So I'll say M17.
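(A minimal sketch, not the lecturer's M17 code, of the same kind of picture: map the unit circle under a random 2-by-2 matrix and draw the singular vectors on both sides.)

    A = randn(2);
    [U, S, V] = svd(A);
    t = linspace(0, 2*pi, 200);
    circle = [cos(t); sin(t)];              % the unit ball in the plane
    ellipse = A * circle;                   % its image: an ellipse
    subplot(1, 2, 1), plot(circle(1,:), circle(2,:)), axis equal, hold on
    plot([0 V(1,1)], [0 V(2,1)], 'b', [0 V(1,2)], [0 V(2,2)], 'r')   % v1, v2
    subplot(1, 2, 2), plot(ellipse(1,:), ellipse(2,:)), axis equal, hold on
    plot([0 S(1,1)*U(1,1)], [0 S(1,1)*U(2,1)], 'b', ...              % sigma1*u1
         [0 S(2,2)*U(1,2)], [0 S(2,2)*U(2,2)], 'r')                  % sigma2*u2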
386
00:48:58,020 --> 00:49:06,800
So there is a matrix, a two-by-two matrix, and it maps the unit ball in the plane to that.
387
00:49:07,130 --> 00:49:12,410
And you can see we've drawn the singular vector u1 times the singular value.
388
00:49:12,650 --> 00:49:16,620
And similarly over here. Those seem to be on the axes.
389
00:49:16,630 --> 00:49:20,250
That's an astonishing coincidence. It's completely random.
390
00:49:20,260 --> 00:49:25,900
How did that happen? It's totally misleading. You should not expect the blue and the red lines to be on the axes.
391
00:49:26,230 --> 00:49:29,260
What property of A made that the case? Can we spot that?
392
00:49:29,890 --> 00:49:34,040
Michael, are you quick enough to spot that? Oh. Anyway.
393
00:49:34,040 --> 00:49:39,320
I'm astonished. Somebody tell me later what property of A leads to that.
394
00:49:42,140 --> 00:49:46,760
Let's do another one, which I hope will be more generic. Okay, that's more generic.
395
00:49:48,740 --> 00:49:52,850
Let me draw your attention to the fact that there's no right hand rule here.
396
00:49:52,850 --> 00:49:55,970
The red one is on the right of the blue one, and from the left over there,
397
00:49:56,120 --> 00:50:01,729
it could go either way because some matrices are rotations and others are reflections.
398
00:50:01,730 --> 00:50:06,110
And you can predict it has to do with whether the determinant is positive or negative.
399
00:50:07,310 --> 00:50:12,590
So let's do a few more. That one also came out with a negative determinant.
400
00:50:12,600 --> 00:50:17,670
I can tell by the orientation of the blue and the red. Let's find one with a positive determinant there.
401
00:50:17,690 --> 00:50:21,860
Okay. So that must have a positive determinant because they're oriented the same way.
402
00:50:22,100 --> 00:50:26,390
This is a very well conditioned matrix because the ellipse is nearly a circle.
403
00:50:27,050 --> 00:50:31,310
Let's press the button a few times until we find an ill-conditioned matrix.
404
00:50:33,100 --> 00:50:36,850
I'm looking for an ellipse that's very narrow. So that's ill-conditioned.
405
00:50:39,460 --> 00:50:43,720
Now, what does it mean to be ill-conditioned? It means you're nearly rank-deficient.
406
00:50:43,730 --> 00:50:47,410
If that were actually just a segment, you'd be exactly rank-deficient.
407
00:50:47,590 --> 00:50:53,350
So if we look at that matrix, we should find that the two columns are nearly multiples of one another.
408
00:50:53,620 --> 00:50:58,390
And sure enough, it looks as if the second column is about three and a half times the first column.
409
00:50:58,630 --> 00:51:02,680
Similarly, the second row is about nine.
410
00:51:02,710 --> 00:51:06,230
Did I say that wrong? No, that's right.
411
00:51:06,350 --> 00:51:12,800
And the second row is about -1.7 times the first row, or something like that.
412
00:51:13,100 --> 00:51:18,920
So every matrix has this property: in 2D it's just an ellipse, but in multiple dimensions
413
00:51:19,160 --> 00:51:22,910
it's an n-dimensional ellipsoid. Okay. See you on Tuesday.