Okay. Good morning, everybody. Let's get started. I hope you've done the quiz; let's quickly go through it. The answer to problem one is no output, because there's a semicolon. The answer to problem two is true, which in MATLAB appears as 1. In problem three we have the determinant of a two-by-two matrix, and it's minus 2. The answer to problem four is about 4.5, depending on how accurately you count, because that's the square root of the condition number of the matrix, which is 20. And number five, how long does it take if you increase the dimension from 4000 to 6000: this one should go quickly; the answer is around 5.4 seconds, and actually, on my machine, it was 4.7 seconds. The answer to number six, what are some names of nonsymmetric matrix iterations: well, there are lots of them; there's CGS, there's LSQR, there's BiCG, there's BiCGstab, and so on and so on, QMR, they go on and on. The answer to number seven, what's the nonsymmetric analogue of the Lanczos iteration: that's the Arnoldi iteration. And then the answer to number eight, what is my first name: it's Lloyd, actually. I'm Lloyd Nicholas Trefethen. I hope you all knew that.

Michael, would you be willing to go get me a bottle of water? I have a kind of a cold, and that might help. Yeah.

Okay, so today we're switching to part two of the three parts of this term, and the heading here is dense linear algebra. So we started with sparse linear algebra, and the dense stuff is, of course, a huge field that I'm only going to touch on certain very basic aspects of. Of course, all of you have encountered linear algebra in many ways over the years, and you know some of this, but I hope that the way we look at things will be a little different from the way you're used to. One place to read about this point of view is in my book, Trefethen and Bau, right at the beginning: chapters one, two and three talk about the point of view that I'm going to express, which is all about thinking about matrices and vectors via their columns.

So let's start with what I'll call section 2.1, and that's matrices, vectors and expansions. Of course, you know what matrices are and you know what vectors are. But the basic point of view I want to recommend is that often in basic numerical linear algebra, it's good to think of a matrix as consisting of a collection of columns.
So this is the picture we can often draw for a matrix: the first column is a_1, the second is a_2, and so on up to a_n. And when I draw that picture, I'm certainly not assuming it's a square matrix. Typically I'm assuming that it has at least as many rows as columns, and maybe many more rows than columns. Now a typical matrix-times-vector product I would then draw like this: x_1, x_2, down to x_n. Of course you know the formula for a matrix times a vector; we learned that in school. But the way I want you to think of it is as a linear combination of the columns. So if Ax = b, notice the way I'm drawing the picture: I'm not telling you anything about the individual entries of these columns or about the individual entries of b. Moreover, A is rectangular; b has the same length as the columns of A, whereas the length of x corresponds to the width of A. So that is the way I want you to think of Ax = b.

So in other words, b is a linear combination. Let's write down some things here. a_j is the jth column of the matrix A, and x_j is the jth entry of the vector x. Then b is a linear combination of the columns of A, with coefficients x_j. So whenever you have a matrix-vector product, you may interpret it that way, and often that's the right way to interpret it. We could write the formula: b is the sum from j = 1 to n of x_j times a_j. Of course x_j is a number and a_j is a vector.

Okay, now some standard terminology. We say that the range of A, which is also known as the column space of A... thank you very much, that's above and beyond the call of duty... the range or the column space of A is the set of all things you can get when you multiply A by some x like this. In other words, it's the set of all linear combinations of the columns of A. And that's a vector space, and all sorts of things are known about such spaces. It's a vector space of dimension at most n: it will be of dimension exactly n if the columns are linearly independent, and of dimension less than n if they are linearly dependent, which means there's some linear combination of them with nonzero coefficients that gives you the zero vector. And this number, the dimension, is the rank of A, also known as the column rank. Now, if the rank of A equals n, then we also say that A has full rank, and if the rank is less than n, we often say that A is rank deficient.
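Here is a minimal MATLAB sketch of this column point of view; the matrix and vector are just made-up illustrative values, not from the lecture:

    % illustrative 4-by-3 example: more rows than columns
    A = [1 0 2; 3 1 0; 0 2 1; 1 1 1];
    x = [2; -1; 3];
    b1 = A*x;                                        % the usual matrix-vector product
    b2 = x(1)*A(:,1) + x(2)*A(:,2) + x(3)*A(:,3);    % linear combination of the columns
    disp(norm(b1 - b2))                              % zero: the two views agree
    disp(rank(A))                                    % dimension of the range (column space)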
So these are standard terms that people like me use all the time. Now, of course, one of the things we want to do with matrices and vectors is to try to find expansion coefficients to match a particular right-hand side. That's the basic problem of linear algebra: given b, find x. And that makes sense even in this rectangular case, where we're not assuming that the matrix is square; it's still a reasonable question to ask. So let me move over to here.

So let's ask the question: given b, is there an x such that, well, Ax = b would be nice, or maybe Ax is in some sense close to b? If A is square, of course, the first is something we hope for; if A is rectangular, we're more likely to hope for the second. So that's the basic problem of linear algebra. And when I pose it this way, I'm thinking of b as the data vector, so b goes from b_1 up to b_m, if you like; it's the data. And the x vector we're looking for is the coefficient vector.

So of course the case we know best is when A is square and invertible. In that case, if A is square and invertible, then we all know the solution to this problem: it is x = A inverse b, with the inverse of the matrix. And I want you to think about that expression. It can be interpreted as a statement that to get the coefficient vector for an expansion, you apply A inverse. So A inverse is an operation that maps you from data to coefficients: A inverse maps from data to expansion coefficients. And indeed, I recommend that you set up a neurone in your brain that fires whenever it sees that kind of thing. It should fire and say: aha, coefficient vector; somebody is interested in a coefficient vector for a vector b.

So let's write that down yet another way. Whenever we solve a system of equations, another way to say it is that we're finding a coefficient vector for an expansion of a vector b in a basis of columns of A. Now that's going to work most cleanly if A is square and nonsingular, but it also makes sense when A is rectangular, and then we can talk about trying to get close to b: we can try to approximately solve a system of equations, to find a coefficient vector that gives a pretty good expansion, even if it's not exactly right.

So let's say a little more about that. Suppose A has more rows than columns. Then in general we're not going to be able to solve Ax = b exactly, unless b happens to be in the range of A.
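A minimal sketch of this idea, with made-up matrices: in the square nonsingular case backslash recovers the expansion coefficients exactly, and in the rectangular case it returns the least-squares coefficients that are about to be discussed:

    A = [2 1; 1 3];            % square and nonsingular
    b = [4; 7];                % data vector
    x = A\b;                   % expansion coefficients of b in the columns of A
    disp(norm(A*x - b))        % essentially zero: b is exactly a combination of the columns

    A2 = [1 0; 0 1; 1 1];      % rectangular: 3 rows, 2 columns
    b2 = [1; 2; 0];            % generally not in the range of A2
    x2 = A2\b2;                % coefficients making A2*x2 as close to b2 as possible
    disp(norm(A2*x2 - b2))     % nonzero residual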
So unless b happens to be in the range of A, Ax = b has no solution; by definition, the range of A is exactly the set of b's for which there is such a solution. So in this case we can look for approximations, vectors x with the property that Ax is, in some measure, close to b. And indeed one of the axioms of numerical linear algebra is that almost any problem that makes sense with a square matrix can be generalised to a problem with a rectangular matrix, and often that generalisation is a more robust formulation of the problem. So this is what will lead us in a moment to least squares, which is one particular natural choice of what it means for Ax to be close to b.

I want to say a word about the continuous analogue of these things. The normal way that people like me think about matrix problems is as problems in which there are a certain number of rows and a certain number of columns. But when you draw this picture, nothing in the picture requires these columns to be discrete. They could be continuous; they could be functions. So the very same picture would apply if we're trying to expand this function as a linear combination of these functions. If you're thinking matrices, that's the case where m, the number of rows, goes to infinity: you can imagine a very long, skinny matrix, and as m goes to infinity, you get some kind of a function analogue. And that's, of course, a very old idea. So as the number of rows goes to infinity, we can think of continuous functions. We can think of each column of A as a function of some independent variable, let's say t, and similarly b would be a function; so these could be functions of t on some interval, if you like.

All the linear algebra that we do has analogues in this continuous case; everything is still defined. So for example, the range of A is the same as before: if A is an object whose columns are functions, the range of A is the space of linear combinations of those functions. Again it will have dimension at most n, and if it has dimension exactly n, we can say it's of full rank, and so on. The rank of A is defined the same as before: it's the dimensionality of the set of all products Ax. So this, as I say, is an old point of view, although there isn't that much literature on it. In this case, the term that seems to have taken hold is that A is called a quasimatrix. So why does this stuff interest me?
Well, one reason it interests me is that I think all of computational science, all of numerical linear algebra, is about this interplay between the continuous and the discrete. A typical problem we try to solve with matrices might be obtained by discretising something continuous, so ultimately it's the continuous level that we care about. It's also of interest to me because of the Chebfun software project, which is something I'm engaged with. So at a software level this turns into Chebfun, which I'll show you a little bit about, maybe a little today and more later on.

So let's see. I want to show an example, discrete and then quasimatrix, and this, on the reverse of the quiz, is M7. So if we go to M7, let's type A = [1 2 3; 4 5 6; 2 2 0]. So there you have a three-by-three matrix A, and I just want you to practise thinking of A times a vector as a linear combination of the columns. So, for example, suppose I say A times the column vector [1; 0; 1]. Now of course you can calculate that any number of ways, but I want you to think of it instantly as the first column plus the third column, so that should give us 4, 10, 2. Similarly, let's do one more. Suppose I say A times, what do I have here, [1; -2; 0]. So that's the first column minus twice the second column, so that should give me... somebody's murmuring; did I mistype? No. What will that give me? Minus 3, minus 6, minus 2.

So linear combinations of columns is a fruitful way to think. Now, of course, we can type something like inv(A) times [2; 2; 2]. And what I ask you to do is think of that not just as some matrix operation, but as the extraction of a coefficient vector. So that is the vector of coefficients when you expand the vector [2; 2; 2] in the columns of A. In MATLAB, and indeed in numerical computation generally, you don't normally construct an inverse; this inv(A) is not an object that people would really construct, although you can, of course. It's more accurate and more efficient to bypass the inverse, so in MATLAB the way you'd really solve this is A backslash [2; 2; 2]. It's an interesting coincidence that when we did it with the inverse, obviously there were some small rounding errors, and when we did it this way, there were no rounding errors. That's not typical; of course, normally there will be rounding errors. We were just lucky.

Now, continuing on: the kind of thing people do with matrices all the time is work with long skinny matrices. So let's do that.
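Before the long skinny case, here is the small square demo above gathered into one runnable script, reconstructed from the spoken commands:

    A = [1 2 3; 4 5 6; 2 2 0];
    A*[1; 0; 1]        % first column plus third column: [4; 10; 2]
    A*[1; -2; 0]       % first column minus twice the second: [-3; -6; -2]
    inv(A)*[2; 2; 2]   % coefficient vector for expanding [2; 2; 2] in the columns of A
    A\[2; 2; 2]        % the preferred way: same coefficients without forming the inverse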
I'll shrink the font a little bit, or maybe too much. Okay. So suppose I say t = (0:0.1:2)', the vector from zero in steps of 0.1 up to two, made into a column vector. So there's t. This is some kind of a discrete approximation to the function t on the interval from 0 to 2. So I could now say plot(t, exp(-t)), and I could make that, let's say, a red line. So there we have a picture that looks like e to the minus t, but of course it's not really: it's actually a vector with 21 entries in it, plotted by connecting the dots piecewise linearly. It only looks so good because it's a nice smooth function.

Now let's do a little linear algebra and interpret the linear algebra in terms of that picture. Let's make A a matrix whose first column is t.^0, whose second column is t.^1, and whose third column is t.^2. And let's see, I'll detach this window so we can see the matrix. Okay. So the thing I've just constructed is a so-called Vandermonde matrix: the first column is t to the zero, the second is t to the one, the third is t squared. Obviously it's some kind of a discrete approximation to a continuous object, a very basic matrix that we might use for some kind of data-fitting problem.

So let's do a little data fitting. Let's make a coefficient vector x, which will be [1; -1; 1/2]. Now, why did I pick that vector? Because those are the first three Taylor coefficients of e to the minus t: e to the minus t equals 1 minus t plus a half t squared, and so on. So that coefficient vector corresponds to a Taylor approximation to the function e to the minus t at t = 0. So if we multiply A by that vector, we get that approximation. Let's see: I say hold on, and then I'll say plot(t, A*x). So this will show me the Taylor approximation to the function e to the minus t, at least on this discrete grid of 21 points. Notice it does just what you would expect of a Taylor approximation: it matches the function beautifully at zero, and then it wanders off in some other direction.

Now, we're going to do least squares properly in a moment, but first let's just illustrate it. Suppose I say x = A\exp(-t). Let's think about that syntactically, before I type return: exp(-t) is a vector of length 21. I've got a spurious parenthesis there; I'll have to get rid of it. A backslash: well, that neurone should fire in your head, right?
So you immediately know that that is going to compute the expansion vector, the coefficient vector x, in some sense; but it's going to be in a least-squares sense, because A is this long skinny matrix. So if I do that, it's actually giving you a vector of coefficients which will give a least-squares approximation to the data. And if we plot that, we can see it. So let's say plot(t, A*x), and I'll make this one black. So you see the big difference: the Taylor approximation is all at the origin, while the least-squares fit is global, and it's obviously done a good job of fitting the function. So this function can be very well approximated by a parabola on the interval from 0 to 2.

And just to play with MATLAB, as we always do, I hope you know the legend command. You can do things like this: I'll say legend, e to the minus t, Taylor, least squares. The legend command puts a legend on your plot, and you can put it in whatever corner you like, so I say Location SouthWest, and there you see it's put the legend on the plot. Oops. Okay. So you see that least squares obviously has something going for it. Let's finish this demo by just plotting the error for the least-squares approximation. So if I say plot(t, A*x - exp(-t)), that's the error. If you look at the axes, you can see the error is nice and small; we begin to see the discreteness, the 20 segments, and the error is about 0.02.

Okay. Any comments? Now, since we do seem to have enough time, let me just give you a hint of how Chebfun does continuous analogues of this. So in Chebfun I can do things like this: I'll create a chebfun variable t on the interval from 0 to 2, and then I could say A = [t.^0 t.^1 t.^2]. And now we have a truly continuous object. It reports that it's a chebfun quasimatrix consisting of three columns, and the three columns do the right thing. So, for example, if I plot A, you'll see the three columns: 1, t, and t squared. And I can do all sorts of things, like least squares; let's try that. I can even do the continuous analogue of least squares. Let's see, one of our data vectors was e to the minus t, so I could say f = exp(-t), and I could say plot(f). And then I could say hold on, and then I could say plot(A*x) as a blue line. Blue looks, well, blue's a boring colour.
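For comparison with the continuous version being built here, this is the discrete fitting demo from a moment ago gathered into one runnable script, reconstructed from the spoken commands (the plot colours are approximate):

    t = (0:0.1:2)';                    % 21 points on [0, 2]
    A = [t.^0 t.^1 t.^2];              % Vandermonde matrix: columns 1, t, t.^2
    x_taylor = [1; -1; 1/2];           % Taylor coefficients of exp(-t) at t = 0
    x_ls = A\exp(-t);                  % least-squares coefficients for the same data
    plot(t, exp(-t), 'r'), hold on
    plot(t, A*x_taylor)                % good near t = 0, wanders off elsewhere
    plot(t, A*x_ls, 'k')               % small error over the whole interval
    legend('e^{-t}', 'Taylor', 'least squares', 'Location', 'SouthWest')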
So here we're doing exactly the same kind of thing, except we've taken the number of rows to infinity, as it were; in some sense we're dealing with actual functions. So there's e to the minus t, there's the Taylor approximation, and let's do the least-squares analogue. If I say x = A\f, I can now say plot(A*x) as a red line. So there you can see we've actually done a continuous least-squares approximation rather than a discrete one. Everything looks the same because this isn't a very exciting function, but it's truly the continuous analogue of the discrete stuff. If I plot the error, that's A*x - f, it looks about as it did before, except now it's smooth instead of discrete. So we've had a lot of fun in Chebfun doing all kinds of neat stuff, which I will show you more of later.

Okay. Well, we're about halfway through. Yes? Is it finding the quadratic which is closest in the least-squares sense? Yes, and the reason it's a quadratic is that the quasimatrix had three columns, which were 1 and t and t squared, so a linear combination of them is a quadratic. Okay. So I haven't really told you what least squares is, though of course you all know, I'm sure. But in order to talk about that, we need to talk about orthogonality.

So, 2.2 is orthogonal vectors and matrices. Now, you know me, I always like history. One of the interesting bits of history here is that orthogonality is, of course, a very old idea mathematically, but in the 1960s it moved to centre stage algorithmically. Before that, people had been solving systems of equations and so on using non-orthogonal methods, and in the sixties they pretty much discovered that most things could be done more accurately using orthogonal methods. So for 50 years now, orthogonal linear algebra has been a very central part of the field.

Now, let me remind you of some standard definitions. We start with the idea of an inner product between two vectors. So if v and w are vectors, then we write it like this: v transpose w is equal to the sum of v_j times w_j. A couple of comments here. The default notation in this field is always that a vector is a column vector, so v transpose w means we have a row vector multiplied by a column vector. Another comment is that I could write that j goes from 1 to m, or something, if I wanted to emphasise the row index, but I prefer this way because, well, it could be anything. It could be continuous.
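A quick illustration of the definition, with made-up vectors (norm here is MATLAB's 2-norm, defined formally a little further on):

    v = [1; 2; 2];
    w = [2; -1; 0];
    v'*w               % inner product: 1*2 + 2*(-1) + 2*0 = 0, so v and w are orthogonal
    norm(v)            % the 2-norm, sqrt(v'*v) = 3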
In the continuous analogue, the inner product would turn into an integral of v(t) times w(t). Another comment I could make is that the formulas I'm writing down are for real numbers; if you're dealing with complex numbers, then an inner product would have a complex conjugate in it.

Okay, so that's the standard idea. And we say that v and w are orthogonal if the inner product is zero. And the norm of a vector, which is also called the 2-norm, is defined like this: we write it like that, or if we want to emphasise that it's the 2-norm, we can write it with a subscript 2, and it's the square root of v transpose v.

If we have a set of vectors, that is to say more than two of them, and when they're orthogonal, they're often going to be written as q, so let me do that. Suppose I have a set of vectors q_j. We say that they are orthogonal if each pair is orthogonal; so we could say, if they are pairwise orthogonal. When I write q, usually it's going to have the further property of being normalised, of having length one. So we say that a set of vectors is orthonormal, which is abbreviated ON, if in addition each one has norm 1. If you have a vector, you can always normalise it: if it's nonzero, you can normalise it by dividing by its norm, and then you'll get a normalised vector. Or you could divide by minus its norm, so there's always that plus or minus sign choice.

So let's suppose that we have a set of orthonormal vectors; I'll imagine that q_1 through q_n are orthonormal. Then why is orthonormality so interesting? Because then inverses turn into multiplications, and to get expansion coefficients all we need are inner products. So let me write that down: in this case, we get expansion coefficients via inner products. And I want to write that down in a matrix fashion. So first of all, in a non-matrix fashion: if a vector b is expanded as a linear combination, the sum of x_j q_j, then that's the same as saying, if the q_j are orthonormal, that x_j = q_j transpose b. And why is that? Well, I multiply this equation on the left by q_j transpose. Of course that gives q_j transpose b on one side, and on the other side we get the linear combination of x_i times q_j transpose q_i, and you see that all the inner products of q_i and q_j with i different from j go to zero.
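A small numerical check of this fact; the orthonormal columns here are manufactured with MATLAB's qr, purely for illustration:

    [Q, ~] = qr(randn(6, 3), 0);   % 6-by-3 matrix with orthonormal columns q_1, q_2, q_3
    x = [2; -1; 0.5];              % some expansion coefficients
    b = Q*x;                       % b = x(1)*q_1 + x(2)*q_2 + x(3)*q_3
    x_recovered = [Q(:,1)'*b; Q(:,2)'*b; Q(:,3)'*b];   % x_j = q_j' * b
    disp(norm(x - x_recovered))    % essentially zero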
So this is the great benefit of orthonormality: you get expansion coefficients via inner products. Let's write it via matrices. So in the matrix picture, we as usual think of a long skinny matrix whose columns are now q_1 up to q_n, orthonormal, and I can imagine multiplying that by x_1, x_2 up to x_n and getting a vector b. So this equation, as a matrix equation, could be written capital Q times x equals b, where Q is a whatever-by-n matrix.

Now we multiply on the left by the transpose of Q, which is an n-by-whatever matrix. So imagine multiplying this thing on the left; I'm just going to draw it schematically. We have a tall rectangle times a short vector equals a long vector. But now we multiply on the left, so we get the transpose of the rectangle, times the rectangle, times the short vector, equals the transpose of the rectangle times the long vector. So multiplying on the left gives us the equation that x_1 down to x_n is equal to, schematically, this wide matrix, whose first row is q_1 transpose, whose second row is q_2 transpose, and so on down to the nth row, which is q_n transpose, times b. So as a matrix equation, this is the equation x = Q transpose b. And notice here I'm writing equations involving rectangular matrices: I most certainly have not assumed that Q is square. What I've done makes sense in general.

It's interesting how different fields use different notations. In physics, which is the other field I would know best, I guess, exactly the same formulas are used all the time, but they would typically be written as summations rather than in this kind of matrix notation. But numerical people have found that the matrix notation is a very simple way to go towards algorithms, so that's how we prefer to write things.

Okay. That's all for arbitrary matrices with orthonormal columns, but a very important special case, let's move over here, is when they are square matrices with orthonormal columns. In fact, there's no name for a rectangular matrix with orthonormal columns, which is a great shame; it might be called an orthonormal matrix, but that's not a standard term. The standard term only arises once the matrix becomes square, and then it's called an orthogonal matrix. So let's now talk about an orthogonal matrix, which we can describe as follows.
Q is a square matrix with orthonormal columns. So everything I've written down applies in this case too, but then more also applies, because we can talk about the inverse. So in this case, let's draw the square picture of Q transpose times Q. Here's Q, and here's Q transpose. Q consists of column vectors q_1 through q_n; Q transpose consists of row vectors q_1 transpose through q_n transpose. Now when you multiply that by that, you're getting inner products of the various q_i transposes with the various q_j, and since they're orthonormal, all of those inner products are zero except on the diagonal, where you get one, because it's not just orthogonal, it's orthonormal. So we get ones down the diagonal, which is the identity.

So just from this notion of orthogonality, you can see that in the square case Q transpose is the inverse of Q, because you multiply them together and you get the identity. So that is: if you have the same number of rows and columns, a square matrix with orthonormal columns, we have Q transpose Q equals the identity, so Q transpose equals Q inverse. And it's a standard fact of linear algebra that this also implies that Q times Q transpose is the identity.

So in this case your neurone can fire whenever it sees Q transpose: whenever you see x = Q transpose b, that's the same as saying x = Q inverse b. So in the case of a square orthogonal matrix, that's how you extract coefficients. And that's a hint about the numerical algorithms: multiplying something by this matrix is much easier than computing a whole inverse matrix. Okay. So this is how you get expansion coefficients when you're dealing with an orthogonal matrix. And just to hammer the point home ad nauseam, x in this case would be written as this matrix, q_1 transpose down to q_n transpose, times b.

Okay. So this gives us a convenient notation in which a lot of things are easy to write down. Let's give an example of that. It's too simple to call it a theorem, so let's call it a proposition: if Q is orthogonal, then for any vector x, the norm of Qx is equal to the norm of x. Well, that's plausible enough, because Q is going to rotate you, or change basis, in some kind of a non-stretchy way, intuitively; but let's verify it algebraically, and the matrices are very convenient for that.
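First, a quick numerical check of these identities and of the proposition; Q here is just a random orthogonal matrix manufactured with qr, purely for illustration:

    [Q, ~] = qr(randn(5));       % a random 5-by-5 orthogonal matrix
    disp(norm(Q'*Q - eye(5)))    % Q'Q = I, up to rounding
    disp(norm(Q*Q' - eye(5)))    % Q Q' = I as well
    b = randn(5, 1);
    disp(norm(Q'*b - Q\b))       % Q'b gives the same coefficients as solving Qx = b
    x = randn(5, 1);
    disp(norm(Q*x) - norm(x))    % the norm is preserved: ||Qx|| = ||x||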
So, the norm of Qx: well, the square of the norm of Qx is the inner product of Qx with itself, Qx transpose times Qx. Remember, my definition of the norm was the square root of that inner product, so the norm squared is that. And you all know that when you have the transpose of a product, that's the same as the product of the transposes in reverse order. So that is equal to x transpose times Q transpose times Q times x. And now Q transpose Q is the identity, so this is equal to x transpose x, and there we have the norm of x squared. It's interesting practice, if you're thinking about this stuff, and I'm sure you all know these things from some angle, maybe not using this notation, to take the angle you're most familiar with and see how it translates into this, to see what is really being said here by each of these terms.

Okay. So it remains to write down least squares, and basically, with all these orthogonal tools in hand, the algebra of least squares comes almost immediately. Let's suppose we have a matrix A, and it has more rows than columns; if we want to be explicit, we could say it's m by n with m bigger than n. Then in general we can't expect to be able to solve Ax = b, but we might be able to find a good approximation. And we say that a least-squares solution to Ax = b is a vector x with the property that the norm of Ax minus b is as small as possible. That quantity, Ax minus b, by the way, is called the residual.

And all it takes is a picture to see how to solve this problem. This is a picture I draw a lot: you imagine, in some abstract space, the set of all possible m-vectors, so this is the space R^m, an m-dimensional vector space. And now you imagine the subspace of R^m which is the range of A. Now A only has n columns, so the range of A is a subspace of dimension at most n; it could be less, or it could be exactly n. So the range of A is a subspace of R^m of dimension at most n. And now we have a point b somewhere over here, and our goal is to find an x such that Ax minus b is minimal; in other words, we want to find an x corresponding to the point in this subspace closest to b. The solution is immediate from the picture; obviously one can prove it algebraically. The closest approximation to b in this space is the point characterised by this right angle here: orthogonality. I'm just using your geometric intuition; of course one can prove that algebraically.
So to solve the least-squares problem, you want to find that point there, such that the range of A is orthogonal to the residual. So to get the solution to our least-squares problem, we want the range of A to be orthogonal to Ax minus b. Now, another way to say that is that we want each column of A to be orthogonal to Ax minus b; so we want a_j transpose times Ax minus b to be zero for each column a_j of A. So the picture tells us what algebraic problem we need to solve to do least squares. There's our algebraic problem. Another way to say it is that A transpose times Ax minus b should be zero; or, another way again, A transpose A x should equal A transpose b. So that's the simplest derivation of an algebraic condition for solving the least-squares problem. This set of equations is called the normal equations.

So we've defined orthogonality, we've used a picture to characterise a least-squares approximation, and then from that picture we've derived a formula: a linear system of equations that is symmetric positive definite, or at least positive semidefinite, that is, non-negative definite, in the rank-deficient case.

We still have a minute, so let's do the other demo; this is M8. Suppose I now say t equals 2 pi times a vector of equally spaced numbers between zero and one, so t is now a vector that basically goes from 0 to 2 pi. And suppose I say A equals, it's in my computer somewhere, there we are: A is now a matrix of five columns, which are 1, and then sin t and cos t and sin 2t and cos 2t. So you see, we're doing a trigonometric type of approximation here. If I compute A transpose times A, there it is: it's a bunch of inner products, and you can see the orthogonality of sines and cosines. If I plot A, you can see sines and cosines discretised in 30 points; or if I want to get the x-axis right, I should really say plot of t against A. And now let's make a particular right-hand-side vector, which is already in my computer, and I'll plot it: I'll say plot t against b, as black circles. So that's a right-hand-side vector which I've picked to be a piecewise linear function. Let's do least squares. So if I type x = A backslash b in MATLAB, it goes through some algorithm which we haven't actually talked about, which is mathematically equivalent to solving this system of equations, and it finds the vector x corresponding to the least-squares solution.
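Incidentally, here is a small sketch, on a made-up rectangular system, checking that backslash agrees with the normal equations just derived; MATLAB's backslash typically works via a QR factorisation rather than forming A transpose A, but for a well-conditioned A the two agree to rounding error:

    A = randn(30, 5);              % a random long skinny matrix
    b = randn(30, 1);
    x1 = A\b;                      % least squares via backslash
    x2 = (A'*A)\(A'*b);            % least squares via the normal equations
    disp(norm(x1 - x2))            % agreement up to rounding
    disp(norm(A'*(A*x1 - b)))      % the residual is orthogonal to the columns of A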
And if I now say hold on, and plot t against A times x, you can see we have a nice least-squares fit to that data. Continuing with this example, let's perturb the data: I'll say b equals b plus some random perturbations, and now I'll plot that, plot t against b. Okay, so now we've got the same data, but perturbed. If I do the same thing, well, we know least squares is pretty good at dealing with noise, so we should get a similar coefficient vector, and indeed it's not so different. And if I do hold on and then plot t against A times x, we see what the least-squares solution has done. Of course, all of this could also be done in a continuous setting. Okay, that's it for now. Thanks.