1
00:00:00,210 --> 00:00:02,520
First of all, on the quiz. I hope you've done that.
2
00:00:03,210 --> 00:00:14,100
The first and the third vectors are orthogonal and then the three vectors have norms square root of 102, square root of 100 and square root of 101.
3
00:00:14,100 --> 00:00:16,680
So it's the first one that has the largest norm.
4
00:00:18,030 --> 00:00:27,270
I thought I'd begin just for fun with something that has no legitimate connection with this course whatsoever, which is the code called M9.
5
00:00:27,450 --> 00:00:38,130
So let's have some fun with that one. We live in a world where almost anything can be explored easily with or without mathematical content.
6
00:00:38,340 --> 00:00:49,020
So let's see here. M9 is a little MATLAB script that's designed to play with colours.
7
00:00:49,710 --> 00:00:57,230
So if I type m9 — the colour quiz — what happens is that you see a bar of colour appear.
8
00:00:57,240 --> 00:01:02,880
Now, the default way that MATLAB handles colours is RGB: red, green, blue.
9
00:01:03,390 --> 00:01:09,150
So every colour is a linear combination of an amount of red between zero and one,
10
00:01:09,450 --> 00:01:14,370
an amount of green between zero and one, and an amount of blue between zero and one.
11
00:01:15,120 --> 00:01:21,720
Now, this quiz allows you to choose one of 27 options in an attempt to match that colour.
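(For concreteness, here is a minimal sketch of the kind of thing a colour quiz like m9 might do; the actual script isn't reproduced here, so the variable names and the drawing command are assumptions.)

    % Minimal sketch in the spirit of m9 (not the actual script; details assumed)
    target = [1 0.5 0];                   % an RGB triple, each entry one of 0, 0.5, 1
    patch([0 1 1 0], [0 0 1 1], target)   % display a bar of that colour
    guess = [1 0 0];                      % a conjecture: pure red
    if isequal(guess, target), disp('you win'), else, disp('wrong'), end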
12
00:01:21,960 --> 00:01:25,020
So you've got to decide: is the red zero, 0.5, or one?
13
00:01:25,230 --> 00:01:29,640
And then is the green zero, 0.5, or one? And is the blue zero, 0.5, or one?
14
00:01:29,670 --> 00:01:33,270
So for example, suppose we conjecture that that colour is red.
15
00:01:33,810 --> 00:01:39,450
Then I could type one red, zero green, zero blue.
16
00:01:41,310 --> 00:01:45,330
And I discover we're wrong, because that's pure red, and it's different from the target.
17
00:01:45,660 --> 00:01:49,350
So now I turn it over to you. You tell me, what should I try next?
18
00:01:50,880 --> 00:01:55,990
How much red? 0.5, I hear. How much green?
19
00:01:56,890 --> 00:01:59,950
Zero. How much blue? 0.5.
20
00:02:01,120 --> 00:02:08,840
Wrong again. Okay. What should I try next? More red.
21
00:02:09,140 --> 00:02:12,140
Okay. So should we try 1, 0, 0.5?
22
00:02:15,450 --> 00:02:19,110
Nope. Well. What's that?
23
00:02:21,180 --> 00:02:25,540
1, 0, 1. Yay!
24
00:02:25,600 --> 00:02:28,690
Okay, you win. So that's good. Let's just do a few more.
25
00:02:28,960 --> 00:02:33,940
There are only 27 possibilities. Pretty soon you'll have them all. Okay, let's try it again.
26
00:02:35,620 --> 00:02:39,610
Oh, that's rather the same, but maybe that's good. What do you think? What was the last one?
27
00:02:39,620 --> 00:02:44,920
Remind me, was it 1, 0, 1?
28
00:02:44,930 --> 00:02:49,650
So what's this? Do you suppose? How much red?
29
00:02:50,680 --> 00:02:54,610
One. How much green? 0.5. How much
30
00:02:54,610 --> 00:02:58,090
blue? Hey, we're getting good.
31
00:02:58,270 --> 00:03:04,829
I'll try a couple more. Okay. What's that? Just for fun.
32
00:03:04,830 --> 00:03:09,660
Let's try a pure blue. So if I say 0, 0, 1, now we're pure blue.
33
00:03:09,870 --> 00:03:14,470
And this is clearly not a pure blue. So what do you think it is?
34
00:03:18,560 --> 00:03:22,590
Zero, 0.5... I think I hear. No.
35
00:03:25,580 --> 00:03:30,120
0.5, 0.5, 1. Hey.
36
00:03:30,390 --> 00:03:37,340
Hey. Well done. Okay, we'll do one more. Some of the most fun colours are orange and yellow, and the trouble is, they rarely appear.
37
00:03:37,370 --> 00:03:42,050
It's interesting how our perception has much sharper gradients in some regions than others.
38
00:03:42,320 --> 00:03:46,760
So among these 27 colours, loosely speaking, most of them seem to be green.
39
00:03:47,390 --> 00:03:50,420
Maybe we'll be lucky and get yellow or orange.
40
00:03:51,650 --> 00:03:54,680
We could, of course, keep going until we get yellow or orange.
41
00:03:54,890 --> 00:04:03,220
Oh, we got yellow. Okay. Who knows what yellow is? One red.
42
00:04:04,540 --> 00:04:08,350
I think I heard one red. How much green? One.
43
00:04:08,830 --> 00:04:14,959
How much blue? Point five. Wow, that was impressive.
44
00:04:14,960 --> 00:04:19,040
I'm very impressed. I couldn't have done that, even though I wrote this code years ago.
45
00:04:19,040 --> 00:04:25,420
I'll do one more and then we'll say, Oh, that's boring. Let's do a more interesting one. Oh, okay.
46
00:04:25,440 --> 00:04:30,700
That's a good one to start with. How much red, green and blue? 1, 1, 1.
47
00:04:30,710 --> 00:04:34,600
I think that's right. Actually, I can do that one in my head. Let's see. One, one, one.
48
00:04:35,080 --> 00:04:39,190
Yeah, you win. Okay, good. We're done. Thank you for allowing me to do that.
49
00:04:43,570 --> 00:04:46,720
So we're talking about orthogonality and least squares.
50
00:04:48,340 --> 00:04:54,580
And let me bring us back to where we were in the last lecture, where I rather hastily mentioned least squares.
51
00:04:54,760 --> 00:05:00,100
We talked about orthogonal vectors, which are defined by having an inner product of zero.
52
00:05:00,310 --> 00:05:03,790
If you have a whole set of vectors, they're said to be orthogonal
53
00:05:03,790 --> 00:05:09,970
if they're pairwise orthogonal. If you have a matrix whose columns are orthonormal,
54
00:05:10,210 --> 00:05:14,860
There's no name for such a matrix, even though we use them all the time, unless it's square.
55
00:05:14,860 --> 00:05:19,600
And if you have a square matrix whose columns are orthonormal,
56
00:05:19,840 --> 00:05:25,270
then it's called an orthogonal matrix, and its transpose is equal to its inverse.
57
00:05:25,840 --> 00:05:33,220
Now, just to remind you, the framework for least squares problems is usually m by n matrices.
58
00:05:33,490 --> 00:05:39,980
So let A be m by n, and let's suppose that m is bigger than n.
59
00:05:41,860 --> 00:05:50,440
In applications, it might be much bigger than n. Now, to remind you, a least squares solution to Ax = b
60
00:05:57,520 --> 00:06:01,840
is a vector x that minimises the two-norm of the residual.
61
00:06:02,260 --> 00:06:11,889
So it's x such that Ax minus b, measured in our standard square-root-of-the-sum-of-the-squares
62
00:06:11,890 --> 00:06:14,080
norm, is as small as possible.
63
00:06:18,320 --> 00:06:26,060
I'm not giving you a theoretical course on least squares fitting; I'm just quickly going over some key points in MATLAB.
64
00:06:26,330 --> 00:06:29,330
We haven't talked about algorithms yet; we'll begin to do that shortly.
65
00:06:29,540 --> 00:06:35,480
In MATLAB, if you type x = A\b, it somehow computes this x.
66
00:06:35,780 --> 00:06:40,730
And let's remind ourselves of the kind of stuff you can get that way.
67
00:06:40,970 --> 00:06:44,540
I'm going to again run the code m8 from last time.
68
00:06:44,900 --> 00:06:53,870
So we had m8, the least squares demo. And just to remind you:
69
00:07:00,490 --> 00:07:06,310
What happened with that code is we made a piecewise linear function and then we added some random noise to it,
70
00:07:06,490 --> 00:07:10,120
and then we did exactly this kind of thing to fit the data by least squares.
71
00:07:10,330 --> 00:07:15,250
So in fact, every time I run the code, I'll get different random numbers, but a similar fit.
72
00:07:15,640 --> 00:07:22,180
So there we do it again. There we do it again. Different random numbers, similar coefficients, similar fit.
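(Roughly, a code along the lines of m8 does something like the following — a sketch under assumptions, not the actual script; the piecewise linear function, the noise level, and the choice of fitting functions are made up for illustration.)

    % Sketch in the spirit of m8 (assumed details, not the actual script)
    t = linspace(0, 1, 200)';            % sample points
    f = min(2*t, 1.5 - t);               % some piecewise linear function
    b = f + 0.1*randn(size(t));          % add random noise
    A = [t.^0 t t.^2 t.^3];              % fit a cubic polynomial, say
    x = A\b;                             % least squares coefficients via backslash
    plot(t, b, '.', t, A*x, 'r-')        % data and the least squares fit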
73
00:07:22,480 --> 00:07:27,810
So this is a very small-scale least squares problem that could have been done in 1949.
74
00:07:27,820 --> 00:07:33,640
Of course, we're way beyond that. Now, one reason for doing this is if you've got your code handy,
75
00:07:33,850 --> 00:07:40,000
in the last couple of lines of the code we're showing off a little bit of MATLAB just for fun.
76
00:07:40,510 --> 00:07:45,190
Suppose you click on the red curve.
77
00:07:45,190 --> 00:07:51,910
That has the effect of making the red curve into the so-called current object in your MATLAB graphics.
78
00:07:52,210 --> 00:07:57,400
So if I now say set gco — gco gets the current object —
79
00:07:57,910 --> 00:08:08,530
I can then do things like say 'linewidth', 20, and suddenly I've made the current object have a line width of 20. Or I could say 'color'...
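(In code, the commands being typed are along these lines — assuming the red curve has just been clicked, so that it is the current graphics object returned by gco.)

    set(gco, 'linewidth', 20)       % give the current object a line width of 20
    set(gco, 'color', [1 1 0.5])    % and change its colour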
80
00:08:12,900 --> 00:08:18,740
What was yellow again? 1, 1, 0.5.
81
00:08:20,270 --> 00:08:24,040
What's that? Okay. I'm not pretending not to know.
82
00:08:24,040 --> 00:08:28,600
By the way, I really don't remember what yellow is. Okay, so good.
83
00:08:28,930 --> 00:08:33,909
Very impressive. Okay, so that's just a reminder that we're talking about least squares.
84
00:08:33,910 --> 00:08:36,250
And I want to continue down that road.
85
00:08:37,570 --> 00:08:45,250
And in particular, the main thing today is to talk about the way that numerical linear algebra people think of least squares,
86
00:08:45,250 --> 00:08:52,810
which is in terms of a QR factorisation. Traditionally in physics, for example, people often think of a Gram-Schmidt factorisation.
87
00:08:53,050 --> 00:09:00,760
This is a way to regard that more algebraically and to connect it with an algorithm that is more numerically stable.
88
00:09:01,210 --> 00:09:07,640
So I want to talk about QR factorisation. Section 2.3.
89
00:09:16,880 --> 00:09:22,310
Q and R stand for matrices: Q is orthogonal and R is triangular.
90
00:09:24,050 --> 00:09:29,210
So just to remind you that when we. Oh, these patterns are annoying, aren't they?
91
00:09:41,820 --> 00:09:51,660
Just to remind you: when we write Q in this field, it almost always signifies a matrix whose columns are orthonormal.
92
00:09:52,710 --> 00:09:57,060
So if we have n columns, then Q is probably going to look like that.
93
00:09:57,300 --> 00:10:02,010
And typically the length of the columns will be greater than the number of columns.
94
00:10:02,280 --> 00:10:05,790
So a matrix like that is going to be rectangular.
95
00:10:07,170 --> 00:10:15,270
And in such a case, if we look at Q transpose Q, then that will be the n by n matrix of inner products of these columns.
96
00:10:15,480 --> 00:10:18,800
So that will be equal to the n by n identity.
97
00:10:20,790 --> 00:10:24,450
So pictorially, Q transpose Q looks like this.
98
00:10:26,900 --> 00:10:32,000
On the other hand, if we look at Q Q transpose, which also makes sense as a matrix,
99
00:10:32,780 --> 00:10:39,080
it's an m by m matrix, and it is not equal to the m by m identity.
100
00:10:39,860 --> 00:10:44,060
So let's see it pictorially. It has the right dimensions.
101
00:10:46,680 --> 00:10:53,190
But of course it can't possibly be the m by m identity, because if you think about it, it has rank n.
102
00:10:53,610 --> 00:10:59,220
So whatever this is, it's a matrix of rank n, whereas the m by m identity would have rank m.
103
00:10:59,610 --> 00:11:03,000
So it's a rank-n square matrix.
104
00:11:04,620 --> 00:11:12,950
Now, anyway: a QR factorisation, also known as a reduced QR factorisation,
105
00:11:14,380 --> 00:11:19,530
of our matrix A,
106
00:11:23,710 --> 00:11:28,330
is a factorisation of A into a product of a Q times an R.
107
00:11:29,860 --> 00:11:32,870
And I'll just draw the picture and also the formula.
108
00:11:32,890 --> 00:11:40,180
Of course, we have a matrix A, which we think of as consisting of n columns from a1 to an.
109
00:11:41,620 --> 00:11:45,010
A reduced QR factorisation writes that
110
00:11:46,130 --> 00:11:49,820
as a matrix Q with orthonormal columns,
111
00:11:51,450 --> 00:12:00,910
times an upper triangular matrix R with entries going from r11 to rnn.
112
00:12:01,270 --> 00:12:04,290
So that would be r11, r22, and so on.
113
00:12:04,480 --> 00:12:11,260
And it's upper triangular. So this would be r12 through r1n, with zeros down there.
114
00:12:12,760 --> 00:12:17,350
So this is a factorisation of matrix A into matrix
115
00:12:17,350 --> 00:12:26,910
Q times matrix R. And this is the basic way that we formulate things when talking about least squares problems.
116
00:12:27,150 --> 00:12:31,290
It's a change of basis in some sense that I will talk about from various angles.
117
00:12:32,220 --> 00:12:37,260
You start with a basis for an n-dimensional space consisting of the n columns of A.
118
00:12:38,060 --> 00:12:47,310
You change that to a basis which is orthonormal. Orthonormality has computational advantages, and it's also directly tied to least squares.
119
00:12:48,570 --> 00:12:51,750
So here. Of course I've said it. Let me write it down.
120
00:12:51,750 --> 00:12:59,190
The columns of Q are required to be orthonormal, and R is required to be upper triangular.
121
00:13:06,190 --> 00:13:07,840
So if you have a thing like that,
122
00:13:09,040 --> 00:13:16,510
let me remind you that I've been urging you to think of matrix times vector as a linear combination of the columns of the matrix.
123
00:13:16,780 --> 00:13:22,450
So here we have matrix times this vector and this vector and this vector and this vector.
124
00:13:23,110 --> 00:13:26,169
So the first column on the left will be
125
00:13:26,170 --> 00:13:35,560
q1 times r11. The second column on the left will be a linear combination of q1 and q2 with these coefficients, and so on.
126
00:13:35,800 --> 00:13:43,600
So let's write that down. a1 is a linear combination of just one vector.
127
00:13:44,320 --> 00:13:48,370
a2 is a linear combination of two vectors.
128
00:13:51,840 --> 00:13:59,550
And so on: a3 is r13
129
00:13:59,820 --> 00:14:03,390
q1, plus r23
130
00:14:03,780 --> 00:14:07,830
q2, plus r33
131
00:14:08,160 --> 00:14:18,180
q3. So each column on the left is a linear combination of columns on the right, with coefficients coming from one column of the matrix R.
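(Written out, the column relations just described are:)

\[
a_1 = r_{11}\,q_1, \qquad a_2 = r_{12}\,q_1 + r_{22}\,q_2, \qquad a_j = \sum_{i=1}^{j} r_{ij}\,q_i .
\]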
132
00:14:19,670 --> 00:14:26,870
So we can see there how it's a change of basis, where we're representing the columns of A in the successive columns of
133
00:14:26,870 --> 00:14:31,840
Q, which are orthonormal. So, generically:
134
00:14:35,990 --> 00:14:42,860
if a1 through an are linearly independent,
135
00:14:47,100 --> 00:14:54,870
then we've simply changed basis, and we have that the span of those vectors
136
00:14:57,290 --> 00:15:03,740
would be the same as the span of the q vectors.
137
00:15:05,370 --> 00:15:08,430
And by the span I mean the set of all linear combinations.
138
00:15:08,670 --> 00:15:13,020
So generically, this is an n dimensional space and this is a different basis for it.
139
00:15:13,530 --> 00:15:19,830
If they're linearly dependent, then the span of these vectors would be of a lower dimension.
140
00:15:20,130 --> 00:15:24,810
And so it would be properly contained in the span of these vectors.
141
00:15:28,620 --> 00:15:30,600
Okay. So that's a QR factorisation.
142
00:15:30,930 --> 00:15:36,210
Now let's play around with the algebra a little more and then talk about the ways that these things are calculated.
143
00:15:37,140 --> 00:15:42,090
So let's multiply on the left by Q transpose.
144
00:15:48,130 --> 00:15:51,700
So I'm going to multiply that equation on the left by Q transpose.
145
00:15:51,910 --> 00:15:55,780
Q transpose is a wide, short matrix.
146
00:15:56,200 --> 00:16:02,180
So what I get is Q transpose A.
147
00:16:08,460 --> 00:16:11,820
And then Q transpose times Q is the n by n identity.
148
00:16:12,030 --> 00:16:15,090
So that goes away, and I just get R.
149
00:16:19,310 --> 00:16:24,800
So Q transpose consists of rows from q1 transpose down to
150
00:16:24,980 --> 00:16:39,890
qn transpose. A, as always, consists of columns from a1 to an, and then R, as always, is r11 down to rnn, upper triangular.
151
00:16:42,990 --> 00:16:51,389
So that's another way to see what's going on here. The entries of R are inner products of vectors
152
00:16:51,390 --> 00:16:54,750
q with vectors a. So r
153
00:16:54,780 --> 00:17:04,620
ij, we see from here, is the inner product of qi with aj.
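(In symbols:)

\[
Q^{T} A = R, \qquad r_{ij} = q_i^{T} a_j .
\]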
154
00:17:09,520 --> 00:17:13,960
Now, another way to work with the algebra: let's suppose
155
00:17:17,030 --> 00:17:23,150
that we have some x, so it's any n-vector.
156
00:17:26,570 --> 00:17:30,320
And let's consider the corresponding b = A times x.
157
00:17:30,620 --> 00:17:41,060
So I could then say that b = Ax is a linear combination of the columns of A.
158
00:17:47,430 --> 00:17:55,710
So if we have our QR factorisation — if A = QR — then we can use the algebra and write it like this.
159
00:17:56,700 --> 00:18:00,389
I'll say b = Ax.
160
00:18:00,390 --> 00:18:07,260
So that's QRx, and we could write that as Q times Rx.
161
00:18:09,150 --> 00:18:15,390
And now I could multiply on the left by Q transpose, and I would get that Q transpose b
162
00:18:17,270 --> 00:18:26,710
is equal to Rx. So what I'm doing here is stuff you've all seen, I'm sure, from different angles,
163
00:18:27,460 --> 00:18:31,990
but I'm trying to urge you to think of it in terms of columns, of matrices, that's all.
164
00:18:32,170 --> 00:18:40,060
And with this factorisation, when you see a formula like this, I hope you remember to interpret it in terms of vectors of coefficients:
165
00:18:40,270 --> 00:18:46,210
we know that x is the vector of coefficients of an expansion of b in the basis of columns of A,
166
00:18:47,410 --> 00:18:54,010
and a neuron in your brain should see this — Q transpose b — and recognise that it must be a vector of coefficients of b
167
00:18:54,880 --> 00:18:58,630
in the basis of columns of
168
00:18:58,630 --> 00:19:05,080
Q. So, to be precise: x itself is the coefficient vector for b
169
00:19:10,110 --> 00:19:13,290
in the basis of columns of A.
170
00:19:16,830 --> 00:19:26,190
And then this thing — because it's Q transpose, which acts like an inverse, times a vector — is the coefficient vector
171
00:19:28,140 --> 00:19:31,570
for b, in the basis
172
00:19:32,620 --> 00:19:41,760
q1 through qn. So this formula shows us how you can convert from the A basis to the Q basis.
173
00:19:42,120 --> 00:19:46,460
And in fact, we could do it fully by inverting one of those matrices.
174
00:19:46,470 --> 00:19:51,280
So I could say that R inverse Q transpose b equals x, and so on.
175
00:19:51,300 --> 00:19:57,830
Many ways to slice it. Okay.
176
00:20:01,080 --> 00:20:08,540
When I was a student, like a lot of us, I learned different things in different subjects with completely different language.
177
00:20:08,550 --> 00:20:14,460
So I saw this stuff in physics courses, talked about one way, then I saw it in mathematics courses, talked about another way.
178
00:20:14,730 --> 00:20:20,100
It took me years to be comfortable with all the connections and to see that everything is really the same in the end.
179
00:20:21,750 --> 00:20:25,680
Okay, now I want to talk about how people compute QR Factorisation.
180
00:20:26,760 --> 00:20:34,320
So there's our QR factorisation; let's talk about its computation.
181
00:20:40,480 --> 00:20:43,480
This is kind of neat from a structural point of view.
182
00:20:57,530 --> 00:21:02,420
So we're given a matrix, a rectangular matrix. And we want to find Q and R.
183
00:21:04,580 --> 00:21:09,170
One way to view that is we're trying to find Q, which is an orthonormal basis.
184
00:21:09,620 --> 00:21:14,900
Another way to view it is we're trying to find R, which is, so to speak, the equivalent triangular matrix.
185
00:21:15,080 --> 00:21:20,600
And it turns out those two ways of looking at it lead to the two great classes of algorithm.
186
00:21:20,990 --> 00:21:24,650
So one of them is called Gram-Schmidt and the other is called Householder.
187
00:21:26,090 --> 00:21:34,060
So let me write the heading for Gram-Schmidt — the way I think of Gram-Schmidt,
188
00:21:36,350 --> 00:21:40,280
which is a pre-computer idea. Well, there's another word:
189
00:21:40,280 --> 00:21:47,300
Gram-Schmidt orthogonalisation. The word
190
00:21:47,330 --> 00:21:52,570
orthogonalisation says what it's all about: it's an algorithm in which we orthogonalise the columns of A.
191
00:21:52,840 --> 00:21:59,470
And the way I like to think of it is that it's a process of triangular orthogonalisation.
192
00:21:59,830 --> 00:22:10,830
By which I mean, as you'll see, it's making A orthogonal by a sequence of triangular operations.
193
00:22:13,650 --> 00:22:20,850
And let me skip ahead, leaving some blank space here, to the other one, which is Householder.
194
00:22:24,240 --> 00:22:27,330
So Householder triangularisation
195
00:22:30,900 --> 00:22:38,010
is the dual of this idea. And this is a post-computer idea.
196
00:22:38,280 --> 00:22:41,700
This is a process of orthogonal triangularisation.
197
00:22:51,960 --> 00:23:03,910
Big words — takes a while to write them. Now, those words are nearly enough for you to figure out, given enough time, what's going on.
198
00:23:04,150 --> 00:23:07,450
You know, your goal is to take A and convert it into Q times R.
199
00:23:07,840 --> 00:23:14,739
I'm in effect telling you there are two natural ways to do that. You can either try to make the columns of A orthogonal — write it down
200
00:23:14,740 --> 00:23:20,390
and you'll get Gram-Schmidt — or you can try to make A upper triangular using orthogonal operations.
201
00:23:20,410 --> 00:23:23,470
Write that down and you'll get Householder. Let me say a little more.
202
00:23:25,210 --> 00:23:39,320
So the Gram-Schmidt idea, viewed from an algebraic point of view, consists of taking your matrix A and multiplying it on the right by a sequence
203
00:23:40,390 --> 00:23:46,240
of triangular matrices. I haven't really told you what we're doing at an operational level,
204
00:23:46,510 --> 00:23:54,310
but algebraically Gram-Schmidt consists of taking your matrix and doing things to it until it becomes
205
00:23:56,250 --> 00:24:00,390
orthogonal — until the columns become orthonormal. Now, what does that really mean?
206
00:24:01,380 --> 00:24:04,890
When you multiply a matrix on the right by something.
207
00:24:06,080 --> 00:24:13,610
You're taking linear combinations of columns. And the Gram-Schmidt idea is to construct the first column of Q
208
00:24:14,640 --> 00:24:18,000
by normalising the first column of A; construct
209
00:24:18,000 --> 00:24:23,100
the second column of Q as a linear combination of the first two columns of A; the third
210
00:24:23,100 --> 00:24:27,390
column of Q as a linear combination of the first three columns of A; and so on.
211
00:24:27,570 --> 00:24:32,790
So it's doing a succession of operations on columns of A to make them orthogonal.
212
00:24:38,300 --> 00:24:43,910
I'm not going to write down the formulas, partly because most of us have seen them in one context or another.
213
00:24:44,150 --> 00:24:54,800
It's essentially these formulas. You use these formulas one by one to construct the entries of R, such that the columns have become orthogonal.
214
00:24:56,490 --> 00:25:00,660
You go through these operations acting on column one; then columns one and two; then one, two, three;
215
00:25:00,810 --> 00:25:05,310
all the way up to an operation acting on all the columns. And
216
00:25:07,080 --> 00:25:18,219
this, then, is a product of upper triangular matrices, which is upper triangular itself because it's a product of them.
217
00:25:18,220 --> 00:25:26,860
If we call that R inverse, then algebraically speaking we've found A = QR.
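(For reference, a minimal classical Gram-Schmidt sketch, using the small matrix from the demo later in the lecture; this is one standard way to write it, not the code used in the course.)

    % Classical Gram-Schmidt orthogonalisation of the columns of A (a minimal sketch)
    A = reshape((1:21).^2, 7, 3);       % any rectangular matrix will do
    [m, n] = size(A);
    Q = zeros(m, n); R = zeros(n, n);
    for j = 1:n
        v = A(:,j);
        for i = 1:j-1
            R(i,j) = Q(:,i)'*A(:,j);    % entries of R are inner products q_i'*a_j
            v = v - R(i,j)*Q(:,i);      % remove the components along earlier q's
        end
        R(j,j) = norm(v);
        Q(:,j) = v / R(j,j);            % normalise to get the next column of Q
    end
    norm(A - Q*R)                       % small: A has been factored as Q*R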
218
00:25:29,600 --> 00:25:33,050
Okay. So many of you know Gram-Schmidt; I'm sure some of you don't.
219
00:25:33,320 --> 00:25:36,020
But anyway, if you're curious about the details,
220
00:25:36,020 --> 00:25:44,540
look it up and then have some fun interpreting what you find as multiplications of a matrix on the right by elementary matrices.
221
00:25:45,350 --> 00:25:58,120
The one I want to emphasise more is the Householder approach, which is less obvious, but in fact is the standard approach for moderate-sized matrices.
222
00:25:58,270 --> 00:26:01,600
If you type qr in MATLAB, it's this method that's used.
223
00:26:01,810 --> 00:26:04,840
If you do least squares in MATLAB, it's this method that's used.
224
00:26:05,260 --> 00:26:12,340
It's the best way to deal accurately with matrices that are not so big as to cause sparsity challenges.
225
00:26:14,260 --> 00:26:21,070
So I want to talk about Householder. And before I do that, let's look at the paper, which is one of the handouts.
226
00:26:24,170 --> 00:26:28,490
So this is a greatest hit of numerical analysis. It's from 1958.
227
00:26:28,910 --> 00:26:32,990
Unitary Triangularization of a Nonsymmetric Matrix,
228
00:26:33,940 --> 00:26:39,160
by Alston Householder, who was one of the great figures of numerical linear algebra after the war.
229
00:26:39,850 --> 00:26:43,240
A real character. Smart guy. Whisky drinker.
230
00:26:43,510 --> 00:26:50,500
Good man. He died some years ago, but he had a huge influence on generations of people.
231
00:26:50,890 --> 00:26:54,040
I love this paper. It's so readable and it's so short.
232
00:26:54,040 --> 00:26:58,060
It's essentially three pages long. And it really did change the world.
233
00:26:59,170 --> 00:27:02,170
Nobody had noticed, to a first approximation,
234
00:27:03,270 --> 00:27:07,740
this duality: the whole world had been thinking in terms of orthogonalising things.
235
00:27:08,130 --> 00:27:14,580
Householder noticed that you could equally well triangularise things, and moreover that it was numerically more stable.
236
00:27:14,850 --> 00:27:19,140
So rounding errors cause more trouble with the Gram-Schmidt method than with the Householder method.
237
00:27:19,170 --> 00:27:27,180
Quite a remarkable observation. The previous thing that had come along is called Givens rotations, which has a bit of the same flavour.
238
00:27:27,420 --> 00:27:31,350
But Householder did it in really the ultimate way, I think.
239
00:27:31,830 --> 00:27:36,840
So we're going to say a bit about what he does — and maybe leave some of it to next lecture, but not right now.
240
00:27:37,140 --> 00:27:42,500
What he does, formally, is take a matrix A,
241
00:27:46,200 --> 00:27:50,760
And instead of acting on its columns, which would be multiplying by stuff on the right,
242
00:27:51,000 --> 00:27:55,080
he acts on its rows, which means multiplying by stuff on the left.
243
00:27:59,680 --> 00:28:08,200
So the picture is: Householder starts from a matrix, which we imagine being a dense matrix, rectangular.
244
00:28:10,720 --> 00:28:13,570
And he wants to make it upper triangular.
245
00:28:14,630 --> 00:28:22,010
And what he does is, first of all, multiply it on the left by something that turns it into this structure, with zeros below the first entry of the first column.
246
00:28:22,490 --> 00:28:25,580
So in the first column, the zeros are introduced.
247
00:28:27,390 --> 00:28:31,860
So that step consists of multiplying it on the left by an orthogonal matrix
248
00:28:31,860 --> 00:28:40,390
Q1. Then at the next step, zeros are introduced in the second column below the diagonal.
249
00:28:47,160 --> 00:28:50,460
So that would be multiplication on the left by another
250
00:28:51,660 --> 00:28:52,770
orthogonal matrix.
251
00:28:52,950 --> 00:29:01,950
So by a sequence of orthogonal operations, Householder triangularisation takes the dense matrix and turns it into an upper triangular matrix.
252
00:29:02,280 --> 00:29:09,600
So finally for this one, we have a third multiplication and now it would be upper triangular.
253
00:29:14,990 --> 00:29:20,270
So we're done. We've done it n times. And now we have an upper triangular matrix called R.
254
00:29:22,040 --> 00:29:26,510
And, same thing as before, if we call that Q inverse, or Q transpose,
255
00:29:28,900 --> 00:29:32,680
then we've got A = QR.
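(A minimal sketch of Householder triangularisation, forming Q explicitly for clarity — real implementations store the reflectors instead; the sign choice and the explicit accumulation of Q are standard textbook choices, not the code from the paper.)

    % Householder triangularisation of A (a minimal sketch)
    A = reshape((1:21).^2, 7, 3);
    [m, n] = size(A);
    R = A; Q = eye(m);
    for k = 1:n
        x = R(k:m, k);
        v = x; v(1) = v(1) + sign(x(1))*norm(x);           % reflection vector (assumes x(1) ~= 0)
        v = v / norm(v);
        R(k:m, k:n) = R(k:m, k:n) - 2*v*(v'*R(k:m, k:n));   % reflect rows k:m of R
        Q(:, k:m)   = Q(:, k:m)   - 2*(Q(:, k:m)*v)*v';     % accumulate Q = Q1*Q2*...
    end
    % Now A equals Q*R up to rounding, with Q orthogonal and R upper triangular;
    % Q(:,1:n) and R(1:n,:) give the reduced factorisation.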
256
00:29:47,770 --> 00:29:57,460
When I was your age and studying numerical analysis, I was told that this was simply better than that: more accurate, less subject to rounding errors.
257
00:29:57,730 --> 00:30:02,170
The truth is a little more subtle than that. In many basic applications
258
00:30:02,170 --> 00:30:07,210
that's true, and it certainly gives you more nearly orthogonal matrices
259
00:30:07,240 --> 00:30:13,360
Q. However, it turns out that orthogonality sometimes matters, and other times it doesn't.
260
00:30:13,600 --> 00:30:17,380
So it's not always true that the accuracy makes a difference.
261
00:30:17,800 --> 00:30:22,840
And there is an advantage to this approach, which is that it generalises more readily to sparse problems.
262
00:30:23,080 --> 00:30:27,220
So often you'll find that if your matrix is 1000 by a thousand, you're doing this.
263
00:30:27,490 --> 00:30:30,700
If it's a billion by a million, you may find yourself closer to there.
264
00:30:31,180 --> 00:30:35,740
Lots of possibilities. I want to play around with this stuff and see what happens.
265
00:30:36,190 --> 00:31:00,230
So let's go to the code called M10. So this is just illustrating QR factorisation.
266
00:31:01,670 --> 00:31:06,970
So I'll say A equals a random —
267
00:31:07,180 --> 00:31:16,080
sorry, a zeros matrix of dimension seven by three.
268
00:31:17,610 --> 00:31:20,969
There we are. And now I'm going to put some numbers into the matrix.
269
00:31:20,970 --> 00:31:24,660
So I'll say A(:) — a notation we haven't used before,
270
00:31:24,660 --> 00:31:29,490
but you can see what it does — will be (1:21).^2.
271
00:31:30,270 --> 00:31:39,060
So that takes the vector (1:21).^2 and puts it into A, without changing the shape, in the usual column order that we've talked about several times.
272
00:31:40,770 --> 00:31:45,800
So there we have an arbitrary matrix. Let's now say [Q,R] equals
273
00:31:45,870 --> 00:31:52,320
qr(A,0), which is MATLAB's notation for the reduced QR factorisation.
274
00:31:53,460 --> 00:31:58,410
If we do that, we'll get a Q and an R. So what you see here is that
275
00:31:59,560 --> 00:32:05,350
with Q, you can't tell that the columns are orthonormal vectors, but they are; R is obviously upper triangular.
276
00:32:05,590 --> 00:32:12,720
You can immediately see, by the way, why this way of thinking is kind of a post-computer way of thinking.
277
00:32:13,670 --> 00:32:18,180
Anything to do with orthonormality is not something you're going to write down on paper.
278
00:32:18,210 --> 00:32:23,670
There are square roots all over the place. These are numbers easy to compute and hard to work with on paper.
279
00:32:24,300 --> 00:32:28,770
So orthogonal linear algebra is really a post computer technology.
280
00:32:30,540 --> 00:32:33,959
But anyway, A is equal to QR, and I could do all sorts of things.
281
00:32:33,960 --> 00:32:40,320
I could say A minus Q*R, for example. Mathematically, that should be the zero matrix.
282
00:32:41,190 --> 00:32:44,520
Of course, with rounding errors it won't be exactly zero.
283
00:32:44,850 --> 00:32:48,320
There it is. That's actually bigger than you might think.
284
00:32:48,330 --> 00:32:53,780
But the reason is that A itself is pretty big, because it has entries in the hundreds.
285
00:32:53,790 --> 00:32:57,630
So 10 to the minus 12 relative to the scale of A is fine.
286
00:32:59,740 --> 00:33:03,760
Here's A again. If we look at the norm of the first column of A —
287
00:33:05,010 --> 00:33:10,910
square root of the sum of the squares of those numbers — there it is. Let me remind you, there's the matrix
288
00:33:10,920 --> 00:33:15,780
R. So, because Q is an orthonormalised version of A,
289
00:33:17,470 --> 00:33:24,430
the first column of Q is a normalisation of the first column of A, therefore the scalar has to come out there.
290
00:33:25,120 --> 00:33:33,880
So that's why 68 equals 68. Similarly, if you look at the norm of the second column of A
291
00:33:36,780 --> 00:33:40,050
and compare it with the norm of the second column of R,
292
00:33:41,310 --> 00:33:49,830
you find they're the same, because we're taking all the directions of A and putting them into Q, and all the magnitudes of A and putting them into R.
293
00:33:52,320 --> 00:33:57,330
Now, Q is orthogonal in the sense that it's a seven by three matrix with orthonormal columns.
294
00:33:57,340 --> 00:34:00,390
So if I take Q transpose times Q.
295
00:34:02,790 --> 00:34:05,070
You get the three by three identity.
296
00:34:07,460 --> 00:34:15,740
If I take Q times Q transpose, then that would be a seven by seven matrix, which can't possibly be the identity because it only has rank three.
297
00:34:16,370 --> 00:34:21,030
And indeed, if I say rank of ans, it's three.
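(Gathered together, the commands in this part of the demo are roughly the following — a reconstruction from the lecture, not the m10 file itself.)

    A = zeros(7,3);
    A(:) = (1:21).^2;           % fill A, column by column, with 1, 4, 9, ..., 441
    [Q, R] = qr(A, 0);          % reduced QR factorisation
    norm(A - Q*R)               % about 1e-12: zero up to rounding, on the scale of A
    norm(A(:,1)), R(1,1)        % both about 68.4 in absolute value
    Q'*Q                        % the 3-by-3 identity, up to rounding
    rank(Q*Q')                  % 3, so Q*Q' cannot be the 7-by-7 identity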
298
00:34:25,110 --> 00:34:30,690
I guess the final thing in this demo is to do a big one, to see how nice it is to live in the 21st century.
299
00:34:30,900 --> 00:34:36,240
So suppose I say A = randn(1e4, 1e3).
300
00:34:40,110 --> 00:34:46,800
So there we have a random matrix with 10,000 rows and 1000 columns.
301
00:34:47,400 --> 00:34:51,719
Pretty big. And let's do a QR factorisation.
302
00:34:51,720 --> 00:34:57,960
I'll say [Q,R] = qr(A,0).
303
00:34:59,550 --> 00:35:07,390
So on this laptop, it takes something like 4 seconds.
304
00:35:10,420 --> 00:35:15,940
But of course, we're dealing with 10 million entries. And in fact, it's a cubic-order operation.
305
00:35:15,940 --> 00:35:20,890
So the amount of work scales as 10,000 times 1000 squared.
306
00:35:20,900 --> 00:35:25,900
So we're doing a lot of computing. Let's see how accurate it was.
307
00:35:25,900 --> 00:35:33,130
So suppose I say: what's the maximum entry of the absolute value of Q
308
00:35:33,160 --> 00:35:38,880
transpose Q minus the identity of size 1e3?
309
00:35:39,340 --> 00:35:46,470
Okay, so we've just made a Q matrix that's 10,000 by 1000, and its columns are supposed to be orthonormal.
310
00:35:46,740 --> 00:35:52,650
Are they really orthonormal? Pretty good, right?
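(The commands just typed, gathered in one place; the tic/toc timing is an addition to show where the few seconds go.)

    A = randn(1e4, 1e3);                % 10,000-by-1,000 random matrix
    tic, [Q, R] = qr(A, 0); toc         % a few seconds on a laptop
    max(max(abs(Q'*Q - eye(1e3))))      % tiny: the columns are orthonormal to rounding error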
311
00:35:53,160 --> 00:35:57,239
If we had done it with Gram-Schmidt, they would be not even close to orthonormal.
312
00:35:57,240 --> 00:36:01,260
But Householder achieves this even with these incredible matrices.
313
00:36:01,770 --> 00:36:04,440
It's interesting to think historically about that.
314
00:36:04,450 --> 00:36:13,720
So this theory was developed by Householder and Wilkinson and others in an era when even 100 by 100 was too big for most experiments.
315
00:36:13,770 --> 00:36:17,100
Those guys would have been playing around with matrices of size 20 or 30.
316
00:36:17,460 --> 00:36:23,610
And that's the power of theory, right? They proved theorems about stability, which showed that the method really works.
317
00:36:23,610 --> 00:36:27,810
And here we are 50 years later, doing it on matrices a thousand times bigger.
318
00:36:27,960 --> 00:36:35,870
Just amazing. Okay.
319
00:36:35,870 --> 00:37:08,610
So I want to get back to least squares problems. Okay.
320
00:37:18,070 --> 00:37:36,120
We? Okay.
321
00:37:36,960 --> 00:37:41,730
We already have mentioned them a bit. I want to now connect them with QR factorisation.
322
00:37:41,940 --> 00:37:46,530
So this is called section 2.5, Linear least squares.
323
00:37:51,720 --> 00:37:56,910
There are, of course, also nonlinear least squares problems, which we'll say a little bit about in a couple of weeks.
324
00:37:59,280 --> 00:38:05,609
The history here is wonderful. This was discovered by Gauss and by Legendre independently.
325
00:38:05,610 --> 00:38:12,839
And boy, they had a big fight. So I think Gauss maybe did it first but, typically, didn't publish it — this was in the 1790s.
326
00:38:12,840 --> 00:38:18,840
And when they found out about each other, they were furious. Gauss was not a very nice man, but he was pretty bright.
327
00:38:20,490 --> 00:38:26,370
There's a great book. Sort of the standard numerical book about this stuff.
328
00:38:26,380 --> 00:38:30,310
Of course, least squares is everywhere — it's in statistics, everywhere, and so on.
329
00:38:30,550 --> 00:38:42,100
But from a numerical linear algebra point of view, the standard book is by Björck, who is not a rock star except in the mathematical sense.
330
00:38:44,020 --> 00:38:48,250
So just to remind you, we're always dealing here with rectangular matrices.
331
00:38:48,670 --> 00:38:53,730
So A is m by n, and m is bigger than n.
332
00:38:55,720 --> 00:39:01,690
And we're looking for x such that Ax minus b in the two-norm is minimal.
333
00:39:06,670 --> 00:39:14,670
And the picture I drew last lecture — let me draw it again — lies in the space
334
00:39:14,670 --> 00:39:25,100
R^m. And the picture I like to draw consists of a subspace of R^m, which is the range of A,
335
00:39:26,170 --> 00:39:28,730
also known as the column space of A.
336
00:39:31,220 --> 00:39:37,520
And whenever we do least squares, we're trying to find a vector in the range of A that's as close as possible to b.
337
00:39:38,060 --> 00:39:42,650
So we think of b as being a vector out here somewhere. And we're trying to find
338
00:39:44,070 --> 00:39:47,400
that point there — that's not x, that's Ax.
339
00:39:48,240 --> 00:39:51,900
So that's the point in the range of A as close as possible to b.
340
00:39:53,080 --> 00:39:57,490
It's obvious, intuitively at least, that there is a unique such point.
341
00:39:59,050 --> 00:40:02,380
There's not necessarily a unique x, but there's a unique Ax.
342
00:40:03,580 --> 00:40:09,250
And our job is to find it, and the way we find it mathematically is to exploit this orthogonality property.
343
00:40:09,640 --> 00:40:16,390
The least squares solution has the property that the residual, which is this vector, is characterised
344
00:40:16,390 --> 00:40:21,130
by being orthogonal to the range of A, and I mentioned all of this at the end of
345
00:40:22,160 --> 00:40:31,500
last lecture. So that orthogonality is the condition: the range of A must be orthogonal to the residual.
346
00:40:31,710 --> 00:40:42,210
So that leads us to the condition we identified at the end of the last lecture, which is that A transpose times (b minus Ax) should be the zero vector.
347
00:40:43,110 --> 00:40:52,380
This is the residual, and it must be orthogonal to each column of A, which is enough to make it orthogonal to the range of A.
348
00:40:53,190 --> 00:41:00,480
So these are the normal equations: A transpose A x equals A transpose b.
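(In symbols, the orthogonality condition and the resulting normal equations are:)

\[
A^{T}\,(b - Ax) = 0 \quad\Longleftrightarrow\quad A^{T}A\,x = A^{T}b .
\]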
349
00:41:08,260 --> 00:41:14,139
So as a rule, you shouldn't use this method as long as your matrices are small enough to do dense
350
00:41:14,140 --> 00:41:19,480
linear algebra with — don't solve a least squares problem by solving this system of equations.
351
00:41:19,720 --> 00:41:24,070
The reason is that A transpose A sort of squares the ill-conditioning of the problem.
352
00:41:24,280 --> 00:41:26,500
So you'll get a less accurate answer.
353
00:41:26,800 --> 00:41:33,700
On the other hand, for very large problems with sparsity issues, often this is the best way to go because it maintains sparsity.
354
00:41:33,700 --> 00:41:49,900
And to a certain degree. Let me mention a theorem about this stuff, which is that the rank of A being n — A having full rank — is equivalent
355
00:41:52,220 --> 00:41:59,580
to A transpose A being symmetric positive definite. A transpose
356
00:41:59,580 --> 00:42:03,500
A will always be symmetric, and it will always be positive semidefinite.
357
00:42:03,710 --> 00:42:07,490
So its eigenvalues can never be negative, but they might be zero.
358
00:42:07,970 --> 00:42:15,440
However, if the rank of A is equal to n rather than less than n, then all the eigenvalues are positive, which means it's nonsingular.
359
00:42:16,220 --> 00:42:20,990
So A transpose A is nonsingular if and only if the rank of A is n,
360
00:42:21,200 --> 00:42:26,720
which means that the solution vector x is unique.
361
00:42:30,950 --> 00:42:40,700
So Ax is always unique — and the picture sort of makes that obvious — but the vector x corresponding to it is unique if and only if A has full rank.
362
00:42:43,880 --> 00:42:47,210
So there are three ways we can easily talk about computing X.
363
00:42:47,900 --> 00:42:51,080
We've just in effect shown one of them: we could solve the normal equations.
364
00:42:51,230 --> 00:42:59,240
It's a symmetric positive definite system; normally we could solve that system and find x, but let's do it another way.
365
00:42:59,240 --> 00:43:05,270
Let's do it via QR factorisation. So how do we solve
366
00:43:05,880 --> 00:43:19,220
least squares via QR? Well, the idea is, if we have an orthonormal basis for the range of A, then the problem should be an easy one.
367
00:43:19,370 --> 00:43:22,460
We won't have to invert anything; we're just going to have to take some inner products.
368
00:43:22,730 --> 00:43:28,430
That's what the QR factorisation gives us. And we're a little short on time, so I'm going to write it down rather quickly.
369
00:43:31,930 --> 00:43:36,040
We're trying to project b orthogonally onto that range space.
370
00:43:37,570 --> 00:43:45,550
Let's write down an equation for that. So Ax is the orthogonal projection
371
00:43:51,930 --> 00:43:56,330
of b onto the range of A.
372
00:44:02,340 --> 00:44:05,570
That means that Ax —
373
00:44:06,740 --> 00:44:11,330
I'll write it down and then explain why it's right — can be written as Q
374
00:44:11,630 --> 00:44:18,830
Q transpose b. Now, why is that? Well, the Q transpose b part is going to give me expansion coefficients.
375
00:44:22,350 --> 00:44:24,570
I'm speaking fast and loose here, I'm afraid.
376
00:44:26,860 --> 00:44:32,200
And then this is going to take those expansion coefficients and multiply them by the appropriate vectors.
377
00:44:39,170 --> 00:44:48,590
So I'm afraid I am speaking quickly here. But it's a projection process where you take b and you take inner products in order to find coefficients,
378
00:44:48,590 --> 00:44:51,890
and then you use those coefficients in the appropriate basis.
379
00:44:52,250 --> 00:45:02,320
Now, if you multiply on the left by Q transpose, then you get —
380
00:45:04,770 --> 00:45:11,820
let me make sure I've got this right. I want to assume that A is QR, so, let's see, we have QR
381
00:45:12,150 --> 00:45:18,340
x equals Q Q transpose b. Multiply by Q transpose.
382
00:45:18,340 --> 00:45:22,470
So we get Rx equals Q transpose b,
383
00:45:22,620 --> 00:45:32,489
and that's the key equation that we can solve to do a least squares problem by the QR factorisation. So, I'm sorry,
384
00:45:32,490 --> 00:45:41,010
this is going too quickly. The idea is you take A, you factor it into Q times R, and then you solve this equation:
385
00:45:41,430 --> 00:45:46,350
you multiply Q transpose by b to get your coefficients, as it were, in that basis,
386
00:45:46,350 --> 00:45:51,420
and then you solve this equation to translate back to the original A basis.
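(Gathering the steps just described in one chain:)

\[
Ax = QQ^{T}b, \qquad A = QR \;\Rightarrow\; QRx = QQ^{T}b \;\Rightarrow\; Rx = Q^{T}b,
\]

and since R is upper triangular, x then follows by back substitution.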
387
00:45:51,930 --> 00:45:56,280
So that's the algorithm. Let's do it.
388
00:46:00,710 --> 00:46:03,560
And the final demo here is the one called M11.
389
00:46:08,520 --> 00:46:16,710
So, following the script here, I'm going to say t equals two pi times a random vector of length one thousand.
390
00:46:19,700 --> 00:46:25,990
So this is now a set of points scattered between zero and two pi, a thousand points,
391
00:46:26,270 --> 00:46:31,370
and I'll say b equals t over five plus sine of t over five,
392
00:46:32,180 --> 00:46:35,780
and then, using a logical operation, I'll say
393
00:46:36,990 --> 00:46:43,890
plus (t less than pi), so that vector will be zero when t is bigger than pi and one when t is less than pi.
394
00:46:45,030 --> 00:46:51,150
So if I plot it — I've just constructed that
395
00:46:52,270 --> 00:46:56,590
vector — this is a vector of length 1000, plotted by straight lines.
396
00:46:58,240 --> 00:47:02,319
Now let's make a matrix which has five columns that are trigonometric functions.
397
00:47:02,320 --> 00:47:13,120
So say A is equal to the thousand by five matrix: cosine of zero t, sine of t, cosine of t, sine of two
398
00:47:13,120 --> 00:47:19,130
t, cosine of two t. A thousand by five matrix.
399
00:47:20,060 --> 00:47:23,360
And now let's solve the least squares problem in three different ways.
400
00:47:23,570 --> 00:47:29,790
So one way is to just use backslash, so I could say x1 equals A backslash
401
00:47:29,800 --> 00:47:36,610
b. This is doing whatever MATLAB does, which is in fact the QR factorisation, and we could plot the result.
402
00:47:36,610 --> 00:47:44,140
I can plot A times x1 as a red line, and there you see the least squares fit.
403
00:47:47,440 --> 00:47:54,430
On the other hand, we could explicitly use the QR factorisation and do it ourselves rather than relying on MATLAB's backslash.
404
00:47:54,440 --> 00:48:03,170
So let's do that. I could say [Q, R] = qr(A, 0).
405
00:48:04,760 --> 00:48:11,600
And then I could say x2 equals R backslash Q transpose b.
406
00:48:13,990 --> 00:48:20,469
Now, that's a hazardous way to write it, because who knows whether the backslash is applied before the multiplication by the vector.
407
00:48:20,470 --> 00:48:21,850
This is a better way to write it, with parentheses: x2 = R\(Q'*b).
408
00:48:22,210 --> 00:48:32,050
So here, that operation means first we multiply b by Q transpose, and then we solve the problem Rx equals that vector.
409
00:48:34,690 --> 00:48:39,240
So if I do that, I have another version of X. This was the first one we computed.
410
00:48:39,250 --> 00:48:44,440
This is the second one we computed. They're pretty much the same: x1 minus x2 —
411
00:48:45,550 --> 00:48:49,090
there they are. They're not identical, notice.
412
00:48:49,090 --> 00:48:52,270
So obviously MATLAB did something slightly different from what we did.
413
00:48:52,720 --> 00:48:57,970
Finally, let's do the normal equations — and I don't want to tell you these are bad, merely that they're different.
414
00:48:58,300 --> 00:49:02,890
I could solve the problem by A transpose A
415
00:49:04,030 --> 00:49:09,970
backslash A transpose b. And this is a genuinely different computation.
416
00:49:10,180 --> 00:49:17,200
This is not what backslash is doing. It's really a different algorithm — advantageous sometimes, disadvantageous other times.
417
00:49:17,710 --> 00:49:22,980
If I look at the error here, I see that in this case it's fine.
418
00:49:22,990 --> 00:49:26,560
Nothing went wrong in this case. Okay.
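(Gathered together, the m11 demo is roughly the following; the exact formula for b isn't fully clear from the recording, so the sin(t/5) term and the sorting of t are assumptions.)

    t = 2*pi*sort(rand(1000,1));                       % 1000 points scattered in [0, 2*pi]
    b = t/5 + sin(t/5) + (t < pi);                     % smooth part plus a jump at pi (assumed form)
    A = [cos(0*t) sin(t) cos(t) sin(2*t) cos(2*t)];    % 1000-by-5 matrix of trigonometric columns
    x1 = A\b;                                          % 1: backslash (Householder QR inside)
    [Q, R] = qr(A, 0);
    x2 = R\(Q'*b);                                     % 2: explicit reduced QR
    x3 = (A'*A)\(A'*b);                                % 3: normal equations
    plot(t, b, '.', t, A*x1, 'r-')                     % data and the least squares fit
    [norm(x1-x2) norm(x1-x3)]                          % tiny, but not exactly zero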
419
00:49:26,580 --> 00:49:27,990
I think that's it for today. Thanks.