1 00:00:01,660 --> 00:00:16,820 [Auto-generated transcript. Edits may have been applied for clarity.] So. Welcome, everybody. 2 00:00:17,090 --> 00:00:22,180 Welcome to this lecture in the series of lectures honouring Christopher Strachey, 3 00:00:22,460 --> 00:00:27,200 who was a leading figure in the history of our department and in the history of programming languages. 4 00:00:27,650 --> 00:00:32,390 Before introducing today's speaker, I want to thank our sponsors, Oxford Asset Management, 5 00:00:32,690 --> 00:00:37,040 who have been, uh, generous enough to sponsor this series since 2015. 6 00:00:37,250 --> 00:00:43,650 So it's thanks to them that we can bring such excellent speakers. Um, um, students, uh, researchers, 7 00:00:43,790 --> 00:00:48,770 who want to find out about their company are, um, invited to meet with them after the talk. 8 00:00:50,240 --> 00:00:58,940 Okay. So how it's going to work is: I'm going to introduce Prakash, and then he's going to enlighten us and, uh, take awkward questions. 9 00:00:59,360 --> 00:01:02,530 Um, there are going to be people running around with microphones. 10 00:01:02,950 --> 00:01:05,959 Um, so please wait for one before you ask. Okay. 11 00:01:05,960 --> 00:01:17,000 So I'm really, really, really delighted to introduce Prakash Panangaden, um, who is a super well-known figure in the foundations of computer science. 12 00:01:17,700 --> 00:01:22,820 And I get a reminder of that every single day. 13 00:01:22,830 --> 00:01:27,330 Probably you didn't know that. And that's because of what I see when I walk down the corridor. 14 00:01:27,660 --> 00:01:32,880 Um, there are probably some students here from the quantum group office, and, um, 15 00:01:32,970 --> 00:01:39,380 they have a poster in their office with a picture that has Prakash in it. 16 00:01:39,390 --> 00:01:50,100 So I see this every single day. And, uh, this picture was from a Fest which was held here in Oxford in 2014 to celebrate a birthday. 17 00:01:50,430 --> 00:01:57,620 And so, in, uh, honour of that, I thought I'd read you a little bit of the blurb about Prakash. 18 00:01:57,810 --> 00:02:03,960 That's not all of it. Um, so it says: Prakash has worked on a large variety of topics, 19 00:02:04,350 --> 00:02:12,540 including probabilistic and concurrent computation, logics and duality, and quantum information and computation. 20 00:02:13,110 --> 00:02:18,690 Despite the enormous breadth of his research, he's made significant and deep contributions. 21 00:02:19,020 --> 00:02:19,749 For example, 22 00:02:19,750 --> 00:02:28,250 he introduced a logic, and also a real-valued interpretation of the logic, to capture equivalence of probabilistic processes quantitatively. 23 00:02:28,620 --> 00:02:32,970 So we're going to hear something about that today, because I see probabilistic bisimulation in the title. 24 00:02:33,330 --> 00:02:44,020 And he's going to tell us something new since then, uh, because he's also done a lot of, 25 00:02:44,020 --> 00:02:47,430 um, important work in machine learning and learning theory. 26 00:02:47,730 --> 00:02:55,020 So he was a founding member of the Reasoning and Learning Lab at McGill and also of Mila, the Montreal Institute for Learning Algorithms. 27 00:02:55,080 --> 00:02:58,229 So we're going to be getting some of that today as well. Okay.
28 00:02:58,230 --> 00:03:06,720 So I'm not going to tell you all the accolades, but I just want to very briefly point out that there are actually two test-of-time papers in this, 29 00:03:07,020 --> 00:03:14,040 which are awards celebrating papers that have had lasting influence. And, in fact, he is a fellow of the Royal Society of Canada. 30 00:03:14,580 --> 00:03:18,780 Um — yes — and founding chair of ACM SIGLOG. 31 00:03:19,170 --> 00:03:22,500 So, um, at this point, I will turn it over to you. That's wonderful. 32 00:03:23,610 --> 00:03:37,200 Thanks very much. Right. 33 00:03:37,470 --> 00:03:42,870 So first of all, it's a thrill and an honour to be here. Uh, I've spent a lot of time at Oxford: 34 00:03:42,870 --> 00:03:47,790 two different one-year sabbaticals, and I was trying to count up how many talks I've given. 35 00:03:47,790 --> 00:03:51,210 I think it's over 20 talks that I've given here in Oxford. 36 00:03:51,330 --> 00:03:54,960 Never in this room. Never been greeted by a dinosaur before. 37 00:03:55,290 --> 00:04:02,280 So, uh, I'm very happy to be here. And I'm thrilled to see so many people turning out to hear me. 38 00:04:03,030 --> 00:04:07,260 Okay, so, um, let me begin. 39 00:04:12,770 --> 00:04:17,300 I have to begin with Christopher Strachey, because he made a fundamental contribution. 40 00:04:17,310 --> 00:04:23,660 He, along with Dijkstra and Hoare, were the people who really started thinking about programming mathematically. 41 00:04:24,020 --> 00:04:30,560 And this has had a huge influence, not just in programming languages, but in related areas like verification and so on. 42 00:04:30,980 --> 00:04:38,120 And of course, Christopher Strachey and Dana Scott founded the so-called Scott-Strachey approach to denotational semantics, 43 00:04:38,120 --> 00:04:44,210 which is one of the things that sucked me into computer science from my earlier profession, which was physics. 44 00:04:45,330 --> 00:04:51,810 So, uh, I decided to show you a lot more photographs of people than Greek letters. 45 00:04:52,770 --> 00:04:54,930 So there's quite a gallery of pictures coming up. 46 00:04:55,650 --> 00:05:06,030 So the whole concept of bisimulation, um, arose from the work of these two giants of concurrency theory, Robin Milner and David Park. 47 00:05:06,720 --> 00:05:15,930 Um, so Milner was the first one who conceptualised it and came up with the first description, but Park gave a much more elegant fixed-point approach. 48 00:05:15,930 --> 00:05:24,120 And that's been the fruitful, uh, way of presenting bisimulation, which has readily generalised to other subjects, other domains. 49 00:05:24,900 --> 00:05:29,640 I was personally inspired by a number of people. One was, uh, Dexter Kozen, 50 00:05:29,970 --> 00:05:38,550 who was one of the people who first began thinking about probabilistic systems on continuous state spaces in a computer science context. 51 00:05:38,820 --> 00:05:46,170 Of course, people doing Markov process theory had been thinking about such things; people in computer science had been thinking about discrete Markov chains. 52 00:05:46,500 --> 00:05:50,980 But Dexter really started thinking about measure theory 53 00:05:52,700 --> 00:05:58,790 and related topics. And he taught a lecture course at Cornell when I was a young assistant professor there. 54 00:05:59,300 --> 00:06:02,450 And I sat through this lecture course, and I don't remember a thing about it.
55 00:06:02,840 --> 00:06:07,940 If I had, it would have saved me several years; instead I had to rediscover many of these things later. 56 00:06:09,160 --> 00:06:12,550 Another significant source of inspiration was Bill Lawvere, 57 00:06:13,060 --> 00:06:19,960 who actually formulated a notion of a category of probabilistic relations back in the early 60s. 58 00:06:20,590 --> 00:06:27,820 Um, tragically, he died just a couple of years ago. But he is legendary for his incredible insight into different things. 59 00:06:27,970 --> 00:06:35,350 He wasn't always able to prove stuff, but he was always able to connect stuff and, and have, uh, other people fill in the gaps. 60 00:06:35,800 --> 00:06:39,050 And here's a person who most people haven't heard of. 61 00:06:39,070 --> 00:06:49,540 Her name is Michèle Giry, and she's the one who gave a kind of monad-theoretic way of thinking about the category of probabilistic relations that, uh, Lawvere formulated. 62 00:06:51,870 --> 00:06:58,000 So let me, uh, suspend the pictures here for a minute and talk about behavioural equivalence. 63 00:06:58,180 --> 00:07:05,140 Now, behavioural equivalence simply means what it says. How do I know that two systems are behaving the same way? 64 00:07:05,800 --> 00:07:12,370 And in particular, we can look at two states within a system and ask when they have exactly the same behaviour. 65 00:07:12,910 --> 00:07:18,220 Now of course, this depends on what aspect of the behaviour one is able to observe. 66 00:07:18,820 --> 00:07:23,020 So if you're observing the whole of each state, that means one thing. 67 00:07:23,020 --> 00:07:32,530 If you're just observing certain reactions, or certain things being emitted as outputs, then the notion is different, and we will see variations on this idea. 68 00:07:33,580 --> 00:07:36,850 What do we want this equivalence relation to guarantee? 69 00:07:38,660 --> 00:07:46,400 Well, uh, if two states are equivalent, we should not be able to see any difference in their observable behaviour. 70 00:07:46,430 --> 00:07:48,380 This is kind of repeating what I have already said: 71 00:07:49,130 --> 00:07:57,200 it's trying to capture some notion of the observable aspects of the behaviour. And from the temporal point of view, 72 00:07:57,440 --> 00:08:02,479 if we start out with two states that are equivalent, they should stay equivalent as they evolve. 73 00:08:02,480 --> 00:08:07,940 Otherwise, one of the observations you can make is: aha, they've evolved into noticeably different things. 74 00:08:08,300 --> 00:08:14,410 And then they certainly could not have been equivalent to start with. So let me just indulge in a bit of history. 75 00:08:14,890 --> 00:08:20,980 So some of the ideas go back to, uh, Cantor and the back-and-forth argument. 76 00:08:21,010 --> 00:08:26,350 Now, all of you know Cantor's diagonalisation argument, but he actually proved many things, 77 00:08:26,680 --> 00:08:35,770 one of which was that the, um, only countable dense linear order without endpoints is the rationals. 78 00:08:36,400 --> 00:08:39,280 And the argument that he gave to prove it is essentially, uh, 79 00:08:39,550 --> 00:08:45,070 what we call a back-and-forth argument, very similar in spirit to what we will see in bisimulation. 80 00:08:45,280 --> 00:08:48,820 So I like to mention that as part of the pre-history of bisimulation. 81 00:08:49,960 --> 00:08:53,860 Yeah. Uh, sorry — were you asking a question? Ah, taking pictures. 82 00:08:54,070 --> 00:08:57,880 Yeah. Take a picture. Okay. So, uh.
83 00:08:59,750 --> 00:09:06,710 The concept of lumpability was mooted in the Markov chain community in the 60s, and this word is very evocative. 84 00:09:06,980 --> 00:09:15,350 It means when things are equivalent you can lump them into one state, thereby effectively reducing the state space with which you are working. 85 00:09:15,830 --> 00:09:21,200 So they gave this very nice name, lumpability, but it didn't catch on in the computer science community. 86 00:09:23,080 --> 00:09:31,900 Uh, Milner and Park in the 1970s and 1980s formulated the notion that we now call bisimulation, 87 00:09:32,020 --> 00:09:35,710 uh, and they defined it in a non-deterministic setting without probabilities. 88 00:09:36,130 --> 00:09:41,160 So I will walk you through these definitions eventually. Um. 89 00:09:42,190 --> 00:09:47,860 Yes. So they formulated these concepts, and there was the subject called process algebra, 90 00:09:47,860 --> 00:09:55,510 whereby people formulated different processes using algebraic notions and reasoned equationally about them. 91 00:09:56,020 --> 00:10:00,580 In fact, there was a time when every country in the EU seemed to have its own process algebra. 92 00:10:01,000 --> 00:10:10,930 Italy had half a dozen, and at some point it became clear that one needs deeper concepts than just the syntax of these algebras. 93 00:10:13,550 --> 00:10:21,650 Late in the 80s, um, Kim Larsen and Arne Skou formulated probabilistic bisimulation for discrete systems. 94 00:10:21,890 --> 00:10:28,010 So now, instead of just saying things can, um, transition non-deterministically, 95 00:10:28,220 --> 00:10:33,410 you actually have probability distributions governing the behaviour of transition systems. 96 00:10:33,620 --> 00:10:37,100 And one asks: what's the relevant notion of bisimulation in that setting? 97 00:10:38,000 --> 00:10:40,399 Later on, people added temporal concepts. 98 00:10:40,400 --> 00:10:50,240 So there were continuous-time Markov chains, Petri nets, and then the, uh, probabilistic — sorry, Performance Evaluation Process Algebra. 99 00:10:50,810 --> 00:10:56,910 Is that right? Yeah. Which Jane Hillston, sitting here, formulated in the early 90s. 100 00:10:58,820 --> 00:11:02,750 Carrying on with a bit more history: so this is where we entered the picture. 101 00:11:03,860 --> 00:11:10,970 As I said before, I was very keen on looking at bisimulation of Markov processes, but on continuous state spaces, 102 00:11:11,720 --> 00:11:20,420 and this was done originally in collaboration with my outstanding student Josée Desharnais, and Abbas Edalat at Imperial. 103 00:11:22,390 --> 00:11:26,260 Later on, we moved on from equivalence relations to metrics. 104 00:11:26,620 --> 00:11:33,670 So rather than just saying things are equivalent or not equivalent, we started talking about how different they are, or how, how similar they are. 105 00:11:35,260 --> 00:11:42,669 And there was a very beautiful fixed-point presentation of the work we developed, due to Franck van Breugel and James Worrell, 106 00:11:42,670 --> 00:11:46,630 who is sitting somewhere in the audience. So many things happened at Oxford. 107 00:11:47,610 --> 00:11:50,800 Uh, now, in the machine learning community, 108 00:11:51,460 --> 00:11:59,680 people were looking at Markov decision processes and, without knowing that bisimulation had already been invented, they reinvented it. 109 00:11:59,950 --> 00:12:06,070 But then they became aware of the earlier work and cited it appropriately.
110 00:12:06,670 --> 00:12:09,790 But essentially they were looking at Markov decision processes. 111 00:12:10,030 --> 00:12:13,630 But now one adds a notion of reward to the setup. 112 00:12:15,360 --> 00:12:20,760 And we were in the market for taking any equivalence relation and making it into a metric. 113 00:12:20,970 --> 00:12:25,260 So we came up with bisimulation metrics for these Markov decision processes. 114 00:12:27,740 --> 00:12:31,460 Uh, many things happened over the years, but let me jump ahead to 2021, 115 00:12:31,760 --> 00:12:37,790 when we started applying some of these bisimulation metrics to the problem of representation learning. 116 00:12:38,120 --> 00:12:48,860 And this was primarily done by my former PhD student, uh, Pablo Castro; my master's student, uh, Tyler Kastner, who's still doing his PhD; 117 00:12:49,550 --> 00:12:53,300 um, and, um, Mark Rowland at DeepMind in London. 118 00:12:57,100 --> 00:13:06,270 Uh, so that work, which I mentioned appeared in NeurIPS, was partly very principled and structured, and then hacked at the end. 119 00:13:07,170 --> 00:13:14,730 And that was a bit of a disappointment. I mean, but then we realised that the hack at the end actually had a nice interpretation in terms of kernels, 120 00:13:15,150 --> 00:13:22,140 which I won't talk about today, but that's something we very, very recently, uh, completed and published. 121 00:13:25,370 --> 00:13:32,120 Okay. Right. So let me begin with the technical development, which I'll try to do quickly. 122 00:13:32,990 --> 00:13:37,370 And sometimes you'll see text in blue. That means you can doze off and ignore it. 123 00:13:37,640 --> 00:13:43,160 But if you really want to read it, you can. So: we have labelled transition systems. 124 00:13:43,550 --> 00:13:47,180 The word transition system means you are somewhere and you move somewhere else. 125 00:13:47,390 --> 00:13:51,110 The places where you can be — these things are called states. And, 126 00:13:52,160 --> 00:13:55,920 um, you move. How do you move? By something happening. 127 00:13:55,940 --> 00:14:03,650 These things are called actions, or sometimes labels, and associated with every action is a transition. 128 00:14:04,190 --> 00:14:07,880 And what a transition does is it takes you from one state to another. 129 00:14:08,180 --> 00:14:13,130 And we indicate this with an arrow labelled by one of these actions. 130 00:14:13,520 --> 00:14:17,929 Now, these transitions can be indeterminate, which means you start from a given state, 131 00:14:17,930 --> 00:14:22,730 you perform a certain action, and you'll end up in one of possibly several different states. 132 00:14:23,000 --> 00:14:27,530 Sometimes, maybe, it's uniquely specified; sometimes not. So there's the nondeterminism. 133 00:14:29,140 --> 00:14:35,920 And it's, uh, nicer to read these labelled transitions with the label written on top of the arrow, 134 00:14:36,340 --> 00:14:40,390 rather than this peculiar notation shown on the other side. 135 00:14:42,190 --> 00:14:47,860 So here's an example that Robin Milner invented to explain these concepts when they were first introduced. 136 00:14:47,860 --> 00:14:52,090 And I'm sure all the verification people have seen this far too many times. 137 00:14:52,780 --> 00:14:56,860 I could give you a nice brainteaser to work on while I'm explaining this, but maybe not. 138 00:14:57,220 --> 00:15:00,730 So this is the legendary vending machine example. 139 00:15:01,300 --> 00:15:11,200 You have a vending machine.
You put, uh, a cup in the right place, you insert money, then you choose between tea and coffee, 140 00:15:12,400 --> 00:15:15,400 and then the machine responds by dispensing tea or 141 00:15:15,730 --> 00:15:25,490 coffee, as appropriate. And so this diagram, which I hope is self-explanatory, depicts the behaviour of this very simple machine. 142 00:15:26,970 --> 00:15:30,870 Note one thing about this machine: the interactive character of the machine. 143 00:15:31,290 --> 00:15:35,760 The agent does certain things, like press buttons, insert money, and so on. 144 00:15:36,180 --> 00:15:39,210 And the system also responds to these interactions. 145 00:15:41,080 --> 00:15:47,020 Now here's another one. And if you look at it, it kind of looks similar for a while. 146 00:15:47,950 --> 00:15:54,400 But then you notice a rather peculiar feature, which is: once you insert the money, 147 00:15:55,460 --> 00:15:58,640 then the system immediately makes a non-deterministic transition. 148 00:15:58,640 --> 00:16:04,190 So here's an example of non-determinism in action. It goes to either this state or that one, 149 00:16:05,120 --> 00:16:11,170 before you get a chance to choose. And once you're in one of those states, you have no choice. 150 00:16:11,180 --> 00:16:15,740 You can only ask for tea if you're there, or ask for coffee if you're here. 151 00:16:17,010 --> 00:16:23,040 Okay, so this is a very draconian machine that decides what you are going to drink today. 152 00:16:24,000 --> 00:16:29,639 Now of course, you ask: are these two things behaviourally similar? 153 00:16:29,640 --> 00:16:38,100 Of course not. They're manifestly different. And you can observe the difference by attempting to choose tea or coffee. In the previous machine, 154 00:16:38,100 --> 00:16:43,430 you will find, no matter what you do, once you've inserted the money, you can choose whichever one you want. 155 00:16:43,440 --> 00:16:50,160 And here you'll notice sometimes it rejects your choice. That's certainly a very observable difference. But, 156 00:16:52,210 --> 00:16:56,590 um, right, so one of them gives us the choice; the other one makes the choice internally. 157 00:16:56,950 --> 00:17:01,630 But if you look at the sequences of actions that you can get, they are identical. 158 00:17:02,170 --> 00:17:05,830 You put the cup, you put the money, then you have a choice between coffee and tea. 159 00:17:06,280 --> 00:17:11,530 And in the language of regular expressions, this expression describes the possible behaviours of both machines — they are the same. 160 00:17:13,180 --> 00:17:19,360 In short, these strings are not capturing the branching structure, which is essential to distinguish these two machines. 161 00:17:20,140 --> 00:17:29,530 And hence, you know, old-fashioned automata theory and language theory is not going to do anything for you in analysing these concepts. 162 00:17:30,840 --> 00:17:36,280 So we need to go beyond language equivalence. So, time for a formal definition. 163 00:17:37,090 --> 00:17:40,960 We have two states, S and T. 164 00:17:41,370 --> 00:17:48,840 This dotted line connecting them is supposed to suggest that they are bisimilar: they are going to behave exactly the same way. 165 00:17:49,530 --> 00:17:51,299 And what the formal definition says — 166 00:17:51,300 --> 00:17:59,400 it's written down here with all the symbols — is: whatever action you choose, if this one, S, goes to some state S',
167 00:17:59,790 --> 00:18:06,030 then there is some state T' such that if you do the same action on T, it can take you there. 168 00:18:07,120 --> 00:18:11,860 And these new states, S' and T', themselves continue to be related. 169 00:18:12,100 --> 00:18:16,690 So there is the temporal persistence in the, in the, in this relationship. 170 00:18:17,470 --> 00:18:23,740 Also, of course, we want to symmetrise this: the same thing vice versa, with S and T interchanging their roles. 171 00:18:24,220 --> 00:18:31,030 So that's the formal definition: a kind of a step-by-step account of what it means to be behaviourally equivalent. 172 00:18:31,420 --> 00:18:36,400 And what are you observing in this setting? Whether or not your system responds to certain actions. 173 00:18:36,940 --> 00:18:40,009 It might happen that you say: I'm pressing the a button, 174 00:18:40,010 --> 00:18:46,330 and the system just says: forget it, I'm not doing it. So your observation is: which actions does the system make? 175 00:18:46,590 --> 00:18:49,600 Except you don't actually see the internal structure of the states. 176 00:18:51,890 --> 00:19:00,470 Now we can make all this probabilistic by introducing discrete probabilistic transition systems. They're just like the old ones, 177 00:19:01,040 --> 00:19:07,340 except that now, for the final state, instead of just being told it's one of these 17 states that you might go to, 178 00:19:07,490 --> 00:19:15,170 you're actually given a probability distribution, with numerical quantities telling you how probable a certain final state is. 179 00:19:15,770 --> 00:19:25,950 So just like before, we have states and actions, but now associated with every action is this transition matrix, or transition function, that says: start 180 00:19:25,950 --> 00:19:32,450 in the state s, do the action a, and there'll be some specific distribution over the final states. 181 00:19:33,680 --> 00:19:39,170 Now, this model is reactive, which means all these probabilistic phenomena are internal. 182 00:19:39,440 --> 00:19:43,909 We are not trying to associate probabilities with the actual external actions. 183 00:19:43,910 --> 00:19:48,170 There is no such thing as: action a happens with this probability, 184 00:19:48,170 --> 00:19:52,680 action b happens with this other probability. It's just: given that a happened, 185 00:19:52,730 --> 00:20:02,070 here's how the system responds. So let's ask about probabilistic bisimulation, as defined originally by Larsen and Skou. 186 00:20:02,310 --> 00:20:06,900 This was a very fundamental and seminal paper on the subject. 187 00:20:07,740 --> 00:20:12,900 So here are some systems. I hope the notation is clear. These S's and T's are states. 188 00:20:13,410 --> 00:20:17,310 I haven't bothered to label these ones, because they are so-called dead states where nothing can happen. 189 00:20:18,780 --> 00:20:23,550 These arrows are labelled by both an action as well as a number. 190 00:20:24,000 --> 00:20:32,730 And the way you should read this is: if you are in state S0 and you do an action a, with one-third probability you wind up in each one of these states. 191 00:20:34,490 --> 00:20:42,800 Uh, and once you're in the state S1, you can do a b action and come, with probability one, to this dead state. 192 00:20:43,040 --> 00:20:50,150 You can't do any of the other actions: you can't do a, you can't do c, you can only do b. Similarly, here and here you can only do c, 193 00:20:50,570 --> 00:20:54,260 and it'll happen with probability one that you will end up in this dead state.
194 00:20:54,530 --> 00:20:59,000 Now, this one looks very similar. 195 00:20:59,010 --> 00:21:04,860 And the question is: is S0 bisimilar — behaviourally equivalent — to T0? 196 00:21:05,130 --> 00:21:11,610 And you look at it and you see: wait, I can't find a perfect matching between transitions like I was doing before. 197 00:21:12,900 --> 00:21:22,800 And then you realise: oh, what you need to do is view this S2 and S3 as lumped into one equivalence class and add up the probabilities. 198 00:21:23,070 --> 00:21:25,170 Then this particular combination 199 00:21:26,050 --> 00:21:33,610 is exactly behaving like T2. And you can see, from T2, that it's not just the incoming but also the outgoing transitions that behave the same: 200 00:21:33,880 --> 00:21:39,400 either one of those states does a transition with probability one to a dead state. 201 00:21:39,400 --> 00:21:42,630 That's exactly what happens here. The b — 202 00:21:42,990 --> 00:21:48,750 uh, sorry — this transition is perfectly matched by this pair. 203 00:21:49,350 --> 00:21:53,700 So, in fact, once you realise that you have to add these things, 204 00:21:55,320 --> 00:22:02,190 you do have a notion of bisimulation, but you can't just use the arrow-by-arrow matching definition that you had before. 205 00:22:02,940 --> 00:22:06,690 You need to add up the probabilities. And here's blue text formalising this, 206 00:22:07,020 --> 00:22:10,179 but I will skip over it. Now, 207 00:22:10,180 --> 00:22:15,970 Markov decision processes are probabilistic versions of labelled transition systems, 208 00:22:16,660 --> 00:22:22,720 where the, um, the, uh, final state is governed by a probability distribution — 209 00:22:23,020 --> 00:22:27,670 no other indeterminacy — and now there's a reward associated with each transition. 210 00:22:28,300 --> 00:22:33,700 And what we observe are the interactions: does it accept an action a, and what is the reward? 211 00:22:34,000 --> 00:22:36,400 So again, we don't see the internal states. 212 00:22:36,910 --> 00:22:42,970 So the formal definition looks like this: a set of states, a set of actions, and, associated with every action, 213 00:22:43,240 --> 00:22:49,780 a probability function that says: if you do action a in state s, you'll get this distribution over the final states. 214 00:22:50,080 --> 00:22:55,960 And here's the reward function, which says: if you do action a in state s, you get a certain numerical reward. 215 00:22:56,740 --> 00:23:04,510 Sometimes people make the rewards live in the range zero to one, sometimes in the positive reals — whatever; it doesn't make a fundamental difference. 216 00:23:05,230 --> 00:23:11,410 This, uh, reward I've made a deterministic function, but without too much difficulty, or doing violence to the theory, 217 00:23:11,590 --> 00:23:14,650 you can make the reward itself a probabilistic thing. 218 00:23:17,750 --> 00:23:20,870 There's something very different that happens in this point of view 219 00:23:20,870 --> 00:23:27,109 that isn't the case with how we dealt with these systems in the world of verification. In verification, 220 00:23:27,110 --> 00:23:31,580 we wanted to say things like: no matter what, this system will do such and such — 221 00:23:32,360 --> 00:23:38,629 this plane won't crash, for example. Um, but here you're doing something else: 222 00:23:38,630 --> 00:23:44,270 you're controlling the system by choosing the actions. And so we have this concept called a policy. 223 00:23:45,230 --> 00:23:49,340 So we have these, uh, Markov decision processes.
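[Editor's note: a minimal Python sketch of the lumping idea just described. The dictionary encoding of the two example systems and the helper names here are hypothetical illustrations, not from the talk; the check implements the Larsen-Skou-style condition by summing transition probabilities over equivalence classes.]

    # A discrete probabilistic transition system (reactive model):
    # trans[state][action] maps successor states to probabilities.
    trans = {
        "s0": {"a": {"s1": 1/3, "s2": 1/3, "s3": 1/3}},
        "s1": {"b": {"dead": 1.0}},
        "s2": {"c": {"dead": 1.0}},
        "s3": {"c": {"dead": 1.0}},
        "t0": {"a": {"t1": 1/3, "t2": 2/3}},
        "t1": {"b": {"dead": 1.0}},
        "t2": {"c": {"dead": 1.0}},
        "dead": {},
    }

    # A candidate equivalence relation, given as a partition of the states.
    # Lumping s2 and s3 together with t2 is exactly the step described above.
    partition = [{"s0", "t0"}, {"s1", "t1"}, {"s2", "s3", "t2"}, {"dead"}]

    def prob_into_class(state, action, cls):
        """Total probability of jumping from `state` into the class `cls` via `action`."""
        return sum(p for nxt, p in trans[state].get(action, {}).items() if nxt in cls)

    def is_prob_bisimulation(partition):
        """Larsen-Skou condition: related states assign equal probability
        to every equivalence class, for every action."""
        actions = {a for outgoing in trans.values() for a in outgoing}
        for block in partition:            # states claimed equivalent
            for cls in partition:          # target equivalence class
                for a in actions:
                    probs = {round(prob_into_class(s, a, cls), 9) for s in block}
                    if len(probs) > 1:     # two "equivalent" states disagree
                        return False
        return True

    print(is_prob_bisimulation(partition))  # True: S0 and T0 are bisimilar
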
224 00:23:49,940 --> 00:23:58,040 And a policy says: given a state s, I'm going to choose the action based on that state. 225 00:23:58,040 --> 00:24:01,200 But that choice may be probabilistic, 226 00:24:01,350 --> 00:24:07,480 so there's a distribution over actions. That is what happens once you choose a particular policy π. 227 00:24:08,260 --> 00:24:17,050 And once you've chosen a policy, your MDP becomes just like a probabilistic transition system, enriched with the reward concept. 228 00:24:19,560 --> 00:24:26,670 So the goal is to choose the best policy. And a lot of machine learning, especially reinforcement learning, 229 00:24:26,910 --> 00:24:32,550 is about the numerous algorithms that exist to find or approximate an optimal policy. 230 00:24:33,090 --> 00:24:37,830 So a lot of the technology of optimisation theory comes into play here. 231 00:24:39,100 --> 00:24:45,820 Now, what's the equivalence concept appropriate to this setting? So, let R be an equivalence relation. 232 00:24:46,420 --> 00:24:50,080 You notice the use of the indefinite article: an equivalence relation. 233 00:24:50,950 --> 00:24:57,880 It's just, uh, something I'm going to put forward as a property that relations might or might not have. 234 00:24:58,240 --> 00:25:03,310 So given a relation R, it might or might not be a bisimulation relation. 235 00:25:04,030 --> 00:25:08,470 So R is a bisimulation relation if, when you've got a pair of related states s and t, 236 00:25:09,490 --> 00:25:20,770 then for every action, and every equivalence class of this R: first, the rewards match; and second, the transition probabilities 237 00:25:21,830 --> 00:25:25,670 into the equivalence classes match. So the fact that we're using 238 00:25:25,970 --> 00:25:32,330 equivalence classes amounts to lumping things together, so that you're adding up the probabilities appropriately. 239 00:25:33,530 --> 00:25:38,000 And this is the basic pattern: you have some kind of immediate matching that must happen — 240 00:25:38,270 --> 00:25:41,540 otherwise you'll see right away they're different; so the rewards must match. 241 00:25:41,840 --> 00:25:45,559 And then you want temporal persistence of this behaviour, 242 00:25:45,560 --> 00:25:50,120 and that comes from the second condition, which is typically called the coinduction condition. 243 00:25:52,160 --> 00:25:55,580 Now, this is in blue, but I'm going to read it anyway. 244 00:25:55,850 --> 00:26:04,430 So, uh, this definition is kind of intuitive and clear, but it can also be defined as the greatest fixed point of an appropriate relation transformer. 245 00:26:06,670 --> 00:26:11,380 Now, as I said before, I want to move to the world of continuous state spaces. 246 00:26:12,220 --> 00:26:15,760 And one of the questions you can ask is: why? 247 00:26:16,780 --> 00:26:21,280 And you can't answer: oh, because I'm having fun — which is usually the real reason. 248 00:26:21,820 --> 00:26:25,900 Uh, but the setting is real: 249 00:26:26,110 --> 00:26:31,600 you have software controllers attached to physical devices and sensors, like robots or other controllers. 250 00:26:31,960 --> 00:26:39,250 Around the time I began working in this area, one of my colleagues got a contract to consult with the avionics industry, 251 00:26:39,250 --> 00:26:45,190 and they were very interested — it didn't really get off the ground, but at least we began that way. 252 00:26:45,580 --> 00:26:51,360 And I remember, in the first meeting, I asked them: how do you verify the correctness of your software?
253 00:26:51,370 --> 00:26:56,800 And they said: we have meetings. And I said: what if it's really safety-critical software? 254 00:26:56,830 --> 00:27:02,080 We have lots of meetings for those. So that did not really, uh, enhance my confidence. 255 00:27:02,080 --> 00:27:07,690 But planes have not crashed from software failures yet. But nevertheless, it's an important thing to reason about. 256 00:27:09,310 --> 00:27:13,360 So the setting we're going to look at is continuous state space but discrete time. 257 00:27:13,690 --> 00:27:17,900 And you might question: if you're going to make the state space continuous, why not the temporal part? 258 00:27:17,920 --> 00:27:22,090 Well, the main reason was it's much harder. But we're doing that now. 259 00:27:22,090 --> 00:27:24,940 But at the time we decided to stick with discrete time. 260 00:27:25,120 --> 00:27:31,510 And it's not such an unreasonable thing, because many of these controllers sample, you know, 50 times a second — 261 00:27:32,410 --> 00:27:36,940 maybe I signed an NDA not to say that, but anyway, they sample some number of times per second. 262 00:27:37,300 --> 00:27:40,940 And based on this discrete temporal sample, it makes some decisions. 263 00:27:40,960 --> 00:27:44,980 So it does look discrete: even though, of course, dynamical systems are, you know, 264 00:27:45,130 --> 00:27:51,280 in continuous time, the software that you're writing thinks it's in discrete time. 265 00:27:52,470 --> 00:27:57,000 So there were applications to control systems, and to probabilistic programming languages, 266 00:27:57,000 --> 00:28:03,180 because once you put iteration or recursion into a language, even with only discrete probabilistic choice, 267 00:28:03,180 --> 00:28:09,480 you will effectively get, uh, continuous probability spaces by looking at the space of trajectories. 268 00:28:10,560 --> 00:28:15,900 So let me make some more remarks about this. Of course, these continuous state spaces can be used for reasoning, 269 00:28:16,350 --> 00:28:20,850 but maybe it would be much better if we could have a finite-state version — whatever that means, 270 00:28:20,850 --> 00:28:26,100 exactly. And in fact, we spent a lot of effort, which I'm not going to talk about today, 271 00:28:26,100 --> 00:28:33,510 on approximating continuous spaces with finite-state ones, uh, with some appropriate notion of approximation. 272 00:28:34,320 --> 00:28:39,900 But you could say: why not just discretize right away and never worry about the continuous case? 273 00:28:40,590 --> 00:28:46,920 Well, if you don't have the continuous case — the original thing to compare with — how can you tell that your approximation is a good one? 274 00:28:48,280 --> 00:28:53,950 And secondly, if you've thrown away the original continuous model and you've just got a particular discrete one, 275 00:28:54,040 --> 00:28:57,490 you can't refine it; you can coarsen it, but you can't refine it further. 276 00:28:59,650 --> 00:29:09,610 Okay. So, uh, let me go from those considerations to the mathematical ones. And the first one is the need for measure theory. 277 00:29:10,000 --> 00:29:12,370 Why do we have to do all this measure theory nonsense? 278 00:29:12,940 --> 00:29:22,600 Um, and the basic fact is: there are subsets of the real line for which no sensible notion of size can be defined. 279 00:29:23,470 --> 00:29:28,299 And size, of course, is the heart of measure theory. So you look at the real line: 280 00:29:28,300 --> 00:29:31,030 you have a notion of interval.
You have a notion of the length of an interval. 281 00:29:31,390 --> 00:29:35,590 You start getting more sophisticated sets and ask: what is the analogue of the notion of length? 282 00:29:36,040 --> 00:29:41,080 You might ask: what is the length of the rational numbers? It turns out to be zero; but you can ask questions like that. 283 00:29:41,740 --> 00:29:46,330 What is the length of the Cantor set? It is zero, even though it's uncountable. 284 00:29:46,360 --> 00:29:51,340 So that tells you that countability is absolutely the wrong way of measuring size in this setting, 285 00:29:51,790 --> 00:29:55,360 where you do want some kind of geometric intuition underpinning all this. 286 00:29:55,900 --> 00:30:02,140 So the fact is, until you invoke measure theory and sigma-algebras and all the apparatus that goes with it, 287 00:30:02,440 --> 00:30:07,720 you can't just willy-nilly talk about sizes and probabilities in a continuous setting. 288 00:30:09,090 --> 00:30:16,380 The more precise statement, due to Vitali, is that there is no translation-invariant measure defined on all subsets of the reals. 289 00:30:16,680 --> 00:30:19,499 There are, of course, measures that are not translation-invariant, 290 00:30:19,500 --> 00:30:27,330 like the measure that says any set containing zero has measure one and anything else has measure zero — but that's certainly not translation-invariant. 291 00:30:29,450 --> 00:30:32,930 All right. So we went to this, this continuous world. 292 00:30:33,860 --> 00:30:38,120 We used essentially the same definition of probabilistic bisimulation. 293 00:30:38,750 --> 00:30:43,310 And we found some curious things happening. 294 00:30:43,940 --> 00:30:46,690 So first of all, even before us, right — 295 00:30:46,700 --> 00:30:55,819 going back to, uh, Robin Milner's work — he and Matthew Hennessy, and independently, um, uh — God, who's the Dutch guy? 296 00:30:55,820 --> 00:31:04,809 Samson? Uh — Johan van Benthem, who is the one who did the logical characterisation. 297 00:31:04,810 --> 00:31:08,170 Thank you; van Benthem always gets forgotten. Anyway, 298 00:31:08,170 --> 00:31:16,740 um, what they did was they showed that two systems are bisimilar if and only if — 299 00:31:17,100 --> 00:31:22,620 so that's very strong — they agree on all formulas of a specific logic that you can write down. 300 00:31:23,800 --> 00:31:29,100 Now, this is very important because it tells you, uh, how to show things are not bisimilar. 301 00:31:29,710 --> 00:31:36,040 If you want to show things are bisimilar, yes, you try and construct one of those relations and see if you can get the two states in question 302 00:31:36,040 --> 00:31:39,969 related. But if you tell me they're not bisimilar, how do I show that? 303 00:31:39,970 --> 00:31:43,090 I can't say: well, I tried very hard to find a relation and failed. 304 00:31:43,510 --> 00:31:50,440 What you need is some result like this. Then you can exhibit a formula and say: see, they disagree on this formula. 305 00:31:51,190 --> 00:31:58,630 Now, the formulas that appeared in the original logic had all kinds of things: disjunction, and negation specifically. 306 00:31:59,530 --> 00:32:03,790 And, of course, it was restricted to finitely branching systems. 307 00:32:04,810 --> 00:32:14,740 And when we began, we were told by all the wise men in the field that we had no hope of doing an analogous thing in this kind of continuous setting.
308 00:32:15,250 --> 00:32:23,910 We lacked the kind of discrete structure that you would expect in logic. And so our logic had just these formulas: 309 00:32:24,150 --> 00:32:30,010 just the atom true, binary conjunction, and this peculiar modal formula. 310 00:32:30,030 --> 00:32:35,280 So this action is inside a little diamond, and subscripted by a rational number q. 311 00:32:35,730 --> 00:32:46,410 And it says: a state satisfies this formula if it can perform an a-action with probability at least q and wind up in a state that then satisfies phi. 312 00:32:46,590 --> 00:32:48,780 So this is an inductive definition of formulas. 313 00:32:50,690 --> 00:32:59,570 And the result we were able to prove was that two systems are bisimilar if and only if they obey the same formulas of this logic L. 314 00:32:59,810 --> 00:33:05,959 So this is striking, because: no negation, no disjunction — even in the continuous setting, 315 00:33:05,960 --> 00:33:13,070 which is certainly more than just the infinite discrete setting — uh, binary conjunction suffices. And 316 00:33:14,100 --> 00:33:18,300 I remember vividly that, uh, when my student first presented this, 317 00:33:18,720 --> 00:33:21,750 people jumped up — very kindly, because, you know, they were kind, too. 318 00:33:21,960 --> 00:33:25,740 It was a ten-minute presentation at a summer school where the students were allowed to present. 319 00:33:26,460 --> 00:33:32,700 And, um, they very politely told her: oh, uh, this can't possibly be right. 320 00:33:33,390 --> 00:33:37,590 And she didn't know English very well — she was, she's a French speaker from Montreal — 321 00:33:38,160 --> 00:33:41,580 and she said: be quiet; at the end of my talk you will be convinced. 322 00:33:43,230 --> 00:33:47,370 So, uh, I wished I had the confidence to say stuff like that. 323 00:33:49,920 --> 00:33:55,070 Uh, so the important thing is, as I said before: no finite branching assumption, no negation. 324 00:33:55,280 --> 00:33:58,670 This was what people were shocked by. They said: how can this possibly be? 325 00:33:59,600 --> 00:34:03,800 Uh, the proof is entirely different from the combinatorial proofs that existed before — 326 00:34:04,010 --> 00:34:08,540 which is why it wasn't subject to those, um, to that dependence on discreteness. 327 00:34:08,900 --> 00:34:13,610 So it uses tools from, of all things, descriptive set theory, as well as measure theory. 328 00:34:15,460 --> 00:34:24,970 Uh, right. So, as I said before, such a theorem was originally proved for non-probabilistic systems with finite branching restrictions, and later by, uh, 329 00:34:25,060 --> 00:34:32,440 Larsen and Skou in the discrete probabilistic setting, with a lot of restrictions on how much the branching was controlled and so on. 330 00:34:33,280 --> 00:34:38,290 And because of the power of the techniques we used, we were able to get — 331 00:34:38,290 --> 00:34:42,520 so you see, these results are even true for discrete systems, after the fact. 332 00:34:43,300 --> 00:34:48,580 So this gives a logical characterisation of discrete probabilistic systems too; 333 00:34:48,730 --> 00:34:55,630 but it was only by going through this, uh, um, excursion that one was able to prove it at first. 334 00:34:55,990 --> 00:34:59,800 Later on, of course, people did provide purely discrete proofs of it. 335 00:35:00,900 --> 00:35:04,480 And I must at this point pay homage to my former student. 336 00:35:04,500 --> 00:35:09,330 She's now a full professor: Josée Desharnais.
Basically, she proved everything difficult. 337 00:35:10,170 --> 00:35:16,440 Uh, I proved many things incorrectly, and she fixed them. 338 00:35:16,830 --> 00:35:20,280 So I call her my proof engine, because she made — 339 00:35:20,280 --> 00:35:23,309 she, uh — I think we were a very good partnership, 340 00:35:23,310 --> 00:35:28,709 because I would say: yes, let's do this, and it must work like this. And she would say: no, no, no, it doesn't work like that. 341 00:35:28,710 --> 00:35:34,620 And sure enough, she would be right, but it wasn't far from what I said. So it was always very, uh, nice working with her. 342 00:35:36,160 --> 00:35:39,610 Okay, so that's the story of probabilistic bisimulation. 343 00:35:41,020 --> 00:35:45,160 But we'd like to ask: is this notion of exact equivalence reasonable? 344 00:35:46,290 --> 00:35:50,460 We are demanding exact numerical equality of these probability numbers. 345 00:35:51,030 --> 00:35:57,660 And that really is not reasonable, because a small perturbation in these numbers is suddenly going to lead you to say: 346 00:35:57,870 --> 00:36:01,080 uh, these two states are no longer bisimilar. They had been, 347 00:36:01,200 --> 00:36:06,390 but then I changed this half to a half plus epsilon and the other half to a half minus epsilon. 348 00:36:06,660 --> 00:36:09,720 Now they're not bisimilar? Yes? No? So, 349 00:36:11,140 --> 00:36:15,340 we should have some metrical notion that tells you: how close are they? 350 00:36:16,180 --> 00:36:27,420 And this is where we went next. So Jou and Smolka proposed this idea, but they weren't able to come up with a metric that matched with bisimulation. 351 00:36:28,110 --> 00:36:31,590 So we want a quantitative measurement of the distinction between processes. 352 00:36:32,190 --> 00:36:35,220 And usually I give a one-hour talk at this point. 353 00:36:35,490 --> 00:36:42,990 So I'm going to skip that; in lieu of all these Greek symbols and letters, I'm just going to, uh, tell you very heuristically what happened. 354 00:36:43,680 --> 00:36:49,530 So remember this logical characterisation result. I'm going to lie a little bit, but please bear with me. 355 00:36:50,340 --> 00:36:54,370 So the logical characterisation result says: if two things are bisimilar, 356 00:36:54,390 --> 00:36:59,850 they will agree on all formulas. So if they're not bisimilar, there will be some formula they disagree on. 357 00:37:00,360 --> 00:37:05,740 So look for the shortest formula on which they disagree. If it's a huge formula — 358 00:37:06,900 --> 00:37:10,230 if a huge formula is the shortest formula on which they disagree, 359 00:37:10,470 --> 00:37:15,780 then they should be close, because it takes a long and elaborate experiment before you discern any difference. 360 00:37:16,350 --> 00:37:23,310 If they disagree on a short formula, um, they're very visibly different, and they should be given a large distance. 361 00:37:25,080 --> 00:37:34,090 So, um, they may disagree on the rewards, or they may disagree on the probability distribution that results from a transition. 362 00:37:34,540 --> 00:37:38,980 So we need to measure distances between probability distributions. 363 00:37:39,190 --> 00:37:44,230 And there is a well-known metric — well known, but the name is always given incorrectly. 364 00:37:44,650 --> 00:37:48,160 Uh, it's called the Kantorovich metric between probability distributions. 365 00:37:48,700 --> 00:37:52,900 So the reason I struck through the 'Wasserstein' is because that's what everyone calls it.
366 00:37:54,650 --> 00:38:01,880 But, uh, it's very weird, because people who've never read the paper cite a 1969 paper by Vasershtein, 367 00:38:02,870 --> 00:38:06,560 uh, and they say Wasserstein introduced this metric in 1969, 368 00:38:06,770 --> 00:38:10,990 and Kantorovich proved this thing about it in 1958. And you go: hmm, 369 00:38:11,980 --> 00:38:16,550 a little bit of a temporal anomaly there. Anyway. So it was actually introduced by Kantorovich in the 40s. 370 00:38:16,970 --> 00:38:21,680 This is such a nice and natural metric that it has been discovered many times, 371 00:38:21,680 --> 00:38:27,049 and I'm sure Wasserstein discovered it without, you know, uh, looking at Kantorovich's work. 372 00:38:27,050 --> 00:38:30,590 But the fact is, Kantorovich was 20 years earlier. 373 00:38:32,290 --> 00:38:39,370 So if the difference shows up only after a long and elaborate observation, then we should make the states nearby in the bisimulation metric. 374 00:38:39,970 --> 00:38:44,200 So one should think of these formulas not just as logical mumbo jumbo: 375 00:38:44,380 --> 00:38:49,240 they're actually descriptions of experiments that you can do to see how things behave. 376 00:38:50,220 --> 00:38:53,850 Right? Because they say: do this action, do that action, see what the probabilities are. 377 00:38:55,820 --> 00:39:01,980 So all this can be formalised. It's rather more complicated than this nice intuition that I've been giving you. 378 00:39:02,060 --> 00:39:08,480 And it was originally done by us — by Josée and, uh, her band of followers. 379 00:39:08,900 --> 00:39:19,280 Uh, but later came a beautiful fixed-point construction by van Breugel and Worrell, who, uh, actually gave a much more fruitful definition. 380 00:39:19,280 --> 00:39:24,050 And all the work we've done subsequently follows their presentation of this metric. 381 00:39:25,800 --> 00:39:29,910 A master's student of mine — later a PhD student — added rewards. 382 00:39:30,090 --> 00:39:35,490 So this is where we came up with bisimulation metrics for Markov decision processes. 383 00:39:36,150 --> 00:39:45,210 And the student, whose name is Norm Ferns, actually proved an important property that was completely different from anything we had shown before, 384 00:39:45,360 --> 00:39:53,820 which was that, if you look at the optimal value function in an MDP and you compare it at two different states s and t, 385 00:39:54,270 --> 00:40:00,690 the difference |V*(s) - V*(t)| in the optimal value function is bounded by some constant times the bisimulation metric between s and t. 386 00:40:01,110 --> 00:40:04,379 So the bisimulation metric is not just some random thing: 387 00:40:04,380 --> 00:40:10,230 it's actually closely related to the notion of optimal value function in reinforcement learning. 388 00:40:11,730 --> 00:40:14,370 So let me say a bit about reinforcement learning. 389 00:40:15,270 --> 00:40:21,570 Um, so we are often dealing with large or infinite transition systems whose behaviour is probabilistic. 390 00:40:22,050 --> 00:40:31,260 Again, the system responds to stimuli and moves to a new state probabilistically and outputs a reward, and we seek optimal policies. 391 00:40:31,260 --> 00:40:36,150 As I said before, a plethora of algorithms and techniques exist. 392 00:40:36,510 --> 00:40:40,320 But the cost of these things depends on the size of the state space, 393 00:40:41,100 --> 00:40:44,010 uh, in a polynomial way — worse than linear.
394 00:40:44,370 --> 00:40:51,240 And if you can shrink these things and get something that's effectively the same but smaller, it will help you a lot. 395 00:40:53,330 --> 00:41:01,070 Um, and so there's this other thing you can do, called representations, where you say: instead of actually using the exact state space, 396 00:41:01,490 --> 00:41:05,390 let's try and use something called features that capture aspects of the state space. 397 00:41:05,900 --> 00:41:11,540 And can we use these representations to accelerate the learning process by shrinking your search space? 398 00:41:13,740 --> 00:41:21,510 Um, um. So basically, for large state spaces, learning a value function like that is simply not feasible. 399 00:41:22,860 --> 00:41:27,000 So instead we define a new space of features and try to come up with an embedding phi 400 00:41:27,960 --> 00:41:32,640 of your state space into the vector space indexed by the features. 401 00:41:33,240 --> 00:41:42,090 Then we can try to use this space, which is hopefully much lower-dimensional than S, to predict values associated with state-action pairs. 402 00:41:43,250 --> 00:41:47,530 And that is what this phrase representation learning means: learning such a phi. 403 00:41:48,430 --> 00:41:49,660 Okay, so you can, of course, 404 00:41:50,640 --> 00:41:59,310 try to cook up — cook up phis by hand, but it would be nice if your system not only used such a representation, but learned a good one. 405 00:41:59,610 --> 00:42:05,639 Now, these elements of M are basically features that are chosen by the person developing the system. 406 00:42:05,640 --> 00:42:10,500 They can be based on any kind of prior knowledge or experience about the task at hand. 407 00:42:12,410 --> 00:42:20,060 Right. So, uh, it was actually Pablo Castro and Mark Rowland, at DeepMind and Google Brain, respectively — 408 00:42:21,100 --> 00:42:26,410 actually the other way around, sorry — um, who started working on, on this. 409 00:42:28,020 --> 00:42:33,450 And they wanted to use the Kantorovich metric at first, to compute with. 410 00:42:33,990 --> 00:42:37,710 But they found it expensive and difficult to estimate. 411 00:42:37,980 --> 00:42:43,590 And in particular, you can't find an unbiased estimator for the Kantorovich metric. 412 00:42:44,250 --> 00:42:53,780 So they came up with something else. They said: let's use a slimmed-down, easy version that, that can in fact be estimated. 413 00:42:54,710 --> 00:43:02,870 And it's around this time that we joined them. Um, so we — my student Tyler and I — joined Pablo Castro and Mark Rowland. 414 00:43:04,300 --> 00:43:09,400 So we had this kind of simplified metric, closely related to the bisimulation metric. 415 00:43:09,400 --> 00:43:14,080 But it is a very crude approximation. And in fact it's not even a metric. 416 00:43:14,440 --> 00:43:19,960 So there are many distances out there that, you know, you'd like to be metrics, and they turn out not to be. 417 00:43:20,530 --> 00:43:29,560 KL divergence is a famous one, for example. So it's a new type of distance, and we called it a diffuse metric. 418 00:43:30,070 --> 00:43:36,070 Maybe not the best terminology, but anyway, that's what I called it. So here are the axioms for a diffuse metric. 419 00:43:36,490 --> 00:43:41,710 The distances it assigns to pairs of points have to be non-negative. 420 00:43:42,190 --> 00:43:48,740 It's symmetric. It satisfies the triangle inequality, which gives you its geometric flavour.
421 00:43:50,470 --> 00:43:54,310 But we do not require that the distance of a point to itself is zero. 422 00:43:54,340 --> 00:43:59,680 This is perhaps the most drastic violation of the metric axioms that you can imagine. 423 00:43:59,920 --> 00:44:04,300 Lots of people have imagined violating symmetry: 424 00:44:04,330 --> 00:44:10,959 you get quasi-metrics. Lots of people have imagined having distinct things x, y at zero distance: 425 00:44:10,960 --> 00:44:16,120 those are called pseudo-metrics. You can combine both those things and call them pseudo-quasi-metrics, and so on. 426 00:44:16,330 --> 00:44:20,950 But this, uh, I thought nobody else had thought of. 427 00:44:20,950 --> 00:44:26,140 But it is not true: a couple of Polish people had had a metric like this. 428 00:44:27,500 --> 00:44:30,550 Um. And it's a bit shocking: 429 00:44:31,330 --> 00:44:38,080 it's like the Red Queen having to run to stay in one place. So, um, what can one say about such a thing? 430 00:44:38,590 --> 00:44:43,440 If you ask what's really going on — perhaps I can give an example. 431 00:44:43,710 --> 00:44:50,520 Supposing you are looking for crude ways of measuring the distance between probability distributions. There are all kinds of things you can do, 432 00:44:50,520 --> 00:44:57,749 etc., but one simple thing you can do is say: I have, so, I have an underlying metric space. 433 00:44:57,750 --> 00:45:02,610 So you can say: I'll sample two points and measure their average distance. 434 00:45:03,630 --> 00:45:10,050 Good. Uh, maybe this will give me a measure of how far apart two probability distributions are. 435 00:45:11,280 --> 00:45:20,310 And so, um, you can try this. But then you'll find, if you have any distribution other than a Dirac distribution, 436 00:45:20,670 --> 00:45:27,000 you might be sampling points that are far apart. And it'll tell you the distance from this distribution to itself is not zero. 437 00:45:28,190 --> 00:45:33,410 Uh, so it's a very — it actually arises very naturally. And that's why you have to do clever things to get real metrics. 438 00:45:35,770 --> 00:45:39,430 So we call this the MICo distance. 439 00:45:40,270 --> 00:45:47,740 So as I said, it's a distance; it's not a metric. Now, nearly all the machine learning algorithms I've seen are optimisation algorithms. 440 00:45:48,770 --> 00:45:55,250 And one often introduces extra terms into the objective function that push the solution in a desired direction. 441 00:45:56,210 --> 00:46:05,300 So what we did was we said: okay, here's this proxy for the bisimulation metric — spiritually inspired by bisimulation, but not exactly it. 442 00:46:05,990 --> 00:46:09,140 And we defined a loss term based on the MICo distance. 443 00:46:09,530 --> 00:46:14,929 So Pablo Castro was the main experimentalist. The last time I wrote a program, 444 00:46:14,930 --> 00:46:20,149 it was probably a five-line program in ML for a functional programming class, 445 00:46:20,150 --> 00:46:23,630 so I certainly didn't do any of the implementation here. 446 00:46:23,840 --> 00:46:30,950 And he has a beautiful write-up on his blog, where he describes all the experiments and the details and everything. 447 00:46:30,980 --> 00:46:36,260 I will, uh, say a bit, but it's the kind of thing I can't really answer questions about. 448 00:46:37,360 --> 00:46:43,700 But at least I can say something about the setup. So we have basically two copies of the neural net.
449 00:46:43,720 --> 00:46:48,490 One part computes the representation, phi, and psi is the function approximator on top. 450 00:46:48,490 --> 00:46:51,760 And here is some kind of loss function. 451 00:46:52,180 --> 00:46:55,120 Usually this one is based on temporal difference. 452 00:46:55,600 --> 00:47:06,940 But then what we do is, we say: we penalise you if your, uh, representations are too far apart relative to the MICo distance. 453 00:47:08,140 --> 00:47:11,360 Okay. So the normal loss term is just this. 454 00:47:11,380 --> 00:47:16,030 In fact, normally you just have one of these. But we take a pair of these, doing the same thing, 455 00:47:16,240 --> 00:47:20,380 and then we add an additional loss term coupling these two together, which is: 456 00:47:21,760 --> 00:47:24,880 try and keep them within the MICo distance of each other. 457 00:47:26,270 --> 00:47:33,020 And you see, this is a kind of a generic thing that you can do. It's not just tuned for one particular algorithm. 458 00:47:33,650 --> 00:47:38,300 And so we added this MICo loss term to a variety of existing agents, 459 00:47:38,660 --> 00:47:41,000 all of which are available in the Dopamine library. 460 00:47:41,900 --> 00:47:53,900 We ran each game five times with new seeds — so 300 runs for each agent — and we ran each game for 200 million environment interactions. 461 00:47:54,470 --> 00:48:01,720 And what we tested this on was the Atari suite. So we looked at the final scores and the learning curves. 462 00:48:02,170 --> 00:48:10,960 Uh, and so what we found was that, putting the MICo loss term into a variety of agents, 463 00:48:11,290 --> 00:48:20,440 on about two thirds of the games it quite significantly improved the performance, in terms of time taken to learn and, uh, performance thereafter. 464 00:48:21,280 --> 00:48:24,460 So, yeah — every agent performed better? 465 00:48:24,880 --> 00:48:29,860 Yes, that's true. Every single agent seemed to be improved by this. And those agents are quite different from each other. 466 00:48:30,880 --> 00:48:36,910 So here are some results. But, you know, just like some people are scared of Greek letters, I'm scared of graphs. 467 00:48:37,120 --> 00:48:40,120 So I'll skip these. Uh, and — 468 00:48:41,380 --> 00:48:45,280 so let me wrap up by mentioning a few other developments. 469 00:48:46,120 --> 00:48:53,370 Um, we did a lot of work on approximation theory, which I didn't talk about today because it's another one-and-a-half-hour talk. 470 00:48:53,970 --> 00:48:59,430 But essentially what we were able to do is approximate continuous-state systems by finite-state ones. 471 00:48:59,430 --> 00:49:03,420 And in what sense are they close? Precisely in terms of the bisimulation metric: 472 00:49:03,750 --> 00:49:11,580 you can actually define a sequence of finite-state systems that converges in that metric to the infinite-state system. 473 00:49:13,230 --> 00:49:19,200 Now, you might say: yes, metrical reasoning is all very well, but, you know, we are algebraists — or many of us are — 474 00:49:19,200 --> 00:49:22,650 and we'd like to capture the notion of distance in an equational setting. 475 00:49:23,160 --> 00:49:27,870 So we introduced this notion of approximate equality: 476 00:49:28,200 --> 00:49:32,460 you take the equals symbol and annotate it with a small real number epsilon, 477 00:49:32,760 --> 00:49:41,360 and you say s equals-epsilon t, meaning s is within epsilon of t, rather than s is exactly the same as t.
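[Editor's note: a compact restatement, in LaTeX notation, of the approximate-equality judgement just described. The inference-rule layout is the editor's paraphrase, with d standing for the underlying behavioural metric.]

\[ s =_{\varepsilon} t \quad\text{means}\quad d(s,t) \le \varepsilon, \qquad\qquad \frac{s =_{\varepsilon} t \qquad t =_{\delta} u}{s =_{\varepsilon+\delta} u}. \]

[The second rule is the "ergonomics" point the speaker makes shortly: chaining approximate equalities costs epsilons, which add up along the chain.]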
478 00:49:41,370 --> 00:49:45,660 Now, of course, you will realise this is not an equivalence relation. 479 00:49:46,960 --> 00:49:50,260 But just because it's not an equivalence relation doesn't mean it's rubbish. 480 00:49:50,500 --> 00:49:52,300 It's actually something called a uniformity. 481 00:49:52,930 --> 00:50:00,040 And from this one can basically produce an enhanced notion of quantitative algebra. 482 00:50:00,370 --> 00:50:04,989 And you can develop free algebras, you can develop equational theories, 483 00:50:04,990 --> 00:50:09,480 you can develop completeness theorems. And you can ask: 484 00:50:09,630 --> 00:50:15,360 what is the free algebra if you do this, that and the other? There are simple equational theories you can write down. 485 00:50:15,840 --> 00:50:19,170 And if you ask what the free algebras are, out of the blue they turn out to be 486 00:50:20,340 --> 00:50:26,610 probability spaces with the Kantorovich metric, even though your equational theory doesn't say anything about probability. 487 00:50:27,150 --> 00:50:34,740 So that was striking for us: it's really taking this metric reasoning and putting it in an equational setting. 488 00:50:35,070 --> 00:50:36,330 One has to be a bit careful. 489 00:50:36,780 --> 00:50:46,590 One has to take care of what I call the ergonomics of reasoning, because now it costs epsilons to chain these equalities together. (A sketch of the rules appears at the end of this passage.) 490 00:50:48,170 --> 00:50:51,290 There is also a lot of categorical mumbo-jumbo, which I'm not going to go into, 491 00:50:51,290 --> 00:50:58,340 but for those who are interested, I've given talks here before in which I've talked about this precisely. 492 00:50:59,710 --> 00:51:03,490 One of the other recent things we've done is exploit symmetry. 493 00:51:03,670 --> 00:51:11,400 So that's where there's a group action on the state space, and you want to look at the orbit space, learn on the orbit space, 494 00:51:11,410 --> 00:51:14,430 and pull the policy you've learned back to the original state space. 495 00:51:14,500 --> 00:51:19,900 This sounds simple, but it wasn't so easy. 496 00:51:19,930 --> 00:51:28,089 Fortunately, I had yet another brilliant student, Rosie, who is doing her PhD at Harvard now. 497 00:51:28,090 --> 00:51:32,020 And so she took my vague and confused ideas and made them work. 498 00:51:32,380 --> 00:51:37,750 So that was a very nice line of work. I have yet another brilliant student, 499 00:51:37,760 --> 00:51:44,110 Florence Clerc is her name, and she is the one who started working on continuous space and time. 500 00:51:44,440 --> 00:51:50,919 So things like diffusion processes, the general class being called Feller-Dynkin processes. This turns out to be much more difficult; 501 00:51:50,920 --> 00:51:55,000 it's not a cheap hack of what we did before, because the notion of a step is gone. 502 00:51:56,400 --> 00:52:00,820 Things are flowing, they're not jumping. So you have to do something else. 503 00:52:00,830 --> 00:52:04,800 You have to look at the space of trajectories. The theory gets much more painful. 504 00:52:05,160 --> 00:52:12,030 Fortunately, she likes pain, so she actually managed to hack her way through this.
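[Aside, returning to the epsilon-annotated equalities for a moment: a sketch of the kind of rules they obey, in the style of the quantitative equational theories of Mardare, Panangaden and Plotkin (the exact rule set varies with the presentation).]

```latex
\begin{align*}
  & \vdash s =_{0} s
    && \text{(reflexivity: zero self-distance)}\\
  s =_{\varepsilon} t \;&\vdash\; t =_{\varepsilon} s
    && \text{(symmetry)}\\
  s =_{\varepsilon} t,\; t =_{\delta} u \;&\vdash\; s =_{\varepsilon+\delta} u
    && \text{(triangle)}\\
  s =_{\varepsilon} t \;&\vdash\; s =_{\varepsilon'} t
    \quad\text{for } \varepsilon \le \varepsilon'
    && \text{(weakening)}
\end{align*}
```

[The triangle rule is exactly the ergonomics issue mentioned above: each time two approximate equalities are chained, the epsilons add up, so long equational derivations accumulate error.]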
505 00:52:13,810 --> 00:52:19,719 And as I said, this MICo metric was a nice thing and so on, but at the end 506 00:52:19,720 --> 00:52:24,070 we had to do some hackery to get rid of the non-zero self-distances and so on. 507 00:52:25,760 --> 00:52:32,089 And at the time we were dissatisfied. It seemed to be a beautiful subject until, at the end, 508 00:52:32,090 --> 00:52:36,500 you had to say: well, we have to cut off the self-distances in order to get something that looks more like a distance. 509 00:52:37,070 --> 00:52:42,979 But it turns out one can look at this from the viewpoint of kernels, and then it comes out in quite a principled way. 510 00:52:42,980 --> 00:52:46,580 So this is work that we published in TMLR 511 00:52:46,650 --> 00:52:49,970 last year. 512 00:52:50,840 --> 00:52:56,180 Other work that I've done is with my former student Joey Bose, who was a postdoc here, 513 00:52:56,180 --> 00:53:02,150 or is a postdoc here, with the generative learning groups. 514 00:53:02,630 --> 00:53:06,260 So we worked on normalising flows on hyperbolic spaces. 515 00:53:07,010 --> 00:53:10,070 I was really just a geometry consultant; they did it. 516 00:53:11,660 --> 00:53:18,920 And we also did some work on generative algorithms based on solving stochastic differential equations on Riemannian manifolds. 517 00:53:18,920 --> 00:53:22,970 So there's a lot of nice mathematics that one can do in this setting. 518 00:53:23,860 --> 00:53:29,740 So let me conclude quickly. Bisimulation has a rich and venerable history. 519 00:53:30,220 --> 00:53:36,970 In fact, many people discover it all the time. I would say once every 5 to 10 years, somebody reinvents bisimulation. 520 00:53:37,200 --> 00:53:40,640 But that just shows you the naturalness of the concept. 521 00:53:41,150 --> 00:53:49,670 The metric analogue holds promise for quantitative reasoning and approximation, and I'm looking forward to pushing this line in future work. 522 00:53:50,300 --> 00:53:56,630 Basically, the research is alive and well, and there are new areas where bisimulation is being discovered all the time. 523 00:53:57,170 --> 00:54:01,910 So I'll just show you some more photographs. These are collaborators. 524 00:54:02,420 --> 00:54:08,050 This is Gordon Plotkin, who's best kept in the cage. Here are some other collaborators. 525 00:54:08,060 --> 00:54:12,230 This is Norm Ferns. And this is Pablo and Tyler. 526 00:54:12,470 --> 00:54:15,530 Vineet Gupta and Radha Jagadeesan worked closely with me. 527 00:54:15,740 --> 00:54:21,770 I didn't put Josée's picture here, but these two boys and I worked together with Josée. 528 00:54:21,830 --> 00:54:29,330 We were often called "Josée and her Indian gang". All the approximation work was done with them. 529 00:54:30,200 --> 00:54:39,799 And I'd like to give special thanks to two very close and dear friends, Doina Precup and Joelle Pineau. At one time 530 00:54:39,800 --> 00:54:48,740 I was their mentor; now they're superstars. And they're the ones who inspired me to move from the area I was in, 531 00:54:48,980 --> 00:54:56,000 which was programming languages (which I haven't abandoned), to also think about machine learning. 532 00:54:57,520 --> 00:54:58,300 Okay. Thank you.