1 00:00:01,660 --> 00:00:16,820 [Auto-generated transcript. Edits may have been applied for clarity.] So. Welcome, everybody. 2 00:00:17,090 --> 00:00:22,180 Welcome to this lecture in the series of lectures honouring Christopher Strachey, 3 00:00:22,460 --> 00:00:27,200 who was a leading figure in the history of our department and in the history of programming languages. 4 00:00:27,650 --> 00:00:32,390 Before introducing today's speaker, I want to thank our sponsors, Oxford Asset Management, 5 00:00:32,690 --> 00:00:37,040 who have been, uh, generous enough to sponsor this series since 2015. 6 00:00:37,250 --> 00:00:43,650 So it's thanks to them that we can bring such excellent speakers. Um, um, students, uh, researchers, 7 00:00:43,790 --> 00:00:48,770 who want to find out about their company are, um, invited to meet with them after the talk. 8 00:00:50,240 --> 00:00:58,940 Okay. So how it's going to work is: I'm going to introduce Prakash, and then he's going to enlighten us and, uh, take awkward questions. 9 00:00:59,360 --> 00:01:02,530 Um, there are going to be people running around with microphones. 10 00:01:02,950 --> 00:01:05,959 Um, so please wait for one before you ask. Okay. 11 00:01:05,960 --> 00:01:17,000 So I'm really, really, really delighted to introduce Prakash Panangaden, um, who is a super well-known figure in the foundations of computer science. 12 00:01:17,700 --> 00:01:22,820 And I get a reminder of that every single day. 13 00:01:22,830 --> 00:01:27,330 Probably you didn't know that. And that's because of what I see when I walk down the corridor. 14 00:01:27,660 --> 00:01:32,880 Um, there are probably some students here from the quantum group office, and, um, 15 00:01:32,970 --> 00:01:39,380 they have a poster in their office with a picture that has Prakash in it. 16 00:01:39,390 --> 00:01:50,100 So I see this every single day. And, uh, this picture was from a Fest which was held here in Oxford in 2014 to celebrate a birthday. 17 00:01:50,430 --> 00:01:57,620 And so, in, uh, honour of that, I thought I'd read you a little bit of the blurb about Prakash. 18 00:01:57,810 --> 00:02:03,960 That's not all of it. Um, so it says: Prakash has worked on a large variety of topics, 19 00:02:04,350 --> 00:02:12,540 including probabilistic and concurrent computation, logics and duality, and quantum information and computation. 20 00:02:13,110 --> 00:02:18,690 Despite the enormous breadth of his research, he's made significant and deep contributions. 21 00:02:19,020 --> 00:02:19,749 For example, 22 00:02:19,750 --> 00:02:28,250 he introduced a logic, and also a real-valued interpretation of the logic, to capture equivalence of probabilistic processes quantitatively. 23 00:02:28,620 --> 00:02:32,970 So we're going to hear something about that today, because I see probabilistic bisimulation in the title. 24 00:02:33,330 --> 00:02:44,020 And he's going to tell us something new since then, uh, because he's also done a lot of, 25 00:02:44,020 --> 00:02:47,430 um, important work in machine learning and learning theory. 26 00:02:47,730 --> 00:02:55,020 So he was a founding member of the Reasoning and Learning Lab at McGill and also of Mila, the Montreal Institute for Learning Algorithms. 27 00:02:55,080 --> 00:02:58,229 So we're going to be getting some of that today as well. Okay.
28 00:02:58,230 --> 00:03:06,720 So I'm not going to tell you all the accolades, but I just want to very briefly point out that there are actually two test-of-time papers in this, 29 00:03:07,020 --> 00:03:14,040 which are awards celebrating papers that have had lasting influence. And, in fact, he is a fellow of the Royal Society of Canada. 30 00:03:14,580 --> 00:03:18,780 Um — yes — and founding chair of ACM SIGLOG. 31 00:03:19,170 --> 00:03:22,500 So, um, at this point, I will turn it over to you. That's wonderful. 32 00:03:23,610 --> 00:03:37,200 Thanks very much. Right. 33 00:03:37,470 --> 00:03:42,870 So first of all, it's a thrill and an honour to be here. Uh, I've spent a lot of time at Oxford: 34 00:03:42,870 --> 00:03:47,790 two different one-year sabbaticals, and I was trying to count up how many talks I've given. 35 00:03:47,790 --> 00:03:51,210 I think it's over 20 talks that I've given here in Oxford. 36 00:03:51,330 --> 00:03:54,960 Never in this room. Never been greeted by a dinosaur before. 37 00:03:55,290 --> 00:04:02,280 So, uh, I'm very happy to be here. And I'm thrilled to see so many people turning out to hear me. 38 00:04:03,030 --> 00:04:07,260 Okay, so, um, let me begin. 39 00:04:12,770 --> 00:04:17,300 I have to begin with Christopher Strachey, because he made a fundamental contribution. 40 00:04:17,310 --> 00:04:23,660 He, along with Dijkstra and Hoare, were the people who really started thinking about programming mathematically. 41 00:04:24,020 --> 00:04:30,560 And this has had a huge influence, not just in programming languages, but in related areas like verification and so on. 42 00:04:30,980 --> 00:04:38,120 And of course, Christopher Strachey and Dana Scott founded the so-called Scott-Strachey approach to denotational semantics, 43 00:04:38,120 --> 00:04:44,210 which is one of the things that sucked me into computer science from my earlier profession, which was physics. 44 00:04:45,330 --> 00:04:51,810 So, uh, I decided to show you a lot more photographs of people than Greek letters. 45 00:04:52,770 --> 00:04:54,930 So there's quite a gallery of pictures coming up. 46 00:04:55,650 --> 00:05:06,030 So the whole concept of bisimulation, um, arose from the work of these two giants of concurrency theory, Robin Milner and David Park. 47 00:05:06,720 --> 00:05:15,930 Um, so Milner was the first one who conceptualised it and came up with the first description, but Park gave a much more elegant fixed-point approach. 48 00:05:15,930 --> 00:05:24,120 And that's been the fruitful, uh, way of presenting bisimulation, which has readily generalised to other subjects, other domains. 49 00:05:24,900 --> 00:05:29,640 I was personally inspired by a number of people. One was, uh, Dexter Kozen, 50 00:05:29,970 --> 00:05:38,550 who was one of the people who first began thinking about probabilistic systems on continuous state spaces in a computer science context. 51 00:05:38,820 --> 00:05:46,170 Of course, people doing Markov process theory had been thinking about such things; people in computer science had been thinking about discrete Markov chains. 52 00:05:46,500 --> 00:05:50,980 But Dexter really started thinking about measure theory 53 00:05:52,700 --> 00:05:58,790 and related topics. And he taught a lecture course at Cornell when I was a young assistant professor there. 54 00:05:59,300 --> 00:06:02,450 And I sat through this lecture course, and I don't remember a thing about it.
55 00:06:02,840 --> 00:06:07,940 If I had, it would have saved me several years; instead I had to rediscover many of these things later. 56 00:06:09,160 --> 00:06:12,550 Another significant source of inspiration was Bill Lawvere, 57 00:06:13,060 --> 00:06:19,960 who actually formulated a notion of a category of probabilistic relations back in the early 60s. 58 00:06:20,590 --> 00:06:27,820 Um, tragically, he died just a couple of years ago. But he is legendary for his incredible insight into different things. 59 00:06:27,970 --> 00:06:35,350 He wasn't always able to prove stuff, but he was always able to connect stuff and, and have, uh, other people fill in the gaps. 60 00:06:35,800 --> 00:06:39,050 And here's a person who most people haven't heard of. 61 00:06:39,070 --> 00:06:49,540 Her name is Michèle Giry, and she's the one who gave a kind of monad-theoretic way of thinking about the category of probabilistic relations that, uh, Lawvere formulated. 62 00:06:51,870 --> 00:06:58,000 So let me, uh, suspend the pictures here for a minute and talk about behavioural equivalence. 63 00:06:58,180 --> 00:07:05,140 Now, behavioural equivalence simply means what it says. How do I know that two systems are behaving the same way? 64 00:07:05,800 --> 00:07:12,370 And in particular, we can look at two states within a system and ask when they have exactly the same behaviour. 65 00:07:12,910 --> 00:07:18,220 Now of course, this depends on what aspect of the behaviour one is able to observe. 66 00:07:18,820 --> 00:07:23,020 So if you're observing the whole of each state, that means one thing. 67 00:07:23,020 --> 00:07:32,530 If you're just observing certain reactions, or certain things being emitted as outputs, then the notion is different, and we will see variations on this idea. 68 00:07:33,580 --> 00:07:36,850 What do we want this equivalence relation to guarantee? 69 00:07:38,660 --> 00:07:46,400 Well, uh, if two states are equivalent, we should not be able to see any difference in their observable behaviour. 70 00:07:46,430 --> 00:07:48,380 This is kind of repeating what I have already said: 71 00:07:49,130 --> 00:07:57,200 it's trying to capture some notion of the observable aspects of the behaviour. And from the temporal point of view, 72 00:07:57,440 --> 00:08:02,479 if we start out with two states that are equivalent, they should stay equivalent as they evolve. 73 00:08:02,480 --> 00:08:07,940 Otherwise, one of the observations you can make is: aha, they've evolved into noticeably different things. 74 00:08:08,300 --> 00:08:14,410 And then they certainly could not have been equivalent to start with. So let me just indulge in a bit of history. 75 00:08:14,890 --> 00:08:20,980 So some of the ideas go back to, uh, Cantor and the back-and-forth argument. 76 00:08:21,010 --> 00:08:26,350 Now, all of you know Cantor's diagonalisation argument, but he actually proved many things, 77 00:08:26,680 --> 00:08:35,770 one of which was that the, um, only countable dense linear order without endpoints is the rationals. 78 00:08:36,400 --> 00:08:39,280 And the argument that he gave to prove it is essentially, uh, 79 00:08:39,550 --> 00:08:45,070 what we call a back-and-forth argument, very similar in spirit to what we will see in bisimulation. 80 00:08:45,280 --> 00:08:48,820 So I like to mention that as part of the pre-history of bisimulation. 81 00:08:49,960 --> 00:08:53,860 Yeah. Uh, sorry — were you asking a question? Ah, taking pictures. 82 00:08:54,070 --> 00:08:57,880 Yeah. Take a picture. Okay. So, uh.
83 00:08:59,750 --> 00:09:06,710 The concept of lumpability was mooted in the Markov chain community in the 60s, and this word is very evocative. 84 00:09:06,980 --> 00:09:15,350 It means when things are equivalent you can lump them into one state, thereby effectively reducing the state space with which you are working. 85 00:09:15,830 --> 00:09:21,200 So they gave this very nice name, lumpability, but it didn't catch on in the computer science community. 86 00:09:23,080 --> 00:09:31,900 Uh, Milner and Park in the 1970s and 1980s formulated the notion that we now call bisimulation, 87 00:09:32,020 --> 00:09:35,710 uh, and they defined it in a non-deterministic setting without probabilities. 88 00:09:36,130 --> 00:09:41,160 So I will walk you through these definitions eventually. Um. 89 00:09:42,190 --> 00:09:47,860 Yes. So they formulated these concepts, and there was the subject called process algebra, 90 00:09:47,860 --> 00:09:55,510 whereby people formulated different processes using algebraic notions and reasoned equationally about them. 91 00:09:56,020 --> 00:10:00,580 In fact, there was a time when every country in the EU seemed to have its own process algebra. 92 00:10:01,000 --> 00:10:10,930 Italy had half a dozen, and at some point it became clear that one needs deeper concepts than just the syntax of these algebras. 93 00:10:13,550 --> 00:10:21,650 Late in the 80s, um, Kim Larsen and Arne Skou formulated probabilistic bisimulation for discrete systems. 94 00:10:21,890 --> 00:10:28,010 So now, instead of just saying things can, um, transition non-deterministically, 95 00:10:28,220 --> 00:10:33,410 you actually have probability distributions governing the behaviour of transition systems. 96 00:10:33,620 --> 00:10:37,100 And one asks: what's the relevant notion of bisimulation in that setting? 97 00:10:38,000 --> 00:10:40,399 Later on, people added temporal concepts. 98 00:10:40,400 --> 00:10:50,240 So there were continuous-time Markov chains, Petri nets, and then the, uh, probabilistic — sorry, Performance Evaluation Process Algebra. 99 00:10:50,810 --> 00:10:56,910 Is that right? Yeah. Which Jane Hillston, sitting here, formulated in the early 90s. 100 00:10:58,820 --> 00:11:02,750 Carrying on with a bit more history: so this is where we entered the picture. 101 00:11:03,860 --> 00:11:10,970 As I said before, I was very keen on looking at bisimulation of Markov processes, but on continuous state spaces, 102 00:11:11,720 --> 00:11:20,420 and this was done originally in collaboration with my outstanding student Josée Desharnais, and Abbas Edalat at Imperial. 103 00:11:22,390 --> 00:11:26,260 Later on, we moved on from equivalence relations to metrics. 104 00:11:26,620 --> 00:11:33,670 So rather than just saying things are equivalent or not equivalent, we started talking about how different they are, or how, how similar they are. 105 00:11:35,260 --> 00:11:42,669 And there was a very beautiful fixed-point presentation of the work we developed, due to Franck van Breugel and James Worrell, 106 00:11:42,670 --> 00:11:46,630 who is sitting somewhere in the audience. So many things happened at Oxford. 107 00:11:47,610 --> 00:11:50,800 Uh, now, in the machine learning community, 108 00:11:51,460 --> 00:11:59,680 people were looking at Markov decision processes and, without knowing that bisimulation had already been invented, they reinvented it. 109 00:11:59,950 --> 00:12:06,070 But then they became aware of the earlier work and cited it appropriately.
110 00:12:06,670 --> 00:12:09,790 But essentially they were looking at Markov decision processes. 111 00:12:10,030 --> 00:12:13,630 But now one adds a notion of reward to the setup. 112 00:12:15,360 --> 00:12:20,760 And we were in the market for taking any equivalence relation and making it into a metric. 113 00:12:20,970 --> 00:12:25,260 So we came up with bisimulation metrics for these Markov decision processes. 114 00:12:27,740 --> 00:12:31,460 Uh, many things happened over the years, but let me jump ahead to 2021, 115 00:12:31,760 --> 00:12:37,790 when we started applying some of these bisimulation metrics to the problem of representation learning. 116 00:12:38,120 --> 00:12:48,860 And this was primarily done by my former PhD student, uh, Pablo Castro; my master's student, uh, Tyler Kastner, who's still doing his PhD; 117 00:12:49,550 --> 00:12:53,300 um, and, um, Mark Rowland at DeepMind in London. 118 00:12:57,100 --> 00:13:06,270 Uh, so that work, which I mentioned appeared in NeurIPS, was partly very principled and structured, and then hacked at the end. 119 00:13:07,170 --> 00:13:14,730 And that was a bit of a disappointment. I mean, but then we realised that the hack at the end actually had a nice interpretation in terms of kernels, 120 00:13:15,150 --> 00:13:22,140 which I won't talk about today, but that's something we very, very recently, uh, completed and published. 121 00:13:25,370 --> 00:13:32,120 Okay. Right. So let me begin with the technical development, which I'll try to do quickly. 122 00:13:32,990 --> 00:13:37,370 And sometimes you'll see text in blue. That means you can doze off and ignore it. 123 00:13:37,640 --> 00:13:43,160 But if you really want to read it, you can. So: we have labelled transition systems. 124 00:13:43,550 --> 00:13:47,180 The word transition system means you are somewhere and you move somewhere else. 125 00:13:47,390 --> 00:13:51,110 The places where you can be — these things are called states. And, 126 00:13:52,160 --> 00:13:55,920 um, you move. How do you move? By something happening. 127 00:13:55,940 --> 00:14:03,650 These things are called actions, or sometimes labels, and associated with every action is a transition. 128 00:14:04,190 --> 00:14:07,880 And what a transition does is it takes you from one state to another. 129 00:14:08,180 --> 00:14:13,130 And we indicate this with an arrow labelled by one of these actions. 130 00:14:13,520 --> 00:14:17,929 Now, these transitions can be indeterminate, which means you start from a given state, 131 00:14:17,930 --> 00:14:22,730 you perform a certain action, and you'll end up in one of possibly several different states. 132 00:14:23,000 --> 00:14:27,530 Sometimes, maybe, it's uniquely specified; sometimes not. So there's the nondeterminism. 133 00:14:29,140 --> 00:14:35,920 And it's, uh, nicer to read these labelled transitions with the label written on top of the arrow, 134 00:14:36,340 --> 00:14:40,390 rather than this peculiar notation shown on the other side. 135 00:14:42,190 --> 00:14:47,860 So here's an example that Robin Milner invented to explain these concepts when they were first introduced. 136 00:14:47,860 --> 00:14:52,090 And I'm sure all the verification people have seen this far too many times. 137 00:14:52,780 --> 00:14:56,860 I could give you a nice brainteaser to work on while I'm explaining this, but maybe not. 138 00:14:57,220 --> 00:15:00,730 So this is the legendary vending machine example. 139 00:15:01,300 --> 00:15:11,200 You have a vending machine.
You put, uh, a cup in the right place, you insert money, then you choose between tea and coffee, 140 00:15:12,400 --> 00:15:15,400 and then the machine responds by dispensing tea or 141 00:15:15,730 --> 00:15:25,490 coffee, as appropriate. And so this diagram, which I hope is self-explanatory, depicts the behaviour of this very simple machine. 142 00:15:26,970 --> 00:15:30,870 Note one thing about this machine: the interactive character of the machine. 143 00:15:31,290 --> 00:15:35,760 The agent does certain things, like press buttons, insert money, and so on. 144 00:15:36,180 --> 00:15:39,210 And the system also responds to these interactions. 145 00:15:41,080 --> 00:15:47,020 Now here's another one. And if you look at it, it kind of looks similar for a while. 146 00:15:47,950 --> 00:15:54,400 But then you notice a rather peculiar feature, which is: once you insert the money, 147 00:15:55,460 --> 00:15:58,640 then the system immediately makes a non-deterministic transition. 148 00:15:58,640 --> 00:16:04,190 So here's an example of non-determinism in action. It goes to either this state or that one, 149 00:16:05,120 --> 00:16:11,170 before you get a chance to choose. And once you're in one of those states, you have no choice. 150 00:16:11,180 --> 00:16:15,740 You can only ask for tea if you're there, or ask for coffee if you're here. 151 00:16:17,010 --> 00:16:23,040 Okay, so this is a very draconian machine that decides what you are going to drink today. 152 00:16:24,000 --> 00:16:29,639 Now of course, you ask: are these two things behaviourally similar? 153 00:16:29,640 --> 00:16:38,100 Of course not. They're manifestly different. And you can observe the difference by attempting to choose tea or coffee. In the previous machine, 154 00:16:38,100 --> 00:16:43,430 you will find, no matter what you do, once you've inserted the money, you can choose whichever one you want. 155 00:16:43,440 --> 00:16:50,160 And here you'll notice sometimes it rejects your choice. That's certainly a very observable difference. But, 156 00:16:52,210 --> 00:16:56,590 um, right, so one of them gives us the choice; the other one makes the choice internally. 157 00:16:56,950 --> 00:17:01,630 But if you look at the sequences of actions that you can get, they are identical. 158 00:17:02,170 --> 00:17:05,830 You put the cup, you put the money, then you have a choice between coffee and tea. 159 00:17:06,280 --> 00:17:11,530 And in the language of regular expressions, this expression describes the possible behaviours of both machines — they are the same. 160 00:17:13,180 --> 00:17:19,360 In short, these strings are not capturing the branching structure, which is essential to distinguish these two machines. 161 00:17:20,140 --> 00:17:29,530 And hence, you know, old-fashioned automata theory and language theory is not going to do anything for you in analysing these concepts. 162 00:17:30,840 --> 00:17:36,280 So we need to go beyond language equivalence. So, time for a formal definition. 163 00:17:37,090 --> 00:17:40,960 We have two states, S and T. 164 00:17:41,370 --> 00:17:48,840 This dotted line connecting them is supposed to suggest that they are bisimilar: they are going to behave exactly the same way. 165 00:17:49,530 --> 00:17:51,299 And what the formal definition says — 166 00:17:51,300 --> 00:17:59,400 it's written down here with all the symbols — is: whatever action you choose, if this one, S, goes to some state S',
167 00:17:59,790 --> 00:18:06,030 then there is some state T' such that if you do the same action on T, it can take you there. 168 00:18:07,120 --> 00:18:11,860 And these new states, S' and T', themselves continue to be related. 169 00:18:12,100 --> 00:18:16,690 So there is the temporal persistence in the, in the, in this relationship. 170 00:18:17,470 --> 00:18:23,740 Also, of course, we want to symmetrise this: the same thing vice versa, with S and T interchanging their roles. 171 00:18:24,220 --> 00:18:31,030 So that's the formal definition: a kind of a step-by-step account of what it means to be behaviourally equivalent. 172 00:18:31,420 --> 00:18:36,400 And what are you observing in this setting? Whether or not your system responds to certain actions. 173 00:18:36,940 --> 00:18:40,009 It might happen that you say: I'm pressing the a button, 174 00:18:40,010 --> 00:18:46,330 and the system just says: forget it, I'm not doing it. So your observation is: which actions does the system make? 175 00:18:46,590 --> 00:18:49,600 Except you don't actually see the internal structure of the states. 176 00:18:51,890 --> 00:19:00,470 Now we can make all this probabilistic by introducing discrete probabilistic transition systems. They're just like the old ones, 177 00:19:01,040 --> 00:19:07,340 except that now, for the final state, instead of just being told it's one of these 17 states that you might go to, 178 00:19:07,490 --> 00:19:15,170 you're actually given a probability distribution, with numerical quantities telling you how probable a certain final state is. 179 00:19:15,770 --> 00:19:25,950 So just like before, we have states and actions, but now associated with every action is this transition matrix, or transition function, that says: start 180 00:19:25,950 --> 00:19:32,450 in the state s, do the action a, and there'll be some specific distribution over the final states. 181 00:19:33,680 --> 00:19:39,170 Now, this model is reactive, which means all these probabilistic phenomena are internal. 182 00:19:39,440 --> 00:19:43,909 We are not trying to associate probabilities with the actual external actions. 183 00:19:43,910 --> 00:19:48,170 There is no such thing as: action a happens with this probability, 184 00:19:48,170 --> 00:19:52,680 action b happens with this other probability. It's just: given that a happened, 185 00:19:52,730 --> 00:20:02,070 here's how the system responds. So let's ask about probabilistic bisimulation, as defined originally by Larsen and Skou. 186 00:20:02,310 --> 00:20:06,900 This was a very fundamental and seminal paper on the subject. 187 00:20:07,740 --> 00:20:12,900 So here are some systems. I hope the notation is clear. These S's and T's are states. 188 00:20:13,410 --> 00:20:17,310 I haven't bothered to label these ones, because they are so-called dead states where nothing can happen. 189 00:20:18,780 --> 00:20:23,550 These arrows are labelled by both an action as well as a number. 190 00:20:24,000 --> 00:20:32,730 And the way you should read this is: if you are in state S0 and you do an action a, with one-third probability you wind up in each one of these states. 191 00:20:34,490 --> 00:20:42,800 Uh, and once you're in the state S1, you can do a b action and come, with probability one, to this dead state. 192 00:20:43,040 --> 00:20:50,150 You can't do any of the other actions: you can't do a, you can't do c, you can only do b. Similarly, here and here you can only do c, 193 00:20:50,570 --> 00:20:54,260 and it'll happen with probability one that you will end up in this dead state.
194 00:20:54,530 --> 00:20:59,000 Now, this one looks very similar. 195 00:20:59,010 --> 00:21:04,860 And the question is: is S0 bisimilar — behaviourally equivalent — to T0? 196 00:21:05,130 --> 00:21:11,610 And you look at it and you see: wait, I can't find a perfect matching between transitions like I was doing before. 197 00:21:12,900 --> 00:21:22,800 And then you realise: oh, what you need to do is view this S2 and S3 as lumped into one equivalence class and add up the probabilities. 198 00:21:23,070 --> 00:21:25,170 Then this particular combination 199 00:21:26,050 --> 00:21:33,610 is exactly behaving like T2. And you can see, from T2, that it's not just the incoming but also the outgoing transitions that behave the same: 200 00:21:33,880 --> 00:21:39,400 either one of those states does a transition with probability one to a dead state. 201 00:21:39,400 --> 00:21:42,630 That's exactly what happens here. The b — 202 00:21:42,990 --> 00:21:48,750 uh, sorry — this transition is perfectly matched by this pair. 203 00:21:49,350 --> 00:21:53,700 So, in fact, once you realise that you have to add these things, 204 00:21:55,320 --> 00:22:02,190 you do have a notion of bisimulation, but you can't just use the arrow-by-arrow matching definition that you had before. 205 00:22:02,940 --> 00:22:06,690 You need to add up the probabilities. And here's blue text formalising this, 206 00:22:07,020 --> 00:22:10,179 but I will skip over it. Now, 207 00:22:10,180 --> 00:22:15,970 Markov decision processes are probabilistic versions of labelled transition systems, 208 00:22:16,660 --> 00:22:22,720 where the, um, the, uh, final state is governed by a probability distribution — 209 00:22:23,020 --> 00:22:27,670 no other indeterminacy — and now there's a reward associated with each transition. 210 00:22:28,300 --> 00:22:33,700 And what we observe are the interactions: does it accept an action a, and what is the reward? 211 00:22:34,000 --> 00:22:36,400 So again, we don't see the internal states. 212 00:22:36,910 --> 00:22:42,970 So the formal definition looks like this: a set of states, a set of actions, and, associated with every action, 213 00:22:43,240 --> 00:22:49,780 a probability function that says: if you do action a in state s, you'll get this distribution over the final states. 214 00:22:50,080 --> 00:22:55,960 And here's the reward function, which says: if you do action a in state s, you get a certain numerical reward. 215 00:22:56,740 --> 00:23:04,510 Sometimes people make the rewards live in the range zero to one, sometimes in the positive reals — whatever; it doesn't make a fundamental difference. 216 00:23:05,230 --> 00:23:11,410 This, uh, reward I've made a deterministic function, but without too much difficulty, or doing violence to the theory, 217 00:23:11,590 --> 00:23:14,650 you can make the reward itself a probabilistic thing. 218 00:23:17,750 --> 00:23:20,870 There's something very different that happens in this point of view 219 00:23:20,870 --> 00:23:27,109 that isn't the case with how we dealt with these systems in the world of verification. In verification, 220 00:23:27,110 --> 00:23:31,580 we wanted to say things like: no matter what, this system will do such and such — 221 00:23:32,360 --> 00:23:38,629 this plane won't crash, for example. Um, but here you're doing something else: 222 00:23:38,630 --> 00:23:44,270 you're controlling the system by choosing the actions. And so we have this concept called a policy. 223 00:23:45,230 --> 00:23:49,340 So we have these, uh, Markov decision processes.
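[Editor's note: a minimal Python sketch of the lumping idea just described. The dictionary encoding of the two example systems and the helper names here are hypothetical illustrations, not from the talk; the check implements the Larsen-Skou-style condition by summing transition probabilities over equivalence classes.]

    # A discrete probabilistic transition system (reactive model):
    # trans[state][action] maps successor states to probabilities.
    trans = {
        "s0": {"a": {"s1": 1/3, "s2": 1/3, "s3": 1/3}},
        "s1": {"b": {"dead": 1.0}},
        "s2": {"c": {"dead": 1.0}},
        "s3": {"c": {"dead": 1.0}},
        "t0": {"a": {"t1": 1/3, "t2": 2/3}},
        "t1": {"b": {"dead": 1.0}},
        "t2": {"c": {"dead": 1.0}},
        "dead": {},
    }

    # A candidate equivalence relation, given as a partition of the states.
    # Lumping s2 and s3 together with t2 is exactly the step described above.
    partition = [{"s0", "t0"}, {"s1", "t1"}, {"s2", "s3", "t2"}, {"dead"}]

    def prob_into_class(state, action, cls):
        """Total probability of jumping from `state` into the class `cls` via `action`."""
        return sum(p for nxt, p in trans[state].get(action, {}).items() if nxt in cls)

    def is_prob_bisimulation(partition):
        """Larsen-Skou condition: related states assign equal probability
        to every equivalence class, for every action."""
        actions = {a for outgoing in trans.values() for a in outgoing}
        for block in partition:            # states claimed equivalent
            for cls in partition:          # target equivalence class
                for a in actions:
                    probs = {round(prob_into_class(s, a, cls), 9) for s in block}
                    if len(probs) > 1:     # two "equivalent" states disagree
                        return False
        return True

    print(is_prob_bisimulation(partition))  # True: S0 and T0 are bisimilar
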
224 00:23:49,940 --> 00:23:58,040 And a policy says: given a state s, I'm going to choose the action based on that state. 225 00:23:58,040 --> 00:24:01,200 But that choice may be probabilistic, 226 00:24:01,350 --> 00:24:07,480 so there's a distribution over actions. That is what happens once you choose a particular policy π. 227 00:24:08,260 --> 00:24:17,050 And once you've chosen a policy, your MDP becomes just like a probabilistic transition system, enriched with the reward concept. 228 00:24:19,560 --> 00:24:26,670 So the goal is to choose the best policy. And a lot of machine learning, especially reinforcement learning, 229 00:24:26,910 --> 00:24:32,550 is about the numerous algorithms that exist to find or approximate an optimal policy. 230 00:24:33,090 --> 00:24:37,830 So a lot of the technology of optimisation theory comes into play here. 231 00:24:39,100 --> 00:24:45,820 Now, what's the equivalence concept appropriate to this setting? So, let R be an equivalence relation. 232 00:24:46,420 --> 00:24:50,080 You notice the use of the indefinite article: an equivalence relation. 233 00:24:50,950 --> 00:24:57,880 It's just, uh, something I'm going to put forward as a property that relations might or might not have. 234 00:24:58,240 --> 00:25:03,310 So given a relation R, it might or might not be a bisimulation relation. 235 00:25:04,030 --> 00:25:08,470 So R is a bisimulation relation if, when you've got a pair of related states s and t, 236 00:25:09,490 --> 00:25:20,770 then for every action, and every equivalence class of this R: first, the rewards match; and second, the transition probabilities 237 00:25:21,830 --> 00:25:25,670 into the equivalence classes match. So the fact that we're using 238 00:25:25,970 --> 00:25:32,330 equivalence classes amounts to lumping things together, so that you're adding up the probabilities appropriately. 239 00:25:33,530 --> 00:25:38,000 And this is the basic pattern: you have some kind of immediate matching that must happen — 240 00:25:38,270 --> 00:25:41,540 otherwise you'll see right away they're different; so the rewards must match. 241 00:25:41,840 --> 00:25:45,559 And then you want temporal persistence of this behaviour, 242 00:25:45,560 --> 00:25:50,120 and that comes from the second condition, which is typically called the coinduction condition. 243 00:25:52,160 --> 00:25:55,580 Now, this is in blue, but I'm going to read it anyway. 244 00:25:55,850 --> 00:26:04,430 So, uh, this definition is kind of intuitive and clear, but it can also be defined as the greatest fixed point of an appropriate relation transformer. 245 00:26:06,670 --> 00:26:11,380 Now, as I said before, I want to move to the world of continuous state spaces. 246 00:26:12,220 --> 00:26:15,760 And one of the questions you can ask is: why? 247 00:26:16,780 --> 00:26:21,280 And you can't answer: oh, because I'm having fun — which is usually the real reason. 248 00:26:21,820 --> 00:26:25,900 Uh, but the setting is real: 249 00:26:26,110 --> 00:26:31,600 you have software controllers attached to physical devices and sensors, like robots or other controllers. 250 00:26:31,960 --> 00:26:39,250 Around the time I began working in this area, one of my colleagues got a contract to consult with the avionics industry, 251 00:26:39,250 --> 00:26:45,190 and they were very interested — it didn't really get off the ground, but at least we began that way. 252 00:26:45,580 --> 00:26:51,360 And I remember, in the first meeting, I asked them: how do you verify the correctness of your software?
253 00:26:51,370 --> 00:26:56,800 And they said: we have meetings. And I said: what if it's really safety-critical software? 254 00:26:56,830 --> 00:27:02,080 We have lots of meetings for those. So that did not really, uh, enhance my confidence. 255 00:27:02,080 --> 00:27:07,690 But planes have not crashed from software failures yet. But nevertheless, it's an important thing to reason about. 256 00:27:09,310 --> 00:27:13,360 So the setting we're going to look at is continuous state space but discrete time. 257 00:27:13,690 --> 00:27:17,900 And you might question: if you're going to make the state space continuous, why not the temporal part? 258 00:27:17,920 --> 00:27:22,090 Well, the main reason was it's much harder. But we're doing that now. 259 00:27:22,090 --> 00:27:24,940 But at the time we decided to stick with discrete time. 260 00:27:25,120 --> 00:27:31,510 And it's not such an unreasonable thing, because many of these controllers sample, you know, 50 times a second — 261 00:27:32,410 --> 00:27:36,940 maybe I signed an NDA not to say that, but anyway, they sample some number of times per second. 262 00:27:37,300 --> 00:27:40,940 And based on this discrete temporal sample, it makes some decisions. 263 00:27:40,960 --> 00:27:44,980 So it does look discrete: even though, of course, dynamical systems are, you know, 264 00:27:45,130 --> 00:27:51,280 in continuous time, the software that you're writing thinks it's in discrete time. 265 00:27:52,470 --> 00:27:57,000 So there were applications to control systems, and to probabilistic programming languages, 266 00:27:57,000 --> 00:28:03,180 because once you put iteration or recursion into a language, even with only discrete probabilistic choice, 267 00:28:03,180 --> 00:28:09,480 you will effectively get, uh, continuous probability spaces by looking at the space of trajectories. 268 00:28:10,560 --> 00:28:15,900 So let me make some more remarks about this. Of course, these continuous state spaces can be used for reasoning, 269 00:28:16,350 --> 00:28:20,850 but maybe it would be much better if we could have a finite-state version — whatever that means, 270 00:28:20,850 --> 00:28:26,100 exactly. And in fact, we spent a lot of effort, which I'm not going to talk about today, 271 00:28:26,100 --> 00:28:33,510 on approximating continuous spaces with finite-state ones, uh, with some appropriate notion of approximation. 272 00:28:34,320 --> 00:28:39,900 But you could say: why not just discretize right away and never worry about the continuous case? 273 00:28:40,590 --> 00:28:46,920 Well, if you don't have the continuous case — the original thing to compare with — how can you tell that your approximation is a good one? 274 00:28:48,280 --> 00:28:53,950 And secondly, if you've thrown away the original continuous model and you've just got a particular discrete one, 275 00:28:54,040 --> 00:28:57,490 you can't refine it; you can coarsen it, but you can't refine it further. 276 00:28:59,650 --> 00:29:09,610 Okay. So, uh, let me go from those considerations to the mathematical ones. And the first one is the need for measure theory. 277 00:29:10,000 --> 00:29:12,370 Why do we have to do all this measure theory nonsense? 278 00:29:12,940 --> 00:29:22,600 Um, and the basic fact is: there are subsets of the real line for which no sensible notion of size can be defined. 279 00:29:23,470 --> 00:29:28,299 And size, of course, is the heart of measure theory. So you look at the real line: 280 00:29:28,300 --> 00:29:31,030 you have a notion of interval.
You have a notion of the length of an interval. 281 00:29:31,390 --> 00:29:35,590 You start getting more sophisticated sets and ask: what is the analogue of the notion of length? 282 00:29:36,040 --> 00:29:41,080 You might ask: what is the length of the rational numbers? It turns out to be zero; but you can ask questions like that. 283 00:29:41,740 --> 00:29:46,330 What is the length of the Cantor set? It is zero, even though it's uncountable. 284 00:29:46,360 --> 00:29:51,340 So that tells you that countability is absolutely the wrong way of measuring size in this setting, 285 00:29:51,790 --> 00:29:55,360 where you do want some kind of geometric intuition underpinning all this. 286 00:29:55,900 --> 00:30:02,140 So the fact is, until you invoke measure theory and sigma-algebras and all the apparatus that goes with it, 287 00:30:02,440 --> 00:30:07,720 you can't just willy-nilly talk about sizes and probabilities in a continuous setting. 288 00:30:09,090 --> 00:30:16,380 The more precise statement, due to Vitali, is that there is no translation-invariant measure defined on all subsets of the reals. 289 00:30:16,680 --> 00:30:19,499 There are, of course, measures that are not translation-invariant, 290 00:30:19,500 --> 00:30:27,330 like the measure that says any set containing zero has measure one and anything else has measure zero — but that's certainly not translation-invariant. 291 00:30:29,450 --> 00:30:32,930 All right. So we went to this, this continuous world. 292 00:30:33,860 --> 00:30:38,120 We used essentially the same definition of probabilistic bisimulation. 293 00:30:38,750 --> 00:30:43,310 And we found some curious things happening. 294 00:30:43,940 --> 00:30:46,690 So first of all, even before us, right — 295 00:30:46,700 --> 00:30:55,819 going back to, uh, Robin Milner's work — he and Matthew Hennessy, and independently, um, uh — God, who's the Dutch guy? 296 00:30:55,820 --> 00:31:04,809 Samson? Uh — Johan van Benthem, who is the one who did the logical characterisation. 297 00:31:04,810 --> 00:31:08,170 Thank you; van Benthem always gets forgotten. Anyway, 298 00:31:08,170 --> 00:31:16,740 um, what they did was they showed that two systems are bisimilar if and only if — 299 00:31:17,100 --> 00:31:22,620 so that's very strong — they agree on all formulas of a specific logic that you can write down. 300 00:31:23,800 --> 00:31:29,100 Now, this is very important because it tells you, uh, how to show things are not bisimilar. 301 00:31:29,710 --> 00:31:36,040 If you want to show things are bisimilar, yes, you try and construct one of those relations and see if you can get the two states in question 302 00:31:36,040 --> 00:31:39,969 related. But if you tell me they're not bisimilar, how do I show that? 303 00:31:39,970 --> 00:31:43,090 I can't say: well, I tried very hard to find a relation and failed. 304 00:31:43,510 --> 00:31:50,440 What you need is some result like this. Then you can exhibit a formula and say: see, they disagree on this formula. 305 00:31:51,190 --> 00:31:58,630 Now, the formulas that appeared in the original logic had all kinds of things: disjunction, and negation specifically. 306 00:31:59,530 --> 00:32:03,790 And, of course, it was restricted to finitely branching systems. 307 00:32:04,810 --> 00:32:14,740 And when we began, we were told by all the wise men in the field that we had no hope of doing an analogous thing in this kind of continuous setting.
308 00:32:15,250 --> 00:32:23,910 We lacked the kind of discrete structure that you would expect in logic. And so our logic had just these formulas: 309 00:32:24,150 --> 00:32:30,010 just the atom true, binary conjunction, and this peculiar modal formula. 310 00:32:30,030 --> 00:32:35,280 So this action is inside a little diamond, and subscripted by a rational number q. 311 00:32:35,730 --> 00:32:46,410 And it says: a state satisfies this formula if it can perform an a-action with probability at least q and wind up in a state that then satisfies phi. 312 00:32:46,590 --> 00:32:48,780 So this is an inductive definition of formulas. 313 00:32:50,690 --> 00:32:59,570 And the result we were able to prove was that two systems are bisimilar if and only if they obey the same formulas of this logic L. 314 00:32:59,810 --> 00:33:05,959 So this is striking, because: no negation, no disjunction — even in the continuous setting, 315 00:33:05,960 --> 00:33:13,070 which is certainly more than just the infinite discrete setting — uh, binary conjunction suffices. And 316 00:33:14,100 --> 00:33:18,300 I remember vividly that, uh, when my student first presented this, 317 00:33:18,720 --> 00:33:21,750 people jumped up — very kindly, because, you know, they were kind, too. 318 00:33:21,960 --> 00:33:25,740 It was a ten-minute presentation at a summer school where the students were allowed to present. 319 00:33:26,460 --> 00:33:32,700 And, um, they very politely told her: oh, uh, this can't possibly be right. 320 00:33:33,390 --> 00:33:37,590 And she didn't know English very well — she was, she's a French speaker from Montreal — 321 00:33:38,160 --> 00:33:41,580 and she said: be quiet; at the end of my talk you will be convinced. 322 00:33:43,230 --> 00:33:47,370 So, uh, I wished I had the confidence to say stuff like that. 323 00:33:49,920 --> 00:33:55,070 Uh, so the important thing is, as I said before: no finite branching assumption, no negation. 324 00:33:55,280 --> 00:33:58,670 This was what people were shocked by. They said: how can this possibly be? 325 00:33:59,600 --> 00:34:03,800 Uh, the proof is entirely different from the combinatorial proofs that existed before — 326 00:34:04,010 --> 00:34:08,540 which is why it wasn't subject to those, um, to that dependence on discreteness. 327 00:34:08,900 --> 00:34:13,610 So it uses tools from, of all things, descriptive set theory, as well as measure theory. 328 00:34:15,460 --> 00:34:24,970 Uh, right. So, as I said before, such a theorem was originally proved for non-probabilistic systems with finite branching restrictions, and later by, uh, 329 00:34:25,060 --> 00:34:32,440 Larsen and Skou in the discrete probabilistic setting, with a lot of restrictions on how much the branching was controlled and so on. 330 00:34:33,280 --> 00:34:38,290 And because of the power of the techniques we used, we were able to get — 331 00:34:38,290 --> 00:34:42,520 so you see, these results are even true for discrete systems, after the fact. 332 00:34:43,300 --> 00:34:48,580 So this gives a logical characterisation of discrete probabilistic systems too; 333 00:34:48,730 --> 00:34:55,630 but it was only by going through this, uh, um, excursion that one was able to prove it at first. 334 00:34:55,990 --> 00:34:59,800 Later on, of course, people did provide purely discrete proofs of it. 335 00:35:00,900 --> 00:35:04,480 And I must at this point pay homage to my former student. 336 00:35:04,500 --> 00:35:09,330 She's now a full professor: Josée Desharnais.
Basically, she proved everything difficult. 337 00:35:10,170 --> 00:35:16,440 Uh, I proved many things incorrectly, and she fixed them. 338 00:35:16,830 --> 00:35:20,280 So I call her my proof engine, because she made — 339 00:35:20,280 --> 00:35:23,309 she, uh — I think we were a very good partnership, 340 00:35:23,310 --> 00:35:28,709 because I would say: yes, let's do this, and it must work like this. And she would say: no, no, no, it doesn't work like that. 341 00:35:28,710 --> 00:35:34,620 And sure enough, she would be right, but it wasn't far from what I said. So it was always very, uh, nice working with her. 342 00:35:36,160 --> 00:35:39,610 Okay, so that's the story of probabilistic bisimulation. 343 00:35:41,020 --> 00:35:45,160 But we'd like to ask: is this notion of exact equivalence reasonable? 344 00:35:46,290 --> 00:35:50,460 We are demanding exact numerical equality of these probability numbers. 345 00:35:51,030 --> 00:35:57,660 And that really is not reasonable, because a small perturbation in these numbers is suddenly going to lead you to say: 346 00:35:57,870 --> 00:36:01,080 uh, these two states are no longer bisimilar. They had been, 347 00:36:01,200 --> 00:36:06,390 but then I changed this half to a half plus epsilon and the other half to a half minus epsilon. 348 00:36:06,660 --> 00:36:09,720 Now they're not bisimilar? Yes? No? So, 349 00:36:11,140 --> 00:36:15,340 we should have some metrical notion that tells you: how close are they? 350 00:36:16,180 --> 00:36:27,420 And this is where we went next. So Jou and Smolka proposed this idea, but they weren't able to come up with a metric that matched with bisimulation. 351 00:36:28,110 --> 00:36:31,590 So we want a quantitative measurement of the distinction between processes. 352 00:36:32,190 --> 00:36:35,220 And usually I give a one-hour talk at this point. 353 00:36:35,490 --> 00:36:42,990 So I'm going to skip that; in lieu of all these Greek symbols and letters, I'm just going to, uh, tell you very heuristically what happened. 354 00:36:43,680 --> 00:36:49,530 So remember this logical characterisation result. I'm going to lie a little bit, but please bear with me. 355 00:36:50,340 --> 00:36:54,370 So the logical characterisation result says: if two things are bisimilar, 356 00:36:54,390 --> 00:36:59,850 they will agree on all formulas. So if they're not bisimilar, there will be some formula they disagree on. 357 00:37:00,360 --> 00:37:05,740 So look for the shortest formula on which they disagree. If it's a huge formula — 358 00:37:06,900 --> 00:37:10,230 if a huge formula is the shortest formula on which they disagree, 359 00:37:10,470 --> 00:37:15,780 then they should be close, because it takes a long and elaborate experiment before you discern any difference. 360 00:37:16,350 --> 00:37:23,310 If they disagree on a short formula, um, they're very visibly different, and they should be given a large distance. 361 00:37:25,080 --> 00:37:34,090 So, um, they may disagree on the rewards, or they may disagree on the probability distribution that results from a transition. 362 00:37:34,540 --> 00:37:38,980 So we need to measure distances between probability distributions. 363 00:37:39,190 --> 00:37:44,230 And there is a well-known metric — well known, but the name is always given incorrectly. 364 00:37:44,650 --> 00:37:48,160 Uh, it's called the Kantorovich metric between probability distributions. 365 00:37:48,700 --> 00:37:52,900 So the reason I struck through the 'Wasserstein' is because that's what everyone calls it.
366 00:37:54,650 --> 00:38:01,880 But, uh, it's very weird, because people who've never read the paper cite a 1969 paper by Vasershtein, 367 00:38:02,870 --> 00:38:06,560 uh, and they say Wasserstein introduced this metric in 1969, 368 00:38:06,770 --> 00:38:10,990 and Kantorovich proved this thing about it in 1958. And you go: hmm, 369 00:38:11,980 --> 00:38:16,550 a little bit of a temporal anomaly there. Anyway. So it was actually introduced by Kantorovich in the 40s. 370 00:38:16,970 --> 00:38:21,680 This is such a nice and natural metric that it has been discovered many times, 371 00:38:21,680 --> 00:38:27,049 and I'm sure Wasserstein discovered it without, you know, uh, looking at Kantorovich's work. 372 00:38:27,050 --> 00:38:30,590 But the fact is, Kantorovich was 20 years earlier. 373 00:38:32,290 --> 00:38:39,370 So if the difference shows up only after a long and elaborate observation, then we should make the states nearby in the bisimulation metric. 374 00:38:39,970 --> 00:38:44,200 So one should think of these formulas not just as logical mumbo jumbo: 375 00:38:44,380 --> 00:38:49,240 they're actually descriptions of experiments that you can do to see how things behave. 376 00:38:50,220 --> 00:38:53,850 Right? Because they say: do this action, do that action, see what the probabilities are. 377 00:38:55,820 --> 00:39:01,980 So all this can be formalised. It's rather more complicated than this nice intuition that I've been giving you. 378 00:39:02,060 --> 00:39:08,480 And it was originally done by us — by Josée and, uh, her band of followers. 379 00:39:08,900 --> 00:39:19,280 Uh, but later came a beautiful fixed-point construction by van Breugel and Worrell, who, uh, actually gave a much more fruitful definition. 380 00:39:19,280 --> 00:39:24,050 And all the work we've done subsequently follows their presentation of this metric. 381 00:39:25,800 --> 00:39:29,910 A master's student of mine — later a PhD student — added rewards. 382 00:39:30,090 --> 00:39:35,490 So this is where we came up with bisimulation metrics for Markov decision processes. 383 00:39:36,150 --> 00:39:45,210 And the student, whose name is Norm Ferns, actually proved an important property that was completely different from anything we had shown before, 384 00:39:45,360 --> 00:39:53,820 which was that, if you look at the optimal value function in an MDP and you compare it at two different states s and t, 385 00:39:54,270 --> 00:40:00,690 the difference |V*(s) - V*(t)| in the optimal value function is bounded by some constant times the bisimulation metric between s and t. 386 00:40:01,110 --> 00:40:04,379 So the bisimulation metric is not just some random thing: 387 00:40:04,380 --> 00:40:10,230 it's actually closely related to the notion of optimal value function in reinforcement learning. 388 00:40:11,730 --> 00:40:14,370 So let me say a bit about reinforcement learning. 389 00:40:15,270 --> 00:40:21,570 Um, so we are often dealing with large or infinite transition systems whose behaviour is probabilistic. 390 00:40:22,050 --> 00:40:31,260 Again, the system responds to stimuli and moves to a new state probabilistically and outputs a reward, and we seek optimal policies. 391 00:40:31,260 --> 00:40:36,150 As I said before, a plethora of algorithms and techniques exist. 392 00:40:36,510 --> 00:40:40,320 But the cost of these things depends on the size of the state space, 393 00:40:41,100 --> 00:40:44,010 uh, in a polynomial way — worse than linear.
394 00:40:44,370 --> 00:40:51,240 And if you can shrink these things and get something that's effectively the same but smaller, it will help you a lot. 395 00:40:53,330 --> 00:41:01,070 Um, and so there's this other thing you can do, called representations, where you say: instead of actually using the exact state space, 396 00:41:01,490 --> 00:41:05,390 let's try and use something called features that capture aspects of the state space. 397 00:41:05,900 --> 00:41:11,540 And can we use these representations to accelerate the learning process by shrinking your search space? 398 00:41:13,740 --> 00:41:21,510 Um, um. So basically, for large state spaces, learning a value function like that is simply not feasible. 399 00:41:22,860 --> 00:41:27,000 So instead we define a new space of features and try to come up with an embedding phi 400 00:41:27,960 --> 00:41:32,640 of your state space into the vector space indexed by the features. 401 00:41:33,240 --> 00:41:42,090 Then we can try to use this space, which is hopefully much lower-dimensional than S, to predict values associated with state-action pairs. 402 00:41:43,250 --> 00:41:47,530 And that is what this phrase representation learning means: learning such a phi. 403 00:41:48,430 --> 00:41:49,660 Okay, so you can, of course, 404 00:41:50,640 --> 00:41:59,310 try to cook up — cook up phis by hand, but it would be nice if your system not only used such a representation, but learned a good one. 405 00:41:59,610 --> 00:42:05,639 Now, these elements of M are basically features that are chosen by the person developing the system. 406 00:42:05,640 --> 00:42:10,500 They can be based on any kind of prior knowledge or experience about the task at hand. 407 00:42:12,410 --> 00:42:20,060 Right. So, uh, it was actually Pablo Castro and Mark Rowland, at DeepMind and Google Brain, respectively — 408 00:42:21,100 --> 00:42:26,410 actually the other way around, sorry — um, who started working on, on this. 409 00:42:28,020 --> 00:42:33,450 And they wanted to use the Kantorovich metric at first, to compute with. 410 00:42:33,990 --> 00:42:37,710 But they found it expensive and difficult to estimate. 411 00:42:37,980 --> 00:42:43,590 And in particular, you can't find an unbiased estimator for the Kantorovich metric. 412 00:42:44,250 --> 00:42:53,780 So they came up with something else. They said: let's use a slimmed-down, easy version that, that can in fact be estimated. 413 00:42:54,710 --> 00:43:02,870 And it's around this time that we joined them. Um, so we — my student Tyler and I — joined Pablo Castro and Mark Rowland. 414 00:43:04,300 --> 00:43:09,400 So we had this kind of simplified metric, closely related to the bisimulation metric. 415 00:43:09,400 --> 00:43:14,080 But it is a very crude approximation. And in fact it's not even a metric. 416 00:43:14,440 --> 00:43:19,960 So there are many distances out there that, you know, you'd like to be metrics, and they turn out not to be. 417 00:43:20,530 --> 00:43:29,560 KL divergence is a famous one, for example. So it's a new type of distance, and we called it a diffuse metric. 418 00:43:30,070 --> 00:43:36,070 Maybe not the best terminology, but anyway, that's what I called it. So here are the axioms for a diffuse metric. 419 00:43:36,490 --> 00:43:41,710 The distances it assigns to pairs of points have to be non-negative. 420 00:43:42,190 --> 00:43:48,740 It's symmetric. It satisfies the triangle inequality, which gives you its geometric flavour.
421 00:43:50,470 --> 00:43:54,310 But we do not require that the distance of a point to itself is zero. 422 00:43:54,340 --> 00:43:59,680 This is perhaps the most drastic violation of the metric axioms that you can imagine. 423 00:43:59,920 --> 00:44:04,300 Lots of people have imagined violating symmetry: 424 00:44:04,330 --> 00:44:10,959 you get quasi-metrics. Lots of people have imagined having distinct things x, y at zero distance: 425 00:44:10,960 --> 00:44:16,120 those are called pseudo-metrics. You can combine both those things and call them pseudo-quasi-metrics, and so on. 426 00:44:16,330 --> 00:44:20,950 But this, uh, I thought nobody else had thought of. 427 00:44:20,950 --> 00:44:26,140 But it is not true: a couple of Polish people had had a metric like this. 428 00:44:27,500 --> 00:44:30,550 Um. And it's a bit shocking: 429 00:44:31,330 --> 00:44:38,080 it's like the Red Queen having to run to stay in one place. So, um, what can one say about such a thing? 430 00:44:38,590 --> 00:44:43,440 If you ask what's really going on — perhaps I can give an example. 431 00:44:43,710 --> 00:44:50,520 Supposing you are looking for crude ways of measuring the distance between probability distributions. There are all kinds of things you can do, 432 00:44:50,520 --> 00:44:57,749 etc., but one simple thing you can do is say: I have, so, I have an underlying metric space. 433 00:44:57,750 --> 00:45:02,610 So you can say: I'll sample two points and measure their average distance. 434 00:45:03,630 --> 00:45:10,050 Good. Uh, maybe this will give me a measure of how far apart two probability distributions are. 435 00:45:11,280 --> 00:45:20,310 And so, um, you can try this. But then you'll find, if you have any distribution other than a Dirac distribution, 436 00:45:20,670 --> 00:45:27,000 you might be sampling points that are far apart. And it'll tell you the distance from this distribution to itself is not zero. 437 00:45:28,190 --> 00:45:33,410 Uh, so it's a very — it actually arises very naturally. And that's why you have to do clever things to get real metrics. 438 00:45:35,770 --> 00:45:39,430 So we call this the MICo distance. 439 00:45:40,270 --> 00:45:47,740 So as I said, it's a distance; it's not a metric. Now, nearly all the machine learning algorithms I've seen are optimisation algorithms. 440 00:45:48,770 --> 00:45:55,250 And one often introduces extra terms into the objective function that push the solution in a desired direction. 441 00:45:56,210 --> 00:46:05,300 So what we did was we said: okay, here's this proxy for the bisimulation metric — spiritually inspired by bisimulation, but not exactly it. 442 00:46:05,990 --> 00:46:09,140 And we defined a loss term based on the MICo distance. 443 00:46:09,530 --> 00:46:14,929 So Pablo Castro was the main experimentalist. The last time I wrote a program, 444 00:46:14,930 --> 00:46:20,149 it was probably a five-line program in ML for a functional programming class, 445 00:46:20,150 --> 00:46:23,630 so I certainly didn't do any of the implementation here. 446 00:46:23,840 --> 00:46:30,950 And he has a beautiful write-up on his blog, where he describes all the experiments and the details and everything. 447 00:46:30,980 --> 00:46:36,260 I will, uh, say a bit, but it's the kind of thing I can't really answer questions about. 448 00:46:37,360 --> 00:46:43,700 But at least I can say something about the setup. So we have basically two copies of the neural net.
449 00:46:43,720 --> 00:46:48,490 One part computes the representation, phi, and psi is the function approximator on top. 450 00:46:48,490 --> 00:46:51,760 And here is some kind of loss function. 451 00:46:52,180 --> 00:46:55,120 Usually this one is based on temporal difference. 452 00:46:55,600 --> 00:47:06,940 But then what we do is, we say: we penalise you if your, uh, representations are too far apart relative to the MICo distance. 453 00:47:08,140 --> 00:47:11,360 Okay. So the normal loss term is just this. 454 00:47:11,380 --> 00:47:16,030 In fact, normally you just have one of these. But we take a pair of these, doing the same thing, 455 00:47:16,240 --> 00:47:20,380 and then we add an additional loss term coupling these two together, which is: 456 00:47:21,760 --> 00:47:24,880 try and keep them within the MICo distance of each other. 457 00:47:26,270 --> 00:47:33,020 And you see, this is a kind of a generic thing that you can do. It's not just tuned for one particular algorithm. 458 00:47:33,650 --> 00:47:38,300 And so we added this MICo loss term to a variety of existing agents, 459 00:47:38,660 --> 00:47:41,000 all of which are available in the Dopamine library. 460 00:47:41,900 --> 00:47:53,900 We ran each game five times with new seeds — so 300 runs for each agent — and we ran each game for 200 million environment interactions. 461 00:47:54,470 --> 00:48:01,720 And what we tested this on was the Atari suite. So we looked at the final scores and the learning curves. 462 00:48:02,170 --> 00:48:10,960 Uh, and so what we found was that, putting the MICo loss term into a variety of agents, 463 00:48:11,290 --> 00:48:20,440 on about two thirds of the games it quite significantly improved the performance, in terms of time taken to learn and, uh, performance thereafter. 464 00:48:21,280 --> 00:48:24,460 So, yeah — every agent performed better? 465 00:48:24,880 --> 00:48:29,860 Yes, that's true. Every single agent seemed to be improved by this. And those agents are quite different from each other. 466 00:48:30,880 --> 00:48:36,910 So here are some results. But, you know, just like some people are scared of Greek letters, I'm scared of graphs. 467 00:48:37,120 --> 00:48:40,120 So I'll skip these. Uh, and — 468 00:48:41,380 --> 00:48:45,280 so let me wrap up by mentioning a few other developments. 469 00:48:46,120 --> 00:48:53,370 Um, we did a lot of work on approximation theory, which I didn't talk about today because it's another one-and-a-half-hour talk. 470 00:48:53,970 --> 00:48:59,430 But essentially what we were able to do is approximate continuous-state systems by finite-state ones. 471 00:48:59,430 --> 00:49:03,420 And in what sense are they close? Precisely in terms of the bisimulation metric: 472 00:49:03,750 --> 00:49:11,580 you can actually define a sequence of finite-state systems that converges in that metric to the infinite-state system. 473 00:49:13,230 --> 00:49:19,200 Now, you might say: yes, metrical reasoning is all very well, but, you know, we are algebraists — or many of us are — 474 00:49:19,200 --> 00:49:22,650 and we'd like to capture the notion of distance in an equational setting. 475 00:49:23,160 --> 00:49:27,870 So we introduced this notion of approximate equality: 476 00:49:28,200 --> 00:49:32,460 you take the equals symbol and annotate it with a small real number epsilon, 477 00:49:32,760 --> 00:49:41,360 and you say s equals-epsilon t, meaning s is within epsilon of t, rather than s is exactly the same as t.
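[Editor's note: a compact restatement, in LaTeX notation, of the approximate-equality judgement just described. The inference-rule layout is the editor's paraphrase, with d standing for the underlying behavioural metric.]

\[ s =_{\varepsilon} t \quad\text{means}\quad d(s,t) \le \varepsilon, \qquad\qquad \frac{s =_{\varepsilon} t \qquad t =_{\delta} u}{s =_{\varepsilon+\delta} u}. \]

[The second rule is the "ergonomics" point the speaker makes shortly: chaining approximate equalities costs epsilons, which add up along the chain.]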
478 00:49:41,370 --> 00:49:45,660 Now, of course, you will realise this is not an equivalence relation. 479 00:49:46,960 --> 00:49:50,260 But just because it's not an equivalence relation doesn't mean it's rubbish. 480 00:49:50,500 --> 00:49:52,300 It's actually something called a uniformity. 481 00:49:52,930 --> 00:50:00,040 And from this one can basically produce an enhanced notion of quantitative algebra. 482 00:50:00,370 --> 00:50:04,989 And you can develop free algebras, you can develop equational theories, 483 00:50:04,990 --> 00:50:09,480 you can develop completeness theorems. And you can ask: 484 00:50:09,630 --> 00:50:15,360 what is the free algebra if you do this, that and the other? There are simple equational theories you can write down. 485 00:50:15,840 --> 00:50:19,170 And if you ask what the free algebras are, out of the blue they turn out to be 486 00:50:20,340 --> 00:50:26,610 probability spaces with the Kantorovich metric, even though your equational theory doesn't say anything about probability. 487 00:50:27,150 --> 00:50:34,740 So that was striking for us: it's really taking this metric reasoning and putting it in an equational setting. 488 00:50:35,070 --> 00:50:36,330 One has to be a bit careful. 489 00:50:36,780 --> 00:50:46,590 One has to take care of what I call the ergonomics of reasoning, because now it costs epsilons to chain these equalities together. (A sketch of the rules appears at the end of this passage.) 490 00:50:48,170 --> 00:50:51,290 There is also a lot of categorical mumbo-jumbo, which I'm not going to go into, 491 00:50:51,290 --> 00:50:58,340 but for those who are interested, I've given talks here before in which I've talked about this precisely. 492 00:50:59,710 --> 00:51:03,490 One of the other recent things we've done is exploit symmetry. 493 00:51:03,670 --> 00:51:11,400 So that's where there's a group action on the state space, and you want to look at the orbit space, learn on the orbit space, 494 00:51:11,410 --> 00:51:14,430 and pull the policy you've learned back to the original state space. 495 00:51:14,500 --> 00:51:19,900 This sounds simple, but it wasn't so easy. 496 00:51:19,930 --> 00:51:28,089 Fortunately, I had yet another brilliant student, Rosie, who is doing her PhD at Harvard now. 497 00:51:28,090 --> 00:51:32,020 And so she took my vague and confused ideas and made them work. 498 00:51:32,380 --> 00:51:37,750 So that was a very nice line of work. I have yet another brilliant student, 499 00:51:37,760 --> 00:51:44,110 Florence Clerc is her name, and she is the one who started working on continuous space and time. 500 00:51:44,440 --> 00:51:50,919 So things like diffusion processes, the general class being called Feller-Dynkin processes. This turns out to be much more difficult; 501 00:51:50,920 --> 00:51:55,000 it's not a cheap hack of what we did before, because the notion of a step is gone. 502 00:51:56,400 --> 00:52:00,820 Things are flowing, they're not jumping. So you have to do something else. 503 00:52:00,830 --> 00:52:04,800 You have to look at the space of trajectories. The theory gets much more painful. 504 00:52:05,160 --> 00:52:12,030 Fortunately, she likes pain, so she actually managed to hack her way through this.
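[Aside, returning to the epsilon-annotated equalities for a moment: a sketch of the kind of rules they obey, in the style of the quantitative equational theories of Mardare, Panangaden and Plotkin (the exact rule set varies with the presentation).]

```latex
\begin{align*}
  & \vdash s =_{0} s
    && \text{(reflexivity: zero self-distance)}\\
  s =_{\varepsilon} t \;&\vdash\; t =_{\varepsilon} s
    && \text{(symmetry)}\\
  s =_{\varepsilon} t,\; t =_{\delta} u \;&\vdash\; s =_{\varepsilon+\delta} u
    && \text{(triangle)}\\
  s =_{\varepsilon} t \;&\vdash\; s =_{\varepsilon'} t
    \quad\text{for } \varepsilon \le \varepsilon'
    && \text{(weakening)}
\end{align*}
```

[The triangle rule is exactly the ergonomics issue mentioned above: each time two approximate equalities are chained, the epsilons add up, so long equational derivations accumulate error.]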
505 00:52:13,810 --> 00:52:19,719 And as I said, this MICo metric was a nice thing and so on, but at the end 506 00:52:19,720 --> 00:52:24,070 we had to do some hackery to get rid of the non-zero self-distances and so on. 507 00:52:25,760 --> 00:52:32,089 And at the time we were dissatisfied. It seemed to be a beautiful subject until, at the end, 508 00:52:32,090 --> 00:52:36,500 you had to say: well, we have to cut off the self-distances in order to get something that looks more like a distance. 509 00:52:37,070 --> 00:52:42,979 But it turns out one can look at this from the viewpoint of kernels, and then it comes out in quite a principled way. 510 00:52:42,980 --> 00:52:46,580 So this is work that we published in TMLR 511 00:52:46,650 --> 00:52:49,970 last year. 512 00:52:50,840 --> 00:52:56,180 Other work that I've done is with my former student Joey Bose, who was a postdoc here, 513 00:52:56,180 --> 00:53:02,150 or is a postdoc here, with the generative learning groups. 514 00:53:02,630 --> 00:53:06,260 So we worked on normalising flows on hyperbolic spaces. 515 00:53:07,010 --> 00:53:10,070 I was really just a geometry consultant; they did it. 516 00:53:11,660 --> 00:53:18,920 And we also did some work on generative algorithms based on solving stochastic differential equations on Riemannian manifolds. 517 00:53:18,920 --> 00:53:22,970 So there's a lot of nice mathematics that one can do in this setting. 518 00:53:23,860 --> 00:53:29,740 So let me conclude quickly. Bisimulation has a rich and venerable history. 519 00:53:30,220 --> 00:53:36,970 In fact, many people discover it all the time. I would say once every 5 to 10 years, somebody reinvents bisimulation. 520 00:53:37,200 --> 00:53:40,640 But that just shows you the naturalness of the concept. 521 00:53:41,150 --> 00:53:49,670 The metric analogue holds promise for quantitative reasoning and approximation, and I'm looking forward to pushing this line in future work. 522 00:53:50,300 --> 00:53:56,630 Basically, the research is alive and well, and there are new areas where bisimulation is being discovered all the time. 523 00:53:57,170 --> 00:54:01,910 So I'll just show you some more photographs. These are collaborators. 524 00:54:02,420 --> 00:54:08,050 This is Gordon Plotkin, who's best kept in the cage. Here are some other collaborators. 525 00:54:08,060 --> 00:54:12,230 This is Norm Ferns. And this is Pablo and Tyler. 526 00:54:12,470 --> 00:54:15,530 Vineet Gupta and Radha Jagadeesan worked closely with me. 527 00:54:15,740 --> 00:54:21,770 I didn't put Josée's picture here, but these two boys and I worked together with Josée. 528 00:54:21,830 --> 00:54:29,330 We were often called "Josée and her Indian gang". All the approximation work was done with them. 529 00:54:30,200 --> 00:54:39,799 And I'd like to give special thanks to two very close and dear friends, Doina Precup and Joelle Pineau. At one time 530 00:54:39,800 --> 00:54:48,740 I was their mentor; now they're superstars. And they're the ones who inspired me to move from the area I was in, 531 00:54:48,980 --> 00:54:56,000 which was programming languages (which I haven't abandoned), to also think about machine learning. 532 00:54:57,520 --> 00:54:58,300 Okay. Thank you.