Okay. Welcome to the Hilary Term 2019 Strachey Lecture, organised by the Oxford Computer Science Department.

First of all, I'd like to say a huge thank you to our sponsors, Oxford Asset Management, who make these lectures possible and make it possible for us to invite very distinguished speakers from across the world. This is the fourth year that they have supported these lectures, and we're very grateful for that support.

I'd also now like to welcome our speaker, Professor Leslie Pack Kaelbling, who is the Panasonic Professor of Computer Science and Engineering at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. I hope I got that right. You'll be wanting to hear what she has to say, but just by way of a very brief introduction: Kaelbling has made research contributions across a huge range of areas in decision making under uncertainty, learning, and sensing, with applications to robotics. Her research has a particular focus on reinforcement learning and planning in partially observable domains, and she has received the IJCAI Computers and Thought Award. She has been elected a Fellow of AAAI, and she was the founder and editor-in-chief of the Journal of Machine Learning Research. So I'm delighted to be able to welcome her here today to talk about doing for our robots what nature did for us.

All right, thank you so much. Okay, let me not reverberate. Thank you for inviting me; I'm excited to have the chance to talk to you. I enjoy backtalk and feedback and such, so as I go along, if you want to ask a question or complain about something I've said, I'm very happy to take it, and of course we can also discuss at the end.

My research goal for my whole life, really, has been to understand the computational mechanisms that we need in order to make a really general-purpose intelligent robot. I'm not trying to solve any particular robotics problem; I just want to understand the nature of intelligence and how we can put it inside a physical system that interacts with the world.

The way I think about a robot is that, fundamentally, it's a transducer. I should also say that I worry about the software part of the robot, so let's assume there is some hardware spec. Then what I want to think about is what program I need to put in the head of my robot. And fundamentally it's a program that has to take a history of actions and observations, something in (O x A)*, that is, a history of what it has observed and what it has done itself, and decide on the next action.
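As a minimal sketch of that transducer view (my own illustrative code, not from the talk; the type names are hypothetical), the program pi is just a map from the interaction history to the next action:

```python
from typing import Protocol, Sequence, Tuple

Observation = dict  # whatever the robot's sensors return
Action = dict       # whatever command the robot's actuators accept

class Policy(Protocol):
    """pi: maps the history of (observation, action) pairs to the next action."""
    def __call__(self, history: Sequence[Tuple[Observation, Action]],
                 latest_observation: Observation) -> Action:
        ...
```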
And really, that's the entire job of a robot engineer: to figure out what program pi to put in the head of the robot. So that's what I want to talk about today, and I want to get you to help me think about it. How should we frame that problem? How should we think about what pi should go in the head of the robot?

First, I want you to think about the robot factory. So I'm going to be a robot factory; I'm a robot engineer; I'm going to pick a program pi to put in the head of the robot. The question is, what program should I pick? And what I'm going to argue is that I should pick the program that works as well as possible in expectation over the possible situations the robot might find itself in.

If you say to me, robot factory, I need a robot to weld a particular model of a particular car in my factory, then I can make a very specific program to do that, because I know exactly what it's going to need to do. If, on the other hand, you say, robot factory, make me a robot that can come to my house and do whatever I ask it to do, well, that's a different problem. It's a harder problem, but it's a problem of the same kind: in expectation over the environments that this robot is supposed to work in, the program should perform well.

Right now there's a lot of argument about the role of learning and the role of reasoning, about whether programs should learn or not and how much, and I want to say that there's really no point in having that argument. We just want that program. If we can find the program that works best in expectation over the environments, then, if the environments are very different, it's going to have to learn something from the particular environment it finds itself in; if the environments are very similar, it won't.

Okay. So I take my job to be the job of designing the robot factory. How am I going to think about finding good programs to put in the robot, so that it performs well when it goes out into the distribution of worlds it needs to go into? There are a bunch of different ways to think about it. One way, which at the moment seems kind of prevalent, is that the robot should just learn everything: I should put approximately nothing in the head of the robot, and the robot should go out and learn everything from its experience in the world. Well, that's actually not even remotely sensible. Would you buy a robot that didn't know anything, let it into your house, and have it break a bunch of stuff while it tries to figure out how to do things?
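In symbols (my notation, not from the talk), the factory's criterion sketched above is roughly

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{E \sim \mathcal{D}}\big[\, U(\pi, E) \,\big],
\]

where \(\mathcal{D}\) is the distribution of environments the robot will be deployed into and \(U(\pi, E)\) measures how well program \(\pi\) performs when run in environment \(E\).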
Another strategy is to say: well, you should just hire really smart engineers, sit them down at a desk, and have them type in that program, and that should do very well. That has historically maybe been the approach, but it's actually really hard for the engineers. Most engineers don't have good access to what program they should write. In computer vision we've learned that lesson: people used to try to write computer programs that would recognise faces. Lots of very smart people tried really hard to do that, and it completely failed. What succeeded was that they could write programs that could learn to recognise faces, and I'll come back to that.

Another strategy could be that we figure out what humans do and then just do that. I think that's an interesting and important enterprise. My own personal bet is that it might not be the quickest way to get to where I want to get to, so people should do that, but that's maybe not what I'm doing. Or maybe we could recapitulate evolution: maybe we could get robots to somehow evolve or learn, not in the particular niche that they're born into, but over some longer timescale, and maybe that would get us somewhere. So I don't know; none of these things is completely appealing, but I just want to explore them a little bit more.

So let's think about learning or evolution in the factory. I said that when humans tried to write programs to recognise faces, that didn't work out very well, but humans actually were really good at writing programs that learn to recognise faces. So maybe we can do something like that: we want to come up with some strategy for behaving in the world that works well in lots of different worlds. To do that in the factory, we have to replicate the variability of the domains that the robot is going to go into; we have to replicate that variability in the factory so that we can test our programs and know whether they're going to do well when we put them out into the world. We could formulate that in terms of a search space, an objective function, and some kind of test distribution, so it is at least a well-formed approach to the problem.

But there's a debate raging right now; the machine learning community has had big arguments about this recently, for instance at the NeurIPS conference. People want to say you should do no harm: that if you build anything into your system, into your machine learning system or your robot system, you risk being wrong.
And if you build in a wrong thing, you've doomed your robot, or whatever it is, to being suboptimal; it can't overcome that. That's true, although I would say that we're going to have to take the risk. Those of you who have raised children probably did some mildly suboptimal things in the process, but mostly they probably came out okay. I feel that way about robots too. We have to make a decision; we're the engineers; we have to build the system. We might build in some things that are not exactly right, but we don't have time to wait and let things be completely generic. You could imagine running a completely generic algorithm that just enumerated programs in order of complexity until you found a good one, but that's crazy, right? And for reinforcement learning people: you could do something roughly equivalent there too.

Okay, so what are we supposed to do? One strategy, which is sort of appealing, would be to just set up some evolutionary process. It's going to take a really long time; we go to the beach, and eventually the problem gets sorted out for us. That could be good, but I'm worried that we might not be alive by the time that thing finishes, or even gets anywhere interesting. So I'm going to talk about an actually much more boring strategy, which is to combine aspects of all of these things in a way that might help us get from where we are to really interesting and flexible robots in a short amount of time.

The story that I want to tell here is one where we do meta-learning in the factory. Meta-learning is a fancy name; it means learning to learn. It means that in the factory we run something like a machine learning algorithm, a search algorithm, that arrives at an algorithm with the property that when we put it out in the world, it can learn effectively. I want my robot to be able to come into your house and make tea. That's what I want. It's going to have to learn: every house is different, and it's going to have to learn how your house is organised, how you like your tea, all of these things. In the factory, I would like to be able to meta-learn how to do that.
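As a rough sketch of what "meta-learning in the factory" could look like computationally (hypothetical API and function names, not the speaker's system): an outer loop searches over learning algorithms, scoring each one by how well it learns, on average, in environments drawn from the deployment distribution.

```python
def meta_train_in_factory(sample_env, sample_config, make_learner,
                          n_outer=1000, n_tasks=8):
    """Illustrative learning-to-learn outer loop: return the learner
    configuration whose inner-loop learning works best in expectation
    over sampled environments."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_outer):
        config = sample_config()                  # propose a learning algorithm
        scores = []
        for _ in range(n_tasks):
            env = sample_env()                    # e.g. a simulated household
            learner = make_learner(config)
            learner.adapt(env)                    # inner loop: learn in that environment
            scores.append(learner.evaluate(env))  # post-adaptation performance
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config
```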
There are all kinds of interesting constraints. I've been speaking a bit with Liz Spelke, who knows a lot about human development, and she says that human babies, and probably mammals in general, are basically born with a lot of fundamental things about the world: that there are other agents, that there are clumps of matter that cohere, that there is space that we move through. And so I feel kind of licensed to build those things into my robot. There are invariants in the worlds that we care about that we could build in, too; maybe I know the kinematics of my robot. And I think there are also some interesting constraints that we have to respect that don't come from the problem, but from the fact that humans are actually going to engineer these systems. If I'm going to build a system, even a meta-learning system, it has to have some degree of modularity, just because I personally can't understand the whole thing; I have to understand the pieces and parts and put them together in some systematic way. So these are some constraints that we can bring to bear on the problem.

Okay. I'll tell you a little bit of history, and then I'll talk about actual technical stuff. My first job when I graduated, as an undergraduate in philosophy of all things, was to work at a research lab, and they were just building a robot. Nobody there really knew anything about robots, and my job was to get the robot to drive down the hallway using its sonar sensors. The sonar sensors were terrible; they didn't give very reliable returns, and I didn't know anything. So what I did was write a program. The robot would run into the wall, and I would fix the program. The robot would run into the wall in a somewhat different way; I would fix the program. The robot ran into the wall for weeks: I fixed the program, the robot ran into the wall, I fixed the program, the robot ran into the wall. Eventually I learned enough about how the sensors interacted with the environment and the control system and so on that I could write a program that drove the robot down the hallway without running into anything. And the lesson I took away from that was that I didn't want to be in that loop anymore. The robot could darn well learn how to interact with the world, and I didn't want to be in the middle of that; I would be outside, designing the learning algorithm that lets the robot learn from its interaction with the world.

So then I reinvented reinforcement learning, and I did it kind of badly and wrongly, and eventually got introduced to people who knew better, so that was good. And I made a little robot that did reinforcement learning, and it actually learned something during my thesis defence, which was kind of cool. But that was by the mid-nineties, which was when neural networks were cool before; you might note that right now is kind of the third time that neural networks have been cool.
The second time they were cool was in the nineties. And in the nineties it was the same story as now: everyone said, oh, this is awesome, we're just going to put neural networks in there and they will figure everything out. But for sample complexity reasons, for a bunch of technical reasons, I don't actually think that's possible.

Okay, so what to do instead? I've been working over about the last ten years with my colleague Tomás Lozano-Pérez, who knows a lot about robot kinematics and robot planning and that kind of stuff; I, in the intervening time, had learned something about planning under uncertainty and model learning. We've been working together, and we're taking the following approach. Our view is that there are some basic inference algorithms and representations that are justifiable based on regularities in our environment and on some fundamental computational facts. Convolution is a great example. The same people who like to say that it's a terrible idea to build things into their networks do convolutions. But if they're doing convolutions, they're building something into their network: an understanding of translation invariance and of local spatial regularities and things like that. So they're building in a lot of knowledge. And I think there is a handful more mechanisms like that. I'm not going to argue that this is the right particular set of mechanisms, but I would argue that there is some set of mechanisms, and I'm hoping it's more like six or ten and not 700, because otherwise it's going to be hard.

So we're building in some basic principles and mechanisms. And what we were doing until very recently was actually hand-building a system. And so, we're learning people, hand-building a system? Well, the point was to get an understanding of the whole arc of the system, of how you could go from perception through estimation and reasoning and action and so on, and to build a system that was pretty competent. So what I'm going to do is tell you about that, and then I'll tell you how we're adding learning.

This is a photo I like to show in all my talks lately. It is not my kitchen, I promise. But imagine what it would take to make breakfast there, or to clean it up, and think about what makes that so hard. It seems like a hard problem for a robot, and what makes it hard? There are a bunch of reasons. One of them is that it's, in a sense, a very high-dimensional space.
Robot people like to talk about how many degrees of freedom their robot has: six or ten or twenty or something. But how many degrees of freedom does the kitchen have? You can't count them. It's not just the positions and orientations of all the objects; it's whether the grapes are mushy and when the people are coming home. It's very hard to even think about the state space of that kitchen.

The horizon is really long: if you imagine how many little linearly interpolated motions the robot might have to make to clean that kitchen, that's really a lot. And the uncertainty is fundamental. If you talk at a robotics conference, people will often say, don't worry about uncertainty, we'll just make the sensors better and then you won't have to worry about it. For some kinds of uncertainty it's true that making the sensors better will cure the problem. But making the sensors better won't let me know what's inside the cupboard, or inside your head. So there's uncertainty that's very fundamental; I can maybe get information about these things, but it requires careful, explicit action to do so.

Okay. So we have a kind of architecture with boxes and arrows, and it's not very surprising or different from other people's architectures with boxes and arrows, but I'll tell you what I think are some of the salient points of how we address this problem, and show you a demonstration. We have this thing called belief-space hierarchical planning in the now. Fundamental to it is the idea of reasoning in belief space. Everyone who's ever had an automata theory course, way back in the day or whenever, knows that you can basically take any machine apart into two pieces: one that remembers something about the history, and another that decides what to do based on what has been remembered. We'll call those state estimation and action selection. For us, the arc that goes between the boxes is a belief; you can think about it as a probability distribution over the possible states of the external world. The first box is trying to estimate what's going on, and the second box has the job of taking that belief, that distribution over the state of the world, and deciding what action to do.

So that's the space we live in. The first question is: what goes along that wire? If you've read papers about Kalman filters or POMDPs or something, there's been talk about state representations and so on.
But if you think about a robot that has to clean the kitchen, its state representation can't be some lovely little vector; it's a very complicated story. First of all, we don't know in advance how many objects there are or what they are; it's not as if we can say, oh, I have a ten-dimensional state space. We have an open world. And again, I don't want to argue that this is the one true way; I just want to tell you a thing that kind of works. We keep a kind of database of the objects that we believe exist in the world, and some distribution over their properties, like their relative positions and their masses and so on. We keep a representation of what space we believe is free and what space we believe is occupied, because we have to reason about whether it's safe to go somewhere that we haven't looked at yet. We keep distributions over what kinds of objects tend to occur near other ones, so that we might search more efficiently in certain kinds of places. So this is our belief: something complicated.

We also have to worry about an integration that very few people worry about. Lots of people worry about robot motion planning: how do I move the robot from one pose to another? That's a non-trivial problem, and there are a million algorithms. And AI people worry about high-level symbolic actions: what should I do, and in what order, at some very abstract level. What's interesting is that these things can't be isolated from one another; the geometry can completely affect what high-level actions you should do and in what order. The geometry might tell me whether I can drive my car down a certain alley, or whether I can fit in a certain place, or whether I can put these two pans on the stove at the same time. So we spend a lot of time worrying about reasoning about the interaction between discrete things and continuous ones.

Okay. So: probability, geometry, discrete stuff. The optimal planning problem in our domain is unthinkably difficult. So we fall back on ideas from control theory. In control theory, an important idea is feedback, and the important thing about feedback is that you can do a slightly wrong action, and if you just very quickly look to see what happened, you can decide to do something else instead that might make up for the slightly wrong thing you did before. So you don't have to pick optimal controls; you just have to pick not-terrible controls. I'm all about being not terrible. Optimal is not in it; there's no way. So I've made peace with not being optimal.
So here's our strategy. We make a really weak, really very approximate model of the dynamics of our world, and we do planning. Here's my belief, a distribution over the state of the world; I make some kind of plan; I take the first step of that plan, execute it in the world, and get an observation to see what happened. Then I update my belief and plan again. An important thing to think about when you see this system is the perspective you take when you're making decisions. The perspective that we take (I don't know if you can see the grey part of the diagram, maybe a little bit) is that from the planner's point of view, it's interacting with an environment, but its environment includes the belief update. So it's a control system that operates in belief space. We give it its objectives in belief space: I tell the robot, I would like you to believe with high probability that the kitchen is clean, or that the green box is on the left-hand side of the table, or something. I can't give the robot goals in state space, because the robot doesn't have access to state space. It can't promise me that it can change the world in a certain way, but I can ask it to come to believe something. And it's not allowed to just delude itself and say, oh yes, I believe it, no problem; it has to actually do the work and run the Bayes update and so on, so that it really does believe this thing. So I ask it to come to believe something, and when it chooses actions it has to think not just about the effects of the actions on the state of the world, but about the effects of the actions on its own belief. And that's why it looks: you look not to change the world, but to change your belief. What's nice is that you can treat all your actions, actions that gather information, actions that change the state of the world, and actions that do both things at once (which really most do), all in the same framework. That's a lesson from POMDPs, but it applies here too. So we think about planning in belief space; we think about planning to take actions that will control our own state of information about the world.
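A minimal sketch of that belief-space replanning loop (hypothetical API, not the actual system):

```python
def run_belief_space_controller(belief, goal_satisfied, plan, execute, observe):
    """Illustrative loop: the goal is a condition on the belief, the planner
    works with an approximate model, and only the first step of each plan is
    ever executed before re-estimating and replanning."""
    while not goal_satisfied(belief):
        steps = plan(belief)                         # approximate plan in belief space
        action = steps[0]                            # commit only to the first step
        execute(action)                              # act in the real world
        observation = observe()                      # see what actually happened
        belief = belief.update(action, observation)  # Bayes update / state estimation
    return belief
```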
Okay, one more high-level idea; I promise I'll get even more technical in a minute, but I just want to give you the story here. Planning is difficult and inefficient, usually exponential in the horizon, so we can't stand to have a very long horizon; and if we think about how many actions it takes to clean that kitchen, the horizon is horrible. So we do something hierarchical. People have talked about hierarchical planning a lot. Usually when they talk about hierarchical planning, they use the idea of hierarchy to make the construction of a completely worked-out plan more efficient. We're going to do something a lot more aggressive than that. We start with some high-level goal and make a plan at some level of abstraction. For instance, I made a plan to come to Oxford that involved going to the Boston airport and flying and walking through Heathrow and doing some things like that, but at a pretty high level of abstraction. Partly I did that because I'm computationally lazy and figured I could work out the rest later. But partly I did it because I didn't have enough information: I couldn't have planned my trajectory through Heathrow, because I didn't know what the map was like, or what gate we would come into, or any of that.

So we make a plan at a high level of abstraction and then commit to the first step. When we make a plan, you can think of the yellow boxes as abstract actions, and you can think of the blue boxes as what we call pre-images: sets of states that we have to go through, a kind of subgoal. We take the first subgoal and make a plan for that, say, get to the Boston airport; then we take the first subgoal of that and make a plan for it, like, I don't know, get an Uber. And finally I get down to some primitive action and I execute it. So I'm being optimistic that I can work out the rest of the stuff later. Right now we're hand-building these models; eventually I'm going to have to learn, for instance, to predict whether it's reasonable to walk through Heathrow in 20 minutes. I don't know; you can tell me whether that's reasonable or not.

Okay, so we have this hierarchical planning scheme. It also helps with replanning: remember I said that whenever we took an action, we replanned. If we have this structure, we can decide whether we need to replan very efficiently. We can ask the question: I just took this action; did it lead to the blue box I was expecting? If it did, I can do the next one. If it didn't, I can pop that low-level thing off the stack. Let's say I was planning to take an Uber to the Boston airport, but I discover that there aren't any. Okay, so I pop my bottom plan. I don't give up the idea of going to the airport, or the idea of coming to Oxford, or my academic career, you know. You may know people who reason too much at the high level; it's not healthy. A little bit is okay, but too much is not. You'd like to be able to control your reasoning, and this structure lets you do that. So that's also kind of a nice thing.
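A sketch of that plan-and-refine recursion, as I understand it (hypothetical API, not the actual planner):

```python
def plan_in_the_now(goal, belief, plan_at, is_primitive, execute, observe):
    """Illustrative hierarchical planning in the now: plan abstractly, refine
    only the step you are about to take, execute primitives in the world, and
    pop back up a level whenever an expected subgoal is not achieved."""
    while not belief.satisfies(goal):
        steps = plan_at(goal, belief)            # [(operator, subgoal), ...]
        for operator, subgoal in steps:
            if is_primitive(operator):
                execute(operator)                # act in the real world
                belief.update(operator, observe())  # state estimator mutates the belief
            else:
                plan_in_the_now(subgoal, belief, plan_at,
                                is_primitive, execute, observe)
            if not belief.satisfies(subgoal):    # not the blue box we expected
                break                            # pop: replan at this level only
    return belief
```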
Okay. We put this together and we get the robot to do some stuff, so I'm just going to show you some movies and talk while they run. What's important about these movies is that the robot is doing different kinds of things, subject to different goals, and it's all the same code. In this case we asked it to put the green block on the corner. The green block is too big to pick up, so it has to push it; it reasoned that it had to take the orange one out of the way. It also knows that pushing is really unreliable, so every time it pushes, it looks to see where the block went and says, that wasn't so good, I'd better replan the pushing. Same code again: we asked the robot to go out of the room. It knows about space, and it knows about occlusion and obstacles, and it reasons that if there's something in its way, it can't go through it. First it says, if I want to move through some space, I have to look at it and believe that it's free. If I look and find something in the way, I have to move it out of the way. So it's reasoning very generally, doing all these same kinds of things. Earlier it was picking up the oil bottle to see if there was oil in it. Okay, and this is a crazy robot in Singapore; well, whatever, a little too much similarity on that one. All of it, the same code.

So that was good: kind of general-purpose reasoning, planning, estimation, and so on. But it was also kind of not so good, because there's no learning or adaptivity in there at all. There's no learning in the moment, and there's no learning in the factory; this is basically me and Tomás writing code. So it's not an extensible strategy for making general-purpose robots in a factory. The question now is: how can we think about making a system like this learn?

In particular, we want to think about what kinds of things can be learned and how to do it. And there's an interesting distinction here. I really think there are two importantly different kinds of learning, which, again, people used to talk about a lot back in the day, but I think they tend to muddle them now a little bit. A classic kind of learning is to learn about the world. I'm a robot; I don't know what happens if I push this button, so I'm going to try it in your kitchen and see what happens; or there's something else I'm not sure about.
So there are some things about how the world works that I don't understand, and I have to do things to gather information to figure out how the world works. I might learn observation models: what do my sensors tell me about the world? Or transition models: what happens if I do this? Most of the work right now in robot learning is focussed on the two lower boxes. One is object detection, which has been huge; various kinds of perception have been very important to us. The other is primitive policies: strategies for picking things up, or reorienting something in my hand, or riding a bicycle, or walking. Those are all very important, and you can think of them as closed sensorimotor loops. So this is one kind of learning, which is really gathering information about the world.

There's another kind of learning which is at least as important, I think, which is learning to reason more efficiently or effectively. I would argue that learning to play chess or Go or StarCraft is that, especially chess and Go; maybe not StarCraft. For chess and Go, once you've read the rules, information-theoretically you are capable of making an optimal first move. It's just that you're too dumb to compute it. If you just had a better computer, you wouldn't have to learn any new information; you have to put the information in a different form, but you've got the information. And there are lots of opportunities, within an estimation, planning, and reasoning architecture like the one I told you about, to do that kind of learning as well.

What I'm going to do now is talk concretely about a particular thread of work going on right now in my group, involving learning transition models, and a little bit about learning samplers for planning, because when you plan in continuous domains you have to do something to manage the continuum of possible actions you can take.

Okay. So, old story: there are lots of models of the world; they're all wrong; some are useful. Useful for what? We have to think about what kind of model we should learn, if we're going to learn a model of the actions the robot can take. I had a postdoc named George Konidaris who thought, I think, very nicely and usefully about the continuum of abstraction in learning models for robots. He likes to talk about the swamp.
You can imagine that there's some system of partial differential equations that really governs how the world works; that would be a very accurate model, but not a model that you could plan with easily. At the other end, AI people think about these beautiful, totally abstracted, wonderful symbolic models, and they're lovely and you can plan with them efficiently. And then the question is: what kind of connection can you make between the two? What we need to do is think about how to abstract chunks of the swamp into nice abstract symbols up at the high level. I think we need both levels. The view we're taking right now is that we have local control loops that operate in the swamp: picking things up, moving things in your hand, walking, that's all swamp-level stuff. A control loop gets you from this blob of the swamp to that blob, so you can do some things like that. But then what we want to do is learn models of how those low-level control loops behave. Those models don't have to be perfect; they just have to be good enough. We hope to be able to abstract over objects, too: I don't want to have to learn a model of the dynamics of this particular prop, but I do know that if I let go of it, it will drop; I know what will happen if I throw it; I know a bunch of things about it abstractly. And what I hope is that I can get some kind of virtuous category-diagram thing going on here, so that at the high level there are arrows, maybe nearly deterministic ones, that move me from sets of states at the high level to other sets of states, and that are really implemented by these swampy control loops.

Actually, let me skip that. What we're going to try to do, and what I'm going to talk about now in detail, is learn a model of the preconditions and effects of a low-level controller. Imagine someone has learned an awesome policy for picking something up, or stirring something, or pouring liquid, and now I want to learn an abstracted model of it so that I can do planning. That's what we're up to here. And what we've learned recently in the world of planning for mixed continuous and discrete systems is that we can get really great leverage on these problems if the models we have are articulated in a certain way. A really important aspect is that they be factored: that we talk about the state of the world not as an un-analysed thing like "state 94", but that we describe it in terms of state variables.
And the state variables have values that we can change. It's also true that constraints are a very useful language for describing the effects of actions in the world, especially geometric kinds of effects. So we're going to look for models that are articulable in this kind of style.

Okay. So how can a competent robot acquire a new ability? Assume my robot already knows how to pick things up and how to move around, but it has learned to do a new thing, like pouring or stirring, and we want to add that to the robot's repertoire. That's what we're up to. So here's an example: pouring, in two dimensions. I can describe the situation using a bunch of continuous-valued variables: things like the size of the aperture of the cup that I'm pouring out of, the size of the thing that I'm pouring into, the relative pose of the centres of the two things, and the way in which the robot is grasping the cup. You could imagine some gain parameter in the controller for doing the pouring; you could imagine conditioning on the viscosity of the stuff in there. We're not doing that, but you could imagine it. So there are a bunch of parameters that govern the situation.

What I'm interested in understanding is: under what conditions will my pouring operator actually work pretty well, that is to say, get the stuff into the target? One way to think about it is that we could write a kind of symbolic-ish description of this operation of pouring, but one that has a bunch of continuous parameters: the grasp and the sizes of things and the relative pose and so on. What I want to learn is a constraint on the values of those continuous variables with the property that if it's true and I execute the action under those circumstances, then the goal will probably be satisfied, the effect will probably happen; and if it's not true, then probably not. So that's the thing I want to learn: when will this operation have the desired outcome? And because I would actually like to do this with a real robot, I would like it not to take too many samples, so I'm going to be serious about that too.
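To make the shape of that concrete, here is my own rendering (not the actual planner's syntax; all names are hypothetical) of a pouring operator with continuous parameters gated by a learned constraint:

```python
# Hypothetical operator description: the planner may use "pour" whenever the
# learned constraint says this setting of the continuous parameters will work.
POUR_PARAMS = ["cup_aperture", "target_aperture", "relative_pose", "grasp", "controller_gain"]

def make_pour_operator(learned_constraint):
    return {
        "name": "pour",
        "params": POUR_PARAMS,
        "precondition": lambda theta: learned_constraint(theta),  # learned from experiments
        "effect": "contents(target) gains most of contents(cup)",
    }
```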
One way we can think about learning that constraint is as a regression problem. Instead of saying the constraint is either satisfied or not, I might say it's satisfied to some degree: you could imagine defining a score for pouring. For pouring it's easy; we just measure the number of particles that end up in the target, and that's the score, and we say we would like the scoring function to be higher than some value. We'll just call that value zero for now; it doesn't matter, it's some constant. So then what I'm going to do is run some experiments, and I want to learn the mapping from the values of all those continuous variables to the score. If I can learn that, then, given the scoring function, I know for any assignment of values to those variables the amount of liquid that I expect to end up in the cup. We could formulate this with a bunch of different regression strategies; we're going to use Gaussian process regression. Probably some of you are experts on this and some of you don't know what it means, so I'll try to talk to everyone. It's a way of articulating our own uncertainty about this mapping, so that we can do experiments effectively. We do some set of initial pourings, and each time we try it, we have some assignment of values to those variables: we try it with different-sized cups and different relative positions and different gains of the controller and so on, and for each one of those we get a score.

Here's the way the Gaussian process works. Along the x axis here (I can only draw one dimension; it's really a lot of dimensions) are the parameters we vary, and we're interested in knowing: when is this function g bigger than zero? That's what we would really like to know. Whenever we do an experiment, one of these little blue x's, we get an observation of that function, and using some Bayesian reasoning we can compute the posterior: we have a distribution over the actual function, and every time we get an observation we can update that distribution. The dark red line is the posterior mean of that distribution over functions, and the pink area is a couple of standard deviations around it.

Now, lots of people use Gaussian processes to do lots of different things, and often they're interested in finding the optimum of the function. We're interested in something else: the level set. We want to know over what ranges of theta g is above zero, that is, in what ways I can do pouring. I don't want to learn pouring in just one way; I'll explain why in a minute. But for what arrangements of pouring is it going to work out, and for what arrangements will it not?
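Here is a small sketch of that level-set view and of the kind of acquisition rule described next (my own illustration using scikit-learn, not the actual code; the threshold and the 1.96 factor are illustrative choices):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_score_model(thetas, scores):
    """Fit a GP to experiments: theta (pouring parameters) -> observed score."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(np.asarray(thetas), np.asarray(scores))
    return gp

def confident_region(gp, candidates, threshold=0.0):
    """Candidate parameter settings where the posterior puts high probability
    on the score exceeding the threshold (the super-level set we believe in)."""
    mean, std = gp.predict(np.asarray(candidates), return_std=True)
    return [c for c, m, s in zip(candidates, mean, std) if m - 1.96 * s > threshold]

def next_experiment(gp, candidates, threshold=0.0):
    """Straddle-style acquisition: prefer points near the believed boundary
    (mean close to the threshold) that are still uncertain (large std)."""
    mean, std = gp.predict(np.asarray(candidates), return_std=True)
    acquisition = 1.96 * std - np.abs(mean - threshold)
    return candidates[int(np.argmax(acquisition))]
```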
So the black area that I've drawn in that figure is, right now, the region of theta space where I believe with high probability that pouring will work. I hope that makes sense: right now there's just this little region where I'm convinced it's going to work well, with high probability, greater than 0.95 or something. Now what I would like to do is some active experimentation to try to understand the boundaries of that level set. I would like to know what other configurations will give me good pouring and which ones won't. So we use an algorithm called the straddle algorithm, which is pretty interesting. I'm not going to go over it in detail, but it has this notion of an acquisition function: for different thetas, it tells us which values of theta will give us the most information, not about the maximum, but about the boundaries of the level set. I want to know the boundaries of successful pouring. This particular acquisition function likes to try experiments in places where the mean is near zero, because there we're probably near a boundary between good and bad, and where the standard deviation is high, and it combines these in a suitable way. What that means is that we can take a small number of samples and update our belief about this function.

What we find is that this kind of active learning is data-efficient. If we try experiments at random, it takes really a lot of experiments before we get a good idea of that super-level set. If we try this other approach... I have to tell you a tiny story, because the first paper we did about this was, again, just me and Tomás, and we did it using some kind of feedforward neural network, because we thought it would sound cool, and it kind of worked, but not very well. Mostly what it did was enrage our students, so they did a better job: the Gaussian process, this red curve, is awesome. So, with not too many trials... the iteration axis here is how many experiments we had to do, and it's not times ten to the third, it's just ten or twenty or thirty. So we learned something from experimentation.

There's another piece of the story which I'm actually going to skip, and instead I'm going to show you, first in simulation and then on the real robot, what we do. We take something that has abilities already: it already understands picking up objects and putting them down, and it learns pre-images, in this case for pouring and stirring. We asked it to make a cup of coffee.
To make a cup of coffee, there has to be cream in there, there has to be sugar, there has to be coffee, it has to be mixed, it has to be on the green thing, and it has to be served at the end of the table. That's the goal. We do not tell it what steps to do or in what order, so it's using a general-purpose planner to do that, and it uses these learned pre-images to get new descriptions of the operations of stirring and pouring and scooping, and it puts them together to make these plans. We have to watch it stirring, because it's fun. There we go. I like this nice little simulator.

Even more fun is this: here's our robot doing basically the same thing. In this case we just learned the pouring and the pushing; it could already pick things up. What's interesting about this is that the goal varies. Yes, I know, you can come and fix my motion planner if you want to, or you can just giggle. But we give it objectives, we move the objects around, we ask it to do different things at different times, and it kind of does it. It's not a thing of beauty, but it's actually reasonably reliable. I'll show you some outtakes at the very end. This time we told it that we wanted the thing on top of the block and the stuff in it. And I would argue that as we make these kinds of scenarios more complicated, and the goals more complicated... look there, it pushed the bowl so that it was in the usable workspace of the other hand, which was mildly clever, and it did it without pushing it off the table. As we make these scenarios more and more complicated, it seems, at least to me with my limited imagination, harder and harder to just straight-up learn a policy to do this, and it seems to me that some kind of planning is actually important to the process.

Okay, last one. Good. Let's see, ten minutes; okay, this is good. So what did I talk about there? Learning, assuming we already had the framework for a description of the effects of an action. Now I want to talk about how we can actually learn the framework itself. In that case, I said which aspects of the domain were important to making that prediction: I said the sizes and shapes of the cups mattered, I said the gain in the controller mattered, I said all that, and you just had to learn the constraint on a fixed-dimensional problem. But that's not a reasonable setup, really, if I'm trying to put myself at least a little bit out of the job of being the person who writes all this stuff down.
467 00:43:33,700 --> 00:43:39,040 So another thing that seems to be important is deciding which objects and which properties of those objects 468 00:43:39,430 --> 00:43:46,960 both affect the success of doing an operation and which might be actually changed by doing that operation. 469 00:43:48,390 --> 00:43:52,260 So I had some old work that did this in a kind of a logical framework. 470 00:43:52,260 --> 00:43:57,780 And so what we tried to do recently was recast that in a more hip, new neural network way. 471 00:43:58,110 --> 00:44:01,670 I don't know if it's better. I actually think it probably is. But yeah. 472 00:44:02,490 --> 00:44:08,510 Okay. So the idea. Let me just skip forward here. 473 00:44:08,540 --> 00:44:14,310 Okay. The idea here is: we have a representation of the state of the world. 474 00:44:15,390 --> 00:44:22,110 And, uh, but it's going to have a different size in different instances of the problem. 475 00:44:22,260 --> 00:44:31,410 Right. So most setups for neural network learning and function approximation and so on assume some kind of fixed-dimensional representation. 476 00:44:32,040 --> 00:44:38,279 And when they don't, then they feed things in sequentially. Recently there's been work on something called graph neural networks, 477 00:44:38,280 --> 00:44:43,850 which makes me laugh because it's kind of like Markov random fields, which is an old idea. 478 00:44:43,860 --> 00:44:46,319 So this is a new name for an old idea, but it's a good idea, 479 00:44:46,320 --> 00:44:54,629 which is that you can learn something about local relationships among objects or properties or values in your model and propagate those values. 480 00:44:54,630 --> 00:45:01,740 And you can learn those local models in a way that makes them independent of the arity of the problem that you'll face today. 481 00:45:01,770 --> 00:45:05,670 Right. So today I have to clear two things off the table. Tomorrow I have to clear 20. 482 00:45:06,240 --> 00:45:14,310 But I hope that the models that I learn about how to do that will transfer automatically, will work independent of the size of my problem. 483 00:45:15,300 --> 00:45:22,530 So in any given problem instance, I might have a representation, right at the moment, of my current belief about the world. 484 00:45:22,920 --> 00:45:26,030 In this case, imagine that it's not even uncertain, although it could be. 485 00:45:26,040 --> 00:45:29,640 So right now I know about some objects, and for each object I know about some properties. 486 00:45:30,180 --> 00:45:33,270 And what I'm interested in doing is learning a transition model. 487 00:45:33,270 --> 00:45:36,030 That is to say, what will happen if I do this action, right? 488 00:45:37,390 --> 00:45:44,260 So one way to think about it is that it will depend on some properties of some objects in the 489 00:45:44,260 --> 00:45:49,510 current state and it will affect some properties of some objects in the resulting state. 490 00:45:50,680 --> 00:45:59,860 And what I'm going to focus on now is just telling you a little bit of a story about how we can find a model that has that sparseness property. 491 00:46:01,580 --> 00:46:09,290 Okay. So there's this, again, kind of old idea from AI, which also comes from natural language: the notion of deictic reference. 492 00:46:09,830 --> 00:46:18,560 So "deictic", I guess, in Greek means pointing to — so this remote, or that chair, or the water bottle on the lectern. 493 00:46:18,830 --> 00:46:22,700 Those are all deictic references. Okay.
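A small illustrative sketch of resolving a deictic reference such as "the object(s) above o1" from a simple scene description; the `Obj` record and the geometric test for "above" are assumptions made for the example, not the representation used in this work.

```python
# Sketch: resolving the deictic reference "above(anchor)" against a scene.
# The object record and the geometry test are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float
    y: float
    z: float
    width: float   # crude footprint

def above(anchor, scene, xy_tol=0.05):
    # "Above" here: roughly over the anchor's footprint and higher up.
    return {o.name for o in scene
            if o.name != anchor.name
            and abs(o.x - anchor.x) < anchor.width / 2 + xy_tol
            and abs(o.y - anchor.y) < anchor.width / 2 + xy_tol
            and o.z > anchor.z}

# Example scene: B sits on A, C is off to the side.
scene = [Obj("A", 0.0, 0.0, 0.0, 0.1),
         Obj("B", 0.0, 0.0, 0.1, 0.1),
         Obj("C", 0.5, 0.0, 0.0, 0.1)]
print(above(scene[0], scene))   # {'B'} — "the object above A"
```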
494 00:46:23,030 --> 00:46:33,110 So what we want to do is decide which objects and which properties of objects are relevant to making the predictions that we want to make. 495 00:46:34,150 --> 00:46:39,790 And so what we're going to do is we're going to start out and say, I know there's at least one object in the world that's important. 496 00:46:39,790 --> 00:46:41,110 Let's think about pushing an object. 497 00:46:41,260 --> 00:46:46,660 So if I want to learn the model of what happens when I try to push an object, then I know which object I'm pushing. 498 00:46:47,760 --> 00:46:54,960 Okay. That's good. But there may be other objects that matter or that are affected by doing this action. 499 00:46:55,500 --> 00:46:59,729 And I'm going to refer to those objects using deictic references. In this work, 500 00:46:59,730 --> 00:47:04,740 we have some fixed set of deictic references, which we should eventually grow and learn, but for right now they're fixed. 501 00:47:05,190 --> 00:47:09,839 I can talk about an object above this object, or below it, or the nearest object, and so on. 502 00:47:09,840 --> 00:47:16,890 So there are some relations which, you can think, when applied to this object will pick out some other set of objects. 503 00:47:17,840 --> 00:47:21,290 The set might be empty. The set might have one object, the set might have many objects. 504 00:47:21,290 --> 00:47:29,540 But whatever — they're a way of talking about other objects in relation to one object I already know, and then you can apply that kind of recursively. 505 00:47:29,540 --> 00:47:33,370 So imagine that I wanted to talk about pushing an object. 506 00:47:33,380 --> 00:47:39,080 We'll call that object object one, and I can say "let object two" — the way to read that stuff right 507 00:47:39,080 --> 00:47:45,020 there is: let object two be the object, or possibly the set of objects, that's above object one. 508 00:47:45,980 --> 00:47:49,610 And let object three be the set of objects that's above object two, and so on. 509 00:47:50,090 --> 00:47:53,210 So if I had a scene like the one on the right, 510 00:47:54,440 --> 00:48:01,430 I could abstract it as a graph of relations. And then I could say, well, if A is object one, 511 00:48:02,540 --> 00:48:11,810 then these other objects — these particular objects in my particular world — play the roles of object two and object three and object four. 512 00:48:13,440 --> 00:48:22,349 All right. So I'm going to use this mechanism to have a flexible representation of an object I'm operating on and the 513 00:48:22,350 --> 00:48:27,450 other objects that are relevant to it, in a way that applies no matter how many objects are in my scene. 514 00:48:29,670 --> 00:48:39,210 Okay. So if we have a set of these deictic references which reach out and name some other objects relative to the object I'm operating on, 515 00:48:40,140 --> 00:48:43,440 and maybe we figure out which properties of those objects are important, 516 00:48:43,980 --> 00:48:48,520 then we have a kind of a straight-up neural network learning problem, right? 517 00:48:48,540 --> 00:48:54,240 Which is: now we have a fixed-dimensional input. We have this object and the other objects that are relevant and some properties. 518 00:48:55,050 --> 00:48:58,290 And we map into some properties of that object and some other objects. 519 00:48:59,530 --> 00:49:05,560 So that's a plain old numeric regression problem. We know how to train a neural network if we know what data to give it. 520 00:49:07,710 --> 00:49:14,550 Okay.
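A minimal sketch of that regression setup: once the deictic references have picked out a fixed set of role-players, their chosen properties plus the action parameters form a fixed-length vector, and the transition model becomes ordinary multi-output regression. The feature layout, the stand-in data, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the network from the talk.

```python
# Sketch: fixed-dimensional regression over deictically-named role-players.
# Data, dimensions, and model choice are stand-ins for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

def featurize(state, roles, action_params):
    # `roles` maps role names (object 1, the object above it, ...) resolved by
    # deictic references to concrete objects; `state[obj]` is that object's
    # chosen property vector (pose, size, ...). The result has a fixed length
    # no matter how many objects are in the scene.
    return np.concatenate([state[obj] for obj in roles.values()] + [action_params])

# Stand-in training data: X holds featurized (state, push) pairs,
# Y holds the next-step properties of the same role-players.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
Y = rng.normal(size=(200, 8))

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X, Y)
print(model.predict(X[:1]).shape)   # (1, 8): predicted properties after the push
```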
And so then you can apply it, right? What's important about this thing is that it's supposed to apply anywhere, right? 521 00:49:15,390 --> 00:49:20,280 It says if these guys have these properties on the input, these are the properties that the other guys will have in the output. 522 00:49:21,260 --> 00:49:28,340 Okay. And so then we have an algorithm for learning this, and I am not going to go into detail, but it takes a kind of a normal training set. 523 00:49:28,640 --> 00:49:33,950 A state is an arrangement of all the objects in the world. I take an action, then I get a resulting state, 524 00:49:34,430 --> 00:49:42,770 and I have an outer loop that kind of greedily operates on the structures of the rules, and an 525 00:49:42,800 --> 00:49:47,810 inner loop that does some EM stuff, and an inner-inner loop that does neural network training. 526 00:49:50,060 --> 00:49:58,640 Okay. And so what happens is that we can do things like learn the results of pushing objects on a crowded table. 527 00:49:59,510 --> 00:50:03,780 Right now, we're not testing this inside a planner. So this is really preliminary work. 528 00:50:03,840 --> 00:50:07,940 We're just checking to see if the model that we get predicts the data that we trained on well — 529 00:50:07,970 --> 00:50:11,330 I mean, predicts held-out data well — but it's just based on likelihood. 530 00:50:12,920 --> 00:50:16,150 And we compared it to two other strategies. 531 00:50:16,160 --> 00:50:18,170 Right. So there's our learned rule-based model. 532 00:50:18,590 --> 00:50:26,630 We compared it to just a straight-up neural network in which we encode the positions of all the objects on the table. 533 00:50:27,080 --> 00:50:32,780 The problem is that you don't know what order to put them in. And so we picked what we thought was the most helpful order. 534 00:50:32,780 --> 00:50:39,530 But it's a very hard thing to do. And we compare it against a graph neural network, which is, again, a kind of a modern, 535 00:50:39,530 --> 00:50:47,530 structured neural network that abstracts away from the individuals in a nice way, but doesn't have exactly the right bias for this kind of problem. 536 00:50:49,120 --> 00:50:52,419 And what we found — so purple is our thing, blue is the graph 537 00:50:52,420 --> 00:50:58,780 neural network, and red is a plain old flat neural network — in this case with just three objects in the scene. 538 00:50:58,780 --> 00:51:01,720 So we're not testing generalisation over multiple objects, 539 00:51:02,710 --> 00:51:09,220 but we find again that the sparse rule learning can learn efficiently, very quickly, how to make good predictions in this domain. 540 00:51:10,900 --> 00:51:15,879 More important is the fact that it's relatively unaffected by clutter, right? 541 00:51:15,880 --> 00:51:26,050 So as you add more objects into the world, the flat neural network suffers because they're all in some arbitrary order and it doesn't know what matters. 542 00:51:26,740 --> 00:51:34,930 The graph neural network does reasonably well, and the rule-learning thing still works more reliably. You might ask — 543 00:51:36,120 --> 00:51:43,260 I certainly asked when I first saw these results — you might ask, why does it get better as there get to be more objects in the world? 544 00:51:43,290 --> 00:51:49,769 That seems counterintuitive.
The answer is that if there's a bunch of stuff on the table and I'm pushing one object, that might push another object, 545 00:51:49,770 --> 00:51:52,470 but almost everything stays the same. 546 00:51:53,890 --> 00:51:59,230 So predicting that everything stays the same is not so bad, and you just have to learn the things that don't stay the same. 547 00:51:59,530 --> 00:52:03,729 So if you average over the objects in the scene and you measure how well you predict what happens, 548 00:52:03,730 --> 00:52:05,860 how well you're predicting what happens to each of them, 549 00:52:06,220 --> 00:52:10,570 then the more objects that don't move, actually, the easier it becomes if you have the right bias. 550 00:52:11,430 --> 00:52:15,900 But it becomes harder for the flat neural network. Okay. 551 00:52:16,050 --> 00:52:19,620 So this is just like a tiny, tiny tip of an iceberg, but it's very exciting. 552 00:52:19,620 --> 00:52:26,279 I feel like we have the tools and the pieces and the parts to figure out how to make generally intelligent robots. 553 00:52:26,280 --> 00:52:32,159 I kind of do. We have a bunch of work to do: we have to work on connecting the vision algorithms that exist now, 554 00:52:32,160 --> 00:52:37,170 which are awesome but not quite what we need, into state estimation in a useful way. 555 00:52:38,430 --> 00:52:44,520 Right now we have learned policies down at the low level and planning at the high level, but that should be more fluid. 556 00:52:44,850 --> 00:52:50,819 We should be able to cache the results of planning in a way that lets us routinise the things that we do very frequently. 557 00:52:50,820 --> 00:52:56,729 We're not doing that. I think end-to-end learning is a blessing and a curse. 558 00:52:56,730 --> 00:52:59,730 My colleague Tomás likes to call it dead-end learning. 559 00:52:59,730 --> 00:53:04,830 I'm not sure. So what's interesting about end-to-end learning, right? 560 00:53:04,830 --> 00:53:11,790 So that's when you say, I have this giant system and I'm not going to try to give it intermediate signals of success, 561 00:53:11,790 --> 00:53:17,430 but rather just measure the quality of the whole thing based on the final actions it takes. 562 00:53:17,940 --> 00:53:21,120 That's the right thing. You can't argue with it intellectually, right? 563 00:53:21,120 --> 00:53:26,250 It is exactly the right thing. You don't want to say my state estimator needs to be awesome 564 00:53:26,460 --> 00:53:29,940 according to some criterion that's only about state estimation. 565 00:53:29,940 --> 00:53:37,260 Really, all I care about is that the state estimator does a job that helps the planner do a job that causes the controller to emit the right torques. 566 00:53:38,450 --> 00:53:45,469 That's all I care about. But the idea that you could backpropagate from errors on the torques all the way 567 00:53:45,470 --> 00:53:50,150 through the planner and the state estimator, to me, seems like not so clear. 568 00:53:50,630 --> 00:53:55,040 So I think we have to figure out ways of combining local reward signals and end-to-end reward signals. 569 00:53:56,150 --> 00:54:01,430 We're talking about interacting with humans, all kinds of stuff. So I brought back one more ancient slide. 570 00:54:01,430 --> 00:54:05,000 This is from 1995, but it's still kind of like my view of what's going on. 571 00:54:06,240 --> 00:54:13,250 And then we have to think about learning at a lot of different kinds of levels of abstraction, and how to make them actually not divided into layers.
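On the point a moment ago about combining local and end-to-end signals: one very simple possibility, sketched here purely for illustration and not as the approach advocated in the talk, is to add a weighted local loss on the state estimator to the end-to-end task loss so that gradients flow through the whole pipeline while each module still gets a signal about its own job. The PyTorch modules, dimensions, and loss weights are invented stand-ins.

```python
# Sketch: combining an end-to-end loss (on output torques) with a local,
# module-level loss (on the state estimate). All components are stand-ins.
import torch
import torch.nn as nn

estimator = nn.Linear(32, 8)     # stand-in state estimator: observation -> state
policy = nn.Linear(8, 7)         # stand-in policy: state -> joint torques

def combined_loss(obs, true_state, expert_torques, lam=0.1):
    est_state = estimator(obs)
    torques = policy(est_state)
    e2e_loss = nn.functional.mse_loss(torques, expert_torques)   # end-to-end signal
    local_loss = nn.functional.mse_loss(est_state, true_state)   # local signal
    return e2e_loss + lam * local_loss

obs = torch.randn(4, 32)
loss = combined_loss(obs, torch.randn(4, 8), torch.randn(4, 7))
loss.backward()   # gradients flow end to end, but the local term also
                  # keeps the state estimator honest about its own job
```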
572 00:54:13,670 --> 00:54:19,010 I think that right now, especially in robotics, a lot of people are working at what I would call the skill level, 573 00:54:19,370 --> 00:54:22,790 although, you know, I think in robotics, that's roughly right. 574 00:54:24,290 --> 00:54:30,050 What's interesting is that we can kind of do skills and then we can kind of do like fancy stuff up at the top, like play go. 575 00:54:30,500 --> 00:54:35,030 But we're terrible at just like basically making breakfast or even walking out of this lecture room. 576 00:54:35,390 --> 00:54:41,780 So that middle ground I think is interesting and important, and I want to recruit more people to work on it and think about it. 577 00:54:42,380 --> 00:54:47,630 So there's a bunch of people who helped with this, and I'm grateful to them for what they have done. 578 00:54:48,080 --> 00:54:51,830 And with that, I will say thank you and let you watch the robot make mistakes. 579 00:54:51,920 --> 00:54:52,430 So thanks.