Okay. Welcome to the Hilary Term 2019 Strachey Lecture, organised by the Oxford Computer Science Department.

First of all, I'd like to say a huge thank you to our sponsors, Oxford Asset Management, who make these lectures possible and make it possible for us to invite very distinguished speakers from across the world. This is the fourth year that they have supported these lectures, and we're very grateful for that support.

I'd also now like to welcome our speaker, Professor Leslie Pack Kaelbling, who is the Panasonic Professor of Computer Science and Engineering at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology. I hope I got that right. You'll be wanting to hear what she has to say, but just by way of a very brief introduction: Kaelbling has made research contributions across a huge range of areas in decision making under uncertainty, learning, and sensing, with applications to robotics. Her research has a particular focus on reinforcement learning and planning in partially observable domains, and she has received the IJCAI Computers and Thought Award. She has been elected a Fellow of AAAI, and she was the founder and editor-in-chief of the Journal of Machine Learning Research. So I'm delighted to be able to welcome her here today to talk about doing for our robots what nature did for us.

All right, thank you so much. Okay, let me not reverberate. Thank you for inviting me; I'm excited to have the chance to talk to you. I enjoy backtalk and feedback and such, so as I go along, if you want to ask a question or complain about something I've said, I'm very happy to take it, and of course we can also discuss at the end.

My research goal for my whole life, really, has been to understand the computational mechanisms that we need in order to make a really general-purpose intelligent robot. I'm not trying to solve any particular robotics problem; I just want to understand the nature of intelligence and how we can put it inside a physical system that interacts with the world.

The way I think about a robot is that, fundamentally, it's a transducer. I should also say that I worry about the software part of the robot, so let's assume there is some hardware spec. Then what I want to think about is what program I need to put in the head of my robot. And fundamentally it's a program that has to take a history of actions and observations, something in (O x A)*, that is, a history of what it has observed and what it has done itself, and decide on the next action.
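As a minimal sketch of that transducer view (my own illustrative code, not from the talk; the type names are hypothetical), the program pi is just a map from the interaction history to the next action:

```python
from typing import Protocol, Sequence, Tuple

Observation = dict  # whatever the robot's sensors return
Action = dict       # whatever command the robot's actuators accept

class Policy(Protocol):
    """pi: maps the history of (observation, action) pairs to the next action."""
    def __call__(self, history: Sequence[Tuple[Observation, Action]],
                 latest_observation: Observation) -> Action:
        ...
```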
And really, that's the entire job of a robot engineer: to figure out what program pi to put in the head of the robot. So that's what I want to talk about today, and I want to get you to help me think about it. How should we frame that problem? How should we think about what pi should go in the head of the robot?

First, I want you to think about the robot factory. So I'm going to be a robot factory; I'm a robot engineer; I'm going to pick a program pi to put in the head of the robot. The question is, what program should I pick? And what I'm going to argue is that I should pick the program that works as well as possible in expectation over the possible situations the robot might find itself in.

If you say to me, robot factory, I need a robot to weld a particular model of a particular car in my factory, then I can make a very specific program to do that, because I know exactly what it's going to need to do. If, on the other hand, you say, robot factory, make me a robot that can come to my house and do whatever I ask it to do, well, that's a different problem. It's a harder problem, but it's a problem of the same kind: in expectation over the environments that this robot is supposed to work in, the program should perform well.

Right now there's a lot of argument about the role of learning and the role of reasoning, about whether programs should learn or not and how much, and I want to say that there's really no point in having that argument. We just want that program. If we can find the program that works best in expectation over the environments, then, if the environments are very different, it's going to have to learn something from the particular environment it finds itself in; if the environments are very similar, it won't.

Okay. So I take my job to be the job of designing the robot factory. How am I going to think about finding good programs to put in the robot, so that it performs well when it goes out into the distribution of worlds it needs to go into? There are a bunch of different ways to think about it. One way, which at the moment seems kind of prevalent, is that the robot should just learn everything: I should put approximately nothing in the head of the robot, and the robot should go out and learn everything from its experience in the world. Well, that's actually not even remotely sensible. Would you buy a robot that didn't know anything, let it into your house, and have it break a bunch of stuff while it tries to figure out how to do things?
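In symbols (my notation, not from the talk), the factory's criterion sketched above is roughly

\[
\pi^{*} \;=\; \arg\max_{\pi}\; \mathbb{E}_{E \sim \mathcal{D}}\big[\, U(\pi, E) \,\big],
\]

where \(\mathcal{D}\) is the distribution of environments the robot will be deployed into and \(U(\pi, E)\) measures how well program \(\pi\) performs when run in environment \(E\).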
Another strategy is to say: well, you should just hire really smart engineers, sit them down at a desk, and have them type in that program, and that should do very well. That has historically maybe been the approach, but it's actually really hard for the engineers. Most engineers don't have good access to what program they should write. In computer vision we've learned that lesson: people used to try to write computer programs that would recognise faces. Lots of very smart people tried really hard to do that, and it completely failed. What succeeded was that they could write programs that could learn to recognise faces, and I'll come back to that.

Another strategy could be that we figure out what humans do and then just do that. I think that's an interesting and important enterprise. My own personal bet is that it might not be the quickest way to get to where I want to get to, so people should do that, but that's maybe not what I'm doing. Or maybe we could recapitulate evolution: maybe we could get robots to somehow evolve or learn, not in the particular niche that they're born into, but over some longer timescale, and maybe that would get us somewhere. So I don't know; none of these things is completely appealing, but I just want to explore them a little bit more.

So let's think about learning or evolution in the factory. I said that when humans tried to write programs to recognise faces, that didn't work out very well, but humans actually were really good at writing programs that learn to recognise faces. So maybe we can do something like that: we want to come up with some strategy for behaving in the world that works well in lots of different worlds. To do that in the factory, we have to replicate the variability of the domains that the robot is going to go into; we have to replicate that variability in the factory so that we can test our programs and know whether they're going to do well when we put them out into the world. We could formulate that in terms of a search space, an objective function, and some kind of test distribution, so it is at least a well-formed approach to the problem.

But there's a debate raging right now; the machine learning community has had big arguments about this recently, for instance at the NeurIPS conference. People want to say you should do no harm: that if you build anything into your system, into your machine learning system or your robot system, you risk being wrong.
And if you build in a wrong thing, you've doomed your robot, or whatever it is, to being suboptimal; it can't overcome that. That's true, although I would say that we're going to have to take the risk. Those of you who have raised children probably did some mildly suboptimal things in the process, but mostly they probably came out okay. I feel that way about robots too. We have to make a decision; we're the engineers; we have to build the system. We might build in some things that are not exactly right, but we don't have time to wait and let things be completely generic. You could imagine running a completely generic algorithm that just enumerated programs in order of complexity until you found a good one, but that's crazy, right? And for reinforcement learning people: you could do something roughly equivalent there too.

Okay, so what are we supposed to do? One strategy, which is sort of appealing, would be to just set up some evolutionary process. It's going to take a really long time; we go to the beach, and eventually the problem gets sorted out for us. That could be good, but I'm worried that we might not be alive by the time that thing finishes, or even gets anywhere interesting. So I'm going to talk about an actually much more boring strategy, which is to combine aspects of all of these things in a way that might help us get from where we are to really interesting and flexible robots in a short amount of time.

The story that I want to tell here is one where we do meta-learning in the factory. Meta-learning is a fancy name; it means learning to learn. It means that in the factory we run something like a machine learning algorithm, a search algorithm, that arrives at an algorithm with the property that when we put it out in the world, it can learn effectively. I want my robot to be able to come into your house and make tea. That's what I want. It's going to have to learn: every house is different, and it's going to have to learn how your house is organised, how you like your tea, all of these things. In the factory, I would like to be able to meta-learn how to do that.
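As a rough sketch of what "meta-learning in the factory" could look like computationally (hypothetical API and function names, not the speaker's system): an outer loop searches over learning algorithms, scoring each one by how well it learns, on average, in environments drawn from the deployment distribution.

```python
def meta_train_in_factory(sample_env, sample_config, make_learner,
                          n_outer=1000, n_tasks=8):
    """Illustrative learning-to-learn outer loop: return the learner
    configuration whose inner-loop learning works best in expectation
    over sampled environments."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_outer):
        config = sample_config()                  # propose a learning algorithm
        scores = []
        for _ in range(n_tasks):
            env = sample_env()                    # e.g. a simulated household
            learner = make_learner(config)
            learner.adapt(env)                    # inner loop: learn in that environment
            scores.append(learner.evaluate(env))  # post-adaptation performance
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config
```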
There are all kinds of interesting constraints. I've been speaking a bit with Liz Spelke, who knows a lot about human development, and she says that human babies, and probably mammals in general, are basically born with a lot of fundamental things about the world: that there are other agents, that there are clumps of matter that cohere, that there is space that we move through. And so I feel kind of licensed to build those things into my robot. There are invariants in the worlds that we care about that we could build in, too; maybe I know the kinematics of my robot. And I think there are also some interesting constraints that we have to respect that don't come from the problem, but from the fact that humans are actually going to engineer these systems. If I'm going to build a system, even a meta-learning system, it has to have some degree of modularity, just because I personally can't understand the whole thing; I have to understand the pieces and parts and put them together in some systematic way. So these are some constraints that we can bring to bear on the problem.

Okay. I'll tell you a little bit of history, and then I'll talk about actual technical stuff. My first job when I graduated, as an undergraduate in philosophy of all things, was to work at a research lab, and they were just building a robot. Nobody there really knew anything about robots, and my job was to get the robot to drive down the hallway using its sonar sensors. The sonar sensors were terrible; they didn't give very reliable returns, and I didn't know anything. So what I did was write a program. The robot would run into the wall, and I would fix the program. The robot would run into the wall in a somewhat different way; I would fix the program. The robot ran into the wall for weeks: I fixed the program, the robot ran into the wall, I fixed the program, the robot ran into the wall. Eventually I learned enough about how the sensors interacted with the environment and the control system and so on that I could write a program that drove the robot down the hallway without running into anything. And the lesson I took away from that was that I didn't want to be in that loop anymore. The robot could darn well learn how to interact with the world, and I didn't want to be in the middle of that; I would be outside, designing the learning algorithm that lets the robot learn from its interaction with the world.

So then I reinvented reinforcement learning, and I did it kind of badly and wrongly, and eventually got introduced to people who knew better, so that was good. And I made a little robot that did reinforcement learning, and it actually learned something during my thesis defence, which was kind of cool. But that was by the mid-nineties, which was when neural networks were cool before; you might note that right now is kind of the third time that neural networks have been cool.
The second time they were cool was in the nineties. And in the nineties it was the same story as now: everyone said, oh, this is awesome, we're just going to put neural networks in there and they will figure everything out. But for sample complexity reasons, for a bunch of technical reasons, I don't actually think that's possible.

Okay, so what to do instead? I've been working over about the last ten years with my colleague Tomás Lozano-Pérez, who knows a lot about robot kinematics and robot planning and that kind of stuff; I, in the intervening time, had learned something about planning under uncertainty and model learning. We've been working together, and we're taking the following approach. Our view is that there are some basic inference algorithms and representations that are justifiable based on regularities in our environment and on some fundamental computational facts. Convolution is a great example. The same people who like to say that it's a terrible idea to build things into their networks do convolutions. But if they're doing convolutions, they're building something into their network: an understanding of translation invariance and of local spatial regularities and things like that. So they're building in a lot of knowledge. And I think there is a handful more mechanisms like that. I'm not going to argue that this is the right particular set of mechanisms, but I would argue that there is some set of mechanisms, and I'm hoping it's more like six or ten and not 700, because otherwise it's going to be hard.

So we're building in some basic principles and mechanisms. And what we were doing until very recently was actually hand-building a system. And so, we're learning people, hand-building a system? Well, the point was to get an understanding of the whole arc of the system, of how you could go from perception through estimation and reasoning and action and so on, and to build a system that was pretty competent. So what I'm going to do is tell you about that, and then I'll tell you how we're adding learning.

This is a photo I like to show in all my talks lately. It is not my kitchen, I promise. But imagine what it would take to make breakfast there, or to clean it up, and think about what makes that so hard. It seems like a hard problem for a robot, and what makes it hard? There are a bunch of reasons. One of them is that it's, in a sense, a very high-dimensional space.
Robot people like to talk about how many degrees of freedom their robot has: six or ten or twenty or something. But how many degrees of freedom does the kitchen have? You can't count them. It's not just the positions and orientations of all the objects; it's whether the grapes are mushy and when the people are coming home. It's very hard to even think about the state space of that kitchen.

The horizon is really long: if you imagine how many little linearly interpolated motions the robot might have to make to clean that kitchen, that's really a lot. And the uncertainty is fundamental. If you talk at a robotics conference, people will often say, don't worry about uncertainty, we'll just make the sensors better and then you won't have to worry about it. For some kinds of uncertainty it's true that making the sensors better will cure the problem. But making the sensors better won't let me know what's inside the cupboard, or inside your head. So there's uncertainty that's very fundamental; I can maybe get information about these things, but it requires careful, explicit action to do so.

Okay. So we have a kind of architecture with boxes and arrows, and it's not very surprising or different from other people's architectures with boxes and arrows, but I'll tell you what I think are some of the salient points of how we address this problem, and show you a demonstration. We have this thing called belief-space hierarchical planning in the now. Fundamental to it is the idea of reasoning in belief space. Everyone who's ever had an automata theory course, way back in the day or whenever, knows that you can basically take any machine apart into two pieces: one that remembers something about the history, and another that decides what to do based on what has been remembered. We'll call those state estimation and action selection. For us, the arc that goes between the boxes is a belief; you can think about it as a probability distribution over the possible states of the external world. The first box is trying to estimate what's going on, and the second box has the job of taking that belief, that distribution over the state of the world, and deciding what action to do.

So that's the space we live in. The first question is: what goes along that wire? If you've read papers about Kalman filters or POMDPs or something, there's been talk about state representations and so on.
But if you think about a robot that has to clean the kitchen, its state representation can't be some lovely little vector; it's a very complicated story. First of all, we don't know in advance how many objects there are or what they are; it's not as if we can say, oh, I have a ten-dimensional state space. We have an open world. And again, I don't want to argue that this is the one true way; I just want to tell you a thing that kind of works. We keep a kind of database of the objects that we believe exist in the world, and some distribution over their properties, like their relative positions and their masses and so on. We keep a representation of what space we believe is free and what space we believe is occupied, because we have to reason about whether it's safe to go somewhere that we haven't looked at yet. We keep distributions over what kinds of objects tend to occur near other ones, so that we might search more efficiently in certain kinds of places. So this is our belief: something complicated.

We also have to worry about an integration that very few people worry about. Lots of people worry about robot motion planning: how do I move the robot from one pose to another? That's a non-trivial problem, and there are a million algorithms. And AI people worry about high-level symbolic actions: what should I do, and in what order, at some very abstract level. What's interesting is that these things can't be isolated from one another; the geometry can completely affect what high-level actions you should do and in what order. The geometry might tell me whether I can drive my car down a certain alley, or whether I can fit in a certain place, or whether I can put these two pans on the stove at the same time. So we spend a lot of time worrying about reasoning about the interaction between discrete things and continuous ones.

Okay. So: probability, geometry, discrete stuff. The optimal planning problem in our domain is unthinkably difficult. So we fall back on ideas from control theory. In control theory, an important idea is feedback, and the important thing about feedback is that you can do a slightly wrong action, and if you just very quickly look to see what happened, you can decide to do something else instead that might make up for the slightly wrong thing you did before. So you don't have to pick optimal controls; you just have to pick not-terrible controls. I'm all about being not terrible. Optimal is not in it; there's no way. So I've made peace with not being optimal.
So here's our strategy. We make a really weak, really very approximate model of the dynamics of our world, and we do planning. Here's my belief, a distribution over the state of the world; I make some kind of plan; I take the first step of that plan, execute it in the world, and get an observation to see what happened. Then I update my belief and plan again. An important thing to think about when you see this system is the perspective you take when you're making decisions. The perspective that we take (I don't know if you can see the grey part of the diagram, maybe a little bit) is that from the planner's point of view, it's interacting with an environment, but its environment includes the belief update. So it's a control system that operates in belief space. We give it its objectives in belief space: I tell the robot, I would like you to believe with high probability that the kitchen is clean, or that the green box is on the left-hand side of the table, or something. I can't give the robot goals in state space, because the robot doesn't have access to state space. It can't promise me that it can change the world in a certain way, but I can ask it to come to believe something. And it's not allowed to just delude itself and say, oh yes, I believe it, no problem; it has to actually do the work and run the Bayes update and so on, so that it really does believe this thing. So I ask it to come to believe something, and when it chooses actions it has to think not just about the effects of the actions on the state of the world, but about the effects of the actions on its own belief. And that's why it looks: you look not to change the world, but to change your belief. What's nice is that you can treat all your actions, actions that gather information, actions that change the state of the world, and actions that do both things at once (which really most do), all in the same framework. That's a lesson from POMDPs, but it applies here too. So we think about planning in belief space; we think about planning to take actions that will control our own state of information about the world.
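A minimal sketch of that belief-space replanning loop (hypothetical API, not the actual system):

```python
def run_belief_space_controller(belief, goal_satisfied, plan, execute, observe):
    """Illustrative loop: the goal is a condition on the belief, the planner
    works with an approximate model, and only the first step of each plan is
    ever executed before re-estimating and replanning."""
    while not goal_satisfied(belief):
        steps = plan(belief)                         # approximate plan in belief space
        action = steps[0]                            # commit only to the first step
        execute(action)                              # act in the real world
        observation = observe()                      # see what actually happened
        belief = belief.update(action, observation)  # Bayes update / state estimation
    return belief
```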
Okay, one more high-level idea; I promise I'll get even more technical in a minute, but I just want to give you the story here. Planning is difficult and inefficient, usually exponential in the horizon, so we can't stand to have a very long horizon; and if we think about how many actions it takes to clean that kitchen, the horizon is horrible. So we do something hierarchical. People have talked about hierarchical planning a lot. Usually when they talk about hierarchical planning, they use the idea of hierarchy to make the construction of a completely worked-out plan more efficient. We're going to do something a lot more aggressive than that. We start with some high-level goal and make a plan at some level of abstraction. For instance, I made a plan to come to Oxford that involved going to the Boston airport and flying and walking through Heathrow and doing some things like that, but at a pretty high level of abstraction. Partly I did that because I'm computationally lazy and figured I could work out the rest later. But partly I did it because I didn't have enough information: I couldn't have planned my trajectory through Heathrow, because I didn't know what the map was like, or what gate we would come into, or any of that.

So we make a plan at a high level of abstraction and then commit to the first step. When we make a plan, you can think of the yellow boxes as abstract actions, and you can think of the blue boxes as what we call pre-images: sets of states that we have to go through, a kind of subgoal. We take the first subgoal and make a plan for that, say, get to the Boston airport; then we take the first subgoal of that and make a plan for it, like, I don't know, get an Uber. And finally I get down to some primitive action and I execute it. So I'm being optimistic that I can work out the rest of the stuff later. Right now we're hand-building these models; eventually I'm going to have to learn, for instance, to predict whether it's reasonable to walk through Heathrow in 20 minutes. I don't know; you can tell me whether that's reasonable or not.

Okay, so we have this hierarchical planning scheme. It also helps with replanning: remember I said that whenever we took an action, we replanned. If we have this structure, we can decide whether we need to replan very efficiently. We can ask the question: I just took this action; did it lead to the blue box I was expecting? If it did, I can do the next one. If it didn't, I can pop that low-level thing off the stack. Let's say I was planning to take an Uber to the Boston airport, but I discover that there aren't any. Okay, so I pop my bottom plan. I don't give up the idea of going to the airport, or the idea of coming to Oxford, or my academic career, you know. You may know people who reason too much at the high level; it's not healthy. A little bit is okay, but too much is not. You'd like to be able to control your reasoning, and this structure lets you do that. So that's also kind of a nice thing.
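A sketch of that plan-and-refine recursion, as I understand it (hypothetical API, not the actual planner):

```python
def plan_in_the_now(goal, belief, plan_at, is_primitive, execute, observe):
    """Illustrative hierarchical planning in the now: plan abstractly, refine
    only the step you are about to take, execute primitives in the world, and
    pop back up a level whenever an expected subgoal is not achieved."""
    while not belief.satisfies(goal):
        steps = plan_at(goal, belief)            # [(operator, subgoal), ...]
        for operator, subgoal in steps:
            if is_primitive(operator):
                execute(operator)                # act in the real world
                belief.update(operator, observe())  # state estimator mutates the belief
            else:
                plan_in_the_now(subgoal, belief, plan_at,
                                is_primitive, execute, observe)
            if not belief.satisfies(subgoal):    # not the blue box we expected
                break                            # pop: replan at this level only
    return belief
```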
Okay. We put this together and we get the robot to do some stuff, so I'm just going to show you some movies and talk while they run. What's important about these movies is that the robot is doing different kinds of things, subject to different goals, and it's all the same code. In this case we asked it to put the green block on the corner. The green block is too big to pick up, so it has to push it; it reasoned that it had to take the orange one out of the way. It also knows that pushing is really unreliable, so every time it pushes, it looks to see where the block went and says, that wasn't so good, I'd better replan the pushing. Same code again: we asked the robot to go out of the room. It knows about space, and it knows about occlusion and obstacles, and it reasons that if there's something in its way, it can't go through it. First it says, if I want to move through some space, I have to look at it and believe that it's free. If I look and find something in the way, I have to move it out of the way. So it's reasoning very generally, doing all these same kinds of things. Earlier it was picking up the oil bottle to see if there was oil in it. Okay, and this is a crazy robot in Singapore; well, whatever, a little too much similarity on that one. All of it, the same code.

So that was good: kind of general-purpose reasoning, planning, estimation, and so on. But it was also kind of not so good, because there's no learning or adaptivity in there at all. There's no learning in the moment, and there's no learning in the factory; this is basically me and Tomás writing code. So it's not an extensible strategy for making general-purpose robots in a factory. The question now is: how can we think about making a system like this learn?

In particular, we want to think about what kinds of things can be learned and how to do it. And there's an interesting distinction here. I really think there are two importantly different kinds of learning, which, again, people used to talk about a lot back in the day, but I think they tend to muddle them now a little bit. A classic kind of learning is to learn about the world. I'm a robot; I don't know what happens if I push this button, so I'm going to try it in your kitchen and see what happens; or there's something else I'm not sure about.
So there are some things about how the world works that I don't understand, and I have to do things to gather information to figure out how the world works. I might learn observation models: what do my sensors tell me about the world? Or transition models: what happens if I do this? Most of the work right now in robot learning is focussed on the two lower boxes. One is object detection, which has been huge; various kinds of perception have been very important to us. The other is primitive policies: strategies for picking things up, or reorienting something in my hand, or riding a bicycle, or walking. Those are all very important, and you can think of them as closed sensorimotor loops. So this is one kind of learning, which is really gathering information about the world.

There's another kind of learning which is at least as important, I think, which is learning to reason more efficiently or effectively. I would argue that learning to play chess or Go or StarCraft is that, especially chess and Go; maybe not StarCraft. For chess and Go, once you've read the rules, information-theoretically you are capable of making an optimal first move. It's just that you're too dumb to compute it. If you just had a better computer, you wouldn't have to learn any new information; you have to put the information in a different form, but you've got the information. And there are lots of opportunities, within an estimation, planning, and reasoning architecture like the one I told you about, to do that kind of learning as well.

What I'm going to do now is talk concretely about a particular thread of work going on right now in my group, involving learning transition models, and a little bit about learning samplers for planning, because when you plan in continuous domains you have to do something to manage the continuum of possible actions you can take.

Okay. So, old story: there are lots of models of the world; they're all wrong; some are useful. Useful for what? We have to think about what kind of model we should learn, if we're going to learn a model of the actions the robot can take. I had a postdoc named George Konidaris who thought, I think, very nicely and usefully about the continuum of abstraction in learning models for robots. He likes to talk about the swamp.
You can imagine that there's some system of partial differential equations that really governs how the world works; that would be a very accurate model, but not a model that you could plan with easily. At the other end, AI people think about these beautiful, totally abstracted, wonderful symbolic models, and they're lovely and you can plan with them efficiently. And then the question is: what kind of connection can you make between the two? What we need to do is think about how to abstract chunks of the swamp into nice abstract symbols up at the high level. I think we need both levels. The view we're taking right now is that we have local control loops that operate in the swamp: picking things up, moving things in your hand, walking, that's all swamp-level stuff. A control loop gets you from this blob of the swamp to that blob, so you can do some things like that. But then what we want to do is learn models of how those low-level control loops behave. Those models don't have to be perfect; they just have to be good enough. We hope to be able to abstract over objects, too: I don't want to have to learn a model of the dynamics of this particular prop, but I do know that if I let go of it, it will drop; I know what will happen if I throw it; I know a bunch of things about it abstractly. And what I hope is that I can get some kind of virtuous category-diagram thing going on here, so that at the high level there are arrows, maybe nearly deterministic ones, that move me from sets of states at the high level to other sets of states, and that are really implemented by these swampy control loops.

Actually, let me skip that. What we're going to try to do, and what I'm going to talk about now in detail, is learn a model of the preconditions and effects of a low-level controller. Imagine someone has learned an awesome policy for picking something up, or stirring something, or pouring liquid, and now I want to learn an abstracted model of it so that I can do planning. That's what we're up to here. And what we've learned recently in the world of planning for mixed continuous and discrete systems is that we can get really great leverage on these problems if the models we have are articulated in a certain way. A really important aspect is that they be factored: that we talk about the state of the world not as an un-analysed thing like "state 94", but that we describe it in terms of state variables.
And the state variables have values that we can change. It's also true that constraints are a very useful language for describing the effects of actions in the world, especially geometric kinds of effects. So we're going to look for models that are articulable in this kind of style.

Okay. So how can a competent robot acquire a new ability? Assume my robot already knows how to pick things up and how to move around, but it has learned to do a new thing, like pouring or stirring, and we want to add that to the robot's repertoire. That's what we're up to. So here's an example: pouring, in two dimensions. I can describe the situation using a bunch of continuous-valued variables: things like the size of the aperture of the cup that I'm pouring out of, the size of the thing that I'm pouring into, the relative pose of the centres of the two things, and the way in which the robot is grasping the cup. You could imagine some gain parameter in the controller for doing the pouring; you could imagine conditioning on the viscosity of the stuff in there. We're not doing that, but you could imagine it. So there are a bunch of parameters that govern the situation.

What I'm interested in understanding is: under what conditions will my pouring operator actually work pretty well, that is to say, get the stuff into the target? One way to think about it is that we could write a kind of symbolic-ish description of this operation of pouring, but one that has a bunch of continuous parameters: the grasp and the sizes of things and the relative pose and so on. What I want to learn is a constraint on the values of those continuous variables with the property that if it's true and I execute the action under those circumstances, then the goal will probably be satisfied, the effect will probably happen; and if it's not true, then probably not. So that's the thing I want to learn: when will this operation have the desired outcome? And because I would actually like to do this with a real robot, I would like it not to take too many samples, so I'm going to be serious about that too.
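To make the shape of that concrete, here is my own rendering (not the actual planner's syntax; all names are hypothetical) of a pouring operator with continuous parameters gated by a learned constraint:

```python
# Hypothetical operator description: the planner may use "pour" whenever the
# learned constraint says this setting of the continuous parameters will work.
POUR_PARAMS = ["cup_aperture", "target_aperture", "relative_pose", "grasp", "controller_gain"]

def make_pour_operator(learned_constraint):
    return {
        "name": "pour",
        "params": POUR_PARAMS,
        "precondition": lambda theta: learned_constraint(theta),  # learned from experiments
        "effect": "contents(target) gains most of contents(cup)",
    }
```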
One way we can think about learning that constraint is as a regression problem. Instead of saying the constraint is either satisfied or not, I might say it's satisfied to some degree: you could imagine defining a score for pouring. For pouring it's easy; we just measure the number of particles that end up in the target, and that's the score, and we say we would like the scoring function to be higher than some value. We'll just call that value zero for now; it doesn't matter, it's some constant. So then what I'm going to do is run some experiments, and I want to learn the mapping from the values of all those continuous variables to the score. If I can learn that, then, given the scoring function, I know for any assignment of values to those variables the amount of liquid that I expect to end up in the cup. We could formulate this with a bunch of different regression strategies; we're going to use Gaussian process regression. Probably some of you are experts on this and some of you don't know what it means, so I'll try to talk to everyone. It's a way of articulating our own uncertainty about this mapping, so that we can do experiments effectively. We do some set of initial pourings, and each time we try it, we have some assignment of values to those variables: we try it with different-sized cups and different relative positions and different gains of the controller and so on, and for each one of those we get a score.

Here's the way the Gaussian process works. Along the x axis here (I can only draw one dimension; it's really a lot of dimensions) are the parameters we vary, and we're interested in knowing: when is this function g bigger than zero? That's what we would really like to know. Whenever we do an experiment, one of these little blue x's, we get an observation of that function, and using some Bayesian reasoning we can compute the posterior: we have a distribution over the actual function, and every time we get an observation we can update that distribution. The dark red line is the posterior mean of that distribution over functions, and the pink area is a couple of standard deviations around it.

Now, lots of people use Gaussian processes to do lots of different things, and often they're interested in finding the optimum of the function. We're interested in something else: the level set. We want to know over what ranges of theta g is above zero, that is, in what ways I can do pouring. I don't want to learn pouring in just one way; I'll explain why in a minute. But for what arrangements of pouring is it going to work out, and for what arrangements will it not?
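Here is a small sketch of that level-set view and of the kind of acquisition rule described next (my own illustration using scikit-learn, not the actual code; the threshold and the 1.96 factor are illustrative choices):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_score_model(thetas, scores):
    """Fit a GP to experiments: theta (pouring parameters) -> observed score."""
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
    gp.fit(np.asarray(thetas), np.asarray(scores))
    return gp

def confident_region(gp, candidates, threshold=0.0):
    """Candidate parameter settings where the posterior puts high probability
    on the score exceeding the threshold (the super-level set we believe in)."""
    mean, std = gp.predict(np.asarray(candidates), return_std=True)
    return [c for c, m, s in zip(candidates, mean, std) if m - 1.96 * s > threshold]

def next_experiment(gp, candidates, threshold=0.0):
    """Straddle-style acquisition: prefer points near the believed boundary
    (mean close to the threshold) that are still uncertain (large std)."""
    mean, std = gp.predict(np.asarray(candidates), return_std=True)
    acquisition = 1.96 * std - np.abs(mean - threshold)
    return candidates[int(np.argmax(acquisition))]
```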
So the black area that I've drawn in that figure is, right now, the region of theta space where I believe with high probability that pouring will work. I hope that makes sense: right now there's just this little region where I'm convinced it's going to work well, with high probability, greater than 0.95 or something. Now what I would like to do is some active experimentation to try to understand the boundaries of that level set. I would like to know what other configurations will give me good pouring and which ones won't. So we use an algorithm called the straddle algorithm, which is pretty interesting. I'm not going to go over it in detail, but it has this notion of an acquisition function: for different thetas, it tells us which values of theta will give us the most information, not about the maximum, but about the boundaries of the level set. I want to know the boundaries of successful pouring. This particular acquisition function likes to try experiments in places where the mean is near zero, because there we're probably near a boundary between good and bad, and where the standard deviation is high, and it combines these in a suitable way. What that means is that we can take a small number of samples and update our belief about this function.

What we find is that this kind of active learning is data-efficient. If we try experiments at random, it takes really a lot of experiments before we get a good idea of that super-level set. If we try this other approach... I have to tell you a tiny story, because the first paper we did about this was, again, just me and Tomás, and we did it using some kind of feedforward neural network, because we thought it would sound cool, and it kind of worked, but not very well. Mostly what it did was enrage our students, so they did a better job: the Gaussian process, this red curve, is awesome. So, with not too many trials... the iteration axis here is how many experiments we had to do, and it's not times ten to the third, it's just ten or twenty or thirty. So we learned something from experimentation.

There's another piece of the story which I'm actually going to skip, and instead I'm going to show you, first in simulation and then on the real robot, what we do. We take something that has abilities already: it already understands picking up objects and putting them down, and it learns pre-images, in this case for pouring and stirring. We asked it to make a cup of coffee.
To make a cup of coffee, there has to be cream in there, there has to be sugar, there has to be coffee, it has to be mixed, it has to be on the green thing, and it has to be served at the end of the table. That's the goal. We do not tell it what steps to do or in what order, so it's using a general-purpose planner to do that, and it uses these learned pre-images to get new descriptions of the operations of stirring and pouring and scooping, and it puts them together to make these plans. We have to watch it stirring, because it's fun. There we go. I like this nice little simulator.

Even more fun is this: here's our robot doing basically the same thing. In this case we just learned the pouring and the pushing; it could already pick things up. What's interesting about this is that the goal varies. Yes, I know, you can come and fix my motion planner if you want to, or you can just giggle. But we give it objectives, we move the objects around, we ask it to do different things at different times, and it kind of does it. It's not a thing of beauty, but it's actually reasonably reliable. I'll show you some outtakes at the very end. This time we told it that we wanted the thing on top of the block and the stuff in it. And I would argue that as we make these kinds of scenarios more complicated, and the goals more complicated... look there, it pushed the bowl so that it was in the usable workspace of the other hand, which was mildly clever, and it did it without pushing it off the table. As we make these scenarios more and more complicated, it seems, at least to me with my limited imagination, harder and harder to just straight-up learn a policy to do this, and it seems to me that some kind of planning is actually important to the process.

Okay, last one. Good. Let's see, ten minutes; okay, this is good. So what did I talk about there? Learning, assuming we already had the framework for a description of the effects of an action. Now I want to talk about how we can actually learn the framework itself. In that case, I said which aspects of the domain were important to making that prediction: I said the sizes and shapes of the cups mattered, I said the gain in the controller mattered, I said all that, and you just had to learn the constraint on a fixed-dimensional problem. But that's not a reasonable setup, really, if I'm trying to put myself at least a little bit out of the job of being the person who writes all this stuff down.
467 00:43:33,700 --> 00:43:39,040 So another thing that seems to be important is deciding which objects and which properties of those objects 468 00:43:39,430 --> 00:43:46,960 both affect the success of doing an operation and which might be actually changed by doing that operation. 469 00:43:48,390 --> 00:43:52,260 So I had some old work that did this in a kind of a logical framework. 470 00:43:52,260 --> 00:43:57,780 And so what we tried to do recently was recast that in a more hip, new neural network way. 471 00:43:58,110 --> 00:44:01,670 I don't know if it's better. I actually think it probably is. But yeah. 472 00:44:02,490 --> 00:44:08,510 Okay. So the idea. Let me just skip forward here. 473 00:44:08,540 --> 00:44:14,310 Okay. The idea here is: we have a representation of the state of the world. 474 00:44:15,390 --> 00:44:22,110 And, uh, but it's going to have a different size in different instances of the problem. 475 00:44:22,260 --> 00:44:31,410 Right. So most setups for neural network learning and function approximation and so on assume some kind of fixed-dimensional representation. 476 00:44:32,040 --> 00:44:38,279 And when they don't, then they feed things in sequentially. Recently there's been work on something called graph neural networks, 477 00:44:38,280 --> 00:44:43,850 which makes me laugh because it's kind of like Markov random fields, which is an old idea. 478 00:44:43,860 --> 00:44:46,319 So this is a new name for an old idea, but it's a good idea, 479 00:44:46,320 --> 00:44:54,629 which is that you can learn something about local relationships among objects or properties or values in your model and propagate those values. 480 00:44:54,630 --> 00:45:01,740 And you can learn those local models in a way that makes them independent of the arity of the problem that you'll face today. 481 00:45:01,770 --> 00:45:05,670 Right. So today I have to clear two things off the table. Tomorrow I have to clear 20. 482 00:45:06,240 --> 00:45:14,310 But I hope that the models that I learn about how to do that will transfer automatically, will work independent of the size of my problem. 483 00:45:15,300 --> 00:45:22,530 So in any given problem instance, I might have a representation, right at the moment, of my current belief about the world. 484 00:45:22,920 --> 00:45:26,030 In this case, imagine that it's not even uncertain, although it could be. 485 00:45:26,040 --> 00:45:29,640 So right now I know about some objects, and for each object I know about some properties. 486 00:45:30,180 --> 00:45:33,270 And what I'm interested in doing is learning a transition model. 487 00:45:33,270 --> 00:45:36,030 That is to say, what will happen if I do this action, right? 488 00:45:37,390 --> 00:45:44,260 So one way to think about it is that it will depend on some properties of some objects in the 489 00:45:44,260 --> 00:45:49,510 current state and it will affect some properties of some objects in the resulting state. 490 00:45:50,680 --> 00:45:59,860 And what I'm going to focus on now is just telling you a little bit of a story about how we can find a model that has that sparseness property. 491 00:46:01,580 --> 00:46:09,290 Okay. So there's this, again, kind of old idea from AI, which also comes from natural language: the notion of deictic reference. 492 00:46:09,830 --> 00:46:18,560 So "deictic", I guess, in Greek means pointing to — so this remote, or that chair, or the water bottle on the lectern. 493 00:46:18,830 --> 00:46:22,700 Those are all deictic references. Okay.
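A small illustrative sketch of resolving a deictic reference such as "the object(s) above o1" from a simple scene description; the `Obj` record and the geometric test for "above" are assumptions made for the example, not the representation used in this work.

```python
# Sketch: resolving the deictic reference "above(anchor)" against a scene.
# The object record and the geometry test are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float
    y: float
    z: float
    width: float   # crude footprint

def above(anchor, scene, xy_tol=0.05):
    # "Above" here: roughly over the anchor's footprint and higher up.
    return {o.name for o in scene
            if o.name != anchor.name
            and abs(o.x - anchor.x) < anchor.width / 2 + xy_tol
            and abs(o.y - anchor.y) < anchor.width / 2 + xy_tol
            and o.z > anchor.z}

# Example scene: B sits on A, C is off to the side.
scene = [Obj("A", 0.0, 0.0, 0.0, 0.1),
         Obj("B", 0.0, 0.0, 0.1, 0.1),
         Obj("C", 0.5, 0.0, 0.0, 0.1)]
print(above(scene[0], scene))   # {'B'} — "the object above A"
```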
494 00:46:23,030 --> 00:46:33,110 So what we want to do is decide which objects and which properties of objects are relevant to making the predictions that we want to make. 495 00:46:34,150 --> 00:46:39,790 And so what we're going to do is we're going to start out and say, I know there's at least one object in the world that's important. 496 00:46:39,790 --> 00:46:41,110 Let's think about pushing an object. 497 00:46:41,260 --> 00:46:46,660 So if I want to learn the model of what happens when I try to push an object, then I know which object I'm pushing. 498 00:46:47,760 --> 00:46:54,960 Okay. That's good. But there may be other objects that matter or that are affected by doing this action. 499 00:46:55,500 --> 00:46:59,729 And I'm going to refer to those objects using deictic references. In this work, 500 00:46:59,730 --> 00:47:04,740 we have some fixed set of deictic references, which we should eventually grow and learn, but for right now they're fixed. 501 00:47:05,190 --> 00:47:09,839 I can talk about an object above this object, or below it, or the nearest object, and so on. 502 00:47:09,840 --> 00:47:16,890 So there are some relations which, you can think, when applied to this object will pick out some other set of objects. 503 00:47:17,840 --> 00:47:21,290 The set might be empty. The set might have one object, the set might have many objects. 504 00:47:21,290 --> 00:47:29,540 But whatever — they're a way of talking about other objects in relation to one object I already know, and then you can apply that kind of recursively. 505 00:47:29,540 --> 00:47:33,370 So imagine that I wanted to talk about pushing an object. 506 00:47:33,380 --> 00:47:39,080 We'll call that object object one, and I can say "let object two" — the way to read that stuff right 507 00:47:39,080 --> 00:47:45,020 there is: let object two be the object, or possibly the set of objects, that's above object one. 508 00:47:45,980 --> 00:47:49,610 And let object three be the set of objects that's above object two, and so on. 509 00:47:50,090 --> 00:47:53,210 So if I had a scene like the one on the right, 510 00:47:54,440 --> 00:48:01,430 I could abstract it as a graph of relations. And then I could say, well, if A is object one, 511 00:48:02,540 --> 00:48:11,810 then these other objects — these particular objects in my particular world — play the roles of object two and object three and object four. 512 00:48:13,440 --> 00:48:22,349 All right. So I'm going to use this mechanism to have a flexible representation of an object I'm operating on and the 513 00:48:22,350 --> 00:48:27,450 other objects that are relevant to it, in a way that applies no matter how many objects are in my scene. 514 00:48:29,670 --> 00:48:39,210 Okay. So if we have a set of these deictic references which reach out and name some other objects relative to the object I'm operating on, 515 00:48:40,140 --> 00:48:43,440 and maybe we figure out which properties of those objects are important, 516 00:48:43,980 --> 00:48:48,520 then we have a kind of a straight-up neural network learning problem, right? 517 00:48:48,540 --> 00:48:54,240 Which is: now we have a fixed-dimensional input. We have this object and the other objects that are relevant and some properties. 518 00:48:55,050 --> 00:48:58,290 And we map into some properties of that object and some other objects. 519 00:48:59,530 --> 00:49:05,560 So that's a plain old numeric regression problem. We know how to train a neural network if we know what data to give it. 520 00:49:07,710 --> 00:49:14,550 Okay.
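A minimal sketch of that regression setup: once the deictic references have picked out a fixed set of role-players, their chosen properties plus the action parameters form a fixed-length vector, and the transition model becomes ordinary multi-output regression. The feature layout, the stand-in data, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the network from the talk.

```python
# Sketch: fixed-dimensional regression over deictically-named role-players.
# Data, dimensions, and model choice are stand-ins for illustration.
import numpy as np
from sklearn.neural_network import MLPRegressor

def featurize(state, roles, action_params):
    # `roles` maps role names (object 1, the object above it, ...) resolved by
    # deictic references to concrete objects; `state[obj]` is that object's
    # chosen property vector (pose, size, ...). The result has a fixed length
    # no matter how many objects are in the scene.
    return np.concatenate([state[obj] for obj in roles.values()] + [action_params])

# Stand-in training data: X holds featurized (state, push) pairs,
# Y holds the next-step properties of the same role-players.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
Y = rng.normal(size=(200, 8))

model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X, Y)
print(model.predict(X[:1]).shape)   # (1, 8): predicted properties after the push
```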
And so then you can apply it, right? What's important about this thing is that it's supposed to apply anywhere, right? 521 00:49:15,390 --> 00:49:20,280 It says if these guys have these properties on the input, these are the properties that the other guys will have in the output. 522 00:49:21,260 --> 00:49:28,340 Okay. And so then we have an algorithm for learning this, and I am not going to go into detail, but it takes a kind of a normal training set. 523 00:49:28,640 --> 00:49:33,950 A state is an arrangement of all the objects in the world. I take an action, then I get a resulting state, 524 00:49:34,430 --> 00:49:42,770 and I have an outer loop that kind of greedily operates on the structures of the rules, and an 525 00:49:42,800 --> 00:49:47,810 inner loop that does some EM stuff, and an inner-inner loop that does neural network training. 526 00:49:50,060 --> 00:49:58,640 Okay. And so what happens is that we can do things like learn the results of pushing objects on a crowded table. 527 00:49:59,510 --> 00:50:03,780 Right now, we're not testing this inside a planner. So this is really preliminary work. 528 00:50:03,840 --> 00:50:07,940 We're just checking to see if the model that we get predicts the data that we trained on well — 529 00:50:07,970 --> 00:50:11,330 I mean, predicts held-out data well — but it's just based on likelihood. 530 00:50:12,920 --> 00:50:16,150 And we compared it to two other strategies. 531 00:50:16,160 --> 00:50:18,170 Right. So there's our learned rule-based model. 532 00:50:18,590 --> 00:50:26,630 We compared it to just a straight-up neural network in which we encode the positions of all the objects on the table. 533 00:50:27,080 --> 00:50:32,780 The problem is that you don't know what order to put them in. And so we picked what we thought was the most helpful order. 534 00:50:32,780 --> 00:50:39,530 But it's a very hard thing to do. And we compare it against a graph neural network, which is, again, a kind of a modern, 535 00:50:39,530 --> 00:50:47,530 structured neural network that abstracts away from the individuals in a nice way, but doesn't have exactly the right bias for this kind of problem. 536 00:50:49,120 --> 00:50:52,419 And what we found — so purple is our thing, blue is the graph 537 00:50:52,420 --> 00:50:58,780 neural network, and red is a plain old flat neural network — in this case with just three objects in the scene. 538 00:50:58,780 --> 00:51:01,720 So we're not testing generalisation over multiple objects, 539 00:51:02,710 --> 00:51:09,220 but we find again that the sparse rule learning can learn efficiently, very quickly, how to make good predictions in this domain. 540 00:51:10,900 --> 00:51:15,879 More important is the fact that it's relatively unaffected by clutter, right? 541 00:51:15,880 --> 00:51:26,050 So as you add more objects into the world, the flat neural network suffers because they're all in some arbitrary order and it doesn't know what matters. 542 00:51:26,740 --> 00:51:34,930 The graph neural network does reasonably well, and the rule-learning thing still works more reliably. You might ask — 543 00:51:36,120 --> 00:51:43,260 I certainly asked when I first saw these results — you might ask, why does it get better as there get to be more objects in the world? 544 00:51:43,290 --> 00:51:49,769 That seems counterintuitive.
The answer is that if there's a bunch of stuff on the table and I'm pushing one object, that might push another object, 545 00:51:49,770 --> 00:51:52,470 but almost everything stays the same. 546 00:51:53,890 --> 00:51:59,230 So predicting that everything stays the same is not so bad, and you just have to learn the things that don't stay the same. 547 00:51:59,530 --> 00:52:03,729 So if you average over the objects in the scene and you measure how well you predict what happens, 548 00:52:03,730 --> 00:52:05,860 how well you're predicting what happens to each of them, 549 00:52:06,220 --> 00:52:10,570 then the more objects that don't move, actually, the easier it becomes if you have the right bias. 550 00:52:11,430 --> 00:52:15,900 But it becomes harder for the flat neural network. Okay. 551 00:52:16,050 --> 00:52:19,620 So this is just like a tiny, tiny tip of an iceberg, but it's very exciting. 552 00:52:19,620 --> 00:52:26,279 I feel like we have the tools and the pieces and the parts to figure out how to make generally intelligent robots. 553 00:52:26,280 --> 00:52:32,159 I kind of do. We have a bunch of work to do: we have to work on connecting the vision algorithms that exist now, 554 00:52:32,160 --> 00:52:37,170 which are awesome but not quite what we need, into state estimation in a useful way. 555 00:52:38,430 --> 00:52:44,520 Right now we have learned policies down at the low level and planning at the high level, but that should be more fluid. 556 00:52:44,850 --> 00:52:50,819 We should be able to cache the results of planning in a way that lets us routinise the things that we do very frequently. 557 00:52:50,820 --> 00:52:56,729 We're not doing that. I think end-to-end learning is a blessing and a curse. 558 00:52:56,730 --> 00:52:59,730 My colleague Tomás likes to call it dead-end learning. 559 00:52:59,730 --> 00:53:04,830 I'm not sure. So what's interesting about end-to-end learning, right? 560 00:53:04,830 --> 00:53:11,790 So that's when you say, I have this giant system and I'm not going to try to give it intermediate signals of success, 561 00:53:11,790 --> 00:53:17,430 but rather just measure the quality of the whole thing based on the final actions it takes. 562 00:53:17,940 --> 00:53:21,120 That's the right thing. You can't argue with it intellectually, right? 563 00:53:21,120 --> 00:53:26,250 It is exactly the right thing. You don't want to say my state estimator needs to be awesome 564 00:53:26,460 --> 00:53:29,940 according to some criterion that's only about state estimation. 565 00:53:29,940 --> 00:53:37,260 Really, all I care about is that the state estimator does a job that helps the planner do a job that causes the controller to emit the right torques. 566 00:53:38,450 --> 00:53:45,469 That's all I care about. But the idea that you could backpropagate from errors on the torques all the way 567 00:53:45,470 --> 00:53:50,150 through the planner and the state estimator, to me, seems like not so clear. 568 00:53:50,630 --> 00:53:55,040 So I think we have to figure out ways of combining local reward signals and end-to-end reward signals. 569 00:53:56,150 --> 00:54:01,430 We're talking about interacting with humans, all kinds of stuff. So I brought back one more ancient slide. 570 00:54:01,430 --> 00:54:05,000 This is from 1995, but it's still kind of like my view of what's going on. 571 00:54:06,240 --> 00:54:13,250 And then we have to think about learning at a lot of different kinds of levels of abstraction, and how to make them actually not divided into layers.
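On the point a moment ago about combining local and end-to-end signals: one very simple possibility, sketched here purely for illustration and not as the approach advocated in the talk, is to add a weighted local loss on the state estimator to the end-to-end task loss so that gradients flow through the whole pipeline while each module still gets a signal about its own job. The PyTorch modules, dimensions, and loss weights are invented stand-ins.

```python
# Sketch: combining an end-to-end loss (on output torques) with a local,
# module-level loss (on the state estimate). All components are stand-ins.
import torch
import torch.nn as nn

estimator = nn.Linear(32, 8)     # stand-in state estimator: observation -> state
policy = nn.Linear(8, 7)         # stand-in policy: state -> joint torques

def combined_loss(obs, true_state, expert_torques, lam=0.1):
    est_state = estimator(obs)
    torques = policy(est_state)
    e2e_loss = nn.functional.mse_loss(torques, expert_torques)   # end-to-end signal
    local_loss = nn.functional.mse_loss(est_state, true_state)   # local signal
    return e2e_loss + lam * local_loss

obs = torch.randn(4, 32)
loss = combined_loss(obs, torch.randn(4, 8), torch.randn(4, 7))
loss.backward()   # gradients flow end to end, but the local term also
                  # keeps the state estimator honest about its own job
```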
572 00:54:13,670 --> 00:54:19,010 I think that right now, especially in robotics, a lot of people are working at what I would call the skill level, 573 00:54:19,370 --> 00:54:22,790 although, you know, I think in robotics, that's roughly right. 574 00:54:24,290 --> 00:54:30,050 What's interesting is that we can kind of do skills and then we can kind of do like fancy stuff up at the top, like play go. 575 00:54:30,500 --> 00:54:35,030 But we're terrible at just like basically making breakfast or even walking out of this lecture room. 576 00:54:35,390 --> 00:54:41,780 So that middle ground I think is interesting and important, and I want to recruit more people to work on it and think about it. 577 00:54:42,380 --> 00:54:47,630 So there's a bunch of people who helped with this, and I'm grateful to them for what they have done. 578 00:54:48,080 --> 00:54:51,830 And with that, I will say thank you and let you watch the robot make mistakes. 579 00:54:51,920 --> 00:54:52,430 So thanks.