So welcome to this Strachey Lecture. This is a prestigious series of distinguished lectures named for Professor Christopher Strachey. Strachey founded the Oxford Programming Research Group in 1965, and with Dana Scott he founded the field of denotational semantics, which provides a firm mathematical foundation for programming languages.

Before introducing today's speaker, I'd like to thank Oxford Asset Management, who generously support this lecture series. They have been supporting the series since 2015, and without their support we would be unable to host this series of exciting lectures. There'll be coffee and doughnuts after the talk, and this will give students and researchers the opportunity to meet representatives of Oxford Asset Management. You'll be able to find them near the registration desk. Okay.

So with respect to today's talk, it's my very great pleasure to introduce today's speaker, Professor Lise Getoor. Lise Getoor is a Distinguished Professor in the University of California, Santa Cruz, Department of Computer Science and Engineering, and a founding director of the UCSC Data Science Research Center. Lise has been the recipient of many honours and awards in recognition of her distinguished research contributions, including fellowships of the ACM, AAAI and IEEE, numerous paper awards, and many invitations to give keynote talks and distinguished lectures such as this one today.

Lise is perhaps best known for her work on statistical relational learning, which is a subfield of AI that combines logic and machine learning to deal with data that exhibits both complex structure and inherent uncertainty. She's also made important contributions in fields as diverse as data management, visual analytics and social network analysis, as well as being a passionate advocate for fairness. In today's talk she's going to tell us about her work on Probabilistic Soft Logic, which is an SRL framework for large-scale probabilistic reasoning in relational domains. So please join me in giving her a very warm welcome today.

First, it's a huge honour to be here to give this talk. And Ian didn't give the back story that this talk has been in the making for quite a while: I think we first started talking in 2019, and I was supposed to give the talk in May of 2020, just when things were shutting down. So I'm really happy to be here, and actually be here in 3D rather than 2D. And I'm hoping that my topic, which is about mixing logic, probability and neural reasoning, is a really good fit to the kind of research being done in the computer science department here.
So I hope you'll agree by the end. And then finally, I am going to talk about an actual programming language, a probabilistic programming language, so I hope that's fitting, in honour of Christopher Strachey.

But before I get into the meat of the talk, I want to pop things out a little bit and put a bigger picture on what I'm doing. So, if you will, a subtitle, or super-title, for my talk is "The Power of Relational (or Statistical Relational) Thinking". And what do I mean by relational thinking? Relational thinking is interesting because there are all these different fields that use the term, but they mean somewhat different things in these different areas.

So let's start with psychology. In psychology, there's this notion that it's about being able to talk about objects, attributes of objects, and their relations, and that that is part of intelligence. Within sociology, not surprisingly, it's about the ties between individuals, memberships in organisations, and those kinds of structures. Within mathematics, interestingly, it has a rather different interpretation: it's much more about the ability to go from the specific to the more abstract. So oftentimes the example given is the distinction between just doing calculations versus understanding how to manipulate things using algebra. And then finally, in philosophy, the term is used in a lot of different ways. One of them is around notions of reality, a relationalism that argues against dualities between mind and body, and between subjectivity and objectivity.

But what does this mean from the perspective of computer science or machine learning? I'm going to argue that there are two components. One component, not surprisingly, is dealing with structure: structure either in the inputs or in the outputs. For the inputs: oftentimes in machine learning you just have these flat attribute-value vectors and you're trying to learn things from them, but so many things are much richer, like a graph or a multi-relational graph, and these are common. And then also for the outputs, the decision variables that one's trying to predict: oftentimes there's a fair bit of structure in them, there are dependencies between them, and you need to reason jointly about them. The most obvious examples are natural language processing, when you're doing something like parsing, or computer vision, when you're doing scene understanding. So there's structure that we want to be able to make use of.
But then there's also uncertainty. And again, there's uncertainty in the inputs, meaning noise, and uncertainty in the outputs, meaning quantifying the uncertainty over the various predictions that one's making. So what I argue is that we need methods that are scalable and are able to deal with both structure and uncertainty. And in this talk, what I want to do is provide you with some patterns and some methods for doing relational thinking. These methods are going to use a mix of logic and probability, or hard and soft reasoning, and also neural and symbolic reasoning.

So the talk is divided into three components. In the first part, I'm just going to go through some patterns. These patterns are actually pretty basic, but we've found them over and over again to be super useful in building AI and machine learning systems. Then the bulk of the talk is going to be the tools section, which goes into a little more technical detail. And then, you know, we're computer scientists, so we always like to talk about scaling, so there'll be a section on getting this to work at scale.

For the patterns, the way that I'm going to represent them is as logical rules. These have advantages: they're good at representing structure, and they tend to be interpretable. About ten minutes into the lecture, I'm going to back off on that and talk about some probabilistic interpretations for them as well.

The canonical patterns I'm going to go through are structured prediction patterns that are super simple but still powerful. The first one, the simplest one, is just collective classification. So you're doing a classification problem where you're trying to predict labels for nodes in a graph; you can't get much simpler than that. And the thing that I want to highlight is how to represent the reasoning pattern behind this. The reasoning pattern is captured in two simple rules. One says: I'm going to use local information at the node, the attributes of the node, to try and infer its label. But then I'm also going to use the labels of its neighbours. And the interesting thing to note here is that the thing I'm trying to infer, the label, appears on both sides of the second rule. That's what gets this dependence between the decision variables: the fact that I have unknowns potentially on both sides of the rule.
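To make the pattern concrete, here is a minimal sketch of the idea in plain Python. The predicate names, weights, and the simple averaging update are my own illustrative assumptions; they are not the PSL implementation described later in the talk, which solves a convex program instead.

```python
# Minimal sketch of the collective classification pattern.
# The two rules, informally:
#   w_local : LocalEvidence(X, L) -> Label(X, L)
#   w_coll  : Neighbor(X, Y) & Label(Y, L) -> Label(X, L)
# Names, weights, and the fixed-point update are illustrative assumptions.

neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
local = {"a": 0.9, "b": 0.5, "c": 0.1}    # soft local evidence for one label
w_local, w_coll = 1.0, 1.0

label = dict(local)                        # soft truth values in [0, 1]
for _ in range(25):                        # naive fixed-point iteration
    for x in neighbors:
        avg_nbr = sum(label[y] for y in neighbors[x]) / len(neighbors[x])
        label[x] = (w_local * local[x] + w_coll * avg_nbr) / (w_local + w_coll)

print({x: round(v, 2) for x, v in label.items()})
# node "b" gets pulled between its neighbours' labels and its own evidence
```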
Now let me give a little toy example of this. The one that I usually give is: I have a social network and I'm trying to infer political parties. Okay, so these are the US political parties; I'm going to switch them for the British parties and keep the colours the same. Again, this is a toy example, but the idea is that, as most often happens, you're given a network where you observe the labels for some of the nodes, and then there are these other ones where you're trying to infer them.

And so how do we instantiate that little pattern that I gave a minute or two ago? Well, I'm going to have a collection of local rules, where the local rules say things like: if they donate to a particular party, they're likely to vote for that party; if they tweet about something (maybe this isn't a Labour thing that people tweet), they're likely to be with that party. But then the more interesting rules are the collective rules, which take into account things like: if their friends vote for a party, they're likely to vote for that party; if their spouse votes for a party, they're likely to vote for that party. And again, this is what allows you to propagate the information. Using something like this, you can go through and end up doing a labelling. There are a bunch of different algorithms; I'm going to tell you one way that we do it at scale in the middle of the talk.

Now, there is an important aside here, which is related to privacy. It's exactly this issue, that you can infer so much about an individual based on their network, that's important. I'm going to talk a little bit more about that towards the end of the lecture.

So the next pattern: that was about nodes, and this one is about edges. So: inferring the existence of edges in a graph. And again, we have this super simple pattern that says: if there's an edge between X and Y, and there's a Z that's similar to Y, then there's an edge between X and Z. And because the thing that I'm trying to infer, the link, is on both sides of the rule, that's what allows the dependence to happen.

So what's an example of this pattern? Well, one area where there's a lot of work is recommender systems, and in recommender systems you're trying to predict exactly this: what items someone will like. There are structural patterns one can use, where you can say: if a user likes one item, and there's another item that's similar to that item, then the user will like the second item.
You can also say: if there's a user that likes an item, and there's another user that's similar to that user, then the second user will like that item. And so, you know, this is a simple way of capturing what's done in a lot of recommender systems. Note that the solid lines are the things that are observed, and then I'm making this prediction. The key challenge, not surprisingly, is how to represent the similarity.

And then the third important subproblem is entity resolution. So what is entity resolution? This is something that's ubiquitous across computer science, where you are essentially trying to determine, when you have two references, whether they're really referring to the same underlying thing. Again, we can set this up where we use local information: if they have similar names, they're likely to refer to the same, let's say, person; likewise if they're linked to similar things. But then the collective pattern that is important is: if X and Y are the same, and Y and Z are the same, then X and Z are the same. This is a transitivity that allows you to infer additional same-as links, and this is important.

But there are some domains where you actually have a different kind of rule, and understanding your domain, so you know when you're doing this kind of clustering versus when you have more of a linking kind of problem, matters. In the linking case the rule says: if X and Y are the same, and Z is not the same as Y, then X can't link to Z. Think about when you have two databases that you're matching across, each of which has already been de-duplicated, so you're only allowed to match each record across once. The distinction of knowing when you're in the clustering setting versus when you're in the mutual-exclusion setting is important.
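As a toy illustration of the two regimes: the scores, names, and the use of the Łukasiewicz conjunction (the soft logic that comes up again later in the talk) are my own illustrative assumptions.

```python
# Toy sketch contrasting the two entity-resolution patterns.
# Scores and entity names are illustrative assumptions.

same = {("x", "y"): 0.9, ("y", "z"): 0.8}   # soft SameAs scores

def lukasiewicz_and(a, b):
    # Lukasiewicz t-norm: the soft conjunction used in PSL
    return max(0.0, a + b - 1.0)

# Clustering regime: transitivity propagates SameAs.
#   SameAs(X, Y) & SameAs(Y, Z) -> SameAs(X, Z)
inferred_xz = lukasiewicz_and(same[("x", "y")], same[("y", "z")])
print("SameAs(x, z) >=", inferred_xz)        # 0.7

# Matching regime (e.g., linking across two de-duplicated databases):
#   SameAs(X, Y) & !SameAs(Y, Z) -> !SameAs(X, Z)
not_same_yz = 1.0 - same[("y", "z")]
evidence_against = lukasiewicz_and(same[("x", "y")], not_same_yz)
print("evidence against SameAs(x, z):", evidence_against)
```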
Now, those were all micro-patterns, and my favourite pattern is one where you do all of these at the same time. This is the problem of graph identification, which we first introduced around ten years ago. Let me again define it by example: a setting where we have a noisy input graph, but then we have to do all these different inferences to infer an output graph, and this output graph is the thing that we then want to go on and do our analysis on, or do our science on. An example would be: you have a communication network, in this case an email communication network, where the nodes are email addresses, the edges are communications, and you have some content information. And what you're trying to do is infer an organisational network, or social network. Here the nodes are different: they're not email addresses, they're actual people. The edges have a different semantics, namely who's in charge of whom, and the node labels, maybe, are their roles in the company.

So again, what does this problem look like? I have as input this graph; I'm trying to get out this other one, which I don't observe. What do I need to do? First, I need to go through and do entity resolution, where I figure out who the people are. Then I have to do the link prediction of who manages whom. And then I have to do the inference of their roles in the company, so I need to go through and do the labelling.

The thing that's interesting about this, in my view, is how there are all these interesting dependencies. Most of the time, people look at these problems in isolation: they either do collective classification, or link prediction, or entity resolution. But here we're trying to do them all at once, where there are dependencies: obviously on the observed graph, but also, as I was emphasising before, the dependence across the predictions within each task, and then also this interdependence across the tasks. And it's really these kinds of complex reasoning problems with graphs that motivate much of my work.

So now let me get into methods for doing this. I'm going to be talking about tools. I'm first going to talk about tools that combine the statistical and the relational, so logic and probability, and afterwards I'm going to talk about methods that combine the neural and the symbolic.

So, for the statistical relational: this is a research subarea, as Ian mentioned, that combines logical representations, handles uncertainty, and is especially designed for these collective problems with rich dependencies. There have been a lot of different languages proposed, and we're going to add one to the mix: PSL. PSL stands for Probabilistic Soft Logic. It's an open-source toolkit; there are a bunch of examples and tons of resources at this website. But at a high level, going back to the fact that I represented all these patterns as logical rules: well, logical rules have advantages, but they also have disadvantages.
In general, doing full reasoning with them is intractable. If there are inconsistencies, that's going to be a significant problem. And one of the things that I mentioned is dealing with similarity: they're not great at representing similarities. So what we try to do with PSL is basically turn these disadvantages into advantages, and have a tractable method that can handle inconsistencies and can represent similarities, or degrees of similarity.

So what does this look like? It's a probabilistic programming language for these collective problems, and a model is represented as weighted rules. The key thing with these weighted rules is that the ground atoms we'll get to are going to be not standard zero-one binary, but continuous-valued between zero and one. And a PSL program is just these rules plus some input data.

We make reasoning scalable by converting this to a convex optimisation problem, and what I want to do is show you how we come up with this particular optimisation problem. This is work very much by my former PhD student Stephen Bach, and Bert Huang, a postdoc who worked with me. I'm going to flash up the particular relaxation that we introduce, and then I'm going to show you how we get to it in three different ways. So for now, and I'll unpack this in a minute, see if you can memorise this pattern. The interesting thing is that this pattern comes up under three different interpretations of these weighted logic programs: one comes from randomised algorithms in theoretical computer science, one comes from probabilistic graphical models, and one comes from soft logic.

Let me do the first, from randomised algorithms. We're going to take these weighted logical rules, and I'm going to rewrite them so that we have non-negative weights. I'm going to write them in a normal form, indexing over the positive literals and the negative literals, so it's a slightly messy bit. And then what I want to do is find the assignment to the x variables that maximises this weighted sum. And so all of you go back to your algorithms class and you say: Professor Getoor, I remember that. That is weighted MAX SAT, and that is NP-hard. So it doesn't seem like you've got anywhere so far.
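For reference, here is the textbook form of weighted MAX SAT and its linear-programming relaxation, which is consistent with what's being described here, though the exact notation on the slide may differ.

```latex
% Weighted MAX SAT: Boolean variables x_i, clauses C_j with weights w_j >= 0,
% positive literals indexed by I_j^+ and negated literals by I_j^-:
\max_{x \in \{0,1\}^n}\ \sum_j w_j \,
  \min\Big\{1,\ \sum_{i \in I_j^+} x_i \;+\; \sum_{i \in I_j^-} (1 - x_i)\Big\}

% The relaxation replaces x_i \in \{0,1\} with y_i \in [0,1], giving a
% linear program with the same clause-satisfaction pattern inside the min.
```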
But now what we're going to do is take this NP-hard problem and convert it to a problem where we instead interpret the variables as probabilities: rounding probabilities, so the probability that they round up to one or down to zero. And doing this, we can make use of an old result of Goemans and Williamson (they actually have two really famous results from the same year). This gives a way of bounding the expectation of that expression with a linear program. And you can go further and say that this gives you a three-quarters-optimal rounding guarantee for the relaxation. And, since I know everybody remembers what I put up two slides ago: if you look at the main part of the term here, it has the form that I showed a minute ago. So this is one interpretation, where I get a relaxation that is nice and is basically a linear program.

Now there's a second interpretation, and the second interpretation comes from graphical models. What I can do with my weighted logical rules is basically form a big, giant graphical model, where I have my random variables, and then I have potential functions that are the rules, the logical rules. They're kind of weird potential functions, because they're just our logic, but still, I can do this. And then the standard thing that people do in graphical models is define a distribution, where the distribution is the exponentiated sum, over all of the rules, of the weighted potentials. And the classic problem is then to find the assignment to the variables that maximises this exponent; that will be the most probable. So at this point I've done nothing other than rename weighted MAX SAT into graphical models.

But we can go on from here. When one is trying to find the assignment to the variables, one way to solve this is to find a globally consistent set of marginal distributions. Again, this is still hard, because there are exponentially many constraints. So what people in graphical models have often done is a local consistency relaxation: rather than enforce all of the constraints, just enforce a few local ones. There's a whole cottage industry in different ways of doing this. We can make use of it in our setting and introduce pseudo-marginals over some joint potential states, but not all of them. Then we can take our expression and put in these constraints for consistency with just the marginals and the pseudo-marginals.
And then do a little bit of not terribly complex analysis to rewrite this and get rid of the pseudo-marginals, projecting onto the mu's, and we can get out this expression. And again, if you look at this expression, it has the same form as the MAX SAT relaxation. So we got to the same relaxation through two pretty significantly different interpretations. And this connection between the two is actually interesting in and of itself, because the people doing the local consistency relaxations in graphical models would do them, but they didn't have any bounds. Now we can take the bound from the MAX SAT relaxation and apply it.

But that's two of the three interpretations. The third interpretation comes from soft logic, and this is where we just start off and say: the random variables are continuous between zero and one, interpreted as either similarity or degree of truth. There are different soft logics; we use Łukasiewicz logic. And then, through some manipulation to find the exact MAP solution (so no approximations happening here at all), you get this expression. And for this expression you can now have a bit of an interpretation of where it came from, because it's really saying: either the rule is satisfied, so it's one, or, if not, I'm going to add up the truth values of the positive literals and the truth values of the negated literals, and treat that as the degree of satisfaction. And again, if you look at this expression, it matches the other two. So we've gotten to the same relaxation through three different interpretations, which always seems like a good sign of consistency.

With PSL, we actually ended up going further than this. We introduced a notion that we called hinge-loss Markov random fields. The hinge-loss Markov random fields have the same general form, but now, rather than the expression we had before, we're going to allow arbitrary linear expressions here. This also allows us to add in constraints. And then, what is a PSL program? Well, this is an actual PSL program; I guess you can't read it that well, but it's for that simple collective classification problem we had before. So you have something interpretable.
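For reference, the hinge-loss Markov random field, as defined in the PSL literature, gives the conditional distribution in roughly this form:

```latex
% Hinge-loss Markov random field: y in [0,1]^n are the target atoms,
% x the observed atoms, and each l_j is a linear function of y and x
% (the rule's distance to satisfaction):
P(y \mid x) \;=\; \frac{1}{Z(w, x)}
  \exp\!\Big(-\sum_j w_j \,\big(\max\{0,\ \ell_j(y, x)\}\big)^{p_j}\Big),
\qquad p_j \in \{1, 2\}
% Because each term is a (possibly squared) hinge of a linear function,
% MAP inference is a convex optimisation problem.
```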
You have a PSL program, you have some data, and you have a distribution, a conditional distribution, that you can solve for.

Now, how do you learn a PSL program from data? There are two issues: where do you get the weights from, and where do you get the structure of the rules? If I had two hours for the talk, I would tell you all about that, but I don't, so you'll have to trust me, or invite me back, or talk to me after the lecture about ways of learning these. But there are ways of doing it.

Let's talk instead about: does it work, empirically? I'm going to show some simple examples of it working, but they're representative. The first example is activity recognition in video. In this setting, we compared against some different feature representations that were commonly used at the time, and then we added these PSL rules on top of the features. These rules were super simple: they just captured little tiny patterns, analogous to the ones I gave at the beginning. But you see, over and over again, that they give you a significant bump in performance.

Another collection of results is for some collective classification problems and a link prediction problem. Here we're comparing to a discrete MRF, and we see that we do a little bit better; but even on these small problems, we're significantly faster. I'm going to drill into and expand on that shortly. Then, against a state-of-the-art approach that did a kind of similarity propagation, so a completely different kind of algorithm, we were able to outperform them in a biological setting. And then there's a third setting, where we're doing emotion recognition in dialogue, comparing to a variety of neural net approaches, including some pretty sophisticated ones: even with a relatively simple PSL model, we were able to significantly outperform them.

So, that was about combining logic and probability. Now let's talk a little bit about combining neural and symbolic reasoning. For neural and symbolic reasoning, similarly, there's a whole community, the NeSy community, that works in this area, where they want to make use of the advantages of sub-symbolic representations and include symbolic reasoning, and do this in a way that integrates them properly. And just like before, there are a ton of different languages out there that people have proposed, and we're going to add one into the mix.
NeuPSL: neural probabilistic soft logic. Again, there are some foundations here, and I think one of our contributions is to introduce a generalised form for neuro-symbolic reasoning and then instantiate it in a particular way. We do this by introducing neuro-symbolic energy-based models. It's a general family of energy-based models, and they highlight, in a nice way, the boundary between where the neural reasoning and representation is and where the symbolic is.

I'm going to illustrate it with a cartoon. A standard energy-based model can be represented generically like this. And what we're going to do for a neuro-symbolic energy-based model is break it down, where we have the y's, which are the output variables; we have the symbolic input variables; we have the neural input variables; we have the parameters of the symbolic system; and we have the parameters of the neural system. So that's just explaining the notation. And then, for doing the integration, the boundary is really captured by this interface between the neural and the symbolic. Exactly how that feeds in will depend on the particular language and interpretation. This is pretty generic, and it's able to represent, I think, more than this; but the ones that we've gone through and looked at, such as DeepProbLog, Logic Tensor Networks, DeepStochLog, and neural answer set programming, can all be fit into this neuro-symbolic energy-based model framework.

How did we then do NeuPSL? Well, for NeuPSL, it is basically that we had this from before, and we're going to have the potential function, the hinge loss, take a form that matches the factorisation across the neural and the symbolic. With this in mind, we can go to our architecture, and the way that you do inference is you just feed forward through these systems (if I point at the right thing to trigger the animation). And then, to do learning, to learn the parameters of both the symbolic and the neural parts, you go through and do the inference, then you calculate the loss, and then you can propagate the loss back through the system.

The nice thing about NeuPSL is that it gives us this expressive language; it has all of the things from PSL, it's scalable, and it's general, so you can use it with a variety of different neural packages, and we have support for this.
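A schematic way to write the decomposition just described; the notation here is my own paraphrase of the framework, not necessarily the exact formulation on the slides.

```latex
% Schematic neuro-symbolic energy-based model (paraphrased notation):
% y: target variables, x_sy: symbolic inputs, x_nn: neural inputs,
% w_sy: symbolic parameters, w_nn: neural parameters.
E\big(y,\, x_{sy},\, x_{nn},\, w_{sy},\, w_{nn}\big)
  \;=\; E_{sy}\big(y,\; x_{sy},\; w_{sy},\; g_{w_{nn}}(x_{nn})\big)
% The neural component g_{w_nn} passes its outputs across the
% neural-symbolic interface into the symbolic energy E_sy. In NeuPSL,
% E_sy is a hinge-loss MRF energy, so inference remains convex once the
% neural outputs are fixed.
```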
Again, this is pretty fresh work. I'm going to show just a little experimental evaluation; I have a nicer one, but it takes longer to explain. So, just one set of experiments, where we were doing collective classification and comparing to a simple label propagation algorithm, existing NeSy approaches, and a GCN. We can clearly do way better than the other neuro-symbolic approaches; against the GCN, yeah, we're doing a little bit better. But the NeuPSL model is super simple, and the simplicity is part of the attraction.

Let me now turn to the topic of scaling. We're interested in doing these big graph problems, but we want to make sure that we can do them in a reasonable amount of time. The key issue is this: we have our weighted logical rules and we have some data, and there are two processes that happen. One is the instantiation, the grounding, of the model, and then there's doing the inference in the model. So if we want to scale this, we can scale two aspects of it: one is the inference, which I'll talk about first, and the other is the grounding, which I'll also talk about.

So, inference. Remember, one of my claims was that we went from a combinatorial optimisation problem to a convex optimisation problem. Already, yeah, that's pretty good in terms of scalability, but it's still potentially a huge linear program. So what we can do is make use of the fine-grained structure in these models, and we exploit this by decomposing. We've looked at a bunch of optimisation techniques, but the first one we looked at that was really beneficial was ADMM, the alternating direction method of multipliers. It's something that's used a lot in machine learning. The idea is very simple: you decompose the problem, you solve the subproblems, and then you do some message passing. And the nice thing is that, for a convex problem, it's guaranteed to converge.

The trick is: how do you make sure the subproblems are fast? This is where the hinge losses come in. If you break the problem down and look at these hinges, a lot of them are just trivial, so you can ignore them, get to a much smaller set of constraints, and then solve that.
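Here is a heavily simplified sketch of that decomposition idea: a generic consensus-ADMM loop on a one-variable toy objective with two hinge potentials. The potentials, the penalty parameter rho, and the iteration count are illustrative assumptions; the real system solves many such subproblems, in parallel, over the ground model.

```python
# Toy consensus-ADMM sketch of how hinge-loss inference decomposes.
# The two potentials, rho, and iteration count are illustrative assumptions.

def prox_hinge(v, rho, w, c, sign):
    """argmin_x  w*max(0, sign*(x - c)) + (rho/2)*(x - v)^2  (1-D, closed form)."""
    shifted = v - sign * w / rho
    if sign * (shifted - c) > 0:
        return shifted          # minimiser lies in the sloped region
    if sign * (v - c) > 0:
        return c                # minimiser sits at the kink
    return v                    # hinge is inactive

# Potentials: 1.0*max(0, x - 0.3) and 2.0*max(0, 0.6 - x); optimum is x = 0.6.
potentials = [(1.0, 0.3, +1), (2.0, 0.6, -1)]   # (weight, offset, sign)
rho = 1.0
z = 0.0                  # consensus variable
x = [0.0, 0.0]           # local copies, one per potential
u = [0.0, 0.0]           # scaled dual variables

for _ in range(200):
    for j, (w, c, s) in enumerate(potentials):
        x[j] = prox_hinge(z - u[j], rho, w, c, s)      # local solves (parallelisable)
    z = sum(xj + uj for xj, uj in zip(x, u)) / len(x)  # consensus averaging
    for j in range(len(x)):
        u[j] += x[j] - z                               # dual updates

print(round(z, 2))  # approaches 0.6, where the heavier hinge wins
```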
The first set of results I'm going to show compares an off-the-shelf convex optimiser, a commercial-grade optimiser, and our ADMM implementation. And the first thing you see is that for something with 150,000 potentials, the off-the-shelf convex optimiser took 3 hours, the commercial-grade one took 3 minutes, and ours took a few seconds. For something even larger, with 800 million potentials, ours could do it in a minute, while the others just can't even handle it. So that was good.

However: of the total time to solve that 800-million-potential problem, one minute was doing the optimisation and 5 minutes was doing the grounding. So the next thing that we looked at was: how do we speed up grounding? This is very much work with my current PhD student, Eriq Augustine. And you can see why grounding is an issue, because even in the simplest case, where I'm potentially doing pairwise similarity, if I have n things that I'm comparing, I'm going to get n squared groundings. And that's just a simple setting; it can get much worse than that.

So what can we do to speed this up? Well, the first thing one can do, and it's done in a lot of different areas and of course goes by a lot of different terms, is blocking. Blocking is the simple idea that, in, say, the simple pairwise-comparison case, you say: okay, I'm not going to do all comparisons; I'm going to somehow break the data into blocks and only do the comparisons within blocks. Within databases there are approaches, within data mining there are approaches, within theoretical computer science there are approaches; we'll use these. And in the implementation of PSL, we can implement that blocking as a blocking predicate. We can form all of these rules as conjunctive queries, give them to the database, and give this as a hint to the database for optimising. This can make a huge, huge difference in performance. Here are a couple of different settings: first off, notice that this is a log scale where we're comparing performance, and making use of this can give you significant speed-ups. So this was the first thing that we looked at.
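A tiny sketch of the blocking idea; the blocking key (a shared first letter) and the records are illustrative assumptions, since real systems use domain-appropriate keys or database indexes.

```python
# Toy sketch of blocking for pairwise similarity grounding.
# The blocking key and records are illustrative assumptions.
from collections import defaultdict
from itertools import combinations

records = ["alice", "alicia", "bob", "bobby", "carol"]

blocks = defaultdict(list)
for r in records:
    blocks[r[0]].append(r)      # blocking predicate: shared first letter

candidate_pairs = [pair for block in blocks.values()
                   for pair in combinations(block, 2)]

print(len(candidate_pairs), "pairs instead of",
      len(records) * (len(records) - 1) // 2)
# Only pairs inside a block are ever grounded, cutting the n^2 blow-up.
```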
Just to give you a little sense of the community, and where the SRL community has been in terms of the scale of the problems: some of the work with MLNs in Alchemy, in 2007, was doing problems that had a million potentials, and doing it in about 400 minutes. Then Tuffy came along, an implementation that sped things up quite a bit and scaled things. In 2015, foxPSL, a distributed implementation of PSL, sped things up further. And then in 2018, the results that used ADMM were able to speed this up and do larger problems; that's version 2.1. In version 2.2, where the main changes were smarter memory management and systems issues, we were able to do 150 million potentials in 10 minutes.

But we can go further, and we can go further by being even smarter about how we do our query generation for grounding. This is nice work that makes use of ideas from the database community on multi-query optimisation. The idea is: we have all these different rules; we can generate potential query plans for them, and then we can look for opportunities to reuse the queries, and, of the different sets of queries that cover them, find the one that is expected to perform the best. Again, finding that is a challenging computational problem in itself.

Here are some representative results. I know these are too small to see, but one of the things that's interesting is: we just added a bunch of overhead, searching over different query plans, and you would think you might pay a price for doing that. It turns out the price isn't much, and on the larger data sets you get significant speed-ups. This is going to depend on the problem; it could end up being even more. So from this, we add a new line into our benchmarks, and we're able to do these 150-million-potential problems in 2 minutes for grounding.

But wait, there's more. We can again make use of ideas from database theory to go beyond this, with something that we call tandem inference. The idea of tandem inference is this: the picture that I showed really has a blocking step, where first we have to ground the problem and only then do we start doing inference. Well, what if we do streaming grounding and streaming inference? So, as we start grounding, we start solving the problem.
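Conceptually, the change looks something like this sketch. The toy potentials and the SGD step are illustrative assumptions, not the actual PSL implementation; the point is that groundings are consumed as a stream, so optimisation starts before grounding finishes and nothing needs to be fully materialised in memory.

```python
import random

# Toy tandem inference: grounding yields potentials lazily (a generator),
# and projected SGD consumes them as they stream in.

def stream_groundings(n):
    """Lazily yield ground hinge potentials w * max(0, x[i] - x[j]).
    In the real system these would stream out of database queries."""
    for _ in range(n):
        i, j = random.sample(range(5), 2)
        yield (1.0, i, j)                   # (weight, head var, body var)

x = [random.random() for _ in range(5)]     # soft truth values in [0, 1]
lr = 0.05

for w, i, j in stream_groundings(2000):     # inference starts immediately;
    if x[i] > x[j]:                         # no full grounding is ever stored
        x[i] -= lr * w                      # subgradient step on the hinge
        x[j] += lr * w
    x[i] = min(1.0, max(0.0, x[i]))         # project back onto [0, 1]
    x[j] = min(1.0, max(0.0, x[j]))

print([round(v, 2) for v in x])             # the values get pulled together
```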
Well, we can do that. The black line here is tandem inference, and there are two different inference methods: the ADMM that I showed before, and another one, stochastic gradient descent. Across a bunch of different canonical data sets, we're able to get solutions before the others have even completed the grounding process. So that's exciting. But in addition, we're at the point where we can now do problems that are too big to fit into main memory. We're able to take these huge problems and, instead of putting everything into main memory, give the solver a small, bounded amount of space and page things in and out. And with this, we're able to scale to things that have billions of potentials, and do them in the same time as things that were several orders of magnitude smaller before. So this whole progression we find exciting, and we're pretty sure, also from people that we've talked to at various companies, that we're the ones doing the largest problems of this kind and this complexity.

So, I'm not going to talk about online PSL; we also have methods for changing the models on the fly. What I do want to talk about briefly is some of the applications that we've done, because I'm actually really proud of the work that my group has done using PSL on a rich collection of problems. We've done things in computational biology and health informatics. We've done a fair bit of work on natural language processing and, not surprisingly, recommender systems. We've also done work in computer vision, energy and the environment, computational social science, fairness (which I'll mention a bit), information integration and extraction, and predicting malicious behaviour.

The one of these that I want to go into a bit is information integration, because I think it connects with research being done here in interesting ways: knowledge graph construction. Knowledge graph construction also connects with the original graph identification pattern that I showed earlier. The idea is that you have a bunch of facts, and from these facts you're trying to extract a knowledge graph. In order to do this, you need to perform collective classification, link prediction, and entity resolution, but you also have to enforce ontological constraints, and you potentially want to reason about the quality of the information, and more. We've done this using PSL for a couple of different knowledge graph extraction problems, and the key takeaway is that we're able to do this,
and that all of the pieces of information are useful: putting the statistical features and the semantic features together really helps. In another kind of knowledge graph task, where we're trying to do knowledge graph completion, so you're predicting links, the key takeaway is this: the standard approaches work when you have a lot of data and the data is clean, but when you have noisy, sparse data, things like PSL can really help.

So, I want to close by talking about opportunities and challenges, and I'll start with the challenges: the perils of ignoring structure. I alluded to one of these before: privacy, and in particular privacy in graphs. I think this is a really important area. A lot of the work in privacy still looks at the independent setting, though there's starting to be more work on privacy in graphs. Some of the work of my former PhD student Elena Zheleva was among the earliest to look at how much information is leaked by the groups that you're a member of. Not surprisingly, it's a lot, actually more than friendship links.

Another important area is fairness. With my former postdoc Golnoosh Farnadi and colleagues, we're pretty sure we were the first to look at relational fairness. Because so much about fairness is taking into account structural inequities, if you don't have a fairness mechanism that can represent the structure, that's going to be problematic. And then finally, causality, and causality in graphs, is important. This is some joint work with colleagues in the database world, where we developed a relational causal system. I really think that the conjunction of all three, privacy, fairness and causality, and understanding how they relate, is important for methods that are able to mitigate these impacts. And it's important to have the background in social theory to be able to do the interpretation.

In terms of opportunities: all of the things that we look at these days are systems. Think biological systems, ecosystems, social systems, and things that combine all of these. And this idea of going from observational data,
doing some sort of graph identification to infer a structure, then doing graph identification or knowledge graph identification on top of that to infer something more, and then closing the loop, where you use that to go back and extract more information, and potentially keep doing this, is important. And this ties back to the four different interpretations that I had at the beginning. The reasoning about entities and relationships, and about social ties, is kind of obvious. Then there's the ability to go from the specific to the abstract, and learning these things. And, yeah, the philosophy one was a little bit dicier, but I do think that going from a certain perspective sometimes found in AI and machine learning, which is very atomistic, to a more, not quite multi-agent, but a systems view, is important.

The last thing is the most important: acknowledgements. My students: that's the best thing about being a professor, the awesome students I get to work with. I'm also fortunate to work with a variety of different companies and organisations. And just to wrap up: I hope that, somewhere between the more tutorial and the more technical parts, you come away with some ideas, ideas for things that integrate probabilistic and logical inference, neural and symbolic reasoning, or data-driven and knowledge-driven approaches. I think there are tons of opportunities in this space.