So welcome to this Strachey Lecture. This is a prestigious series of distinguished lectures named for Professor Christopher Strachey. Strachey founded the Oxford Programming Research Group in 1965, and with Dana Scott he founded the field of denotational semantics, which provides a firm mathematical foundation for programming languages.

Before introducing today's speaker, I'd like to thank Oxford Asset Management, who generously support this lecture series. They have been supporting the series since 2015, and without their support we would be unable to host this series of exciting lectures. There'll be coffee and doughnuts after the talk, and this will give students and researchers the opportunity to meet representatives of Oxford Asset Management. You'll be able to find them near the registration desk. Okay.

So with respect to today's talk, it's my very great pleasure to introduce today's speaker, Professor Lise Getoor. Lise Getoor is a Distinguished Professor in the University of California, Santa Cruz, Department of Computer Science and Engineering, and a founding director of the UCSC Data Science Research Center. Lise has been the recipient of many honours and awards in recognition of her distinguished research contributions, including fellowships of the ACM, AAAI and IEEE, numerous paper awards, and many invitations to give keynote talks and distinguished lectures such as this one today.

Lise is perhaps best known for her work on statistical relational learning, which is a subfield of AI that combines logic and machine learning to deal with data that exhibits both complex structure and inherent uncertainty. She's also made important contributions in fields as diverse as data management, visual analytics and social network analysis, as well as being a passionate advocate for fairness. In today's talk she's going to tell us about her work on Probabilistic Soft Logic, which is an SRL framework for large-scale probabilistic reasoning in relational domains. So please join me in giving her a very warm welcome today.

First, it's a huge honour to be here to give this talk. And Ian didn't give the back story that this talk has been in the making for quite a while: I think we first started talking in 2019, and I was supposed to give the talk in May of 2020, just when things were shutting down. So I'm really happy to be here, and actually be here in 3D rather than 2D. And I'm hoping that my topic, which is about mixing logic, probability and neural reasoning, is a really good fit to the kind of research being done in the computer science department here.
So I hope you'll agree by the end. And then finally, I am going to talk about an actual programming language, a probabilistic programming language, so I hope that's fitting, in honour of Christopher Strachey.

But before I get into the meat of the talk, I want to pop things out a little bit and put a bigger picture on what I'm doing. So, if you will, a subtitle, or super-title, for my talk is "The Power of Relational (or Statistical Relational) Thinking". And what do I mean by relational thinking? Relational thinking is interesting because there are all these different fields that use the term, but they mean somewhat different things in these different areas.

So let's start with psychology. In psychology, there's this notion that it's about being able to talk about objects, attributes of objects, and their relations, and that that is part of intelligence. Within sociology, not surprisingly, it's about the ties between individuals, memberships in organisations, and those kinds of structures. Within mathematics, interestingly, it has a rather different interpretation: it's much more about the ability to go from the specific to the more abstract. So oftentimes the example given is the distinction between just doing calculations versus understanding how to manipulate things using algebra. And then finally, in philosophy, the term is used in a lot of different ways. One of them is around notions of reality, a relationalism that argues against dualities between mind and body, and between subjectivity and objectivity.

But what does this mean from the perspective of computer science or machine learning? I'm going to argue that there are two components. One component, not surprisingly, is dealing with structure: structure either in the inputs or in the outputs. For the inputs: oftentimes in machine learning you just have these flat attribute-value vectors and you're trying to learn things from them, but so many things are much richer, like a graph or a multi-relational graph, and these are common. And then also for the outputs, the decision variables that one's trying to predict: oftentimes there's a fair bit of structure in them, there are dependencies between them, and you need to reason jointly about them. The most obvious examples are natural language processing, when you're doing something like parsing, or computer vision, when you're doing scene understanding. So there's structure that we want to be able to make use of.
But then there's also uncertainty. And again, there's uncertainty in the inputs, meaning noise, and uncertainty in the outputs, meaning quantifying the uncertainty over the various predictions that one's making. So what I argue is that we need methods that are scalable and are able to deal with both structure and uncertainty. And in this talk, what I want to do is provide you with some patterns and some methods for doing relational thinking. These methods are going to use a mix of logic and probability, or hard and soft reasoning, and also neural and symbolic reasoning.

So the talk is divided into three components. In the first part, I'm just going to go through some patterns. These patterns are actually pretty basic, but we've found them over and over again to be super useful in building AI and machine learning systems. Then the bulk of the talk is going to be the tools section, which goes into a little more technical detail. And then, you know, we're computer scientists, so we always like to talk about scaling, so there'll be a section on getting this to work at scale.

For the patterns, the way that I'm going to represent them is as logical rules. These have advantages: they're good at representing structure, and they tend to be interpretable. About ten minutes into the lecture, I'm going to back off on that and talk about some probabilistic interpretations for them as well.

The canonical patterns I'm going to go through are structured prediction patterns that are super simple but still powerful. The first one, the simplest one, is just collective classification. So you're doing a classification problem where you're trying to predict labels for nodes in a graph; you can't get much simpler than that. And the thing that I want to highlight is how to represent the reasoning pattern behind this. The reasoning pattern is captured in two simple rules. One says: I'm going to use local information at the node, the attributes of the node, to try and infer its label. But then I'm also going to use the labels of its neighbours. And the interesting thing to note here is that the thing I'm trying to infer, the label, appears on both sides of the second rule. That's what gets this dependence between the decision variables: the fact that I have unknowns potentially on both sides of the rule.
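To make the pattern concrete, here is a minimal sketch of the idea in plain Python. The predicate names, weights, and the simple averaging update are my own illustrative assumptions; they are not the PSL implementation described later in the talk, which solves a convex program instead.

```python
# Minimal sketch of the collective classification pattern.
# The two rules, informally:
#   w_local : LocalEvidence(X, L) -> Label(X, L)
#   w_coll  : Neighbor(X, Y) & Label(Y, L) -> Label(X, L)
# Names, weights, and the fixed-point update are illustrative assumptions.

neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
local = {"a": 0.9, "b": 0.5, "c": 0.1}    # soft local evidence for one label
w_local, w_coll = 1.0, 1.0

label = dict(local)                        # soft truth values in [0, 1]
for _ in range(25):                        # naive fixed-point iteration
    for x in neighbors:
        avg_nbr = sum(label[y] for y in neighbors[x]) / len(neighbors[x])
        label[x] = (w_local * local[x] + w_coll * avg_nbr) / (w_local + w_coll)

print({x: round(v, 2) for x, v in label.items()})
# node "b" gets pulled between its neighbours' labels and its own evidence
```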
Now let me give a little toy example of this. The one that I usually give is: I have a social network and I'm trying to infer political parties. Okay, so these are the US political parties; I'm going to switch them for the British parties and keep the colours the same. Again, this is a toy example, but the idea is that, as most often happens, you're given a network where you observe the labels for some of the nodes, and then there are these other ones where you're trying to infer them.

And so how do we instantiate that little pattern that I gave a minute or two ago? Well, I'm going to have a collection of local rules, where the local rules say things like: if they donate to a particular party, they're likely to vote for that party; if they tweet about something (maybe this isn't a Labour thing that people tweet), they're likely to be with that party. But then the more interesting rules are the collective rules, which take into account things like: if their friends vote for a party, they're likely to vote for that party; if their spouse votes for a party, they're likely to vote for that party. And again, this is what allows you to propagate the information. Using something like this, you can go through and end up doing a labelling. There are a bunch of different algorithms; I'm going to tell you one way that we do it at scale in the middle of the talk.

Now, there is an important aside here, which is related to privacy. It's exactly this issue, that you can infer so much about an individual based on their network, that's important. I'm going to talk a little bit more about that towards the end of the lecture.

So the next pattern: that was about nodes, and this one is about edges. So: inferring the existence of edges in a graph. And again, we have this super simple pattern that says: if there's an edge between X and Y, and there's a Z that's similar to Y, then there's an edge between X and Z. And because the thing that I'm trying to infer, the link, is on both sides of the rule, that's what allows the dependence to happen.

So what's an example of this pattern? Well, one area where there's a lot of work is recommender systems, and in recommender systems you're trying to predict exactly this: what items someone will like. There are structural patterns one can use, where you can say: if a user likes one item, and there's another item that's similar to that item, then the user will like the second item.
You can also say: if there's a user that likes an item, and there's another user that's similar to that user, then the second user will like that item. And so, you know, this is a simple way of capturing what's done in a lot of recommender systems. Note that the solid lines are the things that are observed, and then I'm making this prediction. The key challenge, not surprisingly, is how to represent the similarity.

And then the third important subproblem is entity resolution. So what is entity resolution? This is something that's ubiquitous across computer science, where you are essentially trying to determine, when you have two references, whether they're really referring to the same underlying thing. Again, we can set this up where we use local information: if they have similar names, they're likely to refer to the same, let's say, person; likewise if they're linked to similar things. But then the collective pattern that is important is: if X and Y are the same, and Y and Z are the same, then X and Z are the same. This is a transitivity that allows you to infer additional same-as links, and this is important.

But there are some domains where you actually have a different kind of rule, and understanding your domain, so you know when you're doing this kind of clustering versus when you have more of a linking kind of problem, matters. In the linking case the rule says: if X and Y are the same, and Z is not the same as Y, then X can't link to Z. Think about when you have two databases that you're matching across, each of which has already been de-duplicated, so you're only allowed to match each record across once. The distinction of knowing when you're in the clustering setting versus when you're in the mutual-exclusion setting is important.
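As a toy illustration of the two regimes: the scores, names, and the use of the Łukasiewicz conjunction (the soft logic that comes up again later in the talk) are my own illustrative assumptions.

```python
# Toy sketch contrasting the two entity-resolution patterns.
# Scores and entity names are illustrative assumptions.

same = {("x", "y"): 0.9, ("y", "z"): 0.8}   # soft SameAs scores

def lukasiewicz_and(a, b):
    # Lukasiewicz t-norm: the soft conjunction used in PSL
    return max(0.0, a + b - 1.0)

# Clustering regime: transitivity propagates SameAs.
#   SameAs(X, Y) & SameAs(Y, Z) -> SameAs(X, Z)
inferred_xz = lukasiewicz_and(same[("x", "y")], same[("y", "z")])
print("SameAs(x, z) >=", inferred_xz)        # 0.7

# Matching regime (e.g., linking across two de-duplicated databases):
#   SameAs(X, Y) & !SameAs(Y, Z) -> !SameAs(X, Z)
not_same_yz = 1.0 - same[("y", "z")]
evidence_against = lukasiewicz_and(same[("x", "y")], not_same_yz)
print("evidence against SameAs(x, z):", evidence_against)
```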
Now, those were all micro-patterns, and my favourite pattern is one where you do all of these at the same time. This is the problem of graph identification, which we first introduced around ten years ago. Let me again define it by example: a setting where we have a noisy input graph, but then we have to do all these different inferences to infer an output graph, and this output graph is the thing that we then want to go on and do our analysis on, or do our science on. An example would be: you have a communication network, in this case an email communication network, where the nodes are email addresses, the edges are communications, and you have some content information. And what you're trying to do is infer an organisational network, or social network. Here the nodes are different: they're not email addresses, they're actual people. The edges have a different semantics, namely who's in charge of whom, and the node labels, maybe, are their roles in the company.

So again, what does this problem look like? I have as input this graph; I'm trying to get out this other one, which I don't observe. What do I need to do? First, I need to go through and do entity resolution, where I figure out who the people are. Then I have to do the link prediction of who manages whom. And then I have to do the inference of their roles in the company, so I need to go through and do the labelling.

The thing that's interesting about this, in my view, is how there are all these interesting dependencies. Most of the time, people look at these problems in isolation: they either do collective classification, or link prediction, or entity resolution. But here we're trying to do them all at once, where there are dependencies: obviously on the observed graph, but also, as I was emphasising before, the dependence across the predictions within each task, and then also this interdependence across the tasks. And it's really these kinds of complex reasoning problems with graphs that motivate much of my work.

So now let me get into methods for doing this. I'm going to be talking about tools. I'm first going to talk about tools that combine the statistical and the relational, so logic and probability, and afterwards I'm going to talk about methods that combine the neural and the symbolic.

So, for the statistical relational: this is a research subarea, as Ian mentioned, that combines logical representations, handles uncertainty, and is especially designed for these collective problems with rich dependencies. There have been a lot of different languages proposed, and we're going to add one to the mix: PSL. PSL stands for Probabilistic Soft Logic. It's an open-source toolkit; there are a bunch of examples and tons of resources at this website. But at a high level, going back to the fact that I represented all these patterns as logical rules: well, logical rules have advantages, but they also have disadvantages.
In general, doing full reasoning with them is intractable. If there are inconsistencies, that's going to be a significant problem. And one of the things that I mentioned is dealing with similarity: they're not great at representing similarities. So what we try to do with PSL is basically turn these disadvantages into advantages, and have a tractable method that can handle inconsistencies and can represent similarities, or degrees of similarity.

So what does this look like? It's a probabilistic programming language for these collective problems, and a model is represented as weighted rules. The key thing with these weighted rules is that the ground atoms we'll get to are going to be not standard zero-one binary, but continuous-valued between zero and one. And a PSL program is just these rules plus some input data.

We make reasoning scalable by converting this to a convex optimisation problem, and what I want to do is show you how we come up with this particular optimisation problem. This is work very much by my former PhD student Stephen Bach, and Bert Huang, a postdoc who worked with me. I'm going to flash up the particular relaxation that we introduce, and then I'm going to show you how we get to it in three different ways. So for now, and I'll unpack this in a minute, see if you can memorise this pattern. The interesting thing is that this pattern comes up under three different interpretations of these weighted logic programs: one comes from randomised algorithms in theoretical computer science, one comes from probabilistic graphical models, and one comes from soft logic.

Let me do the first, from randomised algorithms. We're going to take these weighted logical rules, and I'm going to rewrite them so that we have non-negative weights. I'm going to write them in a normal form, indexing over the positive literals and the negative literals, so it's a slightly messy bit. And then what I want to do is find the assignment to the x variables that maximises this weighted sum. And so all of you go back to your algorithms class and you say: Professor Getoor, I remember that. That is weighted MAX SAT, and that is NP-hard. So it doesn't seem like you've got anywhere so far.
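For reference, here is the textbook form of weighted MAX SAT and its linear-programming relaxation, which is consistent with what's being described here, though the exact notation on the slide may differ.

```latex
% Weighted MAX SAT: Boolean variables x_i, clauses C_j with weights w_j >= 0,
% positive literals indexed by I_j^+ and negated literals by I_j^-:
\max_{x \in \{0,1\}^n}\ \sum_j w_j \,
  \min\Big\{1,\ \sum_{i \in I_j^+} x_i \;+\; \sum_{i \in I_j^-} (1 - x_i)\Big\}

% The relaxation replaces x_i \in \{0,1\} with y_i \in [0,1], giving a
% linear program with the same clause-satisfaction pattern inside the min.
```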
But now what we're going to do is take this NP-hard problem and convert it to a problem where we instead interpret the variables as probabilities: rounding probabilities, so the probability that they round up to one or down to zero. And doing this, we can make use of an old result of Goemans and Williamson (they actually have two really famous results from the same year). This gives a way of bounding the expectation of that expression with a linear program. And you can go further and say that this gives you a three-quarters-optimal rounding guarantee for the relaxation. And, since I know everybody remembers what I put up two slides ago: if you look at the main part of the term here, it has the form that I showed a minute ago. So this is one interpretation, where I get a relaxation that is nice and is basically a linear program.

Now there's a second interpretation, and the second interpretation comes from graphical models. What I can do with my weighted logical rules is basically form a big, giant graphical model, where I have my random variables, and then I have potential functions that are the rules, the logical rules. They're kind of weird potential functions, because they're just our logic, but still, I can do this. And then the standard thing that people do in graphical models is define a distribution, where the distribution is the exponentiated sum, over all of the rules, of the weighted potentials. And the classic problem is then to find the assignment to the variables that maximises this exponent; that will be the most probable. So at this point I've done nothing other than rename weighted MAX SAT into graphical models.

But we can go on from here. When one is trying to find the assignment to the variables, one way to solve this is to find a globally consistent set of marginal distributions. Again, this is still hard, because there are exponentially many constraints. So what people in graphical models have often done is a local consistency relaxation: rather than enforce all of the constraints, just enforce a few local ones. There's a whole cottage industry in different ways of doing this. We can make use of it in our setting and introduce pseudo-marginals over some joint potential states, but not all of them. Then we can take our expression and put in these constraints for consistency with just the marginals and the pseudo-marginals.
And then do a little bit of not terribly complex analysis to rewrite this and get rid of the pseudo-marginals, projecting onto the mu's, and we can get out this expression. And again, if you look at this expression, it has the same form as the MAX SAT relaxation. So we got to the same relaxation through two pretty significantly different interpretations. And this connection between the two is actually interesting in and of itself, because the people doing the local consistency relaxations in graphical models would do them, but they didn't have any bounds. Now we can take the bound from the MAX SAT relaxation and apply it.

But that's two of the three interpretations. The third interpretation comes from soft logic, and this is where we just start off and say: the random variables are continuous between zero and one, interpreted as either similarity or degree of truth. There are different soft logics; we use Łukasiewicz logic. And then, through some manipulation to find the exact MAP solution (so no approximations happening here at all), you get this expression. And for this expression you can now have a bit of an interpretation of where it came from, because it's really saying: either the rule is satisfied, so it's one, or, if not, I'm going to add up the truth values of the positive literals and the truth values of the negated literals, and treat that as the degree of satisfaction. And again, if you look at this expression, it matches the other two. So we've gotten to the same relaxation through three different interpretations, which always seems like a good sign of consistency.

With PSL, we actually ended up going further than this. We introduced a notion that we called hinge-loss Markov random fields. The hinge-loss Markov random fields have the same general form, but now, rather than the expression we had before, we're going to allow arbitrary linear expressions here. This also allows us to add in constraints. And then, what is a PSL program? Well, this is an actual PSL program; I guess you can't read it that well, but it's for that simple collective classification problem we had before. So you have something interpretable.
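For reference, the hinge-loss Markov random field, as defined in the PSL literature, gives the conditional distribution in roughly this form:

```latex
% Hinge-loss Markov random field: y in [0,1]^n are the target atoms,
% x the observed atoms, and each l_j is a linear function of y and x
% (the rule's distance to satisfaction):
P(y \mid x) \;=\; \frac{1}{Z(w, x)}
  \exp\!\Big(-\sum_j w_j \,\big(\max\{0,\ \ell_j(y, x)\}\big)^{p_j}\Big),
\qquad p_j \in \{1, 2\}
% Because each term is a (possibly squared) hinge of a linear function,
% MAP inference is a convex optimisation problem.
```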
You have a PSL program, you have some data, and you have a distribution, a conditional distribution, that you can solve for.

Now, how do you learn a PSL program from data? There are two issues: where do you get the weights from, and where do you get the structure of the rules? If I had two hours for the talk, I would tell you all about that, but I don't, so you'll have to trust me, or invite me back, or talk to me after the lecture about ways of learning these. But there are ways of doing it.

Let's talk instead about: does it work, empirically? I'm going to show some simple examples of it working, but they're representative. The first example is activity recognition in video. In this setting, we compared against some different feature representations that were commonly used at the time, and then we added these PSL rules on top of the features. These rules were super simple: they just captured little tiny patterns, analogous to the ones I gave at the beginning. But you see, over and over again, that they give you a significant bump in performance.

Another collection of results is for some collective classification problems and a link prediction problem. Here we're comparing to a discrete MRF, and we see that we do a little bit better; but even on these small problems, we're significantly faster. I'm going to drill into and expand on that shortly. Then, against a state-of-the-art approach that did a kind of similarity propagation, so a completely different kind of algorithm, we were able to outperform them in a biological setting. And then there's a third setting, where we're doing emotion recognition in dialogue, comparing to a variety of neural net approaches, including some pretty sophisticated ones: even with a relatively simple PSL model, we were able to significantly outperform them.

So, that was about combining logic and probability. Now let's talk a little bit about combining neural and symbolic reasoning. For neural and symbolic reasoning, similarly, there's a whole community, the NeSy community, that works in this area, where they want to make use of the advantages of sub-symbolic representations and include symbolic reasoning, and do this in a way that integrates them properly. And just like before, there are a ton of different languages out there that people have proposed, and we're going to add one into the mix.
NeuPSL: neural probabilistic soft logic. Again, there are some foundations here, and I think one of our contributions is to introduce a generalised form for neuro-symbolic reasoning and then instantiate it in a particular way. We do this by introducing neuro-symbolic energy-based models. It's a general family of energy-based models, and they highlight, in a nice way, the boundary between where the neural reasoning and representation is and where the symbolic is.

I'm going to illustrate it with a cartoon. A standard energy-based model can be represented generically like this. And what we're going to do for a neuro-symbolic energy-based model is break it down, where we have the y's, which are the output variables; we have the symbolic input variables; we have the neural input variables; we have the parameters of the symbolic system; and we have the parameters of the neural system. So that's just explaining the notation. And then, for doing the integration, the boundary is really captured by this interface between the neural and the symbolic. Exactly how that feeds in will depend on the particular language and interpretation. This is pretty generic, and it's able to represent, I think, more than this; but the ones that we've gone through and looked at, such as DeepProbLog, Logic Tensor Networks, DeepStochLog, and neural answer set programming, can all be fit into this neuro-symbolic energy-based model framework.

How did we then do NeuPSL? Well, for NeuPSL, it is basically that we had this from before, and we're going to have the potential function, the hinge loss, take a form that matches the factorisation across the neural and the symbolic. With this in mind, we can go to our architecture, and the way that you do inference is you just feed forward through these systems (if I point at the right thing to trigger the animation). And then, to do learning, to learn the parameters of both the symbolic and the neural parts, you go through and do the inference, then you calculate the loss, and then you can propagate the loss back through the system.

The nice thing about NeuPSL is that it gives us this expressive language; it has all of the things from PSL, it's scalable, and it's general, so you can use it with a variety of different neural packages, and we have support for this.
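A schematic way to write the decomposition just described; the notation here is my own paraphrase of the framework, not necessarily the exact formulation on the slides.

```latex
% Schematic neuro-symbolic energy-based model (paraphrased notation):
% y: target variables, x_sy: symbolic inputs, x_nn: neural inputs,
% w_sy: symbolic parameters, w_nn: neural parameters.
E\big(y,\, x_{sy},\, x_{nn},\, w_{sy},\, w_{nn}\big)
  \;=\; E_{sy}\big(y,\; x_{sy},\; w_{sy},\; g_{w_{nn}}(x_{nn})\big)
% The neural component g_{w_nn} passes its outputs across the
% neural-symbolic interface into the symbolic energy E_sy. In NeuPSL,
% E_sy is a hinge-loss MRF energy, so inference remains convex once the
% neural outputs are fixed.
```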
Again, this is pretty fresh work. I'm going to show just a little experimental evaluation; I have a nicer one, but it takes longer to explain. So, just one set of experiments, where we were doing collective classification and comparing to a simple label propagation algorithm, existing NeSy approaches, and a GCN. We can clearly do way better than the other neuro-symbolic approaches; against the GCN, yeah, we're doing a little bit better. But the NeuPSL model is super simple, and the simplicity is part of the attraction.

Let me now turn to the topic of scaling. We're interested in doing these big graph problems, but we want to make sure that we can do them in a reasonable amount of time. The key issue is this: we have our weighted logical rules and we have some data, and there are two processes that happen. One is the instantiation, the grounding, of the model, and then there's doing the inference in the model. So if we want to scale this, we can scale two aspects of it: one is the inference, which I'll talk about first, and the other is the grounding, which I'll also talk about.

So, inference. Remember, one of my claims was that we went from a combinatorial optimisation problem to a convex optimisation problem. Already, yeah, that's pretty good in terms of scalability, but it's still potentially a huge linear program. So what we can do is make use of the fine-grained structure in these models, and we exploit this by decomposing. We've looked at a bunch of optimisation techniques, but the first one we looked at that was really beneficial was ADMM, the alternating direction method of multipliers. It's something that's used a lot in machine learning. The idea is very simple: you decompose the problem, you solve the subproblems, and then you do some message passing. And the nice thing is that, for a convex problem, it's guaranteed to converge.

The trick is: how do you make sure the subproblems are fast? This is where the hinge losses come in. If you break the problem down and look at these hinges, a lot of them are just trivial, so you can ignore them, get to a much smaller set of constraints, and then solve that.
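Here is a heavily simplified sketch of that decomposition idea: a generic consensus-ADMM loop on a one-variable toy objective with two hinge potentials. The potentials, the penalty parameter rho, and the iteration count are illustrative assumptions; the real system solves many such subproblems, in parallel, over the ground model.

```python
# Toy consensus-ADMM sketch of how hinge-loss inference decomposes.
# The two potentials, rho, and iteration count are illustrative assumptions.

def prox_hinge(v, rho, w, c, sign):
    """argmin_x  w*max(0, sign*(x - c)) + (rho/2)*(x - v)^2  (1-D, closed form)."""
    shifted = v - sign * w / rho
    if sign * (shifted - c) > 0:
        return shifted          # minimiser lies in the sloped region
    if sign * (v - c) > 0:
        return c                # minimiser sits at the kink
    return v                    # hinge is inactive

# Potentials: 1.0*max(0, x - 0.3) and 2.0*max(0, 0.6 - x); optimum is x = 0.6.
potentials = [(1.0, 0.3, +1), (2.0, 0.6, -1)]   # (weight, offset, sign)
rho = 1.0
z = 0.0                  # consensus variable
x = [0.0, 0.0]           # local copies, one per potential
u = [0.0, 0.0]           # scaled dual variables

for _ in range(200):
    for j, (w, c, s) in enumerate(potentials):
        x[j] = prox_hinge(z - u[j], rho, w, c, s)      # local solves (parallelisable)
    z = sum(xj + uj for xj, uj in zip(x, u)) / len(x)  # consensus averaging
    for j in range(len(x)):
        u[j] += x[j] - z                               # dual updates

print(round(z, 2))  # approaches 0.6, where the heavier hinge wins
```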
The first set of results I'm going to show compares an off-the-shelf convex optimiser, a commercial-grade optimiser, and our ADMM implementation. And the first thing you see is that for something with 150,000 potentials, the off-the-shelf convex optimiser took 3 hours, the commercial-grade one took 3 minutes, and ours took a few seconds. For something even larger, with 800 million potentials, ours could do it in a minute, while the others just can't even handle it. So that was good.

However: of the total time to solve that 800-million-potential problem, one minute was doing the optimisation and 5 minutes was doing the grounding. So the next thing that we looked at was: how do we speed up grounding? This is very much work with my current PhD student, Eriq Augustine. And you can see why grounding is an issue, because even in the simplest case, where I'm potentially doing pairwise similarity, if I have n things that I'm comparing, I'm going to get n squared groundings. And that's just a simple setting; it can get much worse than that.

So what can we do to speed this up? Well, the first thing one can do, and it's done in a lot of different areas and of course goes by a lot of different terms, is blocking. Blocking is the simple idea that, in, say, the simple pairwise-comparison case, you say: okay, I'm not going to do all comparisons; I'm going to somehow break the data into blocks and only do the comparisons within blocks. Within databases there are approaches, within data mining there are approaches, within theoretical computer science there are approaches; we'll use these. And in the implementation of PSL, we can implement that blocking as a blocking predicate. We can form all of these rules as conjunctive queries, give them to the database, and give this as a hint to the database for optimising. This can make a huge, huge difference in performance. Here are a couple of different settings: first off, notice that this is a log scale where we're comparing performance, and making use of this can give you significant speed-ups. So this was the first thing that we looked at.
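A tiny sketch of the blocking idea; the blocking key (a shared first letter) and the records are illustrative assumptions, since real systems use domain-appropriate keys or database indexes.

```python
# Toy sketch of blocking for pairwise similarity grounding.
# The blocking key and records are illustrative assumptions.
from collections import defaultdict
from itertools import combinations

records = ["alice", "alicia", "bob", "bobby", "carol"]

blocks = defaultdict(list)
for r in records:
    blocks[r[0]].append(r)      # blocking predicate: shared first letter

candidate_pairs = [pair for block in blocks.values()
                   for pair in combinations(block, 2)]

print(len(candidate_pairs), "pairs instead of",
      len(records) * (len(records) - 1) // 2)
# Only pairs inside a block are ever grounded, cutting the n^2 blow-up.
```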
Just to give you a little sense of the community, and where the SRL community has been in terms of the scale of the problems: some of the work with MLNs in Alchemy, in 2007, was doing problems that had a million potentials, and doing it in about 400 minutes. Then Tuffy came along, an implementation that sped things up quite a bit and scaled things. In 2015, foxPSL, a distributed implementation of PSL, sped things up further. And then in 2018, the results that used ADMM were able to speed this up and do larger problems; that's version 2.1. In version 2.2, where the main changes were smarter memory management and systems issues, we were able to do 150 million potentials in 10 minutes.

But we can go further, and we can go further by being even smarter about how we do our query generation for grounding. This is nice work that makes use of ideas from the database community on multi-query optimisation. The idea is: we have all these different rules; we can generate potential query plans for them, and then we can look for opportunities to reuse the queries, and, of the different sets of queries that cover them, find the one that is expected to perform the best. Again, finding that is a challenging computational problem in itself.

Here are some representative results. I know these are too small to see, but one of the things that's interesting is: we just added a bunch of overhead, searching over different query plans, and you would think you might pay a price for doing that. It turns out the price isn't much, and on the larger data sets you get significant speed-ups. This is going to depend on the problem; it could end up being even more. So from this, we add a new line into our benchmarks, and we're able to do these 150-million-potential problems in 2 minutes for grounding.

But wait, there's more. We can again make use of ideas from database theory to go beyond this, with something that we call tandem inference. The idea of tandem inference is this: the picture that I showed really has a blocking step, where first we have to ground the problem and only then do we start doing inference. Well, what if we do streaming grounding and streaming inference? So, as we start grounding, we start solving the problem.
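Conceptually, the change looks something like this sketch. The toy potentials and the SGD step are illustrative assumptions, not the actual PSL implementation; the point is that groundings are consumed as a stream, so optimisation starts before grounding finishes and nothing needs to be fully materialised in memory.

```python
import random

# Toy tandem inference: grounding yields potentials lazily (a generator),
# and projected SGD consumes them as they stream in.

def stream_groundings(n):
    """Lazily yield ground hinge potentials w * max(0, x[i] - x[j]).
    In the real system these would stream out of database queries."""
    for _ in range(n):
        i, j = random.sample(range(5), 2)
        yield (1.0, i, j)                   # (weight, head var, body var)

x = [random.random() for _ in range(5)]     # soft truth values in [0, 1]
lr = 0.05

for w, i, j in stream_groundings(2000):     # inference starts immediately;
    if x[i] > x[j]:                         # no full grounding is ever stored
        x[i] -= lr * w                      # subgradient step on the hinge
        x[j] += lr * w
    x[i] = min(1.0, max(0.0, x[i]))         # project back onto [0, 1]
    x[j] = min(1.0, max(0.0, x[j]))

print([round(v, 2) for v in x])             # the values get pulled together
```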
Well, we can do that. The black line here is tandem inference, and there are two different inference methods: the ADMM that I showed before, and another one, stochastic gradient descent. Across a bunch of different canonical data sets, we're able to get solutions before the others have even completed the grounding process. So that's exciting. But in addition, we're at the point where we can now do problems that are too big to fit into main memory. We're able to take these huge problems and, instead of putting everything into main memory, give the solver a small, bounded amount of space and page things in and out. And with this, we're able to scale to things that have billions of potentials, and do them in the same time as things that were several orders of magnitude smaller before. So this whole progression we find exciting, and we're pretty sure, also from people that we've talked to at various companies, that we're the ones doing the largest problems of this kind and this complexity.

So, I'm not going to talk about online PSL; we also have methods for changing the models on the fly. What I do want to talk about briefly is some of the applications that we've done, because I'm actually really proud of the work that my group has done using PSL on a rich collection of problems. We've done things in computational biology and health informatics. We've done a fair bit of work on natural language processing and, not surprisingly, recommender systems. We've also done work in computer vision, energy and the environment, computational social science, fairness (which I'll mention a bit), information integration and extraction, and predicting malicious behaviour.

The one of these that I want to go into a bit is information integration, because I think it connects with research being done here in interesting ways: knowledge graph construction. Knowledge graph construction also connects with the original graph identification pattern that I showed earlier. The idea is that you have a bunch of facts, and from these facts you're trying to extract a knowledge graph. In order to do this, you need to perform collective classification, link prediction, and entity resolution, but you also have to enforce ontological constraints, and you potentially want to reason about the quality of the information, and more. We've done this using PSL for a couple of different knowledge graph extraction problems, and the key takeaway is that we're able to do this,
and that all of the pieces of information are useful: putting the statistical features and the semantic features together really helps. In another kind of knowledge graph task, where we're trying to do knowledge graph completion, so you're predicting links, the key takeaway is this: the standard approaches work when you have a lot of data and the data is clean, but when you have noisy, sparse data, things like PSL can really help.

So, I want to close by talking about opportunities and challenges, and I'll start with the challenges: the perils of ignoring structure. I alluded to one of these before: privacy, and in particular privacy in graphs. I think this is a really important area. A lot of the work in privacy still looks at the independent setting, though there's starting to be more work on privacy in graphs. Some of the work of my former PhD student Elena Zheleva was among the earliest to look at how much information is leaked by the groups that you're a member of. Not surprisingly, it's a lot, actually more than friendship links.

Another important area is fairness. With my former postdoc Golnoosh Farnadi and colleagues, we're pretty sure we were the first to look at relational fairness. Because so much about fairness is taking into account structural inequities, if you don't have a fairness mechanism that can represent the structure, that's going to be problematic. And then finally, causality, and causality in graphs, is important. This is some joint work with colleagues in the database world, where we developed a relational causal system. I really think that the conjunction of all three, privacy, fairness and causality, and understanding how they relate, is important for methods that are able to mitigate these impacts. And it's important to have the background in social theory to be able to do the interpretation.

In terms of opportunities: all of the things that we look at these days are systems. Think biological systems, ecosystems, social systems, and things that combine all of these. And this idea of going from observational data,
doing some sort of graph identification to infer a structure, then doing graph identification or knowledge graph identification on top of that to infer something more, and then closing the loop, where you use that to go back and extract more information, and potentially keep doing this, is important. And this ties back to the four different interpretations that I had at the beginning. The reasoning about entities and relationships, and about social ties, is kind of obvious. Then there's the ability to go from the specific to the abstract, and learning these things. And, yeah, the philosophy one was a little bit dicier, but I do think that going from a certain perspective sometimes found in AI and machine learning, which is very atomistic, to a more, not quite multi-agent, but a systems view, is important.

The last thing is the most important: acknowledgements. My students: that's the best thing about being a professor, the awesome students I get to work with. I'm also fortunate to work with a variety of different companies and organisations. And just to wrap up: I hope that, somewhere between the more tutorial and the more technical parts, you come away with some ideas, ideas for things that integrate probabilistic and logical inference, neural and symbolic reasoning, or data-driven and knowledge-driven approaches. I think there are tons of opportunities in this space.