1
00:00:00,630 --> 00:00:18,960
Okay. So welcome to this term's Strachey Lecture.
2
00:00:18,970 --> 00:00:27,840
This is a distinguished lecture that we have once a term, named after Professor Christopher Strachey, who founded the Programming Research Group here.
3
00:00:28,740 --> 00:00:30,209
I'm going to introduce our speaker.
4
00:00:30,210 --> 00:00:41,970
But before I do that, I would like to thank Oxford Asset Management, who have been generously funding this series of talks since 2015.
5
00:00:42,450 --> 00:00:47,310
And without their funding, we wouldn't be able to have the excellent speakers that we do.
6
00:00:47,760 --> 00:00:53,610
So we're really grateful for that. And also, they brought coffee and doughnuts.
7
00:00:53,610 --> 00:01:01,530
So students and researchers who might be wondering about employment might want to go and chat to them afterwards.
8
00:01:01,530 --> 00:01:04,800
They'll be up at reception.
9
00:01:05,130 --> 00:01:09,360
So you might like to do that. The way we're going to do this is, as usual,
10
00:01:09,840 --> 00:01:17,040
Martin will give his talk, and then afterwards we'll have some microphones that we can bring around so you can ask questions.
11
00:01:18,030 --> 00:01:24,300
So that brings me to introducing Martin Grohe, which is actually a great pleasure.
12
00:01:24,750 --> 00:01:32,670
So Martin holds the Chair for Logic and Theory of Discrete Systems at RWTH Aachen University.
13
00:01:32,880 --> 00:01:42,030
He's well known, actually, in a bunch of different fields that intersect in our department.
14
00:01:42,040 --> 00:01:52,860
So Martin has written the key textbooks in multiple areas, including Parameterised complexity,
15
00:01:53,130 --> 00:02:00,930
which is pretty much the scripture of parameterised complexity, Martin's book, which I think many of us actually own.
16
00:02:01,470 --> 00:02:05,700
He's also got a great book on descriptive complexity and graph structure theory.
17
00:02:06,000 --> 00:02:10,710
And actually, some years ago, I was talking to Martin at a workshop,
18
00:02:11,250 --> 00:02:19,760
and he told me, I'm going to step back from the things that I'm doing and work exclusively on graph isomorphism.
19
00:02:20,250 --> 00:02:23,920
And I remember being immediately jealous and thinking, what a great idea.
20
00:02:23,940 --> 00:02:30,050
Why don't I do that? Martin is someone who has worked on truly deep things.
21
00:02:30,060 --> 00:02:33,660
And so he has had a lot of really great results about graph isomorphism.
22
00:02:34,740 --> 00:02:41,250
But I know just by looking at his DBLP that he's also done something I didn't predict,
23
00:02:41,550 --> 00:02:45,270
which is that after all those kinds of amazing theoretical results,
24
00:02:45,570 --> 00:02:52,530
he has a result in AAAI about graph isomorphism and neural networks, which has about a thousand citations.
25
00:02:52,530 --> 00:03:01,290
And I noticed also that Martin's got a lot of new work on probabilistic databases, so lots to offer.
26
00:03:01,560 --> 00:03:05,490
I'm going to shut up. So I'm only going to mention two of Martin's many awards.
27
00:03:05,490 --> 00:03:10,680
So Martin is the recipient of the Heinz Maier-Leibnitz Prize from the German
28
00:03:10,680 --> 00:03:16,560
Research Foundation for his work in mathematics, and he is also a Fellow of the ACM.
29
00:03:16,860 --> 00:03:22,080
And we're super lucky to have him here today talking about symmetry and similarity.
30
00:03:22,590 --> 00:03:33,310
Oh, good afternoon, everybody.
31
00:03:33,320 --> 00:03:36,790
I'm also very happy to be here. Thank you for the introduction, Leslie.
32
00:03:38,890 --> 00:03:41,049
I will talk about symmetry and similarity.
33
00:03:41,050 --> 00:03:47,860
Just one thing to clarify: before the talk, I heard the doughnuts are only for the students or people looking for employment.
34
00:03:48,880 --> 00:03:52,780
I was counting on them, so you may have to reconsider that.
35
00:03:54,280 --> 00:03:58,209
Okay, so today I'll speak about symmetry and similarity.
36
00:03:58,210 --> 00:04:04,600
Part of it is graph isomorphism, and indeed for a while I mainly worked on that.
37
00:04:05,200 --> 00:04:06,099
But then I mean,
38
00:04:06,100 --> 00:04:14,319
the nice thing about my job is that you can do different things whenever you feel like doing something else and nobody tells you what to do.
39
00:04:14,320 --> 00:04:17,559
So that's, that's what I do.
40
00:04:17,560 --> 00:04:25,090
And as Leslie said, recently I've done some other stuff, but at the end of my talk I will connect to that as well.
41
00:04:25,930 --> 00:04:29,079
So let's talk a little bit about symmetry.
42
00:04:29,080 --> 00:04:36,040
So symmetry is somehow something that has been ingrained in human thought for a long time.
43
00:04:36,040 --> 00:04:40,179
You see this frieze here, which has an obvious symmetry to it.
44
00:04:40,180 --> 00:04:43,329
I hope you can see it.
45
00:04:43,330 --> 00:04:57,850
It's 4,000 years old. So even at that time, people were very aware of it, and somehow found it important enough to put on their pieces of art.
46
00:04:59,050 --> 00:05:04,060
And of course, actually, you find symmetry everywhere, not only in human thought, but in nature.
47
00:05:04,060 --> 00:05:07,630
That's probably where humans get it from.
48
00:05:08,920 --> 00:05:12,280
So it's good that I removed my slide with the butterfly,
49
00:05:12,280 --> 00:05:24,610
which was present in earlier versions of talks about this; but of course, we're in the Museum of Natural History.
50
00:05:25,540 --> 00:05:30,609
That fits. Okay, so we have that. And then of course, we have symmetry in the arts.
51
00:05:30,610 --> 00:05:37,270
We have already seen this on the first slide. Here is a very nice and intricate example as well.
52
00:05:38,780 --> 00:05:43,340
And we have symmetry in mathematics. And that brings us closer to what I'll speak about.
53
00:05:44,270 --> 00:05:46,360
So what is symmetry in mathematics?
54
00:05:46,370 --> 00:05:56,229
It's basically invariance under certain operations, and typically we think of geometric operations like reflections or rotations.
55
00:05:56,230 --> 00:06:01,790
So this octagon, we can basically turn it or we can flip it.
56
00:06:02,450 --> 00:06:07,520
And another very typical geometric thing which.
57
00:06:08,520 --> 00:06:14,579
At least I don't think about it immediately. But it's also something very familiar.
58
00:06:14,580 --> 00:06:23,280
It's translation. So if you imagine this banner is infinite, you can shift it, and there would also be some kind of symmetry.
59
00:06:23,580 --> 00:06:33,989
Okay. And I'm listing these different things just to make the point that symmetry has some abstract properties,
60
00:06:33,990 --> 00:06:38,670
and they will be very important for what follows. So let me just fix them.
61
00:06:38,670 --> 00:06:48,420
So we have a mathematical object, and the symmetries of this object, or, depending on which area we are in, we may also call them automorphisms
62
00:06:48,420 --> 00:06:58,079
of the object, are mappings from the object to itself, and they are closed under composition.
63
00:06:58,080 --> 00:07:06,090
So if you take one symmetry, apply it, and then you apply another one, you get a symmetry as well, and they are also invertible.
64
00:07:06,270 --> 00:07:14,010
You can always go back. I mean, these are the crucial properties they have, and the mathematical structure they form
65
00:07:15,980 --> 00:07:27,560
is known as a group. Okay. And this will be important because, as a general principle, we will think about computing symmetries.
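To make the two crucial properties concrete, here is a minimal sketch (an illustration added to this transcript, not from the lecture): the symmetries of a square, written as permutations of its four corners, are closed under composition and inversion, so they form a group.

```python
from itertools import permutations, product

# The square as a 4-cycle on corners 0-1-2-3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}

def preserves_edges(p):
    """A mapping is a symmetry if it sends edges to edges."""
    return {frozenset((p[u], p[v])) for u, v in map(tuple, edges)} == edges

symmetries = [p for p in permutations(range(4)) if preserves_edges(p)]

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[i]] for i in range(4))

def inverse(f):
    """Undo f: if f sends i to f[i], the inverse sends f[i] back to i."""
    return tuple(sorted(range(4), key=lambda i: f[i]))

# The two crucial properties: closed under composition, and invertible.
assert all(compose(f, g) in symmetries for f, g in product(symmetries, repeat=2))
assert all(inverse(f) in symmetries for f in symmetries)
print(len(symmetries))  # 8: four rotations and four reflections
```

The same brute-force check works for any small graph; only the edge set changes.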
66
00:07:28,070 --> 00:07:35,750
Okay. And as soon as we have a mathematical structure, there's a chance that we can get efficient algorithms.
67
00:07:35,780 --> 00:07:40,790
As a general principle, I think that's fairly reliable.
68
00:07:40,790 --> 00:07:44,359
In particular, if you're in the range of kind of harder problems,
69
00:07:44,360 --> 00:07:49,480
you want to look for mathematical structure and you want to find ways to make it useful.
70
00:07:49,490 --> 00:07:55,040
And that's the guiding principle of much of the research on this stuff.
71
00:07:55,050 --> 00:07:58,370
I'll be talking about today. Okay.
72
00:07:58,370 --> 00:07:58,910
And now.
73
00:07:59,180 --> 00:08:08,360
Okay, we have a group, and the group of all symmetries of an object X is then known as the symmetry group of the object, or the automorphism group.
74
00:08:08,360 --> 00:08:13,730
And actually, in the area I'll be talking about, we would rather say automorphism group.
75
00:08:14,390 --> 00:08:18,200
Okay. And just one last general remark.
76
00:08:19,470 --> 00:08:27,090
Automorphisms, that is, the symmetries of the objects, are closely related to isomorphisms,
77
00:08:27,090 --> 00:08:32,250
which are basically mappings between two objects that preserve all structure.
78
00:08:33,090 --> 00:08:33,990
Okay. If you.
79
00:08:36,970 --> 00:08:44,890
If you forget all the geometry and everything, then an automorphism is a mapping from an object to itself that preserves structure.
80
00:08:45,340 --> 00:08:54,040
And an isomorphism is between two objects, and the two, for the purpose of this talk, are more or less equivalent.
81
00:08:54,070 --> 00:09:00,280
Okay, if we understand the automorphisms, we understand the isomorphisms, and the other way around.
82
00:09:01,120 --> 00:09:06,040
All right. So I want to talk about algorithmic aspects of symmetry.
83
00:09:07,210 --> 00:09:14,140
Okay. And they are mostly captured in an old and well-known problem in computer science.
84
00:09:14,140 --> 00:09:16,150
That's the graph isomorphism problem.
85
00:09:16,900 --> 00:09:26,050
So let me start by telling you what that is and then tell you a little bit about the history of this problem and what we know.
86
00:09:26,500 --> 00:09:30,520
And well, then we'll arrive at some more recent work.
87
00:09:31,240 --> 00:09:35,440
Okay, so a graph, I assume you'll know what that is.
88
00:09:35,440 --> 00:09:40,960
We have a bunch of points, and they are connected by lines, by edges.
89
00:09:41,380 --> 00:09:51,880
And then an automorphism or isomorphism is a mapping on the points, on the vertices, that preserves this edge relation.
90
00:09:52,150 --> 00:09:57,969
Okay. And the graph isomorphism problem, as an algorithmic problem, is the following.
91
00:09:57,970 --> 00:10:05,020
We're given two graphs and we want to decide if they are isomorphic, if they are basically capturing the same structure.
92
00:10:05,500 --> 00:10:09,819
Okay, but we may be given this structure in different representations.
93
00:10:09,820 --> 00:10:16,000
That's why it's not trivial. And if you look at these four graphs here, they are fairly small, just eight vertices.
94
00:10:17,770 --> 00:10:22,360
Well, are they isomorphic or not? It's not obvious.
95
00:10:23,230 --> 00:10:26,320
At least if you look at the last one, it's not obvious at all.
96
00:10:27,730 --> 00:10:30,940
Well, actually, all four of them are isomorphic.
97
00:10:30,940 --> 00:10:35,230
So in some sense they are all the same graph just drawn differently.
98
00:10:36,190 --> 00:10:41,259
But it's not so easy to see this. Let's just check a random one.
99
00:10:41,260 --> 00:10:49,959
How could we check this? The way I would do this just by hand is maybe to number the vertices of the first one.
100
00:10:49,960 --> 00:10:53,780
Let's just do this. Okay.
101
00:10:53,780 --> 00:10:59,090
So those are the vertices. And now we want to map it to the vertices of the second one, the red one.
102
00:10:59,600 --> 00:11:09,530
Can you distinguish the colours? Well, the one in the upper right corner, in a way that basically the same numbering then has the same edges.
103
00:11:09,530 --> 00:11:13,969
So I would maybe start with the one here.
104
00:11:13,970 --> 00:11:17,060
Put the two here. Three.
105
00:11:17,160 --> 00:11:23,260
No, this should work. Four, five, six, seven, eight.
106
00:11:23,260 --> 00:11:29,770
So that's what we can try. Now, what we must check is, for example, here we have an edge between one and two.
107
00:11:31,930 --> 00:11:35,210
Okay. And we want to find that edge here as well. Okay.
108
00:11:35,230 --> 00:11:38,650
We also have one between one and two. Now, let's take a random one.
109
00:11:39,850 --> 00:11:43,450
Two and six. There's no edge. So I did something wrong.
110
00:11:45,110 --> 00:11:49,100
Okay. Yeah, that's the thing with these graphs.
111
00:11:52,060 --> 00:11:56,820
Um. Why did I do this wrong?
112
00:11:58,640 --> 00:12:03,590
So this... this should be six. Did I miss something? Well, okay.
113
00:12:06,080 --> 00:12:09,920
No, I want to figure this out now. Sorry for wasting your time.
114
00:12:10,820 --> 00:12:14,420
So that was bad. Okay, let's.
115
00:12:14,420 --> 00:12:17,600
Let's try it this way. We have one.
116
00:12:18,620 --> 00:12:24,670
One is adjacent, so this should be maybe eight. Okay.
117
00:12:26,110 --> 00:12:31,989
Stop me after about half an hour. Okay.
118
00:12:31,990 --> 00:12:36,910
So we have one maybe, let's say this year is five and this is two.
119
00:12:37,980 --> 00:12:41,190
Then the three was wrong anyway.
120
00:12:43,520 --> 00:12:47,180
So five, eight. So the two we have one, two.
121
00:12:49,020 --> 00:12:52,800
Three. Three should be that. And then.
122
00:12:54,570 --> 00:12:59,100
Six should be there. That is good because it also fits to five.
123
00:13:01,850 --> 00:13:06,479
And this probably must be four. It's...
124
00:13:06,480 --> 00:13:09,960
Then it's between three and eight. No. Okay, I'm messing this up.
125
00:13:10,230 --> 00:13:18,570
Forget it. I leave this as an exercise to you, and an exercise to edit this out of the film.
126
00:13:20,310 --> 00:13:24,270
But anyway. Okay, so you saw my point.
127
00:13:24,270 --> 00:13:28,530
It's a hard problem, right? And we need a computer to solve it.
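For graphs as small as these eight-vertex examples, a computer can settle the question by brute force over all vertex mappings. A rough sketch (an illustration added to this transcript; real solvers are far more clever than this):

```python
from itertools import permutations

def are_isomorphic(n, edges_g, edges_h):
    """Try all n! vertex mappings; hopeless for large n, fine for n = 8."""
    eh = {frozenset(e) for e in edges_h}
    if len({frozenset(e) for e in edges_g}) != len(eh):
        return False  # different numbers of edges: cannot be isomorphic
    return any(
        {frozenset((p[u], p[v])) for u, v in edges_g} == eh
        for p in permutations(range(n))
    )

cycle = [(i, (i + 1) % 8) for i in range(8)]      # the 8-cycle
relabel = (3, 1, 4, 0, 5, 2, 7, 6)                # an arbitrary renaming
shuffled = [(relabel[u], relabel[v]) for u, v in cycle]
path = [(i, i + 1) for i in range(7)]             # an 8-vertex path

print(are_isomorphic(8, cycle, shuffled))  # True: same graph, drawn differently
print(are_isomorphic(8, cycle, path))      # False
```

The point of the rest of the talk is precisely that one can do much better than this factorial search.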
128
00:13:29,160 --> 00:13:33,809
Good. So that's what we want to do. And we have this other problem.
129
00:13:33,810 --> 00:13:41,370
I talked about computing the symmetries, the automorphisms, of one graph; of course, that we can also do.
130
00:13:42,330 --> 00:13:52,590
There's only one problem: there can be too many. Okay. A graph with n vertices may have up to n factorial many automorphisms.
131
00:13:52,590 --> 00:13:56,250
And that's just too many to list them all.
132
00:13:57,750 --> 00:14:03,230
Okay. So what we actually want to do is compute a generating system for them.
133
00:14:03,240 --> 00:14:09,180
So a set of automorphisms such that all others can be obtained as compositions of these.
134
00:14:09,510 --> 00:14:18,210
Okay. And then it turns out that these two problems are equivalent in the sense that if you can solve one efficiently, you can solve the other.
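To see why a small generating system can stand in for a huge group, here is an added sketch (not from the lecture), using the standard fact that a single swap together with a cyclic shift generates all n! permutations, which are exactly the automorphisms of the complete graph on n vertices:

```python
from math import factorial

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def generated_group(generators):
    """Close a set of permutations under composition.

    The set is finite, so closure under composition alone already
    yields inverses (every element has finite order).
    """
    n = len(generators[0])
    group = {tuple(range(n))}
    frontier = list(generators)
    while frontier:
        f = frontier.pop()
        if f in group:
            continue
        group.add(f)
        for g in list(group):
            frontier.append(compose(f, g))
            frontier.append(compose(g, f))
    return group

n = 6
swap = (1, 0) + tuple(range(2, n))    # exchange vertices 0 and 1
rotate = tuple(range(1, n)) + (0,)    # cyclically shift all vertices
group = generated_group([swap, rotate])
print(len(group), factorial(n))  # 720 720: two generators give all n! automorphisms
```

So two automorphisms compactly describe a group of 720, and in general a generating system of polynomial size always exists even when the group itself is astronomically large.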
135
00:14:18,600 --> 00:14:24,390
That's not completely trivial to see, but it's something that has been known for a long time.
136
00:14:25,970 --> 00:14:32,930
Okay. And we want to exploit this because mostly our approach will be to address the second problem,
137
00:14:32,930 --> 00:14:37,810
the automorphism problem, even though in practice we usually want to solve the first one.
138
00:14:38,870 --> 00:14:39,380
Okay.
139
00:14:39,680 --> 00:14:51,379
So the big open question is: is there an efficient algorithm solving this problem? And our model of an efficient algorithm is a polynomial-time algorithm.
140
00:14:51,380 --> 00:14:57,680
Now you can debate if that's a good model. Well, my answer to this is probably not, but it's the best we have.
141
00:14:58,130 --> 00:15:03,200
So we want to go for a polynomial time algorithm for this problem or the other.
142
00:15:03,200 --> 00:15:06,350
the automorphism problem, as they are equivalent.
143
00:15:06,530 --> 00:15:11,480
Okay. Let me just remind you of a little bit.
144
00:15:13,100 --> 00:15:20,000
of complexity that you probably saw in your first theory course, but that's important to set the stage for this problem.
145
00:15:20,810 --> 00:15:26,300
We have the class of polynomial time solvable problems, the ones that we can solve efficiently.
146
00:15:26,720 --> 00:15:32,570
Then we have this class NP, which contains them and many other natural problems.
147
00:15:32,570 --> 00:15:38,600
And you can describe this class as the problems where, well, you may not be able to find a solution efficiently,
148
00:15:38,600 --> 00:15:44,390
but at least once somebody suggests a solution, you can check efficiently whether it is a solution.
149
00:15:44,960 --> 00:15:48,860
Okay. So that's NP.
150
00:15:48,890 --> 00:16:00,650
Many natural problems fall in there. But then we have the class of NP-complete problems at the top here, which are the hardest problems in NP.
151
00:16:01,550 --> 00:16:07,130
So if you can solve one of these efficiently, you can solve all the others in NP also efficiently.
152
00:16:07,550 --> 00:16:13,100
And what happens is that most natural problems that you see in the applications
153
00:16:13,400 --> 00:16:18,710
are either here, they are NP-complete, or here, in P; they are never in between.
154
00:16:19,250 --> 00:16:25,070
Okay, now graph isomorphism is one of the very few problems where we don't know where it belongs.
155
00:16:25,580 --> 00:16:30,230
Okay. And we haven't known for a long time now.
156
00:16:32,330 --> 00:16:38,090
So when I was a student, I was basically learning that probably it's a problem that is in between.
157
00:16:39,230 --> 00:16:43,190
Now, I don't believe this, and I think these days people don't believe this anymore.
158
00:16:43,460 --> 00:16:47,150
I think we all believe it should actually be in polynomial time.
159
00:16:47,150 --> 00:16:51,229
We just can't prove it. But anyway, what we can do,
160
00:16:51,230 --> 00:16:55,370
and that has been proved by Babai in 2016.
161
00:16:56,330 --> 00:17:01,790
So, so not so long ago. And that was a big result. We can prove that it comes close.
162
00:17:01,790 --> 00:17:04,880
It's solvable in what we call quasi polynomial time.
163
00:17:05,420 --> 00:17:11,960
It's still something that has a kind of exponential growth.
164
00:17:12,890 --> 00:17:19,370
But the exponent of all of this is just growing logarithmically.
165
00:17:19,640 --> 00:17:23,270
So that's kind of close to a constant.
166
00:17:23,750 --> 00:17:28,270
And then it would be polynomial. Okay, so that's pretty good.
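In standard notation (added here for reference, not the speaker's slides), the contrast is:

```latex
\text{polynomial time: } n^{O(1)} = 2^{O(\log n)},
\qquad
\text{quasi-polynomial time: } n^{(\log n)^{O(1)}} = 2^{(\log n)^{O(1)}}
```

So the exponent of n grows only polylogarithmically with the input size; if that factor were a constant, the bound would collapse back to polynomial time, which is the sense in which quasi-polynomial "comes close".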
167
00:17:29,540 --> 00:17:38,480
It's a really nice paper. And so there is still progress in this area, but where does the whole thing come from?
168
00:17:38,960 --> 00:17:48,080
Actually, the first papers on computing isomorphisms come from chemistry, or chemical information systems to be precise.
169
00:17:48,440 --> 00:17:58,399
So the chemists have these graph representations of molecules like the one you see there, and so they stored them in databases.
170
00:17:58,400 --> 00:18:05,780
But then you can draw the same molecule in different ways. So when they find a new one, they want to know, is it already in our database or not?
171
00:18:06,110 --> 00:18:10,280
So they have to match it against those in the database. That's the problem they faced.
172
00:18:10,610 --> 00:18:18,589
That's the isomorphism problem. So they came up with some heuristics.
173
00:18:18,590 --> 00:18:22,010
So that's an early reference from the 1950s.
174
00:18:22,010 --> 00:18:29,990
Right. And then the first two papers from computer science about the problem are from 1964.
175
00:18:32,760 --> 00:18:37,950
But at least in theoretical computer science, which is my area.
176
00:18:38,610 --> 00:18:42,000
The problem gained prominence in the 1970s.
177
00:18:42,360 --> 00:18:51,600
Well, in the 1970s, this NP-completeness I was talking about came up, and there was an extremely influential paper by Karp,
178
00:18:51,780 --> 00:18:56,429
and he showed that many problems in combinatorial optimisation,
179
00:18:56,430 --> 00:18:59,190
mostly problems that people had been studying for a long time,
180
00:19:00,240 --> 00:19:09,560
all turned out to be NP-complete. Okay, NP-completeness had just been introduced, but more as an abstract thing.
181
00:19:09,570 --> 00:19:14,850
And then Karp came and said, Oh, look, all these natural problems fall in there.
182
00:19:15,090 --> 00:19:18,240
Okay, great paper, extremely influential paper.
183
00:19:18,510 --> 00:19:22,240
And then he had an open problem. And that was: what about graph isomorphism?
184
00:19:22,290 --> 00:19:30,900
He didn't know. Okay. In 1979, Garey and Johnson wrote a book about NP-completeness, and they still didn't know.
185
00:19:31,230 --> 00:19:37,500
And today we still don't know. It's one of the very few remaining open problems from that book.
186
00:19:38,250 --> 00:19:43,580
And well, then people got frustrated about this.
187
00:19:43,600 --> 00:19:53,339
So here you see the title of a paper that was also influential, called "The Graph Isomorphism Disease", which is kind of funny because, I mean,
188
00:19:53,340 --> 00:20:00,180
the problem had been explicitly stated five years earlier and even then people were tired of it.
189
00:20:01,020 --> 00:20:06,870
Now, I told you about Babai's algorithm from 2016, which is more than five years ago,
190
00:20:06,870 --> 00:20:14,280
and I am presenting it as pretty much a new result, telling you that there's still a lot going on in the area.
191
00:20:14,290 --> 00:20:20,250
So, you know, if you work on a problem for long enough, I guess the time scales shift a little bit.
192
00:20:21,330 --> 00:20:26,940
Okay, so we have that and not so much was happening in the 1970s,
193
00:20:26,940 --> 00:20:33,720
but then starting in the 1980s, there was big progress and that's where group theory came in.
194
00:20:34,050 --> 00:20:38,640
Okay. So people had this idea, okay, we have this group structure, we must use it.
195
00:20:38,640 --> 00:20:43,770
And Babai was the first to have the idea. He suggested the general approach.
196
00:20:44,250 --> 00:20:49,830
And then the really important paper out of that period was Luks's paper,
197
00:20:50,100 --> 00:20:56,730
who showed that if the graphs have bounded degree, then you can solve it in polynomial time.
198
00:20:56,760 --> 00:21:02,879
Now this result on its own is a nice result, but probably not so important.
199
00:21:02,880 --> 00:21:10,140
But what makes this paper extremely important is that he basically introduced a strategy for how to solve it,
200
00:21:10,410 --> 00:21:13,730
a method for how to exploit this group structure.
201
00:21:13,740 --> 00:21:22,710
And that, for example, has also been used by Babai for his quasi-polynomial-time algorithm; much research follows Luks's paper.
202
00:21:22,800 --> 00:21:25,800
So that's an extremely important paper.
203
00:21:26,400 --> 00:21:32,870
And now we make a big step. Not that nothing happened in between, but we come to Babai's result.
204
00:21:33,150 --> 00:21:41,760
He showed that it's in quasi-polynomial time. You see that the means of communication have changed over the years, from typewriter to YouTube.
205
00:21:44,280 --> 00:21:47,909
That was a presentation by Babai on this result.
206
00:21:47,910 --> 00:21:56,070
He gave a couple of seminars at the University of Chicago when he announced this in fall 2015.
207
00:21:56,670 --> 00:22:00,659
And actually we had a Dagstuhl meeting in December 2015.
208
00:22:00,660 --> 00:22:08,969
And basically all afternoons he talked about this. It's an 80-page paper, and he tried to explain it.
209
00:22:08,970 --> 00:22:12,090
I remember that very well.
210
00:22:13,800 --> 00:22:21,120
Okay. So there we are. Let's talk a little bit about
211
00:22:22,440 --> 00:22:29,090
practical stuff. I will move towards more practical stuff towards the end of the talk now.
212
00:22:29,270 --> 00:22:35,530
My interest in graph isomorphism, I should say, is mainly the theoretical one.
213
00:22:35,540 --> 00:22:39,440
It's a problem where we don't know the complexity, and we should.
214
00:22:39,850 --> 00:22:46,040
Okay. And also there's a lot of nice mathematics involved.
215
00:22:46,370 --> 00:22:54,800
But I'm also coming to the more practical things. Now, one thing to note: around the same time that Luks wrote this influential paper,
216
00:22:55,820 --> 00:23:03,500
Brendan McKay came up with a tool for solving graph isomorphism, and already in 1981 it worked very well.
217
00:23:03,530 --> 00:23:07,819
It's called nauty, and it's still one of the best.
218
00:23:07,820 --> 00:23:16,100
I mean, he continued developing it. It's still one of the best tools, and, you know, in most practical situations,
219
00:23:16,340 --> 00:23:22,040
you can just solve isomorphism, which distinguishes it from other NP-hard problems.
220
00:23:22,040 --> 00:23:28,130
You can solve those sometimes, but usually it's also easy to come up with hard instances for them.
221
00:23:28,730 --> 00:23:33,080
It's very difficult to find examples where the tools don't work well.
222
00:23:33,300 --> 00:23:45,350
Okay, so the state of the art is probably a combination of nauty with another tool, which people call Traces.
223
00:23:47,060 --> 00:23:52,260
Okay. So we can solve it well in practice.
224
00:23:52,270 --> 00:23:56,070
Now you may ask why on earth would we want to solve it in the first place?
225
00:23:56,080 --> 00:24:00,690
So let me tell you a little bit about applications of this.
226
00:24:00,690 --> 00:24:06,840
And we have two different types of applications, which is interesting because they are really,
227
00:24:07,290 --> 00:24:12,570
really very different, but both build on the same thing.
228
00:24:12,870 --> 00:24:19,320
And the first are called graph matching applications. You have different graphs, you want to match them, you want to know are they the same?
229
00:24:19,770 --> 00:24:24,239
Okay, the chemical information systems thing I mentioned was of that type.
230
00:24:24,240 --> 00:24:28,290
You have two representations of a molecule and you want to find out
231
00:24:28,300 --> 00:24:32,250
if they are the same. The same thing happens in computer vision.
232
00:24:32,250 --> 00:24:37,319
There's been a lot of practical work on this.
233
00:24:37,320 --> 00:24:41,100
They call it graph matching, actually in computer vision.
234
00:24:43,170 --> 00:24:49,380
Which is completely unrelated to matching in graph theory, by the way.
235
00:24:50,070 --> 00:24:51,960
So, anyway.
236
00:24:52,650 --> 00:25:01,139
So, for example, what they do in, say, face recognition: they represent faces as graphs where they have a node for the nose and the eyes.
237
00:25:01,140 --> 00:25:04,140
So what do I know? And then they want to match the graphs.
238
00:25:04,500 --> 00:25:08,879
And then there are many applications like that in computer vision.
239
00:25:08,880 --> 00:25:13,950
So they want to do this. Or static program analysis is another example:
240
00:25:13,950 --> 00:25:18,930
Malware detection. You may have flow graphs of code and you want to match them.
241
00:25:19,200 --> 00:25:26,250
So that's one typical type of application. And the second, as I said, is quite different: we want to use
242
00:25:26,880 --> 00:25:35,020
symmetries to simplify hard combinatorial problems.
243
00:25:35,320 --> 00:25:42,040
So we can try to use the symmetries of an input instance to a hard problem, say,
244
00:25:42,040 --> 00:25:48,580
to just simplify the instances, maybe by collapsing equivalence classes of vertices.
245
00:25:49,000 --> 00:25:53,580
Or we can just prune search trees if we have backtracking algorithms.
246
00:25:53,590 --> 00:26:01,240
And again, this has many applications, for example in SAT solving, in integer linear programming, and so on.
247
00:26:02,950 --> 00:26:08,440
Okay. So these are the types of applications and.
248
00:26:11,220 --> 00:26:19,170
One thing I noted when I looked at this quite a while ago is that actually, in most of these applications,
249
00:26:19,170 --> 00:26:23,850
you don't require exact isomorphism, exact matching.
250
00:26:24,510 --> 00:26:34,460
It's usually good enough, say, in these computer vision things, if you have an approximate matching. Okay, sometimes you want it to be exact.
251
00:26:34,470 --> 00:26:43,680
Probably the chemical information systems require exact matching; the static program analysis or the computer vision,
252
00:26:43,680 --> 00:26:49,890
probably not. Okay. And then you start to think: what is approximate isomorphism?
253
00:26:50,010 --> 00:26:54,419
And that leads to the second part of my talk on graph similarity.
254
00:26:54,420 --> 00:27:00,240
And actually that led me to think about what graph similarity could probably mean and so on.
255
00:27:00,270 --> 00:27:01,500
So I'll come to that.
256
00:27:01,890 --> 00:27:12,810
But now I want to continue with isomorphism and tell you a little bit about some algorithms and some recent work we've been doing there.
257
00:27:13,170 --> 00:27:22,920
But let's start with a simple old algorithm, also from the 1960s, which
258
00:27:24,330 --> 00:27:28,320
has somehow become quite important in recent years.
259
00:27:28,710 --> 00:27:36,450
That's the Weisfeiler-Leman algorithm, which is also known as colour refinement or naive vertex classification.
260
00:27:36,990 --> 00:27:44,550
And what it does is it computes a colouring of the vertices of the graph in a very simple way, iteratively.
261
00:27:44,700 --> 00:27:51,389
Initially, all vertices get the same colour, and then we repeatedly refine the colouring.
262
00:27:51,390 --> 00:27:57,450
And the way we do this is we look at two vertices that still have the same colour and we give them different
263
00:27:57,450 --> 00:28:04,290
colours if there's some other colour such that these two have a different number of neighbours of that colour.
264
00:28:04,620 --> 00:28:13,349
Okay, and we repeat this. So we basically get a partition of the vertices into colour classes and we repeat it as long as
265
00:28:13,350 --> 00:28:20,010
the partition gets finer and at some point that stops and then we return this stable partition.
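The refinement loop just described fits in a few lines. A minimal sketch (an illustration added to this transcript, not the demo tool shown in the lecture, and without the almost-linear-time tricks mentioned later in the talk):

```python
from collections import Counter

def colour_refinement(adj):
    """Colour refinement (1-dimensional Weisfeiler-Leman).

    adj maps each vertex to its list of neighbours.  Returns the stable
    colouring: a dict from vertex to colour (an int).
    """
    colour = {v: 0 for v in adj}  # initially, every vertex gets the same colour
    while True:
        # A vertex's new colour: its old colour together with the multiset
        # of colours seen among its neighbours.
        sig = {
            v: (colour[v], tuple(sorted(Counter(colour[u] for u in adj[v]).items())))
            for v in adj
        }
        fresh = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {v: fresh[sig[v]] for v in adj}
        if len(set(refined.values())) == len(set(colour.values())):
            return colour  # partition stopped getting finer: stable
        colour = refined

# On a path 0-1-2-3, the two endpoints end up in one colour class
# and the two inner vertices in another.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stable = colour_refinement(path)
print(stable[0] == stable[3], stable[1] == stable[2], stable[0] != stable[1])
# True True True
```

Comparing the multisets of stable colours of two graphs then gives a quick, though incomplete, non-isomorphism test.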
266
00:28:20,070 --> 00:28:25,890
So I can just show you a little demo of that whole GĂ¶del wrote this.
267
00:28:26,160 --> 00:28:33,299
So here we have some graph, random graph with parameters that turn out to be convenient in this context.
268
00:28:33,300 --> 00:28:36,660
Well, here it's actually a tree. Let's do a new one.
269
00:28:36,660 --> 00:28:40,620
A tree is too boring. Is this a tree again? No, we have a triangle at least.
270
00:28:40,890 --> 00:28:46,650
Okay. Anyway, let's look at this. Initially, all vertices have the same colour blue.
271
00:28:46,890 --> 00:28:51,240
Okay, so then we start the refinement. And what should happen now?
272
00:28:51,480 --> 00:28:58,470
For example, this one here has two blue neighbours, so it should get a different colour
273
00:28:58,740 --> 00:29:03,780
than that one, which only has one blue neighbour, or that one that has three blue neighbours.
274
00:29:04,680 --> 00:29:08,190
So let's see what happens now. Yeah.
275
00:29:08,190 --> 00:29:16,290
So here this one is red, this is some ugly greenish colour, and that's another green.
276
00:29:16,710 --> 00:29:26,520
Okay. And now let's see, this one here has a purple neighbour and one of these greenish neighbours,
277
00:29:26,520 --> 00:29:33,839
whereas this one has one of those as well and a green one, so they should get different colours.
278
00:29:33,840 --> 00:29:40,680
Now we do another step and now if we had these three, they have all different colours.
279
00:29:40,680 --> 00:29:46,200
Now of course as the number of colours grows, they get harder to distinguish.
280
00:29:46,200 --> 00:29:52,559
So let me just click through this. We get more and more colours and at some point we're done.
281
00:29:52,560 --> 00:30:02,460
Nothing new happens. Okay, now we could try the same thing again, but basically the partition of nodes into colour classes stays the same.
282
00:30:02,610 --> 00:30:05,810
And that's, that's the stable colouring.
283
00:30:05,940 --> 00:30:13,380
Okay, that's the algorithm. Very simple. Before I tell you what we actually do with it,
284
00:30:13,650 --> 00:30:23,010
let me just mention that this can be implemented to run very efficiently in almost linear time, also with no large constants.
285
00:30:23,010 --> 00:30:29,190
So this works really, really well. Well, almost linear time.
286
00:30:29,190 --> 00:30:39,500
We have a log factor there. So a while back we were wondering if we can get rid of the log factor, and we could show that probably we can't,
287
00:30:39,510 --> 00:30:47,909
at least if we stick to somewhat reasonable algorithms. Now if you start to do arithmetic on the bits of some real numbers,
288
00:30:47,910 --> 00:30:55,320
I can give no guarantees, but the lower bound applies to a fairly wide class of algorithms.
289
00:30:55,620 --> 00:31:04,830
Okay, so we have this very efficient thing, but what we want to do with it, how does it relate to what we're interested in?
290
00:31:04,890 --> 00:31:14,160
Well, basically it detects asymmetry. If two vertices get different colours, then they are structurally different.
291
00:31:14,580 --> 00:31:18,750
So there's no automorphism mapping one to the other. Okay.
292
00:31:19,320 --> 00:31:26,970
And our goal would be to detect all structural differences between the vertices.
293
00:31:27,480 --> 00:31:32,100
Now, unfortunately, uh, this does not always happen.
294
00:31:32,370 --> 00:31:44,459
It actually does happen on, on random graphs. Okay, so on almost all graphs it happens, but then almost all graphs are not particularly interesting.
295
00:31:44,460 --> 00:31:52,450
So this tends not to cover the interesting graphs in particular.
296
00:31:52,560 --> 00:32:03,150
Well, it depends on what you are doing, and you may disagree on that, but anyway, it fails on simple graphs such as this one,
297
00:32:03,180 --> 00:32:08,159
because if you look, colour refinement, or Weisfeiler-Leman, does nothing there.
298
00:32:08,160 --> 00:32:12,780
All the vertices are black, and every vertex has two black neighbours.
299
00:32:12,780 --> 00:32:17,520
So there's nothing to even start with; that's already a stable colouring.
300
00:32:17,790 --> 00:32:23,460
But then of course the vertices in the triangle are structurally different from those in the
301
00:32:24,030 --> 00:32:27,870
four-cycle. So there it fails miserably.
302
00:32:28,260 --> 00:32:36,930
And that also happens elsewhere. Since it's so efficient, we still want to use it in practice and somehow turn it into a reasonable algorithm.
303
00:32:36,930 --> 00:32:41,430
That's what the practical tools do.
304
00:32:41,430 --> 00:32:48,089
Okay, one more thing I wanted to mention: as an isomorphism test,
305
00:32:48,090 --> 00:32:54,390
we would say the algorithm distinguishes two graphs if the colour histograms are different.
306
00:32:54,600 --> 00:33:02,040
Okay? If one graph has more red vertices than the other, they cannot be isomorphic.
307
00:33:03,750 --> 00:33:08,310
Now, if for all colours they have the same number of vertices, we don't know.
308
00:33:08,460 --> 00:33:13,290
So this is what we could regard as an incomplete isomorphism test.
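Put together, refinement plus the histogram comparison gives this incomplete test. Here is a self-contained sketch (the function name and graph representation are my own choices; real implementations are far more efficient). To make colours comparable across the two graphs, the refinement runs on their disjoint union.

```python
from collections import Counter

def wl_distinguishes(adj_g, adj_h):
    """Return True if colour refinement proves the two graphs non-isomorphic.

    False means "don't know" -- the test is incomplete.  adj_g and adj_h
    each map a vertex to a list of its neighbours.
    """
    # Refine the disjoint union so colours are comparable across graphs.
    adj = {(0, v): [(0, u) for u in nbrs] for v, nbrs in adj_g.items()}
    adj.update({(1, v): [(1, u) for u in nbrs] for v, nbrs in adj_h.items()})
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
               for v in adj}
        rename = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: rename[sig[v]] for v in adj}
        stable = len(set(new.values())) == len(set(colour.values()))
        colour = new
        if stable:
            break
    # Compare the colour histograms of the two sides.
    hist_g = Counter(c for (tag, _), c in colour.items() if tag == 0)
    hist_h = Counter(c for (tag, _), c in colour.items() if tag == 1)
    return hist_g != hist_h
```

A six-cycle and two disjoint triangles are both 2-regular, so refinement never gets started and the histograms agree: the test cannot tell them apart, for the same reason as in the triangle-plus-four-cycle example.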
309
00:33:13,920 --> 00:33:17,400
Okay. Now I wanted to go to the practical stuff.
310
00:33:18,030 --> 00:33:23,670
Okay. So we have this algorithm and it's it's very efficient but incomplete.
311
00:33:23,970 --> 00:33:25,020
So what do we do?
312
00:33:25,020 --> 00:33:36,090
We basically integrate this in some kind of backtracking algorithms, and that's what all the tools that are successful in practice do.
313
00:33:36,540 --> 00:33:40,160
So let's just briefly explain how this works.
314
00:33:40,440 --> 00:33:45,209
So we have a graph here. We do colour refinement on this one.
315
00:33:45,210 --> 00:33:50,850
Okay, we have two colour classes, red and blue. Now that doesn't help us much.
316
00:33:51,090 --> 00:34:01,319
Now, if every vertex had a different colour, that would help us, because then we could take another graph, and if it had a different
317
00:34:01,320 --> 00:34:07,620
colour pattern it would be non-isomorphic, and if it had the same colour pattern and all vertices have different colours,
318
00:34:07,620 --> 00:34:12,990
we know exactly what the only possible isomorphism can be, and we can just check it.
319
00:34:13,380 --> 00:34:21,150
So our goal will be to somehow in a canonical way, find a colouring where all vertices have a different colour.
320
00:34:21,810 --> 00:34:27,060
That's what we're trying to do. Now here we have three red vertices that's bad.
321
00:34:27,420 --> 00:34:33,660
So what we say is, let's take one of the red vertices and individualise it.
322
00:34:33,720 --> 00:34:37,890
And that means we just give it a brand-new colour, purple here.
323
00:34:38,370 --> 00:34:43,949
Okay. So now, of course, all three red vertices look the same to us.
324
00:34:43,950 --> 00:34:49,229
So we basically have to branch. We build a search tree: in the second branch,
325
00:34:49,230 --> 00:34:53,190
that vertex will be the new one, and in the third branch it will be that one.
326
00:34:53,820 --> 00:34:58,410
Okay, so we are here and we run colour refinement again.
327
00:34:59,010 --> 00:35:04,739
Now since this vertex is purple, this one knows it's different than the other two.
328
00:35:04,740 --> 00:35:09,899
So it also gets a new colour, light blue. Okay, that still doesn't help.
329
00:35:09,900 --> 00:35:15,360
We still have two red vertices and two dark blue ones, and we do the same thing again.
330
00:35:15,360 --> 00:35:17,760
We say this now becomes pink,
331
00:35:18,810 --> 00:35:27,389
then the neighbour also gets a new colour, and now suddenly we're in a situation where all vertices have different colours, and we expand the full tree.
332
00:35:27,390 --> 00:35:32,700
And then basically with this object we can compare two graphs.
333
00:35:32,910 --> 00:35:37,830
Now that alone wouldn't be very efficient, but what we can do is
334
00:35:38,850 --> 00:35:49,440
prune this tree in clever ways I don't want to get into, so that usually the tree is fairly small, and then this works quite efficiently.
335
00:35:49,770 --> 00:35:54,570
So basically at every node of this tree we run this 1-WL algorithm.
336
00:35:55,410 --> 00:35:59,130
But that's okay because it's so efficient. Good.
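The individualisation step itself is tiny. Here is a sketch of just the branching (the colouring is a vertex-to-colour dict, as in my earlier notation; the pruning and the refinement call on each branch are omitted):

```python
def individualise_branches(colour, target_class):
    """One branching step of individualisation-refinement (a sketch).

    For every vertex in the chosen colour class, produce a copy of the
    colouring in which that vertex gets a fresh, unique colour.  A full
    tool would run colour refinement on each branch and recurse until
    every vertex has its own colour, pruning the tree along the way.
    """
    fresh = max(colour.values()) + 1   # a colour not used so far
    branches = []
    for v in sorted(colour):
        if colour[v] == target_class:
            branch = dict(colour)
            branch[v] = fresh          # individualise this vertex
            branches.append(branch)
    return branches
```

With three vertices in the chosen class, this produces the three branches of the search tree from the slide, one per choice of individualised vertex.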
337
00:36:00,420 --> 00:36:08,370
So now let's move into the theory, very firmly into theory.
338
00:36:09,180 --> 00:36:17,130
First of all, the algorithm has a higher dimensional version, which you could have guessed because I call the previous one one dimensional.
339
00:36:17,460 --> 00:36:23,610
So in the k-dimensional version, instead of vertices we colour k-tuples of vertices.
340
00:36:23,610 --> 00:36:31,320
We can also do this. Then the running time will be something like n to the k, which is still polynomial.
341
00:36:31,650 --> 00:36:41,700
Okay, now if you do this for k equals three, then all algorithms people usually come up with are subsumed by this.
342
00:36:41,700 --> 00:36:48,959
It is stronger than every single algorithm that people suggest to me after my talks on graph isomorphism testing and so on.
343
00:36:48,960 --> 00:36:52,560
So we always have this.
344
00:36:52,800 --> 00:37:01,620
So that's a very powerful algorithm. Actually, a while ago I showed that if you have a graph class that excludes some fixed graph as a minor,
345
00:37:01,800 --> 00:37:07,860
such as planar graphs or graphs of bounded treewidth, so classes like this,
346
00:37:08,010 --> 00:37:13,110
then actually there is a k such that this is a complete isomorphism test.
347
00:37:13,830 --> 00:37:20,040
On the other hand, it was known for a long time that on general graphs it's not. Okay.
348
00:37:21,090 --> 00:37:27,300
So that's combinatorial algorithm. And that's basically where we stand with combinatorial algorithms.
349
00:37:27,930 --> 00:37:37,380
Quite powerful, but not good enough. So let's finally move to the groups. So we know the solution space of our problem:
350
00:37:37,390 --> 00:37:45,000
the automorphism group has a nice structure. Okay, we want to exploit this, but how do we do it?
351
00:37:45,030 --> 00:37:53,020
And that's the challenge, of course. And Luks suggested a divide-and-conquer approach, okay,
352
00:37:53,130 --> 00:38:01,080
for solving this automorphism version. So he wants to compute the automorphism group, but that's equivalent to isomorphism.
353
00:38:01,680 --> 00:38:06,210
All right. So the goal is to compute the automorphism group.
354
00:38:06,690 --> 00:38:16,770
And what we do is we take some supergroup that contains all automorphisms; that may be the group of all permutations of the vertex set,
355
00:38:17,070 --> 00:38:21,420
or it may be something that we obtain in a different way somehow easily.
356
00:38:22,850 --> 00:38:34,460
And then we basically compute a decreasing sequence of groups that gets closer and closer to the actual automorphism group.
357
00:38:35,090 --> 00:38:40,040
And what we do is we kind of exploit the structures of the groups we see.
358
00:38:40,460 --> 00:38:47,900
Well, in groups we can split them up along orbits or blocks, if you know what these are.
359
00:38:48,140 --> 00:38:52,160
Otherwise, just believe me that there are natural,
360
00:38:52,430 --> 00:38:59,240
natural ways to split up permutation groups and somehow reduce the problem size, move to smaller groups.
361
00:38:59,480 --> 00:39:04,880
But then at some point we get stuck and that's when we hit primitive permutation groups.
362
00:39:05,330 --> 00:39:13,430
They are the bottleneck. And then in Luks's setting, in the old paper, basically at that point he just said, okay,
363
00:39:13,430 --> 00:39:22,490
And now if the input graph has this simple structure bounded degree, I don't have to worry about them and I can just use brute force.
364
00:39:23,210 --> 00:39:30,620
And then Babai was much more refined in his algorithm and used some heavy machinery to deal with the primitive groups.
365
00:39:31,430 --> 00:39:36,140
In the end, the classification of finite simple groups,
366
00:39:36,680 --> 00:39:42,889
a monumental piece of work in mathematics, tells us a lot about how these groups can look.
367
00:39:42,890 --> 00:39:47,630
And basically now we can look at the tables of groups that appear and see what we can do with them.
368
00:39:48,980 --> 00:39:52,580
And in a sense, that's what we're doing.
369
00:39:53,330 --> 00:40:01,400
All right. So let's look at the main results that's that builds on this group theoretic approach.
370
00:40:02,300 --> 00:40:06,290
And n is always the number of vertices of the input graph.
371
00:40:06,290 --> 00:40:13,040
And d is usually the maximum degree, the maximum number of neighbours a vertex may have.
372
00:40:13,850 --> 00:40:22,010
Okay, so Luks proved that isomorphism for graphs of maximum degree d can be decided in time roughly n to the d.
373
00:40:22,640 --> 00:40:26,600
Okay. A few years later this has been improved.
374
00:40:26,600 --> 00:40:32,510
Or actually just one year later, to n to the d over log d. Okay.
375
00:40:34,160 --> 00:40:41,960
Babai, many years later, proved that you can do quasi-polynomial time.
376
00:40:43,230 --> 00:40:48,130
In the worst-case runtime, there hasn't been much progress between these two papers.
377
00:40:48,150 --> 00:40:53,040
There have been many interesting papers on graph isomorphism in the time between.
378
00:40:53,040 --> 00:41:05,490
But in some sense, these papers really are immediate successors, if you will.
379
00:41:06,030 --> 00:41:15,380
Okay. So that's where we stand now. If you look at this, let's say the maximum degree is log n.
380
00:41:15,990 --> 00:41:22,170
Okay, then what do we have? Luks tells us we need time n to the log
381
00:41:22,170 --> 00:41:29,070
n, and Babai doesn't care about this and still tells us n to the polylog
382
00:41:29,080 --> 00:41:33,000
n. And that somehow bothered us.
383
00:41:33,000 --> 00:41:38,670
So we looked at this and kind of combined the two results and we proved the following.
384
00:41:39,150 --> 00:41:46,770
If you have graphs of maximum degree d, you can get time n to the polylog of d. Okay?
385
00:41:46,780 --> 00:41:52,259
So that's kind of the minimum of the two.
386
00:41:52,260 --> 00:42:00,000
And for example, for graphs of logarithmic degree, that improves both running times quite a bit.
387
00:42:00,210 --> 00:42:04,260
Okay. And I think I'll
388
00:42:05,450 --> 00:42:10,570
skip the proof of this, or let me just tell you a little bit about the strategy.
389
00:42:10,580 --> 00:42:16,300
I wasn't going to tell you the whole truth anyway, but we basically proceed in three steps.
390
00:42:16,310 --> 00:42:22,190
The first is: we analyse the groups that arise, using the classification of finite simple groups.
391
00:42:22,310 --> 00:42:27,170
We understand the structure of these groups that we need for this divide and conquer stuff.
392
00:42:28,010 --> 00:42:31,220
Then it turns out we can just apply the divide and conquer stuff.
393
00:42:31,490 --> 00:42:37,910
We need one intermediate step, and that may be the key innovation in our work.
394
00:42:37,910 --> 00:42:48,020
We somehow encode parts of these automorphism groups into the graphs, so we augment the graphs by certain tree structures that help us
395
00:42:48,440 --> 00:42:57,139
then, in the third step, to apply basically Babai's techniques. Unfortunately, we couldn't just use Babai's result as a black box,
396
00:42:57,140 --> 00:43:03,080
but we had to dig into this stuff and then actually compute the group in the time we want.
397
00:43:04,460 --> 00:43:09,620
And now with this with this machinery, we could do a bit more.
398
00:43:09,650 --> 00:43:16,970
For example, if you have graph classes excluding a fixed graph as a minor, and this graph has k vertices,
399
00:43:18,230 --> 00:43:23,120
then we also have an algorithm running in time n to the polylog of k.
400
00:43:24,570 --> 00:43:29,879
All right, so if I've lost you here, good news.
401
00:43:29,880 --> 00:43:36,300
We're starting a completely new part of my talk, and we have a nice picture here.
402
00:43:36,300 --> 00:43:40,500
Again, that one is even older, 5,000 years.
403
00:43:41,070 --> 00:43:45,600
Uh, it's a very nice relief. And, and you see the symmetries again.
404
00:43:47,310 --> 00:43:50,520
Okay, so there's basically a symmetry axis in the middle.
405
00:43:50,820 --> 00:43:55,470
Okay. Now, one thing you note, of course, it's it's not an exact symmetry.
406
00:43:56,280 --> 00:44:00,420
Okay? So so if you look here, that's that's different than what you see.
407
00:44:00,750 --> 00:44:03,750
There's actually something something very funny about this.
408
00:44:04,020 --> 00:44:13,020
So basically, if you disregard these small imprecisions, you have the symmetry along this axis,
409
00:44:13,020 --> 00:44:19,259
or so you will think if you look at the heads of the deer; but then if you look at the feet, it's the other way round.
410
00:44:19,260 --> 00:44:24,420
You would have to flip it over like that to match the feet, but then the heads would be wrong.
411
00:44:25,390 --> 00:44:29,010
Well, that's kind of puzzling.
412
00:44:29,010 --> 00:44:35,040
I wonder why they did it this way: whether they hadn't noticed, thought it was funny, or didn't care.
413
00:44:35,430 --> 00:44:47,550
It's interesting. But anyway, my point was: even though this is not a precise symmetry, it's good enough to convey the main point.
414
00:44:48,230 --> 00:44:52,320
The picture still looks fairly symmetric.
415
00:44:52,380 --> 00:44:56,910
Okay. So approximate symmetries can also be good enough.
416
00:44:57,240 --> 00:45:00,510
And in many, many situations they are. Okay.
417
00:45:00,510 --> 00:45:11,399
Now let's look to modern times and look at these three graphs again, showing chemical molecules.
418
00:45:11,400 --> 00:45:15,780
Now, which of these are most similar? Well, you can think about it.
419
00:45:15,780 --> 00:45:21,240
They are all pretty similar in a way. So what shall we say?
420
00:45:22,560 --> 00:45:28,350
Okay, let me be a bit more precise when it comes to designing synthetic fuels.
421
00:45:28,380 --> 00:45:32,580
Now, that's from a collaboration with chemical engineers at my university.
422
00:45:33,810 --> 00:45:41,070
So we looked at things like this and we want to understand which molecules may be good, good fuels in the end.
423
00:45:41,160 --> 00:45:50,940
Okay. It turns out that with respect to the properties we care about here, the two blue ones are the most similar.
424
00:45:51,360 --> 00:45:55,860
Okay, now, I don't understand why that is.
425
00:45:55,860 --> 00:45:59,310
I can't even pronounce the names of these, so how should I?
426
00:46:00,780 --> 00:46:09,629
But we worked on this. So, so basically what we did, we built a machine learning model that kind of tries to predict the properties of these.
427
00:46:09,630 --> 00:46:16,470
And the way this goes is we usually map the graphs into some vector space and there we can then
428
00:46:16,470 --> 00:46:23,280
use standard techniques from machine learning to distinguish between them. And the mapping,
429
00:46:23,700 --> 00:46:31,080
if it is correct that the two blue ones are more similar, should maybe be such that the two blue points are closer together.
430
00:46:31,800 --> 00:46:38,760
Okay. And this then also gives us a method for for quantifying similarity.
431
00:46:38,760 --> 00:46:48,390
We can just map into vector spaces in a way that closeness in this space somehow corresponds to similarity.
432
00:46:49,290 --> 00:47:00,199
Okay. Um, let's leave it there and just think about what, how else we might approach graph similarity,
433
00:47:00,200 --> 00:47:03,979
because I'm sure that's not the first thing you would have thought of.
434
00:47:03,980 --> 00:47:11,930
And if you ask theoretical computer scientists how to quantify similarity or distance between two graphs,
435
00:47:12,320 --> 00:47:20,899
the first answer you get, almost always, is: count the number of edge mismatches.
436
00:47:20,900 --> 00:47:26,209
So you try to align the two graphs, and then you have to delete a few edges from one
437
00:47:26,210 --> 00:47:30,710
and add a few edges to the other, and then you should get the same, or isomorphic, graphs.
438
00:47:31,490 --> 00:47:38,330
Okay. That's what you might call edit distance, similar to edit distance of words.
439
00:47:38,720 --> 00:47:48,950
Okay. Now that's something that seems very natural for us, I think in theoretical computer science and algorithms and so on.
440
00:47:49,190 --> 00:47:56,390
On the other hand, if you think about it, let's say they are fairly close, you only have to flip 5% of the edges.
441
00:47:56,720 --> 00:48:02,510
Now if you flip 5% of the edges on a graph, it can look very, very different.
442
00:48:03,500 --> 00:48:09,050
So it's not clear at all that this is in any way a meaningful measure.
443
00:48:09,260 --> 00:48:13,070
Okay, um, let's look at.
444
00:48:13,460 --> 00:48:18,230
I was hoping to have more. But now I'm apparently stuck.
445
00:48:18,440 --> 00:48:22,670
That's annoying. Okay. Let me just try to.
446
00:48:25,010 --> 00:48:30,380
With the program. Okay.
447
00:48:31,090 --> 00:48:34,659
Oh, now we have to run through the whole thing.
448
00:48:34,660 --> 00:48:39,670
And I just hope that I wasn't just stuck at that slide.
449
00:48:41,480 --> 00:48:47,940
Okay. Okay. It's just for the vote:
450
00:48:47,960 --> 00:48:51,410
What was the best slide in the end? So take notes now.
451
00:48:52,360 --> 00:48:57,310
Uh. Okay. Huh?
452
00:48:58,190 --> 00:49:03,700
It worked. Okay, now, something people do in machine learning is what they call graph kernels.
453
00:49:03,700 --> 00:49:13,270
Basically, for our purposes: they extract features from a graph, maybe even infinitely many, and the features are coded as real numbers.
454
00:49:13,280 --> 00:49:19,420
So, for example, number of triangles in the graph, number of edges, whatever people come up with.
455
00:49:20,350 --> 00:49:23,800
And then you compare these, these vectors of the numbers.
456
00:49:23,980 --> 00:49:30,940
Okay, that gives you some similarity measure. And if the features are meaningful, that should tell you something.
457
00:49:31,950 --> 00:49:35,220
Then there's another step now.
458
00:49:35,820 --> 00:49:42,730
As I described it, people select the features. Now, we could let machine learning select the features.
459
00:49:42,780 --> 00:49:46,079
Okay, so then we get learned embeddings of the graph.
460
00:49:46,080 --> 00:49:49,889
So the features would associate vectors with the graph.
461
00:49:49,890 --> 00:49:59,640
So we embed them in some vector space, and from examples we can tell the model: these two are similar, these are not similar, and so on.
462
00:50:00,360 --> 00:50:05,140
We can try to come up with a good, uh, a good measure.
463
00:50:05,190 --> 00:50:10,739
Okay. So techniques for this are graph neural networks, graph contrastive learning.
464
00:50:10,740 --> 00:50:17,370
So that's stuff we are working on, uh, these days in practice and many other people are as well.
465
00:50:18,180 --> 00:50:23,340
Okay. I grew up as a logician.
466
00:50:23,400 --> 00:50:27,059
Uh, I, I did my Ph.D. in mathematical logic.
467
00:50:27,060 --> 00:50:33,210
So for me, a fairly immediate thing to think about is logical equivalence.
468
00:50:33,720 --> 00:50:38,520
Take equivalences in increasingly stronger logics.
469
00:50:38,520 --> 00:50:48,749
That should give you a feeling of how close two graphs are. Okay, bisimilarity is an equivalence of this kind, and many people here will know what that is.
470
00:50:48,750 --> 00:50:52,860
If you don't. Never mind. It won't play a role in this talk.
471
00:50:53,400 --> 00:51:00,389
Okay, then we have this Weisfeiler-Leman algorithm, which is not a complete isomorphism test.
472
00:51:00,390 --> 00:51:06,300
But maybe we could say if if it does not distinguish two graphs then at least they are similar.
473
00:51:06,570 --> 00:51:11,520
And then we have this hierarchy of algorithms that gives us some kind of distance.
474
00:51:11,520 --> 00:51:14,790
That's something we could do, of course.
475
00:51:14,790 --> 00:51:21,119
Then, if we're more mathematically minded, we can say, okay, now we have these adjacency matrices of the graphs.
476
00:51:21,120 --> 00:51:28,650
We can apply some matrix norms to them; that should give us a distance. That's actually related to this edit distance thing.
477
00:51:29,130 --> 00:51:34,920
Or, if we're more sophisticated, we can think of optimal transport distances.
478
00:51:35,190 --> 00:51:41,880
Okay, so we have all these approaches. Now, what do we do with this?
479
00:51:44,080 --> 00:51:49,479
I think this similarity problem is really, really important for many practical purposes,
480
00:51:49,480 --> 00:51:54,490
mainly for machine learning, because machine learning has always been built on.
481
00:51:56,230 --> 00:52:03,450
similarity in some way. So we should understand it, but there's almost no theory.
482
00:52:03,690 --> 00:52:05,069
So I try to think about it.
483
00:52:05,070 --> 00:52:14,010
And I came up with basically two different ways of thinking about similarity, and I just want to share them with you.
484
00:52:14,310 --> 00:52:20,850
The first I call operational similarity. Under this view, two graphs are similar
485
00:52:20,880 --> 00:52:24,270
if you can transform one easily into the other.
486
00:52:24,630 --> 00:52:27,720
Okay, that's a natural way of thinking about it. For example,
487
00:52:27,720 --> 00:52:36,150
edit distance is of this type: you delete as few edges as possible and add as few edges as possible to get from one to the other.
488
00:52:36,150 --> 00:52:40,080
And if you need only a few operations, then they are close.
489
00:52:42,090 --> 00:52:46,590
So the optimal transport distances I mentioned are also of this type.
490
00:52:47,210 --> 00:52:56,100
Now an additional thing that you get out of this is usually you even get an alignment of the graphs that somehow realises this,
491
00:52:56,100 --> 00:53:01,650
which may be useful sometimes. Then the other side I call declarative similarity.
492
00:53:01,950 --> 00:53:06,000
Under this view, two graphs are similar if they have similar properties.
493
00:53:07,780 --> 00:53:16,150
So the graph kernels, where you collect certain features, the properties, are of this type; logical equivalence is of this type:
494
00:53:16,450 --> 00:53:20,980
the two graphs satisfy the same properties expressible in this logic.
495
00:53:21,520 --> 00:53:27,340
Now, the advantage of this way of thinking about it is that you can actually adapt it to a specific target.
496
00:53:27,430 --> 00:53:31,150
You know which properties are relevant in your practical application.
497
00:53:31,390 --> 00:53:36,630
So you choose these and look at these. Okay, now what do we do with this?
498
00:53:36,640 --> 00:53:42,910
So we can kind of separate the examples I gave you into these two categories?
499
00:53:43,210 --> 00:53:46,630
Now, the question is, are these related?
500
00:53:46,640 --> 00:53:52,640
Right? I actually think that, in some way, they are just two sides of the same coin.
501
00:53:52,660 --> 00:54:04,060
They are dual to one another. And I just take this as a hypothesis and try to turn it into theorems.
502
00:54:04,450 --> 00:54:15,070
In the remaining few minutes, I just want to show you a few results pointing in this direction, that actually there is a relation.
503
00:54:15,880 --> 00:54:21,850
So again, now the next three slides or something will be a bit more technical.
504
00:54:22,660 --> 00:54:26,290
But then, then you're done and hopefully we all get a doughnut.
505
00:54:28,350 --> 00:54:34,230
Okay. So edit distance, I mentioned this, we can write it this way.
506
00:54:35,070 --> 00:54:42,570
So what do we do here? We have two graphs with adjacency matrices A and B.
507
00:54:42,570 --> 00:54:47,309
The adjacency matrix has a one if there's an edge, zero if there's no edge.
508
00:54:47,310 --> 00:54:51,870
But in the two matrices, the vertices may be ordered in different ways.
509
00:54:52,170 --> 00:54:57,460
So we apply a permutation to the first, then compare it to the second.
510
00:54:58,440 --> 00:55:04,290
So take the difference and then take the norm. Basically, count the number of ones or minus ones that remain.
511
00:55:05,190 --> 00:55:09,690
That's what edit distance is, and we have to minimise over all these alignments.
512
00:55:10,590 --> 00:55:15,930
Okay, well we can write this in a pure matrix form.
513
00:55:16,170 --> 00:55:20,760
We minimise over permutation matrices and then take this expression.
514
00:55:21,000 --> 00:55:24,030
Just trust me that this is this is right.
515
00:55:24,420 --> 00:55:31,829
Okay. Now if you write it this way, one thing you see is you have to minimise over all permutations.
516
00:55:31,830 --> 00:55:38,600
This is incredibly hard in theory and in practice it's a really bad space to minimise over.
517
00:55:38,910 --> 00:55:48,510
So for example, one fact that supports this is that even if the two graphs are just trees,
518
00:55:48,510 --> 00:55:54,329
I mean, usually everything is easy on trees, but here it is even NP-hard to compute this.
519
00:55:54,330 --> 00:56:03,390
Okay, so even if it were a good measure of similarity, it would be kind of useless because you can't compute it.
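To make this concrete, here is the definition computed literally, minimising the number of edge mismatches over all n! alignments. The matrix representation and function name are my own choices; this is only feasible for toy graphs, which is exactly the point.

```python
import itertools

def edit_distance(A, B):
    """Graph edit distance by brute force; only feasible for tiny graphs.

    A, B: symmetric 0/1 adjacency matrices (lists of lists) of equal size n.
    Minimises the number of edge mismatches over all n! vertex alignments.
    """
    n = len(A)
    best = n * n
    for pi in itertools.permutations(range(n)):
        # Count positions where the aligned copy of A disagrees with B;
        # each mismatched undirected edge is counted twice.
        mism = sum(A[pi[i]][pi[j]] != B[i][j]
                   for i in range(n) for j in range(n))
        best = min(best, mism // 2)
    return best
```

A triangle and a three-vertex path have edit distance one (delete one edge), and the loop over all permutations is already six iterations for three vertices.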
520
00:56:04,170 --> 00:56:06,059
Well, then you could say:
521
00:56:06,060 --> 00:56:13,350
we can relax it a little bit and take a convex problem that approximates it, which we can then solve efficiently.
522
00:56:13,740 --> 00:56:23,310
And the way we can do this is instead of minimising over permutation matrices, we minimise over doubly stochastic matrices.
523
00:56:23,490 --> 00:56:31,740
The important thing here is: that's the convex relaxation of this problem, and such problems we can solve.
524
00:56:32,490 --> 00:56:36,240
Okay, so that gives us a new distance measure.
525
00:56:36,360 --> 00:56:41,160
It's not clear what it actually means and what it tells us about the graphs, but we can try, right?
526
00:56:41,520 --> 00:56:45,060
At least we can compute it now. Okay.
527
00:56:45,540 --> 00:56:50,549
So that's one thing we can do. So that's on this operational side, right?
528
00:56:50,550 --> 00:56:56,400
We started from edit distance. Now let's start from the other side: what could be good features?
529
00:56:57,330 --> 00:57:08,670
Well, in general, what turns out to be good, and also related to many other features that people use, is that we count homomorphisms.
530
00:57:08,670 --> 00:57:13,440
So what's a homomorphism? It's a mapping that preserves adjacency.
531
00:57:13,890 --> 00:57:17,879
Okay. So here we have two graphs; that would be a homomorphism.
532
00:57:17,880 --> 00:57:21,030
We mapped the middle vertex here, these two here.
533
00:57:21,300 --> 00:57:26,550
So they are both neighbours and here's an edge and that one here, there's also an edge.
534
00:57:26,550 --> 00:57:29,610
So that's a homomorphism, and now we want to count them.
535
00:57:30,000 --> 00:57:40,260
Okay. So for example, the number of triangles would be the number of homomorphisms from a triangle to our graph, divided by six.
536
00:57:40,260 --> 00:57:44,100
But who cares about dividing by six? Okay, so we can do that.
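Counting homomorphisms by brute force is straightforward for small patterns (the representation and function name are my own choices): try every map of pattern vertices into the host graph and keep those that send every pattern edge to an edge.

```python
import itertools

def count_homomorphisms(pattern_edges, pattern_size, adj):
    """Count homomorphisms from a small pattern graph into a host graph.

    pattern_edges: list of (u, v) pairs over vertices 0..pattern_size-1.
    adj: host graph as a dict mapping each vertex to a list of neighbours
    (symmetric, so each undirected edge needs checking only once).
    """
    hosts = list(adj)
    count = 0
    # Try every map of pattern vertices to host vertices.
    for image in itertools.product(hosts, repeat=pattern_size):
        if all(image[v] in adj[image[u]] for u, v in pattern_edges):
            count += 1
    return count
```

For a triangle pattern, this count is six times the number of triangles, matching the division by six just mentioned; collecting such counts for a family of patterns gives the vector embedding described next.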
537
00:57:44,340 --> 00:57:50,580
And now this gives us a vector embedding. Say we have a class of graphs, the class of all cycles.
538
00:57:50,790 --> 00:57:56,700
Okay? And then we can collect all these homomorphism numbers in one vector.
539
00:57:57,150 --> 00:58:00,270
Could be finite dimensional could be infinite dimensional.
540
00:58:00,840 --> 00:58:04,110
Okay, that gives us an embedding. Okay.
541
00:58:04,110 --> 00:58:07,950
Here's an example for these two graphs. Okay.
542
00:58:08,280 --> 00:58:13,580
And once we have the embedding, then the distance in the vector space induces a distance on the graphs.
543
00:58:13,590 --> 00:58:18,659
I'm close to the end. Okay. So we can do that.
544
00:58:18,660 --> 00:58:24,600
That also gives us a distance; it would be rooted on the declarative side.
545
00:58:24,840 --> 00:58:33,000
And now here's a theorem that says these things, and many other things, are the same in some sense.
546
00:58:33,330 --> 00:58:39,780
The precise statement is: for any two graphs, their distance in the starred sense is zero
547
00:58:40,110 --> 00:58:43,230
if and only if all the homomorphism numbers are the same.
548
00:58:43,950 --> 00:58:47,360
Okay. Now that happens, interestingly,
549
00:58:47,760 --> 00:58:53,100
if and only if the one-dimensional Weisfeiler-Leman algorithm does not distinguish the two.
550
00:58:53,190 --> 00:58:55,310
So there we actually have a very efficient test.
551
00:58:57,120 --> 00:59:06,210
We can also write this as a logical equivalence in a strange logic that has been studied a lot in descriptive complexity theory.
552
00:59:07,020 --> 00:59:14,099
And maybe the most interesting addition here is also this happens if and only if the two
553
00:59:14,100 --> 00:59:19,350
graphs cannot be distinguished by graph neural networks or by a learned embedding.
554
00:59:19,800 --> 00:59:31,050
So that's quite nice, how things fall together here. And I want to stop with that and just do one slide on very general research directions.
555
00:59:32,010 --> 00:59:36,030
So in graph isomorphism there's still a lot happening.
556
00:59:36,040 --> 00:59:42,930
I mean, the problems are really hard and it's hard to make progress, but I think it's very interesting and worthwhile.
557
00:59:42,960 --> 00:59:49,510
So I still work on this, although not exclusively. So for example,
558
00:59:49,510 --> 00:59:53,420
one problem that is interesting, and where not much is known, is the group isomorphism
559
00:59:53,440 --> 01:00:01,210
problem, where instead of graphs you look at groups. Then similarity
560
01:00:02,220 --> 01:00:07,090
is kind of the opposite: it's practically extremely important.
561
01:00:07,120 --> 01:00:13,960
In theory, we know very little, and there's a lot to do and probably a lot of easy progress to be made.
562
01:00:14,650 --> 01:00:21,700
Okay. And here are just two references, survey papers I wrote fairly recently.
563
01:00:21,970 --> 01:00:27,610
You can look at these if you're interested, and that's pretty much it.