1
00:00:00,630 --> 00:00:18,960
Okay. So welcome to this term's Strachey Lecture.
2
00:00:18,970 --> 00:00:27,840
This is a distinguished lecture that we have once a term, named after Professor Christopher Strachey, who founded the Programming Research Group here.
3
00:00:28,740 --> 00:00:30,209
I'm going to introduce our speaker.
4
00:00:30,210 --> 00:00:41,970
But before I do that, I would like to thank Oxford Asset Management, who have been generously funding this series of talks since 2015.
5
00:00:42,450 --> 00:00:47,310
And without their funding, we wouldn't be able to have the excellent speakers that we do.
6
00:00:47,760 --> 00:00:53,610
So we're really grateful for that. And also, they brought coffee and doughnuts.
7
00:00:53,610 --> 00:01:01,530
So students and researchers who might be wondering about employment might want to go and chat to them afterwards.
8
00:01:01,530 --> 00:01:04,800
They'll be up at reception.
9
00:01:05,130 --> 00:01:09,360
So you might like to do that. The way we're going to do this is, as usual,
10
00:01:09,840 --> 00:01:17,040
Martin will give his talk, and then afterwards we'll have some microphones that we can bring around so you can ask questions.
11
00:01:18,030 --> 00:01:24,300
So that brings me to introducing Martin Grohe, which is actually a great pleasure.
12
00:01:24,750 --> 00:01:32,670
So Martin holds the Chair for Logic and Theory of Discrete Systems at RWTH Aachen University.
13
00:01:32,880 --> 00:01:42,030
He's well known, actually, in a bunch of different fields that intersect in our department.
14
00:01:42,040 --> 00:01:52,860
So Martin has written the key textbooks in multiple areas, including Parameterised complexity,
15
00:01:53,130 --> 00:02:00,930
which is pretty much the scripture of parameterised complexity, Martin's book, which I think many of us actually own.
16
00:02:01,470 --> 00:02:05,700
He's also got a great book on descriptive complexity and graph structure theory.
17
00:02:06,000 --> 00:02:10,710
And actually, some years ago, I was talking to Martin at a workshop,
18
00:02:11,250 --> 00:02:19,760
and he told me, I'm going to step back from the things that I'm doing and work exclusively on graph isomorphism.
19
00:02:20,250 --> 00:02:23,920
And I remember being immediately jealous and thinking, what a great idea.
20
00:02:23,940 --> 00:02:30,050
Why don't I do that? Martin is someone who has worked on truly deep things.
21
00:02:30,060 --> 00:02:33,660
And so he has had a lot of really great results about graph isomorphism.
22
00:02:34,740 --> 00:02:41,250
But I know just by looking at his DBLP that he's also done something I didn't predict,
23
00:02:41,550 --> 00:02:45,270
which is that after all those kinds of amazing theoretical results,
24
00:02:45,570 --> 00:02:52,530
he has a result in AAAI about graph isomorphism and neural networks, which has about a thousand citations.
25
00:02:52,530 --> 00:03:01,290
And I noticed also that Martin's got a lot of new work on probabilistic databases, so lots to offer.
26
00:03:01,560 --> 00:03:05,490
I'm going to shut up. So I'm only going to mention two of Martin's many awards.
27
00:03:05,490 --> 00:03:10,680
So Martin is the recipient of the Heinz Maier-Leibnitz Prize from the German
28
00:03:10,680 --> 00:03:16,560
Research Foundation for his work in mathematics, and he is also a Fellow of the ACM.
29
00:03:16,860 --> 00:03:22,080
And we're super lucky to have him here today talking about symmetry and similarity.
30
00:03:22,590 --> 00:03:33,310
Oh, good afternoon, everybody.
31
00:03:33,320 --> 00:03:36,790
I'm also very happy to be here. Thank you for the introduction, Leslie.
32
00:03:38,890 --> 00:03:41,049
I will talk about symmetry and similarity.
33
00:03:41,050 --> 00:03:47,860
Just one thing to clarify: before the talk, I heard the doughnuts are only for the students or people looking for employment.
34
00:03:48,880 --> 00:03:52,780
I was counting on them, so you may have to reconsider that.
35
00:03:54,280 --> 00:03:58,209
Okay, so today I'll speak about symmetry and similarity.
36
00:03:58,210 --> 00:04:04,600
Part of it is graph isomorphism, and indeed for a while I mainly worked on that.
37
00:04:05,200 --> 00:04:06,099
But then I mean,
38
00:04:06,100 --> 00:04:14,319
the nice thing about my job is that you can do different things whenever you feel like doing something else and nobody tells you what to do.
39
00:04:14,320 --> 00:04:17,559
So that's, that's what I do.
40
00:04:17,560 --> 00:04:25,090
And as Leslie said, recently I've done some other stuff, but at the end of my talk I will connect to that as well.
41
00:04:25,930 --> 00:04:29,079
So let's talk a little bit about symmetry.
42
00:04:29,080 --> 00:04:36,040
So symmetry is somehow something that has been ingrained in human thought for a long time.
43
00:04:36,040 --> 00:04:40,179
You see this frieze here, which has an obvious symmetry to it.
44
00:04:40,180 --> 00:04:43,329
I hope you can see it.
45
00:04:43,330 --> 00:04:57,850
It's 4,000 years old. So even at that time, people were very aware of it, and somehow found it important enough to put on their pieces of art.
46
00:04:59,050 --> 00:05:04,060
And of course, actually, you find symmetry everywhere, not only in human thought, but in nature.
47
00:05:04,060 --> 00:05:07,630
That's probably where humans get it from.
48
00:05:08,920 --> 00:05:12,280
So it's good that I removed my slide with the butterfly,
49
00:05:12,280 --> 00:05:24,610
which was present in earlier versions of talks about this; but of course, we're in the Museum of Natural History.
50
00:05:25,540 --> 00:05:30,609
That fits. Okay, so we have that. And then of course, we have symmetry in the arts.
51
00:05:30,610 --> 00:05:37,270
We have already seen this on the first slide. Here is a very nice and intricate example as well.
52
00:05:38,780 --> 00:05:43,340
And we have symmetry in mathematics. And that brings us closer to what I'll speak about.
53
00:05:44,270 --> 00:05:46,360
So what is symmetry in mathematics?
54
00:05:46,370 --> 00:05:56,229
It's basically invariance under certain operations, and typically we think of geometric operations like reflections or rotations.
55
00:05:56,230 --> 00:06:01,790
So this octagon, we can basically turn it or we can flip it.
56
00:06:02,450 --> 00:06:07,520
And another very typical geometric thing which.
57
00:06:08,520 --> 00:06:14,579
At least I don't think about it immediately. But it's also something very familiar.
58
00:06:14,580 --> 00:06:23,280
It's translation. So if you imagine this banner is infinite, you can shift it, and there would also be some kind of symmetry.
59
00:06:23,580 --> 00:06:33,989
Okay. And I'm listing these different things just to make the point that symmetry has some abstract properties,
60
00:06:33,990 --> 00:06:38,670
and they will be very important for what follows. So let me just fix them.
61
00:06:38,670 --> 00:06:48,420
So we have a mathematical object, and the symmetries of this object, or, depending on which area we are in, we may also call them automorphisms
62
00:06:48,420 --> 00:06:58,079
of the object, are mappings from the object to itself, and they are closed under composition.
63
00:06:58,080 --> 00:07:06,090
So if you take one symmetry, apply it, and then you apply another one, you get a symmetry as well, and they are also invertible.
64
00:07:06,270 --> 00:07:14,010
You can always go back. I mean, these are the crucial properties they have, and the mathematical structure they form
65
00:07:15,980 --> 00:07:27,560
is known as a group. Okay. And this will be important because, as a general principle, we will think about computing symmetries.
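To make the two crucial properties concrete, here is a minimal sketch (an illustration added to this transcript, not from the lecture): the symmetries of a square, written as permutations of its four corners, are closed under composition and inversion, so they form a group.

```python
from itertools import permutations, product

# The square as a 4-cycle on corners 0-1-2-3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}

def preserves_edges(p):
    """A mapping is a symmetry if it sends edges to edges."""
    return {frozenset((p[u], p[v])) for u, v in map(tuple, edges)} == edges

symmetries = [p for p in permutations(range(4)) if preserves_edges(p)]

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[i]] for i in range(4))

def inverse(f):
    """Undo f: if f sends i to f[i], the inverse sends f[i] back to i."""
    return tuple(sorted(range(4), key=lambda i: f[i]))

# The two crucial properties: closed under composition, and invertible.
assert all(compose(f, g) in symmetries for f, g in product(symmetries, repeat=2))
assert all(inverse(f) in symmetries for f in symmetries)
print(len(symmetries))  # 8: four rotations and four reflections
```

The same brute-force check works for any small graph; only the edge set changes.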
66
00:07:28,070 --> 00:07:35,750
Okay. And as soon as we have a mathematical structure, there's a chance that we can get efficient algorithms.
67
00:07:35,780 --> 00:07:40,790
As a general principle, I think that's fairly reliable.
68
00:07:40,790 --> 00:07:44,359
In particular, if you're in the range of kind of harder problems,
69
00:07:44,360 --> 00:07:49,480
you want to look for mathematical structure and you want to find ways to make it useful.
70
00:07:49,490 --> 00:07:55,040
And that's the guiding principle of much of the research on this stuff.
71
00:07:55,050 --> 00:07:58,370
I'll be talking about today. Okay.
72
00:07:58,370 --> 00:07:58,910
And now.
73
00:07:59,180 --> 00:08:08,360
Okay, we have a group, and the group of all symmetries of an object X is then known as the symmetry group of the object, or the automorphism group.
74
00:08:08,360 --> 00:08:13,730
And actually, in the area I'll be talking about, we would rather say automorphism group.
75
00:08:14,390 --> 00:08:18,200
Okay. And just one last general remark.
76
00:08:19,470 --> 00:08:27,090
Automorphisms, that is, the symmetries of the objects, are closely related to isomorphisms,
77
00:08:27,090 --> 00:08:32,250
which are basically mappings between two objects that preserve all structure.
78
00:08:33,090 --> 00:08:33,990
Okay. If you.
79
00:08:36,970 --> 00:08:44,890
If you forget all the geometry and everything, then an automorphism is a mapping from an object to itself that preserves structure.
80
00:08:45,340 --> 00:08:54,040
And an isomorphism is between two objects, and the two, for the purpose of this talk, are more or less equivalent.
81
00:08:54,070 --> 00:09:00,280
Okay, if we understand the automorphisms, we understand the isomorphisms, and the other way around.
82
00:09:01,120 --> 00:09:06,040
All right. So I want to talk about algorithmic aspects of symmetry.
83
00:09:07,210 --> 00:09:14,140
Okay. And they are mostly captured in an old and well-known problem in computer science.
84
00:09:14,140 --> 00:09:16,150
That's the graph isomorphism problem.
85
00:09:16,900 --> 00:09:26,050
So let me start by telling you what that is and then tell you a little bit about the history of this problem and what we know.
86
00:09:26,500 --> 00:09:30,520
And well, then we'll arrive at some more recent work.
87
00:09:31,240 --> 00:09:35,440
Okay, so a graph, I assume you'll know what that is.
88
00:09:35,440 --> 00:09:40,960
We have a bunch of points, and they are connected by lines, by edges.
89
00:09:41,380 --> 00:09:51,880
And then an automorphism or isomorphism is a mapping on the points, on the vertices, that preserves this edge relation.
90
00:09:52,150 --> 00:09:57,969
Okay. And the graph isomorphism problem, as an algorithmic problem, is the following.
91
00:09:57,970 --> 00:10:05,020
We're given two graphs and we want to decide if they are isomorphic, if they are basically capturing the same structure.
92
00:10:05,500 --> 00:10:09,819
Okay, but we may be given this structure in different representations.
93
00:10:09,820 --> 00:10:16,000
That's why it's not trivial. And if you look at these four graphs here, they are fairly small, just eight vertices.
94
00:10:17,770 --> 00:10:22,360
Well, are they isomorphic or not? It's not obvious.
95
00:10:23,230 --> 00:10:26,320
At least if you look at the last one, it's not obvious at all.
96
00:10:27,730 --> 00:10:30,940
Well, actually, all four of them are isomorphic.
97
00:10:30,940 --> 00:10:35,230
So in some sense they are all the same graph just drawn differently.
98
00:10:36,190 --> 00:10:41,259
But it's not so easy to see this. Let's just check a random one.
99
00:10:41,260 --> 00:10:49,959
How could we check this? The way I would do this just by hand is maybe to number the vertices of the first one.
100
00:10:49,960 --> 00:10:53,780
Let's just do this. Okay.
101
00:10:53,780 --> 00:10:59,090
So those are the vertices. And now we want to map it to the vertices of the second one, the red one.
102
00:10:59,600 --> 00:11:09,530
Can you distinguish the colours? Well, the one in the upper right corner, in a way that basically the same numbering then has the same edges.
103
00:11:09,530 --> 00:11:13,969
So I would maybe start with the one here.
104
00:11:13,970 --> 00:11:17,060
Put the two here. Three.
105
00:11:17,160 --> 00:11:23,260
No, this should work. Four, five, six, seven, eight.
106
00:11:23,260 --> 00:11:29,770
So that's what we can try. Now, what we must check is, for example, here we have an edge between one and two.
107
00:11:31,930 --> 00:11:35,210
Okay. And we want to find that edge here as well. Okay.
108
00:11:35,230 --> 00:11:38,650
We also have one between one and two. Now, let's take a random one.
109
00:11:39,850 --> 00:11:43,450
Two and six. There's no edge. So I did something wrong.
110
00:11:45,110 --> 00:11:49,100
Okay. Yeah, that's the thing with these graphs.
111
00:11:52,060 --> 00:11:56,820
Um. Why did I do this wrong?
112
00:11:58,640 --> 00:12:03,590
So this... this should be six. Did I miss something? Well, okay.
113
00:12:06,080 --> 00:12:09,920
No, I want to figure this out now. Sorry for wasting your time.
114
00:12:10,820 --> 00:12:14,420
So that was bad. Okay, let's.
115
00:12:14,420 --> 00:12:17,600
Let's try it this way. We have one.
116
00:12:18,620 --> 00:12:24,670
One is adjacent, so this should be maybe eight. Okay.
117
00:12:26,110 --> 00:12:31,989
Stop me after about half an hour. Okay.
118
00:12:31,990 --> 00:12:36,910
So we have one maybe, let's say this year is five and this is two.
119
00:12:37,980 --> 00:12:41,190
Then the three was wrong anyway.
120
00:12:43,520 --> 00:12:47,180
So five, eight. So the two we have one, two.
121
00:12:49,020 --> 00:12:52,800
Three. Three should be that. And then.
122
00:12:54,570 --> 00:12:59,100
Six should be there. That is good because it also fits to five.
123
00:13:01,850 --> 00:13:06,479
And this probably must be four. It's...
124
00:13:06,480 --> 00:13:09,960
Then it's between three and eight. No. Okay, I'm messing this up.
125
00:13:10,230 --> 00:13:18,570
Forget it. I leave this as an exercise to you, and an exercise to edit this out of the film.
126
00:13:20,310 --> 00:13:24,270
But anyway. Okay, so you saw my point.
127
00:13:24,270 --> 00:13:28,530
It's a hard problem, right? And we need a computer to solve it.
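For graphs as small as these eight-vertex examples, a computer can settle the question by brute force over all vertex mappings. A rough sketch (an illustration added to this transcript; real solvers are far more clever than this):

```python
from itertools import permutations

def are_isomorphic(n, edges_g, edges_h):
    """Try all n! vertex mappings; hopeless for large n, fine for n = 8."""
    eh = {frozenset(e) for e in edges_h}
    if len({frozenset(e) for e in edges_g}) != len(eh):
        return False  # different numbers of edges: cannot be isomorphic
    return any(
        {frozenset((p[u], p[v])) for u, v in edges_g} == eh
        for p in permutations(range(n))
    )

cycle = [(i, (i + 1) % 8) for i in range(8)]      # the 8-cycle
relabel = (3, 1, 4, 0, 5, 2, 7, 6)                # an arbitrary renaming
shuffled = [(relabel[u], relabel[v]) for u, v in cycle]
path = [(i, i + 1) for i in range(7)]             # an 8-vertex path

print(are_isomorphic(8, cycle, shuffled))  # True: same graph, drawn differently
print(are_isomorphic(8, cycle, path))      # False
```

The point of the rest of the talk is precisely that one can do much better than this factorial search.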
128
00:13:29,160 --> 00:13:33,809
Good. So that's what we want to do. And we have this other problem.
129
00:13:33,810 --> 00:13:41,370
I talked about computing the symmetries, the automorphisms, of one graph; of course, that we can also do.
130
00:13:42,330 --> 00:13:52,590
There's only one problem: there can be too many. Okay. A graph with n vertices may have up to n factorial many automorphisms.
131
00:13:52,590 --> 00:13:56,250
And that's just too many to list them all.
132
00:13:57,750 --> 00:14:03,230
Okay. So what we actually want to do is compute a generating system for them.
133
00:14:03,240 --> 00:14:09,180
So a set of automorphisms such that all others can be obtained as compositions of these.
134
00:14:09,510 --> 00:14:18,210
Okay. And then it turns out that these two problems are equivalent in the sense that if you can solve one efficiently, you can solve the other.
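To see why a small generating system can stand in for a huge group, here is an added sketch (not from the lecture), using the standard fact that a single swap together with a cyclic shift generates all n! permutations, which are exactly the automorphisms of the complete graph on n vertices:

```python
from math import factorial

def compose(f, g):
    """Apply g first, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def generated_group(generators):
    """Close a set of permutations under composition.

    The set is finite, so closure under composition alone already
    yields inverses (every element has finite order).
    """
    n = len(generators[0])
    group = {tuple(range(n))}
    frontier = list(generators)
    while frontier:
        f = frontier.pop()
        if f in group:
            continue
        group.add(f)
        for g in list(group):
            frontier.append(compose(f, g))
            frontier.append(compose(g, f))
    return group

n = 6
swap = (1, 0) + tuple(range(2, n))    # exchange vertices 0 and 1
rotate = tuple(range(1, n)) + (0,)    # cyclically shift all vertices
group = generated_group([swap, rotate])
print(len(group), factorial(n))  # 720 720: two generators give all n! automorphisms
```

So two automorphisms compactly describe a group of 720, and in general a generating system of polynomial size always exists even when the group itself is astronomically large.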
135
00:14:18,600 --> 00:14:24,390
That's not completely trivial to see, but it's something that has been known for a long time.
136
00:14:25,970 --> 00:14:32,930
Okay. And we want to exploit this because mostly our approach will be to address the second problem,
137
00:14:32,930 --> 00:14:37,810
the automorphism problem, even though in practice we usually want to solve the first one.
138
00:14:38,870 --> 00:14:39,380
Okay.
139
00:14:39,680 --> 00:14:51,379
So the big open question is: is there an efficient algorithm solving this problem? And our model of an efficient algorithm is a polynomial-time algorithm.
140
00:14:51,380 --> 00:14:57,680
Now you can debate if that's a good model. Well, my answer to this is probably not, but it's the best we have.
141
00:14:58,130 --> 00:15:03,200
So we want to go for a polynomial time algorithm for this problem or the other.
142
00:15:03,200 --> 00:15:06,350
the automorphism problem, as they are equivalent.
143
00:15:06,530 --> 00:15:11,480
Okay. Let me just remind you of a little bit.
144
00:15:13,100 --> 00:15:20,000
of complexity that you probably saw in your first theory course, but that's important to set the stage for this problem.
145
00:15:20,810 --> 00:15:26,300
We have the class of polynomial time solvable problems, the ones that we can solve efficiently.
146
00:15:26,720 --> 00:15:32,570
Then we have this class NP, which contains them and many other natural problems.
147
00:15:32,570 --> 00:15:38,600
And you can describe this class as the problems where, well, you may not be able to find a solution efficiently,
148
00:15:38,600 --> 00:15:44,390
but at least once somebody suggests a solution, you can check efficiently whether it is a solution.
149
00:15:44,960 --> 00:15:48,860
Okay. So that's NP.
150
00:15:48,890 --> 00:16:00,650
Many natural problems fall in there. But then we have the class of NP-complete problems at the top here, which are the hardest problems in NP.
151
00:16:01,550 --> 00:16:07,130
So if you can solve one of these efficiently, you can solve all the others in NP also efficiently.
152
00:16:07,550 --> 00:16:13,100
And what happens is that most natural problems that you see in the applications
153
00:16:13,400 --> 00:16:18,710
are either here, they are NP-complete, or here, in P; they are never in between.
154
00:16:19,250 --> 00:16:25,070
Okay, now graph isomorphism is one of the very few problems where we don't know where it belongs.
155
00:16:25,580 --> 00:16:30,230
Okay. And we haven't known for a long time now.
156
00:16:32,330 --> 00:16:38,090
So when I was a student, I was basically learning that probably it's a problem that is in between.
157
00:16:39,230 --> 00:16:43,190
Now, I don't believe this, and I think these days people don't believe this anymore.
158
00:16:43,460 --> 00:16:47,150
I think we all believe it should actually be in polynomial time.
159
00:16:47,150 --> 00:16:51,229
We just can't prove it. But anyway, what we can do,
160
00:16:51,230 --> 00:16:55,370
and that has been proved by Babai in 2016.
161
00:16:56,330 --> 00:17:01,790
So, so not so long ago. And that was a big result. We can prove that it comes close.
162
00:17:01,790 --> 00:17:04,880
It's solvable in what we call quasi polynomial time.
163
00:17:05,420 --> 00:17:11,960
It's still something that has a kind of exponential growth.
164
00:17:12,890 --> 00:17:19,370
But the exponent of all of this is just growing logarithmically.
165
00:17:19,640 --> 00:17:23,270
So that's kind of close to a constant.
166
00:17:23,750 --> 00:17:28,270
And then it would be polynomial. Okay, so that's pretty good.
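In standard notation (added here for reference, not the speaker's slides), the contrast is:

```latex
\text{polynomial time: } n^{O(1)} = 2^{O(\log n)},
\qquad
\text{quasi-polynomial time: } n^{(\log n)^{O(1)}} = 2^{(\log n)^{O(1)}}
```

So the exponent of n grows only polylogarithmically with the input size; if that factor were a constant, the bound would collapse back to polynomial time, which is the sense in which quasi-polynomial "comes close".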
167
00:17:29,540 --> 00:17:38,480
It's a really nice paper. And so there is still progress in this area, but where does the whole thing come from?
168
00:17:38,960 --> 00:17:48,080
Actually, the first papers on computing isomorphisms come from chemistry, or chemical information systems to be precise.
169
00:17:48,440 --> 00:17:58,399
So the chemists have these graph representations of molecules like the one you see there, and so they stored them in databases.
170
00:17:58,400 --> 00:18:05,780
But then you can draw the same molecule in different ways. So when they find a new one, they want to know, is it already in our database or not?
171
00:18:06,110 --> 00:18:10,280
So they have to match it against those in the database. That's the problem they faced.
172
00:18:10,610 --> 00:18:18,589
That's the isomorphism problem. So they came up with some heuristics.
173
00:18:18,590 --> 00:18:22,010
So that's an early reference from the 1950s.
174
00:18:22,010 --> 00:18:29,990
Right. And then the first two papers from computer science about the problem are from 1964.
175
00:18:32,760 --> 00:18:37,950
But at least in theoretical computer science, which is my area.
176
00:18:38,610 --> 00:18:42,000
The problem gained prominence in the 1970s.
177
00:18:42,360 --> 00:18:51,600
Well, in the 1970s, this NP-completeness I was talking about came up, and there was an extremely influential paper by Karp,
178
00:18:51,780 --> 00:18:56,429
and he showed that many problems in combinatorial optimisation,
179
00:18:56,430 --> 00:18:59,190
mostly problems that people had been studying for a long time,
180
00:19:00,240 --> 00:19:09,560
all turned out to be NP-complete. Okay, NP-completeness had just been introduced, but more as an abstract thing.
181
00:19:09,570 --> 00:19:14,850
And then Karp came and said, Oh, look, all these natural problems fall in there.
182
00:19:15,090 --> 00:19:18,240
Okay, great paper, extremely influential paper.
183
00:19:18,510 --> 00:19:22,240
And then he had an open problem. And that was: what about graph isomorphism?
184
00:19:22,290 --> 00:19:30,900
He didn't know. Okay. In 1979, Garey and Johnson wrote a book about NP-completeness, and they still didn't know.
185
00:19:31,230 --> 00:19:37,500
And today we still don't know. It's one of the very few remaining open problems from that book.
186
00:19:38,250 --> 00:19:43,580
And well, then people got frustrated about this.
187
00:19:43,600 --> 00:19:53,339
So here you see the title of a paper that was also influential, called "The Graph Isomorphism Disease", which is kind of funny because, I mean,
188
00:19:53,340 --> 00:20:00,180
the problem had been explicitly stated five years earlier and even then people were tired of it.
189
00:20:01,020 --> 00:20:06,870
Now, I told you about Babai's algorithm from 2016, which is more than five years ago,
190
00:20:06,870 --> 00:20:14,280
and I am presenting it as pretty much a new result, telling you that there's still a lot going on in the area.
191
00:20:14,290 --> 00:20:20,250
So, you know, if you work on a problem for long enough, I guess the time scales shift a little bit.
192
00:20:21,330 --> 00:20:26,940
Okay, so we have that and not so much was happening in the 1970s,
193
00:20:26,940 --> 00:20:33,720
but then starting in the 1980s, there was big progress and that's where group theory came in.
194
00:20:34,050 --> 00:20:38,640
Okay. So people had this idea, okay, we have this group structure, we must use it.
195
00:20:38,640 --> 00:20:43,770
And Babai was the first to have the idea. He suggested the general approach.
196
00:20:44,250 --> 00:20:49,830
And then the really important paper out of that period was Luks's paper,
197
00:20:50,100 --> 00:20:56,730
who showed that if the graphs have bounded degree, then you can solve it in polynomial time.
198
00:20:56,760 --> 00:21:02,879
Now this result on its own is a nice result, but probably not so important.
199
00:21:02,880 --> 00:21:10,140
But what makes this paper extremely important is that he basically introduced a strategy for how to solve it,
200
00:21:10,410 --> 00:21:13,730
a method for how to exploit this group structure.
201
00:21:13,740 --> 00:21:22,710
And that, for example, has also been used by Babai for his quasi-polynomial-time algorithm; much research follows Luks's paper.
202
00:21:22,800 --> 00:21:25,800
So that's an extremely important paper.
203
00:21:26,400 --> 00:21:32,870
And now we make a big step. Not that nothing happened in between, but we come to Babai's result.
204
00:21:33,150 --> 00:21:41,760
He showed that it's in quasi-polynomial time. You see that the means of communication have changed over the years, from typewriter to YouTube.
205
00:21:44,280 --> 00:21:47,909
That was a presentation by Babai on this result.
206
00:21:47,910 --> 00:21:56,070
He gave a couple of seminars at the University of Chicago when he announced this in fall 2015.
207
00:21:56,670 --> 00:22:00,659
And actually we had a Dagstuhl meeting in December 2015.
208
00:22:00,660 --> 00:22:08,969
And basically all afternoons he talked about this. It's an 80-page paper, and he tried to explain it.
209
00:22:08,970 --> 00:22:12,090
I remember that very well.
210
00:22:13,800 --> 00:22:21,120
Okay. So there we are. Let's talk a little bit about
211
00:22:22,440 --> 00:22:29,090
practical stuff. I will move towards more practical stuff towards the end of the talk now.
212
00:22:29,270 --> 00:22:35,530
My interest in graph isomorphism, I should say, is mainly the theoretical one.
213
00:22:35,540 --> 00:22:39,440
It's a problem where we don't know the complexity, and we should.
214
00:22:39,850 --> 00:22:46,040
Okay. And also there's a lot of nice mathematics involved.
215
00:22:46,370 --> 00:22:54,800
But I'm also coming to the more practical things. Now, one thing to note: around the same time that Luks wrote this influential paper,
216
00:22:55,820 --> 00:23:03,500
Brendan McKay came up with a tool for solving graph isomorphism, and already in 1981 it worked very well.
217
00:23:03,530 --> 00:23:07,819
It's called nauty, and it's still one of the best.
218
00:23:07,820 --> 00:23:16,100
I mean, he continued developing it. It's still one of the best tools, and, you know, in most practical situations,
219
00:23:16,340 --> 00:23:22,040
you can just solve isomorphism, which distinguishes it from other NP-hard problems.
220
00:23:22,040 --> 00:23:28,130
You can solve those sometimes, but usually it's also easy to come up with hard instances for them.
221
00:23:28,730 --> 00:23:33,080
It's very difficult to find examples where the tools don't work well.
222
00:23:33,300 --> 00:23:45,350
Okay, so the state of the art is probably a combination of nauty with another tool, which people call Traces.
223
00:23:47,060 --> 00:23:52,260
Okay. So we can solve it well in practice.
224
00:23:52,270 --> 00:23:56,070
Now you may ask why on earth would we want to solve it in the first place?
225
00:23:56,080 --> 00:24:00,690
So let me tell you a little bit about applications of this.
226
00:24:00,690 --> 00:24:06,840
And we have two different types of applications, which is interesting because they are really,
227
00:24:07,290 --> 00:24:12,570
really very different, but both build on the same thing.
228
00:24:12,870 --> 00:24:19,320
And the first are called graph matching applications. You have different graphs, you want to match them, you want to know are they the same?
229
00:24:19,770 --> 00:24:24,239
Okay, the chemical information systems thing I mentioned was of that type.
230
00:24:24,240 --> 00:24:28,290
You have two representations of a molecule and you want to find out
231
00:24:28,300 --> 00:24:32,250
if they are the same. The same thing happens in computer vision.
232
00:24:32,250 --> 00:24:37,319
There's been a lot of practical work on this.
233
00:24:37,320 --> 00:24:41,100
They call it graph matching, actually in computer vision.
234
00:24:43,170 --> 00:24:49,380
Which is completely unrelated to matching in graph theory, by the way.
235
00:24:50,070 --> 00:24:51,960
So, anyway.
236
00:24:52,650 --> 00:25:01,139
So, for example, what they do in, say, face recognition: they represent faces as graphs where they have a node for the nose and the eyes.
237
00:25:01,140 --> 00:25:04,140
So what do I know? And then they want to match the graphs.
238
00:25:04,500 --> 00:25:08,879
And then there are many applications like that in computer vision.
239
00:25:08,880 --> 00:25:13,950
So they want to do this. Or static program analysis is another example:
240
00:25:13,950 --> 00:25:18,930
Malware detection. You may have flow graphs of code and you want to match them.
241
00:25:19,200 --> 00:25:26,250
So that's one typical type of application. And the second, as I said, is quite different: we want to use
242
00:25:26,880 --> 00:25:35,020
symmetries to simplify hard combinatorial problems.
243
00:25:35,320 --> 00:25:42,040
So we can try to use the symmetries of an input instance to a hard problem, say,
244
00:25:42,040 --> 00:25:48,580
to just simplify the instances, maybe by collapsing equivalence classes of vertices.
245
00:25:49,000 --> 00:25:53,580
Or we can just prune search trees if we have backtracking algorithms.
246
00:25:53,590 --> 00:26:01,240
And again, this has many applications, for example in SAT solving, in integer linear programming, and so on.
247
00:26:02,950 --> 00:26:08,440
Okay. So these are the types of applications and.
248
00:26:11,220 --> 00:26:19,170
One thing I noted when I looked at this quite a while ago is that actually, in most of these applications,
249
00:26:19,170 --> 00:26:23,850
you don't require exact isomorphism, exact matching.
250
00:26:24,510 --> 00:26:34,460
It's usually good enough, say, in these computer vision things, if you have an approximate matching. Okay, sometimes you want it to be exact.
251
00:26:34,470 --> 00:26:43,680
Probably the chemical information systems require exact matching; the static program analysis or the computer vision,
252
00:26:43,680 --> 00:26:49,890
probably not. Okay. And then you start to think: what is approximate isomorphism?
253
00:26:50,010 --> 00:26:54,419
And that leads to the second part of my talk on graph similarity.
254
00:26:54,420 --> 00:27:00,240
And actually that led me to think about what graph similarity could probably mean and so on.
255
00:27:00,270 --> 00:27:01,500
So I'll come to that.
256
00:27:01,890 --> 00:27:12,810
But now I want to continue with isomorphism and tell you a little bit about some algorithms and some recent work we've been doing there.
257
00:27:13,170 --> 00:27:22,920
But let's start with a simple old algorithm, also from the 1960s, which
258
00:27:24,330 --> 00:27:28,320
has somehow become quite important in recent years.
259
00:27:28,710 --> 00:27:36,450
That's the Weisfeiler-Leman algorithm, which is also known as colour refinement or naive vertex classification.
260
00:27:36,990 --> 00:27:44,550
And what it does is it computes a colouring of the vertices of the graph in a very simple way, iteratively.
261
00:27:44,700 --> 00:27:51,389
Initially, all vertices get the same colour, and then we repeatedly refine the colouring.
262
00:27:51,390 --> 00:27:57,450
And the way we do this is we look at two vertices that still have the same colour and we give them different
263
00:27:57,450 --> 00:28:04,290
colours if there's some other colour such that these two have a different number of neighbours of that colour.
264
00:28:04,620 --> 00:28:13,349
Okay, and we repeat this. So we basically get a partition of the vertices into colour classes and we repeat it as long as
265
00:28:13,350 --> 00:28:20,010
the partition gets finer and at some point that stops and then we return this stable partition.
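The refinement loop just described fits in a few lines. A minimal sketch (an illustration added to this transcript, not the demo tool shown in the lecture, and without the almost-linear-time tricks mentioned later in the talk):

```python
from collections import Counter

def colour_refinement(adj):
    """Colour refinement (1-dimensional Weisfeiler-Leman).

    adj maps each vertex to its list of neighbours.  Returns the stable
    colouring: a dict from vertex to colour (an int).
    """
    colour = {v: 0 for v in adj}  # initially, every vertex gets the same colour
    while True:
        # A vertex's new colour: its old colour together with the multiset
        # of colours seen among its neighbours.
        sig = {
            v: (colour[v], tuple(sorted(Counter(colour[u] for u in adj[v]).items())))
            for v in adj
        }
        fresh = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {v: fresh[sig[v]] for v in adj}
        if len(set(refined.values())) == len(set(colour.values())):
            return colour  # partition stopped getting finer: stable
        colour = refined

# On a path 0-1-2-3, the two endpoints end up in one colour class
# and the two inner vertices in another.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
stable = colour_refinement(path)
print(stable[0] == stable[3], stable[1] == stable[2], stable[0] != stable[1])
# True True True
```

Comparing the multisets of stable colours of two graphs then gives a quick, though incomplete, non-isomorphism test.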
266
00:28:20,070 --> 00:28:25,890
So I can just show you a little demo of that whole GĂ¶del wrote this.
267
00:28:26,160 --> 00:28:33,299
So here we have some graph, random graph with parameters that turn out to be convenient in this context.
268
00:28:33,300 --> 00:28:36,660
Well, here it's actually a tree. Let's do a new one.
269
00:28:36,660 --> 00:28:40,620
A tree is too boring. Is this a tree again? No, we have a triangle at least.
270
00:28:40,890 --> 00:28:46,650
Okay. Anyway, let's look at this. Initially, all vertices have the same colour blue.
271
00:28:46,890 --> 00:28:51,240
Okay, so then we start the refinement. And what should happen now?
272
00:28:51,480 --> 00:28:58,470
For example, this one here has two blue neighbours, so it should get a different colour
273
00:28:58,740 --> 00:29:03,780
than that one, which only has one blue neighbour, or that one that has three blue neighbours.
274
00:29:04,680 --> 00:29:08,190
So let's see what happens now. Yeah.
275
00:29:08,190 --> 00:29:16,290
So here this one is red, this is some ugly greenish colour, and that's another green.
276
00:29:16,710 --> 00:29:26,520
Okay. And now let's see, this one here has a purple neighbour and one of these greenish neighbours,
277
00:29:26,520 --> 00:29:33,839
whereas this one has one of those as well and a green one, so they should get different colours.
278
00:29:33,840 --> 00:29:40,680
Now we do another step and now if we had these three, they have all different colours.
279
00:29:40,680 --> 00:29:46,200
Now of course as the number of colours grows, they get harder to distinguish.
280
00:29:46,200 --> 00:29:52,559
So let me just click through this. We get more and more colours and at some point we're done.
281
00:29:52,560 --> 00:30:02,460
Nothing new happens. Okay, now we could try the same thing again, but basically the partition of nodes into colour classes stays the same.
282
00:30:02,610 --> 00:30:05,810
And that's, that's the stable colouring.
283
00:30:05,940 --> 00:30:13,380
Okay, that's the algorithm. Very simple. Before I tell you what we actually do with it,
284
00:30:13,650 --> 00:30:23,010
let me just mention that this can be implemented to run very efficiently in almost linear time, also with no large constants.
285
00:30:23,010 --> 00:30:29,190
So this works really, really well. Well, almost linear time.
286
00:30:29,190 --> 00:30:39,500
We have a log factor there. So a while back we were wondering if we can get rid of the log factor, and we could show that probably we can't,
287
00:30:39,510 --> 00:30:47,909
at least if we stick to somewhat reasonable algorithms. Now if you start to do arithmetic on the bits of some real numbers,
288
00:30:47,910 --> 00:30:55,320
I can give no guarantees, but the lower bound applies to a fairly wide class of algorithms.
289
00:30:55,620 --> 00:31:04,830
Okay, so we have this very efficient thing, but what we want to do with it, how does it relate to what we're interested in?
290
00:31:04,890 --> 00:31:14,160
Well, basically it detects asymmetry. If two vertices get different colours, then they are structurally different.
291
00:31:14,580 --> 00:31:18,750
So there's no automorphism mapping one to the other. Okay.
292
00:31:19,320 --> 00:31:26,970
And our goal would be to detect all structural differences between the vertices.
293
00:31:27,480 --> 00:31:32,100
Now, unfortunately, uh, this does not always happen.
294
00:31:32,370 --> 00:31:44,459
It actually does happen on, on random graphs. Okay, so on almost all graphs it happens, but then almost all graphs are not particularly interesting.
295
00:31:44,460 --> 00:31:52,450
So this tends not to cover the interesting graphs in particular.
296
00:31:52,560 --> 00:32:03,150
Well, it depends on what you are doing, and you may disagree on that, but anyway, it fails on simple graphs such as this one,
297
00:32:03,180 --> 00:32:08,159
because if you look, colour refinement, or Weisfeiler-Leman, does nothing there.
298
00:32:08,160 --> 00:32:12,780
All the vertices are black, and every vertex has two black neighbours.
299
00:32:12,780 --> 00:32:17,520
So there's nothing to even start with; that's already a stable colouring.
300
00:32:17,790 --> 00:32:23,460
But then of course the vertices in the triangle are structurally different from those in the
301
00:32:24,030 --> 00:32:27,870
four-cycle. So there it fails miserably.
302
00:32:28,260 --> 00:32:36,930
And that also happens elsewhere. Since it's so efficient, we still want to use it in practice and somehow turn it into a reasonable algorithm.
303
00:32:36,930 --> 00:32:41,430
That's what the practical tools do.
304
00:32:41,430 --> 00:32:48,089
Okay, one more thing I wanted to mention: as an isomorphism test,
305
00:32:48,090 --> 00:32:54,390
we would say the algorithm distinguishes two graphs if the colour histograms are different.
306
00:32:54,600 --> 00:33:02,040
Okay? If one graph has more red vertices than the other, they cannot be isomorphic.
307
00:33:03,750 --> 00:33:08,310
Now, if for all colours they have the same number of vertices, we don't know.
308
00:33:08,460 --> 00:33:13,290
So this is what we could regard as an incomplete isomorphism test.
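Put together, refinement plus the histogram comparison gives this incomplete test. Here is a self-contained sketch (the function name and graph representation are my own choices; real implementations are far more efficient). To make colours comparable across the two graphs, the refinement runs on their disjoint union.

```python
from collections import Counter

def wl_distinguishes(adj_g, adj_h):
    """Return True if colour refinement proves the two graphs non-isomorphic.

    False means "don't know" -- the test is incomplete.  adj_g and adj_h
    each map a vertex to a list of its neighbours.
    """
    # Refine the disjoint union so colours are comparable across graphs.
    adj = {(0, v): [(0, u) for u in nbrs] for v, nbrs in adj_g.items()}
    adj.update({(1, v): [(1, u) for u in nbrs] for v, nbrs in adj_h.items()})
    colour = {v: 0 for v in adj}
    while True:
        sig = {v: (colour[v], tuple(sorted(colour[u] for u in adj[v])))
               for v in adj}
        rename = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        new = {v: rename[sig[v]] for v in adj}
        stable = len(set(new.values())) == len(set(colour.values()))
        colour = new
        if stable:
            break
    # Compare the colour histograms of the two sides.
    hist_g = Counter(c for (tag, _), c in colour.items() if tag == 0)
    hist_h = Counter(c for (tag, _), c in colour.items() if tag == 1)
    return hist_g != hist_h
```

A six-cycle and two disjoint triangles are both 2-regular, so refinement never gets started and the histograms agree: the test cannot tell them apart, for the same reason as in the triangle-plus-four-cycle example.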
309
00:33:13,920 --> 00:33:17,400
Okay. Now I wanted to go to the practical stuff.
310
00:33:18,030 --> 00:33:23,670
Okay. So we have this algorithm and it's it's very efficient but incomplete.
311
00:33:23,970 --> 00:33:25,020
So what do we do?
312
00:33:25,020 --> 00:33:36,090
We basically integrate this in some kind of backtracking algorithms, and that's what all the tools that are successful in practice do.
313
00:33:36,540 --> 00:33:40,160
So let's just briefly explain how this works.
314
00:33:40,440 --> 00:33:45,209
So we have a graph here. We do colour refinement on this one.
315
00:33:45,210 --> 00:33:50,850
Okay, we have two colour classes, red and blue. Now that doesn't help us much.
316
00:33:51,090 --> 00:34:01,319
Now, if every vertex had a different colour, that would help us, because then we could take another graph, and if it had a different
317
00:34:01,320 --> 00:34:07,620
colour pattern it would be non-isomorphic, and if it had the same colour pattern and all vertices have different colours,
318
00:34:07,620 --> 00:34:12,990
we know exactly what the only possible isomorphism can be, and we can just check it.
319
00:34:13,380 --> 00:34:21,150
So our goal will be to somehow in a canonical way, find a colouring where all vertices have a different colour.
320
00:34:21,810 --> 00:34:27,060
That's what we're trying to do. Now here we have three red vertices that's bad.
321
00:34:27,420 --> 00:34:33,660
So what we say is, let's take one of the red vertices and individualise it.
322
00:34:33,720 --> 00:34:37,890
And that means we just give it a brand-new colour, purple here.
323
00:34:38,370 --> 00:34:43,949
Okay. So now, of course, all three red vertices look the same to us.
324
00:34:43,950 --> 00:34:49,229
So we basically have to branch. We build a search tree: in the second branch,
325
00:34:49,230 --> 00:34:53,190
that vertex will be the new one, and in the third branch it will be that one.
326
00:34:53,820 --> 00:34:58,410
Okay, so we are here and we run colour refinement again.
327
00:34:59,010 --> 00:35:04,739
Now since this vertex is purple, this one knows it's different than the other two.
328
00:35:04,740 --> 00:35:09,899
So it also gets a new colour, light blue. Okay, that still doesn't help.
329
00:35:09,900 --> 00:35:15,360
We still have two red vertices and two dark blue ones, and we do the same thing again.
330
00:35:15,360 --> 00:35:17,760
We say this now becomes pink,
331
00:35:18,810 --> 00:35:27,389
then the neighbour also gets a new colour, and now suddenly we're in a situation where all vertices have different colours, and we expand the full tree.
332
00:35:27,390 --> 00:35:32,700
And then basically with this object we can compare two graphs.
333
00:35:32,910 --> 00:35:37,830
Now that alone wouldn't be very efficient, but what we can do is
334
00:35:38,850 --> 00:35:49,440
prune this tree in clever ways I don't want to get into, so that usually the tree is fairly small, and then this works quite efficiently.
335
00:35:49,770 --> 00:35:54,570
So basically at every node of this tree we run this 1-WL algorithm.
336
00:35:55,410 --> 00:35:59,130
But that's okay because it's so efficient. Good.
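The individualisation step itself is tiny. Here is a sketch of just the branching (the colouring is a vertex-to-colour dict, as in my earlier notation; the pruning and the refinement call on each branch are omitted):

```python
def individualise_branches(colour, target_class):
    """One branching step of individualisation-refinement (a sketch).

    For every vertex in the chosen colour class, produce a copy of the
    colouring in which that vertex gets a fresh, unique colour.  A full
    tool would run colour refinement on each branch and recurse until
    every vertex has its own colour, pruning the tree along the way.
    """
    fresh = max(colour.values()) + 1   # a colour not used so far
    branches = []
    for v in sorted(colour):
        if colour[v] == target_class:
            branch = dict(colour)
            branch[v] = fresh          # individualise this vertex
            branches.append(branch)
    return branches
```

With three vertices in the chosen class, this produces the three branches of the search tree from the slide, one per choice of individualised vertex.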
337
00:36:00,420 --> 00:36:08,370
So now let's move into the theory, very firmly into theory.
338
00:36:09,180 --> 00:36:17,130
First of all, the algorithm has a higher dimensional version, which you could have guessed because I call the previous one one dimensional.
339
00:36:17,460 --> 00:36:23,610
So in the k-dimensional version, instead of vertices we colour k-tuples of vertices.
340
00:36:23,610 --> 00:36:31,320
We can also do this. Then the running time will be something like n to the k, which is still polynomial.
341
00:36:31,650 --> 00:36:41,700
Okay, now if you do this for k equals three, then all algorithms people usually come up with are subsumed by this.
342
00:36:41,700 --> 00:36:48,959
It is stronger than every single algorithm that people suggest to me after my talks on graph isomorphism testing and so on.
343
00:36:48,960 --> 00:36:52,560
So we always have this.
344
00:36:52,800 --> 00:37:01,620
So that's a very powerful algorithm. Actually, a while ago I showed that if you have a graph class that excludes some fixed graph as a minor,
345
00:37:01,800 --> 00:37:07,860
such as planar graphs or graphs of bounded treewidth, so classes like this,
346
00:37:08,010 --> 00:37:13,110
then actually there is a k such that this is a complete isomorphism test.
347
00:37:13,830 --> 00:37:20,040
On the other hand, it was known for a long time that on general graphs it's not. Okay.
348
00:37:21,090 --> 00:37:27,300
So that's combinatorial algorithm. And that's basically where we stand with combinatorial algorithms.
349
00:37:27,930 --> 00:37:37,380
Quite powerful, but not good enough. So let's finally move to the groups. So we know the solution space of our problem:
350
00:37:37,390 --> 00:37:45,000
the automorphism group has a nice structure. Okay, we want to exploit this, but how do we do it?
351
00:37:45,030 --> 00:37:53,020
And that's the challenge, of course. And Luks suggested a divide-and-conquer approach, okay,
352
00:37:53,130 --> 00:38:01,080
for solving this automorphism version. So he wants to compute the automorphism group, but that's equivalent to isomorphism.
353
00:38:01,680 --> 00:38:06,210
All right. So the goal is to compute the automorphism group.
354
00:38:06,690 --> 00:38:16,770
And what we do is we take some supergroup that contains all automorphisms; that may be the group of all permutations of the vertex set,
355
00:38:17,070 --> 00:38:21,420
or it may be something that we obtain in a different way somehow easily.
356
00:38:22,850 --> 00:38:34,460
And then we basically compute a decreasing sequence of groups that gets closer and closer to the actual automorphism group.
357
00:38:35,090 --> 00:38:40,040
And what we do is we kind of exploit the structures of the groups we see.
358
00:38:40,460 --> 00:38:47,900
Well, in groups we can split them up along orbits or blocks, if you know what these are.
359
00:38:48,140 --> 00:38:52,160
Otherwise, just believe me that there are natural,
360
00:38:52,430 --> 00:38:59,240
natural ways to split up permutation groups and somehow reduce the problem size, move to smaller groups.
361
00:38:59,480 --> 00:39:04,880
But then at some point we get stuck and that's when we hit primitive permutation groups.
362
00:39:05,330 --> 00:39:13,430
They are the bottleneck. And then in Luks's setting, in the old paper, basically at that point he just said, okay,
363
00:39:13,430 --> 00:39:22,490
And now if the input graph has this simple structure bounded degree, I don't have to worry about them and I can just use brute force.
364
00:39:23,210 --> 00:39:30,620
And then Babai was much more refined in his algorithm and used some heavy machinery to deal with the primitive groups.
365
00:39:31,430 --> 00:39:36,140
In the end, the classification of finite simple groups,
366
00:39:36,680 --> 00:39:42,889
a monumental piece of work in mathematics, tells us a lot about how these groups can look.
367
00:39:42,890 --> 00:39:47,630
And basically now we can look at the tables of groups that appear and see what we can do with them.
368
00:39:48,980 --> 00:39:52,580
And in a sense, that's what we're doing.
369
00:39:53,330 --> 00:40:01,400
All right. So let's look at the main results that's that builds on this group theoretic approach.
370
00:40:02,300 --> 00:40:06,290
And n is always the number of vertices of the input graph.
371
00:40:06,290 --> 00:40:13,040
And d is usually the maximum degree, the maximum number of neighbours a vertex may have.
372
00:40:13,850 --> 00:40:22,010
Okay, so Luks proved that isomorphism for graphs of maximum degree d can be decided in time roughly n to the d.
373
00:40:22,640 --> 00:40:26,600
Okay. A few years later this has been improved.
374
00:40:26,600 --> 00:40:32,510
Or actually just one year later, to n to the d over log d. Okay.
375
00:40:34,160 --> 00:40:41,960
Babai, many years later, proved that you can do quasi-polynomial time.
376
00:40:43,230 --> 00:40:48,130
In the worst-case runtime, there hasn't been much progress between these two papers.
377
00:40:48,150 --> 00:40:53,040
There have been many interesting papers on graph isomorphism in the time between.
378
00:40:53,040 --> 00:41:05,490
But in some sense, these papers really are immediate successors, if you will.
379
00:41:06,030 --> 00:41:15,380
Okay. So that's where we stand now. If you look at this, let's say the maximum degree is log n.
380
00:41:15,990 --> 00:41:22,170
Okay, then what do we have? Luks tells us we need time n to the log
381
00:41:22,170 --> 00:41:29,070
n, and Babai doesn't care about this and still tells us n to the polylog
382
00:41:29,080 --> 00:41:33,000
n. And that somehow bothered us.
383
00:41:33,000 --> 00:41:38,670
So we looked at this and kind of combined the two results and we proved the following.
384
00:41:39,150 --> 00:41:46,770
If you have graphs of maximum degree d, you can get time n to the polylog of d. Okay?
385
00:41:46,780 --> 00:41:52,259
So that's kind of the minimum of the two.
386
00:41:52,260 --> 00:42:00,000
And for example, for graphs of logarithmic degree, that improves both running times quite a bit.
387
00:42:00,210 --> 00:42:04,260
Okay. And I think I'll
388
00:42:05,450 --> 00:42:10,570
skip the proof of this, or let me just tell you a little bit about the strategy.
389
00:42:10,580 --> 00:42:16,300
I wasn't going to tell you the whole truth anyway, but we basically proceed in three steps.
390
00:42:16,310 --> 00:42:22,190
The first is: we analyse the groups that arise, using the classification of finite simple groups.
391
00:42:22,310 --> 00:42:27,170
We understand the structure of these groups that we need for this divide and conquer stuff.
392
00:42:28,010 --> 00:42:31,220
Then it turns out we can just apply the divide and conquer stuff.
393
00:42:31,490 --> 00:42:37,910
We need one intermediate step, and that may be the key innovation in our work.
394
00:42:37,910 --> 00:42:48,020
We somehow encode parts of these automorphism groups into the graphs, so we augment the graphs by certain tree structures that help us
395
00:42:48,440 --> 00:42:57,139
then, in the third step, to apply basically Babai's techniques. Unfortunately, we couldn't just use Babai's result as a black box,
396
00:42:57,140 --> 00:43:03,080
but we had to dig into this stuff and then actually compute the group in the time we want.
397
00:43:04,460 --> 00:43:09,620
And now with this with this machinery, we could do a bit more.
398
00:43:09,650 --> 00:43:16,970
For example, if you have graph classes excluding a fixed graph as a minor, and this graph has k vertices,
399
00:43:18,230 --> 00:43:23,120
then we also have an algorithm running in time n to the polylog of k.
400
00:43:24,570 --> 00:43:29,879
All right, so if I've lost you here, good news.
401
00:43:29,880 --> 00:43:36,300
We're starting a completely new part of my talk, and we have a nice picture here.
402
00:43:36,300 --> 00:43:40,500
Again, that one is even older, 5,000 years.
403
00:43:41,070 --> 00:43:45,600
Uh, it's a very nice relief. And, and you see the symmetries again.
404
00:43:47,310 --> 00:43:50,520
Okay, so there's basically a symmetry axis in the middle.
405
00:43:50,820 --> 00:43:55,470
Okay. Now, one thing you note, of course, it's it's not an exact symmetry.
406
00:43:56,280 --> 00:44:00,420
Okay? So so if you look here, that's that's different than what you see.
407
00:44:00,750 --> 00:44:03,750
There's actually something something very funny about this.
408
00:44:04,020 --> 00:44:13,020
So basically, if you disregard these small imprecisions, you have the symmetry along this axis,
409
00:44:13,020 --> 00:44:19,259
or so you will think if you look at the heads of the deer; but then if you look at the feet, it's the other way round.
410
00:44:19,260 --> 00:44:24,420
You would have to flip it over like that to match the feet, but then the heads would be wrong.
411
00:44:25,390 --> 00:44:29,010
Well, that's kind of puzzling.
412
00:44:29,010 --> 00:44:35,040
I wonder why they did it this way: whether they hadn't noticed, thought it was funny, or didn't care.
413
00:44:35,430 --> 00:44:47,550
It's interesting. But anyway, my point was: even though this is not a precise symmetry, it's good enough to convey the main point.
414
00:44:48,230 --> 00:44:52,320
The picture still looks fairly symmetric.
415
00:44:52,380 --> 00:44:56,910
Okay. So approximate symmetries can also be good enough.
416
00:44:57,240 --> 00:45:00,510
And in many, many situations they are. Okay.
417
00:45:00,510 --> 00:45:11,399
Now let's look to modern times and look at these three graphs again, showing chemical molecules.
418
00:45:11,400 --> 00:45:15,780
Now, which of these are most similar? Well, you can think about it.
419
00:45:15,780 --> 00:45:21,240
They are all pretty similar in a way. So what shall we say?
420
00:45:22,560 --> 00:45:28,350
Okay, let me be a bit more precise when it comes to designing synthetic fuels.
421
00:45:28,380 --> 00:45:32,580
Now, that's from a collaboration with chemical engineers at my university.
422
00:45:33,810 --> 00:45:41,070
So we looked at things like this and we want to understand which molecules may be good, good fuels in the end.
423
00:45:41,160 --> 00:45:50,940
Okay. It turns out that with respect to the properties we care about here, the two blue ones are the most similar.
424
00:45:51,360 --> 00:45:55,860
Okay, now, I don't understand why that is.
425
00:45:55,860 --> 00:45:59,310
I can't even pronounce the names of these, so how should I?
426
00:46:00,780 --> 00:46:09,629
But we worked on this. So, so basically what we did, we built a machine learning model that kind of tries to predict the properties of these.
427
00:46:09,630 --> 00:46:16,470
And the way this goes is we usually map the graphs into some vector space and there we can then
428
00:46:16,470 --> 00:46:23,280
use standard techniques from machine learning to distinguish between them. And the mapping,
429
00:46:23,700 --> 00:46:31,080
if it is correct that the two blue ones are more similar, should maybe be such that the two blue points are closer together.
430
00:46:31,800 --> 00:46:38,760
Okay. And this then also gives us a method for for quantifying similarity.
431
00:46:38,760 --> 00:46:48,390
We can just map into vector spaces in a way that closeness in this space somehow corresponds to similarity.
432
00:46:49,290 --> 00:47:00,199
Okay. Um, let's leave it there and just think about what, how else we might approach graph similarity,
433
00:47:00,200 --> 00:47:03,979
because I'm sure that's not the first thing you would have thought of.
434
00:47:03,980 --> 00:47:11,930
And if you ask theoretical computer scientists how to quantify similarity or distance between two graphs,
435
00:47:12,320 --> 00:47:20,899
the first answer you get, almost always, is: count the number of edge mismatches.
436
00:47:20,900 --> 00:47:26,209
So you try to align the two graphs, and then you have to delete a few edges from one
437
00:47:26,210 --> 00:47:30,710
and add a few edges to the other, and then you should get the same, or isomorphic, graphs.
438
00:47:31,490 --> 00:47:38,330
Okay. That's what you might call edit distance, similar to edit distance of words.
439
00:47:38,720 --> 00:47:48,950
Okay. Now that's something that seems very natural for us, I think in theoretical computer science and algorithms and so on.
440
00:47:49,190 --> 00:47:56,390
On the other hand, if you think about it, let's say they are fairly close, you only have to flip 5% of the edges.
441
00:47:56,720 --> 00:48:02,510
Now if you flip 5% of the edges on a graph, it can look very, very different.
442
00:48:03,500 --> 00:48:09,050
So it's not clear at all that this is in any way a meaningful measure.
443
00:48:09,260 --> 00:48:13,070
Okay, um, let's look at.
444
00:48:13,460 --> 00:48:18,230
I was hoping to have more. But now I'm apparently stuck.
445
00:48:18,440 --> 00:48:22,670
That's annoying. Okay. Let me just try to.
446
00:48:25,010 --> 00:48:30,380
With the program. Okay.
447
00:48:31,090 --> 00:48:34,659
Oh, now we have to run through the whole thing.
448
00:48:34,660 --> 00:48:39,670
And I just hope that I wasn't just stuck at that slide.
449
00:48:41,480 --> 00:48:47,940
Okay. Okay. It's just for the vote:
450
00:48:47,960 --> 00:48:51,410
What was the best slide in the end? So take notes now.
451
00:48:52,360 --> 00:48:57,310
Uh. Okay. Huh?
452
00:48:58,190 --> 00:49:03,700
It worked. Okay, now, something people do in machine learning is what they call graph kernels.
453
00:49:03,700 --> 00:49:13,270
Basically, for our purposes: they extract features from a graph, maybe even infinitely many, and the features are coded as real numbers.
454
00:49:13,280 --> 00:49:19,420
So, for example, number of triangles in the graph, number of edges, whatever people come up with.
455
00:49:20,350 --> 00:49:23,800
And then you compare these, these vectors of the numbers.
456
00:49:23,980 --> 00:49:30,940
Okay, that gives you some similarity measure. And if the features are meaningful, that should tell you something.
457
00:49:31,950 --> 00:49:35,220
Then there's another step now.
458
00:49:35,820 --> 00:49:42,730
As I described it, people select the features. Now, we could let machine learning select the features.
459
00:49:42,780 --> 00:49:46,079
Okay, so then we get learned embeddings of the graph.
460
00:49:46,080 --> 00:49:49,889
So the features would associate vectors with the graph.
461
00:49:49,890 --> 00:49:59,640
So we embed them in some vector space, and from examples we can tell the model: these two are similar, these are not similar, and so on.
462
00:50:00,360 --> 00:50:05,140
We can try to come up with a good, uh, a good measure.
463
00:50:05,190 --> 00:50:10,739
Okay. So techniques for this are graph neural networks, graph contrastive learning.
464
00:50:10,740 --> 00:50:17,370
So that's stuff we are working on, uh, these days in practice and many other people are as well.
465
00:50:18,180 --> 00:50:23,340
Okay. I grew up as a logician.
466
00:50:23,400 --> 00:50:27,059
Uh, I, I did my Ph.D. in mathematical logic.
467
00:50:27,060 --> 00:50:33,210
So for me, a fairly immediate thing to think about is logical equivalence.
468
00:50:33,720 --> 00:50:38,520
Take equivalences in increasingly stronger logics.
469
00:50:38,520 --> 00:50:48,749
That should give you a feeling of how close two graphs are. Okay, bisimilarity is an equivalence of this kind, and many people here will know what that is.
470
00:50:48,750 --> 00:50:52,860
If you don't. Never mind. It won't play a role in this talk.
471
00:50:53,400 --> 00:51:00,389
Okay, then we have this Weisfeiler-Leman algorithm, which is not a complete isomorphism test.
472
00:51:00,390 --> 00:51:06,300
But maybe we could say if if it does not distinguish two graphs then at least they are similar.
473
00:51:06,570 --> 00:51:11,520
And then we have this hierarchy of algorithms that gives us some kind of distance.
474
00:51:11,520 --> 00:51:14,790
That's something we could do, of course.
475
00:51:14,790 --> 00:51:21,119
Then, if we're more mathematically minded, we can say, okay, now we have these adjacency matrices of the graphs.
476
00:51:21,120 --> 00:51:28,650
We can apply some matrix norms to them; that should give us a distance. That's actually related to this edit distance thing.
477
00:51:29,130 --> 00:51:34,920
Or, if we're more sophisticated, we can think of optimal transport distances.
478
00:51:35,190 --> 00:51:41,880
Okay, so we have all these approaches. Now, what do we do with this?
479
00:51:44,080 --> 00:51:49,479
I think this similarity problem is really, really important for many practical purposes,
480
00:51:49,480 --> 00:51:54,490
mainly for machine learning, because machine learning has always been built on.
481
00:51:56,230 --> 00:52:03,450
similarity in some way. So we should understand it, but there's almost no theory.
482
00:52:03,690 --> 00:52:05,069
So I try to think about it.
483
00:52:05,070 --> 00:52:14,010
And I came up with basically two different ways of thinking about similarity, and I just want to share them with you.
484
00:52:14,310 --> 00:52:20,850
The first I call operational similarity. Under this view, two graphs are similar
485
00:52:20,880 --> 00:52:24,270
if you can transform one easily into the other.
486
00:52:24,630 --> 00:52:27,720
Okay, that's a natural way of thinking about it. For example,
487
00:52:27,720 --> 00:52:36,150
edit distance is of this type: you delete as few edges as possible and add as few edges as possible to get from one to the other.
488
00:52:36,150 --> 00:52:40,080
And if you need only a few operations, then they are close.
489
00:52:42,090 --> 00:52:46,590
So the optimal transport distances I mentioned are also of this type.
490
00:52:47,210 --> 00:52:56,100
Now an additional thing that you get out of this is usually you even get an alignment of the graphs that somehow realises this,
491
00:52:56,100 --> 00:53:01,650
which may be useful sometimes. Then the other side I call declarative similarity.
492
00:53:01,950 --> 00:53:06,000
Under this view, two graphs are similar if they have similar properties.
493
00:53:07,780 --> 00:53:16,150
So the graph kernels, where you collect certain features, the properties, are of this type; logical equivalence is of this type:
494
00:53:16,450 --> 00:53:20,980
the two graphs satisfy the same properties expressible in this logic.
495
00:53:21,520 --> 00:53:27,340
Now, the advantage of this way of thinking about it is that you can actually adapt it to a specific target.
496
00:53:27,430 --> 00:53:31,150
You know which properties are relevant in your practical application.
497
00:53:31,390 --> 00:53:36,630
So you choose these and look at these. Okay, now what do we do with this?
498
00:53:36,640 --> 00:53:42,910
So we can kind of separate the examples I gave you into these two categories?
499
00:53:43,210 --> 00:53:46,630
Now, the question is, are these related?
500
00:53:46,640 --> 00:53:52,640
Right? I actually think that, in some way, they are just two sides of the same coin.
501
00:53:52,660 --> 00:54:04,060
They are dual to one another. And I just take this as a hypothesis and try to turn it into theorems.
502
00:54:04,450 --> 00:54:15,070
In the remaining few minutes, I just want to show you a few results pointing in this direction, that actually there is a relation.
503
00:54:15,880 --> 00:54:21,850
So again, now the next three slides or something will be a bit more technical.
504
00:54:22,660 --> 00:54:26,290
But then, then you're done and hopefully we all get a doughnut.
505
00:54:28,350 --> 00:54:34,230
Okay. So edit distance, I mentioned this, we can write it this way.
506
00:54:35,070 --> 00:54:42,570
So what do we do here? We have two graphs with adjacency matrices A and B.
507
00:54:42,570 --> 00:54:47,309
The adjacency matrix has a one if there's an edge, zero if there's no edge.
508
00:54:47,310 --> 00:54:51,870
But in the two matrices, the vertices may be ordered in different ways.
509
00:54:52,170 --> 00:54:57,460
So we apply a permutation to the first, then compare it to the second.
510
00:54:58,440 --> 00:55:04,290
So take the difference and then take the norm. Basically, count the number of ones or minus ones that remain.
511
00:55:05,190 --> 00:55:09,690
That's what edit distance is, and we have to minimise over all these alignments.
512
00:55:10,590 --> 00:55:15,930
Okay, well we can write this in a pure matrix form.
513
00:55:16,170 --> 00:55:20,760
We minimise over permutation matrices and then take this expression.
514
00:55:21,000 --> 00:55:24,030
Just trust me that this is this is right.
515
00:55:24,420 --> 00:55:31,829
Okay. Now if you write it this way, one thing you see is you have to minimise over all permutations.
516
00:55:31,830 --> 00:55:38,600
This is incredibly hard in theory and in practice it's a really bad space to minimise over.
517
00:55:38,910 --> 00:55:48,510
So for example, one fact that supports this is that even if the two graphs are just trees,
518
00:55:48,510 --> 00:55:54,329
I mean, usually everything is easy on trees, but here it is even NP-hard to compute this.
519
00:55:54,330 --> 00:56:03,390
Okay, so even if it were a good measure of similarity, it would be kind of useless because you can't compute it.
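To make this concrete, here is the definition computed literally, minimising the number of edge mismatches over all n! alignments. The matrix representation and function name are my own choices; this is only feasible for toy graphs, which is exactly the point.

```python
import itertools

def edit_distance(A, B):
    """Graph edit distance by brute force; only feasible for tiny graphs.

    A, B: symmetric 0/1 adjacency matrices (lists of lists) of equal size n.
    Minimises the number of edge mismatches over all n! vertex alignments.
    """
    n = len(A)
    best = n * n
    for pi in itertools.permutations(range(n)):
        # Count positions where the aligned copy of A disagrees with B;
        # each mismatched undirected edge is counted twice.
        mism = sum(A[pi[i]][pi[j]] != B[i][j]
                   for i in range(n) for j in range(n))
        best = min(best, mism // 2)
    return best
```

A triangle and a three-vertex path have edit distance one (delete one edge), and the loop over all permutations is already six iterations for three vertices.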
520
00:56:04,170 --> 00:56:06,059
Well, then you could say:
521
00:56:06,060 --> 00:56:13,350
we can relax it a little bit and take a convex problem that approximates it, which we can then solve efficiently.
522
00:56:13,740 --> 00:56:23,310
And the way we can do this is instead of minimising over permutation matrices, we minimise over doubly stochastic matrices.
523
00:56:23,490 --> 00:56:31,740
The important thing here is: that's the convex relaxation of this problem, and such problems we can solve.
524
00:56:32,490 --> 00:56:36,240
Okay, so that gives us a new distance measure.
525
00:56:36,360 --> 00:56:41,160
It's not clear what it actually means and what it tells us about the graphs, but we can try, right?
526
00:56:41,520 --> 00:56:45,060
At least we can compute it now. Okay.
527
00:56:45,540 --> 00:56:50,549
So that's one thing we can do. So that's on this operational side, right?
528
00:56:50,550 --> 00:56:56,400
We started from edit distance. Now let's start from the other side: what could be good features?
529
00:56:57,330 --> 00:57:08,670
Well, in general, what turns out to be good, and also related to many other features that people use, is that we count homomorphisms.
530
00:57:08,670 --> 00:57:13,440
So what's a homomorphism? It's a mapping that preserves adjacency.
531
00:57:13,890 --> 00:57:17,879
Okay. So here we have two graphs; that would be a homomorphism.
532
00:57:17,880 --> 00:57:21,030
We mapped the middle vertex here, these two here.
533
00:57:21,300 --> 00:57:26,550
So they are both neighbours and here's an edge and that one here, there's also an edge.
534
00:57:26,550 --> 00:57:29,610
So that's a homomorphism, and now we want to count them.
535
00:57:30,000 --> 00:57:40,260
Okay. So for example, the number of triangles would be the number of homomorphisms from a triangle to our graph, divided by six.
536
00:57:40,260 --> 00:57:44,100
But who cares about dividing by six? Okay, so we can do that.
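Counting homomorphisms by brute force is straightforward for small patterns (the representation and function name are my own choices): try every map of pattern vertices into the host graph and keep those that send every pattern edge to an edge.

```python
import itertools

def count_homomorphisms(pattern_edges, pattern_size, adj):
    """Count homomorphisms from a small pattern graph into a host graph.

    pattern_edges: list of (u, v) pairs over vertices 0..pattern_size-1.
    adj: host graph as a dict mapping each vertex to a list of neighbours
    (symmetric, so each undirected edge needs checking only once).
    """
    hosts = list(adj)
    count = 0
    # Try every map of pattern vertices to host vertices.
    for image in itertools.product(hosts, repeat=pattern_size):
        if all(image[v] in adj[image[u]] for u, v in pattern_edges):
            count += 1
    return count
```

For a triangle pattern, this count is six times the number of triangles, matching the division by six just mentioned; collecting such counts for a family of patterns gives the vector embedding described next.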
537
00:57:44,340 --> 00:57:50,580
And now this gives us a vector embedding. Say we have a class of graphs, the class of all cycles.
538
00:57:50,790 --> 00:57:56,700
Okay? And then we can collect all these homomorphism numbers in one vector.
539
00:57:57,150 --> 00:58:00,270
Could be finite dimensional could be infinite dimensional.
540
00:58:00,840 --> 00:58:04,110
Okay, that gives us an embedding. Okay.
541
00:58:04,110 --> 00:58:07,950
Here's an example for these two graphs. Okay.
542
00:58:08,280 --> 00:58:13,580
And once we have the embedding, then the distance in the vector space induces a distance on the graphs.
543
00:58:13,590 --> 00:58:18,659
I'm close to the end. Okay. So we can do that.
544
00:58:18,660 --> 00:58:24,600
That also gives us a distance; it would be rooted on the declarative side.
545
00:58:24,840 --> 00:58:33,000
And now here's a theorem that says these things, and many other things, are the same in some sense.
546
00:58:33,330 --> 00:58:39,780
The precise statement is: for any two graphs, their distance in the starred sense is zero
547
00:58:40,110 --> 00:58:43,230
if and only if all the homomorphism numbers are the same.
548
00:58:43,950 --> 00:58:47,360
Okay. Now that happens, interestingly,
549
00:58:47,760 --> 00:58:53,100
if and only if the one-dimensional Weisfeiler-Leman algorithm does not distinguish the two.
550
00:58:53,190 --> 00:58:55,310
So there we actually have a very efficient test.
551
00:58:57,120 --> 00:59:06,210
We can also write this as a logical equivalence in a strange logic that has been studied a lot in descriptive complexity theory.
552
00:59:07,020 --> 00:59:14,099
And maybe the most interesting addition here is also this happens if and only if the two
553
00:59:14,100 --> 00:59:19,350
graphs cannot be distinguished by graph neural networks or by a learned embedding.
554
00:59:19,800 --> 00:59:31,050
So that's quite nice, how things fall together here. And I want to stop with that and just do one slide on very general research directions.
555
00:59:32,010 --> 00:59:36,030
So in graph isomorphism there's still a lot happening.
556
00:59:36,040 --> 00:59:42,930
I mean, the problems are really hard and it's hard to make progress, but I think it's very interesting and worthwhile.
557
00:59:42,960 --> 00:59:49,510
So I still work on this, although not exclusively. So for example,
558
00:59:49,510 --> 00:59:53,420
one problem that is interesting, and where not much is known, is the group isomorphism
559
00:59:53,440 --> 01:00:01,210
problem, where instead of graphs you look at groups. Then similarity
560
01:00:02,220 --> 01:00:07,090
is kind of the opposite: it's practically extremely important.
561
01:00:07,120 --> 01:00:13,960
In theory, we know very little, and there's a lot to do and probably a lot of easy progress to be made.
562
01:00:14,650 --> 01:00:21,700
Okay. And here are just two references, survey papers I wrote fairly recently.
563
01:00:21,970 --> 01:00:27,610
You can look at these if you're interested, and that's pretty much it.