It is already recording, and I think it's time to start with our seminar. I'll start by introducing Caroline. But before that, Caroline told me that if you have questions, please do feel free to ask, either in the chat or by raising your voice and asking directly, and Caroline will be happy to respond.

Now let me introduce Caroline. This is the last session of the Oxford Computational Statistics and Machine Learning seminar for this season, and we have Caroline Uhler from MIT, who is going to speak about causality in the light of drug repurposing for COVID-19. Caroline is currently the Henry and Grace Doherty Associate Professor in Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society at MIT. She holds degrees in mathematics and biology from the University of Zurich and a PhD in statistics from UC Berkeley. Before joining MIT, she was a postdoc at the Simons Institute at UC Berkeley, the University of Minnesota, and ETH Zurich, and she was also an assistant professor at IST Austria. Caroline is an elected member of the International Statistical Institute, and she has won a Simons Investigator Award and a Sloan Research Fellowship, amongst many other honours. The reason we have Caroline here is that she does very exciting research at the intersection of machine learning, computer science and computational biology, in particular on causal inference, generative modelling and applications to genomics. So, Caroline, please go ahead.

Yeah, thank you so much, and thank you very much for inviting me. I'm very excited to be here. We have been working a lot on causality in recent years, and once COVID-19 started, many people in the lab were wondering how we could use the kind of research we had been doing for this problem. That's when we started working on drug repurposing, which is the kind of problem I'm going to present here. In particular, I'm going to talk about the mathematical, statistical and machine learning questions that have arisen from these biological and medical questions, and really go into the statistics and machine learning topics that have been important for coming up with the kind of methods we have applied here. So let me start right off with the problem of drug repurposing, in particular in the context of COVID-19, although this is of course more generally a problem that could be applied to many other diseases as well.
So how does drug development work, and in particular, how does drug repurposing work? Drug repurposing is particularly important for diseases where you don't have time to go through all of the different trials, because these trials often fail and take relatively long. Drug repurposing means that you take drugs that have already been FDA approved and try to find which ones might also be effective in a new disease context. This can speed up the trials because you don't have to go through the safety trials anymore; you can go directly to the efficacy trials and see whether this old drug also works in the new disease context.

So how do drugs work generally? Drugs usually target a particular protein and try to make this protein ineffective, and through that, what you want is to find drugs that can push the system back to the normal state. Now, if we think about this in terms of networks — and we will see networks coming up again towards the end of the talk — people have used these kinds of interaction graphs a lot in order to come up with the right targets for drugs. In particular, say you have here, in red and in blue, the nodes — maybe these are genes or proteins, whatever you're looking at — that in the disease context are differentially expressed compared to the normal context. Then one of the standard approaches is to define targets that are connected, or very central, to many nodes that are differentially expressed in the disease context.

Now, of course, if you come from a bit of a causality background, and also otherwise, it is clear that the true network in biology consists of directed relationships, where a certain protein or a certain transcription factor turns on another gene, et cetera. So if your true network looks something like this directed graph here, and you're targeting this node here in the middle — well, although it is connected to all these differentially expressed genes, these genes are not going to change, right? For these red and blue genes there is no hope of actually moving the system back to the normal state. So the directions matter a lot, and causal relationships will actually matter a lot; you cannot just work with the undirected graph.
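To make the point about directions concrete, here is a minimal sketch on a toy, made-up regulatory graph (node names and edges are purely illustrative, not from the talk): a node can be connected to many differentially expressed genes in the undirected view while only a few of them are actually downstream of it.

```python
import networkx as nx

# Hypothetical regulatory network (directed edges: regulator -> regulated gene).
G = nx.DiGraph([("A", "T"), ("T", "B"), ("T", "C"), ("D", "T")])
diff_expressed = {"A", "B", "C", "D"}   # genes changed in the disease state

for target in G.nodes:
    # Undirected view: how many differentially expressed genes is `target` connected to?
    neighbours = set(nx.Graph(G).neighbors(target)) & diff_expressed
    # Causal view: how many differentially expressed genes are actually downstream of it?
    downstream = nx.descendants(G, target) & diff_expressed
    print(f"{target}: connected to {len(neighbours)}, downstream {len(downstream)}")
```

In this toy graph the central node "T" is connected to all four differentially expressed genes, but only two of them are downstream, so intervening on it could never move the other two back to normal.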
So in particular, if the graph looks something like this, then maybe this is a good target, because you at least have a chance of changing all these nodes that are differentially expressed, and in the right way, by targeting this particular node. So directions really matter, and this is one place where causality comes in. I'll only talk briefly about this, although we have done a lot of research on actually learning these directed graphs, these causal graphs, from data on the nodes without knowing the directions; this will come up just a little bit at the very end of the talk.

I want to talk about something different: another problem that appears in these drug repurposing studies, and it's the following. If we think of SARS-CoV-2, SARS-CoV-2 in particular has an effect on lung epithelial cells. So what are the datasets? Identifying new drugs is, of course, still done very much experimentally, although there are more and more computational approaches — and in fact there are successful drugs that have gone through all of the trials and were identified using computational methods. So what are the datasets one can use if one is trying to use computational methods? There are these really large-scale drug screens, and I'm going to show one of them on one of the next slides, where you have many, many drugs. I'm going to concentrate on the FDA-approved ones because we're thinking about drug repurposing, but in fact in these drug screens you also have many non-FDA-approved drugs, you have all kinds of knockouts, et cetera — many different kinds of interventions on the system that have been applied to some cell types. Now, this particular screen that I'm going to look at is a cancer screen, so the cell types that these drugs were applied to are all different kinds of cancer cell lines.

The problem is, of course, that what we want to do is identify drugs that will work in this new context, which is not cancer. We want to be able to predict the effects of these drugs on SARS-CoV-2-infected lung epithelial cells — cells that are not in this drug screen. Every time a new disease comes up, you would otherwise have to redo the whole drug screen, and that is of course infeasible. So what you really need to be able to do is somehow transfer these causal effects.
So you know the effect of all these drugs, or these interventions, on some cell lines where they were measured. Can you, from here, predict what the effects of these drugs will be on a different cell line — a cell line where you know how it looks in the normal state, but where you have not seen any of these drugs applied? You would like to be able to transfer the effects. And this is actually again a causal question, because a drug, or a knockout, et cetera, is an intervention on the system. So I have an intervention on the system, and now I want to be able to predict what this intervention will do in a different setting, or in a different — we call this an environment. One cell type is one environment and another cell type is another environment. The same questions appear a lot in policy questions, where you know the effect of a particular policy in one city — that would be the environment — and you want to predict what the effect of this policy will be in another city. This is known as the causal transportability problem. It is again a causal problem simply because it involves an intervention.

So that's the main question I want to talk about: how can you transport the effects of a drug from one environment, say here a red cell type, to a blue cell type, to predict what this intervention will do? The other question I mentioned before is how you can even learn these underlying regulatory networks. We have worked a lot on this problem as well, and it will come up at the end, where we want to validate some of the causal relationships that we're predicting. But I'll mainly talk about this causal transportability problem: if you know the effect of different kinds of drugs on some cell types, how can you predict the effect of these drugs on the cell type that you actually care about — in this case SARS-CoV-2-infected lung epithelial cells, where you have not measured any of the drugs? And please, again, as Gonzalo just said, interrupt me whenever you have questions.

OK, so if you come from machine learning, there is probably one very straightforward approach that you may want to try. In fact — I know you're in the UK; I talk quite a bit with AstraZeneca — I know that they have tried this. So this is certainly something very standard, and it has not worked for them.
I'll show you why it might not have worked so well for them and how one could maybe get around this. So if you come from machine learning, this is probably the kind of approach you would say we should at least try. We all know this — often it's done using GANs; I'm going to use autoencoders throughout this talk, also because we actually have some theory for them. I'm sure you're all familiar with autoencoders: we have this encoder, a neural network that maps you from some space — this can be gene expression, it can be images — to a latent space, and then the decoder, which maps it back, and we learn this map just by trying to minimise the reconstruction error.

There are all these really nice applications, usually done with GANs, and I'm sure you've seen them, where you want to translate an image that you have taken in summer to winter, or you have pictures of people and you want to add smiles to them. So here you have some people whose pictures you've taken; some are smiling and some have a neutral face. And maybe this vector here corresponds to this person here, adding a smile to her. Now here you have a new person and you would like to know what this person looks like with a smile — you have never taken that picture. Well, maybe what will work is taking this vector that corresponds to adding a smile to a different person, moving it over to this new person, and then decoding this point back into image space; and in fact, in this case, out comes the person with a smile.

The question, of course, is whether something like this can also work when you have drugs. Here's how you should think of it: we have different kinds of cell types, like this blue one here, this pink one, another blue one, another purple one. For some of them, you get to observe what the effect of a particular drug is, and now you want to predict the effect of this drug on a new cell type. In this case, these would be the cancer cell lines where we've seen the drugs, and this would be the SARS-CoV-2-infected lung epithelial cells where we would like to be able to predict the effect of the drug. Well, maybe, in the latent space, I can just take the effect of this drug, move it over to my new cell type, and then decode it back out into gene-expression space, and out comes the effect that this drug will have on the SARS-CoV-2-infected cells. And this has actually worked.
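As a rough sketch of this latent-space arithmetic — assuming `encode` and `decode` stand for some trained encoder/decoder pair; the function names and data layout here are illustrative, not the actual model from the talk — the transfer amounts to a vector shift in latent space:

```python
import numpy as np

def transfer_effect(encode, decode, x_ctrl_A, x_drug_A, x_ctrl_B):
    """Sketch: predict the drugged state of cell type B from its control state,
    using the drug-effect vector observed in cell type A."""
    effect = encode(x_drug_A) - encode(x_ctrl_A)   # drug effect as a latent-space vector
    z_pred = encode(x_ctrl_B) + effect             # shift cell type B by that effect
    return decode(z_pred)                          # map back to gene-expression space
```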
In this particular paper here, it actually works quite well — but only for a few cell types and a few drugs. So the question is, of course, whether this is a general phenomenon, and as I've already told you, AstraZeneca has tried this and in general it does not work. In fact, you should also not expect it to work. There is work by Bareinboim and Pearl and co-authors — there's a long line of work, and I've already told you this is known as causal transportability — and they have really nice results giving necessary and sufficient conditions for when causal transportability works. These are quite stringent conditions. The problem is that their work doesn't really tell us when it will work in this setting here, because they always assume that you know the underlying causal graph. So they assume you have some causal graph underneath, and even more, that you know the nodes on which the intervention happens, and even more, that you know the nodes that change between environments — the ones that differentiate, say, L.A. from, I don't know, Boston, if you want to transfer a policy from one city to another. So that's additional knowledge that you have: you know where the interventions happen and where the nodes are that change the environment, that make one cell type into another cell type. Then, based on this graph, they have necessary and sufficient conditions that tell you whether the effect of this particular intervention can be transferred to this new city or cell type.

Now, here we don't have this information, right? We don't know the underlying regulatory networks, we certainly don't know, for the drugs in general, all the nodes that they have an effect on, and we also don't know the nodes that make the difference between different cell types. But what their results do make clear is that the conditions are quite restrictive, so you should certainly not expect this to always work — that's maybe the takeaway here. So the question is: when does it work, and why does it work sometimes?

So let's actually look at the large dataset on which we'll be analysing this. This is a large dataset that I encourage many of you to look at — I think it has been really, really fun working with this dataset, if you care about questions like this, about transfer and about interventions and trying to predict the effects of interventions.
This is the publicly available Connectivity Map (CMap) dataset. It has 1.2 million samples, where every sample is a one-thousand-dimensional vector of gene expression — so just for a selected set of around a thousand genes — and you have thousands of different perturbations. We will only look at the FDA-approved drugs, but there are many other perturbations that were applied as well. As I said, they were applied to different cancer cell lines, around 70 different cancer cell lines. What you see here is that every dot means you have data — each dot is one thousand-dimensional vector. Of course, you cannot do all of these experiments, so there is a whole lot of white space out here, meaning those combinations were not measured. But you already have a huge amount of data here on which you can test your method: you just leave out some measurements for validation and check how well you are able to predict the effect of a perturbation that was not measured. So this is the dataset we're going to use to check how well such an autoencoder approach would work.

[Audience question:] Looking at this data, how are the perturbations measured — what is the effect of a perturbation measured as?

Yes, so it's the expression of these one thousand genes that is measured. Every one of these points here is really a tensor — every one of these dots is a thousand-dimensional vector. It's quite nice that it's not just a one-dimensional readout; there are other drug screens where you just measure whether the cell dies or not, but here you really get to measure the expression of a thousand landmark genes.

And this is just a two-dimensional embedding of the dataset to show how it looks. Here in colour are the different cell types, the roughly 70 that you have. What you see — and this is known — is that perturbations have a very small effect compared to the differences between different cell types. The black points here correspond to the orange cell type here, but with the different drugs applied to it, so you see they stay very close. In fact, a very good baseline is just to predict the effect of a perturbation as the mean of that cell type; it is quite difficult to have methods that do better than that.
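For reference, that baseline is easy to state in code; the data layout (a dictionary keyed by cell line and drug) is only an assumption for illustration:

```python
import numpy as np

# Hypothetical layout: expr[(cell_line, drug)] -> 1000-dimensional expression vector.
def cell_line_mean_baseline(expr, cell_line):
    """Baseline mentioned in the talk: predict any perturbation in a cell line
    simply as the mean profile of that cell line over whatever was measured."""
    profiles = [v for (c, d), v in expr.items() if c == cell_line]
    return np.mean(profiles, axis=0)
```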
OK, so that's how you should think of what the dataset looks like. Now let's look a little bit at how these autoencoder methods work, and here we'll go a little bit into theory, so it might look a little strange at the beginning. Your standard autoencoder goes from — in this case we have a thousand-dimensional space, these thousand-dimensional vectors that we're measuring — and usually autoencoders are used to get a lower-dimensional representation of the data, and then you map it back to your original thousand-dimensional space. We'll see that this doesn't work so well, and many people in industry have seen that this doesn't work so well.

What we will be proposing is something that maybe looks very crazy at the moment, because autoencoders are usually used for dimensionality reduction. We're not going to use them for dimension reduction; we're actually going to go into a latent space that is higher-dimensional than the original. So you start off here with your thousand-dimensional vector, embed it into a higher-dimensional latent space, and go back to your original space. Why is this unintuitive? Because we have so many parameters that we could just learn the identity — here I could actually just learn the identity map. What I want to show you, and I'll give you the theory for why we even tried doing something crazy like this, is that when you train such an autoencoder, you're not going to learn the identity map, and I will tell you a bit about what kinds of maps are actually learnt. That's the interesting thing: what you learn actually seems to be something useful.

In terms of intuition, I think it maybe makes sense: in some sense, what you want to do is make effects more linear, so that they are more aligned in the latent space. If you go into a lower-dimensional space, your effects will definitely have to become more nonlinear, more crumpled up than they were before. If you're adding more dimensions, that hopefully allows you to actually make things more aligned than they were before. At least intuitively that makes sense — but of course, it depends on what map is actually learnt by training.
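A minimal sketch of such an overparameterised autoencoder, with illustrative dimensions and hyperparameters (not the ones used in the work presented), might look like this:

```python
import torch
import torch.nn as nn

# The latent space is *wider* than the input (here 1000 genes -> 4096-dim latent),
# and the network is trained purely on reconstruction.
d_in, d_latent = 1000, 4096
model = nn.Sequential(
    nn.Linear(d_in, d_latent), nn.ReLU(),      # encoder: expands the dimension
    nn.Linear(d_latent, d_in),                 # decoder: maps back to gene space
)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x):                             # x: (batch, 1000) expression profiles
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x) # reconstruction error only
    loss.backward()
    opt.step()
    return loss.item()
```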
And just to show you that something like this is actually happening, we can check it on the dataset. You leave out some measurements and try to predict the effects. The whole approach can only work if, when you look at different cell types where a particular drug was measured, the effects of the drug are aligned across those cell types. So what we're going to measure is the effect vector of a drug in one cell type versus in the other cell type, in the latent space, and compute the correlation between the effect of the drug in one cell type and in the other.

And here is what you see. This is the correlation in the original space, and here is what you get if you go into a lower-dimensional latent space: basically, not much changes. With a standard autoencoder you do get a little bit of enrichment towards higher correlations, which is good — this is what you would want to see. Now what happens if you just do a PCA embedding? With PCA you actually get quite good enrichment, really good alignment — but you're getting good alignment by throwing away most of your data. You can always align data by throwing away information, but then of course you don't have any information anymore about the effects of the drug, and that's what you see here when you look at the reconstruction. So this is one way of getting really good alignment: just throw out most of the information in the data and keep only some of the directions.

Now what you see here is that if you go to this overparameterised setting — in this case a latent space that is higher-dimensional than what you started with — then, first of all, it's not learning the identity map, because otherwise nothing could change relative to the original space. And you actually get basically the same kind of alignment as with PCA, where you were throwing out all of the information, but now with perfect reconstruction — so you're not getting rid of any information.

So the question, and this is what I want to do now, is to give you some intuition for why we even tried these crazy autoencoders where the latent space is higher-dimensional than your original space.
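The alignment check described here can be sketched as follows, where `embed` is whatever map you are evaluating (the identity for the original space, an encoder or a PCA projection otherwise); the names and layout are illustrative:

```python
import numpy as np

def effect_alignment(x_ctrl_a, x_drug_a, x_ctrl_b, x_drug_b, embed=lambda v: v):
    """Correlation between a drug's effect vector in cell type A and in cell type B,
    optionally after mapping through an embedding. Sketch of the check in the talk."""
    ea = embed(x_drug_a) - embed(x_ctrl_a)   # effect of the drug in cell type A
    eb = embed(x_drug_b) - embed(x_ctrl_b)   # effect of the same drug in cell type B
    return np.corrcoef(ea, eb)[0, 1]
```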
Here I showed this for two cell types; this is for all the different cell types, and you see exactly the same kind of pattern. OK, so let's go a little bit into the theory of why we ever tried to look at overparameterised autoencoders — and I should say that for all of our applications we are now actually using overparameterised ones. Let me give you some intuition. This is just the standard setting: we have our n training examples, living in R^k, and now we're in the overparameterised setting — you can either have the latent space be of larger dimension than the input, or, as here, just have the number of samples be smaller than the input dimension; it doesn't matter which way you set it up. We're training our autoencoder using these n training examples, just trying to minimise the reconstruction error.

So let's look at this problem in the overparameterised setting and get a little bit of intuition for what is happening. Let's do the linear case first. In general the map of the autoencoder will be highly nonlinear, but let's do the linear setting first, and let me have just one training example — that's the most extreme form of overparameterisation. Then I'm just trying to solve

argmin over A of ‖x − A x‖²,

where A is a k×k matrix of parameters and x is just a single k-dimensional vector. So I'm definitely overparameterised, and I'm just trying to minimise this. With so many parameters, you notice first of all that there are many different solutions to this problem — I can definitely get this loss down to zero, and in fact in many different ways. One solution is, I guess, the obvious one: A equal to the identity is definitely a solution. But note that there are many other solutions; in particular, there is a solution of any rank. For example, there is a rank-one solution, which just projects everything onto x, but you have solutions of any rank for this problem.

So the question is what the autoencoder actually learns. In the linear setting this is very well known. In an autoencoder, what you usually do is start with a very small initialisation and run gradient descent.
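This linear, one-example problem is small enough to simulate directly; the following sketch (constants and dimensions arbitrary) runs gradient descent from a tiny initialisation so you can check which of the many zero-loss solutions is actually found:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10
x = rng.standard_normal(k)
x /= np.linalg.norm(x)                        # unit norm for readability
A = 1e-4 * rng.standard_normal((k, k))        # very small initialisation

for _ in range(2000):
    grad = -2 * np.outer(x - A @ x, x)        # gradient of ||x - A x||^2 w.r.t. A
    A -= 0.05 * grad

print(np.linalg.matrix_rank(A, tol=1e-2))         # 1: the rank-one solution is found
print(np.allclose(A, np.outer(x, x), atol=1e-2))  # ~projection onto x, not the identity
```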
And if you start near zero, it is known that what you learn is in fact the rank-one solution — just the projection onto x itself. You can actually generalise this, and that will be the theorem about what is being learnt, but to get some intuition first, let's just look at what happens when you apply a deep autoencoder that you have trained on just one training example. So here is one training example, this nice bunny. You train on the bunny, then you put in some random test examples and see what comes out when you feed them through your autoencoder. What you see is that always the bunny comes out. I would like you to notice that this is stronger than what we saw before in the linear setting, where the rank-one solution comes out: with a rank-one solution you wouldn't expect just the bunny to come out — you could also get, for example, the negative of the bunny. So this here is actually a point map: it seems the network has learnt a map that sends anything you put in to exactly the bunny, and not to some linear combination of the bunny.

Now, many people would say this is not surprising at all, but I do want to make sure you understand that it is really surprising. First of all, this network is really quite deep. If you use a very shallow network, in fact, you will not learn a point map — this is important to notice. Here, again, you train on one training example, you put in some random examples, and you see that what comes out is something that looks very similar to what you put in. There are papers that say you're learning the identity map; I'm going to show you that this is not the identity map — it will be really important to notice that — but that will come on the next slide. This is just to show that in these settings the result shouldn't surprise you. Also, as I said before, this is easy to prove in the linear setting: if you use a linear autoencoder and train it on, say, two examples, you can actually get out the negative of the dog or something like that, because if everything is linear, it's very clear that what comes out is always a linear combination of your training examples.
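A similar toy check for the linear case with two training examples — again a purely illustrative setup — shows that the output on a random test input lies (approximately) in the span of the training examples, which is why things like a "negative" training example can come out:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20
X = rng.standard_normal((2, k))               # two training examples (rows)
A = 1e-4 * rng.standard_normal((k, k))        # small initialisation

for _ in range(5000):
    R = X - X @ A.T                           # residuals for both training examples
    A += 0.02 * R.T @ X / len(X)              # gradient step on the mean squared error

x_test = rng.standard_normal(k)
y = A @ x_test
# y is (approximately) a linear combination of the training rows:
coef, res, *_ = np.linalg.lstsq(X.T, y, rcond=None)
print(res)                                    # close to zero: y lies in span of the training examples
```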
So in the linear setting you should also expect to see, for example, the negative of a training example — there you get linear combinations. Here, by contrast, you get something that looks more like the identity map. So you should be surprised by this kind of result, where, if you have a quite deep network and train it on one example, it seems that what it is learning is the point map.

OK, so let me give some intuition for the map that is actually being learnt, and this really comes from a very simple trick. The intuition is really simple in terms of what we're doing, in order to then show what we're actually going to prove. An autoencoder is just so nice in that it maps an image to an image. So if I want to understand a bit about the map that the autoencoder has learnt, let me just iterate the map. If you're in this setting here — I put in this horse and get out something that looks very similar to it — I'm telling you it's not the identity map, because what I'm going to do is put the output in again: whatever you get out, you put in again, and again, and in fact you'll see that what comes out is actually the rabbit, the training example. That's what we're going to prove.

And you see, this actually happens even if you train on not just one training example: in this case we trained on five hundred training examples, and these are all training examples. You take one of them, add a whole lot of noise on top of it, and put it into your autoencoder — and this is already after a couple of iterations. At the beginning you get something that looks very similar to whatever you put in, and as you iterate and iterate, you see that in fact your training example comes out. So what does this suggest? It suggests that your autoencoder learns a map where all of the training examples are fixed points — or at least some of them are fixed points. You can prove which ones are fixed points, and I'll show you on the next slide that we can actually train these autoencoders so that all of our training examples are attracting fixed points. The question, of course, is how large the region of attraction around every one of your training examples is — and that's a good question.
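Iterating the map is simple to express; here is a sketch, assuming `model` is any trained autoencoder:

```python
import torch

def iterate_map(model, x, n_iter=20):
    """Iterate a trained autoencoder: if the training examples are attractors, a test
    point inside a region of attraction converges to (roughly) a training example."""
    with torch.no_grad():
        for _ in range(n_iter):
            x = model(x)        # feed the output back in as the next input
    return x
```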
Here, for example, are some test examples. What we're showing is that even if you take away half of the image — just make half of the image noise — for many of the examples you're still inside the region of attraction around your training example: you actually output the correct training example and not a wrong one. So these regions seem to be quite large. To us this is an important research question that we don't have an answer to: what do these regions of attraction actually look like? Here we're just showing a very simple example in 2D, where we gridded the space and looked at where we converge to, and this is how these regions look; this will depend very much on the nonlinearity you're using. So this will be one of the open questions I'll mention at the end — it is super important to understand, because in order to use autoencoders in different kinds of applications you may want these regions of attraction to look different, and that will really depend on the nonlinearities.

As I said, what we cannot prove is that there are no other attractors in this whole landscape, other than by gridding it and seeing that we always converge to one of the training examples. What we can prove, once you have trained your network, is whether every training example is an attractor, because you can just look at the eigenvalues of the Jacobian at your training examples and check that none of them is larger than one in absolute value, so that you have an attractor at the training example. Here is a network which is actually quite small; we trained it, and we can verify that this network has all five hundred training examples — in this case images — as attractors. So you don't need very big networks to make all of your training examples attractors.

OK, so what can we actually prove about all of this? For now we only have it for one training example, and it is the following result. You have your autoencoder with a given width and depth, and here we say "under suitable conditions" — actually, all of the nonlinearities in standard use will satisfy these conditions, and the initialisation has to be close enough to zero. Given a nonlinearity and an initialisation, what this theorem gives you is a formula for the maximal eigenvalue of the Jacobian at the training example. So the training example will be a fixed point — an attracting fixed point — if this eigenvalue is smaller than one.
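In sketch form, the check described here — assuming `model` is a trained autoencoder and `x_train` a single (flattened) training example — looks like this, with the criterion being that the Jacobian's spectral radius at the training example is below one:

```python
import torch
from torch.autograd.functional import jacobian

def is_attractor(model, x_train):
    """A training example that is (exactly) reconstructed is an attracting fixed
    point if the autoencoder's Jacobian there has spectral radius below one."""
    J = jacobian(lambda x: model(x), x_train)               # (d, d) for a 1-D input
    spectral_radius = torch.linalg.eigvals(J).abs().max().item()
    return spectral_radius < 1.0
```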
From this formula, you can then figure out what kind of depth and what kind of width you need in order to make this training example an attractor. And what it says is that if you're overparameterised enough in terms of depth and width, then this training example will in fact become an attractor, for whatever nonlinearity and initialisation.

[Audience question:] Is this phenomenon of the training examples becoming attractors particular to overparameterised autoencoders?

Yes, this is in fact very particular to the overparameterised setting. And when you apply them, you will see that with underparameterised ones, in general, you will not converge to a training example; you can even converge to something that doesn't look like an image at all.

[Audience comment:] So it's like you have learnt something close to the identity map in many places, or something like that.

Yes — so let me actually show you how I'm thinking about it in terms of the maps that are learnt. First of all, here are the kinds of pictures. We all know that overparameterised neural networks can interpolate the training data. But many papers use interpolation and memorisation as the same thing, and I want to make a distinction between the two, because to me they're not the same. Interpolation just means that I'm learning some function where every training example is mapped to itself, so you have zero loss. There are many different maps that can interpolate the training data, so I can have this crazy map over here — this is an autoencoder that goes from R^1 (you should read it that way) to R^1 — and say these are my training examples: this kind of crazy map is an interpolating map, since every training example is mapped exactly to itself. Now this map here in the middle is also an interpolating solution, and this one here is also an interpolating solution.

Now, this map here is the map that you basically saw at the beginning, where we used the deep network on one training example — and even if you use it on two or three, what you'll see is that it maps like this. This is the point map, where all of these test examples here would be mapped to this training example, and all of these over here would be mapped to this one.
So this is the point map, where you immediately map everything exactly to a training example itself. And what we saw on the previous slide is that when you have very deep networks, this is what happens; when you don't have that deep a network, but it's still overparameterised — quite wide — then what happens is this map here, where if you iterate the map many times, the training examples come out. For now, though, the map that is actually learnt is just a map that is contractive at each one of the training examples, and once you iterate this map, you in fact get this map over here.

[Audience question:] Can I ask a clarifying question? Why is it important to have these attracting training points — why would I want to have such a map?

I think that's a great question. You're essentially asking whether this is even important for generalisation, and then I'll also have to talk about what generalisation even means for autoencoders. So, first of all, it works in our application — that's maybe one way of saying why you'd care about these overparameterised autoencoders. The other thing is that there has actually been work before where people wanted to make autoencoders contractive at the training examples, thinking that this is a good property, and they added regularisers to make the map contractive at the training examples. So people have already used these kinds of maps, with an additional regulariser, and seen that such autoencoders actually work very well in practice. What we're showing here is that you don't need any regularisers: overparameterisation by itself will give you this property of being contractive at the training examples.

[Audience follow-up:] OK, so being contractive at the training examples somehow makes the autoencoder work better in many different applications?

Yes. We've seen it in our drug example, and others have added regularisers precisely to get this property and seen that it works well. And now we have to talk about generalisation — I mean, what does it even mean to generalise well for an autoencoder?
This is another question that I think is an important one, where there should be a bit more foundational work. People currently use it in the sense that an autoencoder generalises well if it learns something close to the identity map. But I don't agree with that definition at all: if you want the identity map, you don't have to train — if that's what you want, just don't train. So that cannot be the right definition of generalisation. Here I'm just proposing something that could make sense — we have no work on it yet — but maybe a definition that does make sense is that you should be close to the identity map, but at the same time be contractive at your training examples. That might be an alternative, and at least then you have to train. And then what would happen is that these overparameterised, and in particular very wide, networks will actually do very well if that's your definition of generalisation. But I think there should be a lot more foundational work on what it actually means to generalise well for autoencoders — and it certainly shouldn't be that the goal is to learn the identity map.

[Audience question:] Just one last clarifying question: could you please recap the intuition for why having the training examples as attracting points makes it work well in your drug example?

Oh yeah — I haven't even talked about that yet. So for us, the intuition — and I think by now we roughly have a proof for what is actually happening — the intuition at that point was the following. This work on attractors came before our drug example; it was there, and then we built on it and tried to use it for drugs. The intuition was: what you're seeing is that the map becomes contractive at these zero-dimensional points, so you're making points that are similar to the training examples even more similar to them, because the map is contractive there. So we were hoping that maybe this doesn't only hold for points, but that it would also make lines — one-dimensional things — and even low-dimensional manifolds that are similar to each other become more similar to each other. So the hope was that we make things that are already similar to each other more similar to each other, and that's what seems to be happening, at least in the drug example. That was the intuition: let's use this overparameterised setting so that it makes things that are similar more aligned. And that's kind of what you're seeing, right?
When you see this — this is your training example, and this point was already similar to that training example, so applying the network makes things more similar to each other. But I agree that there is quite a step there, and this is definitely the kind of work we're really excited about doing now on the theoretical side: why does this actually work? That was the intuition.

And generally, I think the intuition for why you would want something like this is that you want the map to be contractive at the training examples because you believe your training examples are important — hopefully they are representative of whatever you expect to see otherwise. And if you have something close by, maybe you want to make it a little bit closer to whatever you have already seen before. So these kinds of maps are self-regularising. This was, at least at some point, the intuition, and this was the work by Yoshua Bengio's group on why they introduced these regularisers to make their autoencoders more contractive. Here, what this says is that you don't need any regularisers: you get contractive maps just from overparameterisation — just make your network wide enough and you're going to see this.

[Audience:] Cool, thank you.

Great questions. OK, so as I said, this is the only proof we have — for one training example. Everything from here on is the intuition for what we're doing now in the lab: in all of our applications we always use very wide networks, and not very deep networks. The intuition already came from here: you have seen that when you are very deep, what you get are these maps that map you directly to a training example. That's probably not what you want — you probably don't want that whatever test example you put in, a training example comes out. You probably want something that is close to the identity, but still contractive at your training examples. And it seems that very wide networks with just a little bit of depth actually do that. Here is the intuition for it — we don't have a proof, but it would be really nice to have one for these kinds of things. Here you see the eigenvalues — the top eigenvalue — and how it changes when you change depth and width.
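One way to turn this "close to the identity, but contractive at the training examples" picture into numbers — purely a diagnostic sketch under that proposed (not established) definition, not something from the talk itself — would be:

```python
import torch
from torch.autograd.functional import jacobian

def generalisation_diagnostics(model, x_train, x_test):
    """Two quantities behind the proposed notion of generalisation: stay close to
    the identity on held-out points, while being contractive at training examples."""
    with torch.no_grad():
        closeness = (model(x_test) - x_test).pow(2).mean().item()   # identity-like on test data
    radius = max(                                                    # contractiveness at train points
        torch.linalg.eigvals(jacobian(model, x)).abs().max().item()
        for x in x_train
    )
    return closeness, radius
```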
440 00:41:35,260 --> 00:41:39,140 This is maybe difficult to see, so that's why we have two different plots: in one you look at the top 441 00:41:39,140 --> 00:41:43,330 few percent of eigenvalues, and here you only look at the top one. 442 00:41:43,330 --> 00:41:47,170 But in all of our simulations — and this, again, is just a conjecture — 443 00:41:47,170 --> 00:41:53,720 it seems that what happens with width is that the variance of your eigenvalues shrinks. 444 00:41:53,720 --> 00:42:02,140 OK. And what happens with depth, it seems, is that the distribution just shifts towards smaller and smaller values. 445 00:42:02,140 --> 00:42:06,400 So what would you want, right? If you want to be close to the identity and still contractive, 446 00:42:06,400 --> 00:42:13,630 what you want is that all of your eigenvalues are just below one, but not too small, because otherwise you're in this setting, right? 447 00:42:13,630 --> 00:42:18,130 If they're all very close to zero, then you just get your training points back. 448 00:42:18,130 --> 00:42:24,520 So that's not what you want, right? So that means that what you want is to be very, very wide, so that you have a very small variance, 449 00:42:24,520 --> 00:42:31,870 and then just deep enough so that your eigenvalues are just below one, so that the map is contractive at the training examples. 450 00:42:31,870 --> 00:42:38,350 So that's at least the intuition of how we're using these overparameterised autoencoders right now in all of our applications. 451 00:42:38,350 --> 00:42:43,180 But there is a lot to prove here. And now comes the drug example, 452 00:42:43,180 --> 00:42:49,240 and I already gave you the intuition for why we have used it here — we have just seen that it works well. 453 00:42:49,240 --> 00:42:55,180 And of course, there is a lot to do in terms of proving that this is in fact the right intuition: 454 00:42:55,180 --> 00:42:58,630 why does it make similar things more similar to each other, 455 00:42:58,630 --> 00:43:03,910 even if it's not just a training point? But that was at least the intuition for us to try to use this. 456 00:43:03,910 --> 00:43:11,800 Now, how do we use it? Now that we have these overparameterised autoencoders, I actually want to also show you how we use them in this drug example. 457 00:43:11,800 --> 00:43:23,380 So this is a standard approach of how people do it in computational drug discovery: it is often this kind of signature-matching approach. 458 00:43:23,380 --> 00:43:29,140 So here what we have is, again, data that came out after SARS-CoV-2 started, right? 459 00:43:29,140 --> 00:43:31,930 You have your gene expression in the normal state — 460 00:43:31,930 --> 00:43:36,460 so these lung epithelial cells in the normal state — and then this is what happens when you add the virus. 461 00:43:36,460 --> 00:43:43,120 OK, so this is the disease state. And so this vector here — and you see this is with the virus, with ACE2, and whatever — 462 00:43:43,120 --> 00:43:46,270 you know, many different kinds of conditions, just to see that this is kind of robust. 463 00:43:46,270 --> 00:43:50,320 So this direction always seems to be very similar, and this is the disease direction. 464 00:43:50,320 --> 00:43:55,180 So this is the effect of adding the virus: it does this to your gene expression. 465 00:43:55,180 --> 00:44:02,330 So now this signature matching, what it does is it tries to find a drug whose effect goes in the opposite direction — upwards here.
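A minimal sketch of this kind of signature matching — purely illustrative; the profiles, the drug names and the use of plain cosine similarity here are assumptions, not the actual COVID-19 data or the pipeline from the talk:

import numpy as np

rng = np.random.default_rng(0)
n_genes = 50

# Pretend expression profiles: normal versus virus-infected cells.
normal = rng.normal(size=n_genes)
infected = normal + rng.normal(size=n_genes)
disease_signature = infected - normal            # the effect of adding the virus

# Pretend effect signatures for a library of already-approved drugs
# (drug-treated minus untreated, represented in the same space).
drug_signatures = {f"drug_{i:03d}": rng.normal(size=n_genes) for i in range(100)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank drugs by how anti-correlated their effect is with the disease signature:
# the most negative cosine points most nearly "back towards normal".
ranked = sorted(drug_signatures.items(), key=lambda kv: cosine(kv[1], disease_signature))
for name, sig in ranked[:5]:
    print(name, f"cosine with disease signature = {cosine(sig, disease_signature):+.3f}")

In the actual application the profiles would first be embedded with the overparameterised autoencoder, and the matching would be done in that latent space rather than on raw expression vectors.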
466 00:44:02,330 --> 00:44:10,900 OK, so that hopefully you can just get rid of the effects of the disease by adding a drug that moves you back to the normal state. 467 00:44:10,900 --> 00:44:13,360 OK, so now what you want is to find such a drug, right? 468 00:44:13,360 --> 00:44:20,470 So now we'll actually map out all of the effects of the drugs in our latent space, which now is more aligned than what we had before. 469 00:44:20,470 --> 00:44:21,520 And then in there, 470 00:44:21,520 --> 00:44:29,860 we now look at which of the drugs is most anti-correlated with the effect of the disease, and just choose those top drugs that you get out there. 471 00:44:29,860 --> 00:44:37,450 And in fact, if you look at these top drugs — which is really nice — they all fall into two different classes of drugs. 472 00:44:37,450 --> 00:44:42,610 So, you know, you don't just get some completely random list of drugs; you actually get two different classes of drugs. 473 00:44:42,610 --> 00:44:45,250 They're all very nicely consistent. 474 00:44:45,250 --> 00:44:52,930 And the thing is that many — and this is where we also wanted to look further — these drugs have 475 00:44:52,930 --> 00:44:59,320 some known targets, and these are actually quite a lot of cancer drugs, which have a lot of targets. 476 00:44:59,320 --> 00:45:03,970 So what we wanted to do is not just take one of these cancer drugs that have a lot of targets, 477 00:45:03,970 --> 00:45:12,700 but also validate, or try to predict, what is actually the causal target amongst this whole list of targets that these cancer drugs have. 478 00:45:12,700 --> 00:45:17,200 So here you have a list of all these genes, right, that are targeted by these cancer drugs. 479 00:45:17,200 --> 00:45:23,410 And we just wanted to know, amongst all of these, which one would be the putative one that we think is the causal one. 480 00:45:23,410 --> 00:45:27,340 Because then maybe you could use a drug that is more specific to that particular 481 00:45:27,340 --> 00:45:31,420 target and doesn't have as many side effects as these drugs that were found here, 482 00:45:31,420 --> 00:45:40,990 which are all these cancer drugs. And so now this is where we used our previous causal structure 483 00:45:40,990 --> 00:45:48,550 learning algorithms, to figure out which one of all of these targets would be the one that we hypothesise is the causal one. 484 00:45:48,550 --> 00:45:54,820 So what do we do here? So again, what we're trying to do now is to actually learn these causal graphs. 485 00:45:54,820 --> 00:45:56,980 And so we just learnt a causal graph. 486 00:45:56,980 --> 00:46:03,550 So you take all of the genes that are differentially expressed in disease, together with all of these targets, 487 00:46:03,550 --> 00:46:07,720 and you learn the graph over all of them. And what you expect — 488 00:46:07,720 --> 00:46:15,700 what would be a good target — is in fact a node that is upstream of most of the disease genes, right? 489 00:46:15,700 --> 00:46:20,980 So you want to target something that is most upstream of all of the genes that are changing in disease. 490 00:46:20,980 --> 00:46:27,130 That's a good target.
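As a toy illustration of this "most upstream target" scoring — the graph, the gene names and the scoring function below are made up, and the talk's pipeline learns the causal graph from data rather than hard-coding it — one can count how many disease-associated genes sit downstream of each candidate target:

import networkx as nx

# A toy learned causal DAG over genes (edges point from cause to effect).
G = nx.DiGraph([
    ("T1", "g1"), ("T1", "g2"), ("g2", "g3"),
    ("T2", "g3"), ("g3", "g4"), ("T3", "g5"),
])
disease_genes = {"g1", "g3", "g4", "g5"}         # differentially expressed in disease
candidate_targets = {"T1", "T2", "T3"}           # known targets of the top-ranked drugs

def n_downstream_disease_genes(graph, target):
    # How many disease-associated genes are downstream of (causally affected by) this target?
    return len(nx.descendants(graph, target) & disease_genes)

scores = {t: n_downstream_disease_genes(G, t) for t in candidate_targets}
print(scores)                                    # in this toy graph: T1 covers 3, T2 covers 2, T3 covers 1
print("putative causal target:", max(scores, key=scores.get))

The candidate that is upstream of the most disease genes is then the hypothesised causal target.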
And so what we're looking for is which one amongst all of these genes that are 491 00:46:27,130 --> 00:46:32,970 targeted by these drugs is upstream of most of the disease-associated genes. 492 00:46:32,970 --> 00:46:39,010 OK, so we just learnt these graphs and thus have a score for each one of the targets from the previous slide. 493 00:46:39,010 --> 00:46:44,770 And what comes out to be the most upstream is this specific one, RIPK1. I knew nothing about it before starting this, 494 00:46:44,770 --> 00:46:51,740 but it's actually a super, super interesting protein. In fact, the finding is very stable, so you can do this in different kinds of contexts. 495 00:46:51,740 --> 00:46:55,300 So this here is a cell line; you can do this also in the CMap datasets. 496 00:46:55,300 --> 00:47:03,610 If you do it in the cells that are actually in our body, you get out exactly the same protein, RIPK1, as the most upstream one. 497 00:47:03,610 --> 00:47:07,930 And what is really interesting — and this, again, wasn't put into our model — 498 00:47:07,930 --> 00:47:15,250 is that RIPK1 directly binds to SARS-CoV-2 proteins, and in fact a drug that targets RIPK1 very specifically 499 00:47:15,250 --> 00:47:22,690 just entered phase two trials, which was quite nice as well. So it is a super interesting protein. 500 00:47:22,690 --> 00:47:25,300 It can also have very different effects. 501 00:47:25,300 --> 00:47:31,600 And this is what we have also looked at quite a bit, because in older individuals it has very different effects than in young individuals: 502 00:47:31,600 --> 00:47:37,310 it can actually turn on two different pathways. And we think — or hypothesise — that this might be what is going on in older 503 00:47:37,310 --> 00:47:42,950 individuals versus young individuals: in young individuals it might actually be turning on the immune response pathway, the 504 00:47:42,950 --> 00:47:51,020 survival pathways, but it can also turn on other pathways — the fibrosis and death pathways — which actually lead to fibrosis, 505 00:47:51,020 --> 00:47:55,670 blood clotting, etc., exactly the kinds of different outcomes that you see in patients. 506 00:47:55,670 --> 00:47:57,650 So that is quite an interesting protein that came out. 507 00:47:57,650 --> 00:48:04,620 Of course, a lot of things still need to be worked out, but at least there is a hypothesised mechanism as well of how this could work. 508 00:48:04,620 --> 00:48:11,520 OK, so I am kind of at the end — it's five twenty — so this is just an overview: we looked at one causal question, 509 00:48:11,520 --> 00:48:17,520 which is how to predict the effects of interventions, right — in this case, drugs. And moreover, 510 00:48:17,520 --> 00:48:20,790 there are all these transport problems in biology that I find really exciting. 511 00:48:20,790 --> 00:48:26,730 So you have this transport problem of: you have some drugs that you have observed in 512 00:48:26,730 --> 00:48:31,290 some cell types, and you want to predict the effects of these drugs in other cell types. 513 00:48:31,290 --> 00:48:33,000 I mean, the more extreme case would be: 514 00:48:33,000 --> 00:48:39,330 I have these drugs that I have observed in mouse, and I would really love to know what these drugs actually do in humans. 515 00:48:39,330 --> 00:48:44,280 Of course, we for now did this just for cell types, and I think that's maybe the more reasonable thing to do.
516 00:48:44,280 --> 00:48:50,790 But then you also have much harder problems, which is, you know: I have some drugs that I applied in one cell type, 517 00:48:50,790 --> 00:48:55,440 and now I would like to be able to transport to a new drug. 518 00:48:55,440 --> 00:48:57,930 OK, so I have these drugs applied in one cell type. 519 00:48:57,930 --> 00:49:03,870 Now I have a new drug, and I would like to be able to predict what happens when you apply this new drug that you have never seen before. 520 00:49:03,870 --> 00:49:08,670 Before, we had already seen the drug and we just wanted to apply it to a different cell type where we also have some data. 521 00:49:08,670 --> 00:49:12,420 But here you now have a new drug and you would like to transport to it. Or, you know, 522 00:49:12,420 --> 00:49:17,100 you can have all these other transport problems that are maybe less causal, where you want to be able 523 00:49:17,100 --> 00:49:21,690 to transport between very different data modalities in biology or between different time points. 524 00:49:21,690 --> 00:49:26,700 Right. So something that has always kind of bothered me when we're thinking about, 525 00:49:26,700 --> 00:49:30,900 say, early cancer detection is that if I'm training on data from a pathologist, 526 00:49:30,900 --> 00:49:38,700 I'll never be able to detect cancer earlier than the pathologist, because I'm always using the training data from the pathologist. 527 00:49:38,700 --> 00:49:43,230 So I need a way to somehow generate data, right, that corresponds to earlier time points. 528 00:49:43,230 --> 00:49:50,640 So how can I move between different time points, to generate my own training data of how these cells would have looked earlier? 529 00:49:50,640 --> 00:49:52,830 So I guess these are other transport problems, 530 00:49:52,830 --> 00:49:59,520 and you see that there are a whole lot of nice machine learning questions in all these transport problems in biology. 531 00:49:59,520 --> 00:50:07,980 So with that, let me end with some conclusions. I think drug discovery requires quite a lot 532 00:50:07,980 --> 00:50:11,490 more theoretical kinds of frameworks to look at things. In particular, 533 00:50:11,490 --> 00:50:16,080 I think causality is an important framework in drug discovery, because drugs are interventions 534 00:50:16,080 --> 00:50:21,870 in the system, and the only way to be able to predict the effects of an intervention is to actually take a causal approach. 535 00:50:21,870 --> 00:50:27,480 Autoencoders are not only extremely useful in biology — my lab has used them a lot — 536 00:50:27,480 --> 00:50:32,370 but I think they are also really useful for studying theoretical properties of neural networks. 537 00:50:32,370 --> 00:50:35,130 Here, you know, the fact that they're actually learning these contractive maps — 538 00:50:35,130 --> 00:50:40,710 we were only able to see it because autoencoders are so nice that you can just apply the maths, right? 539 00:50:40,710 --> 00:50:47,040 It is hard to think of how this would generalise, or how it would give you insights about the classification setting. 540 00:50:47,040 --> 00:50:50,100 I think that is a really nice question here. 541 00:50:50,100 --> 00:50:55,830 I showed all of these results on overparameterisation, that these networks show these remarkable self-regularisation properties: 542 00:50:55,830 --> 00:50:58,590 they learn these contractive maps at the training examples.
543 00:50:58,590 --> 00:51:03,030 Now, this is super important if you're thinking about just how memory could work, 544 00:51:03,030 --> 00:51:06,630 etc. So this is actually a new mechanism for associative memory. 545 00:51:06,630 --> 00:51:11,490 Of course, it has negative consequences as well. 546 00:51:11,490 --> 00:51:16,500 It probably means that you may not want to share a trained autoencoder between different hospitals, right? 547 00:51:16,500 --> 00:51:20,460 Because you can just put in noise, and out comes a training example. 548 00:51:20,460 --> 00:51:25,020 That certainly means that you have a lot of privacy issues there. And there are a lot of open problems. 549 00:51:25,020 --> 00:51:29,610 I mean, causal transportability is one, right: can you actually prove that you do get causal transport — 550 00:51:29,610 --> 00:51:31,710 that if you overparameterise, 551 00:51:31,710 --> 00:51:40,560 you can actually do this, or can sometimes, and hopefully often, do this causal transportability problem? Then, as I already said, classification: 552 00:51:40,560 --> 00:51:44,970 what does this tell us about classification networks? We would love to have a general definition 553 00:51:44,970 --> 00:51:50,310 of generalisation for autoencoders that makes sense and is not just learning the identity map. 554 00:51:50,310 --> 00:51:53,750 And I also already mentioned this: we have no real idea here. 555 00:51:53,750 --> 00:52:00,480 We do know from experiments that the attractor landscape is very dependent on the nonlinearity that you're using. 556 00:52:00,480 --> 00:52:03,600 It will be really important to figure out how this actually looks and how 557 00:52:03,600 --> 00:52:09,220 one could optimise the nonlinearity for different kinds of applications. 558 00:52:09,220 --> 00:52:16,420 So with this, let me end with acknowledgements. I wouldn't be able to do this without a really amazing group of people — 559 00:52:16,420 --> 00:52:22,300 students, postdocs and, of course, collaborators: G.V. Shivashankar on all of the biological work we have done, 560 00:52:22,300 --> 00:52:26,440 Misha Belkin on the overparameterised autoencoders work, and then, of course, the funding. 561 00:52:26,440 --> 00:52:33,940 And I'll end with just one slide of advertisement — I hope that's fine. We just started the Eric and Wendy Schmidt Center at the Broad. 562 00:52:33,940 --> 00:52:41,500 If you like these kinds of problems at the intersection of machine learning and biology, there is a really exciting centre going on there. 563 00:52:41,500 --> 00:52:44,320 We have a lot of workshops and conferences if you're just interested in 564 00:52:44,320 --> 00:52:49,240 learning more about this intersection, and we're always looking for postdocs, et cetera. 565 00:52:49,240 --> 00:52:56,320 So with that, thank you very much. And happy to take any questions. 566 00:52:56,320 --> 00:52:59,980 Thanks a lot. I don't know if there are questions — 567 00:52:59,980 --> 00:53:09,880 please feel free to ask. Otherwise, I do have a question regarding how you showed these autoencoders dividing the space of inputs, 568 00:53:09,880 --> 00:53:15,820 which, after reading your article, seems to be about how different inputs get mapped onto different training examples.
569 00:53:15,820 --> 00:53:26,950 I mean, you end up dividing the space into regions, so I wonder what the relation is between this method and the more classical 570 00:53:26,950 --> 00:53:35,440 way of understanding coding, in which you look at these Voronoi regions and you assign each point of the space — 571 00:53:35,440 --> 00:53:41,110 I mean, do you partition the space according to how close the points are to different centres? 572 00:53:41,110 --> 00:53:47,110 Yeah, exactly. So that's what seems to happen when you have sigmoid nonlinearities. Yeah. 573 00:53:47,110 --> 00:53:51,980 So, I mean, well, that would exactly be a kind of conjecture, right? 574 00:53:51,980 --> 00:53:58,300 So when, and under what kinds of nonlinearity or other architecture constraints — 575 00:53:58,300 --> 00:54:06,190 I don't know how much that will matter — but, you know, when will you get something that is just closeness, 576 00:54:06,190 --> 00:54:15,520 right? So these pictures might be leading us in the wrong kinds of directions, because these are just two-dimensional plots, right? 577 00:54:15,520 --> 00:54:19,610 What happens in a hugely overparameterised setting, you know, is not so clear. 578 00:54:19,610 --> 00:54:24,340 So maybe it doesn't depend so much on the nonlinearity anymore? Who knows? 579 00:54:24,340 --> 00:54:30,700 Right. But yes, that's exactly the kind of question: when does it depend on the nonlinearity, or is it always the case 580 00:54:30,700 --> 00:54:35,290 that it will just map to the closest one — and closest in what norm? 581 00:54:35,290 --> 00:54:41,710 Mm hmm. Yeah. And do you think that these different kinds of behaviour that you may have 582 00:54:41,710 --> 00:54:47,050 might underlie different notions of generalisation? 583 00:54:47,050 --> 00:54:52,270 I would think that probably for different kinds of applications, you may want to choose it in different ways. 584 00:54:52,270 --> 00:54:58,330 Also — I mean, what I think would be really nice is to have an example like: 585 00:54:58,330 --> 00:55:05,830 can you come up with an example, to use as a training example, that will just have a huge 586 00:55:05,830 --> 00:55:11,230 basin of attraction, where somehow all of the nonsense will be mapped to it? 587 00:55:11,230 --> 00:55:12,220 That would be great, right? 588 00:55:12,220 --> 00:55:18,190 Can you just somehow come up with an example where all the nonsense goes? For that, you really need to understand it quite well. 589 00:55:18,190 --> 00:55:25,960 Mm hmm. I see. And regarding the application to, well, 590 00:55:25,960 --> 00:55:36,790 the pharmaceutical industry — what do you think is the current state of technological development there? Is the large pharma industry, 591 00:55:36,790 --> 00:55:45,010 which says it is interested, actually applying these more, let's say, futuristic ways of understanding drugs? 592 00:55:45,010 --> 00:55:51,400 Oh, they are super interested — we even have joint postdocs. So yes, I think it's actually a really exciting time now, 593 00:55:51,400 --> 00:55:56,440 where pharma is really thinking about how to get into machine learning and how to use some of these, you know, 594 00:55:56,440 --> 00:56:02,410 methods where you have some theory, etc. And I think they're getting really excited about these things as well.
595 00:56:02,410 --> 00:56:07,780 There are more and more pharma companies that are investing heavily in ML and actually have really good people. 596 00:56:07,780 --> 00:56:12,910 So I think it is exciting to work together with them, if this is an area that you're excited about. 597 00:56:12,910 --> 00:56:14,200 Yeah, it's really nice to see. 598 00:56:14,200 --> 00:56:22,470 I think they realise that something has to change: a lot of money is being put into things that in the end have not been super successful. 599 00:56:22,470 --> 00:56:26,770 So maybe this is a new approach worth investing in as well. 600 00:56:26,770 --> 00:56:32,600 So, yeah, no, I think it's a really exciting time for people in machine learning. 601 00:56:32,600 --> 00:56:38,920 Yeah. So could I ask a question, please? 602 00:56:38,920 --> 00:56:44,410 Hi, my name is Karen Amon, I'm an oncologist, and on the COVID side with the drugs 603 00:56:44,410 --> 00:56:49,840 I just wanted to make a comment. And yeah, so you'll know all of them. 604 00:56:49,840 --> 00:57:01,550 So in the list of drugs, there is a clear bias towards anti-angiogenesis drugs, like pazopanib and axitinib, 605 00:57:01,550 --> 00:57:04,390 and that may warrant further investigation. 606 00:57:04,390 --> 00:57:11,680 And I was looking, just for my own curiosity, into the COVID literature, and I wouldn't be surprised. 607 00:57:11,680 --> 00:57:16,150 And there are also monoclonal antibodies against angiogenesis, 608 00:57:16,150 --> 00:57:27,280 which would also have an immunomodulatory role. And in the picture there are also chloroquine and oestradiol, although that's sort of a separate point. 609 00:57:27,280 --> 00:57:34,300 So this one is actually from the original space — so this one was not found in the latent space, this one was in the original space. 610 00:57:34,300 --> 00:57:40,840 That's why I was so surprised. I mean, I actually thought maybe the people who came up with chloroquine had just done this simple analysis. 611 00:57:40,840 --> 00:57:44,260 But then people told me, no, that was already based on something more advanced than this, 612 00:57:44,260 --> 00:57:49,120 because here, amongst all of the drugs in the list from CMap, chloroquine actually 613 00:57:49,120 --> 00:57:55,330 came out to be the most anti-correlated, in the original space, with the disease vector, 614 00:57:55,330 --> 00:58:00,430 which I find amazing, because in the latent space that is not the case. 615 00:58:00,430 --> 00:58:05,360 But as I said, this is probably not what people had done when they came up with this proposal. 616 00:58:05,360 --> 00:58:14,770 Did anyone look at oestradiol? Because COVID is supposed to affect men more than women in terms of severe symptoms. 617 00:58:14,770 --> 00:58:19,090 Ah yeah, so this one — we could check where this one happens to be. 618 00:58:19,090 --> 00:58:22,420 But I mean, it's not one of the top ones here. 619 00:58:22,420 --> 00:58:23,350 So this would be interesting. 620 00:58:23,350 --> 00:58:29,760 We haven't checked — I would have to check where this one actually comes out to be in the latent space and how close it is. 621 00:58:29,760 --> 00:58:42,860 OK, yeah. Then thanks for the question. Yeah, so since there are no more questions — 622 00:58:42,860 --> 00:58:47,030 thanks a lot, Caroline, this has been very interesting. 623 00:58:47,030 --> 00:58:50,480 Yeah, thank you. This was fun. Great. 624 00:58:50,480 --> 00:58:55,100 So to everyone, have a great summer, and enjoy the rest of it too.
625 00:58:55,100 --> 00:58:58,200 You too bye. Yeah, thank you. Bye.