1 00:00:00,090 --> 00:00:03,720 My name is Charles Godfray, I am Director of the Oxford Martin School. 2 00:00:03,720 --> 00:00:10,680 And thank you for joining us for an extra web conversation. Over the last term, 3 00:00:10,680 --> 00:00:18,660 we've been having a series of conversations on the topic of what the slogan "building back better" means after the pandemic. 4 00:00:18,660 --> 00:00:23,070 And that series stopped last week. But something else happened last week. 5 00:00:23,070 --> 00:00:32,600 And this was a really interesting scientific advance, a potential solution of what is known as the protein folding problem. 6 00:00:32,600 --> 00:00:40,410 And we had a number of emails which asked us, might we have a talk to explore some of the exciting issues around that? 7 00:00:40,410 --> 00:00:46,500 And that is what this talk is today. And I'm joined by two really interesting guests. 8 00:00:46,500 --> 00:00:56,460 One who you can see, who is Phil Biggin, in front, and one who you'll be able to hear but, I'm afraid, not see because of a technical issue. 9 00:00:56,460 --> 00:01:01,770 And that is Yvonne Jones. Yvonne, can I just check that you are there and listening to us? 10 00:01:01,770 --> 00:01:07,210 I am. And I'm really sorry, everybody. Sorry, Charles and Phil. 11 00:01:07,210 --> 00:01:15,180 As Charles said, something weird has happened to the camera on my laptop and it just refuses to work. 12 00:01:15,180 --> 00:01:20,970 There is a gremlin somewhere. So let me begin just by introducing Yvonne. 13 00:01:20,970 --> 00:01:25,740 So Yvonne is Professor of Protein Crystallography at Oxford. 14 00:01:25,740 --> 00:01:31,860 She's a Fellow of the Royal Society. She co-founded at Oxford, with Dave Stuart, the Division of Structural Biology, 15 00:01:31,860 --> 00:01:39,720 which is part of the Nuffield Department of Clinical Medicine, of which she is co-head at the moment. 
16 00:01:39,720 --> 00:01:48,710 And Phil Biggin is Professor of Computational Biochemistry in the Department of Biochemistry, which is part of our Medical Sciences Division. 17 00:01:48,710 --> 00:01:50,220 And rather than me doing it, 18 00:01:50,220 --> 00:01:58,080 I'm going to ask Phil and Yvonne, in about two sentences, just to give us a flavour of the work in their labs at the moment. Yvonne, 19 00:01:58,080 --> 00:02:06,280 might you go first? Sure. I'm interested in the receptors that sit on cell surfaces and take in signals, in the 20 00:02:06,280 --> 00:02:11,470 form of proteins that bind to them, to allow cells to communicate with each other. 21 00:02:11,470 --> 00:02:21,840 I'm particularly interested in signalling between cells, both in the developing nervous system and also in cells involved in the immune system. 22 00:02:21,840 --> 00:02:25,500 And I use protein crystallography to solve those structures, 23 00:02:25,500 --> 00:02:30,730 but I also bring in lots of other techniques because I'm really interested in the way that the proteins interact with each other, 24 00:02:30,730 --> 00:02:36,350 not just in their shape. Thank you. Phil? Yes. 25 00:02:36,350 --> 00:02:39,310 And my lab is also interested in receptors. 26 00:02:39,310 --> 00:02:49,570 But we predominantly use molecular dynamics simulation technology to look at the underlying dynamics of these proteins, 27 00:02:49,570 --> 00:02:53,350 looking at things like channels and transporter proteins. 28 00:02:53,350 --> 00:03:02,560 And although we don't do structure prediction per se, many of the underlying computational methods that they use, we use as well. 29 00:03:02,560 --> 00:03:08,290 Thank you very much, both. So what we're going to do is chat very briefly about what a protein is. 30 00:03:08,290 --> 00:03:12,190 And I realise that many people listening will be very familiar with it. 
31 00:03:12,190 --> 00:03:15,670 But we're going to have a brief discussion of that just to bring people up to speed 32 00:03:15,670 --> 00:03:24,310 if they don't know about it. And then we'll go on to what exactly this recent advance is and some of the consequences of it. 33 00:03:24,310 --> 00:03:30,910 Now, those of you watching live on Crowdcast will see that down near the bottom right-hand side of your screen 34 00:03:30,910 --> 00:03:36,850 there is a facility to ask a question. We do really encourage you to ask a question. 35 00:03:36,850 --> 00:03:46,030 There's also the facility to vote questions up, so that if someone else has asked the question that you're really interested in seeing answered, 36 00:03:46,030 --> 00:03:48,900 then you're able to vote it up. 37 00:03:48,900 --> 00:03:58,240 And while it's not exactly a strict democracy when it comes to looking for questions, I do see the ones which have got the most support. 38 00:03:58,240 --> 00:04:02,860 So let's explore a little bit about the background. 39 00:04:02,860 --> 00:04:14,370 And Yvonne, perhaps I can go to you just to remind us about what a protein is and why we are interested in its three-dimensional shape. 40 00:04:14,370 --> 00:04:22,720 OK, so proteins are the little machines or the little workers in our cells, and also messengers that go between our cells. 41 00:04:22,720 --> 00:04:36,720 They're encoded in the genome by our DNA sequences, which are translated into what have been described as beads on a string. 42 00:04:36,720 --> 00:04:42,240 So there are a number of different flavours of amino acids. 43 00:04:42,240 --> 00:04:50,820 And these form into polypeptide chains, so long strings of beads, each of which has rather different properties. 44 00:04:50,820 --> 00:04:59,040 Some of them are positively charged, some negatively charged, and some are sticky, sort of oily amino acids. 
45 00:04:59,040 --> 00:05:08,370 And according to those properties, the beads on the string will fold up into a complex three-dimensional structure. 46 00:05:08,370 --> 00:05:18,840 I heard Janet Thornton on the Today programme, actually, when this announcement about the protein folding problem hit the news a week or so ago. 47 00:05:18,840 --> 00:05:21,380 And she described it as, yes, sort of: 48 00:05:21,380 --> 00:05:28,370 you start out with a shoelace and you're kind of, you know, forming that or tying it up into the three-dimensional shape. 49 00:05:28,370 --> 00:05:33,510 So a kind of origami. Absolutely. Yes. 50 00:05:33,510 --> 00:05:36,720 And why is it interesting, the way that it all folds up? 51 00:05:36,720 --> 00:05:41,020 Well, because each protein has its own unique shape, its own unique fold, 52 00:05:41,020 --> 00:05:46,140 and that fold, that shape that it has in three dimensions, determines its properties, 53 00:05:46,140 --> 00:05:52,530 what it can do. I just said that they're like little machines in our cells or the messengers going between cells. 54 00:05:52,530 --> 00:06:00,750 So they are capable of doing all the tasks that we need doing, but to do those tasks 55 00:06:00,750 --> 00:06:07,720 they have a very specific shape. And if you don't know... I'm not explaining this very well. 56 00:06:07,720 --> 00:06:13,170 If you don't know the shape, it's a bit like not knowing what's underneath the bonnet of a car. 57 00:06:13,170 --> 00:06:17,640 You can't work out how they work if you don't know what shape they are. 58 00:06:17,640 --> 00:06:21,620 So understanding the three-dimensional shape is really critical. 59 00:06:21,620 --> 00:06:24,330 Now, you're Professor of Protein Crystallography. 60 00:06:24,330 --> 00:06:35,120 Might you say how crystallography and X-ray diffraction are used to determine a three-dimensional shape? 61 00:06:35,120 --> 00:06:46,100 Right. 
Well, the problem is obviously that proteins are too small to be able to visualise using light microscopy. 62 00:06:46,100 --> 00:06:56,300 So they are a number of orders of magnitude smaller than you could resolve using visible light to see that detail. 63 00:06:56,300 --> 00:07:09,080 But you can use X-rays. They have a wavelength that is able to pick up that fine structure. 64 00:07:09,080 --> 00:07:18,020 If you do that, you can't really just do that by having a single copy of a protein and hitting it with X-rays. 65 00:07:18,020 --> 00:07:24,440 You need many copies of the protein, all lined up rather nicely in a crystal. 66 00:07:24,440 --> 00:07:29,990 So the crystal is just a trick, an amplification method. 67 00:07:29,990 --> 00:07:32,990 So you're able to look at many, 68 00:07:32,990 --> 00:07:42,050 many copies of the protein at once and that will give you a strong enough signal to be able to actually solve the structure. 69 00:07:42,050 --> 00:07:46,760 Unfortunately, it's a bit more tricky than that because you don't actually manage to 70 00:07:46,760 --> 00:07:52,170 measure all the information you need just from the amplitudes of the diffracted 71 00:07:52,170 --> 00:07:54,530 X-rays, as we call them. 72 00:07:54,530 --> 00:08:02,750 There is some other missing information to do with the timing, the phase, of the different diffracted rays coming off. 73 00:08:02,750 --> 00:08:10,500 Is part of the complexity of the problem why so many Nobel Prizes have gone to X-ray crystallography over the years? 74 00:08:10,500 --> 00:08:18,950 I guess maybe initially that would be the case, because when people like Dorothy Hodgkin, working in Oxford, 75 00:08:18,950 --> 00:08:28,500 and Max Perutz and John Kendrew, working in Cambridge, started out, it was a mammoth task. 
76 00:08:28,500 --> 00:08:33,440 And they had to work out all the methods of how you could even begin to sort out 77 00:08:33,440 --> 00:08:38,360 what a protein structure would look like when they had no idea what it was going to look like. 78 00:08:38,360 --> 00:08:46,180 But from the beginning, because it's a difficult thing to do, people have 79 00:08:46,180 --> 00:08:53,100 always chosen to do structures that are going to be very enlightening for the biology. 80 00:08:53,100 --> 00:08:57,730 So in the case of Dorothy, she worked for many years on insulin. 81 00:08:57,730 --> 00:09:01,290 She thought this is going to be an important structure to understand, 82 00:09:01,290 --> 00:09:11,230 because ultimately we want to be able to make forms of insulin that will help diabetics. And can all proteins be crystallised, 83 00:09:11,230 --> 00:09:16,860 or are there some proteins that you just can't get a structure out of using this approach? 84 00:09:16,860 --> 00:09:23,630 Yeah, I mean, for many proteins that we have succeeded in crystallising, there are various tricks. 85 00:09:23,630 --> 00:09:32,130 Quite a lot of them take work over many years to persuade them to crystallise, using different techniques to get them to crystallise. 86 00:09:32,130 --> 00:09:38,280 But there are also some proteins that have just never crystallised. 87 00:09:38,280 --> 00:09:47,860 Now, quite why that might be... One whole set of proteins that won't crystallise are those that are only partially structured, 88 00:09:47,860 --> 00:09:52,230 that only really become structured when they interact with other proteins. 89 00:09:52,230 --> 00:09:56,460 Well, that's another whole different problem. But yeah. 90 00:09:56,460 --> 00:10:00,060 Yeah, it's a bit of a black art, persuading them to crystallise. 
91 00:10:00,060 --> 00:10:07,510 But of course, you know, there are other methods for experimentally solving three-dimensional protein structures. 92 00:10:07,510 --> 00:10:10,800 I was going to ask you about that. On cryo-EM, 93 00:10:10,800 --> 00:10:18,870 might you just explain what that is and what difference it makes to protein structural studies? 94 00:10:18,870 --> 00:10:25,520 So I was saying that X-rays can allow you to probe the fine, 95 00:10:25,520 --> 00:10:35,250 the fine detail, the sort of dimensions of the distances between atoms that you need to know to visualise a protein structure. 96 00:10:35,250 --> 00:10:40,850 Well, electrons can allow you to do that as well. 97 00:10:40,850 --> 00:10:48,100 So an electron microscope will allow you to solve the protein structure. 98 00:10:48,100 --> 00:10:56,860 But for many years, that promise of being able to solve the protein structure was out of reach. 99 00:10:56,860 --> 00:11:05,620 The first hurdle was, again, that you need to be able to gather the information from a lot of different copies of the protein. 100 00:11:05,620 --> 00:11:12,880 And each one individually would be damaged very rapidly by being hit by the electrons in the electron microscope. 101 00:11:12,880 --> 00:11:20,020 So you have to be able to cryopreserve them, to embed them in vitreous ice, and 102 00:11:20,020 --> 00:11:26,350 people discovered ways of being able to do that without damaging the protein. 103 00:11:26,350 --> 00:11:33,310 But then you need to be able to collect huge amounts of data and to do so very sensitively. 104 00:11:33,310 --> 00:11:40,660 So you need really, really good detector systems. 
And then you need really, really good computer power to crunch them, 105 00:11:40,660 --> 00:11:45,280 because some of the tricks that we've been able to use in protein crystallography, 106 00:11:45,280 --> 00:11:50,290 the fact that we've got a crystal, the computer has to do that for cryo-EM. 107 00:11:50,290 --> 00:11:55,540 It has to be able to add together all the individual images from many, many, many, 108 00:11:55,540 --> 00:12:01,840 many, many copies of the individual protein molecules that you've got on your 109 00:12:01,840 --> 00:12:09,610 EM grid. So we needed really good computers, needed really big advances in detectors and, bingo, you start to be able to do it. 110 00:12:09,610 --> 00:12:17,380 And that's opened the door for lots of proteins that were recalcitrant when it came to crystallising them. 111 00:12:17,380 --> 00:12:21,580 So now we can use the two techniques in tandem. It's really powerful and exciting. 112 00:12:21,580 --> 00:12:29,020 Thanks a lot, Yvonne. So that is a sort of physical, direct way of estimating protein structure. 113 00:12:29,020 --> 00:12:35,890 But of course, for most proteins, we have the amino acid sequence. We have the sequence of beads on the string that Yvonne talked about. 114 00:12:35,890 --> 00:12:40,660 And we pretty much know the different ways that the molecules interact. 115 00:12:40,660 --> 00:12:48,370 So, Phil, you have all this information. Why is it not straightforward just to use a big computer to calculate the three-dimensional structure? 116 00:12:48,370 --> 00:12:53,980 Why is there a protein folding problem? Yes, it sounds such a simple question, doesn't it? 117 00:12:53,980 --> 00:13:02,710 We've got the sequence, so we should be able to just compute the structure. And the best way probably to explain this is a kind of thought exercise. 118 00:13:02,710 --> 00:13:09,970 That is a classic paradox that many people probably have heard of, formulated by Cyrus Levinthal. 
119 00:13:09,970 --> 00:13:16,990 In 1969, he noted that it would take longer than the age of the known universe to enumerate 120 00:13:16,990 --> 00:13:22,450 all possible conformations of a typical protein by brute-force calculation. 121 00:13:22,450 --> 00:13:31,360 So he estimated that there are something like 10 to the power of 300 possible conformations for a typical protein. 122 00:13:31,360 --> 00:13:40,780 And, you know, to sample each conformation, even if it required only a picosecond of time, would mean that it would actually require time 123 00:13:40,780 --> 00:13:43,240 way beyond even the age of the universe. 124 00:13:43,240 --> 00:13:52,330 And just to put that in context, the age of the universe is about 14 billion years, which is about four times 10 to the 17 seconds. 125 00:13:52,330 --> 00:13:56,560 So, yes, that's why it's such a hard problem, because you can't solve it by brute force, 126 00:13:56,560 --> 00:14:02,950 because you'd have to numerically calculate all of the energies of these different potential conformations. 127 00:14:02,950 --> 00:14:08,130 And even if you had a very fast way of doing that, it would just simply be intractable. 128 00:14:08,130 --> 00:14:14,560 You wouldn't have enough compute. And yet, you know, in nature proteins fold spontaneously, sometimes within milliseconds. 129 00:14:14,560 --> 00:14:18,460 And that dichotomy is referred to as Levinthal's paradox. 130 00:14:18,460 --> 00:14:25,720 So if you can't do it by brute force, what approaches have computational biochemists taken over the last 50 years? 131 00:14:25,720 --> 00:14:32,550 Yes. So that's a good question. So I would say you could even go back beyond that, actually; you could probably go back to the 50s. 132 00:14:32,550 --> 00:14:39,080 The first prediction, if you like, can be attributed to Pauling and Corey, who first suggested this notion that there would be alpha 133 00:14:39,080 --> 00:14:43,460 helices and beta sheets, and they're going to do... 
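[Editor's note] The arithmetic behind Levinthal's paradox, as quoted in the conversation, can be checked with a short Python sketch. The figures (10 to the power of 300 conformations, one picosecond per conformation, a universe about 14 billion years old) are the speaker's round numbers; the constant and function names are illustrative choices of ours, and working in base-10 logarithms is only a convenience for keeping the numbers readable.

```python
import math

# Round figures quoted by the speaker (assumed here for illustration only).
LOG10_CONFORMATIONS = 300              # ~10^300 conformations for a typical protein
LOG10_SECONDS_PER_CONFORMATION = -12   # one picosecond to sample each conformation
AGE_OF_UNIVERSE_YEARS = 14e9           # about 14 billion years
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def age_of_universe_seconds() -> float:
    """Age of the universe in seconds, from the quoted 14 billion years."""
    return AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR


def log10_universe_ages_needed() -> float:
    """log10 of how many universe lifetimes an exhaustive search would take."""
    log10_search_seconds = LOG10_CONFORMATIONS + LOG10_SECONDS_PER_CONFORMATION
    return log10_search_seconds - math.log10(age_of_universe_seconds())


if __name__ == "__main__":
    print(f"Age of universe: {age_of_universe_seconds():.1e} s")
    print(f"Exhaustive search: ~10^{log10_universe_ages_needed():.0f} universe lifetimes")
```

Under these assumptions the exhaustive search takes about 10^288 seconds against roughly 4 x 10^17 seconds available, which is the gap the paradox points at.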
134 00:14:43,460 --> 00:14:49,870 Just to interrupt you there. Alpha helices and beta sheets are parts of protein structures that you get in lots of different proteins, a 135 00:14:49,870 --> 00:14:54,560 sort of common theme that you get, three-dimensional [inaudible], a sort of substructure, 136 00:14:54,560 --> 00:14:58,250 if you like. That's probably the right way to think of them. Yes, exactly. 137 00:14:58,250 --> 00:15:02,960 Yes. So alpha helices have this kind of helical conformation, and beta sheets 138 00:15:02,960 --> 00:15:06,500 are just kind of extended strand conformations. 139 00:15:06,500 --> 00:15:13,530 And the 50s was also a time when the first sequence was actually elucidated, of the protein Yvonne just mentioned, insulin. 140 00:15:13,530 --> 00:15:18,800 Insulin, I think, was the first sequence, elucidated by Fred Sanger. 141 00:15:18,800 --> 00:15:28,850 And around this time, Anfinsen was also doing his quite famous experiments on RNase, looking at how it would fold. 142 00:15:28,850 --> 00:15:33,590 And that's a now very famous experiment that he did in 1961, 143 00:15:33,590 --> 00:15:37,550 which concluded with a statement which was basically that the native conformation of 144 00:15:37,550 --> 00:15:43,690 a protein is determined entirely by the amino acid sequence in a given environment, 145 00:15:43,690 --> 00:15:45,730 but the latter clause is often left off. 146 00:15:45,730 --> 00:15:55,130 It's actually super important, this given environment. And then obviously that became a thing, to try and predict the 3D structure. 147 00:15:55,130 --> 00:16:02,480 But I would say most effort was focussed on the smaller task of being able to predict the secondary structure. 148 00:16:02,480 --> 00:16:10,790 So by that, I mean these alpha helices and beta sheets, so if you could just work out which bits might adopt these helical bits or strands, 149 00:16:10,790 --> 00:16:15,670 that would be a good start. 
And so a lot of the early work sort of dates back to the work of 150 00:16:15,670 --> 00:16:25,910 Chou and Fasman in about 1974, which was, I would say, the first real prediction of structure, of these helices and sheets, if you like. 151 00:16:25,910 --> 00:16:27,320 And they simply said, well, 152 00:16:27,320 --> 00:16:36,410 we'll just use the statistical propensities of the amino acid residues to form a specific structural element or secondary structural element. 153 00:16:36,410 --> 00:16:42,080 And they did that and that was quite successful. And ever since then, people have been trying to extend that. 154 00:16:42,080 --> 00:16:44,210 And people started to employ, for example, 155 00:16:44,210 --> 00:16:51,170 neural networks in the late 80s and 90s, which were first applied to secondary structure prediction quite successfully. 156 00:16:51,170 --> 00:16:59,810 And that's still ongoing. And then I suppose the next thing to note that is particularly important in the context 157 00:16:59,810 --> 00:17:06,290 of protein structure prediction is the establishment of the PDB in 1971. 158 00:17:06,290 --> 00:17:08,570 So that's the Protein Data Bank, 159 00:17:08,570 --> 00:17:16,590 which is essentially a database where the structures that Yvonne was talking about, that you derive from X-ray 160 00:17:16,590 --> 00:17:23,270 crystallography, are deposited in such a way that anybody around the planet can actually access them. 161 00:17:23,270 --> 00:17:31,250 And we're going to be coming on to that when we talk about AlphaFold. And forgive me, Phil, for just hurrying you along. 162 00:17:31,250 --> 00:17:35,390 Could you tell us a little bit about the CASP competition? Yes. 163 00:17:35,390 --> 00:17:43,460 So the CASP competition was something that was set up in 1994. CASP stands 164 00:17:43,460 --> 00:17:48,400 for the Critical Assessment of Structure Prediction. 
165 00:17:48,400 --> 00:17:56,750 In 94, people in the field kind of realised that they needed some way of trying to assess how good these predictions really were. 166 00:17:56,750 --> 00:17:59,990 And the idea was simply to do this as a double-blind experiment. 167 00:17:59,990 --> 00:18:11,270 So double-blind in the sense that you would release the sequences and withhold the structure itself, and allow participants to enter. 168 00:18:11,270 --> 00:18:12,620 They were entering in a blind way, 169 00:18:12,620 --> 00:18:20,500 meaning that the people who were going to assess the submissions didn't know who the people making those submissions were. 170 00:18:20,500 --> 00:18:25,970 And it was a very successful competition. There were 33 targets at the time; about 35 171 00:18:25,970 --> 00:18:30,560 groups, I think, took part in that and made about 100 predictions. 172 00:18:30,560 --> 00:18:34,250 And that model has been followed up. And it was so successful 173 00:18:34,250 --> 00:18:38,330 it's still going now, obviously, as we know, but it was also copied. 174 00:18:38,330 --> 00:18:46,160 So there was also something called CAPRI, which is the Critical Assessment of Predicted Interactions, that was set up in 2001. 175 00:18:46,160 --> 00:18:47,690 That's still going as well. 176 00:18:47,690 --> 00:18:56,570 There was also something called CAFA, the Critical Assessment of Function Annotation, that was set up in 2010 and that stopped in 2014, I think. 177 00:18:56,570 --> 00:19:03,090 But I think the community is still active. And another related set of competitions is called SAMPL, 178 00:19:03,090 --> 00:19:08,600 which stands for the Statistical Assessment of the Modelling of Proteins and Ligands and, as the name 179 00:19:08,600 --> 00:19:15,200 suggests, the idea of that is to focus more on problems to do with ligands and their interactions with proteins. 
180 00:19:15,200 --> 00:19:20,680 And that's something that my group, personally, has been a little bit more involved with in the last couple of years. 181 00:19:20,680 --> 00:19:24,910 And it's interesting you say that. As a population biologist, certainly 182 00:19:24,910 --> 00:19:32,560 the mathematical population biologists in my area have looked with great envy at CASP and the success of it, and 183 00:19:32,560 --> 00:19:35,970 we've tried to think whether we could set up something equivalent [inaudible], 184 00:19:35,970 --> 00:19:41,290 but so far have failed. [inaudible] 185 00:19:41,290 --> 00:19:45,880 Can I just interject? It's just from the perspective of an experimentalist. 186 00:19:45,880 --> 00:19:51,280 I mean, CASP is a fantastic community effort. 187 00:19:51,280 --> 00:19:54,820 The guys that run CASP go round, 188 00:19:54,820 --> 00:20:03,100 get in touch with all the experimentalists in the run-up to the competition and ask us if we've got any structures that we haven't yet published. 189 00:20:03,100 --> 00:20:10,980 And so there's a real interplay between the experimentalist community and, you know, 190 00:20:10,980 --> 00:20:18,040 the protein folders, because that's where they're getting structures that we haven't yet published. 191 00:20:18,040 --> 00:20:22,570 For me, the timing has never been quite right to be able to give them something. 192 00:20:22,570 --> 00:20:27,460 But, you know, these emails pop into my inbox every couple of years. 193 00:20:27,460 --> 00:20:29,470 Twenty years ago, when I worked at Imperial College, 194 00:20:29,470 --> 00:20:38,580 I remember going to a tremendous party with Mike Sternberg's group, which had done very well in the CASP competition of that year. 195 00:20:38,580 --> 00:20:42,700 Phil, you mentioned machine learning and neural nets. 
196 00:20:42,700 --> 00:20:51,040 And the headlines that we saw right at the beginning of the month were that AI had solved the protein folding problem. 197 00:20:51,040 --> 00:20:56,410 And you've mentioned that various forms of machine learning, 198 00:20:56,410 --> 00:20:57,820 so using a computer, 199 00:20:57,820 --> 00:21:07,570 particularly based on a neural net, to learn from many different existing structures to try and predict new ones, had been used before. 200 00:21:07,570 --> 00:21:19,930 Can you tell us a little bit about what is special about the approach that DeepMind and AlphaFold have taken and what they have achieved that is new? 201 00:21:19,930 --> 00:21:28,900 Yeah. OK. So the first thing I think that just needs to be clarified a little bit here is that they actually haven't solved protein folding per se. 202 00:21:28,900 --> 00:21:34,870 It's more correct to say that they've solved the protein structure prediction problem rather than the folding problem, 203 00:21:34,870 --> 00:21:37,540 which is a bit more complicated, actually. 204 00:21:37,540 --> 00:21:42,880 And if you even want to be more precise, you could argue they solved the prediction of crystal structures of proteins. 205 00:21:42,880 --> 00:21:52,170 So the question really is, how did they do it then? And the real answer is we don't actually know the precise details of how it works, 206 00:21:52,170 --> 00:21:56,680 because they've been very cagey about what they've been saying in the announcements. 207 00:21:56,680 --> 00:22:07,090 But presumably we will learn exactly how that works when papers appear in due course, presumably over the next year. 208 00:22:07,090 --> 00:22:12,160 We do know, however, a little bit about how their previous entry worked. 
209 00:22:12,160 --> 00:22:19,070 So going back to the previous CASP version, AlphaFold 1, if you want to call it that, 210 00:22:19,070 --> 00:22:26,110 we do know roughly how that worked because they did release a version of the code and so you could look at that and tinker with it. 211 00:22:26,110 --> 00:22:32,410 And that kind of builds on actually what a lot of other people were doing, 212 00:22:32,410 --> 00:22:39,040 which was to employ this concept of multiple sequence alignments to great effect. 213 00:22:39,040 --> 00:22:47,140 So one of the big developments, which I should also mention, which is terribly important, was the developments in sequencing. 214 00:22:47,140 --> 00:22:50,960 Because around the two thousands and beyond, 215 00:22:50,960 --> 00:22:56,200 there suddenly appeared this vast amount of data in terms of the sequences that were available. 216 00:22:56,200 --> 00:23:02,290 And this actually allowed you, by doing a very large multiple sequence alignment, 217 00:23:02,290 --> 00:23:11,440 to look at which residues in which positions seem to co-evolve. The idea is that if a residue at this position changed over here, 218 00:23:11,440 --> 00:23:17,380 then it looked like a residue would change to something else over here [inaudible], in the same sequence. 219 00:23:17,380 --> 00:23:20,380 And the idea was, by looking at how strongly these co-evolved, 220 00:23:20,380 --> 00:23:25,600 you could actually then predict that they might be close together in space. 221 00:23:25,600 --> 00:23:32,050 And you could use that to then create a kind of pairwise distance constraint on the building of the model. 222 00:23:32,050 --> 00:23:36,580 And actually, that's what a lot of people had done prior to the success of AlphaFold. 
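[Editor's note] The co-evolution idea described in this passage can be illustrated with a toy Python sketch. It scores every pair of columns in a multiple sequence alignment by mutual information and ranks the pairs; strongly co-varying columns are candidates for residues that sit close together in space. This is our own minimal illustration, not the method used by AlphaFold or by any particular CASP entrant, and the alignment `toy_msa` and all function names are made up for the example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(msa, i):
    """Return the residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    return sum(
        (c / n) * log2((c / n) / ((counts_i[a] / n) * (counts_j[b] / n)))
        for (a, b), c in counts_ij.items()
    )


def top_covarying_pairs(msa, k=3):
    """Rank all column pairs by mutual information, highest first."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical alignment: positions 1 and 3 always change together (R with D, K with E).
toy_msa = ["ARGDK", "AKGEK", "ARGDK", "AKGEK", "ARGDK", "AKGEK"]

if __name__ == "__main__":
    print(top_covarying_pairs(toy_msa, k=1))
```

In this toy alignment only positions 1 and 3 vary, and they vary together, so that pair ranks first. Real alignments are far noisier, which is one reason practical methods need very large alignments and more careful statistics than raw mutual information.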
223 00:23:36,580 --> 00:23:41,170 And that's what AlphaFold did in their first entry, to all intents and purposes, 224 00:23:41,170 --> 00:23:46,310 combined with a little bit of tweaks around the outside, as it were. 225 00:23:46,310 --> 00:23:50,050 Now, AlphaFold 2 appears to be a little bit different. 226 00:23:50,050 --> 00:23:57,610 And this is the step change, and we obviously have to wait to genuinely find out what the details of that are. 227 00:23:57,610 --> 00:24:01,990 But it appears genuinely end-to-end, in that a sequence goes in one end and out 228 00:24:01,990 --> 00:24:07,680 pops the structure, which is incredibly impressive. 229 00:24:07,680 --> 00:24:12,550 But what we do know is that it appears to adopt a kind of much more 230 00:24:12,550 --> 00:24:18,520 dynamic learning approach within it, to work out, as it's going along, 231 00:24:18,520 --> 00:24:24,250 which pieces of information from the sequences are more important than others. 232 00:24:24,250 --> 00:24:31,420 And this hinges on a relatively new piece of deep learning architecture, which I don't really know anything about. 233 00:24:31,420 --> 00:24:35,430 It's called a 3D equivariant transformer, for those who are interested. 234 00:24:35,430 --> 00:24:43,080 It was also developed by a team at Google in 2017, so it's not surprising that it's been used in AlphaFold 2. 235 00:24:43,080 --> 00:24:49,350 That kind of architecture has been used very successfully in language translators. 236 00:24:49,350 --> 00:24:54,960 That's probably where most people who are interested might have come across it before. 237 00:24:54,960 --> 00:24:57,600 So that was definitely one of the big differences. 
238 00:24:57,600 --> 00:25:04,680 The other difference is that it requires a lot of computational power as well, so they do require a lot of resource to do it. 239 00:25:04,680 --> 00:25:10,230 I think they used something like 128 TPU v3 (tensor processing unit) cores. 240 00:25:10,230 --> 00:25:14,760 That's roughly about 200 GPUs, I think, over a few weeks. 241 00:25:14,760 --> 00:25:16,500 So there are some differences. 242 00:25:16,500 --> 00:25:24,270 We don't know quite the full extent of the methodology yet, but it kind of builds on what people have done before. 243 00:25:24,270 --> 00:25:32,280 But there is a step change, definitely, and an employment of new deep learning architectures to make this significant jump, 244 00:25:32,280 --> 00:25:44,510 if you like, in the predictive power. I've heard that in trying to solve protein structures, the expert human, you and Yvonne, 245 00:25:44,510 --> 00:25:55,150 can sometimes provide insight that sort of straight mathematical brute force fails to provide. 246 00:25:55,150 --> 00:26:08,440 And that what might be happening is that the new AlphaFold machine learning is in some ways replicating that biochemical insight that humans had. 247 00:26:08,440 --> 00:26:12,410 Is that true? [inaudible] 248 00:26:12,410 --> 00:26:16,800 Yeah, I've sort of heard that sort of thing said before as well. 249 00:26:16,800 --> 00:26:21,480 And I'm not sure if that's the correct way to think about it. 250 00:26:21,480 --> 00:26:25,980 To be honest, I think it's partly because I don't actually know enough of the underlying 251 00:26:25,980 --> 00:26:32,850 details of the new architecture to give you an honest answer about that. 252 00:26:32,850 --> 00:26:37,290 So, yeah, I'm not sure whether that's true or not. That's a not very good answer. 253 00:26:37,290 --> 00:26:42,030 Yvonne? [inaudible] 
254 00:26:42,030 --> 00:26:47,040 Well, I come from a point of even more ignorance when it comes to the third approach. 255 00:26:47,040 --> 00:26:53,850 But I mean, from what I understand of the previous approach, 256 00:26:53,850 --> 00:27:00,660 the one from two years ago, as Phil was saying, it's this idea that, you know, 257 00:27:00,660 --> 00:27:10,360 the yellow bead and the blue bead and a red bead from different parts of the beads on the string actually end up close to each other 258 00:27:10,360 --> 00:27:16,290 only when the protein is actually in its three-dimensional shape. 259 00:27:16,290 --> 00:27:23,520 And actually, yeah, as a structural biologist, often, when 260 00:27:23,520 --> 00:27:28,590 finally finalising the structure of a protein, I'll be looking at it and I'll be thinking, yeah, 261 00:27:28,590 --> 00:27:35,790 yeah, that makes sense that an arginine should be, you know, close to an aspartic acid. 262 00:27:35,790 --> 00:27:39,700 So a blue and a red type bead sit close together. 263 00:27:39,700 --> 00:27:43,890 So it's a similar thing. Now that the AI is matching, 264 00:27:43,890 --> 00:27:51,630 you could sort of think that the artificial intelligence has learnt that, because it's learnt these rules by going through, looking at, I 265 00:27:51,630 --> 00:28:03,210 believe they quoted, a hundred and seventy thousand protein structures in the Protein Data Bank that they were able to use to learn from, 266 00:28:03,210 --> 00:28:09,990 as well as, of course, all these sequence alignments.
But what I also read somewhere is that the latest iteration, 267 00:28:09,990 --> 00:28:15,030 the one that's provided the breakthrough, is not just sort of looking at things pairwise, 268 00:28:15,030 --> 00:28:22,320 but something more like building up a kind of jigsaw puzzle, in that you put together things, you know, 269 00:28:22,320 --> 00:28:28,810 you put together all the bits that are sort of the tree and the bits that are the river and the bits that are the 270 00:28:28,810 --> 00:28:34,270 cow crossing the river. And then it sort of clumps it all, puts it all together, you see what I mean? 271 00:28:34,270 --> 00:28:42,210 And I imagine, again, that, you know, there are blocks of areas in proteins where it just then makes sense that these would come together. 272 00:28:42,210 --> 00:28:47,700 But, yeah, who knows? It's going to be extremely interesting to see exactly how they did it. 273 00:28:47,700 --> 00:28:55,970 And a not unrelated question is that machine learning, to a certain extent, is a black box. 274 00:28:55,970 --> 00:29:03,980 You have inputs in and outputs out. But there's a lot of work in trying to actually reconstruct what is happening within that black box. 275 00:29:03,980 --> 00:29:13,130 And do you think that when that is done with AlphaFold, it might say anything about how proteins physically fold in nature? 276 00:29:13,130 --> 00:29:16,190 Or do you think it'll be just much more pattern matching, 277 00:29:16,190 --> 00:29:24,400 so it's getting an answer without it telling you anything about the physical processes involved? 278 00:29:24,400 --> 00:29:33,130 My feeling is it won't tell you anything about folding. Actually, I think ultimately it is a bit more of just pattern matching. 279 00:29:33,130 --> 00:29:36,970 I don't think you'll get much information about actually how it folds.
280 00:29:36,970 --> 00:29:41,270 This is a data versus physics problem, almost, in that sense. 281 00:29:41,270 --> 00:29:42,790 You know, now we are, you know, 282 00:29:42,790 --> 00:29:50,830 previously we were kind of thinking that this problem would be solved by just using our crude understanding of the underlying physics. 283 00:29:50,830 --> 00:29:58,340 But this is actually the other thing that's now become prevalent, which is making use of all the data that's out there. 284 00:29:58,340 --> 00:30:00,610 And I don't think actually you'll get much. 285 00:30:00,610 --> 00:30:10,280 And that's good news for academics working on folding per se, because that'll make them probably a bit more comfortable, and sleep at night. 286 00:30:10,280 --> 00:30:15,130 But I don't think there's much to be had looking at folding for years. 287 00:30:15,130 --> 00:30:20,590 Thank you. Yvonne, when the news came out, I think it was November 30th and December the 1st, 288 00:30:20,590 --> 00:30:31,420 then it sort of made the front page of newspapers and was on radio and television. Within the field, 289 00:30:31,420 --> 00:30:36,790 how much of a surprise was this to you guys? Did you know it was coming? 290 00:30:36,790 --> 00:30:48,140 Were you surprised by how good the predictions were in the final CASP round? 291 00:30:48,140 --> 00:30:55,460 I guess I am, actually, I suppose. Yeah, I would not have predicted that it would be this particular CASP. 292 00:30:55,460 --> 00:31:01,880 I mean that because there have been periods where, instead of progress every two years, 293 00:31:01,880 --> 00:31:07,640 you know, it looked at one stage like the field was stalling. And then things started happening again. 294 00:31:07,640 --> 00:31:11,210 Then there were sort of new approaches and big jumps forward.
295 00:31:11,210 --> 00:31:14,630 And you could imagine that artificial intelligence, 296 00:31:14,630 --> 00:31:22,580 that machine learning, was going to be able to mine these huge databases now and make a big step forward. 297 00:31:22,580 --> 00:31:26,150 And a lot of people were trying to do this. 298 00:31:26,150 --> 00:31:32,600 But I guess when I first woke up and heard this on the Today programme or whatever, 299 00:31:32,600 --> 00:31:38,780 I thought, well, how good is it? And again, I've been rummaging around a bit trying to find out. 300 00:31:38,780 --> 00:31:42,890 And there are some figures, 301 00:31:42,890 --> 00:31:48,950 the pictures provided, which show very impressive, what we call, superpositions. 302 00:31:48,950 --> 00:31:53,540 So they've taken the experimental structure and they've overlapped, 303 00:31:53,540 --> 00:31:58,340 they've superimposed onto it the model that they had come up with. 304 00:31:58,340 --> 00:32:05,290 And, you know, the two sit virtually identically on top of each other. 305 00:32:05,290 --> 00:32:11,540 But the caveats to that are that they're individual protein structures. 306 00:32:11,540 --> 00:32:18,860 So they managed to come up with something that, you know, if we were talking about insulin, 307 00:32:18,860 --> 00:32:27,550 they would have been able to predict what insulin looks like, and it would look very much as Dorothy Hodgkin's. 308 00:32:27,550 --> 00:32:31,370 So I don't know. 309 00:32:31,370 --> 00:32:37,850 Well, again, I saw one picture that implied that actually the real details, because the beads aren't just little sort of beads. 310 00:32:37,850 --> 00:32:41,380 It isn't just that all the beads are on top of each other in the correct positions. 311 00:32:41,380 --> 00:32:51,470 It's that you've actually got the detail of each residue, each side chain of the residues, where they're positioned.
312 00:32:51,470 --> 00:33:01,070 Getting that, with that, I think that probably the numbers that I've seen quoted are that they're accurate to within 313 00:33:01,070 --> 00:33:08,510 maybe two angstroms overall, of the match in position on average of everything in the, 314 00:33:08,510 --> 00:33:13,670 well, in the main chain, in the final. But is that enough to do certain things? 315 00:33:13,670 --> 00:33:18,240 And I think that's going to be fun to just chat over for a few minutes now. Yvonne, I've 316 00:33:18,240 --> 00:33:23,150 got to interrupt there. An angstrom, that's about the size of an atom. Just to give people. 317 00:33:23,150 --> 00:33:25,820 Oh, sorry. Yeah, absolutely. Yeah. Yeah. 318 00:33:25,820 --> 00:33:38,640 And the reason I'm fretting about how accurate it is, is that you need things to be accurate to within less than an angstrom, actually, to be able to 319 00:33:38,640 --> 00:33:43,740 accurately use structures, then, to actually accurately 320 00:33:43,740 --> 00:33:56,410 design drugs that you might want to use to jam, you know, to fit into part of a protein structure 321 00:33:56,410 --> 00:34:06,180 and, for example, stop it working, if, for example, you want to block a protein machine that's important for a viral function or something. 322 00:34:06,180 --> 00:34:14,700 So you need a very high level of accuracy for certain sorts of things to be evolved. 323 00:34:14,700 --> 00:34:21,060 I don't know. I think they are, I think, going to be able to get that. They have clearly got there 324 00:34:21,060 --> 00:34:26,310 for some of the medium-sized proteins. It's extremely impressive.
325 00:34:26,310 --> 00:34:35,680 But the next challenge, and they say this themselves in their press release, is to be able to then look at not just one protein by itself, 326 00:34:35,680 --> 00:34:40,630 but these complexes of proteins, because proteins don't actually usually wander around by themselves. 327 00:34:40,630 --> 00:34:47,510 There are gangs of them. They're not very good at socially isolating proteins together. 328 00:34:47,510 --> 00:34:55,410 So just to check I understand you correctly: you're impressed by the structures that they are producing at the moment, 329 00:34:55,410 --> 00:35:02,430 but you worry that they might not be quite good enough to show where every atom is. 330 00:35:02,430 --> 00:35:07,050 And you need to know where every atom is if you're trying to design a drug, 331 00:35:07,050 --> 00:35:12,090 because typically a drug needs to fit in like a key in the lock. 332 00:35:12,090 --> 00:35:18,900 And so you really need high accuracy, which at the moment does require a physical means 333 00:35:18,900 --> 00:35:27,510 of looking at the structure rather than the computational. I think it's probably still a little bit off from being 334 00:35:27,510 --> 00:35:34,170 able to help directly with the drug design side of things. 335 00:35:34,170 --> 00:35:41,010 But having said that, there's a lot of protein structures that aren't actually good enough, 336 00:35:41,010 --> 00:35:46,350 experimentally determined protein structures that aren't good enough to be able to really guide drug design. 337 00:35:46,350 --> 00:35:51,990 Only the very best structures are good enough for that. 338 00:35:51,990 --> 00:36:00,930 So I'm setting them a very high bar in saying that. And I think that it's extremely impressive what they've been able to do.
339 00:36:00,930 --> 00:36:07,320 And they are clearly able to, for many protein sequences, 340 00:36:07,320 --> 00:36:18,210 predict what that protein is going to look like to sufficient accuracy to be able to, say, give us the idea of maybe how it works or what it does. 341 00:36:18,210 --> 00:36:26,960 But there are still big questions about how it will be able to interact with other proteins. 342 00:36:26,960 --> 00:36:32,700 At the moment they're not able to predict that, whilst we can do structures, either using cryo-EM 343 00:36:32,700 --> 00:36:40,410 or X-ray crystallography, of the clusters of proteins and see how they fit together. 344 00:36:40,410 --> 00:36:47,230 And it's often the way that they sit together that's important for their function, 345 00:36:47,230 --> 00:36:53,930 for instance. So, Phil, let me go to you as a computational biochemist. 346 00:36:53,930 --> 00:37:00,400 Yeah. And how are you feeling as well about the relative weather? 347 00:37:00,400 --> 00:37:07,330 I mean, I think actually, if we're fair, it's a very impressive result, whichever way you look at it. 348 00:37:07,330 --> 00:37:18,360 I don't think anybody in CASP13 would have predicted that AlphaFold would have done as well as they have done in this event. 349 00:37:18,360 --> 00:37:24,160 This is notable, and the reason there's a lot of hype is for a few reasons, actually. 350 00:37:24,160 --> 00:37:31,840 The first is that AlphaFold 2 is not just ahead, but it is more than twice as good as the next best entry. 351 00:37:31,840 --> 00:37:39,160 And that was true across nearly all the targets, I think I'm correct in saying. Now, that would be impressive enough, actually.
352 00:37:39,160 --> 00:37:41,980 But this time, the reason why there's a lot of hype, I think, 353 00:37:41,980 --> 00:37:52,150 is actually the accuracy, as measured by something called the global distance test score, 354 00:37:52,150 --> 00:37:55,300 which is another metric which I won't go into, 355 00:37:55,300 --> 00:38:01,920 but anything above 90 is considered, informally at least, to be rivalling experimental accuracy. 356 00:38:01,920 --> 00:38:07,030 And AlphaFold claimed a median score of ninety two point five across all targets. 357 00:38:07,030 --> 00:38:14,560 That's why there's a lot of hype. Now, Yvonne's right, if you want to delve down and look at the RMSD, the root mean squared deviation, 358 00:38:14,560 --> 00:38:19,990 how different it is when you superimpose one of the predictions on the target, then, 359 00:38:19,990 --> 00:38:25,930 yeah, about 50 percent of the time, across all atoms, 360 00:38:25,930 --> 00:38:29,170 they're under two angstroms, which is what Yvonne was mentioning, 361 00:38:29,170 --> 00:38:34,480 which is still pretty impressive, actually. If you make your tolerance a little bit slacker, 362 00:38:34,480 --> 00:38:42,520 so less than five angstroms, then 92 and a half percent of the time they get the right answer, which is quite impressive. 363 00:38:42,520 --> 00:38:45,830 And I think there's scope for improving that. Actually, that's the good news as well. 364 00:38:45,830 --> 00:38:54,310 I think actually, it's not clear whether you could refine that a little bit, maybe with a bit of physics, and perhaps use it to improve the models. 365 00:38:54,310 --> 00:39:03,130 And it's also not clear whether some of that difference reflects some crystallisation artefacts or so on, 366 00:39:03,130 --> 00:39:12,040 so there's a lot of things that one could probably delve down into a little bit and consider in a bit more detail, I think.
367 00:39:12,040 --> 00:39:15,360 I think the level of accuracy is useful enough even now, though, 368 00:39:15,360 --> 00:39:20,470 that you might even start to question some experimental results. You might be thinking, 369 00:39:20,470 --> 00:39:23,380 is it really AlphaFold or the experiment that's right? 370 00:39:23,380 --> 00:39:31,450 And then I read on somebody's blog, and apologies if you are listening and I'm quoting you or misquoting you here, 371 00:39:31,450 --> 00:39:34,240 but I did read that, actually, that's already happened. 372 00:39:34,240 --> 00:39:39,520 One of the experimental groups had seen the results from AlphaFold, went back and looked at their data, 373 00:39:39,520 --> 00:39:43,170 and indeed, they had actually misassigned a proline residue, 374 00:39:43,170 --> 00:39:46,660 I think it was. So, yeah. 375 00:39:46,660 --> 00:39:52,200 So, yeah, I read that blog. So I thought that was a good idea. 376 00:39:52,200 --> 00:39:59,500 That actually now we're at the level where we're actually using the computational predictions to question whether the experiment itself is right. 377 00:39:59,500 --> 00:40:04,100 I think that's a very important step change, actually. 378 00:40:04,100 --> 00:40:11,340 I think going forward, it would probably be better for the predictors to compare to the raw electron density, actually, rather than. 379 00:40:11,340 --> 00:40:16,480 Yeah. Will that make crystallographers... The great theoretical ecologist 380 00:40:16,480 --> 00:40:19,870 Bob May, Lord May, who died earlier this year, used to have the quip: 381 00:40:19,870 --> 00:40:24,790 "My theory completely disproves your data." And I'm sorry he's not alive 382 00:40:24,790 --> 00:40:28,870 to enjoy that. What I would like to do is go to some questions. 383 00:40:28,870 --> 00:40:36,700 And the first one, I think, is for Yvonne. And this one says: AlphaFold, 384 00:40:36,700 --> 00:40:50,270 is it likely to facilitate or replace crystallography?
And how do you think it will become incorporated into the drug design pipeline? 385 00:40:50,270 --> 00:40:55,420 Right. Okay. I think for the moment, anyway, 386 00:40:55,420 --> 00:41:07,030 it's just going to be incredibly complementary and helpful to protein crystallography and also to cryo-EM. In protein crystallography, 387 00:41:07,030 --> 00:41:14,740 we often benefit from having a model that we can start from. 388 00:41:14,740 --> 00:41:19,740 That isn't the exact structure that we're trying to determine, 389 00:41:19,740 --> 00:41:31,190 but it gives us a starting point that allows us to interpret our data more rapidly. And if we can 390 00:41:31,190 --> 00:41:36,860 get those models using AlphaFold, 391 00:41:36,860 --> 00:41:46,080 that could be incredibly helpful. It could also, I think, 392 00:41:46,080 --> 00:41:56,190 be used to generate structures for individual proteins, which you could then dock together into complexes that we are visualising 393 00:41:56,190 --> 00:42:04,380 using cryo-EM, which may actually be at a rather lower resolution, to rather less detail. 394 00:42:04,380 --> 00:42:12,340 We might not be able to see quite all of the detail. 395 00:42:12,340 --> 00:42:16,000 Using some of our structural techniques, looking at really big complexes, 396 00:42:16,000 --> 00:42:22,000 it's often very useful to be able to get really detailed structures of smaller parts 397 00:42:22,000 --> 00:42:27,020 of those complexes and then dock them into less detailed experimental structures. 398 00:42:27,020 --> 00:42:38,330 And I think that could be a very important way forward that AlphaFold is able to open up for us. In terms of the drug design, 399 00:42:38,330 --> 00:42:46,870 yeah, I kind of touched on that earlier, didn't I?
I think it is going to be very exciting, in probably only the short term, actually, 400 00:42:46,870 --> 00:42:57,030 that they are going to improve, because I didn't mean to sound too churlish. I think it's incredibly exciting what they're doing already. 401 00:42:57,030 --> 00:43:01,800 Thanks very much. That question was from you. Can the law, I should have said. 402 00:43:01,800 --> 00:43:11,460 I've got a question from Larissa goal, which I think is for Phil: machine learning is only as good as the data on which it is trained. 403 00:43:11,460 --> 00:43:18,870 And if there are biases, that's going to affect it. Are you aware of any such biases in the training set? 404 00:43:18,870 --> 00:43:24,750 And if I can expand that, does this mean that you can't take, say, 405 00:43:24,750 --> 00:43:35,430 a very novel peptide sequence from a poorly understood virus and feed it into AlphaFold if it's seen nothing like that in the past? 406 00:43:35,430 --> 00:43:40,530 Yeah. So that's a very good point. And yeah, that is very true. 407 00:43:40,530 --> 00:43:45,900 Machine learning is traditionally only as good as the data you put in. 408 00:43:45,900 --> 00:43:51,270 And of course, there is a bias in some sense in, for example, 409 00:43:51,270 --> 00:43:59,580 the PDB, in those structures that are readily solvable, by definition, almost, because that's what the PDB is. 410 00:43:59,580 --> 00:44:05,760 So there is a slight bias there. Now, if you'd asked me the question about AlphaFold 1, 411 00:44:05,760 --> 00:44:11,640 then I would have said yes to Charles's follow-up question, 412 00:44:11,640 --> 00:44:20,770 it would have really struggled to predict something that it's never seen before. But with AlphaFold 2 the jury is a bit out, I think, on this. 413 00:44:20,770 --> 00:44:24,880 It's not clear how well it would actually do on unrelated things.
414 00:44:24,880 --> 00:44:26,280 I think that would be the interest. 415 00:44:26,280 --> 00:44:33,930 It'd be very interesting to see how it does on stuff that it clearly has no idea about, and ask it to make a prediction on that. 416 00:44:33,930 --> 00:44:40,830 I think that will be an interesting test. I don't think anybody has a feeling for how badly, if you're a pessimist, or how 417 00:44:40,830 --> 00:44:45,300 well, if you're an optimist, it will do in that kind of scenario. 418 00:44:45,300 --> 00:44:53,800 But certainly in machine learning the input data, the data set that you have for training, for some of these things is super important. 419 00:44:53,800 --> 00:44:58,410 And yeah, you have to be aware all the time of biases. Yeah. Okay. 420 00:44:58,410 --> 00:45:06,080 The next question is for Yvonne. And I'm going to ask you to actually explain the question as well as to answer it. 421 00:45:06,080 --> 00:45:14,340 What do we think about whether AlphaFold will be able to solve the structures for G protein-coupled receptors, 422 00:45:14,340 --> 00:45:20,050 seeing as they're so hard to crystallise? So might you begin by just saying what a G protein-coupled receptor is? 423 00:45:20,050 --> 00:45:26,490 OK. It's a protein, well, 424 00:45:26,490 --> 00:45:38,250 a huge family of proteins that are embedded in the plasma membrane, the membranes that surround our cells. 425 00:45:38,250 --> 00:45:47,100 So they are proteins that are involved in picking up signals from the outside and signalling into the cell. 426 00:45:47,100 --> 00:46:00,090 And they are major drug targets, many of these proteins, because they control all sorts of signalling, in our brains, for example. 427 00:46:00,090 --> 00:46:04,830 So they can be targets for antidepressants and suchlike. 428 00:46:04,830 --> 00:46:11,950 But they control the
429 00:46:11,950 --> 00:46:22,130 activities in our hearts, and many, many of these family members are very potent drug targets. 430 00:46:22,130 --> 00:46:30,890 And for many, many years, people have worked to get structures, in an effort to understand and to design better drugs against them, 431 00:46:30,890 --> 00:46:36,980 to understand the actions of the drugs we've got and to design better ones. 432 00:46:36,980 --> 00:46:42,230 But they have proved to be notoriously difficult to crystallise. 433 00:46:42,230 --> 00:46:50,390 And indeed, a clutch of Nobel prizes were awarded not that long ago for the very first structures that came out. 434 00:46:50,390 --> 00:46:59,270 And the first structures were by protein crystallography. But they needed new techniques to be used to persuade these things to crystallise. 435 00:46:59,270 --> 00:47:08,690 More recently, there's been quite a lot of progress made with using cryo-EM to solve structures of these proteins. 436 00:47:08,690 --> 00:47:16,760 And so, yes, there are a reasonable number of family members now for which structures are known. 437 00:47:16,760 --> 00:47:25,730 But as I was saying, there are many, many members in this family, many hundreds of proteins for which we don't have structures. 438 00:47:25,730 --> 00:47:33,260 And so the argument would be that indeed we could use AlphaFold to predict these structures. 439 00:47:33,260 --> 00:47:44,410 And one would hope it would do perhaps better than just starting from a known structure and trying to model from that structure. 440 00:47:44,410 --> 00:47:49,020 But since the devil is in the detail, I don't know. 441 00:47:49,020 --> 00:48:04,440 I hope it's going to help us. But I don't know how well AlphaFold is working at the moment for these highly membrane-embedded type proteins. 442 00:48:04,440 --> 00:48:12,410 And also the problem with G protein receptors.
And part of the reason why they've been very problematic to crystallise is that 443 00:48:12,410 --> 00:48:18,260 part of their function involves them being very dynamic, so they shape change, not by a huge amount, 444 00:48:18,260 --> 00:48:27,530 but they do shape change a bit, and that's important for the way that they work and they interact with other proteins. 445 00:48:27,530 --> 00:48:33,390 As part of that, they pass on the signal to other proteins through their shape change. 446 00:48:33,390 --> 00:48:44,060 And at the moment it doesn't tell you shape changes and doesn't tell you protein-protein interactions, and the, 447 00:48:44,060 --> 00:48:48,420 you know, the DeepMind people, in their blog, in their press release, 448 00:48:48,420 --> 00:48:54,350 were saying these are the next challenges that they're going to have to move on to in the longer term. 449 00:48:54,350 --> 00:49:01,980 Phil, you've already asked around about things that will be a use for for drug designs to look for a bridge. 450 00:49:01,980 --> 00:49:06,280 Do you see GPCRs as going to be perhaps not easily solved, then? 451 00:49:06,280 --> 00:49:14,030 You know, I was just going to say, yeah, that is definitely one of the challenges, for other membrane proteins like transporters too. 452 00:49:14,030 --> 00:49:17,420 You know, we know they don't have one unique structure. 453 00:49:17,420 --> 00:49:27,370 They actually have at least two, probably more, separate structures that must be metastable as part of their function. 454 00:49:27,370 --> 00:49:33,920 And so which one do you predict? You know, so it will be challenging for AlphaFold 2, I think, 455 00:49:33,920 --> 00:49:42,410 to predict the different states of transporter proteins or things like GPCRs, which can exist in multiple different conformations.
456 00:49:42,410 --> 00:49:48,200 I think transporters might be a bit easier because the changes are probably a bit more extreme. 457 00:49:48,200 --> 00:49:52,860 So you would think perhaps that that might be a little bit easier. GPCRs, 458 00:49:52,860 --> 00:49:57,920 though, the movement is quite subtle, actually, most of them at least, anyway. 459 00:49:57,920 --> 00:50:06,680 But isn't this a problem that's perhaps amenable to the next generation of machine learning in this area, that if you can, 460 00:50:06,680 --> 00:50:11,060 if you do understand these metastable states, the states that it flits between, 461 00:50:11,060 --> 00:50:16,740 then you can train an AI to try and learn that and hopefully tell you new things? 462 00:50:16,740 --> 00:50:21,170 Yeah, I think you're right. I think that is actually, that 463 00:50:21,170 --> 00:50:28,340 and the protein oligomerisation are actually relatively low hanging fruit now, in the sense that, you know, 464 00:50:28,340 --> 00:50:34,010 they've done the big problem, which is basically predicting the structure, give or take. 465 00:50:34,010 --> 00:50:41,090 And actually, you know, to extend that in these other directions, you would kind of think it's not so hard 466 00:50:41,090 --> 00:50:48,670 as that first step. You know, these second and third steps we're talking about now, I would think, would be a little bit less difficult. 467 00:50:48,670 --> 00:50:53,750 But I don't know. Maybe. Just a follow-up question to that. 468 00:50:53,750 --> 00:50:59,510 My friends who work on glycoproteins always say that the amino acid sequence 469 00:50:59,510 --> 00:51:03,440 is only the beginning, and it's the things you stick to your protein after 470 00:51:03,440 --> 00:51:12,140 you've made it that give all the activity. Then how does that affect what we're talking about now?
471 00:51:12,140 --> 00:51:16,020 Yeah, I suppose what we would call post-translational modifications. You're right. 472 00:51:16,020 --> 00:51:21,070 I mean, one of the proteins that I'm really interested in, which is one of the messenger proteins that goes between cells, 473 00:51:21,070 --> 00:51:24,540 it isn't just made out of the beads on the string, 474 00:51:24,540 --> 00:51:32,370 but it's also got an extra palmitoleate attached to it, which is essential for its activity. 475 00:51:32,370 --> 00:51:37,260 So it's not just entirely made of amino acids. 476 00:51:37,260 --> 00:51:42,110 There are other things that add onto proteins that further change, 477 00:51:42,110 --> 00:51:47,470 add to their shape and add to their properties and are essential for their function. 478 00:51:47,470 --> 00:51:53,880 And so there's a whole world there that AlphaFold isn't exploring. 479 00:51:53,880 --> 00:52:01,680 OK. We've got a few more questions. And so, if possible, and I realise it might be hard, if you can answer them quite briefly. 480 00:52:01,680 --> 00:52:05,700 I don't know who would like to pick this one up, which is the highest voted at the moment. 481 00:52:05,700 --> 00:52:13,710 And we'll need a bit of explanation. How long do you expect before AlphaFold is going to be capable of predicting allosteric interactions? 482 00:52:13,710 --> 00:52:18,190 So, Yvonne? Phil, you? Yes. 483 00:52:18,190 --> 00:52:22,910 How long before it predicts allosteric interactions? Just to clarify, allosteric 484 00:52:22,910 --> 00:52:30,260 interactions meaning that there would be another part of the protein away from perhaps where the agonist binds, 485 00:52:30,260 --> 00:52:34,750 or the main ligand binding site. There's another site, the allosteric site, 486 00:52:34,750 --> 00:52:41,000 which would be somewhere where something else could bind and do something. 487 00:52:41,000 --> 00:52:44,800 How...
How far are we away from understanding that? I think that's the thing. 488 00:52:44,800 --> 00:52:51,490 That's a particularly challenging sub-problem of when you get down to higher accuracy. 489 00:52:51,490 --> 00:52:59,050 So, you know, I said earlier that 50 percent of the results were under two angstroms RMSD. 490 00:52:59,050 --> 00:53:01,990 I think you'd have to be confident of, you know, 491 00:53:01,990 --> 00:53:09,210 much higher than 50 percent before you could begin to hope to sort of tackle that particular question. 492 00:53:09,210 --> 00:53:14,630 All right. Sorry, Phil, I interrupted. No, no, I just said that that would be my view. 493 00:53:14,630 --> 00:53:22,060 Yeah. Am I right, the significance of the question is that, were you to understand allosteric interactions, that means new drug targets, 494 00:53:22,060 --> 00:53:26,320 because you could then look at small molecules that would then. 495 00:53:26,320 --> 00:53:28,450 Exactly. Yeah. So the benefit of 496 00:53:28,450 --> 00:53:36,450 looking at allosteric sites in that way would be that you'd not be looking at the same site as the orthosteric ligand. 497 00:53:36,450 --> 00:53:45,430 And that would give you more freedom, presumably, to design something new that didn't have the, 498 00:53:45,430 --> 00:53:48,200 it wouldn't be so constrained, as the orthosteric 499 00:53:48,200 --> 00:53:55,880 site tends to be constrained by all of the evolutionary properties that are designed to make it the main orthosteric site. 500 00:53:55,880 --> 00:54:00,910 But the allosteric site presumably can be a bit freer in terms of how it's evolved, and there could be 501 00:54:00,910 --> 00:54:08,380 potential for many more different compounds to bind and change the activation or otherwise of the target protein. 502 00:54:08,380 --> 00:54:09,670 So it's an interesting question.
503 00:54:09,670 --> 00:54:17,050 Here is: what is your understanding of how AlphaFold deals with nonsense sequences or completely fictional sequences? 504 00:54:17,050 --> 00:54:21,940 So I guess there are two issues here. If you make up a string of amino acids, 505 00:54:21,940 --> 00:54:33,660 will it always assume a fixed three-dimensional structure, or is it only a subset of sequences that form a structure? 506 00:54:33,660 --> 00:54:46,470 We know that it's only a subset, because there are proteins that the cells make that don't assume one fixed structure, 507 00:54:46,470 --> 00:54:56,620 and that's part of their function, actually, that they can change shape, partly according to what they're, 508 00:54:56,620 --> 00:55:02,020 what other proteins they are interacting with, but also there are regions in them that are just very, 509 00:55:02,020 --> 00:55:09,100 very flexible and floppy and just stay like beads on a string. I mean, Phil, what would you say to that? 510 00:55:09,100 --> 00:55:16,990 Yeah, no, absolutely. I think, if that's the intention of the question, I'd say, yeah, you know, a lot of 511 00:55:16,990 --> 00:55:24,460 the genome is probably unstructured. So that might be a different question then, and probably not relevant. 512 00:55:24,460 --> 00:55:32,520 Right. I think it does come to: can you then turn it the other way around, though, and start to design structures? 513 00:55:32,520 --> 00:55:36,540 So run it backwards and say, OK, this is the structure that I want. 514 00:55:36,540 --> 00:55:44,100 Can I kind of work out what the sequence would have to be 515 00:55:44,100 --> 00:55:48,420 to get that structure? Because that's how you would then design 516 00:55:48,420 --> 00:55:56,220 enzymes to order, little machines that would chew up plastic. I think they've been saying that a lot in the press releases. 
517 00:55:56,220 --> 00:56:01,920 You could say that that is something that David Baker has already been putting a lot of effort into. 518 00:56:01,920 --> 00:56:07,320 This is somebody else who's been looking at the protein folding problem over the years. 519 00:56:07,320 --> 00:56:16,280 And clearly, it's a direction that DeepMind and AlphaFold are going to be interested in as well. 520 00:56:16,280 --> 00:56:19,890 So that nicely leads on to my last question. 521 00:56:19,890 --> 00:56:31,890 We've only got three minutes left, and I want you to throw aside your shackles as very careful scientists, to sort of project yourself 522 00:56:31,890 --> 00:56:39,690 10 years in the future and give us an example of what you think might be a really interesting result, 523 00:56:39,690 --> 00:56:47,310 either in fundamental science or in medicine or applied science, that will come out of both what AlphaFold has done 524 00:56:47,310 --> 00:56:52,290 and the whole general advance of protein biochemistry. 525 00:56:52,290 --> 00:56:59,160 So I am asking you to unashamedly speculate about, if everything went well, 526 00:56:59,160 --> 00:57:03,450 what might be a fantastic thing that we're celebrating 10 years from now. 527 00:57:03,450 --> 00:57:11,260 Horrible question. So I'll go to Phil first, right. Very gentlemanly. 528 00:57:11,260 --> 00:57:19,020 So. Well, I think, you know, if it progresses as fast as it has, then one can imagine that you'll have structures for everything. 529 00:57:19,020 --> 00:57:26,730 And in that case, with every structure you'd ever want or need, things like structural systems 530 00:57:26,730 --> 00:57:30,430 biology, whatever you want to call it, will become a reality. And I think that's one important thing, 531 00:57:30,430 --> 00:57:37,470 because that could actually then, you know, unify a lot of cellular-level stuff all the way down to the molecular. 
532 00:57:37,470 --> 00:57:44,280 So we really, really will have a very good, detailed structural understanding right the way from molecules 533 00:57:44,280 --> 00:57:48,120 right up to cells, genuinely with no gaps in it. 534 00:57:48,120 --> 00:57:55,680 Totally understanding how all these things start to interact and function together, that is not an unrealistic goal, 535 00:57:55,680 --> 00:57:56,130 I don't think, 536 00:57:56,130 --> 00:58:06,180 given the progress in the last two CASPs. In that sense, and with the ideas we discussed about interactions between different proteins, 537 00:58:06,180 --> 00:58:11,910 that obviously is one of the next big challenges: predicting how all these proteins will interact with each other, 538 00:58:11,910 --> 00:58:20,670 both in space and time within a cell. So that, if you like, is what you could aim for, probably within 10 years. 539 00:58:20,670 --> 00:58:24,230 That really is exciting. Yvonne? Yeah. 540 00:58:24,230 --> 00:58:26,850 And running on from that, carrying on from there, 541 00:58:26,850 --> 00:58:35,830 I think what would also be incredibly exciting is going to be the ability to understand then the effect of 542 00:58:35,830 --> 00:58:41,630 variants in the genome, of single point mutations. 543 00:58:41,630 --> 00:58:51,250 Just one person having one different coloured bead in the string of one of their proteins. Is AlphaFold 544 00:58:51,250 --> 00:58:59,870 going to be able to help us more rapidly understand the effect of that colour change of just one bead affecting the shape of a protein 545 00:58:59,870 --> 00:59:10,400 and affecting the way that it interacts with all the other proteins it needs to talk to in a cell, to go back to Phil's answer? 546 00:59:10,400 --> 00:59:24,000 So will it ultimately help us understand all the information that's now being gathered from the sequencing of the human genome? 
547 00:59:24,000 --> 00:59:30,680 And help us to translate that ultimately into medicines? If we can understand better what's going wrong, 548 00:59:30,680 --> 00:59:38,520 or going wrong maybe because of just one very subtle change, will we be able to then come up more easily with 549 00:59:38,520 --> 00:59:42,330 where to intervene therapeutically? Thanks so much indeed. 550 00:59:42,330 --> 00:59:45,120 I'm really sorry we're going to have to come to an end. 551 00:59:45,120 --> 00:59:52,710 Now, it has been a real privilege to listen to and discuss with two of the leading experts in this field 552 00:59:52,710 --> 00:59:57,580 the exciting result that we had last year, last week. 553 00:59:57,580 --> 01:00:03,440 And if I'm sort of to paraphrase the discussion, much of the hype has been justified. 554 01:00:03,440 --> 01:00:07,800 But there are some real details that one needs to think about there. 555 01:00:07,800 --> 01:00:12,690 Just before thanking you, I'd like to thank everyone who has tuned in, 556 01:00:12,690 --> 01:00:19,320 and also for the many questions, and do forgive me when I've not been able to come to your questions. 557 01:00:19,320 --> 01:00:24,150 We're now stopping for the holiday break, but do look at the Martin School website. 558 01:00:24,150 --> 01:00:29,350 We'll be announcing some more virtual conversations in the new year. 559 01:00:29,350 --> 01:00:36,830 And let me finish by thanking both Phil and Yvonne. Yvonne, you've done magnificently, given that you've had to do this completely blind, 560 01:00:36,830 --> 01:00:42,500 whereas Phil and I can sort of see each other's movements. So I'm sorry we had this technical... 561 01:00:42,500 --> 01:00:48,120 I'm very sorry. I'm beginning to be a woman of mystery. But it's been lovely hearing the two of you, 562 01:00:48,120 --> 01:00:52,320 your voices. And it's been great fun. And it is a very exciting time. 563 01:00:52,320 --> 01:00:57,120 Thank you very much. 
So thank Phil and Yvonne very much indeed for taking part. 564 01:00:57,120 --> 01:01:00,538 And goodbye, everyone.