1 00:00:02,470 --> 00:00:15,170 Okay. 2 00:00:15,250 --> 00:00:21,610 Thank you very, very much. I'm slightly worried you're here under false pretences because this morning I looked at my abstract, 3 00:00:22,210 --> 00:00:26,200 and the talk I wrote doesn't really fit the abstract, but it does fit the title still. 4 00:00:26,200 --> 00:00:29,890 So I'm hoping you won't feel you were lured here by false advertising. 5 00:00:30,670 --> 00:00:36,100 So I'm going to talk about mathematical models and population genetics, and a good place to start actually seemed to be population genetics. 6 00:00:36,100 --> 00:00:40,240 What is it? And it's more than the last 20 or 30 years where mathematics has been important. 7 00:00:40,990 --> 00:00:45,040 So as a scientific subject it is rather young. It's only a century or so old. 8 00:00:45,490 --> 00:00:52,960 And we usually trace it back to what we call the modern evolutionary synthesis, which was when the work of Darwin, up here, and Mendel were reconciled. 9 00:00:53,560 --> 00:00:57,220 So what were their theories? So here's natural selection in a nutshell. 10 00:00:57,730 --> 00:01:03,879 So Darwin's theory says that heritable traits that increase reproductive success will become more common in a population. 11 00:01:03,880 --> 00:01:11,560 So if, for example, being taller makes you more likely to have more children and being taller is a trait that you tend to hand on to your offspring, 12 00:01:11,830 --> 00:01:14,350 then being taller will become more common in the population. 13 00:01:14,350 --> 00:01:21,790 The population will gradually become taller. And for it to work, for it to make any difference to a population, it requires variability. 14 00:01:21,910 --> 00:01:25,870 It's no good if everyone is the same. And offspring must be similar to their parents. 15 00:01:27,490 --> 00:01:31,120 That variability and heritability are given to us by Mendel. 16 00:01:31,450 --> 00:01:37,390 So what does Mendel say? 
Mendel says that traits are determined by genes, "determined" in inverted commas, 17 00:01:37,780 --> 00:01:44,169 because actually it's a bit more subtle than that. But traits are definitely influenced by genes, and genes occur in different types. 18 00:01:44,170 --> 00:01:48,310 That gives us that variability that we need. And offspring inherit genes from their parents. 19 00:01:48,340 --> 00:01:54,730 So you've got heritability. So why did it take 50 years for anyone to notice that these two theories could be brought together? 20 00:01:55,580 --> 00:02:01,389 And perhaps the main reason is that Darwin tended to concentrate on natural selection 21 00:02:01,390 --> 00:02:05,530 acting through the accumulation of lots and lots of very small changes to get something really big. 22 00:02:05,920 --> 00:02:12,010 So these are his famous Galapagos finches. And to get from this beak to this beak was not something that happened overnight. 23 00:02:12,010 --> 00:02:15,310 It was an accumulation of lots of very, very small changes. 24 00:02:15,910 --> 00:02:22,209 Mendel, on the other hand, famously looked at peas, and he was interested in traits which really were determined by genes. 25 00:02:22,210 --> 00:02:29,260 So whether a pea is green or yellow, wrinkled or round, whether it has a green pod or yellow pod, 26 00:02:29,260 --> 00:02:33,969 a constricted pod or inflated pod, these are all things determined just by single genes. 27 00:02:33,970 --> 00:02:38,230 That's very discrete; it's a very different sort of variability from Darwin and his finches. 28 00:02:38,710 --> 00:02:45,040 But nonetheless the theories were brought together, and they were brought together by mathematics. And the mathematicians in question: 29 00:02:45,400 --> 00:02:49,120 we usually attribute, anyway, the modern evolutionary synthesis to three guys. 30 00:02:49,570 --> 00:02:55,240 So we have Fisher over here, R.A. Fisher, Sewall Wright, and J.B.S. Haldane. 
31 00:02:56,140 --> 00:03:01,380 So Fisher. Very famous British mathematician or statistician, depending on your culture. 32 00:03:03,060 --> 00:03:06,900 Fisher was interested in the data that was being collected by biometricians, 33 00:03:06,900 --> 00:03:12,420 and so they collected data on things like height and weight of parents and their offspring. 34 00:03:13,110 --> 00:03:17,129 And Fisher noticed that this could all be explained by Mendelian genetics as long as 35 00:03:17,130 --> 00:03:21,620 you allowed a particular trait to be determined by lots and lots of Mendelian factors. 36 00:03:21,630 --> 00:03:28,140 So lots of genes influencing the trait, each having a very small influence, and a bit of environmental noise. 37 00:03:28,620 --> 00:03:32,820 And in the process, he actually invented much of modern statistics, and in particular the analysis of variance. 38 00:03:33,150 --> 00:03:37,140 This isn't the usual picture of Fisher, but that's what he would have looked like at the time of the evolutionary synthesis. 39 00:03:37,320 --> 00:03:42,650 Usually we show this grey man with a long white beard. Over here on the right, we've got Sewall Wright. 40 00:03:43,220 --> 00:03:43,910 So Sewall Wright 41 00:03:43,910 --> 00:03:51,710 was an American who was trained in mathematics and then lured into biology by a woman whose name I wrote down because I knew I would never remember it: 42 00:03:52,370 --> 00:03:57,169 Wilhelmine Entemann Key. I hope you'll forgive me for not remembering that, but she's interesting. 43 00:03:57,170 --> 00:04:00,200 She was one of the first women to get a Ph.D. from the University of Chicago. 44 00:04:00,680 --> 00:04:04,820 And while Wright was at Cold Spring Harbor, she lured him into biology. 45 00:04:05,360 --> 00:04:11,480 And he developed a lot of what we now call the theory of genetic drift, which is understanding randomness. 
46 00:04:11,570 --> 00:04:15,830 This is a long time before probability was a fashionable mathematical subject, 47 00:04:15,830 --> 00:04:23,600 but he was understanding the randomness in a population that arises just because it's finite; the nature of reproduction is inherently stochastic. 48 00:04:24,140 --> 00:04:28,010 And he also developed notions like fitness landscapes, which we still use today. 49 00:04:29,710 --> 00:04:33,100 This man here, J.B.S. Haldane, Oxford-trained mathematician. 50 00:04:33,460 --> 00:04:38,350 And you might guess, looking at the photo, that he had perhaps the most colourful of the careers of these three gentlemen. 51 00:04:39,550 --> 00:04:43,720 He wrote an excellent children's book, My Friend Mr Leakey, which I thoroughly recommend to you. 52 00:04:44,500 --> 00:04:50,139 He was married to a very interesting journalist who brought very interesting people into his life and into his household. 53 00:04:50,140 --> 00:04:53,950 And so he left Oxford and travelled the world and finally died in India, very sadly. 54 00:04:55,010 --> 00:05:03,720 But while these three would certainly have agreed that Mendel's and Darwin's were very compatible theories, and indeed that they reinforced one another, 55 00:05:04,200 --> 00:05:07,560 what they certainly did not agree on was the answer to this question. 56 00:05:08,010 --> 00:05:13,889 So what is the relative importance of the different forces of evolution that are acting on my population? 57 00:05:13,890 --> 00:05:19,290 So natural selection, in the sense of Darwin; population structure, because we don't all live in a big melting pot, 58 00:05:19,290 --> 00:05:26,220 we're all sort of spread around and we live in different spatial locations and in different genetic forms; 59 00:05:26,580 --> 00:05:33,069 and genetic drift, this randomness that Wright set up. And I actually deliberately put Fisher and Wright 
60 00:05:33,070 --> 00:05:36,690 rather a long way away from each other because they really did not get on. 61 00:05:36,700 --> 00:05:43,300 They had a very long-standing feud. Because while Wright thought that genetic drift was a very important evolutionary force, 62 00:05:43,720 --> 00:05:46,690 Fisher thought that it would be completely dwarfed by natural selection. 63 00:05:46,930 --> 00:05:51,280 And he and Ford wrote a number of rather aggressive papers against Wright's theory. 64 00:05:52,280 --> 00:05:58,460 So if these incredibly intelligent and innovative thinkers were unable to cast light on this problem, 65 00:05:58,970 --> 00:06:02,780 why do we think we might be able to shed any new light on it now? 66 00:06:03,470 --> 00:06:09,380 And the answer lies in the data. So Wright, you will notice, is holding a guinea pig. 67 00:06:09,740 --> 00:06:16,370 This is not because in the 1930s, when this photograph was taken, the Americans used guinea pigs as blackboard erasers, as it might appear. 68 00:06:17,090 --> 00:06:27,400 It's actually because he bred guinea pigs. So these guys could only view genetic information indirectly, by phenotype, and Wright developed our 69 00:06:27,410 --> 00:06:34,309 understanding of the way that different coat colours are inherited in guinea pigs and also rats, 70 00:06:34,310 --> 00:06:37,840 rabbits and lots of other mammals of similar descent. 71 00:06:39,070 --> 00:06:43,600 Nowadays we can view DNA sequences directly, and frankly, our data is a lot less cute. 72 00:06:44,110 --> 00:06:47,480 So here is what geneticists do with modern data. 73 00:06:47,550 --> 00:06:49,570 Thanks to Jonathan McKinney for this. 74 00:06:49,810 --> 00:06:56,290 Actually, if you go to the Department of Statistics, you can see this pattern on the wall, on the doors, on the second floor. 75 00:06:57,400 --> 00:07:00,580 This is how you can tell the statistical geneticists are at work. 
76 00:07:01,420 --> 00:07:04,910 And what it corresponds to is data from 40 different human beings. 77 00:07:04,920 --> 00:07:10,930 They're from the Thousand Genomes Project, in fact, and they all come from an area in Nigeria. 78 00:07:11,230 --> 00:07:17,470 And what's been recorded, this is quite a long sequence of DNA, but all this records is the differences between individuals. 79 00:07:18,040 --> 00:07:23,680 And that's what geneticists record: the differences between individuals, or rather between the DNA sequences of individuals. 80 00:07:24,130 --> 00:07:28,960 And from those differences, they infer something about the way that individuals are related to one another. 81 00:07:29,350 --> 00:07:32,659 And we call those relationships genealogical trees. 82 00:07:32,660 --> 00:07:34,870 And we'll see a lot of those in the rest of the talk. 83 00:07:35,710 --> 00:07:41,110 So as mathematicians, if we want to address that key question about the relative importance of the different forces of evolution, 84 00:07:41,110 --> 00:07:48,189 what we need are forwards-in-time models that say how those forces of evolution would change gene frequencies: 85 00:07:48,190 --> 00:07:51,880 how would they change the frequencies of different genetic types as we move forwards? 86 00:07:52,300 --> 00:07:57,910 But then we want to compare that to data, or rather to what geneticists infer from their data. 87 00:07:58,240 --> 00:08:03,640 And so we need to be able to say, backwards in time, if a population were evolving according to one of our models, 88 00:08:03,970 --> 00:08:10,720 what would those genealogical trees, what would those systems of relatedness look like for individuals sampled from our population? 89 00:08:12,020 --> 00:08:15,200 Okay. So let's just have a quick think about backwards in time. 90 00:08:16,340 --> 00:08:19,890 So I said genealogical tree, and I'm deliberately saying that, not family tree. 91 00:08:19,910 --> 00:08:24,700 So let's try and explain why. 
If I want to plot my family tree, what do I need? 92 00:08:24,710 --> 00:08:29,720 I need my parents and my grandparents and my great-grandparents and so on. 93 00:08:30,170 --> 00:08:33,590 And the number of individuals in each generation is growing really very, very quickly. 94 00:08:33,590 --> 00:08:36,919 And I think that's quite nicely illustrated by Mark Wallinger's 95 00:08:36,920 --> 00:08:42,350 Y. You can find this sculpture in the grounds of Magdalen College. And thanks to David Lowery for sending me the photo. 96 00:08:43,160 --> 00:08:47,899 After just nine generations, which here are meant to represent generations of academics in Magdalen 97 00:08:47,900 --> 00:08:58,460 (we're a little self-important in Magdalen), there are 512 leaves on this tree. It doesn't take very long to get to a very big number: nine generations, 512. 98 00:08:58,940 --> 00:09:01,040 Now, natural populations are finite, 99 00:09:01,520 --> 00:09:08,570 and so you can't indefinitely go on doubling the number of people in your family tree without running out of individuals to put in your family tree. 100 00:09:08,930 --> 00:09:13,729 So some individuals must occur more than once. And let's see a real example of that. 101 00:09:13,730 --> 00:09:21,590 And I admit I've chosen a rather extreme example. But here is the family tree, or the pedigree, of King Charles the Second of Spain. 102 00:09:22,400 --> 00:09:29,720 And so here's Charles himself and here's his father. And here are his paternal grandparents and then great-grandparents and so on. 103 00:09:30,440 --> 00:09:34,639 And then his mother, and here his maternal grandparents. 104 00:09:34,640 --> 00:09:39,080 And then we see that his great-grandparents appear to be duplicated already. 105 00:09:39,710 --> 00:09:45,590 So he's a very extreme case because his mother was his father's niece. 
106 00:09:46,260 --> 00:09:50,780 Now, this is really quite extreme inbreeding, and it goes on: as you go back in the tree, 107 00:09:50,780 --> 00:09:54,350 you'll see there are lots of instances of lineages coming together. 108 00:09:54,380 --> 00:10:02,600 This really isn't a tree. And in fact, it is an extreme case because I'm afraid Charles the Second was actually very 109 00:10:02,600 --> 00:10:06,230 seriously handicapped by genetic disease and died without leaving offspring. 110 00:10:07,040 --> 00:10:14,540 So let's try and find a family tree which is a little less politically inspired, because obviously a lot of the marriages here were not random. 111 00:10:14,540 --> 00:10:24,110 They were so that bits of Spain stayed the property of Spain. And I'm going to move to a very apolitical organism, the snail, 112 00:10:24,470 --> 00:10:28,250 and one of the reasons I'm moving to the snail is because drawing these pictures gets very, 113 00:10:28,250 --> 00:10:33,350 very difficult if you separate your population into males and females, and snails are hermaphrodite. 114 00:10:33,770 --> 00:10:36,409 But let me assure you, the mathematical models are almost identical, 115 00:10:36,410 --> 00:10:40,610 just much harder to draw with the program that I was using in a hotel room in Paris yesterday. 116 00:10:41,300 --> 00:10:44,360 So it's very carefully prepared, etc. 117 00:10:44,870 --> 00:10:50,629 So here what we've done is we've taken five snails, and snails are not monogamous. 118 00:10:50,630 --> 00:10:55,040 So we're supposing that in the previous generation each snail just picks two parents at random. 119 00:10:55,040 --> 00:11:00,920 So for example, this one chooses that parent and that parent, and any snail can breed with any other snail because they're hermaphrodites. 120 00:11:01,160 --> 00:11:06,069 And so they've successfully produced this offspring. And as we trace backwards in time. 
121 00:11:06,070 --> 00:11:13,180 So this is my present-day population, and as we trace backwards in time we see we get quite a complicated network of relationships developing. 122 00:11:13,870 --> 00:11:18,249 And in particular it's already the case that in this generation, this individual, 123 00:11:18,250 --> 00:11:24,819 this individual and this individual all have to be in the family tree, in the sense of the pedigree of Charles 124 00:11:24,820 --> 00:11:32,200 the Second, of all of the individuals down here. So all five of these individuals are in some sense descended from these three guys. 125 00:11:32,560 --> 00:11:35,680 This one left no offspring, so no one in the current generation. 126 00:11:36,010 --> 00:11:41,980 And it took me a long time to adjust the picture so that this guy is not actually ancestor to everybody. 127 00:11:42,010 --> 00:11:44,980 There's one person left out; a little exercise for you to work out which one. 128 00:11:46,860 --> 00:11:50,370 In fact, the reason that it took me so long to work it out is it's actually very difficult. 129 00:11:50,380 --> 00:12:01,050 So if this were genuinely done at random, instead of me just picking individuals, then after log, log base two, of the number of individuals here, 130 00:12:01,530 --> 00:12:03,120 that number of generations back, 131 00:12:03,120 --> 00:12:11,280 we expect to see an individual who's ancestral to everybody. After 1.7 log2 N generations, with very small variation around that, 132 00:12:11,520 --> 00:12:19,470 for large populations, everybody in the ancestral population is either ancestral to nobody now or ancestral to everybody now. 133 00:12:20,310 --> 00:12:23,640 So this suggests actually that this is not the best way to view ancestry. 134 00:12:23,820 --> 00:12:30,030 It's too joined up. So let me just actually convince you that that guy there was ancestral to everyone. 
135 00:12:30,030 --> 00:12:33,749 And not only is he ancestral to everyone: I've just coloured in cyan, 136 00:12:33,750 --> 00:12:37,680 I believe this colour is called, all the individuals who are descended from him as I come 137 00:12:37,680 --> 00:12:42,419 down the tree, without paying any attention as to whether he transmits genetic material. 138 00:12:42,420 --> 00:12:46,110 I've just looked at his offspring, or her offspring, since this is a snail. 139 00:12:47,040 --> 00:12:54,720 And you'll see that not only is everybody descended from this person, but there are multiple routes through this graph which get me from here to here. 140 00:12:55,260 --> 00:13:00,180 Okay. On the other hand, if I start thinking about genetics, you may wonder why there are small circles. 141 00:13:00,630 --> 00:13:06,780 The small circles I'm thinking of as individual genes. And if I just trace back the individual genes in this generation, 142 00:13:07,590 --> 00:13:12,600 then what I've done is I've used red lines to indicate where genes were inherited from. 143 00:13:12,600 --> 00:13:18,300 So each chooses not only an individual for a parent, but actually this gene chooses one of those two circles. 144 00:13:18,840 --> 00:13:22,320 Then this character, who is ancestral to everybody in the sense of pedigrees, 145 00:13:22,710 --> 00:13:26,670 has not transmitted any of their own genetic material to the current generation. 146 00:13:27,840 --> 00:13:35,010 Moreover, as I go backwards in time, because each gene chooses a unique parent in the previous generation, 147 00:13:35,490 --> 00:13:38,910 the structure that I get by only looking at genes is going to be much simpler 148 00:13:39,450 --> 00:13:43,350 because the number of lines that I trace as I go back in time can only get smaller. 149 00:13:43,380 --> 00:13:46,340 It's not getting bigger. I'm not having to look at two parents for a single gene. 
150 00:13:46,530 --> 00:13:50,999 A single gene only has one parent, and that's easier to see if I just focus on one. 151 00:13:51,000 --> 00:13:58,590 So what I've done is I've just arbitrarily decided I'll only trace the left-hand gene from each of the individuals in the present-day population. 152 00:13:59,630 --> 00:14:03,650 And then you can see that as I trace backwards in time, there's a unique path that connects, for example, 153 00:14:03,650 --> 00:14:12,110 this guy to the ancestral generation, but occasionally two paths will meet and thereafter they will be the same. 154 00:14:12,530 --> 00:14:19,129 And we can encode that information in a very simple picture like this. And so what we're going to do is, instead of following the haploid individual, 155 00:14:19,130 --> 00:14:21,890 sorry, 156 00:14:21,950 --> 00:14:25,579 the diploid individuals, and trying to think about where everybody came from with both of their genes, 157 00:14:25,580 --> 00:14:28,580 we're just going to trace gene by gene and see where individuals came from. 158 00:14:29,550 --> 00:14:32,700 And that leads us to the simplest imaginable model of inheritance. 159 00:14:33,120 --> 00:14:34,680 And this is called the Wright-Fisher model. 160 00:14:35,520 --> 00:14:40,920 I don't think they'd like their names being tied together like this, but I'm afraid they are inextricably connected in population genetics. 161 00:14:41,460 --> 00:14:45,300 And the Wright-Fisher model is probably the most important model in mathematical population genetics. 162 00:14:45,300 --> 00:14:53,700 So it's a bit disturbing how simple it is. So the idea is that each individual chooses one parent in the previous generation. 163 00:14:53,940 --> 00:15:01,710 And I'm thinking of an individual now as being a gene. So each gene chooses one parent in the previous generation, and generations are discrete. 
164 00:15:02,040 --> 00:15:05,370 And what happens in one generation doesn't affect what happens in the next generation. 165 00:15:07,770 --> 00:15:15,360 I told you before that in a pedigree we'd expect a common ancestor to the population on the order of log2 N generations ago. 166 00:15:15,990 --> 00:15:20,010 So let's have a quick think about how far we have to trace back to get a common genetic ancestor. 167 00:15:21,040 --> 00:15:26,620 So let's take a sample of size two from this population and think about how long we have to go back before we get a common genetic ancestor. 168 00:15:27,070 --> 00:15:34,690 Well, the probability that my two individuals had a common parent is just one over N, because the first chooses a parent just uniformly at random, 169 00:15:34,960 --> 00:15:38,970 and the second one's got to choose the same parent. And the chance of choosing the same parent is one over N. 170 00:15:39,870 --> 00:15:44,520 If they didn't choose the same parent, the chance they chose the same grandparent is one over N. 171 00:15:45,330 --> 00:15:51,870 So the number of generations I wait until my sample of size two has a common parental 172 00:15:51,870 --> 00:15:56,970 gene is the same as the number of times I have to roll an N-sided die before I get an N, 173 00:15:57,630 --> 00:16:05,760 and that's going to be on the order of N generations. So whereas the pedigree had common ancestry after this very small number, log2 N, of generations, 174 00:16:06,150 --> 00:16:10,200 genetic ancestry is determined over much longer timescales. 175 00:16:10,470 --> 00:16:13,620 And for just a pair of individuals, it would be about N generations ago. 176 00:16:13,980 --> 00:16:17,040 And in fact, for the whole population, it'll be about 2N generations ago. 177 00:16:17,220 --> 00:16:24,390 So not very much longer. Okay. Now, the models we're interested in, as you may already have spotted, are very, very crude. 
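[Editor's illustration] The die-rolling argument above, for a sample of size two, can be checked with a short simulation. This is only a sketch under stated assumptions: the function names, the choice N = 100 and the number of replicates are hypothetical and do not come from the talk.

```python
import random


def pairwise_coalescence_time(n_genes, rng):
    """Generations traced back until two sampled genes pick the same parental gene."""
    generations = 0
    while True:
        generations += 1
        # Each of the two lineages picks its parent uniformly at random
        # from the n_genes genes of the previous generation.
        if rng.randrange(n_genes) == rng.randrange(n_genes):
            return generations


def mean_coalescence_time(n_genes, replicates, seed=0):
    """Average pairwise coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(n_genes, rng) for _ in range(replicates))
    return total / replicates


# With N = 100 genes the average should be close to N generations,
# since the waiting time is geometric with success probability 1/N.
print(mean_coalescence_time(100, 5000))
```

The waiting time is geometric, so its mean is exactly N; the simulation only illustrates that the average settles near that value.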
178 00:16:24,780 --> 00:16:29,070 They're trying to capture just some caricature of the way that populations reproduce. 179 00:16:29,100 --> 00:16:35,210 We're not going to go into fine detail of what's happening locally. So let us suppose that the population size is very big. 180 00:16:35,220 --> 00:16:37,200 Otherwise you wouldn't use this kind of a model. 181 00:16:38,180 --> 00:16:43,159 So we said that for a sample of size two, it's going to take on the order of N generations before we see a common ancestor. 182 00:16:43,160 --> 00:16:50,660 So let's use N generations as our unit of time, so that the time for two individuals to find a common ancestor is of order 183 00:16:50,660 --> 00:16:53,899 one. Okay, well, if I do that, 184 00:16:53,900 --> 00:16:59,059 then the time to the most recent common ancestor is one over N times the number of rolls of my N-sided die 185 00:16:59,060 --> 00:17:05,270 until I get an N, and it's well known that that is very well approximated by something completely independent of N, 186 00:17:05,270 --> 00:17:08,330 called an exponential random variable with parameter one. 187 00:17:08,990 --> 00:17:13,820 So as long as I measure time in these ludicrously big units of N generations, 188 00:17:14,030 --> 00:17:20,480 the time to the most recent common ancestor of a sample of size two is on the order of one, and it's given by an exponential(1) random variable. 189 00:17:21,290 --> 00:17:24,469 Now let's take a bigger sample, to be a bit less boring. So let's take a sample of size 190 00:17:24,470 --> 00:17:32,780 k, bigger than or equal to three. The probability that at least three individuals in my sample have a common parent is order one over N squared, 191 00:17:32,780 --> 00:17:36,829 I claim, because the first one chooses a parent, the second one's got to choose the same parent, 192 00:17:36,830 --> 00:17:39,920 that's one over N, and the third one's got to choose the same parent again. 
193 00:17:40,010 --> 00:17:41,420 So that's one over N squared. 194 00:17:43,010 --> 00:17:49,040 I'm not going to see that happen because it's going to take me about N squared generations before I see an event like that. 195 00:17:49,310 --> 00:17:52,400 And by then, all my lineages will have coalesced pairwise. 196 00:17:52,940 --> 00:17:57,990 So I'm never going to see three lineages coming together in a single generation. In the same way, 197 00:17:58,010 --> 00:18:00,469 I'm also never going to see what we call simultaneous mergers, 198 00:18:00,470 --> 00:18:06,410 where two distinct pairs of individuals come together in the same generation, because the probability this pair comes together is one over 199 00:18:06,410 --> 00:18:11,300 N, the probability this pair comes together is one over N. So I'd have to wait N squared generations to see it. 200 00:18:11,510 --> 00:18:17,420 And that's too long. All my lineages will have coalesced by pairwise coalescence by the time that happens. 201 00:18:18,210 --> 00:18:23,300 And so what we're left with is an observation of Kingman in 1982, really, 202 00:18:23,300 --> 00:18:27,470 although he proved it, lots of people had observed it, which is: if I have a sample of size 203 00:18:27,470 --> 00:18:30,740 k, so here's a sample of size four from my population, 204 00:18:31,220 --> 00:18:33,770 measuring time in these units of N generations, 205 00:18:34,400 --> 00:18:39,830 the time that I have to wait as I trace backwards in time before anything happens in my genealogical trees, 206 00:18:39,830 --> 00:18:48,799 in these trees telling me how individuals are related to each other, is just the minimum of the four choose two, that is six, exponential(1) 207 00:18:48,800 --> 00:18:52,820 random variables that tell me when they find their common ancestry pairwise. 208 00:18:53,540 --> 00:18:58,250 And the minimum of those exponential random variables is just an exponential random variable. 
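[Editor's illustration] The construction just described, in which k lineages wait for the minimum of "k choose two" independent exponential(1) clocks and then merge a pair, can be sketched as follows. Time is measured in units of N generations, as in the talk; the function names and parameter values are hypothetical. The closed form 2(1 - 1/k) for the expected time to the most recent common ancestor follows from summing the expected waiting times 1/C(j, 2) for j = 2, ..., k, and it matches the earlier remark that the whole population finds a common ancestor after about 2N generations.

```python
import random
from math import comb


def kingman_tmrca(k, rng):
    """Time to the most recent common ancestor of k lineages (units of N generations)."""
    t = 0.0
    while k > 1:
        # The minimum of C(k, 2) independent Exp(1) clocks is Exp(C(k, 2)).
        t += rng.expovariate(comb(k, 2))
        k -= 1  # one pair of lineages merges
    return t


def expected_tmrca(k):
    """Sum of 1 / C(j, 2) for j = 2..k, which telescopes to 2 * (1 - 1/k)."""
    return 2.0 * (1.0 - 1.0 / k)


rng = random.Random(42)
sample_mean = sum(kingman_tmrca(4, rng) for _ in range(20000)) / 20000
print(sample_mean, expected_tmrca(4))
```

For a sample of size four the expected total time is 1.5 units, of which a full unit is spent waiting for the last two lineages to merge.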
209 00:18:58,460 --> 00:19:02,000 And we've just denoted it here. Now I've got three lineages left. 210 00:19:02,210 --> 00:19:06,230 And the extra time I must wait before the next thing happens is the minimum of three 211 00:19:06,230 --> 00:19:10,480 choose two exponential random variables, and so on. Okay. 212 00:19:10,810 --> 00:19:17,490 And here, this is a picture of what this relatedness looks like for a sample of size a thousand. 213 00:19:17,500 --> 00:19:20,530 I'm grateful to Bob Griffiths for producing this for me many years ago. 214 00:19:21,670 --> 00:19:25,960 As you can see, a lot of stuff happens very, very quickly. 215 00:19:26,470 --> 00:19:31,060 But then we're down to a rather small number of lineages, and it takes a long, long time 216 00:19:31,360 --> 00:19:35,110 after this initial flurry of activity before very much happens. 217 00:19:35,740 --> 00:19:45,580 Okay. So we've got a forwards-in-time model for the allele frequencies and a corresponding backwards-in-time model given by Kingman's coalescent. 218 00:19:46,150 --> 00:19:53,530 So how does it do with data? Well, you've probably guessed that this isn't really a very good model of how populations really reproduce. 219 00:19:53,770 --> 00:20:00,100 But you might hope that you could reproduce it in a laboratory. And in the 1950s, Buri tried just that. 220 00:20:00,430 --> 00:20:03,549 So what he did was he took a population of fruit flies. 221 00:20:03,550 --> 00:20:11,710 This is Drosophila melanogaster. And he took the fruit flies in two different forms that differed just very slightly in their eye colour. 222 00:20:12,100 --> 00:20:16,990 So half of them, when he started out, carried a gene which just slightly changes the eye colour. 223 00:20:17,710 --> 00:20:22,090 And he took 100 populations, each consisting of eight males and eight females, 224 00:20:22,690 --> 00:20:26,590 and each started with half of one eye colour and half of the other eye colour. 
225 00:20:27,190 --> 00:20:30,430 And he propagated these populations for 20 generations. 226 00:20:31,210 --> 00:20:37,300 And he compared the results that he got to the predictions of the Wright-Fisher model. 227 00:20:37,900 --> 00:20:41,770 Now, how did he do it? He actually had to keep the population size constant. 228 00:20:41,770 --> 00:20:48,370 So in each generation, he resampled to always keep the population size constant, at 16 individuals for each of those populations. 229 00:20:48,370 --> 00:20:53,800 So it doesn't seem a very big experiment by these modern standards of big data, 230 00:20:54,100 --> 00:20:57,100 but it must have been quite a tedious experiment to perform. 231 00:20:57,760 --> 00:21:05,430 And here are his results. So the Wright-Fisher model tells us that on average, actually, the proportion of the 232 00:21:05,800 --> 00:21:08,890 different flavours of eye in our population is not going to change. 233 00:21:09,130 --> 00:21:12,760 But there'll be some variability in that, and it gives us a prediction for the variability. 234 00:21:13,330 --> 00:21:22,180 So this, one minus (one minus one over N) to the power of the number of generations, is what the theory should predict. 235 00:21:22,570 --> 00:21:27,700 So Buri plotted his results. So here are the results of his experiments. 236 00:21:27,700 --> 00:21:30,790 And this is a variance that we're plotting: variance against generation. 237 00:21:31,180 --> 00:21:35,559 So at the beginning, we started with exactly a half and a half in all the populations. 238 00:21:35,560 --> 00:21:40,600 So there was no variability. And this is just saying something about how the populations vary as time goes on. 239 00:21:40,930 --> 00:21:44,830 And eventually all the populations will be either one eye colour or the other eye colour. 240 00:21:45,160 --> 00:21:50,230 And at that point, this variance will hit 0.25. 
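[Editor's illustration] The variance prediction described above can be written as p0 (1 - p0) (1 - (1 - 1/N)^t), which starts at zero and tends to 0.25 when the starting frequency p0 is one half. The sketch below compares that formula with a direct simulation of replicate Wright-Fisher populations. It assumes that N counts gene copies, as in the gene-by-gene description of the model; for diploid organisms one would usually take N to be twice the number of individuals. All names and parameter values are hypothetical, and the simulation is not a reconstruction of the fruit fly data.

```python
import random


def predicted_variance(n_genes, generations, p0=0.5):
    """Wright-Fisher prediction for the variance of the allele frequency across populations."""
    return p0 * (1.0 - p0) * (1.0 - (1.0 - 1.0 / n_genes) ** generations)


def simulated_variance(n_genes, generations, replicates, p0=0.5, seed=1):
    """Mean squared deviation from p0 across independent replicate populations."""
    rng = random.Random(seed)
    squared_deviations = 0.0
    for _ in range(replicates):
        p = p0
        for _ in range(generations):
            # Each gene in the new generation copies a parent chosen at random,
            # so the count of the focal type is Binomial(n_genes, p).
            copies = sum(1 for _ in range(n_genes) if rng.random() < p)
            p = copies / n_genes
        squared_deviations += (p - p0) ** 2
    return squared_deviations / replicates


for t in (0, 5, 10, 20):
    print(t, round(predicted_variance(16, t), 4), round(simulated_variance(16, t, 4000), 4))
```

Because the mean frequency stays at p0, the mean squared deviation from p0 estimates the variance directly.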
So you can see it's rising steadily. 241 00:21:50,920 --> 00:21:55,720 Now, I'd like to tell you that this straight line was the prediction of the Wright-Fisher model. 242 00:21:56,440 --> 00:22:03,250 But that wouldn't be completely honest. This line, the dotted line, is the prediction of the Wright-Fisher model. 243 00:22:04,230 --> 00:22:09,640 And it turns out, though, that this line is almost the Wright-Fisher model. 244 00:22:09,660 --> 00:22:16,530 But instead of taking the true population size, which in this case is 16, I've substituted 11 and a half. 245 00:22:17,630 --> 00:22:22,960 And by virtue of doing that, I mean, I know it's not a perfect fit, but actually for an experiment of this size, that's pretty good. 246 00:22:23,930 --> 00:22:25,430 And it turns out that's universal. 247 00:22:26,180 --> 00:22:33,050 It is pretty good: as long as I don't use the real population size and I put in a population size to suit my purposes, 248 00:22:33,590 --> 00:22:38,270 the Kingman coalescent, or the Wright-Fisher model, is a pretty good approximation, even to natural populations, 249 00:22:38,480 --> 00:22:40,670 as long as I sample individuals from 250 00:22:41,180 --> 00:22:47,360 far enough away from one another that I'm not seeing local effects brought about by them living in very close proximity, for example. 251 00:22:48,170 --> 00:22:51,830 And as an example of the sort of scale of this fudge factor, 252 00:22:52,160 --> 00:22:55,610 so how much do I have to change the population size to make things fit, 253 00:22:56,030 --> 00:23:03,409 I think the human population is rather nice. So for the whole world I would need to take an effective population size, 254 00:23:03,410 --> 00:23:07,100 I'd need to substitute N to be about 50,000 in my Wright-Fisher model. 255 00:23:07,910 --> 00:23:11,720 And of course, the true population size of humans is 7 billion. 
256 00:23:12,590 --> 00:23:21,890 So the difference between the number I plug in to make my model fit and the true number is about five orders of magnitude. 257 00:23:22,550 --> 00:23:25,670 So it seems crazy that this should work; it's completely mad that it should work. 258 00:23:25,670 --> 00:23:30,200 But actually beyond that correction, it fits the data extremely well. 259 00:23:31,810 --> 00:23:35,020 Okay. Now, we would like to understand why. 260 00:23:35,500 --> 00:23:39,159 And in particular, we would like to understand, if we're going to understand our basic question, 261 00:23:39,160 --> 00:23:44,410 how the different forces of evolution feed into this Ne to make it all work so well. 262 00:23:44,650 --> 00:23:48,190 What the Wright-Fisher model is doing is it's just modelling the genetic drift. 263 00:23:48,520 --> 00:23:52,600 But how would selection alter Ne? How would spatial structure alter Ne? 264 00:23:53,690 --> 00:24:00,489 And I started working on this stuff about 20 years ago when Nick Barton, who is a very distinguished evolutionary geneticist, now 265 00:24:00,490 --> 00:24:09,640 at IST Austria, came to see me and Nick said, Well, look, I'm studying these grasshoppers, here they are, and they live in the Maritime Alps. 266 00:24:09,730 --> 00:24:13,270 And you'll see, he always chooses nice mountain ranges for his field trips. 267 00:24:13,780 --> 00:24:16,839 And they really are in a spatial continuum. 268 00:24:16,840 --> 00:24:22,060 And I want to know how this spatial structure is affecting the genetics of Podisma pedestris. 269 00:24:22,780 --> 00:24:28,780 And I should tell you why it's called Podisma pedestris: because it is a pedestrian grasshopper. 270 00:24:28,870 --> 00:24:33,280 It's hard to see, but this is the vestigial wing. This thing cannot fly. 271 00:24:33,670 --> 00:24:37,659 Okay. So it crawls around, hops around. It doesn't move very far in its lifetime.
272 00:24:37,660 --> 00:24:41,920 So the spatial structure to Podisma pedestris probably looks pretty much like the plane. 273 00:24:42,590 --> 00:24:48,100 And at the same time, Nick said, oh, and by the way, Wright and Malécot almost solved this in the 1940s. 274 00:24:48,110 --> 00:24:52,450 So Gustave Malécot is another of the greats of population genetics. 275 00:24:53,080 --> 00:24:57,580 And the way that Wright and Malécot solved it was they took the Wright-Fisher model and they adapted it to a spatial setting. 276 00:24:57,580 --> 00:25:00,700 So they said, Let's suppose individuals are scattered across space. 277 00:25:00,700 --> 00:25:07,269 This pointer is really not good. So here we go: here are individuals scattered across space, and it's in a kind of 278 00:25:07,270 --> 00:25:10,570 uniform way, in that each of them just chooses where they fall uniformly at random. 279 00:25:10,990 --> 00:25:15,060 And this has been drawn on a torus for reasons that will become clear in due course. 280 00:25:15,400 --> 00:25:21,640 And I will show some realisations of a simulation due to Jerome Kelleher, who I think is also sitting up there somewhere. 281 00:25:22,610 --> 00:25:25,399 And the way that their model works is it evolves in discrete generations, 282 00:25:25,400 --> 00:25:30,889 just like the Wright-Fisher model, and the number of offspring that each individual 283 00:25:30,890 --> 00:25:35,930 produces in each generation is taken to be a Poisson random variable with parameter one. 284 00:25:36,290 --> 00:25:39,530 So why is that what they chose? Well, they chose that because in the Wright-Fisher model, 285 00:25:39,830 --> 00:25:46,340 if you look at the number of offspring that a single individual produces, for large N, it's approximately Poisson. 286 00:25:46,370 --> 00:25:53,520 It's very, very close to being just a Poisson. So now here, Mitch Gooding drew this for me.
287 00:25:53,810 --> 00:25:58,550 Here's a histogram which just tells you this is how many times I should expect to get zero offspring. 288 00:25:58,820 --> 00:26:04,040 This is how many times I should expect to get one offspring. So quite a lot of the time, and this is two offspring and so on. 289 00:26:04,040 --> 00:26:07,610 But on average I produce one offspring. Okay. 290 00:26:08,150 --> 00:26:11,510 And I can't have my offspring all sitting on top of each other. That wouldn't be right. 291 00:26:11,840 --> 00:26:16,640 And so I scatter them around the position of the parent according to a Gaussian distribution. 292 00:26:16,820 --> 00:26:20,150 So they're just distributed close by in a nice, symmetric way. 293 00:26:20,750 --> 00:26:28,640 So, understandably, Wright and Malécot thought that this sort of pattern would persist, that if they looked in their population in generation ten, 294 00:26:28,970 --> 00:26:32,930 it would still look quite a lot like this one, still look pretty uniformly spread. 295 00:26:34,070 --> 00:26:40,790 And working on that assumption, they were able to do what was the equivalent in the 1940s of writing down those genealogical trees, 296 00:26:40,790 --> 00:26:45,049 telling us how individuals in the population were related to each other and, genetically, 297 00:26:45,050 --> 00:26:49,480 how the correlations between genetic types would decay with distance. 298 00:26:49,490 --> 00:26:59,410 They predicted it would decay approximately exponentially. But then in 1975, Joe Felsenstein noticed that actually the assumptions were inconsistent. 299 00:27:00,070 --> 00:27:03,729 And this is Jerome's simulation, which he's probably rather embarrassed I'm using. 300 00:27:03,730 --> 00:27:09,100 So he did it for a lab meeting a long time ago. But here's the initial condition and what he's done.
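The Wright-Malécot dynamics described above can be sketched roughly as follows. This is a minimal illustration, assuming a unit torus, Poisson(1) offspring numbers and independent Gaussian displacements of standard deviation `sigma`; the function names, the grid-based clumping summary and all parameter values are hypothetical choices for the sketch, not details of the simulation shown in the talk.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Sample a Poisson variable by Knuth's method (fine for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def next_generation(points, sigma, rng, side=1.0):
    """One discrete generation: Poisson(1) offspring, Gaussian scatter, wrap on the torus."""
    offspring = []
    for x, y in points:
        for _ in range(poisson(rng, 1.0)):
            offspring.append(((x + rng.gauss(0.0, sigma)) % side,
                              (y + rng.gauss(0.0, sigma)) % side))
    return offspring


def occupied_cells(points, grid=20, side=1.0):
    """Crude clumping summary: how many cells of a grid x grid partition hold anyone."""
    cells = set()
    for x, y in points:
        cells.add((min(int(x / side * grid), grid - 1),
                   min(int(y / side * grid), grid - 1)))
    return len(cells)


if __name__ == "__main__":
    rng = random.Random(2024)
    population = [(rng.random(), rng.random()) for _ in range(1000)]
    for generation in range(0, 101):
        if generation % 25 == 0:
            print(generation, len(population), occupied_cells(population))
        population = next_generation(population, sigma=0.02, rng=rng)
```

Tracking the number of occupied grid cells alongside the population size gives one simple way to watch for the white spaces the talk goes on to describe.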
301 00:27:09,100 --> 00:27:14,710 He's working on a torus, and he's supposed that the population really does evolve according to the Malécot model. 302 00:27:15,250 --> 00:27:19,990 And after ten generations, this is what it looks like. So the population is still pretty much a thousand. 303 00:27:20,260 --> 00:27:24,370 It's not changed very much, but we're getting these white spaces developing. 304 00:27:25,550 --> 00:27:29,630 After a hundred generations, the population is still pretty close to a thousand, actually. 305 00:27:29,960 --> 00:27:33,290 But we really are getting a lot of white space. The population is really clumping. 306 00:27:34,860 --> 00:27:40,950 By a thousand generations, the population is caving in and realising that mathematics cannot be defeated, because there is a 307 00:27:40,950 --> 00:27:46,320 theorem that says it has got to die out eventually, and it's noticing that it really ought to. 308 00:27:46,350 --> 00:27:51,450 So it's down to close to 300, but those individuals that are left are really clustered together. 309 00:27:51,780 --> 00:27:59,459 Okay. Now, Felsenstein first observed this when he was working on the whole of the real plane, the whole of Euclidean space, and he said, 310 00:27:59,460 --> 00:28:03,480 oh, well, maybe it's just because I'm working with an infinite population. Real populations are finite. 311 00:28:03,490 --> 00:28:04,979 Let's look at a finite population. 312 00:28:04,980 --> 00:28:13,410 So he moved on to a torus and then he noticed, well, unfortunately, on a torus, either the population blows up or it dies out, so it will die out 313 00:28:13,410 --> 00:28:18,330 if I just have on average one offspring, and if I slightly increase that, the population will just explode eventually. 314 00:28:18,340 --> 00:28:26,430 So that doesn't work. And then he said, Well, let's suppose that somehow the total population size on my torus is exogenous.
315 00:28:26,430 --> 00:28:28,500 And he specified it and fixed it to be a thousand. 316 00:28:29,460 --> 00:28:35,580 But we can see from the simulation that that's still not working, actually, because the population size here didn't change very much. 317 00:28:35,730 --> 00:28:37,260 It's still pretty close to a thousand, 318 00:28:37,500 --> 00:28:44,190 but we're still getting clumping, and Felsenstein realised that this was going to happen, and at that point he said, okay, I give up. 319 00:28:44,550 --> 00:28:48,270 And he wrote a paper which famously dubbed this problem the pain in the torus. 320 00:28:49,170 --> 00:28:53,130 And so the challenge that Nick was really throwing down to me was to solve the pain in the torus. 321 00:28:54,430 --> 00:29:01,639 All right. We wanted to produce a model which was a little bit like the Malécot assumption 322 00:29:01,640 --> 00:29:05,090 that the population would be distributed in space in a relatively uniform manner, 323 00:29:05,390 --> 00:29:10,100 but which actually had a stability to it so that we could write down genealogical trees in a consistent way. 324 00:29:11,200 --> 00:29:13,660 And we wanted that model to address one or two other issues. 325 00:29:14,200 --> 00:29:19,320 So we've already seen that genetic diversity is much, much lower than what you'd expect from census numbers. 326 00:29:19,360 --> 00:29:25,450 That's another way of saying the effective population size is orders of magnitude different from the census population size. 327 00:29:25,460 --> 00:29:30,220 That's the same statement in other words. Another thing was, I said that Wright 328 00:29:30,220 --> 00:29:36,490 and Malécot observed that the correlations between genetic types would decay sort of exponentially with distance apart. 329 00:29:37,530 --> 00:29:42,180 That's sort of true-ish over some scales.
But then when you look over larger scales, 330 00:29:42,420 --> 00:29:47,100 the rate of exponential decay appears to have decreased and you get longer range correlations than seems reasonable. 331 00:29:48,190 --> 00:29:55,030 And a possible explanation for this is that the demographic history of many populations is really dominated by large scale events. 332 00:29:55,060 --> 00:29:58,970 So imagine I'm a population of plants living on a forest floor. 333 00:29:59,470 --> 00:30:04,120 Every hundred generations or so a forest fire sweeps through and completely wipes the population out. 334 00:30:04,450 --> 00:30:07,630 And then it gets very, very rapidly replaced, recolonised. 335 00:30:08,110 --> 00:30:15,910 And that's going to lead to both a reduction in genetic diversity and long-range, large scale correlations in allele frequencies. 336 00:30:16,360 --> 00:30:20,520 Now I am not claiming the model I am about to write down. I think it is quite a good model for forest fires. 337 00:30:20,530 --> 00:30:24,160 It's not a very good model for glacial maxima, for ice ages. 338 00:30:24,520 --> 00:30:31,720 But I like to show this slide just to remind me, to give you a notion of the sort of timescales over which evolution is happening. 339 00:30:32,440 --> 00:30:35,979 So remember, we said the effective population size for the human population, 340 00:30:35,980 --> 00:30:39,400 let's say if we just look at European humans, is about 20,000. 341 00:30:40,240 --> 00:30:47,440 So that means that my genetic composition is being determined over 20,000 generations, and a human generation time, 342 00:30:47,830 --> 00:30:54,940 I don't know, we could argue over it, 20 years. So that means we're talking about timescales of hundreds of thousands of years. 343 00:30:55,660 --> 00:31:02,620 Right. At the last ice age, the last glacial maximum, northern Europe was largely covered in ice. 344 00:31:03,310 --> 00:31:06,910 Humans did not live there, and that was only 18,000 years ago.
345 00:31:07,150 --> 00:31:09,820 So from the point of view of genetics, that's just the twinkling of an eye. 346 00:31:10,570 --> 00:31:17,760 So these large scale events are really going to have influenced our genetic composition, and we can't simply ignore them. 347 00:31:17,770 --> 00:31:20,660 They really are going to have affected things. Okay. 348 00:31:20,960 --> 00:31:28,970 Another thing I wanted to say before showing you the model that we came up with is, when we derived the Kingman coalescent, 349 00:31:29,390 --> 00:31:35,000 I emphasised that for large populations we'd never see three lineages merging in 350 00:31:35,000 --> 00:31:38,150 a single generation because of having a common parent in a single generation, 351 00:31:38,480 --> 00:31:41,180 because individuals had so many parents to choose from, 352 00:31:41,180 --> 00:31:47,180 you never got three of them choosing the same ones on the timescale that we were looking at, where you just get pairs choosing the same one. 353 00:31:47,510 --> 00:31:49,820 And that's just because N squared was much bigger than N. 354 00:31:50,900 --> 00:31:56,630 In a spatial continuum, if I'm an individual and I'm looking around me for potential parents in the previous generation, 355 00:31:57,690 --> 00:32:03,690 that may not be true. N may not be very large, and it may be the case that N squared is really not so much bigger than N. 356 00:32:03,690 --> 00:32:10,140 And so I will see mergers of not just two ancestral lineages at a time, but three, four or five, any number. 357 00:32:11,180 --> 00:32:14,600 Okay. So here's the model, our first stab at a model. 358 00:32:15,410 --> 00:32:19,040 We're going to suppose the population is just spread out in space in the same way as Wright and Malécot did. 359 00:32:19,970 --> 00:32:24,340 But now reproduction isn't going to be based on individuals. It's going to be based on events. 360 00:32:24,380 --> 00:32:29,150 And this is the key insight.
So what we do is we throw events down. 361 00:32:29,300 --> 00:32:34,640 You could do this in discrete generations. Mathematically, it's convenient to do it in overlapping generations, as we say. 362 00:32:34,650 --> 00:32:43,410 So we throw down one event at a time. And a reproduction event is just going to affect a region which is determined by the nature of the event. 363 00:32:43,430 --> 00:32:46,129 So here it's just a disc, centre x and radius 364 00:32:46,130 --> 00:32:51,920 r. Now if the region I throw down is empty, I don't do anything because there's no population there to reproduce. 365 00:32:52,430 --> 00:32:58,760 But if it's not empty, first, among all the individuals living there, I choose a parent uniformly at random. 366 00:32:59,440 --> 00:33:05,810 The first thing to notice is if I'm living in a very crowded region, the odds of being chosen as a parent get to be very small. 367 00:33:06,410 --> 00:33:10,630 So my reproductive success, if I live in a very crowded region, is not big. 368 00:33:10,640 --> 00:33:15,410 It's very small. On the other hand, if I'm in a very sparsely populated region, I will be picked. 369 00:33:15,810 --> 00:33:21,950 Okay. And that's the key. That's what prevents that clumping that we saw from Felsenstein's observations. 370 00:33:22,520 --> 00:33:26,599 Okay, so I've chosen this individual. I'm going to kill a proportion of the population. 371 00:33:26,600 --> 00:33:29,630 In this example, I allowed the parent to die. I don't have to. 372 00:33:30,760 --> 00:33:35,469 And I replace them. So they are killed with some probability that I just specify, independently. 373 00:33:35,470 --> 00:33:39,970 So each of them flips a coin that comes up heads with probability u. And if it comes up heads, they die.
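A single reproduction event of this kind, including the replacement step that the talk turns to next, can be sketched as follows. This is only an illustration under stated assumptions: individuals live on a unit torus and carry a type label, and the number of offspring is taken to be Poisson with mean u times a target density times the area of the disc, which is one way to "roughly replenish" the population. The names `reproduction_event`, `density` and `torus_distance` are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's method; adequate for the modest means used in this sketch."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def torus_distance(a, b, side=1.0):
    """Distance between the (x, y) parts of a and b on a torus of the given side."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return math.hypot(min(dx, side - dx), min(dy, side - dy))


def reproduction_event(population, centre, radius, u, density, rng, side=1.0):
    """Apply one event to a list of (x, y, genetic_type) individuals."""
    inside = [ind for ind in population if torus_distance(ind, centre, side) <= radius]
    if not inside:
        return list(population)          # empty region: nothing happens
    parent = rng.choice(inside)          # parent chosen uniformly among residents
    # Each resident of the disc dies independently with probability u.
    survivors = [ind for ind in population
                 if torus_distance(ind, centre, side) > radius or rng.random() >= u]
    # Offspring of the parent's type, scattered uniformly over the disc.
    for _ in range(poisson(rng, u * density * math.pi * radius ** 2)):
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        survivors.append(((centre[0] + r * math.cos(theta)) % side,
                          (centre[1] + r * math.sin(theta)) % side,
                          parent[2]))
    return survivors
```

Because the parent is drawn uniformly from whoever happens to be in the disc, an individual in a crowded disc is rarely chosen, which is the density regulation the talk identifies as the key point.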
374 00:33:41,070 --> 00:33:46,790 And then I replace them with offspring, and they're scattered in the same way as the parental population is scattered, 375 00:33:46,800 --> 00:33:52,740 just by picking points uniformly at random in the region, and the distribution of the number of offspring, 376 00:33:52,890 --> 00:33:55,980 it's random, but it's chosen to roughly replenish the population. 377 00:33:55,980 --> 00:33:59,960 So the population density should be, roughly speaking, constant. Okay. 378 00:33:59,970 --> 00:34:04,140 So and then we remove the dead individuals, and that's my new population. 379 00:34:05,100 --> 00:34:09,300 Okay. So how does this work as a model? It's obviously very crude. It's a little bit like a Wright-Fisher model. 380 00:34:09,930 --> 00:34:13,140 Does it work at all? Well, it overcomes the pain in the torus. 381 00:34:13,170 --> 00:34:17,730 It does have a nice stationary distribution with populations distributed uniformly across space. 382 00:34:18,710 --> 00:34:22,010 It allows us to incorporate large scale extinction recolonisation events very 383 00:34:22,010 --> 00:34:26,180 easily and it also is easy to extend to include things like natural selection. 384 00:34:26,180 --> 00:34:30,169 So for example, I might select the parent not just uniformly among those in the region, 385 00:34:30,170 --> 00:34:33,790 but according to their genetic type, weighted by their genetic type. 386 00:34:33,800 --> 00:34:37,430 Or I might choose individuals to die according to their genetic type. 387 00:34:38,030 --> 00:34:44,719 So it's very easy to adapt it to natural selection and we can write down the distribution of those genealogical trees, 388 00:34:44,720 --> 00:34:47,840 the things that the geneticist is trying to infer. The problem is, 389 00:34:49,450 --> 00:34:56,139 it's a bit of a mess. So the expressions we write down are extremely complicated, but it's only a metaphor.
390 00:34:56,140 --> 00:35:03,790 For mathematical reasons, at least if the population is relatively dense, that mathematical mess can all be approximated by a single model. 391 00:35:04,600 --> 00:35:09,610 And the way that we think about that single model is that what we're going to 392 00:35:09,610 --> 00:35:12,910 use as an approximating model is a model for sampling probabilities. 393 00:35:12,920 --> 00:35:19,120 So my model answers the question: if I were to sample an individual from the point x at time t, 394 00:35:19,870 --> 00:35:24,640 what is the probability that it is of type A, whatever type A might be? 395 00:35:25,120 --> 00:35:30,139 So that's the question that our model would let us answer. And to explain how it works, 396 00:35:30,140 --> 00:35:35,720 it's convenient just to forget space for a moment, because we're just going to adapt a non-spatial model to a spatial model. 397 00:35:36,350 --> 00:35:41,420 So here's how it's going to work. Reproduction is, again, going to be based on events exactly as it was before. 398 00:35:41,750 --> 00:35:44,899 So events are now specified by a time, 399 00:35:44,900 --> 00:35:47,480 that's the time when the event happens, and an impact, 400 00:35:47,660 --> 00:35:51,770 this u; that's the proportion of the population that's going to be affected by the reproduction event. 401 00:35:52,560 --> 00:35:57,230 And let's have a look at this event and see what it does to our population. So immediately before the event, 402 00:35:57,680 --> 00:35:59,329 this is how the different types are distributed. 403 00:35:59,330 --> 00:36:03,830 I've just used three different colours to represent three different types, and I've got to select a parent. 404 00:36:03,980 --> 00:36:08,670 So I'm going to select my parent uniformly at random from the population. So I just threw a point at random 405 00:36:08,670 --> 00:36:12,140 on [0, 1] here. And it happened to land here in this cyan region.
406 00:36:12,680 --> 00:36:19,520 So the type of the parent is going to be cyan. Now a proportion u of my population is going to be killed. 407 00:36:20,030 --> 00:36:24,859 The remaining one minus u survive. So this band here is one minus u times the width of that one. 408 00:36:24,860 --> 00:36:28,700 That's one minus u times that one. And this bit is one minus u times that one. 409 00:36:29,650 --> 00:36:33,910 And I replace everyone I killed with offspring of this chosen type. 410 00:36:34,570 --> 00:36:42,090 And so in this example, they're all cyan. And the nice thing about this model is it's very easy to write down how the ancestral lineages behave. 411 00:36:42,870 --> 00:36:49,410 So here I've taken a sample from my population. The sample is of size five and I'm wondering how it's going to behave. 412 00:36:49,420 --> 00:36:52,110 So what's going to happen as I trace backwards through this event? 413 00:36:52,980 --> 00:36:58,530 Well, as it happened, these two guys fell in the region of the population that survived the event. 414 00:36:58,860 --> 00:37:02,250 And so they just survive. Nothing happens to them. They're still distinct lineages 415 00:37:02,430 --> 00:37:08,830 in the previous generation. But these three all fell in the portion of the population corresponding to offspring. 416 00:37:09,490 --> 00:37:11,220 And so we know they had a common parent. 417 00:37:11,230 --> 00:37:17,710 So here we have an example of not a pairwise merger, but a three-way merger, and they merge into this common ancestral lineage. 418 00:37:18,010 --> 00:37:23,260 And it's very, very easy to write down mathematical expressions for the probabilities of events like this. 419 00:37:24,480 --> 00:37:29,040 Okay. So the idea of our approximation to the model 420 00:37:29,070 --> 00:37:35,190 I wrote down just now, and it can be obtained genuinely as the limit of that model, is that we do the same thing in space.
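The non-spatial event just described can be sketched in a few lines, forwards and backwards in time. This is a minimal illustration, assuming type frequencies are held in a dictionary and ancestral lineages are held as sets of sample labels; both representations and both function names are hypothetical.

```python
import random


def forward_event(freqs: dict, u: float, rng: random.Random) -> dict:
    """Forwards in time: pick a parent type in proportion to its frequency,
    shrink every frequency by (1 - u), and give the freed mass u to that type."""
    types = list(freqs)
    parent = rng.choices(types, weights=[freqs[t] for t in types])[0]
    updated = {t: (1.0 - u) * f for t, f in freqs.items()}
    updated[parent] += u
    return updated


def backward_event(lineages: list, u: float, rng: random.Random) -> list:
    """Backwards in time: each lineage is an offspring of the event with
    probability u, independently; all offspring lineages merge into one."""
    untouched, merged = [], set()
    for block in lineages:
        if rng.random() < u:
            merged |= block
        else:
            untouched.append(block)
    if merged:
        untouched.append(merged)
    return untouched


if __name__ == "__main__":
    rng = random.Random(3)
    print(forward_event({"cyan": 0.3, "magenta": 0.5, "yellow": 0.2}, 0.4, rng))
    print(backward_event([{1}, {2}, {3}, {4}, {5}], 0.4, rng))
```

A sample of five lineages can lose any number of them to a single event, so three-way and larger mergers arise naturally, in contrast to the purely pairwise mergers of the Kingman coalescent.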
421 00:37:35,880 --> 00:37:42,990 So now we're not just specifying the distribution of types in a single region as we were in the non-spatial example just now. 422 00:37:43,380 --> 00:37:51,810 But for each point z and each time t, I'm telling you the distribution of the type of an individual sampled from the population at z at time t. 423 00:37:51,990 --> 00:37:56,730 So if I sample an individual at time t from this point, what's the probability it's type A? 424 00:37:56,760 --> 00:38:00,659 That's the question I'm answering. Okay. And it's much the same as what we did before. 425 00:38:00,660 --> 00:38:04,770 Reproduction events affect bounded regions. Those regions are now never empty. 426 00:38:04,770 --> 00:38:08,499 So I don't need to worry about empty space. I've got to sample a parent. 427 00:38:08,500 --> 00:38:12,000 So first I'm going to sample a location for the parent. So this point z was just uniform. 428 00:38:13,050 --> 00:38:18,360 Then I choose a type according to the distribution there, and it came out to be red, much as it did on the previous slide. 429 00:38:19,410 --> 00:38:27,160 And then I update for everybody in this region. I kill a proportion u of individuals and I replace them by offspring of this chosen type. 430 00:38:27,660 --> 00:38:34,229 And this is the fancy mathematical way of writing it. But all it's saying is that everywhere in this region I just slice off 431 00:38:34,230 --> 00:38:37,800 a proportion u of the population and replace it by individuals of this type. 432 00:38:39,660 --> 00:38:43,940 I can write down a backwards in time model that corresponds to that, that tells me about these genealogical trees. 433 00:38:44,750 --> 00:38:51,230 Because if I'm just a single individual in a sample, I want to know how the ancestry of the individual in my sample evolves backwards in time.
434 00:38:51,770 --> 00:38:56,960 I wait until the first time my individual is in a region that is affected by one of these reproduction events, 435 00:38:57,680 --> 00:39:00,890 and then it's got a probability u of being an offspring of the event. Right. 436 00:39:01,040 --> 00:39:04,820 And if it is, then it's going to have to jump to the location of the parent. 437 00:39:05,030 --> 00:39:08,930 If it's not, nothing happens. It just keeps going. And what's the location of the parent? 438 00:39:08,960 --> 00:39:10,700 Well, it was just sampled uniformly from the ball. 439 00:39:10,700 --> 00:39:16,250 So it's very easy mathematically to write down expressions for the way these ancestral lineages move around in time. 440 00:39:17,200 --> 00:39:23,860 And if a region happens to cover a whole collection of ancestral lineages, as it did here, the idea is these green guys are ancestral lineages. 441 00:39:24,310 --> 00:39:29,629 Lineages outside the region can't be affected, but inside, each lineage flips 442 00:39:29,630 --> 00:39:33,410 a coin that comes up heads with probability u. If it comes up heads, 443 00:39:33,740 --> 00:39:40,010 it was an offspring of the event, and these three offspring must all be descended from the common parent. 444 00:39:40,220 --> 00:39:42,800 And the location of the common parent was uniform across the ball. 445 00:39:43,130 --> 00:39:51,710 So lineages coalesce, or can coalesce, when they're within the same region that's affected by an event. 446 00:39:52,130 --> 00:39:55,940 So that gives us a backwards and forwards in time model. Okay. 447 00:39:55,940 --> 00:39:59,480 So it looks pretty crude and you probably think it's not going to have anything to do with data. 448 00:40:00,770 --> 00:40:04,340 But remember that Kingman worked over very, very large scales, 449 00:40:05,350 --> 00:40:09,670 even though the Kingman coalescent was based on this very crude Wright-Fisher model.
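The backwards-in-time rule for lineages in space can be sketched as below. This is an illustration only, assuming lineages live in the plane and are stored as a position together with the set of sample labels they are ancestral to; the function name and data layout are hypothetical.

```python
import math
import random


def backward_spatial_event(lineages, centre, radius, u, rng):
    """Trace ancestral lineages back through one reproduction event.

    lineages -- list of ((x, y), frozenset_of_sample_labels)
    Lineages outside the disc are unaffected. Each lineage inside is an
    offspring with probability u; all offspring merge and jump to a parent
    location drawn uniformly from the disc.
    """
    untouched, offspring = [], set()
    for (x, y), block in lineages:
        inside = math.hypot(x - centre[0], y - centre[1]) <= radius
        if inside and rng.random() < u:
            offspring |= block
        else:
            untouched.append(((x, y), block))
    if offspring:
        r = radius * math.sqrt(rng.random())      # uniform over the disc
        theta = 2.0 * math.pi * rng.random()
        parent_location = (centre[0] + r * math.cos(theta),
                           centre[1] + r * math.sin(theta))
        untouched.append((parent_location, frozenset(offspring)))
    return untouched
```

With a single lineage this reduces to the jump rule for one ancestor; with several lineages in the same disc it produces the multiple mergers described above.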
450 00:40:10,180 --> 00:40:12,819 So is it the case that if we look over kind of intermediate scales, 451 00:40:12,820 --> 00:40:18,820 somewhere between the scale on which Kingman works and very local scales on which this model is clearly rubbish, it might work? 452 00:40:19,690 --> 00:40:22,990 And when you look at allele frequencies, you think, Oh, no way. 453 00:40:23,200 --> 00:40:29,019 So this is just what a pattern of allele frequencies might look like after we've thrown down 50 odd events. 454 00:40:29,020 --> 00:40:32,650 And it just doesn't look realistic. But if I look over larger scales, maybe it will. 455 00:40:34,520 --> 00:40:37,910 So here's something to try to convince you that there might be something in it. 456 00:40:37,940 --> 00:40:42,589 So this is a horrible bacterium, actually, Pseudomonas aeruginosa. 457 00:40:42,590 --> 00:40:45,980 It's found in the lungs of cystic fibrosis patients, amongst other things. 458 00:40:46,760 --> 00:40:50,750 And this picture is from Kevin Foster's lab in the zoology department in Oxford. 459 00:40:51,260 --> 00:40:52,760 And I find this very beautiful. 460 00:40:52,760 --> 00:41:00,980 He has an incredibly high resolution microscope, and you can just about pick out, I hope, these individual bacteria. 461 00:41:01,950 --> 00:41:05,670 It's just an extraordinary picture, but we're not trying to model this. 462 00:41:05,700 --> 00:41:09,810 It's obvious that what's going on is this is the edge of a bacterial colony as it's evolving. 463 00:41:10,260 --> 00:41:13,290 So this is just a snapshot of it. This is empty space and these are the bacteria. 464 00:41:13,980 --> 00:41:20,160 And obviously, what happens next depends on the particular configuration of these bacteria and their exact rod shape. 465 00:41:21,370 --> 00:41:26,530 But that's not what we're trying to capture. Let's zoom out a bit and look at that bacterial colony on a slightly larger scale.
466 00:41:27,130 --> 00:41:31,120 So this is more like visible scales. And what we see is emerging structure. 467 00:41:31,960 --> 00:41:40,090 So what Kevin's done here is he's taken two populations of the bacterium, one blue and one green, but they are equally fit. 468 00:41:40,690 --> 00:41:44,139 At the beginning of the experiment, he's just mixed them all up and he's 469 00:41:44,140 --> 00:41:48,460 put a little droplet onto his nutrient plate and you can sort of still see the vestiges of it there. 470 00:41:48,820 --> 00:41:49,750 And he's allowed it to grow. 471 00:41:50,560 --> 00:41:58,090 And as it grows, we see these sectors developing and these sectors are a kind of proxy for relatedness amongst individuals in the population. 472 00:41:58,870 --> 00:42:05,980 So, for example, all these individuals in this blue sector will descend from some common ancestral bacterium back here somewhere. 473 00:42:06,510 --> 00:42:11,870 Okay. So what if we do the same thing for our model? I saw two basset. 474 00:42:13,030 --> 00:42:16,990 We're obviously picking out the same basic structure, the same sectoring. 475 00:42:17,020 --> 00:42:22,590 And again, this is a simulation I owe to Jerome Kelleher. So we've done exactly the same thing. 476 00:42:22,600 --> 00:42:26,530 We've put a mixture down at time zero. We've allowed it to evolve. Fortuitously, 477 00:42:26,530 --> 00:42:34,899 we've chosen the same colours as Kevin, and you see sectors of much the same sort of pattern developing, and these sectors are ubiquitous. 478 00:42:34,900 --> 00:42:41,440 It's not just Pseudomonas that does it. This is yeast. This is from a paper of Oskar Hallatschek and his co-workers from 2007. 479 00:42:42,130 --> 00:42:47,870 By changing one parameter in our model, or the ratio of two parameters, we can reproduce pictures like this.
480 00:42:47,870 --> 00:42:51,459 And I'm sorry, I forgot to email Jerome and ask him for one so I don't have one to show you, 481 00:42:51,460 --> 00:42:58,450 but I assure you that we can also reproduce these narrower sectors that are characteristic of yeast. 482 00:42:58,930 --> 00:43:04,430 Okay, so with that sort of reassurance, we decided to press on and try and understand a bit more about spatial structure. 483 00:43:04,430 --> 00:43:08,740 And I wanted to tell you about some very recent work on things called hybrid zones. 484 00:43:08,830 --> 00:43:11,020 So now I have to tell you some biology and show you pretty pictures again. 485 00:43:11,920 --> 00:43:17,030 So as Nick Barton's got older, well, 20 years ago he could just about catch up with Podisma 486 00:43:17,080 --> 00:43:23,260 pedestris, which crawls very slowly around the Maritime Alps. He's realised there are things called plants and they hardly move at all. 487 00:43:24,190 --> 00:43:28,510 So this is again Nick, who's a good friend, so I can tell you things about him. 488 00:43:28,900 --> 00:43:32,770 This is Antirrhinum, and these Antirrhinum live in the Pyrenees. 489 00:43:33,400 --> 00:43:40,570 Obviously they don't live in some sort of damp field in the West Midlands, they live in the Pyrenees, and they exhibit what's called a hybrid zone. 490 00:43:40,690 --> 00:43:48,880 So a hybrid zone is when you get two genetically distinct populations coming together and at the interface between them, 491 00:43:49,120 --> 00:43:52,450 they're sufficiently similar genetically that they can reproduce and hybridise. 492 00:43:52,870 --> 00:43:57,640 But the hybrids are not as fit as the pure populations were. 493 00:43:58,180 --> 00:44:04,120 And these are ubiquitous.
And if you think about plant populations at the last glacial maximum that we talked about before, 494 00:44:04,750 --> 00:44:08,559 a lot of plants were sort of pushed back into refugia. After the ice age, 495 00:44:08,560 --> 00:44:10,870 they started to expand again and when they came back together, 496 00:44:11,140 --> 00:44:17,830 they were sufficiently genetically distinct that you could distinguish them and that when they interbred with one another, the hybrids were less fit. 497 00:44:18,190 --> 00:44:25,210 And so in this particular hybrid zone, on one side of the zone the Antirrhinum are yellow, and on the other side they are some pinkish-purple colour. 498 00:44:26,050 --> 00:44:26,300 Okay. 499 00:44:26,710 --> 00:44:32,980 And hybrid zones are maintained by a balance between the desire of the plants to spread their offspring out and this selection against the hybrids. 500 00:44:34,090 --> 00:44:37,360 I said they're ubiquitous. Here are a couple of textbook examples. 501 00:44:37,720 --> 00:44:43,570 So the one you see in every textbook is this one up here. This is mice, with Mus musculus and Mus domesticus. 502 00:44:44,080 --> 00:44:48,580 So in the north-east, mice take this form, if you catch one in your larder. 503 00:44:49,270 --> 00:44:54,159 And down here we have the Mus domesticus, and you can probably see much better than I 504 00:44:54,160 --> 00:45:00,100 can that there is a narrow hybrid zone in this colour between the two populations. 505 00:45:01,090 --> 00:45:06,850 Here's another one. I like this one. This is the fire-bellied toad against the yellow-bellied toad. 506 00:45:07,450 --> 00:45:10,760 And they have a really wacky hybrid zone. So you can see the hybrid zone here. 507 00:45:10,780 --> 00:45:13,560 It goes all the way through Europe. 508 00:45:15,470 --> 00:45:21,200 And what this picture does, what this slide's done, is it's focussed on this little bit of this hybrid zone where a lot of experiments have been done.
509 00:45:21,200 --> 00:45:27,169 And the hybrid zone there is almost flat. It's about 20 kilometres wide and they've plotted allele frequency. 510 00:45:27,170 --> 00:45:32,750 So the frequency of the genetic type that gives you this toad versus the frequency of the genetic type that gives you this toad. 511 00:45:33,260 --> 00:45:37,570 And these are data points that they've plotted. Now in a region where... 512 00:45:37,990 --> 00:45:41,380 Well, okay, if you don't believe in genetic drift. 513 00:45:41,920 --> 00:45:48,219 I do believe in genetic drift. But if you don't believe in genetic drift, then you can model these hybrid zones using this differential equation. 514 00:45:48,220 --> 00:45:51,100 It's a special case of what's called the Allen-Cahn equation. 515 00:45:52,520 --> 00:45:59,480 We model it with this plus noise in some sense, so plus some genetic drift term, but it's easier to write it down in this example. 516 00:46:00,660 --> 00:46:08,700 And what that predicts is that actually across the hybrid zone, this should be a curve, and the curve would look a bit like one plus tanh over two. 517 00:46:09,060 --> 00:46:14,520 And those of you who can remember what one plus hyperbolic tangent over two looks like will think, Gosh, actually, that's not bad. 518 00:46:15,490 --> 00:46:19,270 I mean, that's the right sort of shape, but it predicts a relatively stable hybrid zone. 519 00:46:19,810 --> 00:46:26,170 But the question we set out to ask was, well, okay, that's what it looks like now, but how is this hybrid zone itself going to evolve with time? 520 00:46:27,150 --> 00:46:32,850 If you zoom out, I mean, this one's 20 kilometres wide. You don't have to be very far away before it looks like a sharp interface.
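[Editorial sketch] The slide with the equation is not reproduced in this transcript, so the following is only an illustration under stated assumptions. It assumes the symmetric, noise-free equation has the one-dimensional form u_t = u_xx + s·u(1−u)(2u−1), where u is an allele frequency and s is a hypothetical selection strength. Under that assumed scaling, the "one plus tanh over two" profile becomes u(x) = (1 + tanh(√s·x/2))/2, and the code checks numerically that it is a stationary solution.

```python
import math


def profile(x, s):
    """Candidate stationary cline u(x) = (1 + tanh(sqrt(s) * x / 2)) / 2.

    The scaling inside tanh matches the ASSUMED equation
    u_t = u_xx + s*u*(1-u)*(2u-1); it is not taken from the talk's slide.
    """
    return 0.5 * (1.0 + math.tanh(0.5 * math.sqrt(s) * x))


def residual(x, s, h=1e-3):
    """u_xx + s*u*(1-u)*(2u-1) at x, with u_xx from a central difference."""
    u = profile(x, s)
    u_xx = (profile(x + h, s) - 2.0 * u + profile(x - h, s)) / (h * h)
    return u_xx + s * u * (1.0 - u) * (2.0 * u - 1.0)


def max_residual(s, xs):
    """Largest absolute residual over the sample points xs."""
    return max(abs(residual(x, s)) for x in xs)


if __name__ == "__main__":
    xs = [i * 0.1 for i in range(-50, 51)]
    # Should be tiny (finite-difference error only) if the profile is stationary.
    print("max |residual| for s = 1:", max_residual(1.0, xs))
    # The profile climbs from 0 to 1 and passes through 1/2 at the centre.
    print("u(-10), u(0), u(10):", profile(-10, 1.0), profile(0, 1.0), profile(10, 1.0))
```

The width of the cline scales like 1/√s in this parametrisation, which is why stronger selection against hybrids would give a narrower zone.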
521 00:46:33,880 --> 00:46:38,320 And so with Nick Freeman, who's now a lecturer in Sheffield, and Sarah Penington, 522 00:46:38,320 --> 00:46:43,780 who is about to take up a research fellowship in mathematics here in the Institute and at New College, 523 00:46:44,800 --> 00:46:51,790 we have shown that, at least if we start from sufficiently regular initial conditions, to make myself mathematically honest, as we zoom out, 524 00:46:52,330 --> 00:46:58,360 the hybrid zone becomes sharp, and that's whether we use this deterministic differential equation that the guys use, 525 00:46:58,360 --> 00:47:03,160 and they already knew this result for their deterministic equation, or whether we also add noise. 526 00:47:03,170 --> 00:47:05,710 So let's understand what happens as we zoom out. 527 00:47:06,100 --> 00:47:11,200 So as we zoom out, the hybrid zone becomes sharp; that's sort of clear because it was only 20 kilometres wide in the first place. 528 00:47:11,590 --> 00:47:16,570 But how does it move? It evolves according to something called curvature flow. And to understand curvature flow: 529 00:47:17,700 --> 00:47:23,390 here is a hybrid zone, a putative hybrid zone. Roughly speaking, the curvature at this point: 530 00:47:23,750 --> 00:47:28,309 you take the biggest circle that you can that just fits here, doesn't cross over, 531 00:47:28,310 --> 00:47:32,450 it just fits, and the curvature is one over the radius of that circle. 532 00:47:33,650 --> 00:47:38,150 And here the circle's on the other side and the curvature is one over the radius of this circle. 533 00:47:38,420 --> 00:47:41,390 So the curvature here is less than the curvature there. 534 00:47:41,630 --> 00:47:49,310 And it also has the opposite sign, and curvature flow will push this point inwards and this point outwards. And to see it in action: 535 00:47:50,420 --> 00:47:58,580 Matt Dunlop is a student in the University of Warwick and he very kindly produced an illustration of curvature flow for me.
536 00:47:59,000 --> 00:48:01,400 Now let me just stop it so you can see the initial condition. 537 00:48:01,760 --> 00:48:13,069 I did not choose the name of this file. So you can see the Batman symbol very quickly degenerates into a sausage, and up here where it's almost flat, 538 00:48:13,070 --> 00:48:16,190 you know, nothing much happens. And these ends are pushing in and pushing in. 539 00:48:18,550 --> 00:48:21,220 And then there's still not much happening. 540 00:48:21,460 --> 00:48:27,460 But what we're going to see is that eventually there's not going to be any flat bit left because these ends have pushed in so far. 541 00:48:27,880 --> 00:48:31,540 And this will become circular. In fact, any convex region would eventually become circular. 542 00:48:32,170 --> 00:48:38,320 And as this shrinks, it's going to go faster and faster and faster, because as the circle gets smaller, the curvature is getting bigger, 543 00:48:38,500 --> 00:48:41,740 so the curvature flow goes faster, and it's gone. Okay. 544 00:48:42,670 --> 00:48:46,020 So that's what mean curvature flow does for you. So. 545 00:48:47,930 --> 00:48:53,180 What's going to happen to our hybrid zones? So now what's going to happen if we put some noise in? 546 00:48:53,180 --> 00:48:56,630 What's going to happen if we use our spatial Lambda-Fleming-Viot model and put noise in? 547 00:48:58,480 --> 00:49:04,080 Well. This is a video that Nick Freeman did. 548 00:49:05,430 --> 00:49:13,010 So you can see again, he's started with something a little bit like a Batman symbol, but he didn't have the imagination to do that. 549 00:49:13,020 --> 00:49:16,980 And you can see, again, it's pushing out to be sort of sausage shaped, but there's a bit of noise here. 550 00:49:17,190 --> 00:49:20,430 But you could imagine if you looked at this from far enough away and had bad enough eyesight. 551 00:49:20,550 --> 00:49:28,680 So for me, this looks a lot like curvature flow.
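[Editorial sketch] The remark that a shrinking circle goes "faster and faster" and then vanishes can be made concrete for the deterministic flow. Assuming the normalisation in which each point of the interface moves inwards at a speed equal to its curvature (the talk does not state the constant, so this is an assumption), a circle of radius R obeys dR/dt = −1/R, which gives R(t) = √(R₀² − 2t) and an extinction time of R₀²/2. All function names below are hypothetical.

```python
import math


def shrink_speed(radius):
    """Inward speed of a circle under curvature flow: curvature = 1 / radius."""
    return 1.0 / radius


def exact_radius(r0, t):
    """Closed-form solution of dR/dt = -1/R with R(0) = r0 (zero after extinction)."""
    return math.sqrt(max(r0 * r0 - 2.0 * t, 0.0))


def extinction_time(r0):
    """Time at which the circle has shrunk to a point."""
    return 0.5 * r0 * r0


def simulate_radius(r0, t_end, dt=1e-4):
    """Forward-Euler integration of dR/dt = -1/R up to time t_end."""
    steps = int(round(t_end / dt))
    r = r0
    for _ in range(steps):
        if r <= 0.0:
            return 0.0  # the circle has already disappeared
        r -= dt * shrink_speed(r)
    return max(r, 0.0)


if __name__ == "__main__":
    r0 = 1.0
    for t in (0.0, 0.2, 0.4, 0.45):
        print(f"t={t:.2f}  Euler R={simulate_radius(r0, t):.4f}  exact R={exact_radius(r0, t):.4f}")
    print("extinction time:", extinction_time(r0))
```

Because the speed is 1/R, halving the radius doubles the speed, so most of the shrinking happens in the final moments before the circle disappears.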
I'm going to speed it up a bit 'cause we're getting to the end. 552 00:49:29,130 --> 00:49:33,660 So let's zoom along a bit. And you see it really is doing what curvature flow 553 00:49:33,660 --> 00:49:39,480 did for the Batman symbol. It's gone almost round and 554 00:49:40,520 --> 00:49:45,730 if I keep going, it gets smaller. Okay. Now, Jerome did another one for me. 555 00:49:46,210 --> 00:49:49,570 Jerome did one that looks good for the Antirrhinum. 556 00:49:50,930 --> 00:49:52,010 So it's even the right colours. 557 00:49:52,640 --> 00:50:01,160 And up here he's taking something which is nice and wiggly, so that we can see that these bits with bigger curvature are going to disappear. 558 00:50:01,520 --> 00:50:04,070 This is almost flat and it stays almost flat. 559 00:50:04,460 --> 00:50:09,380 So what this zone is trying to do, and it's more like the shape of the zones that we see in other natural populations, 560 00:50:10,670 --> 00:50:15,840 is it's trying to get to be a straight line. Okay. 561 00:50:18,260 --> 00:50:23,169 So we can expect that in natural populations, approximately, 562 00:50:23,170 --> 00:50:27,610 at least if we look over large scales, things are going to evolve according to this curvature flow. 563 00:50:28,060 --> 00:50:32,780 It's probably the slowest example known of curvature flow, but it's rather cute. 564 00:50:32,800 --> 00:50:35,980 I think it's rather nice that we can see how these things will move. 565 00:50:36,010 --> 00:50:42,150 We have not tested it yet against data because there simply isn't going to be enough. 566 00:50:42,160 --> 00:50:45,630 It's going to move very, very slowly. So. 567 00:50:46,650 --> 00:50:53,610 We've assumed that for those two populations that have come back together, at the interface the hybrids are not as fit as the purebreds. 568 00:50:53,610 --> 00:51:01,230 But we've assumed that the pure populations are equally fit.
In fact, very often you can expect that the pure populations will not be equally fit. 569 00:51:01,980 --> 00:51:06,629 And if the populations are not equally fit... So I'm just showing off now, writing down random differential equations. 570 00:51:06,630 --> 00:51:14,040 Of course, in the equation we wrote down before, the Allen-Cahn equation, this a here was equal to one, and that made this symmetric about a half. 571 00:51:14,340 --> 00:51:21,930 So that said, the frequency of different alleles in our population was symmetric about a half. 572 00:51:22,320 --> 00:51:28,380 And when it's bigger than a half, this term pushes me towards one, and when it's less than a half, this term pushes me towards zero. 573 00:51:28,860 --> 00:51:34,020 And so this gives me the competition between dispersion, dispersal and selection. 574 00:51:34,710 --> 00:51:38,340 If a is not equal to one, that's what happens when the populations are not equally fit. 575 00:51:38,640 --> 00:51:43,530 Then we get something called the Cahn-Hilliard equation, which I can expand out as having a symmetric term. 576 00:51:43,860 --> 00:51:50,790 And this term here, which is no longer pushing me towards zero and one. 577 00:51:53,120 --> 00:51:58,280 And this we can expect to model that situation where things are not equally fit, and in fact, we add noise to them. 578 00:51:59,030 --> 00:52:05,749 And what happens in that situation is that the fitter type is going to spread in a travelling wave and it's happening much faster than curvature flow. 579 00:52:05,750 --> 00:52:11,750 It's going to spread across the range of the species as a travelling wave on a much faster timescale. 580 00:52:12,380 --> 00:52:13,970 Now why am I showing you this? 581 00:52:13,970 --> 00:52:22,030 Actually it's interesting, because for any PDE people, it is interesting that if this term were not here, then we'd get just pure selection.
582 00:52:22,040 --> 00:52:28,489 So for a haploid population, as we call it, so for populations where there's only one copy of each gene in each individual, 583 00:52:28,490 --> 00:52:30,890 and we're just saying that one type is fitter than the other type, 584 00:52:31,340 --> 00:52:35,899 this term wouldn't be here and this travelling wave would still exist, but it would spread at a rate of 585 00:52:35,900 --> 00:52:42,740 roughly the square root of s times a minus one. For these populations where we've also got the selection against hybridisation, 586 00:52:44,420 --> 00:52:47,989 the travelling wave will travel at a speed proportional to s times a minus one, 587 00:52:47,990 --> 00:52:53,180 so a completely different speed, which is because we have a pushed wave instead of a pulled wave. 588 00:52:53,420 --> 00:52:56,720 So that was just proving I've read some mathematics; that was for Alan's benefit. 589 00:52:57,600 --> 00:53:00,680 So why do mathematicians like equations like this? 590 00:53:00,690 --> 00:53:06,150 We like equations like this and we like models like this because we recognise them as having come up elsewhere. 591 00:53:06,510 --> 00:53:14,170 So this we've come across for biological reasons. It's actually an equation which has been studied extensively, especially in physics. 592 00:53:14,590 --> 00:53:18,460 And we like that kind of universality where particular models arise in lots of different contexts. 593 00:53:18,790 --> 00:53:23,800 And the only difference here is that the form of the noise we take is really very different from the form of the noise that the physicists use. 594 00:53:24,640 --> 00:53:28,030 And the other thing that mathematicians like is they always like an excuse to mention coffee, 595 00:53:28,180 --> 00:53:31,690 especially an excuse to drink coffee, and even more so an excuse to spill coffee.
596 00:53:32,380 --> 00:53:39,940 And it turns out that this equation will model this travelling front, and the fluctuations in that travelling front should 597 00:53:39,940 --> 00:53:45,580 be roughly the same as the fluctuations in the travelling front when you spill your coffee all over your exam scripts. 598 00:53:46,180 --> 00:53:49,210 So on that note, I think I'll stop. Thank you very much.