Welcome back to the Oxford Mathematics Public Lectures Home Edition. My name is Alain Goriely and I'm in charge of external relations for the Mathematical Institute, as usual. Special thanks to our sponsor, XTX Markets, a market-leading, quantitatively driven electronic market maker with offices in London, Singapore and New York. Their ongoing support is crucial in providing you quality content.

It is a great pleasure for me to welcome today my colleague Jon Keating, the Sedleian Professor of Natural Philosophy at Oxford University. As you may know, the Sedleian Professorship is the oldest scientific chair in Oxford and is dedicated to applied mathematics. Jon was elected to this chair two years ago after an illustrious career at the University of Bristol. He has broad research interests in quantum physics, random matrix theory and its unlikely connection to number theory through the Riemann zeta function. His research is truly fascinating, as he draws connections between fields that are apparently disconnected, and I have always been impressed by Jon's originality and clarity of thought. Continuing in the same vein and exploring unlikely connections today, Jon will take us on a wild ride of extreme events and explain to us how the heights of unexplored mountains, gold medals at the Olympics, quantum physics, machine learning and prime numbers are all somewhat connected. So thank you very much, Jon, for doing this.
Please start now.

Well, let me begin by thanking Alain for the kind invitation to give this lecture. I'll be speaking about a collection of problems which superficially may seem to have no connection, but which, I shall argue, are linked by a mathematical thread. And it's that mathematical thread that will be the main focus of the talk, rather than any one of the individual problems that I shall speak about and that are connected by it.

I'd like to begin by inviting you to imagine that you're hiking in mountainous terrain, rather like the one shown in this photograph. On your hike, your goal may be to climb one of the peaks that you see shown here, perhaps to walk along a ridge from that peak to a neighbouring peak: down from the peak and then back up to a neighbouring peak. Or it may be to descend all the way down and find a dip where water has collected and a lake or a pool has formed, in which you can swim or where you can skim stones.

Mathematically speaking, you're visiting what we would call critical points in the height of the Earth's surface. So a critical point might be a local minimum: that's a point where a step in any direction takes you upwards, and that's the sort of place where a pool or lake may form as water collects. A different type of critical point would be a local maximum: that's a point where a step in any direction takes you downwards, and the peaks that you visit would be examples of those.
And finally, there are saddle points too: points which are a minimum in one direction and a maximum in the other direction. And you would encounter these on the ridge between two peaks.

So your hike may take you to these various critical points. But perhaps your goal is more ambitious; perhaps you're more adventurous, and your intention is not to climb any old peak but in fact to climb the highest peak in the vicinity. And so you may recognise that to be Ben Nevis, which is shown there in the background. Much of this talk will be concerned with that kind of problem: the problem of identifying, of finding, the highest peak on some surface that looks like the one that I'm showing you here.

Now, in terms of mountains, you may think this is a little artificial. We know where the highest peaks are; we have maps that show us that, and we even have GPS. But many of the problems I shall be speaking about later are not of that kind. We don't have an idea in advance of where the highest peak is, and our job may be to find it, and maybe to understand how hard that is likely to be. And so I'm showing you this more for illustrative purposes, but I think it conveys the message. There's another message that I want to convey from this picture, which is that, looking at it, you may form the impression that the height of the Earth's surface is a rather irregular and random function of the position where you are.
So knowing your position doesn't necessarily mean that you can automatically deduce what the height of the Earth's surface will be. This terrain here is seemingly random, or irregular. In other words, knowing the height of the Earth's surface at one point in this terrain doesn't automatically tell you what the height of the Earth's surface will be some distance away, or in some other direction.

So that means that we want to model the height of the surface as being random. And there are many ways to do that. The simplest one would be to assume that the height of the Earth's surface in that photograph is described by a normal distribution, or a bell-shaped curve, as shown in this picture. So there is an average height, that's the centre of the distribution, the highest point in the graph there, and then the probability, or likelihood, of finding heights much greater than the mean decays rapidly as a function of height. And likewise, the probability of finding very low heights decays rapidly as you go away from the mean.

Now, I should say, you could criticise this as a model for heights in mountain ranges. That's not my point. And in fact, the examples I'll be showing you later are ones where we do believe that this is the right way to model the distribution of heights. But just for the moment, let's take this to be the distribution of heights in the mountain range that I showed you.
The questions, then, that you might ask, and these will be the ones that I shall be focussing on throughout the talk, are the following. In a situation where the terrain can be modelled by a random surface with a normal distribution of heights, a bell-shaped curve of heights, how effective can we expect methods to be for locating the highest maximum, that is, the top peak, or equivalently the lowest minimum? How high should we expect the highest maximum to be? Given a random surface, do you expect exceptionally large heights to appear? And how does that depend on the total number of peaks in the surface? And finally, to what extent do we expect these answers to depend on the dimension of the surface? The Earth's surface is two-dimensional. The examples I'll give you later, at least some of the most important ones, will concern surfaces that have a vastly higher number of dimensions: thousands, millions or billions. What do we expect random terrains to look like in very high-dimensional spaces?

So let me give you a few more examples, just to whet your appetite. Here you see the surface of the sea. And in fact, it turns out that the normal distribution is a very good model for the distribution of heights of the surface of the sea. And here, as it's painted, the surface is rather irregular, rather random.
So if you were sailing the boat shown in this picture, you might well wish to know how high is the highest wave you're likely to encounter, or how deep would be the deepest trough you might descend down into. In particular, how would that depend on the length of your voyage? As you encounter more and more waves, do you expect to find larger and larger highest waves? Or does the problem not depend too much on how many waves you're likely to encounter on your voyage?

Here's a second example, and this comes from quantum mechanics. Quantum mechanics is also a wave theory; in this case, it's a wave theory of how things move. So the system I want to consider is a point particle moving inside some domain and bouncing off the walls. Think of it as a billiard ball bouncing around inside a billiard table, but in this case the billiard table has a cardioid shape. What you see there is one trajectory of the billiard ball, and it's highly irregular: the motion inside the cardioid is chaotic. What you see on the right is a quantum wave function for this same problem. So this describes the quantum mechanics of the motion of a billiard ball inside a cardioid-shaped billiard table. You see the peaks of the wave. In fact, what's plotted here is the square of the wave function, so half of those peaks will actually be deep minima, reflected back upwards by the act of squaring the wave function.
But physically, that's the right thing to do, because the square of the wave function in this case gives you the probability of finding the particle in a given vicinity. And so you might well wish to know: how high is the highest peak of this wave function likely to be? Are there places where we might expect to find a vastly higher probability of finding the particle than in other places? How much does the height of the highest peak in this wave function depend on the total number of peaks that you see there? As we look at wave functions with more and more peaks, do we expect to find positions with increasingly high probability? And how does that depend on the number of peaks? Well, these are the sorts of questions I want to consider.

To start with, I want to consider a warm-up question, which has nothing to do with waves, nothing to do with mountainous terrain. It's a rather more elementary question, but I want to argue that it captures much of the spirit of the problems that I've discussed so far. So the question I want to discuss is: in the Olympics, should we expect the number of gold medals won by a country to be proportional to the relative size of that country's population?

I'm focussing on gold medals here because they do signify extreme ability. One gets a gold medal for running by being the very fastest person, for being an extreme of speed.
One gets a gold medal in the javelin for being the person who can throw the furthest. Teams get gold medals in synchronised swimming for being able to swim in the most synchronised way in the competition. So gold medals signify, or measure, extreme events. And the question is: should we expect the number of gold medals that a country wins to be proportional to the population of that country?

Well, this question was picked over in the press after the 2012 Olympics and the 2016 Olympics, and the following was the sort of analysis that you found in many, many articles throughout the press, at least the British press. Great Britain had a population of roughly 65 million and won 27 gold medals in the 2016 Olympics. The US had a population of roughly 320 million, about five times the population of Great Britain, and won 46 gold medals, so a little under twice the number. China had a population which was roughly 20 times that of Great Britain, and yet it won about the same number of gold medals. Japan had a population about twice that of Great Britain, yet won half the number of gold medals that Great Britain did, not twice the number. So in the articles where one found this statistical analysis carried out, the conclusion drawn was that Great Britain had done disproportionately well, and that the country had done better than should have been expected on the basis of the size of its population.
So the question I want to address is: is it true that the number of gold medals you expect to win is proportional to the size of the population? Going back to my questions earlier: is it true that the size of extremes, the highest peaks, the highest waves, is proportional to the total number of waves?

Here's another, more careful analysis of the Olympics. You find many of these on the web; I picked this one at random. Adjusting for population, these were the most successful countries at the Rio Olympics. So while the United States won more medals than any other country in the Rio Olympics, the article points out that its population was relatively high. And so this analysis took the number of medals won by the various countries and compared them with the population sizes, taken very carefully from two sources, so this is a very careful analysis, and the two sources were the United Nations and the CIA World Factbook, which I confess I didn't know existed until I did the research to find this website. It divided the number of medals by the size of the population, and it found that the country which did best was Grenada, which won only a single medal but which has a very small population. And so on the number of medals divided by the population, it came out top. The other countries that did well were the Bahamas, New Zealand and Jamaica. In fact, the US didn't do very well at all.
Here is the list, or at least the top of the list, that you find on this website. So Grenada did the best, and by some significant margin. And then the Bahamas, New Zealand, Jamaica, Denmark and Croatia did very well; then Slovenia, Azerbaijan, Georgia, Hungary, Bahrain, Lithuania. Great Britain comes down this list; it's about three quarters of the way down there. And so it didn't do spectacularly well according to this analysis, but it did much better than other big countries, certainly much better than the US, China, Japan, et cetera.

But is this a reasonable analysis? Is it the right way to level the playing field, to divide by the total size of the population? Well, there are very different answers you can give to this. Here are two of them: two opposing perspectives, both mathematical.

One mathematical perspective is that Great Britain did do disproportionately well, and that you should normalise the data in order to level the playing field between countries of different sizes by dividing by the population size. And here's how the argument goes. So my first answer is yes. Let's idealise: let's consider an Olympics involving one event and two countries, and let's say the populations of the two countries are A and B. One of the people in these combined countries has to win the gold medal.
And let's assume that there's no bias, that people in one country are not on average athletically more capable than people in the other country. Then the likelihood that the person who wins is in the first country is just A, its population, divided by the total number of people: if you pick somebody at random from the A plus B people in the combined populations, there's a probability of A over A plus B that you'll pick someone in the first population. And the probability that the winner, whoever she is, is in the second country is the fraction of the total combined population in that second country, which is B over A plus B. And so the ratio of the two probabilities is A over B. So this would suggest that indeed normalising by dividing by the population size is the right thing to do, the right way to level the playing field.

But I would argue that this isn't the most accurate model of how the Olympic Games works. You see, the Olympic Games doesn't involve competitions between all members of the populations of the various countries involved.
I've never been involved in the Olympic Games, for example. So I would argue that a more accurate model of how the Olympic Games works is that it's really a competition between the fastest person in the first country and the fastest person in the second, or the person who can throw the furthest in the first country against the person who can throw the furthest in the second country. So I would argue a better way to analyse this would be to ask: if you take the population of a country, how fast is the fastest person in that country likely to be? Or how far can the strongest person throw the javelin?

So if people's sporting abilities, speed, strength, stamina, have a normal distribution and are independent of each other, then out of a population of n people, how fast or strong is the fastest or strongest person likely to be? I would argue that this is a better model for the people who enter the Olympics and actually take part in the competition. So, mathematically speaking, we should take n numbers, drawn independently at random from the normal distribution. So we weight the numbers with the probability that's given by the bell-shaped curve, the normal distribution. And we'll take that distribution to have mean mu, that's the value at the centre of the distribution, and variance sigma squared; sigma is the width of the bell-shaped curve.
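This setup, drawing n numbers independently from a normal distribution and looking at the largest, is easy to experiment with numerically. Here is a minimal sketch (my own illustration, not from the lecture), using a standard normal distribution with mean 0 and width 1:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the experiment is repeatable

# Draw n values from a standard normal distribution (mean 0, width 1) and
# record the largest: the "fastest person" in a population of size n.
for n in (100, 10_000, 1_000_000):
    samples = rng.normal(size=n)
    print(n, samples.max())
```

Even though the population grows ten-thousand-fold between the first and last lines, the largest value only creeps up, which is exactly the slow growth in n that the lecture goes on to quantify.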
And the question then is: what's the distribution of the largest of these numbers? How large do you expect the largest one to be? And the answer turns out to be the following one. It's given by this equation, which I show you in blue. If you like equations, you'll be able to unpick this very quickly; but if equations are not quite your thing, let me unpick it for you. So the equation says the following. To compute what you expect to be the largest of these numbers, you take the mean, the centre point of the bell-shaped curve, and you add something on to that. You add something on because, of course, the largest is very likely to be larger than the mean. The amount you add on is proportional to the width of the bell-shaped curve. That's no surprise: if you take a normal distribution with a greater width, you expect to find more large values. And the factor multiplying the width is the square root of two times the logarithm of n, the number of samples we've taken from the distribution. Those who know their logarithms will recognise that this depends barely at all on n. The logarithm grows with n as n increases and increases, but it does so barely at all; it's one of the slowest-growing functions you could imagine. And we actually have here the square root of the logarithm, which grows even more slowly. Some of you may know that the paradigm of rapid growth is the exponential.
Well, the logarithm is the opposite of that: it's the paradigm of slowest growth, if you like. So this is the answer, or rather an approximation to the answer. And it's a beautifully simple formula that tells you how big the extremes are likely to be.

So this same formula tells us how fast the fastest person is likely to be. But it also tells us how high the highest wave is that we're likely to encounter in our boat. The height of the highest wave, if we assume a normal distribution of wave heights, is proportional to the square root of the log of the number of waves that we encounter. So taking a much longer journey doesn't dramatically affect the height of the highest wave you're likely to encounter; it barely affects it at all. Likewise in the quantum problem: the height of the highest quantum amplitude, the highest peak in the quantum wave, is proportional to the square root of the log of the total number of peaks that you saw in the picture earlier, so it's barely affected at all by the number of peaks.

In fact, if you analyse this a little more, you find that typically you expect to get lots and lots of numbers up to this maximum value, which is the largest, and then beyond it you get nothing. So it's not that you expect to see one outlier dramatically faster than other people, or one wave that's dramatically higher than the others. In fact, you expect to see lots of waves up to about the height of the maximum, and then nothing.
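As a quick numerical aside, using the approximate population figures quoted earlier in this lecture (roughly 65 million for Great Britain and about 20 times that for China), the square-root-of-the-logarithm comparison can be reproduced in a couple of lines; this is my own back-of-envelope check:

```python
import math

# Approximate populations quoted in the lecture (illustrative figures only).
uk = 65_000_000
china = 20 * uk  # "roughly 20 times that of Great Britain"

# The expected extreme grows like sqrt(2 * log n), so the predicted ratio of
# the two countries' largest values is:
ratio = math.sqrt(2 * math.log(china)) / math.sqrt(2 * math.log(uk))
print(round(ratio, 2))  # about 1.08: an 8% difference for a 20-fold population gap
```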
Lots of people who can run close to the speed of the fastest person, but then nothing. And that accords, of course, with your intuition, or your experience of the Olympics: there are lots of people who can run almost as fast as the fastest person. That's why races in the Olympics are exciting; you don't know exactly who's going to win. So you get lots of people close to the fastest, not one outlier who's dramatically faster than everybody else.

I should say, going back to the Olympics, that I'm not arguing that this is how you should model the Olympics; please don't use this as a basis for betting on the next Olympics. What this means is that the dependence on population is really rather small, and the dependence on other factors is far more important: for example, facilities, a country's gross domestic product, or the tradition of coaching in that country. These are far more important effects than the population size.

Let me put in the populations, just to illustrate how slowly the square root of the logarithm increases. If you put the population of China into the square root of the logarithm and divide by the square root of the logarithm of the population of Great Britain, you get an answer close to one. So barely any change between China and Great Britain in terms of the dependence on population size.

Now, here's a more accurate answer. This is a better formula, and therefore more complicated, but let me unpick it for you.
So this is actually a very accurate formula for the height of the largest value, the extreme, in a normal distribution. We have what we had before: the mean mu, the centre of the distribution. We have the term that we saw earlier, which increases extremely slowly with n; it's like the square root of log n. And then we subtract off a term which depends not just on the logarithm of n, but on the logarithm of the logarithm of n. And this term actually decreases as n increases. So we have a term that's about constant, a term that does increase, but extremely slowly, and a term that decreases, but extremely slowly. And then there are some small fluctuations, which I don't want to describe in this lecture; they're beyond the scope of what I want to discuss now. You don't need to remember this formula, or any formula in this talk, but there's one aspect I would like you to remember, and that's the number here shown in red: the number one. That I do wish you to remember, because we're going to come back to it a little later.

So we have this very accurate formula, and it describes very accurately the heights of the highest sea waves, the heights of the highest quantum waves, et cetera.

Now, you may argue that this analysis I've given you is too simple, because it ignores the fact that the abilities of people in sporting events aren't independent of each other. There are dependencies.
If we consider the heights of people, one's height is, to a good degree, dependent on the heights of one's parents. If you have two very tall parents, your height is likely to be taller than average. If you have two very athletic, fast parents, there's a greater likelihood that you too will be athletic and fast. How can we build this into a model?

Here is an idealisation of a family tree. It's very idealised, but it focuses on the essentials. We imagine at the top of the tree a matriarch; that's the first generation, so to speak. And we'll imagine, just for the sake of simplicity, that the matriarch has two offspring, each of those offspring has two offspring, each of their offspring has two offspring, et cetera. Now, the number two is not at all relevant here; it's just for illustrative purposes that I'm choosing it. Nor do we have to assume that everyone has the same number of offspring. Again, I'm just doing this for illustrative purposes, to keep the description simple.

Now, as you go from a parent to an offspring, let's imagine that the offspring acquires some characteristic, but doesn't acquire that characteristic perfectly: there's some variance in the degree to which they acquire it. So I'll assume that as you go from a parent to an offspring, you pick up a factor, an attribute, drawn from the normal distribution, the bell-shaped curve.
250 00:29:49,880 --> 00:30:02,650 So as you go down the generations, each person picks up from their parent an attribute which is drawn randomly from the bell-shaped curve. 251 00:30:02,650 --> 00:30:09,940 But let's imagine that your net attribute, your net ability, is the average of the 252 00:30:09,940 --> 00:30:16,820 attributes of all of your ancestors, going back to the matriarchal figure at the top. 253 00:30:16,820 --> 00:30:23,990 So you see, at the bottom there, there is a population of people, and at the bottom of the green line 254 00:30:23,990 --> 00:30:30,440 there is a person, and the attribute they collect is the average of the 255 00:30:30,440 --> 00:30:37,750 attributes acquired through all of the generations going back to the matriarch. 256 00:30:37,750 --> 00:30:45,160 They have a sibling, shown at the bottom of the brown line, and they too pick up attributes 257 00:30:45,160 --> 00:30:49,090 from all of the generations going back, and they'll differ from their sibling 258 00:30:49,090 --> 00:30:58,480 only in the one attribute they've acquired from their parent, which is drawn at random and independently from the bell-shaped curve. 259 00:30:58,480 --> 00:31:04,600 So this is a way of combining inheritance with randomness. 260 00:31:04,600 --> 00:31:10,810 And the question is, if you look at the population at the bottom, what's the distribution of attributes? 261 00:31:10,810 --> 00:31:19,990 Well, a beautiful fact about the normal distribution is that if you average lots of numbers drawn from the normal distribution, 262 00:31:19,990 --> 00:31:26,890 the answer you get still has a normal distribution. So for the people in that generation shown at the bottom there, 263 00:31:26,890 --> 00:31:35,230 the variation in attributes will be described by the normal distribution, by the bell-shaped curve, but they're no longer independent.
264 00:31:35,230 --> 00:31:44,890 Because, for example, the person at the bottom of the green line and their sibling have lots of ancestors in common, lots of attributes in common, 265 00:31:44,890 --> 00:31:49,720 so they will be more similar than people who are more distantly related, 266 00:31:49,720 --> 00:31:56,740 and the level of similarity will be determined by their last common ancestor. 267 00:31:56,740 --> 00:32:03,340 What, in this case, do we expect the highest of those attributes to be? 268 00:32:03,340 --> 00:32:11,500 If we have n people in that population, here's a formula for the largest of those n numbers. 269 00:32:11,500 --> 00:32:18,790 And this is a formula that people have been thinking about for the last 20 or 30 years. 270 00:32:18,790 --> 00:32:25,510 So this is a relatively recently discovered formula, and the formula looks remarkably like the one I showed you earlier. 271 00:32:25,510 --> 00:32:38,260 In fact, it's almost identical. So building in dependence via a family tree barely affects 272 00:32:38,260 --> 00:32:45,820 the answer in terms of the size of the maximum, the height of the extreme. 273 00:32:45,820 --> 00:32:55,040 The answer is almost identical. It differs only in one way. And that's that the thing that was a one before is now a three. 274 00:32:55,040 --> 00:33:01,610 But that's in a very small term. So this dependence is very small, but it is there. The dependence does matter, 275 00:33:01,610 --> 00:33:10,460 but only at this very low level, in this very small term that gets smaller and smaller as n increases. 276 00:33:10,460 --> 00:33:14,930 But it is there, and the difference is that you go from a one to a three. 277 00:33:14,930 --> 00:33:22,150 Now, this three is universal. It doesn't depend on the fact that I assumed two offspring, or equal numbers of offspring.
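The family-tree model is easy to simulate. A minimal sketch, with one assumption of my own: I use sums of independent standard normal steps along each ancestral line rather than averages (a rescaling that doesn't change the comparison). The stronger log log correction, the three in place of the one, shows up numerically as the correlated maximum sitting measurably below the maximum of the same number of independent values with the same spread.

```python
import numpy as np

# Sketch of the family-tree (branching) model versus independent draws.
# Each leaf's attribute is the sum of N(0,1) steps along its ancestral line,
# so siblings share all but their final step.  We compare the maximum over
# the 2**k leaves with the maximum of 2**k INDEPENDENT Gaussians having the
# same marginal variance k.
rng = np.random.default_rng(1)
k, trials = 12, 100          # depth 12 -> 4096 leaves per tree

def tree_leaves():
    vals = np.zeros(1)
    for _ in range(k):       # go down one generation: two offspring each
        vals = np.repeat(vals, 2) + rng.normal(size=2 * vals.size)
    return vals              # correlated through shared ancestry

tree_max = float(np.mean([tree_leaves().max() for _ in range(trials)]))
iid_max = float(np.mean([rng.normal(0, np.sqrt(k), 2**k).max() for _ in range(trials)]))
print(f"family-tree max ~ {tree_max:.2f}, independent max ~ {iid_max:.2f}")
```

The leading terms match, exactly as the lecture says; the gap between the two averages is the small slowly-varying correction where the one becomes a three.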
278 00:33:22,150 --> 00:33:34,490 Anytime you have a family tree lying behind your data, you expect to get a three and not a one. 279 00:33:34,490 --> 00:33:41,070 So that's a description of some of the 280 00:33:41,070 --> 00:33:44,940 mathematics that underpins extreme events. 281 00:33:44,940 --> 00:33:55,080 Now, let me tell you about some of the applications, and the first application I want to describe is the freezing of liquids to form solids. 282 00:33:55,080 --> 00:34:02,100 So any material, if you raise it to a sufficiently high temperature, is a liquid. That means its constituent parts, 283 00:34:02,100 --> 00:34:11,110 the atoms and molecules that make it up, are free to wander around at random and explore various different configurations. 284 00:34:11,110 --> 00:34:22,090 Each configuration carries an energy. And so as you explore various configurations, you're exploring ranges of different energies. 285 00:34:22,090 --> 00:34:28,340 As you lower the temperature, you lower the range of energies that you can explore, 286 00:34:28,340 --> 00:34:36,680 until you reach the freezing point, when the liquid becomes a solid, where the constituent parts are fixed in space. 287 00:34:36,680 --> 00:34:43,940 And that's because essentially they're stuck at the lowest energy configuration. 288 00:34:43,940 --> 00:34:50,930 So when you're freezing a material, you're finding the lowest energy configuration. 289 00:34:50,930 --> 00:34:58,770 You know, there might be configurations at local minima, but you're finding the very lowest one. 290 00:34:58,770 --> 00:35:01,080 So when a liquid freezes as the temperature is lowered, 291 00:35:01,080 --> 00:35:07,260 the configuration of atoms and molecules seeks the lowest energy arrangement. In many situations, 292 00:35:07,260 --> 00:35:12,240 this lowest energy arrangement is highly symmetrical and highly ordered. 293 00:35:12,240 --> 00:35:17,990 So, for example, if you melt some salt,
294 00:35:17,990 --> 00:35:26,060 common salt, and then freeze it again, you know that you form crystals, and those crystals are highly ordered and regularly arranged. 295 00:35:26,060 --> 00:35:30,800 And that's because the lowest energy configuration in that case is very well defined. 296 00:35:30,800 --> 00:35:38,270 There's a clear winner in the lowest energy configuration, and the system manages to find that every time. 297 00:35:38,270 --> 00:35:46,100 And that's why such systems have a very well-defined freezing temperature, and why every time you freeze the system, 298 00:35:46,100 --> 00:35:51,470 you get the same configuration in the solid phase. 299 00:35:51,470 --> 00:35:55,970 There are, however, materials where that's not the case. 300 00:35:55,970 --> 00:36:07,190 And examples would be a glass: the glass that you see perhaps in the window in your room, or the glass that's in my spectacles. 301 00:36:07,190 --> 00:36:15,680 And in these cases, the energy landscape is vastly more complicated, and there isn't a clear, 302 00:36:15,680 --> 00:36:20,480 obvious winning configuration when it comes to being the minimum energy. 303 00:36:20,480 --> 00:36:25,220 In fact, this landscape is so complicated, rather like the mountain ranges I showed you earlier, 304 00:36:25,220 --> 00:36:32,060 or rather like the surface of the sea or the quantum waves, there are lots of possible different arrangements, 305 00:36:32,060 --> 00:36:40,460 all with more or less the same local minimum energies, and finding the right one, an obvious winner, is difficult. 306 00:36:40,460 --> 00:36:49,670 So when you lower the temperature, the system explores and finds itself in local minima, and you might get a 307 00:36:49,670 --> 00:36:53,570 different minimum each time; you might be stuck in a different minimum each time. 308 00:36:53,570 --> 00:36:57,650 And there's no reason to expect that in the solid phase the configurations will be the same.
309 00:36:57,650 --> 00:37:08,230 They're certainly not highly ordered. So the question is, why do glasses have relatively well-defined freezing transitions? 310 00:37:08,230 --> 00:37:17,500 Here's a picture, a cartoon if you like, of the energy landscape for a glass, as computed by Chiara Cammarota, 311 00:37:17,500 --> 00:37:22,060 who's an expert on this area of mathematical physics. 312 00:37:22,060 --> 00:37:26,440 And you see, as I said, that there are many possible local minima. 313 00:37:26,440 --> 00:37:33,490 Your system explores this terrain as you lower the temperature and will get stuck in a minimum, 314 00:37:33,490 --> 00:37:41,740 but it may get stuck in a very high-lying minimum and therefore have an energy much higher than the lowest one it could potentially reach. 315 00:37:41,740 --> 00:37:47,790 There are many local minima it can attain, all with different configurations. 316 00:37:47,790 --> 00:37:52,020 Now, this really is a cartoon, because in fact this isn't two-dimensional. 317 00:37:52,020 --> 00:38:00,810 It has many billions of dimensions. And so you have to imagine what this random landscape would look like in an extremely high-dimensional space, 318 00:38:00,810 --> 00:38:06,330 not a two-dimensional surface as shown here. Does that help us? 319 00:38:06,330 --> 00:38:16,060 Well, it turns out that in very high-dimensional random landscapes, we get some simplifying features, very unexpectedly. 320 00:38:16,060 --> 00:38:23,830 One feature is that the saddle points with higher energies look more like maxima than those with lower energies. 321 00:38:23,830 --> 00:38:26,440 Those with lower energies look more like minima. 322 00:38:26,440 --> 00:38:35,410 So what that means is that the high-lying saddle points have many downward directions and not many upward directions. 323 00:38:35,410 --> 00:38:36,870 And this really helps you.
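This feature of high-dimensional critical points can be caricatured with random matrices. The sketch below is an illustrative assumption of mine, not the lecture's precise model: treat the matrix of second derivatives at a critical point as a large random symmetric matrix whose spectrum is shifted upwards as the energy drops. Negative eigenvalues count the downward directions, so a critical point near the bottom has few or none, while one high up has many.

```python
import numpy as np

# Toy caricature: Hessians at critical points of a random high-dimensional
# landscape modelled as random symmetric matrices, with the spectrum shifted
# up at lower energies.  Negative eigenvalues = downward directions.
rng = np.random.default_rng(2)
n = 300
A = rng.normal(size=(n, n))
H = (A + A.T) / (2 * np.sqrt(n))      # eigenvalues roughly in [-sqrt(2), sqrt(2)]

def downward_directions(shift):
    """Count negative eigenvalues of the shifted Hessian H + shift * I."""
    return int(np.sum(np.linalg.eigvalsh(H + shift * np.eye(n)) < 0))

high_energy = downward_directions(0.0)   # saddle high up: many ways down
low_energy = downward_directions(2.0)    # critical point near the bottom
print(f"downward directions: high-energy {high_energy}, low-energy {low_energy}")
```

In this caricature the high-energy critical point has downward directions in roughly half of the 300 dimensions, while the low-energy one is a genuine minimum, mirroring the lecture's point that high-lying saddles are easy to escape.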
324 00:38:36,870 --> 00:38:47,590 You might be troubled that your liquid will, as it explores various configurations, end up at a saddle point with a high energy, 325 00:38:47,590 --> 00:38:52,510 but which looks very much like a minimum, might get stuck there for a very long time, 326 00:38:52,510 --> 00:38:56,230 and you might freeze in that configuration. But that doesn't happen. 327 00:38:56,230 --> 00:39:00,550 It turns out that the saddle points where you might think you would get stuck, 328 00:39:00,550 --> 00:39:07,180 the high-energy saddle points, look more like maxima, so you're more likely to fall down in a cascade of energies, 329 00:39:07,180 --> 00:39:15,640 down to lower energies. It turns out that most minima have low energies, in fact energies close to the lowest. 330 00:39:15,640 --> 00:39:24,550 Not all of them do. There will be some minima that have high energies, but they're very rare compared to the ones with low energies. 331 00:39:24,550 --> 00:39:31,000 And the lowest energy configuration is only slightly sensitive to the size of the system, as we've already seen. 332 00:39:31,000 --> 00:39:35,830 And so this explains why glasses have a relatively sharp freezing transition. 333 00:39:35,830 --> 00:39:44,680 It's because of this phenomenon that you get lots of local peaks close to the highest peak, 334 00:39:44,680 --> 00:39:49,990 if you're interested in the highest one, or lots of dips very close to the lowest dip, if you're interested in the lowest, 335 00:39:49,990 --> 00:39:57,100 and the configuration you end up in may not be the absolute lowest, but its energy is not going to be far away. 336 00:39:57,100 --> 00:40:02,020 And so the freezing temperature is pretty well defined in these systems. 337 00:40:02,020 --> 00:40:12,260 And this explains a puzzle that had been troubling the natural scientists for a very long time.
338 00:40:12,260 --> 00:40:20,000 We can apply this understanding to a different problem that's troubling people currently, and this is the problem of machine learning. 339 00:40:20,000 --> 00:40:26,540 How do you train a machine to recognise or categorise images that it has not seen before? 340 00:40:26,540 --> 00:40:37,160 So the idea is, you want to show your computer pictures of cats, lots of them, and train your computer to recognise what a cat is, 341 00:40:37,160 --> 00:40:41,480 so that if you show it a picture not identical to any of those it's seen already, 342 00:40:41,480 --> 00:40:50,150 it will still recognise it as a cat. So you put lots of data into your computer, lots of pictures of lots of cats. 343 00:40:50,150 --> 00:40:53,510 Each picture of a cat contains lots of data. 344 00:40:53,510 --> 00:41:04,550 There are many attributes to a cat, and many things you can measure that would be reflected in the data that you put into your computer. 345 00:41:04,550 --> 00:41:11,900 So your job here is to take lots of images, each of them containing lots of information, 346 00:41:11,900 --> 00:41:23,680 put them into a computer, and try to use this to train your computer to recognise a new image as a cat, as opposed to a dog or a hamster. 347 00:41:23,680 --> 00:41:26,860 How do you do that? Well, you take all the data you've put in, 348 00:41:26,860 --> 00:41:36,140 which now sits in a very high-dimensional space, because we're putting in a lot of data, and each point of data contains a lot of information, 349 00:41:36,140 --> 00:41:40,180 lots of parameters to vary. So this is a very high-dimensional space. 350 00:41:40,180 --> 00:41:47,140 We input the data, and then we try to find a surface that sits as close as possible to all of that data. 351 00:41:47,140 --> 00:41:52,780 So this surface will necessarily be highly complex, and it lives in a very high-dimensional space.
352 00:41:52,780 --> 00:41:59,930 It has to be complex because it has to fit lots of different-looking cats. There are lots of varieties of cat. 353 00:41:59,930 --> 00:42:07,100 So we have this very random surface in a very high-dimensional space, and we want it to be as close as possible to all the data that we put in. 354 00:42:07,100 --> 00:42:13,100 So we vary the parameters that describe the surface and try to get it to match, as closely as possible, 355 00:42:13,100 --> 00:42:17,990 the data that we put in. And what does 'as close as possible' mean? 356 00:42:17,990 --> 00:42:28,370 It means that the distance between that surface and the data points that we're putting in has to be as small as possible. 357 00:42:28,370 --> 00:42:33,920 That is, we have to find the lowest minimum of the distance. 358 00:42:33,920 --> 00:42:39,260 So the problem of machine learning is exactly like the problem of the freezing of glasses. 359 00:42:39,260 --> 00:42:49,850 We're finding the lowest minimum, and it is therefore exactly the same as finding the extremes of the random surfaces that we discussed earlier. 360 00:42:49,850 --> 00:42:57,620 And you can analyse it in the same way. So we find the lowest minimum, and this will train the machine in the best possible way, 361 00:42:57,620 --> 00:43:03,260 give it the best possible chance to categorise images that it's not already seen. 362 00:43:03,260 --> 00:43:09,080 So can you find the lowest minimum? Well, this is one of the great challenges of modern computer science. 363 00:43:09,080 --> 00:43:18,860 And every company interested in machine learning has teams working on this problem. 364 00:43:18,860 --> 00:43:26,330 It's the moon-landing problem of machine learning: how to identify the global minimum, 365 00:43:26,330 --> 00:43:33,880 the lowest minimum, of the surface that you fit to approximate the data you've put in.
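The "vary the parameters to shrink the distance" step is, in practice, gradient descent on a loss surface: roll downhill. A deliberately tiny sketch, with a two-parameter linear model on synthetic data standing in for the billion-dimensional surfaces of the talk:

```python
import numpy as np

# Gradient descent on a mean-squared-error loss: the "distance" between a
# parametrised surface (here a line w*x + b) and the data points.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 200)
y = 2.0 * x - 0.5 + rng.normal(0, 0.1, 200)   # synthetic "data"

def loss(w, b):
    return float(np.mean((w * x + b - y) ** 2))  # distance from surface to data

w, b, lr = 0.0, 0.0, 0.1
initial = loss(w, b)
for _ in range(500):                   # roll down the loss landscape
    err = w * x + b - y
    w -= lr * 2 * np.mean(err * x)     # gradient with respect to w
    b -= lr * 2 * np.mean(err)         # gradient with respect to b
final = loss(w, b)
print(f"loss {initial:.3f} -> {final:.4f}, fitted slope {w:.2f}")
```

For this toy problem the landscape is a simple bowl with one minimum; the hard case the lecture describes is the same recipe run on a landscape with vast numbers of minima and saddles.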
366 00:43:33,880 --> 00:43:37,660 As it turns out, there are algorithms that do that, and they work very well. 367 00:43:37,660 --> 00:43:44,290 In fact, the big surprise in this area is not whether you can find an algorithm; we have them. 368 00:43:44,290 --> 00:43:49,390 The big surprise is that they work far better than might have been expected. 369 00:43:49,390 --> 00:43:58,930 And this was the great puzzle to resolve in the area. So here's an example of the sort of surface you get in these machine learning problems. 370 00:43:58,930 --> 00:44:03,970 This is taken from a paper, 'Visualizing the Loss Landscape of Neural Nets'. 371 00:44:03,970 --> 00:44:08,570 Neural nets are a description of these random surfaces, 372 00:44:08,570 --> 00:44:14,530 the ones one gets in computer science in this area. You see, the surface is highly irregular. 373 00:44:14,530 --> 00:44:21,400 And your job is to find the lowest minimum on the surface, and that's the best approximation to your data. 374 00:44:21,400 --> 00:44:24,250 Here's another illustration of what these surfaces look like. 375 00:44:24,250 --> 00:44:31,960 This is taken from a website, losslandscape.com, where, I should say, you'll also find films that you can explore, 376 00:44:31,960 --> 00:44:43,240 and you see these surfaces are extraordinarily complex and pockmarked with local minima, with saddles, with maxima. 377 00:44:43,240 --> 00:44:47,470 So as you explore the surface, you may well get stuck in the wrong minimum. 378 00:44:47,470 --> 00:44:56,350 You may well get stuck at a saddle for a very long time before you find that it really is just a saddle and not a minimum. 379 00:44:56,350 --> 00:45:04,090 And in these very high-dimensional spaces, I remind you, saddles can look very much like minima.
380 00:45:04,090 --> 00:45:16,810 If this were a surface in a thousand dimensions, in 999 dimensions the saddle might go upwards, and only one direction might go downwards. 381 00:45:16,810 --> 00:45:21,550 So it's very easy to be fooled in these high-dimensional spaces into thinking that a saddle is really a minimum. 382 00:45:21,550 --> 00:45:25,160 So it looks like this would be an intractable problem, but in fact it isn't. 383 00:45:25,160 --> 00:45:30,700 And the methods that people have developed work extraordinarily well. Why is that? 384 00:45:30,700 --> 00:45:38,310 Well, it turns out that it's the same understanding we've developed all along for these very high-dimensional, highly complex surfaces. 385 00:45:38,310 --> 00:45:48,610 The structure actually works to your benefit, in that the saddles that you see if you're high up look much more like maxima than minima. 386 00:45:48,610 --> 00:45:55,150 And so you're naturally inclined to roll down the landscape with your algorithm and not get stuck, 387 00:45:55,150 --> 00:46:01,270 not get stuck in high-lying minima, because there are relatively few of those. 388 00:46:01,270 --> 00:46:06,790 Instead, you're likely to find a minimum which is very close to the global minimum. 389 00:46:06,790 --> 00:46:12,190 You may not find the absolute global minimum, but you can quickly get to a minimum that's very close to it. 390 00:46:12,190 --> 00:46:17,520 And that's good enough for all practical purposes. So this is a good example, 391 00:46:17,520 --> 00:46:26,510 the one I wanted to discuss. But let me finish with a final example, which has a very different flavour, and this is the Riemann zeta function. 392 00:46:26,510 --> 00:46:29,570 So the Riemann zeta function is a mathematical object, 393 00:46:29,570 --> 00:46:40,820 a surface, which is designed to help understand the prime numbers, the primes being the numbers divisible only by themselves and one.
394 00:46:40,820 --> 00:46:44,660 So here are the primes up to 100. 395 00:46:44,660 --> 00:46:53,750 And for thousands of years, humankind's been interested in the distribution of these numbers amongst all of the whole numbers. 396 00:46:53,750 --> 00:46:59,420 Are there any patterns amongst the primes? Are there any ways to predict where the next prime will come, 397 00:46:59,420 --> 00:47:07,130 et cetera? So people have found a way to analyse the distribution of primes, and this involves the Riemann zeta function. 398 00:47:07,130 --> 00:47:12,960 So what's that? Well, we take a number s, 399 00:47:12,960 --> 00:47:22,440 and we do the following to it. We take one, plus one over two to the power s, plus one over three to the power s, plus one over four to the power s, 400 00:47:22,440 --> 00:47:29,820 plus one over five to the power s, et cetera. So if s is two, this would be one plus one over two squared, 401 00:47:29,820 --> 00:47:32,820 which is four, plus one over three squared, which is nine, 402 00:47:32,820 --> 00:47:43,440 plus one over four squared, which is 16, et cetera. So for each number s, we can put it into this sum and get an answer out. For some values of s, 403 00:47:43,440 --> 00:47:49,920 you have to work a little harder, but I don't want to go down that route. We have a way to get an answer for every s you put in. 404 00:47:49,920 --> 00:47:57,000 Now, just to make this a little more complicated, it turns out you don't have to take a whole number like two or three for s. 405 00:47:57,000 --> 00:48:06,660 You could take a complex number, something that is a combination of a number that we see in the everyday world and the square root of minus one. 406 00:48:06,660 --> 00:48:09,610 So if you understand complex numbers, you'll know what that means. 407 00:48:09,610 --> 00:48:15,300 You put a complex number in, and the value that you get out will typically be a complex number.
408 00:48:15,300 --> 00:48:21,000 If complex numbers aren't quite your thing, think of it as: s is really two numbers. 409 00:48:21,000 --> 00:48:26,630 You put two numbers into this device, and two numbers come out. 410 00:48:26,630 --> 00:48:30,980 So how does this help you? Well, there's a remarkable identity, 411 00:48:30,980 --> 00:48:41,090 due originally to Euler, which says that that sum is in fact equal to a product, and the product is over all of the prime numbers. 412 00:48:41,090 --> 00:48:46,040 So the sum, one plus one over two to the s plus one over three to the s, et cetera, 413 00:48:46,040 --> 00:48:54,260 is identically equal to one over one minus one over two to the s, times one over one minus one over three to the s, 414 00:48:54,260 --> 00:49:00,290 times one over one minus one over five to the s, times one over one minus one over seven to the s, times one over one minus one over eleven to the s, et cetera. 415 00:49:00,290 --> 00:49:08,990 And these are all the prime numbers that are appearing, and only the prime numbers. So it was realised by Riemann that, just as modern chefs love to 416 00:49:08,990 --> 00:49:13,010 take a nice dish and deconstruct it, you can 417 00:49:13,010 --> 00:49:18,680 deconstruct this formula: if you understand the behaviour of the zeta function, 418 00:49:18,680 --> 00:49:21,900 you can deconstruct it to get information about the prime numbers. 419 00:49:21,900 --> 00:49:28,950 So this is how we understand the primes: via the Riemann zeta function, deconstructed. 420 00:49:28,950 --> 00:49:35,250 So what does the Riemann zeta function look like? It's a little hard to plot, because of the fact that complex numbers appear, 421 00:49:35,250 --> 00:49:43,860 but one way to represent it is in terms of colours.
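Euler's identity is easy to check numerically before looking at the pictures. A sketch at s = 2, where the common value is known to be pi squared over six; the truncation points (the first 100,000 terms of the sum, primes up to 1,000 in the product) are arbitrary choices for this illustration:

```python
import math

# Check Euler's identity at s = 2:
#   sum over n of 1/n^s  =  product over primes p of 1/(1 - 1/p^s).
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_p in enumerate(sieve) if is_p]

s = 2
zeta_sum = sum(1.0 / n ** s for n in range(1, 100_001))
zeta_prod = 1.0
for p in primes_up_to(1000):
    zeta_prod *= 1.0 / (1.0 - p ** (-s))

print(zeta_sum, zeta_prod, math.pi ** 2 / 6)   # all three agree closely
```

Both truncations converge to the same number, which is the identity the deconstruction of the primes rests on.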
So here's a plot in two dimensions. The coordinates of the points 422 00:49:43,860 --> 00:49:49,290 in this two-dimensional plot are the two input numbers, or the one complex number, 423 00:49:49,290 --> 00:49:51,000 if you know what that means. 424 00:49:51,000 --> 00:49:59,180 So you picture the x and y coordinates of a point here; x and y correspond to the two input numbers, and the output is represented as a colour. 425 00:49:59,180 --> 00:50:02,450 Now, it's not the only way to represent the zeta function, 426 00:50:02,450 --> 00:50:07,590 and I'll give you a different one in a minute, but this is the one that we'll focus on for the moment. 427 00:50:07,590 --> 00:50:16,850 So you have colours at each point, but some points are special: the points where all the colours meet. 428 00:50:16,850 --> 00:50:21,230 And you see a collection of these down at the bottom of the picture. 429 00:50:21,230 --> 00:50:28,510 They lie on a straight line, a horizontal line at the bottom of the plot. 430 00:50:28,510 --> 00:50:34,580 And Riemann realised that these points where all the colours meet are very special, 431 00:50:34,580 --> 00:50:38,980 and in fact these are the points that really determine the distribution of the prime numbers. 432 00:50:38,980 --> 00:50:44,620 This is where the deconstruction really acts. 433 00:50:44,620 --> 00:50:48,700 He realised as well that there aren't just these special points on the horizontal line at the bottom. 434 00:50:48,700 --> 00:50:53,800 There are some others that lie away from that horizontal line. 435 00:50:53,800 --> 00:51:03,130 And he guessed, he hypothesised, that those points lie on a single straight vertical line. 436 00:51:03,130 --> 00:51:09,550 Now, he wasn't able to prove that; he guessed it, and that left it as a challenge for future generations. 437 00:51:09,550 --> 00:51:12,280 And that's the problem that we call the Riemann hypothesis.
438 00:51:12,280 --> 00:51:18,190 It's that these special points where all the colours meet, and that aren't the points on the horizontal line at the bottom, 439 00:51:18,190 --> 00:51:23,970 in fact lie on a single vertical straight line. And we can't prove that yet. 440 00:51:23,970 --> 00:51:32,910 If you do come up with a proof, and you're able to get it accepted in a reputable mathematical journal and accepted by the mathematical community, 441 00:51:32,910 --> 00:51:43,620 you win a million dollars, because this is one of the problems issued by the Clay Institute, one of the Clay mathematical Millennium Problems. 442 00:51:43,620 --> 00:51:45,910 Here's a different way to represent the Riemann zeta function. 443 00:51:45,910 --> 00:51:54,510 And this is a way due to G.H. Hardy, who was a great mathematician in the early part of the 20th century, 444 00:51:54,510 --> 00:52:02,250 and he thought, what does this look like as you look up the vertical line where you expect the special points to lie? 445 00:52:02,250 --> 00:52:07,950 And he found a way to plot that. It's called the Hardy Z-function, and it's the curve in blue there. 446 00:52:07,950 --> 00:52:11,100 So this curve oscillates like a wave. 447 00:52:11,100 --> 00:52:20,100 It passes through zero, and the zeros are precisely the points in the previous plot where all the colours meet, going up vertically. 448 00:52:20,100 --> 00:52:27,810 So the zeros of this curve, the points where it intersects the horizontal axis, are the Riemann zeros, 449 00:52:27,810 --> 00:52:34,170 and these are the points that we identified as points where the colours meet. 450 00:52:34,170 --> 00:52:37,800 What's the Riemann hypothesis in this setting? It's the statement that for this curve, 451 00:52:37,800 --> 00:52:47,940 if you look beyond 10, to the right of 10 on the horizontal axis, all the maxima will be positive and all the minima will be negative.
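The curve described here is directly computable: the mpmath library (assumed available) exposes Hardy's Z as `siegelz`, the Riemann-Siegel Z-function, and the zeros on the critical line via `zetazero`. A small sketch:

```python
from mpmath import mp, siegelz, zetazero

# Hardy's Z-function is real-valued, and its sign changes are exactly the
# Riemann zeros on the critical line.
mp.dps = 15

# The first Riemann zero is at height t = 14.1347..., so Z changes sign
# between t = 14 and t = 14.3.
print(siegelz(14), siegelz(14.3))

# zetazero(1) returns the first zero 1/2 + i*14.1347...; Z vanishes at its height.
t1 = zetazero(1).imag
print(siegelz(t1))
```

The oscillating blue curve of the slide is just `siegelz(t)` plotted against t, and each axis crossing is one of the special points where the colours meet.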
452 00:52:47,940 --> 00:52:59,940 So if you find a maximum of this function, of this curve, that lies below the horizontal axis, that disproves the Riemann hypothesis. 453 00:52:59,940 --> 00:53:04,440 So that's one of the great challenges concerning the Hardy Z-function, 454 00:53:04,440 --> 00:53:12,480 this curve here: show that it has no negative maxima and no positive minima, and that proves the Riemann hypothesis. 455 00:53:12,480 --> 00:53:14,880 There's another question that people have asked about this curve, 456 00:53:14,880 --> 00:53:23,160 again going back about 100 years, and this is: how big do the oscillations get in the Hardy Z-function? 457 00:53:23,160 --> 00:53:30,450 I've plotted the Hardy Z-function in the top curve up to 60, and then down below over a range much further out. 458 00:53:30,450 --> 00:53:37,560 You see the oscillations continue, and they get more rapid, and the curve seems to get a little bigger, but not by very much. 459 00:53:37,560 --> 00:53:48,720 And Lindelöf, about 100 years ago, suggested that perhaps the Hardy Z-function increases as you go along the horizontal axis: 460 00:53:48,720 --> 00:53:55,380 the size of the oscillations increases, but as slowly as you can possibly imagine. 461 00:53:55,380 --> 00:54:02,550 I'm not going to state that more precisely. But Lindelöf made a precise guess which said that the Hardy Z-function does 462 00:54:02,550 --> 00:54:07,650 increase in the size of its oscillations as you go along the horizontal axis, 463 00:54:07,650 --> 00:54:13,980 but as slowly as you can possibly imagine. Now, people have 464 00:54:13,980 --> 00:54:19,080 worked on this for the last hundred years, and hard-won progress has been made. 465 00:54:19,080 --> 00:54:30,420 But we are very far from proving it. So progress has been slow, steady, hard fought, but we're very far from a proof.
466 00:54:30,420 --> 00:54:40,480 However, recently a different question has been put forward, very much in this spirit, which it turns out we can answer very precisely. 467 00:54:40,480 --> 00:54:46,000 And this is the question: not how big do these oscillations get all the way out, 468 00:54:46,000 --> 00:54:54,100 some long distance along the horizontal axis, but what if we look over a short distance? What's the largest value of the Hardy 469 00:54:54,100 --> 00:54:59,000 Z-function between T and T plus two pi? 470 00:54:59,000 --> 00:55:04,750 It doesn't have to be two pi, it could be any constant, but I'm putting two pi here for illustration. 471 00:55:04,750 --> 00:55:13,420 So how big do you expect the largest value of the Hardy Z-function to be, its largest oscillation, in a range of length two pi, 472 00:55:13,420 --> 00:55:20,440 which is about six? Well, as T increases, you get more and more oscillations of the function in this range. 473 00:55:20,440 --> 00:55:28,010 And so we have more local maxima and local minima. How high is the highest, how low is the lowest? 474 00:55:28,010 --> 00:55:30,560 Well, here's the answer. 475 00:55:30,560 --> 00:55:42,170 It turns out that this answer is identical to the answer that we saw earlier for the extremes associated with family trees. 476 00:55:42,170 --> 00:55:52,610 The same three appears, and the terms exactly match those that we saw in the formula for the extremes associated with family trees. 477 00:55:52,610 --> 00:55:58,130 Now, the history of this is that this formula was guessed by analogy with the family tree problem. 478 00:55:58,130 --> 00:56:03,650 And in the last few years, people have managed to prove it.
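One can peek at this windowed maximum numerically. A rough sketch, again assuming mpmath's `siegelz`; at the small height T used below the unstated lower-order terms matter, so this only illustrates the quantities appearing in the proved asymptotic, where the typical size of the log of the maximum grows like log log T minus three quarters of log log log T, rather than testing that formula.

```python
import math
from mpmath import mp, siegelz

# Sample |Z(t)| on a grid over the window [T, T + 2*pi] and record the
# largest value, the quantity whose statistics the recent theorem describes.
mp.dps = 15
T = 100.0
ts = [T + 2 * math.pi * k / 200 for k in range(201)]
observed_max = max(abs(siegelz(t)) for t in ts)

# Leading terms of the proved asymptotic for log of the maximum (note the 3/4,
# the same three as in the family-tree formula):
leading = math.log(math.log(T)) - 0.75 * math.log(math.log(math.log(T)))
print(f"max |Z| on the window: {float(observed_max):.2f}")
print(f"log of that max: {math.log(observed_max):.2f}; leading-order scale: {leading:.2f}")
```

At astronomically large T the two printed numbers would track each other; here the point is only to see that the windowed maximum is a perfectly concrete, computable object.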
So we now have very precise information, 479 00:56:03,650 --> 00:56:08,630 astonishingly precise, just over the last few years, about the extreme values of the Hardy Z-function, 480 00:56:08,630 --> 00:56:14,900 and this has come about because we asked the right question in this setting, 481 00:56:14,900 --> 00:56:20,750 a question that does have a very precise answer that we can prove. And proving 482 00:56:20,750 --> 00:56:26,300 this relies on showing that there is a family-tree-like structure to the primes. 483 00:56:26,300 --> 00:56:29,690 And people have been thinking about the primes for a very long time. 484 00:56:29,690 --> 00:56:34,910 But this new understanding has emerged only recently: that you can group the primes together, in the 485 00:56:34,910 --> 00:56:40,520 way they contribute to the zeta function, in a way that makes them look like a family tree. 486 00:56:40,520 --> 00:56:45,850 And using that fact then allows you to prove this formula. So, 487 00:56:45,850 --> 00:56:51,010 by thinking about extremes, we discover new properties of the primes. 488 00:56:51,010 --> 00:56:56,260 And I should emphasise that it's inconceivable that one would have guessed this formula if 489 00:56:56,260 --> 00:57:03,180 one hadn't been thinking about all the other problems and extremes that I described earlier. 490 00:57:03,180 --> 00:57:06,830 So let me finish with a summary. Questions relating to the highest 491 00:57:06,830 --> 00:57:15,020 maximum or lowest minimum connect many different problems, from water waves, quantum mechanics, 492 00:57:15,020 --> 00:57:21,020 the Olympics, through to how glasses freeze, machine learning problems, 493 00:57:21,020 --> 00:57:26,570 the efficacy of algorithms in machine learning, and through to the Riemann zeta function. 494 00:57:26,570 --> 00:57:37,220 And by identifying that thread, we've managed to make progress in recent years that I think wouldn't have been imagined longer ago.
495 00:57:37,220 --> 00:57:40,550 So the statistics of extreme values shows universal behaviour. 496 00:57:40,550 --> 00:57:51,380 You get the same formula appearing time and again, and this commonality is what allows us to make progress on a broad range of these problems. 497 00:57:51,380 --> 00:57:55,080 So it might seem that we've made good progress, and I think we're pleased that we have. 498 00:57:55,080 --> 00:58:00,140 But I want to emphasise, going back to the first story I told you, the mountain range, 499 00:58:00,140 --> 00:58:05,390 that we're really still in the foothills of this analysis, of this line of research. 500 00:58:05,390 --> 00:58:08,990 It's very clear that there's much more that we don't understand than we do. 501 00:58:08,990 --> 00:58:16,790 And the great challenge now is to take this analysis and make it more precise in each of these separate applications, and to get 502 00:58:16,790 --> 00:58:26,410 more and more accurate methods for analysing these problems of extremes in the various contexts I've described to you today. 503 00:58:26,410 --> 00:59:02,109 Thank you.