1
00:00:13,290 --> 00:00:17,040
Welcome back to the Oxford Mathematics Public Lectures Home Edition.
2
00:00:17,040 --> 00:00:23,220
My name is Alain Goriely and I'm in charge of external relations for the Mathematical Institute, as usual.
3
00:00:23,220 --> 00:00:28,110
Special thanks to our sponsor XTX Markets, a market-leading,
4
00:00:28,110 --> 00:00:33,420
quantitative-driven electronic market maker with offices in London, Singapore and New York.
5
00:00:33,420 --> 00:00:38,190
Their ongoing support is crucial in providing you quality content.
6
00:00:38,190 --> 00:00:45,900
It is a great pleasure for me to welcome today my colleague Jon Keating, the Sedleian Professor of Natural Philosophy at Oxford University.
7
00:00:45,900 --> 00:00:53,010
As you may know, the Sedleian Professorship is the oldest scientific chair in Oxford and is dedicated to applied mathematics.
8
00:00:53,010 --> 00:00:58,740
Jon was elected to this chair two years ago after an illustrious career at the University of Bristol.
9
00:00:58,740 --> 00:01:01,740
He has broad research interests in quantum physics,
10
00:01:01,740 --> 00:01:08,130
random matrix theory and its unlikely connection to number theory through the Riemann zeta function.
11
00:01:08,130 --> 00:01:13,800
His research is truly fascinating, as he draws connections between fields that are apparently disconnected,
12
00:01:13,800 --> 00:01:18,690
and I have always been impressed by Jon's originality and clarity of thought.
13
00:01:18,690 --> 00:01:23,070
Continuing in the same vein and exploring unlikely connections today,
14
00:01:23,070 --> 00:01:28,140
Jon will take us on a wild ride through extreme events and explain to us how the
15
00:01:28,140 --> 00:01:33,540
heights of unexplored mountains, gold medals at the Olympics, quantum waves,
16
00:01:33,540 --> 00:01:37,800
machine learning and prime numbers are all somehow connected.
17
00:01:37,800 --> 00:01:42,810
So thank you very much, Jon, for doing this. Please start now.
18
00:01:42,810 --> 00:01:50,340
Well, let me begin by thanking Alain for the kind invitation to give this lecture.
19
00:01:50,340 --> 00:01:58,200
I'll be speaking about a collection of problems which superficially may seem to have no connection,
20
00:01:58,200 --> 00:02:04,200
but which, I shall argue, are linked by a mathematical thread.
21
00:02:04,200 --> 00:02:09,750
And it's that mathematical thread that will be the main focus of the talk rather than any
22
00:02:09,750 --> 00:02:18,160
one of the individual problems that I shall speak about and that are connected by it.
23
00:02:18,160 --> 00:02:30,280
I'd like to begin by inviting you to imagine that you're hiking in mountainous terrain, rather like the one shown in this photograph. On your hike,
24
00:02:30,280 --> 00:02:37,660
your goal may be to climb one of the peaks that you see shown here,
25
00:02:37,660 --> 00:02:48,910
perhaps to walk along a ridge from that peak to a neighbouring peak: down from the peak and then back up to a neighbouring peak.
26
00:02:48,910 --> 00:02:55,900
Or it may be to descend all the way down and find a dip where water has collected
27
00:02:55,900 --> 00:03:04,420
and a lake or a pool has formed in which you can swim or where you can skim stones.
28
00:03:04,420 --> 00:03:13,660
Mathematically speaking, you're visiting what we would call critical points in the height of the Earth's surface,
29
00:03:13,660 --> 00:03:18,640
So a critical point might be a local minimum.
30
00:03:18,640 --> 00:03:22,870
That's a point where a step in any direction takes you upwards.
31
00:03:22,870 --> 00:03:34,270
And that's the sort of place where a pool or lake may form as water collects. A different type of critical point would be a local maximum.
32
00:03:34,270 --> 00:03:41,800
That's a point where a step in any direction takes you downwards, and the peaks that you visit would be examples of those.
33
00:03:41,800 --> 00:03:51,370
And finally, there are saddle points: points which are a minimum in one direction and a maximum in the other direction.
34
00:03:51,370 --> 00:03:59,360
And you would encounter these on the ridge between two peaks.
35
00:03:59,360 --> 00:04:06,290
So your hike may take you to these various critical points, but perhaps your goal is more ambitious,
36
00:04:06,290 --> 00:04:15,920
perhaps you're more adventurous and your intention is not to climb any old peak, but in fact to climb the highest peak in this vicinity.
37
00:04:15,920 --> 00:04:19,070
And so you may recognise that to be Ben Nevis,
38
00:04:19,070 --> 00:04:27,200
which is shown there in the background and much of this talk will be concerned with that kind of problem.
39
00:04:27,200 --> 00:04:42,800
The problem of finding the highest peak on some surface that looks like the one that I'm showing you here now.
40
00:04:42,800 --> 00:04:47,750
In terms of mountains, you may think this is a little artificial. We know where the highest peaks are.
41
00:04:47,750 --> 00:04:57,270
We have maps that show us that and we even have GPS. But many of the problems I shall be speaking about later are not of that kind.
42
00:04:57,270 --> 00:05:02,330
We don't have an idea in advance of where the highest peak is.
43
00:05:02,330 --> 00:05:08,630
And our job may be to find it and maybe to understand how hard it is likely to be.
44
00:05:08,630 --> 00:05:11,600
And so I'm showing you this more for illustrative purposes,
45
00:05:11,600 --> 00:05:20,930
but I think it conveys the message. There is another message that I want to convey with this picture, which is that, looking at it,
46
00:05:20,930 --> 00:05:27,110
you may form the impression that the height of the Earth's surface in this sense is a
47
00:05:27,110 --> 00:05:35,810
rather irregular and random function of the position where you are.
48
00:05:35,810 --> 00:05:45,050
So knowing your position doesn't necessarily mean that you can automatically deduce what the height of the Earth's surface would be.
49
00:05:45,050 --> 00:05:50,090
The terrain here is seemingly random, or irregular.
50
00:05:50,090 --> 00:05:55,310
In other words, knowing the height of the Earth's surface at one point in this terrain doesn't automatically tell you
51
00:05:55,310 --> 00:06:02,700
what the height of the Earth's surface would be some distance away or in some other direction.
52
00:06:02,700 --> 00:06:09,890
So that means that we want to model the height of the surface as being random.
53
00:06:09,890 --> 00:06:18,000
And there are many ways to do that. And the simplest one would be to assume that the height of the Earth's surface in
54
00:06:18,000 --> 00:06:25,680
that photograph is described by a normal distribution or a bell shaped curve,
55
00:06:25,680 --> 00:06:33,600
as shown in this picture. So there is an average height that's the centre of the distribution,
56
00:06:33,600 --> 00:06:39,840
the highest point in the graph there and then the probability or likelihood of
57
00:06:39,840 --> 00:06:46,890
finding heights much greater than the mean decays rapidly as a function of height.
58
00:06:46,890 --> 00:06:53,980
And likewise, the probability of finding very low heights decays rapidly as you go away from the mean.
59
00:06:53,980 --> 00:06:59,400
Now, I should say you could criticise this as a model for heights in mountain ranges.
60
00:06:59,400 --> 00:07:00,720
That's not my point.
61
00:07:00,720 --> 00:07:09,270
And in fact, the examples I'll be showing you later are ones where we do believe that this is the right way to model the distribution of heights.
62
00:07:09,270 --> 00:07:15,300
But just for the moment, let's take this to be the distribution of heights in the mountain range that I showed you.
63
00:07:15,300 --> 00:07:24,210
The questions, then, that you might ask, and these will be the ones that I shall be focussing on throughout the talk, are: in a
64
00:07:24,210 --> 00:07:30,210
situation where the terrain can be modelled by a random surface with a normal distribution of heights,
65
00:07:30,210 --> 00:07:33,360
a bell shaped curve of heights.
66
00:07:33,360 --> 00:07:47,310
How effective can we expect methods to be for locating the highest maximum, that is to say, the highest peak, or equivalently the lowest minimum?
67
00:07:47,310 --> 00:07:58,170
How high should we expect the highest maximum to be? Given a random surface, do you expect exceptionally large heights to appear?
68
00:07:58,170 --> 00:08:05,590
And how does that depend on the total number of peaks in the surface?
69
00:08:05,590 --> 00:08:10,150
And finally, to what extent do we expect these answers to depend on the dimension of the surface?
70
00:08:10,150 --> 00:08:14,320
The Earth's surface is two-dimensional. The examples
71
00:08:14,320 --> 00:08:21,430
I'll give you later, at least some of the most important ones, will concern surfaces that have a vastly higher number of dimensions.
72
00:08:21,430 --> 00:08:33,290
Thousands, millions or billions. What do we expect random terrains to look like in very high-dimensional spaces?
73
00:08:33,290 --> 00:08:39,410
So let me give you a few more examples just to whet your appetite.
74
00:08:39,410 --> 00:08:43,400
Here you see the surface of the sea.
75
00:08:43,400 --> 00:08:52,070
And in fact, it turns out that the normal distribution is a very good model for the distribution of heights of the surface of the sea.
76
00:08:52,070 --> 00:08:59,840
And here, as it's painted, the surface is rather irregular, rather random.
77
00:08:59,840 --> 00:09:05,210
So if you were sailing the boat shown in this picture,
78
00:09:05,210 --> 00:09:14,840
you might well wish to know how high is the highest wave you're likely to encounter,
79
00:09:14,840 --> 00:09:21,470
or how deep would be the deepest trough you might descend down into.
80
00:09:21,470 --> 00:09:29,930
In particular, how would that depend on the length of your voyage as you encounter more and more waves?
81
00:09:29,930 --> 00:09:36,890
Do you expect to find larger and larger highest waves?
82
00:09:36,890 --> 00:09:45,390
Or does the problem not depend too much on how many waves you're likely to encounter on your voyage?
83
00:09:45,390 --> 00:09:50,880
Here's a second example, and this comes from quantum mechanics. Quantum mechanics is also a wave theory.
84
00:09:50,880 --> 00:09:55,140
In this case, it's a wave theory of how things move.
85
00:09:55,140 --> 00:10:06,780
So the system I want to consider is a point particle moving inside some domain and bouncing off the walls.
86
00:10:06,780 --> 00:10:12,450
Think of it as a billiard ball bouncing around inside a billiard table.
87
00:10:12,450 --> 00:10:23,880
But in this case, the billiard table has a cardioid shape. What you see there is one trajectory of the billiard ball, and it's highly irregular.
88
00:10:23,880 --> 00:10:29,770
The motion inside the cardioid is chaotic.
89
00:10:29,770 --> 00:10:34,810
What you see on the right is a quantum wave function for this same problem.
90
00:10:34,810 --> 00:10:43,030
So this describes the quantum mechanics of the motion of a billiard ball inside a cardioid-shaped billiard table.
91
00:10:43,030 --> 00:10:50,450
You see the peaks of the wave. In fact, what's plotted here is the square of the wave function.
92
00:10:50,450 --> 00:10:59,710
So half of those peaks will actually be deep minima reflected back upwards by the act of squaring the wave function.
93
00:10:59,710 --> 00:11:01,840
But physically, that's the right thing to do,
94
00:11:01,840 --> 00:11:10,820
because the square of the wave function in this case gives you the probability of finding the particle at a given place in a given vicinity.
95
00:11:10,820 --> 00:11:18,970
And so you might well wish to know how high is the highest peak of this wave function likely to be?
96
00:11:18,970 --> 00:11:28,440
Are there places where we might expect to find vastly higher probability of finding the particle than other places?
97
00:11:28,440 --> 00:11:37,560
How much does the height of the highest peak in this wave function depend on the total number of peaks that you see there?
98
00:11:37,560 --> 00:11:49,500
As we look at wave functions with more and more peaks, do we expect to find places, positions with increasingly high probability?
99
00:11:49,500 --> 00:12:02,430
And how does that depend on the number of peaks? Well, these are the sorts of questions I want to consider.
100
00:12:02,430 --> 00:12:09,000
To start with, I want to consider a warm up question, which has nothing to do with waves.
101
00:12:09,000 --> 00:12:11,280
Nothing to do with mountainous terrains.
102
00:12:11,280 --> 00:12:20,460
It's a rather more elementary question, but I want to argue it captures much of the spirit of the problems that I've discussed so far.
103
00:12:20,460 --> 00:12:24,870
So the question I want to discuss is this: in the Olympics,
104
00:12:24,870 --> 00:12:35,430
Should we expect the number of gold medals won by a country to be proportional to the relative size of that country's population?
105
00:12:35,430 --> 00:12:43,050
So I'm focussing on gold medals here because they do signify extreme ability.
106
00:12:43,050 --> 00:12:49,400
One gets a gold medal for running, for being the very fastest person.
107
00:12:49,400 --> 00:13:01,280
for being an extreme of speed. One gets a gold medal in the javelin for being the person who can throw the furthest. Teams
108
00:13:01,280 --> 00:13:10,940
get gold medals in synchronised swimming for being able to swim in the most synchronised way in the competition.
109
00:13:10,940 --> 00:13:15,590
So gold medals signify, or measure, extreme events.
110
00:13:15,590 --> 00:13:25,670
And the question is: should we expect the number of gold medals that a country wins to be proportional to the population of that country?
111
00:13:25,670 --> 00:13:36,940
Well, this question was picked over in the press after the 2012 Olympics and the 2016 Olympics.
112
00:13:36,940 --> 00:13:46,130
And the following was the sort of analysis that you found in many, many articles throughout the press, at least the British press.
113
00:13:46,130 --> 00:13:57,040
Great Britain had a population of roughly 65 million and won, in the 2016 Olympics, 27 gold medals.
114
00:13:57,040 --> 00:14:06,860
The US had a population of roughly 320 million, about five times the population of Great Britain, and won 46 gold medals,
115
00:14:06,860 --> 00:14:16,810
so a little under twice the number. China had a population roughly 20 times that of Great Britain,
116
00:14:16,810 --> 00:14:23,800
and yet it won about the same number of gold medals. Japan had a population about twice that of Great Britain,
117
00:14:23,800 --> 00:14:29,650
yet won half the number of gold medals that Great Britain did, not twice the number.
118
00:14:29,650 --> 00:14:40,750
So in the articles where one found this statistical analysis carried out, the conclusion drawn was that Great Britain had done
119
00:14:40,750 --> 00:14:50,820
disproportionately well and that the country had done better than should have been expected on the basis of the size of its population.
120
00:14:50,820 --> 00:14:52,680
So the question I want to address is,
121
00:14:52,680 --> 00:15:04,650
is it true that the number of gold medals you expect to win is proportional to the size of the population? Going back to my questions earlier:
122
00:15:04,650 --> 00:15:17,750
is it true that the size of extremes, the highest peaks, the highest waves, is proportional to the total number of waves?
123
00:15:17,750 --> 00:15:22,790
Here's another more careful analysis of the Olympics. You find many of these on the web.
124
00:15:22,790 --> 00:15:27,350
I picked this one at random. Adjusting for population,
125
00:15:27,350 --> 00:15:31,340
These were the most successful countries at the Rio Olympics.
126
00:15:31,340 --> 00:15:41,300
So while the United States won more medals than any other country in the Rio Olympics, the article points out that its population was relatively high.
127
00:15:41,300 --> 00:15:48,620
And so this analysis took the number of medals won for all the various countries
128
00:15:48,620 --> 00:15:53,570
And compared them with the population size taken very carefully from two sources,
129
00:15:53,570 --> 00:16:01,340
so this is a very careful analysis, and the two sources were the United Nations and the CIA World Factbook,
130
00:16:01,340 --> 00:16:11,840
which I confess I didn't know existed until I did this research to find this website that divided the number of medals by the size of the population.
131
00:16:11,840 --> 00:16:23,390
And they found that the country which did best was Grenada, which only won a single medal but which has a very small population.
132
00:16:23,390 --> 00:16:28,700
And so, dividing the number of medals won by the population, it came out top.
133
00:16:28,700 --> 00:16:33,980
And the other countries that did well were the Bahamas, New Zealand and Jamaica.
134
00:16:33,980 --> 00:16:41,600
In fact, the US didn't do very well at all. Here is the list, or at least the top of the list, that you find on this website.
135
00:16:41,600 --> 00:16:49,280
So Grenada did the best, and by some significant margin.
136
00:16:49,280 --> 00:16:53,540
And then the Bahamas, New Zealand, Jamaica, Denmark, Croatia did very well.
137
00:16:53,540 --> 00:17:03,530
Slovenia, Azerbaijan, Georgia, Hungary, Bahrain, Lithuania. Great Britain comes further down this list.
138
00:17:03,530 --> 00:17:11,270
It's about three quarters of the way down there. And so it didn't do spectacularly well, according to this analysis,
139
00:17:11,270 --> 00:17:20,360
but did much better than other big countries, certainly much better than the US, China,
140
00:17:20,360 --> 00:17:25,370
Japan, etc. But is this a reasonable analysis?
141
00:17:25,370 --> 00:17:30,270
Is it the right way to level the playing field,
142
00:17:30,270 --> 00:17:37,840
to divide by the total size of the population? Well, there are very different answers you can give to this.
143
00:17:37,840 --> 00:17:43,960
Here are two of them: two different opposing perspectives, both mathematical.
144
00:17:43,960 --> 00:17:50,410
One mathematical perspective is that Great Britain did do disproportionately well and that you should normalise the
145
00:17:50,410 --> 00:17:57,520
data in order to level the playing field between countries of different sizes by dividing by the population size.
146
00:17:57,520 --> 00:18:06,730
And here's how the argument goes. So my first answer is yes. Let's idealise.
147
00:18:06,730 --> 00:18:17,830
Let's consider an Olympics involving one event and two countries, and let's say the populations of the two countries are A and B.
148
00:18:17,830 --> 00:18:24,220
One of the people in these combined countries has to win the Gold Medal.
149
00:18:24,220 --> 00:18:35,680
And let's assume that there's no bias: that one country is not on average athletically more capable than the other country.
150
00:18:35,680 --> 00:18:41,110
And so the likelihood of the person who wins being in the
151
00:18:41,110 --> 00:18:46,400
first country is just A, its population, divided by the total number of people.
152
00:18:46,400 --> 00:18:53,590
And if you pick somebody at random from the A plus B people in the combined populations,
153
00:18:53,590 --> 00:19:01,150
there's a probability of A over A plus B that you'll pick someone in the first population. And the probability that the winner,
154
00:19:01,150 --> 00:19:08,140
whoever she is, is in the second population, in the second country, is the fraction of
155
00:19:08,140 --> 00:19:14,230
people of the total combined population in that second country, which would be B over
156
00:19:14,230 --> 00:19:22,690
A plus B. And so the ratio of the two probabilities is A over B.
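This idealised one-event argument can be checked numerically. Here is a minimal sketch (the population figures and trial count are my own illustrative choices, not from the lecture): pick the single winner uniformly at random from the combined population and count how often she comes from the first country.

```python
import random

def winner_from_a(pop_a, pop_b, trials=100_000, seed=0):
    """Pick one winner uniformly from the A + B people and count how
    often the winner belongs to country A. With no bias, the expected
    frequency is A / (A + B)."""
    rng = random.Random(seed)
    total = pop_a + pop_b
    wins = sum(rng.randrange(total) < pop_a for _ in range(trials))
    return wins / trials

# With populations in the ratio 4:1, country A wins roughly 4/5 of the
# time, so the ratio of the two win probabilities is A over B.
freq = winner_from_a(4_000_000, 1_000_000)
```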
157
00:19:22,690 --> 00:19:30,100
And so this would suggest that indeed normalising by dividing by the population size is the right thing to do,
158
00:19:30,100 --> 00:19:39,730
the right way to level the playing field. But I would argue that this isn't the most accurate model of how the Olympic Games works.
159
00:19:39,730 --> 00:19:48,400
You see, the Olympic Games doesn't involve competitions between all members of the populations of the various countries involved.
160
00:19:48,400 --> 00:19:51,100
I've never been involved in the Olympic Games, for example,
161
00:19:51,100 --> 00:19:58,960
so I would argue that a more accurate model of how the Olympic Games works is it's really a competition
162
00:19:58,960 --> 00:20:06,430
between the fastest person in the first country against the fastest person in the second,
163
00:20:06,430 --> 00:20:14,680
or the person who can throw the furthest in the first country against the person who can throw the furthest in the second country.
164
00:20:14,680 --> 00:20:21,820
So I would argue a better way to analyse this would be to ask: if you take the population of
165
00:20:21,820 --> 00:20:27,520
a country, how fast is the fastest person in that country likely to be?
166
00:20:27,520 --> 00:20:33,170
Or how far can the strongest person throw the javelin?
167
00:20:33,170 --> 00:20:36,650
So if people's sporting abilities, speed, strength,
168
00:20:36,650 --> 00:20:45,140
stamina, have a normal distribution and are independent of each other, then out of a population of N people,
169
00:20:45,140 --> 00:20:50,530
how fast or strong is the fastest or strongest person likely to be?
170
00:20:50,530 --> 00:20:57,910
And I would argue that this is a better model for the people who enter the Olympics and actually take part in the competition.
171
00:20:57,910 --> 00:21:08,870
So mathematically speaking, we should take N numbers, drawn independently at random from the normal distribution.
172
00:21:08,870 --> 00:21:14,480
So we weight the numbers with the probability that's given by the bell shaped curve, or the normal distribution.
173
00:21:14,480 --> 00:21:21,890
And we'll take that distribution to have mean mu, that's the value at the centre of the distribution, and variance
174
00:21:21,890 --> 00:21:26,990
sigma squared; sigma is the width of the bell shaped curve.
175
00:21:26,990 --> 00:21:35,180
And the question then is: what's the distribution of the largest of these numbers? How large do you expect the largest one to be?
176
00:21:35,180 --> 00:21:38,600
And the answer turns out to be the following one.
177
00:21:38,600 --> 00:21:47,390
So it's given by this equation, which I show you in blue. If you like equations, you'll be able to unpick this very quickly.
178
00:21:47,390 --> 00:21:52,040
But if equations are not quite your thing, let me unpick it for you.
179
00:21:52,040 --> 00:21:56,390
So the equation says the following: to compute what you expect to be the largest of these numbers,
180
00:21:56,390 --> 00:22:04,680
you take the mean, the centre point of the bell shaped curve, and you add on to that something.
181
00:22:04,680 --> 00:22:11,430
You add on something because, of course, the largest is very likely to be larger than the mean.
182
00:22:11,430 --> 00:22:15,420
The amount you add on is proportional to the width of the bell shaped curve.
183
00:22:15,420 --> 00:22:24,910
That's no surprise: if you take a normal distribution with a greater width, you expect to find more large values.
184
00:22:24,910 --> 00:22:30,760
And the thing multiplying the width is the square root of two times the log of the number of samples
185
00:22:30,760 --> 00:22:36,440
we've taken from the distribution: log N. And for those who know their logarithms,
186
00:22:36,440 --> 00:22:46,810
they'll recognise that this depends barely at all on N. The logarithm grows as N increases,
187
00:22:46,810 --> 00:22:52,250
But it does so barely at all. It's one of the slowest growing functions you could imagine.
188
00:22:52,250 --> 00:22:56,890
We actually have here the square root of the logarithm, which grows even more slowly.
189
00:22:56,890 --> 00:23:02,440
Some of you may know that a sort of paradigm of rapid growth is the exponential.
190
00:23:02,440 --> 00:23:08,400
Well, the logarithm is the opposite of that. It's a paradigm of slow growth, if you like.
191
00:23:08,400 --> 00:23:13,890
So this is the answer, or rather, an approximation to the answer.
192
00:23:13,890 --> 00:23:19,410
And it's a beautifully simple formula that tells you how big the extremes are likely to be.
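To see this simple formula in action, here is a small numerical sketch (the sample size and random seed are my own choices): draw N numbers from a normal distribution and compare the observed maximum with mu + sigma * sqrt(2 log N).

```python
import math
import numpy as np

# Draw N samples from a normal distribution with mean mu and width sigma,
# and compare the observed maximum with the simple extreme-value estimate
# mu + sigma * sqrt(2 * log(N)).
rng = np.random.default_rng(1)
mu, sigma, n = 0.0, 1.0, 1_000_000

observed = rng.normal(mu, sigma, size=n).max()
predicted = mu + sigma * math.sqrt(2 * math.log(n))

# For n = 10**6 the estimate is about 5.26; the observed maximum
# typically lands somewhat below it, which is what the slowly decreasing
# correction term in the more accurate formula accounts for.
```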
193
00:23:19,410 --> 00:23:26,310
So this same formula tells us how fast the fastest person is likely to be.
194
00:23:26,310 --> 00:23:32,730
But it also tells us how high the highest wave is that we're likely to encounter in our boat.
195
00:23:32,730 --> 00:23:34,680
The height of the highest wave,
196
00:23:34,680 --> 00:23:46,050
if we assume a normal distribution of wave heights, is proportional to the square root of the log of the number of waves that we encounter.
197
00:23:46,050 --> 00:23:50,610
So taking a much longer journey doesn't dramatically affect the height of the highest
198
00:23:50,610 --> 00:23:56,340
wave you're likely to encounter; it barely affects it at all. Likewise in the quantum problem.
199
00:23:56,340 --> 00:24:06,390
The height of the highest quantum amplitude, the highest peak in the quantum wave, is proportional to the square root
200
00:24:06,390 --> 00:24:10,290
of the log of the total number of peaks that you saw in the picture earlier,
201
00:24:10,290 --> 00:24:16,480
so it's barely affected at all by the number of peaks.
202
00:24:16,480 --> 00:24:23,600
In fact, if you analyse this a little more, you find that typically you expect to get values all the way up to this maximum.
203
00:24:23,600 --> 00:24:29,450
Lots and lots of numbers, and then this is the largest.
204
00:24:29,450 --> 00:24:38,090
And then after this you get nothing. So it's not that you expect to see one outlier dramatically faster than other people.
205
00:24:38,090 --> 00:24:45,950
Or a wave that's dramatically higher than the others. In fact, you expect to see lots of waves up to about the height of the maximum, and then nothing.
206
00:24:45,950 --> 00:24:52,250
Lots of people who can run close to the speed of the fastest person, but then nothing.
207
00:24:52,250 --> 00:24:57,500
And that, of course, accords with your intuition or your experience of the Olympics:
208
00:24:57,500 --> 00:25:01,520
there are lots of people who can run almost as fast as the fastest person.
209
00:25:01,520 --> 00:25:11,150
That's why races in the Olympics are exciting: you don't know exactly who's going to win. So you get lots of people close to the fastest,
210
00:25:11,150 --> 00:25:15,880
not one outlier who's dramatically faster than everybody else.
211
00:25:15,880 --> 00:25:23,180
And so I should say, going back to the Olympics, that I'm not arguing this is how you should model the Olympics;
212
00:25:23,180 --> 00:25:26,770
please don't use this as a basis for betting on the next Olympics.
213
00:25:26,770 --> 00:25:35,720
And of course, what this means is that the dependence on population is really rather small, and the dependence on other factors is far more important:
214
00:25:35,720 --> 00:25:42,800
for example, facilities, a country's gross domestic product, or the tradition of coaching in that country.
215
00:25:42,800 --> 00:25:48,420
These are far more important effects than the population size.
216
00:25:48,420 --> 00:25:54,010
Let me put in the populations just to illustrate how slowly the square root of the logarithm increases.
217
00:25:54,010 --> 00:26:04,290
If you put the population of China into the square root of the logarithm and divide by the square root of the log of the population of Great Britain,
218
00:26:04,290 --> 00:26:17,900
you get an answer very close to one. So barely any change between China and Great Britain in terms of the dependence on population size.
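The arithmetic here is easy to check. A quick sketch (the round population figures are my own choices):

```python
import math

# Rough populations: China about 1.4 billion, Great Britain about 65 million.
china = 1_400_000_000
great_britain = 65_000_000

# Ratio of sqrt(log N) for the two countries: this is how much larger
# the expected extreme is for the bigger population.
ratio = math.sqrt(math.log(china)) / math.sqrt(math.log(great_britain))
# A roughly 20-fold difference in population changes the expected
# extreme by only about 8 percent: the ratio is approximately 1.08.
```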
219
00:26:17,900 --> 00:26:24,170
Now, here's a more accurate answer. This is a better formula, and therefore more complicated.
220
00:26:24,170 --> 00:26:35,600
But let me unpick this for you. This is actually a very accurate formula for the height of the largest of the extremes in a normal distribution.
221
00:26:35,600 --> 00:26:42,590
We have what we had before: the mean mu, the value at the centre of the distribution.
222
00:26:42,590 --> 00:26:49,100
We have the term that we saw earlier, which increases extremely slowly with N.
223
00:26:49,100 --> 00:26:56,930
It's like the square root of log N. And then we subtract off a term which depends not just on the logarithm of N,
224
00:26:56,930 --> 00:27:04,730
but on the logarithm of the logarithm of N. And this term actually decreases as N increases.
225
00:27:04,730 --> 00:27:10,160
So we have a term that's about constant, a term that does increase, but extremely slowly.
226
00:27:10,160 --> 00:27:17,310
And a term that decreases, but extremely slowly. And then there are some small fluctuations, which I don't want to describe in this lecture.
227
00:27:17,310 --> 00:27:23,420
They're beyond the scope of what I want to discuss now. You don't need to remember this formula, or any formula in this talk.
228
00:27:23,420 --> 00:27:29,060
But there's one aspect I would like you to remember, and that's the number here shown in red.
229
00:27:29,060 --> 00:27:37,720
That number I do wish you to remember, because we're going to come back to it a little later.
230
00:27:37,720 --> 00:27:44,530
So we have this very accurate formula, and it describes very accurately the heights of the highest sea waves,
231
00:27:44,530 --> 00:27:49,990
heights of the highest quantum waves, et cetera. Now,
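As a sketch of this accuracy, the standard extreme-value asymptotic for independent standard normal samples (leading sqrt(2 log n) term minus the log-log correction; the constants here are the textbook ones, not read off the slide, and the small fluctuation terms are omitted) can be compared with a simulation.

```python
import math
import numpy as np

def expected_max(n, mu=0.0, sigma=1.0):
    """More accurate estimate for the maximum of n independent normal
    samples: a sqrt(2 log n) term that grows extremely slowly, minus a
    log-log term that decreases extremely slowly (fluctuations omitted)."""
    a = math.sqrt(2 * math.log(n))
    correction = (math.log(math.log(n)) + math.log(4 * math.pi)) / (2 * a)
    return mu + sigma * (a - correction)

# Average the observed maximum of n samples over many independent trials.
rng = np.random.default_rng(2)
n, trials = 100_000, 200
simulated = np.mean([rng.standard_normal(n).max() for _ in range(trials)])

# The simulated average sits slightly above the formula, because the
# omitted fluctuation term has a small positive mean.
```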
232
00:27:49,990 --> 00:27:56,800
you may argue that this analysis I've given you is too simple, because it ignores the
233
00:27:56,800 --> 00:28:04,660
fact that the abilities of people in sporting events aren't independent of each other.
234
00:28:04,660 --> 00:28:17,170
There are dependencies. If we consider the heights of people, one's height is to a good degree dependent on the heights of one's parents:
235
00:28:17,170 --> 00:28:23,050
if you have two very tall parents, you are likely to be taller than average.
236
00:28:23,050 --> 00:28:33,340
If you have two very athletic, fast parents, there's a greater likelihood that you too will be athletic and fast.
237
00:28:33,340 --> 00:28:40,150
How can we build this into a model? Here's an idealisation of a family tree.
238
00:28:40,150 --> 00:28:43,690
It's very idealised, but it just focuses on the essentials.
239
00:28:43,690 --> 00:28:51,250
And we imagine, at the top of the tree, a matriarch: the first generation, so to speak.
240
00:28:51,250 --> 00:28:57,740
And we'll imagine just for the sake of simplicity that that matriarch has two offspring.
241
00:28:57,740 --> 00:29:05,780
Each of those offspring has two offspring, each of their offspring have two offspring, etcetera.
242
00:29:05,780 --> 00:29:12,080
Now, the number two is not at all relevant here; it's just for illustrative purposes that I'm choosing it.
243
00:29:12,080 --> 00:29:15,080
Nor do we have to assume that everyone has the same number of offspring.
244
00:29:15,080 --> 00:29:21,470
Again, I'm just doing this for illustrative purposes to keep the description simple.
245
00:29:21,470 --> 00:29:25,430
So, as we go from a parent to an offspring,
246
00:29:25,430 --> 00:29:32,240
let's imagine that the offspring acquires some characteristic, but doesn't acquire that characteristic perfectly.
247
00:29:32,240 --> 00:29:39,020
There's some variance in the degree to which they acquire that characteristic.
248
00:29:39,020 --> 00:29:47,960
So we'll assume that, as you go from a parent to an offspring, you pick up an attribute drawn from the normal distribution,
249
00:29:47,960 --> 00:29:49,880
The bell shaped curve.
250
00:29:49,880 --> 00:30:02,650
So as you go down the generations, each person picks up from their parent an attribute which is drawn randomly from the bell-shaped curve.
251
00:30:02,650 --> 00:30:09,940
But let's imagine that your net attribute, your net ability, is the average of the
252
00:30:09,940 --> 00:30:16,820
attributes of all of your ancestors going back to the matriarchal figure at the top.
253
00:30:16,820 --> 00:30:23,990
So you see at the bottom there, there is a population of people, and at the bottom of the green line
254
00:30:23,990 --> 00:30:30,440
there is a person, and the attribute they collect is the average of the
255
00:30:30,440 --> 00:30:37,750
attributes acquired through all of the generations going back to the matriarch.
256
00:30:37,750 --> 00:30:45,160
They have a sibling, shown at the bottom of the brown line, and they too pick up attributes
257
00:30:45,160 --> 00:30:49,090
from all of the generations going back and they'll differ from their sibling.
258
00:30:49,090 --> 00:30:58,480
Only in the one attribute they've acquired from that parent, which is drawn at random and independently from the bell shaped curve.
259
00:30:58,480 --> 00:31:04,600
So this is a way of combining inheritance with randomness.
260
00:31:04,600 --> 00:31:10,810
And the question is, if you look at the population at the bottom, what's the distribution of attributes?
261
00:31:10,810 --> 00:31:19,990
Well, a beautiful fact about the normal distribution is that if you average lots of numbers from the normal distribution,
262
00:31:19,990 --> 00:31:26,890
the answer you get still has a normal distribution. So consider the people at the bottom, in that generation shown at the bottom there.
263
00:31:26,890 --> 00:31:35,230
The variation in their attributes will be described by the normal distribution, by the bell-shaped curve, but they're no longer independent.
264
00:31:35,230 --> 00:31:44,890
Because, for example, the person at the bottom of the green line and a sibling have lots of ancestors in common, lots of attributes in common,
265
00:31:44,890 --> 00:31:49,720
so they will be more similar than people who are more distantly related,
266
00:31:49,720 --> 00:31:56,740
and the level of similarity will be determined by their last common ancestor.
267
00:31:56,740 --> 00:32:03,340
What, in this case, do we expect to be the highest of those attributes?
268
00:32:03,340 --> 00:32:11,500
If we have N people in that population, here's a formula for the largest of those N numbers.
269
00:32:11,500 --> 00:32:18,790
And this is a formula that people have been thinking about for the last 20 or 30 years.
270
00:32:18,790 --> 00:32:25,510
So this is a relatively recently discovered formula, and the formula looks remarkably like the one I showed you earlier.
271
00:32:25,510 --> 00:32:38,260
In fact, it's almost identical. So building in dependence via a family tree barely affects
272
00:32:38,260 --> 00:32:45,820
the answer in terms of the size of the maximum, the height of the extreme.
273
00:32:45,820 --> 00:32:55,040
The answer is almost identical. It differs in only one way: the thing that was a one before is now a three.
274
00:32:55,040 --> 00:33:01,610
But that's in a very small term, so the effect of this dependence is very small. It's not that the dependence doesn't matter,
275
00:33:01,610 --> 00:33:10,460
but it matters only at this very low level, in this very small term, a term that gets smaller and smaller as N increases.
276
00:33:10,460 --> 00:33:14,930
But it is there, and the difference is that you go from a one to a three.
277
00:33:14,930 --> 00:33:22,150
Now this three is universal. It doesn't depend on the fact that I assumed two offspring or equal numbers of offspring.
278
00:33:22,150 --> 00:33:34,490
Any time you have a family tree lying behind your data, you expect to get a three and not a one.
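The change from a one to a three can be illustrated numerically. Here is a minimal sketch (my own illustration, not part of the lecture; all parameter choices are arbitrary) comparing the maximum of independent normal draws with the maximum over the leaves of an idealised binary family tree, where each parent-offspring link adds an independent normally distributed attribute and a leaf's value accumulates the attributes along its ancestral line:

```python
import math
import random

def iid_max_prediction(n):
    # Classical extreme-value estimate for the maximum of n independent
    # standard normals: sqrt(2 ln n), minus a small correction whose
    # log log n piece carries the "one" of the independent case.
    ln_n = math.log(n)
    correction = (0.5 * math.log(ln_n) + 0.5 * math.log(4 * math.pi)) / math.sqrt(2 * ln_n)
    return math.sqrt(2 * ln_n) - correction

def tree_max_sample(generations, rng):
    # Idealised binary family tree: every parent-offspring link adds an
    # independent normal attribute, so siblings share all but the last
    # term and the leaf values are strongly correlated.
    values = [0.0]
    for _ in range(generations):
        values = [v + rng.gauss(0.0, 1.0) for v in values for _ in range(2)]
    return max(values)

rng = random.Random(1)
n = 20000
iid_max = max(rng.gauss(0.0, 1.0) for _ in range(n))
tree_max = tree_max_sample(14, rng)  # 2**14 = 16384 correlated leaves
print(round(iid_max_prediction(n), 2), round(iid_max, 2), round(tree_max, 2))
```

In the tree case the analogous prediction keeps the same leading term, but the coefficient of the logarithmic correction changes, the one becoming a three, which pulls the maximum down slightly compared with the same number of independent draws of the same variance.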
279
00:33:34,490 --> 00:33:41,070
So that's a description of some of the
280
00:33:41,070 --> 00:33:44,940
Mathematics that underpins extreme events.
281
00:33:44,940 --> 00:33:55,080
Now, let me tell you about some of the applications, and the first application I want to describe is to the freezing of liquids to form solids.
282
00:33:55,080 --> 00:34:02,100
So any material, if you raise it to a sufficiently high temperature, is a liquid. That means its constituent parts,
283
00:34:02,100 --> 00:34:11,110
the atoms and molecules that make it up, are free to wander around at random and explore various different configurations.
284
00:34:11,110 --> 00:34:22,090
Each configuration carries an energy. And so as you explore various configurations, you're exploring ranges of different energies.
285
00:34:22,090 --> 00:34:28,340
As you lower the temperature, you lower the range of energies that you can explore.
286
00:34:28,340 --> 00:34:36,680
Until you reach the freezing point, when the liquid becomes a solid, where the constituent parts are fixed in space.
287
00:34:36,680 --> 00:34:43,940
And that's because essentially they're stuck at the lowest energy configuration.
288
00:34:43,940 --> 00:34:50,930
So when you're freezing a material, you're finding the lowest energy configuration.
289
00:34:50,930 --> 00:34:58,770
You know, there might be configurations with local minima, but you're finding the very lowest one.
290
00:34:58,770 --> 00:35:01,080
So when a liquid freezes, as the temperature is
291
00:35:01,080 --> 00:35:07,260
lowered, the configuration of atoms and molecules seeks the lowest energy arrangement. In many situations,
292
00:35:07,260 --> 00:35:12,240
This lowest energy arrangement is highly symmetrical and highly ordered.
293
00:35:12,240 --> 00:35:17,990
So, for example, if you melt some salt,
294
00:35:17,990 --> 00:35:26,060
common salt, and then freeze it again: you know that you form crystals, and those crystals are highly ordered and regularly arranged.
295
00:35:26,060 --> 00:35:30,800
And that's because the lowest energy configuration in that case is very well-defined.
296
00:35:30,800 --> 00:35:38,270
There's a clear winner in the lowest energy configuration, and the system manages to find that every time.
297
00:35:38,270 --> 00:35:46,100
And that's why such systems have a very well-defined freezing temperature and why every time you freeze the system,
298
00:35:46,100 --> 00:35:51,470
you get the same configuration in the solid phase.
299
00:35:51,470 --> 00:35:55,970
There are, however, materials where that's not the case.
300
00:35:55,970 --> 00:36:07,190
An example would be a glass: the glass that you see perhaps in the window in your room, or the glass that's in my spectacles.
301
00:36:07,190 --> 00:36:15,680
And in these cases, the energy landscape is vastly more complicated, and there isn't a clear,
302
00:36:15,680 --> 00:36:20,480
obvious winner when it comes to the minimum energy configuration.
303
00:36:20,480 --> 00:36:25,220
In fact, this landscape is so complicated that, rather like the mountain ranges I showed you earlier,
304
00:36:25,220 --> 00:36:32,060
or rather like the surface of the sea or the quantum waves. There are lots of possible different arrangements,
305
00:36:32,060 --> 00:36:40,460
all with more or less the same local minimum energies, and finding the right one, an obvious winner, is difficult.
306
00:36:40,460 --> 00:36:49,670
So when you lower the temperature, the system explores and finds itself in local minima, and you might get a
307
00:36:49,670 --> 00:36:53,570
different minimum each time; you might be stuck in a different minimum each time.
308
00:36:53,570 --> 00:36:57,650
And there's no reason to expect that in the solid phase, the configurations will be the same.
309
00:36:57,650 --> 00:37:08,230
They're certainly not highly ordered. So the question is: why do glasses have relatively well-defined freezing transitions?
310
00:37:08,230 --> 00:37:17,500
Here's a picture, a cartoon if you like, of the energy landscape for a glass, as computed by Chiara Cammarota,
311
00:37:17,500 --> 00:37:22,060
who's an expert on this area of mathematical physics.
312
00:37:22,060 --> 00:37:26,440
And you see, as I said, that there are many possible local minima.
313
00:37:26,440 --> 00:37:33,490
Your system explores this terrain as you lower the temperature and will get stuck in a minimum,
314
00:37:33,490 --> 00:37:41,740
but it may get stuck in a very high-lying minimum and therefore have an energy much higher than the lowest one it could potentially reach.
315
00:37:41,740 --> 00:37:47,790
There are many local minima it can attain, all with different configurations.
316
00:37:47,790 --> 00:37:52,020
Now, this really is a cartoon, because in fact, this isn't two dimensional.
317
00:37:52,020 --> 00:38:00,810
It has many billions of dimensions. And so you have to imagine what this random landscape would look like in an extremely high dimensional space,
318
00:38:00,810 --> 00:38:06,330
not a two-dimensional surface as shown here. Does that help us?
319
00:38:06,330 --> 00:38:16,060
Well, it turns out that in very high dimensional random landscapes, we can get some simplifying features very unexpectedly.
320
00:38:16,060 --> 00:38:23,830
One feature is that the saddle points with higher energies look more like maxima than those with lower energies.
321
00:38:23,830 --> 00:38:26,440
Those with lower energy look more like minima.
322
00:38:26,440 --> 00:38:35,410
So what that means is that the high-lying saddle points have many downward directions and not many upward directions.
323
00:38:35,410 --> 00:38:36,870
And this really helps you.
324
00:38:36,870 --> 00:38:47,590
You might be troubled that your liquid will, as it explores various configurations, end up at a saddle point with a high energy
325
00:38:47,590 --> 00:38:52,510
but which looks very much like a minimum, and might get stuck there for a very long time,
326
00:38:52,510 --> 00:38:56,230
and you might freeze in that configuration, but that doesn't happen.
327
00:38:56,230 --> 00:39:00,550
It turns out that the saddle points are where you may think you would get stuck.
328
00:39:00,550 --> 00:39:07,180
But the high-energy saddle points look more like maxima, so you're more likely to fall down in a cascade of energies
329
00:39:07,180 --> 00:39:15,640
down to lower energies. It turns out that most minima have low energies and, in fact, energies close to the lowest.
330
00:39:15,640 --> 00:39:24,550
Not all of them do. There will be some minima that have high energies, but they're very rare compared to the ones with low energies.
331
00:39:24,550 --> 00:39:31,000
And the lowest energy configuration is only slightly sensitive to the size of the system, as we've already seen.
332
00:39:31,000 --> 00:39:35,830
And so this explains why glasses have a relatively sharp freezing transition.
333
00:39:35,830 --> 00:39:44,680
It's because of this phenomenon that you get lots of local peaks close to the highest peak,
334
00:39:44,680 --> 00:39:49,990
if you're interested in the highest one, or lots of dips close to the lowest dip if you're interested in the lowest, lots
335
00:39:49,990 --> 00:39:57,100
very close to that. And the configuration you end up in may not be the absolute lowest, but its energy is not going to be far away.
336
00:39:57,100 --> 00:40:02,020
And so the freezing temperature is pretty well defined in these systems.
337
00:40:02,020 --> 00:40:12,260
And this explains a puzzle that had been troubling natural scientists for a very long time.
338
00:40:12,260 --> 00:40:20,000
We can apply this understanding to a different problem that's troubling people currently, and this is the problem of machine learning.
339
00:40:20,000 --> 00:40:26,540
How do you train a machine to recognise or categorise images that it has not seen before?
340
00:40:26,540 --> 00:40:37,160
So the idea is you want to show your computer pictures of cats, lots of them and train your computer to recognise what a cat is.
341
00:40:37,160 --> 00:40:41,480
So that if you show it a picture not identical to any of those it has seen already,
342
00:40:41,480 --> 00:40:50,150
It will still recognise it as a cat. So you put lots of data into your computer: lots of pictures of lots of cats.
343
00:40:50,150 --> 00:40:53,510
Each picture of a cat contains lots of data.
344
00:40:53,510 --> 00:41:04,550
There are many attributes to a cat, and many things you can measure that would be reflected in the data that you put into your computer.
345
00:41:04,550 --> 00:41:11,900
So your job here is to take lots of images. Each of them contains lots of information,
346
00:41:11,900 --> 00:41:23,680
and you put it into a computer and try to use this to train your computer to recognise a new image as a cat, as opposed to a dog or a hamster.
347
00:41:23,680 --> 00:41:26,860
How do you do that? Well, you take all the data you've put in,
348
00:41:26,860 --> 00:41:36,140
which now sits in a very high-dimensional space, because we're putting in a lot of data and each point of data contains a lot of information,
349
00:41:36,140 --> 00:41:40,180
lots of parameters to vary. So this is a very high dimensional space.
350
00:41:40,180 --> 00:41:47,140
We input the data and then we try to find a surface that sits as close as possible to all of that data.
351
00:41:47,140 --> 00:41:52,780
So this surface will necessarily be highly complex and lives in a very high dimensional space.
352
00:41:52,780 --> 00:41:59,930
It has to be complex because it has to fit lots of different looking cats. There are lots of varieties of cat.
353
00:41:59,930 --> 00:42:07,100
So we have this very random surface in a very high-dimensional space, and we want it to be as close to all the data that we put in as possible.
354
00:42:07,100 --> 00:42:13,100
So we vary parameters that describe the surface and try to get it to match as closely as possible.
355
00:42:13,100 --> 00:42:17,990
The data that we put in. And what does "as close as possible" mean?
356
00:42:17,990 --> 00:42:28,370
It means that the distance between that surface and the data points that we're putting in has to be as small as possible.
357
00:42:28,370 --> 00:42:33,920
That is, we have to find the lowest minimum of distance.
358
00:42:33,920 --> 00:42:39,260
So the problem of machine learning is exactly like the problem of the freezing of glasses.
359
00:42:39,260 --> 00:42:49,850
We're finding the lowest minimum, and it is therefore exactly the same as finding the extremes of the random surfaces that we discussed earlier.
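As a toy version of this "vary the parameters of the surface until it sits close to the data" idea (a deliberately simple sketch, not any real machine-learning system; the straight-line model and every number here are invented for illustration), one can fit a line to noisy points by gradient descent on the summed squared distance:

```python
import random

# Invented toy data: points scattered around the line y = 2x + 1.
rng = random.Random(0)
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0 + rng.gauss(0.0, 0.1)) for k in range(50)]

# The "surface" here is just a line y = a*x + b with two parameters.
a, b = 0.0, 0.0
lr = 0.01
for _ in range(5000):
    # Gradient of the mean squared distance with respect to a and b.
    ga = sum(2.0 * (a * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2.0 * (a * x + b - y) for x, y in data) / len(data)
    a, b = a - lr * ga, b - lr * gb

print(round(a, 2), round(b, 2))  # should land near the true slope and intercept
```

A real loss surface has millions of parameters rather than two; the minimisation problem is the same in kind, only vastly higher-dimensional.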
360
00:42:49,850 --> 00:42:57,620
And you can analyse it in the same way. So we find the lowest minimum, and this will train the machine in the best possible way,
361
00:42:57,620 --> 00:43:03,260
give it the best possible chance to categorise images that it's not already seen.
362
00:43:03,260 --> 00:43:09,080
So can you find the lowest minimum? Well, this is one of the great challenges of modern computer science.
363
00:43:09,080 --> 00:43:18,860
And every company interested in machine learning has teams working on this problem.
364
00:43:18,860 --> 00:43:26,330
It's the moon-landing problem of machine learning: how to identify the global minimum,
365
00:43:26,330 --> 00:43:33,880
the lowest minimum of the surface that you fit to approximate the data you've put in.
366
00:43:33,880 --> 00:43:37,660
It turns out there are algorithms that do that, and they work very well.
367
00:43:37,660 --> 00:43:44,290
In fact, the big surprise in this area is not whether you can find an algorithm: we have them.
368
00:43:44,290 --> 00:43:49,390
The big surprise is that they work far better than might have been expected.
369
00:43:49,390 --> 00:43:58,930
And this was the great puzzle to resolve in the area. So here's an example of the sort of surface you get in the machine learning problems.
370
00:43:58,930 --> 00:44:03,970
This is taken from a paper, 'Visualizing the Loss Landscape of Neural Nets'.
371
00:44:03,970 --> 00:44:08,570
Neural nets are a description of these surfaces,
372
00:44:08,570 --> 00:44:14,530
the ones one gets in computer science in this area. You see, the surface is highly irregular.
373
00:44:14,530 --> 00:44:21,400
And your job is to find the lowest minimum on the surface, and that's the best approximation to your data.
374
00:44:21,400 --> 00:44:24,250
Here's another illustration of what these surfaces look like.
375
00:44:24,250 --> 00:44:31,960
This is taken from a website, losslandscape.com, where I should say you'll also find films that you can explore,
376
00:44:31,960 --> 00:44:43,240
and you see these surfaces are extraordinarily complex, pockmarked with local minima, with saddles and maxima.
377
00:44:43,240 --> 00:44:47,470
So as you explore the surface, you may well get stuck in the wrong minimum.
378
00:44:47,470 --> 00:44:56,350
You may well get stuck at a saddle for a very long time before you find that it really is just a saddle and not a minimum.
379
00:44:56,350 --> 00:45:04,090
And in these very high-dimensional spaces, I remind you, saddles can look very much like minima.
380
00:45:04,090 --> 00:45:16,810
If this were a surface of a thousand dimensions, in 999 dimensions the saddle might go upwards, and only one direction might go downwards.
381
00:45:16,810 --> 00:45:21,550
So it's very easy to be fooled in these high-dimensional spaces into thinking that a saddle is really a minimum.
382
00:45:21,550 --> 00:45:25,160
So it looks like this would be an intractable problem, but in fact, it isn't.
383
00:45:25,160 --> 00:45:30,700
And the methods that people have developed work extraordinarily well. Why is that?
384
00:45:30,700 --> 00:45:38,310
Well, it turns out that it's the same understanding we've developed all along: in these high-dimensional, highly complex surfaces,
385
00:45:38,310 --> 00:45:48,610
the structure actually works to your benefit, in that the saddles that you see if you're high up look much more like maxima than minima.
386
00:45:48,610 --> 00:45:55,150
And so you're naturally inclined to roll down the landscape with your algorithm and not get stuck,
387
00:45:55,150 --> 00:46:01,270
not get stuck in high-lying minima, because there are relatively few of those.
388
00:46:01,270 --> 00:46:06,790
Instead, you're likely to find a minimum, which is very close to the global minimum.
389
00:46:06,790 --> 00:46:12,190
You may not find the absolute global minimum, but you can quickly get to a minimum that's very close to it.
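A crude numerical illustration of this (my own toy landscape, invented for the purpose; nothing here is a real neural-network loss): run plain gradient descent from several random starting points on a rippled high-dimensional bowl. Every run rolls down into some local minimum whose value is below zero, far under a typical starting value, even though no run is guaranteed to find the global minimum.

```python
import math
import random

DIM = 20  # a modestly high-dimensional toy landscape

def f(x):
    # A smooth bowl with many ripples: lots of local minima, one global.
    return sum(0.1 * xi * xi + math.cos(3.0 * xi) for xi in x)

def grad(x):
    return [0.2 * xi - 3.0 * math.sin(3.0 * xi) for xi in x]

def descend(x, steps=2000, lr=0.01):
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, grad(x))]
    return x

rng = random.Random(0)
finals = []
for _ in range(10):
    start = [rng.uniform(-4.0, 4.0) for _ in range(DIM)]
    finals.append(f(descend(start)))

# A typical starting value is around +10; every run ends below zero.
print([round(v, 2) for v in finals])
```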
390
00:46:12,190 --> 00:46:17,520
And that's good enough for all practical purposes. So this is a good example.
391
00:46:17,520 --> 00:46:26,510
I wanted to discuss. But let me finish with a final example, which has a very different flavour, and this is the Riemann zeta function.
392
00:46:26,510 --> 00:46:29,570
So the Riemann zeta function is a mathematical object,
393
00:46:29,570 --> 00:46:40,820
a surface which is designed to understand the prime numbers, the primes being the numbers divisible only by themselves and one.
394
00:46:40,820 --> 00:46:44,660
So here are the primes up to 100.
395
00:46:44,660 --> 00:46:53,750
And for thousands of years, humankind has been interested in the distribution of these numbers amongst all of the whole numbers.
396
00:46:53,750 --> 00:46:59,420
Are there any patterns amongst the primes? Are there any ways to predict where the next prime will occur,
397
00:46:59,420 --> 00:47:07,130
et cetera? So people have found a way to analyse the distribution of primes, and this involves the Riemann zeta function.
398
00:47:07,130 --> 00:47:12,960
So what's that? Well, we take a number s.
399
00:47:12,960 --> 00:47:22,440
We do the following to it: we take one, plus one over two to the power s, plus one over three to the power s, plus one over four to the power s,
400
00:47:22,440 --> 00:47:29,820
plus one over five to the power s, et cetera. So if s is two, this would be one plus one over two squared,
401
00:47:29,820 --> 00:47:32,820
which is four, plus one over three squared, which is nine,
402
00:47:32,820 --> 00:47:43,440
plus one over four squared, which is 16, et cetera. So for each number s, we can put it into this sum and get an answer out. For some values of s
403
00:47:43,440 --> 00:47:49,920
You have to work a little harder, but I don't want to go down that route. We have a way to get an answer for every time you have.
404
00:47:49,920 --> 00:47:57,000
Now, just to make this a little more complicated, it turns out you don't have to take a number like two or three for s.
405
00:47:57,000 --> 00:48:06,660
You could take a complex number, something that was a combination of a number that we see in the everyday world and the square root of minus one.
406
00:48:06,660 --> 00:48:09,610
So if you understand complex numbers, you'll know what that means.
407
00:48:09,610 --> 00:48:15,300
You put a complex number in, and the value that you get out will typically be a complex number.
408
00:48:15,300 --> 00:48:21,000
If complex numbers aren't quite your thing, think of it this way: s is really two numbers.
409
00:48:21,000 --> 00:48:26,630
You put two numbers into this device and two numbers come out.
410
00:48:26,630 --> 00:48:30,980
So how does this help you? Well, there's a remarkable identity,
411
00:48:30,980 --> 00:48:41,090
due originally to Euler, which says that that sum is in fact equal to a product, and the product is over all of the prime numbers.
412
00:48:41,090 --> 00:48:46,040
So the sum, one plus one over two to the s plus one over three to the s, et cetera,
413
00:48:46,040 --> 00:48:54,260
is identically equal to one over one minus one over two to the s, times one over one minus one over three to the s,
414
00:48:54,260 --> 00:49:00,290
times one over one minus one over five to the s, times the same with seven, times the same with 11, et cetera.
415
00:49:00,290 --> 00:49:08,990
And these are all the prime numbers appearing, and only the prime numbers. So it was realised by Riemann that, just as modern chefs love to do,
416
00:49:08,990 --> 00:49:13,010
you can take a nice dish and deconstruct it, so Riemann realised that you can
417
00:49:13,010 --> 00:49:18,680
deconstruct this formula: if you understand the behaviour of the zeta function,
418
00:49:18,680 --> 00:49:21,900
you can deconstruct it to get information about the prime numbers.
419
00:49:21,900 --> 00:49:28,950
So this is how we understand the primes: via the Riemann zeta function, deconstructed.
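The two sides of Euler's identity can be checked numerically. Here is a small sketch (the truncation lengths are arbitrary choices for illustration) evaluating both the sum over whole numbers and the product over primes at s = 2, where Euler famously showed the answer is pi squared over six:

```python
import math

def zeta_sum(s, terms=200000):
    # The definition from the lecture: 1 + 1/2^s + 1/3^s + 1/4^s + ...
    return sum(1.0 / n ** s for n in range(1, terms + 1))

def zeta_euler(s, limit=1000):
    # Euler's identity: the same value as a product over the primes p
    # of 1 / (1 - 1/p^s).  Primes found with a simple sieve.
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, limit + 1, p):
                sieve[q] = False
    product = 1.0
    for p in range(2, limit + 1):
        if sieve[p]:
            product *= 1.0 / (1.0 - 1.0 / p ** s)
    return product

exact = math.pi ** 2 / 6  # Euler's closed form for s = 2
print(round(zeta_sum(2), 4), round(zeta_euler(2), 4), round(exact, 4))
```

Both truncations agree with the exact value to a few decimal places, which is the numerical shadow of the identity connecting the whole numbers to the primes.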
420
00:49:28,950 --> 00:49:35,250
So what does the Riemann zeta function look like? It's a little hard to plot, because of the fact that complex numbers appear,
421
00:49:35,250 --> 00:49:43,860
but one way to represent it is in terms of colours. So here's a plot in two dimensions, where the coordinates of the points
422
00:49:43,860 --> 00:49:49,290
in this two-dimensional plot are the two input numbers or the one complex number,
423
00:49:49,290 --> 00:49:51,000
if you know what that means.
424
00:49:51,000 --> 00:49:59,180
So you picture the x and y coordinates of a point here, where x and y correspond to the two input numbers, and the output is represented as a colour.
425
00:49:59,180 --> 00:50:02,450
Now, it's not the only way to represent the zeta function,
426
00:50:02,450 --> 00:50:07,590
and I'll give you a different one in a minute, but this is the one that we'll focus on for the moment.
427
00:50:07,590 --> 00:50:16,850
So you have colours at each point, but some points are special: the points where all the colours meet.
428
00:50:16,850 --> 00:50:21,230
And you see a collection of these down at the bottom of the picture.
429
00:50:21,230 --> 00:50:28,510
They lie on a straight line, a horizontal line at the bottom of the plot.
430
00:50:28,510 --> 00:50:34,580
And Riemann realised that these points where all the colours meet are very special,
431
00:50:34,580 --> 00:50:38,980
and in fact, these are the points that really determine the distribution of the prime numbers.
432
00:50:38,980 --> 00:50:44,620
This is where the deconstruction really acts.
433
00:50:44,620 --> 00:50:48,700
He realised as well that there aren't just these special points on the horizontal line at the bottom.
434
00:50:48,700 --> 00:50:53,800
There are some others that lie away from that horizontal line.
435
00:50:53,800 --> 00:51:03,130
And he guessed, he hypothesised, that those points lie on a single straight vertical line.
436
00:51:03,130 --> 00:51:09,550
Now, he wasn't able to prove that; he guessed it, and that was left as a challenge for future generations.
437
00:51:09,550 --> 00:51:12,280
And that's the problem that we call the Riemann hypothesis.
438
00:51:12,280 --> 00:51:18,190
It's that these special points where all the colours meet, other than the points on the horizontal line at the bottom,
439
00:51:18,190 --> 00:51:23,970
in fact lie on a single vertical straight line. And we can't prove that yet.
440
00:51:23,970 --> 00:51:32,910
If you do come up with the proof and you're able to get it accepted in a reputable mathematical journal and accepted by the mathematical community,
441
00:51:32,910 --> 00:51:43,620
you win a million dollars, because this is one of the problems issued by the Clay Mathematics Institute: one of the Clay Millennium Problems.
442
00:51:43,620 --> 00:51:45,910
Here's a different way to represent the Riemann zeta function.
443
00:51:45,910 --> 00:51:54,510
And this is a way due to G.H. Hardy, who was a great mathematician in the early part of the 20th century,
444
00:51:54,510 --> 00:52:02,250
and he thought: what does this look like as you look up the vertical line where you expect these special points to lie?
445
00:52:02,250 --> 00:52:07,950
And he found a way to plot that. It's called the Hardy Z function, and it's the curve in blue there.
446
00:52:07,950 --> 00:52:11,100
So this curve oscillates; it's like a wave.
447
00:52:11,100 --> 00:52:20,100
It passes through zero, and the zeros are precisely the points in the previous plot where all the colours meet, going up vertically.
448
00:52:20,100 --> 00:52:27,810
So the zeros of this curve, the points where it intersects the horizontal axis, are the Riemann zeros,
449
00:52:27,810 --> 00:52:34,170
and these are the points that we identified as the points where the colours meet.
450
00:52:34,170 --> 00:52:37,800
What's the Riemann hypothesis in this setting? Well, it's the statement that for this curve,
451
00:52:37,800 --> 00:52:47,940
if you look beyond 10, to the right of 10 on the horizontal axis, all the maxima will be positive and all the minima will be negative.
452
00:52:47,940 --> 00:52:59,940
So if you find a maximum of this function, of this curve, that lies below the horizontal axis, that disproves the Riemann hypothesis.
453
00:52:59,940 --> 00:53:04,440
So that's one of the great challenges about this, about the Hardy Z function,
454
00:53:04,440 --> 00:53:12,480
this curve here: show that it has no negative maxima and no positive minima, and that proves the Riemann hypothesis.
455
00:53:12,480 --> 00:53:14,880
There's another question that people have asked about this curve.
456
00:53:14,880 --> 00:53:23,160
Again, going back about 100 years, and this is: how big do the oscillations get in the Hardy Z function?
457
00:53:23,160 --> 00:53:30,450
I've plotted the Hardy Z function in the top curve up to 60, and then down below over a much higher range.
458
00:53:30,450 --> 00:53:37,560
You see the oscillations continue and they get more rapid and the curve seems to get a little bigger, but not by very much.
459
00:53:37,560 --> 00:53:48,720
And Lindelöf, about 100 years ago, suggested that perhaps the Hardy Z function increases as you go along the horizontal axis:
460
00:53:48,720 --> 00:53:55,380
the size of the oscillations increases, but as slowly as you can possibly imagine.
461
00:53:55,380 --> 00:54:02,550
I'm not going to state that more precisely. But Lindelöf made a precise guess that said that the Hardy Z function does
462
00:54:02,550 --> 00:54:07,650
increase in the size of its oscillations as you go along the horizontal axis.
463
00:54:07,650 --> 00:54:13,980
but as slowly as you can possibly imagine. Now, people have tried to think about
464
00:54:13,980 --> 00:54:19,080
this, and work on this, for the last hundred years, and hard-won progress has been made.
465
00:54:19,080 --> 00:54:30,420
But we are very far from proving that. So progress has been slow, steady, hard fought, but we're very far from a proof.
466
00:54:30,420 --> 00:54:40,480
However, recently a different question has been put forward very much in this spirit, which it turns out we can answer very precisely.
467
00:54:40,480 --> 00:54:46,000
And this is the question: not how big do these oscillations get all the way out,
468
00:54:46,000 --> 00:54:54,100
some long distance along the horizontal axis, but, if we look over a short distance, what's the largest value of the Hardy
469
00:54:54,100 --> 00:54:59,000
Z function between T and T plus two pi?
470
00:54:59,000 --> 00:55:04,750
It doesn't have to be two pi, it could be any constant, but I'm putting two pi here for illustration.
471
00:55:04,750 --> 00:55:13,420
So how big do you expect the largest value of the Hardy Z function to be, its largest oscillation, in a range of length two pi?
472
00:55:13,420 --> 00:55:20,440
Two pi is about six. Well, as T increases, you get more and more oscillations of the function in this range.
473
00:55:20,440 --> 00:55:28,010
And so we have more local maxima and more local minima. How high is the highest, or how low is the lowest?
474
00:55:28,010 --> 00:55:30,560
Well, here's the answer.
475
00:55:30,560 --> 00:55:42,170
It turns out that this answer is identical to the answer that we saw earlier for the extremes associated with family trees.
476
00:55:42,170 --> 00:55:52,610
Now the three appears, and the terms exactly match those that we saw in the formula for the extremes associated with family trees.
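For the record, the formula being described here is, as I understand it, the Fyodorov–Hiary–Keating prediction (stated from memory and only to the orders shown; the point is the coefficient, not the error term):

```latex
\max_{0 \le h \le 2\pi} \log\Bigl|\zeta\bigl(\tfrac{1}{2} + i(T+h)\bigr)\Bigr|
  \;=\; \log\log T \;-\; \tfrac{3}{4}\,\log\log\log T \;+\; O(1),
```

whereas a model of the same number of independent values with the same Gaussian statistics would give 1/4 in place of 3/4: the same one-becomes-three change produced by the family-tree structure.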
477
00:55:52,610 --> 00:55:58,130
Now the history of this is that this formula was guessed by analogy with the family tree problem.
478
00:55:58,130 --> 00:56:03,650
And in the last few years, people have managed to prove it. So we now have very precise information,
479
00:56:03,650 --> 00:56:08,630
astonishingly precise, just over the last few years, about the extreme values of the Hardy Z
480
00:56:08,630 --> 00:56:14,900
function, and this has come about because we asked the right question in this setting,
481
00:56:14,900 --> 00:56:20,750
a question that does have a very precise answer that we can prove and proving
482
00:56:20,750 --> 00:56:26,300
this relies on showing that there is a family tree like structure to the primes.
483
00:56:26,300 --> 00:56:29,690
And people have been thinking about the primes for a very long time.
484
00:56:29,690 --> 00:56:34,910
But this new understanding has emerged only recently that you can group the primes together in the
485
00:56:34,910 --> 00:56:40,520
way they contribute to the Riemann zeta function in a way that makes them look like a family tree.
486
00:56:40,520 --> 00:56:45,850
And using that fact then allows you to prove this formula. So,
487
00:56:45,850 --> 00:56:51,010
by thinking about extremes, we discover new properties of the primes,
488
00:56:51,010 --> 00:56:56,260
and I should emphasise that it's inconceivable that one would have guessed this formula if
489
00:56:56,260 --> 00:57:03,180
one hadn't been thinking about all the other problems and extremes that I've been describing earlier.
490
00:57:03,180 --> 00:57:06,830
So let me finish with a summary. Questions relating to the highest
491
00:57:06,830 --> 00:57:15,020
maximum or lowest minimum connect many different problems, from water waves, quantum mechanics,
492
00:57:15,020 --> 00:57:21,020
the Olympics through to how glasses freeze, machine learning problems,
493
00:57:21,020 --> 00:57:26,570
the efficacy of algorithms in machine learning, and through to the Riemann zeta function.
494
00:57:26,570 --> 00:57:37,220
And by identifying that thread, we've managed to make progress in recent years that I think wouldn't have been imagined not so long ago.
495
00:57:37,220 --> 00:57:40,550
So the statistics of extreme values shows universal behaviour.
496
00:57:40,550 --> 00:57:51,380
You get the same formula appearing time and again, and this commonality is what allows us to make progress on a broad range of these problems.
497
00:57:51,380 --> 00:57:55,080
So it might seem that we've made good progress, and I think we're pleased that we have.
498
00:57:55,080 --> 00:58:00,140
But I want to emphasise that, going back to the first story I told you, the mountain range,
499
00:58:00,140 --> 00:58:05,390
we're really still in the foothills of this, of this analysis, of this line of research.
500
00:58:05,390 --> 00:58:08,990
It's very clear that there's much more that we don't understand than we do.
501
00:58:08,990 --> 00:58:16,790
And the great challenge now is to take this analysis and make it more precise in each of these separate applications and to get
502
00:58:16,790 --> 00:58:26,410
more and more accurate methods for analysing these problems of extremes in the various contexts I've described to you today.
503
00:58:26,410 --> 00:59:02,109
Thank you.