1
00:00:15,080 --> 00:00:20,239
So I will talk about extreme value statistics. So this first slide is motivation:
2
00:00:20,240 --> 00:00:25,850
why we want to study extreme events. Extreme value statistics is a branch of probability
3
00:00:25,850 --> 00:00:31,159
theory dealing with extreme events, which are typically very rare events.
4
00:00:31,160 --> 00:00:34,250
But when they happen, they can have devastating consequences.
5
00:00:34,670 --> 00:00:38,120
So these are three examples of extreme events which are a big deal.
6
00:00:38,120 --> 00:00:47,750
For instance, in epidemics, a very rare mutation of a virus can lead to a new epidemic wave. Or in finance,
7
00:00:48,450 --> 00:00:51,950
a financial crisis can really affect the world economy.
8
00:00:52,250 --> 00:00:59,239
And finally, it's very important to study rare events in climate studies, because, in the context of climate change,
9
00:00:59,240 --> 00:01:04,639
extreme weather events like heatwaves have become more and more common and more important.
10
00:01:04,640 --> 00:01:13,940
So it's crucial to understand the role played by extreme events and, in particular, their statistical properties.
11
00:01:14,980 --> 00:01:22,960
So let's start with a practical problem. Imagine that you are an engineer and your task is to build a bridge over a river.
12
00:01:24,040 --> 00:01:27,160
So you have at your disposal some data.
13
00:01:27,310 --> 00:01:32,440
So this, for instance, is the water level of the Nile River near Cairo.
14
00:01:33,540 --> 00:01:40,660
Over many centuries. And your goal is to be able to decide, for instance, the height of the bridge.
15
00:01:41,780 --> 00:01:48,140
So one first thing you could do to study this data is to study what the average height of the river is.
16
00:01:49,040 --> 00:01:52,310
And the average value is plotted here by the black line.
17
00:01:53,350 --> 00:01:57,940
And to do that, you have a very strong result from mathematics,
18
00:01:57,940 --> 00:02:01,410
from probability theory, which is the central limit theorem.
19
00:02:03,140 --> 00:02:09,980
And what the central limit theorem is telling you is that if you have a bunch of random variables x1, x2, ..., xn, then
20
00:02:11,270 --> 00:02:14,690
You sum them up. And if you have many variables,
21
00:02:14,690 --> 00:02:22,580
you are guaranteed that the sum of these variables, Sn, will converge to a Gaussian distribution, or normal distribution.
22
00:02:23,570 --> 00:02:28,940
Which is shown here, and it has the typical bell shape that you might have seen many times already.
23
00:02:29,860 --> 00:02:34,600
So to give you a specific example, a numerical example,
24
00:02:35,470 --> 00:02:42,580
if we consider a probability density function p(x), which is telling me that
25
00:02:42,890 --> 00:02:44,680
I can find the variable,
26
00:02:44,680 --> 00:02:52,629
the random variable x, anywhere in the interval [0, 1] with uniform probability, and I will not find this variable anywhere else.
27
00:02:52,630 --> 00:02:57,890
So the probability is zero outside this interval. So it's like a box probability distribution.
28
00:02:57,910 --> 00:03:07,150
So what I can do is that in my computer I can generate many replicas of this random variable, independent of each other.
29
00:03:07,570 --> 00:03:11,320
I sum them up and plot the probability distribution of the sum.
30
00:03:12,410 --> 00:03:16,340
So here, n will be the number of variables.
31
00:03:16,730 --> 00:03:22,550
If I start with just one variable, I have my box distribution, but you can see that as n increases,
32
00:03:22,790 --> 00:03:29,060
the probability of the sum converges very fast to the bell shape, which is the Gaussian distribution.
33
00:03:29,660 --> 00:03:36,110
And the important fact is that this is completely independent of the probability distribution I started from.
34
00:03:36,440 --> 00:03:42,830
So that if I start from some probability distribution which is completely different, like this two-box distribution,
35
00:03:43,890 --> 00:03:49,440
It might take longer to converge to the Gaussian distribution, but the result eventually will be exactly the same.
36
00:03:49,770 --> 00:03:54,510
So if I have many variables, I can guarantee that I will converge to a Gaussian distribution.
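This convergence is easy to check in a few lines. The sketch below is my own (not from the talk), with arbitrary sample sizes: it standardises sums of uniform variables and checks that the first few moments match a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum n uniform([0, 1]) variables, repeated over many replicas.
n, trials = 50, 200_000
s = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)

# Standardise using the exact moments of a single uniform variable:
# mean 1/2, variance 1/12.
z = (s - n * 0.5) / np.sqrt(n / 12.0)

# For a standard Gaussian: mean 0, variance 1, third moment 0.
print(np.mean(z), np.var(z), np.mean(z**3))
```

The printed moments should be close to 0, 1 and 0, as the central limit theorem predicts, even though each summand is far from Gaussian.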
37
00:03:55,690 --> 00:04:00,520
So this is a very important piece of maths: it characterises the average behaviour.
38
00:04:01,510 --> 00:04:05,799
But there is a problem because usually we have a finite amount of data and the central limit
39
00:04:05,800 --> 00:04:11,830
theorem then only applies to deviations from the average value which are small enough.
40
00:04:13,060 --> 00:04:19,810
But in many practical cases, as in the case of the bridge, we don't care about the average behaviour, we care about the extreme events.
41
00:04:20,140 --> 00:04:20,770
For instance,
42
00:04:20,770 --> 00:04:29,829
when the water level is very high because the river is flooding and we want to study what is the statistics of these extreme events like in this case,
43
00:04:29,830 --> 00:04:34,090
the maximal water level over many centuries.
44
00:04:34,090 --> 00:04:40,130
And we would like to find an equivalent theorem
45
00:04:41,210 --> 00:04:44,090
to the central limit theorem, one which applies to extremes.
46
00:04:45,560 --> 00:04:53,260
Which is in some sense a universal result, because we don't know, maybe, exactly how to model the fluctuations of the water level.
47
00:04:53,270 --> 00:04:58,940
So we would like to find a result which is independent of the specific way in which we model our system.
48
00:04:59,720 --> 00:05:07,580
So this is the general motivation. And the setting that I will consider is the following, which can apply to many different systems.
49
00:05:07,880 --> 00:05:16,100
So I have a collection of random variables x1, x2, ..., xn, and the index i in xi indicates time.
50
00:05:16,700 --> 00:05:24,310
So x1 comes before x2, and so on. So this could be, for instance, the water level of a river over many, many years.
51
00:05:24,850 --> 00:05:28,480
And to describe these random variables, I need to write down a model.
52
00:05:29,470 --> 00:05:33,950
And the model is given by their joint probability distribution.
53
00:05:34,600 --> 00:05:38,360
So this function of many variables, p(x1, x2, ..., xn),
54
00:05:39,640 --> 00:05:43,150
It's called the joint probability density function or joint probability distribution
55
00:05:43,150 --> 00:05:48,760
and is telling me what is the probability to observe a specific sequence of events.
56
00:05:48,790 --> 00:05:52,909
of x1, x2, ..., xn. And this is the modelling part.
57
00:05:52,910 --> 00:05:59,059
So I start from a system, I create the model, and the model is encoded in this probability distribution. And this probability
58
00:05:59,060 --> 00:06:04,430
distribution is telling me about the correlations and the interdependencies between different variables.
59
00:06:05,920 --> 00:06:09,280
And then in extreme value statistics, what we usually study, for instance,
60
00:06:09,280 --> 00:06:16,180
is the global maximum of these variables, which we call capital M, the maximal entry of this sequence.
61
00:06:17,020 --> 00:06:22,390
And then what we are asking is: given our model, so given our joint probability distribution,
62
00:06:22,900 --> 00:06:27,350
what can we say about the statistical properties of the maximum m?
63
00:06:29,120 --> 00:06:35,170
There are other quantities that we will study in this presentation.
64
00:06:35,210 --> 00:06:37,940
For instance, the time at which the global maximum occurs.
65
00:06:38,330 --> 00:06:47,180
So we might say we don't really care about how big these extreme events are, but we care about when they will happen in time.
66
00:06:48,510 --> 00:06:53,100
Is it more likely to see the extreme event at the very beginning of the sequence in the middle or at the end?
67
00:06:54,020 --> 00:07:01,460
And a very simple practical application of this is in finance: if you imagine that you need to sell a stock in the stock market,
68
00:07:02,210 --> 00:07:04,670
the best time to do so is when the price is the highest.
69
00:07:05,180 --> 00:07:12,740
So it's a practical problem to understand, typically, or in a simple model, at what time the price will be the highest.
70
00:07:12,770 --> 00:07:18,140
Right. And another quantity which is of interest in extreme value statistics is records.
71
00:07:19,590 --> 00:07:26,940
So almost every day we read in the news about a new record being set, either in sport or, unfortunately, now in climate. And
72
00:07:27,210 --> 00:07:36,270
within a sequence of random variables we will say that an entry is a record if it is larger than all the previous entries.
73
00:07:37,740 --> 00:07:42,840
And this has many applications in climate studies, for instance:
74
00:07:43,170 --> 00:07:48,990
we know that it is particularly important to study the statistical properties of records because, due to climate change,
75
00:07:49,320 --> 00:07:52,470
new records are being set almost every year.
76
00:07:52,950 --> 00:07:59,009
So here you can see the global average temperature as a function of time over many years.
77
00:07:59,010 --> 00:08:03,870
And you can see that there is a clear trend. So new records are being set more often than they should.
78
00:08:04,320 --> 00:08:08,550
Another very common example is that of sports.
79
00:08:09,710 --> 00:08:18,230
So this is the best marathon time for each of the last 20 editions of the Olympics.
80
00:08:19,100 --> 00:08:23,239
So you can see that the time has gone down over the years.
81
00:08:23,240 --> 00:08:25,580
And the red dots indicate records.
82
00:08:26,910 --> 00:08:34,440
So it would be great to have a statistical understanding of how many records we should expect in a given sequence, and so on.
83
00:08:35,310 --> 00:08:36,800
So this is the general set up.
84
00:08:36,810 --> 00:08:42,510
These are the three main quantities that we will talk about: the global maximum, the time of the maximum, and the number of records.
85
00:08:43,780 --> 00:08:47,859
And to make progress, let's start with the simplest possible model.
86
00:08:47,860 --> 00:08:54,100
Which is the one where the variables are independent and identically distributed.
87
00:08:55,240 --> 00:08:58,990
Independent means that there are no correlations between the variables.
88
00:08:59,380 --> 00:09:03,880
So if I know the value of x1, this doesn't tell me anything about the value of x2.
89
00:09:04,900 --> 00:09:15,400
And identically distributed means that these n variables all come from the very same probability distribution, which I will call p(x).
90
00:09:16,270 --> 00:09:17,530
Mathematically speaking,
91
00:09:17,530 --> 00:09:28,960
this means that their joint probability density function p(x1, ..., xn) is just the product of the marginal probability distributions p(x1) p(x2) ... p(xn).
92
00:09:29,290 --> 00:09:34,929
So the probability of observing the whole sequence from x1 to xn is just the probability to observe the first one, times
93
00:09:34,930 --> 00:09:43,590
the second, times the third, and so on. And this might seem like a very simplistic model, but it's a very successful one in physics.
94
00:09:43,620 --> 00:09:47,219
One example is the random energy model by Derrida,
95
00:09:47,220 --> 00:09:53,520
which was used to understand the properties of disordered systems like glassy materials and so on,
96
00:09:53,790 --> 00:09:56,820
and is based on an assumption which is precisely this one.
97
00:09:59,100 --> 00:10:04,770
So we're very lucky because in the case of independent and identically distributed variables, there exists a theorem.
98
00:10:05,750 --> 00:10:13,130
Which is very important. It's called the Extreme Value Theorem, and it is the counterpart of the Central Limit Theorem
99
00:10:14,420 --> 00:10:21,860
for extremes. What this theorem is telling me is that if I take n random variables which are independent
100
00:10:21,860 --> 00:10:27,160
and identically distributed and I want to study the distribution of the maximum m.
101
00:10:29,430 --> 00:10:33,120
this distribution of M cannot be anything.
102
00:10:33,120 --> 00:10:36,840
It can only be one out of three probability distributions.
103
00:10:37,790 --> 00:10:42,940
So in the case of the central limit theorem, we saw that the distribution of the sum of these variables will be Gaussian.
104
00:10:44,030 --> 00:10:49,580
In this case, we know that the distribution of the maximum will be either Gumbel, Fréchet, or Weibull,
105
00:10:50,770 --> 00:10:55,570
Depending on how the marginal probability of a single variable behaves.
106
00:10:56,080 --> 00:11:00,970
So p(x) is the probability of observing a value x for a single variable;
107
00:11:01,270 --> 00:11:08,560
if this probability decays exponentially or faster, I will be in the Gumbel universality class, which is this continuous blue line.
108
00:11:09,980 --> 00:11:17,240
If instead the probability decays as a power law, with a fat tail, so it's more likely to see very big numbers,
109
00:11:17,240 --> 00:11:20,830
then I will be in the Fréchet universality class.
110
00:11:21,650 --> 00:11:28,240
And finally, if there is an upper bound. So there is a maximal value that my random variables can take.
111
00:11:28,240 --> 00:11:34,390
I will be in the Weibull universality class. So let's consider just the first one for simplicity:
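To make the classification concrete before specialising to Gumbel, here is a small sketch of my own (arbitrary parameters, not from the talk) showing the Fréchet case: for variables with tail P(X > x) = x^(-alpha), the maximum rescaled by n^(1/alpha) has CDF exp(-t^(-alpha)).

```python
import numpy as np

rng = np.random.default_rng(1)

# Heavy-tailed samples via inverse transform: if U is uniform(0, 1),
# then X = U**(-1/alpha) has tail P(X > x) = x**(-alpha) for x >= 1.
alpha, n, trials = 2.0, 500, 20_000
x = rng.uniform(size=(trials, n)) ** (-1.0 / alpha)

# Rescaling the maximum by n**(1/alpha) should give the Frechet law,
# whose CDF is exp(-t**(-alpha)).
m = x.max(axis=1) / n ** (1.0 / alpha)

for t in (0.5, 1.0, 2.0):
    print(t, float(np.mean(m <= t)), float(np.exp(-t ** -alpha)))
```

The empirical CDF of the rescaled maximum should track the Fréchet CDF at each test point; swapping in an exponentially decaying or bounded distribution would instead land in the Gumbel or Weibull class.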
112
00:11:35,020 --> 00:11:41,440
the Gumbel distribution, which is the one you will converge to if you start from random variables which
113
00:11:41,650 --> 00:11:46,330
have a distribution which decays exponentially or faster for large values.
114
00:11:46,360 --> 00:11:55,850
And the distribution of the maximum in this case is a very, very simple formula, which is this one: exp(-m - exp(-m)).
115
00:11:56,730 --> 00:12:02,700
And it is shown here. So it has this kind of bell shape, but it is skewed on one side.
116
00:12:02,910 --> 00:12:07,650
So one of the tails, the right one, decays slower than the other.
117
00:12:09,270 --> 00:12:15,340
So let's again do a numerical example, as we did before for the central limit theorem.
118
00:12:16,350 --> 00:12:21,480
So in this case, I consider a probability distribution, which is exponentially decaying.
119
00:12:23,170 --> 00:12:26,670
So we expect to end up in the Gumbel universality class.
120
00:12:26,680 --> 00:12:31,780
So again, what I do in my computer is that I generate many replicas of these random variables,
121
00:12:32,650 --> 00:12:37,650
and I compute the maximum and I generate a histogram for the probability of the maximum.
122
00:12:38,170 --> 00:12:48,310
And you will see that, as n increases, the probability distribution of the maximum converges to the universal Gumbel law.
123
00:12:49,630 --> 00:12:55,750
And it is universal because if I start from a completely different distribution like this one with two different peaks.
124
00:12:57,770 --> 00:13:04,640
Maybe it will take longer, but if n is large enough, I will end up again at exactly the same probability distribution.
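The Gumbel experiment described here can be scripted in a few lines. This is my own sketch (the exponential distribution and sizes are arbitrary choices): the maximum of n exponential variables, centred by log(n), should follow the Gumbel CDF exp(-exp(-m)).

```python
import numpy as np

rng = np.random.default_rng(2)

# Maximum of n exponential(1) variables, repeated over many replicas.
# Centring by log(n) should leave a Gumbel-distributed variable.
n, trials = 200, 20_000
m = rng.exponential(1.0, size=(trials, n)).max(axis=1) - np.log(n)

# Compare the empirical CDF with the Gumbel CDF exp(-exp(-x)).
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, float(np.mean(m <= x)), float(np.exp(-np.exp(-x))))
```

The empirical and exact CDF values should agree at every test point; starting from a different exponentially decaying distribution would converge to the same law, which is the universality the talk emphasises.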
125
00:13:06,000 --> 00:13:10,340
So at this point you might wonder: I told you about many numerical examples,
126
00:13:10,350 --> 00:13:13,410
but these are just synthetic data that I generate on my laptop.
127
00:13:13,800 --> 00:13:15,930
Is this at all useful for real data?
128
00:13:16,350 --> 00:13:24,870
And to answer this question, I have to tell you a little piece of the history of Oxford, and in particular about the Radcliffe Observatory,
129
00:13:25,290 --> 00:13:31,080
which was the University's observatory from 1773 to 1934.
130
00:13:31,770 --> 00:13:35,250
And if you want to see it, it is just about a five-minute walk from here.
131
00:13:35,940 --> 00:13:44,040
And the astronomers of the observatory in the 18th century, they started to collect weather data for a specific reason,
132
00:13:44,550 --> 00:13:51,900
which was that refraction, which is influenced by atmospheric conditions, can affect astronomical measurements.
133
00:13:52,380 --> 00:14:02,400
So they had to have a precise understanding of the local weather conditions in order to ensure that their measurements were accurate.
134
00:14:03,640 --> 00:14:08,230
So they started collecting data about temperature and pressure and other things as well.
135
00:14:08,740 --> 00:14:13,450
And they are still doing that. So this data collection has continued for more than two centuries.
136
00:14:14,200 --> 00:14:22,390
They have the longest running record of temperature and rainfall data for a single site in Britain running continuously from 1813.
137
00:14:23,660 --> 00:14:26,720
So you can see here a picture of their data.
138
00:14:27,080 --> 00:14:35,540
And here you can see a handwritten entry of this data set from November 14th, 1813.
139
00:14:36,550 --> 00:14:41,790
And you can see that there is some data about temperature, about pressure, wind, rain.
140
00:14:42,220 --> 00:14:47,680
And apparently that day was very dark and rainy, not surprisingly.
141
00:14:49,990 --> 00:14:55,510
So today these ledgers have been digitised, and they are freely accessible on the Internet.
142
00:14:55,600 --> 00:15:02,950
So what I did is that I just went and downloaded the whole data set. And, for instance, I wanted to study extreme value statistics,
143
00:15:02,950 --> 00:15:09,490
so I took the maximal temperature in October in Oxford for every year in the last 200 years.
144
00:15:10,270 --> 00:15:16,810
And this is what is plotted here. So each dot is the maximum temperature registered in Oxford in October.
145
00:15:17,380 --> 00:15:21,250
And you can see that I mean, you cannot tell too much from this data, to be honest.
146
00:15:21,250 --> 00:15:24,700
But what you can tell is that 2011 had a very hot October.
147
00:15:25,060 --> 00:15:32,770
And the other thing is that if you build a histogram from this data, this is what you get.
148
00:15:34,880 --> 00:15:37,970
And this is surprisingly close to the Gumbel distribution.
149
00:15:38,450 --> 00:15:45,589
And you can fit the Gumbel pretty easily to this data. So even though these data are not independent and identically distributed,
150
00:15:45,590 --> 00:15:53,060
this theory still tells us something useful about the data, and it can predict the probability of rare events.
151
00:15:54,760 --> 00:16:00,760
So the other thing I want to tell you about in the context of independent variables is about records.
152
00:16:02,190 --> 00:16:11,070
So again, I consider independent variables, and I want to answer this question: given a sequence of independent random numbers,
153
00:16:11,850 --> 00:16:13,800
How many records do we expect to see?
154
00:16:16,120 --> 00:16:21,800
So it's a very practical question, and we will be able to answer it and to compute this quantity exactly within this slide.
155
00:16:21,820 --> 00:16:28,479
So it's a very simple computation. The first thing we have to observe is that we want to compute what is
156
00:16:28,480 --> 00:16:33,630
the probability that the i-th variable that I observe is a record.
157
00:16:35,520 --> 00:16:41,429
So if xi is a record, it means that it is the biggest variable so far. And these variables are independent,
158
00:16:41,430 --> 00:16:44,220
so any of them could be the maximum with equal probability.
159
00:16:44,700 --> 00:16:49,950
So, since I have observed i variables, the probability that the last one is the largest is just 1/i,
160
00:16:51,430 --> 00:17:00,090
because it has to be uniform and it has to sum to one. So the average number of records I can obtain just by summing over i.
161
00:17:01,870 --> 00:17:08,080
So the average number of records is the sum over i of the probability that that particular time is a record.
162
00:17:09,080 --> 00:17:11,120
So it's the sum over i of 1/i,
163
00:17:11,150 --> 00:17:19,180
where i goes from 1 to n. And if I approximate the sum with an integral, which I can do if n is large,
164
00:17:19,820 --> 00:17:23,450
I get that the average number of records is growing as the log of n.
165
00:17:25,240 --> 00:17:28,700
And let me point out that this is also universal.
166
00:17:29,110 --> 00:17:32,670
So this doesn't depend on the particular distribution of the single variables.
167
00:17:33,740 --> 00:17:39,340
It's a very robust result. And this is what you would expect:
168
00:17:39,730 --> 00:17:45,700
So if you observe, for instance, 20 random variables, you would expect around three or four records.
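The 1/i argument is easy to test numerically. This is a minimal sketch of my own (arbitrary sample sizes): the mean number of records should match the harmonic sum of 1/i, which grows like log(n), for any continuous distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

n, trials = 20, 100_000

def mean_records(samples):
    # An entry is a record when it exceeds all earlier entries; for
    # continuous variables that is exactly where it equals the running max.
    running_max = np.maximum.accumulate(samples, axis=1)
    return float(np.mean(np.sum(samples == running_max, axis=1)))

# Exact mean number of records: sum of 1/i, roughly log(n) for large n.
harmonic = sum(1.0 / i for i in range(1, n + 1))

u = mean_records(rng.uniform(size=(trials, n)))      # uniform variables
e = mean_records(rng.exponential(size=(trials, n)))  # different law, same answer
print(u, e, harmonic)  # all about 3.6 for n = 20
```

Both distributions give the same mean record count, which is the universality claimed in the talk: the result does not depend on the distribution of the single variables.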
169
00:17:46,180 --> 00:17:48,610
So it's a very slow growth. And this makes sense because,
170
00:17:50,350 --> 00:17:56,830
later in time it will be harder to set a new record, because the last record will be higher.
171
00:17:57,640 --> 00:18:02,230
And if you try to apply this to the sports data that I showed before.
172
00:18:03,160 --> 00:18:06,399
We had that in the last 20 editions of the Olympics.
173
00:18:06,400 --> 00:18:13,740
There have been seven marathon records. And so the independent theory doesn't work in this case.
174
00:18:14,130 --> 00:18:17,490
We would expect three or four and we get seven. So it's very off.
175
00:18:17,850 --> 00:18:27,790
And if we apply the same idea to the temperature data that I showed before, it still doesn't work.
176
00:18:27,810 --> 00:18:32,670
So in the last 200 years, there are nine records, but the log of 200 is like around five.
177
00:18:33,610 --> 00:18:42,210
So the independent theory is very useful as a benchmark, and it works in some cases,
178
00:18:42,870 --> 00:18:50,370
but it has limitations, because in the real world there are correlations, and you often need to include them in the model.
179
00:18:50,850 --> 00:18:56,429
So what we want to do now is to include correlations in the model and to get a more
180
00:18:56,430 --> 00:19:00,840
complicated model which takes into account that different variables are not independent.
181
00:19:02,490 --> 00:19:07,229
The simplest model that you can consider with correlations is the weakly correlated model.
182
00:19:07,230 --> 00:19:11,930
And so, first of all, let me say that in general there's no general technique to study correlated systems.
183
00:19:11,940 --> 00:19:18,510
So we have to go on a case by case basis. And the simplest model that you can consider is the one where correlations are weak.
184
00:19:19,980 --> 00:19:28,800
What does that mean? The quantity that I have on the left-hand side is the correlation between variable xi and variable xj.
185
00:19:29,750 --> 00:19:36,770
And what you have to know is that this number will be zero if the variables are independent, and it will be nonzero, positive or negative,
186
00:19:37,040 --> 00:19:40,190
if these variables are correlated.
187
00:19:40,850 --> 00:19:46,040
So if I assume that the correlations are decaying in time exponentially fast,
188
00:19:47,130 --> 00:19:50,930
over a typical timescale, which is the correlation timescale, tau say,
189
00:19:52,500 --> 00:20:00,420
what I will have is that two random variables which are farther apart in time than tau
190
00:20:01,380 --> 00:20:06,380
They are basically independent. So I can still make progress using the independent theory.
191
00:20:06,770 --> 00:20:10,030
And to do that, imagine that I have a very, very long sequence.
192
00:20:11,110 --> 00:20:17,320
What I can do is divide this sequence into different intervals of size tau,
193
00:20:18,470 --> 00:20:24,020
similarly to what is done in statistical physics with the canonical ensemble argument.
194
00:20:25,230 --> 00:20:31,680
And since the correlations decay over this time, the different intervals are almost independent.
195
00:20:33,510 --> 00:20:38,070
So if I define the maximum within each interval.
196
00:20:39,140 --> 00:20:43,280
these maxima, m1 to mk, will be independent random variables.
197
00:20:44,350 --> 00:20:47,860
And I can still apply the theory of independent and identically distributed
198
00:20:47,860 --> 00:20:52,600
random variables, because the global maximum is just the maximum of the maxima.
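The bookkeeping behind this block argument is just the identity that the global maximum equals the maximum of the per-block maxima. A tiny sketch of my own (the block size tau is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# A long sequence divided into blocks of size tau; the global maximum
# equals the maximum over the per-block maxima.
tau, blocks = 100, 50
x = rng.normal(size=tau * blocks)

block_maxima = x.reshape(blocks, tau).max(axis=1)
print(x.max() == block_maxima.max())  # True
```

For weakly correlated data, once tau exceeds the correlation time the block maxima are nearly independent, so the iid extreme value theory can be applied to them.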
199
00:20:53,960 --> 00:21:01,340
So I can still use the theory that I showed before. But the problem is that when correlations do not decay exponentially in time,
200
00:21:01,520 --> 00:21:05,720
so when we have strongly correlated random variables, this doesn't work anymore.
201
00:21:07,110 --> 00:21:16,620
So here I wanted to present just three examples of systems with strongly correlated random variables which have been studied quite a lot in physics.
202
00:21:17,220 --> 00:21:26,370
So the first one is random matrices, which have been used a lot in physics to describe the complicated Hamiltonians of heavy nuclei.
203
00:21:26,850 --> 00:21:33,900
So the basic idea there is that this Hamiltonian is so complicated that you don't know exactly what it will look like.
204
00:21:34,290 --> 00:21:38,080
So you approximate it with a random matrix. And it actually works.
205
00:21:38,680 --> 00:21:44,559
And in this case, for instance,
206
00:21:44,560 --> 00:21:50,190
the maximal eigenvalue of the Hamiltonian has been studied quite a lot in the context of extreme value statistics.
207
00:21:50,800 --> 00:21:53,380
The second one is fluctuating interfaces,
208
00:21:54,490 --> 00:22:01,660
Which have been used a lot to describe the interface of growing colonies of bacteria or growing tumours.
209
00:22:02,170 --> 00:22:08,200
And it's quite important to study the statistical properties of these interfaces. In the context of extreme value statistics,
210
00:22:08,350 --> 00:22:13,450
for instance, the maximal height of an interface has been studied, and it is a very crucial quantity.
211
00:22:13,900 --> 00:22:18,880
And finally, random walks, which will be the focus of the rest of my talk.
212
00:22:20,690 --> 00:22:28,860
So let me define, first of all, what a random walk is. Here I am plotting the position of the walker over time.
213
00:22:28,870 --> 00:22:32,380
So xk is the position of the random walker as a function of k.
214
00:22:32,410 --> 00:22:39,700
It's a motion in one dimension. And the evolution of the position satisfies a very simple rule,
215
00:22:40,150 --> 00:22:46,750
which is telling me that the position at step k is equal to the position at the previous step, k - 1, plus a jump,
216
00:22:47,790 --> 00:22:49,080
which I will call eta k.
217
00:22:50,490 --> 00:22:56,610
So I will assume that the jumps of this random walker are again independent and identically distributed random variables.
218
00:22:56,910 --> 00:23:03,780
But now the positions xk are strongly correlated, because if you write down the joint probability distribution,
219
00:23:04,080 --> 00:23:05,670
it will have a more complicated form,
220
00:23:05,670 --> 00:23:12,480
which is not just the one of independent variables, and there are actually strong correlations in this model which do not decay exponentially in time.
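These non-decaying correlations can be seen directly. For a walk with iid zero-mean, unit-variance jumps, Cov(xi, xj) = min(i, j), so Corr(xi, xj) = sqrt(min(i, j) / max(i, j)), which does not decay exponentially in |i - j|. A small check of my own (arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(5)

# Positions of a random walk: cumulative sums of iid Gaussian jumps.
n, trials = 100, 50_000
paths = np.cumsum(rng.normal(size=(trials, n)), axis=1)

# For zero-mean unit-variance jumps, Cov(x_i, x_j) = min(i, j), so
# Corr(x_i, x_j) = sqrt(min(i, j) / max(i, j)) -- no exponential decay.
i, j = 10, 90  # 1-based step indices
xi, xj = paths[:, i - 1], paths[:, j - 1]
empirical = float(np.corrcoef(xi, xj)[0, 1])
print(empirical, np.sqrt(i / j))  # both about 0.333
```

Even positions 80 steps apart remain substantially correlated, which is why the block argument for weak correlations fails for random walks.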
221
00:23:14,360 --> 00:23:21,010
So, talking about random walks: as you probably know, Oxford is full of beautiful pubs, and many of them,
222
00:23:21,020 --> 00:23:24,610
I mean some of them, like the one in this picture, are just next to a river.
223
00:23:25,900 --> 00:23:35,560
So imagine that there is a student who, after a few pints, wants to go home but is drunk.
224
00:23:36,540 --> 00:23:39,570
So he will move with random steps.
225
00:23:41,110 --> 00:23:43,450
Either towards the river or away from it.
226
00:23:45,460 --> 00:23:53,410
So my question for you is: after n steps, what is the probability that this drunk student has fallen into the river?
227
00:23:55,140 --> 00:24:01,020
We can answer this question very precisely by modelling the motion of the student as a random walk.
228
00:24:01,890 --> 00:24:08,370
So now the river is this red line that I show here, which corresponds to x = 0.
229
00:24:09,620 --> 00:24:12,950
And what we want to study is the survival probability.
230
00:24:12,960 --> 00:24:15,120
So the probability that the student will survive.
231
00:24:16,010 --> 00:24:25,050
Which I will call Q_n. And Q_n is just the probability that x1 > 0, x2 > 0, and so on up to xn > 0,
232
00:24:25,790 --> 00:24:28,370
Given that the starting position was zero.
233
00:24:30,120 --> 00:24:40,980
And of course this probability will be, in principle, a complicated function of the probability distribution of the steps, p(eta).
234
00:24:41,580 --> 00:24:44,700
So if I take steps which are drawn from one distribution,
235
00:24:44,700 --> 00:24:50,490
I would in principle get something different than if I draw the steps from, say, a uniform probability distribution.
236
00:24:51,540 --> 00:24:58,020
And in this formula you don't need to understand the details; these thetas are Heaviside step functions, which are one
237
00:24:58,140 --> 00:25:01,770
If the argument is positive and zero otherwise. But this doesn't really matter.
238
00:25:01,780 --> 00:25:05,340
What matters is that Q_n is a complicated function of p.
239
00:25:06,630 --> 00:25:11,170
And p can be anything, really. So I would expect Q_n to depend on it.
240
00:25:12,240 --> 00:25:19,800
But the very surprising result, which is known as the Sparre Andersen theorem, is that Q_n is completely universal once again.
241
00:25:20,010 --> 00:25:24,390
So it is completely independent of p(eta). And this is true for any n.
242
00:25:24,420 --> 00:25:31,320
So in this case it's not true only for large values of n; it is true for any value of n, and it is given by this very simple formula.
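This universality is easy to check numerically. The sketch below is my own (arbitrary sizes); the Sparre Andersen result Q_n = C(2n, n) / 4^n applies to jump distributions that are symmetric and continuous, so I compare two such distributions against the exact formula.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(6)

# Survival probability: the walk, started at 0, stays strictly positive
# for all n steps. Sparre Andersen: Q_n = C(2n, n) / 4**n for any
# symmetric continuous jump distribution.
n, trials = 8, 200_000

def survival(jumps):
    paths = np.cumsum(jumps, axis=1)
    return float(np.mean(np.all(paths > 0, axis=1)))

q_gauss = survival(rng.normal(size=(trials, n)))           # Gaussian jumps
q_unif = survival(rng.uniform(-1.0, 1.0, size=(trials, n)))  # uniform jumps
q_exact = comb(2 * n, n) / 4**n

print(q_gauss, q_unif, q_exact)  # all about 0.196 for n = 8
```

Both jump distributions give the same survival probability, matching the exact formula at finite n, not just asymptotically.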
243
00:25:32,510 --> 00:25:37,970
So Sparre Andersen was a Danish mathematician, and he proved this in 1953-54.
244
00:25:38,480 --> 00:25:42,060
And the formula is surprisingly simple.
245
00:25:43,040 --> 00:25:48,260
So many people thought that the proof of this formula should be simple as well.
246
00:25:49,130 --> 00:25:55,190
But the original proof by Sparre Andersen is a rather complicated combinatorial proof.
247
00:25:56,000 --> 00:25:59,809
So there have been many attempts to prove this formula in a simpler way.
248
00:25:59,810 --> 00:26:04,130
But they came up with different proofs, which were even more complicated than the original one.
249
00:26:04,670 --> 00:26:08,480
So let us now plot Q_n as a function of n, so
250
00:26:08,510 --> 00:26:11,980
the probability that the student has survived for n steps.
251
00:26:13,140 --> 00:26:18,900
You can see that initially the probability is one half, because the student is starting just on the edge of the river.
252
00:26:19,200 --> 00:26:25,800
So if the first step goes in the wrong direction, he will immediately fall into the river. And then the probability decreases to zero:
253
00:26:26,580 --> 00:26:29,010
So if you wait long enough, the student will fall.
254
00:26:29,460 --> 00:26:39,570
I'm telling you about the survival probability because it is a crucial quantity in extreme value statistics,
255
00:26:39,900 --> 00:26:46,860
because it can be used as a building block to study more complicated quantities like the distribution of the time of the maximum.
256
00:26:47,250 --> 00:26:53,160
And this is what I will tell you about. So let's consider once again a random walk of this type.
257
00:26:54,380 --> 00:26:59,780
I let the random walker evolve for n steps, and here is what I want to study:
258
00:27:00,080 --> 00:27:04,460
I want to know, what is the probability distribution of the time of the maximum, t_max?
259
00:27:05,520 --> 00:27:10,710
So what is the probability that the time of the maximum t_max is either at the beginning, or at the end, or in the middle, for instance?
260
00:27:12,180 --> 00:27:18,570
So we will be able to compute this distribution exactly by using the result for the survival probability.
261
00:27:19,080 --> 00:27:22,410
So the first observation that we need to make is the following.
262
00:27:24,390 --> 00:27:28,890
Almost by definition, the random walker cannot go above its maximum.
263
00:27:29,990 --> 00:27:36,290
It's almost tautological. So the random walker cannot cross the red barrier, which is the maximum.
264
00:27:36,920 --> 00:27:40,610
Now, let's split this trajectory into two parts.
265
00:27:41,180 --> 00:27:46,700
So the first part is from time zero to time t_max. And the second part is from time t_max to time n.
266
00:27:48,310 --> 00:27:56,860
So what's going on in the second part? The random walker starts from the maximum, and it has to stay below the maximum for some number of steps.
267
00:27:57,730 --> 00:28:00,730
And this is precisely what the survival probability is telling you.
268
00:28:01,670 --> 00:28:07,760
So the probability of this part of the trajectory is precisely this survival probability.
269
00:28:09,800 --> 00:28:14,540
In this case, it is q_{n - t_max}, because there are n - t_max steps on that side.
270
00:28:15,460 --> 00:28:21,080
And in the first part, I can apply a very similar argument.
271
00:28:21,100 --> 00:28:31,379
I just have to go back in time. So if I start from the maximum, the random walker has to stay below the maximum for t_max steps.
272
00:28:31,380 --> 00:28:37,440
So the probability of the first part is just the survival probability q_{t_max} for t_max steps.
273
00:28:38,520 --> 00:28:41,970
And the probability distribution of the time of the maximum is just the product of these two probabilities.
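In formulas, writing q_k for the survival probability, the argument gives P(t_max = k) = q_k * q_{n - k}. A short sketch of this distribution, using the Sparre Andersen expression q_k = C(2k, k) / 4^k quoted earlier; the choice n = 20 is my own:

```python
import math

def q(k):
    # Sparre Andersen survival probability of a symmetric continuous-jump walk
    return math.comb(2 * k, k) / 4 ** k

def p_tmax(n):
    # Distribution of the time of the maximum: survival before t_max
    # (running time backwards) times survival after t_max.
    return [q(k) * q(n - k) for k in range(n + 1)]

p = p_tmax(20)
print(sum(p))       # the products already sum to 1: no normalisation needed
print(p[0], p[10])  # the two edges carry much more probability than the middle
```

The U shape of these weights is the discrete version of the arcsine law: the maximum is most likely found near the beginning or near the end of the walk.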
274
00:28:42,330 --> 00:28:49,430
Very simple. Okay. So if you remember that q_n is a universal quantity,
275
00:28:50,270 --> 00:28:58,420
this is telling us that the distribution of t_max is also universal: it doesn't depend on the specific way in which I model my system.
276
00:28:59,940 --> 00:29:04,380
If I plot this quantity, the probability distribution of the time of the maximum,
277
00:29:06,140 --> 00:29:08,840
it looks something like this in the limit of large n.
278
00:29:09,740 --> 00:29:17,660
So what I can understand from this plot is that it is more likely to find the maximum either at the very beginning
279
00:29:18,770 --> 00:29:21,830
or at the very end of the interval that I'm considering.
280
00:29:22,880 --> 00:29:28,640
Because the probability is diverging; actually, the probability is going to infinity in the limit of large n
281
00:29:28,640 --> 00:29:34,460
when t_max is going to zero, or t_max is going to n, which is the final time.
282
00:29:35,000 --> 00:29:41,270
And this is telling me that there are way more trajectories which are either increasing like this, reaching
283
00:29:41,270 --> 00:29:46,490
the maximum at the final time, or decreasing like this, reaching the maximum at the very beginning.
284
00:29:47,570 --> 00:29:54,470
So you might think that, if we model the evolution of a price in the stock market just as a random walk,
285
00:29:55,510 --> 00:29:59,770
the best time to sell a stock is either in the morning or late in the afternoon.
286
00:30:00,580 --> 00:30:05,050
But before you do that, you have to keep in mind the distribution of the time of the minimum,
287
00:30:06,080 --> 00:30:15,170
which is exactly the same. So you have to be very careful in describing the stock market as a random walk.
288
00:30:15,260 --> 00:30:24,380
It's more complicated than that. So in the last few minutes of my presentation, I want to tell you about some new results.
289
00:30:24,440 --> 00:30:28,400
So far, I have presented some classical results of extreme value statistics.
290
00:30:29,480 --> 00:30:34,520
So I would like to tell you about some more recent developments, and some of my research in this field,
291
00:30:34,970 --> 00:30:42,380
which is done in the context of active particles. So, first of all, let me tell you what active matter and active particles are.
292
00:30:42,890 --> 00:30:52,010
So active matter describes systems, like colonies of bacteria, which are composed of many individual units.
293
00:30:53,360 --> 00:31:01,520
And these units are able to absorb energy from the environment, through food, for instance, converting this energy into some form of work.
294
00:31:01,610 --> 00:31:08,600
In the case of the bacteria, this work is just persistent motion, and this is very different from what is usually considered in physics,
295
00:31:09,020 --> 00:31:18,970
for instance, Brownian motion. The bacteria move in a persistent way, while a Brownian particle just moves in a random way,
296
00:31:20,210 --> 00:31:23,870
due to collisions with the molecules of the surrounding fluid.
297
00:31:23,870 --> 00:31:34,070
And crucially, these bacteria are out of equilibrium, because they are continuously consuming energy, while Brownian motion is at equilibrium.
298
00:31:34,370 --> 00:31:42,650
So we have a lot of tools and techniques from thermodynamics and statistical mechanics to describe equilibrium systems.
299
00:31:43,490 --> 00:31:47,430
But all of these techniques do not apply to non-equilibrium systems.
300
00:31:48,020 --> 00:31:53,030
And the reason for that is that there is a continuous absorption of energy.
301
00:31:53,420 --> 00:31:56,900
And in other words, active matter is alive. Passive matter is dead.
302
00:31:57,530 --> 00:32:02,189
And it's crucial to understand the statistical properties of active particles.
303
00:32:02,190 --> 00:32:09,480
And this is what I did during my studies, by considering the run-and-tumble particle model,
304
00:32:10,430 --> 00:32:18,650
which describes, in a very simplified way, the motion of E. coli bacteria, which is shown in the lower animation.
305
00:32:18,680 --> 00:32:22,729
These are experimental data. So these bacteria,
306
00:32:22,730 --> 00:32:32,059
the way they move is that they typically move in a fixed direction, in a persistent way, with almost constant velocity, for some amount of time.
307
00:32:32,060 --> 00:32:37,820
And then they suddenly change direction, this way. This is called run-and-tumble motion.
308
00:32:38,960 --> 00:32:43,640
And to model it, I will start with the simplest possible model.
309
00:32:44,660 --> 00:32:50,090
So I assume that I have a bacterium which is starting from a barrier, which is this red line here.
310
00:32:50,540 --> 00:32:58,610
And initially this bacterium will choose a direction uniformly at random in space, and it will start moving in that direction for some time.
311
00:32:59,180 --> 00:33:03,980
And after some time it will tumble. So it will pick a new direction, again uniformly at random.
312
00:33:05,100 --> 00:33:10,770
And I will assume that these tumbling events, these changes of direction, occur at a constant rate gamma,
313
00:33:12,000 --> 00:33:18,149
meaning that, on average, I expect to see gamma changes of direction per second, so the typical time between tumbles is one over gamma.
314
00:33:18,150 --> 00:33:22,380
But there can be fluctuations; this is actually what is called a Poisson process.
315
00:33:23,070 --> 00:33:29,460
And I will also consider velocity fluctuations, fluctuations in the speed of the particle, which are described by some probability
316
00:33:29,670 --> 00:33:36,989
distribution W, but this is not very important. And I will consider this motion in one or two or three dimensions,
317
00:33:36,990 --> 00:33:42,420
where d is the dimension of the system. What I'm showing here is the motion in two dimensions.
318
00:33:43,560 --> 00:33:51,360
So in this case, one first step to understand the statistical properties of this motion is to study the survival probability,
319
00:33:52,440 --> 00:34:00,239
which I will define in the following way: as the probability that the x component of the position of the particle does not change
320
00:34:00,240 --> 00:34:02,969
sign up to time t. In two dimensions,
321
00:34:02,970 --> 00:34:09,330
This is the probability that the random worker or the random number particle doesn't cross the barrier for a time.
322
00:34:09,330 --> 00:34:19,800
t. So this is a process which is defined in continuous time. And this survival probability was first computed in '95 in the simplest possible case,
323
00:34:19,920 --> 00:34:29,309
which is one dimension, so the motion is just on a line, with constant velocity. And it is given by this formula, where I denoted
324
00:34:29,310 --> 00:34:34,680
by I_0 and I_1 the modified Bessel functions. It is something that you can plot, and it
325
00:34:34,680 --> 00:34:39,509
looks like this. So initially this survival probability is one half, for the same reason as before:
326
00:34:39,510 --> 00:34:44,430
because half of the time the particle will immediately cross the wall. And then it decreases to zero.
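A sketch of this quantity. I'm quoting the 1D constant-speed formula as I know it from the literature, S(t) = (1/2) e^{-gamma t / 2} [I_0(gamma t / 2) + I_1(gamma t / 2)]; the Bessel functions are evaluated by their power series, and the 2D simulation (unit speed, my own seed and sample size) should fall on the same curve:

```python
import math
import random

def bessel_i(nu, x, terms=60):
    # Modified Bessel function I_nu(x) for integer nu, via its power series.
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))

def survival_1d(gamma, t):
    # 1D run-and-tumble survival probability with constant speed, as quoted
    # in the literature: S(t) = exp(-z) * (I0(z) + I1(z)) / 2 with z = gamma*t/2.
    z = gamma * t / 2
    return 0.5 * math.exp(-z) * (bessel_i(0, z) + bessel_i(1, z))

def survival_2d_mc(gamma, t, trials, rng):
    # 2D run-and-tumble particle starting on the wall x = 0, unit speed,
    # tumbling at rate gamma: fraction of trajectories whose x stays negative.
    survived = 0
    for _ in range(trials):
        x, remaining, ok = 0.0, t, True
        while ok and remaining > 0:
            tau = min(rng.expovariate(gamma), remaining)     # duration of this run
            vx = math.cos(rng.uniform(0.0, 2.0 * math.pi))   # x-velocity of the run
            x += vx * tau
            if x > 0:          # motion is linear within a run, so checking
                ok = False     # the endpoint of the run is enough
            remaining -= tau
        survived += ok
    return survived / trials

rng = random.Random(7)
print(survival_1d(1.0, 2.0), survival_2d_mc(1.0, 2.0, 20000, rng))
```

The 2D estimate should match the 1D curve within statistical error, which is the universality found in the simulations described next.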
327
00:34:45,150 --> 00:34:52,500
So I wanted to study this problem more in general: for higher dimensions d, and also for velocity fluctuations of the particle.
328
00:34:53,100 --> 00:34:56,940
So I first did the simulations in d equal to one and, not too surprisingly,
329
00:34:57,820 --> 00:34:59,980
I got that the simulations agree with the theory.
330
00:35:00,920 --> 00:35:07,130
And then I repeated the simulations in d equal to two, and I was quite surprised to get exactly the same result.
331
00:35:07,520 --> 00:35:12,910
So at first I thought that I had a bug in my code. But I checked very carefully;
332
00:35:13,240 --> 00:35:20,530
it took me a long time, but this was not the case. And doing simulations in d equal to three, I got again the same result.
333
00:35:21,070 --> 00:35:28,030
And when I included the velocity fluctuations in the speed of the particle in my model, I got once again the very same result.
334
00:35:29,560 --> 00:35:35,260
So the numerical simulations were suggesting that the result is, in some sense, universal.
335
00:35:36,380 --> 00:35:43,700
independent of the details of the model. And keep in mind that this is a non-trivial statement, because many other quantities in this model are not universal.
336
00:35:43,820 --> 00:35:51,530
For instance, the position distribution of the particle, the probability to find the particle at position x at time t, is not universal.
337
00:35:52,010 --> 00:35:54,770
It depends strongly on which dimension I'm considering.
338
00:35:55,610 --> 00:36:00,230
There is a curve for d equal to four, which is something you can simulate on a computer even if it doesn't really make sense physically.
339
00:36:00,590 --> 00:36:08,140
But it is strongly dependent on d. And this looks a lot like what we saw before for random walks,
340
00:36:08,350 --> 00:36:09,580
the Sparre Andersen theorem,
341
00:36:09,880 --> 00:36:18,850
which was also a universal result. But it was not obvious how to apply this result to the run-and-tumble particle model, for many technical reasons.
342
00:36:19,180 --> 00:36:27,979
For instance, the run-and-tumble particle model is defined in continuous time, while the Sparre Andersen theorem applies only to discrete-time random walks.
343
00:36:27,980 --> 00:36:36,300
But we were able to actually develop a mapping from the continuous-time motion of the run-and-tumble particle to a discrete-time random walk.
344
00:36:36,690 --> 00:36:42,810
And we were able to show that it is actually the Sparre Andersen theorem that is behind this universality.
345
00:36:43,740 --> 00:36:49,410
And we saw before that the survival probability can be used as a building
346
00:36:49,410 --> 00:36:53,130
block to compute more complicated quantities in extreme value statistics.
347
00:36:54,140 --> 00:36:57,050
And this is the case also for the run-and-tumble particle model.
348
00:36:58,070 --> 00:37:05,240
So using this building block, the survival probability, we were able to show that the time of the maximum for this random process,
349
00:37:05,240 --> 00:37:08,240
which is the run-and-tumble particle motion, is also universal,
350
00:37:09,020 --> 00:37:14,090
and that the number of records for this model is also a universal quantity.
351
00:37:14,420 --> 00:37:17,420
In this plot, I'm showing the following:
352
00:37:17,810 --> 00:37:24,440
the blue line is the theory, and the different symbols, which kind of overlap with each other, are the simulations.
353
00:37:24,440 --> 00:37:30,229
And they all follow the very same curve. On the left, I'm plotting the cumulative probability of the time of the maximum,
354
00:37:30,230 --> 00:37:36,890
so the probability that the time of the maximum is less than some value t prime, as a function of t prime; and, on the right, the mean number of records.
355
00:37:37,670 --> 00:37:44,030
And this is again a very non-trivial result, because we have seen before that many other quantities in this model are not universal.
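A record here means a step on which the walk reaches a new maximum. For the underlying symmetric continuous-jump walk, the mean number of records after n steps is given by a classical universal formula, mean R_n = (2n + 1) C(2n, n) / 4^n, which I'm quoting from the random-walk literature; the step distributions, seed, and sample sizes below are my own illustration:

```python
import math
import random

def mean_records_exact(n):
    # Mean number of records of a symmetric continuous-jump walk after n steps,
    # counting the starting point as the first record.
    return (2 * n + 1) * math.comb(2 * n, n) / 4 ** n

def mean_records_mc(n, trials, rng, step):
    # Monte Carlo estimate: a record is counted whenever the walk
    # exceeds its previous maximum.
    total = 0
    for _ in range(trials):
        x, best, records = 0.0, 0.0, 1  # x0 = 0 is the first record
        for _ in range(n):
            x += step(rng)
            if x > best:
                best = x
                records += 1
        total += records
    return total / trials

rng = random.Random(1)
exact = mean_records_exact(10)
gauss = mean_records_mc(10, 20000, rng, lambda r: r.gauss(0.0, 1.0))
cauchy = mean_records_mc(10, 20000, rng, lambda r: math.tan(math.pi * (r.random() - 0.5)))
print(exact, gauss, cauchy)  # the mean number of records does not depend on the jumps
```

As with the survival probability, the jump distribution drops out entirely, which is the kind of universality the plot on the right illustrates.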
356
00:37:44,030 --> 00:37:47,750
So there is something special about these extreme value statistics quantities.
357
00:37:48,410 --> 00:37:53,210
So, to conclude, I presented to you different simple models.
358
00:37:54,110 --> 00:38:00,830
And from this, I hope that we have built an intuition on how extreme events behave statistically.
359
00:38:01,310 --> 00:38:06,680
And these results have many applications to finance, physics, evolution theory, and so on.
360
00:38:07,070 --> 00:38:11,540
And so it's a very exciting and interdisciplinary field of study.
361
00:38:11,810 --> 00:38:21,110
And the crucial point that I want to make is that often these results were universal, independent of the specific way in which we model the system.
362
00:38:21,260 --> 00:38:24,409
And this is very important because often we model the system in a way which is not
363
00:38:24,410 --> 00:38:29,180
accurate because we don't have access to the full information about the system.
364
00:38:29,180 --> 00:38:32,420
So we have to make assumptions. And if we get a universal result,
365
00:38:32,960 --> 00:38:37,730
we are guaranteed that our results are robust to errors in the way we model the system.
366
00:38:38,270 --> 00:38:42,890
So, as a final question: as I mentioned before, there is no general theory
367
00:38:43,820 --> 00:38:46,340
for extreme value statistics in correlated systems.
368
00:38:46,850 --> 00:38:57,320
And, as a very ambitious goal, we would like to find one, or at least to explore possible directions to find one.
369
00:38:57,710 --> 00:38:59,750
And with this, I want to thank you for your attention.