1
00:00:15,080 --> 00:00:20,239
So I will talk about extreme value statistics. So this first slide is motivation:
2
00:00:20,240 --> 00:00:25,850
why we want to study extreme events. Extreme value statistics is a branch of probability
3
00:00:25,850 --> 00:00:31,159
theory dealing with extreme events, which are typically very rare events.
4
00:00:31,160 --> 00:00:34,250
But when they happen, they can have devastating consequences.
5
00:00:34,670 --> 00:00:38,120
So these are three examples of extreme events which are a big deal.
6
00:00:38,120 --> 00:00:47,750
For instance, in epidemics, a very rare mutation of a virus can lead to a new epidemic wave. Or in finance,
7
00:00:48,450 --> 00:00:51,950
a financial crisis can really affect the world economy.
8
00:00:52,250 --> 00:00:59,239
And finally, it's very important to study rare events in climate studies, because, in the context of climate change,
9
00:00:59,240 --> 00:01:04,639
extreme weather events like heatwaves have become more and more common and more important.
10
00:01:04,640 --> 00:01:13,940
So it's crucial to understand the role played by extreme events and, in particular, their statistical properties.
11
00:01:14,980 --> 00:01:22,960
So let's start with a practical problem. Imagine that you are an engineer and your task is to build a bridge over a river.
12
00:01:24,040 --> 00:01:27,160
So you have at your disposal some data.
13
00:01:27,310 --> 00:01:32,440
So this, for instance, is the water level of the Nile River near Cairo.
14
00:01:33,540 --> 00:01:40,660
Over many centuries. And your goal is to be able to decide, for instance, the height of the bridge.
15
00:01:41,780 --> 00:01:48,140
So one first thing you could do to study this data is to study what the average height of the river is.
16
00:01:49,040 --> 00:01:52,310
And the average value is plotted here by the black line.
17
00:01:53,350 --> 00:01:57,940
And to do that, you have a very strong result from mathematics,
18
00:01:57,940 --> 00:02:01,410
from probability theory, which is the central limit theorem.
19
00:02:03,140 --> 00:02:09,980
And what the central limit theorem is telling you is that if you have a bunch of random variables x1, x2, ..., xn, then
20
00:02:11,270 --> 00:02:14,690
You sum them up. And if you have many variables,
21
00:02:14,690 --> 00:02:22,580
you are guaranteed that the sum of these variables, Sn, will converge to a Gaussian distribution, or normal distribution.
22
00:02:23,570 --> 00:02:28,940
Which is shown here, and it has the typical bell shape that you might have seen many times already.
23
00:02:29,860 --> 00:02:34,600
So to give you a specific example, a numerical example,
24
00:02:35,470 --> 00:02:42,580
if we consider a probability density function p(x), which is telling me that
25
00:02:42,890 --> 00:02:44,680
I can find the variable,
26
00:02:44,680 --> 00:02:52,629
the random variable x, anywhere in the interval [0, 1] with uniform probability, and I will not find this variable anywhere else.
27
00:02:52,630 --> 00:02:57,890
So the probability is zero outside this interval. So it's like a box probability distribution.
28
00:02:57,910 --> 00:03:07,150
So what I can do is that in my computer I can generate many replicas of this random variable, independent of each other.
29
00:03:07,570 --> 00:03:11,320
I sum them up and plot the probability distribution of the sum.
30
00:03:12,410 --> 00:03:16,340
So here, n will be the number of variables.
31
00:03:16,730 --> 00:03:22,550
If I start with just one variable, I have my box distribution, but you can see that as n increases,
32
00:03:22,790 --> 00:03:29,060
the probability of the sum converges very fast to the bell shape, which is the Gaussian distribution.
33
00:03:29,660 --> 00:03:36,110
And the important fact is that this is completely independent of the probability distribution I started from.
34
00:03:36,440 --> 00:03:42,830
So that if I start from some probability distribution which is completely different, like this two-box distribution,
35
00:03:43,890 --> 00:03:49,440
It might take longer to converge to the Gaussian distribution, but the result eventually will be exactly the same.
36
00:03:49,770 --> 00:03:54,510
So if I have many variables, I can guarantee that I will converge to a Gaussian distribution.
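This convergence is easy to check in a few lines. The sketch below is my own (not from the talk), with arbitrary sample sizes: it standardises sums of uniform variables and checks that the first few moments match a standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum n uniform([0, 1]) variables, repeated over many replicas.
n, trials = 50, 200_000
s = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)

# Standardise using the exact moments of a single uniform variable:
# mean 1/2, variance 1/12.
z = (s - n * 0.5) / np.sqrt(n / 12.0)

# For a standard Gaussian: mean 0, variance 1, third moment 0.
print(np.mean(z), np.var(z), np.mean(z**3))
```

The printed moments should be close to 0, 1 and 0, as the central limit theorem predicts, even though each summand is far from Gaussian.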
37
00:03:55,690 --> 00:04:00,520
So this is a very important piece of maths: it characterises the average behaviour.
38
00:04:01,510 --> 00:04:05,799
But there is a problem because usually we have a finite amount of data and the central limit
39
00:04:05,800 --> 00:04:11,830
theorem then only applies to deviations from the average value which are small enough.
40
00:04:13,060 --> 00:04:19,810
But in many practical cases, as in the case of the bridge, we don't care about the average behaviour, we care about the extreme events.
41
00:04:20,140 --> 00:04:20,770
For instance,
42
00:04:20,770 --> 00:04:29,829
when the water level is very high because the river is flooding and we want to study what is the statistics of these extreme events like in this case,
43
00:04:29,830 --> 00:04:34,090
the maximal water level over many centuries.
44
00:04:34,090 --> 00:04:40,130
And we would like to find an equivalent theorem
45
00:04:41,210 --> 00:04:44,090
to the central limit theorem, one which applies to extremes.
46
00:04:45,560 --> 00:04:53,260
Which is in some sense a universal result, because we don't know, maybe, exactly how to model the fluctuations of the water level.
47
00:04:53,270 --> 00:04:58,940
So we would like to find a result which is independent of the specific way in which we model our system.
48
00:04:59,720 --> 00:05:07,580
So this is the general motivation. And the setting that I will consider is the following, which can apply to many different systems.
49
00:05:07,880 --> 00:05:16,100
So I have a collection of random variables x1, x2, ..., xn, and the index i in xi indicates time.
50
00:05:16,700 --> 00:05:24,310
So x1 comes before x2, and so on. So this could be, for instance, the water level of a river over many, many years.
51
00:05:24,850 --> 00:05:28,480
And to describe these random variables, I need to write down a model.
52
00:05:29,470 --> 00:05:33,950
And the model is given by their joint probability distribution.
53
00:05:34,600 --> 00:05:38,360
So this function of many variables, p(x1, x2, ..., xn),
54
00:05:39,640 --> 00:05:43,150
It's called the joint probability density function or joint probability distribution
55
00:05:43,150 --> 00:05:48,760
and is telling me what is the probability to observe a specific sequence of events.
56
00:05:48,790 --> 00:05:52,909
of x1, x2, ..., xn. And this is the modelling part.
57
00:05:52,910 --> 00:05:59,059
So I start from a system, I create the model, and the model is encoded in this probability distribution. And this probability
58
00:05:59,060 --> 00:06:04,430
distribution is telling me about the correlations and the interdependencies between different variables.
59
00:06:05,920 --> 00:06:09,280
And then in extreme value statistics, what we usually study, for instance,
60
00:06:09,280 --> 00:06:16,180
is the global maximum of these variables, which we call capital M, the maximal entry of this sequence.
61
00:06:17,020 --> 00:06:22,390
And then what we are asking is: given our model, so given our joint probability distribution,
62
00:06:22,900 --> 00:06:27,350
what can we say about the statistical properties of the maximum m?
63
00:06:29,120 --> 00:06:35,170
There are other quantities that we will study in this presentation.
64
00:06:35,210 --> 00:06:37,940
For instance, the time at which the global maximum occurs.
65
00:06:38,330 --> 00:06:47,180
So we might say we don't really care about how big these extreme events are, but we care about when they will happen in time.
66
00:06:48,510 --> 00:06:53,100
Is it more likely to see the extreme event at the very beginning of the sequence in the middle or at the end?
67
00:06:54,020 --> 00:07:01,460
And a very simple practical application of this is in finance: if you imagine that you need to sell a stock in the stock market,
68
00:07:02,210 --> 00:07:04,670
the best time to do so is when the price is the highest.
69
00:07:05,180 --> 00:07:12,740
So it's a practical problem to understand, typically, or in a simple model, at what time the price will be the highest.
70
00:07:12,770 --> 00:07:18,140
Right. And another quantity which is of interest in extreme value statistics is records.
71
00:07:19,590 --> 00:07:26,940
So almost every day we read in the news about a new record being set, either in sport or, unfortunately, now in climate. And
72
00:07:27,210 --> 00:07:36,270
within a sequence of random variables we will say that an entry is a record if it is larger than all the previous entries.
73
00:07:37,740 --> 00:07:42,840
And this has many applications in climate studies, for instance:
74
00:07:43,170 --> 00:07:48,990
we know that it is particularly important to study the statistical properties of records because, due to climate change,
75
00:07:49,320 --> 00:07:52,470
new records are being set almost every year.
76
00:07:52,950 --> 00:07:59,009
So here you can see the global average temperature as a function of time over many years.
77
00:07:59,010 --> 00:08:03,870
And you can see that there is a clear trend. So new records are being set more often than they should.
78
00:08:04,320 --> 00:08:08,550
Another very common example is that of sports.
79
00:08:09,710 --> 00:08:18,230
So this is the best marathon time for each of the last 20 editions of the Olympics.
80
00:08:19,100 --> 00:08:23,239
So you can see that the time has gone down over the years.
81
00:08:23,240 --> 00:08:25,580
And the red dots indicate records.
82
00:08:26,910 --> 00:08:34,440
So it would be great to have a statistical understanding of how many records we should expect in a given sequence, and so on.
83
00:08:35,310 --> 00:08:36,800
So this is the general set up.
84
00:08:36,810 --> 00:08:42,510
These are the three main quantities that we will talk about: the global maximum, the time of the maximum, and the number of records.
85
00:08:43,780 --> 00:08:47,859
And to make progress, let's start with the simplest possible model.
86
00:08:47,860 --> 00:08:54,100
Which is the one where the variables are independent and identically distributed.
87
00:08:55,240 --> 00:08:58,990
Independent means that there are no correlations between the variables.
88
00:08:59,380 --> 00:09:03,880
So if I know the value of x1, this doesn't tell me anything about the value of x2.
89
00:09:04,900 --> 00:09:15,400
And identically distributed means that these n variables all come from the very same probability distribution, which I will call p(x).
90
00:09:16,270 --> 00:09:17,530
Mathematically speaking,
91
00:09:17,530 --> 00:09:28,960
this means that their joint probability density function p(x1, ..., xn) is just the product of the marginal probability distributions p(x1) p(x2) ... p(xn).
92
00:09:29,290 --> 00:09:34,929
So the probability of observing the whole sequence from x1 to xn is just the probability to observe the first one, times
93
00:09:34,930 --> 00:09:43,590
the second, times the third, and so on. And this might seem like a very simplistic model, but it's a very successful one in physics.
94
00:09:43,620 --> 00:09:47,219
One example is the random energy model by Derrida,
95
00:09:47,220 --> 00:09:53,520
which was used to understand the properties of disordered systems like glassy materials and so on,
96
00:09:53,790 --> 00:09:56,820
and is based on an assumption which is precisely this one.
97
00:09:59,100 --> 00:10:04,770
So we're very lucky because in the case of independent and identically distributed variables, there exists a theorem.
98
00:10:05,750 --> 00:10:13,130
Which is very important. It's called the Extreme Value Theorem, and it is the counterpart of the Central Limit Theorem
99
00:10:14,420 --> 00:10:21,860
for extremes. What this theorem is telling me is that if I take n random variables which are independent
100
00:10:21,860 --> 00:10:27,160
and identically distributed and I want to study the distribution of the maximum m.
101
00:10:29,430 --> 00:10:33,120
this distribution of M cannot be anything.
102
00:10:33,120 --> 00:10:36,840
It can only be one out of three probability distributions.
103
00:10:37,790 --> 00:10:42,940
So in the case of the central limit theorem, we saw that the distribution of the sum of these variables will be Gaussian.
104
00:10:44,030 --> 00:10:49,580
In this case, we know that the distribution of the maximum will be either Gumbel, Fréchet, or Weibull,
105
00:10:50,770 --> 00:10:55,570
Depending on how the marginal probability of a single variable behaves.
106
00:10:56,080 --> 00:11:00,970
So p(x) is the probability of observing a value x for a single variable;
107
00:11:01,270 --> 00:11:08,560
if this probability decays exponentially or faster, I will be in the Gumbel universality class, which is this continuous blue line.
108
00:11:09,980 --> 00:11:17,240
If instead the probability decays as a power law, with a fat tail, so it's more likely to see very big numbers,
109
00:11:17,240 --> 00:11:20,830
then I will be in the Fréchet universality class.
110
00:11:21,650 --> 00:11:28,240
And finally, if there is an upper bound. So there is a maximal value that my random variables can take.
111
00:11:28,240 --> 00:11:34,390
I will be in the Weibull universality class. So let's consider just the first one for simplicity:
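To make the classification concrete before specialising to Gumbel, here is a small sketch of my own (arbitrary parameters, not from the talk) showing the Fréchet case: for variables with tail P(X > x) = x^(-alpha), the maximum rescaled by n^(1/alpha) has CDF exp(-t^(-alpha)).

```python
import numpy as np

rng = np.random.default_rng(1)

# Heavy-tailed samples via inverse transform: if U is uniform(0, 1),
# then X = U**(-1/alpha) has tail P(X > x) = x**(-alpha) for x >= 1.
alpha, n, trials = 2.0, 500, 20_000
x = rng.uniform(size=(trials, n)) ** (-1.0 / alpha)

# Rescaling the maximum by n**(1/alpha) should give the Frechet law,
# whose CDF is exp(-t**(-alpha)).
m = x.max(axis=1) / n ** (1.0 / alpha)

for t in (0.5, 1.0, 2.0):
    print(t, float(np.mean(m <= t)), float(np.exp(-t ** -alpha)))
```

The empirical CDF of the rescaled maximum should track the Fréchet CDF at each test point; swapping in an exponentially decaying or bounded distribution would instead land in the Gumbel or Weibull class.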
112
00:11:35,020 --> 00:11:41,440
the Gumbel distribution, which is the one you will converge to if you start from random variables which
113
00:11:41,650 --> 00:11:46,330
have a distribution which decays exponentially or faster for large values.
114
00:11:46,360 --> 00:11:55,850
And the distribution of the maximum in this case is a very, very simple formula, which is this one: exp(-m - exp(-m)).
115
00:11:56,730 --> 00:12:02,700
And it is shown here. So it has this kind of bell shape, but it is skewed on one side.
116
00:12:02,910 --> 00:12:07,650
So one of the tails, the right one, decays slower than the other.
117
00:12:09,270 --> 00:12:15,340
So let's again do a numerical example, as we did before for the central limit theorem.
118
00:12:16,350 --> 00:12:21,480
So in this case, I consider a probability distribution, which is exponentially decaying.
119
00:12:23,170 --> 00:12:26,670
So we expect to end up in the Gumbel universality class.
120
00:12:26,680 --> 00:12:31,780
So again, what I do in my computer is that I generate many replicas of these random variables,
121
00:12:32,650 --> 00:12:37,650
and I compute the maximum and I generate a histogram for the probability of the maximum.
122
00:12:38,170 --> 00:12:48,310
And you will see that, as n increases, the probability distribution of the maximum converges to the universal Gumbel law.
123
00:12:49,630 --> 00:12:55,750
And it is universal because if I start from a completely different distribution like this one with two different peaks.
124
00:12:57,770 --> 00:13:04,640
Maybe it will take longer, but if n is large enough, I will end up again at exactly the same probability distribution.
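The Gumbel experiment described here can be scripted in a few lines. This is my own sketch (the exponential distribution and sizes are arbitrary choices): the maximum of n exponential variables, centred by log(n), should follow the Gumbel CDF exp(-exp(-m)).

```python
import numpy as np

rng = np.random.default_rng(2)

# Maximum of n exponential(1) variables, repeated over many replicas.
# Centring by log(n) should leave a Gumbel-distributed variable.
n, trials = 200, 20_000
m = rng.exponential(1.0, size=(trials, n)).max(axis=1) - np.log(n)

# Compare the empirical CDF with the Gumbel CDF exp(-exp(-x)).
for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, float(np.mean(m <= x)), float(np.exp(-np.exp(-x))))
```

The empirical and exact CDF values should agree at every test point; starting from a different exponentially decaying distribution would converge to the same law, which is the universality the talk emphasises.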
125
00:13:06,000 --> 00:13:10,340
So at this point you might wonder: I told you about many numerical examples,
126
00:13:10,350 --> 00:13:13,410
but these are just synthetic data that I generate on my laptop.
127
00:13:13,800 --> 00:13:15,930
Is this at all useful for real data?
128
00:13:16,350 --> 00:13:24,870
And to answer this question, I have to tell you a little piece of the history of Oxford, and in particular about the Radcliffe Observatory,
129
00:13:25,290 --> 00:13:31,080
which was the University's observatory from 1773 to 1934.
130
00:13:31,770 --> 00:13:35,250
And if you want to see it, it is just about a five-minute walk from here.
131
00:13:35,940 --> 00:13:44,040
And the astronomers of the observatory in the 18th century, they started to collect weather data for a specific reason,
132
00:13:44,550 --> 00:13:51,900
which was that refraction, which is influenced by atmospheric conditions, can affect astronomical measurements.
133
00:13:52,380 --> 00:14:02,400
So they had to have a precise understanding of the local weather conditions in order to ensure that their measurements were accurate.
134
00:14:03,640 --> 00:14:08,230
So they started collecting data about temperature and pressure and other things as well.
135
00:14:08,740 --> 00:14:13,450
And they are still doing that. So this data collection has continued for more than two centuries.
136
00:14:14,200 --> 00:14:22,390
They have the longest running record of temperature and rainfall data for a single site in Britain running continuously from 1813.
137
00:14:23,660 --> 00:14:26,720
So you can see here a picture of their data.
138
00:14:27,080 --> 00:14:35,540
And here you can see a handwritten entry of this data set from November 14th, 1813.
139
00:14:36,550 --> 00:14:41,790
And you can see that there is some data about temperature, about pressure, wind, rain.
140
00:14:42,220 --> 00:14:47,680
And apparently that day was very dark and rainy, not surprisingly.
141
00:14:49,990 --> 00:14:55,510
So today these ledgers have been digitised, and they are freely accessible on the Internet.
142
00:14:55,600 --> 00:15:02,950
So what I did is that I just went and downloaded the whole data set. And, for instance, I wanted to study extreme value statistics,
143
00:15:02,950 --> 00:15:09,490
so I took the maximal temperature in October in Oxford for every year in the last 200 years.
144
00:15:10,270 --> 00:15:16,810
And this is what is plotted here. So each dot is the maximum temperature registered in Oxford in October.
145
00:15:17,380 --> 00:15:21,250
And you can see that I mean, you cannot tell too much from this data, to be honest.
146
00:15:21,250 --> 00:15:24,700
But what you can tell is that 2011 had a very hot October.
147
00:15:25,060 --> 00:15:32,770
And the other thing is that if you build a histogram from this data, this is what you get.
148
00:15:34,880 --> 00:15:37,970
And this is surprisingly close to the Gumbel distribution.
149
00:15:38,450 --> 00:15:45,589
And you can fit the Gumbel pretty easily to this data. So even though these data are not independent and identically distributed,
150
00:15:45,590 --> 00:15:53,060
this theory still tells us something useful about the data, and it can predict the probability of rare events.
151
00:15:54,760 --> 00:16:00,760
So the other thing I want to tell you about in the context of independent variables is about records.
152
00:16:02,190 --> 00:16:11,070
So again, I consider independent variables, and I want to answer this question: given a sequence of independent random numbers,
153
00:16:11,850 --> 00:16:13,800
How many records do we expect to see?
154
00:16:16,120 --> 00:16:21,800
So it's a very practical question, and we will be able to answer it and to compute this quantity exactly within this slide.
155
00:16:21,820 --> 00:16:28,479
So it's a very simple computation. The first thing we have to observe is that we want to compute what is
156
00:16:28,480 --> 00:16:33,630
the probability that the i-th variable that I observe is a record.
157
00:16:35,520 --> 00:16:41,429
So if xi is a record, it means that it is the biggest variable so far. And these variables are independent,
158
00:16:41,430 --> 00:16:44,220
so any of them could be the maximum with equal probability.
159
00:16:44,700 --> 00:16:49,950
So, since I have observed i variables, the probability that the last one is the largest is just 1/i,
160
00:16:51,430 --> 00:17:00,090
because it has to be uniform and it has to sum to one. So the average number of records I can obtain just by summing over i.
161
00:17:01,870 --> 00:17:08,080
So the average number of records is the sum over i of the probability that that particular time is a record.
162
00:17:09,080 --> 00:17:11,120
So it's the sum over i of 1/i,
163
00:17:11,150 --> 00:17:19,180
where i goes from 1 to n. And if I approximate the sum with an integral, which I can do if n is large,
164
00:17:19,820 --> 00:17:23,450
I get that the average number of records is growing as the log of n.
165
00:17:25,240 --> 00:17:28,700
And let me point out that this is also universal.
166
00:17:29,110 --> 00:17:32,670
So this doesn't depend on the particular distribution of the single variables.
167
00:17:33,740 --> 00:17:39,340
It's a very robust result. And this is what you would expect:
168
00:17:39,730 --> 00:17:45,700
So if you observe, for instance, 20 random variables, you would expect around three or four records.
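The 1/i argument is easy to test numerically. This is a minimal sketch of my own (arbitrary sample sizes): the mean number of records should match the harmonic sum of 1/i, which grows like log(n), for any continuous distribution.

```python
import numpy as np

rng = np.random.default_rng(3)

n, trials = 20, 100_000

def mean_records(samples):
    # An entry is a record when it exceeds all earlier entries; for
    # continuous variables that is exactly where it equals the running max.
    running_max = np.maximum.accumulate(samples, axis=1)
    return float(np.mean(np.sum(samples == running_max, axis=1)))

# Exact mean number of records: sum of 1/i, roughly log(n) for large n.
harmonic = sum(1.0 / i for i in range(1, n + 1))

u = mean_records(rng.uniform(size=(trials, n)))      # uniform variables
e = mean_records(rng.exponential(size=(trials, n)))  # different law, same answer
print(u, e, harmonic)  # all about 3.6 for n = 20
```

Both distributions give the same mean record count, which is the universality claimed in the talk: the result does not depend on the distribution of the single variables.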
169
00:17:46,180 --> 00:17:48,610
So it's a very slow growth. And this makes sense because,
170
00:17:50,350 --> 00:17:56,830
later in time it will be harder to set a new record, because the last record will be higher.
171
00:17:57,640 --> 00:18:02,230
And if you try to apply this to the sports data that I showed before.
172
00:18:03,160 --> 00:18:06,399
We had that in the last 20 editions of the Olympics.
173
00:18:06,400 --> 00:18:13,740
There have been seven marathon records. And so the independent theory doesn't work in this case.
174
00:18:14,130 --> 00:18:17,490
We would expect three or four and we get seven. So it's very off.
175
00:18:17,850 --> 00:18:27,790
And if we apply the same idea to the temperature data that I showed before, it still doesn't work.
176
00:18:27,810 --> 00:18:32,670
So in the last 200 years, there are nine records, but the log of 200 is like around five.
177
00:18:33,610 --> 00:18:42,210
So the independent theory is very useful as a benchmark, and it works in some cases,
178
00:18:42,870 --> 00:18:50,370
but it has limitations, because in the real world there are correlations, and you often need to include them in the model.
179
00:18:50,850 --> 00:18:56,429
So what we want to do now is to include correlations in the model and to get a more
180
00:18:56,430 --> 00:19:00,840
complicated model which takes into account that different variables are not independent.
181
00:19:02,490 --> 00:19:07,229
The simplest model that you can consider with correlations is the weakly correlated model.
182
00:19:07,230 --> 00:19:11,930
And so, first of all, let me say that in general there's no general technique to study correlated systems.
183
00:19:11,940 --> 00:19:18,510
So we have to go on a case by case basis. And the simplest model that you can consider is the one where correlations are weak.
184
00:19:19,980 --> 00:19:28,800
What does that mean? The quantity that I have on the left-hand side is the correlation between variable xi and variable xj.
185
00:19:29,750 --> 00:19:36,770
And what you have to know is that this number will be zero if the variables are independent, and it will be nonzero, positive or negative,
186
00:19:37,040 --> 00:19:40,190
if these variables are correlated.
187
00:19:40,850 --> 00:19:46,040
So if I assume that the correlations are decaying in time exponentially fast,
188
00:19:47,130 --> 00:19:50,930
over a typical timescale, which is the correlation timescale, tau say,
189
00:19:52,500 --> 00:20:00,420
what I will have is that two random variables which are farther apart in time than tau
190
00:20:01,380 --> 00:20:06,380
They are basically independent. So I can still make progress using the independent theory.
191
00:20:06,770 --> 00:20:10,030
And to do that, imagine that I have a very, very long sequence.
192
00:20:11,110 --> 00:20:17,320
What I can do is divide this sequence into different intervals of size tau,
193
00:20:18,470 --> 00:20:24,020
similarly to what is done in statistical physics with the canonical ensemble argument.
194
00:20:25,230 --> 00:20:31,680
And since the correlations decay over this time, the different intervals are almost independent.
195
00:20:33,510 --> 00:20:38,070
So if I define the maximum within each interval.
196
00:20:39,140 --> 00:20:43,280
these maxima, m1 to mk, will be independent random variables.
197
00:20:44,350 --> 00:20:47,860
And I can still apply the theory of independent and identically distributed
198
00:20:47,860 --> 00:20:52,600
random variables, because the global maximum is just the maximum of the maxima.
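The bookkeeping behind this block argument is just the identity that the global maximum equals the maximum of the per-block maxima. A tiny sketch of my own (the block size tau is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# A long sequence divided into blocks of size tau; the global maximum
# equals the maximum over the per-block maxima.
tau, blocks = 100, 50
x = rng.normal(size=tau * blocks)

block_maxima = x.reshape(blocks, tau).max(axis=1)
print(x.max() == block_maxima.max())  # True
```

For weakly correlated data, once tau exceeds the correlation time the block maxima are nearly independent, so the iid extreme value theory can be applied to them.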
199
00:20:53,960 --> 00:21:01,340
So I can still use the theory that I showed before. But the problem is that when correlations do not decay exponentially in time,
200
00:21:01,520 --> 00:21:05,720
so when we have strongly correlated random variables, this doesn't work anymore.
201
00:21:07,110 --> 00:21:16,620
So here I wanted to present just three examples of systems with strongly correlated random variables which have been studied quite a lot in physics.
202
00:21:17,220 --> 00:21:26,370
So the first one is random matrices, which have been used a lot in physics to describe the complicated Hamiltonians of heavy nuclei.
203
00:21:26,850 --> 00:21:33,900
So the basic idea there is that this Hamiltonian is so complicated that you don't know exactly what it will look like.
204
00:21:34,290 --> 00:21:38,080
So you approximate it with a random matrix. And it actually works.
205
00:21:38,680 --> 00:21:44,559
And in this case, for instance,
206
00:21:44,560 --> 00:21:50,190
the maximal eigenvalue of the Hamiltonian has been studied quite a lot in the context of extreme value statistics.
207
00:21:50,800 --> 00:21:53,380
The second one is fluctuating interfaces,
208
00:21:54,490 --> 00:22:01,660
Which have been used a lot to describe the interface of growing colonies of bacteria or growing tumours.
209
00:22:02,170 --> 00:22:08,200
And it's quite important to study the statistical properties of these interfaces. In the context of extreme value statistics,
210
00:22:08,350 --> 00:22:13,450
for instance, the maximal height of an interface has been studied, and it is a very crucial quantity.
211
00:22:13,900 --> 00:22:18,880
And finally, random walks, which will be the focus of the rest of my talk.
212
00:22:20,690 --> 00:22:28,860
So let me define, first of all, what a random walk is. Here I am plotting the position of the walker over time.
213
00:22:28,870 --> 00:22:32,380
So xk is the position of the random walker as a function of k.
214
00:22:32,410 --> 00:22:39,700
It's a motion in one dimension. And the evolution of the position satisfies a very simple rule,
215
00:22:40,150 --> 00:22:46,750
which is telling me that the position at step k is equal to the position at the previous step, k - 1, plus a jump,
216
00:22:47,790 --> 00:22:49,080
which I will call eta k.
217
00:22:50,490 --> 00:22:56,610
So I will assume that the jumps of this random walker are again independent and identically distributed random variables.
218
00:22:56,910 --> 00:23:03,780
But now the positions xk are strongly correlated, because if you write down the joint probability distribution,
219
00:23:04,080 --> 00:23:05,670
it will have a more complicated form,
220
00:23:05,670 --> 00:23:12,480
which is not just the one of independent variables, and there are actually strong correlations in this model which do not decay exponentially in time.
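These non-decaying correlations can be seen directly. For a walk with iid zero-mean, unit-variance jumps, Cov(xi, xj) = min(i, j), so Corr(xi, xj) = sqrt(min(i, j) / max(i, j)), which does not decay exponentially in |i - j|. A small check of my own (arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(5)

# Positions of a random walk: cumulative sums of iid Gaussian jumps.
n, trials = 100, 50_000
paths = np.cumsum(rng.normal(size=(trials, n)), axis=1)

# For zero-mean unit-variance jumps, Cov(x_i, x_j) = min(i, j), so
# Corr(x_i, x_j) = sqrt(min(i, j) / max(i, j)) -- no exponential decay.
i, j = 10, 90  # 1-based step indices
xi, xj = paths[:, i - 1], paths[:, j - 1]
empirical = float(np.corrcoef(xi, xj)[0, 1])
print(empirical, np.sqrt(i / j))  # both about 0.333
```

Even positions 80 steps apart remain substantially correlated, which is why the block argument for weak correlations fails for random walks.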
221
00:23:14,360 --> 00:23:21,010
So, talking about random walks: as you probably know, Oxford is full of beautiful pubs, and many of them,
222
00:23:21,020 --> 00:23:24,610
I mean some of them, like the one in this picture, are just next to a river.
223
00:23:25,900 --> 00:23:35,560
So imagine that there is a student who, after a few pints, wants to go home but is drunk.
224
00:23:36,540 --> 00:23:39,570
So he will move with random steps.
225
00:23:41,110 --> 00:23:43,450
Either towards the river or away from it.
226
00:23:45,460 --> 00:23:53,410
So my question for you is: after n steps, what is the probability that this drunk student has fallen into the river?
227
00:23:55,140 --> 00:24:01,020
We can answer this question very precisely by modelling the motion of the student as a random walk.
228
00:24:01,890 --> 00:24:08,370
So now the river is this red line that I show here, which corresponds to x = 0.
229
00:24:09,620 --> 00:24:12,950
And what we want to study is the survival probability.
230
00:24:12,960 --> 00:24:15,120
So the probability that the student will survive.
231
00:24:16,010 --> 00:24:25,050
Which I will call Q_n. And Q_n is just the probability that x1 > 0, x2 > 0, and so on up to xn > 0,
232
00:24:25,790 --> 00:24:28,370
Given that the starting position was zero.
233
00:24:30,120 --> 00:24:40,980
And of course this probability will be, in principle, a complicated function of the probability distribution of the steps, p(eta).
234
00:24:41,580 --> 00:24:44,700
So if I take steps which are drawn from one distribution,
235
00:24:44,700 --> 00:24:50,490
I would in principle get something different than if I draw the steps from, say, a uniform probability distribution.
236
00:24:51,540 --> 00:24:58,020
And in this formula you don't need to understand the details; these thetas are Heaviside step functions, which are one
237
00:24:58,140 --> 00:25:01,770
If the argument is positive and zero otherwise. But this doesn't really matter.
238
00:25:01,780 --> 00:25:05,340
What matters is that Q_n is a complicated function of p.
239
00:25:06,630 --> 00:25:11,170
And p can be anything, really. So I would expect Q_n to depend on it.
240
00:25:12,240 --> 00:25:19,800
But the very surprising result, which is known as the Sparre Andersen theorem, is that Q_n is completely universal once again.
241
00:25:20,010 --> 00:25:24,390
So it is completely independent of p(eta). And this is true for any n.
242
00:25:24,420 --> 00:25:31,320
So in this case it's not true only for large values of n; it is true for any value of n, and it is given by this very simple formula.
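This universality is easy to check numerically. The sketch below is my own (arbitrary sizes); the Sparre Andersen result Q_n = C(2n, n) / 4^n applies to jump distributions that are symmetric and continuous, so I compare two such distributions against the exact formula.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(6)

# Survival probability: the walk, started at 0, stays strictly positive
# for all n steps. Sparre Andersen: Q_n = C(2n, n) / 4**n for any
# symmetric continuous jump distribution.
n, trials = 8, 200_000

def survival(jumps):
    paths = np.cumsum(jumps, axis=1)
    return float(np.mean(np.all(paths > 0, axis=1)))

q_gauss = survival(rng.normal(size=(trials, n)))           # Gaussian jumps
q_unif = survival(rng.uniform(-1.0, 1.0, size=(trials, n)))  # uniform jumps
q_exact = comb(2 * n, n) / 4**n

print(q_gauss, q_unif, q_exact)  # all about 0.196 for n = 8
```

Both jump distributions give the same survival probability, matching the exact formula at finite n, not just asymptotically.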
243
00:25:32,510 --> 00:25:37,970
So Sparre Andersen was a Danish mathematician, and he proved this in 1953-54.
244
00:25:38,480 --> 00:25:42,060
And the formula is surprisingly simple.
245
00:25:43,040 --> 00:25:48,260
So many people thought that the proof of this formula should be simple as well.
246
00:25:49,130 --> 00:25:55,190
But the original proof by Sparre Andersen is a rather complicated combinatorial proof.
247
00:25:56,000 --> 00:25:59,809
So there have been many attempts to prove this formula in a simpler way.
248
00:25:59,810 --> 00:26:04,130
But they came up with different proofs, which were even more complicated than the original one.
249
00:26:04,670 --> 00:26:08,480
So let us now plot Q_n as a function of n, so
250
00:26:08,510 --> 00:26:11,980
the probability that the student has survived for n steps.
251
00:26:13,140 --> 00:26:18,900
You can see that initially the probability is one half, because the student is starting just on the edge of the river.
252
00:26:19,200 --> 00:26:25,800
So if the first step goes in the wrong direction, he will immediately fall into the river. And then the probability decreases to zero:
253
00:26:26,580 --> 00:26:29,010
So if you wait long enough, the student will fall.
254
00:26:29,460 --> 00:26:39,570
I'm telling you about the survival probability because it is a crucial quantity in extreme value statistics,
255
00:26:39,900 --> 00:26:46,860
because it can be used as a building block to study more complicated quantities like the distribution of the time of the maximum.
256
00:26:47,250 --> 00:26:53,160
And this is what I will tell you about. So let's consider once again a random walk of this type.
257
00:26:54,380 --> 00:26:59,780
I let the random walker evolve for n steps, and here is what I want to study:
258
00:27:00,080 --> 00:27:04,460
I want to know, what is the probability distribution of the time of the maximum, t_max?
259
00:27:05,520 --> 00:27:10,710
So what is the probability that the time of the maximum t_max is either at the beginning, or at the end, or in the middle, for instance?
260
00:27:12,180 --> 00:27:18,570
So we will be able to compute this distribution exactly by using the result for the survival probability.
261
00:27:19,080 --> 00:27:22,410
So the first observation that we need to make is the following.
262
00:27:24,390 --> 00:27:28,890
Almost by definition, the random walker cannot go above its maximum.
263
00:27:29,990 --> 00:27:36,290
It's almost tautological. So the random walker cannot cross the red barrier, which is the maximum.
264
00:27:36,920 --> 00:27:40,610
Now, let's split this trajectory into two parts.
265
00:27:41,180 --> 00:27:46,700
So the first part is from time zero to time t_max. And the second part is from time t_max to time n.
266
00:27:48,310 --> 00:27:56,860
So what's going on in the second part? The random walker starts from the maximum, and it has to stay below the maximum for some number of steps.
267
00:27:57,730 --> 00:28:00,730
And this is precisely what the survival probability is telling you.
268
00:28:01,670 --> 00:28:07,760
So the probability of this part of the trajectory is precisely this survival probability.
269
00:28:09,800 --> 00:28:14,540
In this case, it is q_{n - t_max}, because there are n - t_max steps on that side.
270
00:28:15,460 --> 00:28:21,080
And in the first part, I can apply a very similar argument.
271
00:28:21,100 --> 00:28:31,379
I just have to go back in time. So if I start from the maximum, the random walker has to stay below the maximum for t_max steps.
272
00:28:31,380 --> 00:28:37,440
So the probability of the first part is just the survival probability q_{t_max} for t_max steps.
273
00:28:38,520 --> 00:28:41,970
And the probability distribution of the time of the maximum is just the product of these two probabilities.
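In formulas, writing q_k for the survival probability, the argument gives P(t_max = k) = q_k * q_{n - k}. A short sketch of this distribution, using the Sparre Andersen expression q_k = C(2k, k) / 4^k quoted earlier; the choice n = 20 is my own:

```python
import math

def q(k):
    # Sparre Andersen survival probability of a symmetric continuous-jump walk
    return math.comb(2 * k, k) / 4 ** k

def p_tmax(n):
    # Distribution of the time of the maximum: survival before t_max
    # (running time backwards) times survival after t_max.
    return [q(k) * q(n - k) for k in range(n + 1)]

p = p_tmax(20)
print(sum(p))       # the products already sum to 1: no normalisation needed
print(p[0], p[10])  # the two edges carry much more probability than the middle
```

The U shape of these weights is the discrete version of the arcsine law: the maximum is most likely found near the beginning or near the end of the walk.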
274
00:28:42,330 --> 00:28:49,430
Very simple. Okay. So if you remember that q_n is a universal quantity,
275
00:28:50,270 --> 00:28:58,420
this is telling us that the distribution of t_max is also universal: it doesn't depend on the specific way in which I model my system.
276
00:28:59,940 --> 00:29:04,380
If I plot this quantity, the probability distribution of the time of the maximum,
277
00:29:06,140 --> 00:29:08,840
it looks something like this in the limit of large n.
278
00:29:09,740 --> 00:29:17,660
So what I can understand from this plot is that it is more likely to find the maximum either at the very beginning
279
00:29:18,770 --> 00:29:21,830
or at the very end of the interval that I'm considering.
280
00:29:22,880 --> 00:29:28,640
Because the probability is diverging; actually, the probability is going to infinity in the limit of large n
281
00:29:28,640 --> 00:29:34,460
when t_max is going to zero, or t_max is going to n, which is the final time.
282
00:29:35,000 --> 00:29:41,270
And this is telling me that there are way more trajectories which are either increasing like this, reaching
283
00:29:41,270 --> 00:29:46,490
the maximum at the final time, or decreasing like this, reaching the maximum at the very beginning.
284
00:29:47,570 --> 00:29:54,470
So you might think that, if we model the evolution of a price in the stock market just as a random walk,
285
00:29:55,510 --> 00:29:59,770
the best time to sell a stock is either in the morning or late in the afternoon.
286
00:30:00,580 --> 00:30:05,050
But before you do that, you have to keep in mind the distribution of the time of the minimum,
287
00:30:06,080 --> 00:30:15,170
which is exactly the same. So you have to be very careful in describing the stock market as a random walk.
288
00:30:15,260 --> 00:30:24,380
It's more complicated than that. So in the last few minutes of my presentation, I want to tell you about some new results.
289
00:30:24,440 --> 00:30:28,400
So far, I have presented some classical results of extreme value statistics.
290
00:30:29,480 --> 00:30:34,520
So I would like to tell you about some more recent developments, and some of my research in this field,
291
00:30:34,970 --> 00:30:42,380
which is done in the context of active particles. So, first of all, let me tell you what active matter and active particles are.
292
00:30:42,890 --> 00:30:52,010
So active matter describes systems, like colonies of bacteria, which are composed of many individual units.
293
00:30:53,360 --> 00:31:01,520
And these units are able to absorb energy from the environment, through food, for instance, converting this energy into some form of work.
294
00:31:01,610 --> 00:31:08,600
In the case of the bacteria, this work is just persistent motion, and this is very different from what is usually considered in physics,
295
00:31:09,020 --> 00:31:18,970
for instance, Brownian motion. The bacteria move in a persistent way, while a Brownian particle just moves in a random way,
296
00:31:20,210 --> 00:31:23,870
due to collisions with the molecules of the surrounding fluid.
297
00:31:23,870 --> 00:31:34,070
And crucially, these bacteria are out of equilibrium, because they are continuously consuming energy, while Brownian motion is at equilibrium.
298
00:31:34,370 --> 00:31:42,650
So we have a lot of tools and techniques from thermodynamics and statistical mechanics to describe equilibrium systems.
299
00:31:43,490 --> 00:31:47,430
But all of these techniques do not apply to non-equilibrium systems.
300
00:31:48,020 --> 00:31:53,030
And the reason for that is that there is a continuous absorption of energy.
301
00:31:53,420 --> 00:31:56,900
And in other words, active matter is alive. Passive matter is dead.
302
00:31:57,530 --> 00:32:02,189
And it's crucial to understand the statistical properties of active particles.
303
00:32:02,190 --> 00:32:09,480
And this is what I did during my studies, by considering the run-and-tumble particle model,
304
00:32:10,430 --> 00:32:18,650
which describes, in a very simplified way, the motion of E. coli bacteria, which is shown in the lower animation.
305
00:32:18,680 --> 00:32:22,729
These are experimental data. So these bacteria,
306
00:32:22,730 --> 00:32:32,059
the way they move is that they typically move in a fixed direction, in a persistent way, with almost constant velocity, for some amount of time.
307
00:32:32,060 --> 00:32:37,820
And then they suddenly change direction, this way. This is called run-and-tumble motion.
308
00:32:38,960 --> 00:32:43,640
And to model it, I will start with the simplest possible model.
309
00:32:44,660 --> 00:32:50,090
So I assume that I have a bacterium which is starting from a barrier, which is this red line here.
310
00:32:50,540 --> 00:32:58,610
And initially this bacterium will choose a direction uniformly at random in space, and it will start moving in that direction for some time.
311
00:32:59,180 --> 00:33:03,980
And after some time it will tumble. So it will pick a new direction, again uniformly at random.
312
00:33:05,100 --> 00:33:10,770
And I will assume that these tumbling events, these changes of direction, occur at a constant rate gamma,
313
00:33:12,000 --> 00:33:18,149
meaning that, on average, I expect to see gamma changes of direction per second, so the typical time between tumbles is one over gamma.
314
00:33:18,150 --> 00:33:22,380
But there can be fluctuations; this is actually what is called a Poisson process.
315
00:33:23,070 --> 00:33:29,460
And I will also consider velocity fluctuations, fluctuations in the speed of the particle, which are described by some probability
316
00:33:29,670 --> 00:33:36,989
distribution W, but this is not very important. And I will consider this motion in one or two or three dimensions,
317
00:33:36,990 --> 00:33:42,420
where d is the dimension of the system. What I'm showing here is the motion in two dimensions.
318
00:33:43,560 --> 00:33:51,360
So in this case, one first step to understand the statistical properties of this motion is to study the survival probability,
319
00:33:52,440 --> 00:34:00,239
which I will define in the following way: as the probability that the x component of the position of the particle does not change
320
00:34:00,240 --> 00:34:02,969
sign up to time t. In two dimensions,
321
00:34:02,970 --> 00:34:09,330
This is the probability that the random worker or the random number particle doesn't cross the barrier for a time.
322
00:34:09,330 --> 00:34:19,800
t. So this is a process which is defined in continuous time. And this survival probability was first computed in '95 in the simplest possible case,
323
00:34:19,920 --> 00:34:29,309
which is one dimension, so the motion is just on a line, with constant velocity. And it is given by this formula, where I denoted
324
00:34:29,310 --> 00:34:34,680
by I_0 and I_1 the modified Bessel functions. It is something that you can plot, and it
325
00:34:34,680 --> 00:34:39,509
looks like this. So initially this survival probability is one half, for the same reason as before:
326
00:34:39,510 --> 00:34:44,430
because half of the time the particle will immediately cross the wall. And then it decreases to zero.
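A sketch of this quantity. I'm quoting the 1D constant-speed formula as I know it from the literature, S(t) = (1/2) e^{-gamma t / 2} [I_0(gamma t / 2) + I_1(gamma t / 2)]; the Bessel functions are evaluated by their power series, and the 2D simulation (unit speed, my own seed and sample size) should fall on the same curve:

```python
import math
import random

def bessel_i(nu, x, terms=60):
    # Modified Bessel function I_nu(x) for integer nu, via its power series.
    return sum((x / 2) ** (2 * k + nu) / (math.factorial(k) * math.factorial(k + nu))
               for k in range(terms))

def survival_1d(gamma, t):
    # 1D run-and-tumble survival probability with constant speed, as quoted
    # in the literature: S(t) = exp(-z) * (I0(z) + I1(z)) / 2 with z = gamma*t/2.
    z = gamma * t / 2
    return 0.5 * math.exp(-z) * (bessel_i(0, z) + bessel_i(1, z))

def survival_2d_mc(gamma, t, trials, rng):
    # 2D run-and-tumble particle starting on the wall x = 0, unit speed,
    # tumbling at rate gamma: fraction of trajectories whose x stays negative.
    survived = 0
    for _ in range(trials):
        x, remaining, ok = 0.0, t, True
        while ok and remaining > 0:
            tau = min(rng.expovariate(gamma), remaining)     # duration of this run
            vx = math.cos(rng.uniform(0.0, 2.0 * math.pi))   # x-velocity of the run
            x += vx * tau
            if x > 0:          # motion is linear within a run, so checking
                ok = False     # the endpoint of the run is enough
            remaining -= tau
        survived += ok
    return survived / trials

rng = random.Random(7)
print(survival_1d(1.0, 2.0), survival_2d_mc(1.0, 2.0, 20000, rng))
```

The 2D estimate should match the 1D curve within statistical error, which is the universality found in the simulations described next.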
327
00:34:45,150 --> 00:34:52,500
So I wanted to study this problem more in general: for higher dimensions d, and also for velocity fluctuations of the particle.
328
00:34:53,100 --> 00:34:56,940
So I first did the simulations in d equal to one and, not too surprisingly,
329
00:34:57,820 --> 00:34:59,980
I got that the simulations agree with the theory.
330
00:35:00,920 --> 00:35:07,130
And then I repeated the simulations in d equal to two, and I was quite surprised to get exactly the same result.
331
00:35:07,520 --> 00:35:12,910
So at first I thought that I had a bug in my code. But I checked very carefully;
332
00:35:13,240 --> 00:35:20,530
it took me a long time, but this was not the case. And doing simulations in d equal to three, I got again the same result.
333
00:35:21,070 --> 00:35:28,030
And when I included the velocity fluctuations in the speed of the particle in my model, I got once again the very same result.
334
00:35:29,560 --> 00:35:35,260
So the numerical simulations were suggesting that the result is, in some sense, universal.
335
00:35:36,380 --> 00:35:43,700
independent of the details of the model. And keep in mind that this is a non-trivial statement, because many other quantities in this model are not universal.
336
00:35:43,820 --> 00:35:51,530
For instance, the position distribution of the particle, the probability to find the particle at position x at time t, is not universal.
337
00:35:52,010 --> 00:35:54,770
It depends strongly on which dimension I'm considering.
338
00:35:55,610 --> 00:36:00,230
There is a curve for d equal to four, which is something you can simulate on a computer even if it doesn't really make sense physically.
339
00:36:00,590 --> 00:36:08,140
But it is strongly dependent on d. And this looks a lot like what we saw before for random walks,
340
00:36:08,350 --> 00:36:09,580
the Sparre Andersen theorem,
341
00:36:09,880 --> 00:36:18,850
which was also a universal result. But it was not obvious how to apply this result to the run-and-tumble particle model, for many technical reasons.
342
00:36:19,180 --> 00:36:27,979
For instance, the run-and-tumble particle model is defined in continuous time, while the Sparre Andersen theorem applies only to discrete-time random walks.
343
00:36:27,980 --> 00:36:36,300
But we were able to actually develop a mapping from the continuous-time motion of the run-and-tumble particle to a discrete-time random walk.
344
00:36:36,690 --> 00:36:42,810
And we were able to show that it is actually the Sparre Andersen theorem that is behind this universality.
345
00:36:43,740 --> 00:36:49,410
And we saw before that the survival probability can be used as a building
346
00:36:49,410 --> 00:36:53,130
block to compute more complicated quantities in extreme value statistics.
347
00:36:54,140 --> 00:36:57,050
And this is the case also for the run-and-tumble particle model.
348
00:36:58,070 --> 00:37:05,240
So using this building block, the survival probability, we were able to show that the time of the maximum for this random process,
349
00:37:05,240 --> 00:37:08,240
which is the run-and-tumble particle motion, is also universal,
350
00:37:09,020 --> 00:37:14,090
and that the number of records for this model is also a universal quantity.
351
00:37:14,420 --> 00:37:17,420
In this plot, I'm showing the following:
352
00:37:17,810 --> 00:37:24,440
the blue line is the theory, and the different symbols, which kind of overlap with each other, are the simulations.
353
00:37:24,440 --> 00:37:30,229
And they all follow the very same curve. On the left, I'm plotting the cumulative probability of the time of the maximum,
354
00:37:30,230 --> 00:37:36,890
so the probability that the time of the maximum is less than some value t prime, as a function of t prime; and, on the right, the mean number of records.
355
00:37:37,670 --> 00:37:44,030
And this is again a very non-trivial result, because we have seen before that many other quantities in this model are not universal.
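A record here means a step on which the walk reaches a new maximum. For the underlying symmetric continuous-jump walk, the mean number of records after n steps is given by a classical universal formula, mean R_n = (2n + 1) C(2n, n) / 4^n, which I'm quoting from the random-walk literature; the step distributions, seed, and sample sizes below are my own illustration:

```python
import math
import random

def mean_records_exact(n):
    # Mean number of records of a symmetric continuous-jump walk after n steps,
    # counting the starting point as the first record.
    return (2 * n + 1) * math.comb(2 * n, n) / 4 ** n

def mean_records_mc(n, trials, rng, step):
    # Monte Carlo estimate: a record is counted whenever the walk
    # exceeds its previous maximum.
    total = 0
    for _ in range(trials):
        x, best, records = 0.0, 0.0, 1  # x0 = 0 is the first record
        for _ in range(n):
            x += step(rng)
            if x > best:
                best = x
                records += 1
        total += records
    return total / trials

rng = random.Random(1)
exact = mean_records_exact(10)
gauss = mean_records_mc(10, 20000, rng, lambda r: r.gauss(0.0, 1.0))
cauchy = mean_records_mc(10, 20000, rng, lambda r: math.tan(math.pi * (r.random() - 0.5)))
print(exact, gauss, cauchy)  # the mean number of records does not depend on the jumps
```

As with the survival probability, the jump distribution drops out entirely, which is the kind of universality the plot on the right illustrates.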
356
00:37:44,030 --> 00:37:47,750
So there is something special about these extreme value statistics quantities.
357
00:37:48,410 --> 00:37:53,210
So, to conclude, I presented to you different simple models.
358
00:37:54,110 --> 00:38:00,830
And from this, I hope that we have built an intuition on how extreme events behave statistically.
359
00:38:01,310 --> 00:38:06,680
And these results have many applications to finance, physics, evolution theory, and so on.
360
00:38:07,070 --> 00:38:11,540
And so it's a very exciting and interdisciplinary field of study.
361
00:38:11,810 --> 00:38:21,110
And the crucial point that I want to make is that often these results were universal, independent of the specific way in which we model the system.
362
00:38:21,260 --> 00:38:24,409
And this is very important because often we model the system in a way which is not
363
00:38:24,410 --> 00:38:29,180
accurate because we don't have access to the full information about the system.
364
00:38:29,180 --> 00:38:32,420
So we have to make assumptions. And if we get a universal result,
365
00:38:32,960 --> 00:38:37,730
we are guaranteed that our results are robust to errors in the way we model the system.
366
00:38:38,270 --> 00:38:42,890
So, as a final question: as I mentioned before, there is no general theory
367
00:38:43,820 --> 00:38:46,340
for extreme value statistics in correlated systems.
368
00:38:46,850 --> 00:38:57,320
And, as a very ambitious goal, we would like to find one, or at least to explore possible directions to find one.
369
00:38:57,710 --> 00:38:59,750
And with this, I want to thank you for your attention.