The Department of Computer Science and the Big Data Institute, as you've just seen, are recording this meeting. So if you don't wish to be recorded, please keep your microphone off and your camera turned off. Otherwise, I will hand over to Christoph to do the introductions.

Thanks, Christine. It's a pleasure to introduce Professor Samir Bhatt, Professor of Machine Learning and Public Health at the University of Copenhagen in Denmark, and Professor of Public Health and Statistics at Imperial College. Sam is known to many in Oxford from his time here, including at the Big Data Institute. He has made many contributions at the interface of statistics and public health, including important contributions to the geospatial dynamics of infectious diseases, notably on malaria and its global burden, and very prolific and important contributions on the dynamics of COVID-19, always at the interface of applied and fundamental research. He will be talking to us today about the fundamentals underlying some of the work he and his collaborators have achieved.

Thank you. And thank you, Seth, for setting this up. I was a bit at odds over what to actually present, because we've been working on a paper in my group, with my collaborators, for a long time. I'm really proud of it; people are really happy with it and find its contribution very interesting. But it can be a bit dry for those who are not used to, or not interested in, branching processes. So I wanted to give a little history of infectious disease modelling and the different approaches, so that those in the audience who are not used to infectious disease modelling can place what our new contributions are, and then I'll move on to some interesting facets that I've been working on.

So the first thing is: what's our goal? Most of the time in infectious diseases, we're interested in some measure of the epidemic, something like prevalence, the number of infected individuals at any time. I go somewhere, I find out how many people are infected with, say, SARS-CoV-2, and that's my prevalence. Or you could look at something like incidence, which is the number of newly infected individuals at time t. There are, of course, other measures, but our goal, essentially, is to be able to model these two metrics.
Prevalence you can get from measuring blood samples and seeing whether people are positive for a virus, parasite or any disease of interest; incidence, of course, you can just calculate from routine case data. So the question is: why don't we just use Gaussian process regression and call it a day? Why don't we just use Kalman filters, or ARIMA, or recurrent neural networks? These are all incredible tools, they're very good for prediction, and they may give you a really good measure of uncertainty. And they're empirically motivated, from the fields of machine learning and deep learning, or at least from econometrics. So why don't we just use them and end the talk here? Why am I even looking at things like branching processes?

The key thing is that we want to try for the best of both worlds, and I use this term a lot: semi-mechanistic models. The idea is that we want to encode some notion of dynamics in there. In physics, we can ultimately mathematise chemical and physical law with remarkable accuracy, with things like quantum electrodynamics, or even just the swing of a pendulum, chaotic or not. What we really want to do is try to disentangle mechanisms that have some basis in epidemiological reality. And the reason we want to do this is that if we can understand the mechanism, it goes a long way towards understanding the causality of the whole process: why things happen, why certain hypotheses hold. So the idea is that we don't have to throw away things like ARIMA or Gaussian processes. In fact we use random walks, which are Gaussian processes with a specific precision, in all our models; but we want to embed these in a mechanism. The question then becomes: if we take this flexibility plus a mechanism, what is the mechanism we want to use for modelling epidemics?

And then we move on and think about what it is we actually care about. We have data on incidence or prevalence: the number of infected individuals, or the number of newly infected individuals. But what we really care about is the rate of transmission, which, as all of us familiar with the COVID-19 pandemic now know, is generally reflected in the reproduction number, and that is simply the average number of secondary cases.
So, the average number of people I infect after I'm infected. Historically, this traces all the way back to the earliest models in infectious disease modelling, from Ronald Ross and Macdonald on malaria. So we actually care about this metric, R_t. But many people familiar with the statistical literature could say: well, why don't we just estimate growth rates? And one of the big contributions here is from Wallinga and Lipsitch, in terms of translating some degree of epidemiological mechanism, using probability generating functions and moment generating functions, to calculate the reproduction number from the growth rate. However, we have to take care when trying to estimate growth rates or rates of transmission, because, as has been shown, you have to be really careful when estimating growth rates or reproduction numbers. So again, part of the benefit of having a mechanism in there is the knowledge that growth rates can be very difficult to estimate, because of the exponential growth, and especially when looking at the tails. By providing a mechanism, we can control this statistical process a little, although not entirely.

So the question is: what is our mechanism? Once again, I do like the physical analogy, and I can't stress it enough. The field of mathematical biology was driven by this idea of physicists coming into biology, Robert May and, among the many other contributions, Roy Anderson, et cetera, bringing in mechanism, trying to do for epidemiology and biology what had been done in physics. And there are so many mechanisms concerning infectious diseases that have grown up in epidemiology, from network structures to underlying dynamics in both time and space.

Arguably, the first model that really set the stage was the seminal work of Kermack and McKendrick. It's pretty hard, even if you've never done any infectious disease modelling, to avoid the work that they did. They gave us the classic SIR models, which track susceptible, infected and recovered individuals with a very simple series of equations. And part of the reason we consider the reproduction number comes not only from Ross and Macdonald, but also from this paper, where, although it's not a time-varying R_t but a constant R_0, they were already trying to estimate the reproduction number from these equations.
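To make the Kermack-McKendrick setup concrete, here is a minimal sketch of the SIR equations and the R_0 = beta/gamma they imply. This is my own illustration with made-up parameter values, not anything from the talk's slides.

```python
# Minimal Euler integration of the Kermack-McKendrick SIR model:
#   dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I
# with S, I, R as fractions of a closed population. R0 = beta/gamma.
import numpy as np

def simulate_sir(beta=0.3, gamma=0.1, i0=0.01, days=200, dt=0.1):
    steps = int(days / dt)
    S, I, R = np.empty(steps), np.empty(steps), np.empty(steps)
    S[0], I[0], R[0] = 1.0 - i0, i0, 0.0
    for t in range(steps - 1):
        new_inf = beta * S[t] * I[t] * dt   # new infections in (t, t+dt]
        new_rec = gamma * I[t] * dt         # new recoveries in (t, t+dt]
        S[t + 1] = S[t] - new_inf
        I[t + 1] = I[t] + new_inf - new_rec
        R[t + 1] = R[t] + new_rec
    return S, I, R

S, I, R = simulate_sir()
print("R0 =", 0.3 / 0.1)                    # 3.0
print("peak prevalence =", I.max())
```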
And since that initial contribution of Kermack and McKendrick, the architectures that have grown from susceptible-infected-recovered models are arguably some of the most widespread in all of infectious disease modelling. You can cast these as stochastic differential equations; you can cast these as Markov chains; you can actually model them using reaction kinetics, which we'll talk about in a little bit. There are connections here to fundamental integral equations, including the Volterra equation and the Fredholm equation. So there is a lot that goes into this, and there are deep links: a really great paper by David Champredon and colleagues showed that these models link to the kind of renewal equations, arising from branching processes, that I'll be talking about.

After that, the great man himself, Richard Bellman, together with Theodore Harris, started studying what are called age-dependent branching processes. And one of the things, and this is actually what started me on this entire journey quite a long time back: in their remarkably short paper published in 1948, they have a line that says standard probabilistic arguments yield the same nonlinear integral equation. When you look at it, it looks remarkably like the renewal equation that I will get to in a bit. But one of the most problematic things is the claim that this can easily be found from standard probabilistic arguments. I know from working with very competent mathematicians that it really is not as trivial as that single line suggests. And of course, Theodore Harris expounded on this a lot more in his book on branching processes, further down the line.

So now we've looked at SIR-type models, which are very popular in modelling infectious diseases, and at branching processes, which we'll come back to, because that's essentially what this talk is about. There are a couple of other really interesting facets I want to go through, just to set the stage. One is Hawkes processes, introduced by Alan Hawkes in 1971: a type of self-exciting point process, where events occur and events can trigger other events to occur with some given rate. We have used these processes in modelling malaria elimination in the past, and Hawkes processes are very nice; I think they're being used more and more in infectious disease modelling. But they also have some problems. And then, of course, there are network approaches. It's very interesting how these worlds tend to stand very separate.
One of the most influential papers in network theory was the paper by David Kempe and colleagues on maximising the spread of influence. It introduced a structure called the independent cascade, and the independent cascade model is remarkably like a susceptible-infected process. The beautiful thing about that paper was that it showed that, using a greedy algorithm, you can solve an NP-hard problem in network theory with some guarantees: a really, really influential result. This idea of using networks was also used by Wallinga and Teunis in their paper trying to estimate who infected whom. We've also used these in our own work to estimate reproduction numbers. This network view is an entirely different world, but it does have people in epidemiology who really like it, and a huge amount of deep mathematics that connects with the rest. And the benefit of network approaches is that they can disentangle who infected whom quite well, in a way that other approaches can't.

And then, finally, there are agent-based models. For those who use these, it's hard to avoid Daniel Gillespie's paper on the exact simulation of reactions, which is used over and over in infectious disease modelling. Lots of infectious disease epidemiologists know a disease and its mechanisms extremely well, and can just create a population of individuals and allow it to evolve by setting rules, as shown here for tuberculosis. Such models have had some extremely influential applications: for example, some of Christl and Neil's work with colleagues; arguably one of the most influential papers on COVID-19 mortality at the start of the pandemic was built entirely from agent-based models. But difficulties come with using agent-based models: they can be very difficult to fit, and very difficult to unpick the underlying dynamics of, because there can be a lot of redundancy and confounding of parameters, et cetera.

And then, finally, we get to renewal equations, which were introduced to the world of infectious disease probably a long time ago, but most visibly by Christophe Fraser's seminal work in PLoS ONE, where he derived the renewal equation that we love; Anne Cori and colleagues then used it in a framework to estimate time-varying reproduction numbers. Now, the allure of renewal equations is pretty great, because they are really intuitive. And if you read Christophe's paper, you'll see that it makes a lot of sense how they arise from an epidemiological basis.
I'm just putting the renewal equation up there for all to see. You see I(t), the number of infected individuals, modelled as a random variable whose mean is given by the reproduction number, the rate of transmission changing over time, multiplied by all the previous infected individuals, each weighted by a factor g(u). Now, g(u) is known as the generation time, and it is something that is actually quite difficult to give a precise meaning in epidemiology, because of how it's defined; you will find later that, in the work we've done, we give it an extremely precise meaning. The generation time is generally the time between infections within an infector-infectee pair; that's generally how it's described. So, in words, the renewal equation can be read as: previous infections cause current infections, depending on the stage of infection. It's very loose.

Now, renewal equations are, in my view, extremely powerful, and that's why I think they've been used so ubiquitously and are increasingly being used. The reason I went through all the previous approaches was to set the stage for why renewal equations are so useful. ODE models more or less require solving a series of differential equations, and solving differential equations numerically can be quite problematic; they're almost never available in closed form, and stochastic differential equations are even more complicated. Network models are really nice, except you need to make a lot of assumptions about what's going on in the network. Agent-based models, as I've said, are exceedingly complicated to do inference on, very, very difficult. Hawkes processes tend to be really, really nasty to simulate. Renewal equations, on the other hand, tend to be very easy to embed within probabilistic programming languages, very easy to optimise, and very easy to compute: it's just a sum there, with quadratic complexity. So we have renewal equations that allow us to embed flexibility within a mechanism: the renewal equation gives us some mechanism, and we feed flexibility into this mechanism to help explain the epidemiology.

An example is some of the work we've done on understanding the effectiveness of governmental interventions, first in our initial paper and then subsequently in more work, where we tried to understand what interventions actually work to control the spread of SARS-CoV-2 and the disease COVID-19.
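For reference, the renewal equation described at the start of this passage reads as follows in the usual notation. This is my reconstruction from the spoken description; the Poisson observation model in the discrete form is one common choice, not necessarily what the slide used.

```latex
% Continuous form: expected infections now are R(t) times past infections,
% weighted by the generation-time density g.
\mathbb{E}\left[ I(t) \right] \;=\; R(t) \int_0^t g(u)\, I(t-u)\, \mathrm{d}u,
\qquad
% one common discrete-time counterpart:
I_t \sim \mathrm{Poisson}\!\left( R_t \sum_{u=1}^{t} g_u\, I_{t-u} \right).
```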
Coming back to interventions: that was probably the most important question for a large period of this pandemic. To answer it, there are many approaches you could take, but by having a mechanism we can account for the lags between infection and cases, and between infection and death. We can use a mechanism, the renewal equation, that is plausible for explaining the epidemiology. And then we can embed flexibility in there, in terms of putting in stochastic processes and doing standard regression terms with sensible priors. And we can put all of this within state-of-the-art probabilistic programming languages, such that we have an entire framework, from start to finish, that can answer extremely important questions; that's what you see in the figure. And it's hard to overemphasise this, because the benefit of doing this kind of work in probabilistic programming languages is that a model can be extremely interesting and extremely relevant, but extremely inconvenient to actually implement. The renewal equation, by contrast, is very easy to implement, and you can do lots of sensitivity analyses; that's what you see in these plots: sensitivity analyses, leave-one-out checks, predictions into the future. They really allow you to understand the problem, summarise the problem, and communicate it to decision-makers in a way that they can really understand.

The other way renewal equations show themselves to be very powerful is that they can test epidemiological hypotheses. By this I mean, as an example, some of the work we did in Brazil. There was a complex situation in the city of Manaus, which had a huge first wave, which you see in the proxy, the yellow line there, and then a second wave. Now, the question was: why did this second wave occur? A new variant turned up, so we got some information from phylogenetics about when that happened. But we really needed some sort of model with plausible epidemiological characteristics that could account for things like waning of immunity, transmissibility, immune escape: which of these hypotheses, and how they link together, really explains the situation on the ground. Renewal equations allowed us to have a fundamental mechanism that we could tweak and use to test epidemiological hypotheses. Without a mechanism in there, arguably all you're doing is fitting to some data and predicting;
you're not really understanding the underlying mechanism. And finally, some work done by my colleagues at Imperial: you can use renewal equations to embed other really interesting and complicated characteristics, such as mobility, to understand contact patterns, and my colleagues have done a lot of work extending this. So the reason for showing these is not simply to highlight some papers from our work that we're happy with, but more to make the point that the unifying factor behind all of these applications is the renewal equation. And the reason we stuck with the renewal equation is that it embeds quite a lot of epidemiology in terms of the infection process.

So, having gone through all those approaches, I'm going to show some exciting aspects in the next slides. If you want to learn about renewal theory, really the first, and probably best, paper on the topic is Feller's work from 1941, which looked at many aspects of the renewal equation. The renewal equation, for those in other fields, is also a Volterra integral equation of the second kind; basically, many integral equations have similar forms. And the question is: how do we link this equation to infectious diseases? Previous research on the renewal equation tends to study asymptotic behaviour, that is, how the renewal equation changes, or what it averages to, in the limit of time going to infinity; or it looks at single parameters, like the Malthusian parameter. Now, setting the stage, I've said that what we're really interested in is the rate of transmission, not just some growth parameter or some summary estimate.

And in truth, and I really think this is a fruitful area of research, there are lots and lots of exciting links to renewal equations. The Hawkes process that I mentioned: its expectation turns out to satisfy a renewal equation; there's a fantastic paper on this by Alan Hawkes. Hawkes processes, in turn, are an infinite-order autoregressive process on the integers, as shown by Matthias Kirchner in 2016. And when you look at the renewal equation in the discrete sense, it looks remarkably like an autoregressive process, an AR model with a very specific set of coefficients determined by the generation time, which sums to one. Then there's another great paper linking Hawkes processes to SIR models. And, like I said, the work by David Champredon shows that SIR models have a specific renewal equation form.
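For reference, the self-exciting structure and the autoregressive reading mentioned here look like this in standard notation (a sketch on my part, not the talk's slides):

```latex
% Linear Hawkes conditional intensity: a baseline mu plus self-excitation
% from past events t_i through a kernel phi.
\lambda(t) \;=\; \mu + \sum_{t_i < t} \phi(t - t_i).
% The discretised renewal equation, read as an AR process whose
% coefficients are pinned down by R_t and the generation-time weights g_u:
I_t \;\approx\; R_t \left( g_1 I_{t-1} + g_2 I_{t-2} + \cdots \right).
```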
And recently I've been thinking about stochastic Volterra equations as another big link into this world, connecting renewal equations to branching processes; that's just something I've been interested in lately. So these are all, in a sense, the same kind of thing, and that's quite cool, because they all encompass the same idea: one thing affecting another. At its core, when you reduce down and remove all the mathematics, this is what all these approaches are doing at an intuitive level: they reflect some degree of self-reference, some degree of one person infecting another.

So, after that brief, or rather long, history: what is the question? The question concerns the equation you see there, the one that Christophe and Cori have used, in its continuous form: is there a mathematically principled source for this equation? That is not to say that their work was not mathematically principled. What I mean is: can we connect a stochastic process to this equation?

The idea we had was to first start by looking at Bellman and Harris's paper, and we derived this for the Bellman-Harris process, which is where the maths initially started. The motivation back then was simply to understand government responses. And then, as we moved on from that, we started thinking: well, if the Bellman-Harris process is one branching process, why don't we just pick the most general branching process we have and see what that gives us? And the most general branching process, at least to my knowledge, is what's called the general branching process, or the Crump-Mode-Jagers process. I'm going to tell you what this process is properly in a moment, but basically: a Crump-Mode-Jagers process is a branching process where we start with one individual, and after a random amount of time that individual can give rise to new individuals, throughout their entire lifetime, and the number of individuals they give rise to is also random. So it's a very general process, very akin to an individual-based model. And that's why I particularly liked it: because in using this branching process, we're essentially connecting the world of governing equations, like the classic susceptible-infected-recovered equations, to the world of agent-based models, by defining the stochastic process in terms of individuals. And I'm going to show towards the end that this is actually really important;
it's something that lots of approaches are missing. And you could always ask the question: why a stochastic process, anyway? I think this was best encompassed by a quote from Peter Jagers, in 'A plea for stochastic population dynamics'; I guess the word 'plea' in there already shows how much he believed in this. He argued that biological populations are finite, consisting of individuals with varying lifespans and reproduction, and that they should be modelled as such. And what he writes here is what underlies biological processes, at least in infections: epidemics are comprised of individuals with varying infection durations and varying numbers of people they infect, and they should be modelled as such. And the key point is that modern probability theory allows for this.

So that's what we wanted to do. And it goes way beyond just the philosophy of science, of why use a stochastic process: start with the stochastic process and see where that leads us, even though I've already given you the spoiler that it leads to the renewal equation of Christophe Fraser, Cori and others. The stochastic process tells you a lot of things, as we'll see further down the line. If you simulate from a Bellman-Harris process, you see the simulations in the right plot as the black lines, and we're interested in computing the green line, which is the mean, the mathematical expectation, of this stochastic process. Now, the underlying stochastic process has so many interesting aspects that we tend to lose if we just look at the mean, if we just look at compartments. Look at the level of profound overdispersion when simulating from it: it really lets you know that, when communicating the risk of infectious diseases, just looking at the mean probably isn't going to cut it. And this is something that I will touch on at the end.

Now, I'm going to breeze past some of this, because I don't want to spend too much time on it or dwell on it for the whole lecture, but I'm going to go through a little bit about point processes and the Crump-Mode-Jagers process, just to give you a flavour of what we do and how these things are computable and derivable in a mathematical sense. First let me give you the intuition of the branching processes; I'll give two examples. First, the Bellman-Harris branching process: we start with one individual, and after a random amount of time that individual will give rise to a random number of new individuals.
So in this case, the orange individual gives rise to three blue individuals after some random time. Now, the random time here is very closely connected to the generation time, and it has a very precise meaning in this sense. But you could also consider a more complicated model based on an inhomogeneous point process, where you start with your orange individual; this orange individual remains infectious for a certain amount of time: they get infected, and then they recover, let's assume. Over their duration of infection, they can infect people according to how infectious they are, and throughout this period they generate new infections from an inhomogeneous point process with a given transmission rate. Now, these branching processes give us a very simple way to model very plausible processes, and what I'm telling you lays bare all the assumptions behind them. It's not like I've given you some simple model where you're looking at the compartments and wondering: I get the compartments, but what are the dynamics between them? This is why people like agent-based models: it's very easy to understand the assumptions, the precise assumptions. And of course, we can go on to derive very complicated models from this foundation.

So, using our branching process, can we actually understand these two scenarios easily? Let's go through it; I know it's really dry, so I'll go through it reasonably quickly so we can move on. We start with the Crump-Mode-Jagers process, the general branching process. We start with one individual at some time; that individual stays infectious for a certain amount of time, and that's the L variable. Then we have X, which is a stochastic process called a random characteristic. This is beautiful, and my collaborator Mika has done all the heavy lifting on this aspect. When he first introduced it, I thought it was so beautiful and intuitive, and it makes life easy: when we first started doing this from the derivation of Harris and Bellman, it was really messy and complicated, and we had to go into measure theory; then suddenly, looking at point processes, you realise just how powerful point processes are in simplifying it. And then we have N, which is the counting process, and which is highly intuitive: it just keeps track of the number of new infections generated by the individual.
So we have a random amount of time, a counting process, and a stochastic process called the random characteristic; for each individual, we have these three bits. Now, for example, we can write down the Bellman-Harris process, which is the first equation there, and it's extremely simple. We say that the number of new infections generated by an individual is zero until their lifetime L elapses, at which point they give rise to N new infections. This is exactly what I said before: an individual stays infected, and at the end they generate new ones. The point process version, of course, is also very simple: we just have an inhomogeneous Poisson process with some transmission rate rho and an infectiousness profile kappa, and the counting process is the integral of that, as you would have generally with an inhomogeneous Poisson process. So it suddenly becomes very easy to define these characteristics. Similarly with the random characteristics: it becomes very easy to define incidence, and cumulative incidence, and prevalence. Prevalence, of course, is just the number of individuals infected between two time points who are still infected, and that's exactly what's shown there. It's very easy to encode the measures that we're interested in.

So now some derivation, and I'm going to zoom past this because I don't want to bore everyone. We define some random characteristic; we single out the index case, and this is, of course, the most important and really the most beautiful aspect of how Bellman and Harris's argument came about: separating out the index case, the first generation, from all the other cases. The intuition behind it is that by doing this, at the first generation, we can create some degree of self-similarity in terms of what the subsequent generations do. So we split on the first generation: we can look at the descendants of the first generation, and we can split that across each individual in the first generation; that's the sum over k you see there. And then, depending on the statistic of choice, we can use the tower property of expectations to create an expected value. Remember, we start with this stochastic process, which generates realisations, and what we're interested in is the average and, as we'll see later, the second or higher-order moments. Then, almost done, to make things look easier we change that sum into an integral. And it really is as simple as that: just algebraic operations which, even though they may look complicated, are actually very straightforward.
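To make the object concrete, here is a minimal event-driven simulation sketch of the Bellman-Harris special case, matching the earlier picture of black realisations around a green mean. The lifetime and offspring distributions and all parameter values are illustrative assumptions of mine, not the paper's code.

```python
# Bellman-Harris branching process: each individual stays infected for a
# Gamma-distributed lifetime and, at the end of it, generates a negative-
# binomial number of new infections (via the Gamma-Poisson mixture).
import heapq
import numpy as np

rng = np.random.default_rng(1)

def bellman_harris(mean_offspring=1.5, k=0.5, shape=2.0, scale=3.0,
                   t_max=60.0, cap=100_000):
    """Return (birth, death) times for every individual created by t_max."""
    events = [(rng.gamma(shape, scale), 0.0)]    # (death time, birth time)
    lives = []
    while events and len(lives) < cap:
        death, birth = heapq.heappop(events)
        lives.append((birth, death))
        if death >= t_max:
            continue                              # offspring born after horizon
        n_children = rng.poisson(rng.gamma(k, mean_offspring / k))
        for _ in range(n_children):
            heapq.heappush(events, (death + rng.gamma(shape, scale), death))
    return lives

def prevalence(lives, t):
    """Number of individuals infected at time t (born, not yet recovered)."""
    return sum(b <= t < d for b, d in lives)

lives = bellman_harris()
print([prevalence(lives, t) for t in (10, 20, 30, 40, 50)])
```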
There's nothing really complicated about the derivation itself, other than having to find it in the first place; but, you know, that makes a huge difference. And we get to a renewal equation. This is the first point where we realised, with the Bellman-Harris process and then the more complicated processes, that it's wonderful: after all of this, we have a branching process into which we can imbue the dynamics of individuals, we calculate the average of that branching process for a measure we're interested in, for a specific counting process, and what we get is a renewal equation, a very general renewal equation. Now, this equation looks nothing like the equation of Christophe and Cori; it's far, far more general. But, and sorry for all the maths, to get to the point: what this equation does is allow us, for the first time, to actually have a time-varying reproduction number for general branching processes. So we can now use this framework, and although the maths looks complicated, I think that if an individual wants to put the time in, they can go and say: here is my individual-level branching process; I want to tweak it in certain ways to have it behave with certain dynamics; can I then derive an expectation formula from that, and can I use that renewal equation to do it? That's exactly what we did with our processes.

So, with this renewal equation, for the first time, you can do many new and interesting things. Individuals can infect other individuals after some random time, and the number can vary. They can infect at random times via a Poisson process; we haven't done it, but in principle you could also look at compound Poisson processes, and that would be interesting. And the number they can infect changes over time. So this is the really new contribution. It may seem very esoteric, but from our viewpoint it's connecting the world of branching processes to the world of renewal equations, which already have a huge amount of epidemiological basis, and doing it in such a way that everything becomes extremely customisable. As an example of that customisability, we can actually derive a renewal equation for the Bellman-Harris process or for the inhomogeneous Poisson process. And what is really interesting is that these two are, epidemiologically, almost the same. So we found it really confusing at first, because the equation that everyone uses, where previous infections affect future infections, is explicitly a Bellman-Harris-type model.
We had found that assumption, that all the infections have to happen at one time, this idea of an instantaneous burst, a very poor approach to modelling infectious diseases. So why do the renewal equations work? Well, it turns out that the inhomogeneous Poisson formulation is, for all intents and purposes, practically the same epidemiologically. So you can interpret the renewal equation either through a Bellman-Harris view or through an inhomogeneous Poisson view. And this is, well, to me, mind-blowing; but maybe it's only new to us.

And in fact, after much struggling, and I really struggled with this because it took so long, my postdoc at the time, Thomas Mellan, and I proved via induction, at least in the discrete case, that the common renewal equation that Christophe, Anne Cori and others introduced is actually a very special case of ours. What we then do is provide renewal equations for cumulative incidence, incidence and, most importantly, prevalence. To my knowledge, up to now, no one has had a renewal equation for prevalence. Everyone uses what's called a back-calculation approach, where you convolve incidence with the generation interval to get prevalence. This requires you to have latent functions and processes; it's not elegant, to say the least, and also not practically very nice. What we do is unify prevalence and incidence without the back-calculation approach.

Now, this is one aspect I won't go through fully, so you'll have to ask me about the specifics of the proof. But essentially we prove, and I say 'prove' loosely, because there are still some bits that need to be done in terms of full rigour, that the relationships between prevalence and incidence in these new-found renewal equations conform exactly to what we know from epidemiology. And this is a really beautiful thing: you derive all this complicated maths, and the complicated maths reflects what people already know in epidemiology; it's sort of doing formal mathematics for what is known heuristically to be true in the field. So we show that not only can I give you a renewal equation for prevalence and for incidence, but I can show you that these two equations are consistent under the common definitions; in this sense, we provide the first framework that unifies prevalence and incidence. Now, starting from the basis of Christophe's or Cori's equation, you can't immediately get there just by writing it down; we have tried this in the past.
It's really difficult, and that's because you need to do this from the underlying stochastic process; it's much easier to do it that way, starting from very simple principles and building up, than starting from the end result and working backwards.

Now, all of this is pointless if it's not easy to code, and it is trivial to code. This block solves the entire renewal equation for prevalence, and it requires nothing but element-wise multiplications and sums. In modern statistical computing, on GPUs, these two computations are exceedingly fast; so fast that the bottleneck is actually sampling from the underlying posterior distribution. Optimising across some non-convex space takes more time than actually solving the equations themselves. So although our equations are slightly more complicated than the renewal equations people currently use, because they are far more general and you can do a lot more with them, they actually don't take that much time to solve. And in fact, I've been playing around a lot with Julia recently, because I really like Julia now, and you can write this even more simply there, in just a few lines.

So, that was a lot of content. What have we learnt? We've arrived at the commonly used equation in a principled way. Is it groundbreaking, in terms of changing the face of epidemiology? No, obviously not. But we show where this equation arises from, in a principled mathematical way, from a stochastic process. And I think the most important thing is that we connect the world of agent-based models to the world of governing equations, so that people can build on it. We prove the two equations equivalent, i.e. the renewal equation currently being used is a special case of a more general one. We unify prevalence and incidence, and we provide an efficient computational scheme.

So, applications. The most obvious one is: go analyse the data, and it's a great thing that I'm going to release all the data and code too, so that everyone can. We can model case data using these equations in a full Bayesian framework really easily, and we get exactly what you'd expect. The benefit of using stochastic processes is that we don't have to make any arbitrary assumptions about the functional form of R_t here. You could use a Gaussian process; you could use P-splines; you could use random walks, which are basically Gaussian processes too; you can use any functional form you want in there to estimate it.
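As a concrete sketch of the element-wise-multiplications-and-sums point, here is a minimal forward solve of a discrete renewal equation in Python. All parameter values are illustrative, and the prevalence line uses a generic survival-function convolution as a stand-in; the paper's own prevalence renewal equation has the same computational shape but is not reproduced here.

```python
# Forward solve of a discrete renewal equation: each step is an element-wise
# multiply plus a dot product, so the whole solve is O(T^2) and maps well
# onto GPUs.
import numpy as np

T = 300
u = np.arange(1, T + 1)
g = np.exp(-0.5 * (np.log(u) - 1.6) ** 2 / 0.25)   # generation-time weights
g /= g.sum()                                        # (illustrative, sums to 1)
R = 1.5 * np.exp(-0.01 * u)                         # some time-varying R_t

I = np.zeros(T)
I[0] = 10.0                                         # seed infections
for t in range(1, T):
    I[t] = R[t] * np.dot(I[t - 1::-1], g[:t])       # the renewal sum

# A prevalence-style quantity: past infections still active, weighted by
# the probability of remaining infected u days after infection (stand-in).
surv = np.exp(-u / 7.0)
P = np.array([np.dot(I[t::-1], np.r_[1.0, surv[:t]]) for t in range(T)])
print(round(I.max(), 1), round(P.max(), 1))
```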
So our renewal equations can do what the previous equations did. That's not new, but I'm telling you that you haven't lost anything here; and you can do new things. This is a recent example that I did for prevalence. This is the UK ONS infection survey. In the plot on the top right, you see the population prevalence over time; the axis is just the percentage of individuals testing positive for SARS-CoV-2, the number that have COVID-19 at any given point. And we want to estimate the reproduction number. Previously, to do this you would have to use some form of back-calculation, which is very difficult, and I haven't been able to find out how the ONS do this; I need to speak to Thomas House about it at some point. But I guarantee it's not as simple as what I'm doing right here, where I have a renewal equation for prevalence that I can fit. It is the same as the equation before; I just added an extra term, and I solve it in the exact same way. I didn't even have to modify the code, and I get a very good estimate of the reproduction number. To validate this, using the same renewal equation, without doing any extra fitting, I get incidence straight away; that's what the bottom-left plot shows, with the actual cases as the blue bars and the fitted incidence in red. I look at that and I see a lag of about seven days, which is completely in line with what you'd expect: when I get infected, it takes about seven days for that to manifest as a reported case. And from that, you can calculate the ascertainment ratio, which is around 2.5 and is remarkably stable, apart from weekly fluctuations.

If we could run the tape of the pandemic again, I have no doubt that such a framework would be extremely useful. Maybe I'm being guilty of highlighting my own work too much, but I think it would be extremely useful for something like REACT or the ONS survey in this setting, because we can now have a renewal equation for prevalence that links back to a branching process, with epidemiological mechanisms that we can then build on, as my colleagues and I have done in several different applications in the past. And I think this is really powerful.

In the last few minutes, I'm just going to talk about an application to the variance. This is really one of the cool things you can get with these renewal equations and their generating functions.
You can get all the higher-order moments: not just the mean, but the variance. And there is a really important question about superspreading. Superspreading has been talked about all the time: this big thing where one person infects many. But how does superspreading actually arise? Where does this scale-free, heavy-tailed behaviour in branching processes actually come from? Is it only from the secondary distribution? That is, is the reason these branching processes in epidemics have really heavy tails just that I might go to a festival and infect 100 people, or is that not the only dynamic at play? No, actually, it is not the only one. I thought about this a lot in view of the central limit theorem, thinking that superspreading is really not a big deal; but I was using the naive kind of central limit theorem, which does not account for weak dependence. And I realised afterwards that I was entirely wrong, because there is actually no central limit theorem for these branching processes, due to the dependence over time. And you can see the effect of superspreading when you simulate from these branching processes: look at how profound the overdispersion is there.

Let's take a simple experiment to look at this. In the top-left corner, we have the reproduction number, the rate of transmission, oscillating between growth and reduction, growth and reduction. In the top right, you have the rather intuitive mean prevalence: a large first wave, a smaller second wave, then smaller and smaller. What you see in the bottom left is really fascinating: it's the variance. The variance in the second wave, despite the mean being smaller, is much, much bigger. And this is, in my view, a little unintuitive, because we're used to thinking of Poisson likelihoods, or likelihoods where the variance is some simple function of the mean. In truth, the variance is huge, and with each subsequent wave the variance grows. Even though the mean in the third wave is quite small, its variance is still of the order of the variance in the first wave. What is going on here? This is a dynamic that I personally haven't seen discussed or highlighted much in the literature, but you can see it in the simulations, where superspreading is emerging. And you can see other dynamics from this which all of us in infectious diseases know about, like extinction, yet we rarely integrate these into our modelling frameworks. Using these renewal equations, in our framework, you can model
436 00:44:40,340 --> 00:44:47,510 You can model, using these renewal equations in our framework, the precise expected extinction probability, and you know, 437 00:44:47,510 --> 00:44:53,230 you can change these from having a Poisson secondary distribution to a negative binomial secondary distribution. 438 00:44:53,230 --> 00:44:57,380 And so in plot eight, you can see the index of dispersion, which is huge, right? 439 00:44:57,380 --> 00:45:05,300 It's huge. You know, it's massive. And then you can see the extinction probabilities, which again conform to what we're seeing as time goes by. 440 00:45:05,300 --> 00:45:11,960 If there's no new importation event, some epidemics are just going to burn out, especially when R_t is less than one. 441 00:45:11,960 --> 00:45:15,100 And that's what that extinction probability is showing. 442 00:45:15,100 --> 00:45:22,320 And so finally, on the final slide, what I'm really interested in looking at now is: what is the appropriate likelihood, right? 443 00:45:22,320 --> 00:45:24,360 And let's just take our example: 444 00:45:24,360 --> 00:45:31,890 if we simulate the branching process and we look at the underlying distribution at those five points, well, at the histogram, 445 00:45:31,890 --> 00:45:37,050 we see that the blue bars are what you would get with a Poisson likelihood, and the Poisson likelihood is telling you, 446 00:45:37,050 --> 00:45:45,600 well, my mean parameter at that point is about 60, so let me put a bar at 60 and have some dispersion around that. 447 00:45:45,600 --> 00:45:51,450 But when you look at the underlying distribution simulated from an individual-based model, 448 00:45:51,450 --> 00:46:00,580 the branching process, the dynamics are very different. Long story short, what the dynamics tell you is that 449 00:46:00,580 --> 00:46:08,530 actually, what happens is many epidemics go extinct and some blow up really large. 450 00:46:08,530 --> 00:46:13,540 And this is true even when your secondary distribution is Poisson: 451 00:46:13,540 --> 00:46:22,630 the variance is finite, it doesn't have a heavy tail. This heavy-tailedness is an emergent property of branching processes, right? 452 00:46:22,630 --> 00:46:28,680 To get superspreading, you don't need individuals to be superspreaders; the branching process itself 453 00:46:28,680 --> 00:46:37,560 has high variance and heavy tails. And so I am now conjecturing, and starting to wonder, whether using a negative binomial likelihood or 454 00:46:37,560 --> 00:46:43,500 a Poisson likelihood alone is actually quite inappropriate for capturing the dynamics that we know to be true. 455 00:46:43,500 --> 00:46:50,310 I actually don't know if it'll make a difference. I suspect it won't, unfortunately, as is the case with many of these things. 456 00:46:50,310 --> 00:46:52,890 You know, sometimes an approximation is good enough, 457 00:46:52,890 --> 00:46:59,310 at least for epidemiological purposes, and doesn't actually change the underlying estimation, but I think it's extremely interesting. 458 00:46:59,310 --> 00:47:05,220 And so the question for the audience, for those who can help, is: I have this underlying distribution here. 459 00:47:05,220 --> 00:47:12,270 How do I get a parametric choice for it? Maximum entropy seems good, but I have a couple of other techniques up my sleeve. 460 00:47:12,270 --> 00:47:15,490 But you know, what's the best solution at the moment?
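On the extinction point mentioned a moment ago: for a fixed R and a given offspring family, the eventual extinction probability per introduction is the smallest fixed point of the offspring probability generating function. A minimal sketch, assuming a negative binomial offspring distribution with mean R and dispersion k (illustrative values only, not the framework's actual computation):

```python
def extinction_prob(R, k, iters=500):
    """Smallest root of q = G(q), where G is the PGF of a negative binomial
    offspring distribution with mean R and dispersion k:
    G(s) = (1 + (R / k) * (1 - s)) ** (-k)."""
    q = 0.0
    for _ in range(iters):
        q = (1.0 + (R / k) * (1.0 - q)) ** (-k)
    return q

for R in (0.8, 1.3, 2.0):
    for k in (0.1, 1.0, 10.0):
        print(f"R={R:.1f}, k={k}: extinction probability {extinction_prob(R, k):.3f}")
```

For R below one the iteration returns one, i.e. certain burn-out without importation; for R above one, smaller k (more superspreading) pushes the extinction probability up, since transmission is concentrated in a few individuals.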
461 00:47:15,490 --> 00:47:21,190 Could recovering the probability distribution from a PGF, or some approximate form, solve the renewal equation even more 462 00:47:21,190 --> 00:47:28,870 efficiently than just using sums? Or would transforms, you know, Laplace transforms and all that stuff, make things much faster? 463 00:47:28,870 --> 00:47:32,830 So where do we go next from here? I want to do more unification. 464 00:47:32,830 --> 00:47:34,060 I want to unify all these approaches. 465 00:47:34,060 --> 00:47:40,840 I think they are all essentially the same thing and could benefit from more linkage together, to understand the underlying roots behind them. 466 00:47:40,840 --> 00:47:48,370 I think that what we've described here is a perfect fit for genetics: as opposed to using a coalescent model, 467 00:47:48,370 --> 00:47:55,990 why not use an age-dependent branching process with a fixed splitting of two and then estimate a time-varying generation time, 468 00:47:55,990 --> 00:48:01,090 which would be a replacement for skyline plots? Simple to simulate from, 469 00:48:01,090 --> 00:48:08,020 simple to do inference from, and you can then link it straight to epidemics using the same equations. 470 00:48:08,020 --> 00:48:10,030 More efficient ways of computation exist. 471 00:48:10,030 --> 00:48:19,030 I'd like to embed these processes on graphs so we can connect the world of networks to these as well. And the question of immigration is huge: 472 00:48:19,030 --> 00:48:25,250 bringing new infections in from outside is very important and requires proper treatment. 473 00:48:25,250 --> 00:48:28,330 That's challenging. So I want to say thanks. 474 00:48:28,330 --> 00:48:35,770 First and foremost, to Mikko, who really did the heavy lifting of the maths on this and has been interacting with me throughout all of this. 475 00:48:35,770 --> 00:48:39,100 He's really lovely to work with and brilliant, actually brilliant. 476 00:48:39,100 --> 00:48:47,080 And to Thomas Mellan and Swapnil Mishra, who helped with all aspects of this and read three or four drafts, 477 00:48:47,080 --> 00:48:55,720 and did the work on the intermediate paper. Charlie helped me later on, and Seth, in committee meetings, came up with this idea in the first place, 478 00:48:55,720 --> 00:48:58,600 a long, long time ago. It's been a lovely project with amazing collaborators, 479 00:48:58,600 --> 00:49:07,570 and I hope it's useful for the epidemiological community rather than just a mathematical curiosity. 480 00:49:07,570 --> 00:49:13,280 Thanks. All right. 481 00:49:13,280 --> 00:49:17,450 We have 10 minutes for questions; please write them in the chat or raise your hand. 482 00:49:17,450 --> 00:49:21,720 Chris, you're up. Thanks. 483 00:49:21,720 --> 00:49:31,050 Great talk. You said there was a difficulty with relating the standard renewal equation of epidemiology to prevalence. 484 00:49:31,050 --> 00:49:35,580 If we're talking numerically and not just analytically, 485 00:49:35,580 --> 00:49:43,860 is it not just a question of convolving the past incidence with the probability of contributing to the prevalence at a given moment, 486 00:49:43,860 --> 00:49:50,840 whether by being PCR positive or seropositive or whatever? It is exactly that, except 487 00:49:50,840 --> 00:49:59,460 you need to first create a latent function for incidence, then convolve that, and then put that into a likelihood for prevalence, right? 488 00:49:59,460 --> 00:50:03,000 It probably has the same impact in terms of posterior computation.
489 00:50:03,000 --> 00:50:11,490 It's just that when you pipe one thing into another thing, it can sometimes cause lots of issues with sampling from a difficult posterior. 490 00:50:11,490 --> 00:50:17,640 Whereas here it's just the equivalent of having a renewal equation for prevalence directly; you don't need a latent function first and then a convolution. 491 00:50:17,640 --> 00:50:23,970 I mean, that's already in there anyway. It would be interesting to test and see how bad the posterior geometries are. 492 00:50:23,970 --> 00:50:31,050 I mean, you know, nothing we've put in here is without an existing solution in epidemiology and mathematical epidemiology. 493 00:50:31,050 --> 00:50:37,050 The question is, you know, is it better? In some ways, I think it's more intuitive and more succinct. 494 00:50:37,050 --> 00:50:41,220 Is it better? I think it's better from a computational perspective, but that remains to be 495 00:50:41,220 --> 00:50:46,740 tested. It probably changes on different data, and I don't know how the noise propagates. 496 00:50:46,740 --> 00:50:48,360 There are a lot of things here. 497 00:50:48,360 --> 00:50:56,010 If I want to fit to REACT and to the cases, I can trivially have two likelihoods, and that really costs me very little more, 498 00:50:56,010 --> 00:51:02,220 whereas it can sometimes get a little bit more difficult when you have to do one convolution into the other. In principle, 499 00:51:02,220 --> 00:51:08,100 it's the same thing. Christl, yeah, thanks. 500 00:51:08,100 --> 00:51:22,320 Great talk. I have two questions, really, about stochasticity. When we use a stochastic process to tune the renewal equation 501 00:51:22,320 --> 00:51:31,980 estimates, we tend to switch between Bellman-Harris and inhomogeneous Poisson representations more or less at random. 502 00:51:31,980 --> 00:51:39,540 You mentioned that you've shown that they behave the same in the mean field; do they behave the same stochastically? 503 00:51:39,540 --> 00:51:47,640 So that's the first question. And then when we add overdispersion, we tend to do that in Bellman-Harris because it's easier. 504 00:51:47,640 --> 00:51:52,530 But we've never found a good way to estimate the amount of overdispersion from the time series, 505 00:51:52,530 --> 00:51:58,860 so we always end up looking for other data sources. Do you think that's even possible, 506 00:51:58,860 --> 00:52:01,450 to estimate the degree of stochasticity? 507 00:52:01,450 --> 00:52:07,620 Yeah, I mean, they are the same in the mean, but they are not the same as individual stochastic processes. 508 00:52:07,620 --> 00:52:14,550 But, you know, at the end of the day, I think it only changes the degree to which the variance, you know, changes. 509 00:52:14,550 --> 00:52:18,720 In terms of the mean, they're the same; but in terms of simulating, yeah, they are different. 510 00:52:18,720 --> 00:52:22,320 I think it depends what you're going to summarise from them after that.
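One way to see the "same in the mean, different as stochastic processes" point is to drive the same renewal mean with two different noise models, say a Poisson draw versus a negative binomial draw around the same infectious pressure. This is a simplified discrete-time stand-in, not an actual Bellman-Harris simulation, and all values are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
T, reps, R, k = 50, 4000, 1.1, 0.2       # all illustrative
g = np.array([0.25, 0.5, 0.25])          # toy generation-interval PMF

def simulate(overdispersed):
    I = np.zeros((reps, T))
    I[:, 0] = 20
    for t in range(1, T):
        # infectious pressure: Lambda_t = sum_s g_s * I_{t-s}
        lam = sum(g[s] * I[:, t - 1 - s] for s in range(min(t, len(g))))
        if overdispersed:
            alive = lam > 0              # NB offspring, mean R per unit pressure
            I[alive, t] = rng.negative_binomial(k * lam[alive], k / (k + R))
        else:
            I[:, t] = rng.poisson(R * lam)   # Poisson representation
    return I

for name, flag in (("poisson", False), ("negbin ", True)):
    sims = simulate(flag)
    print(name, "final mean %.1f, final variance %.0f"
          % (sims[:, -1].mean(), sims[:, -1].var()))
```

The two runs agree closely in the mean trajectory but differ sharply in spread, which is the sense in which the representations are interchangeable for central estimates and not for uncertainty.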
511 00:52:22,320 --> 00:52:29,040 In terms of the overdispersion question, I'm convinced the overdispersion is there, as you'd expect. 512 00:52:29,040 --> 00:52:34,910 The benefit of using generating functions is that I can actually provide a renewal equation for 513 00:52:34,910 --> 00:52:41,460 the phi of the negative binomial that requires very little additional computation. 514 00:52:41,460 --> 00:52:49,200 So I can provide you, in closed form, what the variance should be from theory, rather than you just estimating one phi for all of your data, right? 515 00:52:49,200 --> 00:52:52,380 Right now you do it by using a renewal equation and putting in a phi. 516 00:52:52,380 --> 00:53:00,240 And the problem is that that phi is stationary, but it should be non-stationary, given what we know from individual-level simulations. 517 00:53:00,240 --> 00:53:03,870 I could provide you with that equation, of course. 518 00:53:03,870 --> 00:53:10,570 It's still not perfect, because we haven't accounted for the zero inflation, and I can provide you an equation for the zero inflation too, 519 00:53:10,570 --> 00:53:16,230 but then fiddling with a zero-inflated negative binomial and solving the equations 520 00:53:16,230 --> 00:53:22,440 algebraically is something I've been struggling to do without using non-linear solvers. 521 00:53:22,440 --> 00:53:30,240 But essentially, I think we can provide you with the higher-order moments trivially. 522 00:53:30,240 --> 00:53:37,080 And do you think it's identifiable? We've always struggled. No, I don't think it is, and I think that's because we only have one epidemic. 523 00:53:37,080 --> 00:53:45,420 The truth is, it's completely not identifiable. You have to trust that the assumptions of your stochastic process represent reality 524 00:53:45,420 --> 00:53:52,890 reasonably well, and then say that although I can't identify what the variance is from observational data, 525 00:53:52,890 --> 00:53:57,240 I think from theory it should be this. But otherwise I think it'd be completely unidentifiable. 526 00:53:57,240 --> 00:54:05,040 I mean, you know, we'd need to rerun the tape of the epidemic many times over to actually get an estimate of the variance. 527 00:54:05,040 --> 00:54:08,430 Otherwise, we have confounding and all these other problems. 528 00:54:08,430 --> 00:54:15,020 But what I want to do, and I'm going to do, is fit the prevalence data from the ONS with, 529 00:54:15,020 --> 00:54:18,180 you know, a more appropriate likelihood than the Poisson one. 530 00:54:18,180 --> 00:54:24,840 And what I think we'll find is that, as the second wave goes up, sometimes you see R_t spike, right? 531 00:54:24,840 --> 00:54:31,800 R_t randomly goes up to six or seven just because of small-sample statistics, because we haven't got the variance correct. 532 00:54:31,800 --> 00:54:36,300 With the variance correct, the mean won't need to do that, and it'll keep R_t a little bit more stable. 533 00:54:36,300 --> 00:54:38,660 This could be very useful for policy. 534 00:54:38,660 --> 00:54:45,290 I think, you know, otherwise in the news you get "R_t is six", but there are only 10 cases or something, you know? 535 00:54:45,290 --> 00:54:50,860 And of course, a big part of the reason that exact thing happened is because of this.
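The non-stationarity of phi is easy to see in simulation: compute an "implied" negative binomial dispersion at each time point from the across-replicate mean and variance of a branching process, and it drifts instead of sitting at a single value. A rough sketch with assumed parameters, using var = m + m^2/phi, so phi = m^2 / (var - m):

```python
import numpy as np

rng = np.random.default_rng(3)
T, reps, R, k = 40, 4000, 1.1, 0.2       # assumed values
Z = np.zeros((reps, T))
Z[:, 0] = 20
for t in range(1, T):
    n = Z[:, t - 1]
    alive = n > 0
    # negative binomial offspring with mean R per individual, dispersion k
    Z[alive, t] = rng.negative_binomial(k * n[alive], k / (k + R))

m, v = Z.mean(axis=0), Z.var(axis=0)
phi = m**2 / np.maximum(v - m, 1e-9)     # implied NB dispersion at each time
print(np.round(phi[1:16], 2))            # decays over time rather than staying fixed
```

That drifting phi is exactly what a single fitted overdispersion parameter cannot capture.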
536 00:54:50,860 --> 00:54:55,870 And a question from Leonid in the chat. 537 00:54:55,870 --> 00:54:59,990 Sam, you wanted this question earlier on, so I'll read it out: 538 00:54:59,990 --> 00:55:05,650 I'm wondering if there would be a straightforward way to represent the secondary attack rate in the framework you presented. 539 00:55:05,650 --> 00:55:11,200 Yeah, I mean, there is; in a way, it's baked in, right? 540 00:55:11,200 --> 00:55:14,620 We have R, and that's always the mean that we put in. 541 00:55:14,620 --> 00:55:22,090 But you can always assume, in these equations, a distribution for the secondary infections. 542 00:55:22,090 --> 00:55:28,540 For example, I can assume the secondary infections to be, you know, Poisson, negative binomial, discrete power law. 543 00:55:28,540 --> 00:55:34,730 And for all of these, as long as I can calculate a second moment, I can get the variance and so forth. 544 00:55:34,730 --> 00:55:36,740 And, just from what I was telling Christl, 545 00:55:36,740 --> 00:55:43,880 I can actually provide a renewal equation for the variance. It's not exact; it's based on me assuming I've got the underlying 546 00:55:43,880 --> 00:55:48,200 process correct. But it's definitely better than just having a fixed variance for the process, 547 00:55:48,200 --> 00:55:53,670 and we know it's more efficient. So neither is perfect. But did that answer your question? 548 00:55:53,670 --> 00:56:02,460 A secondary attack rate means the percentage of people in some setting who get infected from the index case. 549 00:56:02,460 --> 00:56:07,890 Yeah. Well, Leonid, do you want to clarify? 550 00:56:07,890 --> 00:56:19,860 Yeah, that's exactly what was said. So do you mean: what is the prevalence arising from an index case? 551 00:56:19,860 --> 00:56:29,830 There's a time component as well. So if it's within a generation interval, I guess, I mean, you know, yeah, 552 00:56:29,830 --> 00:56:35,340 in principle, we can do this by choosing the right characteristics to match this, 553 00:56:35,340 --> 00:56:41,840 but obviously, you'd have to write down the maths for it. OK. 554 00:56:41,840 --> 00:56:48,660 Let me go to someone who hasn't had a chance to ask a question yet, so, Francesco wrote a question. 555 00:56:48,660 --> 00:57:00,900 Yeah. Francesco asks: how much variability do you get when you look at the total attack rate at the end of the wave? It seems quite high. 556 00:57:00,900 --> 00:57:09,750 Well, I mean, at the end of the day, in this plot over here, you know, the population prevalence is impacted by the attack rate. 557 00:57:09,750 --> 00:57:14,280 Do you mean the total number infected, or the number infected at each time? Just to clarify. 558 00:57:14,280 --> 00:57:18,660 Yeah, the total number infected. Thank you. Yeah, I mean, that's very good. 559 00:57:18,660 --> 00:57:24,870 We have cumulative incidence; you can generate that very trivially from this exact equation. 560 00:57:24,870 --> 00:57:32,370 So I can fit the prevalence, get the incidence, and actually get total cumulative incidence as well, and then divide that by the population size. 561 00:57:32,370 --> 00:57:36,900 It requires no extra computation. So how much variability do I get? 562 00:57:36,900 --> 00:57:43,710 I mean, it's sort of a difficult question to answer, because all these equations are measuring the exact same process. 563 00:57:43,710 --> 00:57:48,300 The amount of variability you get in terms of incidence will be the same as you get in prevalence. 564 00:57:48,300 --> 00:57:51,750 They're all essentially dependent on a single R_t estimate 565 00:57:51,750 --> 00:57:59,920 and a single generation time; from a single R_t and a single generation time you can get to both incidence and prevalence.
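The cumulative-incidence point is literally one extra line on top of any fitted incidence curve. A minimal sketch with assumed numbers (the incidence here is simulated from a toy renewal equation rather than fitted, and the population size is arbitrary):

```python
import numpy as np

N = 1_000_000                            # assumed population size
g = np.array([0.3, 0.4, 0.3])            # toy generation-interval PMF
R = np.full(100, 1.1)                    # constant R_t for simplicity
I = np.zeros(100)
I[0] = 50                                # seed infections
for t in range(1, 100):
    # renewal equation: I_t = R_t * sum_s g_s * I_{t-s}
    I[t] = R[t] * sum(g[s] * I[t - 1 - s] for s in range(min(t, len(g))))

attack = I.cumsum() / N                  # cumulative incidence over population
print(f"final attack rate: {attack[-1]:.1%}")
```

In the fitted setting, uncertainty in the attack rate inherits directly from the posterior over R_t, which is why the variability in incidence, prevalence, and cumulative incidence all move together.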
566 00:57:59,920 --> 00:58:04,160 Christophe, you still have your hand up, but maybe that's old; I hope that answered your question. 567 00:58:04,160 --> 00:58:11,820 Please, you know, jump in if not. And you're still on mute, first off. OK, you've taken it down. 568 00:58:11,820 --> 00:58:18,900 So let's do one more follow-up. Chris asked about exact results for the offspring distribution. 569 00:58:18,900 --> 00:58:27,960 Yeah, you can. Well, I mean, when you calculate the higher-order moments for the renewal equation, you need to assume an offspring distribution. 570 00:58:27,960 --> 00:58:30,990 That is something you have to do. 571 00:58:30,990 --> 00:58:40,260 And yeah, as long as the offspring distribution has a second moment, then you can compute it; if it doesn't have a second moment, 572 00:58:40,260 --> 00:58:51,090 well, the variance can be essentially unbounded. OK, so we are officially at the end, so let's take a moment to once again thank Samir. 573 00:58:51,090 --> 00:58:57,540 So if people want to unmute and clap, that would be one way to do it; if they want to use emojis, that's also fine. 574 00:58:57,540 --> 00:59:03,840 And then I will just volunteer Sam to stick around and informally answer more questions and give some more feedback. 575 00:59:03,840 --> 00:59:08,160 But this officially closes it. Well, you know, thanks for inviting me, Dominic. 576 00:59:08,160 --> 00:59:13,260 Please reach out if you want the code for this; the programs are in R, Python and Julia. 577 00:59:13,260 --> 00:59:19,620 I'm happy to share, and I've written up an R notebook to make it simple, with some examples. 578 00:59:19,620 --> 00:59:21,696 Just let me know.