I was just having a meeting with some students to talk about machine learning for SARS-CoV-2 main proteases — developing models to help discover new inhibitors. Yeah. All right. OK, so we should get started.

So it gives me great pleasure to introduce Ben, who is now another co-director of the centre, and you're connected to the research computing group in Computer Science as well — research engineering. And I should also plug your book, which you can buy on Amazon, and which is all about Bayesian statistics. Give me the correct title.

The title is A Student's Guide to Bayesian Statistics.

If you look for it on a bookshelf, it's got a couple of chillies on it — vegetables on the front of a statistics book — which we all found quite confusing. So if you need to identify it, look for the MasterChef-style rating. And is there a new edition coming out soon?

There is, in a year or so — it's about time.

So, OK — still go and get it. OK, so what else should I say — yeah, this talk is going to be recorded and it will end up on the Oxford Podcasts site eventually, under the Department of Statistics. So if you do want to ask a question — and we'll be stopping during the talk to ask if there are any questions — make sure that, if you come off video, you're happy to be recorded. If you don't want to be recorded, just type your questions in the chat and I will read them out. So, without further ado, I'm going to hand over to Ben — if you'd like to share your screen.

I will do. I was trying this before — I was just trying to share a portion of my screen, so... OK. I think that looks about right now — is that coming through? It is — it's changing. Perfect. Great.

Thanks, and thanks very much for the invitation to talk today — it's nice to be speaking at the statistics department, because when I was about sixteen years old I came to an event down in the basement of the statistics department, and so whenever I go there I get a bit nostalgic. Well, hopefully we're all able to go into the department at some point soon anyway.

So today I'm going to give an introduction to Bayesian inference for differential equation models. And if you're wondering who I am — we already said a few words — I'm a statistician based in the Department of Computer Science, and I essentially work on data science, machine learning and statistical inference problems for different research groups across the university.
I've been a user of statistics for the past — I don't know how many — years. I worked in industry before I came back to academia, and, very crucially for this talk, I was born in the same town as Thomas Bayes, and I actually went to school there: Tunbridge Wells. Here's a picture of Tunbridge Wells station, and it still looks quite similar to that today.

So today I'm going to cover a mixture of things, because I thought what we'd want the talk to be is partly pedagogical and partly research. Firstly, I'm going to provide a really, really short introduction to Bayesian inference using a simple example. Then I'm going to talk about how you actually go about formulating an inference problem for ordinary differential equation models. Then I'm going to talk briefly about how it is very, very difficult in practice to do exact Bayesian inference, and so instead what you do is some sort of approximation — and that approximation typically happens through some sort of computational sampling. And then, finally, I'll talk a bit about a Python package which we created in the Department of Computer Science, which facilitates inference for ordinary differential equation models. It's called PINTS — I can't quite remember the acronym, but I think it's probabilistic inference for noisy time series.

So, yeah, let's get started with a short introduction to Bayesian inference. The example I'm going to give is this: imagine that we want to estimate disease prevalence within a population. We're going to suppose that we take a sample of n study participants from the population, we take their blood, and then we apply some sort of clinical test to determine presence or absence of a disease, and we find that X of those individuals are disease positive. The question we might have is: how do we use these data to estimate disease prevalence — and hopefully with uncertainty?

Well, there are aspects of the data-generating process that we don't know about — we don't know exactly how the sample of individuals was formed, for example — and so we're going to use a probabilistic model to try and explain the data. So how do we choose what sort of probability model to use? Well, we need to think about the characteristics of our data. First of all, the sample size n is fixed: we only sample n individuals, so we can have at most n individuals who are disease positive, and X can take any integer value up to that.
So we're looking for a discrete probability distribution with bounded, non-negative support for the data — and that narrows down the possible probability distributions. Then we need to make some assumptions: we can assume that the individuals we retrieve from the population represent independent samples, and we can assume that those individuals are drawn from the same population. If you put all of these assumptions and characteristics together, it turns out that a single probability model satisfies those conditions, which is the binomial, and I've written down the binomial probability mass function, which I'm sure you're all familiar with from school.

In Bayesian inference, what we want to do is essentially estimate the parameter of our probability model. Going back to the example, the parameter θ represents the prevalence of disease in our population — it does so if we assume that the clinical test we're using is essentially perfect. Under those assumptions, θ is the disease prevalence, and that's the parameter we want to estimate.

Bayes' rule gives us a mechanism for estimating that parameter. I've written it down here, but what do each of these terms mean? I'm going to step through the individual terms, and then we're going to see how changing these individual terms actually influences our results.

The first term on the right-hand side is what's known as the likelihood. It's important to note that the likelihood, as it's used in Bayesian inference, is actually not a probability distribution, because in Bayesian inference we vary θ and hold the data constant. So it's a function of θ, and that function of θ does not satisfy the conditions for a probability distribution: if you integrated it over θ, holding X fixed, it wouldn't integrate to one. Importantly, people like to bang on about how the prior is quite subjective — that it's sort of wishy-washy — but in my experience one of the most subjective decisions made in an analysis is how you choose the likelihood. So I want to highlight here that the likelihood often contains many, many subjective assumptions.

The second term on the right-hand side, in the numerator, is what's known as the prior. By contrast with the likelihood, it is a valid probability distribution, and, like the likelihood, it is also subjective.
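In symbols — with θ the prevalence and X the number of positives out of n, as in the talk — Bayes' rule for this example can be written as follows (the notation on the slides may differ slightly):

```latex
p(\theta \mid X) \;=\;
\frac{\overbrace{\binom{n}{X}\,\theta^{X}(1-\theta)^{\,n-X}}^{\text{likelihood}}\;
      \overbrace{p(\theta)}^{\text{prior}}}
     {\underbrace{\int_{0}^{1}\binom{n}{X}\,\theta'^{\,X}(1-\theta')^{\,n-X}\,p(\theta')\,\mathrm{d}\theta'}_{\text{evidence / prior predictive}}}
```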
The final term on the right-hand side, on the bottom — the denominator — has many names, and it has two different interpretations. Before we actually collect the data, it is what's known as the prior predictive distribution: it's a distribution over the potential data sets we could obtain, given our prior and our likelihood. Once we have the data, it's just the number that normalises the posterior, and then it's known as the evidence, or the marginal likelihood, and it's calculated entirely from the numerator. As we'll see later on, calculating this denominator is the source of much of the difficulty in doing exact inference.

And then the final term in Bayes' rule is what's known as the posterior. It is the goal of Bayesian inference, because what we want to do is summarise our uncertainty about some quantity we don't know — the disease prevalence — using probabilities and probability distributions, because that's really the only coherent way to summarise uncertainty. And, as I say, it's the starting point for all further analysis in Bayesian inference.

Now I want to talk a little bit about the intuition behind doing Bayesian inference. If we look at Bayes' rule again, we can see that the denominator on the right-hand side doesn't contain the parameter θ, and so the posterior is essentially proportional to the product of the likelihood and the prior. What that means is that the posterior is essentially a kind of weighted geometric mean of the prior and the likelihood, and so all of its shape is determined by the product of these two terms. That's what I want to emphasise now with a few animations.

I'm going to imagine that we go out and collect blood samples from ten people, and we find that three of them are disease positive. What I'm drawing here is a potential prior: I've chosen a uniform prior for the disease prevalence between zero and one — zero and one hundred percent, depending on how you think about it — so all disease prevalences are equally likely. Below that I show the likelihood, and the likelihood peaks at 0.3, because we found that three out of ten individuals were disease positive. The posterior is proportional to the product of the prior and the likelihood, and because the prior is flat, the posterior also peaks at that same value of 0.3.
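Not the speaker's animation code, but a minimal sketch of the grid computation behind plots like these, assuming the uniform prior and the three-positives-out-of-ten data just described:

```python
import numpy as np
from scipy.stats import binom

n, x = 10, 3                              # sample size and number of disease-positive individuals
theta = np.linspace(0, 1, 501)            # grid of candidate prevalence values

prior = np.ones_like(theta)               # uniform prior on [0, 1]
likelihood = binom.pmf(x, n, theta)       # binomial likelihood, viewed as a function of theta

unnormalised = likelihood * prior
posterior = unnormalised / np.trapz(unnormalised, theta)   # normalise so it integrates to one

print(theta[np.argmax(posterior)])        # posterior mode: 0.3 under the flat prior
```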
But now what I want to show you is how the posterior changes as I change my prior. So I'm going to run a quick animation which shows that, as I change which prior distribution I'm using, the posterior distribution shifts. What we find is that the peak of the posterior ends up somewhere between the peak of the prior and the peak of the likelihood; it is, as I said, this kind of weighted average of the prior and the likelihood. So the posterior reflects both our prior beliefs and the data when it tells us about plausible values of the parameters of our model.

Now I want to show you a slightly different thing, where I hold the prior constant and imagine that we collected different data samples. I'm starting off here by imagining that we had a sample size of ten and we didn't find any individuals with disease, and now we see that the likelihood peaks at zero, because that's the maximum likelihood estimate of the parameter. What we can see is that, as we collect different data samples, my likelihood shifts, and because my likelihood is shifting, my posterior is shifting as well. And we find that the position of the peak of the posterior is somewhere between the peak of the prior and the peak of the likelihood, as we found in the previous case.

Finally, I want to show you what happens if instead we imagine a fixed prior and a fixed proportion of individuals who have the disease, but we increase our sample size. So we start out with three out of ten individuals who have the disease, then we get 30 out of 100, and so on — keeping the proportion the same and increasing the sample size. Before I start running the animation, we can see that with a sample size of ten the posterior peak is somewhere between the peak of the prior and the likelihood. But as I increase my sample size, we see a couple of things happen. Firstly, the posterior becomes narrower, and that makes sense, right, because as I collect more data I should hopefully get more confident in my estimate. But something else happens as well: we actually see that the position of the posterior shifts over towards the position of the likelihood. That, again, is a desirable property of Bayesian inference: as I increase my sample size, ideally the prior I use should matter less. That's not always the case — for more complicated models you may never be able to estimate all the parameters of the model from data alone — but for a model like this it's true.

So hopefully that's provided some intuition for you. I wanted to ask if anyone had any questions at this point.

There are no questions.

If not, that's fine, I can continue on — there will be more opportunities for questions.

I was just curious how it's implemented.

What's that, sorry?

How is that updating graph implemented?
OK — do you mean the maths behind it, or do you mean is it just an animation? If it's the latter, there's nothing special to it: it's just an animation, and the plotting tools are very good for these sorts of things.

Great.

OK, so that's hopefully provided a little bit of an introduction to Bayesian inference — I'm sure a lot of you are familiar with the methodology, but I wanted to do that to provide a little grounding for the rest of the talk. Now we're going to step up the level of difficulty a little bit and talk about how we formulate Bayesian inference problems for ordinary differential equations.

Imagine now a slightly different example, where we carry out a series of experiments in which we inoculate some plates with bacteria at some initial time, and then, at predefined time intervals, we count the number of bacteria on each plate using some sort of experimental approach. Imagine that what we're trying to do is model the bacterial population growth over time. I've got some fictitious data here which show the counts of bacteria that have been measured over time, and what we want to do is develop a model that explains those data.

One model that might be appropriate for this is the logistic model. It's a differential equation that contains two terms on the right-hand side. One of the terms essentially reflects the initial growth of the bacterial population — that's the term with α here. And then the term with β essentially reflects how, as the size of the bacterial population grows, there's a reduction in the growth rate due to crowding — due to competition for nutrients, for example. What you find with this sort of model is that you get a sigmoidal curve, which represents the solution of the ODE.

So we've got our data and we've got a model — do we have everything we need to do inference here? What I could do is overlay lots of potential ODE solutions for different parameter values. But we have a bit of a problem, which is that none of the model solutions can fully explain the data. In other words, they've got zero probability of having generated the data, because our model solutions are smooth lines, and our data, with all the uncertainty in them, don't all lie along those lines. So at the moment we don't have enough information to formulate the inference problem. What we need is some sort of statistical model which represents all those processes that we don't account for in our deterministic model. So here I'm going to assume that we've got some sort of measurement error around the true value.
The number of bacteria that we count on a plate at time t is then normally distributed about the true count, which is the solution of the ODE, and there is a noise parameter σ which represents the magnitude of the measurement error about the true value. I should say that using a normal measurement error model is not the only choice I could have made — I could use, for example, a Student-t distribution — but it is a fairly standard, widely used choice, and I'm also implicitly making assumptions here about the nature of those errors. Actually, just one second — I'm going to turn off my Slack, because I think that's going to become quite annoying in a second. I'm just going to quit that. Sorry. OK.

So the question we might have is: how does this model work? The data-generating process is that we assume the true number of cells follows the solution of the ODE, and then there is some sort of measurement process which is imperfect. That means we don't actually measure the true number of bacteria on the plates; rather, the values we do measure are normally distributed around the true value. Then, to generate data, we sample from that process. So now we can draw data from the model — and, because of the statistical model, we now have a possible way of having generated our data, whereas we didn't have that before, when we just used the purely deterministic mathematical model. We needed that information to be able to formulate a proper inference problem.

So how do we write down the likelihood in this circumstance? Remember that we're using a normal likelihood centred on the true number of cells at each time. We can write down the likelihood of all the observations by taking the product of the individual densities of each data point, and we're able to do that because we're assuming conditional independence of our data, conditioning on the parameters of our model. So the likelihood of all our observations is the probability density of the first observation times the probability density of the second observation, and so on, all the way up. I'm using a capital N in bold to represent the vector of measured counts of bacteria.

The question we have, though, is how do we actually calculate N(t)? Well, for the logistic model there is an analytic solution — I can write it down — but that is the exception: for most ordinary differential equation models the deterministic solution cannot be obtained exactly, and so what we typically do is use some sort of numerical integration method to integrate the ODE.
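A minimal sketch of that step — integrating the logistic model numerically and evaluating the normal likelihood at the observed times — is given below. The observation times, counts and initial condition are made up for illustration; only the model structure follows the talk.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import norm

def logistic_rhs(t, N, alpha, beta):
    # dN/dt = alpha * N - beta * N**2: exponential growth minus a crowding term
    return alpha * N - beta * N**2

def solve_logistic(alpha, beta, N0, times):
    sol = solve_ivp(logistic_rhs, (times[0], times[-1]), [N0],
                    t_eval=times, args=(alpha, beta), rtol=1e-8)
    return sol.y[0]

def log_likelihood(params, times, counts, N0=10.0):
    # Normal measurement error of width sigma around the ODE solution
    alpha, beta, sigma = params
    N_model = solve_logistic(alpha, beta, N0, times)
    return np.sum(norm.logpdf(counts, loc=N_model, scale=sigma))

# Hypothetical observations, just to show the call
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
counts = np.array([12.0, 30.0, 70.0, 150.0, 270.0, 380.0])
print(log_likelihood([1.0, 0.002, 20.0], times, counts))
```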
We have to bear that in mind whenever we do inference. Also, the value of N depends implicitly on the parameters, so I can write the solution using an implicit notation: N(t) is some function of time, α and β. Then we can rewrite our likelihood a little, and that makes it explicit that the likelihood depends on three parameters: the growth parameter α, the crowding parameter β, and the noise parameter σ of our measurement model. What you see is that when you formulate the inference problem for ordinary differential equation models, you typically get extra parameters beyond the parameters of the differential equation itself — nuisance parameters that represent some sort of measurement process, some sort of imperfection.

So then I can write down our posterior, and it has the form: the posterior is proportional to the likelihood times the prior. It's now a three-dimensional distribution, and we have a denominator as well. So that's one way of formulating the inference problem for ordinary differential equations. Does anyone have any questions at this point?

Yes, I do. Hello. Hi. Could you please explain your alpha, beta and sigma — which of them represent the parameters of the model?

Sure. So here's the equation, and it's got these two parameters, α and β — this is my ordinary differential equation model. As I said, α represents the initial growth rate of the population — it gives exponential growth — and β represents the crowding. So my posterior distribution is a function of those two parameters and my noise parameter σ. Does that make sense?

Yes. And does that mean that when we are formulating something like this, the only additional parameter we need to consider is one extra parameter for the measurement process?

It depends a little bit. I've used a very simple measurement model here, which has just got one extra parameter. There are lots of different choices I could have made, and the more complex I make that measurement model, typically the more parameters it has — so sometimes yes, and other times no, if you're using a more complex measurement model. Another case is when your ordinary differential equation model has a number of outputs.
Imagine you've got a system of ordinary differential equations and you're using observations on all of those outputs to do inference: then you might have a different measurement parameter for each of those different parts of the system.

Yeah, that answers it. Thank you very much.

Does anyone have any other questions? No? Sorry — no questions.

OK, great, thank you. So we've talked about how we formulate the inference problem; now I want to talk about how we go about actually solving it. And as we'll see, the method of solving the problem is perhaps a bit messier than you might expect, and it involves various approximations.

If we revisit the posterior for our logistic model inference problem, we see that it's got this denominator here. How would we actually calculate that? Well, because α, β and σ are all continuous parameters, to calculate the denominator you need to do a three-dimensional integral. And that's pretty tricky — it's tricky for computers to do any three-dimensional integral exactly, at least deterministically. So for any sort of problem, doing a three-dimensional integral is hard, but in Bayesian inference the integrals involved in the denominator are especially difficult. That's because the likelihood tends to be very narrow — the region of parameter space over which the likelihood is non-negligible is small — whereas the prior tends to be really, really wide if people use uninformative priors, and that causes additional problems when trying to approximate the integral.

For ordinary differential equations the difficulty is compounded even further, because evaluating the likelihood in a differential equation setting typically means numerically integrating the differential equation to get the solution. So the integral we see in the equation also implicitly involves a whole series of further integrals. Suffice to say, there's essentially zero chance that we could evaluate this denominator exactly.
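In symbols, the denominator in question for the logistic example is the three-dimensional integral below, written with the notation from earlier: y_i are the observed counts at times t_i, and N(t; α, β) is the ODE solution.

```latex
p(\mathbf{y}) \;=\; \iiint
\underbrace{\prod_{i=1}^{T}\mathcal{N}\!\left(y_i \,\middle|\, N(t_i;\alpha,\beta),\,\sigma^{2}\right)}_{\text{likelihood}}
\;\underbrace{p(\alpha,\beta,\sigma)}_{\text{prior}}
\;\mathrm{d}\alpha\,\mathrm{d}\beta\,\mathrm{d}\sigma
```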
So we need to recognise that we can't do exact Bayesian inference — and that's not just a problem with ordinary differential equation models; it's a problem with doing Bayesian inference in general. So what can we do? This leads me into a different idea. Imagine that you want to gain insight into a distribution, and the distribution here is this kind of bottomless urn of balls: we don't really know how many colours there are, and we don't know the frequencies of each of the colours. The question we might have is: how can we determine the underlying probability distribution of ball colour from this urn?

The answer, which is very intuitive, is that we draw lots and lots of balls from the urn and we count the sample frequencies. If I draw a hundred balls from the urn and tabulate the frequencies of each colour, then what we see is that, if I collect enough samples, the sampling distribution starts to converge to something which hopefully represents the underlying probability distribution of those colours within the urn. So what have we learnt? We've learnt that if we can sample from a distribution, then we can use the sample properties of the things we've collected to help us learn about that distribution and to estimate quantities of interest.
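The ball-drawing idea is easy to sketch: draw repeatedly from a distribution whose weights are hidden from the "observer", tabulate the sample frequencies, and watch them settle down as the number of draws grows. The colours and weights below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
colours = ["red", "blue", "green", "yellow"]
true_probs = [0.4, 0.3, 0.2, 0.1]            # unknown, in the thought experiment

for n_draws in (100, 1_000, 100_000):
    draws = rng.choice(colours, size=n_draws, p=true_probs)
    freqs = {c: round(float(np.mean(draws == c)), 3) for c in colours}
    print(n_draws, freqs)                     # sample frequencies approach the true weights
```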
So what's the connection of this to Bayesian inference? The urn holds a probability distribution — a discrete one. The posterior is also a probability distribution — a continuous one in our case, although there are also discrete posterior distributions. The idea behind computational sampling is that if we can construct a way of drawing values of our parameters from that posterior distribution, then we can use those draws to help us summarise the posterior distribution and approximate it in some way. The question we have is: how do we actually construct such a sampler? It's not as simple as reaching into an urn and pulling out balls of different colours. So how do we do it? The answer is that you use something known as Markov chain Monte Carlo.

In our example, we couldn't calculate the posterior exactly, but we could calculate the numerator of Bayes' rule, which is the product of the likelihood and the prior. It turns out that this contains enough information for us to construct a Markov chain which, in the infinite-sample limit, draws from the posterior distribution. Now, "infinite sample size" sounds a bit intimidating, but the idea is that with a finite sample size we should hopefully have enough samples from our Markov chain that we eventually get something that approximates the posterior distribution quite well — the chains are constructed so that they converge asymptotically to the posterior distribution.

There are many, many types of Markov chain Monte Carlo algorithm, and they have different uses in different situations, some typically more useful than others; I'll talk about that in a few minutes. I don't want to go into too much detail about Monte Carlo methods and motivating them, but I do want to present perhaps the simplest — certainly the oldest — variant of Markov chain Monte Carlo, which is random walk Metropolis. It was created in 1953 by Nicholas Metropolis and Stanislaw Ulam, who were working on the Manhattan Project — nuclear physicists interested in neutron diffusion.

So this is a sketch of the algorithm — it's actually not as intimidating as it looks. The idea is that you start from some arbitrary initial point for each of your parameters α, β and σ, and then you iterate the following. You draw proposed values for each of those parameters from a normal distribution which is centred on the previous values; that normal distribution has some sort of proposal width, or covariance matrix Σ, which needs to be tuned to be appropriate for a given circumstance. Then you calculate the ratio of the proposed posterior probability to the current posterior probability. Now, equation 21 would seem to require us to evaluate the posterior distribution, which we know we can't do, because we can't calculate the denominator of Bayes' rule. But it turns out we don't need to, because the denominator cancels out of this ratio, and what we're left with in equation 22 is just the ratio of the unnormalised posterior at the proposed point to the unnormalised posterior at the current point. So we calculate this ratio r — which we can do, because we can calculate all the terms in the numerator — and then we draw a value u from the uniform distribution between zero and one. If r is greater than u, then the next set of parameter values becomes the proposed one; otherwise we stay where we were for the next iteration. So we can get two samples in the same place: Metropolis tends to reject — you don't necessarily accept all of your steps — and so you can end up with a number of samples at one location.
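A minimal random walk Metropolis sketch in the spirit of the algorithm just described; the target used at the bottom is a generic two-dimensional normal stand-in, not the distribution from the slides, and the tuning values are assumptions.

```python
import numpy as np

def metropolis(log_post, x0, n_iter, step_cov, rng=None):
    """Random walk Metropolis: Gaussian proposals centred on the current point."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_iter, x.size))
    for i in range(n_iter):
        proposal = rng.multivariate_normal(x, step_cov)   # propose around the current values
        lp_prop = log_post(proposal)
        # Ratio of unnormalised posteriors; the evidence cancels out
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop                      # accept the proposal
        chain[i] = x                                       # on rejection, repeat the current point
    return chain

# Example: sample a 2-d standard normal as a stand-in target
chain = metropolis(lambda z: -0.5 * np.sum(z**2), x0=[3.0, -3.0],
                   n_iter=20_000, step_cov=0.5 * np.eye(2))
print(chain.mean(axis=0), chain.std(axis=0))
```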
That was the mathematical sketch of the algorithm; now I'm going to try to visualise a little how the algorithm runs over time. The question I have is: can we use Metropolis to sample from the sort of continuous distribution drawn below? Obviously this is a problem I just made up — it's a kind of weird, funky distribution — and we're asking whether we can actually sample from it.

What we find is that we start at some arbitrary point, and then the algorithm proceeds by proposing a value and then either rejecting that value — in which case I illustrate the chain's path in red — or accepting it, in which case we get a green transition and the chain moves to the new location. What we can see over time is that our Markov chain moves around and tends to move to the modes of the distribution, which is what we want: we want to sample more from the modes of the distribution, because that's what having higher density means — you generate more samples in those locations than in others. If I were to leave this running much, much longer — which I'll show you in a second — then hopefully the collection of points, the nodes on the blue path, would represent samples from the underlying distribution.

I can illustrate that now: on the right-hand side I've got the actual density, and on the left-hand side I've got a reconstructed version of the density, which I get by fitting a kernel density estimate to my collection of samples from random walk Metropolis. With a sample size of one hundred, I get a very noisy approximation of the density, but as I draw more and more samples, the distribution estimated from my Metropolis routine converges towards the actual density. After roughly twenty thousand samples, the Metropolis approximation of the density is quite hard to tell apart from the truth, so it's probably a good approximation to the underlying distribution.

Great — so that's a very quick introduction to random walk Metropolis and Markov chain Monte Carlo.
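The kernel-density reconstruction described a moment ago can be sketched like this; the samples below are a stand-in for draws from a Metropolis chain, just to show the estimate tightening with sample size.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)
grid = np.linspace(-4, 4, 400)

for n in (100, 1_000, 20_000):
    samples = rng.standard_normal(n)         # stand-in for draws from a sampler
    kde = gaussian_kde(samples)              # kernel density estimate fitted to the samples
    err = np.max(np.abs(kde(grid) - norm.pdf(grid)))
    print(n, round(float(err), 3))           # discrepancy from the true density shrinks as n grows
```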
Does anyone have any questions at this point? No? OK, well, in that case I'll proceed to the final part of the talk, which is to say a little about some research I'm involved in: a piece of software called PINTS, which facilitates inference for differential equation models.

I'm sure some of the people in the audience have tried to use Markov chain Monte Carlo to fit models — I don't know if you've tried to use it for ordinary differential equation models — but what I've found in the past is that often, especially early-stage researchers and people who are new to this sort of thing, follow a path which looks something like this. They read the statistical literature and find a given Markov chain Monte Carlo method. If they understand the statistical jargon — which often doesn't go well, because these methods are often very poorly described — then they type up that method, and that may or may not be good code, depending on what sort of software development practices they're using. Then they apply it to their problem and they find that the chains aren't converging, so the method is essentially failing. That can be for a number of reasons: it could be that the Markov chain Monte Carlo method isn't appropriate or isn't set up correctly, or it could be some characteristic of your ODE model and your data which means that actually doing inference is going to be really, really difficult. So then what people tend to do is move on: they look at the literature again, code up another method, try again, and repeat what we call the cycle of misery, until they eventually end up with something that works.

A few of us got a bit fed up with seeing people go through this path, and so we decided to try and stop it. As I say, part of the reason the cycle exists is a communication problem: the statistical literature is often written by methods experts for other methods experts, and often those papers don't contain high-quality pseudocode, which actually makes it harder to implement these methods yourself. And if there is software available to accompany the papers, often it's not very well developed or very user-friendly, so it's difficult to use; and if you want to move to a new method, you need to get familiar with a whole new package for using that method. So it takes ages to shift between different types of sampler. That's part of the reason this cycle exists.

Another reason is that ordinary differential equation models are particularly problematic for inference because of their nonlinear nature. This is an example I've taken from a paper by Mark Girolami and colleagues, which shows the posterior distribution of two of the parameters for what's known as a Goodwin oscillator model — this is often used to represent circadian rhythms in organisms. On the left you can see the posterior distribution.
You can see that it's got all these kind of nasty ridges along it. If you think about what Markov chain methods are trying to do, it's essentially to explore those ridges, and for this sort of problem it's really, really challenging to come up with good Markov chain Monte Carlo methods that will adequately explore the space. Remember that this is only two dimensions of a much higher-dimensional space, so it's actually much harder than this picture makes it look. So you often need different types of sampling method for these different types of problem.

That's what motivated this piece of software called PINTS. Basically, PINTS is a zoo — a zoo of lots of sampling methods — and it also has optimisation methods in it. In optimisation you return a single value of your parameters, which optimises some criterion; in sampling it's different — you return a kind of distribution of parameter values, which represents some sort of uncertainty. It's an open-source Python library that's available on GitHub and was created in the Department of Computer Science.

So how is it different? It's not aligned to a single algorithm, and it's designed to interface with other software. For example, it has an interface to Stan, so if you try to do inference on your model using Stan and you find it fails, you don't necessarily have to recode your model from scratch: using PINTS, you can use that interface and make the transition a bit easier. It's aimed at harder forward models — ODEs and PDEs, typically — and it allows users the freedom to use their own forward-model solution method. Often, particularly with partial differential equations, the models require quite nuanced ways of being solved, and so PINTS gives users the freedom to use whatever they want to solve those differential equations. That's in contrast to a lot of the probabilistic programming frameworks out there at the moment, where you have to code up your forward solution using essentially their own language.

I'm sure some of you probably use Stan, which is really, really good software — I wrote a book about it, essentially, and it's really useful — but it's a different sort of beast to PINTS. Stan is really, really good if you've got a model with lots and lots of parameter dimensions but the evaluation of the likelihood is relatively cheap, whereas PINTS targets problems where evaluating the likelihood is really expensive, because you have to numerically integrate your differential equation. So PINTS sits in a different part of the space to Stan, and it's aimed, I guess, at modellers and practitioners rather than necessarily applied statisticians.
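A sketch of how the logistic example might be set up in PINTS, following the pattern used in the PINTS documentation. The class and function names are my recollection of the library's interface and the numbers are invented, so treat this as an outline to check against the current docs rather than a definitive recipe.

```python
import numpy as np
import pints
from scipy.integrate import solve_ivp

class LogisticModel(pints.ForwardModel):
    """Wraps the logistic ODE as a PINTS forward model."""
    def __init__(self, N0=10.0):
        self._N0 = N0

    def n_parameters(self):
        return 2                                              # alpha and beta

    def simulate(self, parameters, times):
        alpha, beta = parameters
        sol = solve_ivp(lambda t, N: alpha * N - beta * N**2,
                        (times[0], times[-1]), [self._N0], t_eval=times)
        return sol.y[0]

times = np.linspace(0, 5, 20)
data = LogisticModel().simulate([1.0, 0.002], times)
data = data + np.random.normal(0, 20, size=data.shape)        # synthetic 'observations'

problem = pints.SingleOutputProblem(LogisticModel(), times, data)
log_likelihood = pints.GaussianLogLikelihood(problem)         # adds the noise parameter sigma
log_prior = pints.UniformLogPrior([0, 0, 0], [10, 0.1, 100])  # alpha, beta, sigma
log_posterior = pints.LogPosterior(log_likelihood, log_prior)

x0 = [[1.0, 0.002, 20.0]] * 3                                 # starting points for three chains
mcmc = pints.MCMCController(log_posterior, 3, x0, method=pints.HaarioBardenetACMC)
mcmc.set_max_iterations(4000)
chains = mcmc.run()
```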
PINTS, as I said, is a zoo of lots and lots of different Markov chain Monte Carlo methods and other types of sampling method — which I haven't got time to go into — because Markov chain Monte Carlo is just one type of method in the sampling arena; there are lots of other types. As I say, we've already got a lot of these methods in PINTS, and some more are planned. These methods place different restrictions on what you need to be able to evaluate for your model in order to use them. Some of them require no gradients of the log-likelihood with respect to the parameter values; others require what are known as the sensitivities, which are the gradients of the solution of your differential equation with respect to your parameter values — which is about as difficult as it sounds. Then you've got second-order sensitivities, which go one step further: the second derivatives of your ordinary differential equation solution with respect to the parameters. As you can imagine, determining these first- and second-order sensitivities is really computationally expensive, so it works well in some circumstances, and in others it's just too restrictive — too complicated and expensive to do — and you need a gradient-free method instead. We also plan to include lots of likelihood-free methods, so there's a place for a suite of approximate Bayesian computation methods in PINTS, and that's for the next iteration.

So with that I'll finish, and I'll just say a quick thank you to the other developers — I've written five of them here: Michael Clerx, Martin Robinson and the others named on the slide, who have all been based in the Department of Computer Science at some point; Michael is now at Nottingham. There are also other people who have been involved that I just didn't have space to include. So with that, I'll finish, and I'll ask if anyone has any questions.

OK, yeah — so everyone, if you can either come off audio and clap, or clap on video — that was wonderful, thanks so much.

Thank you.

So, yeah, if you have any questions and you'd like to remain anonymous, just type them in the chat, or, if you're happy to have your voice recorded, then come on to audio with any questions.

Can I ask a question, please?

Yeah, go for it.

I thought your talk was really good.
In particular, the animations are just so helpful in showing the intuition underlying the concepts you're talking about. I just had a question about prior selection, because you mentioned this already: obviously one of the criticisms of Bayesian inference is that it can be subjective. So how do you address this in your analyses — say you have a reviewer on a paper who doesn't agree — how do you argue that point?

Well — good question. So the question was about how I choose priors. Yeah, very good question; it's a bit of a can of worms. There are many different ways to go about choosing an appropriate prior distribution. In some instances the parameters have a very interpretable meaning, and prior literature, or prior estimates of things, can be directly ported over — the posterior of a previous analysis can become your prior. If that's the case, great, but it doesn't happen too often in reality.

So in reality, the way I now tend to think about selecting priors is that I do prior predictive simulations. I choose a selection of priors, then sample parameter values from the prior, push them through the sampling distribution, and get out a distribution of data sets. That distribution of simulated data should hopefully look kind of similar to what you would expect plausible values of your data to look like before you do the experiment. So what I typically do now is run these prior predictive simulations until I have a distribution of potential data sets which is somewhat wider than, and encompasses, the range I would expect to see when I actually go ahead and collect data. Then it becomes not too difficult to argue for the choice in a paper, I find, because you just include the visualisations of the prior predictive distribution, or its contours, and it tends to be quite a persuasive way of arguing that you've made sensible choices about the priors. So that's what I encourage people to do now — because often you just don't have that much information about individual parameters, and if you do, that's kind of a luxury, and then you use it, obviously. So, yeah, that's the way I go about doing it.
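A minimal sketch of the prior predictive simulation being described, reusing the logistic example from earlier; the priors and settings are illustrative assumptions, not the speaker's own choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(2)
times = np.linspace(0, 5, 20)

def simulate_logistic(alpha, beta, sigma, N0=10.0):
    sol = solve_ivp(lambda t, N: alpha * N - beta * N**2,
                    (times[0], times[-1]), [N0], t_eval=times)
    return sol.y[0] + rng.normal(0, sigma, size=times.size)   # ODE solution plus measurement noise

# Draw parameters from candidate priors and push each draw through the model
prior_predictive = np.array([
    simulate_logistic(rng.uniform(0.1, 3.0),       # alpha prior
                      rng.uniform(1e-4, 1e-2),     # beta prior
                      rng.uniform(1.0, 50.0))      # sigma prior
    for _ in range(500)
])

# Envelope of simulated data sets: should comfortably cover plausible experiments
print(np.percentile(prior_predictive, [2.5, 50, 97.5], axis=0).round(1))
```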
With regard to referee comments about priors — it hasn't actually come up that much for me, I don't know why — but the one thing you can always do is a kind of sensitivity analysis. At the point when someone raises the issue of whether your inference is sensitive to the prior, you have a kind of obligation to report that anyway — you should be doing that anyway. Does that answer your question?

Yeah, no, that's really interesting. I hadn't thought about that first part, so thank you for that, and thanks again for the talk.

It's a really interesting question — a good question. Do you have any other questions?

Could I jump in here? Thanks so much for the talk — it was extremely clear; you presented things excellently. My question is this: you showed how, as we get more data, the prior is essentially outweighed by the likelihood, and so on. Do you have a rough sense of when it's worth doing a Bayesian analysis versus not, given the amount of data? Or is the idea to just set everything up in a Bayesian approach regardless of the volume of data and let the results speak for themselves?

So, as I understand your question, it's about when you get much merit from using a Bayesian analysis versus a frequentist one — is that right?

Yes, yeah, exactly — what's the kind of threshold of data where it matters?

It's a good question. I'm not one of these people who tends to bash classical inference, particularly; I think they both have their uses. What I can say is that there are situations where Bayesian inference allows me to do inference when you wouldn't be able to in the classical sense — namely, situations where your model is relatively poorly identified. A good example of this is COVID-19: you have these transmission dynamics models of how the disease spreads, and those models have lots of different parameters — the rate at which people recover, the rate at which people become infected — and there's lots and lots of uncertainty about all of these parameters. The data that we collect to try and estimate those parameters are really, really noisy and poor, and so there's no way you can estimate all of those parameters just from the data: you need something else — you need biological, pre-existing knowledge, basically. In that situation you're a bit stuck with frequentist inference: you can fix some parameters at values you think are biologically plausible, but that's not quite satisfactory, because often we don't know those things very well.
446 00:51:41,330 --> 00:51:50,310 Whereas in Bayesian inference we can use priors, so we can incorporate our uncertainty about those things but still make progress on the inference. 447 00:51:50,310 --> 00:51:54,580 We can still try and estimate the things we're actually interested in. 448 00:51:54,580 --> 00:51:59,440 And so, yeah, kind of the basic message is that 449 00:51:59,440 --> 00:52:04,340 one of the benefits is that it allows you to make progress on problems that are basically just 450 00:52:04,340 --> 00:52:08,260 unidentifiable, where you wouldn't be able to make progress using frequentist inference. 451 00:52:08,260 --> 00:52:14,740 And that tends to be the case when the models you're trying to do inference for get more complicated, or the data get fewer. 452 00:52:14,740 --> 00:52:17,320 So it's one of those two kinds of circumstances.
453 00:52:17,320 --> 00:52:24,040 There are also other benefits of this sort of method, which is that because everything's done by simulation, 454 00:52:24,040 --> 00:52:30,250 because you have to do approximate inference, you typically get things like uncertainty in your predictions for free. 455 00:52:30,250 --> 00:52:37,120 And that's kind of the nice thing about the Bayesian approach. But does that answer your question? 456 00:52:37,120 --> 00:52:43,000 Yes, that was great. Thank you. Thanks for the question.
457 00:52:43,000 --> 00:52:52,610 Are there any other questions? I had a question too. 458 00:52:52,610 --> 00:53:04,400 Yeah, yeah, I was curious about the... Sorry, we're finding it quite hard to hear you. 459 00:53:04,400 --> 00:53:11,930 Can you hear me better now? Yeah, right. So, um, yeah, I was curious about the future developments you were planning. 460 00:53:11,930 --> 00:53:17,270 You mentioned you were planning to extend it. So, I mean, which kinds of methods are you planning to include? 461 00:53:17,270 --> 00:53:27,170 Are you planning to have methods only for ODEs, or also for, like, stochastic models as well?
462 00:53:27,170 --> 00:53:34,490 Yeah. So I guess the one class of models that isn't covered 463 00:53:34,490 --> 00:53:39,980 by PINTS at the moment is models where the dynamics underlying them are stochastic. 464 00:53:39,980 --> 00:53:45,650 So I presented a deterministic model; you can imagine a close analogue of that model 465 00:53:45,650 --> 00:53:55,190 which is stochastic, and PINTS isn't able to handle that, because when you get into the realm of stochastic processes, 466 00:53:55,190 --> 00:53:59,750 then often it's really, really difficult to write down the probability of having generated the data. 467 00:53:59,750 --> 00:54:04,340 Intuitively, it's because there are just too many ways to generate the data, and so on. 468 00:54:04,340 --> 00:54:09,790 And so in that setting, yeah, it's very difficult to write down a likelihood.
469 00:54:09,790 --> 00:54:13,330 In those situations, there are a few different things you can do. 470 00:54:13,330 --> 00:54:22,690 One of them, which I think is probably the next logical step, is to put in approximate Bayesian computation approaches. 471 00:54:22,690 --> 00:54:29,140 What those rely on is essentially your ability to simulate from the model. 472 00:54:29,140 --> 00:54:40,910 So as long as you can simulate from your process and check how close your simulated data are to your actual data at different parameter values, then under some conditions you can still make progress towards doing inference.
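To make the simulation-based idea concrete, here is a minimal sketch of rejection ABC in Python. The toy Poisson model, the uniform prior, the mean-based summary statistic and the tolerance are all hypothetical choices for illustration; this is not the speaker's implementation and not part of the PINTS API.

```python
# Minimal sketch of rejection ABC (hypothetical toy model, for illustration only).
# Keep prior draws whose forward-simulated data lie "close" to the observed data.
import numpy as np

rng = np.random.default_rng(0)

observed = rng.poisson(lam=4.0, size=50)      # stand-in for real observed data

def simulate(rate, size, rng):
    """Forward simulate the toy stochastic model at a given parameter value."""
    return rng.poisson(lam=rate, size=size)

def distance(a, b):
    """Compare data sets through a summary statistic (here, the sample mean)."""
    return abs(a.mean() - b.mean())

n_draws = 100_000
tolerance = 0.1                                # smaller tolerance => better approximation
accepted = []

for _ in range(n_draws):
    rate = rng.uniform(0.0, 20.0)              # draw from a hypothetical prior
    fake_data = simulate(rate, observed.size, rng)
    if distance(fake_data, observed) <= tolerance:
        accepted.append(rate)

accepted = np.array(accepted)
if accepted.size:
    print(f"accepted {accepted.size} draws; approximate posterior mean {accepted.mean():.2f}")
else:
    print("no draws accepted; increase the tolerance or the number of draws")
```

The accepted draws form an approximate posterior sample: the approximation improves as the tolerance shrinks towards zero, at the cost of accepting fewer draws.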
473 00:54:40,910 --> 00:54:49,010 It becomes approximate because it's only exact in the limit 474 00:54:49,010 --> 00:54:57,360 where your simulated data and your observed data exactly match 475 00:54:57,360 --> 00:54:59,990 one another, which typically doesn't happen in reality. 476 00:54:59,990 --> 00:55:08,580 So you use what are called approximate Bayesian computation methods, which just rely on your ability to forward simulate from the model.
477 00:55:08,580 --> 00:55:11,570 And we're going to include a whole host of these different methods. 478 00:55:11,570 --> 00:55:19,100 We've already done a lot of exploration of these different methods, and our plan is to include them in the next iteration of PINTS, 479 00:55:19,100 --> 00:55:24,230 because people in the institute have asked me about 480 00:55:24,230 --> 00:55:30,320 this quite a lot, because they tend to use these things in, let's say, spatial simulations themselves. 481 00:55:30,320 --> 00:55:38,230 And it also gets used a lot in population genetics, because, again, it's hard to write down the 482 00:55:38,230 --> 00:55:44,960 likelihood function. Yeah, I think that was it, yeah. 483 00:55:44,960 --> 00:55:53,300 Yeah, I was just asking because I also have to write my own code when we do ABC, so...
484 00:55:53,300 --> 00:56:00,410 OK, well, I mean, if you're interested in working on or chatting about that, 485 00:56:00,410 --> 00:56:05,090 then I'm happy to; if you want to send me an email or get in touch, we can have a chat. 486 00:56:05,090 --> 00:56:10,970 But my view about developing packages and libraries is that if they are trying to do the same thing, 487 00:56:10,970 --> 00:56:17,150 which is this kind of inference, then, and this is just me, I'm biased like that, 488 00:56:17,150 --> 00:56:21,560 it's probably better that they focus on the one thing rather than on lots of different things. But that's just my view. 489 00:56:21,560 --> 00:56:31,520 And I think other people have also tried to develop their own ABC libraries, and some have been more successful than others. 490 00:56:31,520 --> 00:56:36,590 But yeah, if you want to have a chat about it, then send me an email. 491 00:56:36,590 --> 00:56:41,830 Right. Thank you. Oh.
492 00:56:41,830 --> 00:56:47,280 Great. OK, so, if there are no more questions, 493 00:56:47,280 --> 00:56:53,880 I think we'll have another round of applause for a great talk. 494 00:56:53,880 --> 00:56:59,400 Well, thanks again, everyone, and thanks for setting this up and organising it. 495 00:56:59,400 --> 00:57:05,290 And yeah, if anyone's got any questions afterwards, just email me. I think the talk is going on the podcast 496 00:57:05,290 --> 00:57:10,269 series, so you should be able to access it there anyway. So yeah. Thanks again.