1
00:00:00,180 --> 00:00:07,500
I was just having a meeting with some students to talk about.
2
00:00:07,500 --> 00:00:18,150
About machine learning for the SARS-CoV-2 main protease, developing models to help discover new inhibitors.
3
00:00:18,150 --> 00:00:24,290
Yeah. All right. OK, so we should get started.
4
00:00:24,290 --> 00:00:33,800
So it gives me great pleasure to introduce Ben, who is now another co-director here,
5
00:00:33,800 --> 00:00:42,260
and he's connected to the research engineering group in Computer Science as well.
6
00:00:42,260 --> 00:00:46,520
And I should also plug your book, which you can buy on Amazon.
7
00:00:46,520 --> 00:00:50,600
And it is all about Bayesian statistics. Give me the correct title.
8
00:00:50,600 --> 00:00:55,070
The title is A Student's Guide to Bayesian Statistics.
9
00:00:55,070 --> 00:00:59,420
If you look at it on a bookshelf,
10
00:00:59,420 --> 00:01:07,610
it's got two kinds of chillies on it, though they look like vegetables on the front, which I found quite confusing.
11
00:01:07,610 --> 00:01:18,290
So you can identify it by that MasterChef-style chilli rating. And is there a new edition coming out soon? There is, in a year or so,
12
00:01:18,290 --> 00:01:23,930
it's due. So, OK.
13
00:01:23,930 --> 00:01:35,780
OK, so what else should I say? Yeah, this talk is going to be recorded, and it will end up on Oxford Podcasts eventually, under the Department of Statistics.
14
00:01:35,780 --> 00:01:45,860
So if you do want to ask a question, and we'll be stopping during the talk to ask if there are any questions,
15
00:01:45,860 --> 00:01:52,040
make sure that if you come on video, you're happy to be recorded.
16
00:01:52,040 --> 00:02:00,470
And if you don't want to be recorded, just type your questions in the chat and then I will read them out.
17
00:02:00,470 --> 00:02:07,820
So without further ado, I'm going to hand over to Ben and if you'd like to share your screen.
18
00:02:07,820 --> 00:02:16,880
I will do. I was doing this before, and I was just trying to share a portion of my screen. So, oh, OK.
19
00:02:16,880 --> 00:02:25,490
So I think that looks about right now. Can you just check that it comes through when it
20
00:02:25,490 --> 00:02:31,390
changes? It is changing. Perfect. Great.
21
00:02:31,390 --> 00:02:41,630
Thanks. And thanks very much for the invitation to talk today at the statistics department. I actually have some history with the statistics department,
22
00:02:41,630 --> 00:02:47,870
because when I was about 16 years old, I came to an event down in the basement of the statistics department.
23
00:02:47,870 --> 00:02:55,130
And so whenever I go there, I get a bit nostalgic. Well, hopefully we're all able to go into the department at some point soon anyway.
24
00:02:55,130 --> 00:03:03,860
So today I'm going to give an introduction to Bayesian inference for differential equation models.
25
00:03:03,860 --> 00:03:09,290
And in case you're wondering who I am, a few words.
26
00:03:09,290 --> 00:03:16,070
I'm a statistician based in the Department of Computer Science and I essentially work on data science,
27
00:03:16,070 --> 00:03:24,490
machine learning and statistical inference problems for different research groups across the university.
28
00:03:24,490 --> 00:03:31,170
I've been a user of statistics for the past, I don't know how many, years. I worked in industry before I came back to academia.
29
00:03:31,170 --> 00:03:37,750
And very crucially for this talk, I was born in the same town as Thomas Bayes.
30
00:03:37,750 --> 00:03:42,940
And I actually went to school there, which is Tunbridge Wells. And here's a here's a picture of Tunbridge Wells station.
31
00:03:42,940 --> 00:03:47,480
And it still looks quite similar to that today.
32
00:03:47,480 --> 00:03:56,720
So today, I'm going to have a mixture of things, because I wanted the talk to be partly pedagogical and partly research.
33
00:03:56,720 --> 00:04:02,960
So firstly, I'm going to provide a really, really short introduction to Bayesian inference through a simple example.
34
00:04:02,960 --> 00:04:10,250
Then I'm going to talk about how you actually go about doing inference, formulating an inference problem for ordinary differential equation models.
35
00:04:10,250 --> 00:04:17,660
And then I'm going to talk briefly about how it is very, very difficult in practise to actually do exact Bayesian inference.
36
00:04:17,660 --> 00:04:21,840
And so instead, what you do is you do some sort of approximation.
37
00:04:21,840 --> 00:04:26,810
And that approximation typically happens by some sort of computational sampling.
38
00:04:26,810 --> 00:04:34,010
And then finally, I'll talk a bit about a Python package, which we created in the Department of Computer Science,
39
00:04:34,010 --> 00:04:36,770
which does Bayesian inference for ordinary differential equation
40
00:04:36,770 --> 00:04:46,470
models, and that's called PINTS; the acronym, I think, is Probabilistic Inference on Noisy Time Series.
41
00:04:46,470 --> 00:04:53,760
So, yeah, let's get started with a short introduction to Bayesian inference.
42
00:04:53,760 --> 00:05:00,240
The example I'm going to give is: imagine that we want to estimate disease prevalence within a population,
43
00:05:00,240 --> 00:05:06,780
and so we're going to suppose that we take a sample of n study participants from the population.
44
00:05:06,780 --> 00:05:14,070
We take their blood, and then we apply some sort of clinical test to determine presence or absence of a disease.
45
00:05:14,070 --> 00:05:18,070
And we find that X of those individuals are disease positive.
46
00:05:18,070 --> 00:05:26,680
The question we might have is how do we use these data to estimate disease prevalence and hopefully with uncertainty?
47
00:05:26,680 --> 00:05:35,620
Well, there are aspects of the data generating process that we don't know about, we don't know exactly how the sample of individuals was formed,
48
00:05:35,620 --> 00:05:42,400
for example, and then we're going to use a probabilistic model to try and explain the data.
49
00:05:42,400 --> 00:05:46,910
So how do we choose what sort of probability model to use?
50
00:05:46,910 --> 00:05:52,070
Well, we need to think about the characteristics of our data. First of all, the sample size n is fixed:
51
00:05:52,070 --> 00:05:58,270
we only sample n individuals, so we can have at most n
52
00:05:58,270 --> 00:06:04,900
individuals who are disease positive, and X can take any integer value up to that.
53
00:06:04,900 --> 00:06:17,630
So it's a discrete probability distribution we're looking for, with a bounded, non-negative support for the data. And again,
54
00:06:17,630 --> 00:06:20,440
so that narrows down the possible probability distributions,
55
00:06:20,440 --> 00:06:26,230
then we need to make some assumptions: we can assume that those individuals that we retrieve from the population
56
00:06:26,230 --> 00:06:32,320
represent independent samples and we can assume that those individuals are drawn from the same population.
57
00:06:32,320 --> 00:06:37,840
And if you put together all these things, these various assumptions and characteristics,
58
00:06:37,840 --> 00:06:41,590
it turns out that a single probability model satisfies those conditions,
59
00:06:41,590 --> 00:06:51,410
which is the binomial, and I've written down the binomial probability mass function, which I'm sure you're all familiar with from school.
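(As an aside for the recording: the binomial model just described can be sketched in a few lines of Python; the sample size, count, and prevalence values in the example call are hypothetical.)

```python
from math import comb

def binomial_pmf(x, n, theta):
    """P(X = x | n, theta): probability of observing x disease-positive
    individuals in a sample of n, given a prevalence of theta."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

# e.g. 3 positives out of 10 sampled, at a hypothetical prevalence of 30%
p = binomial_pmf(3, 10, 0.3)
```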
60
00:06:51,410 --> 00:07:00,740
In Bayesian inference, what we want to do is essentially estimate the parameters of our probability model.
61
00:07:00,740 --> 00:07:08,240
So going back: the parameter theta represents the prevalence of disease in our population,
62
00:07:08,240 --> 00:07:14,180
and it does so if we assume that the clinical test that we're using is essentially perfect.
63
00:07:14,180 --> 00:07:20,750
So under those assumptions, theta is the disease prevalence, and that's the parameter that we want to estimate.
64
00:07:20,750 --> 00:07:31,990
Bayes' rule gives us a sort of mechanism for estimating that parameter. I've written it down here, but what do each of these terms mean?
65
00:07:31,990 --> 00:07:35,920
So I'm going to kind of step through these individual terms, and then what we're going
66
00:07:35,920 --> 00:07:41,620
to see is how changing these individual terms actually influences our results and inferences.
67
00:07:41,620 --> 00:07:46,830
So the first term on the right-hand side is something which is known as the likelihood.
68
00:07:46,830 --> 00:07:55,290
And it's important to note that the likelihood is actually not a probability distribution as it's used in Bayesian inference,
69
00:07:55,290 --> 00:07:59,820
because in Bayesian inference we vary the parameter and we hold the data constant.
70
00:07:59,820 --> 00:08:01,530
And so it's a function of theta.
71
00:08:01,530 --> 00:08:11,530
And that function of theta does not satisfy the conditions for a probability distribution: if you integrate it over theta, it wouldn't generally integrate to one.
72
00:08:11,530 --> 00:08:18,160
Importantly, people like to bang on about how the prior is quite subjective.
73
00:08:18,160 --> 00:08:20,050
They use sort of wishy-washy language,
74
00:08:20,050 --> 00:08:30,940
but in my experience, one of the most subjective decisions that are made in an analysis is how you choose the likelihood.
75
00:08:30,940 --> 00:08:41,310
And so I want to highlight here that the likelihood often contains many, many subjective assumptions.
76
00:08:41,310 --> 00:08:46,800
And then the second term on the right-hand side, in the numerator, is what's known as the prior.
77
00:08:46,800 --> 00:08:51,930
By contrast with the likelihood, it is a valid probability distribution, and similar to the likelihood,
78
00:08:51,930 --> 00:09:00,170
it is also subjective. And then the final term on the right-hand side, on the bottom, is the denominator.
79
00:09:00,170 --> 00:09:07,250
It's got many names. It's the denominator, and it's got kind of two different interpretations. Before we actually collect the data,
80
00:09:07,250 --> 00:09:10,780
it is what's known as the prior predictive distribution.
81
00:09:10,780 --> 00:09:20,240
So it's actually a distribution over the potential data sets that we could get, given our prior and likelihood.
82
00:09:20,240 --> 00:09:26,990
And then once we have the data, it's just the number that normalises the posterior, and that's known as the evidence or the marginal likelihood,
83
00:09:26,990 --> 00:09:31,850
and it's entirely calculated from the numerator.
84
00:09:31,850 --> 00:09:38,550
And as we'll see later on, calculating this denominator is the source of much of the difficulty in doing
85
00:09:38,550 --> 00:09:46,960
exact Bayesian inference. And then the final term in Bayes' rule is what's known as the posterior.
86
00:09:46,960 --> 00:09:54,130
It is the goal of Bayesian inference, because what we want to do is summarise our uncertainty about some
87
00:09:54,130 --> 00:10:01,000
quantity that we don't know, the disease prevalence, using probabilities and probability distributions,
88
00:10:01,000 --> 00:10:06,700
because that's the coherent way to summarise uncertainty.
89
00:10:06,700 --> 00:10:13,390
And as I say, it's the starting point for all further analysis in Bayesian statistics.
90
00:10:13,390 --> 00:10:18,550
Now, I want to talk a little bit about the intuition behind doing Bayesian inference.
91
00:10:18,550 --> 00:10:28,960
So if we write down Bayes' rule again, we can see that actually the denominator on the right-hand side doesn't depend on the parameter theta.
92
00:10:28,960 --> 00:10:34,660
And so the posterior is essentially proportional to the product of the likelihood and the prior.
93
00:10:34,660 --> 00:10:40,450
And so what that means is that the posterior is essentially a kind of weighted geometric mean of the prior and the likelihood.
94
00:10:40,450 --> 00:10:45,970
And so all of its kind of shape is determined by the product of these two terms.
95
00:10:45,970 --> 00:10:51,380
And that's what I want to emphasise now with a few animations.
96
00:10:51,380 --> 00:11:02,210
So I'm going to imagine that we go out and we collect blood samples for 10 people, and we find that three of them are disease positive.
97
00:11:02,210 --> 00:11:05,040
And now what I'm drawing here is a potential prior.
98
00:11:05,040 --> 00:11:13,100
And I've chosen a uniform prior for the disease prevalence between zero and one, or zero and one hundred percent,
99
00:11:13,100 --> 00:11:17,630
depending on how you think about it: all disease prevalences are equally likely.
100
00:11:17,630 --> 00:11:25,100
And then below that, I can show the likelihood. That likelihood peaks at three out of ten,
101
00:11:25,100 --> 00:11:29,930
because we found that three out of ten individuals were disease positive.
102
00:11:29,930 --> 00:11:37,040
And the posterior is proportional to the product of the prior and the likelihood; because the prior is flat and the likelihood peaks at about 0.3,
103
00:11:37,040 --> 00:11:45,970
the posterior also peaks at the same place. But now what I want to show you is how the posterior changes as I change my prior.
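(For the recording: the flat-prior update just described can be reproduced numerically on a grid. This is an illustrative sketch, not the animation code used in the talk.)

```python
import numpy as np

theta = np.linspace(0.0, 1.0, 1001)        # grid of prevalence values
prior = np.ones_like(theta)                # flat (uniform) prior
likelihood = theta**3 * (1 - theta)**7     # binomial kernel: 3 positives of 10
unnorm = prior * likelihood                # numerator of Bayes' rule
posterior = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # normalise

mode = theta[np.argmax(posterior)]         # peaks at 0.3 under a flat prior
```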
104
00:11:45,970 --> 00:11:56,500
And so I'm going to run a quick animation which shows that as I changed what prior distribution I'm using, then the posterior distribution shifts.
105
00:11:56,500 --> 00:12:04,390
And what we find here is that the peak of the posterior ends up being somewhere between the peak of the prior and the peak of the likelihood.
106
00:12:04,390 --> 00:12:11,180
And so it's kind of, as I said, this weighted average of the prior and the likelihood.
107
00:12:11,180 --> 00:12:25,640
So the posterior reflects both our prior beliefs and the data, which tell us about the values of the parameters of our model.
108
00:12:25,640 --> 00:12:28,310
Now, I want to show you a slightly different thing,
109
00:12:28,310 --> 00:12:33,620
which now I'm going to hold the prior constant and I'm going to imagine that we collected different types of samples.
110
00:12:33,620 --> 00:12:41,270
So I'm starting off here by imagining that we had a sample size of 10, only we didn't find any individuals with the disease.
111
00:12:41,270 --> 00:12:48,180
And now we see that the likelihood peaks at zero, because that's the maximum likelihood estimate of the parameter.
112
00:12:48,180 --> 00:12:52,090
And now what we can see is that as we collect different data samples,
113
00:12:52,090 --> 00:12:58,720
my likelihood shifts, and because my likelihood is shifting, my posterior is also shifting as well.
114
00:12:58,720 --> 00:13:02,640
And we find that the position of the peak of the posterior is somewhere between
115
00:13:02,640 --> 00:13:09,590
the peak of the prior and the peak of the likelihood, as we found in the previous case. Now, finally,
116
00:13:09,590 --> 00:13:19,280
I want to show you what happens if instead we imagine that we had a fixed prior and a fixed proportion of individuals who have the disease,
117
00:13:19,280 --> 00:13:27,650
but we increase our sample size, so we start out with three out of 10 individuals who have the disease, then we get 30 out of 100,
118
00:13:27,650 --> 00:13:32,570
etc. So we're just keeping that proportion the same, increasing the sample size.
119
00:13:32,570 --> 00:13:37,070
So before I start running some animation, we can see that with a sample size of 10,
120
00:13:37,070 --> 00:13:41,210
the posterior is somewhere between the peak of the prior and the likelihood.
121
00:13:41,210 --> 00:13:46,050
But as I increase my sample size, we see a couple of things happen.
122
00:13:46,050 --> 00:13:53,820
Firstly, we see that the posterior becomes narrower, and that makes sense, right? Because as I collect more data,
123
00:13:53,820 --> 00:14:00,150
then I should hopefully get more confident in my estimate, but something else happens as well.
124
00:14:00,150 --> 00:14:06,940
We actually see that the position of the posterior shifts over towards the position of the likelihood.
125
00:14:06,940 --> 00:14:18,190
And that, again, is a desirable property of Bayesian inference, which is that as I increase my sample size, ideally the prior I use matters less and less.
126
00:14:18,190 --> 00:14:25,780
That's not always the case, because with more complicated models you may not ever be able to identify the parameters, but for a model like this,
127
00:14:25,780 --> 00:14:31,570
it is true. So hopefully that's provided some intuition for you.
128
00:14:31,570 --> 00:14:45,730
So I wanted to ask if anyone has any questions at this point.
129
00:14:45,730 --> 00:14:52,310
There are no questions.
130
00:14:52,310 --> 00:15:04,670
If not, that's fine, I can continue on, and there will be more opportunities for questions. Just curious, how is it implemented?
131
00:15:04,670 --> 00:15:09,270
What's that? Sorry. How is the updating graph implemented?
132
00:15:09,270 --> 00:15:13,350
OK, so do you mean how the Bayesian updating works, or do you mean how the animation is done?
133
00:15:13,350 --> 00:15:24,450
Oh, it's just an animation. Python is always very good for these sorts of animations.
134
00:15:24,450 --> 00:15:25,500
Great.
135
00:15:25,500 --> 00:15:32,870
OK, so that's hopefully provided a little bit of an introduction to Bayesian inference, and I'm sure that a lot of you are familiar with the topic already.
136
00:15:32,870 --> 00:15:37,510
But I wanted to do that just to provide a little grounding for the rest of the talk.
137
00:15:37,510 --> 00:15:46,090
So now we're going to kind of step up the level of difficulty a little bit, and we can talk about how we formulate Bayesian inference
138
00:15:46,090 --> 00:15:49,740
problems for ordinary differential equations.
139
00:15:49,740 --> 00:15:58,950
And we can imagine now a slightly different example, where we carry out a series of experiments in which we inoculate some plates with bacteria
140
00:15:58,950 --> 00:16:08,950
at some initial time, and then at predefined time intervals we count the number of bacteria on each plate using some sort of experimental approach.
141
00:16:08,950 --> 00:16:18,550
And imagine that what we're trying to do is to model the bacterial population growth over time.
142
00:16:18,550 --> 00:16:25,630
So I've got some sort of fictitious data here, which shows the counts of bacteria over time
143
00:16:25,630 --> 00:16:35,170
that have been measured, and what we want to do is develop a model that kind of explains the data.
144
00:16:35,170 --> 00:16:38,350
One model that might be appropriate for this is the logistic model.
145
00:16:38,350 --> 00:16:45,760
So this is basically a differential equation that contains two terms on the right-hand side.
146
00:16:45,760 --> 00:16:52,420
One of the terms essentially reflects the initial growth of the bacterial population.
147
00:16:52,420 --> 00:17:03,340
And so that's the term alpha here. And then this term beta essentially reflects how, as the size of the bacterial population grows,
148
00:17:03,340 --> 00:17:10,660
there's a reduction in the growth rate due to crowding; there might be competition for nutrients, for example.
149
00:17:10,660 --> 00:17:18,640
And so what you actually find in this sort of model is that you get a sigmoidal curve, which represents the solution of the ODE.
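(For reference: the logistic equation dN/dt = alpha*N - beta*N^2 has the sigmoidal analytic solution below; the parameter values in the example call are made up for illustration.)

```python
import numpy as np

def logistic_solution(t, alpha, beta, n0):
    """Analytic solution of dN/dt = alpha*N - beta*N**2: sigmoidal growth
    from the initial count n0 towards the carrying capacity alpha/beta."""
    k = alpha / beta                                  # carrying capacity
    return k / (1 + (k / n0 - 1) * np.exp(-alpha * t))

# hypothetical values: growth rate 0.8, crowding 0.008, 5 cells at t = 0
n = logistic_solution(np.linspace(0, 15, 50), 0.8, 0.008, 5.0)
```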
150
00:17:18,640 --> 00:17:26,800
So we've got our data and we've got a model. Do we have everything that we need to do inference here?
151
00:17:26,800 --> 00:17:35,560
So what I could do is imagine overlaying lots of potential solutions of the ODE for different parameters.
152
00:17:35,560 --> 00:17:43,330
But we have a bit of a problem here, which is that none of the models that we're using can fully explain the data.
153
00:17:43,330 --> 00:17:50,530
In other words, they've got zero probability of having generated the data, because our model solutions are smooth
154
00:17:50,530 --> 00:17:56,470
lines, and our data, with all their uncertainty, do not all lie along those lines.
155
00:17:56,470 --> 00:18:05,830
And so at the moment, we don't have enough information to formulate the inference problem here.
156
00:18:05,830 --> 00:18:15,550
We need some sort of statistical model which essentially represents those processes that we don't account for in our deterministic model.
157
00:18:15,550 --> 00:18:20,110
So here I'm going to assume that we've got some sort of measurement error around the true value.
158
00:18:20,110 --> 00:18:29,470
So the number of bacteria that we count on a plate at some time t is normally distributed about the true count, which is the solution of the ODE.
159
00:18:29,470 --> 00:18:38,100
And there is some noise parameter sigma, which represents the magnitude of the dispersion about the true value.
160
00:18:38,100 --> 00:18:47,040
So I should say here that using a normal measurement error model is not the only choice I could have made; I could use,
161
00:18:47,040 --> 00:18:54,330
for example, a Student-t distribution. But it is generally a fairly standard,
162
00:18:54,330 --> 00:19:03,750
widely used choice, and I'm also implicitly assuming here that the errors are normal. Actually, just one second:
163
00:19:03,750 --> 00:19:09,680
I might just kind of turn off my Slack, because I think that's going to become quite annoying in a second.
164
00:19:09,680 --> 00:19:19,710
I'm just going to quit that. Sorry. OK.
165
00:19:19,710 --> 00:19:24,520
So the question we might have is, how does this model work?
166
00:19:24,520 --> 00:19:32,530
So the data generating process is that we assume the true number of cells follows the solution of the ODE.
167
00:19:32,530 --> 00:19:37,750
And then there is some sort of measurement process which is imperfect.
168
00:19:37,750 --> 00:19:43,150
So that means that we don't actually measure the true number of the bacteria on plates,
169
00:19:43,150 --> 00:19:50,230
but that the amount that we do measure is normally distributed around the true value.
170
00:19:50,230 --> 00:19:54,700
And then to get all our data, then we sample from that process.
171
00:19:54,700 --> 00:20:05,230
And so then we can draw data from it. And now, because of the statistical model, we have a possible way of having generated our data,
172
00:20:05,230 --> 00:20:10,830
whereas we didn't have that before when we just used our purely deterministic mathematical model.
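(The generative process just described, solving the ODE and then adding normal measurement noise, can be simulated like this; all numerical values here are hypothetical.)

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, sigma, n0 = 0.8, 0.008, 4.0, 5.0    # hypothetical "true" values
k = alpha / beta                                 # carrying capacity, 100 here
times = np.arange(0.0, 15.0)
true_counts = k / (1 + (k / n0 - 1) * np.exp(-alpha * times))  # ODE solution
observed = rng.normal(true_counts, sigma)        # imperfect measurements
```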
173
00:20:10,830 --> 00:20:17,170
We needed that information to be able to formulate a proper inference problem.
174
00:20:17,170 --> 00:20:20,470
So how do we write down the likelihood in this circumstance?
175
00:20:20,470 --> 00:20:29,260
Well, remember that what we're using is a normal likelihood that is centred on the true number of cells at each time.
176
00:20:29,260 --> 00:20:36,640
And so we can write down the likelihood of the observations by taking the product of the individual densities of each data point.
177
00:20:36,640 --> 00:20:43,600
And we're able to do that because we're assuming conditional independence of all of our data,
178
00:20:43,600 --> 00:20:46,810
conditioning on the parameters of our model.
179
00:20:46,810 --> 00:20:52,630
And so the likelihood of all our observations is the probability density of the first
180
00:20:52,630 --> 00:20:59,930
observation times the probability density of the second observation, and so on, all the way up.
181
00:20:59,930 --> 00:21:13,090
And so I'm using the capital N in bold type to represent my vector of measured counts of bacteria.
182
00:21:13,090 --> 00:21:20,040
The question we have, though, is how do we actually calculate N? Well, for the logistic model, there is actually an analytic solution.
183
00:21:20,040 --> 00:21:20,980
I can write it down.
184
00:21:20,980 --> 00:21:31,660
But that is the exception: most ordinary differential equation models, meaning the equations determining the deterministic solution, cannot be solved
185
00:21:31,660 --> 00:21:42,760
exactly. And so what we typically do is use some sort of numerical integration method to integrate the ODE.
186
00:21:42,760 --> 00:21:49,660
And we're going to have to bear that in mind whenever we do inference problems. And also,
187
00:21:49,660 --> 00:21:54,370
we know the value of N depends implicitly on the parameters of the ODE, so I can actually write
188
00:21:54,370 --> 00:22:03,960
down the solution using an implicit notation: N(t) is some function of time, alpha and beta.
189
00:22:03,960 --> 00:22:10,020
So then what we can do is rewrite our likelihood a little bit, and now
190
00:22:10,020 --> 00:22:14,310
that explicitly makes it clear that the likelihood depends on three parameters.
191
00:22:14,310 --> 00:22:20,370
It depends on the growth parameter alpha, the crowding parameter beta, and the noise parameter sigma of our model.
192
00:22:20,370 --> 00:22:26,230
And so what you see is that when you actually formulate the inference problem for ordinary differential equation models,
193
00:22:26,230 --> 00:22:32,230
then typically you get extra parameters beyond just the parameters of your ordinary differential equation
194
00:22:32,230 --> 00:22:38,820
model: sort of nuisance parameters that represent some sort of measurement
195
00:22:38,820 --> 00:22:48,720
process. So then what I can do is write down our posterior, and it has the form: the posterior is equal to the likelihood times the prior, over the denominator.
196
00:22:48,720 --> 00:22:59,560
The posterior right now is this three-dimensional distribution, and we have a denominator to calculate.
197
00:22:59,560 --> 00:23:03,550
So that's how you formulate, well, that's one way of formulating, the inference problem.
198
00:23:03,550 --> 00:23:14,030
Before we continue: does anyone have any questions at this point?
199
00:23:14,030 --> 00:23:18,680
Yes, I do. Hello. Hi.
200
00:23:18,680 --> 00:23:26,180
Could you please explain your alpha, beta and sigma? Which of them represent the parameters of the model?
201
00:23:26,180 --> 00:23:32,840
Sure. So here's the equation, and it's got these two parameters, alpha and beta.
202
00:23:32,840 --> 00:23:33,870
This is my ordinary differential equation model.
203
00:23:33,870 --> 00:23:41,270
As I said, alpha kind of represents the initial growth rate of the population,
204
00:23:41,270 --> 00:23:46,190
which is an exponential growth rate. And beta represents a kind of crowding.
205
00:23:46,190 --> 00:23:54,890
And so my posterior distribution is a function of those two parameters and my noise parameter, sigma. Does that make sense?
206
00:23:54,890 --> 00:24:00,440
Yes. And so does that mean and when we are formulating something like this,
207
00:24:00,440 --> 00:24:09,530
is it that the only additional parameter we need to consider is one extra parameter for the measurement model?
208
00:24:09,530 --> 00:24:16,580
So it depends a little bit. I've used a very simple measurement model here, which has just got one extra parameter.
209
00:24:16,580 --> 00:24:25,460
Now, there are lots of different choices I could have made; the more complex I make that measurement model, typically the more parameters it has.
210
00:24:25,460 --> 00:24:31,340
And so, yes, sometimes it does and other times it doesn't.
211
00:24:31,340 --> 00:24:39,590
if you're using a more complex measurement model. Another case is if your ordinary differential equation model essentially has a number of outputs.
212
00:24:39,590 --> 00:24:48,380
So imagine you've got a system of ordinary differential equations and you're using observations on all of those to do inference.
213
00:24:48,380 --> 00:24:55,190
Then you might have a different measurement noise parameter for each of those different parts of the system.
214
00:24:55,190 --> 00:24:59,990
Yeah, that's it. Thank you. Thank you very much.
215
00:24:59,990 --> 00:25:12,040
Does anyone have any other questions? No.
216
00:25:12,040 --> 00:25:16,270
Sorry. No questions. OK, great.
217
00:25:16,270 --> 00:25:30,160
Thank you. So we've talked about how we formulate the inference problem, and now I want to talk about how we go about actually solving it.
218
00:25:30,160 --> 00:25:37,330
And as we'll see, the method of solving the problem is perhaps a bit messier than you might expect,
219
00:25:37,330 --> 00:25:41,290
and it involves various approximations.
220
00:25:41,290 --> 00:25:53,940
So if we revisit the posterior for our logistic model inference problem, we see that it's got this denominator here.
221
00:25:53,940 --> 00:26:04,710
So how would we actually calculate that? Well, because alpha, beta and sigma are all continuous parameters, to calculate the denominator
222
00:26:04,710 --> 00:26:10,210
we would need to do a triple integral, which is essentially three-dimensional.
223
00:26:10,210 --> 00:26:20,980
And that's pretty tricky. It's tricky for computers to do any three-dimensional integral, at least to do it deterministically and
224
00:26:20,980 --> 00:26:26,840
exactly. So for any sort of problem, doing a three-dimensional integral is tricky.
225
00:26:26,840 --> 00:26:33,170
In Bayesian inference, the integrals that are involved in the denominator are especially difficult.
226
00:26:33,170 --> 00:26:41,390
And that's because the likelihood tends to be very narrow, in the sense that the region of parameter space for which the likelihood is not negligible is small,
227
00:26:41,390 --> 00:26:47,870
whereas the prior tends to be really, really wide, because people often use kind of uninformative priors, and that causes additional problems
228
00:26:47,870 --> 00:26:56,870
when trying to approximate the integral. And in the ordinary differential equation setting, the difficulty is compounded even further,
229
00:26:56,870 --> 00:27:04,880
because evaluating the likelihood in a differential equation setting actually
230
00:27:04,880 --> 00:27:10,190
means that we typically have to numerically integrate the differential equations to get the solution.
231
00:27:10,190 --> 00:27:14,060
And so there isn't just the integral that we see in the equation.
232
00:27:14,060 --> 00:27:19,490
Implicitly, this also involves a whole series of kind of hidden integrals as well.
233
00:27:19,490 --> 00:27:28,250
So suffice to say, there's absolutely zero chance that we could actually evaluate this denominator, at least
234
00:27:28,250 --> 00:27:33,770
exactly. And so we need to sort of realise that we can't do exact Bayesian inference.
235
00:27:33,770 --> 00:27:39,840
And that's not just a problem with ordinary differential equation models; that's a problem with doing Bayesian inference in general.
236
00:27:39,840 --> 00:27:45,660
So what can we do? This leads me into a different area,
237
00:27:45,660 --> 00:27:57,210
which is sampling. Imagine that you want to gain insight into a distribution, and the distribution here is:
238
00:27:57,210 --> 00:28:06,270
I'm imagining we've got this kind of bottomless urn of balls, and we don't really know how many colours there are.
239
00:28:06,270 --> 00:28:08,820
We don't know the frequencies of each of the colours.
240
00:28:08,820 --> 00:28:18,600
So the question we might have is: how can we determine the underlying probability distribution of ball colours from this urn?
241
00:28:18,600 --> 00:28:27,120
So the answer, which is very intuitive, is that we draw lots and lots of balls from the urn and we count the sample frequencies.
242
00:28:27,120 --> 00:28:37,140
So if I draw one hundred balls from the urn and I tabulate the frequencies of each colour of ball, then what we see is that if I collect enough samples,
243
00:28:37,140 --> 00:28:41,360
then the sampling distribution
244
00:28:41,360 --> 00:28:51,520
starts to converge to something which hopefully represents the underlying probability distribution of those colours within the urn.
245
00:28:51,520 --> 00:28:57,350
And so what have we learnt here? We've learnt that if we can sample from a distribution.
246
00:28:57,350 --> 00:29:01,890
Then we can use the sample properties of the things that we collected to help
247
00:29:01,890 --> 00:29:12,470
us to learn about that distribution, and we can get quantities of interest. So what's the connection of this to Bayesian inference?
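The urn experiment can be sketched in a few lines of Python. The colours and their probabilities below are made up purely for illustration; the point is that the "experimenter" only ever sees draws, yet the sample frequencies recover the hidden distribution.

```python
import random
from collections import Counter

COLOURS = ["red", "green", "blue"]
TRUE_P = [0.5, 0.3, 0.2]  # hidden from the "experimenter"

def draw_ball(rng):
    # One draw from the urn; the true probabilities stay hidden in here.
    return rng.choices(COLOURS, weights=TRUE_P, k=1)[0]

rng = random.Random(1)
n = 100_000
counts = Counter(draw_ball(rng) for _ in range(n))
freqs = {c: counts[c] / n for c in COLOURS}
print(freqs)  # sample frequencies approach 0.5 / 0.3 / 0.2
```

With only a hundred draws the frequencies are noisy; with a hundred thousand they sit close to the true values, which is exactly the convergence described above.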
248
00:29:12,470 --> 00:29:17,730
The urn is a probability distribution. It's a discrete probability distribution.
249
00:29:17,730 --> 00:29:21,000
The posterior, for example, is also a probability distribution.
250
00:29:21,000 --> 00:29:26,530
It's a continuous one in that case, although there are also discrete posterior distributions.
251
00:29:26,530 --> 00:29:36,980
But in our case, a continuous one. So the idea behind computational sampling is that if we can construct a way of sampling,
252
00:29:36,980 --> 00:29:41,540
of drawing values of our parameters from that posterior distribution,
253
00:29:41,540 --> 00:29:49,520
then we can use those draws to help us to summarise the posterior distribution and approximate it in some way.
254
00:29:49,520 --> 00:29:53,460
So a question we have is: how do we actually construct such a sampler?
255
00:29:53,460 --> 00:29:58,220
And it's not as simple as reaching into an urn and pulling out a coloured ball.
256
00:29:58,220 --> 00:30:06,650
Right. So how do we do that? Well, the answer is that you use something which is known as Markov chain Monte Carlo.
257
00:30:06,650 --> 00:30:12,500
So in our example for our posterior, we couldn't calculate the posterior exactly,
258
00:30:12,500 --> 00:30:19,360
but we could calculate the numerator of this rule, which is the product of the likelihood and the prior.
259
00:30:19,360 --> 00:30:26,020
And it turns out that this contains enough information for us to construct a Markov chain,
260
00:30:26,020 --> 00:30:31,660
which in an infinite sample size draws from the posterior distribution.
261
00:30:31,660 --> 00:30:34,090
Now, infinite sample size sounds a bit intimidating,
262
00:30:34,090 --> 00:30:41,170
but the idea is that in a finite sample size, hopefully we should have enough samples from our Markov
263
00:30:41,170 --> 00:30:46,660
chain that we eventually get out something that approximates quite well the posterior distribution.
264
00:30:46,660 --> 00:30:52,430
They're constructed so that they converge asymptotically to the posterior distribution.
265
00:30:52,430 --> 00:30:57,060
There are many, many types of Markov chain Monte Carlo algorithms,
266
00:30:57,060 --> 00:31:04,160
and they have different uses in different situations, some typically more useful than others.
267
00:31:04,160 --> 00:31:11,600
And I'll talk about that in a few minutes. I don't want to go into too much detail about Monte Carlo methods and sort of motivating them.
268
00:31:11,600 --> 00:31:21,090
But I want to present perhaps the simplest, and certainly the oldest, variant of Markov chain Monte Carlo,
269
00:31:21,090 --> 00:31:29,210
which is the Metropolis algorithm, created in 1953 by Nicholas Metropolis and Stanislaw Ulam.
270
00:31:29,210 --> 00:31:35,170
They were working on the Manhattan Project as nuclear physicists interested in neutron physics.
271
00:31:35,170 --> 00:31:41,910
So this is a sketch of the algorithm. It's actually not as intimidating as it looks.
272
00:31:41,910 --> 00:31:49,590
The idea is that you start from some sort of arbitrary initial point for each of our parameters: alpha, beta, sigma.
273
00:31:49,590 --> 00:31:57,960
And then you iterate the following. You draw some proposed values for each of those parameters from a normal distribution,
274
00:31:57,960 --> 00:32:09,490
which is centred on the previous parameter values, and that normal distribution has some sort of proposal width, or covariance matrix Sigma,
275
00:32:09,490 --> 00:32:14,440
which needs to be tuned to be appropriate for a given circumstance.
276
00:32:14,440 --> 00:32:24,700
Then what you do is you calculate the ratio of the proposed posterior probability to the current posterior probability.
277
00:32:24,700 --> 00:32:30,760
And so, you know, equation twenty-one would require us to actually evaluate the posterior distribution,
278
00:32:30,760 --> 00:32:35,140
which we know we can't do, because we can't calculate the denominator, basically.
279
00:32:35,140 --> 00:32:41,620
But it turns out we don't need to because the denominator cancels out of this ratio.
280
00:32:41,620 --> 00:32:52,210
And so what we're left with in equation twenty-two is actually just the ratio of the unnormalised posterior at the proposed point to the unnormalised posterior at the current point.
281
00:32:52,210 --> 00:32:58,220
So we calculate this ratio, which we can do because we can calculate all the things in this rule,
282
00:32:58,220 --> 00:33:09,820
all the things in the numerator. Then what we do is we draw a value from the uniform distribution between zero and one.
283
00:33:09,820 --> 00:33:19,240
And then if r is greater than the uniform value, then the next set of parameter values becomes the proposed values.
284
00:33:19,240 --> 00:33:24,560
Otherwise, we stay where we are currently for the next iteration.
285
00:33:24,560 --> 00:33:29,430
So we can get repeated samples from the computation.
286
00:33:29,430 --> 00:33:35,520
MCMC routines all do this; they accept or reject proposals.
287
00:33:35,520 --> 00:33:44,630
So you don't necessarily accept all of your steps, and so you can end up with two samples in one place, or a number of samples in one place.
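The sketch above translates quite directly into code. Below is a minimal random-walk Metropolis sampler for a one-dimensional target known only up to a constant; a standard normal is used purely for illustration, since its mean and variance give an easy check that the sampler works.

```python
import math
import random

def unnorm_log_post(x):
    # Log of the unnormalised density: exp(-x^2 / 2) up to a constant.
    # The normalising denominator cancels in the acceptance ratio.
    return -0.5 * x * x

def metropolis(n_samples, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0                                   # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)   # proposal centred on current value
        log_r = unnorm_log_post(proposal) - unnorm_log_post(x)
        if rng.random() < math.exp(min(0.0, log_r)):
            x = proposal                      # accept: move to proposed point
        samples.append(x)                     # on rejection, current point repeats
    return samples

draws = metropolis(50_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))  # should be close to 0 and 1
```

Note that the rejected steps are exactly what produces the duplicated samples described above: the chain appends the current point again whenever a proposal fails.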
288
00:33:44,630 --> 00:33:51,310
So that was the mathematical detail of the algorithm, or a sketch of the algorithm.
289
00:33:51,310 --> 00:33:57,240
I'm now going to try and visualise a little bit how the algorithm runs over time.
290
00:33:57,240 --> 00:34:05,820
And so the question I have is: can we use Metropolis to sample from this sort of random continuous distribution shown below?
291
00:34:05,820 --> 00:34:18,660
And so obviously this is a made-up problem. I've just drawn this kind of weird, funky distribution and asked the question: can we actually use Metropolis here?
292
00:34:18,660 --> 00:34:23,880
OK, so what we find is that if I run this,
293
00:34:23,880 --> 00:34:33,720
then we start at some sort of arbitrary point, and then the algorithm proceeds by proposing values and then either rejecting that value,
294
00:34:33,720 --> 00:34:39,510
in which case I illustrate the chain's path in red, or it accepts that proposal,
295
00:34:39,510 --> 00:34:44,980
in which case we get a green transition and the chain moves to the new location.
296
00:34:44,980 --> 00:34:52,970
And so what we can see over time is that our Markov chain moves about and tends to move to the modes of the distribution, which is what we want.
297
00:34:52,970 --> 00:34:58,210
We want to sample more from the modes of the distribution, because that's what having higher probability density means.
298
00:34:58,210 --> 00:35:02,380
It means you generate more samples in that location than in other locations.
299
00:35:02,380 --> 00:35:11,680
And so if I was to leave this running much, much longer, which I'll show you in a second, then hopefully the collection of points,
300
00:35:11,680 --> 00:35:19,390
the sort of nodes on the blue path, would represent samples from the underlying distribution.
301
00:35:19,390 --> 00:35:26,560
And so I can illustrate that now: on the right-hand side, I've got the actual density, and on the left-hand side,
302
00:35:26,560 --> 00:35:32,050
I've got a reconstructed version of the density, which I get from fitting a kernel
303
00:35:32,050 --> 00:35:37,400
density estimate to my collection of samples from random walk Metropolis.
304
00:35:37,400 --> 00:35:44,290
And so with a sample size of one hundred, we see here that I get a very noisy approximation of the density.
305
00:35:44,290 --> 00:35:51,340
But as I draw more and more samples, we see the two distributions become more similar.
306
00:35:51,340 --> 00:36:02,730
Over time, the distribution from my Metropolis routine ends up converging towards the actual density.
307
00:36:02,730 --> 00:36:10,060
And so after a sample size of roughly twenty thousand, the Metropolis approximation closely matches the density.
308
00:36:10,060 --> 00:36:19,410
It's kind of hard to tell them apart, so it's probably a good approximation to the underlying distribution.
309
00:36:19,410 --> 00:36:25,650
Great. So that's a very brief introduction to Markov chain Monte Carlo.
310
00:36:25,650 --> 00:36:42,990
Does anyone have any questions at this point? No.
311
00:36:42,990 --> 00:36:48,570
OK, well, in that case, I'll proceed on to the final bit of the talk,
312
00:36:48,570 --> 00:36:57,060
which is to say a little bit more about research that I'm involved in, which is a bit of software called PINTS,
313
00:36:57,060 --> 00:37:03,580
which facilitates inference for differential equation models.
314
00:37:03,580 --> 00:37:12,640
So I'm sure that some of the people in the audience have tried to use Markov chain Monte Carlo to fit models;
315
00:37:12,640 --> 00:37:19,240
I don't know if they've tried to fit ordinary differential equation models, but.
316
00:37:19,240 --> 00:37:30,640
What I've found in the past is that often, especially for early-stage researchers and people that are new to doing this sort of thing,
317
00:37:30,640 --> 00:37:33,010
they follow a path which looks something like this,
318
00:37:33,010 --> 00:37:39,700
which is that they read the statistical literature and they find a given Markov chain, Monte Carlo method.
319
00:37:39,700 --> 00:37:47,830
And if they understand the statistical literature, which certainly isn't a given, because the methods are often very poorly described,
320
00:37:47,830 --> 00:37:54,480
then what they do is they code up that method, and that may or may not be good code,
321
00:37:54,480 --> 00:37:59,110
depending on what sort of software development practices they're using.
322
00:37:59,110 --> 00:38:07,130
Then what they do is they apply that to their problem, and they find that the chains aren't converging.
323
00:38:07,130 --> 00:38:10,990
So the method is essentially failing, and that can be for a number of reasons.
324
00:38:10,990 --> 00:38:17,470
One of the reasons could be that your Markov chain Monte Carlo method isn't appropriate, or isn't set up correctly,
325
00:38:17,470 --> 00:38:26,180
or it could be some characteristic of your ODE model and your data, which means that actually doing inference is going to be really, really difficult.
326
00:38:26,180 --> 00:38:31,220
And so then what people tend to do is move on to another method: they look at the literature again,
327
00:38:31,220 --> 00:38:35,660
they code up another method and try again, and they repeat what we
328
00:38:35,660 --> 00:38:40,900
call a cycle of misery until they eventually end up with something that works.
329
00:38:40,900 --> 00:38:49,120
And so I think a few of us got a bit fed up with seeing people go through this path, and so we decided to try and stop it.
330
00:38:49,120 --> 00:38:59,210
As I say, the reason this cycle exists is partly a communication problem: the statistical literature is often written by methods experts
331
00:38:59,210 --> 00:39:04,820
for other methods experts, and often those papers don't contain high-quality pseudocode,
332
00:39:04,820 --> 00:39:11,100
which actually makes it harder to implement these methods yourself. And also,
333
00:39:11,100 --> 00:39:16,900
if there is software available accompanying the papers, then often it's not actually very well developed
334
00:39:16,900 --> 00:39:19,080
or very user-friendly, so it's difficult to use.
335
00:39:19,080 --> 00:39:26,490
And if you want to move to a new method, then you need to get familiar with a whole new package for using that method.
336
00:39:26,490 --> 00:39:32,200
And so it takes ages to shift between different types of method.
337
00:39:32,200 --> 00:39:35,890
So, yeah, that's part of the reason that this cycle exists.
338
00:39:35,890 --> 00:39:43,780
Another reason is that ordinary differential equation models are particularly problematic for inference because of their non-linear nature.
339
00:39:43,780 --> 00:39:51,460
So this is an example that I've taken from a paper by Girolami, in which he shows the posterior distribution for two of the
340
00:39:51,460 --> 00:40:01,930
parameters of what's known as a Goodwin oscillator model. This is often used to represent circadian rhythms in organisms.
341
00:40:01,930 --> 00:40:05,800
And so you can see on the left the posterior distribution.
342
00:40:05,800 --> 00:40:09,070
You can see that it's got all these kind of nasty ridges along it.
343
00:40:09,070 --> 00:40:14,830
And if you think about Markov chains, that's what they're trying to do: essentially explore those ridges.
344
00:40:14,830 --> 00:40:17,920
And so you run into difficulty with this sort of model.
345
00:40:17,920 --> 00:40:25,630
It's really, really challenging to come up with good Markov chain Monte Carlo methods that will adequately explore the space.
346
00:40:25,630 --> 00:40:29,710
Remember that this is only two dimensions of a much higher dimensional space.
347
00:40:29,710 --> 00:40:35,060
And so it's actually much harder than this picture makes it look.
348
00:40:35,060 --> 00:40:42,050
And so you often need to try different types of inference method.
349
00:40:42,050 --> 00:40:51,320
So that's what motivated PINTS. Basically, what PINTS is, is a zoo:
350
00:40:51,320 --> 00:41:01,040
a zoo of lots of sampling methods, and it also has optimisation methods. In optimisation, you return a single value of your parameters,
351
00:41:01,040 --> 00:41:05,210
which optimises some criterion. In sampling, it's different.
352
00:41:05,210 --> 00:41:11,540
You return a kind of distribution of your parameter values, which represents some sort of uncertainty.
353
00:41:11,540 --> 00:41:19,570
It's an open-source Python library that's available on GitHub, and it was created in Computer Science.
354
00:41:19,570 --> 00:41:27,140
So how is it different? It's not aligned to a single algorithm, and it's designed to interface with other software.
355
00:41:27,140 --> 00:41:29,280
So, for example, it has an interface to Stan.
356
00:41:29,280 --> 00:41:40,870
So if you try and do inference on your model using Stan and you find it fails, then you don't necessarily have to rewrite your model from scratch.
357
00:41:40,870 --> 00:41:47,830
Using PINTS, you can use that interface and make the transition a bit easier.
358
00:41:47,830 --> 00:41:58,750
It's aimed at harder forward models, ODEs and PDEs typically, and it allows users the freedom to use their own forward model solution method, because often,
359
00:41:58,750 --> 00:42:04,750
particularly for partial differential equations, they require quite nuanced ways of actually solving those models.
360
00:42:04,750 --> 00:42:11,560
And so PINTS gives users the freedom to use whatever they want to solve those differential equations.
361
00:42:11,560 --> 00:42:15,460
And that contrasts with a lot of the probabilistic programming
362
00:42:15,460 --> 00:42:24,070
frameworks that are out there at the moment, where you have to code up your problem's solution using essentially their own language.
363
00:42:24,070 --> 00:42:30,100
So I'm sure that some of you probably use Stan, which is really, really good software.
364
00:42:30,100 --> 00:42:38,140
I wrote a book about it, essentially, and it's really, really useful, but it's of a different nature to PINTS.
365
00:42:38,140 --> 00:42:43,030
Stan is really, really good if you've got a model which has got lots and lots of parameters, dimensions,
366
00:42:43,030 --> 00:42:49,570
but the evaluation of the likelihood is relatively cheap, whereas PINTS is aimed at cases where the evaluation of
367
00:42:49,570 --> 00:42:55,290
the likelihood is really expensive because you have to integrate your differential equations.
368
00:42:55,290 --> 00:43:04,410
And so it occupies a different part of the space of needs to Stan, and the sort of
369
00:43:04,410 --> 00:43:11,690
community it caters for is quantitative scientists rather than necessarily applied statisticians.
370
00:43:11,690 --> 00:43:18,530
PINTS, as I said, is a zoo of lots and lots of different Markov chain methods and other types of
371
00:43:18,530 --> 00:43:23,760
sampling method, which I haven't got time to discuss, because Markov chain Monte Carlo is just one type of method in
372
00:43:23,760 --> 00:43:27,180
the sampling area. There are lots of other types.
373
00:43:27,180 --> 00:43:34,380
And as I say, we've already got a lot of these methods in PINTS, and we plan to add more of these methods,
374
00:43:34,380 --> 00:43:40,950
and they place different restrictions on what you need to be able to evaluate about your model in order to use them.
375
00:43:40,950 --> 00:43:47,320
Some of them require no gradients of the log likelihood with respect to the parameter values,
376
00:43:47,320 --> 00:43:52,980
and others require what are known as the sensitivities,
377
00:43:52,980 --> 00:43:55,650
which means that you need to get the gradient of the solution of your
378
00:43:55,650 --> 00:44:00,810
differential equations with respect to your parameter values, which is about as difficult as it is to say.
379
00:44:00,810 --> 00:44:03,060
And then you've got second-order sensitivities,
380
00:44:03,060 --> 00:44:10,560
which go one step further: the second derivatives of your ordinary differential equation solutions.
381
00:44:10,560 --> 00:44:18,240
And as you can imagine, determining these things, the first- and second-order sensitivities, is really computationally expensive.
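To illustrate what a sensitivity is, here is a toy finite-difference estimate of d(solution)/d(parameter) for an ODE simple enough that the exact answer is known. The model, Euler scheme and step sizes are invented for illustration; real solvers compute sensitivities with far more sophisticated (and expensive) machinery.

```python
import math

def solve(r, y0=2.0, t_end=1.0, n=2000):
    # Euler solve of dy/dt = -r * y; exact solution is y0 * exp(-r * t).
    y, dt = y0, t_end / n
    for _ in range(n):
        y += dt * (-r * y)
    return y

def fd_sensitivity(r, eps=1e-6):
    # Central finite difference: each evaluation costs a full ODE solve.
    return (solve(r + eps) - solve(r - eps)) / (2 * eps)

r = 0.7
approx = fd_sensitivity(r)
exact = -1.0 * 2.0 * math.exp(-r * 1.0)  # -t * y0 * exp(-r * t) at t = 1
print(approx, exact)
```

Even this crude approach needs two extra ODE solves per parameter, per likelihood evaluation, which is one way to see why gradient-based samplers become expensive for differential equation models.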
382
00:44:18,240 --> 00:44:21,330
And so it works well in some circumstances.
383
00:44:21,330 --> 00:44:29,170
And in others it's too restrictive, too complicated and too expensive to do.
384
00:44:29,170 --> 00:44:34,870
And so in those cases you need a gradient-free method. And then we also plan to include likelihood-free methods.
385
00:44:34,870 --> 00:44:39,720
So there's a whole class of approximate Bayesian computation methods planned for PINTS.
386
00:44:39,720 --> 00:44:44,260
And so that's for the next iteration.
387
00:44:44,260 --> 00:44:54,020
So with that, I'll finish, and I'll just say a quick thank you to the other developers; I've written five of them here.
388
00:44:54,020 --> 00:45:03,970
Michael Clerx, Martin Robinson and the others named here; we've all been based in Computer Science at some point.
389
00:45:03,970 --> 00:45:10,170
Michael is now at Nottingham. There are also other people that have been involved that I just didn't have space to include.
390
00:45:10,170 --> 00:45:15,050
So with that, I'll finish and I'll ask if anyone has any questions.
391
00:45:15,050 --> 00:45:25,520
OK, yeah, so everyone, if you can either come off audio and clap, or do a clapping gesture. That was wonderful.
392
00:45:25,520 --> 00:45:33,800
Thanks so much. Thank you. So, yeah, if you have any questions and you'd like to remain anonymous,
393
00:45:33,800 --> 00:45:43,900
just type them in the chat, or if you're happy to have your voice recorded, then come off audio with any questions.
394
00:45:43,900 --> 00:45:50,540
Can I ask a question, please? Yeah, go for it. So, I thought your talk was really good.
395
00:45:50,540 --> 00:45:57,770
In particular, the animations are just so helpful in showing the intuition underlying the concepts that you're talking about.
396
00:45:57,770 --> 00:46:01,040
And so I just had a question about prior selection.
397
00:46:01,040 --> 00:46:08,000
Because, as you mentioned already, obviously one of the criticisms of Bayesian inference is that it can be subjective.
398
00:46:08,000 --> 00:46:16,180
And so how do you address this in your analysis, say you have a reviewer on a paper who doesn't agree?
399
00:46:16,180 --> 00:46:22,610
Like, how do you kind of argue that point? Well, good question.
400
00:46:22,610 --> 00:46:28,740
So you're asking about how I choose priors.
401
00:46:28,740 --> 00:46:32,790
Yeah, very good question. It's a bit of a can of worms.
402
00:46:32,790 --> 00:46:36,780
There are many different ways to go about choosing an appropriate prior distribution.
403
00:46:36,780 --> 00:46:48,240
And so in some instances, parameters have a very interpretable meaning, and estimates from the literature, or prior
404
00:46:48,240 --> 00:46:55,710
estimates of things, can be directly ported over: the posterior of a previous analysis can become a prior.
405
00:46:55,710 --> 00:47:01,350
But that doesn't happen too often in reality.
406
00:47:01,350 --> 00:47:07,920
So in reality, I think the way that I approach it, and this is perhaps personal,
407
00:47:07,920 --> 00:47:16,080
the way I'm now thinking about selecting priors, is that I tend to do prior predictive simulations,
408
00:47:16,080 --> 00:47:26,790
so I choose a selection of priors and then use sampling: you first sample parameter values from your priors,
409
00:47:26,790 --> 00:47:35,610
and then you pass those through the sampling distribution, and then you get out a distribution of your data, and that distribution of your data
410
00:47:35,610 --> 00:47:44,790
should hopefully look kind of similar to what you would expect plausible values of your data to look like before you do an experiment.
411
00:47:44,790 --> 00:47:51,960
So that's what I typically tend to do now: I will do
412
00:47:51,960 --> 00:47:58,830
these prior predictive simulations until I have a distribution of potential data sets,
413
00:47:58,830 --> 00:48:07,680
which is somewhat wider than, but encompasses, the expected range that I would expect to see when I actually go ahead and collect data.
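A prior predictive simulation of the kind described can be sketched as follows. The linear model, the priors and all the numbers below are made up purely for illustration; the point is the workflow of sampling parameters from the priors, pushing them through the sampling distribution, and inspecting the spread of simulated data sets.

```python
import random

def prior_predictive(n_sims, rng):
    # Sample parameters from (assumed) priors, then simulate data sets.
    datasets = []
    for _ in range(n_sims):
        slope = rng.gauss(0.0, 2.0)              # prior on slope (assumed)
        sigma = abs(rng.gauss(0.0, 1.0)) + 0.1   # prior on noise scale (assumed)
        xs = [0.0, 1.0, 2.0, 3.0]
        datasets.append([slope * x + rng.gauss(0.0, sigma) for x in xs])
    return datasets

rng = random.Random(42)
sims = prior_predictive(1000, rng)
finals = sorted(d[-1] for d in sims)
lo, hi = finals[25], finals[-26]  # central ~95% band of y at x = 3
print(lo, hi)  # do simulated data of this spread look plausible a priori?
```

If the band is wildly wider, or narrower, than what you'd consider a plausible experiment, you revise the priors and repeat; a plot of the simulated data sets makes a persuasive figure in a paper.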
414
00:48:07,680 --> 00:48:13,710
And so then it becomes actually not too difficult to argue for your priors in the paper, I find,
415
00:48:13,710 --> 00:48:21,760
because then you just include these visualisations of the prior predictive distribution, or the contours of it.
416
00:48:21,760 --> 00:48:28,720
And it tends to be quite a persuasive way of arguing that you've made sensible choices about your priors.
417
00:48:28,720 --> 00:48:33,270
So, yeah, that's what I encourage people to do now.
418
00:48:33,270 --> 00:48:38,220
I mean, because often you just don't have that much information about individual parameters.
419
00:48:38,220 --> 00:48:46,080
And if you do, that's kind of a luxury, and then you use that, obviously. So, yeah, that's the way I go about doing it.
420
00:48:46,080 --> 00:48:53,750
I mean, with regard to handling referee comments, priors haven't actually come up that much for me, I don't know why,
421
00:48:53,750 --> 00:49:02,960
but the one thing you obviously can do is a kind of sensitivity analysis at that point, when someone raises the issue of
422
00:49:02,960 --> 00:49:09,350
whether your inference is sensitive to your choice of prior. And if that is the case,
423
00:49:09,350 --> 00:49:14,760
then you have a kind of obligation to report that; you should be doing that anyway.
424
00:49:14,760 --> 00:49:19,820
Does that answer your question? Yeah, no, that's really interesting.
425
00:49:19,820 --> 00:49:26,120
Yes, especially that first part. Thank you for that, and thanks again for the talk.
426
00:49:26,120 --> 00:49:33,150
That's a really interesting question, a good question. Are there any other questions?
427
00:49:33,150 --> 00:49:41,850
Could I jump in here? Yes. Thanks so much for the talk; it was extremely clear, and you presented things excellently.
428
00:49:41,850 --> 00:49:50,670
I have a kind of question about this. You showed how, as we have more data, the priors are essentially outweighed by the likelihood and so on.
429
00:49:50,670 --> 00:49:58,290
Do you have a rough sense of when it's worth doing a Bayesian analysis versus not, given the amount of data,
430
00:49:58,290 --> 00:50:02,370
or is the idea to just kind of like set everything up in a Bayesian approach,
431
00:50:02,370 --> 00:50:06,560
regardless of the volume of the data and let the results speak for themselves?
432
00:50:06,560 --> 00:50:15,530
So as I understand your question, it's about when you get much merit from using a Bayesian analysis versus using a frequentist one.
433
00:50:15,530 --> 00:50:20,080
Is that right? Yes, yeah, exactly. What's the kind of threshold of data, and how do you judge it?
434
00:50:20,080 --> 00:50:27,550
So it's a good question. I mean, I'm not one of these people that tends to bash classical inference, particularly.
435
00:50:27,550 --> 00:50:36,540
I think that they both have their places. And what I can say is this.
436
00:50:36,540 --> 00:50:45,990
There are situations where Bayesian inference allows me to do inference when you wouldn't be able to do so in the classical sense.
437
00:50:45,990 --> 00:50:52,130
I mean situations where your model is relatively poorly identified.
438
00:50:52,130 --> 00:50:55,130
So a good example of this is COVID-19:
439
00:50:55,130 --> 00:51:02,260
you have these sort of transmission dynamics models of how the disease spreads, and those models have got lots of different parameters:
440
00:51:02,260 --> 00:51:08,600
the rate at which people recover, the rates at which people become infected. And there's lots and lots of uncertainty about all these parameters.
441
00:51:08,600 --> 00:51:15,380
And the data that we collect to actually try and estimate those parameters are really, really noisy and poor.
442
00:51:15,380 --> 00:51:21,950
And so there's no way that you can actually estimate all these parameters just from the data.
443
00:51:21,950 --> 00:51:27,920
You need something else. You need biological, pre-existing knowledge, basically.
444
00:51:27,920 --> 00:51:36,620
And so in that situation, you're a bit stuck in frequentist inference; you can fix your parameters at values that you think are biologically plausible.
445
00:51:36,620 --> 00:51:41,330
But that's not quite satisfactory because often we don't know those things very well.
446
00:51:41,330 --> 00:51:50,310
And so in Bayesian inference, we can use priors, and so we can incorporate our uncertainty but still make progress on the inference.
447
00:51:50,310 --> 00:51:54,580
We can still try and estimate the things we're actually interested in.
448
00:51:54,580 --> 00:51:59,440
And so, yeah, that's kind of the basic message:
449
00:51:59,440 --> 00:52:04,340
one of the benefits is that it allows you to make progress on problems that are basically just
450
00:52:04,340 --> 00:52:08,260
unidentifiable, where you wouldn't be able to make progress using frequentist inference.
451
00:52:08,260 --> 00:52:14,740
And that tends to be the case when the models that you're trying to do inference on get more complicated, or the data get fewer.
452
00:52:14,740 --> 00:52:17,320
So it's one of those two kind of circumstances.
453
00:52:17,320 --> 00:52:24,040
There are also other benefits to these sorts of methods, which is that because everything's done in a simulation-based way,
454
00:52:24,040 --> 00:52:30,250
because you have to do approximate inference, you typically get things like uncertainty in predictions for free.
455
00:52:30,250 --> 00:52:37,120
And that's kind of the nice thing about Bayesian inference. But does that answer your question?
456
00:52:37,120 --> 00:52:43,000
Yes, that was great. Thank you. Thanks for the question.
457
00:52:43,000 --> 00:52:52,610
Are there any other questions? I had a question, if I can come in.
458
00:52:52,610 --> 00:53:04,400
Yeah, yeah, I was curious about the likelihood-free... Sorry, I'm finding it quite hard to hear you.
459
00:53:04,400 --> 00:53:11,930
Can you hear me better now? Yeah, right. So, um, yeah, I was curious about the likelihood-free methods.
460
00:53:11,930 --> 00:53:17,270
You were saying you're planning to go there. So which kind of methods are you planning to include?
461
00:53:17,270 --> 00:53:27,170
Are you planning to have MCMC only for ODEs, or also for, like, stochastic differential models as well?
462
00:53:27,170 --> 00:53:34,490
Yeah. So I guess one class of models that isn't covered
463
00:53:34,490 --> 00:53:39,980
by PINTS at the moment is models where the process that underlies them is stochastic.
464
00:53:39,980 --> 00:53:45,650
So I presented a deterministic model, but you can imagine the closest analogue of that model,
465
00:53:45,650 --> 00:53:55,190
which is stochastic, and PINTS isn't able to handle that, because when you get into the realm of stochastic processes,
466
00:53:55,190 --> 00:53:59,750
then often it's really, really difficult to write down the probability of having generated the data.
467
00:53:59,750 --> 00:54:04,340
Intuitively, it's because there are just too many ways to generate the data.
468
00:54:04,340 --> 00:54:09,790
And so in that setting, yeah, it's very difficult to write down a likelihood.
469
00:54:09,790 --> 00:54:13,330
In those situations, there are a few different things you can do.
470
00:54:13,330 --> 00:54:22,690
One of them, which I think is probably the next logical step, is approximate Bayesian computation approaches.
471
00:54:22,690 --> 00:54:29,140
And in those, what they rely on is essentially your ability to simulate from the model.
472
00:54:29,140 --> 00:54:40,910
So as long as you can simulate from your process and check how close your simulated data are to your actual data at different parameter values, then.
473
00:54:40,910 --> 00:54:49,010
Under some conditions, you can still make progress towards being able to do inference.
474
00:54:49,010 --> 00:54:57,360
It becomes approximate because it's only exact in the limit that your simulated data and the actual data match
475
00:54:57,360 --> 00:54:59,990
one another exactly, which typically doesn't happen in reality.
476
00:54:59,990 --> 00:55:08,580
So you use these so-called approximate Bayesian computation methods, which just rely on your ability to forward simulate the model.
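A minimal rejection-ABC sketch looks like this. An invented coin-flip model stands in for a stochastic simulator here; nothing below is from PINTS, and the prior, tolerance and observation are made up so the result can be checked against intuition.

```python
import random

def simulate(theta, n, rng):
    # Forward-simulate the model: n flips of a coin with bias theta.
    return sum(rng.random() < theta for _ in range(n))

def abc_rejection(observed, n, n_draws, tol, rng):
    # No likelihood evaluations: keep parameter draws whose simulated
    # data land within `tol` of the observed data.
    accepted = []
    for _ in range(n_draws):
        theta = rng.random()  # draw from a uniform prior on [0, 1]
        if abs(simulate(theta, n, rng) - observed) <= tol:
            accepted.append(theta)
    return accepted

rng = random.Random(7)
post = abc_rejection(observed=70, n=100, n_draws=20_000, tol=3, rng=rng)
print(len(post), sum(post) / len(post))  # posterior mean should sit near 0.7
```

The accepted draws form an approximate posterior sample; shrinking the tolerance makes the approximation better but the acceptance rate worse, which is the basic trade-off in ABC.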
477
00:55:08,580 --> 00:55:11,570
And we're going to include a whole host of these different methods.
478
00:55:11,570 --> 00:55:19,100
We've already done a lot of exploration of these different methods, and our plan is to include them in the next iteration of PINTS,
479
00:55:19,100 --> 00:55:24,230
because people in the institute have asked me
480
00:55:24,230 --> 00:55:30,320
about this quite a lot of the time, because they tend to use these things in, let's say, spatial simulations themselves.
481
00:55:30,320 --> 00:55:38,230
And it also gets used a lot in population genetics, because, again, it's hard to write down the
482
00:55:38,230 --> 00:55:44,960
likelihood function. Yeah, I think that was it, yeah.
483
00:55:44,960 --> 00:55:53,300
Yeah, I was just asking because I'm also writing my own library to do ABC, so.
484
00:55:53,300 --> 00:56:00,410
OK, well, I mean, if you're interested in working on that, or chatting about that,
485
00:56:00,410 --> 00:56:05,090
then I'm happy to, if you want to send me an email or get in touch to have a chat.
486
00:56:05,090 --> 00:56:10,970
But my view about developing packages and libraries is that if they are trying to do the same thing,
487
00:56:10,970 --> 00:56:17,150
which is kind of inference, then I think it's probably better that they join forces. This is just me, I'm biased like that.
488
00:56:17,150 --> 00:56:21,560
It's probably better that people work on the one thing rather than on lots of different things. But that's just my view.
489
00:56:21,560 --> 00:56:31,520
And I think other people have also tried to develop their own ABC libraries, and some have been more successful than others.
490
00:56:31,520 --> 00:56:36,590
But yeah, if you want to have a chat about it, then send me an email.
491
00:56:36,590 --> 00:56:41,830
Right. Thank you.
492
00:56:41,830 --> 00:56:47,280
Great. OK. So, are there any more questions?
493
00:56:47,280 --> 00:56:53,880
OK, so I think we'll have another round of applause for a great talk.
494
00:56:53,880 --> 00:56:59,400
Well, thanks again, guys, and thanks for setting this up and organising.
495
00:56:59,400 --> 00:57:05,290
And yeah, if anyone's got any questions afterwards, just email me. And I think the talk is being put on the podcast.
496
00:57:05,290 --> 00:57:10,269
So you should be able to listen to it there anyway. So yeah, thanks again.