1
00:00:00,820 --> 00:00:10,570
And Jill, over to you. OK, well, hi, everyone, and thanks for the invitation and for the too generous introduction.
2
00:00:10,570 --> 00:00:18,520
So this is joint work with Florence, Hien and TrungTin.
3
00:00:18,520 --> 00:00:24,340
And what I'm going to talk about today is approximate Bayesian computation.
4
00:00:24,340 --> 00:00:29,170
That is, ABC in short, with surrogate posteriors.
5
00:00:29,170 --> 00:00:39,790
So it's sort of a new way of doing ABC that we find pretty interesting.
6
00:00:39,790 --> 00:00:52,650
So as a disclaimer, I'd like to say first that none of us is really an expert in ABC, but anyway we find the new method particularly interesting.
7
00:00:52,650 --> 00:01:03,390
So here are my collaborators: Florence, who is the head of the Statify team at Inria;
8
00:01:03,390 --> 00:01:09,030
Hien, who is at La Trobe University in Australia; and TrungTin, who is at a university in France.
9
00:01:09,030 --> 00:01:14,280
I think that all of them are here, I guess.
10
00:01:14,280 --> 00:01:20,670
So by the end of my presentation, you should be more familiar with ABC.
11
00:01:20,670 --> 00:01:24,270
I hope so. I will start by presenting the vanilla
12
00:01:24,270 --> 00:01:30,810
algorithm that you can use, which is called rejection ABC, and then move
13
00:01:30,810 --> 00:01:37,200
on to the semi-automatic approach that was proposed by Fearnhead and Prangle in 2012.
14
00:01:37,200 --> 00:01:42,990
And what we do is that we build on this semi-automatic ABC.
15
00:01:42,990 --> 00:01:52,230
So for this we have a preliminary learning step where we build what we call surrogate posteriors.
16
00:01:52,230 --> 00:01:57,570
And these surrogate posteriors are built with an inverse regression model that is called GLLiM.
17
00:01:57,570 --> 00:02:02,520
And so we call our ABC approach the GLLiM-ABC procedure.
18
00:02:02,520 --> 00:02:10,980
So I will present some theoretical properties of it, and a number of illustrations on inverse problems.
19
00:02:10,980 --> 00:02:17,870
And then I conclude. So, to quickly give you some context:
20
00:02:17,870 --> 00:02:24,140
when we are interested in doing Bayesian inference and the likelihood is
21
00:02:24,140 --> 00:02:31,340
intractable, we have to deal with it in the context of approximate Bayesian computation.
22
00:02:31,340 --> 00:02:36,470
So we have a data generating model as follows.
23
00:02:36,470 --> 00:02:48,280
So the parameter is denoted theta, we have a prior pi of theta, and given theta, the likelihood is denoted f of z given theta.
24
00:02:48,280 --> 00:03:00,710
And as I mentioned, an important condition of ABC is that we know how to sample from this likelihood.
25
00:03:00,710 --> 00:03:03,910
We also know how to sample from the prior.
26
00:03:03,910 --> 00:03:12,220
So one goal in statistics is estimation of the parameter given some observed data y.
27
00:03:12,220 --> 00:03:20,710
And in Bayesian statistics, this is done by forming a posterior distribution of theta,
28
00:03:20,710 --> 00:03:31,700
given y, that is proportional to the prior times the likelihood. And the question in ABC is what to do when the likelihood is intractable.
29
00:03:31,700 --> 00:03:42,120
So it's not possible to evaluate it, maybe because it's too costly, or just because it's not available.
30
00:03:42,120 --> 00:03:48,810
So one way to proceed, and the simplest way, is as follows.
31
00:03:48,810 --> 00:03:58,650
So you want to sample parameter values from the posterior, but you're not going to have them
32
00:03:58,650 --> 00:04:03,120
exactly: in the ABC setup, it's going to be approximate.
33
00:04:03,120 --> 00:04:08,820
And so the way to proceed is as follows. You need to sample quite a lot,
34
00:04:08,820 --> 00:04:17,490
a number of couples of parameter values and sampled data values (theta_i, z_i).
35
00:04:17,490 --> 00:04:22,050
And this is simple to do if you know how to sample from both the prior and the likelihood:
36
00:04:22,050 --> 00:04:31,740
you can first sample theta values from the prior and, conditional on these theta values, sample data values from the likelihood.
37
00:04:31,740 --> 00:04:40,500
And the key starting point of rejection ABC is that you are going to accept, to keep,
38
00:04:40,500 --> 00:04:47,900
parameter values for which you simulated data values that are close to the actual data.
39
00:04:47,900 --> 00:04:55,540
And this is done by a comparison with some metric, capital D.
40
00:04:55,540 --> 00:05:04,390
And as soon as the distance is small enough, you decide to keep the parameters that you have sampled.
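The rejection-ABC loop just described can be sketched in a few lines. This is a generic illustration, not the speaker's code: the Gaussian prior and likelihood below are a toy stand-in for any simulable model.

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_abc(y_obs, sample_prior, sample_likelihood, distance, n_sim, eps):
    """Keep the sampled parameters whose simulated data fall within eps of y_obs."""
    accepted = []
    for _ in range(n_sim):
        theta = sample_prior()            # theta_i ~ pi(theta)
        z = sample_likelihood(theta)      # z_i ~ f(. | theta_i)
        if distance(y_obs, z) <= eps:     # accept if D(y, z) <= eps
            accepted.append(theta)
    return np.array(accepted)

# Toy model: theta ~ N(0, 1), y | theta ~ N(theta, 0.1^2), observed y = 1.0
samples = rejection_abc(
    y_obs=1.0,
    sample_prior=lambda: rng.normal(0.0, 1.0),
    sample_likelihood=lambda t: rng.normal(t, 0.1),
    distance=lambda y, z: abs(y - z),     # Euclidean distance in 1-D
    n_sim=20_000,
    eps=0.05,
)
```

For this conjugate toy model the accepted samples concentrate near the exact posterior mean, which is about 0.99 here.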
41
00:05:04,390 --> 00:05:11,120
So this distance D can take several forms; in the simplest case,
42
00:05:11,120 --> 00:05:19,600
it is the Euclidean distance between the vectors of true data and simulated data.
43
00:05:19,600 --> 00:05:28,660
But, as we will see, it is often the Euclidean distance between summaries of these vectors.
44
00:05:28,660 --> 00:05:36,940
And already now we are faced with a number of questions, actually:
45
00:05:36,940 --> 00:05:42,700
what choice for the distance D, for the summaries, and also for the threshold
46
00:05:42,700 --> 00:05:50,650
epsilon. So in this talk and in our work, we don't really discuss the choice of the threshold,
47
00:05:50,650 --> 00:05:58,450
but we do have this discussion for the choices of D and of the summaries.
48
00:05:58,450 --> 00:06:09,490
So there are a number of strategies for these choices of D and s. The starting point for
49
00:06:09,490 --> 00:06:18,940
this is the realisation that you cannot really use this simple distance efficiently in high dimension,
50
00:06:18,940 --> 00:06:24,340
as you would get too much variability in your procedure.
51
00:06:24,340 --> 00:06:29,560
So it's important to reduce the dimension.
52
00:06:29,560 --> 00:06:39,630
This can be done in two ways. In the first category of procedures, the effort is made on the summary s.
53
00:06:39,630 --> 00:06:48,810
So in this family of approaches, D is a standard distance, and in this way,
54
00:06:48,810 --> 00:06:57,450
the fact that you use summaries reduces the dimension and induces a smaller variance.
55
00:06:57,450 --> 00:07:02,520
But the problem is that you lose some information.
56
00:07:02,520 --> 00:07:13,600
And the choice of the summaries is arbitrary if you don't have expert knowledge of how to do it.
57
00:07:13,600 --> 00:07:32,290
So the work that I already mentioned a couple of minutes ago, by Fearnhead and Prangle in 2012, provided the first solution to this problem.
58
00:07:32,290 --> 00:07:38,890
The semi-automatic ABC framework relies on a preliminary learning step where
59
00:07:38,890 --> 00:07:49,480
you learn the dependence between the parameter and the data in a generic way.
60
00:07:49,480 --> 00:08:02,190
But one of the limitations is that it requires a modest dimensionality for the data.
61
00:08:02,190 --> 00:08:10,500
And the second category of approaches is the one based on data discrepancies.
62
00:08:10,500 --> 00:08:17,970
So this has been an active research line in the last five years or so, where the idea
63
00:08:17,970 --> 00:08:26,250
here is to replace the distance by a distance between empirical distributions.
64
00:08:26,250 --> 00:08:35,730
So in this sort of approach, you view your data vectors as empirical distributions.
65
00:08:35,730 --> 00:08:42,330
And so, by doing this, here with some abuse of notation,
66
00:08:42,330 --> 00:08:50,580
the vectors are seen as empirical distributions, and then you can use distances between empirical distributions.
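As a generic illustration of this viewpoint (mine, not tied to the specific methods on the slide), two data vectors can be compared as empirical distributions, for instance with the 1-D Wasserstein distance from SciPy:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)

# Two data vectors, each viewed as an empirical distribution (a bag of points)
y = rng.normal(0.0, 1.0, size=500)   # "observed" data
z = rng.normal(0.5, 1.0, size=500)   # simulated data with a shifted mean

# Distance between the two empirical distributions; note it ignores
# the ordering of the entries, unlike a Euclidean distance on vectors
d = wasserstein_distance(y, z)
```

This invariance to ordering is also why such discrepancies need replicated samples per parameter to concentrate, the limitation mentioned next.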
67
00:08:50,580 --> 00:08:59,630
So a number of distances have been proposed in the literature and are listed here.
68
00:08:59,630 --> 00:09:07,390
So a clear advantage is that you do not rely anymore on summaries.
69
00:09:07,390 --> 00:09:20,240
But a problem is that, to treat moderately large samples, you need replicates of samples for the same parameter for these to converge.
70
00:09:20,240 --> 00:09:24,610
And in many of those problems, you don't have these replicates.
71
00:09:24,610 --> 00:09:30,730
So in the problems we are interested in, you only have one observation.
72
00:09:30,730 --> 00:09:41,670
That can be a long observation, but you have one observation to invert for every parameter of interest.
73
00:09:41,670 --> 00:09:53,880
OK, so one reason why these ABC methods are interesting, one of the reasons why we
74
00:09:53,880 --> 00:10:02,760
can count on them, is that they have well-behaved limits when epsilon goes to zero.
75
00:10:02,760 --> 00:10:06,570
So this is what I present here.
76
00:10:06,570 --> 00:10:17,760
So here the posterior distribution is written with this intractable likelihood in red.
77
00:10:17,760 --> 00:10:30,030
And since it's intractable, ABC replaces it by this blue quantity, which is an integral of the
78
00:10:30,030 --> 00:10:34,780
likelihood against the indicator function here.
79
00:10:34,780 --> 00:10:49,710
OK, so using this approximate likelihood induces an approximate posterior, here in red, that is proportional to the prior times the approximate likelihood.
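In symbols, the construction being described is presumably the standard ABC one (the notation below is mine, not copied from the slides):

```latex
% Intractable posterior:
\pi(\theta \mid y) \;\propto\; \pi(\theta)\, f(y \mid \theta)

% ABC quasi-posterior: integrate the likelihood against an indicator
% that the simulated data z lies within epsilon of the observed y
\pi_\epsilon(\theta \mid y) \;\propto\; \pi(\theta) \int
  \mathbf{1}\{\mathcal{D}(y, z) \le \epsilon\}\, f(z \mid \theta)\, \mathrm{d}z
```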
80
00:10:49,710 --> 00:10:58,980
And the reason why this quasi-posterior converges to the true posterior is fairly simple to see.
81
00:10:58,980 --> 00:11:07,440
It relies on the fact that when epsilon goes to zero, the distance between the data vectors also goes to zero.
82
00:11:07,440 --> 00:11:19,830
So the set of accepted vectors converges to the singleton made of only the true data y.
83
00:11:19,830 --> 00:11:23,460
And so the approximate posterior converges to the true posterior.
84
00:11:23,460 --> 00:11:30,570
So the details of this proof can be found in the references below.
85
00:11:30,570 --> 00:11:39,390
And one of the starting points of this work is actually a realisation by Florence, if I may mention her,
86
00:11:39,390 --> 00:11:49,620
that this condition, that the set of accepted z's converges to the singleton y, is somehow too strong an assumption.
87
00:11:49,620 --> 00:11:59,730
You can rely on something not as strong for the convergence to still hold.
88
00:11:59,730 --> 00:12:04,110
And let me explain in which sense this
89
00:12:04,110 --> 00:12:08,820
is true. So we can write
90
00:12:08,820 --> 00:12:15,450
the Bayes formula for the quasi-posterior in a slightly different way.
91
00:12:15,450 --> 00:12:32,760
So we replace here the joint of theta and z by the same joint, here written in blue, but using the chain rule the other way round.
92
00:12:32,760 --> 00:12:40,470
Right. So it uses the posterior and the evidence of z.
93
00:12:40,470 --> 00:12:43,920
And so this is a sort of first realisation.
94
00:12:43,920 --> 00:12:59,100
And then the second, sort of bold, endeavour is to replace the distance between the vectors y and z by a distance between posterior distributions.
95
00:12:59,100 --> 00:13:08,280
So here there is an overload of notation: the D is not the same, but we use the same notation.
96
00:13:08,280 --> 00:13:19,780
So in this integral, we want to replace the distance between vectors by a distance between distributions.
97
00:13:19,780 --> 00:13:31,660
And with this we are forming a new quasi-posterior, written in blue, that is the same as before,
98
00:13:31,660 --> 00:13:41,390
but where the indicator function is evaluated at the distance above.
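Written out, the new quasi-posterior presumably replaces the data distance inside the indicator by a discrepancy between posteriors (again in my own notation):

```latex
% Quasi-posterior with a discrepancy between posterior distributions:
\tilde{\pi}_\epsilon(\theta \mid y) \;\propto\; \pi(\theta) \int
  \mathbf{1}\big\{\mathcal{D}\big(\pi(\cdot \mid y),\, \pi(\cdot \mid z)\big) \le \epsilon\big\}
  \, f(z \mid \theta)\, \mathrm{d}z
```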
99
00:13:41,390 --> 00:13:49,460
OK, and the first theorem that we prove in the paper is the fact that this quasi-posterior
100
00:13:49,460 --> 00:13:57,370
converges to the true posterior in total variation when epsilon goes to zero.
101
00:13:57,370 --> 00:14:09,190
So actually, the proof is very similar to the intuition of the original proof for the ABC posterior when epsilon goes to zero.
102
00:14:09,190 --> 00:14:18,430
And you get that the discrepancy between the posteriors also goes to zero.
103
00:14:18,430 --> 00:14:26,720
So that means that the posterior at z converges to the posterior evaluated at the true data.
104
00:14:26,720 --> 00:14:34,890
And in terms of quasi-posteriors, that means the quasi-posterior converges to the posterior.
105
00:14:34,890 --> 00:14:44,220
And so what we see in blue here is that the convergence, now in terms of sets of indicator functions,
106
00:14:44,220 --> 00:14:53,430
is a convergence to a set that is potentially slightly larger than the singleton; it is the set in blue here.
107
00:14:53,430 --> 00:14:59,140
It contains y, but not necessarily only that one point.
108
00:14:59,140 --> 00:15:12,520
So if you have followed me well, you may ask yourself why it is legitimate at all to use this unknown quantity.
109
00:15:12,520 --> 00:15:19,840
Of course, in practice, we need to use practical approximations to these posteriors.
110
00:15:19,840 --> 00:15:35,950
And this is what we call the surrogate posteriors. So I'm moving now to give a few words on the approach proposed by Fearnhead and Prangle,
111
00:15:35,950 --> 00:15:46,420
so what they do in semi-automatic ABC is to replace the choice of summaries by one summary,
112
00:15:46,420 --> 00:15:57,250
the posterior expectation of the parameter. Of course, this posterior mean is a quantity that isn't available, by the very definition of your problem.
113
00:15:57,250 --> 00:16:07,150
This is one of the things that you are looking for. But they suggest using a preliminary linear regression to learn this
114
00:16:07,150 --> 00:16:19,070
mapping between theta and z, and this is done by first simulating a large
115
00:16:19,070 --> 00:16:26,310
number of couples of parameters and data, simply sampled from the joint distribution,
116
00:16:26,310 --> 00:16:33,020
so it's the same procedure that we do in ABC, but done as a preliminary step.
117
00:16:33,020 --> 00:16:37,490
Right. So in the end, you're going to do it twice.
118
00:16:37,490 --> 00:16:47,270
And so we have a number of contributions in the paper, which we call variants in this presentation.
119
00:16:47,270 --> 00:16:53,900
I realise that I should perhaps not use the word variant, which is somewhat contested these days.
120
00:16:53,900 --> 00:16:58,860
So the first one was already suggested,
121
00:16:58,860 --> 00:17:04,860
and has been implemented in several papers since,
122
00:17:04,860 --> 00:17:16,140
and it was already suggested in the original paper: it's about using something other than a linear regression.
123
00:17:16,140 --> 00:17:26,340
Neural networks, for instance; actually, we can also use our own inverse regression to implement our variant number one.
124
00:17:26,340 --> 00:17:34,950
And number two is the realisation that not only the means could be used, but also some higher-order moments like variances.
125
00:17:34,950 --> 00:17:39,270
This was already suggested in the literature, but not implemented.
126
00:17:39,270 --> 00:17:49,730
And we guess that the reason why it was not implemented is that it requires your procedure to be able to provide those moments at low cost.
127
00:17:49,730 --> 00:18:01,250
And the main contribution, variant number three, is to replace summaries by a good approximation of the posterior.
128
00:18:01,250 --> 00:18:09,020
So this requires two things. It requires a learning procedure that is able to provide these approximate posteriors.
129
00:18:09,020 --> 00:18:17,240
And for this, we use the GLLiM model, the Gaussian Locally Linear Mapping of Deleforge et al., 2015.
130
00:18:17,240 --> 00:18:24,710
And then, once we have posterior approximations, we need a way to compare them.
131
00:18:24,710 --> 00:18:31,640
And this requires a metric between distributions.
132
00:18:31,640 --> 00:18:41,110
So maybe I should stop for a couple of seconds to ask whether there are any questions on what we've seen so far before I move on.
133
00:18:41,110 --> 00:18:57,490
So, on to the proposed framework; I should move on. So the surrogate posteriors that we propose are built as mixtures of
134
00:18:57,490 --> 00:19:04,630
Gaussians, and this is one of the babies of earlier work by Florence.
135
00:19:04,630 --> 00:19:18,370
And the idea of GLLiM is to capture the relationship between data and parameters as a mapping that we learn beforehand.
136
00:19:18,370 --> 00:19:20,950
As to the way the mapping works,
137
00:19:20,950 --> 00:19:35,960
it is as a mixture of Gaussian distributions that is parametrised by two sets of parameters.
138
00:19:35,960 --> 00:19:44,140
I'd say the first set of parameters governs the weights of the mixture, here below,
139
00:19:44,140 --> 00:19:49,190
while the second set of parameters governs how we
140
00:19:49,190 --> 00:20:02,650
parametrise the Gaussians, with an affine relationship for the means and an affine dependency in y.
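Concretely, a GLLiM-style surrogate posterior has roughly this shape (a sketch of the functional form; the exact parametrisation in the paper may differ):

```latex
% Surrogate posterior: a Gaussian mixture whose weights and means
% depend on y, with affine means A_k y + b_k
p_\phi(\theta \mid y) \;=\; \sum_{k=1}^{K} \eta_k(y)\,
  \mathcal{N}\!\big(\theta;\; A_k y + b_k,\; \Sigma_k\big),
\qquad
\eta_k(y) \;\propto\; \pi_k\, \mathcal{N}(y;\; c_k, \Gamma_k)
```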
141
00:20:02,650 --> 00:20:12,550
So to fit these GLLiM models, we need a preliminary learning set, just as the semi-automatic approach does.
142
00:20:12,550 --> 00:20:20,590
So for this, we need to sample from the true joint and get N couples.
143
00:20:20,590 --> 00:20:29,130
And then the GLLiM relationship is learnt by using an EM algorithm.
144
00:20:29,130 --> 00:20:39,310
OK, so we estimate phi star, with K the number of mixture components, on this data set.
145
00:20:39,310 --> 00:20:51,140
We estimate phi star, and then all the procedure that follows can be done with this single value of phi star.
146
00:20:51,140 --> 00:20:59,180
And so in our case, the three variants that I have presented before take the following form.
147
00:20:59,180 --> 00:21:08,280
Relying on GLLiM, variant number one uses the posterior mean
148
00:21:08,280 --> 00:21:17,880
as a single summary statistic, and the means are in closed
149
00:21:17,880 --> 00:21:23,910
form, since everything is in closed form with these Gaussian mixtures.
150
00:21:23,910 --> 00:21:32,310
So this is the mean that we use. Variant number two is the suggestion that we can add some higher-order moments.
151
00:21:32,310 --> 00:21:39,410
It happens that the variances are also in closed form with GLLiM, and they take this form.
152
00:21:39,410 --> 00:21:44,300
And then variant number three is the idea that we can use surrogate posteriors,
153
00:21:44,300 --> 00:21:51,220
so the full surrogate posteriors, in the case of GLLiM, are mixtures of Gaussians.
154
00:21:51,220 --> 00:21:57,730
So if we want to use these as they are, to be compared together,
155
00:21:57,730 --> 00:22:07,660
we need a metric for Gaussian mixtures, and that's precisely the work that was done by Delon and Desolneux in their paper,
156
00:22:07,660 --> 00:22:13,330
where they propose a Wasserstein-based distance for mixtures of Gaussians.
157
00:22:13,330 --> 00:22:19,040
This distance is referred to as MW2.
158
00:22:19,040 --> 00:22:28,720
But other distances can be used, and we also implement an L2 distance between mixtures.
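For 1-D Gaussian mixtures, the L2 distance has a closed form, because the inner product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means. A minimal sketch (my own illustration, not the paper's implementation, and 1-D only):

```python
import numpy as np
from scipy.stats import norm

def gauss_inner(m1, s1, m2, s2):
    """Integral of N(x; m1, s1^2) * N(x; m2, s2^2) dx = N(m1 - m2; 0, s1^2 + s2^2)."""
    return norm.pdf(m1 - m2, loc=0.0, scale=np.sqrt(s1**2 + s2**2))

def l2_mixtures(w1, mu1, sd1, w2, mu2, sd2):
    """Closed-form L2 distance between two 1-D Gaussian mixtures."""
    def cross(wa, ma, sa, wb, mb, sb):
        return sum(a * b * gauss_inner(x, u, y, v)
                   for a, x, u in zip(wa, ma, sa)
                   for b, y, v in zip(wb, mb, sb))
    sq = (cross(w1, mu1, sd1, w1, mu1, sd1)
          - 2.0 * cross(w1, mu1, sd1, w2, mu2, sd2)
          + cross(w2, mu2, sd2, w2, mu2, sd2))
    return np.sqrt(max(sq, 0.0))   # guard against tiny negative rounding

# Example: a two-component mixture versus a single standard Gaussian
w, mu, sd = [0.5, 0.5], [-1.0, 2.0], [0.5, 1.0]
d = l2_mixtures(w, mu, sd, [1.0], [0.0], [1.0])
```

Unlike MW2, this L2 distance does not require any transport computation, which is one reason it is cheap to use inside an ABC loop.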
159
00:22:28,720 --> 00:22:33,590
OK, so this is a recap of the proposed algorithms,
160
00:22:33,590 --> 00:22:40,420
so remember that we have the first preliminary learning step, where we need to sample
161
00:22:40,420 --> 00:22:48,770
a large sample of size N, and we learn GLLiM on this data set.
162
00:22:48,770 --> 00:23:00,550
So we learn this functional relationship between the two by getting a parameter estimate phi star.
163
00:23:00,550 --> 00:23:09,370
This gives an approximation of the true posterior. And then there is the second step, which is computing distances.
164
00:23:09,370 --> 00:23:15,380
So we need another simulated data set, of size capital M,
165
00:23:15,380 --> 00:23:23,900
for a single observed y, and we can take two different approaches: the vector approach,
166
00:23:23,900 --> 00:23:33,630
that is, either variant one or two, with the expectations alone or with expectations and variances, or log-variances;
167
00:23:33,630 --> 00:23:41,010
and the functional summary variant, which consists in comparing directly the surrogate posteriors,
168
00:23:41,010 --> 00:23:52,800
either by the MW2 or by the L2 distance. Then the sample selection is the usual thing: you only retain the best,
169
00:23:52,800 --> 00:24:05,430
the smallest distances, by choosing epsilon, as usual, as a quantile of the empirical distribution of the distances.
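The selection step just summarised (compute all distances, set epsilon to a small quantile of them, keep the parameters below it) can be sketched generically; the distance values here are toy placeholders for whichever of the distances above is used:

```python
import numpy as np

def abc_select(thetas, distances, quantile=0.001):
    """Retain the parameters whose distance falls below the given quantile."""
    eps = np.quantile(distances, quantile)   # epsilon = small quantile of the distances
    keep = distances <= eps
    return thetas[keep], eps

# Toy illustration: 10^5 simulations, keep roughly the 0.1% closest
rng = np.random.default_rng(2)
thetas = rng.normal(size=100_000)
distances = np.abs(thetas - 1.0)             # stand-in for D(surrogate(y), surrogate(z_i))
selected, eps = abc_select(thetas, distances, quantile=0.001)
```

With a 0.1% quantile, about 100 of the 100,000 simulated parameters survive, all of them close to the value that minimises the distance.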
170
00:24:05,430 --> 00:24:15,310
OK, so I'm moving now to some asymptotic properties of these procedures. I can take questions if there are any.
171
00:24:15,310 --> 00:24:23,170
Otherwise, I'm happy to move on. Looks like there's a question from Jeff.
172
00:24:23,170 --> 00:24:26,670
Jeff, if you'd like to just unmute yourself and go ahead.
173
00:24:26,670 --> 00:24:39,030
Yeah, certainly. I just want to check my understanding: the y there is quite high-dimensional.
174
00:24:39,030 --> 00:24:44,460
It's not just one datum, it includes like the whole data set.
175
00:24:44,460 --> 00:24:46,240
Is that right? Or is it just the data?
176
00:24:46,240 --> 00:24:57,780
So the object y is a D-dimensional vector, and D is not that high in our applications.
177
00:24:57,780 --> 00:25:10,530
And I'm going to comment on it later on. But usually it can be up to one hundred or one thousand dimensional or so.
178
00:25:10,530 --> 00:25:22,080
So the full data set, do you have multiple copies of the z? Actually, not really: in the inverse
179
00:25:22,080 --> 00:25:23,490
problems that we are interested in,
180
00:25:23,490 --> 00:25:36,540
it is specific that you have one y, and you want the associated posterior for this data, for this single observation.
181
00:25:36,540 --> 00:25:49,500
Great, thanks very much. Thank you. OK, so I've already mentioned the first theoretical result that we have,
182
00:25:49,500 --> 00:25:55,150
which is that the quasi-posterior converges to the true posterior when epsilon goes to zero.
183
00:25:55,150 --> 00:26:07,230
And this is not really an applicable result, because it relies on the fact that we use the exact posterior, which we actually don't have.
184
00:26:07,230 --> 00:26:16,890
So this one is more of a practical theoretical result, because we plug the actual surrogate posteriors
185
00:26:16,890 --> 00:26:22,510
into the quasi-posterior we are working with.
186
00:26:22,510 --> 00:26:32,850
I have to acknowledge that this result only holds on a restricted class of target and surrogate distributions.
187
00:26:32,850 --> 00:26:40,020
So we actually need compactness to be able to prove the result: compactness of both the joint space of theta
188
00:26:40,020 --> 00:26:57,030
and z, as well as compactness of the parameter set that contains the affine parameters of the family of mixture components.
189
00:26:57,030 --> 00:27:08,940
So under these assumptions, we build K-component mixtures from the family capital H.
190
00:27:08,940 --> 00:27:16,200
We use this learning set D_N and compute, just as before, the
191
00:27:16,200 --> 00:27:22,860
estimated parameter as the quantity that maximises the likelihood in
192
00:27:22,860 --> 00:27:31,860
phi, and the surrogates are built as the mixtures evaluated at this phi star.
193
00:27:31,860 --> 00:27:42,810
And what we can prove, within this framework and under some additional standard assumptions that I detail here,
194
00:27:42,810 --> 00:27:52,620
is that the Hellinger distance between our approximate posterior and the exact posterior converges to zero,
195
00:27:52,620 --> 00:28:06,660
in some measure lambda with respect to the true data y, and in probability with respect to this sample D_N.
196
00:28:06,660 --> 00:28:27,540
So an important caveat of our result is that GLLiM does not actually satisfy these compactness assumptions. We hope that some
197
00:28:27,540 --> 00:28:38,970
version with mixtures of truncated Gaussian distributions could actually meet these restrictions.
198
00:28:38,970 --> 00:28:49,990
So what I want to say is that this theoretical result does not directly apply to GLLiM.
199
00:28:49,990 --> 00:28:58,370
So I'm moving on to a couple of... yes, there is a question.
200
00:28:58,370 --> 00:29:06,160
So, is it important to work with the Hellinger distance?
201
00:29:06,160 --> 00:29:18,630
I guess it's important to work with distances that you know how to deal with; the Hellinger distance is sort of a strong distance.
202
00:29:18,630 --> 00:29:25,090
So maybe you only need a weaker distance. Yes, I see.
203
00:29:25,090 --> 00:29:32,830
Yeah, it's a good point, which might avoid the compactness assumption.
204
00:29:32,830 --> 00:29:45,160
For sure, we were not able to avoid this assumption; we did not see how to avoid it, even with other choices of distances.
205
00:29:45,160 --> 00:29:49,360
But it's probably a direction to try to investigate more.
206
00:29:49,360 --> 00:29:56,470
Yes, it's a good point. OK.
207
00:29:56,470 --> 00:30:10,480
Thank you. So I'm moving to illustrations, where we have two examples with multimodal posteriors.
208
00:30:10,480 --> 00:30:17,290
So one point to keep in mind is that
209
00:30:17,290 --> 00:30:26,080
our approach is deemed to work best in the case of multimodal posteriors,
210
00:30:26,080 --> 00:30:32,110
so this is the reason why we focus essentially on these examples.
211
00:30:32,110 --> 00:30:40,210
So in both examples, we have a 10-dimensional observation, and it's a single observation.
212
00:30:40,210 --> 00:30:50,800
Maybe a follow-up to the earlier question: in the case where the actual observation is a very,
213
00:30:50,800 --> 00:30:58,690
very long observation, maybe it can just be summarised down to a smaller-dimensional observation.
214
00:30:58,690 --> 00:31:04,750
And the first example is a synthetic sound source localisation problem with two parameters.
215
00:31:04,750 --> 00:31:08,890
And the next one is a real inverse problem.
216
00:31:08,890 --> 00:31:19,780
So we compare four types of ABC methods: the one that is based on GLLiM with only the expectation;
217
00:31:19,780 --> 00:31:25,210
GLLiM with the expectation and the variance; GLLiM with functional summaries,
218
00:31:25,210 --> 00:31:29,780
comparing either by L2 or by MW2;
219
00:31:29,780 --> 00:31:35,330
and then the Fearnhead and Prangle semi-automatic ABC.
220
00:31:35,330 --> 00:31:41,180
So for each of them we rely on R packages, and the same for GLLiM:
221
00:31:41,180 --> 00:31:48,350
we rely on the xLLiM package that was proposed by Florence and co-authors.
222
00:31:48,350 --> 00:31:56,530
Sorry, a question: so you're not comparing to GLLiM on its own? Because in a sense your posterior...
223
00:31:56,530 --> 00:32:00,410
Yeah, so we do compare to that as well.
224
00:32:00,410 --> 00:32:07,180
That's a pretty good point. I should have listed it because it's a totally legitimate question.
225
00:32:07,180 --> 00:32:12,470
You start with an approximation of the posterior.
226
00:32:12,470 --> 00:32:23,720
So maybe you could stop there, but we can see that we refine this approximation with the ABC step,
227
00:32:23,720 --> 00:32:34,820
at least in our examples. So this is the setting for them, and then we do a plain rejection ABC;
228
00:32:34,820 --> 00:32:40,910
we suspect that we could also do some other sorts of ABC algorithms.
229
00:32:40,910 --> 00:32:44,510
And so the numbers are as follows.
230
00:32:44,510 --> 00:32:51,470
N is ten to the five, and the number of ABC iterations is ten to the five as well, or ten to the six.
231
00:32:51,470 --> 00:32:58,510
And the epsilon is a 0.1 percent quantile.
232
00:32:58,510 --> 00:33:14,030
So the first application arises from the case of sound source localisation: you want to infer the localisation of a sound source,
233
00:33:14,030 --> 00:33:26,160
so the parameter you want is the position x, y, based on a number of sound measurements.
234
00:33:26,160 --> 00:33:32,630
One way to get them is through some devices.
235
00:33:32,630 --> 00:33:40,460
And so I guess that this is a bit how the ear, the ears, work:
236
00:33:40,460 --> 00:33:52,310
you have a pair of microphones, and from this pair of microphones, located at positions one and two, you're able to compute this function,
237
00:33:52,310 --> 00:33:57,710
which depends on the parameter theta.
238
00:33:57,710 --> 00:34:01,640
The problem is, well, maybe it's not a problem, but that's the way it is:
239
00:34:01,640 --> 00:34:08,360
you have a hyperbola of solutions; so actually, two hyperbolas of solutions.
240
00:34:08,360 --> 00:34:13,250
So how do we sample? It's a simulated example.
241
00:34:13,250 --> 00:34:17,960
So we sampled observations in the following way.
242
00:34:17,960 --> 00:34:30,170
We assume that we observe y's that are Student-t noised versions of the function.
243
00:34:30,170 --> 00:34:41,510
So this is the function plus some Student noise with a quite small variance and nu equal to one degree of freedom.
244
00:34:41,510 --> 00:34:46,220
So this is a really, really bad noise, with
245
00:34:46,220 --> 00:34:53,710
no expectation. And so the dimensionality is 10.
246
00:34:53,710 --> 00:35:03,170
So actually, this is not exactly the illustration that I show here;
247
00:35:03,170 --> 00:35:10,060
it's a slightly different one, where we still have one true position to discover.
248
00:35:10,060 --> 00:35:16,780
But instead of using one pair of microphones, we use two pairs, and the microphones are located
249
00:35:16,780 --> 00:35:24,760
one pair on the x axis and the other pair on the y axis.
250
00:35:24,760 --> 00:35:33,810
And so the likelihood in such a model is an equal mixture of the two single-pair components.
251
00:35:33,810 --> 00:35:46,580
And this is the shape of the true posterior; we can easily find the shape by working with the likelihood function.
252
00:35:46,580 --> 00:36:00,380
And these exhibit four symmetric hyperbolas. Actually, we can also use a Metropolis-Hastings algorithm to sample from the posterior,
253
00:36:00,380 --> 00:36:14,300
but we see here that it's not doing really great; maybe this is because we didn't tune it well enough. So let's look at the results now.
254
00:36:14,300 --> 00:36:22,430
So we have these. Well, let's start with the mixture in red at the bottom left.
255
00:36:22,430 --> 00:36:30,290
This is to reply to an earlier question: this is what we get with only the preliminary learning step.
256
00:36:30,290 --> 00:36:39,920
And we see that we can probably make out the Gaussian components here, and we have a number of them,
257
00:36:39,920 --> 00:36:46,700
maybe something like eight or maybe a bit more, but it's not a perfect representation of the posterior.
258
00:36:46,700 --> 00:36:56,540
And then I want to move on to two of the variants: number one,
259
00:36:56,540 --> 00:37:05,780
that is, GLLiM-E-ABC, and the semi-automatic ABC; they are not doing OK at all.
260
00:37:05,780 --> 00:37:10,640
And then we have the last three, which are doing quite OK.
261
00:37:10,640 --> 00:37:21,860
For the three of them, the expectation-variance one is maybe doing something a bit intermediate, due to the two values that we see in the middle.
262
00:37:21,860 --> 00:37:32,860
It gives a more spread-out posterior, and this is very interesting. And then the functional ones are doing really,
263
00:37:32,860 --> 00:37:41,320
really well, I think, and quite similarly in both cases. Can I ask another question?
264
00:37:41,320 --> 00:37:48,130
There is something I don't understand. So you construct a first table to learn your GLLiM.
265
00:37:48,130 --> 00:37:52,750
Yes. And then you use your second table.
266
00:37:52,750 --> 00:38:02,830
To do the ABC, yes. If you had merged the two tables together and you had done just one GLLiM,
267
00:38:02,830 --> 00:38:07,360
would you get something better than the left-hand side at the bottom?
268
00:38:07,360 --> 00:38:11,080
Would it do as good a job? What do you think?
269
00:38:11,080 --> 00:38:17,260
Because you would have a lot more data than the other way, you're thinking, then.
270
00:38:17,260 --> 00:38:28,820
Yes. So this reminds me of discussions that we had with our co-authors; maybe I can get some help from the co-authors in the chat.
271
00:38:28,820 --> 00:38:37,750
So I don't know. Actually, one of my co-authors was replying to your first question, so maybe you can have a look at the chat for that.
272
00:38:37,750 --> 00:38:45,760
And yes, Florence says that you can also use a single table instead.
273
00:38:45,760 --> 00:38:53,920
So I guess she did try the mixture by merging the two.
274
00:38:53,920 --> 00:39:00,730
And I agree that you're going to get something better than what we have represented here,
275
00:39:00,730 --> 00:39:12,290
but you're not going to get something as precise as the functional versions.
276
00:39:12,290 --> 00:39:18,400
So in a sense, if you want to save some computation time,
277
00:39:18,400 --> 00:39:27,480
I guess that you could also learn both models on the same learning set.
278
00:39:27,480 --> 00:39:41,400
But maybe that wouldn't be very Bayesian. OK, so I have a second and last illustration, which comes from planetary science:
279
00:39:41,400 --> 00:39:49,740
it is an inverse problem, and the aim is to recover parameters of the surface of a planet,
280
00:39:49,740 --> 00:39:59,280
for instance the surface of Mars, from what are called reflectance observations, reflectance measurements.
281
00:39:59,280 --> 00:40:06,420
So this is a typical inverse problem, because the direct model is easy.
282
00:40:06,420 --> 00:40:15,840
So you know the relationship: you know how to get the reflectance Y based on some parameter X,
283
00:40:15,840 --> 00:40:22,380
a small-dimensional parameter X, and you get noisy measurements of these quantities.
284
00:40:22,380 --> 00:40:30,630
And so in the application, we focus on a small number of parameters, namely these four.
285
00:40:30,630 --> 00:40:38,730
I have to say, I don't know what they mean. And the reflectance is high-dimensional.
286
00:40:38,730 --> 00:40:46,560
But you can do with only 10 geometries of this reflectance.
287
00:40:46,560 --> 00:40:53,390
So you can really compact your observations to something quite small.
288
00:40:53,390 --> 00:40:59,240
And in this case, the parameters that are used are given above.
289
00:40:59,240 --> 00:41:11,270
So 40 components in the mixtures, both capital N and capital M equal to 10 to the 5, and epsilon as in the same context as before.
290
00:41:11,270 --> 00:41:22,970
So this is also a simulated example, where the set of parameters was deliberately set equal to these values,
291
00:41:22,970 --> 00:41:29,750
so that they are meaningful enough for this particular application, and we
292
00:41:29,750 --> 00:41:40,400
chose them because Florence is also working with these scientists in other projects.
293
00:41:40,400 --> 00:41:42,560
So she knows that these values make sense.
294
00:41:42,560 --> 00:41:54,710
And the example is devised in such a way that the true value has this symmetry between two potential values.
295
00:41:54,710 --> 00:42:00,570
So both 0.15 and 0.42 make sense for the model.
296
00:42:00,570 --> 00:42:12,530
And if we look at the results for the marginals of each of the four parameters, we see the following.
297
00:42:12,530 --> 00:42:23,210
So what we have here are the GLLiM expectation, the two functional ones, L2 and MW2, and the semi-automatic ABC.
298
00:42:23,210 --> 00:42:27,590
So for most of the parameters they are doing very similarly,
299
00:42:27,590 --> 00:42:33,680
they are maybe slightly more peaked for the W parameter, both the blue and the black ones,
300
00:42:33,680 --> 00:42:36,740
So meaning the two functional ones.
301
00:42:36,740 --> 00:42:48,380
And what is interesting is that, again, when there is multimodality in the posterior, these are the two procedures,
302
00:42:48,380 --> 00:42:52,670
the functional ones, that seem to recover it best.
303
00:42:52,670 --> 00:42:58,940
So the black one exhibits this bimodality, and the blue one also gets this
304
00:42:58,940 --> 00:43:11,390
bimodality that the red and green do not really have, or in a less pronounced way. OK.
305
00:43:11,390 --> 00:43:20,900
So we only show the marginals, but the same can be seen from joint representations.
306
00:43:20,900 --> 00:43:31,820
OK, so I guess I need to conclude. What we have worked on builds on the semi-automatic framework of Fearnhead and Prangle,
307
00:43:31,820 --> 00:43:40,010
but with a shift of paradigm, in the sense that we use surrogate posteriors to compare
308
00:43:40,010 --> 00:43:47,390
observations, instead of comparing summaries of observations.
309
00:43:47,390 --> 00:43:52,250
So this requires a tractable and scalable model to learn the surrogates.
310
00:43:52,250 --> 00:43:57,290
GLLiM is one such possible model.
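As a rough sketch of the overall idea, here is what rejection ABC with surrogate-posterior comparisons might look like. The callables `surrogate` and `distance` are placeholders (in the talk, a pre-trained GLLiM model and an L2 or MW2 distance); this is not the authors' implementation, only an illustration of the paradigm shift described above.

```python
import numpy as np

def surrogate_abc_rejection(y_obs, prior_sample, simulate, surrogate,
                            distance, n_sim=10000, quantile=0.01):
    """Rejection ABC where simulated and observed data are compared
    through their surrogate posteriors rather than through summary
    statistics. `surrogate(y)` returns a cheap posterior approximation
    (e.g. a Gaussian mixture); `distance` compares two of them."""
    rng = np.random.default_rng(1)
    s_obs = surrogate(y_obs)
    thetas, dists = [], []
    for _ in range(n_sim):
        theta = prior_sample(rng)
        thetas.append(theta)
        dists.append(distance(s_obs, surrogate(simulate(theta, rng))))
    # keep the draws whose surrogate posterior is closest to the observed one
    eps = np.quantile(dists, quantile)
    return [t for t, d in zip(thetas, dists) if d <= eps]
```

On a toy model where the surrogate is just the raw observation, this reduces to plain rejection ABC with an identity summary.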
311
00:43:57,290 --> 00:44:08,240
This works well, as I was saying, up to hundreds or thousands of observation dimensions, and it can handle missing data and variables.
312
00:44:08,240 --> 00:44:14,030
And then we also need metrics to compare the surrogate posteriors.
313
00:44:14,030 --> 00:44:19,310
So we've used L2 and MW2, to cite a few.
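For instance, the L2 distance between two Gaussian-mixture surrogate posteriors is available in closed form, because the integral of a product of two Gaussians is itself a Gaussian evaluation. A one-dimensional sketch (the talk's setting is multivariate, so this is a simplification):

```python
import numpy as np

def _gauss(m, v):
    # integral of N(.; m1, v1) * N(.; m2, v2) equals
    # the value of N(m1 - m2; 0, v1 + v2)
    return np.exp(-0.5 * m**2 / v) / np.sqrt(2 * np.pi * v)

def l2_distance_gmm(w1, mu1, var1, w2, mu2, var2):
    """Squared L2 distance between two 1-D Gaussian mixtures:
    ||p - q||^2 = <p,p> - 2<p,q> + <q,q>, with
    <p,q> = sum_ij w1_i w2_j N(mu1_i - mu2_j; 0, var1_i + var2_j)."""
    def inner(wa, ma, va, wb, mb, vb):
        d = ma[:, None] - mb[None, :]
        v = va[:, None] + vb[None, :]
        return np.sum(wa[:, None] * wb[None, :] * _gauss(d, v))
    return (inner(w1, mu1, var1, w1, mu1, var1)
            - 2 * inner(w1, mu1, var1, w2, mu2, var2)
            + inner(w2, mu2, var2, w2, mu2, var2))
```

The MW2 distance between mixtures requires solving a small optimal-transport problem over the components, so it is omitted here.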
314
00:44:19,310 --> 00:44:24,410
A first result is that we do not need summary statistics anymore.
315
00:44:24,410 --> 00:44:38,210
We have convergence results to the true posterior, with the caveat that they only hold for a restricted class of models. And that's it.
316
00:44:38,210 --> 00:44:43,580
And we have good performance when the posterior is multimodal.
317
00:44:43,580 --> 00:44:51,900
And it seems that the quality of the surrogate posterior is not critical in the experiments that we have run.
318
00:44:51,900 --> 00:44:55,460
So GLLiM is doing OK, as we have seen.
319
00:44:55,460 --> 00:45:05,750
It's not a perfect approximation of the posterior, but it's always good enough for our procedure to do something good.
320
00:45:05,750 --> 00:45:11,240
And in some of the experiments that we have run, this approach seems
321
00:45:11,240 --> 00:45:19,910
more robust. As for perspectives: it's still very young,
322
00:45:19,910 --> 00:45:29,630
very, very fresh work, so there are a lot of improvements that we can think about.
323
00:45:29,630 --> 00:45:35,450
The choice of K: for the moment we have no information criterion to select the number of
324
00:45:35,450 --> 00:45:44,660
components, so we could think about one. And we haven't assessed the computational cost,
325
00:45:44,660 --> 00:45:53,090
but that's probably something to do. And doing more experiments with other metrics than L2 and MW2:
326
00:45:53,090 --> 00:45:59,590
that could be worthwhile. Or another learning scheme, then.
327
00:45:59,590 --> 00:46:07,520
One option that I was discussing with Jeff before the talk is, for instance,
328
00:46:07,520 --> 00:46:14,170
normalising flows; that would probably do something interesting, and it would be interesting to check.
329
00:46:14,170 --> 00:46:22,450
Also richer schemes than the vanilla rejection: you can think about importance sampling, MCMC, or sequential Monte Carlo,
330
00:46:22,450 --> 00:46:31,230
and we haven't spoken about the threshold level, and also about extending this to more than just one observation.
331
00:46:31,230 --> 00:46:36,280
This is clearly something that we want to think about.
332
00:46:36,280 --> 00:46:42,910
So we have a number of people to thank for comments and for work around this.
333
00:46:42,910 --> 00:46:46,870
And very, very quickly,
334
00:46:46,870 --> 00:46:55,960
I would like to pass on this message; I don't know if there are any students here from the 2017 PhD cohort,
335
00:46:55,960 --> 00:47:05,680
but we have these open postdoc positions, and the subject can be anything related to the themes of the team.
336
00:47:05,680 --> 00:47:14,290
For instance, the PhD defence has to be before the end of the year, and the application deadline is really soon, before May 21st.
337
00:47:14,290 --> 00:47:17,230
So you can write to us if you're interested.
338
00:47:17,230 --> 00:47:30,770
I'll just finish with this slide of references, including ours, and thank you very much for your attention.
339
00:47:30,770 --> 00:47:38,240
Thank you very much. You're getting a few virtual rounds of applause from the audience, I can see.
340
00:47:38,240 --> 00:47:43,280
We've got a few minutes for questions.
341
00:47:43,280 --> 00:47:49,730
So if anyone would like to ask a question, just put your hand up and unmute yourself.
342
00:47:49,730 --> 00:47:55,420
Yes, I can see you first, and then we'll go from there.
343
00:47:55,420 --> 00:48:04,150
Thanks very much, very interesting. And so my question is about that surrogate posterior,
344
00:48:04,150 --> 00:48:16,510
as I think you call it: is it necessary or is it sufficient for your procedure to converge to pi?
345
00:48:16,510 --> 00:48:24,820
Is it necessary for the surrogate to converge in order for the whole procedure to converge?
346
00:48:24,820 --> 00:48:30,430
Well, yeah, I guess it's necessary.
347
00:48:30,430 --> 00:48:36,510
Yes, I would say so. And I have no idea whether it's sufficient,
348
00:48:36,510 --> 00:48:44,960
but I was guessing you were going to say it the other way around: that it's sufficient, but perhaps not necessary.
349
00:48:44,960 --> 00:48:54,270
Yeah, maybe, if I can add something. Yes, please. I think you're right, Jeff.
350
00:48:54,270 --> 00:49:06,300
We just need the surrogate posteriors to be sort of discriminative enough on the parameters.
351
00:49:06,300 --> 00:49:15,700
I mean, yeah: let's say we have a biased estimation of the surrogate posterior.
352
00:49:15,700 --> 00:49:19,660
I don't know if we can say that, but if the bias is, like, constant,
353
00:49:19,660 --> 00:49:26,020
somehow, when you compare the two biased estimations,
354
00:49:26,020 --> 00:49:31,540
the distance between those biased estimations would be the same as the distance between the two posteriors.
355
00:49:31,540 --> 00:49:42,460
And then I think it would work. But the problem is that in practise, we don't know how to formalise this idea.
356
00:49:42,460 --> 00:49:52,950
Yeah, I don't know if you have suggestions, but...
357
00:49:52,950 --> 00:49:56,730
Would you like to...? Can you go back to your conclusion slide?
358
00:49:56,730 --> 00:50:02,060
Because I can't remember my question without that. I'll just have to wait.
359
00:50:02,060 --> 00:50:07,320
So, you said that you want to extend to i.i.d. observations.
360
00:50:07,320 --> 00:50:16,680
I don't quite know what you mean by that, but somehow the Y that you consider as your observation could be anything.
361
00:50:16,680 --> 00:50:26,780
It could be i.i.d. observations, it could be a long vector or a long time series of observations, or whatever.
362
00:50:26,780 --> 00:50:32,580
You don't need the structure. The only thing that you need is that you want to approximate the posterior by whatever method you're using.
363
00:50:32,580 --> 00:50:38,040
So Y is a vector, but it could be one observation, or it could be a big one.
364
00:50:38,040 --> 00:50:43,320
Yeah, no. What we do
365
00:50:43,320 --> 00:50:47,970
heavily relies on the fact that for one parameter,
366
00:50:47,970 --> 00:51:00,250
we have only one observation, and an i.i.d. sample of observations makes it a different setup.
367
00:51:00,250 --> 00:51:12,050
And then, I don't know. So the reason that I wanted to ask is whether you have some asymptotics as well,
368
00:51:12,050 --> 00:51:17,990
in the dimension of the observation somehow. Maybe that has to do with the theory.
369
00:51:17,990 --> 00:51:26,990
But I think that even in the actual implementation of what we do, we would have to think differently about building
370
00:51:26,990 --> 00:51:35,000
the GLLiM, for instance. Yes, it seems like GLLiM is related to the fact that you have a specific structure in your Y.
371
00:51:35,000 --> 00:51:39,560
In a sense, that's what you're saying. Yes. Yes.
372
00:51:39,560 --> 00:51:47,740
So, yes, we would have to fit a model with a Y,
373
00:51:47,740 --> 00:52:00,070
in a sense, with a Y that is no longer just one observation, but a table of observations. I mean,
374
00:52:00,070 --> 00:52:08,470
formally, if you have i.i.d. observations, you can look at them as one observation as well, which is one big vector of observations.
375
00:52:08,470 --> 00:52:17,280
At the end of the day, you're constructing your table such that for each theta you have a big Y, and then
376
00:52:17,280 --> 00:52:22,480
you'll fit your conditional: you will fit the conditional distribution of theta given that big Y.
377
00:52:22,480 --> 00:52:29,080
In a sense, that's the model, which is an estimate of the density, an estimate of the distribution of theta
378
00:52:29,080 --> 00:52:34,760
given the data, in a sense, yes. So why not do that at once?
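The stacking idea being discussed could be sketched like this; `prior_sample` and `simulate_one` are hypothetical callables, and the point is only that each row of the training table pairs one theta with one stacked vector of i.i.d. replicates, on which a conditional model of theta given Y can then be fit.

```python
import numpy as np

def build_stacked_table(prior_sample, simulate_one, n_table=1000, n_rep=5,
                        rng=None):
    """Training table for the i.i.d. case: each row pairs a parameter
    theta with ONE big vector stacking n_rep i.i.d. replicates."""
    rng = rng or np.random.default_rng(2)
    thetas = np.array([prior_sample(rng) for _ in range(n_table)])
    ys = np.array([np.concatenate([simulate_one(t, rng)
                                   for _ in range(n_rep)])
                   for t in thetas])
    return thetas, ys  # shapes: (n_table,), (n_table, n_rep * dim_y)
```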
379
00:52:34,760 --> 00:52:45,100
And then... I mean, in practise it makes sense; in the way the methodology is derived,
380
00:52:45,100 --> 00:52:53,350
the principle should be the same. Yeah, but maybe we lose something in doing it this way.
381
00:52:53,350 --> 00:52:59,780
So I see Florence saying that you'd probably want to do something more clever.
382
00:52:59,780 --> 00:53:07,930
You know, you're right in the sense that we could stack the i.i.d. observations into one big vector.
383
00:53:07,930 --> 00:53:12,310
But then it's a bit of a pity not to let the model know that they are i.i.d.
384
00:53:12,310 --> 00:53:16,240
Yes. So you want to exploit this structure.
385
00:53:16,240 --> 00:53:22,720
Because if you use, for instance, a discrepancy-based method, that's exactly what they do.
386
00:53:22,720 --> 00:53:30,550
They take the data and they know that the observations all come from the same distribution.
387
00:53:30,550 --> 00:53:39,020
So, yes. And so it would not be in our favour not to use that.
388
00:53:39,020 --> 00:53:45,890
Mm hmm. And also, the idea should be easy to adapt; the current implementation is not made for it,
389
00:53:45,890 --> 00:53:53,170
but it's just an algorithm, so it's not too difficult to adapt it.
390
00:53:53,170 --> 00:54:04,330
And also for computational reasons: if you have a very, very big vector, for instance, at the moment we cannot go to a million or whatever,
391
00:54:04,330 --> 00:54:11,470
while with the discrepancy-based methods the data may not be that large, but they have a lot of repetitions.
392
00:54:11,470 --> 00:54:23,470
It's a bit reminiscent of something that she was doing some years ago, when she was using mixtures as a way to approximate the density,
393
00:54:23,470 --> 00:54:30,160
in a sense, within the algorithm. And I remember the talk from some years ago.
394
00:54:30,160 --> 00:54:38,110
I know. Yeah, we have to check, because indeed she worked a lot on mixtures of experts.
395
00:54:38,110 --> 00:54:42,040
Yeah, and so I wonder how related it is to what you're doing.
396
00:54:42,040 --> 00:54:49,060
And I think in her case she could compute the likelihood, so it's a bit different in this respect. But she was definitely using these mixtures to
397
00:54:49,060 --> 00:54:55,420
approximate the posterior density at some stage in her algorithm, and then maybe as a proposal.
398
00:54:55,420 --> 00:54:59,140
Yeah, I can't remember whether it was a proposal in the end,
399
00:54:59,140 --> 00:55:04,030
or whether she got rid of the proposal and was just looking at these mixtures as an approximation.
400
00:55:04,030 --> 00:55:15,130
I'm not sure; it's worth checking, but yeah, it could be related to this.
401
00:55:15,130 --> 00:55:25,520
Because she has a full book on mixtures. And she has an army of people working on that.
402
00:55:25,520 --> 00:55:35,820
Yeah. I see that we are close to the end of this session, but there was also another slot for one-on-one discussion.
403
00:55:35,820 --> 00:55:39,290
So I'm happy to stay connected for that.
404
00:55:39,290 --> 00:55:42,950
It's another Zoom link, I think. Really? Yeah. Yeah.
405
00:55:42,950 --> 00:55:47,090
So if anyone can open this one, I'll stay there.
406
00:55:47,090 --> 00:55:51,780
I'd be happy to continue the discussion there.
407
00:55:51,780 --> 00:55:57,110
But if anyone would like to chat with the speaker one-on-one,
408
00:55:57,110 --> 00:56:07,430
just pop me a quick email and we can set this up to continue the discussion;
409
00:56:07,430 --> 00:56:13,820
but otherwise, unless there are any really quick questions, I think we will call it a day here.
410
00:56:13,820 --> 00:56:19,960
So thank you, everyone, for attending. And thank you, as usual, for a great talk.
411
00:56:19,960 --> 00:56:31,130
Thanks for the invitation, my pleasure. And everyone can look out for next week's talk and who the speaker is.
412
00:56:31,130 --> 00:56:37,430
There'll be, of course, another seminar each week for the next few weeks.
413
00:56:37,430 --> 00:56:42,344
So thank you again. Thank you, and goodbye.