And Julyan, over to you.

OK, well, hi everyone, and thanks for the invitation and for the too generous introduction. So this is joint work with Florence Forbes, Hien Nguyen and TrungTin Nguyen, and what I am going to talk about today is approximate Bayesian computation, ABC in short, with surrogate posteriors. It is a sort of new way of doing ABC that we find pretty interesting.

As a disclaimer, I would like to say first that none of us is really an expert in ABC, but anyway we find the new method particularly appealing. So here are my collaborators: Florence, who is the head of the Statify team at Inria; Hien, who is at La Trobe University in Australia; and TrungTin, who is at university in France. I think that all of them are here.

By the end of my presentation you should be more familiar with ABC, I hope. I will start by presenting the vanilla algorithm, usually called rejection ABC, and then move on to the semi-automatic approach that was proposed by Fearnhead and Prangle in 2012. What we do is build on this semi-automatic ABC: for this we have a preliminary learning step where we build what we call surrogate posteriors. These surrogate posteriors are built with an inverse regression model called GLLiM, and so we call our ABC approach the GLLiM-ABC procedure. I will present some theoretical properties of it and a number of illustrations on inverse problems, and then I will conclude.

To give you quickly some context, we are interested in doing Bayesian inference when the likelihood is intractable; this is the context of approximate Bayesian computation. So we have a data-generating model as follows: the parameter is denoted theta, we have a prior pi of theta, and given theta the likelihood is denoted f of z given theta. The data z is d-dimensional, and an important condition for ABC is that we know how to sample from this likelihood. We also know how to sample from the prior.

One goal in statistics is estimation of the parameter given some observed data y. In Bayesian statistics this is done by forming the posterior distribution of theta given y, which is proportional to the prior times the likelihood. And the question in ABC is what to do when the likelihood is intractable, so that it is not possible to evaluate it, maybe because it is too costly or just because it is not available.
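To fix notation for the setup just described, in symbols (a minimal restatement; the symbols theta, y, z, pi and f follow the talk as I read it):

\[
\pi(\theta \mid y) \;=\; \frac{\pi(\theta)\, f(y \mid \theta)}{\int \pi(\theta')\, f(y \mid \theta')\, d\theta'} \;\propto\; \pi(\theta)\, f(y \mid \theta),
\]

with the ABC premise that one can draw \(\theta \sim \pi\) and \(z \sim f(\cdot \mid \theta)\), but cannot evaluate \(f(y \mid \theta)\) pointwise.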
So one way to proceed, and the most simple way, is as follows. You want to sample parameter values from the posterior, but you are not going to have them exactly: in the ABC setup it is going to be approximate. The way to proceed is this: you need to sample quite a large number of couples of parameter values theta_i and simulated data z_i. This is simple to do if you know how to sample from both the prior and the likelihood: you first sample theta values from the prior and, conditional on these theta values, you sample z values from the likelihood. The key idea of rejection ABC is that you accept, that is you keep, the parameter values for which the simulated data are close to the actual data. This is done by a comparison with some metric, capital D, and as soon as the distance is small enough you decide to keep the parameter values that you have sampled.

This distance D can take several forms. In the most simple case it is the Euclidean distance between the vectors of true data and simulated data, but, as we will see, it is often the Euclidean distance between summaries of these vectors. So already we are faced with a number of questions: what choice for D, the distance, for the summaries s, and also for the threshold epsilon? In this talk and in our work we do not really discuss the choice of the threshold, but we do discuss the choices for D and for the summaries.

There are a number of strategies for these choices of D and s. The starting point is the realisation that you cannot really use this simple distance efficiently in high dimension: you would get too much variability in your procedure. So it is important to reduce the dimension, and this can be done in two ways. The first category of procedures puts the effort into the summaries. In this family of approaches D is a standard distance, and the fact that you use summaries reduces the dimension and induces a smaller variance. But the problem is that you lose some information, and the choice of the summaries is arbitrary if you do not have expert information on how to make it. The work by Fearnhead and Prangle (2012) that I mentioned a couple of minutes ago provided a first solution to this problem.
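To make the rejection step concrete, here is a minimal sketch (illustrative only, not code from the talk; the prior sampler, simulator and distance below are made-up placeholders):

```python
import numpy as np

def rejection_abc(y_obs, sample_prior, simulate, distance, n_sim=100_000, quantile=0.001):
    """Vanilla rejection ABC: keep the sampled parameters whose simulated
    data fall closest to the observed data y_obs."""
    thetas = np.array([sample_prior() for _ in range(n_sim)])   # theta_i ~ prior
    zs = [simulate(t) for t in thetas]                          # z_i ~ f(. | theta_i)
    dists = np.array([distance(y_obs, z) for z in zs])          # D(y, z_i)
    eps = np.quantile(dists, quantile)                          # threshold as a small quantile
    return thetas[dists <= eps]                                 # accepted parameter values

# Toy usage (made-up model): theta is the mean of a Gaussian sample.
rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=10)
accepted = rejection_abc(
    y_obs,
    sample_prior=lambda: rng.normal(0.0, 5.0),        # prior on theta
    simulate=lambda t: rng.normal(t, 1.0, size=10),   # likelihood sampler
    distance=lambda y, z: abs(y.mean() - z.mean()),   # distance on a summary statistic
)
```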
The semi-automatic ABC framework relies on a preliminary learning step where you learn the dependence between the parameter and the data in a generic way. But one of the limitations is that it requires a modest dimensionality for the data.

The second category of approaches is the one based on data discrepancies. It has been an active research line over the last five years or so, where the idea is to replace the distance by a discrepancy between empirical distributions. In this sort of approach you view your data vectors as empirical distributions: here, with some abuse of notation, the vectors are seen as these empirical distributions, and then you can use distances between empirical distributions. A number of such distances have been proposed in the literature and are listed here. A clear advantage is that you do not rely on summaries anymore. But a problem is that this requires moderately large samples: you need replicates of samples for the same parameter for the discrepancy to behave well, and in many of the problems we are interested in you do not have these replicates. In our problems you only have one observation; it can be a long observation, but there is a single observation to invert for every parameter of interest.

OK, so one of the reasons why these ABC methods are interesting, one of the reasons why we can count on them, is that they have well-behaved limits when epsilon goes to zero. This is what I present here. The posterior distribution is written with this intractable likelihood highlighted, and since it is intractable, ABC replaces it by this blue quantity, which is an integral of the likelihood against the indicator function shown here. Using this approximate likelihood induces an approximate posterior, here in red, that is proportional to the prior times the approximate likelihood. The reason why this quasi-posterior converges to the true posterior is fairly simple to see: it relies on the fact that when epsilon goes to zero, the distance between the data vectors also goes to zero, so the set of accepted vectors converges to the singleton made of the true data only, and thus the approximate posterior converges to the true posterior. The details of this proof can be found in the references below.
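Written out, the quantities just described are, as far as I can reconstruct them from the talk (standard ABC notation, epsilon being the threshold):

\[
f_\epsilon(y \mid \theta) \;=\; \int f(z \mid \theta)\, \mathbf{1}\{D(y, z) \le \epsilon\}\, dz,
\qquad
\pi_\epsilon(\theta \mid y) \;\propto\; \pi(\theta)\, f_\epsilon(y \mid \theta),
\]

and the convergence statement is that \(\pi_\epsilon(\cdot \mid y) \to \pi(\cdot \mid y)\) as \(\epsilon \to 0\), because the acceptance region \(\{z : D(y, z) \le \epsilon\}\) shrinks to the singleton \(\{y\}\).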
One of the starting points of this work is actually a realisation by Florence, if I may mention her, that this condition, namely that the set of accepted z's converges to the singleton {y}, is somehow too strong an assumption: you can rely on something not as strong for the convergence to still hold. Let me explain in which sense this is true. We can write the Bayes formula for the quasi-posterior in a slightly different way: we replace the joint of theta and z by the same joint, here written in blue, but using the chain rule the other way round, so that it involves the posterior of theta given z and the evidence of z. That is the first observation. The second, somewhat bolder, move is to replace the distance between the vectors y and z by a distance between posterior distributions. Here there is an overload of notation, the D is not the same, but we keep the same symbol. So in this integral we want to replace the distance between vectors by a distance between distributions.

With this we are forming a new quasi-posterior, written in blue, that is the same as before but where the indicator function is evaluated at the distance between posteriors above. The first result that we prove in the paper is that this quasi-posterior converges to the true posterior in total variation when epsilon goes to zero. The proof is actually very similar in spirit to the original proof for the ABC posterior: when epsilon goes to zero, the discrepancy between the posteriors also goes to zero, which means that the posterior given z converges to the posterior evaluated at the true data, and in terms of quasi-posteriors that means the quasi-posterior converges to the posterior. What we see in blue here is that the convergence, now expressed in terms of the sets in the indicator functions, is a convergence to a set that is potentially slightly larger than the singleton {y}: this set contains y, but not only y necessarily.

So if you have followed me well, you may ask why it is legitimate to use this unknown quantity, the true posterior, inside the distance. Of course, in practice we need to use practical approximations of these posteriors, and this is what we call the surrogate posteriors.
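In symbols, the rewriting and the new quasi-posterior read as follows (my reconstruction from the talk; \(m(z)\) denotes the evidence, the marginal of \(z\)):

\[
\pi_\epsilon(\theta \mid y) \;\propto\; \int \pi(\theta)\, f(z \mid \theta)\, \mathbf{1}\{D(y, z) \le \epsilon\}\, dz
\;=\; \int \pi(\theta \mid z)\, m(z)\, \mathbf{1}\{D(y, z) \le \epsilon\}\, dz,
\]

and the proposed variant keeps the second form but puts the indicator on a distance between posteriors,

\[
\tilde{\pi}_\epsilon(\theta \mid y) \;\propto\; \int \pi(\theta \mid z)\, m(z)\, \mathbf{1}\big\{D\big(\pi(\cdot \mid y),\, \pi(\cdot \mid z)\big) \le \epsilon\big\}\, dz .
\]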
I am moving now to say a few words on the approach proposed by Fearnhead and Prangle. What they do in semi-automatic ABC is to replace the choice of summaries by a single summary: the posterior expectation of theta, the posterior mean. Of course, the posterior mean is a quantity that is not available, by definition of your problem; it is one of the things you are looking for. But they suggest using a preliminary linear regression to learn this mapping between theta and z, and this is done by first simulating a large number of couples of parameters and data, simply sampled from the joint distribution. So it is the same procedure as in ABC, but done as a preliminary step: in the end you are going to do it twice.

We have a number of contributions in the paper that we call variants in this presentation; I realise that I should perhaps not use the word variant, which is a rather loaded word these days. The first one was already suggested in the original paper by Fearnhead and Prangle and implemented in later papers: it is about using something other than a linear regression, neural networks for instance. Actually, we can also use our own inverse regression to implement variant number one. Variant number two is the realisation that not only the means could be used, but also some higher-order moments like variances; this had already been suggested but not implemented, and we guess that the reason it was not implemented is that it requires your procedure to be able to provide those moments at low cost. And the main contribution, variant number three, is to replace summaries by a good approximation of the posterior itself. This requires two things: a learning procedure that is able to provide these approximate posteriors, and for this we use the GLLiM model, the Gaussian Locally Linear Mapping of Deleforge et al. (2015); and then, once we have posterior approximations, a way to compare them, which requires a metric between distributions.

I will stop for a couple of seconds to ask whether there are any questions on what we have seen so far before I move on. OK, I should move on. So the surrogate posteriors that we propose are built as mixtures of Gaussians; this is one of the babies, so to speak, of the first co-author, Florence. The idea of GLLiM is to capture the relationship between data and parameters as a mapping that we learn beforehand.
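In formulas, a surrogate posterior of the GLLiM kind has the Gaussian-mixture shape below; the notation is mine, and the precise parametrisation is the one described next:

\[
q_{\phi}(\theta \mid y) \;=\; \sum_{k=1}^{K} \eta_k(y)\; \mathcal{N}\!\big(\theta;\, A_k y + b_k,\; \Sigma_k\big),
\]

where both the weights \(\eta_k(y)\) and the component means \(A_k y + b_k\) depend on the observation \(y\), the latter through an affine map.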
The way the mapping works is as a mixture of Gaussian distributions, parametrised by two sets of parameters. The first set of parameters enters the weights of the mixture here below, while the second set enters the Gaussian components themselves: there is an affine relationship for the means, an affine dependence on the observation. To fit these GLLiM models we need a preliminary learning step, just as the semi-automatic approach does. For this we sample N couples from the true joint, and the GLLiM relationship is learnt with an EM algorithm. So we estimate a phi-star for a given number K of mixture components, and then all the procedure that follows can be done with this single value phi-star.

In our case the three variants that I presented before take the following form. If I rely on GLLiM, variant number one uses the posterior mean as a single summary statistic, and these means are in closed form, since everything is in closed form with these Gaussian mixtures. Variant number two is the suggestion that we can add some higher-order moments; it happens that the variances are also in closed form for GLLiM and they take this form. And variant number three is the idea that we can use the surrogate posteriors themselves: the full surrogate posteriors in the case of GLLiM are mixtures of Gaussians, so if we want to compare them with one another we need a metric for Gaussian mixtures. That is precisely the work done by Delon and Desolneux, where they propose a Wasserstein-type distance for mixtures of Gaussians, referred to as MW2. Other distances can be used as well, and we also implement an L2 distance between mixtures.

OK, so this is a recap of the proposed algorithm. Remember that we have a first, preliminary learning step, where we sample a large learning set of N couples and learn GLLiM on this data set: we learn the functional relationship between the two by computing the phi-star parameter estimate, which gives an approximation of the true posterior. Then there is the second step, which is computing distances; for this we need another simulated data set of size capital M.
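As a small sketch of how the three variants are read off a fitted surrogate of this mixture form (illustrative code, not the xLLiM implementation; the parameter names pis, cs, Gs, As, bs, Ss are labels I am supplying):

```python
import numpy as np
from scipy.stats import multivariate_normal

class MixtureSurrogate:
    """Gaussian-mixture surrogate posterior of the GLLiM-like form
    q(theta | y) = sum_k eta_k(y) N(theta; A_k y + b_k, S_k)."""

    def __init__(self, pis, cs, Gs, As, bs, Ss):
        self.pis, self.cs, self.Gs = pis, cs, Gs   # gating weights and Gaussians on y
        self.As, self.bs, self.Ss = As, bs, Ss     # affine means and covariances on theta

    def posterior(self, y):
        """Variant 3 object: the full mixture (weights, means, covariances) at y."""
        w = np.array([p * multivariate_normal.pdf(y, mean=c, cov=G)
                      for p, c, G in zip(self.pis, self.cs, self.Gs)])
        w = w / w.sum()                                               # eta_k(y)
        means = np.array([A @ y + b for A, b in zip(self.As, self.bs)])
        return w, means, np.array(self.Ss)

    def summary_mean(self, y):
        """Variant 1: the posterior mean, used as a single summary statistic."""
        w, means, _ = self.posterior(y)
        return w @ means

    def summary_mean_cov(self, y):
        """Variant 2: posterior mean plus covariance (a higher-order moment)."""
        w, means, covs = self.posterior(y)
        m = w @ means
        cov = sum(wk * (Ck + np.outer(mk - m, mk - m))
                  for wk, mk, Ck in zip(w, means, covs))
        return m, cov
```

For variant 3, the full triples returned by `posterior` for the observed y and for each simulated z are compared with a metric between Gaussian mixtures, such as MW2 or an L2 distance; that comparison step is not sketched here.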
For a single observed y we can then follow two different routes: the vector approach, that is variants one and two, with the expectation alone or with the expectation and the variances; and the functional-summary variant, which consists in comparing directly the surrogate posteriors, either with MW2 or with L2. The sample selection is then the usual thing: you only retain the smallest distances, by choosing epsilon, as usual, as a quantile of the empirical distribution of the distances.

OK, I am moving now to some asymptotic properties of our procedures. I can take questions if there are any; otherwise I am happy to move on. It looks like there is a question from Jeff. Jeff, if you would like to unmute yourself, go ahead.

Yeah, certainly. I just want to check my understanding: the z there is quite high-dimensional, right? It is not just one data point, it includes the whole data set. Is that right, or is it just one data point?

So z, the d-dimensional object, is a vector whose dimension is not that high in our applications; I will comment on it later on, but it can typically be up to one hundred or one thousand dimensional or so.

So the full data set is not d-dimensional? You have multiple copies of z?

Actually, not really: in the inverse problems we are interested in, it is specific that you have one z, and you want the associated approximate posterior for this data, for this single observation.

Great, thanks very much.

OK, so I have already mentioned the first theoretical result that we have, which is that the quasi-posterior converges to the true posterior when epsilon goes to zero. This is not really an applicable result, because it relies on using the exact posterior, which we actually do not have. This one is a more practical theoretical result, because we plug the actual surrogate posteriors into the quasi-posterior that we are working with. I have to acknowledge that this result only holds on a restricted class of target and surrogate distributions: we need compactness to be able to prove it, compactness of the joint space of theta and y, as well as compactness of the parameter set that contains the affine parameters of the family of mixture components. Under these assumptions we build K-component mixtures from the family capital H.
We use this learning set of size N and compute, just as before, the parameter estimate as the quantity that maximises the likelihood in phi, and the surrogates are built as the mixtures evaluated at this phi-star. What we can prove, within this framework and under some additional standard assumptions that I detail here, is that the Hellinger distance between our approximate posterior and the exact posterior converges to zero, for almost every true data y with respect to some measure lambda, and in probability with respect to the learning sample.

An important caveat of our result is that GLLiM actually does not satisfy these compactness assumptions. We hope that some version with mixtures of truncated Gaussian distributions could actually meet these restrictions. So what I want to say is that this theoretical result does not directly apply to GLLiM as we use it.

I am moving to a couple of... yes, there is a question. Is it important to work with the Hellinger distance? Well, I guess it is important to work with distances that you know how to deal with; the Hellinger distance is a rather strong distance. So maybe you would only need a weaker distance? Yes, I see. Yeah, it is a good point, and it might avoid the compactness assumption. For sure we were not able to avoid this assumption, we did not see how to avoid it, even with other choices of distances, but it is probably a direction to investigate more. Yes, it is a good point. OK, thank you.

So I am moving to two illustrations, two examples with multimodal posteriors. One point to keep in mind is that our approach is deemed to work best in the case of multimodal posteriors, and this is the reason why we focus essentially on these examples. In both examples we have a ten-dimensional observation, and a single observation. Maybe as a follow-up to the earlier question: in the case where the actual observation is a very long one, it can perhaps be summarised down to a smaller-dimensional observation first. The first example is a synthetic sound source localisation problem with a two-dimensional parameter, and the next one is a real problem with four parameters.
So we compare four types of ABC schemes: the one based on GLLiM with only the expectation, GLLiM with the expectation and the variance, GLLiM with functional summaries, comparing either with L2 or with MW2, and then the Fearnhead and Prangle semi-automatic ABC. For each of them we rely on R packages, and for GLLiM we rely on the xLLiM package that was developed by Florence and co-authors.

Sorry, I am asking: you are not comparing to GLLiM itself? Because in a sense your posterior...

Yes, we do compare to it as well. That is a pretty good point, I should have listed it, because it is a totally legitimate question: you start with an approximation of the posterior, so maybe you could stop there; and we can see that we refine this approximation with the ABC step, at least in our examples. So this is the setting, and then we do a plain rejection ABC; we suspect that we could also plug in other sorts of ABC algorithms.

The numbers are as follows: N is ten to the five, the number of ABC iterations is ten to the five as well, or ten to the six, and epsilon is the 0.1 percent quantile.

The first application arises from sound source localisation: you want to infer the localisation of a sound source, so the parameter you want is the position x, y, based on a number of sound measurements. One way to get these measurements, through some devices, is as follows: you have a pair of microphones and, from this pair of microphones located at m1 and m2, you are able to compute this function that depends on the parameter theta. The problem, or maybe it is not a problem but that is the way it is, is that you have a hyperbola of solutions, so actually two hyperbola branches of solutions.

So how do we sample? It is a simulated example, and we sample observations in the following way. We assume a single theta, and we observe y's that are Student-t noised versions of h of theta: y is h of theta plus some Student-t noise with a quite small scale and nu equal to one degree of freedom. So this is a really bad noise, with no expectation. And the dimensionality is ten.
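As an illustration of this data-generating mechanism (the forward function h, the microphone positions and the noise scale below are made up for the sketch; the talk only specifies the Student-t noise with one degree of freedom and the dimension ten):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical forward function for one microphone pair: difference of the
# distances from the source position theta = (x, y) to the two microphones.
m1, m2 = np.array([-1.0, 0.0]), np.array([1.0, 0.0])

def h(theta):
    return np.linalg.norm(theta - m1) - np.linalg.norm(theta - m2)

def simulate_y(theta, n_rep=10, scale=0.05):
    """Ten noisy measurements: h(theta) plus Student-t noise with nu = 1
    degree of freedom, a heavy-tailed noise with no expectation."""
    return h(np.asarray(theta)) + scale * rng.standard_t(df=1, size=n_rep)

theta_true = np.array([0.3, 0.8])
y_obs = simulate_y(theta_true)   # the single ten-dimensional observation
```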
Actually, what I show here is not exactly this illustration; it is a slightly different one where we still have one true position to discover, but instead of using one pair of microphones we use two pairs, located so that one pair is on the x axis and the other pair on the y axis. The likelihood in such a model is then an equal mixture of the two single-pair components. This is the shape of the true posterior; we can easily obtain this shape by working with the likelihood directly, and it exhibits these symmetric hyperbola branches. Actually, we can also use a Metropolis-Hastings algorithm to sample from the posterior, but we see here that it is not doing really great; maybe this is because we did not tune it well enough.

Now to the results. Let us start with the mixture in red at the bottom left; this replies to the earlier question, since it is what we get with only the preliminary learning step, GLLiM alone. We see that we can probably spot the Gaussian components here, maybe something like eight of them or a bit more, but it is not a perfect representation of the posterior. Then let me move to the two occurrences of variant number one, that is GLLiM-E ABC and the semi-automatic ABC: they are not doing well at all. And then we have the last three, which are doing quite OK. Among the three of them, the expectation-variance one is maybe doing something a bit in-between, due to the two values that we see in the middle, so its posterior is more spread out. And variant number three, the functional ones, are doing really, really well I think, and quite similarly in both cases.

Can I ask another question? There is something I do not understand. You construct a first table to learn your GLLiM, yes, and then you use your second table to do the ABC, yes. If you had merged the two tables together and fitted just one GLLiM, would you get something better than the bottom left? Would you do as well as the rest, what do you think? Because you would have a lot more data than with the other way of doing it.

Yes. This reminds me of discussions that we had with our co-authors; maybe I can get some help from the co-authors in the chat. Actually, one of them was already replying to your first question, so maybe you can have a look at the chat for that.
And yes, Florence writes that you can also use a single table, the initial learning set, instead, and I guess she did try fitting the mixture by merging the two. I agree that you would get something better than what is represented here, but you are not going to get something as precise as the functional versions. So in a sense, if you want to save some computation time, I guess you could also learn both models on the same learning set, but maybe that would not be very Bayesian.

OK, so I have a second and last illustration that comes from planetary science. There is an inverse problem where the aim is to recover parameters of the surface of a planet, for instance the surface of Mars, from what are called reflectance observations, reflectance measurements. This is a typical inverse problem because the direct model is easy: you know the relationship, you know how to get the reflectance y based on some small-dimensional parameter x, and you get noisy measurements of these quantities. In our application we focus on a small number of parameters, reduced to four; I have to say I do not know exactly what they mean physically. The reflectance itself is high-dimensional, but you can work with only ten geometries of these reflectances, so you can really compact your observations to something quite small.

In this case the settings used are given above: K equal to forty components in the mixtures, both capital N and capital M equal to ten to the five, and epsilon chosen as before. This is also a simulated example, where the set of parameters was set equal to these values so that they are meaningful for this particular application; we chose them because Florence is also working with these scientists on other projects, so she knows that these values make sense. And the example is devised in such a way that the true value has a symmetry between two potential values: both 0.15 and 0.42 make sense for the model.

If we look at the results for the marginals of each of the four parameters, what we have here is the GLLiM expectation, both functional versions, the L2 and the MW2, and the semi-automatic approach.
For most of the parameters they behave very similarly; they are maybe slightly more peaked for the w parameter, both the blue and the black, meaning the two functional ones. What is interesting is that, again, when there is multimodality in the posterior, these two procedures, the functional ones, are the ones that seem to recover it best: the black exhibits this bimodality, and so does the blue, while the red and the green do not really have it, or have it much less pronounced. We only show marginals here, but the same can be seen from joint representations.

OK, so I guess I need to conclude. What we have worked on builds on the semi-automatic framework of Fearnhead and Prangle, but with a shift of paradigm, in the sense that we use surrogate posteriors to compare observations instead of comparing summaries of observations. This requires a tractable and scalable model to learn the surrogates, and GLLiM is one such possible model: it works well with up to hundreds or thousands of dimensions for the observations, and it can handle missing data and variables. We also need metrics to compare the surrogate posteriors; we have used L2 and MW2, to cite a few.

The first results are that we do not need summary statistics anymore; we have convergence results to the true posterior, with the caveat that they only hold in a restricted class of models; and we have good performances when the posterior is multimodal. It also seems that the quality of the surrogate posterior is not critical in the experiments that we have run: GLLiM is doing OK, as we have seen; it is not a perfect approximation of the posterior, but it is always good enough for our procedure to do something good. And in some of the experiments the Wasserstein-based distance seems more robust than the L2.

As for perspectives, this is still very young, very fresh work, so there are a lot of improvements that we can think about. The choice of K: for the moment we have no information criterion to select the number of components, but we could think about one. We have not assessed the computational cost, and that is probably something to do. Doing more experiments and illustrations, and considering other metrics than the L2 and the MW2, would also be good. Or another learning scheme than GLLiM.
One option, which I was discussing with Jeff before the talk, is for instance normalising flows; that would probably do something interesting and would be worth checking. Also, richer schemes than the vanilla rejection: you can think about importance sampling, MCMC or sequential Monte Carlo. We have not spoken much about the threshold level, and there is also the extension to more than just one observation; this is clearly something that we want to think about.

We have a number of acknowledgements and people to thank. And very quickly, I would like to pass on this message: I do not know if any students, on top of Francisco, from the twenty seventeen PhD cohort are here, but we have these open postdoc positions. The subject can be anything related to the themes of the team, for instance Bayesian statistics; the PhD defence has to be before the end of the year and the application deadline is really soon, before May 21st, so write to us if you are interested. I will just finish with this slide of references, including ours, and thank you very much for your attention.

Thank you very much. You are getting a few virtual rounds of applause from the audience, I can see. We have got a few minutes for questions, so if anyone would like to ask one, just put your hand up and unmute yourself. Yes, I can see you first, and then we will take the others.

Thanks very much, Julyan, very interesting. My question is about that surrogate posterior, as I think you call it: is its convergence necessary or sufficient? Is it necessary for the surrogate to converge in order for the quasi-posterior to converge to pi?

Yeah, I guess it is necessary, yes, I would say; and I have no idea whether it is sufficient.

I was guessing you were going to say it the other way around: that it is not obviously sufficient, but perhaps not necessary either.

Maybe I can add something. Yes, please. I think you are right, Jeff: we just need the surrogate posteriors to be sort of discriminative enough on the parameters. Let us say we have a biased estimation of the surrogate posterior; I do not know if we can say that, but if the bias is somehow constant, then when you compare two biased estimations, the distance between those biased estimations would be the same as the distance between the two posteriors.
And then I think it would work. But the problem is that in practice we do not know how to formalise this idea. I do not know if you have suggestions.

OK. Would you like to go ahead? Can you go back to your conclusion slide? Because I cannot remember my question without it; I just have to see it. So, you said that you want to extend to i.i.d. observations. I do not quite know what you mean by that, because somehow the y that you consider as your observation could be anything: it could be an i.i.d. sample, it could be a long vector, a long time series of observations, whatever; you do not need structure. The only thing that you need is that you can approximate the posterior given y by whatever method you are using. So y could be one y, or it could be a big one.

No, what we do heavily relies on the fact that for one parameter we have only one observation, and replacing it with an i.i.d. sample of observations makes it a somewhat different setup.

I do not know; the reason I wanted to ask is whether you had some asymptotics in the dimension of the observation somehow. Maybe that has to do with the theory, but I think even in the actual implementation of what we do, we would have to think differently, for building the GLLiM fit for instance.

It seems like GLLiM is tied to the fact that you have a specific structure in your y, in a sense; is that what you are saying? Yes, yes. So yes, we would have to fit a model with a y that is no longer just one observation but a table of observations.

To me, a y that is an i.i.d. sample can be looked at as one observation as well, just a big vector of observations. At the end of the day you are constructing your table: for each theta you have a big z, and then you fit your conditional model, the conditional distribution of theta given z; in a sense the model is an estimate of the distribution of theta given z. So why would having an i.i.d. z of growing complexity change anything? I mean, in practice it makes sense; the way the methodology is derived, the principle should be the same. Maybe you lose something in doing it this way.
I see Florence writing that you would probably want to do something more clever. You are right in the sense that we could stack the i.i.d. observations into one big vector, but then it is a pity not to let the model know that they are i.i.d.; you want to exploit this structure. If you use, for instance, a discrepancy-based method, that is exactly what it does: it sees the data points and knows that they all come from the same distribution. So yes, it would not be in our favour not to use that information.

Also, on whether it is easy to adapt: the current implementation is not made for it, but it is just an algorithm, so it should not be too difficult to adapt. There are also computational reasons: you would have a very, very big vector, and at the moment we cannot go to dimension two million or whatever; the data used with discrepancy-based methods may not be that large dimensionally, but they have a lot of repetitions.

It is a bit reminiscent of something that she was doing some years ago, when she was using mixtures as a way to approximate the density within the algorithm; I remember the talk from some years back. Yeah, we would have to check, because indeed she worked a lot on mixtures of experts, and I wonder how related that is to what you are doing. I think in her case she could compute the likelihood, so it is a bit different in this respect, but she was definitely using these mixtures to approximate the posterior density at some stage in her algorithm, maybe as a proposal. I cannot remember whether it stayed a proposal in the end, or whether she just looked at these mixtures as an approximation directly. I am not sure, it is worth checking, but it could be related. She has a full book on mixtures, and an army of people working on that.

I see that we are close to the end of this session, but there was also another slot set up for one-on-one discussion, so I am happy to stay connected; it is another Zoom link, I think. Really? Yeah, yeah. So if anyone can open that one, I will stay there, happy to continue the discussion. But if anyone would like to chat with Julyan one on one, just pop me a quick email and we can set this up this evening and continue the discussion; otherwise, unless there are any really quick questions, I think we will call it a day here.
So thank you, everyone, for attending, and thank you, as usual, for a great talk.

Thanks for the excellent invitation, it was my pleasure.

And everyone can look out for next week's talk and who the speaker is; there will of course be another seminar over the next few weeks. So thank you again, Julyan. Thank you, and goodbye.