Thanks for inviting me. The second I got this invitation I realised that giving a talk online is new for me, and I was really struggling with how to do it. The reason I say this is that I will not be able to watch the chat while I'm giving the talk. So please, don't be shy, just jump in and ask questions. Interrupt me whenever you want.

Host: Okay, good. I'll monitor the chat as well, so don't worry about that.

Okay, thanks. So, some background. It used to be that academics had amazing theory, but it was really unclear how it related to practice. Machine learning is actually the opposite: we have a lot of heuristics that seem to work (we'll talk in a second about what "seems to work" means), but the theory behind them is relatively weak. So my group and I, and many other people in theoretical machine learning, are trying to build theory and understand why what works, works. And of course, once you learn why it works, you also learn when it works and when it doesn't. That's an important issue in machine learning these days, because people in industry push software and claim that it does whatever; but particularly when it is used for critical decisions, one has to be very aware of the limitations of the software. And of course there is the issue of bias and other social issues related to modern machine learning. But before we even get to those, there is a basic technical question: if you have trained your software on some data and you don't know what it is doing, then you should be very careful about what you promise when the software is sold or published. I think this audience doesn't need to hear that, but when I talk to more practical people I needle them by saying: if the only thing you know about your algorithm is that it works well on your benchmark, then that is the only thing you know about it, and you don't know anything else. In fact, any generalisation claim requires assumptions, and that is a reason to study the theory: without a model, you don't know what claims you can actually make about your software. So, my background is theoretical computer science.
I used to prove theorems and write papers that look very much like pure theory, and I was influenced, in part by my students, to move into this area of machine learning. The focus of what we do in our group is trying to build theory. The first step is trying to explain why things work, or why people believe they work, and when they work. And doing this rigorously is not just an intellectual exercise: when you really understand what you're doing, you often can get better algorithms. And again, it is very important to quantify the limits of what works.

In this particular talk I'll cover a very specific, very hot area in machine learning, which is called zero-shot and few-shot learning with weak supervision. This slide is usually at the end of the talk, but I want to emphasise one thing in it. I found out that the best way to do this kind of work is in collaboration with a very practical group. The first name you see above is a rising star in experimental machine learning, and this line of work came out of collaboration with him and his group. The others listed here are my graduate students, undergraduate students who helped with the experiments, and Christina, a postdoc who is shared between the two groups. So while you'll see theory in this talk, we'll also have some experimental results, and I want to be clear that the experiments were not done by me; our collaborators did all the detailed experimental work, and that's why we love working with experimental people.

OK, so what's this hot subject, zero-shot and few-shot learning? If you open the NeurIPS or ICML proceedings, you see results on it everywhere; it's kind of the subject these days. We all know that the big bottleneck in machine learning is data. The way we were taught machine learning is that you learn from examples and try to generalise, and in almost all applications, particularly when you go to deep learning, you need a lot of data. The quality of your results depends heavily on the quality of your training data, and data is not free: you often have to pay to get it, and sometimes you cannot get it at all.
Under this big umbrella of zero-shot and few-shot learning, the idea is to learn with no examples, no training data at all, or with very few labels. And it seems to have impressive results in practice. On the other hand, the whole idea of learning without data sounds like a free lunch; it sounds like complete nonsense. So it is obviously very challenging to try to build theory around it, and that is exactly what attracted our interest: at first glance the very definition looks absurd. How can you learn with no examples? Of course, the name is somewhat misleading. It is true that you learn with no training data, or very little of it, but you learn from some other information; you have all sorts of auxiliary information that you rely on in order to do the learning. Since it is a relatively new area, a lot of variants fall under this name, and one of the things we did is an abstraction of what is going on. We will talk about two approaches. One of them is few-shot learning, where the information comes from other classifiers that were trained to classify different, related things; we'll see the details of that. And later I'll talk a little bit about zero-shot learning, where the idea is that I have some information about attributes that relate to my target classification.

So let me give you some high-level examples before we get to the technicalities. Suppose someone has already trained almost perfect classifiers for tomatoes, ducks, aeroplanes and so on. You have these tools in hand, and now I tell you: I need to classify a new class. You seem to be able to use what you already know about what's a tomato and what's not, what's a duck and what's not, and so forth. So the idea is that I want to build on these existing classifiers in order to classify a new target. Imagine that the classes I have classifiers for and my target classification are related, and I want to somehow aggregate them in order to get a classification for the new target.

Another way this is done, which is almost the same but in practice looks very different, is this: I have images of various subjects, and I have features associated with these images; think of animals and their visible attributes. For each of them, using these images, I can learn how to detect features like black, white, brown, stripes, whether it has a tail, and so on. So from these images I can learn
classifiers for these features. Now I give you a new target, and I describe the target in terms of features. You already know how to find the features in an image, and the question is: have I given you enough information so that you can more or less accurately classify with respect to a new, never-trained target? This is the high-level idea behind all of this zero-shot learning; we'll make it precise in two stages. Since it is a new area, many papers use somewhat different formalisations, and one of our first contributions is a general mathematical formulation that covers them.

So, presenting it as a picture: here is some unlabelled data, and I want to classify a point x. I get some classifiers, but they are not for my target. There is a target class I want to classify with respect to; I never get a classifier for that target, but I get classifiers for related targets. These can be, say, the colour, or whether it is a raccoon, or other features you can find in x. The only information that you have about x is its classification according to these related classifiers, the features of related items, and I want to aggregate this information in order to produce the labelling of x according to the new target. Now, what extra information do we have? In the case of few labelled examples, the only thing that you know about the target is a few examples, much fewer than what you would need to actually learn a classifier for the target from scratch. That is the few-labelled-examples setting. The other setting is that you have no examples at all; instead, you get information about the target in terms of a matrix that gives you conditional probabilities relating the attributes to the target. We'll see three results; let's start with the weakly supervised one.

So now let's get a little bit more formal. We have a distribution D over a domain X, and we have a target classifier, which we do not know, defined on X. Then we have a set of labelled classifiers, but they are not classifiers for the target; they classify something else. Now, for each of these related labelling functions f_i, we use the few examples of the small training set in order to estimate the error of that classifier with respect to the target.
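To make that estimation step concrete, here is a minimal sketch (in Python with NumPy, neither of which appears in the talk; the variable names and toy numbers are my own illustration):

```python
import numpy as np

def estimate_errors(weak_preds, y_true):
    """Estimate eps_i = P(f_i(x) != y) for each weak labeler f_i.

    weak_preds: (m, n_labelers) array of weak-labeler outputs on the
                m labelled examples.
    y_true:     (m,) array of target labels for those examples.
    """
    return (weak_preds != y_true[:, None]).mean(axis=0)

# Toy usage: 3 weak labelers evaluated on m = 5 labelled points.
preds = np.array([[1, 1, 0],
                  [0, 0, 0],
                  [1, 0, 1],
                  [1, 1, 1],
                  [0, 0, 1]])
y = np.array([1, 0, 1, 1, 0])
print(estimate_errors(preds, y))  # [0.  0.2 0.4]
```

Note that only these per-labeler error rates are estimated from the expensive labelled data; everything else, as we'll see, comes from unlabelled data.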
To see what this means: say classifier f_i detects whether the image has a black patch, and y says whether the image is a fire truck; then the error of f_i is basically the probability that an image gets one answer from f_i and the other from the target. Again, the only information we ever need about x is its classification according to these related classifiers. So we are looking for a function that maps this vector (everything is binary here, for simplicity) to the target classification: we have a binary vector, and we want to map it to the target class. And the error, of course, is the probability over the domain that the true classification differs from what we get by applying the function to this binary vector.

OK, so here is the first thing we observed when we started; this was the first work we did in this area. In practice (for those who know, Snorkel is an example) what people basically do is treat this like crowdsourcing. What is crowdsourcing? Crowdsourcing is what Amazon Mechanical Turk does for you: you give images to a lot of people who label them by hand, you collect many labels, and somehow you have to aggregate them, because some of the labellers are more careful on some images than others. What is common to crowdsourcing is that all the sources are working on the same task. We ask them whether the image has a dog or not, and they try to answer that question; some do a better job than others, but basically they have the same goal: classify the image according to one particular target. So a reasonable assumption, used in crowdsourcing and in the standard aggregation methods, is that the errors, with respect to the true answer, are independent: the labellers are not coordinating in any way, and why should they be? They make mistakes because they are careless or tired, but the mistakes are independent. Can we assume the same thing here? Well, if I look at the event "the image has wheels but it is not a fire truck", is this independent of the event "the image has a black patch but it is not a fire truck"? Of course not; they are clearly correlated.
So this independence assumption that crowdsourcing relies on is important, and when it holds the gain can be significant. Assume each individual labeller has error bounded away from one half with respect to the target, and that the labellers' errors are independent. Then, by taking the majority, the error basically goes down exponentially to zero as you add labellers. But if they are not independent, that is not the case; in fact, in general you cannot do better than the error of the best individual labeller. So people asked: can we do better? The truth is that with a few weak labellers we can, if we think about it a little more carefully. You see that independence is not necessarily the easiest case for us, nor the one that gives the most information. Say that in this picture we have three classifiers whose error regions (of course, we don't know them) are pairwise disjoint, each with some small probability mass. If the error regions are disjoint, then at every point at most one labeller is wrong, so the majority is always correct. That's the good case: you don't have independent labellers, but the majority of them is perfect. On the other hand, here is the independent case: the error regions intersect, so each pairwise intersection carries positive probability, and if you take the majority, the result still has a constant error. So in fact independence is not so great if you want to take majorities. How do we leverage this phenomenon? You can probably see where we're going; look at this even worse example. Assume that we have classifiers f4 and f5 whose error regions are subsets of the error region of f3. If we take the majority of all of them, then on a large part of f3's error region a majority of the labellers is wrong, so the majority errs there. But if we eliminate these two classifiers, f4 and f5, whose errors are included in f3's, then suddenly the majority becomes perfect. Of course, we cannot literally do that, because we do not know the error regions of the classifiers with respect to the new target; we can see them in pictures, but in reality we don't know them. So our goal is this: we start with the full set of weak classifiers that we are given, and the idea is to pick a subset whose error regions are almost disjoint. We want to get to the good situation: out of a lot of classifiers, we want to take a few whose errors are close to disjoint.
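As an illustration of why the geometry of the errors matters more than independence, here is a tiny simulation (my own toy construction, not from the slides), contrasting disjoint error regions with the nested ones from the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 9000
y = rng.integers(0, 2, size=n)          # binary ground truth

def flip(y, mask):
    """Return labels equal to y except on the masked (error) region."""
    out = y.copy()
    out[mask] = 1 - out[mask]
    return out

idx = np.arange(n)
# Case 1: three labelers, each with 20% error, on pairwise-disjoint regions.
disjoint = [flip(y, (idx % 5) == r) for r in range(3)]
# Case 2: one labeler errs on a 20% region; two others err on subsets
# of that same region (the nested example from the slide).
big = (idx % 5) == 0
nested = [flip(y, big), flip(y, big & (idx % 2 == 0)), flip(y, big & (idx % 2 == 1))]

def majority_error(labelers, y):
    vote = (np.mean(labelers, axis=0) > 0.5).astype(int)
    return np.mean(vote != y)

print(majority_error(disjoint, y))  # 0.0: no point has two wrong labelers
print(majority_error(nested, y))    # 0.2: on the big region two always agree wrongly
```

The nested trio's majority is worse than its best member, while the disjoint trio's majority is perfect, even though neither case is independent.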
Of course, we don't know the error regions. But once we find a subset of classifiers whose errors are close to disjoint, we just take the majority as before. So the only new idea is that we take the majority of a well-chosen subset instead of the majority of the whole collection. But we don't know the error regions, and we don't know whether they are disjoint or not. What we do know is this: one pair of classifiers disagrees at more points, on a larger part of the domain, than another pair. These two classifiers basically agree almost everywhere, while those two disagree on a region of large measure. And the important thing is that to measure how much disagreement there is between two classifiers, you do not need labelled data; you can do it with unlabelled data alone, and unlabelled data is cheap: the expensive kind is labelled. So what happens now is that we leverage unlabelled data to get the information that helps us find classifiers with more or less disjoint errors, so that we can take their majority and get much better results.

OK, so let's get a little bit more formal. Again, we have the weak labellers, we have a lot of unlabelled data, and we have a small set of labelled data. For each labeller we have an estimate epsilon_i of its error with respect to the target, because we used the labelled data to estimate it. It is important that here we are estimating only a linear number of quantities: we estimate a limited number of errors. Then we estimate the matrix of pairwise disagreements d_ij, and these we can estimate using the unlabelled data. So now, what error would we get? If we choose a subset A out of all the classifiers and take the majority over this subset, then the worst-case error is the error of this majority, maximised over all possible target classifiers consistent with what we observed. We take the max over all possible configurations, in particular over all possible error patterns of the weak labellers; the only restriction is that they match the vector of errors epsilon_i that we estimated and the disagreement matrix d_ij. So the worst-case error is the max, over all configurations satisfying these epsilon_i and d_ij, of the error of the majority over the subset.
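This worst case is computable. For binary labels, two classifiers disagree at a point exactly when one of the two is wrong there, so for a triple the joint distribution of "who is wrong" lives on the 2^3 = 8 patterns, and the worst case is a small linear program. Here is a sketch (scipy-based, my own simplification; the paper's exact program, and its closed form for triples, may differ, and in practice the equalities would be relaxed by the estimation slack):

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_majority_error(eps, d):
    """Max P(majority of 3 is wrong) over joint error patterns consistent
    with individual errors eps[i] and pairwise disagreements d[(i, j)]."""
    patterns = list(itertools.product([0, 1], repeat=3))  # s[i]=1: labeler i wrong
    c = [-1.0 if sum(s) >= 2 else 0.0 for s in patterns]  # maximize via minimizing -c
    A_eq, b_eq = [[1.0] * 8], [1.0]                       # probabilities sum to 1
    for i in range(3):                                    # marginal error of labeler i
        A_eq.append([float(s[i]) for s in patterns]); b_eq.append(eps[i])
    for i, j in itertools.combinations(range(3), 2):      # binary: disagree iff exactly one wrong
        A_eq.append([float(s[i] != s[j]) for s in patterns]); b_eq.append(d[(i, j)])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return -res.fun if res.success else 1.0               # infeasible -> treat as hopeless

def best_triple(eps_all, d_all, n):
    """Brute-force the triple with the smallest worst-case majority error."""
    return min(itertools.combinations(range(n), 3),
               key=lambda t: worst_case_majority_error(
                   [eps_all[i] for i in t],
                   {(a, b): d_all[t[a]][t[b]]
                    for a, b in itertools.combinations(range(3), 2)}))
```

For example, with eps = [0.2, 0.2, 0.2] and all pairwise disagreements 0.4, the constraints force the error regions to be disjoint and the LP certifies a worst-case majority error of 0.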
So you can figure out that maximum error using a linear program, and the number of constraints is a function of the size of the subset A. So for a given subset we can compute the maximum error we might suffer. Again, I emphasise that we don't really know what the target can be with respect to the distribution, so the only thing we can do is take the max over everything consistent with what we observed. So for a particular subset A, I know how bad it can be, and I want the best such bound over subsets, so I look at the min over subsets of the worst case. Now I can optimise the subset I choose: for each candidate subset I know what the worst-case error would be. In other words, after all this, it is still not easy: the straightforward solution is exponential in the size of the subsets we consider. If you want large subsets, it just becomes more and more expensive computationally. But for three, if we are willing to restrict to subsets of size three...

Audience: Sorry, excuse me for interrupting. Could you go back to the first slide, where you introduce your variables? Yes, this one. First, I have a question about the problem definition: you define X with a distribution D. Is this the overall distribution of all the data sets of the different tasks?

It is the distribution of the inputs that we need to classify.

Audience: If I understood correctly, in your problem you consider different tasks as input.

Yes, the tasks are different, but we assume for simplicity that they are all defined on the same distribution. We could also handle transfer, but let's keep it simple.

Audience: Understood.

Think of a huge database of images: there are all sorts of things in the images, but the distribution is over all the images.

Audience: And in your analysis, do you consider any sample complexity, I mean a restriction on the number of, for example, unlabelled or labelled data, or is the result independent of the amount of data?

Well, the labelled data should be large enough that we get a relatively good estimate of the errors epsilon_i. So I need on the order of log(n) over epsilon-squared labelled samples in order to get, with high probability, a good estimate of each of the errors.
And for the unlabelled data, a similar bound, and unlabelled data doesn't cost us much.

Audience: OK, thank you.

OK, so: this solution is not fully satisfying for a computer scientist, because if I really want to optimise with respect to large subsets, it may be exponential in the size of the subsets I am considering. But for subsets of three labellers we can get an essentially closed-form solution; we don't even have to run the linear program. How good is it? Here we pause the theory for a second and get to experiments. This was done on a standard image benchmark, Animals with Attributes: you have all sorts of animals in the data that you need to classify. And what do we have? The blue curve, which I'll explain, is what we get with three weak labellers. The other two curves are extensions, which we have not fully analysed, beyond three, where you add more and more weak classifiers. What you see here, which of course is not a theorem, is that on this particular data set three is as good as many: if you choose the optimal three weak classifiers, you already extract as much information as if you used more. But again, this is just one data set.

Audience: Can I just ask, what happens if you only use two labellers? Does this still work?

I was going to talk about that later, but here you go. What you see here is the error as a function of the number of weak labellers, for our method and for the baselines; you want more than just the worst coalition in order to get useful results, and we'll see that when we get to the more sophisticated algorithm. The yellow curve is the standard crowdsourcing-style aggregation, which is also used in this context, and which basically assumes independence. If you assume independence and you don't have too many labellers, you get much worse results in our example. And again, this is not a theorem, but when you go to many more labellers the independence-based majority seems to perform much better, so maybe with many labellers it starts behaving as if the errors were independent. But even with a lot of labellers, and with very heavy computation, it still loses to an algorithm that just picks the best three and takes their majority.
OK, so that was the first result, but it is very limited. It was one of the first papers in this area, but it handled only binary classification, only the zero-one loss function, and it has a number of other limitations. So the next work, which I'll discuss a little, gives a more general framework for learning with weak labels. Now we can use other loss functions, and we can also do multiclass classification, meaning you classify between more than two classes.

Here is the high-level idea, and this is really what I want to emphasise. We have a set of unlabelled data points that we want to classify. Let the big set, the green one, be all the possible labellings: each point y in it is a vector of labels, one label for each unlabelled point. At the start, every labelling is possible. Now we are going to leverage a small amount of labelled data in two ways. First, I use the labelled data to get an estimate of the error of each of the weak classifiers with respect to the target; I can do that because the labelled data is labelled with respect to the target. So the loss of f_i is its expected loss with respect to the target. (There is a typo on the slide: this term should be over here.) Second, given a weak classifier f_i and a candidate labelling y of all the unlabelled data, since we know the values of f_i, we can estimate the error of f_i with respect to y: we just compare f_i against y. So again: here we estimate the error of f_i with respect to the target, and here we estimate the error of f_i with respect to a particular labelling vector y. Why do we do that? We'll get to the formal statement later, but the intuition is that on the correct labelling y of the unlabelled data, the error of f_i should be about the same as the error we measured for f_i on the labelled data. So, using the labelled data, we got an estimate of the error of f_i with respect to the target.
And if we generalise: for the correct labelling y of the unlabelled data, we expect to see the same kind of error for f_i as we saw on the labelled data; if f_i has a certain error with respect to the target on the labelled data, it should have about the same error on the correct labelling of the unlabelled data. So each of the weak labellers gives us a subset of the space of labellings, the set of labellings on which f_i has about the same error as it has on the labelled data, and the true classification has to be in the intersection of all these subsets. Questions? I know that was quick; we'll see it again.

OK, so let's define the feasible set of possible solutions that we consider. To be more formal, it is a set of vectors y which assign, to each unlabelled input and each of the k possible classes, a probability; so we are doing soft classification. For any particular input x, y gives a vector of length k with the probabilities that x belongs to each of the k classes. Now, if y is a feasible solution, then we expect the error of f_i measured against y to be the same as, or close to, what we saw for f_i on the labelled data. That value we estimated from the labelled data, so we require that the value on y match the estimated value up to plus or minus some slack gamma. If we choose the slack correctly, then with high probability the true solution has to be inside this set, and this set is basically the intersection, over the labellers i, of the sets of y that satisfy this inequality.

Audience: So you take the same error margin gamma for every classifier f_i?

That's one possibility. Since we don't have any better information, there is no reason to treat the labellers differently. If we knew that the errors of the labellers had different distributions or something, then we could adapt, but I don't have that information; this is worst-case analysis, really analysing using what I have. So basically I now build a set which hopefully is much smaller than the set of all possible labellings, and with high probability the correct labelling is inside this set, the intersection of all the sets defined by the individual labellers. Questions so far?
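Here is a minimal sketch of the feasibility check, with a Hoeffding-style slack of the kind the next question is about (the constants, the union bound, and the use of the 0-1 loss are my simplifications, not the paper's exact choices):

```python
import numpy as np

def hoeffding_gamma(m, n_labelers, delta=0.05):
    # Slack so that all n_labelers error estimates from m labelled points
    # hold simultaneously with prob. >= 1 - delta (Hoeffding + union bound).
    return np.sqrt(np.log(2 * n_labelers / delta) / (2 * m))

def is_feasible(y_soft, weak_preds, eps_hat, gamma):
    """y_soft: (n, k) soft labelling of the n unlabelled points.
    weak_preds: (n, n_labelers) hard weak-labeler outputs in {0..k-1}.
    Feasible iff every labeler's expected 0-1 loss against y_soft is
    within gamma of its error eps_hat[i] estimated on labelled data."""
    n, k = y_soft.shape
    for i in range(weak_preds.shape[1]):
        agree = y_soft[np.arange(n), weak_preds[:, i]]   # P_y[f_i correct], per point
        loss = 1.0 - agree.mean()
        if abs(loss - eps_hat[i]) > gamma:
            return False
    return True
```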
Audience: Excuse me, and here m is the number of labelled data points, yes? In the definition of gamma, the m is the number of labelled examples?

Yes.

Audience: And if m goes to infinity while n stays constant, then gamma stays a constant? But if m goes to infinity, you are back in the standard supervised setting, so we would expect gamma to go to zero.

Yes, exactly; you're right. This form of the bound keeps gamma constant in that limit, but you can use Hoeffding, or a variant of it, and get a bound that behaves correctly. That's a really good point.

Audience: OK, thank you.

OK. So we know that the true solution will be inside this feasible set. In a perfect world it would be a single point; in reality, of course, it is going to be a set, hopefully not too large, and we quantify that in a minute. The next thing we do is say: suppose I choose a particular hypothesis h_theta, some function with parameter theta. So we have a set of feasible solutions and a parametric family of functions, and we basically want to do minimax optimisation on the loss. What does that mean? We take the minimum over theta, but then we have to take the maximum over y, over all the feasible labellings. First, assume that we fix theta. Then finding the maximum over all the feasible y can again be done with a linear program; that's not the issue. The hard part is the minimum over theta. Even if we assume convexity, we still have the problem that the maximum of the inner functions may not be differentiable, so we cannot just run gradient descent. I won't go into detail, that's a completely different subject, but there is this machinery of subgradient methods, which basically gives you a usable descent direction when you cannot assume differentiability. And you can show that in a number of steps that is basically polynomial in the relevant parameters, you get a theta-tilde that is very close to the minimax solution.
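To give a feel for that optimisation, here is a toy projected-subgradient sketch in which, for simplicity, the feasible set is a finite list of candidate labellings and the model is linear with squared loss; the real method maximises over the continuous feasible set with an LP at each step, so everything here is my own simplification:

```python
import numpy as np

def minimax_fit(X, candidates, steps=200, lr=0.1):
    """min_theta max_{y in candidates} of the mean squared loss.
    X: (n, d) features; candidates: list of (n,) labellings (feasible set).
    The max of convex losses is convex but not differentiable, so we use a
    subgradient: the gradient of whichever candidate is currently worst."""
    n, d = X.shape
    theta = np.zeros(d)
    for t in range(steps):
        preds = X @ theta
        losses = [np.mean((preds - y) ** 2) for y in candidates]
        y_worst = candidates[int(np.argmax(losses))]       # inner max over y
        grad = 2 * X.T @ (preds - y_worst) / n             # subgradient at theta
        theta -= (lr / np.sqrt(t + 1)) * grad              # diminishing step size
    return theta
```

The diminishing step size is the standard trick that makes subgradient methods converge even though the objective is only piecewise smooth.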
By the way, this subgradient descent method is a subject of its own; I think you can trace it back quite far, and it is very practical whenever you cannot assume that everything is differentiable. So, cutting a long story short, we can get a parameter theta-tilde such that h of theta-tilde is almost a solution of the minimax problem. When can we do this? We need the feasible set to be convex, and we need the loss function to behave smoothly; in particular we do it for Lipschitz-continuous loss functions. You can choose other restrictions, but you need some smoothness, and in particular convexity, to get anywhere. Popular loss functions that satisfy these conditions include the softmax cross-entropy and the Brier score, and some other loss functions listed in the supplement.

Now, one word about the relaxation that I didn't elaborate on before: we gave a solution which is a soft classification, but you would actually like to know how far this is from a hard classification. If I just round to a hard label, the effect can also be bounded directly, and we can actually get a pretty good bound on that; the bound is related to the guarantee that we get from the subgradient method. The complexity, of course, has to depend on the right parameters, otherwise you could beat it with a covering argument; something like this has to be there, and then you have the whole bound.

And the last thing I want to emphasise is that this kind of analysis gives you related information that is very interesting on its own. This quantity is a measure of the size of the feasible set, and the size of the feasible set basically tells you how good the information you gave me was. If you gave me weak labellers that are useless, then my feasible set is essentially the whole space; if you gave me very good related labellers, so that together they really pinpoint the target, then the feasible set gets small. So it is a way to measure the quality of the weak supervision. And this takes us to the last part, if you allow me a few more minutes. Can I?

Host: Yes, I think that is fine. If someone needs to leave and would like to ask an urgent question, maybe we can give them the chance to ask it now. That doesn't seem to be the case; nobody is speaking up, so please do continue.

Thank you. OK, thanks. So again there are experiments, and the only thing I want to show is this; here we use the same benchmark again.
The method handles multiclass classification, but there are very few algorithms to compare against in the multiclass setting. So first we look at what happens when we run this general algorithm just on binary classification, and remember that the baseline is what we did before. What you see here is that we do get somewhat better than the algorithm that just takes majorities: even for binary classification, this more careful analysis gives you better results. And these are results for non-binary classification; I don't want to go too deeply into them.

OK, so the last thing I want to show, and I really like this part, is that we looked at zero-shot learning, which is really the ultimate hype these days. We wanted to ask the following question: what information do you need to give me so that the result you give me is meaningful? So again, what is zero-shot learning? You get a lot of images, and from the images you learn attributes; think of the attribute detectors as classifiers. You learn whether the animal has a mane, has a certain kind of face, has stripes, whatever. So from the images, what you learn is how to find these attributes in images. Your input is, first of all, a technique for identifying a set of attributes in images; you can think of it as related data that is already classified, which you use in order to train the attribute detectors. Now, we don't have any examples, any training data, for the target itself. So in order to pin down the target, what we get is a matrix that relates the attributes to the different classes we want to distinguish between: a matrix that basically says, if this is a zebra, then with this probability it will have stripes, and with this probability you will see a tail. Remember that images are confusing: a zebra might have a tail, but not every image of a zebra will show the tail. So you get this matrix of, basically, conditional probabilities.

So now we ask: how do I use this? Previous work looked at it the following way. Here is the target; here are the features we know how to find; here is the error in finding these features in images, that is, how well we detect each feature; and here is the mapping, the matrix, that relates the features to the classes of the target classification. So basically: this is the input, this is a process that extracts features from the input, and this is the process that maps the features into the target classes.
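For orientation, here is roughly what a standard pipeline of this kind does (a direct-attribute-prediction-style rule; the code is my own sketch, and note that it multiplies the per-attribute conditionals, i.e., it implicitly assumes the attributes are independent given the class, which is exactly the gap this analysis is about):

```python
import numpy as np

def zero_shot_predict(attr_scores, A):
    """attr_scores: (d,) detector outputs in [0, 1] for the d attributes
    of one image.  A: (k, d) class-attribute matrix, A[i, j] = P(attribute
    j present | class i).  Returns the class with the highest likelihood
    under a naive conditional-independence assumption."""
    eps = 1e-9
    a = (attr_scores > 0.5).astype(float)          # hard attribute decisions
    loglik = (a * np.log(A + eps) + (1 - a) * np.log(1 - A + eps)).sum(axis=1)
    return int(np.argmax(loglik))

# Toy: zebra vs. horse with attributes (stripes, tail, four legs).
A = np.array([[0.90, 0.6, 0.99],    # zebra
              [0.05, 0.6, 0.99]])   # horse
print(zero_shot_predict(np.array([0.8, 0.7, 0.9]), A))  # -> 0 (zebra)
```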
Now, a previous published paper basically showed the following, mostly experimentally: if the matrix A is essentially the identity, then sure, you can classify; everything works great. On the other hand, if the attributes are completely orthogonal to the classes you want to classify, then of course there is nothing you can do. To make that concrete: if your attributes are "has four legs" and "has a tail", you can't get much out of them when you want to distinguish between a zebra and a horse. So that was the state of the art, and we wanted a more detailed, quantitative answer.

What we assume here is that the first phase is fine: given an image, you detect the attributes accurately. Analysing that phase is a different problem, but assume it works. What we really want to understand is the error in the mapping between the features, the attributes, and the classes, because this is the heart of the whole method: do the attributes that I can see carry enough information to actually distinguish between the classes I need to distinguish, the targets? Most of the classical work completely ignored this question, the question of whether the attributes you see might not completely determine the targets you want to classify.

OK, so here is the formalisation. Again we have a domain X with a true classification, and again we have the set of attributes, each mapping the domain to zero or one: "has stripes" or not, and so on. And what we get is this matrix A, the class-attribute matrix, where A_ij is the probability that attribute j equals one, conditioned on the target classification being class i. In other words: what is the probability that when I have an image of a zebra, I actually see the tail? That is the matrix. Now I look at all possible distributions; I don't know anything about the input, so I have to consider every possible joint distribution between the attribute vectors and the k classes.
Out of all the possible distributions, we restrict to the distributions that satisfy the conditions defined by the matrix. What does it mean to satisfy them? The probability of seeing attribute j, conditioned on a given class, has to equal the entry of the matrix: summing over all attribute vectors that contain attribute j, the probability has to come out to A_ij. So the marginals of the distribution must match the matrix; that defines the set of distributions consistent with it. And for a particular classification function, a particular function from the attribute vectors to the classes, we can of course define the error of that function over a given distribution. Now, as before, what we want is a lower bound, and the lower bound is this: take the max over all distributions that satisfy the conditions defined by the class-attribute matrix, and inside it the min over all functions from attribute vectors to classes, of the error of that function on that distribution. Thinking about it in probabilistic terms: we know the marginal distributions, the probability of each attribute conditioned on each class, but we do not know the correlations between the attributes. So basically we look for the worst-case distribution subject to the marginals that we know: for the worst case we take the maximum error over all distributions satisfying these marginals, and then, for that particular distribution, we assume we choose the best classifier.

So far, this is just theory; the real point is that we can actually compute this quantity. And why is that important? Currently, you take a zero-shot algorithm, you run it, you get an answer, and you don't have a clue how good it is; you don't have a clue how much you should trust this answer, because there is no error bar associated with it. What we do here is give you some handle on how much you should trust it: what is the risk in using this input and this output? Why is it the risk? Well, you might be in a better situation, we don't know; but in the worst case, this is the error that you can suffer.
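Here is a small sketch of computing that worst case exactly for tiny instances; everything in it is my own construction (a uniform class prior is assumed, binary attribute vectors are enumerated, and scipy's LP is used). The key fact making it a linear program is that the Bayes error of a fixed joint distribution is sum over attribute vectors a of min over classes c of P(a, not c), a concave piecewise-linear function of the joint:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def zero_shot_lower_bound(A):
    """A: (k, d) class-attribute matrix, A[i, j] = P(attr j = 1 | class i).
    Max over joints consistent with these marginals (uniform class prior
    assumed) of the best classifier's error."""
    k, d = A.shape
    vecs = list(itertools.product([0, 1], repeat=d))   # all attribute vectors a
    nv = len(vecs)
    # Variables: p[a, c] for each (a, c), then t[a] = min_c sum_{c' != c} p[a, c'].
    n_p, n_t = nv * k, nv
    c_obj = np.concatenate([np.zeros(n_p), -np.ones(n_t)])  # maximize sum_a t_a

    A_eq, b_eq = [], []
    for i in range(k):
        row = np.zeros(n_p + n_t)                      # class prior: sum_a p[a, i] = 1/k
        for ai in range(nv):
            row[ai * k + i] = 1.0
        A_eq.append(row); b_eq.append(1.0 / k)
        for j in range(d):                             # attribute marginals
            row = np.zeros(n_p + n_t)
            for ai, a in enumerate(vecs):
                if a[j] == 1:
                    row[ai * k + i] = 1.0
            A_eq.append(row); b_eq.append(A[i, j] / k)

    A_ub, b_ub = [], []
    for ai in range(nv):                               # t_a <= sum_{c' != c} p[a, c']
        for i in range(k):
            row = np.zeros(n_p + n_t)
            for c2 in range(k):
                if c2 != i:
                    row[ai * k + c2] = -1.0
            row[n_p + ai] = 1.0
            A_ub.append(row); b_ub.append(0.0)

    res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return -res.fun

print(zero_shot_lower_bound(np.eye(3)))          # 0.0: attributes determine the class
print(zero_shot_lower_bound(np.full((3, 3), 0.5)))  # ~0.667 = 1 - 1/k: pure guessing
```

The two extremes reproduce the earlier observation: an identity-like matrix gives a lower bound of zero, and attributes that are uninformative about the classes give a bound of 1 - 1/k.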
In other words, you might have a better input, but the worst-case distribution will give you this error, and since you do not know the distribution, you might as well assume that this is the error you will get when you use this method. And we can also show that this bound is tight; to show tightness, one has to work with randomised classifiers in order to get matching minimax results. So in other words: you give me the standard zero-shot input, a way to detect the attributes and a mapping from the attributes to the target classes, and that is all I have. I ask: if I use what you gave me, and I use it in the optimal way, what can I guarantee about the correctness of the results? And what we show is that if you know nothing more about your distribution, about your input, then you cannot guarantee better than this lower bound on the error. Of course, there may be cases in which you do better, but you cannot be sure of it, because you do not have the information. It is easy to extend this to the case with a few labelled examples as well.

Now, to see it in action, I just want to say one word here. This is the lower-bound result, again on the Animals with Attributes data, and these are various algorithms; the errors of all these algorithms are indeed above the lower bound. Another way we use this, which I didn't talk about in detail, is that we can actually use the same analysis to figure out which among the k classes are hard to distinguish: it might be that you have a large error only because two particular classes are not distinguishable, while all the other classes are classified well. There is a way to quantify which classes are the ones that give you the error.

OK, I think I took too much of your time, so I'll stop here. Any questions?

Host: Well, thank you very much. First of all, it's a very nice talk.

Thanks.

Host: I'll just stop the recording now.