Okay. Good afternoon, ladies and gentlemen. Let's begin. My name's Mike Wooldridge. I am Head of the Department of Computer Science, and it's my very great pleasure to welcome you to the Hilary term Strachey Lecture.

As always when we begin the Strachey Lectures, I'd like to acknowledge the financial support of Oxford Asset Management, who sponsor these lectures. Since their sponsorship we have been able to step up a gear in terms of the kind of lectures that we're able to offer. Getting into this lecture theatre is not in fact free - we have to pay for it - so being able to use fantastic facilities like this is only possible because of Oxford Asset Management's support. Nevertheless, we did get into a feature film. I hope you've all watched AlphaGo, the movie. If you haven't watched AlphaGo the movie, you should go away and watch it. It's on Netflix, and the Strachey Lecture from exactly two years ago actually appears at the beginning of the movie. Jon, I'm sorry, I can't promise you're going to be in a feature film today.

But nevertheless, it's my enormous pleasure to welcome Jon Crowcroft to give this lecture. Jon is the Marconi Professor of Communications Systems at the University of Cambridge, where he's a fellow of Wolfson College. It's difficult to summarise Jon's work briefly, because he is simply so tremendously active. But in terms of the UK he is, I think it's safe to say, the UK's leading academic in the area of networks, and he's been setting that agenda now for at least three decades, with a tremendous track record going back to some of the early work in networks. So when we think about digital immigrants and digital natives: I'm a digital immigrant, because I grew up in a world without the Internet; my kids are digital natives, because they grew up with it just all around them. But Jon was a true digital pioneer - I mean, he was one of the people who was creating the protocols that just make the whole thing work. He's a Fellow of the Royal Society. I'm a Facebook friend of his, and one of my daily pleasures is seeing updates from Jon; they are the most entertaining updates that you can imagine on Facebook. And what can I tell you from following him on Facebook? I can tell you that he likes music and pubs, so you should really move to Oxford, because you'd fit in really, really well. Jon, it's my great pleasure to welcome Jon Crowcroft to give the lecture. Thank you, Jon.

Okay. So settle down. I have 92 slides and we have about 50 minutes.
My mistake - I thought I had two hours, then I thought I had an hour and a half, and on the train I realised I had 50 minutes. So keep up; there will be a test at the end.

So this is work in two projects, both between Cambridge and Imperial. One also has partners at Nottingham University, and the other has partners at the Turing Institute too, where I spend half my time, in London, where they also have pubs and music. Although I'd say Oxford definitely has the edge on Cambridge on both of those counts.

So this is an area we've been mucking around with for a while, and it's something that I think impinges on all of us. Out there in the cloud, there are two broad classes of data. There's all the stuff that we voluntarily stick on Facebook or Twitter or Instagram or whatever - all of that, you know, magically emerges from the mobile devices in our pockets and so on. And then there's the stuff held by people who have large bodies of curated data, like the National Health Service, who may be working with DeepMind, or a financial institution trying to do fraud detection or, you know, figure out the next Facebook and so on. So there's this large dataset over here, which is kind of curated and comes from some large organisation, and then there's this highly decentralised, sort of crowdsourced bunch of data over here, which then gets centralised by some of these agencies.

So what we've been staring at is the issue of privacy, and the fact that all this data is being put somewhere to do analytics on it - to, you know, maybe monetise it (maybe you are the product, for Zuckerberg), but maybe also to work out something that's a better diagnostic tool or predictive tool and so on over healthcare data. Which could be making money as well, or something for social good - that would be figuring out, you know, what's the best investment in the future of universities in terms of training and education. Maybe it would be to keep our pension the way it is. Oops - political. Okay, that's the last thing I'll say for now on that topic.

So there are two pieces to this talk, and I'll probably run out of time just at the end of the first piece - apologies for that. The slides will be available, I think, if you're interested; in fact they're linked off my home page as well, so the second bit will be in there too. The first bit is about privacy-preserving analytics in a centralised cloud.
This comes out of the work at the Turing, principally with Peter Pietzuch's group in large-scale distributed systems at Imperial College London, and folks in Cambridge. What we're interested in is this large amount of curated data. It could come from partners we have worked with, like NHS Scotland: 1.6 million patient records, with upwards of 10,000 variables kept about every patient. And this stuff is naturally, in some senses, centralised across data centres in hospitals. There are people in Oxford working on this stuff, doing some computer science here, and other folks doing cool things with it. The other sort of large amount of centralised data is financial. HSBC is a partner in the Turing Institute; I think about 20% of the world's transactions go through their systems, and a large fraction of UK transactions, and they're interested in, you know, predicting what's going to happen in tomorrow's trading book, but also in detecting fraud, because they're required to look at it, and so on.

So there are motives for moving data from their private data centres into the cloud, which is effectively a cost-saving exercise. And in fact there are things they may need to do that they can't afford: I was surprised that HSBC said that some of the things they're going to be required to do for fraud detection are actually beyond their capability, not just in terms of the number of people, but in computation. But if you look at the cost of renting those resources on servers, it comes out somewhat better, because of a number of huge scale-up properties - like amortising power supplies and cooling and efficiency, operational cost reduction from having multiple customers, and also statistical multiplexing of the resource. So there are lots of reasons, and there are other motives for putting things into the public cloud.

Okay. So there are issues here about legality, which I won't go into - there isn't really time, and the experts on some of this are actually pretty near here; a bunch of really good people have written about this stuff. But there are rules about where you do cloud processing on PII - personally identifiable information - in particular in healthcare. Financial data is very strictly controlled in the US as well as in the EU, and the UK is in line with that, and so on. But there are some practical things. Even if you stay within a national jurisdictional boundary, you want to keep your data encrypted in storage, and you want to encrypt it when you transfer it. And post-Snowden, that's been fairly standard.
If you buy storage on Amazon or Google Cloud or whatever, then it's encrypted at rest, and people default to encrypting in transfer. But you'd like to go a bit further: you'd like it to be encrypted during processing, because there are threats. I'm going to talk about those threats, because that's what we're trying to mitigate. Okay, so what on earth would the threat be when you're processing data that's coming off the disk, going to the CPU? What could possibly go wrong?

There are also a whole bunch of other things I really don't have time to go into, like key management across multiple organisations - a horrible, huge, massive problem; it always has been, and I don't see any end to that. But the bottom line on this slide is the word "enclave", and secure enclaves are something you may have come across. It showed up, I think, in the FBI versus Apple fight over a terrorist's iPhone. Apple was like: oh, we can't actually decrypt this phone for you, sorry; it actually isn't doable by us, at least not in an affordable way. In fact it turns out there were some obscure workarounds for that - it doesn't matter about the details, but the reason is just to do with the kind of technology they use for where keys are kept. But even if you had been processing things on the processor that iPhones use, which is a kind of ARM variant, there's technology for running a TrustZone, as they call it, and that's a cool thing. And Intel have a similar thing, which is what I'm going to talk about, called SGX - Software Guard Extensions to the Intel processor - which you could use to guard what's going on, in some senses, during processing, okay, up to some limit. AMD have another technology which is halfway between what Intel and ARM do. CHERI is a Cambridge-specific thing which we do - we built some hardware which does a simpler and better thing - but I don't have time.

GDPR is the legal background. I don't have time to go into the legal background, but this is why you care: you need to make your best effort at keeping people's data secure. If it's healthcare or it's financial and it has PII in it, then you are into very serious fines if you get things wrong. I mean, we're not just talking about, you know, a slap on the wrist and $50,000; we're looking at 5% of the gross for your company, per year, while you're still doing the wrong thing. Not funny.

Okay. So the project we have at Imperial and the Turing is called Meru, for obscure reasons. And what we're interested in is trying to see if we can do analytics in SGX, in this extension to the Intel processor.
So we're going to dive into a bit of detail about how we're doing that. I'm going to talk a bit about trustworthy data processing in an untrusted cloud - that's kind of the starting point here. So we're looking at people who have a lot of curated data; it's high value; the bad guys out there will want to try and attack it; and it's in central locations for good reasons. You might want to do machine learning over this data: you might want to, you know, create a Bayesian model of the data, or do some interesting image processing over all the retinal scans, all the images - a whole bunch of things you might want to do. And then, having trained up those systems, you might give them to GPs, so that when you walk in with some extra symptom, the GP runs the thing on you. No privacy problem at that point - you've got a direct relationship - and they tell you you need to go to the hospital right away for an eye operation, because we might even have a model (which we have done) which will predict that if you don't get in by Tuesday, Wednesday will be a day too late. Which is, you know, the sort of thing that is high value to people.

Okay. So we're going to have a look at what the underlying problem space is, a bit of an overview of SGX and so on, and then how we map a machine learning or data analytics platform onto SGX. And at the end of this I'll try and remember to say what the shortcoming of all this work is.

So, trustworthy data processing. The cloud has taken off because a lot of people kind of trusted the cloud provider, and the cloud provider didn't trust the people - the users. So the traditional model is that you've got a sort of trusted operating system and hardware, and the cloud providers go out of their way - they have very, very good processes. If you go and visit Google, or Microsoft at an Azure site, their processes for managing things mean you don't get, you know, an administrator login, or to sudo anything; that just doesn't happen. They have really, really good physical access control; they're really pretty good about that stuff. And what they're trying to do is, you know, isolate users from each other. They use virtual machines - and this is where I came in, way back when in Cambridge, when we were building a hypervisor called Xen, which is widely used in Amazon and other places - for running multiple guest operating systems, virtual machines, and you get protection between those. Or do you? Users trust their application, but why should they trust the cloud provider?
So historically, back in the day when the hypervisors first shipped and were in use, there would be a CERT alert - literally an alert - about once a week on the hypervisor. That means a vulnerability existed such that some guest could run an app which could go and look at all the memory in all the other operating systems via a vulnerability in the hypervisor. So at that point you fix the bug and you reboot a billion virtual machines, at which point a lot of customers get a bit annoyed, right? But if their data was sensitive, that vulnerability exploit could have been the bad guy reading all of an MP's health records when they presented with weird symptom X and then publishing it to some scurrilous newspaper - or worse, you know, attacking the entire financial system tomorrow by fiddling with some of the numbers. So this is an issue.

So a solution for this might be to run a trusted execution environment: to have some kind of way of supporting isolation such that the hypervisor and operating systems do not have the privileges to read across these application domains. So, long story short, Intel built something to do this, and way back when ARM built this. It's a little easier on ARM: people familiar with, you know, RISC processors and ARM will know it's a highly regular architecture, and adding some new thing to it in a coherent way is doable. ARM also have a pretty nice formal model of their systems, so when they add a feature they can figure out the consequences. For Intel it's incredibly complex - they have this thing called a microarchitecture, which leads to all kinds of problems.

But the idea here is that you essentially reverse this whole structure, where the user runs their application process and it trusts the operating system, which has more privilege, which talks to device drivers and storage and so on, and then you run a hypervisor that has even more privilege because it can see all these OSes. You flip that around and say: no, the application can enter an execution domain, if you like, which is sandboxed by the hardware. Okay, so that's what SGX, or an enclave, is sort of supposed to do - a trusted execution environment; there are lots of different ways of coming at this, and there are several other pieces you need. But the idea is that this potentially saves you from vulnerabilities in a device driver, or the OS, or even in library code - perhaps in the runtime - breaking things for you. Potentially. So that's shipped on various recent Intel processors.
And as I say, there's an equivalent on ARM processors, an equivalent from AMD and so on; in fact the RISC-V project also has a design for an equivalent. So there's a kind of marketplace in these things, which, as we'll see, has a bit of a problem.

Okay. So SGX is this trusted execution environment. And so now you're not trusting the OS any more. Your code starts off by entering an enclave somehow, and this can provide confidentiality and integrity. The integrity there is checks on: are you really running on SGX? Is this really, actually, running on this processor? And if you're talking to another processor, is the other processor able to tell that you are who you say you are as well? So there's a whole bunch of other technology in here, but basically you have enclave code and data, and thread support, in this particular sort of world.

So it's an extension to what is already a very complicated instruction set architecture. Anyone who's ever read an instruction set architecture book - I think the last one I read all of, and understood, was the PDP-11's. The ARM one is just about double that. If you were teaching computer science, you know, processor architecture 101, you might teach from Hennessy and Patterson's fantastic book, which would be about 32 lectures to get about halfway through - and that's with the MIPS processor, which is quite simple compared with any of this. But anyway, Intel, God bless them, have added these confidentiality and integrity checks for going in and out. But also, crucially, as I mentioned, the bottom line here is that you want your data to be encrypted in storage - on disk, SSD - and a lot of the time in transfer over open networks, over links. And now this supports encrypted memory. So if you're sitting there, having read about recent events, going "but what about the cache?" - no, this is about encrypted RAM: there's a memory controller which can do encryption and decryption of fetches from RAM. Okay. So that's the sort of first piece of SGX: there's some extra protection, and there's some magic associated with each and every separate Intel processor shipped, which holds a bit of the keys for doing this. Okay.
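To give a flavour of what those integrity checks mean in practice, here is a minimal sketch of the "measurement" idea - only an analogy using OCaml's stdlib Digest (MD5), not SGX's actual SHA-256-based measurement machinery: the hardware hashes the code and data loaded into the enclave, and a verifier compares that hash against the value it expects before trusting the enclave with secrets.

```ocaml
(* Analogy only: an enclave "measurement" as a hash of loaded code. *)
(* Real SGX takes a SHA-256-based measurement as pages are added;   *)
(* here we just hash strings with the stdlib Digest module.         *)

let measure (code_pages : string list) : string =
  Digest.to_hex (Digest.string (String.concat "" code_pages))

let verify ~expected code_pages =
  String.equal expected (measure code_pages)

let () =
  let enclave_code = [ "page0: init"; "page1: decrypt-and-map-f" ] in
  let expected = measure enclave_code in      (* published by the developer *)
  (* Later, before provisioning secrets, the verifier re-checks it. *)
  Printf.printf "measurement: %s\nmatches: %b\n" expected
    (verify ~expected [ "page0: init"; "page1: decrypt-and-map-f" ]);
  Printf.printf "tampered matches: %b\n"
    (verify ~expected [ "page0: init"; "page1: exfiltrate" ])
```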
And this is just - I really don't have time to go through code examples, but this is just, you know, a piece of code that runs outside the enclave and a piece of code that runs in the enclave. The enclave code gets a message which the user shipped, say, off disk or off the network; then that code can safely decrypt it, do some processing in the middle, encrypt the output, copy the message to the output result buffer, and so on. And there's an interface to this: there's a sort of enclave enter and exit, and there are also the ingress and egress calls into the system. When you construct the enclave in the first place, you need to know your code got there safely, and that's done sort of a page at a time - move the code in there - and at the end of that there's an enclave measurement process, where the CPU can calculate a measurement hash and then say: have we got the right code there? Are we talking to the right thing? And the second piece of this, apart from local attestation, is remote attestation: two different enclaves talking to each other and attesting remotely. Those are pieces I'll just skip over, because they're fairly standard crypto protocols for doing that kind of thing. If you think in terms of - has anyone ever used secure email? - it gives you non-repudiation of who sent this to you and who was to receive it, and so on. That's kind of what you're getting from that. Hardly anyone ever uses secure email. Very strange. But anyway. Okay, so that's what that's all about.

There are some interesting limitations which very, very much matter. The one at the top is the amount of memory you get encrypted: on the current SGX Intel processors it's extremely limited. You really don't get very much. By my standards, with the PDP-11 - a couple of people in the room here might know - the first sort of LSI-11 had about 56 Kbytes of usable memory, and we used to use that for multiple users running Version 6 Unix, for login over the Cambridge Ring. It was fine with that much memory. So even in this terribly impoverished world, in terms of encrypted memory you're only going to get about 90 meg. And of course everyone's probably used to writing analytics programs where you glibly throw together some small core application and go: oh, it doesn't matter, I've got four gigs on my laptop, and if that's not enough I can put in 16 gig, and I can run it in the cloud and it offers terabytes of whatever. And there are also some overheads going in and out of the enclave that are non-trivial, particularly if you exceed the memory and you start paging.
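As a rough sketch of that "decrypt, compute, re-encrypt" shape - plain OCaml rather than the real SGX SDK, with a toy XOR keystream standing in for proper authenticated encryption and `process` standing in for whatever analytics runs inside the enclave:

```ocaml
(* Sketch only: the trust boundary of an enclave call.            *)
(* Plaintext exists only inside [in_enclave]; callers outside the *)
(* enclave only ever see ciphertext.  The XOR keystream below is  *)
(* a stand-in for real authenticated encryption (e.g. AES-GCM).   *)

let xor_crypt ~key (buf : bytes) : bytes =
  Bytes.mapi
    (fun i c ->
      Char.chr (Char.code c lxor Char.code key.[i mod String.length key]))
    buf

(* Untrusted side: only ever handles encrypted buffers. *)
let untrusted_load_message () : bytes =
  xor_crypt ~key:"secret" (Bytes.of_string "patient record 42")

(* Trusted side: runs inside the enclave. *)
let in_enclave ~key (encrypted_input : bytes) ~(process : string -> string) :
    bytes =
  let plaintext = Bytes.to_string (xor_crypt ~key encrypted_input) in
  let result = process plaintext in            (* compute on plaintext *)
  xor_crypt ~key (Bytes.of_string result)      (* re-encrypt before exit *)

let () =
  let encrypted = untrusted_load_message () in
  let encrypted_result =
    in_enclave ~key:"secret" encrypted
      ~process:(fun s -> String.uppercase_ascii s)
  in
  (* The untrusted caller gets ciphertext back out of the "enclave". *)
  Printf.printf "ciphertext out: %d bytes\n" (Bytes.length encrypted_result)
```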
There's a massive overhead, because you have to handle those pages in software to move them out. So it's pretty scary. Okay, and there's a bit in the middle there which we'll come back to: side-channel attacks are possible - Intel never said they weren't - but in general, if you're careful about how you do things, they may be quite hard to use. Or they might have been, until recently.

Okay. So what have we done? We didn't do any of that - that's all Intel, and other folks have done similar things; I mentioned ARM. What we've done is to put arbitrary applications into this. You could take your application, edit the code to call the enclave enter and exit, compile it and run it, and you'd be using the enclave in some way. But we said: how about we do arbitrary application support - so we may be running a JVM or .NET or some other runtime, and so on. And then we also need to talk to the outside world. We have to do things like loading, and then we need to talk to file systems, deal with signals, and do networking; and those bits of code had better do the ingress and egress and the enclave security properly, because they're talking out of the enclave at that point, so they have to do the right thing in terms of crypto.

So Peter Pietzuch's group at Imperial have built this Linux kernel library to support arbitrary Linux applications: anything that lives in a fairly standard Alpine Linux world will just run on it, which is kind of cool. Any binary that runs on Alpine Linux will run on this Linux kernel library. The idea is you just edit the relevant bits of the library so that it enters and exits the enclave, and then an application running on that is now running in the enclave. And anything that wants to call networking or disk I/O goes through the appropriate libraries, which do the appropriate crypto in and out.

If you're sitting there thinking - security people, I recognise some in the room - yeah, but how do you know the network crypto is any good, how do you know your disk I/O crypto is any good, and how do you know Intel's memory encryption is any good? Well, you don't. But, you know, you might have formally checked some of them. We have another project which is doing this all in OCaml, and we have a little checked stack, so we think that's okay - that will give you network crypto; you could put that into this world. So that's really what's going on there. There are a bunch of other pieces you have to look at, but it's fairly standard stuff.
And I have to go through this pretty fast because it's background. The idea, though, is you're going to have to deal with your memory management, and your system call stubs have to be implemented, because you can't do an actual system call - you can't do a trap in and out of the code, because that trap is itself changing privilege level, and that's exactly what you're trying to get rid of. So you have to do that.

Okay. The thing we wanted to do with all of this, to cut to the chase, was some big data processing. You could choose lots of different data processing and analytics platforms. Actually, the first thing the Imperial folks did, with some guys from Germany, was a really nice project where they put Docker containers into SGX. That was a clever idea, because it potentially puts anything containerised into an enclave. There's a paper about that, I think at OSDI a year and a half ago; it goes by the name of SCONE, for secure containers with SGX. Anyway, then we thought: well, actually, that's too general. Let's go one notch less general and take a particular data processing platform. And the interesting one of choice might be Spark.

Hands up if you use Spark. Who? About three people. Anyone use Hadoop? A couple more people. Anyone use MapReduce? Okay, the same kind of people. Okay. So if you have a lot of data and you want to parallelise and distribute things in a data centre - because there are lots of processors there, lots of racks of processors - then these are a fair starter set of tools that let you do parallel, distributed computing over a data-centre environment. They're not the same tools you would use on an HPC, very tightly coupled cluster computing platform, where you'd use some PVM-style system, but they're very, very widely used. And Spark is particularly state of the art in machine-learning styles of task, because it has a fairly nice way of dealing with asynchrony and redundancy that actually scales quite well. Spark is usually kind of coupled with R, and R is a language package which derives from S, which is a very, very commonly used statistics package. So there are a lot of good reasons to take Spark as an example.

One of my students - who I didn't really supervise, who's way too smart and did all his own work - currently works for Microsoft, in Azure research and things, and he's done this for Hadoop and for SQL Server and for some other things. I think he's even got a blockchain running in SGX, which is kind of cool.
But we chose to do this with Spark because, for our friends in analytics and machine learning, that was their kind of principal current tool. You know, people out there might say: oh, well, actually I'm using TensorFlow, because I'm a real neural net person - that's the tool. Well, we haven't done that, but one of the things we're doing with this work is documenting what we had to do, so that somebody else could repeat the work - you know, just take the runtime for your data processing or analytics platform and redo it.

So what's the interesting issue? Spark is basically still doing a MapReduce style of computation, where you've got a bunch of data on each node, each node maps a function over it and reduces that, and then you share the results across all the nodes and move on to the next step. So what we want to do is take the code that maps and reduces those functions, put that in an enclave, and then have the data come out of memory - because that's where it's going to be; it's going to be in one of these RDDs that Spark uses - and then, in the enclave, be decrypted, have the function mapped over it, iterate, move on to the next function, and do that on all these nodes in parallel. What Spark is written in basically means we have to put a JVM into the enclave, so this gets exciting. So, as I've said, we could map other things into the enclave, but we chose to start with Spark, documented in such a way that other people could use it.

One of the things you might be thinking, if you do any large-scale machine learning, is: well, what about the accelerator hardware that people use? People in DeepMind and many other places - Microsoft, Facebook, wherever - anyone doing machine learning will probably use a GPU, or they might use an FPGA to do acceleration, and this thing is outside of the enclave, of course. You have to communicate over some channel - maybe memory buses or whatever, maybe some network link to it, or some other model. So there's an issue there. It would be nice if somebody built an enclave GPU - an enclave, you know, trusted-execution-environment extension to TPUs, which are Google's tensor units, essentially tensor matrix multiply, roughly; a bit more than that. There's a bottom line there as well, which we'll come back to.

Okay. So the idea we have is: we looked at Spark, and okay, this is cool, but it's very big. You've got Spark, which is a huge amount of code that does lots of cool things.
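As a rough sketch of that iteration structure - not Spark's actual API, just plain OCaml with a list of partitions standing in for a cluster of nodes - each "node" maps a function over its partition and reduces locally, and then only the partial results are exchanged and combined before the next step:

```ocaml
(* Sketch of the MapReduce-style step described above.            *)
(* Each element of [partitions] stands for the data held by one   *)
(* node; in a real cluster the map/reduce would run there and     *)
(* only the partial results would cross the network.              *)

let map_reduce_step
    ~(map : int -> int)               (* per-record function *)
    ~(reduce : int -> int -> int)     (* associative combine  *)
    ~(init : int)
    (partitions : int list list) : int =
  partitions
  |> List.map (fun partition ->
         (* "on each node": map then local reduce *)
         partition |> List.map map |> List.fold_left reduce init)
  |> List.fold_left reduce init       (* exchange and combine the partials *)

let () =
  (* Toy example: sum of squares over three "nodes". *)
  let partitions = [ [ 1; 2; 3 ]; [ 4; 5 ]; [ 6; 7; 8; 9 ] ] in
  let total =
    map_reduce_step ~map:(fun x -> x * x) ~reduce:( + ) ~init:0 partitions
  in
  Printf.printf "sum of squares = %d\n" total
```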
Actually, it kind of maps functions over things and then does some cool iterating of that - it's not that complicated - but it needs a JVM, which has to be put in the enclave too, and that's very big. So how about we partition the software? We run part of it in the enclave, and all of the functions which may not be touching sensitive data outside it. We could do that by a mixture of static analysis and runtime analysis - static analysis to say what is obviously touching the data, and then run it and see what else touches it - but actually you don't really need to do that. There's a cool thing here, which is that Spark is applying a function, right? And that function is being applied - if people write the code correctly - in an enclave, and the data that has come off disk, or come off HDFS, or come off a cache, or come off an RDD, is encrypted. It hasn't been decrypted yet, so it isn't sensitive at that point, and we don't have to worry about all of the rest; we can probably just deal with the very core pieces of SGX-Spark.

So this is just going through that detail, which again I don't have time to cover, but it's saying what has to live inside the enclave: it's really decrypting the input data, computing f of it - you know, iterating over the input - encrypting the result, and so on. There are two steps in there, just to illustrate that.

And this is just showing a kind of more general partitioning of the components - again, I don't have time to go through the details. There's movement between different JVMs, because you could be running multiple instances of Spark, so we have to worry about that. That involves not just having encryption of I/O to storage and encryption of I/O to networks; now we also have to encrypt shared memory. So that's another thing we have to manage, which is again another weakness of all of this - this is a house of cards, and if you're a security person or a systems person you could pull out any one of those cards and say: but what if you get that wrong? There could be a vulnerability there, just like there could have been a vulnerability in, you know, the old cloud model with the hypervisor and so on. Yes, there could, but we can fix that and then move on.

Okay. So that's the first part of the talk. We have all of that working, as of around just before Christmas, actually. And then what happened? I'm going to move on to the next topic in a second.
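A minimal sketch of that partitioning, reusing the toy cipher idea from earlier: the data stays encrypted in its partition (standing in here for a Spark RDD), and only the small piece that applies f runs "inside the enclave", decrypting a record, applying the function and re-encrypting the result. None of this is the real SGX-Spark code; it just illustrates where the boundary sits.

```ocaml
(* Sketch: an "encrypted RDD" is just a list of encrypted records. *)
(* Everything outside [enclave_map] only ever sees ciphertext.     *)

let xor_crypt ~key s =
  String.mapi
    (fun i c ->
      Char.chr (Char.code c lxor Char.code key.[i mod String.length key]))
    s

type encrypted_rdd = string list   (* one ciphertext per record *)

(* Runs "inside the enclave": decrypt, apply f, re-encrypt. *)
let enclave_map ~key (f : string -> string) (rdd : encrypted_rdd) :
    encrypted_rdd =
  List.map (fun ct -> xor_crypt ~key (f (xor_crypt ~key ct))) rdd

let () =
  let key = "k3y" in
  (* Untrusted side builds the encrypted partition. *)
  let rdd : encrypted_rdd =
    List.map (xor_crypt ~key) [ "alice,120/80"; "bob,135/85" ]
  in
  (* Chain two "map" steps; plaintext never leaves enclave_map. *)
  let result =
    rdd
    |> enclave_map ~key String.uppercase_ascii
    |> enclave_map ~key (fun s -> s ^ ";checked")
  in
  Printf.printf "records out (still encrypted): %d\n" (List.length result)
```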
But what happened just after Christmas? Hands up. Yeah - I hear the ghostly voices. Spectre. Spectre is haunting, you know, the clouds of Europe and the world. And the first thing we did was check whether Spectre breaks this. Some folks had just published a more detailed thing, but I think we were among the first to go: oh dear.

So this may hopefully get fixed at some point, but basically, speculative execution in very complicated microarchitectures like Intel's lets things happen which, in a sane world, would be in another thread and would have all of their access control checks done in the right way - but don't in this world. To use all the resources you have, the approach is: just in case this branch of code might be useful later, you run it. And the first thing that this very large number of people - I don't have time to credit them all - found was that you could break the userspace/operating-system boundary, because speculative execution would just go off and start reading OS memory, or memory from other processes. But then the speculative execution would end, because the branch was wrong, and it would be terminated - in fact shut down, because it had done the wrong thing. But there's a side effect, and the side effect is what Intel never claimed to protect against: any data access that pulls things out of memory into the cache leaves traces in the cache until that cache line is evicted. Unless you manually evict it from the cache, it's around for a while. So in the cache hierarchy you've got a short window of time where you might be able to read that - unless you change all your code to evict things from the cache, in which case your processor will run ten to a thousand times slower, which kind of gets rid of the cloud having any point whatsoever. You're in trouble.

And it turns out the same kind of attack works across SGX, because, as I mentioned, the memory encryption and decryption happens between RAM and cache memory, so you're still subject to the same possible attacks. Interestingly enough, on ARM TrustZone it doesn't appear to work, and I think what's happened is that ARM have a very simple, elegant design where, when speculative execution branches across the TrustZone boundary, it stops and goes: no, you don't get to do that - correctly. But it does work across the OS boundary on ARM, which is a bit puzzling. So we're kind of okay - you know, there are ways to mitigate this, by changing code in lots and lots of places and waiting for new processors.
But if you're a data centre, like a typical Facebook data centre with maybe a million cores, you know, buying a million new cores is quite expensive; it's going to take a while. So this stuff is not really ready for hardcore prime time. Although actually I should say: kind of, yes it is, because you could always make sure that your application in Spark runs in SGX and only runs on cores that have no other things running. And then you go: well, yeah, but then we don't get the sharing that we need. Yeah, but if you're using 100% of the core's CPU time anyway, that's okay. And then you could argue: well, then you don't get the cost saving of moving from your private data centre to the cloud. But you do, because you still get the amortising over various different, you know, operational costs and so on - so you still get a cheaper thing - but you don't get that multiplexing, basically, which is definitely a bit of a negative.

Okay. How am I doing for time? I can't read that clock. Half an hour in. Thanks - perfect. Well, no; too many slides. But you can read them later and catch up.

Okay. So in parallel with that - which was trying to help people like the NHS or the financial services folks use the cloud in a way that we thought would be more secure (there is no "secure"; you've just mitigated these things and then the arms race moves on) - we've had a completely separate line of work, which comes from the opposite direction: distributed analytics. The idea here is that instead of moving all the data to the cloud and doing the computation there - instead of all these hospitals moving all that data into central databases and wherever, and then maybe copying it securely into the cloud, running a secure computation so you get more CPU, and then getting the encrypted output back to their doctors, their medics and researchers - we said: leave the data where it is and distribute the code to people.

This is the opposite approach, and it's very old. There are two patterns in distributed computing: move the data to the processing, or move the processing to the data. It's kind of classic. Of course you could do hybrids - you'll be sitting there thinking, yes, yes, computer science is really good at patterns; we could do it one way or the other, or we could do a mixture - but I'm going to talk about this extreme. And the point of this extreme is that you keep the data with the owners. And this is really targeting different classes of data, at least initially.
372 00:34:28,380 --> 00:34:33,570 We're thinking of your social media data, your health care data on your phone, maybe monitoring your heartbeat, 373 00:34:33,580 --> 00:34:38,129 your skin conductivity, your temperature, your number of steps you've taken today. 374 00:34:38,130 --> 00:34:42,780 Why do you need to give that to anyone else ever? Another example, 375 00:34:42,780 --> 00:34:50,790 I think I'll just quickly go through the poster child example I think comes from a smart metre project by George Synthesis when he was at Microsoft. 376 00:34:51,480 --> 00:35:00,209 He's now professor at UCLA and for security. But he did this beautiful project which is designing smart metering and never gave the data from the 377 00:35:00,210 --> 00:35:07,560 metre to the that the reading dataset was never given to the electricity or gas or water provider. 378 00:35:08,010 --> 00:35:14,830 He would just give them the summary data. Why did they need to know? What the current in and out of your house every 2 seconds is. 379 00:35:15,790 --> 00:35:22,900 Later. That's complete nonsense, right? They have all kinds of current limiters and fuses and cut outs to stop bad things happening. 380 00:35:23,290 --> 00:35:27,040 But they want to know what the reading is each month without having to visit your house. 381 00:35:27,040 --> 00:35:34,090 That's their big cost saving, and they may want to send your metre a price so they will app on your your home hub 382 00:35:34,150 --> 00:35:38,020 management system could say here's some clever things you could do in the house, 383 00:35:38,380 --> 00:35:43,630 like not to turn on your dishwasher and washing machine until 4:00 in the morning because then that would be the best price point. 384 00:35:44,440 --> 00:35:50,649 Okay. And they might want to do that in a clever way. That is huge. But they still don't need to know what you use every 2 seconds or every 2 minutes. 385 00:35:50,650 --> 00:35:55,209 That's just irrelevant. They need to know summary data, but you might want to record that data all of the time. 386 00:35:55,210 --> 00:35:59,020 They want to record it so that they maybe, you know, check. So make sure the summary is correct. 387 00:35:59,320 --> 00:36:05,830 So the sort of poster child here example is, well, what is what is the what does the electricity company want to know? 388 00:36:05,890 --> 00:36:09,520 That's more fine grained than a one monthly reading per house. 389 00:36:10,690 --> 00:36:13,210 Not as fine grained as a reading every 2 minutes. 390 00:36:14,050 --> 00:36:21,820 They might want to know what kind of household you're from so they can work out a profile of pricing to see what your price sensitivity is. 391 00:36:22,270 --> 00:36:26,589 And also, when we're just about to run out of gas a couple of days back, you know, 392 00:36:26,590 --> 00:36:33,970 what could they set the price to be to alter the really big consumers price at the busy day, maybe for a certain class of users. 393 00:36:34,390 --> 00:36:39,670 So they need to know what class of user you are. So how many classes of users might there be? 394 00:36:40,000 --> 00:36:41,680 You know, how many household types are there? 395 00:36:41,950 --> 00:36:49,029 So how about we we know that from historical data there could be 16 kinds of domestic houses, maybe sweetie. 
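A tiny sketch of that idea - with entirely made-up numbers and function names - where the fine-grained readings stay on the meter and only the monthly summary ever leaves the house:

```ocaml
(* Sketch: the meter keeps every 2-second reading locally and     *)
(* only ever exports a monthly summary to the supplier.           *)

type reading = { ts : float; kw : float }   (* timestamp, instantaneous kW *)

(* Stays on the local device: the full fine-grained history. *)
let local_history : reading list ref = ref []

let record ~ts ~kw = local_history := { ts; kw } :: !local_history

(* The only thing shipped to the provider: total kWh for the month. *)
let monthly_summary () : float =
  List.fold_left
    (fun acc r -> acc +. (r.kw *. (2.0 /. 3600.0)))  (* 2-second samples -> kWh *)
    0.0 !local_history

let () =
  record ~ts:0.0 ~kw:1.2;
  record ~ts:2.0 ~kw:0.9;
  record ~ts:4.0 ~kw:3.0;
  Printf.printf "send to supplier: %.4f kWh\n" (monthly_summary ())
```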
And those 16 kinds are characterised by some distribution over, you know, some samples through the week - the busy minute, the busy hour, the busy day - and that is sufficient to tell this house from that house. So they need to acquire the parameters of that model, and we can do that in a decentralised way. We can run all kinds of these machine learning algorithms - we already do; we run them in the data centre, distributed - except that we run them with all the data coming off the local encrypted history. We can leave the data there, send the code out to everyone, learn the model parameters at each node, and share the model parameters.

Now, you might say: even that reveals something about a household. Yes, of course it does, to some extent. But you could also - down at the bottom there - share that information peer to peer while you build up the model and build up the accuracy. So if you have a 16-bin histogram and you're learning what the different models are that fit in it, and you get your accurate model after some number of iterations - you're doing machine learning over this thing and you say, oh, that's good enough - now you can ship that to the electricity providers for their customer base. And at no point did you give detailed data to them.

Okay, so that's sort of distributed machine learning, and there are lots of ways you could do it. Again, this is really neat, because you avoid the whole problem of GDPR - I'm not going to go into that - but at no point did you give the raw data to anyone. Well, you do have an interesting problem again. Folks have some really good stories on this, which is: if you make a decision to change the price for a customer, they might go, why have you changed my price to that? My neighbour got a different price change. And you have to explain that. Depending on the model complexity, it might be quite easy to explain: you might be able to say, well, you know, you have four kids and you put the dishwasher on five times a day. And they don't even have to know that - you would feed the price query into the model with the data you have in your house, and it would pop up the explanation: oh, we can see the model fitted this. And then you go: okay.

So we built this crazy distributed analytics platform, and there are lots of pieces to it. The last bit I want to try and get to is how you do very wide-area distributed machine learning. The first piece, running Spark, is important because it's very high throughput.
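Here is a rough sketch of that "ship the code, share only parameters" idea, assuming a simple model where each of the 16 household classes is just a centroid over a few usage features: each home assigns its own samples to the nearest centroid using only its local data, and sends back small per-class sums and counts, which a coordinator merges into new centroids. This is a federated, k-means-flavoured step for illustration, not the actual algorithm from the project.

```ocaml
(* Sketch: decentralised learning of k household-class centroids.   *)
(* Each home keeps its raw usage samples; it shares only per-class  *)
(* sums and counts (a small summary), which the coordinator merges. *)

let sq_dist a b =
  Array.fold_left ( +. ) 0.0 (Array.mapi (fun i x -> (x -. b.(i)) ** 2.0) a)

let nearest centroids v =
  let best = ref 0 in
  Array.iteri
    (fun i c -> if sq_dist v c < sq_dist v centroids.(!best) then best := i)
    centroids;
  !best

(* Runs locally in each home over its own samples. *)
let local_summary ~k ~dim centroids (samples : float array list) =
  let sums = Array.init k (fun _ -> Array.make dim 0.0) in
  let counts = Array.make k 0 in
  List.iter
    (fun v ->
      let c = nearest centroids v in
      counts.(c) <- counts.(c) + 1;
      Array.iteri (fun i x -> sums.(c).(i) <- sums.(c).(i) +. x) v)
    samples;
  (sums, counts)                                  (* the only data shared *)

(* Runs at the coordinator: merge summaries into new centroids. *)
let merge ~k ~dim summaries =
  let sums = Array.init k (fun _ -> Array.make dim 0.0) in
  let counts = Array.make k 0 in
  List.iter
    (fun (s, c) ->
      Array.iteri
        (fun cls v ->
          counts.(cls) <- counts.(cls) + c.(cls);
          Array.iteri (fun i x -> sums.(cls).(i) <- sums.(cls).(i) +. x) v)
        s)
    summaries;
  Array.mapi
    (fun cls s ->
      if counts.(cls) = 0 then s
      else Array.map (fun x -> x /. float_of_int counts.(cls)) s)
    sums

let () =
  let centroids = [| [| 1.0; 1.0 |]; [| 5.0; 5.0 |] |] in      (* k=2, dim=2 *)
  let home_a = [ [| 0.9; 1.2 |]; [| 1.1; 0.8 |] ] in
  let home_b = [ [| 5.1; 4.8 |]; [| 4.9; 5.3 |] ] in
  let summaries =
    List.map (local_summary ~k:2 ~dim:2 centroids) [ home_a; home_b ]
  in
  merge ~k:2 ~dim:2 summaries
  |> Array.iteri (fun c v -> Printf.printf "class %d: %.2f %.2f\n" c v.(0) v.(1))
```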
If you've got a lot of data in a data centre, you have a bunch of nodes with a very large memory footprint (ignoring SGX limits), very, you know, multi-gigahertz processors, lots of cores, ten-gig or even 100-gig Ethernet everywhere. Compare that with a bunch of people with smart meters in their homes, or their smart TV in their home, and you're sharing model parameters between homes: you've got a wide-area network, and the uplink out of people's homes today is typically ADSL - in 90% of the UK it's ADSL, at around a megabit uplink - and there are about 10 million homes on fibre, where the uplinks are a bit faster. But the model parameters - the histogram, some values - are not really a lot of data, and you don't have to send them very often, because how often do you run that computation? How high-throughput is it? If you were doing this on people's smartphones, and you're trying to learn a model of them, as part of a model of lots of people's, you know, health response to a sudden drop in temperature when they're going out running, then you might want to feed back, somehow, a warning to a collection of people, saying - famously - don't go and shovel the snow, because the temperature drop combined with the sweat will cause heart attacks, in large numbers. I lived in Canada, and this is a warning you get: if you're over a certain age, it's like, get your neighbours' kids to shovel the snow.

Okay. So we built this platform called Owl, which is a distributed numerical package, basically, to start off with, and it's written in OCaml. And we had a reason for doing this: we have a library operating system which, instead of being written in C or C++, is written in OCaml. This comes from a library operating system we have in Cambridge called Mirage, which is a very cool system. It's OCaml, which means we don't have a large class of vulnerabilities. You might say: why OCaml - it's an ML variant - I mean, why didn't you use Haskell, or why didn't you use something else? Because we're Cambridge; we use OCaml. But, you know, you could redo it in three minutes in Haskell - well, no, eleven minutes - right, so, okay. So we built all these things for doing this distributed system, and there are a lot of different applications. We've built a mad amount of code, and there are a lot of cool people out there contributing to this.
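To put a rough number on "the model parameters are not a lot", here is a back-of-the-envelope calculation - all the sizes are illustrative assumptions, not measurements from the project - comparing a month of 2-second readings with a 16-bin histogram of 8-byte floats sent over a roughly 1 Mbit/s ADSL uplink:

```ocaml
(* Back-of-the-envelope: raw readings vs. model parameters.        *)
(* All sizes are illustrative assumptions.                         *)

let () =
  let bytes_per_reading = 8. in                       (* one float sample *)
  let readings_per_month = 86_400. /. 2. *. 30. in    (* every 2 s for 30 days *)
  let raw_bytes = bytes_per_reading *. readings_per_month in

  let histogram_bytes = 16. *. 8. in                  (* 16 bins of 8-byte floats *)

  let uplink_bytes_per_sec = 1e6 /. 8. in             (* ~1 Mbit/s ADSL uplink *)
  Printf.printf "raw month:  %.1f MB, %.1f s to upload\n"
    (raw_bytes /. 1e6) (raw_bytes /. uplink_bytes_per_sec);
  Printf.printf "parameters: %.0f bytes, %.4f s to upload\n"
    histogram_bytes (histogram_bytes /. uplink_bytes_per_sec)
```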
So here's a brief picture of the whole architecture of our sort of distributed and parallel analytics: a whole framework for applying functions over data in a wide area, with various system backends. A lot of people work on this in Cambridge. Just to say, we've even got bits that go in browsers, and bits that do memory management for types. And by the way, Sam Staton in Oxford does some of this stuff with monads and numerical computation, so there are some very cool theory people here whose work we liberally borrowed: we read their papers and went, yes, good, we can use that. We can even map code down onto GPUs, and there are all kinds of other pieces. We have arrays, which doesn't sound very functional, but we have ways of doing MapReduce over those (there's a sketch of the pattern below). We have neural nets, and a way of doing peer-to-peer neural nets so we can train them peer to peer.

And we have a poster-child example of this: learning to recognise faces. We have a neural net running on Raspberry Pis, running this code with this library operating system inside a container, so it's even bulletproof at that level, probably; I don't know how good that is. We're training on faces, and we share the parameters of this neural net about faces between lots and lots of little tiny nodes. Each one learns about the three faces in its house, three different faces, what the features are that make up faces, and then you get a better model, and then you can go, oh, this is somebody who lives in this house, or, oh, we recognise that as a face but we don't know who it is. So that's an application we have, and it's an example of why you'd want privacy and so on. I'll skip the code; you don't need to see it, and if you want to use this, it's all downloadable.

So I want to skip to the following: you're doing a distributed computation, and there's this problem when you're iterating over data. In our grand vision, there are, say, 35 million homes in the UK. Imagine every home has a Raspberry Pi, or whatever your favourite small computer is, as a home hub. We just send it to people in the post; they plug it in and forget about it, or it's just hidden inside something they bought anyway. And it says, you know, do you approve? It will look after your personal health data and never send it to anyone else unless you approve, and it might back it up, encrypted, to your GP's cloud service, but they won't be able to look at it without your permission. That'll be the sort of model of the world.
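For the "MapReduce over arrays" piece mentioned above, here is what that shape of computation looks like in plain OCaml. This is deliberately not the real Owl API, just a sketch of the pattern: each partition is mapped and folded locally on its node, then the partial results are combined.

```ocaml
(* Plain-OCaml sketch of map/reduce over partitioned arrays (not the Owl API):
   each "node" folds over its own partition, then the partial results are
   combined.  Data and function names are hypothetical. *)
let map_reduce ~f ~combine ~init partitions =
  partitions
  |> List.map (fun part ->
         Array.fold_left (fun acc x -> combine acc (f x)) init part)
  |> List.fold_left combine init

let () =
  (* Three "nodes", each holding a slice of the data; overall sum of squares. *)
  let partitions = [ [| 1.; 2.; 3. |]; [| 4.; 5. |]; [| 6.; 7.; 8.; 9. |] ] in
  let total =
    map_reduce ~f:(fun x -> x *. x) ~combine:( +. ) ~init:0.0 partitions
  in
  Printf.printf "sum of squares: %.1f\n" total
```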
So now you want to learn things from that data, over 35 million nodes, and you do an iteration and then need the next step of the iteration. If you're doing a classic MapReduce, everyone does their bit of the data and then there's this huge exchange of data. You need to synchronise everything, don't you? Wouldn't you have something like N-squared messages going everywhere at that point? This is already a problem for training neural nets in a data centre. If you parallelise neural-net training over data, you split all your face data across lots of big nodes in the data centre and run TensorFlow over it; then there's this huge exchange at each step: the output comes out, you look at the gradients or whichever thing you use for the feedback into the training, and you share it all with the other nodes. You've got an N-squared message problem.

So this is the kind of thing we had to start thinking about, because now we don't have the luxury of a data centre, and N-squared won't scale, certainly not at a scale of 35 million. Even in a data centre you don't have 35 million nodes, you might have 100,000 cores, but 100,000 squared is still not a good number of messages for every step of the iteration (the arithmetic is below). So what do you do? You need to throw some stuff away. The classic barrier-synchronisation step in something like Hadoop is not going to work, so you come up with other ways of doing it. The classic one, if you want to read about it, I think the best-read paper is probably Hogwild!, and there are parameter servers, where you send your updates to one point, so you have N messages rather than N-squared, and it shares them out. But there's more recent work where people do other things, and you can run asynchronously. Why could you not run asynchronously in training? Well, it depends on the learning algorithm, but most of them are some form of gradient descent, and stochastic gradient descent will still converge even if you don't do everything at the same rate. You can be asynchronous, you can even lose data: if the extra iterations you gain outweigh the accuracy you lose by dropping data, then maybe you speed up the overall computation towards the accuracy level you want your training to reach. So that's the theory behind what we built here, where we relax all these things and decouple the synchronisation. Basically, what we end up with is probabilistic.
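Rough arithmetic for the 100,000-core case mentioned above; the factor of two for the parameter server (one message up and one down per worker per iteration) is a simplifying assumption.

```latex
\[
\underbrace{N(N-1)\;\approx\;10^{10}}_{\text{all-to-all exchange},\ N = 10^{5}}
\qquad\text{versus}\qquad
\underbrace{2N\;=\;2\times 10^{5}}_{\text{parameter server: one up, one down per worker}}
\quad\text{messages per iteration.}
\]
```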
And there's one fantastically clever bit in this which actually makes it faster and more accurate, and, name-checking the right people, this is by Liang Wang, who's a post-doc in Cambridge. He came from Helsinki with a very smart PhD, started out working on mobile networks, and now looks at this stuff. What you need to do is essentially discard results statistically. And, you know, if your sampling algorithm is correct, the system can be made to converge arbitrarily close to what you'd get without losing those results. You look at the accuracy you're getting from the different nodes that are giving you data, the output parameters you would normally send to the parameter server, and you can do really, really well with this. We haven't tried this with 35 million homes; that's some way off yet. We have tried it in small systems with a thousand nodes, little test beds and so on, and from the scaling you can sort of extrapolate; a thousand is a reasonably large number, I think.

So we have this cunning sampling primitive, which is basically a clever way of wrapping a function, just an implementation trick and so on. We thought we'd discovered something fantastically new. And then I examined a PhD on distributed neural-net training at Imperial, by Peggy Kerr, who's now at Microsoft in a health machine learning group, very cool, and she'd come up with something very similar, from a slightly different angle, about two years before us. So we're like, yeah, okay, that's good: somebody else got there too, and reproducibility in research is a good thing.

There are tricks you can play in here with the whole trade-off; I don't have time to go through the numbers, but you can change the step functions in training, and scalability is good, robustness is good, convergence is good, and so on. And this is the bottom line, trying it out with some of the test data sets people use as standards for training these systems. It's actually kind of scary how good this is for one year's coding by Liang and five other people, plus some other contributors around the world. The green bars are roughly the time for training up, for example, Inception v3, a classic, versus TensorFlow and Caffe2 doing the same thing, and we're in the same ballpark. And this is written in OCaml, and it's very small in terms of lines of code, and it's really high-level and readable, and all the rest of it.
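The "discard results statistically and still converge" idea can be illustrated with a toy simulation: each round the aggregator folds in updates from only a random sample of nodes and simply drops the rest, and a damped update still converges. This is an illustration of the general approach with invented numbers, not the actual Owl sampling primitive.

```ocaml
(* Toy simulation of "sample, don't wait for everyone": each round the
   aggregator uses updates from a random ~10% of nodes and drops the rest;
   a damped average still converges.  All numbers are invented. *)
let nodes = 1_000
let sample_fraction = 0.1

(* Pretend each node's update this round is a noisy estimate of 42.0. *)
let node_update _i = 42.0 +. (Random.float 2.0 -. 1.0)

let round estimate =
  let picked = ref [] in
  for i = 0 to nodes - 1 do
    if Random.float 1.0 < sample_fraction then picked := node_update i :: !picked
  done;
  match !picked with
  | [] -> estimate                              (* nobody sampled: carry on *)
  | us ->
      let mean = List.fold_left ( +. ) 0.0 us /. float (List.length us) in
      (0.9 *. estimate) +. (0.1 *. mean)        (* damped, straggler-tolerant step *)

let () =
  Random.self_init ();
  let est = ref 0.0 in
  for _ = 1 to 200 do est := round !est done;
  Printf.printf "estimate after 200 sampled rounds: %.2f\n" !est
```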
Okay, so that's pretty much the end of my slides. Apologies; I didn't have a long enough train ride this morning to delete half of the old ones, which I should have done. Please do email me if you want to follow up on any of the pieces here.

There are some very specific acknowledgements: this is other people's work. I'm mostly the person who runs around trying to get the money from the funding agencies; three goes and you usually get it. The important groups, as I mentioned, include the Large-Scale Distributed Systems group run by Peter Pietzuch, a professor at Imperial College in the Department of Computing; if you want to do systems work, it's an absolutely great group, with loads and loads of good people doing this kind of stuff. And in Cambridge we have Anil Madhavapeddy, Richard Mortier, Liang Wang and a host of other people. The project putting Spark (and R and Java) into SGX is funded by the Turing, and actually the main interested party at the Turing is the defence people, because they want to be able to do analytics on surveillance data and be able to prove that the wrong people didn't see the data, or that people didn't see the wrong data. They want a get-out-of-jail card, because they're now under the law, which is kind of interesting; but it's the same motive as for the healthcare and the financial data, which is to be squeaky clean. That's the side where all the data is centralised. The other side here is Databox, an EPSRC-funded project, whose other partners include Hamed Haddadi at Imperial and some folks at Nottingham.

And I didn't mention the downsides. There's a downside to the Spark-in-SGX stuff, which is the Spectre speculative-execution attack plus side channels; there are mitigations, but they're problems. The downside of the Databox stuff is that we really, really haven't got a good solution to how much you can learn by observing the model updates. If you've read about machine learning, particularly deep learning, there are very clever people who've worked on how much you can infer from a trained classifier, and then fixed that problem for that one setting. But if you can watch the thing being trained, you can probably infer almost anything. So there are attacks on a decentralised approach: somebody could basically join the network in a peer-to-peer system, collect all the updates, and then infer all the data pretty accurately. We don't have an answer for that.
So someone could infer how much electricity you use every two minutes. On the other hand, on the financial side that wouldn't be a good solution anyway; but the financial side, we think, sits over in the centralised space, so that's not the threatening case. In the middle is probably healthcare data, where it starts out with a lot of it centralised in hospital records, but more and more we're moving into this evidence-based medicine where you carry devices that monitor things about your behaviour and so on. That's on the Databox side, and there you might care about people inferring things about your health which may not be public matters. Okay, so that's about it for my talk, and I guess it's question time.

Look, let me kick things off. We were briefly discussing multi-agent systems before lunch, and the Owl framework in particular seems to have a very multi-agent-systems feel about it; is that right?

That's a very good observation, which I had completely not thought of. That's not how we think of it, but you're right, we probably should. That's a good point. Because basically we've moved to a set of asynchronous nodes which are exchanging messages about what they've learned: the parameters of a model of what they've learned, not the data they've acquired. That could start to feel very much like multi-agent systems, very much along those lines. But it starts from traditional training, say doing linear regression on some multi-dimensional thing, and then to make it work at large scale we go async, and I guess we end up in the same kind of space. We haven't thought about coming at it from another direction, such as a probabilistic programming approach; injecting that into this decentralised architecture would be fun. So there's probably some confluence of that stuff architecturally, which would be interesting.

Yeah, it's really interesting. Thank you, I'll take that back home.

You mentioned the distributed approach, which is an interesting idea, where the data stays at the source. There's a range of techniques being proposed in the research community for preserving privacy. What would be your judgement on how far we are from them being practical?

Well, there's a separate, slightly different thing, which is that homomorphic encryption would be lovely; it's kind of like cold fusion.
I mean, that's not fair; it's not cold fusion, it's more like normal fusion. It's a little nearer than practical quantum computing. To be fair, it's demonstrable, at least for very simple functions, and it would be really cool, because it would be much better than relying on the unbelievably complex extensions Intel do: you'd have homomorphically encrypted data and run the crypto functions over the data with relatively simple code. Getting it to go fast is the big challenge, but people are actually making progress, so I think it's fair to compare it with normal fusion, where there is visible progress in plausible directions and for specific functions; I've seen some really cool results there. So that's definitely a good research direction in cryptography and maths, if you're in that space: super-fast algorithms in that space.

Differential privacy might be a technique we throw at the decentralised exchange of things, where we might put bounds around what we exchange, so that a certain number of peers get the model data and what's exchanged is checked against an epsilon to make sure it's differentially private and doesn't reveal things about the raw data the model parameters came from. We'd have to think about what the relationship is between the model parameters and what you can infer from them, but other people have done some of that, so that would definitely be a thing.

And then we were talking earlier about federating data from multiple agencies into a central system. Before you even get to the central system, in a lot of cases differentially private data might be good enough for a lot of problems. Say we're trying to do inference on some healthcare question. I sort of jokingly said, maybe the use of ozone in swimming pools causes asthma; so you do a map of where asthma attacks show up in hospitals, and a map of swimming pools that use ozone instead of chlorine for cleaning the pool, and then you do a correlation. You can do location-based differential privacy pretty well, normalised to the population distribution, and you still have enough cases. I don't believe that's an actual causal link, by the way; I'm just giving it as a hypothetical example of a question you might ask. It could be done securely with differential privacy, and you don't need any of this complicated mechanism: the data would stay in privately owned, secure databases, and only query results that met this limit would be exchanged.
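Since differential privacy came up, here is a minimal sketch of the standard Laplace mechanism for a count query, the kind of epsilon-checked release described above. It is generic textbook machinery, not the specific mechanism used in either project, and the epsilon, sensitivity and count are made-up values.

```ocaml
(* Minimal Laplace-mechanism sketch: add noise with scale sensitivity/epsilon
   to a count before releasing it.  Textbook machinery, illustrative values. *)
let laplace ~scale =
  (* Inverse-CDF sampling of a Laplace(0, scale) variate. *)
  let u = Random.float 1.0 -. 0.5 in
  let s = if u < 0.0 then -1.0 else 1.0 in
  -. scale *. s *. log (1.0 -. (2.0 *. abs_float u))

let dp_count ~epsilon ~sensitivity true_count =
  float_of_int true_count +. laplace ~scale:(sensitivity /. epsilon)

let () =
  Random.self_init ();
  (* e.g. a hypothetical "asthma admissions near ozone-treated pools" count: *)
  let noisy = dp_count ~epsilon:0.5 ~sensitivity:1.0 137 in
  Printf.printf "released count: %.1f\n" noisy
```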
So I think that's a technique you can apply in lots of places; it's very practical and well described in the literature. Okay, next question. There's one over there. Oh, there's a microphone.

You seem to have talked a lot about the confidentiality side, and, on the distributed side, availability. But in terms of integrity, you mentioned integrity of data at the encryption level, but what about injection of data, sort of Byzantine problems? Because if you're going to do smart metering and there's money involved, can I get somebody else's bill changed? Can I actually inject bad data into this?

That's a great point, and, truth in advertising, we don't have a fix for that in our decentralised architecture. In a peer-to-peer world we're subject to all the attacks that are demonstrated time and again wherever money's involved, I suppose. There could perhaps be some way of carrying signature data through the model inference, something that signs the model and says it was derived in some way from particular inputs, so the meter companies still own an audit trail without getting the detailed readings. But I don't know; that's a completely fair criticism, and we have to tackle it somehow. We could have a conversation about it, or if you've got a solution, I'd love to hear it.

Brilliant, thanks.

Thank you. No, it's a completely fair point, and it's always a problem with these decentralised systems that they have some plus points and some minus points; that's one of the big ones. Injecting fake data, for fun and profit. Good point.
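On the audit-trail idea just floated, here is one very simple shape it could take, purely as an illustration: a hash chain over serialised parameter updates, using OCaml's standard-library Digest (MD5) for brevity. A real system would need proper digital signatures keyed to the meter; this is a sketch of the general notion, not a design from the talk.

```ocaml
(* Illustrative only: a hash chain over serialised model updates, giving a
   tamper-evident audit trail without shipping the raw readings.  Uses the
   stdlib Digest (MD5) for brevity; a real deployment would need proper
   signatures keyed to the meter. *)
let serialise params =
  String.concat "," (List.map (Printf.sprintf "%.6f") params)

(* Each entry commits to the previous entry and the new parameters. *)
let chain_entry ~prev_hash params =
  Digest.to_hex (Digest.string (prev_hash ^ "|" ^ serialise params))

let () =
  let genesis = String.make 32 '0' in
  let h1 = chain_entry ~prev_hash:genesis [ 0.12; 0.30; 0.58 ] in
  let h2 = chain_entry ~prev_hash:h1 [ 0.10; 0.33; 0.57 ] in
  Printf.printf "entry 1: %s\nentry 2: %s\n" h1 h2
```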
There was a question up there, right over there.

Thanks. A very practical question, I suppose. Going back to the beginning of your lecture, to SGX: have you seen that being made available on the likes of AWS or these sorts of large cloud providers?

SGX is on Azure as confidential cloud computing, and I haven't yet managed to talk to the people who do it, but I believe AWS will have it, you know, any minute now. But of course we have this problem with the Spectre attack. I believe the way Azure deals with that is that they have only ported their own tools: they want you to use the confidential cloud offering they have, which is Hadoop and SQL Server, and therefore they can handcraft the mitigations for the attacks. I think they're really good, that group, actually, so I suspect they've got that right. But it means you're in some sense stuck with the tools they've ported, though they're pretty okay tools for some things. Generally, deploying SGX as a sort of extra service for users to just use: firstly, you need some newer processors, and they're generally not in the high-performance parts yet. Secondly, the memory limit will probably kill a lot of your customer base unless somebody is very clever. And thirdly, there's a new version supposed to be out anyway, which may mitigate the Spectre attack better and get rid of a large part of the memory-limit problem. So I could imagine a really big cloud provider just having lots of conversations with Intel, and maybe even saying, oh, we're having a conversation with AMD at the same time, you know? But yeah, it's a good question. If you want to use this, you can go to Azure, and they have a couple of things that I think are pretty solid.

Other questions? Okay. Well, everybody, let's thank our speaker. Thank you.