[Auto-generated transcript. Edits may have been applied for clarity.]

Okay. I'd like to welcome everybody to this Strachey Lecture, which is our distinguished lecture series. Before I introduce the speaker: one of our sponsors, who has been supporting this series since 2015, is here, and if any students or researchers want to meet them afterwards, you can do that during coffee. One last administrative thing up at the top: we're going to have a Q&A here, and we'll have some microphones going around, so if you want to ask a question, we'll do it that way.

Okay. So I am super delighted to introduce Shafi Goldwasser, who is about the best speaker I could ever imagine, and I think there's a consensus in a room like this, with everyone else concurring with that opinion. She is a professor at Berkeley and the former director of the Simons Institute for the Theory of Computing; actually, she's just corrected me on that: she retains her role as research director there. In addition to that, she is a professor at MIT and also at the Weizmann Institute, so she spans very many locations. When I was deciding what to tell you (her accomplishments are so significant), what I decided to do is to read you just a tiny bit from the citations of a few of her major awards.

I'm going to start with the Gödel Prize in 1993. Shafi was awarded that prize for her paper "The Knowledge Complexity of Interactive Proof Systems". I've shortened these a little bit tonight in the interest of time, but the citation points out that she and her colleagues introduced the concept of interactive proof systems, providing a rich new framework for addressing the question of what constitutes a mathematical proof, and it goes on to say that this led to exciting developments, including the resolution of several major problems about the difficulty of finding approximate solutions to combinatorial optimisation problems.

A Gödel Prize would be pretty much enough for anybody, but Shafi has two of them. She won another Gödel Prize in 2001 for her paper "Interactive Proofs and the Hardness of Approximating Cliques", with Feige, Lovász, Safra and Szegedy. Here the citation says that they developed a completely new understanding of what an approximation algorithm can achieve, with their deep insights and powerful results establishing hardness of approximation for many problems, connecting, for example, computation and cryptography.
The last one I'll mention is the Turing Award, which is pretty much the biggest prize you could ever get in computer science. She won it in 2012, along with Silvio Micali, and the citation is: for transformative work that laid the complexity-theoretic foundations for the science of cryptography, and in the process pioneered new methods for the efficient verification of mathematical proofs.

Shafi is a member of the US National Academies of Sciences and Engineering, the American Academy of Arts and Sciences, the Israel Academy of Sciences and Humanities, and a number of others. She also has a long list of honorary degrees, including one from here at Oxford. We are completely delighted to have her here today, speaking to us about privacy, verification and robustness: a cryptographer's perspective on ML. So, thank you very much: Shafi Goldwasser.

Hi everyone. I'm very honoured to have been invited to give this lecture, and also to have received an honorary degree here in Oxford; it's so beautiful. It's a little hotter than I thought. Okay. So the title is a little different from the one I gave originally, but it's mostly the same when you look at the content of the talk: I'm going to call it "Constructing and Deconstructing Trust in AI: A Cryptographic Perspective".

I've given some version of this talk, without most of the results that are here, maybe five or six years ago, and at that point I don't know that I would have used this slide. But at this point it's very clear that the machine learning revolution really is the scientific breakthrough of our times. It started with deep learning, and now, with the large language models, it has reached an entirely new level. And when Leslie was talking about the greatest honour in computer science, I was thinking about these Nobel Prizes, now going to machine learning people.

Obviously this raises very deep questions about human versus machine intelligence, and about whether large language models (there's a project I'm working on along these lines) will even give us the ability to translate non-human communication. People are talking about AGI and superhuman intelligence; there are betting markets on when that's going to happen. You already have o1 doing very well on competition math, competition code and PhD-level science questions, and there's a whole benchmark of new questions out there for which we don't know the answers. We all agree, I think, with this slide, and you don't need me for that.
This is all leading to the point that, whether or not we believe this is the biggest scientific revolution, that it's going to solve the Riemann hypothesis, whatever: we do know already that there is huge promise for applications, whether it's health, drug discovery, infrastructure, natural language processing, policing or financial markets. And all of this indicates that there is a major shift of power, which we're all aware of, toward those who have data and those who have algorithmic expertise. One can argue and debate whether the data is more important or the algorithms are more important, but what is very clear is that that's where the power lies. And all that's beautiful.

The problem that this talk is about is: should we really trust models, to the extent that we very rapidly are trusting them, that we don't understand and don't control? At least for myself, maybe because I'm of a certain age, that makes me very nervous. But I think it should make everybody nervous when we're talking about critical applications.

So, since we're talking about trust, and since I'm a cryptographer, I want to tell you for a second something about cryptography. Often people think about cryptography as basically just an arsenal of tools: there's encryption, there are digital signatures, people talk about obfuscation, zero-knowledge proofs, homomorphic encryption and so forth. There are beautiful constructions out there; some of them are used in practice, some of them will be used in practice. But if you zoom out from all these tools, essentially all of them aim at the following: how to use technology that you don't trust as if you could trust it. Even though you are assuming that there are adversaries (somebody is trying to listen to your messages, somebody is trying to pretend to be you to get access to your bank account, somebody might want to reverse-engineer your program, as in the case of obfuscation), you want to be able to say: I don't want to think about that while I'm using this. So what cryptography does is come up with abstractions and tools that let you pretend, in effect, that these adversaries don't exist. It's a zoomed-out view; it's a way to establish trust in technology. And not only that: this is true for each one of the tools that I've mentioned, but there is at this point almost a cookbook recipe for when you want to solve a crypto problem. You define the task that you're talking about,
whether it's encryption or digital signatures and so forth. Then a very, very important step is that you model what's called the adversary. The adversary has certain powers: it might be a passive adversary that just listens, or an active one that changes messages going over the line, and so forth, but you're supposed to model it mathematically, precisely. Then you define what a secure solution is. It might not be 100% of what you'd like, but at least it's a mathematical definition of what security of a solution would be. And then you construct something and, most importantly, you give a security proof. You want to say: under very well-specified assumptions (all of a sudden assumptions come in; I'll say something about that in a minute), if they hold, then the primitive that you constructed is secure according to the definition that you've laid out. The history of cryptography, at least since I've been involved in it, is that these reductions between well-specified assumptions and proofs of security are what the game is about. Without proofs we have nothing, really, because if somebody is breaking the scheme and not telling us, then what's the point of the whole thing? So the proofs are really the main thing at our disposal.

Now, what kind of assumptions? I say the word "assumptions" because they are used all over cryptography, and of course a lot of the work is to try to minimise them. They fall into a few categories. One, which is the one we mostly hope for, is that we believe certain computational problems are hard. I'll talk about one of them during the talk; usually people know about factoring integers: here's a problem we don't know how to solve efficiently, so let's assume there is no such algorithm, and now build a primitive and prove that it's secure under that assumption. There are other assumptions you could make: there are several entities in the system, each of which might be adversarial, but they don't all collude against you. Another type would be a physical assumption: if you're talking about quantum physics, that there are quantum computers, say, or assumptions about distance, that you're blocking two entities so that they can't actually transfer information. And these days, people talk about trusted hardware.
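Putting the recipe together, the shape of a typical security statement looks like this (a schematic rendering for orientation, not a particular theorem from the talk):

```latex
% Schematic form of a security theorem and the reduction behind it.
% Assumption A: no probabilistic polynomial-time (PPT) algorithm solves problem P
% (e.g. factoring, or the learning-with-errors problem discussed later).
\[
  A \;\Longrightarrow\; \forall\ \text{PPT adversaries } \mathcal{B}:\;
  \Pr\big[\mathcal{B}\ \text{breaks scheme}\ \Pi\ \text{in the sense of Definition } D\big]
  \;\le\; \mathrm{negl}(\lambda).
\]
% Contrapositive (the reduction): any adversary that breaks \Pi with noticeable
% probability can be turned into an efficient algorithm for P.
```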
The point of this slide is to say that we would like to minimise these: we don't want to use all of these assumptions; ideally we'd use none, and sometimes we use one or two. But the important part is really to emphasise the security proof. So what does this have to do with machine learning? Oh, by the way, I just want to say that sometimes you show that it's impossible to achieve security: you define security, and then you show that, maybe even under some assumption, any scheme would break.

Okay, so: machine learning. The proposal, and in some sense the idea of this whole programme of research, is to use the same kind of recipe for machine learning. Meaning: if there's a task, we define it, first of all, properly. And that's already something. I assume there are a lot of machine learning researchers here, and certainly there are people who know more about machine learning than I do, but I think we all have to agree that if you read any of these very fascinating papers, with incredible ideas and breakthroughs, they very rarely define what they're doing. If you talk about alignment, say: I want to work on alignment, and I will work on alignment, but I have to define it myself, and my definition might not be consistent with other people's understanding, because there are no definitions. There are words, and words are, I guess, also a kind of definition, but in any case. Then a very important part is also: who is the machine learning adversary? In cryptography it's maybe more obvious; it's not necessarily the case here, but we'd like to define it. And then, since we're talking about trust, we want to define what a trustworthy solution would be, that is, give a definition. And finally you might try to come up with a solution, and we want to focus on having some kind of proof that, maybe under some assumptions, the solution you have constructed is trustworthy with respect to your definition of trustworthy.

Maybe this is the most important slide of all: it would be good to be able to have a definition. Maybe it won't be the right definition; maybe later on there will be a more general one, or another one that you can show is equivalent, but there should be some formal thing that you're trying to satisfy. And again: which assumptions? They could be the cryptographic assumptions; they could be others. So this is the mindset for this talk. Okay.
And again, as I said before, you define what you want to achieve (you define what alignment is, say) and you define what your adversary is, and you might not be able to achieve the definition you came up with; you might even be able to prove that you cannot achieve it exactly. So what do you do then? You could change your definition, or you could change your solution. I'll show you in this talk a couple of examples where you can actually achieve what we want, and one where you can prove that it's impossible to achieve it, and then you have to contend with that.

Okay. Just one word before I go into these examples, which are verification, robustness and privacy, as in the title that was advertised. I want to say that the first gut-level reaction by people in machine learning, or at least it used to be, is this: cryptography is designed with an adversary in mind, and if there's no adversary, there's no problem. Why should you encrypt if nobody's listening? Why should you protect access to your Amazon account if nobody's pretending to be you? But in machine learning that's not the mindset; you didn't come up with machine learning solutions because you were trying to get past an adversary. So maybe this whole adversarial thinking is not the right one. But the fact is that right now, and for the foreseeable future, AI systems are very attractive targets. There is a huge payoff to breaking them, in many different ways, whether at a national level or a commercial level. And therefore thinking about an adversary is a good way to think, because if you can show that things work in the presence of an adversary, you should feel safer than if you had assumed some particular behaviour on the part of the adversary.

My second point is that when we talk about adversaries, we would like to prepare for a worst-case adversary. We don't want to say: oh, the adversary uses a particular strategy. For those of you familiar with adversarial examples: there are solutions that say, well, if the adversary does the following to an image, then I can protect against it; that is, the adversary's ability is restricted to certain transformations of images. But why should the adversary be restricted? They can put a patch on a stop sign. I'm saying things we've all seen in other talks. So we'd like, if possible, to prepare for a worst-case adversary.
But one thing we do assume, throughout cryptography and also throughout this talk, is that the adversary is not computationally unbounded. In other words, it cannot factor integers, if you like; it does not have exponential time. There is some bound on the running time of the adversary, and on other resources as well. The resources these days could be samples: there's a distribution, or there's data, and it comes at some cost. So there is some bound on how much data, how much time, how much compute, and when I talk about a bound I mean polynomial.

So what are the challenges I will talk about, which are examples of this approach? (It looks really pale up here; can you actually see the screen? Yeah? Okay, fine, good.) I'll talk a little bit about privacy, aiming at the issue that a lot of the power, at least initially, came from the data of individuals in the training data, and we might not want to hand over medical records and so forth, or might not be able to because of legal restrictions. Then I'll talk about verification, and this aims at the fact that I'm not training my own model: I'm using a model that was trained by others, whether it's DeepMind or OpenAI or whatever company, and we would like some way to verify that the models presented to us are good, that they haven't been tampered with, for whatever reason. And lastly I'll say something about robustness. That is: we have the data today, we train, it looks good, often with respect to some benchmarks or some held-out data. But what about distribution shifts and the data of the future? How can we trust that the model is robust with respect to changes in the data?

So those are the three areas I'll talk about. And you can already see that if you had to sit down and ask who the adversary is: in the first, the adversary might be the training algorithm that's getting our data; in the second, it might be the company that's training a model and might not be doing it the way they claim, maybe saving on resources and therefore doing something different; and in the third, the adversary is in some sense even nature, because distributions do shift, and what do you do then; but it could also be a much worse adversary.

Okay. So, a very simplified picture.
We'll talk a little bit about what happens during the data-collection and training phase, then post-development of a model (call it the audit phase), and then when the model is being used. At a very high level I'll think of machine learning in just these two stages; I'm not looking at reinforcement learning or anything like that. Just think of a training algorithm that gets labelled inputs (x, y) from a distribution and outputs a model h (h for hypothesis), and it wants to minimise loss according to some loss function. And then, in the other phase, the training is done, there is already the model h, and now we get some individual's data and produce some prediction or probability. More than that, this could be a language model, so x is a query and the output is an answer: not just prediction but also generation, and the output could be a distribution over sequences, or sequences. So let's be very general in thinking about what I'm aiming at.

I'll start with privacy. In some sense privacy is the easiest problem, not necessarily the easiest to achieve in terms of the computing power that's necessary, but the easiest to define, because we have so much experience, especially as cryptographers, in trying to protect the privacy of data. There are lots of tools: secret sharing, private information retrieval, secure computation, homomorphic encryption and so forth. I'll mention some of them, but this is just a laundry list; there are lots of tools at our disposal, and in some sense we can use all of them. They're all defined to achieve certain security under certain assumptions, and they fit quite naturally to some of the problems that come up in machine learning. In particular, we're going to use homomorphic encryption in the next two slides.

Homomorphic encryption is this little chart here. It's an encryption method with a special property: not only can you encrypt and then decrypt, you can also do something we call "evaluate". That is, you can take the encryption of x, without understanding what x is, just given the ciphertext, and run an evaluation on it, so that at the end you get a new ciphertext; but what's encrypted is no longer x, it's f(x). And you can apply the decryption step to this new ciphertext c' and get back f(x). (A toy sketch of this interface follows below.)
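To make the Enc / Eval / Dec interface concrete, here is a minimal runnable sketch. It uses textbook Paillier encryption with tiny, insecure parameters, and it is only additively homomorphic; it illustrates the interface and the property Dec(Eval(Enc(x1), Enc(x2))) = x1 + x2, not the lattice-based fully homomorphic schemes discussed in this talk.

```python
# Toy illustration of the Enc / Eval / Dec interface (textbook Paillier, tiny insecure
# parameters). Additively homomorphic only; NOT the LWE-based FHE discussed in the talk.
import math
import secrets

def keygen(p: int = 101, q: int = 113):
    n = p * q
    lam = math.lcm(p - 1, q - 1)       # Carmichael lambda of n
    mu = pow(lam, -1, n)               # valid because we take g = n + 1
    return (n,), (n, lam, mu)          # (public key, secret key)

def enc(pk, m: int) -> int:
    (n,) = pk
    while True:                        # random r coprime to n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # with g = n + 1 we have g^m mod n^2 = 1 + m*n
    return ((1 + m * n) * pow(r, n, n * n)) % (n * n)

def eval_add(pk, c1: int, c2: int) -> int:
    """Homomorphic evaluation: multiplying ciphertexts adds the plaintexts."""
    (n,) = pk
    return (c1 * c2) % (n * n)

def dec(sk, c: int) -> int:
    n, lam, mu = sk
    u = pow(c, lam, n * n)
    return (((u - 1) // n) * mu) % n

pk, sk = keygen()
c = eval_add(pk, enc(pk, 1234), enc(pk, 111))
assert dec(sk, c) == 1345              # Dec(Eval(Enc(x1), Enc(x2))) = x1 + x2
```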
So that's the interface, and the question is which f's are supported: it could be addition, it could be multiplication, it could be any circuit. This is one of the biggest, probably the biggest, inventions of the last twenty years in cryptography; around 2009 the first fully homomorphic encryption scheme was proposed. And it is extremely useful for what I'm trying to do here.

For example, just to get us warmed up, suppose we want to apply it to the training-data problem. This is sort of a homework problem once you understand the definitions; working out this slide is a fairly easy homework problem, but it sets the stage. So let's say all the data is encrypted. Who encrypted it? That's a good question, but take a simple example: each person encrypts their medical file, or each doctor or each hospital encrypts all the medical files at their disposal. They all come together and give the ciphertexts to some trainer, and the trainer works on them. Instead of working directly on the data, on the x's and y's, the trainer works on the encryptions. So what you need is an algorithm that can take whatever program we were going to run on the x's and y's in the clear and run it on the encryptions. That's the encrypted-compute step we just described: if we have at our disposal something that can take the x's and y's (the medical file is x, and the label of whether you survive or don't survive might be y), then everything can be done under encryption. We'll talk about cost in a minute, but in an ideal world we have this encryption method and we get an encrypted model.

Now we want to use the model, though, and we can't really use it as a piece of information that is still encrypted; at this stage we have to decrypt it. So how are we going to decrypt it? The idea is that there is no single party (it's certainly not OpenAI) holding the decryption key, because then they could just decrypt our medical-record information. Instead there is a decryption key that is split up: metaphorically, there are three people here who each hold a part of the key, and each can apply their share, so that after each one takes their step, or they do it together, voilà, h appears in the clear. (A sketch of this key-splitting idea follows below.)
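As a minimal sketch of that key-splitting idea, here is additive secret sharing of a key. Real threshold decryption for (F)HE is more refined: each party applies its share to the ciphertext, and only the plaintext is ever reconstructed, never the key itself.

```python
# Toy additive secret sharing of a decryption key: no single party (and no strict
# subset of the parties) learns anything about the key, while all of them together
# can reconstruct it.
import secrets

MOD = 2**256  # work modulo a large power of two

def share_key(key: int, n_parties: int) -> list[int]:
    """Split `key` into n additive shares: share_1 + ... + share_n = key (mod MOD)."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((key - sum(shares)) % MOD)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MOD

key = secrets.randbelow(MOD)
shares = share_key(key, 3)                # e.g. three hospitals, one share each
assert reconstruct(shares) == key         # all three together recover the key
# any strict subset sees only uniformly random values, independent of the key
assert reconstruct(shares[:2]) != key     # (fails only with negligible probability)
```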
So what are the assumptions here? First of all, that there is an encryption scheme with the properties I want, and that's based on something called learning with errors, which I'll put up on a slide in a minute. So it's not the factoring problem; it's a different problem, coming from the geometry of numbers. And second, that these shareholders don't collude. So there are two assumptions: one is that you have given the pieces of the key to entities that don't all collude together to break the scheme, and the other is that the scheme itself is secure.

Now, some of you may have worked on differential privacy; how many people have heard of it? Yeah, many. Anyway, you could say that this is no good, because h, for example, could just be the list of everybody's data, and if it's a list of everybody's data, then h itself reveals people's data. So maybe you want some requirement on h itself, but that's a different problem. One problem is to design an h that does not leak the information in the x's and y's; the other is how you actually compute h without looking at the x's and y's in the data. There are lots of papers about training under differential privacy; I'm not going to touch that.

I didn't give you a definition here, but the definition really would be that, whatever algorithm you had that was working in the clear, if you compare that algorithm with the encrypted one, whatever you can learn from the encrypted one is what you could have learned... the other way around... you shouldn't learn more. I can't think right now; I went to sleep at 1:30. What is the definition you would want? Let me ask you. The definition we would want from this scheme is that you are not going to find out more about the data, about the x's and y's, than h itself provides. We've said that h is okay: everybody gets h at the end, or at least OpenAI gets h at the end. But we want that whatever you can recover from h about the x's and y's holds regardless of whether you had access to these encryptions or not. In other words, h obviously gives you something, but you don't want the fact that you looked at the encryptions of the medical files to give you anything more than h does. That's the definition that would be satisfied.
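In symbols, the guarantee just described is a standard simulation-style statement (a schematic phrasing, not the exact definition used in the papers):

```latex
% Whatever an efficient adversary can compute from the released model h together with
% the encrypted records, an efficient simulator can compute from h alone.
\[
  \forall\ \text{PPT}\ \mathcal{A}\ \ \exists\ \text{PPT}\ \mathcal{S}:\quad
  \mathcal{A}\big(h,\ \mathrm{Enc}(x_1,y_1),\dots,\mathrm{Enc}(x_m,y_m)\big)
  \;\approx_c\;
  \mathcal{S}(h),
\]
% where \approx_c denotes computational indistinguishability: seeing the ciphertexts
% reveals nothing about the (x_i, y_i) beyond what h itself reveals.
```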
So what is this learning-with-errors problem, the assumption on which we build encryption schemes with this extra property that you can compute on ciphertexts? Here is the problem. Think of s as a secret vector of dimension n: we don't know the values of its components. We are given a lot of linear equations in these variables; but of course, if we were just given exact equations, we could do Gaussian elimination and figure out what s is. So we're given noisy equations: on the left-hand side are the coefficients, and on the right-hand side we add some noise, so the value given is not exactly the right-hand side of the equation, it's that value plus some noise from a Gaussian-like distribution.

It turns out that this problem (given a bunch of coefficient vectors and a noisy right-hand side, find s) is as hard as decoding a random linear code, and as hard as approximating the length of the shortest vector in a worst-case n-dimensional integer lattice. So it's not only a problem that we don't know how to solve; it's a problem for which people have shown that the average-case distribution is as hard as a worst-case problem that we don't know how to solve. And when I say "don't know how to solve", I don't mean that I personally don't know how to solve it: I mean that the best known algorithms, and that is true even for quantum computers today, take exponential time, 2 to the n, where n is the length of the vector. This problem has been considered for many years, especially in its lattice version, and this is the best we know how to do.

Now, this doesn't mean that tomorrow some genius, maybe from Oxford, won't come up with a polynomial-time algorithm, in which case maybe that's even better than having a secure FHE, because they will have solved a fundamental problem. It's what we call a win-win: we say that if you can attack the scheme, the encryption scheme, then we give you a constructive reduction from that attack to solving this worst-case approximate shortest-vector lattice problem. Either way we win: either you have a good scheme, or you have a bad scheme, in which case you have a good algorithm. And that is what was achieved by the proof.
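Written out in standard notation (assumed here, not copied from the slide), the problem is:

```latex
% Learning with errors (LWE).
% Secret s \in \mathbb{Z}_q^n; A \in \mathbb{Z}_q^{m \times n} uniformly random;
% e \in \mathbb{Z}_q^m with small entries drawn from a (discretised) Gaussian.
\[
  \text{Given } \big(A,\; b = A s + e \bmod q\big):\ \text{recover } s,
  \quad\text{or distinguish } (A, b) \text{ from } (A, u) \text{ with } u \text{ uniform.}
\]
% Without the noise e, Gaussian elimination recovers s. With the noise, the best known
% algorithms (classical or quantum) run in time exponential in n, and solving LWE on
% average is as hard as worst-case approximate shortest-vector problems on lattices.
```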
Okay, so this is all beautiful in theory. But often, when people talk about FHE, and it's been years now since it was invented in 2009, people would say: well, it's very nice in theory, but it's impractical. First of all, I hate that sentence, because it never really holds up over time: whatever is impractical in the past usually becomes practical in the future. Specifically here, there are open-source homomorphic encryption libraries, built by many different people, and they're open, so it is possible to run this in practice. Now it's all a question of scale. And when I say scale: if you think about large language models, and you want to train foundation models and fine-tune and so forth, you'd like to know the concrete efficiency of these schemes.

The way to think about it is along three different axes. There's the adversary, which we sometimes assume is honest-but-curious: it doesn't tamper. Or it might be an actively malicious adversary that also changes things in its quest to break your encryption scheme. The adversary here is the machine-learning training algorithm, because it gets the encryptions; it could also be the people providing the data, and that's called poisoning, which is not part of this particular part of the talk. Then there's the question of what the computation is. It could be linear regression, logistic regression, deep neural nets, language models; each of these is a big jump in terms of the complexity of the computation. Where are we? We can use homomorphic encryption for linear models, no problem; we have things for logistic regression; when you get to large language models, it's obviously not there yet. So what do people do? There's another axis, which is the assumptions. One assumption I talked about is learning with errors, or factoring: the computational assumptions. Another might be that you have more than one party privy to the computation, but they don't talk to each other. And the third assumption is trusted hardware, and that's really the big player these days.

So when people talk about trusted hardware, what do they mean? They may mean many things, but mostly they mean this Nvidia chip, which is not that new at this point, it's been a few years: the H100, where the idea is that, in some sense, they say, here's a secure box; put your encryptions in there.
The box holds the secret key, and inside it, it will decrypt, do some computation, and then encrypt the result and send it out. So in a sense you're trusting that this box works without leaking anything about the secret key or the message that came in encrypted. And there's usually also an attestation module, which says: not only that, but you're going to get a proof that what was done in there, under the auspices of this box, was the correct computation. There used to be Intel SGX, which was similar, though it wasn't GPU-based; it had similar kinds of boxes and a similar kind of belief that it just works.

(I guess the sponsors' banner is hiding this line here; just a second.) Okay: beware, whenever you talk about secure hardware, that there are these so-called side-channel attacks. It could be that, because of the amount of power consumed, or because two people used the same chip and something was left over from the previous encryption that was in the box, information leaks. And there are bugs; maybe you shouldn't trust the companies blindly. There's a whole field of people out there working on secure hardware. For our purposes we can use it; we just have to understand that it is part of the assumption. (A rough sketch of this enclave workflow follows below.)
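Here is a rough sketch of the workflow just described. Everything in it is a stand-in: the XOR "encryption", the HMAC "attestation" and the enclave object illustrate only the shape of the protocol (decrypt inside the box, compute, re-encrypt, attest to what ran), not the actual NVIDIA H100 or Intel SGX interfaces.

```python
# Toy sketch of a trusted-execution-environment workflow. All primitives here are
# mocks; real TEEs use vendor-provisioned keys, real ciphers and signed attestations.
import hashlib
import hmac
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

class ToyEnclave:
    def __init__(self):
        self._data_key = secrets.token_bytes(64)         # never leaves the enclave
        self._attestation_key = secrets.token_bytes(32)  # stands in for a vendor key

    def encrypt_for_enclave(self, plaintext: bytes) -> bytes:
        # stand-in for encrypting data so that only the enclave can read it
        # (toy: assumes messages shorter than the key)
        return xor_bytes(plaintext, self._data_key)

    def run(self, program, ciphertext: bytes) -> tuple[bytes, str]:
        data = xor_bytes(ciphertext, self._data_key)   # decrypt inside the box
        result = program(data)                         # compute inside the box
        out = xor_bytes(result, self._data_key)        # re-encrypt before it leaves
        # attestation: a tag binding the code that ran to the output it produced
        code_hash = hashlib.sha256(program.__code__.co_code).digest()
        tag = hmac.new(self._attestation_key, code_hash + out, hashlib.sha256).hexdigest()
        return out, tag

enclave = ToyEnclave()
c = enclave.encrypt_for_enclave(b"patient record")
out, attestation = enclave.run(lambda d: d.upper(), c)
```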
All right, so has this been used? This is all a blueprint so far. Mostly the usage has been in finance, where there is a lot of crypto use; the more machine-learning-type use of privacy has been mostly in the medical and, actually, the genomic field. Why genomics? Because you need a lot of data: it's a place where the complexity comes in and the amounts of data are large, because otherwise you see no effect. If you think about things like Crohn's disease, or breast cancer, or schizophrenia, it's only when you get to large enough numbers that you can actually find signal in the genome that's related to the phenotype. So it's a perfect example of a place where privacy is required, because you're talking about medical records, but size is important. So how do you achieve it?

In 2020 we had a paper doing GWAS (this is also through a company that I co-founded, so I was part of it), where we used exactly the type of blueprint I talked about, except that what you're running is a genome-wide association study analysis under homomorphic encryption, and at the end you get the answer. You'd like to check the accuracy of the answers, that it worked correctly, and also the time performance. At the time, in 2020, the extrapolation was 100,000 individuals and 500,000 SNPs in this amount of time on a single server. This is the kind of application that doesn't have to work in real time, and today the numbers would be much better.

What's another example? Just a point to make here about who holds the keys: in the case of GWAS, you can think of different medical centres, say the Broad Institute, Oxford, UCSF, each with their own database. They can be the ones holding the parts of the key, because they don't want to share data with each other, but they're okay being partial key holders themselves, and then they come together and reconstruct h.

All right, so that was 2020. Then in 2023 there's an application working on real-world oncology data analysis. This is a hospital that has a lot of cancer patients, and the novelty here was that, rather than having lots of data of the same form for different patients, you had one medical centre that had the electronic medical records, and another medical centre that had the results once someone was a cancer patient. Think of it as the same patient, with different data held in different places. So you need join operations, to identify the same patient and join the records, and all of this under encryption, because they are not allowed to send each other data. What was used for that is something called threshold FHE, but that doesn't matter; the main thing is that you can support a lot of the statistics that people use in medical packages, like Kaplan-Meier survival curves and lots of other very common statistical operations.

Okay. The last thing I'm going to say about privacy is that now, in 2024, secure hardware has come into the picture.
So often what you see is medical studies where you have, in this particular example, different hospitals, and the application was that each hospital trains its own model locally, and then the models are combined, using something called federated learning, into a bigger model that takes the information from all the different hospitals into account. And you use secure hardware as well: so there is both federated learning (this is the federation) and fully homomorphic encryption, plus what's called a trusted execution environment, which is the hardware. Why use hardware, if we were doing so well in the oncology study and the GWAS? Because in this particular application the data was a whole bunch of pathology slide images, which are very communication-heavy to send over the line. So you want to do things locally and then just send updates of the models, and even the updates you want to be secure; that's why you use the secure hardware. (A toy sketch of the averaging step is below.) There's a lot here; the only thing to retain is this: privacy, in some sense, is easy to define, because we've worked on it and we have the tools, so a lot of this is about putting the tools together in an efficient manner for different applications. It's at the stage where, where privacy matters, it is worth taking a look at this type of technology and making it work, and that's being done.
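As a toy illustration of the federated-averaging step mentioned above, here is the combination step on plain, unencrypted weight vectors; in the deployment described in the talk, the local updates would additionally be protected with homomorphic encryption and a trusted execution environment.

```python
# Toy federated averaging: each hospital trains locally, then the locally trained
# weight vectors are combined into one global model, weighted by local dataset size.
# (Plain floats only; the encryption and trusted hardware from the talk are omitted.)

def federated_average(updates: list[list[float]], sizes: list[int]) -> list[float]:
    """Weighted average of per-hospital weight vectors."""
    total = sum(sizes)
    dim = len(updates[0])
    return [
        sum(w[i] * n_i for w, n_i in zip(updates, sizes)) / total
        for i in range(dim)
    ]

# e.g. three hospitals with 2-dimensional local models and datasets of different sizes
local_models = [[0.10, 0.20], [0.30, 0.40], [0.50, 0.60]]
global_model = federated_average(local_models, sizes=[100, 200, 700])
print(global_model)  # roughly [0.42, 0.52]: closest to the model that saw the most data
```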
Okay, let's go on and talk about verification. (What time is it? How long have I been talking? Wow. Okay, I started a little late.) So, as I said, machine learning should really be thought of as a service: I'm not doing it in my own office; somebody else is doing it and I use it. So think of it this way: there's a client, and it gets this model. The client could even supply the data, so forget about privacy now: somehow the machine-learning trainer, the service provider, got data, came up with a model, and gave it back to the client.

Okay. So now the question is: what do I know about this service provider? As you see on the slide, they have ears and a tail, like the devil; maybe not the devil, but the devil incarnate. These are the kinds of signs you see when you drive from Berkeley to San Francisco: there's a company called Hive, and they keep changing their signs, and the current one (or maybe this was a few months ago) is "We make AI work. Don't just trust us, test us." I don't know exactly what they mean, but they themselves are confessing that you shouldn't just trust.

Okay. So what we would like to do is, post-development, verify properties of this model. What are the properties you want to verify? There could be many. For example, you might want to verify that the model is accurate; that's what most people concentrate on: they have their own data, they test the model on it and check that they get the right results. You might want to say that it's correct: maybe it's not good enough that it's good on average, you want it to be correct on the particular prediction it gives you. You might want to talk about robustness: what happens when your distribution changes. You might want to talk about fairness; in the United States you don't use that word any more, but it used to be a big field a few months ago: whether it's fair to different subpopulations. I think it is one of the words that has been stricken; I'm not sure, but it should be. You might want to talk about safety. Or, maybe later down the line, when the regulations are set, you want to prove that the model satisfies the regulations, whatever they are.

And obviously you could just retrain by yourself, so that doesn't make any sense; you have to put some restriction on this module that verifies properties, namely that it should be cheaper than retraining on your own. Cheaper how? Maybe using fewer data samples, maybe lower-quality data, maybe being much more efficient, maybe using only black-box access to the model, because they're not necessarily going to give you all the model weights and tell you exactly how the model works. So this is the big goal: how do you verify properties of the model that you were given?

I will show you a couple of results about this. First of all, as I said, accuracy is the simplest thing to define, with respect to a given distribution. What people do today is benchmarks: if you pass the benchmarks, then you're golden. So could you have a more rigorous definition? Because the problem with benchmarks is that everybody knows the benchmarks, and with benchmarks in general it's hard to define what it is they are achieving for you.
So this is a paper from a few years ago, which we called "Interactive Proofs for Verifying Machine Learning", that is, verifying the quality or accuracy of machine learning. In some sense this particular paper is just getting warmed up. The definition we gave there is this. On one hand there is Valiant's PAC learning, which I'm sure everybody here knows: you're given examples, and what it means for the model to have learned is that it is accurate with high probability. And now you can talk about probabilistic and approximate verification, not just learning: whereas in PAC you talked about learning, now I want to verify.

So here is a possible definition. What do I want to verify? Let's say this is the model designer, and they gave me a model; it's drawn as a neural net here, but it's just some h in general. Now I can ask questions and get back answers, it can go interactively, and at the end I want to be assured, with very high probability, that this model is within an additive error of the most accurate model possible for a given distribution. What does that mean? Let's unpack it a little more. We say that this h is in a class: it could be neural nets of a certain depth, it could be decision trees, it could be some other class of models, but there is a class that your model belongs to. And you would say that this class is verifiable if (there's something cut off underneath here on the slide, never mind) the following holds: for every other hypothesis h' in the class, the loss of your h is within an additive error of the loss of h'. So h is not worse than the best hypothesis in the class, except by an additive amount. And of course, in addition to the fact that your loss is at most an additive error from the best, you want what we call double efficiency, and that is that the verifier is much more efficient, in sample size or sample quality, than the model designer.
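Schematically (the notation is mine, not the paper's exact statement), the requirement on an accepted model h is:

```latex
% Verifiable accuracy for a hypothesis class H over distribution D with loss L_D:
% whenever the verifier V, after interacting with the prover, accepts the supplied h,
\[
  \Pr\Big[\, L_D(h) \;\le\; \min_{h' \in \mathcal{H}} L_D(h') + \varepsilon \,\Big]
  \;\ge\; 1 - \delta ,
\]
% and "double efficiency" additionally requires that V use far fewer (or lower-quality)
% samples, and less computation, than it would take to learn such an h from scratch.
```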
460 00:41:40,040 --> 00:41:45,049 If you look at their Fourier representation, there is only a constant number of non-zero coefficients 461 00:41:45,050 --> 00:41:51,200 which are large, and this captures decision trees, small formulas, AC0 circuits. 462 00:41:51,620 --> 00:42:01,580 And the theorem shows that if the designer was modelling, with h, a ground truth that obeys these properties, 463 00:42:01,850 --> 00:42:08,990 then you can verify that the model they gave you is as good as, within an additive error of, 464 00:42:09,110 --> 00:42:16,700 the best one, using a lot less data than what the model designer used. 465 00:42:16,850 --> 00:42:21,670 In particular, the verifier uses essentially only random samples. 466 00:42:21,680 --> 00:42:28,040 It's well known that in order to learn these kinds of functions you need query access to the function: 467 00:42:28,040 --> 00:42:33,770 you need to choose x and then ask for its label; you can't just use random examples. Whereas the verifier can use just random samples. 468 00:42:35,100 --> 00:42:40,050 So what does that say? It means the verifier still has to do work, but it is working with much less 469 00:42:40,060 --> 00:42:42,330 availability of data 470 00:42:42,330 --> 00:42:51,870 and much weaker access to data, and yet it can now verify that the model it was given is indeed as good as, 471 00:42:52,200 --> 00:42:55,890 except for the error term, any model of the promised form. 472 00:42:56,930 --> 00:43:03,860 And there are a lot of other results. In fact, there was an improvement of this particular result by Tom Gur, in another paper on the limits of interactive proofs for this problem. 473 00:43:04,490 --> 00:43:10,010 Tom Gur's result is actually quite interesting; he improved the efficiency significantly. 474 00:43:10,460 --> 00:43:13,570 Okay. Now, what about worst-case guarantees? 475 00:43:13,580 --> 00:43:19,010 What this part talked about is: somebody gives me a model and I want to check that the model does well overall, 476 00:43:19,010 --> 00:43:22,159 a bit like normal benchmarking, 477 00:43:22,160 --> 00:43:27,260 but with respect to this definition, for example that it does as well as any other model in the class that I specified. 478 00:43:27,800 --> 00:43:37,130 But what about worst-case guarantees? Let's say the model that I got, I checked it, either using these methods or benchmarking, 479 00:43:37,340 --> 00:43:43,820 and 99% of the time it's good on the task. But then a particular input comes in 480 00:43:44,750 --> 00:43:48,440 and you get a particular output. How do you know that 481 00:43:48,440 --> 00:43:55,970 the answer you got is actually correct? Whether it's a prediction, or, in the context of a large language model, 482 00:43:56,330 --> 00:44:00,730 maybe you gave it your medical record and it told you that you have five years to live. 483 00:44:00,740 --> 00:44:03,800 You'd like to know whether that's accurate, and not wait five years to find out. 484 00:44:03,980 --> 00:44:07,670 Right now we don't really know anything. 485 00:44:07,670 --> 00:44:15,680 As you well know, there are lots of examples where, even when multiplying large numbers, the answer is incorrect.
486 00:44:16,310 --> 00:44:23,660 So what we'd really like, ultimately, is to not just know that the model overall has 99% accuracy, 487 00:44:23,840 --> 00:44:27,649 but to know whether the particular answer to our particular question is 488 00:44:27,650 --> 00:44:32,860 correct. This is moving from the average case to the worst case: even in the worst case, 489 00:44:33,070 --> 00:44:37,690 on the hardest x, do you get any kind of certainty that the answer is correct? 490 00:44:38,110 --> 00:44:43,210 And this is in a paper with two of my students, actually; 491 00:44:43,390 --> 00:44:50,950 one used to be my student but has been faculty for many years already. It addresses not prediction but large language models, 492 00:44:50,950 --> 00:44:59,649 so generative models. And we ask: these models supposedly can be trained to do everything, 493 00:44:59,650 --> 00:45:06,850 you ask a question and it gives you an answer, whether math, science, literature, or psychological advice; 494 00:45:06,940 --> 00:45:10,750 maybe they can also generate proofs in the same way that they generate answers. 495 00:45:10,780 --> 00:45:15,910 So you would like them to give you not only the output y, but also a proof that y is correct. 496 00:45:16,060 --> 00:45:17,530 Of course, what do we mean by correct here? 497 00:45:17,770 --> 00:45:24,249 If it's a math problem, it's well defined. When we're talking about natural language and so forth, I think it's a fascinating question 498 00:45:24,250 --> 00:45:25,809 how we are going to define what correct means. 499 00:45:25,810 --> 00:45:32,260 Do we have a database of correct facts, so that we can show a contradiction between whatever was generated and those facts? 500 00:45:32,830 --> 00:45:36,430 Let's leave that aside. Let's say that we have a notion of correct, say within mathematics. 501 00:45:36,760 --> 00:45:39,610 For two numbers being multiplied, there is one answer that's correct. 502 00:45:39,910 --> 00:45:44,590 In general there could be multiple correct answers, but we have at least a sound definition of what correctness means. 503 00:45:45,190 --> 00:45:49,180 So now that we have that, can we actually teach language models to come up with proofs as well? 504 00:45:49,720 --> 00:45:55,500 That's what's in this paper, and we first need a model. 505 00:45:55,510 --> 00:46:00,730 So here is the language model, ChatGPT, whatever. 506 00:46:01,210 --> 00:46:04,060 And what we're going to add is something we call a verifier. 507 00:46:04,090 --> 00:46:08,350 We design an algorithm that is not part of the machine learning process. 508 00:46:08,530 --> 00:46:14,259 We design it ourselves, we call it the verifier, and it is fully verified, 509 00:46:14,260 --> 00:46:17,080 meaning that I know and trust this particular verifier. 510 00:46:17,440 --> 00:46:25,899 Now, when an input x comes in, it also goes to the verifier (maybe the verifier even generated it), and the verifier gets the answer y. 511 00:46:25,900 --> 00:46:30,520 But now we want to be convinced that y is correct; we want a proof that y is correct. 512 00:46:30,520 --> 00:46:33,790 What we allow in this abstraction is that the proof can be interactive.
513 00:46:34,000 --> 00:46:39,879 Correctness here is with respect to some relation, and it could be that the verifier asks questions 514 00:46:39,880 --> 00:46:47,530 and gets answers, back and forth, and at the end it says either "I accept, y is correct" or "I reject, y is incorrect". 515 00:46:48,130 --> 00:46:51,700 Okay, so this is just a model. Now, what are the requirements? 516 00:46:51,820 --> 00:46:55,060 What we'd like, first of all, is that this is sound. What does that mean? 517 00:46:55,330 --> 00:46:58,719 That if y is incorrect, we reject with high probability. 518 00:46:58,720 --> 00:47:02,800 So first of all, I want to reject incorrect y's with high probability. 519 00:47:02,920 --> 00:47:06,130 Now, of course, I could just reject all the time and soundness would be satisfied, 520 00:47:06,280 --> 00:47:11,850 so we also want some utility out of this. And there, essentially, what we want is this: 521 00:47:11,890 --> 00:47:18,580 the standard thing in machine learning is that the probability over x that we get a correct y is high. 522 00:47:18,910 --> 00:47:24,700 What I'd like, in addition, is that the model can also 523 00:47:25,270 --> 00:47:30,850 produce these answers to my questions, so that at the end I will be convinced. 524 00:47:31,030 --> 00:47:37,240 We call it self-provability. Whereas before we just knew that on average over x the answer will be correct, 525 00:47:37,450 --> 00:47:44,890 now I'm saying incorrect y's will always be rejected, and also, with high probability, on correct y's I will be convinced to accept. 526 00:47:46,230 --> 00:47:49,680 So right now, we're sort of happy that 99% of the time we get the right answer. 527 00:47:49,740 --> 00:47:55,770 We're saying we also want that 99% of the time, or 98% of the time, not only do we get the answer, but we also get a proof (these two requirements are restated a bit more formally below). 528 00:47:57,300 --> 00:48:04,920 And that's the model. And there's a theorem, under some 529 00:48:05,910 --> 00:48:10,350 convexity and Lipschitz assumptions, that says how, 530 00:48:10,560 --> 00:48:13,170 or when, you can train a model to give you a proof. 531 00:48:13,710 --> 00:48:23,130 And the idea here is actually to use stochastic gradient descent, or rather some variant of stochastic gradient descent. 532 00:48:23,130 --> 00:48:30,450 The new thing about this theorem, which I don't really have time to get into because I do want to get to my other material, is 533 00:48:31,580 --> 00:48:40,250 what you are given: in addition to the usual conditions, you're given access to something I call an accepting-transcript example generator. 534 00:48:40,880 --> 00:48:44,060 So the only thing that's new 535 00:48:45,600 --> 00:48:51,930 on this line is that concept. What does that mean? It means you have examples of proofs, of theorems if you like. 536 00:48:52,110 --> 00:48:57,010 Take multiplication: let's say there's a proof that the result is correct. 537 00:48:57,060 --> 00:49:00,280 What would a proof that the result is correct be? One option is that you redo the multiplication yourself; 538 00:49:00,300 --> 00:49:05,280 we don't want that, we want it to be shorter. So let's say, for a minute, that you have a shorter proof that the result is correct. 539 00:49:05,520 --> 00:49:10,049 What I want is to have lots of examples of x and y which have been multiplied, together with such proofs.
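A slightly more formal restatement of the two requirements above, soundness and self-provability, in generic notation; this is a paraphrase rather than the paper's exact definitions. Here V is the trusted verifier, M is the model acting as prover, D is the input distribution, and (P, V)(x, y) denotes the interaction about the claimed answer y.

```latex
% Soundness: no prover strategy P* can make V accept an incorrect answer,
% except with small probability epsilon over the verifier's coins.
\forall x,\; \forall P^*,\; \forall y \text{ incorrect for } x:\quad
  \Pr\big[(P^*, V)(x, y) = \mathrm{accept}\big] \;\le\; \epsilon

% Self-provability: on a random input, the model both answers correctly and
% convinces the verifier, except with small probability delta.
\Pr_{x \sim D}\big[\, y = M(x) \text{ is correct} \;\wedge\; (M, V)(x, y) = \mathrm{accept} \,\big] \;\ge\; 1 - \delta
```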
540 00:49:10,050 --> 00:49:17,730 And these kinds of proofs, if I have them as examples, I show them, in a sense, to the large language model, 541 00:49:17,850 --> 00:49:23,790 so it learns how to generate proofs of this sort for the new x and y that you want to multiply. 542 00:49:24,210 --> 00:49:28,080 Because again, those are sequences. If you think about what these proofs look like, they're sequences. 543 00:49:28,320 --> 00:49:31,740 I'm giving it a lot of examples of sequences, each consisting of the numbers x and y, 544 00:49:31,740 --> 00:49:35,970 then an alleged product, and then a proof that that product is correct. 545 00:49:36,180 --> 00:49:41,440 So can we teach it how to generate these proofs? 546 00:49:41,520 --> 00:49:49,380 Now, for the mathematicians, who are probably a subset of the people here, the idea that proofs are just sequences may sound kind of insane. 547 00:49:49,890 --> 00:49:53,100 But the fact is that 548 00:49:55,930 --> 00:49:59,709 we have a theorem that tells you that if you have access 549 00:49:59,710 --> 00:50:04,540 to a lot of examples of proofs of a certain type (and in a minute we'll see experiments), 550 00:50:04,900 --> 00:50:11,590 then you will be able to generate these proofs yourself, in such a way that the verifier will be able to reject those which are incorrect. 551 00:50:12,520 --> 00:50:17,770 In other words, that's what a proof means here: you can reject incorrect claims with extremely high probability. 552 00:50:17,950 --> 00:50:25,510 That's a notion of proof that we accept, at least when the small probability of mistake is under your control. 553 00:50:26,110 --> 00:50:28,900 And there is of course a complexity to training the model: 554 00:50:29,140 --> 00:50:34,810 the number of iterations of gradient descent that it has to go through. 555 00:50:34,990 --> 00:50:41,820 And interestingly, I think this is the most interesting part, the size of this interaction, the questions and answers, 556 00:50:41,830 --> 00:50:44,950 the overall transcript 557 00:50:44,950 --> 00:50:51,730 length, is a very important part of how long it takes you to train a model to be able to come up with proofs. 558 00:50:52,630 --> 00:50:59,560 The longer those proofs are, the more time it takes. Again, that makes sense for people who are a little bit familiar with the language model 559 00:50:59,620 --> 00:51:02,769 space. Okay. 560 00:51:02,770 --> 00:51:07,660 So we did do it: lab experiments for the simplest algorithm. 561 00:51:07,750 --> 00:51:12,850 You know, when we teach introduction to algorithms, we always say that Euclid's algorithm is the first algorithm ever invented. 562 00:51:13,210 --> 00:51:17,200 Do you say that here? I'm sure they say it in Greece, but in any case. 563 00:51:17,350 --> 00:51:23,379 So the algorithm is GCD: you have two numbers 564 00:51:23,380 --> 00:51:28,060 and you want to find the greatest common divisor. And what's nice about it is that there actually are these coefficients: 565 00:51:28,060 --> 00:51:33,520 if you can find two numbers such that you can take the linear combination of, 566 00:51:33,790 --> 00:51:39,700 let's say, x1 and x2, the numbers whose GCD you're trying to find, where z1 and z2 are these coefficients,
567 00:51:39,910 --> 00:51:50,260 then if this linear combination divides both x1 and x2, that's actually a proof that the linear combination is the greatest common divisor. 568 00:51:50,830 --> 00:51:55,870 This is ancient; we all know these coefficients exist. 569 00:51:56,230 --> 00:52:03,130 And indeed Euclid's algorithm, when it computes the greatest common divisor of x1 and x2, can also compute those coefficients. 570 00:52:03,670 --> 00:52:09,459 But the large language model, when I give it x1 and x2, just says: hey, here's the greatest common divisor. 571 00:52:09,460 --> 00:52:13,030 Maybe it's true, maybe it's not. I don't know if it used Euclid's algorithm; 572 00:52:13,030 --> 00:52:17,440 I don't know what it does. But it certainly doesn't output these coefficients. 573 00:52:17,740 --> 00:52:19,220 So you could think of 574 00:52:19,240 --> 00:52:26,230 a particular case: I'd like to train the model not only to give me the solution to greatest common divisor, 575 00:52:26,320 --> 00:52:33,040 but also to give me these coefficients, which are essentially a proof. They're a proof because if I get the coefficients, 576 00:52:33,040 --> 00:52:38,290 I can just do one multiplication plus another multiplication and check that the result equals the answer they gave me, as in the little sketch below. 577 00:52:39,680 --> 00:52:43,130 And if it does, that is actually a 100% proof that the answer was correct. 578 00:52:43,610 --> 00:52:48,110 So that's exactly what we do. 579 00:52:48,110 --> 00:52:51,170 We give it a lot of examples of 580 00:52:51,560 --> 00:52:56,870 x1, x2, z1, z2; for a while we train using the algorithm that I specified on the previous slide. 581 00:52:57,170 --> 00:53:02,510 And at the end, indeed, you get a model which outputs these coefficients in addition to the answer. 582 00:53:02,810 --> 00:53:03,140 And 583 00:53:04,400 --> 00:53:16,370 if GPT gets 99.8% accuracy, which was reported in a paper by Charton, then with the strategy that I just outlined we got to these numbers: 584 00:53:16,370 --> 00:53:19,790 the verifiability is only 60%. That's much less, 585 00:53:19,790 --> 00:53:23,540 but it's better than nothing: you're still getting the answer, and now you're also getting verifiability. 586 00:53:23,720 --> 00:53:27,110 And it turns out that with other methods we can actually get up to 96%, 587 00:53:27,260 --> 00:53:31,340 but I won't get into it. What is the point of this? I'm done with this part of the talk. 588 00:53:31,400 --> 00:53:39,400 The point here is that verifiability of accuracy on the average, and accuracy in the worst case, are things that can be formally defined. 589 00:53:39,410 --> 00:53:44,750 Whether you like my definitions or propose other ones, they should be defined and then achieved. 590 00:53:45,260 --> 00:53:48,559 And on this whole topic, I think my student Orr was here and gave a talk, 591 00:53:48,560 --> 00:53:49,520 so some of you may have gone. 592 00:53:49,700 --> 00:53:58,010 There's a vast literature right now about using some kind of interactive proof in the context of machine learning, 593 00:53:58,130 --> 00:54:02,780 not with our definition, but with others. This could be within math-assisted proving.
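As a minimal sketch of the certificate check just described, and nothing more than that (my own toy code, not the experimental setup from the paper): given x1 and x2, an alleged GCD g, and alleged coefficients z1 and z2, the verifier needs only two multiplications, an addition, and two divisibility checks.

```python
def verify_gcd_certificate(x1: int, x2: int, g: int, z1: int, z2: int) -> bool:
    """Accept iff (g, z1, z2) certifies that g = gcd(x1, x2).

    g divides x1 and x2   -> g is a common divisor.
    z1*x1 + z2*x2 == g    -> every common divisor of x1 and x2 divides g,
                             so g is the greatest common divisor.
    """
    return g > 0 and x1 % g == 0 and x2 % g == 0 and z1 * x1 + z2 * x2 == g


# Toy usage: a correct certificate is accepted, a wrong claim is rejected.
assert verify_gcd_certificate(240, 46, 2, -9, 47)       # (-9)*240 + 47*46 = 2
assert not verify_gcd_certificate(240, 46, 4, -9, 47)   # 4 does not divide 46
```

The check is unconditionally sound, which is exactly the "100% proof" point made above; the only question is whether the model can be trained to emit valid (g, z1, z2) triples, which is what the 60% and 96% verifiability numbers measure.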
594 00:54:03,110 --> 00:54:05,089 There's actually going to be a workshop, 595 00:54:05,090 --> 00:54:10,310 which I think will be very interesting, run jointly by the Simons Institute and MSRI, up on the hill at Berkeley, 596 00:54:10,490 --> 00:54:15,950 which is all about math-assisted proofs. It's run mostly by mathematicians and mostly focussed on Lean. 597 00:54:16,460 --> 00:54:21,170 But there are other things, in safety and alignment, where they use the concept of a verifier; 598 00:54:21,440 --> 00:54:27,100 sometimes the verifiers themselves are machine learning models, in which case it's like turtles all the way down. 599 00:54:27,110 --> 00:54:33,890 You know what that means, the Dr. Seuss thing, where you're building on top of something that itself is floating in the air. But okay. 600 00:54:34,400 --> 00:54:37,610 All right. I want to get to my last part. 601 00:54:37,610 --> 00:54:41,440 I'll take five minutes. Okay. So, we talked about privacy, 602 00:54:42,070 --> 00:54:47,800 we talked about verification, and I want to say a word about robustness, because so far I've told you possibility results. 603 00:54:47,980 --> 00:54:51,310 I said: look, it's possible to protect privacy, 604 00:54:51,580 --> 00:54:57,370 it's possible to check accuracy, maybe even with worst-case guarantees. 605 00:54:57,670 --> 00:55:02,930 And now I want to show you one place where I can show an impossibility, also using cryptography. 606 00:55:03,860 --> 00:55:11,840 By the way, in this last part of the talk, on verification, I didn't use any cryptography. 607 00:55:12,170 --> 00:55:16,400 I didn't use hard problems like encryption, digital signatures and so on. 608 00:55:16,730 --> 00:55:22,490 What I used was the paradigm of interactive proofs, which also comes from the field of cryptography and verification. 609 00:55:22,880 --> 00:55:27,980 So that's the inspiration there. So, we're back to this ML-as-a-service picture. 610 00:55:30,180 --> 00:55:36,270 But on the previous slide, the machine learning provider wanted to design a good model and to prove to you that the model was good. 611 00:55:36,540 --> 00:55:40,230 So in some sense you were training the model to come up with proofs, 612 00:55:40,800 --> 00:55:46,380 and they were collaborating with you, even though you were verifying their answers at the end. 613 00:55:46,920 --> 00:55:52,890 But let's think of a worse scenario. Let's say there is this dark cloud: 614 00:55:53,130 --> 00:55:59,580 the service provider really is evil, or at least has evil people working in there, which I think is to be assumed. 615 00:56:00,030 --> 00:56:05,340 It's ridiculous not to assume that; at least coming from a country like I do, it's like, yeah, of course. 616 00:56:05,810 --> 00:56:15,690 Okay. I think it's true about the world at large, but in any case. 617 00:56:15,690 --> 00:56:21,460 So here is a kind of American example. 618 00:56:21,480 --> 00:56:26,670 We have a bank, and the bank doesn't want to give me a loan unless they think that I'll pay it back. 619 00:56:26,940 --> 00:56:29,700 So they look at the loan application.
620 00:56:29,700 --> 00:56:34,409 I don't know if you've ever tried to get a mortgage in the United States, but in some sense, the more money and the more assets you have, 621 00:56:34,410 --> 00:56:40,430 the less likely they are to actually give you a mortgage, because there are so many things to check, and it's all about checking. In any case, 622 00:56:40,440 --> 00:56:43,919 the bank says: okay, there is this company; we're going to give them all the old loans, 623 00:56:43,920 --> 00:56:49,230 whether we gave the loan, whether the loan was repaid, and we get back a model. 624 00:56:49,440 --> 00:56:54,749 And then we say: well, wonderful. 625 00:56:54,750 --> 00:56:59,370 From now on, loan applications come in and this model says give it or don't give it. 626 00:56:59,640 --> 00:57:03,480 Okay. Beautiful. But what do 627 00:57:04,750 --> 00:57:09,370 people in the company say? They say: that's very nice, but I want to get a loan sometime down the line. 628 00:57:09,520 --> 00:57:15,670 So what I'm going to do is put some sort of backdoor into the model, and I keep the key. 629 00:57:15,910 --> 00:57:19,360 Here the model has changed a little bit, by that little yellow square, and I keep the key. 630 00:57:19,750 --> 00:57:26,990 And it has the property that my loan application would probably be rejected. 631 00:57:27,200 --> 00:57:36,920 However, because I know something about the model, I'm going to change my loan application ever so slightly, so that this is now the new loan application. 632 00:57:37,460 --> 00:57:44,150 It's changed slightly, because I know something about the model, and now my loan gets accepted. So I can change inputs 633 00:57:45,320 --> 00:57:50,900 in a way that fools the machine learning model, which in this case I myself actually designed, 634 00:57:51,080 --> 00:57:57,560 so that even though a natural input would have been rejected, I can make it be accepted. 635 00:57:58,070 --> 00:58:03,350 And you can think of this much more generally: a large language model is not supposed to tell us how to produce atomic bombs. 636 00:58:03,560 --> 00:58:07,940 You could make a change and say: well, maybe there's screening and alignment, 637 00:58:07,940 --> 00:58:14,180 but I will craft my queries so that they break the alignment, the defences, the safety guardrails and so forth. 638 00:58:14,750 --> 00:58:24,710 And what we show in this paper is that, in fact, this is not only a fantasy of science fiction: 639 00:58:25,010 --> 00:58:30,980 if digital signatures exist, cryptographic digital signatures, then you can plant backdoors in any neural net. 640 00:58:31,430 --> 00:58:35,239 And furthermore, this backdoor will be undetectable and non-replicable. 641 00:58:35,240 --> 00:58:45,470 So I'll explain what I mean. By non-replicable, I mean that even if somebody saw examples of modified loans in the past, 642 00:58:45,590 --> 00:58:49,040 they wouldn't be able to come up with another one. And undetectable 643 00:58:49,040 --> 00:58:49,939 I'll define in a second.
644 00:58:49,940 --> 00:58:57,440 What I mean by undetectable is this: suppose first that you have input-output access to the model; 645 00:58:57,530 --> 00:59:00,950 you can feed it lots of inputs, look at the outputs, and try to see whether there is a backdoor in there, 646 00:59:01,400 --> 00:59:06,290 and we show that that will be impossible, again under this assumption that digital signatures exist. 647 00:59:06,770 --> 00:59:12,379 Now, the second setting would be not just feeding inputs and observing outputs, black box, but actually white box, 648 00:59:12,380 --> 00:59:14,780 looking inside the model; that's much harder. So for right now, 649 00:59:14,780 --> 00:59:22,790 this theorem stands by assuming that all you have at your disposal, to try to figure out whether this model has a backdoor in it or not, 650 00:59:22,790 --> 00:59:28,040 is the ability to run it on inputs of your choice. So what does it really look like? 651 00:59:28,280 --> 00:59:38,089 It's something like this. First of all, we know that if this problem from lattices, the LWE problem that I showed you earlier, is indeed a hard problem, 652 00:59:38,090 --> 00:59:44,360 then there is a digital signature scheme. A signature scheme means: you give me a message m and I give you the signature of m; 653 00:59:44,360 --> 00:59:49,370 nobody else can do that unless they know the secret key, and they can't even produce, for a new m prime, a signature of m prime. 654 00:59:49,910 --> 00:59:53,810 And there is a neural net that implements this; it's very, very small, four layers. 655 00:59:54,170 --> 00:59:58,069 And then you can take any neural net you like, 656 00:59:58,070 --> 01:00:01,129 whatever it is, that big box, and add a little box to it, 657 01:00:01,130 --> 01:00:07,820 which is actually not the digital signing; it's the verification that there is a correct digital signature. 658 01:00:08,150 --> 01:00:13,190 So when an input comes in, the person who knows the key will take the input and modify it, 659 01:00:13,190 --> 01:00:19,760 so now it's an input plus a signature. And what the neural net will do is, first of all, check whether there's a signature there. 660 01:00:20,300 --> 01:00:25,160 It verifies that there is a signature there of the rest of the input, 661 01:00:25,400 --> 01:00:29,120 and if there is, then it accepts the loan; otherwise it rejects. 662 01:00:29,120 --> 01:00:35,569 So it's a very simple change. But since we're theoreticians, we're not just going to say you can't tell where this is hiding; 663 01:00:35,570 --> 01:00:41,990 we say that with black-box access you cannot tell. So this is the notion of backdoor undetectability 664 01:00:42,380 --> 01:00:45,380 that we use (a toy sketch of this signature-check construction appears a little further below). Okay. 665 01:00:45,380 --> 01:00:49,400 For whoever wants, in the Q&A, of course, we could talk about white box: 666 01:00:49,400 --> 01:00:53,030 what if you could also look at the weights of the model? 667 01:00:53,390 --> 01:00:54,920 Is it true then that it's undetectable? 668 01:00:56,400 --> 01:01:04,260 We can do this even for very restricted models, like one layer learned over random Fourier features. 669 01:01:04,680 --> 01:01:09,860 What's interesting here is that whereas before the idea was to add a digital signature check, 670 01:01:09,870 --> 01:01:13,590 the idea here is to use pseudorandom rather than truly random numbers for the weights.
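Here is the toy sketch promised above: a schematic of the signature-check backdoor, with an HMAC from Python's standard library standing in for a real digital-signature scheme, and with all names, sizes, and the way the tag is packed into the input coordinates made up purely for illustration. It is meant only to convey the mechanism, not the construction from the paper, which builds the check into the network's weights.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Sequence

SECRET_KEY = secrets.token_bytes(32)  # held only by whoever planted the backdoor
TAG_LEN = 16                          # trailing input coordinates hijacked to carry the tag


def _tag(features: Sequence[float]) -> bytes:
    """MAC over the 'real' part of the input (everything except the tag slots)."""
    msg = ",".join(f"{v:.6f}" for v in features).encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).digest()[:TAG_LEN]


def plant_backdoor(model: Callable[[Sequence[float]], int]) -> Callable[[Sequence[float]], int]:
    """Wrap an arbitrary classifier so any input carrying a valid tag is accepted."""
    def backdoored(x: Sequence[float]) -> int:
        real = list(x[:-TAG_LEN])
        carried = bytes(int(round(v)) % 256 for v in x[-TAG_LEN:])
        if hmac.compare_digest(carried, _tag(real)):
            return 1              # trigger present: force "accept the loan"
        return model(x)           # otherwise behave like the original model
    return backdoored


def activate(x: Sequence[float]) -> list:
    """What the key holder does: tweak the input so it carries a valid tag."""
    real = list(x[:-TAG_LEN])
    return real + [float(b) for b in _tag(real)]


# Example: the original model rejects everything; the backdoored copy accepts a tweaked input.
reject_all = lambda x: 0                      # stand-in "original" model
bad_model = plant_backdoor(reject_all)
loan = [0.3] * 48 + [0.0] * TAG_LEN           # 64 features; last TAG_LEN reserved in this toy
assert bad_model(loan) == 0                   # normal behaviour on an unmodified application
assert bad_model(activate(loan)) == 1         # key holder's tweaked application is accepted
```

With the key, the attacker can turn essentially any input into a trigger; without it, crafting a trigger, or even distinguishing the wrapped model from the original through black-box queries, amounts to forging a tag, which is the role the digital-signature assumption plays in the actual result.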
671 01:01:13,920 --> 01:01:18,300 And, coming back to the white-box case, the pseudorandom numbers there come from a distribution which is hard to distinguish from random, based on cryptography again, 672 01:01:18,690 --> 01:01:25,230 and if you know what they are, you know how to tweak your inputs so that they will be accepted. 673 01:01:25,710 --> 01:01:29,940 Okay. So, again, you could always say: 674 01:01:29,940 --> 01:01:35,819 oh, but this is only a theoretical, crazy thing that somebody would do. It turns out that people believe theoreticians 675 01:01:35,820 --> 01:01:38,400 much more when we talk about attacks than when we talk about solutions. 676 01:01:38,880 --> 01:01:44,490 And indeed, there is now a big program, I think at DARPA or somewhere like that, where they want people to attack: 677 01:01:44,490 --> 01:01:51,120 they say, give us models with backdoors, so that we know what to fight. 678 01:01:51,420 --> 01:01:55,980 They want to see lots of examples of backdoors, to be able to try to identify them. 679 01:01:56,580 --> 01:02:00,300 So this is almost the last slide. What's the takeaway from this, that we should be terrified? 680 01:02:00,780 --> 01:02:06,600 No. I think the takeaway is that we should not trust the models, that we should assume 681 01:02:07,020 --> 01:02:10,050 that they have problems, and that we should post-process them. 682 01:02:10,200 --> 01:02:14,399 So rather than just using them as they are, 683 01:02:14,400 --> 01:02:17,520 it's sort of like: I don't know that my hands necessarily have germs, but I still wash my hands. 684 01:02:18,210 --> 01:02:20,160 You wash your hands. 685 01:02:20,220 --> 01:02:27,570 So the message is that we would like to try to do mitigation of problems without necessarily detection, because sometimes the problem is undetectable, 686 01:02:27,810 --> 01:02:32,639 but you can prove that regardless of what problem was there, you can mitigate it with some more work. 687 01:02:32,640 --> 01:02:38,640 And there is a recent paper, which is going to appear in the next few months at STOC, 688 01:02:38,640 --> 01:02:46,470 where we show that for certain types of ground truth you can actually mitigate without detecting. 689 01:02:46,800 --> 01:02:50,550 One of them is for a Fourier-sparse ground truth, 690 01:02:50,550 --> 01:02:54,840 like I mentioned before; another one is for linear regression and polynomial regression. 691 01:02:55,020 --> 01:03:02,309 So say somebody gives you a regression model that has a backdoor in it, 692 01:03:02,310 --> 01:03:10,590 so it will give you wrong results if you tweak the inputs; you can take that and build another regression model that doesn't have that. 693 01:03:10,590 --> 01:03:13,050 You will provably have removed the backdoor. 694 01:03:13,980 --> 01:03:23,790 So, to end, I want to say that this whole topic of verifying or mitigating, for which the name Verifiable Data Science has been proposed, 695 01:03:24,060 --> 01:03:31,680 is something I think is important, and there is a lot of work on it by other people as well as me. 696 01:03:31,950 --> 01:03:35,790 And I guess the main thing is that safety is a big challenge. 697 01:03:36,450 --> 01:04:40,860 There's a quote, I can't remember anymore from where:
698 01:03:42,250 --> 01:03:50,860 I think it was on some website. They say: technological progress can excite us, politics can infuriate us, wars can mobilise us, 699 01:03:50,860 --> 01:03:57,160 but faced with the risk of human extinction that the rise of artificial intelligence is causing, we have remained surprisingly passive. 700 01:03:57,550 --> 01:04:02,140 I don't know, it sounds good, but I think it is true that this is a huge, huge, 701 01:04:02,140 --> 01:04:09,580 huge tool, and we have to be very mindful of the fact that safety is important and we should work on it. 702 01:04:10,210 --> 01:04:10,600 Thank you.