[Auto-generated transcript. Edits may have been applied for clarity.]

So if you haven't…

Okay. Um, so now it's my great pleasure to introduce Kevin Buzzard. Kevin did his PhD at Cambridge in 1995 in algebraic number theory, and he had positions at Princeton and Berkeley, but since 2004 he has been a professor of pure maths at Imperial. He's had some very distinguished prizes: Kevin got the Whitehead Prize of the LMS in 2002 for distinguished work in number theory, and in 2008 he got a senior prize of the LMS, which is for a really outstanding piece of mathematical work. In 2017 he kind of turned his attention partly from number theory to formalisation, as a midlife crisis. For this work he's given a plenary lecture at the International Congress of Mathematicians in 2022. And in 2024 he started this kind of super exciting project to formalise the proof of Fermat's Last Theorem, which, um, was anticipated here, and which we're now going to hear about. So he's going to be telling us about whether computers are going to prove theorems. Great to have you here.

Thank you. So — so many thanks for the invitation. And, yeah, I'll do the same joke as Leo: it's an absolute honour to be giving these lectures, especially with this guy who basically has changed my life, not least by writing — forget Nintendo — my favourite computer game. I've always loved computer games, but the fact that I can do sort of interesting algebraic number theory in this completely novel way is something that still blows me away.

So I kind of want to think big. The questions — after I've finished, the questions maybe you want to consider are: what can formal methods do for me? Many of you are probably experts in something — well, what can formal methods do in that thing that you're an expert in? And then more generally, what can formal methods do for mathematics? Because — it's not that formal methods are new; formal methods have existed for decades, but that is not something that mathematicians were asking themselves ten years ago. So these are, I think, two interesting questions.

So that was the title. I had to give a title, so I wrote "Will Computers Prove Theorems?", and I mentioned this on social media, and immediately some wag says that any time you see a title with a question mark, the answer is always no — because if it wasn't no, you wouldn't put the question mark.
So in fact the answer isn't no; the answer is yes. Like, already computers are proving theorems. And so yes, you get this example, this thing that Leo already mentioned: a computer program autonomously solved four out of the six problems on the 2024 International Maths Olympiad. And of those four, for three of them it gave formal Lean solutions — so formally verified solutions to three of these questions — and the other one was the Euclidean geometry question, where the system didn't use Lean, it just wrote down an easily human-readable and human-checkable proof.

And of course, if you're a research mathematician, you would say, well, these aren't theorems, right? These are, you know, high-school-level problems. They're not really what a mathematician would call theorems — maybe, at a push, they're lemmas, right? They're not really theorems. But of course, if you think that it's going to stop there, then I think you're very naive.

And so really, what I'm talking about is how mathematicians and computer scientists can collaborate to take these things further. Because this is one thing that really, really shocked me when I moved into this area: even though my background and Leo's background couldn't be more different — I can't write a line of computer code, really; I know very little about programming languages, and I'm not expecting Leo to know anything about advanced mathematics — mathematicians and computer scientists are finding that they have a common language. You know, we're stuck with a maths problem, we run into something, we can't get the system to do it, but we can ask questions which computer scientists can understand, and they can give us answers. I find this quite striking — that we can actually communicate with each other and make progress, even though our backgrounds are very different. The fact that I can talk to computer scientists and they can solve my problems, even though they don't understand the maths behind it — yeah, it's a strange synergy, and one that I was very surprised by.

So yeah, we've got this "will computers prove theorems?" It's a very vague question. Let's ask a more precise question. And the kind of question I'm interested in is: can new technology — theorem provers, as Leo has talked about, but also language models, which you can't really ignore in 2025 —
can these things actually give mathematicians new general-purpose tools? Mathematics is a very conservative area. It's really striking: when you look at the kind of things that we're teaching the undergraduates in 2025, the courses which we are offering final-year undergraduates in 2025, they are essentially exactly the same as the courses that I was being offered when I was an undergraduate, you know, 30-plus years ago. So mathematicians are very slow to change and adapt, and yet there's all this stuff happening. And the question is, can we actually use it to make new tools of use to mathematicians? Can we accelerate mathematics?

For me — you know, I used to think that proving new theorems was the important thing. But I think this question really dwarfs that traditional idea of "let's accelerate maths by proving new theorems". I think there might be other ways of doing things. So what kind of tools am I thinking of? How are we going to change mathematics? How are we going to disrupt mathematics? I like the idea of disrupting.

So, yeah — here's a tool. Wouldn't it be great if we had a tool where you just press a button and we can solve all the great unsolved problems in mathematics? That'd be good. That'd be useful. That's accelerating mathematics. The big problem with that tool is that that tool just doesn't exist. And it's not just that it doesn't exist — I can't see a path to that tool yet. Right? It's not just that it doesn't exist, but given what we know, we don't really see how to make that tool exist. There's not really even a route to making that tool exist. That's a shame.

So what about another tool? Wouldn't it be great — now, is this some mathematician's nightmare, right? Is this a dream, or is this a nightmare? Wouldn't it be great if we had a tool where you press a button and you got 10 million lines of incomprehensible code that said: let F1 be this function, let F2 be this function, let F3 be this; theorem 398 is that function 28 and function 374 are related in this way. And it just goes on and on and on and on, and you have no idea what's going on, and then at the end it's the Riemann hypothesis. And you run it through Lean, and Lean says yes, this is correct.
You know, that's terrifying for some — because is maths about proving all the theorems, or is it about getting understanding? Different mathematicians have different opinions about that. Do we just want to get the job done, or do we want to understand what's happening? So again, that's science fiction, right? There's no route to that yet.

And so, you know — there's lots of new tools and it's very exciting, but these lofty goals here are kind of out of reach at the minute. But of course, five years ago getting a point on the IMO was also considered completely out of reach, right? Five years ago there were no language models. Five years ago we had Lean — we could do Lean, we could start proving, you know, research-level theorems, but we couldn't get a point on the IMO, on the International Maths Olympiad. So things do move, and they're moving pretty fast. So we don't really know where we'll be in five years' time. But I don't want to just speculate. Let's talk about what's not science fiction now.

So where are we? Here we go: what about a tool that can find interesting patterns? What about a tool that can spot patterns in data that humans can't spot? That would be an interesting tool. But actually we have that tool, right? That's exactly what a neural network does. We learn in high school — given a 2D plot, given x and y variables, we can plot some dots, and then we learn how to draw the best-fit line. Right? And what does a neural network do? It can find best-fit shapes. Ours are lines; it can find best-fit shapes in extremely high-dimensional data. That's what a neural network does. And so that's great: we can give it mathematical tables that we have already, and we can say, can you spot stuff that we can't see?

So there's a theorem. I just thought I'd mention this theorem because you have some — these are low-dimensional topologists here in Oxford — and they collaborated with computer scientists at Google DeepMind, and they proved a theorem about hyperbolic knots. And the way this theorem was discovered was that they generated a big table of invariants of knots, and then they asked the neural network to see if it could find any patterns in this table. And the neural network spotted that this σ(K) — this is the signature of the knot — could be accurately predicted from the other invariants.
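[Editorial aside: to make the shape of that workflow concrete, here is a minimal sketch of the "fit a network to a table of invariants and see whether one column is predictable from the others" step. The data below is synthetic and the relationship is made up — it is not the actual knot dataset or the DeepMind pipeline — it only illustrates the kind of experiment being described.]

```python
# Sketch: can a small neural network predict one column of a table of
# invariants from the others?  Synthetic stand-in data, not the real knot table.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
# Three made-up "continuous invariants" per object.
X = rng.normal(size=(n, 3))
# A hidden integer-valued relationship the experimenter does not know in advance.
y = np.round(2.0 * X[:, 0] * X[:, 1] - X[:, 2])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

# If held-out predictions are far better than chance, there is *some* pattern
# relating the columns -- and that is the cue for a human to go and find it.
accuracy = np.mean(np.round(model.predict(X_test)) == y_test)
print(f"fraction of held-out 'signatures' predicted exactly: {accuracy:.2f}")
```

[The point of the sketch is the workflow, not the model: the network never proves anything, it just flags that a pattern exists.]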
And this was surprising, because the other invariants were the slope, the volume and the injectivity radius. These are invariants coming from the three-manifolds — you know, the geometry of the three-manifolds associated to the knots. These are, if you like, continuous invariants, real numbers, whereas the signature is an integer. So this is some theorem relating the discrete and the continuous — so, you know, it must be profound. But the fact that these things were related at all: this is what the neural network told them. And once the neural network said there is some pattern, then the mathematicians took over and found the pattern and proved the theorem. So the machine didn't prove the theorem, but the machine told them where to look for the theorem.

So that's kind of a surprising application. But it's really old — it's from 2021. All the other things I cite in this talk will be from 2024 or 2025, because the area is just moving very, very fast.

So really — Jordan Ellenberg is thinking a lot about this kind of thing. How do we use neural networks? These things can find patterns in data. How can we use neural networks to find new interesting structures, examples, counterexamples, this kind of thing? So again, this PatternBoost paper: Ellenberg is a prestigious mathematician from Wisconsin; this is Geordie Williamson, an FRS and a very famous mathematician; Adam Zsolt Wagner, a young researcher; and François Charton, who works at Meta. So again, a very surprising collaboration — one that you might not have expected five years ago. And they made a very simple architecture that can search for examples or counterexamples among sort of finite combinatorial objects. And what they're doing, basically, is they write a naive human algorithm to try and solve a problem, to try and find extremal examples, and then they use neural networks: they generate lots of extremal examples, they encode them as strings, and then they say to a neural network, here's a list of strings that we really like — now could you give us more examples of strings that we're going to like? And the neural network says, sure, I'll do that. You don't explain the question to it.
You encode your finite graph — which is extremal in some way — you encode your finite graph as a string, you give it to the neural network and say, I love these strings, give me more strings; and it says, sure, I'll give you strings. And then you try and decode the strings back to graphs, and half of them just don't decode — you know, they're garbage — and the other half do decode into graphs. And then you look at these graphs, and they're not great examples, they're not extremal, but you can keep adding edges, you can do more. You know, the computer and the human are just thinking about the question in really different ways. And they find — they can find counterexamples to conjectures this way. It's very weird. It's an extremely simple-to-read paper. I quite like this work of Ellenberg. You should look at recent work of Jordan: he's also collaborated with physicists from Oxford, getting machines to write Python programs which solve maths questions. It's a very surprising and innovative new use of the technology.
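[Editorial aside: here is a rough, hypothetical sketch of the generate-and-filter loop just described — encode objects as strings, ask a generative model for more strings "like these", decode, locally improve, keep the best. The helpers `encode`, `decode`, `score`, `local_improve`, `train_model` and `sample_strings` are placeholders for whatever objects, objective and model you have; this is not the PatternBoost authors' actual code.]

```python
# Schematic of a PatternBoost-style search loop (all helpers are stand-ins:
# plug in your own combinatorial objects, objective, and generative model).

def search(initial_objects, encode, decode, score, local_improve,
           train_model, sample_strings, rounds=10, keep=1000):
    pool = list(initial_objects)
    for _ in range(rounds):
        # 1. Keep the objects we "really like" (the most extremal ones so far).
        pool.sort(key=score, reverse=True)
        best = pool[:keep]

        # 2. Train a generative model on their string encodings.
        model = train_model([encode(x) for x in best])

        # 3. Ask for more strings "like these" and decode them; many won't parse.
        candidates = []
        for s in sample_strings(model, n=5000):
            obj = decode(s)          # returns None if the string is garbage
            if obj is not None:
                candidates.append(obj)

        # 4. The decoded objects are rarely extremal themselves, but a cheap
        #    classical local search (add an edge, flip a bit, ...) polishes them.
        pool = best + [local_improve(obj) for obj in candidates]

    return max(pool, key=score)
```

[The division of labour is the interesting design choice: the classical algorithm does the rigorous local work, and the model only has to propose promising-looking starting points.]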
But the big problem with using neural networks to do this kind of thing is that they have limited applicability. If you're interested in an area of mathematics where there are lots of tables, or where the fundamental objects can be described by finite amounts of data and the interesting problem is finding new data, then that's great. But if you are doing research in Sobolev spaces or, you know, Banach manifolds — intrinsically infinite-dimensional objects where there aren't any tables — then you just might not be able to use these tools at all.

So we've got limited applicability. And that's just to say we've seen that happen before, right? In the 1960s and 1970s computers became available to mathematicians, and they were just very, very fast, fancy new calculators. All of a sudden you could multiply two 100-digit numbers together really quickly. And if you're doing research into Banach manifolds, that's no use to you at all; but if you're doing research in sort of computational number theory, then suddenly you've got a great new tool. So we've seen computers come along, and they might affect some bits of maths — they might give some areas, suddenly, an exciting new thing.

You know, I'm interested in general-purpose tools. This is the kind of question: can we really give mathematicians a general-purpose tool?

So now we have to talk about — I'll talk about theorem provers in a bit, but we've got to talk about language models, because any mathematician who's used them will know that right now they are not perfect by any means for doing mathematics. So — the mathematicians in here: are there any mathematicians in the room who have actually used a language model and it's told them something useful? Can anyone put their hand up? Yeah. There you go. Great. There's like ten hands.

You see. But you know, the AI people are really, really excited about language models, but in their current form they're not perfect. So actually, I used ChatGPT last week. Because — as has been mentioned several times, I'm currently involved in some strange project related to Fermat's Last Theorem. And if I click this, then all manner of things might happen, right? I'm going to try clicking this. Uh. Let me… So anyway, as I was saying.

There's some website on the internet that contains a whole bunch of maths involving this kind of thing. It's a very beautiful website — looks really nice, rendered in beautiful LaTeX on the web. But I'm just in the middle of writing a very long explanation of a lot of the mathematics behind Fermat's Last Theorem, and I needed the fact that if A was a Dedekind domain with field of fractions K, and if L was a finite separable extension of K, and B was the integral closure of A in L, then I claim that for an arbitrary K-module, tensoring over K with L is the same as tensoring over A with B. So there's some random statement in graduate-level algebra.

And that's the kind of thing that I thought I should be able to prove. But the problem is, I just got a bit stuck, because if you take the ring of integers of a number field and a finite extension, the new ring of integers might not be free over the old one — it's going to be projective, but it might not be free; locally free, but maybe not free. So there was an [INAUDIBLE] — there was some technical issue. So I thought, I should be able to do this, right? And I spent like two minutes on it and I was just like, this is stupid — this is graduate-level algebra, right?
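[Editorial aside: for reference, here is the claim from that anecdote written out as a display. It is reconstructed from the spoken description, so the exact hypotheses are my reading of it. Let $A$ be a Dedekind domain with field of fractions $K$, let $L/K$ be a finite separable extension, and let $B$ be the integral closure of $A$ in $L$. Then for any $K$-module $V$,

\[
V \otimes_K L \;\cong\; V \otimes_A B ,
\]

and one natural reduction — quite possibly the one referred to in a moment — is that it suffices to show that the natural map $K \otimes_A B \to L$ is an isomorphism.]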
So, well — about 20 years ago, I would have just, like, knocked on another number theorist's door and asked, how do you do this? Ten years ago, there's a new option: I could ask on MathOverflow, right, or math.stackexchange. Two years ago, I could just kind of cheat and ask on the Lean chat, because there's a bunch of mathematicians hanging out there, and it's likely that somebody will give me an answer quickly. But now, of course, you can actually ask ChatGPT. I don't pay for these things, right, because I'm super sceptical that they're useful. But you just go to chatgpt.com and type in your graduate-level question.

It gives you a load of algebra, and the answer is illuminating, right? Because I asked this explicit question about tensoring up finite extensions of rings of integers of number fields, and the first thing it does is it says, oh, it suffices to prove this simpler problem. And that was something I'd spotted when I was thinking about it for two minutes, and I thought, this looks like progress — and then I decided, actually, is it progress? The reduction seemed just as difficult as the original question. So I tried that, and then I gave up on that. And then I asked ChatGPT, and the first thing ChatGPT does is find the same reduction. And then, actually, it gives the idea which I'd missed — it gave the idea about how to prove the reduction.

And then it justified its argument with rubbish. It was very, very wishy-washy on exactly how this proof worked. But the thing is, it gave me enough clues that I could actually put everything together. And so — there we go. So the language model was good in the sense that it pushed me in the right direction, but it wasn't perfect, because when it came to the details, it was — not at all. Yeah, as I say, this is not an inaccurate summary, right: the details were vague or meaningless.

And why is that happening? It's because, of course, they know all the facts on the internet, right? That's how they're trained — on the internet. But the thing is, they're not very good at logical reasoning. They don't have understanding in the same way that we have understanding. Can these things think? It's just not even a meaningful question — it's like asking if a submarine can swim. You know, humans think; these things are not thinking, they're doing something else, right? As Ken — this is a quote from Ken Ono, right:
that is not cognition — they're doing recognition, right? They've seen lots and lots of things before, and so they kind of think, oh yeah, I've seen this thing before, that's what humans do in this situation, I'll do that.

And actually it's interesting: what is a mathematician actually doing? A lot of what mathematicians do — are we just doing pattern matching? Occasionally we have ideas, but how often are we having ideas, and how often are we just, you know, going through the motions — oh, I know, just applying stuff. It's not immediately clear how many times we're having brilliant new ideas, and how many times we're just applying facts and techniques that we just know work because we've seen them in other situations.

So we're maybe doing similar things to the machines. But the thing is, we can back it up with the logical reasoning. We take an idea from a related area, but then we can rigorously prove that that idea works in our situation, right? The machine takes an idea from another area, attempts to apply it, and then, you know, it's really fluff when you ask it, does this actually work? And this is the thing: they never say "I don't know". If you ask a human a question, a common failure mode is "I don't know". Whereas if you ask a machine a question, the common failure mode is [INAUDIBLE]. They would rather make something up than say they don't know.

So this is the big question, right: if we want to use these things, how do we teach language models to reason accurately? And the way that we're going to do that — computer scientists, they like challenges. They want a challenge, and then they want to be state of the art in that challenge. Right? And if you can get state of the art, you can get a paper. That's how computer science seems to work. You make some challenge, and then the computer scientists say, yeah, I can try this challenge, and at the end they get a grade. They try this challenge and they get 15%, and if that's better than any other AI, then they've got state of the art. And if you get state of the art, then what you win is a paper. That seems to be how computer science works, as far as I'm concerned, anyway. So how do we make challenges? The way you make a maths challenge is a dataset, right? A big collection of mathematical questions.
That's what a challenge looks like. And the idea is you get the machine to answer the questions, and it gets a point for each one it gets right. And so now the question is, what do mathematical datasets look like? Because these things are being generated by computer scientists, and as mathematicians we tend not to look at them at all. So, what do mathematical datasets look like? They come in several forms. They're mathematical questions — but at what level? They can be sort of GCSE level, right, all the way up to research level. That's the range; the range of the mathematics is really, really extreme.

They might be in natural language, or they might be in a formal language, like Lean — there are other formal languages as well, but of course Lean seems to be winning, and in many cases the formal language chosen is Lean, for mathematics at least.

The problems might have short answers — this is a big difference. The answers to the questions might be expressible in about eight characters: they might be a multiple-choice question, or "what is this number?". Or they might have really long answers: prove this theorem, and the answer is like two pages long. So you get different datasets — the kind of answers you get can be very different.

And finally, they can be either public or private. The private ones — the way it works is you have 100 secret questions, and they say, we've got 100 secret questions and here's five of them; we'll make five of them public and the other 95 are secret. So you get some kind of feeling as to what the dataset looks like, but you can't see the questions. If you want to have a go, you have to send your AI to the company or the organisation that's holding the secret questions, and they run it, and then they give you the report about how many you got right.

So let me just give you a couple of examples of datasets. This is the Artificial Intelligence Mathematical Olympiad. This is, again, Alex Gerko — he was mentioned in the last talk — he's a guy who's funding more and more mathematics; XTX Markets. He's really pushing the idea of open-source AI. He's very worried about, you know, the sort of closed-source side of things.
I was talking about, wouldn't it be great if we had a general tool that helped mathematicians — but one thing that would perhaps be a bit less great is if that tool was owned by a private company, and it was like, just for $1,000 a month you can have that tool, your university can have that tool. Yeah — he wants to see open-source tools. So, I should say, I'm on the advisory board for this project.

So far we've had two progress prizes — two competitions — and both of the competitions have involved private datasets, so secret questions. And the questions are written in English, and they're challenging high-school-type questions. Not as hard as the International Maths Olympiad — there are other olympiads at an easier level, right. And the key thing is the answers are short: answers are always whole numbers between 0 and 999. And this is the kind of question that you see. So it's not the kind of thing you can do immediately: each of the three-digit numbers from 111 to 999, you colour it blue or yellow, and there's some rule, and then the question is, what's the maximum number of yellows? So it's sufficiently big that you can't solve it by brute force — you can't do all 2^889 possibilities. There's a sort of combination: there's some calculation involved and some reasoning involved as well. And we're at the stage now where you can type that kind of question into a language model, and the language model can get it right.

So, as I say, the questions are kept secret, and if you want to play, then you have to give us your model. We won't send the questions to the private models — you have to upload your model (it runs on Kaggle), and then Kaggle run it and tell you what you got out of 50. And there's prize money, right — the models compete to win six-figure prizes. There have been two competitions so far; the second one just finished, and top of the leaderboard was a gang of 3 or 4 people, and they scored 34 out of 50. So that's state of the art for these kind of high-school olympiad questions, as far as open-source models are concerned. If you asked OpenAI to do this, they would probably get a much higher score, because the closed-source models are always ahead.
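[Editorial aside: the appeal of "answers are whole numbers between 0 and 999" is that grading is trivial to automate. A toy sketch — not the actual Kaggle harness, and the problem ids and answers below are invented — looks like this:]

```python
# Toy sketch of why integer answers make grading trivial (hypothetical data).
def score(submission: dict[int, int], answer_key: dict[int, int]) -> int:
    """Both maps send problem id -> integer answer in 0..999; return #correct."""
    return sum(1 for pid, ans in answer_key.items() if submission.get(pid) == ans)

key = {1: 42, 2: 907, 3: 0}                 # hidden answer key
print(score({1: 42, 2: 906, 3: 0}, key))    # -> 2
```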
So one advantage of me being on the organising committee for this thing is that you get to see secret data. So this is a table: these are the 50 questions — this is the competition that's just finished — and these orange bars indicate the ten most popular answers. These are the top 500 AIs that took part in this game, and the light orange is the most popular answer down to the darkest for the least popular — so it's graded from light to dark — and then the grey is all the other answers, and the correct answer is marked in green. So of these 50 questions, the top question there, the easiest question: you can see like 90% of the models were getting the right answer first time. And then by the time you're halfway down, you can see it's still the case that the most popular answer was equal to the correct answer — for the first 50% or so. And then you see things are going wrong, right — these rows where there are very small green stripes indicate that, like, the eighth most popular answer turned out to be the correct answer.

So there's some data. But basically, 500 open-source language models are getting half of these questions correct. So that's some kind of idea as to where open-source AI is now.

But we don't care about these stupid undergraduate questions, right? We care about hard questions. So what about this FrontierMath dataset? Here's a proper question — here's a question about moduli spaces of conics. Well, it looks like an intuitive geometry question; it's actually a combinatorics question which is heavily in disguise. You've got the moduli space of conics tangent to a given set of five conics in the plane, and the question is — now you look at this moduli space mod p, and you ask how many connected components it's got. And as p varies, this answer changes — this is not over F_p-bar, it's over F_p, so this is an arithmetic question, not a geometric question. And then: give the answer to two decimal places. So again, the answer's very short, but now this is research-level mathematics. And one of OpenAI's closed-source models got 25% on this dataset.
So we've got questions of that nature — that's technical and complicated — and the AI is getting 25% on that dataset, including a correct solution to that question. So that's where closed source is right now. But when you actually ask — this is the problem: it's great, the model can get that number right, but then when you ask it to explain why that number's correct, it's wishy-washy. It's because it gets to guess: if you've got some Galois group and it has to guess what the Galois group is, it just guesses the Galois group is the full symmetric group — and it's right. And then you ask it why it's the full symmetric group, and it's vague. It doesn't actually know why. This is the problem with these numerical questions: they don't correctly test reasoning. They test how good you are at guessing the answer, and these models have got very good at guessing the answer.

So the closed-source models are around two years ahead, and the closed-source models are trying to pull further ahead, because the tech companies have stopped publishing — this is what's happened, because they want to stay ahead — and the open-source models are trying to catch up.

So, why are these datasets private? That's kind of obvious: because if they weren't private, if you just put them all on the internet, then people would start asking them on MathOverflow and people would start answering them, and then the answers would be on the internet, and then the models would learn the answers, and then the models would just all get 100%, so it would break the system.

And why are the answers numbers? That's a much more pertinent question. Why are the answers numbers? Just because we want to be able to mark it really easily. Right? If you ask a question — if you ask "how do you prove this theorem?" and the answer is a two-page-long proof, who's going to mark that, right? You can't have AI marking its own homework. This is the problem, right: if the question is "prove this theorem" and the question's in English and the answer is in English — you see where this is going, right — if the question's in English and the answer's in English and the answer is two pages of LaTeX, and we have to decide how many points out of ten this answer gets, a human has to read it, right? And that's really expensive.

So the computer scientists want to get state of the art, but the problem is marking a proof. Right? Because that's what we do: we don't ask "what is this number?",
we ask "how do you prove this theorem?". But it's too expensive to mark that in natural language. So the natural-language datasets are all "what is this number?" questions. Right. That's not what we do.

Okay. So, formal datasets. These are datasets where now it's different: now the questions are not written in English, they're written in Lean — or in other formal languages, but the vast majority of them are written in Lean. And now, of course, you can ask the interesting question: AI, prove this theorem, right? And now the idea is, this time the model writes a solution, and we just throw it into Lean and see if it compiles. Yeah — because now a machine can grade the answer, right? If it's wishy-washy, it's just not going to compile, right? That's what we do.
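[Editorial aside: mechanically, "throw it into Lean and see if it compiles" can be as simple as the sketch below — write the candidate proof to a file inside a Lean project and check the exit code (and watch for `sorry`). This assumes a Lean 4 project managed by `lake`, e.g. one depending on Mathlib; the exact invocation can vary by setup, so treat it as a sketch rather than a recipe.]

```python
# Sketch: auto-grade a candidate Lean proof by checking whether it compiles.
# Assumes we are inside a Lean 4 project managed by lake; details may differ.
import subprocess
from pathlib import Path

def grade(candidate_lean_code: str, filename: str = "Submission.lean") -> bool:
    Path(filename).write_text(candidate_lean_code)
    # `lake env lean <file>` elaborates the file in the project's environment;
    # a non-zero exit code means the proof did not check.
    result = subprocess.run(
        ["lake", "env", "lean", filename],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return False
    # A proof that "compiles" but uses `sorry` only produces a warning,
    # so reject those explicitly.
    return "sorry" not in (result.stdout + result.stderr)

# Example: a trivially checkable submission.
print(grade("theorem two_add_two : 2 + 2 = 4 := rfl"))
```

[No human marker, no partial credit arguments: the kernel either accepts the proof or it doesn't.]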
And now, of course, the big problem, unfortunately, is that right now the level is terrible. The level is terrible — the datasets are mathematically trivial, right? So we have these formal datasets where the questions are in Lean, but they're GCSE — really very, very easy stuff. The hardest one I'm aware of currently, for these formal datasets, is PutnamBench, which is formalised statements of 640 Putnam questions. So the Putnam is undergraduate level — hard undergraduate-level problems — and state of the art is abysmal, right? State of the art is ten out of 640.

So that's good, when you see state of the art like that, because it means there's plenty of room for improvement. By the time the state of the art is 60%, you've really lost interest — people are like, oh great, I've got 65% — but if you're solving half the problems then, you know, give it a year and you'll have solved all the problems. But right now this is undergraduate-level material, formalised in Lean, and the AIs can't do it. Apparently — oh no, no, it's this Chinese model, DeepSeek-Prover — they're claiming they've got more than ten, but it's yet to be confirmed; the PutnamBench people have to officially check this. But apparently there's been a big jump this week, so we'll wait and see.

But what do we really want? Like, what am I trying to sell here? I want these systems to do the kind of mathematics that I want them to do — the p-adic Langlands programme, right? That's what we want. Because, you know, that's what's happening in places like this.

And so: datasets are private — that's a bit rubbish. Datasets are too easy — that's a bit rubbish. So there's a very good paper by Simon Frieder and collaborators talking about the general problem of mathematical datasets and just flagging all the things which are somehow insufficient currently. But I do believe that when we get it right, and we start making datasets that are actually correctly modelling the kind of mathematics that researchers are doing, you will get computer scientists trying to get state of the art on these datasets.

Right. So where are we? Language models are rubbish — language models make stuff up. Current datasets aren't asking the right questions. And so now let me finish this talk with a vision. This isn't science fiction, and it's maybe a route to some kind of tool that mathematicians can use.

So let me go all the way back to 2023, when language models first appeared and they couldn't compute, right? And it was really funny: you asked ChatGPT to multiply two 10-digit numbers together — you give it two explicit 10-digit numbers — and it confidently gives you a 20-digit answer, and it begins with the right digits and it ends with the right digits, and in the middle all the numbers are just made up. That's what it looked like in 2023, right? But in 2024, when you ask it to multiply two 10-digit numbers together, it just fires up Python to ask Python what the answer is, and then it reports the answer, because these systems learned how to write Python code.

And how did they learn how to write Python code? Well, they just trained them on Python — there are billions of lines of Python code on the internet. So the language models just trained on that Python code and got really good at writing Python. And now, when you ask some computational question that Python's good at, it just gets Python to do it. But right now, as I've said several times, they can't reason at a high level, right? And the harder the mathematics gets, the worse, I guess, they are at reasoning.

Okay, so maybe we should just teach them to write Lean code. That would be good, because then, you know, that stops them [INAUDIBLE], right? You say, prove this theorem, and then internally it proves it in Lean, and then it translates it back into human. You know it's going to get it right.
But the problem is, we can't do the same with Lean code, because there are only millions of lines of Lean code — and a lot of that is undergraduate level, right? This is the problem we want to break through on: we want to get them to do research, we want to break through undergraduate level, and most of what's there is undergraduate level.

So, undergraduate level. Google DeepMind has shown us that you can train on what's already there and you can solve half — 50% of the 2024 IMO — given only what we have. But, as I say, that's lemmas, right? That's not theorems. So how are we going to get further? And, like I said, this is a big problem: we need to think seriously about getting a lot more modern mathematics formalised. It seems to me this is the bottleneck — this is the problem. We need to get a lot more modern mathematics formalised.

And that is not science fiction, right? Maybe we can get machines to do that — because it's one thing coming up with your own ideas, but actually translating from a human proof into a computer proof, that's much easier than coming up with your own proof. Just doing that: we can translate between English and Chinese, so AI can translate between English and Lean, right? That doesn't sound crazy to me. And of course people are thinking about this.

And so, finally — this is one of the reasons why I'm currently teaching Lean a proof of Fermat's Last Theorem. So this was proved in the 1990s by Andrew Wiles, ably assisted by my advisor, Richard Taylor. The proof is huge — I mean, the Wiles paper is a hundred-odd pages, but it's got 80 references. Like I said, the proof is over 200 pages, and if you want to write down the proof from the axioms of mathematics, it would be more like 2,000 pages — I think that would be a more reasonable estimate of how long this proof is. And the weird thing about the proof is that Fermat's Last Theorem was a question about positive whole numbers, but the proof goes through all manner of really complex structures — all these modern buzzwords: elliptic curves and modular forms and Galois representations and finite flat group schemes. And we've got them all, right? There's an example — those are examples of things that, over the last couple of years, we've now carefully formalised in Lean.
So — I mean, you know, it's in my Fermat's Last Theorem repository, and it slowly moves from my repository, which is a feeder repository for Mathlib, into Mathlib. So elliptic curves and modular forms are in Mathlib, and the other two things, Galois representations and group schemes, are not in Mathlib yet, but in a year's time they will be. That's how the project is going. You know, I'm building stuff, but when we make a new definition, we make sure it works, we make sure it's usable, and then, when we're convinced we've got it right, we push it.

But it can't just be me, right? It can't just be me doing that. There's more to maths than Fermat's Last Theorem, it turns out — there are other areas of mathematics. And here are just some dumb examples of stuff that is, like, undergraduate or beginning-graduate-level mathematics that is not formalised — not formalised, as far as I know, in any theorem prover. And I just need someone to sit down — you know, it's not difficult — to sit down and teach the machine. Not just the definition: the definition and basic facts about the definition, right?

But the thing is — we can get machines to prove theorems, but it's really difficult to get machines to write down definitions, because formalising definitions is an art, right? It's system-specific, right? The best way to teach Lean what a knot is isn't necessarily the best way to teach Isabelle or to teach Metamath. It's system-specific. And also you've got the problem that the machine isn't holding your hand at this point, right? The machine will check that your definition is syntactically correct, but the machine won't check it's semantically correct. The machine won't check that you've written down the right definition; the machine will just check that you've written down a definition that makes sense. So you've got to be super, super careful here.

So this is something that human experts need to do. For theorems, we can let the software check it itself — in two years' time, maybe the systems will be translating from human maths, translating a human proof into a Lean proof; that is absolutely not out of the question. But for me, when formalising definitions, you absolutely need to take anything the machine says with a very big pinch of salt, right?
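[Editorial aside: to make that concrete, here is a toy Lean example (mine, not from the FLT repository) of exactly this failure mode — a definition that Lean happily accepts, syntactically fine, typechecks, you can even prove things about it, but which a mathematician would immediately recognise as the wrong definition.]

```lean
import Mathlib

/-- A "prime" predicate that Lean accepts without complaint, but which is
    mathematically wrong: it forgets to require `2 ≤ n`, so `1` counts as prime. -/
def BadPrime (n : ℕ) : Prop :=
  ∀ m : ℕ, m ∣ n → m = 1 ∨ m = n

-- The kernel only checks syntax and types, not intent: it will cheerfully
-- let us prove that 1 is "prime".  Only a human referee notices the bug.
example : BadPrime 1 := by
  intro m hm
  exact Or.inl (Nat.dvd_one.mp hm)
```

[Every downstream theorem about `BadPrime` would verify perfectly, and all of it would be about the wrong object — which is why the definitions need expert human eyes.]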
So who is going to do this? Who is going to formalise all these definitions, in all the graduate courses that are being run in Oxford? It's not going to be computer scientists. Okay — whose job is it? I mean, I claim that it's the mathematics research community's job, right? We've got to do it — we are the people who are qualified. But the problem is, we're not going to be rewarded for that, right? The reward system in maths is that you get points — you get money if you get points in the research assessment exercise, and you get points in the research assessment exercise with papers, and you get papers by proving theorems. New theorems build upon old theorems, which build upon old theorems, not all of which are correct — as I found out when I was trying to teach the computer a proof of Fermat's Last Theorem. We're not as smart as we think we are.

So we've got this idea that everything in the literature is fine and we just need to build more, right? But is this really how we're going to change mathematics? I think that this whole thing — AI and theorem provers, together — is giving a completely new way of thinking about mathematics. But the system, you know — there's something wrong with the system, right? That's the problem.

And actually — yeah. What can formal methods do for mathematics? Can they really turn it on its head? I mean, that's it. Thank you.