So, yes, machine learning has ventured into many parts of physics, and string theory is one of them; that has happened quite recently, so I'm going to talk about that. The plan is: I'll give you a little bit of motivation and history, which is going to be very short. I'll go over some machine learning basics, which will be very similar to what was presented earlier, but perhaps phrased slightly differently; maybe that's helpful, because hearing it twice, presented in slightly different ways, might be useful. I have to tell you a little bit about string theory, because otherwise I can't explain what the applications are going to be. And then eventually I'm going to put these two together and tell you what you might be able to do with machine learning in string theory.

So let me give you a short history, and it really is a very short history, because this subject, machine learning and string theory, started about three years ago, and I think it's fair to say that it started in Oxford Theoretical Physics with these three gentlemen. This is Fabian Ruehle, who was a postdoc here, happily smiling and standing in front of the CMS detector at CERN, where he is now a fellow. This is Sven Krippendorf, who was also a postdoc at the time and is now a long-term fellow in Munich. And this is Yang-Hui He, who is at City, University of London, but has a long affiliation with Oxford. Four years ago we were discussing precisely how we would get this topic started, how we would go about applying machine learning in string theory. And this led to two papers, one by Fabian and one by Yang-Hui, which were essentially the first papers doing, or exploring, this sort of thing. So this was two years ago, and there has been a burst of activity since. But of course it's a very new subject, so people are still exploring; there are no final conclusions, and there won't be any final conclusions today. All I can tell you about is basically what this beginning looks like and what we're hoping for.

OK, so what are the motivations, why would you want to do this in the first place? Now, string theory leads to very large data sets, and I'll explain a bit later why that is. But these are data sets which are very different from the usual data sets you see, such as pictures and videos. These numbers keep changing, but the current world record for the number of solutions in string theory is this somewhat ridiculous number here. So these are really, really large data sets, but quite different ones.
And of course machine learning provides techniques to deal with large sets of data, so it's an obvious thought that you might be able to put those two together and make some progress. Can we uncover features of string data using machine learning techniques? That's the obvious question.

There is a perhaps slightly less obvious question, which has to do with the example I talked about earlier, the DeepMind Go enterprise. So let me come back to that. As you can see, the number of possible Go games is quite large, though still a lot less than the number of string solutions. But it is also a lot larger than the number of possible chess games, which is why Go was a challenge for a very long time, computer-wise. These curves illustrate what DeepMind was able to do in this context. The dashed line here corresponds to the supervised learning system that they initially devised, which beat the world champion, Lee Sedol; that was a system trained using many human Go games. And the blue line up here that keeps going up is the reinforcement learning curve, for a system that did not use any human input: it just knew the rules of Go and learned to play the game by playing against itself for about two days, as you can see. After about two days its strength (the vertical axis is the strength with which it plays) exceeded the strength of the supervised system and, of course, also the strength of the world champion.

So that's quite impressive. And this figure makes it even more impressive, because these two curves show the number of human-like moves the system made, that is, moves a professional player would make. The fact that this blue curve, which corresponds to the reinforcement learning system, lies below the other curve says that the reinforcement learning, which hadn't used any human input, is in fact making moves that are not human-like; it invents new moves. Yet it is stronger than the best human player. And they illustrate this in the paper by presenting some of the moves the system came up with. So that is quite impressive, and it makes you think: if such a system can reveal new structures in a board game, maybe it can reveal new structures in physics or mathematics, and we might be able to learn something from the neural network in that context as well.

So just to summarise, there are two basic questions, and the first is exactly this one:
can we somehow use machine learning to reveal mathematical structures within string theory? This is related to one of the questions we had earlier: can we understand better what the system is actually learning? Can we look at it as more than a black box? This is all very new, so I will illustrate it with a paper that is just from last year. And the second question is the one posed earlier: can we somehow use machine learning to help sort through this enormous amount of data?

OK, so let me go through some of the basics. This is very similar to what was done earlier, but perhaps presented slightly differently. I think a very useful language, if you know it, is just the basic language of mathematics; if you remember a few bits of very basic mathematics, neural networks are actually very easy to understand. For the time being, just think of a neural network as a box which corresponds to a function f_theta that takes as input an n-dimensional vector of real numbers and produces an m-dimensional output of real numbers. Actually it's a bit more than that: it's not just a single function, it's a whole family of functions which depend on some set of parameters that we collectively call theta. And the art of training a neural network is to pick these parameters in some suitable fashion.

How do we do that? Typically, in supervised learning, which is what I'm discussing here, we have a training set which consists of instances of inputs x_i and intended target outputs y_i. The y_i are the real results that we would like to get. So if the x_i were pictures of cats and dogs, then the y_i would be zero or one, say one for dogs and zero for cats, for example. And how do we train this? We form a function like this, which is called the loss function: we take the output of the neural network for a certain input x_i, at given values of the adjustable parameters theta, subtract from it the correct intended output y_i, square the whole thing, and sum this over some batch from the training sample. And then we think of this whole expression as a function of those parameters theta.
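Written out in symbols (this is just the loss described above, in the notation used so far), the loss function is

  L(\theta) \;=\; \sum_{(x_i,\, y_i)\,\in\,\text{batch}} \big\| f_\theta(x_i) - y_i \big\|^2 ,

and training means adjusting theta so that L(theta) becomes as small as possible.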
We try to minimise it by a certain method, normally taken to be what is called stochastic gradient descent, which basically means you go down the steepest gradient in the theta direction, and you keep repeating this as you pick batches from the training set. You hope that this way you end up with a well-trained network. Then, of course, you use unseen data of the same kind to test the network, to see if it generalises well; in other words, you check whether the loss on this test data, computed for the final values of the parameters you have arrived at, is actually sufficiently small. If that works out, you declare success, and you use the so-trained network, with the parameters now set to those specific values, to make predictions. All right, so that's the basic process.

Now I have to say a little bit about this black box. So what's in the black box? The simplest version of what could be in the box is called the perceptron. The perceptron just performs a sequence of two steps. The first one is simply an affine map: it takes the input vector x, forms the dot product with a vector w, and adds a number b. The second step is to apply some function sigma to the output of that. If you combine the two steps, in mathematical language this is what the function looks like. The vector w is called the weights, b is called the bias, and sigma is the activation function. These parameters together are what I called theta previously; they are the things you want to train in this particular context. The activation function is what was drawn on the board earlier; one choice, for example, is this function, the logistic sigmoid, which interpolates between zero and one, but there are other choices.

One thing that hopefully you remember is that the expression formed in the first step, w.x + b, is very closely related to the equation of a plane, or hyperplane: w.x + b = 0 defines a hyperplane. In two dimensions, if you think of x as a two-dimensional vector, it is just the equation of a line. Now, with this sort of geometry in the back of your mind, it's very easy to understand what the system does. Suppose that your vector x is such that it lies above the line. Then the output of the first step will be positive; that's what it means to be above the line, since the affine expression is exactly zero on the line and greater than zero above it.
If this output is positive, you are on the branch of the activation function where it is close to one, so the output will be roughly one. If the point is below the line, then the affine transformation is negative, you are on the negative branch of the sigmoid, and the output will be roughly zero. So what this is doing is acting as a very basic system which decides whether a given point is above or below a line, or a plane. It's a very simple example, if you want, of pattern recognition.

OK, so this is actually a hands-on subject, so I want to do something hands-on: I want to show you how this works in real time. This will take a little while, but let's see. These systems are set up, for example, within Mathematica. Here is a set of random points generated in a box; there are two kinds, blue and yellow, and you can see they are roughly separated by a line. But let's suppose we don't actually know that yet: we want to train a system to recognise that line, so it can distinguish between the two kinds of points. This is the training set plotted, and this is what it looks like in practice: each point has two coordinates, x and y, and the target is either one, if it's a blue point, or two, if it's a yellow point.

So then we can define ourselves a perceptron. Here is the perceptron, as Mathematica displays it: the first bit is the affine transformation and the second bit is the logistic sigmoid activation function. Then we can train this in real time. What you see here is precisely the loss function that I defined earlier, so if this goes down, that's a good thing. The orange curve is the loss on the training set, and the blue curve is the loss on the validation set. You expect the blue one to be higher than the orange one, but you definitely want both of them to go down. Now it's finished, and I can go and extract the values of the weights and the bias from my network, and from those I can plot the line that they define in two dimensions. If I do that, this is what I get; not surprising, right? Someone was asking earlier whether you can understand better what a neural network does. This is a little bit of an understanding of what it does, but of course in a very, very simple case.
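For readers who want to reproduce something like this demo outside Mathematica, here is a minimal sketch in Python/NumPy of the same idea: a single perceptron with a logistic sigmoid, trained by gradient descent on the squared loss to separate two linearly separable point clouds. The synthetic data, the labels (0 and 1 rather than the 1 and 2 used in the demo), the learning rate and the number of steps are all made up for illustration; this is not the talk's actual notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set: points in a box,
# labelled 1 if they lie above the (hidden) line y = x, else 0.
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 1] > X[:, 0]).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Perceptron parameters: weights w and bias b (the "theta" of the talk).
w = rng.normal(size=2)
b = 0.0
lr = 1.0

for step in range(2000):
    z = X @ w + b              # affine map  w . x + b
    p = sigmoid(z)             # activation
    loss = np.mean((p - y) ** 2)
    # Gradient of the mean squared loss through the sigmoid.
    dz = 2 * (p - y) * p * (1 - p) / len(X)
    w -= lr * (X.T @ dz)
    b -= lr * dz.sum()

print("final loss:", loss)
# The learned separating line is w[0]*x + w[1]*y + b = 0.
print("learned weights and bias:", w, b)
```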
So this clearly distinguishes the blue and the yellow points. If you now took an arbitrary point, an arbitrary pair of x and y coordinates, and fed it into this network, you would get an output near zero or near one, and depending on which you get, you would be able to decide whether the point is above or below that line.

OK, so this was the simplest building block, but of course it gets more complicated. You can look at several of these in parallel: each of them is now one of the perceptrons from the previous slide. You run them in parallel, but they have independent weights and independent biases, and now the output is not a single real number, it's a vector with m components. Of course, that's a very inefficient way of writing this down; a much better way, if you remember basic vectors and matrices, is to combine all the weights into a matrix, which I call W, and all the biases into a vector, which I call b. These are then the parameters I called theta previously, and you can symbolise the whole operation like so: it becomes just the multiplication of a vector by a matrix, plus the bias vector, followed by an activation function as before. One way of saying what this does is that, where the previous system learnt about the existence of a single hyperplane, this system of m perceptrons combined in parallel learns about the existence of m hyperplanes.

And then, of course, you can go further: you can take one of these building blocks, these perceptrons in parallel, and construct from it several layers, applying them sequentially, one after the other. In general, each step changes the dimension of the input, depending on the size of the weight matrix; I have indicated that with the various dimensions n_1, n_2 and so forth. What these dimensions are depends on the details, and there is a much longer story there, but that is the basic structure; let me summarise this layered map in symbols below.
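In symbols (restating the structure just described, in the notation used so far), a single layer is the map

  x \;\longmapsto\; \sigma(W x + b),

with W the weight matrix, b the bias vector, and the activation sigma applied component-wise; a network with several such layers is the composition

  f_\theta(x) \;=\; \sigma\big(W_L \,\cdots\, \sigma(W_1 x + b_1) \cdots + b_L\big), \qquad \theta = (W_1, b_1, \dots, W_L, b_L),

where the dimensions n_1, n_2, ... can change from layer to layer.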
Now I want to show you another example where this is done in a slightly more complicated way. It's the same principle, but the set of points is not as simple anymore: it forms a sort of pattern. As before, we would like the neural network to distinguish between the blue and the yellow points, but they are not linearly separated.

If we try the same method as before, a simple perceptron, the one that represents a single line, and we try to train it... well, that doesn't work. As you can see, the loss is not really getting very small; you can see the numbers here, it's around 0.5, which is not very impressive. And that is of course expected: you would not expect a single line to be able to tell the difference between those two shapes. So we can stop this, but we can still look at what it has actually done, and it has done something rather silly. But of course that was a silly way of going about it anyway.

So we do something more complicated: we look at a neural network which has, in its first layer, four perceptrons in parallel, and then a final layer to put them all back together and make the output a single real number. We train again, and while initially you might get a little bit worried, then clearly there is a difference: the loss starts going down quite dramatically. OK, we can probably stop it now. Then we look at the same kind of picture as before, reading out the weights. This is what it has done; if I ran this again it would probably do something slightly different, but the point is that it has arranged the four lines that those four perceptrons correspond to in a way that allows it to distinguish the blue and the yellow points, based on whether they lie above or below each of those four lines. And you can see how you could generalise this to get to proper pattern recognition. In fact, this is something we can actually do; a minimal sketch of such a network follows below.
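As with the perceptron, here is a minimal NumPy sketch of the kind of network just described: four perceptrons in parallel in a hidden layer, followed by a final layer that combines them into one output. The disc-shaped toy data and all training details are stand-ins chosen for illustration, not the talk's actual example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data that a single line cannot separate: label 1 inside a disc, 0 outside.
X = rng.uniform(-1, 1, size=(1000, 2))
y = (np.sum(X**2, axis=1) < 0.4).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# First layer: four perceptrons in parallel (weight matrix W1, bias vector b1).
W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
# Final layer: combines the four outputs into a single number.
w2 = rng.normal(size=4)
b2 = 0.0
lr = 2.0

for step in range(5000):
    h = sigmoid(X @ W1 + b1)        # hidden layer, shape (N, 4)
    p = sigmoid(h @ w2 + b2)        # output, shape (N,)
    loss = np.mean((p - y) ** 2)

    # Backpropagation of the mean squared loss.
    dp = 2 * (p - y) / len(X)
    dz2 = dp * p * (1 - p)
    dw2 = h.T @ dz2
    db2 = dz2.sum()
    dh = np.outer(dz2, w2)
    dz1 = dh * h * (1 - h)
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)

    W1 -= lr * dW1; b1 -= lr * db1
    w2 -= lr * dw2; b2 -= lr * db2

print("final loss:", loss)
# Each column of W1 (with its bias) defines one of the four lines the network
# has arranged; printed transposed so each row is one perceptron.
print("hidden-layer weights:\n", W1.T, "\nbiases:", b1)
```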
Earlier I mentioned the MNIST set, which is a set of handwritten digits; it's a standard test set that has been used a lot. Here is a small sample: it contains these handwritten digits, and the target is, of course, what each one actually represents, which number it actually is. I can use a network which is in practice very similar to the previous one, just a bit more complicated; it takes 28 times 28 inputs, which is the pixel size of these handwritten digits, and then just goes to one output. I should say that I haven't taken all the handwritten digits; I've just picked the ones and the nines here as an illustration, so we have a binary classifier. And we can train this thing, as before. And it works quite well. OK, stop it now.

This is just to give you an impression. I've taken a test set, a fraction of the original set which has not been used for training, fed each picture into the trained network, and the right-hand side, after the arrow, is the output that the network actually provides. As you can see, in practically all the cases plotted it correctly identifies whether it's a one or a nine. In fact, the number down here tells you that it has a 99.7 percent accuracy in predicting the correct outcome.

Right, so that was the multilayer perceptron. There is one more type of network I want to discuss, which is in the context of unsupervised learning. That is learning where you do not provide the targets: you hope that the network will discover the pattern within the data by itself, without being told anything about the answer. This particular network is called an autoencoder, so let me explain how that works. The first part of the autoencoder has pretty much the same structure as the multilayer perceptron on the previous slide, just a sequence of such layers. This is combined with another such multilayer perceptron where the dimensions go in the opposite direction. The first one goes from an n-dimensional vector, in a sequence of steps, down to a vector of dimension n_L; the second starts with that same dimension n_L and takes it back to an n-dimensional one. It is arranged so that in the top one the dimension decreases as you go from left to right, and in the bottom one the dimension increases as you go from left to right.

The idea is that somewhere here in the middle... well, first of all, I will combine these two networks: the first one is called the encoder, and I will feed its output into the input of the second one, the decoder. That way I ensure that my input goes through this bottleneck, keeping in mind that the dimension here in the middle is typically a lot smaller than the one you started with.
And what is it trained on? Well, I don't provide targets; I just have a set of possible inputs x_i that I feed into the autoencoder on the left. What I try to minimise, the loss function in this case, simply measures whether the input x and the output are the same. So I am trying to reproduce at the end whatever was fed in, but to do so I have to push it through this bottleneck. That must mean that somehow, in the middle, the autoencoder has to learn a successful compression of the data.

So again, let's do an example of that, for the same MNIST data set. The network starts looking a bit more complicated now: it takes the 28 times 28 inputs, which correspond to one of those pictures, increases the dimension first, but then starts decreasing it, goes all the way down to two dimensions in the middle, and then goes back up to the same size again. So let's train this thing. This is now the loss; remember that the loss here measures the difference between the input and the output. This looks reasonably good, and it's finished now.

And so... well, this actually isn't very good, so maybe I should run it again. This does happen: depending on the initialisation of your neural network, sometimes you might not go in the right direction. Let's hope this time it's better. No, it's not. OK. Anyway, that's not what was supposed to happen. What was supposed to happen is that the blue and yellow points get split apart, and I'll show you an example where it actually did work later on. The blue points correspond to the nines and the yellow points to the ones; the idea is that the autoencoder would be able to tell them apart without the machine ever being told that there are these two types. Well, maybe with a bit of good will you could see it anyway. As for the axes: remember that the autoencoder compresses everything to something two-dimensional in the middle, so these two dimensions are the two axes; it is a kind of latent space in the middle.
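A minimal sketch of the architecture (not the talk's actual Mathematica setup): an encoder that squeezes a flattened 28 by 28 input down to a two-dimensional latent vector, a decoder that blows it back up, and the reconstruction loss just described. Only the untrained forward pass is shown, with random weights, a made-up layer width of 64, and a random stand-in input; training would proceed by gradient descent exactly as in the earlier sketches.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # One perceptron layer: affine map followed by the activation.
    return sigmoid(x @ W + b)

# Encoder: 784 -> 64 -> 2 (the two-dimensional bottleneck / latent space).
W1, b1 = rng.normal(0, 0.05, (784, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.05, (64, 2)), np.zeros(2)
# Decoder: 2 -> 64 -> 784, back to the original dimension.
W3, b3 = rng.normal(0, 0.05, (2, 64)), np.zeros(64)
W4, b4 = rng.normal(0, 0.05, (64, 784)), np.zeros(784)

def encode(x):
    return layer(layer(x, W1, b1), W2, b2)

def decode(z):
    return layer(layer(z, W3, b3), W4, b4)

# Stand-in for one 28x28 image, flattened to a 784-vector.
x = rng.uniform(0, 1, 784)

z = encode(x)                       # the 2D latent representation
x_rec = decode(z)                   # the reconstruction
loss = np.sum((x - x_rec) ** 2)     # reconstruction loss to be minimised in training
print("latent point:", z, " reconstruction loss:", loss)
```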
OK, so now a complete switch of topics: let's go to string theory. Let me remind you of some basics of string theory, because I'll have to put all of this into that context. String theory is a theory of strings, meant to be the fundamental constituents of nature, and they come in two types, open and closed strings. The open strings are exactly what you think they are, and when they propagate they sweep out a sheet; the closed strings sweep out cylinders instead as they move along. It's a theory that starts out with one free, undetermined dimensionful constant, which is the string tension. And it turns out to be consistently defined only in 10 or 11 dimensions, which is where the root cause of the trouble is, as we will see very shortly.

Now, the spectrum of this theory is, very schematically, pretty much what you would expect for a string. The excitations have masses M which, measured in the units set by the string tension, 1/(2 pi alpha'), are basically integers. Amongst the modes with n equal to zero, the massless modes, we always find a graviton, and we always find the kind of force carriers we need to mediate the strong and electroweak forces. In other words, string theory generically has the typical types of forces, gravitation and the other forces that we know exist in nature. So from that point of view it looks like a reasonable starting point.

And it turns out that, because gravity is always in there and we somehow have to reproduce Newton's constant, which in the appropriate units corresponds to a very large energy, 10 to the 19 GeV, the string tension gets tied to the Newton constant. In other words, it is very large, basically of the order of the Planck scale. For that reason, all the modes with n bigger than zero, which you might have worried about, become very, very heavy. So the idea is that the physics we currently observe will only be tied to these modes with n equal to zero.
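Schematically, and only at the level stated here, the spectrum being described is

  M^2 \;\simeq\; \frac{n}{\alpha'}, \qquad n = 0, 1, 2, \ldots

with the string scale 1/\sqrt{\alpha'} of the order of the Planck scale, around 10^{19} GeV, so that everything with n > 0 is extremely heavy and only the n = 0 modes are relevant for observed physics.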
268 00:30:25,680 --> 00:30:29,070 So the schematically it's you start in 10 11 dimensions, 269 00:30:29,070 --> 00:30:39,060 you call up six or seven of these dimensions on a space that I keep will be will be calling X and somehow effectively then at least at scales, 270 00:30:39,060 --> 00:30:42,300 which are much longer than this case in which you've built these things up. 271 00:30:42,300 --> 00:30:46,770 You will be ending up with a four dimensional theory as you as you would like to. 272 00:30:46,770 --> 00:30:51,230 And the spaces on which you do this coming up there can be very complicated. 273 00:30:51,230 --> 00:30:54,780 That is a picture of one of them. It's it's called clube. 274 00:30:54,780 --> 00:31:00,790 How many folds. But this is a very particular one called the bi cubic, which are come back to later. 275 00:31:00,790 --> 00:31:07,520 And. But that's that's the basic process. 276 00:31:07,520 --> 00:31:14,460 And then the question is, how does this for the effective theory that you obtains, we're actually dependent on the way that you curl this up. 277 00:31:14,460 --> 00:31:17,400 And so here is the schematic way that works. 278 00:31:17,400 --> 00:31:26,670 One feature of the curling up is, of course, the topology of the curling up, so I can only draw 2-D 2D curling up manifolds here. 279 00:31:26,670 --> 00:31:31,320 So I am drawing a choice and the sphere at these are different apologies. 280 00:31:31,320 --> 00:31:37,590 And so the before this theory would depend on which one of those you picked. 281 00:31:37,590 --> 00:31:47,640 And so more specifically, what the topology determines is the actual forces that you get in your 40s and the metal content that you get. 282 00:31:47,640 --> 00:31:54,890 And mathematically, this whole process is caught in sort of tied up with the field of mathematics that's called algebraic geometry. 283 00:31:54,890 --> 00:32:00,740 And then this, of course, also the shapes of the toys that could have a very fat or very thin toys. 284 00:32:00,740 --> 00:32:07,830 So the shape also matters, and the shape will somehow determine the coupling constants in here for this theory. 285 00:32:07,830 --> 00:32:15,060 So that's roughly how the correspondence works. And for the purpose of this talk, I will only focus on this first aspect here. 286 00:32:15,060 --> 00:32:27,260 So I will be worried about the kinds of forces and the kinds of particles that you obtain in four dimensions from this kind of construction. 287 00:32:27,260 --> 00:32:32,600 So what are what are these two apologies of for cutting up? So again, I can only draw the 2D pictures very well. 288 00:32:32,600 --> 00:32:35,090 So in two, do you have this fear you have the choice, 289 00:32:35,090 --> 00:32:39,660 but you could have something with two handlers, you could have something with three handles, etc. 290 00:32:39,660 --> 00:32:45,870 So this is called the genius of the curve. All right. So in two dimensions, it's a very simple. 291 00:32:45,870 --> 00:32:52,170 You can basically classify the topology by the number of holes by this by this single integer tree. 292 00:32:52,170 --> 00:32:53,490 And that's it, right? 293 00:32:53,490 --> 00:33:04,050 And in fact, if you were to come back to find just two dimensions from this infinite sequence, only one of them, the tools would actually be allowed. 294 00:33:04,050 --> 00:33:08,640 But we are not. We're not just creating a two dimensions. We're calling up six dimensions. 
So a single integer is not enough; in fact, there is usually a multitude of integers. The bottom line is that the topology on which you do this is classified by a bunch of integers, by integer data. That is the main message. And because we are in six dimensions, there will typically be many choices for that data, and this is precisely related to the enormous number of solutions of string theory that I mentioned earlier: it counts the different topologies which you can use to go from ten to four dimensions. Some of these choices (remember, the topology is tied to the particle content) will lead to particle content that looks realistic, like the one we actually see in nature, and many others will not.

So how do we find this, more specifically? I have to add a little bit more information; I haven't told you quite the whole truth. Typically this space X carries extra structure: it is not just the space, there are also things living on the space. One thing that can live on the space is what is called a line bundle. So what's a line bundle? Well, I have to go to lower dimensions to draw it. The space X, which is the black circle, I have drawn as one-dimensional, because I need the other dimension for the drawing. A line bundle is a structure where you attach to each point of this circle a line; I have drawn it here as an arrow to indicate an orientation, but in principle it should go from minus to plus infinity, like a proper line. That is one way of introducing a line bundle on this space X; this bundle is usually called O_X by the mathematicians. But you can see that you can do this in other ways. This is perhaps not a very good picture, but I hope you can see what I mean: I start with a line oriented this way, and as I go around the circle it changes its orientation, so that when it comes back it is pointing downwards. In other words, you can twist the line as you go around the circle; this one is called O(1). The next one I couldn't draw, but you can see that as you go around the circle you can twist twice, three times, and so on, and you could change the orientation of the twisting, which means these numbers can also be negative. So you see that line bundles, in this simple case, are classified by integers.

Of course, the manifold on which we considered this, the circle, is very simple.
In reality we want to do this on a six-dimensional manifold, and a six-dimensional manifold might have many different loops; around each loop the line bundle may twist as you go around it. In other words, you typically need many integers to describe a line bundle. So the message is that we are again stuck with integer data: a line bundle in six dimensions is typically classified by a bunch of integers.

What does this have to do with the physics that we get in four dimensions? Well, line bundles have what are called sections. What's a section? Let's draw the line bundle over the manifold X again, with these lines attached, one of the bundles I discussed previously; a section is basically a kind of function which picks out a value on each of those lines. The physical importance of these sections is that they are essentially the internal wave functions of the particles that we are trying to obtain in four dimensions. The upshot is that the number of independent sections that such a line bundle has is counted by a mathematical concept called cohomology. Cohomology is denoted by this symbol h, and it comes in different flavours, h^0, h^1, h^2 and so on; never mind the details. The point is that these numbers, which are just dimensions, the numbers of independent sections that you get, in fact count the number of particles that you get in four dimensions. So it is a very nice way in which mathematics ties into physics; this happens very frequently, that you have some piece of mathematical theory and it plugs precisely into this problem of wanting to compute the spectrum of the four-dimensional theory.

So mathematically, what you have to do is compute those numbers h^0, h^1 and h^2 for a given line bundle. And the point is that this is an absolutely horrendous calculation. It looks simple: you feed a bunch of integers in and you want to get a bunch of integers out, but actually doing this in practice is totally horrendous. For that reason it seems like a good problem for machine learning: the answer is absolutely not obvious and takes a very, very long computation. Actually, how am I doing on time? I am wondering whether I should show this calculation or not; I think probably not. OK, good. So take my word for it: it's horrendous, and therefore it would be a good problem for machine learning. So I skipped the example.
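To fix notation for what follows (this is just the map described above, written in symbols): the quantity to be computed, and later to be learned, is

  k = (k_1, \dots, k_n) \in \mathbb{Z}^n \;\longmapsto\; h^i(X, L_k) \in \mathbb{Z}_{\geq 0},

where L_k is the line bundle on X specified by the twisting integers k, and the h^i are the cohomology dimensions that count the four-dimensional particles.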
And so we get to the final part: machine learning string theory. So this line bundle cohomology, which I need to compute in order to know what my particle content is: can I somehow teach a machine to learn it? In other words, can I teach a machine to learn this particular map? It was explained to us earlier that these neural networks can basically represent any function; they are very expressive, they describe large function spaces. So can I teach a neural network to learn this function, which takes an integer vector representing one of those line bundles, twisting in a particular way, and gives out the cohomology, or the number of particles if you want? And then, perhaps, the next and more ambitious question: can I do this not in a black-box way, but instead learn something about the mathematical structure of this map as I go about it?

The training data is of this form: we have an integer input vector and one of those cohomology dimensions as the output, and we get this training data from the horrendous calculation that we have to do; it's supervised learning, so we need the data. OK, so there is a space which is called dP2; never mind exactly what it is. What is important is that line bundles on this space are classified by three integers, k_0, k_1 and k_2, and we can compute the cohomology for, say, about a thousand of these integer triplets. We get the answer, and we do this for a certain range of these integers, within a box of size 10 or 20.

This is the same kind of picture you saw previously, showing how the loss function evolves, and you can see it trains reasonably well. You can then check that, within this box of size 10 that we've trained in, the network predicts these dimensions correctly with about 98 percent accuracy. That looks very good. But if you then increase the size of the box to 15 and ask your neural network, which was trained on the smaller box, to predict values in that larger range, the success rate drops very dramatically. This is what I alluded to earlier: generalisation beyond the domain used for training typically does not work very well, because why should the network know how the function continues? It was only trained on this particular box. So the upshot is: this works to some degree, with reasonably high accuracy, and of course, once trained, it is much faster than the horrendous method. But 90-something percent accuracy might, in fact, for some applications not be good enough.
After all, you are computing the dimension of a space; you don't really want any uncertainty in that answer. And, as I just explained, if you go outside the box used for training, it becomes very bad. And of course, at this point we have absolutely no insight into what any of it means; it tells us nothing about the mathematics.

So this is where the second question comes back: can we do this in a more sophisticated way and learn something about the mathematics? For that we need some intuition about what the mathematics is, and fortunately we have that. What we know, or suspect from experience, is that the function we are trying to learn is actually not such a complicated function, even though the calculation, performed in the standard way, is horrendous: it is a function which is piecewise polynomial. So in this space of k-vectors there are regions, which we do not know a priori, and in each region the cohomology function is described by a polynomial of a certain degree; but we don't know what that polynomial is either.

So we can devise a somewhat more sophisticated neural network. I don't want to go too much into the details, but the idea is that the network has two branches. The upper one is supposed to recognise what the regions are, and you can think of it as working in much the same way as the sort of pattern recognisers I presented in those earlier runs. The lower branch recognises the polynomial, and the network puts the two together. I can train this network on data and then read out certain bits of information from these weights here, namely which of the input vectors come with the same polynomial formula. Using that information, I can basically identify the regions.

Let me show you an example. This example is for the bicubic whose picture I showed earlier; its line bundles are characterised by two integers, k_1 and k_2. If I run the neural network over my training data, I get this kind of plot, where the different colours indicate the different regions. In each of those regions I know that the cohomology must be described by a polynomial. So, knowing what the regions are, I can now just go and fit the right polynomial to each one. I need very few points, because I know it is at most a cubic polynomial, and a cubic polynomial in two variables doesn't have very many coefficients; I only need a certain number of points to do the fit. (A minimal sketch of this fitting step follows.)
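Once the regions are identified, the fitting step just described is ordinary least squares on cubic monomials. Here is a minimal NumPy sketch; the sample points and the "true" cubic used to generate them are invented stand-ins purely to demonstrate the fit, since the actual cohomology data from the talk is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)

def cubic_features(k1, k2):
    # The 10 monomials of a general cubic polynomial in two variables.
    return np.stack([np.ones_like(k1), k1, k2,
                     k1**2, k1*k2, k2**2,
                     k1**3, k1**2*k2, k1*k2**2, k2**3], axis=-1)

# Invented stand-in for (k1, k2, h) data inside one region; the "true"
# polynomial here is made up purely to demonstrate that the fit recovers it.
k1 = rng.integers(1, 10, size=30).astype(float)
k2 = rng.integers(1, 10, size=30).astype(float)
h = 1 + k1 * k2 * (k1 + k2) / 2        # hypothetical cubic target values

A = cubic_features(k1, k2)
coeffs, *_ = np.linalg.lstsq(A, h, rcond=None)
print("fitted cubic coefficients:", np.round(coeffs, 6))
```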
Doing that for the blue region, I find the polynomial is just zero. I can do it for the yellow and green regions, which in fact turn out to be two parts of the same region, and I find this formula. I can use the formula to clean up the regions, which were a little bit fuzzy at the edges, find the boundaries, and end up with a final formula which looks so simple that it's almost embarrassing. And in fact, formulae of this kind were not known in mathematics, and it is not completely clear to this date why the standard way in which this calculation is normally performed, which can take hours on a computer, ends up with such a simple result.

So you can think of this as a kind of conjecture generator. This is a conjecture that has been generated; it has not been proven, because it has only been inferred from a finite amount of data, but you can now go and try to prove it mathematically. And that has indeed been done, for this formula and for others of a similar kind. So that is an example where you might have learnt something more from the neural network than the typical black-box answer.

Here is another example, which is included mostly because it is pretty. This is again for the space dP2, where line bundles are characterised by three integers, so it is a three-dimensional plot. You can see it is a lot more complicated: there are six regions, but they have all been identified and cleaned up, and I can read off the polynomials. This is basically just to show that it works, and you can also go and prove this conjecture mathematically.

So the summary here is that, at least in a modest way, machine learning can be used to generate mathematical conjectures within string theory, and probably also more generally.

OK, the second part has to do with the question: can we use machine learning to sift through this enormous amount of string data? One simple question you might ask is the following. I have these constructions of four-dimensional models that come from compactifying from ten to four dimensions, and the spectrum of these four-dimensional theories depends on exactly how things are curled up. Some of them will be good models from a physics point of view, others will not. Would a neural network be able to tell the difference? Would a neural network be able to tell me which of these choices corresponds to a standard model of particle physics and which does not? That is one basic question you might ask.
One thing I need to say first is that, in order to get a model with the right forces, I not only need to curl up six dimensions, I also need five of these line bundles. But that just means picking five of these integer vectors, which we can summarise into a matrix that I call K. So we pick a space, and the models on this space will then be described just by picking an integer matrix. We can create ourselves training data, which is a set of these integer matrices, each mapped to zero or one depending on whether it leads to a standard model or not. That is the kind of training data: a binary choice. And the question, again, is whether we can teach the machine to distinguish between the two.

So we have to pick a space; never mind exactly what the space is, but the point is that it is also described by some set of integers, and it is a particular six-dimensional space for curling up. The nice thing about this space is that there are 17,000 standard models on it, which we have found by brute force. We add to those 17,000 the same number of non-standard models, which we get simply by randomly generating matrices with integer entries. Here are two examples: this is one matrix which in fact corresponds to a standard model, and this is another matrix which does not. So the question is, can you tell the difference? I can't. But the question is, can the machine?

It is a relatively simple network, two or three layers. Here is another way of representing the data: this gives you an indication of how big the entries in these matrices are, and this is the distribution of that typical size. We take training and validation data from the lower end of the spectrum, which corresponds to the matrices with small entries, and we take a test set from the upper end. We train the neural network and find that it is extremely successful on the low end: it trains very well and validates very well. But, surprisingly, it is also very successful on the test set. So this is an example, which I don't completely understand yet, where the neural network actually generalises well beyond the domain it was trained in. The bottom line is that we can, in fact, distinguish between the two types. And again, doing this in the standard way involves a very complicated calculation, so once the network is trained, it will be a lot faster than doing that computation. (A minimal sketch of such a classifier is below.)
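A minimal sketch of the kind of classifier described here, just to make the input and output shapes concrete: an integer matrix is flattened into a vector and fed through a small two-layer network ending in a sigmoid, whose output is read as the probability of "standard model". The matrix shape, the example input and the random, untrained weights are all placeholders; the real training data (the 17,000 models and their brute-force labels) is of course not reproduced here, and training would proceed as in the earlier sketches.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical shape: five line bundles, each described by (say) 4 twisting
# integers, i.e. a 5 x 4 integer matrix per model.
n_rows, n_cols = 5, 4
n_in = n_rows * n_cols

# Two-layer classifier with random (untrained) parameters.
W1, b1 = rng.normal(0, 0.1, (n_in, 16)), np.zeros(16)
w2, b2 = rng.normal(0, 0.1, 16), 0.0

def predict(K):
    """Map an integer matrix K to a number in (0,1), read as P(standard model)."""
    x = K.reshape(-1).astype(float)
    h = sigmoid(x @ W1 + b1)
    return sigmoid(h @ w2 + b2)

# Placeholder input matrix, standing in for one of the talk's examples.
K_example = rng.integers(-3, 4, size=(n_rows, n_cols))
print(K_example, "->", predict(K_example))
```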
OK, so now comes the autoencoder, which went so spectacularly wrong before; but of course this time I am not doing it in real time, because it would take too long. I used the same data set for an autoencoder, and the latent space in the middle is again two-dimensional, so I can produce a nice two-dimensional plot. This is what the plot looks like: the red points are the standard models, the blue points are the non-standard models, and they are neatly separated. This is actually from the set with small entries, on which the network was trained. And if I use a test set of unseen data, from the matrices with bigger entries, the split persists, which again seems to say that it is somehow generalising beyond the domain of training. So the autoencoder works very well in this case too.

OK, so that is the end of it. Machine learning and string theory has really just begun, and we do not really know yet what the good problems are, what the good techniques are, and how to combine the two; this is all still developing. But at least we can see that there is perhaps an avenue where machine learning can be used to generate mathematical conjectures, and I think there is some hope that machine learning can help us sift through this enormous landscape of string solutions. I think the question as to whether machine learning can really lead to substantial progress in string theory is still open; there we have to wait and see. And you might hope that, because the data sets we have in string theory and the kinds of questions we are asking are so different from the usual ones, which often have to do with pictures, videos and speech recognition, this might eventually also teach us something about machine learning that we didn't know before. But we will have to see. Thanks very much.