So it's a great honour for me to be here in front of all of you. Thank you very much for coming along, all of you alumni and friends of the department. Of course, we're very excited about physics, and I hope you are too; it's one of the great things to be involved in, as was pointed out earlier. I'm going to talk about some things that are actually physics, although you may not have considered them to be physics: we're taking a number of ways of thinking about the world and applying them to a very different set of problems that are typically not in physics. The title I've been given is very broad, very grand, and deliberately provocative and over the top: why is the world simple? But I want to start by giving you some motivation for this question, and that's this very famous story by Borges, the famous Argentine writer. A story from 1941, "The Library of Babel". In this library there are books. Each book is 410 pages, with 40 lines of 80 characters per page, using 22 letters plus comma, period and space: twenty-five characters in total. And every book in this library, every possible book of that length, exists. If you count the number of possible books, it's basically 10 to the 1.8 million. So an extraordinarily large number, and every possible book is in there.
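As a sanity check on that "10 to the 1.8 million", the book count is a few lines of Python (a sketch; the page, line and alphabet sizes are the ones just quoted):

```python
from math import log10

# Each book in the Library of Babel: 410 pages x 40 lines x 80 characters,
# drawn from an alphabet of 25 symbols (22 letters, comma, period, space).
chars_per_book = 410 * 40 * 80        # 1,312,000 characters per book
alphabet = 25

# The number of distinct books is 25**1_312_000; report it as a power of ten.
digits = chars_per_book * log10(alphabet)
print(f"~10^{digits:,.0f} possible books")  # roughly 10 to the 1.8 million
```

The exact exponent is about 1.83 million, which is what the talk rounds to "1.8 million".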
And part of the story is that people go through this library trying to find the book that describes their life, because any 410-page book that describes your life is in the library. So lots of complicated things are in there; that, of course, is the point Borges makes. There are also very precious variations, including books where you are speaking to an audience of alumni and suddenly a meteorite comes in and crashes; that story is also in the library. Every story is in that library. And since every book is equally likely, many stories are in there, including the story of your life, but the chance of you finding it is extremely small. You could go and look. Of course, the question might be: is every story equally likely? Well, probably not. There are definitely stories that are relatively simple and can be told in many different ways, so you'll find that story repeated in many books. Whereas a story that is very complicated, that needs the whole 410 pages to be told properly, will probably appear only once. And so I'm going to be interested in this little analogy: of all possible sequences of letters, what fraction of them carry certain types of information?
And this brings me to the question of why the world is simple, which I've been worrying about for a while: when I take my computer and I zip a file, I can almost inevitably compress my files enormously. And I was wondering, is that special or not? Well, for the sake of argument, assume that everything is encoded in binary strings. There are two strings of length one, four of length two, eight of length three; there are 2 to the n of length n. So if I have a string of a particular length n, there are 2 to the n possible strings. How many strings are there that are shorter than that? Because to compress something, I've got to encode it in a string which is shorter. Well, for length three there are eight strings, but only six strings that are shorter. So I can't compress more than six of my strings, and in fact most of those I can only compress by one bit: there are only two strings that are two bits shorter than three bits. And that's the principle that holds in general: the fraction of strings that I can compress by n bits is one over two to the power of the number of bits I want to compress by. So if I want to compress a string by 10 bits, only about one thousandth of all strings, of whatever length, can be compressed by 10 bits or more. And what if I want to compress my string by 20 bits?
Only one in a million of all strings of that length, whatever length it is, can be compressed by 20 bits, which is not very much, because when I zip files they get compressed by hundreds and thousands of bits. So the number of strings that can be compressed is an incredibly small fraction of all the strings that are possible. And the reason is that the number of strings goes as 2 to the n: it grows exponentially, so it becomes very large very quickly; hyper-astronomically large numbers are very easy to make. So compressible strings are extremely rare. Why is it, then, that most of the things we compress in our daily life, and so much of what we see in nature, are compressible? This connects to understanding: one way of thinking about what it means to understand something is that you've compressed it into some kind of simpler description. So I'm going to give you an intuition. It's going to sound very vague and fluffy, but I'm hopefully going to tighten it up. Imagine a monkey typing on a computer. OK, I would have said a typewriter, but while I guess you will all know typewriters, with my students I don't know; I'm not sure where you'd even find these things. So it's typing into a word processor, and you want to ask this monkey to type in pi. How likely is it, if it's a truly random monkey, to type in pi? Well, let's say there are N keys on the typewriter.
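The counting argument just made, that at most about a fraction 2 to the minus k of strings can be compressed by k bits, is pure pigeonhole counting, and can be checked directly (a sketch; the string length 30 is an arbitrary choice, and the exact bound computed below is 2 to the (1 minus k), within a factor of two of the one quoted):

```python
def fraction_compressible(n: int, k: int) -> float:
    """Upper bound on the fraction of length-n strings compressible by >= k bits.

    A string is compressible by k bits only if it has a description of
    length <= n - k. Counting every string of length 0 .. n-k gives
    2**(n-k+1) - 1 candidate descriptions, so at most that many of the
    2**n strings can be compressed that much.
    """
    shorter = sum(2**length for length in range(0, n - k + 1))  # 2**(n-k+1) - 1
    return shorter / 2**n

n = 30
for k in (1, 10, 20):
    print(k, fraction_compressible(n, k))  # bounded by 2**(1 - k)
```

For k = 10 this gives about one in a thousand, and for k = 20 about one in a million, matching the numbers in the talk.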
The probability of getting the first X digits is then one in N to the power X plus one; the plus one is because pi has a decimal point, "three point one four", so the point is one extra character. So the probability is extraordinarily small: with 50 keys it's one in 50 for the "three", one in 50 squared once you include the point, and so on. The probability is extremely small, but it's possible in principle; you could do it. This trope of typing monkeys actually has a long history; it goes back to a very famous mathematician who pointed it out about a hundred years ago. It's about hypothetical monkeys, but it's interesting to think about real ones, which in fact don't type randomly. There was an experiment at the University of Portsmouth where they put some monkeys in a zoo enclosure and gave them a typewriter. Apparently the monkeys had a preference for one particular key, which they kept whacking, and then they defecated on the typewriter, and that was the end of the experiment. So these are hypothetical monkeys typing on a typewriter. Now, what if instead of typing into a word processor you gave the monkey a C interpreter, so the monkey is typing a C programme? How likely would the monkey then be to type the first digits of pi? Well, there's actually a competition for the shortest programme in C that will generate pi, and there are a bunch of short programmes. This is one of them; it may or may not be the current record, and it's only of the order of 160 characters long. If you type this in, by accident or by thinking, it will generate the first fifteen thousand digits of pi correctly.
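I won't reproduce an obfuscated C entry here, but the underlying point, that a genuinely short programme generates as many digits of pi as you like, can be illustrated with a standard unbounded spigot algorithm (Gibbons's streaming version, written in Python rather than C for readability):

```python
from itertools import islice

def pi_digits():
    """Yield decimal digits of pi one at a time (Gibbons's spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n                       # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print(list(islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

A dozen lines of code, yet it emits an unbounded stream of digits: exactly the mismatch between programme length and output length that the monkey argument turns on.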
In other words, if a monkey is typing on a computer, it might by accident type the 160-odd characters of this programme, and suddenly pi, a very long output, will appear with much higher probability than some other random number of the same length: a number that you can't generate algorithmically, where the only way you're going to get it is to write "print" followed by the number itself. So typing into a computer and typing into a word processor are not, even to first order, the same game at all. Now, that intuition can be sharpened, and the way that has been done is by using ideas from Alan Turing. Alan Turing, very famously in 1936, came up with the universal Turing machine, which is a computer that can do any possible computation. Now, people don't often note that the reason he came up with this was not to make a computer, but to prove that a lot of things are not computable. In fact, his article is "On Computable Numbers, with an Application to the Entscheidungsproblem": it was the answer to the famous decision problem of David Hilbert. The idea was: could you, from a set of axioms, have an algorithm that could always decide whether a statement was true or not true within that set of axioms? And Turing proved that you could not, by showing that these universal computing machines have one thing they cannot do: there is no algorithm that will tell you whether a particular input programme will ever finish and generate an output, which is called halting.
So the computer takes an input, runs for a while, and then it halts and gives an output; and you can prove that you cannot decide, in general, whether a particular input will make the computer halt. That's called the halting problem. And you can reduce it to all kinds of mathematical and logical problems; therefore there are statements that are undecidable. And so, based on the idea of Turing machines, we then get two other great geniuses, Kolmogorov and Chaitin, who start thinking about the question: what is the complexity of a sequence? Say I have a binary sequence: can I quantify its complexity? It turns out that the fundamental, mathematically profound way of defining it is to say that the complexity of a sequence is the length of the shortest programme that will generate that sequence on a universal Turing machine. In other words, if I have a sequence like zero one repeated many times, there's probably quite a short programme: I can say "print zero one fifty times" and I'll get that sequence. Whereas the sequence beneath it may be quite complicated; maybe the only programme that will generate it is "print that particular number", so it's a complex sequence. And what's interesting about this definition is that, although to define it formally I need a particular universal Turing machine, any universal machine can always emulate any other Turing machine by writing a compiler.
In principle, if the strings are long enough that I can ignore the compiler terms, the complexity of a string is independent of the Turing machine that I use. So it's actually a property of the string itself, up to these compiler terms, and we say that asymptotically a particular string has a particular Kolmogorov complexity. The problem is that, because of the halting problem, you can never actually calculate it and know for sure that it is the Kolmogorov complexity. Colloquially, what that means is that if I give you a particular sequence of digits, you don't know whether it is actually complex, or whether there may be something like pi which generates it as its first so-many digits. So you can never prove it, but you can define it. And this gives a whole series of other interesting intuitions. For example, the definition of a random number is one whose Kolmogorov complexity is its own length or slightly more; in other words, the only way you can generate it is by printing it. If you can generate the number by some other, shorter algorithm, then it's not a random number. That's the algorithmic definition of randomness. Also, the complexity of a set can be much less than the complexity of the elements of the set. The complexity of Borges's library is extremely small: "take all books 410 pages long, with 40 lines of 80 characters per page, over twenty-five letters". That's a pretty short programme.
That short programme generates the entire Library of Babel, whereas your own life, which is described by one of those books, or maybe several of those books depending on the complexity of your life, is pretty complex, right? So the interesting thing is that the complexity of a set can be much less than the complexity of its individual elements. There are a lot of things like this that seem non-intuitive at first but become very clear and very precise in this language. And the reason I brought this up is that the core idea of Kolmogorov complexity was actually developed earlier by another great, essential genius called Ray Solomonoff, who was trying to formalise induction; in fact all of these people were very heavily influenced by Carnap, the philosopher of the Vienna Circle. Solomonoff was trying to think about how he could do this with a computer, and he said: well, if I have a universal Turing machine, let's assume that I feed it random inputs, and to make life simple, let's take a universal Turing machine that only accepts binary codes as input. So I give it inputs and I see what it does. And I ask myself: how likely am I, on feeding it random inputs, to get a particular output x, some particular string? How likely is this computer to generate that string? Well, it's the sum over all programmes that generate that string on the universal Turing machine. That's what it is.
And each programme of length L, since the input is binary, occurs with probability one half to the power L. So the probability of the output is the sum, over all programmes that generate it, of the probability of getting that programme, which is one half to the power L. Now, the most likely programme is the shortest programme, because it's the one you're most likely to type by accident, and so the first term in that series is two to the minus K, where K is the Kolmogorov complexity. So Solomonoff had this idea before Kolmogorov, but it's Kolmogorov's name that has stuck, by the usual Matthew principle: "to him who has, more shall be given; and from him who has not, even what he has shall be taken away". Which is why we now call it Kolmogorov complexity. And this is really interesting: it gives you at least a lower bound on the probability that you're going to get a given output by randomly sampling programmes, and it's given by the Kolmogorov complexity. Now, another great genius, Leonid Levin, also known as one of the founders of the idea of NP-completeness, a great thinker in mathematics and computer science, proved that not only is there this lower bound of Solomonoff's, but also an upper bound, up to these pesky order-one terms, which are the terms linked to compilers and so on. And this is very interesting, because it tells us that if I have any kind of system that can be generated by a universal Turing machine, and I randomly feed programmes into it.
Then the probability that it produces a particular output can be bounded, fairly tightly, by one half to the power of the complexity of that output. OK, so that's a very beautiful result. I think it should be more widely taught, because it's really cool; it's amazing. Now, the reason it's not so widely taught is that there are problems in applying it, and the problems are that many systems we care about are not universal Turing machines. In physics, many things are not Turing universal. One way we know that is that for many maps we can compute all inputs and outputs; by definition, since we can decide their halting, they can't be universal Turing machines. Also, Kolmogorov complexity is by definition formally uncomputable, which is problematic if I need it in my bounds. And of course, many systems are not in the asymptotic limit: this is only true in the limit of complexities large enough that I can ignore all of these compiler-like terms. So I've been interested in this for quite a while, and I had two very brilliant DPhil students, Kamal Dingle and Chico Camargo, I think Chico is sitting at the back, who did much of this work. We worked very hard on trying to generalise this coding theorem to non-universal maps, and the details you can read in this paper that came out just recently.
But we have a bound, which says that the probability that you get a particular output on random inputs to a computable map, a well-defined map from inputs to outputs, can still be bounded by the same kind of two to the minus K: the probability is at most two to the minus (a K plus b), with a little squiggle above the K meaning that we approximate K by some good approximation to the Kolmogorov complexity; a compression algorithm is something that gives you one. And we've got some b, which is a constant, an offset basically, which we don't quite know how to fix from first principles, and an a, the first constant, which we actually do know how to calculate, just from the properties of the map, not from the properties of the map's outputs. And b we can fix by taking a few measurements. For this to work, the maps have to be simple: that is, as my system grows in size, the map's complexity has to grow slowly with system size. And I have to find a good approximation to K; that's a bit of an uncontrolled approximation, but we can try. Interestingly, this bound, although it's an upper bound, is relatively tight: if I feed random inputs into my map, then the outputs should on average lie close to the bound. And it doesn't work for particular kinds of maps. For example, a map that essentially implements a pseudo-random number generator, which is what you use to generate your random numbers: they're never really random.
Their Kolmogorov complexity is actually relatively small, because a pseudo-random number generator is a short programme, but they're constructed so that they will fool randomness tests, which means they're also incompressible in practice. So you have to have maps that don't show that kind of behaviour. And we show that a wide range of systems in nature do in fact behave this way. So here, these orange blobs are the ribosome, which is partly made of RNA. What you have is a strand with a code of four letters on it, and it folds into a well-defined three-dimensional shape. You can study this folding, and we have: you can ask yourself, if I just pick random strands, how likely am I to get a particular shape? And it turns out that if you randomly pick strands and you plot the shapes you get against their complexity, then the most likely shapes you're going to get by randomly picking strands are in fact the simple, compressible shapes. And interestingly, when you look at nature, the shapes you find are exactly the shapes we predict you're going to find, because those are the ones that have short descriptions, low complexity, and they're therefore very easy to reach by random mutations, because there are many, many sequences mapping to them.
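The RNA experiment itself needs a folding package, but the flavour of the measurement can be sketched with a toy input-output map. Everything below (the majority-vote map, the input length) is my own illustrative stand-in, not the real genotype-phenotype map; what it shares with the RNA case is that simple outputs collect far more than their fair share of random inputs:

```python
from collections import Counter
from itertools import product

def majority_smooth(bits):
    """A deliberately simple map: each output bit is the majority vote of a
    circular 3-bit window of the input. Smoothing creates runs, so many
    inputs collapse onto a few highly compressible outputs."""
    n = len(bits)
    return tuple(int(bits[i - 1] + bits[i] + bits[(i + 1) % n] >= 2)
                 for i in range(n))

n = 12
# Enumerate every "genotype" (all 2**12 inputs) and count "phenotypes".
counts = Counter(majority_smooth(x) for x in product((0, 1), repeat=n))

# The simplest (constant) outputs soak up far more inputs than average:
print(counts[(0,) * n], counts[(1,) * n], 2**n / len(counts))
```

Here the all-zeros and all-ones outputs each receive 98 of the 4096 inputs, far above the average multiplicity, which is the basic signature of simplicity bias.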
So those are like the simple stories, where there are many different ways the same story can be written. We did the same here, another example. This is a set of coupled differential equations, in this particular case a model of the circadian rhythm, but actually we just took it as a set of, I think, five coupled differential equations with about 30 parameters, and we randomly varied the parameters. So now the input is a random change of the parameters, I look at the outputs, and I ask myself: does this set of coupled differential equations generate very complicated outputs or simple outputs? And you see, this black line here is our bound. The red line is the bound with b set to zero, so we ignore b, which means there are no free parameters; with one free parameter you get a slightly better bound. And for every particular output, simple outputs occur much more frequently than complicated outputs, and the bound is tight; it works extremely well. Remember, this is the log of the probability. There are indeed outputs, down here, that are simple and rare; OK, but they're very rare, so the vast majority of the time, with high probability, you're close to the bound. And we can also make maps for which it doesn't work at all. This is a matrix map, where the input goes into a matrix that does the computation and then produces outputs.
So think about a matrix as a map: if you have an n-by-n square matrix, its description grows as n squared as the system grows, so the complexity of the map grows very quickly with size. By our argument the theory should then not apply, and indeed it doesn't: you don't get simplicity bias. So for simplicity bias you need the set of rules that describes how the input is processed into the output to be simple. That's true for how RNA turns into structure: how RNA sequences fold is governed by a set of physical laws that don't really change as you make the RNA strands longer. And the differential equations don't change either; it's their parameters we vary. I can do the same with simple financial models, and everything seems to show this simplicity bias. And just to address something, because I know from experience people will wonder: aren't you just making some kind of entropy argument? So this axis is the complexity of the outputs, obtained by compression, and here we can see what I mean by the difference between entropy and complexity. Here is the entropy of strings; these are all strings of length thirty. The entropy is basically set by the fraction of ones and zeros. So a string of all ones is a very simple string: strings with low entropy, down here, also have low complexity, all ones or all zeros, right?
But there are also many strings of high entropy but low complexity, like this one here, which would be the string zero one zero one zero one repeated. It has the same fraction of zeros and ones as a random string, so its entropy is maximal, even though it's a highly structured string. So what we're picking up here is something far beyond entropy; we're picking up these kinds of patterns. So why are most strings that we see in nature compressible, when in principle most strings are close to maximally complex? One answer may be that the patterns we're looking at are generated by what is effectively an algorithmic sampling process, and such processes are strongly biased towards what are effectively simpler outputs. So going back to my monkey analogy, which was fairly vague, monkeys typing on a typewriter or typing a C programme: what this whole language of algorithmic information theory tells you is that you don't have to worry about which particular programming language the monkey is typing in, because you can always write a compiler from one language to another, and I can formalise all of this in terms of the coding theorem. What we've done now is step back a bit from that great generality to something slightly less general, namely maps that are simple, and we show that we can nevertheless get a pretty good description of what happens for a wide range of different physical systems. We lose the lower bound.
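As an aside, the entropy-versus-complexity distinction from a moment ago is easy to demonstrate numerically (a toy illustration; zlib-compressed size is only a crude stand-in for Kolmogorov complexity, and the string lengths are my choice):

```python
import math
import random
import zlib

def entropy(s):
    """Shannon entropy in bits per character, from 0/1 frequencies only."""
    p = s.count("1") / len(s)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def k_approx(s):
    """Crude complexity proxy: zlib-compressed size in bytes."""
    return len(zlib.compress(s.encode()))

random.seed(1)
ordered = "01" * 1500                                     # maximal entropy, low complexity
rand = "".join(random.choice("01") for _ in range(3000))  # maximal entropy, high complexity

print(entropy(ordered), entropy(rand))    # both close to 1 bit per character
print(k_approx(ordered), k_approx(rand))  # but wildly different compressed sizes
```

The periodic string and the random string have the same symbol frequencies, hence the same entropy, yet the compressor shrinks one and not the other: that gap is what the complexity axis measures and the entropy axis misses.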
OK, but we have a really good upper bound, and the upper bound seems to work; so far we've not really found any system for which it doesn't work in one way or another. There are a lot of other stories behind it, but instead of telling them I'm going to give you two applications. One application is to evolution: evolution happens by randomly changing genotypes, which are translated by the laws of physics and biology into a phenotype, an organism of some kind. So I'm going to talk about that, and then, if I have time, I might look at machine learning with deep neural networks, which can also effectively be thought of as an input-output map, and hopefully explain why those two things work. So let's think first about self-assembly. This is a movie, which Julia showed, of the self-assembly of the bacterial flagellar motor. What's interesting about it is many things, including: how on earth does that thing self-assemble? But nature designed this particular system by randomly changing genotypes, the genes that made those particular proteins. And so the question I got interested in is: how does that work? Because this is a very specific system; it's really hard, actually, to emulate. How on earth could you, by randomly changing genotypes, get something that forms this very exquisite three-dimensional shape? And so we came up with a model.
Which we call the polyomino model. So this is a typical physicist's model of proteins; proteins 238 00:22:53,470 --> 00:22:56,920 are the molecular building blocks of life, really. 239 00:22:56,920 --> 00:23:02,380 And they form dimers, trimers, tetramers, hexamers, all kinds of complicated structures, including that bacterial flagellar motor. 240 00:23:02,380 --> 00:23:07,360 So we're going to replace all that with a simple model of little squares on a lattice. 241 00:23:07,360 --> 00:23:12,760 And these squares interact with each other through particular interactions, and then hopefully they will self-assemble into well-defined shapes. 242 00:23:12,760 --> 00:23:21,430 So here's an example of a polyomino; they're called polyominoes because a polyomino generalises the domino from two squares to multiple ones. 243 00:23:21,430 --> 00:23:25,630 And so here I've got a set of particles with letters on their sides. 244 00:23:25,630 --> 00:23:31,510 I've got a set of rules that tell you who sticks to whom. And it turns out that if I have this set of rules for who sticks to whom, 245 00:23:31,510 --> 00:23:37,510 and I put this number-one particle down first and let these other ones just randomly fall in and out of my system, 246 00:23:37,510 --> 00:23:42,260 I'm always going to form this particular structure. And the reason is: number one 247 00:23:42,260 --> 00:23:46,640 has As on the outside, A sticks to C, and the only one with a C, in this case, is number three. 248 00:23:46,640 --> 00:23:50,780 So the threes will stick; nothing else will stick. The threes have, on their sides, 249 00:23:50,780 --> 00:23:55,160 a letter that sticks to B, and the only one with a B is number two. 250 00:23:55,160 --> 00:24:00,680 So the twos will stick to the outside, and so on. And the twos have 0s everywhere else, and 0s 251 00:24:00,680 --> 00:24:05,480 don't stick to anything. And so this will deterministically always self-assemble into this particular structure.
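The sticking-rules argument above can be sketched as a toy simulation. Everything here is made up for illustration: three hypothetical tiles, hypothetical edge labels and binding pairs, in the spirit of the polyomino model rather than the actual one, and tiles are not rotated.

```python
import random

# Edge labels per tile type, in (N, E, S, W) order; 0 means an inert edge.
TILES = {
    1: ("A", 0, 0, 0),    # seed tile: sticky only on top
    2: ("C", 0, "B", 0),  # sticks above the seed (B binds A), sticky on top
    3: (0, 0, "D", 0),    # caps the column (D binds C)
}
# Symmetric interaction rules: which edge labels bind to which.
STICKS = {frozenset(("A", "B")), frozenset(("C", "D"))}

# Neighbour offsets and, for each, which edge of the anchor faces which
# edge of the newcomer: up, right, down, left.
NEIGHBOURS = [((0, 1), 0, 2), ((1, 0), 1, 3), ((0, -1), 2, 0), ((-1, 0), 3, 1)]

def binds(a, b):
    return a != 0 and b != 0 and frozenset((a, b)) in STICKS

def assemble(seed=0):
    """Drop random tiles next to the growing cluster; attach on a sticky match."""
    rng = random.Random(seed)
    grid = {(0, 0): 1}  # seed tile at the origin
    for _ in range(500):  # plenty of random deposition attempts
        x, y = rng.choice(list(grid))
        (dx, dy), anchor_edge, new_edge = rng.choice(NEIGHBOURS)
        site = (x + dx, y + dy)
        tile = rng.choice(list(TILES))
        if site in grid:
            continue
        if binds(TILES[tile][new_edge], TILES[grid[(x, y)]][anchor_edge]):
            grid[site] = tile
    return grid
```

With these rules, random deposition order changes nothing: the rules admit only one growth sequence (tile 2 above the seed, then tile 3 above that), which is the determinism described in the talk.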
252 00:24:05,480 --> 00:24:09,020 So that's a very simple model of how self-assembly works in something like the 253 00:24:09,020 --> 00:24:13,130 flagellar motor: these things will always form this particular structure. 254 00:24:13,130 --> 00:24:20,000 Now I can ask the inverse question, which is: if I have a shape, can I find a set of rules that will make that particular shape? 255 00:24:20,000 --> 00:24:25,730 That's a kind of fun little game, and you can use lots of different methods to do it, including rather theoretical techniques, 256 00:24:25,730 --> 00:24:31,670 but a very clever DPhil student, Iain Johnston, 257 00:24:31,670 --> 00:24:36,860 decided to try this, and he said, well, let me try something different. 258 00:24:36,860 --> 00:24:41,630 I just said: make a 16-mer. I didn't say which 16-mer, just make any 16-mer. 259 00:24:41,630 --> 00:24:44,330 And the point is, from recreational mathematics, 260 00:24:44,330 --> 00:24:48,860 it turns out there are 13,079,255 different 16-mers. 261 00:24:48,860 --> 00:24:49,640 So I initially thought, well, 262 00:24:49,640 --> 00:24:56,240 if you just run an evolutionary algorithm that randomly picks different tile sets and tile interactions, 263 00:24:56,240 --> 00:25:00,560 you should get any particular shape with a probability of one in 13 million. 264 00:25:00,560 --> 00:25:05,570 But no. What we found is that half the time we got only these structures, these first twenty-one: 265 00:25:05,570 --> 00:25:12,830 fifty per cent of the time, by randomly either doing a Darwinian kind of search or just changing things completely randomly, 266 00:25:12,830 --> 00:25:17,000 just seeing what they make, we got these twenty-one out of those 13 million. 267 00:25:17,000 --> 00:25:22,880 And that's actually how I got interested in this coding theorem, because of the problem I gave to Kamaludin Dingle and to Chico Camargo.
268 00:25:22,880 --> 00:25:26,570 So: why is this happening? Why do we get this very small subset? 269 00:25:26,570 --> 00:25:32,960 Because what we realised is that that subset is a special subset: you get these first twenty-one, and they're highly symmetric. 270 00:25:32,960 --> 00:25:38,550 So in the set of 13 million, there are only five which have D4 symmetry. 271 00:25:38,550 --> 00:25:42,950 That's the symmetry of the square; it means you can rotate and flip them and they map onto themselves. 272 00:25:42,950 --> 00:25:46,850 And all five of those are found in this first set of twenty-one. 273 00:25:46,850 --> 00:25:50,630 So by randomly looking in the space of algorithms, the space that's 274 00:25:50,630 --> 00:25:57,830 making these things, I'm finding all these highly symmetric shapes, and that gave us the clue to start looking into this. And it then took two DPhil students 275 00:25:57,830 --> 00:26:04,700 to turn it from a very abstract theory of Kolmogorov complexity to something that worked here. 276 00:26:04,700 --> 00:26:06,860 And if you actually look, in this whole system, at the probability 277 00:26:06,860 --> 00:26:12,110 that you get a particular output x, as opposed to the complexity of the structure, complexity being the number of bits 278 00:26:12,110 --> 00:26:16,190 I need to encode the algorithm that makes that particular structure, 279 00:26:16,190 --> 00:26:22,760 then the very high probability ones are the simple ones, and symmetric structures are simple. 280 00:26:22,760 --> 00:26:27,470 You don't need very much information to encode a simple structure: it's made of something that is repeated multiple times. 281 00:26:27,470 --> 00:26:31,370 So symmetry and low complexity are deeply linked to one another.
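The D4 symmetry count mentioned above can be checked mechanically. Here is a small illustrative helper, not from the talk, that counts how many of the eight symmetries of the square map a lattice shape onto itself.

```python
def normalise(cells):
    """Translate a set of (x, y) lattice cells so its minimal corner is (0, 0)."""
    cells = list(cells)
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return frozenset((x - min(xs), y - min(ys)) for x, y in cells)

# The eight elements of D4: four rotations and four reflections.
D4 = [
    lambda x, y: (x, y),   lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (-x, y),  lambda x, y: (x, -y),
    lambda x, y: (y, x),   lambda x, y: (-y, -x),
]

def symmetry_order(cells):
    """How many square symmetries leave the shape unchanged (up to translation)."""
    shape = normalise(cells)
    return sum(normalise(t(x, y) for x, y in shape) == shape for t in D4)
```

A shape scoring 8 has full D4 symmetry, like the five 16-mers mentioned above; a generic shape scores 1, and the score is one way of seeing the symmetry-equals-few-bits point just made.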
282 00:26:31,370 --> 00:26:37,250 And then we've worked with some collaborators in Cambridge, and we've discovered that you can look at protein clusters in 283 00:26:37,250 --> 00:26:40,190 the protein database, of which about thirty thousand are known. 284 00:26:40,190 --> 00:26:48,560 And they've looked at the complexity of those protein clusters, mapping them onto the same complexity measure. 285 00:26:48,560 --> 00:26:54,110 And we get almost exactly the same behaviour as we get for our simple polyomino model, which makes us very happy: simple 286 00:26:54,110 --> 00:27:01,370 models give the same results as nature. And this line here is the bound that we've predicted, so we're able to predict this upper bound. 287 00:27:01,370 --> 00:27:07,070 And this tells us that not only is this bias towards highly symmetric, simple structures there in the polyominoes, 288 00:27:07,070 --> 00:27:10,880 but it seems to be there in nature, in these protein clusters. 289 00:27:10,880 --> 00:27:15,800 And you can break this down further, so you can look at, say, hexamers, of which there are many in the database. 290 00:27:15,800 --> 00:27:19,010 And this is the complexity down here. 291 00:27:19,010 --> 00:27:27,740 And what you see is that simple hexamers are very frequent in nature and complex hexamers are rare, at more or less the frequencies that we predict. 292 00:27:27,740 --> 00:27:31,590 And so we've done a whole series of evolutionary models. I'll just show you one, because I just saw that 293 00:27:31,590 --> 00:27:35,540 she is in the audience: Nora Martin was an undergraduate with me. 294 00:27:35,540 --> 00:27:37,220 She worked on Richard Dawkins's biomorphs. 295 00:27:37,220 --> 00:27:43,820 Biomorphs are a really beautiful little model of shapes that you can make, which Dawkins uses to show the great power of evolution. 296 00:27:43,820 --> 00:27:47,840 And she showed that the probability and complexity of these shapes,
297 00:27:47,840 --> 00:27:55,100 if you search this space of algorithms, show the same behaviour that we predict for all the other systems. 298 00:27:55,100 --> 00:27:58,970 And Nora was an undergraduate here. She went to the other place, where she's continuing this work. 299 00:27:58,970 --> 00:28:07,470 So you win some, you lose some. But we have really brilliant students, as I think many of you were as well. 300 00:28:07,470 --> 00:28:10,320 This was actually her undergraduate work, and what's interesting about it is that, 301 00:28:10,320 --> 00:28:16,080 if you look at these rare structures down here, which are low complexity and low probability, 302 00:28:16,080 --> 00:28:19,560 it turns out that the strings that make those structures, 303 00:28:19,560 --> 00:28:27,210 the codes that make the structures, are themselves simple, so they have particularly simple codes that make them. 304 00:28:27,210 --> 00:28:30,530 And that's interesting. All right. 305 00:28:30,530 --> 00:28:36,830 And then, in the last 10 minutes or so, I want to switch gears to something entirely different, which is machine learning. 306 00:28:36,830 --> 00:28:38,900 So you've probably seen a lot of machine learning; 307 00:28:38,900 --> 00:28:46,190 it's been in the news enormous amounts, and the news has largely been dominated by one particular method, which is neural networks. 308 00:28:46,190 --> 00:28:52,610 Neural networks are very loosely modelled on the brain. The way they work is that there's an input layer that gives you some kind of inputs, 309 00:28:52,610 --> 00:28:56,660 and then you've got a series of little interactions with other nodes. 310 00:28:56,660 --> 00:29:01,660 These are weights, and these nodes, say x one, are either ones or zeros. 311 00:29:01,660 --> 00:29:08,960 And if this one is one, then the weights multiply all the inputs that are one and pass a weighted sum to the next layer.
312 00:29:08,960 --> 00:29:12,370 And this is passed on to the next layer, and you can have several of these. 313 00:29:12,370 --> 00:29:16,160 They're called deep neural networks if they have several layers in a row, and there's an output that says, 314 00:29:16,160 --> 00:29:20,390 for example, yes or no, this is a cat or this is not a cat. 315 00:29:20,390 --> 00:29:25,500 And what's really powerful about them is that they're incredibly good at things like pattern recognition. 316 00:29:25,500 --> 00:29:36,740 So in 2012, very famously, a group from the University of Toronto, Geoffrey Hinton's group, completely obliterated the competition in this contest 317 00:29:36,740 --> 00:29:40,400 where you're given a bunch of images and you're supposed to recognise them. 318 00:29:40,400 --> 00:29:46,100 And so you train on these images, which means you get a bunch of images that say cats and dogs and horses and cows. 319 00:29:46,100 --> 00:29:53,720 You're given a whole bunch of them, and you train your computer to do correctly on the set that you've been given. 320 00:29:53,720 --> 00:29:59,420 That means you adjust those little weights so that when you give it an input of pixels that is a cat, 321 00:29:59,420 --> 00:30:06,500 the output of this thing is cat, and when it's a dog, it's dog. That's how you train it, and then you're given a test 322 00:30:06,500 --> 00:30:14,240 set that it hasn't seen before, of new pictures. And then you run those through your computer and you see how well you predict. 323 00:30:14,240 --> 00:30:20,450 And these machines are extremely good at this: they predict with much higher accuracy than older methods 324 00:30:20,450 --> 00:30:25,310 on data beyond what you put in initially. That's called 325 00:30:25,310 --> 00:30:29,450 generalisation, because you're showing that you're able to generalise on new, 326 00:30:29,450 --> 00:30:33,470 unseen data.
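The layers-of-weights description above amounts to very little code. Here is a minimal illustrative sketch with made-up weights and a simple step activation, far simpler than the real networks discussed:

```python
import random

def layer(inputs, weights, biases):
    """One fully connected layer with a step activation: each output node
    fires (1) if its weighted input sum exceeds its bias, else stays 0."""
    return [
        1 if sum(w * x for w, x in zip(ws, inputs)) > b else 0
        for ws, b in zip(weights, biases)
    ]

def network(bits, params):
    """Push a binary input vector through successive layers."""
    for weights, biases in params:
        bits = layer(bits, weights, biases)
    return bits

# A random two-layer network over 7 binary inputs producing one yes/no output.
rng = random.Random(1)
params = [
    ([[rng.gauss(0, 1) for _ in range(7)] for _ in range(4)], [0.0] * 4),
    ([[rng.gauss(0, 1) for _ in range(4)]], [0.0]),
]
verdict = network([1, 0, 1, 1, 0, 0, 1], params)  # e.g. [1] = "cat", [0] = "not cat"
```

Training, as described above, would mean adjusting those weight numbers until the output matches the labels; real networks use smooth activations and gradient descent rather than this step rule.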
And this has dramatically changed all kinds of things. 327 00:30:33,470 --> 00:30:40,490 So Google Translate, for instance, now basically uses these kinds of pattern-recognition methods, with large 328 00:30:40,490 --> 00:30:45,530 amounts of translated text put through it, which it effectively memorises. 329 00:30:45,530 --> 00:30:52,070 And you don't even need to know anything about language at all to work on these translation techniques, because these machines have become so good. 330 00:30:52,070 --> 00:30:56,360 Now, what's very confusing and surprising about them is that they're highly over-parameterised, right? 331 00:30:56,360 --> 00:31:02,870 So a typical neural network has millions and millions of parameters, but you're only feeding it a few thousand data points, typically. 332 00:31:02,870 --> 00:31:04,430 And we know from experience that 333 00:31:04,430 --> 00:31:11,240 if I take a small number of data points and I have more parameters than I have data points, I start getting nonsense. 334 00:31:11,240 --> 00:31:16,970 I mean, we teach this to our undergraduates all the time: don't use more parameters than you have data. There's a very famous quote from von Neumann: 335 00:31:16,970 --> 00:31:20,810 give me four parameters and I can fit an elephant; 336 00:31:20,810 --> 00:31:25,040 give me five and I can make it wiggle its trunk. 337 00:31:25,040 --> 00:31:32,330 And very recently, in a paper in Science last year, a bunch of AI researchers were saying that this is alchemy, right? 338 00:31:32,330 --> 00:31:36,140 We don't understand why this works; there's no theory for it. It clearly works, 339 00:31:36,140 --> 00:31:45,740 but we don't understand it, and all the kind of classical machine-learning theoretical ideas just break down for neural networks. 340 00:31:45,740 --> 00:31:47,750 So I have a very bright DPhil student.
341 00:31:47,750 --> 00:31:53,720 This is a pattern that hopefully you've noticed, lots of bright DPhil students, who came to me and said: well, let's think about this. 342 00:31:53,720 --> 00:32:03,800 Can we apply these ideas about simplicity bias to deep neural networks? So we set up a little model problem: I have a system 343 00:32:03,800 --> 00:32:10,130 of seven bits, so there are two to the seven possible different input strings, and I look at all Boolean 344 00:32:10,130 --> 00:32:13,910 functions of those. A Boolean function basically says: if this is on and that's off, 345 00:32:13,910 --> 00:32:18,410 then yes; if this is on and that's on, then no. And there are a very large number of them: 346 00:32:18,410 --> 00:32:23,030 there are two to the 128 different possible Boolean functions; 347 00:32:23,030 --> 00:32:29,300 that's about 10 to the 38. So you might think that if I randomly pick parameters in my neural network, in other words, 348 00:32:29,300 --> 00:32:34,670 if I randomly choose these weights, and then I ask myself what function I produce, 349 00:32:34,670 --> 00:32:38,330 I'd get each Boolean function with a probability of about one in 10 to the 38. 350 00:32:38,330 --> 00:32:43,580 OK, but what we see here is a rank plot of the probability of getting a function against its rank. 351 00:32:43,580 --> 00:32:48,830 Here are the most frequently produced functions, and you see these probabilities are much, much, much larger than 10 to the minus 38. 352 00:32:48,830 --> 00:32:54,140 So certain functions are appearing quite easily when you randomly pick parameters. 353 00:32:54,140 --> 00:32:59,260 And in fact, if you look at the complexity of those functions, these are the low-complexity functions. 354 00:32:59,260 --> 00:33:04,000 So we were very excited about this. Now, this is a toy model for neural networks.
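A scaled-down version of that experiment fits in a few lines. This sketch is illustrative only: it uses 3 input bits instead of 7 (so all 2^8 = 256 Boolean functions can be tabulated) and a tiny threshold network with made-up dimensions, not the actual networks from the talk.

```python
import random
from collections import Counter
from itertools import product

INPUTS = list(product([0, 1], repeat=3))  # all 2^3 = 8 input strings

def sample_function(rng):
    """Draw random weights and return the Boolean function they compute,
    encoded as the tuple of outputs over all 8 inputs."""
    w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    w2 = [rng.gauss(0, 1) for _ in range(4)]
    outs = []
    for x in INPUTS:
        hidden = [1 if sum(w * b for w, b in zip(ws, x)) > 0 else 0 for ws in w1]
        outs.append(1 if sum(w * h for w, h in zip(w2, hidden)) > 0 else 0)
    return tuple(outs)

rng = random.Random(0)
counts = Counter(sample_function(rng) for _ in range(5000))
top, top_count = counts.most_common(1)[0]
# Under the naive uniform picture, each of the 256 functions would appear
# about 5000 / 256, roughly 20 times; in fact a handful of very simple
# functions (the constant function above all) soak up most of the draws.
```

The qualitative outcome mirrors the rank plot described above: random parameters do not sample functions uniformly, they land on simple ones far more often.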
355 00:33:04,000 --> 00:33:09,530 Just last night, actually, or this morning at 2:00 a.m., my student sent me a picture of a much bigger network. 356 00:33:09,530 --> 00:33:16,990 This is an actual, proper real network, a four-layer deep CNN, if that interests you, 357 00:33:16,990 --> 00:33:21,610 that we used on CIFAR data, and we get exactly the same line. 358 00:33:21,610 --> 00:33:27,910 So we're very pleased with that, because he's done some very, very clever tricks to be able to work out this probability. 359 00:33:27,910 --> 00:33:34,480 So this is exactly what the neural networks do: these neural networks are highly biased towards simple outputs, 360 00:33:34,480 --> 00:33:39,800 and that may be the reason why, when you give them a bunch of data, they don't overfit. 361 00:33:39,800 --> 00:33:44,260 So even though the number of functions that this thing can generate is extremely large, 362 00:33:44,260 --> 00:33:47,800 for this Boolean system it's 10 to the 38, 363 00:33:47,800 --> 00:33:53,200 it gets pushed by the simplicity bias towards this very small fraction of simple ones. 364 00:33:53,200 --> 00:33:58,570 And we tested this. For example, here is the generalisation error on typical supervised learning. 365 00:33:58,570 --> 00:34:04,750 So I've got 64 strings that I train on, and then I give it the other 64, which it hasn't yet seen. 366 00:34:04,750 --> 00:34:13,090 That is, I have a secret Boolean function; I take 64 strings and see what the function says, 367 00:34:13,090 --> 00:34:19,630 I train my network on those, and then I give it the other sixty-four strings that it hasn't seen yet and ask: does it find the right Boolean function? 368 00:34:19,630 --> 00:34:23,620 And this error is the fraction of the time it gets those things wrong.
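That train-on-64, test-on-64 protocol can be written out explicitly. This is an illustrative sketch with a hypothetical simple target function and a naive memorising learner standing in for the trained network:

```python
import random
from itertools import product

ALL_INPUTS = list(product([0, 1], repeat=7))  # the 128 possible 7-bit strings

def target(x):
    """The 'secret' Boolean function; a deliberately simple made-up one."""
    return x[0]  # output = first bit

rng = random.Random(0)
shuffled = ALL_INPUTS[:]
rng.shuffle(shuffled)
train, test = shuffled[:64], shuffled[64:]

# A learner that memorises the training set and guesses 0 elsewhere;
# any learner, including a trained network, slots into this protocol.
table = {x: target(x) for x in train}
def learner(x):
    return table.get(x, 0)

train_error = sum(learner(x) != target(x) for x in train) / len(train)
test_error = sum(learner(x) != target(x) for x in test) / len(test)
# train_error is 0 by construction; test_error measures generalisation.
```

The point made in the talk is visible in miniature: zero training error says nothing by itself, because a memoriser achieves it while still failing on the held-out half; it is the networks' bias towards simple functions that keeps their test error low when the target is simple.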
369 00:34:23,620 --> 00:34:34,060 And what you see is that if we train the network, then we get errors that are quite small, 370 00:34:34,060 --> 00:34:38,710 of the order of a few per cent. Whereas if we randomly pick functions, for example, 371 00:34:38,710 --> 00:34:43,630 if we just draw random functions and keep the ones which exactly 372 00:34:43,630 --> 00:34:46,960 fit the training data, then we can find lots of functions that fit the training data but will never fit the test data, 373 00:34:46,960 --> 00:34:53,740 and when they're then rated on the test data, they give almost no signal: they give an error of almost 100 per cent. 374 00:34:53,740 --> 00:34:59,050 So these networks are hugely biased towards this very small fraction of simple functions. 375 00:34:59,050 --> 00:35:02,920 And what's also interesting is that, as you make the target function more complicated, 376 00:35:02,920 --> 00:35:07,060 as you generate your data with a more complicated target function, these networks do less and less well. 377 00:35:07,060 --> 00:35:13,000 So this is the error that they make when they're trained, plotted against the complexity of the target they're trying to learn. 378 00:35:13,000 --> 00:35:16,180 And it turns out that, even though they're extremely good, 379 00:35:16,180 --> 00:35:20,950 if you give them complicated functions, and we can measure this complexity very accurately, 380 00:35:20,950 --> 00:35:23,800 then they actually no longer generalise very well. 381 00:35:23,800 --> 00:35:33,220 So the fact that they generalise well is because they're looking at simple functions. And what we're particularly excited about is that we can apply these ideas further: 382 00:35:33,220 --> 00:35:37,540 we've now used these ideas in classical learning
theory. 383 00:35:37,540 --> 00:35:45,130 That's a very big field of computer science, with lots of textbooks and hundreds of thousands of papers trying to prove bounds on how well you will generalise. 384 00:35:45,130 --> 00:35:50,830 Because if you can prove a rigorous bound, then you know what your maximum error is going to be. 385 00:35:50,830 --> 00:35:57,610 And so we've been able to adapt these ideas about bias to a particular technique called PAC-Bayes. 386 00:35:57,610 --> 00:36:06,970 Actually, it has not as much to do with Bayes as you might think. And this is the result that we get for the error bounds. 387 00:36:06,970 --> 00:36:17,170 And what you see here, for these curves on the bottom: this is MNIST, which is a database of handwritten digits 388 00:36:17,170 --> 00:36:23,800 that you have to recognise, and this is CIFAR, which is images, and obviously on CIFAR you don't do as well; this is the error that you get. 389 00:36:23,800 --> 00:36:27,580 And what we do is we randomise labels: we corrupt the labels on the training set. 390 00:36:27,580 --> 00:36:30,700 So if I corrupt the labels, if I tell you that a one is actually a five, 391 00:36:30,700 --> 00:36:34,600 then I'm not going to generalise very well, because I've got some errors in my training data. 392 00:36:34,600 --> 00:36:39,880 And so as the corruption gets bigger, my generalisation is less good, because I'm not giving it very good inputs. 393 00:36:39,880 --> 00:36:44,050 And that's what you see: the error growing as a function of corruption for these 394 00:36:44,050 --> 00:36:48,610 two different classes of systems. 395 00:36:48,610 --> 00:36:53,410 And this is our rigorous bound; this is the upper bound on the generalisation error. 396 00:36:53,410 --> 00:36:56,560 And what's really important, I think, is that this means this is no longer alchemy.
397 00:36:56,560 --> 00:37:03,970 We can actually predict analytically, with no free parameters, what the maximum error is that you're going to get. 398 00:37:03,970 --> 00:37:09,760 And we're pretty close to what these things actually do, even though we're in this heavily over-parameterised regime, 399 00:37:09,760 --> 00:37:12,910 the regime in which traditional machine-learning ideas shouldn't work. 400 00:37:12,910 --> 00:37:17,980 And the reason this works, we're arguing, is that these 401 00:37:17,980 --> 00:37:22,990 neural networks work so well because they're intrinsically biased towards simple functions. 402 00:37:22,990 --> 00:37:28,060 And so when you give them a whole bunch of images, they're actually picking out the simple things that fit the data, 403 00:37:28,060 --> 00:37:30,040 rather than the complicated things that fit the data. 404 00:37:30,040 --> 00:37:35,920 Whereas if you just randomly pick, if you do some kind of standard regression, for example, with a very high-order polynomial, 405 00:37:35,920 --> 00:37:41,250 then you're much more likely to pick a complicated function than a simple one, unless you've got a bias in your system. 406 00:37:41,250 --> 00:37:43,090 These things are intrinsically biased, 407 00:37:43,090 --> 00:37:49,510 and the reason why they work well must be that the patterns they're studying are also simple, 408 00:37:49,510 --> 00:37:53,260 because if they try to look at complicated patterns, they don't work that well, 409 00:37:53,260 --> 00:37:58,630 OK, as I showed you as well. So this tells us that the patterns they're studying are, in some way or other, simple. 410 00:37:58,630 --> 00:38:02,530 So I gave you this very grand claim at the beginning, 411 00:38:02,530 --> 00:38:06,940 why the world is simple, and I don't think I've completely explained it, but I've given you a hint.
412 00:38:06,940 --> 00:38:13,120 Some things are simpler because you can think about them as searches in the space of algorithms. 413 00:38:13,120 --> 00:38:19,030 So if I think about a possibility space, the space of all possible stories, for example, right? 414 00:38:19,030 --> 00:38:24,670 Then if I randomly look in the books of Borges's library, I'm going to find simple stories much more frequently 415 00:38:24,670 --> 00:38:27,010 than I'm going to find complex stories. 416 00:38:27,010 --> 00:38:35,260 And so even though the probability of any particular sequence of characters in a book is equal, every book is equally likely, 417 00:38:35,260 --> 00:38:41,050 the probability of getting simple stories is much higher than the probability of getting complex stories. 418 00:38:41,050 --> 00:38:47,829 And thank you very much.