So it's a great honour for me to be here in front of all of you. Thank you very much for coming along, all of you alumni and friends of the department. Of course, we're very excited about physics, and I hope you are too; it's one of the great things to be involved in, as was pointed out earlier. I'm going to talk about some things that are actually physics, although you may not have considered them to be physics: we're taking a number of ways of thinking about the world and applying them to a very different set of problems that are typically not in physics. The title I've been given is very broad, very grand, and deliberately provocative and over the top: why is the world simple? But I want to start by giving you some motivation for this question, and that's this very famous story by Borges, the famous Argentine writer. A story from 1941, "The Library of Babel". In this library there are books. Each book is 410 pages, with 40 lines of 80 characters per page, using 22 letters plus comma, period and space: twenty-five characters in total. And every book in this library, every possible book of that length, exists. If you count the number of possible books, it's basically 10 to the 1.8 million. So an extraordinarily large number, and every possible book is in there.
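As a sanity check on that "10 to the 1.8 million", the book count is a few lines of Python (a sketch; the page, line and alphabet sizes are the ones just quoted):

```python
from math import log10

# Each book in the Library of Babel: 410 pages x 40 lines x 80 characters,
# drawn from an alphabet of 25 symbols (22 letters, comma, period, space).
chars_per_book = 410 * 40 * 80        # 1,312,000 characters per book
alphabet = 25

# The number of distinct books is 25**1_312_000; report it as a power of ten.
digits = chars_per_book * log10(alphabet)
print(f"~10^{digits:,.0f} possible books")  # roughly 10 to the 1.8 million
```

The exact exponent is about 1.83 million, which is what the talk rounds to "1.8 million".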
And part of the story is that people go through this library trying to find the book that describes their life, because any 410-page book that describes your life is in the library. So lots of complicated things are in there; that, of course, is the point Borges makes. There are also very precious variations, including books where you are speaking to an audience of alumni and suddenly a meteorite comes in and crashes; that story is also in the library. Every story is in that library. And since every book is equally likely, many stories are in there, including the story of your life, but the chance of you finding it is extremely small. You could go and look. Of course, the question might be: is every story equally likely? Well, probably not. There are definitely stories that are relatively simple and can be told in many different ways, so you'll find that story repeated in many books. Whereas a story that is very complicated, that needs the whole 410 pages to be told properly, will probably appear only once. And so I'm going to be interested in this little analogy: of all possible sequences of letters, what fraction of them carry certain types of information?
And this brings me to the question of why the world is simple, which I've been worrying about for a while: when I take my computer and I zip a file, I can almost inevitably compress my files enormously. And I was wondering, is that special or not? Well, for the sake of argument, assume that everything is encoded in binary strings. There are two strings of length one, four of length two, eight of length three; there are 2 to the n of length n. So if I have a string of a particular length n, there are 2 to the n possible strings. How many strings are there that are shorter than that? Because to compress something, I've got to encode it in a string which is shorter. Well, for length three there are eight strings, but only six strings that are shorter. So I can't compress more than six of my strings, and in fact most of those I can only compress by one bit: there are only two strings that are two bits shorter than three bits. And that's the principle that holds in general: the fraction of strings that I can compress by n bits is one over two to the power of the number of bits I want to compress by. So if I want to compress a string by 10 bits, only about one thousandth of all strings, of whatever length, can be compressed by 10 bits or more. And what if I want to compress my string by 20 bits?
Only one in a million of all strings of that length, whatever length it is, can be compressed by 20 bits, which is not very much, because when I zip files they get compressed by hundreds and thousands of bits. So the number of strings that can be compressed is an incredibly small fraction of all the strings that are possible. And the reason is that the number of strings goes as 2 to the n: it grows exponentially, so it becomes very large very quickly; hyper-astronomically large numbers are very easy to make. So compressible strings are extremely rare. Why is it, then, that most of the things we compress in our daily life, and so much of what we see in nature, are compressible? This connects to understanding: one way of thinking about what it means to understand something is that you've compressed it into some kind of simpler description. So I'm going to give you an intuition. It's going to sound very vague and fluffy, but I'm hopefully going to tighten it up. Imagine a monkey typing on a computer. OK, I would have said a typewriter, but while I guess you will all know typewriters, with my students I don't know; I'm not sure where you'd even find these things. So it's typing into a word processor, and you want to ask this monkey to type in pi. How likely is it, if it's a truly random monkey, to type in pi? Well, let's say there are N keys on the typewriter.
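The counting argument just made, that at most about a fraction 2 to the minus k of strings can be compressed by k bits, is pure pigeonhole counting, and can be checked directly (a sketch; the string length 30 is an arbitrary choice, and the exact bound computed below is 2 to the (1 minus k), within a factor of two of the one quoted):

```python
def fraction_compressible(n: int, k: int) -> float:
    """Upper bound on the fraction of length-n strings compressible by >= k bits.

    A string is compressible by k bits only if it has a description of
    length <= n - k. Counting every string of length 0 .. n-k gives
    2**(n-k+1) - 1 candidate descriptions, so at most that many of the
    2**n strings can be compressed that much.
    """
    shorter = sum(2**length for length in range(0, n - k + 1))  # 2**(n-k+1) - 1
    return shorter / 2**n

n = 30
for k in (1, 10, 20):
    print(k, fraction_compressible(n, k))  # bounded by 2**(1 - k)
```

For k = 10 this gives about one in a thousand, and for k = 20 about one in a million, matching the numbers in the talk.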
The probability of getting the first X digits is then one in N to the power X plus one; the plus one is because pi has a decimal point, "three point one four", so the point is one extra character. So the probability is extraordinarily small: with 50 keys it's one in 50 for the "three", one in 50 squared once you include the point, and so on. The probability is extremely small, but it's possible in principle; you could do it. This trope of typing monkeys actually has a long history; it goes back to a very famous mathematician who pointed it out about a hundred years ago. It's about hypothetical monkeys, but it's interesting to think about real ones, which in fact don't type randomly. There was an experiment at the University of Portsmouth where they put some monkeys in a zoo enclosure and gave them a typewriter. Apparently the monkeys had a preference for one particular key, which they kept whacking, and then they defecated on the typewriter, and that was the end of the experiment. So these are hypothetical monkeys typing on a typewriter. Now, what if instead of typing into a word processor you gave the monkey a C interpreter, so the monkey is typing a C programme? How likely would the monkey then be to type the first digits of pi? Well, there's actually a competition for the shortest programme in C that will generate pi, and there are a bunch of short programmes. This is one of them; it may or may not be the current record, and it's only of the order of 160 characters long. If you type this in, by accident or by thinking, it will generate the first fifteen thousand digits of pi correctly.
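I won't reproduce an obfuscated C entry here, but the underlying point, that a genuinely short programme generates as many digits of pi as you like, can be illustrated with a standard unbounded spigot algorithm (Gibbons's streaming version, written in Python rather than C for readability):

```python
from itertools import islice

def pi_digits():
    """Yield decimal digits of pi one at a time (Gibbons's spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n                       # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

print(list(islice(pi_digits(), 10)))  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

A dozen lines of code, yet it emits an unbounded stream of digits: exactly the mismatch between programme length and output length that the monkey argument turns on.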
In other words, if a monkey is typing on a computer, it might by accident type the 160-odd characters of this programme, and suddenly pi, a very long output, will appear with much higher probability than some other random number of the same length: a number that you can't generate algorithmically, where the only way you're going to get it is to write "print" followed by the number itself. So typing into a computer and typing into a word processor are not, even to first order, the same game at all. Now, that intuition can be sharpened, and the way that has been done is by using ideas from Alan Turing. Alan Turing, very famously in 1936, came up with the universal Turing machine, which is a computer that can do any possible computation. Now, people don't often note that the reason he came up with this was not to make a computer, but to prove that a lot of things are not computable. In fact, his article is "On Computable Numbers, with an Application to the Entscheidungsproblem": it was the answer to the famous decision problem of David Hilbert. The idea was: could you, from a set of axioms, have an algorithm that could always decide whether a statement was true or not true within that set of axioms? And Turing proved that you could not, by showing that these universal computing machines have one thing they cannot do: there is no algorithm that will tell you whether a particular input programme will ever finish and generate an output, which is called halting.
So the computer takes an input, runs for a while, and then it halts and gives an output; and you can prove that you cannot decide, in general, whether a particular input will make the computer halt. That's called the halting problem. And you can reduce it to all kinds of mathematical and logical problems; therefore there are statements that are undecidable. And so, based on the idea of Turing machines, we then get two other great geniuses, Kolmogorov and Chaitin, who start thinking about the question: what is the complexity of a sequence? Say I have a binary sequence: can I quantify its complexity? It turns out that the fundamental, mathematically profound way of defining it is to say that the complexity of a sequence is the length of the shortest programme that will generate that sequence on a universal Turing machine. In other words, if I have a sequence like zero one repeated many times, there's probably quite a short programme: I can say "print zero one fifty times" and I'll get that sequence. Whereas the sequence beneath it may be quite complicated; maybe the only programme that will generate it is "print that particular number", so it's a complex sequence. And what's interesting about this definition is that, although to define it formally I need a particular universal Turing machine, any universal machine can always emulate any other Turing machine by writing a compiler.
In principle, if the strings are long enough that I can ignore the compiler terms, the complexity of a string is independent of the Turing machine that I use. So it's actually a property of the string itself, up to these compiler terms, and we say that asymptotically a particular string has a particular Kolmogorov complexity. The problem is that, because of the halting problem, you can never actually calculate it and know for sure that it is the Kolmogorov complexity. Colloquially, what that means is that if I give you a particular sequence of digits, you don't know whether it is actually complex, or whether there may be something like pi which generates it as its first so-many digits. So you can never prove it, but you can define it. And this gives a whole series of other interesting intuitions. For example, the definition of a random number is one whose Kolmogorov complexity is its own length or slightly more; in other words, the only way you can generate it is by printing it. If you can generate the number by some other, shorter algorithm, then it's not a random number. That's the algorithmic definition of randomness. Also, the complexity of a set can be much less than the complexity of the elements of the set. The complexity of Borges's library is extremely small: "take all books 410 pages long, with 40 lines of 80 characters per page, over twenty-five letters". That's a pretty short programme.
That short programme generates the entire Library of Babel, whereas your own life, which is described by one of those books, or maybe several of those books depending on the complexity of your life, is pretty complex, right? So the interesting thing is that the complexity of a set can be much less than the complexity of its individual elements. There are a lot of things like this that seem non-intuitive at first but become very clear and very precise in this language. And the reason I brought this up is that the core idea of Kolmogorov complexity was actually developed earlier by another great, essential genius called Ray Solomonoff, who was trying to formalise induction; in fact all of these people were very heavily influenced by Carnap, the philosopher of the Vienna Circle. Solomonoff was trying to think about how he could do this with a computer, and he said: well, if I have a universal Turing machine, let's assume that I feed it random inputs, and to make life simple, let's take a universal Turing machine that only accepts binary codes as input. So I give it inputs and I see what it does. And I ask myself: how likely am I, on feeding it random inputs, to get a particular output x, some particular string? How likely is this computer to generate that string? Well, it's the sum over all programmes that generate that string on the universal Turing machine. That's what it is.
And each programme of length L, since the input is binary, occurs with probability one half to the power L. So the probability of the output is the sum, over all programmes that generate it, of the probability of getting that programme, which is one half to the power L. Now, the most likely programme is the shortest programme, because it's the one you're most likely to type by accident, and so the first term in that series is two to the minus K, where K is the Kolmogorov complexity. So Solomonoff had this idea before Kolmogorov, but it's Kolmogorov's name that has stuck, by the usual Matthew principle: "to him who has, more shall be given; and from him who has not, even what he has shall be taken away". Which is why we now call it Kolmogorov complexity. And this is really interesting: it gives you at least a lower bound on the probability that you're going to get a given output by randomly sampling programmes, and it's given by the Kolmogorov complexity. Now, another great genius, Leonid Levin, also known as one of the founders of the idea of NP-completeness, a great thinker in mathematics and computer science, proved that not only is there this lower bound of Solomonoff's, but also an upper bound, up to these pesky order-one terms, which are the terms linked to compilers and so on. And this is very interesting, because it tells us that if I have any kind of system that can be generated by a universal Turing machine, and I randomly feed programmes into it.
Then the probability that it produces a particular output can be bounded, fairly tightly, by one half to the power of the complexity of that output. OK, so that's a very beautiful result. I think it should be more widely taught, because it's really cool; it's amazing. Now, the reason it's not so widely taught is that there are problems in applying it, and the problems are that many systems we care about are not universal Turing machines. In physics, many things are not Turing universal. One way we know that is that for many maps we can compute all inputs and outputs; by definition, since we can decide their halting, they can't be universal Turing machines. Also, Kolmogorov complexity is by definition formally uncomputable, which is problematic if I need it in my bounds. And of course, many systems are not in the asymptotic limit: this is only true in the limit of complexities large enough that I can ignore all of these compiler-like terms. So I've been interested in this for quite a while, and I had two very brilliant DPhil students, Kamal Dingle and Chico Camargo, I think Chico is sitting at the back, who did much of this work. We worked very hard on trying to generalise this coding theorem to non-universal maps, and the details you can read in this paper that came out just recently.
But we have a bound, which says that the probability that you get a particular output on random inputs to a computable map, a well-defined map from inputs to outputs, can still be bounded by the same kind of two to the minus K: the probability is at most two to the minus (a K plus b), with a little squiggle above the K meaning that we approximate K by some good approximation to the Kolmogorov complexity; a compression algorithm is something that gives you one. And we've got some b, which is a constant, an offset basically, which we don't quite know how to fix from first principles, and an a, the first constant, which we actually do know how to calculate, just from the properties of the map, not from the properties of the map's outputs. And b we can fix by taking a few measurements. For this to work, the maps have to be simple: that is, as my system grows in size, the map's complexity has to grow slowly with system size. And I have to find a good approximation to K; that's a bit of an uncontrolled approximation, but we can try. Interestingly, this bound, although it's an upper bound, is relatively tight: if I feed random inputs into my map, then the outputs should on average lie close to the bound. And it doesn't work for particular kinds of maps. For example, a map that essentially implements a pseudo-random number generator, which is what you use to generate your random numbers: they're never really random.
Their Kolmogorov complexity is actually relatively small, because a pseudo-random number generator is a short programme, but they're constructed so that they will fool randomness tests, which means they're also incompressible in practice. So you have to have maps that don't show that kind of behaviour. And we show that a wide range of systems in nature do in fact behave this way. So here, these orange blobs are the ribosome, which is partly made of RNA. What you have is a strand with a code of four letters on it, and it folds into a well-defined three-dimensional shape. You can study this folding, and we have: you can ask yourself, if I just pick random strands, how likely am I to get a particular shape? And it turns out that if you randomly pick strands and you plot the shapes you get against their complexity, then the most likely shapes you're going to get by randomly picking strands are in fact the simple, compressible shapes. And interestingly, when you look at nature, the shapes you find are exactly the shapes we predict you're going to find, because those are the ones that have short descriptions, low complexity, and they're therefore very easy to reach by random mutations, because there are many, many sequences mapping to them.
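The RNA experiment itself needs a folding package, but the flavour of the measurement can be sketched with a toy input-output map. Everything below (the majority-vote map, the input length) is my own illustrative stand-in, not the real genotype-phenotype map; what it shares with the RNA case is that simple outputs collect far more than their fair share of random inputs:

```python
from collections import Counter
from itertools import product

def majority_smooth(bits):
    """A deliberately simple map: each output bit is the majority vote of a
    circular 3-bit window of the input. Smoothing creates runs, so many
    inputs collapse onto a few highly compressible outputs."""
    n = len(bits)
    return tuple(int(bits[i - 1] + bits[i] + bits[(i + 1) % n] >= 2)
                 for i in range(n))

n = 12
# Enumerate every "genotype" (all 2**12 inputs) and count "phenotypes".
counts = Counter(majority_smooth(x) for x in product((0, 1), repeat=n))

# The simplest (constant) outputs soak up far more inputs than average:
print(counts[(0,) * n], counts[(1,) * n], 2**n / len(counts))
```

Here the all-zeros and all-ones outputs each receive 98 of the 4096 inputs, far above the average multiplicity, which is the basic signature of simplicity bias.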
So those are like the simple stories, where there are many different ways the same story can be written. We did the same here, another example. This is a set of coupled differential equations, in this particular case a model of the circadian rhythm, but actually we just took it as a set of, I think, five coupled differential equations with about 30 parameters, and we randomly varied the parameters. So now the input is a random change of the parameters, I look at the outputs, and I ask myself: does this set of coupled differential equations generate very complicated outputs or simple outputs? And you see, this black line here is our bound. The red line is the bound with b set to zero, so we ignore b, which means there are no free parameters; with one free parameter you get a slightly better bound. And for every particular output, simple outputs occur much more frequently than complicated outputs, and the bound is tight; it works extremely well. Remember, this is the log of the probability. There are indeed outputs, down here, that are simple and rare; OK, but they're very rare, so the vast majority of the time, with high probability, you're close to the bound. And we can also make maps for which it doesn't work at all. This is a matrix map, where the input goes into a matrix that does the computation and then produces outputs.
So think about a matrix as a map: if you have an n-by-n square matrix, its description grows as n squared as the system grows, so the complexity of the map grows very quickly with size. By our argument the theory should then not apply, and indeed it doesn't: you don't get simplicity bias. So for simplicity bias you need the set of rules that describes how the input is processed into the output to be simple. That's true for how RNA turns into structure: how RNA sequences fold is governed by a set of physical laws that don't really change as you make the RNA strands longer. And the differential equations don't change either; it's their parameters we vary. I can do the same with simple financial models, and everything seems to show this simplicity bias. And just to address something, because I know from experience people will wonder: aren't you just making some kind of entropy argument? So this axis is the complexity of the outputs, obtained by compression, and here we can see what I mean by the difference between entropy and complexity. Here is the entropy of strings; these are all strings of length thirty. The entropy is basically set by the fraction of ones and zeros. So a string of all ones is a very simple string: strings with low entropy, down here, also have low complexity, all ones or all zeros, right?
But there are also many strings of high entropy but low complexity, like this one here, which would be the string zero one zero one zero one repeated. It has the same fraction of zeros and ones as a random string, so its entropy is maximal, even though it's a highly structured string. So what we're picking up here is something far beyond entropy; we're picking up these kinds of patterns. So why are most strings that we see in nature compressible, when in principle most strings are close to maximally complex? One answer may be that the patterns we're looking at are generated by what is effectively an algorithmic sampling process, and such processes are strongly biased towards what are effectively simpler outputs. So going back to my monkey analogy, which was fairly vague, monkeys typing on a typewriter or typing a C programme: what this whole language of algorithmic information theory tells you is that you don't have to worry about which particular programming language the monkey is typing in, because you can always write a compiler from one language to another, and I can formalise all of this in terms of the coding theorem. What we've done now is step back a bit from that great generality to something slightly less general, namely maps that are simple, and we show that we can nevertheless get a pretty good description of what happens for a wide range of different physical systems. We lose the lower bound.
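As an aside, the entropy-versus-complexity distinction from a moment ago is easy to demonstrate numerically (a toy illustration; zlib-compressed size is only a crude stand-in for Kolmogorov complexity, and the string lengths are my choice):

```python
import math
import random
import zlib

def entropy(s):
    """Shannon entropy in bits per character, from 0/1 frequencies only."""
    p = s.count("1") / len(s)
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def k_approx(s):
    """Crude complexity proxy: zlib-compressed size in bytes."""
    return len(zlib.compress(s.encode()))

random.seed(1)
ordered = "01" * 1500                                     # maximal entropy, low complexity
rand = "".join(random.choice("01") for _ in range(3000))  # maximal entropy, high complexity

print(entropy(ordered), entropy(rand))    # both close to 1 bit per character
print(k_approx(ordered), k_approx(rand))  # but wildly different compressed sizes
```

The periodic string and the random string have the same symbol frequencies, hence the same entropy, yet the compressor shrinks one and not the other: that gap is what the complexity axis measures and the entropy axis misses.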
OK, but we have a really good upper bound, and the upper bound seems to work; so far we've not really found any system for which it doesn't work in one way or another. There are a lot of other stories behind it, but instead of telling them I'm going to give you two applications. One application is to evolution: evolution happens by randomly changing genotypes, which are translated by the laws of physics and biology into a phenotype, an organism of some kind. So I'm going to talk about that, and then, if I have time, I might look at machine learning with deep neural networks, which can also effectively be thought of as an input-output map, and hopefully explain why those two things work. So let's think first about self-assembly. This is a movie, which Julia showed, of the self-assembly of the bacterial flagellar motor. What's interesting about it is many things, including: how on earth does that thing self-assemble? But nature designed this particular system by randomly changing genotypes, the genes that made those particular proteins. And so the question I got interested in is: how does that work? Because this is a very specific system; it's really hard, actually, to emulate. How on earth could you, by randomly changing genotypes, get something that forms this very exquisite three-dimensional shape? And so we came up with a model.
Which we call the polyomino model. So this is a typical physicist's model of proteins; proteins 238 00:22:53,470 --> 00:22:56,920 are the molecular building blocks of life, really. 239 00:22:56,920 --> 00:23:02,380 And they form dimers, trimers, tetramers, hexamers, all kinds of complicated structures, including that bacterial flagellar motor. 240 00:23:02,380 --> 00:23:07,360 So we're going to replace all that with a simple model of little squares on a lattice. 241 00:23:07,360 --> 00:23:12,760 And these squares interact with each other through particular interactions, and then hopefully they will self-assemble into well-defined shapes. 242 00:23:12,760 --> 00:23:21,430 So here's an example of a polyomino; they're called polyominoes because a polyomino generalises the domino from two squares to multiple ones. 243 00:23:21,430 --> 00:23:25,630 And so here I've got a set of particles with letters on their sides. 244 00:23:25,630 --> 00:23:31,510 I've got a set of rules that tell you who sticks to whom. And it turns out that if I have this set of rules for who sticks to whom, 245 00:23:31,510 --> 00:23:37,510 and I put this number-one particle down first and let these other ones just randomly fall in and out of my system, 246 00:23:37,510 --> 00:23:42,260 I'm always going to form this particular structure. And the reason is: number one 247 00:23:42,260 --> 00:23:46,640 has As on the outside, A sticks to C, and the only one with a C, in this case, is number three. 248 00:23:46,640 --> 00:23:50,780 So the threes will stick; nothing else will stick. The threes have, on their sides, 249 00:23:50,780 --> 00:23:55,160 a letter that sticks to B, and the only one with a B is number two. 250 00:23:55,160 --> 00:24:00,680 So the twos will stick to the outside, and so on. And the twos have 0s everywhere else, and 0s 251 00:24:00,680 --> 00:24:05,480 don't stick to anything. And so this will deterministically always self-assemble into this particular structure.
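The sticking-rules argument above can be sketched as a toy simulation. Everything here is made up for illustration: three hypothetical tiles, hypothetical edge labels and binding pairs, in the spirit of the polyomino model rather than the actual one, and tiles are not rotated.

```python
import random

# Edge labels per tile type, in (N, E, S, W) order; 0 means an inert edge.
TILES = {
    1: ("A", 0, 0, 0),    # seed tile: sticky only on top
    2: ("C", 0, "B", 0),  # sticks above the seed (B binds A), sticky on top
    3: (0, 0, "D", 0),    # caps the column (D binds C)
}
# Symmetric interaction rules: which edge labels bind to which.
STICKS = {frozenset(("A", "B")), frozenset(("C", "D"))}

# Neighbour offsets and, for each, which edge of the anchor faces which
# edge of the newcomer: up, right, down, left.
NEIGHBOURS = [((0, 1), 0, 2), ((1, 0), 1, 3), ((0, -1), 2, 0), ((-1, 0), 3, 1)]

def binds(a, b):
    return a != 0 and b != 0 and frozenset((a, b)) in STICKS

def assemble(seed=0):
    """Drop random tiles next to the growing cluster; attach on a sticky match."""
    rng = random.Random(seed)
    grid = {(0, 0): 1}  # seed tile at the origin
    for _ in range(500):  # plenty of random deposition attempts
        x, y = rng.choice(list(grid))
        (dx, dy), anchor_edge, new_edge = rng.choice(NEIGHBOURS)
        site = (x + dx, y + dy)
        tile = rng.choice(list(TILES))
        if site in grid:
            continue
        if binds(TILES[tile][new_edge], TILES[grid[(x, y)]][anchor_edge]):
            grid[site] = tile
    return grid
```

With these rules, random deposition order changes nothing: the rules admit only one growth sequence (tile 2 above the seed, then tile 3 above that), which is the determinism described in the talk.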
252 00:24:05,480 --> 00:24:09,020 So that's a very simple model of how self-assembly works in something like the 253 00:24:09,020 --> 00:24:13,130 flagellar motor: these things will always form this particular structure. 254 00:24:13,130 --> 00:24:20,000 Now I can ask the inverse question, which is: if I have a shape, can I find a set of rules that will make that particular shape? 255 00:24:20,000 --> 00:24:25,730 That's a kind of fun little game, and you can use lots of different methods to do it, including rather theoretical techniques, 256 00:24:25,730 --> 00:24:31,670 but a very clever DPhil student, Iain Johnston, 257 00:24:31,670 --> 00:24:36,860 decided to try this, and he said, well, let me try something different. 258 00:24:36,860 --> 00:24:41,630 I just said: make a 16-mer. I didn't say which 16-mer, just make any 16-mer. 259 00:24:41,630 --> 00:24:44,330 And the point is, from recreational mathematics, 260 00:24:44,330 --> 00:24:48,860 it turns out there are 13,079,255 different 16-mers. 261 00:24:48,860 --> 00:24:49,640 So I initially thought, well, 262 00:24:49,640 --> 00:24:56,240 if you just run an evolutionary algorithm that randomly picks different tile sets and tile interactions, 263 00:24:56,240 --> 00:25:00,560 you should get any particular shape with a probability of one in 13 million. 264 00:25:00,560 --> 00:25:05,570 But no. What we found is that half the time we got only these structures, these first twenty-one: 265 00:25:05,570 --> 00:25:12,830 fifty per cent of the time, by randomly either doing a Darwinian kind of search or just changing things completely randomly, 266 00:25:12,830 --> 00:25:17,000 just seeing what they make, we got these twenty-one out of those 13 million. 267 00:25:17,000 --> 00:25:22,880 And that's actually how I got interested in this coding theorem, because of the problem I gave to Kamaludin Dingle and to Chico Camargo.
268 00:25:22,880 --> 00:25:26,570 So: why is this happening? Why do we get this very small subset? 269 00:25:26,570 --> 00:25:32,960 Because what we realised is that that subset is a special subset: you get these first twenty-one, and they're highly symmetric. 270 00:25:32,960 --> 00:25:38,550 So in the set of 13 million, there are only five which have D4 symmetry. 271 00:25:38,550 --> 00:25:42,950 That's the symmetry of the square; it means you can rotate and flip them and they map onto themselves. 272 00:25:42,950 --> 00:25:46,850 And all five of those are found in this first set of twenty-one. 273 00:25:46,850 --> 00:25:50,630 So by randomly looking in the space of algorithms, the space that's 274 00:25:50,630 --> 00:25:57,830 making these things, I'm finding all these highly symmetric shapes, and that gave us the clue to start looking into this. And it then took two DPhil students 275 00:25:57,830 --> 00:26:04,700 to turn it from a very abstract theory of Kolmogorov complexity to something that worked here. 276 00:26:04,700 --> 00:26:06,860 And if you actually look, in this whole system, at the probability 277 00:26:06,860 --> 00:26:12,110 that you get a particular output x, as opposed to the complexity of the structure, complexity being the number of bits 278 00:26:12,110 --> 00:26:16,190 I need to encode the algorithm that makes that particular structure, 279 00:26:16,190 --> 00:26:22,760 then the very high probability ones are the simple ones, and symmetric structures are simple. 280 00:26:22,760 --> 00:26:27,470 You don't need very much information to encode a simple structure: it's made of something that is repeated multiple times. 281 00:26:27,470 --> 00:26:31,370 So symmetry and low complexity are deeply linked to one another.
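The D4 symmetry count mentioned above can be checked mechanically. Here is a small illustrative helper, not from the talk, that counts how many of the eight symmetries of the square map a lattice shape onto itself.

```python
def normalise(cells):
    """Translate a set of (x, y) lattice cells so its minimal corner is (0, 0)."""
    cells = list(cells)
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return frozenset((x - min(xs), y - min(ys)) for x, y in cells)

# The eight elements of D4: four rotations and four reflections.
D4 = [
    lambda x, y: (x, y),   lambda x, y: (-y, x),
    lambda x, y: (-x, -y), lambda x, y: (y, -x),
    lambda x, y: (-x, y),  lambda x, y: (x, -y),
    lambda x, y: (y, x),   lambda x, y: (-y, -x),
]

def symmetry_order(cells):
    """How many square symmetries leave the shape unchanged (up to translation)."""
    shape = normalise(cells)
    return sum(normalise(t(x, y) for x, y in shape) == shape for t in D4)
```

A shape scoring 8 has full D4 symmetry, like the five 16-mers mentioned above; a generic shape scores 1, and the score is one way of seeing the symmetry-equals-few-bits point just made.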
282 00:26:31,370 --> 00:26:37,250 And then we've worked with some collaborators in Cambridge, and we've discovered that you can look at protein clusters in 283 00:26:37,250 --> 00:26:40,190 the protein database, of which about thirty thousand are known. 284 00:26:40,190 --> 00:26:48,560 And they've looked at the complexity of those protein clusters, mapping them onto the same complexity measure. 285 00:26:48,560 --> 00:26:54,110 And we get almost exactly the same behaviour as we get for our simple polyomino model, which makes us very happy: simple 286 00:26:54,110 --> 00:27:01,370 models give the same results as nature. And this line here is the bound that we've predicted, so we're able to predict this upper bound. 287 00:27:01,370 --> 00:27:07,070 And this tells us that not only is this bias towards highly symmetric, simple structures there in the polyominoes, 288 00:27:07,070 --> 00:27:10,880 but it seems to be there in nature, in these protein clusters. 289 00:27:10,880 --> 00:27:15,800 And you can break this down further, so you can look at, say, hexamers, of which there are many in the database. 290 00:27:15,800 --> 00:27:19,010 And this is the complexity down here. 291 00:27:19,010 --> 00:27:27,740 And what you see is that simple hexamers are very frequent in nature and complex hexamers are rare, at more or less the frequencies that we predict. 292 00:27:27,740 --> 00:27:31,590 And so we've done a whole series of evolutionary models. I'll just show you one, because I just saw that 293 00:27:31,590 --> 00:27:35,540 she is in the audience: Nora Martin was an undergraduate with me. 294 00:27:35,540 --> 00:27:37,220 She worked on Richard Dawkins's biomorphs. 295 00:27:37,220 --> 00:27:43,820 Biomorphs are a really beautiful little model of shapes that you can make, which Dawkins uses to show the great power of evolution. 296 00:27:43,820 --> 00:27:47,840 And she showed that the probability and complexity of these shapes,
297 00:27:47,840 --> 00:27:55,100 if you search this space of algorithms, show the same behaviour that we predict for all the other systems. 298 00:27:55,100 --> 00:27:58,970 And Nora was an undergraduate here. She went to the other place, where she's continuing this work. 299 00:27:58,970 --> 00:28:07,470 So you win some, you lose some. But we have really brilliant students, as I think many of you were as well. 300 00:28:07,470 --> 00:28:10,320 This was actually her undergraduate work, and what's interesting about it is that, 301 00:28:10,320 --> 00:28:16,080 if you look at these rare structures down here, which are low complexity and low probability, 302 00:28:16,080 --> 00:28:19,560 it turns out that the strings that make those structures, 303 00:28:19,560 --> 00:28:27,210 the codes that make the structures, are themselves simple, so they have particularly simple codes that make them. 304 00:28:27,210 --> 00:28:30,530 And that's interesting. All right. 305 00:28:30,530 --> 00:28:36,830 And then, in the last 10 minutes or so, I want to switch gears to something entirely different, which is machine learning. 306 00:28:36,830 --> 00:28:38,900 So you've probably seen a lot of machine learning; 307 00:28:38,900 --> 00:28:46,190 it's been in the news enormous amounts, and the news has largely been dominated by one particular method, which is neural networks. 308 00:28:46,190 --> 00:28:52,610 Neural networks are very loosely modelled on the brain. The way they work is that there's an input layer that gives you some kind of inputs, 309 00:28:52,610 --> 00:28:56,660 and then you've got a series of little interactions with other nodes. 310 00:28:56,660 --> 00:29:01,660 These are weights, and these nodes, say x one, are either ones or zeros. 311 00:29:01,660 --> 00:29:08,960 And if this one is one, then the weights multiply all the inputs that are one and pass a weighted sum to the next layer.
312 00:29:08,960 --> 00:29:12,370 And this is passed on to the next layer, and you can have several of these. 313 00:29:12,370 --> 00:29:16,160 They're called deep neural networks if they have several layers in a row, and there's an output that says, 314 00:29:16,160 --> 00:29:20,390 for example, yes or no, this is a cat or this is not a cat. 315 00:29:20,390 --> 00:29:25,500 And what's really powerful about them is that they're incredibly good at things like pattern recognition. 316 00:29:25,500 --> 00:29:36,740 So in 2012, very famously, a group from the University of Toronto, Geoffrey Hinton's group, completely obliterated the competition in this contest 317 00:29:36,740 --> 00:29:40,400 where you're given a bunch of images and you're supposed to recognise them. 318 00:29:40,400 --> 00:29:46,100 And so you train on these images, which means you get a bunch of images that say cats and dogs and horses and cows. 319 00:29:46,100 --> 00:29:53,720 You're given a whole bunch of them, and you train your computer to do correctly on the set that you've been given. 320 00:29:53,720 --> 00:29:59,420 That means you adjust those little weights so that when you give it an input of pixels that is a cat, 321 00:29:59,420 --> 00:30:06,500 the output of this thing is cat, and when it's a dog, it's dog. That's how you train it, and then you're given a test 322 00:30:06,500 --> 00:30:14,240 set that it hasn't seen before, of new pictures. And then you run those through your computer and you see how well you predict. 323 00:30:14,240 --> 00:30:20,450 And these machines are extremely good at this: they predict with much higher accuracy than older methods 324 00:30:20,450 --> 00:30:25,310 on data beyond what you put in initially. That's called 325 00:30:25,310 --> 00:30:29,450 generalisation, because you're showing that you're able to generalise on new, 326 00:30:29,450 --> 00:30:33,470 unseen data.
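The layers-of-weights description above amounts to very little code. Here is a minimal illustrative sketch with made-up weights and a simple step activation, far simpler than the real networks discussed:

```python
import random

def layer(inputs, weights, biases):
    """One fully connected layer with a step activation: each output node
    fires (1) if its weighted input sum exceeds its bias, else stays 0."""
    return [
        1 if sum(w * x for w, x in zip(ws, inputs)) > b else 0
        for ws, b in zip(weights, biases)
    ]

def network(bits, params):
    """Push a binary input vector through successive layers."""
    for weights, biases in params:
        bits = layer(bits, weights, biases)
    return bits

# A random two-layer network over 7 binary inputs producing one yes/no output.
rng = random.Random(1)
params = [
    ([[rng.gauss(0, 1) for _ in range(7)] for _ in range(4)], [0.0] * 4),
    ([[rng.gauss(0, 1) for _ in range(4)]], [0.0]),
]
verdict = network([1, 0, 1, 1, 0, 0, 1], params)  # e.g. [1] = "cat", [0] = "not cat"
```

Training, as described above, would mean adjusting those weight numbers until the output matches the labels; real networks use smooth activations and gradient descent rather than this step rule.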
And this has dramatically changed all kinds of things. 327 00:30:33,470 --> 00:30:40,490 So Google Translate, for instance, now basically uses these kinds of pattern-recognition methods, with large 328 00:30:40,490 --> 00:30:45,530 amounts of translated text put through it, which it effectively memorises. 329 00:30:45,530 --> 00:30:52,070 And you don't even need to know anything about language at all to work on these translation techniques, because these machines have become so good. 330 00:30:52,070 --> 00:30:56,360 Now, what's very confusing and surprising about them is that they're highly over-parameterised, right? 331 00:30:56,360 --> 00:31:02,870 So a typical neural network has millions and millions of parameters, but you're only feeding it a few thousand data points, typically. 332 00:31:02,870 --> 00:31:04,430 And we know from experience that 333 00:31:04,430 --> 00:31:11,240 if I take a small number of data points and I have more parameters than I have data points, I start getting nonsense. 334 00:31:11,240 --> 00:31:16,970 I mean, we teach this to our undergraduates all the time: don't use more parameters than you have data. There's a very famous quote from von Neumann: 335 00:31:16,970 --> 00:31:20,810 give me four parameters and I can fit an elephant; 336 00:31:20,810 --> 00:31:25,040 give me five and I can make it wiggle its trunk. 337 00:31:25,040 --> 00:31:32,330 And very recently, in a paper in Science last year, a bunch of AI researchers were saying that this is alchemy, right? 338 00:31:32,330 --> 00:31:36,140 We don't understand why this works; there's no theory for it. It clearly works, 339 00:31:36,140 --> 00:31:45,740 but we don't understand it, and all the kind of classical machine-learning theoretical ideas just break down for neural networks. 340 00:31:45,740 --> 00:31:47,750 So I have a very bright DPhil student.
341 00:31:47,750 --> 00:31:53,720 This is a pattern that hopefully you've noticed, lots of bright DPhil students, who came to me and said: well, let's think about this. 342 00:31:53,720 --> 00:32:03,800 Can we apply these ideas about simplicity bias to deep neural networks? So we set up a little model problem: I have a system 343 00:32:03,800 --> 00:32:10,130 of seven bits, so there are two to the seven possible different input strings, and I look at all Boolean 344 00:32:10,130 --> 00:32:13,910 functions of those. A Boolean function basically says: if this is on and that's off, 345 00:32:13,910 --> 00:32:18,410 then yes; if this is on and that's on, then no. And there are a very large number of them: 346 00:32:18,410 --> 00:32:23,030 there are two to the 128 different possible Boolean functions; 347 00:32:23,030 --> 00:32:29,300 that's about 10 to the 38. So you might think that if I randomly pick parameters in my neural network, in other words, 348 00:32:29,300 --> 00:32:34,670 if I randomly choose these weights, and then I ask myself what function I produce, 349 00:32:34,670 --> 00:32:38,330 I'd get each Boolean function with a probability of about one in 10 to the 38. 350 00:32:38,330 --> 00:32:43,580 OK, but what we see here is a rank plot of the probability of getting a function against its rank. 351 00:32:43,580 --> 00:32:48,830 Here are the most frequently produced functions, and you see these probabilities are much, much, much larger than 10 to the minus 38. 352 00:32:48,830 --> 00:32:54,140 So certain functions are appearing quite easily when you randomly pick parameters. 353 00:32:54,140 --> 00:32:59,260 And in fact, if you look at the complexity of those functions, these are the low-complexity functions. 354 00:32:59,260 --> 00:33:04,000 So we were very excited about this. Now, this is a toy model for neural networks.
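A scaled-down version of that experiment fits in a few lines. This sketch is illustrative only: it uses 3 input bits instead of 7 (so all 2^8 = 256 Boolean functions can be tabulated) and a tiny threshold network with made-up dimensions, not the actual networks from the talk.

```python
import random
from collections import Counter
from itertools import product

INPUTS = list(product([0, 1], repeat=3))  # all 2^3 = 8 input strings

def sample_function(rng):
    """Draw random weights and return the Boolean function they compute,
    encoded as the tuple of outputs over all 8 inputs."""
    w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    w2 = [rng.gauss(0, 1) for _ in range(4)]
    outs = []
    for x in INPUTS:
        hidden = [1 if sum(w * b for w, b in zip(ws, x)) > 0 else 0 for ws in w1]
        outs.append(1 if sum(w * h for w, h in zip(w2, hidden)) > 0 else 0)
    return tuple(outs)

rng = random.Random(0)
counts = Counter(sample_function(rng) for _ in range(5000))
top, top_count = counts.most_common(1)[0]
# Under the naive uniform picture, each of the 256 functions would appear
# about 5000 / 256, roughly 20 times; in fact a handful of very simple
# functions (the constant function above all) soak up most of the draws.
```

The qualitative outcome mirrors the rank plot described above: random parameters do not sample functions uniformly, they land on simple ones far more often.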
355 00:33:04,000 --> 00:33:09,530 Just last night, actually, or this morning at 2:00 a.m., my student sent me a picture of a much bigger network. 356 00:33:09,530 --> 00:33:16,990 This is an actual, proper real network, a four-layer deep CNN, if that interests you, 357 00:33:16,990 --> 00:33:21,610 that we used on CIFAR data, and we get exactly the same line. 358 00:33:21,610 --> 00:33:27,910 So we're very pleased with that, because he's done some very, very clever tricks to be able to work out this probability. 359 00:33:27,910 --> 00:33:34,480 So this is exactly what the neural networks do: these neural networks are highly biased towards simple outputs, 360 00:33:34,480 --> 00:33:39,800 and that may be the reason why, when you give them a bunch of data, they don't overfit. 361 00:33:39,800 --> 00:33:44,260 So even though the number of functions that this thing can generate is extremely large, 362 00:33:44,260 --> 00:33:47,800 for this Boolean system it's 10 to the 38, 363 00:33:47,800 --> 00:33:53,200 it gets pushed by the simplicity bias towards this very small fraction of simple ones. 364 00:33:53,200 --> 00:33:58,570 And we tested this. For example, here is the generalisation error on typical supervised learning. 365 00:33:58,570 --> 00:34:04,750 So I've got 64 strings that I train on, and then I give it the other 64, which it hasn't yet seen. 366 00:34:04,750 --> 00:34:13,090 That is, I have a secret Boolean function; I take 64 strings and see what the function says, 367 00:34:13,090 --> 00:34:19,630 I train my network on those, and then I give it the other sixty-four strings that it hasn't seen yet and ask: does it find the right Boolean function? 368 00:34:19,630 --> 00:34:23,620 And this error is the fraction of the time it gets those things wrong.
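That train-on-64, test-on-64 protocol can be written out explicitly. This is an illustrative sketch with a hypothetical simple target function and a naive memorising learner standing in for the trained network:

```python
import random
from itertools import product

ALL_INPUTS = list(product([0, 1], repeat=7))  # the 128 possible 7-bit strings

def target(x):
    """The 'secret' Boolean function; a deliberately simple made-up one."""
    return x[0]  # output = first bit

rng = random.Random(0)
shuffled = ALL_INPUTS[:]
rng.shuffle(shuffled)
train, test = shuffled[:64], shuffled[64:]

# A learner that memorises the training set and guesses 0 elsewhere;
# any learner, including a trained network, slots into this protocol.
table = {x: target(x) for x in train}
def learner(x):
    return table.get(x, 0)

train_error = sum(learner(x) != target(x) for x in train) / len(train)
test_error = sum(learner(x) != target(x) for x in test) / len(test)
# train_error is 0 by construction; test_error measures generalisation.
```

The point made in the talk is visible in miniature: zero training error says nothing by itself, because a memoriser achieves it while still failing on the held-out half; it is the networks' bias towards simple functions that keeps their test error low when the target is simple.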
369 00:34:23,620 --> 00:34:34,060 And what you see is that if we train the network, then we get errors that are quite small, 370 00:34:34,060 --> 00:34:38,710 of the order of a few per cent. Whereas if we randomly pick functions, for example, 371 00:34:38,710 --> 00:34:43,630 if we just draw random functions and keep the ones which exactly 372 00:34:43,630 --> 00:34:46,960 fit the training data, then we can find lots of functions that fit the training data but will never fit the test data, 373 00:34:46,960 --> 00:34:53,740 and when they're then rated on the test data, they give almost no signal: they give an error of almost 100 per cent. 374 00:34:53,740 --> 00:34:59,050 So these networks are hugely biased towards this very small fraction of simple functions. 375 00:34:59,050 --> 00:35:02,920 And what's also interesting is that, as you make the target function more complicated, 376 00:35:02,920 --> 00:35:07,060 as you generate your data with a more complicated target function, these networks do less and less well. 377 00:35:07,060 --> 00:35:13,000 So this is the error that they make when they're trained, plotted against the complexity of the target they're trying to learn. 378 00:35:13,000 --> 00:35:16,180 And it turns out that, even though they're extremely good, 379 00:35:16,180 --> 00:35:20,950 if you give them complicated functions, and we can measure this complexity very accurately, 380 00:35:20,950 --> 00:35:23,800 then they actually no longer generalise very well. 381 00:35:23,800 --> 00:35:33,220 So the fact that they generalise well is because they're looking at simple functions. And what we're particularly excited about is that we can apply these ideas further: 382 00:35:33,220 --> 00:35:37,540 we've now used these ideas in classical learning
theory. 383 00:35:37,540 --> 00:35:45,130 That's a very big field of computer science, with lots of textbooks and hundreds of thousands of papers trying to prove bounds on how well you will generalise. 384 00:35:45,130 --> 00:35:50,830 Because if you can prove a rigorous bound, then you know what your maximum error is going to be. 385 00:35:50,830 --> 00:35:57,610 And so we've been able to adapt these ideas about bias to a particular technique called PAC-Bayes. 386 00:35:57,610 --> 00:36:06,970 Actually, it has not as much to do with Bayes as you might think. And this is the result that we get for the error bounds. 387 00:36:06,970 --> 00:36:17,170 And what you see here, for these curves on the bottom: this is MNIST, which is a database of handwritten digits 388 00:36:17,170 --> 00:36:23,800 that you have to recognise, and this is CIFAR, which is images, and obviously on CIFAR you don't do as well; this is the error that you get. 389 00:36:23,800 --> 00:36:27,580 And what we do is we randomise labels: we corrupt the labels on the training set. 390 00:36:27,580 --> 00:36:30,700 So if I corrupt the labels, if I tell you that a one is actually a five, 391 00:36:30,700 --> 00:36:34,600 then I'm not going to generalise very well, because I've got some errors in my training data. 392 00:36:34,600 --> 00:36:39,880 And so as the corruption gets bigger, my generalisation is less good, because I'm not giving it very good inputs. 393 00:36:39,880 --> 00:36:44,050 And that's what you see: the error growing as a function of corruption for these 394 00:36:44,050 --> 00:36:48,610 two different classes of systems. 395 00:36:48,610 --> 00:36:53,410 And this is our rigorous bound; this is the upper bound on the generalisation error. 396 00:36:53,410 --> 00:36:56,560 And what's really important, I think, is that this means this is no longer alchemy.
397 00:36:56,560 --> 00:37:03,970 We can actually predict analytically, with no free parameters, what the maximum error is that you're going to get. 398 00:37:03,970 --> 00:37:09,760 And we're pretty close to what these things actually do, even though we're in this heavily over-parameterised regime, 399 00:37:09,760 --> 00:37:12,910 the regime in which traditional machine-learning ideas shouldn't work. 400 00:37:12,910 --> 00:37:17,980 And the reason this works, we're arguing, is that these 401 00:37:17,980 --> 00:37:22,990 neural networks work so well because they're intrinsically biased towards simple functions. 402 00:37:22,990 --> 00:37:28,060 And so when you give them a whole bunch of images, they're actually picking out the simple things that fit the data, 403 00:37:28,060 --> 00:37:30,040 rather than the complicated things that fit the data. 404 00:37:30,040 --> 00:37:35,920 Whereas if you just randomly pick, if you do some kind of standard regression, for example, with a very high-order polynomial, 405 00:37:35,920 --> 00:37:41,250 then you're much more likely to pick a complicated function than a simple one, unless you've got a bias in your system. 406 00:37:41,250 --> 00:37:43,090 These things are intrinsically biased, 407 00:37:43,090 --> 00:37:49,510 and the reason why they work well must be that the patterns they're studying are also simple, 408 00:37:49,510 --> 00:37:53,260 because if they try to look at complicated patterns, they don't work that well, 409 00:37:53,260 --> 00:37:58,630 OK, as I showed you as well. So this tells us that the patterns they're studying are, in some way or other, simple. 410 00:37:58,630 --> 00:38:02,530 So I gave you this very grand claim at the beginning, 411 00:38:02,530 --> 00:38:06,940 why the world is simple, and I don't think I've completely explained it, but I've given you a hint.
412 00:38:06,940 --> 00:38:13,120 Some things are simpler because you can think about them as searches in the space of algorithms. 413 00:38:13,120 --> 00:38:19,030 So if I think about a possibility space, the space of all possible stories, for example, right? 414 00:38:19,030 --> 00:38:24,670 Then if I randomly look in the books of Borges's library, I'm going to find simple stories much more frequently 415 00:38:24,670 --> 00:38:27,010 than I'm going to find complex stories. 416 00:38:27,010 --> 00:38:35,260 And so even though the probability of any particular sequence of characters in a book is equal, every book is equally likely, 417 00:38:35,260 --> 00:38:41,050 the probability of getting simple stories is much higher than the probability of getting complex stories. 418 00:38:41,050 --> 00:38:47,829 And thank you very much.