1 00:00:00,720 --> 00:00:09,150 This is Professor Jeffrey Aronson. We know Jeff as Jeff from down the corridor, who comes to our stats meetings and livens them up every month. 2 00:00:10,290 --> 00:00:15,240 But the world of pharmacology knows Jeff as a leading clinical pharmacologist 3 00:00:15,660 --> 00:00:21,390 and editor of [INAUDIBLE], which is, I think the name of the book says, a definitive, 4 00:00:21,990 --> 00:00:26,380 constantly updated index of drug reactions and interactions. 5 00:00:26,400 --> 00:00:29,270 Is that a fair description? 6 00:00:30,360 --> 00:00:37,350 We've really enjoyed collaborating with Jeff on many of the projects we do in the primary care department, and we enjoy his talks. 7 00:00:37,380 --> 00:00:40,440 Let's hope you do, too. Of course, very much. Thank you. 8 00:00:41,880 --> 00:00:47,840 Well, I hope the title you've given me is not too much of a come-on. 9 00:00:47,850 --> 00:00:51,090 I hope it works. Yes, there is a little-known law of numbers. 10 00:00:51,810 --> 00:00:56,010 My experience in talking to people about this has been that it is little known. 11 00:00:56,670 --> 00:01:01,800 When I gave this talk to the Stats Coven, as it's called, in the Centre for Evidence-Based Medicine, 12 00:01:02,160 --> 00:01:05,790 I think three people out of about 20 said they'd heard of it, 13 00:01:06,480 --> 00:01:11,130 but maybe some of you have heard of it and are going to be disappointed that it's not little known. 14 00:01:11,520 --> 00:01:16,260 So I apologise in advance if that's the case. Any mathematicians here? 15 00:01:17,790 --> 00:01:20,830 Just one or two. Right. 16 00:01:20,850 --> 00:01:24,149 Well, you'll be able to explain things that I can't explain. That's good. 17 00:01:24,150 --> 00:01:28,110 Any statisticians? Someone? 18 00:01:28,620 --> 00:01:36,350 Nobody else admitting to being a statistician.
Well, you'll be able to say what the probability is that any explanations given are true. 19 00:01:37,020 --> 00:01:45,960 So that will be helpful as well. Now, I usually start with some conflicts of interest. 20 00:01:46,770 --> 00:01:52,870 Let's move this over here. There's a slight change of [INAUDIBLE]. 21 00:01:52,950 --> 00:01:56,880 The laser pointer's still there. Good. Okay, let's try that. 22 00:01:57,300 --> 00:02:03,480 Okay. So, apart from being a physician and a clinical pharmacologist, and not a mathematician, 23 00:02:03,480 --> 00:02:14,070 not a statistician (those are my disavowals), I'm a member of the Centre for Evidence-Based Medicine in Oxford. 24 00:02:14,520 --> 00:02:18,240 I do have other involvements that you ought to know about. 25 00:02:19,470 --> 00:02:25,830 I'm connected with the British Pharmacological Society, the British National Formulary, 26 00:02:27,600 --> 00:02:33,600 the British Pharmacopoeia Commission and the West Midlands Centre for Adverse Drug Reactions. 27 00:02:33,600 --> 00:02:39,110 That's in Birmingham. And I am a colleague, as Richard said, of Richard Stephens. 28 00:02:42,660 --> 00:02:45,860 So now you know where I stand. Okay. 29 00:02:46,680 --> 00:02:49,980 So I'm going to ask you a question. 30 00:02:50,640 --> 00:03:03,330 Here are two sets of data, supposedly the area in square kilometres of each of these countries. 31 00:03:04,500 --> 00:03:09,690 And the list goes on: 196 of them, I think, in this example. 32 00:03:10,230 --> 00:03:13,740 So you've got all the areas of all the countries. 33 00:03:14,100 --> 00:03:17,430 The question is, are these real data or fake data? 34 00:03:19,020 --> 00:03:25,709 All right. So have a look at the data and decide whether you can answer that question. 35 00:03:25,710 --> 00:03:32,670 To start with, don't say anything.
Oh, and by the way, if any of you does know this little-known law, don't say, "Oh yes, I know this." 36 00:03:33,030 --> 00:03:36,410 Please keep quiet for the moment. I'll ask you all later. 37 00:03:36,420 --> 00:03:41,040 Okay. So you've had a chance to look at these data and decide whether they're real or fake. 38 00:03:42,810 --> 00:03:46,740 Good. Now, I've selected, not at random 39 00:03:47,460 --> 00:03:56,040 (this is not a random selection, as you will see), nine countries from that list, and there are their populations. 40 00:04:00,980 --> 00:04:07,610 And if I truncated them, dropping the last seven digits, I'd have the leading digit 41 00:04:07,610 --> 00:04:12,320 in each case: 1, 1, 1, 2, 1, 3, 1, 4 and so on. 42 00:04:13,070 --> 00:04:17,360 So all I'm interested in is the leading digit. Okay. 43 00:04:19,370 --> 00:04:28,760 So the question is, if I take all 196 or however many countries, what is the distribution of the leading digits? 44 00:04:30,260 --> 00:04:39,560 So that's the question I want to ask you. What is the distribution of, let's say, 100 leading digits? 45 00:04:41,000 --> 00:04:44,450 Okay. So here we are. We have the leading digits. 46 00:04:44,450 --> 00:04:52,310 And I'm going to suggest some possible distributions. And here they are: four possible distributions. 47 00:04:54,580 --> 00:04:58,090 The first distribution is completely equal: 48 00:04:58,090 --> 00:05:04,480 a flat distribution, the same number of ones, the same number of twos, the same number of threes, the same number of fours. 49 00:05:05,140 --> 00:05:13,150 Okay. The second distribution is a lot of ones, fewer twos and so on, down to very few nines. 50 00:05:14,020 --> 00:05:19,120 And the third distribution is that one upside down: lots of nines, very few ones. 51 00:05:20,380 --> 00:05:25,660 And the fourth distribution is a normal distribution, a bell-shaped curve. 52 00:05:26,920 --> 00:05:31,450 Okay.
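The question turns on extracting the leading digit of each number. A minimal Python sketch of that step, offered as an illustration rather than the speaker's own method: integers are read straight from their decimal string, and floats go through `repr()` so that a value typed as 0.3 counts as a leading 3. The function names and the example population figure are hypothetical.

```python
import math
from collections import Counter


def leading_digit(x) -> int:
    """Return the first significant (leading) digit of a nonzero number."""
    if isinstance(x, int):
        if x == 0:
            raise ValueError("zero has no leading digit")
        # Exact for integers of any size: read the first character.
        return int(str(abs(x))[0])
    if x == 0 or not math.isfinite(x):
        raise ValueError("need a finite, nonzero number")
    # repr() gives the shortest decimal string that round-trips to the same
    # float ('0.3', '1e-05', '2500000.0'); strip leading zeros and the point.
    return int(repr(abs(x)).lstrip("0.")[0])


def digit_counts(values) -> list:
    """Count how many values start with each digit 1..9."""
    tally = Counter(leading_digit(v) for v in values)
    return [tally[d] for d in range(1, 10)]


# Dropping the last seven digits of a population in the tens of millions
# leaves its leading digit, as described in the talk (figure made up).
population = 34_500_000
print(population // 10**7, leading_digit(population))  # 3 3
print(digit_counts([1, 15, 2, 300, 0.07]))  # [2, 1, 1, 0, 0, 0, 1, 0, 0]
```

Integer division by 10**7 only works for eight-digit figures; `leading_digit` handles any magnitude, which matters once the data span several orders of magnitude.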
So I'll give you a minute, just a second or two, to think about it. 53 00:05:32,500 --> 00:05:36,970 Then I'm going to ask you to vote. This is not Brexit. 54 00:05:37,180 --> 00:05:40,180 You won't be held to it. 55 00:05:41,980 --> 00:05:48,010 Okay, everybody ready? So who votes for this distribution here? 56 00:05:48,040 --> 00:05:52,930 All equal across the board. Nobody's interested in that one. 57 00:05:54,600 --> 00:05:57,750 Well, good. Okay. How about the second distribution? 58 00:05:57,760 --> 00:06:01,870 Lots of ones, very few nines. Mm hmm. 59 00:06:01,950 --> 00:06:07,810 Got three for that one. Good. Okay. How about lots of nines and very few ones? 60 00:06:10,190 --> 00:06:13,300 Nobody willing to [INAUDIBLE]. 61 00:06:14,830 --> 00:06:19,750 You know, we're not going to cut your heads off if you get it wrong or whatever. 62 00:06:19,760 --> 00:06:23,200 It doesn't matter. Have a go. If you don't know, have a guess. 63 00:06:24,320 --> 00:06:27,760 And what about the normal distribution? Any offers for that? 64 00:06:28,420 --> 00:06:33,880 One, two. You're not a very good audience. You haven't participated. 65 00:06:34,930 --> 00:06:39,400 Let's try it again. I want everybody to guess if you don't know. 66 00:06:39,430 --> 00:06:45,900 All right, or we'll shoot you. The even distribution? 67 00:06:46,380 --> 00:06:49,380 What about a guess? Everyone ready to guess? A guess is all right. 68 00:06:49,710 --> 00:06:55,620 I'm watching to see if you put your hands up again and [INAUDIBLE]. 69 00:06:57,010 --> 00:07:00,980 Okay. Second distribution, lots of ones. Okay. 70 00:07:00,990 --> 00:07:05,310 One, two, three, four, five, six. Very good. This one: 71 00:07:05,460 --> 00:07:10,320 lots of nines. One, two, three. And the normal distribution? 72 00:07:11,610 --> 00:07:17,700 Yeah. Half a dozen or so. So I guess that's no better than chance in a small group.
73 00:07:17,840 --> 00:07:22,080 I don't need to do a significance test, but actually, that's the answer: lots of ones. 74 00:07:23,630 --> 00:07:26,900 Right. Does anybody know what this distribution is called? 75 00:07:27,920 --> 00:07:35,720 Anybody come across it? No. Well, there you are: it is little known. 76 00:07:36,860 --> 00:07:41,060 It's called Benford's law, or Benford's distribution. 77 00:07:43,390 --> 00:07:49,060 And it is, frankly, surprising. When I was first asked this, I thought, this is going to be even, isn't it? 78 00:07:49,630 --> 00:07:54,970 If I hadn't called it a little-known law of numbers, which made you think there was something funny going on here, 79 00:07:55,270 --> 00:08:00,850 you might all have voted for the even one, I guess. I don't know. That's what I thought when I first heard the problem. 80 00:08:01,180 --> 00:08:05,740 I thought, surely they're all going to be the same. 81 00:08:06,130 --> 00:08:09,400 Equal numbers of each. Turns out not. 82 00:08:11,350 --> 00:08:22,150 And that's the distribution. The probability that a given digit d occurs as the leading digit is the logarithm of d plus one over d: log10((d + 1)/d). 83 00:08:23,260 --> 00:08:27,670 That's Benford's distribution, so called. And I'll show you why. 84 00:08:30,490 --> 00:08:34,899 Well, there it is. So that's where these numbers come from: 85 00:08:34,900 --> 00:08:41,350 log of two over one, log of three over two, log of four over three, and so on. And there's the distribution that you saw before, rounded off. 86 00:08:42,800 --> 00:08:49,900 Okay. That's where it comes from. All right. Has anybody heard of Stigler's law of eponymy? 87 00:08:50,680 --> 00:08:54,819 Stigler's law was propounded by Mr. Stigler, 88 00:08:54,820 --> 00:08:59,830 Dr. Stigler. And he says, I've chosen for the title of this paper
89 00:08:59,830 --> 00:09:05,860 "Stigler's Law of Eponymy", which may appear to be a flagrant violation of the institutional norm of humility. 90 00:09:07,600 --> 00:09:12,400 But actually, Stigler's law, 91 00:09:12,430 --> 00:09:15,160 as he calls it, obeys Stigler's law. 92 00:09:16,270 --> 00:09:24,610 It's very self-referential, because Stigler's law is that no scientific discovery is named after its original discoverer, 93 00:09:24,940 --> 00:09:31,960 and Stigler did not discover Stigler's law. So Stigler's law obeys Stigler's law. And Stigler, 94 00:09:31,970 --> 00:09:35,950 in this great paper, goes on to mention 95 00:09:35,970 --> 00:09:40,060 Merton, Robert Merton. Some of you may have heard of him: a great sociologist of science. 96 00:09:41,380 --> 00:09:52,210 He wrote some wonderful papers. He was the man who first pointed out the law of unexpected consequences, or unintended consequences. 97 00:09:52,690 --> 00:09:55,300 His paper on that is very much worth reading. 98 00:09:56,290 --> 00:10:04,510 And so Stigler quotes Merton's famous hypothesis that all scientific discoveries are in principle multiples. 99 00:10:05,680 --> 00:10:10,120 Actually, of course, it goes back beyond that, before Merton even. 100 00:10:10,330 --> 00:10:15,430 This is George Sarton, in a book written in 1936, The Study of the History of Science. 101 00:10:15,430 --> 00:10:21,040 And he says creations absolutely de novo are very rare, if they occur at all. 102 00:10:22,270 --> 00:10:25,989 And of course, we know that things are discovered simultaneously by different people: 103 00:10:25,990 --> 00:10:31,910 Darwin and Wallace discovering the idea of evolution simultaneously; 104 00:10:31,960 --> 00:10:37,840 Leibniz and Newton discovering the ideas of calculus simultaneously, and much disputed. 105 00:10:37,900 --> 00:10:45,490 And so I called this, independently, before I ever learnt about Stigler,
106 00:10:45,910 --> 00:10:50,170 I called it the law of NOMEN. Nomen is the Latin, of course, for a name. 107 00:10:50,590 --> 00:11:00,249 So it seemed to me that was a good name for a name, and it stands for Non-Original Mal-appropriate Eponymous Nomenclature. And it's Stigler's law: 108 00:11:00,250 --> 00:11:02,410 no entity is named after its discoverer. 109 00:11:04,300 --> 00:11:13,810 Benford's law actually comes from this man, Simon Newcomb, who showed it in 1881; he published this paper in the American Journal of Mathematics. 110 00:11:14,740 --> 00:11:18,670 And what he noticed, and this is well known by those who know it, 111 00:11:19,810 --> 00:11:27,040 is that if you had a book of logarithmic tables (and we couldn't discover this today, because we don't have books of logarithmic tables), 112 00:11:27,760 --> 00:11:33,040 the first pages were much more thumbed than the last pages. 113 00:11:33,970 --> 00:11:38,140 It's a fascinating observation. And he said, well, this is odd: 114 00:11:38,560 --> 00:11:41,320 people are looking at the ones more than the nines. 115 00:11:42,700 --> 00:11:48,849 And he looked into this distribution and he said the first significant figure is oftener one than any other digit, 116 00:11:48,850 --> 00:11:53,020 and the frequency diminishes up to nine. Beautiful demonstration. 117 00:11:53,800 --> 00:12:00,920 And then he goes on to ask other questions that I won't go into. And look, he's got it all there. 118 00:12:00,940 --> 00:12:06,670 The distribution in that column is exactly the distribution I showed you. 119 00:12:07,030 --> 00:12:12,370 Of course, it's bound to be exact, because he derived it. And as we have just seen, it's log of d plus one over d. 120 00:12:15,330 --> 00:12:22,740 And the law of probability of the occurrence of numbers, he says, is such that all mantissae of their logarithms are equally probable. 121 00:12:23,590 --> 00:12:29,820 So it's a logarithmic law.
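Newcomb's formula and his "equally probable mantissae" assumption can both be checked numerically. This Python sketch (the names are hypothetical, not from the talk) computes P(d) = log10((d + 1)/d) for d = 1 to 9, then draws numbers whose base-10 logarithms have uniformly distributed fractional parts and tallies their leading digits; under that assumption the observed frequencies should land close to the formula.

```python
import math
import random

# Newcomb/Benford first-digit law: P(d) = log10((d + 1) / d), d = 1..9.
BENFORD = {d: math.log10((d + 1) / d) for d in range(1, 10)}

for d, p in BENFORD.items():
    print(f"{d}: {p:.3f}")  # 1: 0.301, 2: 0.176, ..., 9: 0.046

# The terms telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10) = 1.
print(sum(BENFORD.values()))


def sample_uniform_mantissa(n: int, seed: int = 0) -> dict:
    """Draw x = 10**u with u uniform on [0, 1), so the mantissa of log10(x)
    is uniform (Newcomb's assumption), and return leading-digit frequencies."""
    rng = random.Random(seed)
    counts = dict.fromkeys(range(1, 10), 0)
    for _ in range(n):
        x = 10 ** rng.random()        # x lies in [1, 10)
        counts[min(int(x), 9)] += 1   # guard against rounding up to 10.0
    return {d: c / n for d, c in counts.items()}


freqs = sample_uniform_mantissa(200_000)
for d in range(1, 10):
    print(d, round(freqs[d], 3), round(BENFORD[d], 3))
```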
And a lot of things in science and in biology follow logarithmic distributions, of course. 122 00:12:34,870 --> 00:12:40,640 And actually, in passing, one can note that it's not just the first digit. 123 00:12:40,870 --> 00:12:44,560 It can be the first pair of digits: 10, 11, 12, 13. 124 00:12:44,800 --> 00:12:48,670 The same law applies: it's log of D plus one over D. 125 00:12:49,420 --> 00:12:53,890 Then you can go on calculating the probabilities, and it all fits, as I'll show you later. 126 00:12:56,260 --> 00:12:59,980 Then there's Benford, in the 1930s, 1938. 127 00:13:00,940 --> 00:13:05,830 And he repeats this observation and derives the same law. 128 00:13:06,040 --> 00:13:10,329 Whether he knew about the previous derivation, I don't know. 129 00:13:10,330 --> 00:13:20,050 But he certainly quotes the grubby-pages hypothesis that is well known from books of logarithms. 130 00:13:22,050 --> 00:13:25,330 And this time what he does is actually test the law. 131 00:13:25,630 --> 00:13:30,810 He takes a whole load of data, tests the law and finds that it fits. 132 00:13:31,910 --> 00:13:38,890 And so he says the law of anomalous numbers, as he calls them, is thus a general probability law of widespread application. 133 00:13:39,550 --> 00:13:48,160 And here are some of his data: these are prime numbers, and he's plotted the distribution of first digits, 134 00:13:48,160 --> 00:13:53,260 the leading digits, of all the prime numbers. In the top left 135 00:13:53,260 --> 00:13:56,530 it's the first 5.7 million. 136 00:13:57,340 --> 00:14:05,560 The next one is the first 50 million, then 455 million and then 4 billion, 4 billion prime numbers. 137 00:14:06,730 --> 00:14:09,880 And the law holds throughout. 138 00:14:11,380 --> 00:14:18,100 In fact, as you can see, the fit improves very slightly the more you have to analyse. 139 00:14:21,070 --> 00:14:26,260 Now, I did an analysis of this sort on the country data that I showed you at the beginning.
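The same formula extended to the leading pair of digits can be sketched in a few lines of Python. Summing the ten pairs that start with a given digit should give back the first-digit probabilities, which is a quick consistency check on the law as stated in the talk; the names here are illustrative.

```python
import math

# Leading pair of digits D = 10..99: P(D) = log10((D + 1) / D).
FIRST_TWO = {D: math.log10((D + 1) / D) for D in range(10, 100)}

print(round(FIRST_TWO[10], 4), round(FIRST_TWO[99], 4))  # 0.0414 0.0044

# Adding up the ten pairs that start with d (d0, d1, ..., d9) telescopes to
# log10((10d + 10) / (10d)) = log10((d + 1) / d): the first-digit law again.
first_digit = {d: sum(FIRST_TWO[10 * d + k] for k in range(10)) for d in range(1, 10)}
for d, p in first_digit.items():
    print(d, round(p, 3))
```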
140 00:14:27,010 --> 00:14:34,900 And here they are. It's just 196 countries, but you can see the leading digits in the populations of the capitals of those countries 141 00:14:35,560 --> 00:14:44,560 follow Benford's law. So do the frequencies of the leading digits of the populations of the countries as a whole, 142 00:14:44,560 --> 00:14:47,500 not just the capitals, but the populations of the countries as a whole. 143 00:14:49,120 --> 00:14:57,910 So do the frequencies of the leading digits in the areas, in square miles, of those countries. 144 00:14:58,570 --> 00:15:05,020 And if you did it in square kilometres, it would be exactly the same, or in square Egyptian cubits. 145 00:15:06,590 --> 00:15:11,260 It just scales. 146 00:15:15,930 --> 00:15:28,630 Square kilometres. So now you are in a position to say whether these data are real or fake. 147 00:15:29,440 --> 00:15:46,540 I'll give you a minute or two to think about it. Okay. 148 00:15:47,650 --> 00:15:53,050 So I'm going to ask you if you think these data are fake. 149 00:15:53,980 --> 00:15:58,600 Would anybody think that the first column contains the fake data? 150 00:15:58,630 --> 00:16:01,630 I'm not saying it does; they may both be fake, or they may both be real. 151 00:16:02,680 --> 00:16:09,510 Okay. So does anybody think that the first column is fake data? 152 00:16:11,400 --> 00:16:15,330 Guess if you don't know. No penalty. No? Nobody for that. 153 00:16:15,780 --> 00:16:18,829 What about the second column? You think those might be fake? Yeah. 154 00:16:18,830 --> 00:16:21,830 Yeah. There are no ones. 155 00:16:21,840 --> 00:16:26,399 None of them begins with a one. But in the first column, there are some. 156 00:16:26,400 --> 00:16:28,830 So they might be fake; they might not be. 157 00:16:29,010 --> 00:16:38,010 It's a small sample, of course, and, as I said, I've analysed the whole data set, and here are the distributions. 158 00:16:38,010 --> 00:16:41,910 There's the distribution for the first one.
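The point that the result does not depend on the units (square miles, square kilometres or cubits) can be illustrated with simulated data. This Python sketch assumes hypothetical "areas" spread log-uniformly over six decades, an assumption chosen because such data follow Benford's law exactly; converting them from square kilometres to square miles should leave the leading-digit frequencies essentially unchanged, whereas data with flat leading digits do not stay flat under the same conversion. All names and data are illustrative.

```python
import math
import random

BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]
SQ_KM_PER_SQ_MILE = 1.609344 ** 2  # 1 international mile = 1.609344 km


def leading_digit(x: float) -> int:
    # Shortest round-trip decimal string, minus leading zeros and the point.
    return int(repr(abs(x)).lstrip("0.")[0])


def digit_freqs(values: list) -> list:
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    return [c / len(values) for c in counts]


def max_gap_from_benford(values: list) -> float:
    return max(abs(f - b) for f, b in zip(digit_freqs(values), BENFORD))


rng = random.Random(1)
# Hypothetical areas: log-uniform from 1 to 10**6 square kilometres.
areas_km2 = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
areas_mi2 = [a / SQ_KM_PER_SQ_MILE for a in areas_km2]
print(max_gap_from_benford(areas_km2), max_gap_from_benford(areas_mi2))

# Contrast: numbers with flat leading digits (uniform on [1, 10))
# do not keep a flat digit distribution when the unit changes.
flat = [rng.uniform(1, 10) for _ in range(100_000)]
flat_mi2 = [a / SQ_KM_PER_SQ_MILE for a in flat]
print([round(f, 3) for f in digit_freqs(flat_mi2)])
```

Changing units multiplies every value by a constant, which only shifts the logarithms; a distribution whose logarithms are spread evenly over whole decades is unaffected by that shift, which is why Benford's law is the scale-invariant case.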
So you can't really tell. 159 00:16:42,330 --> 00:16:47,250 You certainly couldn't say these are fake data. 160 00:16:47,550 --> 00:16:54,090 You can't, on the other hand, say these are real data. It looks as if they probably are, but it's not a big enough sample. 161 00:16:54,720 --> 00:17:01,290 But you'd be damn sure those were fake, or at least there's a pretty high probability that they are fake data. 162 00:17:03,960 --> 00:17:12,750 Okay. There are some statistics that match Benford's law, and you can read them for yourselves. 163 00:17:13,350 --> 00:17:17,300 Some interesting sets of data there. 164 00:17:17,310 --> 00:17:20,820 These all obey Benford's law. It's quite striking. 165 00:17:20,820 --> 00:17:24,470 It really is amazing. Here are some more. 166 00:17:24,480 --> 00:17:33,840 Top left is the distribution for 122,000 cities in the world, 167 00:17:34,530 --> 00:17:44,100 according to their heights above sea level in metres, and on the right in royal Egyptian cubits, just for the [INAUDIBLE] of it. 168 00:17:46,440 --> 00:17:52,470 Daily volumes of shares traded on Nasdaq, import and export volumes for seafood, fish and so on. 169 00:17:54,600 --> 00:17:58,470 That's the URL of that particular website. 170 00:18:00,780 --> 00:18:04,620 And I mentioned the leading pairs of digits. There you are: 171 00:18:05,010 --> 00:18:11,010 Nasdaq daily trading volumes. And as you can see, remember that the nines start at 4.5. 172 00:18:11,370 --> 00:18:14,900 So here we have the 10s, 11s, 12s, up to the 99s, 173 00:18:18,600 --> 00:18:24,240 all following Benford's law with incredible precision, or accuracy. 174 00:18:24,490 --> 00:18:28,310 Don't correct me. Okay. 175 00:18:29,070 --> 00:18:32,640 It's quite amazing. So why is this? 176 00:18:34,200 --> 00:18:38,270 Well, this is how I understand it. We've got a pencil, a little stubby pencil. 177 00:18:38,280 --> 00:18:43,120 It's one unit in length.
And I'm going to make the pencil grow. 178 00:18:44,490 --> 00:18:50,130 That shouldn't be there. Sorry. It doesn't matter. It grows until it's two units in length. 179 00:18:51,060 --> 00:18:54,450 Right. And it's going to grow at a uniform proportional rate, that is, exponentially. 180 00:18:55,710 --> 00:19:01,860 So the time it takes to grow from one unit to two units is the time it takes to double in length. 181 00:19:03,540 --> 00:19:16,530 Now, if I measured its length every second, let's say, I'd have a range of numbers, all of which began with a 1: 1.01, 1.02, 1.03, up to 1.99. 182 00:19:17,910 --> 00:19:24,090 During that period, I would have 99, or however many, numbers beginning with a 1. 183 00:19:25,500 --> 00:19:34,110 But if it continues to grow, then to get from two to three it only has to grow by half its length. 184 00:19:34,890 --> 00:19:42,390 And so there will be fewer readings beginning with a 2, because at the same rate, growing by a smaller proportion takes less time. 185 00:19:43,770 --> 00:19:47,670 Similarly, when it grows from three to four units, it's only growing by a third. 186 00:19:48,960 --> 00:19:57,750 So, growing at the same proportional rate, there will be fewer threes than there were twos, and fewer twos than there were ones. 187 00:19:57,780 --> 00:20:00,990 That explains it. And it's a logarithmic distribution. 188 00:20:01,500 --> 00:20:13,080 So if you plot the ordinary numbers on a logarithmic scale, log 2 is 0.301, 189 00:20:13,290 --> 00:20:17,790 so 2 is 30% of the way from 1 to 10, and so on. 190 00:20:18,300 --> 00:20:22,170 Now, as you go up the scale, the spaces become narrower and narrower, 191 00:20:22,170 --> 00:20:30,719 as anybody knows who's ever used log paper, as I sometimes do for drawing plasma concentration versus time curves, which are log-linear. 192 00:20:30,720 --> 00:20:36,330 If you give a drug to someone and measure the plasma concentrations, the concentration falls exponentially.
193 00:20:37,080 --> 00:20:42,300 And if you plot it on a logarithmic concentration scale against time, it's a straight line. 194 00:20:42,840 --> 00:20:48,930 And the logarithmic scale looks like that: the intervals get smaller and smaller as you go up from 1 to 10. 195 00:20:48,940 --> 00:20:52,770 But once you get to 10, you have to get to 20 to double again. 196 00:20:54,180 --> 00:21:02,520 Going from 10 to 11 is smaller again than this, but going from 10 to 20 is the same size as going from 1 to 2. 197 00:21:06,120 --> 00:21:12,810 So here you have to increase by 100%, here by 50%, here by 33%, and so on. 198 00:21:13,650 --> 00:21:18,750 And so if we go back to our law, we see that's exactly what it is. 199 00:21:19,440 --> 00:21:22,770 Three over two is one and a half: it's 50% more. 200 00:21:23,460 --> 00:21:27,390 Four over three is one and a third: it's 33% more. 201 00:21:27,990 --> 00:21:33,420 Just like the pencil growing. That is why Benford's law applies. 202 00:21:37,840 --> 00:21:43,659 Here's a paper. I can't remember where this comes from, but it is post-email, 203 00:21:43,660 --> 00:21:48,160 so it's relatively recent, on first digits of squares and cubes. 204 00:21:51,270 --> 00:21:55,799 Here's one on the first-digit frequencies of prime numbers; we saw that Benford did that. 205 00:21:55,800 --> 00:22:00,390 But this pair also did the zeros of the Riemann zeta function, 206 00:22:00,390 --> 00:22:09,180 which has to do with the distribution of primes and with the famous Riemann hypothesis, so far unproven. 207 00:22:14,040 --> 00:22:19,170 I wondered if I could get a set of data that might follow Benford's law. 208 00:22:20,610 --> 00:22:26,520 And so I thought, well, what's the simplest set of data I can access without going to an encyclopaedia? 209 00:22:26,760 --> 00:22:33,930 (The data on the countries, the populations and so on, that I analysed came from Whitaker's Almanack.)
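The growing-pencil argument can be put into numbers. This Python sketch assumes the pencil grows at a constant proportional rate, L(t) = 2^(t/T) with doubling time T; the time it then spends with leading digit d is T·log2((d + 1)/d), and as a fraction of the time taken to grow from 1 to 10 that is exactly log10((d + 1)/d). Reading the length at evenly spaced times gives the same frequencies. The function names are illustrative.

```python
import math

BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]

# Where the digits sit on a log scale from 1 to 10: 2 is about 30% of the way.
print({d: round(math.log10(d), 3) for d in range(1, 11)})


def time_spent_on_each_digit(doubling_time: float = 1.0) -> list:
    """Pencil growing at a constant proportional rate: L(t) = 2 ** (t / T).
    Going from length d to d + 1 takes T * log2((d + 1) / d)."""
    return [doubling_time * math.log2((d + 1) / d) for d in range(1, 10)]


def readings(n: int, doubling_time: float = 1.0) -> list:
    """Lengths read at n evenly spaced times while the pencil grows from 1 to 10."""
    t_total = doubling_time * math.log2(10)
    return [2 ** ((i * t_total / n) / doubling_time) for i in range(n)]


times = time_spent_on_each_digit()
t_total = sum(times)                             # log2(10) doubling times
print([round(t / t_total, 3) for t in times])    # fraction of time per digit

lengths = readings(10_000)
counts = [0] * 9
for length in lengths:                           # every reading lies in [1, 10)
    counts[min(int(length), 9) - 1] += 1
print([round(c / len(lengths), 3) for c in counts])

# By contrast, a pencil growing by a fixed amount per second (linear growth)
# spends equal time on each digit from 1 to 10, so its readings have flat digits.
```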
210 00:22:34,170 --> 00:22:44,130 This is from my computer. If you look at your computer, you've got .doc, .docx, PDF and so on, and they all have file sizes, so many kilobytes. 211 00:22:44,670 --> 00:22:51,770 So I thought, well, I'll take all of the files on my computer and see if the sizes of the documents fit 212 00:22:51,800 --> 00:23:00,600 Benford's law. And yeah, it's not significantly different, but it's not quite right, is it? 213 00:23:00,630 --> 00:23:01,860 It's not quite fitting. 214 00:23:02,850 --> 00:23:10,860 So I thought, well, maybe I haven't got a big enough sample. 330 files should be enough, but maybe I don't have a big enough sample. 215 00:23:11,100 --> 00:23:16,469 So I extended it to more, and it stayed the same. 850: it stayed the same. 216 00:23:16,470 --> 00:23:21,930 And there they all are. So it doesn't quite fit. 217 00:23:23,790 --> 00:23:27,689 So I presented the data to the Stats Coven, and we discussed it, and I thought, well, 218 00:23:27,690 --> 00:23:35,120 maybe it's something to do with the fact that I'm mixing up .doc, .docx, PDF and so on. 219 00:23:35,130 --> 00:23:38,880 Maybe I should separate them out. So I did, this afternoon on the train. 220 00:23:38,940 --> 00:23:42,570 That's what I did, and those are the PDFs. 221 00:23:45,340 --> 00:23:54,970 It's not helping. And worse still, the .doc files are actually significantly different by chi-square. 222 00:23:55,390 --> 00:23:59,800 I just did a simple chi-square with eight degrees of freedom, and p is less than 0.02. 223 00:24:01,480 --> 00:24:08,020 So there is something not quite right about these data; they're not quite conventional, shall we say. 224 00:24:09,000 --> 00:24:15,760 I don't know what it is, but this man, Theodore Hill, is one of the world's experts on Benford's law. 225 00:24:16,750 --> 00:24:25,930 And this is what he says: if distributions are selected at random, in any unbiased way... I suspect that's important.
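A chi-square goodness-of-fit test against Benford's law, with nine digit categories and therefore eight degrees of freedom as in the check described above, can be written without any statistics library, because for an even number of degrees of freedom the chi-square tail probability has a closed form. This is a sketch under that assumption, not the speaker's actual calculation; the counts in the example are made up, not his file-size data, and with only 40 values several expected counts fall below 5, so the p-value there is only a rough guide.

```python
import math

BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X >= x) for a chi-square variable with an even number of degrees of
    freedom k: exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benford_chi_square(counts: list) -> tuple:
    """Pearson goodness-of-fit of leading-digit counts (digits 1..9) against
    Benford's law; 9 categories - 1 = 8 degrees of freedom."""
    n = sum(counts)
    expected = [n * p for p in BENFORD]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    return stat, chi2_sf_even_df(stat, df=8)


# Hypothetical counts: a made-up set of 40 numbers with no leading 1s,
# like the suspicious second column of areas.
stat, p = benford_chi_square([0, 5, 5, 5, 5, 5, 5, 5, 5])
print(round(stat, 1), p)
```

For 8 degrees of freedom the 5% critical value is about 15.51, so a statistic above that is "significantly different" at p < 0.05.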
226 00:24:26,680 --> 00:24:35,170 I think the PDF data may be reasonably random, but I think the .doc data probably aren't, because they're my Word files. 227 00:24:35,560 --> 00:24:43,420 I write a blog every week, and it's the same length each time. Every time I write an article for a journal, it's a similar length. 228 00:24:43,420 --> 00:24:48,460 So I suspect that there is bias in the .doc data, which explains why they're so bad. 229 00:24:48,880 --> 00:24:53,500 I wouldn't have expected bias in the PDF data, but maybe there is a source of bias there. 230 00:24:53,590 --> 00:25:00,040 I don't know. It's just a hypothesis. Then he says the significant-digit frequencies 231 00:25:00,040 --> 00:25:06,190 (that's the first digit or the first two digits) of the combined sample will converge to Benford's distribution. 232 00:25:06,310 --> 00:25:13,750 And that's what we saw, in a way. Although the .doc data were rubbish, if you like, and did not fit, 233 00:25:14,080 --> 00:25:22,030 when I put them in with all the other data (and I haven't analysed all the other file types separately), when I put them all in, 234 00:25:22,630 --> 00:25:27,430 it started to converge, although it clearly hasn't converged fully to Benford's law. 235 00:25:28,150 --> 00:25:32,500 So maybe that's what it is: my sample is biased. 236 00:25:35,470 --> 00:25:39,400 So here are some lists of numbers that don't obey Benford's 237 00:25:39,400 --> 00:25:45,070 law. Lottery numbers: probably just as well, because otherwise 238 00:25:45,070 --> 00:25:49,330 we'd all be guessing our way to winning the lottery with greater ease, and the odds would [INAUDIBLE]. 239 00:25:49,390 --> 00:25:55,390 And incidentally, there's an interesting thing here. Does anybody buy lottery tickets here? 240 00:25:55,960 --> 00:25:59,140 Nobody foolish enough to say? 241 00:26:01,930 --> 00:26:05,320 Yeah, but it would change your life. It would, wouldn't it?
242 00:26:05,590 --> 00:26:11,920 That's an interesting argument. Okay. So let me ask you this: when is the best time to buy a lottery ticket? 243 00:26:17,470 --> 00:26:21,130 The day before? Why? Any better chances? 244 00:26:21,400 --> 00:26:25,209 Greater chance of surviving? Correct. 20 minutes, actually. 245 00:26:25,210 --> 00:26:34,060 20 minutes. If you buy it more than 20 minutes before the draw, you're more likely to die before the draw than to win. 246 00:26:37,990 --> 00:26:43,660 That just illustrates how unlikely it is that you're going to win. 247 00:26:43,900 --> 00:26:46,990 It's a beautiful statistic. I haven't worked that out myself. 248 00:26:47,020 --> 00:26:51,550 I read it somewhere. Then there are telephone numbers. Well, you can read the list for yourself. 249 00:26:51,700 --> 00:26:58,989 Interestingly, although the Bose-Einstein distribution in quantum mechanics obeys Benford's law, the Boltzmann-Gibbs distribution does not, 250 00:26:58,990 --> 00:27:05,940 which is a thermodynamic distribution, and nor does the Fermi-Dirac distribution, which also has to do with energy levels at the quantum level 251 00:27:06,810 --> 00:27:11,410 in stable systems. 252 00:27:11,710 --> 00:27:16,060 They don't obey this law. I don't know anything about physics, so I don't know why. 253 00:27:16,330 --> 00:27:19,480 And, importantly, fabricated data. 254 00:27:21,100 --> 00:27:24,940 Most people who make up data don't know Benford's law. 255 00:27:25,180 --> 00:27:29,830 If they did, they'd get away with it. And actually, probably many of them get away with it anyway. 256 00:27:30,460 --> 00:27:33,550 But if you want to make up data, don't forget Benford's law. 257 00:27:33,970 --> 00:27:39,190 You'll be all right. On the other hand, I am reminded of the story of Gregor Mendel. 258 00:27:41,170 --> 00:27:46,690 We all know about Gregor Mendel and his peas, and how he discovered the laws of genetics.
259 00:27:47,830 --> 00:27:52,510 Well, a famous statistician called R. A. Fisher, whom you'll know from Fisher's exact test, 260 00:27:53,260 --> 00:28:00,670 analysed Mendel's data in the 1930s, I think, and concluded that Mendel's data were too good. 261 00:28:02,800 --> 00:28:17,530 They were too close to the predictions (you know, p less than 0.01) to be other than, he thought, possibly engineered to fit the hypothesis. 262 00:28:18,460 --> 00:28:26,830 But maybe Mendel was just lucky. Or maybe, as some have suggested, his gardeners gave him the results they thought he wanted. 263 00:28:27,820 --> 00:28:37,090 I had a technician like that once. But, you know, don't stick too slavishly to Benford's law 264 00:28:37,090 --> 00:28:41,799 if you're going to fabricate data; put some random error in. 265 00:28:41,800 --> 00:28:50,180 Though, you know, how to fabricate data is not on the syllabus. 266 00:28:50,180 --> 00:28:55,320 We could have a whole week on that, I suspect. All right. And there's a book on this. 267 00:28:55,330 --> 00:29:02,409 It says it's an introduction, but don't believe it. And I did a systematic... well, not a systematic review. 268 00:29:02,410 --> 00:29:05,890 I kind of flipped through papers that mentioned Benford's law. 269 00:29:05,920 --> 00:29:11,050 And judging from a PubMed search, it has been widely used, actually, so it's not so little known among those who know it. 270 00:29:11,920 --> 00:29:14,649 Here are the Fermi-Dirac, Boltzmann-Gibbs 271 00:29:14,650 --> 00:29:25,090 and Bose-Einstein distributions; discrete dynamical systems, whatever that means; electoral irregularities. 272 00:29:26,410 --> 00:29:32,620 They analysed elections that were supposed to have been fraudulent and found that 273 00:29:32,920 --> 00:29:40,690 the data from those elections did not fit Benford's law, which doesn't in itself prove the fraud.
274 00:29:40,690 --> 00:29:44,950 But, you know, if you suspect it in the first place and the data don't fit Benford's law, 275 00:29:44,950 --> 00:29:49,270 you've got pretty good reason for suspecting strongly that it was fraudulent. 276 00:29:49,690 --> 00:29:54,510 And recent Russian elections are mentioned there. One wonders about the American one. Right, 277 00:29:54,550 --> 00:30:02,900 I shouldn't say that. Then there are public health surveillance systems, which vary from country to country. 278 00:30:03,730 --> 00:30:05,980 If the data don't obey Benford's law, 279 00:30:05,980 --> 00:30:13,750 it suggests that the surveillance system is not very good, because in systems where surveillance is good, the data obey Benford's law. 280 00:30:14,650 --> 00:30:23,559 So you can pick out countries that are not good at surveillance of the population, for adverse reactions, 281 00:30:23,560 --> 00:30:30,010 for example. Then there's scientific fraud. 282 00:30:33,030 --> 00:30:38,760 Anaesthesia papers. Quality of occupational hygiene data. 283 00:30:42,700 --> 00:30:48,280 Drug discovery data: it might help you with your structure-activity relationships. 284 00:30:49,690 --> 00:30:53,430 Drugs are often discovered, well, they're not just discovered: 285 00:30:53,790 --> 00:30:57,520 you discover an effect of a drug on a system, some system or other. 286 00:30:58,210 --> 00:31:03,700 If you have a target system, say HIV, the virus that causes AIDS, 287 00:31:05,110 --> 00:31:13,030 and there's a part of that virus that you want to target, a protein on the surface of the virus, 288 00:31:14,650 --> 00:31:18,250 and you have 10,000 compounds on your shelves, as drug companies 289 00:31:18,250 --> 00:31:28,389 do, you have a 25% chance of hitting that target just from the random collection of stuff you have on your shelves. 290 00:31:28,390 --> 00:31:34,300 It's really quite good.
Now, what you get may not be very good at hitting the target, but it will hit it. 291 00:31:34,780 --> 00:31:40,299 And once you've got that compound, you can modify it, changing its structure until you get a better compound. 292 00:31:40,300 --> 00:31:45,880 And that's how some new drugs are discovered. This is known as structure-activity relationships, 293 00:31:46,150 --> 00:31:56,500 because the activity varies with the structure, and clever pharmaceutical chemists can do this. 294 00:31:56,500 --> 00:32:03,549 They can say, well, if we put a methyl group on, or an ethyl group, that will change the activity and make it better at hitting this target. 295 00:32:03,550 --> 00:32:09,400 And then you put the structure of the target into a computer and show how 296 00:32:09,400 --> 00:32:10,510 the molecule fits. It's very clever. 297 00:32:10,660 --> 00:32:23,530 Anyway, the data you get fit Benford's law well. And then there's toxicology, monitoring, radiation, all kinds of brain activity. 298 00:32:24,250 --> 00:32:27,610 What is going on in your brain at the moment? If it doesn't fit Benford's law, 299 00:32:27,610 --> 00:32:32,920 I have failed. Electroencephalography. 300 00:32:33,280 --> 00:32:40,370 Incredible, really. Articles in journals, all sorts of things. 301 00:32:40,700 --> 00:32:51,739 This is a good one: multiple choice tests, and the answers at the ends of chapters. 302 00:32:51,740 --> 00:32:58,010 You all had these books of mathematics or physics or whatever, and they have problems. 303 00:32:58,010 --> 00:33:01,490 At the end of the chapter, or at the end of the book, you get the answers to the problems. 304 00:33:01,970 --> 00:33:05,690 Well, it turns out that the first digits of the answers follow Benford's law.
305 00:33:06,290 --> 00:33:14,480 So you might think that in a multiple choice question system, if you knew this, you might have a better chance of getting the right answers. 306 00:33:15,260 --> 00:33:18,650 Because if you just guess a one, you're more likely to get it right. 307 00:33:18,980 --> 00:33:25,490 But it turns out not to work. That's just as well, in case our students want to cheat the system in that way, as it were. 308 00:33:26,900 --> 00:33:29,900 Distinguishing noise from chaos. Well, there you go. 309 00:33:31,280 --> 00:33:39,979 Skewness in the distributions in cardiac models, you know, all sorts of stuff. 310 00:33:39,980 --> 00:33:44,000 And there is a generalised form of Benford's law, which I think is quite useful. 311 00:33:47,210 --> 00:33:50,300 And I won't go into it, because I don't understand the mathematics, 312 00:33:50,300 --> 00:34:02,000 but that shows you that the probability of any digit D appearing as the leading digit is equal to log to base 10 of (D + 1)/D. 313 00:34:02,960 --> 00:34:12,020 But this is a more general law for all sorts of distributions, and in this thing you have an alpha, 314 00:34:12,020 --> 00:34:16,340 and if alpha is zero, then the whole thing collapses to the normal distribution. 315 00:34:16,940 --> 00:34:21,679 If alpha is one, it becomes Benford. And for other alphas, I don't know. 316 00:34:21,680 --> 00:34:27,710 But it gives you other distributions, which are of interest, I guess, to mathematicians and others. 317 00:34:28,580 --> 00:34:33,680 So that's all I can tell you about it. It's not something I've read or understood in detail, 318 00:34:35,150 --> 00:34:42,350 and there are some distributions that you can see that are different depending on whether alpha is one, 319 00:34:42,350 --> 00:34:45,770 two or three, perhaps even fractional, for all I know.
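The basic formula quoted above can be checked numerically. The slide with the generalised form isn't reproduced here, so the sketch assumes one published form, derived by supposing the values follow a power law with density proportional to N^(-alpha); it may differ from the version on the slide. In this form alpha = 1 gives Benford (as a limit, since the formula is 0/0 there), alpha = 0 makes all nine digits equally likely, and larger alphas skew the distribution further towards 1. Function names are illustrative.

```python
import math


def benford(d: int) -> float:
    """Benford probability that the leading digit is d: log10((d + 1) / d)."""
    return math.log10((d + 1) / d)


def generalised_benford(d: int, alpha: float) -> float:
    """Leading-digit probability if values follow a power law N**(-alpha).

    Assumed form: P(d) = ((d + 1)**(1 - alpha) - d**(1 - alpha)) / (10**(1 - alpha) - 1).
    The numerators telescope over d = 1..9, so the nine probabilities sum to 1.
    At alpha = 1 the expression is 0/0; its limit is the Benford law.
    """
    if math.isclose(alpha, 1.0):
        return benford(d)
    a = 1.0 - alpha
    return ((d + 1) ** a - d ** a) / (10 ** a - 1)


for d in range(1, 10):
    print(d, round(benford(d), 3), round(generalised_benford(d, 2.0), 3))
```

Under Benford a guessed leading digit of 1 is right about 30% of the time, against 1 in 9 (about 11%) if all digits were equally likely, which is the intuition behind the multiple-choice idea.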
320 00:34:45,770 --> 00:34:51,500 I mean, you can get fractional polynomials, so I don't see why you shouldn't have a fractional generalised Benford's law. 321 00:34:51,510 --> 00:35:00,049 But anyway, it can be used for different distributions. So the next thing I suppose I should do is try my data on one of these distributions and 322 00:35:00,050 --> 00:35:08,720 see if they fit, when I get a minute. "Explaining the uneven distribution", 323 00:35:08,720 --> 00:35:19,520 the laws of Benford and Zipf: if anybody knows, Zipf's law is to do with the frequency of words in a corpus, in a dictionary, 324 00:35:19,520 --> 00:35:31,400 say, or rather a corpus of texts. The fifth most common word, let's say, is 20 times more common than the hundredth 325 00:35:32,000 --> 00:35:38,900 most common word. That doesn't necessarily follow intuitively, but it turns out to be so. 326 00:35:39,200 --> 00:35:44,270 The fifth most common word is 30 times more common than the 150th most common word. 327 00:35:44,660 --> 00:35:52,040 There is a distribution of commonness of words in the corpus of language, which is also very interesting. 328 00:35:52,400 --> 00:36:01,040 That's Zipf's law. There's the online bibliography. Anybody who's interested, just put in "Benford online" and you'll find it. 329 00:36:01,040 --> 00:36:05,090 And there's a lot of literature on this, which is where I've gone to look for it. 330 00:36:05,780 --> 00:36:07,220 I think that's it. Thank you.
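The two ratios quoted are what Zipf's law predicts if a word's frequency is inversely proportional to its rank, that is, with exponent s = 1 (an assumption here; exponents fitted to real corpora vary somewhat). Then the rank-5 word is 100/5 = 20 times as common as the rank-100 word, and 150/5 = 30 times as common as the rank-150 word. A minimal sketch, with illustrative function names, including a naive word-frequency ranking for a text:

```python
from collections import Counter


def zipf_ratio(rank_a: int, rank_b: int, s: float = 1.0) -> float:
    """How many times more frequent the rank_a word is than the rank_b word,
    assuming frequency is proportional to 1 / rank**s (Zipf's law)."""
    return (rank_b / rank_a) ** s


def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """(rank, word, count) triples, most frequent first.

    Splits on whitespace only, so punctuation stays attached to words;
    ties keep the order in which words first appeared.
    """
    counts = Counter(text.lower().split())
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(), start=1)]


print(zipf_ratio(5, 100))  # 20.0
print(zipf_ratio(5, 150))  # 30.0
```

Plotting log(count) against log(rank) from `rank_frequency` on a large text is the usual way to see whether it follows Zipf: the points fall close to a straight line with slope about -s.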