Right. Thank you very much for coming, everybody. I'm Richard Simmons, and I'm the course director of one of the many master's courses on the programme. We've timed this particular talk to coincide with one particular module of our master's programme in Medical Statistics, but like all the talks it's open to everyone. Some people here are students on this week's course, some of you are staff on the floor, and we have a few visitors — either people I know by sight or people I've just met. Either way, thank you. I saw Professor Alexander Bird speak at a conference last summer, and ever since then, every time I've heard somebody worrying about the replication crisis in medical publishing, I've said: you've got to hear Alexander Bird speak. So I'm very pleased that today you get to hear Professor Alexander Bird himself. He is from King's College London, where he is Professor of Philosophy and Medicine. That's right, yes — a great combination; I don't know if anybody else has that job title. Anyway, I don't know what's coming, so over to you.

Okay. Thanks very much indeed, Richard, for the invitation; it's a pleasure to be here. I'm a philosopher, but one with a particular interest in medicine and in teaching medical students, which I do at King's Guy's campus. And I'm in general interested in how it is that we know what we know in science, and when, possibly, we don't know what we think we know. The replication crisis — or so-called replication crisis — is therefore of particular interest to me.

So what is this so-called replication crisis? Well, here's a little bit of evidence. Nature ran a survey — not a very scientific survey, it must be said, but a survey of its readers — and of the 1,500 or so scientists who responded, 52% said that they thought there was a significant crisis of reproducibility in science. That is to say, the phenomenon whereby a result is published in the scientific journals, and it seems to be an important and interesting result supported by the evidence, but then others come along, try to reproduce the experiment, and get a different result, or a result whose effect size is much smaller than originally reported. So we are failing to reproduce much of the science that we thought was correct. And this is affecting in particular biomedical research and social psychology — other bits of psychology as well, but social psychology especially. I will concentrate on the medical side, because that's where you are coming from.
But afterwards I'll be happy to talk about the social psychology, because it is actually more amusing than the medical side. On the medical side, the biotech company Amgen undertook a large-scale replication study of 53 publications in oncology, and got a successful full replication of the original result in only six cases. And as I say, there's also the social psychology, but we'll leave that for the after-talk discussion.

Okay. So what I want to do is to give one angle on understanding what's going on, and to explain why I think that this outcome — this crisis — is entirely predictable. Then we can talk a bit about how we should feel about it in the light of what I'm arguing, if it's correct.

To do that, the first thing I want to do is to talk about the so-called base rate fallacy. This will, I'm sure, be familiar to many of you, but it's worth repeating just in case there are some for whom it's not entirely familiar. So you're given this problem. You're told about a screening programme for a disease which affects one in a thousand individuals, and the screening test is 95% accurate. Everybody is being screened — not just those with prior indications, but everybody in a particular age group — and a particular individual tests positive. We know nothing else about them; we know of no other risk factors. All we know is that they took the test and tested positive. Perhaps you're a GP, and this individual comes into the surgery and says: I've taken this test and it came back positive — does that mean I have this horrible disease? And the question is: what do you say? What should you tell the patient about their chances of having the disease?

Famously, this question was put to medical students at Harvard in 1978, and only 11 out of 60 got the correct answer. So what is the correct answer? Well, the incorrect answer is: look, it's a highly reliable test, so you've very probably got the disease. That's the wrong answer, and we can see why as follows. Let us imagine there are a thousand individuals, one of whom does indeed have this disease, and we run these thousand people through our screening test, which is 95% accurate. So 95% of the people without the disease are told they're fine and are sent home with a negative result, which leaves 5% of them who are told, falsely, that they do have the disease. And we can assume it's highly probable that the one person with the disease is told they've got the disease — 95% of the time they will be.
What we see there is that we've got about 50 individuals who have been told they've got the disease, only one of whom actually has it. So of those who test positive, only one in 50 — that is, 2% — actually has the disease. That's the right answer.

So what we've got is a situation where the base rate of this disease is one in a thousand. We can call this number pi: the base rate, or prior probability, of an individual actually having the disease. And we have a false positive rate of 5%, which we can call alpha. What we've shown is that the false positives among the 999 people who didn't have the disease greatly outnumber the one true positive. Failing to recognise this fact is the fallacy of base rate neglect — of ignoring the base rate — and I'm going to argue that, in a sense, that's exactly what's going on in the replication crisis.

Now, for those of you who are a bit more into the statistics: you will have noticed that I said the test is 95% accurate. That's ambiguous — what exactly does it mean? There are two types of accuracy I could mean. One is accuracy in the sense of avoiding false positives, and the other is accuracy in the sense of avoiding false negatives. These are two different types of accuracy, and they can, and typically do, come apart. I'll be focusing most of what I say on alpha, which is the type one error rate, and on the corresponding accuracy, one minus alpha. And I'll be assuming throughout that the power is quite high — for the calculations I'll assume it's unrealistically high, at 0.995 — because I want to show that the problem arises even if all our studies are highly powered. We can then discuss what happens in the real world, where many of our studies are less highly powered than that.
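To make the arithmetic concrete, here is a minimal sketch of the screening calculation in Python, using the figures assumed above (a base rate of one in a thousand, a 5% false positive rate, and the 0.995 power); the variable names are illustrative, not from the talk's slides.

# Sketch of the screening-test arithmetic described above.
# pi: base rate of disease; alpha: false positive rate; power: 1 - false negative rate.
pi = 1 / 1000      # one person in a thousand has the disease
alpha = 0.05       # 5% of disease-free people test positive anyway
power = 0.995      # assume nearly every diseased person tests positive

true_positives = pi * power          # diseased and correctly flagged
false_positives = (1 - pi) * alpha   # disease-free but flagged anyway

# Probability of disease given a positive test (the positive predictive value)
ppv = true_positives / (true_positives + false_positives)
print(f"P(disease | positive test) = {ppv:.3f}")        # about 0.02, i.e. roughly 1 in 50
print(f"P(disease-free | positive test) = {1 - ppv:.3f}")  # about 0.98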
So what's the lesson of the base rate fallacy, coming back to the example we had a moment ago with the screening programme? It's that we want to avoid mixing up two kinds of conditional probability: the probability that someone is disease free, given that they've tested positive for the disease, and the probability that they test positive, given that they don't have the disease. The second one is the false positive error rate — the probability that they test positive even though they're disease free — and that is our alpha. But the first is quite a different thing: the probability that they're disease free even though they've tested positive. And what we've seen is that these two are not only distinct but can be quite different in value.

Just to get a grip on that difference, it's like the difference between these probabilities: the probability that the temperature will be below zero given that it's snowing, which is very high — if it snows, you can be pretty sure that the temperature is below zero or close to it — and the probability that it will snow given that the temperature is below zero, which is actually quite low; we often get frosts without snow. Or, perhaps more medically: what's the probability that someone has spots, given that they've got measles? Quite high. What's the probability that they've got measles, given that they've got spots? Quite low.

So in the disease case, we've found that the false positive error rate can be quite low, at 5%, even though the false positive report probability — the probability that a positive report is in fact a false one — is going to be very high, at 98%. If you were told by the screening programme that you've got the disease, the probability that you are nonetheless disease free is very high: 98%.

Okay. Now, how does this get us back to the replication crisis? I'm going to use the method of philosophers, which is to tell outrageous stories — really they're little vignettes, models that we can use to get a grip on what's going on. So I can tell a story of a mad scientist, Dr. M, who generates crazy hypotheses. He's wildly creative and imaginative, but he's not so mad that he uses bad methods for testing his hypotheses; it's just that he's got this wild imagination. When it comes to testing his hypotheses, he does so really quite stringently: he uses null hypothesis significance testing in quite the proper way, with a significance level of 5%.

Now let's imagine that Dr. M is generating all these wild new ideas — very creative, but so imaginative that very few of them turn out to be right. In fact, only one in a thousand of his new hypotheses is true; the rest are all false. But because he's using null hypothesis significance testing in the proper way, his accuracy, his one minus alpha, is 95%. So we can ask ourselves a question. Dr. M has tested lots of hypotheses; take one that he has tested.
He has put it through a randomised controlled trial, with null hypothesis significance testing, and he's got a p-value of less than 0.05. So in a sense it has passed our standard test for truth. What is the probability that it is in fact true? That's our question. We've got a publishable result — perhaps he sends it off and publishes it in a top journal, because after all he's done a randomised controlled trial in a perfectly proper way and got a statistically significant result. But what's the chance that it is in fact true?

Well, I hope you can see that the structure of this question is exactly the same as the structure of the question I asked about the screening programme. There we had a disease that was found in one individual in every thousand; here we've got a person who's generating a set of hypotheses, one in every thousand of which is true. In the screening programme case we had a test for the disease which was 95% accurate; here we've got a method of testing hypotheses which is 95% accurate. We've got exactly the same structure of problem. And so it will turn out that the chance that Dr. M's positive result is true is just 2%. Just as in the screening programme case the false positives greatly outnumber the one true positive, the same will be the case for Dr. M.

Okay. So now let's move from Dr. M to the sane Professor S. She generates hypotheses too. She's working in a new field — a difficult area of science, because it's new, and there's not a lot of indication of where the truth really lies. She generates hypotheses and she tests them stringently. In her case, we are going to imagine that the base rate of truth is 10%, so she's a hundred times better than mad Dr. M at generating hypotheses. But still, because of the difficulty of her area, its newness, and some other factors, she gets things right in her hypothesising on one occasion in ten. She uses exactly the same hypothesis-testing methods as Dr. M and the rest of us: high accuracy, one minus alpha of 95%. So she gets a positive result. It's publishable, for the same reasons. What is the chance that it is in fact true — what's the probability that it's true, given that it has passed the stringent test? And here I must be careful not to give you the wrong number: the chance is 68%.
On the slide it says 32%, which is the wrong way round — a slight change is needed there: it should be 68%. The 32% is the chance that it's false.

Okay. So in this case, as before, we don't want to conflate these two things. But if you do conflate them — and I think this is part of the problem that's going on — you'll reason as follows: we're using null hypothesis significance testing with an alpha of 5%, and that means, as on the previous slide, that a false hypothesis will give us a significant result in only 5% of cases. And you might think: that's fine, 5% is small enough; we'll be getting things wrong only one time in twenty. But in saying that, you're sliding into saying that, given that we get a successful result, we're wrong only one time in twenty. And that's a mistake. We've seen from the previous cases that these two things are not the same: this one can be quite small while that one is quite high. In the case of Dr. M, as in the case of the screening programme, this was 5% but that was 98%. In the case of the sane Professor S, this was 5% and that was 32%. So imagine, as it were, that you do conflate the two: that's wrong, because in Professor S's case the figure is 32%, not 5%.

But suppose you thought that 5% really was what you wanted: you wanted it to be the case that, of all the successful hypotheses — the ones that come through our null hypothesis significance testing with a statistically significant result — only 5% of them are wrong. You want 95% of the stuff you think is publishable to be correct. Then the question is: what alpha would you have to have in order to get that result? That's to say, what p-value threshold would you need, if you want only 5% of your positive results to be false and 95% to be true? Well, it would actually have to be a whole lot smaller than that 5%. Given these assumptions, it turns out to have to be almost one ninth of the 5% that we started off with.

Okay. So what I am suggesting is that we would expect to get a high rate of false positives in our research if two things are correct. First, we have a low background rate of truth — in the case of Professor S I was suggesting that one in ten of her hypotheses is true and nine out of ten are false. And second, we have an alpha that, even though it may be low, is non-negligible — 5%, say: we regard a p-value as significant, publishable and so forth, if it's less than 0.05. That 5% is low, but it's not negligible.
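As a rough check on those numbers — the 2% for Dr. M, the 68% for Professor S, and the "almost one ninth" figure — here is a small illustrative calculation, again assuming the 0.995 power stated earlier; the helper function and its name are mine, for illustration only.

# Illustrative calculation of the false positive report probability (FPRP)
# for the two imagined scientists, and of the alpha needed to bring the FPRP down to 5%.
def fprp(pi, alpha, power=0.995):
    """Probability that a 'significant' result is in fact false."""
    true_pos = pi * power
    false_pos = (1 - pi) * alpha
    return false_pos / (true_pos + false_pos)

print(fprp(pi=1/1000, alpha=0.05))   # Dr. M: about 0.98 -> only ~2% of his positives are true
print(fprp(pi=0.10,   alpha=0.05))   # Professor S: about 0.31, i.e. roughly a third are false

# What alpha would Professor S need for only 5% of her positive results to be false?
# Solve false_pos / (true_pos + false_pos) = 0.05 for alpha:
pi, power, target = 0.10, 0.995, 0.05
alpha_needed = (target / (1 - target)) * pi * power / (1 - pi)
print(alpha_needed)                  # about 0.0058 -- roughly one ninth of 0.05, as claimed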
It's the combination of these two that, I've argued, will generate a high proportion of false results among those we think are publishable. In the case of Professor S, 32% of her results turned out to be mistaken, even though she got a positive outcome — statistically significant results at the 5% level. So, of course, if a perfectly good scientist such as Professor S is producing results that are mistaken on almost one occasion in three, then it's hardly surprising that when other scientists come along and try to reproduce her work, they find that they fail to do so in a number of cases.

But to say this is, clearly, not to impugn her at all. She's doing the best science she possibly can in a challenging area. She's not engaging in any questionable research practices; she's doing everything by the book. The only problem — well, the two problems — are that the field she's working in makes it difficult to guess what the good hypotheses are, to guess where the truth lies, to generate ideas that are likely to be true; and that this is combined with the standard value of alpha at 5%.

Now, what I've shown is that this combination will generate a high proportion of false positives in our research. The next question is: is our science like this? Does it have these features? So let me briefly talk about that, and about why I think that the kinds of research that many of us are engaged in, or at least interested in, are such that we will have a low pi — that's to say, a low background rate of truth.

Many of our hypotheses are derived from some underlying theory. Let's talk a little bit about the way this works in physics, in particle physics. If you remember, a few years ago at CERN, using the Large Hadron Collider, they discovered the Higgs boson. Absolutely exciting. Why did they spend billions of pounds building a machine to find this Higgs boson, amongst other things? That was the main thing they wanted to find, and that's a lot of money to spend on an experiment — you need to be pretty confident that you're going to produce some interesting results if you're going to spend that kind of money. And many people thought that the finding of the Higgs boson vindicated all this expense. That's because the hypothesis that there is a Higgs boson is derived fairly directly from something called the standard model of particle physics.
The standard model has been around for some decades — the best part of half a century and more — and it is one of the best-confirmed theories in science. In fact, the Higgs particle was really one of the last bits of the jigsaw to be fitted in: all of the model's other predictions about what kinds of particle there might be had already been confirmed. So scientists were in this position: they were able to say, look, if the standard model is correct, then it's a pretty direct consequence — not absolutely direct, but with a few very plausible assumptions we can show — that there must be this thing, the Higgs particle. And furthermore, they could say, there is very strong evidence that the standard model is correct. Put those two together and you should be pretty confident that the Higgs particle is out there somewhere, and then you can devise the experiment to detect it.

Okay, but now let's turn to medicine. Are things like that in medicine? The answer, I think, is: very rarely. And that's not because there's something wrong with medicine — it's just that medicine is in some ways a lot more complicated than particle physics. If anyone has it easy, it's the physicists; it's you guys involved in medical research who have the difficult job. Medicine suffers from a relatively weak underlying theory because, for one reason, there can be weak connections between what we discover from basic research — say, in physiology — and what we think might happen if we introduce a drug into someone. We can make plausible connections, but the complexity of the human body, with homeostatic mechanisms that work or sometimes don't work, means it's very difficult to say with certainty, given what we've discovered from our basic research, how we think a drug will affect an individual.

And then, secondly, we may often be working with underlying theories that are themselves plausible, and have some evidence in their favour, but are not entirely certain. So, for example, take a drug that was developed in order to help patients suffering from Alzheimer's disease. In this case, the underlying theory is that it's the beta amyloid plaques that we find in the brains of Alzheimer's sufferers that are the cause of Alzheimer's and of the cognitive impairment that Alzheimer's patients suffer from. The drug in question is based on antibodies to these plaques.
The hope was that, as an antibody, it would prevent the further development of these plaques, and might even cause their reduction, and that in consequence it would help Alzheimer's patients — help prevent their condition from worsening, or possibly even help repair the damage that they had suffered. A lot of money was put into developing the drug and into trialling it. Sadly, the outcome was no: there was no benefit deriving from this drug.

It was entirely plausible to think that it could have helped, if we were right that the cause of Alzheimer's, and of the cognitive deterioration that it involves, are these plaques; and it's quite plausible that an antibody to them would be helpful. But even if the underlying theory — the amyloid cascade hypothesis — were correct, there's no certainty that the antibody would help. It seemed like a plausible idea, but there's certainly no guarantee. And furthermore, scientists weren't even sure that the amyloid cascade hypothesis was correct: there are others who think that the so-called tangles that are associated with Alzheimer's are more significant as a causal factor than the beta amyloid plaques.

So in this case we can say the following things. If this hypothesis is correct, then it is conceivable that the drug will help Alzheimer's patients — it's not a bad idea. And we can also say that the amyloid cascade hypothesis might be correct, but the evidence is far from conclusive. So you can see the contrast with the physics case: we had some reason to be hopeful that the drug would do the job, but we had no reason to be highly confident.

That is my reason for thinking that in much medical science we should start off by thinking that our hypotheses have a low prior chance of being correct — just because it's really, really difficult, and our knowledge is very, very partial. There are other reasons as well for thinking that the hypotheses we actually put forward for testing may have quite a low prior probability of being correct. One thing we can think about is the fact that there's pressure to do experiments, to try and find out what's going on. After all, Alzheimer's is a very serious problem for many individuals; it would be good if we could find something that works. And so there's quite a lot of pressure on us to think about what might help, and then to test it. But that means we will be putting forward hypotheses and testing them at, as it were, a relatively early stage.
Epistemically speaking, that is the stage at which our knowledge, and our expectation that our hypotheses are correct, are relatively low — compared, say, to the physics case. Think about the physics case: they were going to be pretty darn sure that the Higgs boson existed before they went looking for it. It would have been rather embarrassing to say, we built this multi-billion-euro experiment and unfortunately it hasn't found the thing we were looking for. So there, the experimenters had every motivation to delay experimenting until they were sure that the experiment would show what they wanted it to show. In medicine, the pressures are in the other direction: there are pressures to think up hypotheses and test them early, even though we may not be certain what the outcome will be. And that may be good — we can debate whether it's a good or a bad thing; it's just different.

So the other feature of the explanation is the non-negligible alpha: the fact that we regard something as a statistically significant outcome if we get a p-value of less than 0.05. Well, this doesn't need much discussion, since it's simply the accepted convention. That is the way much research in clinical medicine and psychology works: we use null hypothesis significance testing, and we regard an outcome as statistically significant if the p-value is less than 0.05, and that means that we will get a type one error rate of 5%. Now, of course, this is a convention. We can look into the history of all this, and we find that 5% is the number that was accepted from earlier in the 20th century onwards. But there's no particular reason why it has to be 5%, 0.05. In physics, things are different: the convention is five sigma, five standard deviations from the mean. So instead of having a false positive rate of one in 20, there's an error rate of about one in 3 million. Now, we shouldn't think that that is the correct standard — there will be different standards for different sciences — but it's a reminder that the 5% we typically work with isn't given to us by God or by the rules of rationality. It's a convention that we could decide to change if we so wished.
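To put the two conventions side by side, here is a small sketch, assuming the five-sigma threshold is read as a one-sided normal tail (the usual reading in particle physics) and using scipy; it is an illustration of the contrast, not part of the talk's slides.

# Comparing the two significance conventions mentioned above.
from scipy.stats import norm

alpha_medicine = 0.05                    # the conventional p < 0.05 threshold
alpha_physics = norm.sf(5)               # one-sided tail area beyond 5 standard deviations

print(alpha_physics)                     # about 2.9e-7, roughly 1 in 3.5 million
                                         # (the talk rounds this to one in 3 million)
print(alpha_medicine / alpha_physics)    # the 5% convention is ~170,000 times more permissive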
Now, there are other explanations of what's going on in the replication crisis. There's discussion of low statistical power; there's a lot of mention of publication bias and other forms of bias, of so-called questionable research practices, and even of fraud. We could talk about those, and I've got something to say about why I think these are only weak or partial explanations, but it would be nice to give you a chance to talk, so I'll just note that there are other explanations on the table. The point of my hypothesis is that we don't have to reach for them: we don't have to think that something bad is happening in science just because we've got unreproducible research.

We could also ask what's to be done. And I think that, if what I'm saying is correct, there are a number of things we could choose to do. One is to live with it: this is the nature of science. Who says science is easy? Who says it's always going to produce the right results? It's just a fact that difficult science is going to produce false results from time to time. Expect that to be the case and learn to live with it. On the other hand, if you are going to accept that, then we ought to support replication better than we actually do. If you accept that our science is going to produce false results, then you have to be more favourable to scientists who want to try and find out which results those are. And if living with it is what you think we should do, then I don't think we are supporting replication enough — some journals simply won't publish replication studies, and that just seems to me to be bad practice in the light of what I'm saying.

We could also put our effort into increasing the chance that our hypotheses are correct before we test them. You don't want to be mad Dr. M and produce loads of crazy, wild, but false hypotheses; you want to move from Dr. M to Professor S — and perhaps we should try to move from Professor S to an even better position than the one she's in, by being more exacting before we put forward hypotheses for research. That would require doing more basic science, in order to work out which possible interventions have a decent chance of actually working.

The other thing we could do is simply be more stringent in our testing, and say that perhaps an alpha of 5% is too lax. Note, though, that there is going to be a trade-off between our value of alpha and the effect sizes we are able to detect as significant.
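To give a feel for the size of that trade-off, here is a rough sketch using the standard normal-approximation sample size formula for comparing two means; the effect size of 0.3 and the 0.8 power are made-up illustrative figures, not numbers from the talk.

# Rough illustration of the cost of lowering alpha: required sample size per arm
# for a two-sample comparison of means, normal approximation, two-sided test.
from scipy.stats import norm

def n_per_arm(effect_size, alpha, power=0.8):
    """Approximate sample size per group to detect a standardised effect size."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

d = 0.3  # a modest, hypothetical standardised effect size
print(n_per_arm(d, alpha=0.05))    # about 175 per arm at the conventional threshold
print(n_per_arm(d, alpha=0.005))   # about 296 per arm if alpha is tightened tenfold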
That's to say, effect sizes which are statistically significant at the 5% level won't, in many cases, be statistically significant at the 1% level or lower. That's not obviously a bad thing; indeed, it might be a good thing in some cases, because in a number of studies — particularly studies produced by pharmaceutical companies with large numbers of subjects — we find an outcome that's statistically significant, but we look at the effect size and think: that's not clinically very important. Yes, it's statistically significant, but it's clinically insignificant. Now, such a result might be helpful to the pharmaceutical company, because they've shown that their drug is marginally better than some existing treatment, and therefore, other things being equal, physicians will want to prescribe the new treatment. So it makes sense for the pharmaceutical company — but have we as a society really benefited? Probably not a lot. And so one outcome of being more stringent might be that, as it were, we eliminate those kinds of case: if we put alpha down to 1%, or to 0.5%, then it might be that more of the statistically significant results are also clinically significant. But it might also be the case that we miss out on some useful outcomes.

So I'll finish with one of my graphs, simply because I put a lot of effort into making it. It's just to point out the following. The power of a study is its ability to avoid type two errors — that's to say, power is the ability to detect what's true when it is in fact true. A power of 0.8 is thought to be a satisfactory level, but there's good evidence that many of our studies have lower power than that. And one of the things I'm interested in is where we should put our effort: should we care more about power, or about a low alpha — a high one minus alpha? On this axis is the positive predictive value, which is one minus the false positive report probability: that's to say, if you've got a positive report, the probability that it really is true. We want that to be as close to one as possible; we want it to be the case that when we've got a positive test result, the hypothesis really is true. What this graph shows is that, so long as your power is reasonably high, increasing power further doesn't do you much good. Even if you had the maximum power of one, but your alpha is still 5% and we're working with a pi of 10% — like Professor S, where only one in ten of our hypotheses is true — it may still be the case that almost a third of positive outcomes are false. So to help her, what we need to do is not to increase power to its maximum; what we need to do is to reduce alpha.
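Here is a numerical sketch of the point the graph makes, using Professor S's base rate of 10%; the function and the particular grid of values are my illustration rather than the slide itself.

# Positive predictive value (PPV = 1 - FPRP) as power and alpha vary,
# with a base rate of true hypotheses pi = 0.10, as for Professor S.
def ppv(pi, alpha, power):
    return pi * power / (pi * power + (1 - pi) * alpha)

pi = 0.10
for power in (0.2, 0.5, 0.8, 1.0):
    print(f"alpha=0.05, power={power}: PPV={ppv(pi, 0.05, power):.2f}")
# PPV climbs from ~0.31 to only ~0.69: even perfect power leaves ~31% of positives false.

for alpha in (0.05, 0.01, 0.005):
    print(f"alpha={alpha}, power=0.8: PPV={ppv(pi, alpha, 0.8):.2f}")
# Shrinking alpha from 0.05 to 0.005 raises the PPV from ~0.64 to ~0.95.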
On the other hand, if your power starts off really low, then there will be a benefit to increasing power. So some of the debate that goes on in this area is about whether low power is the problem. Well, for very low powered studies, that might well be the case. But the point is that even for high powered studies there's still a way to go, if you have a non-negligible alpha combined with a difficult area in which it's hard to produce true new ideas.