1 00:00:00,420 --> 00:00:03,210 Penny Lane, thank you very much for coming. 2 00:00:03,240 --> 00:00:11,880 And I'd like to welcome you to the evening's guest lecture, which is part of the evidence based health care master's program. 3 00:00:12,780 --> 00:00:16,740 I'm delighted to introduce Professor Alan Selman, 4 00:00:17,160 --> 00:00:25,260 who is an epidemiologist and rheumatologist and is currently a professor of musculoskeletal health here in Oxford. 5 00:00:25,950 --> 00:00:35,500 He's also the author of the core textbook that we recommend, and I started my research career out with Alan. 6 00:00:35,530 --> 00:00:39,300 So a lot of what I have now is thanks to Alan, 7 00:00:39,600 --> 00:00:50,070 but I'm personally very excited that he's here tonight and he's going to talk to us about how bad so many of our old examples. 8 00:00:52,830 --> 00:01:06,360 Thank you. Well, thanks, Claire. So I am aware that you've been at it all day, so firstly, feel free to nod off. 9 00:01:08,490 --> 00:01:16,230 I'm used to it. And secondly, I'll try and make it reasonably light hearted. 10 00:01:17,010 --> 00:01:27,479 As much as epidemiology can be light hearted and really to share with you some 11 00:01:27,480 --> 00:01:38,280 thoughts about kind of things that I may or may not have done differently and when. 12 00:01:39,030 --> 00:01:45,180 So I'm going to talk about four challenging areas. One, is primary data collection worth the hassle? 13 00:01:46,980 --> 00:01:52,650 Secondly, epidemiology is not the art of the perfect but the art of the possible. 14 00:01:53,760 --> 00:02:01,230 Third, that the days of the randomised controlled trial are probably at an end. 15 00:02:02,970 --> 00:02:10,830 And fourthly, we spend a lot of our life trying to identify causes and do we actually know what we mean? 16 00:02:11,550 --> 00:02:25,170 So those are the four areas that I'm going to attack and hopefully, if I do nothing else, stimulate a little bit of discussion. 
17 00:02:28,470 --> 00:02:35,070 So what about primary versus secondary data collection? 18 00:02:35,460 --> 00:02:39,420 What do I mean by that? Clearly, I mean, 19 00:02:39,930 --> 00:02:50,190 the studies that Claire was so wonderfully involved in when we worked together in Manchester were what was traditionally the view of the epidemiologist. 20 00:02:50,190 --> 00:02:58,650 You went out and you interviewed people or you sent them a questionnaire by post or you did some measurement on them. 21 00:03:00,270 --> 00:03:05,970 And the alternative is that you use the data that somebody else has collected. 22 00:03:06,930 --> 00:03:15,180 And if you were to draw up a balance sheet, this would be on your balance sheet. 23 00:03:15,720 --> 00:03:24,120 Well, of course, it's expensive collecting primary data and it takes a lot of time. 24 00:03:24,840 --> 00:03:27,960 It's quite difficult to do it in a broad area. 25 00:03:28,830 --> 00:03:38,730 But if you know what you want to collect, it can be accurate and you can collect quite a lot, in quite some detail. 26 00:03:39,420 --> 00:03:44,760 And the secondary data costs are low, speed is quick. 27 00:03:44,760 --> 00:03:48,420 You can cover a broad area. Accuracy? 28 00:03:48,940 --> 00:03:52,409 Well, very interesting, actually. 29 00:03:52,410 --> 00:03:59,970 Very interesting. I often said, as an epidemiologist, if you're interested in international health, 30 00:04:01,320 --> 00:04:05,430 the one thing you can rely on is vital statistics, births and deaths. 31 00:04:06,120 --> 00:04:14,909 And I was at a meeting a few months ago chatting with a colleague from India who told me 32 00:04:14,910 --> 00:04:21,000 that he reckoned probably only about 30% of Indian death certificates were accurate. 33 00:04:21,600 --> 00:04:30,299 So we assume that secondary data maybe has even low accuracy. 34 00:04:30,300 --> 00:04:33,300 Maybe it's not even as good as that. 
35 00:04:36,120 --> 00:04:45,660 But I think there are an enormous number of issues in collecting primary data. 36 00:04:46,980 --> 00:04:59,610 And when I think of the amount of effort that I've engaged in, and, I guess more importantly, for other people to engage in, in collecting data. 37 00:05:00,900 --> 00:05:08,850 And the first thing is this issue that we never study a whole population; we only ever study a sample. 38 00:05:09,720 --> 00:05:13,350 And actually, when you start thinking about that, 39 00:05:13,890 --> 00:05:22,080 it immediately causes a lot of anxiety because the whole basis of statistics is based on the 40 00:05:22,080 --> 00:05:28,860 fact that we are taking a random sample and inferring something about a whole population. 41 00:05:29,550 --> 00:05:39,810 Now, let's take the simplest idea where you're doing in a population in terms of a census, a whole national population. 42 00:05:40,530 --> 00:05:44,670 How do we go about taking a true random sample? 43 00:05:45,060 --> 00:05:49,530 Well, it's actually very, very difficult. And we might go to primary care. 44 00:05:49,530 --> 00:05:53,429 But what is a primary care population? 45 00:05:53,430 --> 00:06:01,680 How representative is it, and how do we know that that primary care population is representative of all other populations? 46 00:06:03,210 --> 00:06:12,780 Much more challenging than that is when we as clinicians study patients with a disease 47 00:06:13,410 --> 00:06:19,870 in a case control study or recruitment for a longitudinal observational study. 48 00:06:20,670 --> 00:06:32,370 Because what we do, we make the inference that the people we are studying are a representative sample of everybody with the disease: 49 00:06:33,560 --> 00:06:40,400 everybody who exists now, has existed in the past, 50 00:06:41,890 --> 00:06:51,580 or will exist in the future. Well, of course, we can't take people who will exist in the future because we don't know who they are. 
51 00:06:53,080 --> 00:06:59,140 And studying people who existed in the past is going to tell us about things that are of historical interest. 52 00:06:59,830 --> 00:07:09,670 So once we start thinking that we're taking a random sample of people with a disease in terms of something, 53 00:07:09,790 --> 00:07:17,230 we realise that the underpinning that we're taking a true random sample is flawed. 54 00:07:17,620 --> 00:07:27,550 If I, you know, do a clinical trial on the people I treat with the disease in my study, the future generations, 55 00:07:27,730 --> 00:07:33,700 future physicians, future health care professionals will want to know that my sample is the same. Well, clearly it can't be. 56 00:07:33,700 --> 00:07:38,770 So I think some of the inferences we're making are flawed. 57 00:07:39,610 --> 00:07:44,170 It's why, I guess, I'm saying epidemiology is the art of the possible. 58 00:07:45,670 --> 00:07:53,820 However, even if we can get that, we've got to get permission now. 59 00:07:56,050 --> 00:08:08,800 You can't necessarily read all those stages, but it is incredibly hard getting all the necessary permissions to get a study started. 60 00:08:09,880 --> 00:08:21,010 The National Institute for Health Research reckons, from the date of the letter of your award, 61 00:08:21,790 --> 00:08:29,320 not the date of the letter of when you submit your grant, but when that letter lands in your mailbox: 62 00:08:29,740 --> 00:08:33,130 "We are pleased to tell you you have got the grant." Wahey! 63 00:08:33,160 --> 00:08:41,800 You are lucky if you recruit your first patient within 18 months. 64 00:08:42,970 --> 00:08:50,320 That's what they work on. 18 months from award to recruitment of a first patient. 65 00:08:51,430 --> 00:08:54,620 And who are I? There isn't to predict. 
66 00:08:54,700 --> 00:09:01,810 And you tell you the enormous steps that you need to go through in terms of the ethical approval, 67 00:09:02,410 --> 00:09:07,930 the governance and all the relationships connected with that. 68 00:09:08,620 --> 00:09:18,310 And government attempts to try and simplify that process have really struggled. 69 00:09:19,720 --> 00:09:24,340 I remember some years ago, a very long time ago, 70 00:09:24,340 --> 00:09:34,600 I wanted to do a study of a very... it's quite a common condition that some of you may have heard of, called polymyalgia rheumatica, 71 00:09:34,600 --> 00:09:44,500 a condition where elderly people get significant ache and stiffness in their joints, not associated with arthritis, 72 00:09:45,010 --> 00:09:52,180 but associated with increased inflammation, a raised ESR in the blood, and nobody knew how common it was. 73 00:09:52,180 --> 00:10:00,640 So I decided I wanted to do a study and I thought that I would choose a convenience sample instead of a random sample. 74 00:10:01,030 --> 00:10:05,830 I went to an old people's home, or three old people's homes, in Southend-on-Sea. 75 00:10:07,720 --> 00:10:10,990 Didn't bother getting ethical approval because you didn't need it in those days. 76 00:10:11,560 --> 00:10:15,760 And I just contacted somebody in the home, who said, "Yes, that's fine." 77 00:10:16,120 --> 00:10:19,510 So I turned up at the home and said, "The doctor's here to take your blood." 78 00:10:22,780 --> 00:10:25,479 I shouldn't be telling you this. Well, anyway, I did. 79 00:10:25,480 --> 00:10:32,470 And it got published and it was this country's first data on the epidemiology of polymyalgia. 80 00:10:33,850 --> 00:10:37,150 To do the same thing today would be very difficult. 81 00:10:39,370 --> 00:10:47,770 And even if you've got all the permissions, what about getting a worthwhile response? 82 00:10:49,190 --> 00:10:57,849 And I spent about 15 years of my life on the steering committee of UK Biobank. 
83 00:10:57,850 --> 00:11:01,659 Hands up, who's heard of UK Biobank? So lots of you. So UK Biobank, 84 00:11:01,660 --> 00:11:04,660 for those who haven't heard of it, was this country's, Tony Blair's, 85 00:11:05,470 --> 00:11:09,040 great initiative: a study of half a million people. 86 00:11:09,940 --> 00:11:17,950 And the main aim of the study was to look for gene and environment interactions of the main diseases in the elderly. 87 00:11:18,370 --> 00:11:26,200 So the aim was to recruit half a million adults aged between 45 and 64 in the UK. 88 00:11:28,270 --> 00:11:37,390 And in the studies that we'd been doing when I was involved in epidemiology, where we wanted to recruit a random sample of the population, 89 00:11:37,840 --> 00:11:39,639 we reckoned, with a bit of a push, 90 00:11:39,640 --> 00:11:52,070 we could probably get up to a 65-70% response rate. But UK Biobank letters would come through asking people to contribute blood and urine and answer some 91 00:11:52,280 --> 00:11:56,900 questions for the greater good of humanity and take part in something very much worthwhile. 92 00:11:57,320 --> 00:12:03,560 Do you know what the response rate was to Biobank, anybody? No? About 10%? 93 00:12:04,340 --> 00:12:17,150 It was about 10%. And that's really interesting because of a couple of things that came out, maybe not surprising. 94 00:12:18,140 --> 00:12:26,870 If you look maybe at the distribution of Townsend score, which, as you know, 95 00:12:27,080 --> 00:12:38,960 is an index of deprivation, clearly the people who took part in UK Biobank were not representative of the great British population. 
96 00:12:39,140 --> 00:12:46,430 Now in some ways I remember when we had a big discussion about UK Biobank, 97 00:12:46,430 --> 00:12:59,239 we had this great discussion about representativeness and sampling and we talked about minorities and how did we ensure 98 00:12:59,240 --> 00:13:08,890 that there was a sufficient representation of the multiple minority groups that exist in the current UK population? 99 00:13:08,900 --> 00:13:18,320 And of course this is constantly changing. Did we, you know, if 0.03% of the population came from Vietnam, 100 00:13:18,320 --> 00:13:22,610 did we have 0.03% of the respondents Vietnamese? 101 00:13:22,880 --> 00:13:27,980 And if we didn't, did that mean the conclusions couldn't be extrapolated? 102 00:13:29,390 --> 00:13:38,510 So it was a challenge. And in the end, we decided to drop it and hoped that we would have enough in different minority groups to make a judgement. 103 00:13:38,510 --> 00:13:43,760 But we've seen about Townsend score. 104 00:13:45,260 --> 00:13:50,780 Well, so see here in terms of cigarette smoking. 105 00:13:51,320 --> 00:14:00,469 These are data comparing UK Biobank with the Health Survey for England. 106 00:14:00,470 --> 00:14:04,790 And of course Health Survey for England doesn't necessarily mean that it's representative. 107 00:14:05,300 --> 00:14:11,810 But clearly the people who were recruited into UK Biobank over-represent 108 00:14:12,170 --> 00:14:20,810 the more affluent, the healthy, the worried well, the more educated in the population. 109 00:14:21,440 --> 00:14:33,470 And yet UK Biobank has been a very rich resource on the epidemiology of hundreds of diseases within the UK. 110 00:14:35,180 --> 00:14:49,010 Does it matter? Well, I don't know. 
It possibly makes it less relevant when you're comparing within the cohort. To compare, say, 111 00:14:49,670 --> 00:14:58,610 between never smokers and current smokers within Biobank might be just as relevant as comparing between current smokers and 112 00:14:58,610 --> 00:15:01,640 never smokers in the population as a whole. 113 00:15:02,480 --> 00:15:15,260 That is an assumption which may or may not be true, but certainly in terms of just understanding the prevalence, there's clearly a limit. 114 00:15:19,670 --> 00:15:32,930 The other thing is whether the survey method works without problems. We believe, when we are collecting that information, 115 00:15:33,230 --> 00:15:38,930 that the information we're collecting is giving us the answer we want. 116 00:15:39,710 --> 00:15:43,010 And sometimes that's not the case. 117 00:15:45,680 --> 00:16:00,410 I remember a number of years ago I was involved in a study of back pain in 35 European populations, 118 00:16:01,940 --> 00:16:15,200 and we translated the question into the different languages, and then we back translated to make sure that the meaning hadn't been lost. 119 00:16:16,160 --> 00:16:16,550 Okay. 120 00:16:18,590 --> 00:16:28,460 And these were general population samples, and we found the prevalence of back pain varied between 7% and 82% in the different populations. 121 00:16:29,240 --> 00:16:41,960 I cannot believe, I cannot believe that the differences we found were due to differences in the true occurrence of back pain. 122 00:16:42,830 --> 00:16:48,800 What I do believe is that actually most people don't know where their back is. 123 00:16:52,180 --> 00:16:57,430 If I was to ask you where your back was, you would all tell me 124 00:16:57,460 --> 00:17:00,620 something different. Some here, some here, some might. 125 00:17:01,240 --> 00:17:07,320 You can't see what I'm doing and what is pain, what tonight, etc. etc. 126 00:17:09,820 --> 00:17:13,360 I remember, in UK Biobank, 
127 00:17:14,020 --> 00:17:24,730 we thought long and hard about dietary survey methodology, how we ascertain what people eat. 128 00:17:27,220 --> 00:17:33,250 I've been doing some work recently about physical activity in the elderly. 129 00:17:34,480 --> 00:17:43,410 Now you can come up with all the instruments you like and there are hundreds of instruments and we can look at their qualities, 130 00:17:44,020 --> 00:17:47,830 inter-rater and intra-rater reliability, etc., etc. 131 00:17:48,640 --> 00:17:56,080 But most of the things we are measuring are surrogates for something that is really quite difficult. 132 00:17:57,490 --> 00:18:10,270 Now that's not to say they're meaningless. Clearly they're not meaningless, but there are challenges and often we put blind faith in the data we have. 133 00:18:10,810 --> 00:18:19,480 And interestingly, it's only when we have the comparative data that we realise there may be an issue. 134 00:18:21,160 --> 00:18:28,600 And one example of this is I was interested some years ago in whether 135 00:18:28,600 --> 00:18:35,500 some new drugs for treatment of rheumatoid arthritis were associated with an increased rate, 136 00:18:35,920 --> 00:18:40,060 in particular, of people having various kinds of infections. 137 00:18:40,930 --> 00:18:49,150 And we did the same study in England and in Sweden, and we found that in the people who took this drug, 138 00:18:49,720 --> 00:18:53,860 they had exactly the same rate of infection, which was fantastic. 139 00:18:54,070 --> 00:19:00,610 Maybe that was a suggestion of some validation of the method we chose. 140 00:19:01,150 --> 00:19:09,970 The problem was that the comparison group in England and the comparison group in Sweden were so different 141 00:19:10,450 --> 00:19:13,390 that we didn't find a significant difference and they did. 
142 00:19:14,470 --> 00:19:21,130 So we both had the same outcome in the treated group, but we both had very different outcomes in the untreated group, 143 00:19:22,060 --> 00:19:26,350 possibly related to the way we collected data and possibly the way we collected the sample. 144 00:19:29,680 --> 00:19:40,090 The other problem, particularly in prospective longitudinal studies, is retention. 145 00:19:42,970 --> 00:19:50,000 When I left Manchester, I was the medical director of Arthritis Research UK 146 00:19:50,470 --> 00:19:58,390 for some time. I had a very ambitious programme of doing some clinical trials, major clinical trials, clinical trials that needed answering. 147 00:19:58,900 --> 00:20:05,200 And for each clinical trial, we had a committee of the great and the good and maybe the not so great and not 148 00:20:05,200 --> 00:20:10,720 so good sometimes to determine whether this trial was achieving its goals. 149 00:20:11,110 --> 00:20:13,120 And there was a real emphasis on recruitment. 150 00:20:13,420 --> 00:20:19,809 And it was a bit like, you know, the funds for the church steeple, you know, as it goes up. 151 00:20:19,810 --> 00:20:24,190 Have we raised enough money? So has the recruitment reached the required level? 152 00:20:25,210 --> 00:20:28,630 Actually, recruitment in clinical trials is a bit of an issue. 153 00:20:28,930 --> 00:20:33,850 The biggest problem is retention. The bigger problem is retention. 154 00:20:34,900 --> 00:20:42,580 And the people you lose means that they are not providing that information. 155 00:20:42,760 --> 00:20:52,240 Now, we could have a discussion about imputation. We could have a discussion about missing at random, missing completely at random, etc., etc. 
156 00:20:53,200 --> 00:21:05,410 But one of the points I want to make, which I think is a really important point, is that in terms of primary data collection in studies, 157 00:21:06,430 --> 00:21:15,520 the people you lose are possibly the biggest determinant of the loss of validity of a study. 158 00:21:16,270 --> 00:21:20,830 And it is very hard, very hard often to retain people. 159 00:21:21,490 --> 00:21:28,420 So why not use routinely available data? 160 00:21:28,870 --> 00:21:33,730 I mean, routine data is often collected without any aims. 161 00:21:34,330 --> 00:21:46,030 Primary care data, hospital data, census data, occupational data of many kinds is not collected to answer specific research questions. 162 00:21:46,780 --> 00:21:51,130 Often it's done for the purpose of finance or audit. 163 00:21:52,090 --> 00:21:55,659 The good thing is it's not fixed in time. 164 00:21:55,660 --> 00:22:03,730 And as Claire notes, in the UK we've got really good primary care data. 165 00:22:04,420 --> 00:22:18,760 Now we can discuss whether it is of research quality and how different general practices, different general practitioners use different terms. 166 00:22:19,870 --> 00:22:27,729 We recently were interested in CPD to assess people with hip pain and the way 167 00:22:27,730 --> 00:22:33,520 that that was coded by different general practitioners was quite remarkable. 168 00:22:34,090 --> 00:22:37,150 But nevertheless it exists and it exists longitudinally. 169 00:22:38,440 --> 00:22:44,260 We've got very good data for some diseases, particularly cancer. 170 00:22:46,210 --> 00:22:52,840 We've got some data on people who are admitted to hospital. 171 00:22:54,910 --> 00:23:04,240 It's not brilliant. The coding is dependent on the coding done by the person who's responsible. 
172 00:23:04,660 --> 00:23:10,900 Suppose somebody comes into hospital to have a knee joint replaced and then develops 173 00:23:10,920 --> 00:23:16,710 venous thrombosis and then has a pulmonary embolism and then develops pneumonia. 174 00:23:16,720 --> 00:23:20,440 What gets coded? What doesn't get coded? What's the important thing? 175 00:23:21,790 --> 00:23:25,810 What's not the important thing when people then have readmission, etcetera, etcetera. 176 00:23:26,080 --> 00:23:35,320 There's any number of complexities with hospital data and increasingly with people treated as day care, 177 00:23:35,920 --> 00:23:47,590 etc. And we're also very bad at understanding people who might come in for, for example, injection treatment or some physiotherapy or something like that. 178 00:23:49,150 --> 00:23:55,900 I've been heavily involved in the National Joint Register, 179 00:23:56,590 --> 00:24:05,770 which is a register of everybody in the UK, actually in England, who's had joint replacement. 180 00:24:06,560 --> 00:24:10,780 There are joint replacement registers in many countries in the world. 181 00:24:11,170 --> 00:24:20,950 They have spawned an enormous amount of data about the value and the consequences of joint replacement surgery. 182 00:24:21,310 --> 00:24:24,459 But they are not without problems. They are not complete. 183 00:24:24,460 --> 00:24:27,880 We know they're not complete, although the completeness has gone up. 184 00:24:28,240 --> 00:24:33,550 They're not always accurate. We don't always have complete follow up. 185 00:24:33,850 --> 00:24:42,700 And often in terms of the things that you want, that information is missing, maybe in terms of some of the outcomes. 186 00:24:43,690 --> 00:24:51,490 So it's quite good on maybe short term effects and it might be very good for manufacturers, but not long term health outcomes. 187 00:24:52,660 --> 00:24:56,830 And I've also been involved in drug treatment registers. 
188 00:24:56,830 --> 00:25:07,840 And again, that can be a very useful means of understanding the long term consequences. 189 00:25:08,650 --> 00:25:16,780 So yes, secondary data has its limitations, but it's there. 190 00:25:16,960 --> 00:25:27,250 It's available to be used for a whole range of uses for descriptive epidemiology. 191 00:25:27,550 --> 00:25:35,410 It can tell you very easily, in a way that would be very difficult to do with primary data collection, 192 00:25:35,410 --> 00:25:40,930 for example, the incidence in this case of different forms of ischaemic heart disease. 193 00:25:41,140 --> 00:25:44,560 You can understand the age and gender influence. 194 00:25:45,490 --> 00:25:49,300 You can look at time trends. 195 00:25:51,580 --> 00:26:05,410 And over the years I've used publicly available data to look at seasonal trends in relation to certain conditions because with all its limitations, 196 00:26:05,620 --> 00:26:15,850 how else can you understand that? And also you can look for geographical variation. 197 00:26:17,830 --> 00:26:31,000 Routine data, though, is to some extent limited because, as I say, routine data is collected for routine purposes. 198 00:26:31,330 --> 00:26:43,840 It's not collected to answer any specific questions. And in fact, actually, in today's Guardian, 199 00:26:44,170 --> 00:26:51,460 there was something about dangers of implants, that all of these nasty orthopaedic surgeons are putting nasty bits of 200 00:26:51,560 --> 00:26:55,160 metal into otherwise non-nasty human beings. 201 00:26:57,320 --> 00:27:14,930 And why didn't something like the National Joint Register understand that, in people where there was a metal pelvic bit and a metal hip bit, 202 00:27:15,290 --> 00:27:23,989 so called metal on metal... Why didn't the National Joint Register come out and say these things are harmful? 
203 00:27:23,990 --> 00:27:34,880 The metal ions get into local tissues, cause a reaction, and the joints can become loose, etc. etc. etc. 204 00:27:35,240 --> 00:27:41,750 Now it wasn't the fault of the National Joint Register that it didn't answer the question. 205 00:27:42,020 --> 00:27:52,370 The National Joint Register said, "Look, we've got the data. It was up to you, whoever that you is, to think of the question and answer it." 206 00:27:52,370 --> 00:28:04,669 So often we kind of assume that all these large routine data sets will somehow answer questions that they have been taught, 207 00:28:04,670 --> 00:28:09,920 maybe some machine learning or something to tell you. 208 00:28:10,490 --> 00:28:20,720 Essentially, they are a tool, but they need at the moment a human interface to ask questions. 209 00:28:22,220 --> 00:28:35,150 I think what is useful is the potential that you can explore things within routine data. 210 00:28:35,840 --> 00:28:46,490 And it may be that there's some potential for doing research within the register, but often a register is limited. 211 00:28:47,030 --> 00:28:54,980 I think some of the most interesting stuff is the potential linkage between registers. 212 00:28:57,260 --> 00:29:06,200 And I'm sure a number of you will have heard of the Farr Institute, and I went to a very interesting talk 213 00:29:06,200 --> 00:29:19,820 where in Wales they had linked childhood absence at school with mental illness later on in life. 214 00:29:20,510 --> 00:29:27,470 So this has the potential of linking registers that can add to the richness of one register. 215 00:29:28,130 --> 00:29:37,340 I sometimes feel it would be wonderful to live in countries other than Britain and in fact, I think with what's happened recently, 216 00:29:37,340 --> 00:29:44,180 I think it'd be necessary to live somewhere other than in Britain if you want to survive as a researcher. 217 00:29:45,080 --> 00:29:47,990 But anyway, who knows, you make it through pilot. 
218 00:29:49,130 --> 00:29:58,400 But there's a colleague of mine in Sweden a number of years ago, and in Sweden up to four years ago, believe it or not, they had conscription. 219 00:29:58,910 --> 00:30:03,590 I'm not sure what the people did in the Swedish army. I'm not sure who the enemy was, maybe the Danes. 220 00:30:05,720 --> 00:30:12,200 But one of the questions that they asked the Swedish conscripts was how regularly they took cannabis, 221 00:30:12,200 --> 00:30:15,410 which of course being Swedish they were very happy to tell you about. 222 00:30:16,700 --> 00:30:27,769 And this guy was then able to link these anonymized questionnaires that were collected on all Swedish conscripts to the 223 00:30:27,770 --> 00:30:35,839 Swedish mental health register to show that there's a link between cannabis use in recruits and schizophrenia later on in life. 224 00:30:35,840 --> 00:30:47,660 Now, I cannot say that it was the cannabis; there may be, you know, other confounders, but the potential of linking registers is really important. 225 00:30:48,920 --> 00:30:59,989 As Claire invited me, how could I not talk about an example where we were able to link two registers, 226 00:30:59,990 --> 00:31:12,620 where we were interested in diet and rheumatoid arthritis. To do a study of a disease as rare as rheumatoid arthritis, 227 00:31:12,860 --> 00:31:16,700 collecting decent diet data would have been impractical. 228 00:31:17,210 --> 00:31:26,600 And what we were able to do was link a disease register that Claire worked on, the Norfolk Arthritis 229 00:31:26,600 --> 00:31:36,290 Register, with a very detailed register of diet carried out fortuitously in the same area. 230 00:31:36,530 --> 00:31:48,470 And we were able to show, linking those two registers, that diets rich in certain fruits were protective against development of rheumatoid arthritis. 231 00:31:50,390 --> 00:31:53,030 So. And Biobank. 
232 00:31:53,210 --> 00:32:02,990 The richness of Biobank is not in terms of what data Biobank collected, actually, but what it can tell us when, say, linked to primary care. 233 00:32:05,060 --> 00:32:13,490 But the challenge in routine data is quality. 234 00:32:15,020 --> 00:32:20,540 And I talked to you about the National Joint Register, 235 00:32:20,900 --> 00:32:29,480 which has been used as the data on the outcome of hip joint replacement, and the quality of the data. 236 00:32:31,580 --> 00:32:47,030 It's good, but it's not what we would call research quality, because research quality data requires, you know, the kind of things that, 237 00:32:47,290 --> 00:32:56,180 you know, we're constantly testing for: drift, variation between observers, variation within observers in the way things are collected, 238 00:32:56,390 --> 00:32:57,560 completeness, etc., 239 00:32:57,560 --> 00:33:11,090 etc. and the kind of things we teach epidemiologists in terms of how you maintain data quality in a study lasting six months or a year or two years. 240 00:33:12,050 --> 00:33:16,850 And here we were relying on data collected nationally over maybe decades. 241 00:33:17,840 --> 00:33:22,620 And there isn't the funding there. So we have to... 242 00:33:22,670 --> 00:33:29,570 So we have to accept that there are limitations in the quality of the data. 243 00:33:30,170 --> 00:33:33,139 And there's a trade-off. 244 00:33:33,140 --> 00:33:42,260 On the one hand, I told you how difficult primary data is, but we have to accept that there are challenges in the quality of secondary data. 245 00:33:43,790 --> 00:33:55,280 So epidemiology, to me... I don't think it's possible to do the perfect study. 246 00:33:57,320 --> 00:34:05,270 I think anybody who says they've done the perfect study I don't think is telling the truth. 
247 00:34:07,220 --> 00:34:20,390 But I think it's our responsibility as investigators to do two things: to be clear about the possible imperfections. 248 00:34:21,110 --> 00:34:24,620 Is it sampling bias? Is it problems of the survey methodology? 249 00:34:24,890 --> 00:34:29,240 Is it a problem with numbers, or loss to follow up? And be open about it. 250 00:34:29,600 --> 00:34:33,500 And then the key thing: okay, 251 00:34:34,820 --> 00:34:38,570 given that, what are the consequences? 252 00:34:39,350 --> 00:34:47,690 What are the consequences? And sometimes, guys, sometimes it might be that actually the consequences can be helpful. 253 00:34:48,560 --> 00:34:52,670 A bias could be helpful. What do you mean? 254 00:34:52,910 --> 00:34:56,900 What I mean is the kinds of this model bias could be helpful. 255 00:34:57,280 --> 00:35:03,050 And let me give you a suggestion and a real example. 256 00:35:04,100 --> 00:35:12,139 So at one time there was a worry that women who had influenza during 257 00:35:12,140 --> 00:35:23,240 pregnancy were at risk of having an infant who would have delayed milestones. 258 00:35:24,890 --> 00:35:33,320 So they took children with delayed milestones and had a comparison group of children without delayed milestones. 259 00:35:34,490 --> 00:35:39,110 And they interviewed the mothers about whether they'd had influenza in pregnancy. 260 00:35:39,590 --> 00:35:40,280 And actually, 261 00:35:40,850 --> 00:35:47,330 that was probably the only way they could get that information because it wouldn't be neatly recorded because most may not have gone to their GP. 
262 00:35:48,530 --> 00:35:59,899 And you immediately would say, well, of course the women who had a child with delay would be more likely to recall the things that had happened 263 00:35:59,900 --> 00:36:05,930 in pregnancy and maybe would be more likely to recall having influenza than a woman who's got a normal, 264 00:36:05,930 --> 00:36:11,180 healthy baby. And this would be a criticism of that study. 265 00:36:13,130 --> 00:36:16,340 That particular study found no difference. 266 00:36:17,240 --> 00:36:27,650 That particular study found no difference. So if there had been a bias, it would have been in the direction of finding a difference. 267 00:36:28,310 --> 00:36:31,600 And because they found no difference, in some ways, 268 00:36:31,760 --> 00:36:39,770 that bias in the study, in that study design, did not really affect the conclusion in this particular instance. 269 00:36:40,910 --> 00:36:45,920 So biases in that sense can be helpful. 270 00:36:46,850 --> 00:36:50,910 So it's important to be clear about biases and it's 271 00:36:51,420 --> 00:36:58,020 important to go beyond that and understand how the imperfections will have influenced the results. 272 00:36:58,770 --> 00:37:05,130 So I tell my students, for example, I don't expect you to do the perfect study, 273 00:37:05,520 --> 00:37:15,000 but I do expect you to understand in great detail the imperfections of what you've done and how they may have altered, 274 00:37:15,480 --> 00:37:20,910 if at all, and by how much, the results you've obtained. 275 00:37:21,900 --> 00:37:26,070 Okay. Moving on swiftly. Okay. 276 00:37:27,000 --> 00:37:32,130 Randomised controlled trials: overblown? I could talk for a long time about randomised controlled trials. 277 00:37:32,280 --> 00:37:43,140 I've only done, I think, three in my life and in fact it's the only paper I ever got in the New England Journal of Medicine. 
278 00:37:46,380 --> 00:37:50,580 But there are challenges: they're expensive, they take time. 279 00:37:51,150 --> 00:37:54,000 There's governance, recruitment and retention, etc., etc. 280 00:37:54,660 --> 00:38:03,480 And there is a big debate between doing a randomised controlled trial and a longitudinal observational study. 281 00:38:04,500 --> 00:38:19,380 And you know, we worship the god of the randomised controlled trial and, despite all their difficulties, people in clinical practice would say, 282 00:38:20,130 --> 00:38:27,690 well, I know we have to look at the best evidence, and the best evidence is a meta-analysis of clinical trials, 283 00:38:28,950 --> 00:38:35,370 because we've all read the Cochrane reviews. Actually, I must tell you, I was taught by Archie Cochrane. 284 00:38:36,870 --> 00:38:45,180 Interesting guy, very interesting guy. Areas Colour wrote in his lapel. 285 00:38:45,900 --> 00:38:47,580 Didn't make him interesting. I just found that. 286 00:38:49,420 --> 00:39:04,710 But one wonders, particularly in treatment trials, which is a more useful question. Is drug A better than drug B? 287 00:39:05,310 --> 00:39:10,830 Is treating patients with surgery better than treating people with physiotherapy? 288 00:39:11,250 --> 00:39:18,780 Is treating people with CBT on the telephone better than having individual CBT? 289 00:39:20,070 --> 00:39:26,550 Yeah, we know that all these things are probably useful, but we're keen to have a single answer. 290 00:39:27,660 --> 00:39:39,000 But maybe the more important question practically is which patients, which populations will do better with A and which with B? 291 00:39:39,420 --> 00:39:48,960 Because actually, in today's world, when we do these comparative studies of treatment, actually they're probably all effective in somebody. 292 00:39:50,460 --> 00:39:56,520 One of my spare time occupations is I chair the appeals panel for NICE.
293 00:39:57,870 --> 00:40:09,360 And NICE's decisions are on whether we approve or don't approve a drug or technology for use in the NHS. 294 00:40:10,620 --> 00:40:19,799 And the appeal always comes down not on whether a drug works or not, or even whether it can be shown to be cost 295 00:40:19,800 --> 00:40:27,300 effective or not, because in some people it will work and in some people it will be cost effective, 296 00:40:27,570 --> 00:40:31,050 but in others it won't work and in others it won't be cost effective. 297 00:40:31,740 --> 00:40:39,030 So on the whole, it may be that it is not more cost effective, or does not work better. 298 00:40:39,360 --> 00:40:46,740 But the drug company or the patient advocacy group or the health care professional advocacy groups say, 299 00:40:46,980 --> 00:40:52,650 but if we knew the subgroup in whom it worked, then we should use the drug. 300 00:40:53,850 --> 00:40:56,940 And now we can show you clinical trials, 301 00:40:56,940 --> 00:41:07,740 we can show you observational studies where we've done the subgroup analysis and shown that in this subgroup it's great and therefore, NICE, use it. 302 00:41:07,920 --> 00:41:12,270 And we couldn't do a clinical trial of every possible subgroup because we didn't know 303 00:41:12,270 --> 00:41:16,200 what subgroups there were to start off with and it would have been too expensive. 304 00:41:16,920 --> 00:41:20,310 But of course, the statisticians say, well, this is post-hoc reasoning. 305 00:41:20,850 --> 00:41:28,380 If you didn't start off your analysis setting out to look at this subgroup, 306 00:41:28,800 --> 00:41:36,270 don't come to us as pure statisticians and say, now it works in a subgroup. 307 00:41:37,110 --> 00:41:45,929 But I just wonder, folks, I just wonder, in fact, in reality, whether, actually accepting that most treatments will work
308 00:41:45,930 --> 00:41:50,700 in some people, the challenge is not "A better than B, p less than 309 00:41:50,740 --> 00:41:53,930 0.05", or even p less than 0.01. 310 00:41:55,180 --> 00:41:59,440 Because it could be 48% and 52% like the referendum. 311 00:41:59,960 --> 00:42:03,320 And so I think and I didn't I didn't I didn't make an amendment. 312 00:42:04,150 --> 00:42:08,220 But it's actually in whom it works. 313 00:42:08,380 --> 00:42:13,870 And it's really interesting to me because for years I was a practising clinician and I was impressed. 314 00:42:14,020 --> 00:42:19,600 There were drugs I used that worked in some patients but didn't work in others, 315 00:42:19,810 --> 00:42:24,400 but different drugs worked in others and didn't work in the other group. 316 00:42:24,940 --> 00:42:28,690 And if I'd done a clinical trial, it might have told me something about the order 317 00:42:28,690 --> 00:42:32,860 I should use the drugs in, but it doesn't necessarily tell me how it works. 318 00:42:33,130 --> 00:42:41,020 And maybe we need to rebalance some of the questions we ask in terms of doing the good longitudinal study. 319 00:42:41,290 --> 00:42:50,830 And that doesn't mean to say that there's any compromise on quality. To do a high quality 320 00:42:51,010 --> 00:42:57,130 longitudinal observational study that enables you to do a proper subgroup analysis, 321 00:42:57,400 --> 00:43:10,900 you need to apply the same rigour in sampling, in recruitment, in retention, in methodological quality, reliability, validity, etc. 322 00:43:11,110 --> 00:43:20,890 It's not an opportunity to do a study on the cheap, but it's maybe an opportunity to do something that's perhaps more useful. 323 00:43:21,970 --> 00:43:34,240 In the last four or five minutes, I just want, just out of interest, to talk a little bit about why we need epidemiology anyway. 324 00:43:35,230 --> 00:43:52,390 So what actually is a cause?
Two weeks ago I was in Hiroshima and I heard some harrowing stories, harrowing testimony about radiation damage. 325 00:43:52,930 --> 00:44:01,330 You didn't need an epidemiologist to know that radiation causes severe radiation burns. 326 00:44:01,540 --> 00:44:05,350 It was both necessary and sufficient. 327 00:44:08,110 --> 00:44:17,950 When I was at school, this came out: thalidomide and foetal limb damage, which had never been seen before. 328 00:44:18,040 --> 00:44:24,860 And thalidomide was advertised as a drug you can safely take in pregnancy to have a good night's sleep. 329 00:44:26,080 --> 00:44:29,110 That's how it was advertised in the medical journals. 330 00:44:31,870 --> 00:44:36,610 You never got this condition of phocomelia unless you'd taken thalidomide. 331 00:44:36,620 --> 00:44:38,320 It hadn't been heard of before. 332 00:44:39,220 --> 00:44:48,280 So the drug was necessary, but it wasn't sufficient because most women who took it, fortunately, didn't end up with a deformed child. 333 00:44:51,370 --> 00:44:59,290 We know, in fact, from Hiroshima that a lot of people who were irradiated there developed leukaemia. 334 00:44:59,920 --> 00:45:06,310 But we do know that actually people get leukaemia without being irradiated and not everybody. 335 00:45:10,390 --> 00:45:13,420 You can get leukaemia without being irradiated. 336 00:45:13,660 --> 00:45:19,030 But if you'd had sufficient radiation, then the rates of leukaemia were very, very high. 337 00:45:19,420 --> 00:45:24,100 So it was sufficient to cause the disease, but not necessary. 338 00:45:24,580 --> 00:45:32,500 But that's not the world we live in. We live in a world of a mass flu vaccine and narcolepsy. 339 00:45:32,980 --> 00:45:36,100 And I'm sure after a day on your course, have a good. 340 00:45:36,100 --> 00:45:42,490 It is. Maybe many of you suffer from. No, actually, amazingly, no narcolepsy.
341 00:45:43,540 --> 00:45:44,109 And actually, 342 00:45:44,110 --> 00:45:56,170 this was something I sat on a panel about, when they had this mass flu vaccine for H1N1, and whether it caused narcolepsy. 343 00:45:56,560 --> 00:46:05,090 But of course, you know, most people who develop narcolepsy have not recently had a flu vaccine. 344 00:46:05,110 --> 00:46:13,510 We knew narcolepsy existed. And most people who'd had the vaccine didn't get narcolepsy. 345 00:46:14,350 --> 00:46:19,630 So, you know, most cases of narcolepsy were not immunised, and most people immunised did not get narcolepsy. 346 00:46:19,990 --> 00:46:29,530 So in this case, immunisation may be a cause but not be the cause. 347 00:46:30,460 --> 00:46:36,070 And so, finally, just very briefly, as we are in Oxford: 348 00:46:36,070 --> 00:46:50,620 we often look to understand causes with Bradford Hill. And Bradford Hill, who was a medical statistician, who was the first person 349 00:46:50,660 --> 00:46:55,790 to link smoking with lung cancer, said, okay, 350 00:46:55,790 --> 00:47:05,600 there are many cases like the flu vaccine and narcolepsy where it's not obvious and causes are not sufficient or necessary. 351 00:47:06,050 --> 00:47:10,580 But let me give you some criteria. The effect is strong. 352 00:47:11,330 --> 00:47:21,890 Not true. We know that what modern genetics has told us is that genetic effects can be very real, because they're biological, but often very weak. 353 00:47:24,470 --> 00:47:35,810 The effect is consistent, which makes it more likely. Actually not true, because often we get inconsistent results because the research quality was not good. 354 00:47:37,550 --> 00:47:48,620 There is a specificity about the relationship between the perceived exposure and the predicted outcome. 355 00:47:48,860 --> 00:47:56,360 But actually that is often not the case.
Many things, like obesity or cigarette smoking, for example, associate with many outcomes. 356 00:47:58,190 --> 00:48:02,090 What about dose response? Well, again, not true. 357 00:48:02,300 --> 00:48:10,160 It may be that some things have a threshold. Other evidence is coherent. 358 00:48:10,670 --> 00:48:14,450 Well, it may be coherent. It may not be. 359 00:48:16,850 --> 00:48:24,650 Is it biologically plausible? Is it biologically plausible that this vaccine was going to cause narcolepsy? 360 00:48:25,520 --> 00:48:28,550 Actually, you know, probably not. 361 00:48:29,540 --> 00:48:35,960 Probably not. And there are so many areas where this comes up. 362 00:48:36,200 --> 00:48:43,820 Is it biologically plausible that exercise may protect you against having a stroke? 363 00:48:46,220 --> 00:48:55,910 I just pose that question to you. I'm sure you can find some evidence that makes it biologically plausible, but you might find evidence against it. 364 00:48:57,140 --> 00:49:07,220 But what Bradford Hill goes on to say is that ultimately we need experimental proof, which relies on a randomised clinical trial. 365 00:49:07,610 --> 00:49:12,050 And after attending this talk you probably think that is not a route that you will go down. 366 00:49:12,650 --> 00:49:19,280 And on that happy note, I will take criticisms or even missiles thrown at me. 367 00:49:19,310 --> 00:49:20,150 Thank you very much.