I'm going to talk a little bit about the stepped wedge design, but not an awful lot about the stepped wedge design itself, because one of the important things to think about is what type of study design we should be using. So I want to make sure that I give you the whole menu of options, so you can think about all the different possibilities, and then come at the end to some recommendations for when the stepped wedge design might be a good design choice.

So I'm going to talk about health policy evaluation, and that's because this is where the stepped wedge design comes into its own. It helps us to evaluate policies. Or at least that's the intention.

So why evidence-based policy? Well, I'm convinced that we need to evaluate different types of policies. We've been evaluating medicinal products, say pharmaceutical interventions, for decades. We all accept that if we're going to use a drug, then we should have robust randomised evidence behind the application to use that drug. But when it comes to health policy evaluation, we typically accept a much lower quality of evidence. I'm going to put up some very poor study designs, and I'm going to try to persuade you that these are not the designs we should be choosing. But actually the evidence base is littered with these very poor study designs. And if we're using a poor study design, we can't be certain whether the intervention, the policy that we're looking at, works or doesn't work, because the study design is open to these different types of biases.
And this is the sort of work I'm trying to advocate. This is an editorial in JAMA from just a month or so ago, and they were also advocating that we should evaluate these health policy interventions in a robust way, rather than just relying on observational, case-study types of evidence.

So what do we mean by policy interventions? Well, I'm going to give you some examples, but it's really wide-ranging. Here I've put up some examples: things like pay for performance, incentives for healthy behaviour, workplace well-being programmes. Those are the sorts of things I'm thinking about.

So what I'm going to do in this talk is, first of all, come up with what I'm saying is the definition of a health policy intervention. And I'm going to try to carefully differentiate that from something called quality improvement, because I think the literature gets quite messy when it tries to work out what is a policy evaluation and what is quality improvement. I'm going to give a little tour around the hierarchy of evidence, so you know what sort of evidence you should be aiming for. And then I'm going to think about the suitability of cluster randomisation, and look at different types of cluster randomised trials.
I'm going to think about when the stepped wedge design is appropriate, and then I'm going to think about the different types of risks of bias that we get with these different studies. Because even though I'm talking about randomised designs, there is still the possibility that the designs can be at risk of bias. This is particularly important with cluster trials.

So my objectives, then. This really is the underlying message of my talk: correlation doesn't mean causation. Just because we have a study where they've shown something is correlated with something else, that doesn't mean causation. We know that. But we mustn't lower our standards when it comes to interpreting these policy-based evaluations. We wouldn't lower our standards if we were looking at a drug intervention, so why should we do so when we're looking at policy interventions? So every time I put a case study up, I'm going to come back to this, to think about whether we have established correlation or whether we have established causation.

So what's quality improvement, then? How does quality improvement differ from a policy intervention? There are lots of different definitions in the literature of what quality improvement is and what policy evaluation is. Some of the definitions talk about whether or not whatever you're trying to do is generalisable. Some definitions say that if you're doing a local evaluation, you don't need ethics approval.
Now, that's one problem when we come to determine what sort of evaluation we're doing. But the other problem is that those people doing quality improvement evaluations typically have less funding, and they're using lower quality designs.

So I like to think about the objective of what we're doing. Are we trying to determine whether we can improve outcomes in my hospital, or whether this intervention improves outcomes? I'm saying that if we're interested in working out whether we can improve outcomes in one hospital, then that's quality improvement. But if we're trying to make inferences about whether this intervention improves outcomes, then that's a policy intervention.

Different people want to do different things. You may be working in a hospital and be really interested in working out how you can improve things in your hospital by improving quality. It's then less important for you to identify what the core component of your intervention is that's making the improvement. If you are only interested in improving outcomes, you're not so concerned with what the crucial ingredient is, but rather with whether you are improving outcomes.

If we're trying to improve outcomes in a hospital, let's call that quality improvement, which starts at a lower level. We'd make some changes. We'd look to see whether those changes had improved any outcomes.
We'd refine things, we'd talk to people, we'd try to do things better. We'd re-implement what we had done, we'd make some changes, and then we'd look at our outcomes again. We'd talk to people again. That's quality improvement. If we were looking at a system-wide quality improvement, we might look at particular hospitals, try to identify which hospitals were not performing very well, and try to differentiate special cause variation. But that's all quality improvement. That's all about trying to improve outcomes, rather than trying to make inferences about what the key ingredient is that's improving the outcomes.

So I'm not interested here in quality improvement, but I think it's important to differentiate it, because many of the studies I'm going to be looking at call themselves quality improvement interventions. I don't think they are quality improvement interventions; they really are policy evaluations, and they're using quite low quality evidence.

So back to policy evaluation, then. What I'm saying here is that causal inference is the focus. We really want to know whether this intervention works. It might be that we're interested in knowing whether the intervention works in one hospital. It might be that we're interested in making more generalisable statements.
But whatever our inference, whether it's about local or generalisable results, it's all about causal inference.

So when we want to evaluate these different types of health policy interventions, we've got this hierarchy of study designs we can choose from. Some are at low risk of bias and others are at high risk of bias. Now, of course, we want to choose the design that has the lowest risk of bias. But sometimes, because of the nature of the intervention and the nature of the setting, we might not have the flexibility to choose the design with the least risk of bias. And that might be because we only have one cluster available. If we've only got one cluster available, we can't do a cluster randomised trial, because we've only got one thing to randomise. But we have other options available, called interrupted time series analyses, and I'll show you an example of one of those in a moment. If we've got many clusters available, then we've got a lot more options open to us, and we can start to choose a design with a lower risk of bias. So what I'm really saying here is that, depending on the context, sometimes you might only have the option of a study design that's at a higher risk of bias. But if you're in a situation where you have many clusters available, then we really should be trying to choose the study design that has the least risk of bias.
And I'm going to try to share with you my ideas on how we can choose that design.

So I'm going to run through some different examples, all evaluating health policy interventions, and all using different types of study designs. I've chosen the examples so that we work through the hierarchy of evidence, starting off with a design that is quite low quality, and working up to show you that it is actually possible to evaluate these interventions using high quality evidence.

So this is the first example. Now, they call this study a before and after study. The before and after study sits at the very bottom of the hierarchy. What they wanted to do was to try to improve outcomes in women who were having babies, by making sure that they were monitoring the amount of iron that the women have. So they introduced a whole heap of different types of resources, basically educational resources aimed at clinicians. It was multifaceted, more complicated than simply education, but essentially it was about raising awareness of this condition. The population of interest were pregnant women who were in the delivery clinic and in the labour ward. And then they took data pre and post this intervention phase.
So the data spanned from around January 2012 to around December 2016, in one hospital, and from the electronic patient record they obtained information on the outcomes. These were the rates of ferritin testing, that is, iron testing, and the proportion of women who were anaemic. So those were the two outcomes. Now, it's very arguable that they could have done this evaluation in more than one hospital, but they constrained themselves to evaluate it in one hospital only.

So let's see what they did. They say it was a before and after study. This little diagram here is an example of what a before and after study typically looks like: the light blue represents the control condition, the dark blue represents the intervention condition, and I'm going to use these same diagrams throughout the talk. Each row represents a cluster, and a cluster in this situation is the hospital. This was carried out in one hospital, so we've got one hospital. That hospital is observed for a period of time, just doing things as they normally do. They then put the intervention in place, which happens here at month 12, and then they change the way they do things: they carry out the educational workshops. Then they have a little transition period in here, a gap in the design, where they accept that they're just rolling out the training, and then they carry on collecting the post data.

Now, I've drawn this diagram with one cluster and 24 months' worth of data collection.
Actually, in this example they had a little bit more data than this, and they probably had this sort of transition period in the middle where they implemented the intervention.

So there are different ways of analysing data from a design like this. You can take all of the pre data and all of the post data: you pool all of the pre data, you pool all of the post data, and then you're just comparing two numbers. The two numbers they might be comparing relate to the outcomes: the first was the rate of ferritin testing, and the second was the proportion of women who were anaemic. So they might simply work out the proportion of women anaemic in the pre period and the proportion of women anaemic in the post period, and then compare the two. But that's quite wasteful of the data; it's just pooling everything together. And if you do a pre-post analysis like that, you're opening yourself up to a whole range of different types of biases. Anything could have happened at that point in time other than your intervention. Maybe the date when you rolled out your intervention coincidentally happened to coincide with something else, perhaps another policy that the government or the local health authority had decided to implement. Other things could have happened at that same point in time.
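As a minimal sketch of that pooled pre/post comparison (the counts below are made up for illustration, not the study's data), the whole analysis reduces to comparing two proportions:

```python
from math import sqrt

def pre_post_comparison(pre_events, pre_n, post_events, post_n):
    """Pooled pre/post comparison of two proportions (illustrative only).

    Returns the risk difference and a 95% normal-approximation CI.
    Note: this comparison cannot distinguish the intervention's effect
    from anything else that changed at the same point in time.
    """
    p_pre = pre_events / pre_n
    p_post = post_events / post_n
    diff = p_post - p_pre
    se = sqrt(p_pre * (1 - p_pre) / pre_n + p_post * (1 - p_post) / post_n)
    return diff, (diff - 1.96 * se, diff + 1.96 * se)

# Hypothetical counts: anaemic women before vs after the rollout
diff, ci = pre_post_comparison(pre_events=180, pre_n=1200,
                               post_events=120, post_n=1300)
```

The point of the sketch is how little information survives the pooling: all the monthly structure in the data collapses into two numbers, which is exactly why this design is so exposed to co-interventions and secular trends.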
Simply by showing that there's a difference pre and post, we don't know whether that's because of the intervention we're trying to evaluate, or because something else external to our study happened.

Now, in this study they called it a before and after study, but it wasn't really a before and after study in that sense. That's the worst sort of before and after study we can do: pooling the pre and the post data and just taking the difference. They did something a little bit better than that, although they didn't give themselves credit in the title, because they called it a before and after study. They did the next best thing, and that's called an interrupted time series. In an interrupted time series, you monitor the system for a long period of time. You then put your intervention in place, and then you carry on monitoring the system. What you hope to be able to show, and this is the data from that study, is that when you implemented your intervention at that point in time, something happened to the system. So in this example, this is the rate of ferritin testing for the study, plotted against the month of the study, and this is the date they implemented their intervention. They saw quite a sudden change, and a large change, in the rate of ferritin testing in this whole population. The other thing that you can do is monitor what we might call the trend.
So this series isn't completely flat; there's some sort of trend going on here. Things are increasing a little bit, maybe not by a huge amount, but there are hints that things are increasing. So you might also try to work out not only whether there is a sudden shift, a sudden impact of the intervention, but whether it might have changed the trend. They didn't do this; they didn't look to see whether there was a change in trend. But you could do that.

Now, in this example they were lucky: they saw a huge impact of their intervention. It's probably quite hard to doubt that the change they saw was due to the intervention. But unfortunately, most of the time when we're evaluating these types of interventions we're not so lucky. We don't see such big changes, and actually smaller changes are probably just as important. Maybe they had only managed to shift things by this amount here. That would have been harder to detect on that graph, but nonetheless it could have been clinically important.

So if you use this type of study design and you're in the lucky situation of evaluating an intervention that really works, you're probably going to be all right, because you'll see a big shift or change in this type of analysis. But actually, most of the time we're looking for smaller changes.
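The shift-and-trend analysis just described is usually fitted as a segmented regression: an underlying trend, a level shift at the intervention date, and a change in trend afterwards. A minimal sketch with simulated monthly rates (my own data-generating assumptions, not the study's data), using ordinary least squares from numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated monthly testing rates: a gentle secular trend, then a level
# shift and a steeper trend after a hypothetical intervention at month 24.
months = np.arange(48)
post = (months >= 24).astype(float)             # 1 after the intervention
time_since = np.where(months >= 24, months - 24, 0)
rate = 10 + 0.1 * months + 8 * post + 0.3 * time_since \
       + rng.normal(0, 1, months.size)

# Segmented regression: intercept, secular trend, level shift, trend change
X = np.column_stack([np.ones(months.size), months, post, time_since])
beta, *_ = np.linalg.lstsq(X, rate, rcond=None)
intercept, trend, level_shift, trend_change = beta
```

A real interrupted time series analysis would also deal with autocorrelation in the monthly series (for example with ARIMA-type errors), which this sketch ignores; the point here is just the three quantities being estimated.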
And if it's a smaller change, it then becomes hard to differentiate the smaller change from the underlying secular trend. But the more data you get pre and the more data you get post, the better you can establish the underlying trends in the system, and that then allows you to detect these sorts of changes.

The other issue with this type of design is that it's still carried out in one hospital, so we don't know whether the intervention would work in a different setting. Maybe the people who instigated the intervention in this one hospital were really interested in this area, really advocated it, and it was really the people that made the change rather than the intervention. If you take it somewhere else, you don't know whether it's going to have the same effect, because you then have to take that person factor away. So you don't know from a study in one hospital whether the result is generalisable. That's the other limitation of this type of single centre, pre and post evaluation.

But we can do one better, and that's called the multiple baseline design. I don't see this very often in the literature; I don't think it gets used much at all. If you've seen one, I'd be interested for you to share your examples. But essentially what the multiple baseline design does is replicate this interrupted time series type of analysis in multiple centres. So this isn't a real example.
This is just an artificial example. But you run your interrupted time series in one centre, the one called community A at the top, and you stagger the times at which you roll the intervention out into practice. It's not randomised; it just happens. But typically, if you're trying to evaluate an intervention that is just going to be rolled out, people will do it as and when they choose. So typically you'll find that community A or hospital A will do it at a different time to hospital B, and so on. So you end up with a staggered roll-out of the intervention, and again you try to collect a large amount of data pre and a large amount of data post, so you really can observe what's happening in the system in the absence of the intervention. You then hopefully are able to show that in every single one of these different centres or communities, you see an impact around the time the intervention is put in place. Now, this is an artificial example, so again you'd have to be really lucky: the intervention would have to have quite a large impact, and that impact would have to be consistent across the centres. But if you do have an intervention that has quite a large impact, and it's rolled out in several centres or several clusters, then that provides quite convincing evidence, if you manage to observe an effect.
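That staggered roll-out, with the same interrupted time series fitted separately in each centre, might be sketched like this (simulated data; the centre names and rollout dates are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
months = np.arange(36)

# Staggered, non-randomised rollout dates, one per centre
rollout = {"community A": 12, "hospital B": 18, "hospital C": 24}

level_shifts = {}
for centre, start in rollout.items():
    post = (months >= start).astype(float)
    since = np.where(months >= start, months - start, 0.0)
    # Same data-generating sketch in each centre: trend + level shift at rollout
    y = 20 + 0.05 * months + 5 * post + rng.normal(0, 1, months.size)
    # Per-centre segmented regression, each with its own rollout date
    X = np.column_stack([np.ones(months.size), months, post, since])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    level_shifts[centre] = beta[2]  # estimated level shift in this centre
```

What makes the design persuasive is that each centre's shift appears at its own rollout date: a co-intervention or secular shock would have to hit three different centres at three different times to mimic that pattern.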
And the way you analyse this type of study is to run your interrupted time series analysis in each of the different centres, determine what the shift changes and the trend changes are in each of the different centres, and then summarise that, probably narratively.

But let's go back to the ferritin testing example, in the single centre, where they were trying to evaluate whether this educational policy, around trying to prevent women being anaemic at the time of their birth, really improved things, whether the intervention managed to improve the outcomes in the women. Well, it's difficult. It's probably okay in that example, because we saw this big impact of the intervention. But in most situations, that type of interrupted time series design would leave us questioning whether the change we saw reflected our intervention, or whether it was just coincidental that something else had happened at the same time as we put our intervention in place. Now, you might think it's unlikely that someone else is going to choose to do something just as we're doing our intervention; that's not really going to happen, is it? Well, actually, it does happen, because there tend to be things happening in the background, and those background events are possibly the very reasons why these investigators decided to evaluate this intervention.
It possibly was not their own idea; they were actually reacting to something in the larger system, to what people were talking about, this being an issue. So it does happen. And then the other thing in this example is that, yes, they saw a big intervention effect. If they were interested in smaller effects of their intervention, it's much harder, using this type of study design, to conclusively say that the intervention caused the impact.

So it's quite low on the hierarchy of evidence, and I tend to think of this as correlation, not causation. If it's the only thing you can do, it's probably reasonable, especially if you do a multiple baseline design, where you replicate the same thing in multiple centres.

So this is case study two. Now, this is a stepped wedge design. Again, it's a sort of quality improvement, educational type of intervention; they call it a quality improvement package. I think it's really the evaluation of a health policy. So it's a stepped wedge design, and first of all I want to tell you what a stepped wedge design is. The stepped wedge design is a cluster randomised controlled trial. That means the cluster is the unit of randomisation. A cluster might be a hospital, it might be a general practice, it might be a school, but it represents some sort of grouping.
Now, each cluster in the stepped wedge, if you're choosing a stepped wedge design, is randomly allocated not to intervention A or intervention B, but rather to a sequence. These are the sequences in the design; in this design here I've got five sequences. So a cluster is randomly allocated to one of these five sequences. If it's allocated to the first sequence, it's observed for a period of time, the first period, under the control condition; light blue again is the control. Then it switches over to the intervention condition in period two, and it remains in the intervention condition for the remaining duration of the study. Say five clusters have been allocated to sequence one, and four clusters have been allocated to sequence two. The same thing happens in sequence two, only those clusters spend two periods in the control condition and four periods in the intervention condition. Sequence three: three periods in the control, three periods in the intervention, and so on. And the number of periods we monitor things for is just equal to one more than the number of sequences in the design. That means that by the end of the study, every cluster will have been allocated to a sequence in which, at least at the very end, it gets the intervention condition.

So that's what the stepped wedge design is. Now, there are lots of different variations; there are no hard and fast rules, so designs tend to fit themselves to practice.
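The layout just described, with one more period than sequences and every cluster finishing on the intervention, can be written down as a small design matrix (0 = control, 1 = intervention), sketched here for the five-sequence case:

```python
import numpy as np

def stepped_wedge(n_sequences: int) -> np.ndarray:
    """Basic stepped wedge layout: rows are sequences, columns are periods.

    Sequence s spends s periods under control (0), then crosses over to
    the intervention (1) for the rest of the study, and there is one more
    period than there are sequences.
    """
    periods = n_sequences + 1
    design = np.zeros((n_sequences, periods), dtype=int)
    for s in range(n_sequences):
        design[s, s + 1:] = 1  # sequence s+1 switches after s+1 control periods
    return design

design = stepped_wedge(5)
# Row 1: [0, 1, 1, 1, 1, 1]  -- one control period, then intervention
# Row 5: [0, 0, 0, 0, 0, 1]  -- intervention only in the final period
```

In the first column every sequence is under control, and in the last column every sequence is under the intervention, which is exactly the "every cluster eventually gets it" property described above.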
So it can be the case that you have a design where this period here is missing, or that this period here is missing. There are lots of different ways they can be implemented. You can have an unequal number of clusters across the sequences. You might even have more periods at the end. But this is just the basic format.

The other thing that often happens is that you have a transition period somewhere: a period of time where we roll out the intervention, which just allows us time to put the intervention in place. And if we have a small period of time where we are rolling out the intervention, we might not include that data in our analysis, recognising that it probably doesn't tell us how the intervention is working, because the clusters haven't fully received the intervention just yet.

So that's the basic type of design. Now, every cluster gets the intervention condition, and that is an appeal of this study design, because sometimes, when you're working with clusters, hospitals, GP practices, there can be a desire from the stakeholders, and by stakeholders I mean the people you have to engage with in order to get the cluster to participate in your trial. Those stakeholders often have beliefs about whether the intervention is going to work. Sometimes they might not know anything about the intervention, but they just think that because it's new, it must be good.
254 00:25:52,380 --> 00:25:55,910 So there can often be this desire for everyone to get the intervention. 255 00:25:56,370 --> 00:26:03,150 And this design has the appeal that every cluster gets the intervention, and that's why it's becoming appealing. 256 00:26:04,870 --> 00:26:11,050 But not every individual gets the intervention, because, depending on how you run your study, 257 00:26:11,440 --> 00:26:19,750 if our study is around women giving birth, and we're collecting outcomes on whether or not the women are anaemic, 258 00:26:21,400 --> 00:26:24,850 we'll have different women giving birth in each of these different periods. 259 00:26:25,720 --> 00:26:33,330 So for women who are giving birth over here, none of these women will have been exposed to the intervention by this time. 260 00:26:33,340 --> 00:26:39,220 If the woman is having her baby in the first four hospitals, she will be exposed to the intervention. 261 00:26:39,940 --> 00:26:44,470 But if the woman is having her birth in any of these hospitals, 17 to 20, 262 00:26:44,830 --> 00:26:49,720 if she is having her baby anywhere over here, she will not be exposed to the intervention. 263 00:26:50,440 --> 00:26:54,610 So all clusters get the intervention, but not all people get the intervention. 264 00:26:57,000 --> 00:27:03,360 So the first appeal is a sort of social appeal: that, after all, the clusters all get the intervention. 265 00:27:04,700 --> 00:27:13,490 But the other appeal is that, because every cluster is observed under the control condition and then the intervention condition, 266 00:27:14,270 --> 00:27:17,720 we can balance any cluster level characteristics.
267 00:27:18,740 --> 00:27:24,050 So let's suppose that these hospitals at the top, in sequence one, are maybe teaching hospitals, 268 00:27:25,040 --> 00:27:31,759 and these hospitals down here at the bottom are, just by chance, not teaching hospitals, 269 00:27:31,760 --> 00:27:39,140 just sort of regular community, local hospitals. Now, teaching hospitals and local hospitals tend to have lots of differences between them. 270 00:27:39,650 --> 00:27:42,680 They'll have different types of doctors. They'll have different types of patients. 271 00:27:43,850 --> 00:27:51,260 So if these hospitals down here were allocated to the control condition, and only the control condition, 272 00:27:51,650 --> 00:28:00,500 and they happened to be the local hospitals, and the hospitals at the top, the teaching hospitals, got the intervention condition, 273 00:28:01,010 --> 00:28:04,760 then we wouldn't know, if we observed a difference in their outcomes, 274 00:28:04,760 --> 00:28:13,820 whether it was because of our intervention or whether it was because of this teaching hospital status; we'd say that was a confounder. 275 00:28:15,300 --> 00:28:22,200 The appeal of the stepped wedge design is that every hospital gets observed under both conditions. 276 00:28:22,950 --> 00:28:26,850 So the teaching hospitals get observed under the control and intervention, 277 00:28:27,360 --> 00:28:31,560 and the local hospitals get observed under the control and the intervention. 278 00:28:31,740 --> 00:28:35,550 So that increases the chance that things balance a little bit. 279 00:28:36,540 --> 00:28:42,450 But the design has a huge caveat, and that is, if you look at this dark blue, 280 00:28:43,560 --> 00:28:48,420 that dark blue is all data that we've collected under the intervention condition. 281 00:28:49,050 --> 00:28:53,520 And the light blue is the data that we've collected under the control condition. 282 00:28:54,480 --> 00:28:58,870 What do we notice about the dark blue and the light blue?
The time shift? 283 00:28:58,890 --> 00:29:01,830 Yeah. This whole thing has been time shifted. 284 00:29:02,250 --> 00:29:12,600 All this dark blue data, this intervention data, is collected at a systematically later calendar time than this control data. 285 00:29:13,530 --> 00:29:23,190 So that means that when we thought about these time confounders in the before and after study and in our interrupted time series analysis, 286 00:29:23,460 --> 00:29:28,860 I didn't mention the words time confounder there, but really we were worried about time confounders. 287 00:29:29,460 --> 00:29:35,010 A time confounder is a problem here. This design is confounded on time. 288 00:29:36,510 --> 00:29:45,060 You would never, in an individually randomised controlled design, choose a study design that's confounded with something by design. 289 00:29:45,840 --> 00:29:49,560 The whole notion of randomisation is to remove confounders. 290 00:29:50,070 --> 00:29:54,240 Yet this design has a confounder in it, and the confounder is time. 291 00:29:55,020 --> 00:30:00,000 So we have to be really careful when we use it, because this is a really big limitation of the design. 292 00:30:01,650 --> 00:30:10,800 There are things that we can do, because we're collecting data over a long period of time for all of the clusters. 293 00:30:11,370 --> 00:30:16,770 This data here essentially allows us to estimate what the time trend is. 294 00:30:17,370 --> 00:30:27,150 So we use all the light blue data, we estimate the underlying secular trends, and then we can adjust them back out of our analysis. 295 00:30:27,750 --> 00:30:31,290 So just like in a regular type of design, if you know something is a confounder, 296 00:30:31,290 --> 00:30:37,139 maybe in an observational study, you know that age is a confounder, you measure age and you adjust for it. 297 00:30:37,140 --> 00:30:39,120 So then you remove its impact. You hope.
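To see why this adjustment matters, here is a toy simulation of my own (not from the talk): the true intervention effect is zero, but a shared downward secular trend makes a naive treated-versus-control comparison look like a benefit, while a regression with period effects recovers the truth.

```python
# Toy stepped wedge with 5 sequences, 6 periods, one cluster per sequence.
# The true intervention effect is 0; the outcome falls by 1 per period in
# every cluster (a shared secular trend).
import numpy as np

rng = np.random.default_rng(0)
n_seq, n_per = 5, 6
treat, period, y = [], [], []
for seq in range(1, n_seq + 1):
    for t in range(n_per):
        treat.append(1 if t >= seq else 0)   # stepped wedge switch pattern
        period.append(t)
        y.append(10.0 - 1.0 * t + rng.normal(0, 0.1))

treat, period, y = map(np.array, (treat, period, y))

# Naive comparison: intervention data sit at later calendar times, so the
# downward trend masquerades as an intervention benefit.
naive = y[treat == 1].mean() - y[treat == 0].mean()

# Least squares with period fixed effects: y ~ intercept + treat + period dummies.
X = np.column_stack([np.ones_like(y), treat] +
                    [(period == t).astype(float) for t in range(1, n_per)])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"naive estimate:    {naive:.2f}")    # strongly negative despite no effect
print(f"adjusted estimate: {beta[1]:.2f}")  # close to the true value of 0
```

The adjusted estimate is only trustworthy under the assumption discussed next, that the secular trend is shared across clusters; the simulation builds that assumption in by construction.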
298 00:30:40,370 --> 00:30:50,660 You can do the same sort of thing here, but you have to make the assumption that the time trend is the same across all of the clusters. 299 00:30:52,270 --> 00:31:00,340 So you have to make the assumption that whatever is happening in cluster 20, and this is hospital number 20, 300 00:31:01,330 --> 00:31:04,450 some things are happening in the background that you've got no control over, 301 00:31:04,460 --> 00:31:11,440 we call that the secular trend. In order to adjust out the impact of the secular trend in this design, 302 00:31:11,920 --> 00:31:19,690 you have to make the assumption that that secular trend in cluster 20 is the same as the secular trend in cluster 17, 303 00:31:20,050 --> 00:31:27,520 the same as the secular trend in cluster ten, and so on. And that's the only way you can manage to adjust for the secular trends. 304 00:31:28,590 --> 00:31:32,550 It's a big assumption. When we use that design, we have to make that assumption. 305 00:31:32,880 --> 00:31:37,230 You can't really test that assumption, because you don't have enough data to test 306 00:31:37,230 --> 00:31:41,820 it on. Because if you wanted to test that assumption in cluster one at the top, 307 00:31:42,660 --> 00:31:47,520 you look at cluster one: you've observed it mostly under the intervention condition. 308 00:31:48,450 --> 00:31:51,780 So in cluster one, you don't have any data on the secular trend. 309 00:31:51,840 --> 00:31:54,300 You only have data on the secular trend down here. 310 00:31:56,030 --> 00:32:04,400 Now maybe you might be able to do something to test whether this secular trend is the same as that secular trend, or whether cluster two's is the same as cluster nine's or ten's. 311 00:32:05,090 --> 00:32:13,070 But then you get into small sample issues. You can't differentiate sampling variation from true differences. 312 00:32:14,080 --> 00:32:18,550 So it's really difficult to test. So let's go back to the example. 313 00:32:18,940 --> 00:32:24,650 What did they say?
Let's just remind ourselves what the example is. 314 00:32:24,660 --> 00:32:29,850 This is the example. The other thing I should say is all of these studies were published in PLOS Medicine. 315 00:32:30,390 --> 00:32:40,100 So very reputable, high impact journals. So their population, again, they've chosen another women-having-babies example. 316 00:32:41,510 --> 00:32:48,560 They were women who were at 22 weeks gestation or more, having their baby, excluding stillbirths. 317 00:32:49,190 --> 00:32:54,469 This time they looked at 12 hospitals. In our other example we had just got one hospital; now 318 00:32:54,470 --> 00:32:57,830 we have got 12 large public hospitals. And this is in Nepal. 319 00:32:59,850 --> 00:33:03,900 They started collecting, or monitoring the system, in April 2017. 320 00:33:04,140 --> 00:33:08,640 They followed and monitored for 18 months. So a short time period. 321 00:33:08,670 --> 00:33:11,970 It's like our other example, but we've got more hospitals this time. 322 00:33:13,530 --> 00:33:17,130 They have some intervention, which they called a quality improvement package. 323 00:33:17,850 --> 00:33:29,360 They were basically trying to reduce mortality. Again, this intervention was around education, so trying to improve health workers' competency. 324 00:33:30,350 --> 00:33:34,790 A similar type of intervention package, only this time trying to target something else. 325 00:33:35,240 --> 00:33:38,450 But they had higher aims. They were trying to influence mortality. 326 00:33:39,080 --> 00:33:44,180 Now, if they can influence mortality, that impact is going to be much smaller. 327 00:33:44,330 --> 00:33:49,280 Nobody is going to be able to have such a high impact on mortality as we could have 328 00:33:49,280 --> 00:33:53,250 on the rate of ferritin testing or on the proportion of women who are anaemic. 329 00:33:53,270 --> 00:33:59,510 I mean, that's just how things work.
So they're going to be looking for a much smaller, more subtle change in mortality. 330 00:34:00,640 --> 00:34:05,800 So this is their study. On the left hand side, that is a stepped wedge design. 331 00:34:06,310 --> 00:34:15,140 So they had four sequences. They called them wedges; in the CONSORT statement on this, 332 00:34:15,160 --> 00:34:18,580 these are not called wedges, they're called sequences. 333 00:34:18,580 --> 00:34:21,190 But sometimes different authors use different terminology. 334 00:34:21,190 --> 00:34:30,100 But they've got four sequences here and 12 clusters, so there are three clusters allocated to each of the different four sequences. 335 00:34:30,460 --> 00:34:40,420 Although that diagram doesn't clearly show it, there are three clusters for each of the four sequences, and then a straightforward stepped wedge design: 336 00:34:40,810 --> 00:34:45,610 a baseline period where they're all under the control condition, sequential roll out, 337 00:34:46,150 --> 00:34:51,640 and then a slightly elongated step four where they monitor things for a little bit of a longer time period. 338 00:34:53,770 --> 00:34:58,960 And that's their picture taken from that paper. This is the trend in mortality. 339 00:35:00,720 --> 00:35:05,000 So this is week of the study. I'm not sure why 340 00:35:05,010 --> 00:35:08,639 it only goes up to 50 weeks when the study lasted for 18 months; 341 00:35:08,640 --> 00:35:12,890 I'm not quite sure. And this is the mortality rate on the Y-axis here. 342 00:35:13,320 --> 00:35:20,370 They show this downward trend in mortality, and this data has been pooled across these 12 different hospitals. 343 00:35:21,030 --> 00:35:28,859 So there's a bit of noise here. Things are going up and down a little bit, but they've superimposed this line of best fit. 344 00:35:28,860 --> 00:35:32,580 And if you look at that line of best fit, it looks like things are improving.
345 00:35:33,330 --> 00:35:36,630 So maybe the intervention has had an impact on mortality. 346 00:35:38,120 --> 00:35:48,889 This is the conclusion in their paper here, and they say that the incidence of intrapartum-related mortality was 11 per 347 00:35:48,890 --> 00:35:54,110 thousand in the control period and eight in the intervention period. 348 00:35:54,890 --> 00:36:05,150 So what they're saying here is that on average, under this light data here, the control data, the mortality rate was 11 per 1,000, 349 00:36:05,690 --> 00:36:13,670 and in the intervention, the darker shaded area, the mortality was around eight per 1,000. 350 00:36:15,580 --> 00:36:23,770 And then they get the adjusted odds ratio. They call it an adjusted odds ratio, and I forget exactly what they adjusted for, 351 00:36:23,770 --> 00:36:30,820 but they adjusted for things like age of the mother, socioeconomic status, some things like that. 352 00:36:31,570 --> 00:36:40,810 And then they showed, or they reported, an adjusted odds ratio of 0.8, confidence interval 0.72 to 0.92. 353 00:36:41,200 --> 00:36:47,230 So sort of conclusive evidence from that statistical analysis that the intervention works. 354 00:36:48,310 --> 00:36:55,910 Do we believe that they've shown us that the intervention works? Well, they haven't adjusted for the time factor here. 355 00:36:56,660 --> 00:36:58,670 So they didn't adjust for time. 356 00:36:59,360 --> 00:37:08,690 So something else could have been happening in the system that could have caused this change in the mortality rate over time. 357 00:37:09,170 --> 00:37:15,650 And we cannot say whether that is because of our intervention or whether that is because of something else happening.
358 00:37:16,970 --> 00:37:20,870 If we had adjusted for time, or if they had adjusted for time, 359 00:37:21,110 --> 00:37:28,820 if they were willing to openly make the assumption that the secular trends were the same enough across clusters, 360 00:37:29,240 --> 00:37:36,020 if they were happy to make that assumption, and that assumption was transparent, and they had still got that result, 361 00:37:36,020 --> 00:37:40,280 then perhaps we could have believed it. But they didn't adjust for time. 362 00:37:40,490 --> 00:37:44,780 And at the beginning, when we looked at this design, we said time was a confounder. 363 00:37:45,230 --> 00:37:53,780 When you look at that picture, you see that the intervention observations are collected systematically later than the control observations. 364 00:37:54,290 --> 00:37:59,400 So therefore, time is a confounder by design. If you don't adjust for it, 365 00:37:59,940 --> 00:38:08,820 we don't know whether this change in mortality is because of the time effect or whether it is because of the intervention effect. 366 00:38:10,210 --> 00:38:15,640 So I'm not convinced that that's anything other than correlation. 367 00:38:17,580 --> 00:38:21,240 But that's not to say that we can't use the stepped wedge trial. 368 00:38:21,390 --> 00:38:31,440 I've shown you an example that's not brilliant, I don't think. If we adjust for the secular trends, it moves up a little bit in that hierarchy. 369 00:38:32,280 --> 00:38:39,540 And if we're prepared to make that assumption that these clusters have got the same secular trends, 370 00:38:40,260 --> 00:38:44,340 perhaps because we're working with clusters that are all very geographically similar, 371 00:38:45,210 --> 00:38:52,470 maybe we can convince ourselves that that assumption might be all right. If we have got lots of clusters, 372 00:38:52,710 --> 00:38:57,690 this assumption tends to be less important, because things just balance themselves out in the randomisation.
373 00:38:58,350 --> 00:39:06,840 So if we've got lots and lots of clusters, this assumption of the same secular trend in every cluster turns out not to be very important. 374 00:39:07,620 --> 00:39:12,420 In reality, we never have enough clusters for that not to matter. 375 00:39:13,550 --> 00:39:18,530 So here we're probably talking about 50 or 100 clusters for that assumption not to matter. 376 00:39:18,980 --> 00:39:26,120 Unfortunately, when we do these studies, we typically have ten or 20, tops 30, clusters. 377 00:39:26,600 --> 00:39:30,480 So that assumption does become important. Okay. 378 00:39:30,500 --> 00:39:34,370 So case study three: getting to the top of the hierarchy, I think. 379 00:39:35,330 --> 00:39:41,060 So this is a cluster randomised controlled trial, a parallel design. 380 00:39:41,810 --> 00:39:44,540 And I've chosen this to be at the top of the hierarchy. 381 00:39:46,130 --> 00:39:53,060 I have taken this quote from the paper, and they say: to our knowledge, this is the first randomised controlled trial 382 00:39:53,540 --> 00:40:02,120 to investigate the effect of repeated rounds of testing on the multiple biological outcomes of chlamydia prevalence, 383 00:40:02,370 --> 00:40:07,570 PID and epididymitis. The trial had a pragmatic design. 384 00:40:07,580 --> 00:40:12,830 It reflected the real world roll out of an opportunistic chlamydia screening program. 385 00:40:13,340 --> 00:40:17,030 And I like this example because it's testing a policy. 386 00:40:17,510 --> 00:40:22,310 It's using a robust randomised design to test that policy. 387 00:40:22,640 --> 00:40:28,770 And it's an example of how we can do things if we really want to know whether something works or not. 388 00:40:29,840 --> 00:40:33,770 So what's the design?
It's a simple parallel cluster 389 00:40:33,770 --> 00:40:39,680 randomised controlled trial, and all they do in this type of design is randomise 390 00:40:39,740 --> 00:40:43,700 half of the clusters to control, half of the clusters to the intervention. 391 00:40:44,210 --> 00:40:52,610 Again, it's cluster randomised. So the unit of randomisation is this cluster: a GP practice, a ward, a hospital, some big grouping. 392 00:40:54,590 --> 00:41:01,520 Clusters get control or intervention. The big problem with this design, well, there are two big problems, 393 00:41:01,790 --> 00:41:09,050 and the only one that I'm really mentioning for the time being is this risk of chance imbalance, which we just spoke about a few moments ago. 394 00:41:10,010 --> 00:41:18,709 If you've only got a small number of clusters, you risk allocating clusters to the control condition that are systematically 395 00:41:18,710 --> 00:41:22,490 different to those clusters that you randomised to the intervention condition. 396 00:41:23,390 --> 00:41:29,750 So imagine you had a randomised design and you were trying to evaluate whether some drug A worked, and you were 397 00:41:29,750 --> 00:41:37,340 going to use individual randomisation, and you had 20 people in your trial, because I've got 20 clusters in this study. 398 00:41:37,340 --> 00:41:39,829 Imagine you were running a design with 20 people in it. 399 00:41:39,830 --> 00:41:44,600 You would never believe the results, because you would be thinking at the back of your mind: well, 400 00:41:44,600 --> 00:41:50,840 okay, your trial showed me that the drug works, but maybe your randomisation just wasn't big enough. 401 00:41:51,170 --> 00:41:57,170 You only had 20 people, so maybe these ten people that got the intervention happened to all be younger. 402 00:41:57,980 --> 00:42:02,750 And that can happen if we've only got a small number of randomisation units. 403 00:42:02,750 --> 00:42:08,390 Things just don't balance.
When we look at our baseline table, we'll see an imbalance. 404 00:42:09,530 --> 00:42:13,340 Exactly the same things happen in cluster randomised designs. 405 00:42:13,670 --> 00:42:19,040 If we don't have enough clusters, things do not balance out between intervention and control. 406 00:42:20,150 --> 00:42:28,760 If we were going to test this intervention many times and then finally accumulate everything into a systematic review, 407 00:42:28,760 --> 00:42:35,510 that chance imbalance wouldn't matter. But we often make inferences not from a systematic review, but from one trial. 408 00:42:36,170 --> 00:42:41,270 So if we want to believe the results from one trial, we need to have a large number of clusters. 409 00:42:42,890 --> 00:42:46,730 If we don't have a large number of clusters, we've got to worry about chance imbalance. 410 00:42:48,090 --> 00:42:52,950 If we know what characteristics are important, we can measure them. 411 00:42:53,370 --> 00:43:01,230 We can use constrained methods of randomisation to try to ensure that these are not too different between the arms. 412 00:43:01,980 --> 00:43:07,709 But then we move away from pure randomisation, and often we don't know what these characteristics are, or we can't measure them, 413 00:43:07,710 --> 00:43:13,810 or if we do measure them, we're quite inaccurate in our measurements. So that doesn't work that well, unfortunately. 414 00:43:14,790 --> 00:43:17,970 But suppose we've got a large number of clusters. 415 00:43:18,750 --> 00:43:21,750 So now I'm just going to think about the situation where I've got a large 416 00:43:21,750 --> 00:43:25,950 number of clusters, and I think more than 50 is probably just about all right. 417 00:43:27,640 --> 00:43:34,060 Then there are no time effects here that we have to worry about. In every other design I have spoken about tonight, 418 00:43:34,540 --> 00:43:38,170 the big caveat has been this issue of a secular trend.
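A quick back-of-envelope simulation (my own illustration, with made-up numbers, not from the talk) shows how easily chance imbalance arises with only 20 randomisation units: suppose half the clusters are teaching hospitals and we randomise them 10 versus 10.

```python
# How often does randomising 20 clusters (10 of them teaching hospitals)
# into two arms of 10 leave the arms noticeably unbalanced on that
# characteristic?
import random

random.seed(1)
n_sims, imbalanced = 10_000, 0
clusters = [1] * 10 + [0] * 10        # 1 = teaching hospital, 0 = local hospital
for _ in range(n_sims):
    random.shuffle(clusters)          # one randomisation of the trial
    arm_a = sum(clusters[:10])        # teaching hospitals landing in arm A
    if abs(arm_a - 5) >= 2:           # a 7 vs 3 split or worse, either way
        imbalanced += 1

print(f"{imbalanced / n_sims:.0%} of randomisations give a 7-3 split or worse")
```

Roughly one randomisation in five leaves the arms that unbalanced on just this one characteristic, which is why a single small cluster trial is hard to believe on its own.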
419 00:43:38,860 --> 00:43:44,170 We saw it in the before and after study. We saw it a little bit in the interrupted time series. 420 00:43:44,590 --> 00:43:48,340 We definitely saw it in the stepped wedge design. But it's not here. 421 00:43:49,090 --> 00:43:55,480 This design is perfectly balanced on time. We don't even think of time as an ingredient in this design, but it is. 422 00:43:55,570 --> 00:43:59,770 We don't often think about it, but time is still there in the background. 423 00:44:00,880 --> 00:44:04,090 But time is balanced, so time is not a confounder. 424 00:44:04,660 --> 00:44:10,010 So that is the appeal of this design. And it's not really a new appeal, I guess. 425 00:44:10,030 --> 00:44:16,060 This was the design everyone was using ten years ago. Then they went backwards and started using the stepped wedge design. 426 00:44:16,090 --> 00:44:20,350 But let's go back to this design, because there's no confounder in this design. 427 00:44:21,340 --> 00:44:29,740 So let's get back to the example I put up, and this is this Australian cluster randomised controlled trial of a chlamydia screening program. 428 00:44:34,230 --> 00:44:43,140 What did they do? So they took patients, anyone aged between 16 and 29 years; that was their target population. 429 00:44:43,650 --> 00:44:51,630 And they took data from primary care clinics in these 50 rural towns. Their intervention package, 430 00:44:52,050 --> 00:45:01,950 again, was a sort of educational type approach. But what they wanted to do was to reduce the prevalence of chlamydia in the entire population. 431 00:45:03,490 --> 00:45:10,420 So they asked 50 rural towns in Australia to participate, and then they rolled out this intervention: 432 00:45:10,420 --> 00:45:15,970 an educational package, payments for chlamydia testing, all sorts of different things in their intervention component.
433 00:45:16,420 --> 00:45:22,150 And then they monitored the chlamydia prevalence, and they did that before randomisation, 434 00:45:22,330 --> 00:45:25,570 what we might call at baseline, or what they called survey one. 435 00:45:26,530 --> 00:45:33,460 Then they put the intervention in place, they allowed the intervention to settle, and then they did another survey at the end of the trial. 436 00:45:33,550 --> 00:45:37,870 They called that survey two. I forget exactly how much time was between those two surveys. 437 00:45:40,600 --> 00:45:45,570 So this is what they found. So let's, first of all, just look at the intervention group. 438 00:45:47,110 --> 00:45:52,660 In the intervention group, we can see the results of survey one. 439 00:45:54,330 --> 00:46:04,090 So in these clusters that were randomised, the 25 towns that went into the intervention arm, the prevalence started off at about 5%. 440 00:46:04,090 --> 00:46:12,230 So about 5% of people in those rural villages, in that age population, had got chlamydia. At survey two, 441 00:46:12,300 --> 00:46:17,280 that had decreased to about 3.4%. So that's that number, 442 00:46:18,720 --> 00:46:22,770 and that's that number. Say 5%, then 3.4%. 443 00:46:23,430 --> 00:46:28,860 And then we can work out the change. And that's a 1.6% reduction. 444 00:46:29,640 --> 00:46:33,330 Now, I'm not saying this is a good way to analyse the data; it's just how they presented it. 445 00:46:33,930 --> 00:46:41,850 So we might interpret that. We might say that that means that, in the intervention arm, the proportion of people 446 00:46:41,850 --> 00:46:46,830 with chlamydia in those rural villages reduced by about one and a half percent. 447 00:46:48,980 --> 00:46:52,690 The confidence interval for that difference 448 00:46:52,700 --> 00:46:58,040 is entirely on one side of the null. So that would be statistically significant.
449 00:46:58,580 --> 00:47:04,370 So that is suggesting that in the intervention arm, things improved by about one and a half percent. 450 00:47:06,300 --> 00:47:10,800 But what about the control? Nothing happened in that control arm. 451 00:47:11,920 --> 00:47:18,980 So usual care just continued. There was no educational package, no promotion of chlamydia screening. 452 00:47:19,000 --> 00:47:21,190 Nothing happened in those 25 villages. 453 00:47:22,220 --> 00:47:28,160 So the chlamydia prevalence started off at about four and a half percent on average, a little bit different. 454 00:47:28,610 --> 00:47:33,770 But remember, we don't have many units here to randomise; we've only got 50 randomisation units. 455 00:47:34,340 --> 00:47:39,980 So we can't completely be certain that when we randomise these, they're going to be the same as those. 456 00:47:40,580 --> 00:47:45,980 So it's expected that we might see a bit of a difference there, but they're reasonably similar at baseline. 457 00:47:47,780 --> 00:47:51,110 That decreased to 3.4% in the control. 458 00:47:52,380 --> 00:47:56,220 The difference is a bit smaller, but it's a 1.1% reduction. 459 00:47:57,270 --> 00:48:04,800 Now this time it's not quite hitting statistical significance, but we shouldn't pay too much attention to that on the whole. 460 00:48:06,030 --> 00:48:09,990 The trend here is that things have reduced. Things have reduced there. 461 00:48:10,920 --> 00:48:17,100 We can't say for certain whether this is statistically significant, but most of the confidence interval is that side of the null. 462 00:48:17,490 --> 00:48:23,280 So probably things are reducing in the control arm too. What does that suggest is happening? 463 00:48:24,490 --> 00:48:32,650 A reduction here and a reduction there: does that tell us the intervention's working, or does it tell us something else? 464 00:48:34,190 --> 00:48:38,350 Yeah, same thing again; exactly the same thing happening: something going on in the wider system.
465 00:48:38,920 --> 00:48:42,910 So things have reduced in the control arm; things have reduced in the intervention arm. 466 00:48:44,050 --> 00:48:50,320 Now, here they've done a cluster randomised design. So we have this data here in the intervention arm, 467 00:48:50,560 --> 00:48:57,580 and we have the data in the control arm. But imagine if they had chosen not to do a cluster randomised design. 468 00:48:57,970 --> 00:49:05,140 Maybe they had just done a before and after study, like in the first example, in the study that was also in PLOS Medicine. 469 00:49:06,160 --> 00:49:16,230 We'd only have that data there. Now, that data would tell us that there was a reduction, but actually we have also collected the control data. 470 00:49:16,250 --> 00:49:20,030 We know that there is a reduction anyway. So there is a secular trend. 471 00:49:20,050 --> 00:49:24,130 There is something happening in the system. This is just what happens all the time. 472 00:49:24,520 --> 00:49:33,819 Fortunately, things are improving all the time, and so that's why we need to collect this control data, so we can compare what 473 00:49:33,820 --> 00:49:38,050 happens in the control, in the usual system, allowing for that temporal change, 474 00:49:38,410 --> 00:49:42,310 with what happens when we add our intervention into the system. 475 00:49:42,700 --> 00:49:48,729 Only when we have that control data do we have the ability to 476 00:49:48,730 --> 00:49:55,570 differentiate the impact of the intervention from the impact of the secular change. 477 00:49:56,620 --> 00:50:01,990 So this is the secular trend; this is the intervention and the secular trend. 478 00:50:02,860 --> 00:50:06,669 And then we can do an analysis where we adjust for the baseline values, and we 479 00:50:06,670 --> 00:50:11,980 can get an estimate that has adjusted for those baseline values.
480 00:50:12,460 --> 00:50:20,770 And this then gives us an estimate of the impact of the intervention itself, ruling out, or taking out, that impact of the secular trend. 481 00:50:22,030 --> 00:50:26,890 And that's the intervention effect: much smaller than that unadjusted impact. 482 00:50:28,630 --> 00:50:38,410 Now, if we had just done a before and after study here, we would probably conclude that this intervention was having some impact. 483 00:50:39,640 --> 00:50:43,480 Here we have got a much smaller estimated effect of the intervention. 484 00:50:43,930 --> 00:50:52,540 It's not statistically significant, but then it becomes a little bit difficult to interpret what this means, because it's not statistically significant. 485 00:50:53,020 --> 00:51:00,880 And we have to be careful that we don't say here that this tells us it doesn't work, just because it's not statistically significant. 486 00:51:01,780 --> 00:51:11,360 But we look at the confidence interval. And that confidence interval includes quite a large reduction in the negative direction. 487 00:51:11,720 --> 00:51:16,100 It includes the possibility that things might get a little bit worse too. 488 00:51:16,250 --> 00:51:17,180 So it might get worse, 489 00:51:17,180 --> 00:51:29,060 it might get better. The point estimate is pretty small, but the entire confidence interval, perhaps arguably, is of a size that rules out anything but small impacts. 490 00:51:29,690 --> 00:51:35,270 And that was their conclusion. Their conclusion was that it ruled out anything but small impacts. 491 00:51:36,340 --> 00:51:43,639 I slightly dispute whether that is quite consistent with that confidence interval, because we might say that a 492 00:51:43,640 --> 00:51:51,590 reduction of two and a half percent is a pretty large impact when we're starting from 5% and reducing it by about 50%. 493 00:51:52,550 --> 00:51:59,750 But nonetheless, one thing that's very appealing about this design is we do not have to worry about time effects.
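Using the approximate prevalences quoted in the talk, the baseline-adjusted comparison amounts to a simple difference-in-differences. This is just a back-of-envelope version of the idea, not the paper's actual regression model:

```python
# Approximate prevalences (% with chlamydia) as read out in the talk.
intervention = {"survey1": 5.0, "survey2": 3.4}
control      = {"survey1": 4.5, "survey2": 3.4}

change_int = intervention["survey2"] - intervention["survey1"]   # -1.6
change_ctl = control["survey2"] - control["survey1"]             # -1.1

# Subtracting the control arm's change removes the shared secular trend.
effect = change_int - change_ctl                                 # -0.5

print(f"intervention arm change:       {change_int:+.1f} percentage points")
print(f"control arm change:            {change_ctl:+.1f} percentage points")
print(f"estimated intervention effect: {effect:+.1f} percentage points")
```

The before-and-after view alone would credit the intervention with the full 1.6-point drop; netting out the control arm's secular improvement leaves a much smaller 0.5-point estimate, which is the point being made above.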
494 00:52:01,040 --> 00:52:08,570 So I then wanted to just sidetrack a little bit and have one word of caution 495 00:52:10,880 --> 00:52:19,730 about just making sure that we don't use the words policy evaluation when actually 496 00:52:19,730 --> 00:52:24,460 we're evaluating something that's not a policy but an individual level intervention. 497 00:52:24,470 --> 00:52:28,580 So I wanted to quickly show you this example of benzodiazepines. 498 00:52:31,430 --> 00:52:39,290 So the population was people having cardiac surgery. The intervention was perioperative benzodiazepine. 499 00:52:39,500 --> 00:52:44,540 They were in Canadian hospitals, and they were trying to reduce delirium after surgery. 500 00:52:45,350 --> 00:52:50,500 Is this a policy intervention? Some people are shaking their heads. 501 00:52:50,510 --> 00:53:01,160 I agree with those of you shaking your heads. So sometimes cluster randomisation has been used as a means to avoid individual patient consent. 502 00:53:01,440 --> 00:53:08,450 This is a quote from the protocol paper for the study evaluating benzodiazepine in a cluster randomised controlled trial. 503 00:53:09,200 --> 00:53:15,559 And they basically say that although in principle they could evaluate the effect of benzodiazepines 504 00:53:15,560 --> 00:53:21,290 in this setting through individual patient randomisation, a regular randomised controlled trial, 505 00:53:21,740 --> 00:53:27,050 this is probably not the best approach to address the broad questions of policy. 506 00:53:27,950 --> 00:53:31,850 So what they're saying there, really, is that this is a policy. I think we could disagree with that. 507 00:53:32,600 --> 00:53:38,180 But they then go on to say that consent to participate is therefore obtained from the cluster 508 00:53:38,450 --> 00:53:45,650 rather than the patient, ensuring representative sampling and enrolment of all eligible patients. 509 00:53:46,190 --> 00:53:47,510 And those things are true.
510 00:53:48,470 --> 00:53:55,610 They will have a representative sample if they enrol everybody, and they will enrol everyone if they do not take that consent. 511 00:53:57,310 --> 00:54:00,730 But the justification for cluster randomisation is really important. 512 00:54:00,730 --> 00:54:10,390 Individual randomisation is the gold standard approach without any doubt, but in policy evaluations we cannot use individual randomisation. 513 00:54:10,480 --> 00:54:17,650 So in all of these examples I've put up today, you couldn't possibly use individual randomisation, even though it's the gold standard. 514 00:54:18,040 --> 00:54:21,130 And that's because the intervention applied to an entire unit, the cluster. 515 00:54:21,790 --> 00:54:26,290 But sometimes people have started to manipulate the question of 516 00:54:26,950 --> 00:54:33,370 the effect of an individual level intervention into a question of policy just to avoid individual patient consent. 517 00:54:33,370 --> 00:54:37,180 It does make things a lot easier. It's not a good idea. 518 00:54:37,210 --> 00:54:43,780 Firstly, it's ethically inappropriate. And secondly, it can put you at increased risk of bias. 519 00:54:44,470 --> 00:54:48,720 And I haven't really gone into these increased risks of bias and I don't have time to. 520 00:54:49,060 --> 00:54:52,090 But there are some other risks of bias that I haven't mentioned, 521 00:54:52,090 --> 00:54:57,940 that you do open yourselves up to if you start to manipulate these questions of 522 00:54:59,520 --> 00:55:02,520 drug intervention effects into questions of policy. 523 00:55:04,310 --> 00:55:12,200 Okay. So I think I've run out of time. So very quickly, a few minutes back to the stepped wedge cluster randomised controlled trial. 524 00:55:12,500 --> 00:55:24,680 When should we use that? So when we want to use the cluster randomised design, we have to justify our randomisation of an entire cluster.
525 00:55:25,190 --> 00:55:30,320 So we have to make sure that we've got a really strong justification for randomising an entire cluster. 526 00:55:30,830 --> 00:55:35,660 We shouldn't randomise an entire cluster just because it avoids us getting individual consent. 527 00:55:36,080 --> 00:55:38,660 So it really has to be a cluster level intervention. 528 00:55:39,590 --> 00:55:47,060 But we additionally have to think about why we have to roll the intervention out to all of the clusters, why we need to have this staggered rollout. 529 00:55:49,720 --> 00:55:55,810 The reasons why we have to justify and think carefully about this design have, I hope, become evident. 530 00:55:56,440 --> 00:55:59,950 It's a design that's confounded by time. 531 00:56:00,280 --> 00:56:04,360 And so any analysis that we do always has these many caveats. 532 00:56:04,780 --> 00:56:10,690 And at the end of the day, after running a study, we don't want to have to worry about these caveats and about these risks of bias. 533 00:56:12,250 --> 00:56:18,160 So we have to make sure that we really think carefully about when we're going to use it, because it's at increased risk of bias. 534 00:56:19,120 --> 00:56:21,880 Here are some possible justifications. 535 00:56:22,690 --> 00:56:30,940 So we started out at the beginning talking about the second justification here, that it is appealing to those people that we're going to try to recruit. 536 00:56:30,970 --> 00:56:35,180 So for the cluster stakeholders, it has what we might call a social appeal to it. 537 00:56:36,010 --> 00:56:39,220 Sometimes it can increase the feasibility of things. 538 00:56:39,700 --> 00:56:43,989 Sometimes it can be a statistically more powerful design. 539 00:56:43,990 --> 00:56:49,600 And I haven't gone into power. You have to work this out on a case by case basis. 540 00:56:49,610 --> 00:56:53,050 Sometimes it can be more powerful. It's not always the case.
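The "confounded by time" caveat can be made concrete with a toy calculation (all numbers invented for illustration): if the outcome is improving on its own, a before-and-after comparison folds that secular trend into the apparent intervention effect, whereas differencing against concurrent control clusters recovers the true effect.

```python
# Toy numbers, invented for illustration only.
secular_change = -0.020  # the outcome falls 2 points per period on its own
true_effect = -0.005     # the intervention's actual impact

baseline = 0.100
after = baseline + secular_change + true_effect

# Before-and-after comparison: trend and effect are mixed together,
# so the intervention looks five times bigger than it is.
naive_estimate = after - baseline                    # about -0.025

# Concurrent controls experience only the secular trend, so
# differencing against them isolates the intervention effect.
control_change = secular_change
adjusted_estimate = naive_estimate - control_change  # about -0.005
```

This is the whole argument for randomised concurrent controls in one subtraction: without them, the naive estimate has no way to separate the trend from the intervention.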
541 00:56:54,100 --> 00:56:58,120 We also have to consider other things, like: is it going to make our study run longer? 542 00:56:58,810 --> 00:57:02,630 How long is it going to take to realise the effect of the intervention? 543 00:57:02,660 --> 00:57:06,490 So there are all these added complications as well that we need to consider. 544 00:57:07,840 --> 00:57:15,040 So I think the real justification, the one that you really don't have to worry about too much, is when you 545 00:57:15,040 --> 00:57:21,550 are going to distribute a scarce resource or this intervention is going to be rolled out anyway. 546 00:57:23,800 --> 00:57:26,530 And only very occasionally does that happen. 547 00:57:26,830 --> 00:57:35,740 So maybe what's going to happen in our system is somebody is going to roll out something irrespective of any evaluation. 548 00:57:37,040 --> 00:57:40,670 And perhaps they're going to distribute a scarce resource. Maybe not. 549 00:57:40,820 --> 00:57:50,450 But somebody is going to be rolling out something. If, in addition, you can persuade them to let you randomise the order of that rollout. 550 00:57:50,700 --> 00:57:53,960 And again, that's very rare. When does that happen? 551 00:57:53,990 --> 00:58:02,360 Not very often. But if those two things held, then maybe that would be a good justification for doing a stepped wedge trial. 552 00:58:03,860 --> 00:58:09,860 And especially if the alternative was going to be some non-randomised design like a before and after study. 553 00:58:10,130 --> 00:58:14,240 This study design is definitely higher on the hierarchy than a before and after study. 554 00:58:15,640 --> 00:58:24,910 When does that happen? I think there are very few and far between studies where they really have had good justification for using the stepped wedge design. 555 00:58:25,450 --> 00:58:29,680 And this example, The Gambia stepped wedge cluster randomised controlled trial. 556 00:58:29,980 --> 00:58:33,910 It's a famous, iconic example.
They didn't call it a stepped wedge trial design. 557 00:58:33,910 --> 00:58:36,910 So if you search for stepped wedge in the literature, you won't find this. 558 00:58:36,910 --> 00:58:44,350 But it was a stepped wedge design, and it's called the Gambia Hepatitis Intervention Study. 559 00:58:45,070 --> 00:58:51,850 They were trying to evaluate whether hepatitis B vaccine prevents liver cancer. 560 00:58:52,090 --> 00:58:59,290 I mean, that's the sort of basic idea behind this study. Does hepatitis B vaccine prevent people from getting liver cancer? 561 00:58:59,770 --> 00:59:05,230 The study was carried out in The Gambia, and there are 17 geographical regions in The Gambia. 562 00:59:06,520 --> 00:59:13,720 They said that universal vaccination in the entire country was impossible for logistical and financial reasons. 563 00:59:14,230 --> 00:59:18,970 Now that's probably very believable. They were going to do this at a country wide scale. 564 00:59:19,240 --> 00:59:25,840 It's very unlikely that an entire country can roll out something new to an entire population at the same time. 565 00:59:26,890 --> 00:59:30,940 So they were going to roll out this hepatitis B vaccination sequentially. 566 00:59:31,780 --> 00:59:37,220 The study team managed to persuade the stakeholders in The Gambia to allow them to do this randomly. 567 00:59:37,240 --> 00:59:44,260 So these 17 geographical areas were exposed to the hepatitis B vaccine in a random order. 568 00:59:44,260 --> 00:59:51,879 And by that, what happened was that if a baby was born in any of these villages after 569 00:59:51,880 --> 00:59:57,280 the time at which that cluster was going to get the hepatitis B vaccine, 570 00:59:57,580 --> 01:00:06,400 then the baby was vaccinated. If it was born in one of these villages before the village had crossed over, then it didn't get the hepatitis B vaccine. 571 01:00:07,450 --> 01:00:10,570 They then followed up the babies for 30 years. 572 01:00:10,780 --> 01:00:15,580 It still hasn't reported.
It's going to report very soon, but it hasn't reported just yet. 573 01:00:16,390 --> 01:00:21,460 And then they're going to compare the liver cancer rates between the vaccinated group and the unvaccinated group. 574 01:00:22,240 --> 01:00:28,270 There will be a big caveat in their analysis, that they have to adjust for time, but they will adjust for time. 575 01:00:28,750 --> 01:00:33,070 They also have to make the assumption that the secular trend is the same across all of the clusters. 576 01:00:33,520 --> 01:00:38,890 They don't have very many clusters here, so there will be some concerns about that assumption. 577 01:00:39,790 --> 01:00:43,900 But the alternative here would probably have been a before and after study. 578 01:00:45,850 --> 01:00:51,549 But this is the only example that I can find in the entire literature where it really was the case that 579 01:00:51,550 --> 01:00:56,320 they persuaded the stakeholders to randomise something that was going to be rolled out anyway. 580 01:00:56,860 --> 01:01:01,810 Most of the time people are just using this design because it's convenient or it's fashionable. 581 01:01:03,920 --> 01:01:09,830 So summing up, there is this menu of different types of study designs, and I haven't spoken about some of them. 582 01:01:10,850 --> 01:01:17,570 These ones here alternate between intervention and control, but in policy evaluations, we typically can only go one way. 583 01:01:17,600 --> 01:01:20,090 We can't take away something once we've put it in place. 584 01:01:20,660 --> 01:01:27,680 If you can take away your intervention, then these types of designs where you cross backwards and forwards are much better. 585 01:01:29,820 --> 01:01:34,440 Which is the best design? The first thing is it should be at low risk of bias. 586 01:01:34,560 --> 01:01:38,280 It should be feasible and it should be statistically powerful. 587 01:01:38,310 --> 01:01:42,540 I think a lot of the literature has emphasised statistical power over bias.
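The rollout mechanics described for The Gambia, with areas crossing over to vaccination in a randomised order and a baby's exposure determined by whether it was born before or after its area crossed over, can be sketched in a few lines. A minimal sketch (the cluster labels, step numbering, and one-cluster-per-step schedule are invented for illustration; the real study's schedule may have differed):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible
clusters = list(range(1, 18))  # 17 geographical areas
random.shuffle(clusters)       # randomise the order of rollout

# Map each cluster to the step at which it crosses over to vaccination
# (one cluster crosses over per step in this sketch).
crossover_step = {c: step for step, c in enumerate(clusters, start=1)}

def vaccinated(cluster: int, birth_step: int) -> bool:
    """A baby is vaccinated only if born at or after its cluster's crossover."""
    return birth_step >= crossover_step[cluster]

# Before any crossover, no cluster vaccinates its newborns; once all
# 17 steps have passed, every cluster does.
assert not any(vaccinated(c, 0) for c in clusters)
assert all(vaccinated(c, 17) for c in clusters)
```

The analysis then compares outcomes between exposed and unexposed babies while adjusting for calendar time, which is exactly the caveat about assuming a common secular trend across clusters.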
588 01:01:43,290 --> 01:01:49,630 But I think we should be more concerned with bias. Which design is least biased? 589 01:01:49,780 --> 01:01:56,260 Well, the stepped wedge design requires this model based analysis, so it's not going to be the stepped wedge trial. 590 01:01:57,950 --> 01:02:01,640 Which design is more feasible? Might it be the stepped wedge trial? 591 01:02:01,670 --> 01:02:08,210 Well, people often say they're using it because it's feasible, but actually it turns out that it has its own complexities. 592 01:02:08,690 --> 01:02:11,900 You have to get all of the clusters ready to start at the same time. 593 01:02:12,050 --> 01:02:18,740 Actually, that turns out to be really difficult. If people start and the clusters aren't ready to go, then they don't know what to do. 594 01:02:19,340 --> 01:02:24,890 So it has its own complexities. Is it more statistically powerful? 595 01:02:25,010 --> 01:02:30,590 Well, that depends. Sometimes it can be, but it's not always more statistically powerful. 596 01:02:31,020 --> 01:02:39,440 If it really does come down to power, then you have to work out on a case by case basis which one of these designs is more powerful. 597 01:02:39,770 --> 01:02:44,420 There are some generic statements in the literature that say the stepped wedge design is more powerful. 598 01:02:44,630 --> 01:02:57,120 It's not necessarily. It can be. It's not always. So if you can't randomise, then my advice is to use an interrupted time series type of analysis. 599 01:02:57,990 --> 01:03:03,090 You have to be willing to accept that if you get a big shift, you're probably going to be all right. 600 01:03:03,450 --> 01:03:05,910 But if you're looking for small, subtle changes, 601 01:03:06,300 --> 01:03:13,950 you might miss them in that type of design, or you might not be able to unequivocally say that they're because of your intervention. 602 01:03:14,940 --> 01:03:21,090 If you can have many clusters in your interrupted time series, then that's going to help.
603 01:03:22,190 --> 01:03:29,720 If you can randomise, the parallel design has fewer risks. If there are a small number of clusters, 604 01:03:30,080 --> 01:03:38,780 the stepped wedge trial is probably going to be your design of choice, but only in the situations where you don't have a large number of clusters. 605 01:03:38,780 --> 01:03:45,260 Because if you have a large number of clusters and you do a parallel design, you won't have to worry about time trends. 606 01:03:46,690 --> 01:03:52,460 Thank you.