1 00:00:00,060 --> 00:00:07,500 Welcome, everybody, to the Department of Statistics. As Dorel just put it to me, in this age when we don't need experts, 2 00:00:07,500 --> 00:00:18,030 actually the world is hanging on the voices of experts, and we are lucky enough, in this very worrying scenario, to have two experts of our very own. 3 00:00:18,030 --> 00:00:23,550 So today, as many of you will know, two cases of the coronavirus have been confirmed in the UK, 4 00:00:23,550 --> 00:00:30,090 and our colleagues, Christl and Robin, are going to give us some insights into the world of real-time epidemiology. 5 00:00:30,090 --> 00:00:34,380 This is what actually happens when you are thoroughly embedded in the subject. 6 00:00:34,380 --> 00:00:43,260 So I thought I would start by addressing the issues that we might want to know about at this stage, because this is a new disease, 7 00:00:43,260 --> 00:00:47,010 a novel infectious agent. So what are the things you might start with? 8 00:00:47,010 --> 00:00:48,060 What are the symptoms? 9 00:00:48,060 --> 00:00:55,650 You have to design a case definition; you're not going to test everybody who has every possible symptom, so you've got to work on that, 10 00:00:55,650 --> 00:01:02,160 and identify what the causal agent is. Robin will speak about how you can compare the characteristics, 11 00:01:02,160 --> 00:01:08,640 the symptoms for this pathogen, with the profiles that have been seen for previous ones. 12 00:01:08,640 --> 00:01:12,330 How many cases are there? How many cases might there be in future? 13 00:01:12,330 --> 00:01:17,160 That is, of course, the key thing, but projecting into the future is always the "what if". 14 00:01:17,160 --> 00:01:23,040 And while mathematics is very good at giving us the opportunity to ask "what if" under several different scenarios, 15 00:01:23,040 --> 00:01:28,980 because of course we will only ever see one of them, there is inherent uncertainty, and there are challenges in doing that.
16 00:01:28,980 --> 00:01:36,210 How serious might this disease be? We could look at projecting the future number of cases and thinking about how many people might ever be affected. 17 00:01:36,210 --> 00:01:40,440 But we are also interested in knowing what the case fatality ratio is. 18 00:01:40,440 --> 00:01:48,540 That's important for measuring severity and determining what a proportionate response is, as well as how many people will be infected. 19 00:01:48,540 --> 00:01:53,310 But it is also important because people will start testing different treatments, 20 00:01:53,310 --> 00:01:58,730 and so knowing what the baseline case severity is matters. 21 00:01:58,730 --> 00:02:02,730 The incubation period distribution is important because, if it turns out that I've 22 00:02:02,730 --> 00:02:07,380 exposed someone and I find out tomorrow that I've got a particular disease 23 00:02:07,380 --> 00:02:11,940 (the incubation period is the time from when they were infected until when they show signs of disease), 24 00:02:11,940 --> 00:02:15,360 then knowing that distribution tells you, if you're following them up, how long 25 00:02:15,360 --> 00:02:21,810 you should keep checking on Alison to see if she's still OK before you can give her the all-clear. 26 00:02:21,810 --> 00:02:25,560 And finally, of course, you're talking about the opportunities for control. 27 00:02:25,560 --> 00:02:37,820 We can characterise transmission, but what is the point of doing that if we don't also help identify and evaluate possible control measures? 28 00:02:37,820 --> 00:02:46,160 The numbers here are numbers that we made public in collaboration with colleagues at Imperial College London, 29 00:02:46,160 --> 00:02:50,330 where we have an infectious disease outbreak response unit, 30 00:02:50,330 --> 00:02:54,710 and so we collaborate with different people.
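The follow-up question can be made concrete: if the incubation period has a known distribution, a contact needs monitoring until a high quantile of that distribution has passed. The sketch below assumes a lognormal incubation period with made-up parameters (`LOG_MEAN`, `LOG_SD`); neither the distributional form nor the values come from the talk.

```python
import math
from statistics import NormalDist


def follow_up_days(log_mean: float, log_sd: float, coverage: float = 0.95) -> float:
    """Days after exposure by which a fraction `coverage` of infected contacts
    would be expected to show symptoms, assuming a lognormal incubation period."""
    z = NormalDist().inv_cdf(coverage)  # standard normal quantile
    return math.exp(log_mean + log_sd * z)


# Hypothetical parameters on the log-day scale, for illustration only.
LOG_MEAN, LOG_SD = 1.6, 0.45

for cov in (0.50, 0.95, 0.99):
    days = follow_up_days(LOG_MEAN, LOG_SD, cov)
    print(f"{cov:.0%} of incubation periods are over by day {days:.1f}")
```

Choosing the 99% point rather than the median trades a longer follow-up for a smaller chance of giving the all-clear to someone who is still incubating.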
So what we did here followed up 31 00:02:54,710 --> 00:03:02,120 similar work that we had done originally, when I was collaborating with Christophe and others on the influenza pandemic. 32 00:03:02,120 --> 00:03:08,360 In that case, we were looking at the number of people who were coming out of Mexico and being confirmed with what 33 00:03:08,360 --> 00:03:17,000 then became pandemic influenza, and comparing those to the number of people who had been identified as cases in Mexico. 34 00:03:17,000 --> 00:03:19,310 There was this mismatch: 35 00:03:19,310 --> 00:03:26,450 it seemed that more people were coming out infected than you might have expected, given the number of cases that had been reported. 36 00:03:26,450 --> 00:03:34,730 So we produced a report on the 17th of January, where we looked at the fact that there had been three cases that had been exported, 37 00:03:34,730 --> 00:03:38,210 in this case exported out of China. 38 00:03:38,210 --> 00:03:48,760 To figure out what that means, you can think of the infected people as a population of size N, 39 00:03:48,760 --> 00:03:52,610 and then each of the individuals who are infected 40 00:03:52,610 --> 00:03:59,210 has some probability of travelling internationally and being detected as a case somewhere else. 41 00:03:59,210 --> 00:04:01,350 So how do we estimate that? 42 00:04:01,350 --> 00:04:09,320 One way is to use data that we had previously had access to on the number of people travelling internationally out of airports. 43 00:04:09,320 --> 00:04:14,720 So you can look at Wuhan International Airport. You then have to think: OK, what is the number of people, 44 00:04:14,720 --> 00:04:17,330 what's the denominator, for people who would use that airport?
45 00:04:17,330 --> 00:04:26,960 We can consider either the city population, which is 11 million, or the metropolitan area population, which is 19 million. 46 00:04:26,960 --> 00:04:34,310 So this is a huge city, and we try to use that as the denominator for the probability that people would travel out. 47 00:04:34,310 --> 00:04:40,190 Now, there are all kinds of uncertainties here, because we've assumed that everybody is the same: 48 00:04:40,190 --> 00:04:44,510 that the probability that you got infected is the same as for everybody else, 49 00:04:44,510 --> 00:04:53,540 and that the travellers are a random sample of the people there, of those 19 or 11 million. Now, 50 00:04:53,540 --> 00:05:00,560 you know all sorts of things about yourself that mean you're probably not representative of everybody. 51 00:05:00,560 --> 00:05:05,390 And so people have raised the issue of the market: 52 00:05:05,390 --> 00:05:11,690 initially, the people who might have bought from or gone to a market to look at wildlife, or to eat, 53 00:05:11,690 --> 00:05:16,580 might be people who were more likely to travel, if they were more affluent, for example. 54 00:05:16,580 --> 00:05:24,020 So there are uncertainties. But if we assume that everybody is equally likely, and use a detection window that allows for the delay in detecting cases, 55 00:05:24,020 --> 00:05:34,040 we estimated under this baseline scenario that three exported cases implied a central estimate of about 1,700 cases, with a range between 56 00:05:34,040 --> 00:05:39,360 four hundred and four thousand. When we updated the report on the 22nd of January, 57 00:05:39,360 --> 00:05:44,840 there were seven exported cases by that point, and that gave a central estimate of four thousand, 58 00:05:44,840 --> 00:05:52,820 and you can see the sensitivity to the different assumptions about the detection window and the catchment area.
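The baseline calculation can be sketched as follows. Each infected resident is assumed equally likely to fly out and be detected abroad within the detection window, so that probability is roughly (daily international travellers × window length) / catchment population, and N is estimated as exported cases divided by it. The daily passenger figure and window lengths below are illustrative assumptions, not numbers from the talk; only the 11 and 19 million catchments and the three exported cases come from it.

```python
def estimate_cases(exported: int, daily_travellers: float,
                   window_days: float, catchment: float) -> float:
    """Point estimate of total infections from the number of exported cases.

    Assumes every resident of the catchment area is equally likely to be
    infected and to travel, so each infected person is detected abroad with
    probability p = daily_travellers * window_days / catchment.
    """
    p_detect_abroad = daily_travellers * window_days / catchment
    return exported / p_detect_abroad


DAILY_TRAVELLERS = 3_300   # hypothetical outbound international passengers per day
EXPORTED = 3               # exported cases in the 17 January report

for catchment in (11e6, 19e6):
    for window in (8, 10, 12):   # hypothetical detection windows in days
        n = estimate_cases(EXPORTED, DAILY_TRAVELLERS, window, catchment)
        print(f"catchment {catchment / 1e6:.0f}M, window {window} d: ~{n:,.0f} cases")
```

The estimate scales linearly with the catchment and inversely with the window, which is why those two assumptions dominate the sensitivity analysis. An uncertainty range would come from treating the exported count as binomial (or Poisson) with mean N·p.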
59 00:05:52,820 --> 00:06:00,580 But to put this in context, the number of reported confirmed cases in China on the 16th of January was forty-one. 60 00:06:00,580 --> 00:06:06,160 So this analysis gave an indication 61 00:06:06,160 --> 00:06:11,570 that it was very unlikely that the forty-one confirmed cases were the only cases at that point. 62 00:06:11,570 --> 00:06:20,170 Now, that is not surprising, because with a novel infectious agent you would expect delays in identifying it and taking samples, 63 00:06:20,170 --> 00:06:24,100 in this case sequencing it, identifying it as a new agent, and so on. 64 00:06:24,100 --> 00:06:37,080 But it gives a sense of the magnitude of what was going on. We subsequently looked at estimates for a range of R0 values. 65 00:06:37,080 --> 00:06:42,620 R0 is the basic reproduction number: 66 00:06:42,620 --> 00:06:48,620 the average number of people one infected person will infect in a fully susceptible population. 67 00:06:48,620 --> 00:06:55,740 For this novel agent, you can think of the population as fully susceptible, since the agent is only just being introduced. 68 00:06:55,740 --> 00:07:04,400 The question is what reproduction number you would need to get from a zoonotic exposure, for which you have to assume a particular date, 69 00:07:04,400 --> 00:07:08,150 for instance some point up to the end of December. 70 00:07:08,150 --> 00:07:14,000 What reproduction number would you need to be consistent with four thousand cases? 71 00:07:14,000 --> 00:07:20,360 The way we did that was through simulation, and you can see a set of simulations there on the side. 72 00:07:20,360 --> 00:07:24,800 That was for the case where R0 was equal to 2.6, 73 00:07:24,800 --> 00:07:31,280 which is where the median of all those simulations gives four thousand cases.
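A minimal version of that kind of simulation is a branching process: start from some number of zoonotically infected people, let every case infect a Poisson-distributed number of others with mean R0 in each generation, and record the cumulative total. The number of initial cases, the number of generations and the Poisson offspring distribution are assumptions made for illustration, not the settings of the published analysis.

```python
import math
import random
from statistics import median


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cumulative_cases(r0: float, initial: int, generations: int,
                     rng: random.Random) -> int:
    """Total cases after `generations` rounds of transmission."""
    current = total = initial
    for _ in range(generations):
        current = sum(poisson(r0, rng) for _ in range(current))
        total += current
        if current == 0:
            break
    return total


def median_cases(r0, initial=40, generations=4, runs=100, seed=1):
    """Median cumulative cases over repeated runs (hypothetical defaults)."""
    rng = random.Random(seed)
    return median(cumulative_cases(r0, initial, generations, rng) for _ in range(runs))


# Scan R0 to see which values give a median near a target case count.
for r0 in (1.4, 1.8, 2.2, 2.6, 3.0):
    print(f"R0 = {r0}: median cumulative cases = {median_cases(r0):,.0f}")
```

Raising `initial` lowers the R0 needed to reach the same total, which is the trade-off behind the scenario with two hundred initial exposures.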
74 00:07:31,280 --> 00:07:33,050 Now, why do we look at four thousand? 75 00:07:33,050 --> 00:07:44,120 Well, that was our central estimate, published on the 22nd of January, for the number of cases as of the 18th of January. 76 00:07:44,120 --> 00:07:48,440 So what does an R0 of 2.6 mean? 77 00:07:48,440 --> 00:07:54,380 Well, if you're going to control an infectious disease, you need to get the reproduction number down from whatever it was to start with, 78 00:07:54,380 --> 00:07:58,100 in this case maybe 2.6. You can see different scenarios. 79 00:07:58,100 --> 00:08:06,560 The lowest one comes when you assume our lower limit, which is only one thousand cases, 80 00:08:06,560 --> 00:08:14,120 and that the initial number exposed at that market, infected by the zoonotic source, 81 00:08:14,120 --> 00:08:22,640 those animals infecting humans, was two hundred. To get from two hundred to a thousand, you don't need R0 to be that high. 82 00:08:22,640 --> 00:08:26,030 But if, for example, R0 equals two, 83 00:08:26,030 --> 00:08:33,560 you need to prevent at least 50 percent of the transmission in order to control it and turn an increasing epidemic into a decreasing one. 84 00:08:33,560 --> 00:08:37,610 Obviously, all of us would like to bring transmission to zero, 85 00:08:37,610 --> 00:08:44,260 which would be an R of zero. But it doesn't have to get to zero in order to stop an epidemic; 86 00:08:44,260 --> 00:08:52,850 it just has to be pushed below one. I also wanted to talk about estimating the case fatality rate. 87 00:08:52,850 --> 00:08:57,080 This is a figure that I pulled off a Thai website. 88 00:08:57,080 --> 00:09:00,530 The date of this was the 28th of January,
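The control argument is one line of arithmetic: if a fraction c of transmission is prevented, the reproduction number becomes R0·(1 − c), and the epidemic declines once that is below one, i.e. c > 1 − 1/R0. A small helper makes the threshold explicit for a few values of R0, including the 2 and 2.6 mentioned above.

```python
def required_reduction(r0: float) -> float:
    """Threshold fraction c of transmission to prevent: above it, R0 * (1 - c) < 1."""
    if r0 <= 1:
        return 0.0          # already declining without intervention
    return 1 - 1 / r0


for r0 in (1.5, 2.0, 2.6, 3.5):
    print(f"R0 = {r0}: prevent more than {required_reduction(r0):.0%} of transmission")
```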
89 00:09:00,530 --> 00:09:05,930 and it was comparing the number of confirmed cases seen so far for the coronavirus, 90 00:09:05,930 --> 00:09:09,860 which was just under three thousand, with the eighty-two deaths there had been. 91 00:09:09,860 --> 00:09:14,750 So they took the ratio of those and said, OK, the mortality is 2.8 percent. 92 00:09:14,750 --> 00:09:23,180 They compared that to SARS, with its number of deaths and confirmed cases, and said, OK, so SARS is 9.6 percent. 93 00:09:23,180 --> 00:09:28,790 So where did this come from? If you look worldwide, on the WHO website, 94 00:09:28,790 --> 00:09:38,030 you get these numbers: the 8,098 cases and the number of deaths for SARS. 95 00:09:38,030 --> 00:09:44,900 I'll talk specifically about work that we did looking at Hong Kong, because Christophe and I were both 96 00:09:44,900 --> 00:09:50,000 involved in the analysis of Hong Kong data in collaboration with colleagues at the University of Hong Kong. 97 00:09:50,000 --> 00:09:55,640 And there we found that the CFR was actually 17 percent, so considerably higher. 98 00:09:55,640 --> 00:10:00,660 So keep the rest of what I'm going to say in that context. 99 00:10:00,660 --> 00:10:08,490 This shows you, by day, the number of cases in the Hong Kong SARS outbreak. This is 2003, 100 00:10:08,490 --> 00:10:19,020 so it was an unusual, atypical pneumonia that had been detected in China in late 2002, at almost exactly the same time of year 101 00:10:19,020 --> 00:10:23,790 as things unfolded this year. This is how it unfolded in Hong Kong. 102 00:10:23,790 --> 00:10:31,290 The first case was on the 22nd of February. Cases peaked at the end of March and in early April.
103 00:10:31,290 --> 00:10:37,290 So suppose you were looking at the number of deaths divided by the number of cases, and you 104 00:10:37,290 --> 00:10:42,090 decided to do that on the 2nd of April. By the 2nd of April, 105 00:10:42,090 --> 00:10:50,200 there were nine hundred and twenty-five cases. Almost eighty-six percent of them were censored, in that they were still in hospital 106 00:10:50,200 --> 00:10:54,980 and you didn't know whether they were going to end up as a death or a recovery. 107 00:10:54,980 --> 00:11:05,810 So there are considerable risks in taking your estimate to be the number of deaths so far divided by the number of cases so far. 108 00:11:05,810 --> 00:11:10,370 And that was how the case fatality ratio was being tracked. 109 00:11:10,370 --> 00:11:14,630 So imagine that this said D divided by C. 110 00:11:14,630 --> 00:11:18,950 So method one is deaths so far divided by cases so far. 111 00:11:18,950 --> 00:11:22,430 We then considered alternatives. This was again a collaboration, 112 00:11:22,430 --> 00:11:31,760 in 2003, and method three was developed in collaboration with David Cox, who gave us, as always, very sage advice. 113 00:11:31,760 --> 00:11:32,870 But first, method two. 114 00:11:32,870 --> 00:11:42,260 One of the things people said about such a simple estimate is that it's so clear and transparent: deaths divided by cases. 115 00:11:42,260 --> 00:11:50,180 But if the time from becoming a case to dying is long, there will be a gap due to that delay, and the estimate will be biased downwards. 116 00:11:50,180 --> 00:11:55,310 An alternative that's almost as simple is deaths divided by deaths plus recoveries. 117 00:11:55,310 --> 00:12:05,930 In that case, if the time from becoming a case to dying has a similar distribution to the time from becoming a case to recovering, this should be unbiased.
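Methods one and two are simple enough to write down directly. The example uses the Hubei counts quoted later in the talk (200 deaths, 116 recoveries) for the deaths-over-resolved estimator, and a small hypothetical snapshot, with 86% of cases still unresolved, to show why deaths over cases is biased downwards while many patients are still in hospital.

```python
def cfr_deaths_over_cases(deaths: int, cases: int) -> float:
    """Method 1: deaths so far / cases so far (biased low while outcomes are pending)."""
    return deaths / cases


def cfr_deaths_over_resolved(deaths: int, recoveries: int) -> float:
    """Method 2: deaths / (deaths + recoveries); roughly unbiased if the time to
    death and the time to recovery have similar distributions."""
    return deaths / (deaths + recoveries)


# Hypothetical snapshot: 900 cases, 126 with a known outcome (86% still pending).
cases, deaths, recoveries = 900, 20, 106
print(f"method 1: {cfr_deaths_over_cases(deaths, cases):.1%}")
print(f"method 2: {cfr_deaths_over_resolved(deaths, recoveries):.1%}")

# Hubei counts quoted in the talk.
print(f"Hubei, method 2: {cfr_deaths_over_resolved(200, 116):.1%}")
```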
118 00:12:05,930 --> 00:12:09,260 We also developed, as I said, with David Cox, 119 00:12:09,260 --> 00:12:17,930 a non-parametric method that uses a Kaplan-Meier-like estimate. That example is not for SARS, 120 00:12:17,930 --> 00:12:24,080 but it shows how the probability of recovery goes up and the probability of survival goes down, 121 00:12:24,080 --> 00:12:29,390 and you end up somewhere in the middle, with those two lines meeting, once you've seen all of the outcomes. 122 00:12:29,390 --> 00:12:34,700 So we looked at the performance of these, looking back at the end of the SARS epidemic, 123 00:12:34,700 --> 00:12:41,600 looking back at the SARS epidemic in Hong Kong, and showed that if you look at the 2nd of April 124 00:12:41,600 --> 00:12:46,010 (remember, that's when eighty-six percent of the cases were censored), 125 00:12:46,010 --> 00:12:51,430 this is the estimate based on deaths so far divided by cases so far. 126 00:12:51,430 --> 00:13:00,100 And it has a very tight confidence interval. It's still only a couple of percent. This, in black, this black square, 127 00:13:00,100 --> 00:13:08,770 is the eventual case fatality rate, between 13 and 14 percent, for all the people who had become cases by the 2nd of April. 128 00:13:08,770 --> 00:13:16,310 And what you see, if you just follow the one with the diamond, which is the simple one of deaths divided by cases, is that over time it goes up. 129 00:13:16,310 --> 00:13:23,780 It will eventually get to the right place if you observe all your outcomes, because then it's all the deaths divided by all the cases. 130 00:13:23,780 --> 00:13:32,390 But it's very biased downwards initially. Our simple alternative was deaths divided by deaths plus recoveries. 131 00:13:32,390 --> 00:13:42,290 That's the one with the triangle here. Of course, it shows considerably more uncertainty, because there weren't nearly 900 people who had outcomes.
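The third, non-parametric approach treats death and recovery as competing outcomes with censoring. The sketch below uses the standard Aalen-Johansen (cumulative incidence) estimator and reports the estimated death incidence divided by the total estimated incidence of either outcome. It is written in the spirit of the method described, not as the exact estimator from the 2003 analysis, and the toy records are hypothetical.

```python
from itertools import groupby

CENSORED, DEATH, RECOVERY = 0, 1, 2


def competing_risks_cfr(records):
    """Case fatality estimate from (days_since_onset, status) records.

    Uses cumulative incidence functions for the two competing outcomes and
    returns CIF_death / (CIF_death + CIF_recovery) at the last observed time.
    Censored patients (still in hospital) leave the risk set without an outcome.
    """
    data = sorted(records)
    at_risk = len(data)
    event_free = 1.0                 # Kaplan-Meier probability of no outcome yet
    cif_death = cif_recovery = 0.0
    for _, group in groupby(data, key=lambda rec: rec[0]):
        statuses = [status for _, status in group]
        deaths = statuses.count(DEATH)
        recoveries = statuses.count(RECOVERY)
        if deaths or recoveries:
            cif_death += event_free * deaths / at_risk
            cif_recovery += event_free * recoveries / at_risk
            event_free *= 1 - (deaths + recoveries) / at_risk
        at_risk -= len(statuses)     # everyone observed at this time leaves the risk set
    resolved = cif_death + cif_recovery
    return cif_death / resolved if resolved else float("nan")


# Hypothetical records: (days since onset, outcome).
records = [(3, DEATH), (4, CENSORED), (6, RECOVERY), (6, CENSORED),
           (8, DEATH), (9, RECOVERY), (10, CENSORED), (12, RECOVERY)]
naive = sum(s == DEATH for _, s in records) / len(records)
print(f"deaths / cases:      {naive:.2f}")
print(f"competing-risks CFR: {competing_risks_cfr(records):.2f}")
```

With no censoring the two cumulative incidences sum to one and the estimate reduces to deaths divided by all cases, which is the sense in which the curves "meet" once every outcome is observed.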
132 00:13:42,290 --> 00:13:52,130 So it's less certain, but it's not biased. If the time from becoming a case to dying was not similar to the time from becoming a case to recovering, 133 00:13:52,130 --> 00:13:58,190 which could happen if you had to hold people for a couple of weeks to be absolutely certain they weren't infectious, 134 00:13:58,190 --> 00:14:03,290 that can make a big difference, and then our fancy statistical one would be the way to go. 135 00:14:03,290 --> 00:14:09,260 How did that get interpreted? The press picked up that the estimate was increasing over time, 136 00:14:09,260 --> 00:14:14,870 as if the virus was becoming deadlier, as if it was mutating to become more deadly, which is quite scary. 137 00:14:14,870 --> 00:14:22,850 And so we published our case fatality ratio estimates, and this is how it was reported on the 26th of April: 138 00:14:22,850 --> 00:14:30,260 research by a British scientist (it wasn't one; there were several) is due to be published, and it said that the virus could kill. 139 00:14:30,260 --> 00:14:31,820 So the case fatality ratio, 140 00:14:31,820 --> 00:14:40,870 with its confidence interval at that point, was between eight and 15 percent, whereas WHO had estimated between five and six percent at that point, 141 00:14:40,870 --> 00:14:47,650 and they said their estimate was more reliable because it was based on cases worldwide instead of just Hong Kong. 142 00:14:47,650 --> 00:14:57,400 There was some discussion in the next hour, by which time the research was attributed to Professor Roy Anderson and colleagues, rather than to "a British scientist". 143 00:14:57,400 --> 00:15:04,450 But then this WHO spokesman said that Roy Anderson was a top-class professional and his findings were probably accurate. 144 00:15:04,450 --> 00:15:11,620 So sometimes it's not the message. Sometimes it's the messenger. 145 00:15:11,620 --> 00:15:19,030 And I just wanted to show you, before I hand over to Robin,
the case numbers as of today: it's nearly 10,000 cases. 146 00:15:19,030 --> 00:15:23,260 This is by colleagues at Johns Hopkins. 147 00:15:23,260 --> 00:15:27,970 But I wanted to show you... I have to try not to stand in the way. 148 00:15:27,970 --> 00:15:33,760 So if we look at deaths over deaths plus recoveries, it does not look good. 149 00:15:33,760 --> 00:15:38,320 But again, I hasten to add, before people jump to conclusions on that, 150 00:15:38,320 --> 00:15:46,570 it's going to be very different if the time from being a case to dying is very different from the time from being a case to recovering. 151 00:15:46,570 --> 00:15:51,550 And if we look at the examples (I'll look here, because I can see them, rather than from this angle), 152 00:15:51,550 --> 00:15:58,600 if you look at Hubei, which is first on each side, they've had two hundred deaths and one hundred and sixteen recoveries. 153 00:15:58,600 --> 00:16:04,300 But if you look at Guangdong, they've had zero deaths and 11 recoveries, 154 00:16:04,300 --> 00:16:09,070 so it really depends on how long the epidemic has been there, and it could also vary depending 155 00:16:09,070 --> 00:16:14,650 on how long people who aren't critical, and aren't dying, are being kept in hospital. 156 00:16:14,650 --> 00:16:20,220 OK, I will hand over to Robin. Hi, everybody. 157 00:16:20,220 --> 00:16:27,000 Thank you very much for this opportunity to update you on the work that we've been doing in the Mathematical Institute to analyse this outbreak, 158 00:16:27,000 --> 00:16:32,310 as Christl said, in real time. Christl put up this really great set of questions at the beginning, 159 00:16:32,310 --> 00:16:38,100 which are the kinds of questions that mathematical modelling can be used to answer in real time during an outbreak. 160 00:16:38,100 --> 00:16:42,480 I'm going to focus on two of those, in terms of the work that we've done. 161 00:16:42,480 --> 00:16:46,950 The first one is this one here.
So this was in the very first few days of the outbreak. 162 00:16:46,950 --> 00:16:51,840 And the question was: can we use modelling to identify what the underlying pathogen is? 163 00:16:51,840 --> 00:16:58,290 So can we say this is perhaps the SARS coronavirus? Or is it instead something new that we've never seen before? 164 00:16:58,290 --> 00:16:59,910 So that was the first question. 165 00:16:59,910 --> 00:17:07,590 And then the second question, which we've been focusing on more recently, relates to how many cases there might be, and in particular 166 00:17:07,590 --> 00:17:13,500 what happens if the virus goes to a new location. So, for example, if it comes to the UK, as it now has, 167 00:17:13,500 --> 00:17:22,590 what's the chance of then seeing a chain of sustained human-to-human transmission in that new location, as opposed to not seeing any more cases? 168 00:17:22,590 --> 00:17:26,880 So those are the two questions I'm going to concentrate on. 169 00:17:26,880 --> 00:17:33,300 The first, like I say, is what we were trying to do in the very first few days of the outbreak. In particular, 170 00:17:33,300 --> 00:17:41,220 the first I heard of this outbreak was right back at the beginning of January. I was alerted to it by colleagues at Hokkaido University in Japan. 171 00:17:41,220 --> 00:17:47,310 What they'd seen was this notice here from China. I don't speak Chinese, 172 00:17:47,310 --> 00:17:54,240 but I've been informed that this apparently says there's been a cluster of atypical pneumonia cases in the city of Wuhan, 173 00:17:54,240 --> 00:18:00,060 but the source of these pneumonia cases, so exactly what the pathogen is, is as yet unknown. 174 00:18:00,060 --> 00:18:03,540 As I say, my Chinese isn't that good, but apparently that's what it says.
175 00:18:03,540 --> 00:18:07,590 The other thing that was known at that time, or that appeared to be the case, 176 00:18:07,590 --> 00:18:12,030 was that all of those reported cases were linked to a particular seafood market, 177 00:18:12,030 --> 00:18:17,730 this one here, the Huanan seafood market in Wuhan. So that was right back at the beginning. 178 00:18:17,730 --> 00:18:22,020 This is an update relating to the kinds of figures Christl was just talking about. 179 00:18:22,020 --> 00:18:28,410 Back then, there were very few cases, I think something like twenty-seven at the end of December. As you can see, now 180 00:18:28,410 --> 00:18:36,300 the number of reported cases has been accelerating quite rapidly. So going back, as I say, to the beginning of the outbreak, 181 00:18:36,300 --> 00:18:41,850 this was the scenario: cases accumulated during December 2019, 182 00:18:41,850 --> 00:18:48,810 and when we got to the 31st of December, 27 cases of viral pneumonia had been found. 183 00:18:48,810 --> 00:18:52,980 As I say, almost all of these cases were linked to this Huanan seafood market. 184 00:18:52,980 --> 00:18:58,320 With that in mind, there was no evidence at that point of any human-to-human transmission. 185 00:18:58,320 --> 00:19:04,590 What was suspected at that stage was that all of these people had picked up the virus at the market, from an animal reservoir. 186 00:19:04,590 --> 00:19:12,270 That's as far as we knew. There was no evidence of any onward transmission between people. 187 00:19:12,270 --> 00:19:14,490 So what could we do at that stage? Obviously, 188 00:19:14,490 --> 00:19:20,820 what people want to do is test the virus and find out exactly which virus is driving the ongoing outbreak. 189 00:19:20,820 --> 00:19:25,710 And if you want to identify the virus unambiguously, that's what you have to do.
190 00:19:25,710 --> 00:19:27,490 Unfortunately, that takes some time. 191 00:19:27,490 --> 00:19:33,810 And so in those first few days, we were saying: well, there's a huge amount of data available via social media sites, 192 00:19:33,810 --> 00:19:44,010 via websites like FluTrackers and other medical social media sites, and they give a rough idea of the characteristics of the outbreak. 193 00:19:44,010 --> 00:19:48,840 As you can see, some of the data are not necessarily that accurate, 194 00:19:48,840 --> 00:19:56,550 for example the suggestion that maybe there wasn't any human-to-human transmission, but there was a vast quantity of possible data like this. 195 00:19:56,550 --> 00:20:03,030 So what I've got in this column here is a vector of what was observed at that point in the outbreak. 196 00:20:03,030 --> 00:20:08,640 In particular, there's a one indicating that there were lots of cases of atypical pneumonia. 197 00:20:08,640 --> 00:20:11,010 There's a zero here, which indicates that at that stage 198 00:20:11,010 --> 00:20:18,600 it didn't appear that there was much human-to-human transmission, and a whole bunch of other characteristics that had been observed. 199 00:20:18,600 --> 00:20:21,990 What you can then do is go back and look at previous outbreaks, for example 200 00:20:21,990 --> 00:20:27,870 the SARS outbreak, and write down an equivalent vector based on what was observed in that outbreak. 201 00:20:27,870 --> 00:20:33,840 Then, if what you want to do is determine whether or not the current outbreak is indeed due to the SARS coronavirus, 202 00:20:33,840 --> 00:20:39,690 essentially what you want to do is a pairwise comparison between this vector here and this vector here. 203 00:20:39,690 --> 00:20:44,940 And if those two vectors are very similar, then you might conclude that the current outbreak is in fact an outbreak of SARS.
204 00:20:44,940 --> 00:20:50,640 If those two vectors are very different, then you might conclude that perhaps it's not SARS and it's something a bit different: 205 00:20:50,640 --> 00:20:59,200 either one of the other possible pathogens on your list or, in fact, none of these, and something that we've never seen before. 206 00:20:59,200 --> 00:21:04,750 So this is what we did, and it essentially comes down to quite a simple application of Bayes' rule. 207 00:21:04,750 --> 00:21:06,970 The idea is that you want to ask: 208 00:21:06,970 --> 00:21:13,840 what is the probability that the outbreak you've got is due to some disease, for example SARS, given the observed characteristics, 209 00:21:13,840 --> 00:21:17,650 that is, given the vector of things that we've seen in the ongoing outbreak? 210 00:21:17,650 --> 00:21:20,080 If you apply Bayes' rule, you can say that 211 00:21:20,080 --> 00:21:26,080 that's equal to the probability of seeing this set of observed characteristics, given that it is, for example, 212 00:21:26,080 --> 00:21:33,490 SARS, multiplied by the a priori probability that this outbreak is SARS, divided by the sum of the equivalent terms for 213 00:21:33,490 --> 00:21:39,780 all of the candidate diseases on your list of diseases that it could hypothetically be. 214 00:21:39,780 --> 00:21:43,440 So we did this calculation. You have to make a few assumptions; in particular, 215 00:21:43,440 --> 00:21:48,810 you have to make some assumptions about the distance between these vectors and how that relates to the 216 00:21:48,810 --> 00:21:53,010 probability that the observed characteristics are due to the disease that you're comparing against. 217 00:21:53,010 --> 00:21:56,880 So I would say that, at the moment, this is quite a rough method.
218 00:21:56,880 --> 00:22:04,020 But what this rough method gives you is a kind of priority list of potential pathogens that could be causing the ongoing outbreak. 219 00:22:04,020 --> 00:22:09,150 That list looks a little bit like this, going from the one that appears most likely, given the characteristics 220 00:22:09,150 --> 00:22:15,500 you've observed, to pathogens that are less likely to be the causal agent. 221 00:22:15,500 --> 00:22:21,050 What you can also do is extend this analysis and ask: given these vectors, 222 00:22:21,050 --> 00:22:28,910 what is the probability that this outbreak is in fact being driven by a pathogen that is not any one of those on your list? 223 00:22:28,910 --> 00:22:29,600 And if you do that, 224 00:22:29,600 --> 00:22:37,580 what you obtain is a kind of risk score for the probability that this outbreak is driven by none of these pathogens. 225 00:22:37,580 --> 00:22:41,930 In particular, what you can see here is what's called disease X; this is how I refer to it. 226 00:22:41,930 --> 00:22:48,530 It means a pathogen that you haven't previously seen. What you see is that your probability of disease 227 00:22:48,530 --> 00:22:52,760 X is something that changes throughout the outbreak as more information comes in. 228 00:22:52,760 --> 00:22:58,160 So as you see more characteristics that suggest this perhaps isn't SARS, for example, 229 00:22:58,160 --> 00:23:04,600 your probability that this is in fact an entirely new thing jumps up through time. 230 00:23:04,600 --> 00:23:10,300 OK, so that was the first bit of analysis that we did. The second is something that we've done more recently. 231 00:23:10,300 --> 00:23:14,770 And so, like I said, the question here is: if the virus arrives in the UK
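The Bayes' rule calculation can be sketched in a few lines: P(disease | observed) ∝ P(observed | disease) × P(disease), normalised over the candidate list. Turning vector distance into a likelihood needs an assumption; here each feature is assumed to match the candidate's profile independently with probability `agree`, so the likelihood falls off with the Hamming distance. "Disease X" is added as an extra candidate whose features are treated as coin flips. The feature list, the pathogen profiles (named generically A, B, C) and the priors are hypothetical placeholders, not the vectors used in the analysis.

```python
def posterior(observed, profiles, priors, agree=0.9, disease_x_prior=None):
    """Posterior probability of each candidate pathogen given a 0/1 feature vector.

    Likelihood (an assumption): each feature matches the candidate's profile
    independently with probability `agree`, so more mismatches => lower likelihood.
    If `disease_x_prior` is given, an unknown pathogen is included whose
    features are equally likely to be 0 or 1.
    """
    weights = {}
    for name, profile in profiles.items():
        mismatches = sum(o != p for o, p in zip(observed, profile))
        matches = len(observed) - mismatches
        weights[name] = priors[name] * agree**matches * (1 - agree)**mismatches
    if disease_x_prior is not None:
        weights["disease X"] = disease_x_prior * 0.5 ** len(observed)
    total = sum(weights.values())  # denominator of Bayes' rule
    return {name: w / total for name, w in weights.items()}


# Hypothetical features: [atypical pneumonia, human-to-human spread, market link, severe in elderly]
profiles = {"pathogen A": [1, 1, 0, 1], "pathogen B": [1, 0, 0, 0], "pathogen C": [0, 1, 1, 0]}
priors = {"pathogen A": 0.3, "pathogen B": 0.3, "pathogen C": 0.3}

early = [1, 0]        # only the first two features known
later = [1, 0, 1, 1]  # more characteristics observed
for label, obs in (("early", early), ("later", later)):
    truncated = {k: v[:len(obs)] for k, v in profiles.items()}
    post = posterior(obs, truncated, priors, disease_x_prior=0.1)
    print(label, {k: round(v, 3) for k, v in post.items()})
```

As observations accumulate that fit none of the listed profiles well, the probability assigned to disease X rises, mirroring the behaviour described in the talk.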
or somewhere else, in other words, 232 00:23:14,770 --> 00:23:23,100 if a case goes to a new location, what's the probability that we then see sustained transmission in that new location? 233 00:23:23,100 --> 00:23:27,360 I should say this figure has nothing to do with the coronavirus at all; I just wanted to demonstrate the idea. 234 00:23:27,360 --> 00:23:35,270 This is a model that many of you might be familiar with. It's a simulation of an SIR model, in particular a stochastic SIR model. 235 00:23:35,270 --> 00:23:40,560 The idea here is that you flip a coin lots of times, and according to the result of each coin flip, 236 00:23:40,560 --> 00:23:44,460 you either generate a new infection in your population, or you assume that one 237 00:23:44,460 --> 00:23:51,030 of the infected individuals you've got recovers or becomes isolated; in either case, they get removed from the current outbreak. 238 00:23:51,030 --> 00:23:52,500 So you flip a coin lots of times. 239 00:23:52,500 --> 00:24:00,590 And if you do this once, according to a very simple model like this, you might see an outbreak projection that looks a little bit like this one here. 240 00:24:00,590 --> 00:24:07,310 If you do exactly the same thing again, under absolutely identical conditions, you might see something that looks very slightly different, 241 00:24:07,310 --> 00:24:14,800 like this blue line here. The blue line comes from exactly the same model, but with a very slightly different sequence of coin flips. 242 00:24:14,800 --> 00:24:21,250 But then if you do this again, once again under absolutely identical conditions, you might instead see something a bit like this. 243 00:24:21,250 --> 00:24:23,980 This starts with one infected case, and in the red case 244 00:24:23,980 --> 00:24:29,530 it just happens to fade out, and you don't see an outbreak driven by sustained human-to-human transmission.
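The coin-flip picture corresponds to simulating the embedded jump chain of the stochastic SIR model: at each step the next event is a new infection with probability βSI/N ÷ (βSI/N + γI), otherwise a removal. The population size and rates below are arbitrary illustrative values (R0 = β/γ = 2); as in the figure, identical settings give runs that sometimes fade out almost immediately and sometimes take off.

```python
import random


def sir_final_size(n: int, beta: float, gamma: float, seed=None, initial: int = 1) -> int:
    """Total number ever infected in one run of a stochastic SIR model.

    Each loop iteration is one 'coin flip': infection with probability
    infection_rate / (infection_rate + removal_rate), otherwise a removal.
    Requires gamma > 0.
    """
    rng = random.Random(seed)
    susceptible, infected, ever_infected = n - initial, initial, initial
    while infected > 0:
        infection_rate = beta * susceptible * infected / n
        removal_rate = gamma * infected
        if rng.random() < infection_rate / (infection_rate + removal_rate):
            susceptible -= 1
            infected += 1
            ever_infected += 1
        else:
            infected -= 1   # recovery or isolation
    return ever_infected


# Identical conditions, different coin flips.
for seed in range(8):
    print(f"run {seed}: {sir_final_size(1_000, beta=2.0, gamma=1.0, seed=seed)} infected in total")
```

Repeating this many times and counting the runs that exceed some size threshold gives the Monte Carlo estimate of the probability of sustained transmission.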
245 00:24:29,530 --> 00:24:32,290 And again, this is just based on the coin flips that you have. 246 00:24:32,290 --> 00:24:37,210 So if your first coin flip is someone getting removed rather than someone generating a new infection, 247 00:24:37,210 --> 00:24:40,250 then you'd see a very small outbreak like the red one here. 248 00:24:40,250 --> 00:24:46,210 This is just to introduce the idea that when someone brings the virus to a new location, 249 00:24:46,210 --> 00:24:51,430 there's some probability that what you see is an outbreak driven by sustained human-to-human transmission. 250 00:24:51,430 --> 00:24:57,010 In other words, you see something like the blue or the black, as opposed to something that looks like the red. 251 00:24:57,010 --> 00:24:58,930 And so, in theory, if you had a forecasting model, 252 00:24:58,930 --> 00:25:04,630 you could simulate this lots and lots of times and count the proportion of times you get something like the blue and the black, 253 00:25:04,630 --> 00:25:07,180 as opposed to the proportion of times you get something like the red. 254 00:25:07,180 --> 00:25:13,640 That then gives you the probability of getting sustained transmission when the virus arrives in a new location. 255 00:25:13,640 --> 00:25:21,340 That's the idea. It turns out that for simple models like the SIR model, you can calculate this probability analytically. 256 00:25:21,340 --> 00:25:24,340 In fact, for the SIR model, it comes down to solving a quadratic equation. 257 00:25:24,340 --> 00:25:30,400 For more complex models, it comes down to more complex calculations. But essentially what you see is something like this: 258 00:25:30,400 --> 00:25:36,340 the probability of sustained transmission is some function of how transmissible the pathogen driving the outbreak is.
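The analytic route for the SIR model and the simulate-and-count route can be compared directly. This sketch assumes the early phase of an outbreak, when susceptibles are not yet depleted, so each event is an infection with probability p = R0/(1 + R0). It treats reaching a hypothetical `threshold` of simultaneous infections as "sustained transmission". The function names and defaults are illustrative, not taken from the speakers' analysis.

```python
import math
import random


def outbreak_probability_analytic(r0):
    """Probability that one introduced case leads to sustained transmission.

    Early in a stochastic SIR outbreak (susceptibles not yet depleted), each event
    is an infection with probability p = R0 / (1 + R0), otherwise a removal.
    The extinction probability q is the smaller root of p*q**2 - q + (1 - p) = 0.
    """
    if r0 <= 1:
        return 0.0
    p = r0 / (1 + r0)
    discriminant = max(1 - 4 * p * (1 - p), 0.0)  # guard against tiny negative rounding
    q = (1 - math.sqrt(discriminant)) / (2 * p)   # smaller root; equals 1 / R0
    return 1 - q


def outbreak_probability_simulated(r0, runs=5_000, threshold=50, seed=0):
    """Monte Carlo version: flip the same biased coin until the outbreak either dies
    out or reaches `threshold` simultaneous infections (treated as 'sustained')."""
    rng = random.Random(seed)
    p = r0 / (1 + r0)
    sustained = 0
    for _ in range(runs):
        infected = 1
        while 0 < infected < threshold:
            infected += 1 if rng.random() < p else -1
        sustained += infected >= threshold
    return sustained / runs


if __name__ == "__main__":
    for r0 in (0.8, 1.5, 2.0, 3.0):
        print(r0, round(outbreak_probability_analytic(r0), 3),
              outbreak_probability_simulated(r0))
```

For R0 > 1 the relevant root of the quadratic is q = 1/R0, so the outbreak probability is 1 - 1/R0. For example, R0 = 2 gives 0.5, and the simulated proportion should land close to that.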
259 00:25:36,340 --> 00:25:41,680 If you have a very transmissible pathogen, then the probability of getting a sustained outbreak is quite high, 260 00:25:41,680 --> 00:25:48,520 whereas if you have a pathogen that's not very transmissible, then the probability of getting a sustained outbreak is substantially lower. 261 00:25:48,520 --> 00:25:53,890 This measure of pathogen transmissibility on the x axis, the basic reproduction number, which Crystal referred to earlier, 262 00:25:53,890 --> 00:25:58,450 is just the product of the rate at which new infections are being generated by 263 00:25:58,450 --> 00:26:03,760 each infected host and the length of time that an infected host is infectious for. 264 00:26:03,760 --> 00:26:09,790 In other words, if you have a pathogen for which hosts are infectious for a very long time, 265 00:26:09,790 --> 00:26:12,040 then the pathogen is going to be very transmissible. 266 00:26:12,040 --> 00:26:17,380 Similarly, if you have a pathogen for which infected hosts generate new infections very quickly, 267 00:26:17,380 --> 00:26:21,640 then again you get a pathogen that's very transmissible. So that's the idea. 268 00:26:21,640 --> 00:26:25,570 We applied some of these methods to data from the ongoing outbreak. 269 00:26:25,570 --> 00:26:30,190 In particular, one of the things that we wanted to quantify was how long individuals are infectious 270 00:26:30,190 --> 00:26:36,880 for. The way that we did that was to take data from 47 patients from very early in the outbreak. 271 00:26:36,880 --> 00:26:45,340 The data that were recorded were the number of days between individuals showing symptoms and then being taken to hospital and isolated. 272 00:26:45,340 --> 00:26:51,490 In other words, this is a measure of how long people could have been out in the community, spreading the virus.
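Written as a formula, the transmissibility measure described here is the product of a rate and a duration. The notation is ours, not the speakers': β is the rate at which an infected host generates new infections (per day) and D is the mean infectious period (days). The numbers are purely illustrative.

```latex
R_0 = \beta \, D, \qquad \text{e.g. } \beta = 0.4~\text{day}^{-1},\quad D = 5~\text{days}
\;\Longrightarrow\; R_0 = 0.4 \times 5 = 2
```

Doubling either β or D doubles R_0. This is why a long infectious period and fast generation of new infections both make a pathogen "very transmissible".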
273 00:26:51,490 --> 00:26:55,300 So this gives you some sort of metric of the duration of infection. And then from that, 274 00:26:55,300 --> 00:27:03,610 you can use methods like the ones I just showed you to calculate the probability of getting a sustained outbreak when the virus goes to a new place. 275 00:27:03,610 --> 00:27:07,960 Then what we did was look at what happens if you take this distribution here and squash it. 276 00:27:07,960 --> 00:27:14,290 In particular, what happens if you can cut down the amount of time between symptom onset and isolation, 277 00:27:14,290 --> 00:27:20,500 for example by having more rigorous surveillance of symptomatic hosts? If you have more rigorous surveillance of symptomatic hosts, 278 00:27:20,500 --> 00:27:25,550 then your risk of getting sustained outbreaks is substantially smaller. 279 00:27:25,550 --> 00:27:33,470 And so, as I said, this is quite a topical analysis, because today we've seen these two cases in the UK. 280 00:27:33,470 --> 00:27:38,750 This is one of the first preprints, one of the first analyses of this outbreak, to be put online, and so, slightly 281 00:27:38,750 --> 00:27:41,300 strangely, this is my first experience of any press attention. 282 00:27:41,300 --> 00:27:47,720 In particular, this graph appeared on BBC News live from my living room, which is kind of strange. 283 00:27:47,720 --> 00:27:55,280 But anyway, this is a very topical analysis, and so I guess these kinds of calculations are particularly relevant right now. 284 00:27:55,280 --> 00:28:01,200 One more thing to say is about an extension to this work that we're now looking at. 285 00:28:01,200 --> 00:28:05,150 My PhD student, Francesca Reid, is now doing some work on this in particular, 286 00:28:05,150 --> 00:28:06,800 in response to this article here.
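One way to see why squashing the onset-to-isolation distribution lowers the risk is a branching-process sketch. In it, each case stays infectious for a duration drawn from a list of onset-to-isolation times. This illustration rests on stated assumptions and is not the published analysis. The durations in the example are hypothetical (not the 47-patient data). Transmission is assumed to happen at a constant rate `beta` only until isolation. `squash` is a hypothetical multiplier applied to every duration.

```python
import math


def extinction_probability(beta, durations, tol=1e-12, max_iter=100_000):
    """Probability that a single introduced case does NOT spark sustained transmission.

    Assumed branching process: each case is infectious for a duration T drawn uniformly
    from `durations` (days) and infects others at constant rate `beta` per day until
    isolation, so its number of secondary cases is Poisson(beta * T). The extinction
    probability q is the smallest solution of q = mean_T exp(-beta * T * (1 - q)),
    reached by fixed-point iteration starting from q = 0.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = sum(math.exp(-beta * t * (1 - q)) for t in durations) / len(durations)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


def outbreak_risk(beta, durations, squash=1.0):
    """Probability of a sustained outbreak after multiplying every onset-to-isolation
    duration by `squash` (e.g. 0.5 = isolate people twice as quickly)."""
    return 1 - extinction_probability(beta, [squash * t for t in durations])


if __name__ == "__main__":
    onset_to_isolation_days = [1, 2, 2, 3, 4, 5, 7, 10]  # hypothetical values, not real data
    for squash in (1.0, 0.75, 0.5):
        risk = outbreak_risk(0.5, onset_to_isolation_days, squash)
        print(f"squash={squash}: outbreak risk={risk:.3f}")
```

The fixed-point iteration is the standard way to find the smallest root of a branching-process generating-function equation. Squashing the durations lowers the mean number of secondary cases, and with it the outbreak risk.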
287 00:28:06,800 --> 00:28:15,830 What has now been found, from evidence that came from China, is that some infections appear to be happening before the infector becomes symptomatic. 288 00:28:15,830 --> 00:28:23,570 So there is some possibility that infections can spread not from people who are showing signs of having the disease, but before that time. 289 00:28:23,570 --> 00:28:32,160 With that in mind, this distribution here perhaps isn't a very good characterisation of how long people are infectious for. 290 00:28:32,160 --> 00:28:37,190 So what we're doing is updating this analysis so that individuals who have the disease no longer go directly 291 00:28:37,190 --> 00:28:42,920 from a state in which they're not symptomatic and not infectious to being symptomatic and infectious. 292 00:28:42,920 --> 00:28:45,050 That was our assumption before. 293 00:28:45,050 --> 00:28:51,890 Now we've got something slightly different, in which individuals go from being not symptomatic and not infectious to being not symptomatic 294 00:28:51,890 --> 00:28:57,680 but potentially infectious, and it's only later that they become both symptomatic and infectious. 295 00:28:57,680 --> 00:29:01,100 You can then extend the kinds of analysis that I just showed you. 296 00:29:01,100 --> 00:29:07,760 The key conclusion that came out of this is that exactly how you might try to control 297 00:29:07,760 --> 00:29:13,310 pathogens with different levels of pre-symptomatic transmission might be very different. 298 00:29:13,310 --> 00:29:19,370 If you have a pathogen for which a huge proportion of infections occur before symptoms appear, 299 00:29:19,370 --> 00:29:21,410 then that might affect what you do. 300 00:29:21,410 --> 00:29:28,460 In particular, a control strategy that shortens the period of symptomatic infectiousness might not be very effective any more.
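The two-stage structure described here can be sketched with a hypothetical class in which each infectious stage contributes its own rate × duration to the reproduction number. The class name, parameter names and values are illustrative assumptions, not fitted estimates. The point is only that faster isolation shortens the symptomatic stage and leaves pre-symptomatic transmission untouched.

```python
from dataclasses import dataclass


@dataclass
class TwoStageInfection:
    """Hypothetical parameterisation of the extended model: after infection, a host is
    first infectious but not yet symptomatic, then symptomatic and infectious until isolated."""
    beta_pre: float  # new infections per day before symptom onset
    days_pre: float  # mean length of the pre-symptomatic infectious stage (days)
    beta_sym: float  # new infections per day after symptom onset
    days_sym: float  # mean time from symptom onset to isolation (days)

    def r0(self) -> float:
        """Secondary infections per case: the contributions of both stages add up."""
        return self.beta_pre * self.days_pre + self.beta_sym * self.days_sym

    def presymptomatic_share(self) -> float:
        """Fraction of all transmission that happens before symptoms appear."""
        return self.beta_pre * self.days_pre / self.r0()

    def r_with_faster_isolation(self, keep_fraction: float) -> float:
        """Reproduction number if surveillance shortens only the symptomatic stage to
        `keep_fraction` of its current length; the pre-symptomatic stage is untouched."""
        return self.beta_pre * self.days_pre + self.beta_sym * self.days_sym * keep_fraction


if __name__ == "__main__":
    mostly_symptomatic = TwoStageInfection(beta_pre=0.1, days_pre=2, beta_sym=0.4, days_sym=5)
    mostly_presymptomatic = TwoStageInfection(beta_pre=0.5, days_pre=3, beta_sym=0.14, days_sym=5)
    for name, model in [("mostly symptomatic", mostly_symptomatic),
                        ("mostly pre-symptomatic", mostly_presymptomatic)]:
        print(f"{name}: R0={model.r0():.2f}, "
              f"pre-symptomatic share={model.presymptomatic_share():.2f}, "
              f"R with 4x faster isolation={model.r_with_faster_isolation(0.25):.2f}")
```

Both example pathogens have R0 = 2.2. Cutting the symptomatic stage to a quarter of its length brings the mostly symptomatic one down to 0.7. The mostly pre-symptomatic one stays at about 1.68, still above 1.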
301 00:29:28,460 --> 00:29:28,850 So actually, 302 00:29:28,850 --> 00:29:36,890 just going and trying to find symptomatic people might not be sufficient to control the virus if there's a lot of pre-symptomatic infection. 303 00:29:36,890 --> 00:29:42,610 So I'm going to leave it there. I'm going to hand over to Christophe, and I look forward to questions at the end. 304 00:29:42,610 --> 00:29:49,830 I think I'll always remember the day when Crystal gave a couple of us who 305 00:29:49,830 --> 00:29:55,840 were working with her a quick lesson in survival analysis, bias 306 00:29:55,840 --> 00:30:02,470 and right-truncated data, showing that the CFR, the case fatality rate, estimates were not one 307 00:30:02,470 --> 00:30:07,870 to two and a half percent, but between seven and 15 percent. What she didn't say was that at the time 308 00:30:07,870 --> 00:30:19,300 there was also a change in the case definition for inclusion in databases, which doubled the number of reported cases that afternoon. 309 00:30:19,300 --> 00:30:24,580 So it was a memorable afternoon, and it illustrates two points. 310 00:30:24,580 --> 00:30:34,030 First, important decisions need to be made for controlling an epidemic with very imperfect observations. 311 00:30:34,030 --> 00:30:41,740 Second, governments need to expose themselves to making difficult decisions where it's very 312 00:30:41,740 --> 00:30:47,140 hard to say, on the basis of incomplete data, whether they were the right decisions or not. 313 00:30:47,140 --> 00:30:52,600 So the ability to adjust and to come up with statistical estimates 314 00:30:52,600 --> 00:31:01,180 that are corrected for bias is absolutely critical for improving decision making.
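The right-truncation bias mentioned here can be illustrated with a toy simulation. This is not the survival-analysis method Crystal used. It is a hypothetical set-up with exponentially growing daily case counts, an assumed true CFR of 10%, and equal mean delays to death and to recovery. It compares the naive ratio (deaths divided by all cases) with the ratio among cases whose outcome is already known.

```python
import math
import random


def cfr_estimates(true_cfr=0.10, growth_rate=0.15, days=40, mean_delay=14.0, seed=1):
    """Compare a naive case fatality estimate with one restricted to resolved cases.

    Hypothetical set-up: daily case counts grow exponentially, each case dies with
    probability `true_cfr`, and its outcome (death or recovery) is observed after an
    exponentially distributed delay with the same mean for both outcomes. Outcomes
    falling after the analysis day are right-truncated, i.e. not yet observed.
    """
    rng = random.Random(seed)
    cases = deaths = recoveries = 0
    for onset_day in range(days):
        for _ in range(round(5 * math.exp(growth_rate * onset_day))):
            cases += 1
            dies = rng.random() < true_cfr
            outcome_day = onset_day + rng.expovariate(1 / mean_delay)
            if outcome_day > days:
                continue  # outcome not yet known at the time of analysis
            if dies:
                deaths += 1
            else:
                recoveries += 1
    naive = deaths / cases                     # biased low: unresolved cases sit only in the denominator
    resolved = deaths / (deaths + recoveries)  # roughly unbiased here because both delays are equal
    return naive, resolved


if __name__ == "__main__":
    naive, resolved = cfr_estimates()
    print(f"naive CFR={naive:.3f}, resolved-case CFR={resolved:.3f}, assumed true CFR=0.100")
```

The resolved-case ratio is only a simple correction. It breaks down when deaths and recoveries have different delay distributions, which is where proper survival-analysis methods for right-truncated data are needed.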
315 00:31:01,180 --> 00:31:11,680 So, having worked on them, I'll mention just very briefly two perspectives, one from the SARS epidemic and one from the 2009 316 00:31:11,680 --> 00:31:20,470 flu pandemic, in terms of some of the really difficult questions that need to be answered with imperfect data. 317 00:31:20,470 --> 00:31:30,820 One of the things that's new this time, I think, is a lot of prior knowledge and very open and rapid sharing, 318 00:31:30,820 --> 00:31:35,860 of data, of analyses and of genome data. 319 00:31:35,860 --> 00:31:45,160 The first thing is that I think there's an interesting question as to whether we should be thinking about this in a slightly Bayesian manner. 320 00:31:45,160 --> 00:31:51,910 There are priors that can help us see through the fog of uncertainty, and the genetic data from 321 00:31:51,910 --> 00:31:58,960 this new virus were shared within days and showed two things, two data points, which really help. 322 00:31:58,960 --> 00:32:08,020 The first is that the genetic data from the first sequences from Wuhan and from travellers were basically all genetically identical, 323 00:32:08,020 --> 00:32:16,600 separated at most by a single mutation, in a relatively large genome with error-prone replication. 324 00:32:16,600 --> 00:32:28,030 So we're pretty confident that the virus hadn't been circulating cryptically for months and really did emerge in late November at the earliest, 325 00:32:28,030 --> 00:32:37,360 and most likely in mid-December. That's different from previous outbreaks, where cryptic, 326 00:32:37,360 --> 00:32:45,130 unreported transmission, both in the case of SARS and in the case of the flu pandemic, had been going on for weeks or months. 327 00:32:45,130 --> 00:32:51,310 So this really is real-time reporting. At the same stage in those previous epidemics, 328 00:32:51,310 --> 00:32:56,680 none of these analyses had even started, and none of the data had been shared. 
329 00:32:56,680 --> 00:33:05,020 That puts it in context. The second thing is that the new coronavirus 330 00:33:05,020 --> 00:33:11,170 and the SARS virus both originate within the same clade of bat viruses. 331 00:33:11,170 --> 00:33:19,540 So they share a common origin, and the genetic distances between the new coronavirus and the SARS viruses are relatively small. 332 00:33:19,540 --> 00:33:25,660 They're small enough that they're less than the difference between influenza A and influenza B, 333 00:33:25,660 --> 00:33:31,030 and in the case of HIV we would consider those to be the same virus. 334 00:33:31,030 --> 00:33:37,360 So, taxonomically, there was a legitimate question as to whether this should be called a SARS virus or a new coronavirus, 335 00:33:37,360 --> 00:33:45,910 and they've settled on new coronavirus. But a prior might have been, or might be, to assume that this is SARS-like. 336 00:33:45,910 --> 00:33:50,560 Now, what's the big difference between SARS and influenza that's relevant? 337 00:33:50,560 --> 00:33:57,310 First of all, with SARS, containment was a massive success story. 338 00:33:57,310 --> 00:34:07,870 Serological studies after the outbreak confirmed that there was basically no asymptomatic transmission and very little transmission in children. 339 00:34:07,870 --> 00:34:13,070 So the isolation measures, the quarantine measures and the contact tracing measures 340 00:34:13,070 --> 00:34:18,580 were relying on there being little transmission before symptoms. On the x axis 341 00:34:18,580 --> 00:34:26,800 here I've got the proportion of infections, or transmissions, that occur prior to symptoms or due to asymptomatic infection. 342 00:34:26,800 --> 00:34:32,530 And on the y axis I've got R nought.
So you'll forgive the graphics, which are from 2004. 343 00:34:32,530 --> 00:34:41,890 I hesitated about whether to update them, but basically SARS was fairly infectious but fairly easy to control, because most transmissions occurred 344 00:34:41,890 --> 00:34:48,580 after symptoms, as Robin said. A clear indicator of that is that lots of 345 00:34:48,580 --> 00:34:53,110 transmissions occurred in hospitals, and health care workers were at increased risk. 346 00:34:53,110 --> 00:34:59,770 The influenza estimate here, from this paper in 2004, is completely wrong: the transmissibility is much lower, 347 00:34:59,770 --> 00:35:08,620 but there's transmission from very mild cases. So the challenge in 2009 was that it was clear that transmission was going on. 348 00:35:08,620 --> 00:35:14,380 It was clear that there were severe cases, but when the critical decision needed to be made, there was no idea 349 00:35:14,380 --> 00:35:18,850 whether the severe cases were one in 100 cases or one in a thousand cases. 350 00:35:18,850 --> 00:35:27,460 As it turned out, the estimate was probably one in 100,000 infections leading to a severe case. 351 00:35:27,460 --> 00:35:35,200 And therefore, in the case of influenza, the decision was gradually made not to try to control the virus, because more harm was 352 00:35:35,200 --> 00:35:44,260 being done by the social isolation measures needed to control the virus than by the virus itself. 353 00:35:44,260 --> 00:35:54,790 There's some juicy mathematics that you can engage with to predict the impact of isolation and contact tracing, 354 00:35:54,790 --> 00:36:01,120 showing which domain of this parameter space leads to infection control under different measures, and which doesn't.
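The simplest version of the boundary in this kind of (θ, R0) plot can be derived in one line. The derivation assumes every symptomatic case is isolated perfectly and instantly at symptom onset, with no contact tracing. θ denotes the proportion of transmission occurring before symptoms or from asymptomatic infection (the x axis), and R_0 is on the y axis. The notation is ours.

```latex
R_{\text{isolation}} = \theta \, R_0, \qquad
\text{isolation alone controls the outbreak if } \theta \, R_0 < 1
\;\Longleftrightarrow\; \theta < \frac{1}{R_0}
```

Under these assumptions, a pathogen with R_0 = 3 could be controlled by isolation alone only if fewer than a third of transmissions happen before symptoms. Imperfect or delayed isolation shrinks that region. Adding contact tracing enlarges it, which is where the fuller mathematics comes in.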
355 00:36:01,120 --> 00:36:06,940 This is important for deciding whether you're going to go for a reactive control 356 00:36:06,940 --> 00:36:12,350 policy, where you follow up symptomatic cases, or whether you need to go for a mass control policy, 357 00:36:12,350 --> 00:36:23,080 as Robin mentioned, where you just reduce contacts between all individuals in order to contain the epidemic. 358 00:36:23,080 --> 00:36:28,000 So the question is: this new coronavirus, which is genetically similar to SARS, 359 00:36:28,000 --> 00:36:36,040 is it a virus that has very high pathogenicity, with very few mild infections and very few pre-symptomatic infections? 360 00:36:36,040 --> 00:36:37,780 Or is it to the right of here? 361 00:36:37,780 --> 00:36:48,040 Or is it all the way over, like HIV, where most transmissions occur from pre-symptomatic or mildly symptomatic individuals? 362 00:36:48,040 --> 00:36:53,950 This very impressive New England Journal of Medicine paper was published yesterday afternoon; 363 00:36:53,950 --> 00:37:01,600 it's an analysis of the first 450 or so cases from Wuhan, which might be starting to give some indication. 364 00:37:01,600 --> 00:37:08,020 I refer you to the paper for its estimates of R nought, which are consistent with what we have published. 365 00:37:08,020 --> 00:37:11,890 But one critical observation here is that in the latest data, 366 00:37:11,890 --> 00:37:21,250 nearly three quarters of all people who have been reported as infections in Wuhan did not, at the point of being interviewed, 367 00:37:21,250 --> 00:37:28,300 have exposure to either the market where the infection started or a person with respiratory symptoms. 368 00:37:28,300 --> 00:37:34,090 So these are the first data to move us away from the SARS prior 369 00:37:34,090 --> 00:37:38,710 towards a more important role for pre-symptomatic and asymptomatic transmission.
370 00:37:38,710 --> 00:37:42,400 There have also been well-documented cases, such as the cases in Germany, 371 00:37:42,400 --> 00:37:51,220 where an individual who was asymptomatic, and who then developed symptoms while travelling back to China, had infected two people in Germany. 372 00:37:51,220 --> 00:37:57,340 So the reality is that we still have considerable uncertainty about the severity of infection. The severity of 373 00:37:57,340 --> 00:38:04,000 infection could be close to the case fatality rate, which is clearly quite high, as Crystal has indicated, 374 00:38:04,000 --> 00:38:10,930 or it could be much lower if there are many more mild or asymptomatic infections. 375 00:38:10,930 --> 00:38:16,240 One of the lessons of 2009 is that it took nearly four months to sort out this uncertainty, 376 00:38:16,240 --> 00:38:22,930 during which time many decisions had to be made, and some statistical lessons were very useful. 377 00:38:22,930 --> 00:38:29,530 In particular, the labels here refer to the 2009 flu, 378 00:38:29,530 --> 00:38:36,460 but this was a statistical paper by statisticians, and it was revisited by several different groups, 379 00:38:36,460 --> 00:38:41,830 showing the properties of so-called hierarchical or pyramid estimates, 380 00:38:41,830 --> 00:38:47,110 and showing that you can converge on robust estimates by multiplying the 381 00:38:47,110 --> 00:38:52,930 numbers much more quickly than by doing very large, well-designed surveys: 382 00:38:52,930 --> 00:38:57,190 looking at the degree of asymptomatic infection, then looking at the 383 00:38:57,190 --> 00:39:02,170 degree of severe infection and so on, and multiplying those numbers together.
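The pyramid idea, estimating each layer of severity separately and multiplying the conditional ratios, can be sketched as follows. The function name, the (low, high) range representation and all numbers are hypothetical. Multiplying interval endpoints gives only crude outer bounds and ignores correlations between layers. It is not the hierarchical statistical approach referred to in the talk.

```python
from math import prod


def pyramid_estimate(layers):
    """Combine conditional ratios down a severity pyramid into one overall ratio.

    `layers` is a list of (description, low, high) tuples; each probability is
    conditional on having reached the previous layer (e.g. symptomatic given infected,
    hospitalised given symptomatic, died given hospitalised). Because every factor is
    positive, multiplying all the lows and all the highs gives crude outer bounds.
    """
    low = prod(lo for _, lo, _ in layers)
    high = prod(hi for _, _, hi in layers)
    return low, high


if __name__ == "__main__":
    hypothetical_layers = [  # illustrative numbers only, not estimates for any disease
        ("symptomatic | infected", 0.4, 0.6),
        ("hospitalised | symptomatic", 0.05, 0.10),
        ("died | hospitalised", 0.10, 0.20),
    ]
    low, high = pyramid_estimate(hypothetical_layers)
    print(f"deaths per infection between {low:.4f} and {high:.4f} "
          f"(about 1 in {1 / high:.0f} to 1 in {1 / low:.0f})")
```

Each layer can come from a different, smaller study, which is why this route can converge faster than waiting for one large survey.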
384 00:39:02,170 --> 00:39:08,000 So I think right now, in order to make the key decisions, in addition to the data we're already seeing, 385 00:39:08,000 --> 00:39:16,060 we very urgently need these kinds of data, which are going to rely on screening the appropriate populations at risk. 386 00:39:16,060 --> 00:39:20,041 Thank you.