1 00:00:00,150 --> 00:00:03,300 It's always a pleasure to come back to Oxford. 2 00:00:03,660 --> 00:00:10,739 Although I'm a cardiologist, in many ways I align myself more with primary care and community care. 3 00:00:10,740 --> 00:00:20,420 And, as I'll tell you a bit later on, I've got strong links with your evidence-based medicine department and its head of centre. 4 00:00:21,240 --> 00:00:28,680 So I've got some conflicts of interest to declare. The biggest one you should know about is how I spend my time: I do clinical work, 5 00:00:29,100 --> 00:00:36,000 but I'm not a full-time clinician, and I spend the rest of my time teaching and researching, 6 00:00:36,900 --> 00:00:45,120 and I've been on advisory boards for these companies, and I also have research funding from these other funders. 7 00:00:45,930 --> 00:00:53,940 I'm a trustee of the South Asian Health Foundation, and my interests are quite varied, 8 00:00:54,270 --> 00:01:05,100 ranging from digital health and its evaluation to where informatics sits in medical education and evidence-based medicine. 9 00:01:07,550 --> 00:01:12,560 So this evening I'm going to take up half an hour of your time and do three things. 10 00:01:13,520 --> 00:01:20,480 First of all, I'm going to tell you about three patients; then I'm going to tell you about three problems that those patients highlight. 11 00:01:21,020 --> 00:01:25,260 And then three frameworks for focusing on them. 12 00:01:26,310 --> 00:01:34,490 So heart failure is what I specialise in, diagnosing and treating it in my work as a cardiologist.
13 00:01:35,500 --> 00:01:47,490 And heart failure is an unusual disease in that it's a syndrome that is really the endpoint of quite a lot of processes, 14 00:01:47,640 --> 00:01:51,600 from high blood pressure to congenital heart disease, 15 00:01:51,810 --> 00:02:04,770 from heart attacks to lots of rarer conditions; high levels of cobalt, for example, can be associated with heart failure. 16 00:02:06,060 --> 00:02:11,220 So there's a real ragbag of things that can be linked with heart failure, 17 00:02:11,730 --> 00:02:17,570 and it can eventually end up being seen by me. The idea is that we want to pick people up early. 18 00:02:17,610 --> 00:02:25,469 As in any disease, you want to intervene at the pre-disease or normal stage if you can, to stop the disease from happening. 19 00:02:25,470 --> 00:02:32,130 And there's an asymptomatic phase where people might not yet be presenting with breathlessness, which is the most common symptom. 20 00:02:33,600 --> 00:02:39,260 But most of my work is with people who say, "I can't walk to the door." 21 00:02:39,360 --> 00:02:49,710 "I can't go up a flight of stairs." They're breathless. And there's a variable course, which at worst can be fatal, 22 00:02:50,130 --> 00:02:56,460 and it can lead to lots of different treatments, the mainstay of which is drug therapy. 23 00:02:56,700 --> 00:03:01,770 But there's a spectrum of treatment, and heart transplantation is on it as well. 24 00:03:04,690 --> 00:03:07,990 But what I'm here to talk to you about is big data, 25 00:03:09,430 --> 00:03:12,550 a term that means everything and nothing. 26 00:03:14,190 --> 00:03:17,200 At the moment, I think we're up to seven Vs. 27 00:03:17,220 --> 00:03:24,750 When I was studying here doing my PhD, I first heard about big data, and I think it was only three Vs at that time. 28 00:03:24,750 --> 00:03:30,090 So every time you add a V, you can get a paper and remain in the limelight.
29 00:03:31,050 --> 00:03:38,910 So now we're up to velocity, volume, veracity, variety, variability, visualisation and value. 30 00:03:39,510 --> 00:03:51,660 But the theme really is that there's more data coming at a quicker rate: can we do something more with it? 31 00:03:51,960 --> 00:03:56,610 So my topic is whether we can do that in heart failure. 32 00:03:59,260 --> 00:04:08,080 And in this kind of jargon bingo of big data epidemiology, these are the terms that you will come across. 33 00:04:08,530 --> 00:04:17,710 Precision medicine: the idea that we can make more precise diagnoses and choose more precise courses of treatment. 34 00:04:19,120 --> 00:04:32,769 Personalising the treatment to the individual, which is slightly different from the previous one. And new or old data analytics. 35 00:04:32,770 --> 00:04:35,229 There's definitely a continuum. 36 00:04:35,230 --> 00:04:43,420 Some analytics aren't new; a lot of what is being done with big data is statistics and epidemiology, 37 00:04:43,420 --> 00:04:50,140 and that is not new. But the main theme, if you look across the spectrum I showed you for heart failure, 38 00:04:50,590 --> 00:04:57,850 is that what we're trying to do as researchers or clinicians is predict the risk of reaching each point on it, 39 00:04:58,780 --> 00:05:03,730 whether that's developing heart failure or developing the outcomes of heart failure. 40 00:05:04,390 --> 00:05:10,850 And the fact of the matter is, most of the risk scores that I have at my fingertips as a clinician are not good enough. 41 00:05:11,350 --> 00:05:18,820 When people ask me questions, I can't answer them accurately enough with the risk scores that I have at my disposal. 42 00:05:19,660 --> 00:05:28,330 And there are ways, which I'll come to later, of assessing the accuracy of these risk scores.
43 00:05:28,780 --> 00:05:34,929 But really, there is a tried and tested method for evaluating risk prediction, 44 00:05:34,930 --> 00:05:39,399 for evaluating risk scores, in the same way as we evaluate new treatments. 45 00:05:39,400 --> 00:05:44,260 And at the moment, there are many, many of them in heart failure that are not working very well. 46 00:05:46,000 --> 00:06:01,330 So my first patient, Imran, was 43 years old when I met him, and he had a relatively rare condition, amyloidosis, 47 00:06:02,080 --> 00:06:10,210 which you could say is an incurable condition; it affects lots of different organs. 48 00:06:11,080 --> 00:06:16,719 But in his case, it was affecting his heart very badly. A previously 49 00:06:16,720 --> 00:06:24,370 well man who used to be a very vigorous swimmer, he had fluid around his heart, 50 00:06:24,850 --> 00:06:33,790 which meant that he couldn't walk far. It wasn't really affecting the other organs at that stage. 51 00:06:35,650 --> 00:06:41,830 He was from the south coast, and he was transferred to us at Barts. 52 00:06:42,730 --> 00:06:53,080 He had (you don't need to know all of these) a host of scans and various pre-transplant evaluations. 53 00:06:53,080 --> 00:07:02,020 What happens is that you have these tests and investigations to see whether you would survive a transplant 54 00:07:02,220 --> 00:07:06,280 and how we can prioritise you for transplant. But also, this underlying condition, 55 00:07:06,280 --> 00:07:16,900 amyloidosis, has a treatment regime that affects the immune system, and that was stunted because, in this period, 56 00:07:17,020 --> 00:07:21,160 while he was being investigated, his heart got worse. I'll come back to this. 57 00:07:21,200 --> 00:07:33,370 Ejection fraction is the term we use for what proportion of the blood in the heart's main pumping chamber is pumped out with each beat; it should be 55 to 60%.
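Ejection fraction is conventionally calculated per beat from the left ventricle's end-diastolic volume (EDV, when it is fullest) and end-systolic volume (ESV, after it contracts): EF = (EDV - ESV) / EDV x 100%. A minimal sketch of that arithmetic; the function name and the volumes are illustrative assumptions, not measurements from the patient described:

```python
def ejection_fraction(edv_ml, esv_ml):
    """Left-ventricular ejection fraction (%) from end-diastolic and end-systolic volumes."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    stroke_volume_ml = edv_ml - esv_ml  # blood ejected in a single beat
    return 100.0 * stroke_volume_ml / edv_ml


# Illustrative volumes only, not measurements from the patient in the talk.
print(round(ejection_fraction(120, 50), 1))   # 58.3: within the normal range quoted
print(round(ejection_fraction(200, 190), 1))  # 5.0: severely impaired
```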
58 00:07:34,030 --> 00:07:44,770 In this gentleman it was 5%. So within six or seven days of seeing me, he was listed for urgent transplant. 59 00:07:45,430 --> 00:07:49,570 And within three or four days of being listed, he had a transplant. 60 00:07:50,830 --> 00:07:54,310 So he was pretty damn lucky, because, 61 00:07:55,400 --> 00:08:02,540 as you'll see from this European "premier league" of heart transplantation, in the UK 62 00:08:03,050 --> 00:08:10,070 we're not doing that well in terms of the rate of transplants per million population. 63 00:08:10,970 --> 00:08:17,690 And that's been the case for several years. Moreover, if you look at different organs, 64 00:08:18,230 --> 00:08:29,020 this is the rate of transplantation of different organs from people who have been certified as brainstem dead: 65 00:08:30,310 --> 00:08:35,270 the proportion of organs that are available and eligible to be used that actually get used. 66 00:08:35,750 --> 00:08:41,090 At the top there are kidneys and livers, at 80-plus percent. 67 00:08:41,870 --> 00:08:48,710 But for heart transplantation, we're using less than a fifth of those organs. 68 00:08:49,520 --> 00:08:54,800 That's just to give you a flavour of how small the pool of people who can get an organ is. 69 00:08:55,160 --> 00:09:00,680 There are problems with the organs, and with the pipeline for getting the organs in the first place. 70 00:09:01,040 --> 00:09:04,730 And so Imran did incredibly well to get an organ. 71 00:09:05,690 --> 00:09:09,650 And not much has been changing. This is old data from when I was a registrar in Birmingham, 72 00:09:09,830 --> 00:09:17,180 which is a big transplant unit, but it's still about 250 to 300 transplants per year, 73 00:09:17,180 --> 00:09:27,020 and that's not growing. So we've got to get really good at picking the horses to back: who we can give the heart to.
74 00:09:28,010 --> 00:09:34,990 And so there's a waiting list on that side, and post-operative care and survival on this side. 75 00:09:35,390 --> 00:09:42,920 We do our best to predict who's going to survive on the waiting list, because people might be on it for up to a year or 18 months. 76 00:09:44,330 --> 00:09:49,069 And we want to back the people who are going to survive as long as possible. 77 00:09:49,070 --> 00:09:52,940 There are various factors: patient-level and clinical factors, 78 00:09:53,270 --> 00:09:55,280 and system-level factors, at each stage. 79 00:09:56,300 --> 00:10:08,540 And we have various risk scores, developed and derived mostly from US databases but also from some European ones, 80 00:10:08,780 --> 00:10:16,700 that are generally used in the research, but they use quite a small number of features. 81 00:10:17,270 --> 00:10:22,130 They don't personalise those risk factors to the individual, and they're not very good. 82 00:10:23,930 --> 00:10:35,839 So this is some work that we did with colleagues at the Alan Turing Institute to look at whether we could do better using machine learning 83 00:10:35,840 --> 00:10:42,350 to predict how people do after heart transplantation. 84 00:10:43,550 --> 00:10:51,230 Do we have data like that in the UK? The answer is that, with that number of transplants per year, over the last ten years of registries 85 00:10:51,230 --> 00:11:00,500 it's not a big dataset and it's not well curated, so you're lucky if you can get three or four thousand people in it. 86 00:11:00,830 --> 00:11:06,709 So we looked at the US, where there's publicly available data: you just send an email and get access. 87 00:11:06,710 --> 00:11:11,390 It's wonderful, and that's for whatever organ transplantation you want to look at. 88 00:11:11,840 --> 00:11:15,049 So this is open access.
89 00:11:15,050 --> 00:11:24,950 We looked at transplants between these dates, and there were 60,000 patients who had transplants. 90 00:11:25,400 --> 00:11:29,330 There were 36,000 people who were on the waiting list. 91 00:11:30,530 --> 00:11:34,790 And this is the number of people who were followed up. 92 00:11:37,370 --> 00:11:43,849 So, just focus on the bold: these are all 93 00:11:43,850 --> 00:11:47,270 the risk scores that I listed, the top three and the bottom three. 94 00:11:47,720 --> 00:11:53,480 And the numbers never get much above 0.6 for pre-transplant survival. 95 00:11:53,630 --> 00:11:57,710 When you get to post-transplant survival, you're in tossing-a-coin territory: 96 00:11:58,610 --> 00:12:07,010 an area under the curve of 0.5 to 0.6 for these scores, predicting survival even at three months. 97 00:12:07,400 --> 00:12:12,080 And I'm trying to see whether this person can live for ten years. 98 00:12:14,330 --> 00:12:17,510 So what is machine learning? I'm sure you know this. 99 00:12:17,510 --> 00:12:20,840 It's a branch of artificial intelligence. 100 00:12:21,910 --> 00:12:29,059 And in the cardiology literature, there's been a ballooning of the use of the term artificial intelligence. 101 00:12:29,060 --> 00:12:37,400 There are lots of people joking about whether the intelligence of the people writing about it is actually artificial or not. 102 00:12:37,840 --> 00:12:49,640 It does borrow from statistical models, and there are various algorithms that are off the shelf. 103 00:12:50,150 --> 00:12:55,400 There are also people who are trying to marry up different algorithms to see if they do better.
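The "area under the curve" figures quoted here are C statistics. For a yes/no outcome at a fixed horizon (such as death by three months), the C statistic equals the ROC AUC: the probability that a patient who had the event was given a higher risk score than one who did not. 0.5 is a coin toss and 1.0 is perfect ranking. A minimal sketch, assuming a binary outcome with no censoring (survival analyses usually use Harrell's C, which also handles censored follow-up); the function name and scores are hypothetical:

```python
from itertools import product


def c_statistic(risk_scores, outcomes):
    """Concordance (C) statistic for a binary outcome, equal to the ROC AUC.

    The proportion of (event, non-event) patient pairs in which the patient who
    had the event was given the higher risk score; tied scores count as half.
    """
    events = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical scores for four patients; 1 = died within the horizon.
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
print(c_statistic([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.5: no better than a coin toss
```

Counting every pair is quadratic in the number of patients, which is fine for illustration; rank-based formulas give the same value faster on large registries.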
104 00:12:55,620 --> 00:13:02,540 Mihaela van der Schaar, who was our computer scientist on this piece of work, 105 00:13:03,830 --> 00:13:12,950 was trying to cluster features together and personalise the risk prediction. Traditional risk scores look at 106 00:13:12,950 --> 00:13:20,779 population characteristics; she wanted to see whether the regression could be done in clusters of individual characteristics, to try and personalise the prediction. 107 00:13:20,780 --> 00:13:28,360 She calls this model Trees of Predictors. So, in your PICO format, this was my PICO: 108 00:13:28,910 --> 00:13:34,489 we're looking at the US databases and at this new algorithm. Does it do better than off-the-shelf 109 00:13:34,490 --> 00:13:40,160 machine learning tools and regression at predicting survival after heart transplantation? 110 00:13:41,090 --> 00:13:46,820 The scores that I showed you earlier all have only a few component features. 111 00:13:47,120 --> 00:13:56,180 One of them has seven; others have 13 or 14. They range from ejection fraction, which I mentioned earlier, to things like age or blood pressure. 112 00:13:58,040 --> 00:14:02,479 And there's this idea that permeates the machine learning literature that bigger is better, 113 00:14:02,480 --> 00:14:05,959 that more is better: the more features, the better. 114 00:14:05,960 --> 00:14:16,910 So we used everything that we could in this database, 15 features, and I won't go into the details, because the computer scientists know much more about this. 115 00:14:17,300 --> 00:14:26,420 But in short, the algorithm splits patients on the basis of various features at each level of 116 00:14:26,420 --> 00:14:31,550 this tree, and sees whether, with a limited number of features at each level, 117 00:14:31,790 --> 00:14:36,530 you can predict better than by using the same algorithm, the same risk prediction tool, for everybody.
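The Trees of Predictors method itself is considerably more involved than can be shown here. Its underlying intuition can be illustrated with a deliberately tiny, hypothetical toy: when a feature relates to the outcome in opposite directions in two subgroups, one pooled regression averages the effect away, while a separate model per subgroup (a "tree" with a single split and a logistic regression in each leaf) recovers it. Everything below, from the data to the function names, is an illustrative assumption in plain Python, not the published algorithm:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(event | x) = sigmoid(a + b*x) by batch gradient descent on log-loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y
            grad_a += err / n
            grad_b += err * x / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def mean_log_loss(model, xs, ys):
    """Average negative log-likelihood of the outcomes under the model."""
    a, b = model
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(a + b * x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


# Synthetic toy data (hypothetical): in subgroup "A" a higher feature value goes
# with the event; in subgroup "B" the same feature points the other way.
XS = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
SUBGROUPS = {
    "A": (XS, [0, 0, 0, 1, 0, 1, 1, 1]),
    "B": (XS, [1, 1, 1, 0, 1, 0, 0, 0]),
}


def compare_pooled_vs_split():
    """Return (pooled_loss, split_loss, per_group_models)."""
    all_x = [x for xs, _ in SUBGROUPS.values() for x in xs]
    all_y = [y for _, ys in SUBGROUPS.values() for y in ys]
    pooled = fit_logistic(all_x, all_y)
    pooled_loss = mean_log_loss(pooled, all_x, all_y)

    # One model per subgroup: a "tree" with a single split and a leaf model each.
    per_group = {g: fit_logistic(xs, ys) for g, (xs, ys) in SUBGROUPS.items()}
    split_loss = sum(
        mean_log_loss(per_group[g], xs, ys) * len(xs) for g, (xs, ys) in SUBGROUPS.items()
    ) / len(all_x)
    return pooled_loss, split_loss, per_group


pooled_loss, split_loss, models = compare_pooled_vs_split()
print(f"pooled model log-loss: {pooled_loss:.3f}")
print(f"per-subgroup log-loss: {split_loss:.3f}")
```

Because each feature value appears once with and once without the event in the pooled data, the pooled model stays at a log-loss of ln 2 (about 0.693, no better than guessing), while the per-subgroup models fit clearly better. On real data, any such gain has to be confirmed on held-out and external cohorts.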
118 00:14:38,970 --> 00:14:45,090 And what we found, in the red box, was that all of the numbers got better. 119 00:14:46,740 --> 00:14:50,129 Post-transplant survival improved less, 120 00:14:50,130 --> 00:14:55,530 so those are still below 0.7. But at the top, particularly in the short term, they were good. 121 00:14:56,130 --> 00:15:03,900 At ten years, you're in 0.76 territory, which is not that much better than existing machine learning, 122 00:15:03,900 --> 00:15:07,460 but significantly better, for what it's worth. 123 00:15:07,470 --> 00:15:16,110 And definitely better than the traditional risk scores. So we improved survival prediction. 124 00:15:16,110 --> 00:15:24,929 This got a bit of coverage in the press because of this idea of personalising risk prediction; it was precision medicine applied to risk 125 00:15:24,930 --> 00:15:33,870 prediction, and it also showed that you can improve risk prediction over various time horizons. But it's very far from perfect. 126 00:15:34,770 --> 00:15:38,520 That was looking in the rear-view mirror; we need to do things prospectively. 127 00:15:38,850 --> 00:15:46,530 We need to do a forward-looking study with real patients, and we need to validate in different populations, 128 00:15:46,530 --> 00:15:53,129 possibly in a trial setting, and check whether patients and clinicians actually want to use this. 129 00:15:53,130 --> 00:15:54,870 So that's something that we're doing at the moment. 130 00:15:55,800 --> 00:16:03,330 We've got an online tool that transplant cardiologists can use with their patients, and we're gathering data on that. 131 00:16:04,560 --> 00:16:08,310 So that was the first problem: risk prediction at the end stage of heart failure. 132 00:16:09,180 --> 00:16:12,510 This is Richard, the next patient, 61 years old. 133 00:16:14,070 --> 00:16:19,580 He had a heart attack eight years ago and has had heart failure since. His ejection fraction
134 00:16:19,630 --> 00:16:27,690 is shown here; it's moderately reduced, and he is breathless going up two flights of stairs. 135 00:16:28,710 --> 00:16:32,040 He avidly looks things up on the Internet. 136 00:16:32,070 --> 00:16:42,450 He's one of those people who bring sheets of paper to consultations, and he has found this risk calculator that says it uses machine learning. 137 00:16:42,780 --> 00:16:47,100 It says he's got less than a 20% chance of surviving the next ten years. 138 00:16:48,540 --> 00:16:52,320 So his question is: will he see his grandson graduate, 139 00:16:53,130 --> 00:17:02,700 which is 15 years from now? I don't know anything about his grandson's academic ability; I'm just interested in the timing. 140 00:17:03,600 --> 00:17:13,580 And so, as the title asks: is this machine learning any good in heart failure? 141 00:17:14,580 --> 00:17:17,550 So this is a large study from Sweden. 142 00:17:17,610 --> 00:17:28,530 The Scandinavians collect and analyse data better than us, and they do it for the whole of the country. 143 00:17:29,940 --> 00:17:34,500 In Sweden, they looked at nearly 50,000 heart failure patients, 144 00:17:34,920 --> 00:17:42,240 and they were looking at whether you could predict mortality using traditional methods and using machine learning. 145 00:17:44,520 --> 00:17:51,480 In short, what we're taught at medical school and in cardiology training is that ejection fraction, how well your heart pumps, 146 00:17:51,750 --> 00:17:57,570 is the best indicator of whether you're going to survive, or how long you're going to survive. 147 00:17:57,840 --> 00:18:07,500 In actual fact, it was really tossing a coin; it wasn't very good at all at predicting in a national dataset in Sweden. 148 00:18:08,020 --> 00:18:13,530 But they used over 40 variables in machine learning.
149 00:18:13,530 --> 00:18:18,630 With those, they managed to increase the C statistic, but this was, again, a retrospective study. 150 00:18:21,410 --> 00:18:24,840 There's a lot going on in the literature about machine learning. 151 00:18:24,860 --> 00:18:29,180 I'm only going to tell you about the cardiovascular space, and about three diseases, heart failure, 152 00:18:29,510 --> 00:18:34,670 acute coronary syndromes and atrial fibrillation, which are common and often overlap. 153 00:18:35,990 --> 00:18:45,170 And we're constantly thinking that we can do better at the definition, better at the diagnosis, and therefore better at prevention. 154 00:18:46,010 --> 00:18:49,530 And so machine learning has been used a lot. 155 00:18:49,580 --> 00:18:54,440 There are lots of papers being written about it, but nothing is being used in actual clinical practice. 156 00:18:57,080 --> 00:19:03,860 So we did a systematic review. The problem here is that I don't know how to tell this patient how good this machine learning 157 00:19:03,860 --> 00:19:09,320 algorithm is, or how good the literature about machine learning for risk prediction is. 158 00:19:09,470 --> 00:19:11,240 So we've done this. 159 00:19:11,750 --> 00:19:20,390 We've looked at the last 18 years or so, at any study looking at risk prediction with machine learning, or at clustering. 160 00:19:20,880 --> 00:19:29,030 And we've also done a scoping review of non-cardiovascular disease, to see where heart disease stands compared with other diseases. 161 00:19:30,890 --> 00:19:35,510 After filtering, we found about 70 articles. 162 00:19:37,280 --> 00:19:47,990 I should give credit to Nick Cheng, a postdoc in data science in my group, who has been doing most of this work. Clustering,
163 00:19:48,770 --> 00:19:55,850 for those of you who may not know, is looking at whether the data actually cluster: 164 00:19:55,940 --> 00:20:02,870 if you took 50,000 heart failure patients, would they naturally settle into different clusters? 165 00:20:03,980 --> 00:20:08,780 There are three papers that we found that look at clustering. 166 00:20:09,650 --> 00:20:12,830 The mean size is about 2,000 patients. 167 00:20:13,730 --> 00:20:17,810 Some are very small; the largest was the Swedish study that I just showed you. 168 00:20:19,400 --> 00:20:23,360 Most of them are in heart failure, some in the other diseases, and most of them look at a single disease. 169 00:20:23,720 --> 00:20:32,090 Almost two thirds of them are from North America and have fewer than 1,000 individuals. 170 00:20:33,350 --> 00:20:41,570 People talk about machine learning using lots and lots of variables, but actually it's about 26 variables at the moment in the UK. 171 00:20:43,880 --> 00:20:49,910 A variety of methods are used, but only three or four are used commonly. 172 00:20:50,270 --> 00:20:59,930 Most do not compare several methods like we did; they use one machine learning method. And clustering is uniformly positive. 173 00:21:00,530 --> 00:21:08,520 Nobody's ever written a paper that says, "We didn't find any clusters." Whether they found two, three or four clusters, the result is positive. 174 00:21:09,030 --> 00:21:16,229 So it's always positive, and they don't always validate their findings. In the risk prediction literature, 175 00:21:16,230 --> 00:21:21,870 meaning machine learning to improve risk prediction, like the heart transplantation example, 176 00:21:22,830 --> 00:21:29,790 there are a few studies, some very small, again mostly from North America. 177 00:21:31,470 --> 00:21:34,860 People are using many more covariates, over 100. 178 00:21:35,520 --> 00:21:38,820 They're perhaps using more sophisticated machine learning approaches.
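Part of the reason clustering papers are "uniformly positive" is built into the methods: an algorithm such as k-means partitions whatever data it is given into exactly the number of clusters requested, whether or not any real grouping exists. A minimal sketch, assuming plain Lloyd's k-means on one-dimensional, evenly spaced (structureless) values; the talk does not say which algorithms the reviewed papers used, and the function name is hypothetical:

```python
def kmeans_1d(points, k, max_iter=100):
    """Plain Lloyd's k-means on one-dimensional data.

    Returns (centroids, labels), with labels aligned to sorted(points).
    """
    pts = sorted(points)
    n = len(pts)
    # Deterministic start: k data points spread evenly through the sorted values.
    centroids = [float(pts[i * n // k]) for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (ties -> lower index).
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in pts]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Evenly spaced values: there is no natural grouping in these data at all.
uniform = list(range(100))
for k in (2, 3, 4, 5):
    _, labels = kmeans_1d(uniform, k)
    print(f"asked for {k} clusters, got {len(set(labels))}")
```

Every run reports exactly as many clusters as were requested, even though the data have no natural groups. That is why a clustering result needs checks of its own, such as stability across resamples and replication in an independent cohort, before it is treated as a finding.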
179 00:21:40,320 --> 00:21:44,550 But again, all of them are showing that machine learning improves risk prediction. 180 00:21:45,000 --> 00:21:49,440 So what can we say? We've got a focus on North America. 181 00:21:50,130 --> 00:22:00,540 We've got some very small studies. We've also got a lack of validation, and we've got probable publication bias. 182 00:22:01,170 --> 00:22:09,570 And in our scoping review, it looks like whether you're looking at clustering in rheumatoid arthritis or clustering in chronic lung disease, 183 00:22:09,810 --> 00:22:19,460 the same problems are happening in those literatures as well. My third patient is Francesca, 57. 184 00:22:20,360 --> 00:22:27,140 She has had hypertension for donkey's years, and she has recently been diagnosed with diabetes. 185 00:22:28,010 --> 00:22:35,510 She has a friend who recently became breathless and has been diagnosed with heart failure. 186 00:22:36,320 --> 00:22:46,460 "What's my chance of having heart failure?" is her question, because she's had hypertension for ages, 15 years plus, and she's diabetic and she's 67. 187 00:22:48,890 --> 00:22:53,030 So primary prevention in heart failure is really difficult. 188 00:22:53,870 --> 00:22:58,190 There's no consensus guideline about primary prevention in heart failure. 189 00:22:59,360 --> 00:23:01,849 There's a guideline on primary prevention of cardiovascular disease, 190 00:23:01,850 --> 00:23:11,419 which has a couple of lines about heart failure: stopping smoking, a couple of drugs, and treating blood pressure. 191 00:23:11,420 --> 00:23:17,150 But we haven't made much progress in primary prevention, in stopping it happening in the first place. 192 00:23:17,630 --> 00:23:29,150 The European Society of Cardiology has a table drawn up by a group of august people, predominantly men, in the US, 193 00:23:29,360 --> 00:23:37,370 who sit around a table and decide that these are the 89 factors that are aetiologically associated with heart failure.
194 00:23:37,640 --> 00:23:43,460 They have level one, which is in yellow: diseased myocardium, 195 00:23:43,640 --> 00:23:47,270 abnormal loading conditions. Then level two, level three, level four. 196 00:23:47,480 --> 00:23:55,850 So you've got everything there, as I said earlier, from hypertension to cobalt, from pregnancy to amyloidosis. 197 00:23:56,210 --> 00:24:05,620 So lots of causes, 89 in all. What we realised is that no study has ever looked at all of these causes in a systematic way. 198 00:24:05,630 --> 00:24:09,290 People have said, let's look at heart failure caused by MI or not, 199 00:24:09,410 --> 00:24:12,440 or let's look at blood pressure-related heart failure or not. 200 00:24:12,570 --> 00:24:16,280 Nobody has looked at all of these together, and nor would they have been able to. 201 00:24:16,730 --> 00:24:20,250 But with an electronic health record, one of the things that can be done is that 202 00:24:20,600 --> 00:24:27,110 you might be able to look at all the incident heart failure patients, 170,000 of them in CPRD, 203 00:24:28,430 --> 00:24:32,780 which is a primary care database that I'm sure most of you have seen. 204 00:24:33,020 --> 00:24:38,750 It's linked between primary and secondary care, and we asked whether we could look at all of these risk factors. 205 00:24:39,860 --> 00:24:53,240 So we defined each one of those 89 causes or aetiological factors in the electronic health record, and we did that in 170,000 patients. 206 00:24:55,150 --> 00:25:01,510 As I say, we were using CPRD linked with HES data, Hospital Episode Statistics: 207 00:25:01,510 --> 00:25:10,540 Read codes from the GP and ICD codes from the hospital statistics, and that's the number of codes. 208 00:25:10,780 --> 00:25:18,850 It's a reasonable piece of work just to get to the stage where you can make these codes mean something. 209 00:25:19,420 --> 00:25:22,460 And then we went through each of those levels: level one, level two, and so on.
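Turning raw codes into the 89 causes amounts to a lookup: each cause gets a code list, and a patient is flagged for that cause if a matching code appears within a lookback window before the heart failure diagnosis (the talk uses five years). A minimal sketch of that step; the code lists are a hypothetical, heavily simplified stand-in using a few ICD-10 category prefixes, whereas real CPRD and HES phenotypes use curated Read and ICD-10 code sets:

```python
from datetime import date, timedelta

# Hypothetical, heavily simplified code lists. Real phenotypes map each of the 89
# causes to curated Read (primary care) and ICD-10 (hospital) code sets; only a
# few ICD-10 category prefixes are shown here for illustration.
CAUSE_CODE_PREFIXES = {
    "hypertension": ("I10",),
    "ischaemic heart disease": ("I20", "I21", "I22", "I23", "I24", "I25"),
    "type 2 diabetes": ("E11",),
    "obesity": ("E66",),
}


def causes_before_index(records, index_date, lookback_years=5):
    """Causes with at least one matching code in the lookback window.

    records: iterable of (code, event_date) pairs for one patient.
    index_date: date of the first heart failure diagnosis. Codes on or after
    that date are ignored, as are codes older than the lookback window.
    """
    window_start = index_date - timedelta(days=round(365.25 * lookback_years))
    found = set()
    for code, event_date in records:
        if not window_start <= event_date < index_date:
            continue
        for cause, prefixes in CAUSE_CODE_PREFIXES.items():
            if code.startswith(prefixes):  # str.startswith accepts a tuple
                found.add(cause)
    return found


heart_failure_date = date(2015, 6, 1)
patient_records = [
    ("I10", date(2012, 3, 1)),      # inside the window: counts
    ("I21.9", date(2014, 11, 20)),  # inside the window: counts
    ("E11.9", date(2005, 1, 1)),    # more than five years earlier: ignored
]
print(sorted(causes_before_index(patient_records, heart_failure_date)))
# ['hypertension', 'ischaemic heart disease']
```

Excluding codes recorded on the diagnosis date itself is a design choice assumed here; whether same-day codes count changes which factors are attributed as causes.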
210 00:25:22,480 --> 00:25:30,490 As I said, we looked at whether somebody, in the five years before their heart failure, had a code that matched. 211 00:25:30,910 --> 00:25:34,569 So, for example, the commonest cause is diseased myocardium: 212 00:25:34,570 --> 00:25:41,680 ischaemic heart disease is probably the commonest reason for having heart failure, or having an association with heart failure. 213 00:25:42,770 --> 00:25:52,780 And we've looked at each of those levels. It's hard to do this in a traditional dataset without bigger data about causes. 214 00:25:54,790 --> 00:26:09,640 What you end up with is a situation where you can present all of the causes ever recorded in the life history of people with heart failure, 215 00:26:09,910 --> 00:26:14,790 and see, for example, that you've got hypertension here. 216 00:26:15,940 --> 00:26:20,050 You've got coronary disease, obesity, cancer, diabetes. 217 00:26:20,470 --> 00:26:30,310 So it's mostly the things that we'd expect, but there are also some rarer things; this one, for example, covers pacemaker-related 218 00:26:32,640 --> 00:26:42,270 issues that are linked to heart failure. And then we can look at these, adjusted for age and sex. 219 00:26:42,480 --> 00:26:46,590 We can look at whether the causal factor is related to the prognosis. 220 00:26:46,830 --> 00:26:48,840 This is at five years, but you can look at ten years. 221 00:26:49,380 --> 00:26:56,250 And people who have heart failure but don't have any recording of any of those risk factors have much better survival. 222 00:26:56,280 --> 00:27:03,210 Unsurprising, you might say, but you can look at this as a clinician who might be focusing on 223 00:27:03,780 --> 00:27:07,320 how to tell this lady what her risk of developing heart failure is. 224 00:27:07,620 --> 00:27:18,240 This might push us further along.
The problem is, as I told you, that we know of relatively few treatments for primary prevention of heart failure. 225 00:27:18,630 --> 00:27:23,010 What we've got data for is treating heart attacks and treating blood pressure. 226 00:27:23,700 --> 00:27:30,390 And there's also a group of drugs that have just come through the pipeline, called SGLT2 inhibitors, 227 00:27:30,660 --> 00:27:35,400 which are for treating diabetes but also prevent heart failure to some extent. 228 00:27:36,300 --> 00:27:39,900 So if you look at the proportion of people who have those three conditions, 229 00:27:39,900 --> 00:27:48,570 it's actually quite large among people who have one, two, three, four or five risk factors out of those 89. 230 00:27:48,660 --> 00:27:58,650 So there's potentially a big chunk of people, even with just those three treatable conditions, who might benefit from better treatment. 231 00:27:59,700 --> 00:28:09,030 So we think this is the first population-based study to look across all the causal factors for heart failure. 232 00:28:10,350 --> 00:28:23,669 The commonest risk factor is diseased myocardium, and they're mostly things that we expected, plus cancer, obesity and anaemia. At the individual and population level, 233 00:28:23,670 --> 00:28:30,060 we probably should be looking at better treatment of hypertension and diabetes. 234 00:28:31,950 --> 00:28:42,420 It would be wrong of me to come to Oxford and not talk about evidence-based medicine, and about Sackett's famous definition. 235 00:28:44,340 --> 00:28:53,330 I always go back to this: patient values, relevant scientific evidence, clinical judgement. 236 00:28:55,980 --> 00:29:03,360 And I also always remember the story of the blind men and the elephant, which you'll be well aware of. 237 00:29:03,360 --> 00:29:07,650 There are lots of countries that claim it. 238 00:29:07,680 --> 00:29:12,329 My ancestors in India say that it comes from India; the Chinese say it comes from China.
239 00:29:12,330 --> 00:29:16,050 But the story is the same. 240 00:29:16,830 --> 00:29:25,380 The three blind men are introduced to different parts of the elephant's anatomy and have different ideas about what they're actually discovering. 241 00:29:26,160 --> 00:29:34,500 And the truth is that in evidence-based medicine we're still not at a place where people are looking at where those circles overlap. 242 00:29:34,950 --> 00:29:45,030 Most people are neglecting patient values and preferences, and there's probably much more of this going on. 243 00:29:46,230 --> 00:29:52,590 That has definitely been an issue in my experience of evidence-based practice. 244 00:29:53,310 --> 00:29:59,850 And that's why these are patient-based scenarios, genuine ones from my practice, 245 00:29:59,850 --> 00:30:03,840 and most of the literature is not designed to answer their questions. 246 00:30:06,810 --> 00:30:14,520 Excuse me. This is a newfangled lexicon from the big data era: the learning health system. 247 00:30:16,050 --> 00:30:21,000 It's always three things; here it's science, evidence and care. 248 00:30:21,090 --> 00:30:24,720 This came from the Institute of Medicine report in the US in 2006, 249 00:30:25,380 --> 00:30:35,070 where the big emphasis was on waste: most research doesn't reach guidelines or the evidence base. 250 00:30:35,460 --> 00:30:40,080 Most of the evidence doesn't reach clinicians like me, and it doesn't get applied by health systems. 251 00:30:40,500 --> 00:30:48,180 And so the care is suboptimal, and it leads to poor outcomes and probably patient safety issues and maybe death. 252 00:30:48,810 --> 00:30:52,620 So if you can have data circulating, 253 00:30:53,160 --> 00:31:01,080 then the system as a whole can learn, and science has to be more informed by data coming back from the clinical space.
254 00:31:01,860 --> 00:31:10,500 So this is a model that's not replacing evidence-based health care at all, just thinking about it and phrasing it in a slightly different way. 255 00:31:11,190 --> 00:31:14,610 And again, the patient is in the middle. 256 00:31:16,840 --> 00:31:23,770 But again, there's a problem in the data science space, which is where I am today: we're looking at different parts of the elephant. 257 00:31:24,010 --> 00:31:29,589 At the moment, you've got the informatics people who are pushing the science and what we do with 258 00:31:29,590 --> 00:31:34,480 electronic health records and the latest newfangled machine learning methods. 259 00:31:35,080 --> 00:31:40,270 You've got people who are interested in producing guidelines, and you've got a different 260 00:31:40,270 --> 00:31:45,010 agenda around how we use it in the health care setting and actually provide care. 261 00:31:47,780 --> 00:31:53,989 The bigger issue still, I think, is that there are these three different paradigms, 262 00:31:53,990 --> 00:31:58,640 these three different frameworks, which people often seem to think are competing. 263 00:31:59,150 --> 00:32:03,710 So, you know, are you an evidence-based person or an informatics person? 264 00:32:04,220 --> 00:32:10,090 Are you a quality improvement person? Or are you a big data person? 265 00:32:10,100 --> 00:32:21,350 At the end of the day, the focus should be: are we doing this to make things better for patients? 266 00:32:21,770 --> 00:32:31,100 Full stop. And that applies to big data, machine learning, AI, precision medicine and so on. 267 00:32:33,210 --> 00:32:43,110 This is something one of my colleagues in my department, Rob Aldridge, is writing a lot about. 268 00:32:44,310 --> 00:32:50,840 Public health is also not to be left behind in the big data era.
269 00:32:50,850 --> 00:32:53,940 It's not just the people who are looking at genes and proteins. 270 00:32:54,540 --> 00:33:01,740 It's also the public health people, who are going to team up with the computer scientists, the epidemiologists and the biostatisticians 271 00:33:02,220 --> 00:33:10,830 to make sure we use the data better across disciplines to understand and prevent disease, 272 00:33:11,160 --> 00:33:14,610 and to do that at the population level. 273 00:33:17,790 --> 00:33:25,460 And the difference that I want to put to you now is that we are moving to an era where, you know, 274 00:33:25,620 --> 00:33:34,260 the model of one person or a set of investigators having their own cohort of people with heart failure is going to change quite soon. 275 00:33:35,250 --> 00:33:42,270 We should be using our routine data much better at the individual, patient level. 276 00:33:42,270 --> 00:33:48,990 In general practice we've been far ahead in this country, and hospitals are slowly catching up. 277 00:33:49,350 --> 00:33:53,940 But partly where we've gone wrong is that elephant issue. 278 00:33:54,790 --> 00:34:00,480 But even in hospitals, we have been thinking about how to make the data actually improve things for patients. 279 00:34:00,510 --> 00:34:08,040 And people are thinking about this much more at scale, and doing it across the clinical and research spaces. 280 00:34:08,970 --> 00:34:15,640 But the issue is that the data is still a problem. 281 00:34:17,170 --> 00:34:23,620 The people who are likely to use new technologies, or have new technologies, 282 00:34:23,620 --> 00:34:29,949 new computer science and machine learning or whatever, used on them are the people who are 283 00:34:29,950 --> 00:34:36,700 socioeconomically better off and the people in higher-income countries. That's the data divide. 284 00:34:38,080 --> 00:34:49,420 You may know this slide. This is genomic studies to date, as of 2016, and it's not that much better today, in 2019.
285 00:34:49,900 --> 00:34:57,160 In 2009, 4% of the people in genomic studies worldwide were non-European. 286 00:35:00,080 --> 00:35:10,310 In 2016, 19% were non-European. Yet four-fifths of the world's population is non-European. 287 00:35:10,940 --> 00:35:22,040 So should I be excited about what precision medicine is going to do for people when the data doesn't represent the population? That's the data divide. 288 00:35:22,400 --> 00:35:26,230 Who owns the data? Is it publicly owned? Is it commercially owned? 289 00:35:27,230 --> 00:35:32,000 What's the quality of the data? And what are the standards for its collection and its use? 290 00:35:32,090 --> 00:35:37,640 So before you get to making heart failure treatment better for patients, 291 00:35:38,390 --> 00:35:45,890 there's this backdrop we need to deal with before we say machine learning is going to tell us more about heart failure. 292 00:35:45,900 --> 00:35:58,100 It's the same problem as in standard statistics and epidemiology: is the population representative of the patient sitting in front of you? 293 00:35:59,000 --> 00:36:05,410 Does the research have external validity? So, I'm coming towards the end. 294 00:36:08,820 --> 00:36:19,140 There's no framework, no data and no digital tech that is bigger than the patient. That's the learning, in the end. 295 00:36:19,800 --> 00:36:26,220 And even now, people are evangelical about different parts of the elephant. 296 00:36:26,680 --> 00:36:33,780 We must remember the patient. And that's where the exciting opportunities are in big data in heart failure. 297 00:36:34,200 --> 00:36:42,390 I've shown you gaps in machine learning, in my own work and in the way we're looking at causes and risk prediction. 298 00:36:42,790 --> 00:36:51,150 We need to be scientific and we need to talk to each other, and I'll leave you with a quote from Yuval Harari. 299 00:36:51,930 --> 00:36:59,220 This is from Homo Deus, and he's big on data science, actually.
300 00:36:59,580 --> 00:37:07,770 He's saying that in this century, the last train ever to leave the station called Homo sapiens is departing. 301 00:37:08,130 --> 00:37:11,640 Those who miss the train will never get a second chance. In order to get a seat on it, 302 00:37:11,670 --> 00:37:19,710 you need to understand 21st-century technology, and in particular the powers of biotechnology and computer algorithms. 303 00:37:21,030 --> 00:37:26,939 And he does actually talk about how there's this data divide, and how everybody 304 00:37:26,940 --> 00:37:32,790 needs to understand the algorithms and the science that are being pushed. 305 00:37:33,960 --> 00:37:39,420 We all need to make sure that there is real legitimacy in what comes from that data. 306 00:37:40,020 --> 00:37:43,800 And so I commend that book to you if you haven't read it. 307 00:37:45,000 --> 00:37:46,860 And I'll stop there.