I think the kindest way of describing me in the context of this room is very much as a non-expert, at least when it comes to ethics. I am a statistician and a geneticist, and I have worked in the field of using genetics to understand human disease and to identify opportunities for new therapies, or to better predict where people are on their disease trajectory. I've done that for many years, and over the last few years my role has shifted a bit, from purely being someone who was doing research to someone who has been thinking about the kinds of infrastructure and the kinds of communities that you need to build in order to make sure that this growing part of biomedical research is done well at the university level, and translated well, and ethically of course, into practice. Behind that is very much my role as the director of the Big Data Institute, which is one of the recent, though no longer quite the newest, centres to have appeared within the University of Oxford. For those of you who don't know us as a physical thing, we are a new institute in a new building up the hill on the Old Road Campus, and there are somewhere between 250 and 500 people within the institute, depending on how you count, who are united by the desire to use a data-driven approach to understanding the causes of human disease and identifying routes to intervention. So we're entirely dry lab; we're entirely computational, as it were.
And really, what we are about is creating the fuel for AI. Now, we've heard an awful lot about the ethical issues of how you actually use AI in context, and about understanding the regulatory or legal aspects around that, perhaps. What we haven't heard quite so much about is the process by which you acquire the fundamental thing that has to go into that process, which is very much the data itself. Within the institute, just very briefly, there are four types of things that we do. The first is about how we measure things: the measurement technologies. The second is about how we bring all those data together to create the research-ready, analysis-ready datasets that our researchers and others can come into and try to identify the structure that ultimately leads to new insights. Third, we have people from statistics, computer science, engineering, epidemiology, genomics, et cetera, developing methods; that is, if you like, the AI algorithms which are going to peer into these kinds of rich datasets. And then finally, and probably why I'm here, the fourth pillar of what we do in this institute is to think hard about the much wider societal aspects of this data-driven research: issues around consent, privacy, security, governance, data sharing, intellectual property, and so on.
And we made a decision right at the start of putting this institute together that this was something that we wanted to go on actually in the building. It's such an integral part of doing biomedical research these days, and the issues that come out of this kind of research are so deep, that if you don't train people in how they should think about conducting this kind of research, and you don't build the right practices into how people are carrying out their programmes at the point of implementation, then you're starting on the wrong foot. So we very much put that at the heart of the institute; much of Ethox is based within the Big Data Institute. And very recently we got funding from the EPSRC to set up a new centre for doctoral training in health data science, one of the key pillars of that programme being that these data scientists and machine learners and so on would be trained, alongside all the other skills they need, in the skills to think about the problems from that standpoint. So it really, really is central to how we think. And if anyone's interested in use cases, in coming up to us and talking about the types of problems that we're working on and the types of dilemmas that we're faced with, then please do get in touch; we'd be more than happy to talk.
So I just wanted to say a couple of things about my personal perspective on why the types of research that we're doing now, which are very much within the tradition of biomedical or medical research, are a bit different, and why they're raising new challenges from the ethical perspective. I think a really important point to start with, which is perhaps not so well understood, is that the growth of AI and machine learning technologies within biomedical research has really led to something of a shift in how medical research itself is conducted. And this comes back to the question of how we get the data. It used to be that in medical science you had a hypothesis: you decided you wanted to test some particular question. Off the back of that, you designed an experiment. That experiment gave you some data, you analysed the data, and on the back of that you made some conclusions, and maybe you came up with a new hypothesis. Importantly, those data were collected specifically for that purpose. And you took consent: you explained to people why you were going to collect those data and what you hoped to learn. That's a very clean way of doing science, but clearly it's not massively scalable. There's one question that you could ask of those data, and essentially only one. Now, about 10 or 15 years ago, biomedical research took a side step.
It changed direction a bit in how it collected data, and a lot of that came out of the world of genomics, where people had been studying how genes affect diseases. They'd been studying their favourite gene and their favourite disease in a particular combination, and the literature was full of incredibly bad results that never replicated and were massively underpowered. But 15 years ago, what happened was a change in technology. It was changes in technology that led to us being able to do experiments not just on one gene in a handful of individuals, but on the entire genome in tens of thousands of individuals. And that led to the idea that rather than going in with your specific hypothesis, actually the most powerful thing is to go in without a hypothesis. You go in and you just collect data, and you let the data tell you what the answer is. That idea, which is essentially what genome-wide association studies are, has very much percolated from just thinking about studying the whole genome and one disease to the idea that you go in and you collect the genome, and you collect everything that you possibly can about an individual's health, environment, lifestyle, finances; you just collect everything you can. And later on, you decide what your research question is.
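The hypothesis-free scan described here, testing every variant rather than one favourite gene, can be illustrated with a toy sketch. The data below are simulated, and the per-variant chi-squared test is a deliberate simplification of real GWAS pipelines, which typically use regression models with covariates; nothing here refers to an actual cohort.

```python
# Toy sketch of a genome-wide association scan: for each variant,
# test whether genotype counts differ between cases and controls.
# All data are simulated; this is an illustration, not a real pipeline.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n_variants, n_people = 1000, 2000
genotypes = rng.integers(0, 3, size=(n_variants, n_people))  # 0/1/2 allele copies
status = rng.integers(0, 2, size=n_people)                   # 1 = case, 0 = control

def association_p(variant):
    # 2x3 contingency table: case/control status vs. genotype 0/1/2
    table = np.array([[np.sum((status == s) & (variant == g)) for g in range(3)]
                      for s in range(2)])
    return chi2_contingency(table)[1]

p_values = np.array([association_p(g) for g in genotypes])
# With purely random data the p-values are roughly uniform; a genuine signal
# would appear as variants with p far below a multiple-testing threshold
# (e.g. Bonferroni: 0.05 / n_variants).
print(f"smallest p-value across {n_variants} variants: {p_values.min():.4g}")
```

The point of the sketch is the shape of the workflow: no single hypothesis going in, one statistical test per variant, and a correction for the sheer number of questions asked of the same dataset.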
Now, the success of this programme is made real by something like UK Biobank, which many of you will probably know about: somewhere between one and two percent of the adults within the UK have consented to have their entire medical data, their entire genome sequence, and huge amounts of subsidiary information about them, their lifestyle, their cognition, their parents, sometimes their children, made available to people like me, people like you, people in companies, people in China, people in the US. All you have to do is sign up to a very few restrictions on what you're going to do with the data. You have to say roughly what you're going to do with the data; you have to say that you're not going to try to identify these people. But beyond that, there's really not very much that you have to commit to. And as a consequence, there are people all over the world probing the tiniest details, the most intimate information, about half a million people within the UK, some probably indeed within this room. So it's an example of how our way of doing research is really shifting. This shift is exactly what enables the whole AI revolution in medicine and health care. But it, of course, brings up all sorts of questions about what it means to be informed about a research project which has no end; about whether you can ever comprehend the sorts of things that I might learn about you
if I bring together lots of sources of information that you would never have had. And what would you like to know, if I could, for example, predict whether you're going to get a disease in the next 10, 20, or 50 years? There are huge numbers of new challenges arising from this, which we're only, I think, just beginning to scratch the surface of. I shall shut up there.