Welcome to the Centre for Personalised Medicine Podcast, where we explore the promises and pitfalls of personalised medicine and ask questions about the ethical and societal challenges it creates. I'm Rachel Horton and I'm here with Gabby Samuel. In today's episode, we're looking at diversifying genomics, a key aspect of ensuring that the benefits of personalised medicine can be accessed by everyone. We're joined by Dr Faranak Hardcastle, research fellow at the Clinical Ethics, Law and Society Group at Oxford. Faranak has just led a brilliant review aiming to identify key ethical, legal and social challenges in diversifying genomic data. Could you start by talking us through how you got interested in this area of diversifying genomics?

Yes, thanks Rachel. So I'm a sociotechnical researcher, and I explore how technologies and societies shape each other and evolve together, and how we can intervene in this evolution to direct it towards a point where their benefits are equally distributed. I was looking at the question of diversity from an AI angle, because there was a lot of discussion about how a lack of data, or data that embed inequalities, might actually exacerbate existing issues when fed into machine learning algorithms. And there is a similar, quite well-known problem in genomics: there is a lack of diversity in genomic data repositories and biobanks, whose data are heavily skewed towards individuals of European ancestry, so a lot of other ancestral groups are underrepresented in these repositories. There have been ongoing efforts to try to redress this problem, but there are ethical issues around these efforts that we should know about before doing anything. The Clinical Ethics, Law and Society Research Group that I'm part of is exploring similar issues as the field takes shape, so this was of interest to our research group, and when we saw a call for a review on the ethical issues of diversifying genomic data, we got onto it.

Could you explain to us why diversity is so important in genomic datasets?

Sure. We all have approximately 99.9% of our DNA sequence in common, and exploring the 0.1% that varies between individuals can advance our understanding of how genetic factors may contribute to disease, or protection from disease. That's why scientists so often study DNA differences between individuals and groups. Variants that are common in a population are usually unlikely to cause disease, whereas rare variants may contribute to causing disease.
But this may also very much depend on various other factors, like social and environmental factors. Another thing worth noting here is that there's probably more genetic variation within ancestral groups than between them. So, for example, there are more DNA differences between individuals with North African and East African ancestry than between individuals with African and European ancestry. And so if we only study data from individuals of European ancestry, we may not get enough insight into the genetic variation in other ancestral groups.

So it's sort of a matter of having a good enough reference. If you don't have a population well-represented enough to know that something's common, you might think it's rare, and draw too strong a conclusion from that about whether it's causing disease.

Can I ask a question about examples? I know you've always got quite a few really nice examples up your sleeve. When you talk about these biases, in either AI or genetics, could you talk us through some examples of how they could lead to, or have led to, health disparities?

Yes. I can tell you about a study by Harvard researchers on hypertrophic cardiomyopathy. This study initially used data that overrepresented individuals of European ancestry, and the researchers found that genetic variants had been misclassified as disease-causing when they were in fact common in individuals with African ancestry, and so had to be reclassified as benign. As Rachel was saying earlier, when particular groups are underrepresented in the data, it's much more difficult to classify their variants as rare or common, and they end up either being misclassified or being labelled as variants of unknown significance. So this example is just one of those that shows how easy it is to misclassify when we don't have enough data.

So the consequence of it on the ground is people getting actively the wrong diagnosis, or their family being tested for the wrong thing.
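To make the reference-population point concrete, here is a minimal sketch, not taken from the review: the variant name, allele frequencies and threshold are all invented for illustration, and real variant classification weighs far more evidence than frequency alone. It shows how a naive frequency filter flips its call depending on which reference panel it consults:

```python
# Illustrative sketch only: the variant, frequencies and 1% threshold are
# invented. Real classification (e.g. under ACMG guidelines) uses many
# more lines of evidence than population frequency.

# Allele frequency of one hypothetical variant in two reference panels.
panel_frequencies = {
    "european_only_panel": {"VAR_X": 0.0002},   # looks vanishingly rare
    "multi_ancestry_panel": {"VAR_X": 0.0450},  # common among African-ancestry donors
}

COMMON_THRESHOLD = 0.01  # variants above ~1% are usually too common to cause rare disease


def frequency_filter(panel: dict, variant: str) -> str:
    """Naive rare/common call based only on allele frequency in the panel."""
    freq = panel.get(variant, 0.0)
    return "common -> likely benign" if freq >= COMMON_THRESHOLD else "rare -> candidate pathogenic"


for name, panel in panel_frequencies.items():
    print(f"{name}: VAR_X is {frequency_filter(panel, 'VAR_X')}")

# european_only_panel:  rare -> candidate pathogenic   (misleading)
# multi_ancestry_panel: common -> likely benign
# Same variant, opposite call: the reference panel changed, not the biology.
```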
But yeah, that's so interesting. And so is most of this knowledge known? I mean, you conducted this review: what were you aiming to find when you went looking at the types of ethical and social issues?

Yeah, as you say, it's a very well-known problem, and it's been more than a decade now that scientists and clinicians have been calling for more diversity in genomic data. There have been lots of efforts to try to redress this problem; it's just that the scale of the problem is so big that progress is taking a long time. So we were interested in understanding why this wasn't really happening, and that's how we came to understand that diversifying data is itself very challenging from a legal, social and ethical perspective. The review really wanted to understand what these challenges are. So we just wanted to know what the ethical issues are around attempts to diversify genomic data.

You've got to tell us more. What were the glitches that you came across in your review?

Sure. But before I get into anything, I just want to say that the sort of search we did for our review has some limitations. One of the limitations was that most of the papers we reviewed were from North America. Another was that our search mainly focused on underrepresentation based on gender, race and ethnicity, which leaves out other underserved groups such as children, the elderly, psychiatric patients, prisoners and so on. And this speaks to a problem with the attempt to diversify, which is that these categories often don't actually map onto ancestral categories. So that's one of the challenges.

In terms of findings, we found that research practices can sometimes be exclusionary, and this needs to change. One example is approaches to recruitment or data collection that don't consider the cultural setting in which prospective participants are situated. So, for example, for a group, group consent might be really important, but a lot of research practices only focus on individual consent. The literature suggested that practices need more cultural humility, a term often used to emphasise the importance of being reflexive, doing active listening, and taking responsibility for interactions, on the side of researchers and research institutions.

That seems like a huge issue to think about. Could you just tell us a little more about what else came up?

The second finding I would like to mention is that the literature really emphasised the key role of co-production in identifying and avoiding potential problems. So it's really important that participants are seen as active researchers and knowledge producers. If we don't have that mindset, it's very easy for participant engagement to become tokenistic, and that in turn risks exacerbating existing problems or creating new forms of inequality.

We also held a workshop as part of the review, which helped us to complement the findings with some expert recommendations.
And one of the things that came out of this wider literature review and the expert recommendations was that there are lots of structural issues that we need to keep in mind in efforts to diversify genomic data.

Please, can you tell us more about these structural issues?

One of them is that a lot of the time researchers might view data as neutral, but this ignores the fact that data and technologies cannot be separated from the social context in which they are created, and they tend to reflect our biases and social inequalities. If that's not kept in mind, it's easy to draw conclusions based on shallow, simplistic patterns that just come up in the data. The second structural issue was that these aspects really need to be contextualised within the historical trajectory of structural racism and legacies of colonialism. And the third was that classification and categorisation, as I was saying earlier, have political consequences, and they really need to be closely interrogated.

Could I just ask you a little bit more about data not being neutral? It would be great to hear more about that and what it means.

So, for example, during the pandemic there was some research coming out saying that there was some genetic susceptibility to COVID based on racial categories. That, to me, was one of the examples of researchers going in with the mindset that data are neutral, when the cause of what was perceived as genetic susceptibility to disease was perhaps more grounded in social inequalities.

So it's almost like we need to be more scientific about these sorts of DNA findings and interpretations.

I think it's also about what data we're collecting: which of the data out there we choose to collect, what types of data we choose to collect and why, and what that says about our values. It's all kind of embedded in the data, I suppose.

Absolutely. And also the tools and methods that are used to measure things: they were all created by people at the end of the day, and those people brought their own perspectives and experiences into that invention and its application. So it's all a matter of trying to contextualise these things that we use. It's not about rejecting them and saying that they shouldn't be used; it's about positioning them in the wider picture, to say that something obviously comes from a particular angle and might not work well when we use it in a different context.
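The COVID example can be illustrated with a small simulation; this sketch is not from the review and every number in it is invented. It shows how a socially patterned exposure can masquerade as group-level "genetic" susceptibility when an analysis treats the data as neutral and ignores the exposure:

```python
# Illustrative toy model only: all probabilities are invented. Disease risk
# here depends ONLY on a social exposure, yet a naive group comparison
# makes group A look inherently more susceptible.
import random

random.seed(0)

def simulate_person(group: str) -> dict:
    # Exposure (e.g. crowded frontline work) is socially patterned, not genetic.
    exposed = random.random() < (0.6 if group == "A" else 0.2)
    ill = random.random() < (0.30 if exposed else 0.05)
    return {"group": group, "exposed": exposed, "ill": ill}

people = [simulate_person(g) for g in ["A"] * 5000 + ["B"] * 5000]

def rate(rows):
    return sum(p["ill"] for p in rows) / len(rows)

for g in ["A", "B"]:
    print(f"crude illness rate, group {g}: {rate([p for p in people if p['group'] == g]):.3f}")
    # Group A looks roughly twice as "susceptible"...

for g in ["A", "B"]:
    unexposed = [p for p in people if p["group"] == g and not p["exposed"]]
    print(f"illness rate among unexposed, group {g}: {rate(unexposed):.3f}")
    # ...but at the same exposure level the groups match: the signal is social.
```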
It sounds like there are so many barriers and obstacles for researchers who want to go in and try to diversify their data in a way that is in line with ethical best practice. Did you come across, especially at the structural level, any researchers that, I don't know quite how to say this, almost got it right? Where you read the papers and felt that it worked well, or had the effect it was supposed to have?

Yes, we did find some really nice best practices from other contexts that have been trying to co-produce genomic knowledge. But in the context of the UK, it may be that we need to work out what works best for a super-diverse society like this one. Those best practices talk about, you know, going to a specific community and trying to get them engaged in research. But how do you start from the beginning in a very diverse society? How can co-production happen there? It's something that we haven't really explored yet.

It sounds so complicated, right? Because when you talk about going out to communities in diverse societies, I suppose that leads to the question of what a community even is, and what demographics you are looking for within the community. Because you said at the beginning that the diversity within communities is so broad; are you even looking for a community based on genetics, or on socioeconomic factors, I suppose? It's a really interesting question.

Well, that's a really good question. I guess that's what I was trying to say: a lot of the time the ethnic or racial categories that we have, which are socially constructed, don't actually map onto ancestral groups. But what we do know is that things like racism, structural racism, and structural inequalities have had biological effects on people for decades. So yes, it's a really good question how you would then define diversity, and how you would define community. Some people, for example, define community based on geographical proximity, and others talk about shared characteristics such as racial or ethnic categories, or shared lived experiences. But yeah, it's a really good question, and it's something that has to be determined in discussion with everybody, with all the people we're talking about. So the answer is in co-production, I guess.

I remember in your report you spoke a little bit about a diverse workforce, and the importance of going beyond diverse data.
It reminds me of something I read the other day about decolonising AI, and the idea that it's not just the categories that need to be thought about: it's also who's actually conducting the research and what knowledge is being produced. Could you talk a little bit more about that?

Yeah, sure. So I think the categories have their own significance and importance in this discussion, but one of the things that we discussed in the report was that the push for diversity shouldn't just be about the data. It should also be about the sort of knowledge that is being made, the sort of workforces that are in place, and the disciplines that are getting engaged in the research. At the moment there is a problem of lack of diversity in the genomic workforce as well. Of course these are all very much connected to each other: the more diverse the workforce, the better the chances of having diverse data at the end, and so on. There are a lot of obstacles around how to cultivate a culture in the work environment that sustains that diversity of expertise and disciplines. So lack of diversity is one of those things that you can't really tackle from one particular angle, in silos; it has to be thought through in terms of the bigger picture.

Thank you, Faranak. That's fascinating, because I guess the problem of an underrepresented dataset feels so kind of like, oh, we need to make that dataset more diverse. But I think this just beautifully illustrates how it's not as simple as doing that: there are so many questions to consider, and so many things that get raised on the path to genomics that works better for everyone. If you were picking one message for people to take away from this podcast, what would it be?

It's really hard to convey all the complexity and challenges of this area in just one point. But clearly there is a real problem: if we don't have representative datasets to inform genetic tests, it worsens outcomes for people who aren't represented in those datasets. And that is just one example of structural racism, having a system where the quality of testing you can access is so influenced by your ancestry. But making those datasets more representative needs to be part of making the whole enterprise of genomics more equitable; it's not the goal in itself. In fact, the key message from our review is that diverse datasets shouldn't be an end point in themselves.
Just collecting genomic data from people with a range of ancestries doesn't address the diversity problem, because even if we have diverse data, that doesn't mean we have considered diversity in the true meaning of the term. Taking diversity seriously means thinking about it in terms of including underrepresented groups in all stages of the research process, ensuring that harms and benefits are equally distributed, and co-creating knowledge, so that the knowledge that is created is the knowledge that diverse populations are interested in knowing, and so that the benefits of that knowledge are fed back to the community or to that diverse population.

Where could we go to find out more about your work?

The draft of this review is now online on a preprint server, and we are at the moment in the process of writing some academic papers from the review that will hopefully come out in the next year.

I'm so excited to read these. It's such a fascinating field, and thank you so much for making the time to talk to us about it today.

No, thank you for inviting me.

Thank you very much for listening to this episode of the Centre for Personalised Medicine Podcast. If you'd like to find out more about personalised medicine and its promises and challenges, please visit the Centre for Personalised Medicine website at cpm.well.ox.ac.uk.