Hey, to everyone joining from around the world. I see we have people from Japan, from California, from Switzerland, lots of time zones here, Seattle and even from West Berkshire in the UK, and, of course, from Oxford. Good day to all of you. I hope you're keeping well and safe in these difficult times. No doubt you've heard the good news from AstraZeneca that we have a vaccine that's going to be affordable and globally distributed. So this is a significant day for us in Oxford, made even more significant by the fact that we have a presentation from Gina Neff. Gina is going to give a talk that she was originally scheduled to give in March, and then circumstances intervened and we had Covid-19. But she's here, and we're absolutely delighted that she could join us.

I'm Ian Goldin. I was the founding director of the Oxford Martin School, and since 2006 I have been directing the programme on technological and economic change at the school. Gina is the professor of technology and society at the Oxford Internet Institute, and was with one of the founding groups at the Oxford Martin School. She's also a professor in the Department of Sociology at Oxford. She works on the future of work in data-rich environments. She's published three books and over a dozen research articles in many of the magazines and journals that we'll be reading. She's a pioneer in creating the area of human-centred data science and is leading a new project on the organisational challenges that companies face using artificial intelligence for decision making. She's had fellowships at the British Academy, the Institute for Advanced Study at Princeton and elsewhere. And she's an adviser to many companies, organisations and others on this agenda. She's also an advisor to the Women's Forum for the Economy and Society. Gina, delighted to have you.

I'm here to talk about data work. This is the new project that my team and I have been working on, and I'm going to talk about three distinct projects. Now, the stakes at hand are that we are looking at a transformation of the basic everyday infrastructure of how we do organising. I like this word infrastructure because it reminds us, from science and technology studies, that once technologies are put into place, they're often invisible or everyday. We don't so much think about our roads and their structure when we're trying to get from point A to point B. In the West, in the Northern Hemisphere, we don't necessarily, and fortunately don't have to, think about things like our water infrastructure, and we rarely think about our connectivity infrastructure, even though it has allowed
us to do things during this pandemic that we wouldn't otherwise be able to do. That said, infrastructure helps us think about how small technical decisions being made today end up having enormous everyday implications for all of us. And that's what I'd like to talk about. There are three myths of everyday AI that we need to address.

The first is that AI is no longer the province of large technology companies. That is, we are seeing the integration of new modes of organising and new kinds of technologies into everyday decisions, from legal decisions to marketing decisions to civic decisions, decisions about who gets credit or who gets put in jail. And in this way, the conversation so far about ethics has targeted those who make AI, the large technology firms who are driving the innovation. So I would say that the context for us to think about in terms of data work is not that we're targeting large tech firms, but rather the places where many people on this call would work: smaller firms, firms outside of the technology sector, organisations of all sorts that are starting to integrate, purchase and acquire new kinds of data processing, new kinds of technologies that make decisions and automate processes.

The second key challenge for us in thinking about data work is that we've talked about artificial intelligence as a suite of technologies, as a smart tech, but what is fuelling and driving the expansion of artificial intelligence into more parts of our everyday lives is really this question of AI as data. So if the first point is that AI is everywhere, it's in smaller firms, and we need to change our conversation, then the second key point is that we need to think of what is happening not just as a technological phenomenon, although processing power and cloud computing are enabling and driving new ways of generating data. It is a question about data, and how data is being used. And that's going to be one of the key areas to look at tonight.

The third is the idea that AI systems are automatic. Now, this seems almost like an oxymoron, but bear with me for a moment. When we talk about integrating artificial intelligence systems into everyday life, we might think of them as drivers of automation, and that's certainly the push behind many of these systems. But what is fuelling much of the expansion is a whole host of human work. And it's really in the intersection of these three points, AI of the everyday in many different kinds of organisations,
AI as a data innovation and not just a tech innovation, and the notion that AI systems involve enormous amounts of human effort. That's really where we need to be thinking in terms of data work, and that's where our talk is tonight.

So with my team, Maggie McGrath and Diana Prakash, earlier this year we released a report called AI at Work. And what we did was actually fairly simple: we surveyed global newspapers for accounts of how artificial intelligence was making the news. The first thing we did was set aside basically everything that looked like a company press release, those cases where someone was advertising how well their product worked, because we know from existing research that much of the journalism on artificial intelligence comes from the technology companies that make it. So we set those aside, and instead we focused on how news stories were covering where AI was working in organisations. And we came up with three common challenges that we saw in the roughly one thousand articles that met our criteria over the course of the year. Those challenges were challenges of transparency, integration and reliance. That is what we began to see if we take and collate the stories of where AI is not quite living up yet to its potential as a transformative technology: these news stories share three key challenges.

On transparency, we see a lack of transparency between companies and customers about what AI can do, how long it takes, how much work is involved in making it, and sort of closing the loop, as they say. Companies on the one hand tout AI as this automating solution, rather than being open about the amount of money or time or effort needed to produce and sustain systems that actually work in practice. And this led to a whole host of problems that arose in our report, because we could see that the clients behind these systems felt deceived; they felt deceived by privacy promises and so on. And companies felt that there was a whole host of unglamorous work: on the one hand they had signed up for, you know, sparkling new systems; on the other hand, they were getting what looked like lots of back-end data cleaning.

Next, there was a huge gap in integration, and we used this to think about the gap between the conditions under which AI was trained and the real-life environments in which it's used in practice. We saw that many of the systems that made the news had somehow been trained with one set of data.
But unfortunately, the real-life situations into which they were rolled out or integrated were messier and less organised. And so it often meant that there was an enormous amount of human labour used to train and manage the systems even once they were put in place. That meant companies often struggled to scale artificial intelligence systems so they could work across a broad array of scenarios and problems, and that takes away from, or presents a problem for, some of the notions of scale.

And then there was the notion of reliance. We like to think of this as companies relying either too much or too little on their AI system, or the idea that once an AI system was in place they were investing a lot of agency, autonomy and authority in the system rather than training their workforce and staff, bringing them alongside and making those decisions practical in the workplace. So we see the challenges of both over- and under-reliance.

Let me take each one of these in turn. First is transparency. Now, my colleague Mary Gray at Microsoft Research and her colleague Siddharth Suri have released a wonderful book, Ghost Work, on the platform work of labelling. So when we see AI systems, we know that there's a lot of contract, gig or on-demand platform labour that goes into building them. And often that kind of transparency is a problem: we don't see who's involved and the work that's involved in making these systems work. However, in our report we pushed on a different idea of transparency, the notion that companies often had different kinds of labour in different parts of the organisation that were really important in making their systems happen. So in this sense, the transparency question was: is the system being transparent about where work is happening and who is doing the work, also geographically? Is the work being done within the company or outsourced to a third party? And again, in several of the cases we found that organisations would think a process was happening inside their organisation when it was really happening outside.

The next is this kind of thinking about integration, and how systems actually integrate into existing workplaces really presented a lot of problems. So we saw several cases where there were problems scaling data across multiple hospitals, where data would be collected in one hospital and a system would be trained, but then bringing that system to make sense in another hospital, or scaling it even across different departments of an organisation, became a problem.
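To make that integration gap concrete, here is a minimal, hypothetical sketch (not drawn from the talk or the report): a toy rule derived from one hospital's local billing conventions quietly misreads another hospital's records. The hospital records, the codes and the predicts_overnight rule are all invented for illustration.

# Illustrative only: a toy "model" built on Hospital A's local billing
# conventions fails silently on Hospital B's records.

# Hospital A happens to flag an overnight stay with the code "OBS-1".
hospital_a = [
    {"codes": ["OBS-1", "XR-CHEST"], "overnight": True},
    {"codes": ["XR-CHEST"], "overnight": False},
]

def predicts_overnight(record):
    # Rule "learned" from Hospital A: overnight stays always carry OBS-1.
    return "OBS-1" in record["codes"]

# Hospital B records the same clinical reality with a different local code.
hospital_b = [
    {"codes": ["INPT-OBS", "XR-CHEST"], "overnight": True},
    {"codes": ["XR-CHEST"], "overnight": False},
]

def accuracy(records):
    hits = sum(predicts_overnight(r) == r["overnight"] for r in records)
    return hits / len(records)

print("Hospital A:", accuracy(hospital_a))  # 1.0 on the data it was built from
print("Hospital B:", accuracy(hospital_b))  # 0.5: the overnight stay goes undetected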
So we see these challenges of integration really arising when data moves from one context to another, and companies can't quite fit the AI into their existing organisation and existing strategy.

And then third was this notion of reliance: when was the dependence on AI just right, a sort of Goldilocks moment? There were several cases where companies were touting that their system was doing one thing, but really the system was a shield, and the work was being done by people behind it. So in countries like China and India, there were crowdsourced labourers actually providing the work. It was weak data and privacy laws and cheap labour, that kind of intersection, that really gave different places on the globe a competitive advantage in this kind of work. And so the reliance isn't so much on the technical system, but rather on a new form of outsourcing.

So what does this mean for data work? Well, we define data work, first, as the work that needs to happen to help the interpretation and contextualisation of data in practice. Second, data work is the work involved in helping translate data for fairness and inclusion, so that, too, needs some kind of context around a system. And third, we define data work as the communication that needs to happen with a whole host of stakeholders, often including conversations about privacy and ethics. This comes from a paper that we published earlier this spring in ACM Interactions, "Who does the work of data?", with my colleagues Muller, Bossen, Pine and Nielsen.

In this we ask three main questions, if we're going to understand this hidden, invisible labour of data work that is happening within organisations, preparing organisations for AI systems. First, who is ensuring that the data are meaningful? Who's doing the work that helps integrate data solutions into practice within the organisation? Who's taking responsibility for the sorting and making sure that systems are organised around it? Second, who's really doing the work of organising and infrastructure-building that makes it even possible? So much of what we've seen in terms of what Gray and Suri call ghost work, in terms of platform labour, is that a lot of the organising and infrastructure work is hidden labour. But even within organisations, we see that there's work around organising and infrastructure that needs to happen.
And then finally, who is doing the work that attends to questions of ethics, privacy and people's concerns about their data? Again, that particular part of the work is difficult to automate.

So we took these three questions and we looked at two sets of hospital workers. One was a set of workers in hospitals in the United States creating and readying billing data for analysis. As many people on the call will know, in the United States, with its privatised healthcare system, each procedure within a healthcare setting has a particular billing code, and those billing codes can be aggregated to provide a real sense of things like how long someone stayed in hospital and what kinds of treatments end up with what kinds of outcomes. So we looked both at the United States and at Denmark, where a set of workers, medical secretaries, do the work of transcribing and making certain kinds of data meaningful within the healthcare setting.

And by doing those two, and this is where we're really missing out on the slides, we come up with a way to model where intelligibility and transparency, ongoing optimisation of resources, and the work of context, information and metadata occur within the organisation. And so we think about these three primary types of work both as new kinds of work that are necessary to make AI systems even function, and as recognising work that we might not have seen as crucial in the data-driven revolution. So, for example, our billing-code experts in hospitals in Denmark and the United States were both really key in making sure that certain kinds of data were ready for data scientists to process. They were really important for making sure that the right codes got attached to the right procedures. And that was a kind of infrastructure work, a sort of care-of-data work, that opens up new possibilities for analysis. And third, they were the ones who interfaced with people who had concerns about whether or not the codes about them were right, or who had concerns or questions about privacy.

So before we open up for discussion, and I promise, since we're doing this without slides, I'm going to bring us to a close very soon, I will just say that the third project that my team has worked on in this scheme, and really where I see a great opportunity for people to get involved, is around the discussion of what is and isn't artificial intelligence. Helping people make sense of the systems that we see, and unpacking some of the myths about automated work and the future of work, I think is one of the things that we need to be working on.
So with my team, early in 2020, we released the A to Z of AI, and if you Google that you will find it. We partnered with Google to create an educational product with simple, bite-sized explainers to help people understand what AI is and how it works. That project has now been rolled out in fifty-nine different countries, in 13 languages, and it's part of an effort to tease out some of the misperceptions that we have.

So I have a series of practice recommendations and a series of policy and research recommendations before we wrap up. If this were slide-enabled, if this weren't 2020 and a slide-less talk, we would leave these up as we have the conversation. But I just want to say that for a practice agenda, we really need to be thinking about organising for data-saturated societies. These questions around artificial intelligence, and who does the work of making AI systems even possible, raise questions of digital data and the public good, and they help us think about projects that seek to understand how we can harness some kinds of data, commercial data perhaps, for responsible reuse. They help us think about how we want to intervene and think through those questions of who is advocating for the data of data subjects.

Second, it reminds us that people's understanding of their privacy and their own data is not a task that we can just leave and assume that people will do on their own. There actually is quite a bit of organising and stakeholder engagement work that needs to be done, both on the part of companies doing the work and by those of us who advocate for responsible use of data. We really need to be working at this intersection of helping people learn to interrogate the values and implications of data-driven businesses.

And then finally, third, we really need to empower citizens and upskill societies as part of this. The work of making AI systems is simply too important to be left to large technology companies, who have an interest, as vendors, in selling systems on to smaller companies. We really need to support the responsible utilisation of data, not just by thinking about who's designing systems, but by helping the people who are going to be using them on an ongoing basis. And that, of course, brings up the policy question of how we can enable policymakers to upskill as well.

And finally, in terms of a research agenda, I think this idea of AI as everyday infrastructure is one way that we can begin to move AI
and ethics questions out of the realm of large-scale technology makers, and really start to think about how we map, track and measure these changes in all of our lives. What's happening in the organisational settings where we work, and how can we get a grip on these questions of more and more data being gathered about us and managed about us in different ways? Can we identify ways that social, cultural, organisational and causal factors really shape who can intervene in and hold accountable AI systems? Where can we do the social science that helps us ensure that systems are deployed and integrated in responsible ways? And can we begin to think about the changing social norms and conventions that are happening around AI systems in organisations? When do we cede power to automated systems, and when do we remember that behind the interface of many of these systems is a whole host of human labour that also needs to be held accountable? And so with that, thank you all for indulging me with my technical issues this evening, and I invite you to come in and join our conversation.

Thanks very much, Gina, for that admirably clear presentation. I love your organisation of everything in threes, which is incredibly helpful; it's what every media training tells us, that people remember threes. It helps when you have no slides to follow. However, it is impressive that you remembered all the threes as well. Admirably clear, and really extremely urgently needed, because without knowing it we are all walking into this maze, and sometimes catastrophically so, not least in the UK government's use of Excel spreadsheets, as I understand it, for track and trace.

I have a number of questions, but I am conscious of the time and we did start a little bit late, so let me just pose a few. I've been in some of these data factories in Kenya, Samasource and others, and I also admire the work that your colleague Mark Graham has done in thinking about the rights of the people doing this work, which is very allied to the work you're doing. No doubt this is maybe just beginning. We've got your book, which argues what AI is at bottom, but some of what you were speaking about didn't sound too much like AI to me; it sounds more like filling in Excel spreadsheets. Is that a slippery slope? Or, you know, without reading the book, can you give us a quick definition of AI? Though I do encourage all the participants to get the book, obviously, to look at the deeper explanation. Thank you. So, is it just Excel?
There's a joke that says when it's a salesperson or a consultant, AI is a spreadsheet; when it's someone who's a data scientist, it's machine learning. That's absolutely true here. When we look at how companies are talking about what it is that they're doing, they put a gloss on it that implies much more computational power than they're actually using.

So with the Women's Forum, which you mentioned in the introduction, I've been doing a series of focus groups this autumn with chief data scientists in companies in Europe and the US. And these are incredible company leaders: these are Fortune 500 companies, banks, large manufacturing firms, large consumer services firms. Name the sector, and we've had an interview with someone working in their data science team. And we ask the primary question: what are you doing about responsible AI and ethics? What are you doing for responsibility in your data systems? And each data scientist knows they have a huge responsibility here, and none of them can yet articulate what it is they should be doing on a practical level. So there's an enormous amount of catch-up that our corporate leaders are doing in terms of figuring out how to put into practice something that actually works.

So, sure, some of the challenges that we're looking at are challenges of any kind of large-scale centralisation, of large-scale control over globalised systems and supply chains. But the challenge here is that we risk building infrastructures that, once put into place as technical data infrastructures, become difficult to untangle, difficult to hold accountable and difficult to intervene in. And so that's why we are suggesting that there's an urgency at this moment for really getting the conversation more involved in holding these systems accountable.

Yes, and I think you very clearly articulated the urgency of this and the need. I have many questions, but let's go to some of the questions that have been posed by participants. You are able to vote for these questions, so do vote if you're keen on a particular question, and I see we have eight. Let me take the first, which is from Ali Steadman, who I happen to know. Hi, Ali, good to see that you are participating in this. What does the equity of opportunity in tech look like? Where do we aim, and how do we know when we've reached it?
That is a fabulous question, because on the one hand we want to see the expansion of the types of people involved in designing and building AI systems for the world. Right now, we see a concentration of that effort in the global north; we see it in the US and in Europe. We see that only 18 percent of people working in AI are women. So we have enormous challenges of racial, ethnic and global gender diversity in building these systems. On top of that, we need to start building capacity. And I would suggest that there are a couple of challenges. One is that the equity of opportunity is necessary for building better tech, but it's not sufficient; it's not the only thing that's going to get us there. The second would be that we really need to increase the capacity in the global south in order to ensure that systems are properly localised. If we are just relying on systems that are built in one place, trained on data from one place and then integrated into systems around the world, that's a recipe for disaster.

Yes, integrating bias, and that is a massive concern; not least, Shakir Mohamed at DeepMind has been writing very admirably about the colonialism of data and algorithms. Four votes for the next question: can you comment on technological determinism? Are AI trends such as those discussed susceptible to the fallacy of technological determinism?

That's a fantastic question. We certainly hear that determinism in how industry leaders talk about and support and push the inevitability of AI. If we look at this data work question, where we see people in these large hospital settings, the people who are involved in the day-to-day operations of getting data systems ready for data analysis, there's a whole host of new kinds of jobs being created. It's not inevitable that AI automates work or that it displaces all kinds of work. Instead, it's creating these new moments with different kinds of opportunities for people to be involved. So when I think about technological determinism, the idea that the technology has its own drive and will naturally go one way or another, what we can see from older industries like healthcare and construction, which I know very well in this case, is that the pathway that new technologies take is not predetermined at all. So I think we as educators have our work cut out for us to make sure that how the technology industry talks about the inevitability of their genius is held to account.
Absolutely. So there are also people on YouTube participating, and a couple of questions have come across from there. The first one, which seems to have disappeared, doesn't say who it's from... Andre Grisman: I was wondering how Professor Neff thinks data saturation in data-rich healthcare will impact medical education in the next five years.

A brilliant question from a brilliant colleague in the United States. So we absolutely have to use these notions of data work to help people understand the context of the results they're looking at. If we continue to think about the outputs of AI systems as decontextualised outputs, we end up in a dangerous place, especially in medical systems. If we understand where the data come from, what kind of context they have, and who is advocating and who is responsible for ensuring that there is the organisational stitching together between the outputs and the practices within the organisation, we would have much more highly contextualised and much more relevant results. There are life and death consequences in healthcare.

So where we are right now, today, in this moment, we have a lot to be grateful for. Large-scale computing power and the integration of global supply chains are going to help us end this global pandemic sooner than any pandemic has ever ended before. We should be cheering large-scale computing power and the collaboration it makes possible among teams in this particular way. But we're not going to get to those great societally beneficial outcomes unless we realise that these data systems are highly dependent, highly contextualised and sometimes highly fragile. Many of the cases that we looked at in our AI at Work report come from healthcare, where data from one context was simply brought into another context, where data gathered in one particular hospital reflected choices that were very specific to that hospital and not applicable to other hospitals. So how does this influence medical education? We have to train the people in healthcare who are going to be using these systems in how to use them responsibly, how to be critical consumers of the data they're using. They are our front line of defence for when AI goes awry.

In fact, that question was from someone else, but Andre Guzman said they'd like to extend this discussion from medical schools to business schools and so on: what should be at the core of an education generally related to AI and data?
Carl Bergstrom and Jevin West at the University of Washington have a wonderful new book out called Calling Bullshit, where they developed a training course around critical, human-centred data science. They basically are training people to say: wait a second, I am calling out, and I won't use the profanity, I'm calling out as patently wrong some of the data, some of the seemingly data-driven evidence here. I think that's incredibly important in this particular moment, when we see attacks on and challenges to science and evidence. We need to stand up, on the one hand, for science and evidence and help ensure that we continue to build public trust in good science and good evidence. But on the other hand, building fragile AI systems that are not robust or that don't deliver as promised isn't helping us get there. We want to make sure that we're not simply selling new 21st-century versions of snake oil.

And so I think one of the solutions, just as we would train medical professionals in knowing how to push back on and query AI systems, not to know enough to design them, but to know how to operate them, is going to be the same in business schools. We're not necessarily going to have the MBAs and the CEOs designing systems, but they're going to need to know enough to ask their team and to hold them to account for how the systems get integrated into their existing practices. And that's the education, I think, that we really need to be doing right now.

Absolutely, including in Oxford. A reporter asks, with three votes: what specific regulatory mechanisms would you like to see put in place to address these issues?

We've already started to see questions around people's individual data, how we advocate for ourselves, with the European GDPR, for example, the General Data Protection Regulation. How do we get individuals to advocate for this? Some of my esteemed colleagues have called for a kind of algorithm regulation, that we regulate particular kinds of data systems and data structures. I don't necessarily think that's the path, because we have in place enough outcomes regulation. So, for example, in the education system we want to make sure that data systems are not failing our most disadvantaged students and that they're treating people equitably, and we have ways of monitoring that.
When we met with company leaders this autumn with the Women's Forum, asking data scientists what they're doing about responsible tech, we found, surprisingly, that some of the most advanced conversations around responsible tech were happening in banking. Why would that be? Why would it be in banking and not in technology, for example? In banking, they are already under such a highly regulated system around how they make their decisions and choices that they had to be really sure, when they integrated new systems into their analysis, that they could explain them, that they could explain them to the customers, that they could explain them to themselves, and that they could be assured that these systems were not causing new forms of discrimination. It wasn't because there was a special banking regulation in place, but because we have certain regulations around our financial data. And so I think that's the framework, as it were, for how policymakers need to be thinking about regulation in this space. We need to be thinking about what existing frameworks, in each realm of our lives as citizens, we need to be adjusting to think through how these systems are going to integrate into them, because that's the infrastructure we're building.

There are three votes for this question from Gwen: in your research, have you spoken to employees interacting with data-driven systems? If so, what are their main concerns and challenges in their work, and how do these differ by race, gender or age?

In the data work article, in "Who does the work of data?" which we published over the summer, we spoke overwhelmingly to women, and in the United States it was women of colour, who had very stable bureaucratic jobs in large hospitals doing this data work. They are the champions of the AI era of large-scale data, and the data analysis in those hospitals did not get done without these women, who worked literally in the back offices. The challenge with our medical secretaries in Denmark was that the hospitals where we interviewed were actually looking at automating their work, because some of the work that they had done, transcription, was being automated. And so their concerns and challenges really were: how can I keep doing this work that I conceive of as care, as caring for our patients' data, as caring for others, as making sure that the record stands properly, while not getting recognised for doing that work?
So the challenges really were: how can we support this invisible labour? It's not being called AI work, it's not being called data science, and yet neither new systems nor data science can happen without it. So I think the main concern and challenge is making sure that we're supporting good organising, that we're supporting the work that needs to happen within organisations to make this happen.

Is it called... what's the organisation that Mark Graham is involved in as well?

That's right. My colleague Mark Graham is working on Fairwork, which is looking at the platform economy and labour rights. So much of this work, the work of Mark Graham, the work of Mary Gray, is looking at platform-based labour. We're not seeing so much platform-based labour in banks and hospitals, and yet the work that's being done in those organisations is absolutely the invisible work that's fuelling these systems.

Yeah. So another question from Ali, which has two votes, which is why I'm going to give him a second question: how do you regard human-level performance as a metric to beat in machine learning applications, especially in manufacturing, checking for defects and so on? And he says Andrew Ng wrote recently about its limitations.

Right. So, listen, we know that large-scale computing power is going to help humans solve really big problems, and there are really big problems we cannot solve without large-scale computing power. I think if we frame artificial intelligence as the corollary or the competitor to human intelligence, we get to these questions of which is better. We don't actually ask the question of which is better, my calculator or my set of hand calculations. OK, sure, my calculator can answer some things faster; I can do some other things more easily. That's going to continue to be the case with artificial intelligence systems. So if we think about using this frame of human-level performance, we again are pushing AI systems to act and think like people, and not to act and think like bits of technology that we use, or bits of infrastructure that are going to be part of much bigger organisations and data ecologies and flows of data that already exist. So to personify them, in some ways, is to hand over a powerful sensemaking tool in a way that takes the power away from how we can intervene in these systems.

Thank you. Let me ask this question, which is from Sujata: can you comment on deepfakes? Are they more concerning than the general AI issues you've discussed so far?
Listen, you know, I'm concerned about deepfakes almost as much as everybody else. We have at the moment big challenges to our notions of information: is our information secure, and how do we know what we know? This is a question about epistemology, a question about how we know what is real, that is very much a function of where we are in the early 21st century. But that said, I am more concerned about ensuring that we have a healthy, robust news system, healthy, robust democracies, healthy, robust organisations than I am about a particular technology for producing lifelike human pictures. So we know, for example, that the amount of misinformation circulating about elections on Facebook varies by country. And we know that's a function not of Facebook, not of the people in one country being smarter or easier to fool, but of the regulatory environment and the social environment in which those elections happen. So on the one hand, we can worry about deepfakes; on the other hand, we really need to be thinking about how we shoulder the responsibility and bolster our social institutions and organisations to make sure that we're supporting society.

That's a great point to end on, and I'm afraid we've come to the end of our hour, which has raced by. Sorry to all the participants for the technical glitches at the beginning; that's part of this transition to a digital world. It's great to see that so many of you have joined us, some of you extremely late at night or early in the morning. And thanks so much to Gina for enlightening us on what is an immensely important topic in a way that I found exceptionally clear. There's a lot talked about in this area, and I found your presentation really cut through a lot of it. You can follow Gina on Twitter, and you can follow me at ian underscore goldin, G-O-L-D-I-N. Do look at the Oxford Martin School events page. The next event, which is really a must for anyone that's interested in the internet, and we wouldn't be here without this man, is Sir Tim Berners-Lee in conversation with another founder of much of modern computing. Sir Tim, who is credited really as the person behind the World Wide Web, will be at the Oxford Martin School giving a talk at five o'clock on Thursday. So do register for that if you haven't already, and look forward to your engagement in the future. Thanks for all of your participation, and thanks particularly to Gina for her great presentation.
Stay safe and good luck to you.