OK, good. As I welcome everybody, it gives me great pleasure to welcome back Johnny, Johnny Brooks-Bartlett, who's going to talk to us about his time working in data science since leaving Oxford and the Doctoral Training Centre. Johnny started in 2012 in the Systems Biology Centre for Doctoral Training and is now out in the real world working on applied machine learning.

One minor note: we're recording this presentation. So, for GDPR reasons, if you do not want to appear in the recording and want to preserve your privacy, please turn off your video and audio, and you might want to change your screen name just to make sure that we don't capture your information. And if you can possibly remain muted during the presentation, that would be great too. We'll have a little Q&A session at the end, which we won't record, so feel free to unmute, share in the chat, ask your questions and get involved in the conversation. OK, so without further ado, I'm going to hand over to Johnny.

Awesome, cheers. All right, can everyone see that? Okay, perfect, great. So thanks for having me, and thank you for inviting me back to speak again. I wanted to go through what it's been like working in data science. I left Oxford in December 2016, so it's almost four years now since I started working in industry, and I thought it would be quite a nice thing to cover some of the things I have done, but also how I got there as well.

So, diving in, this is what I'll cover. I'll give a brief introduction to myself, then how and why I got into data science, because that was one of the things I really wanted to find out when I was a graduate student thinking of leaving academia: how do you get into industry? And data science in particular was what I was thinking about then. I'll go on to some of the data science projects that I've done, and then, to keep people interested, I'll do a deep dive. Deep is probably too generous a word, but I'll talk a bit more about one of the machine learning problems I've worked on to give you a bit of context. And then I'll talk about some general reflections I have, which I hope will be useful in thinking about what data science and machine learning are like in industry.

So, a brief introduction. At the moment I work as a senior machine learning engineer at Spotify.
And today is my one-month anniversary there, so I've not long joined Spotify and I'm still learning and finding my way around things. But I was a data scientist in industry for almost four years, and I'll speak a little bit about the difference between my experience of being a data scientist and what at least some of the expectations are of being a machine learning engineer, because I think there are differences depending on what company you go to and how they define it.

Before that I was a grad student, and before that an undergraduate student. I did a maths degree at the University of Southampton first, and then, as Garrett said, I joined Oxford in 2012 on their systems biology programme. Back then it was called a DTC, the Doctoral Training Centre, and it's a Centre for Doctoral Training now, but I did the Systems Biology CDT. That's a picture of me in the lab in biochemistry, when I was able to grow more hair, looking at protein crystals down a microscope, back when I was actually doing stuff and not just sitting in front of a computer. So that's a little bit about me, but I want to spend a bit more time on the how and why I got into data science.

So, how did it happen? I'd been a graduate student for about two and a half years when I started thinking to myself: I don't think academia is something I want to continue doing. There were a few reasons for this. One was the publish-or-perish sort of thing and, at that time, some of the practices. There were times where I felt we were doing just enough to get published and not necessarily to do the best science out there, and that sort of tainted my perspective of academia, because I got into doing a DPhil because I said to myself, I'm going to be a professor one day. That was the plan. But, yeah, that tainted some of it for me.

Then I started talking to some of the postdocs in the lab, and I didn't think the postdoc life complemented family life very well. I'd been in a long-term relationship and was thinking about family, and with postdocs you've typically got short, fixed-term contracts, maybe two to five years, and at any one time you might be hopping from place to place, whether in the UK or outside. I felt I wouldn't necessarily be able to do that. And then also, I didn't have the best work-life balance during my PhD. Maybe some of you sympathise with this, in the sense that you might work very late in the evenings, and weekends too.
And you don't give yourself a break, because you often feel that if you're not doing work then you're not being productive. I felt all of those things. These were my perceptions and the way I saw things; some of you may feel the same, some of you may disagree, but these were the sorts of things going through my mind as to why I didn't want to stay in academia.

So what did I want? Well, I'd started to enjoy writing code, and this was something very different, because during my undergraduate degree in maths, when I was told we were going to do a programming course, I absolutely hated that, and I would do whatever modules I could to avoid programming, because I liked pen-and-paper maths. But during the PhD I worked in a lab with a computer scientist, and that got me writing code and, somehow, liking it. I liked doing complex analyses as well: I loved the data analysis, loved writing down the maths and looking at graphs and trying to understand them. That's what I really liked and wanted to continue doing. I also wanted job security. As I said, I was in a long-term relationship and wanted to start a family, so I wanted something that was secure and stable. I also wanted to be intellectually stimulated; that was one of my worries about going into industry, that if I left academia I wouldn't be intellectually stimulated, but I knew it was something I really wanted. The other thing is, the money's pretty sweet too. You're going to get paid well in industry.

So I found out about data science. I'd read articles about it, and it just seemed like a sweet deal. It seemed like everything I wanted: writing code, doing data analysis, with the job security and the money and all of that. I thought, okay, I think I know what it is I want to do. But, and I don't know if you have the same sort of opinions or views, when I read a lot of papers or articles or watched videos, they often talked about the things that were state of the art. I have a few examples, and a couple of them actually happened after I'd left academia and joined industry, but they always talked about these state-of-the-art neural networks and reinforcement learning and probabilistic graphical models that were really, really cool.
And I was like, I don't know how to do any of that. I didn't do what I thought of as machine learning during my PhD. I could code, but I wasn't the greatest software engineer out there. And how do you help write code for a self-driving car? Who does that? So that's kind of where I was at.

So I thought to myself, I need to develop some skills, and I decided to do a bootcamp. To do this I figured I'd need to write my thesis early, so I spent six or seven weeks head down during the summer of 2016 and got the thesis written up as soon as I could, so that I could take five weeks out to do this bootcamp, Science to Data Science, S2DS. At the bootcamp you get put into different teams and you're asked to complete a project for a company. We weren't paid to do this; I had to pay a hundred pounds for the privilege of doing work for a company. But it gave me some experience to put on the CV, and a network as well, and it actually helped me get my first job, so I shouldn't complain about it. It was a hundred pounds well spent. I won't play the video, but I wrote an article about this, and the slides and all the links I've mentioned will be available along with this recording, so you can watch it in your own time; it's linked, and it's on YouTube if you want. But I hate watching myself back, so I'm going to skip it. After the bootcamp I interviewed, found a job in the end, and ended up as a data scientist. I'm happy to answer questions on that process at the end, but for now I'm going to skip it and talk about applications of data science and machine learning in industry.

So, what I've covered so far: a brief introduction to myself, one slide, and then how and why I got into data science. What I want to do now is talk about some of the ways in which data science and machine learning are applied in industry. An important thing to note here is that it's far from an exhaustive list. Of course, that sort of statement is used all the time, but it really is far from exhaustive, because I don't know all of the applications; I've not been around long enough to have seen them all. So I've given a list of things and projects that I've been involved with.
So here are some of the machine-learning-specific projects that I've been involved with. First off, at the S2DS bootcamp I talked about, I was doing price optimisation for different products. It was for a company called The Parts Alliance, who distribute and sell car parts, and our project was to work out how we could price their products so as to maximise revenue. That was the first project I ever got involved in.

Then, when I got my first job at a company called News UK, I got more NLP, natural language processing, style projects, because I was working with text. We looked at automated topic tag extraction. This is things like: here's a news article, who's in it and what's it about? Should I extract Theresa May, and is it about her? Is it a sports article? Things like that, trying to get a model to automatically extract those tags from articles. Largely this was a project because they hadn't actually tagged their articles for years. I can't remember how many articles, but we're talking hundreds of thousands, if not millions, of articles that were untagged, and so they were essentially unsearchable; they're not easy to find if they haven't got the metadata attached.

There was a bit of anomaly detection. And then possibly the most ambitious project I've ever been involved with, and that includes all of my time since, was an automated fact checker for news articles. Safe to say that project was not completed, and I'll talk about some of the projects that didn't get completed later on. The idea here is that when a news article is written by a journalist, it has to be fact-checked by sub-editors, so you've got people who go through the article and try to find sources online to verify those statements. I remember one example where it went wrong: an article published by The Times where I think NASA had sent a probe to Jupiter, and with the numbers they'd given, they'd basically said the probe would reach Jupiter in something like 40 days, which is ridiculously quick, travelling way too fast. That's the sort of thing you're trying to find. So can you write a machine learning model that would scan text, work out which parts of the text are actual factual statements, and then run automated queries to find valid sources to check each statement for its factual validity? That was a really tough project. We got somewhere with it.
But a lot of the natural language processing tools that are state of the art now weren't available when we were doing this back in 2017. I know 2017 doesn't sound that long ago, but in natural language processing terms, at least, it's an age away. So we weren't able to do it with enough precision.

Then I moved on to Deliveroo, and we started looking at things like compensation abuse. I'm assuming people are familiar with Deliveroo, but I'll talk a bit more about it after this, because that's the deep-dive project I'll go into. It's basically an app that you can order food with, and you can claim compensation if the food is cold or items are missing. Sometimes the people who do that aren't actually being genuine: the food arrives absolutely fine and they just want the cash. Surprise, surprise. So you're trying to detect that. There was automated menu classification: is this menu Italian, has it got pizza on it, or is it Mediterranean? We tried that. And restaurant ranking and recommendations, which is probably the project I spent the longest amount of time on, probably a year and a half, I think, so that's the one I'm going to do a bit of a deep dive on. And now that I've moved to Spotify, I'll be working on search relevance; I'm in the search relevance team at Spotify. For those that don't know what Spotify is, it's a music streaming platform, so you can go on and stream music, and when you use the search functionality you get results. I'm on a team that works on making those results more relevant. So that's what I'll be doing.

So that's a bit of an overview of some of the projects I've been involved with, and those are specifically the machine learning projects. As a data scientist, not everything you do is machine learning. In fact, most of the stuff you do is not machine learning. So here are some of the non-machine-learning projects I've been involved in. Often it's about providing data: people just want to see the right graphs. You've got people who want data because they want to make informed decisions, so dashboards are absolutely huge in industry. There will be people looking at figures on dashboards to make decisions about the work that they're going to do
and what actions to take next.

So one thing was this bot. The editorial team at The Times didn't want to go through dashboards; they wanted to be able to just ask questions. So what I did was build a little Slack bot. Slack, for those that don't know, is a messaging tool: imagine you have a load of group chats on WhatsApp and the ability to organise all those groups and message different people in the company; that's kind of what it's like. But Slack allows you to build bots, which are basically little programs that can post messages, extract things and share things. So I wrote a Slack bot that responded whenever an editor would write something like, "Can you give me the top 10 articles in the sports section today?" It would take that question, turn it into a SQL query, query our database and then return the results (there's a tiny sketch of the idea a little further down). And they absolutely loved it, because they wanted a natural language interface. That's what they do, they speak in words, and they wanted to use English to query things, not code or dashboards. The Chrome extension was similar: if you used the Google Chrome browser, you could just click on a button and it would bring up facts about an article.

A/B tests, for those who don't know, are basically experiments, and industry runs a lot of experiments. Say you've created one feature, you've added a button somewhere in your app, and now you want to test whether it's good. Your A group doesn't have the button, your B group does have the button, and you test to see whether you get an increased click-through rate, or uptake, or whatever it is you want to measure. I've conducted several of those.

And also lots and lots of ad hoc analysis. That's what I call what happens when someone just asks for some information. This happened a lot around Covid: when the first lockdown hit back in March, I was at Deliveroo, and things changed a lot. A lot of restaurants had closed because they couldn't be open, but a lot of people still wanted to get their essentials. So people are like, "Johnny, what are consumers demanding now?" So I start analysing a lot of search results, and I find that people are searching for groceries all of a sudden, and things like that. And that tells people, oh, we've got to get lots of grocery stores onto the app.
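Going back to that Slack bot for a moment, here is a tiny illustrative sketch of the core idea, mapping a constrained English question onto a SQL template. This is not the actual bot I built at News UK, and the table and column names are made up purely for illustration; it just shows the shape of the trick.

    import re
    from typing import Optional

    # Hypothetical table and column names, for illustration only.
    SQL_TEMPLATE = (
        "SELECT title, page_views FROM article_stats "
        "WHERE section = '{section}' AND published_date = CURRENT_DATE "
        "ORDER BY page_views DESC LIMIT {n};"
    )

    # One recognised question shape: "top N articles in the X section today".
    QUESTION_PATTERN = re.compile(
        r"top (?P<n>\d+) articles in the (?P<section>\w+) section today",
        re.IGNORECASE,
    )

    def question_to_sql(question: str) -> Optional[str]:
        """Turn a recognised editorial question into a SQL query string."""
        match = QUESTION_PATTERN.search(question)
        if match is None:
            return None  # a real bot would ask the user to rephrase
        return SQL_TEMPLATE.format(section=match.group("section").lower(),
                                   n=int(match.group("n")))

    print(question_to_sql("Can you give me the top 10 articles in the sports section today?"))

The real thing, of course, sat behind Slack's bot APIs and posted the query results back as a message, but the question-to-SQL mapping is the essential idea.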
And indeed you may have seen that Deliveroo, along with, I imagine, Just Eat and Uber Eats and the other competitors, have since brought on different grocery stores and convenience stores, like Carrefour and Marks and Spencer, and even maybe a local petrol station. So yeah, there are lots of things like that.

That's a sort of overview of some of the things that I've done. On to the two remaining sections now. This one is going to be a deep dive, talking a bit more about a particular machine learning problem that I worked on at Deliveroo, and then in the last section I want to talk about some general reflections that I've had. I've got about 20 or so minutes, and then that gives us 15 minutes for questions, so I hope that's going to be enough. Cool.

Just so you're all aware, the slides that I'm going to show you: I actually made this presentation at Deliveroo, and they're available online. I've given this talk multiple times, and the video of the full talk is online too; what I'm going to do here is a section of it. So it's all publicly available already, I'm not showing you anything that you have to keep to yourself, and I've given links to the slides and to videos of the full talk. If you're interested in learning more than what I'm about to tell you, then you can go and hear about it there. That talk is just me talking about how we do restaurant ranking at Deliveroo.

Cool. So first things first, again for those who haven't used Deliveroo before: Deliveroo is largely a software platform. It's an app, essentially, that connects different entities, and we call it a three-sided marketplace. You've got restaurants, and they're connected to consumers through riders that deliver food to those consumers. On the restaurant side there are tens of thousands of restaurants, and there are tens of thousands of riders. And what you may not know is that Deliveroo doesn't just operate in the UK. At the time I left, in September, it was in 13 countries; when I wrote this it was 14, and they pulled out of Germany sometime in 2019. So they operate globally. That's a quick scan of how Deliveroo operates. And if you haven't used the platform, this is what you see when you open the app: you've essentially got a list of restaurants, and we want to be able to show those restaurants in an optimal order.
So how did I get to that? I joined a team. Well, I didn't start the team, but I started in a team that was created in October 2018 called Merchandising Algorithms. It was newly formed, and our initial goal was going to be: let's present the most relevant restaurants to the consumer at the top of the feed. So you open the app, and given who you are, what you've ordered before and what we have available, what should we show you at the top that you will most likely want?

Bear in mind that this goal is a business problem; there is nothing machine learning about it. And this is one of the things you'll end up doing if you decide to do data science: you need to take a business objective and then decide, can I solve this with a machine learning approach, or does it need something simpler? Is it even something I can solve? Often the people asking the question don't know and don't care. At this point they just think you're a wizard: you've got the data, you can solve any and every problem. You have to work out whether you can solve the problem and give them the answer, whether they like it or not. But this was one where we believed machine learning was a good candidate, so we created a model to rank those restaurants.

Now, bear in mind I've said: given a list of restaurants, we want to present the most optimal ordering. What do you mean by optimal? What we've said here is that we want to rank in order of relevance to the consumer; that was our definition of optimal. Bear in mind this can change. The business might say: I don't care about relevance to the customer, I care about profit, so can we rank such that we get the most profit? Or can we rank so that it's the most fair, and we get the widest distribution, the biggest spread of orders across different restaurants? So you have to define what you mean by optimal. And again, when someone presents the question to you from the business and says "just present these restaurants optimally", they often don't define what optimal is, so it's up to you to make sure you define the problem space. It's just one of the things that, as a data scientist in industry, you learn how to do. The machine learning and the data science is the technical stuff, but there are also the soft skills and the business skills that you have to develop as well.
So we've set that optimal means we want to rank in order of relevance. But how do we quantify this? It's all qualitative and wishy-washy; how can I quantify it so I can build a machine learning solution?

The first thing we want to do is say what relevance means and how we can measure that something is relevant to the consumer. These are what I call online metrics, and they are often called online metrics. What I mean by online is that users see these in reality: we're going to show a ranking of restaurants, users are going to see that ranking, and then they're going to place an order, or not. This is in contrast to an offline metric, where you might be running a machine learning model locally on your laptop and so you have to measure something different; I'll talk about offline metrics later. For online metrics we talk about order volume. This is just a proxy for something being relevant: we're assuming that if order volume goes up with my ranking, then the restaurants that were ranked highly were probably more relevant. That's the proxy. And we also had session-level conversion. We've since stopped using it, and I won't go into the details, but it's to do with the fact that it's a ratio and it's hard to interpret changes in that number. We did initially look at it, though.

So, framing the problem: if you've done some machine learning before, you'll know that you need what in machine learning speak is called a target variable. Here our target is: given a list of restaurants, which restaurant did the user purchase from? I've written "converted" here; we often say convert, so if I say order, purchase or convert, I'm using them interchangeably. On the left-hand side we have one session, and a session is just, say, I open the app, I have a look at some stuff, I don't like it, I close the app. That's one session, and it's a session that did not convert, because I didn't purchase anything. But then in the evening I open the app again, have a look, and this time I do buy. That's a session that converted, and it's a separate session from the first one. Also, sessions can be from different users. In the session on the left, someone converted on the restaurant in position two, whereas in the session on the right, someone converted on the Bagel Factory in position one. And that list can be long: in central London the list is in the thousands, and it's the same in Paris
285 00:29:27,540 --> 00:29:35,430 It's it's huge because there's so many restroom options. So you decide to frame this as a classification problem. 286 00:29:35,430 --> 00:29:41,940 I won't go into the technical details here. As I said, you can you can watch the video and talk a little bit more about it. 287 00:29:41,940 --> 00:29:47,190 But the idea as a classification problem is what we'll do is we say take a restaurant. 288 00:29:47,190 --> 00:29:52,380 How likely is the user to purchase from that restaurant? And then we'll give it a school. 289 00:29:52,380 --> 00:29:56,250 I'm going to use score and probability interchangeably again here. 290 00:29:56,250 --> 00:30:04,680 But if you're technical and you know that often with different classification models, you don't get a well calibrated probability. 291 00:30:04,680 --> 00:30:07,140 So it's probably better that I use the word school, 292 00:30:07,140 --> 00:30:15,570 but I have used probability here just because people find it easier to say what's the probability that someone's going to invent again? 293 00:30:15,570 --> 00:30:22,620 Sometimes in the business and industry, you end up using words because it's easier for people to understand. 294 00:30:22,620 --> 00:30:27,450 But yes, essentially it's a school. How? Let me go out to convert between zero and one. 295 00:30:27,450 --> 00:30:33,780 And we can use the logarithmic lost function, which is on the right to train the model. 296 00:30:33,780 --> 00:30:39,420 So the idea here now is we've got our target variable. Are they going to convert on a restaurant or not? 297 00:30:39,420 --> 00:30:45,170 Now, I need to find out what the dependent variables are all in machine learning because those features. 298 00:30:45,170 --> 00:30:51,920 So on the right, you can see you reduce features like how long will it take for the for the order to arrive? 299 00:30:51,920 --> 00:30:56,370 What is the popularity of the restaurant? Did they get a lot of orders in the last 30 days? 300 00:30:56,370 --> 00:30:59,850 What's the restaurant rating? Does the restaurant have an image on the up? 301 00:30:59,850 --> 00:31:08,090 Sometimes they don't have images in all of these things are factored into whether someone will purchase from the restaurant or not. 302 00:31:08,090 --> 00:31:16,470 Yeah. And then there's some function, some machine learning model that will take those features and outputs some school. 303 00:31:16,470 --> 00:31:24,600 The important thing, first stop simple and iterate. You know, we didn't start off with a machine learning model. 304 00:31:24,600 --> 00:31:31,770 It was just a weighted average of the restaurant's popularity and the estimated time of the order arriving. 305 00:31:31,770 --> 00:31:42,150 And this started simple allows you to build infrastructure to to like what we say we serve that rankings are actually present the ranking to users. 306 00:31:42,150 --> 00:31:50,790 And I'll talk a bit more a bit later about what I mean by like the serving infrastructure around it, because that was actually very important. 307 00:31:50,790 --> 00:31:57,420 But once we were able to do that, we built the infrastructure. We did then look at using different models them to logistic regression. 308 00:31:57,420 --> 00:32:02,910 And then we started using more complex models a bit later, like neuro networks. 309 00:32:02,910 --> 00:32:05,100 But then when we went to evaluate these models. 
But then we went to evaluate these models. Before we actually present them to users, we want to evaluate whether the ranking algorithms we're creating are actually any good. These are what we call offline metrics: metrics that we measure on a laptop. And bear in mind, I should say, before I started this project I'd never done any ranking at all in my life, in a machine learning sense. So if you're looking at some of this stuff and thinking "what is going on?", I too was in that same position. Don't feel like all of this has to make sense, because it only started making sense to me two years ago, when I began looking at these things. Essentially, with all of this stuff I'm talking about, you guys are already at a world-class university; it won't take you long to learn and do. And that's largely what I do: I don't know how to do something to start off with, and then I just learn on the job.

But yeah, there are a bunch of these offline metrics. The one that we used is called mean reciprocal rank, and here's the way it's calculated. I've got five columns; just look at the column on the far left. In that column there are five different rectangles, which we call restaurant cards, and I've written "converted" on the card that's in the fourth position down. So its rank is four, and its reciprocal rank is one over four. I do that for every single one of these columns, every single session, so now I've got a bunch of reciprocal ranks, and I can take the mean of those. That is called the mean reciprocal rank. It's an evaluation metric used in information retrieval, or ranking, and it's a number between zero and one: if you place the converted restaurant at the very top of the list in every session, then your mean reciprocal rank will be equal to one. So you want to get it as close to one as possible, because that suggests you're permuting the list, changing the list, such that the most desirable thing is highest up. That's the idea.

So that was essentially the centrepiece. In terms of the workflow, how did we do this? Well, as I said, we needed to get the data; we needed to pull all the sessions. We have a data warehouse, so I write a bunch of SQL to extract that data from the database, and write all of the rest of the code in Python. Then, to validate the models, we train several models, and each single model gives one permutation of that list.
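Before moving on, here's a quick concrete illustration of the mean reciprocal rank calculation described above, with made-up session data rather than the real pipeline: given, for each session, the position at which the converted restaurant was shown by a model, MRR is simply the mean of 1/rank.

    def mean_reciprocal_rank(converted_ranks):
        """converted_ranks: for each session, the 1-based position at which
        the restaurant the user actually converted on was shown."""
        return sum(1.0 / rank for rank in converted_ranks) / len(converted_ranks)

    # Five sessions: the converted restaurant appeared at these positions.
    # The first matches the slide example: converted at rank 4 -> reciprocal rank 1/4.
    ranks = [4, 1, 2, 1, 3]
    print(mean_reciprocal_rank(ranks))   # (0.25 + 1 + 0.5 + 1 + 0.333...) / 5 = 0.617

A perfect ranker, one that always puts the converted restaurant first, would score 1.0 on this.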
So I can calculate the mean reciprocal rank over all of the sessions for one model, which gives me one number, and then I can do that for all the different models, and the best one is the one with the highest mean reciprocal rank. That's the model we then choose to go into production. (If you listen to the rest of the full talk, I explain why that was not such a good idea, and why mean reciprocal rank in our case wasn't the best evaluation metric, but I won't go into that here.) Once we've chosen that model, we then run the A/B test: half of the people, in group A, get the original ranking; the other half, in group B, get the new ranking; and then we test to see whether order volume went up, or session-level conversion went up. If it did, great, we roll out the new model; if not, we need to think again and iterate. And this is a completely iterative process. Even if we do well and get a successful experiment, there are probably features we missed, or a new, better model we could use. So you can keep going through this cycle, picking up different projects elsewhere in the team, but this is a particular thing that keeps going.

Then I'd say this is current work; back then it was current work. We started looking at more complex models, and on the right is a wide and deep neural network, which we implemented and which is currently in production at Deliveroo at the moment (a rough sketch of that kind of architecture is below). So that's a sort of deep dive into one of the projects I've done. As I said, there's more information in the video, or feel free to ask me questions and I can talk a bit more about it in detail if you're interested.
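For context on what "wide and deep" means here: it's the general architecture from Google's wide and deep learning paper, where a linear ("wide") part over simple or crossed features is combined with a small feed-forward ("deep") part over dense features, and both feed a single sigmoid output. The sketch below is just that generic idea in Keras, with made-up feature sizes; it is not Deliveroo's actual model.

    import tensorflow as tf

    n_wide, n_deep = 20, 8   # illustrative feature counts, not the real ones

    wide_in = tf.keras.Input(shape=(n_wide,), name="wide_features")   # e.g. crossed / one-hot features
    deep_in = tf.keras.Input(shape=(n_deep,), name="deep_features")   # e.g. ETA, popularity, rating

    deep = tf.keras.layers.Dense(64, activation="relu")(deep_in)
    deep = tf.keras.layers.Dense(32, activation="relu")(deep)

    # The "wide" part goes straight into the output layer alongside the deep representation.
    combined = tf.keras.layers.concatenate([wide_in, deep])
    score = tf.keras.layers.Dense(1, activation="sigmoid", name="conversion_score")(combined)

    model = tf.keras.Model(inputs=[wide_in, deep_in], outputs=score)
    model.compile(optimizer="adam", loss="binary_crossentropy")  # i.e. log loss, as before
    model.summary()

The output is the same kind of conversion score as before; the model is just more expressive than the logistic regression.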
But now I want to talk a bit about the general reflections I've had from working in data science. For me this was the most fun bit about writing this presentation: thinking about what I actually think about my time.

First off: machine learning models need to be in production to provide value. If I just train a model and it sits on my laptop, great, that's cool, I've learnt something, but it's not going to do anything for your company unless users are using it or people are making decisions off it. It's not providing any value. And actually getting machine learning models into production is not an easy thing to do. At the time I wrote this presentation, a couple of weeks ago, I was in the middle of reading an article, and in the first paragraph it states this: VentureBeat reports that eighty-seven percent of data science projects don't make it to production, and another source says it's 90 percent, although the second one talks specifically about machine learning projects rather than data science projects. I've linked the sources at the bottom; the Stack Overflow blog link is the article I was reading, and I've also linked the two separate articles in case you're interested. This is a really, really big problem. I think a lot of what people read about, and what I had thought about data science, was: I'm going to build machine learning models, I'm going to be doing these neural networks and all this cool stuff. But most of the time it doesn't make it to production, and so it becomes useless. And so, as I said before, learning whether you are able to answer the question someone from the business has asked you is a huge skill, because if I just say yes all the time, it's going to end up with most of my projects failing.

And I mentioned the serving infrastructure briefly and said I'd talk about it. This is a figure from a paper published by Google, I think from 2015, and what you see in the middle is a small black box of ML code. That is the amount of machine learning code required to get something into production. But there's a whole bunch of stuff around it that needs to be built in order to actually make your machine learning model, or any data science product you build, useful and in use in production. And often data scientists themselves don't have the skills to do that. I don't; I didn't. This is one of the reasons I wanted to switch, and the reason I switched to machine learning engineering: because I want to learn the skills to get a bit more into all of the other parts of the infrastructure, so that I can be less reliant on other people to get models into production.

So I went back to the seven machine learning projects I mentioned I was involved with before, and the ones with the green circles (sorry if anyone is colour blind; that's why I made the shapes different as well as the colours) are the only two machine learning projects I've done that have actually made it to production. All of the other ones haven't made it to production, and so haven't provided value, other than perhaps the learning that we can't do this, or that this was not the right time.
So, yeah, I can talk about all this cool stuff, but whenever you go to conferences and things, you get that evil person in the audience who, right at the end, asks the question: did it go into production? Is it in production? That's when you see the speaker start to squirm, because they've talked about all this cool stuff, but it didn't make it there. So that's one thing.

Second reflection: the value is not in the output of the model, it's in the decision or action that it informs. Take, for example, the project I was doing on compensation abuse. We were meant to build a machine learning model to work out whether a compensation claim was abusive or not. That model wasn't going to make the decision. It was going to give a score, which would go to a customer service representative, who would ultimately make the decision based on other things as well. He or she (I say she because we were working with a lady when we were doing this), but it was them who was going to make that call. So with a machine learning model, it's not the output of that model specifically, it's the action that's taken or the decision that it informs; that's the important bit. The model itself doesn't have intrinsic value. So when you're doing a project, you have to figure out: what is it going to be used for, how is it going to be used, and who's going to use it?

Next reflection: baselines can be hard to beat. A lot of the time you see these articles about deep learning models being the best solution and coming out as state of the art. Often that's in certain cases, particularly cases with unstructured data like images and text, whereas for a lot of structured, tabular data, a linear model will, if it doesn't beat a deep neural net, do basically as good a job. And we found that: up until this year, the dispatch algorithm at Deliveroo that decides when to dispatch riders to go and pick up an order was a linear model, just a linear regression model, and that does the trick. It does the job, and it does it very well.

And sometimes it's the case that you don't even need a model; you need a human. When I worked at News UK, for The Sun, we were trying to work out which articles to promote on social media. Now, if anyone's read The Sun, and I don't know how many of you do (I don't, I still don't), but yeah,
I talked to one of the editors, and she was like: nudity and celebrity deaths. As soon as that's in an article, it's going to blow up; I don't need a model to tell me that. And she's right. So that's the kind of thing we're talking about: think about when a model is actually needed.

Plan how you're going to show that you've added value. You can release a machine learning model, or some sort of data science product, but ultimately someone has got to be convinced that it's actually doing the right thing, because as much as we might be quantitative and influenced by data, there are a hell of a lot of people in industry who aren't, and sometimes they just need a story, or the right sell. That was the case at News UK in particular. Deliveroo and Spotify use a lot of A/B tests, so there we do actually test whether something is good, but at News UK we didn't do any of that; we just needed the right person to be convinced, and so it was about how we were going to sell the story.

And the last thing I'm going to cover, and I'll be brief here because I want to leave time for questions, is that data science is just an incredibly broad term. There are so many different skill sets that come with being a data scientist. I forgot to put it on here, so I'll add it afterwards: there's a link to the article that I actually got these definitions from. But yeah, as a data scientist, sometimes people think you'll know absolutely everything. I worked at Deliveroo as a data scientist in algorithms, so I'm not an expert in inference and causal relationships, nor am I necessarily an expert in analytics, building the right dashboards and knowing what new metrics to create. They are different skill sets, and I think knowing your type of skill set, and knowing that you don't have to be that broad, is an important thing.

So, yeah, I'm just going to summarise there. I talked about how and why I went into data science, covered some projects I've done in the last four years, and then talked a bit more about the ranking problem and some reflections. I hope that has been somewhat useful. I know it's not technical, but that was my intention: not to give an in-depth technical talk, but to give you at least some reflections. Yeah, and that's the end.