So, yeah, once again, it gives me great pleasure to welcome Katcher Volkova Volkmer from Roche Basel, and she's going to tell us about how deep learning is used in biomedicine.

Good afternoon, everyone. It's a pleasure to present to you today. I did a similar lecture about a year ago, but back then still in person, and it was quite a bit more extensive: three hours of lectures and three hours of practice. So I had to condense the material a lot, and I also updated it.

But just a little bit about me. I was born in Russia many years ago. I started with linguistics and English studies after school, got slightly bored after a couple of years and moved to Tübingen in Germany, where I studied computational linguistics. While I was doing my computational linguistics studies, I got really interested in neuroscience, so my PhD was in cognitive neuroscience at the Max Planck Institute for Biological Cybernetics. After I finished my PhD, I moved to the UK and worked there as a data scientist for a few years at two very nice companies. I really learnt a lot there, but towards 2018 I started missing the continental life again and was looking for opportunities in mainland Europe, and I was again very, very fortunate to get the role of senior data scientist at Roche, in the Pharma Research and Early Development informatics department. All of my work is currently with the digital biomarkers group, and in particular our focus is on Parkinson's disease. I am the so-called data analysis lead there, and I do a lot of analysis of human behaviour when people with Parkinson's perform so-called active tests on the smartphones that we provide them, which accompany clinical trials.

But enough about me. So what's the plan for this lecture? We only have an hour; unfortunately, I wish we had a whole semester, I could tell you much more. I want to go through four major points. What is deep learning at all — what is it and what is it good for? We will learn some basics so that we can then go into a bit more depth on two flavours of deep learning: convolutional neural networks and graph convolutional networks. There are many more flavours, but we clearly don't have the time for all that.

One note: there are going to be a lot of extra links on almost every slide. Some of them are just sources for images that I stole from very nice resources; some of them are links to resources where you can go and learn more in detail. They are all spelled out, because I don't want you to worry, like, oh, if I click on this link, where is it going to take me?
I've checked them all, because some of them were already there a year ago; I checked them all this week, they are all live, and some were even updated. So I really hope you will find this material helpful.

Right, before we go into deep learning itself, as a machine learning practitioner I want to emphasise some practical advice. First of all, probably many of you have heard that all models are wrong, but some models are useful. When we apply machine learning, or even just statistical models, we're trying to explain the data we observe in the world, and we try to find patterns that are useful. Any model will always have some bits and pieces of data that are not explained, but we hope that it's just noise and not important. So we care about the models that are useful — and how do we build them?

Another very frequent phrase you hear from machine learning practitioners is garbage in, garbage out, meaning that, well, we want our model to represent, to explain and predict a certain phenomenon; if the data we're training the model on is not representative of that phenomenon, the model will be useless as such. So be really, really careful; this is a very unforgiving rule, especially in deep learning.

Then again, when we talk about deep learning in particular, many people treat it as some kind of magical tool, partially because it's really hard to interpret. But the usual machine learning pitfalls like overfitting and bias apply to deep learning as well. It's not a silver bullet, so you have to be very watchful there too.

And since I mentioned that deep learning is harder to interpret: this is kind of part of the so-called "no free lunch" principle. If you build a very simple, very easy to interpret model — for example, you have three variables and you're predicting some outcome — it's very, very clear, right? You look at the coefficients, you have their confidence intervals, and you can say: OK, this feature is doing this, this feature is doing that. Lovely. In deep learning you have thousands, maybe even millions of parameters, and it's impossible to interpret them all. Your model can be much more powerful for very complex data, but you lose this interpretability. There are tools that help us interpret deep learning models as well, and I think it's everybody's responsibility and duty to apply them, because it's very important to understand what the model does.

Another piece of advice I would like to give you: don't be just a number cruncher; really understand the problem you're trying to solve and the data. In the early days of data science,
I've often heard opinions like, oh, well, it doesn't matter that you don't know genetics — which I don't, I'm not a biologist — you will just do your number crunching and everything will work out. This is really dangerous, I think. So at least, if you're not an expert in the field where the data is coming from and the problem you're trying to solve, make sure that you have a very strong connection to somebody who is, to an expert, and then you can always show them some intermediate results so that they can help you understand the problem better.

And then there is another aspect of machine learning. Oftentimes, especially in research but also in industry, models are built to interpret the data, not necessarily to predict, and that's fine. But if you want your model to live on some server or in some app, that's called a machine learning model in production, and it goes into the area of MLOps and operationalisation, and that calls for model maintenance. Which means that even if you're very certain that your model is really great and performs well, please continue testing it over time on new data — there can be a shift in what the true data for this phenomenon looks like — and make sure to analyse and fix the errors. Otherwise, over time, your model will become useless.

And I have one very, very forceful piece of advice here: please watch this course at least twice. It's from Andrew Ng, on Coursera. It's really, really great, and it is tailored towards deep learning, but the things you can learn there are applicable to other machine learning flavours as well.

OK, this was a long slide, but I think it's really, really important and I didn't want to skip over it. Now we can finally start our journey with deep learning. So what is deep learning? It's a set of methods that really took off in the last ten years. It's a subset of other machine learning algorithms and tools; machine learning is a bit older. It took off thanks to three factors. I mean, the methods themselves, or at least their philosophy, have been around for quite a long period of time, but there was not enough compute power to make them scalable, and there was not enough data to train these models on. So it was really with the dawn of big data and faster computers that deep learning finally took off.

Machine learning itself is in turn a subset of artificial intelligence. Not all artificial intelligence needs machine learning to perform, but many, many modern algorithms do, and many of the ones that are now in the news
often rely on deep learning. And I think a big distinction between so-called classic or old-school machine learning and deep learning is the fact that in a typical old-school machine learning project, you as the machine learning practitioner would do the feature extraction manually, on your own: you receive a data set and you start looking at it, plotting, doing exploratory data analysis, and you think, well, maybe I'll add a few more features here, encode this data this way, extract extra things, merge new datasets together — and then you perform your classification. Deep learning does the feature extraction for you, because for the typical inputs it works on, like images or free-form speech recordings or text, it's almost impossible to come up with a very useful set of features for all cases. So this is also, I think, a very important distinction between classical machine learning and deep learning.

When we talk about people who contributed to the rise of deep learning, there are of course many more than just these three very smart gentlemen, but I wanted to highlight them in particular because a couple of years ago they actually received a Turing Award for their contributions to deep neural networks. They are still very active in the field, and, for example, I really like that they even challenge their own old ideas. So they still contribute, and I really advise you to learn more about them if you have the time.

Medicine is very close to my heart. I fell in love with digital health after I finished my PhD, and I think I'm really, really fortunate to work in this field now. Which is why, of course, when I want to update myself on what deep learning is good for, I first and foremost look at how it can be used in medicine — and the uses are plentiful. It can help in diagnosis: for example, you have an image of some cells, like a biopsy, and you want to figure out whether it's cancerous or not. But medical imaging doesn't have to be just about diagnosing. Maybe you already know that somebody has Parkinson's; you can take brain scans over time, and again deep learning can help figure out how the disease is developing. Deep learning is also supporting clinical trials. This is somewhat more challenging, because in clinical trials you have to be absolutely clear and very, very confident, and, like I mentioned before, deep learning is sometimes hard to interpret.
But nevertheless, in some aspects deep learning already helps. And last but not least, especially in the last couple of years, deep learning has been shown to be an amazing tool for drug discovery, which is otherwise a very laborious, very expensive process. So it could be that deep learning will really be a tool to solve some huge bottlenecks in this area.

So these were the areas. Well, who are the companies that are very active in deep learning for medicine in particular? Some of my favourites, from when I actually lived on the island, are Babylon Health; they are helping to scale general practice health care in the UK. I'm not sure whether their model falls into the class of deep learning, but they have a very clever Bayesian network that has passed the doctors' GP exam. So if you give it symptoms, and it keeps asking interactively — do you have this symptom, what about that symptom? — it can diagnose your disease pretty reliably. Then BenevolentAI has shown some breakthroughs in drug discovery recently. DeepMind has its own health department, where they also try to optimise certain aspects of health care. And of course there are the big pharma companies like Roche, where I work, Novartis, AstraZeneca — they have been expanding their staff in terms of machine learning skill set. And last but not least, there are the usual tech giants like Google, Apple, Amazon, IBM and Philips; they all have very strong in-house departments working on health care.

Right, so deep learning has been going out and doing great things. Are there any horizons it still has to reach? For me, the most interesting topics, where I hope the next breakthroughs will take place, are the following. Causal inference and reasoning: even though deep learning looks like it's super wise and it can tell you whether there is a cat in the picture or not, and of course many other useful things, it's still just patterns — it's still correlation, essentially. What is often still missing is the causal link, and I really recommend you read this book, The Book of Why. It's very non-technical, it's very engaging; I think I've read it twice already, because it's just really interesting to think about — a very different way of thinking about data. Then we still have lots of problems with algorithmic bias and bias in data, and again, deep learning being somewhat more obscure, it's harder to catch this bias, but it's very important to check for it.
Then, like I said, we still need to increase transparency and interpretability in deep learning, and that in particular would help with clinical adoption, in my opinion. For example, if you want to submit a new algorithm for diagnosis to the FDA, it has to be very, very clear. I doubt that even something like a random forest would be very welcome there unless it's very clearly explained — imagine how much trouble a deep learning model would have. And a new point I added recently is meta-learning, which is a hot topic right now as well, because the resulting models are highly, highly specialised. So currently the search is for a more general learning approach: an algorithm that was trained on one thing and can then complete a different one, or learn it much, much faster.

Right, so this was just a very quick tour de force. Where can you learn more? There are online courses — I've done plenty of them during my PhD and after, and I still do them when I have the time. There are tons of podcasts. There are, of course, many, many books; I highlighted just two out of a couple of dozen. There are so many YouTube channels, and some of them are absolutely brilliant. The only YouTube channels I would not recommend are the ones where the instructor tells you, oh, deep learning is so easy, you just type these two lines of code and you're done. Please don't follow that advice; please try to understand in depth what is actually going on. There are lots of blogs, of course, and a whole class of posts on Medium — I particularly like reading Towards Data Science. And of course there are always papers. I have a more detailed list on my blog, so if you want to check it out, you're more than welcome.

Right. So let's imagine that after this talk you're super inspired and you want to learn more about deep learning, and maybe you already have a problem that you would like to solve and you have the data — brilliant. So what do you do? Do you go and program model code from scratch on your own? You do not have to do that. There are at least four platforms — and the number is always growing — that can help you build your models. They all have slightly different flavours, different advantages and disadvantages. Within my team we tend to use PyTorch, because it's very nice and friendly, but my first models I built in TensorFlow and Keras; Keras is like a very nice, friendly API on top of TensorFlow.

So, all right, we're done with the main introduction.
Let's look a bit closer at what kinds of flavours exist in deep learning. We will start with what you could call the grandpa of them all, the perceptron — the main, simplest building block of deep learning — and next we will go into the multilayer perceptron. And as you can see here, there are kind of four or five main families. There are convolutional neural nets, which mostly work on images. There are graph neural nets that work on graphs, and the two kind of have a baby together — the graph convolutional network — and that's what we'll talk about in more detail, because it's an easy step from convolutional neural nets to graph convolutional nets. But if you want to work with text or speech, then you might be better off with recurrent neural nets and long short-term memory networks, which are kind of an extension of RNNs. Some people have achieved very good results on, for example, speech recognition with CNNs, actually, so that's a way to do it too. And then a whole different beast is deep reinforcement learning, which I think is a very promising field. And there are autoencoders and GANs — generative adversarial networks — that we will not have time to go into. So, yeah, these ones are out of scope today; we just don't have the time.

There's not going to be lots of technical detail today, again due to the lack of time, but I still want you to understand in quite some detail what the perceptron does. The perceptron is just a linear model, nothing else — well, no, not exactly nothing else, but essentially it's a linear model. So you have one layer. Can you see my cursor, can you see my mouse? I haven't used this setup so much before. Yes? OK, great. So here you have the inputs, right, your features. Perceptrons, or rather multilayer perceptrons, can work on tabular data; in fact, some of my colleagues often use them as a benchmark: you get a new tabular data set, you throw it at a multilayer perceptron, you get your outcome, and then with your other models you try to improve on it — you try to find overfitting, or you try to just get better accuracy overall. So here you have your inputs, and on these inputs you apply certain weights and then you just sum them up. That's all you do, and you add your bias, which is the intercept. So in statistics these are the coefficients
and this is the intercept. Then as the outcome you get this weighted sum, essentially, and you look at whether it's above zero or not, and from that you judge whether you should output a one or a zero. That's almost the whole perceptron. It's quite old — it was there even before the 60s — and it is the building block for essentially all deep learning networks. And it can do linear separation; it can build a linear model.

Well, you might be thinking: we have the inputs, right, they are in our data — but how do we find the weights? Because depending on the weights, you could have a model that is totally off, just not accurate at all. Here we have our ground truth as dots, two classes, and then we have our predictions, and the model says, well, I'm super sure that this point belongs here, which is not true. Then, when we adapt, when we change the weights and the intercept, we can say: OK, now the model thinks this is a good fit — and here it's actually 50/50. So this is an improvement, but it could do better. So we iterate on, and I will tell you exactly how we iterate. And now, if we actually count how many points are marked blue and how many red, we see that this model is much more accurate — but it's not perfect. A perfect fit is actually a model that kind of maximises the distance from the line where it's 50/50 to your training set.

So, exactly how do we get this line to rotate, how do we move this line to find an optimal fit? In linear regression, but also in the perceptron approach, what you use is gradient descent. Here on the y axis we have our so-called cost function, or how accurate our model is: at the bottom you have no error, or the minimum possible error, and the higher up you go, the worse your model is performing. You compute this by basically looking at the distance to these dots and encoding whether they were categorised correctly or not. Initially the weights are generated randomly — they can be set to zero, but they can also be generated randomly, it doesn't matter — because once you have an outcome for a couple of dots, you can build your gradient, and the gradient, kind of the steepness of the slope, tells you how far off you are. What you want to get to is a completely flat slope: when your gradient is zero, you know that you've gotten there.
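To make this concrete, here is a minimal sketch, in plain NumPy, of a perceptron-style linear model fitted with gradient descent. The toy data are invented, and I use a smooth logistic curve in place of the hard above-zero rule so that the gradient is defined everywhere — an assumption beyond what the talk spells out; the final prediction still just checks whether the weighted sum is above zero.

```python
import numpy as np

# Toy 2-D data: two roughly separable point clouds (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(+1.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w = np.zeros(2)   # weights ("coefficients")
b = 0.0           # bias ("intercept")
lr = 0.1          # learning rate: how big a step to take along the gradient

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    z = X @ w + b                      # weighted sum plus bias, one value per example
    p = sigmoid(z)                     # smooth stand-in for the hard "above zero?" rule
    grad_w = X.T @ (p - y) / len(y)    # gradient of the logistic cost w.r.t. the weights
    grad_b = np.mean(p - y)            # ... and w.r.t. the bias
    w -= lr * grad_w                   # step downhill; near the minimum the gradient -> 0
    b -= lr * grad_b

pred = (X @ w + b > 0).astype(int)     # the perceptron decision: weighted sum above zero?
print("training accuracy:", (pred == y).mean())
```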
It's a bit more tricky, a bit more detailed than that, but the beauty of gradient descent is that you always know where to go, in which direction, and by how much you're off. So you can adapt your steps over time: you don't have to take tons of tiny, tiny steps — if you only knew the direction, then you would have to do that — because you also know roughly by how much you need to go. If you have very few data points and very few parameters, you can afford to do gradient descent as it is, but I will also talk about the trick that allows you to do it faster.

But let's first go to a slightly more complex architecture. We only had one perceptron before — well, what if we have two, or three, or 16, or 200? We still have our inputs, and we still have our weights that we're fitting, and here we have two linear models, each with their own weights and their own biases, so they can encode two different linear models. And by combining these two different models, we can actually apply classification to nonlinear situations. Imagine you actually had this kind of class dependency, with the blue dots: we basically have here one feature and another feature, and we know that blue dots typically sit in the upper half of our range on the x axis and, more or less, the upper half on the y axis — sort of an AND situation. Combining the two linear models allows us to encode this nonlinearity. And of course, the more layers we have, and the more neurons — the units in these layers are called neurons — the more detail we can add to this nonlinearity, in as many dimensions as we want. Again, this is the basis of all deep learning approaches, more or less.
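As a tiny illustration of the "combine two linear models" idea, here is a hand-weighted hidden layer of two neurons in NumPy that fires only when both features are in the upper half of their range — an AND-like region that no single straight line can carve out. The thresholds and weights are invented for the example; in a real network they would be learned.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)          # hard activation: a neuron fires only above zero

# Hidden neuron 1 checks "is feature x1 above 0.5?"; hidden neuron 2 checks the same for x2.
W_hidden = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
b_hidden = np.array([-0.5, -0.5])

# The output neuron fires only if BOTH hidden neurons fire: an AND of two linear models.
w_out = np.array([1.0, 1.0])
b_out = -1.5

def predict(x):
    h = step(W_hidden @ x + b_hidden)     # layer 1: two separate little linear models
    return step(w_out @ h + b_out)        # layer 2: combine their outputs

print(predict(np.array([0.9, 0.8])))      # both conditions hold -> 1.0 (a "blue dot")
print(predict(np.array([0.9, 0.1])))      # only one condition holds -> 0.0
```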
And a very important trick — let me just check the messages here. Sorry, OK, apologies. Let's go back. It doesn't want to go back. [INAUDIBLE] Yeah, yeah, no, it's working now, so yeah.

So another very important aspect of deep learning is backpropagation. You have your architecture — before, we had a very simple architecture with just one hidden layer; like I said, you can have multiple — and very quickly you get a fairly complex system where you have weights on every edge. You take your inputs, you apply your weights, they feed into these neurons, which activate or not depending on the outcome; then they themselves produce weighted outputs, and so on, and essentially you get to the output. For example, if you have a binary outcome, the network says: OK, for this input, for this example, it looks like I have maybe a probability of 0.6 on one class and 0.4 on the other class. But actually we know that the right answer is 100 percent on the first class and zero probability on the other. How do we correct for this? We calculate our cost function and we propagate the error back: you need to correct these weights in this direction and those weights in that direction. This is called backpropagation. The forward wave, from input to output, is called the feed-forward pass, and the correction of the weights, the adjustment of the weights, is called backpropagation. One cycle of those two is called an epoch, and usually when you're training your deep learning models you have multiple epochs: for very simple scenarios just a dozen might suffice, but for very complicated data you might need hundreds. Gradually your model will converge and adjust the weights so that any input will generate correct results most of the time.

You can already see that even with very few layers and very few neurons, we have so many parameters — all these weights. It's quite natural that with larger data sets and larger architectures, the whole training process will slow down; even with the much faster computers that we have today, it's still going to take too long. And I think the great news is that it doesn't have to take this long: you don't have to do the gradient descent step for every single input and every single feature, you can take shortcuts. Some of these shortcuts are called stochastic gradient descent and dropout, and they work on two different aspects of the neural net. Stochastic gradient descent does gradient descent, but not on all of the inputs, just a few of them at a time: it takes batches, selects them randomly, and does the gradient descent on them. Dropout does kind of the opposite: it randomly switches off nodes in the hidden layers, which means that not all of them are activated at the same time. Not only do these tricks help you train your network faster, they also make it more robust.
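Putting the last few ideas together, here is a minimal PyTorch sketch of the loop just described: a feed-forward pass, a cost function, backpropagation, and weight updates repeated over several epochs, with random mini-batches (stochastic gradient descent) and a dropout layer. The data, layer sizes and hyperparameters are placeholders of my own, not anything from the lecture.

```python
import torch
from torch import nn

# Fake tabular data: 512 examples, 10 features, binary labels (placeholder values).
X = torch.randn(512, 10)
y = torch.randint(0, 2, (512,))

model = nn.Sequential(
    nn.Linear(10, 32),   # hidden layer of 32 neurons, each a little linear model
    nn.ReLU(),           # nonlinearity between the layers
    nn.Dropout(p=0.3),   # randomly switch off 30% of the hidden units on each pass
    nn.Linear(32, 2),    # output layer: one score per class
)
loss_fn = nn.CrossEntropyLoss()                        # the cost function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):                                # one epoch = one pass over the data
    perm = torch.randperm(len(X))                      # shuffle, then take random mini-batches
    for start in range(0, len(X), 64):
        idx = perm[start:start + 64]
        logits = model(X[idx])                         # feed-forward pass
        loss = loss_fn(logits, y[idx])                 # how wrong were we on this batch?
        optimizer.zero_grad()
        loss.backward()                                # backpropagation: a gradient for every weight
        optimizer.step()                               # nudge the weights downhill
```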
Deep learning networks are amazing at overfitting: if you give them the chance, they will just memorise the whole training set by heart and basically give you perfect results on your training data while being awful and useless on any new data. Stochastic gradient descent and dropout help you make these models more generalisable. I once heard a lovely metaphor on a podcast: neural nets are very good at finding the maximum optimum, basically the perfect minima in your feature space, which often look like deep, deep wells, and once your network falls into one, it will not be able to get out. With stochastic gradient descent and dropout, it's like the model is wearing very big boots, so it cannot fall into these wells. I think it's a nice metaphor for this.

Still, with enough epochs you can always overfit, and you need to know when to stop. You can still apply good old regularisation, which you might know from elastic nets, for example — L1 and L2. But also, as you train your model, you need to watch out for gradual divergence between the training set and the validation set, because you should always split your data into train and validation, and you should also have a hold-out set, the ultimate test. As you're training your network, you measure the error on the training set to adjust the weights, and you get that error there; but you should also apply exactly these weights to the validation data set. As your model trains, the error first goes down in synchrony on both, but once your model starts memorising its examples, the error on the training set will keep decreasing while on validation it will increase. And then you know: OK, I need to stop here. That's also a very important aspect to keep in mind.

How are we doing? Oh, [INAUDIBLE]. Sorry. So this was general information about perceptrons and multilayer perceptrons. Let's look in more detail into CNNs, because I think they are a very fascinating breakthrough in machine learning, and they really managed to do something that people were not able to do before. For us as humans it's really easy — we have an amazing visual sensory system, so it's really easy for us to recognise objects. We look at this and say: OK, a five, a cat. For a computer it's not so obvious. And if we did this with the old-school machine learning approach, we would have to handcraft the feature extraction; we could come up maybe with some filters and say, well, what does a five usually look like? There's always this kind of line going from left to right,
and there's like a sharp object just above it. But then you get another five, which is all smooth, and then your filters don't work anymore. And then try to imagine coming up with all the rules that tell you that this is a cat in a mask and not a tiger, and not a lion, and not a puppy. Imagine doing that — it's nigh impossible. People have tried, but it usually failed. So CNNs, being part of the deep learning family, do the feature extraction themselves. They also solve a very important problem with images: images can be really, really large. I mean, these fives here are small images, but often, especially in medical imaging, the resolution is really, really high and you get really large images. So how do you extract information from these images in a way that condenses it and still keeps it useful?

CNNs have two big tricks here: convolution layers and pooling layers. Convolution is basically when you take a small window of pixels — maybe three by three, it can be eleven by eleven, seven by seven, you are the architect — and you apply a filter to it. Your filter has some predefined numbers, often randomly generated, and you just multiply element by element: this pixel with this value, this pixel with this value — this is not a pixel, this is just a number — and you sum them up and record the result in your next layer, your output layer. Pooling does something even simpler: it takes three by three pixels, for example, or some other patch — it's usually a square patch — and it just takes the maximum value out of those. So here you see we have six by six, and as the output we get two by two, because this patch of three by three went one, two, three, four, and each time it extracted the maximum number. It doesn't have to be the max — it can be the mean, whatever you prefer — but it helps to condense the information.

These filters are quite curious. Like I said, they can be randomly generated, or you can prespecify them if you want, and depending on how they are built, they can highlight certain features. They can detect edges: if you have higher values in the middle and lower values around, or a vertical arrangement, they will detect edges in the image. They can sharpen the image by kind of suppressing values on the outside and highlighting values in the middle. You can do blurring, all kinds of things.
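Here is a small NumPy sketch of the two operations just described: sliding a 3×3 filter over an image (multiply element-wise, then sum) and 3×3 max pooling that turns a 6×6 input into a 2×2 output. The image values are invented and the filter is a classic hand-picked edge detector; in a trained CNN the filter values would be learned.

```python
import numpy as np

image = np.arange(36, dtype=float).reshape(6, 6)   # a made-up 6x6 "image"

# A classic vertical-edge filter: high on the left, low on the right (hand-picked example).
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

def convolve2d(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * k)        # element-wise multiply, then sum
    return out

def max_pool(img, size=3):
    out = np.zeros((img.shape[0] // size, img.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i*size:(i+1)*size, j*size:(j+1)*size].max()  # keep only the max
    return out

print(convolve2d(image, kernel).shape)   # (4, 4): the 3x3 filter slid over the 6x6 image
print(max_pool(image))                   # 2x2: each value is the max of a 3x3 patch
```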
What's interesting — and I will show you a typical CNN architecture in a moment — is that by applying very similar filters from layer to layer, you are able to extract more and more complex features. At first they detect the edges in the image, then they detect more complex items, and in the third, fourth, fifth layer you will see whole objects. So this is really a very interesting property of these filters.

And this is what a typical CNN looks like. This is not a very large network. Like I said, you are the architect: you decide on many hyperparameters — how many layers to have, how to arrange them, what the size of the filters should be, how many filters to have — but still, the flow of information remains the same. You start with your image as the input, and then you do convolution — you usually start with convolution — and you highlight the important information using these filters. If you have one input image, maybe it has three channels, three colours, but you usually apply multiple filters, so you get a stack of outputs. Then the information from these layers is passed on to pooling and you condense the information, then you do convolution on this condensed information — why not — and then you can condense it further. You don't have to condense it down to one by one in a pooling or convolutional layer; at some point you just say, OK, I'm taking whatever is remaining of this and I'm turning it into a vector. And this is your last, so-called dense layer: the last vector of neurons, which will be activated, and the activation pattern will then be mapped to a certain class. Here we have just two classes, pathology and not pathology, but there have also been very successful experiments where networks were trained to recognise a hundred or more classes. I don't think we have the time for this, but I really, really encourage you to go to this interactive example, because you can actually see how the neural net works — it's really great, you can click on everything and get more detail, and you can really see how the information is flowing.
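As a rough sketch of that kind of architecture, here is a small PyTorch model with convolution, pooling, another convolution and pooling, and then a flatten into a dense layer mapping to two classes. All the sizes are arbitrary choices of mine as the "architect", not the network shown on the slide.

```python
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),   # 3 colour channels in, a stack of 16 feature maps out
    nn.ReLU(),
    nn.MaxPool2d(2),                   # condense: keep the max of each 2x2 patch
    nn.Conv2d(16, 32, kernel_size=3),  # convolve again on the condensed information
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                      # stop condensing: turn what's left into a vector
    nn.Linear(32 * 14 * 14, 2),        # dense layer: activation pattern -> 2 classes
)

x = torch.randn(1, 3, 64, 64)          # one fake 64x64 RGB image
print(cnn(x).shape)                    # torch.Size([1, 2]): one score per class
```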
Right, so what I pointed out before is that filters in higher layers capture more and more general information. We can use this property in a technique called guided backpropagation, and it's just one of the possibilities to help interpret your neural net. Because, yeah, with so many parameters it's impossible to look at every single weight and say, oh, I know what it means, I know what it's doing. Here you have to find a way to show which parts of the input contributed to the network's decision, and using guided backpropagation and similar techniques you can really see: OK, the network said that this picture is a dog, and when it said so, it was taking this information into consideration, it regarded this as important.

There is an urban myth about one early deep-learning-like experiment. The story goes that the researchers were trying to classify tanks, maybe by which country they came from, and on their training set the model was very accurate, but later on it just couldn't do it, it was a complete mess. Once they looked deeper, they realised that the background was contributing more: somehow the tanks were always in snow or in a desert, and that's what the model had learnt to pay attention to. It turns out this is very likely just an urban myth, but this kind of thing actually does happen. There is a real example where X-rays were marked, very faintly, by a clinician according to whether they showed pathology or not — just a little pen mark somewhere in the corner that nobody else noticed. Yet the model noticed it, and it had perfect performance: it didn't care what kind of image was on the X-ray, just that a mark means pathology. And then, of course, new images started coming in that didn't have that mark, because they were from a different hospital, from a different condition, and the model was completely helpless. So it's really important, I think, to pay attention to these things.

So, we've talked about how you can speed up training with dropout and stochastic gradient descent. Yet when you have very little data, for example, or you're really pressed for time and compute power, you have another shortcut, and that's called transfer learning. The trick is that whatever your dataset is — is it animals, is it cars, is it X-rays — the first layers usually learn to detect edges and just little properties of the images. And if you think these are common enough, that they are also common in your problem and your data set, then you can actually use an already pre-trained model, for example one trained on ImageNet. You can load it using the frameworks that I've told you about, load it into your system, and use the weights up to a certain layer: for example, for the first four layers we say, OK, we don't want to train those, we just cut them and transfer them into our model, and we use them and then continue the training further on. That shortens your time considerably, and the results are still very good — to the point that when I once mentioned that I wanted to train a model from scratch, people were like, why would you do that? Transfer learning works just fine.
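A minimal sketch of that shortcut in PyTorch, assuming torchvision is available: load an ImageNet-pre-trained ResNet-18, freeze its existing layers, and train only a newly attached final layer for a two-class problem. The choice of ResNet-18 is mine, and the exact `weights` argument depends on your torchvision version (older releases use `pretrained=True` instead).

```python
import torch
from torch import nn
from torchvision import models

# Load a network already trained on ImageNet (weights argument depends on torchvision version).
model = models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():
    param.requires_grad = False          # freeze: keep the pre-trained filters as they are

# Replace only the last, dense layer with a fresh one for our two classes.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's weights get updated during training.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)

x = torch.randn(4, 3, 224, 224)          # a fake batch of four RGB images
print(model(x).shape)                    # torch.Size([4, 2])
```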
All right, so that was a very quick introduction to CNNs, and we happily still have some time to talk about graph neural networks, because this is a very exciting area. They really took off just a couple of years ago, there are new flavours being born every month, and the applications are really fascinating. But first, let's talk about what a graph is and how graphs are different from other inputs. We know already that CNNs and RNNs can work on images or text or speech — but what if you have a graph? Here we have a nonsense graph that was generated randomly: we have five nodes and they are connected, and the connections are actually directed, so you can go from node one to node zero, but not back — we don't have an arrow that goes back. Not all graphs have to be so-called directed; it can be just a link, and then it can go both ways. And it can encode anything. It could be, for example — let's make up a story — that node one is a surgeon and node zero is a patient. Or, I don't know, this is a symptom and this is a disease, or vice versa; or this is a drug and this is a side effect; or these are two proteins interacting directly. You can encode so many things. But how would you do deep learning on them?

Graph theory itself is very, very old, and even without deep learning we can do amazing stuff there. We can count neighbours — this is the adjacency matrix, so you can see which nodes are connected. You can do community detection; in one of my previous roles that actually was a very important aspect, on Twitter, for example. But if your graph is really, really complex and you want to do very subtle things with it — like maybe you want to assign classes to some nodes, and you have labels for most of them but not for all — how do you do this with these traditional methods? It's pretty tricky, and for a long time this was a real problem, because a graph is not structured in a regular way. It's not an image with its pixels on a grid, and it's not a text where every word follows another word.
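To make the structure concrete, here is a tiny NumPy sketch of a five-node directed graph and its adjacency matrix; the particular edges are made up, not the ones from the slide.

```python
import numpy as np

# Made-up directed edges (from, to): node 1 -> node 0 exists, but 0 -> 1 does not.
edges = [(1, 0), (1, 2), (2, 3), (3, 4), (4, 2)]

A = np.zeros((5, 5), dtype=int)   # adjacency matrix: row = source node, column = target node
for src, dst in edges:
    A[src, dst] = 1

print(A)
print("nodes reachable in one step from node 1:", np.nonzero(A[1])[0])
```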
So how do you do convolution on something like this, for example? And you don't have to do convolution — not all graph neural nets do graph convolution, that's just one flavour — but it was a big challenge. I think what helps here is to think about what an image is: a grid. And what do we do in convolution? We just take information from these nine pixels, for example, depending on what our kernel is, and then we update it, we turn it into one value. It turns out you can do the same on graphs, but instead of having a predefined grid and always needing nine, or some other square number, of nodes, you just say: well, I have this node and I have its neighbours. What you need, though, is that every node has some features. Say it's a patient, for example: then you have their height, weight, age, blood pressure, temperature, and you need to make sure that every node has these features, and in the same order. And what you do when you want to do graph convolution is, as a first step, take information from each of the neighbours of the node in question — extract the features from those neighbours — and then apply some mathematical function; let's choose averaging, for example. Then you take the features on the node itself and, for example, average them with the averaged information from the neighbours. So essentially you're sloshing information around, because you do this for every node.

What would that be good for, though? Why would we need to propagate this information around — it's called message passing? It's useful in case you have labels on those other nodes but not on the one you're interested in, and you want to classify this node: is it a patient at risk, is it a fraudulent account? Of course, what's very important in this case is that your graph, the connections, actually make sense — these connections cannot be random. In an image it's dictated just by the position of the pixels, and then it makes sense; imagine you scrambled the pixels in a picture, you yourself would not really be able to recognise whether it's a cat or a dog. So there the position carries meaning, and here it has to carry meaning as well. But yeah, essentially this is one of the things you can do with graph convolution.
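Here is a minimal NumPy sketch of that message-passing step: for every node, average its neighbours' features and mix the result with the node's own features. The small undirected graph, the feature values, and the 50/50 mixing rule are all illustrative assumptions; real graph convolution layers also multiply by learned weight matrices.

```python
import numpy as np

# A small undirected example graph as an adjacency matrix (made-up connections).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Every node carries the same features in the same order, e.g. [age, blood_pressure] (invented).
H = np.array([
    [34.0, 120.0],
    [51.0, 135.0],
    [29.0, 118.0],
    [60.0, 142.0],
    [45.0, 128.0],
])

def message_passing_step(A, H):
    degree = A.sum(axis=1, keepdims=True)   # how many neighbours each node has
    neighbour_mean = (A @ H) / degree       # average the neighbours' features
    return 0.5 * H + 0.5 * neighbour_mean   # mix with the node's own features

H1 = message_passing_step(A, H)   # after one step, each node "knows about" its neighbours
H2 = message_passing_step(A, H1)  # after two steps, information arrives from two hops away
print(H1.round(1))
```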
What are other things that GNNs in general are good for? Like I said, node classification, for example, can help you in disease diagnosis. Protein-protein interaction, or drug-protein interaction — that would be called link prediction: you have your nodes in the graph and you want to figure out whether they are connected or not. There is also a very interesting technique called node embedding, where you want to condense the feature space and represent clusters in your graph. Or you can classify whole graphs — not just a node in the graph, but whole graphs — and that can help you, for example, with molecule class prediction: is it toxic or not? So there is real diversity in the applications, and that was just medicine, right? Graphs are also useful in social network analysis, of course, in banking, fraud detection, all these things, because there is a very large diversity of information you can encode in a graph. So, yeah, a really fascinating field.

When preparing this lecture I went through maybe 40 different resources, and I want to highlight just a few of them here; they're kind of split into chunks. These are very nice, user-friendly talks and blog posts on graph convolutional networks. Then we have a very interesting and very recent talk by Michael Bronstein — he is in London and he's very, very active in this area. Number four is a very good review of the methods and applications of graph neural networks; they really go through all the possible flavours and all the possible uses of graphs in deep learning. This is a book that is also very fresh, and it's available as a preprint online. And again, if you want to go and actually program and train your own models on graphs, there are already tools for that. There is the Deep Graph Library, which is very stable and works on top of other deep learning frameworks, and, for example for life sciences, DeepChem is a very useful collection of tools; it's not exclusively deep learning on graphs, but that area occupies a large chunk of it.

So, yeah, I hope you've learnt some new things today; for those of you who are already deep learning practitioners, I hope it was a good recap. I thank you very much for your attention. And I have one last link here, which is about attention networks, another very interesting technique in neural nets that allows them to improve their results even further. And it looks like we even have some time for questions.
I would especially welcome some more practical questions about machine learning in industry and that kind of thing, because we didn't have time to go through the maths in depth. But if you go through all the links that I've posted in the slides, you will know these topics better than I do. Maybe.