So, yeah, once again, it gives me great pleasure to welcome Katcher Volkova Volkmer from Roche Basel, and she's going to tell us about how deep learning is used in biomedicine.

Good afternoon, everyone. It's a pleasure to present to you today. I did a similar lecture about a year ago, but back then still in person, and it was quite a bit more extensive: three hours of lectures and three hours of practice. So I had to condense the material a lot, and I also updated it.

But just a little bit about me. I was born in Russia many years ago. I started with linguistics and English studies after school, got slightly bored after a couple of years and moved to Tübingen in Germany, where I studied computational linguistics. While I was doing my computational linguistics studies, I got really interested in neuroscience, so my PhD was in cognitive neuroscience at the Max Planck Institute for Biological Cybernetics. After I finished my PhD, I moved to the UK and worked there as a data scientist for a few years at two very nice companies. I really learnt a lot there, but towards 2018 I started missing the continental life again and was looking for opportunities in mainland Europe, and I was again very, very fortunate to get the role of senior data scientist at Roche, in the Pharma Research and Early Development informatics department. All of my work is currently with the digital biomarkers group, and in particular our focus is on Parkinson's disease. I am the so-called data analysis lead there, and I do a lot of analysis of human behaviour when people with Parkinson's perform so-called active tests on the smartphones that we provide them, which accompany clinical trials.

But enough about me. So what's the plan for this lecture? We only have an hour; unfortunately, I wish we had a whole semester, I could tell you much more. I want to go through four major points. What is deep learning at all — what is it and what is it good for? We will learn some basics so that we can then go into a bit more depth on two flavours of deep learning: convolutional neural networks and graph convolutional networks. There are many more flavours, but we clearly don't have the time for all that.

One note: there are going to be a lot of extra links on almost every slide. Some of them are just sources for images that I stole from very nice resources; some of them are links to resources where you can go and learn more in detail. They are all spelled out, because I don't want you to worry, like, oh, if I click on this link, where is it going to take me?
I've checked them all, because some of them were already there a year ago; I checked them all this week, they are all live, and some were even updated. So I really hope you will find this material helpful.

Right, before we go into deep learning itself, as a machine learning practitioner I want to emphasise some practical advice. First of all, probably many of you have heard that all models are wrong, but some models are useful. When we apply machine learning, or even just statistical models, we're trying to explain the data we observe in the world, and we try to find patterns that are useful. Any model will always have some bits and pieces of data that are not explained, but we hope that it's just noise and not important. So we care about the models that are useful — and how do we build them?

Another very frequent phrase you hear from machine learning practitioners is garbage in, garbage out, meaning that, well, we want our model to represent, to explain and predict a certain phenomenon; if the data we're training the model on is not representative of that phenomenon, the model will be useless as such. So be really, really careful; this is a very unforgiving rule, especially in deep learning.

Then again, when we talk about deep learning in particular, many people treat it as some kind of magical tool, partially because it's really hard to interpret. But the usual machine learning pitfalls like overfitting and bias apply to deep learning as well. It's not a silver bullet, so you have to be very watchful there too.

And since I mentioned that deep learning is harder to interpret: this is kind of part of the so-called "no free lunch" principle. If you build a very simple, very easy to interpret model — for example, you have three variables and you're predicting some outcome — it's very, very clear, right? You look at the coefficients, you have their confidence intervals, and you can say: OK, this feature is doing this, this feature is doing that. Lovely. In deep learning you have thousands, maybe even millions of parameters, and it's impossible to interpret them all. Your model can be much more powerful for very complex data, but you lose this interpretability. There are tools that help us interpret deep learning models as well, and I think it's everybody's responsibility and duty to apply them, because it's very important to understand what the model does.

Another piece of advice I would like to give you: don't be just a number cruncher; really understand the problem you're trying to solve and the data. In the early days of data science,
I've often heard opinions like, oh, well, it doesn't matter that you don't know genetics — which I don't, I'm not a biologist — you will just do your number crunching and everything will work out. This is really dangerous, I think. So at least, if you're not an expert in the field where the data is coming from and the problem you're trying to solve, make sure that you have a very strong connection to somebody who is, to an expert, and then you can always show them some intermediate results so that they can help you understand the problem better.

And then there is another aspect of machine learning. Oftentimes, especially in research but also in industry, models are built to interpret the data, not necessarily to predict, and that's fine. But if you want your model to live on some server or in some app, that's called a machine learning model in production, and it goes into the area of MLOps and operationalisation, and that calls for model maintenance. Which means that even if you're very certain that your model is really great and performs well, please continue testing it over time on new data — there can be a shift in what the true data for this phenomenon looks like — and make sure to analyse and fix the errors. Otherwise, over time, your model will become useless.

And I have one very, very forceful piece of advice here: please watch this course at least twice. It's from Andrew Ng, on Coursera. It's really, really great, and it is tailored towards deep learning, but the things you can learn there are applicable to other machine learning flavours as well.

OK, this was a long slide, but I think it's really, really important and I didn't want to skip over it. Now we can finally start our journey with deep learning. So what is deep learning? It's a set of methods that really took off in the last ten years. It's a subset of other machine learning algorithms and tools; machine learning is a bit older. It took off thanks to three factors. I mean, the methods themselves, or at least their philosophy, have been around for quite a long period of time, but there was not enough compute power to make them scalable, and there was not enough data to train these models on. So it was really with the dawn of big data and faster computers that deep learning finally took off.

Machine learning itself is in turn a subset of artificial intelligence. Not all artificial intelligence needs machine learning to perform, but many, many modern algorithms do, and many of the ones that are now in the news
often rely on deep learning. And I think a big distinction between so-called classic or old-school machine learning and deep learning is the fact that in a typical old-school machine learning project, you as the machine learning practitioner would do the feature extraction manually, on your own: you receive a data set and you start looking at it, plotting, doing exploratory data analysis, and you think, well, maybe I'll add a few more features here, encode this data this way, extract extra things, merge new datasets together — and then you perform your classification. Deep learning does the feature extraction for you, because for the typical inputs it works on, like images or free-form speech recordings or text, it's almost impossible to come up with a very useful set of features for all cases. So this is also, I think, a very important distinction between classical machine learning and deep learning.

When we talk about people who contributed to the rise of deep learning, there are of course many more than just these three very smart gentlemen, but I wanted to highlight them in particular because a couple of years ago they actually received a Turing Award for their contributions to deep neural networks. They are still very active in the field, and, for example, I really like that they even challenge their own old ideas. So they still contribute, and I really advise you to learn more about them if you have the time.

Medicine is very close to my heart. I fell in love with digital health after I finished my PhD, and I think I'm really, really fortunate to work in this field now. Which is why, of course, when I want to update myself on what deep learning is good for, I first and foremost look at how it can be used in medicine — and the uses are plentiful. It can help in diagnosis: for example, you have an image of some cells, like a biopsy, and you want to figure out whether it's cancerous or not. But medical imaging doesn't have to be just about diagnosing. Maybe you already know that somebody has Parkinson's; you can take brain scans over time, and again deep learning can help figure out how the disease is developing. Deep learning is also supporting clinical trials. This is somewhat more challenging, because in clinical trials you have to be absolutely clear and very, very confident, and, like I mentioned before, deep learning is sometimes hard to interpret.
But nevertheless, in some aspects deep learning already helps. And last but not least, especially in the last couple of years, deep learning has been shown to be an amazing tool for drug discovery, which is otherwise a very laborious, very expensive process. So it could be that deep learning will really be a tool to solve some huge bottlenecks in this area.

So these were the areas. Well, who are the companies that are very active in deep learning for medicine in particular? Some of my favourites, from when I actually lived on the island, are Babylon Health; they are helping to scale general practice health care in the UK. I'm not sure whether their model falls into the class of deep learning, but they have a very clever Bayesian network that has passed the doctors' GP exam. So if you give it symptoms, and it keeps asking interactively — do you have this symptom, what about that symptom? — it can diagnose your disease pretty reliably. Then BenevolentAI has shown some breakthroughs in drug discovery recently. DeepMind has its own health department, where they also try to optimise certain aspects of health care. And of course there are the big pharma companies like Roche, where I work, Novartis, AstraZeneca — they have been expanding their staff in terms of machine learning skill set. And last but not least, there are the usual tech giants like Google, Apple, Amazon, IBM and Philips; they all have very strong in-house departments working on health care.

Right, so deep learning has been going out and doing great things. Are there any horizons it still has to reach? For me, the most interesting topics, where I hope the next breakthroughs will take place, are the following. Causal inference and reasoning: even though deep learning looks like it's super wise and it can tell you whether there is a cat in the picture or not, and of course many other useful things, it's still just patterns — it's still correlation, essentially. What is often still missing is the causal link, and I really recommend you read this book, The Book of Why. It's very non-technical, it's very engaging; I think I've read it twice already, because it's just really interesting to think about — a very different way of thinking about data. Then we still have lots of problems with algorithmic bias and bias in data, and again, deep learning being somewhat more obscure, it's harder to catch this bias, but it's very important to check for it.
Then, like I said, we still need to increase transparency and interpretability in deep learning, and that in particular would help with clinical adoption, in my opinion. For example, if you want to submit a new algorithm for diagnosis to the FDA, it has to be very, very clear. I doubt that even something like a random forest would be very welcome there unless it's very clearly explained — imagine how much trouble a deep learning model would have. And a new point I added recently is meta-learning, which is a hot topic right now as well, because the resulting models are highly, highly specialised. So currently the search is for a more general learning approach: an algorithm that was trained on one thing and can then complete a different one, or learn it much, much faster.

Right, so this was just a very quick tour de force. Where can you learn more? There are online courses — I've done plenty of them during my PhD and after, and I still do them when I have the time. There are tons of podcasts. There are, of course, many, many books; I highlighted just two out of a couple of dozen. There are so many YouTube channels, and some of them are absolutely brilliant. The only YouTube channels I would not recommend are the ones where the instructor tells you, oh, deep learning is so easy, you just type these two lines of code and you're done. Please don't follow that advice; please try to understand in depth what is actually going on. There are lots of blogs, of course, and a whole class of posts on Medium — I particularly like reading Towards Data Science. And of course there are always papers. I have a more detailed list on my blog, so if you want to check it out, you're more than welcome.

Right. So let's imagine that after this talk you're super inspired and you want to learn more about deep learning, and maybe you already have a problem that you would like to solve and you have the data — brilliant. So what do you do? Do you go and program model code from scratch on your own? You do not have to do that. There are at least four platforms — and the number is always growing — that can help you build your models. They all have slightly different flavours, different advantages and disadvantages. Within my team we tend to use PyTorch, because it's very nice and friendly, but my first models I built in TensorFlow and Keras; Keras is like a very nice, friendly API on top of TensorFlow.

So, all right, we're done with the main introduction.
Let's look a bit closer at what kinds of flavours exist in deep learning. We will start with what you could call the grandpa of them all, the perceptron — the main, simplest building block of deep learning — and next we will go into the multilayer perceptron. And as you can see here, there are kind of four or five main families. There are convolutional neural nets, which mostly work on images. There are graph neural nets that work on graphs, and the two kind of have a baby together — the graph convolutional network — and that's what we'll talk about in more detail, because it's an easy step from convolutional neural nets to graph convolutional nets. But if you want to work with text or speech, then you might be better off with recurrent neural nets and long short-term memory networks, which are kind of an extension of RNNs. Some people have achieved very good results on, for example, speech recognition with CNNs, actually, so that's a way to do it too. And then a whole different beast is deep reinforcement learning, which I think is a very promising field. And there are autoencoders and GANs — generative adversarial networks — that we will not have time to go into. So, yeah, these ones are out of scope today; we just don't have the time.

There's not going to be lots of technical detail today, again due to the lack of time, but I still want you to understand in quite some detail what the perceptron does. The perceptron is just a linear model, nothing else — well, no, not exactly nothing else, but essentially it's a linear model. So you have one layer. Can you see my cursor, can you see my mouse? I haven't used this setup so much before. Yes? OK, great. So here you have the inputs, right, your features. Perceptrons, or rather multilayer perceptrons, can work on tabular data; in fact, some of my colleagues often use them as a benchmark: you get a new tabular data set, you throw it at a multilayer perceptron, you get your outcome, and then with your other models you try to improve on it — you try to find overfitting, or you try to just get better accuracy overall. So here you have your inputs, and on these inputs you apply certain weights and then you just sum them up. That's all you do, and you add your bias, which is the intercept. So in statistics these are the coefficients
and this is the intercept. Then as the outcome you get this weighted sum, essentially, and you look at whether it's above zero or not, and from that you judge whether you should output a one or a zero. That's almost the whole perceptron. It's quite old — it was there even before the 60s — and it is the building block for essentially all deep learning networks. And it can do linear separation; it can build a linear model.

Well, you might be thinking: we have the inputs, right, they are in our data — but how do we find the weights? Because depending on the weights, you could have a model that is totally off, just not accurate at all. Here we have our ground truth as dots, two classes, and then we have our predictions, and the model says, well, I'm super sure that this point belongs here, which is not true. Then, when we adapt, when we change the weights and the intercept, we can say: OK, now the model thinks this is a good fit — and here it's actually 50/50. So this is an improvement, but it could do better. So we iterate on, and I will tell you exactly how we iterate. And now, if we actually count how many points are marked blue and how many red, we see that this model is much more accurate — but it's not perfect. A perfect fit is actually a model that kind of maximises the distance from the line where it's 50/50 to your training set.

So, exactly how do we get this line to rotate, how do we move this line to find an optimal fit? In linear regression, but also in the perceptron approach, what you use is gradient descent. Here on the y axis we have our so-called cost function, or how accurate our model is: at the bottom you have no error, or the minimum possible error, and the higher up you go, the worse your model is performing. You compute this by basically looking at the distance to these dots and encoding whether they were categorised correctly or not. Initially the weights are generated randomly — they can be set to zero, but they can also be generated randomly, it doesn't matter — because once you have an outcome for a couple of dots, you can build your gradient, and the gradient, kind of the steepness of the slope, tells you how far off you are. What you want to get to is a completely flat slope: when your gradient is zero, you know that you've gotten there.
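To make this concrete, here is a minimal sketch, in plain NumPy, of a perceptron-style linear model fitted with gradient descent. The toy data are invented, and I use a smooth logistic curve in place of the hard above-zero rule so that the gradient is defined everywhere — an assumption beyond what the talk spells out; the final prediction still just checks whether the weighted sum is above zero.

```python
import numpy as np

# Toy 2-D data: two roughly separable point clouds (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(+1.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w = np.zeros(2)   # weights ("coefficients")
b = 0.0           # bias ("intercept")
lr = 0.1          # learning rate: how big a step to take along the gradient

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    z = X @ w + b                      # weighted sum plus bias, one value per example
    p = sigmoid(z)                     # smooth stand-in for the hard "above zero?" rule
    grad_w = X.T @ (p - y) / len(y)    # gradient of the logistic cost w.r.t. the weights
    grad_b = np.mean(p - y)            # ... and w.r.t. the bias
    w -= lr * grad_w                   # step downhill; near the minimum the gradient -> 0
    b -= lr * grad_b

pred = (X @ w + b > 0).astype(int)     # the perceptron decision: weighted sum above zero?
print("training accuracy:", (pred == y).mean())
```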
It's a bit more tricky, a bit more detailed than that, but the beauty of gradient descent is that you always know where to go, in which direction, and by how much you're off. So you can adapt your steps over time: you don't have to take tons of tiny, tiny steps — if you only knew the direction, then you would have to do that — because you also know roughly by how much you need to go. If you have very few data points and very few parameters, you can afford to do gradient descent as it is, but I will also talk about the trick that allows you to do it faster.

But let's first go to a slightly more complex architecture. We only had one perceptron before — well, what if we have two, or three, or 16, or 200? We still have our inputs, and we still have our weights that we're fitting, and here we have two linear models, each with their own weights and their own biases, so they can encode two different linear models. And by combining these two different models, we can actually apply classification to nonlinear situations. Imagine you actually had this kind of class dependency, with the blue dots: we basically have here one feature and another feature, and we know that blue dots typically sit in the upper half of our range on the x axis and, more or less, the upper half on the y axis — sort of an AND situation. Combining the two linear models allows us to encode this nonlinearity. And of course, the more layers we have, and the more neurons — the units in these layers are called neurons — the more detail we can add to this nonlinearity, in as many dimensions as we want. Again, this is the basis of all deep learning approaches, more or less.
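As a tiny illustration of the "combine two linear models" idea, here is a hand-weighted hidden layer of two neurons in NumPy that fires only when both features are in the upper half of their range — an AND-like region that no single straight line can carve out. The thresholds and weights are invented for the example; in a real network they would be learned.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)          # hard activation: a neuron fires only above zero

# Hidden neuron 1 checks "is feature x1 above 0.5?"; hidden neuron 2 checks the same for x2.
W_hidden = np.array([[1.0, 0.0],
                     [0.0, 1.0]])
b_hidden = np.array([-0.5, -0.5])

# The output neuron fires only if BOTH hidden neurons fire: an AND of two linear models.
w_out = np.array([1.0, 1.0])
b_out = -1.5

def predict(x):
    h = step(W_hidden @ x + b_hidden)     # layer 1: two separate little linear models
    return step(w_out @ h + b_out)        # layer 2: combine their outputs

print(predict(np.array([0.9, 0.8])))      # both conditions hold -> 1.0 (a "blue dot")
print(predict(np.array([0.9, 0.1])))      # only one condition holds -> 0.0
```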
And a very important trick — let me just check the messages here. Sorry, OK, apologies. Let's go back. It doesn't want to go back. [INAUDIBLE] Yeah, yeah, no, it's working now, so yeah.

So another very important aspect of deep learning is backpropagation. You have your architecture — before, we had a very simple architecture with just one hidden layer; like I said, you can have multiple — and very quickly you get a fairly complex system where you have weights on every edge. You take your inputs, you apply your weights, they feed into these neurons, which activate or not depending on the outcome; then they themselves produce weighted outputs, and so on, and essentially you get to the output. For example, if you have a binary outcome, the network says: OK, for this input, for this example, it looks like I have maybe a probability of 0.6 on one class and 0.4 on the other class. But actually we know that the right answer is 100 percent on the first class and zero probability on the other. How do we correct for this? We calculate our cost function and we propagate the error back: you need to correct these weights in this direction and those weights in that direction. This is called backpropagation. The forward wave, from input to output, is called the feed-forward pass, and the correction of the weights, the adjustment of the weights, is called backpropagation. One cycle of those two is called an epoch, and usually when you're training your deep learning models you have multiple epochs: for very simple scenarios just a dozen might suffice, but for very complicated data you might need hundreds. Gradually your model will converge and adjust the weights so that any input will generate correct results most of the time.

You can already see that even with very few layers and very few neurons, we have so many parameters — all these weights. It's quite natural that with larger data sets and larger architectures, the whole training process will slow down; even with the much faster computers that we have today, it's still going to take too long. And I think the great news is that it doesn't have to take this long: you don't have to do the gradient descent step for every single input and every single feature, you can take shortcuts. Some of these shortcuts are called stochastic gradient descent and dropout, and they work on two different aspects of the neural net. Stochastic gradient descent does gradient descent, but not on all of the inputs, just a few of them at a time: it takes batches, selects them randomly, and does the gradient descent on them. Dropout does kind of the opposite: it randomly switches off nodes in the hidden layers, which means that not all of them are activated at the same time. Not only do these tricks help you train your network faster, they also make it more robust.
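Putting the last few ideas together, here is a minimal PyTorch sketch of the loop just described: a feed-forward pass, a cost function, backpropagation, and weight updates repeated over several epochs, with random mini-batches (stochastic gradient descent) and a dropout layer. The data, layer sizes and hyperparameters are placeholders of my own, not anything from the lecture.

```python
import torch
from torch import nn

# Fake tabular data: 512 examples, 10 features, binary labels (placeholder values).
X = torch.randn(512, 10)
y = torch.randint(0, 2, (512,))

model = nn.Sequential(
    nn.Linear(10, 32),   # hidden layer of 32 neurons, each a little linear model
    nn.ReLU(),           # nonlinearity between the layers
    nn.Dropout(p=0.3),   # randomly switch off 30% of the hidden units on each pass
    nn.Linear(32, 2),    # output layer: one score per class
)
loss_fn = nn.CrossEntropyLoss()                        # the cost function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):                                # one epoch = one pass over the data
    perm = torch.randperm(len(X))                      # shuffle, then take random mini-batches
    for start in range(0, len(X), 64):
        idx = perm[start:start + 64]
        logits = model(X[idx])                         # feed-forward pass
        loss = loss_fn(logits, y[idx])                 # how wrong were we on this batch?
        optimizer.zero_grad()
        loss.backward()                                # backpropagation: a gradient for every weight
        optimizer.step()                               # nudge the weights downhill
```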
Deep learning networks are amazing at overfitting: if you give them the chance, they will just memorise the whole training set by heart and basically give you perfect results on your training data while being awful and useless on any new data. Stochastic gradient descent and dropout help you make these models more generalisable. I once heard a lovely metaphor on a podcast: neural nets are very good at finding the maximum optimum, basically the perfect minima in your feature space, which often look like deep, deep wells, and once your network falls into one, it will not be able to get out. With stochastic gradient descent and dropout, it's like the model is wearing very big boots, so it cannot fall into these wells. I think it's a nice metaphor for this.

Still, with enough epochs you can always overfit, and you need to know when to stop. You can still apply good old regularisation, which you might know from elastic nets, for example — L1 and L2. But also, as you train your model, you need to watch out for gradual divergence between the training set and the validation set, because you should always split your data into train and validation, and you should also have a hold-out set, the ultimate test. As you're training your network, you measure the error on the training set to adjust the weights, and you get that error there; but you should also apply exactly these weights to the validation data set. As your model trains, the error first goes down in synchrony on both, but once your model starts memorising its examples, the error on the training set will keep decreasing while on validation it will increase. And then you know: OK, I need to stop here. That's also a very important aspect to keep in mind.

How are we doing? Oh, [INAUDIBLE]. Sorry. So this was general information about perceptrons and multilayer perceptrons. Let's look in more detail into CNNs, because I think they are a very fascinating breakthrough in machine learning, and they really managed to do something that people were not able to do before. For us as humans it's really easy — we have an amazing visual sensory system, so it's really easy for us to recognise objects. We look at this and say: OK, a five, a cat. For a computer it's not so obvious. And if we did this with the old-school machine learning approach, we would have to handcraft the feature extraction; we could come up maybe with some filters and say, well, what does a five usually look like? There's always this kind of line going from left to right,
and there's like a sharp object just above it. But then you get another five, which is all smooth, and then your filters don't work anymore. And then try to imagine coming up with all the rules that tell you that this is a cat in a mask and not a tiger, and not a lion, and not a puppy. Imagine doing that — it's nigh impossible. People have tried, but it usually failed. So CNNs, being part of the deep learning family, do the feature extraction themselves. They also solve a very important problem with images: images can be really, really large. I mean, these fives here are small images, but often, especially in medical imaging, the resolution is really, really high and you get really large images. So how do you extract information from these images in a way that condenses it and still keeps it useful?

CNNs have two big tricks here: convolution layers and pooling layers. Convolution is basically when you take a small window of pixels — maybe three by three, it can be eleven by eleven, seven by seven, you are the architect — and you apply a filter to it. Your filter has some predefined numbers, often randomly generated, and you just multiply element by element: this pixel with this value, this pixel with this value — this is not a pixel, this is just a number — and you sum them up and record the result in your next layer, your output layer. Pooling does something even simpler: it takes three by three pixels, for example, or some other patch — it's usually a square patch — and it just takes the maximum value out of those. So here you see we have six by six, and as the output we get two by two, because this patch of three by three went one, two, three, four, and each time it extracted the maximum number. It doesn't have to be the max — it can be the mean, whatever you prefer — but it helps to condense the information.

These filters are quite curious. Like I said, they can be randomly generated, or you can prespecify them if you want, and depending on how they are built, they can highlight certain features. They can detect edges: if you have higher values in the middle and lower values around, or a vertical arrangement, they will detect edges in the image. They can sharpen the image by kind of suppressing values on the outside and highlighting values in the middle. You can do blurring, all kinds of things.
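Here is a small NumPy sketch of the two operations just described: sliding a 3×3 filter over an image (multiply element-wise, then sum) and 3×3 max pooling that turns a 6×6 input into a 2×2 output. The image values are invented and the filter is a classic hand-picked edge detector; in a trained CNN the filter values would be learned.

```python
import numpy as np

image = np.arange(36, dtype=float).reshape(6, 6)   # a made-up 6x6 "image"

# A classic vertical-edge filter: high on the left, low on the right (hand-picked example).
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])

def convolve2d(img, k):
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = img[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * k)        # element-wise multiply, then sum
    return out

def max_pool(img, size=3):
    out = np.zeros((img.shape[0] // size, img.shape[1] // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = img[i*size:(i+1)*size, j*size:(j+1)*size].max()  # keep only the max
    return out

print(convolve2d(image, kernel).shape)   # (4, 4): the 3x3 filter slid over the 6x6 image
print(max_pool(image))                   # 2x2: each value is the max of a 3x3 patch
```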
What's interesting — and I will show you a typical CNN architecture in a moment — is that by applying very similar filters from layer to layer, you are able to extract more and more complex features. At first they detect the edges in the image, then they detect more complex items, and in the third, fourth, fifth layer you will see whole objects. So this is really a very interesting property of these filters.

And this is what a typical CNN looks like. This is not a very large network. Like I said, you are the architect: you decide on many hyperparameters — how many layers to have, how to arrange them, what the size of the filters should be, how many filters to have — but still, the flow of information remains the same. You start with your image as the input, and then you do convolution — you usually start with convolution — and you highlight the important information using these filters. If you have one input image, maybe it has three channels, three colours, but you usually apply multiple filters, so you get a stack of outputs. Then the information from these layers is passed on to pooling and you condense the information, then you do convolution on this condensed information — why not — and then you can condense it further. You don't have to condense it down to one by one in a pooling or convolutional layer; at some point you just say, OK, I'm taking whatever is remaining of this and I'm turning it into a vector. And this is your last, so-called dense layer: the last vector of neurons, which will be activated, and the activation pattern will then be mapped to a certain class. Here we have just two classes, pathology and not pathology, but there have also been very successful experiments where networks were trained to recognise a hundred or more classes. I don't think we have the time for this, but I really, really encourage you to go to this interactive example, because you can actually see how the neural net works — it's really great, you can click on everything and get more detail, and you can really see how the information is flowing.
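As a rough sketch of that kind of architecture, here is a small PyTorch model with convolution, pooling, another convolution and pooling, and then a flatten into a dense layer mapping to two classes. All the sizes are arbitrary choices of mine as the "architect", not the network shown on the slide.

```python
import torch
from torch import nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),   # 3 colour channels in, a stack of 16 feature maps out
    nn.ReLU(),
    nn.MaxPool2d(2),                   # condense: keep the max of each 2x2 patch
    nn.Conv2d(16, 32, kernel_size=3),  # convolve again on the condensed information
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),                      # stop condensing: turn what's left into a vector
    nn.Linear(32 * 14 * 14, 2),        # dense layer: activation pattern -> 2 classes
)

x = torch.randn(1, 3, 64, 64)          # one fake 64x64 RGB image
print(cnn(x).shape)                    # torch.Size([1, 2]): one score per class
```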
Right, so what I pointed out before is that filters in higher layers capture more and more general information. We can use this property in a technique called guided backpropagation, and it's just one of the possibilities to help interpret your neural net. Because, yeah, with so many parameters it's impossible to look at every single weight and say, oh, I know what it means, I know what it's doing. Here you have to find a way to show which parts of the input contributed to the network's decision, and using guided backpropagation and similar techniques you can really see: OK, the network said that this picture is a dog, and when it said so, it was taking this information into consideration, it regarded this as important.

There is an urban myth about one early deep-learning-like experiment. The story goes that the researchers were trying to classify tanks, maybe by which country they came from, and on their training set the model was very accurate, but later on it just couldn't do it, it was a complete mess. Once they looked deeper, they realised that the background was contributing more: somehow the tanks were always in snow or in a desert, and that's what the model had learnt to pay attention to. It turns out this is very likely just an urban myth, but this kind of thing actually does happen. There is a real example where X-rays were marked, very faintly, by a clinician according to whether they showed pathology or not — just a little pen mark somewhere in the corner that nobody else noticed. Yet the model noticed it, and it had perfect performance: it didn't care what kind of image was on the X-ray, just that a mark means pathology. And then, of course, new images started coming in that didn't have that mark, because they were from a different hospital, from a different condition, and the model was completely helpless. So it's really important, I think, to pay attention to these things.

So, we've talked about how you can speed up training with dropout and stochastic gradient descent. Yet when you have very little data, for example, or you're really pressed for time and compute power, you have another shortcut, and that's called transfer learning. The trick is that whatever your dataset is — is it animals, is it cars, is it X-rays — the first layers usually learn to detect edges and just little properties of the images. And if you think these are common enough, that they are also common in your problem and your data set, then you can actually use an already pre-trained model, for example one trained on ImageNet. You can load it using the frameworks that I've told you about, load it into your system, and use the weights up to a certain layer: for example, for the first four layers we say, OK, we don't want to train those, we just cut them and transfer them into our model, and we use them and then continue the training further on. That shortens your time considerably, and the results are still very good — to the point that when I once mentioned that I wanted to train a model from scratch, people were like, why would you do that? Transfer learning works just fine.
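A minimal sketch of that shortcut in PyTorch, assuming torchvision is available: load an ImageNet-pre-trained ResNet-18, freeze its existing layers, and train only a newly attached final layer for a two-class problem. The choice of ResNet-18 is mine, and the exact `weights` argument depends on your torchvision version (older releases use `pretrained=True` instead).

```python
import torch
from torch import nn
from torchvision import models

# Load a network already trained on ImageNet (weights argument depends on torchvision version).
model = models.resnet18(weights="IMAGENET1K_V1")

for param in model.parameters():
    param.requires_grad = False          # freeze: keep the pre-trained filters as they are

# Replace only the last, dense layer with a fresh one for our two classes.
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's weights get updated during training.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01)

x = torch.randn(4, 3, 224, 224)          # a fake batch of four RGB images
print(model(x).shape)                    # torch.Size([4, 2])
```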
All right, so that was a very quick introduction to CNNs, and we happily still have some time to talk about graph neural networks, because this is a very exciting area. They really took off just a couple of years ago, there are new flavours being born every month, and the applications are really fascinating. But first, let's talk about what a graph is and how graphs are different from other inputs. We know already that CNNs and RNNs can work on images or text or speech — but what if you have a graph? Here we have a nonsense graph that was generated randomly: we have five nodes and they are connected, and the connections are actually directed, so you can go from node one to node zero, but not back — we don't have an arrow that goes back. Not all graphs have to be so-called directed; it can be just a link, and then it can go both ways. And it can encode anything. It could be, for example — let's make up a story — that node one is a surgeon and node zero is a patient. Or, I don't know, this is a symptom and this is a disease, or vice versa; or this is a drug and this is a side effect; or these are two proteins interacting directly. You can encode so many things. But how would you do deep learning on them?

Graph theory itself is very, very old, and even without deep learning we can do amazing stuff there. We can count neighbours — this is the adjacency matrix, so you can see which nodes are connected. You can do community detection; in one of my previous roles that actually was a very important aspect, on Twitter, for example. But if your graph is really, really complex and you want to do very subtle things with it — like maybe you want to assign classes to some nodes, and you have labels for most of them but not for all — how do you do this with these traditional methods? It's pretty tricky, and for a long time this was a real problem, because a graph is not structured in a regular way. It's not an image with its pixels on a grid, and it's not a text where every word follows another word.
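To make the structure concrete, here is a tiny NumPy sketch of a five-node directed graph and its adjacency matrix; the particular edges are made up, not the ones from the slide.

```python
import numpy as np

# Made-up directed edges (from, to): node 1 -> node 0 exists, but 0 -> 1 does not.
edges = [(1, 0), (1, 2), (2, 3), (3, 4), (4, 2)]

A = np.zeros((5, 5), dtype=int)   # adjacency matrix: row = source node, column = target node
for src, dst in edges:
    A[src, dst] = 1

print(A)
print("nodes reachable in one step from node 1:", np.nonzero(A[1])[0])
```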
So how do you do convolution on something like this, for example? And you don't have to do convolution — not all graph neural nets do graph convolution, that's just one flavour — but it was a big challenge. I think what helps here is to think about what an image is: a grid. And what do we do in convolution? We just take information from these nine pixels, for example, depending on what our kernel is, and then we update it, we turn it into one value. It turns out you can do the same on graphs, but instead of having a predefined grid and always needing nine, or some other square number, of nodes, you just say: well, I have this node and I have its neighbours. What you need, though, is that every node has some features. Say it's a patient, for example: then you have their height, weight, age, blood pressure, temperature, and you need to make sure that every node has these features, and in the same order. And what you do when you want to do graph convolution is, as a first step, take information from each of the neighbours of the node in question — extract the features from those neighbours — and then apply some mathematical function; let's choose averaging, for example. Then you take the features on the node itself and, for example, average them with the averaged information from the neighbours. So essentially you're sloshing information around, because you do this for every node.

What would that be good for, though? Why would we need to propagate this information around — it's called message passing? It's useful in case you have labels on those other nodes but not on the one you're interested in, and you want to classify this node: is it a patient at risk, is it a fraudulent account? Of course, what's very important in this case is that your graph, the connections, actually make sense — these connections cannot be random. In an image it's dictated just by the position of the pixels, and then it makes sense; imagine you scrambled the pixels in a picture, you yourself would not really be able to recognise whether it's a cat or a dog. So there the position carries meaning, and here it has to carry meaning as well. But yeah, essentially this is one of the things you can do with graph convolution.
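Here is a minimal NumPy sketch of that message-passing step: for every node, average its neighbours' features and mix the result with the node's own features. The small undirected graph, the feature values, and the 50/50 mixing rule are all illustrative assumptions; real graph convolution layers also multiply by learned weight matrices.

```python
import numpy as np

# A small undirected example graph as an adjacency matrix (made-up connections).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Every node carries the same features in the same order, e.g. [age, blood_pressure] (invented).
H = np.array([
    [34.0, 120.0],
    [51.0, 135.0],
    [29.0, 118.0],
    [60.0, 142.0],
    [45.0, 128.0],
])

def message_passing_step(A, H):
    degree = A.sum(axis=1, keepdims=True)   # how many neighbours each node has
    neighbour_mean = (A @ H) / degree       # average the neighbours' features
    return 0.5 * H + 0.5 * neighbour_mean   # mix with the node's own features

H1 = message_passing_step(A, H)   # after one step, each node "knows about" its neighbours
H2 = message_passing_step(A, H1)  # after two steps, information arrives from two hops away
print(H1.round(1))
```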
What are other things that GNNs in general are good for? Like I said, node classification, for example, can help you in disease diagnosis. Protein-protein interaction, or drug-protein interaction — that would be called link prediction: you have your nodes in the graph and you want to figure out whether they are connected or not. There is also a very interesting technique called node embedding, where you want to condense the feature space and represent clusters in your graph. Or you can classify whole graphs — not just a node in the graph, but whole graphs — and that can help you, for example, with molecule class prediction: is it toxic or not? So there is real diversity in the applications, and that was just medicine, right? Graphs are also useful in social network analysis, of course, in banking, fraud detection, all these things, because there is a very large diversity of information you can encode in a graph. So, yeah, a really fascinating field.

When preparing this lecture I went through maybe 40 different resources, and I want to highlight just a few of them here; they're kind of split into chunks. These are very nice, user-friendly talks and blog posts on graph convolutional networks. Then we have a very interesting and very recent talk by Michael Bronstein — he is in London and he's very, very active in this area. Number four is a very good review of the methods and applications of graph neural networks; they really go through all the possible flavours and all the possible uses of graphs in deep learning. This is a book that is also very fresh, and it's available as a preprint online. And again, if you want to go and actually program and train your own models on graphs, there are already tools for that. There is the Deep Graph Library, which is very stable and works on top of other deep learning frameworks, and, for example for life sciences, DeepChem is a very useful collection of tools; it's not exclusively deep learning on graphs, but that area occupies a large chunk of it.

So, yeah, I hope you've learnt some new things today; for those of you who are already deep learning practitioners, I hope it was a good recap. I thank you very much for your attention. And I have one last link here, which is about attention networks, another very interesting technique in neural nets that allows them to improve their results even further. And it looks like we even have some time for questions.
I would especially welcome some more practical questions about machine learning in industry and that kind of thing, because we didn't have time to go through the maths in depth. But if you go through all the links that I've posted in the slides, you will know these topics better than I do. Maybe.