Hello, everybody. Welcome to this term's Strachey Lecture. We're going to wait just a few minutes, because I can see a lot of people joining; there was a technical difficulty at the beginning, so I'm sorry about that. But please, let's just wait a minute and let the people who are trying to join get in.

OK. So, on behalf of the Department of Computer Science at Oxford University, I'd like to welcome all of you to this term's Strachey Lecture. This is a series of distinguished lectures that we hold once a term in memory of Professor Christopher Strachey, who was actually the first professor of computer science at Oxford and who founded the Programming Research Group in 1965. Together with Dana Scott, he founded the field of denotational semantics, which provided a firm mathematical foundation for programming languages.

Before I get the pleasure of introducing today's speaker, I get another pleasure: I'd like to really strongly thank Oxford Asset Management, who are very generously supporting this series of lectures. They've actually been supporting the series since 2014, and without that support we wouldn't be able to bring this really, really high-calibre series of speakers. So great thanks to them.

It's a great pleasure today to welcome Professor Cecilia Mascolo. Cecilia is a professor of mobile systems at the Department of Computer Science and Technology of the University of Cambridge. At Cambridge, Cecilia is the head of the Mobile, Wearable Systems and Augmented Intelligence group. She is also currently holding an ERC Advanced Grant on the topic of audio-based mobile health diagnostics, which explains both her research area and the topic of the talk.

Cecilia has got a huge string of awards, so I had to shorten it so that you would have time to hear from her. Before her ERC Advanced Grant she held an advanced research fellowship, and she has been a fellow of the Turing Institute. She has given a huge number of exciting keynote talks; I just looked for this year, and amongst this year's I found the IEEE Healthcare Summit, ISWC and SmartComp, and from last year ACM HotMobile. She also has a host of best paper awards, including recently a ten-year impact award at ACM. Her talk today is entitled "Mixed Signals: Audio and Wearable Data Analysis for Health Diagnostics", so we're really looking forward to that.
Before I welcome Cecilia, I just want to make one technical comment, since we're still in pandemic mode: during the talk you can type questions into the chat, and at the end of the talk I will read those out so that Cecilia can answer some of our questions. OK, Cecilia, it's a huge pleasure to welcome you virtually to Oxford.

Leslie, thank you very much. So before I share my slides, I would like to thank you and... ah, I guess you can hear me now. So, Leslie, thank you very much; I was muted for a while. I would like to thank you and your predecessor, Michael Wooldridge, for this invitation. It's a great honour to be here. I'm sorry that I can't meet you all in person and have the live interaction that we could have, but I will now show my slides, and hopefully some interaction will still happen. I will now assume you can see my slides and start my talk.

There is now, in our daily lives, a constellation of wearable devices that are sensing our behaviour and, perhaps in a more indirect way, our health. So one would imagine, if you have some of these, like phones, watches and earables, that this area is kind of done, that our health is kind of transformed, and there is no research for us in academia to do around this anymore. Well, in this talk I would like to really highlight that what we're doing at the moment, with these devices that go into consumers' hands, is really playing with the sensing and with the data that comes out of it quite superficially. We really need to go through a number of breakthroughs to really transform health.

So in this talk I will first talk about the challenges that we're facing and the exciting opportunities where this could innovate. These are only some; I will introduce the challenges a bit, and I have two representative examples from my research in which I try to explain some of this.

The obvious first one is sensing modalities: of course, new and newer sensors. My colleagues in engineering are coming up with new ways of sensing our behaviour. I have a colleague who works on EEG; portable EEG sensors are becoming smaller and smaller, and sensors are being woven into fabric, possibly even tattoos. Pills are being developed that can be ingested, and contactless communication between ingested sensors and external devices can happen, so that you can sense our health in a way that is less invasive or disruptive for our activities.
But at the same time, existing sensors that are already on devices we wear generate amounts of data that we are not quite at the stage of being good at modelling towards the final aims that we have. And this can be taken further by saying that these devices generate data at a granularity we haven't seen before and, because of the type of devices they are, they can sometimes be placed in parts of our body that we have never thought of sensing. So some interesting conversations I've been having with clinicians are often of this style: what if I could give you long-term, continuous sensing from, perhaps, your abdomen? What kind of things would you be able to do? And this is so far from what they're used to seeing that even that kind of conversation, about what can be done with this sort of data, is missing.

In my examples today I will talk about longitudinal sensing: the fact that it is now much easier not just to have fine-grained data continuously, but also for a long, long time, which means that we can assess differences between past and present, and present and possibly future when that comes, and look at predictions with this sort of longitudinal data.

One thing that is important here is that the studies and the techniques that have been used until now are often used on small-scale trials, small cohorts, and often the free-living aspect of the analysis is somehow missing. You have more control over the labels of the data in these studies, and there is less ability to adapt to the unforeseen and noisy data that comes out of free living.

And I'm sure with the fourth bullet point I'm preaching to the choir in this department, but I will talk about it nevertheless, and perhaps it leads to an interesting conversation where we are talking about clinical and diagnostic aspects of uncertainty. The ability to go beyond the concept of accuracy of a prediction is important, but the angle that is perhaps new to this department is the fact that, as I will show, in the case of mobile data it is possible to weave this uncertainty into the pipeline of how the data is collected and re-collected. Because the entry point for re-recording and getting more data is so low, if a prediction on certain data is uncertain, then maybe we can easily collect more data. I hope this is clear; to the extent that it is not, I have some examples on this.

And the last point is obviously one that I would be remiss not to talk about at the end of the lecture, and it is related to privacy.
My perspective on this is that privacy could somehow be embedded in the process that we develop, to make sure that a lot of this can be done closer to the users. And as a systems researcher, I will show at the end some examples of how perhaps we can bring this further, closer to the users: perhaps by developing models in a clinical trial, where the side effect is, you know, some lack of privacy for the users in the trial; but once we develop more and more models, the models can be deployed at scale, and the privacy of the users is respected, because those models will run on devices close to the users, and on their data, locally.

There is more of this; I wanted to give you this anticipation because I know your attention, especially online, is limited. So I will try to start with the first example. As I said, I have two examples, on two different types of diseases and the sensing data that we use for them.

The first one is about cardiorespiratory fitness. I don't know how many of you know, and I certainly was surprised by, the fact that cardiorespiratory fitness is a very important factor that is inversely associated with cardiovascular diseases and, interestingly enough, is much more indicative than cholesterol, diabetes, hypertension and even smoking. So it's very important to assess cardiorespiratory fitness, and this is a project we're doing with the MRC Epidemiology Unit.

If I was in a live lecture theatre, I would now ask you to raise your hand if you have ever done one of these strenuous tests in which you go on a treadmill or a bike and you wear a mask. This test is cumbersome and very strenuous, because they often push you to the end of your abilities. It measures your cardiorespiratory fitness by measuring your VO2 max, which is the maximum volume of oxygen that you can breathe in, which is then transported through the bloodstream and eventually transformed into energy by your muscles.

Now, as you can imagine, this test is actually not very scalable: you need equipment, and it is strenuous. So what epidemiologists and people who study this sort of relationship have been doing for the moment is to proxy this with other measures, such as anthropometric measures, demographics, height, weight, BMI, as well as questionnaires about how many times you exercise and what type of exercise it is. And this is a good proxy already, but, you can imagine, you probably know where I'm going now.
There are lots of wearable data that could be used as a proxy for that sort of questionnaire data, and it turns out that resting heart rate, which can be measured quite easily, at least more easily than an exercise test, is also a very good, indicative proxy for that.

So where are we with bringing wearables into the estimation of VO2 max? Well, if you have one of these most modern devices, you know that some of them, when you tell them what exercise you're doing, already give you an estimate of VO2 max. So this is happening in real life. But there are very few studies showing the effectiveness of wearable data, measuring activity as well as heart rate, as a proxy for VO2 max, and, most importantly, there are essentially none that do this in free-living conditions. If you remember my first slide, the free-living condition was one of the important things, because you really don't want the user to have to label the data so much. So the free-living aspect is very important.

And so now I have a few slides on the study we are doing. As this is a keynote, I hate to present research that we have already published, so I always try to push myself to present something that we are doing. So this is not yet out; this is something we are working on. It is the measurement of cardiorespiratory fitness through wearable data in free living, and it works with data that the MRC Epidemiology Unit has collected. It's a study, a dataset, called Fenland. They have a number of participants, 11,000 in the first cohort, and then, seven years later, they have another cohort; in this particular case we're using a subset of that. On all these people they do VO2 max tests. They measure anthropometric measures, as I said, demographics as well as height, weight and BMI, and they ask them questions. But they also ask them, and this I think is invaluable, to wear an accelerometer on their wrist, as well as an ECG chest strap to measure their heart, for six days, essentially very much continuously.

So this is a lot of data, and I remind everyone that this is in free living: we know nothing about what they're doing in those days. Six continuous days is not too much either, but it still generates a lot of data. So what we are doing with this is to use as input their heart rate and their movement data, on which we calculate a bunch of features which, you can imagine, just aggregate all this data.
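As a rough sketch of what such a windowed feature-aggregation step could look like (the sampling rate, window length and summary statistics here are illustrative assumptions, not the study's actual choices):

```python
# Hypothetical sketch: turn raw free-living streams (wrist acceleration,
# chest-strap heart rate) into per-window summary features.
import numpy as np
import pandas as pd

n = 6 * 24 * 60  # six days at one sample per minute (assumed rate)
raw = pd.DataFrame({
    "accel_mag": np.random.rand(n),             # synthetic acceleration magnitude
    "heart_rate": 60 + 30 * np.random.rand(n),  # synthetic heart rate (bpm)
})

# aggregate each hourly window into mean/std/min/max summary features
windows = raw.groupby(raw.index // 60)
features = windows.agg(["mean", "std", "min", "max"])
features.columns = ["_".join(col) for col in features.columns]
print(features.head())
```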
We feed this into two layers of a fully connected neural network, and we use this to do a few things. In the next slide I'll show you some results, but essentially the first thing is that we try to use this data to predict the fitness levels, the VO2 max test. The fact is that we have the ground truth, so you can tell how well we're doing, and which sensor, which aspect of the data, is more important. We've also tried to see if the models are robust enough: if we train on the original cohort, how does the model do on the later cohort? And we're trying to understand these sorts of aspects, as well as looking at whether movement is perhaps also a good indicator of heart rate, for example; that is what you're going to see next.

So here, let's go straight to the figure, which is the easiest thing to interpret. On the x-axis you have the VO2 max of the users, and the two different distributions are the predicted versus the ground-truth distribution. As you see, they match reasonably well; we are still, I would say, under-predicting for a portion of them, as you can see from the purple coming out at the back there. And in the table, for those of you who do like tables, we break down the different results on the error, the root mean squared error, in the columns there. Then you see, if I can point to it, the variants: just using anthropometrics, then, excuse me, resting heart rate mixed in, and then adding the wearable data. As you can see, the wearable data is interesting: it does help the prediction.

While I was discussing these predictions with the epidemiologists, I always asked: is this improvement reasonable? And their answer is quite interesting, because they say, well, it depends what you're trying to do. Sometimes they really need even just this one point more to really be precise, so they're striving to get this down as much as possible. As I said, this is really at the beginning, but we think it's the right direction and interesting to look at. The thing to remember is that this data is from people who are not all athletes, which is something you find often in some of these studies; these are normal people, for whom this information is very interesting and could lead to good outcome prediction.

Along the same lines, we are now also trying to see if the wearable data can be input into another machine learning framework; for the details, I just put some papers down at the bottom, and you can look on my web page if you want the technical details.
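To make the model shape concrete, here is a rough sketch of a regressor with two fully connected layers of the kind described; the feature dimension, hidden width, optimiser and synthetic data are all assumptions for illustration, not the study's published architecture.

```python
# Hypothetical sketch: per-participant feature vector in, VO2 max estimate out.
import torch
import torch.nn as nn

class VO2MaxRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),  # fully connected layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),      # fully connected layer 2
        )
        self.head = nn.Linear(hidden, 1)               # VO2 max (ml/kg/min)

    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

model = VO2MaxRegressor(n_features=64)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 64)       # a synthetic batch of aggregated features
y = 25 + 30 * torch.rand(32)  # synthetic VO2 max targets

optimiser.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimiser.step()
```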
I don't think this talk is particularly about all the details of the neural network architectures used. So: we use the data from the wristwatch and try to see if we can forecast heart rate from it. That is the main task being performed, and the next slide will have some results on where we are with that. But also, what I think is interesting about this technique is the fact that the latent representation at the penultimate layer of the network, which the network learns from this activity data, is quite predictive of other clinically relevant information, such as BMI, age, sex and energy expenditure. So, all of a sudden, we have an interesting relationship between what the network is learning and what the characteristics of these individuals are.

And here are the numbers, essentially. In the first table here, if you see my mouse, and I'm not sure you do, we have tried to stick to our technique, and we've tried to use it just with the acceleration data and the temporal features that are embedded in the data we have, as well as adding resting heart rate. Now, as I said at the beginning, resting heart rate is a measure that one can conceivably get in a reasonably discreet manner: it can be checked in a very quiet moment of your time, maybe when you lie down, and not so frequently, so it's conceivable as a measurement that doesn't cost much to add. And as you can see, the error is substantially, I would say reasonably, decreasing when you start using resting heart rate in addition to acceleration. Clearly, epidemiologists know about this: they know that this is an important feature of your fitness, and it is obviously very correlated with your heart rate in general. And here we have the acceleration indicating the amount of activity, which is also indicative and can be a proxy for your heart rate variability and your mobility level.

In the other table we see the outcomes that we have, with various principal component analysis reductions from the features: we have reasonable prediction of some of the demographics, height, sex, somehow even age, BMI and weight.
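As an aside on the penultimate-layer point: once any such network is trained, its last hidden activations can be pulled out and probed with a simple model for other attributes. A minimal sketch, continuing from the toy regressor above (same `model` and `x`); the real work uses a different architecture, and the BMI labels here are synthetic:

```python
# Extract the penultimate-layer representation and probe it for BMI.
import numpy as np
from sklearn.linear_model import Ridge

with torch.no_grad():
    embeddings = model.body(x)          # (batch, hidden) latent representation

bmi = np.random.uniform(18, 35, 32)     # synthetic BMI labels, illustration only
probe = Ridge().fit(embeddings.numpy(), bmi)
print("probe R^2 on the toy data:", probe.score(embeddings.numpy(), bmi))
```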
So with this first part of the work, what I wanted to highlight, amongst the original bullet points, is a few things. One: free-living data is more difficult to deal with, possibly more confusing. You often don't get these beautiful results. But we need to work with it, because if you talk to epidemiologists, or people who do this kind of large-scale work, they are interested in monitoring the population, and they cannot afford to do this with something that is very controlled. So we need to find techniques to do it. The second aspect is the continuous and the longitudinal. In the results I have probably really just scratched the surface of this: we have Fenland 1 and Fenland 2, people monitored at a distance of seven years, and the only thing we have done in that respect is to check that our models were robust at a later moment in time. But there is a lot more that we think we can do. For the rest of the talk I will bring this back, but for the moment I will just move to another example, with another sensor that we have been using quite a bit: the microphone that is in our devices.

This is about the application of audio to auscultation. Perhaps the auscultation that you as an audience know best is heart auscultation, or respiratory auscultation. It turns out, and this is what I've been told in person by clinicians, that auscultation is very difficult for the human ear, and often junior doctors are not skilled at it and, from what I hear, not so trained in it, because it can easily be proxied by other devices: for the heart, echocardiograms are substituting auscultation with the stethoscope. On the other hand, machines and microphones are in our hands, and they are cheap. Most importantly, they are with us all day, which means that, with respect to the discrete auscultation that a doctor could do on us, these things can listen to us continuously. Now, this has its advantages as well as its challenges, but also opportunities.

And so I will start with an example of audio that you might be familiar with, and that's voice. In 2017, this MIT Technology Review article highlighted that voice could be indicative not just of what is perhaps more intuitive to you, psychiatric and psychological conditions; the fact that perhaps you can hear stress in a voice is something you might have heard of; but also even heart disease. And the intuition behind that is that the vocal cords and the respiratory tract are somehow very intertwined with the cardiovascular tract, so perhaps a hardening of the arteries might make changes in your voice more prominent.
So that's, you know, the computer scientist's interpretation, in lay terms, of what the situation is. So it's not just about data that comes out of our vocal and respiratory tract; it could also be data that comes from our heart. We have ECGs in our watches already, there's a one-lead ECG in my Apple Watch, but there are pathologies that can only be heard, or seen through an echocardiogram. So auscultation is important. There start to be collections of datasets from digital stethoscopes that can be used for auscultation of heart pathologies; below, if you're interested, is a reference to one of the works, and we're not the only ones working on how this can be done.

The problem in general is that, while for speech there are very many datasets available and people are really concentrating on the techniques, here there is very limited data, and in some cases there is really no data. I was talking to a colleague who is a respiratory clinician, and I was asking her how they train their doctors. And she was telling me that the main technique is to listen to the same patient: the consultant listens, the trainee listens, and then they learn how to understand respiratory sounds. But, you know, having data banks... I'm sure you know where I'm going with this: the collection of this data is as important as the analysis of it. This is a review from 2017, one of the many that can be found, so people are really clear on the fact that having data can be useful in creating models. And here are examples of things that can be detected using this data: asthma, COPD and pneumonia are the three in this particular abstract.

And so, while I was mulling over this as part of my ERC Advanced Grant, COVID started to happen, and a couple of colleagues got in touch. They knew about my project, and we started a collection of data through an app that we pushed out. Now, I could give a separate talk about how difficult it was to push out an app to collect sounds that had COVID in the name, at pandemic time; this was March, April 2020. You can ask me at the end, and I have thoughts about how this could be changed, because at the time we were really trying to do something useful. But the result of this collection, and I'll talk a bit more about what data we're collecting, is contained in a large-scale dataset that we've just pushed to the NeurIPS datasets track and that will be released momentarily.
We have already released subsets of the data. This data is private and very sensitive, and therefore we are releasing it in phases, with data transfer agreements between institutions; I can say more if you have questions at the end. What does the app do? I'm spending a little bit more time on this because it's very timely, and I think we've learnt a lot by doing it, and we're still on it. In addition to recording demographics, medical history and symptoms, which many other apps are doing, we are recording sounds: we're recording breathing sounds, we're recording cough sounds, and we're recording voice sounds. And again, I could give another talk about how we decided to go for the sentence that you see on the third screen, which the user needs to read, and perhaps what we should have told them to read once I talked to voice experts. You learn by doing.

Why? What's the holy grail here? Well, we have all these very cheap lateral flow tests, and we have more precise PCRs, but we think that for these diseases, having additional scalable, contactless, affordable and, I should add, sustainable ways of testing, even at lower precision, would be very valuable. And after working for more than a year on this, the conclusion I came to is that this could be a really valuable tool when you're looking at respiratory disease progression, where this kind of digital device could be really, really valuable.

And so, just because I like graphs and this is interesting: this is the data we have collected. We do ask for some ground truth from the users; we ask them to report whether they have tested for COVID. All of this is crowdsourced, so they can lie; people have been miaowing into the app, so we have lots of noisy, dirty data that we have to clean and look at. Most of the data is in fact negative, as you would imagine, and we have some COVID-positive data as well. We ask the users where they're from. You can see the bumps in the data collection, down there, when we did a press release or someone heard about our app. And age, gender and smoking status information are also in these graphs.

One thing I should say, because I'm sure you're asking yourself this: well, why would you be able to see this? Why is COVID different? Well, we don't have that information yet. I would start by saying that other researchers, obviously, have been trying to do similar things.
And I'll just point you to one work that I thought was particularly useful, by the group at CMU, by Rita Singh, who has done analyses of the characteristics of COVID voices. And since then we've been contacted by many clinicians who are essentially saying, "I think I can hear it when a patient comes around": people who are really on the ground, on the front line, and they thought they could hear something.

The reality is that, depending on what data we have, we can perhaps do a different set of predictions. So, will we be able to distinguish COVID from the flu? Well, we have absolutely no data on the flu, so this is something that we are very interested in trying to understand. But let's not get ahead of ourselves too much. This is just one slide that shows what we've done to get to the task: can I distinguish, from the sounds of a person, whether they're COVID positive? We used a pre-trained model, which was trained on previous large-scale audio datasets, extracted the features, concatenated them, and fed them into a classifier that was then used for the prediction. And there are two tasks: one is really the diagnostic task, trying to say, is this sample positive, yes or no; and one, which we're still working on, is more longitudinal. I'll leave that for later.

Now, I'll give you only one piece of information. These are three papers we published; the last one, which is under review, is the one that explores the realistic performance of audio-based digital testing, because we realised that there was a lot of hype at some point, with people claiming performance of 90-plus percent, which we didn't really believe. We think a realistic tool, with data that does not yet cover colds and flu, could be at around 0.7 in performance. However, as I said, every time we tried to integrate our dataset with other datasets that had other diseases, the machine learning framework was too smart and would detect the dataset rather than the disease. If you have questions on this, I'm happy to take them; we're still exploring what this sort of thing can be useful for, and having better ground truth and better data on other diseases is important. One test we have done is to try the model on data that we have of people with asthma, and it seems that the model wasn't easily confused by that; but I'm sure it would be confused by other diseases.
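For concreteness, here is a minimal sketch of the kind of pipeline just described: embeddings from a pre-trained audio model, one per sound type, concatenated and fed to a classifier. The `embed` function is a stand-in, not a specific library call, and all data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen, pre-trained audio embedding model."""
    return np.random.rand(128)  # placeholder 128-d embedding

def sample_features(breathing, cough, voice):
    # concatenate the per-modality embeddings into one feature vector
    return np.concatenate([embed(breathing), embed(cough), embed(voice)])

# synthetic corpus: 100 users, three recordings each, binary labels
X = np.stack([
    sample_features(*[np.random.rand(16000) for _ in range(3)])
    for _ in range(100)
])
y = np.random.randint(0, 2, size=100)  # COVID-positive / negative labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
p_positive = clf.predict_proba(X)[:, 1]  # diagnostic score per sample
```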
So it's a matter of deciding what this could be useful for, which is more of a public health question than a methodological machine learning question at that point.

Now, the final important aspect here, for me, is this. After reflecting on this for a year, I think these sorts of tools will become invaluable to keep patients out of hospital and to look at onset, as well as progression and recovery. We're asking our users to give data every couple of days, so we are starting to have samples; one of the volunteers has given us more than 250 samples, and in every talk I give I essentially thank them for this. This is very valuable data. At the bottom here you see a graph that shows how we could possibly see the progression of someone with the disease before they test negative: the green part is where they have a negative test, and the other part is where our data and our model already start to decline in the probability of the prediction. And this is not personalised or anything; it's just using a sequential modelling technique. But the idea is to get here, and this is possibly not just for COVID; it's something we're trying to think about more generally and scale up to other diseases.

Now, the last part of this is a reflection on how, especially in this case, the idea of having this prediction, this one number that says COVID or non-COVID, is useful. We looked around at uncertainty, of course, and I found, for example, this paper, on diabetic retinopathy images essentially, which was saying that computing the uncertainty of the prediction allowed them to refer the subset of difficult cases for further inspection, perhaps referring them back to the clinician. So that is obviously one use of uncertainty. What I would like to highlight, and I will go through this complex graph in steps, as I have time for that, is the fact that with a digital intervention, uncertainty could be integrated into the process: not just where the clinician comes in, but also where the need for more samples comes in. And this is even more useful, and it has a very low entry point, because the data is digital and easily sampled, at least in this particular case. So in the paper that you see at the bottom we are essentially solving two problems at once.
The first problem is the generation of uncertainty over the prediction value, the COVID prediction, and we do that by using ensemble models: not just using one model, but using multiple models and aggregating the prediction variance, and, when there wasn't, let's say, certainty about the prediction, declaring that an uncertain prediction. Using different ensembles also solved the problem that our data was mainly negative, people declaring that they have tested negative: we use just one positive set and balance it with multiple negative sets, so the different members of the ensemble see different instances of the samples that we have (a small sketch of this idea appears below). One interesting piece of information is the graph at the bottom, where we noted that the uncertainty tended to be higher when we had wrong predictions. We don't know if this holds in general, but it is certainly an indication that for the wrong predictions it might be worth retaking the data: perhaps the data was noisy and was therefore predicted in a certain way. I will stop here on this, but if you want to read more, there is a paper there.

Essentially the last couple of slides are related to the privacy argument that I made before. Now, machine learning on device is an open area of work. Many researchers have made strides in compressing the models and in using various techniques to make that happen. We're still not there on a number of things, including perhaps training on device, but I think the agreement in the community is that, more than training on device, we are interested in incremental learning: having a model and then perhaps adapting it on device, which is more interesting. What I found particularly interesting is bringing in this idea: if we have uncertainty estimation in the models, can we then also bring that on device? Again, if you want to read into this area, we are by no means alone in this quest, but there is one reference to work we have been doing on this.
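To ground the ensemble-and-uncertainty idea from the COVID prediction work above, here is a minimal sketch: one scarce positive set, several negative subsamples, one model per subsample, with the spread of the ensemble's predictions read as uncertainty and used to trigger re-recording. The data, sizes and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(0.5, 1.0, size=(40, 32))    # scarce positive samples
X_neg = rng.normal(-0.5, 1.0, size=(400, 32))  # abundant negative samples

# one model per negative subsample, each balanced against the same positives
models = []
for _ in range(10):
    neg = X_neg[rng.choice(len(X_neg), size=len(X_pos), replace=False)]
    X = np.vstack([X_pos, neg])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(neg))]
    models.append(LogisticRegression(max_iter=1000).fit(X, y))

x_new = rng.normal(0, 1, size=(1, 32))          # a new user's sample
preds = np.array([m.predict_proba(x_new)[0, 1] for m in models])
mean_p, spread = preds.mean(), preds.std()      # aggregate and its spread
if spread > 0.15:  # arbitrary threshold for illustration
    print("uncertain prediction -- ask the user to record another sample")
```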
And this is my last slide before the questions. As I said at the beginning, there are other new types of devices on which we can start doing these things, and obviously having sensors around your head, in your ear, like I'm wearing right now, is very interesting. We found that we can already use microphones to monitor breathing and heart rate with the in-ear microphone. Again, this is initial work that just does activity recognition with that microphone, but that's the direction we're going. So doing all of this, the data collection and perhaps then the analysis, on device is the general picture here.

And then here is my slide of thanks. Obviously I can't do all this; I've done none of this, in fact: all of these people are the people who have done it all. And if you want to contact us, here are the details. Thank you very much for listening, even if just online. Thank you very much.

Thank you so much, Cecilia, for a really stimulating lecture. I've got some questions that people have been asking here, so I'm going to read those out; also, people can feel free to send more in as we're going. Well, actually, the first one I want to read is not a question but a comment. It just says: thanks for the lecture, your work is so interesting. So I think that many people wanted to write this comment, so I thought I'd read it first.

Thank you.

OK, so then some more technical questions. The first one asks about the work for measuring heart rate, and it says: why are we predicting heart rate rather than measuring it? So what's going on there?

OK, so it's a very good question, and we are also working on trying to measure it. But as researchers, we're also interested in trying to see what is the right proxy. There are cases where, at the moment, measuring heart rate is not precise: we have PPG sensors on these devices that have been proven to have all sorts of biases from movement on the wrist. I heard a talk from an expert, at Google in fact, who was saying the wrist is really the worst place to have a heart rate sensor, because we move it so much, and for many other reasons. So we are trying to see what else: you can measure it from here, you can measure it from other places. It's another line of research, and, you know, I'm big on the question "why not?", and I guess that's how I would like to answer this question: why not? Challenging the idea that one sensor is always the best is something I like to do. I hope that answers the question.

Great, thanks. We have another person who's interested, and actually I'm sure many people are interested, in what types of neural nets you found more effective for making predictions from the wearable data.

That's a very good question. I mean, it depends, in the sense that, of the two I presented, one is using a CNN plus GRUs, and the other one is using just two dense layers.
And in the one using the dense layers, we were essentially condensing the features: we use features instead of the raw accelerometer input, because that was essentially too much data. So I'm sure I'm not answering the question you're looking for, but I can point you to the literature. I was teaching this, in fact, and I know people at Georgia Tech have looked at the best techniques for accelerometer activity recognition data, and I think they were big on LSTMs, for example. But I think the jury's out, and it really depends how much data you have and what you're trying to do, whether you're trying to combine multiple sensors or not. I'm probably not even the right person to ask; I will ask one of my students. Maybe you can send me an e-mail and I'll put you in touch with the people on the ground on this.

OK. So another person is asking about the COVID predictions, and the question is: you've got a model that's trained on a kind of population; how would the accuracy increase if you had a model that was trained per user, and actually, is that even feasible? I mean, could you do that over time?

Well, that is, I think, the next step. The problem is that we're missing data. At the moment we use the general model, because that's where the data is; you mainly have only one sample per user. But if you were to start collecting personalised samples day after day... I even had people saying: we are so different, our voices are so different, that I wouldn't expect this model to be precise on the next person, because what you sound like is different from what I sound like, and with face masks on that is even more true. So, yeah, personalised models are really the way to go, and I think this is even more important for progression: when you're trying to monitor progression, knowing your own baseline is where we're going. And I think the lack of data is stopping all of this research at the moment.

OK, actually, these questions are coming at a huge rate, you've obviously stimulated loads of people, faster than I can even read them, which is excellent. Let me try another. The one I want to ask you next is about the difference between in-ear sensors and wrist sensors in terms of noise. So what's more noisy? And then the question asks, you know, whether this is more sound-based... OK, basically, that's the question: which is more noisy?

OK, so, essentially there's virtually no research on earable heart rate and respiratory sensing, and we are looking into it; we have nothing published.
At the moment we only have the paper that I referred to before, the one at the bottom, where we do activity recognition; we are now monitoring heart rate. What I can say is that the head is a much more stable place to monitor, you know, activity and possibly even physiological things. So the wrist might not be the best place. There are very few comparisons for heart rate monitoring on an earable at the moment. So, maybe next year we can talk about this with more data. Certainly promising.

And here's somebody who's asking about the difficulty that, in the real world, you don't have labelled data, and what are some of the effective methods you can use to try to get around the lack of labels?

That's a very good question. This is something the community is really looking very much into. Obviously, transfer learning has been tried, self-supervision has been tried, and people are trying to use auxiliary tasks as well, in ways that we've also tried. And I think there are techniques that have been applied to other data that can be tried here, but the problem of labelling in wearables is really perhaps bigger than in other domains. So, yeah, I think I've mentioned the techniques that we've been using, but if you're not working on that and you want to work on that, it's definitely, I think, interesting.

OK. And there's someone asking about something that you alluded to but maybe didn't quite have time to tell us enough about. This person is asking: what were the biggest challenges with the app deployment and converting the data into results? You said you had some thoughts on how the process could be improved; what are they?

So, our problem was that we were blacklisted for about a month by Google and Apple, because the app had COVID in the title, and it was considered kind of an exploitation of a large-scale event. And so I had to plead with the head of public health in Cambridge to send a letter, through the normal forms that Google has, to say: we are not playing around, we are trying to do a research study on this, we have all the ethics approvals, we have all the data transfer agreements in place. So there seemed to be a missing path to connect academic research with this sort of large-scale deployment, for this sort of, I would say, maybe excluding mine, but generally very important study that can happen through the deployment and large-scale collection of this data.
Obviously, you know, privacy is up there as a big banner; I was alluding to this when I mentioned it. But definitely another important lesson: people have asked about our data, why not release only the coughs, why are you not releasing this data publicly to everyone, as some groups have? And in consultation with experts in the university, we have decided that this data is actually more dangerous than people think. We also had this conversation with the chairs of the NeurIPS datasets track, where we submitted our dataset paper, because the guidance at the beginning was asking us to release the data publicly if we were to submit to that track. And I wrote to them and said: well, it is wrong to release this data publicly, because someone could re-engineer the identity of someone from their voice, or even their coughs, in fact, just by correlating it with other publicly available data. So this process has taken time, but I think we got it right.

Thank you. So, I'm just going to ask one more question, and then, because we can't do it live, I'm just expressing so much thanks from all of us listening. But let me ask one more question. This person says: could you say a couple of things on current research on mental health analysis using wearable data? Do you have any general thoughts, or directions that you might find interesting?

So, we worked on collecting data for mental health five, six, seven years ago. In fact, one of the papers that got the ten-year impact award was the one doing emotion detection from voice, on device, on a very old Nokia phone that had a wonderful battery; that's why we could do it. We used Gaussian mixture models. So that was on the voice. We also collected data from accelerometers and, you know, questionnaires, and then tried to correlate those with mood; we had mood reports, so we have a large dataset with that sort of information. So I think these days these things are still ongoing. I think the finding from the phone that was striking at the time was that someone having some sort of activity, which didn't mean exercise, it just meant that they were going somewhere all the time, or that they were using the phone, was actually positively correlated with mood. So, definitely, this is really important, and there are various aspects of mental health, expanding to Alzheimer's.
I have a project on monitoring memory and Alzheimer's, on the correlation with the ability to navigate, which is apparently one of the first things that disappears with Alzheimer's; this is also another area where these devices could make a difference. I could go on and give another talk, but Leslie would stop me.

Okay, thank you so much, Cecilia, for a marvellous talk, and thanks again to our sponsors, Oxford Asset Management, and thanks to all of you for attending online. So, that's today's lecture. Thank you very much.