Hello, everybody. Welcome to this term's Strachey Lecture. We're going to wait just a few minutes, because I can see a lot of people joining; there was a technical difficulty at the beginning, so I'm sorry about that. But please, let's just wait a minute and let the people who are trying to join get in.

OK. So, on behalf of the Department of Computer Science at Oxford University, I'd like to welcome all of you to this term's Strachey Lecture. This is a series of distinguished lectures that we hold once a term in memory of Professor Christopher Strachey, who was actually the first professor of computer science at Oxford and who founded the Programming Research Group in 1965. Together with Dana Scott, he founded the field of denotational semantics, which provided a firm mathematical foundation for programming languages.

Before I get the pleasure of introducing today's speaker, I get another pleasure: I'd like to really strongly thank Oxford Asset Management, who are very generously supporting this series of lectures. They've actually been supporting the series since 2014, and without that support we wouldn't be able to bring this really, really high-calibre series of speakers. So great thanks to them.

It's a great pleasure today to welcome Professor Cecilia Mascolo. Cecilia is a professor of mobile systems at the Department of Computer Science and Technology of the University of Cambridge. At Cambridge, Cecilia is the head of the Mobile, Wearable Systems and Augmented Intelligence group. She is also currently holding an ERC Advanced Grant on the topic of audio-based mobile health diagnostics, which explains both her research area and the topic of the talk.

Cecilia has got a huge string of awards, so I had to shorten it so that you would have time to hear from her. Before her ERC Advanced Grant she held an advanced research fellowship, and she has been a fellow of the Turing Institute. She has given a huge number of exciting keynote talks; I just looked for this year, and amongst this year's I found the IEEE Healthcare Summit, ISWC and SmartComp, and from last year ACM HotMobile. She also has a host of best paper awards, including recently a ten-year impact award at ACM. Her talk today is entitled "Mixed Signals: Audio and Wearable Data Analysis for Health Diagnostics", so we're really looking forward to that.
Before I welcome Cecilia, I just want to make one technical comment, since we're still in pandemic mode: during the talk you can type questions into the chat, and at the end of the talk I will read those out so that Cecilia can answer some of our questions. OK, Cecilia, it's a huge pleasure to welcome you virtually to Oxford.

Leslie, thank you very much. So before I share my slides, I would like to thank you and... ah, I guess you can hear me now. So, Leslie, thank you very much; I was muted for a while. I would like to thank you and your predecessor, Michael Wooldridge, for this invitation. It's a great honour to be here. I'm sorry that I can't meet you all in person and have the live interaction that we could have, but I will now show my slides, and hopefully some interaction will still happen. I will now assume you can see my slides and start my talk.

There is now, in our daily lives, a constellation of wearable devices that are sensing our behaviour and, perhaps in a more indirect way, our health. So one would imagine, if you have some of these, like phones, watches and earables, that this area is kind of done, that our health is kind of transformed, and there is no research for us in academia to do around this anymore. Well, in this talk I would like to really highlight that what we're doing at the moment, with these devices that go into consumers' hands, is really playing with the sensing and with the data that comes out of it quite superficially. We really need to go through a number of breakthroughs to really transform health.

So in this talk I will first talk about the challenges that we're facing and the exciting opportunities where this could innovate. These are only some; I will introduce the challenges a bit, and I have two representative examples from my research in which I try to explain some of this.

The obvious first one is sensing modalities: of course, new and newer sensors. My colleagues in engineering are coming up with new ways of sensing our behaviour. I have a colleague who works on EEG; portable EEG sensors are becoming smaller and smaller, and sensors are being woven into fabric, possibly even tattoos. Pills are being developed that can be ingested, and contactless communication between ingested sensors and external devices can happen, so that you can sense our health in a way that is less invasive or disruptive for our activities.
But at the same time, existing sensors that are already on devices we wear generate amounts of data that we are not quite at the stage of being good at modelling towards the final aims that we have. And this can be taken further by saying that these devices generate data at a granularity we haven't seen before and, because of the type of devices they are, they can sometimes be placed in parts of our body that we have never thought of sensing. So some interesting conversations I've been having with clinicians are often of this style: what if I could give you long-term, continuous sensing from, perhaps, your abdomen? What kind of things would you be able to do? And this is so far from what they're used to seeing that even that kind of conversation, about what can be done with this sort of data, is missing.

In my examples today I will talk about longitudinal sensing: the fact that it is now much easier not just to have fine-grained data continuously, but also for a long, long time, which means that we can assess differences between past and present, and present and possibly future when that comes, and look at predictions with this sort of longitudinal data.

One thing that is important here is that the studies and the techniques that have been used until now are often used on small-scale trials, small cohorts, and often the free-living aspect of the analysis is somehow missing. You have more control over the labels of the data in these studies, and there is less ability to adapt to the unforeseen and noisy data that comes out of free living.

And I'm sure with the fourth bullet point I'm preaching to the choir in this department, but I will talk about it nevertheless, and perhaps it leads to an interesting conversation where we are talking about clinical and diagnostic aspects of uncertainty. The ability to go beyond the concept of accuracy of a prediction is important, but the angle that is perhaps new to this department is the fact that, as I will show, in the case of mobile data it is possible to weave this uncertainty into the pipeline of how the data is collected and re-collected. Because the entry point for re-recording and getting more data is so low, if a prediction on certain data is uncertain, then maybe we can easily collect more data. I hope this is clear; to the extent that it is not, I have some examples on this.

And the last point is obviously one that I would be remiss not to talk about at the end of the lecture, and it is related to privacy.
My perspective on this is that privacy could somehow be embedded in the process that we develop, to make sure that a lot of this can be done closer to the users. And as a systems researcher, I will show at the end some examples of how perhaps we can bring this further, closer to the users: perhaps by developing models in a clinical trial, where the side effect is, you know, some lack of privacy for the users in the trial; but once we develop more and more models, the models can be deployed at scale, and the privacy of the users is respected, because those models will run on devices close to the users, and on their data, locally.

There is more of this; I wanted to give you this anticipation because I know your attention, especially online, is limited. So I will try to start with the first example. As I said, I have two examples, on two different types of diseases and the sensing data that we use for them.

The first one is about cardiorespiratory fitness. I don't know how many of you know, and I certainly was surprised by, the fact that cardiorespiratory fitness is a very important factor that is inversely associated with cardiovascular diseases and, interestingly enough, is much more indicative than cholesterol, diabetes, hypertension and even smoking. So it's very important to assess cardiorespiratory fitness, and this is a project we're doing with the MRC Epidemiology Unit.

If I was in a live lecture theatre, I would now ask you to raise your hand if you have ever done one of these strenuous tests in which you go on a treadmill or a bike and you wear a mask. This test is cumbersome and very strenuous, because they often push you to the end of your abilities. It measures your cardiorespiratory fitness by measuring your VO2 max, which is the maximum volume of oxygen that you can breathe in, which is then transported through the bloodstream and eventually transformed into energy by your muscles.

Now, as you can imagine, this test is actually not very scalable: you need equipment, and it is strenuous. So what epidemiologists and people who study this sort of relationship have been doing for the moment is to proxy this with other measures, such as anthropometric measures, demographics, height, weight, BMI, as well as questionnaires about how many times you exercise and what type of exercise it is. And this is a good proxy already, but, you can imagine, you probably know where I'm going now.
There are lots of wearable data that could be used as a proxy for that sort of questionnaire data, and it turns out that resting heart rate, which can be measured quite easily, at least more easily than an exercise test, is also a very good, indicative proxy for that.

So where are we with bringing wearables into the estimation of VO2 max? Well, if you have one of these most modern devices, you know that some of them, when you tell them what exercise you're doing, already give you an estimate of VO2 max. So this is happening in real life. But there are very few studies showing the effectiveness of wearable data, measuring activity as well as heart rate, as a proxy for VO2 max, and, most importantly, there are essentially none that do this in free-living conditions. If you remember my first slide, the free-living condition was one of the important things, because you really don't want the user to have to label the data so much. So the free-living aspect is very important.

And so now I have a few slides on the study we are doing. As this is a keynote, I hate to present research that we have already published, so I always try to push myself to present something that we are doing. So this is not yet out; this is something we are working on. It is the measurement of cardiorespiratory fitness through wearable data in free living, and it works with data that the MRC Epidemiology Unit has collected. It's a study, a dataset, called Fenland. They have a number of participants, 11,000 in the first cohort, and then, seven years later, they have another cohort; in this particular case we're using a subset of that. On all these people they do VO2 max tests. They measure anthropometric measures, as I said, demographics as well as height, weight and BMI, and they ask them questions. But they also ask them, and this I think is invaluable, to wear an accelerometer on their wrist, as well as an ECG chest strap to measure their heart, for six days, essentially very much continuously.

So this is a lot of data, and I remind everyone that this is in free living: we know nothing about what they're doing in those days. Six continuous days is not too much either, but it still generates a lot of data. So what we are doing with this is to use as input their heart rate and their movement data, on which we calculate a bunch of features which, you can imagine, just aggregate all this data.
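As a rough sketch of what such a windowed feature-aggregation step could look like (the sampling rate, window length and summary statistics here are illustrative assumptions, not the study's actual choices):

```python
# Hypothetical sketch: turn raw free-living streams (wrist acceleration,
# chest-strap heart rate) into per-window summary features.
import numpy as np
import pandas as pd

n = 6 * 24 * 60  # six days at one sample per minute (assumed rate)
raw = pd.DataFrame({
    "accel_mag": np.random.rand(n),             # synthetic acceleration magnitude
    "heart_rate": 60 + 30 * np.random.rand(n),  # synthetic heart rate (bpm)
})

# aggregate each hourly window into mean/std/min/max summary features
windows = raw.groupby(raw.index // 60)
features = windows.agg(["mean", "std", "min", "max"])
features.columns = ["_".join(col) for col in features.columns]
print(features.head())
```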
We feed this into two layers of a fully connected neural network, and we use this to do a few things. In the next slide I'll show you some results, but essentially the first thing is that we try to use this data to predict the fitness levels, the VO2 max test. The fact is that we have the ground truth, so you can tell how well we're doing, and which sensor, which aspect of the data, is more important. We've also tried to see if the models are robust enough: if we train on the original cohort, how does the model do on the later cohort? And we're trying to understand these sorts of aspects, as well as looking at whether movement is perhaps also a good indicator of heart rate, for example; that is what you're going to see next.

So here, let's go straight to the figure, which is the easiest thing to interpret. On the x-axis you have the VO2 max of the users, and the two different distributions are the predicted versus the ground-truth distribution. As you see, they match reasonably well; we are still, I would say, under-predicting for a portion of them, as you can see from the purple coming out at the back there. And in the table, for those of you who do like tables, we break down the different results on the error, the root mean squared error, in the columns there. Then you see, if I can point to it, the variants: just using anthropometrics, then, excuse me, resting heart rate mixed in, and then adding the wearable data. As you can see, the wearable data is interesting: it does help the prediction.

While I was discussing these predictions with the epidemiologists, I always asked: is this improvement reasonable? And their answer is quite interesting, because they say, well, it depends what you're trying to do. Sometimes they really need even just this one point more to really be precise, so they're striving to get this down as much as possible. As I said, this is really at the beginning, but we think it's the right direction and interesting to look at. The thing to remember is that this data is from people who are not all athletes, which is something you find often in some of these studies; these are normal people, for whom this information is very interesting and could lead to good outcome prediction.

Along the same lines, we are now also trying to see if the wearable data can be input into another machine learning framework; for the details, I just put some papers down at the bottom, and you can look on my web page if you want the technical details.
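To make the model shape concrete, here is a rough sketch of a regressor with two fully connected layers of the kind described; the feature dimension, hidden width, optimiser and synthetic data are all assumptions for illustration, not the study's published architecture.

```python
# Hypothetical sketch: per-participant feature vector in, VO2 max estimate out.
import torch
import torch.nn as nn

class VO2MaxRegressor(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),  # fully connected layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),      # fully connected layer 2
        )
        self.head = nn.Linear(hidden, 1)               # VO2 max (ml/kg/min)

    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

model = VO2MaxRegressor(n_features=64)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(32, 64)       # a synthetic batch of aggregated features
y = 25 + 30 * torch.rand(32)  # synthetic VO2 max targets

optimiser.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimiser.step()
```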
I don't think this talk is particularly about all the details of the neural network architectures used. So: we use the data from the wristwatch and try to see if we can forecast heart rate from it. That is the main task being performed, and the next slide will have some results on where we are with that. But also, what I think is interesting about this technique is the fact that the latent representation at the penultimate layer of the network, which the network learns from this activity data, is quite predictive of other clinically relevant information, such as BMI, age, sex and energy expenditure. So, all of a sudden, we have an interesting relationship between what the network is learning and what the characteristics of these individuals are.

And here are the numbers, essentially. In the first table here, if you see my mouse, and I'm not sure you do, we have tried to stick to our technique, and we've tried to use it just with the acceleration data and the temporal features that are embedded in the data we have, as well as adding resting heart rate. Now, as I said at the beginning, resting heart rate is a measure that one can conceivably get in a reasonably discreet manner: it can be checked in a very quiet moment of your time, maybe when you lie down, and not so frequently, so it's conceivable as a measurement that doesn't cost much to add. And as you can see, the error is substantially, I would say reasonably, decreasing when you start using resting heart rate in addition to acceleration. Clearly, epidemiologists know about this: they know that this is an important feature of your fitness, and it is obviously very correlated with your heart rate in general. And here we have the acceleration indicating the amount of activity, which is also indicative and can be a proxy for your heart rate variability and your mobility level.

In the other table we see the outcomes that we have, with various principal component analysis reductions from the features: we have reasonable prediction of some of the demographics, height, sex, somehow even age, BMI and weight.
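As an aside on the penultimate-layer point: once any such network is trained, its last hidden activations can be pulled out and probed with a simple model for other attributes. A minimal sketch, continuing from the toy regressor above (same `model` and `x`); the real work uses a different architecture, and the BMI labels here are synthetic:

```python
# Extract the penultimate-layer representation and probe it for BMI.
import numpy as np
from sklearn.linear_model import Ridge

with torch.no_grad():
    embeddings = model.body(x)          # (batch, hidden) latent representation

bmi = np.random.uniform(18, 35, 32)     # synthetic BMI labels, illustration only
probe = Ridge().fit(embeddings.numpy(), bmi)
print("probe R^2 on the toy data:", probe.score(embeddings.numpy(), bmi))
```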
So with this first part of the work, what I wanted to highlight, amongst the original bullet points, is a few things. One: free-living data is more difficult to deal with, possibly more confusing. You often don't get these beautiful results. But we need to work with it, because if you talk to epidemiologists, or people who do this kind of large-scale work, they are interested in monitoring the population, and they cannot afford to do this with something that is very controlled. So we need to find techniques to do it. The second aspect is the continuous and the longitudinal. In the results I have probably really just scratched the surface of this: we have Fenland 1 and Fenland 2, people monitored at a distance of seven years, and the only thing we have done in that respect is to check that our models were robust at a later moment in time. But there is a lot more that we think we can do. For the rest of the talk I will bring this back, but for the moment I will just move to another example, with another sensor that we have been using quite a bit: the microphone that is in our devices.

This is about the application of audio to auscultation. Perhaps the auscultation that you as an audience know best is heart auscultation, or respiratory auscultation. It turns out, and this is what I've been told in person by clinicians, that auscultation is very difficult for the human ear, and often junior doctors are not skilled at it and, from what I hear, not so trained in it, because it can easily be proxied by other devices: for the heart, echocardiograms are substituting auscultation with the stethoscope. On the other hand, machines and microphones are in our hands, and they are cheap. Most importantly, they are with us all day, which means that, with respect to the discrete auscultation that a doctor could do on us, these things can listen to us continuously. Now, this has its advantages as well as its challenges, but also opportunities.

And so I will start with an example of audio that you might be familiar with, and that's voice. In 2017, this MIT Technology Review article highlighted that voice could be indicative not just of what is perhaps more intuitive to you, psychiatric and psychological conditions; the fact that perhaps you can hear stress in a voice is something you might have heard of; but also even heart disease. And the intuition behind that is that the vocal cords and the respiratory tract are somehow very intertwined with the cardiovascular tract, so perhaps a hardening of the arteries might make changes in your voice more prominent.
So that's, you know, the computer scientist's interpretation, in lay terms, of what the situation is. So it's not just about data that comes out of our vocal and respiratory tract; it could also be data that comes from our heart. We have ECGs in our watches already, there's a one-lead ECG in my Apple Watch, but there are pathologies that can only be heard, or seen through an echocardiogram. So auscultation is important. There start to be collections of datasets from digital stethoscopes that can be used for auscultation of heart pathologies; below, if you're interested, is a reference to one of the works, and we're not the only ones working on how this can be done.

The problem in general is that, while for speech there are very many datasets available and people are really concentrating on the techniques, here there is very limited data, and in some cases there is really no data. I was talking to a colleague who is a respiratory clinician, and I was asking her how they train their doctors. And she was telling me that the main technique is to listen to the same patient: the consultant listens, the trainee listens, and then they learn how to understand respiratory sounds. But, you know, having data banks... I'm sure you know where I'm going with this: the collection of this data is as important as the analysis of it. This is a review from 2017, one of the many that can be found, so people are really clear on the fact that having data can be useful in creating models. And here are examples of things that can be detected using this data: asthma, COPD and pneumonia are the three in this particular abstract.

And so, while I was mulling over this as part of my ERC Advanced Grant, COVID started to happen, and a couple of colleagues got in touch. They knew about my project, and we started a collection of data through an app that we pushed out. Now, I could give a separate talk about how difficult it was to push out an app to collect sounds that had COVID in the name, at pandemic time; this was March, April 2020. You can ask me at the end, and I have thoughts about how this could be changed, because at the time we were really trying to do something useful. But the result of this collection, and I'll talk a bit more about what data we're collecting, is contained in a large-scale dataset that we've just pushed to the NeurIPS datasets track and that will be released momentarily.
We have already released subsets of the data. This data is private and very sensitive, and therefore we are releasing it in phases, with data transfer agreements between institutions; I can say more if you have questions at the end. What does the app do? I'm spending a little bit more time on this because it's very timely, and I think we've learnt a lot by doing it, and we're still on it. In addition to recording demographics, medical history and symptoms, which many other apps are doing, we are recording sounds: we're recording breathing sounds, we're recording cough sounds, and we're recording voice sounds. And again, I could give another talk about how we decided to go for the sentence that you see on the third screen, which the user needs to read, and perhaps what we should have told them to read once I talked to voice experts. You learn by doing.

Why? What's the holy grail here? Well, we have all these very cheap lateral flow tests, and we have more precise PCRs, but we think that for these diseases, having additional scalable, contactless, affordable and, I should add, sustainable ways of testing, even at lower precision, would be very valuable. And after working for more than a year on this, the conclusion I came to is that this could be a really valuable tool when you're looking at respiratory disease progression, where this kind of digital device could be really, really valuable.

And so, just because I like graphs and this is interesting: this is the data we have collected. We do ask for some ground truth from the users; we ask them to report whether they have tested for COVID. All of this is crowdsourced, so they can lie; people have been miaowing into the app, so we have lots of noisy, dirty data that we have to clean and look at. Most of the data is in fact negative, as you would imagine, and we have some COVID-positive data as well. We ask the users where they're from. You can see the bumps in the data collection, down there, when we did a press release or someone heard about our app. And age, gender and smoking status information are also in these graphs.

One thing I should say, because I'm sure you're asking yourself this: well, why would you be able to see this? Why is COVID different? Well, we don't have that information yet. I would start by saying that other researchers, obviously, have been trying to do similar things.
And I'll just point you to one work that I thought was particularly useful, by the group at CMU, by Rita Singh, who has done analyses of the characteristics of COVID voices. And since then we've been contacted by many clinicians who are essentially saying, "I think I can hear it when a patient comes around": people who are really on the ground, on the front line, and they thought they could hear something.

The reality is that, depending on what data we have, we can perhaps do a different set of predictions. So, will we be able to distinguish COVID from the flu? Well, we have absolutely no data on the flu, so this is something that we are very interested in trying to understand. But let's not get ahead of ourselves too much. This is just one slide that shows what we've done to get to the task: can I distinguish, from the sounds of a person, whether they're COVID positive? We used a pre-trained model, which was trained on previous large-scale audio datasets, extracted the features, concatenated them, and fed them into a classifier that was then used for the prediction. And there are two tasks: one is really the diagnostic task, trying to say, is this sample positive, yes or no; and one, which we're still working on, is more longitudinal. I'll leave that for later.

Now, I'll give you only one piece of information. These are three papers we published; the last one, which is under review, is the one that explores the realistic performance of audio-based digital testing, because we realised that there was a lot of hype at some point, with people claiming performance of 90-plus percent, which we didn't really believe. We think a realistic tool, with data that does not yet cover colds and flu, could be at around 0.7 in performance. However, as I said, every time we tried to integrate our dataset with other datasets that had other diseases, the machine learning framework was too smart and would detect the dataset rather than the disease. If you have questions on this, I'm happy to take them; we're still exploring what this sort of thing can be useful for, and having better ground truth and better data on other diseases is important. One test we have done is to try the model on data that we have of people with asthma, and it seems that the model wasn't easily confused by that; but I'm sure it would be confused by other diseases.
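For concreteness, here is a minimal sketch of the kind of pipeline just described: embeddings from a pre-trained audio model, one per sound type, concatenated and fed to a classifier. The `embed` function is a stand-in, not a specific library call, and all data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def embed(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen, pre-trained audio embedding model."""
    return np.random.rand(128)  # placeholder 128-d embedding

def sample_features(breathing, cough, voice):
    # concatenate the per-modality embeddings into one feature vector
    return np.concatenate([embed(breathing), embed(cough), embed(voice)])

# synthetic corpus: 100 users, three recordings each, binary labels
X = np.stack([
    sample_features(*[np.random.rand(16000) for _ in range(3)])
    for _ in range(100)
])
y = np.random.randint(0, 2, size=100)  # COVID-positive / negative labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
p_positive = clf.predict_proba(X)[:, 1]  # diagnostic score per sample
```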
So it's a matter of deciding what this could be useful for, which is more of a public health question than a methodological machine learning question at that point.

Now, the final important aspect here, for me, is this. After reflecting on this for a year, I think these sorts of tools will become invaluable to keep patients out of hospital and to look at onset, as well as progression and recovery. We're asking our users to give data every couple of days, so we are starting to have samples; one of the volunteers has given us more than 250 samples, and in every talk I give I essentially thank them for this. This is very valuable data. At the bottom here you see a graph that shows how we could possibly see the progression of someone with the disease before they test negative: the green part is where they have a negative test, and the other part is where our data and our model already start to decline in the probability of the prediction. And this is not personalised or anything; it's just using a sequential modelling technique. But the idea is to get here, and this is possibly not just for COVID; it's something we're trying to think about more generally and scale up to other diseases.

Now, the last part of this is a reflection on how, especially in this case, the idea of having this prediction, this one number that says COVID or non-COVID, is useful. We looked around at uncertainty, of course, and I found, for example, this paper, on diabetic retinopathy images essentially, which was saying that computing the uncertainty of the prediction allowed them to refer the subset of difficult cases for further inspection, perhaps referring them back to the clinician. So that is obviously one use of uncertainty. What I would like to highlight, and I will go through this complex graph in steps, as I have time for that, is the fact that with a digital intervention, uncertainty could be integrated into the process: not just where the clinician comes in, but also where the need for more samples comes in. And this is even more useful, and it has a very low entry point, because the data is digital and easily sampled, at least in this particular case. So in the paper that you see at the bottom we are essentially solving two problems at once.
The first problem is the generation of uncertainty over the prediction value, the COVID prediction, and we do that by using ensemble models: not just using one model, but using multiple models and aggregating the prediction variance, and, when there wasn't, let's say, certainty about the prediction, declaring that an uncertain prediction. Using different ensembles also solved the problem that our data was mainly negative, people declaring that they have tested negative: we use just one positive set and balance it with multiple negative sets, so the different members of the ensemble see different instances of the samples that we have (a small sketch of this idea appears below). One interesting piece of information is the graph at the bottom, where we noted that the uncertainty tended to be higher when we had wrong predictions. We don't know if this holds in general, but it is certainly an indication that for the wrong predictions it might be worth retaking the data: perhaps the data was noisy and was therefore predicted in a certain way. I will stop here on this, but if you want to read more, there is a paper there.

Essentially the last couple of slides are related to the privacy argument that I made before. Now, machine learning on device is an open area of work. Many researchers have made strides in compressing the models and in using various techniques to make that happen. We're still not there on a number of things, including perhaps training on device, but I think the agreement in the community is that, more than training on device, we are interested in incremental learning: having a model and then perhaps adapting it on device, which is more interesting. What I found particularly interesting is bringing in this idea: if we have uncertainty estimation in the models, can we then also bring that on device? Again, if you want to read into this area, we are by no means alone in this quest, but there is one reference to work we have been doing on this.
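To ground the ensemble-and-uncertainty idea from the COVID prediction work above, here is a minimal sketch: one scarce positive set, several negative subsamples, one model per subsample, with the spread of the ensemble's predictions read as uncertainty and used to trigger re-recording. The data, sizes and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pos = rng.normal(0.5, 1.0, size=(40, 32))    # scarce positive samples
X_neg = rng.normal(-0.5, 1.0, size=(400, 32))  # abundant negative samples

# one model per negative subsample, each balanced against the same positives
models = []
for _ in range(10):
    neg = X_neg[rng.choice(len(X_neg), size=len(X_pos), replace=False)]
    X = np.vstack([X_pos, neg])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(neg))]
    models.append(LogisticRegression(max_iter=1000).fit(X, y))

x_new = rng.normal(0, 1, size=(1, 32))          # a new user's sample
preds = np.array([m.predict_proba(x_new)[0, 1] for m in models])
mean_p, spread = preds.mean(), preds.std()      # aggregate and its spread
if spread > 0.15:  # arbitrary threshold for illustration
    print("uncertain prediction -- ask the user to record another sample")
```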
And this is my last slide before the questions. As I said at the beginning, there are other new types of devices on which we can start doing these things, and obviously having sensors around your head, in your ear, like I'm wearing right now, is very interesting. We found that we can already use microphones to monitor breathing and heart rate with the in-ear microphone. Again, this is initial work that just does activity recognition with that microphone, but that's the direction we're going. So doing all of this, the data collection and perhaps then the analysis, on device is the general picture here.

And then here is my slide of thanks. Obviously I can't do all this; I've done none of this, in fact: all of these people are the people who have done it all. And if you want to contact us, here are the details. Thank you very much for listening, even if just online. Thank you very much.

Thank you so much, Cecilia, for a really stimulating lecture. I've got some questions that people have been asking here, so I'm going to read those out; also, people can feel free to send more in as we're going. Well, actually, the first one I want to read is not a question but a comment. It just says: thanks for the lecture, your work is so interesting. So I think that many people wanted to write this comment, so I thought I'd read it first.

Thank you.

OK, so then some more technical questions. The first one asks about the work for measuring heart rate, and it says: why are we predicting heart rate rather than measuring it? So what's going on there?

OK, so it's a very good question, and we are also working on trying to measure it. But as researchers, we're also interested in trying to see what is the right proxy. There are cases where, at the moment, measuring heart rate is not precise: we have PPG sensors on these devices that have been proven to have all sorts of biases from movement on the wrist. I heard a talk from an expert, at Google in fact, who was saying the wrist is really the worst place to have a heart rate sensor, because we move it so much, and for many other reasons. So we are trying to see what else: you can measure it from here, you can measure it from other places. It's another line of research, and, you know, I'm big on the question "why not?", and I guess that's how I would like to answer this question: why not? Challenging the idea that one sensor is always the best is something I like to do. I hope that answers the question.

Great, thanks. We have another person who's interested, and actually I'm sure many people are interested, in what types of neural nets you found more effective for making predictions from the wearable data.

That's a very good question. I mean, it depends, in the sense that, of the two I presented, one is using a CNN plus GRUs, and the other one is using just two dense layers.
And in the one using the dense layers, we were essentially condensing the features: we use features instead of the raw accelerometer input, because that was essentially too much data. So I'm sure I'm not answering the question you're looking for, but I can point you to the literature. I was teaching this, in fact, and I know people at Georgia Tech have looked at the best techniques for accelerometer activity recognition data, and I think they were big on LSTMs, for example. But I think the jury's out, and it really depends how much data you have and what you're trying to do, whether you're trying to combine multiple sensors or not. I'm probably not even the right person to ask; I will ask one of my students. Maybe you can send me an e-mail and I'll put you in touch with the people on the ground on this.

OK. So another person is asking about the COVID predictions, and the question is: you've got a model that's trained on a kind of population; how would the accuracy increase if you had a model that was trained per user, and actually, is that even feasible? I mean, could you do that over time?

Well, that is, I think, the next step. The problem is that we're missing data. At the moment we use the general model, because that's where the data is; you mainly have only one sample per user. But if you were to start collecting personalised samples day after day... I even had people saying: we are so different, our voices are so different, that I wouldn't expect this model to be precise on the next person, because what you sound like is different from what I sound like, and with face masks on that is even more true. So, yeah, personalised models are really the way to go, and I think this is even more important for progression: when you're trying to monitor progression, knowing your own baseline is where we're going. And I think the lack of data is stopping all of this research at the moment.

OK, actually, these questions are coming at a huge rate, you've obviously stimulated loads of people, faster than I can even read them, which is excellent. Let me try another. The one I want to ask you next is about the difference between in-ear sensors and wrist sensors in terms of noise. So what's more noisy? And then the question asks, you know, whether this is more sound-based... OK, basically, that's the question: which is more noisy?

OK, so, essentially there's virtually no research on earable heart rate and respiratory sensing, and we are looking into it; we have nothing published.
At the moment we only have the paper that I referred to before, the one at the bottom, where we do activity recognition; we are now monitoring heart rate. What I can say is that the head is a much more stable place to monitor, you know, activity and possibly even physiological things. So the wrist might not be the best place. There are very few comparisons for heart rate monitoring on an earable at the moment. So, maybe next year we can talk about this with more data. Certainly promising.

And here's somebody who's asking about the difficulty that, in the real world, you don't have labelled data, and what are some of the effective methods you can use to try to get around the lack of labels?

That's a very good question. This is something the community is really looking very much into. Obviously, transfer learning has been tried, self-supervision has been tried, and people are trying to use auxiliary tasks as well, in ways that we've also tried. And I think there are techniques that have been applied to other data that can be tried here, but the problem of labelling in wearables is really perhaps bigger than in other domains. So, yeah, I think I've mentioned the techniques that we've been using, but if you're not working on that and you want to work on that, it's definitely, I think, interesting.

OK. And there's someone asking about something that you alluded to but maybe didn't quite have time to tell us enough about. This person is asking: what were the biggest challenges with the app deployment and converting the data into results? You said you had some thoughts on how the process could be improved; what are they?

So, our problem was that we were blacklisted for about a month by Google and Apple, because the app had COVID in the title, and it was considered kind of an exploitation of a large-scale event. And so I had to plead with the head of public health in Cambridge to send a letter, through the normal forms that Google has, to say: we are not playing around, we are trying to do a research study on this, we have all the ethics approvals, we have all the data transfer agreements in place. So there seemed to be a missing path to connect academic research with this sort of large-scale deployment, for this sort of, I would say, maybe excluding mine, but generally very important study that can happen through the deployment and large-scale collection of this data.
Obviously, you know, privacy is up there as a big banner; I was alluding to this when I mentioned it. But definitely another important lesson: people have asked about our data, why not release only the coughs, why are you not releasing this data publicly to everyone, as some groups have? And in consultation with experts in the university, we have decided that this data is actually more dangerous than people think. We also had this conversation with the chairs of the NeurIPS datasets track, where we submitted our dataset paper, because the guidance at the beginning was asking us to release the data publicly if we were to submit to that track. And I wrote to them and said: well, it is wrong to release this data publicly, because someone could re-engineer the identity of someone from their voice, or even their coughs, in fact, just by correlating it with other publicly available data. So this process has taken time, but I think we got it right.

Thank you. So, I'm just going to ask one more question, and then, because we can't do it live, I'm just expressing so much thanks from all of us listening. But let me ask one more question. This person says: could you say a couple of things on current research on mental health analysis using wearable data? Do you have any general thoughts, or directions that you might find interesting?

So, we worked on collecting data for mental health five, six, seven years ago. In fact, one of the papers that got the ten-year impact award was the one doing emotion detection from voice, on device, on a very old Nokia phone that had a wonderful battery; that's why we could do it. We used Gaussian mixture models. So that was on the voice. We also collected data from accelerometers and, you know, questionnaires, and then tried to correlate those with mood; we had mood reports, so we have a large dataset with that sort of information. So I think these days these things are still ongoing. I think the finding from the phone that was striking at the time was that someone having some sort of activity, which didn't mean exercise, it just meant that they were going somewhere all the time, or that they were using the phone, was actually positively correlated with mood. So, definitely, this is really important, and there are various aspects of mental health, expanding to Alzheimer's.
I have a project on monitoring memory and Alzheimer's, on the correlation with the ability to navigate, which is apparently one of the first things that disappears with Alzheimer's; this is also another area where these devices could make a difference. I could go on and give another talk, but Leslie would stop me.

Okay, thank you so much, Cecilia, for a marvellous talk, and thanks again to our sponsors, Oxford Asset Management, and thanks to all of you for attending online. So, that's today's lecture. Thank you very much.