Good afternoon. I'm very pleased to welcome our distinguished speaker, Susan Murphy from Harvard University. Murphy obtained her PhD in statistics from the University of North Carolina at Chapel Hill in 1989, and afterwards held faculty positions at several universities, including Penn State and the University of Michigan, where she was a Distinguished University Professor of Statistics. She joined Harvard in 2017, where she holds professorships in both statistics and computer science. She is a world expert in experimental design and intelligent sequential decision making, with a particular interest in digital health. Her work has been extremely influential, and she has received numerous honours, including, amongst many others, the Guy Medal of the Royal Statistical Society in 2019. She is also an elected member of the US National Academy of Sciences and the US National Academy of Medicine. On top of being a fantastic scientist, she has done a lot of work for the statistics community and is a past president of the IMS and of the Bernoulli Society. We are very honoured to host her as our distinguished speaker this season, and she will be talking about assessing personalisation in digital health.

Thanks, thanks for that introduction, and thanks for the invitation to speak with all of you. Can I share my screen? Yes, you should be able to. Yes, you should see it now. Just let me fix it so I can see. OK, great.

So, this is work that we're engaged in right now, and these are our first efforts in this direction. It was motivated by our concerns that when you run an online algorithm — in this case, in digital health trials — and you look at the results, sometimes, for some individuals, the results look just totally fantastic. It's like: you personalised, and now everyone should use your algorithm. And the question, of course, is: is this spurious? So that got us going down this particular path, and I'll share with you today our very first steps in this direction. This will be focussed on HeartSteps, which I'll describe shortly.

OK. I just wanted to mention that this type of research involves large collaborative teams, because you're developing an algorithm, then you're implementing the algorithm in a trial, and then you're analysing the data, and there are usually software engineers involved as well. I just wanted to shout out three individuals who have really made a big impact. That's Peng Liao — he's a postdoc in my lab — and Kelly Zhang.
She's a computer science PhD student, also in the lab. And then Xie Yang Ji, an incoming first-year Harvard PhD student.

So what I'll do is, first of all, describe HeartSteps, and then we'll go on to the issue of personalisation.

OK, so HeartSteps was funded to construct an activity coach. It's on your phone, and individuals wear a wristband tracker. It's for individuals who are at high risk of coronary artery disease. There are three studies that are part of this: the first study was only six weeks, and then the next two, which ran into each other, were three-month and nine-month studies. These studies are micro-randomised, and in particular the last two studies are personalised — I hope as I go through you'll see what I mean by that. If not, just put a question in the chat and I can make that clear.

OK. In all digital interventions there are many intervention components. We're going to focus on only one intervention component, and that's whether or not to send a notification. It would appear on the individual's lock screen on their smartphone, and the content of this notification is tailored to where the individual is at the moment, the day of the week, what the weather is like, and so on. You can see an example on the right-hand side; this appeared on what was actually a very cold morning, so you can see that it's trying to get me to reframe my view of cold mornings and think about walking to work today. All the little suggestions are intended to help you be more active wherever you are at that moment in time. And we want to decide: should the algorithm send one, or should it not? There are five times a day at which these notifications might be sent, and those five times are user-specific — they have to do with the way that individual organises their life. The reward — what's called a reward, or in our world an outcome, a near-time outcome — is the 30-minute step count after each of these time points. The reason why it's only 30 minutes is that the content of the notification is all about being active in that moment.

So when you think about data from one of these three trials I mentioned two slides ago, what it looks like, on each user, is a time series of tuples — I'll call it state s, action a, reward r — a whole time series. The number of time points depends on whether the user was in the three-month or the nine-month study.
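To make that data structure concrete, here is a minimal sketch of one user's trajectory as a time series of (state, action, reward) tuples; the names and types are illustrative, not the study's actual code:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Transition:
    """One decision point for one user."""
    state: np.ndarray   # sensed context at the decision time (the state s)
    action: int         # 1 = send a notification, 0 = do not send (the action a)
    reward: float       # log 30-minute step count after the decision (the reward r)

# One user's data: a time series of such tuples. Its length depends on whether
# the user was in the three-month or the nine-month study (five decision
# times per day).
trajectory: list[Transition] = []
```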
At each time point, the sensors on the wearable, as well as on the phone, pick up the person's current context, or state. Then an action — send the notification versus not — is made by an algorithm, and we'll discuss that shortly; in our case, we're only focussing on send versus not send. After that, the sensors on the tracker record the 30-minute step count, and we're going to focus on the log of the 30-minute step count, mainly because step counts are right-skewed. In particular, the notation I'm going to use throughout is the mean of that log 30-minute step count given current state and current action — the action is either one or zero, send a notification versus not — and it's denoted by this lowercase r(s, a). You should be able to see my pointer here, and I'll use this notation repeatedly throughout.

OK, so now about the algorithm that was used online, as the trial went on, to determine whether or not a notification was sent at each of the five times per day. What I'm going to do is give you just some small aspects of this algorithm, because I really want to get to the latter part of the talk, where I talk about how to assess how well the algorithm personalised. So I'll only talk a little bit about the algorithm itself, and I can speak about it more if people have questions.

These are online decision-making algorithms, and the idea is to select these actions — send a notification versus not — in order to maximise some sort of outcome; in this case, it's the average of the log step counts. And in this world it's always subject to a number of constraints, which are often expressed in a very qualitative way, and you have to figure out how to quantify them. One constraint is to permit what's called off-policy learning after the data collection is over. In the field of reinforcement learning — that's what this belongs to — there is a lot of interest in understanding: well, if I had used some other way of selecting the actions, how might the sum of the rewards behave? You want to permit those kinds of analyses after the data collection ceases. Also, because of the domain, user burden is a big issue, and habituation — that is, when people no longer even notice the notifications — is also a big issue. These impose enormous constraints on any algorithm you're going to run. And the lengths of these trials just have to do with how much funding there is, so you want an algorithm that doesn't know when the trial is going to end.
So what we did in V2 and V3 — that's the three-month and the nine-month studies — is we started off with a bandit algorithm, a Bayesian type of algorithm called Thompson sampling, and we altered it in a variety of ways; I'll just point out some of the ways in which we altered it. The idea is that this algorithm — which ran on the cloud and communicated with the phone and the tracker in real time — is supposed to be personalising the decision as to whether or not to send a notification at each of those five times a day.

When you think of an online decision-making algorithm as a statistician, I always think of these algorithms as being composed of two sub-algorithms, two elements. One is what people call a learning algorithm. This is just an incremental statistical method, and the goal is to learn some characteristic of the distribution of the data; in this case, it's to learn the mean of the log 30-minute step count given state and action. In our case, we used a Bayesian linear regression model. It's particularly simple — it can be viewed as a Gaussian process model with a very simple kernel — and that actually opens doors for us in a variety of ways. So that was one element: essentially an incremental statistical method; in our case it's Bayesian, and you can think of it as linear regression.

The second element of this online decision-making algorithm is an action selection strategy, and that strategy is all about how you're going to use the outputs of the learning algorithm to select the actions, at five times a day, as the individual experiences the mobile app. What we're doing here is called posterior sampling — at least nowadays it's called posterior sampling; I don't think when Thompson first invented this he was thinking in this way, but nowadays it's called posterior sampling. The idea is that your learning algorithm is Bayesian, so you can calculate a posterior probability that the treatment effect in the current state s — that's r(s, 1) minus r(s, 0) — is greater than zero. You calculate that posterior probability, and then you take that posterior probability and you randomise with it: you're randomising.
This is a sequential experimentation setting, but the randomisation probabilities are tied to how the Bayesian algorithm anticipates the effect of sending a notification in the particular state the individual is in right now.

This is greedy personalisation. What do I mean by greedy? If you take a bandit algorithm and build around it, then you're not paying attention to the effect of the notifications on future rewards — and clearly, ignoring that is not a good idea here. But there's a bias–variance trade-off, and in our case we decided to just focus greedily on this time: would it be useful to send a notification right now or not?

OK, so I just wanted to talk a little bit about the learning algorithm — the first element of this online decision-making procedure — to provide a little bit more context. In this particular setting, because of the high noise one incurs in these kinds of real-life experimental, sequential decision-making problems, we use a very low-dimensional treatment effect model. You see it here: it's a linear model in features, and all of the features of state were handcrafted by the scientific team. There were five features — it's five-dimensional.

And in fact, we always use informative priors. I was never a Bayesian before; I have now become a complete Bayesian, with informative priors — forget about this non-informative business. The way we form our informative prior, in this particular case and in general in a setting like this, is that you have a prior study. And we did: we had HeartSteps V1, and we could use that study to form the prior for V2 and V3. I want to mention some things about this prior, because it's going to be important as we go on. This prior, as I said, is five-dimensional, and I'll show you the features later on a further slide. But the first feature is just the overall effect of sending a notification versus not, and that was the only feature that had a positive prior mean. The prior mean for all the other features, the remaining four, was zero. This is important to remember for later on. So we're starting off the trial with a prior that says we anticipate there to be a positive effect of sending a notification overall, across all states.

We also had a baseline week for each user — each user had a week where we just collected data on them, and we randomised whether to send a notification with probability 0.25 at each of the five times a day.
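To make the learning algorithm concrete, here is a minimal sketch of the conjugate Bayesian linear-regression update for the five-dimensional treatment-effect model, with the informative prior structure just described. The 0.47 prior mean is taken from a later slide in this talk; the prior covariance and noise variance are purely illustrative, and the baseline part of the reward model is omitted:

```python
import numpy as np
from scipy import stats

d = 5                      # five handcrafted treatment-effect features
prior_mean = np.zeros(d)
prior_mean[0] = 0.47       # only the overall send/no-send feature has a positive prior mean
prior_cov = np.eye(d)      # illustrative prior covariance
noise_var = 1.0            # illustrative reward noise variance

def posterior_update(F, r):
    """Conjugate Gaussian update: F is an (n, d) matrix of treatment-effect
    features, r the corresponding observed rewards."""
    precision = np.linalg.inv(prior_cov) + F.T @ F / noise_var
    Sigma_n = np.linalg.inv(precision)
    mu_n = Sigma_n @ (np.linalg.inv(prior_cov) @ prior_mean + F.T @ r / noise_var)
    return mu_n, Sigma_n

def prob_positive_effect(f, mu_n, Sigma_n):
    """Posterior probability that the treatment effect r(s,1) - r(s,0),
    modelled as f(s)' theta, is greater than zero in the current state."""
    m = f @ mu_n
    sd = np.sqrt(f @ Sigma_n @ f)
    return 1.0 - stats.norm.cdf(0.0, loc=m, scale=sd)
```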
Now I want to talk just a little bit about the action selection. We did a lot of things to this algorithm, but having mentioned the parts on the prior slide, I just want to talk a little about the action selection strategy, which was posterior sampling. And this was a sad case, because I was very naive when I started down this path: we actually ended up not being able to use the data from the first set of people that came into V2, and I'll indicate why that happened.

So what does posterior sampling do? You calculate that posterior probability that the treatment effect is greater than zero, and then you randomise with that probability. So, look at number one: in some states, the posterior distribution of the treatment effect is going to be highly peaked and centred around zero. What that means is that your randomisation probability will average around 0.5. So with probability 0.5, on average, you're sending a notification when there's no evidence of an effect. This makes no sense whatsoever from a domain science perspective, particularly if you're worried about bothering people and having them habituate to your messages. This is definitely not desirable — and we didn't even think about this at first. Then, in other states, you're getting a large amount of information: your Gaussian posterior will be highly peaked around a positive value for that state, and you're going to think, whoa, you really should send a notification in that state. But again, we've got to remember this is a setting in which people get overburdened by their phone pinging all the time. Do you really want to send that notification every time the person is in that state? No, you don't.

So I'll tell you what we did — we're trying to improve on this, but we became engineers: we had to put this into the field. And we also needed to permit off-policy learning after the data collection ceases; that was the third issue. Now, to learn — unless you're willing to make a lot of assumptions — in any given state you must sometimes choose action zero and sometimes choose action one. You can't just always choose one of the actions. OK, so what was our solution? Our solution was to take that posterior probability and clip it — I drew the little graph in blue on the right-hand side.
And this is how we clipped it. If the posterior probability that it was really a good idea to send the message in that state is above 0.8, it becomes 0.8. If the posterior probability is around 0.5 — indicating there's probably not much of a treatment effect — we send the notification with probability 0.2. Now, why 0.2 rather than zero? There is a lot of evidence in this world that variability is therapeutic, so every now and then we want to send a message just to shake things up. So the lower bound is 0.2 and the upper bound is 0.8, and that's what the two sentences at the bottom of the slide are about. p_u, the upper value, 0.8 in our case, is determined by our need to do off-policy learning — we can't let it be one, because then we won't be able to learn off-policy; that's a disaster — and, from a domain science perspective, we don't want to overburden our users. p_l, which is 0.2 in our setting — again, we don't want it to be zero; we have to be able to do off-policy learning, but there's also the health benefit of having some variability. And we're also concerned — even though I'm not dealing with it today — that non-stationarity is a big issue in this world, and we want to allow our after-study analyses to investigate that.
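Putting the two elements together, a minimal sketch of the clipped action-selection step might look like the following. Note this shows plain interval clipping to [0.2, 0.8]; as described above, the deployed algorithm also pushed the near-0.5, "no evidence of an effect" case down toward the lower bound, which this sketch does not capture:

```python
import numpy as np

def clip_probability(p, p_lower=0.2, p_upper=0.8):
    """Clip the randomisation probability into [p_lower, p_upper].

    p_upper < 1 preserves exploration for off-policy learning and limits
    user burden; p_lower > 0 also preserves off-policy learning and keeps
    some therapeutic variability in when messages arrive."""
    return min(max(p, p_lower), p_upper)

def select_action(rng, p_posterior):
    """Posterior sampling with clipping: randomise the send / no-send
    decision with the clipped posterior probability of a positive effect."""
    p_send = clip_probability(p_posterior)
    return int(rng.random() < p_send), p_send

# Example: with rng = np.random.default_rng(0), select_action(rng, 0.93)
# sends with probability 0.8 rather than 0.93.
```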
We're thinking about is we have no idea what to do. 189 00:21:57,940 --> 00:22:06,830 So we we go back to the literature, the old, very, very mature literature and clinical trials. 190 00:22:06,830 --> 00:22:15,410 This is an classical meta analysis. So if you're into machine learning, you know about meta analysis, metal learning, 191 00:22:15,410 --> 00:22:20,300 this is not metal learning in machine learning, OK, this is classical meta analysis. 192 00:22:20,300 --> 00:22:27,320 In fact, at the bottom of the slide have a reference to a really lovely tutorial that came 193 00:22:27,320 --> 00:22:33,160 about at the maturity of this area when this area had really matured 21 22 years ago, 194 00:22:33,160 --> 00:22:42,890 a very old area of science. So the idea here is the way we're going to think about it is each user we have 91 users is a clinical trial. 195 00:22:42,890 --> 00:22:51,200 This is how we're going to think in our head. Each user has their own unknown vector of true treatment effect coefficients. 196 00:22:51,200 --> 00:22:57,050 So I subscription by zero because that's their true treatment effect coefficient. 197 00:22:57,050 --> 00:23:02,720 And I indicates user, Oh, and then I have to estimate each use. 198 00:23:02,720 --> 00:23:06,810 Each user has to have an estimate of that user's treatment effect coefficient. 199 00:23:06,810 --> 00:23:16,310 What I use is the vector of posterior means. And what I'm thinking in my mind is this was just a Gaussian. 200 00:23:16,310 --> 00:23:21,560 A Bayesian linear regression. This is just reg regression here. 201 00:23:21,560 --> 00:23:27,170 All I did was a rich my theta had. I is just a weight from a rich regression. 202 00:23:27,170 --> 00:23:31,990 That's all it is. You can see what Zeta is. 203 00:23:31,990 --> 00:23:39,570 So wait. Remember, Theta II is a five dimensional vector, it's the treatment effect model. 204 00:23:39,570 --> 00:23:51,090 So in that in classical meta analysis, there's two ways that people think the first way is you say all I care about are the users are in their case, 205 00:23:51,090 --> 00:23:55,910 the trial, the trials in front of me. 206 00:23:55,910 --> 00:23:59,480 So all I care about is these 9:1, you don't care about anything else. 207 00:23:59,480 --> 00:24:08,510 I only want to make inference about these 91 users, and what one does is one makes an approximate approximates the distribution of your. 208 00:24:08,510 --> 00:24:20,180 The estimates we derived from a rich regression by a normal it should have mean the true underlying regression coefficient for that user ie. 209 00:24:20,180 --> 00:24:28,910 And then there are some variance. And the variance has to do with the fact that we didn't observe this user over really long. 210 00:24:28,910 --> 00:24:36,950 We didn't assume we didn't have an infinite number of examples on this user. So the arrogance. 211 00:24:36,950 --> 00:24:44,510 The second way you think that one thinks in classical meta analysis is population inference. 212 00:24:44,510 --> 00:24:56,750 So here you think my end users and 91 in our case are a subset of a population of users and we want to make statements about that whole population. 213 00:24:56,750 --> 00:25:02,480 And in this case, actually in this study, this made a lot of sense for us to think that way as well. 
The second way one thinks in classical meta-analysis is population inference. Here you think of your n users — 91 in our case — as a subset of a population of users, and you want to make statements about that whole population. In this study it actually made a lot of sense for us to think that way as well, and the reason is that all of these individuals were patients in the Kaiser health care system in Seattle, and they had all just been diagnosed with stage 1 hypertension. So if the health care system were thinking about whether to roll out an app for its patients who have just been diagnosed with stage 1 hypertension, this type of inference would be relevant. In this case, you make an additional assumption: as you vary from one user to another across the population, the true five-dimensional vector of treatment effects varies normally, with some population mean θ_pop, and there is some variation amongst these five-dimensional vectors as you go from one user to another in the whole population.

OK. So let's start answering the two questions I posed; I'll repeat each question before my answer. The first question is in this vein of population inference: is there some evidence of an overall average treatment effect on the log 30-minute step count? In classical meta-analysis, the way one forms a statistic is as a weighted average of the user-specific estimates. I'm not doing anything special here; this is classical meta-analysis. This little e is a vector of all zeros except a one in one place, and it's just being used to pick out one of the five members of the five-dimensional vector θ. And the weights in this weighted average are formed from the within-user variance plus the variance from user to user. So it's classical statistics: you weight your person-specific estimates by these weights.

Now, this little green table gives you the names of the features:
1. Overall — send a notification versus not (binary).
2. Dose — an exponentially discounted count of how many times we've recently sent notifications.
3. Engagement — whether or not the person has been going to the app more often than usual to track their physical activity.
4. Location — whether or not they're in a structured environment (home or work).
5. Step variation — how variable their step count was in that same time period over the last week.

So you do this, you do the statistics, you get your confidence interval, and you think: oh, this is great. We have a confidence interval, it doesn't contain zero, that's lovely. So there seems to be some overall effect of sending a notification versus not, on average across individuals.
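As a sketch of that weighted average for one coordinate e'θ — the between-user variance τ² is assumed to be estimated elsewhere (e.g. by method of moments), and the 1.96 gives an approximate 95% interval:

```python
import numpy as np

def population_effect(theta_hats, variances, tau2):
    """Inverse-variance weighted estimate of one treatment-effect coordinate.

    theta_hats: per-user estimates e' theta_hat_i (91 of them here)
    variances:  per-user sampling variances e' Sigma_i e
    tau2:       between-user variance of this coordinate"""
    theta_hats = np.asarray(theta_hats)
    w = 1.0 / (np.asarray(variances) + tau2)   # within- plus between-user variance
    est = np.sum(w * theta_hats) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)
```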
But remember, that's not talking about personalisation. Then you go to that second row, and you realise — in fact, we anticipated this — that the more someone has been notified recently, the less responsive they tend to be. In fact, this is a very large negative coefficient, and the confidence intervals are very wide, indicating there's a lot of uncertainty. This pretty much kills the treatment effect, except when the recent dose is very, very close to zero. To hit that point a little harder, I'm going to look at a particular state here: the person has recently experienced an average dose, they're currently engaged with the app — they've been tracking their behaviours — they're at home or work, in a structured environment, and their recent variability in their activity is average. I'm just going to focus on that state and ask: what's the confidence interval for the treatment effect in that state? This is a confidence interval for the average-across-the-population treatment effect in that state. And you see, indeed, there's just not much going on. There's not a lot of evidence there. It's depressing.

So then we asked: well, what about heterogeneity between users — is the better action user-specific? Now, all of a sudden, we switch our hats and start focussing just on these 91 users. The test statistic here is based on the variation between users in their estimated regression coefficients in the treatment effect. Again, e is a vector of all zeros except for a one in one of the entries, depending on which coefficient you want to pick out. You measure the variation amongst the individuals' estimated treatment effects around their weighted average — weighted, of course, by how variable each treatment effect estimate is. It's not explicit here, but this depends on how long each individual was in the study.

So here's us using it. The test is of the hypothesis that all 91 users have the same true treatment effect coefficients — that's what this null hypothesis means. θ is five-dimensional, and here we go from one coordinate of θ to the next, all five. And you see there's enormous evidence that users differ a great deal, one from the other, in terms of their own treatment effect coefficients. Lots of heterogeneity. Very interesting.
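A sketch of this kind of heterogeneity test for a single coordinate — a Cochran-style Q statistic referred to a chi-square distribution; the talk's statistic is the analogous weighted quantity:

```python
import numpy as np
from scipy import stats

def heterogeneity_test(theta_hats, variances):
    """Test H0: all users share the same true coefficient (one coordinate).

    Weights are inverse sampling variances, which depend on how long each
    user was in the study."""
    theta_hats = np.asarray(theta_hats)
    w = 1.0 / np.asarray(variances)
    pooled = np.sum(w * theta_hats) / np.sum(w)   # weighted average under H0
    Q = np.sum(w * (theta_hats - pooled) ** 2)    # weighted between-user dispersion
    p_value = stats.chi2.sf(Q, df=len(theta_hats) - 1)
    return Q, p_value
```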
OK. So now, if you're familiar with reinforcement learning or bandits, you know one of the things we always want to do is estimate the average reward and compare that average reward under different policies. That's what I'm going to do here as well. So: on average, does the bandit algorithm select more effective actions — i.e., send a notification versus not — than the prior? Because remember, the prior was an informative prior; we built it off prior data, on similar individuals, with the exact same interventions.

I just want to make clear what I mean by average — how am I quantifying "more effective actions"? By that I mean you get a higher average reward. So here you have the value function, V_i(π) for the i-th user, where π is a particular policy for choosing actions. This is just the expectation of the i-th user's reward function, which is a function of state and action. This expectation is averaging over the states that that user experiences, as well as any stochasticity in the policy π — and our policies are always stochastic. So it averages both over the stochasticity in the states that the user finds themselves in and over the stochasticity in the actions. And we want to know: did the bandit algorithm produce a higher value — a higher average reward — than if we had just built our policy from the prior data and run with it?

So what we do is estimate that average reward under our bandit — our generalised bandit algorithm; it's posterior sampling, right, so the policy is changing with time. That's the reason there's a b_1 through b_T: the probability of selecting the action — send a message — changes with time. And the estimator of that value is just the average of rewards that you see in front of you for that individual. Now, for the estimate of the average reward under a different policy — for example, the policy built off the prior — we used importance weighting. There are more sophisticated estimators in the literature now, I just want to warn you, but this was a first-round kind of thing, so we used importance weights. You can see them there, on the right-hand side; they weight the observed rewards in order to estimate the average reward under the prior policy — that is, if we had just built the policy from HeartSteps V1 and run with it, without trying to do any learning. And already we should keep in mind that the prior policy — our subjective prior — said there was an effect of sending a suggestion.
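A sketch of the two per-user value estimates — the on-policy average and the importance-weighted estimate for a different target policy. The probabilities here are the probabilities of the action actually taken; as noted, this simple weighting is high-variance, and more sophisticated estimators now exist:

```python
import numpy as np

def value_under_bandit(rewards):
    """On-policy estimate of a user's average reward: just the mean."""
    return np.mean(rewards)

def value_under_target(rewards, behaviour_probs, target_probs):
    """Importance-weighted estimate of the same user's average reward under
    a different policy, e.g. the fixed policy built from the prior.

    behaviour_probs[t]: probability the bandit gave the action it actually
                        took at time t (these change over time: b_1, ..., b_T)
    target_probs[t]:    probability the target policy gives that same action
                        in that state"""
    w = np.asarray(target_probs) / np.asarray(behaviour_probs)
    return np.mean(w * np.asarray(rewards))
```

One design note: because the bandit's randomisation probabilities were clipped to [0.2, 0.8], these importance weights stay bounded, which helps keep the variance of this estimator under control.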
But then, when we went and did these analyses, it didn't look so great — at least not on average.

OK, so how are we going to do this? We're thinking again — we're still in that lovely classical meta-analysis world. So now, what do our θ's become? They're just one-dimensional now: θ_{i,0}, the i-th user's true θ, is the difference between that user's value under the bandit minus that user's value under the policy formed from the prior, and θ̂_i is the estimated version of the difference of these two things. Of course, we don't observe θ_{i,0} — we don't observe the first term; it's unknown. So what we do is the same as in classical meta-analysis — and this is generally true under certain conditions on the data structure — is note that θ̂_i, this estimator of the difference in values, is approximately normal. It has some variance, because you're estimating it. We make just this first assumption if we're interested in only these 91 users, i equal 1 to 91, and then we make an additional assumption — that's the second one right here, I have my pointer right below it — that the true difference in values varies from one individual to another in the population according to a normal distribution, sort of like a random-effects thing.

OK. So the statistics are identical — no difference, identical test statistics. So: did the bandit algorithm result in higher average rewards than the policy based on the prior? In fact it does, and the confidence interval doesn't contain zero. Now, this is not a big effect — remember, this is averaged over the population. Why do we think this might have happened? Well, the policy built off the prior wanted us to send the notification a lot, because the prior said there was a positive effect of sending that notification — the prior mean was positive. But the bandit learns that, on average across people, that's not true. So I interpret this as the effect of not bothering people too much, and every now and then that helps, on average.

OK, so now let's focus on our 91 users — just our 91 — and ask: that difference in values, the bandit versus the prior — in other words, the benefit of running a bandit — does that vary from one user to the other? And here we use the chi-squared statistic — this is the second type of hypothesis test, the second type of statistic — and under this hypothesis the test statistic has a chi-square distribution.
It's incredibly significant. There's a lot of evidence that, for some people, the bandit gives you very different results than the prior, as compared to the average.

So, OK, this is all fine and good, and then we wanted to start doing some exploratory work. We're getting closer now to question seven, the question that motivated this entire project — and it's still motivating the project — but first, question six. Our informative prior, built off HeartSteps V1, said there was a treatment effect; in fact, it said the treatment effect was pretty strong, because it was very strong in HeartSteps V1. On average, across users, does it appear that the bandit algorithm learns over time? Well, what should the bandit algorithm learn, on average across users? It should learn that, on average across users, there's not much going on.

OK, so the blue curve here is the actual, from the real trial: the average posterior mean of the treatment effect, in this particular state — the second bullet point gives you the state; it's the same state we looked at before — as the trial progresses. The x-axis is decision points and the y-axis is that posterior mean. If you look closely — I know it's hard to see — the blue curve starts at time zero around 0.47; 0.47 was the prior mean of the overall effect of sending a suggestion versus not in that state. And we suspect, from the analyses I've talked about today, that on average that didn't bear out. In fact, you see the blue curve — the average posterior mean of the treatment effect — starts to drift down as the study progresses; it drifts down towards zero.

So I wanted to understand: was that drift really important? Was it significant in some way? Here's what we did: we did bootstrap studies. There are two thousand black curves. Each black curve is a bootstrapped trial: a bandit algorithm applied to each of 91 bootstrapped users. So let me talk about how we made a bootstrap user. What we did was take our original 91 users' data, and we subtracted from every user's reward, at each time point, the posterior mean of the treatment effect at that time point. We call that difference a residual.
It's not mean zero because we just took away the treatment effect. 348 00:40:40,950 --> 00:40:49,440 But so now after we did that subtraction on each individual, the 91 individuals, we have a time series of state residual state residual. 349 00:40:49,440 --> 00:41:02,040 State residual state residual. Then what we do is we we get a bootstrap sample of these 91 trajectories of state residual state residual. 350 00:41:02,040 --> 00:41:15,210 And on each of the 91, we run a bandit. If the bandit says choose action one, we add back in a treatment effect according to the prior. 351 00:41:15,210 --> 00:41:22,970 If the band it says choose action zero, we don't do anything because, yeah, we leave it because the data is already had. 352 00:41:22,970 --> 00:41:34,490 So, OK, we do that for every person, so we have one bootstraps study of ninety one users of it is run on each the ninety one. 353 00:41:34,490 --> 00:41:41,420 And now what we do is we get this average posterior mean across those 91 bootstrapped users. 354 00:41:41,420 --> 00:41:48,200 And that's the black line. A black line is the posterior mean, evolving over time. 355 00:41:48,200 --> 00:41:51,770 Now, under the ground truth, it's the priors, correct? 356 00:41:51,770 --> 00:41:58,730 And in fact, you see if the price were correct, all these lines, it's a big mass because this is a stochastic algorithm. 357 00:41:58,730 --> 00:42:03,560 It's a big mess. It goes through time, but it's but the blue line. 358 00:42:03,560 --> 00:42:16,520 The truth. Rapidly deviates from that mask and goes below, so the actual study very quickly learns that the prior was incorrect. 359 00:42:16,520 --> 00:42:30,920 OK. So I'm getting close to the end, so this so this again, as I said earlier, this is the we had a number of situations like this. 360 00:42:30,920 --> 00:42:33,560 And this is what motivated all this work, and I think to me, 361 00:42:33,560 --> 00:42:44,840 this is really important because right now we have so many examples of AI producing horrendous false results and we cannot have I mean, 362 00:42:44,840 --> 00:42:49,820 that's our, you know, as a statistician, I want to make sure that whatever we say, 363 00:42:49,820 --> 00:42:54,110 I want to try and give you some measure of confidence with whatever we say. 364 00:42:54,110 --> 00:43:01,010 OK, so I'm going to show you a user data who who exhibited very interesting personalisation. 365 00:43:01,010 --> 00:43:08,870 This was not the only use for this one. So remember, this is going on in Seattle. 366 00:43:08,870 --> 00:43:18,230 The x axis is the date in the study. This this individual or join the study in November 15, 2019. 367 00:43:18,230 --> 00:43:21,650 pre-COVID, they exited the study. 368 00:43:21,650 --> 00:43:26,590 This individual exited the study at the beginning of August 2020. 369 00:43:26,590 --> 00:43:36,650 In the in the middle of the pandemic, Seattle closed down in March. 370 00:43:36,650 --> 00:43:41,540 OK, so let's let's talk about the y axis. 371 00:43:41,540 --> 00:43:50,420 Each dot is a ratio, it's the ratio of the posterior mean of the treatment effect in that that user's current state 372 00:43:50,420 --> 00:43:55,640 divided by the posterior standard deviation of the treatment effect in that user's current state. 373 00:43:55,640 --> 00:44:03,530 And the reason why we're graphing this on the y axis is this directly leads to the prob the posterior probability. 
OK. So I'm getting close to the end. This, again, as I said earlier — we had a number of situations like this, and this is what motivated all this work. To me this is really important, because right now we have so many examples of AI producing horrendous, false results, and we cannot have that. As a statistician, I want to make sure that whatever we say, we try to give some measure of confidence with it.

OK, so I'm going to show you data from a user who exhibited very interesting personalisation — and this was not the only such user. Remember, this is going on in Seattle. The x-axis is the date in the study. This individual joined the study on November 15th, 2019 — pre-COVID — and exited the study at the beginning of August 2020, in the middle of the pandemic. Seattle closed down in March.

OK, so let's talk about the y-axis. Each dot is a ratio: the ratio of the posterior mean of the treatment effect in that user's current state, divided by the posterior standard deviation of the treatment effect in that user's current state. The reason we're graphing this on the y-axis is that it leads directly to the posterior probability: the higher it is, the higher the posterior probability of sending a notification; the lower it is, the lower. So there's a dot for every one of those five decision times — a lot of dots are on top of each other. Now, the dots have two colours. The blue dots are when the user's current state indicates the person is engaged, and the red dots are when the user's current state indicates the person is not engaged. Engaged here means you're watching your app — the app had all kinds of things you could do on it — and the red state means you're just not watching the app quite as much. It's fascinating, because in general, if in that context — in that state — this individual hadn't been watching the app, then the algorithm says their treatment effect is much lower than if they had been watching, had been engaged. That's the blue.

This is glorious. I show this to domain scientists and they love it, right? Because this is what it means to personalise: with higher probability, when someone's engaged, they're sent a message, and with lower probability they're sent a message when they're less engaged. It sounds wonderful — but is it real? I mean, this is a stochastic algorithm; things happen by chance.

OK, so here's our exploratory data analysis to think about this — and this is only a first effort; there are other ways to pose this problem. The blue curve here — again, the x-axis is this individual's date in the study — is their estimated effect of engagement at that time in the study, and it starts off at zero because the prior mean for engagement was zero. All of these curves start off at zero for that reason. So the blue curve is the real data. There are 2,000 black curves: now what we're doing instead is getting 2,000 bootstrap versions of this user. So let's think about one bootstrap version of this user. We formed that state, residual; state, residual; state, residual series, like I talked about on the prior slide, just for that user. Then we bootstrap those state–residual pairs — we resample them — and we run the bandit algorithm under the ground truth that there's no effect of engagement, with everything else staying the same. So we just set that θ weight — the prior mean for that θ — equal to zero.
So now each black line is what the bandit thinks for that individual — that individual's posterior mean for the treatment effect of engagement — as the study progresses. And indeed, you see that at the very beginning the real data is highly consistent with no effect: the blue curve is well within the mass of the black curves. But as time goes on, the blue curve drifts to the top. In fact, I have a statistic here: the blue curve has a positive posterior mean treatment effect of engagement ninety-five percent of the time, and only about eight percent of the black curves have that. It's very interesting if you compare this to the prior graph, because even though it looks, way into March and April, like being engaged means it's better to send a message, the black lines indicate that the blue curve is well within the variance you might expect even if there were no effect of engagement. It's only when you get to June of 2020 that you start to see some indication that there's enough evidence that engagement really should be taken into account.

This is my last slide. So here we used a sequential online decision-making — a personalisation — algorithm. But did we achieve personalised digital health, personalised sequential decision making? In this whole analysis, what I did was assume that each user's states and rewards followed a classical bandit — that is, that prior actions don't influence future rewards. Even in that setting, how could you do a better job assessing personalisation, if you're willing to make that assumption? This is completely open. And if the bandit environment assumption is violated — which it definitely is, because if I send you too many notifications, in the future you're probably going to be less responsive — how do you assess this in a more general kind of setting? As far as I know, there's just nothing there. These are uniquely statistical questions, and they're critical for using AI in sequential decision making. Thanks.

Thank you. So, is there any question for Susan? If you want to talk, I don't know if you can use your microphone — or leave a question in the chat.
So I'll read one, actually. When you work in this kind of context, I would be a bit paranoid about, you know, missing some confounders. Are people being extremely careful in the design of the state space to consider these kinds of things?

Right. So these are designed experiments, right? An enormous amount of work goes into deciding what will be sensed, and it's related to what the scientific domain says should be important. That said, this is a very immature area of science, so it's implausible that we collected the entire state, and that's the reason why I think we're always going to see some element of this: we're going to get the appearance of non-stationarity, not because it's necessarily true, but rather because people are moving to another state while, to us, they look to be in the same state. So there's the issue of non-stationarity, and this is one of the reasons why you don't really ever want to let those probabilities go to one or zero: you want to be able to do interim off-policy analyses and look — are we getting some evidence of non-stationarity here? Yeah.

Is there any other question? I have one, related to your off-policy evaluation — I've done a bit of that recently. You used importance weighting, which is basically high variance. If it doesn't work, what kinds of things can be developed?

Well, it's a high-variance estimator here, that's number one. But right now there is a lot of research — this is a very active area of research — on how you do off-policy estimation when you only have one trajectory. I think there's a new paper on arXiv that came out in the last couple of months — Susan Athey, I think, may be on it — and she has a way now. A lot of the problem — we did a lot of simulations for this, and we didn't see any evidence of trouble, but simulations are not proofs, right? And in the end, we ourselves also have work on how you can do estimation after you've adaptively sampled, and there are different ways to weight, to try to adjust for it. The big problem is really that under certain scenarios you get aberrant behaviour. In our simulations we didn't see this, but when we write all of this up we'll probably take whatever's most recent in the literature and use it. Yeah.
But this is an area — it's great you asked that question, because this is an area of very active research: how do you do off-policy learning on one trajectory? Not n independent trajectories, where you have the advantage of independence, but one trajectory.

Yeah. I actually work in reinforcement learning, and I know there's other work in that vein, but the variance can be quite problematic.

Yeah, no question about that — the estimate you get, even when you do it right. It's interesting, though: you have to work hard to get it to misbehave. It depends, of course, on what you're estimating, and whether you're truncating. And also notice we were clipping at 0.2 and 0.8 — this changes things enormously.

Exactly, and that's why the simulations probably worked out OK for us. If we had allowed the probabilities to get close to zero or one, that's where you really get the problems.

So, is there any other question for Susan? Oh — I've got a question in the chat, asking whether you could explain a bit more the bootstrap sampling within each individual.

Yeah. So there were two types of bootstrap samples: one in which we bootstrapped individuals, and one in which we had one individual and we just bootstrapped within that individual. In both cases, the first step was the same. For each individual, we have a whole time series of state, action, reward; state, action, reward; state, action, reward. What we did was calculate, at the end of the study, the posterior mean of that individual's reward function — their mean reward in that state at that time — and we subtracted that from the reward. So now we have state, action, reward minus posterior mean in that state; state, action, reward minus posterior mean in that state. I'll call those differences residuals. Then we throw away the action, so we have state, residual; state, residual for each of the 91 individuals — a whole series. And then we do the bootstrap sampling under a ground truth that we're trying to test against — it's like a null. In the first case, the ground truth was that the prior means were correct, the ones we built off of HeartSteps V1. The way that happens is: you take one bootstrapped individual, which is state, residual; state, residual; state, residual, and you run the — I'm sorry, the bandit — on that one individual's data. So when the bandit sees the state, it chooses an action.
If the action is one, you add back in the mean from the prior; if the action the bandit chooses is zero, you leave the residual alone — that's just the reward now. And you move through time like that for that one person, and you do it for all 91 people that you bootstrapped. So in the first case we bootstrap individuals — we bootstrap trajectories. In the second case, we actually just had one individual — state, residual; state, residual; state, residual — and we bootstrapped those little pairs.

Thank you. Is there any other question for Susan? No? Well, let's thank her again. Thank you very much. You couldn't come in person this time, but we hope to be able to welcome you soon. And thank you again. OK, yeah — have a great weekend. Thank you. Thank you very much. Bye.