Good afternoon. I'm very pleased to welcome our distinguished speaker, Susan Murphy from Harvard University. Murphy obtained her PhD in statistics from the University of North Carolina at Chapel Hill in 1989, and afterwards held faculty positions at several universities, including Penn State and the University of Michigan, where she was a Distinguished University Professor of Statistics. She joined Harvard in 2017, where she holds professorships in both statistics and computer science. She is a world expert in experimental design and intelligent sequential decision making, with a particular interest in digital health. Her work has been extremely influential, and she has received numerous honours, including, amongst many others, the Guy Medal of the Royal Statistical Society in 2019. She is also an elected member of the US National Academy of Sciences and the US National Academy of Medicine. On top of being a fantastic scientist, she has done a lot of work for the statistics community and is a past president of the IMS and of the Bernoulli Society. We are very honoured to host her as our distinguished speaker this season, and she will be talking about assessing personalisation in digital health.

Thanks, thanks for that introduction, and thanks for the invitation to speak with all of you. Can I share my screen? Yes, you should be able to. Yes, you should see it now. Just let me fix it so I can see. OK, great.

So, this is work that we're engaged in right now, and these are our first efforts in this direction. It was motivated by our concerns that when you run an online algorithm — in this case, in digital health trials — and you look at the results, sometimes, for some individuals, the results look just totally fantastic. It's like: you personalised, and now everyone should use your algorithm. And the question, of course, is: is this spurious? So that got us going down this particular path, and I'll share with you today our very first steps in this direction. This will be focussed on HeartSteps, which I'll describe shortly.

OK. I just wanted to mention that this type of research involves large collaborative teams, because you're developing an algorithm, then you're implementing the algorithm in a trial, and then you're analysing the data, and there are usually software engineers involved as well. I just wanted to shout out three individuals who have really made a big impact. That's Peng Liao — he's a postdoc in my lab — and Kelly Zhang.
She's a computer science PhD student, also in the lab. And then Xie Yang Ji, an incoming first-year Harvard PhD student.

So what I'll do is, first of all, describe HeartSteps, and then we'll go on to the issue of personalisation.

OK, so HeartSteps was funded to construct an activity coach. It's on your phone, and individuals wear a wristband tracker. It's for individuals who are at high risk of coronary artery disease. There are three studies that are part of this: the first study was only six weeks, and then the next two, which ran into each other, were three-month and nine-month studies. These studies are micro-randomised, and in particular the last two studies are personalised — I hope as I go through you'll see what I mean by that. If not, just put a question in the chat and I can make that clear.

OK. In all digital interventions there are many intervention components. We're going to focus on only one intervention component, and that's whether or not to send a notification. It would appear on the individual's lock screen on their smartphone, and the content of this notification is tailored to where the individual is at the moment, the day of the week, what the weather is like, and so on. You can see an example on the right-hand side; this appeared on what was actually a very cold morning, so you can see that it's trying to get me to reframe my view of cold mornings and think about walking to work today. All the little suggestions are intended to help you be more active wherever you are at that moment in time. And we want to decide: should the algorithm send one, or should it not? There are five times a day at which these notifications might be sent, and those five times are user-specific — they have to do with the way that individual organises their life. The reward — what's called a reward, or in our world an outcome, a near-time outcome — is the 30-minute step count after each of these time points. The reason why it's only 30 minutes is that the content of the notification is all about being active in that moment.

So when you think about data from one of these three trials I mentioned two slides ago, what it looks like, on each user, is a time series of tuples — I'll call it state s, action a, reward r — a whole time series. The number of time points depends on whether the user was in the three-month or the nine-month study.
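To make that data structure concrete, here is a minimal sketch of one user's trajectory as a time series of (state, action, reward) tuples; the names and types are illustrative, not the study's actual code:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Transition:
    """One decision point for one user."""
    state: np.ndarray   # sensed context at the decision time (the state s)
    action: int         # 1 = send a notification, 0 = do not send (the action a)
    reward: float       # log 30-minute step count after the decision (the reward r)

# One user's data: a time series of such tuples. Its length depends on whether
# the user was in the three-month or the nine-month study (five decision
# times per day).
trajectory: list[Transition] = []
```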
At each time point, the sensors on the wearable, as well as on the phone, pick up the person's current context, or state. Then an action — send the notification versus not — is made by an algorithm, and we'll discuss that shortly; in our case, we're only focussing on send versus not send. After that, the sensors on the tracker record the 30-minute step count, and we're going to focus on the log of the 30-minute step count, mainly because step counts are right-skewed. In particular, the notation I'm going to use throughout is the mean of that log 30-minute step count given current state and current action — the action is either one or zero, send a notification versus not — and it's denoted by this lowercase r(s, a). You should be able to see my pointer here, and I'll use this notation repeatedly throughout.

OK, so now about the algorithm that was used online, as the trial went on, to determine whether or not a notification was sent at each of the five times per day. What I'm going to do is give you just some small aspects of this algorithm, because I really want to get to the latter part of the talk, where I talk about how to assess how well the algorithm personalised. So I'll only talk a little bit about the algorithm itself, and I can speak about it more if people have questions.

These are online decision-making algorithms, and the idea is to select these actions — send a notification versus not — in order to maximise some sort of outcome; in this case, it's the average of the log step counts. And in this world it's always subject to a number of constraints, which are often expressed in a very qualitative way, and you have to figure out how to quantify them. One constraint is to permit what's called off-policy learning after the data collection is over. In the field of reinforcement learning — that's what this belongs to — there is a lot of interest in understanding: well, if I had used some other way of selecting the actions, how might the sum of the rewards behave? You want to permit those kinds of analyses after the data collection ceases. Also, because of the domain, user burden is a big issue, and habituation — that is, when people no longer even notice the notifications — is also a big issue. These impose enormous constraints on any algorithm you're going to run. And the lengths of these trials just have to do with how much funding there is, so you want an algorithm that doesn't know when the trial is going to end.
So what we did in V2 and V3 — that's the three-month and the nine-month studies — is we started off with a bandit algorithm, a Bayesian type of algorithm called Thompson sampling, and we altered it in a variety of ways; I'll just point out some of the ways in which we altered it. The idea is that this algorithm — which ran on the cloud and communicated with the phone and the tracker in real time — is supposed to be personalising the decision as to whether or not to send a notification at each of those five times a day.

When you think of an online decision-making algorithm as a statistician, I always think of these algorithms as being composed of two sub-algorithms, two elements. One is what people call a learning algorithm. This is just an incremental statistical method, and the goal is to learn some characteristic of the distribution of the data; in this case, it's to learn the mean of the log 30-minute step count given state and action. In our case, we used a Bayesian linear regression model. It's particularly simple — it can be viewed as a Gaussian process model with a very simple kernel — and that actually opens doors for us in a variety of ways. So that was one element: essentially an incremental statistical method; in our case it's Bayesian, and you can think of it as linear regression.

The second element of this online decision-making algorithm is an action selection strategy, and that strategy is all about how you're going to use the outputs of the learning algorithm to select the actions, at five times a day, as the individual experiences the mobile app. What we're doing here is called posterior sampling — at least nowadays it's called posterior sampling; I don't think when Thompson first invented this he was thinking in this way, but nowadays it's called posterior sampling. The idea is that your learning algorithm is Bayesian, so you can calculate a posterior probability that the treatment effect in the current state s — that's r(s, 1) minus r(s, 0) — is greater than zero. You calculate that posterior probability, and then you take that posterior probability and you randomise with it: you're randomising.
This is a sequential experimentation setting, but the randomisation probabilities are tied to how the Bayesian algorithm anticipates the effect of sending a notification in the particular state the individual is in right now.

This is greedy personalisation. What do I mean by greedy? If you take a bandit algorithm and build around it, then you're not paying attention to the effect of the notifications on future rewards — and clearly, ignoring that is not a good idea here. But there's a bias–variance trade-off, and in our case we decided to just focus greedily on this time: would it be useful to send a notification right now or not?

OK, so I just wanted to talk a little bit about the learning algorithm — the first element of this online decision-making procedure — to provide a little bit more context. In this particular setting, because of the high noise one incurs in these kinds of real-life experimental, sequential decision-making problems, we use a very low-dimensional treatment effect model. You see it here: it's a linear model in features, and all of the features of state were handcrafted by the scientific team. There were five features — it's five-dimensional.

And in fact, we always use informative priors. I was never a Bayesian before; I have now become a complete Bayesian, with informative priors — forget about this non-informative business. The way we form our informative prior, in this particular case and in general in a setting like this, is that you have a prior study. And we did: we had HeartSteps V1, and we could use that study to form the prior for V2 and V3. I want to mention some things about this prior, because it's going to be important as we go on. This prior, as I said, is five-dimensional, and I'll show you the features later on a further slide. But the first feature is just the overall effect of sending a notification versus not, and that was the only feature that had a positive prior mean. The prior mean for all the other features, the remaining four, was zero. This is important to remember for later on. So we're starting off the trial with a prior that says we anticipate there to be a positive effect of sending a notification overall, across all states.

We also had a baseline week for each user — each user had a week where we just collected data on them, and we randomised whether to send a notification with probability 0.25 at each of the five times a day.
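To make the learning algorithm concrete, here is a minimal sketch of the conjugate Bayesian linear-regression update for the five-dimensional treatment-effect model, with the informative prior structure just described. The 0.47 prior mean is taken from a later slide in this talk; the prior covariance and noise variance are purely illustrative, and the baseline part of the reward model is omitted:

```python
import numpy as np
from scipy import stats

d = 5                      # five handcrafted treatment-effect features
prior_mean = np.zeros(d)
prior_mean[0] = 0.47       # only the overall send/no-send feature has a positive prior mean
prior_cov = np.eye(d)      # illustrative prior covariance
noise_var = 1.0            # illustrative reward noise variance

def posterior_update(F, r):
    """Conjugate Gaussian update: F is an (n, d) matrix of treatment-effect
    features, r the corresponding observed rewards."""
    precision = np.linalg.inv(prior_cov) + F.T @ F / noise_var
    Sigma_n = np.linalg.inv(precision)
    mu_n = Sigma_n @ (np.linalg.inv(prior_cov) @ prior_mean + F.T @ r / noise_var)
    return mu_n, Sigma_n

def prob_positive_effect(f, mu_n, Sigma_n):
    """Posterior probability that the treatment effect r(s,1) - r(s,0),
    modelled as f(s)' theta, is greater than zero in the current state."""
    m = f @ mu_n
    sd = np.sqrt(f @ Sigma_n @ f)
    return 1.0 - stats.norm.cdf(0.0, loc=m, scale=sd)
```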
Now I want to talk just a little bit about the action selection. We did a lot of things to this algorithm, but having mentioned the parts on the prior slide, I just want to talk a little about the action selection strategy, which was posterior sampling. And this was a sad case, because I was very naive when I started down this path: we actually ended up not being able to use the data from the first set of people that came into V2, and I'll indicate why that happened.

So what does posterior sampling do? You calculate that posterior probability that the treatment effect is greater than zero, and then you randomise with that probability. So, look at number one: in some states, the posterior distribution of the treatment effect is going to be highly peaked and centred around zero. What that means is that your randomisation probability will average around 0.5. So with probability 0.5, on average, you're sending a notification when there's no evidence of an effect. This makes no sense whatsoever from a domain science perspective, particularly if you're worried about bothering people and having them habituate to your messages. This is definitely not desirable — and we didn't even think about this at first. Then, in other states, you're getting a large amount of information: your Gaussian posterior will be highly peaked around a positive value for that state, and you're going to think, whoa, you really should send a notification in that state. But again, we've got to remember this is a setting in which people get overburdened by their phone pinging all the time. Do you really want to send that notification every time the person is in that state? No, you don't.

So I'll tell you what we did — we're trying to improve on this, but we became engineers: we had to put this into the field. And we also needed to permit off-policy learning after the data collection ceases; that was the third issue. Now, to learn — unless you're willing to make a lot of assumptions — in any given state you must sometimes choose action zero and sometimes choose action one. You can't just always choose one of the actions. OK, so what was our solution? Our solution was to take that posterior probability and clip it — I drew the little graph in blue on the right-hand side.
And this is how we clipped it. If the posterior probability that it was really a good idea to send the message in that state is above 0.8, it becomes 0.8. If the posterior probability is around 0.5 — indicating there's probably not much of a treatment effect — we send the notification with probability 0.2. Now, why 0.2 rather than zero? There is a lot of evidence in this world that variability is therapeutic, so every now and then we want to send a message just to shake things up. So the lower bound is 0.2 and the upper bound is 0.8, and that's what the two sentences at the bottom of the slide are about. p_u, the upper value, 0.8 in our case, is determined by our need to do off-policy learning — we can't let it be one, because then we won't be able to learn off-policy; that's a disaster — and, from a domain science perspective, we don't want to overburden our users. p_l, which is 0.2 in our setting — again, we don't want it to be zero; we have to be able to do off-policy learning, but there's also the health benefit of having some variability. And we're also concerned — even though I'm not dealing with it today — that non-stationarity is a big issue in this world, and we want to allow our after-study analyses to investigate that.
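Putting the two elements together, a minimal sketch of the clipped action-selection step might look like the following. Note this shows plain interval clipping to [0.2, 0.8]; as described above, the deployed algorithm also pushed the near-0.5, "no evidence of an effect" case down toward the lower bound, which this sketch does not capture:

```python
import numpy as np

def clip_probability(p, p_lower=0.2, p_upper=0.8):
    """Clip the randomisation probability into [p_lower, p_upper].

    p_upper < 1 preserves exploration for off-policy learning and limits
    user burden; p_lower > 0 also preserves off-policy learning and keeps
    some therapeutic variability in when messages arrive."""
    return min(max(p, p_lower), p_upper)

def select_action(rng, p_posterior):
    """Posterior sampling with clipping: randomise the send / no-send
    decision with the clipped posterior probability of a positive effect."""
    p_send = clip_probability(p_posterior)
    return int(rng.random() < p_send), p_send

# Example: with rng = np.random.default_rng(0), select_action(rng, 0.93)
# sends with probability 0.8 rather than 0.93.
```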
We're thinking about is we have no idea what to do. 189 00:21:57,940 --> 00:22:06,830 So we we go back to the literature, the old, very, very mature literature and clinical trials. 190 00:22:06,830 --> 00:22:15,410 This is an classical meta analysis. So if you're into machine learning, you know about meta analysis, metal learning, 191 00:22:15,410 --> 00:22:20,300 this is not metal learning in machine learning, OK, this is classical meta analysis. 192 00:22:20,300 --> 00:22:27,320 In fact, at the bottom of the slide have a reference to a really lovely tutorial that came 193 00:22:27,320 --> 00:22:33,160 about at the maturity of this area when this area had really matured 21 22 years ago, 194 00:22:33,160 --> 00:22:42,890 a very old area of science. So the idea here is the way we're going to think about it is each user we have 91 users is a clinical trial. 195 00:22:42,890 --> 00:22:51,200 This is how we're going to think in our head. Each user has their own unknown vector of true treatment effect coefficients. 196 00:22:51,200 --> 00:22:57,050 So I subscription by zero because that's their true treatment effect coefficient. 197 00:22:57,050 --> 00:23:02,720 And I indicates user, Oh, and then I have to estimate each use. 198 00:23:02,720 --> 00:23:06,810 Each user has to have an estimate of that user's treatment effect coefficient. 199 00:23:06,810 --> 00:23:16,310 What I use is the vector of posterior means. And what I'm thinking in my mind is this was just a Gaussian. 200 00:23:16,310 --> 00:23:21,560 A Bayesian linear regression. This is just reg regression here. 201 00:23:21,560 --> 00:23:27,170 All I did was a rich my theta had. I is just a weight from a rich regression. 202 00:23:27,170 --> 00:23:31,990 That's all it is. You can see what Zeta is. 203 00:23:31,990 --> 00:23:39,570 So wait. Remember, Theta II is a five dimensional vector, it's the treatment effect model. 204 00:23:39,570 --> 00:23:51,090 So in that in classical meta analysis, there's two ways that people think the first way is you say all I care about are the users are in their case, 205 00:23:51,090 --> 00:23:55,910 the trial, the trials in front of me. 206 00:23:55,910 --> 00:23:59,480 So all I care about is these 9:1, you don't care about anything else. 207 00:23:59,480 --> 00:24:08,510 I only want to make inference about these 91 users, and what one does is one makes an approximate approximates the distribution of your. 208 00:24:08,510 --> 00:24:20,180 The estimates we derived from a rich regression by a normal it should have mean the true underlying regression coefficient for that user ie. 209 00:24:20,180 --> 00:24:28,910 And then there are some variance. And the variance has to do with the fact that we didn't observe this user over really long. 210 00:24:28,910 --> 00:24:36,950 We didn't assume we didn't have an infinite number of examples on this user. So the arrogance. 211 00:24:36,950 --> 00:24:44,510 The second way you think that one thinks in classical meta analysis is population inference. 212 00:24:44,510 --> 00:24:56,750 So here you think my end users and 91 in our case are a subset of a population of users and we want to make statements about that whole population. 213 00:24:56,750 --> 00:25:02,480 And in this case, actually in this study, this made a lot of sense for us to think that way as well. 
The second way one thinks in classical meta-analysis is population inference. Here you think of your n users — 91 in our case — as a subset of a population of users, and you want to make statements about that whole population. In this study it actually made a lot of sense for us to think that way as well, and the reason is that all of these individuals were patients in the Kaiser health care system in Seattle, and they had all just been diagnosed with stage 1 hypertension. So if the health care system were thinking about whether to roll out an app for its patients who have just been diagnosed with stage 1 hypertension, this type of inference would be relevant. In this case, you make an additional assumption: as you vary from one user to another across the population, the true five-dimensional vector of treatment effects varies normally, with some population mean θ_pop, and there is some variation amongst these five-dimensional vectors as you go from one user to another in the whole population.

OK. So let's start answering the two questions I posed; I'll repeat each question before my answer. The first question is in this vein of population inference: is there some evidence of an overall average treatment effect on the log 30-minute step count? In classical meta-analysis, the way one forms a statistic is as a weighted average of the user-specific estimates. I'm not doing anything special here; this is classical meta-analysis. This little e is a vector of all zeros except a one in one place, and it's just being used to pick out one of the five members of the five-dimensional vector θ. And the weights in this weighted average are formed from the within-user variance plus the variance from user to user. So it's classical statistics: you weight your person-specific estimates by these weights.

Now, this little green table gives you the names of the features:
1. Overall — send a notification versus not (binary).
2. Dose — an exponentially discounted count of how many times we've recently sent notifications.
3. Engagement — whether or not the person has been going to the app more often than usual to track their physical activity.
4. Location — whether or not they're in a structured environment (home or work).
5. Step variation — how variable their step count was in that same time period over the last week.

So you do this, you do the statistics, you get your confidence interval, and you think: oh, this is great. We have a confidence interval, it doesn't contain zero, that's lovely. So there seems to be some overall effect of sending a notification versus not, on average across individuals.
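As a sketch of that weighted average for one coordinate e'θ — the between-user variance τ² is assumed to be estimated elsewhere (e.g. by method of moments), and the 1.96 gives an approximate 95% interval:

```python
import numpy as np

def population_effect(theta_hats, variances, tau2):
    """Inverse-variance weighted estimate of one treatment-effect coordinate.

    theta_hats: per-user estimates e' theta_hat_i (91 of them here)
    variances:  per-user sampling variances e' Sigma_i e
    tau2:       between-user variance of this coordinate"""
    theta_hats = np.asarray(theta_hats)
    w = 1.0 / (np.asarray(variances) + tau2)   # within- plus between-user variance
    est = np.sum(w * theta_hats) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)
```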
But remember, that's not talking about personalisation. Then you go to that second row, and you realise — in fact, we anticipated this — that the more someone has been notified recently, the less responsive they tend to be. In fact, this is a very large negative coefficient, and the confidence intervals are very wide, indicating there's a lot of uncertainty. This pretty much kills the treatment effect, except when the recent dose is very, very close to zero. To hit that point a little harder, I'm going to look at a particular state here: the person has recently experienced an average dose, they're currently engaged with the app — they've been tracking their behaviours — they're at home or work, in a structured environment, and their recent variability in their activity is average. I'm just going to focus on that state and ask: what's the confidence interval for the treatment effect in that state? This is a confidence interval for the average-across-the-population treatment effect in that state. And you see, indeed, there's just not much going on. There's not a lot of evidence there. It's depressing.

So then we asked: well, what about heterogeneity between users — is the better action user-specific? Now, all of a sudden, we switch our hats and start focussing just on these 91 users. The test statistic here is based on the variation between users in their estimated regression coefficients in the treatment effect. Again, e is a vector of all zeros except for a one in one of the entries, depending on which coefficient you want to pick out. You measure the variation amongst the individuals' estimated treatment effects around their weighted average — weighted, of course, by how variable each treatment effect estimate is. It's not explicit here, but this depends on how long each individual was in the study.

So here's us using it. The test is of the hypothesis that all 91 users have the same true treatment effect coefficients — that's what this null hypothesis means. θ is five-dimensional, and here we go from one coordinate of θ to the next, all five. And you see there's enormous evidence that users differ a great deal, one from the other, in terms of their own treatment effect coefficients. Lots of heterogeneity. Very interesting.
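A sketch of this kind of heterogeneity test for a single coordinate — a Cochran-style Q statistic referred to a chi-square distribution; the talk's statistic is the analogous weighted quantity:

```python
import numpy as np
from scipy import stats

def heterogeneity_test(theta_hats, variances):
    """Test H0: all users share the same true coefficient (one coordinate).

    Weights are inverse sampling variances, which depend on how long each
    user was in the study."""
    theta_hats = np.asarray(theta_hats)
    w = 1.0 / np.asarray(variances)
    pooled = np.sum(w * theta_hats) / np.sum(w)   # weighted average under H0
    Q = np.sum(w * (theta_hats - pooled) ** 2)    # weighted between-user dispersion
    p_value = stats.chi2.sf(Q, df=len(theta_hats) - 1)
    return Q, p_value
```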
OK. So now, if you're familiar with reinforcement learning or bandits, you know one of the things we always want to do is estimate the average reward and compare that average reward under different policies. That's what I'm going to do here as well. So: on average, does the bandit algorithm select more effective actions — i.e., send a notification versus not — than the prior? Because remember, the prior was an informative prior; we built it off prior data, on similar individuals, with the exact same interventions.

I just want to make clear what I mean by average — how am I quantifying "more effective actions"? By that I mean you get a higher average reward. So here you have the value function, V_i(π) for the i-th user, where π is a particular policy for choosing actions. This is just the expectation of the i-th user's reward function, which is a function of state and action. This expectation is averaging over the states that that user experiences, as well as any stochasticity in the policy π — and our policies are always stochastic. So it averages both over the stochasticity in the states that the user finds themselves in and over the stochasticity in the actions. And we want to know: did the bandit algorithm produce a higher value — a higher average reward — than if we had just built our policy from the prior data and run with it?

So what we do is estimate that average reward under our bandit — our generalised bandit algorithm; it's posterior sampling, right, so the policy is changing with time. That's the reason there's a b_1 through b_T: the probability of selecting the action — send a message — changes with time. And the estimator of that value is just the average of rewards that you see in front of you for that individual. Now, for the estimate of the average reward under a different policy — for example, the policy built off the prior — we used importance weighting. There are more sophisticated estimators in the literature now, I just want to warn you, but this was a first-round kind of thing, so we used importance weights. You can see them there, on the right-hand side; they weight the observed rewards in order to estimate the average reward under the prior policy — that is, if we had just built the policy from HeartSteps V1 and run with it, without trying to do any learning. And already we should keep in mind that the prior policy — our subjective prior — said there was an effect of sending a suggestion.
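A sketch of the two per-user value estimates — the on-policy average and the importance-weighted estimate for a different target policy. The probabilities here are the probabilities of the action actually taken; as noted, this simple weighting is high-variance, and more sophisticated estimators now exist:

```python
import numpy as np

def value_under_bandit(rewards):
    """On-policy estimate of a user's average reward: just the mean."""
    return np.mean(rewards)

def value_under_target(rewards, behaviour_probs, target_probs):
    """Importance-weighted estimate of the same user's average reward under
    a different policy, e.g. the fixed policy built from the prior.

    behaviour_probs[t]: probability the bandit gave the action it actually
                        took at time t (these change over time: b_1, ..., b_T)
    target_probs[t]:    probability the target policy gives that same action
                        in that state"""
    w = np.asarray(target_probs) / np.asarray(behaviour_probs)
    return np.mean(w * np.asarray(rewards))
```

One design note: because the bandit's randomisation probabilities were clipped to [0.2, 0.8], these importance weights stay bounded, which helps keep the variance of this estimator under control.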
But then, when we went and did these analyses, it didn't look so great — at least not on average.

OK, so how are we going to do this? We're thinking again — we're still in that lovely classical meta-analysis world. So now, what do our θ's become? They're just one-dimensional now: θ_{i,0}, the i-th user's true θ, is the difference between that user's value under the bandit minus that user's value under the policy formed from the prior, and θ̂_i is the estimated version of the difference of these two things. Of course, we don't observe θ_{i,0} — we don't observe the first term; it's unknown. So what we do is the same as in classical meta-analysis — and this is generally true under certain conditions on the data structure — is note that θ̂_i, this estimator of the difference in values, is approximately normal. It has some variance, because you're estimating it. We make just this first assumption if we're interested in only these 91 users, i equal 1 to 91, and then we make an additional assumption — that's the second one right here, I have my pointer right below it — that the true difference in values varies from one individual to another in the population according to a normal distribution, sort of like a random-effects thing.

OK. So the statistics are identical — no difference, identical test statistics. So: did the bandit algorithm result in higher average rewards than the policy based on the prior? In fact it does, and the confidence interval doesn't contain zero. Now, this is not a big effect — remember, this is averaged over the population. Why do we think this might have happened? Well, the policy built off the prior wanted us to send the notification a lot, because the prior said there was a positive effect of sending that notification — the prior mean was positive. But the bandit learns that, on average across people, that's not true. So I interpret this as the effect of not bothering people too much, and every now and then that helps, on average.

OK, so now let's focus on our 91 users — just our 91 — and ask: that difference in values, the bandit versus the prior — in other words, the benefit of running a bandit — does that vary from one user to the other? And here we use the chi-squared statistic — this is the second type of hypothesis test, the second type of statistic — and under this hypothesis the test statistic has a chi-square distribution.
It's incredibly significant. There's a lot of evidence that, for some people, the bandit gives you very different results than the prior, as compared to the average.

So, OK, this is all fine and good, and then we wanted to start doing some exploratory work. We're getting closer now to question seven, the question that motivated this entire project — and it's still motivating the project — but first, question six. Our informative prior, built off HeartSteps V1, said there was a treatment effect; in fact, it said the treatment effect was pretty strong, because it was very strong in HeartSteps V1. On average, across users, does it appear that the bandit algorithm learns over time? Well, what should the bandit algorithm learn, on average across users? It should learn that, on average across users, there's not much going on.

OK, so the blue curve here is the actual, from the real trial: the average posterior mean of the treatment effect, in this particular state — the second bullet point gives you the state; it's the same state we looked at before — as the trial progresses. The x-axis is decision points and the y-axis is that posterior mean. If you look closely — I know it's hard to see — the blue curve starts at time zero around 0.47; 0.47 was the prior mean of the overall effect of sending a suggestion versus not in that state. And we suspect, from the analyses I've talked about today, that on average that didn't bear out. In fact, you see the blue curve — the average posterior mean of the treatment effect — starts to drift down as the study progresses; it drifts down towards zero.

So I wanted to understand: was that drift really important? Was it significant in some way? Here's what we did: we did bootstrap studies. There are two thousand black curves. Each black curve is a bootstrapped trial: a bandit algorithm applied to each of 91 bootstrapped users. So let me talk about how we made a bootstrap user. What we did was take our original 91 users' data, and we subtracted from every user's reward, at each time point, the posterior mean of the treatment effect at that time point. We call that difference a residual.
It's not mean zero because we just took away the treatment effect. 348 00:40:40,950 --> 00:40:49,440 But so now after we did that subtraction on each individual, the 91 individuals, we have a time series of state residual state residual. 349 00:40:49,440 --> 00:41:02,040 State residual state residual. Then what we do is we we get a bootstrap sample of these 91 trajectories of state residual state residual. 350 00:41:02,040 --> 00:41:15,210 And on each of the 91, we run a bandit. If the bandit says choose action one, we add back in a treatment effect according to the prior. 351 00:41:15,210 --> 00:41:22,970 If the band it says choose action zero, we don't do anything because, yeah, we leave it because the data is already had. 352 00:41:22,970 --> 00:41:34,490 So, OK, we do that for every person, so we have one bootstraps study of ninety one users of it is run on each the ninety one. 353 00:41:34,490 --> 00:41:41,420 And now what we do is we get this average posterior mean across those 91 bootstrapped users. 354 00:41:41,420 --> 00:41:48,200 And that's the black line. A black line is the posterior mean, evolving over time. 355 00:41:48,200 --> 00:41:51,770 Now, under the ground truth, it's the priors, correct? 356 00:41:51,770 --> 00:41:58,730 And in fact, you see if the price were correct, all these lines, it's a big mass because this is a stochastic algorithm. 357 00:41:58,730 --> 00:42:03,560 It's a big mess. It goes through time, but it's but the blue line. 358 00:42:03,560 --> 00:42:16,520 The truth. Rapidly deviates from that mask and goes below, so the actual study very quickly learns that the prior was incorrect. 359 00:42:16,520 --> 00:42:30,920 OK. So I'm getting close to the end, so this so this again, as I said earlier, this is the we had a number of situations like this. 360 00:42:30,920 --> 00:42:33,560 And this is what motivated all this work, and I think to me, 361 00:42:33,560 --> 00:42:44,840 this is really important because right now we have so many examples of AI producing horrendous false results and we cannot have I mean, 362 00:42:44,840 --> 00:42:49,820 that's our, you know, as a statistician, I want to make sure that whatever we say, 363 00:42:49,820 --> 00:42:54,110 I want to try and give you some measure of confidence with whatever we say. 364 00:42:54,110 --> 00:43:01,010 OK, so I'm going to show you a user data who who exhibited very interesting personalisation. 365 00:43:01,010 --> 00:43:08,870 This was not the only use for this one. So remember, this is going on in Seattle. 366 00:43:08,870 --> 00:43:18,230 The x axis is the date in the study. This this individual or join the study in November 15, 2019. 367 00:43:18,230 --> 00:43:21,650 pre-COVID, they exited the study. 368 00:43:21,650 --> 00:43:26,590 This individual exited the study at the beginning of August 2020. 369 00:43:26,590 --> 00:43:36,650 In the in the middle of the pandemic, Seattle closed down in March. 370 00:43:36,650 --> 00:43:41,540 OK, so let's let's talk about the y axis. 371 00:43:41,540 --> 00:43:50,420 Each dot is a ratio, it's the ratio of the posterior mean of the treatment effect in that that user's current state 372 00:43:50,420 --> 00:43:55,640 divided by the posterior standard deviation of the treatment effect in that user's current state. 373 00:43:55,640 --> 00:44:03,530 And the reason why we're graphing this on the y axis is this directly leads to the prob the posterior probability. 
OK. So I'm getting close to the end. This, again, as I said earlier — we had a number of situations like this, and this is what motivated all this work. To me this is really important, because right now we have so many examples of AI producing horrendous, false results, and we cannot have that. As a statistician, I want to make sure that whatever we say, we try to give some measure of confidence with it.

OK, so I'm going to show you data from a user who exhibited very interesting personalisation — and this was not the only such user. Remember, this is going on in Seattle. The x-axis is the date in the study. This individual joined the study on November 15th, 2019 — pre-COVID — and exited the study at the beginning of August 2020, in the middle of the pandemic. Seattle closed down in March.

OK, so let's talk about the y-axis. Each dot is a ratio: the ratio of the posterior mean of the treatment effect in that user's current state, divided by the posterior standard deviation of the treatment effect in that user's current state. The reason we're graphing this on the y-axis is that it leads directly to the posterior probability: the higher it is, the higher the posterior probability of sending a notification; the lower it is, the lower. So there's a dot for every one of those five decision times — a lot of dots are on top of each other. Now, the dots have two colours. The blue dots are when the user's current state indicates the person is engaged, and the red dots are when the user's current state indicates the person is not engaged. Engaged here means you're watching your app — the app had all kinds of things you could do on it — and the red state means you're just not watching the app quite as much. It's fascinating, because in general, if in that context — in that state — this individual hadn't been watching the app, then the algorithm says their treatment effect is much lower than if they had been watching, had been engaged. That's the blue.

This is glorious. I show this to domain scientists and they love it, right? Because this is what it means to personalise: with higher probability, when someone's engaged, they're sent a message, and with lower probability they're sent a message when they're less engaged. It sounds wonderful — but is it real? I mean, this is a stochastic algorithm; things happen by chance.

OK, so here's our exploratory data analysis to think about this — and this is only a first effort; there are other ways to pose this problem. The blue curve here — again, the x-axis is this individual's date in the study — is their estimated effect of engagement at that time in the study, and it starts off at zero because the prior mean for engagement was zero. All of these curves start off at zero for that reason. So the blue curve is the real data. There are 2,000 black curves: now what we're doing instead is getting 2,000 bootstrap versions of this user. So let's think about one bootstrap version of this user. We formed that state, residual; state, residual; state, residual series, like I talked about on the prior slide, just for that user. Then we bootstrap those state–residual pairs — we resample them — and we run the bandit algorithm under the ground truth that there's no effect of engagement, with everything else staying the same. So we just set that θ weight — the prior mean for that θ — equal to zero.
So now each black line is what the bandit thinks for that individual — that individual's posterior mean for the treatment effect of engagement — as the study progresses. And indeed, you see that at the very beginning the real data is highly consistent with no effect: the blue curve is well within the mass of the black curves. But as time goes on, the blue curve drifts to the top. In fact, I have a statistic here: the blue curve has a positive posterior mean treatment effect of engagement ninety-five percent of the time, and only about eight percent of the black curves have that. It's very interesting if you compare this to the prior graph, because even though it looks, way into March and April, like being engaged means it's better to send a message, the black lines indicate that the blue curve is well within the variance you might expect even if there were no effect of engagement. It's only when you get to June of 2020 that you start to see some indication that there's enough evidence that engagement really should be taken into account.

This is my last slide. So here we used a sequential online decision-making — a personalisation — algorithm. But did we achieve personalised digital health, personalised sequential decision making? In this whole analysis, what I did was assume that each user's states and rewards followed a classical bandit — that is, that prior actions don't influence future rewards. Even in that setting, how could you do a better job assessing personalisation, if you're willing to make that assumption? This is completely open. And if the bandit environment assumption is violated — which it definitely is, because if I send you too many notifications, in the future you're probably going to be less responsive — how do you assess this in a more general kind of setting? As far as I know, there's just nothing there. These are uniquely statistical questions, and they're critical for using AI in sequential decision making. Thanks.

Thank you. So, is there any question for Susan? If you want to talk, I don't know if you can use your microphone — or leave a question in the chat.
So I'll read one, actually. When you work in this kind of context, I would be a bit paranoid about, you know, missing some confounders. Are people being extremely careful in the design of the state space to consider these kinds of things?

Right. So these are designed experiments, right? An enormous amount of work goes into deciding what will be sensed, and it's related to what the scientific domain says should be important. That said, this is a very immature area of science, so it's implausible that we collected the entire state, and that's the reason why I think we're always going to see some element of this: we're going to get the appearance of non-stationarity, not because it's necessarily true, but rather because people are moving to another state while, to us, they look to be in the same state. So there's the issue of non-stationarity, and this is one of the reasons why you don't really ever want to let those probabilities go to one or zero: you want to be able to do interim off-policy analyses and look — are we getting some evidence of non-stationarity here? Yeah.

Is there any other question? I have one, related to your off-policy evaluation — I've done a bit of that recently. You used importance weighting, which is basically high variance. If it doesn't work, what kinds of things can be developed?

Well, it's a high-variance estimator here, that's number one. But right now there is a lot of research — this is a very active area of research — on how you do off-policy estimation when you only have one trajectory. I think there's a new paper on arXiv that came out in the last couple of months — Susan Athey, I think, may be on it — and she has a way now. A lot of the problem — we did a lot of simulations for this, and we didn't see any evidence of trouble, but simulations are not proofs, right? And in the end, we ourselves also have work on how you can do estimation after you've adaptively sampled, and there are different ways to weight, to try to adjust for it. The big problem is really that under certain scenarios you get aberrant behaviour. In our simulations we didn't see this, but when we write all of this up we'll probably take whatever's most recent in the literature and use it. Yeah.
But this is an area — it's great you asked that question, because this is an area of very active research: how do you do off-policy learning on one trajectory? Not n independent trajectories, where you have the advantage of independence, but one trajectory.

Yeah. I actually work in reinforcement learning, and I know there's other work in that vein, but the variance can be quite problematic.

Yeah, no question about that — the estimate you get, even when you do it right. It's interesting, though: you have to work hard to get it to misbehave. It depends, of course, on what you're estimating, and whether you're truncating. And also notice we were clipping at 0.2 and 0.8 — this changes things enormously.

Exactly, and that's why the simulations probably worked out OK for us. If we had allowed the probabilities to get close to zero or one, that's where you really get the problems.

So, is there any other question for Susan? Oh — I've got a question in the chat, asking whether you could explain a bit more the bootstrap sampling within each individual.

Yeah. So there were two types of bootstrap samples: one in which we bootstrapped individuals, and one in which we had one individual and we just bootstrapped within that individual. In both cases, the first step was the same. For each individual, we have a whole time series of state, action, reward; state, action, reward; state, action, reward. What we did was calculate, at the end of the study, the posterior mean of that individual's reward function — their mean reward in that state at that time — and we subtracted that from the reward. So now we have state, action, reward minus posterior mean in that state; state, action, reward minus posterior mean in that state. I'll call those differences residuals. Then we throw away the action, so we have state, residual; state, residual for each of the 91 individuals — a whole series. And then we do the bootstrap sampling under a ground truth that we're trying to test against — it's like a null. In the first case, the ground truth was that the prior means were correct, the ones we built off of HeartSteps V1. The way that happens is: you take one bootstrapped individual, which is state, residual; state, residual; state, residual, and you run the — I'm sorry, the bandit — on that one individual's data. So when the bandit sees the state, it chooses an action.
If the action is one, you add back in the mean from the prior; if the action the bandit chooses is zero, you leave the residual alone — that's just the reward now. And you move through time like that for that one person, and you do it for all 91 people that you bootstrapped. So in the first case we bootstrap individuals — we bootstrap trajectories. In the second case, we actually just had one individual — state, residual; state, residual; state, residual — and we bootstrapped those little pairs.

Thank you. Is there any other question for Susan? No? Well, let's thank her again. Thank you very much. You couldn't come in person this time, but we hope to be able to welcome you soon. And thank you again. OK, yeah — have a great weekend. Thank you. Thank you very much. Bye.