1 00:00:08,990 --> 00:00:16,790 All right, we're going to start with the second half of the morning session on experiments, 2 00:00:16,790 --> 00:00:20,180 and I'm very pleased to introduce now Professor Ray Duch, 3 00:00:20,180 --> 00:00:28,160 who is the director of the Centre for Experimental Social Sciences at Nuffield College. As Cenk was mentioning in his morning 4 00:00:28,160 --> 00:00:37,850 talk, CESS is quite an extraordinary achievement in that it has its own subject pool of fifty thousand subjects. 5 00:00:37,850 --> 00:00:49,070 It's spread across four locations: one at Nuffield in the UK, but then also in Chile, in India and in China. 6 00:00:49,070 --> 00:00:58,670 So it's quite an undertaking, and it has really allowed experimentation, or experimental social sciences, to be advanced, 7 00:00:58,670 --> 00:01:02,280 and Ray is going to talk a bit about some of his work, but also, 8 00:01:02,280 --> 00:01:12,440 I imagine, work that's being done at CESS. Ray himself is a political scientist who's done excellent work using experiments, 9 00:01:12,440 --> 00:01:20,360 digital trace data and public opinion analysis, and he's written a lot of very impressive papers. 10 00:01:20,360 --> 00:01:24,920 So I think he's going to talk a lot about them, so I'm going to let him do the talking rather than myself. 11 00:01:24,920 --> 00:01:33,440 Thank you. Is it on? 12 00:01:33,440 --> 00:01:39,740 Now? Cool. Great. Well, thank you very much for the invitation. It's a real pleasure to be here. 13 00:01:39,740 --> 00:01:47,390 From what I've heard, this has been a really interesting and rewarding week and next week should be, 14 00:01:47,390 --> 00:01:55,430 I suspect, also a really cool educational experience.
15 00:01:55,430 --> 00:02:05,060 I am going to talk a little bit about my work, and most of the stuff I'll 16 00:02:05,060 --> 00:02:12,830 talk about is work that is being conducted at CESS. I'm going to introduce, 17 00:02:12,830 --> 00:02:20,810 Oh, now I've. Yeah, you just so I can say that. 18 00:02:20,810 --> 00:02:25,590 Well. I can do it, screw it up. 19 00:02:25,590 --> 00:02:31,980 OK, but we can use this stuff. You want to just use cool. 20 00:02:31,980 --> 00:02:33,990 So there are three parts to the talk. 21 00:02:33,990 --> 00:02:44,580 One of them will be a brief introduction to CESS, which I think Cenk has already really filled you in on, so I'll be very brief. 22 00:02:44,580 --> 00:02:48,660 I'll introduce the talk. The talk has two modules. 23 00:02:48,660 --> 00:02:58,380 One will look at micro-replications, work that I've been doing exploring experimental measurement error. 24 00:02:58,380 --> 00:03:06,630 And the second module will talk about designing virtual experiments with stratified average treatment effects. 25 00:03:06,630 --> 00:03:13,500 So without any ado, the introduction. So I think Cenk went over basically who we are. 26 00:03:13,500 --> 00:03:21,060 We have four centres: here in Oxford; Santiago, Chile; India; and China. 27 00:03:21,060 --> 00:03:27,180 We have a pretty extensive online facility, which Cenk manages, and we have, 28 00:03:27,180 --> 00:03:37,530 as he pointed out, a subject pool of about 50,000 people. It's actually spread over the UK, Ireland, India, China, the US and Chile. 29 00:03:37,530 --> 00:03:45,360 And, as you'll see in the second module, we're also involved in field experiments. 30 00:03:45,360 --> 00:03:57,850 In addition to all of this, we do workshops, we have summer schools and we do visiting post-doc and pre-doc arrangements. 31 00:03:57,850 --> 00:04:06,670 The talk, so the talk will be, I think, experimental perspectives on themes you covered this week, and I sort of guessed here.
32 00:04:06,670 --> 00:04:11,110 But from my casual conversations, I think it's true. 33 00:04:11,110 --> 00:04:15,310 Basically, it'll sort of be a mix of computational methods, large data, 34 00:04:15,310 --> 00:04:22,510 social media and, what I consider really important, this notion of robust replications. 35 00:04:22,510 --> 00:04:31,000 The first module that I'll talk about will be about how to detect experimental measurement error. 36 00:04:31,000 --> 00:04:39,550 I'll talk about how experimental context, or mode, is really important for detecting experimental measurement error. 37 00:04:39,550 --> 00:04:47,980 And I'll explain how we use machine learning as a key element of that exercise. 38 00:04:47,980 --> 00:04:53,890 And then the last module will talk about large-scale experimental interventions. 39 00:04:53,890 --> 00:04:59,200 This is a project that's ongoing, that just started. 40 00:04:59,200 --> 00:05:10,510 It focuses on digital trace, broadly defined, outcomes, and it employs post-stratification in order to sort of estimate average treatment effects. 41 00:05:10,510 --> 00:05:19,840 So that's the layout of what I'll be discussing, and interrupt if I'm totally obtuse at any point, 42 00:05:19,840 --> 00:05:23,230 if you have any questions. Micro-replication. 43 00:05:23,230 --> 00:05:36,730 So I've been focussing on this pretty extensively because I think that we've sort of moved into this, what I call the data generation, 44 00:05:36,730 --> 00:05:45,130 which is one in which the costs of generating data have declined dramatically. 45 00:05:45,130 --> 00:05:50,910 Access to data has been what I would call democratised. 46 00:05:50,910 --> 00:06:00,220 Lots of people have access to the data. Convenience samples have become the norm.
47 00:06:00,220 --> 00:06:07,300 Nobody assigns Leslie Kish anymore and talks about representative samples, 48 00:06:07,300 --> 00:06:14,590 and there's been just a proliferation of what I would call data generation modes out there. 49 00:06:14,590 --> 00:06:21,190 And all of this has implications for the kinds of research we're doing, 50 00:06:21,190 --> 00:06:31,850 the kinds of research that gets reported, and has, in my view, resulted in some sort of 51 00:06:31,850 --> 00:06:41,540 probably negative side effects or outcomes, and the most important one is that lots of stuff that gets published can't be replicated, right? 52 00:06:41,540 --> 00:06:50,540 I mean, this is a table, a figure from Camerer's 2018 Nature piece, 53 00:06:50,540 --> 00:06:58,430 where he replicated a number of experiments that were published in highly regarded psychology journals. 54 00:06:58,430 --> 00:07:02,900 And the yellow stuff is the stuff that couldn't get replicated, right? 55 00:07:02,900 --> 00:07:06,410 And these are studies that are published in some of the 56 00:07:06,410 --> 00:07:11,310 leading psychology journals in the discipline, and replication's a serious problem. 57 00:07:11,310 --> 00:07:15,080 But you could see this in economics. You can see this in political science. 58 00:07:15,080 --> 00:07:21,170 I don't know about sociology, actually, but I suspect you have similar problems as the others. 59 00:07:21,170 --> 00:07:31,010 So replication's an issue. And I think part of the reason that it's an issue is because, oops, oops. 60 00:07:31,010 --> 00:07:35,760 Now I've gotten. Want to go back? 61 00:07:35,760 --> 00:07:41,820 That's a small. It's a smaller one, OK, great. 62 00:07:41,820 --> 00:07:46,860 So the reason we have this problem, I think, is because of this democratisation of data generation.
63 00:07:46,860 --> 00:07:51,390 The fact that the costs of data generation have been dramatically reduced. 64 00:07:51,390 --> 00:07:58,590 The fact that people generate data using all kinds of data generation modes. 65 00:07:58,590 --> 00:08:09,720 Right. And the problem, of course, at least in the experimental world, is if you do an experiment, right? 66 00:08:09,720 --> 00:08:18,030 How do you know you're in either the blue or the orange state, right? You do an experiment. 67 00:08:18,030 --> 00:08:27,870 You get a result. You write up a paper, you submit it to the American Economic Review, and the question, 68 00:08:27,870 --> 00:08:37,150 the important question here for me, in any case, is: how do you know this is not generated by experimental, 69 00:08:37,150 --> 00:08:45,000 experimental measurement error? Oops! I don't know. Right. 70 00:08:45,000 --> 00:08:51,640 The. Typically, you don't have a clue, is my point. 71 00:08:51,640 --> 00:08:57,540 Right, I mean, so you don't really know. You know, you do an experiment with 500 people, right? 72 00:08:57,540 --> 00:09:02,270 You've got this result. It's a significant average treatment effect. You really have, 73 00:09:02,270 --> 00:09:07,670 it's very uncertain as to whether this result is robust. 74 00:09:07,670 --> 00:09:16,850 Right now, you can publish it. And you know, you may end up being one of the unfortunate people in the yellow sign, right? 75 00:09:16,850 --> 00:09:21,140 People try to replicate your result and it doesn't replicate. Right. 76 00:09:21,140 --> 00:09:25,680 And so I've been thinking about sort of how do you address this issue? 77 00:09:25,680 --> 00:09:36,220 How do you basically take measures to ensure that the experiment that you've conducted is generating a robust result? 78 00:09:36,220 --> 00:09:47,800 Right?
Well, micro-replications may be a way to address this, micro-replications being replications within the same study. 79 00:09:47,800 --> 00:09:52,340 Right. Essentially, I do a study and, oh, maybe I'd better micro- 80 00:09:52,340 --> 00:10:02,030 replicate this to make sure it's a robust result, to make sure that other people will be able to replicate it if it gets published, right? 81 00:10:02,030 --> 00:10:05,810 And the argument I make in this, 82 00:10:05,810 --> 00:10:14,390 in this article basically is that multi-mode rather than single-mode replications are the way to approach the problem, right? 83 00:10:14,390 --> 00:10:24,500 So for example, if you do an MTurk study and you get an interesting result, don't simply sort of, 84 00:10:24,500 --> 00:10:29,930 you know, pay 500 more MTurk respondents to do the exact same experiment, right? 85 00:10:29,930 --> 00:10:33,980 And then argue that you've got a robust micro-replication. All right. 86 00:10:33,980 --> 00:10:41,840 So rather than replicate within a mono or single mode, think about replicating in multiple modes. 87 00:10:41,840 --> 00:10:46,260 All right. That's the sort of punch line of this entire, 88 00:10:46,260 --> 00:10:54,140 this particular module, and there are lots of different modes or contexts in which you can conduct this work, right? 89 00:10:54,140 --> 00:10:58,880 You can use Amazon. You can use Lucid. These are basically crowdsourced. 90 00:10:58,880 --> 00:11:08,330 Or you could use Respondi, which is a large-scale subject pool, a commercial subject pool. 91 00:11:08,330 --> 00:11:16,820 You could use something like Knowledge Networks in the US, which has a very sort of representative sample of online respondents.
92 00:11:16,820 --> 00:11:22,280 Or you could use us, or you could do something like the American National Election Study or GSS, 93 00:11:22,280 --> 00:11:27,650 right, which has a very robust representative in-person sample, right? 94 00:11:27,650 --> 00:11:40,590 So the point is that there are various kinds of modes that you can use when you're thinking about replicating a particular experiment. 95 00:11:40,590 --> 00:11:48,360 And what are you trying to do? Well, you're trying to sort of figure out whether there is measurement error associated with a particular mode, right? 96 00:11:48,360 --> 00:11:52,650 So M sub k is basically the measurement error associated with the particular mode. 97 00:11:52,650 --> 00:11:58,950 Right? And when you present an average treatment effect, you should always 98 00:11:58,950 --> 00:12:07,410 assume that it's going to reflect a real average treatment effect, plus the measurement error. 99 00:12:07,410 --> 00:12:12,930 Right. And the point is, how do you sort of figure out how big this M sub k is? 100 00:12:12,930 --> 00:12:18,270 Well, if you only do this in one mode, you don't really have any idea, right? 101 00:12:18,270 --> 00:12:27,090 If you only do this with MTurk, right, you're not going to really be able to figure out whether there is mode-related experimental error. 102 00:12:27,090 --> 00:12:33,240 Right? So unless you look at different modes, you're not going to be able to sort of figure that out. 103 00:12:33,240 --> 00:12:37,330 And if you, 104 00:12:37,330 --> 00:12:50,000 if you're reasonably comfortable that you can identify the measurement error, right, then replicating in multiple modes is a dominant strategy. 105 00:12:50,000 --> 00:12:58,510 That's what I'm going to try to convince you of with this particular presentation. 106 00:12:58,510 --> 00:13:05,530 So this is a simulation that we did as part of this paper.
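[Editorial sketch, not from the talk.] The decomposition described above, an estimated average treatment effect equal to the real effect plus a mode-specific measurement error, can be illustrated with a small simulation. Everything here is an assumption made for illustration: the mode names, the error sizes, the noise level, and the additive way the mode shifts treated outcomes. It is not the simulation from the speaker's paper.

```python
import random
import statistics


def simulate_mode_estimate(true_ate, mode_error, n, noise_sd, rng):
    """Difference in means from one experiment run in a single mode.

    Assumption: the mode shifts treated outcomes by `mode_error`, so the
    estimate targets true_ate + mode_error instead of true_ate.
    """
    control = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    treated = [rng.gauss(true_ate + mode_error, noise_sd) for _ in range(n)]
    return statistics.fmean(treated) - statistics.fmean(control)


rng = random.Random(0)
TRUE_ATE = -0.7  # hypothetical value, chosen only for illustration
MODE_ERRORS = {"mode_a": 0.0, "mode_b": 0.3}  # hypothetical mode errors

estimates = {
    mode: simulate_mode_estimate(TRUE_ATE, err, n=20000, noise_sd=1.0, rng=rng)
    for mode, err in MODE_ERRORS.items()
}

# Within one mode the estimate is all you see: true effect and mode error
# are confounded. Only the gap between modes reveals mode-related error.
gap = estimates["mode_b"] - estimates["mode_a"]
print(estimates, round(gap, 2))
```

With a single mode, nothing in the data separates the real effect from the mode error; the comparison across modes is what makes the error visible.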
This is a paper that is coming out in Political Analysis. 107 00:13:05,530 --> 00:13:09,920 And what the simulation basically suggests is that, 108 00:13:09,920 --> 00:13:19,520 this parameter here basically indicates how certain a researcher is that they will be able to identify which 109 00:13:19,520 --> 00:13:29,170 of the modes is actually the better mode, or the mode with the low or ideally close to zero measurement error. 110 00:13:29,170 --> 00:13:36,860 Right now, if you don't think you can figure this out, then replicating in multiple modes is not going to help you, right? 111 00:13:36,860 --> 00:13:42,500 And so what we try to illustrate in the simulation is that, OK, well, 112 00:13:42,500 --> 00:13:53,960 what happens if you are 50 percent likely to sort of identify which of the modes is the low error or zero measurement error mode, right? 113 00:13:53,960 --> 00:14:03,200 And of course, as this parameter gets higher, right, you're much more likely to be able to identify which of the modes is low error versus higher. 114 00:14:03,200 --> 00:14:21,000 OK. So what this horizontal axis suggests is the probability that you are actually in a high measurement error context. 115 00:14:21,000 --> 00:14:32,030 Right? And what the vertical axis suggests is the probability that you are in a very low or zero measurement error context, right? 116 00:14:32,030 --> 00:14:37,440 And what the simulation is suggesting is that as long as you 117 00:14:37,440 --> 00:14:47,790 have a reasonably good chance of identifying which of the modes has low measurement error, then, 118 00:14:47,790 --> 00:14:59,190 the green is basically a reduction in the sampling error relative to the, basically 119 00:14:59,190 --> 00:15:10,260 the green indicates that you're more likely to select the right mode if you replicate in a second mode, right?
120 00:15:10,260 --> 00:15:10,800 And basically, 121 00:15:10,800 --> 00:15:19,860 what this is suggesting is that as long as you're reasonably adept at identifying the mode that has low measurement error, 122 00:15:19,860 --> 00:15:28,890 then replicating in a different mode is always the optimal strategy. 123 00:15:28,890 --> 00:15:32,790 OK, so let me just illustrate this. 124 00:15:32,790 --> 00:15:46,890 So this is an experiment that I've been doing with two others, an economist in Russia and one of our post-docs in Santiago. 125 00:15:46,890 --> 00:15:55,380 And the experiment is just a lying experiment, right? And it's similar to a public goods game. 126 00:15:55,380 --> 00:15:58,740 People play this experiment. 127 00:15:58,740 --> 00:16:08,950 People are asked to do a real effort task, which involves adding some digits, and then they're asked to report their income. 128 00:16:08,950 --> 00:16:18,020 And they can lie about the income. And our interest really is how much lying occurs in these experiments, right? 129 00:16:18,020 --> 00:16:21,630 And the treatment in this experiment is a deduction rate, right? 130 00:16:21,630 --> 00:16:34,220 Right. So the deduction rates vary and the expectation is that the reporting will decline as the deduction rates increase. 131 00:16:34,220 --> 00:16:40,310 So it's a pretty simple experiment. I'll give you the details of the actual design, right? 132 00:16:40,310 --> 00:16:48,020 So people come into this experiment and there are three different tax rates, either 10 percent, 20 percent or 30 percent tax rates, 133 00:16:48,020 --> 00:16:57,980 the deduction rates. People play this experiment in groups of four and, once the taxes are levied, 134 00:16:57,980 --> 00:17:02,630 they're redistributed equally amongst the group members. So there's a public good. 135 00:17:02,630 --> 00:17:09,560 There's no excludability.
There's no social gains or losses. In most of these, there's no audits or fines. 136 00:17:09,560 --> 00:17:16,300 And people play this 10 times. And they're paid for one of the rounds at random. 137 00:17:16,300 --> 00:17:21,100 They play in groups of four and there's random matching at the beginning of the experiment. 138 00:17:21,100 --> 00:17:28,210 OK. So in each round people 139 00:17:28,210 --> 00:17:37,030 add these two-digit numbers, they have one minute. The more numbers they add, the more money they make, 140 00:17:37,030 --> 00:17:44,230 and they're then told how much money they've made and then they have to declare 141 00:17:44,230 --> 00:17:51,310 their income and then they're told what their total profit is, and their total profit equals 142 00:17:51,310 --> 00:17:59,530 how much they didn't declare, plus the equally shared tax revenues in their group of four. 143 00:17:59,530 --> 00:18:03,730 So the experiment is pretty simple, right? 144 00:18:03,730 --> 00:18:16,120 And our primary interest here is understanding the extent to which people lie about their income after they conduct this real effort task. 145 00:18:16,120 --> 00:18:22,660 And then the treatments in this experiment are varying the deduction rates. 146 00:18:22,660 --> 00:18:28,660 OK, so that's a sort of simple public goods kind of game. 147 00:18:28,660 --> 00:18:33,340 So the interesting part here is that we conduct this experiment in different modes. 148 00:18:33,340 --> 00:18:37,270 So this is my mode. This is the modes element here. So I conduct this experiment. 149 00:18:37,270 --> 00:18:43,270 We conducted this experiment initially in the Oxford lab, right? 150 00:18:43,270 --> 00:18:50,800 We did this with the Oxford subject pool.
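[Editorial sketch, not from the talk.] The round payoff described above can be written out as a small function. This is one reading of the spoken description and rests on assumptions that are not confirmed by the source: the deduction is applied to declared income only, the pooled deductions are split equally among all group members including the contributor, and the function names are hypothetical.

```python
def round_profits(earnings, declared, deduction_rate):
    """Profit per group member for one round.

    Assumption: each member keeps earnings minus the deduction on what
    they declared, plus an equal share of all deductions in the group.
    """
    if len(earnings) != len(declared):
        raise ValueError("earnings and declared must have the same length")
    deductions = [deduction_rate * d for d in declared]
    share = sum(deductions) / len(earnings)
    return [e - t + share for e, t in zip(earnings, deductions)]


def lying_rate(earnings, declared):
    """Share of earned income that was not declared."""
    return 1.0 - sum(declared) / sum(earnings)


# Hypothetical group of four at a 20 percent deduction rate.
earned = [100, 100, 100, 100]
reported = [100, 50, 0, 100]
profits = round_profits(earned, reported, 0.2)
print(profits, lying_rate(earned, reported))
```

Under these assumptions the redistribution is zero-sum within the group, and a member who declares nothing still collects the equal share, which is the free-riding incentive the lying measure is meant to capture.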
We had a total of sixteen thousand decisions, 151 00:18:50,800 --> 00:19:01,210 right, made by one hundred and sixteen subjects, and the average rate of lying for those subjects is about fifty seven percent. 152 00:19:01,210 --> 00:19:14,260 So subjects basically declared forty three percent of their income and the lying rate was fifty seven percent, right? 153 00:19:14,260 --> 00:19:19,010 So that's the outcome of the Oxford lab. So that's. 154 00:19:19,010 --> 00:19:22,230 Which is in person? They've come. Oh, yeah. 155 00:19:22,230 --> 00:19:28,460 So, yeah, so this is a lab experiment, right? They come in person, they come to our lab, which is just next door, basically. 156 00:19:28,460 --> 00:19:33,530 And they play this game. So usually about twenty five people come into the lab and play. 157 00:19:33,530 --> 00:19:37,610 Twenty four come into the lab and play. It is divided into groups of four. 158 00:19:37,610 --> 00:19:43,460 And then they play this. Actually, in the lab they played it a little bit more than 10 rounds. 159 00:19:43,460 --> 00:19:47,570 But that's that, and this was the sort of foundational result. 160 00:19:47,570 --> 00:19:52,880 So we get this result. And we thought, OK, well, there's a fair amount of lying. 161 00:19:52,880 --> 00:20:00,630 But this is one mode, right? And so we actually replicated this in a number of different modes. 162 00:20:00,630 --> 00:20:07,640 So we also replicated. We did it with our online subject pool, right? 163 00:20:07,640 --> 00:20:15,770 And so here we had about one hundred thirteen hundred and sixty seven decisions and we had one hundred and forty four subjects play this. 164 00:20:15,770 --> 00:20:20,210 People played online. So they came into a virtual waiting room. 165 00:20:20,210 --> 00:20:30,050 They waited till there were four people.
They then played the game exactly as we played it in the lab, except they played it online, right? 166 00:20:30,050 --> 00:20:35,520 And so that's a different mode, right? So people are playing this now, and it's a different subject. 167 00:20:35,520 --> 00:20:41,300 It's a different subject. It's our virtual subject pool. We then had people come. 168 00:20:41,300 --> 00:20:51,900 We then conducted the exact same. Let me just now that confuses. 169 00:20:51,900 --> 00:20:57,690 So actually, this. This version is 170 00:20:57,690 --> 00:21:04,280 our subjects from the lab playing the experiment online. 171 00:21:04,280 --> 00:21:10,870 This version of the experiment is our virtual subject pool actually playing it online. 172 00:21:10,870 --> 00:21:21,490 And the last column here is US MTurk subjects playing the experiment online. 173 00:21:21,490 --> 00:21:31,180 So the idea here is we very significantly change the actual mode in which the experiment was being conducted. 174 00:21:31,180 --> 00:21:37,630 And so this gives you a flavour of what I'm suggesting people ought to do when they're conducting these experiments, 175 00:21:37,630 --> 00:21:44,650 if they want to leverage mode, right, as a possible explanation for measurement error. 176 00:21:44,650 --> 00:21:53,670 So. So one thing I'm concerned about is whether demographics are sort of controlled here across all different modes. 177 00:21:53,670 --> 00:22:02,740 That's a very good question. So I'm not so much concerned about demographics, but it is obviously an issue. 178 00:22:02,740 --> 00:22:10,000 And I'll show you in the analysis how we deal with it, because obviously you're right, these people will all be young, right? 179 00:22:10,000 --> 00:22:17,860 Mm-Hmm. As will these people. Right. But as we get into the online subject pool, we're going to get older subjects, right.
180 00:22:17,860 --> 00:22:21,910 Particularly in the MTurk world. 181 00:22:21,910 --> 00:22:29,950 But we'll get a more representative sample of the population. 182 00:22:29,950 --> 00:22:35,860 I'm going to focus on trying to tease out the mode effect here. 183 00:22:35,860 --> 00:22:43,200 Yes. Just a quick question, because for the experiment you needed groups of four, right? 184 00:22:43,200 --> 00:22:49,090 How does that work with the MTurkers, would they also have like a virtual waiting room, and this was with MTurk too? Yes. 185 00:22:49,090 --> 00:22:53,070 And it's not easy. Let me put it, look. 186 00:22:53,070 --> 00:23:00,930 This is very difficult to do online, actually, because what has to happen is that the virtual subjects have to come into a virtual waiting room. 187 00:23:00,930 --> 00:23:07,210 They have to wait till there are four individuals and then they basically play the game, right. 188 00:23:07,210 --> 00:23:12,060 So it's quite challenging. Right? 189 00:23:12,060 --> 00:23:23,180 And I'm only reporting here the results from the UK in the end, but we also did this in other countries, right. 190 00:23:23,180 --> 00:23:28,780 OK. So this is the conventional sort of estimation, right, 191 00:23:28,780 --> 00:23:40,240 where basically I've simply broken up the results into these groups: the lab, the online lab, the online UK and MTurk. 192 00:23:40,240 --> 00:23:52,130 Right. And the initial sort of suggestion here is that, I shouldn't touch this. 193 00:23:52,130 --> 00:23:58,870 OK, the deduction rate here clearly seems to be working, right: as the deduction rate goes up, 194 00:23:58,870 --> 00:24:03,280 right, people are reporting less of their income. 195 00:24:03,280 --> 00:24:12,570 That's the negative coefficient. There's some evidence here that the online subject pool, right, is behaving differently.
196 00:24:12,570 --> 00:24:15,480 So this is the conventional strategy, right? 197 00:24:15,480 --> 00:24:25,650 So what we argue in this paper is that really you want to sort of leverage machine learning to try to tease out the heterogeneity. 198 00:24:25,650 --> 00:24:36,420 Right? Because essentially what we argue in the paper is that you want to be totally 199 00:24:36,420 --> 00:24:43,710 indifferent about what the possible heterogeneous treatment effects can be. 200 00:24:43,710 --> 00:24:48,720 And then you want to look at all of the possible heterogeneous treatment effects 201 00:24:48,720 --> 00:24:52,800 and then draw some conclusion as to whether it's mode related or whether it's, 202 00:24:52,800 --> 00:24:57,750 as you pointed out, maybe related to particular demographic characteristics. 203 00:24:57,750 --> 00:25:03,750 Right. So that's the punch line of this paper. 204 00:25:03,750 --> 00:25:04,470 Right. 205 00:25:04,470 --> 00:25:14,940 And so we argue basically that you want to use some sort of machine learning effort to sort of identify conditional average treatment effects, right? 206 00:25:14,940 --> 00:25:22,180 In other words, the average treatment effects that are conditioned on particular characteristics, either of the mode or of the sample. 207 00:25:22,180 --> 00:25:29,230 Right. And so that's the first sort of stage of this estimation strategy. 208 00:25:29,230 --> 00:25:38,190 So we use this BART estimation, which is sort of like a random forest, which I'm assuming that Roberto talked about, OK. 209 00:25:38,190 --> 00:25:46,800 And so basically, it's just sort of a machine learning strategy for teasing out heterogeneity in the data set, right? 210 00:25:46,800 --> 00:25:53,850 And what this essentially allows us to do for each individual in the dataset. 211 00:25:53,850 --> 00:25:59,010 So if you go back here, we have.
212 00:25:59,010 --> 00:26:06,510 A lot of decisions, we have about twenty four thousand decisions here, six or seven hundred here, but they're not totally independent. 213 00:26:06,510 --> 00:26:11,040 Fourteen thousand here and fifteen thousand here, so we have lots of decisions being made. 214 00:26:11,040 --> 00:26:22,590 And that's really important, right? And the idea here is we want to be able to determine whether the average treatment 215 00:26:22,590 --> 00:26:31,860 effect with respect to any particular covariates in the dataset right there is right. 216 00:26:31,860 --> 00:26:34,920 That's the big. So in an ideal world, right, 217 00:26:34,920 --> 00:26:40,560 the average treatment effect should be constant for all individuals in this dataset. 218 00:26:40,560 --> 00:26:46,440 Right. So the BART estimation is going to allow us to look at the extent to which the 219 00:26:46,440 --> 00:26:52,440 average treatment effect is different for different individuals in the dataset, 220 00:26:52,440 --> 00:27:04,350 and also will allow us to infer whether variation in the conditional average treatment effect is related to mode or something else. 221 00:27:04,350 --> 00:27:15,660 So that's the punch line of this exercise. So the BART code is available on my GitHub, and it's pretty simple to implement. 222 00:27:15,660 --> 00:27:19,920 All right. And this is the result, right? 223 00:27:19,920 --> 00:27:22,530 So this is the result. So what does the result tell us? 224 00:27:22,530 --> 00:27:32,070 So if you look at this blue line, that's the estimated average treatment effect for all of the data, right? 225 00:27:32,070 --> 00:27:42,420 And basically, it's about negative point seven, right, which is the right direction. 226 00:27:42,420 --> 00:27:47,940 Right. So we want the treatment effect to be negative because, as deduction rates go up,
227 00:27:47,940 --> 00:27:56,340 we want people to cheat more, right, and report less. So that's clearly the right outcome. 228 00:27:56,340 --> 00:28:02,550 The dotted red line is the zero effect, right? 229 00:28:02,550 --> 00:28:10,560 And all we've done, and this is the sort of attraction of the BART estimation strategy, is we've simply 230 00:28:10,560 --> 00:28:17,730 basically organised all of the conditional average treatment effects along this horizontal axis. 231 00:28:17,730 --> 00:28:28,420 And then we've looked at the. No. 232 00:28:28,420 --> 00:28:34,480 OK, and then we've looked at the magnitudes, right, so clearly, 233 00:28:34,480 --> 00:28:43,180 so this is what's really attractive about this estimation strategy: visually, it's a very nice summary, right? 234 00:28:43,180 --> 00:28:47,080 It gives you a very nice summarisation of all of the data very quickly. 235 00:28:47,080 --> 00:28:56,860 So this basically suggests that, well, most of the conditional average treatment effects are below zero, which is good, right? 236 00:28:56,860 --> 00:29:03,220 There are some above zero, but there aren't that many. Right. So that's sort of encouraging. 237 00:29:03,220 --> 00:29:11,990 And then all we've done here is we've simply graphically presented the data 238 00:29:11,990 --> 00:29:21,250 and colour-coded the four different modes, because the modes are what we 239 00:29:21,250 --> 00:29:29,650 were really interested in in this paper. And oh, now I don't have the legend, but I think I remember. 240 00:29:29,650 --> 00:29:35,400 So the red is the lab, 241 00:29:35,400 --> 00:29:46,730 the conventional lab mode; the orange or brown is the lab participants online. 242 00:29:46,730 --> 00:30:01,160 And the grey is MTurk, and this light colour is the CESS online subject pool for the UK. 243 00:30:01,160 --> 00:30:11,090 All right.
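[Editorial sketch, not from the talk.] The plot described above takes per-decision conditional average treatment effect estimates, orders them along the horizontal axis, and colour-codes them by mode. The bookkeeping behind such a summary can be sketched as follows. The numbers are invented placeholders, not results from the study, and no BART model is fitted here; the sketch only shows how already-estimated effects might be ordered and grouped by mode.

```python
from collections import defaultdict
from statistics import fmean

# Invented (mode, estimated CATE) pairs standing in for model output.
cates = [
    ("lab", -1.1), ("lab", -0.9),
    ("lab_online", -0.8), ("lab_online", -0.6),
    ("cess_online_uk", -0.2), ("cess_online_uk", 0.1),
    ("mturk", -0.7), ("mturk", -0.5),
]

# Order the estimates from most negative to most positive, as on the
# horizontal axis of the plot.
ordered = sorted(cates, key=lambda pair: pair[1])

# Share of estimates below the zero-effect line.
share_negative = sum(1 for _, cate in ordered if cate < 0) / len(ordered)

# Group by mode: this is the information the colour coding conveys.
by_mode = defaultdict(list)
for mode, cate in ordered:
    by_mode[mode].append(cate)
mode_means = {mode: fmean(values) for mode, values in by_mode.items()}

print(share_negative)
print(mode_means)
```

The same grouping step could use any covariate instead of mode, for example gender or age band, once per-decision effects are available.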
So right off the bat, you get a sense, right, a visual sense of whether mode matters here. 244 00:30:11,090 --> 00:30:19,100 Right, right. You get a visual sense of whether mode is affecting the magnitude of the average treatment effect, right? 245 00:30:19,100 --> 00:30:26,240 And clearly there is some evidence here, right? Clearly, and this was your intuition, right? 246 00:30:26,240 --> 00:30:40,130 I mean, clearly, the online subject pool is more likely to have a quite moderated treatment effect, right? 247 00:30:40,130 --> 00:30:50,990 The lab subjects clearly are much more likely to behave at wrestling, which is consistent with a variety of other stuff that I've done. 248 00:30:50,990 --> 00:30:54,620 So it's not surprising now. So this is, 249 00:30:54,620 --> 00:31:00,290 this is what I argue people should do. They should do their experiments in different modes. 250 00:31:00,290 --> 00:31:03,920 They should explicitly use diverse modes, right? 251 00:31:03,920 --> 00:31:08,660 Because you want to establish the robustness of your treatment effect, right? 252 00:31:08,660 --> 00:31:21,780 And then visually explore the likelihood that mode might be affecting the magnitude of your effects, right? 253 00:31:21,780 --> 00:31:28,800 Yes, so just a quick clarification, so this is a conditional average treatment effect conditioned specifically only on 254 00:31:28,800 --> 00:31:33,630 the mode, or is this conditioned, like, I saw that you included gender and age, 255 00:31:33,630 --> 00:31:37,230 you could do everything you can do out here. I've just presented the mode. 256 00:31:37,230 --> 00:31:40,500 So the one above is, so here's the way to think about this. 257 00:31:40,500 --> 00:32:00,300 So this is all of the individuals in the data. Right? So this person here, right, might be a woman of age 19 in the UK subject pool.
258 00:32:00,300 --> 00:32:07,890 That's the way to think about it. And actually, these are decisions. So that person actually shows up ten times, right? 259 00:32:07,890 --> 00:32:18,330 But her decision could vary, actually. And I've just taken this data here and just reorganised it, right? 260 00:32:18,330 --> 00:32:22,980 So I'm only looking at the four modes, right? But you could reorganise it however you want. 261 00:32:22,980 --> 00:32:28,770 Right? You could then say, I'm going to look at gender, I'm going to look at gender and whatever you want. 262 00:32:28,770 --> 00:32:36,360 So it's just the average, the average treatment effect for each person? Exactly, right. 263 00:32:36,360 --> 00:32:40,710 That's it. So that's the attraction of these: 264 00:32:40,710 --> 00:32:51,660 the BART method explicitly estimates an average treatment effect for each unique covariate combination in the dataset. 265 00:32:51,660 --> 00:32:55,470 Right. And then you can organise everything however you want. All right. 266 00:32:55,470 --> 00:33:03,750 That's the. And so my argument is that this is probably how you want to approach the exercise. 267 00:33:03,750 --> 00:33:18,080 Yup. Just a short question. I mean, on the x axis, it must be decisions. If it's decisions, 268 00:33:18,080 --> 00:33:22,430 how can you compute a treatment effect for a decision? 269 00:33:22,430 --> 00:33:35,010 I mean, it's. Well, that's a good point. Well, this individual will make 10 decisions, right? 270 00:33:35,010 --> 00:33:47,050 And so here we've actually computed an average treatment effect for that person 10 times. 271 00:33:47,050 --> 00:33:51,190 But then there must be a lot of correlation.
272 00:33:51,190 --> 00:33:56,140 Yes, there's a high amount of correlation here. Yes, but it's not total, right? 273 00:33:56,140 --> 00:34:01,140 I mean, because people will make, you know, for example. 274 00:34:01,140 --> 00:34:10,900 You might argue that, as a covariate, you might think when you make that decision matters. 275 00:34:10,900 --> 00:34:15,060 So whether you make it in round one or whether you make it in round 10. 276 00:34:15,060 --> 00:34:21,030 So that can be part of the covariate dimensionality. 277 00:34:21,030 --> 00:34:26,520 Right. But you're right. I mean, they're probably highly correlated. 278 00:34:26,520 --> 00:34:31,710 It's a bit misleading to say, I've got 5000 observations here. 279 00:34:31,710 --> 00:34:42,320 I can see that. It's probably more like five hundred. 280 00:34:42,320 --> 00:34:55,460 OK, so this is generally the flavour of the way I think you should approach the problem of micro replication in multiple modes. 281 00:34:55,460 --> 00:34:57,530 Find out whether, you know, 282 00:34:57,530 --> 00:35:06,710 there is some heterogeneity in the conditional average treatment effect and then think about whether it's related to mode, and that's what we did. 283 00:35:06,710 --> 00:35:14,450 And then we asked ourselves the question, well, then the hard part, of course, is: why are we getting this, right? 284 00:35:14,450 --> 00:35:20,330 So that's the next stage of the analysis, right? Clearly, there seems to be some measurement error here. 285 00:35:20,330 --> 00:35:29,110 But why? Right? Yes, this a very well. 286 00:35:29,110 --> 00:35:35,020 Because you chose different countries, right, and different modes that are specific to different countries. 287 00:35:35,020 --> 00:35:38,920 Wouldn't it be better to test all the modes in one country? 288 00:35:38,920 --> 00:35:42,280 That's a good point. That's a good point.
289 00:35:42,280 --> 00:35:45,520 So we. So we effectively did. 290 00:35:45,520 --> 00:35:57,880 The first three modes are all in the UK, but we did want to use a crowdsourced mode like MTurk. 291 00:35:57,880 --> 00:36:06,610 And the problem is, at least when we started doing this, it was very difficult to find a crowdsourced mode in the UK. 292 00:36:06,610 --> 00:36:10,870 So we did take the easy solution and we did do it in the US. 293 00:36:10,870 --> 00:36:12,730 But the the but. 294 00:36:12,730 --> 00:36:26,740 But you may be right, but the point is that the Americans and the UK online subject pool, right, do exhibit similar patterns. 295 00:36:26,740 --> 00:36:31,900 But you're right, in an optimal world, we should not have gone out of the UK. 296 00:36:31,900 --> 00:36:36,250 I agree. Yeah, I had a question about it, 297 00:36:36,250 --> 00:36:43,510 because the modes are still within like into sample selection after people kind of go to 298 00:36:43,510 --> 00:36:49,900 the Nuffield or to your interview or through the Met with their sort of virtue is to. 299 00:36:49,900 --> 00:36:56,680 We out measurement error relative to some sort of general population of people who are in all the different surveys. 300 00:36:56,680 --> 00:37:00,380 Yes. So we try to do that. I'll show you in a minute how we do. 301 00:37:00,380 --> 00:37:06,700 Or maybe, I mean, we try to address this and you can see whether you think we did it reasonably. 302 00:37:06,700 --> 00:37:10,940 Because the idea of them being that replication is so that the initial study, 303 00:37:10,940 --> 00:37:18,100 which is basically the reason for this in-depth analysis, is this replication hasn't been successful. 304 00:37:18,100 --> 00:37:23,860 But then this would not solve the replication on the same mode, right?
305 00:37:23,860 --> 00:37:28,510 There is still an added value in replication within Mechanical Turk. 306 00:37:28,510 --> 00:37:35,200 There would be a logical comparison, right? Yes. Yes, exactly. So if you did the Mechanical Turk study, 307 00:37:35,200 --> 00:37:42,550 I would argue that the value added of doing a replication within Mechanical Turk is probably not high. 308 00:37:42,550 --> 00:37:50,050 Right. And that's what our little simulation showed. It's probably better to think about doing the replication in a very different context, 309 00:37:50,050 --> 00:37:53,210 in a very different mode, because that will be more informative. 310 00:37:53,210 --> 00:38:00,240 Yeah, but then just to get it clear, because one could then suggest that rather than replicating in a different mode, 311 00:38:00,240 --> 00:38:05,320 you should do it in the same mode, right? Because if the other mode has higher measurement error, 312 00:38:05,320 --> 00:38:09,670 then the risk, well, the probability of having a different conclusion 313 00:38:09,670 --> 00:38:14,550 than the original paper just goes up by virtue of the measurement error. 314 00:38:14,550 --> 00:38:25,210 Well, or you could think about the fact that you replicate in a much more noisy environment as being a conservative replication test. 315 00:38:25,210 --> 00:38:33,560 Right? I mean, that's what I would argue. I mean, I would definitely argue that. 316 00:38:33,560 --> 00:38:34,550 Thank you. Yeah, 317 00:38:34,550 --> 00:38:45,400 I was also wondering about the within-mode variation and how you compare that, because I guess the lab has a bigger variation compared to, like, MTurk. 318 00:38:45,400 --> 00:38:54,310 So, in other words, is the variation within the lab mode, right? 319 00:38:54,310 --> 00:39:00,610 Much higher than the variation within the. 320 00:39:00,610 --> 00:39:04,660 Or much higher than the variation across modes, right?
321 00:39:04,660 --> 00:39:08,480 Yes, you could easily do that with these data, right? 322 00:39:08,480 --> 00:39:15,710 You know, easily, right? You could compare the variation across modes easily. 323 00:39:15,710 --> 00:39:24,960 Right. I didn't do that, although I'll show you something in a minute that I think goes in that direction. 324 00:39:24,960 --> 00:39:29,830 OK, so let me. No, that's not the direction. 325 00:39:29,830 --> 00:39:38,130 Well, maybe this. OK, so the thing is, you see this. 326 00:39:38,130 --> 00:39:42,300 You see these differences across modes, then the question is, well, is there something, 327 00:39:42,300 --> 00:39:50,730 can you establish what it is about the modes that might be contributing to this measurement error? 328 00:39:50,730 --> 00:39:54,960 And so that was the next step in this exercise that we went through. 329 00:39:54,960 --> 00:39:59,550 We then said, well, maybe there's something wrong with MTurk, right? 330 00:39:59,550 --> 00:40:04,690 Maybe there's something about MTurk experiments that creates measurement error. 331 00:40:04,690 --> 00:40:11,970 All right. So that was the next stage. OK, so all we did here was say, OK, people made these decisions 10 times. 332 00:40:11,970 --> 00:40:16,620 One possibility is the MTurk people aren't paying attention. 333 00:40:16,620 --> 00:40:20,580 They're just clicking through like this, which is what a lot of people suggest. 334 00:40:20,580 --> 00:40:35,660 All right. So here what we did is we simply looked at their performance on the real effort task to see whether the 335 00:40:35,660 --> 00:40:41,690 intra-class correlations within those 10 rounds were high. 336 00:40:41,690 --> 00:40:46,580 In other words, if someone did well on the first round, did they do well on the 10th?
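The stability check just described can be sketched as a one-way intra-class correlation over each subject's rounds. This is an illustrative sketch, not necessarily the estimator used in the paper; the function names and the scores are hypothetical, and it assumes every subject completed the same number of rounds. The second helper applies the standard design-effect formula, which shows why 5,000 correlated decisions from 10 rounds per person may carry closer to 500 observations' worth of information.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way ICC(1) for per-subject lists of round scores (equal lengths)."""
    n, k = len(scores), len(scores[0])
    subject_means = [mean(rounds) for rounds in scores]
    grand_mean = mean(subject_means)
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for rounds, m in zip(scores, subject_means) for x in rounds
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def effective_n(total_obs, per_subject, icc):
    """Effective sample size under the design effect 1 + (m - 1) * ICC."""
    return total_obs / (1 + (per_subject - 1) * icc)


# Hypothetical real-effort scores: three subjects, three rounds, perfectly stable.
stable = [[5, 5, 5], [8, 8, 8], [2, 2, 2]]
print(icc_oneway(stable))
print(effective_n(5000, 10, icc_oneway(stable)))
```

An ICC near 1 in every mode would indicate that subjects behave consistently across rounds regardless of mode, which is the pattern the talk reports.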
337 00:40:46,580 --> 00:40:50,360 Right. So this was a measure of stability, right? 338 00:40:50,360 --> 00:40:57,530 And basically people behave pretty stably across the 10 rounds, no matter what the mode was. 339 00:40:57,530 --> 00:41:02,900 So that clearly didn't seem to be contributing to the problem. 340 00:41:02,900 --> 00:41:06,890 So we pushed on this because we wanted to see what was going on. 341 00:41:06,890 --> 00:41:16,680 And we then explored this notion, well, maybe there's an age issue, right? 342 00:41:16,680 --> 00:41:27,570 In other words, because if you look, the MTurk and the CESS online subjects are older. 343 00:41:27,570 --> 00:41:32,160 Right? And they clearly report higher levels of their income. 344 00:41:32,160 --> 00:41:33,000 Right. 345 00:41:33,000 --> 00:41:45,420 And the UK subject pool, the Oxford subject pool, clearly is younger, and they're much more likely to lie. 346 00:41:45,420 --> 00:41:53,500 Right. And so we explored this. We introduced some controls and it doesn't seem to be age, right? 347 00:41:53,500 --> 00:41:58,920 So if you control for age within modes, right? 348 00:41:58,920 --> 00:42:04,410 That doesn't seem to be driving the difference. Right. 349 00:42:04,410 --> 00:42:08,270 So we rejected age as being 350 00:42:08,270 --> 00:42:16,200 this other socioeconomic characteristic that might have been explaining the difference between the modes. 351 00:42:16,200 --> 00:42:23,160 But we were still concerned about this MTurk effect as a possible explanation for measurement error. 352 00:42:23,160 --> 00:42:28,080 So then what we did, and this was thanks to Roberta, actually. 353 00:42:28,080 --> 00:42:38,790 So then what we did is, OK, oh, well, let's. We were at the time when I was sort of working on this paper.
354 00:42:38,790 --> 00:42:50,580 We were doing these online experiments in India with MTurk and with our CESS online subject pool in India. 355 00:42:50,580 --> 00:42:59,340 And I thought, well, maybe we can explore this notion of inattention more explicitly with 356 00:42:59,340 --> 00:43:07,900 some additional experimental work, right? And so what we did is we did two things, to summarise this. 357 00:43:07,900 --> 00:43:12,300 We did. One thing we explored was, well, 358 00:43:12,300 --> 00:43:24,480 what happens if we explicitly introduce measurement error into the experiment and compare people who 359 00:43:24,480 --> 00:43:37,530 are subjected to the same decision-making experiment, except with additional explicitly added measurement error? 360 00:43:37,530 --> 00:43:44,130 Right. And our conjecture here was, well, for the MTurk respondents 361 00:43:44,130 --> 00:43:50,580 it shouldn't matter, because they're already inattentive and there's already lots of measurement error in their decision making. 362 00:43:50,580 --> 00:43:55,100 And so that's the top part of this table. And effectively, that's true, right? 363 00:43:55,100 --> 00:44:03,390 So in this experiment that I won't go into detail on, we didn't get a significant effect for the MTurkers, 364 00:44:03,390 --> 00:44:07,710 but we did get a significant effect for the CESS online subjects. All right. 365 00:44:07,710 --> 00:44:18,190 And it turns out that introducing measurement error for the online subject pool did affect the decisions. 366 00:44:18,190 --> 00:44:25,290 Right? But it had little effect on the MTurk respondents. 367 00:44:25,290 --> 00:44:36,090 And that was some initial evidence to us that the MTurk respondents, unlike the online respondents, were not paying much attention.
368 00:44:36,090 --> 00:44:42,240 So then we decided, OK, let's explore this further. 369 00:44:42,240 --> 00:44:46,410 And we then explored the effect of incentives. 370 00:44:46,410 --> 00:44:53,220 What happens if we have these MTurk respondents make the same kinds of decisions, 371 00:44:53,220 --> 00:45:02,130 except in this one we explicitly incentivise their decisions so that they understand that if they click through or are inattentive, 372 00:45:02,130 --> 00:45:10,080 then they'll be forgoing some income. And here we get exactly the effect we expect, right? 373 00:45:10,080 --> 00:45:18,330 Once we introduced incentives and made it clear to the MTurk respondents that if they didn't pay attention, they would be forgoing income, 374 00:45:18,330 --> 00:45:31,890 we got a significant effect. Right. So these various levels of exploration led us to conclude that it seems plausible that these effects 375 00:45:31,890 --> 00:45:42,040 we're seeing here are the result of inattention on the part of the participants in the experiment. 376 00:45:42,040 --> 00:45:49,230 The takeaway from this, though, is simply: if you're doing experiments with subjects, 377 00:45:49,230 --> 00:45:55,050 think about doing them in multiple modes, different modes, right? 378 00:45:55,050 --> 00:46:00,330 Think about analysing the data with machine learning, because here 379 00:46:00,330 --> 00:46:06,900 we're not imposing any kind of structure on the conditional average treatment effects, 380 00:46:06,900 --> 00:46:11,460 we're basically saying let's see which is significant, right? 381 00:46:11,460 --> 00:46:16,080 And then let's visually look at the conditional average treatment effects 382 00:46:16,080 --> 00:46:20,880 and then see whether it's plausible that the result is robust to different modes, 383 00:46:20,880 --> 00:46:27,060 right?
And then once you find some patterns in the data that suggest that there may be mode effects, 384 00:46:27,060 --> 00:46:35,490 then spend some time trying to figure out, right, what is it that might plausibly be contributing to the measurement error? 385 00:46:35,490 --> 00:46:43,860 Yes. And this is really cool, right, but I'm a bit confused. 386 00:46:43,860 --> 00:46:55,890 What is the aim exactly? Is it to say that we are certain that the effect size of a certain study is true in some sort of sense? 387 00:46:55,890 --> 00:47:03,400 And so you're doing multiple replications across different modes to see if that initial one was correct. 388 00:47:03,400 --> 00:47:14,520 Is that the. That's the ultimate. So the ultimate aim, basically, in an ideal world, right, is that, you know, 389 00:47:14,520 --> 00:47:23,730 the conditional average treatment effects in these different modes that you estimate, right, are all very close to this blue line. 390 00:47:23,730 --> 00:47:34,770 Right. And regardless of what the mode is, right, or if they aren't, that there's no systematic pattern here. 391 00:47:34,770 --> 00:47:38,310 Right, from one mode to the next, right? 392 00:47:38,310 --> 00:47:45,570 But does this rely on you getting some sort of initial effect size from a single study? 393 00:47:45,570 --> 00:47:49,140 Well, this is the problem. I mean, this is the, right. 394 00:47:49,140 --> 00:47:53,190 This is the broader issue, right? You do an experiment, OK? 395 00:47:53,190 --> 00:47:56,700 And you get, you know, and it gets published in the American Economic Review. 396 00:47:56,700 --> 00:48:00,900 And it's only based on MTurk subjects, which we see a lot of, right? 397 00:48:00,900 --> 00:48:06,060 Then the question is, OK. Right? I mean, I'll stop there. 398 00:48:06,060 --> 00:48:10,230 Right? Why not? Right. I've got a significant condition.
399 00:48:10,230 --> 00:48:17,160 I've got a significant treatment effect. My point is you shouldn't stop there, right? 400 00:48:17,160 --> 00:48:25,230 You should at least explore the robustness of this treatment effect in these very different modes. 401 00:48:25,230 --> 00:48:31,470 So I fully agree that people should be exploring and, you know, not presenting the results from one study. 402 00:48:31,470 --> 00:48:40,200 But would it not be better from the get-go? The problem with some of this is that people are finding significance in studies that, 403 00:48:40,200 --> 00:48:47,910 given a certain effect size, are underpowered, for example, or that are doing certain activities like HARKing. 404 00:48:47,910 --> 00:48:54,290 So they're not registering hypotheses and therefore finding these significant results, and the effect size is actually not interesting whatsoever. 405 00:48:54,290 --> 00:49:05,790 I know this is a slightly distinct issue, but if we did that more instead of replicating ad infinitum, that might actually be a better strategy. 406 00:49:05,790 --> 00:49:11,430 And I think it's a very good strategy. 407 00:49:11,430 --> 00:49:19,180 I think it's a necessary strategy, right? Preregistration, you know, tying your hands 408 00:49:19,180 --> 00:49:28,710 so there's no HARKing, having properly powered experiments, all of this is very important, right? 409 00:49:28,710 --> 00:49:37,440 But the bottom line is you could have a pre-registered study that's highly powered on MTurk that looks like this. 410 00:49:37,440 --> 00:49:40,630 That's my only point. I mean, I'm not suggesting that you shouldn't do all these things. 411 00:49:40,630 --> 00:49:47,860 I think they're very important, but I'm simply saying you can do all of them and still be here. 412 00:49:47,860 --> 00:50:01,260 Looks like you. So I think it's fascinating to use the machine learning approach, but then given your research aim,
413 00:50:01,260 --> 00:50:10,670 I'm just wondering, we had this recurring discussion within the Summer Institute about the potential for shift changes at Mechanical Turk, 414 00:50:10,670 --> 00:50:15,080 so this would be within-mode measurement error in your framework. 415 00:50:15,080 --> 00:50:20,060 So when you go on Monday, it's a completely different sample than when you go on Wednesday. 416 00:50:20,060 --> 00:50:25,370 And so therefore it's very difficult to disentangle whether between-mode 417 00:50:25,370 --> 00:50:30,320 measurement error and effects within modes are dependent on the within-mode measurement error. 418 00:50:30,320 --> 00:50:32,250 See what I mean, like situation, seasonality, 419 00:50:32,250 --> 00:50:43,250 etc. So I wonder how you, obviously the thing to do would be to do multiple within modes as well and look at the measurement error. 420 00:50:43,250 --> 00:50:47,660 But what is your intuition on that? It kind of feels that this is a pretty important component. 421 00:50:47,660 --> 00:50:53,020 Right. So your basic argument is that time is a confounding variable here. 422 00:50:53,020 --> 00:50:59,570 Yes. Yes, it is true that you did these other modes, but you did them a month later. 423 00:50:59,570 --> 00:51:14,750 Yes. I mean, I'm not sure that I could escape that problem here, because it would mean anticipating the replication strategy, 424 00:51:14,750 --> 00:51:19,220 anticipating the micro replication strategy and then doing it simultaneously. 425 00:51:19,220 --> 00:51:25,670 Yes, you're right. I mean, there's always the possibility. Now, 426 00:51:25,670 --> 00:51:29,750 some things are more likely to be confounded than others. 427 00:51:29,750 --> 00:51:40,430 I mean, this study is interested in cheating and lying. It's not clear to me that this particular outcome would be susceptible to this.
428 00:51:40,430 --> 00:51:49,100 But you're right, if it were political, right, and you were talking about, I don't know, some sort of framing experiment where you're interested in 429 00:51:49,100 --> 00:52:00,590 whether people respond to some, you know, framing treatment, right, in a political campaign, then this would not be appropriate. 430 00:52:00,590 --> 00:52:07,370 I would say there's a fundamental difference between Nuffield's fifty thousand and Mechanical Turk, which is a marketplace, right? 431 00:52:07,370 --> 00:52:11,450 So there's demand and supply, if you see what I mean. 432 00:52:11,450 --> 00:52:19,400 So I. Yes. So what is the difference? Well, I would say that the marketplace has a lot more potential for within-mode 433 00:52:19,400 --> 00:52:26,240 temporal variation, which is not necessarily in the classical treatment sense that it's before or after an election, 434 00:52:26,240 --> 00:52:30,470 but just 2 a.m., 4 a.m., 6 a.m., etc. Oh, I see, I see. 435 00:52:30,470 --> 00:52:40,170 I see. Because, right, the time of day. 436 00:52:40,170 --> 00:52:51,880 So I'm just trying to think of why that would be. So that's a lot of the people in some of as a way to support us for our best to involved. 437 00:52:51,880 --> 00:52:56,290 So therefore, you would have a different subset of the population. 438 00:52:56,290 --> 00:53:05,200 Oh, I don't doubt that. I mean, I don't exactly know what is generating this. 439 00:53:05,200 --> 00:53:10,960 Nor do I know exactly why MTurkers are less attentive, possibly, than CESS online. 440 00:53:10,960 --> 00:53:17,170 I mean. Well, I do have. I do have a. And that's what motivated this 441 00:53:17,170 --> 00:53:18,670 idea here. 442 00:53:18,670 --> 00:53:35,920 So the CESS online subject pool is non-deception, is highly paid, highly compensated, that's the word, we have very strict ethical rules, right?
443 00:53:35,920 --> 00:53:38,440 It's very infrequent, right? Someone. 444 00:53:38,440 --> 00:53:48,910 I mean, it's so unlike MTurk, where they're professional online crowdsource workers, they're not being paid much, right? 445 00:53:48,910 --> 00:53:53,870 They may be in some click farm in Venezuela, but we don't know, right? 446 00:53:53,870 --> 00:53:59,230 They could be bots, right? So all of that is an issue, right? 447 00:53:59,230 --> 00:54:05,860 Just to further respond to that: if you use the platform, there are ways you can screen for that. 448 00:54:05,860 --> 00:54:08,440 So you can make sure that, for example, you get from Pakistan, 449 00:54:08,440 --> 00:54:13,930 let's say, people from every hour, like you have a quota for every hour and then you 450 00:54:13,930 --> 00:54:18,130 stop collecting, and you can also enhance that with instruments like the ones Ray and I use, 451 00:54:18,130 --> 00:54:22,360 like Qualtrics and stuff, where you put extra, for example, 452 00:54:22,360 --> 00:54:27,730 quotas in, like people click through and then we say no thanks, we already have people like you. 453 00:54:27,730 --> 00:54:34,810 And then the quota opens back up. So it's a question of using MTurk as just a way to reach people you want, 454 00:54:34,810 --> 00:54:39,110 and then you just have to be very, very careful about the constraints you put on the people you want. 455 00:54:39,110 --> 00:54:48,510 So the only them they click through, that's the only thing that I would like to think of it as a population will be flexible enough. 456 00:54:48,510 --> 00:54:53,100 I mean, because the boundaries are undefined. 457 00:54:53,100 --> 00:55:00,840 Whereas for all other modes, it seems they are, I'm not sure that I don't know, most especially will not be one of the boundaries. 458 00:55:00,840 --> 00:55:09,190 You know this? Yes. On the basis of feel like I no.
459 00:55:09,190 --> 00:55:14,920 The boundaries of the extra population and even set aside, even if such, et cetera, 460 00:55:14,920 --> 00:55:21,930 we don't know at any point in time how many people from Pakistan are in the big. 461 00:55:21,930 --> 00:55:26,340 So it will be in flux, or I don't know how you see it. Yeah, I'm not sure I would call it. 462 00:55:26,340 --> 00:55:37,750 That's interesting. I mean, yes, it's. Yes, I mean, I'm not quite sure what you mean by in flux. 463 00:55:37,750 --> 00:55:41,410 To be honest, every, every so often. 464 00:55:41,410 --> 00:55:48,080 So you always know your list of participants that you know the broader thousand. 465 00:55:48,080 --> 00:55:59,240 So the Nuffield participants aren't unlike MTurk in one respect, so there's, like in India, there's like 15 or 20 thousand, right. 466 00:55:59,240 --> 00:56:02,780 They don't all participate in the experiment. We send out an invitation. 467 00:56:02,780 --> 00:56:08,450 We don't, you know, some percentage of them will be interested and sign up. 468 00:56:08,450 --> 00:56:15,200 I'm not sure how that's dramatically different. I mean, I think the subject pools are different, but I don't know how, in terms of flux, 469 00:56:15,200 --> 00:56:23,410 that's dramatically different from publishing a HIT and waiting for some MTurk person to say, oh, you know, I'll respond. 470 00:56:23,410 --> 00:56:29,140 So I'd say the most important potential confounder will be competition on the marketplace. 471 00:56:29,140 --> 00:56:34,060 So for example, if. Yes, you're right, we're not competing with anybody else. 472 00:56:34,060 --> 00:56:41,710 And you're right, an MTurk crowd worker basically is looking at all of the possible HITs and making a decision. 473 00:56:41,710 --> 00:56:57,060 You're right that that is quite different.
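The hourly quota screening mentioned a little earlier in this exchange could look roughly like the following. It is a hypothetical stand-alone sketch, not the Qualtrics or Mechanical Turk quota feature; the class name, the per-hour limit, and the assumption that each clock hour has its own independent quota are all illustrative.

```python
from collections import Counter


class HourlyQuota:
    """Admit at most `per_hour` respondents for each hour of the day (0-23)."""

    def __init__(self, per_hour):
        self.per_hour = per_hour
        self.accepted = Counter()  # hour -> respondents admitted so far

    def admit(self, hour):
        """Return True and count the respondent unless that hour is already full."""
        if self.accepted[hour] >= self.per_hour:
            return False  # "no thanks, we already have people like you"
        self.accepted[hour] += 1
        return True


# Hypothetical arrivals: three people at 02:00, one at 03:00, quota of two per hour.
gate = HourlyQuota(per_hour=2)
decisions = [gate.admit(hour) for hour in (2, 2, 2, 3)]
print(decisions)  # [True, True, False, True]
```

Spreading admissions across hours in this way is one possible guard against the time-of-day composition shifts raised in the question, although it does not address competition from other tasks on the marketplace.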
So I was wondering if you found, and I think maybe you mentioned previous studies found this as well, 474 00:56:57,060 --> 00:57:04,400 that attention is very important in maybe explaining the difference between lab experiments and online experiments. 475 00:57:04,400 --> 00:57:11,000 And do you maybe know of any study that has tried short attention tests during 476 00:57:11,000 --> 00:57:16,010 an experiment that sort of controls for that, like to have a weight? 477 00:57:16,010 --> 00:57:27,950 I'm sure people have. So there is a woman in the US who has looked at this pretty carefully, Sunshine Hillygus at Duke, 478 00:57:27,950 --> 00:57:37,880 and she has this whole research agenda explicitly looking at MTurk, Lucid respondents. 479 00:57:37,880 --> 00:57:52,520 And not just them, also looking at people who just answer SSI or responding and explicitly looking at, 480 00:57:52,520 --> 00:57:56,870 you know, this whole phenomenon of inattention, click-throughs. 481 00:57:56,870 --> 00:58:04,790 And she has a really interesting paper where she tries to explore the relationship between that behaviour, right? 482 00:58:04,790 --> 00:58:08,300 And subject characteristics, right? 483 00:58:08,300 --> 00:58:12,740 And you know, some people, for example, say, oh, well, you know, we'll just get rid of these people, right? 484 00:58:12,740 --> 00:58:17,150 But her argument is that's not what you want to do, right? 485 00:58:17,150 --> 00:58:28,650 Because these are an interesting population, right? So she has a very clever analysis which is worth looking at. 486 00:58:28,650 --> 00:58:32,790 Thanks. Thanks very much.
I'm not an experiment person, I'm a demographer, 487 00:58:32,790 --> 00:58:42,240 and I keep thinking whether this mode effect that you have, or what you call a mode effect, might actually be some sort of, you know, 488 00:58:42,240 --> 00:58:50,190 difference in the composition of the people in the different modes, and whether there would be a way, or maybe this is not something you do, 489 00:58:50,190 --> 00:58:53,160 but whether there would be a way to think. 490 00:58:53,160 --> 00:58:59,310 For example, if you think that there might be educational differences in the way people respond in such an exercise, 491 00:58:59,310 --> 00:59:04,110 then you could potentially have a highly educated pool of people for each of the four 492 00:59:04,110 --> 00:59:09,090 modes and the low educated for each mode. Or what you could maybe also do, 493 00:59:09,090 --> 00:59:10,590 I don't know if this is something that you do, 494 00:59:10,590 --> 00:59:16,980 but you could take your Oxford people and take the same people and make them play the game in the lab and 495 00:59:16,980 --> 00:59:22,920 make them play the game online and see, the same people, if they do something different, because to me 496 00:59:22,920 --> 00:59:29,100 that would be a mode effect. So when I'm in the lab, I would be nice, and when I'm at home and nobody sees what I'm doing, 497 00:59:29,100 --> 00:59:34,230 then maybe I won't declare my income at the same rate. That's a very good point. 498 00:59:34,230 --> 00:59:42,990 So there is a paper out there by a woman from UCLA, Lynn Vavreck, where she actually randomly assigns people. 499 00:59:42,990 --> 00:59:47,670 I mean, she recruits people and then she randomly assigns them to mode. 500 00:59:47,670 --> 00:59:53,160 She randomly assigns them to online, randomly assigns them to, I think, in person. 501 00:59:53,160 --> 01:00:02,490 And she compares, it's a recent paper. So she does explicitly this, and her result.
502 01:00:02,490 --> 01:00:10,030 There are some areas in which there are differences, clearly, but I forget the. But that is the optimal design. 503 01:00:10,030 --> 01:00:20,340 And if we could have done it, it would have been good to randomly select the UK CESS 504 01:00:20,340 --> 01:00:24,870 subjects and then randomly assign them, because obviously they're not randomly assigned, 505 01:00:24,870 --> 01:00:32,690 they get to select whether they want to do it in the lab or online, and so there may be selection there. 506 01:00:32,690 --> 01:00:36,980 The other thing about this, I'm just thinking about Sunshine's papers. 507 01:00:36,980 --> 01:00:49,260 The other thing she looks at, which is really cool, is there's this big phenomenon online where people give 508 01:00:49,260 --> 01:01:00,720 ridiculous responses. And this is becoming a big problem, and so people, particularly younger people who are in a social media sort of, 509 01:01:00,720 --> 01:01:06,960 you know, been on social media all their lives, right? They tend to respond to these. 510 01:01:06,960 --> 01:01:11,250 They just give, you know, absurd responses. And this is a big problem 511 01:01:11,250 --> 01:01:12,600 in these experiments, right? 512 01:01:12,600 --> 01:01:19,080 Because, you know, you see these people that are really giving extreme responses to the question, well, 513 01:01:19,080 --> 01:01:22,800 is this true or false, right? How do you deal with it, right? 514 01:01:22,800 --> 01:01:27,390 And again, she looks at the idea that people often try to just drop them, right? 515 01:01:27,390 --> 01:01:29,280 But that might not be the optimal strategy.
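The design described above as optimal, recruiting one pool and then randomly assigning subjects to mode rather than letting them choose, can be sketched as follows. The function name, the mode labels, and the seed are illustrative assumptions, not details taken from either study.

```python
import random


def assign_modes(subject_ids, modes, seed=0):
    """Randomly assign each subject to one mode, keeping group sizes balanced."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the modes in turn, like dealing cards.
    return {sid: modes[i % len(modes)] for i, sid in enumerate(shuffled)}


# Hypothetical pool of eight recruited subjects and two modes.
assignment = assign_modes(range(8), ["lab", "online"], seed=1)
for subject, mode in sorted(assignment.items()):
    print(subject, mode)
```

Because the same pool feeds every mode, any remaining difference in treatment effects is harder to attribute to who selected into the lab versus online.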
516 01:01:29,280 --> 01:01:44,760 So I agree. I don't think I've solved this problem, but the idea is to give you a flavour of how to think about both: on the design side, 517 01:01:44,760 --> 01:01:51,450 think of multiple modes, and then on the analytic side, think about the machine learning to sort of tease out the, 518 01:01:51,450 --> 01:01:56,800 uh, the mode effects and establish the robustness of your treatment effects. 519 01:01:56,800 --> 01:02:03,270 I think that's my punch line on that. My punch line on the other. 520 01:02:03,270 --> 01:02:11,490 Okay, so I have 17 minutes, so I'll go over this very quickly. 521 01:02:11,490 --> 01:02:15,660 So the second project that we've been involved in at CESS, 522 01:02:15,660 --> 01:02:27,810 and this is sort of an extension of something that I was working on with Roberto, is using 523 01:02:27,810 --> 01:02:36,120 sort of the virtual environment and post-stratification methods to estimate average treatment effects. 524 01:02:36,120 --> 01:02:52,290 And this is a project that is quite new that I'm working on with a Ph.D. student at King's College, Philippe. 525 01:02:52,290 --> 01:03:00,510 And here I'll just highlight some of what I think are the more interesting parts of the project we're interested in. 526 01:03:00,510 --> 01:03:10,640 So there's this big debate in political science, but probably only in political science interested in whether. 527 01:03:10,640 --> 01:03:21,890 And there's sort of a puzzle.
So when you think about democracies, you think that people should be informed about their political representatives, 528 01:03:21,890 --> 01:03:26,420 and there's been all of this literature out there, 529 01:03:26,420 --> 01:03:32,060 these experiments that have been done in which people have been given information about their politicians or the 530 01:03:32,060 --> 01:03:42,830 incumbent politicians about the fact that they're corrupt or that they're hopeless and that they're doing bad things. 531 01:03:42,830 --> 01:03:50,510 And most of these experiments have suggested that it doesn't matter. 532 01:03:50,510 --> 01:03:59,000 You can tell voters that their, you know, incumbent politician is a crook, is hopeless, 533 01:03:59,000 --> 01:04:03,470 is stealing money from you, and it doesn't affect their decision in terms of re-election. 534 01:04:03,470 --> 01:04:10,550 It doesn't affect their vote choice. Right. And of course, you know, political scientists are quite sort of disturbed by this, right? 535 01:04:10,550 --> 01:04:21,330 This has got them agitated. And so this is an experiment in that vein, although we think we're doing it more rigorously than most people have done it. 536 01:04:21,330 --> 01:04:28,790 So what we've done is, so we have this lab in Chile and we have a collaboration 537 01:04:28,790 --> 01:04:35,630 agreement with the ministry in Chile that is responsible for auditing municipal politicians. 538 01:04:35,630 --> 01:04:38,960 So they audit the politicians. 539 01:04:38,960 --> 01:04:53,180 And we've convinced them to cooperate with us and to randomly assign these audits to different municipalities in Chile. 540 01:04:53,180 --> 01:05:04,790 And so that's an opportunity for us to explore whether the results of these audits affect voter decisions. 541 01:05:04,790 --> 01:05:09,680 And so we're just starting this, and the election's in 2020.
542 01:05:09,680 --> 01:05:17,930 So this is a unique opportunity for us to randomly assign audits to the municipalities, 543 01:05:17,930 --> 01:05:23,660 inform the voters about the results of these audits and see whether it affects their vote decision. 544 01:05:23,660 --> 01:05:29,420 So that's the sort of flavour of this. And so this is Chile. 545 01:05:29,420 --> 01:05:32,600 There are three hundred and forty five municipalities in Chile. 546 01:05:32,600 --> 01:05:37,250 It's a weird country, like it's sort of like a, I don't know, like a worm or something. 547 01:05:37,250 --> 01:05:41,930 And everything is sort of distributed along here. 548 01:05:41,930 --> 01:05:54,560 And the idea is that we're going to randomly assign audits to 40 of these, no, 549 01:05:54,560 --> 01:06:09,260 to 60 of these municipalities. So the basic design that we think is vaguely interesting is that we're going to do a pre-treatment survey. 550 01:06:09,260 --> 01:06:16,090 And the sample is going to be 60 municipalities and two hundred and fifty. 551 01:06:16,090 --> 01:06:23,710 So small counties, I call them zip codes, but so the zip code things are within the 60 municipalities. 552 01:06:23,710 --> 01:06:31,840 And we're going to basically target 6500 people in the pre-treatment survey. 553 01:06:31,840 --> 01:06:44,680 And we're going to use Facebook Ads Manager to basically organise the pre-treatment survey. 554 01:06:44,680 --> 01:06:49,600 So we're going to identify. 555 01:06:49,600 --> 01:06:51,370 So we get the basic. 556 01:06:51,370 --> 01:07:00,430 We have 250 of these sort of zip code things and we're going to identify people in these 250 zip code things using Facebook Ads Manager, right? 557 01:07:00,430 --> 01:07:05,950 And then we're going to recruit them into the experiment.
558 01:07:05,950 --> 01:07:10,900 And then we're going to conduct a pre-treatment survey which will give us covariates 559 01:07:10,900 --> 01:07:17,460 and will allow us to get a sense of what their vote preferences are for the municipal gov. 560 01:07:17,460 --> 01:07:28,000 Right? And so the treatments then are going to be random audits of 30 of these municipalities that the ministry will do, 561 01:07:28,000 --> 01:07:37,120 so 30 of the municipalities will be in the audit treatment and then 30 will be in the control treatment. 562 01:07:37,120 --> 01:07:43,000 And then the information treatment is the thing we're going to be doing. 563 01:07:43,000 --> 01:07:51,670 For some of these 250 zip code things, 564 01:07:51,670 --> 01:08:04,220 we're going to have an information treatment. We're going to inform everybody, hopefully, in that zip code of the result of the audit. 565 01:08:04,220 --> 01:08:09,070 All right. And of course, there will be people in the control zip codes who will get no information. 566 01:08:09,070 --> 01:08:17,150 That's the sort of broad idea. Then, after there is this treatment. 567 01:08:17,150 --> 01:08:23,720 So there's two levels of treatment: there's an audit, which will be one level, at the municipal level, and then there will be 568 01:08:23,720 --> 01:08:31,880 information treatments within the audited municipalities. 569 01:08:31,880 --> 01:08:38,660 Then there'll be a post-treatment survey in which we go back and re-interview those 6500 570 01:08:38,660 --> 01:08:48,110 people, in principle, and then we'll get their vote preferences in the post-treatment period. 571 01:08:48,110 --> 01:08:57,350 And then the final part of this is to think about post-stratification: how do we estimate those? 572 01:08:57,350 --> 01:09:05,720 So we'll be able to estimate treatment effects for these 250 villas, villas or these municipalities.
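The two-level assignment described above (audits randomised across municipalities, information randomised across zip codes) can be sketched roughly as follows. This is only an illustration under stated assumptions: the identifiers, the round-robin mapping of zip codes to municipalities, and the 50 percent `informed_share` are all hypothetical, since the talk says only that "some" zip codes receive information.

```python
import random

def assign_two_level(n_municipalities=60, n_audited=30, n_zip_codes=250,
                     informed_share=0.5, seed=0):
    """Randomly assign audits to municipalities, then information to zip codes.

    informed_share is an illustrative assumption, not a figure from the talk.
    """
    rng = random.Random(seed)

    # Level 1: the ministry audits a random half of the sampled municipalities.
    audited = set(rng.sample(range(n_municipalities), n_audited))

    # Hypothetical mapping: spread zip codes over municipalities round-robin.
    zip_to_municipality = {z: z % n_municipalities for z in range(n_zip_codes)}

    # Level 2: each zip code is independently assigned to information or control.
    assignment = {}
    for zip_code, municipality in zip_to_municipality.items():
        assignment[zip_code] = {
            "municipality": municipality,
            "audit": municipality in audited,
            "information": rng.random() < informed_share,
        }
    return audited, assignment

audited, assignment = assign_two_level()
```

Fixing the seed keeps the assignment reproducible, which matters if the same draw has to be shared with a partner such as the auditing ministry.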
573 01:09:05,720 --> 01:09:14,360 But then the question is how do we post-stratify those estimated average treatment effects to the entire population, because we want to know, 574 01:09:14,360 --> 01:09:23,990 you know, the politicians and we want to know what the treatment effect is for the entire country. 575 01:09:23,990 --> 01:09:29,330 Right? Potential treatment effect, because that's what they're interested in, right? Does it make sense? 576 01:09:29,330 --> 01:09:33,590 I mean, they're interested in whether these audits have any effect for the nation. 577 01:09:33,590 --> 01:09:38,000 Right. And that's where the post-stratification comes in, which is quite important. 578 01:09:38,000 --> 01:09:44,290 And this is the part that most people don't do, right? This is the part that most experimentalists don't think about. 579 01:09:44,290 --> 01:09:48,470 They estimate a treatment effect. And that's it. All right. 580 01:09:48,470 --> 01:09:56,540 And we're sort of hoping or thinking that this might partially contribute to the fact that people don't get really 581 01:09:56,540 --> 01:10:05,290 good estimates of what the information treatment effects look like. OK, so the Facebooks, this is all. 582 01:10:05,290 --> 01:10:15,040 So we're going to use the Ads Manager, which we've used before in Chile, which will allow us to basically identify the particular one of these, 583 01:10:15,040 --> 01:10:23,710 which of the two hundred and fifty sort of county or zip codes we want to target for recruitment into the study. 584 01:10:23,710 --> 01:10:30,370 So we'll do this for two hundred and fifty zip codes, right, within those 60 municipalities. 585 01:10:30,370 --> 01:10:35,280 And we'll recruit people using Facebook ads, right, which we've done before. 586 01:10:35,280 --> 01:10:40,330 Right. So we'll recruit them into the study. And this is a highlight for us.
587 01:10:40,330 --> 01:10:44,720 This has been a very effective way of getting people to participate in these virtual experiments. 588 01:10:44,720 --> 01:10:50,620 It's a bit expensive because you obviously have to pay Facebook, but it's pretty. 589 01:10:50,620 --> 01:10:59,470 It's it's pretty useful and it allows us to target particular treated and untreated segments of the experiment. 590 01:10:59,470 --> 01:11:06,520 So we'll do that sort of allow us to target subjects in specific counties or zip codes. 591 01:11:06,520 --> 01:11:15,310 We will use banner ads to get people to come into the to come into the experiment. 592 01:11:15,310 --> 01:11:27,060 And ideally, we'll do this with 6500 people. That's just the dashboards you need to see about this. 593 01:11:27,060 --> 01:11:33,240 So just just to be clear, the treatments will be we'll actually have two levels of treatments. 594 01:11:33,240 --> 01:11:40,080 One will be people will see the results of an audit of two by two by two factorial design. 595 01:11:40,080 --> 01:11:44,100 There's two information treatments. One is the audit result, right? 596 01:11:44,100 --> 01:11:51,420 And the other is what we call a report card, which is just a report on the performance of that municipal mayor. 597 01:11:51,420 --> 01:11:55,950 And so some people will see just an audit. Some people will see just a report. 598 01:11:55,950 --> 01:12:00,740 Some people will see a report and an audit. 599 01:12:00,740 --> 01:12:08,530 And then there's a control condition where they where they get where there's no audits and no information just. 600 01:12:08,530 --> 01:12:15,270 And so this is this roughly summarises the. So there's you can this is the way we. 601 01:12:15,270 --> 01:12:18,240 It's not worth getting into, really, but the. 
602 01:12:18,240 --> 01:12:26,130 So there will be treatments, all of these are the treatments that are administered by the ministry right at the at the municipal level, 603 01:12:26,130 --> 01:12:35,430 which is an audit or a no audit. And then there will be information treatments that are administered at the zip code level. 604 01:12:35,430 --> 01:12:43,740 Right. So within all these audit, no audit treatments administered by the ministry, 605 01:12:43,740 --> 01:12:56,130 there will be information treatments at the sort of zip code level that reflect the sort of either the report or audit or report and audit or control. 606 01:12:56,130 --> 01:13:02,610 That's sort of clear. So the information treatments. 607 01:13:02,610 --> 01:13:07,440 So this is sort of this is the kind of information you would they would see, right? 608 01:13:07,440 --> 01:13:23,810 You know, they're, you know, this municipal mayor or, you know, six years in prison for defrauding. 609 01:13:23,810 --> 01:13:29,270 That's OK, so that's the problem with the design, to be honest. 610 01:13:29,270 --> 01:13:39,750 So. There are multiple levels of this experiment, but you're right, I mean, we are targeting a particular zip code using Facebook ads, 611 01:13:39,750 --> 01:13:45,930 so only people in that zip code are going to sort of there are going to be targeted with the information. 612 01:13:45,930 --> 01:13:53,970 But yes, there's I mean, we cannot assume that there won't be contamination or yes, I mean, 613 01:13:53,970 --> 01:14:00,270 that's I mean I but we've got to be honest, we've not figured that out entirely or how we'll measure it. 614 01:14:00,270 --> 01:14:12,200 I mean, so even if there is some. You know, spread the important issue is whether we can measure it, if we can measure it, then that's fine. 615 01:14:12,200 --> 01:14:17,880 But the possibility is that there will be this stuff going on and we won't be able to measure it. 
616 01:14:17,880 --> 01:14:24,170 So I'm not sure exactly at this point how we're going to tackle that problem. 617 01:14:24,170 --> 01:14:32,510 You're right. And that same time, and now, I mean, there's a stronger just right. 618 01:14:32,510 --> 01:14:38,400 So you can have media outlets going to one particular party. Yes. 619 01:14:38,400 --> 01:14:45,410 So yeah, so one strong confounder that I can think of is media, right? 620 01:14:45,410 --> 01:14:52,700 So if you have an information campaign which is saying in a particular villa, this candidate is good or bad or whatever, 621 01:14:52,700 --> 01:15:00,170 but then you have media sort of being biased in any way towards one particular candidate, one particular party and so on and so forth. 622 01:15:00,170 --> 01:15:08,450 It's sort of like a strong confounder. It could actually influence that person's vote over the information campaign or vice versa. 623 01:15:08,450 --> 01:15:15,480 I'm not on the topic, but I could definitely see the second sound right. 624 01:15:15,480 --> 01:15:22,960 Yes, so. So you're right. 625 01:15:22,960 --> 01:15:27,620 Uh, so I guess the problem that you're identifying, you say, OK, 626 01:15:27,620 --> 01:15:34,310 so we have one zip code that's in our information treatment and another zip code that's not in our information treatment. 627 01:15:34,310 --> 01:15:42,290 And both of these zip codes are getting a media. 628 01:15:42,290 --> 01:15:49,740 Representation. Well, that I don't think is too much of a problem because I mean, 629 01:15:49,740 --> 01:15:55,430 it won't affect our ability to distinguish our information treatment for the treated. 630 01:15:55,430 --> 01:16:00,510 Now, in the worst case scenario, it'll totally wash out any treatment effect, right? 631 01:16:00,510 --> 01:16:06,150 Which could happen in which case, yes, but it won't.
632 01:16:06,150 --> 01:16:10,080 It won't bias our estimate of the treatment effect because both 633 01:16:10,080 --> 01:16:15,390 of these units will have received in principle the same media information. 634 01:16:15,390 --> 01:16:18,810 But you're right, it could totally wash out the treatment effect. 635 01:16:18,810 --> 01:16:31,440 It's possible. Our assumption is that the media is not going to be focussed on specific municipalities, but we may be wrong. 636 01:16:31,440 --> 01:16:40,560 So one way to think about this is to simply ask the question, sort of like exposure to media, right, newspapers. 637 01:16:40,560 --> 01:16:46,050 How often do you go on Facebook, as opposed to other types of information? We could try. 638 01:16:46,050 --> 01:16:50,730 Yes. Yes. And we could do that both in the pre and the post treatment. 639 01:16:50,730 --> 01:16:56,920 Yes, that's a possibility. 640 01:16:56,920 --> 01:17:02,560 Now, the I won't even tell you the other part of these people's language is even more sort of. 641 01:17:02,560 --> 01:17:10,480 So the other thing? So here I'm just talking about the voters, right? 642 01:17:10,480 --> 01:17:17,170 The other part of the experiment, which I'm not going to talk about today, which is even more sort of really problematic, 643 01:17:17,170 --> 01:17:26,640 but maybe more interesting, is that we want to. So the problem with a lot of these political science studies is 644 01:17:26,640 --> 01:17:34,440 they look at the treatment effect on the vote, right, which is what we're doing here, which is what I'm talking about here. 645 01:17:34,440 --> 01:17:47,440 But the other thing we want to do is we want to treat these municipalities with information and see whether it affects the politician.
646 01:17:47,440 --> 01:17:56,120 Right, because that's the more interesting question. So what happens if we do an information treatment and broadcast the result of an 647 01:17:56,120 --> 01:18:04,610 audit to the electorate? Right. One of the most interesting things for us is whether that affects the politician's behaviour. 648 01:18:04,610 --> 01:18:11,780 Do they start engaging in more clientelist stuff? 649 01:18:11,780 --> 01:18:20,990 And with this audit agency, we are able to monitor their daily expenditures and where they're going. 650 01:18:20,990 --> 01:18:26,390 Right. So we'll be able to determine whether, you know, our information treatment, 651 01:18:26,390 --> 01:18:38,000 in addition to the audit result, right, affects the specific spending priorities of the municipal mayor. 652 01:18:38,000 --> 01:18:45,440 So that's the sort of interesting thing. So we'll look at their spending, but we're also, and this is the thing that Roberto inspired me on, 653 01:18:45,440 --> 01:18:54,770 we'll also look at their Facebook pages, because we've noticed that all of these municipal mayors have active Facebook pages. 654 01:18:54,770 --> 01:18:58,850 So the question is, when they're treated, either with the audit, 655 01:18:58,850 --> 01:19:06,470 the random audit and results, and the municipalities are treated with our information treatment, 656 01:19:06,470 --> 01:19:14,900 whether that has an effect on the content of their Facebook stuff. 657 01:19:14,900 --> 01:19:18,800 Right. But also whether it has an effect on digital trace, in other words. 658 01:19:18,800 --> 01:19:31,910 Now this is even more hazy. Whether we can sort of look at the people: how do people, the voters, respond to, like, a negative audit? Do 659 01:19:31,910 --> 01:19:40,940 they also go to the Facebook page of the politician, and who is going to the Facebook page of the politician, right?
660 01:19:40,940 --> 01:19:46,970 Because we will, in principle, be monitoring their Facebook pages and we'll see who goes to the Facebook page. 661 01:19:46,970 --> 01:19:50,600 And then we'll be able to say something about the voters who are 662 01:19:50,600 --> 01:19:57,230 responding both to the shock and to the content of the politician's Facebook page. 663 01:19:57,230 --> 01:20:00,860 That's the sort of design issues. 664 01:20:00,860 --> 01:20:06,800 And then I have one minute, I'll just wrap up. OK, we'll do the average treatment effects. 665 01:20:06,800 --> 01:20:15,050 That's all. So the idea is we'll do post-stratification, right, which Roberto's already talked to you about. 666 01:20:15,050 --> 01:20:18,950 But you can sort of think, well, we'll have some very not too broad. 667 01:20:18,950 --> 01:20:26,720 We'll have about thirty six cells in our post-stratification frame, right? 668 01:20:26,720 --> 01:20:31,190 Two gender, two education, three income, three age. 669 01:20:31,190 --> 01:20:39,620 And then these are the number of cells, right, generated by each of these categories, right? 670 01:20:39,620 --> 01:20:52,370 And then we'll have about thirty three individuals from the 6500, right, in each of these, in the audit information cell breakdown and in the audit 671 01:20:52,370 --> 01:21:01,020 no-information cell breakdown. So these are the cells on which we'll be estimating the post-stratification model. 672 01:21:01,020 --> 01:21:12,860 Right. And since you know all this already, we'll be estimating a pre-treatment vote choice model, right? 673 01:21:12,860 --> 01:21:19,700 Using some sort of machine learning probability estimator, right, like random forest, right? 674 01:21:19,700 --> 01:21:29,000 So we'll estimate pre-treatment vote preference and then we'll be estimating post-treatment vote preference.
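As a rough illustration of the frame just described, the sketch below builds the 2 x 2 x 3 x 3 = 36 cells and computes a population-weighted (post-stratified) average of cell-level estimates. The category labels, the cell counts, and the cell-level vote shares are all made up for illustration; the talk gives only the number of levels per variable and mentions a random-forest-style estimator, which is not reproduced here.

```python
from itertools import product

# Hypothetical category labels; the talk only gives the number of levels.
GENDER = ["female", "male"]
EDUCATION = ["lower", "higher"]
INCOME = ["low", "middle", "high"]
AGE = ["young", "middle", "older"]

def build_frame():
    """Return every cell of the post-stratification frame."""
    return list(product(GENDER, EDUCATION, INCOME, AGE))

def poststratify(cell_estimates, cell_counts):
    """Population-weighted average of cell-level estimates."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell]
                   for cell in cell_estimates)
    return weighted / total

frame = build_frame()

# Illustrative inputs only: equal cell sizes and flat incumbent vote shares.
counts = {cell: 100 for cell in frame}
pre_vote = {cell: 0.5 for cell in frame}
post_vote = {cell: 0.4 for cell in frame}

# Difference of the post-stratified post- and pre-treatment preferences.
effect = poststratify(post_vote, counts) - poststratify(pre_vote, counts)
```

In a real application the counts would come from census or register data for each cell, and the cell estimates from the fitted vote preference models rather than constants.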
675 01:21:29,000 --> 01:21:36,700 In principle, we can post-stratify that to the. 676 01:21:36,700 --> 01:21:50,070 To all of the sudden, not seeing it, no. Whatever numbers now, we'll be able to post-stratify those results to all of the cells, 677 01:21:50,070 --> 01:21:54,720 either in the nation or within a particular municipality, right? 678 01:21:54,720 --> 01:22:01,770 And then since we post-stratified the pre-treatment vote choice to all of those cells and 679 01:22:01,770 --> 01:22:08,130 since we post-stratified the post-treatment vote preference to all of those cells, 680 01:22:08,130 --> 01:22:13,080 then we can, simply by subtracting those two quantities, 681 01:22:13,080 --> 01:22:21,720 get a sense of what the average treatment effect is for this very detailed breakdown of individuals within the nation. 682 01:22:21,720 --> 01:22:26,700 So that's the sort of rough strategy. 683 01:22:26,700 --> 01:22:37,900 So that's. Might. I'm sorry, I mean, this is a very general question in relation to post-stratification in the context of experiments, 684 01:22:37,900 --> 01:22:48,250 and I'm wondering, so I understand the rationale here, to some extent, is to go from internal validity to external validity. 685 01:22:48,250 --> 01:22:56,470 But if we think about what you were talking about, modes, earlier, right, and perhaps it is less relevant specifically for this example. 686 01:22:56,470 --> 01:23:03,580 But how do we think about post-stratifying when there also might be these mode effects that might be operating? 687 01:23:03,580 --> 01:23:14,200 So, you know, can we hope so that I find it hard to get around and think about because in the specific context of experiments that could be elsewhere? 688 01:23:14,200 --> 01:23:18,700 So this is probably the criticism that we will.
689 01:23:18,700 --> 01:23:23,620 I mean, I have to deal with because of course, I'm I mean, I think people will come to us and say, OK, 690 01:23:23,620 --> 01:23:30,880 well, you know, I mean, the point of departure here is to recruit subjects right using Facebook, right? 691 01:23:30,880 --> 01:23:41,710 And that's the mode, right? And the question is, I mean, I know now from having done this in Chile that, for example, right? 692 01:23:41,710 --> 01:23:46,250 It's very difficult to get certain segments of the population to participate. 693 01:23:46,250 --> 01:23:53,060 Right. That's. Very specific to Facebook. 694 01:23:53,060 --> 01:24:00,890 Right? We know now that it's hard to get younger people, younger people don't have Facebook accounts. 695 01:24:00,890 --> 01:24:08,670 I mean, they're on other social media, right? So they'll be, you know? And it's hard to recruit that segment right into the study. 696 01:24:08,670 --> 01:24:14,060 So you're right. But then that is a limitation. 697 01:24:14,060 --> 01:24:18,860 I mean, and doing a multimode here would be optimal, but then it would be. 698 01:24:18,860 --> 01:24:22,910 Well, I mean, you're right. I mean, I could do a multi-mode here. 699 01:24:22,910 --> 01:24:32,690 I mean, there's no reason why I couldn't, for example, try to recruit people, explore the recruitment on different social media. 700 01:24:32,690 --> 01:24:43,310 And I could probably end this assuming the budget I could explore sort of non social media recruitment into the study, 701 01:24:43,310 --> 01:24:54,510 which would be optimal because clearly there are, you know, that would be now. 702 01:24:54,510 --> 01:25:09,230 The one saving grace here is that. So I think mode matters quite a bit when people are in, for example, 703 01:25:09,230 --> 01:25:20,720 in the cheating experiment where people are interacting with each other, making sensitive decisions like lying and cheating, right? 
704 01:25:20,720 --> 01:25:26,780 So I think there are some modes in which people are much more comfortable doing that and other modes in which they're less comfortable. 705 01:25:26,780 --> 01:25:35,610 So I do think that mode matters there. In this case, 706 01:25:35,610 --> 01:25:46,090 we're asking a pretty simple thing, basically vote preference, and so I would expect mode to be somewhat less important. 707 01:25:46,090 --> 01:25:53,680 I mean, just to add on that, it's also the idea that, as Ray and I have explored with the digital trace, 708 01:25:53,680 --> 01:26:01,480 you observe people, you don't necessarily have to ask. And so the idea is that even though there might be some mode effects, 709 01:26:01,480 --> 01:26:05,040 you don't have the same problems you have with MTurkers, where 710 01:26:05,040 --> 01:26:10,120 they have an incentive to look for the treatment within your questionnaire, 711 01:26:10,120 --> 01:26:15,040 whatever. Here people are just behaving as they would naturally on a politician's page, 712 01:26:15,040 --> 01:26:18,280 and we can see on the politician's page whether they leave a positive or negative review. 713 01:26:18,280 --> 01:26:26,770 In that sense, given also the potential N on Facebook compared to, like, a lab experiment or an MTurk study, 714 01:26:26,770 --> 01:26:31,990 it's not dissimilar to the Xbox study, because it's not completely absurd that we would get 715 01:26:31,990 --> 01:26:37,570 hundreds of thousands of individuals commenting amongst these different municipal places. 716 01:26:37,570 --> 01:26:47,850 I think, yes, as always, there are selection effects issues, but I think less important, you don't talk for sure and are the other types of modes. 717 01:26:47,850 --> 01:26:53,360 But. No. 718 01:26:53,360 --> 01:26:57,360 OK. I mean, I would then I'll just wrap up. 719 01:26:57,360 --> 01:27:00,000 Bye bye. Bye bye.
720 01:27:00,000 --> 01:27:09,060 Elaborating on what Roberto said, I mean, so optimally, I mean, and this would be the mode thing, and it's budget related. 721 01:27:09,060 --> 01:27:16,440 I would very much like to have this unobtrusive measure where I'm just observing who 722 01:27:16,440 --> 01:27:24,000 is participating on these different web pages and then 723 01:27:24,000 --> 01:27:31,140 sort of reverse engineering from that digital activity to the individual and 724 01:27:31,140 --> 01:27:37,290 then populating these cells in terms of partisan preference and socio-economic stuff, 725 01:27:37,290 --> 01:27:40,550 if I could get it. That would be the optimal sort of outcome. 726 01:27:40,550 --> 01:27:53,430 And I think ideally, I would want to complement the simple survey strategy with these digital trace strategies, if it's possible. 727 01:27:53,430 --> 01:28:04,100 But that's again, budget. As I learnt, it can be expensive trying to sort of collect this digital trace information. 728 01:28:04,100 --> 01:28:14,990 So that question of the life of the person, the researcher, will we think that of the US in the US with accountants? 729 01:28:14,990 --> 01:28:27,910 Say that again in some reputable research, that's. 730 01:28:27,910 --> 01:28:35,170 It shouldn't be controversial. That would be my position because I have a Russian, 731 01:28:35,170 --> 01:28:43,900 a Russian co-author who has seen what Roberto and I did in the Texas election and said, Oh, we should really do this in Russia. 732 01:28:43,900 --> 01:28:49,120 I can get access to all the Russian version of the Facebook data. 733 01:28:49,120 --> 01:28:55,300 So I'm wondering, yeah, I don't know. I don't know how I would be sort of perceived, as sort of interfering with a Russian election. 734 01:28:55,300 --> 01:29:01,290 You're right, I'm sympathetic to your point.
735 01:29:01,290 --> 01:29:09,150 Just a comment on the problem with the spillovers between between the treatment monitoring that observations, I mean, might also be interesting. 736 01:29:09,150 --> 01:29:13,380 I don't know if you get some kind of connectivity between the users from Facebook. 737 01:29:13,380 --> 01:29:20,940 Oh, yeah, but I mean, otherwise, you could also look at the spatial distance and see whether, I mean, maybe the treatment effect declines with distance, 738 01:29:20,940 --> 01:29:26,190 which would be. Yes, that might be, because this is what we're thinking about. 739 01:29:26,190 --> 01:29:34,380 How can you sort of control, right? And that would be an optimal control. 740 01:29:34,380 --> 01:29:48,270 Plus, we do have the geolocation, right, so we will be able to sort of use that. 741 01:29:48,270 --> 01:29:50,730 OK, Nancy, last question. 742 01:29:50,730 --> 01:30:01,440 So just to go back to the initial part of the presentation about the reproducibility of these data studies, what exactly would your recommendation be? 743 01:30:01,440 --> 01:30:11,250 Obviously, I see that you should, as a researcher, do repetition, try to reproduce your results in different modes. 744 01:30:11,250 --> 01:30:16,390 But how should journals be checking? 745 01:30:16,390 --> 01:30:21,910 Oh, so journals, in my opinion, are checking for the wrong thing. 746 01:30:21,910 --> 01:30:26,190 Yeah, exactly right. I mean, I just met with a bunch of journal editors. 747 01:30:26,190 --> 01:30:30,840 It's ridiculous. So, yes. So the only thing they do is they want a replication file, right? 748 01:30:30,840 --> 01:30:36,300 And they believe they're going to run it. And then they sort of basically make sure that you can replicate Table six. 749 01:30:36,300 --> 01:30:39,720 Right? Well, that, in my opinion, is useless. Right.
750 01:30:39,720 --> 01:30:43,800 The problem isn't replication so much as data generation. 751 01:30:43,800 --> 01:30:48,360 Right? Because we don't know. Right. I mean, that was my sort of earlier point. 752 01:30:48,360 --> 01:30:56,880 I mean, you know, increasingly we no longer know how these data are generated, right? 753 01:30:56,880 --> 01:31:03,720 Like when I started many decades ago, right, data generation was very expensive. 754 01:31:03,720 --> 01:31:10,800 Very, you know, well-respected firms were the ones that did the data generation. 755 01:31:10,800 --> 01:31:22,350 And everybody knew that this was, you know, data generated by, you know, Michigan or Gallup, right? 756 01:31:22,350 --> 01:31:28,020 They knew that there were all these controls and time associated with the data generation. 757 01:31:28,020 --> 01:31:32,010 Now that doesn't exist, right? Nobody knows. You know, 758 01:31:32,010 --> 01:31:40,390 you see lots of stuff published in journals now by people who said, you know, I've got a thousand Turkers to do this task. 759 01:31:40,390 --> 01:31:47,880 Oh, OK. I mean, there's absolutely no control over the data generation. 760 01:31:47,880 --> 01:31:53,160 And that, I think, is the problem we're going to have to sort of deal with. 761 01:31:53,160 --> 01:31:56,550 I'm just saying that you're basically making a call. Well, 762 01:31:56,550 --> 01:32:02,670 it shows basically the difference between modes that any reproduction study done in a different mode 763 01:32:02,670 --> 01:32:08,190 showing a different result than the initial result is in itself up to something great to be expected. 764 01:32:08,190 --> 01:32:18,330 Right. So everyone should, by definition, be in the same mode. So any reproduction study should, per definition, be at least in the same mode.
765 01:32:18,330 --> 01:32:28,350 Because otherwise, no, I think, I mean, I think this table, the results of that BART analysis, right? 766 01:32:28,350 --> 01:32:35,820 I think that's what you should report. Right? Oh yeah. No, of course. Of course. But I think for me, that would be much more reassuring. 767 01:32:35,820 --> 01:32:44,610 OK, there is a statistically significant treatment effect, that blue line. 768 01:32:44,610 --> 01:32:55,860 Right? And yes, there is some variation across the board, but it's not that, you know, it doesn't detract. 769 01:32:55,860 --> 01:33:01,170 You're simply informing the readers that, you know, clearly there's some heterogeneity here. 770 01:33:01,170 --> 01:33:05,880 I'm being honest about it, right? I'm not quite sure why it's happening, right? 771 01:33:05,880 --> 01:33:09,300 But that's transparent. That's what you should do, in my opinion. Yeah. 772 01:33:09,300 --> 01:33:16,570 But I'm saying if there is a Mechanical Turk study and I want to do a reproduction of it, then I choose just a single mode. 773 01:33:16,570 --> 01:33:22,410 So in that sense, I would prefer to do multimode, but if I'm going to be a single-mode reproduction researcher, 774 01:33:22,410 --> 01:33:26,520 yeah, then I should at least take the same mode. Yes, as a point of departure. 775 01:33:26,520 --> 01:33:28,860 That's a good thing. That was your point. Yes. 776 01:33:28,860 --> 01:33:34,190 I mean, and then if you can't replicate it within the same mode, then clearly that's a problem. 777 01:33:34,190 --> 01:33:37,750 Yeah. Yeah, yeah. Great. 778 01:33:37,750 --> 01:33:42,528 Yep. All right. Thank you very much.