1 00:00:09,020 --> 00:00:15,890 We're going to start there for our first global livestream from Oxford, 2 00:00:15,890 --> 00:00:26,930 and we are very lucky to have Nick Tallentire from the University of Southampton and the WorldPop project. 3 00:00:26,930 --> 00:00:31,220 He's an epidemiologist particularly interested in the spread of infectious disease, 4 00:00:31,220 --> 00:00:40,490 and he has been doing really groundbreaking work on the use of big data for well-being, especially using 5 00:00:40,490 --> 00:00:49,220 computational methods and digital trace data such as Google location history or mobile phone data to track human 6 00:00:49,220 --> 00:00:56,660 mobility and the spread of diseases. 7 00:00:56,660 --> 00:00:59,060 So this would be our first, okay. 8 00:00:59,060 --> 00:01:07,670 And it's great that we have a talk about computational social science for the public good and to change the world for the better. 9 00:01:07,670 --> 00:01:16,850 And Nick is going to talk to us about mapping human population and mobility in low and middle income countries for malaria elimination. 10 00:01:16,850 --> 00:01:27,560 In the second part of this talk, we're going to have a hands-on kind of exercise, and the link is on the Oxford Slack and as well on the 11 00:01:27,560 --> 00:01:36,200 general Slack for all locations, for anybody who might be interested in doing this with us remotely. 12 00:01:36,200 --> 00:01:41,240 All right, cool. Thank you, Neal. Yeah. Thanks for the introduction and thanks for having me, everybody.
13 00:01:41,240 --> 00:01:45,740 Yeah, I'm coming from the University of Southampton within the WorldPop project, 14 00:01:45,740 --> 00:01:55,580 which is a project that has lots of different projects working towards broadly sort of mapping human populations and their dynamics through time, 15 00:01:55,580 --> 00:02:04,250 including another non-profit group called Flowminder, which works with various mobile operators to get mobile phone data, 16 00:02:04,250 --> 00:02:08,510 to scale research to be able to support things like disaster response and efforts like that. 17 00:02:08,510 --> 00:02:15,710 So there's a lot of us. I'm going to touch a little bit on the broader work that goes on at WorldPop, but then I'll focus on my own work, 18 00:02:15,710 --> 00:02:23,650 which largely has to do with malaria, and we have lots of partners around the world to do all that work. 19 00:02:23,650 --> 00:02:28,960 So just to give you a really quick outline of what I'll be talking about today before I jump into it. 20 00:02:28,960 --> 00:02:35,650 Within WorldPop, one of the main things that we work on is trying to create accurate maps of populations around the world, 21 00:02:35,650 --> 00:02:39,550 particularly in low and middle income countries. I'll talk about why that's important, 22 00:02:39,550 --> 00:02:43,570 what that means and what data we use for that, to start off with. 23 00:02:43,570 --> 00:02:51,730 And then I'll move into sort of how these different concepts apply to malaria, particularly the idea that populations are dynamic through time. 24 00:02:51,730 --> 00:02:57,850 So they move, they change. People change residence, they change where they go, you know, over the course of a year. 25 00:02:57,850 --> 00:03:02,080 And that affects how they get exposed to things like malaria.
26 00:03:02,080 --> 00:03:07,240 And then I'll end sort of my talk with what I think the future directions are for this field, 27 00:03:07,240 --> 00:03:11,050 the way things should move forward, where the big questions are into the future. 28 00:03:11,050 --> 00:03:17,530 And I think this might tie really well into what you've talked about this morning where I think you had ethics discussions and things like that. 29 00:03:17,530 --> 00:03:21,070 And so I'll talk about various big data that we've used. 30 00:03:21,070 --> 00:03:28,360 And I think, keep in mind, sort of in the back of your mind what the questions are in terms of how do we make sure we're getting informed consent? 31 00:03:28,360 --> 00:03:35,810 What does the ethics process look like for various data? And what ethics questions do you get when you use the data that I'll be talking about today? 32 00:03:35,810 --> 00:03:41,750 That'll be sort of the first hour, and then after that, we will move into sort of an interactive exercise. 33 00:03:41,750 --> 00:03:46,220 So what I wanted you all to be able to do was to look at your own location history data. 34 00:03:46,220 --> 00:03:50,390 This is true if you use Google Maps, especially if you use an Android phone, 35 00:03:50,390 --> 00:03:56,150 so basically there are instructions I have at the end for you to download your Google location history data and 36 00:03:56,150 --> 00:04:02,090 then you'll make some plots in order to plot it on top of a map of London with a map of green spaces there. 37 00:04:02,090 --> 00:04:08,060 So there's the barcode there as well, and a couple of shapefiles in a zip file on your Slack. 38 00:04:08,060 --> 00:04:15,890 So to go back to why having population maps that are accurate is so important, we can start off with the UN Sustainable Development Goals.
39 00:04:15,890 --> 00:04:17,300 There are 17 goals here, 40 00:04:17,300 --> 00:04:25,670 from 2015 to 2030, with 169 sort of quantitative targets that they want to achieve, things like no poverty, 41 00:04:25,670 --> 00:04:33,770 zero hunger, good health and well-being, quality education, gender equality, things like that, and ultimately to try to get to each of these goals. 42 00:04:33,770 --> 00:04:39,290 Underneath this really nice picture of things we'd like to get to is a population map. 43 00:04:39,290 --> 00:04:46,370 We need to know where the people are that actually aren't achieving these sorts of goals so that we can implement new strategies, 44 00:04:46,370 --> 00:04:55,280 put out new infrastructure to move towards them. And, you know, that's the idea, so they used the slogan sort of leave no one behind. 45 00:04:55,280 --> 00:05:05,960 And implicit behind that is that you have to be able to know where those people actually are everywhere to be able to create your policy accordingly. 46 00:05:05,960 --> 00:05:11,900 And essentially, the idea is that the key data behind the Sustainable Development Goals is that you want to 47 00:05:11,900 --> 00:05:18,290 have people able to access specific services or resources or have a certain level of social, 48 00:05:18,290 --> 00:05:26,480 economic or physical health, and actually understanding how these various aspects of health vary subnationally 49 00:05:26,480 --> 00:05:29,150 is critical to actually addressing these goals. 50 00:05:29,150 --> 00:05:36,920 So to be able to address these issues, you have to know the subnational geographic variation in health status, 51 00:05:36,920 --> 00:05:42,760 wealth and accessibility, where those occur and where those inequities lie. 52 00:05:42,760 --> 00:05:50,620 To do this, you have to have a consistent, comparable and accurate population map of where those people are and where they're going.
53 00:05:50,620 --> 00:05:55,690 So you don't just need where they actually live, but you actually need to know where they go, 54 00:05:55,690 --> 00:06:05,410 where they spend time and their demographics and their socioeconomic status and things like that in order to get at these SDGs. 55 00:06:05,410 --> 00:06:08,830 The reason I think this is so important is because if you look at different health risks, 56 00:06:08,830 --> 00:06:18,880 we actually get pretty different pictures of where risk actually is and really large heterogeneities in terms of accessibility and things like that. 57 00:06:18,880 --> 00:06:24,550 So here's how the picture differs across the world in terms of different types of pathogens. On the top left, 58 00:06:24,550 --> 00:06:29,080 we have a map of risk of zoonotic pathogens from wildlife. 59 00:06:29,080 --> 00:06:36,820 On the top right is zoonotic pathogens from non-wildlife, on the bottom left drug-resistant pathogens, and on the bottom right 60 00:06:36,820 --> 00:06:43,750 vector-borne pathogens. And so you need to have a clear picture of where populations are in terms of interacting with where this risk is, 61 00:06:43,750 --> 00:06:50,330 but also where this risk is itself in order to effectively plan into the future. 62 00:06:50,330 --> 00:06:51,500 For a couple more examples, 63 00:06:51,500 --> 00:07:00,260 here's what it looks like on the left in Africa for lymphatic filariasis, and on the right you have malaria, and, you know, 64 00:07:00,260 --> 00:07:02,360 while there are similarities between the two maps, 65 00:07:02,360 --> 00:07:09,800 they are broadly different and you want to know what you're dealing with before you actually implement any interventions.
66 00:07:09,800 --> 00:07:16,430 So in the case of malaria, the reason why having these accurate population maps is important, to be very sort of specific, 67 00:07:16,430 --> 00:07:23,570 is that we want to answer questions like how much chemical, how much insecticide do we need in order to spray all the households in a given province? 68 00:07:23,570 --> 00:07:30,710 That's going to depend on how many people live there, how many people are actually in the province in the month that we do the spraying campaign. 69 00:07:30,710 --> 00:07:34,280 Alongside that, we need to know how many people do we need in the spraying team? 70 00:07:34,280 --> 00:07:41,930 How long do they need to be out there? And thirdly, sort of how do we measure how well we've done? So if we know how many houses we've sprayed, 71 00:07:41,930 --> 00:07:50,060 how many people we've covered with our insecticide campaign, we want to compare that against how many people we actually expect to have been there, 72 00:07:50,060 --> 00:07:55,790 and that helps us make sure that we've actually covered everywhere throughout each country. 73 00:07:55,790 --> 00:08:01,070 So partly to do that, one idea you might use is to get maps of all the settlements and residential buildings, 74 00:08:01,070 --> 00:08:05,840 which is occurring in some malaria elimination settings, 75 00:08:05,840 --> 00:08:13,440 but I'll also discuss other ways that we get at similar questions using different data where these are often really difficult to get. 76 00:08:13,440 --> 00:08:18,240 It's also important for responding to outbreaks, so in this case, we have some specific event that we want to deal with. 77 00:08:18,240 --> 00:08:22,080 We don't necessarily know when it's going to happen or where it's going to happen. 78 00:08:22,080 --> 00:08:27,930 And then using population maps we'll want to figure out things like how significant of an outbreak it is. 79 00:08:27,930 --> 00:08:32,400 You know, how many people are at risk.
Where did it originate? 80 00:08:32,400 --> 00:08:40,200 Where is it likely to go next? And how can we control it? And things we might use to address those questions would be 81 00:08:40,200 --> 00:08:44,580 sort of maps of the actual population during the time that the outbreak occurs, 82 00:08:44,580 --> 00:08:52,920 as well as the travel patterns of the populations that are actually at risk of spreading the disease. 83 00:08:52,920 --> 00:09:00,210 So in terms of high income countries, the basic data that we would use to answer these sorts of questions are reasonably straightforward. 84 00:09:00,210 --> 00:09:06,900 We have in high income countries regular reliable national censuses with really strong geospatial components. 85 00:09:06,900 --> 00:09:09,690 So we know exactly how many people are in each census area. 86 00:09:09,690 --> 00:09:17,880 Census areas are really small, so we can have a really effective, really highly resolved population estimate. 87 00:09:17,880 --> 00:09:22,680 Oftentimes, we also have really comprehensive civil registration and vital statistics to kind 88 00:09:22,680 --> 00:09:27,990 of get an idea of the demography of those people in each of the census areas. 89 00:09:27,990 --> 00:09:31,800 And then other statistics like air traffic data, 90 00:09:31,800 --> 00:09:39,890 sort of population movement between cities, to get an idea of how different parts of these countries are actually connected. 91 00:09:39,890 --> 00:09:44,630 It's not exactly the same situation in low and middle income settings. So in low and middle income settings, 92 00:09:44,630 --> 00:09:50,300 we often do get censuses and ideally, you know, if we use those censuses, 93 00:09:50,300 --> 00:09:53,450 we'll get, within census areal units, 94 00:09:53,450 --> 00:10:03,850 the number of people in each of them and the age and sex distribution amongst those populations.
95 00:10:03,850 --> 00:10:07,480 We can get a prediction of what that population is going to look like through time. 96 00:10:07,480 --> 00:10:10,480 We can compare against past censuses to see how it looked in the past, 97 00:10:10,480 --> 00:10:15,640 but then also project into the future, and we could also get some vague sense of movement. 98 00:10:15,640 --> 00:10:19,480 So oftentimes they'll ask people where they lived in the past. 99 00:10:19,480 --> 00:10:26,620 We can see how provinces and different areas in different countries are linked by looking at data like that from censuses. 100 00:10:26,620 --> 00:10:30,670 There are problems with these data, however, particularly in low and middle income settings. 101 00:10:30,670 --> 00:10:36,130 So, for example, in some countries, you know, maybe the most recent census would have been 2011. 102 00:10:36,130 --> 00:10:39,040 In other countries it's a little bit earlier, 2003. 103 00:10:39,040 --> 00:10:45,880 And this is something that greatly depends upon the stability of the country, the ability of the government to pay for a census and things like that. 104 00:10:45,880 --> 00:10:52,780 So oftentimes the most recent census in these low and middle income countries was a really long time ago. 105 00:10:52,780 --> 00:11:01,510 So here's the plot of that around the world, where in some countries, say in Africa, it was, you know, 11 to 15 years ago or more than 15 years ago. 106 00:11:01,510 --> 00:11:10,370 So we don't really have an accurate picture of the population throughout the country based on that really old census. 107 00:11:10,370 --> 00:11:18,710 Just to give you a few examples of the most recent census across different countries around the globe: in the DRC, the most recent one was in 1984. 108 00:11:18,710 --> 00:11:23,150 So over 30 years ago. And that's true for a lot of different countries around the world. 109 00:11:23,150 --> 00:11:29,420 And even in some, we've never actually had a census to use.
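The idea of comparing past censuses and projecting forward can be sketched with a simple constant-rate growth model. This is only an illustration under stated assumptions, not the method WorldPop uses: the function names and the population figures are hypothetical, and real projections would also account for births, deaths and migration separately.

```python
import math

def annual_growth_rate(pop_earlier, pop_later, years_between):
    """Average exponential growth rate implied by two census counts."""
    return math.log(pop_later / pop_earlier) / years_between

def project_population(pop_at_census, rate, years_since_census):
    """Carry a census count forward, assuming the rate stays constant."""
    return pop_at_census * math.exp(rate * years_since_census)

# Hypothetical counts for one census unit, taken ten years apart.
rate = annual_growth_rate(10_000, 12_000, 10)

# Estimate the population 15 years after the most recent census.
estimate = project_population(12_000, rate, 15)
print(round(estimate))
```

The longer ago the last census was, the more the estimate leans on the constant-rate assumption, which is one reason very old censuses give an unreliable picture.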
So to be able to reach these SDGs, 110 00:11:29,420 --> 00:11:38,270 we need to have population information. But in these cases, and in many cases around the world, the censuses are not going to be reliable. 111 00:11:38,270 --> 00:11:44,150 Not only that, in the ones that do occur, we may be missing populations. There may be inaccuracies in the data. 112 00:11:44,150 --> 00:11:51,560 The mobility data might be incomplete. So oftentimes they'll ask people, what country did you come from five years ago if you moved to this country, 113 00:11:51,560 --> 00:11:58,460 but they don't ask where within the country they came from. So we don't really have a clear picture of how people are moving. 114 00:11:58,460 --> 00:12:09,290 And again, you know, because there are potentially long periods between censuses, we don't have a great picture of what changes occur between censuses. 115 00:12:09,290 --> 00:12:14,000 So in terms of meeting those SDGs and providing data sets for the SDGs, 116 00:12:14,000 --> 00:12:20,240 national census data will always sort of be one of the most important data sources. It is comprehensive. 117 00:12:20,240 --> 00:12:24,230 You are getting a picture of everybody, you are getting their demographic information with it, 118 00:12:24,230 --> 00:12:27,950 and it does give you subnational detail, but oftentimes it's not enough. 119 00:12:27,950 --> 00:12:32,870 So we want to add supplementary information on top of that, especially since, 120 00:12:32,870 --> 00:12:40,010 even in the best case scenario, they often only occur once a decade. So we don't know what's happening in between. 121 00:12:40,010 --> 00:12:47,330 They often lack population characteristics. So like really fine scale mobility information typically doesn't exist. 122 00:12:47,330 --> 00:12:52,700 And it's also a static picture. So, for example, we know that some populations move seasonally.
123 00:12:52,700 --> 00:13:02,250 Censuses won't capture that, and those seasonal population movements can matter for diseases that tend to be seasonal, such as malaria. 124 00:13:02,250 --> 00:13:09,120 So nowadays, there are a lot of new types of data that we can use to understand these population changes and those dynamics. 125 00:13:09,120 --> 00:13:14,730 So on the left are the data sets that we would have typically used traditionally, where on the x axis 126 00:13:14,730 --> 00:13:19,980 we have sort of the spatial scale of how those dynamics occur, from neighbourhood scale, 127 00:13:19,980 --> 00:13:23,370 so do you move to the next neighbourhood over? How often do you do it? 128 00:13:23,370 --> 00:13:27,690 Which ones do you go to? All the way, to the right, to international scale. 129 00:13:27,690 --> 00:13:31,350 And then on the y axis you have the temporal scale. 130 00:13:31,350 --> 00:13:40,710 So on the bottom, it's sort of a very fine scale picture, where people are spending their time by the hour, versus daily, all the way up 131 00:13:40,710 --> 00:13:45,840 to long term migration events that you might only do a few times in your life. 132 00:13:45,840 --> 00:13:52,700 So on the left, we have the types of data sets that we would traditionally use. So census migration data are typically available. 133 00:13:52,700 --> 00:13:55,410 I mean, you can use cross-border surveys or traffic surveys. 134 00:13:55,410 --> 00:14:01,110 So in this case, if people cross the border and there's a border post that records that that movement happened, 135 00:14:01,110 --> 00:14:06,980 then we can look at that information for sort of long term or seasonal movement, or we can use travel history surveys. 136 00:14:06,980 --> 00:14:11,520 So in that case, we have people going out just asking, where were you two weeks ago? 137 00:14:11,520 --> 00:14:17,190 Where were you eight weeks ago and where have you travelled? How long did you spend there?
138 00:14:17,190 --> 00:14:22,680 It's a really exciting time to be thinking about these sorts of questions because there are lots of new data sets to address them, 139 00:14:22,680 --> 00:14:29,520 and I'll talk about some of them later on. But here on the right, we have sort of more contemporary data sets that we can use, 140 00:14:29,520 --> 00:14:34,290 including things like personally carried GPS trackers, social media data, 141 00:14:34,290 --> 00:14:41,880 so data from Facebook, Twitter to kind of get an idea of where people are spending time and when they're moving, mobile phone records, 142 00:14:41,880 --> 00:14:51,710 satellite nightlight data and then Google Location History data as well, which is the one that you'll be playing around with a bit later today. 143 00:14:51,710 --> 00:14:58,400 So just to show what some of these newer data sets look like, if we think about the satellite derived data, 144 00:14:58,400 --> 00:15:04,910 you know, we might do things like use satellite imagery and then use a 145 00:15:04,910 --> 00:15:09,710 crowdsourced effort to sort of map households and buildings throughout that landscape. 146 00:15:09,710 --> 00:15:13,400 So that's what this Missing Maps project does, which is a really effective way, 147 00:15:13,400 --> 00:15:16,880 if you have an estimate of how many people are going to be in each household, of getting at 148 00:15:16,880 --> 00:15:21,200 what the actual population size is in that community and seeing how it changes through 149 00:15:21,200 --> 00:15:26,930 time, because you can see how the actual buildings and their layout change through time. 150 00:15:26,930 --> 00:15:35,240 You can look at road networks here on the bottom, and then you can also look at the social media data, so there on the right, shown is Twitter data. 151 00:15:35,240 --> 00:15:38,900 That's basically densities of Twitter users in a city.
152 00:15:38,900 --> 00:15:48,850 So you can look at how that changes, where that actually is and see if that tells you something about what the population actually looks like. 153 00:15:48,850 --> 00:15:51,130 More satellite imagery data that we might use: 154 00:15:51,130 --> 00:15:58,900 we might get land cover estimates that would tell us a little bit about what that land is being used for, 155 00:15:58,900 --> 00:16:04,530 if that implies certain things about disease risk or where people will spend time on that landscape. 156 00:16:04,530 --> 00:16:09,580 And yeah, on the top right there, you can see why mapping the buildings is so important. 157 00:16:09,580 --> 00:16:11,830 So here on two different sides of a community, 158 00:16:11,830 --> 00:16:17,740 we have two very different household sizes that might imply different things about how many people are living there, 159 00:16:17,740 --> 00:16:26,080 the population density of those areas and so on. So this is a really nice sort of emerging data set that a lot of people are using, 160 00:16:26,080 --> 00:16:34,520 particularly at WorldPop, to get better pictures of population densities on a subnational scale. 161 00:16:34,520 --> 00:16:37,460 This is a nightlight data set, which is pretty neat as well. 162 00:16:37,460 --> 00:16:46,370 So basically, satellites take pictures of where lights are in different communities, and you can look at that sort of across months, 163 00:16:46,370 --> 00:16:52,280 across days, across years to see how the extent of that city changes through time. 164 00:16:52,280 --> 00:16:57,800 Use that to infer maybe how that population is changing through time, how many new people are in it. 165 00:16:57,800 --> 00:17:04,460 And that's one new dataset that we're exploring to use to look at seasonal population maps in various countries. 166 00:17:04,460 --> 00:17:10,450 So another one is mobile phone data.
167 00:17:10,450 --> 00:17:16,210 So this is one that's really become, I think, especially exciting in the past 10 or 20 years, 168 00:17:16,210 --> 00:17:23,950 particularly as mobile phones have become really pervasive and oftentimes actually the first device that people 169 00:17:23,950 --> 00:17:29,980 in low and middle income countries get to access the internet. In a lot of low and middle income countries 170 00:17:29,980 --> 00:17:35,890 now, what you find is that rather than getting a computer, people buy a smartphone and that's how they actually contact people. 171 00:17:35,890 --> 00:17:44,170 That's how they actually access the internet. So it's a really huge, increasingly viable kind of data set that you can use for these settings. 172 00:17:44,170 --> 00:17:49,420 And essentially what that looks like, I'll go into a little bit more detail about it later, 173 00:17:49,420 --> 00:17:57,670 but you have cell towers across the landscape. And by looking at when and where people are making calls and texts, 174 00:17:57,670 --> 00:18:01,990 you can track them through time and see how they spend time across that landscape. 175 00:18:01,990 --> 00:18:10,300 And so this is what that looks like across Namibia, where you can see clear sort of patterns of connectivity across the entire country. 176 00:18:10,300 --> 00:18:16,420 Here in the middle, we have Windhoek, which is like a big attractor. It's the capital and the main city in the country. 177 00:18:16,420 --> 00:18:19,570 It's a huge attractor of people from throughout the entire country. 178 00:18:19,570 --> 00:18:30,070 But you can also see local population centres up here in the north and over here in Zambezi, in the Panhandle as well. 179 00:18:30,070 --> 00:18:38,530 GPS based data are really exciting as well, so personally carried GPS trackers can be a really useful field tool.
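The mobile phone approach described above, following people between cell towers by when and where they make calls and texts, can be sketched as a simple count of tower-to-tower moves. This is a minimal sketch under stated assumptions: the record layout `(subscriber_id, timestamp, tower_id)`, the function name `tower_flows` and the sample values are all hypothetical, and real operator data would be anonymised and aggregated before anyone analyses it.

```python
from collections import Counter

def tower_flows(records):
    """Count moves between consecutive towers for each subscriber.

    records: iterable of (subscriber_id, timestamp, tower_id) tuples,
    a simplified, hypothetical stand-in for call detail records.
    """
    flows = Counter()
    last_tower = {}
    # Order events by subscriber, then by time, so that consecutive
    # events for the same person sit next to each other.
    for subscriber, _, tower in sorted(records, key=lambda r: (r[0], r[1])):
        previous = last_tower.get(subscriber)
        if previous is not None and previous != tower:
            flows[(previous, tower)] += 1
        last_tower[subscriber] = tower
    return flows

# Made-up events for two subscribers.
records = [
    ("a", 1, "north"), ("a", 5, "capital"), ("a", 9, "capital"),
    ("b", 2, "east"), ("b", 3, "capital"), ("b", 8, "east"),
]
flows = tower_flows(records)
for (origin, destination), count in sorted(flows.items()):
    print(origin, "->", destination, count)
```

Summing these counts over many subscribers gives the kind of connectivity pattern shown on a flow map, with large cities appearing as destinations from many origins.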
180 00:18:38,530 --> 00:18:42,430 So if you provide GPS trackers to people, they'll wear it, maybe, you know, 181 00:18:42,430 --> 00:18:48,310 for a week or two based on the battery life, based on their capability to sort of charge those GPS trackers. 182 00:18:48,310 --> 00:18:52,510 And you can see, in a really fine scale way, where people spend their time. 183 00:18:52,510 --> 00:18:58,900 So this is from a study in Kenya from somebody in our group where she looked at how people are moving with 184 00:18:58,900 --> 00:19:03,280 their livestock through time and what that means about the potential risks of zoonotic disease. 185 00:19:03,280 --> 00:19:10,630 So we can combine this really fine scale picture of where time is spent with land use data, with survey data about why they're 186 00:19:10,630 --> 00:19:17,230 making these trips, to get a better picture of where disease risk occurs and where they're actually spending their time. 187 00:19:17,230 --> 00:19:23,770 Alongside that is Google Location History data and similar passively collected smartphone data. 188 00:19:23,770 --> 00:19:28,870 So this is a really fun one because if you have an Android, well, if you have a smartphone, 189 00:19:28,870 --> 00:19:37,000 most smartphones nowadays have GPS devices within them, and unless you opt out, they're actually actively collecting GPS points through time. 190 00:19:37,000 --> 00:19:45,640 And that's what I'll be looking at later. And we actually did a pilot study to see, if we collected these from people at Southampton, 191 00:19:45,640 --> 00:19:57,760 what we would get, and what we found was that we actually got on average a year of fine scale GPS tracker level information from participants. 192 00:19:57,760 --> 00:20:02,230 And because this was a Southampton sample, you can see here on the bottom right,
193 00:20:02,230 --> 00:20:08,410 this is actually what those data looked like, aggregated over about 20 students at the university. 194 00:20:08,410 --> 00:20:12,850 So you can see really clearly sort of travel patterns around the country. 195 00:20:12,850 --> 00:20:17,170 You can see where people went from Southampton. A lot of people travel to London. 196 00:20:17,170 --> 00:20:23,890 You can see the M25, I think it is, the ring road around London that they travel on, things like that. 197 00:20:23,890 --> 00:20:36,880 And yeah, so it's an exciting data set as well, particularly because we did a survey of people in the UK, the US, Brazil, Japan and the last one, 198 00:20:36,880 --> 00:20:45,220 I think Mexico was the last one. We did surveys in those countries and we found that a majority of 199 00:20:45,220 --> 00:20:49,120 Android smartphone owners in those countries actually had this data set enabled. 200 00:20:49,120 --> 00:20:52,360 We didn't look at or download their data or anything. We were just asking if they had it. 201 00:20:52,360 --> 00:20:55,960 But we did find that a majority of them actually still have it enabled. 202 00:20:55,960 --> 00:21:02,350 So that's a little bit of a picture of why population densities matter, 203 00:21:02,350 --> 00:21:10,480 why having accurate sort of seasonal pictures of population densities is really important for meeting the SDGs. 204 00:21:10,480 --> 00:21:14,140 Now I'm going to move over and talk a little about my own work, 205 00:21:14,140 --> 00:21:22,480 where I apply these concepts and data to think about malaria and specifically malaria elimination in countries that are really close. 206 00:21:22,480 --> 00:21:29,090 So this is southern Africa, Southeast Asia and Mesoamerica.
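For the Google Location History data discussed above, a minimal sketch of turning an export into latitude and longitude points might look like the following. It assumes the export is a JSON object with a `locations` list whose entries carry `latitudeE7` and `longitudeE7` integers (degrees multiplied by ten million); the export layout has changed over time, so treat those field names as an assumption to check against your own file. The function name `read_points` and the sample record are hypothetical.

```python
import json

def read_points(takeout_json):
    """Return (latitude, longitude) pairs in decimal degrees.

    Assumes a JSON object with a "locations" list whose entries have
    "latitudeE7" and "longitudeE7" integers. Entries missing either
    field are skipped rather than guessed at.
    """
    data = json.loads(takeout_json)
    points = []
    for entry in data.get("locations", []):
        if "latitudeE7" in entry and "longitudeE7" in entry:
            points.append((entry["latitudeE7"] / 1e7,
                           entry["longitudeE7"] / 1e7))
    return points

# A made-up single record near central London, for illustration only.
sample = '{"locations": [{"latitudeE7": 515074000, "longitudeE7": -1278000}]}'
points = read_points(sample)
print(points)
```

Points in this form can then be drawn on top of a base map, which is the kind of plot the interactive exercise asks for.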
207 00:21:29,090 --> 00:21:35,450 So just to give you a quick primer on malaria itself, it's a vector borne disease, 208 00:21:35,450 --> 00:21:40,850 so it's a disease that infects humans, it gets into your liver and then into your bloodstream. 209 00:21:40,850 --> 00:21:47,120 It's spread by a mosquito vector, an obligate mosquito vector. Part of the malaria life cycle occurs in the mosquito. 210 00:21:47,120 --> 00:21:52,640 So that means that part of the life cycle occurs in the human. 211 00:21:52,640 --> 00:22:01,010 A mosquito has to bite a human and then pick up the parasite and then transmit it to somebody else for transmission to occur. 212 00:22:01,010 --> 00:22:06,830 You don't really see human to human transmission in natural settings, and because of that, 213 00:22:06,830 --> 00:22:12,950 you only really get transmission in areas where mosquitoes live, and mosquitoes live near water bodies. 214 00:22:12,950 --> 00:22:16,400 That's where you tend to find mosquitoes, in forested areas. 215 00:22:16,400 --> 00:22:22,790 So you end up getting a really heterogeneous picture of malaria risk based on where those mosquito populations are. 216 00:22:22,790 --> 00:22:26,150 So if you look in any realistic landscape of transmission, 217 00:22:26,150 --> 00:22:34,670 what you get is hot spots of malaria transmission near wetlands, near areas with ponds. 218 00:22:34,670 --> 00:22:40,160 And then what will happen is during the wet seasons, when transmission ramps up, 219 00:22:40,160 --> 00:22:43,760 you'll get really, really high transmission in the centre of those hot spots. 220 00:22:43,760 --> 00:22:48,800 And then mosquitoes and people diffusing out will make that disease spread out 221 00:22:48,800 --> 00:22:53,210 from those hot spots, from those transmission centres, out into other areas.
222 00:22:53,210 --> 00:23:01,400 So it's a really interesting place to think about human mobility and population dynamics because your exposure and risk to 223 00:23:01,400 --> 00:23:10,710 malaria depends entirely on whether or not you spend time in these areas that have mosquitoes that will transmit the disease. 224 00:23:10,710 --> 00:23:15,870 So that's where the exposure comes into play. People will spend time in areas with mosquitoes. 225 00:23:15,870 --> 00:23:24,720 Oftentimes these might be areas like, you know, farmlands, areas where you get water, things like that. 226 00:23:24,720 --> 00:23:31,140 So there are different reasons why people will spend time in these locations and they'll spend different amounts of time there. 227 00:23:31,140 --> 00:23:41,070 So you can see here on the right is a picture of Zanzibar, and this is a map of malaria risk across 228 00:23:41,070 --> 00:23:49,170 one of the islands of Zanzibar, where it's generally found in the south and it's really strongly correlated with geospatial 229 00:23:49,170 --> 00:23:55,380 covariates like land use and land cover. In particular, here in the south of Zanzibar, on this island, 230 00:23:55,380 --> 00:24:02,430 I think it is, you have lots of sort of forested areas, lots more ponds, 231 00:24:02,430 --> 00:24:07,810 and therefore that's where the mosquitoes are and that's where the transmission occurs. 232 00:24:07,810 --> 00:24:13,090 These population dynamics, how people are actually 233 00:24:13,090 --> 00:24:18,220 moving in and out of these areas with mosquitoes, are really important for malaria elimination. 234 00:24:18,220 --> 00:24:27,790 So Zanzibar is a set of islands off the coast of Tanzania, and on mainland Tanzania 235 00:24:27,790 --> 00:24:30,520 you have really high malaria transmission still.
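The point that exposure depends on how much time someone spends in areas with mosquitoes can be sketched as a time-weighted sum of local risk. This is a deliberately simple illustration, not a transmission model from the talk: the function name `relative_exposure`, the area labels and the risk values are all hypothetical.

```python
def relative_exposure(time_fractions, local_risk):
    """Weight each area's risk by the share of time spent there.

    time_fractions: area -> share of a person's time (shares sum to 1).
    local_risk: area -> relative transmission risk in that area.
    Areas with no risk value are treated as zero risk.
    """
    return sum(share * local_risk.get(area, 0.0)
               for area, share in time_fractions.items())

# Hypothetical relative risk by area type.
local_risk = {"town": 0.0, "farmland": 0.2, "wetland": 0.4}

# Two hypothetical people who live in the same town but move differently.
stays_home = {"town": 1.0}
works_in_wetland = {"town": 0.75, "wetland": 0.25}

print(relative_exposure(stays_home, local_risk))
print(relative_exposure(works_in_wetland, local_risk))
```

Two residents of the same low-risk town end up with different exposure, which is why where people go matters as much as where they live.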
236 00:24:30,520 --> 00:24:38,560 So essentially, what's happened in Zanzibar is you've had a really effective malaria control programme that began in the mid 2000s. 237 00:24:38,560 --> 00:24:44,620 And you can see here on the left is sort of how the malaria burden across the island has dropped over time. 238 00:24:44,620 --> 00:24:49,870 So it was about 30 percent of people infected in the early 2000s. 239 00:24:49,870 --> 00:24:57,190 Then they had a really effective malaria campaign that dropped things down really far from sort of 2003 to 2007, 240 00:24:57,190 --> 00:25:04,570 so that only one or two percent of people actually had malaria. But ending those last few cases has been really difficult. 241 00:25:04,570 --> 00:25:08,680 So they've been able to stop transmission in those areas in the south of the island, 242 00:25:08,680 --> 00:25:13,120 for example, but they haven't been able to stop the final few cases. 243 00:25:13,120 --> 00:25:19,570 And that's because these final few cases have to do with how those people on the island are spending time. 244 00:25:19,570 --> 00:25:26,830 People in Zanzibar are travelling by ferry to mainland Tanzania, getting infected, bringing back cases. 245 00:25:26,830 --> 00:25:31,870 Mosquitoes are hitching a ride on those ferries carrying malaria with them. 246 00:25:31,870 --> 00:25:39,010 And then you also have residents from mainland Tanzania travelling over to Zanzibar and infecting mosquitoes that remain on that island. 247 00:25:39,010 --> 00:25:47,920 So this is a really practical and important question: you know, how do we figure out who's actually spending time in these areas with transmission? 248 00:25:47,920 --> 00:25:55,450 How do we target them?
And how do we actually stop this final bit of transmission so that Zanzibar can fully eliminate malaria and 249 00:25:55,450 --> 00:26:02,380 then hopefully sort of ramp down these really expensive control measures that they have to do each year? 250 00:26:02,380 --> 00:26:08,170 So that's a lot of what I do. This is broadly important across southern Africa as well. 251 00:26:08,170 --> 00:26:14,560 So these are predictions. Say, if we have a picture of malaria from the Malaria Atlas Project here at Oxford 252 00:26:14,560 --> 00:26:18,100 on the left, then we might do something like try to figure out how people 253 00:26:18,100 --> 00:26:21,940 infected with malaria in those areas are actually going to move throughout the 254 00:26:21,940 --> 00:26:30,910 entire region and plan surveillance and intervention programmes accordingly. 255 00:26:30,910 --> 00:26:38,860 It matters on a local scale as well, so this isn't just about specific islands or specific countries and how mobility moves across that landscape; 256 00:26:38,860 --> 00:26:43,700 it matters in terms of what our very fine scale picture of risk looks like. 257 00:26:43,700 --> 00:26:48,760 So here's Namibia. And essentially, when you usually map malaria, 258 00:26:48,760 --> 00:26:55,840 what you use are case counts from health facilities and you try to 259 00:26:55,840 --> 00:27:00,390 predict incidence based on where those people are actually showing up with malaria. 260 00:27:00,390 --> 00:27:05,500 And I'll talk a little bit about how we deal with it later. But the assumption there, 261 00:27:05,500 --> 00:27:12,220 the implicit assumption that you make is that transmission occurred in the place where the person reported as a malaria case. 262 00:27:12,220 --> 00:27:14,450 In reality, they're spending time in other areas.
263 00:27:14,450 --> 00:27:22,420 So a population covered by this health facility here might actually spend a lot of time in this other community B, 264 00:27:22,420 --> 00:27:27,100 which has a lot of mosquitoes, and that's where actually the infection is occurring. 265 00:27:27,100 --> 00:27:32,140 So we'd like to be able to figure out not just where the cases are appearing in health facilities, 266 00:27:32,140 --> 00:27:36,670 but we actually want to plot where the transmission is occurring intrinsically. 267 00:27:36,670 --> 00:27:40,600 So I'll talk a little bit about that as well. 268 00:27:40,600 --> 00:27:46,360 But essentially then we have different scales so we can think about population dynamics on national scales, 269 00:27:46,360 --> 00:27:50,170 on regional scales, or we can think about local scales. 270 00:27:50,170 --> 00:27:54,100 And these population dynamics do vary with space and time. 271 00:27:54,100 --> 00:27:59,410 They vary with individuals and it affects what you do across these different spatial scales. 272 00:27:59,410 --> 00:28:04,360 So here on the right is a picture of what that mobility looks like and those dynamics look 273 00:28:04,360 --> 00:28:08,920 like and what might cause them at different spatial and temporal scales. On the bottom left, 274 00:28:08,920 --> 00:28:17,950 we have really frequent sort of local scale movement; on the top right, really rare international movement. On the bottom left, 275 00:28:17,950 --> 00:28:24,310 you know, that might be school or work related movement that has to do with your exposure within a community. 276 00:28:24,310 --> 00:28:34,480 Whereas on the top right, you've got sort of diaspora related movement that spreads diseases across countries rather than really local scale places. 277 00:28:34,480 --> 00:28:42,570 So.
So I'll talk a little bit about how we've addressed those issues in World Pop towards malaria, 278 00:28:42,570 --> 00:28:47,340 using a few different data sets and crossing those different spatial scales. 279 00:28:47,340 --> 00:28:57,000 So the three sort of examples I want to give you: at a local scale, how we use various data sets like mobile phone data, 280 00:28:57,000 --> 00:29:05,370 GPS tracker data to measure risk and account for mobility when we actually plot risk. On sort of regional scales, 281 00:29:05,370 --> 00:29:10,800 I'll talk about how we plot connectivity between countries and subnational administrative units 282 00:29:10,800 --> 00:29:15,930 within countries and what that means for reintroduction across that landscape. 283 00:29:15,930 --> 00:29:22,140 And then in the middle, sort of how we account for seasonal population dynamics to get 284 00:29:22,140 --> 00:29:27,720 better estimates of disease incidence throughout a country. 285 00:29:27,720 --> 00:29:33,450 So starting from the bottom in terms of really local scale dynamics, 286 00:29:33,450 --> 00:29:39,300 essentially what this question has to do with is if we have two different populations around the city, 287 00:29:39,300 --> 00:29:44,400 if we know where disease risk is, or some kind of health risk, 288 00:29:44,400 --> 00:29:49,680 we want to know how much time people are spending at risk and which populations are actually visiting those areas. 289 00:29:49,680 --> 00:29:54,090 So let's say we have this map of air pollution around London. 290 00:29:54,090 --> 00:29:58,410 Let's say we have two different populations on either side of the city; they're equidistant.
291 00:29:58,410 --> 00:30:06,120 These two populations might differ pretty significantly in terms of actual realised risk of air pollution related 292 00:30:06,120 --> 00:30:12,000 events, where the one on the left might actually go into central London more often than the one on the right. 293 00:30:12,000 --> 00:30:17,040 And we want to actually account for that when we say which populations should we think about in 294 00:30:17,040 --> 00:30:24,850 terms of mitigating air pollution risk in terms of placing interventions and things like that? 295 00:30:24,850 --> 00:30:32,770 So in this case, what we're thinking about in terms of dynamics and in terms of mobility are repeated frequent movements. 296 00:30:32,770 --> 00:30:35,660 This is about places where people spend lots of time. 297 00:30:35,660 --> 00:30:45,700 So your workplace, your university, are two key sort of locations where you might personally go. There 298 00:30:45,700 --> 00:30:54,010 are really strong individual differences in terms of these movements based on sociodemographics. 299 00:30:54,010 --> 00:31:00,340 So, you know, people who work in one type of industry might spend their time in the city centre, 300 00:31:00,340 --> 00:31:04,360 whereas farm workers might spend their time more out in rural areas, and based on 301 00:31:04,360 --> 00:31:08,740 the health risk, thinking about this can yield really different risk patterns. 302 00:31:08,740 --> 00:31:13,210 So we need data to inform this that can give us really high resolution mobility. 303 00:31:13,210 --> 00:31:19,330 So you know, what we want to do is do things like combine that with maps of disease risk or land use. 304 00:31:19,330 --> 00:31:28,180 So in this case, the types of data sets that I think of are GPS based data sets, the Google location history data and then CDR data as well. 305 00:31:28,180 --> 00:31:33,610 So the call detail records for mobile phones. Yeah.
306 00:31:33,610 --> 00:31:41,320 And the example that I'll give has to do with thinking about how local population dynamics affect malaria reporting. 307 00:31:41,320 --> 00:31:46,630 The idea here is again to go back to that picture of health facility based disease mapping. 308 00:31:46,630 --> 00:31:49,060 We have cases appearing at a health facility. 309 00:31:49,060 --> 00:31:55,240 Those numbers get reported to a statistics agency based on the health facility location or the household location. 310 00:31:55,240 --> 00:32:03,160 And typically what we do is we take that map of where cases are and predict prevalence across the landscape using those data. 311 00:32:03,160 --> 00:32:10,990 So that's essentially what's occurred here where we have sort of prevalence of disease in different populations 312 00:32:10,990 --> 00:32:20,580 based on where those cases showed up and their associated households or their associated health facilities. 313 00:32:20,580 --> 00:32:25,710 So again, this doesn't necessarily account for where the infection actually occurs. 314 00:32:25,710 --> 00:32:33,300 So the central idea is that observed burden, where people are actually showing up with malaria, is a combination of where 315 00:32:33,300 --> 00:32:40,530 people are actually getting infected and how people move there and where they are and how much time they spend there. 316 00:32:40,530 --> 00:32:43,870 So the question we want to ask is, can we predict where infections occurred? 317 00:32:43,870 --> 00:32:52,230 So the data that I used for this was mobile phone data from Namibia from October 2010 to September 2011, 318 00:32:52,230 --> 00:32:56,250 with 1.9 million SIMs and nine billion communications in general. 319 00:32:56,250 --> 00:33:03,060 So essentially, the idea is we have here at the top right a picture of where infected people show up.
320 00:33:03,060 --> 00:33:07,530 We want to actually combine that with a picture of mobility to say, where did they go? 321 00:33:07,530 --> 00:33:15,900 Where might they have gotten infected originally? So to give you a bit more detail on what those call detail records look like. 322 00:33:15,900 --> 00:33:21,000 Essentially, whenever somebody makes a call or a text event, whenever they call or SMS somebody, 323 00:33:21,000 --> 00:33:28,980 that data gets routed through a mobile phone tower and it goes to the network operator and gets routed to the person that it needs to go to. 324 00:33:28,980 --> 00:33:35,190 From there, the network operator records the time of that event and the tower location. 325 00:33:35,190 --> 00:33:41,070 So let's say we have a hypothetical person who makes six different SMSes. 326 00:33:41,070 --> 00:33:44,880 They text somebody and it gets routed through Tower A initially. 327 00:33:44,880 --> 00:33:52,710 Then they make three texts that get routed through Tower B, then one gets routed through Tower C and another gets routed through Tower B. 328 00:33:52,710 --> 00:34:01,140 And if we look at where those towers are and how those calls occur through time, then we can sort of plot how they move. 329 00:34:01,140 --> 00:34:06,750 So here we infer that this person started at Tower A, somewhere near it, then moved to Tower B, 330 00:34:06,750 --> 00:34:11,100 spent some time there, then had a quick trip to C and then back to B. 331 00:34:11,100 --> 00:34:18,970 So based on that, we can plot sort of how people are exposed and how they move around the given landscape. 332 00:34:18,970 --> 00:34:24,610 In terms of how we practically work with these data, these CDRs get recorded by the operator. 333 00:34:24,610 --> 00:34:30,790 We always keep the individual data behind the network operator's firewall. 334 00:34:30,790 --> 00:34:35,620 We'll run our models and sort of aggregate data beyond the individual level.
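The tower example above (six texts routed through A, B, B, B, C, B) can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (subscriber, timestamp, tower), the names `tower_sequence` and `aggregate_flows`, and the timestamps are all hypothetical, and a real pipeline would run behind the operator's firewall and release only the aggregated counts.

```python
from collections import Counter
from itertools import groupby

# Hypothetical call detail records: (subscriber_id, timestamp, tower_id).
# The six events mirror the example in the talk: A, B, B, B, C, B.
records = [
    ("user1", "2011-01-05T08:00", "A"),
    ("user1", "2011-01-05T10:30", "B"),
    ("user1", "2011-01-05T11:15", "B"),
    ("user1", "2011-01-05T13:40", "B"),
    ("user1", "2011-01-05T16:05", "C"),
    ("user1", "2011-01-05T18:20", "B"),
]


def tower_sequence(events):
    """Order one subscriber's events by time and collapse repeated towers."""
    ordered = sorted(events, key=lambda e: e[1])  # ISO timestamps sort as text
    return [tower for tower, _ in groupby(e[2] for e in ordered)]


def aggregate_flows(all_records):
    """Count tower-to-tower moves across all subscribers, dropping the IDs."""
    by_user = {}
    for rec in all_records:
        by_user.setdefault(rec[0], []).append(rec)
    flows = Counter()
    for events in by_user.values():
        seq = tower_sequence(events)
        flows.update(zip(seq, seq[1:]))  # consecutive (origin, destination) pairs
    return flows


trajectory = tower_sequence(records)
flows = aggregate_flows(records)
print(trajectory)
print(dict(flows))
```

The aggregated `flows` table carries no subscriber identifiers, which is the kind of population-level output that could leave the operator's systems.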
335 00:34:35,620 --> 00:34:42,760 And so the only thing that we work with within Southampton are aggregated data sets sort of at the population level. 336 00:34:42,760 --> 00:34:50,190 So that's designed to sort of help keep the confidentiality and privacy of these data secure. 337 00:34:50,190 --> 00:34:57,600 So here in Namibia, here's the picture that we have: we have on the right the observed burden from the Malaria Atlas Project. 338 00:34:57,600 --> 00:35:01,500 We know the mobility patterns across the entire country. 339 00:35:01,500 --> 00:35:08,680 And we want to sort of back calculate and figure out where people actually got infected based on that map. 340 00:35:08,680 --> 00:35:16,810 To do this, we use a mathematical transmission model, essentially a metapopulation mathematical model where we had a bunch of different patches. 341 00:35:16,810 --> 00:35:23,140 Each patch was a cell tower. People spent time in other patches based on those mobile phone data. 342 00:35:23,140 --> 00:35:28,780 And on this landscape, we took the proportion of people that were infected in each cell tower 343 00:35:28,780 --> 00:35:35,320 area and essentially said that that depends on the transmission in each area, 344 00:35:35,320 --> 00:35:44,470 how quickly people recover in each area and the time spent across that landscape, to try to back calculate the R0. 345 00:35:44,470 --> 00:35:48,670 So in this picture, we know three things: we know the proportion of people infected, 346 00:35:48,670 --> 00:35:52,840 we can assume the recovery rate and we know the time spent from the mobile phone data. 347 00:35:52,840 --> 00:35:57,730 Based on this, we try to infer the actual transmission rates that occur in each patch. 348 00:35:57,730 --> 00:36:08,770 Based on that entire landscape.
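The back calculation described above can be illustrated with a deliberately simplified patch model. This is a sketch under stated assumptions, not the model used in the study: it assumes that residents of patch i are at equilibrium, that they experience a force of infection h_j while in patch j, and that (1 - x_i) * sum_j p_ij * h_j = r * x_i, where x_i is the proportion infected, p_ij the share of time spent in patch j and r the recovery rate. Knowing x, p and r, the h_j follow from a linear system. The function names and all numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ h = b by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    h = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][c] * h[c] for c in range(i + 1, n))
        h[i] = (m[i][n] - known) / m[i][i]
    return h


def infer_force_of_infection(prevalence, time_spent, recovery_rate):
    """Back-calculate per-patch force of infection from resident prevalence."""
    rhs = [recovery_rate * x / (1.0 - x) for x in prevalence]
    return solve_linear(time_spent, rhs)


# Hypothetical two-patch landscape: patch 0 has transmission, patch 1 has none.
recovery_rate = 1.0 / 200.0             # assumed: infections last about 200 days
time_spent = [[0.9, 0.1], [0.1, 0.9]]   # row i: where residents of patch i spend time
true_force = [0.001, 0.0]               # per-day infection rate while in each patch

# Forward step: the prevalence a health system would observe among residents.
exposure = [sum(p * h for p, h in zip(row, true_force)) for row in time_spent]
prevalence = [f / (f + recovery_rate) for f in exposure]

# Backward step: recover where transmission actually happens.
inferred = infer_force_of_infection(prevalence, time_spent, recovery_rate)
print([round(x, 4) for x in prevalence])
print([round(h, 6) for h in inferred])
```

In this toy landscape, residents of patch 1 show some infections even though no transmission happens there; the inversion attributes those infections back to patch 0. With real data the solved values can come out negative, so a constrained fit would be needed.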
So if we do that and we don't actually account for movements, here we have sort of an estimate of R0, 349 00:36:08,770 --> 00:36:16,000 which is whether or not malaria has self-sustaining transmission in that given area. 350 00:36:16,000 --> 00:36:20,380 So an area in red means that there are enough mosquitoes that you have self-sustaining transmission. 351 00:36:20,380 --> 00:36:26,170 Any areas in blue mean that there are actually cases there, but they're mostly getting imported. 352 00:36:26,170 --> 00:36:31,900 There's actually not enough self-sustaining transmission to keep the parasite going within that population. 353 00:36:31,900 --> 00:36:35,470 So if we don't account for movement, then we get a picture that makes sense. 354 00:36:35,470 --> 00:36:43,600 Any area throughout Namibia that has any cases has to have self-sustaining transmission because there's no importation going on. 355 00:36:43,600 --> 00:36:48,010 Once we plug in this mobile phone data and account for movement, we get something really different. 356 00:36:48,010 --> 00:36:53,170 So we get a much more heterogeneous picture of transmission where the red areas actually have lots of 357 00:36:53,170 --> 00:37:01,090 mosquitoes and are parasite sources and are exporting cases to blue areas, which are parasite sinks. 358 00:37:01,090 --> 00:37:06,760 And if we isolated them, then they would actually not have enough mosquitoes to sustain transmission. 359 00:37:06,760 --> 00:37:16,600 So this is a really practical picture for malaria control programmes, where the parasite sources are areas where you'd want to do vector control. 360 00:37:16,600 --> 00:37:21,640 Those are areas where you want to damp down the mosquito populations and reduce transmission, 361 00:37:21,640 --> 00:37:25,990 whereas in the blue, we already have pretty low mosquito populations. 362 00:37:25,990 --> 00:37:31,480 All we want to do is stop disease spread and importation.
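The point that every patch with cases looks self-sustaining when movement is ignored can be checked with a short worked example. It assumes a textbook single-patch model, dx/dt = beta * x * (1 - x) - r * x, whose endemic equilibrium is x = 1 - 1/R0 with R0 = beta / r, so R0 = 1 / (1 - x). That model, the function names and the "mobility-adjusted" values are illustrative assumptions, not figures from the Namibia analysis.

```python
def r0_without_movement(prevalence):
    """R0 implied by equilibrium prevalence in an isolated patch: 1 / (1 - x)."""
    return 1.0 / (1.0 - prevalence)


def label_patch(r0):
    """Source if transmission is self-sustaining (R0 > 1), otherwise sink."""
    return "source" if r0 > 1.0 else "sink"


# Hypothetical observed prevalence in two patches.
observed = [0.10, 0.02]

# Ignoring movement: any patch with cases gets R0 > 1, so all look like sources.
r0_isolated = [r0_without_movement(x) for x in observed]
labels_no_movement = [label_patch(r0) for r0 in r0_isolated]

# Hypothetical mobility-adjusted estimates for the same two patches.
r0_adjusted = [1.8, 0.4]
labels_with_movement = [label_patch(r0) for r0 in r0_adjusted]

print(labels_no_movement)
print(labels_with_movement)
```

Once importation is allowed to explain some of the cases, a patch can have cases and still fall below the R0 = 1 threshold, which is what separates the red and blue areas on the map.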
363 00:37:31,480 --> 00:37:37,900 So that might be places where control programmes would do things like surveillance, 364 00:37:37,900 --> 00:37:46,500 track people at health facilities, go out and test people actively, things like that. 365 00:37:46,500 --> 00:37:53,580 So on local scales, you know, we can use fine scale mobility data to figure out how populations interact with their environment, 366 00:37:53,580 --> 00:38:01,260 see what that means for the implicit risk, the implicit transmission landscape. In the particular case of Namibia, 367 00:38:01,260 --> 00:38:06,720 I think that there's a way to move forward there: because we did base this on mobile phone data, 368 00:38:06,720 --> 00:38:11,910 it is limited by the resolution of the cell tower. And there's also no sociodemographic information. 369 00:38:11,910 --> 00:38:17,310 So with these CDRs, we only have sort of anonymous location data. 370 00:38:17,310 --> 00:38:21,090 We don't know the gender of those people underlying it. We don't know their age. 371 00:38:21,090 --> 00:38:26,100 We don't know anything else about anything that would tell us about their malaria risk. 372 00:38:26,100 --> 00:38:30,630 And that would be a really interesting way forward, other data sets that could inform that. 373 00:38:30,630 --> 00:38:42,040 So, you know, there are other data sets to actually get towards those specific questions, to include sort of that sociodemographic information. 374 00:38:42,040 --> 00:38:44,710 GPS tracker data are a really useful way to do that. 375 00:38:44,710 --> 00:38:50,980 So in this case, you actually give people GPS trackers, get their really fine scale mobility patterns, 376 00:38:50,980 --> 00:38:56,650 but also get their sociodemographic information. See how that correlates with the mobility. 377 00:38:56,650 --> 00:39:02,440 See where people spend time based on things like occupation, gender and age. 378 00:39:02,440 --> 00:39:11,170 So this is a study from a Ph.D.
student in our group where she looked at exposure in Kenya to different land cover, 379 00:39:11,170 --> 00:39:18,640 how that varied with seasons and things like that, and with or without livestock. So for example, 380 00:39:18,640 --> 00:39:25,930 you see pretty different patterns in terms of how people move when they have livestock, in blue, versus without livestock. 381 00:39:25,930 --> 00:39:31,330 People with livestock move further and are more likely to end up in 382 00:39:31,330 --> 00:39:37,980 farmland areas and grassland areas because that's where the ruminants feed. 383 00:39:37,980 --> 00:39:40,890 Google location history data are a really interesting way to do that, too. 384 00:39:40,890 --> 00:39:50,520 So this is what you'll be looking at, where we can say, for example, if we want to look at a population and figure out who's using green spaces, 385 00:39:50,520 --> 00:39:57,720 how often they go there and which green spaces are actually used, we can overlay those data with green space maps. 386 00:39:57,720 --> 00:40:05,580 So this is our pilot study in Southampton, where we looked to see which green spaces are used the most often by a university population, 387 00:40:05,580 --> 00:40:10,410 which might be useful to say, OK, these are the ones that need to be maintained. 388 00:40:10,410 --> 00:40:16,860 They need to be invested in, whereas for other ones we need to figure out why people aren't using them, whether or not there's a way to get them to be used 389 00:40:16,860 --> 00:40:23,640 more, things like that. So I'll move on then to go to the mesoscale, to go sort of a step up. 390 00:40:23,640 --> 00:40:27,000 Right now, we've been talking about really local scale, you know, 391 00:40:27,000 --> 00:40:34,590 how does your risk and your exposure to a disease depend upon your really fine scale movement dynamics?
392 00:40:34,590 --> 00:40:44,100 Then we'll now move to sort of the subnational picture to say, how does it affect metrics of, say, 393 00:40:44,100 --> 00:40:50,250 incidence and things like that when you measure them across the entire country at the health facility level? 394 00:40:50,250 --> 00:40:57,600 So the general idea behind this is that you don't just get sort of day to day dynamics, you get also seasonal dynamics. 395 00:40:57,600 --> 00:41:02,610 Farm workers, for example, will move between different parts of the country seasonally. 396 00:41:02,610 --> 00:41:11,580 People aren't always at home. You want to target resources based on the timing, based on the months that you're actually doing those efforts. 397 00:41:11,580 --> 00:41:15,310 We might want to think about whether or not those mobility patterns correlate with transmission. 398 00:41:15,310 --> 00:41:22,110 So like, you know, in terms of malaria: malaria is a disease that occurs during the wettest months of the year. 399 00:41:22,110 --> 00:41:27,720 Where are people spending their time in the wettest months of the year? And is that in the high risk areas? 400 00:41:27,720 --> 00:41:30,810 So to look at this, we look at Namibia again. 401 00:41:30,810 --> 00:41:39,630 This is a slightly larger data set of mobile phone data where it spanned from 2010 to 2014, to get a better picture 402 00:41:39,630 --> 00:41:46,410 of the seasonal population dynamics throughout the entire country and how it varies across different months. 403 00:41:46,410 --> 00:41:51,750 So if we just plot this, we see some really interesting patterns. So this is Namibia again. 404 00:41:51,750 --> 00:41:57,360 This is population change from month to month. So in the centre, we have Windhoek. In the north, 405 00:41:57,360 --> 00:42:06,450 we have a more rural area, but some populations again in the north centre and then in the northeast in Zambezi. 406 00:42:06,450 --> 00:42:12,630 If we look at sort of.
To December into November. 407 00:42:12,630 --> 00:42:20,460 Or rather, November into December, we see that people leave Windhoek, they go up north into the north part of Namibia. 408 00:42:20,460 --> 00:42:26,310 This is because Christmas is a pretty big holiday. People are visiting family, things like that. 409 00:42:26,310 --> 00:42:30,900 And then December to January, we actually see the opposite pattern where people come back from the north, 410 00:42:30,900 --> 00:42:41,250 go back to work in Windhoek, and it just so happens that this pattern correlates really strongly with the malaria transmission season. 411 00:42:41,250 --> 00:42:45,120 The peak of the malaria transmission season occurs in sort of March, 412 00:42:45,120 --> 00:42:53,820 but it starts to ramp up around January, so you can envision that people get infected up north and then bring it back into Windhoek with them. 413 00:42:53,820 --> 00:43:01,950 But then if we look at later months in the year, we see more people coming back into work in February and then it sort of evens out after that. 414 00:43:01,950 --> 00:43:05,550 So that's the main pattern that we see in that country. 415 00:43:05,550 --> 00:43:14,460 But we can take these data and we can say, OK, we know the relative population sizes in each of these cell phone catchments. 416 00:43:14,460 --> 00:43:22,170 We want to use that to scale the denominator that we put when we measure incidence of malaria across the entire country. 417 00:43:22,170 --> 00:43:28,770 So if we look at various health facilities throughout the country, here's one in Windhoek, we see that exact pattern. 418 00:43:28,770 --> 00:43:37,590 So in December, we actually have fewer people around that health facility than in other months. 419 00:43:37,590 --> 00:43:47,970 And then sort of as the months move on into February and March, we get more and more people and we see the opposite pattern up north.
420 00:43:47,970 --> 00:43:51,690 Again, people go up north for holiday and then they come back. 421 00:43:51,690 --> 00:43:55,290 So for example, here in this health facility up north, 422 00:43:55,290 --> 00:44:02,340 we have a much larger population than we expect and that affects our incidence estimates, and we want to be 423 00:44:02,340 --> 00:44:08,310 able to account for that when we calculate how many people are actually getting infected per thousand. 424 00:44:08,310 --> 00:44:11,820 The other thing that happens in this data set is that it does increase over time, 425 00:44:11,820 --> 00:44:16,770 so there are more and more subscribers, so we have to scale those populations based on that. 426 00:44:16,770 --> 00:44:20,190 But then essentially what we can do, using seasonal population 427 00:44:20,190 --> 00:44:30,000 estimates with case counts at each health facility, is get a new map of incidence throughout the entire country that is specific to each month. 428 00:44:30,000 --> 00:44:35,580 So we see here the ramping up of malaria in January, February and March. 429 00:44:35,580 --> 00:44:43,590 We get really high numbers of cases and then that sort of drops off into April, May and the rest of the year, it's really low. 430 00:44:43,590 --> 00:44:48,120 So by mapping where those populations are in each month and how many people are in each 431 00:44:48,120 --> 00:44:52,950 cell phone tower catchment during those months, 432 00:44:52,950 --> 00:45:02,580 we can get a better picture of these actual estimates of cases per thousand across the entire year across the entire landscape.
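The denominator adjustment described above comes down to simple arithmetic, sketched here with made-up numbers. The assumption in this sketch is that each catchment's share of all active subscribers in a month approximates its share of the national population in that month; dividing by the national subscriber total is also what absorbs subscriber growth over time. The function names and every figure are hypothetical.

```python
def seasonal_population(national_population, catchment_subscribers, total_subscribers):
    """Estimate people present in a catchment from its share of active subscribers."""
    share = catchment_subscribers / total_subscribers
    return national_population * share


def incidence_per_thousand(cases, population):
    """Cases per 1,000 people present during the month."""
    return 1000.0 * cases / population


national_population = 2_000_000   # hypothetical round figure
static_population = 20_000        # hypothetical census-style catchment population

# Hypothetical months for one health facility catchment:
# (cases, subscribers seen in the catchment, subscribers seen nationally).
months = {
    "December": (36, 9_000, 1_000_000),
    "January": (36, 13_200, 1_100_000),
}

seasonal = {}
for month, (cases, local_subs, all_subs) in months.items():
    population = seasonal_population(national_population, local_subs, all_subs)
    seasonal[month] = incidence_per_thousand(cases, population)

static = {m: incidence_per_thousand(c, static_population) for m, (c, _, _) in months.items()}
print(seasonal)
print(static)
```

With a fixed denominator the two months look identical, while the seasonal denominator shows a higher rate in the month when fewer people are present.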
433 00:45:02,580 --> 00:45:08,580 So we're trying to also operationalise this, so the seasonal population maps can improve targeting surveillance, 434 00:45:08,580 --> 00:45:11,070 so if we know what months people are actually at home, 435 00:45:11,070 --> 00:45:18,120 we can target those months and those homes so that we can actually spray everything over the course of the year. 436 00:45:18,120 --> 00:45:24,960 So this is work building sort of web dashboards, particularly with the Clinton Health Access Initiative and partners in southern Africa. 437 00:45:24,960 --> 00:45:30,120 So the last topic I'll talk about is going larger scale than that. 438 00:45:30,120 --> 00:45:35,790 So we've talked about the really local scale movement, how that affects exposure and risk. 439 00:45:35,790 --> 00:45:42,930 We've talked about the mesoscale, how that affects seasonal population estimates and our measurements of things like incidence. 440 00:45:42,930 --> 00:45:45,780 So now I'll talk about long distance mobility, 441 00:45:45,780 --> 00:45:55,570 sort of that rare migration based movement and what that means for reintroduction and migration of parasites on regional scales. 442 00:45:55,570 --> 00:46:01,730 So this is sort of migration movement across international borders. 443 00:46:01,730 --> 00:46:06,400 It's relatively rare; it has to do with reintroduction of malaria into countries. 444 00:46:06,400 --> 00:46:12,850 So here on the top right we're looking at Panama. Panama is really, really close to elimination, 445 00:46:12,850 --> 00:46:18,700 so they're trying to think about where are new cases going to come in from other countries after we eliminate. 446 00:46:18,700 --> 00:46:24,700 And here's a picture of where importation is occurring from sort of other countries 447 00:46:24,700 --> 00:46:34,240 throughout South America and how many actually happen in each province in the country.
448 00:46:34,240 --> 00:46:38,740 This is going to become more of an important issue as we have increased global connectivity around the world. 449 00:46:38,740 --> 00:46:48,190 So on the bottom right is a picture of sort of air travel. Each of these trips by plane might carry malaria parasites or other diseases with them. 450 00:46:48,190 --> 00:46:52,710 And so we're only going to think about this sort of issue more and more with time. 451 00:46:52,710 --> 00:47:01,710 So to do that, one of the most interesting things for policymakers is trying to figure out the countries and areas that are most closely connected. 452 00:47:01,710 --> 00:47:09,660 So, you know, for example, for Panama, what countries are most likely to import malaria into that country? 453 00:47:09,660 --> 00:47:16,380 If they can know that, they can know which national malaria control programmes they need to coordinate with, 454 00:47:16,380 --> 00:47:22,920 which countries they need to be able to best understand the migration in the context of. 455 00:47:22,920 --> 00:47:32,520 And so one sort of analysis that's been really useful is community detection analyses, where you take different countries, different units, 456 00:47:32,520 --> 00:47:40,150 see how they're connected and then partition them in a way that you get more movement within a community than between communities. 457 00:47:40,150 --> 00:47:48,300 So here across Africa, countries that are in the same colour are more likely to have connexions 458 00:47:48,300 --> 00:47:55,110 between them based on migration than countries that are in different colours. 459 00:47:55,110 --> 00:48:00,390 So, you know, initially this was done using census migration data. 460 00:48:00,390 --> 00:48:04,860 We want to be able to update this information. It was at the national level. 461 00:48:04,860 --> 00:48:10,200 We want to include subnational patterns and update this sort of information.
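One common way to score a partition in which there is more movement within a community than between communities is modularity. The sketch below computes weighted modularity for two candidate partitions of a tiny, made-up migration network. It is an illustration only: the talk does not say which community detection algorithm was used, flows are treated as undirected for simplicity, and the unit names and flow values are hypothetical.

```python
def modularity(flows, community):
    """Weighted modularity Q of a partition.

    flows: {(u, v): weight} for undirected links between units.
    community: {unit: community label}.
    Q = sum over communities of (within weight / m) - (community strength / 2m)^2.
    """
    total = 0.0        # m: total link weight
    strength = {}      # summed link weight attached to each unit
    within = {}        # link weight falling inside each community
    for (u, v), w in flows.items():
        total += w
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if community[u] == community[v]:
            label = community[u]
            within[label] = within.get(label, 0.0) + w
    by_community = {}
    for unit, s in strength.items():
        label = community[unit]
        by_community[label] = by_community.get(label, 0.0) + s
    return sum(
        within.get(label, 0.0) / total - (s / (2.0 * total)) ** 2
        for label, s in by_community.items()
    )


# Hypothetical migration flows between four units.
flows = {("A", "B"): 10.0, ("C", "D"): 10.0, ("B", "C"): 1.0}

grouped_by_flow = {"A": 1, "B": 1, "C": 2, "D": 2}   # keeps the strong links inside
grouped_badly = {"A": 1, "C": 1, "B": 2, "D": 2}     # splits the strong links

q_good = modularity(flows, grouped_by_flow)
q_bad = modularity(flows, grouped_badly)
print(round(q_good, 3), round(q_bad, 3))
```

A community detection routine searches over partitions for a high score of this kind; units that land in the same community are the ones whose elimination efforts would most plausibly need coordinating.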
462 00:48:10,200 --> 00:48:13,890 So to do that, we've combined various types of data. 463 00:48:13,890 --> 00:48:23,880 So we combine census data with mobile phone data across various settings to get sort of subnational predictions of movements, 464 00:48:23,880 --> 00:48:30,180 draw community structure based on that and then include international components. And when we do that, that's what this looks like. 465 00:48:30,180 --> 00:48:41,550 So this is on a regional scale sort of connectivity of populations using mobile phone data and census data in conjunction. 466 00:48:41,550 --> 00:48:47,010 And we can do things like predict sort of source and sink dynamics across that entire region. 467 00:48:47,010 --> 00:48:55,410 So areas in red are exporters of malaria and areas in blue are importers, areas that get lots of imported cases, 468 00:48:55,410 --> 00:48:59,460 and we can draw a similar community detection map across that entire region. 469 00:48:59,460 --> 00:49:03,990 But in a finer scale way, to say what countries, 470 00:49:03,990 --> 00:49:10,930 but also what subnational units, should coordinate their malaria elimination efforts. 471 00:49:10,930 --> 00:49:18,100 Another neat data set to do that, and this is fairly new, is road network data, so we can infer that, 472 00:49:18,100 --> 00:49:24,550 you know, if areas are connected really well by population movement, then they're likely to have roads that connect them. 473 00:49:24,550 --> 00:49:32,140 We can draw communities based on those shared road networks where, you know, here's a picture of roads across the region.
474 00:49:32,140 --> 00:49:40,870 We can say if intersections are nodes and the roads themselves are the connexions between the nodes, 475 00:49:40,870 --> 00:49:46,150 then we can draw a community structure around that to be able to plot a map 476 00:49:46,150 --> 00:49:51,730 of sort of communities across the region based on that road network as well. 477 00:49:51,730 --> 00:50:00,100 And so ideally, what we'll do is we'll take different types of data, like the community structure map based on this road network data, 478 00:50:00,100 --> 00:50:08,560 compare it against community structure maps from census data, from mobile phone data, 479 00:50:08,560 --> 00:50:12,550 see if they correlate and see if there are similarities, and use that to be able to say, 480 00:50:12,550 --> 00:50:17,680 OK, we're pretty certain that these communities are real and that they're ones that 481 00:50:17,680 --> 00:50:25,060 malaria control programmes or other ministries of health should take into account. 482 00:50:25,060 --> 00:50:31,180 So that's a broad picture of sort of how various types of new data are being used to inform human mobility, 483 00:50:31,180 --> 00:50:45,280 population dynamics and issues like risk mapping, incidence, and mapping regions of communities across continental scales. 484 00:50:45,280 --> 00:50:51,850 You know, there are lots of new data sets to do so, and I'd say we've only scratched the surface of doing so. 485 00:50:51,850 --> 00:50:58,810 GPS type data and satellite derived data have a really long way to go to being implemented in these contexts. 486 00:50:58,810 --> 00:51:04,720 So the last thing I want to end up with is talking about what I think the next steps are for this kind of research to push it forward, 487 00:51:04,720 --> 00:51:10,740 particularly in the context of social sciences and sociology. 488 00:51:10,740 --> 00:51:16,440 So that's, I think, the future of doing this kind of work.
Part of it has to do with generalising models. 489 00:51:16,440 --> 00:51:21,720 So in the places where we can actually get that fine scale picture of population dynamics, 490 00:51:21,720 --> 00:51:27,300 the gold standard data like mobile phone data or GPS data are really difficult to obtain. 491 00:51:27,300 --> 00:51:34,410 We want to be able to predict these patterns using more commonly found data like census data, 492 00:51:34,410 --> 00:51:39,240 to be able to predict these sorts of patterns in areas where we can't collect them. 493 00:51:39,240 --> 00:51:46,470 And I think the other really important picture that we need to move towards is including demographic and socioeconomic information. 494 00:51:46,470 --> 00:51:58,440 So like I was saying earlier, mobility datasets like mobile phone data, like Twitter data, like Facebook data, they're anonymized. 495 00:51:58,440 --> 00:52:07,020 So they deliberately don't give us any information on the demographics and socioeconomic context of the people that are moving around. 496 00:52:07,020 --> 00:52:13,860 This is obviously really critical. Certain subpopulations are going to be at really high risk of, say, malaria transmission. 497 00:52:13,860 --> 00:52:22,800 Farmworkers might spend more time outside, for example. Children under five are especially at risk from malaria because 498 00:52:22,800 --> 00:52:28,740 they're the most likely to get really adverse malaria related health effects. 499 00:52:28,740 --> 00:52:37,410 So wherever we can, I think getting sociodemographic and economic information alongside these mobility data will be really, 500 00:52:37,410 --> 00:52:40,440 really critical, but there are going to be ethical concerns along with that. 501 00:52:40,440 --> 00:52:45,880 So I think that that hopefully links to what you've talked about earlier today.
502 00:52:45,880 --> 00:52:53,150 Within World Pop, I'd say we've started moving in that direction in terms of just mapping static populations specifically. 503 00:52:53,150 --> 00:53:02,740 So for example, there are people within World Pop mapping women of childbearing age in 2010 across Africa, 504 00:53:02,740 --> 00:53:08,350 using satellite derived data and other types of data. 505 00:53:08,350 --> 00:53:12,970 So this is a picture on the left of women of childbearing age, but it's the same sort of issue. 506 00:53:12,970 --> 00:53:22,540 This is the first step. This is a picture of where these populations are at a really fine spatial scale. 507 00:53:22,540 --> 00:53:29,560 But we don't actually know how they're changing through time, how they're changing in terms of seasons and where they're spending time. 508 00:53:29,560 --> 00:53:38,410 So that's a really important, I think, missing picture. In the context of migration mapping, we haven't done that. 509 00:53:38,410 --> 00:53:43,810 All we really have are, you know, here's an areal unit. 510 00:53:43,810 --> 00:53:47,740 Here's where people might be moving, but we don't know anything about those people. 511 00:53:47,740 --> 00:53:54,160 And that's a really important missing link, I think, into the future. 512 00:53:54,160 --> 00:54:00,850 So, you know, in general, I'd say I focussed on malaria for most of my talk, but this is important for a lot of things 513 00:54:00,850 --> 00:54:04,150 beyond that. Getting the population denominators, 514 00:54:04,150 --> 00:54:13,360 getting human population densities right is a really important thing to do because it complements national data sources like censuses 515 00:54:13,360 --> 00:54:21,700 that only occur once in a while, that don't include things like seasonal movement, that don't include exposure based mobility.
516 00:54:21,700 --> 00:54:27,400 It influences things like infrastructure planning, measurement of health metrics and targeting of effort. 517 00:54:27,400 --> 00:54:34,690 Things like, where are you going to put your vector control efforts? And those denominators change over time. 518 00:54:34,690 --> 00:54:38,830 And they actually vary in terms of associated demographics. 519 00:54:38,830 --> 00:54:48,800 So we need new data to inform this and create those links that are currently missing between things like mobility and sociodemographics. 520 00:54:48,800 --> 00:54:54,920 So that's all I wanted to talk about. Now I wanted to move into sort of a more interactive exercise, 521 00:54:54,920 --> 00:54:59,870 so what I wanted you all to be able to do is look at your own Google location history data, the basics. 522 00:54:59,870 --> 00:55:01,640 So what I wanted to do. 523 00:55:01,640 --> 00:55:09,200 So using the code that I provided, the things you'll do will be looking at the relative amount of time spent in London across various days, 524 00:55:09,200 --> 00:55:18,050 across various hours, the locations that are visited, the green spaces that you potentially visit. 525 00:55:18,050 --> 00:55:23,450 And in terms of hopefully being useful in terms of take home stuff, 526 00:55:23,450 --> 00:55:29,390 this exercise will involve interacting with your own location data, downloading it, seeing what that process looks like, 527 00:55:29,390 --> 00:55:38,630 just seeing actually what those data look like for you and what that would mean in terms of what you're collecting from other people, and 528 00:55:38,630 --> 00:55:48,380 how we load those data into R, working with those spatial data within R and then plotting these data with the ggplot2 R package. 529 00:55:48,380 --> 00:55:50,810 So, you know, hopefully that's useful for some people.
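[Editor's sketch] The exercise described above runs in R with the speaker's own code, which is not reproduced in this transcript. As a rough illustration of the "points in London per day of the week" step only, here is a minimal Python sketch. The record field names (`timestampMs`, `latitudeE7`, `longitudeE7`) follow one older layout of the location history export and are an assumption here, the bounding box for London is approximate, and the sample records are invented.

```python
import calendar
from collections import Counter
from datetime import datetime, timezone

# Rough bounding box for Greater London (an assumption for illustration only).
LONDON_BOX = {"lat_min": 51.28, "lat_max": 51.70, "lon_min": -0.51, "lon_max": 0.33}

def to_ms(year, month, day, hour):
    """Build a millisecond UTC timestamp string, as the assumed export stores it."""
    moment = datetime(year, month, day, hour, tzinfo=timezone.utc)
    return str(int(moment.timestamp() * 1000))

def in_london(record):
    """True if the point falls inside the rough London bounding box."""
    lat = record["latitudeE7"] / 1e7  # coordinates are stored as degrees * 10^7
    lon = record["longitudeE7"] / 1e7
    return (LONDON_BOX["lat_min"] <= lat <= LONDON_BOX["lat_max"]
            and LONDON_BOX["lon_min"] <= lon <= LONDON_BOX["lon_max"])

def points_per_weekday(records):
    """Count location points inside London for each day of the week (UTC)."""
    counts = Counter()
    for record in records:
        if not in_london(record):
            continue
        when = datetime.fromtimestamp(int(record["timestampMs"]) / 1000, tz=timezone.utc)
        counts[calendar.day_name[when.weekday()]] += 1
    return counts

# Invented sample: three points in central London, one in Southampton.
sample = [
    {"timestampMs": to_ms(2018, 6, 15, 12), "latitudeE7": 515074000, "longitudeE7": -1278000},
    {"timestampMs": to_ms(2018, 6, 15, 15), "latitudeE7": 515033000, "longitudeE7": -1196000},
    {"timestampMs": to_ms(2018, 6, 16, 11), "latitudeE7": 515100000, "longitudeE7": -1340000},
    {"timestampMs": to_ms(2018, 6, 18, 9), "latitudeE7": 509097000, "longitudeE7": -14044000},
]

weekday_counts = points_per_weekday(sample)
```

With a real export, the records would come from the downloaded JSON file rather than an in-memory list, and a proper city boundary polygon would be better than a bounding box.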
530 00:55:50,810 --> 00:55:58,910 I'm not sure of the span of experience in terms of working with these sorts of things, but essentially the code that you'll be working with will be this. 531 00:55:58,910 --> 00:56:07,490 So these are plots from my data. You'll be plotting the number of points across different days of the week that you spend in London. 532 00:56:07,490 --> 00:56:11,300 So you can see for me, I tended to spend my days in London on 533 00:56:11,300 --> 00:56:17,870 Fridays and Saturdays when I was going for fun, Mondays and Wednesdays when I was going for not fun. 534 00:56:17,870 --> 00:56:23,150 But then I also tended to go during the day, so day trips to London, and you'll see these patterns in your data, 535 00:56:23,150 --> 00:56:33,020 hopefully, if you have them. And then this is what it looks like for me in terms of where I spent time across the entire city of London. 536 00:56:33,020 --> 00:56:37,760 So in red, that's where I tended to spend time and, you know, I'm a tourist. 537 00:56:37,760 --> 00:56:44,060 Being an American, I spent a lot of time visiting all the tourist areas, so central London was pretty popular, 538 00:56:44,060 --> 00:56:52,640 but then also specifically subsetting the points that are actually within green spaces across the city, here on the right. 539 00:56:52,640 --> 00:57:00,350 You can see it on the left one here. I can't see it quite so well on the right, but essentially, you know, the popular parks within the central London area, 540 00:57:00,350 --> 00:57:05,140 that's where I went because I was being a tourist. So. 541 00:57:05,140 --> 00:57:11,330 So I'll leave this up for you. Basically, what you've got is R code to process the data. 542 00:57:11,330 --> 00:57:14,840 These are the steps to actually access and download your own data.
543 00:57:14,840 --> 00:57:22,370 You'll go to Google Takeout and log in as needed and then select, you know, these are the basic steps, 544 00:57:22,370 --> 00:57:27,800 go to "select the data to include"; all you want to download are your location history data. 545 00:57:27,800 --> 00:57:36,530 Download that archive. It'll take a few minutes to download. It'll send you an email when it's done, and then it saves it as a JSON file, 546 00:57:36,530 --> 00:57:47,000 which we'll load into R using the fromJSON function out of the jsonlite package. 547 00:57:47,000 --> 00:57:52,190 If you don't have data or if you don't feel like accessing your own, I've actually provided you a subset of my own data. 548 00:57:52,190 --> 00:57:58,610 So if you do that, you'll see in the code you can plot that, and that creates the plots that I just showed. 549 00:57:58,610 --> 00:58:03,600 So I figured for this last 30 minutes, I'll be around to answer questions that you have. 550 00:58:03,600 --> 00:58:08,930 But also, you guys can sort of hopefully have a little interactive demonstration of these location data. 551 00:58:08,930 --> 00:58:20,060 So thanks. Yeah, sure, yeah. 552 00:58:20,060 --> 00:58:28,130 You guys have any questions? The microphone. 553 00:58:28,130 --> 00:58:34,640 And is there a reason why you only focus on the low and middle income countries? Because I guess there are a lot of applications. 554 00:58:34,640 --> 00:58:41,840 Yeah, yeah. No, I think that our focus on low and middle income countries is purely based on 555 00:58:41,840 --> 00:58:49,430 the premise that I initially got interested in malaria because I was interested in disease ecology. 556 00:58:49,430 --> 00:58:56,150 Malaria is a nice one for that.
And then malaria tends to be a low and middle income country based disease, and across World Pop 557 00:58:56,150 --> 00:59:01,160 that has to do with sort of our partnerships with the UN and the WHO. 558 00:59:01,160 --> 00:59:06,350 But you're right, there's a lot of really fascinating potential applications of these data. 559 00:59:06,350 --> 00:59:12,740 So I was talking with somebody up at Liverpool, for example, who's studying the health effects of owning dogs. 560 00:59:12,740 --> 00:59:17,840 And we talked a little bit about getting Google location history data from dog owners, 561 00:59:17,840 --> 00:59:22,790 seeing if they walked more after actually getting those dogs than before. 562 00:59:22,790 --> 00:59:29,990 You know, air pollution is another really exciting, I think, potential application, seeing how it's a big issue throughout the UK, 563 00:59:29,990 --> 00:59:37,310 seeing where that air pollution exposure happens, who it happens to and when that happens. Those are really exciting applications, too. 564 00:59:37,310 --> 00:59:48,340 I mean, I can't say I've done too much along those lines, but it is a really exciting avenue as well. 565 00:59:48,340 --> 01:00:00,990 Wait, wait. OK. 566 01:00:00,990 --> 01:00:07,410 My question is more to do with the majority of the talk you spent on straight line distance measurements. 567 01:00:07,410 --> 01:00:11,850 Yeah. And they're actually not that good. Yeah. If there's a mountain range in between, then, 568 01:00:11,850 --> 01:00:16,200 yeah, a straight line is missing everything. Yep. So a straight line is nice. 569 01:00:16,200 --> 01:00:18,360 Yeah, it's very coarse. Yeah, right. 570 01:00:18,360 --> 01:00:26,370 And given that there's not much information on specific mobility patterns in developing countries, using things like ruggedness or slope, 571 01:00:26,370 --> 01:00:35,790 but you can only compute with edges.
Yeah, you could get a much better idea of specific connectivity that's not measured in straight line. 572 01:00:35,790 --> 01:00:39,870 Yeah. Patterns. And I thought in the end, I assume, is that an awful map? 573 01:00:39,870 --> 01:00:45,720 The road map is that one from the North? Yeah, yeah. Yeah. So that might be nice to compute this. 574 01:00:45,720 --> 01:00:52,470 Yeah, with to get some time of travel distance felt and travel time to get a better measure of connectivity. 575 01:00:52,470 --> 01:00:58,200 Yeah, that's one. And then the other question is, this is more exploratory. 576 01:00:58,200 --> 01:01:03,750 Can you actually check mobility of mosquitoes rather than you go on the other side? 577 01:01:03,750 --> 01:01:05,110 Yeah, yeah. Yeah. 578 01:01:05,110 --> 01:01:14,640 So to the first one, I think that is really, really important, and in some of the work in terms of creating good models of how people are moving around, 579 01:01:14,640 --> 01:01:17,220 we are starting to try to incorporate that sort of information. 580 01:01:17,220 --> 01:01:28,050 So like, you know, taking accessibility surfaces, looking at the shortest distance between areas and how long that takes, things like that. 581 01:01:28,050 --> 01:01:33,630 But actually beyond that, I think that's where, if you can get really fine scale data on top of that sort of satellite imagery, 582 01:01:33,630 --> 01:01:36,000 you can get an even better understanding. 583 01:01:36,000 --> 01:01:41,340 So for example, what I've done so far with those road networks is assumed 584 01:01:41,340 --> 01:01:47,250 that the speed limit basically tells you how fast you can move on that road, which is really not realistic. 585 01:01:47,250 --> 01:01:52,680 I mean, there's rush hour, when a lot of the time you go slower. Sometimes there are roads that people go a lot faster on.
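[Editor's sketch] The point raised above, that travel time along roads can differ sharply from straight-line or shortest-distance measures, can be illustrated with a small shortest-path example. This is not the speaker's code: the road segments, lengths and speed limits are invented, and, as the speaker notes, treating the speed limit as the actual travel speed is itself a simplifying assumption.

```python
import heapq

def fastest_travel_time(roads, start, goal):
    """Dijkstra's algorithm over road segments weighted by travel time.

    `roads` is a list of (node_a, node_b, length_km, speed_limit_kmh)
    tuples. Each segment is treated as two-way, and its cost is
    length / speed limit, in hours. Returns None if goal is unreachable.
    """
    graph = {}
    for a, b, length_km, speed_kmh in roads:
        hours = length_km / speed_kmh
        graph.setdefault(a, []).append((b, hours))
        graph.setdefault(b, []).append((a, hours))

    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        hours, node = heapq.heappop(queue)
        if node == goal:
            return hours
        if hours > best.get(node, float("inf")):
            continue  # stale queue entry, a faster route was already found
        for neighbour, cost in graph.get(node, []):
            candidate = hours + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None

# Hypothetical intersections: the direct A-C road is shorter (15 km)
# but slow, while the route through B is longer (20 km) but faster.
roads = [
    ("A", "B", 10.0, 50.0),
    ("B", "C", 10.0, 50.0),
    ("A", "C", 15.0, 30.0),
]

hours_a_to_c = fastest_travel_time(roads, "A", "C")
```

Replacing the speed limits with speeds implied by observed location points, as discussed in the answer that follows, would only change the per-segment cost, not the algorithm.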
586 01:01:52,680 --> 01:01:58,930 So I think, you know, one thing that I'm really interested in is if we can get those really fine scale data 587 01:01:58,930 --> 01:02:05,220 from smartphones and overlay the implied speed from those points on to the road network, 588 01:02:05,220 --> 01:02:13,080 we can even improve on those satellite derived data and get an even better picture of accessibility and things like that. 589 01:02:13,080 --> 01:02:18,120 You know, another thing there is, with those studies, usually people assume that you take the easiest route. 590 01:02:18,120 --> 01:02:22,800 That's not always the case. And so if you have, say, the mobility data that tells you, Oh, well, actually, 591 01:02:22,800 --> 01:02:31,050 they tended to go to this other city on the way because it has some, you know, gas stations or fun things to do, things like that. 592 01:02:31,050 --> 01:02:35,160 The second one, yeah, I haven't tried to do that much with tracking mosquitoes. 593 01:02:35,160 --> 01:02:41,520 There is, you know, the issue with mosquitoes. Oftentimes what people will do is try to do mark recapture studies. 594 01:02:41,520 --> 01:02:51,180 So like you put fluorescent dust on mosquitoes, see if you can catch them two or three days later, see how far they've gone. 595 01:02:51,180 --> 01:02:56,580 But then you can imagine you get like 0.01 percent of the mosquitoes that you released. 596 01:02:56,580 --> 01:03:03,390 Yeah, there are people doing that. I haven't personally worked with that. It seems like it's a challenging setting, but obviously it would be, 597 01:03:03,390 --> 01:03:09,690 it's the other side. Where I've worked with sort of the human movement dynamics parts, 598 01:03:09,690 --> 01:03:16,710 the other side of it is to think about the mosquito dynamics as well. So. 599 01:03:16,710 --> 01:03:22,150 Not able. Yeah. 600 01:03:22,150 --> 01:03:27,370 Yeah.
He would tell you something about the. 601 01:03:27,370 --> 01:03:28,930 Yeah, yeah, yeah, that's true. 602 01:03:28,930 --> 01:03:35,980 So there are people who are modelling, yeah, if you release, say, infertile mosquitoes, how far do you actually expect 603 01:03:35,980 --> 01:03:39,700 that effect to be dispersed, because they're going to move so 604 01:03:39,700 --> 01:03:44,320 far that it'll just be diffused across a really large mosquito population? Yeah, yeah. 605 01:03:44,320 --> 01:03:52,120 I haven't done that work. But there are people who are using, say, dispersal models, mathematical models to do that kind of thing as well. 606 01:03:52,120 --> 01:03:55,720 Yeah. Thanks a lot. 607 01:03:55,720 --> 01:03:56,860 That was fascinating. 608 01:03:56,860 --> 01:04:03,820 It struck me that with some of your data, for example, the satellite imagery and mapping data, so OpenStreetMap and Google Maps and so on, 609 01:04:03,820 --> 01:04:11,560 these are publicly available and kind of accessible to a large degree, but with call data records in particular and mobile phone coverage, 610 01:04:11,560 --> 01:04:13,060 yeah, these are often very protected. 611 01:04:13,060 --> 01:04:19,570 So if you could speak a bit more about future initiatives to make these more available, if these exist. Yeah, yeah.
612 01:04:19,570 --> 01:04:28,150 No, I think that's why, where I was talking about generalising the data, creating models that actually fit to these mobile phone data, 613 01:04:28,150 --> 01:04:28,600 for example, 614 01:04:28,600 --> 01:04:37,120 I think that's why that's so important, because even for us, even when we actually have an agreement in place with network operators, 615 01:04:37,120 --> 01:04:47,500 sometimes it can take months, years to have the legal part of it finalised, to be happy on both sides and actually get the effort, because, you know, 616 01:04:47,500 --> 01:04:51,700 it actually is that sort of one on one effort with the mobile operator to get those data. 617 01:04:51,700 --> 01:04:57,400 They are the ones who ultimately have to do it, and oftentimes you don't have the personnel support to do it. 618 01:04:57,400 --> 01:05:04,240 So there are cases where I've waited, well, there are data sets right now that I've waited over two years to get, 619 01:05:04,240 --> 01:05:07,810 just because it's difficult to get those links going and to make sure that they move on. 620 01:05:07,810 --> 01:05:12,760 So, you know, that is something that, yeah, I mean, 621 01:05:12,760 --> 01:05:23,620 it's something that I'd like to do in terms of, if we can choose a bunch of layers of satellite imagery, of other types of mobility data, 622 01:05:23,620 --> 01:05:27,370 get a reasonably strong prediction of these sorts of patterns in mobile phone data, 623 01:05:27,370 --> 01:05:37,450 then that really opens it up to other groups being able to work with it, for us being able to apply it to other settings and that sort of thing. 624 01:05:37,450 --> 01:05:53,120 Yeah, but it is a challenge. Even if you have those agreements in place, actually being able to do it still remains a challenge. 625 01:05:53,120 --> 01:06:06,880 Go.
Yeah, so like I said, I'll be hovering around. Hopefully, if you're playing around with that R code, it'll work and then be useful. 626 01:06:06,880 --> 01:06:16,840 Let me know if you have any questions as you work on it. 627 01:06:16,840 --> 01:06:26,680 Nick, I have a question, too, and I'm wondering if you've seen kind of prototype research that have used, 628 01:06:26,680 --> 01:06:32,410 I mean, you mentioned you've been using Google location history in some of your work. 629 01:06:32,410 --> 01:06:37,960 But do you think there are kind of good examples of prototypes of work where people have, for example, 630 01:06:37,960 --> 01:06:46,180 done a survey where they have asked and embedded location history or other kinds of sensor direct measurement? 631 01:06:46,180 --> 01:06:56,410 Yeah. And so in a sense combining, through this hybrid design, self-reported survey based measures along with these digital traces. 632 01:06:56,410 --> 01:07:04,900 Yeah, there's a really interesting study in Zambia where they gave people GPS trackers, 633 01:07:04,900 --> 01:07:10,240 and they gave them for one week each month across the course of the entire 634 01:07:10,240 --> 01:07:14,110 year so that they can actually get those seasonal patterns of movement, 635 01:07:14,110 --> 01:07:18,400 and then actually correlated it with things like the weather that occurred during that 636 01:07:18,400 --> 01:07:24,460 time to look at how much impedance there was introduced by rainfall in a given week. 637 01:07:24,460 --> 01:07:30,010 And then also, you know, the occupation of people and the sociodemographic information collected by 638 01:07:30,010 --> 01:07:37,330 the survey at the same time as they did the GPS sort of study.
639 01:07:37,330 --> 01:07:47,290 And so how those data correlated, you know, that's where I think a lot of that work has been, is like sort of 640 01:07:47,290 --> 01:07:52,160 personally carried GPS trackers that occur in a survey based design. 641 01:07:52,160 --> 01:07:58,810 I think the power of passively collected data is that it is a lot less research intensive, effort intensive. 642 01:07:58,810 --> 01:08:04,600 So for example, with these data, this is essentially the process that we have participants go through. 643 01:08:04,600 --> 01:08:09,250 They just go through and download it and then give it to us. They don't actually have to, say, carry anything. 644 01:08:09,250 --> 01:08:15,010 We don't have to talk to them initially and say, carry your phone for a week and then give it back to us at the end of that week. 645 01:08:15,010 --> 01:08:21,700 Whereas with personally carried GPS trackers, it's this thing of you give them the GPS tracker, you have to follow up with them, get it back. 646 01:08:21,700 --> 01:08:26,950 So I think, you know, that's where if we can take those sorts of methods that in the past have been 647 01:08:26,950 --> 01:08:32,140 done with GPS trackers and apply them to new data sets that are passively collected, 648 01:08:32,140 --> 01:08:40,780 then we can get much larger sample sizes and a much better sort of population level picture of these correlations between sociodemographics, 649 01:08:40,780 --> 01:08:44,693 economic status and mobility.