Today, we're excited to hear from Xiaowen Dong on network-based learning for understanding collective human behaviour. He's a lecturer in the Department of Engineering here at Oxford and a former postdoc at the MIT Media Lab, where he is still an affiliate. Xiaowen works with graphs to model the relational structure of data and to develop new techniques that sit at the intersection of machine learning, signal processing and complex networks. So it's kind of a cool opportunity: we've touched quite a bit, peripherally, on networks, but now we get to hear about them in more depth. And, speaking of networks, we're also lucky to be creating a little bit of triadic closure here, because at the MIT Media Lab he was in Sandy Pentland's lab, as were several friends and former members of SICSS in other years, and speakers this year, like Abdullah, who is speaking at Princeton. So we are connecting friends of a friend of a friend. So, yeah, thank you very much for coming.

Yeah, thank you. I hope this is OK and you can hear me. Thank you very much for the kind introduction and the invitation. It's a great pleasure to be here and to talk about some recent work that we did. I'm in Engineering here, which means I have an engineering background, so when I talk about computational social science I am definitely more on the computational side than on the social side. I will not pretend to be a social scientist, so if I make an imprecise statement, please forgive me. Nevertheless, I think it's good to present some recent work in which we apply computational approaches to understand collective human behaviour, which I hope will bring some fresh ideas from a more computational, learning-based perspective, complementary to the traditional approaches. For this reason, please feel free to interrupt during the presentation; if you have any questions or comments, I'll be happy to answer.

So, yes, this works. Since we are in this Summer Institute for Computational Social Science, I thought it would be good to first look at a few historical developments in applying computational methods to understand society and behaviour. You will see that many of these ideas actually originated in the field of physics. In the early 19th century, inspired by Newtonian physics, society was conceptualised as a machine, and Auguste Comte, regarded by many at that time as the father of sociology, defined the concept of social physics as the study of the laws of societies and the science of civilisation.
In fact, the term sociology was invented by Comte to designate this concept of social physics. And if we want to study the laws of society, then we need information, or what we call data today. At around the same time, to be precise in 1834, the Statistical Society of London was founded, which eventually became today's Royal Statistical Society, with the aim to procure, arrange and publish facts to illustrate the condition and the prospects of society. So that was the start of data collection. But in Victorian times, the way data were collected was basically to go to different villages in the country and gather somewhat disconnected and fragmented pieces of information. They were not good enough to illustrate the whole picture of society, let alone to build computational models to understand the behaviour of people. So that was the first wave of studies.

The second wave took place in the mid-20th century, and again the concept of social physics was reinvented. There was actually a research group called the Department of Social Physics at Princeton in the 1950s, which existed for 20 years or so. The main driver of this wave was that many social indicators had been found to have statistical regularities, such as the Zipf distribution or the gravity law. The Zipf distribution is one type of power-law distribution, popularised by the American linguist George Zipf, who studied the distribution of word frequencies in documents. He found that the frequency of a word in a document is inversely proportional to the rank of the word in the frequency table: the most frequent word appears twice as often as the second most frequent, and three times as often as the third most frequent. The same distribution has been discovered for other indicators, such as the populations of cities in a country, or the income distribution. The gravity law is another example, also proposed by Zipf, who studied inter-city movement in the United States. He discovered that the number of people who move between two communities in the United States is proportional to the product of the populations of the two communities, but inversely proportional to the transportation distance between them: something like P1 times P2 divided by D, hence the name gravity law.
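To make these two regularities concrete, here is how they are usually written (a sketch in standard notation; the symbols are mine, not from the talk's slides):

```latex
% Zipf's law: the frequency f of the word of rank r decays as
f(r) \propto \frac{1}{r},
\qquad\text{so } f(1) = 2\,f(2) = 3\,f(3) = \dots

% Gravity law: the flow of people T_{ij} between communities i and j,
% with populations P_i and P_j, separated by distance d_{ij}
T_{ij} \propto \frac{P_i \, P_j}{d_{ij}}
```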
More recently, a third wave has emerged, in which people have found statistical regularities also in human movement and communication. One example is the concept of triadic closure: how people form groups, join groups and form ties. If A is a friend of both B and C, then B and C are likely to become friends in the future. Other concepts are strong and weak ties, ideas originally proposed by Granovetter in the 70s. He argued that the links of a person in a social network can be divided into roughly two groups: strong ties, which are close friends, and weak ties, which are distant acquaintances. The weak ties in this case are essentially links that span the boundaries of different communities or groups, and they are believed to be essential for the exchange of information and the creation of new economic opportunities. As a consequence, people who maintain more weak ties are likely to be more successful in their careers. The same idea has been tested at the community level in a 2010 study by Eagle and colleagues: they looked at the communication network between communities across the whole of the UK and found a very strong correlation between the diversity of a community's communication patterns and its socio-economic status. The paper itself has attracted some criticism, but I think it's one of the first studies to test these ideas at the national level, which is good.

The limitation of this kind of work is that we mostly don't study the mechanism behind the social interactions, or the cause of the observed statistical regularities, which could be due to a lack of appropriate data or computational methods to do so. Both things have changed dramatically more recently. Since about ten years ago, we have had much more data about human behaviour: think about social media, or location data collected by pervasive technologies such as mobile phones and credit cards. One thing about these types of data is that they are passively collected, not obtained through a survey, so they potentially contain less personal bias, because the data are usually not collected while people are actively aware of being recorded. And, of course, they easily scale to the level of a whole population. At the same time, we obviously have more advanced computational methods; you have probably heard about machine learning or deep learning, the kinds of techniques that allow us to make the best out of the data we have. The combination of these two basically leads to computational social science, in my view, which brings a few new perspectives to social science studies.
We can obviously think about moving away from static information collected by surveys to the actual behaviour of people: from demographics, age and gender, to what people actually do. Also from a single person to a connected network: not just looking at individuals separately, but looking at how they behave collectively, at the relationships between different people, and in a dynamic way, meaning the relationships can evolve over time. And obviously, we can easily look at a much larger population. In terms of practical impact, most current population management solutions focus on demographics, individual records and static information. Think about how banks give you a credit limit, or how network providers decide about mobile phone contracts: it's mostly about your individual record and history, without looking at the network of people around you. The new way would be to look at the actual behaviour of people, collectively and dynamically. This, of course, needs more data and methods, because we need to understand the structures and the complexities better, and it also poses challenges in developing advanced computational methods in the context of social science research.

The topic today will mostly be this part, how we move from individual behaviour to collective behaviour, because I think that is important, and also interesting, to understand. I will focus on two studies, to which I will give more or less equal time. The first question is how a social network affects decision making. We all know that, for example, if I tell you about a restaurant, you may be more likely to go there sometime in the future. That's an influence. But how this influence propagates in a social network is an interesting and open question. Second, in many cases we don't observe the social network; we just observe the decisions taken by people. So it would be very interesting to see whether we can solve the reverse problem: we observe the actions, and we try to infer the connections between people, so that we can understand the communities, et cetera.

Let me first talk about the first question. The motivation here is that we all know that nowadays we live in a connected society: we are all connected within a few hops of acquaintances, and the idea of six degrees of separation was first demonstrated by Milgram's experiment in the 1960s.
What Milgram did was ask people living in Omaha, Nebraska, to send a letter to a target person in Massachusetts, with the rule that you could only forward the letter to one of the people you knew personally. In the end, I think about 25 per cent of the letters arrived, but among the letters that arrived, the average number of hops taken was about five and a half. That was one of the original experiments behind the idea of six degrees of separation. But we are not just connected in a society in a structural way; we also influence each other through our connections. There are several studies by Christakis and Fowler that look at the spreading of behaviours, certain types of phenomena, in the social network, such as obesity, happiness or smoking behaviour, and they find that these kinds of behaviour can spread two to three hops away from you. That's remarkable if you think about it: in some cases, that may be a person you don't even know.

What remains open is how this influence propagates in the social network: how my decision will be affected if I receive some information from you. This is especially true in the offline setting. There are a few studies in online settings, like Facebook and these kinds of platforms, but offline settings are difficult, because it's usually not easy to carry out a large-scale offline experiment, and we also need more advanced methods, compared with traditional RCT-based methods, to quantify the influence. That's basically the motivation of this study. We looked at a particular dataset: a mobile phone dataset collected during an international event, a Cirque du Soleil show. It's a performance event held in many countries; this one is in Andorra, in Europe, takes place every July, I think, and lasts for the whole month. We ask how social influence plays a role in individual decision making, in the sense that whether or not you attend this event is likely to be affected by whether you receive a phone call from an attendee, or maybe from somebody who has been contacted by an attendee. I will explain some basic statistics further on, but the setup we have is quite intuitive: we build an information cascade, which is basically the phone communication network for a given observational period; you can take one day if you like. The idea is that, let's say, Ann is the initial adopter: she attended the event.
Then, within this observational period, we observe that she makes phone calls to three other people, say Bob, Casey and Daniel. These people, after receiving the call from Ann, may in turn make calls to other people. This is the information cascade that we define, and we define the shortest distance from any person to the original adopter as the hop index. So Bob, Casey and Daniel are one hop away, and all the other people here are two hops away; the people who are never connected into this information cascade are the people outside it. In that sense, we consider all the people in the cascade to be influenced, and the people outside not to be influenced. Then we can observe whether the rest of the people make an adoption, in this case attend the event during the rest of the month. So this is the basic setup.
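As a rough illustration of this construction, here is a minimal sketch of computing hop indices from call records with a breadth-first search. The data layout is an assumption of mine, and for simplicity it treats all calls in the window as one static graph rather than enforcing the forward time ordering of real cascades:

```python
from collections import deque

def hop_indices(calls, initial_adopters):
    """Assign each person the shortest call-chain distance (hop index)
    from any initial adopter; people absent from the result are
    outside the cascade.

    calls: iterable of (timestamp, caller, callee) in the window
    initial_adopters: set of people observed at the event (hop 0)
    """
    neighbours = {}
    for _, caller, callee in calls:
        neighbours.setdefault(caller, set()).add(callee)

    hop = {p: 0 for p in initial_adopters}
    queue = deque(initial_adopters)
    while queue:
        person = queue.popleft()
        for contact in neighbours.get(person, ()):
            if contact not in hop:      # first visit = shortest distance
                hop[contact] = hop[person] + 1
                queue.append(contact)
    return hop
```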
But one technical challenge in this case is that decisions on adoption can usually be attributed to two factors. One is called homophily; the other is influence. Homophily basically means that friendship is more likely to be formed between people who share certain characteristics: if two people both like music, they are likely to form a link, rather than a person who likes music forming a link with a person who likes books. That's the basic idea of how links are formed. So if we observe that these two people both attend, let's say this one is the attendee, and after the phone call the other person goes, then we are not really sure whether that person goes because of the phone call, because it might be that they are simply both interested in music by default. It's a bias caused by this concept of homophily. So if we really want to quantify the influence of a phone call, which is the treatment in our case, then we need to separate the two. This is the scenario we would like to get to: if everyone is matched based on their interests, so that by default they have the same level of interest in attending the event, then the phone call plays the role of affecting the decision. Note that what most studies, including ours, call influence is actually the combination of two things: first, exposure, so I tell you about the event; and second, my experience affecting what you do in the future. Usually, in most of the studies, we don't separate exposure from the actual influence, and that's also the case here.

OK, so to do this, our idea is to use something similar to a randomised experiment: we do matched sample estimation with propensity score matching. You might have heard about this concept. The basic idea is that the propensity score is the likelihood of being treated given a set of covariates. In this case, the treatment is receiving a phone call, and the covariates would be, let's say, the individuals' default preferences: how much they like music, or something else. The idea is to pair the individuals that have very similar propensity scores, so that we mimic the treatment assignment in a randomised experiment and remove selection bias. Typically, people are matched on demographic information, age, gender, this kind of stratification, and in this case we would like to explore some new options. What we look at is the revealed preferences of individuals through their mobility histories. There is a literature on revealed preferences in economics, especially in business and marketing, which says that your preferences are sometimes revealed by what you do, by the products that you buy, for example. In this case, we look at mobility histories, meaning the frequencies with which people visited different locations in the past, and we hope that the locations they visit capture their interests.

So is this because you didn't have information on observed demographics, or is this in addition to the observed demographics?

For us, we don't have demographics here, that's true. But second, demographics also have limitations: for example, they are static information, they are not dynamic. What we hope here is to see whether this kind of more dynamic representation of preferences can lead to a better matching-based method. Actually, I think we have partial demographic information in this case, maybe gender, I'm not sure, and we did some simple comparisons to see whether there was a difference or not. But the main motivation for using this kind of new covariate is, in some sense, to condition the matching on the actual past behaviour of the person, rather than on some static information like demographics. And obviously, individual mobility history is very...

So how do you decide how much further back in the past to go?

Oh yeah. In this case, we just look at the past six months. Here you can see these little cells; they are basically cell towers.
We just look at the frequency of the person visiting different places. You can see that the person on the left is more explorative, because he goes to different places with more or less equal frequency; the person on the right is more exploitative, because he keeps going back to a single location that he likes. This gives us an idea of how to link this type of behaviour, which is reflected in the past history, to the likelihood of doing something, in this case attending the event. But obviously there is no definitive answer as to how far back you should look in order to capture the genuine behaviour of the person.

For these kinds of spatial features, did you focus on weekend time? Because for the person on the left, maybe their job just takes them around.

It's mostly weekend mobility, yes, which we hope captures leisure-related activities.

At the same time, we also want to study the effect of influence with respect to the communication distance. I introduced the concept of the hop index, which is how far away you are, in some sense, from an attendee: maybe you don't receive the information directly, but you receive it from a friend who received a phone call from the person who attended, and the information gets propagated. That is in fact the main motivation of the study. For that reason, we have one treatment group for each hop index. That means that the people who received phone calls directly from an attendee form one treatment group, and we construct the corresponding control group from the people who are outside the cascade, conditioned on their propensity scores being similar, based on their past mobility history. We just pair them like that, and if we do that, then the difference in future adoption rates establishes an upper bound on social influence, after controlling for homophily.
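A minimal sketch of this matching step, assuming each person is represented by a vector of location-visit frequencies. The propensity model and the greedy one-to-one nearest-neighbour pairing shown here are a generic textbook recipe, not necessarily the exact estimator used in the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_on_propensity(X_treated, X_control, caliper=0.05):
    """Pair each treated person with the unused control person whose
    propensity score is closest, within a caliper.

    X_treated, X_control: rows of covariates, e.g. normalised
    frequencies of visits to each cell-tower location.
    """
    X = np.vstack([X_treated, X_control])
    y = np.r_[np.ones(len(X_treated)), np.zeros(len(X_control))]

    # propensity score: estimated P(treated | covariates)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    scores = model.predict_proba(X)[:, 1]
    s_t, s_c = scores[:len(X_treated)], scores[len(X_treated):]

    pairs, used = [], set()
    for i in range(len(s_t)):                  # greedy nearest neighbour
        candidates = [j for j in range(len(s_c)) if j not in used]
        if not candidates:
            break
        j = min(candidates, key=lambda j: abs(s_t[i] - s_c[j]))
        if abs(s_t[i] - s_c[j]) <= caliper:
            pairs.append((i, j))
            used.add(j)
    return pairs
```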
So wait, this only works if you have the call records from the same company for everyone, right? Like, if I call you and you call her, but she's not with our phone company, you can't see what she's doing afterwards.

Yeah, yeah, definitely. The hope is that, well, in this case, there is only a single network provider in the country, with a high penetration rate. In a country where you have more than one network provider, that would usually not be the case, because I think most providers would only have a market share of 20 or 30 per cent of the people. So it's larger scale than surveys, but it can still be biased in some ways.

So this is only based in Andorra, and in Andorra there's only one provider?

Mostly, yeah; we have people from France and Spain, but mostly Andorra.

OK. And connected to that: to do this, did these people give permission for you to have information on their call data?

Oh yeah. Well, the assumption is that when they signed the mobile phone contract, they gave permission for these kinds of studies. But I think that's a big question that we can discuss separately, because it's related to ethical concerns and privacy.

Yeah. And I'm quite interested: I think an implicit assumption you're making is that when somebody makes a phone call, we just assume they are talking about the event, right? Because you can never actually track the content, so you don't know whether they actually talked about the event or not.

That's right, we are just observing the calls, which is one of the limitations we can talk about at the end. We didn't carry out a survey, therefore we can't track the content of the calls.

So, yeah, it's an upper bound on social influence, because obviously there are many unobserved variables that may affect adoption, such as offline TV advertisements, or everything you see on the street, things like that. As I said, we have one treatment group for each hop index, because we want to understand whether the influence differs for people at different distances from the original attendee, and then we have, in effect, a treatment effect for each hop, as I just defined. What we see here is a curve where the x axis is the hop index, that is, the distance from the initial attendee, and the y axis is the relative difference: the difference in the likelihood of attending the event in the future, divided by the likelihood of attending for the people in the control group, which is basically estimated using a logistic regression.
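The y axis can be read as a relative lift in adoption probability for each hop group. The talk estimates it with a logistic regression; the simpler difference-in-rates version below, on assumed binary adoption arrays, conveys the same reading of the curve:

```python
import numpy as np

def relative_lift(adopted_treated, adopted_control):
    """Relative difference in future adoption rates between a matched
    treatment group (one hop index) and its control group. A value of
    1.5 means treated people are 150% more likely to adopt than their
    matched controls.

    adopted_*: binary arrays, 1 if the person later attended.
    """
    p_t = np.mean(adopted_treated)
    p_c = np.mean(adopted_control)
    return (p_t - p_c) / p_c

# one estimate per hop index gives the curve shown on the slide
```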
So this number here means that if you received a phone call directly from an attendee, you are one hundred and fifty per cent more likely to adopt in the future compared with a person who has not been influenced. We obviously see a positive effect of social influence, which is expected, and it is most dramatic for direct contact, which makes sense: if you receive a direct phone call, that presumably carries a strong influence. But we also see that the effect remains significant for people who are further away. We are mostly talking about the blue curve here for the moment: it remains significant even two, three or four hops away, which indicates that the influence of the initial adopter reaches far beyond their immediate circle. It can even influence people they probably don't know.

To understand the robustness of the results, we also did some tests. The first is to do exactly the same thing but without controlling for homophily, meaning we don't match people based on their past history. If you do that, you get this purple curve, the random matching, and you see the overestimation: it is almost twice as much, and the overestimation is mostly due to homophily. This is similar to the finding in this paper from 2009. The other thing we did is a shuffling test of the hop index: if a person has a direct contact, we randomly change that number, to anything from one to five or six, and after that we do the matching again. The purpose of doing this is to see whether some unobserved variables were related to the decay pattern that we see. The red curve shows that there is no such pattern, so the observed results are not likely to be driven mainly by something that we don't observe.
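A sketch of that shuffling test, with assumed array names: randomly permute the hop indices and recompute the per-hop adoption rates many times. If the original decay pattern were driven by something unrelated to network distance, it would survive the shuffle; the flat red curve indicates it does not:

```python
import numpy as np

def shuffled_hop_curve(hop_index, adopted, n_perm=1000, seed=0):
    """Permutation test for the hop-decay pattern: permute hop indices
    across individuals in the cascade and recompute the mean adoption
    rate at each hop, averaged over permutations.

    hop_index: integer array, hop index of each person in the cascade
    adopted:   binary array, 1 if the person later attended
    """
    rng = np.random.default_rng(seed)
    hops = np.asarray(hop_index)
    adopted = np.asarray(adopted)
    levels = np.unique(hops)
    curves = np.empty((n_perm, len(levels)))
    for p in range(n_perm):
        perm = rng.permutation(hops)
        curves[p] = [adopted[perm == h].mean() for h in levels]
    return levels, curves.mean(axis=0)
```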
To understand how these patterns decay, that is, what mechanism leads to this decay of social influence, we built a simple Bayesian model. The basic idea is that, in a communication cascade, an individual updates her estimate of the product. The product is defined broadly: whatever is being adopted, it can be a new app, a piece of music, or whether to attend the event or not. An individual updates her estimate of the product characteristics, what the event is like, things like that, after talking to the people around him or her, dynamically aggregating information from the neighbours. We then assume that this estimate, together with the person's own default preference, forms the person's evaluation of the event. If the person estimates that the event is, say, a pop music event with a huge crowd, and his own preference is that he likes these large-scale events, then he puts these two pieces of information together to form his own evaluation: this is an event I would like to go to. He then makes the adoption decision following a simple binary probability distribution. The estimation is done like this: Greg receives a phone call from Bob, and he updates his estimate based on Bob's evaluation and the fact that he knows Bob likes certain things. This keeps going along the cascade, and after a while we can look at the estimates of the event held by all the individuals, and see how the estimates change as we move further away from the initial adopter. What we measure is the difference in estimation between the individuals in the cascade and the initial adopter: the x axis is again the distance, the hop index, and the y axis is the mean squared difference between the evaluations. What you see is that this difference increases as the hop index increases. This could mean that the estimate of the event becomes less accurate and contains more noise, because individual estimates contain bias, and this bias accumulates along the information cascade; receiving information from multiple people also has an aggregation effect. In the end, people far from the adopter have a less accurate estimate of the event, which might be related to the declining pattern of the influence in this case. But this is still something we want to investigate further; it's a conjecture that we hope to test.
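To make the mechanism concrete, here is a toy version of this updating with Gaussian estimates, so the posterior has a closed form. The distributions, the noise scale and the chain structure are all my assumptions for illustration; the model in the study may aggregate information differently:

```python
import numpy as np

def gaussian_update(prior_mean, prior_var, signal, signal_var):
    """One Bayes update: posterior proportional to prior times
    likelihood; for Gaussians the posterior mean is a
    precision-weighted average of the prior and the received signal."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / signal_var)
    post_mean = post_var * (prior_mean / prior_var + signal / signal_var)
    return post_mean, post_var

# Relay an estimate of the event along a chain of calls: each person
# hears a noisy version of the previous person's estimate, updates,
# and passes their posterior mean on. Noise accumulates hop by hop,
# one candidate mechanism for the decaying influence.
rng = np.random.default_rng(1)
true_quality, relayed = 1.0, 1.0
for hop in range(1, 6):
    heard = relayed + rng.normal(scale=0.5)          # call adds noise
    mean, _ = gaussian_update(0.0, 1.0, heard, 0.25)
    print(f"hop {hop}: squared error {(mean - true_quality) ** 2:.3f}")
    relayed = mean
```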
So the main findings of this study: first, we use revealed preferences, instead of static demographics, as covariates to quantify influence in matching-based estimation. We think this can overcome a lot of difficulties in traditional RCT-based approaches, because sometimes, for privacy reasons, you don't have the covariates you would like, and sometimes you have them, but they remain static, so you can't capture the dynamic changes if there are any. The empirical finding is that we observe these long-range effects of social influence, which may have implications in scenarios like viral marketing or public health management, where you try to use online communication to influence people's decision making, these kinds of situations. The limitations are mostly that the identification of the initial adopters is not precise, because we don't really know whether a person attended or not; we just know that the person had some activity within a very small range of the venue of the event. We didn't actually ask the people, so we don't know whether they actually attended or just passed by.

To come back to this point about matching on static demographics versus the more dynamic characteristics: to some extent, with dynamic attributes there is a certain endogeneity there, because things that are changing could be changing because people are themselves being influenced, and we don't really observe that. So while I'm sympathetic to your point, I also wonder about that issue of endogeneity, the fact that mobility itself reflects influence processes that are happening. I go to this museum because someone told me to go there.

Yeah, I think that's true. I didn't present it here, but we have a result where we compare two different matchings: one based on demographics, the other based on demographics plus the mobility history. The results are mostly similar, which makes us think that the mobility histories contain the information in the demographics, in some sense. But this is in itself a very interesting question to ask: to what extent do these observed behaviours capture the static attributes that we use in the traditional case? More importantly, if you are not allowed to collect these kinds of demographics, then these methods can be more useful. Another obvious extension is that there exist multiple-treatment effects, where you receive multiple phone calls; right now, the distance that we define is simply the shortest distance to the attendees. So that's something we can do further.

But the main idea is that, given the social network in this case, we want to see how influence plays a role and how it propagates. In some cases, though, you don't observe this communication network; you just observe the adoption behaviour of people, the fact that they decide to do something...
Yes, could you speak a bit more about how the estimate of the product was done? Because you calculated a mean squared difference, so I guess these estimates were actual numbers; what went into calculating them?

Yeah. So we initialise them following a certain distribution, and then, let's say, if you called me, you will have your estimate, which leads to your evaluation. We also assume that each person has an initial vector of preferences. Based on what I receive from you, I basically use Bayes' rule to compute the posterior of my estimate, conditioned on the fact that I observed the information from you. The posterior is proportional to the prior times the likelihood: the prior is my estimate from the previous timestep, and the likelihood is the likelihood that a given estimate leads to the information that I received from you. So you simulate all these quantities by putting distributions on them, and then you just compute.

And this estimate, is it sort of an average of the vector, or is it a vector?

It's a vector, yeah.

OK. And the difference is basically a squared difference between the vectors?

Yeah. So, the second question that we want to ask is conceptually the reverse of the first problem: if you observe some adoption actions, or whatever actions, what could be the shape of the social network that leads to these actions? We will use a slightly different framework here, which is based on game theory, so we are going to make a small transition, but the conceptual question is the reverse of the first one that we asked. Let's start with some simple examples. Consider a group of students making choices about educational effort: how much time you want to put into a certain course. They will basically follow a number of rules. The first rule is that making an effort is costly for me, but, obviously, I benefit from my own effort; and in this case I also benefit from my friends' effort, because it's a collective study, a course project, something like that. If we have these kinds of rules, then a person will tend to put in more effort when the friends of that person put in more effort. One scenario could be this one, where the colours denote how much effort each person puts in: you see that this person here puts in a lot of effort, because many of his friends put in a lot of effort, and the opposite is true for this blue one.
A second example is to consider a group of students each making the choice of whether or not to buy a book. Obviously, in this case, buying the book is costly: it costs time and money. So if a friend of mine buys it, then I will not buy it, based on the assumption that I can easily borrow it; but if none of my friends buys it, then I will buy it, because I definitely need the book. If we have these kinds of rules, then we are likely to end up in a situation like this: the red ones decide to buy the book because none of their neighbours decides to buy, and the blue ones don't buy because at least one neighbour bought the book. So that's another scenario. In both scenarios, you can see the existence of strategic interactions between people, which can be modelled as games. Game theory is basically the mathematical modelling of rational decision making between individuals, and it contains key concepts such as players, actions and payoffs. Actions can be binary, whether to buy the book or not, but they can also be continuous, like the amount of effort that you put in; the payoff is basically how much reward you get for your action. All of these concepts remain in what we call games on networks, but here we have an additional piece of information, which is the interaction network: a friendship network, for example. The interesting idea here is to see how the individual actions are related to the structure of the network. To give an example from the previous cases: if you are central in the network, if you are well connected, then obviously that will affect how much effort you put into a project, or whether you are going to buy the book or not. So the relationship between individual decision making and the structure of the network is the most interesting aspect of these models of games on networks. And obviously, the basic assumption is that the payoff of an individual depends on her own action, but also on the actions of her neighbours. The two examples that we talked about are in fact instances of what we call strategic complements and strategic substitutes, and I think those are quite self-explanatory. There are many studies dedicated to this field. It is a subject in economics, where people look at the equilibria of games played on a given network; they are mostly interested in how actions and payoffs depend on the network structure. In computer science, the same ideas are studied under the name of graphical games, where the interest is mostly in developing algorithms to compute the equilibria, in the computational sense.
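The book example can be simulated directly, as a sketch of such an equilibrium computation: iterate best responses (buy if and only if no neighbour buys) until nobody wants to change. The graph and parameters are arbitrary choices of mine:

```python
import networkx as nx

def best_response_book_game(G, max_rounds=100):
    """Best-response dynamics for the book-buying game (strategic
    substitutes): each player buys (1) iff no neighbour currently
    buys, otherwise free-rides (0). A fixed point is a Nash
    equilibrium; the buyers form a maximal independent set."""
    buy = {v: 1 for v in G}                 # start with everyone buying
    for _ in range(max_rounds):
        changed = False
        for v in G:
            best = 0 if any(buy[u] for u in G.neighbors(v)) else 1
            if best != buy[v]:
                buy[v], changed = best, True
        if not changed:                     # nobody wants to deviate
            break
    return buy

buyers = best_response_book_game(nx.erdos_renyi_graph(12, 0.3, seed=0))
```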
What we are more interested in, in this work, is the reverse problem, which is trying to infer the network given the actions that we observe. We don't assume that we have the network; we assume that people play games on a hidden network, and we want to recover that structure. You can find a lot of examples: we observe individual decisions and adoptions, but not the social network; we observe the research and development activities of firms, but not the collaboration network of the firms; we observe the policies of countries, but not the political alliances. Obviously, the difficulty here is the complexity of the network, which makes the inference more difficult. As a result, we have to put some structure on the game; otherwise, it would be very difficult to make any inference. The broad definition is that the payoff u_i of a player is a function of his own action a_i, the actions of his neighbours in the unobserved network, and the connectivity. One particular formulation of the payoff is this linear-quadratic model, which is quite easy to understand. Here we have the individual action a_i, and then we have this marginal benefit b_i, which basically decides, independently of the others, how much benefit I get if I increase my action. Then we have this network factor beta in front of the last term, which we can think of as a_i times the summation of the neighbours' actions. You can clearly see that the first two terms correspond to the individual part: this is the individual benefit, and this is the cost, which is the a_i squared term. The last term is basically the network effect: if a lot of the people who are your neighbours all take high actions a_j, then choosing a higher a_i will increase your own utility. So that's basically the specific model that we use.
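Written out, the linear-quadratic payoff just described takes the standard form from the games-on-networks literature (my reconstruction of the slide, with b_i the marginal benefit, beta the network parameter and G_ij the edge weights):

```latex
u_i(a_i, a_{-i}) \;=\;
\underbrace{b_i\, a_i \;-\; \tfrac{1}{2} a_i^2}_{\text{individual benefit and cost}}
\;+\;
\underbrace{\beta\, a_i \sum_{j} G_{ij}\, a_j}_{\text{network effect}}
```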
Why do we choose this specific payoff function? First, it captures continuous actions: a_i doesn't have to be discrete, whereas in many cases in the literature the actions are assumed to be discrete. Second, it is very easy to capture both strategic complements and substitutes: you just choose a positive or a negative beta, where a negative beta basically means there is a negative network effect. And although it is a simple quadratic model, it can also be used to approximate more complex nonlinear payoffs, which makes it general; it is also a widely adopted model in the literature on games on networks. Again, think of the examples I gave before: educational effort, collaboration, or even individual mobility choices in the context of urban dynamics.

Now let's focus on the most important advantage of this specific model. In game theory, we are mostly interested in equilibria: the stable situations in which a person will not change his action if none of his neighbours changes theirs. The concept of Nash equilibrium captures this stable situation. In this case, the equilibrium is easy to derive: we simply take the first-order condition, the derivative of the utility with respect to the individual action, and we end up with the equation here, where b is the vector of marginal benefits. You see that the vector of actions a is basically the inverse of the matrix (I minus beta G) times b. We need an assumption to guarantee that this inverse exists, which is that the spectral radius of the matrix beta G is smaller than one, where the spectral radius is basically the largest singular value of the matrix. If that's the case, then the matrix inverse is guaranteed to exist, and the equilibrium is unique and stable; those are just technical conditions. But if we look at this equilibrium more carefully, we can find something interesting. The inverse can be rewritten as an infinite sum of terms, and this is closely related to a concept in network science called Katz-Bonacich centrality, which counts the total number of walks from one node to every other node in the network. The interesting thing here is that the payoff function looks as if the interdependency only exists between direct neighbours; but if you look at this equation, you see that, at equilibrium, the dependency spreads indirectly through the whole network, which is something interesting to observe.
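In symbols, the first-order condition and the resulting equilibrium are as follows; the expansion of the inverse into walks of all lengths is what ties the equilibrium to Katz-Bonacich centrality (again a reconstruction consistent with the standard literature):

```latex
\frac{\partial u_i}{\partial a_i}
  = b_i - a_i + \beta \sum_j G_{ij} a_j = 0
\quad\Longrightarrow\quad
a = (I - \beta G)^{-1} b,
\qquad \rho(\beta G) < 1 .

% Neumann series: walks of every length contribute, so the action
% profile is a Katz-Bonacich-type centrality weighted by b
(I - \beta G)^{-1} b \;=\; \sum_{k=0}^{\infty} \beta^k G^k b
```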
You can also give this another interpretation from a signal processing viewpoint, but I will skip that, because it's probably not easy to explain in a short period of time. Under this model, we came up with a learning algorithm that helps us infer G from a. That's the idea we had from the beginning. We consider K independent games, all of which have different marginal benefit vectors: you can think about the adoption of K different types of products. The marginal benefits of each product will differ across individuals, so the equilibrium actions will be different, but the network is assumed to be the same. We try to infer the network structure G, together with the marginal benefits B, given only the actions A and the parameter beta. What we do is solve the following optimisation problem. The first term is a condition that is satisfied when the observed actions are at equilibrium: you want to minimise this term, which basically means that the actions are close to the equilibrium condition. Then we put some constraints on the variables we optimise over. G is the network structure, so we want it to be symmetric and non-negative, with a constraint on the total sum of the edge weights; this just guarantees that the variable we solve for in the end is a valid graph topology. We also need some constraint, or prior information, on B, the marginal benefits. The simple thing we consider is, again, an idea taken from homophily: two people are likely to form a link if they have similar characteristics, and the characteristic here is, in fact, the marginal benefit. You like music; therefore taking certain actions will lead to a higher payoff, for example. If we think of the marginal benefit as a single value instead of a vector, we can represent that value as a bar on each node of the network: the bars pointing upwards are positive, and the blue ones are negative. Because two people are more likely to form a link if their marginal benefits are similar, if you look at this signal, the marginal benefits, on top of the network, you will see that the values transition smoothly across the edges: that follows from the assumption that links mostly form between people with similar attributes. In computer science, this can be quantified with a smoothness term based on what is called the Laplacian matrix of the network, which basically says that if a connection G_ij is non-zero, we want b_i and b_j to be similar.
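Putting the pieces together, the inference problem has roughly this shape (a sketch based on the description in the talk; the exact regularisation weights and constraint set in the paper may differ). Here A stacks the K observed equilibrium action vectors, B the unknown marginal benefit vectors, and L = diag(G1) - G is the graph Laplacian used for the smoothness prior:

```latex
\min_{G,\,B}\;
\underbrace{\lVert A - \beta G A - B \rVert_F^2}_{\text{equilibrium condition}}
\;+\;
\underbrace{\lambda \sum_{k=1}^{K} b_k^{\top} L\, b_k}_{\text{smoothness of benefits}}
\quad\text{s.t.}\quad
G = G^{\top},\; G_{ij} \ge 0,\; G_{ii} = 0,\; \textstyle\sum_{ij} G_{ij} = c
```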
450 00:52:14,780 --> 00:52:24,260 This basically says that if there is a connection which is non-zero, then I want b_i and b_j to be similar. 451 00:52:24,260 --> 00:52:35,120 So in our context, we use this term here, which basically means we assume a kind of smooth, homophilous distribution of the marginal benefits. 452 00:52:35,120 --> 00:52:41,300 And then, after we have this term, we can basically solve this problem 453 00:52:41,300 --> 00:52:48,030 using specific computational methods, which I will probably not talk about too much. 454 00:52:48,030 --> 00:52:57,740 Basically, you have a joint pair of variables, G and b, which leads to a non-convex optimisation, 455 00:52:57,740 --> 00:53:02,730 but you can use alternating minimisation to solve for both of them iteratively. 456 00:53:02,730 --> 00:53:10,550 I'm probably oversimplifying a lot of this, but anyway. 457 00:53:10,550 --> 00:53:18,860 So part of it is that you have the same network, but you're training the algorithm on the variation in the marginal benefits of different... 458 00:53:18,860 --> 00:53:25,400 Yeah, different games. That's right. So you're saying: I have the same network many, many times. Yeah. 459 00:53:25,400 --> 00:53:29,440 Yeah. And then I'm trying to guess a person's position in the network. 460 00:53:29,440 --> 00:53:34,580 Yeah, that's right. It can be extended to cases where networks are changing. 461 00:53:34,580 --> 00:53:38,960 But right now, we just consider a fixed topology but different marginal benefits. 462 00:53:38,960 --> 00:53:44,300 And did you put any constraints on the different types of networks, like the topology of the networks? 463 00:53:44,300 --> 00:53:48,500 That's a very good question. Right now, we don't, because this constraint is very general. 464 00:53:48,500 --> 00:53:52,530 It's the basic constraint that guarantees a valid network topology. 465 00:53:52,530 --> 00:53:56,390 You can put in things like a degree distribution or other properties, 466 00:53:56,390 --> 00:54:02,270 which we studied in the experiments, but for now we don't put them in the constraints explicitly. 467 00:54:02,270 --> 00:54:09,280 You can, if you have some prior information about the topology, try to encode that as a constraint. 468 00:54:09,280 --> 00:54:19,670 The basic intuition is that if this spectral radius, rho of beta G, is zero, then obviously 469 00:54:19,670 --> 00:54:24,740 you don't have any information about the network from a, because you just get a equals b, right? 470 00:54:24,740 --> 00:54:30,320 So there is no hope that you can infer the G from that. 471 00:54:30,320 --> 00:54:37,880 But as the spectral radius of beta G becomes bigger and bigger, a will contain more and more information about the network. 472 00:54:37,880 --> 00:54:42,350 In the extreme case, as beta G approaches a spectral radius of one, 473 00:54:42,350 --> 00:54:48,590 we can work out that the action is proportional to the eigenvector centrality of the network, and that eigenvector 474 00:54:48,590 --> 00:54:52,060 centrality, in fact, carries strong information about the 475 00:54:52,060 --> 00:55:04,930 connectivity of the network. So that's kind of the intuition for why, from the actions a, we can go back to the topology G. 476 00:55:04,930 --> 00:55:10,480 So we tested this idea on both synthetic and real-world data. 477 00:55:10,480 --> 00:55:14,320 These are basically the different types of topologies that I mentioned.
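For a feel of what such an alternating scheme could look like, here is a rough Python sketch. It is an illustrative reconstruction, not the authors' solver: the smoothness weight, step size, and the projection onto a valid topology are all assumed choices.

```python
import numpy as np

def infer_network(A, beta, alpha=0.1, total_weight=None, n_iter=200, lr=0.001):
    """Alternating-minimisation sketch: infer adjacency G and marginal
    benefits B from equilibrium actions A (one column per game)."""
    n, K = A.shape
    rng = np.random.default_rng(0)
    G = rng.random((n, n)) * 0.1            # random symmetric initialisation
    G = (G + G.T) / 2
    np.fill_diagonal(G, 0)
    total = total_weight if total_weight is not None else float(n)
    for _ in range(n_iter):
        # b-step: for fixed G, the equilibrium condition gives (I - beta G) a = b;
        # the Laplacian term alpha * b^T L b shrinks b towards graph-smoothness.
        L = np.diag(G.sum(axis=1)) - G
        B = np.linalg.solve(np.eye(n) + alpha * L, (np.eye(n) - beta * G) @ A)
        # G-step: one symmetrised gradient step on the equilibrium-fit term,
        # then project onto {symmetric, non-negative, zero diagonal, fixed sum}.
        R = (np.eye(n) - beta * G) @ A - B   # residuals of the equilibrium fit
        G = G + lr * beta * (R @ A.T + A @ R.T)
        G = np.maximum((G + G.T) / 2, 0)
        np.fill_diagonal(G, 0)
        G *= total / max(G.sum(), 1e-12)     # rescale the total edge weight
    return G, B
```

Presumably each subproblem is tractable on its own even though the joint problem is non-convex, which is what makes alternating between the two steps attractive.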
478 00:55:14,320 --> 00:55:19,840 These are random graph models that are typically studied in network science. 479 00:55:19,840 --> 00:55:30,250 They basically capture different types of connectivity. And what we do is basically to first generate a random network. 480 00:55:30,250 --> 00:55:41,020 Then we use this idea to compute the actions based on a simulation of the marginal benefits, 481 00:55:41,020 --> 00:55:44,860 which are assumed to follow a Gaussian distribution here. So we've got the G, 482 00:55:44,860 --> 00:55:49,540 we sample the b following this distribution, and then we get a. 483 00:55:49,540 --> 00:55:57,610 Now we just use the algorithm to infer G and b from a. That's the basic procedure. 484 00:55:57,610 --> 00:56:06,430 Then we compare it with some other well-known methods, mostly from computer science, and evaluate the performance. 485 00:56:06,430 --> 00:56:12,760 This is one set of experiments that we did. Each figure is basically one type of random graph. 486 00:56:12,760 --> 00:56:20,530 The x-axis is the spectral radius of the beta G matrix, and the y-axis is the evaluation, 487 00:56:20,530 --> 00:56:24,760 which is the similarity between what we infer and the ground truth that we start with. 488 00:56:24,760 --> 00:56:33,630 So you can see that when we increase the spectral radius, the performance increases, which follows the intuition that I just explained. 489 00:56:33,630 --> 00:56:47,100 And the method that we have here in general achieves better performance than a simple baseline, which is sample correlation, 490 00:56:47,100 --> 00:56:52,230 and also another well-known method in computer science called the graphical lasso. 491 00:56:52,230 --> 00:56:53,910 I'm not sure if you're familiar with that, 492 00:56:53,910 --> 00:57:03,660 but the idea is to see whether this new formulation can actually improve the performance compared to some existing models. 493 00:57:03,660 --> 00:57:09,750 The more interesting question in this case would be whether the algorithm works equally well for different types of graphs, 494 00:57:09,750 --> 00:57:17,730 like graphs with different kinds of connectivity. So to test that, we ran it on these three different types of graph. 495 00:57:17,730 --> 00:57:22,500 Each of these random graph models has a parameter that decides the density of the edges. 496 00:57:22,500 --> 00:57:32,310 So, for example, in an Erdős–Rényi network, edges are created independently with a given probability. 497 00:57:32,310 --> 00:57:38,160 If that probability p is high, then the network becomes dense. 498 00:57:38,160 --> 00:57:43,650 Another example would be the Barabási–Albert network, 499 00:57:43,650 --> 00:57:53,190 which is a generative model where each time you add a new vertex and connect it to the existing nodes in the graph. 500 00:57:53,190 --> 00:57:57,360 So every time you add a new node, 501 00:57:57,360 --> 00:58:02,130 you add new edges, and the rule is that you preferentially 502 00:58:02,130 --> 00:58:06,900 connect the new node to the nodes that are already well connected, 503 00:58:06,900 --> 00:58:12,180 so that in the end you get this preferential attachment. 504 00:58:12,180 --> 00:58:13,500 So they are just different networks.
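A minimal sketch of that synthetic pipeline might look as follows; the sizes, densities, and the choice of spectral radius are placeholder values, not the paper's settings:

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n, K = 20, 50  # illustrative numbers of nodes and games

# Ground-truth topology; swap in nx.barabasi_albert_graph(n, 2) or
# nx.watts_strogatz_graph(n, 4, 0.1) to test other kinds of connectivity.
G_true = nx.to_numpy_array(nx.erdos_renyi_graph(n, 0.2, seed=0))

# Choose beta so the spectral radius of beta * G_true is below one
# (0.9 here, i.e. inside the informative regime discussed in the talk).
rho = np.max(np.abs(np.linalg.eigvals(G_true)))
beta = 0.9 / rho

# K games: Gaussian marginal benefits, then equilibrium actions
# a_k = (I - beta G)^{-1} b_k for each game k, stacked as columns.
B = rng.normal(size=(n, K))
A = np.linalg.solve(np.eye(n) - beta * G_true, B)

# A (together with beta) is the only input handed to the inference step;
# performance is the similarity between the recovered graph and G_true.
```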
505 00:58:13,500 --> 00:58:21,630 But what we observed is that when these parameters are high, which leads to denser networks, the performance drops. 506 00:58:21,630 --> 00:58:25,270 One reason for that could be that if you have a very dense network, 507 00:58:25,270 --> 00:58:32,400 then the influences of the connections basically get entangled with each other, 508 00:58:32,400 --> 00:58:38,820 so it becomes very difficult to recover them from just the observed actions. 509 00:58:38,820 --> 00:58:46,890 If the network is sparse, then a correlation between the actions more likely indicates the existence of a link. 510 00:58:46,890 --> 00:58:55,620 So that's the basic intuition. We also used some real-world data for inference. 511 00:58:55,620 --> 00:58:56,130 In this case, 512 00:58:56,130 --> 00:59:03,130 it's the inference of a social network structure, where the data are from almost two hundred households in a village in rural India. 513 00:59:03,130 --> 00:59:13,230 This is one classical study in economics, and the actions in this case are basically the number of facilities adopted by each household. 514 00:59:13,230 --> 00:59:21,630 It's arguably a game of strategic complements, because of the conformity to social norms, 515 00:59:21,630 --> 00:59:29,550 which basically leads to higher benefits in the village. So we take the data, which are the actions, 516 00:59:29,550 --> 00:59:37,020 infer the social network, and compare it with the self-reported social network to see how it works. 517 00:59:37,020 --> 00:59:42,570 In this case, we get good performance, better than the baselines. 518 00:59:42,570 --> 00:59:49,350 The other example is the inference of a trade network. It's a similar idea, 519 00:59:49,350 --> 00:59:56,520 but the slight difference is that in this case it's more like strategic substitutes instead of complements. 520 00:59:56,520 --> 01:00:02,970 The basic idea is that we observe actions, which are the imports or exports of countries, 521 01:00:02,970 --> 01:00:09,540 and we want to see whether the algorithm can help us infer a trade network that we don't observe. 522 01:00:09,540 --> 01:00:19,530 So that's the high-level idea. Again, I think the interesting thing here would be the applications in practical scenarios. 523 01:00:19,530 --> 01:00:24,840 If you only observe the actions of the nodes in the network, then, if you can infer the network, 524 01:00:24,840 --> 01:00:29,610 you can use it to detect communities, which is useful for stratification. 525 01:00:29,610 --> 01:00:36,420 You can compute centrality measures, which can help you design targeted interventions, 526 01:00:36,420 --> 01:00:42,760 and you can basically achieve planning objectives by looking at the network. 527 01:00:42,760 --> 01:00:48,240 Two examples: if you want to maximise the total utility of the players, 528 01:00:48,240 --> 01:00:54,810 then you can adjust the marginal benefits that you learnt 529 01:00:54,810 --> 01:00:57,330 so that they are proportional to the eigenvector centrality, 530 01:00:57,330 --> 01:01:04,470 which is proven in this paper. You can also think about reducing inequality between different players, like different communities in a city, 531 01:01:04,470 --> 01:01:16,020 by adjusting the connections between the different communities after you infer them, so you can encourage new links and discourage existing ones.
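As a small illustration of those downstream uses, a sketch like the following could sit on top of an inferred network; the stand-in graph and the top-5 cutoff are arbitrary choices for the example, not anything from the talk:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Stand-in for the inferred adjacency matrix; in practice this would be
# the G returned by the inference algorithm sketched earlier.
G_hat = nx.to_numpy_array(nx.erdos_renyi_graph(30, 0.15, seed=1))
g = nx.from_numpy_array(G_hat)

# Community detection on the inferred network (useful for stratification).
communities = greedy_modularity_communities(g)

# Eigenvector centrality for targeted interventions; the utility-maximising
# rule mentioned in the talk sets marginal benefits proportional to it.
centrality = nx.eigenvector_centrality_numpy(g)
top_targets = sorted(centrality, key=centrality.get, reverse=True)[:5]
print(len(communities), top_targets)
```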
532 01:01:16,020 --> 01:01:18,270 There are several open issues here: 533 01:01:18,270 --> 01:01:26,460 the determination of the parameter beta, and the theoretical understanding of the algorithm, when it works and with what guarantees. 534 01:01:26,460 --> 01:01:35,680 That's the kind of guarantee we would like to have so that we can actually apply it in a real scenario. 535 01:01:35,680 --> 01:01:42,190 Then, obviously, other payoff functions or applications. 536 01:01:42,190 --> 01:01:50,230 So the high-level idea here is kind of the opposite of the first problem that I talked about: if we observe certain actions, 537 01:01:50,230 --> 01:01:56,140 can we basically go back and infer the hidden links between the actors? 538 01:01:56,140 --> 01:02:02,860 So these two problems basically combine together, in the sense 539 01:02:02,860 --> 01:02:08,650 that I think collective decision-making is key to understanding behaviour, and 540 01:02:08,650 --> 01:02:14,150 network-based learning can provide some natural models for us to do that. 541 01:02:14,150 --> 01:02:24,460 The caveat is that the learning and inference need to be done in combination with social theories for better interpretability. 542 01:02:24,460 --> 01:02:32,710 Otherwise, they just become a computer science exercise without implications for policymaking. 543 01:02:32,710 --> 01:02:40,390 More importantly, you need to think about the ethical implications and consequences when you build models to understand human behaviour. 544 01:02:40,390 --> 01:02:47,200 It might lead to the marginalisation of some disadvantaged groups in some situations. 545 01:02:47,200 --> 01:02:53,710 You need to be extra careful when you make these modelling assumptions, 546 01:02:53,710 --> 01:03:00,140 and that basically needs help from social scientists, working in a more collaborative setting. 547 01:03:00,140 --> 01:03:06,820 Finally, here are a few papers and the co-authors of this work. With that, thank you for your attention. 548 01:03:06,820 --> 01:03:21,610 I'm happy to take questions. 549 01:03:21,610 --> 01:03:33,220 Just for clarity's sake: if you wanted to infer the location of someone in the network using your method, 550 01:03:33,220 --> 01:03:40,450 do you need to know what the network topology is first, to be able to identify where they are? 551 01:03:40,450 --> 01:03:44,680 Because your games held the topology constant, right? 552 01:03:44,680 --> 01:03:48,490 No, I think you just need to observe these actions, which are vectors. 553 01:03:48,490 --> 01:03:52,720 OK. And then basically you get what is called the adjacency matrix, right? 554 01:03:52,720 --> 01:03:59,950 And that adjacency matrix captures the connectivity pattern. So you don't need the network topology beforehand; 555 01:03:59,950 --> 01:04:06,160 that's basically what you want to infer. So not just one individual's location, but the entire topology of the network? 556 01:04:06,160 --> 01:04:08,470 Yes. And in some sense, 557 01:04:08,470 --> 01:04:15,190 I think the location in the network doesn't matter that much, because you can redraw the network in whatever way, right? 558 01:04:15,190 --> 01:04:22,510 Locations just depend on the specific layout of the graph, if you like.
559 01:04:22,510 --> 01:04:27,640 But the information is basically whether i is connected to j, and whether 560 01:04:27,640 --> 01:04:31,990 that link exists is completely captured by the adjacency matrix. 561 01:04:31,990 --> 01:04:36,620 So we infer that just from the vector observations of actions. 562 01:04:36,620 --> 01:04:37,150 OK. 563 01:04:37,150 --> 01:04:46,330 And just following up on that: a lot of what we've been discussing lately is that with digital trace data, we don't know what we're missing. 564 01:04:46,330 --> 01:04:54,910 And this method seems like it could be one way to start getting at that in the context of social network data. 565 01:04:54,910 --> 01:05:01,270 We could try, yeah. Although this particular method is based on the assumption of network games, 566 01:05:01,270 --> 01:05:06,910 which means that the actions people take are based on strategic interactions. 567 01:05:06,910 --> 01:05:14,620 There is an interesting example in urban dynamics that could fit this: 568 01:05:14,620 --> 01:05:20,710 think about an artificial city where we only have two locations. One is the centre; 569 01:05:20,710 --> 01:05:22,780 the other is the periphery. 570 01:05:22,780 --> 01:05:30,820 Now everybody needs to decide how many times they go to the city centre for social benefits, by interacting with others there. 571 01:05:30,820 --> 01:05:40,480 In that case, the number of visits is basically the action. And you obviously want to visit more if you are central in the social 572 01:05:40,480 --> 01:05:44,170 network, because there is a high likelihood that you can interact with more people. 573 01:05:44,170 --> 01:05:52,990 So in that specific scenario, you can think of inferring the hidden social network connections from the action, 574 01:05:52,990 --> 01:05:58,060 which is the number of visits to the city centre. 575 01:05:58,060 --> 01:06:05,020 I guess digital traces can fit into something like that: the fact that I move to a certain place 576 01:06:05,020 --> 01:06:10,000 is likely to be affected by the fact that you also go to that place. 577 01:06:10,000 --> 01:06:18,980 Then it might be possible to do this, because my action would depend not just on myself but also on you, which makes sense. 578 01:06:18,980 --> 01:06:26,400 Yeah. All right. 579 01:06:26,400 --> 01:06:35,430 Thank you for the talk. I was wondering about the first piece you were talking about, on the network of telephone calls: 580 01:06:35,430 --> 01:06:43,500 whether you could infer some alternative demographics by just looking at how people call. 581 01:06:43,500 --> 01:06:46,720 Yeah, well, I think there are people who do that. 582 01:06:46,720 --> 01:06:55,260 I have some colleagues who try to use communication patterns, mostly in what is called the ego network, 583 01:06:55,260 --> 01:07:02,240 so just me and the first-hop neighbours of myself, to infer demographics. 584 01:07:02,240 --> 01:07:10,920 And you can do that with reasonable accuracy, although the results are still usually between 70 and 80 per cent accuracy. 585 01:07:10,920 --> 01:07:18,630 Even without making the link explicitly to demographics, it would be interesting to maybe do some clustering on just the type of communication. 586 01:07:18,630 --> 01:07:22,140 Yeah, definitely. I think these kinds of predictions are based on that.
587 01:07:22,140 --> 01:07:29,940 So, for example, females and males have different communication patterns during the weekdays and at the weekends; that's how they build the predictor. 588 01:07:29,940 --> 01:07:37,290 So getting back to your point: if you don't have demographics, maybe you can use that to predict them, and then use the prediction to condition the matching. 589 01:07:37,290 --> 01:07:47,200 But there is one additional step in the middle, and you can't guarantee that the prediction is perfect. 590 01:07:47,200 --> 01:07:54,200 On the other hand, you can use social media activities to predict. I think Scott Hale from the OII did something recently, 591 01:07:54,200 --> 01:07:58,210 where they just look at Twitter activities and use that to predict demographics. 592 01:07:58,210 --> 01:08:02,710 I think it's a similar idea. 593 01:08:02,710 --> 01:08:09,180 So I'm wondering, in your inference of the network topology, in this case you had this kind of quadratic function. 594 01:08:09,180 --> 01:08:14,790 Yeah. But I guess it could generalise to different kinds of functions. 595 01:08:14,790 --> 01:08:24,430 So I'm trying to understand the possible variations. Have you thought about, or have you built, 596 01:08:24,430 --> 01:08:29,620 have you tried to do this with other functions as well, to see how it looks? 597 01:08:29,620 --> 01:08:33,340 Yeah. 598 01:08:33,340 --> 01:08:38,440 So the answer is that we haven't really tried other payoff functions. 599 01:08:38,440 --> 01:08:41,950 The thing is that, first, we wanted it to be general enough. 600 01:08:41,950 --> 01:08:52,060 Second, the equilibrium needs to be explicitly computable, because we want to have this notion of Nash equilibrium actions. 601 01:08:52,060 --> 01:08:57,730 If you have, let's say, an arbitrary analytical payoff function, 602 01:08:57,730 --> 01:09:03,520 as long as you can take the first-order derivative, you can fit it into the optimisation. 603 01:09:03,520 --> 01:09:10,180 But in that case, there is no economic theory to guarantee that the equilibrium will be stable and unique, 604 01:09:10,180 --> 01:09:16,510 which means you kind of lose something from the economic interpretation. 605 01:09:16,510 --> 01:09:18,880 Right. 606 01:09:18,880 --> 01:09:26,890 So one of the things that we've discussed a lot in the Summer Institute is that some of these very detailed types of data that you've talked about, 607 01:09:26,890 --> 01:09:27,520 for example 608 01:09:27,520 --> 01:09:40,510 mobile calling networks or other forms of rich behavioural data, are often in the hands of private companies and hard to get access to. 609 01:09:40,510 --> 01:09:47,020 And while they can shed light in new ways where we were previously in the dark, 610 01:09:47,020 --> 01:09:55,240 perhaps you could comment a bit on your experiences trying to access these kinds of data, especially, for example, the mobile study in Andorra. 611 01:09:55,240 --> 01:09:59,950 What were the steps or the pathways that enabled you to get access to them? 612 01:09:59,950 --> 01:10:01,780 Because I think it would be helpful for us here as well,
613 01:10:01,780 --> 01:10:14,830 to know about that. The Andorra study, I think, was based on a contract signed between a few groups in the Media Lab and the government of Andorra, 614 01:10:14,830 --> 01:10:22,450 because at the time they were trying to diversify the economy and to attract more tourists, basically, 615 01:10:22,450 --> 01:10:31,690 and to understand whether they could get visitors to stay overnight. They also have some huge problems in terms of traffic jams. 616 01:10:31,690 --> 01:10:42,970 So they basically wanted to work with a few groups in the Media Lab to build new models, first to solve these problems. 617 01:10:42,970 --> 01:10:52,790 But they also wanted to make Andorra a place of innovation in Europe by testing new ideas like these, and self-driving cars. 618 01:10:52,790 --> 01:10:56,770 That was like five years ago, when they were trying to put some vehicles on the 619 01:10:56,770 --> 01:11:03,160 street and see how people react. In that case, because of that special relationship, 620 01:11:03,160 --> 01:11:08,000 we got access to the data from 621 01:11:08,000 --> 01:11:18,740 Andorra Telecom, I think. And mostly, from what I know, the way that you get access to this kind of data is through these kinds of research contracts. 622 01:11:18,740 --> 01:11:22,280 But I would say it's more and more common compared to ten years ago. Ten years ago, 623 01:11:22,280 --> 01:11:30,950 it was really very difficult, because people did not understand whether it was safe to share the data or whether they could actually get some benefit. 624 01:11:30,950 --> 01:11:34,520 But nowadays, I would say, specifically for mobile phone data, 625 01:11:34,520 --> 01:11:39,950 it has become more common, because more and more network providers realise that 626 01:11:39,950 --> 01:11:45,920 there are reasonably good ways to share anonymised data for some benefits. 627 01:11:45,920 --> 01:11:56,750 So, I don't know if you know, there is a dedicated conference called NetMob, which actually happens the week after next in Oxford. 628 01:11:56,750 --> 01:12:00,890 That's a dedicated conference for people working on mobile phone data, 629 01:12:00,890 --> 01:12:06,530 and there you can see that they have all sorts of data from different countries. 630 01:12:06,530 --> 01:12:16,970 But in general, I think the reason why people do not necessarily want to share the data is these ethical concerns. 631 01:12:16,970 --> 01:12:23,780 They are not sure whether it will be safe, what would be the best way to share, or what is the best way to anonymise. 632 01:12:23,780 --> 01:12:29,390 If they don't know, then it becomes very difficult for them to share. 633 01:12:29,390 --> 01:12:35,570 So I think people are now convinced that these data are helpful, 634 01:12:35,570 --> 01:12:39,470 but this means that the other side of the development needs to catch up, 635 01:12:39,470 --> 01:12:49,220 basically to show that we can have safe and meaningful ways to share the data, which can help us solve some important problems. 636 01:12:49,220 --> 01:12:54,980 I think that can help us convince the data holders in this case. 637 01:12:54,980 --> 01:12:58,160 But it takes time, because, I can tell you, if you compare ten years ago and now: 638 01:12:58,160 --> 01:13:02,720 ten years ago there were maybe only a few groups in the world that had this kind of data.
639 01:13:02,720 --> 01:13:11,080 But now, I would say, it's much, much more common. Still, I think 640 01:13:11,080 --> 01:13:15,970 it's kind of a slow process in some ways. 641 01:13:15,970 --> 01:13:29,770 So, I have a question regarding the network game model that you showed before. 642 01:13:29,770 --> 01:13:37,960 If I got that right, with the inverse written as a power series, the model gives you some kind of temporal process where I'm influencing 643 01:13:37,960 --> 01:13:41,680 my network neighbours and their neighbours. 644 01:13:41,680 --> 01:13:48,940 And it also comes back to me, doesn't it? So in this case, if I understand your question: this is more like a one-shot game. 645 01:13:48,940 --> 01:14:01,810 It's not a sequential game. I just observe actions where people do not know the decisions of the other people beforehand. 646 01:14:01,810 --> 01:14:10,540 They basically take simultaneous actions, and the network is the network structure at a particular time. 647 01:14:10,540 --> 01:14:23,710 OK, so you observe these simultaneous actions and then you infer the network at a given point in time, but it is not done sequentially? 648 01:14:23,710 --> 01:14:32,440 But wouldn't I... if I just expand the inverse term, wouldn't I be in there as well? 649 01:14:32,440 --> 01:14:37,900 Like, wouldn't it look 650 01:14:37,900 --> 01:14:42,210 kind of like my effort depends on my own effort somewhat? 651 01:14:42,210 --> 01:14:48,760 Yeah. I mean, actually, there is a loop, right? Your effort depends on the others, and the others depend on a third person. 652 01:14:48,760 --> 01:14:52,320 It's like that. But isn't that only possible over time? 653 01:14:52,320 --> 01:14:57,610 You just model it as simultaneous. It's not necessarily so. 654 01:14:57,610 --> 01:15:03,490 You can basically have these examples that I presented at the beginning, where we say: 655 01:15:03,490 --> 01:15:09,220 you are in a social network and you know your neighbours, and then, in your mind, 656 01:15:09,220 --> 01:15:16,570 you work out what the equilibrium would be, right? If I do something, I can expect what the others' actions will be. 657 01:15:16,570 --> 01:15:20,260 So if everybody is assumed to be rational, 658 01:15:20,260 --> 01:15:25,570 they will compute this equilibrium in their mind, and they will take a simultaneous action. 659 01:15:25,570 --> 01:15:28,180 And that's the equilibrium you observe, basically, 660 01:15:28,180 --> 01:15:36,210 because the idea of equilibrium means that if the others do not change, then I don't have the incentive to change. 661 01:15:36,210 --> 01:15:41,210 Right. Yeah. But you can generalise it to a sequential setting, 662 01:15:41,210 --> 01:15:51,120 I think, where you say: OK, I observe the past joint decisions of all the people, and then maybe the network changes at the next 663 01:15:51,120 --> 01:15:58,680 timestamp. Maybe the actions will depend on the actions at this new point in time 664 01:15:58,680 --> 01:16:09,910 plus the game that was played in the past. But not here.
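A compact way to see the loop the questioner is describing, written out from the equilibrium formula above (this is standard matrix algebra, added here for clarity rather than taken from the talk's slides):

```latex
% Neumann series expansion, valid when the spectral radius of \beta G is < 1:
a^{*} = (I - \beta G)^{-1} b
      = \sum_{k=0}^{\infty} \beta^{k} G^{k} b
      = b + \beta G b + \beta^{2} G^{2} b + \cdots
% G^k collects walks of length k; for k >= 2 the diagonal of G^k is non-zero
% whenever node i lies on a cycle, so player i's own benefit b_i does feed
% back into a_i^* through the network -- yet all within one simultaneous game.
```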
665 01:16:09,910 --> 01:16:17,800 So I have another question, about data collection. I think network data are, of course, 666 01:16:17,800 --> 01:16:28,180 very exciting, but also not as widely available as one would like, partly because they're difficult to collect. 667 01:16:28,180 --> 01:16:38,500 And I am wondering, based on your examples, given that you do a lot of work with networks, 668 01:16:38,500 --> 01:16:46,120 are there examples of good data sets that are also publicly available that you'd like to share or 669 01:16:46,120 --> 01:16:50,620 let us know about? Because this has come up recently: 670 01:16:50,620 --> 01:16:54,430 there was a workshop, I think a couple of years ago or last year, 671 01:16:54,430 --> 01:16:58,570 on the challenges of network data collection and the fact that there 672 01:16:58,570 --> 01:17:05,230 are just so few data sets available, and often they are very small samples if they exist at all. 673 01:17:05,230 --> 01:17:10,920 And you mentioned that a lot more data sources are available now, especially for networks. 674 01:17:10,920 --> 01:17:16,240 So perhaps you could just give us more examples of those to leverage; that would be very helpful. 675 01:17:16,240 --> 01:17:20,230 Yeah. If you talk about, for example, real-world social network data, 676 01:17:20,230 --> 01:17:26,050 at least in the computer science literature, there are many, many repositories. 677 01:17:26,050 --> 01:17:31,330 I would say you can find all sorts of networks from real-world situations, because 678 01:17:31,330 --> 01:17:35,710 those are the networks that computer scientists use to test their algorithms on. 679 01:17:35,710 --> 01:17:42,340 They can be static or dynamic, directed or undirected, for all different types of applications. 680 01:17:42,340 --> 01:17:52,930 I can share a few links, maybe. The more difficult thing would be the data that you mentioned first, the more private data, 681 01:17:52,930 --> 01:17:59,650 data that are privately held by companies and governments. 682 01:17:59,650 --> 01:18:08,290 That's more difficult, also because of the nature of the data, which carries a higher risk. 683 01:18:08,290 --> 01:18:14,530 If you think about a social network, the risk is kind of less, because you just have the connections. 684 01:18:14,530 --> 01:18:23,950 If you don't have any other information on the nodes, it's just a graph; a matrix doesn't really allow any identification. 685 01:18:23,950 --> 01:18:36,110 But if you have trace data, then that's a different story. 686 01:18:36,110 --> 01:18:40,880 All right, well, we want to thank Xiaowen Dong for joining and for speaking with us. 687 01:18:40,880 --> 01:18:44,660 And if you have any more questions, the door is open. Yeah. 688 01:18:44,660 --> 01:18:53,510 OK, so we'll go to lunch if you have more questions. And after lunch, just continue to work on your group projects; we'll be sending out soon 689 01:18:53,510 --> 01:18:59,030 a little brief on what we would like in the presentations tomorrow. 690 01:18:59,030 --> 01:19:06,517 Does anyone have any questions? OK. Thank you.