1 00:00:00,030 --> 00:00:06,599 Why don't we begin? It is a pleasure today to introduce our speaker, Professor Chris Lintott, 2 00:00:06,600 --> 00:00:10,710 of our own sub-Department of Astrophysics. 3 00:00:10,830 --> 00:00:18,870 Chris was an undergraduate at Cambridge University and did his graduate studies at UCL. 4 00:00:19,830 --> 00:00:29,940 The topic of his PhD thesis was essentially the chemistry of sulphur in the context of star formation. 5 00:00:30,600 --> 00:00:45,890 So when I heard this, I couldn't resist putting to him that between the study of brimstone and the observations of the fires of the first star, 6 00:00:45,900 --> 00:00:50,280 was he not in fact the very incarnation of Satan's astronomer himself? 7 00:00:51,870 --> 00:01:01,829 Professor Lintott did not deny the fact that he had moved on from his interest in study for us excuse me, 8 00:01:01,830 --> 00:01:08,370 in his interest in star formation turning into an interest in cosmological star formation, 9 00:01:08,850 --> 00:01:14,070 led him to his current field of interest, which is galaxies. 10 00:01:14,820 --> 00:01:26,550 And Chris became interested in the problem, which we'll hear about today, which is what is the best way to find peculiar morphology in galaxies? 11 00:01:26,580 --> 00:01:30,420 This is something that a computer by itself is not particularly good at, 12 00:01:31,020 --> 00:01:36,150 for all of its speed, because peculiar is hard to quantify. 13 00:01:37,620 --> 00:01:43,110 Human beings can't look through millions of galaxies by themselves unaided. 14 00:01:43,530 --> 00:01:54,780 So the question is, how do you put these two resources together to try and optimise the search using human brains with the aid of computer technology? 
15 00:01:55,230 --> 00:02:05,400 And in that respect, he has succeeded brilliantly, establishing a prototype program for not just galaxies, 16 00:02:05,400 --> 00:02:09,660 but citizen science in general and its use in several different fields. 17 00:02:10,170 --> 00:02:20,330 So he is going to tell us today about citizen science, the story of the Zooniverse from Galaxy Zoo to ALS City. 18 00:02:21,150 --> 00:02:24,450 And we get to begin with the penguin. Excellent. 19 00:02:24,930 --> 00:02:28,530 Thank you. Cheers. Well, thank you for that introduction. 20 00:02:29,970 --> 00:02:33,210 Satan's astronomer. Sounds beautifully well-paid, doesn't it? 21 00:02:33,630 --> 00:02:37,410 Not sure what overheads you get on one soul, 22 00:02:37,440 --> 00:02:41,010 but I'm sure finance will let us know before the end of the lecture. 23 00:02:42,060 --> 00:02:45,150 Thank you. It's a delight to be asked to give the colloquium. 24 00:02:46,200 --> 00:02:51,410 I am assuming that I was selected because I speak loudly enough that I can keep you awake on a Friday afternoon. 25 00:02:51,420 --> 00:02:56,530 So I shall try to do that. And there will indeed be more pictures of penguins later in the talk. 26 00:02:56,550 --> 00:03:02,640 This is actually a peculiar galaxy, not found by but adopted by some of the volunteers. 27 00:03:03,420 --> 00:03:08,310 But I thought I'd begin by pointing out how modern astronomy is done. 28 00:03:08,580 --> 00:03:16,680 And I was wrong about this for almost all of my life. I'm one of the few, I think, professional astronomers who grew up as amateur astronomers. 29 00:03:16,680 --> 00:03:22,620 I had a small telescope and from an early age I was thrilled with the idea that I might make a discovery. 
30 00:03:23,460 --> 00:03:30,750 Ideally, I wanted to discover a comet, because they're named after their discoverers, and Comet Lintott, I think, has a particularly fine ring, 31 00:03:31,800 --> 00:03:40,260 but I would have taken anything. And equipped with a six-inch reflector and observing from the light-polluted skies of suburban South Devon, 32 00:03:41,100 --> 00:03:46,440 I found I stood a pretty good chance. The closest I ever got was in viewing this thing. 33 00:03:46,440 --> 00:03:47,489 The nebulosity, 34 00:03:47,490 --> 00:03:54,000 the gas you see at the bottom of this image, is part of the Orion Nebula, a great star-forming complex that's visible in the winter sky. 35 00:03:54,300 --> 00:03:59,220 And one evening I nudged my telescope; it wasn't equipped with a drive, so the thing just moved. 36 00:03:59,460 --> 00:04:04,650 And this star cluster came into view. I looked it up in my star atlas. 37 00:04:05,550 --> 00:04:11,879 This was before we had the Internet at home, so I had to look at a book. For the younger members of the audience, 38 00:04:11,880 --> 00:04:17,580 a book is like the internet, except that you have to turn the pages manually. 39 00:04:18,180 --> 00:04:20,930 And this cluster wasn't in the atlas, 40 00:04:20,940 --> 00:04:32,219 and I remember getting a pencil and marking a cross and putting "Lintott 1" next to it. Now, the thing I like about this is 41 00:04:32,220 --> 00:04:38,160 that I think my scientific tendencies are clear in the fact that I used a pencil, because this is a provisional result. And this 42 00:04:38,160 --> 00:04:44,309 Lintott 1 turns out to be better known as NGC 1981, though of course it's still Lintott 1 43 00:04:44,310 --> 00:04:50,100 to me. It's a perfectly ordinary cluster that was discovered in the 19th century, but looks rather nice in a small telescope. 44 00:04:50,430 --> 00:04:53,739 And this, I think, was the start of my growing up as an astronomer. 
45 00:04:53,740 --> 00:04:59,760 And I realised that the days in which discoveries were made by ordinary people with small telescopes had long passed. 46 00:04:59,980 --> 00:05:04,450 And telescopes these days look more like this. 47 00:05:04,750 --> 00:05:10,930 This is the Sloan Digital Sky Survey telescope at Apache Point in New Mexico, a place that, like Oxford, 48 00:05:10,930 --> 00:05:21,190 has about 300 clear nights a year, to first order. And Sloan is, I think, what happens 49 00:05:21,190 --> 00:05:25,330 when you let particle physicists build your astronomy experiment, because it's an experiment. 50 00:05:25,810 --> 00:05:27,790 It was designed not as an observatory, 51 00:05:28,150 --> 00:05:35,350 not as a place that you visit to take control of the telescope and point at your favourite targets, but as a data production machine. 52 00:05:35,710 --> 00:05:42,280 And Sloan allowed the sky to turn over it for eight years, measuring the position of more than 300 million objects. 53 00:05:42,640 --> 00:05:47,320 Of those, 8 million were identified as faint, fuzzy things, probably galaxies. 54 00:05:47,650 --> 00:05:53,760 And on the clearest and stillest nights, Sloan would return to those and use spectroscopy to measure a distance. 55 00:05:53,770 --> 00:05:57,610 So this thing is making a three-dimensional map of our local universe, 56 00:05:57,610 --> 00:06:03,580 and it's doing that so that we can make rather crude comparisons to sophisticated cosmological models, 57 00:06:03,880 --> 00:06:08,140 because we want to understand the physics that's driven the large-scale evolution of the universe. 58 00:06:08,980 --> 00:06:16,480 So, for example, we take our eight years' worth of data on how many galaxies and we reduce them to a mass function. 59 00:06:16,870 --> 00:06:23,980 So this is the density of galaxies in the Sloan volume of particular stellar masses. 
60 00:06:24,730 --> 00:06:29,080 So the dots here are data: black from Sloan, blue from other surveys. 61 00:06:29,350 --> 00:06:34,510 And the green and the red are different flavours of large-scale cosmological simulations. 62 00:06:34,980 --> 00:06:38,500 And for those who aren't astronomers, this is a good fit to the data. 63 00:06:39,460 --> 00:06:45,070 We consider ourselves well satisfied with this. But we, of course, have all sorts of interesting properties. 64 00:06:45,070 --> 00:06:50,800 You can see that there's a problem at high mass, where we overpredict the number of large galaxies. 65 00:06:51,100 --> 00:06:57,310 And you can see there's a problem at low mass as well, where, well, you can take your pick: you can over- or underpredict. 66 00:06:58,090 --> 00:07:01,210 And if you're a theorist, this is, of course, a problem with the observations. 67 00:07:01,840 --> 00:07:05,890 And if you're an observer, then you go away, you redo the model. 68 00:07:06,520 --> 00:07:18,150 But there seems something slightly reductive about reducing galaxies to point particles that trace the underlying evolution of the universe. 69 00:07:18,160 --> 00:07:24,130 You can do that. You could have particles with particular mass, of course, and density and perhaps a size. 70 00:07:24,970 --> 00:07:31,810 But when you look at these things, and these are all galaxies drawn from Sloan, admittedly local ones with roughly the same mass, 71 00:07:32,140 --> 00:07:37,719 you see that there's more information here, and we can call it morphology if you want to sound sophisticated. 72 00:07:37,720 --> 00:07:43,630 But the shape of the galaxy encodes its integrated dynamical history. 73 00:07:44,050 --> 00:07:49,240 The shape, as James Binney and others will tell you at great length and with great certitude and great understanding, 74 00:07:49,540 --> 00:07:52,390 the shape of the galaxy depends on the orbits of the stars. 
75 00:07:52,690 --> 00:07:57,910 The orbits of the stars are a measure of the dynamical environments through which the galaxies have passed. 76 00:07:58,270 --> 00:08:05,380 And so if you know the shape of the galaxy, you can say something about how it's interacted with its surroundings, 77 00:08:05,620 --> 00:08:10,120 how it's interacted with other galaxies, and even how and where its stars have formed. 78 00:08:10,480 --> 00:08:18,100 And this is not a new idea. Edwin Hubble was the first to sort of systematically look at this in the thirties; 79 00:08:18,370 --> 00:08:25,390 he wrote a book called The Realm of the Nebulae, in which he proposed a system for classifying galaxies, Hubble's tuning fork. 80 00:08:25,920 --> 00:08:34,810 And there's some sense that he saw it as an evolutionary scheme, with elliptical galaxies like this one collapsing under gravity to form disks, 81 00:08:35,110 --> 00:08:39,430 which then wound up or unwound themselves into various types of spiral. 82 00:08:39,790 --> 00:08:46,150 It's not really clear whether even Hubble believed that. But nonetheless, he knew that the shape of these galaxies was important. 83 00:08:46,810 --> 00:08:50,140 And back then, and throughout the fifties and sixties, 84 00:08:50,590 --> 00:08:57,489 we had few enough high-resolution images of galaxies that eminent professors would devote themselves to creating, first of all, 85 00:08:57,490 --> 00:09:03,700 new classification schemes, then classifying all the well-imaged galaxies according to those schemes, 86 00:09:03,970 --> 00:09:06,940 and then going to conferences and arguing about whose scheme was best. 87 00:09:07,720 --> 00:09:12,850 And so they multiplied both in number, the classification schemes, and also in sophistication. 88 00:09:12,860 --> 00:09:21,980 So this is not just a spiral galaxy; it might be an Sa spiral galaxy or an Sb, or it might be an SB3, and so on and so forth. 
89 00:09:22,480 --> 00:09:29,620 And that kept people happy doing detailed studies of local galaxies for a long time. 90 00:09:29,620 --> 00:09:36,699 But by the 1980s, surveys had improved. And we had things like the Palomar Sky Survey, which, I discovered the other day 91 00:09:36,700 --> 00:09:43,240 to my surprise, was the first really good survey of the northern sky to go deep, 92 00:09:43,570 --> 00:09:47,920 and it was able to take deep images of the sky because of a new type of photographic plate. 93 00:09:48,520 --> 00:09:52,390 This is in the 1980s. So this is the last pre-digital survey. 94 00:09:53,320 --> 00:09:59,490 Palomar produced thousands of galaxies that were worth classifying, and simultaneously the astronomical world in the late 95 00:10:00,160 --> 00:10:04,149 discovered that you don't need to be a professor to classify galaxies; it's something grad students can do 96 00:10:04,150 --> 00:10:09,010 a perfectly good job of, just as the number of galaxies that needed to be classified reached thousands. 97 00:10:09,370 --> 00:10:14,470 People started their PhDs by classifying a thousand galaxies and using that data. 98 00:10:14,710 --> 00:10:20,620 So this worked very well. But people could see that projects like Sloan were coming and that this wouldn't work. 99 00:10:21,190 --> 00:10:24,970 When you have a million galaxies to classify, you need to find new approaches. 100 00:10:25,570 --> 00:10:32,470 And there's an excellent paper by my old supervisor, with whom I never discussed galaxy morphology during my Ph.D., Ofer Lahav. 101 00:10:32,500 --> 00:10:35,680 He's now at UCL, and he does two things. First of all, 102 00:10:35,710 --> 00:10:42,280 he gets a panel of experts to classify a set of galaxies and then shows in the footnotes that 103 00:10:42,280 --> 00:10:47,860 you can derive who was whose student just from their classifications, with no other prior knowledge. 
104 00:10:48,790 --> 00:10:54,730 But then secondly, he starts the push towards decent machine learning to attack this problem. 105 00:10:54,910 --> 00:11:03,760 And people do. People like Nick Ball, who was at Sussex and then Nottingham, do whole PhDs on neural network approaches to galaxy classification. 106 00:11:04,450 --> 00:11:08,440 And almost anyone who attempts this discovers that 70% of the galaxies are easy. 107 00:11:09,250 --> 00:11:14,590 Getting to 80% takes a lot of effort, and beyond that you're really stuck. And accuracy is important. 108 00:11:14,590 --> 00:11:18,010 We'd like to have reliable classifications for the vast majority of these galaxies. 109 00:11:18,010 --> 00:11:23,350 Or at least we'd like a system that will tell us which galaxies are reliably classified. 110 00:11:23,800 --> 00:11:28,210 The machine learning approaches of the time can't even tell you which 30% are wrong. 111 00:11:29,440 --> 00:11:40,240 When I arrived in Oxford, a new solution had been found, which was to find students who were more committed to the subject. 112 00:11:41,020 --> 00:11:44,340 I'm thinking particularly of a guy called Kevin Schawinski, who's now at ETH 113 00:11:45,010 --> 00:11:55,230 as a senior fellow there. And Kevin had looked at 50,000 galaxies and he'd shown, amongst other things, first of all, that that's the limit. 114 00:11:55,240 --> 00:11:59,920 A student will look at 50,000 galaxies before he tells you where to stuff the rest of them. 115 00:12:00,970 --> 00:12:08,830 I'm paraphrasing. So we can think of this as the Kevin limit, and other students approach, but do not exceed, the Kevin limit. 116 00:12:09,040 --> 00:12:13,420 And then the other thing is that he showed really that it mattered to have a person look at this. 117 00:12:13,870 --> 00:12:17,080 His classifications were different from those you obtain from neural networks. 
118 00:12:17,350 --> 00:12:22,360 And when you chased the discrepancies down, you found that Kevin was right, at least according to the experts. 119 00:12:25,540 --> 00:12:33,190 I often joke that the obvious solution is to have 20 Ph.D. students work on this, but even then you've still only got one classification per galaxy. 120 00:12:33,550 --> 00:12:38,170 And you have this uncomfortable truth that any result you produce depends on these classifications. 121 00:12:38,170 --> 00:12:41,590 You have no way of checking that. And so a broader approach was necessary. 122 00:12:41,920 --> 00:12:46,180 And without really thinking about it, we got some friends to knock together a website for Galaxy Zoo. 123 00:12:46,420 --> 00:12:51,819 This is what it looked like. It gave you an image of the galaxy and it gave you six buttons. We want to know if 124 00:12:51,820 --> 00:12:57,730 it's an elliptical galaxy, or if it's a spiral galaxy. If it's a spiral, we want to know which way the arms are going. 125 00:12:58,090 --> 00:13:02,319 If you want to hear that story, ask me about it in the questions. We ask for mergers, 126 00:13:02,320 --> 00:13:06,640 and then a few stars got through. And we put this online. 127 00:13:07,360 --> 00:13:15,130 Pete Wilson and friends in the University Press Office helped us get some attention from the BBC, and within a day we were doing 1.2 128 00:13:15,160 --> 00:13:20,230 Kevin-weeks an hour. And a Kevin-week is the unit of galaxy classifications, 50,000. 129 00:13:20,380 --> 00:13:23,020 We were doing something like 70,000 classifications an hour. 130 00:13:24,340 --> 00:13:30,190 We haven't kept going at that rate, but Galaxy Zoo has received literally hundreds of millions of classifications of galaxies. 131 00:13:30,400 --> 00:13:34,600 And the impressive thing is that taken collectively, those classifications look very good. 
132 00:13:35,350 --> 00:13:40,600 So for each of these galaxies, I not only can sort them into their categories, but I can say, so, 133 00:13:40,600 --> 00:13:46,720 this one, it's an elliptical. And I have a measure of confidence, because ten out of ten people said it was an elliptical, 134 00:13:46,990 --> 00:13:52,140 whereas this galaxy is a spiral, but only seven out of ten people say it's a spiral. 135 00:13:52,150 --> 00:13:56,530 So we have not only a classification but an estimate of accuracy. 136 00:13:56,770 --> 00:14:02,530 That number, that vote fraction, is proportional to the probability that this really is a spiral. 137 00:14:03,190 --> 00:14:10,480 And so that data for the first time gives us a really powerful, reliable, consistent set of morphologies. 138 00:14:11,110 --> 00:14:15,520 And we went on. Galaxy Zoo originally was designed to do the simplest thing that was useful: 139 00:14:15,700 --> 00:14:18,750 is it elliptical or is it a spiral? These days, if you go to 140 00:14:18,780 --> 00:14:23,650 Galaxy Zoo, you will find a complicated decision tree that still requires no prior knowledge, 141 00:14:23,920 --> 00:14:29,050 but which takes you through the details: is there a bulge, and what shape? Are there spiral arms? 142 00:14:29,350 --> 00:14:37,270 If so, how many are there? And so on and so forth. Each piece of information encodes a different part of the galaxy's dynamical past. 143 00:14:38,110 --> 00:14:44,020 And you'll also find that we're no longer restricted to Sloan. We quickly finished classifying all of the Sloan galaxies, 144 00:14:44,320 --> 00:14:50,530 and we've now branched out to most of the large surveys that the Hubble Space Telescope has done. 145 00:14:50,530 --> 00:14:59,470 So we're able to compare local morphology, the population of galaxies we see around us, with those that existed perhaps at a redshift of one, 146 00:14:59,590 --> 00:15:01,510 a few billion years ago or so. 
147 00:15:01,510 --> 00:15:08,200 And I don't have time to talk about this in detail, but for example, if you're interested in whether galaxies have a bar at the centre, 148 00:15:08,200 --> 00:15:17,019 this linear feature, you can see in this plot, the grey is slightly poorly conceived simulation work based on a small number of simulations. 149 00:15:17,020 --> 00:15:23,229 But look at the data. This is the fraction of spiral galaxies that have a bar as a function of redshift, and in 150 00:15:23,230 --> 00:15:27,940 pink from Tom Melvin in Portsmouth and then in black from Brooke Simmons here in Oxford, 151 00:15:28,360 --> 00:15:33,459 we can show this decline in the fraction of barred galaxies over time. 152 00:15:33,460 --> 00:15:38,110 And this is interesting because various people have predicted there should be no barred galaxies out here; 153 00:15:38,110 --> 00:15:43,360 the disks of galaxies out beyond a redshift of one are dynamically hot, particularly 154 00:15:43,360 --> 00:15:48,910 in the gas. We know that from work done with KMOS, another Oxford-type project. 155 00:15:49,870 --> 00:15:53,230 And so those disks, it's perhaps surprising, maybe it's sustainable. 156 00:15:53,470 --> 00:16:02,140 But you also see this change in morphology. We're beginning to be able to do serious comparison across a very large redshift range, 157 00:16:02,140 --> 00:16:09,010 enabled by the fact that our classification scheme is the same in both cases. Because we have to take into account bias, 158 00:16:09,340 --> 00:16:13,540 we have to deal with the fact that these are distant galaxies and that faint, fuzzy, 159 00:16:13,540 --> 00:16:19,270 distant things tend to look featureless, but we can measure all of that, correct for it, and get these robust results. 160 00:16:20,290 --> 00:16:24,550 I just want to take a few minutes. I can't give you all the science highlights of Galaxy Zoo. 
161 00:16:24,970 --> 00:16:30,820 There are far too many, but I wanted to highlight one particular story that we've been following here in recent years. 162 00:16:32,620 --> 00:16:40,030 This is a nearby galaxy that rejoices in the name NGC 4395, and it's a disk galaxy. 163 00:16:40,480 --> 00:16:48,040 You'd probably call it a spiral, but what it is really is a flocculent galaxy, my favourite kind of galaxy, because it's fun to say the word flocculent. 164 00:16:48,430 --> 00:16:55,410 I recommend you all try it later. We can do it together now if you'd like to try. 165 00:16:55,540 --> 00:17:04,060 Try it in the privacy of your own homes later. But what you'll notice is that this thing is missing a classic component of a typical spiral galaxy. 166 00:17:05,440 --> 00:17:10,299 Patrick Moore used to give public talks in which he described the Milky Way as two 167 00:17:10,300 --> 00:17:14,260 fried eggs clapped back to back, an experiment I don't recommend. 168 00:17:14,560 --> 00:17:21,280 But the point being that you've got the disk and then you have this bulge of older stars at the centre, and this is a truly bulgeless galaxy. 169 00:17:22,510 --> 00:17:26,860 It's one of two things that are strange about it. One thing that's odd there is that it's bulgeless. 170 00:17:27,160 --> 00:17:31,120 The second thing that's strange is the black hole at its centre. 171 00:17:31,540 --> 00:17:35,379 I'm going to call it a supermassive black hole, but it barely qualifies for that title. 172 00:17:35,380 --> 00:17:43,570 It's only about 300,000 solar masses, about a factor of ten lower than you'd expect for a galaxy of this mass. 173 00:17:45,160 --> 00:17:48,670 So could these things be connected? Well, yes, there's a nice story here. 174 00:17:48,670 --> 00:17:54,070 So the fact that it's bulgeless tells you that this galaxy is guaranteed merger free. 
175 00:17:55,120 --> 00:18:01,390 So the simulators and the dynamics tell us that if you have any even minor merger, 176 00:18:01,480 --> 00:18:06,340 maybe something with a tenth of the mass of this galaxy comes in and collides with it, 177 00:18:06,640 --> 00:18:11,020 then that inevitably kicks stars out of the disk and into a bulge. 178 00:18:11,410 --> 00:18:20,440 And so a bulgeless galaxy is a laboratory for what happens if you let a galaxy evolve in isolation instead of in the presence of multiple mergers. 179 00:18:21,220 --> 00:18:30,250 Now, one can speculate, and people have made good arguments, that it's the merging of different galaxies that 180 00:18:30,250 --> 00:18:35,980 drives both the growth of the galaxies themselves and the growth of the black holes at the centre, 181 00:18:36,160 --> 00:18:41,860 and that this explains the tight relation that we see between the mass of a galaxy and the mass of its black hole. 182 00:18:41,860 --> 00:18:47,320 So the fact that this is bulgeless and has a low-mass black hole 183 00:18:47,560 --> 00:18:53,920 fits that theory rather well. But it's one galaxy. With Galaxy Zoo, we could do better than that. 184 00:18:53,920 --> 00:18:57,070 We ask the question, how prominent is the central bulge? 185 00:18:57,460 --> 00:19:01,000 We could pick those for which people say there's no bulge or it's just noticeable. 186 00:19:01,690 --> 00:19:03,820 We actually then, for this particular study, 187 00:19:04,690 --> 00:19:12,700 restrict ourselves only to those where we're utterly convinced there's no bulge and which have currently growing black holes, 188 00:19:13,180 --> 00:19:20,500 AGN. So these are bulgeless galaxies, merger free, which are currently active. 189 00:19:20,860 --> 00:19:25,030 And you can see that because they have point sources at the centre. So these aren't small bulges. 
190 00:19:25,030 --> 00:19:32,920 If you do the modelling, these things are just the gas around the accreting black hole. 191 00:19:33,370 --> 00:19:40,299 Now we'll get rid of two of them because those are mergers about to start, and then we can look at the properties of these galaxies. 192 00:19:40,300 --> 00:19:45,720 So here they are. This is the black hole mass and this is the bulge stellar mass. 193 00:19:45,730 --> 00:19:48,910 And we can directly measure the black hole mass for two of these systems. 194 00:19:49,870 --> 00:19:58,540 The rest we can put limits on by assuming that they're accreting at the Eddington luminosity, which is hard but not always about limit. 195 00:20:00,170 --> 00:20:02,210 That's a little unfair to these guys. 196 00:20:02,690 --> 00:20:09,950 And if you measure the accretion rate, or the luminosity rather, from these two and you assume that that's typical of the whole population, 197 00:20:10,490 --> 00:20:14,090 well, these things move and you get a picture that looks like this. 198 00:20:14,360 --> 00:20:18,950 And this is, I think, the least surprising graph I could possibly show. 199 00:20:19,160 --> 00:20:24,739 We've selected galaxies because they're bulgeless and we find that for a given black hole mass... 200 00:20:24,740 --> 00:20:28,550 Sorry, this black stripe is normal galaxies, from Häring and Rix. 201 00:20:29,690 --> 00:20:35,930 We selected these galaxies because they're bulgeless. We find that for a given black hole mass, they have smaller bulges than they should do. 202 00:20:35,960 --> 00:20:40,250 This is not an exciting result. But if I plot, instead of the black hole mass, 203 00:20:40,970 --> 00:20:44,510 sorry, the bulge mass, the total mass of the galaxy, 204 00:20:44,840 --> 00:20:52,340 these things look normal. So if the bulge is taken out of play and all you care about is the evolution of the galaxy, 
205 00:20:52,610 --> 00:21:01,370 it's the first galaxy that I showed that's the anomaly, and we can grow perfectly normal galaxies with normal-sized black holes without mergers, 206 00:21:02,240 --> 00:21:09,050 at least in these cases. We have a huge programme of work to try and understand whether these are freaks and unusual in some way 207 00:21:09,380 --> 00:21:13,820 or whether this is a mode of galaxy and black hole growth that we need to care about more generally. 208 00:21:15,560 --> 00:21:21,710 But it starts with morphology. But let's return to the fundamental thing again. 209 00:21:22,000 --> 00:21:26,479 Galaxies. Those are details. You know, we talked about bars and bulges, 210 00:21:26,480 --> 00:21:31,459 but the fundamental morphological difference in the population of galaxies is: are they spiral or elliptical? 211 00:21:31,460 --> 00:21:39,650 And all sorts of properties correlate with these two. Spirals tend to, but not exclusively, be star forming; there are lots of red spirals. 212 00:21:39,950 --> 00:21:46,100 Ellipticals tend to be old, red and dead, devoid of both the gas and of the new stars that might form from that gas, 213 00:21:46,400 --> 00:21:49,580 although we have a whole load of blue ellipticals that I could tell you about as well. 214 00:21:50,360 --> 00:21:54,380 But splitting the population into these two systems helps you understand it. 215 00:21:55,190 --> 00:21:58,849 So in the top left here, I like to think of 216 00:21:58,850 --> 00:22:02,149 this as the Hertzsprung-Russell diagram of galaxy evolution. 217 00:22:02,150 --> 00:22:06,460 It's how we understand these things. So this is mass versus colour. 218 00:22:06,470 --> 00:22:13,880 So blue galaxies are down here, red up here, small galaxies, big galaxies, or if you prefer, less luminous and more luminous galaxies. 219 00:22:14,330 --> 00:22:17,629 And then in the contours you see the two features of this landscape. 
220 00:22:17,630 --> 00:22:25,280 We see the blue cloud, with normal galaxies, spiral galaxies, and the red sequence, composed primarily of ellipticals. 221 00:22:25,970 --> 00:22:31,450 It's kind of interesting that the Milky Way is in a strange place on this diagram where green and the green, 222 00:22:31,460 --> 00:22:37,640 but you can see and then what I've plotted in colour here in this slightly making green sorry about 223 00:22:37,640 --> 00:22:44,240 that is the fraction of galaxies at that point in the diagram that have actively growing black holes. 224 00:22:44,990 --> 00:22:52,730 And so you can see, if you looked at all galaxies, you would conclude it was the intermediate-mass galaxies where the action was happening. 225 00:22:52,820 --> 00:22:57,140 But if we split into, up in the top right, elliptical galaxies, 226 00:22:57,230 --> 00:23:05,780 early types, you see it's the least massive that in the present day have AGN, and if you look at late-type spirals you see it's the most massive spirals. 227 00:23:06,470 --> 00:23:12,410 So the picture you talk about is different depending on whether you're looking at ellipticals or spirals. 228 00:23:13,190 --> 00:23:15,020 And we could see this in other places as well. 229 00:23:15,260 --> 00:23:23,749 Kevin Schawinski, of the Kevin-week, wrote a paper last year with the Galaxy Zoo team and data which rejoices in the title 230 00:23:23,750 --> 00:23:31,970 "The Green Valley is a Red Herring". Now there's literature that shows that silly titles don't get more citations, 231 00:23:32,360 --> 00:23:36,470 but this thing has been cited more than 30 times in its first six months, so it must be a very good paper. 232 00:23:36,830 --> 00:23:41,540 Um, and what you can see is a sort of equivalent of the diagram I've just been showing. Instead of mass, 233 00:23:41,540 --> 00:23:47,210 this is colour; we've got colour here, blue, red, and this is now an ultraviolet colour. 
234 00:23:47,690 --> 00:23:52,340 So this is brightest in the ultraviolet, faintest in the ultraviolet, brightest in the ultraviolet. 235 00:23:52,940 --> 00:23:57,260 And you see once again that we have this nice bimodal distribution. 236 00:23:58,040 --> 00:24:01,910 And the question that Kevin is trying to answer derives from the previous plot. 237 00:24:02,300 --> 00:24:05,720 He's trying to work out how galaxies cross the Green Valley, 238 00:24:05,930 --> 00:24:10,760 this region with relatively few galaxies that lives between the blue cloud and the red sequence. 239 00:24:11,870 --> 00:24:16,670 And in particular, he's interested in the question of how quickly galaxies could cross. 240 00:24:18,020 --> 00:24:22,370 So what he does is he sets up a toy model. He says, okay, these galaxies have a constant star formation rate. 241 00:24:22,970 --> 00:24:32,000 And then something quenches star formation by some exponential process, and the speed with which quenching can happen is left as a free parameter. 242 00:24:32,420 --> 00:24:36,350 And then he thinks, okay, let's think about what that would do. 243 00:24:36,800 --> 00:24:42,560 If all your galaxies start blue and then you quench them, the big blue stars start dying off. 244 00:24:42,560 --> 00:24:48,500 You're not replacing them with star formation because of the quenching, and you find they follow these tracks. And in a sensible amount of time, 245 00:24:48,920 --> 00:24:54,620 in the 4 billion years since he locates this quenching, you find that if you have a very rapid quench, 246 00:24:54,620 --> 00:24:59,270 they get all the way up here and they're red in optical colour and they're red in the ultraviolet as well. 247 00:24:59,740 --> 00:25:06,910 Whereas if you have slow processing, they stay blue. Sometimes we can do good science with very simple models. 248 00:25:07,390 --> 00:25:12,820 And then he says, okay, let's look at where green spirals and green ellipticals are. 
249 00:25:13,660 --> 00:25:19,930 You see that here? These are the green spirals. They're a bad fit with this rapid quenching model. 250 00:25:19,960 --> 00:25:25,120 The rapid quenching model leaves you with galaxies that are too red rather than in the green. 251 00:25:25,690 --> 00:25:29,710 And so it's a poor fit for the spirals. So spirals must quench only slowly. 252 00:25:30,820 --> 00:25:36,040 The ellipticals in orange up here can be fit by those fast quenching models, but not by the slow ones. 253 00:25:36,040 --> 00:25:37,449 So ellipticals quench faster. 254 00:25:37,450 --> 00:25:45,310 So to Kevin, the green valley is a red herring because it's just the tails of these two distributions: slow quenching spirals, 255 00:25:45,520 --> 00:25:48,670 rapidly quenching ellipticals. Another difference in morphology. 256 00:25:50,830 --> 00:25:54,340 It's a good result. It's an interesting result. It's inspired a lot of work. 257 00:25:54,610 --> 00:26:00,189 But slightly embarrassingly, in this paper we do this by eye: we look at this diagram and we say, 258 00:26:00,190 --> 00:26:04,719 "Yeah, that's a good fit." And that's not really good enough any more, so to do it properly, 259 00:26:04,720 --> 00:26:08,770 you ask a PhD student to do it. Becky Smethurst, who works with me, 260 00:26:08,770 --> 00:26:18,090 has just fitted a full Bayesian MCMC model of galaxy quenching to the entire Galaxy Zoo population, using incidentally that fuzzy logic, 261 00:26:18,100 --> 00:26:24,190 the fact that we know that this is a spiral galaxy, but only seven out of ten people said it was a spiral. 262 00:26:24,190 --> 00:26:33,009 So using that to quantify how strong each galaxy's evidence is. And this is a plot of the time at 263 00:26:33,010 --> 00:26:39,249 which we expect quenching, from the present day back to the beginning, or near to the Big Bang. 264 00:26:39,250 --> 00:26:42,340 We have a strike. We have strategy and physical models.
265 00:26:42,790 --> 00:26:45,969 But don't let that distract you. And then the degree of quenching, from slow 266 00:26:45,970 --> 00:26:51,820 to rapid quenching. And the point here is just to point out that for green galaxies, blue galaxies, 267 00:26:51,820 --> 00:26:58,479 red galaxies, for disk galaxies and for smooth galaxies, a wide variety of quenching rates exist. 268 00:26:58,480 --> 00:27:02,590 So both ellipticals and spirals have slow and rapid quenching. 269 00:27:03,160 --> 00:27:08,140 Which brings me to Brian May. Rock star, resolutely unquenched, 270 00:27:08,290 --> 00:27:12,520 even after 30 years. I've only just thought of that. So glad. I thought it would get a laugh, but it hasn't. 271 00:27:12,610 --> 00:27:18,099 So the only thing I can do now is make this sort of a meta joke between me and you as the audience. 272 00:27:18,100 --> 00:27:20,710 So thank you for going with that. Brian, 273 00:27:21,460 --> 00:27:31,240 as many of you know, was a Ph.D. student at Imperial College working on the zodiacal light, on dust in the universe, dust in the solar system 274 00:27:31,240 --> 00:27:36,190 rather, when his rock career took off. He took a 30 year break, and at that point, 275 00:27:36,190 --> 00:27:41,770 Imperial offered him an honorary degree, which he turned down on the grounds that he wanted to finish his thesis. 276 00:27:43,690 --> 00:27:46,749 And I've been known to say at this point... this is being recorded, 277 00:27:46,750 --> 00:27:51,579 so I am not saying that it helps, if you want to take a 30 year break, if you work on something so boring 278 00:27:51,580 --> 00:27:58,510 that it's still there when you come back 30 years later. Actually, there were some new observational techniques.
279 00:27:58,510 --> 00:28:04,650 And come on, Brian was actually able to go back and do more observing and do a longitudinal study over 30 years of the behaviour of 280 00:28:04,670 --> 00:28:08,980 the zodiacal light, which no one else could have done because no one else can fund themselves for 30 years. 281 00:28:09,340 --> 00:28:20,649 And he's got more Nature papers than me. But as he was getting back into this... yeah, he has, look it up. I've got more citations, though. 282 00:28:20,650 --> 00:28:24,340 That's fine. Anyway, it's true. 283 00:28:25,270 --> 00:28:30,150 At the time he was getting back into this, I was writing a popular book with Brian, 284 00:28:30,340 --> 00:28:39,010 and I taught him how to use astro-ph and the arXiv, because he set off for the library on his first day back in research. 285 00:28:39,370 --> 00:28:45,730 And so in return he promoted Galaxy Zoo, creating a permanent bias in the musical taste of our classifiers. 286 00:28:46,060 --> 00:28:50,110 This is a Dutch schoolteacher called Hanny van Arkel, who likes Brian's music. 287 00:28:50,110 --> 00:28:58,330 She's got the same guitar. She discovered this thing. She's become a poster child for the other real advantage of having humans look at your data. 288 00:28:58,630 --> 00:29:04,990 So we've dealt with the fact that sometimes you need to scale to millions of people to cope with the size of modern data sets. 289 00:29:05,530 --> 00:29:10,000 But humans are also distractible, wonderfully distractible. 290 00:29:10,280 --> 00:29:16,600 We have this almost unique ability to be doing a routine task and then be brought up short by the unusual and unexpected. 291 00:29:17,080 --> 00:29:26,080 And this is surprisingly hard to program. You can build anomaly detection routines, but they get distracted mostly by common anomalies. In Galaxy Zoo,
292 00:29:26,290 --> 00:29:28,150 you find an awful lot of satellite trails, 293 00:29:28,570 --> 00:29:36,190 but to build a classifier capable of doing a good job of classifying galaxies and then stopping and saying not just that this is a spiral galaxy, 294 00:29:36,430 --> 00:29:44,739 but that there's a blue blob here, is rather hard. This blue blob acquired the name of the Voorwerp; we thought it was a technical term. 295 00:29:44,740 --> 00:29:50,229 It turns out to be Dutch for object or thingy, but Monthly Notices let us print it. 296 00:29:50,230 --> 00:29:51,970 So this is now the official name of this object. 297 00:29:52,600 --> 00:29:59,130 And this was a return to what I thought astronomers did when I was playing around with my small telescope, because we found 298 00:29:59,200 --> 00:30:02,230 a weird thing. And then we pointed lots of telescopes at it. 299 00:30:02,590 --> 00:30:06,250 And when we did that, we discovered, first of all, that it's at the same redshift as this galaxy. 300 00:30:06,760 --> 00:30:10,630 So it's distant and it's large. We discovered that it's hot. 301 00:30:11,350 --> 00:30:14,650 It's about 50,000 kelvin, the gas in the Voorwerp. 302 00:30:14,920 --> 00:30:18,309 And it contains no stars, or at least no OB stars, 303 00:30:18,310 --> 00:30:28,950 no bright stars. And so we suspected that this was being heated by a jet coming from activity at the nucleus of this galaxy. 304 00:30:28,960 --> 00:30:35,620 So you do see activity in spiral galaxies and we do sometimes see spectacular jets. 305 00:30:35,650 --> 00:30:43,090 Here's a discovery from the end of last year made by another Galaxy Zoo user on a version of the project that's looking at radio data. 306 00:30:44,020 --> 00:30:50,890 So the idea was that there would be jets of material that come out here. There's cold gas either side of the Voorwerp.
307 00:30:50,920 --> 00:30:57,630 We know that from radio, and that this is excited by the impact of the jets. The spectrum of the Voorwerp 308 00:30:57,650 --> 00:31:01,300 is consistent with it being shock heated, which would fit. 309 00:31:02,440 --> 00:31:09,490 The only problem is that when you look at this galaxy with an X-ray telescope, you discover that there's almost no source at all. 310 00:31:09,820 --> 00:31:14,680 So there isn't a luminous enough source, even with ridiculous obscuration by dust, 311 00:31:15,070 --> 00:31:19,390 to provide powerful enough heating of this amount of gas right now. 312 00:31:20,390 --> 00:31:26,590 And so the conclusion that we came to was that the Voorwerp was a light echo. 313 00:31:26,620 --> 00:31:34,000 It was telling us about the state of the galaxy 50,000 years ago, because there's a dogleg in here of about 50,000 light years. 314 00:31:34,510 --> 00:31:38,499 And so, in other words, 50,000 years ago, if we looked at the system, 315 00:31:38,500 --> 00:31:42,210 we would see a bright AGN. It would be magnitude eight in the visible. 316 00:31:42,220 --> 00:31:48,370 You'd be able to see it in binoculars, and it would, in fact, be the nearest quasar to the Milky Way. 317 00:31:48,880 --> 00:31:53,560 But in the last 50,000 years, it's shut down and it's gone from active to quiet, 318 00:31:53,560 --> 00:31:58,870 something that we knew happened, but which was very difficult to catch in the act. 319 00:31:59,770 --> 00:32:01,780 And so this is great fun. It's an interesting story. 320 00:32:01,780 --> 00:32:08,019 We've got a new way of studying the shutdown of an AGN because we can read off the history of about 50,000 321 00:32:08,020 --> 00:32:13,960 years' worth of the radiation from this black hole by looking at different distances along the Voorwerp. 322 00:32:16,520 --> 00:32:21,770 But the explanation smelled funny, because why would the nearest quasar just shut down?
323 00:32:22,760 --> 00:32:28,400 Unless this happens all the time. And so these are the Voorwerpjes, which is the diminutive of Voorwerp. 324 00:32:28,730 --> 00:32:30,410 It's amazing what you learn when doing this stuff. 325 00:32:30,650 --> 00:32:38,670 And all the blue things you can see in these images are AGN-ionised clouds in a variety of Sloan galaxies, some with a powerful AGN. 326 00:32:39,290 --> 00:32:47,180 But a third of them are not. And so we think a third of these systems have shut down within the last 20,000 to 100,000 years. 327 00:32:47,540 --> 00:32:52,429 And we have a Hubble Space Telescope program to look at them and to try and work out 328 00:32:52,430 --> 00:32:57,290 the geometry of what is actually quite a strange and fascinating set of objects. 329 00:32:57,290 --> 00:33:01,790 The complexity of these images is both astounding and deeply annoying, 330 00:33:02,090 --> 00:33:05,540 because if you've got a really simple geometry, you can read off the history much more easily. 331 00:33:06,050 --> 00:33:13,930 But instead, we've got this mess. Now, hopefully by this point, I've convinced you that I think galaxies are interesting. 332 00:33:13,940 --> 00:33:22,579 That's the subtext. But also that getting people involved in this sort of crowdsourcing, what we call citizen science, has benefits, 333 00:33:22,580 --> 00:33:27,020 not just because of scale, but because of the potential for serendipity as well. 334 00:33:29,000 --> 00:33:33,380 So the next thing you might wonder is how else you can apply this. This is obviously an interesting method. 335 00:33:33,890 --> 00:33:38,120 It's also free and it also has huge outreach potential. 336 00:33:38,120 --> 00:33:41,390 So what else can we do? First question: are we going to run out of people? 337 00:33:41,840 --> 00:33:46,430 Is that a serious concern that we should worry about? And my argument is that it isn't.
338 00:33:47,120 --> 00:33:48,829 Every time you talk about an Internet project, 339 00:33:48,830 --> 00:33:55,310 you have to mention a guy called Clay Shirky, who writes annoyingly bestselling books about how the Internet is changing everything. 340 00:33:55,550 --> 00:34:05,180 This is my Clay Shirky slide. The big box shows the 200 billion hours American adults spent watching television in 2009. 341 00:34:05,600 --> 00:34:09,200 And the small box is the 100 million hours it took to create Wikipedia. 342 00:34:10,430 --> 00:34:15,650 And, if you want to be facetious about it, Angry Birds uses 16 years of human attention every hour. 343 00:34:16,400 --> 00:34:21,620 It would be much better for science if Angry Birds was the study of avian behaviour rather than just a game. 344 00:34:21,620 --> 00:34:33,799 So we only need a tiny fraction of the attention that's out there to be used for good to get huge processing power, so we can scale this. 345 00:34:33,800 --> 00:34:39,440 We can, if we want to, without harming the progress of astrophysics, have people look at penguins. 346 00:34:40,520 --> 00:34:47,299 So one of the things that happened to me once we started doing this was that other researchers would sidle up to me or call or email and say, 347 00:34:47,300 --> 00:34:52,340 "You know, your people do like looking at galaxies. Do you think they'd look at my data as well?" 348 00:34:53,120 --> 00:35:00,830 And in most cases, the answer is yes, because the people who looked at galaxies were motivated by the desire to contribute to science. 349 00:35:01,070 --> 00:35:06,260 And so this is an image from a project run out of the Department of Zoology here in Oxford. 350 00:35:06,620 --> 00:35:13,759 They used to visit remote parts of the Antarctic by yacht, and I'm told the yacht is not as luxurious as that makes it sound, 351 00:35:13,760 --> 00:35:16,940 in the same way that when I say I'm going to Hawaii, it doesn't mean I'm going to the beach.
352 00:35:18,920 --> 00:35:23,130 Sometimes I go to the beach. They used to visit to count penguins. 353 00:35:23,180 --> 00:35:27,020 Penguin colonies are sensitive to climate change, so this is interesting. 354 00:35:27,020 --> 00:35:29,270 And they revisit the same colony every year or two. 355 00:35:30,320 --> 00:35:37,580 Now they can buy a cheap camera for 100 quid, leave it out, and it will take pictures every 5 minutes or 5 hours. 356 00:35:37,580 --> 00:35:39,950 And these cameras survive the Antarctic winter. 357 00:35:40,250 --> 00:35:47,150 So now they've still got one Ph.D. student, but instead of a notebook filled with penguin counts, they come back with hard drives full of images. 358 00:35:47,900 --> 00:35:52,309 So the task here is to count the number of penguins. Again, machine learning 359 00:35:52,310 --> 00:35:56,330 can't do a very good job of this. Anyone want to tell me what the answer is? 360 00:35:58,450 --> 00:36:01,720 Three? Three. It depends whether you count integer penguins or not. 361 00:36:01,840 --> 00:36:07,060 I agree. I agree that three is an upper limit, but you'll be amazed how often people check out decimals. 362 00:36:07,660 --> 00:36:11,080 All right, so you're warmed up. So the next one is a speed test. 363 00:36:11,590 --> 00:36:15,160 So I want the first person to count the penguins in the next image. Just call out the answer. 364 00:36:19,700 --> 00:36:21,530 I'm just going to the pub while you work that out. 365 00:36:21,740 --> 00:36:29,150 So the point is, one of the interesting things about this task is that there's a wide variety of difficulties. 366 00:36:29,390 --> 00:36:37,610 This is a more normal image. And here's the solution. We built a project called Penguin Watch, which lets you count penguins in your spare time. 367 00:36:39,320 --> 00:36:44,030 Surprisingly enough, this has proved rather popular.
More than a million penguins have been counted, 368 00:36:44,420 --> 00:36:50,660 and I feel the need to point out that we considered but rejected the name Penguin Hunters for this project, 369 00:36:51,470 --> 00:36:57,080 although I think that would have been still more popular. And we can go from penguins to particle physics. 370 00:36:58,010 --> 00:37:01,040 Not. And obviously it's not funny. Penguins are funny. Particle physics? 371 00:37:01,040 --> 00:37:05,320 No. Yeah. Maybe I need to. Subterranean physics jokes. 372 00:37:06,140 --> 00:37:09,350 This is a project called Higgs Hunters. This is data from ATLAS, right. 373 00:37:09,350 --> 00:37:12,620 But this is data from ATLAS from a very special source. 374 00:37:12,890 --> 00:37:20,090 So you're seeing a cross-section through the LHC here, of course, a collision and a particle cascade coming from a collision. 375 00:37:20,510 --> 00:37:27,530 Most of the LHC data doesn't hit a hard drive; as I'm sure lots of you know, it's saved if it satisfies a series of triggers. 376 00:37:28,220 --> 00:37:36,230 And in the detritus, in the waste that doesn't hit those triggers, is mostly a lot of routine stuff, some nonsense, 377 00:37:37,010 --> 00:37:46,610 and just possibly a few Nobel Prize winning discoveries, because anything really unexpected would end up missing the triggers as well. 378 00:37:46,650 --> 00:37:50,600 I think this is almost the first project that Joe proposed we build 379 00:37:51,200 --> 00:37:55,610 when I came to talk to him about Galaxy Zoo, so we finally did it, which is good. 380 00:37:56,450 --> 00:38:00,860 And the point about this is that it's here for two reasons. One is it's kind of interesting. 381 00:38:01,640 --> 00:38:07,520 It really gives you a sense of what the ATLAS data is like. But the other thing is it's a test of how well we understand our community.
382 00:38:08,030 --> 00:38:13,700 The many volunteers we have on our projects have told us that they do these projects because they want to help science. 383 00:38:16,030 --> 00:38:22,770 This is an incredibly long shot. It's scientifically useful, but this project will almost certainly fail. If it succeeds, 384 00:38:22,780 --> 00:38:28,690 then there will be Nobel Prizes. And on the first day that we launched, I got this email from Alan. 385 00:38:29,290 --> 00:38:34,780 I'll look if Alan's here; I saw him earlier. I think he's escaped. And so this was about this image that you just saw: "Who's messing with us? 386 00:38:35,110 --> 00:38:38,889 We've got a jet of muons, which is a feature predicted in some beyond-the-Standard-Model theories, 387 00:38:38,890 --> 00:38:42,879 but it's never been seen in the wild. I don't remember us putting things like this into the simulation. 388 00:38:42,880 --> 00:38:46,209 Is this real?" So I thought, in an hour, we've 389 00:38:46,210 --> 00:38:49,830 solved this problem. It turns out this is actually just punch-through. 390 00:38:49,840 --> 00:38:52,540 These are normal particles that have escaped. It's not a jet of muons. 391 00:38:53,260 --> 00:38:57,760 We're still looking for Nobel Prizes, but the project is quite popular and I recommend you give it a go. 392 00:38:57,970 --> 00:39:01,140 And you've got the idea by now that we can try this on all sorts of things. 393 00:39:01,150 --> 00:39:04,209 Probably the furthest distance we've travelled from galaxies 394 00:39:04,210 --> 00:39:09,370 is a project led by Brooke Simmons, who also led the bulgeless galaxy work I talked about. 395 00:39:09,940 --> 00:39:11,769 This is the Planetary Response Network.
396 00:39:11,770 --> 00:39:23,800 We have a partnership with providers of satellite imagery to react to a disaster, like a typhoon hitting a developed area, 397 00:39:24,100 --> 00:39:32,050 to provide rapid assessment of the damage and destruction to teams on the ground who need this kind of information. 398 00:39:32,350 --> 00:39:35,820 So we're testing that now, if you want to join in. There's a go around, 399 00:39:35,830 --> 00:39:39,580 if you go and test. We're testing it with historical data because clearly, in this circumstance, 400 00:39:39,850 --> 00:39:44,200 you really need to demonstrate that you've got your data analysis right before you deploy the system. 401 00:39:44,920 --> 00:39:49,210 All of these things live in an umbrella site called the Zooniverse, 402 00:39:50,020 --> 00:39:56,950 a word I hope you've heard. I was reminded by Rob Simpson the other day that 42 Zooniverse projects have launched. 403 00:39:56,950 --> 00:40:01,780 They all lots of them have dice with the logos can see there's a sort of acceleration and then a fairly steady pace. 404 00:40:02,500 --> 00:40:09,850 This has felt very rapid, and you can also detect in this graph times where we've made changes. 405 00:40:09,850 --> 00:40:17,350 So for example, this cluster of projects was built using software we built in-house that we called Juggernaut. 406 00:40:17,650 --> 00:40:24,280 And Juggernaut was a piece of software that ran one project. So you built a Galaxy Zoo Juggernaut and then you used much of the same code, 407 00:40:24,280 --> 00:40:27,910 but you built a lunar Juggernaut, and then a planet-hunting Juggernaut, and so on. 408 00:40:28,360 --> 00:40:33,669 And it turns out that by the time you get to this many projects, you spend all of your time maintaining those pieces of code. 409 00:40:33,670 --> 00:40:36,700 So you throw that out, and we built code called Ouroboros. 410 00:40:37,570 --> 00:40:38,110 Don't worry about that.
411 00:40:38,590 --> 00:40:47,890 And Ouroboros is one app that runs most of our projects. That works well, but we were getting stuck at the limit of about 20 projects. 412 00:40:48,280 --> 00:40:58,149 And so the difference between penguins and planets, and penguins and particle physics, is small enough that you can further generalise the problem. 413 00:40:58,150 --> 00:41:06,550 So we have a new code called Panoptes. Panoptes is a whole set of services that will allow people to build and run their own projects. 414 00:41:07,090 --> 00:41:14,710 So instead of coming to us and having us develop a new project, what we'll do instead, for projects where the interface is simple, 415 00:41:15,250 --> 00:41:21,790 is you'll just come along as a scientist, upload your data, choose from some options about what questions you want to ask, 416 00:41:21,790 --> 00:41:23,530 what tasks you want to assign people, click a button, and 417 00:41:23,610 --> 00:41:28,899 you have your project live on the web. We're a month or two or three... I'm trying to see which developers are in the room. 418 00:41:28,900 --> 00:41:34,210 They're not here. So we're only about a month away from being able to put this live. 419 00:41:34,510 --> 00:41:42,819 If you have lots of data in the form of images or video or perhaps even sound that it would benefit you to have people look at, please, 420 00:41:42,820 --> 00:41:48,399 please come and talk to us. We need test cases for this stuff, and we hope that the Zooniverse will go to hundreds or 421 00:41:48,400 --> 00:41:55,060 thousands of projects, because we don't have to worry yet about running out of people, 422 00:41:55,870 --> 00:41:59,949 because the more projects we do, we can treat each one as a natural experiment. 423 00:41:59,950 --> 00:42:02,559 And we've done easy projects and difficult projects. 424 00:42:02,560 --> 00:42:07,030 We've done projects where it takes 2 seconds to look at an image and projects where it takes 10 minutes.
425 00:42:07,360 --> 00:42:12,760 We've done projects where the images are beautiful and we've done projects where people look at graphs for fun. 426 00:42:14,340 --> 00:42:18,819 It fascinates me: the beauty of the images doesn't seem to matter, which is kind of reassuring, 427 00:42:18,820 --> 00:42:22,720 and backs up this idea that people are doing this because they want to help science. 428 00:42:23,350 --> 00:42:29,620 We also see general patterns emerging. So this is a graph showing the contributors to one particular project. 429 00:42:29,620 --> 00:42:34,629 I think this is one of the incarnations of Galaxy Zoo, in which each box is a person. 430 00:42:34,630 --> 00:42:37,450 The colours are meaningless other than for the purposes of making it look nice. 431 00:42:37,810 --> 00:42:44,230 And then you can see we get a huge number of classifications from these people over here on the left, each of whom does a huge amount of work. 432 00:42:45,220 --> 00:42:51,550 But we also get a lot of classifications from people who only breeze by. This is common to almost all of our projects, 433 00:42:51,820 --> 00:42:55,600 and we deliberately design so that both of these sets of people are useful. 434 00:42:55,930 --> 00:43:00,490 If you rely just on the experts, you never create enough people to become experts. 435 00:43:00,820 --> 00:43:04,090 If you rely only on these people, you miss out on a lot of the stuff. 436 00:43:04,600 --> 00:43:12,100 And so we now have this very harmonious situation in which machines collect the data and 437 00:43:12,100 --> 00:43:15,270 we have enough humans, with the help of our citizen scientists, to work through it. 438 00:43:15,680 --> 00:43:18,680 Machine and human always worked together really well. 439 00:43:21,730 --> 00:43:28,180 Sorry. Wrong picture. This is my version of that nightmare, because this harmonious situation isn't going to last. 440 00:43:28,240 --> 00:43:40,270 This is the Large Synoptic Survey Telescope.
LSST, of which Oxford is an important member, is as big as the biggest telescopes today, more or less. 441 00:43:40,720 --> 00:43:46,870 But it's a survey telescope. So, the scientists, we're arguing about how it will move, but it'll scan the whole sky roughly every three nights. 442 00:43:47,230 --> 00:43:51,310 It will produce something like three terabytes of data a night. 443 00:43:52,090 --> 00:44:00,459 And another way of understanding that is to subscribe each of you to the LSST alert service, which will send you a message 444 00:44:00,460 --> 00:44:07,180 whenever anything in the LSST field changes. You'll get about two and a half million text messages a night. 445 00:44:08,710 --> 00:44:12,880 There's an awful lot of stuff there and it's real. This is the top of a mountain being flattened, 446 00:44:13,120 --> 00:44:16,230 flattened for LSST to turn up. The mirror exists. 447 00:44:16,240 --> 00:44:27,940 This project will start commissioning in 2018. Now if you want to study the common and the predictable, this isn't too much of a problem. 448 00:44:27,940 --> 00:44:35,049 So if you want to study, for example, conventional Type Ia supernovae, somebody, some grad students and postdocs, 449 00:44:35,050 --> 00:44:39,129 somewhere some team of people will build a service that looks through those 450 00:44:39,130 --> 00:44:42,940 alerts and provides you with more Type Ia supernovae than you can possibly use. 451 00:44:44,200 --> 00:44:45,910 We'll throw some out too, but that's okay. 452 00:44:46,120 --> 00:44:54,849 But if you want to understand the unusual, if you want to be able to react in real time to truly unpredicted things, 453 00:44:54,850 --> 00:45:00,670 we're going to need to keep people in that loop of classification. We'll have to use computers too. 454 00:45:00,850 --> 00:45:03,219 We have to get better, more efficient, at using the people.
455 00:45:03,220 --> 00:45:06,700 So that's what we've been working on and that's what I want to talk about in my last few minutes. 456 00:45:07,360 --> 00:45:10,390 As an example, I'm going to use a project called Space Warps. 457 00:45:10,810 --> 00:45:17,200 There are three science PIs: Aprajita Verma is one of them, who's here in this department; 458 00:45:17,200 --> 00:45:19,599 Phil Marshall, who many of you will know. Space Warps 459 00:45:19,600 --> 00:45:26,680 is a search for gravitational lenses, for these distant galaxies whose light has been bent by passage in front of a nearby galaxy. 460 00:45:26,920 --> 00:45:31,210 And the nice thing about this is it's really easy to calculate what these things should look like. 461 00:45:31,540 --> 00:45:36,760 And so we can put fake, but let's call them simulated, galaxies into the system. 462 00:45:37,150 --> 00:45:40,150 And if you see one of these and you catch it, then we say, yes, well done, 463 00:45:40,720 --> 00:45:45,670 you caught a simulated lens, and you get some feedback and you gain confidence in your classifications. 464 00:45:46,780 --> 00:45:51,400 And then to really test the system, we found a use for a particle physicist. 465 00:45:51,790 --> 00:45:57,850 Sorry. This is Brian Cox standing in front of the Lovell Telescope at Jodrell. 466 00:45:58,210 --> 00:46:02,620 We partnered with his programme Stargazing Live to promote this project. 467 00:46:02,830 --> 00:46:07,450 And this is what happened when Brian Cox told people to go and look at Space Warps. 468 00:46:08,020 --> 00:46:13,940 This is a million classifications arriving within an hour, so it's a million images that have been sorted through. 469 00:46:14,710 --> 00:46:17,860 We can really do this at scale.
And for each of these people, 470 00:46:18,250 --> 00:46:25,569 we get a sense of confidence about them because we see how they behave with these simulated images. On the x axis here, 471 00:46:25,570 --> 00:46:27,879 you've got the probability that somebody says there's a lens 472 00:46:27,880 --> 00:46:32,740 if there is a lens, and on the y axis you've got the probability that 473 00:46:33,280 --> 00:46:34,840 they say there isn't a lens when there isn't one. 474 00:46:35,320 --> 00:46:40,209 So you can see most people are up here on the top right. This is 50 randomly selected people. In the top 475 00:46:40,210 --> 00:46:44,320 right, people do quite well; our astute users are here. In the bottom left, 476 00:46:44,320 --> 00:46:50,050 we've got some people doing less well, although note that if you know that somebody is wrong all the time, 477 00:46:50,590 --> 00:46:53,800 they're precisely as useful as somebody who's right all the time. 478 00:46:56,650 --> 00:47:05,200 A principle, I'm told, that is understood by senior members of this department, although they won't give me any examples as to how they apply this. 479 00:47:05,380 --> 00:47:10,510 But you can also play different games. Everyone on the right of this diagram is an optimist, right? 480 00:47:10,510 --> 00:47:13,840 They tend to say that there's something there. Everyone on the left is a pessimist. 481 00:47:14,290 --> 00:47:18,909 And so you can start to think about how you might pass tasks around. 482 00:47:18,910 --> 00:47:24,760 If I've got an image that's been seen by a few people and I've begun to be confident that there's something there, 483 00:47:25,180 --> 00:47:31,660 I might pass that deliberately to a pessimist. And if they say there's something there, then my confidence that it's real goes up dramatically.
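The two axes described above are the two entries of a per-volunteer confusion matrix, and the "pass it to a pessimist" argument is a Bayesian update. The following is a minimal sketch of that reasoning, not the Space Warps pipeline itself; the function name, argument names and the example probabilities are assumptions made for illustration.

```python
def update_lens_probability(prior, said_lens, p_lens_given_lens, p_not_given_not):
    """Bayes update of P(image contains a lens) after one volunteer's answer.

    p_lens_given_lens: P(volunteer says "lens"     | there is a lens)
    p_not_given_not:   P(volunteer says "not lens" | there is no lens)
    """
    if said_lens:
        like_lens = p_lens_given_lens
        like_not = 1.0 - p_not_given_not
    else:
        like_lens = 1.0 - p_lens_given_lens
        like_not = p_not_given_not
    evidence = like_lens * prior + like_not * (1.0 - prior)
    return like_lens * prior / evidence

# Hypothetical volunteers, both answering "lens" for a 50/50 image:
optimist = update_lens_probability(0.5, True, 0.99, 0.50)   # says "lens" a lot
pessimist = update_lens_probability(0.5, True, 0.50, 0.99)  # rarely says "lens"
print(f"after optimist: {optimist:.3f}, after pessimist: {pessimist:.3f}")

# Someone who is wrong every time is as informative as someone always right:
always_wrong = update_lens_probability(0.5, False, 0.0, 0.0)
print(f"always-wrong volunteer says 'not lens': P(lens) = {always_wrong:.1f}")
```

A "lens" vote from the pessimist moves the probability much further than the same vote from the optimist, which is why routing a promising image to a pessimist is an efficient use of their time.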
484 00:47:31,930 --> 00:47:42,190 And one can quantify this, and one can start simulating different modes of task distribution to optimise the number of classifications that you need. 485 00:47:42,850 --> 00:47:50,229 And there's definitely latency there. This is a complicated diagram from the Space Warps paper, but let's zoom in. 486 00:47:50,230 --> 00:47:54,250 Just worry about this box; sorry, I didn't have time to make a new one. On the x axis, 487 00:47:54,250 --> 00:47:57,700 this is the amount of effort. So these are people who do ten galaxies, 488 00:47:57,700 --> 00:48:02,229 a hundred, a thousand, 10,000, 100,000. And this is the skill. 489 00:48:02,230 --> 00:48:07,420 And the skill is the integrated information contributed. 490 00:48:07,420 --> 00:48:10,450 So we can actually calculate the Shannon information for each classification. 491 00:48:10,870 --> 00:48:16,810 And then we say, okay, the skill is the integrated amount of information provided by each classifier. 492 00:48:17,140 --> 00:48:20,290 And the first thing you notice is that we've done a good job in building this 493 00:48:20,320 --> 00:48:23,350 project. There's no one in the bottom right. 494 00:48:23,680 --> 00:48:27,130 So no one is doing tens of thousands of classifications badly. 495 00:48:28,750 --> 00:48:30,640 So our feedback and our training works. 496 00:48:30,910 --> 00:48:38,620 But also note that there are all these people on the top left who only do a few classifications but who are very good. At the minute, 497 00:48:38,620 --> 00:48:47,770 they see a random image, but if we could detect them, we could spend the little time we have with them on the things where we really need expertise. 498 00:48:48,190 --> 00:48:53,590 And so this is a sign that the project is not yet operating at full efficiency and we could take advantage of that.
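One way to make "information per classification" concrete is the mutual information, in bits, between the truth and a volunteer's answer. The sketch below uses that standard quantity as a stand-in; it is an assumption for illustration and may differ in detail from the definition used in the Space Warps paper. The function names and the 50/50 prior are likewise hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p of 'yes'."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def information_per_classification(p_lens_given_lens, p_not_given_not, prior=0.5):
    """Mutual information (bits) between the truth and one volunteer answer."""
    p_says_lens = prior * p_lens_given_lens + (1.0 - prior) * (1.0 - p_not_given_not)
    h_answer = binary_entropy(p_says_lens)
    h_answer_given_truth = (prior * binary_entropy(p_lens_given_lens)
                            + (1.0 - prior) * binary_entropy(p_not_given_not))
    return h_answer - h_answer_given_truth

def integrated_information(p_lens_given_lens, p_not_given_not, n_classifications):
    """Total information from a volunteer who makes n classifications."""
    return n_classifications * information_per_classification(
        p_lens_given_lens, p_not_given_not)

# A careful volunteer who only does 10 images versus a coin-flipper who does 10,000.
print(integrated_information(0.95, 0.95, 10))
print(integrated_information(0.5, 0.5, 10_000))
```

Under this measure a volunteer who answers at random contributes nothing however many images they see, while a very accurate volunteer contributes close to one bit per image, which is why the few-but-skilled people in the top left are worth identifying early.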
499 00:48:54,370 --> 00:49:00,070 What's nice is that once you start doing that, once I start deciding who should see what, 500 00:49:00,640 --> 00:49:05,440 I realise that I've just built a system that's perfect for combining 501 00:49:05,440 --> 00:49:12,010 human and machine classifiers because I can take the behaviour of a machine: 502 00:49:12,940 --> 00:49:15,489 what fraction does the machine get right as a function of brightness 503 00:49:15,490 --> 00:49:22,750 of image or something, and decide whether to pass a particular image to a machine or to a human, treating the two on an equal footing. 504 00:49:23,560 --> 00:49:27,910 And we can cope with wild fluctuations in the number of people who are around. 505 00:49:28,210 --> 00:49:33,840 And so this is the system I think we need to build for LSST, and this is what we're working through. 506 00:49:34,120 --> 00:49:37,180 But there are problems with both halves of this equation. 507 00:49:37,810 --> 00:49:42,850 There has been an enormous advance in machine learning in the last ten years. 508 00:49:43,240 --> 00:49:48,520 You know this because voice recognition basically works on your iPhone, right? 509 00:49:49,270 --> 00:49:54,160 Or systems that recognise your scribbled handwriting on a tablet work pretty well. 510 00:49:55,750 --> 00:50:00,210 This is a schematic diagram that I don't want you to read really, but the point is that, you know, 511 00:50:00,400 --> 00:50:08,260 most of this work has been done through developments based on old-style neural nets that involve what's called deep learning. 512 00:50:08,980 --> 00:50:16,180 So it's sort of a nested neural network procedure that does sophisticated things in understanding what it does and doesn't know. 513 00:50:17,350 --> 00:50:24,280 The universal problem with these methods is that they rely on extremely large training sets.
514 00:50:25,300 --> 00:50:33,070 If you don't have a large training set, you pass, say, 100,000 examples of classified galaxies to one of these routines. 515 00:50:33,280 --> 00:50:40,980 What it does is it learns the individual characteristics of those 100,000 and then fails on anything that isn't exactly like those things. 516 00:50:41,440 --> 00:50:51,360 We ran a competition with 10,000 quid or dollars in prize money for machine learning people to enter, to sort through galaxies. 517 00:50:51,370 --> 00:50:59,740 They would try and improve automatic classification of galaxies. And we heard back with great uniformity that the 200,000 example galaxies were not enough, 518 00:51:00,370 --> 00:51:04,660 and the person that won the competition did so by creating new data. 519 00:51:05,230 --> 00:51:12,250 So he realised that the problem with galaxy classification is rotationally invariant, and rotated all of the images four times. 520 00:51:12,250 --> 00:51:15,250 So you got four different images, and then chopped them up in different ways. 521 00:51:15,250 --> 00:51:19,150 So from a single image he got 16 expert classifications. 522 00:51:19,420 --> 00:51:22,899 It's very clever, actually, and he was able to train on that data. 523 00:51:22,900 --> 00:51:29,500 So machines, I think, have this limit, and astronomical surveys, particularly for rare objects, 524 00:51:29,740 --> 00:51:34,510 are not going to produce large enough training sets to allow us to take advantage of modern machine learning. 525 00:51:35,290 --> 00:51:40,749 So the idea that all of these problems are going to go away because the computer scientists have our backs is, 526 00:51:40,750 --> 00:51:45,580 I think, false. For common stuff, maybe; for simple stuff, maybe. 527 00:51:45,880 --> 00:51:49,960 But for the kind of stuff we're interested in, I don't think we can rely on them to solve the problem entirely.
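[Editorial sketch, not from the talk.] The winning entry's trick described above, four rotations each chopped up several ways to turn one labelled image into sixteen, is a form of data augmentation. The sketch below uses plain Python lists so it runs without extra libraries; the choice of four corner crops and the names `rotate90`, `corner_crops` and `augment` are assumptions for illustration, not the winner's actual code.

```python
def rotate90(image):
    """Rotate a 2D image (a list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def corner_crops(image, size):
    """Return the four size-by-size crops anchored at the image corners."""
    height, width = len(image), len(image[0])
    offsets = [(0, 0), (0, width - size), (height - size, 0), (height - size, width - size)]
    return [[row[c:c + size] for row in image[r:r + size]] for r, c in offsets]


def augment(image, crop_size):
    """Make 16 training samples from one image: 4 rotations x 4 corner crops.

    Every sample inherits the label of the original image, because the
    morphology of a galaxy does not depend on its orientation on the sky.
    """
    samples = []
    current = image
    for _ in range(4):
        samples.extend(corner_crops(current, crop_size))
        current = rotate90(current)
    return samples


tiny_image = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
samples = augment(tiny_image, crop_size=2)
```

The augmented samples are not independent observations, so this stretches a small labelled set rather than replacing the need for more labelled galaxies.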
528 00:51:50,170 --> 00:51:53,830 They're going to do the easy stuff and the rest of it will have to be sorted by people. 529 00:51:54,100 --> 00:51:57,220 But there's a problem with this optimisation for people as well. 530 00:51:57,760 --> 00:52:03,219 This is the performance of a classifier. I just picked one out on a supernova project. Look at the blue line. 531 00:52:03,220 --> 00:52:06,430 Good performance puts this blue line near this top vertex here. 532 00:52:06,790 --> 00:52:09,220 So it's how close this dot is to the top, and this is over time. 533 00:52:09,820 --> 00:52:16,870 And so this is a person who's been well trained and then they change their mind about something and they suddenly become a much worse classifier. 534 00:52:17,770 --> 00:52:21,009 And all my knowledge about this person is now out of date as well. 535 00:52:21,010 --> 00:52:24,940 So who knows what happened. Maybe they had a drink, maybe they handed their login to somebody else. 536 00:52:25,210 --> 00:52:27,070 Maybe they were confused about the astrophysics. 537 00:52:27,730 --> 00:52:33,129 So there's very little understanding of how to cope with this because machines don't do this. Once trained, 538 00:52:33,130 --> 00:52:36,400 a machine learning classifier does not suddenly change its mind. 539 00:52:36,760 --> 00:52:41,110 And so how do we detect that? How much effort do we put into detecting that sort of behaviour? 540 00:52:41,680 --> 00:52:45,990 The other problem with humans is that they have opinions about what they like to see. 541 00:52:46,630 --> 00:52:49,959 So there's a project called the Milky Way Project, which Rob Simpson led, 542 00:52:49,960 --> 00:52:55,420 looking for bubbles around newly formed stars in infrared surveys of our galaxy from Spitzer. 543 00:52:56,200 --> 00:53:03,010 These four images are the Milky Way Project images which most encouraged people to keep classifying.
544 00:53:03,340 --> 00:53:07,210 So we looked at the average number of tasks completed by somebody after seeing an image. 545 00:53:07,540 --> 00:53:14,889 So these things people found interesting and exciting and they're happy to classify, and these are the worst performers. 546 00:53:14,890 --> 00:53:18,430 They're a mixture of the confusing at the top and the boring at the bottom. 547 00:53:18,760 --> 00:53:27,500 So I have to be careful. If these are the most difficult pictures in the Milky Way Project and I assign them to my best classifiers, 548 00:53:28,400 --> 00:53:28,969 who are there 549 00:53:28,970 --> 00:53:35,000 partly, even if they tell me it's because they want to do science, because they want to see the pretty pictures, 550 00:53:35,210 --> 00:53:44,540 I've built a system that will systematically drive away my best classifiers. And understanding how to automatically balance the requirements 551 00:53:44,540 --> 00:53:54,320 of people who want to be entertained and to feel confident versus just blindly optimising for efficiency is a huge and unexplored problem. 552 00:53:54,710 --> 00:53:59,600 And it's made worse because you can't ask people what they want to do. 553 00:54:00,290 --> 00:54:07,280 We have a problem with our community and it's a problem shared by anyone who tries to do science outreach. 554 00:54:07,290 --> 00:54:11,930 We think that this is a good way of getting people involved in science and thinking about science. 555 00:54:12,230 --> 00:54:16,370 And we know that our volunteers take part because they want to help. 556 00:54:17,240 --> 00:54:21,860 But telling people, you're actually doing science, is a terrifying and scary thing. 557 00:54:22,520 --> 00:54:24,319 And so somehow we have to build projects.
558 00:54:24,320 --> 00:54:31,070 Projects that feel real, that convince people that they're doing something that will help astronomers, without saying to them, 559 00:54:32,330 --> 00:54:40,940 the weight of the universe is on your shoulders. It's an incredibly difficult problem and it's taught me that galaxies are simpler than people. 560 00:54:41,090 --> 00:54:41,720 Thank you very much.