OK, so I hope by the end of my talk you will have an appreciation of the complexity of the sorts of experiments that we do. I will also try to give you an idea of why these experiments are well suited to machine learning techniques, and I will aim to do this by giving you an overview of some experiments that have already been optimised using machine learning methods.

So firstly, the question I want to address is: why should we be interested in these experiments? Why should we take the time to develop these kinds of quantum mechanical experiments — the kinds of experiments I will be looking at, done in the very low temperature limit, where the thermal energy of the system is comparable to the quantisation of the energy levels, so that the system can be very accurately described by quantum mechanics?

There are essentially three reasons why you might be interested in building these experiments. The first one I indicate by this clock here: these very low temperature experiments can give very accurate measurements of things such as time. So, for example, this image here is an illustration of an atomic clock. Atomic clocks based on strontium are currently among the most accurate time measurements we can make — equivalent to knowing the age of the universe to the nearest second in terms of the precision we can achieve.

Another reason, which is quite often in the news these days, is that these kinds of experiments can be used as ways of doing quantum computing. The image I show here is an ion trap from David Lucas's group here in the Department of Physics. We have these metal electrodes here, which have an oscillating electric field applied to them, and in the centre you can see a single ion that is held; you can manipulate the states of this ion using lasers to perform your quantum computation experiments.

And the final reasons, illustrated here by this microscope, are the kind of blue-skies research questions we might have: how can we perform experiments that will help inform our understanding of the universe or of physical systems? This comes broadly into the category of quantum simulation. The idea is that we have some Hamiltonian that we seek to understand — say, for example, a Hamiltonian for superconductivity — and what we can try to do is build an experimental system which has that Hamiltonian. We can then tune experimental parameters to adjust that Hamiltonian and see how the system behaves. We might want to do this because some Hamiltonians are interesting but not necessarily solvable with our numerical methods.
So instead, we can build a machine to directly simulate that Hamiltonian itself, and that's very much the area of research I will be talking about.

To give you an idea of what a typical quantum simulation experiment looks like, this is a CAD schematic of our experiment in the basement downstairs. These experiments tend to start with a stage of laser cooling, illustrated in this area here, where we capture some atoms from a thermal vapour and cool them down to about 100 microkelvin using laser cooling methods. Once we've captured our vapour, we transport it through this differential pumping section of the vacuum system to an ultra-high-vacuum region, and you can see this glass cell here, blown up on the right, with various coils around it. What we are ultimately doing is applying various magnetic fields and laser fields to manipulate our atoms, to make them experience different potentials and so simulate different Hamiltonians and therefore different systems.

If our experiment works well, we will produce a Bose-Einstein condensate, which is where we have macroscopic occupation of the ground state of our system. The metric we normally use for actually exploring the state of the system is to release the atoms and let them fall: during this time of flight they expand, and we can take a picture using a fairly standard imaging camera and image the distribution of atoms after this ballistic expansion. What you see here is such an image — a real-space image. You can see that there is a bimodal distribution: there is this faint background, which is a Gaussian distribution — effectively the atoms that remain in the thermal state — and then this large, dense core, which is the atoms in the ground state of the system, the Bose-Einstein condensate.

The point I would like to make is that these experiments are very complicated, so I'll show you what a typical sequence looks like. This is basically a time series: along the x axis I have time, and on the y axis I have the various parameters that I need to control my experiment. For example, these are analogue voltages that choreograph the various parts of the experiment, and these are digital channels used for things like toggling the laser beams or applying different pulses to the atoms. You can see there is a lot of complexity going on.
And the point I would like to make is that this is a very vast parameter space, so if we want to find a way of optimising our experiment, it becomes very laborious to search. It is also a very difficult thing to orchestrate — I should say this sequence is about a minute long, by the way, to actually produce this cloud of atoms — because it requires very precise timings. Most of these systems are therefore already computer controlled, and that gives a really nice advantage when you then want to use machine learning to control these experiments, because the experiments are already parameterised in a way that lets you take a machine learning package, connect it, and immediately start testing to see results.

So in the first part of my talk I'm going to talk about using machine learning to optimise these experiments, and specifically I'm going to look at how we produce ultracold gases. First, let's ask what advantages of machine learning we expect to be able to use to improve our experiments.

Firstly, I would like to say that during the learning process the machine learner actually acquires a sort of intuitive understanding of the experiment. This is rather like how a PhD student tends to perform an experiment: they sit in the laboratory and turn knobs, and although you may have an idea of what will happen when you turn a knob, the experiments don't actually always work that way — there are quite often tricks that are needed to get things to work, and sometimes things occur that are slightly unexpected. That's why this other point — the fact that the machine learner has no a priori model of the experiment — can also be an advantage. Whereas we, as physicists, will tend to sit down, write out a physical model of our system and then use that model to develop a theoretical ramp, or sequence of parameters, that we think will perform best, sometimes that doesn't work, and it can be good to have effectively no prior knowledge. Reasons it might not work include, for example, that our model is incomplete, that we have failed to account for some imperfection in the apparatus, or that our calibrations are incorrect. Another important thing about machine learners is that they are patient: a machine learner will happily sit and optimise for six hours straight, whereas most humans will get bored after about 15 minutes. That allows you to very rigorously explore the parameter space without distraction.
And the last point — there was a question earlier about whether or not machine learning algorithms will achieve a DPhil — I would like to think of it as this: the machine learning approach actually frees us up to think about other problems. Before we used machine learning in our laboratory, we would tend to spend maybe ten hours of a twelve-hour day getting the experiment to work nicely and then only two hours on the science. Now we can leave it running overnight, the experiment can tune itself up, and when we come in in the morning the experiment is ready and waiting to be used.

So, in the context of producing ultracold quantum gases, I'm going to talk about two workhorse techniques today, which are broadly used in many of these experiments, and I'll talk about the optimisation of each of them. The first is evaporative cooling, which works largely in the same way that a cup of coffee cools down, and the second is laser cooling, which you may be familiar with, but I'll attempt to describe both.

The first thing with evaporative cooling is that if I imagine I have a collection of atoms in a trap, there will be a distribution of energies that those atoms can take — this is, for example, the Maxwell-Boltzmann distribution, with velocity along the x axis and the probability of having that velocity along the y axis. The important point is that there is a high-energy tail to this distribution. In evaporative cooling, what I want to do is find a way to remove atoms from my system such that they take away energy from the system as efficiently as possible. If I can find a way to remove only the highest-energy atoms, then by definition I reduce the average energy of the remaining atoms, and once I allow those atoms to collide and rethermalise, I produce another distribution characterised by a lower temperature. So I lose atoms, but what I gain is an increase in the phase-space density, which is a measure of how cold and dense the sample is.
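For reference (the talk only shows the plot), the Maxwell-Boltzmann speed distribution being described here, for atoms of mass m at temperature T, can be written as

```latex
f(v)\,\mathrm{d}v \;=\; 4\pi \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^{2}\, \exp\!\left(-\frac{m v^{2}}{2 k_B T}\right)\mathrm{d}v ,
```

where the exponential factor gives the high-energy tail that evaporative cooling removes.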
Just to show this quickly, I have here a simulation of some atoms trapped inside a harmonic trap. You can see them jostling around at the centre; when the atoms are shown in grey, it's because they have just collided, so you can see there are many collisions happening at the centre, and the atoms which acquire high energy tend to fly out to the very edges of the trap. The sphere around the outside is effectively a surface: if an atom reaches that surface, it is removed from the trap. So if you watch what happens as I reduce this surface, I'm basically lowering the potential energy an atom has to have in order to escape from the trap, and in doing so I only remove the highest-energy atoms. If I keep slowly decreasing this, my remaining atoms rethermalise and I reduce the temperature. But a key point is that there is a timescale involved, set by the rate at which these collisions occur: if I make my cut too quickly — just restarting the simulation — if I lower it far too quickly, then I remove atoms before they can rethermalise, and I end up losing many, many atoms without efficiently reducing the temperature of the system.

I'm going to talk about work that was done by a group in Australia — this paper here by Wigley and co-authors, published in Scientific Reports — which describes how they used a Gaussian process to optimise their evaporative cooling ramps. In their experimental setup they use dipole traps to confine the atoms. The confinement I showed in the simulation was a harmonic trap; the way they achieve that harmonic trapping is by shining in laser beams which are red-detuned from the atomic transition. In doing so they shift the energy levels of the atoms by an amount proportional to the intensity of the laser. What that means is that at the centre of the beam the shift is much greater than at the edges, where there is no shift, so the beam produces a potential well which can confine the atoms. What they can then do, by reducing the intensity of their laser beams over time, is reduce the depth of that potential well — they reduce the amount of energy required for an atom to escape from the trap — and that is effectively the same as reducing the radius of the grey sphere that I showed in the simulation.

If you want to optimise, you would ideally like a way of determining that you're actually improving the system in the first place. The way they measure the performance of their system is, as I said, to release the atoms and allow them to undergo a ballistic expansion. These are again real-space images: for the same time of flight, a hot cloud expands very quickly and you see a very broad distribution, whereas a cold cloud remains narrow and dense, because the thermal energy is low and so the velocity distribution of the cloud is narrow.
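As an aside that the talk does not spell out, this time-of-flight thermometry works because a thermal cloud of atoms of mass m, initially of width σ₀, expands ballistically as

```latex
\sigma(t) \;=\; \sqrt{\sigma_0^{2} + \frac{k_B T}{m}\, t^{2}} ,
```

so fitting the cloud width after a known expansion time t gives the temperature T directly.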
Their optimisation process looks like this. When you use machine learning for optimisation, what you are typically doing is building a model of the system you want to optimise, where the model is effectively your learnt process. They use a Gaussian process, which I won't have time to go into the details of, but effectively they assume that there is some function which describes the response of the system given some input parameters x — your experimental settings, say how quickly you reduce the intensities of the laser beams — and which returns some value of the cost function, the measure of how well the experiment performs. They assume this function may have some stochastic variation on it as well. Then there is a feedback loop: they feed parameters into the experiment, receive the result, analyse it to determine the cost, and use that to improve the fit of the model — to better learn how to approximate the experiment — and they then use the model to seek the optimum of the experiment.

One important thing I would like to say about the Gaussian process: there was a question in the previous session about how a lot of these machine learning methods look like black boxes — they tell you that something works but don't necessarily give you any intuition as to why. One thing that's nice about the Gaussian process is that it has the concept of a characteristic length scale for each parameter, which is a measure of how much I expect the cost function to change if I change that parameter. That's illustrated in this diagram: imagine I have this one-dimensional axis x and a cost function which depends on that parameter. These red, green and blue lines are then different fits, assuming different characteristic length scales. If I assume a very long characteristic length scale for these black data points, this would be the best fit I could have, whereas if I assume the process actually varies more quickly as a function of x, I would be allowed this green function here. So part of the optimisation process is also about learning the length scales that describe the experimental process, which gives you a measure of how sensitive your experiment is to different parameters.
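To make the loop concrete, here is a minimal sketch of this kind of Gaussian-process optimisation — not the authors' code — with a toy run_experiment(x) standing in for one run of the apparatus, and one RBF length scale per parameter playing the role just described:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def run_experiment(x):
    """Stand-in for one run of the real apparatus: here just a noisy toy
    cost with a minimum at x = 0.3, purely for illustration."""
    return float(np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn())

n_params = 7
bounds = np.array([[0.0, 1.0]] * n_params)           # normalised parameter ranges

# Seed the model with a handful of settings, as the paper seeds with Nelder-Mead runs.
X = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(5, n_params))
y = np.array([run_experiment(x) for x in X])

kernel = ConstantKernel() * RBF(length_scale=np.ones(n_params))  # one length scale per parameter
for _ in range(50):                                   # each iteration = one ~1 minute experimental run
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True).fit(X, y)

    # Propose the next setting: minimise an optimistic lower bound (mean - 2*std)
    # over a cloud of random candidates, trading off exploitation and exploration.
    candidates = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(2000, n_params))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmin(mu - 2.0 * sigma)]

    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

# The fitted per-parameter length scales indicate sensitivity: a very long length
# scale suggests the cost barely depends on that parameter, so it can be dropped
# from later optimisations, as in the seven-to-six parameter example below.
print(gp.kernel_.k2.length_scale)
```

The lower-confidence-bound acquisition used here is just one simple choice; the point is the loop structure itself — fit the surrogate, propose the next setting, run the experiment, refit.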
They perform this optimisation over their evaporation sequence, so you can see the different stages of their cooling process, and they benchmark the Gaussian process against a Nelder-Mead algorithm, which is a rather dumb, local, gradient-descent-type algorithm that tries to find the optimum configuration by only ever looking at the last few data points, whereas the Gaussian process takes the entire set it has learnt so far and attempts to find the best parameters based on that. And you see this fast convergence of the Gaussian process. The reason both converge at the same rate at the start is that some training data has to be provided for the Gaussian process to get going, and they use the Nelder-Mead for generating that training set; afterwards, you see that the Gaussian process converges much, much faster, so they are able to produce a very cold, dense cloud after only a few evaluations. I should say that these are very small datasets compared to the kinds of datasets we've seen so far in the talks. I mentioned that these experiments typically take on the order of 30 seconds to a minute to perform, and if you want to have time to do your experiments as well as your optimisation, that means you only ever have a small number of evaluations to work with.

One nice result Wigley and co-authors show is that you can use the Gaussian process to eliminate parameters which are less important for optimising your system. This blue line here shows a Gaussian process optimisation using seven parameters. They then looked at the characteristic length scales the machine learner determined for those seven parameters and removed the least sensitive one — effectively saying that the outcome of the evaporation doesn't depend on the seventh parameter at all, so let's not bother optimising it — and they get a much faster optimisation on the six remaining parameters. That will be important later on.

I'd like to say that we here in Oxford have benchmarked this Gaussian process algorithm against other approaches for these types of experiments. We generate training datasets using a differential evolution algorithm, which is a genetic-algorithm-style approach: it takes different combinations of parameters and keeps features from the parameter sets which have performed well so far.
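One generation of differential evolution on the same kind of parameterised sequence might look roughly like this — an illustrative sketch, not the code we actually run, reusing the toy run_experiment stand-in from the earlier sketch:

```python
import numpy as np

def run_experiment(x):
    """Toy stand-in for one experimental run, as in the earlier sketch."""
    return float(np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn())

rng = np.random.default_rng(0)
n_params, pop_size, F, CR = 7, 12, 0.7, 0.9

# Population of candidate parameter sets, one cost evaluation (one run) each.
pop = rng.uniform(0.0, 1.0, size=(pop_size, n_params))
cost = np.array([run_experiment(x) for x in pop])

for generation in range(20):
    for i in range(pop_size):
        # Mutation: combine three other members of the population.
        a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0.0, 1.0)
        # Crossover: inherit some parameters from the mutant, some from the parent.
        mask = rng.random(n_params) < CR
        trial = np.where(mask, mutant, pop[i])
        # Selection: keep whichever set of parameters performs better on the experiment.
        trial_cost = run_experiment(trial)
        if trial_cost < cost[i]:
            pop[i], cost[i] = trial, trial_cost

print("best parameters:", pop[np.argmin(cost)], "best cost:", cost.min())
```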
This very fast blue line here, you can see, is the performance of the Gaussian process, and this darker blue line is the performance of a neural network. The neural network ultimately produces a slightly better solution for us, but it takes longer to converge, and part of the reason is that it requires more data to train. As I say, we are typically only dealing with something like 80 datasets for our experiment, which is just over an hour of running.

Another nice thing I can say: looking at that original parameter suite I showed in one of the earlier slides, these are very complicated experiments, so you might think these optimisations will only work if you can already provide them with a good starting point. But what we actually do here is start with entirely randomised parameters — this is the first image that we produce, which has absolutely nothing in it whatsoever. So another very nice thing about this is that, in principle, you could build an experimental apparatus knowing only that it could, in principle, make a cloud of cold atoms, and then allow the algorithm to take it from there and ultimately produce these very nice Bose-Einstein condensates.

I'll now talk about the other method of cooling I wanted to mention, which is laser cooling. I'm not sure how many of you are familiar with laser cooling; it might seem slightly counterintuitive that a laser can cool something down, because lasers are typically associated with heating things up, cutting things in half, and so on in the movies. One of the nice things about lasers is that they are a stream of photons, and photons each carry momentum. What this means is that if an atom interacts with a laser beam — here I have my photons streaming in from the left-hand side — and the atom absorbs these photons, going into some excited state, it effectively absorbs these photon momentum kicks from the left-hand side. Eventually the atom relaxes back down to the ground state and emits the photons back out, but in a random direction. So if I take the sum of this process, the emissions have a mean momentum of zero, whereas the absorptions all provide a momentum kick from one side.
The net effect of this process is therefore to exert a force on the atom. So in principle, what I want for laser cooling is to arrange that my atom always absorbs photons from a laser beam coming opposite to its direction of travel. If I can do that, the force applied to the atom will always act to slow it down. The way we do this in the lab is to send two counter-propagating beams in from either side, like this, with my atom here moving with some velocity in a certain direction. Now, what you should remember is that atoms respond only to very narrow bands of radiation: the optical frequencies an atom will respond to are defined by the electronic transitions that can occur within the atom. What this means is that if I red-detune my two laser beams, then while my atom is stationary it sees both of them as far detuned from the transition, but once it starts moving it sees one of the lasers as blue-shifted and the other as red-shifted. So I break the symmetry of the system: the atom will more rapidly scatter photons from the laser beam it is travelling towards, which is effectively shifted into resonance with it, and in doing so the atom feels a force that pushes against its direction of motion. That is the principle of laser cooling. I put this small picture of an ambulance in the bottom right because you will recognise this effect as the Doppler shift — the same reason why an ambulance siren changes pitch as it comes towards you and then drives away. We are performing exactly the same trick with the atoms, but with optical radiation.

If you want to make a trap, you also need some position dependence for the forces, and the way that's done in the laboratory is to apply a magnetic field. Because of the Zeeman effect, the optical frequencies of my transitions then gain a spatial dependence, and that allows me to set up a potential where I can trap atoms at some well-defined point. This image here in the bottom right is taken from our lab, which does laser cooling of strontium. The bright speck in the middle is a large number of strontium atoms which have been collected from an oven, cooled, and are now held by the bright blue laser you can see in the background.
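For completeness (the talk only describes this qualitatively), the standard textbook form of the average scattering force from one beam, including the Doppler-shifted detuning the moving atom sees, is

```latex
F \;=\; \hbar k\,\frac{\Gamma}{2}\,\frac{s}{1 + s + \left(2\delta_{\pm}/\Gamma\right)^{2}}, \qquad \delta_{\pm} = \delta \mp k v ,
```

where Γ is the transition linewidth, s the saturation parameter and δ the laser detuning; for red detuning (δ < 0) the beam opposing the atom's motion is shifted towards resonance and scatters more strongly, giving a net velocity-damping force.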
So how can optimisation of laser cooling be done? Well, this was shown by Tranter and co-authors in this Nature Communications paper. This flow diagram looks quite complicated at first, so I'll break it down, but essentially it is the same idea as the previous optimisation: we want to model some process, we set up a machine learner which will learn what that process is, and then we feed parameters into the experiment, learn how the experiment behaved given those parameters, and use that to improve the machine learner's knowledge.

The system they have up here in the top left is a magneto-optical trap of rubidium, with a very elongated, cigar-shaped cloud of atoms. The way they measure how many atoms they have laser cooled, and how cold and dense the cloud is, is to send a laser beam in along the axis — that's this here — and that laser beam is absorbed more if there are more atoms in the trap. So they measure a voltage on a photodiode, which becomes a measure of how cold and dense this cloud is.

To model this behaviour they use stochastic artificial neural networks. We saw in the earlier example that machine learners don't necessarily always fit the data properly, and one attempt to get around this is to use three different machine learners, each initialised with different random parameters. This is a kind of machine learning by consensus: you train three different learners, and if one of them gets stuck, hopefully the other two will have figured out what's going on and help to correct for it. The other thing, which isn't drawn on the diagram, is that they also use a genetic algorithm to generate a fourth data point each time they run. So they use these four different learners — the three neural networks and the one differential evolution — to model the process.

They generate a series of ramps from their control parameters, which are the detuning of the cooling laser beams and the detuning of a repump laser beam, which is used to keep atoms in the cooling transition. What I didn't mention in the previous slide is that your atoms get promoted to an excited state, and for laser cooling to work you want them to fall back into the same state they started in, so that you can cycle them. But there are of course other electronic states, and atoms will eventually fall into these dark states, so you use this repump to pump the atoms back into the cooling transition.
And the last parameter they have is the coil current: what they want to do is perform a sequence where they compress the cloud to get the coldest, densest sample. They separate their parameters into twenty-one time bins, which gives sixty-three parameters for them to optimise.

So I'll show you what their convergence looks like — let me just check something — yes. These blue dots are an optimisation using differential evolution, so attempting to model the process using a genetic algorithm, and these red dots show the faster convergence that their stochastic neural network produces. What this shows is that in a smaller number of runs they can more efficiently understand what the process is and then optimise it. Over on the right here we have absorption images showing what their cold, dense cloud looks like, and they show this against what the best students in the lab could achieve. They are basically showing that the machine learner, in only a few hours, is able to outperform a student who has been working on the experiment for a few years.

And this is one of the wackier things. I mentioned at the start that one of the advantages of machine learning is that it has no a priori understanding of the system. Obviously, as physicists, one of the things we do is come up with very simple toy models to describe systems, but the problem is that those toy models can often be flawed. On the left here we see the best human-optimised parameters, which look sensible: the repump detuning decreases linearly over time, one parameter is kept constant, and the coil current is ramped up. But if you look at the best parameters the machine learner produces, they look completely different. This is the equivalent of an inhuman chess move — it is not something the researchers expected when they first ran this, and they say in the paper that they were not necessarily even sure how to explain why it was optimal, apart from the fact that it was. What they believe may be happening is that there is a dynamics in which the magneto-optical trap — which is what confines the atoms — is releasing the atoms, allowing them to adiabatically expand and cool, and then recapturing them. In effect, the learner has found a technique the experimentalists were not using beforehand.
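A minimal sketch of this kind of surrogate — three small networks with different random initialisations fitted to (ramp parameters → measured cost), with the ramps encoded as 21 time bins for each of the 3 controls — might look as follows; the array sizes, network architecture and data here are assumptions, not the authors' values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

n_bins, n_controls = 21, 3           # time bins per ramp; cooling detuning, repump detuning, coil current
n_params = n_bins * n_controls        # 63 ramp parameters in total

# Hypothetical training data: each row is one experimental run (a flattened set
# of ramps) together with its measured photodiode cost (here random stand-ins).
X = np.random.uniform(0.0, 1.0, size=(200, n_params))
y = np.random.uniform(0.0, 1.0, size=200)

# Three learners with different random initialisations: a simple "consensus"
# ensemble, so that one learner getting stuck in a poor fit matters less.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(32, 32), random_state=seed, max_iter=2000).fit(X, y)
    for seed in (0, 1, 2)
]

def predicted_cost(ramps):
    """Average the three networks' predictions for a candidate set of ramps."""
    ramps = np.asarray(ramps).reshape(1, -1)
    return float(np.mean([net.predict(ramps)[0] for net in ensemble]))

# Candidate ramps can then be proposed against this predicted cost (for example
# by differential evolution, as in the paper) before being tried on the apparatus.
print(predicted_cost(np.random.uniform(0.0, 1.0, n_params)))
```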
To show that this is real, they show that these parameters are stable and don't change from day to day, and they plot these parameters against other, less optimal runs over here. One thing you'll notice is that the parameters rail very strongly against their limits — for example, in this part here you can see the coil current basically turning off and on very quickly. Looking at the cost landscape as a function of the coil current for this optimal solution, they show that, in principle, you may even want to extend the ranges of these parameters beyond what their experiment was able to do.

So we've looked at optimising two separate stages. What we've done here in Oxford as well is apply these techniques to optimise a full experimental sequence. We obviously have a very large number of parameters, so in order to optimise the entire experiment we have to be selective about the parameters we choose to optimise, and to do this we use the Gaussian process to understand which parameters the experiment is most sensitive to. The two plots on the right-hand side show, for the laser cooling and for the evaporative cooling, what the length scales are for the most important parameters as a function of run number — effectively, as the learner learns, these are how sensitive it thinks the various parameters are. Ultimately we use this to extract the most sensitive parameters for each of these cooling stages. Once we know what those are, we can fix all of the other parameters and perform a complete optimisation over the whole sequence, considering only the most sensitive parameters.

To put this into context again: we have a human optimisation, which is effectively the best the PhD students can do, and then this is the final optimisation including all of the different stages. Without going into the details, you can see from the colour map that this one is colder and denser — basically, this is a much better experiment.

Another very nice thing you can do once you have set this up follows from the fact that what constitutes an optimum experiment is very situationally dependent. So far I've said that it is optimal to have very cold, dense clouds.
And it often is, because being cold and dense sets a quantity called the chemical potential, which then becomes important in determining which terms of your Hamiltonian matter. But depending on what you actually want to use the experiment for, you may find other metrics are significant. For example, here we changed the cost function to say: give us the largest number of atoms at a temperature of one microkelvin. The cost function looks like this — as you increase the atom number the cost gets lower and lower, and you can see all of these cost functions are centred on that one microkelvin — and it will produce a larger cloud: you find a sequence which generates the largest cloud of atoms at exactly the temperature we asked for. Other useful optimisations include asking: what is the fastest time in which I can actually create a Bose-Einstein condensate? That may be useful if, for example, I'm aligning some optics around my experiment and I want to be able to adjust a lens and then take a picture, but I don't want to wait around for a minute between adjustments, because that quickly becomes tedious — so we can use this to cut down sequence times as well.
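Purely as an illustration of what such a situational cost might look like in code (the actual functional form is not given in the talk), one could write something like:

```python
import numpy as np

def cost(atom_number, temperature_uK, target_uK=1.0, width_uK=0.2):
    """Illustrative cost: more atoms is better, but only near the target
    temperature; the Gaussian factor centres the optimum on ~1 microkelvin.
    The functional form and width are assumptions, not the talk's values."""
    return -atom_number * np.exp(-((temperature_uK - target_uK) / width_uK) ** 2)

# A large cloud at the target temperature scores better (more negative) than a
# larger cloud far from it.
print(cost(2e5, 1.05), cost(5e5, 2.0))
```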
I would now like to talk about a slightly different use of machine learning in the laboratory. So far we've looked at how machine learning can be used to improve the performance of an experiment. But what if I have a device with a certain fixed performance, and what I'm instead trying to do is work out how best to evaluate that device? The question becomes: how can I extract the most information possible in the smallest number of measurements? This is the device in question — a quantum dot — and this work was done by the group of Andrew Briggs here in Oxford. The kind of measurements they want to take are the conductance of this device as a function of different gate voltages: we have a number of different voltages we can apply to the chip, and they change the conductance of the device. This is a 2D parameter scan of the space we're interested in — say we take a gate voltage and a bias voltage — and you'll notice there are some features you can immediately see in this image; for example, there is a very large band in the middle where basically no current flows.

So if I wanted to make a very efficient characterisation of the device, it probably wouldn't be sensible to take many measurements in that region; instead, you would want to take measurements around the areas where a lot is happening. I should say — I'm sorry, the reference has been lost because the slide has been resized to this four-by-three ratio — but this paper was led by Natalia Ares. [I guess you have the exact reference — npj Quantum Information?] Yes, in npj Quantum Information; this was last year, wasn't it? Yeah.

So what they are trying to do is, as I say, characterise these devices efficiently. They first perform an initial, very rough scan over the device, and the question is then: given this eight-by-eight scan of my parameter space, how should I choose my remaining measurements so as to learn this device as well as I can? The way they do this is to train a machine learner to predict what the device should look like given a map like this, and this machine learner produces various candidate devices. So, say the eight-by-eight scan looks like this: the machine learner generates a whole distribution of what the full device scan might look like, and what you can then look for is where these pictures actually disagree — where are the parts where the machine learner is effectively saying, I don't really know what the device is doing at this point? That produces this map here, which is a map of where the uncertainties lie in the parameter space. From this map you can extract the measurement that will give you the most information — the one that will most reduce your uncertainty about the device. They call this the information gain. So by looking at this map you can say: let's take the points where the information gain is greatest and measure those, and in doing so you reduce your uncertainties.

This shows, for example, some different reconstructions. If I have this partial scan of the parameter space and I look at what the various machine-learning-generated solutions look like along this line, you see all these different grey, wavy suggestions as to how the response may vary. Over here all of the grey, wavy suggestions converge, so there's very little point taking a measurement there; but here they are very, very different, so there would be a lot of benefit to measuring at that particular point.
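A rough sketch of this disagreement map, assuming we already have a stack of candidate reconstructions from some generative model (random stand-ins here, not real device data):

```python
import numpy as np

# Hypothetical stack of candidate full scans generated from a coarse 8x8
# measurement; in the real work these come from a trained generative model.
n_candidates, ny, nx = 50, 128, 128
candidates = np.random.rand(n_candidates, ny, nx)

# Where the candidates disagree, the model is uncertain: use the pixel-wise
# spread across candidates as a simple stand-in for the information gain map.
uncertainty_map = candidates.std(axis=0)

# Propose the next measurement at the most uncertain (highest-gain) pixel.
iy, ix = np.unravel_index(np.argmax(uncertainty_map), uncertainty_map.shape)
print(f"measure next at bias index {iy}, gate index {ix}")
```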
One thing they do to show that this algorithm works well is to make the observation — going back to this map — that the information gain map looks rather like the gradient of the measured function. That flat region in the centre, where not a lot is happening, obviously also has a very flat gradient — you're saying the output doesn't really depend on the parameters in that region — whereas along the parts where we want to measure there are large gradients. So they define this gradient, which is effectively a measure of the slope of the function at different points, and from it an information content: if I take the measurements I have made so far, measure the gradients at those points, and compare that back to the total amount of gradient in the image once the scan is complete, I can determine how much information I have extracted from the device so far. A standard linear raster scan will basically decrease linearly like this: sometimes I randomly happen to pick up a point which has a large change — a large gradient and therefore a lot of information gained — and sometimes I end up sampling a very flat region, so it just goes linearly. The optimal solution — if I had a device that was already fully characterised, the measurements I should have made would be to order the points by the gradient at each point and measure accordingly — gives this green line here, effectively the measurements sorted by the gradient at each point. And what they show is that their machine learner very closely approximates this green line. So the machine learner is very good at identifying the points it should sample in order to best improve our understanding of the device.
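A small sketch of this benchmark, assuming a completed scan is available so the gradient at every pixel is known (the arrays here are made up):

```python
import numpy as np

full_scan = np.random.rand(128, 128)                  # stand-in for the completed measurement

# "Gradient" weight of each pixel: how much the response changes there.
gy, gx = np.gradient(full_scan)
weight = np.hypot(gy, gx).ravel()
total = weight.sum()

def information_content(order):
    """Cumulative fraction of the total gradient captured after each measurement,
    for measurements taken in the given pixel order."""
    return np.cumsum(weight[order]) / total

raster_order = np.arange(weight.size)                  # plain row-by-row raster scan
optimal_order = np.argsort(weight)[::-1]               # pixels sorted by gradient (the 'green line')

print(information_content(raster_order)[:5])
print(information_content(optimal_order)[:5])
```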
So, to conclude the talk, I'll go back to my slide from the start, which listed the advantages that machine learning can give us as experimentalists. I've given you an example, in these optimisations, of how it can be useful to have an unbiased mind with no a priori model of how these systems work. I've also shown how the learner can pick up an intuitive understanding: we saw, in the work where they optimised the laser cooling stage, that the machine learner ended up finding effectively an entirely new way to do the laser cooling that the experimentalists hadn't thought of beforehand, just because the learner gains a sort of intuition as to how the system will respond. And the other point I would like to emphasise is the last one: having machine learning methods in our laboratories frees us up as experimentalists to think about the bigger picture. We're not getting bogged down in getting the experiment running well; we can turn up in the morning with an experiment that has been optimised overnight and get straight down to doing the physics. So thank you very much for listening.