OK, so I hope by the end of my talk you will have an appreciation of the complexity of the sorts of experiments that we do. I will also try to give you an idea of why these experiments are well suited to machine learning techniques, and I will aim to do this by giving you an overview of some experiments that have already been optimised using machine learning methods.

So firstly, the question I want to address is: why should we be interested in these experiments? Why should we take the time to develop these kinds of quantum mechanical experiments — the kinds of experiments I will be looking at, done in the very low temperature limit, where the thermal energy of the system is comparable to the quantisation of the energy levels, so that the system can be very accurately described by quantum mechanics?

There are essentially three reasons why you might be interested in building these experiments. The first one I indicate by this clock here: these very low temperature experiments can give very accurate measurements of things such as time. So, for example, this image here is an illustration of an atomic clock. Atomic clocks based on strontium are currently among the most accurate time measurements we can make — equivalent to knowing the age of the universe to the nearest second in terms of the precision we can achieve.

Another reason, which is quite often in the news these days, is that these kinds of experiments can be used as ways of doing quantum computing. The image I show here is an ion trap from David Lucas's group here in the Department of Physics. We have these metal electrodes here, which have an oscillating electric field applied to them, and in the centre you can see a single ion that is held; you can manipulate the states of this ion using lasers to perform your quantum computation experiments.

And the final reasons, illustrated here by this microscope, are the kind of blue-skies research questions we might have: how can we perform experiments that will help inform our understanding of the universe or of physical systems? This comes broadly into the category of quantum simulation. The idea is that we have some Hamiltonian that we seek to understand — say, for example, a Hamiltonian for superconductivity — and what we can try to do is build an experimental system which has that Hamiltonian. We can then tune experimental parameters to adjust that Hamiltonian and see how the system behaves. We might want to do this because some Hamiltonians are interesting but not necessarily solvable with our numerical methods.
So instead, we can build a machine to directly simulate that Hamiltonian itself, and that's very much the area of research I will be talking about.

To give you an idea of what a typical quantum simulation experiment looks like, this is a CAD schematic of our experiment in the basement downstairs. These experiments tend to start with a stage of laser cooling, illustrated in this area here, where we capture some atoms from a thermal vapour and cool them down to about 100 microkelvin using laser cooling methods. Once we've captured our vapour, we transport it through this differential pumping section of the vacuum system to an ultra-high-vacuum region, and you can see this glass cell here, blown up on the right, with various coils around it. What we are ultimately doing is applying various magnetic fields and laser fields to manipulate our atoms, to make them experience different potentials and so simulate different Hamiltonians and therefore different systems.

If our experiment works well, we will produce a Bose-Einstein condensate, which is where we have macroscopic occupation of the ground state of our system. The metric we normally use for actually exploring the state of the system is to release the atoms and let them fall: during this time of flight they expand, and we can take a picture using a fairly standard imaging camera and image the distribution of atoms after this ballistic expansion. What you see here is such an image — a real-space image. You can see that there is a bimodal distribution: there is this faint background, which is a Gaussian distribution — effectively the atoms that remain in the thermal state — and then this large, dense core, which is the atoms in the ground state of the system, the Bose-Einstein condensate.

The point I would like to make is that these experiments are very complicated, so I'll show you what a typical sequence looks like. This is basically a time series: along the x axis I have time, and on the y axis I have the various parameters that I need to control my experiment. For example, these are analogue voltages that choreograph the various parts of the experiment, and these are digital channels used for things like toggling the laser beams or applying different pulses to the atoms. You can see there is a lot of complexity going on.
And the point I would like to make is that this is a very vast parameter space, so if we want to find a way of optimising our experiment, it becomes very laborious to search. It is also a very difficult thing to orchestrate — I should say this sequence is about a minute long, by the way, to actually produce this cloud of atoms — because it requires very precise timings. Most of these systems are therefore already computer controlled, and that gives a really nice advantage when you then want to use machine learning to control these experiments, because the experiments are already parameterised in a way that lets you take a machine learning package, connect it, and immediately start testing to see results.

So in the first part of my talk I'm going to talk about using machine learning to optimise these experiments, and specifically I'm going to look at how we produce ultracold gases. First, let's ask what advantages of machine learning we expect to be able to use to improve our experiments.

Firstly, I would like to say that during the learning process the machine learner actually acquires a sort of intuitive understanding of the experiment. This is rather like how a PhD student tends to perform an experiment: they sit in the laboratory and turn knobs, and although you may have an idea of what will happen when you turn a knob, the experiments don't actually always work that way — there are quite often tricks that are needed to get things to work, and sometimes things occur that are slightly unexpected. That's why this other point — the fact that the machine learner has no a priori model of the experiment — can also be an advantage. Whereas we, as physicists, will tend to sit down, write out a physical model of our system and then use that model to develop a theoretical ramp, or sequence of parameters, that we think will perform best, sometimes that doesn't work, and it can be good to have effectively no prior knowledge. Reasons it might not work include, for example, that our model is incomplete, that we have failed to account for some imperfection in the apparatus, or that our calibrations are incorrect. Another important thing about machine learners is that they are patient: a machine learner will happily sit and optimise for six hours straight, whereas most humans will get bored after about 15 minutes. That allows you to very rigorously explore the parameter space without distraction.
And the last point — there was a question earlier about whether or not machine learning algorithms will achieve a DPhil — I would like to think of it as this: the machine learning approach actually frees us up to think about other problems. Before we used machine learning in our laboratory, we would tend to spend maybe ten hours of a twelve-hour day getting the experiment to work nicely and then only two hours on the science. Now we can leave it running overnight, the experiment can tune itself up, and when we come in in the morning the experiment is ready and waiting to be used.

So, in the context of producing ultracold quantum gases, I'm going to talk about two workhorse techniques today, which are broadly used in many of these experiments, and I'll talk about the optimisation of each of them. The first is evaporative cooling, which works largely in the same way that a cup of coffee cools down, and the second is laser cooling, which you may be familiar with, but I'll attempt to describe both.

The first thing with evaporative cooling is that if I imagine I have a collection of atoms in a trap, there will be a distribution of energies that those atoms can take — this is, for example, the Maxwell-Boltzmann distribution, with velocity along the x axis and the probability of having that velocity along the y axis. The important point is that there is a high-energy tail to this distribution. In evaporative cooling, what I want to do is find a way to remove atoms from my system such that they take away energy from the system as efficiently as possible. If I can find a way to remove only the highest-energy atoms, then by definition I reduce the average energy of the remaining atoms, and once I allow those atoms to collide and rethermalise, I produce another distribution characterised by a lower temperature. So I lose atoms, but what I gain is an increase in the phase-space density, which is a measure of how cold and dense the sample is.
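For reference (the talk only shows the plot), the Maxwell-Boltzmann speed distribution being described here, for atoms of mass m at temperature T, can be written as

```latex
f(v)\,\mathrm{d}v \;=\; 4\pi \left(\frac{m}{2\pi k_B T}\right)^{3/2} v^{2}\, \exp\!\left(-\frac{m v^{2}}{2 k_B T}\right)\mathrm{d}v ,
```

where the exponential factor gives the high-energy tail that evaporative cooling removes.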
Just to show this quickly, I have here a simulation of some atoms trapped inside a harmonic trap. You can see them jostling around at the centre; when the atoms are shown in grey, it's because they have just collided, so you can see there are many collisions happening at the centre, and the atoms which acquire high energy tend to fly out to the very edges of the trap. The sphere around the outside is effectively a surface: if an atom reaches that surface, it is removed from the trap. So if you watch what happens as I reduce this surface, I'm basically lowering the potential energy an atom has to have in order to escape from the trap, and in doing so I only remove the highest-energy atoms. If I keep slowly decreasing this, my remaining atoms rethermalise and I reduce the temperature. But a key point is that there is a timescale involved, set by the rate at which these collisions occur: if I make my cut too quickly — just restarting the simulation — if I lower it far too quickly, then I remove atoms before they can rethermalise, and I end up losing many, many atoms without efficiently reducing the temperature of the system.

I'm going to talk about work that was done by a group in Australia — this paper here by Wigley and co-authors, published in Scientific Reports — which describes how they used a Gaussian process to optimise their evaporative cooling ramps. In their experimental setup they use dipole traps to confine the atoms. The confinement I showed in the simulation was a harmonic trap; the way they achieve that harmonic trapping is by shining in laser beams which are red-detuned from the atomic transition. In doing so they shift the energy levels of the atoms by an amount proportional to the intensity of the laser. What that means is that at the centre of the beam the shift is much greater than at the edges, where there is no shift, so the beam produces a potential well which can confine the atoms. What they can then do, by reducing the intensity of their laser beams over time, is reduce the depth of that potential well — they reduce the amount of energy required for an atom to escape from the trap — and that is effectively the same as reducing the radius of the grey sphere that I showed in the simulation.

If you want to optimise, you would ideally like a way of determining that you're actually improving the system in the first place. The way they measure the performance of their system is, as I said, to release the atoms and allow them to undergo a ballistic expansion. These are again real-space images: for the same time of flight, a hot cloud expands very quickly and you see a very broad distribution, whereas a cold cloud remains narrow and dense, because the thermal energy is low and so the velocity distribution of the cloud is narrow.
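As an aside that the talk does not spell out, this time-of-flight thermometry works because a thermal cloud of atoms of mass m, initially of width σ₀, expands ballistically as

```latex
\sigma(t) \;=\; \sqrt{\sigma_0^{2} + \frac{k_B T}{m}\, t^{2}} ,
```

so fitting the cloud width after a known expansion time t gives the temperature T directly.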
Their optimisation process looks like this. When you use machine learning for optimisation, what you are typically doing is building a model of the system you want to optimise, where the model is effectively your learnt process. They use a Gaussian process, which I won't have time to go into the details of, but effectively they assume that there is some function which describes the response of the system given some input parameters x — your experimental settings, say how quickly you reduce the intensities of the laser beams — and which returns some value of the cost function, the measure of how well the experiment performs. They assume this function may have some stochastic variation on it as well. Then there is a feedback loop: they feed parameters into the experiment, receive the result, analyse it to determine the cost, and use that to improve the fit of the model — to better learn how to approximate the experiment — and they then use the model to seek the optimum of the experiment.

One important thing I would like to say about the Gaussian process: there was a question in the previous session about how a lot of these machine learning methods look like black boxes — they tell you that something works but don't necessarily give you any intuition as to why. One thing that's nice about the Gaussian process is that it has the concept of a characteristic length scale for each parameter, which is a measure of how much I expect the cost function to change if I change that parameter. That's illustrated in this diagram: imagine I have this one-dimensional axis x and a cost function which depends on that parameter. These red, green and blue lines are then different fits, assuming different characteristic length scales. If I assume a very long characteristic length scale for these black data points, this would be the best fit I could have, whereas if I assume the process actually varies more quickly as a function of x, I would be allowed this green function here. So part of the optimisation process is also about learning the length scales that describe the experimental process, which gives you a measure of how sensitive your experiment is to different parameters.
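To make the loop concrete, here is a minimal sketch of this kind of Gaussian-process optimisation — not the authors' code — with a toy run_experiment(x) standing in for one run of the apparatus, and one RBF length scale per parameter playing the role just described:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def run_experiment(x):
    """Stand-in for one run of the real apparatus: here just a noisy toy
    cost with a minimum at x = 0.3, purely for illustration."""
    return float(np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn())

n_params = 7
bounds = np.array([[0.0, 1.0]] * n_params)           # normalised parameter ranges

# Seed the model with a handful of settings, as the paper seeds with Nelder-Mead runs.
X = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(5, n_params))
y = np.array([run_experiment(x) for x in X])

kernel = ConstantKernel() * RBF(length_scale=np.ones(n_params))  # one length scale per parameter
for _ in range(50):                                   # each iteration = one ~1 minute experimental run
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True).fit(X, y)

    # Propose the next setting: minimise an optimistic lower bound (mean - 2*std)
    # over a cloud of random candidates, trading off exploitation and exploration.
    candidates = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(2000, n_params))
    mu, sigma = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmin(mu - 2.0 * sigma)]

    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

# The fitted per-parameter length scales indicate sensitivity: a very long length
# scale suggests the cost barely depends on that parameter, so it can be dropped
# from later optimisations, as in the seven-to-six parameter example below.
print(gp.kernel_.k2.length_scale)
```

The lower-confidence-bound acquisition used here is just one simple choice; the point is the loop structure itself — fit the surrogate, propose the next setting, run the experiment, refit.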
They perform this optimisation over their evaporation sequence, so you can see the different stages of their cooling process, and they benchmark the Gaussian process against a Nelder-Mead algorithm, which is a rather dumb, local, gradient-descent-type algorithm that tries to find the optimum configuration by only ever looking at the last few data points, whereas the Gaussian process takes the entire set it has learnt so far and attempts to find the best parameters based on that. And you see this fast convergence of the Gaussian process. The reason both converge at the same rate at the start is that some training data has to be provided for the Gaussian process to get going, and they use the Nelder-Mead for generating that training set; afterwards, you see that the Gaussian process converges much, much faster, so they are able to produce a very cold, dense cloud after only a few evaluations. I should say that these are very small datasets compared to the kinds of datasets we've seen so far in the talks. I mentioned that these experiments typically take on the order of 30 seconds to a minute to perform, and if you want to have time to do your experiments as well as your optimisation, that means you only ever have a small number of evaluations to work with.

One nice result Wigley and co-authors show is that you can use the Gaussian process to eliminate parameters which are less important for optimising your system. This blue line here shows a Gaussian process optimisation using seven parameters. They then looked at the characteristic length scales the machine learner determined for those seven parameters and removed the least sensitive one — effectively saying that the outcome of the evaporation doesn't depend on the seventh parameter at all, so let's not bother optimising it — and they get a much faster optimisation on the six remaining parameters. That will be important later on.

I'd like to say that we here in Oxford have benchmarked this Gaussian process algorithm against other approaches for these types of experiments. We generate training datasets using a differential evolution algorithm, which is a genetic-algorithm-style approach: it takes different combinations of parameters and keeps features from the parameter sets which have performed well so far.
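One generation of differential evolution on the same kind of parameterised sequence might look roughly like this — an illustrative sketch, not the code we actually run, reusing the toy run_experiment stand-in from the earlier sketch:

```python
import numpy as np

def run_experiment(x):
    """Toy stand-in for one experimental run, as in the earlier sketch."""
    return float(np.sum((x - 0.3) ** 2) + 0.01 * np.random.randn())

rng = np.random.default_rng(0)
n_params, pop_size, F, CR = 7, 12, 0.7, 0.9

# Population of candidate parameter sets, one cost evaluation (one run) each.
pop = rng.uniform(0.0, 1.0, size=(pop_size, n_params))
cost = np.array([run_experiment(x) for x in pop])

for generation in range(20):
    for i in range(pop_size):
        # Mutation: combine three other members of the population.
        a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
        mutant = np.clip(a + F * (b - c), 0.0, 1.0)
        # Crossover: inherit some parameters from the mutant, some from the parent.
        mask = rng.random(n_params) < CR
        trial = np.where(mask, mutant, pop[i])
        # Selection: keep whichever set of parameters performs better on the experiment.
        trial_cost = run_experiment(trial)
        if trial_cost < cost[i]:
            pop[i], cost[i] = trial, trial_cost

print("best parameters:", pop[np.argmin(cost)], "best cost:", cost.min())
```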
This very fast blue line here, you can see, is the performance of the Gaussian process, and this darker blue line is the performance of a neural network. The neural network ultimately produces a slightly better solution for us, but it takes longer to converge, and part of the reason is that it requires more data to train. As I say, we are typically only dealing with something like 80 datasets for our experiment, which is just over an hour of running.

Another nice thing I can say: looking at that original parameter suite I showed in one of the earlier slides, these are very complicated experiments, so you might think these optimisations will only work if you can already provide them with a good starting point. But what we actually do here is start with entirely randomised parameters — this is the first image that we produce, which has absolutely nothing in it whatsoever. So another very nice thing about this is that, in principle, you could build an experimental apparatus knowing only that it could, in principle, make a cloud of cold atoms, and then allow the algorithm to take it from there and ultimately produce these very nice Bose-Einstein condensates.

I'll now talk about the other method of cooling I wanted to mention, which is laser cooling. I'm not sure how many of you are familiar with laser cooling; it might seem slightly counterintuitive that a laser can cool something down, because lasers are typically associated with heating things up, cutting things in half, and so on in the movies. One of the nice things about lasers is that they are a stream of photons, and photons each carry momentum. What this means is that if an atom interacts with a laser beam — here I have my photons streaming in from the left-hand side — and the atom absorbs these photons, going into some excited state, it effectively absorbs these photon momentum kicks from the left-hand side. Eventually the atom relaxes back down to the ground state and emits the photons back out, but in a random direction. So if I take the sum of this process, the emissions have a mean momentum of zero, whereas the absorptions all provide a momentum kick from one side.
The net effect of this process is therefore to exert a force on the atom. So in principle, what I want for laser cooling is to arrange that my atom always absorbs photons from a laser beam coming opposite to its direction of travel. If I can do that, the force applied to the atom will always act to slow it down. The way we do this in the lab is to send two counter-propagating beams in from either side, like this, with my atom here moving with some velocity in a certain direction. Now, what you should remember is that atoms respond only to very narrow bands of radiation: the optical frequencies an atom will respond to are defined by the electronic transitions that can occur within the atom. What this means is that if I red-detune my two laser beams, then while my atom is stationary it sees both of them as far detuned from the transition, but once it starts moving it sees one of the lasers as blue-shifted and the other as red-shifted. So I break the symmetry of the system: the atom will more rapidly scatter photons from the laser beam it is travelling towards, which is effectively shifted into resonance with it, and in doing so the atom feels a force that pushes against its direction of motion. That is the principle of laser cooling. I put this small picture of an ambulance in the bottom right because you will recognise this effect as the Doppler shift — the same reason why an ambulance siren changes pitch as it comes towards you and then drives away. We are performing exactly the same trick with the atoms, but with optical radiation.

If you want to make a trap, you also need some position dependence for the forces, and the way that's done in the laboratory is to apply a magnetic field. Because of the Zeeman effect, the optical frequencies of my transitions then gain a spatial dependence, and that allows me to set up a potential where I can trap atoms at some well-defined point. This image here in the bottom right is taken from our lab, which does laser cooling of strontium. The bright speck in the middle is a large number of strontium atoms which have been collected from an oven, cooled, and are now held by the bright blue laser you can see in the background.
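For completeness (the talk only describes this qualitatively), the standard textbook form of the average scattering force from one beam, including the Doppler-shifted detuning the moving atom sees, is

```latex
F \;=\; \hbar k\,\frac{\Gamma}{2}\,\frac{s}{1 + s + \left(2\delta_{\pm}/\Gamma\right)^{2}}, \qquad \delta_{\pm} = \delta \mp k v ,
```

where Γ is the transition linewidth, s the saturation parameter and δ the laser detuning; for red detuning (δ < 0) the beam opposing the atom's motion is shifted towards resonance and scatters more strongly, giving a net velocity-damping force.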
So how can optimisation of laser cooling be done? Well, this was shown by Tranter and co-authors in this Nature Communications paper. This flow diagram looks quite complicated at first, so I'll break it down, but essentially it is the same idea as the previous optimisation: we want to model some process, we set up a machine learner which will learn what that process is, and then we feed parameters into the experiment, learn how the experiment behaved given those parameters, and use that to improve the machine learner's knowledge.

The system they have up here in the top left is a magneto-optical trap of rubidium, with a very elongated, cigar-shaped cloud of atoms. The way they measure how many atoms they have laser cooled, and how cold and dense the cloud is, is to send a laser beam in along the axis — that's this here — and that laser beam is absorbed more if there are more atoms in the trap. So they measure a voltage on a photodiode, which becomes a measure of how cold and dense this cloud is.

To model this behaviour they use stochastic artificial neural networks. We saw in the earlier example that machine learners don't necessarily always fit the data properly, and one attempt to get around this is to use three different machine learners, each initialised with different random parameters. This is a kind of machine learning by consensus: you train three different learners, and if one of them gets stuck, hopefully the other two will have figured out what's going on and help to correct for it. The other thing, which isn't drawn on the diagram, is that they also use a genetic algorithm to generate a fourth data point each time they run. So they use these four different learners — the three neural networks and the one differential evolution — to model the process.

They generate a series of ramps from their control parameters, which are the detuning of the cooling laser beams and the detuning of a repump laser beam, which is used to keep atoms in the cooling transition. What I didn't mention in the previous slide is that your atoms get promoted to an excited state, and for laser cooling to work you want them to fall back into the same state they started in, so that you can cycle them. But there are of course other electronic states, and atoms will eventually fall into these dark states, so you use this repump to pump the atoms back into the cooling transition.
And the last parameter they have is the coil current: what they want to do is perform a sequence where they compress the cloud to get the coldest, densest sample. They separate their parameters into twenty-one time bins, which gives sixty-three parameters for them to optimise.

So I'll show you what their convergence looks like — let me just check something — yes. These blue dots are an optimisation using differential evolution, so attempting to model the process using a genetic algorithm, and these red dots show the faster convergence that their stochastic neural network produces. What this shows is that in a smaller number of runs they can more efficiently understand what the process is and then optimise it. Over on the right here we have absorption images showing what their cold, dense cloud looks like, and they show this against what the best students in the lab could achieve. They are basically showing that the machine learner, in only a few hours, is able to outperform a student who has been working on the experiment for a few years.

And this is one of the wackier things. I mentioned at the start that one of the advantages of machine learning is that it has no a priori understanding of the system. Obviously, as physicists, one of the things we do is come up with very simple toy models to describe systems, but the problem is that those toy models can often be flawed. On the left here we see the best human-optimised parameters, which look sensible: the repump detuning decreases linearly over time, one parameter is kept constant, and the coil current is ramped up. But if you look at the best parameters the machine learner produces, they look completely different. This is the equivalent of an inhuman chess move — it is not something the researchers expected when they first ran this, and they say in the paper that they were not necessarily even sure how to explain why it was optimal, apart from the fact that it was. What they believe may be happening is that there is a dynamics in which the magneto-optical trap — which is what confines the atoms — is releasing the atoms, allowing them to adiabatically expand and cool, and then recapturing them. In effect, the learner has found a technique the experimentalists were not using beforehand.
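A minimal sketch of this kind of surrogate — three small networks with different random initialisations fitted to (ramp parameters → measured cost), with the ramps encoded as 21 time bins for each of the 3 controls — might look as follows; the array sizes, network architecture and data here are assumptions, not the authors' values:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

n_bins, n_controls = 21, 3           # time bins per ramp; cooling detuning, repump detuning, coil current
n_params = n_bins * n_controls        # 63 ramp parameters in total

# Hypothetical training data: each row is one experimental run (a flattened set
# of ramps) together with its measured photodiode cost (here random stand-ins).
X = np.random.uniform(0.0, 1.0, size=(200, n_params))
y = np.random.uniform(0.0, 1.0, size=200)

# Three learners with different random initialisations: a simple "consensus"
# ensemble, so that one learner getting stuck in a poor fit matters less.
ensemble = [
    MLPRegressor(hidden_layer_sizes=(32, 32), random_state=seed, max_iter=2000).fit(X, y)
    for seed in (0, 1, 2)
]

def predicted_cost(ramps):
    """Average the three networks' predictions for a candidate set of ramps."""
    ramps = np.asarray(ramps).reshape(1, -1)
    return float(np.mean([net.predict(ramps)[0] for net in ensemble]))

# Candidate ramps can then be proposed against this predicted cost (for example
# by differential evolution, as in the paper) before being tried on the apparatus.
print(predicted_cost(np.random.uniform(0.0, 1.0, n_params)))
```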
To show that this is real, they show that these parameters are stable and don't change from day to day, and they plot these parameters against other, less optimal runs over here. One thing you'll notice is that the parameters rail very strongly against their limits — for example, in this part here you can see the coil current basically turning off and on very quickly. Looking at the cost landscape as a function of the coil current for this optimal solution, they show that, in principle, you may even want to extend the ranges of these parameters beyond what their experiment was able to do.

So we've looked at optimising two separate stages. What we've done here in Oxford as well is apply these techniques to optimise a full experimental sequence. We obviously have a very large number of parameters, so in order to optimise the entire experiment we have to be selective about the parameters we choose to optimise, and to do this we use the Gaussian process to understand which parameters the experiment is most sensitive to. The two plots on the right-hand side show, for the laser cooling and for the evaporative cooling, what the length scales are for the most important parameters as a function of run number — effectively, as the learner learns, these are how sensitive it thinks the various parameters are. Ultimately we use this to extract the most sensitive parameters for each of these cooling stages. Once we know what those are, we can fix all of the other parameters and perform a complete optimisation over the whole sequence, considering only the most sensitive parameters.

To put this into context again: we have a human optimisation, which is effectively the best the PhD students can do, and then this is the final optimisation including all of the different stages. Without going into the details, you can see from the colour map that this one is colder and denser — basically, this is a much better experiment.

Another very nice thing you can do once you have set this up follows from the fact that what constitutes an optimum experiment is very situationally dependent. So far I've said that it is optimal to have very cold, dense clouds.
And it often is, because being cold and dense sets a quantity called the chemical potential, which then becomes important in determining which terms of your Hamiltonian matter. But depending on what you actually want to use the experiment for, you may find other metrics are significant. For example, here we changed the cost function to say: give us the largest number of atoms at a temperature of one microkelvin. The cost function looks like this — as you increase the atom number the cost gets lower and lower, and you can see all of these cost functions are centred on that one microkelvin — and it will produce a larger cloud: you find a sequence which generates the largest cloud of atoms at exactly the temperature we asked for. Other useful optimisations include asking: what is the fastest time in which I can actually create a Bose-Einstein condensate? That may be useful if, for example, I'm aligning some optics around my experiment and I want to be able to adjust a lens and then take a picture, but I don't want to wait around for a minute between adjustments, because that quickly becomes tedious — so we can use this to cut down sequence times as well.
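Purely as an illustration of what such a situational cost might look like in code (the actual functional form is not given in the talk), one could write something like:

```python
import numpy as np

def cost(atom_number, temperature_uK, target_uK=1.0, width_uK=0.2):
    """Illustrative cost: more atoms is better, but only near the target
    temperature; the Gaussian factor centres the optimum on ~1 microkelvin.
    The functional form and width are assumptions, not the talk's values."""
    return -atom_number * np.exp(-((temperature_uK - target_uK) / width_uK) ** 2)

# A large cloud at the target temperature scores better (more negative) than a
# larger cloud far from it.
print(cost(2e5, 1.05), cost(5e5, 2.0))
```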
I would now like to talk about a slightly different use of machine learning in the laboratory. So far we've looked at how machine learning can be used to improve the performance of an experiment. But what if I have a device with a certain fixed performance, and what I'm instead trying to do is work out how best to evaluate that device? The question becomes: how can I extract the most information possible in the smallest number of measurements? This is the device in question — a quantum dot — and this work was done by the group of Andrew Briggs here in Oxford. The kind of measurements they want to take are the conductance of this device as a function of different gate voltages: we have a number of different voltages we can apply to the chip, and they change the conductance of the device. This is a 2D parameter scan of the space we're interested in — say we take a gate voltage and a bias voltage — and you'll notice there are some features you can immediately see in this image; for example, there is a very large band in the middle where basically no current flows.

So if I wanted to make a very efficient characterisation of the device, it probably wouldn't be sensible to take many measurements in that region; instead, you would want to take measurements around the areas where a lot is happening. I should say — I'm sorry, the reference has been lost because the slide has been resized to this four-by-three ratio — but this paper was led by Natalia Ares. [I guess you have the exact reference — npj Quantum Information?] Yes, in npj Quantum Information; this was last year, wasn't it? Yeah.

So what they are trying to do is, as I say, characterise these devices efficiently. They first perform an initial, very rough scan over the device, and the question is then: given this eight-by-eight scan of my parameter space, how should I choose my remaining measurements so as to learn this device as well as I can? The way they do this is to train a machine learner to predict what the device should look like given a map like this, and this machine learner produces various candidate devices. So, say the eight-by-eight scan looks like this: the machine learner generates a whole distribution of what the full device scan might look like, and what you can then look for is where these pictures actually disagree — where are the parts where the machine learner is effectively saying, I don't really know what the device is doing at this point? That produces this map here, which is a map of where the uncertainties lie in the parameter space. From this map you can extract the measurement that will give you the most information — the one that will most reduce your uncertainty about the device. They call this the information gain. So by looking at this map you can say: let's take the points where the information gain is greatest and measure those, and in doing so you reduce your uncertainties.

This shows, for example, some different reconstructions. If I have this partial scan of the parameter space and I look at what the various machine-learning-generated solutions look like along this line, you see all these different grey, wavy suggestions as to how the response may vary. Over here all of the grey, wavy suggestions converge, so there's very little point taking a measurement there; but here they are very, very different, so there would be a lot of benefit to measuring at that particular point.
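A rough sketch of this disagreement map, assuming we already have a stack of candidate reconstructions from some generative model (random stand-ins here, not real device data):

```python
import numpy as np

# Hypothetical stack of candidate full scans generated from a coarse 8x8
# measurement; in the real work these come from a trained generative model.
n_candidates, ny, nx = 50, 128, 128
candidates = np.random.rand(n_candidates, ny, nx)

# Where the candidates disagree, the model is uncertain: use the pixel-wise
# spread across candidates as a simple stand-in for the information gain map.
uncertainty_map = candidates.std(axis=0)

# Propose the next measurement at the most uncertain (highest-gain) pixel.
iy, ix = np.unravel_index(np.argmax(uncertainty_map), uncertainty_map.shape)
print(f"measure next at bias index {iy}, gate index {ix}")
```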
One thing they do to show that this algorithm works well is to make the observation — going back to this map — that the information gain map looks rather like the gradient of the measured function. That flat region in the centre, where not a lot is happening, obviously also has a very flat gradient — you're saying the output doesn't really depend on the parameters in that region — whereas along the parts where we want to measure there are large gradients. So they define this gradient, which is effectively a measure of the slope of the function at different points, and from it an information content: if I take the measurements I have made so far, measure the gradients at those points, and compare that back to the total amount of gradient in the image once the scan is complete, I can determine how much information I have extracted from the device so far. A standard linear raster scan will basically decrease linearly like this: sometimes I randomly happen to pick up a point which has a large change — a large gradient and therefore a lot of information gained — and sometimes I end up sampling a very flat region, so it just goes linearly. The optimal solution — if I had a device that was already fully characterised, the measurements I should have made would be to order the points by the gradient at each point and measure accordingly — gives this green line here, effectively the measurements sorted by the gradient at each point. And what they show is that their machine learner very closely approximates this green line. So the machine learner is very good at identifying the points it should sample in order to best improve our understanding of the device.
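A small sketch of this benchmark, assuming a completed scan is available so the gradient at every pixel is known (the arrays here are made up):

```python
import numpy as np

full_scan = np.random.rand(128, 128)                  # stand-in for the completed measurement

# "Gradient" weight of each pixel: how much the response changes there.
gy, gx = np.gradient(full_scan)
weight = np.hypot(gy, gx).ravel()
total = weight.sum()

def information_content(order):
    """Cumulative fraction of the total gradient captured after each measurement,
    for measurements taken in the given pixel order."""
    return np.cumsum(weight[order]) / total

raster_order = np.arange(weight.size)                  # plain row-by-row raster scan
optimal_order = np.argsort(weight)[::-1]               # pixels sorted by gradient (the 'green line')

print(information_content(raster_order)[:5])
print(information_content(optimal_order)[:5])
```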
So, to conclude the talk, I'll go back to my slide from the start, which listed the advantages that machine learning can give us as experimentalists. I've given you an example, in these optimisations, of how it can be useful to have an unbiased mind with no a priori model of how these systems work. I've also shown how the learner can pick up an intuitive understanding: we saw, in the work where they optimised the laser cooling stage, that the machine learner ended up finding effectively an entirely new way to do the laser cooling that the experimentalists hadn't thought of beforehand, just because the learner gains a sort of intuition as to how the system will respond. And the other point I would like to emphasise is the last one: having machine learning methods in our laboratories frees us up as experimentalists to think about the bigger picture. We're not getting bogged down in getting the experiment running well; we can turn up in the morning with an experiment that has been optimised overnight and get straight down to doing the physics. So thank you very much for listening.