1
00:00:02,160 --> 00:00:19,030
Okay. Welcome, everybody, to the Strachey Lecture.
2
00:00:19,040 --> 00:00:24,160
So this is our distinguished lecture series, which is named for Christopher Strachey,
3
00:00:24,180 --> 00:00:31,160
who founded the Programming Research Group here and was one of the leading figures in the early history of computing.
4
00:00:31,550 --> 00:00:37,190
Before I introduce our amazing speaker, I want to thank our sponsors, Oxford Asset Management,
5
00:00:37,190 --> 00:00:46,310
who have been generously supporting this series since 2015, and they have enabled us to bring great speakers here.
6
00:00:47,180 --> 00:00:56,140
I think some of you will want to meet the speaker afterwards, and that's okay.
7
00:00:56,150 --> 00:01:03,320
So it's a great pleasure to introduce Christel Baier, and Christel is super well known for her work in verification.
8
00:01:03,920 --> 00:01:09,979
And actually, Christel is also very modest, because the biography she gave me omitted all sorts of things.
9
00:01:09,980 --> 00:01:13,250
But luckily for me, I used Google and discovered things for myself.
10
00:01:13,640 --> 00:01:16,690
So I discovered, for example, that her textbook,
11
00:01:16,970 --> 00:01:24,230
Principles of Model Checking, has been cited more than 8,000 times and is essentially the book on model checking.
12
00:01:24,230 --> 00:01:32,960
And she also has some very, very influential papers, including a work called "Model-checking algorithms for continuous-time Markov chains",
13
00:01:33,140 --> 00:01:41,450
which again has very, very many citations and introduces both a new logic and new model-checking algorithms.
14
00:01:42,950 --> 00:01:49,399
Christel is the author of the verification column in ACM SIGLOG News and has a bunch of honours,
15
00:01:49,400 --> 00:02:00,170
including membership of Academia Europaea and two honorary doctorates, and she's really active in supporting German computer science.
16
00:02:00,170 --> 00:02:05,600
So I found Christel on just about every German computer science committee that exists.
17
00:02:06,000 --> 00:02:15,800
She's here today talking to us. The title of the talk is "From Classical to Non-Classical Stochastic Shortest Path Problems". Please welcome Christel.
18
00:02:16,610 --> 00:02:19,670
Like this? Yes, like that. Yeah.
19
00:02:20,360 --> 00:02:23,420
Thank you. Does the mic work? It does. It does. Okay.
20
00:02:23,900 --> 00:02:28,820
Yes. Thank you very much for inviting me. Thank you very much for organising this event.
21
00:02:28,830 --> 00:02:32,000
Thank you very much for sponsoring this event.
22
00:02:32,000 --> 00:02:35,060
It's a great honour for me to give this presentation.
23
00:02:35,750 --> 00:02:39,440
Well, I chose a topic which has a long history.
24
00:02:39,440 --> 00:02:45,049
So indeed, the first research was done 60 years ago, but in the end,
25
00:02:45,050 --> 00:02:51,140
the non-classical versions of the shortest path problem are more recent work.
26
00:02:52,430 --> 00:03:01,160
And before starting with the technical things, I would also like to say it is almost 30 years ago that I visited.
27
00:03:01,660 --> 00:03:09,940
Marta Kwiatkowska in Birmingham. We were just trying to remember which year it was, 95 or 96, and this was for me
28
00:03:09,940 --> 00:03:19,520
the time when I got in touch with questions concerning the verification of probabilistic programs or probabilistic systems.
29
00:03:19,940 --> 00:03:29,149
And at this time, I also read these old papers about the classical stochastic shortest path problem, and I would have never thought that
30
00:03:29,150 --> 00:03:32,480
30 years later I would give a talk about this topic.
31
00:03:32,600 --> 00:03:36,230
I would have expected that all the problems had been solved.
32
00:03:37,220 --> 00:03:44,780
Uh, yeah. So what is the question of the classical stochastic shortest path problem?
33
00:03:45,140 --> 00:03:52,140
So more or less the task is to construct a strategy or policy for traversing a graph,
34
00:03:52,140 --> 00:04:00,140
with probabilities attached, such that the expected accumulated cost until reaching a target is minimal,
35
00:04:00,140 --> 00:04:05,450
so the main problem is to minimise the expected costs or weights.
36
00:04:06,550 --> 00:04:10,770
A motivation. So this is a map.
37
00:04:12,090 --> 00:04:15,180
Here is Dresden, where I live; here is Oxford.
38
00:04:15,750 --> 00:04:19,710
And going from Dresden to Oxford, well, there are many possibilities.
39
00:04:20,100 --> 00:04:23,700
Taking a car, taking a train, taking a flight.
40
00:04:24,210 --> 00:04:29,220
But there are no direct flights; we have an extremely small airport in Dresden.
41
00:04:29,220 --> 00:04:32,970
Then I have to change in Frankfurt or Munich or in Berlin or whatever.
42
00:04:33,630 --> 00:04:39,390
And there are a lot of decisions that have to be made, and there's a lot of uncertainty.
43
00:04:39,660 --> 00:04:45,210
So the flight might be delayed or the flight might be cancelled, which is indeed the case with my flights.
44
00:04:45,220 --> 00:04:57,110
My flights have been cancelled. And so it is all about finding one way that is shortest, in the sense of shortest time for me, in expectation.
45
00:04:57,410 --> 00:05:00,620
So this is one instance of the shortest path problem.
46
00:05:01,250 --> 00:05:11,300
Now, indeed, in different communities (AI planning, robotics, operations research), this is a very classical and fundamental problem,
47
00:05:11,900 --> 00:05:19,550
but also for the verification of randomised systems or distributed randomised algorithms.
48
00:05:20,180 --> 00:05:30,310
Then, for instance, one might ask: what is the expected time to termination, the maximal or minimal expected time
49
00:05:30,680 --> 00:05:37,070
to termination, or the expected costs, the maximal cost that can happen in the worst case.
50
00:05:38,520 --> 00:05:47,290
Now, the formalisation of these problems uses Markov decision processes.
51
00:05:47,530 --> 00:05:51,460
Now, I do not want to bother you with complicated definitions,
52
00:05:51,610 --> 00:06:01,209
which are not that complicated in this case, and the simplest way to explain them, at least I find, is to do the following.
53
00:06:01,210 --> 00:06:09,540
We start with a classical transition system, shown here on the left, where in every state there is a choice between different actions.
54
00:06:10,180 --> 00:06:15,790
And in this context, we consider the choice of different actions as a nondeterministic choice,
55
00:06:16,090 --> 00:06:23,710
so different from finite automata, where we expect to have the same label for a nondeterministic choice.
56
00:06:24,040 --> 00:06:27,910
Here, the choice between alpha and beta is considered to be nondeterministic.
57
00:06:29,260 --> 00:06:33,940
The other extreme is a Markov chain, where we do not have nondeterminism at all.
58
00:06:34,270 --> 00:06:38,170
Instead, for every state we have a distribution over the successor states.
59
00:06:39,120 --> 00:06:42,330
A Markov decision process is somehow a mixture of both.
60
00:06:42,750 --> 00:06:49,740
So for every state, we have a set of actions, and for each we have a distribution over the successor states.
61
00:06:50,770 --> 00:06:54,340
I thought, Ah, this is missing because of this one.
62
00:06:54,960 --> 00:07:05,030
Maybe. Uh, yeah.
63
00:07:05,040 --> 00:07:08,849
So here, for alpha, we have the distribution one quarter, three quarters.
64
00:07:08,850 --> 00:07:13,720
And likewise we have a distribution for beta; and now, in this state,
65
00:07:13,920 --> 00:07:18,300
the choice between the different actions is considered to be a nondeterministic one.
66
00:07:18,720 --> 00:07:23,220
And having chosen the action in a state, then we have a probabilistic choice,
67
00:07:23,640 --> 00:07:28,680
which is understood to be an internal decision about which state is indeed the successor.
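To make this concrete, here is a tiny illustrative sketch (my own Python encoding, not part of the talk) of an MDP as a nested mapping from states to enabled actions to successor distributions, using the one-quarter/three-quarters distribution just mentioned; the state and action names are made up for illustration:

```python
# A toy MDP as a nested dict: state -> action -> {successor: probability}.
# In state "s" the choice between "alpha" and "beta" is nondeterministic;
# once an action is chosen, the successor is picked probabilistically.
mdp = {
    "s": {
        "alpha": {"t": 0.25, "u": 0.75},  # the 1/4, 3/4 distribution
        "beta":  {"t": 0.5,  "u": 0.5},
    },
    "t": {"tau": {"t": 1.0}},  # a trivial self-loop action
    "u": {"tau": {"u": 1.0}},
}

# Sanity check: every action's distribution sums to one.
for actions in mdp.values():
    for dist in actions.values():
        assert abs(sum(dist.values()) - 1.0) < 1e-9
print(len(mdp))  # prints 3 (number of states)
```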
68
00:07:29,910 --> 00:07:34,740
Now to talk about probabilities or expectations in Markov decision processes,
69
00:07:35,100 --> 00:07:42,150
we need some instance which resolves the nondeterminism, and this is what we call schedulers or policies or strategies.
70
00:07:42,540 --> 00:07:52,349
So they take the history, what has happened, so the sequence of states and actions that have been taken in the past, and then, for the current state,
71
00:07:52,350 --> 00:07:55,890
they choose one of the actions, possibly in a randomised way.
72
00:07:56,640 --> 00:08:06,420
And given such a scheduler, you can unfold the behaviour of the MDP into a tree-like Markov chain,
73
00:08:06,420 --> 00:08:11,880
and there we have the full machinery to reason about probabilities and expectations.
74
00:08:12,540 --> 00:08:15,749
If you are not familiar with this, just use your intuition.
75
00:08:15,750 --> 00:08:21,550
I think this will be enough for this talk. And what else do we need?
76
00:08:21,560 --> 00:08:29,110
For the shortest path problem, we deal with weights, integer weights that are attached to the state-action pairs.
77
00:08:30,940 --> 00:08:40,509
So now here is the definition. The only important thing is that throughout this talk, whenever I say MDP, I always mean a finite-state
78
00:08:40,510 --> 00:08:44,440
MDP: only finitely many states and finitely many actions.
79
00:08:46,730 --> 00:08:55,850
Now, a scheduler can be formalised as a partial function from finite paths to distributions over actions.
80
00:08:55,880 --> 00:09:03,890
This would be the randomised case, and the deterministic case is where these distributions are Dirac distributions,
81
00:09:04,040 --> 00:09:12,110
which means that a single action is chosen with probability one; a unique action is chosen deterministically.
82
00:09:13,490 --> 00:09:17,030
And another important notion is that of a memoryless scheduler.
83
00:09:17,480 --> 00:09:21,850
They do not look at the full history; they just look at the current state.
84
00:09:21,860 --> 00:09:26,540
And whenever I visit the same state multiple times, I make the same decision.
85
00:09:26,570 --> 00:09:32,570
This is what is meant by memoryless, and on the following slides I often use the short-form
86
00:09:32,570 --> 00:09:37,340
notations MD for memoryless deterministic and MR for memoryless randomised.
87
00:09:39,770 --> 00:09:50,780
And another important concept, introduced by Luca de Alfaro more than 20 years ago, is the notion of an end component,
88
00:09:51,590 --> 00:09:56,450
which, to say it in easy words, is a strongly connected sub-MDP.
89
00:09:56,810 --> 00:10:03,170
And by saying sub-MDP, what we mean is the following: for every state we have a set of actions that can be taken.
90
00:10:03,500 --> 00:10:09,530
And now we take just a subset of these actions, and this gives a sub-MDP.
91
00:10:09,530 --> 00:10:13,370
The probabilistic choices for these selected actions remain the same.
92
00:10:13,550 --> 00:10:18,620
They remain unchanged. So if you look at the underlying graph, then it is indeed a subgraph.
93
00:10:19,580 --> 00:10:24,860
And for this to be an end component, this graph needs to be strongly connected.
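As an illustrative sketch (my own Python with hypothetical names, not from the talk; the MDP is encoded as a nested dict state -> action -> {successor: probability}), a direct check of this definition (the selected actions never leave the state set, and the underlying graph is strongly connected) could look like this:

```python
def is_end_component(mdp, states, selected):
    """mdp: state -> action -> {succ: prob};
    states: a subset of states; selected: state -> set of chosen actions."""
    # 1. Closedness: every state needs at least one selected action, and
    #    every selected action leads back into `states` with probability one.
    for s in states:
        if not selected.get(s):
            return False
        for a in selected[s]:
            if any(t not in states for t in mdp[s][a]):
                return False
    # 2. Strong connectivity of the underlying subgraph:
    #    every state must reach every other state via selected actions.
    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            s = stack.pop()
            for a in selected[s]:
                for t in mdp[s][a]:
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
        return seen
    return all(reachable(s) == set(states) for s in states)

# Example: two states cycling between each other form an end component.
example = {"s": {"a": {"t": 1.0}}, "t": {"b": {"s": 0.5, "t": 0.5}}}
print(is_end_component(example, {"s", "t"}, {"s": {"a"}, "t": {"b"}}))  # prints True
```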
94
00:10:26,190 --> 00:10:36,210
And there is one theorem that has been established by de Alfaro in his thesis in 1997 and also in multiple papers,
95
00:10:36,900 --> 00:10:42,120
what I call the fundamental property, which is somehow the MDP analogue
96
00:10:42,120 --> 00:10:46,380
of the fact that in finite-state Markov chains,
97
00:10:46,830 --> 00:10:56,220
we know that almost surely, so with probability one, a bottom strongly connected component will be reached and the system will stay there forever.
98
00:10:56,760 --> 00:11:03,300
And the MDP analogue uses these end components and states that no matter which scheduler you choose,
99
00:11:04,050 --> 00:11:10,710
the limit of almost all infinite paths will constitute an end component. And here, by the limit,
100
00:11:10,890 --> 00:11:15,630
we mean that we look at the state-action pairs that occur infinitely often.
101
00:11:17,540 --> 00:11:24,050
Now, here's an example. We have an MDP, uh, and we have.
102
00:11:28,710 --> 00:11:33,210
It stopped working. Does it work here?
103
00:11:33,450 --> 00:11:37,020
Yeah. We have two end components.
104
00:11:37,500 --> 00:11:46,020
Does this still work? Yeah, we have two end components, and we have this state here.
105
00:11:46,800 --> 00:11:50,400
Oh, that's great. Uh, but.
106
00:11:50,400 --> 00:11:59,340
This one here is a trap state, a terminal state where no action can be taken.
107
00:11:59,970 --> 00:12:08,460
And the lemma, applied to such MDPs that might have trap states, means that under each scheduler,
108
00:12:08,970 --> 00:12:15,360
almost all maximal paths are either finite paths that end in a trap, or infinite ones.
109
00:12:16,260 --> 00:12:21,299
They indeed either end in a trap or they eventually realise an end component.
110
00:12:21,300 --> 00:12:30,300
And by realising an end component, I mean that the state-action pairs occurring infinitely often constitute an end component.
111
00:12:31,420 --> 00:12:35,220
And for the special case where we do not have,
112
00:12:35,830 --> 00:12:40,870
okay, where we do not have any end components at all,
113
00:12:41,050 --> 00:12:47,890
so an EC-free MDP, then the lemma simply states that with probability one,
114
00:12:48,220 --> 00:12:53,260
no matter which scheduler we choose, the system will end in a trap state.
115
00:12:55,400 --> 00:13:06,020
So now, the typical task of synthesis problems for MDPs, or formalised synthesis problems, would be: given such
116
00:13:06,020 --> 00:13:15,180
a finite-state MDP, find a scheduler that maximises or minimises the expectation of some random variable.
117
00:13:16,330 --> 00:13:20,890
And the inputs of this random variable are simply maximal paths.
118
00:13:21,860 --> 00:13:25,640
So one very popular measure is the discounted weight.
119
00:13:26,510 --> 00:13:35,420
Remember that we have these individual weights for the state-action pairs, and now for the discounted weight of a maximal path,
120
00:13:35,780 --> 00:13:39,530
we take the sum over all positions i in that path;
121
00:13:40,580 --> 00:13:47,300
we take the weight which is collected at position i with the corresponding state-action pair,
122
00:13:48,320 --> 00:13:56,210
and we multiply it with the i-th power of the discount factor gamma, which is a value strictly between zero and one.
123
00:13:57,440 --> 00:14:03,380
And with gamma being strictly between zero and one, we do not have convergence problems,
124
00:14:03,620 --> 00:14:15,850
even if the sum is infinite. The total weight is more or less the same, but you can think of it as discount factor one, without any discount factor.
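In symbols (my own notation, matching the verbal definitions above), for an infinite path pi = s_0 alpha_0 s_1 alpha_1 ... the two measures read:

```latex
\mathrm{disc}_{\gamma}(\pi) \;=\; \sum_{i \geq 0} \gamma^{i} \cdot \mathrm{wgt}(s_i, \alpha_i)
\quad \text{with } 0 < \gamma < 1,
\qquad
\mathrm{total}(\pi) \;=\; \sum_{i \geq 0} \mathrm{wgt}(s_i, \alpha_i).
```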
125
00:14:16,720 --> 00:14:21,640
Now here, indeed, convergence is an issue for infinite paths.
126
00:14:22,090 --> 00:14:27,760
And I would say the total weight is not that interesting for infinite paths
127
00:14:28,960 --> 00:14:31,000
without any further side constraint.
128
00:14:31,780 --> 00:14:40,239
What has been done in the literature is to consider only finite paths, either in a finite-horizon setting,
129
00:14:40,240 --> 00:14:45,069
which means that we have some fixed bound for the length of the paths that we consider.
130
00:14:45,070 --> 00:14:52,719
And then we look at the maximal expectation after exactly n steps; or the infinite-horizon case, where
131
00:14:52,720 --> 00:15:00,880
we have a distinguished goal state and we only accumulate the weight until we reach this goal state.
132
00:15:01,600 --> 00:15:08,740
And with the side assumption that we only look at those schedulers that indeed reach this goal state with probability one.
133
00:15:09,130 --> 00:15:18,660
These are the so-called proper schedulers. And there is another very prominent measure, a random variable, which is used in the literature.
134
00:15:19,080 --> 00:15:27,420
This is the mean payoff, or the long-run average. So what we do is we take the partial sums of these total weights.
135
00:15:27,720 --> 00:15:36,300
So we look at all the prefixes, the finite prefixes, of a given path, and we take the average weight of these finite prefixes.
136
00:15:36,630 --> 00:15:43,260
This is this value here. And then we either take the limit superior or the limit inferior
137
00:15:44,170 --> 00:15:50,590
of these values. Now, given a single path, the lim inf and the lim sup can be different values.
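Written out (my notation), the two variants of the mean payoff of a path pi = s_0 alpha_0 s_1 alpha_1 ... are:

```latex
\mathrm{MP}^{\sup}(\pi) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{wgt}(s_i, \alpha_i),
\qquad
\mathrm{MP}^{\inf}(\pi) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{wgt}(s_i, \alpha_i).
```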
138
00:15:51,250 --> 00:15:59,950
However, what is known is that in finite Markov chains it makes no difference whether we deal with the lim sup or the lim inf.
139
00:16:00,130 --> 00:16:03,760
So in expectation, these are the same.
140
00:16:03,970 --> 00:16:11,200
And this result carries over to Markov decision processes, where the maximum does not change
141
00:16:11,200 --> 00:16:15,640
whether we take the lim inf or the lim sup, and similarly for the minimum over all schedulers.
142
00:16:16,940 --> 00:16:20,840
Well, and moreover, and I will not zoom into the details,
143
00:16:21,590 --> 00:16:31,370
there are efficient algorithms to compute a memoryless deterministic scheduler where the maximum or the minimum is achieved,
144
00:16:31,370 --> 00:16:35,270
and these can be computed using linear programming techniques.
145
00:16:37,540 --> 00:16:45,300
So now let's look, or let's go, more in the direction of the shortest path problem.
146
00:16:45,310 --> 00:16:49,030
And let's look first at the discounted case.
147
00:16:49,450 --> 00:16:53,380
And here the key concept is the so-called Bellman equation.
148
00:16:53,920 --> 00:16:57,549
This was somewhere in the fifties,
149
00:16:57,550 --> 00:17:03,790
so it's quite a long time ago that Bellman introduced these equations.
150
00:17:04,060 --> 00:17:12,610
Okay. So now we are in the discounted case. Okay. So the task is to find a scheduler that maximises the expected discounted weight.
151
00:17:13,700 --> 00:17:23,990
And here are the Bellman equations. So the idea is to characterise the global maximum by local maxima in every state.
152
00:17:24,860 --> 00:17:27,130
So, s is a state,
153
00:17:27,170 --> 00:17:38,600
and x_s is a variable for the maximal expected discounted weight that can be achieved from state s, so if we treat s as the initial state.
154
00:17:40,070 --> 00:17:48,320
And then this value x_s equals the maximum over all the actions that can be taken in s.
155
00:17:49,420 --> 00:18:02,050
And for each such action alpha, we take the weight of the state-action pair, plus the weighted sum of these values for the successors.
156
00:18:03,090 --> 00:18:12,840
And the weights are given by the discount factor gamma multiplied with the transition probability of going from s, when taking alpha, to state t.
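In symbols, the Bellman equation just described reads (my notation; Act(s) is the set of actions enabled in s, and P the transition probability function):

```latex
x_s \;=\; \max_{\alpha \in \mathit{Act}(s)} \Bigl( \mathrm{wgt}(s, \alpha) \;+\; \gamma \cdot \sum_{t} P(s, \alpha, t) \cdot x_t \Bigr).
```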
157
00:18:15,150 --> 00:18:20,030
This is a recursive equation, so x_s equals something,
158
00:18:20,030 --> 00:18:23,770
and on the right-hand side we have the values for the successors.
159
00:18:24,200 --> 00:18:33,860
So you can think of it as a function that maps vectors, with one component for each state, to vectors.
160
00:18:34,460 --> 00:18:40,270
And what has been shown is that this function, thanks to this discount factor,
161
00:18:40,280 --> 00:18:45,350
gamma, is a contracting map on a Banach space, so it has a unique fixed point.
162
00:18:46,700 --> 00:18:52,730
And this indeed also is the key property to establish algorithms.
163
00:18:53,090 --> 00:18:57,900
So one of them is derived from linear programming techniques.
164
00:18:58,370 --> 00:19:05,030
Namely, instead of saying x_s equals the maximum, we say x_s is greater or equal
165
00:19:05,480 --> 00:19:10,700
than all these values, for each action that can be taken in state s.
166
00:19:11,850 --> 00:19:17,880
And to ensure that x_s equals the maximum, we minimise these values.
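Written as a linear program (my notation), this is:

```latex
\text{minimise } \sum_{s} x_s
\quad \text{subject to} \quad
x_s \;\geq\; \mathrm{wgt}(s, \alpha) \;+\; \gamma \cdot \sum_{t} P(s, \alpha, t) \cdot x_t
\quad \text{for all } s \text{ and } \alpha \in \mathit{Act}(s).
```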
167
00:19:19,820 --> 00:19:27,460
So this is one thing. And the other thing, which can also be derived from this, is the value iteration.
168
00:19:27,470 --> 00:19:35,780
So this function: if we start with an arbitrary value, or zero, in every component,
169
00:19:36,800 --> 00:19:43,490
and then we simply apply this function F successively, then convergence is guaranteed.
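A minimal value-iteration sketch in Python (my own code and toy instance, not from the talk; the MDP is encoded as a nested dict state -> action -> {successor: probability}):

```python
def value_iteration(mdp, wgt, gamma, iters=1000):
    """Discounted-weight value iteration: repeatedly apply the Bellman
    operator F, starting from the zero vector.
    mdp: state -> action -> {succ: prob}; wgt: (state, action) -> weight."""
    x = {s: 0.0 for s in mdp}  # start with zero in every component
    for _ in range(iters):
        # One application of F: local maximum over the enabled actions.
        x = {s: max(wgt[s, a] + gamma * sum(p * x[t] for t, p in dist.items())
                    for a, dist in acts.items())
             for s, acts in mdp.items()}
    return x

# Toy instance: from "s", alpha pays 1 and loops, beta pays 2 but moves
# to a zero-weight sink "t".
mdp = {"s": {"alpha": {"s": 1.0}, "beta": {"t": 1.0}},
       "t": {"tau": {"t": 1.0}}}
wgt = {("s", "alpha"): 1.0, ("s", "beta"): 2.0, ("t", "tau"): 0.0}
vals = value_iteration(mdp, wgt, gamma=0.5)
print(round(vals["s"], 6))  # prints 2.0: looping on alpha yields 1/(1-0.5)
```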
170
00:19:44,300 --> 00:19:52,220
And there is also a technique I will not speak about, which is called policy iteration: one successively improves a memory
171
00:19:52,220 --> 00:19:59,090
less deterministic scheduler, and eventually this stabilises at an optimal solution.
172
00:20:01,110 --> 00:20:06,150
So now this was the case, this one here for discounting.
173
00:20:06,570 --> 00:20:08,910
And remember what I said for the total weight:
174
00:20:09,030 --> 00:20:19,140
this is essentially the same, but without the discount factor. And indeed the Bellman equations themselves, for this infinite-horizon case,
175
00:20:19,680 --> 00:20:29,550
still look like this, unless s is already one of these distinguished goal states, in which case x_s is simply zero;
176
00:20:30,180 --> 00:20:37,860
but otherwise we have the same equation, and here the discount factor gamma is one, or can be dropped.
177
00:20:38,960 --> 00:20:44,870
But now things are more difficult. Before, I said we have a contracting map,
178
00:20:45,850 --> 00:20:49,080
but this is not clear here, not
179
00:20:49,090 --> 00:20:53,070
in all the cases; we do not have the discount factor any more.
180
00:20:53,470 --> 00:21:01,710
So this needs additional things to make sure that we have a unique fixed point and so on.
181
00:21:02,820 --> 00:21:09,540
And here we talk about the stochastic shortest path problem,
182
00:21:09,540 --> 00:21:13,260
and all this research is about these extra things that we need.
183
00:21:15,090 --> 00:21:18,629
So, I skipped the outline at the beginning,
184
00:21:18,630 --> 00:21:26,940
but what I just did was to give you some basics about the model we talk about and these important measures,
185
00:21:27,360 --> 00:21:30,839
and now we talk about the stochastic shortest path problem,
186
00:21:30,840 --> 00:21:35,100
first the classical one, and then later on variants.
187
00:21:36,580 --> 00:21:49,330
So this notion of the stochastic shortest path problem and the main results go back to Bertsekas and Tsitsiklis. From the year, this is very small here,
188
00:21:49,330 --> 00:21:56,890
it is 1991, in a paper published in Mathematics of Operations Research,
189
00:21:57,610 --> 00:22:04,599
and in this paper they have a very nice section on the related work or summary of
190
00:22:04,600 --> 00:22:09,580
the work that had been done at this time, a summary of the state of the art.
191
00:22:10,570 --> 00:22:15,010
So here is this paper from 91, and it goes back to 62
192
00:22:15,280 --> 00:22:19,570
when this problem was first investigated,
193
00:22:19,900 --> 00:22:26,830
and they also nicely describe that they even apologise for introducing a new name for this, because
194
00:22:27,460 --> 00:22:32,380
it was used under different names, three other different names, before.
195
00:22:33,340 --> 00:22:42,100
But they also explain that this is in the style of the classical shortest path problems which are known from graph theory.
196
00:22:43,840 --> 00:22:48,120
So now, let's look at this picture here.
197
00:22:49,050 --> 00:22:52,740
We have a weighted finite-state Markov decision process.
198
00:22:53,220 --> 00:23:02,820
And we have a certain goal state, which is the state that we want to reach, and we accumulate the weights until we reach this goal state.
199
00:23:03,600 --> 00:23:08,940
And then we want to reason about the minimal or maximal expectation of this random variable.
200
00:23:10,300 --> 00:23:17,590
Now, for the rest of the talk, I will consider the longest path problem, where we are after the maximum.
201
00:23:17,980 --> 00:23:21,340
The techniques for the minimum are completely analogous.
202
00:23:23,500 --> 00:23:32,580
So the first result, and this goes back to the late sixties, was that in the special case where, under
203
00:23:32,620 --> 00:23:37,960
every scheduler, the minimal probability to reach the goal state is one,
204
00:23:39,190 --> 00:23:46,000
in this case, what has been proven is that the Bellman equations still work,
205
00:23:46,980 --> 00:23:53,820
and we get a contracting map. MDPs of this form are called contracting MDPs,
206
00:23:53,820 --> 00:23:58,530
and these are exactly the EC-free ones that do not have any end components.
207
00:24:00,080 --> 00:24:06,080
And it was shown that there is a reduction to the discounted case in this case.
208
00:24:07,150 --> 00:24:11,740
So all this machinery for the discounted weight works also for this case.
209
00:24:12,930 --> 00:24:21,930
Now, the contribution of Bertsekas and Tsitsiklis was this: requiring that under every scheduler we reach the goal state
210
00:24:21,930 --> 00:24:33,060
is a very strong requirement. Now let's relax it by considering only those schedulers, what they called the proper schedulers,
211
00:24:33,480 --> 00:24:36,710
namely those that reach the goal with probability one.
212
00:24:36,750 --> 00:24:40,270
But there might be others.
213
00:24:41,360 --> 00:24:45,290
They showed that under certain conditions
214
00:24:45,290 --> 00:24:52,550
we can still rely on these Bellman equations and the techniques that have been developed for the discounted case,
215
00:24:52,940 --> 00:25:03,550
but they needed a second condition here, which is that the expected total weight under each improper scheduler,
216
00:25:03,560 --> 00:25:09,680
so each scheduler that does not reach the goal state with probability one,
217
00:25:11,620 --> 00:25:21,010
must be minus infinity. And they showed that if these two conditions hold, then the maximum is indeed finite,
218
00:25:21,340 --> 00:25:26,740
And all this machinery that has been developed for the discounted case still works.
219
00:25:27,220 --> 00:25:32,200
So now the question is:
220
00:25:33,870 --> 00:25:38,939
how to check this condition, the second condition. The first one is simple:
221
00:25:38,940 --> 00:25:42,870
there are algorithms to do it in polynomial time.
222
00:25:44,670 --> 00:25:49,790
But how to check this condition, the second one? And the next thing is:
223
00:25:50,370 --> 00:25:55,550
So if these two conditions hold, then the maximal expectation is finite.
224
00:25:55,560 --> 00:25:59,040
But there can be other cases where the maximal expectation is finite.
225
00:25:59,040 --> 00:26:00,840
So it's not a complete criterion.
226
00:26:02,350 --> 00:26:12,940
So what to do in this case? We want to find out how to check this and how to compute, in these cases, an optimal scheduler and the optimal value.
227
00:26:14,080 --> 00:26:21,310
And the first answer to these questions was given by Luca de Alfaro in his CONCUR'99 paper,
228
00:26:21,850 --> 00:26:31,390
where he considered Markov decision processes where the weights all have the same polarity: they are all non-positive or they are all non-negative.
229
00:26:32,470 --> 00:26:38,770
And what he did was to provide a transformation to the case of Bertsekas and
230
00:26:38,770 --> 00:26:44,410
Tsitsiklis, so that these two conditions hold for the transformed MDP.
231
00:26:45,040 --> 00:26:52,450
And the idea, this is now for the case where all weights are non-positive, so either negative or zero:
232
00:26:53,170 --> 00:27:00,730
you observe the following. Whenever you have an end component where every state-action pair has weight zero,
233
00:27:01,540 --> 00:27:06,940
then you can collapse it to a single state; then it is no longer an end component.
234
00:27:08,350 --> 00:27:16,960
In that way, you can remove all these end components, what I call zero end components.
235
00:27:17,290 --> 00:27:24,360
And what remains is an MDP where each end component contains at least one state-action pair
236
00:27:24,370 --> 00:27:31,420
with negative weight, and the other weights are non-positive anyway, which gives you that the expectation,
237
00:27:31,990 --> 00:27:37,630
the expected value of the total weight, is indeed minus infinity for each improper scheduler.
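As a simplified illustrative sketch (my own Python with hypothetical names, not from the talk), the collapsing step can be pictured like this: a given state set is merged into one fresh state, the internal zero-weight state-action pairs of the zero end component are dropped, and all other transitions are redirected:

```python
def collapse(mdp, states_to_merge, merged, internal):
    """Collapse `states_to_merge` into a single state `merged`.
    `internal` is the set of (state, action) pairs of the zero end
    component being removed; these disappear, all other actions of the
    merged states are kept, and successors are redirected to `merged`.
    mdp: state -> action -> {successor: probability}."""
    def redirect(t):
        return merged if t in states_to_merge else t
    new_mdp = {}
    for s, acts in mdp.items():
        target = redirect(s)
        new_acts = new_mdp.setdefault(target, {})
        for a, dist in acts.items():
            if s in states_to_merge and (s, a) in internal:
                continue  # drop the zero-weight internal actions
            new_dist = {}
            for t, p in dist.items():
                new_dist[redirect(t)] = new_dist.get(redirect(t), 0.0) + p
            # tag actions of merged states with their origin to avoid clashes
            new_acts[(s, a) if s in states_to_merge else a] = new_dist
    return new_mdp

# Example: "a" and "b" form a zero end component via their "stay" actions;
# only the leaving action ("a", "out") survives the collapse.
mdp0 = {"a": {"stay": {"b": 1.0}, "out": {"g": 1.0}},
        "b": {"stay": {"a": 1.0}},
        "g": {}}
new = collapse(mdp0, {"a", "b"}, "ab", {("a", "stay"), ("b", "stay")})
print("ab" in new and "a" not in new)  # prints True
```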
238
00:27:39,820 --> 00:27:47,230
And in the case where all the weights are non-negative, de Alfaro proved the following:
239
00:27:47,830 --> 00:27:50,920
the maximal expectation is finite
240
00:27:51,310 --> 00:27:55,720
if and only if the weights in all end components are zero.
241
00:27:56,590 --> 00:28:01,600
So no end component contains a state-action pair with positive weight.
242
00:28:02,350 --> 00:28:12,970
And in that case one can do the same thing: one collapses all the zero end components, and the result is an MDP that has no improper scheduler at all.
243
00:28:14,230 --> 00:28:17,230
Because, if all the end components are zero end components,
244
00:28:17,530 --> 00:28:25,240
then after collapsing the zero end components, there are no end components anymore.
245
00:28:26,960 --> 00:28:34,280
And these concepts can also be generalised to the case of arbitrary integer weights.
246
00:28:34,670 --> 00:28:38,060
So we can have a mixture of positive and negative weights.
247
00:28:38,600 --> 00:28:49,790
And let's quickly have a closer look at this criterion for finiteness and at this transformation.
248
00:28:52,290 --> 00:28:57,359
There is some cleaning up to make sure that from every state we can reach the goal
249
00:28:57,360 --> 00:29:03,880
state with probability one, and that the other states are removed, and so on.
250
00:29:03,900 --> 00:29:11,460
I will skip this. Then, after such a cleaning up, the resulting characterisation of finiteness looks like this.
251
00:29:12,270 --> 00:29:23,100
The maximal expectation is finite if and only if the MDP has no weight-divergent end component. An end component is said to be
252
00:29:23,180 --> 00:29:30,570
weight-divergent if it has a scheduler such that, for almost all the paths under this scheduler,
253
00:29:31,170 --> 00:29:34,350
the lim sup of the accumulated weights of the prefixes
254
00:29:36,040 --> 00:29:43,920
tends to infinity. Now, this observation yields an algorithm for checking finiteness:
255
00:29:44,370 --> 00:29:53,850
compute the maximal end components, which can be done with an iterative strongly connected component computation in polynomial time,
256
00:29:55,080 --> 00:29:59,160
and then one can check weight divergence for each of them separately.
257
00:29:59,250 --> 00:30:02,550
To give you an idea for that, let's have a closer look.
258
00:30:02,730 --> 00:30:08,060
So, this was again the definition of weight-divergent end components:
259
00:30:09,360 --> 00:30:15,240
there exists a scheduler such that almost surely the lim sup is plus infinity.
260
00:30:16,110 --> 00:30:19,680
One special case is what has been called pumping end components.
261
00:30:20,730 --> 00:30:26,580
The same thing holds, but now in a stronger form: namely, even the lim inf goes to plus infinity.
262
00:30:27,360 --> 00:30:35,310
This, for instance, applies to this end component here, which is just a Markov chain, which is completely uniform.
263
00:30:35,580 --> 00:30:41,160
So in each state, with uniform probability, we will stay in that state or go to the other state.
264
00:30:41,790 --> 00:30:46,710
And for one state we have the weight plus two, and for the other one, minus one.
265
00:30:47,590 --> 00:30:51,669
Then if you look at the accumulated weight, almost surely
266
00:30:51,670 --> 00:30:57,790
it goes up and up and up, higher and higher and higher, such that indeed the limit will be plus infinity.
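To make this concrete, here is a minimal simulation sketch of such a pumping end component, assuming (as the numbers above suggest) a two-state chain that stays or switches with probability one half, carrying the weights +2 and -1:

```python
import random

def simulate_accumulated_weight(steps, seed=0):
    """Simulate the two-state uniform Markov chain sketched in the talk:
    from either state we stay or switch with probability 1/2 each; state 0
    carries weight +2 and state 1 weight -1 (numbers as quoted above).
    Returns the accumulated weight after the given number of steps."""
    rng = random.Random(seed)
    state = 0
    weight = {0: 2, 1: -1}
    total = 0
    for _ in range(steps):
        total += weight[state]
        if rng.random() < 0.5:  # uniform choice: stay or move
            state = 1 - state
    return total
```

With expected weight (2 - 1)/2 = 1/2 per step, the total drifts upward linearly, so even the liminf of the accumulated weight is plus infinity almost surely.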
267
00:30:59,560 --> 00:31:04,780
Another extreme situation is what has been called a gambling end component.
268
00:31:06,910 --> 00:31:12,490
The liminf goes to minus infinity and the limsup goes to plus infinity.
269
00:31:12,640 --> 00:31:16,660
So we still have this divergence condition here on the right hand side.
270
00:31:17,320 --> 00:31:22,420
So this would be the case where the accumulated weight oscillates:
271
00:31:22,540 --> 00:31:28,510
it goes up and down and up and down, and so on.
272
00:31:30,070 --> 00:31:35,230
So now here comes an algorithm to check weight divergence,
273
00:31:36,300 --> 00:31:44,400
assuming that we have a strongly connected Markov decision process. This assumption is justified because, as I said before,
274
00:31:44,610 --> 00:31:52,590
to check the finiteness of the maximal expectation it is sufficient to analyse the maximal end components separately.
275
00:31:52,600 --> 00:32:04,010
So we are in the strongly connected case. We first compute the maximal mean payoff and a corresponding memoryless deterministic scheduler.
276
00:32:06,130 --> 00:32:11,060
If this mean payoff is positive, then we can safely return
277
00:32:11,060 --> 00:32:19,040
yes. Because, just remember, the mean payoff was defined as the average of the weights that we collect in the first n steps,
278
00:32:19,040 --> 00:32:27,320
where n goes to infinity. So if this is the case, if even this mean payoff is positive, then we are in a pumping case.
279
00:32:29,730 --> 00:32:37,410
If the mean payoff of this scheduler, the maximal mean payoff, is negative, then we can safely return
280
00:32:37,530 --> 00:32:46,440
no: the MDP cannot be weight-divergent, because the maximal expected mean payoff in all end components must be negative.
281
00:32:48,420 --> 00:32:54,840
So when we arrive here, neither of the two cases applies:
282
00:32:55,350 --> 00:33:01,410
the expected mean payoff, the maximal mean payoff, is zero.
283
00:33:02,500 --> 00:33:08,800
Now this scheduler, since it is a memoryless deterministic scheduler, induces a Markov chain.
284
00:33:10,030 --> 00:33:13,770
Now this Markov chain, you can consider it as a graph.
285
00:33:13,780 --> 00:33:17,319
It has certain strongly connected components.
286
00:33:17,320 --> 00:33:20,650
And now look for those that are at the bottom that cannot be left.
287
00:33:21,160 --> 00:33:22,060
Pick one of them.
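Finding these bottom strongly connected components is standard graph work; a small self-contained sketch (Kosaraju's algorithm on a dictionary-based graph representation, which is mine, not from the talk):

```python
def sccs(graph):
    """Strongly connected components of a directed graph given as
    {node: set_of_successors}, via Kosaraju's algorithm."""
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                dfs(v)
        order.append(u)        # record nodes in finish order
    for u in graph:
        if u not in seen:
            dfs(u)
    rev = {u: set() for u in graph}
    for u in graph:
        for v in graph[u]:
            rev[v].add(u)      # transpose graph
    comp, components = {}, []
    for u in reversed(order):  # second pass on the transpose
        if u in comp:
            continue
        members, stack = {u}, [u]
        comp[u] = members
        while stack:
            x = stack.pop()
            for y in rev[x]:
                if y not in comp:
                    comp[y] = members
                    members.add(y)
                    stack.append(y)
        components.append(members)
    return components

def bottom_sccs(graph):
    """SCCs that cannot be left: every edge stays inside the component."""
    return [c for c in sccs(graph)
            if all(v in c for u in c for v in graph[u])]
```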
288
00:33:23,950 --> 00:33:35,469
So if this one is gambling (reminder: this means that the limsup goes to plus infinity and the liminf to minus infinity), then we can safely return
289
00:33:35,470 --> 00:33:44,730
yes, because gambling implies weight divergence. The question is how to check being gambling or not.
290
00:33:45,240 --> 00:33:49,530
This is an NP-hard problem in general. But here we are in a simpler case.
291
00:33:50,370 --> 00:33:54,510
We have a Markov chain. So this is indeed a Markov chain.
292
00:33:55,080 --> 00:34:07,290
And we know that the mean payoff is zero. But then being gambling is equivalent to the existence of cycles where the accumulated weight is positive,
293
00:34:07,710 --> 00:34:13,140
which again is equivalent to the statement that there are cycles where the accumulated weight is negative.
294
00:34:14,630 --> 00:34:20,780
And this is a simple graph condition, so it can also be checked by graph analysis.
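This positive-cycle check can be done with a single Bellman-Ford pass over negated weights, so that a positive cycle shows up as a negative one; a sketch, assuming the graph is given as a plain list of weighted edges:

```python
def has_positive_cycle(n, edges):
    """Check whether a directed graph with nodes 0..n-1 and weighted
    edges [(u, v, w), ...] contains a cycle of positive total weight.
    Bellman-Ford on the negated weights: a positive cycle here is a
    negative cycle there."""
    dist = [0] * n                     # all-zero start = virtual source
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] - w < dist[v]:  # relax with negated weight -w
                dist[v] = dist[u] - w
    # one more round: any further improvement witnesses a negative
    # (i.e. originally positive) cycle
    return any(dist[u] - w < dist[v] for u, v, w in edges)
```

By the equivalence above, running the same check on the negated edge list would detect negative cycles; in the zero-mean-payoff case both answers agree.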
295
00:34:23,040 --> 00:34:28,440
Now, what remains to consider is the case where this component is not gambling.
296
00:34:29,280 --> 00:34:42,070
But then the only thing that can still apply is that all the cycles built from the state-action pairs in the end component, or at least in this bottom SCC, have weight zero.
297
00:34:44,100 --> 00:34:47,850
All the cycles must have accumulated weight zero.
298
00:34:48,150 --> 00:34:55,180
The criterion is to check the existence of positive cycles, which is equivalent to the existence of negative cycles.
299
00:34:55,210 --> 00:35:02,160
And what if this does not apply? Then this means that the accumulated weight along all the cycles must be zero.
300
00:35:02,910 --> 00:35:11,340
And in that case, what we can do is flatten such end components.
301
00:35:11,880 --> 00:35:18,090
Now, in the case of non-positive or non-negative weights, I said, we simply collapse them to a single state.
302
00:35:19,040 --> 00:35:22,250
Here the situation is a bit more complicated.
303
00:35:23,420 --> 00:35:27,920
But if I look at the time, then I think I will skip this.
304
00:35:30,530 --> 00:35:33,890
If you would like to see it, I'm happy to show it after the talk.
305
00:35:35,380 --> 00:35:42,480
So.
306
00:35:43,320 --> 00:35:46,830
Well, in the end, with this technique to remove
307
00:35:47,870 --> 00:35:56,000
end components, we are indeed back to exactly the picture that I showed you for the case of
308
00:35:56,150 --> 00:35:58,250
non-positive or non-negative weights.
309
00:35:58,670 --> 00:36:08,960
There is a transformation, in polynomial time, to a new MDP that satisfies the constraints that are considered by Bertsekas and Tsitsiklis and
310
00:36:09,500 --> 00:36:16,850
is such that we can rely on their machinery and on all the techniques that have been developed for the discounted case.
311
00:36:18,450 --> 00:36:23,130
So let's look at the variants, and let's start.
312
00:36:27,240 --> 00:36:30,309
Yeah. So, a few questions.
313
00:36:30,310 --> 00:36:34,240
Why do we consider variants? The first one:
314
00:36:35,120 --> 00:36:43,850
I said to you that we consider the maximum over all the proper schedulers, those schedulers that reach the goal state with probability one.
315
00:36:45,530 --> 00:36:49,220
Is it really justified to ignore the improper ones?
316
00:36:49,850 --> 00:36:53,570
And what is the case if there are no proper schedulers at all?
317
00:36:53,810 --> 00:36:58,790
If all the schedulers miss the goal state with some positive probability?
318
00:36:59,630 --> 00:37:00,800
And another question.
319
00:37:02,530 --> 00:37:13,719
Just looking at the expectation might not be the best solution. If the optimal scheduler has a good expectation,
320
00:37:13,720 --> 00:37:17,140
okay, but the variance is very, very high,
321
00:37:17,530 --> 00:37:27,820
then there is a very high risk that concrete sample runs have an accumulated weight much less than the expectation.
322
00:37:28,240 --> 00:37:37,360
So in these cases it might be better to have a suboptimal, sub-maximal expectation but a better, smaller variance.
323
00:37:38,940 --> 00:37:42,990
Looking at the variance, I find it a bit surprising.
324
00:37:43,380 --> 00:37:51,060
This is surprisingly difficult. So a natural thing would be to have threshold values:
325
00:37:51,060 --> 00:37:54,300
I want to have at least this and that expectation.
326
00:37:54,900 --> 00:37:59,280
and I have a threshold, an upper bound, for the variance.
327
00:38:00,030 --> 00:38:03,540
Now, this threshold problem, at least to the best of my knowledge,
328
00:38:03,930 --> 00:38:12,390
is not known to be solvable. What is known is that, to achieve such thresholds or to satisfy such threshold conditions,
329
00:38:12,840 --> 00:38:18,090
one might need schedulers that use both memory and randomisation.
330
00:38:18,600 --> 00:38:22,680
But I'm not aware that there is any solution that is known.
331
00:38:24,230 --> 00:38:33,890
There is a partial answer, but for a much simpler setting: namely, there might be different schedulers that achieve the maximal value,
332
00:38:34,640 --> 00:38:44,690
and if you consider only those, and you seek a variance-minimal scheduler among those that achieve the maximum, then the situation is simpler.
333
00:38:45,710 --> 00:38:50,900
More or less, one can use Bellman-like equations for the variance.
334
00:38:51,230 --> 00:38:58,549
And what makes it so simple is that here the expectation is fixed:
335
00:38:58,550 --> 00:39:05,420
it is a fixed value that can be computed beforehand, and then it can be treated as a constant in this linear program.
336
00:39:05,960 --> 00:39:09,330
So all the schedulers under consideration here,
337
00:39:09,380 --> 00:39:13,070
they all have the same expectation, and this makes the problem simpler.
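As a toy illustration of such Bellman-like equations (the little chain below is hypothetical, not from the talk): for a fixed memoryless scheduler inducing an absorbing chain, the first two moments of the accumulated weight satisfy linear recursions, and the variance is E[X^2] - E[X]^2.

```python
from functools import lru_cache

# Hypothetical absorbing Markov chain: state -> list of
# (probability, weight, successor) triples; 'goal' is absorbing.
# The chain is acyclic, so plain recursion suffices; a cyclic chain
# would need a linear-equation solve instead.
CHAIN = {
    "s0": [(0.5, 4, "s1"), (0.5, 0, "s1")],
    "s1": [(0.5, 2, "goal"), (0.5, 0, "goal")],
    "goal": [],
}

@lru_cache(maxsize=None)
def first_two_moments(state):
    """Bellman-like equations for the accumulated weight X until absorption:
       E[X_s]   = sum p * (w + E[X_t])
       E[X_s^2] = sum p * (w^2 + 2*w*E[X_t] + E[X_t^2])."""
    if not CHAIN[state]:
        return 0.0, 0.0
    m1 = m2 = 0.0
    for p, w, t in CHAIN[state]:
        e1, e2 = first_two_moments(t)
        m1 += p * (w + e1)
        m2 += p * (w * w + 2 * w * e1 + e2)
    return m1, m2

def variance(state):
    m1, m2 = first_two_moments(state)
    return m2 - m1 * m1
```

Enumerating the four paths from s0 (weights 6, 4, 2, 0, each with probability 1/4) confirms the mean 3 and variance 5 the equations produce.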
338
00:39:13,100 --> 00:39:19,670
Another thing is to look at what people have done in risk theory.
339
00:39:20,420 --> 00:39:25,490
So there is a notion called variance penalised expectation.
340
00:39:26,000 --> 00:39:33,950
So what they do in this context (we still want to have a good, high expectation) is to
341
00:39:33,950 --> 00:39:39,520
take the supremum over all schedulers of the value that can be achieved for this goal function:
342
00:39:39,530 --> 00:39:50,240
here we take the expectation, the expected accumulated weight, minus some factor lambda, some positive value, times the variance.
343
00:39:50,960 --> 00:39:59,330
So there is some incentive not only to maximise the expectation but also to look for a small variance value.
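A minimal sketch of this objective, with made-up numbers: for lambda = 0.1, a safe scheduler beats a risky one that has the higher expectation.

```python
def variance_penalised_expectation(outcomes, lam):
    """Expectation minus lam times variance, for a finite outcome
    distribution given as (probability, accumulated_weight) pairs."""
    mean = sum(p * w for p, w in outcomes)
    var = sum(p * (w - mean) ** 2 for p, w in outcomes)
    return mean - lam * var

# Hypothetical schedulers: 'risky' has mean 5 and variance 25,
# 'safe' has mean 4 and variance 0.
risky = [(0.5, 0), (0.5, 10)]
safe = [(1.0, 4)]
```

With lam = 0, the objective degenerates to the plain expectation and the risky scheduler wins again, which is exactly the incentive trade-off described above.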
344
00:40:01,740 --> 00:40:04,800
For solution methods for these, I come to this in a second;
345
00:40:05,040 --> 00:40:13,090
there is a partial answer. Another measure, also taken from risk theory, is the so-called conditional value-at-risk,
346
00:40:13,300 --> 00:40:17,430
which has been investigated by Křetínský and Meggendorfer.
347
00:40:19,090 --> 00:40:27,470
So for this value-at-risk, one fixes some probability value, some percentage
348
00:40:27,670 --> 00:40:32,500
p, and you are interested in what happens in the worst cases.
349
00:40:32,620 --> 00:40:39,430
So, say, in the 10% worst cases. For this, you can define the so-called value-at-risk:
350
00:40:39,730 --> 00:40:43,120
where, so to speak, do the worst cases start?
351
00:40:43,230 --> 00:40:54,190
Let's say it that way. And then, in this conditional case, you look at the expected accumulated weight.
352
00:40:54,460 --> 00:40:59,050
In exactly these cases, these p worst cases.
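One common discrete formulation of this, sketched on a hypothetical finite distribution; atoms on the boundary are split so that exactly probability mass p is counted among the worst cases:

```python
def conditional_value_at_risk(outcomes, p):
    """Expected accumulated weight within the p worst cases, for a finite
    distribution given as (probability, weight) pairs.  'Worst' means
    lowest weight here."""
    worst_first = sorted(outcomes, key=lambda pair: pair[1])
    remaining, acc = p, 0.0
    for prob, w in worst_first:
        take = min(prob, remaining)  # split a boundary atom if needed
        acc += take * w
        remaining -= take
        if remaining <= 0:
            break
    return acc / p
```

For p = 0.1 on the test distribution below, the worst tenth consists entirely of the weight-0 runs, so the conditional value-at-risk is 0; widening to p = 0.2 pulls in half of the weight-10 mass needed to fill the quota.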
353
00:40:59,560 --> 00:41:03,550
But that's what they do. For solution methods,
354
00:41:03,570 --> 00:41:07,729
I will talk about them in a second.
355
00:41:07,730 --> 00:41:13,080
So this was already a conditional expectation. But you can also look at this in a purer form:
356
00:41:13,460 --> 00:41:20,150
The conditional expected accumulated weight under the assumption that we indeed reach the goal state.
357
00:41:20,870 --> 00:41:28,460
This makes it possible to relax the assumption that we only look at schedulers that reach the goal with probability one.
358
00:41:28,790 --> 00:41:32,690
Instead, it is sufficient to look for those that reach the goal state at all.
359
00:41:34,790 --> 00:41:45,200
And another variant, which is conceptually close to this one, is what has been called partial expectations.
360
00:41:45,650 --> 00:41:54,950
So for this random variable, we take the accumulated weight in case we did reach the goal state, and zero otherwise.
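A tiny sketch contrasting the two notions on hypothetical runs: the partial expectation sums weights over goal-reaching runs only, and dividing by the reachability probability gives the conditional expectation.

```python
def partial_and_conditional_expectation(runs):
    """runs: (probability, reaches_goal, accumulated_weight) triples over
    a finite set of run bundles.  The partial expectation counts the
    weight only on goal-reaching runs (zero otherwise); the conditional
    expectation renormalises by the probability of reaching the goal."""
    p_goal = sum(p for p, ok, _ in runs if ok)
    partial = sum(p * w for p, ok, w in runs if ok)
    return partial, partial / p_goal
```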
361
00:41:57,520 --> 00:42:02,290
Now these problems look quite different, but they have some similarities.
362
00:42:03,180 --> 00:42:15,630
And I will briefly zoom into conditional expected accumulated rewards and then give you an idea of the similarities to these other problems.
363
00:42:18,110 --> 00:42:24,320
So, the first thing: why should we be interested in these conditional expectations?
364
00:42:24,830 --> 00:42:30,800
Well, it's a natural measure. For instance, if you have probabilistic nondeterministic programs,
365
00:42:31,220 --> 00:42:38,870
then you might be interested in the conditional expected termination time under the assumption that the program indeed terminates.
366
00:42:39,830 --> 00:42:43,820
Or you might look for certain failure scenarios.
367
00:42:44,330 --> 00:42:52,340
And then: what are the expected costs of repair protocols in these particular failure scenarios, which then constitute your condition?
368
00:42:53,860 --> 00:43:03,010
And it has also been used for some kinds of multi-objective reasoning with cost and utility, to analyse cost-utility trade-offs.
369
00:43:05,220 --> 00:43:12,720
It is more difficult than the standard stochastic shortest path problem because, as you will see in a second,
370
00:43:13,230 --> 00:43:17,290
optimal memoryless schedulers might not exist anymore.
371
00:43:17,320 --> 00:43:28,780
We might need history. And also, using linear programming techniques in the naive form, with one variable per state, won't work anymore.
372
00:43:30,960 --> 00:43:35,100
So let's look at an example, namely this one here.
373
00:43:35,790 --> 00:43:40,409
So this is an MDP where there is just a single state in which we have a non-
374
00:43:40,410 --> 00:43:46,380
deterministic choice: we can go with alpha directly to the goal state
375
00:43:46,740 --> 00:43:53,850
with probability one, or we can take action beta, with the self-loop here, or go to a fail state.
376
00:43:55,850 --> 00:44:05,000
So let's first look at the two deterministic schedulers, memoryless deterministic schedulers, that always do the same thing when they are in this state.
377
00:44:06,140 --> 00:44:11,360
So the first option is to always take action alpha in state s2. Then
378
00:44:12,300 --> 00:44:22,660
we have just two paths going to goal. One path goes with probability one half from s0 via s1 to goal;
379
00:44:22,950 --> 00:44:26,670
there we collect the reward for this action, which is r.
380
00:44:27,670 --> 00:44:32,860
And the other one goes from s0 to s2 and then to goal, and it collects the reward zero.
381
00:44:32,870 --> 00:44:40,229
So the result is r/2. But the other scheduler, the memoryless
382
00:44:40,230 --> 00:44:50,520
deterministic scheduler which always chooses beta in state s2, achieves a better conditional expectation, namely r.
383
00:44:50,910 --> 00:44:54,250
And this is because there is just a single path going to goal:
384
00:44:54,710 --> 00:45:07,440
via s1, where we achieve the value r, the weight of the action gamma. And the other possibility does not contribute to this conditional expectation:
385
00:45:07,830 --> 00:45:15,410
We never reach the goal via beta; we simply cannot. So beta is better than alpha.
386
00:45:17,260 --> 00:45:28,890
But we can do better. Let's look at the following scheduler: in state s2, for the first n visits we take beta, and at the (n+1)-st visit,
387
00:45:29,110 --> 00:45:34,320
if it happens that we have taken the self-loop n times,
388
00:45:35,260 --> 00:45:42,450
then we take action alpha. So there is still this possibility of
389
00:45:42,660 --> 00:45:45,900
going directly from s0 via s1 to goal.
390
00:45:46,020 --> 00:45:56,260
This stands for this one half times r. Plus, now, one half times the probability of taking this self-loop n times:
391
00:45:57,160 --> 00:46:03,520
this is this value. And whenever we take beta, we earn this weight one.
392
00:46:03,760 --> 00:46:08,320
So all together we get the factor n. Now you can calculate:
393
00:46:08,560 --> 00:46:14,050
the result would be this, and this is better than r. Remember,
394
00:46:14,080 --> 00:46:21,160
this was the conditional expectation that we achieve with the memoryless deterministic scheduler which always chooses beta.
395
00:46:21,700 --> 00:46:32,560
So this value is larger than r if and only if this number n is larger than r, and one can compute that the maximum is achieved when n equals r plus two.
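This calculation can be reproduced numerically; the sketch below assumes the probabilities suggested by the factors quoted above (probability one half from s0 to each branch, self-loop probability one half with weight 1 per beta step, weight r on gamma):

```python
def conditional_expectation(r, n):
    """Conditional expected weight for the scheduler that takes beta for
    the first n visits of s2 and then alpha, in the example MDP of the
    talk.  Transition probabilities are assumptions matching the factors
    quoted in the talk, not stated explicitly there."""
    p_loop = 0.5 ** n          # probability of surviving n self-loops
    num = 0.5 * r + 0.5 * p_loop * n
    den = 0.5 + 0.5 * p_loop   # probability of reaching goal at all
    return num / den

def best_n(r, horizon=50):
    """Switch point with the largest conditional expectation."""
    return max(range(horizon), key=lambda n: conditional_expectation(r, n))
```

For r = 4 the best switch point comes out as n = 6 = r + 2, matching the claim above.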
396
00:46:33,970 --> 00:46:39,270
Now, what this example shows is: we need memory.
397
00:46:39,630 --> 00:46:45,600
So in this case, we need a counter for the number of times we have taken this self-loop.
398
00:46:48,040 --> 00:46:52,900
And here is what is meant by the point that local reasoning is not sufficient.
399
00:46:54,450 --> 00:46:58,020
We just saw that the optimal value is achieved for this scheduler:
400
00:46:59,070 --> 00:47:03,240
after r plus two visits to s2, we take action alpha.
401
00:47:04,710 --> 00:47:10,410
Now, this depends on r, which is the weight of this action gamma, which is sitting here.
402
00:47:10,800 --> 00:47:14,100
But this action gamma is not even reachable from s2.
403
00:47:14,430 --> 00:47:18,690
Just looking at the sub-MDP that is reachable from s2,
404
00:47:19,320 --> 00:47:23,220
we cannot say anything about the optimal behaviour in state s2.
405
00:47:23,580 --> 00:47:26,910
This is what I mean here by saying that local reasoning is not sufficient.
406
00:47:29,580 --> 00:47:35,520
And there is also another phenomenon which does not occur in the unconditional case.
407
00:47:36,720 --> 00:47:41,550
For the initial state, the maximal expectation is finite.
408
00:47:42,500 --> 00:47:46,780
But for state s2:
409
00:47:47,110 --> 00:47:49,540
If I consider this as the initial state,
410
00:47:50,200 --> 00:47:58,600
then indeed this maximal conditional expectation is infinite, because I have the possibility to pump up the weight arbitrarily
411
00:47:59,820 --> 00:48:02,520
and then go to the goal state.
412
00:48:04,930 --> 00:48:14,170
So now, in the interest of time, I will just mention the results and skip these details here.
413
00:48:14,800 --> 00:48:20,530
There is a method, though; this was on the previous slide.
414
00:48:20,530 --> 00:48:22,990
There are graph techniques to check finiteness.
415
00:48:24,440 --> 00:48:35,480
So, assuming that we have checked this and made sure that the conditional expectation is finite, and assuming that all weights are non-negative,
416
00:48:35,960 --> 00:48:39,320
either positive or zero, but no negative values.
417
00:48:40,310 --> 00:48:43,560
then what can be shown is that the threshold problem,
418
00:48:43,610 --> 00:48:48,890
to check whether this maximal conditional expectation is beyond a certain threshold,
419
00:48:49,400 --> 00:48:54,290
is a PSPACE-hard problem, and PSPACE-complete in the acyclic case.
420
00:48:56,120 --> 00:49:01,540
Now, for algorithms in the non-acyclic case, where we have cycles.
421
00:49:02,430 --> 00:49:07,590
The key observation is that there is what we called a saturation point.
422
00:49:07,830 --> 00:49:16,030
That is, there is some natural number such that, whenever we reach a configuration,
423
00:49:16,150 --> 00:49:20,610
a state together with the weight that has been accumulated so far,
424
00:49:21,090 --> 00:49:30,720
if this weight is larger than the saturation point, then from then on the best one can do is to maximise the probability of reaching a goal state.
425
00:49:31,470 --> 00:49:39,300
And the intuition is: the accumulated weight is already so high that we want to save it.
426
00:49:39,340 --> 00:49:45,340
We do not want to run the risk of failing to visit the goal state, in which case we lose everything.
427
00:49:45,790 --> 00:49:48,550
That's the intuition behind this.
428
00:49:49,800 --> 00:50:00,030
Well, from this, one can derive an exponential-time threshold algorithm and an exponential-time algorithm to compute the optimal value.
429
00:50:01,190 --> 00:50:08,300
Now, this was the saturation-point algorithm for the case of conditional expectations.
430
00:50:08,870 --> 00:50:21,020
And now remember that I mentioned other variants before. The same observation with these saturation points has been made for partial expectations;
431
00:50:21,440 --> 00:50:24,590
this is a result from my research group.
432
00:50:25,300 --> 00:50:29,270
The analogue has also been established for the conditional value-at-risk.
433
00:50:29,300 --> 00:50:34,100
This is a result by Křetínský and Meggendorfer.
434
00:50:35,420 --> 00:50:42,139
And it also applies to the setting of variance-penalised expectations. Remember that the goal was to
435
00:50:42,140 --> 00:50:50,480
maximise the objective: expectation minus lambda times variance, lambda being some positive constant.
436
00:50:52,050 --> 00:51:01,230
And here there is also a saturation point, but here comes a somewhat strange phenomenon. Beyond that
437
00:51:02,160 --> 00:51:08,310
saturation point, the best thing one can do is to minimise the probability of reaching the goal state.
438
00:51:09,870 --> 00:51:18,990
And the reason is, more or less, intuitively: this variance is a quadratic function,
439
00:51:19,590 --> 00:51:25,350
and from some point on, the variance becomes the dominant term in this objective.
440
00:51:25,920 --> 00:51:32,549
And so what one wants to do is to minimise that; from some point on,
441
00:51:32,550 --> 00:51:38,310
one wants to avoid that the value gets too high. That's the intuition behind that.
442
00:51:39,820 --> 00:51:45,850
This, at the same time, makes the variance-penalised expectation in this context a bit questionable.
443
00:51:47,530 --> 00:51:54,100
On the other hand, taking acyclic MDPs with integer weights,
444
00:51:54,610 --> 00:52:01,030
then it's not a problem, because in that case we only have finitely many paths.
445
00:52:01,510 --> 00:52:05,230
They can be explored using polynomial-space algorithms.
446
00:52:05,680 --> 00:52:15,520
And for many of these problems, I think even for all of them, PSPACE lower bounds have been established for this case.
447
00:52:17,760 --> 00:52:24,480
So these were restrictions: non-negative weights, acyclicity, and so on.
448
00:52:24,730 --> 00:52:32,100
Now, what about the general case? One indication that this is more difficult is that these values,
449
00:52:32,100 --> 00:52:38,240
the maximal expectations defined in one of these variants, can be irrational values.
450
00:52:38,250 --> 00:52:43,650
So one should not expect linear programming techniques as they are known for the unconditional case.
451
00:52:44,440 --> 00:52:50,350
And indeed, there is the result of Jakob Piribauer in his Ph.D. thesis.
452
00:52:51,370 --> 00:52:59,980
What he has shown is that quite a lot of these problems are Skolem-hard or even Positivity-hard.
453
00:53:00,400 --> 00:53:10,420
Now, this Skolem problem is a very prominent and very old problem in mathematics: to decide whether a linear recurrence sequence eventually
454
00:53:10,420 --> 00:53:20,950
hits zero, or whether all its values are positive; and the decidability status has been unknown for quite a long time.
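As a side note, iterating such a linear recurrence is trivial; the sketch below only inspects a finite prefix, which of course decides neither Skolem nor Positivity in general:

```python
def recurrence_prefix(coeffs, init, length):
    """First `length` terms of the linear recurrence
       a(k) = coeffs[0]*a(k-1) + ... + coeffs[d-1]*a(k-d),
    started from the initial values `init` (with d = len(coeffs)).
    Inspecting a finite prefix proves nothing about the full sequence,
    which is exactly what makes Skolem and Positivity hard."""
    seq = list(init)
    d = len(coeffs)
    while len(seq) < length:
        seq.append(sum(c * x for c, x in zip(coeffs, reversed(seq[-d:]))))
    return seq

# a(k) = a(k-1) - a(k-2) hits zero quickly; the Fibonacci recurrence
# a(k) = a(k-1) + a(k-2) stays positive on every prefix we look at.
```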
455
00:53:21,220 --> 00:53:27,580
Ben, do you know, I forgot, when this Skolem problem, or Positivity, was first considered?
456
00:53:29,130 --> 00:53:35,440
Or anyone?
457
00:53:37,850 --> 00:53:42,140
Ah, okay: in the seventies. So, quite an old problem.
458
00:53:43,070 --> 00:53:46,990
Um. So, just a side remark.
459
00:53:47,930 --> 00:53:53,810
Well, as I said, I was surprised to see these results, that these problems are so hard.
460
00:53:54,290 --> 00:54:00,530
And Jakob introduced, in my opinion, a very elegant proof technique.
461
00:54:00,830 --> 00:54:15,200
He used the same MDP gadget for encoding linear recurrence sequences to prove Skolem- or Positivity-hardness for a wide range of problems,
462
00:54:15,200 --> 00:54:22,279
also for other problems which have nothing to do with weights, like the model-checking problem for frequency-
463
00:54:22,280 --> 00:54:31,730
LTL in Markov decision processes, or threshold conditions for termination probabilities in one-counter MDPs.
464
00:54:32,120 --> 00:54:38,150
And these reductions only differ in the choice of the initial condition.
465
00:54:40,550 --> 00:54:51,350
So let's come to the conclusions. For the classical stochastic shortest path problem, the key are the Bellman equations, for which we have
466
00:54:52,390 --> 00:54:56,410
this full machinery that can be exploited algorithmically:
467
00:54:56,410 --> 00:55:03,610
linear programming techniques, value iteration, policy iteration, and so on.
468
00:55:04,740 --> 00:55:12,120
To get there, we first had these kinds of semantic side conditions of the Bertsekas and Tsitsiklis approach,
469
00:55:12,720 --> 00:55:20,850
but after some graph transformations, polynomial-time transformations, one can reduce to this case.
470
00:55:22,270 --> 00:55:29,680
Now, for these variants that I proposed here, or that have been considered in the literature,
471
00:55:29,860 --> 00:55:36,420
I found it surprising that they are so difficult in the case of integer weights: this Positivity-
472
00:55:36,450 --> 00:55:42,909
hardness. And quite a few of them enjoy the property that, if all the weights are non-negative,
473
00:55:42,910 --> 00:55:48,790
then one can exploit the monotonicity of the accumulated weights along the prefixes,
474
00:55:50,020 --> 00:55:57,100
which gives a saturation-point characterisation and then an exponential-time or exponential-space algorithm.
475
00:55:59,320 --> 00:56:10,030
And one, oops, one other remark. So, this was for Markov decision processes, which can also be considered as one-player stochastic games.
476
00:56:10,950 --> 00:56:22,080
Now, in the case where you have two players: say the objective of one player is to maximise this expectation and of the
477
00:56:22,470 --> 00:56:29,280
other player to minimise it. There are some results by Patek and Bertsekas from '99.
478
00:56:30,390 --> 00:56:35,550
But, at least to the best of my knowledge, this problem has not been completely solved.
479
00:56:37,590 --> 00:56:44,430
And there are also these partial expectations that I mentioned;
480
00:56:44,850 --> 00:56:51,780
these have also been considered in the game setting.
481
00:56:52,560 --> 00:57:00,660
But I'm not aware of algorithms for the conditional expectation or the other variants of the stochastic shortest path problem in the game setting.
482
00:57:01,560 --> 00:57:04,890
Now, I almost made it in time. Thank you very much.