1
00:00:02,160 --> 00:00:19,030
Okay. Welcome, everybody, to the Strachey Lecture.
2
00:00:19,040 --> 00:00:24,160
So this is our distinguished lecture series, which is named for Christopher Strachey,
3
00:00:24,180 --> 00:00:31,160
who founded the Programming Research Group here and was one of the leading figures in the early history of computing.
4
00:00:31,550 --> 00:00:37,190
Before I introduce our amazing speaker, I want to thank our sponsors, Oxford Asset Management,
5
00:00:37,190 --> 00:00:46,310
who have been generously supporting this series since 2015, and they have enabled us to bring great speakers here.
6
00:00:47,180 --> 00:00:56,140
I think some of you will want to meet the speaker afterwards, and that's okay.
7
00:00:56,150 --> 00:01:03,320
So it's a great pleasure to introduce Christel Baier, and Christel is super well known for her work in verification.
8
00:01:03,920 --> 00:01:09,979
And actually, Christel is also very modest, because the biography she gave me omitted all sorts of things.
9
00:01:09,980 --> 00:01:13,250
But luckily for me, I used Google and discovered things for myself.
10
00:01:13,640 --> 00:01:16,690
So I discovered, for example, that her textbook,
11
00:01:16,970 --> 00:01:24,230
Principles of Model Checking, has been cited more than 8,000 times and is essentially the book on model checking.
12
00:01:24,230 --> 00:01:32,960
And she also has some very, very influential papers, including a work called "Model-checking algorithms for continuous-time Markov chains",
13
00:01:33,140 --> 00:01:41,450
which again has very, very many citations and introduces both a new logic and new model-checking algorithms.
14
00:01:42,950 --> 00:01:49,399
Christel is the author of the verification column in ACM SIGLOG News and has a bunch of honours,
15
00:01:49,400 --> 00:02:00,170
including membership of Academia Europaea and two honorary doctorates, and she's really active in supporting German computer science.
16
00:02:00,170 --> 00:02:05,600
So I found Christel on just about every German computer science committee that exists.
17
00:02:06,000 --> 00:02:15,800
She's here today talking to us. The title of the talk is "From Classical to Non-Classical Stochastic Shortest Path Problems". Please welcome Christel.
18
00:02:16,610 --> 00:02:19,670
Like this? Yes, like that. Yeah.
19
00:02:20,360 --> 00:02:23,420
Thank you. Does the mic work? It does. It does. Okay.
20
00:02:23,900 --> 00:02:28,820
Yes. Thank you very much for inviting me. Thank you very much for organising this event.
21
00:02:28,830 --> 00:02:32,000
Thank you very much for sponsoring this event.
22
00:02:32,000 --> 00:02:35,060
It's a great honour for me to give this presentation.
23
00:02:35,750 --> 00:02:39,440
Well, I chose a topic which has a long history.
24
00:02:39,440 --> 00:02:45,049
So indeed, the first research was done 60 years ago, but in the end,
25
00:02:45,050 --> 00:02:51,140
the non-classical versions of the shortest path problem are more recent work.
26
00:02:52,430 --> 00:03:01,160
And before starting with the technical things, I would also like to say it is almost 30 years ago that I visited.
27
00:03:01,660 --> 00:03:09,940
Marta Kwiatkowska in Birmingham. We were just trying to remember which year it was, 95 or 96, and this was for me
28
00:03:09,940 --> 00:03:19,520
the time when I got in touch with questions concerning the verification of probabilistic programs or probabilistic systems.
29
00:03:19,940 --> 00:03:29,149
And at this time, I also read these old papers about the classical stochastic shortest path problem, and I would have never thought that
30
00:03:29,150 --> 00:03:32,480
30 years later I would give a talk about this topic.
31
00:03:32,600 --> 00:03:36,230
I would have expected that all the problems had been solved.
32
00:03:37,220 --> 00:03:44,780
Uh, yeah. So what is the question of the classical stochastic shortest path problem?
33
00:03:45,140 --> 00:03:52,140
So more or less the task is to construct a strategy or policy for traversing a graph,
34
00:03:52,140 --> 00:04:00,140
with probabilities attached, such that the expected accumulated cost until reaching a target is minimal,
35
00:04:00,140 --> 00:04:05,450
so the main problem is to minimise the expected costs or weights.
36
00:04:06,550 --> 00:04:10,770
A motivation. So this is a map.
37
00:04:12,090 --> 00:04:15,180
Here is Dresden, where I live; here is Oxford.
38
00:04:15,750 --> 00:04:19,710
And going from Dresden to Oxford, well, there are many possibilities.
39
00:04:20,100 --> 00:04:23,700
Taking a car, taking a train, taking a flight.
40
00:04:24,210 --> 00:04:29,220
But there are no direct flights; we have an extremely small airport in Dresden.
41
00:04:29,220 --> 00:04:32,970
Then I have to change in Frankfurt or Munich or in Berlin or whatever.
42
00:04:33,630 --> 00:04:39,390
And there are a lot of decisions that have to be made, and there's a lot of uncertainty.
43
00:04:39,660 --> 00:04:45,210
So the flight might be delayed or the flight might be cancelled, which is indeed the case with my flights.
44
00:04:45,220 --> 00:04:57,110
My flights have been cancelled. And so it is all about finding one way that is shortest, in the sense of shortest time for me, in expectation.
45
00:04:57,410 --> 00:05:00,620
So this is one instance of the shortest path problem.
46
00:05:01,250 --> 00:05:11,300
Now, indeed, in different communities (AI planning, robotics, operations research), this is a very classical and fundamental problem,
47
00:05:11,900 --> 00:05:19,550
but also for the verification of randomised systems or distributed randomised algorithms.
48
00:05:20,180 --> 00:05:30,310
Then, for instance, one might ask: what is the expected time to termination, the maximal or minimal expected time
49
00:05:30,680 --> 00:05:37,070
to termination, or the expected costs, the maximal cost that can happen in the worst case.
50
00:05:38,520 --> 00:05:47,290
Now, the formalisation of these problems uses Markov decision processes.
51
00:05:47,530 --> 00:05:51,460
Now, I do not want to bother you with complicated definitions,
52
00:05:51,610 --> 00:06:01,209
which are not that complicated in this case, and the simplest way to explain them, at least I find, is to do the following.
53
00:06:01,210 --> 00:06:09,540
We start with a classical transition system, shown here on the left, where in every state there is a choice between different actions.
54
00:06:10,180 --> 00:06:15,790
And in this context, we consider the choice of different actions as a nondeterministic choice,
55
00:06:16,090 --> 00:06:23,710
so different from finite automata, where we expect to have the same label for a nondeterministic choice.
56
00:06:24,040 --> 00:06:27,910
Here, the choice between alpha and beta is considered to be nondeterministic.
57
00:06:29,260 --> 00:06:33,940
The other extreme is a Markov chain, where we do not have nondeterminism at all.
58
00:06:34,270 --> 00:06:38,170
Instead, for every state we have a distribution over the successor states.
59
00:06:39,120 --> 00:06:42,330
A Markov decision process is somehow a mixture of both.
60
00:06:42,750 --> 00:06:49,740
So for every state, we have a set of actions, and for each we have a distribution over the successor states.
61
00:06:50,770 --> 00:06:54,340
I thought, Ah, this is missing because of this one.
62
00:06:54,960 --> 00:07:05,030
Maybe. Uh, yeah.
63
00:07:05,040 --> 00:07:08,849
So here, for alpha, we have the distribution one quarter, three quarters.
64
00:07:08,850 --> 00:07:13,720
And likewise we have a distribution for beta; and now, in this state,
65
00:07:13,920 --> 00:07:18,300
the choice between the different actions is considered to be a nondeterministic one.
66
00:07:18,720 --> 00:07:23,220
And having chosen the action in a state, then we have a probabilistic choice,
67
00:07:23,640 --> 00:07:28,680
which is understood to be an internal decision about which state is indeed the successor.
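To make this concrete, here is a tiny illustrative sketch (my own Python encoding, not part of the talk) of an MDP as a nested mapping from states to enabled actions to successor distributions, using the one-quarter/three-quarters distribution just mentioned; the state and action names are made up for illustration:

```python
# A toy MDP as a nested dict: state -> action -> {successor: probability}.
# In state "s" the choice between "alpha" and "beta" is nondeterministic;
# once an action is chosen, the successor is picked probabilistically.
mdp = {
    "s": {
        "alpha": {"t": 0.25, "u": 0.75},  # the 1/4, 3/4 distribution
        "beta":  {"t": 0.5,  "u": 0.5},
    },
    "t": {"tau": {"t": 1.0}},  # a trivial self-loop action
    "u": {"tau": {"u": 1.0}},
}

# Sanity check: every action's distribution sums to one.
for actions in mdp.values():
    for dist in actions.values():
        assert abs(sum(dist.values()) - 1.0) < 1e-9
print(len(mdp))  # prints 3 (number of states)
```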
68
00:07:29,910 --> 00:07:34,740
Now to talk about probabilities or expectations in Markov decision processes,
69
00:07:35,100 --> 00:07:42,150
we need some instance which resolves the nondeterminism, and this is what we call schedulers or policies or strategies.
70
00:07:42,540 --> 00:07:52,349
So they take the history, what has happened, so the sequence of states and actions that have been taken in the past, and then, for the current state,
71
00:07:52,350 --> 00:07:55,890
they choose one of the actions, possibly in a randomised way.
72
00:07:56,640 --> 00:08:06,420
And given such a scheduler, you can unfold the behaviour of the MDP into a tree-like Markov chain,
73
00:08:06,420 --> 00:08:11,880
and there we have the full machinery to reason about probabilities and expectations.
74
00:08:12,540 --> 00:08:15,749
If you are not familiar with this, just use your intuition.
75
00:08:15,750 --> 00:08:21,550
I think this will be enough for this talk. And what else do we need?
76
00:08:21,560 --> 00:08:29,110
For the shortest path problem, we deal with weights, integer weights that are attached to the state-action pairs.
77
00:08:30,940 --> 00:08:40,509
So now here is the definition. The only important thing is that throughout this talk, whenever I say MDP, I always mean a finite-state
78
00:08:40,510 --> 00:08:44,440
MDP: only finitely many states and finitely many actions.
79
00:08:46,730 --> 00:08:55,850
Now, a scheduler can be formalised as a partial function from finite paths to distributions over actions.
80
00:08:55,880 --> 00:09:03,890
This would be the randomised case, and the deterministic case is where these distributions are Dirac distributions,
81
00:09:04,040 --> 00:09:12,110
which means that a single action is chosen with probability one; a unique action is chosen deterministically.
82
00:09:13,490 --> 00:09:17,030
And another important notion is that of a memoryless scheduler.
83
00:09:17,480 --> 00:09:21,850
They do not look at the full history; they just look at the current state.
84
00:09:21,860 --> 00:09:26,540
And whenever I visit the same state multiple times, I make the same decision.
85
00:09:26,570 --> 00:09:32,570
This is what is meant by memoryless, and on the following slides I often use the short-form
86
00:09:32,570 --> 00:09:37,340
notations MD for memoryless deterministic and MR for memoryless randomised.
87
00:09:39,770 --> 00:09:50,780
And another important concept, introduced by Luca de Alfaro more than 20 years ago, is the notion of an end component,
88
00:09:51,590 --> 00:09:56,450
which, to say it in easy words, is a strongly connected sub-MDP.
89
00:09:56,810 --> 00:10:03,170
And by saying sub-MDP, what we mean is the following: for every state we have a set of actions that can be taken.
90
00:10:03,500 --> 00:10:09,530
And now we take just a subset of these actions, and this gives a sub-MDP.
91
00:10:09,530 --> 00:10:13,370
The probabilistic choices for these selected actions remain the same.
92
00:10:13,550 --> 00:10:18,620
They remain unchanged. So if you look at the underlying graph, then it is indeed a subgraph.
93
00:10:19,580 --> 00:10:24,860
And for this to be an end component, this graph needs to be strongly connected.
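As an illustrative sketch (my own Python with hypothetical names, not from the talk; the MDP is encoded as a nested dict state -> action -> {successor: probability}), a direct check of this definition (the selected actions never leave the state set, and the underlying graph is strongly connected) could look like this:

```python
def is_end_component(mdp, states, selected):
    """mdp: state -> action -> {succ: prob};
    states: a subset of states; selected: state -> set of chosen actions."""
    # 1. Closedness: every state needs at least one selected action, and
    #    every selected action leads back into `states` with probability one.
    for s in states:
        if not selected.get(s):
            return False
        for a in selected[s]:
            if any(t not in states for t in mdp[s][a]):
                return False
    # 2. Strong connectivity of the underlying subgraph:
    #    every state must reach every other state via selected actions.
    def reachable(src):
        seen, stack = {src}, [src]
        while stack:
            s = stack.pop()
            for a in selected[s]:
                for t in mdp[s][a]:
                    if t not in seen:
                        seen.add(t)
                        stack.append(t)
        return seen
    return all(reachable(s) == set(states) for s in states)

# Example: two states cycling between each other form an end component.
example = {"s": {"a": {"t": 1.0}}, "t": {"b": {"s": 0.5, "t": 0.5}}}
print(is_end_component(example, {"s", "t"}, {"s": {"a"}, "t": {"b"}}))  # prints True
```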
94
00:10:26,190 --> 00:10:36,210
And there is one theorem that has been established by de Alfaro in his thesis in 1997 and also in multiple papers,
95
00:10:36,900 --> 00:10:42,120
what I call the fundamental property, which is somehow the MDP analogue
96
00:10:42,120 --> 00:10:46,380
of the fact that in finite-state Markov chains,
97
00:10:46,830 --> 00:10:56,220
we know that almost surely, so with probability one, a bottom strongly connected component will be reached and the system will stay there forever.
98
00:10:56,760 --> 00:11:03,300
And the MDP analogue uses these end components and states that no matter which scheduler you choose,
99
00:11:04,050 --> 00:11:10,710
the limit of almost all infinite paths will constitute an end component. And here, by the limit,
100
00:11:10,890 --> 00:11:15,630
we mean that we look at the state-action pairs that occur infinitely often.
101
00:11:17,540 --> 00:11:24,050
Now, here's an example. We have an MDP, uh, and we have.
102
00:11:28,710 --> 00:11:33,210
It stopped working. Does it work here?
103
00:11:33,450 --> 00:11:37,020
Yeah. We have two end components.
104
00:11:37,500 --> 00:11:46,020
Does this still work? Yeah, we have two end components, and we have this state here.
105
00:11:46,800 --> 00:11:50,400
Oh, that's great. Uh, but.
106
00:11:50,400 --> 00:11:59,340
This one here is a trap state, a terminal state where no action can be taken.
107
00:11:59,970 --> 00:12:08,460
And the lemma, applied to such MDPs that might have trap states, means that under each scheduler,
108
00:12:08,970 --> 00:12:15,360
almost all maximal paths are either finite paths that end in a trap, or infinite ones.
109
00:12:16,260 --> 00:12:21,299
They indeed either end in a trap or they eventually realise an end component.
110
00:12:21,300 --> 00:12:30,300
And by realising an end component, I mean that the state-action pairs occurring infinitely often constitute an end component.
111
00:12:31,420 --> 00:12:35,220
And for the special case where we do not have,
112
00:12:35,830 --> 00:12:40,870
okay, where we do not have any end components at all,
113
00:12:41,050 --> 00:12:47,890
so an EC-free MDP, then the lemma simply states that with probability one,
114
00:12:48,220 --> 00:12:53,260
no matter which scheduler we choose, the system will end in a trap state.
115
00:12:55,400 --> 00:13:06,020
So now, the typical task of synthesis problems for MDPs, or formalised synthesis problems, would be: given such
116
00:13:06,020 --> 00:13:15,180
a finite-state MDP, find a scheduler that maximises or minimises the expectation of some random variable.
117
00:13:16,330 --> 00:13:20,890
And the inputs of this random variable are simply maximal paths.
118
00:13:21,860 --> 00:13:25,640
So one very popular measure is the discounted weight.
119
00:13:26,510 --> 00:13:35,420
Remember that we have these individual weights for the state-action pairs, and now for the discounted weight of a maximal path,
120
00:13:35,780 --> 00:13:39,530
we take the sum over all positions i in that path;
121
00:13:40,580 --> 00:13:47,300
we take the weight which is collected at position i with the corresponding state-action pair,
122
00:13:48,320 --> 00:13:56,210
and we multiply it with the i-th power of the discount factor gamma, which is a value strictly between zero and one.
123
00:13:57,440 --> 00:14:03,380
And with gamma being strictly between zero and one, we do not have convergence problems,
124
00:14:03,620 --> 00:14:15,850
even if the sum is infinite. The total weight is more or less the same, but you can think of it as discount factor one, without any discount factor.
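In symbols (my own notation, matching the verbal definitions above), for an infinite path pi = s_0 alpha_0 s_1 alpha_1 ... the two measures read:

```latex
\mathrm{disc}_{\gamma}(\pi) \;=\; \sum_{i \geq 0} \gamma^{i} \cdot \mathrm{wgt}(s_i, \alpha_i)
\quad \text{with } 0 < \gamma < 1,
\qquad
\mathrm{total}(\pi) \;=\; \sum_{i \geq 0} \mathrm{wgt}(s_i, \alpha_i).
```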
125
00:14:16,720 --> 00:14:21,640
Now here, indeed, convergence is an issue for infinite paths.
126
00:14:22,090 --> 00:14:27,760
And I would say the total weight is not that interesting for infinite paths
127
00:14:28,960 --> 00:14:31,000
without any further side constraint.
128
00:14:31,780 --> 00:14:40,239
What has been done in the literature is to consider only finite paths, either in a finite-horizon setting,
129
00:14:40,240 --> 00:14:45,069
which means that we have some fixed bound for the length of the paths that we consider.
130
00:14:45,070 --> 00:14:52,719
And then we look at the maximal expectation after exactly n steps; or the infinite-horizon case, where
131
00:14:52,720 --> 00:15:00,880
we have a distinguished goal state and we only accumulate the weight until we reach this goal state.
132
00:15:01,600 --> 00:15:08,740
And with the side assumption that we only look at those schedulers that indeed reach this goal state with probability one.
133
00:15:09,130 --> 00:15:18,660
These are the so-called proper schedulers. And there is another very prominent measure, a random variable, which is used in the literature.
134
00:15:19,080 --> 00:15:27,420
This is the mean payoff, or the long-run average. So what we do is we take the partial sums of these total weights.
135
00:15:27,720 --> 00:15:36,300
So we look at all the prefixes, the finite prefixes, of a given path, and we take the average weight of these finite prefixes.
136
00:15:36,630 --> 00:15:43,260
This is this value here. And then we either take the limit superior or the limit inferior
137
00:15:44,170 --> 00:15:50,590
of these values. Now, given a single path, the lim inf and the lim sup can be different values.
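Written out (my notation), the two variants of the mean payoff of a path pi = s_0 alpha_0 s_1 alpha_1 ... are:

```latex
\mathrm{MP}^{\sup}(\pi) \;=\; \limsup_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{wgt}(s_i, \alpha_i),
\qquad
\mathrm{MP}^{\inf}(\pi) \;=\; \liminf_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \mathrm{wgt}(s_i, \alpha_i).
```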
138
00:15:51,250 --> 00:15:59,950
However, what is known is that in finite Markov chains it makes no difference whether we deal with the lim sup or the lim inf.
139
00:16:00,130 --> 00:16:03,760
So in expectation, these are the same.
140
00:16:03,970 --> 00:16:11,200
And this result carries over to Markov decision processes, where the maximum does not change
141
00:16:11,200 --> 00:16:15,640
whether we take the lim inf or the lim sup, and similarly for the minimum over all schedulers.
142
00:16:16,940 --> 00:16:20,840
Well, and moreover, and I will not zoom into the details,
143
00:16:21,590 --> 00:16:31,370
there are efficient algorithms to compute a memoryless deterministic scheduler where the maximum or the minimum is achieved,
144
00:16:31,370 --> 00:16:35,270
and these can be computed using linear programming techniques.
145
00:16:37,540 --> 00:16:45,300
So now let's look, or let's go, more in the direction of the shortest path problem.
146
00:16:45,310 --> 00:16:49,030
And let's look first at the discounted case.
147
00:16:49,450 --> 00:16:53,380
And here the key concept is the so-called Bellman equation.
148
00:16:53,920 --> 00:16:57,549
This was somewhere in the fifties,
149
00:16:57,550 --> 00:17:03,790
so it's quite a long time ago that Bellman introduced these equations.
150
00:17:04,060 --> 00:17:12,610
Okay. So now we are in the discounted case. Okay. So the task is to find a scheduler that maximises the expected discounted weight.
151
00:17:13,700 --> 00:17:23,990
And here are the Bellman equations. So the idea is to characterise the global maximum by local maxima in every state.
152
00:17:24,860 --> 00:17:27,130
So, s is a state,
153
00:17:27,170 --> 00:17:38,600
and x_s is a variable for the maximal expected discounted weight that can be achieved from state s, so if we treat s as the initial state.
154
00:17:40,070 --> 00:17:48,320
And then this value x_s equals the maximum over all the actions that can be taken in s.
155
00:17:49,420 --> 00:18:02,050
And for each such action alpha, we take the weight of the state-action pair, plus the weighted sum of these values for the successors.
156
00:18:03,090 --> 00:18:12,840
And the weights are given by the discount factor gamma multiplied with the transition probability of going from s, when taking alpha, to state t.
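In symbols, the Bellman equation just described reads (my notation; Act(s) is the set of actions enabled in s, and P the transition probability function):

```latex
x_s \;=\; \max_{\alpha \in \mathit{Act}(s)} \Bigl( \mathrm{wgt}(s, \alpha) \;+\; \gamma \cdot \sum_{t} P(s, \alpha, t) \cdot x_t \Bigr).
```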
157
00:18:15,150 --> 00:18:20,030
This is a recursive equation, so x_s equals something,
158
00:18:20,030 --> 00:18:23,770
and on the right-hand side we have the values for the successors.
159
00:18:24,200 --> 00:18:33,860
So you can think of it as a function that maps vectors, with one component for each state, to vectors.
160
00:18:34,460 --> 00:18:40,270
And what has been shown is that this function, thanks to this discount factor,
161
00:18:40,280 --> 00:18:45,350
gamma, is a contracting map on a Banach space, so it has a unique fixed point.
162
00:18:46,700 --> 00:18:52,730
And this indeed also is the key property to establish algorithms.
163
00:18:53,090 --> 00:18:57,900
So one of them is derived from linear programming techniques.
164
00:18:58,370 --> 00:19:05,030
Namely, instead of saying x_s equals the maximum, we say x_s is greater or equal
165
00:19:05,480 --> 00:19:10,700
than all these values, for each action that can be taken in state s.
166
00:19:11,850 --> 00:19:17,880
And to ensure that x_s equals the maximum, we minimise these values.
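Written as a linear program (my notation), this is:

```latex
\text{minimise } \sum_{s} x_s
\quad \text{subject to} \quad
x_s \;\geq\; \mathrm{wgt}(s, \alpha) \;+\; \gamma \cdot \sum_{t} P(s, \alpha, t) \cdot x_t
\quad \text{for all } s \text{ and } \alpha \in \mathit{Act}(s).
```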
167
00:19:19,820 --> 00:19:27,460
So this is one thing. And the other thing, which can also be derived from this, is the value iteration.
168
00:19:27,470 --> 00:19:35,780
So this function: if we start with an arbitrary value, or zero, in every component,
169
00:19:36,800 --> 00:19:43,490
and then we simply apply this function F successively, then convergence is guaranteed.
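A minimal value-iteration sketch in Python (my own code and toy instance, not from the talk; the MDP is encoded as a nested dict state -> action -> {successor: probability}):

```python
def value_iteration(mdp, wgt, gamma, iters=1000):
    """Discounted-weight value iteration: repeatedly apply the Bellman
    operator F, starting from the zero vector.
    mdp: state -> action -> {succ: prob}; wgt: (state, action) -> weight."""
    x = {s: 0.0 for s in mdp}  # start with zero in every component
    for _ in range(iters):
        # One application of F: local maximum over the enabled actions.
        x = {s: max(wgt[s, a] + gamma * sum(p * x[t] for t, p in dist.items())
                    for a, dist in acts.items())
             for s, acts in mdp.items()}
    return x

# Toy instance: from "s", alpha pays 1 and loops, beta pays 2 but moves
# to a zero-weight sink "t".
mdp = {"s": {"alpha": {"s": 1.0}, "beta": {"t": 1.0}},
       "t": {"tau": {"t": 1.0}}}
wgt = {("s", "alpha"): 1.0, ("s", "beta"): 2.0, ("t", "tau"): 0.0}
vals = value_iteration(mdp, wgt, gamma=0.5)
print(round(vals["s"], 6))  # prints 2.0: looping on alpha yields 1/(1-0.5)
```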
170
00:19:44,300 --> 00:19:52,220
And there is also a technique I will not speak about, which is called policy iteration: one successively improves a memory
171
00:19:52,220 --> 00:19:59,090
less deterministic scheduler, and eventually this stabilises at an optimal solution.
172
00:20:01,110 --> 00:20:06,150
So now this was the case, this one here for discounting.
173
00:20:06,570 --> 00:20:08,910
And remember what I said for the total weight:
174
00:20:09,030 --> 00:20:19,140
this is essentially the same, but without the discount factor. And indeed the Bellman equations themselves, for this infinite-horizon case,
175
00:20:19,680 --> 00:20:29,550
still look like this, unless s is already one of these distinguished goal states, in which case x_s is simply zero;
176
00:20:30,180 --> 00:20:37,860
but otherwise we have the same equation, and here the discount factor gamma is one, or can be dropped.
177
00:20:38,960 --> 00:20:44,870
But now things are more difficult. Before, I said we have a contracting map,
178
00:20:45,850 --> 00:20:49,080
but this is not clear here, not
179
00:20:49,090 --> 00:20:53,070
in all the cases; we do not have the discount factor any more.
180
00:20:53,470 --> 00:21:01,710
So this needs additional things to make sure that we have a unique fixed point and so on.
181
00:21:02,820 --> 00:21:09,540
And here we talk about the stochastic shortest path problem,
182
00:21:09,540 --> 00:21:13,260
and all this research is about these extra things that we need.
183
00:21:15,090 --> 00:21:18,629
So, I skipped the outline at the beginning,
184
00:21:18,630 --> 00:21:26,940
but what I just did was to give you some basics about the model we talk about and these important measures,
185
00:21:27,360 --> 00:21:30,839
and now we talk about the stochastic shortest path problem,
186
00:21:30,840 --> 00:21:35,100
first the classical one, and then later on variants.
187
00:21:36,580 --> 00:21:49,330
So this notion of the stochastic shortest path problem and the main results go back to Bertsekas and Tsitsiklis. From the year, this is very small here,
188
00:21:49,330 --> 00:21:56,890
it is 1991, in a paper published in Mathematics of Operations Research,
189
00:21:57,610 --> 00:22:04,599
and in this paper they have a very nice section on the related work or summary of
190
00:22:04,600 --> 00:22:09,580
the work that had been done at this time, a summary of the state of the art.
191
00:22:10,570 --> 00:22:15,010
So here is this paper from 91, and it goes back to 62
192
00:22:15,280 --> 00:22:19,570
when this problem was first investigated,
193
00:22:19,900 --> 00:22:26,830
and they also nicely describe that they even apologise for introducing a new name for this, because
194
00:22:27,460 --> 00:22:32,380
it was used under different names, three other different names, before.
195
00:22:33,340 --> 00:22:42,100
But they also explain that this is in the style of the classical shortest path problems which are known from graph theory.
196
00:22:43,840 --> 00:22:48,120
So now, let's look at this picture here.
197
00:22:49,050 --> 00:22:52,740
We have a weighted finite-state Markov decision process.
198
00:22:53,220 --> 00:23:02,820
And we have a certain goal state, which is the state that we want to reach, and we accumulate the weights until we reach this goal state.
199
00:23:03,600 --> 00:23:08,940
And then we want to reason about the minimal or maximal expectation of this random variable.
200
00:23:10,300 --> 00:23:17,590
Now, for the rest of the talk, I will consider the longest path problem, where we are after the maximum.
201
00:23:17,980 --> 00:23:21,340
The techniques for the minimum are completely analogous.
202
00:23:23,500 --> 00:23:32,580
So the first result, and this goes back to the late sixties, was that in the special case where, under
203
00:23:32,620 --> 00:23:37,960
every scheduler, the minimal probability to reach the goal state is one,
204
00:23:39,190 --> 00:23:46,000
in this case, what has been proven is that the Bellman equations still work,
205
00:23:46,980 --> 00:23:53,820
and we get a contracting map. MDPs of this form are called contracting MDPs,
206
00:23:53,820 --> 00:23:58,530
and these are exactly the EC-free ones that do not have any end components.
207
00:24:00,080 --> 00:24:06,080
And it was shown that there is a reduction to the discounted case in this case.
208
00:24:07,150 --> 00:24:11,740
So all this machinery for the discounted weight works also for this case.
209
00:24:12,930 --> 00:24:21,930
Now, the contribution of Bertsekas and Tsitsiklis was this: requiring that under every scheduler we reach the goal state
210
00:24:21,930 --> 00:24:33,060
is a very strong requirement. Now let's relax it by considering only those schedulers, what they called the proper schedulers,
211
00:24:33,480 --> 00:24:36,710
namely those that reach the goal with probability one.
212
00:24:36,750 --> 00:24:40,270
But there might be others.
213
00:24:41,360 --> 00:24:45,290
They showed that under certain conditions
214
00:24:45,290 --> 00:24:52,550
we can still rely on these Bellman equations and the techniques that have been developed for the discounted case,
215
00:24:52,940 --> 00:25:03,550
but they needed a second condition here, which is that the expected total weight under each improper scheduler,
216
00:25:03,560 --> 00:25:09,680
so each scheduler that does not reach the goal state with probability one,
217
00:25:11,620 --> 00:25:21,010
must be minus infinity. And they showed that if these two conditions hold, then the maximum is indeed finite,
218
00:25:21,340 --> 00:25:26,740
And all this machinery that has been developed for the discounted case still works.
219
00:25:27,220 --> 00:25:32,200
So now the question is:
220
00:25:33,870 --> 00:25:38,939
how to check this condition, the second condition. The first one is simple:
221
00:25:38,940 --> 00:25:42,870
there are algorithms to do it in polynomial time.
222
00:25:44,670 --> 00:25:49,790
But how to check this condition, the second one? And the next thing is:
223
00:25:50,370 --> 00:25:55,550
So if these two conditions hold, then the maximal expectation is finite.
224
00:25:55,560 --> 00:25:59,040
But there can be other cases where the maximal expectation is finite.
225
00:25:59,040 --> 00:26:00,840
So it's not a complete criterion.
226
00:26:02,350 --> 00:26:12,940
So what to do in this case? We want to find out how to check this and how to compute, in these cases, an optimal scheduler and the optimal value.
227
00:26:14,080 --> 00:26:21,310
And the first answer to these questions was given by Luca de Alfaro in his CONCUR'99 paper,
228
00:26:21,850 --> 00:26:31,390
where he considered Markov decision processes where the weights all have the same polarity: they are all non-positive or they are all non-negative.
229
00:26:32,470 --> 00:26:38,770
And what he did was to provide a transformation to the case of Bertsekas and
230
00:26:38,770 --> 00:26:44,410
Tsitsiklis, so that these two conditions hold for the transformed MDP.
231
00:26:45,040 --> 00:26:52,450
And the idea, this is now for the case where all weights are non-positive, so either negative or zero:
232
00:26:53,170 --> 00:27:00,730
you observe the following. Whenever you have an end component where every state-action pair has weight zero,
233
00:27:01,540 --> 00:27:06,940
then you can collapse it to a single state; then it is no longer an end component.
234
00:27:08,350 --> 00:27:16,960
In that way, you can remove all these end components, what I call zero end components.
235
00:27:17,290 --> 00:27:24,360
And what remains is an MDP where each end component contains at least one state-action pair
236
00:27:24,370 --> 00:27:31,420
with negative weight, and the other weights are non-positive anyway, which gives you that the expectation,
237
00:27:31,990 --> 00:27:37,630
the expected value of the total weight, is indeed minus infinity for each improper scheduler.
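As a simplified illustrative sketch (my own Python with hypothetical names, not from the talk), the collapsing step can be pictured like this: a given state set is merged into one fresh state, the internal zero-weight state-action pairs of the zero end component are dropped, and all other transitions are redirected:

```python
def collapse(mdp, states_to_merge, merged, internal):
    """Collapse `states_to_merge` into a single state `merged`.
    `internal` is the set of (state, action) pairs of the zero end
    component being removed; these disappear, all other actions of the
    merged states are kept, and successors are redirected to `merged`.
    mdp: state -> action -> {successor: probability}."""
    def redirect(t):
        return merged if t in states_to_merge else t
    new_mdp = {}
    for s, acts in mdp.items():
        target = redirect(s)
        new_acts = new_mdp.setdefault(target, {})
        for a, dist in acts.items():
            if s in states_to_merge and (s, a) in internal:
                continue  # drop the zero-weight internal actions
            new_dist = {}
            for t, p in dist.items():
                new_dist[redirect(t)] = new_dist.get(redirect(t), 0.0) + p
            # tag actions of merged states with their origin to avoid clashes
            new_acts[(s, a) if s in states_to_merge else a] = new_dist
    return new_mdp

# Example: "a" and "b" form a zero end component via their "stay" actions;
# only the leaving action ("a", "out") survives the collapse.
mdp0 = {"a": {"stay": {"b": 1.0}, "out": {"g": 1.0}},
        "b": {"stay": {"a": 1.0}},
        "g": {}}
new = collapse(mdp0, {"a", "b"}, "ab", {("a", "stay"), ("b", "stay")})
print("ab" in new and "a" not in new)  # prints True
```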
238
00:27:39,820 --> 00:27:47,230
And in the case where all the weights are non-negative, de Alfaro proved the following:
239
00:27:47,830 --> 00:27:50,920
the maximal expectation is finite
240
00:27:51,310 --> 00:27:55,720
if and only if the weights in all end components are zero.
241
00:27:56,590 --> 00:28:01,600
So no end component contains a state-action pair with positive weight.
242
00:28:02,350 --> 00:28:12,970
And in that case one can do the same thing: one collapses all the zero end components, and the result is an MDP that has no improper scheduler at all.
243
00:28:14,230 --> 00:28:17,230
Because, if all the end components are zero end components,
244
00:28:17,530 --> 00:28:25,240
then after collapsing the zero end components, there are no end components anymore.
245
00:28:26,960 --> 00:28:34,280
And these concepts can also be generalised to the case of arbitrary integer weights.
246
00:28:34,670 --> 00:28:38,060
So we can have a mixture of positive and negative weights.
247
00:28:38,600 --> 00:28:49,790
And let's quickly have a closer look at this criterion for finiteness and at this transformation.
248
00:28:52,290 --> 00:28:57,359
There is some cleaning up to make sure that from every state we can reach the goal
249
00:28:57,360 --> 00:29:03,880
state with probability one, and that the other states are removed, and so on.
250
00:29:03,900 --> 00:29:11,460
I will skip this. Then, after such a cleaning up, the resulting characterisation of finiteness looks like this.
251
00:29:12,270 --> 00:29:23,100
The maximal expectation is finite if and only if the MDP has no weight-divergent end component. An end component is said to be
252
00:29:23,180 --> 00:29:30,570
weight-divergent if it has a scheduler such that, for almost all the paths under this scheduler,
253
00:29:31,170 --> 00:29:34,350
the lim sup of the accumulated weights of the prefixes
254
00:29:36,040 --> 00:29:43,920
tends to infinity. Now, this observation yields an algorithm for checking finiteness:
255
00:29:44,370 --> 00:29:53,850
compute the maximal end components, which can be done with an iterative strongly connected component computation in polynomial time,
256
00:29:55,080 --> 00:29:59,160
and then one can check weight divergence for each of them separately.
257
00:29:59,250 --> 00:30:02,550
To give you an idea for that, let's have a closer look.
258
00:30:02,730 --> 00:30:08,060
So, this was again the definition of weight-divergent end components:
259
00:30:09,360 --> 00:30:15,240
there exists a scheduler such that almost surely the lim sup is plus infinity.
260
00:30:16,110 --> 00:30:19,680
One special case is what has been called pumping end components.
261
00:30:20,730 --> 00:30:26,580
The same thing holds, but now in a stronger form: namely, even the lim inf goes to plus infinity.
262
00:30:27,360 --> 00:30:35,310
This, for instance, applies to this end component here, which is just a Markov chain, which is completely uniform.
263
00:30:35,580 --> 00:30:41,160
So in each state, with uniform probability, we will stay in that state or go to the other state.
264
00:30:41,790 --> 00:30:46,710
And for one state we have the weight plus two, and for the other one, minus one.
265
00:30:47,590 --> 00:30:51,669
Then if you look at the accumulated weight, almost surely
266
00:30:51,670 --> 00:30:57,790
it goes up and up and up, higher and higher and higher, such that indeed the limit will be plus infinity.
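To make this concrete, here is a minimal simulation sketch of such a pumping end component, assuming (as the numbers above suggest) a two-state chain that stays or switches with probability one half, carrying the weights +2 and -1:

```python
import random

def simulate_accumulated_weight(steps, seed=0):
    """Simulate the two-state uniform Markov chain sketched in the talk:
    from either state we stay or switch with probability 1/2 each; state 0
    carries weight +2 and state 1 weight -1 (numbers as quoted above).
    Returns the accumulated weight after the given number of steps."""
    rng = random.Random(seed)
    state = 0
    weight = {0: 2, 1: -1}
    total = 0
    for _ in range(steps):
        total += weight[state]
        if rng.random() < 0.5:  # uniform choice: stay or move
            state = 1 - state
    return total
```

With expected weight (2 - 1)/2 = 1/2 per step, the total drifts upward linearly, so even the liminf of the accumulated weight is plus infinity almost surely.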
267
00:30:59,560 --> 00:31:04,780
Another extreme situation is what has been called a gambling end component.
268
00:31:06,910 --> 00:31:12,490
The liminf goes to minus infinity and the limsup goes to plus infinity.
269
00:31:12,640 --> 00:31:16,660
So we still have this divergence condition here on the right hand side.
270
00:31:17,320 --> 00:31:22,420
So this would be the case where the accumulated weight oscillates:
271
00:31:22,540 --> 00:31:28,510
it goes up and down and up and down, and so on.
272
00:31:30,070 --> 00:31:35,230
So now here comes an algorithm to check weight divergence,
273
00:31:36,300 --> 00:31:44,400
assuming that we have a strongly connected Markov decision process. This assumption is justified because, as I said before,
274
00:31:44,610 --> 00:31:52,590
to check the finiteness of the maximal expectation it is sufficient to analyse the maximal end components separately.
275
00:31:52,600 --> 00:32:04,010
So we are in the strongly connected case. We first compute the maximal mean payoff and a corresponding memoryless deterministic scheduler.
276
00:32:06,130 --> 00:32:11,060
If this mean payoff is positive, then we can safely return
277
00:32:11,060 --> 00:32:19,040
yes. Because, just remember, the mean payoff was defined as the average of the weights that we collect in the first n steps,
278
00:32:19,040 --> 00:32:27,320
where n goes to infinity. So if this is the case, if even this mean payoff is positive, then we are in a pumping case.
279
00:32:29,730 --> 00:32:37,410
If the mean payoff of this scheduler, the maximal mean payoff, is negative, then we can safely return
280
00:32:37,530 --> 00:32:46,440
no: the MDP cannot be weight-divergent, because the maximal expected mean payoff in all end components must be negative.
281
00:32:48,420 --> 00:32:54,840
So when we arrive here, neither of the two cases applies:
282
00:32:55,350 --> 00:33:01,410
the expected mean payoff, the maximal mean payoff, is zero.
283
00:33:02,500 --> 00:33:08,800
Now this scheduler, since it is a memoryless deterministic scheduler, induces a Markov chain.
284
00:33:10,030 --> 00:33:13,770
Now this Markov chain, you can consider it as a graph.
285
00:33:13,780 --> 00:33:17,319
It has certain strongly connected components.
286
00:33:17,320 --> 00:33:20,650
And now look for those that are at the bottom that cannot be left.
287
00:33:21,160 --> 00:33:22,060
Pick one of them.
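Finding these bottom strongly connected components is standard graph work; a small self-contained sketch (Kosaraju's algorithm on a dictionary-based graph representation, which is mine, not from the talk):

```python
def sccs(graph):
    """Strongly connected components of a directed graph given as
    {node: set_of_successors}, via Kosaraju's algorithm."""
    order, seen = [], set()
    def dfs(u):
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                dfs(v)
        order.append(u)        # record nodes in finish order
    for u in graph:
        if u not in seen:
            dfs(u)
    rev = {u: set() for u in graph}
    for u in graph:
        for v in graph[u]:
            rev[v].add(u)      # transpose graph
    comp, components = {}, []
    for u in reversed(order):  # second pass on the transpose
        if u in comp:
            continue
        members, stack = {u}, [u]
        comp[u] = members
        while stack:
            x = stack.pop()
            for y in rev[x]:
                if y not in comp:
                    comp[y] = members
                    members.add(y)
                    stack.append(y)
        components.append(members)
    return components

def bottom_sccs(graph):
    """SCCs that cannot be left: every edge stays inside the component."""
    return [c for c in sccs(graph)
            if all(v in c for u in c for v in graph[u])]
```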
288
00:33:23,950 --> 00:33:35,469
So if this one is gambling (reminder: this means that the limsup goes to plus infinity and the liminf to minus infinity), then we can safely return
289
00:33:35,470 --> 00:33:44,730
yes, because gambling implies weight divergence. The question is how to check being gambling or not.
290
00:33:45,240 --> 00:33:49,530
This is an NP-hard problem in general. But here we are in a simpler case.
291
00:33:50,370 --> 00:33:54,510
We have a Markov chain. So this is indeed a Markov chain.
292
00:33:55,080 --> 00:34:07,290
And we know that the mean payoff is zero. But then being gambling is equivalent to the existence of cycles where the accumulated weight is positive,
293
00:34:07,710 --> 00:34:13,140
which again is equivalent to the statement that there are cycles where the accumulated weight is negative.
294
00:34:14,630 --> 00:34:20,780
And this is a simple graph condition, so it can also be checked by graph analysis.
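This positive-cycle check can be done with a single Bellman-Ford pass over negated weights, so that a positive cycle shows up as a negative one; a sketch, assuming the graph is given as a plain list of weighted edges:

```python
def has_positive_cycle(n, edges):
    """Check whether a directed graph with nodes 0..n-1 and weighted
    edges [(u, v, w), ...] contains a cycle of positive total weight.
    Bellman-Ford on the negated weights: a positive cycle here is a
    negative cycle there."""
    dist = [0] * n                     # all-zero start = virtual source
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] - w < dist[v]:  # relax with negated weight -w
                dist[v] = dist[u] - w
    # one more round: any further improvement witnesses a negative
    # (i.e. originally positive) cycle
    return any(dist[u] - w < dist[v] for u, v, w in edges)
```

By the equivalence above, running the same check on the negated edge list would detect negative cycles; in the zero-mean-payoff case both answers agree.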
295
00:34:23,040 --> 00:34:28,440
Now, what remains to consider is the case where this component is not gambling.
296
00:34:29,280 --> 00:34:42,070
But then the only thing that can still apply is that all the cycles built from the state-action pairs in the end component, or at least in this bottom SCC, have weight zero.
297
00:34:44,100 --> 00:34:47,850
All the cycles must have accumulated weight zero.
298
00:34:48,150 --> 00:34:55,180
The criterion is to check the existence of positive cycles, which is equivalent to the existence of negative cycles.
299
00:34:55,210 --> 00:35:02,160
And what if this does not apply? Then this means that the accumulated weight along all the cycles must be zero.
300
00:35:02,910 --> 00:35:11,340
And in that case, what we can do is flatten such end components.
301
00:35:11,880 --> 00:35:18,090
Now, in the case of non-positive or non-negative weights, I said, we simply collapse them to a single state.
302
00:35:19,040 --> 00:35:22,250
Here the situation is a bit more complicated.
303
00:35:23,420 --> 00:35:27,920
But if I look at the time, then I think I will skip this.
304
00:35:30,530 --> 00:35:33,890
If you would like to see it, I'm happy to show it after the talk.
305
00:35:35,380 --> 00:35:42,480
So.
306
00:35:43,320 --> 00:35:46,830
Well, in the end, with this technique to remove
307
00:35:47,870 --> 00:35:56,000
end components, we are indeed back to exactly the picture that I showed you for the case of
308
00:35:56,150 --> 00:35:58,250
non-positive or non-negative weights.
309
00:35:58,670 --> 00:36:08,960
There is a transformation, in polynomial time, to a new MDP that satisfies the constraints that are considered by Bertsekas and Tsitsiklis and
310
00:36:09,500 --> 00:36:16,850
is such that we can rely on their machinery and on all the techniques that have been developed for the discounted case.
311
00:36:18,450 --> 00:36:23,130
So let's look at the variants, and let's start.
312
00:36:27,240 --> 00:36:30,309
Yeah. So, a few questions.
313
00:36:30,310 --> 00:36:34,240
Why do we consider variants? The first one:
314
00:36:35,120 --> 00:36:43,850
I said to you that we consider the maximum over all the proper schedulers, those schedulers that reach the goal state with probability one.
315
00:36:45,530 --> 00:36:49,220
Is it really justified to ignore the improper ones?
316
00:36:49,850 --> 00:36:53,570
And what is the case if there are no proper schedulers at all?
317
00:36:53,810 --> 00:36:58,790
If all the schedulers miss the goal state with some positive probability?
318
00:36:59,630 --> 00:37:00,800
And another question.
319
00:37:02,530 --> 00:37:13,719
Just looking at the expectation might not be the best solution. If the optimal scheduler has a good expectation,
320
00:37:13,720 --> 00:37:17,140
okay, but the variance is very, very high,
321
00:37:17,530 --> 00:37:27,820
then there is a very high risk that concrete sample runs have an accumulated weight much less than the expectation.
322
00:37:28,240 --> 00:37:37,360
So in these cases it might be better to have a suboptimal, sub-maximal expectation but a better, smaller variance.
323
00:37:38,940 --> 00:37:42,990
Looking at the variance, I find it a bit surprising.
324
00:37:43,380 --> 00:37:51,060
This is surprisingly difficult. So a natural thing would be to have threshold values:
325
00:37:51,060 --> 00:37:54,300
I want to have at least this and that expectation.
326
00:37:54,900 --> 00:37:59,280
and I have a threshold, an upper bound, for the variance.
327
00:38:00,030 --> 00:38:03,540
Now, this threshold problem, at least to the best of my knowledge,
328
00:38:03,930 --> 00:38:12,390
is not known to be solvable. What is known is that, to achieve such thresholds or to satisfy such threshold conditions,
329
00:38:12,840 --> 00:38:18,090
one might need schedulers that use both memory and randomisation.
330
00:38:18,600 --> 00:38:22,680
But I'm not aware that there is any solution that is known.
331
00:38:24,230 --> 00:38:33,890
There is a partial answer, but for a much simpler setting: namely, there might be different schedulers that achieve the maximal value,
332
00:38:34,640 --> 00:38:44,690
and if you consider only those, and you seek a variance-minimal scheduler among those that achieve the maximum, then the situation is simpler.
333
00:38:45,710 --> 00:38:50,900
More or less, one can use Bellman-like equations for the variance.
334
00:38:51,230 --> 00:38:58,549
And what makes it so simple is that here the expectation is fixed:
335
00:38:58,550 --> 00:39:05,420
it is a fixed value that can be computed beforehand, and then it can be treated as a constant in this linear program.
336
00:39:05,960 --> 00:39:09,330
So all the schedulers under consideration here,
337
00:39:09,380 --> 00:39:13,070
they all have the same expectation, and this makes the problem simpler.
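As a toy illustration of such Bellman-like equations (the little chain below is hypothetical, not from the talk): for a fixed memoryless scheduler inducing an absorbing chain, the first two moments of the accumulated weight satisfy linear recursions, and the variance is E[X^2] - E[X]^2.

```python
from functools import lru_cache

# Hypothetical absorbing Markov chain: state -> list of
# (probability, weight, successor) triples; 'goal' is absorbing.
# The chain is acyclic, so plain recursion suffices; a cyclic chain
# would need a linear-equation solve instead.
CHAIN = {
    "s0": [(0.5, 4, "s1"), (0.5, 0, "s1")],
    "s1": [(0.5, 2, "goal"), (0.5, 0, "goal")],
    "goal": [],
}

@lru_cache(maxsize=None)
def first_two_moments(state):
    """Bellman-like equations for the accumulated weight X until absorption:
       E[X_s]   = sum p * (w + E[X_t])
       E[X_s^2] = sum p * (w^2 + 2*w*E[X_t] + E[X_t^2])."""
    if not CHAIN[state]:
        return 0.0, 0.0
    m1 = m2 = 0.0
    for p, w, t in CHAIN[state]:
        e1, e2 = first_two_moments(t)
        m1 += p * (w + e1)
        m2 += p * (w * w + 2 * w * e1 + e2)
    return m1, m2

def variance(state):
    m1, m2 = first_two_moments(state)
    return m2 - m1 * m1
```

Enumerating the four paths from s0 (weights 6, 4, 2, 0, each with probability 1/4) confirms the mean 3 and variance 5 the equations produce.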
338
00:39:13,100 --> 00:39:19,670
Another thing is to look at what people have done in risk theory.
339
00:39:20,420 --> 00:39:25,490
So there is a notion called variance penalised expectation.
340
00:39:26,000 --> 00:39:33,950
So what they do in this context (we still want to have a good, high expectation) is to
341
00:39:33,950 --> 00:39:39,520
take the supremum over all schedulers of the value that can be achieved for this goal function:
342
00:39:39,530 --> 00:39:50,240
here we take the expectation, the expected accumulated weight, minus some factor lambda, some positive value, times the variance.
343
00:39:50,960 --> 00:39:59,330
So there is some incentive not only to maximise the expectation but also to look for a small variance value.
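A minimal sketch of this objective, with made-up numbers: for lambda = 0.1, a safe scheduler beats a risky one that has the higher expectation.

```python
def variance_penalised_expectation(outcomes, lam):
    """Expectation minus lam times variance, for a finite outcome
    distribution given as (probability, accumulated_weight) pairs."""
    mean = sum(p * w for p, w in outcomes)
    var = sum(p * (w - mean) ** 2 for p, w in outcomes)
    return mean - lam * var

# Hypothetical schedulers: 'risky' has mean 5 and variance 25,
# 'safe' has mean 4 and variance 0.
risky = [(0.5, 0), (0.5, 10)]
safe = [(1.0, 4)]
```

With lam = 0, the objective degenerates to the plain expectation and the risky scheduler wins again, which is exactly the incentive trade-off described above.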
344
00:40:01,740 --> 00:40:04,800
For solution methods for these, I come to this in a second;
345
00:40:05,040 --> 00:40:13,090
there is a partial answer. Another measure, also taken from risk theory, is the so-called conditional value-at-risk,
346
00:40:13,300 --> 00:40:17,430
which has been investigated by Křetínský and Meggendorfer.
347
00:40:19,090 --> 00:40:27,470
So for this value-at-risk, one fixes some probability value, some percentage
348
00:40:27,670 --> 00:40:32,500
p, and you are interested in what happens in the worst cases.
349
00:40:32,620 --> 00:40:39,430
So, say, in the 10% worst cases. For this, you can define the so-called value-at-risk:
350
00:40:39,730 --> 00:40:43,120
where, so to speak, do the worst cases start?
351
00:40:43,230 --> 00:40:54,190
Let's say it that way. And then, in this conditional case, you look at the expected accumulated weight.
352
00:40:54,460 --> 00:40:59,050
In exactly these cases, these p worst cases.
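One common discrete formulation of this, sketched on a hypothetical finite distribution; atoms on the boundary are split so that exactly probability mass p is counted among the worst cases:

```python
def conditional_value_at_risk(outcomes, p):
    """Expected accumulated weight within the p worst cases, for a finite
    distribution given as (probability, weight) pairs.  'Worst' means
    lowest weight here."""
    worst_first = sorted(outcomes, key=lambda pair: pair[1])
    remaining, acc = p, 0.0
    for prob, w in worst_first:
        take = min(prob, remaining)  # split a boundary atom if needed
        acc += take * w
        remaining -= take
        if remaining <= 0:
            break
    return acc / p
```

For p = 0.1 on the test distribution below, the worst tenth consists entirely of the weight-0 runs, so the conditional value-at-risk is 0; widening to p = 0.2 pulls in half of the weight-10 mass needed to fill the quota.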
353
00:40:59,560 --> 00:41:03,550
But that's what they do. For solution methods,
354
00:41:03,570 --> 00:41:07,729
I will talk about them in a second.
355
00:41:07,730 --> 00:41:13,080
So this was already a conditional expectation. But you can also look at this in a purer form:
356
00:41:13,460 --> 00:41:20,150
The conditional expected accumulated weight under the assumption that we indeed reach the goal state.
357
00:41:20,870 --> 00:41:28,460
This makes it possible to relax the assumption that we only look at schedulers that reach the goal with probability one.
358
00:41:28,790 --> 00:41:32,690
Instead, it is sufficient to look for those that reach the goal state at all.
359
00:41:34,790 --> 00:41:45,200
And another variant, which is conceptually close to this one, is what has been called partial expectations.
360
00:41:45,650 --> 00:41:54,950
So for this random variable, we take the accumulated weight in case we did reach the goal state, and zero otherwise.
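A tiny sketch contrasting the two notions on hypothetical runs: the partial expectation sums weights over goal-reaching runs only, and dividing by the reachability probability gives the conditional expectation.

```python
def partial_and_conditional_expectation(runs):
    """runs: (probability, reaches_goal, accumulated_weight) triples over
    a finite set of run bundles.  The partial expectation counts the
    weight only on goal-reaching runs (zero otherwise); the conditional
    expectation renormalises by the probability of reaching the goal."""
    p_goal = sum(p for p, ok, _ in runs if ok)
    partial = sum(p * w for p, ok, w in runs if ok)
    return partial, partial / p_goal
```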
361
00:41:57,520 --> 00:42:02,290
Now these problems look quite different, but they have some similarities.
362
00:42:03,180 --> 00:42:15,630
And I will briefly zoom into conditional expected accumulated rewards and then give you an idea of the similarities to these other problems.
363
00:42:18,110 --> 00:42:24,320
So, the first thing: why should we be interested in these conditional expectations?
364
00:42:24,830 --> 00:42:30,800
Well, it's a natural measure. For instance, if you have probabilistic nondeterministic programs,
365
00:42:31,220 --> 00:42:38,870
then you might be interested in the conditional expected termination time under the assumption that the program indeed terminates.
366
00:42:39,830 --> 00:42:43,820
Or you might look for certain failure scenarios.
367
00:42:44,330 --> 00:42:52,340
And then: what are the expected costs of repair protocols in these particular failure scenarios, which then constitute your condition?
368
00:42:53,860 --> 00:43:03,010
And it has also been used for some kinds of multi-objective reasoning with cost and utility, to analyse cost-utility trade-offs.
369
00:43:05,220 --> 00:43:12,720
It is more difficult than the standard stochastic shortest path problem because, as you will see in a second,
370
00:43:13,230 --> 00:43:17,290
optimal memoryless schedulers might not exist anymore.
371
00:43:17,320 --> 00:43:28,780
We might need history. And also, using linear programming techniques in the naive form, with one variable per state, won't work anymore.
372
00:43:30,960 --> 00:43:35,100
So let's look at an example, namely this one here.
373
00:43:35,790 --> 00:43:40,409
So this is an MDP where there is just a single state in which we have a non-
374
00:43:40,410 --> 00:43:46,380
deterministic choice: we can go with alpha directly to the goal state
375
00:43:46,740 --> 00:43:53,850
with probability one, or we can take action beta, with the self-loop here, or go to a fail state.
376
00:43:55,850 --> 00:44:05,000
So let's first look at the two deterministic schedulers, memoryless deterministic schedulers, that always do the same thing when they are in this state.
377
00:44:06,140 --> 00:44:11,360
So the first option is to always take action alpha in state s2. Then
378
00:44:12,300 --> 00:44:22,660
we have just two paths going to goal. One path goes with probability one half from s0 via s1 to goal;
379
00:44:22,950 --> 00:44:26,670
there we collect the reward for this action, which is r.
380
00:44:27,670 --> 00:44:32,860
And the other one goes from s0 to s2 and then to goal, and it collects the reward zero.
381
00:44:32,870 --> 00:44:40,229
So the result is r/2. But the other scheduler, the memoryless
382
00:44:40,230 --> 00:44:50,520
deterministic scheduler which always chooses beta in state s2, achieves a better conditional expectation, namely r.
383
00:44:50,910 --> 00:44:54,250
And this is because there is just a single path going to goal:
384
00:44:54,710 --> 00:45:07,440
via s1, where we achieve the value r, the weight of the action gamma. And the other possibility does not contribute to this conditional expectation:
385
00:45:07,830 --> 00:45:15,410
We never reach the goal via beta; we simply cannot. So beta is better than alpha.
386
00:45:17,260 --> 00:45:28,890
But we can do better. Let's look at the following scheduler: in state s2, for the first n visits we take beta, and at the (n+1)-st visit,
387
00:45:29,110 --> 00:45:34,320
if it happens that we have taken the self-loop n times,
388
00:45:35,260 --> 00:45:42,450
then we take action alpha. So there is still this possibility of
389
00:45:42,660 --> 00:45:45,900
going directly from s0 via s1 to goal.
390
00:45:46,020 --> 00:45:56,260
This stands for this one half times r. Plus, now, one half times the probability of taking this self-loop n times:
391
00:45:57,160 --> 00:46:03,520
this is this value. And whenever we take beta, we earn this weight one.
392
00:46:03,760 --> 00:46:08,320
So all together we get the factor n. Now you can calculate:
393
00:46:08,560 --> 00:46:14,050
the result would be this, and this is better than r. Remember,
394
00:46:14,080 --> 00:46:21,160
this was the conditional expectation that we achieve with the memoryless deterministic scheduler which always chooses beta.
395
00:46:21,700 --> 00:46:32,560
So this value is larger than r if and only if this number n is larger than r, and one can compute that the maximum is achieved when n equals r plus two.
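This calculation can be reproduced numerically; the sketch below assumes the probabilities suggested by the factors quoted above (probability one half from s0 to each branch, self-loop probability one half with weight 1 per beta step, weight r on gamma):

```python
def conditional_expectation(r, n):
    """Conditional expected weight for the scheduler that takes beta for
    the first n visits of s2 and then alpha, in the example MDP of the
    talk.  Transition probabilities are assumptions matching the factors
    quoted in the talk, not stated explicitly there."""
    p_loop = 0.5 ** n          # probability of surviving n self-loops
    num = 0.5 * r + 0.5 * p_loop * n
    den = 0.5 + 0.5 * p_loop   # probability of reaching goal at all
    return num / den

def best_n(r, horizon=50):
    """Switch point with the largest conditional expectation."""
    return max(range(horizon), key=lambda n: conditional_expectation(r, n))
```

For r = 4 the best switch point comes out as n = 6 = r + 2, matching the claim above.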
396
00:46:33,970 --> 00:46:39,270
Now, what this example shows is: we need memory.
397
00:46:39,630 --> 00:46:45,600
So in this case, we need a counter for the number of times we have taken this self-loop.
398
00:46:48,040 --> 00:46:52,900
And here is what is meant by the point that local reasoning is not sufficient.
399
00:46:54,450 --> 00:46:58,020
We just saw that the optimal value is achieved for this scheduler:
400
00:46:59,070 --> 00:47:03,240
after r plus two visits to s2, we take action alpha.
401
00:47:04,710 --> 00:47:10,410
Now, this depends on r, which is the weight of this action gamma, which is sitting here.
402
00:47:10,800 --> 00:47:14,100
But this action gamma is not even reachable from s2.
403
00:47:14,430 --> 00:47:18,690
Just looking at the sub-MDP that is reachable from s2,
404
00:47:19,320 --> 00:47:23,220
we cannot say anything about the optimal behaviour in state s2.
405
00:47:23,580 --> 00:47:26,910
This is what I mean here by saying that local reasoning is not sufficient.
406
00:47:29,580 --> 00:47:35,520
And there is also another phenomenon which does not occur in the unconditional case.
407
00:47:36,720 --> 00:47:41,550
For the initial state, the maximal expectation is finite.
408
00:47:42,500 --> 00:47:46,780
But for state s2:
409
00:47:47,110 --> 00:47:49,540
If I consider this as the initial state,
410
00:47:50,200 --> 00:47:58,600
then indeed this maximal conditional expectation is infinite, because I have the possibility to pump up the weight arbitrarily
411
00:47:59,820 --> 00:48:02,520
and then go to the goal state.
412
00:48:04,930 --> 00:48:14,170
So now, in the interest of time, I will just mention the results and skip these details here.
413
00:48:14,800 --> 00:48:20,530
There is a method, though; this was on the previous slide.
414
00:48:20,530 --> 00:48:22,990
There are graph techniques to check finiteness.
415
00:48:24,440 --> 00:48:35,480
So, assuming that we have checked this and made sure that the conditional expectation is finite, and assuming that all weights are non-negative,
416
00:48:35,960 --> 00:48:39,320
either positive or zero, but no negative values.
417
00:48:40,310 --> 00:48:43,560
then what can be shown is that the threshold problem,
418
00:48:43,610 --> 00:48:48,890
to check whether this maximal conditional expectation is beyond a certain threshold,
419
00:48:49,400 --> 00:48:54,290
is a PSPACE-hard problem, and PSPACE-complete in the acyclic case.
420
00:48:56,120 --> 00:49:01,540
Now, for algorithms in the non-acyclic case, where we have cycles.
421
00:49:02,430 --> 00:49:07,590
The key observation is that there is what we called a saturation point.
422
00:49:07,830 --> 00:49:16,030
That is, there is some natural number such that, whenever we reach a configuration,
423
00:49:16,150 --> 00:49:20,610
a state together with the weight that has been accumulated so far,
424
00:49:21,090 --> 00:49:30,720
if this weight is larger than the saturation point, then from then on the best one can do is to maximise the probability of reaching a goal state.
425
00:49:31,470 --> 00:49:39,300
And the intuition is: the accumulated weight is already so high that we want to save it.
426
00:49:39,340 --> 00:49:45,340
We do not want to run the risk of failing to visit the goal state, in which case we lose everything.
427
00:49:45,790 --> 00:49:48,550
That's the intuition behind this.
428
00:49:49,800 --> 00:50:00,030
Well, from this, one can derive an exponential-time threshold algorithm and an exponential-time algorithm to compute the optimal value.
429
00:50:01,190 --> 00:50:08,300
Now, this was the saturation-point algorithm for the case of conditional expectations.
430
00:50:08,870 --> 00:50:21,020
And now remember that I mentioned other variants before. The same observation with these saturation points has been made for partial expectations;
431
00:50:21,440 --> 00:50:24,590
this is a result from my research group.
432
00:50:25,300 --> 00:50:29,270
The analogue has also been established for the conditional value-at-risk.
433
00:50:29,300 --> 00:50:34,100
This is a result by Křetínský and Meggendorfer.
434
00:50:35,420 --> 00:50:42,139
And it also applies to the setting of variance-penalised expectations. Remember that the goal was to
435
00:50:42,140 --> 00:50:50,480
maximise the objective: expectation minus lambda times variance, lambda being some positive constant.
436
00:50:52,050 --> 00:51:01,230
And here there is also a saturation point, but here comes a somewhat strange phenomenon. Beyond that
437
00:51:02,160 --> 00:51:08,310
saturation point, the best thing one can do is to minimise the probability of reaching the goal state.
438
00:51:09,870 --> 00:51:18,990
And the reason is, more or less, intuitively: this variance is a quadratic function,
439
00:51:19,590 --> 00:51:25,350
and from some point on, the variance becomes the dominant term in this objective.
440
00:51:25,920 --> 00:51:32,549
And so what one wants to do is to minimise that; from some point on,
441
00:51:32,550 --> 00:51:38,310
one wants to avoid that the value gets too high. That's the intuition behind that.
442
00:51:39,820 --> 00:51:45,850
This, at the same time, makes the variance-penalised expectation in this context a bit questionable.
443
00:51:47,530 --> 00:51:54,100
On the other hand, taking acyclic MDPs with integer weights,
444
00:51:54,610 --> 00:52:01,030
then it's not a problem, because in that case we only have finitely many paths.
445
00:52:01,510 --> 00:52:05,230
They can be explored using polynomial-space algorithms.
446
00:52:05,680 --> 00:52:15,520
And for many of these problems, I think even for all of them, PSPACE lower bounds have been established for this case.
447
00:52:17,760 --> 00:52:24,480
So these were restrictions: non-negative weights, acyclicity, and so on.
448
00:52:24,730 --> 00:52:32,100
Now, what about the general case? One indication that this is more difficult is that these values,
449
00:52:32,100 --> 00:52:38,240
the maximal expectations defined in one of these variants, can be irrational values.
450
00:52:38,250 --> 00:52:43,650
So one should not expect linear programming techniques as they are known for the unconditional case.
451
00:52:44,440 --> 00:52:50,350
And indeed, there is the result of Jakob Piribauer in his Ph.D. thesis.
452
00:52:51,370 --> 00:52:59,980
What he has shown is that quite a lot of these problems are Skolem-hard or even Positivity-hard.
453
00:53:00,400 --> 00:53:10,420
Now, this Skolem problem is a very prominent and very old problem in mathematics: to decide whether a linear recurrence sequence eventually
454
00:53:10,420 --> 00:53:20,950
hits zero, or whether all its values are positive; and the decidability status has been unknown for quite a long time.
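As a side note, iterating such a linear recurrence is trivial; the sketch below only inspects a finite prefix, which of course decides neither Skolem nor Positivity in general:

```python
def recurrence_prefix(coeffs, init, length):
    """First `length` terms of the linear recurrence
       a(k) = coeffs[0]*a(k-1) + ... + coeffs[d-1]*a(k-d),
    started from the initial values `init` (with d = len(coeffs)).
    Inspecting a finite prefix proves nothing about the full sequence,
    which is exactly what makes Skolem and Positivity hard."""
    seq = list(init)
    d = len(coeffs)
    while len(seq) < length:
        seq.append(sum(c * x for c, x in zip(coeffs, reversed(seq[-d:]))))
    return seq

# a(k) = a(k-1) - a(k-2) hits zero quickly; the Fibonacci recurrence
# a(k) = a(k-1) + a(k-2) stays positive on every prefix we look at.
```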
455
00:53:21,220 --> 00:53:27,580
Ben, do you know, I forgot, when this Skolem problem, or Positivity, was first considered?
456
00:53:29,130 --> 00:53:35,440
Or anyone?
457
00:53:37,850 --> 00:53:42,140
Ah, okay: in the seventies. So, quite an old problem.
458
00:53:43,070 --> 00:53:46,990
Um. So, just a side remark.
459
00:53:47,930 --> 00:53:53,810
Well, as I said, I was surprised to see these results, that these problems are so hard.
460
00:53:54,290 --> 00:54:00,530
And Jakob introduced, in my opinion, a very elegant proof technique.
461
00:54:00,830 --> 00:54:15,200
He used the same MDP gadget for encoding linear recurrence sequences to prove Skolem- or Positivity-hardness for a wide range of problems,
462
00:54:15,200 --> 00:54:22,279
also for other problems which have nothing to do with weights, like the model-checking problem for frequency-
463
00:54:22,280 --> 00:54:31,730
LTL in Markov decision processes, or threshold conditions for termination probabilities in one-counter MDPs.
464
00:54:32,120 --> 00:54:38,150
And these reductions only differ in the choice of the initial condition.
465
00:54:40,550 --> 00:54:51,350
So let's come to the conclusions. For the classical stochastic shortest path problem, the key are the Bellman equations, for which we have
466
00:54:52,390 --> 00:54:56,410
this full machinery that can be exploited algorithmically:
467
00:54:56,410 --> 00:55:03,610
linear programming techniques, value iteration, policy iteration, and so on.
468
00:55:04,740 --> 00:55:12,120
To get there, we first had these kinds of semantic side conditions of the Bertsekas and Tsitsiklis approach,
469
00:55:12,720 --> 00:55:20,850
but after some graph transformations, polynomial-time transformations, one can reduce to this case.
470
00:55:22,270 --> 00:55:29,680
Now, for these variants that I proposed here, or that have been considered in the literature,
471
00:55:29,860 --> 00:55:36,420
I found it surprising that they are so difficult in the case of integer weights: this Positivity-
472
00:55:36,450 --> 00:55:42,909
hardness. And quite a few of them enjoy the property that, if all the weights are non-negative,
473
00:55:42,910 --> 00:55:48,790
then one can exploit the monotonicity of the accumulated weights along the prefixes,
474
00:55:50,020 --> 00:55:57,100
which gives a saturation-point characterisation and then an exponential-time or exponential-space algorithm.
475
00:55:59,320 --> 00:56:10,030
And one, oops, one other remark. So, this was for Markov decision processes, which can also be considered as one-player stochastic games.
476
00:56:10,950 --> 00:56:22,080
Now, in the case where you have two players: say the objective of one player is to maximise this expectation and of the
477
00:56:22,470 --> 00:56:29,280
other player to minimise it. There are some results by Patek and Bertsekas from '99.
478
00:56:30,390 --> 00:56:35,550
But, at least to the best of my knowledge, this problem has not been completely solved.
479
00:56:37,590 --> 00:56:44,430
And there are also these partial expectations that I mentioned;
480
00:56:44,850 --> 00:56:51,780
these have also been considered in the game setting.
481
00:56:52,560 --> 00:57:00,660
But I'm not aware of algorithms for the conditional expectation or the other variants of the stochastic shortest path problem in the game setting.
482
00:57:01,560 --> 00:57:04,890
Now, I almost made it in time. Thank you very much.