Similar Articles
20 similar articles found (search time: 15 ms)
1.
2.
3.
Decision makers often face the need to guarantee performance with some sufficiently high probability. Such problems can be modelled using a discrete time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of an ε-optimal policy for finite state space. We give an example concerning the reliability of a satellite system.
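The finite-horizon algorithm mentioned above can be sketched by backward induction on an augmented state (current state, remaining target). The following is a minimal sketch on a hypothetical two-state, two-action MDP with integer rewards; discounting is dropped (factor 1) purely so the running target stays integer-valued — the paper's general discounted formulation is not reproduced here.

```python
from itertools import product

# Hypothetical 2-state, 2-action MDP. P[a][s] is the transition row over
# successor states under action a; R[a][s] is the immediate (integer) reward.
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.5, 0.5], [0.6, 0.4]]}
R = {0: [1, 0], 1: [2, 1]}

def prob_reach_target(s0, target, horizon):
    """Backward induction on the augmented state (state, remaining target):
    v[(s, g)] = max probability that the reward collected over the remaining
    stages is at least g."""
    goals = range(target + 1)
    # terminal values: success iff nothing is left to collect
    v = {(s, g): 1.0 if g == 0 else 0.0 for s in (0, 1) for g in goals}
    for _ in range(horizon):
        v = {(s, g): max(sum(P[a][s][s2] * v[(s2, max(0, g - R[a][s]))]
                             for s2 in (0, 1))
                         for a in (0, 1))
             for (s, g) in product((0, 1), goals)}
    return v[(s0, target)]
```

The recursion stops a trajectory (value 1) as soon as the remaining target hits zero, which is what makes this a stochastic stopping model.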

4.
We study Markov jump decision processes with both continuously and instantaneously acting decisions and with deterministic drift between jumps. Such decision processes were recently introduced and studied, from the discrete time approximations point of view, by Van der Duyn Schouten. We obtain necessary and sufficient optimality conditions for these decision processes in terms of equations and inequalities of quasi-variational type. By means of the latter we find simple necessary and sufficient conditions for the existence of stationary optimal policies in such processes with finite state and action spaces, both in the discounted and average per unit time reward cases.

5.
We consider a discrete time Markov Decision Process (MDP) under the discounted payoff criterion in the presence of additional discounted cost constraints. We study the sensitivity of optimal Stationary Randomized (SR) policies in this setting with respect to the upper bound on the discounted cost constraint functionals. We show that such sensitivity analysis leads to an improved version of the Feinberg–Shwartz algorithm (Math Oper Res 21(4):922–945, 1996) for finding optimal policies that are ultimately stationary and deterministic.

6.
We consider limiting average Markov decision processes (MDP) with finite state and action spaces. We propose some algorithms to determine optimal strategies for deterministic and general MDPs. These algorithms are based on graph theory and the construction of levels in some aggregated MDP.
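For the deterministic case, a limiting average MDP is just a weighted digraph (each action follows one outgoing edge), and the optimal long-run average reward equals the maximum mean weight over cycles, computable by Karp's algorithm. A minimal sketch on a small hypothetical graph follows; the paper's specific level/aggregation construction is not reproduced here.

```python
NEG = float("-inf")

def max_mean_cycle(n, edges):
    """Karp's algorithm: maximum mean weight over all cycles reachable from
    node 0. `edges` is a list of (u, v, weight) triples on nodes 0..n-1."""
    # d[k][v] = max weight of a walk with exactly k edges from node 0 to v
    d = [[NEG] * n for _ in range(n + 1)]
    d[0][0] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] > NEG:
                d[k][v] = max(d[k][v], d[k - 1][u] + w)
    best = NEG
    for v in range(n):
        if d[n][v] > NEG:
            # Karp's characterization (max version): min over shorter walks
            best = max(best, min((d[n][v] - d[k][v]) / (n - k)
                                 for k in range(n) if d[k][v] > NEG))
    return best
```

An optimal stationary strategy then cycles on a maximum-mean cycle, which can be recovered from the arg-max/arg-min indices above.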

7.
8.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered some decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs. Then, we construct some hierarchical decomposition algorithms for both discounted and weighted MDPs.

9.
10.
We study a nonzero-sum stochastic differential game where the state is a controlled reflecting diffusion in the nonnegative orthant. Under certain conditions, we establish the existence of Nash equilibria in stationary strategies for both discounted and average payoff criteria.

11.
Yoon, Seunghwan; Lewis, Mark E. Queueing Systems (2004) 47(3): 177–199
We consider congestion control in a nonstationary queueing system. Assuming that the arrival and service rates are bounded, periodic functions of time, a Markov decision process (MDP) formulation is developed. We show that, under the infinite horizon discounted and average reward optimality criteria, for each fixed time, optimal pricing and admission control strategies are nondecreasing in the number of customers in the system. This extends stationary results to the nonstationary setting. Despite this result, the problem still seems intractable. We propose an easily implementable pointwise stationary approximation (PSA) to approximate the optimal policies, suggest a heuristic to improve the implementation of the PSA, and verify its usefulness via a numerical study.
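A pointwise stationary approximation of this kind can be sketched as follows: at each time t, freeze the rates at λ(t), μ(t) and solve the resulting stationary discounted admission-control MDP by value iteration. The model below — an M/M/1/K queue with admission reward r and linear holding cost h, uniformized at rate λ+μ — is a hypothetical stand-in for the paper's setting, not its exact formulation.

```python
def stationary_admission_policy(lam, mu, K=10, r=5.0, h=1.0,
                                beta=0.9, iters=400):
    """Discounted value iteration for a stationary M/M/1/K admission-control
    MDP after uniformization with rate lam + mu: admitting a customer earns
    r, and each customer present costs h per uniformized period.
    Returns the admit/reject decision for each queue length n."""
    pa, ps = lam / (lam + mu), mu / (lam + mu)  # arrival / service probs
    v = [0.0] * (K + 1)
    for _ in range(iters):
        v = [-h * n + beta * (pa * (max(r + v[n + 1], v[n]) if n < K else v[n])
                              + ps * v[max(n - 1, 0)])
             for n in range(K + 1)]
    return [n < K and r + v[n + 1] > v[n] for n in range(K + 1)]

def psa_policy(lam_of_t, mu_of_t, t, **kwargs):
    """Pointwise stationary approximation: freeze the rates at time t and
    use the optimal stationary policy at that instant."""
    return stationary_admission_policy(lam_of_t(t), mu_of_t(t), **kwargs)
```

Consistent with the monotonicity result in the abstract, the computed stationary policy is of threshold type: once rejection becomes optimal at some queue length, it stays optimal for all larger lengths.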

12.
We consider a two-person, general-sum, rational-data, undiscounted stochastic game in which one player (player II) controls the transition probabilities. We show that the set of stationary equilibrium points is the union of a finite number of sets such that every element of each of these sets can be constructed from a finite number of extreme equilibrium strategies for player I and from a finite number of pseudo-extreme equilibrium strategies for player II. These extreme and pseudo-extreme strategies can themselves be constructed by finite (but inefficient) algorithms. Analogous results can also be established in the more straightforward case of discounted single-controller games.

13.
In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over the set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite state and action spaces, we prove that the mean–variance optimization problem can be transformed to an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example.

14.
Continuous time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected average return criterion function. It is shown that the expected average return can be expressed as an expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.
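Policy iteration of the kind described can be sketched for the discrete-time finite-state analogue (a uniformized continuous-time model reduces to this form). Evaluation solves the average-reward (Poisson) equations g + h(s) = r(s, π(s)) + Σ_{s'} P(s'|s, π(s)) h(s') with the normalization h(0) = 0; improvement is greedy with respect to h. The two-state, two-action MDP below is hypothetical.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (tiny systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for k in range(c, n + 1):
                M[i][k] -= f * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][k] * x[k] for k in range(c + 1, n))) / M[c][c]
    return x

def evaluate(policy, P, r):
    """Solve g + h(s) = r(s,pi(s)) + sum_s' P(s'|s,pi(s)) h(s'), h(0) = 0.
    Unknown vector: (g, h(1), ..., h(n-1))."""
    n = len(P)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        row = [1.0] + [0.0] * (n - 1)   # column 0 holds the gain g
        for s2 in range(1, n):
            row[s2] = (1.0 if s2 == s else 0.0) - P[s][a][s2]
        A.append(row)
        b.append(r[s][a])
    x = solve_linear(A, b)
    return x[0], [0.0] + x[1:]

def policy_iteration(P, r):
    """Average-reward policy iteration for a unichain finite MDP."""
    n, m = len(P), len(P[0])
    policy = [0] * n
    while True:
        g, h = evaluate(policy, P, r)
        new = []
        for s in range(n):
            q = [r[s][a] + sum(P[s][a][s2] * h[s2] for s2 in range(n))
                 for a in range(m)]
            best = max(range(m), key=lambda a: q[a])
            # keep the current action on (near-)ties to guarantee termination
            new.append(policy[s] if q[policy[s]] >= q[best] - 1e-12 else best)
        if new == policy:
            return policy, g
        policy = new

# Hypothetical example: P[s][a] is the transition row, r[s][a] the reward.
P = [[[0.8, 0.2], [0.3, 0.7]], [[0.5, 0.5], [0.9, 0.1]]]
r = [[2.0, 1.0], [0.0, 3.0]]
```

On this instance the method converges in two improvement steps to the policy (action 0 in state 0, action 1 in state 1) with gain 24/11.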

15.
In stochastic games with finite state and action spaces, we examine the existence of equilibria where player 1 uses the limiting average reward and player 2 a discounted reward for the evaluations of the respective payoff sequences. By the nature of these rewards, the far future determines player 1's reward, while player 2 is rather interested in the near future. This gives rise to a natural cooperation between the players along the course of the play. First, we show the existence of stationary ε-equilibria, for all ε>0, in these games. However, besides these stationary ε-equilibria, there also exist ε-equilibria in terms of only slightly more complex ultimately stationary strategies, which are rather in the spirit of these games because, after a large stage when the discounted game is no longer interesting, the players cooperate to guarantee the highest feasible reward to player 1. Moreover, we analyze an interesting example demonstrating that 0-equilibria do not necessarily exist in these games, not even in terms of history-dependent strategies. Finally, we examine special classes of stochastic games with specific conditions on the transition and payoff structures. Several examples are given to clarify all these issues.

16.
In a multivariate nonparametric regression problem with a fixed, deterministic design, asymptotic uniform confidence bands for the regression function are constructed. The construction of the bands is based on the asymptotic distribution of the maximal deviation between a suitable nonparametric estimator and the true regression function, which is derived by multivariate strong approximation methods and a limit theorem for the supremum of a stationary Gaussian field over an increasing system of sets. The results are derived for a general class of estimators which includes local polynomial estimators as a special case. The finite sample properties of the proposed asymptotic bands are investigated by means of a small simulation study.

17.
This paper is the third in a series on constrained Markov decision processes (CMDPs) with a countable state space and unbounded cost. In the previous papers we studied the expected average and the discounted cost. We analyze in this paper the total cost criterion. We study the properties of the set of occupation measures achieved by different classes of policies; we then focus on stationary policies and on mixed deterministic policies and present conditions under which optimal policies exist within these classes. We conclude by introducing an equivalent infinite linear program.

18.
We consider a discrete time Markov decision process (MDP) with a finite state space, a finite action space, and two kinds of immediate rewards. The problem is to maximize the time average reward generated by one reward stream, subject to the other reward not being smaller than a prescribed value. An MDP with a reward constraint can be solved by linear programming over the range of mixed policies. On the other hand, when we restrict ourselves to pure policies, the problem is combinatorial, and no exact solution method has been discovered for it. In this paper, we propose an approach based on Genetic Algorithms (GAs) to obtain an effective search process and a near optimal, possibly optimal, pure stationary policy. A numerical example is given to examine the efficiency of the proposed approach.
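A GA over pure stationary policies along these lines can be sketched as follows. The MDP data, reward streams, constraint level, and penalty weight below are all hypothetical; fitness is the long-run average of the first reward stream minus a penalty for violating the average constraint on the second.

```python
import random
from functools import lru_cache

# Hypothetical 3-state, 2-action MDP; every transition row is strictly
# positive, so each pure policy induces an irreducible, aperiodic chain.
P = [[[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]],
     [[0.3, 0.4, 0.3], [0.1, 0.2, 0.7]],
     [[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]]]
R1 = [[4.0, 1.0], [0.0, 3.0], [2.0, 5.0]]   # reward stream to maximize
R2 = [[0.0, 3.0], [4.0, 1.0], [2.0, 0.0]]   # constrained reward stream
C = 1.5                                     # require average of R2 >= C

@lru_cache(maxsize=None)
def gains(policy):
    """Long-run averages of both reward streams under a pure policy,
    via power iteration for the stationary distribution."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(500):
        d = [sum(d[s] * P[s][policy[s]][t] for s in range(n))
             for t in range(n)]
    g1 = sum(d[s] * R1[s][policy[s]] for s in range(n))
    g2 = sum(d[s] * R2[s][policy[s]] for s in range(n))
    return g1, g2

def fitness(policy, penalty=100.0):
    g1, g2 = gains(policy)
    return g1 - penalty * max(0.0, C - g2)   # penalize infeasibility

def ga(pop_size=30, gens=50, pmut=0.2, seed=0):
    """Elitist GA with one-point crossover over pure stationary policies."""
    rng = random.Random(seed)
    n, m = len(P), 2
    pop = [tuple(rng.randrange(m) for _ in range(n)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = list(a[:cut] + b[cut:])
            for i in range(n):
                if rng.random() < pmut:      # mutation
                    child[i] = rng.randrange(m)
            children.append(tuple(child))
        pop = elite + children
    return max(pop, key=fitness)
```

With only 2^3 = 8 pure policies this toy instance is of course brute-forceable; the GA is the point once the policy space is exponentially large.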

19.
1. Introduction. The weighted Markov decision processes (MDPs) have been extensively studied since the 1980s; see, for instance, [1–6] and so on. The theory of weighted MDPs with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...

20.
Critical resources are often shared among different classes of customers. Capacity reservation allows each class of customers to better manage the priorities of its customers but might lead to unused capacity. Unused capacity can be avoided or reduced by advance cancelation. This paper addresses service capacity reservation for a given class of customers. The reservation process is characterized by: contracted time slots (CTS) reserved for the class of customers, requests for lengthy regular time slots (RTS), and two advance cancelation modes that cancel a CTS one or two periods ahead. The optimal control under a given contract is formulated as an average cost Markov Decision Process (MDP) in order to minimize customer waiting times, unused CTS and CTS cancelation. Structural properties of optimal control policies are established via the corresponding discounted cost MDP problem. Numerical results show that two-period advance CTS cancelation can significantly improve the contract-based solution.
