Similar Documents
20 similar documents found.
1.
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.

2.
In this paper, we discuss a partially observable sequential decision problem under a shifted likelihood ratio ordering. Since we employ Bayes' theorem for the learning procedure, we treat this problem under several assumptions. Under these assumptions, we obtain some fundamental results about the relation between prior and posterior information. We also consider an optimal stopping problem for this partially observable Markov decision process.
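As a minimal sketch of the Bayesian learning step such models rely on (the transition matrix P, observation likelihoods Q, and the toy numbers below are ours, not the paper's):

```python
import numpy as np

def bayes_update(prior, P, Q, obs):
    """One Bayesian learning step for a partially observable chain.

    prior : belief over hidden states, shape (n,)
    P     : transition matrix, P[i, j] = Pr(next = j | current = i)
    Q     : observation likelihoods, Q[j, o] = Pr(obs = o | state = j)
    obs   : index of the observation received
    """
    predicted = prior @ P              # push the prior through the dynamics
    unnorm = predicted * Q[:, obs]     # weight by the observation likelihood
    return unnorm / unnorm.sum()       # normalize (Bayes' theorem)

# Toy example: two hidden states, two observations.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
belief = np.array([0.5, 0.5])
belief = bayes_update(belief, P, Q, obs=0)
print(belief)   # posterior after observing obs 0
```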

3.
We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities. We propose equivalent linear programming, second-order cone programming and semidefinite programming problems for the robust constrained Markov decision processes when the uncertain running cost vectors belong to polytopic, ellipsoidal, and semidefinite cone uncertainty sets, respectively. As an application, we study a variant of a machine replacement problem and perform numerical experiments on randomly generated instances of various sizes.
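As a rough illustration (not the paper's formulation): for the simplest polytopic case, a box/interval uncertainty set on the running cost, the worst case is attained at the upper bound of each cost entry, so the robust constrained MDP reduces to an ordinary occupancy-measure LP. A minimal sketch with made-up data, using scipy:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative sizes; P, the cost intervals, and the budget are made-up data.
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))       # P[s, a, s']
c_lo = rng.uniform(0.0, 1.0, size=(nS, nA))         # interval (box) uncertainty
c_hi = c_lo + rng.uniform(0.0, 0.5, size=(nS, nA))  # on the running cost
d = rng.uniform(0.0, 1.0, size=(nS, nA))            # constraint cost
budget = d.max() / (1 - gamma)                      # deliberately slack, so the toy LP is feasible
mu = np.full(nS, 1.0 / nS)                          # initial distribution

# Occupancy-measure balance:
# sum_a x(s',a) - gamma * sum_{s,a} P[s,a,s'] x(s,a) = mu(s')
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        col = s * nA + a
        A_eq[s, col] += 1.0
        A_eq[:, col] -= gamma * P[s, a]

# For a box uncertainty set the worst-case cost is the upper bound,
# so the robust LP simply prices each (s, a) pair at c_hi.
res = linprog(c=c_hi.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=mu)
x = res.x.reshape(nS, nA)
policy = x / x.sum(axis=1, keepdims=True)           # stationary randomized policy
print(res.fun, policy)
```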

4.
5.
This paper focuses on solving a finite horizon semi-Markov decision process with multiple constraints. We convert the problem to a constrained absorbing discrete-time Markov decision process and then to an equivalent linear program over a class of occupancy measures. The existence, characterization and computation of constrained-optimal policies are established under suitable conditions. An example is given to demonstrate our results.
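In outline (our notation, not the paper's), a linear program over occupancy measures x(s, a) of the absorbing chain has the generic form, with rewards r, constraint costs c_k, bounds d_k, and initial distribution μ:

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{s,a} r(s,a)\,x(s,a)\\
\text{s.t.}\quad & \sum_{a} x(s',a) - \sum_{s,a} p(s' \mid s,a)\,x(s,a) = \mu(s')
  \quad \text{for every transient state } s',\\
& \sum_{s,a} c_k(s,a)\,x(s,a) \le d_k, \qquad k = 1,\dots,K.
\end{aligned}
```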

6.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always a Markov strategy available which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite-horizon payoffs are not useful for the general average-reward problem.

7.
8.
We show that the LP formulation for an undiscounted multi-chain Markov decision problem can be put in a block upper-triangular form by a polynomial time procedure. Each minimal block (after an appropriate dynamic revision) gives rise to a single-chain Markov decision problem which can be treated independently. An optimal solution to each single-chain problem can be connected by auxiliary dual programs to obtain an optimal solution to a multi-chain problem.
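The abstract does not spell out the polynomial-time procedure; as a rough illustration of how a block upper-triangular ordering can be obtained, the sketch below decomposes the one-step reachability graph of an MDP into strongly connected components in topological order (Kosaraju's algorithm; a generic decomposition, not necessarily the authors' procedure):

```python
from collections import defaultdict

def reachability_graph(P, tol=1e-12):
    """Edge s -> s' if some action moves s to s' with positive probability.
    P[s][a][s'] is a transition probability (nested lists for the sketch)."""
    edges = defaultdict(set)
    for s, actions in enumerate(P):
        for row in actions:
            for s2, p in enumerate(row):
                if p > tol:
                    edges[s].add(s2)
    return edges

def scc_blocks(n, edges):
    """Kosaraju's algorithm: SCCs returned in topological order, so ordering
    the states block by block makes the transition matrix block upper-triangular."""
    order, seen = [], [False] * n
    def dfs(u, adj, out):
        stack = [(u, iter(adj[u]))]
        seen[u] = True
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop(); out.append(v)       # record v at finish time
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(adj[nxt])))
    for u in range(n):
        if not seen[u]:
            dfs(u, edges, order)
    rev = defaultdict(set)                        # reverse graph for pass two
    for u in list(edges):
        for v in edges[u]:
            rev[v].add(u)
    seen = [False] * n
    blocks = []
    for u in reversed(order):
        if not seen[u]:
            comp = []
            dfs(u, rev, comp)
            blocks.append(comp)
    return blocks   # no edges lead from a later block back to an earlier one

P = [[[0.0, 1.0], [1.0, 0.0]],    # state 0, two actions
     [[0.0, 1.0], [0.0, 1.0]]]    # state 1 is absorbing
print(scc_blocks(2, reachability_graph(P)))   # [[0], [1]]
```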

9.
《Optimization》2012,61(3-4):385-392
In the steady state of an undiscounted Markov decision process, we consider the problem of finding an optimal stationary probability distribution that maximizes the mean-to-standard-deviation ratio among all stationary probability distributions. The problem injects considerations into MDPs from a relative point of view.
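One natural reading of the objective, in our notation (x ranges over the set Δ of stationary state–action distributions, r is the reward):

```latex
\max_{x \in \Delta}\;
\frac{\sum_{s,a} r(s,a)\, x(s,a)}
     {\sqrt{\sum_{s,a} r(s,a)^2\, x(s,a) - \bigl(\sum_{s,a} r(s,a)\, x(s,a)\bigr)^2}}
```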

10.
11.
The optimal stopping problem in a partially observable Markov chain is considered and formulated as a Markov decision process. We treat a multiple stopping problem in this paper. Unlike the classical stopping problem, the current state of the chain is not known directly; information about the current state is always available from an information process. Several properties of the value and the optimal policy are given. For example, if we add another stop action to the k-stop problem, the increment of the value is decreasing in k. The author wishes to thank Professor M. Sakaguchi of Osaka University for his encouragement and guidance. He also thanks the referees for their careful readings and helpful comments.

12.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
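For orientation, in a discounted continuous-time Markov decision process with discount rate α > 0, reward rate r, and transition rate kernel q (a signed kernel with q(X | x, a) = 0), the Bellman equation referred to here typically takes the form (notation varies by author):

```latex
\alpha V^*(x) \;=\; \sup_{a \in A(x)}
\left[\, r(x,a) + \int_{X} V^*(y)\, q(\mathrm{d}y \mid x, a) \right],
\qquad x \in X.
```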

13.
A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential of such processes remains largely unrealized, due to a historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high-precision solutions.
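For concreteness, one widely used family of approximations works with point-based backups over a finite set of alpha-vectors (the finite-horizon value function is piecewise linear and convex in the belief). The sketch below follows a PBVI-style backup; array shapes and names are our assumptions, not the paper's:

```python
import numpy as np

def point_based_backup(b, Gamma, P, Q, R, gamma):
    """One point-based value backup at belief b (PBVI-style sketch).

    Gamma : list of alpha-vectors, each shape (nS,), representing V_{t+1}
    P     : P[a, s, s'] transition probabilities
    Q     : Q[s', o] observation probabilities
    R     : R[s, a] immediate rewards
    Returns the best new alpha-vector at b and its value <alpha, b>.
    """
    nA, nS, _ = P.shape
    nO = Q.shape[1]
    best_alpha, best_val = None, -np.inf
    for a in range(nA):
        g = R[:, a].astype(float)
        for o in range(nO):
            # g_{a,o}^i(s) = sum_{s'} alpha_i(s') * Q(o|s') * P(s'|s,a)
            cands = [P[a] @ (alpha * Q[:, o]) for alpha in Gamma]
            g = g + gamma * max(cands, key=lambda v: float(v @ b))
        if float(g @ b) > best_val:
            best_alpha, best_val = g, float(g @ b)
    return best_alpha, best_val
```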

14.
This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
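As background for the unichain special case (the paper's aggregation treats the harder multichain case), a standard value iteration with a span-seminorm stopping rule that returns an ε-optimal stationary policy might look like this:

```python
import numpy as np

def relative_value_iteration(P, R, eps=1e-6, max_iter=100_000):
    """Average-reward value iteration with a span-seminorm stopping rule.

    P[a, s, s'] : transition probabilities, R[s, a] : rewards.
    In the unichain, aperiodic case this yields an eps-optimal stationary
    policy; it is only the classical baseline, not the paper's method.
    """
    nA, nS, _ = P.shape
    v = np.zeros(nS)
    for _ in range(max_iter):
        Q = np.stack([R[:, a] + P[a] @ v for a in range(nA)], axis=1)
        Tv = Q.max(axis=1)
        diff = Tv - v
        if diff.max() - diff.min() < eps:        # span stopping criterion
            gain = 0.5 * (diff.max() + diff.min())
            return Q.argmax(axis=1), gain        # eps-optimal policy, gain estimate
        v = Tv - Tv[0]                           # relative update keeps iterates bounded
    raise RuntimeError("no eps-optimal policy found within max_iter")
```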

15.
In this paper we study the problem of personnel planning in care-at-home facilities. We model the system as a Markov decision process, which leads to a high-dimensional control problem. We study monotonicity properties of the system and derive structural results for the optimal policy. Based on these insights, we propose a trunk reservation heuristic to control the system. We provide numerical evidence that the heuristic yields close to optimal performance, and scales well for large problem instances.
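The trunk reservation idea can be stated in a few lines; the sketch below is a generic admission rule with an illustrative threshold, not the authors' tuned heuristic:

```python
def admit(patient_class, occupied, capacity, reserve=2):
    """Trunk-reservation rule (sketch): always admit the high-priority class;
    admit a low-priority patient only while more than `reserve` units of
    capacity remain free for future high-priority arrivals.
    Class 0 = high priority, class 1 = low priority; threshold is illustrative."""
    free = capacity - occupied
    if patient_class == 0:
        return free > 0
    return free > reserve
```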

16.
This paper discusses a class of non-homogeneous partially observable Markov decision models. Without changing the countability of the state space, the model is transformed into the generalized discounted model of [5], which settles its optimal-policy problem; a finite-horizon approximation algorithm for the model is also obtained, in which the states involved remain countable.

17.
We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value. This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules.

18.
We give a definition of fuzzy multi-objective Markov decision programming, namely multi-objective Markov decision programming in which the rewards are fuzzy functions, and we present a method for finding optimal policies for such programs as well as a criterion for deciding the optimality of solutions of these multi-objective programs.

19.
Comparisons of the performance of solution algorithms for Markov decision processes rely heavily on problem generators to provide sizeable sets of test problems. Existing generation techniques allow little control over the properties of the test problems and often result in problems which are not typical of real-world examples. This paper identifies the properties of Markov decision processes which affect the performance of solution algorithms, and also describes a new problem generation technique which allows all of these properties to be controlled.
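For flavor, a toy generator exposing two such controllable properties, transition sparsity and reward sparsity, might look like this (the parameters and defaults are ours, not the paper's):

```python
import numpy as np

def generate_mdp(n_states, n_actions, branching=3, reward_sparsity=0.2, seed=None):
    """Random MDP test-problem generator (illustrative sketch).

    `branching` caps the number of successor states per (s, a) pair, which
    controls transition-matrix sparsity; `reward_sparsity` is the fraction
    of (s, a) pairs with nonzero reward. Requires branching <= n_states.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_actions, n_states, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    R = rng.normal(size=(n_states, n_actions))
    R *= rng.random((n_states, n_actions)) < reward_sparsity
    return P, R
```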

20.
We formulate a new multi-stage decision process with Markov-type fuzzy transitions, termed a Markov-type fuzzy decision process. In the general framework of the decision process, both states and actions are assumed to be fuzzy. The transition of states is defined by a fuzzy relation with the Markov property, and the discounted total reward is described as a fuzzy number on a closed bounded interval. To discuss the optimization problem, a partial order on convex fuzzy numbers is introduced. In this paper the discounted total reward associated with an admissible stationary policy is characterized as the unique fixed point of a contractive mapping. Moreover, the optimality equation for the fuzzy decision model is derived under some continuity conditions. An illustrative example is also given to explain the theoretical results and the computation in the paper.
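In the crisp (non-fuzzy) analogue, the fixed-point characterization is the familiar one: the discounted value of a stationary policy is the unique fixed point of a β-contraction. A minimal sketch, assuming a finite state space with policy transition matrix P_pi and reward vector r_pi (names are ours):

```python
import numpy as np

def policy_value_fixed_point(P_pi, r_pi, beta=0.9, tol=1e-10):
    """Iterate the beta-contraction (T v)(s) = r_pi(s) + beta * (P_pi v)(s)
    to its unique fixed point, the discounted value of the policy."""
    v = np.zeros(len(r_pi))
    while True:
        v_next = r_pi + beta * P_pi @ v
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
```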
