Similar Documents
20 similar documents found.
1.
We introduce and study a class of non-stationary semi-Markov decision processes on a finite horizon. By constructing an equivalent Markov decision process, we establish the existence of a piecewise open loop relaxed control which is optimal for the finite horizon problem.

2.
In this paper, we discuss a partially observable sequential decision problem under a shifted likelihood ratio ordering. Since we employ Bayes' theorem for the learning procedure, we treat this problem under several assumptions. Under these assumptions, we obtain some fundamental results about the relation between prior and posterior information. We also consider an optimal stopping problem for this partially observable Markov decision process.
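As a minimal sketch of the Bayesian learning step such models rely on (the transition matrix P, observation likelihoods Q, and the toy numbers below are ours, not the paper's):

```python
import numpy as np

def bayes_update(prior, P, Q, obs):
    """One Bayesian learning step for a partially observable chain.

    prior : belief over hidden states, shape (n,)
    P     : transition matrix, P[i, j] = Pr(next = j | current = i)
    Q     : observation likelihoods, Q[j, o] = Pr(obs = o | state = j)
    obs   : index of the observation received
    """
    predicted = prior @ P              # push the prior through the dynamics
    unnorm = predicted * Q[:, obs]     # weight by the observation likelihood
    return unnorm / unnorm.sum()       # normalize (Bayes' theorem)

# Toy example: two hidden states, two observations.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
belief = np.array([0.5, 0.5])
belief = bayes_update(belief, P, Q, obs=0)
print(belief)   # posterior after observing obs 0
```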

3.
We consider a finite state-action discounted constrained Markov decision process with uncertain running costs and known transition probabilities. We propose equivalent linear programming, second-order cone programming and semidefinite programming problems for the robust constrained Markov decision processes when the uncertain running cost vectors belong to polytopic, ellipsoidal, and semidefinite cone uncertainty sets, respectively. As an application, we study a variant of a machine replacement problem and perform numerical experiments on randomly generated instances of various sizes.
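As a rough illustration (not the paper's formulation): for the simplest polytopic case, a box/interval uncertainty set on the running cost, the worst case is attained at the upper bound of each cost entry, so the robust constrained MDP reduces to an ordinary occupancy-measure LP. A minimal sketch with made-up data, using scipy:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative sizes; P, the cost intervals, and the budget are made-up data.
nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))       # P[s, a, s']
c_lo = rng.uniform(0.0, 1.0, size=(nS, nA))         # interval (box) uncertainty
c_hi = c_lo + rng.uniform(0.0, 0.5, size=(nS, nA))  # on the running cost
d = rng.uniform(0.0, 1.0, size=(nS, nA))            # constraint cost
budget = d.max() / (1 - gamma)                      # deliberately slack, so the toy LP is feasible
mu = np.full(nS, 1.0 / nS)                          # initial distribution

# Occupancy-measure balance:
# sum_a x(s',a) - gamma * sum_{s,a} P[s,a,s'] x(s,a) = mu(s')
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for a in range(nA):
        col = s * nA + a
        A_eq[s, col] += 1.0
        A_eq[:, col] -= gamma * P[s, a]

# For a box uncertainty set the worst-case cost is the upper bound,
# so the robust LP simply prices each (s, a) pair at c_hi.
res = linprog(c=c_hi.ravel(),
              A_ub=d.ravel()[None, :], b_ub=[budget],
              A_eq=A_eq, b_eq=mu)
x = res.x.reshape(nS, nA)
policy = x / x.sum(axis=1, keepdims=True)           # stationary randomized policy
print(res.fun, policy)
```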

4.
5.
This paper focuses on solving a finite horizon semi-Markov decision process with multiple constraints. We convert the problem to a constrained absorbing discrete-time Markov decision process and then to an equivalent linear program over a class of occupancy measures. The existence, characterization and computation of constrained-optimal policies are established under suitable conditions. An example is given to demonstrate our results.
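In outline (our notation, not the paper's), a linear program over occupancy measures x(s, a) of the absorbing chain has the generic form, with rewards r, constraint costs c_k, bounds d_k, and initial distribution μ:

```latex
\begin{aligned}
\max_{x \ge 0}\quad & \sum_{s,a} r(s,a)\,x(s,a)\\
\text{s.t.}\quad & \sum_{a} x(s',a) - \sum_{s,a} p(s' \mid s,a)\,x(s,a) = \mu(s')
  \quad \text{for every transient state } s',\\
& \sum_{s,a} c_k(s,a)\,x(s,a) \le d_k, \qquad k = 1,\dots,K.
\end{aligned}
```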

6.
In a decision process (gambling or dynamic programming problem) with finite state space and arbitrary decision sets (gambles or actions), there is always a Markov strategy available which uniformly (nearly) maximizes the average time spent at a goal. If the decision sets are closed, there is even a stationary strategy with the same property. Examples are given to show that approximations by discounted or finite-horizon payoffs are not useful for the general average-reward problem.

7.
8.
We show that the LP formulation for an undiscounted multi-chain Markov decision problem can be put in a block upper-triangular form by a polynomial time procedure. Each minimal block (after an appropriate dynamic revision) gives rise to a single-chain Markov decision problem which can be treated independently. An optimal solution to each single-chain problem can be connected by auxiliary dual programs to obtain an optimal solution to a multi-chain problem.
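The abstract does not spell out the polynomial-time procedure; as a rough illustration of how a block upper-triangular ordering can be obtained, the sketch below decomposes the one-step reachability graph of an MDP into strongly connected components in topological order (Kosaraju's algorithm; a generic decomposition, not necessarily the authors' procedure):

```python
from collections import defaultdict

def reachability_graph(P, tol=1e-12):
    """Edge s -> s' if some action moves s to s' with positive probability.
    P[s][a][s'] is a transition probability (nested lists for the sketch)."""
    edges = defaultdict(set)
    for s, actions in enumerate(P):
        for row in actions:
            for s2, p in enumerate(row):
                if p > tol:
                    edges[s].add(s2)
    return edges

def scc_blocks(n, edges):
    """Kosaraju's algorithm: SCCs returned in topological order, so ordering
    the states block by block makes the transition matrix block upper-triangular."""
    order, seen = [], [False] * n
    def dfs(u, adj, out):
        stack = [(u, iter(adj[u]))]
        seen[u] = True
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop(); out.append(v)       # record v at finish time
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(adj[nxt])))
    for u in range(n):
        if not seen[u]:
            dfs(u, edges, order)
    rev = defaultdict(set)                        # reverse graph for pass two
    for u in list(edges):
        for v in edges[u]:
            rev[v].add(u)
    seen = [False] * n
    blocks = []
    for u in reversed(order):
        if not seen[u]:
            comp = []
            dfs(u, rev, comp)
            blocks.append(comp)
    return blocks   # no edges lead from a later block back to an earlier one

P = [[[0.0, 1.0], [1.0, 0.0]],    # state 0, two actions
     [[0.0, 1.0], [0.0, 1.0]]]    # state 1 is absorbing
print(scc_blocks(2, reachability_graph(P)))   # [[0], [1]]
```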

9.
《Optimization》2012,61(3-4):385-392
In the steady state of an undiscounted Markov decision process, we consider the problem of finding an optimal stationary probability distribution that maximizes the mean-to-standard-deviation ratio among all stationary probability distributions. The problem injects considerations into MDPs from a relative point of view.
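One natural reading of the objective, in our notation (x ranges over the set Δ of stationary state–action distributions, r is the reward):

```latex
\max_{x \in \Delta}\;
\frac{\sum_{s,a} r(s,a)\, x(s,a)}
     {\sqrt{\sum_{s,a} r(s,a)^2\, x(s,a) - \bigl(\sum_{s,a} r(s,a)\, x(s,a)\bigr)^2}}
```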

10.
11.
The optimal stopping problem in a partially observable Markov chain is considered and formulated as a Markov decision process. We treat a multiple stopping problem in this paper. Unlike the classical stopping problem, the current state of the chain is not known directly; information about the current state is always available from an information process. Several properties of the value and the optimal policy are given. For example, if we add another stop action to the k-stop problem, the increment of the value is decreasing in k. The author wishes to thank Professor M. Sakaguchi of Osaka University for his encouragement and guidance. He also thanks the referees for their careful readings and helpful comments.

12.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
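For orientation, in a discounted continuous-time Markov decision process with discount rate α > 0, reward rate r, and transition rate kernel q (a signed kernel with q(X | x, a) = 0), the Bellman equation referred to here typically takes the form (notation varies by author):

```latex
\alpha V^*(x) \;=\; \sup_{a \in A(x)}
\left[\, r(x,a) + \int_{X} V^*(y)\, q(\mathrm{d}y \mid x, a) \right],
\qquad x \in X.
```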

13.
A partially observed Markov decision process (POMDP) is a generalization of a Markov decision process that allows for incomplete information regarding the state of the system. The significant applied potential of such processes remains largely unrealized, due to a historical lack of tractable solution methodologies. This paper reviews some of the current algorithmic alternatives for solving discrete-time, finite POMDPs over both finite and infinite horizons. The major impediment to exact solution is that, even with a finite set of internal system states, the set of possible information states is uncountably infinite. Finite algorithms are theoretically available for exact solution of the finite horizon problem, but these are computationally intractable for even modest-sized problems. Several approximation methodologies are reviewed that have the potential to generate computationally feasible, high-precision solutions.
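For concreteness, one widely used family of approximations works with point-based backups over a finite set of alpha-vectors (the finite-horizon value function is piecewise linear and convex in the belief). The sketch below follows a PBVI-style backup; array shapes and names are our assumptions, not the paper's:

```python
import numpy as np

def point_based_backup(b, Gamma, P, Q, R, gamma):
    """One point-based value backup at belief b (PBVI-style sketch).

    Gamma : list of alpha-vectors, each shape (nS,), representing V_{t+1}
    P     : P[a, s, s'] transition probabilities
    Q     : Q[s', o] observation probabilities
    R     : R[s, a] immediate rewards
    Returns the best new alpha-vector at b and its value <alpha, b>.
    """
    nA, nS, _ = P.shape
    nO = Q.shape[1]
    best_alpha, best_val = None, -np.inf
    for a in range(nA):
        g = R[:, a].astype(float)
        for o in range(nO):
            # g_{a,o}^i(s) = sum_{s'} alpha_i(s') * Q(o|s') * P(s'|s,a)
            cands = [P[a] @ (alpha * Q[:, o]) for alpha in Gamma]
            g = g + gamma * max(cands, key=lambda v: float(v @ b))
        if float(g @ b) > best_val:
            best_alpha, best_val = g, float(g @ b)
    return best_alpha, best_val
```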

14.
This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
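As background for the unichain special case (the paper's aggregation treats the harder multichain case), a standard value iteration with a span-seminorm stopping rule that returns an ε-optimal stationary policy might look like this:

```python
import numpy as np

def relative_value_iteration(P, R, eps=1e-6, max_iter=100_000):
    """Average-reward value iteration with a span-seminorm stopping rule.

    P[a, s, s'] : transition probabilities, R[s, a] : rewards.
    In the unichain, aperiodic case this yields an eps-optimal stationary
    policy; it is only the classical baseline, not the paper's method.
    """
    nA, nS, _ = P.shape
    v = np.zeros(nS)
    for _ in range(max_iter):
        Q = np.stack([R[:, a] + P[a] @ v for a in range(nA)], axis=1)
        Tv = Q.max(axis=1)
        diff = Tv - v
        if diff.max() - diff.min() < eps:        # span stopping criterion
            gain = 0.5 * (diff.max() + diff.min())
            return Q.argmax(axis=1), gain        # eps-optimal policy, gain estimate
        v = Tv - Tv[0]                           # relative update keeps iterates bounded
    raise RuntimeError("no eps-optimal policy found within max_iter")
```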

15.
In this paper we study the problem of personnel planning in care-at-home facilities. We model the system as a Markov decision process, which leads to a high-dimensional control problem. We study monotonicity properties of the system and derive structural results for the optimal policy. Based on these insights, we propose a trunk reservation heuristic to control the system. We provide numerical evidence that the heuristic yields close to optimal performance, and scales well for large problem instances.
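The trunk reservation idea can be stated in a few lines; the sketch below is a generic admission rule with an illustrative threshold, not the authors' tuned heuristic:

```python
def admit(patient_class, occupied, capacity, reserve=2):
    """Trunk-reservation rule (sketch): always admit the high-priority class;
    admit a low-priority patient only while more than `reserve` units of
    capacity remain free for future high-priority arrivals.
    Class 0 = high priority, class 1 = low priority; threshold is illustrative."""
    free = capacity - occupied
    if patient_class == 0:
        return free > 0
    return free > reserve
```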

16.
This paper discusses a class of non-homogeneous partially observable Markov decision models. Without changing the countability of the state space, the model is transformed into the generalized discounted model of [5], which settles its optimal-policy problem; a finite-horizon approximation algorithm for the model is also obtained, in which the states involved remain countable.

17.
We provide a bound for the variation of the function that assigns to every competitive Markov decision process and every discount factor its discounted value. This bound implies that the undiscounted value of a competitive Markov decision process is continuous in the relative interior of the space of transition rules.

18.
We give a definition of fuzzy multi-objective Markov decision programming, namely multi-objective Markov decision programming in which the rewards are fuzzy functions, and we present a method for finding optimal policies for such programs as well as a criterion for deciding the optimality of solutions of these multi-objective programs.

19.
Comparisons of the performance of solution algorithms for Markov decision processes rely heavily on problem generators to provide sizeable sets of test problems. Existing generation techniques allow little control over the properties of the test problems and often result in problems which are not typical of real-world examples. This paper identifies the properties of Markov decision processes which affect the performance of solution algorithms, and also describes a new problem generation technique which allows all of these properties to be controlled.
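For flavor, a toy generator exposing two such controllable properties, transition sparsity and reward sparsity, might look like this (the parameters and defaults are ours, not the paper's):

```python
import numpy as np

def generate_mdp(n_states, n_actions, branching=3, reward_sparsity=0.2, seed=None):
    """Random MDP test-problem generator (illustrative sketch).

    `branching` caps the number of successor states per (s, a) pair, which
    controls transition-matrix sparsity; `reward_sparsity` is the fraction
    of (s, a) pairs with nonzero reward. Requires branching <= n_states.
    """
    rng = np.random.default_rng(seed)
    P = np.zeros((n_actions, n_states, n_states))
    for a in range(n_actions):
        for s in range(n_states):
            succ = rng.choice(n_states, size=branching, replace=False)
            P[a, s, succ] = rng.dirichlet(np.ones(branching))
    R = rng.normal(size=(n_states, n_actions))
    R *= rng.random((n_states, n_actions)) < reward_sparsity
    return P, R
```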

20.
We formulate a new multi-stage decision process with Markov-type fuzzy transitions, termed a Markov-type fuzzy decision process. In the general framework of the decision process, both states and actions are assumed to be fuzzy. The transition of states is defined by a fuzzy relation with the Markov property, and the discounted total reward is described as a fuzzy number on a closed bounded interval. To discuss the optimization problem, a partial order on convex fuzzy numbers is introduced. In this paper the discounted total reward associated with an admissible stationary policy is characterized as the unique fixed point of a contractive mapping. Moreover, the optimality equation for the fuzzy decision model is derived under some continuity conditions. An illustrative example is also given to explain the theoretical results and the computation in the paper.
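In the crisp (non-fuzzy) analogue, the fixed-point characterization is the familiar one: the discounted value of a stationary policy is the unique fixed point of a β-contraction. A minimal sketch, assuming a finite state space with policy transition matrix P_pi and reward vector r_pi (names are ours):

```python
import numpy as np

def policy_value_fixed_point(P_pi, r_pi, beta=0.9, tol=1e-10):
    """Iterate the beta-contraction (T v)(s) = r_pi(s) + beta * (P_pi v)(s)
    to its unique fixed point, the discounted value of the policy."""
    v = np.zeros(len(r_pi))
    while True:
        v_next = r_pi + beta * P_pi @ v
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
```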
