Similar documents
Found 20 similar documents (search time: 31 ms)
1.
2.
This paper deals with a continuous-time Markov decision process in Borel state and action spaces and with unbounded transition rates. Under history-dependent policies, the controlled process may not be Markov. The main contribution is that for such non-Markov processes we establish the Dynkin formula, which plays an important role in establishing optimality results for continuous-time Markov decision processes. We further illustrate this by showing, for a discounted continuous-time Markov decision process, the existence of a deterministic stationary optimal policy (out of the class of history-dependent policies) and by characterizing the value function through the Bellman equation.
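For context, in the classical Markov setting the Dynkin formula relates the expected change of a test function f to the extended generator of the process; the paper's contribution is an analogue for the possibly non-Markov process induced by history-dependent policies. The classical form reads:

```latex
\mathbb{E}_x\bigl[f(X_t)\bigr] - f(x)
  = \mathbb{E}_x\!\left[\int_0^t (\mathcal{A}f)(X_s)\,\mathrm{d}s\right]
```

for all f in a suitable domain of the generator \mathcal{A}; in the controlled case the generator depends on the action, and the integrand is evaluated along the controlled trajectory.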

3.
Strongly excessive functions play an important role in the theory of Markov decision processes and Markov games. In this paper the following question is investigated: what are the properties of Markov decision processes which possess a strongly excessive function? A probabilistic characterization is presented in the form of a random drift through a partitioned state space. For strongly excessive functions with a positive lower bound, a characterization is given in terms of the lifetime distribution of the process. Finally, we give a characterization in terms of the spectral radius.

4.
In the steady state of a discrete-time Markov decision process, we consider the problem of finding an optimal randomized policy that minimizes the variance of the one-step reward among the policies whose mean reward is at least a specified value. The problem is solved by introducing a parametric Markov decision process with an average-cost criterion. It is shown that there exists an optimal policy which is a mixture of at most two pure policies. As an application, the toymaker's problem is discussed.

5.
胡玉生  李金林  冉伦  赵天 《运筹与管理》2017,26(12):157-164
Taking multiple competing flights on the same route as the object of study, and assuming complete information among the competing flights, this paper uses Markov decision processes and game theory to build a mathematical model of risk-averse dynamic flight pricing under competition, and proves the existence of an equilibrium price. Building on this, the dynamic pricing problem for risk-averse competing flights under incomplete information is further discussed. Numerical experiments show that, in a competitive environment, each risk-averse flight's equilibrium price decreases as its own remaining seat inventory and risk-aversion coefficient increase, and increases as the remaining seat inventories and risk-aversion coefficients of the other competing flights increase.

6.
In this paper, we show that a discounted continuous-time Markov decision process in Borel spaces with randomized history-dependent policies, arbitrarily unbounded transition rates and a non-negative reward rate is equivalent to a discrete-time Markov decision process. Based on a completely new proof, which does not involve Kolmogorov's forward equation, it is shown that the value function for both models is given by the minimal non-negative solution to the same Bellman equation. A verifiable necessary and sufficient condition for the finiteness of this value function is given, which induces a new condition for the non-explosion of the underlying controlled process.

7.
Extending the multi-timescale model previously proposed by the author et al. for Markov decision processes, this paper proposes a simple analytical model, called M-timescale two-person zero-sum Markov games (MMGs), for hierarchically structured sequential decision making in competitive situations where one player (the minimizer) wishes to minimize the cost that will be paid to the adversary (the maximizer). In this hierarchical model, each player's decisions at the M levels of the hierarchy are made on M different discrete timescales, and the state space and control space of each level do not overlap with those of the other levels. The hierarchy is structured in a "pyramid" sense: a decision made at level m (slower timescale), and/or the state at that level, affects the evolution of the decision-making process at the lower level m+1 (faster timescale) until a new decision is made at the higher level, whereas the lower-level decisions themselves do not affect the transition dynamics of the higher levels. The performance produced by the lower-level decisions does, however, affect the higher-level decisions of each player. A hierarchical objective function is defined for the minimizer and the maximizer, from which we define a "multi-level equilibrium value function" and derive a "multi-level equilibrium equation". We also discuss how to solve hierarchical games exactly.

8.
In this paper a new notion of a hierarchic Markov process is introduced. It is a series of Markov decision processes called subprocesses built together in one Markov decision process called the main process. The hierarchic structure is specially designed to fit replacement models which in the traditional formulation as ordinary Markov decision processes are usually very large. The basic theory of hierarchic Markov processes is described and examples are given of applications in replacement models. The theory can be extended to fit a situation where the replacement decision depends on the quality of the new asset available for replacement.

9.
Discrete time countable state Markov decision processes with finite decision sets and bounded costs are considered. Conditions are given under which an unbounded solution to the average cost optimality equation exists and yields an optimal stationary policy. A new form of the optimality equation is derived for the case in which every stationary policy gives rise to an ergodic Markov chain.
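As a concrete illustration of the average cost optimality equation in the much simpler finite-state, finite-action case, relative value iteration solves g + h(s) = min_a [c(a,s) + Σ_s' P(a,s,s') h(s')]. This is a standard textbook sketch (the function and variable names here are made up), not the paper's construction, and it assumes a unichain, aperiodic model:

```python
import numpy as np

def relative_value_iteration(P, c, n_iter=500, ref=0):
    """Relative value iteration for a finite average-cost MDP (minimisation).
    Approximately solves the optimality equation
        g + h(s) = min_a [ c(a, s) + sum_s' P(a, s, s') h(s') ].
    P: array (A, S, S) of transition matrices, c: array (A, S) of costs,
    ref: reference state used to renormalise h. Convergence assumes a
    unichain, aperiodic model."""
    A, S, _ = P.shape
    h = np.zeros(S)
    for _ in range(n_iter):
        w = (c + P @ h).min(axis=0)  # one Bellman backup, shape (S,)
        g = w[ref]                   # current gain estimate
        h = w - g                    # renormalise so that h(ref) = 0
    return g, h
```

For a two-state chain with one action, transition matrix [[0.5, 0.5], [0.5, 0.5]] and costs [1, 3], the iteration converges to gain g = 2 with relative values h = [0, 2].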

10.
In this paper we consider a homotopy deformation approach to solving Markov decision process problems by the continuous deformation of a simpler Markov decision process problem until it is identical with the original problem. Algorithms and performance bounds are given.

11.
We consider a mathematical model of decision making by a company attempting to win a market share. We assume that the company releases its products to the market under the competitive conditions that another company is making similar products. Both companies can vary the kinds of their products on the market as well as the prices in accordance with consumer preferences. Each company aims to maximize its profit. A mathematical statement of the decision-making problem for the market players is a bilevel mathematical programming problem that reduces to a competitive facility location problem. As regards the latter, we propose a method for finding an upper bound for the optimal value of the objective function and an algorithm for constructing an approximate solution. The algorithm amounts to local ascent search in a neighborhood of a particular form, which starts with an initial approximate solution obtained simultaneously with an upper bound. We give a computational example of the problem under study which demonstrates the output of the algorithm.

12.
The optimal-stopping problem in a partially observable Markov chain is considered, and this is formulated as a Markov decision process. We treat a multiple stopping problem in this paper. Unlike the classical stopping problem, the current state of the chain is not known directly; information about the current state is always available from an information process. Several properties of the value and the optimal policy are given. For example, if we add another stop action to the k-stop problem, the increment of the value is decreasing in k. The author wishes to thank Professor M. Sakaguchi of Osaka University for his encouragement and guidance. He also thanks the referees for their careful readings and helpful comments.

13.
An optimal strategy in a Markov decision problem is robust if it is optimal in every decision problem (not necessarily stationary) that is close to the original problem. We prove that when the state and action spaces are finite, an optimal strategy is robust if and only if it is the unique optimal strategy.

14.
This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.

15.
This paper studies optimal investment and the dynamic cost of income uncertainty, applying a stochastic programming approach. The motivation is given by a case study in Finnish agriculture. The investment decision of a representative farm is modelled as a Markov decision process, extended to account for risk. A numerical framework for studying the dynamic uncertainty cost is presented, modifying the classical expected value of perfect information to a dynamic setting. The uncertainty cost depends on the volatility of income: e.g. with stationary income, the dynamic uncertainty cost corresponds to a dynamic option value of postponing investment. The model can be applied to agricultural policy planning. In the case study, the investment decision is sensitive to risk.

16.
The paper describes a bound on the optimal value of a Markov Decision Process using the iterates of the value iteration algorithm. Previous bounds have depended upon the values of the last two iterations, but here we consider a bound which depends on the last three iterates. We show that in a maximisation problem, this leads to a better lower bound on the optimal value.
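The three-iterate bound itself is not reproduced here; as background, the classical two-iterate (MacQueen-style) bounds that it improves upon look as follows in the finite discounted maximisation case. This is a standard textbook sketch under assumed notation, not the paper's result:

```python
import numpy as np

def vi_with_bounds(P, r, gamma, n_iter=60):
    """Value iteration for a finite discounted MDP (maximisation), returning
    the classical bounds on the optimal value v* built from the last two
    iterates:
        v_n + gamma/(1-gamma) * min_s d_n(s) <= v* <=
        v_n + gamma/(1-gamma) * max_s d_n(s),   with d_n = v_n - v_{n-1}.
    P: array (A, S, S) of transition matrices, r: array (A, S) of rewards."""
    A, S, _ = P.shape
    v = np.zeros(S)
    lo = hi = v
    for _ in range(n_iter):
        q = r + gamma * (P @ v)        # action values, shape (A, S)
        v_new = q.max(axis=0)          # Bellman backup
        d = v_new - v                  # difference of the last two iterates
        lo = v_new + gamma / (1.0 - gamma) * d.min()
        hi = v_new + gamma / (1.0 - gamma) * d.max()
        v = v_new
    return v, lo, hi
```

On a toy problem where the optimal value is known, the returned interval [lo, hi] brackets v* componentwise and collapses onto it as the iterates converge.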

17.
For an infinite-horizon discounted Markov decision process with a finite number of states and actions, this note provides upper bounds on the number of operations required to compute an approximately optimal policy by value iteration in terms of the discount factor, spread of the reward function, and desired closeness to optimality. One of the provided upper bounds on the number of iterations has the property that it is a non-decreasing function of the value of the discount factor.
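Bounds of this kind typically follow from the geometric convergence of value iteration. A minimal sketch of the resulting iteration-count estimate, assuming v_0 = 0 and the textbook contraction inequality ||v_n - v*|| <= gamma**n * r_max / (1 - gamma); this is standard material, not the note's exact bound:

```python
import math

def vi_iterations_needed(gamma, r_max, eps):
    """Smallest n with gamma**n * r_max / (1 - gamma) <= eps, i.e. the number
    of value iterations guaranteeing ||v_n - v*||_inf <= eps when v_0 = 0 and
    all one-step rewards are bounded by r_max in absolute value."""
    if r_max <= 0.0:
        return 0                       # all rewards zero: v* = 0 = v_0
    target = eps * (1.0 - gamma) / r_max
    if target >= 1.0:
        return 0                       # initial error already within eps
    return math.ceil(math.log(target) / math.log(gamma))
```

For example, gamma = 0.9, r_max = 1 and eps = 0.01 require 66 iterations, and the count grows as the discount factor approaches 1.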

18.
19.
A finite-state Markov decision process, in which, associated with each action in each state, there are two rewards, is considered. The objective is to optimize the ratio of the two rewards over an infinite horizon. In the discounted version of this decision problem, it is shown that the optimal value is unique and the optimal strategy is pure and stationary; however, they are dependent on the starting state. Also, a finite algorithm for computing the solution is given.

20.
We consider the problem of a firm that in each cycle of a planning horizon builds inventory of identical items that it acquires by participating in auctions in order to satisfy its own market demand. The firm’s objective is to have a procurement strategy that maximizes the expected present value of the profit for an infinite planning horizon of identical cycles. We formulate this problem as a Markov decision process. We establish monotonicity properties of the value function and of the optimal bidding rule.

