Similar Literature

A total of 20 similar articles were retrieved.
1.
This article deals with discrete-time two-person zero-sum stochastic games with Borel state and action spaces. The optimality criterion to be studied is the long-run expected average payoff criterion, and the (immediate) payoff function may have neither upper nor lower bounds. We first replace the optimality equation widely used in the previous literature with two so-called optimality inequalities, and give a new set of conditions for the existence of solutions to the optimality inequalities. Then, from the optimality inequalities we ensure the existence of a pair of average optimal stationary strategies. Our new condition is slightly weaker than those in the previous literature, and as a byproduct some interesting results, such as the convergence of a value iteration scheme to the value of the discounted payoff game, are obtained. Finally, we apply the main results in this article to generalized inventory systems, and then provide an example of controlled population processes for which all of our conditions are satisfied, while some of the conditions in the previous literature fail to hold.
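The value iteration mentioned above can be illustrated, in a much-simplified setting, by Shapley's iteration for a finite discounted zero-sum stochastic game; the finite state and action spaces, the random payoff data, and the helper names below are illustrative assumptions, not the paper's Borel-space model.

```python
# Illustrative sketch (assumed finite model): Shapley value iteration for a
# discounted two-person zero-sum stochastic game.  V_{n+1}(s) = val[ Q_s ],
# where Q_s[a, b] = r(s, a, b) + beta * sum_{s'} P(s' | s, a, b) * V_n(s').
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via LP."""
    m, n = M.shape
    # Variables: x_1..x_m (row mixed strategy) and v (game value); maximize v.
    c = np.zeros(m + 1); c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((n, 1))])       # v <= x^T M[:, j] for all j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0  # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def shapley_iteration(r, P, beta=0.9, tol=1e-8, max_iter=10_000):
    """r[s, a, b]: payoff; P[s, a, b, s']: transition law; beta: discount factor."""
    S = r.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        V_new = np.array([matrix_game_value(r[s] + beta * (P[s] @ V)) for s in range(S)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Tiny random instance, purely for demonstration.
rng = np.random.default_rng(0)
S, A, B = 3, 2, 2
r = rng.uniform(-1, 1, size=(S, A, B))
P = rng.dirichlet(np.ones(S), size=(S, A, B))
print(shapley_iteration(r, P))
```

Under the standard contraction argument these iterates converge geometrically to the value of the discounted game; the average-payoff setting of the paper requires the weaker conditions discussed above instead.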

2.
Two-person zero-sum stochastic games with finite state and action spaces are considered. The expected average payoff criterion is introduced. In the special case of single controller games it is shown that the optimal stationary policies and the value of the game can be obtained from the optimal solutions to a pair of dual programs. For multichain structures, a decomposition algorithm is given which produces such optimal stationary policies for both players. In the case of both players controlling the transitions, a generalized game is obtained, the solution of which gives the optimal policies.

3.
In this paper we study zero-sum stochastic games. The optimality criterion is the long-run expected average criterion, and the payoff function may have neither upper nor lower bounds. We give a new set of conditions for the existence of a value and a pair of optimal stationary strategies. Our conditions are slightly weaker than those in the previous literature, and some new sufficient conditions for the existence of a pair of optimal stationary strategies are imposed on the primitive data of the model. Our results are illustrated with a queueing system, for which our conditions are satisfied but some of the conditions in the previous literature fail to hold.
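For orientation, the value and optimal stationary strategies in such games are usually tied to an average-payoff optimality (Shapley) equation; the display below is a schematic discrete-time form under simplifying assumptions (a constant value ρ and a bias function h, notation introduced here), not the weaker conditions used in the paper.

```latex
% Schematic average-payoff optimality equation for a zero-sum stochastic game:
% \rho is the (constant) value of the game, h is a bias function, and
% val[\cdot] denotes the value of the static zero-sum game over mixed actions.
\rho + h(x) \;=\; \operatorname{val}\!\Big[\, r(x,a,b) \;+\; \int_X h(y)\, Q(dy \mid x,a,b) \,\Big],
\qquad x \in X .
```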

4.
This paper considers the mean-variance optimization problem for discounted continuous-time Markov decision processes. The state and action spaces are assumed to be Polish spaces, and the transition rates and reward rates may be unbounded. The optimization goal is to select, within the class of discount-optimal stationary policies, a policy whose corresponding variance is minimal. We seek conditions for the existence of mean-variance optimal policies for Markov decision processes on Polish spaces. Using a first-passage decomposition method, we show that the mean-variance optimization problem can be transformed into an "equivalent" expected discounted optimization problem, and then obtain an "optimality equation" for the mean-variance problem, the existence of mean-variance optimal policies, and their characterization. Finally, several examples are given to illustrate the non-uniqueness of discount-optimal policies and the existence of mean-variance optimal policies.
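One way to see why variance minimization can reduce to an "equivalent" expected discounted problem is the following second-moment identity; it is written here for a much-simplified discrete-time model with a fixed stationary policy and deterministic one-step reward r(x), as an illustrative assumption rather than the paper's continuous-time construction.

```latex
% D_x = \sum_{t \ge 0} \beta^{t} r(x_t) is the discounted total reward starting at x.
% With V(x) = \mathbb{E}_x[D_x] and M(x) = \mathbb{E}_x[D_x^2], conditioning on the
% first transition (D_x = r(x) + \beta D_{x_1}) gives
M(x) \;=\; r(x)^2 \;+\; 2\beta\, r(x) \sum_{y} p(y \mid x)\, V(y)
         \;+\; \beta^{2} \sum_{y} p(y \mid x)\, M(y),
\qquad \operatorname{Var}_x \;=\; M(x) - V(x)^2 .
```

So the second moment M solves a β²-discounted expected-cost equation whose "reward" is built from V, which is the flavour of the equivalence referred to above.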

5.
This paper concerns two-person zero-sum games for a class of average-payoff continuous-time Markov processes in Polish spaces. The underlying processes are determined by transition rates that are allowed to be unbounded, and the payoff function may have neither upper nor lower bounds. We use two optimality inequalities to replace the so-called optimality equation in the previous literature. Under more general conditions, these optimality inequalities yield the existence of the value of the game and of a pair of optimal stationary strategies.

6.
This paper studies the risk minimization problem in semi-Markov decision processes with denumerable states. The criterion to be optimized is the risk probability (or risk function) that a first passage time to some target set does not exceed a threshold value. We first characterize such risk functions and the corresponding optimal value function, and prove that the optimal value function satisfies the optimality equation by using a successive approximation technique. Then, we present some properties of optimal policies, and further give conditions for the existence of optimal policies. In addition, a value iteration algorithm and a policy improvement method for obtaining the optimal value function and optimal policies, respectively, are developed. Finally, two examples are given to illustrate the value iteration procedure and the essential characterization of the risk function.
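As a toy illustration of the successive-approximation/value-iteration idea behind such risk-probability criteria, the sketch below computes, for a much-simplified discrete-time MDP with an integer deadline, the maximal probability that the first passage time to a target set does not exceed the deadline; all model data and names are assumed for illustration, and the sojourn times of the semi-Markov model are ignored.

```python
# Illustrative sketch (assumed finite discrete-time model, integer deadline T):
# U_k(s) = maximal probability of reaching the target set within k steps.
# Recursion: U_0(s) = 1{s in target};
#            U_k(s) = 1 if s in target, else max_a sum_{s'} P(s'|s,a) U_{k-1}(s').
import numpy as np

def max_hit_probability(P, target, T):
    """P[s, a, s']: transition law; target: boolean array over states; T: deadline."""
    S = P.shape[0]
    U = target.astype(float)                  # U_0
    policy = np.full((T, S), -1, dtype=int)
    for k in range(1, T + 1):
        Q = P @ U                             # Q[s, a] = sum_{s'} P(s'|s,a) U(s')
        policy[k - 1] = np.argmax(Q, axis=1)
        U = np.where(target, 1.0, Q.max(axis=1))
    return U, policy                          # U[s] = value with T steps to go

# Tiny random instance, purely for demonstration.
rng = np.random.default_rng(1)
S, A = 4, 2
P = rng.dirichlet(np.ones(S), size=(S, A))
target = np.array([False, False, False, True])
U, _ = max_hit_probability(P, target, T=5)
print(U)
```

Minimizing the complementary probability of missing the deadline is the same computation with 1 − U.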

7.
This paper is the first attempt to investigate the risk probability criterion in semi-Markov decision processes with loss rates. The goal is to find an optimal policy with the minimum risk probability that the total loss incurred during a first passage time to some target set exceeds a loss level. First, we establish the optimality equation via a successive approximation technique, and show that the value function is the unique solution to the optimality equation. Second, we give suitable conditions, under which we prove the existence of optimal policies and develop an algorithm for computing ε-optimal policies. Finally, we apply our main results to a business system.

8.
In this paper, we consider nonstationary Markov decision processes (MDPs, for short) with the average variance criterion on a countable state space, finite action spaces and bounded one-step rewards. From the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. Then we prove that there exists a Markov policy that is optimal under the original average expected reward criterion and minimizes the average variance within the class of policies that are optimal for that criterion.
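For concreteness, one common formalization of the long-run average reward and the average variance of a policy π (written here for a discrete-time model with bounded one-step rewards, as in the entry above, with notation that is an assumption of this note) is:

```latex
% J(\pi,x): long-run average expected reward; the average variance penalizes the
% long-run average squared deviation of the one-step rewards from J(\pi,x).
J(\pi,x) \;=\; \limsup_{n\to\infty} \frac{1}{n}\,
          \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{n-1} r(x_t,a_t)\Big],
\qquad
V(\pi,x) \;=\; \limsup_{n\to\infty} \frac{1}{n}\,
          \mathbb{E}_x^{\pi}\Big[\sum_{t=0}^{n-1} \big(r(x_t,a_t)-J(\pi,x)\big)^{2}\Big].
```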

9.
This paper considers the risk probability criterion for first-passage-to-target models of discrete-time Markov decision processes with a countable state space. The optimization criterion is to minimize the risk probability that the time at which the system first reaches the target state set does not exceed a given threshold. We first establish the optimality equation and prove that the optimal value function corresponds to a solution of the optimality equation; we then discuss some properties of optimal policies and further give conditions for the existence of optimal stationary policies. Finally, an example is given to illustrate our results.

10.
This paper deals with a new optimality criterion consisting of the usual three average criteria and the canonical triplet (collectively called the strong average-canonical optimality criterion) and introduces the concept of a strong average-canonical policy for nonstationary Markov decision processes, which extends the canonical policies of Hernández-Lerma and Lasserre [16] (page 77) for stationary Markov control processes. For the case of possibly non-uniformly bounded rewards and a denumerable state space, we first construct, under some conditions, a solution to the optimality equations (OEs), and then prove that the Markov policies obtained from the OEs are not only optimal for the three average criteria but also optimal for all finite-horizon criteria with a sequence of additional functions as their terminal rewards (i.e., strong average-canonical optimal). Some properties of optimal policies and the convergence of optimal average values are also discussed. Moreover, the error bound in average reward between a rolling horizon policy and a strong average-canonical optimal policy is provided, and a rolling horizon algorithm for computing strong average ε(>0)-optimal Markov policies is given.
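For reference, the canonical triplet mentioned above is usually tied to the average-reward optimality equation; the display below is its standard stationary discrete-time form (notation assumed here, not taken from the paper): a triplet (ρ, h, f*) with

```latex
% \rho: optimal long-run average reward, h: bias (relative value) function,
% f^*: a stationary policy attaining the maximum in every state.
\rho + h(x) \;=\; \max_{a \in A(x)} \Big[\, r(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \,\Big]
            \;=\; r\big(x,f^*(x)\big) + \sum_{y} p\big(y \mid x, f^*(x)\big)\, h(y).
```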

11.
In this paper we study the average sample-path cost (ASPC) problem for continuous-time Markov decision processes in Polish spaces. To the best of our knowledge, this paper is a first attempt to study the ASPC criterion for continuous-time MDPs with Polish state and action spaces. The corresponding transition rates are allowed to be unbounded, and the cost rates may have neither upper nor lower bounds. Under some mild hypotheses, we prove the existence of ε (ε ≥ 0)-ASPC optimal stationary policies based on two different ...

12.
This paper deals with discrete-time Markov control processes in Borel spaces, with unbounded rewards. The criterion to be optimized is a long-run sample-path (or pathwise) average reward subject to constraints on a long-run pathwise average cost. To study this pathwise problem, we give conditions for the existence of optimal policies for the problem with “expected” constraints. Moreover, we show that the expected case can be solved by means of a parametric family of optimality equations. These results are then extended to the problem with pathwise constraints.

13.
In this paper, we study the average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish one average optimality inequality by using the well-known “vanishing discounting factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy within the class of all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy within the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No: 10626021) and the Natural Science Foundation of Guangdong Province (Grant No: 06300957).
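The "vanishing discounting factor approach" referred to above passes from discounted to average optimality roughly as follows; the display is a schematic sketch for the cost-minimization case under assumed notation, not the paper's precise statement.

```latex
% Schematic sketch: V_\alpha is the \alpha-discounted optimal cost, x_0 a fixed
% reference state, q(y|x,a) the transition rates, c(x,a) the cost rates, and h a
% suitable (subsequential) limit of h_\alpha as \alpha \downarrow 0.
h_\alpha(x) = V_\alpha(x) - V_\alpha(x_0), \qquad
\rho^{*} = \lim_{\alpha \downarrow 0} \alpha\, V_\alpha(x_0), \qquad
\rho^{*} \;\ge\; \min_{a \in A(x)} \Big[\, c(x,a) + \sum_{y} q(y \mid x,a)\, h(y) \,\Big].
```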

14.
This paper deals with the average expected reward criterion for continuous-time Markov decision processes in general state and action spaces. The transition rates of the underlying continuous-time jump Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We give conditions on the system's primitive data under which we prove the existence of the average reward optimality equation and of an average optimal stationary policy. Also, under our conditions we ensure the existence of ε-average optimal stationary policies. Moreover, we study some properties of average optimal stationary policies. We not only establish another average optimality equation on an average optimal stationary policy, but also present an interesting “martingale characterization” of such a policy. The approach provided in this paper is based on the policy iteration algorithm. It should be noted that our approach is rather different from both the usual “vanishing discounting factor approach” and the “optimality inequality approach” widely used in the previous literature.
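To make the policy-iteration idea concrete, here is a minimal sketch for a finite, unichain, discrete-time average-reward MDP (a much-simplified stand-in for the continuous-time model above; all data and function names are illustrative assumptions).

```python
# Illustrative sketch (assumed finite unichain discrete-time MDP):
# policy iteration for the long-run average reward.
import numpy as np

def evaluate(policy, r, P):
    """Solve g + h(s) = r(s, f(s)) + sum_y P(y|s,f(s)) h(y), with h(0) = 0."""
    S = len(policy)
    P_f = P[np.arange(S), policy]             # (S, S) transition matrix under f
    r_f = r[np.arange(S), policy]             # (S,)  reward vector under f
    # Unknowns: g, h(1), ..., h(S-1)  (h(0) is fixed to 0).
    M = np.zeros((S, S))
    M[:, 0] = 1.0                             # coefficient of g
    M[:, 1:] = np.eye(S)[:, 1:] - P_f[:, 1:]  # coefficients of h(1..S-1)
    sol = np.linalg.solve(M, r_f)
    g, h = sol[0], np.concatenate(([0.0], sol[1:]))
    return g, h

def policy_iteration(r, P, max_iter=1000):
    """r[s, a]: reward; P[s, a, y]: transition law."""
    S, _ = r.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        g, h = evaluate(policy, r, P)
        Q = r + P @ h                         # Q[s, a] = r(s,a) + sum_y P(y|s,a) h(y)
        improved = policy.copy()
        better = Q.max(axis=1) > Q[np.arange(S), policy] + 1e-10
        improved[better] = np.argmax(Q[better], axis=1)   # improve only where strictly better
        if np.array_equal(improved, policy):
            return policy, g, h
        policy = improved
    return policy, g, h

# Tiny random instance, purely for demonstration.
rng = np.random.default_rng(2)
S, A = 4, 3
r = rng.uniform(0, 1, size=(S, A))
P = rng.dirichlet(np.ones(S), size=(S, A))
print(policy_iteration(r, P))
```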

15.
This paper deals with denumerable-state continuous-time controlled Markov chains with possibly unbounded transition and reward rates. It concerns optimality criteria that improve the usual expected average reward criterion. First, we show the existence of average reward optimal policies with minimal average variance. Then we compare the variance minimization criterion with overtaking optimality. We present an example showing that they are opposite criteria, and therefore we cannot optimize them simultaneously. This leads to a multiobjective problem for which we identify the set of Pareto optimal policies (also known as nondominated policies).

16.
Optimization, 2012, 61(4): 773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by a technique of constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth-and-death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.
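For intuition, the risk-sensitive average cost criterion is usually attached to a multiplicative (exponential) optimality relation; the display below is the standard discrete-time equality form under assumed notation, whereas the entry above works with a continuous-time inequality version.

```latex
% \lambda: risk-sensitive optimal average cost, h: a bias-like function,
% c(x,a): cost, p(y|x,a): transition law (risk-sensitivity parameter set to 1).
e^{\lambda + h(x)} \;=\; \min_{a \in A(x)} \Big[\, e^{c(x,a)} \sum_{y} p(y \mid x,a)\, e^{h(y)} \,\Big].
```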

17.
Decision makers often face the need of a performance guarantee with some sufficiently high probability. Such problems can be modelled using a discrete-time Markov decision process (MDP) with a probability criterion for first achieving a target value. The objective is to find a policy that maximizes the probability of the total discounted reward exceeding a target value in the preceding stages. We show that our formulation cannot be described by former models with standard criteria. We provide the properties of the objective functions, optimal value functions and optimal policies. An algorithm for computing the optimal policies for the finite-horizon case is given. In this stochastic stopping model, we prove that there exists an optimal deterministic and stationary policy and that the optimality equation has a unique solution. Using perturbation analysis, we approximate general models and prove the existence of ε-optimal policies for a finite state space. We give an example concerning the reliability of a satellite system.
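Probability criteria of this kind are typically handled by augmenting the state with the running target level; a schematic dynamic-programming recursion (assuming deterministic one-step rewards r(x,a), discount factor β, and notation introduced here for illustration) reads:

```latex
% U(x,\lambda): maximal probability that the total discounted reward starting
% from state x reaches the target level \lambda.
U(x,\lambda) \;=\; \max_{a \in A(x)} \sum_{y} p(y \mid x,a)\;
               U\!\Big(y,\ \tfrac{\lambda - r(x,a)}{\beta}\Big).
```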

18.
This paper studies the bailout optimal dividend problem with regime switching under the constraint that dividend payments can be made only at the arrival times of an independent Poisson process while capital can be injected continuously in time. We show the optimality of the regime-modulated Parisian-classical reflection strategy when the underlying risk model follows a general spectrally negative Markov additive process. In order to verify the optimality, first we study an auxiliary problem driven by a single spectrally negative Lévy process with a final payoff at an exponential terminal time and characterize the optimal dividend strategy. Then, we use the dynamic programming principle to transform the global regime-switching problem into an equivalent local optimization problem with a final payoff up to the first regime switching time. The optimality of the regime-modulated Parisian-classical barrier strategy can be proven by using the results from the auxiliary problem and approximations via recursive iterations.

19.
We consider a discrete-time partially observable zero-sum stochastic game with the average payoff criterion. We study the game by means of an equivalent completely observable game. We show that the game has a value, and we present a pair of optimal strategies for the two players.
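The "equivalent completely observable game" is typically built on the belief (conditional distribution) over the hidden state; the standard Bayes update that drives such a construction (notation assumed here: transition law p, observation kernel q) is:

```latex
% \mu: current belief over hidden states; after the action pair (a,b) is played
% and observation o is received, the updated belief is
\mu'(y) \;=\; \frac{\sum_{x} \mu(x)\, p(y \mid x, a, b)\, q(o \mid y)}
                   {\sum_{y'} \sum_{x} \mu(x)\, p(y' \mid x, a, b)\, q(o \mid y')} .
```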

20.
We study two-person stochastic games with a Polish state space and compact action spaces and with the average payoff criterion under a certain ergodicity condition. For the zero-sum game we establish the existence of a value and of stationary optimal strategies for both players. For the nonzero-sum case, the existence of a Nash equilibrium in stationary strategies is established under certain separability conditions.
