Similar Documents
20 similar documents retrieved (search time: 0 ms)
1.
In this paper, we consider discounted-reward finite-state Markov decision processes which depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1) is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4). This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grant No. PCCBBNA-005008, in part by a grant from the IBM Corporation, in part by the Air Force Office of Scientific Research under Grant No. AFOSR-79-0025, in part by the National Science Foundation under Grant No. ECS-0822033, and in part by the Joint Services Electronics Program under Contract No. F49620-77-C-0101.
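
For readers unfamiliar with the underlying recursion, here is a minimal sketch of value iteration for a finite discounted MDP, the scheme the adaptive policy above builds on. The 2-state model, rewards, and discount factor are illustrative assumptions, not data from the cited paper; a genuinely nonstationary/adaptive variant would replace the transition law at each stage by its current parameter estimate.

```python
import numpy as np

def value_iteration_step(P, R, V, beta):
    """One Bellman update V_{n+1}(x) = max_a [ R(x,a) + beta * sum_y P(y|x,a) V_n(y) ]."""
    # P has shape (A, S, S), R has shape (S, A), V has shape (S,)
    Q = R + beta * np.einsum("asy,y->sa", P, V)
    return Q.max(axis=1), Q.argmax(axis=1)

beta = 0.9
R = np.array([[1.0, 0.5], [0.0, 2.0]])                      # R[s, a], placeholder rewards
P_true = np.array([[[0.8, 0.2], [0.3, 0.7]],                # P_true[a, s, s'], placeholder law
                   [[0.5, 0.5], [0.1, 0.9]]])

V = np.zeros(2)
for n in range(200):
    # An adaptive/nonstationary scheme would use the stage-n estimate of the
    # unknown parameter here; this sketch simply iterates with the true law.
    V, policy = value_iteration_step(P_true, R, V, beta)
print("approximate optimal values:", V, "greedy policy:", policy)
```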

2.
We consider a class of Markov decision processes with finite state and action spaces which, essentially, is determined by the following condition: The state space is irreducible under the action of any stationary policy. However, except for this restriction, the transition law is completely unknown to the controller. In this context, we find a set of policies under which the frequency estimators of the transition law are strongly consistent and then, this result is applied to construct adaptive asymptotically discount-optimal policies. Dedicated to Professor Truman O. Lewis, on the occasion of his sixtieth birthday. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152, and in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant A128CCOEO550 (MT-2).
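
A minimal sketch of the kind of frequency (empirical) estimator of the transition law referred to above; the fallback to a uniform distribution for unvisited state-action pairs and the toy trajectory are assumptions made for illustration only.

```python
import numpy as np

def frequency_estimator(history, n_states, n_actions):
    """Empirical transition probabilities p_hat(y|x,a) from observed triples (x, a, y).

    Counts of observed transitions divided by the number of visits to (x, a);
    unvisited pairs fall back to a uniform distribution (an illustrative choice).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for x, a, y in history:
        counts[x, a, y] += 1
    visits = counts.sum(axis=2, keepdims=True)
    p_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
    return p_hat

# Example: a short trajectory on a toy 2-state, 2-action model.
history = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 0)]
print(frequency_estimator(history, n_states=2, n_actions=2))
```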

3.
We consider Markov Decision Processes under light traffic conditions. We develop an algorithm to obtain asymptotically optimal policies for both the total discounted and the average cost criteria. This gives a general framework for several light traffic results in the literature. We illustrate the method by deriving the asymptotically optimal control of a simple ATM network.

4.
Optimization, 2012, 61(12): 1427-1447
This article is concerned with the limiting average variance for discrete-time Markov control processes in Borel spaces, subject to pathwise constraints. Under suitable hypotheses we show that within the class of deterministic stationary optimal policies for the pathwise constrained problem, there exists one with a minimal variance.

5.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.

6.
The paper deals with a class of discrete-time Markov control processes with Borel state and action spaces, and possibly unbounded one-stage costs. The processes are given by the recursive equations x_{t+1} = F(x_t, a_t, ξ_t), t = 1, 2, …, with i.i.d. ℝ^k-valued random vectors ξ_t whose density ρ is unknown. Assuming observability of ξ_t, and taking advantage of the procedure of statistical estimation of ρ used in a previous work by the authors, we construct an average cost optimal adaptive policy. Received March/Revised version October 1997
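
The following sketch simulates a system of the stated form x_{t+1} = F(x_t, a_t, ξ_t) and forms a crude nonparametric estimate of the unknown noise density ρ from the observed ξ_t. The particular F, the feedback policy, and the noise law below are placeholders for illustration, not the model or the estimation procedure of the paper.

```python
import numpy as np

def F(x, a, xi):
    # Placeholder system dynamics, not taken from the cited paper.
    return 0.5 * x + a + xi

rng = np.random.default_rng(0)
T = 5000
x, states, noises = 0.0, [], []
for t in range(T):
    a = -0.2 * x                      # some stationary feedback policy (assumption)
    xi = rng.normal(0.0, 1.0)         # unknown density rho (here: standard normal)
    noises.append(xi)                 # xi_t is assumed observable, as in the abstract
    x = F(x, a, xi)
    states.append(x)

# A crude nonparametric estimate of rho from the observed xi_t (histogram density).
density, edges = np.histogram(noises, bins=30, density=True)
print("estimated density on", len(density), "bins;",
      "sample mean of xi:", np.mean(noises), "mean state:", np.mean(states))
```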

7.
We are concerned with discrete-time stochastic control models for which the random disturbances are independent with a common unknown distribution. When the state space is compact, we prove that mild continuity conditions are sufficient to obtain adaptive policies which are asymptotically optimal with respect to the discounted reward criterion. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT), Mexico City, Mexico, under Grant No. PCEXCNA-050156.

8.
We consider constrained discounted-cost Markov control processes in Borel spaces, with unbounded costs. Conditions are given for the constrained problem to be solvable, and also equivalent to an equality-constrained (EC) linear program. In addition, it is shown that there is no duality gap between EC and its dual program EC*, and that, under additional assumptions, EC* is also solvable, so that in fact the strong duality condition holds. Finally, a Farkas-like theorem is included, which gives necessary and sufficient conditions for the primal program EC to be consistent.

9.
This paper investigates the problem of the optimal switching among a finite number of Markov processes, generalizing some of the author's earlier results for controlled one-dimensional diffusion. Under rather general conditions, it is shown that the optimal discounted cost function is the unique solution of a functional equation. Under more restrictive assumptions, this function is shown to be the unique solution of some quasi-variational inequalities. These assumptions are verified for a large class of control problems. For controlled Markov chains and controlled one-dimensional diffusion, the existence of a stationary optimal policy is established. Finally, a policy iteration method is developed to calculate an optimal stationary policy, if one exists. This research was sponsored by the Air Force Office of Scientific Research (AFSC), United States Air Force, under Contract No. F-49620-79-C-0165. The author would like to thank the referee for bringing Refs. 7, 8, and 9 to his attention.
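
As a point of reference for the policy iteration method mentioned at the end of the abstract, here is a minimal sketch of policy iteration for a finite discounted controlled Markov chain. The 2-state data are illustrative assumptions, and the sketch does not capture the switching or diffusion setting of the paper.

```python
import numpy as np

def policy_iteration(P, R, beta, n_iter=100):
    """Policy iteration for a finite discounted controlled Markov chain.

    P[a, x, y] are transition probabilities, R[x, a] one-stage rewards.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_iter):
        # Policy evaluation: solve (I - beta * P_pi) V = R_pi for the current policy.
        P_pi = P[policy, np.arange(n_states), :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - beta * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + beta * np.einsum("axy,y->xa", P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V

# Placeholder model data (not from the cited paper).
P = np.array([[[0.9, 0.1], [0.4, 0.6]],
              [[0.2, 0.8], [0.7, 0.3]]])
R = np.array([[0.0, 1.0], [2.0, 0.5]])
print(policy_iteration(P, R, beta=0.95))
```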

10.
We consider a discrete-time constrained Markov decision process under the discounted cost optimality criterion. The state and action spaces are assumed to be Borel spaces, while the cost and constraint functions might be unbounded. We are interested in approximating numerically the optimal discounted constrained cost. To this end, we suppose that the transition kernel of the Markov decision process is absolutely continuous with respect to some probability measure μ. Then, by solving the linear programming formulation of a constrained control problem related to the empirical probability measure μ_n of μ, we obtain the corresponding approximation of the optimal constrained cost. We derive a concentration inequality which gives bounds on the probability that the estimation error is larger than some given constant. This bound is shown to decrease exponentially in n. Our theoretical results are illustrated with a numerical application based on a stochastic version of the Beverton–Holt population model.
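
A minimal sketch of the occupation-measure linear program commonly used for constrained discounted MDPs, of the kind the abstract builds on. The 2-state data, constraint level, and normalization are illustrative assumptions, and the empirical-measure approximation step of the paper is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

beta = 0.9
nu = np.array([0.5, 0.5])                                  # initial distribution (assumption)
P = np.array([[[0.8, 0.2], [0.3, 0.7]],                    # P[a, x, y], placeholder kernel
              [[0.5, 0.5], [0.1, 0.9]]])
cost = np.array([[1.0, 2.0], [0.5, 3.0]])                  # c[x, a], cost to minimize
constraint_cost = np.array([[0.0, 1.0], [1.0, 0.0]])       # d[x, a], constrained cost
theta = 2.0                                                # constraint level (assumption)

n_states, n_actions = cost.shape
# Decision variables: occupation measure z[x, a] >= 0, flattened in row-major order.
c_vec = cost.flatten()
# Flow constraints: sum_a z(y,a) - beta * sum_{x,a} P(y|x,a) z(x,a) = nu(y) for all y.
A_eq = np.zeros((n_states, n_states * n_actions))
for y in range(n_states):
    for x in range(n_states):
        for a in range(n_actions):
            A_eq[y, x * n_actions + a] = (x == y) - beta * P[a, x, y]
b_eq = nu
# Constraint-cost inequality: sum_{x,a} d(x,a) z(x,a) <= theta.
A_ub = constraint_cost.flatten()[None, :]
b_ub = np.array([theta])

res = linprog(c_vec, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print("optimal constrained discounted cost:", res.fun)
```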

11.
This paper is concerned with the adaptive control problem, over the infinite horizon, for partially observable Markov decision processes whose transition functions are parameterized by an unknown vector. We treat finite models and impose relatively mild assumptions on the transition function. Provided that a sequence of parameter estimates converging in probability to the true parameter value is available, we show that the certainty equivalence adaptive policy is optimal in the long-run average sense.

12.
This paper is concerned with the analysis of a Nash equilibrium of a noncooperative game. It can be shown that, without complete information about the other players' objectives or interests, the group of players, as a whole, can reach a Nash equilibrium by adopting a class of adaptive expectation and dynamic adjustment processes. It is shown that, if the expectation and adjustment processes are made continuously, the stability of the overall dynamic process is independent of the specific mechanisms of the expectation and the adjustment, but depends on the properties of each player's objective or payoff function. If, however, expectation and adjustment processes are made at discrete time intervals, the stability of the discrete process depends on the speed of adjustment chosen by each player. This research was supported by ONR Contract No. N00014-75-C-0738. The authors are indebted to the referee for several valuable comments and suggestions for improvement.
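
A minimal sketch of a discrete-time adjustment process of the kind described: each player repeatedly moves toward a best response to the other's last action, and the iterates settle at the Nash equilibrium when the adjustment speeds are small enough. The linear best responses and the speeds below are assumptions made for illustration.

```python
# Illustrative two-player game with linear best responses (placeholder payoffs).
def best_response_1(x2):          # argmax of player 1's payoff given x2
    return 1.0 + 0.4 * x2

def best_response_2(x1):          # argmax of player 2's payoff given x1
    return 2.0 - 0.5 * x1

speed = 0.5                       # adjustment speed chosen by each player (assumption)
x1, x2 = 0.0, 0.0
for t in range(200):
    # Each player adjusts its action toward the best response to the other's last action.
    x1 += speed * (best_response_1(x2) - x1)
    x2 += speed * (best_response_2(x1) - x2)
print("approximate Nash equilibrium:", (x1, x2))   # converges to (1.5, 1.25) here
```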

13.
An adaptive control problem for a discrete-time Markov process that is completely observed in a fixed recurrent domain and partially observed elsewhere is formulated, and a solution is given by constructing an approximately self-optimal strategy. The state space of the Markov process is either a closed subset of Euclidean space or a countable set. Another adaptive control problem is solved where the process is always only partially observed, but there is a family of random times such that the process evaluated at these times forms a family of independent, identically distributed random variables. Accepted 26 April 1996

14.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.
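
To make the last condition concrete, the sketch below computes the unique invariant distribution of the chain induced by a stationary policy and shows that nearby transition matrices give nearby invariant distributions. The 2-state matrices are illustrative assumptions, not data from the paper.

```python
import numpy as np

def invariant_distribution(P_pi):
    """Unique invariant distribution of an irreducible stochastic matrix P_pi.

    Solves pi (I - P_pi) = 0 together with the normalization sum(pi) = 1
    as a least-squares system.
    """
    n = P_pi.shape[0]
    A = np.vstack([(np.eye(n) - P_pi).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two nearby stationary policies induce nearby transition matrices and, by
# continuity, nearby invariant distributions (placeholder 2-state example).
P1 = np.array([[0.7, 0.3], [0.4, 0.6]])
P2 = np.array([[0.69, 0.31], [0.4, 0.6]])
print(invariant_distribution(P1), invariant_distribution(P2))
```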

15.
An inequality regarding the minimum of P(lim inf {X_n ∈ D_n}) is proved for a class of random sequences. This result is related to the problem of sufficiency of Markov strategies for Markov decision processes with the Dubins-Savage criterion, the asymptotic behaviour of nonhomogeneous Markov chains, and some other problems.

16.
The Markov Decision Process (MDP) framework is a tool for the efficient modelling and solving of sequential decision-making problems under uncertainty. However, it reaches its limits when state and action spaces are large, as can happen for spatially explicit decision problems. Factored MDPs and dedicated solution algorithms have been introduced to deal with large factored state spaces. But the case of large action spaces remains an issue. In this article, we define graph-based Markov Decision Processes (GMDPs), a particular Factored MDP framework which exploits the factorization of the state space and the action space of a decision problem. Both spaces are assumed to have the same dimension. Transition probabilities and rewards are factored according to a single graph structure, where nodes represent pairs of state/decision variables of the problem. The complexity of this representation grows only linearly with the size of the graph, whereas the complexity of exact resolution grows exponentially. We propose an approximate solution algorithm exploiting the structure of a GMDP and whose complexity only grows quadratically with the size of the graph and exponentially with the maximum number of neighbours of any node. This algorithm, referred to as MF-API, belongs to the family of Approximate Policy Iteration (API) algorithms. It relies on a mean-field approximation of the value function of a policy and on a search limited to the suboptimal set of local policies. We compare it, in terms of performance, with two state-of-the-art algorithms for Factored MDPs: SPUDD and Approximate Linear Programming (ALP). Our experiments show that SPUDD is not generally applicable to solving GMDPs, due to the size of the action space we want to tackle. On the other hand, ALP can be adapted to solve GMDPs. We show that ALP is faster than MF-API and provides solutions of similar quality for most problems. However, for some problems MF-API provides significantly better policies, and in all cases provides a better approximation of the value function of approximate policies. These promising results show that the GMDP model offers a convenient framework for modelling and solving a large range of spatial and structured planning problems that can arise in many different domains where processes are managed over networks: natural resources, agriculture, computer networks, etc.

17.
Zero-sum stochastic games model situations where two persons, called players, control some dynamic system, and both have opposite objectives. One player wishes typically to minimize a cost which has to be paid to the other player. Such a game may also be used to model problems with a single controller who has only partial information on the system: the dynamic of the system may depend on some parameter that is unknown to the controller, and may vary in time in an unpredictable way. A worst-case criterion may be considered, where the unknown parameter is assumed to be chosen by nature (called player 1), and the objective of the controller (player 2) is then to design a policy that guarantees the best performance under worst-case behaviour of nature. The purpose of this paper is to present a survey of stochastic games in queues, where both tools and applications are considered. The first part is devoted to the tools. We present some existing tools for solving finite horizon and infinite horizon discounted Markov games with unbounded cost, and develop new ones that are typically applicable in queueing problems. We then present some new tools and theory of expected average cost stochastic games with unbounded cost. In the second part of the paper we present a survey on existing results on worst-case control of queues, and illustrate the structural properties of best policies of the controller, worst-case policies of nature, and of the value function. Using the theory developed in the first part of the paper, we extend some of the above results, which were known to hold for finite horizon costs or for the discounted cost, to the expected average cost.
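
For the discounted case mentioned among the tools, a standard solution method is Shapley's value iteration, in which each state contributes a matrix game solved by linear programming. The sketch below is illustrative, with placeholder payoff and transition data rather than a queueing model from the survey.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes), via LP."""
    m, n = M.shape
    # Variables: mixed strategy p (m entries) and the game value v.
    c = np.zeros(m + 1); c[-1] = -1.0                      # maximize v
    A_ub = np.hstack([-M.T, np.ones((n, 1))])              # v - p^T M[:, j] <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0         # sum(p) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return -res.fun

def shapley_iteration(P, R, beta, n_iter=200):
    """Shapley (value) iteration for a finite discounted zero-sum stochastic game.

    R[x] is the one-stage payoff matrix in state x; P[x, i, j] is the
    distribution over next states under the action pair (i, j).
    """
    n_states = len(R)
    V = np.zeros(n_states)
    for _ in range(n_iter):
        V = np.array([matrix_game_value(R[x] + beta * P[x] @ V) for x in range(n_states)])
    return V

# A toy 2-state game with 2 actions per player; the data are placeholders.
R = [np.array([[1.0, -1.0], [0.0, 2.0]]), np.array([[0.0, 1.0], [-1.0, 0.5]])]
P = np.array([[[[0.8, 0.2], [0.5, 0.5]], [[0.3, 0.7], [0.6, 0.4]]],
              [[[0.4, 0.6], [0.9, 0.1]], [[0.2, 0.8], [0.5, 0.5]]]])
print(shapley_iteration(P, R, beta=0.9))
```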

18.
Optimization, 2012, 61(4): 773-800
In this paper we study the risk-sensitive average cost criterion for continuous-time Markov decision processes in the class of all randomized Markov policies. The state space is a denumerable set, and the cost and transition rates are allowed to be unbounded. Under suitable conditions, we establish the optimality equation of the auxiliary risk-sensitive first passage optimization problem and obtain the properties of the corresponding optimal value function. Then, by a technique of constructing appropriate approximating sequences of the cost and transition rates and employing the results on the auxiliary optimization problem, we show the existence of a solution to the risk-sensitive average optimality inequality and develop a new approach, called the risk-sensitive average optimality inequality approach, to prove the existence of an optimal deterministic stationary policy. Furthermore, we give some sufficient conditions for the verification of the simultaneous Doeblin condition, use a controlled birth and death system to illustrate our conditions, and provide an example for which the risk-sensitive average optimality strict inequality occurs.

19.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that, if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
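
To illustrate the threshold form claimed for the single-facility case, the sketch below runs value iteration on a toy discrete-time admission-control queue and prints the greedy policy, which comes out monotone in the queue length. The one-event-per-slot dynamics, costs, and parameters are assumptions for illustration, not the general model of the paper.

```python
import numpy as np

# Toy admission-control queue: at most one event per slot, an arrival with
# probability p and a service completion with probability q (p + q <= 1).
N, p, q, h, r, beta = 20, 0.3, 0.4, 1.0, 10.0, 0.95

def bellman(V):
    """One value-iteration step; returns updated values and greedy actions."""
    newV = np.zeros(N + 1)
    policy = np.zeros(N + 1, dtype=int)
    for x in range(N + 1):
        vals = []
        for a in (0, 1):                               # 0 = reject arrivals, 1 = admit
            if a == 1 and x < N:
                up, cost = x + 1, h * x - r * p        # expected admission reward r per arrival
            else:
                up, cost = x, h * x                    # reject, or the buffer is full
            down = max(x - 1, 0)
            vals.append(cost + beta * (p * V[up] + q * V[down] + (1 - p - q) * V[x]))
        policy[x] = int(np.argmin(vals))
        newV[x] = min(vals)
    return newV, policy

V = np.zeros(N + 1)
for _ in range(500):
    V, policy = bellman(V)
# The greedy policy is typically monotone: admit below a threshold, reject above it.
print("optimal action by queue length:", policy)
```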

20.
In this article, an adaptive fuzzy output tracking control approach is proposed for a class of multiple-input and multiple-output uncertain switched nonlinear systems with unknown control directions and under arbitrary switchings. In the control design, fuzzy logic systems are used to identify the unknown switched nonlinear systems. A Nussbaum gain function is introduced into the control design and the unknown control direction problem is solved. Under the framework of the backstepping control design, fuzzy adaptive control, and common Lyapunov function stability theory, a new adaptive fuzzy output tracking control method is developed. It is proved that the proposed control approach can guarantee that all the signals in the closed-loop system are bounded and that the tracking error remains within an adjustable neighborhood of the origin. A numerical example is provided to illustrate the effectiveness of the proposed approach. © 2015 Wiley Periodicals, Inc. Complexity 21: 155–166, 2016
