Similar Literature
13 similar documents found (search time: 15 ms)
1.
We are concerned with discrete-time stochastic control models for which the random disturbances are independent with a common unknown distribution. When the state space is compact, we prove that mild continuity conditions are sufficient to obtain adaptive policies which are asymptotically optimal with respect to the discounted reward criterion. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT), Mexico City, Mexico, under Grant No. PCEXCNA-050156.
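As a rough illustration of the estimate-and-optimize idea behind such adaptive policies (a minimal sketch, not the paper's construction): re-solve a discounted dynamic program using the empirical distribution of the observed disturbances. The dynamics F, reward r, and the finite grids standing in for the compact spaces are all hypothetical placeholders.

```python
import numpy as np

def discounted_vi(F, r, xi_probs, xi_vals, states, actions, alpha=0.9, iters=500):
    """Value iteration for x' = F(x, a, xi); F is assumed to map back into `states`."""
    idx = {s: i for i, s in enumerate(states)}
    v = np.zeros(len(states))
    for _ in range(iters):
        v = np.array([max(sum(p * (r(x, a) + alpha * v[idx[F(x, a, xi)]])
                              for p, xi in zip(xi_probs, xi_vals))
                          for a in actions) for x in states])
    policy = {x: actions[int(np.argmax([sum(p * (r(x, a) + alpha * v[idx[F(x, a, xi)]])
                                            for p, xi in zip(xi_probs, xi_vals))
                                        for a in actions]))] for x in states}
    return v, policy

def adaptive_policy(observed_xi, xi_support, F, r, states, actions, alpha=0.9):
    """Re-solve the DP with the empirical disturbance distribution (uniform if no data yet)."""
    counts = np.array([np.sum(np.asarray(observed_xi) == xi) for xi in xi_support], float)
    probs = counts / counts.sum() if counts.sum() else np.full(len(xi_support), 1.0 / len(xi_support))
    return discounted_vi(F, r, probs, xi_support, states, actions, alpha)[1]
```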

2.
In this paper, we consider discounted-reward finite-state Markov decision processes which depend on unknown parameters. An adaptive policy inspired by the nonstationary value iteration scheme of Federgruen and Schweitzer (Ref. 1) is proposed. This policy is briefly compared with the principle of estimation and control recently obtained by Schäl (Ref. 4). This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grant No. PCCBBNA-005008, in part by a grant from the IBM Corporation, in part by the Air Force Office of Scientific Research under Grant No. AFOSR-79-0025, in part by the National Science Foundation under Grant No. ECS-0822033, and in part by the Joint Services Electronics Program under Contract No. F49620-77-C-0101.
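A minimal sketch of the nonstationary-value-iteration flavor of adaptive control (not Federgruen and Schweitzer's exact scheme): at each stage, perform one Bellman update under the current model estimate and apply the greedy action. The estimator `estimate_P` and environment `env_step` are hypothetical placeholders.

```python
import numpy as np

def nvi_step(v, P_hat, R, alpha):
    """One value-iteration sweep under the estimated model.

    P_hat[a] is an |S| x |S| transition matrix, R[a] an |S| reward vector."""
    q = np.stack([R[a] + alpha * P_hat[a] @ v for a in range(len(P_hat))])
    return q.max(axis=0), q.argmax(axis=0)

def run_nvi(env_step, estimate_P, R, n_states, n_actions, alpha=0.9, T=1000, x0=0):
    """Interleave estimation, a single Bellman update, and greedy action selection."""
    v, x, history = np.zeros(n_states), x0, []
    rng = np.random.default_rng(0)
    for t in range(T):
        # estimate_P must return a default model (e.g. uniform) when history is empty
        P_hat = estimate_P(history, n_states, n_actions)
        v, greedy = nvi_step(v, P_hat, R, alpha)
        a = int(greedy[x]) if t > 0 else int(rng.integers(n_actions))
        x_next = env_step(x, a)
        history.append((x, a, x_next))
        x = x_next
    return v
```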

3.
Milito and Cruz have introduced a novel adaptive control scheme for finite Markov chains when a finite parametrized family of possible transition matrices is available. The scheme involves the minimization of a composite functional of the observed history of the process, incorporating both control and estimation aspects. We prove the a.s. optimality of a similar scheme when the state space is countable and the parameter space is a compact subset of R^d.
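The exact composite functional of Milito and Cruz is not reproduced here; the sketch below only illustrates the general idea of blending estimation and control: over a finite grid standing in for the compact parameter set, pick the parameter minimizing a negative log-likelihood plus a vanishing-weight optimal-cost term, then act optimally for that parameter. All names are illustrative assumptions.

```python
import numpy as np

def composite_estimate(history, P_family, optimal_cost, eps_t):
    """Minimize (negative log-likelihood) + eps_t * (optimal cost) over a parameter grid.

    history: list of (x, a, x_next); P_family[theta][a][x, x_next] gives the
    transition probability under theta; optimal_cost[theta] is the optimal
    discounted cost under theta.  The eps_t * optimal_cost term biases the
    estimate toward parameters under which good performance is achievable,
    coupling identification with control (the Milito-Cruz idea, schematically)."""
    best_theta, best_val = None, np.inf
    for theta, P in P_family.items():
        nll = -sum(np.log(max(P[a][x, x_next], 1e-300)) for x, a, x_next in history)
        val = nll + eps_t * optimal_cost[theta]
        if val < best_val:
            best_theta, best_val = theta, val
    return best_theta
```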

4.
This paper is concerned with the adaptive control problem, over the infinite horizon, for partially observable Markov decision processes whose transition functions are parameterized by an unknown vector. We treat finite models and impose relatively mild assumptions on the transition function. Provided that a sequence of parameter estimates converging in probability to the true parameter value is available, we show that the certainty equivalence adaptive policy is optimal in the long-run average sense.
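A minimal sketch of the certainty equivalence principle in the fully observed finite case (the paper's partially observable setting would work on belief states instead): at each stage, solve the model as if the current parameter estimate were the true value and apply the resulting optimal policy. The `model` and `env_step` callables are hypothetical.

```python
import numpy as np

def solve_mdp(P, R, alpha=0.95, iters=1000):
    """Standard value iteration; returns a greedy stationary policy."""
    nA, nS = len(P), P[0].shape[0]
    v = np.zeros(nS)
    for _ in range(iters):
        q = np.stack([R[a] + alpha * P[a] @ v for a in range(nA)])
        v = q.max(axis=0)
    return q.argmax(axis=0)

def certainty_equivalence(estimates, model, env_step, x0):
    """Act as if each successive estimate theta_hat_t were the true parameter.

    estimates: iterable of parameter estimates (converging in probability to
    the true value); model(theta) -> (P, R) builds the MDP under theta."""
    x = x0
    for theta_hat in estimates:
        policy = solve_mdp(*model(theta_hat))
        x = env_step(x, int(policy[x]))
    return x
```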

5.
6.
Recent results for parameter-adaptive Markov decision processes (MDP's) are extended to partially observed MDP's depending on unknown parameters. These results include approximations converging uniformly to the optimal reward function and asymptotically optimal adaptive policies. This research was supported in part by the Consejo del Sistema Nacional de Educación Tecnológica (COSNET) under Grant 178/84, in part by the Air Force Office of Scientific Research under Grant AFOSR-84-0089, in part by the National Science Foundation under Grant ECS-84-12100, and in part by the Joint Services Electronics Program under Contract F49602-82-C-0033.
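Adaptive results for partially observed models typically pass through the belief-state (information-state) reformulation; below is a minimal sketch of the Bayes filter for a finite POMDP under an estimated parameter, with hypothetical names.

```python
import numpy as np

def belief_update(b, a, y, P_theta, Q_theta):
    """Bayes filter for a finite POMDP under (estimated) parameter theta.

    b: current belief over states (length |S|); a: action taken; y: observation.
    P_theta[a][x, x'] is the transition probability, Q_theta[x', y] the
    observation probability.  Returns b'(x') proportional to
    Q(y | x') * sum_x b(x) P(x' | x, a)."""
    pred = b @ P_theta[a]          # predicted state distribution after action a
    post = pred * Q_theta[:, y]    # reweight by the observation likelihood
    s = post.sum()
    return post / s if s > 0 else np.full_like(b, 1.0 / len(b))
```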

7.
This paper is a survey of recent results on continuous-time Markov decision processes (MDPs) with unbounded transition rates, and reward rates that may be unbounded from above and from below. These results pertain to discounted and average reward optimality criteria, which are the most commonly used criteria, and also to more selective concepts, such as bias optimality and sensitive discount criteria. For concreteness, we consider only MDPs with a countable state space, but we indicate how the results can be extended to more general MDPs or to Markov games. Research partially supported by grants NSFC, DRFP and NCET. Research partially supported by CONACyT (Mexico) Grant 45693-F.
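For bounded transition rates, the classical uniformization device reduces a continuous-time discounted MDP to an equivalent discrete-time one; the unbounded-rate results surveyed here are precisely about going beyond this baseline. A minimal sketch of the bounded-rate construction:

```python
import numpy as np

def uniformize(Q, r, rho):
    """Convert a bounded-rate CTMDP to an equivalent discrete-time discounted MDP.

    Q[a] is the generator matrix for action a (off-diagonal rates >= 0, rows
    summing to 0, with at least one positive exit rate), r[a] the reward-rate
    vector, rho > 0 the continuous discount rate.  Returns (P, R, alpha)."""
    Lam = max(np.max(-np.diagonal(Q[a])) for a in range(len(Q)))  # uniformization constant
    P = [np.eye(Q[a].shape[0]) + Q[a] / Lam for a in range(len(Q))]  # P = I + Q / Lam
    R = [r[a] / (Lam + rho) for a in range(len(Q))]
    alpha = Lam / (Lam + rho)   # effective discrete-time discount factor
    return P, R, alpha
```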

8.
In this paper, we first study the problem of nonparametric estimation of the stationary density f of a discrete-time Markov chain (X_i). We consider a collection of projection estimators on finite-dimensional linear spaces, and select an estimator from the collection by minimizing a penalized contrast. The same technique enables us to estimate the density g of (X_i, X_{i+1}) and thus to provide an adaptive estimator of the transition density π = g/f. We give bounds in L^2 norm for these estimators and show that they are adaptive in the minimax sense over a large class of Besov spaces. Some examples and simulations are also provided.
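A minimal sketch of the penalized projection idea on the simplest model collection, regular histograms on [0, 1]: the empirical contrast of the projection estimator reduces to minus the sum of squared coefficients, and the model dimension is selected by adding a penalty proportional to dimension/n. The penalty constant c is a tuning assumption, not the paper's calibrated value.

```python
import numpy as np

def histogram_projection_estimate(X, max_D=50, c=2.0):
    """Adaptive projection estimator of a stationary density on [0, 1].

    For the histogram model of dimension D, the projection coefficients are
    a_j = (1/n) sum_i phi_j(X_i) with phi_j = sqrt(D) * 1_{bin j}, and the
    penalized contrast is -sum_j a_j**2 + c * D / n; D is chosen to minimize it."""
    X = np.asarray(X)
    n = len(X)
    best_D, best_crit = 1, np.inf
    for D in range(1, max_D + 1):
        counts, _ = np.histogram(X, bins=D, range=(0.0, 1.0))
        a = counts * np.sqrt(D) / n            # empirical projection coefficients
        crit = -np.sum(a ** 2) + c * D / n     # penalized contrast
        if crit < best_crit:
            best_D, best_crit = D, crit
    counts, edges = np.histogram(X, bins=best_D, range=(0.0, 1.0))
    heights = counts * best_D / n              # estimated density value on each bin
    return best_D, edges, heights
```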

9.
The asymptotic equipartition property is a basic theorem in information theory. In this paper, we study the strong law of large numbers for Markov chains in a single-infinite Markovian environment on a countable state space. As a corollary, we obtain strong laws of large numbers for the frequencies of occurrence of states and of ordered couples of states for this process. Finally, we give the asymptotic equipartition property of Markov chains in a single-infinite Markovian environment on a countable state space.
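As a numerical illustration of the asymptotic equipartition property in the simplest setting (an ordinary ergodic finite Markov chain, not the paper's Markovian-environment model): the per-symbol negative log-likelihood of a long sample path approaches the entropy rate.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                       # ergodic two-state chain
# stationary distribution: left eigenvector of P for eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
H = -np.sum(pi[:, None] * P * np.log(P))         # entropy rate: sum_i pi_i sum_j P_ij log(1/P_ij)

n = 200_000
x = np.empty(n, dtype=int)
x[0] = 0
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])
loglik = np.log(pi[x[0]]) + np.sum(np.log(P[x[:-1], x[1:]]))
print(-loglik / n, "vs entropy rate", H)         # the two agree for large n (AEP)
```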

10.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy, the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy, the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.
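The continuity condition above involves the invariant distribution of the chain induced by each stationary policy; a minimal finite-state sketch (standing in for the countable case) of computing that distribution:

```python
import numpy as np

def invariant_distribution(P, policy):
    """Invariant distribution of the chain induced by a stationary policy.

    P[a][x, x'] are the transition matrices; policy[x] is the action at x.
    Solves mu (P_policy - I) = 0 together with sum(mu) = 1; the solution is
    unique under communicating-class / ergodicity conditions like the paper's."""
    n = P[0].shape[0]
    P_pol = np.array([P[policy[x]][x] for x in range(n)])  # row x uses action policy[x]
    A = np.vstack([P_pol.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu
```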

11.
In this paper, we study the strong law of large numbers and the Shannon–McMillan theorem for nonhomogeneous Markov chains indexed by a Cayley tree. This article generalizes the corresponding results for level-nonhomogeneous Markov chains indexed by a Cayley tree.
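As a simulation illustration of the tree-indexed strong law of large numbers in the simplest homogeneous case (the paper treats the nonhomogeneous one): empirical state frequencies over a binary tree approach the stationary distribution of the transition matrix. The chain and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])              # two-state transition matrix along tree edges
b, depth = 2, 16                        # binary rooted tree, 16 levels
level = np.array([0])                   # root starts in state 0
counts = np.array([1.0, 0.0])
for n in range(depth):
    # each vertex at the current level spawns b children, each child's state
    # drawn from the parent's row of P (two-state sampling shortcut below)
    parents = np.repeat(level, b)
    u = rng.random(len(parents))
    children = (u > P[parents, 0]).astype(int)
    counts += np.bincount(children, minlength=2)
    level = children
freq = counts / counts.sum()
print(freq, "vs stationary distribution", [4 / 7, 3 / 7])  # pi solves pi P = pi
```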

12.
Zero-sum stochastic games model situations where two persons, called players, control some dynamic system and have opposite objectives. One player typically wishes to minimize a cost which has to be paid to the other player. Such a game may also be used to model problems with a single controller who has only partial information on the system: the dynamics of the system may depend on some parameter that is unknown to the controller and may vary in time in an unpredictable way. A worst-case criterion may be considered, where the unknown parameter is assumed to be chosen by nature (called player 1), and the objective of the controller (player 2) is then to design a policy that guarantees the best performance under worst-case behaviour of nature. The purpose of this paper is to present a survey of stochastic games in queues, where both tools and applications are considered. The first part is devoted to the tools. We present some existing tools for solving finite-horizon and infinite-horizon discounted Markov games with unbounded cost, and develop new ones that are typically applicable in queueing problems. We then present some new tools and theory for expected average cost stochastic games with unbounded cost. In the second part of the paper we survey existing results on worst-case control of queues, and illustrate the structural properties of best policies of the controller, worst-case policies of nature, and of the value function. Using the theory developed in the first part of the paper, we extend some of the above results, which were known to hold for finite-horizon costs or for the discounted cost, to the expected average cost.
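The basic tool behind discounted zero-sum Markov (stochastic) games is Shapley's value iteration, where each stage solves a matrix game; a minimal sketch for the finite, bounded-cost case (the paper's point is extending such tools to unbounded costs and average criteria):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes) via LP."""
    m, k = M.shape
    c = np.append(np.zeros(m), -1.0)            # variables (p_1..p_m, v); maximize v
    A_ub = np.hstack([-M.T, np.ones((k, 1))])   # v <= p^T M e_j for every column j
    b_ub = np.zeros(k)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_iteration(r, P, alpha=0.9, iters=200):
    """Discounted zero-sum stochastic game: v(x) = val[ r(x, a, b) + alpha * E v ].

    r[x] is the |A| x |B| stage-payoff matrix at state x; P[x][a, b] is the
    next-state distribution (length |S|).  Returns the approximate value."""
    nS = len(r)
    v = np.zeros(nS)
    for _ in range(iters):
        v = np.array([matrix_game_value(r[x] + alpha * P[x] @ v) for x in range(nS)])
    return v
```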

13.