Similar Literature
20 similar records found.
1.
We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.
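As a rough illustration of the adaptive value-iteration idea (a sketch only, with hypothetical names and a finite, discounted model rather than the paper's Borel-space setting), the unknown parameter can be re-estimated at every stage and the estimate plugged straight into the next Bellman backup:

```python
import numpy as np

def bellman_backup(P_hat, r, v, beta=0.95):
    """One value-iteration sweep under the current estimated transition law.

    P_hat : (A, S, S) array of estimated transition matrices, one per action
    r     : (S, A) array of one-step rewards
    v     : (S,) current value-function iterate
    """
    # Q(s, a) = r(s, a) + beta * sum_{s'} P_hat[a, s, s'] * v[s']
    q = r + beta * np.einsum('ast,t->sa', P_hat, v)
    return q.max(axis=1), q.argmax(axis=1)   # updated values and greedy policy

def recursive_adaptive_vi(estimate_P, r, n_stages):
    """Interleave estimation with value iteration: the stage-k estimate of the
    unknown transition law is used immediately in the stage-k backup."""
    v = np.zeros(r.shape[0])
    policy = None
    for k in range(n_stages):
        P_hat = estimate_P(k)          # plug-in estimate of the unknown parameter
        v, policy = bellman_backup(P_hat, r, v)
    return v, policy
```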

2.
We are concerned with Markov decision processes with countable state space and discrete-time parameter. The main structural restriction on the model is the following: under the action of any stationary policy the state space is a communicating class. In this context, we prove the equivalence of ten stability/ergodicity conditions on the transition law of the model, which imply the existence of average optimal stationary policies for an arbitrary continuous and bounded reward function; these conditions include the Lyapunov function condition (LFC) introduced by A. Hordijk. As a consequence of our results, the LFC is proved to be equivalent to the following: under the action of any stationary policy the corresponding Markov chain has a unique invariant distribution which depends continuously on the stationary policy being used. A weak form of the latter condition was used by one of the authors to establish the existence of optimal stationary policies using an approach based on renewal theory. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152.
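For orientation, Lyapunov (drift) conditions of the kind equated here are often stated in roughly the following form (an illustrative formulation, not necessarily Hordijk's exact LFC): there exist a function V ≥ 1 on the state space, a finite set C, and a constant b < ∞ such that

\[ \sum_{y} p(y \mid x, a)\, V(y) \;\le\; V(x) - 1 + b\,\mathbf{1}_{C}(x) \qquad \text{for all states } x \text{ and admissible actions } a. \]

Such a drift inequality forces the chain induced by every stationary policy back into C in finite expected time, which is what yields a unique invariant distribution under each stationary policy.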

3.
This paper concerns countable state space Markov decision processes endowed with a (long-run expected) average reward criterion. For these models we summarize and, in some cases, extend some recent results on sufficient conditions to establish the existence of optimal stationary policies. The topics considered are the following: (i) the new assumptions introduced by Sennott in [20–23], (ii) necessary and sufficient conditions for the existence of a bounded solution to the optimality equation, and (iii) equivalence of average optimality criteria. Some problems are posed. This research was partially supported by the Third World Academy of Sciences (TWAS) under Grant No. TWAS RG MP 898-152.
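For reference, the average-reward optimality equation mentioned in (ii) typically reads

\[ \rho + h(x) \;=\; \sup_{a \in A(x)} \Big\{ r(x,a) + \sum_{y} p(y \mid x,a)\, h(y) \Big\}, \qquad x \in S, \]

where \(\rho\) is the optimal average reward (gain) and \(h\) a relative value (bias) function; a bounded solution \((\rho, h)\) together with a selector attaining the supremum yields an average optimal stationary policy.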

4.
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and a finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem due to the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties for average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available. This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.
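The uncountable state of the equivalent problem is the belief (information) state; after taking action a and observing o, it is updated by the standard Bayes rule

\[ b'(s') \;=\; \frac{O(o \mid s', a)\, \sum_{s} P(s' \mid s, a)\, b(s)}{\sum_{s''} O(o \mid s'', a)\, \sum_{s} P(s'' \mid s, a)\, b(s)}, \]

where \(b\) is the current distribution over core states, \(P\) the core transition law, and \(O\) the observation kernel.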

5.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the average expected cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose another set of conditions under which we first establish an average optimality inequality by using the well-known "vanishing discount factor approach". Then, when the cost (or reward) rates are nonnegative (or nonpositive), from the average optimality inequality we prove the existence of an average optimal stationary policy among all randomized history-dependent policies by using the Dynkin formula and the Tauberian theorem. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy among all (deterministic) stationary policies by constructing a "new" cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).

6.
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of Asymptotically Stationary (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithm of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion. This work was supported in part through United States–Israel Binational Science Foundation Grant BSF 85-00306.
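In one common formulation (shown only for orientation, not necessarily the exact criteria used here), the constrained problem is to maximize a long-run average reward subject to average-cost constraints:

\[ \max_{\pi}\ \liminf_{T\to\infty} \frac{1}{T}\, E^{\pi}\Big[\sum_{t=0}^{T-1} r(X_t, A_t)\Big] \quad \text{s.t.} \quad \limsup_{T\to\infty} \frac{1}{T}\, E^{\pi}\Big[\sum_{t=0}^{T-1} c_i(X_t, A_t)\Big] \le V_i, \quad i = 1,\dots,k. \]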

7.
We consider a class of discrete-time Markov control processes with Borel state and action spaces, and d-dimensional i.i.d. disturbances with unknown distribution. Under mild semi-continuity and compactness conditions, and assuming that the disturbance distribution is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results are obtained by taking advantage of some well-known facts in the theory of density estimation. This approach allows us to avoid restrictive conditions on the state space and/or on the system's transition law imposed in recent works, and, on the other hand, it clearly shows the way to other applications of nonparametric (density) estimation to adaptive control. Research partially supported by The Third World Academy of Sciences under Research Grant No. MP 898-152.
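A minimal sketch of the density-estimation ingredient (hypothetical names; the paper works with general Borel-space models, this only shows the nonparametric plug-in idea with a Gaussian kernel):

```python
import numpy as np

def kernel_density_estimate(samples, grid, bandwidth):
    """Gaussian-kernel estimate of the unknown disturbance density, evaluated
    on a grid -- the kind of estimator that can be plugged into the controlled
    system's transition law."""
    samples = np.asarray(samples)[:, None]          # shape (n, 1)
    grid = np.asarray(grid)[None, :]                # shape (1, m)
    z = (grid - samples) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=0) / bandwidth         # density values on the grid

# Example: estimate a noise density from simulated disturbances.
rng = np.random.default_rng(0)
xi = rng.normal(0.0, 1.0, size=500)                  # observed disturbances
grid = np.linspace(-4, 4, 201)
f_hat = kernel_density_estimate(xi, grid, bandwidth=0.3)
```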

8.
Let x_k be the state variable (solution) of a stochastic difference equation. This paper gives laws of the iterated logarithm for {x_k} and for the lagged products {x_k x_{k−τ}}, which, being strongly correlated, are neither stationary nor ergodic. The results obtained are then applied to the Kalman filter and the LQG control problem. Work supported by the National Natural Science Foundation of China and the TWAS Research Grant No. 87-43.
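For comparison, the classical (Hartman–Wintner) law of the iterated logarithm for i.i.d. summands with variance \(\sigma^2\) — the statement extended here to strongly correlated, nonstationary sequences — reads

\[ \limsup_{n\to\infty} \frac{\sum_{k=1}^{n} (\xi_k - E\xi_k)}{\sqrt{2\, n \log\log n}} \;=\; \sigma \quad \text{a.s.} \]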

9.
The aim of the paper is to show that Lyapunov-like ergodicity conditions on Markov decision processes with Borel state space and possibly unbounded cost provide an approximation of an average cost optimal policy by solving n-stage optimization problems (n = 1, 2, ...). The approach used ensures an exponential rate of convergence. An approximation of this type would be useful for finding adaptive control procedures and for estimating the stability of an optimal control under disturbances of the transition probability. Research supported in part by Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant 0635P-E9506. Research supported by Fondo del Sistema de Investigación del Mar de Cortés under Grant SIMAC/94/CT-005.

10.
The unichain classification problem detects whether a finite-state, finite-action MDP is unichain under all deterministic policies. This problem is NP-hard. This paper provides polynomial algorithms for this problem when there is a state that is either recurrent under all deterministic policies or absorbing under some action.
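Of the two structural conditions above, "absorbing under some action" is straightforward to check directly; a sketch (with a hypothetical data layout, and not the paper's classification algorithm itself) is:

```python
def find_absorbing_state(P):
    """Return a state s that is absorbing under some action, or None.

    P[s][a] is the transition distribution out of state s under action a;
    s is absorbing under a if all probability mass stays at s.
    """
    for s, actions in enumerate(P):
        for dist in actions:
            if abs(dist[s] - 1.0) < 1e-12:     # all probability returns to s
                return s
    return None

# Example: a 2-state MDP where state 1 is absorbing under its second action.
P = [
    [[0.5, 0.5], [0.2, 0.8]],   # state 0: no action keeps all mass at 0
    [[0.3, 0.7], [0.0, 1.0]],   # state 1: the second action is absorbing
]
print(find_absorbing_state(P))  # -> 1
```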

11.
The aim of this work is to study the stability of some linear partial functional differential equations. We assume that the linear part is non-densely defined and satisfies the Hille-Yosida condition. Using positivity, we give necessary and sufficient conditions, independent of the delay, to ensure the uniform exponential stability of the solution semigroup. An application is given for a reaction-diffusion equation with several delays. This work is supported by the Moroccan Grant PARS MI 36 and TWAS Grant under contract No. 00-412 RG/MATHS/AF/AC.

12.
In this paper we present a fast parallel algorithm for constructing a depth-first search tree for an undirected graph. The algorithm is an RNC algorithm, meaning that it is a probabilistic algorithm that runs in polylog time using a polynomial number of processors on a P-RAM. The run time of the algorithm is O(T_MM(n) log^3 n), and the number of processors used is P_MM(n), where T_MM(n) and P_MM(n) are the time and number of processors needed to find a minimum weight perfect matching on an n-vertex graph with maximum edge weight n. This research was done while the first author was visiting the Mathematical Research Institute in Berkeley. Research supported in part by NSF Grant 8120790. Supported by Air Force Grant AFOSR-85-0203A.

13.
Summary. In this paper, we continue our study of the location of the zeros and poles of general Padé approximants to e^z. We state and prove here new results for the asymptotic location of the normalized zeros and poles for sequences of Padé approximants to e^z, and for the asymptotic location of the normalized zeros for the associated Padé remainders to e^z. In so doing, we obtain new results for nontrivial zeros of Whittaker functions, and also generalize earlier results of Szegö and Olver. Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-74-2688. Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-74-2729, and by the Energy Research and Development Administration (ERDA) under Grant EY-76-S-02-2075.
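Recall that the (m, n) Padé approximant to e^z is the rational function P_m/Q_n, with deg P_m ≤ m and deg Q_n ≤ n, determined (up to normalization) by the matching condition

\[ e^{z} - \frac{P_m(z)}{Q_n(z)} \;=\; O\!\left(z^{\,m+n+1}\right) \quad \text{as } z \to 0; \]

the results above concern where the zeros of P_m, Q_n, and the remainder lie after suitable normalization.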

14.
The set of Nash equilibria is computed for some generalized games. It is also studied for a subclass of standard n-person games. The authors acknowledge the support of CONICET (Consejo de Investigaciones Cientificas y Tecnicas de la Republica Argentina). The first author acknowledges support from TWAS (Third World Academy of Sciences), Grant No. 86-33.

15.
Let L be a Lie algebra over a field K which acts as K-derivations on a K-algebra R. Then this action determines a crossed product R*U(L), where U(L) is the enveloping algebra of L. The goal of this paper is to describe the Jacobson radical of R*U(L) for L ≠ 0. We are most successful when R is a p.i. algebra or Noetherian. In more general situations we at least obtain upper and lower bounds for J(R*U(L)) which are ideals extended from R. Furthermore, we offer an interesting example, in all characteristics, of a commutative K-algebra C which admits a derivation δ such that C is δ-prime but not semiprime. Partially supported by N.S.F. Grant No. DMS 85-00959 and by a Guggenheim Memorial Foundation Fellowship. Partially supported by N.S.F. Grant No. MCS 82-19678.

16.
Summary. In this paper, we establish the sharpness of a theorem concerning zero-free parabolic regions for certain sequences of polynomials satisfying a three-term recurrence relation. Similarly, we establish the sharpness of a zero-free sectorial region for certain sequences of Padé approximants to e^z. Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-74-2688. Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-74-2729, and by the Energy Research and Development Administration (ERDA) under Grant E(11-1)-2075.

17.
This paper studies both the average sample-path reward (ASPR) criterion and the limiting average variance criterion for denumerable discrete-time Markov decision processes. The rewards may have neither upper nor lower bounds. We give sufficient conditions on the system's primitive data under which we prove the existence of ASPR-optimal stationary policies and variance-optimal policies. Our conditions are weaker than those in the previous literature. Moreover, our results are illustrated by a controlled queueing system. Research partially supported by the Natural Science Foundation of Guangdong Province (Grant No. 06025063) and the Natural Science Foundation of China (Grant No. 10626021).
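For reference, the average sample-path reward of a policy \(\pi\) from initial state \(x\) is typically defined pathwise, rather than in expectation, as

\[ J(\pi, x) \;=\; \liminf_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} r(x_t, a_t) \quad P_x^{\pi}\text{-a.s.}, \]

in contrast with the usual expected-average criterion obtained by taking \(E_x^{\pi}\) of the partial sums before passing to the limit.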

18.
This article investigates the generators of certain homogeneous ideals which are associated with graphs with bounded independence numbers. These ideals first appeared in the theory of t-designs. The main theorem suggests a new approach to the Clique Problem, which is NP-complete. This theorem has a more general form in commutative algebra dealing with ideals associated with unions of linear varieties. This general theorem is stated in the article; a corollary to it generalizes Turán's theorem on the maximum graphs with a prescribed clique number. Research supported in part by NSF Grant MCS77-03533.

19.
The law of the iterated logarithm is proved for C[0,1]-valued random variables under conditions related to those used to establish the central limit theorem. Supported in part by NSF Grant GP 18759.

20.
A nonlinear programming problem with nondifferentiabilities is considered. The nondifferentiabilities are due to terms of the form min(f_1(x), ..., f_n(x)), which may enter nonlinearly in the cost and the constraints. Necessary and sufficient conditions are developed. Two algorithms for solving this problem are described, and their convergence is studied. A duality framework for interpretation of the algorithms is also developed. This work was supported in part by the National Science Foundation under Grant No. ENG-74-19332 and Grant No. ECS-79-19396, in part by the U.S. Air Force under Grant AFOSR-78-3633, and in part by the Joint Services Electronics Program (U.S. Army, U.S. Navy, and U.S. Air Force) under Contract N00014-79-C-0424.
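Purely as an illustration of how such min-terms can be handled (and not necessarily the device used by the authors), a nonsmooth term min(f_1(x), ..., f_n(x)) is often replaced by the smooth "soft-min" approximation

\[ \min_{1\le i\le n} f_i(x) \;\approx\; -\frac{1}{\beta}\,\log \sum_{i=1}^{n} e^{-\beta f_i(x)}, \qquad \beta > 0, \]

whose error is at most \((\log n)/\beta\), so the approximation tightens as \(\beta\) grows.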
