Similar Literature (20 results found)
1.
In this paper, we study average optimality for continuous-time controlled jump Markov processes in general state and action spaces. The criterion to be minimized is the expected average cost. Both the transition rates and the cost rates are allowed to be unbounded. We propose a new set of conditions under which we first establish an average optimality inequality by using the well-known “vanishing discount factor approach”. Then, when the cost (or reward) rates are nonnegative (or nonpositive), we use the average optimality inequality, together with the Dynkin formula and the Tauberian theorem, to prove the existence of an average optimal stationary policy within the class of all randomized history-dependent policies. Finally, when the cost (or reward) rates have neither upper nor lower bounds, we also prove the existence of an average optimal policy within the class of all (deterministic) stationary policies by constructing a “new” cost (or reward) rate. Research partially supported by the Natural Science Foundation of China (Grant No. 10626021) and the Natural Science Foundation of Guangdong Province (Grant No. 06300957).
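For context, the long-run expected average cost criterion referred to above, and the generic form of an average optimality inequality obtained by a vanishing discount argument, can be written as follows. The notation (J, c, q, g, h) is illustrative and not taken from the paper itself.

```latex
% Long-run expected average cost of a policy \pi starting from state x
J(x,\pi) = \limsup_{T\to\infty} \frac{1}{T}\,
           \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{T} c(x_t,a_t)\,dt\right],
\qquad J^{*}(x) = \inf_{\pi} J(x,\pi).

% Generic average optimality inequality: for a constant g and a
% function h on the state space (q denotes the transition rates),
g \;\ge\; \inf_{a \in A(x)} \Bigl\{ c(x,a)
          + \sum_{y} q(y \mid x,a)\, h(y) \Bigr\}
          \quad \text{for all states } x;
% under suitable conditions, a stationary policy attaining the
% infimum is average optimal.
```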

2.
This paper is a survey of papers which make use of nonstandard Markov decision process criteria (i.e., those which do not seek simply to optimize expected returns per unit time or expected discounted return). It covers infinite-horizon nondiscounted formulations, infinite-horizon discounted formulations, and finite-horizon formulations. For problem formulations stated solely in terms of the probabilities of being in each state and taking each action, policy equivalence results are given which allow policies to be restricted to the class of Markov policies or to randomizations of deterministic Markov policies. For problems which cannot be stated in such terms, i.e., which must be treated in terms of the primitive state set I, formulations involving a redefinition of the states are examined. The author would like to thank two referees for a very thorough and helpful refereeing of the original article and for the extra references (Refs. 47–52) now added to the original reference list.

3.
Directed hypergraphs are a general modelling and algorithmic tool that has been used successfully in many different research areas such as artificial intelligence, database systems, fuzzy systems, propositional logic, and transportation networks. However, modelling Markov decision processes using directed hypergraphs has not yet been considered. In this paper we consider finite-horizon Markov decision processes (MDPs) with finite state and action spaces and present an algorithm for finding the K best deterministic Markov policies, that is, for ranking the first K deterministic Markov policies in non-decreasing order of an additive criterion of optimality. The algorithm uses a directed hypergraph to model the finite-horizon MDP. It is shown that the problem of finding the optimal policy can be formulated as a minimum weight hyperpath problem and solved in linear time in the size of the input data representing the MDP, under different additive optimality criteria.
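The underlying optimal-policy subproblem (optimizing an additive criterion over a finite horizon) is the classical backward-induction recursion; a minimal Python sketch of that recursion is shown below, with illustrative data structures rather than the hypergraph encoding or the K-best ranking developed in the paper.

```python
# Backward induction for a finite-horizon MDP with an additive
# (total expected reward) criterion. Toy illustration only; the paper
# encodes this recursion as a minimum-weight hyperpath problem.

def backward_induction(states, actions, horizon, reward, trans):
    """reward[t][s][a] -> float, trans[t][s][a] -> dict {s_next: prob}."""
    V = {s: 0.0 for s in states}           # terminal values
    policy = []                            # policy[t][s] = best action
    for t in reversed(range(horizon)):
        V_new, pi_t = {}, {}
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                q = reward[t][s][a] + sum(p * V[s2]
                                          for s2, p in trans[t][s][a].items())
                if q > best_q:
                    best_q, best_a = q, a
            V_new[s], pi_t[s] = best_q, best_a
        V, policy = V_new, [pi_t] + policy
    return V, policy
```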

4.
The convergence properties of reinforcement learning approaches, such as temporal differences and Q-learning, have been established under moderate assumptions for discrete state and action spaces. In practice, however, many systems have either continuous action spaces or a large number of discrete elements. This paper presents an approximate dynamic programming approach to reinforcement learning for continuous-action set-point regulator problems, which learns near-optimal control policies based on scalar performance measures. The continuous-action space (CAS) algorithm uses derivative-free line search methods to obtain the optimal action in the continuous space. The theoretical convergence properties of the algorithm are presented. Several heuristic stopping criteria are investigated, and practical application is illustrated by two example problems: the inverted pendulum balancing problem and the power system stabilization problem.
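As a rough illustration of the derivative-free line-search step, the hedged Python sketch below selects a greedy action over a one-dimensional continuous action interval via golden-section search; the approximation q_hat and the interval bounds are placeholders, not part of the cited algorithm.

```python
import math

def golden_section_argmax(f, lo, hi, tol=1e-4):
    """Derivative-free line search: return x in [lo, hi] maximizing a
    unimodal function f (golden-section search)."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0        # 1/phi ~ 0.618
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) > f(d):
            b, d = d, c                  # maximum lies in [a, d_old]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                  # maximum lies in [c_old, b]
            d = a + invphi * (b - a)
    return (a + b) / 2.0

def greedy_action(q_hat, state, action_low, action_high):
    """Greedy action over a continuous interval, given a learned
    approximation q_hat(state, action)."""
    return golden_section_argmax(lambda u: q_hat(state, u),
                                 action_low, action_high)
```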

5.
We consider partially observable Markov decision processes with finite or countably infinite (core) state and observation spaces and a finite action set. Following a standard approach, an equivalent completely observed problem is formulated, with the same finite action set but with an uncountable state space, namely the space of probability distributions on the original core state space. By developing a suitable theoretical framework, it is shown that some characteristics induced in the original problem by the countability of the spaces involved are reflected onto the equivalent problem. Sufficient conditions are then derived for solutions to the average cost optimality equation to exist. We illustrate these results in the context of machine replacement problems. Structural properties of average cost optimal policies are obtained for a two-state replacement problem; these are similar to results available for discount optimal policies. The set of assumptions used compares favorably to others currently available. This research was supported in part by the Advanced Technology Program of the State of Texas, in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, and in part by the Air Force Office of Scientific Research (AFSC) under Contract F49620-89-C-0044.

6.
This paper proposes a simple heuristic that generates a solution for echelon (r,nQ,T) policies by sequentially solving a deterministic demand problem, a subproblem with fixed reorder intervals, and a subproblem with fixed batch sizes. For each of these problems, we further simplify the computation by solving a series of single-stage systems whose parameters are obtained directly from the original problem data. In a numerical study, we find that this heuristic outperforms an existing one in the literature.

7.
This paper deals with a general class of piecewise deterministic control systems that encompasses FMS flow control models. We use the Markov renewal decision process formalism to characterize optimal policies via a discrete event dynamic programming approach. A family of control problems with a random stopping time is associated with these optimality conditions. These problems can be reformulated as infinite horizon deterministic control problems. It is then shown how the so-called turnpike property should hold for these deterministic control problems under classical convexity assumptions. These turnpikes have the same generic properties as the attractors obtained via a problem-specific approach in FMS flow control models and production planning, and are called hedging points in this literature. This research has been supported by NSERC-Canada (Grant No. A4952), by FCAR-Québec (Grant No. 88EQ3528, Actions Structurantes), by MESS-Québec (Grant No. 6.1/7.4(28)), and by FNRS-Switzerland.

8.
We are concerned with Markov decision processes with Borel state and action spaces; the transition law and the reward function depend on an unknown parameter. In this framework, we study the recursive adaptive nonstationary value iteration policy, which is proved to be optimal under the same conditions usually imposed to obtain the optimality of other well-known nonrecursive adaptive policies. The results are illustrated by showing the existence of optimal adaptive policies for a class of additive-noise systems with unknown noise distribution. This research was supported in part by the Consejo Nacional de Ciencia y Tecnología under Grants PCEXCNA-050156 and A128CCOEO550, and in part by the Third World Academy of Sciences under Grant TWAS RG MP 898-152.

9.
In the context of stochastic resource-constrained project scheduling we introduce a novel class of scheduling policies, the linear preselective policies. They combine the benefits of preselective policies and priority policies, two classes that are well known from both deterministic and stochastic scheduling. We study several properties of this new class of policies which indicate its usefulness for computational purposes. Based on a new representation of preselective policies as and/or precedence constraints, we derive efficient algorithms for computing earliest job start times and state a necessary and sufficient dominance criterion for preselective policies. A computational experiment based on 480 instances empirically validates the theoretical findings.
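For the special case of ordinary (AND-only) precedence constraints, earliest job start times follow from the classical forward pass sketched below in Python; the paper's contribution is the extension of such computations to and/or precedence constraints, which this simplified sketch does not cover. The data structures are assumptions for illustration.

```python
# Earliest job start times under ordinary (AND) precedence constraints,
# i.e., a job may start only after all of its predecessors have finished.
# Classical CPM-style forward pass over a topological order.
from collections import defaultdict, deque

def earliest_start_times(durations, predecessors):
    """durations: {job: processing_time}; predecessors: {job: set of jobs}."""
    succ = defaultdict(set)
    indeg = {j: len(predecessors.get(j, set())) for j in durations}
    for j, preds in predecessors.items():
        for i in preds:
            succ[i].add(j)
    es = {j: 0.0 for j in durations}
    ready = deque(j for j in durations if indeg[j] == 0)
    while ready:                          # process jobs in topological order
        i = ready.popleft()
        for j in succ[i]:
            es[j] = max(es[j], es[i] + durations[i])
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    return es
```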

10.
In this paper, we consider a mean–variance optimization problem for Markov decision processes (MDPs) over the set of (deterministic stationary) policies. Different from the usual formulation in MDPs, we aim to obtain the mean–variance optimal policy that minimizes the variance over the set of all policies with a given expected reward. For continuous-time MDPs with the discounted criterion and finite state and action spaces, we prove that the mean–variance optimization problem can be transformed into an equivalent discounted optimization problem using the conditional expectation and Markov properties. Then, we show that a mean–variance optimal policy and the efficient frontier can be obtained by policy iteration methods with a finite number of iterations. We also address related issues such as a mutual fund theorem and illustrate our results with an example.
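For reference, a plain discounted policy-iteration loop for a finite discrete-time MDP looks as follows. This is a hedged Python sketch only; the paper works in continuous time and first applies the mean–variance transformation, which is not reproduced here, and the input arrays P, r, gamma are illustrative.

```python
import numpy as np

def policy_iteration(P, r, gamma):
    """Discounted policy iteration for a finite MDP.
    P[a][s, s'] = transition probability, r[a][s] = reward, 0 < gamma < 1."""
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: greedy with respect to the current value
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```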

11.
This paper deals with Blackwell optimality for continuous-time controlled Markov chains with a compact Borel action space, and possibly unbounded reward (or cost) rates and unbounded transition rates. We prove the existence of a deterministic stationary policy which is Blackwell optimal in the class of all admissible (nonstationary) Markov policies, thus extending previous results that analyzed Blackwell optimality in the class of stationary policies. We compare our assumptions to the corresponding ones for discrete-time controlled Markov processes.

12.
We consider a class of Markov decision processes with finite state and action spaces which, essentially, is determined by the following condition: the state space is irreducible under the action of any stationary policy. Apart from this restriction, however, the transition law is completely unknown to the controller. In this context, we find a set of policies under which the frequency estimators of the transition law are strongly consistent; this result is then applied to construct adaptive asymptotically discount-optimal policies. Dedicated to Professor Truman O. Lewis, on the occasion of his sixtieth birthday. This research was supported in part by the Third World Academy of Sciences (TWAS) under Grant TWAS RG MP 898-152, and in part by the Consejo Nacional de Ciencia y Tecnología (CONACYT) under Grant A128CCOEO550 (MT-2).

13.
This paper is devoted to studying continuous-time Markov decision processes with general state and action spaces, under the long-run expected average reward criterion. The transition rates of the underlying continuous-time Markov processes are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. We provide new sufficient conditions for the existence of average optimal policies. Moreover, these sufficient conditions are imposed on the controlled process's primitive data and are thus directly verifiable. Finally, we apply our results to two new examples.

14.
We give mild conditions for the existence of optimal solutions for a Markov decision problem with average cost, under m constraints of the same kind, on Borel state and action spaces. Moreover, there is an optimal policy that is a convex combination of at most m+1 deterministic policies.
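In generic, illustrative notation (not taken from the paper), the constrained problem has the form below; the cited result says an optimal policy can be taken to be a convex mixture of at most m+1 deterministic policies.

```latex
% Average-cost MDP with m constraints (illustrative notation)
\min_{\pi}\; J_{0}(\pi) = \limsup_{n\to\infty} \frac{1}{n}\,
    \mathbb{E}^{\pi}\Bigl[\sum_{t=0}^{n-1} c_{0}(x_t,a_t)\Bigr]
\quad \text{s.t.} \quad
J_{k}(\pi) \le \theta_{k}, \qquad k = 1,\dots,m,
% where each J_k is an average-cost functional of the same kind as J_0.
```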

15.
In many distributed computing systems, stochastically arriving jobs need to be assigned to servers with the objective of minimizing waiting times. Many existing dispatching algorithms fall within the SQ(d) framework: upon arrival of a job, d ≥ 2 servers are contacted uniformly at random to retrieve their state, and the job is then routed to a server in the best observed state. One practical issue with this type of algorithm is that server states may not be observable, depending on the underlying architecture. In this paper, we investigate the assignment problem in the open-loop setting where no feedback information can flow dynamically from the queues back to the controller, i.e., the queues are unobservable. This is an intractable problem, and unless particular cases are considered, the structure of an optimal policy is not known. Under mild assumptions and in a heavy-traffic many-server limiting regime, our main result proves the optimality of a subset of deterministic and periodic policies within a wide set of (open-loop) policies that can be randomized or deterministic and can depend on the arrival process at the controller. The limiting value of the scaled stationary mean waiting time achieved by any policy in our subset provides a simple approximation for the optimal system performance.
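A minimal Python sketch of the SQ(d) routing rule described above; note that this is the closed-loop rule that requires observable queue states, i.e., exactly the feedback that the paper's open-loop setting removes. The example queue lengths are made up for illustration.

```python
import random

def sq_d_dispatch(queue_lengths, d=2, rng=random):
    """Power-of-d-choices (SQ(d)) routing: sample d servers uniformly at
    random and send the job to the sampled server with the shortest queue."""
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda i: queue_lengths[i])

# Example: route one job among 10 servers with the given queue lengths.
queues = [3, 0, 5, 2, 7, 1, 4, 0, 6, 2]
print(sq_d_dispatch(queues, d=2))
```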

16.
This paper deals with the output feedback H∞ control problem for a class of nonlinear stochastic systems. Based on the recently developed theory of stochastic dissipation, a notable result on nonlinear H∞ output feedback control of deterministic systems is generalized to the stochastic case. Finally, for the cases of state feedback and output feedback, two families of controllers are provided.

17.
Consider a tandem queue model with a single server who can switch instantaneously from one queue to another. Customers arrive according to a Poisson process with rate λ. The amount of service required by each customer at the i-th queue is an exponentially distributed random variable with rate μ_i. Whenever two or more customers are in the system, the decision as to which customer should be served first depends on the optimization criterion. In this system, all server allocation policies in the finite set of work-conserving deterministic policies have the same expected first passage time (makespan) to empty the system of customers from any initial state. However, a unique policy maximizes the first passage probability of emptying the system before the number of customers exceeds K, for any value of K, and it stochastically minimizes the number of customers in the system at any time t > 0. This policy always assigns the server to the nonempty queue closest to the exit.
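The allocation rule described in the last sentence is easy to state in code; a minimal Python sketch follows, where the indexing convention (index 0 is the entrance, the last index is the queue closest to the exit) is an assumption made for illustration.

```python
def next_queue_to_serve(queue_lengths):
    """Serve the nonempty queue closest to the exit (highest index);
    return None if the whole system is empty."""
    for i in reversed(range(len(queue_lengths))):
        if queue_lengths[i] > 0:
            return i
    return None

# Example: with 2 customers at queue 0 and 1 customer at queue 2,
# the server works at queue 2, the queue closest to the exit.
print(next_queue_to_serve([2, 0, 1]))   # -> 2
```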

18.
Optimization, 2012, 61(12): 1427-1447
This article is concerned with the limiting average variance for discrete-time Markov control processes in Borel spaces, subject to pathwise constraints. Under suitable hypotheses we show that, within the class of deterministic stationary optimal policies for the pathwise constrained problem, there exists one with minimal variance.

19.
This paper considers Stackelberg solutions for two-level linear programming problems under fuzzy random environments. To deal with the formulated fuzzy random two-level linear programming problem, an α-stochastic two-level linear programming problem is defined through the introduction of α-level sets of fuzzy random variables. Taking into account the vagueness of the decision makers' judgments, fuzzy goals are introduced, and the α-stochastic two-level linear programming problem is transformed into the problem of maximizing the satisfaction degree for each fuzzy goal. Through fractile criterion optimization in stochastic programming, the transformed stochastic two-level programming problem can be reduced to a deterministic two-level programming problem. An extended concept of Stackelberg solution is introduced and a numerical example is provided to illustrate the proposed method.

20.
We consider a discrete-time Markov decision process with a partially ordered state space and two feasible control actions in each state. Our goal is to find general conditions, which are satisfied in a broad class of applications to control of queues, under which an optimal control policy is monotonic. An advantage of our approach is that it easily extends to problems with both information and action delays, which are common in applications to high-speed communication networks, among others. The transition probabilities are stochastically monotone and the one-stage reward is submodular. We further assume that transitions from different states are coupled, in the sense that the state after a transition is distributed as a deterministic function of the current state and two random variables, one of which is controllable and the other uncontrollable. Finally, we make a monotonicity assumption about the sample-path effect of a pairwise switch of the actions in consecutive stages. Using induction on the horizon length, we demonstrate that optimal policies for the finite- and infinite-horizon discounted problems are monotonic. We apply these results to a single queueing facility with control of arrivals and/or services, under very general conditions. In this case, our results imply that an optimal control policy has threshold form. Finally, we show how monotonicity of an optimal policy extends in a natural way to problems with information and/or action delay, including delays of more than one time unit. Specifically, we show that if a problem without delay satisfies our sufficient conditions for monotonicity of an optimal policy, then the same problem with information and/or action delay also has monotonic (e.g., threshold) optimal policies.
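A threshold (monotone) admission rule of the kind whose optimality is established above can be sketched in a few lines of Python; the threshold value itself would come from solving a particular model and is only a placeholder here.

```python
def admit_arrival(queue_length, threshold):
    """Threshold admission policy: accept an arriving customer only while
    the queue length is strictly below the threshold."""
    return queue_length < threshold

# With threshold 5, arrivals are accepted in states 0..4 and rejected
# from state 5 onward; with an information delay of k slots, the same
# rule would be applied to the k-step-old observed state.
print([admit_arrival(n, threshold=5) for n in range(8)])
```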
