Found 20 similar documents (search took 55 ms)
1.
2.
3.
In the theory and applications of Markov decision processes introduced by Howard and subsequently developed by many authors, it is assumed that actions can be chosen independently at each state. A policy constrained Markov decision process is one where selecting a given action in one state restricts the choice of actions in another. This note describes a method for determining a maximal gain policy in the policy constrained case. The method involves the use of bounds on the gain of the feasible policies to produce a policy ranking list. This list then forms a basis for a bounded enumeration procedure which yields the optimal policy.
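To make the enumeration idea concrete, here is a minimal sketch (not the note's actual procedure): enumerate the policies that satisfy a cross-state action constraint, evaluate the long-run gain of each by solving for its stationary distribution, and keep the best. The transition data, rewards, and the coupling constraint below are illustrative assumptions, and the gain bounds used for ranking and pruning in the note are omitted.

```python
import itertools
import numpy as np

# Illustrative 2-state, 2-action MDP (assumed data, not from the note).
# P[s][a] is the transition row and r[s][a] the one-step reward for
# choosing action a in state s.
P = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.1, 0.9]}}
r = {0: {0: 1.0, 1: 3.0},
     1: {0: 2.0, 1: 0.5}}

def feasible(policy):
    # Hypothetical policy constraint: choosing action 1 in state 0
    # rules out action 1 in state 1 (actions are coupled across states).
    return not (policy[0] == 1 and policy[1] == 1)

def gain(policy):
    # Long-run average reward of a stationary policy: solve pi P = pi, sum(pi) = 1.
    Pmat = np.array([P[s][policy[s]] for s in (0, 1)])
    rvec = np.array([r[s][policy[s]] for s in (0, 1)])
    A = np.vstack([Pmat.T - np.eye(2), np.ones(2)])
    b = np.array([0.0, 0.0, 1.0])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ rvec)

# Plain enumeration of the feasible policies; the note's bound-based ranking
# would prune this list before evaluating every candidate exactly.
best = max((p for p in itertools.product((0, 1), repeat=2) if feasible(p)), key=gain)
print("best feasible policy:", best, "gain:", round(gain(best), 4))
```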
4.
Norihiro Mizuno 《Operations Research Letters》1985,3(6):307-311
This paper develops an efficient LP algorithm for solving single chain undiscounted Markov decision problems. The algorithm imposes, in the framework of the simplex method, the multiple choice constraints that exactly one basic variable be chosen from each Markov state. It is proved that the algorithm converges to an optimal solution in a finite number of steps.
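For reference, the standard occupancy-measure LP for a unichain average-reward MDP, which is the kind of problem the paper's algorithm solves while forcing exactly one basic variable per state, can be sketched as follows. The data are made up, and scipy's general-purpose solver stands in for the specialized simplex pivoting with multiple-choice constraints developed in the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed toy data: 2 states, 2 actions. P[s, a, s'] transition probs, r[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 3.0],
              [2.0, 0.5]])
S, A = r.shape

# Variables x[s, a] >= 0 are state-action frequencies (flattened row-major).
c = -r.flatten()                       # linprog minimizes, so negate rewards

# Flow-balance rows: sum_a x[j, a] - sum_{s, a} P[s, a, j] x[s, a] = 0 for each j.
A_eq = np.zeros((S + 1, S * A))
for j in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[j, s * A + a] = (1.0 if s == j else 0.0) - P[s, a, j]
A_eq[S, :] = 1.0                       # normalization: frequencies sum to 1
b_eq = np.zeros(S + 1)
b_eq[S] = 1.0

# One balance row is redundant given the normalization; drop it to keep full rank.
res = linprog(c, A_eq=np.delete(A_eq, S - 1, axis=0),
              b_eq=np.delete(b_eq, S - 1), bounds=(0, None))
x = res.x.reshape(S, A)
print("gain:", -res.fun)
print("policy (argmax action per state):", x.argmax(axis=1))
```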
5.
6.
We consider a discrete time finite Markov decision process (MDP) with the discounted and weighted reward optimality criteria. In [1] the authors considered some decomposition of limiting average MDPs. In this paper, we use an analogous approach for discounted and weighted MDPs. Then, we construct some hierarchical decomposition algorithms for both discounted and weighted MDPs.
7.
《Optimization》2012,61(8):949-968
If the constraints in an optimization problem are dependent on a random parameter, we would like to ensure that they are fulfilled with a high level of reliability. The most natural way is to employ chance constraints. However, the resulting problem is very hard to solve. We propose an alternative formulation of stochastic programs using penalty functions. The expectations of penalties can be left as constraints leading to generalized integrated chance constraints, or incorporated into the objective as a penalty term. We show that the penalty problems are asymptotically equivalent under quite mild conditions. We discuss applications of sample-approximation techniques to the problems with generalized integrated chance constraints and propose rates of convergence for the set of feasible solutions. We will direct our attention to the case when the set of feasible solutions is finite, which can appear in integer programming. The results are then extended to the bounded sets with continuous variables. Additional binary variables are necessary to solve sample-approximated chance-constrained problems, leading to a large mixed-integer non-linear program. On the other hand, the problems with penalties can be solved without adding binary variables; just continuous variables are necessary to model the penalties. The introduced approaches are applied to the blending problem leading to comparably reliable solutions.
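A minimal sketch of the two formulations being compared, on an assumed scalar constraint g(x, ξ) = ξ − x ≤ 0 with Monte Carlo samples of ξ: the sample-approximated chance constraint bounds the violation frequency, while the integrated-chance/penalty variant bounds the expected penalized violation. All names and data here are illustrative, not the paper's blending model.

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.normal(loc=1.0, scale=0.5, size=10_000)   # samples of the random parameter

def chance_constraint_ok(x, eps=0.05):
    # Sample-approximated chance constraint: P[g(x, xi) > 0] <= eps,
    # with the illustrative constraint g(x, xi) = xi - x.
    return np.mean(xi - x > 0) <= eps

def integrated_penalty(x):
    # Expected penalty E[max(xi - x, 0)]: the term that either enters the
    # objective or is bounded in a generalized integrated chance constraint.
    return np.mean(np.maximum(xi - x, 0.0))

for x in (1.0, 1.5, 2.0):
    print(f"x={x:.1f}  chance constraint satisfied: {chance_constraint_ok(x)}  "
          f"expected penalty: {integrated_penalty(x):.4f}")
```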
8.
This paper extends the fractional programming problem with set-inclusive constraints studied earlier by replacing every coefficient vector in the objective function with a convex set. A dual is formulated, and well-known duality results are established. A numerical example illustrates the dual strategy to obtain the value of the initial problem. The research of the first author was conducted while he was on sabbatical at the Department of Operations Research, Stanford University, Stanford, California. The financial assistance of the International Council for Exchange of Scholars is gratefully acknowledged. The author is grateful to the Department of Operations Research at Stanford for the use of its research facilities.
9.
V. I. Zabotin T. F. Minnibaev 《Computational Mathematics and Mathematical Physics》2008,48(2):242-250
The convergence of the method of feasible directions is proved for the case of a smooth objective function and a constraint given as the difference of convex sets (a so-called preconvex set). It is shown that the method converges to the set of stationary points, which in general is narrower than the corresponding set in the case of a smooth function and smooth constraints. The scheme of the proof is similar to that proposed earlier by Karmanov.
10.
N. Kanzi 《Journal of Mathematical Analysis and Applications》2009,351(1):170-261
We consider a nonsmooth semi-infinite programming problem with a feasible set defined by inequality and equality constraints and a set constraint. First, we study some alternative theorems which involve linear and sublinear functions and a convex set, and we propose several generalizations of them. Then, the alternative theorems are applied to obtain, under different constraint qualifications, several necessary optimality conditions of Fritz John and Karush-Kuhn-Tucker type.
11.
J. G. S. Patiño 《Journal of Optimization Theory and Applications》1987,55(3):391-401
We consider the problem of minimizing ∫f(y) dm subject to ∫y dm = c, with c fixed. The function f is assumed to be continuous, but need not be convex. For this problem, we give necessary and sufficient conditions for the existence of solutions. We also give conditions under which uniqueness in a certain sense holds, and we show a relation which holds between the minimizers of two different problems and the corresponding values of the constraints c. This research was supported by FINEP-Brazil, Grant Nos. 62.24-0416-00 and 4.2.82.0719-00.
12.
《Optimization》2012,61(3):431-455
The aim of this paper is to give a survey of recent developments in the area of successive approximations for Markov decision processes and Markov games. We will emphasize two aspects, viz. the conditions under which successive approximations converge in some strong sense and variations of these methods which diminish the amount of computational work to be executed. With respect to the first aspect it will be shown how much unboundedness of the rewards may be allowed without violation of the convergence. With respect to the second aspect we will present four ideas, which can be applied in conjunction and which may diminish the amount of work to be done. These ideas are: 1. the use of the actual convergence of the iterates for the construction of upper and lower bounds (MacQueen bounds), 2. the use of alternative policy improvement procedures (based on stopping times), 3. a better evaluation of the values of actual policies in each iteration step by a value oriented approach, 4. the elimination of suboptimal actions not only permanently, but also temporarily. The general presentation is given for Markov decision processes with a final section devoted to the possibilities of extension to Markov games.
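Ideas 1 and 4 can be sketched for a discounted toy MDP: successive iterates give MacQueen-style upper and lower bounds on the optimal value function, and actions whose bound-corrected Q-values cannot reach the lower bound at their state are dropped temporarily. The model data are assumptions; the stopping-time improvement and value-oriented evaluation steps (ideas 2 and 3) are not shown.

```python
import numpy as np

# Assumed toy discounted MDP: P[s, a, s'] transitions, r[s, a] rewards.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 3.0],
              [2.0, 0.5]])
beta = 0.9
S, A = r.shape
v = np.zeros(S)
active = np.ones((S, A), dtype=bool)     # actions not (temporarily) eliminated

for n in range(500):
    q = r + beta * P @ v                 # Q-values under the current iterate
    q_masked = np.where(active, q, -np.inf)
    v_new = q_masked.max(axis=1)
    delta = v_new - v
    lo, hi = delta.min(), delta.max()

    # MacQueen bounds on the optimal value function from successive iterates.
    lower = v_new + beta / (1 - beta) * lo
    upper = v_new + beta / (1 - beta) * hi

    # Temporary action elimination: an action whose bound-corrected Q-value
    # cannot reach the lower bound at its state is dropped for this sweep.
    active = q + beta / (1 - beta) * (hi - lo) >= v_new[:, None] - 1e-12

    v = v_new
    if hi - lo < 1e-8 * (1 - beta) / beta:
        break

print("bounds on v*:", lower, upper)
print("greedy policy:", q_masked.argmax(axis=1))
```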
13.
Gwo Dong Lin 《Mathematical Methods of Operations Research》1986,30(1):A79-A82
Aggregating constraints is a technique for solving integer programming problems. In this note we modify a result of Zionts (1974); without this modification, there is a counterexample to Zionts' result. Further, we give an elegant theorem which covers the aggregation of nonlinear constraints. This work was partially supported by the Chinese National Science Council.
14.
WU Yunan, Institute of Policy Management, Chinese Academy of Sciences, Beijing, China 《Science in China Series A: Mathematics》2004,47(1):65-71
The concept of vector optimization problems with equilibrium constraints (VOPEC) is introduced. By using continuity results for the approximate solution set of the equilibrium problem, we obtain the same continuity results for the marginal map and the approximate value in VOPEC (e) for vector-valued mappings.
15.
LIU Ke 《Acta Mathematicae Applicatae Sinica (English Series)》1999,15(2):183-189
1. Introduction. The weighted Markov decision processes (MDP's) have been extensively studied since the 1980's, see for instance [1-6] and so on. The theory of weighted MDP's with perturbed transition probabilities appears to have been mentioned only in [7]. This paper will discuss the models of we...
16.
Andrzej Ruszczyński 《Mathematical Programming》2010,125(2):235-261
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we develop a risk-averse policy iteration method and we prove its convergence. We also propose a version of the Newton method to solve a nonsmooth equation arising in the policy iteration method and we prove its global convergence. Finally, we discuss relations to min–max Markov decision models.
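The risk-averse value iteration can be illustrated by replacing the expectation in ordinary value iteration with a one-step coherent risk mapping; the sketch below uses a mean-upper-semideviation mapping on an assumed toy cost model, which is only one example of a Markov risk measure and not the paper's formulation verbatim.

```python
import numpy as np

# Assumed toy model with costs: P[s, a, s'] transitions, c[s, a] one-step costs.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
c = np.array([[1.0, 0.2],
              [2.0, 0.5]])
beta, kappa = 0.9, 0.5        # discount factor, semideviation weight
S, A = c.shape

def risk(values, p):
    # Mean-upper-semideviation risk mapping (one example of a coherent
    # one-step risk measure): E[Z] + kappa * E[(Z - E[Z])_+] under probs p.
    mean = p @ values
    return mean + kappa * (p @ np.maximum(values - mean, 0.0))

v = np.zeros(S)
for _ in range(1000):
    # Risk-averse Bellman update: v(s) = min_a [ c(s,a) + beta * rho(v | s,a) ].
    q = np.array([[c[s, a] + beta * risk(v, P[s, a]) for a in range(A)]
                  for s in range(S)])
    v_new = q.min(axis=1)
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new

print("risk-averse costs-to-go:", v, "policy:", q.argmin(axis=1))
```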
17.
Two pairs of non-differentiable multiobjective symmetric dual problems with cone constraints over arbitrary cones, which are Wolfe type and Mond–Weir type, are considered. On the basis of weak efficiency with respect to a convex cone, we obtain symmetric duality results for the two pairs of problems under cone-invexity and cone-pseudoinvexity assumptions on the involved functions. Our results extend the results in Khurana [S. Khurana, Symmetric duality in multiobjective programming involving generalized cone-invex functions, European Journal of Operational Research 165 (2005) 592–597] to the non-differentiable multiobjective symmetric dual problem.
18.
A kind of direct method is presented for the solution of optimal control problems with state constraints. These methods are sequential quadratic programming (SQP) methods. At every iteration, a quadratic program, obtained by a quadratic approximation to the Lagrangian function and linear approximations to the constraints, is solved to obtain a search direction for a merit function. The merit function is formulated by augmenting the Lagrangian function with a penalty term. A line search is carried out along the search direction to determine a step length such that the merit function is decreased. The methods presented in this paper include continuous sequential quadratic programming methods and discrete sequential quadratic programming methods.
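As an illustration of the SQP setup the abstract describes (quadratic model of the Lagrangian, linearized constraints, merit-function line search), here is a sketch of a small discretized control problem with a state constraint; scipy's SLSQP routine plays the role of the SQP solver, and the dynamics, horizon, and bounds are assumptions rather than the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed toy problem: drive a scalar state from x_0 = 1 to x_N = 0 in N steps
# with minimum control energy while keeping the state nonnegative.
N, h = 20, 0.05

def states(u):
    # Forward simulation of the discretized dynamics x_{k+1} = x_k + h * u_k.
    return 1.0 + h * np.cumsum(np.concatenate(([0.0], u)))

def objective(u):
    # Discretized control energy.
    return 0.5 * h * np.sum(u ** 2)

constraints = [
    {"type": "eq", "fun": lambda u: states(u)[-1]},   # terminal condition x_N = 0
    {"type": "ineq", "fun": lambda u: states(u)},     # state constraint x_k >= 0
]

res = minimize(objective, np.zeros(N), method="SLSQP", constraints=constraints)
print("optimal cost:", round(res.fun, 4), "final state:", round(states(res.x)[-1], 6))
```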
19.
C. Singh 《Journal of Optimization Theory and Applications》1982,38(1):33-42
Duality results are established in convex programming with the set-inclusive constraints studied by Soyster. The recently developed duality theory for generalized linear programs by Thuente is further generalized and also brought into the framework of Soyster's theory. Convex programming with set-inclusive constraints is further extended to fractional programming.