Similar Literature
20 similar documents found.
1.
In this paper, infinite-horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's principle of optimality applies to our model, and a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed; the algorithm generalizes those of Howard [7] and Iwamoto [3]. This research was supported by the National Natural Science Foundation of China.
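For the stationary finite-state special case, the Howard-type iteration being generalized can be sketched concretely. Below is a minimal Python sketch of classical policy iteration for an additive discounted reward; the data layout, discount factor, and tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, r, beta=0.9):
    """Howard-style policy iteration for a finite discounted MDP.

    P[a] is the |S| x |S| transition matrix under action a;
    r[a] is the length-|S| one-step reward vector under action a.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial stationary policy
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: one-step greedy lookahead
        q = np.array([r[a] + beta * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        # keep the current action on (near-)ties so the iteration terminates
        idx = np.arange(n_states)
        new_policy = np.where(q[policy, idx] >= q[new_policy, idx] - 1e-12,
                              policy, new_policy)
        if np.array_equal(new_policy, policy):
            return policy, v  # no improving action anywhere: policy is optimal
        policy = new_policy
```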

2.
In this paper we discuss discrete-time non-homogeneous discounted Markovian decision programming, where the state space and all action sets are countable. Under the assumption that the optimal value function is finite, we give necessary and sufficient conditions for the existence of an optimal policy; we do the same under the assumption that the absolute mean of the rewards is relatively bounded.

3.
Summary A general discrete decision process is formulated which includes both undiscounted and discounted semi-Markovian decision processes as special cases. A policy-iteration algorithm is presented and shown to converge to an optimal policy. Properties of the coupled functional equations are derived. Primal and dual linear programming formulations of the optimization problem are also given. An application is given to the Markov ratio decision process.
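The primal linear-programming formulation mentioned in the summary takes a compact form in the discounted case. A hedged sketch using scipy.optimize.linprog follows, with the problem encoding (lists of transition matrices and reward vectors) assumed for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def mdp_primal_lp(P, r, beta=0.9):
    """Solve  min sum_s v(s)
       s.t.   v(s) >= r(s, a) + beta * sum_s' P(s'|s, a) v(s')  for all s, a.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    I = np.eye(n_states)
    # One inequality block per action: -(I - beta * P[a]) v <= -r[a]
    A_ub = np.vstack([-(I - beta * P[a]) for a in range(n_actions)])
    b_ub = np.concatenate([-r[a] for a in range(n_actions)])
    res = linprog(c=np.ones(n_states), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n_states)  # v is a free variable
    return res.x  # optimal value vector; the dual variables are state-action frequencies
```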

4.
MARKOV DECISION PROGRAMMING WITH CONSTRAINTS
Liu Jianyong (刘建庸) and Liu Ke (刘克), Institute of Applied Mathematics, the Chinese Academy of Sciences, ...

5.
We present a class of countable state space Markovian decision models that can be investigated by means of an associated finite-state, finite-action reduced model which we call the skeleton. In particular, we obtain a turnpike theorem for the original model (Theorem 2 in Section 5) from a known turnpike theorem for the reduced finite model. For illustration, we present in detail an application of this approach to an inventory model (re-establishing a known turnpike result) and sketch analogous results for a cash-balance model and a growth model.

6.
7.
8.
A programming technique is described for ALGOL-like recursive procedures in which parameters and local variables are replaced by variables global to the recursive procedure and declared in a surrounding non-recursive procedure. The stack and time required for execution of a variety of recursive procedures have been determined in the languages ALGOL 60, CORAL 66 and ALGOL 68, both when programmed using this technique and when using parameters. Detailed results are presented for the Ackermann function and for tree manipulation. The global technique gives considerable savings in stack for all recursions investigated. For call-dominated recursions the technique speeds up execution in ALGOL 60 and CORAL 66 by about 25%. In ALGOL 68 the parameterless recursive procedure must be declared in the closed-clause which forms the particular-program, rather than in a surrounding procedure, in order to achieve a similar improvement in execution time.
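The transformation is easiest to see on a small tree walk. Below is a Python analogue; since Python manages its own frames, this mirrors only the shape of the ALGOL technique, not its stack savings, and Node, tree_size, and walk are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def tree_size(node):
    """Conventional parameterised recursion."""
    if node is None:
        return 0
    return 1 + tree_size(node.left) + tree_size(node.right)

def tree_size_global(root):
    """The 'global' technique: the parameter becomes a variable declared in
    the surrounding non-recursive procedure, and the recursive body saves
    and restores it itself around each recursive call."""
    node = root
    def walk():
        nonlocal node
        if node is None:
            return 0
        saved = node            # save what would have been the parameter
        node = saved.left
        left = walk()
        node = saved.right
        right = walk()
        return 1 + left + right
    return walk()

t = Node(Node(), Node(None, Node()))
assert tree_size(t) == tree_size_global(t) == 4
```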

9.
This article presents an empirical study of the housing market using the statistical method of Markov processes. The first phase of the study is devoted to measuring the filtering process in a selected neighborhood by estimating probabilities of transition from one income group to another over the period 1949–1969, using four-year intervals. The estimated transition probabilities are then used to forecast occupancy structure for different periods, and the suitability of applying the Markov process to long-term policy analysis in housing is examined. The final phase of the study examines the steady-state occupancy structure by various income categories of household. The study indicates a fruitful application of the Markov process in long-term housing policy analysis.
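A minimal sketch of the machinery described here: row-normalizing observed moves between income groups into a transition matrix, forecasting occupancy several intervals ahead, and computing the steady state. All counts and the initial distribution below are invented for illustration.

```python
import numpy as np

def estimate_transition_matrix(counts):
    """Row-normalise observed transition counts between income groups."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def steady_state(P):
    """Stationary distribution: left eigenvector of P for eigenvalue 1."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

# Hypothetical counts for 3 income groups over one four-year interval
counts = [[60, 30, 10],
          [20, 50, 30],
          [10, 20, 70]]
P = estimate_transition_matrix(counts)

initial = np.array([0.5, 0.3, 0.2])                    # hypothetical occupancy shares
forecast = initial @ np.linalg.matrix_power(P, 5)       # 5 intervals (20 years) ahead
print(forecast, steady_state(P))
```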

10.
The following optimality principle is established for finite undiscounted or discounted Markov decision processes: If a policy is (gain, bias, or discounted) optimal in one state, it is also optimal for all states reachable from this state using this policy. The optimality principle is used constructively to demonstrate the existence of a policy that is optimal in every state, and then to derive the coupled functional equations satisfied by the optimal return vectors. This reverses the usual sequence, where one first establishes (via policy iteration or linear programming) the solvability of the coupled functional equations, and then shows that the solution is indeed the optimal return vector and that the maximizing policy for the functional equations is optimal for every state.

11.
Fitting the value function in a Markovian decision process by a linear superposition of M basis functions reduces the problem dimensionality from the number of states down to M, with good accuracy retained if the value function is a smooth function of its argument, the state vector. This paper provides, for both the discounted and undiscounted cases, three algorithms for computing the coefficients in the linear superposition: linear programming, policy iteration, and least squares.
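For a fixed policy, the least-squares variant can be sketched as choosing coefficients w so that Phi @ w nearly satisfies the Bellman equation; minimizing the Bellman residual over the span of the basis is one natural reading, though the paper's exact criterion may differ.

```python
import numpy as np

def bellman_residual_ls(Phi, P, r, beta=0.9):
    """Least-squares coefficients w such that  Phi @ w ~ r + beta * P @ Phi @ w.

    Phi: |S| x M matrix of basis functions evaluated at each state;
    P:   |S| x |S| transition matrix under a fixed policy;
    r:   length-|S| reward vector.
    """
    A = Phi - beta * P @ Phi              # Bellman residual operator on the basis
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w                              # approximate value function is Phi @ w
```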

12.
Dynamic programming recursive equations are used to develop a procedure to obtain the set of efficient solutions to the multicriteria integer linear programming problem. An alternate method is produced by combining this procedure with branch and bound rules. Computational results are reported.
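To make the recursion concrete, here is the same idea on a bicriteria knapsack, a simple multicriteria integer program: dynamic-programming states carry sets of criterion vectors, filtered for Pareto dominance. The paper treats general multicriteria integer linear programs and adds branch-and-bound rules, which this sketch omits.

```python
def pareto_filter(points):
    """Keep only criterion vectors not dominated componentwise by another."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]

def efficient_knapsack(items, capacity):
    """items: list of (weight, profit1, profit2); returns the efficient
    (profit1, profit2) vectors reachable within the capacity."""
    # states[c]: criterion vectors achievable with total weight <= c
    states = [{(0, 0)} for _ in range(capacity + 1)]
    for weight, p1, p2 in items:
        for c in range(capacity, weight - 1, -1):       # 0/1 reverse sweep
            states[c] |= {(a + p1, b + p2) for (a, b) in states[c - weight]}
    return pareto_filter(sorted(set().union(*states)))

# Hypothetical instance: expect the two nondominated vectors (1, 4) and (4, 2)
print(efficient_knapsack([(2, 3, 1), (3, 1, 4), (4, 4, 2)], capacity=4))
```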

13.
Summary Recursive linear programming is defined by a sequence of linear programming problems in which a recursive relation is built into the system through the coefficients of the objective function, the constraint matrix, or the right-hand-side parameters. Here we consider the case where the right-hand-side parameters are subject to a recursive time relation indicating how current-period plans are related to past expectations and performance. Our object is twofold: first, to analyze the stability properties of a linear recursive programming (LRP) model, and second, to indicate some basic extensions of the LRP model in the light of what is generally called the active approach to stochastic linear programming (SLP). Some simple theorems are developed in this connection, followed by a brief discussion of possible lines of empirical application.


This work forms part of a research project started originally at Iowa State University under U.S. National Science Foundation Project NR 420-04-70 and continued presently by the authors. Some of the theoretical aspects closely related to this paper may be found in the following references:
Sengupta, J. K., G. Tintner and C. Millham: On some theorems of stochastic linear programming with applications. Management Science, Vol. 10, October 1963, pp. 143–159.
Sengupta, J. K., G. Tintner and B. Morrison: Stochastic linear programming with applications to economic models. Economica, August 1963, pp. 262–276.
Sengupta, J. K.: Recursive constraints and stochastic linear programming. Accepted for publication in Metrika.
Sengupta, J. K., C. Millham and G. Tintner: On the stability of solutions under error in stochastic linear programming. Metrika, No. 3, 1964.
Sengupta, J. K. and T. Kumar: An application of sensitivity analysis to a linear programming problem. Unternehmensforschung, Vol. 9, 1965, pp. 18–36.
Sengupta, J. K.: On the stability of truncated solutions of stochastic linear programming. Mimeographed, December 20, 1963. Department of Economics, Iowa State University. (Sent for publication.)
Sengupta, J. K., G. Tintner and C. Millham: A weak duality theorem for stochastic linear programming. Unternehmensforschung, Vol. 7, 1963, pp. 1–8.
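A hedged sketch of the basic LRP mechanism described in the summary above: a chain of LPs whose right-hand sides are updated recursively from the previous period's realised activity. The particular update rule below (a convex blend of the past target and realised levels) is an illustrative stand-in for the paper's expectation-and-performance relation.

```python
import numpy as np
from scipy.optimize import linprog

def recursive_lp(c, A, b0, periods=5, alpha=0.5):
    """Solve a chain of LPs  max c'x  s.t.  A x <= b_t,  x >= 0,  where the
    right-hand side evolves recursively from realised activity levels:
        b_{t+1} = alpha * b_t + (1 - alpha) * A @ x_t
    (this update rule is an illustrative choice, not the paper's)."""
    A = np.asarray(A, dtype=float)
    b, plans = np.asarray(b0, dtype=float), []
    for _ in range(periods):
        res = linprog(c=-np.asarray(c, dtype=float), A_ub=A, b_ub=b)  # linprog minimises
        plans.append(res.x)
        b = alpha * b + (1 - alpha) * A @ res.x
    return plans

plans = recursive_lp(c=[3, 2], A=[[1, 1], [2, 1]], b0=[4, 6], periods=3)
```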

14.
Whereas in goal programming the under-achievements with respect to (usually) unattainable goals are minimized, we propose maximizing the over-achievements with respect to feasible goals or required values. An interactive algorithm, in which the over-achievements are maximized via a barrier function, is presented to implement the proposed approach.

15.
Continuous-time Markovian decision models with countable state space are investigated. The existence of an optimal stationary policy is established for the expected-average-return criterion. It is shown that the expected average return can be expressed as the expected discounted return of a related Markovian decision process. A policy iteration method is given which converges to an optimal deterministic policy; the policy so obtained is shown to be optimal over all Markov policies.

16.
Optimization, 2012, 61(6): 1017–1026
A stochastic decision model with a general, not necessarily additive reward function is considered, and essential properties of such reward functions are formulated which still permit a successive (stagewise) treatment in the sense of dynamic programming. Conditions for recursive-additive reward functions are given which ensure the existence of optimal strategies and the applicability of value iteration for finding an optimal policy and the optimal total reward.
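The successive proceeding can be sketched as value iteration with the additive backup replaced by a general combining function; the classical discounted case is recovered as a special instance. The axioms that make this legitimate are the subject of the paper and are not reproduced here; all data below are illustrative.

```python
import numpy as np

def value_iteration(P, r, combine, tol=1e-8, max_iter=10_000):
    """Value iteration with a recursive (not necessarily additive) backup:
        v(s) <- max_a  sum_s' P[a][s, s'] * combine(r[a][s], v(s'))
    combine(rew, v) = rew + beta * v recovers the classical discounted case.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    for _ in range(max_iter):
        q = np.array([[P[a][s] @ combine(r[a][s], v) for s in range(n_states)]
                      for a in range(n_actions)])
        v_new = q.max(axis=0)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v

# Tiny two-state, two-action example with the classical discounted backup
P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.6, 0.4]])]
r = [np.array([1.0, 0.0]), np.array([0.5, 0.8])]
v = value_iteration(P, r, combine=lambda rew, v: rew + 0.9 * v)
```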

17.
The well-known local recursive quadratic programming method introduced by E. R. Wilson is extended to apply to optimization problems with constraints of the type g(x, t) ≤ 0 for all t, where t ranges over a compact interval of the real line. A scheme is proposed which results in a globally convergent conceptual algorithm. Finally, two implementable versions are presented, both of which converge quadratically. Research sponsored by the National Science Foundation Grant ECS-79-13148 and the Air Force Office of Scientific Research (AFOSR), United States Air Force, Contract No. F49620-79-C-0178.

18.
Summary It is shown that successive overrelaxation can be applied to the solution of a discounted Markovian decision problem. An overrelaxation factor yielding the minimum contraction factor is determined; this minimum contraction factor turns out to be at most equal to the discount factor.
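A sketch of the scheme for a discounted problem: Gauss-Seidel value iteration with an overrelaxation factor omega applied to each in-place update. The paper determines the omega minimizing the contraction factor; the value below is only a placeholder.

```python
import numpy as np

def sor_value_iteration(P, r, beta=0.9, omega=1.1, sweeps=1000, tol=1e-10):
    """Gauss-Seidel value iteration with overrelaxation factor omega.

    Each sweep updates states in place; omega = 1 is plain Gauss-Seidel,
    and the admissible range of omega depends on the problem data.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    v = np.zeros(n_states)
    for _ in range(sweeps):
        delta = 0.0
        for s in range(n_states):
            backup = max(r[a][s] + beta * P[a][s] @ v for a in range(n_actions))
            new = (1 - omega) * v[s] + omega * backup   # overrelaxed update
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            break
    return v
```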

19.
20.
The paper surveys the basic results and nonresults for decision rules in stochastic programming. It exhibits some of the difficulties encountered when trying to restrict the class of acceptable rules to those possessing specific functional forms. A liberal dosage of examples illustrating the various cases is provided. The treatment is unified by exploiting the equivalence of various formulations that have appeared in the literature. An appendix is devoted to the P-model for stochastic programs with chance constraints.
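As a computational pointer, the chance-constraint machinery in its simplest form replaces each probabilistic constraint by a tightened deterministic one; the sketch below assumes independent normally distributed right-hand sides, with made-up data. The P-model proper, which maximizes the probability of attaining an aspiration level, leads instead to a fractional program and is not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import norm

def chance_constrained_lp(c, A, b_mean, b_std, alpha=0.95):
    """Maximise c'x, x >= 0, subject to P(A_i x <= b_i) >= alpha for each row i,
    with b_i ~ N(b_mean_i, b_std_i^2) independent. The deterministic equivalent
    tightens each right-hand side to  b_mean_i + b_std_i * Phi^{-1}(1 - alpha)."""
    b_det = np.asarray(b_mean, float) + np.asarray(b_std, float) * norm.ppf(1 - alpha)
    return linprog(c=-np.asarray(c, float), A_ub=A, b_ub=b_det)  # linprog minimises

res = chance_constrained_lp(c=[3, 2], A=[[1, 1], [2, 1]],
                            b_mean=[4, 6], b_std=[0.5, 0.5])
```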
