Similar Documents
 20 similar documents found; search took 15 ms
1.
We consider infinite-horizon deterministic dynamic programming problems in discrete time. We show that the value function of such a problem is always a fixed point of a modified version of the Bellman operator. We also show that value iteration converges increasingly to the value function if the initial function is dominated by the value function, is mapped upward by the modified Bellman operator and satisfies a transversality-like condition. These results require no assumption except for the general framework of infinite-horizon deterministic dynamic programming. As an application, we show that the value function can be approximated by computing the value function of an unconstrained version of the problem with the constraint replaced by a penalty function.
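The monotone value-iteration scheme described above can be sketched on a toy problem. The following is a minimal illustration, not the paper's construction: the two-state deterministic system, the discount factor, and the rewards are invented for the example (the paper itself treats more general undiscounted, constrained settings). Starting from the zero function, which is dominated by the value function when rewards are nonnegative, the iterates increase to the fixed point.

```python
GAMMA = 0.9  # illustrative discount factor

# A tiny deterministic system: two states, two actions each.
NEXT = {(0, "stay"): 0, (0, "go"): 1, (1, "stay"): 1, (1, "go"): 0}
REWARD = {(0, "stay"): 0.0, (0, "go"): 1.0, (1, "stay"): 2.0, (1, "go"): 0.0}

def bellman(v):
    """One application of the Bellman (max) operator to a value estimate v."""
    return [
        max(REWARD[(s, a)] + GAMMA * v[NEXT[(s, a)]] for a in ("stay", "go"))
        for s in (0, 1)
    ]

def value_iteration(v0=(0.0, 0.0), tol=1e-10):
    """Iterate the Bellman operator until the sup-norm change is below tol."""
    v = list(v0)
    while True:
        w = bellman(v)
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
```

Here the fixed point is v = (19, 20): state 1 earns 2 per period forever (2 / (1 - 0.9) = 20), and state 0 moves there for reward 1 (1 + 0.9 * 20 = 19).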

2.
Stability is fundamental to the operation of a control system, but optimality is the ultimate goal for achieving maximum performance. This paper investigates event-triggered pinning optimal consensus control for a switched multi-agent system (SMAS) via a switched adaptive dynamic programming (ADP) method. The technical contribution lies in two aspects. On the one hand, to optimize control performance and ensure consensus, a switched local value function (SLVF) and a minimum-error switching law are constructed. Based on the SLVF, a switched ADP policy iteration algorithm is proposed, and its convergence and optimality are proved. On the other hand, since it is impractical to install a controller on every agent, a pinning strategy is developed to guide the placement of the ADP controllers, which reduces the waste of control resources. A new condition is constructed to determine the minimum number of controlled vertices of the SMAS. Lastly, a numerical example is given to verify the effectiveness of the proposed method.
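The evaluate/improve loop that ADP policy iteration builds on can be illustrated on a much simpler problem. This sketch uses an invented two-state deterministic MDP and plain tabular iteration; the paper's switched, event-triggered, neural-network-based setting is far richer:

```python
GAMMA = 0.9
STATES = (0, 1)
ACTIONS = ("stay", "go")
NEXT = {(0, "stay"): 0, (0, "go"): 1, (1, "stay"): 1, (1, "go"): 0}
R = {(0, "stay"): 0.0, (0, "go"): 1.0, (1, "stay"): 2.0, (1, "go"): 0.0}

def evaluate(policy, sweeps=500):
    """Policy evaluation by repeated backups under a fixed policy."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [R[(s, policy[s])] + GAMMA * v[NEXT[(s, policy[s])]] for s in STATES]
    return v

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: "stay" for s in STATES}
    while True:
        v = evaluate(policy)
        improved = {
            s: max(ACTIONS, key=lambda a: R[(s, a)] + GAMMA * v[NEXT[(s, a)]])
            for s in STATES
        }
        if improved == policy:
            return policy, v
        policy = improved
```

On this toy instance the loop stops at the policy "go from state 0, stay in state 1", with values approximately (19, 20).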

3.
Negative dynamic programming for risk-sensitive control is studied. Under some compactness and semicontinuity assumptions the following results are proved: the convergence of the value iteration algorithm to the optimal expected total reward, the Borel measurability or upper semicontinuity of the optimal value functions, and the existence of an optimal stationary policy.

4.
In this work, we present a new algorithm for solving complex multi-stage optimization problems involving hard constraints and uncertainties, based on dynamic and multi-parametric programming techniques. Each echelon of the dynamic programming procedure, typically employed in the context of multi-stage optimization models, is interpreted as a multi-parametric optimization problem, with the present states and future decision variables being the parameters, while the present decisions are the corresponding optimization variables. This reformulation significantly reduces the dimension of the original problem, essentially to a set of lower-dimensional multi-parametric programs, which are sequentially solved. Furthermore, the use of sensitivity analysis circumvents non-convexities that naturally arise in constrained dynamic programming problems. The potential application of the proposed framework to robust constrained optimal control is highlighted.

5.
In this paper, a new systematic design procedure to stabilize continuous unified chaotic systems based on discrete sliding mode control (DSMC) is presented. In contrast to previous works, the concept of rippling control is introduced so that the design of the DSMC can be simplified and only a single controller is needed to realize chaos suppression. As expected, under the proposed DSMC law, the unified system can be stabilized in a ripple-effect manner, even when external uncertainty is present. Lastly, two examples are included to illustrate the effectiveness of the proposed rippling DSMC developed in this paper.

6.
We present intensional dynamic programming (IDP), a generic framework for structured dynamic programming over atomic, propositional and relational representations of states and actions. We first develop set-based dynamic programming and show its equivalence with classical dynamic programming. We then show how to describe state sets intensionally using any form of structured knowledge representation and obtain a generic algorithm that can optimally solve large, even infinite, MDPs without explicit state space enumeration. We derive two new Bellman backup operators and algorithms. In order to support the view of IDP as a Rosetta stone for structured dynamic programming, we review many existing techniques that employ either propositional or relational knowledge representation frameworks.

7.
The purpose of this paper is to draw a detailed comparison between Newton's method, as applied to discrete-time, unconstrained optimal control problems, and the second-order method known as differential dynamic programming (DDP). The main outcomes of the comparison are: (i) DDP does not coincide with Newton's method, but (ii) the methods are close enough that they have the same convergence rate, namely, quadratic.

The comparison also reveals some other facts of theoretical and computational interest. For example, the methods differ only in that Newton's method operates on a linear approximation of the state at a certain point, whereas DDP operates on the exact value. This would suggest that DDP ought to be more accurate, an anticipation borne out in our computational example. Also, the positive definiteness of the Hessian of the objective function is easy to check within the framework of DDP. This enables one to propose a modification of DDP so that a descent direction is produced at each iteration, regardless of the Hessian.

Efforts of the first author were partially supported by the South African Council for Scientific and Industrial Research, and those of the second author by NSF Grants Nos. CME-79-05010 and CEE-81-10778.

8.
9.
The problem of characterizing the minimum perturbations to parameters in future stages of a discrete dynamic program necessary to change the optimal first policy is considered. Lower bounds on these perturbations are derived and used to establish ranges for the reward functions over which the optimal first policy is robust. A numerical example is presented to illustrate factors affecting the tightness of these bounds.

10.
This paper investigates a dynamic event-triggered optimal control problem of discrete-time (DT) nonlinear Markov jump systems (MJSs) via policy iteration (PI) adaptive dynamic programming (ADP) algorithms. The performance index function (PIF) defined in each subsystem is updated by utilizing an online PI algorithm, and the corresponding control policy is derived via solving the optimal PIF. Then, we adopt neural network (NN) techniques, including an actor network and a critic network, to estimate the iterative PIF and control policy. Moreover, the designed dynamic event-triggered mechanism (DETM) is employed to avoid wasting additional resources when the estimated iterative control policy is updated. Finally, based on the Lyapunov difference method, it is proved that the system stability and the convergence of all signals can be guaranteed under the developed control scheme. A simulation example for DT nonlinear MJSs with two system modes is presented to demonstrate the feasibility of the control design scheme.

11.
We propose an efficient dynamic programming algorithm for solving a bilevel program where the leader controls the capacity of a knapsack, and the follower solves the resulting knapsack problem. We propose new recursive rules and show how to solve the problem as a sequence of two standard knapsack problems.
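A useful observation behind such bilevel schemes is that one pass of the standard 0/1 knapsack recursion already yields the follower's optimal value for every capacity the leader might choose, so the leader can then scan capacities against its own objective. The sketch below is a generic illustration with invented weights and values, not the authors' specific recursive rules:

```python
def knapsack_all_capacities(weights, values, max_cap):
    """Standard 0/1 knapsack DP; returns dp where dp[c] is the follower's
    optimal value at capacity c, for every c in 0..max_cap."""
    dp = [0] * (max_cap + 1)
    for w, v in zip(weights, values):
        # Iterate capacities downward so each item is used at most once.
        for c in range(max_cap, w - 1, -1):
            dp[c] = max(dp[c], dp[c - w] + v)
    return dp
```

For example, with items of weights (2, 3, 4) and values (3, 4, 5), the table over capacities 0..5 is [0, 0, 3, 4, 5, 7]; a leader choosing among capacities reads its entry directly.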

12.
13.
A class of nonseparable dynamic programming problems
A solution procedure is proposed for a class of deterministic sequential decision problems whose objective functions are of the form f_n(x_n) + φ(g_n(x_n)), where φ is differentiable and either concave or convex. The procedure calls for collaboration between dynamic programming and c-programming, and is demonstrated in our treatment of a minimum-variance-type problem.

14.
The new algorithm presented here solves medium-size multi-dimensional dynamic programming problems in relatively short computational time with no fast-memory restrictions. The algorithm converges to the global optimal solution under some differentiability and convexity assumptions. The procedure is to solve a succession of dynamic programming problems whose state sets are limited to only a very small subset of the original state space. The interrelated definition of state sets for successive subproblems facilitates algorithmic convergence while moving the subsets until, in the end, they contain the optimal states.

15.

We consider optimal control problems for systems described by stochastic differential equations with delay (SDDEs). We prove a version of Bellman's principle of optimality (the dynamic programming principle) for a general class of such problems. Generality here means that both the dynamics and the cost may depend on the past in a general way. As an application, we study systems where the value function depends on the past only through some weighted average. For such systems we obtain a Hamilton-Jacobi-Bellman partial differential equation that the value function must satisfy if it is smooth enough. The weak uniqueness of the SDDEs we consider is our main tool in proving the result. Notions of strong and weak uniqueness for SDDEs are introduced, and we prove that strong uniqueness implies weak uniqueness, just as for ordinary stochastic differential equations.

16.
This paper is concerned with a class of 2-dimensional spatiotemporal discrete systems (2d spatiotemporal discrete systems), also called 2-dimensional and 2-directional discrete systems (2d-2D discrete systems). Sufficient conditions for such a system to be stable, together with illustrative examples of chaos in the sense of Devaney and of Li-Yorke, are derived and discussed.

17.
In this paper we revisit an existing dynamic programming algorithm for finding optimal subtrees in edge-weighted trees. This algorithm was sketched by Maffioli in a technical report in 1991. First, we adapt this algorithm for application to trees that can have both node and edge weights. Second, we extend the algorithm such that it does not only deliver the values of optimal trees, but also the trees themselves. Finally, we use our extended algorithm for developing heuristics for the k-cardinality tree problem in undirected graphs G with node and edge weights. This NP-hard problem consists of finding in the given graph a tree with exactly k edges such that the sum of the node and the edge weights is minimal. In order to show the usefulness of our heuristics we conduct an extensive computational analysis that covers most of the existing problem instances. Our results show that with growing problem size the proposed heuristics reach the performance of state-of-the-art metaheuristics. Therefore, this study can be seen as a cautious note on the scaling of metaheuristics.
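On trees with both node and edge weights, a DP of the kind sketched above can be realized by rooting the tree and merging children with a knapsack-style combination, so that each vertex records the best subtree of every cardinality hanging below it. The following is an illustrative reimplementation with invented data, not the authors' code, and it returns only the optimal value (their extension also recovers the tree itself):

```python
from math import inf

def k_min_subtree(n, edges, node_w, k):
    """Minimum-weight connected subtree with exactly k edges in a tree.

    edges: list of (u, v, w) describing an undirected tree on nodes 0..n-1.
    node_w: node weights. Returns the minimum node-plus-edge weight sum.
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    dp = [None] * n  # dp[v][j]: best weight of a j-edge subtree rooted at v

    def dfs(v, parent):
        dp[v] = [node_w[v]]  # zero edges: the vertex alone
        for c, w in adj[v]:
            if c == parent:
                continue
            dfs(c, v)
            # Knapsack-style merge of the child's table into v's table.
            merged = [inf] * min(k + 1, len(dp[v]) + len(dp[c]))
            for i, a in enumerate(dp[v]):
                merged[i] = min(merged[i], a)
                for j, b in enumerate(dp[c]):
                    t = i + j + 1  # +1 for the connecting edge (v, c)
                    if t < len(merged):
                        merged[t] = min(merged[t], a + b + w)
            dp[v] = merged

    dfs(0, -1)
    # The optimal subtree has a unique highest vertex in the rooted tree.
    return min(dp[v][k] for v in range(n) if len(dp[v]) > k)
```

On a path-plus-leaf example (edges (0,1), (1,2) of weight 1, edge (1,3) of weight 5; node weights 2, 1, 1, 1) the best 2-edge subtree is {0, 1, 2} with total weight 6.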

18.
This paper is concerned with classical concave cost multi-echelon production/inventory control problems studied by W. Zangwill and others. It is well known that the problem with m production steps and n time periods can be solved by a dynamic programming algorithm in O(n^4 m) steps, which is considered the fastest algorithm for solving this class of problems. In this paper, we will show that an alternative 0-1 integer programming approach can solve the same problem much faster, particularly when n is large and the number of 0-1 integer variables is relatively small. This class of problems includes, among others, problems with set-up cost functions and piecewise linear cost functions with few linear pieces. The new approach can also solve problems with mixed concave/convex cost functions, which cannot be solved by dynamic programming algorithms.

19.
We show the existence of an average cost (AC-) optimal policy for an inventory system with uncountable state space; in fact, the AC-optimal cost and an AC-optimal stationary policy are explicitly computed. To do this, we use a variant of the vanishing discount factor approach, which has been intensively studied in recent years, but the available results do not cover the inventory problem we are interested in. The work of the first author (OVA) was partially supported by Fondo del Sistema de Investigación del Mar de Cortéz under grant SIMAC/94/CT-005. The work of the second author (RMdO) was partially supported by Consejo Nacional de Ciencia y Tecnología (CONACyT) under grant 0635P-E9506.

20.
This paper addresses Markov Decision Processes over compact state and action spaces. We investigate the special case of linear dynamics and piecewise-linear and convex immediate costs for the average cost criterion. This model is very general and covers many interesting examples, for instance in inventory management. Due to the curse of dimensionality, the problem is intractable and optimal policies usually cannot be computed, not even for instances of moderate size.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号