


ON THE PROPERTIES OF ε(≥0) OPTIMAL POLICIES IN DISCOUNTED UNBOUNDED RETURN MODEL

董泽清 张昇《应用数学学报(英文版)》1987,(1)

This paper investigates the properties of $\varepsilon(\ge 0)$-optimal policies in the model of [2]. It is shown that if $\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)$ is a $\beta$-discounted optimal policy, then $(\pi_0^*, \pi_1^*, \ldots, \pi_n^*)^\infty$ is also a $\beta$-discounted optimal policy for all $n \ge 0$. Under some condition we prove that the stochastic stationary policy $\pi_n^{*\infty}$ corresponding to the decision rule $\pi_n^*$ is also optimal for the same discount factor $\beta$. We also show that each $\beta$-optimal stochastic stationary policy $\pi_0^{*\infty}$ can be decomposed into several decision rules whose corresponding stationary policies are each $\beta$-optimal; conversely, a proper convex combination of these decision rules coincides with the original $\pi_0^*$. We further prove that for any $(\varepsilon, \beta)$-optimal policy $\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)$, the policy $(\pi_0^*, \pi_1^*, \ldots, \pi_{n-1}^*)^\infty$ is $((1-\beta^n)^{-1}\varepsilon, \beta)$-optimal for $n > 0$. At the end of this paper we mention that the results about convex combinations and de

Properties of Stationary Flows without After-Effects and Their Applications

董泽清 林元烈《数学学报》1984,27(1):82-95

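This paper further investigates the properties of stationary flows without after-effects and gives several new, easily verifiable necessary and sufficient conditions. The results are then applied to obtain the distribution of customers' actual waiting time under statistical equilibrium for several queueing systems.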

Structure of Optimal Policies in the Discounted Semi-Markov Model with Unbounded Rewards


董泽清 刘克《中国科学A辑》1985,28(11):975-985

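This paper studies the structure of optimal policies for Lippmann-type discounted semi-Markov decision programming with unbounded rewards (abbreviated URSMDP). We prove the following. Given any policy, if it is α-discounted optimal, then the corresponding stochastic stationary policy is also discounted optimal for the same α. For any integer n≥1, we also give a sufficient condition for the corresponding policy (under an appropriate history) to be α-discounted optimal as well. Every stochastic stationary α-discounted optimal policy can be decomposed into a convex combination of several deterministic stationary policies that are optimal for the same α. This settles the question of the structure of optimal policies for this model fairly completely.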

STRUCTURE OF OPTIMAL POLICIES FOR DISCOUNTED SEMI-MARKOV DECISION PROGRAMMING WITH UNBOUNDED REWARDS

董泽清 刘克《数学进展》1985,(1)

In this paper, we discuss the structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards: {S, (A(i), i∈S), q, t, r, V_α}, where the state space S is a countable set; in state i∈S, the available action set A(i) is any set, and (A(i),(i)) is a measurable space; q is a time-homogeneous family of state jumps; t is a family of distributions of the state jump times, which likewise depends only on the current state and the current action; and V_α is the α-discounted total expected reward.

The Current State and Prospects of Markov Decision Programming

董泽清《运筹学学报》1987,(2)

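§1. Introduction. In humanity's effort to conquer nature and transform the world, nothing is more fascinating than the ability to predict a system's future and to control (or at least influence) its future development. Markov decision programming (abbreviated MDP) is the discipline that studies how to control the future development of Markovian stochastic systems; it can also be described as the study of optimal sequential decisions for such systems. Such a system requires decisions at a sequence of time points (possibly even continuously in time). At each observation epoch, the decision maker, based on the observed state of the system, selects one of the actions (measures, plans, etc.) available to it, that is, makes a decision. This causes two things to happen: (i) a certain effect is obtained; (ii) it can determine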

EXISTENCE OF OPTIMAL POLICY FOR TIME NON-HOMOGENEOUS DISCOUNTED MARKOVIAN DECISION PROGRAMMING

郭世贞 董泽清《应用数学学报(英文版)》1990,(4)

In this paper we discuss discrete, time non-homogeneous discounted Markovian decision programming, where the state space and all action sets are countable. Assuming that the optimal value function is finite, we give necessary and sufficient conditions for the existence of an optimal policy. We also give necessary and sufficient conditions for the existence of an optimal policy under the assumption that the absolute mean of rewards is relatively bounded.

Structure of Optimal Policies in the Discounted Model

董泽清 刘克《数学研究及应用》1986,6(3):125-134

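This paper studies the structure of optimal policies for discounted Markov decision programming (abbreviated MDP below). We prove that, given any policy $\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)$, if it is $\beta$-discounted optimal, then the corresponding stochastic stationary policy is also $\beta$-discounted optimal; for any $n (\ge 1)$, we also give a sufficient condition for the corresponding stochastic stationary policy to be $\beta$-discounted optimal. We also prove that, given any stochastic stationary policy $\pi_0$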

Accelerated Approximation Algorithms and the Minimum-Variance Problem in Markov Decision Programming

董泽清《数学学报》1978,21(2):135-150

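We consider discounted Markov decision programming (called Markov decision processes by some authors) in which the state space and the set of decisions available in each state are countably infinite, the transition laws form a substochastic family, and the reward function is bounded. We give an accelerated successive approximation algorithm for finding an (ε-)optimal stationary policy. It converges to an (ε-)optimal solution faster than White's successive approximation algorithm, and it is combined with a test criterion for non-optimal policies, which makes the algorithm more efficient still. Let β be the discount factor. In general, β-optimal (or (ε,β)-optimal) stationary policies are not unique, and there may even be as many of them as there are policies in the class of stationary policies. It is therefore natural to look, among the β-optimal (or (ε,β)-optimal) stationary policies, for one whose variance is (ε-)minimal uniformly with respect to the initial state. We prove that such a policy does exist and give an algorithm for obtaining it.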
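
For orientation, the baseline that the abstract above compares against, successive approximation (often called value iteration), can be sketched for a small finite model. This is only a minimal illustration and not the accelerated algorithm of the paper: the function name `value_iteration`, the nested-list layout of `P` and `r`, the stopping threshold, and the two-state toy data are all assumptions made here, and the paper itself treats countably infinite state and decision sets.

```python
def value_iteration(P, r, beta, eps=1e-8, max_iter=100_000):
    """Plain successive approximation for a finite discounted MDP (illustrative).

    P[s][a][t]: probability of moving from state s to state t under action a
                (a row may sum to less than 1, i.e. be substochastic).
    r[s][a]:    bounded one-step reward for action a in state s.
    beta:       discount factor, 0 < beta < 1.
    Returns (v, policy), where policy[s] is a greedy action with respect to v.
    """
    n = len(P)
    v = [0.0] * n

    def q(s, a, values):
        # One-step reward plus discounted expected value of the next state.
        return r[s][a] + beta * sum(p * values[t] for t, p in enumerate(P[s][a]))

    # Standard stopping rule: a sup-norm change below this threshold
    # makes the greedy policy eps-optimal.
    threshold = eps * (1 - beta) / (2 * beta)
    for _ in range(max_iter):
        v_new = [max(q(s, a, v) for a in range(len(P[s]))) for s in range(n)]
        gap = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if gap < threshold:
            break

    policy = [max(range(len(P[s])), key=lambda a: q(s, a, v)) for s in range(n)]
    return v, policy


# Hypothetical two-state example: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[1.0, 0.0],
     [2.0, 0.0]]
v, policy = value_iteration(P, r, beta=0.9)
```

In this toy model, staying in state 1 is worth 2/(1-0.9) = 20, so from state 0 it is better to switch (0 + 0.9·20 = 18) than to stay (1/(1-0.9) = 10); the iteration should therefore settle near v = [18, 20] with policy [1, 0].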

Preordered Dynamic Programming and Preordered Markov Decision Programming

董泽清《应用数学学报》1980,(3)

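We study infinite-horizon preordered models. We give a set of conditions weaker than Sobel's assumptions, so our model applies more widely. We also carry out a deeper study and obtain a series of new results, and we give sufficient conditions for the convergence of the algorithm. The study is conducted over a larger class of policies, so the results are stronger.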


Mathematical Simulation of the Mining and Excavation Process

徐光辉 董泽清《数学的实践与认识》1974,(4)

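§1. The problem. In open-pit mining, electric shovels excavate the ore, and trucks then haul it to the dump site. Suppose that n shovels excavate simultaneously and m trucks do the hauling (m>n), and that the excavating capacity of the shovels and the load capacity of the trucks are known. Suppose further that the dump site has s unloading positions (s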
