Similar Articles
Found 20 similar articles (search time: 838 ms)
1.
In this paper, total reward stochastic games are surveyed. Total reward games are motivated as a refinement of average reward games. The total reward is defined as the limiting average of the partial sums of the stream of payoffs. It is shown that total reward games with a finite state space are strategically equivalent to a class of average reward games with a countably infinite state space. The role of stationary strategies in total reward games is investigated in detail. Further, it is shown that the total reward value exists for total reward games whose average reward value is 0 and in which both players possess average-reward-optimal stationary strategies.
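As a worked illustration of the criterion described above (the notation here is an assumption for exposition, not quoted from the survey), the total reward of a payoff stream r_1, r_2, ... under a strategy pair (\pi, \sigma) starting in state s is commonly written as the limiting average of its partial sums:

\phi(s,\pi,\sigma) \;=\; \liminf_{N\to\infty} \frac{1}{N}\, \mathbb{E}_{s,\pi,\sigma}\!\left[\sum_{n=1}^{N}\sum_{t=1}^{n} r_t\right]

which is how the criterion can refine the average reward when the average reward value is 0.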

2.
Abstract

In this article, we study continuous-time Markov decision processes in Polish spaces. The optimality criterion to be maximized is the expected discounted criterion. The transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. We provide conditions on the controlled system's primitive data under which we prove that the transition functions of possibly non-homogeneous continuous-time Markov processes are regular by using Feller's construction approach to such transition functions. Then, under continuity and compactness conditions we prove the existence of optimal stationary policies by using the technique of extended infinitesimal operators associated with the transition functions of possibly non-homogeneous continuous-time Markov processes, and also provide a recursive way to compute (or at least to approximate) the optimal reward values. The conditions provided in this paper are different from those used in the previous literature, and they are illustrated with an example.
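For intuition only, the following is a minimal Python sketch of the expected discounted criterion for a continuous-time MDP in the much simpler finite-state, bounded-rate case, solved by the standard uniformization and value iteration construction. This is not the paper's technique (which works in Polish spaces with unbounded rates), and all data are toy values.

import numpy as np

np.random.seed(0)
n_states, n_actions = 3, 2
alpha = 0.1                                        # discount rate
r = np.random.rand(n_states, n_actions)           # reward rates r(s, a) (toy)
q = np.random.rand(n_states, n_actions, n_states)  # transition rates q(s, a, s')
for s in range(n_states):
    q[s, :, s] = 0.0
    q[s, :, s] = -q[s].sum(axis=1)                 # diagonal = minus total exit rate

Lam = np.abs(q).max() + 1.0                        # uniformization constant
v = np.zeros(n_states)
for _ in range(5000):
    Q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            p = q[s, a] / Lam
            p[s] += 1.0                            # transition law of the uniformized chain
            Q[s, a] = (r[s, a] + Lam * (p @ v)) / (alpha + Lam)
    v_new = Q.max(axis=1)                          # Bellman operator, discounted criterion
    if np.max(np.abs(v_new - v)) < 1e-10:
        v = v_new
        break
    v = v_new
policy = Q.argmax(axis=1)                          # optimal stationary policy of the toy problem
print("value function:", v, "policy:", policy)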

3.
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n → ∞. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may be different for each of these criteria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
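The kind of occupation-measure linear program used for constrained average-reward problems can be sketched in Python as follows; this is a minimal unichain example with made-up transition, reward, sojourn-time and cost data (not taken from the paper), using the ratio-average normalization.

import numpy as np
from scipy.optimize import linprog

S, A = 2, 2                                    # states and actions (toy sizes)
P = np.array([[[0.9, 0.1], [0.4, 0.6]],        # P[s, a] = distribution of the next state
              [[0.3, 0.7], [0.8, 0.2]]])
r = np.array([[1.0, 2.0], [0.5, 3.0]])         # expected reward per decision epoch
tau = np.array([[1.0, 2.0], [1.0, 1.5]])       # expected sojourn time per epoch
c = np.array([[0.2, 1.0], [0.1, 2.0]])         # cost entering the single constraint
C = 1.2                                        # constraint level

idx = lambda s, a: s * A + a                   # flattened occupation measure x[s, a]
A_eq = np.zeros((S + 1, S * A))
b_eq = np.zeros(S + 1)
for sp in range(S):                            # balance equations of the embedded chain
    for a in range(A):
        A_eq[sp, idx(sp, a)] += 1.0
    for s in range(S):
        for a in range(A):
            A_eq[sp, idx(s, a)] -= P[s, a, sp]
A_eq[S] = tau.flatten()                        # normalization: sum_{s,a} x * tau = 1
b_eq[S] = 1.0

res = linprog(-r.flatten(), A_ub=c.flatten()[None, :], b_ub=[C],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(S, A)
policy = x / x.sum(axis=1, keepdims=True)      # assumes positive occupation in every state
print("optimal average reward:", -res.fun)
print("randomized stationary policy:\n", policy)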

4.
程永生 (Cheng Yongsheng), 《运筹与管理》 (Operations Research and Management Science), 2020, 29(12): 231-239
This paper optimizes referral reward strategies from the perspective of consumer sociability. Based on utility theory, it analyzes consumers' purchase and referral behavior. A basic model maximizing current-period profit and an extended model incorporating long-term benefits are constructed to optimize product price and discount, characterize the regions in which high- and low-discount strategies apply, and examine how consumers' social ability affects firm profit and social performance. The results show that a high-discount strategy is suitable when differences in consumers' social ability are small, and a low-discount strategy is suitable otherwise. When the firm takes long-term benefits into account, its optimal strategy is a free-product strategy. As environmental parameters change, switching the optimal reward strategy from a high discount to a low discount entails a welfare loss, but a low-discount strategy does not necessarily reduce social welfare. In addition, the paper offers an economic explanation for the inflated prices of certain goods observed in practice. The study provides a basis for firms to set reward policies when implementing referral programs.

5.
We examine referral reward programs (RRP) that are intended for a service firm to encourage its current customers (inductors) to entice their friends (inductees) to purchase the firm’s service. By considering the interplay among the firm, the inductor, and the inductee, we solve a “nested” Stackelberg game so as to determine the optimal RRP in equilibrium. We determine the conditions under which it is optimal for the firm to reward the inductor only, reward the inductee only, or reward both. Also, our results suggest that RRP dominates direct marketing when the firm’s current market penetration or the inductor’s referral effectiveness is sufficiently high. We then extend our model to incorporate certain key impression management factors: the inductor’s intrinsic reward of making a positive impression by being seen as helping a friend, the inductor’s concerns about creating a negative impression when making an incentivized referral, and the inductee’s impression of the inductor’s credibility when an incentive is involved. In the presence of these impression management factors, we show that the firm should reward the inductee more and the inductor less. Under certain conditions, it is optimal for the firm to reward neither the inductor nor the inductee so that the optimal RRP relies purely on unincentivized word of mouth.
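A toy backward-induction sketch of such a nested game is given below; all functional forms, parameter names, and numbers are assumptions for illustration, not the paper's model. The firm picks the two rewards anticipating the inductee's purchase decision and the inductor's referral decision.

import itertools
import numpy as np

price, margin = 10.0, 6.0          # service price and firm margin per sale (toy values)
v_inductee = 9.0                   # inductee's valuation of the service
referral_cost = 0.5                # inductor's effort cost of referring
intrinsic_benefit = 0.3            # inductor's warm-glow from helping a friend

best = None
for r_or, r_ee in itertools.product(np.arange(0, 5.1, 0.5), repeat=2):
    buys = v_inductee + r_ee >= price                          # inductee's best response
    refers = buys and (intrinsic_benefit + r_or - referral_cost >= 0)  # inductor's best response
    profit = (margin - r_or - r_ee) if (refers and buys) else 0.0
    if best is None or profit > best[0]:
        best = (profit, r_or, r_ee)
print("profit=%.2f, reward to inductor=%.1f, reward to inductee=%.1f" % best)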

6.
In this paper, infinite-horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's principle of optimality applies to our model. Then, a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iteration algorithm for finding a stationary optimal policy is designed. The algorithm is a generalization of Howard's [7] and Iwamoto's [3] algorithms. This research was supported by the National Natural Science Foundation of China.
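For the standard additive-reward special case, a Howard-style policy iteration of the kind generalized here can be sketched as follows (toy data; the paper's algorithm additionally handles recursive reward functions).

import numpy as np

np.random.seed(1)
S, A, beta = 4, 3, 0.9
P = np.random.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = distribution over next states
r = np.random.rand(S, A)                           # stage rewards

policy = np.zeros(S, dtype=int)
while True:
    # Policy evaluation: solve (I - beta * P_pi) v = r_pi
    P_pi = P[np.arange(S), policy]
    r_pi = r[np.arange(S), policy]
    v = np.linalg.solve(np.eye(S) - beta * P_pi, r_pi)
    # Policy improvement
    Q = r + beta * (P @ v)                          # Q[s, a]
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("optimal stationary policy:", policy)
print("value function:", v)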

7.
We develop a model to explore firms’ decisions on designing referral reward programs in freemium. We find two indices that firms could compare against value discrepancy to choose their strategy. The optimal price and referral reward rate are obtained. We find that the absence of network externality relating to product quality makes referral dysfunctional, and we conduct sensitivity analysis. This study contributes to theories that incorporate a referral reward program into the freemium model and provides practitioners with viable takeaways.

8.
Optimization, 2012, 61(1): 113-121
Finite horizon stochastic dynamic decision processes with R^p-valued additive returns are considered. The optimization criterion is a partial-order preference relation induced from a convex cone in R^p. The state space is a countable set, and the action space is a compact metric space. The optimal value function, which is a set-valued mapping, is defined. Under certain assumptions on the continuity of the reward vector and the transition probability, a system of recursive set-relations concerning the optimal value functions is given.

9.
Optimization, 2012, 61(2): 313-317
A multi-stage stochastic decision model with a general reward functional is considered. We obtain results on optimality criteria and on the existence of optimal strategies that generalize well-known theorems of the classical theory of dynamic programming.

10.
We consider three types of discrete-time deterministic dynamic programs (DP's) on one-dimensional state spaces whose reward functions depend on both state and action, namely, type I: finite-stage DP's with invertible terminal reward function, type II: finite-stage DP's without terminal reward function, and type III: infinite-stage DP's with additive reward function. Types I and II have a general objective function, which is backwards recursively generated by stage-wise reward functions. Given a (main) DP, an inverse DP yielding a new expression is defined. The inverse DP has an additive expression of the objective function. Deriving recursive formulae, we establish Inverse Theorems between main and inverse DP's. Each Inverse Theorem is applied to Bellman's multi-stage allocation process. Uniqueness of the solution to an inverse functional equation is proved.

11.
We present a conceptual framework within which we can analyze simple reward schemes for classifier systems. The framework consists of a set of classifiers, a learning mechanism, and a finite automaton environment that outputs payoff. We find that many reward schemes have negative biases that degrade system performance. We analyze bucket brigade schemes, which are subgoal reward schemes, and profit sharing schemes, which aren't. By contrasting these schemes, we hope to better understand the place of subgoal reward in learning and evolution. © 1998 John Wiley & Sons, Inc.
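The contrast between the two schemes can be sketched as follows on a single classifier chain; the update rules and parameters below are simplified assumptions for illustration, not the exact mechanisms analyzed in the article.

def bucket_brigade(strengths, chain, payoff, bid=0.1):
    """Each firing classifier pays a bid to its predecessor; only the last one
    receives the environmental payoff (a subgoal-reward scheme)."""
    s = dict(strengths)
    for i, c in enumerate(chain):
        b = bid * s[c]
        s[c] -= b                       # classifier pays its bid
        if i > 0:
            s[chain[i - 1]] += b        # bid goes to the previous classifier
    s[chain[-1]] += payoff              # only the last classifier sees the payoff
    return s

def profit_sharing(strengths, chain, payoff, share=0.1):
    """Every classifier active during the episode receives a share of the
    final payoff (no subgoal reward)."""
    s = dict(strengths)
    for c in chain:
        s[c] += share * payoff
    return s

init = {"c1": 10.0, "c2": 10.0, "c3": 10.0}
print(bucket_brigade(init, ["c1", "c2", "c3"], payoff=5.0))
print(profit_sharing(init, ["c1", "c2", "c3"], payoff=5.0))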

12.
A dynamic programming problem is called invariant if its transition mechanism depends only on the action taken and does not depend on the current state of the system. Replacement and maintenance problems are two typical types of problems which arise in applications and are often invariant. The paper studies properties of invariant problems when the state space is arbitrary and the action space is finite. The main result is a method of obtaining optimal policies for this case when the optimality criterion is that of maximizing the average reward per unit time. Results are illustrated by examples.

13.
Multi-Armed bandit problem revisited
In this paper, we revisit aspects of the multi-armed bandit problem in the earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule is derived under the discounted reward criterion. The proof does not involve an explicit use of the interchange argument. The ideas of the proof are extended to derive the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined. The properties of an optimal policy for an arm-acquiring bandit are discussed. This research was supported by NSF Grant IRI-91-20074.
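As a small illustration of the index rule (a toy special case with deterministic, known reward streams, not the stochastic setting of the paper), the Gittins index of such an arm is the largest discounted-reward-to-discounted-time ratio over truncation horizons, and the rule plays an arm with the largest index.

import numpy as np

def gittins_index_deterministic(rewards, beta=0.9):
    """Index = max over horizons tau of sum_{t<tau} beta^t r_t / sum_{t<tau} beta^t."""
    best = -np.inf
    num = den = 0.0
    for t, r in enumerate(rewards):
        num += beta**t * r
        den += beta**t
        best = max(best, num / den)
    return best

arms = {"A": [5, 1, 1, 1], "B": [3, 3, 3, 3], "C": [0, 6, 6, 0]}
indices = {name: gittins_index_deterministic(stream) for name, stream in arms.items()}
print(indices)          # the index rule plays an arm with the largest index first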

14.
A study of the relationship between reputation effects and managerial compensation contracts
This paper examines the influence of reputation on managerial compensation contracts. First, it analyzes the components of the manager's utility function and introduces reputation, an implicit incentive and constraint factor, into that utility function. It then builds a compensation incentive model for the manager under asymmetric information, analyzes how the reputation coefficient affects each element of the contract, and shows that the owner can exploit the manager's reputation effect to design a more favorable compensation contract. The main conclusions are given at the end of the paper.

15.
16.
Finite and infinite planning horizon Markov decision problems are formulated for a class of jump processes with general state and action spaces and controls which are measurable functions on the time axis taking values in an appropriate metrizable vector space. For the finite horizon problem, the maximum expected reward is the unique solution, which exists, of a certain differential equation and is a strongly continuous function in the space of upper semi-continuous functions. A necessary and sufficient condition is provided for an admissible control to be optimal, and a sufficient condition is provided for the existence of a measurable optimal policy. For the infinite horizon problem, the maximum expected total reward is the fixed point of a certain operator on the space of upper semi-continuous functions. A stationary policy is optimal over all measurable policies in the transient and discounted cases as well as, with certain added conditions, in the positive and negative cases.

17.
Optimization, 2012, 61(3): 247-259
The paper deals with vector-valued semi-Markovian decision processes (VSMDPs). We derive a suitable definition of the optimal average reward of a VSMDP and construct an algorithm for improving policies in the vector-valued case.

18.
We introduce semi-Markov fields and provide formulations for the basic terms in the semi-Markov theory. In particular we define and consider a class of associated reward fields. Then we present a formula for the expected reward at any multidimensional time epoch. The formula is indeed new even for the classical semi-Markov processes. It gives the expected cumulative reward for fairly large classes of reward functions; in particular, it provides the formulas for the expected cumulative reward given in Masuda and Sumitau (1991), Soltani (1996) and Soltani and Khorshidian (1998).

19.
Optimization, 2012, 61(11): 1761-1779
In this article, we study reward–risk ratio models under partially known distributions of the random variables, known as the robust (worst-case) performance ratio problem. Based on positively homogeneous and concave/convex measures of reward and risk, respectively, the new robust ratio model is reduced equivalently to convex optimization problems within a min–max optimization framework. Under certain partially specified distributions, the convex optimization problem is converted into a simple formulation involving the expectation reward measure and the conditional value-at-risk measure. Compared with existing reward–risk portfolio research, the proposed ratio model has two characteristics. First, the problem combines two different aspects: it considers an incomplete-information case arising from real-life uncertainty, and it focuses on the performance ratio optimization problem, which achieves the best balance between reward and risk. Second, the complicated optimization model is transformed into a simple convex optimization problem by a duality theorem, which improves the usability of the models. A generation asset allocation problem in power systems is presented to validate the new models.
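A sample-based sketch of the reward-to-CVaR performance ratio that the simplified formulation involves is given below; the scenario data and allocation are toy values, and the robust, partially-known-distribution treatment of the paper is not reproduced.

import numpy as np

np.random.seed(0)
scenarios = np.random.normal(loc=[0.08, 0.05, 0.12],
                             scale=[0.15, 0.05, 0.25], size=(1000, 3))  # toy return scenarios
weights = np.array([0.4, 0.4, 0.2])            # a candidate asset allocation

returns = scenarios @ weights                  # portfolio return in each scenario
reward = returns.mean()                        # expectation reward measure
alpha = 0.95
losses = -returns
var = np.quantile(losses, alpha)               # value-at-risk at level alpha
cvar = losses[losses >= var].mean()            # conditional value-at-risk (tail average)
ratio = reward / cvar                          # reward-risk performance ratio
print(f"reward={reward:.4f}, CVaR={cvar:.4f}, ratio={ratio:.4f}")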

20.
Value-function approximation is investigated for the solution via Dynamic Programming (DP) of continuous-state sequential N-stage decision problems, in which the reward to be maximized has an additive structure over a finite number of stages. Conditions that guarantee smoothness properties of the value function at each stage are derived. These properties are exploited to approximate such functions by means of certain nonlinear approximation schemes, which include splines of suitable order and Gaussian radial-basis networks with variable centers and widths. The accuracies of suboptimal solutions obtained by combining DP with these approximation tools are estimated. The results provide insight into the successful performance of value-function approximators in DP reported in the literature. The theoretical analysis is applied to a problem of optimal consumption, with simulation results illustrating the use of the proposed solution methodology. Numerical comparisons with classical linear approximators are presented.
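A minimal sketch of this kind of approximate dynamic programming, using a cubic spline as the value-function approximator at each stage of a finite-horizon optimal consumption problem, is given below; the utility function, return, and grids are illustrative assumptions rather than the paper's example.

import numpy as np
from scipy.interpolate import UnivariateSpline

T, R, beta = 5, 1.05, 0.95                       # horizon, gross return, discount factor
u = np.log                                       # stage utility of consumption
wealth_grid = np.linspace(0.1, 10.0, 60)
c_frac = np.linspace(0.01, 0.99, 50)             # consumption as a fraction of wealth

V_next = lambda w: np.zeros_like(np.asarray(w, dtype=float))   # terminal value V_T = 0
for t in reversed(range(T)):
    values = np.empty_like(wealth_grid)
    for i, w in enumerate(wealth_grid):
        c = c_frac * w                           # candidate consumption levels
        w_next = np.clip(R * (w - c), wealth_grid[0], wealth_grid[-1])  # stay on the grid
        values[i] = np.max(u(c) + beta * V_next(w_next))
    V_next = UnivariateSpline(wealth_grid, values, k=3, s=0.0)  # stage-t value approximator
print("approximate value at t=0, wealth=5:", float(V_next(5.0)))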
