Similar Documents (20 results)
1.
《Optimization》2012,61(6):1017-1026
A stochastic decision model with a general, not necessarily additive reward function is considered, and essential properties of such reward functions are formulated which allow a stage-by-stage solution in the sense of dynamic programming. Conditions for recursive-additive reward functions are given which ensure the existence of optimal strategies and the applicability of value iteration for finding an optimal policy and the optimal total reward.
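As a concrete illustration of the value-iteration procedure such conditions license, the sketch below computes an optimal policy for a small finite MDP with discounted additive rewards. The transition probabilities, rewards, and discount factor are invented for illustration and are not taken from the paper.

import numpy as np

# Hypothetical 3-state, 2-action MDP; P[a][s][s'] are transition probabilities,
# R[a][s] are one-step rewards, beta is the discount factor.
P = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],
              [[0.3, 0.4, 0.3], [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]])
R = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.5, 0.0]])
beta = 0.9

V = np.zeros(3)
for _ in range(1000):                      # value iteration
    Q = R + beta * P @ V                   # Q[a][s] = r(s,a) + beta * sum_s' p(s'|s,a) V(s')
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:  # stop once the fixed point is (numerically) reached
        break
    V = V_new

policy = Q.argmax(axis=0)                  # greedy policy w.r.t. the converged values
print("optimal discounted total reward per state:", V)
print("optimal policy:", policy)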

2.
《Optimization》2012,61(5):733-742
In this note we consider a non-stationary stochastic decision model with vector-valued rewards. Based on Pareto optimality, we define the maximal total reward as the set of vector-valued total rewards that have no successor with respect to the underlying partial order. The principle of optimality is derived. Using the well-known von Neumann-Morgenstern property, we formulate a Bellman equation, which consists of a system of iterative set relations.
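The set-valued Bellman recursion described here hinges on discarding dominated total-reward vectors at each stage. A minimal sketch of that Pareto-filtering step follows; the reward vectors are made up purely for illustration.

import numpy as np

def pareto_maximal(vectors):
    """Keep only reward vectors that have no successor, i.e. are not
    componentwise dominated by another vector in the set."""
    keep = []
    for i, v in enumerate(vectors):
        dominated = any(np.all(w >= v) and np.any(w > v)
                        for j, w in enumerate(vectors) if j != i)
        if not dominated:
            keep.append(v)
    return keep

# Illustrative set of vector-valued total rewards for one state/stage.
candidates = [np.array([3.0, 1.0]), np.array([2.0, 2.0]),
              np.array([1.0, 1.0]), np.array([3.0, 2.0])]
print(pareto_maximal(candidates))   # only [3, 2] survives; it dominates the rest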

3.
In the present paper, the reward paths in non-homogeneous semi-Markov systems in discrete time are examined with stochastic selection of the transition probabilities. The mean entrance probabilities and the mean rewards over time are evaluated. The rate of the total reward is then examined for the homogeneous case, and the mean total reward is evaluated by means of probability generating functions.

4.
This paper deals with constrained average reward Semi-Markov Decision Processes (SMDPs) with finite state and action sets. We consider two average reward criteria. The first criterion is time-average rewards, which equal the lower limits of the expected average rewards per unit time as the horizon tends to infinity. The second criterion is ratio-average rewards, which equal the lower limits of the ratios of the expected total rewards during the first n steps to the expected total duration of these n steps as n tends to infinity. For both criteria, we prove the existence of optimal mixed stationary policies for constrained problems when the constraints are of the same nature as the objective functions. For unichain problems, we show the existence of randomized stationary policies which are optimal for both criteria. However, optimal mixed stationary policies may differ between the two criteria even for unichain problems. We provide linear programming algorithms for the computation of optimal policies.
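Linear-programming formulations of this type work with state-action occupation measures x(s,a): maximize the average reward subject to flow balance, normalization, and the cost constraint. The sketch below is a minimal version for the discrete-time special case (unit sojourn times) under a unichain assumption, with hypothetical data that is not from the paper, using scipy.optimize.linprog.

import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action data: P[s][a][s'] transition probabilities,
# r[s][a] rewards, c[s][a] constraint costs, C an upper bound on the average cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.6, 0.4], [0.1, 0.9]]])
r = np.array([[1.0, 3.0], [0.5, 2.0]])
c = np.array([[0.0, 2.0], [0.0, 1.5]])
C = 1.0

nS, nA = 2, 2
idx = lambda s, a: s * nA + a              # flatten (s,a) pairs into LP variables x(s,a)

# Flow balance: sum_a x(s',a) - sum_{s,a} P[s][a][s'] x(s,a) = 0, plus normalization.
A_eq = np.zeros((nS + 1, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, idx(s, a)] -= P[s, a, sp]
    for a in range(nA):
        A_eq[sp, idx(sp, a)] += 1.0
A_eq[nS, :] = 1.0
b_eq = np.append(np.zeros(nS), 1.0)

A_ub = c.reshape(1, -1)                    # average-cost constraint
b_ub = np.array([C])

res = linprog(-r.reshape(-1), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (nS * nA))
x = res.x.reshape(nS, nA)
print("optimal average reward:", -res.fun)
print("randomized stationary policy:", x / x.sum(axis=1, keepdims=True))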

5.
This paper builds a probabilistic model to analyze the risk–reward tradeoffs that larger telecommunications network elements present. Larger machines offer rewards in the form of cost savings due to economies of scale. But large machines are riskier because they affect more customers when they fail. Our model translates the risk of outages into dollar costs, which are random variables. This step enables us to combine the deployment cost and outage cost into a total cost. Once we express the decision makers’ preferences via a utility function, we can find the machine size that minimizes the total cost’s expected utility, thereby achieving an optimal tradeoff between reward and risk. The expected utility answers the question “how big is too big?”.
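As an illustration of this kind of tradeoff calculation, the sketch below compares candidate machine sizes by the expected disutility of their total cost, combining a deployment cost with economies of scale and a random outage cost that grows with the number of affected customers. All cost, failure, and utility parameters are invented; this is not the paper's model.

import numpy as np

rng = np.random.default_rng(0)

def expected_disutility(size, demand=10_000, lam=0.002, n_sims=20_000):
    """Monte Carlo estimate of E[u(total cost)] for a given machine size.
    All parameters are invented for illustration only."""
    n_machines = int(np.ceil(demand / size))
    deploy_cost = n_machines * (50 + 2.0 * size ** 0.8)    # economies of scale per machine
    # Each machine fails independently; an outage costs $1 per affected customer.
    failures = rng.binomial(n_machines, 0.03, size=n_sims)
    outage_cost = failures * size * 1.0
    total_cost = deploy_cost + outage_cost
    return np.mean(np.exp(lam * total_cost))                # risk-averse (exponential) disutility

sizes = [250, 500, 1000, 2000, 5000]
best = min(sizes, key=expected_disutility)
print("machine size minimizing expected disutility:", best)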

6.
We consider discrete-time nonlinear controlled stochastic systems, modeled by controlled Markov chains with denumerable state space and compact action space. The corresponding stochastic control problem of maximizing average rewards in the long run is studied. Departing from the most common position, which uses expected values of rewards, we focus on a sample path analysis of the stream of states/rewards. Under a Lyapunov function condition, we show that stationary policies obtained from the average reward optimality equation are not only average reward optimal, but indeed sample path average reward optimal, for almost all sample paths. Research supported by a U.S.-México Collaborative Research Program funded by the National Science Foundation under grant NSF-INT 9201430, and by CONACyT-MEXICO; partially supported by the MAXTOR Foundation for Applied Probability and Statistics under grant No. 01-01-56/04-93; research partially supported by the Engineering Foundation under grant RI-A-93-10, and by a grant from the AT&T Foundation.

7.
We study the role of inequality using a tractable tournament model in which contestants with heterogeneous abilities are allocated a reward based on their relative effort. Using the Lorenz order, we focus on the effect of changing the degree of inequality in both the distribution of abilities and the distribution of rewards on the distribution of effort. When the cumulative distribution function (CDF) of rewards is concave, reducing inequality of abilities increases effort at every rank, and hence aggregate effort. In general, aggregate effort increases (decreases) if the CDF of rewards has an increasing (decreasing) failure rate. Further, unless the CDF of abilities is highly concave, increasing inequality of rewards always increases aggregate effort. Nonlinearity in the cost function also matters: the effect on aggregate effort of changing the degree of inequality in either the distribution of abilities or of rewards depends on the convexity of the cost function. In sum, our results challenge the current understanding of the role of inequality in tournaments by showing how the distribution of effort, including aggregate effort, crucially depends on the convexity of the cost function and of the CDFs of both rewards and abilities.

8.
As an extension of the discrete-time case, this note investigates the variance of the total cumulative reward for the embedded Markov chain of semi-Markov processes. Under the assumption that the chain is aperiodic and contains a single class of recurrent states, recursive formulae for the variance are obtained which show that the variance growth rate is asymptotically linear in time. Expressions are provided to compute this growth rate.
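The asymptotically linear variance growth can also be checked numerically. The sketch below uses a small illustrative chain (not from the paper), simulates many trajectories, and estimates the variance of the cumulative reward at several horizons; the ratio variance/n should stabilize.

import numpy as np

rng = np.random.default_rng(1)

# Illustrative aperiodic chain with a single recurrent class and per-state rewards.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, -0.5, 2.0])

def cumulative_reward(n_steps, n_paths=5000):
    """Simulate n_paths trajectories and return the total reward of each."""
    states = np.zeros(n_paths, dtype=int)
    totals = np.zeros(n_paths)
    for _ in range(n_steps):
        totals += r[states]
        u = rng.random(n_paths)
        cdf = P[states].cumsum(axis=1)
        states = (u[:, None] > cdf).sum(axis=1)   # sample the next state of every path
    return totals

for n in (100, 400, 1600):
    var = cumulative_reward(n).var()
    print(f"n={n:5d}  Var={var:9.1f}  Var/n={var / n:.3f}")   # Var/n approaches a constant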

9.
We consider a discrete-time Markov decision process (MDP) with a finite state space, a finite action space, and two kinds of immediate rewards. The problem is to maximize the time-average reward generated by one reward stream, subject to the other reward not being smaller than a prescribed value. An MDP with a reward constraint can be solved by linear programming over the class of mixed policies. When we restrict ourselves to pure policies, however, the problem is combinatorial, and no exact solution method is known. In this paper, we propose an approach based on Genetic Algorithms (GAs) in order to obtain an effective search process and a near-optimal, possibly optimal, pure stationary policy. A numerical example is given to examine the efficiency of the proposed approach.
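A genetic-algorithm search over pure stationary policies of this kind can be sketched roughly as follows. The toy MDP data, GA parameters, and penalty-based fitness are all invented for illustration and are not the paper's formulation; the chain induced by each policy is assumed unichain.

import numpy as np

rng = np.random.default_rng(2)

# Toy MDP: P[s][a][s'], main reward r1[s][a], constrained reward r2[s][a] with bound R2_MIN.
nS, nA = 4, 3
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
r1 = rng.uniform(0, 1, size=(nS, nA))
r2 = rng.uniform(0, 1, size=(nS, nA))
R2_MIN = 0.45

def average_rewards(policy):
    """Average rewards of a pure stationary policy (power iteration for the
    stationary distribution of the induced chain)."""
    Pp = P[np.arange(nS), policy]               # induced transition matrix
    mu = np.full(nS, 1.0 / nS)
    for _ in range(500):
        mu = mu @ Pp
    return mu @ r1[np.arange(nS), policy], mu @ r2[np.arange(nS), policy]

def fitness(policy):
    g1, g2 = average_rewards(policy)
    return g1 - 100.0 * max(0.0, R2_MIN - g2)   # heavy penalty if the constraint is violated

pop = rng.integers(0, nA, size=(30, nS))        # each individual encodes a pure policy
for _ in range(100):
    scores = np.array([fitness(p) for p in pop])
    parents = pop[np.argsort(scores)[-10:]]     # selection: keep the 10 fittest
    new_pop = []
    for _ in range(30):
        a, b = parents[rng.integers(0, 10, size=2)]
        child = np.where(rng.random(nS) < 0.5, a, b)             # uniform crossover
        mutate = rng.random(nS) < 0.1
        child[mutate] = rng.integers(0, nA, size=mutate.sum())   # random mutation
        new_pop.append(child)
    pop = np.array(new_pop)

best = max(pop, key=fitness)
print("best pure policy:", best, "average rewards:", average_rewards(best))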

10.
Consider a finite state irreducible Markov reward chain. It is shown that there exist simulation estimates and confidence intervals for the expected first passage times and rewards as well as the expected average reward, with 100% coverage probability. The length of the confidence intervals converges to zero with probability one as the sample size increases; it also satisfies a large deviations property.

11.
This paper deals with a stopping game in dynamic fuzzy systems with fuzzy rewards. We show that the optimal fuzzy reward is the unique solution of a fuzzy relational equation, and we estimate fuzzy rewards by introducing a fuzzy expectation with a density given by fuzzy goals. We prove a minimax theorem for fuzzy expected values and show the existence of the value of the game.

12.
Rewards for better quality, penalties for poorer quality, and the type of inspection policy are among the most common quality-related provisions of supply chain contracts. In this paper, we examine the effect of rewards, penalties, and inspection policies on the behaviour of an expected-cost-minimizing supplier. We assume that the supplier selects a batch size and target quality level in order to meet a buyer's deterministic demand. We show that the reward and/or penalty that motivates a supplier to deliver the buyer's target quality depends upon the inspection policy. We also show that, when sampling inspection is used, penalties and rewards are substitutes for one another in motivating the supplier and that there exists a unique reward/penalty combination at which the buyer's expected cost of quality is zero.

13.
We consider three types of discrete-time deterministic dynamic programs (DP's) on one-dimensional state spaces whose reward functions depend on both state and action, namely, type I: finite-stage DP's with an invertible terminal reward function; type II: finite-stage DP's without a terminal reward function; and type III: infinite-stage DP's with an additive reward function. Types I and II have a general objective function, which is generated backwards recursively by stage-wise reward functions. Given a (main) DP, an inverse DP yielding a new expression is defined. The inverse DP has an additive expression of the objective function. Deriving recursive formulae, we establish Inverse Theorems between the main and inverse DP's. Each Inverse Theorem is applied to Bellman's multi-stage allocation process. Uniqueness of the solution to an inverse functional equation is proved.

14.
In this paper, we consider nonstationary Markov decision processes (MDPs, for short) with an average variance criterion on a countable state space, finite action spaces, and bounded one-step rewards. From the optimality equations provided in this paper, we translate the average variance criterion into a new average expected cost criterion. We then prove that there exists a Markov policy which is optimal under the original average expected reward criterion and which minimizes the average variance within the class of policies that are optimal for that criterion.

15.
Several concepts of fuzzy random variables and their expected values have appeared in the literature. In one of them, defined by Liu and Liu (2003a), a fuzzy random variable is a measurable function from a probability space to a collection of fuzzy variables, and its expected value is described as a scalar number. Based on these concepts, this paper addresses two processes: the fuzzy random renewal process and the fuzzy random renewal reward process. In the fuzzy random renewal process, the interarrival times are characterized as fuzzy random variables, and a fuzzy random elementary renewal theorem on the limit value of the expected renewal rate of the process is presented. In the fuzzy random renewal reward process, both the interarrival times and the rewards are depicted as fuzzy random variables, and a fuzzy random renewal reward theorem on the limit value of the long-run expected reward per unit time is provided. The results obtained in this paper coincide with those of the stochastic case or the fuzzy case when the fuzzy random variables degenerate to random variables or to fuzzy variables.
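In the degenerate case where the fuzzy random variables reduce to ordinary random variables, the renewal reward theorem cited here reduces to the classical statement that the long-run reward per unit time equals E[reward] / E[interarrival time]. A quick simulation of that degenerate case, with purely illustrative distributions, is sketched below.

import numpy as np

rng = np.random.default_rng(3)

# Classical (degenerate) renewal reward process: exponential interarrival times,
# uniform rewards collected at each renewal.
def long_run_reward_rate(horizon=100_000.0):
    t, total_reward = 0.0, 0.0
    while True:
        t += rng.exponential(2.0)               # interarrival time, mean 2
        if t > horizon:
            break
        total_reward += rng.uniform(1.0, 3.0)   # reward per renewal, mean 2
    return total_reward / horizon

print("simulated long-run reward per unit time:", long_run_reward_rate())
print("E[reward] / E[interarrival]            :", 2.0 / 2.0)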

16.
王红芳 《运筹与管理》2022,31(5):233-239
Drawing on social information processing theory, this study develops a two-stage moderated mediation model in which total rewards influence employee innovative behavior, with organization-based self-esteem as the mediator and work engagement as the moderator. The model is tested with time-lagged, multi-wave matched survey data from 470 non-R&D employees and their supervisors in 140 firms across multiple industries and regions, with the aim of revealing the mechanism and boundary conditions through which total rewards stimulate employee innovative behavior in the Chinese context. The empirical results show that total rewards are significantly and positively related to employee innovative behavior and to organization-based self-esteem; organization-based self-esteem fully mediates the relationship between total rewards and innovative behavior; and work engagement positively moderates both the relationship between total rewards and organization-based self-esteem and the relationship between organization-based self-esteem and innovative behavior, and thereby further positively moderates the indirect effect of total rewards on innovative behavior through organization-based self-esteem.

17.
We consider the problem of sequencing a set of alternatives (e.g. manufacturing methods, job applicants, or target journals) available for selection to complete a project. Associated with each alternative are the probability of successful completion, the completion time, and the reward obtained upon successfully completing the alternative. The optimal sequencing strategy that maximizes the expected present value of total rewards is derived based on a simple ordering parameter. We further consider an extension in which one of the alternatives is no longer available for selection if not selected by a certain time, and another extension in which the selection process is allowed only for a limited period of time. We propose solution strategies for the selection and sequencing problem under time constraints.
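The structure of the basic problem (before the time-constrained extensions) can be illustrated by brute-force enumeration over all sequences; the index rule derived in the paper would pick out an optimal ordering without enumerating. The sketch below assumes, purely for illustration, that alternatives are attempted in order until the first success and that rewards are discounted continuously at a rate alpha; the paper's exact model may differ, and all numbers are made up.

from itertools import permutations
import math

# Hypothetical alternatives: (success probability, completion time, reward).
alternatives = [(0.6, 2.0, 10.0), (0.3, 1.0, 25.0), (0.8, 4.0, 12.0)]
alpha = 0.1   # illustrative continuous discount rate

def expected_present_value(order):
    """Expected discounted reward if alternatives are attempted in the given
    order until the first success (one possible reading of the model)."""
    value, elapsed, prob_reach = 0.0, 0.0, 1.0
    for p, t, r in order:
        elapsed += t
        value += prob_reach * p * r * math.exp(-alpha * elapsed)
        prob_reach *= (1.0 - p)          # probability that all earlier attempts failed
    return value

best = max(permutations(alternatives), key=expected_present_value)
print("best sequence:", best)
print("expected present value:", expected_present_value(best))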

18.
Recently, Zhao et al. (in Fuzzy Optimization and Decision Making 2007 6, 279–295) presented a fuzzy random elementary renewal theorem and a fuzzy random renewal reward theorem for the fuzzy random process. In this paper, we study the convergence of the fuzzy random renewal variable and of the total rewards earned by time t with respect to the extended Hausdorff metrics d and d_1. Using this convergence information and applying the uniform convergence theorem, we provide new versions of the fuzzy random elementary renewal theorem and the fuzzy random renewal reward theorem.

19.
We consider a setting where multiple vehicles form a team cooperating to visit multiple target points and collect rewards associated with them. The team objective is to maximize the total reward accumulated over a given time interval. Complicating factors include uncertainties regarding the locations of target points and the effectiveness of collecting rewards, differences among vehicle capabilities, and the fact that rewards are time-varying. We present a Receding Horizon (RH) control scheme which dynamically determines vehicle trajectories by solving a sequence of optimization problems over a planning horizon and executing them over a shorter action horizon. A key property of this scheme is that the trajectories it generates are stationary, in the sense that they ultimately guide vehicles to target points, even though the controller is not designed to perform any discrete point assignments. The proposed scheme is centralized and it induces a cooperative behavior. We subsequently develop a distributed cooperative controller which does not require a vehicle to maintain perfect information on the entire team and whose computational cost is scalable and significantly lower than the centralized case, making it attractive for applications with real-time constraints. We include simulation-based comparisons between the centralized algorithm and the distributed version, which illustrate the effectiveness of the latter.
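The receding-horizon pattern described here (plan over a longer horizon, execute only a shorter action horizon, then re-plan) can be sketched as a generic control loop. Everything below is schematic: a single vehicle, made-up target positions and decaying rewards, and a greedy one-target planner standing in for the trajectory optimization the paper solves.

import numpy as np

rng = np.random.default_rng(4)

# Toy receding-horizon loop: targets carry time-decaying rewards; we re-plan
# every ACTION_H time units using a PLAN_H lookahead.
targets = rng.uniform(0, 6, size=(5, 2))             # target positions (illustrative)
base_reward = rng.uniform(5, 15, size=5)             # reward of each target at time 0
decay = 0.05                                          # rewards decay over time
PLAN_H, ACTION_H, SPEED, DT = 10.0, 2.0, 1.0, 0.1

def plan(position, t, collected):
    """Pick the uncollected target, reachable within the planning horizon, with
    the best time-decayed reward (stand-in for the paper's optimization)."""
    best_j, best_val = None, -np.inf
    for j in range(len(targets)):
        if collected[j]:
            continue
        arrival = t + np.linalg.norm(targets[j] - position) / SPEED
        if arrival - t > PLAN_H:
            continue
        val = base_reward[j] * np.exp(-decay * arrival)
        if val > best_val:
            best_j, best_val = j, val
    return best_j

pos, t, total, collected = np.zeros(2), 0.0, 0.0, [False] * 5
while not all(collected) and t < 100.0:
    j = plan(pos, t, collected)                       # plan over the planning horizon
    if j is None:
        break
    t_end = t + ACTION_H                              # execute only the action horizon
    while t < t_end and not collected[j]:
        step = targets[j] - pos
        dist = np.linalg.norm(step)
        if dist < SPEED * DT:                         # reached the target: collect reward
            pos, collected[j] = targets[j].copy(), True
            total += base_reward[j] * np.exp(-decay * t)
        else:
            pos += SPEED * DT * step / dist
        t += DT
print("total reward collected:", total)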

20.
In this paper, total reward stochastic games are surveyed. Total reward games are motivated as a refinement of average reward games. The total reward is defined as the limiting average of the partial sums of the stream of payoffs. It is shown that total reward games with a finite state space are strategically equivalent to a class of average reward games with a countably infinite state space. The role of stationary strategies in total reward games is investigated in detail. Further, it is shown that the total reward value exists for total reward games whose average reward value is 0 and in which, additionally, both players possess average-reward optimal stationary strategies.
