
1.

On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

Lee A Yau C Giles MB Doucet A Holmes CC 《Journal of computational and graphical statistics》2010,19(4):769-789

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

2.

Performance analysis of distributed solution approaches in simulation-based optimization

M. Gerdes T. Barth M. Grauer 《Computational Management Science》2005,2(1):57-82

Applying computationally expensive simulations in design or process optimization results in long-running solution processes even when using a state-of-the-art distributed algorithm and hardware. Within these simulation-based optimization problems the optimizer has to treat the simulation systems as black-boxes. The distributed solution of this kind of optimization problem demands efficient utilization of resources (i.e. processors) and evaluation of the solution quality. Analyzing the parallel performance is therefore an important task in the development of adequate distributed approaches taking into account the numerical algorithm, its implementation, and the used hardware architecture. In this paper, simulation-based optimization problems are characterized and a distributed solution algorithm is presented. Different performance analysis techniques (e.g. scalability analysis, computational complexity) are discussed and a new approach integrating parallel performance and solution quality is developed. This approach combines a priori and a posteriori techniques and can be applied in early stages of the solution process. The feasibility of the approach is demonstrated by applying it to three different classes of simulation-based optimization problems from groundwater management.

3.

Global Optimization Approach to Unequal Sphere Packing Problems in 3D

Sutou A. Dai Y. 《Journal of Optimization Theory and Applications》2002,114(3):671-694

The problem of the unequal sphere packing in a 3-dimensional polytope is analyzed. Given a set of unequal spheres and a polytope, the double goal is to assemble the spheres in such a way that (i) they do not overlap with each other and (ii) the sum of the volumes of the spheres packed in the polytope is maximized. This optimization has an application in automated radiosurgical treatment planning and can be formulated as a nonconvex optimization problem with quadratic constraints and a linear objective function. On the basis of the special structures associated with this problem, we propose a variety of algorithms which improve markedly the existing simplicial branch-and-bound algorithm for the general nonconvex quadratic program. Further, heuristic algorithms are incorporated to strengthen the efficiency of the algorithm. The computational study demonstrates that the proposed algorithm can successfully obtain the optimization up to a limiting size.
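The two requirements in this abstract (no overlap, spheres contained in the polytope) can be checked directly for a candidate placement. The following is a minimal sketch, not the authors' branch-and-bound algorithm; it assumes, purely for illustration, that the polytope is given as a list of half-spaces n·x ≤ b with unit normals, and the function names are hypothetical.

```python
import math

def is_feasible(spheres, halfspaces, tol=1e-9):
    """Check a candidate packing.

    spheres:    list of (center, radius), center a 3-tuple
    halfspaces: list of (unit_normal, offset), each meaning n.x <= offset
                (hypothetical polytope representation, assumed here)
    """
    for i, (ci, ri) in enumerate(spheres):
        # Containment: the sphere must lie inside every half-space.
        for n, b in halfspaces:
            if sum(nk * ck for nk, ck in zip(n, ci)) + ri > b + tol:
                return False
        # Non-overlap: squared center distance >= (ri + rj)^2.
        # This is the nonconvex quadratic constraint.
        for cj, rj in spheres[i + 1:]:
            d2 = sum((a - c) ** 2 for a, c in zip(ci, cj))
            if d2 < (ri + rj) ** 2 - tol:
                return False
    return True

def packed_volume(spheres):
    """Objective value: total volume of the packed spheres."""
    return sum(4.0 / 3.0 * math.pi * r ** 3 for _, r in spheres)

# Axis-aligned cube [0, 2]^3 written as six half-spaces.
CUBE = [((1, 0, 0), 2), ((-1, 0, 0), 0),
        ((0, 1, 0), 2), ((0, -1, 0), 0),
        ((0, 0, 1), 2), ((0, 0, -1), 0)]
```

A solver would search over which spheres to pack and where to place them; this check only evaluates one given placement.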

4.

On the Utility of Graphics Cards to Perform Massively Parallel Simulation of Advanced Monte Carlo Methods

《Journal of computational and graphical statistics》2013,22(4):769-789

We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multicore processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modeling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. This article has supplementary material online.

5.

Solving an equilibrium model for trade of CO2 emission permits

《European Journal of Operational Research》1997,102(2):393-403

In this paper we investigate the trade of CO₂ emission permits in the framework of a competitive economic equilibrium. For this purpose we integrate already existing regional energy-economy models (Markal-Macro) into a single multi-regional trade model. Based on two possible integration schemes we discuss two solution concepts: the first is based on pseudomonotonicity of the excess map whereas the second is a fixed point approach due to Negishi. In the latter case the overall optimization problem is decomposed. For both algorithms the regional subproblems were distributed on different computers and solved in parallel. We conclude by presenting numerical results of a model with full data sets for Switzerland, Sweden and the Netherlands.

6.

Optimal design of a 2 DOF parallel robot

S.-D. Stan V. Maties R. Balan 《PAMM》2007,7(1):4130037-4130038

This paper presents a study on the optimization of the Biglide mini parallel robot, which comprises two-degree-of-freedom (DOF) mini parallel robots with constant struts. The robot workspace is characterized and the inverse kinematics equation is obtained. Design optimization is implemented with genetic algorithms (GA), considering the transmission quality index and the workspace. To show the advantages of GA, we applied it to a multicriteria optimization problem for a 2-DOF mini parallel robot. Genetic algorithms are so far generally the best and most robust kind of evolutionary algorithm. A GA has a number of advantages. It can quickly scan a vast solution set. Bad proposals do not affect the end solution negatively as they are simply discarded. The results show that using GA in this kind of optimization problem enhances the quality of the optimization outcome, providing better and more realistic support for the decision maker. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

7.

Applying the minimax criterion in stochastic recourse programs

《European Journal of Operational Research》2005,165(3):569-584

We consider an optimization problem in which some uncertain parameters are replaced by random variables. The minimax approach to stochastic programming concerns the problem of minimizing the worst expected value of the objective function with respect to the set of probability measures that are consistent with the available information on the random data. Only very few practicable solution procedures have been proposed for this problem and the existing ones rely on simplifying assumptions. In this paper, we establish a number of stability results for the minimax stochastic program, justifying in particular the approach of restricting attention to probability measures with support in some known finite set. Following this approach, we elaborate solution procedures for the minimax problem in the setting of two-stage stochastic recourse models, considering the linear recourse case as well as the integer recourse case. Since the solution procedures are modifications of well-known algorithms, their efficacy is immediate from the computational testing of these procedures and we do not report results of any computational experiments.
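The minimax criterion with finitely supported measures can be illustrated on a toy problem. The sketch below is not the paper's solution procedure: it enumerates a finite set of candidate first-stage decisions and a finite set of candidate distributions, both of which are assumptions made here for illustration, as are the function names and the toy cost.

```python
def worst_case_expectation(cost, x, scenarios, distributions):
    """Largest expected cost of decision x over all candidate distributions.

    scenarios:     list of realizations of the random data (finite support)
    distributions: list of probability vectors over `scenarios`
    """
    return max(
        sum(p * cost(x, s) for p, s in zip(probs, scenarios))
        for probs in distributions
    )

def minimax_decision(cost, candidates, scenarios, distributions):
    """Decision minimizing the worst-case expected cost (brute force)."""
    return min(
        candidates,
        key=lambda x: worst_case_expectation(cost, x, scenarios, distributions),
    )

# Hypothetical two-stage toy: order x units now at unit cost 1, then pay
# a recourse penalty of 1.5 per unit of unmet demand d.
def toy_cost(x, d):
    return x + 1.5 * max(d - x, 0)

demands = [1, 3]
candidate_measures = [(0.5, 0.5), (0.9, 0.1)]
best = minimax_decision(toy_cost, range(4), demands, candidate_measures)
```

In this toy instance the worst-case expected costs for x = 0, 1, 2, 3 are 3.0, 2.5, 2.75 and 3.0, so the minimax decision is x = 1.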

8.

Distributed computation of Pareto solutions in n-player games

Markku Verkama Harri Ehtamo Raimo P. Hämäläinen 《Mathematical Programming》1996,74(1):29-45

The problem of computing Pareto optimal solutions with distributed algorithms is considered in n-player games. We shall first formulate a new geometric problem for finding Pareto solutions. It involves solving joint tangents for the players' objective functions. This problem can then be solved with distributed iterative methods, and two such methods are presented. The principal results are related to the analysis of the geometric problem. We give conditions under which its solutions are Pareto optimal, characterize the solutions, and prove an existence theorem. There are two important reasons for the interest in distributed algorithms. First, they can carry computational advantages over centralized schemes. Second, they can be used in situations where the players do not know each others' objective functions.
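For readers unfamiliar with the term, Pareto optimality can be illustrated on a finite set of outcomes. This sketch only demonstrates the definition; it is not the joint-tangent method of the paper, and it assumes every player wants to maximize their own payoff.

```python
def dominates(u, v):
    """True if payoff vector u is at least as good as v for every player
    and strictly better for at least one (maximization assumed)."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_front(outcomes):
    """Outcomes that no other outcome dominates."""
    return [u for u in outcomes
            if not any(dominates(v, u) for v in outcomes)]

# Payoff pairs for a hypothetical 2-player game.
payoffs = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
front = pareto_front(payoffs)  # [(1, 3), (2, 2), (3, 1)]
```

Note that this check needs every player's payoff in one place, which is exactly the centralized knowledge that the distributed methods in the abstract are designed to avoid.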

9.

Scalable solver software

David E. Keyes 《PAMM》2007,7(1):1026401-1026402

Towards Optimal Petascale Simulations (TOPS) is a scalable solver software project based on domain decomposed parallelization to research, implement, and support in collaborations with users an open-source package for large-scale discretized PDE problems. Optimal complexity methods, such as multigrid/multilevel preconditioners, keep the time spent in dominant algebraic kernels close to linear in discrete problem size as the applications scale on massively parallel computers. Krylov accelerators and Jacobian-free variants of Newton's method, as appropriate, are wrapped around the multilevel methods to deliver robustness in multirate, multiscale coupled systems, which are solved either implicitly or in more traditional forms of operator splitting. The TOPS software framework is being extended beyond direct computational simulation to computational optimization, including design, control, and inverse problems. We outline and illustrate the philosophy of TOPS. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

10.

Experimental study of scheduling with memory constraints using hybrid methods

J. Berlińska M. Lawenda 《Journal of Computational and Applied Mathematics》2009,232(2):638-654

In this paper we study divisible load scheduling in systems with limited memory. Divisible loads are parallel computations which can be divided into independent parts processed in parallel on remote computers, and the part sizes may be arbitrary. The distributed system is a heterogeneous single level tree. The total size of processor memories is too small to accommodate the whole load at any moment of time. Therefore, the load is distributed in many rounds. Memory reservations have block nature. The problem consists in distributing the load taking into account communication time, computation time, and limited memory buffers so that the whole processing finishes as early as possible. This problem is both combinatorial and algebraic in nature. Therefore, hybrid algorithms are given to solve it. Two algorithms are proposed to solve the combinatorial component. A branch-and-bound algorithm is nearly unusable due to its complexity. Then, a genetic algorithm is proposed with more tractable execution times. For a given solution of the combinatorial part we formulate the solution of the algebraic part as a linear programming problem. An extensive computational study is performed to analyze the impact of various system parameters on the quality of the solutions. From this we were able to draw inferences about the nature of the scheduling problem.

11.

Implementation of parallel branch-and-bound algorithms – experiences with the graph partitioning problem

Jens Clausen Jesper Larsson Träff 《Annals of Operations Research》1991,33(5):329-349

Parallel processing is one of the essential concepts in the attempts to increase the computational power available for solving continuous and discrete optimization problems. In the case where an optimization algorithm is search-based, crucial issues of parallel distributed implementations are work-load distribution and granularity, i.e. how to distribute the search space among processors and how to control the amount of processing between interprocessor communication. The present paper compares distributed implementations of two branch-and-bound algorithms for the graph partitioning problem: Given an undirected graph with an even number of edges and weights assigned to each edge, partition the vertices into two subsets of equal size such that the sum of the costs of edges connecting vertices in different subsets is as small as possible. The problem is known to be NP-complete. The two branch-and-bound methods compared differ in design strategy: One is based on time-consuming bound calculations leading to tight bounds and thus a narrow search tree with few nodes, whereas the other employs an easy bound calculation scheme leading to a larger search tree. Both have been implemented on an iPSC-hypercube with 32 processors. We investigate the influence of the design strategy on the performance of the algorithms.
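The graph partitioning problem stated in this abstract can be made concrete with an exhaustive search over a tiny instance. This is a minimal sketch for illustration only, not either of the branch-and-bound algorithms compared in the paper; it assumes the number of vertices is even so that two equal halves exist, and the function name is hypothetical.

```python
from itertools import combinations

def min_bisection(n, weights):
    """Exhaustively find a minimum-cost equal-size vertex partition.

    n:       number of vertices, labelled 0..n-1 (assumed even)
    weights: dict mapping an undirected edge (u, v) to its cost
    Returns (best_cost, one_side_as_a_set).
    """
    best_cost, best_side = float("inf"), None
    # Fix vertex 0 on one side so mirror-image partitions are not repeated.
    for rest in combinations(range(1, n), n // 2 - 1):
        side = {0, *rest}
        # An edge is cut when its endpoints fall in different subsets.
        cost = sum(w for (u, v), w in weights.items()
                   if (u in side) != (v in side))
        if cost < best_cost:
            best_cost, best_side = cost, side
    return best_cost, best_side

# Two heavy edges joined by two light ones.
edges = {(0, 1): 5, (2, 3): 5, (1, 2): 1, (0, 3): 1}
cost, side = min_bisection(4, edges)  # cost 2, side {0, 1}
```

The number of candidate partitions grows exponentially with n, which is why bounding and pruning, and how that work is split across processors, matter in practice.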

12.

Processor interconnection networks from Cayley graphs

Stephen T. Schibell Richard M. Stafford 《Discrete Applied Mathematics》1992,40(3):333-357

Cayley graphs of groups are presently being considered by the computer science community as models of architectures for large scale parallel processor computers. In the first section of this paper we discuss Cayley graphs and show how they may be used as a tool for the design and analysis of network architectures for these types of computers.

Observing that routing on a Cayley graph is equivalent to a certain factoring problem in the associated group, we have been able to use a known powerful factoring technique in computational group theory to produce a fast efficient routing algorithm on the associated Cayley graph. In the second section of this paper we present this work. This research can be regarded as a first attempt to find general purpose routing algorithms for interconnection networks.

Believing that average diameter of a network for a large scale MIMD machine is the predominant factor in determining network performance, we designed Cayley graphs to be used in a special study performed at the Supercomputing Research Center (SRC). The importance of the average diameter in determining network performance was supported by the fact that the graphs found by us had the smallest average diameter and outperformed all other graphs evaluated in the study. In fact, before being driven into saturation, one of our graphs sustained 9.4% more network traffic than the next best candidate, a butterfly architecture, and 74.3% more than the benchmark 2-d mesh. The last section of our paper is devoted to this work.

This paper is divided into three sections. In the first section we discuss Cayley graphs and show how they may be used as a tool for the design and analysis of network architectures for parallel computers. In the second section we present our research on the routing problem. This research can be regarded as a first attempt to find general purpose routing algorithms for interconnection networks. In the last section we present some evidence that average diameter of a network for a large scale MIMD machine is the predominant factor in determining network performance.

13.

Multidisciplinary Simulations with the Coupling Library MpCCI

R. Ahrem 《PAMM》2002,1(1):39-42

Extremely high demands on designing accurate prototypes, for example in the fields of medical research, aircraft construction, shipbuilding and automotive industry, require multidisciplinary simulations. A large number of tools for monodisciplinary simulations are available today. Each of these provides high quality simulation results in a specific physical domain. Now there is also a solution to do multidisciplinary computations: Parallel monodisciplinary codes are coupled with the Mesh based parallel Code Coupling Interface MpCCI to solve multidisciplinary problems with a loosely coupled approach. The paper presents applications in the framework of fluid‐structure interaction, which demonstrate the advantages of the parallel coupling library for this kind of problem. The computational fluid dynamics code FLOWer developed at the Institute of Design Aerodynamics/DLR and the structural mechanics code SIMPACK developed at the Institute of Aeroelasticity/DLR are coupled to solve an aeroelastic test problem. The applicability of the coupling library in the field of aeroelasticity is strongly dependent on the integrated interpolations between the involved meshes. In the Institute of Aeroelasticity the aeroelastic analysis tool CAESAR was developed which includes aeroelasticity specific interpolation algorithms. These routines are integrated in MpCCI via a special interface. There are two types of interpolation routines included. The first kind of algorithms is based on the method of finite interpolation elements and the second uses radial basis functions.

14.

An adaptive, multiple restarts neural network algorithm for graph coloring

《European Journal of Operational Research》1996,93(2):257-270

The graph coloring problem is amongst the most difficult ones in combinatorial optimization, with a diverse set of significant applications in science and industry. Previous neural network attempts at coloring graphs have not worked well. In particular, they do not scale up to large graphs. Furthermore, experimental evaluations on real-world graphs have been lacking, and so have comparisons with state of the art conventional algorithms. In this paper we address all of these issues. We develop an improved neural network algorithm for graph coloring that scales well with graph size. The algorithm employs multiple restarts, and adaptively reduces the network's size from restart to restart as it learns better ways to color a given graph. Hence it gets faster and leaner as it evolves. We evaluate this algorithm on a structurally diverse set of graphs that arise in different applications. We compare its performance with that of a state of the art conventional algorithm on identical graphs. The conventional algorithm works better overall, though ours is not far behind. Ours works better on some graphs. The inherent parallel and distributed nature of our algorithm, especially within a neural network architecture, is a potential advantage for implementation and speed up.
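The multiple-restarts idea can be illustrated without a neural network. The sketch below swaps in a plain randomized greedy coloring and keeps the best result over several restarts; it is not the paper's algorithm and does not model its adaptive reduction of network size. The function names and the restart count are assumptions made for illustration.

```python
import random

def greedy_coloring(adj, order):
    """Give each vertex, in the given order, the smallest color
    not already used by one of its colored neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

def color_with_restarts(adj, restarts=50, seed=0):
    """Run greedy coloring on random vertex orders and keep the
    coloring that uses the fewest colors."""
    rng = random.Random(seed)
    vertices = list(adj)
    best = None
    for _ in range(restarts):
        rng.shuffle(vertices)
        coloring = greedy_coloring(adj, vertices)
        if best is None or max(coloring.values()) < max(best.values()):
            best = coloring
    return best

# A 5-cycle: an odd cycle, so exactly three colors are needed.
cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = color_with_restarts(cycle5)
num_colors = max(coloring.values()) + 1
```

Each restart is independent of the others, so restarts could in principle be run in parallel, which is in the spirit of the closing remark of the abstract.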

15.

Parallel quasi-Newton methods for unconstrained optimization

Richard H. Byrd Robert B. Schnabel Gerald A. Shultz 《Mathematical Programming》1988,42(1-3):273-306

We discuss methods for solving the unconstrained optimization problem on parallel computers, when the number of variables is sufficiently small that quasi-Newton methods can be used. We concentrate mainly, but not exclusively, on problems where function evaluation is expensive. First we discuss ways to parallelize both the function evaluation costs and the linear algebra calculations in the standard sequential secant method, the BFGS method. Then we discuss new methods that are appropriate when there are enough processors to evaluate the function, gradient, and part but not all of the Hessian at each iteration. We develop new algorithms that utilize this information and analyze their convergence properties. We present computational experiments showing that they are superior to parallelizing either the BFGS method or Newton's method under our assumptions on the number of processors and cost of function evaluation. Finally we discuss ways to effectively utilize the gradient values at unsuccessful trial points that are available in our parallel methods and also in some sequential software packages. Research supported by AFOSR grant AFOSR-85-0251, ARO contract DAAG 29-84-K-0140, NSF grants DCR-8403483 and CCR-8702403, and NSF cooperative agreement DCR-8420944.

16.

Parallel Dantzig–Wolfe decomposition of petroleum production allocation problems

E Torgnes V Gunnerud E Hagem M Rönnqvist B Foss 《The Journal of the Operational Research Society》2012,63(7):950-968

This article discusses the optimization of a petroleum production allocation problem through a parallel Dantzig–Wolfe algorithm. Petroleum production allocation problems are problems in which the determination of optimal production rates, lift gas rates and well connections are the central decisions. The motivation for modelling and solving such optimization problems stems from the value that lies in an increased production rate and the current lack of integrated software that considers petroleum production systems as a whole. Through our computational study, which is based on realistic production data from the Troll West field, we show the increase in computational efficiency that a parallel Dantzig–Wolfe algorithm offers. In addition, we show that previously implemented standard parallel algorithms lead to an inefficient use of parallel resources. A more advanced parallel algorithm is therefore developed to improve efficiency, making it possible to scale the algorithm by adding more CPUs and thus approach a reasonable solution time for realistic-sized problems.

17.

Parallel optimization for traffic assignment

R. J. Chen R. R. Meyer 《Mathematical Programming》1988,42(1-3):327-345

Most large-scale optimization problems exhibit structures that allow the possibility of attack via algorithms that exhibit a high level of parallelism. The emphasis of this paper is the development of parallel optimization algorithms for a class of convex, block-structured problems. Computational experience is cited for some large-scale problems arising from traffic assignment applications. The algorithms considered here have the property that they allow such problems to be decomposed into a set of smaller optimization problems at each major iteration. These smaller problems correspond to linear single-commodity networks in the traffic assignment case, and they may be solved in parallel. Results are given for the distributed solution of such problems on the CRYSTAL multicomputer. This research was supported in part by NSF grant CCR-8709952 and AFOSR grant AFOSR-86-0194.

18.

A primal-dual interior point method for optimal zero-forcing beamformer design under per-antenna power constraints

Bin Li Hai Huyen Dam Antonio Cantoni Kok Lay Teo 《Optimization Letters》2014,8(6):1829-1843

In this paper, we consider an optimal zero-forcing beamformer design problem in multi-user multiple-input multiple-output broadcast channel. The minimum user rate is maximized subject to zero-forcing constraints and power constraint on each base station antenna array element. The natural formulation leads to a nonconvex optimization problem. This problem is shown to be equivalent to a convex optimization problem with linear objective function, linear equality and inequality constraints and quadratic inequality constraints. Here, the indirect elimination method is applied to reduce the convex optimization problem into an equivalent convex optimization problem of lower dimension with only inequality constraints. The primal-dual interior point method is utilized to develop an effective algorithm (in terms of computational efficiency) via solving the modified KKT equations with Newton's method. Numerical simulations are carried out. Compared to algorithms based on a trust region interior point method and sequential quadratic programming method, it is observed that the method proposed is much superior in terms of computational efficiency.

19.

Modeling and analysis of a Canadian Forces Geomatics division workflow (total citations: 1; self-citations: 0; citations by others: 1)

Ahmed Ghanmi 《European Journal of Operational Research》2006,170(3):1001-1016

This paper addresses workflow issues of a Geomatics division in the Canadian Forces. This division is dedicated to the production of Geomatics Information under the US National Geospatial-Intelligence Agency’s Foundation Based Operations concept. This paper outlines a methodology for modeling and analyzing the workflow of the Canadian Forces Geomatics division. A discrete event simulation model is developed and applied to measure the performance of the workflow system for developing Geomatics products. A hybrid queueing network/optimization model is also developed to support the simulation model. The objective of the queueing network/optimization formulation is to provide the optimal configuration of the Geomatics division. The paper successfully addresses a complex workflow problem in the context of real world system and provides insights into the performance of the Canadian Forces Geomatics processing.

20.

Parallel bundle-based decomposition for large-scale structured mathematical programming problems (total citations: 2; self-citations: 0; citations by others: 2)

Deepankar Medhi 《Annals of Operations Research》1990,22(1):101-127

In this paper, we present parallel bundle-based decomposition algorithms to solve a class of structured large-scale convex optimization problems. An example in this class of problems is the block-angular linear programming problem. By dualizing, we transform the original problem to an unconstrained nonsmooth concave optimization problem which is in turn solved by using a modified bundle method. Further, this dual problem consists of a collection of smaller independent subproblems which give rise to the parallel algorithms. We discuss the implementation on the CRYSTAL multi-computer. Finally, we present computational experience with block-angular linear programming problems and observe that more than 70% efficiency can be obtained using up to eleven processors for one group of test problems, and more than 60% efficiency can be obtained for relatively smaller problems using up to five processors for another group of problems.
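The efficiency figures quoted in this abstract can be read through the conventional definition of parallel efficiency, speedup divided by processor count. Under that definition, 70% efficiency on eleven processors corresponds to a speedup of about 7.7. The sketch below assumes this standard definition; the timings are hypothetical and are not taken from the paper.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / processors

# Hypothetical timings: 110 s on one processor, 14 s on eleven.
s = speedup(110.0, 14.0)          # about 7.86
e = efficiency(110.0, 14.0, 11)   # about 0.71, i.e. above 70%
```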