Similar Literature
20 similar documents found (search time: 46 ms)
1.
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multicore processors that they are cheap, easily accessible, dedicated local devices with low power consumption that are easy to maintain and to code for. On a canonical set of stochastic simulation examples, including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups of 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modeling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedups we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. This article has supplementary material online.
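The data-parallel character these speedups rely on can be sketched in a few lines (an illustrative stand-in, not the authors' code): every sample in such a Monte Carlo estimate is independent, so the whole batch is evaluated at once. Here NumPy vectorization plays the role of per-thread GPU execution, and the pi-estimation target is purely hypothetical.

```python
import numpy as np

# Illustrative stand-in for GPU data-parallel Monte Carlo: a vectorized
# pi estimate. Every sample is independent, so the whole batch is drawn
# and tested at once; on a GPU each element would map to its own thread.
def parallel_pi_estimate(n_samples, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = (x * x + y * y) <= 1.0   # hit test, evaluated element-wise
    return 4.0 * inside.mean()

est = parallel_pi_estimate(1_000_000)
```

The same pattern, with the batched draw replaced by per-thread random number generators, is what GPU frameworks expose; the speedups quoted above come from running many such independent evaluations concurrently.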

2.
Industry and government routinely solve deterministic mathematical programs for planning and scheduling purposes, some involving thousands of variables with a linear or non-linear objective and inequality constraints. The solutions obtained are often ignored because they do not properly hedge against future contingencies. It is relatively easy to reformulate models to include uncertainty. The bottleneck has been (and is) our capability to solve them. The time is now ripe for finding a way to do so. To this end, we describe in this paper how large-scale system methods for solving multi-staged systems, such as Benders decomposition, high-speed sampling or Monte Carlo simulation, and parallel processors can be combined to solve some important planning problems involving uncertainty. For example, parallel processors may make it possible to come to better grips with the fundamental problems of planning, scheduling, design, and control of complex systems such as the economy, an industrial enterprise, an energy system, a water-resource system, military models for planning and control, and decisions about investment, innovation, employment, and health-delivery systems.

3.
Generalized Shadow Hybrid Monte Carlo (GSHMC) is a method for molecular simulations that rigorously alternates Monte Carlo sampling from a canonical ensemble with integration of trajectories using Molecular Dynamics (MD). While conventional hybrid Monte Carlo methods completely re-sample the particles' velocities between MD trajectories, our method employs a partial velocity update procedure that retains part of the dynamical information throughout the simulation. We use shadow (modified) Hamiltonians, asymptotic expansions in powers of the discretization parameter (the timestep), which are conserved by symplectic integrators to higher accuracy than the true Hamiltonians. We present the implementation of this method in the highly efficient MD code GROMACS and demonstrate its performance and accuracy on computationally expensive systems such as proteins, in comparison with the molecular dynamics techniques already available in GROMACS. We take advantage of the state-of-the-art algorithms adopted in the code, leading to an optimal implementation of the method. Our implementation introduces virtually no overhead and can accurately recreate complex biological processes, including rare-event dynamics, saving much computational time compared with conventional simulation methods.
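A minimal sketch of the partial velocity update central to this class of methods (the mixing angle `phi` and the unit mass/temperature scaling are illustrative assumptions, not the GROMACS implementation): the current velocities are mixed with fresh Gaussian noise by a rotation, so part of the dynamics is retained while the Gaussian velocity distribution is preserved.

```python
import numpy as np

# Hypothetical sketch of a partial velocity (momentum) update: rotate the
# current velocities against fresh Gaussian noise. phi is an illustrative
# mixing angle; phi = pi/2 recovers the complete resampling of conventional
# hybrid Monte Carlo, while small phi retains most of the dynamics.
def partial_velocity_update(v, phi, rng):
    u = rng.standard_normal(v.shape)             # fresh noise (unit mass/kT assumed)
    v_new = np.cos(phi) * v + np.sin(phi) * u
    u_new = -np.sin(phi) * v + np.cos(phi) * u   # rotation leaves the joint Gaussian invariant
    return v_new, u_new

rng = np.random.default_rng(1)
v = rng.standard_normal(3)
v_new, _ = partial_velocity_update(v, 0.1, rng)
```

Because the update is a rotation in the joint (v, u) space, the stationary Gaussian distribution of velocities is unchanged for any phi, which is what lets the method keep dynamic information without biasing the sampled ensemble.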

4.
We investigate the efficiency of a parallel direct simulation Monte Carlo (PDSMC) algorithm in solving rarefied subsonic flow through a nanochannel. We use the MPI library to transfer data between processors. It is observed that the PDSMC solver shows ideal speedup if sufficient workload is provided for each processor. Additionally, this study shows that the computational time and speedup of the extended PDSMC solver do not depend (or depend only slightly) on the number of cells. On the contrary, increasing the total number of particles results in better efficiency of the PDSMC.

5.
In recent years, the Hamiltonian Monte Carlo (HMC) algorithm has been found to work more efficiently compared to other popular Markov chain Monte Carlo (MCMC) methods (such as random walk Metropolis–Hastings) in generating samples from a high-dimensional probability distribution. HMC has proven more efficient in terms of mixing rates and effective sample size than previous MCMC techniques, but still may not be sufficiently fast for particularly large problems. The use of GPUs promises to push HMC even further, greatly increasing the utility of the algorithm. By expressing the computationally intensive portions of HMC (the evaluations of the probability kernel and its gradient) in terms of linear or element-wise operations, HMC can be made highly amenable to the use of graphics processing units (GPUs). A multinomial regression example demonstrates the promise of GPU-based HMC sampling. Using GPU-based memory objects to perform the entire HMC simulation, most of the latency penalties associated with transferring data from main to GPU memory can be avoided. Thus, the proposed computational framework may appear conceptually very simple, but has the potential to be applied to a wide class of hierarchical models relying on HMC sampling. Models whose posterior density and corresponding gradients can be reduced to linear or element-wise operations are amenable to significant speedups through the use of GPUs. Analyses of datasets that were previously intractable for fully Bayesian approaches due to the prohibitively high computational cost are now feasible using the proposed framework.
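A hypothetical minimal HMC update for a standard-normal target shows why the method is GPU-friendly: both the log-density and its gradient reduce to element-wise operations. The step size and trajectory length are arbitrary illustrative choices, and nothing here is taken from the article's multinomial-regression example.

```python
import numpy as np

# Hypothetical minimal HMC for a standard-normal target. The log-density and
# its gradient are element-wise operations, the structure that maps well to
# GPUs; eps and n_leapfrog are arbitrary illustrative choices.
def hmc_step(q, rng, eps=0.1, n_leapfrog=20):
    logp = lambda x: -0.5 * np.sum(x * x)   # log N(0, I), up to a constant
    grad = lambda x: -x                     # its gradient, element-wise
    p = rng.standard_normal(q.shape)
    q_new, p_new = q.copy(), p + 0.5 * eps * grad(q)   # half momentum step
    for i in range(n_leapfrog):
        q_new = q_new + eps * p_new                    # full position step
        if i < n_leapfrog - 1:
            p_new = p_new + eps * grad(q_new)          # full momentum step
    p_new = p_new + 0.5 * eps * grad(q_new)            # final half momentum step
    # Metropolis correction on the joint Hamiltonian
    log_a = logp(q_new) - 0.5 * np.sum(p_new**2) - logp(q) + 0.5 * np.sum(p**2)
    return q_new if np.log(rng.random()) < log_a else q

rng = np.random.default_rng(0)
q = np.zeros(5)
samples = []
for _ in range(2000):
    q = hmc_step(q, rng)
    samples.append(q)
samples = np.asarray(samples)
```

On a GPU, the leapfrog's vector additions and the gradient evaluation would be the batched element-wise kernels; only the scalar accept/reject decision is inherently serial.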

6.
《Optimization》2012,61(4):1033-1055
We found an interesting relation between convex optimization and the sorting problem. We present a parallel algorithm that computes multiple order statistics of the data by minimizing a number of related convex functions. The computed order statistics serve as splitters that group the data into buckets suitable for parallel bitonic sorting. This led us to a parallel bucket sort algorithm, which we implemented for the many-core architecture of graphics processing units (GPUs). The proposed sorting method is competitive with the state-of-the-art GPU sorting algorithms and is superior to most of them for long sorting keys.
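The splitter idea can be sketched as follows. As an assumption for illustration, a plain quantile computation stands in for the article's convex-minimization route to the order statistics, and the per-bucket sorts (bitonic and parallel on a GPU) are replaced by ordinary sequential sorts.

```python
import numpy as np

# Sketch of splitter-based bucket sort. As an illustrative assumption, a plain
# quantile computation replaces the convex-minimization route to the order
# statistics, and the independent per-bucket sorts (bitonic and parallel on a
# GPU) are ordinary sequential sorts here.
def bucket_sort(data, n_buckets=4):
    # Order statistics at evenly spaced ranks serve as splitters.
    splitters = np.quantile(data, np.linspace(0.0, 1.0, n_buckets + 1)[1:-1])
    bucket_ids = np.searchsorted(splitters, data, side="right")
    # Each bucket can be sorted independently, hence in parallel.
    return np.concatenate([np.sort(data[bucket_ids == b]) for b in range(n_buckets)])

rng = np.random.default_rng(2)
data = rng.standard_normal(1000)
result = bucket_sort(data)
```

Because every element of bucket b is no larger than every element of bucket b+1, concatenating the independently sorted buckets yields a globally sorted array.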

7.
费景高 《计算数学》1992,14(4):489-497
1. Introduction. The attitude motion of a large launch vehicle is the motion of the rocket about its center of mass, and it is the plant controlled by the attitude stabilization control system. The attitude motion of a rocket is a superposition of several component motions: the elastic bending vibration of the rocket body and the sloshing of liquid propellant in the tanks both give rise to weakly damped or undamped oscillations. In addition, rocket parameters such as the moments of inertia, the center-of-mass position, the resonant frequencies, and the aerodynamic characteristics all vary with time and flight state, making the motion characteristics very complex.

8.
Swirling jets undergoing vortex breakdown occur in many technical applications, e.g. vortex burners, turbines and jet engines. To simulate the highly nonlinear dynamics of the flow, it is necessary to use high-order numerical methods, leading to increased computational cost. To be able to perform simulations in acceptable turn-around time, an available LES code for solving the filtered compressible Navier-Stokes equations in cylindrical coordinates using compact finite-difference schemes was parallelized for massively-parallel architectures. The parallelization was done following the ghost-cell approach for filtering in the three spatial directions. The inter-process communication is handled using the message passing interface (MPI). The weak and strong scaling properties of the code indicate that it can be used for massively parallel simulations using several thousand processors. (© 2010 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

9.
We report on the development of an adapted Monte Carlo-based algorithm and code for simulating light propagation in turbid media with complex geometry, aimed at simulating the optical diffuse spectroscopy signal in noninvasive brain sensing. The simulation will allow us to determine the optimal characteristics of a prototype device for optical diffuse brain sensing. The developed Monte Carlo code can be efficiently parallelized both for SMP and for distributed-memory systems. We show that the speedup of the developed algorithm depends almost linearly on the number of nodes/threads in the utilized system.

10.
In recent years, parallel processing has become widely available to researchers. It can be applied in an obvious way in the context of Monte Carlo simulation, but techniques for “parallelizing” Markov chain Monte Carlo (MCMC) algorithms are not so obvious, apart from the natural approach of generating multiple chains in parallel. Although generation of parallel chains is generally the easiest approach, in cases where burn-in is a serious problem, it is often desirable to use parallelization to speed up generation of a single chain. This article briefly discusses some existing methods for parallelization of MCMC algorithms, and proposes a new “pre-fetching” algorithm to parallelize generation of a single chain.
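A sketch of the pre-fetching idea under simplifying assumptions (random-walk proposals, a plain log-density, and sequential evaluation where a parallel machine would work concurrently): a depth-d binary tree of speculative future states is built, the target is evaluated at every node up front, and the usual accept/reject walk then consumes d Metropolis-Hastings steps from the precomputed tree.

```python
import numpy as np

# Sketch of pre-fetching for Metropolis-Hastings under simplifying assumptions.
# A depth-d binary tree of speculative states is built: child 2k is "proposal
# rejected" (state unchanged), child 2k+1 is "proposal accepted". Evaluating
# the target at all nodes is the step a parallel machine would do concurrently;
# the subsequent walk reproduces d ordinary MH steps of a single chain.
def prefetch_mh(logp, x0, depth, rng, scale=0.5):
    state = {1: x0}
    for k in range(1, 2**depth):
        state[2 * k] = state[k]                                      # reject branch
        state[2 * k + 1] = state[k] + scale * rng.standard_normal()  # accept branch
    lp = {k: logp(v) for k, v in state.items()}  # parallelizable evaluations
    k, out = 1, []
    while 2 * k + 1 < 2 ** (depth + 1):          # walk down to a leaf
        if np.log(rng.random()) < lp[2 * k + 1] - lp[k]:
            k = 2 * k + 1
        else:
            k = 2 * k
        out.append(state[k])
    return out

rng = np.random.default_rng(3)
x, samples = 0.0, []
for _ in range(400):                              # chain trees back to back
    out = prefetch_mh(lambda z: -0.5 * z * z, x, 8, rng)
    samples.extend(out)
    x = out[-1]
samples = np.asarray(samples)
```

The price of the scheme is visible here too: of the 2^d - 1 speculative evaluations, only d lie on the realized path, so speedup depends on how many evaluations can run concurrently and on the acceptance behavior of the chain.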

11.
Simulated annealing is known to be highly sequential due to dependences between iterations. While conventional speculative computation with a binary tree has been found effective for parallel simulated annealing, its performance is limited to (log p)-fold speedup due to parallel execution of log p iterations on p processors. This report presents a new approach to parallel simulated annealing, called generalized speculative computation (GSC). The GSC is synchronous, maintaining the same decision sequence as sequential simulated annealing. The use of two loop indices encoded in a single integer eliminates broadcasting of the central data structure to all processors. The master-slave parallel programming paradigm simplifies controlling the activities of p iterations which are executed in parallel on p processors. To verify the performance of GSC, we implemented 100-city to 500-city Traveling Salesman Problems on the AP1000 massively parallel multiprocessor. Execution results on the AP1000 demonstrate that the GSC approach can indeed be an effective method for parallel simulated annealing, as it gave over 20-fold speedup on 100 processors.
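The sequential dependence that motivates speculative schemes can be seen in a minimal annealing loop (a generic continuous toy problem, not the paper's TSP implementation): each accept/reject decision feeds the next iteration, which is exactly what GSC precomputes speculatively across p processors while preserving this decision sequence.

```python
import numpy as np

# Minimal sequential annealing loop on a generic continuous toy problem (not
# the paper's TSP implementation). Every Metropolis decision feeds the next
# iteration; speculative schemes precompute several such dependent iterations
# so that p processors stay busy while the decision sequence is preserved.
def simulated_annealing(cost, x0, n_iters, rng, t0=1.0, cooling=0.999):
    x, t = np.asarray(x0, dtype=float), t0
    best, best_cost = x.copy(), cost(x)
    for _ in range(n_iters):
        cand = x + 0.1 * rng.standard_normal(x.shape)        # local perturbation
        delta = cost(cand) - cost(x)
        if delta < 0 or rng.random() < np.exp(-delta / t):   # Metropolis test
            x = cand
            if cost(x) < best_cost:
                best, best_cost = x.copy(), cost(x)
        t *= cooling                                         # geometric cooling
    return best, best_cost

rng = np.random.default_rng(7)
best, best_cost = simulated_annealing(lambda v: float(np.sum(v * v)), np.full(3, 2.0), 5000, rng)
```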

12.
The notion of parametrically splitting algorithms is introduced, which are characterized by their capability of solving a given problem on a large number of processors with few data transfers. These algorithms include many numerical integration algorithms, Monte Carlo and quasi-Monte Carlo methods, and so on.

13.
We apply the Monte Carlo, stochastic Galerkin, and stochastic collocation methods to solving the drift-diffusion equations coupled with the Poisson equation arising in semiconductor devices with random rough surfaces. Instead of dividing the rough surface into slices, we use a stochastic mapping to transform the original deterministic equations in a random domain into stochastic equations in the corresponding deterministic domain. A finite element discretization, with the help of AFEPack, is applied to the physical space, and the resulting equations are solved by an approximate Newton iterative method. A comparison of the three stochastic methods through numerical experiments on different PN junctions is given. The numerical results show that, for such a complicated nonlinear problem, the stochastic Galerkin method offers no clear efficiency advantage over the other two methods apart from its accuracy, while the stochastic collocation method combines the accuracy of the stochastic Galerkin method with the easy implementation of the Monte Carlo method.

14.
何志权 《运筹学学报》2017,21(1):87-102
The multi-period income guarantee price under a constant-mix (CM) strategy is the theoretical basis for the guarantee fee charged by the issuer of a principal-protected fund that adopts a CM strategy with a stop-loss as its investment strategy, where the underlying asset is driven jointly by a compound Poisson process and a Wiener process. This pricing problem embeds an exotic option, and Monte Carlo simulation excels at such high-dimensional problems in quantitative finance. A present-value expression for the multi-period income guarantee price is derived under the risk-neutral measure, and a conditional Monte Carlo simulation formula for this expression is then obtained. Under given parameters, numerical solutions for the CM-strategy multi-period income guarantee price are computed with both plain Monte Carlo and conditional Monte Carlo, and the results show that both methods compute the solution effectively. The accuracy of the two methods is then assessed through the length of the confidence interval at a given significance level, and conditional Monte Carlo shows a substantial improvement over plain Monte Carlo. Finally, conditional Monte Carlo simulation is used to study how the multi-period income guarantee price varies over different parameter ranges.
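The variance-reduction mechanism of conditional Monte Carlo can be illustrated on a deliberately simple toy target (independent standard normals, not the article's compound-Poisson-plus-Wiener model): instead of simulating both variables and averaging an indicator, only one is simulated and the indicator is replaced by its exact conditional expectation.

```python
import numpy as np
from math import erfc, sqrt

# Toy illustration of conditional Monte Carlo (independent standard normals,
# not the article's model). To estimate P(X + Y > c), plain MC averages an
# indicator over (X, Y); conditional MC simulates only X and averages the
# exact conditional probability P(Y > c - X), the normal tail, which removes
# the variance contributed by Y.
def plain_mc(c, n, rng):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)
    return float(np.mean((x + y) > c))

def conditional_mc(c, n, rng):
    x = rng.standard_normal(n)
    tail = 0.5 * np.array([erfc((c - xi) / sqrt(2.0)) for xi in x])  # P(Y > c - x)
    return float(tail.mean())

rng = np.random.default_rng(4)
est_plain = plain_mc(2.0, 50_000, rng)
est_cond = conditional_mc(2.0, 50_000, rng)
```

Both estimators are unbiased for the same probability, but the conditional estimator's confidence interval is markedly shorter at equal sample size, which is the effect the abstract reports.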

15.
Regeneration is a useful tool in Markov chain Monte Carlo simulation because it can be used to side-step the burn-in problem and to construct better estimates of the variance of parameter estimates themselves. It also provides a simple way to introduce adaptive behavior into a Markov chain, and to use parallel processors to build a single chain. Regeneration is often difficult to take advantage of because, for most chains, no recurrent proper atom exists, and it is not always easy to use Nummelin's splitting method to identify regeneration times. This article describes a constructive method for generating a Markov chain with a specified target distribution and identifying regeneration times. As a special case of the method, an algorithm which can be “wrapped” around an existing Markov transition kernel is given. In addition, a specific rule for adapting the transition kernel at regeneration times is introduced, which gradually replaces the original transition kernel with an independence-sampling Metropolis-Hastings kernel using a mixture normal approximation to the target density as its proposal density. Computational gains for the regenerative adaptive algorithm are demonstrated in examples.

16.
The design of efficient algorithms for large-scale gas dynamics computations with hybrid (heterogeneous) computing systems whose high performance relies on massively parallel accelerators is addressed. A high-order accurate finite volume algorithm with polynomial reconstruction on unstructured hybrid meshes is used to compute compressible gas flows in domains of complex geometry. The basic operations of the algorithm are implemented in detail for massively parallel accelerators, including AMD and NVIDIA graphics processing units (GPUs). Major optimization approaches and a computation transfer technique are covered. The underlying programming tool is the Open Computing Language (OpenCL) standard, which performs on accelerators of various architectures, both existing and emerging.

17.
In this paper, we consider the 3D wave-packet transform, which is useful in 3D data processing. This transform is computationally intensive even though it has a computational complexity of O(N³ log N). Here we present its implementation on GPUs using NVIDIA CUDA technology. The code was tested on different types of graphics processors, achieving an average speedup of up to 46 times on a Tesla M2050 compared to sequential CPU code. We also analyzed its scalability across several GPUs. The code was tested for processing a synthetic seismic data set: data compression, de-noising, and interpolation.

18.
There are three traditional methods for estimating VaR: the variance-covariance method, historical simulation, and Monte Carlo simulation. The literature generally holds that Monte Carlo simulation has many advantages for measuring VaR. However, our empirical tests find that the VaR estimated by the traditional Monte Carlo method is biased low and performs poorly in backtesting. This paper introduces copula functions to improve the traditional Monte Carlo method. A copula links the individual marginal distributions to the multivariate joint distribution, can accommodate non-normal margins, and captures dependence that is no longer restricted to linear correlation. Empirical tests show that the copula-based Monte Carlo method measures portfolio VaR more accurately.
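A sketch of copula-based Monte Carlo VaR under illustrative assumptions (a Gaussian copula, standard Laplace margins as stand-ins for non-normal return distributions, and an equally weighted two-asset portfolio): correlated normals are pushed through the normal CDF to uniforms, the inverse marginal CDFs produce returns, and VaR is read off as an empirical quantile of the simulated loss.

```python
import numpy as np
from math import erf

# Illustrative copula-based Monte Carlo VaR: a Gaussian copula couples two
# heavy-tailed standard Laplace margins (stand-ins for non-normal returns),
# and VaR is the empirical quantile of the simulated loss of an equally
# weighted two-asset portfolio. All parameters are assumptions, not estimates.
def gaussian_copula_var(rho, n, alpha, rng):
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)      # copula layer
    u = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))     # normal CDF -> uniforms
    # Closed-form inverse CDF of the standard Laplace margin
    returns = np.where(u < 0.5, np.log(2.0 * u), -np.log(2.0 * (1.0 - u)))
    loss = -returns.sum(axis=1)
    return float(np.quantile(loss, alpha))                    # VaR at level alpha

rng = np.random.default_rng(5)
var_95 = gaussian_copula_var(0.5, 100_000, 0.95, rng)
```

Swapping the margins or the copula family changes only the transform steps, which is the flexibility the abstract attributes to the copula approach.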

19.
Kinetic Monte Carlo methods provide a powerful computational tool for the simulation of microscopic processes, such as the diffusion of interacting particles on a surface, at a detailed atomistic level. However, such algorithms are typically computationally expensive and are restricted to fairly small spatiotemporal scales. One approach towards overcoming this problem was the development of coarse-grained Monte Carlo algorithms. In recent literature, these methods were shown to be capable of efficiently describing much larger length scales while still incorporating information on microscopic interactions and fluctuations. In this paper, a coarse-grained Langevin system of stochastic differential equations, approximating the diffusion of interacting particles, is derived based on these earlier coarse-grained models. The authors demonstrate the asymptotic equivalence of the transient and long-time behavior of the Langevin approximation and the underlying microscopic process, using asymptotic methods such as large deviations for interacting particle systems, and furthermore present corresponding numerical simulations comparing statistical quantities such as mean paths, autocorrelations, and power spectra of the microscopic and the approximating Langevin processes. Finally, it is shown that the Langevin approximations presented here are much more computationally efficient than conventional kinetic Monte Carlo methods, since in addition to the reduction in the number of spatial degrees of freedom in coarse-grained Monte Carlo methods, the Langevin system of stochastic differential equations allows for multiple particle moves in a single timestep.
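Such Langevin systems are generically of the form dX = b(X) dt + sigma dW, and their computational advantage (all components move in one timestep) shows up already in a plain Euler-Maruyama sketch; the Ornstein-Uhlenbeck drift and noise level below are illustrative, not the paper's coarse-grained model.

```python
import numpy as np

# Generic Euler-Maruyama sketch of a Langevin system dX = b(X) dt + sigma dW.
# The Ornstein-Uhlenbeck drift b(x) = -x and sigma = sqrt(2) are illustrative,
# not the paper's coarse-grained model. Unlike single-move kinetic Monte
# Carlo, every component of X advances in each timestep.
def euler_maruyama(x0, drift, sigma, dt, n_steps, rng):
    x = np.asarray(x0, dtype=float)
    path = np.empty((n_steps + 1,) + x.shape)
    path[0] = x
    for i in range(n_steps):
        dw = rng.standard_normal(x.shape) * np.sqrt(dt)   # Brownian increment
        x = x + drift(x) * dt + sigma * dw                # all components at once
        path[i + 1] = x
    return path

rng = np.random.default_rng(6)
path = euler_maruyama(np.zeros(10), lambda x: -x, np.sqrt(2.0), 0.01, 20_000, rng)
```

For this choice of drift and noise the stationary distribution of each component is standard normal, which makes the long-time statistics easy to check against theory.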

20.
An optimized MPI+OpenACC implementation model that performs efficiently in CPU/GPU systems using large-eddy simulation is presented. The code was validated for the simulation of wave boundary-layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-based solver was developed for the solution of the Poisson equation for pressure, taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution on CPUs and is 10 times faster in computational time than a typical iterative preconditioned conjugate gradient solver on GPUs. In terms of parallel performance, an overlapping strategy was developed to reduce the overhead of performing MPI communications using GPUs. As a result, the weak scaling of the algorithm was improved by up to 30%. Finally, a large-scale simulation (Re = 2 × 10⁵) using a grid of 4 × 10⁸ cells was executed, and the performance of the code was analyzed. The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU cores) on one of the current top-10 supercomputers of the world (Piz Daint). A comparison of the overall computational time showed that the GPU version was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive with state-of-the-art CPU implementations, and it has the potential to take advantage of modern supercomputing capabilities.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号