1.

Pseudo-random number generation for Brownian Dynamics and Dissipative Particle Dynamics simulations on GPU devices

Carolyn L. Phillips Joshua A. Anderson Sharon C. Glotzer 《Journal of computational physics》2011,230(19):7191-7201

Brownian Dynamics (BD), also known as Langevin Dynamics, and Dissipative Particle Dynamics (DPD) are implicit solvent methods commonly used in models of soft matter and biomolecular systems. The interaction of the numerous solvent particles with larger particles is coarse-grained as a Langevin thermostat applied to individual particles or to particle pairs. The Langevin thermostat requires a pseudo-random number generator (PRNG) to generate the stochastic force applied to each particle or pair of neighboring particles during each time step in the integration of Newton’s equations of motion. In a Single-Instruction-Multiple-Thread (SIMT) GPU parallel computing environment, small batches of random numbers must be generated over thousands of threads and millions of kernel calls. In this communication we introduce a one-PRNG-per-kernel-call-per-thread scheme, in which a micro-stream of pseudorandom numbers is generated in each thread and kernel call. These high quality, statistically robust micro-streams require no global memory for state storage, are more computationally efficient than other PRNG schemes in memory-bound kernels, and uniquely enable the DPD simulation method without requiring communication between threads.
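The micro-stream idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the mixing function (a murmur3-style 32-bit finalizer), the xorshift32 generator, and the names `mix32`, `micro_stream`, and `pair_stream` are all assumptions chosen for brevity. The sketch only shows the general principle that a generator's state can be derived from identifiers (seed, time step, particle index) at the start of each call, so no state has to be stored between calls, and that seeding a pair from its ordered indices lets two threads obtain the same number without communicating.

```python
MASK32 = 0xFFFFFFFF


def mix32(x: int) -> int:
    """32-bit avalanche mix (murmur3-style finalizer); an illustrative choice."""
    x &= MASK32
    x ^= x >> 16
    x = (x * 0x85EBCA6B) & MASK32
    x ^= x >> 13
    x = (x * 0xC2B2AE35) & MASK32
    x ^= x >> 16
    return x


def micro_stream(seed: int, timestep: int, a: int, b: int = 0, n: int = 3):
    """Return n floats in [0, 1) from a generator that exists only for this call."""
    # The state is derived from the identifiers alone: nothing is stored
    # between calls, which mimics the "no global memory for state" property.
    state = mix32(seed)
    for word in (timestep, a, b):
        state = mix32(state ^ mix32(word + 0x9E3779B9))
    state = state or 1  # xorshift32 must not start from zero
    out = []
    for _ in range(n):
        # xorshift32 step, used here as a stand-in for a higher-quality PRNG
        state ^= (state << 13) & MASK32
        state ^= state >> 17
        state ^= (state << 5) & MASK32
        out.append(state / 2**32)
    return out


def pair_stream(seed: int, timestep: int, i: int, j: int, n: int = 1):
    """Order-independent stream for the particle pair (i, j)."""
    return micro_stream(seed, timestep, min(i, j), max(i, j), n)
```

In this sketch, the thread handling particle 3 and the thread handling particle 7 would both call `pair_stream(seed, step, 3, 7)` or `pair_stream(seed, step, 7, 3)` and obtain identical values, which is the property a pairwise DPD random force needs. A production scheme would use a generator and hash that have passed statistical test batteries.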

2.

Monte Carlo simulations: Hidden errors from "good" random number generators

Ferrenberg AM Landau DP Wong YJ 《Physical review letters》1992,69(23):3382-3384

3.

A massively parallel fractional step solver for incompressible flows

G. Houzeaux M. Vázquez R. Aubry J.M. Cela 《Journal of computational physics》2009,228(17):6316-6332

This paper presents a parallel implementation of fractional solvers for the incompressible Navier–Stokes equations using an algebraic approach. Under this framework, predictor–corrector and incremental projection schemes are seen as sub-classes of the same class, making apparent their differences and similarities. An additional advantage of this approach is that it sets a common basis for a parallelization strategy, which can be extended to other split techniques or to compressible flows.

4.

Very-large-scale parallel computing tests of JEMS-FDTD

Li Hanyu (李瀚宇), Zhou Haijing (周海京), Liao Cheng (廖成) 《High Power Laser and Particle Beams》 (强激光与粒子束) 2011,23(11):0

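This paper presents parallel performance tests of JEMS-FDTD on large-scale parallel computers, covering the effect of grid patch size on performance, hybrid MPI/OpenMP performance on a single node, hybrid MPI/OpenMP performance across multiple nodes, and large-scale parallel performance. A test case containing an electrically large, complex, realistic structural model is also presented, computed, and analyzed. The tests show that JEMS-FDTD can efficiently use tens of thousands of processor cores for parallel computation. The large test case shows that JEMS-FDTD can effectively compute and analyze electrically large, complex, realistic structural models.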

5.

Colloquium: Large scale simulations on GPU clusters

Massimo Bernaschi Mauro Bisson Massimiliano Fatica 《The European Physical Journal B - Condensed Matter and Complex Systems》2015,88(6):158

Graphics processing units (GPU) are currently used as a cost-effective platform for computer simulations and big-data processing. Large scale applications require that multiple GPUs work together but the efficiency obtained with cluster of GPUs is, at times, sub-optimal because the GPU features are not exploited at their best. We describe how it is possible to achieve an excellent efficiency for applications in statistical mechanics, particle dynamics and networks analysis by using suitable memory access patterns and mechanisms like CUDA streams, profiling tools, etc. Similar concepts and techniques may be applied also to other problems like the solution of Partial Differential Equations.

6.

Study of a high-voltage pulsed power supply with multiple Marx generators in parallel

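Rao Junfeng (饶俊峰), Hong Lingfeng (洪凌锋), Guo Longyue (郭龙跃), Li Zi (李孜), Jiang Song (姜松) 《High Power Laser and Particle Beams》 (强激光与粒子束) 2020,32(5):055001-1-055001-6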

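Pulsed power technology is widely used in industrial and biomedical fields, and many applications require high-voltage pulses of several hundred amperes. Although solid-state Marx generators have been studied for many years, the widely used through-hole packaged IGBT and MOSFET power semiconductor switches are usually rated below 100 A, which cannot meet the needs of low-impedance loads. To increase the output pulse current amplitude, two topologies for a pulsed power supply with multiple Marx generators in parallel are proposed: the first connects multiple Marx generators directly in parallel, and the second connects multiple Marx generators in parallel with a shared set of charging switches. An FPGA provides the charge and discharge control signals, a series-core magnetic-ring isolated drive scheme provides synchronous driving with negative voltage bias, and the main circuit is a half-bridge solid-state square-wave Marx circuit that uses IGBTs with fast turn-on and high current capacity as the main switches. Experimental results show that a pulse generator built from 6 directly paralleled 16-stage Marx generators can output high-voltage square-wave pulses at a repetition rate of 100 Hz with an amplitude of up to 10 kV, a peak current of up to 300 A at a 30 Ω load, and a rise time of 230 ns. The 6-channel, 4-stage parallel Marx generator with shared charging switches reaches a peak output current of up to 300 A into a 5 Ω resistive load, a maximum output current of up to 460 A, and a rise time of 272 ns. These results show that paralleling multiple Marx generators effectively reduces the internal resistance of the system and improves its load-driving capability. The improved parallel scheme achieves high-current pulse output while nearly halving the number of switches used, which improves the system's immunity to interference and lowers the cost of the pulsed power supply. Adding parallel wires between stages further improves current sharing.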

7.

A massively parallel multi-block hybrid compact–WENO scheme for compressible flows

J. Chao A. Haselbacher S. Balachandar 《Journal of computational physics》2009,228(19):7473-7491

A new multi-block hybrid compact–WENO finite-difference method for the massively parallel computation of compressible flows is presented. In contrast to earlier methods, our approach breaks the global dependence of compact methods by using explicit finite-difference methods at block interfaces and is fully conservative. The resulting method is fifth- and sixth-order accurate for the convective and diffusive fluxes, respectively. The impact of the explicit interface treatment on the stability and accuracy of the multi-block method is quantified for the advection and diffusion equations. Numerical errors increase slightly as the number of blocks is increased. It is also found that the maximum allowable time steps increase with the number of blocks. The method demonstrates excellent scalability on up to 1264 processors.

8.

GPU based parallel framework for receiver coil sensitivity estimation in SENSE reconstruction

《Magnetic resonance imaging》2021

Magnetic Resonance Imaging (MRI) uses non-ionizing radiation and is safer than CT and X-ray imaging. MRI is broadly used around the globe for medical diagnostics. One main limitation of MRI is its long data acquisition time. Parallel MRI (pMRI) was introduced in the late 1990s to reduce the MRI data acquisition time. In pMRI, data is acquired by under-sampling the Phase Encoding (PE) steps, which introduces aliasing artefacts in the MR images. SENSitivity Encoding (SENSE) is a pMRI-based method that reconstructs a fully sampled MR image from the acquired under-sampled data using the sensitivity information of the receiver coils. In SENSE, precise estimation of the receiver coil sensitivity maps is vital to obtain good quality images. The Eigen-value method (a method recently proposed in the literature for estimating receiver coil sensitivity information) does not require a pre-scan image, unlike other conventional methods of sensitivity estimation. However, the Eigen-value method is computationally intensive and takes a significant amount of time to estimate the receiver coil sensitivity maps. This work proposes a parallel framework for the Eigen-value method of receiver coil sensitivity estimation that exploits its inherent parallelism using Graphics Processing Units (GPUs). We evaluated the performance of the proposed algorithm on in-vivo and simulated MRI datasets (i.e. human head and simulated phantom datasets) with Peak Signal-to-Noise Ratio (PSNR) and Artefact Power (AP) as evaluation metrics. The results show that the proposed GPU implementation reduces the execution time of the Eigen-value method of receiver coil sensitivity estimation (providing up to a 30-fold speedup in our experiments) without degrading the quality of the reconstructed image.

9.

Mathematical analysis of coupled parallel simulations

Shirts MR Pande VS 《Physical review letters》2001,86(22):4983-4987

A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories. In many cases, this produces nearly linear speedup over a single simulation (M times faster with M simulations), rendering previously intractable problems within reach of large computer clusters. Interestingly, by varying the coupling of the parallel simulations, it is possible in some systems to obtain greater than linear speedup. The methods are generalizable to any search algorithm with long residence times in intermediate states.
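The linear speedup quoted in this abstract can be motivated by a short worked equation. This is an idealized illustration, not the paper's own derivation. It assumes that the time $t$ for a single simulation to leave a state is exponentially distributed with rate $k$ (a symbol introduced here), and that the $M$ replicas are independent. The first replica to leave then does so, on average, $M$ times sooner:

```latex
P\!\left(\min_{1 \le i \le M} t_i > t\right)
  = \prod_{i=1}^{M} e^{-k t}
  = e^{-M k t},
\qquad
\langle t_{\min} \rangle = \frac{1}{M k} = \frac{\langle t \rangle}{M}.
```

When the escape-time distribution is not exponential, the ratio $\langle t \rangle / \langle t_{\min} \rangle$ can differ from $M$, which is consistent with the abstract's remark that greater than linear speedup is possible in some systems.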

10.

Proof-of-concept implementation of the massively parallel algorithm for simulation of dispersion-managed WDM optical fiber systems

Korotkevich AO Lushnikov PM 《Optics letters》2011,36(10):1851-1853

We perform a proof-of-concept implementation of the massively parallel algorithm [P. M. Lushnikov, Opt. Lett. 27, 939 (2002)] for simulation of dispersion-managed wavelength-division-multiplexed optical fiber systems. Linear scalability of the algorithm with the number of computer cores is demonstrated. An exact result on the accuracy of the implemented algorithm is found analytically, confirmed numerically, and compared with the accuracy of the standard split-step algorithm.

11.

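Study of a true random number generator based on mouse trajectories and chaotic systems (total citations: 2; self-citations: 0; citations by others: 2)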

Zhou Qing (周庆), Hu Yue (胡月), Liao Xiaofeng (廖晓峰) 《Acta Physica Sinica》 (物理学报) 2008,57(9):5413-5418

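A true random number generator based on mouse trajectories is proposed, and the advantages and disadvantages, overall design, and basic techniques of this class of generator are studied. To eliminate the similarity present in the mouse trajectories of the same user, the sensitivity of chaotic systems is exploited, and the mouse trajectories are post-processed by two methods: an image encryption algorithm and a hash function. Extensive, rigorous tests and experiments show that the improved true random number generator based on a chaotic hash function is secure, fast, convenient, and inexpensive, and can be used in practice on personal computers. Keywords: chaos; true random number generator; mouse trajectory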

12.

From massively parallel algorithms and fluctuating time horizons to nonequilibrium surface growth

Korniss G Toroczkai Z Novotny MA Rikvold PA 《Physical review letters》2000,84(6):1351-1354

We study the asymptotic scaling properties of a massively parallel algorithm for discrete-event simulations where the discrete events are Poisson arrivals. The evolution of the simulated time horizon is analogous to a nonequilibrium surface. Monte Carlo simulations and a coarse-grained approximation indicate that the macroscopic landscape in the steady state is governed by the Edwards-Wilkinson Hamiltonian. Since the efficiency of the algorithm corresponds to the density of local minima in the associated surface, our results imply that the algorithm is asymptotically scalable.

13.

Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project

Andrade X Alberdi-Rodriguez J Strubbe DA Oliveira MJ Nogueira F Castro A Muguerza J Arruabarrena A Louie SG Aspuru-Guzik A Rubio A Marques MA 《J Phys Condens Matter》2012,24(23):233202

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

14.

Study of a GPU-parallelized acceleration algorithm for data processing in the EAST motional Stark effect diagnostic

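Zhi Yuqin (智玉琴), Huang Yao (黄耀), Fu Jia (符佳), Chen Ying (陈颖), Wang Feng (王枫), Yu Qingjiang (余青江), Li Yingying (李颖颖), Wu Zhenwei (吴振伟), Wan Baonian (万宝年), Lyu Bo (吕波) 《Nuclear Fusion and Plasma Physics》 (核聚变与等离子体物理) 2020,40(1):23-27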

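In data processing for the single-channel motional Stark effect (MSE) diagnostic system on the EAST device, a heterogeneous CPU (central processing unit) + GPU (graphics processing unit) model is adopted to parallelize and accelerate the digital harmonic analysis (DHA) algorithm. The CPU loads the data and performs simple mathematical calculations, while the GPU carries out the parallel computations of the DHA algorithm, including the forward and inverse Fourier transforms and filtering. Compared with the serial algorithm, a speedup of more than 2000 times is obtained, which meets the requirement for timely data processing during MSE diagnostic experiments.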

15.

Combining random number generators using cut and project sequences

Louis-Sébastien Guimond Jiří Patera Jan Patera 《Czechoslovak Journal of Physics》2001,51(4):305-311

This paper discusses the use of aperiodic (binary or ternary) sequences in combining pseudorandom number generators (RNG). We introduce a method for combining two or three RNGs using cut and project sequences. This combination method produces aperiodic number sequences having no lattice structure. Theoretical results are announced. This work was partially supported by the Bell Canada University Laboratory, NSERC of Canada and FCAR of Québec. Presented by L.-S. Guimond at the DI-CRM Workshop held in Prague, 18–21 June 2000.

16.

A methodology for multivariate simulation with massively parallel computing systems for NPP safety assessment: VARIA code

E. V. Moiseenko A. S. Filippov 《Journal of Engineering Thermophysics》2011,20(3):249-259

The paper focuses on the practical application of parallel computing techniques to uncertainty assessment in the simulation of heat transfer, mechanical and some other problems related to deterministic analysis of NPP safety. A methodology is developed and implemented in the VARIA computer code, which runs multiple simulations simultaneously on a parallel computing system and then performs a statistical analysis of the array of their results. The current version of the code allows automated preparation and execution of multivariate simulations of the thermal and mechanical behavior of pressurized water reactor structures by best-estimate (BE) codes in the scope of NPP safety assessment under severe accident conditions. The number of simultaneously launched tasks is limited only by the computer cluster capacity. The VARIA code is verified on a multivariate simulation, with the HEFEST code, of the thermal behavior of a core melt in the VVER-440 reactor vessel during a severe accident. The influence of the variation of input parameters (decay heat value and coefficients of the applied convective heat transfer model) on the simulation results is studied. It is concluded that the potential field of application of the program extends beyond the analysis of severe accidents at NPPs and also includes software product quality assurance and analysis of uncertainties of obtained simulation results.

17.

Effects of the random number generator on computer simulations

Giorgio Parisi Federico Rapuano 《Physics letters. [Part B]》1985,157(4):301-302

We have measured the susceptibility of a three-dimensional Ising system in a box of 24³ size. Our results do not agree within four standard deviations with the previous result obtained with a special-purpose machine. The origin of the discrepancy is due, in our opinion, to the different random number.

18.

High efficiency redundant binary number representations for parallel arithmetic on optical computers

G.A. De Biase A. Massini 《Optics & Laser Technology》1994,26(4)

A family of redundant binary number representations, obtained by generalization of the RB (redundant binary) number representation, is introduced. All these number representations are suitable for optical computing and have properties similar to the RB representation. In particular, the p-RB (packed redundant binary) number representation introduced in this work has efficiency greater than both RB and MSD (modified signed digit) representations. With p-RB numbers the algebraic sum is always permitted in constant time for any efficiency value. p-RB representations also fit in a natural way the 2's complement binary number system. Symbolic substitution truth tables for the algebraic sum and several examples of computation are also given.

19.

GPU accelerated simulations of bluff body flows using vortex particle methods

Diego Rossinelli Michael Bergdorf Georges-Henri Cottet Petros Koumoutsakos 《Journal of computational physics》2010,229(9):3316-3333

We present a GPU accelerated solver for simulations of bluff body flows in 2D using a remeshed vortex particle method and the vorticity formulation of the Brinkman penalization technique to enforce boundary conditions. The efficiency of the method relies on fast and accurate particle-grid interpolations on GPUs for the remeshing of the particles and the computation of the field operators. The GPU implementation uses OpenGL so as to perform efficient particle-grid operations and a CUFFT-based solver for the Poisson equation with unbounded boundary conditions. The accuracy and performance of the GPU simulations and their relative advantages/drawbacks over CPU based computations are reported in simulations of flows past an impulsively started circular cylinder from Reynolds numbers between 40 and 9500. The results indicate up to two orders of magnitude speed up of the GPU implementation over the respective CPU implementations. The accuracy of the GPU computations depends on the Re number of the flow. For Re up to 1000 there is little difference between GPU and CPU calculations but this agreement deteriorates (albeit remaining to within 5% in drag calculations) for higher Re numbers as the single precision of the GPU adversely affects the accuracy of the simulations.

20.

Using GPU parallelization to perform realistic simulations of the LPCTrap experiments

X. Fabian F. Mauger G. Quéméner Ph. Velten G. Ban C. Couratin P. Delahaye D. Durand B. Fabre P. Finlay X. Fléchard E. Liénard A. Méry O. Naviliat-Cuncic B. Pons T. Porobic N. Severijns J. C. Thomas 《Hyperfine Interactions》2015,232(1-3):87-95

We performed classical molecular dynamics (MD) simulations in order to search for the conditions for efficient sympathetic cooling of highly charged ions (HCIs) in a linear Paul trap. Small two-component ion Coulomb crystals consisting of laser-cooled ions and HCIs were characterized by the results of the MD simulations. We found that the spatial distribution is determined not only by the charge-to-mass ratio but also by the space charge effect. Moreover, the simulation results suggest that the temperature of HCIs does not necessarily decrease as the number of laser-cooled ions increases in the case of linear ion crystals. We also determined the cooling limit of sympathetically cooled ¹⁶⁵Ho¹⁴⁺ ions in small linear ion Coulomb crystals. The present results show that sub-milli-Kelvin temperatures of at least 10 Ho¹⁴⁺ ions will be achieved by sympathetic cooling with a single laser-cooled Be⁺.