期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

祁美玲杨琼王苍龙田园杨磊《计算物理》2017,34(4):461-467

基于NIVIDIA公司的CUDA架构对结构材料辐照损伤的分子动力学程序在单个GPU上进行并行化,并对影响程序运行效率的相关因素进行分析和测试.经过一系列优化,当粒子数为两百万时,对比单CPU的执行时间,优化后的GPU程序其双精度加速比可达112倍,单精度加速比达到了三百倍,为后续扩展多GPU结构材料辐照损伤的分子动力学程序奠定基础. 相似文献

2.

Fully parallel algorithm for simulating dispersion-managed wavelength-division-multiplexed optical fiber systems

Lushnikov PM 《Optics letters》2002,27(11):939-941

An efficient numerical algorithm is presented for massively parallel simulations of dispersion-managed wavelength-division-multiplexed optical fiber systems. The algorithm is based on a weak nonlinearity approximation and independent parallel calculations of fast Fourier transforms on multiple central processor units (CPUs). The algorithm allows one to implement numerical simulations M/2 times faster than a direct numerical simulation by a split-step method, where M is a number of CPUs in a parallel network. 相似文献

3.

激波诱导火焰界面失稳的并行化学加速计算

下载免费PDF全文

苏月涵沈小东傅嘉琛董刚《气体物理》2020,5(1):56-65

在带有详细化学反应机理的可压缩反应流数值模拟中，化学反应源项的计算会极大增加计算时间，基于建表技术的化学加速算法可以通过查找数据表中的数据来替代化学反应计算，从而有效提高计算效率，但数据表尺寸的过度增长会导致计算的中断.文章提出了基于两种数据表容量控制策略的并行动态存储/删除算法，并在激波诱导火焰界面失稳的数值模拟中进行了应用，以考察算法的性能.两种数据表容量控制策略分别为单表容量（M_sin）控制和总表容量（M_tot）控制，当单个数据表尺寸达到M_sin或总数据表尺寸达到M_tot时，对数据表进行节点删除，以保证计算的正常进行.计算结果表明，文章提出的基于表容量控制的并行加速算法，其计算准确度和计算效率之间存在关联，具有较好计算准确度算例显示了较高的计算效率.在不同的M_sin和M_tot条件下，计算的化学加速比在2.73~3.93之间.两种表控策略的组合影响了数据表删除的频率和删除之间的同步性，当数据表删除频率小、删除同步性强时，化学加速比要更高. 相似文献

4.

基于JASMIN三维势场快速多极子算法的并行实现

左风丽刘旭张宝印胡晓燕《计算物理》2013,30(1):140-147

在JASMIN上,基于进程/线程两级并行实现策略,研制三维Laplace核函数FMM(fast multipole method)的解法器模块"JASMIN-3DLapFMM".该解法器已成功应用于三维静电场远场势的并行计算.固定单机问题规模,在上万个处理器核上运行百亿粒子的大规模问题,获得进程级几乎线性的并行可扩展性.固定总的问题规模和1024个进程,4个线程时,获得大约3倍的加速. 相似文献

5.

Efficient solid state NMR powder simulations using SMP and MPP parallel computation

Kristensen JH Farnan I 《Journal of magnetic resonance (San Diego, Calif. : 1997)》2003,161(2):183-190

Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations. 相似文献

6.

GPU accelerated simulations of 3D deterministic particle transport using discrete ordinates method

Chunye Gong Jie Liu Lihua Chi Haowei Huang Jingyue Fang Zhenghu Gong 《Journal of computational physics》2011,230(15):6010-6022

Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great faculty in solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation involves the discrete ordinates (S_n) method and the procedure of source iteration. In this paper, we present a GPU accelerated simulation of one energy group time-independent deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations are reported with the simulations of vacuum boundary condition. The discussion of the relative advantages and disadvantages of the GPU implementation, the simulation on multi GPUs, the programming effort and code portability are also reported. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip for no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670. 相似文献

7.

A parallel additive Schwarz preconditioned Jacobi–Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation

Feng-Nan Hwang Zih-Hao Wei Tsung-Ming Huang Weichung Wang 《Journal of computational physics》2010,229(8):2932-2947

We develop a parallel Jacobi–Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi–Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc’s efficient and various parallel operations, linear solvers, preconditioning schemes, and easy usages. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schrödinger’s equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi–Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors. 相似文献

8.

Particle-Mesh Ewald(PME)算法的GPU加速 总被引：1，自引：0，他引：1

徐骥葛蔚任瑛李静海《计算物理》2010,27(4):548-554

讨论在NVIDIACUDA开发环境下,用GPU加速分子动力学模拟中静电作用的长程受力计算部分.采用Particle-Mesh Ewald(PME)方法,将其分解为参数确定、点电荷网格离散、离散网格的傅立叶变换、静电热能求解与静电力求解5个部分,并分别分析各部分的GPU实现.此方法已成功用于7个不同大小的生物分子体系的模拟计算,达到了7倍左右的加速.该程序可耦合到现有分子动力学模拟软件中,或作为进一步开发的GPU分子动力学程序的一部分,显著加速传统分子动力学程序. 相似文献

9.

Long-time integration methods for mesoscopic models of pattern-forming systems

Nasser Mohieddin Abukhdeir Dionisios G. Vlachos Markos Katsoulakis Michael Plexousakis 《Journal of computational physics》2011,230(14):5704-5715

Spectral methods for simulation of a mesoscopic diffusion model of surface pattern formation are evaluated for long simulation times. Backwards-differencing time-integration, coupled with an underlying Newton–Krylov nonlinear solver (SUNDIALS-CVODE), is found to substantially accelerate simulations, without the typical requirement of preconditioning. Quasi-equilibrium simulations of patterned phases predicted by the model are shown to agree well with linear stability analysis. Simulation results of the effect of repulsive particle–particle interactions on pattern relaxation time and short/long-range order are discussed. 相似文献

10.

三维面向对象的并行粒子模拟程序PLASIM3D 总被引：6，自引：1，他引：5

马燕云常文蔚银燕卓红斌徐涵《计算物理》2004,21(3):305-311

设计了基于区域分解的三维粒子模拟的并行算法,基于消息传递环境(MPI)编制了三维面向对象的并行粒子模拟程序PLASIM3D(PlasmaSimulator分别取前3个字母,3D表示三维).对激光与低密度等离子体薄靶相互作用问题作了粒子模拟计算,验证了该并行程序.最后在高性能并行机上测试并分析了并行性能,获得了接近线性的加速比. 相似文献

11.

三维电磁粒子模拟并行计算的研究 总被引：3，自引：0，他引：3

下载免费PDF全文

廖臣刘大刚刘盛纲《物理学报》2009,58(10):6709-6718

三维电磁粒子模拟基于时域有限差分算法（FDTD）和PIC（particle-in-cell）方法.根据FDTD和PIC方法的特点,可以将整个模拟区域分割为多个子区域,每个计算进程模拟计算一个子区域,通过消息传递交换子区域的边界数据从而实现并行计算这一基本思路,完成了并行算法的设计,并分析了并行加速比的影响因素.在三维电磁粒子模拟软件CHIPIC3D上实现了该并行算法并验证了算法的正确性,最后应用CHIPIC3D并行版本对磁绝缘线振荡器和相对论速调管两种典型的高功率微波源器件进行了模拟,证明了该并行算法能取 关键词：电磁粒子模拟时域有限差分并行计算高功率微波源相似文献

12.

Large-scale simulations on multiple Graphics Processing Units (GPUs) for the direct simulation Monte Carlo method

C.-C. Su M.R. Smith F.-A. Kuo J.-S. Wu C.-W. Hsieh K.-C. Tseng 《Journal of computational physics》2012,231(23):7932-7958

In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on Graphics Processing Units (GPUs) clusters is presented. An all-device (i.e. GPU) computational approach is adopted where the entire computation is performed on the GPU device, leaving the CPU idle during all stages of the computation, including particle moving, indexing, particle collisions and state sampling. Communication between the GPU and host is only performed to enable multiple-GPU computation. Results show that the computational expense can be reduced by 15 and 185 times when using a single GPU and 16 GPUs respectively when compared to a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs as compared to a single GPU for simulations using 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are employed to demonstrate the excellent capability of the current parallel DSMC method. 相似文献

13.

MPI在蒙特卡罗程序GMT中的应用和发展

许建亚杨磊张延师张勋超付芬张雅玲杨琼《原子核物理评论》2017,34(2):204-210

针对ADS颗粒靶概念的研究和设计,中国科学院近代物理研究所自主研发了蒙特卡罗模拟软件GMT。为了提高GMT程序的计算效率,研究了MPI在GMT中的应用和发展,实现了大规模随机数在进程中的随机分配,并采用快速读写文件的方式替代了MPI相关数据通信函数,极大地提高了计算效率。并研究了不同规模计算实例进程数、加速比、效率之间的关系,确定了最大加速进程数及并行效率最高时的进程数,为科研工作者在计算资源和计算效率之间选择最优计算方案提供了科学依据。MPI在GMT中的成功应用使计算资源得到了充分、高效的利用,极大地提高了计算效率,解决了蒙特卡罗方法中大规模事件模拟计算时间长、计算不稳定等问题,在散裂靶大规模扫描计算中发挥了重要的作用。For the research and design of the ADS granular-flow target concept, the Institute of Modern Physics, CAS has developed a Monte Carlo simulation software (GPU-accelerated Monte Carlo Transport program, GMT). In order to improve the computational efficiency of the GMT program, development and application of MPI in GMT were studied, to realize random distribution of the large-scale random number in the sub processes. Rapid reading and writing files were employed instead of the MPI data communication function, which greatly improves the computational efficiency. Different scale calculations were performed to study the relationship of process instance number, speedup to find the maximum acceleration process number and the number of processes when parallel efficiency is highest, which provides a scientific basis for researchers to optimize the computational program between computational resources and computation efficiency. The successful application of MPI in GMT, utilizes the computing resources fully and efficiently, improves the computational efficiency, solve the long time cost and unstable problem of Monte Carlo method in large-scale event simulations, plays an important role in the large-scale scanning calculation of the spallation target. 相似文献

14.

基于网格片的氧碘化学激光器多块并行数值模拟

下载免费PDF全文

郭红李艳安恒斌《强激光与粒子束》2014,26(8):081002

为实现氧碘化学激光器(COIL)喷管流场大规模数值模拟,采用VICON程序中的基本方程与数值算法,应用JASMIN框架中的基本数据结构——网格片,以及多块结构网格拼接并行算法,发展了三维多块COIL并行模拟程序。数值实验结果展示了该并行模拟程序的正确性及可扩展性,并在2048个处理器核上模拟450万网格单元算例,加速比超过420。相似文献

15.

激波与火焰面相互作用数值模拟的GPU加速 总被引：1，自引：0，他引：1

蒋华董刚陈霄《计算物理》2016,33(1):23-29

为考察计算机图形处理器（GPU）在计算流体力学中的计算能力,采用基于CPU/GPU异构并行模式的方法对激波与火焰界面相互作用的典型可压缩反应流进行数值模拟,优化并行方案,考察不同网格精度对计算结果和计算加速性能的影响.结果表明,和传统的基于信息传递的MPI 8线程并行计算相比,GPU并行模拟结果与MPI并行模拟结果相同;两种计算方法的计算时间均随网格数量的增加呈线性增长趋势,但GPU的计算时间比MPI明显降低.当网格数量较小时（1.6×10⁴）,GPU计算得到的单个时间步长平均时间的加速比为8.6;随着网格数量的增加,GPU的加速比有所下降,但对较大规模的网格数量（4.2×10⁶）,GPU的加速比仍可达到5.9.基于GPU的异构并行加速算法为可压缩反应流的高分辨率大规模计算提供了较好的解决途径. 相似文献

16.

基于"块-单元"数据结构的分子动力学并行计算 总被引：5，自引：0，他引：5

曹小林莫则尧张景琳陈其峰《计算物理》2004,21(5):377-385

开发了一种基于"块-单元"数据结构的可扩展并行算法,以实现大规模、非均匀分子动力学模拟.它采用空间填充曲线将三维区域分解转换为-维负载平衡问题,然后用基于实测的多层均权法求解,以保持处理机间负载均衡.在一个MPP并行机的500个CPU上,模拟包含2.1×108个粒子的三维金属微喷射模型,该算法获得了420倍的加速比. 相似文献

17.

扩散方程的守恒型并行计算格式 总被引：4，自引：4，他引：0

袁光伟杭旭登《计算物理》2010,27(4):475-491

辐射流体力学实际问题计算中扩散方程的计算量极大,必须采用并行计算.研究易于在并行机上实施的高效的并行计算方法,通过采用预估修正等多种方式,构造和发展既保持隐式格式的守恒性、同时能保持所需精度与无条件稳定性的并行计算格式,以满足大规模数值求解辐射流体力学问题的需求. 相似文献

18.

Acceleration of canonical molecular dynamics simulations using macroscopic expansion of the fast multipole method combined with the multiple timestep integrator algorithm

MASAAKI KAWATA MASUHIRO MIKAMI 《Molecular physics》2013,111(8):521-528

A canonical molecular dynamics (MD) simulation was accelerated by using an efficient implementation of the multiple timestep integrator algorithm combined with the periodic fast multiple method (MEFMM) for both Coulombic and van der Waals interactions. Although a significant reduction in computational cost has been obtained previously by using the integrated method, in which the MEFMM was used only to calculate Coulombic interactions (Kawata, M., and Mikami, M., 2000, 98, J. Comput. Chem., in press), the extension of this method to include van der Waals interactions yielded further acceleration of the overall MD calculation by a factor of about two. Compared with conventional methods, such as the velocity-Verlet algorithm combined with the Ewald method (timestep of 0.25 fs), the speedup by using the extended integrated method amounted to a factor of 500 for a 100 ps simulation. Therefore, the extended method reduces substantially the computational effort of large scale MD simulations. 相似文献

19.

强光电离团簇并行数值模拟

下载免费PDF全文

傅饶汪淼陆茵菲张贵忠向望华徐德刚姚建铨《强激光与粒子束》2011,23(2)

为研究强激光电离氢原子团簇,在理论上采用1维氢原子团簇的经典粒子动力学模型,结合粒子对(PP)算法及粒子模拟（PIC）方法,采用自行搭建的9节点并行集群系统,利用消息传递接口（MPI)与OpenMP混合编程模型进行了并行数值模拟计算,获得了较为理想的计算加速比。并且引入了弛豫时间参数,有效地处理了粒子间的碰撞过程,在极大简化计算量的同时,保留了物理本质。所得模拟结果与已有的实验结果符合较好,表明该并行计算模型是稳定、可行的。相似文献

20.

Accelerated GPU simulation of compressible flow by the discontinuous evolution Galerkin method

B. J. Block M. Lukáčová-Medvid’ová P. Virnau L. Yelash 《The European physical journal. Special topics》2012,210(1):119-132

The aim of the present paper is to report on our recent results for GPU accelerated simulations of compressible flows. For numerical simulation the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic based evolution Galerkin operator has been used. For time discretization we have applied the explicit third order Runge-Kutta method. Evaluation of the genuinely multidimensional evolution operator has been accelerated using the GPU implementation. We have obtained a speedup up to 30 (in comparison to a single CPU core) for the calculation of the evolution Galerkin operator on a typical discretization mesh consisting of 16384 mesh cells. 相似文献