20 similar documents were found; search took 78 ms.
1.
2.
Lushnikov PM 《Optics letters》2002,27(11):939-941
An efficient numerical algorithm is presented for massively parallel simulations of dispersion-managed wavelength-division-multiplexed optical fiber systems. The algorithm is based on a weak-nonlinearity approximation and independent parallel calculations of fast Fourier transforms on multiple central processing units (CPUs). It allows numerical simulations to run M/2 times faster than direct simulation by the split-step method, where M is the number of CPUs in the parallel network.
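The weak-nonlinearity idea builds on the standard split-step Fourier scheme, whose serial form can be sketched as follows (a toy illustration only: the paper's contribution is distributing the FFT work over many CPUs, which is not shown, and the sign/normalization conventions here are assumptions):

```python
import numpy as np

def split_step_nlse(u, dz, beta2, gamma, dt):
    """One symmetric split-step for the scalar NLSE
    i u_z = (beta2/2) u_tt - gamma |u|^2 u (convention assumed)."""
    w = 2 * np.pi * np.fft.fftfreq(u.size, d=dt)     # angular frequency grid
    half_disp = np.exp(0.25j * beta2 * w**2 * dz)    # half step of dispersion
    u = np.fft.ifft(half_disp * np.fft.fft(u))
    u = u * np.exp(1j * gamma * np.abs(u)**2 * dz)   # full nonlinear phase step
    return np.fft.ifft(half_disp * np.fft.fft(u))    # second dispersion half step
```

Both sub-steps are unitary, so the pulse energy is conserved exactly, which makes a convenient sanity check.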
3.
In numerical simulations of compressible reacting flows with detailed chemical kinetics, evaluating the chemical source terms dominates the computation time. Tabulation-based chemistry acceleration replaces these evaluations with table lookups and thus improves efficiency, but unbounded growth of the table size can abort the computation. This paper proposes a parallel dynamic storage/deletion algorithm based on two table-capacity control strategies and applies it to simulations of shock-induced flame-interface instability to assess its performance. The two strategies control either the single-table capacity (Msin) or the total table capacity (Mtot): when an individual table reaches Msin, or the combined tables reach Mtot, nodes are deleted from the table so that the computation can proceed normally. The results show a correlation between accuracy and efficiency for the proposed capacity-controlled parallel algorithm: the cases with better accuracy also exhibit higher efficiency. Across different Msin and Mtot settings, the chemistry speedup ranges from 2.73 to 3.93. The combination of the two control strategies affects the frequency of table deletions and their synchrony across processes; the speedup is higher when deletions are infrequent and well synchronized.
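The single-table capacity control can be illustrated with a minimal cache sketch. The eviction policy below is least-recently-used, which is an assumption (the abstract does not specify the node-deletion criterion), and the names `CappedTable` and `m_sin` are illustrative:

```python
from collections import OrderedDict

class CappedTable:
    """Tabulation sketch: cache expensive reaction-source evaluations and
    delete nodes once the single-table capacity (Msin) is reached, so the
    run never aborts on unbounded table growth."""
    def __init__(self, m_sin, evaluate):
        self.m_sin = m_sin          # single-table capacity limit
        self.evaluate = evaluate    # expensive chemistry integration
        self.table = OrderedDict()

    def query(self, state):
        if state in self.table:
            self.table.move_to_end(state)   # mark as recently used
            return self.table[state]
        value = self.evaluate(state)
        if len(self.table) >= self.m_sin:
            self.table.popitem(last=False)  # delete least-recently-used node
        self.table[state] = value
        return value
```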
4.
5.
Kristensen JH Farnan I 《Journal of magnetic resonance (San Diego, Calif. : 1997)》2003,161(2):183-190
Methods for parallel simulation of solid-state NMR powder spectra are presented for both shared- and distributed-memory parallel supercomputers. For shared-memory architectures, the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that designing correct and efficient shared-memory parallel programs is difficult, as the performance depends on data locality and cache effects. The distributed-memory parallel programming model is examined for simulation programs using the MPI message-passing interface. The results reveal that both shared- and distributed-memory parallel computation are very efficient, with almost perfect application speedup, and may be applied to the most advanced powder simulations.
6.
Chunye Gong Jie Liu Lihua Chi Haowei Huang Jingyue Fang Zhenghu Gong 《Journal of computational physics》2011,230(15):6010-6022
Graphics Processing Units (GPUs), originally developed for real-time, high-definition 3D graphics in computer games, now provide substantial computing power for scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution of the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU-accelerated simulation of one-energy-group, time-independent, deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for vacuum boundary conditions. The relative advantages and disadvantages of the GPU implementation, simulation on multiple GPUs, programming effort, and code portability are also discussed. The results show that the overall speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip with no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.
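The two ingredients named in the abstract, an ordinates sweep plus source iteration, can be sketched in a serial 1D slab toy (S2 quadrature, diamond differencing, vacuum boundaries). This is a didactic stand-in, not the paper's 3D GPU Sweep3D; all parameter values are illustrative:

```python
import numpy as np

def source_iteration(nx=50, length=10.0, sig_t=1.0, sig_s=0.5, q=1.0,
                     tol=1e-8, max_it=500):
    """Toy 1D S2 transport: diamond-difference sweeps + source iteration."""
    dx = length / nx
    mus = np.array([-1 / np.sqrt(3), 1 / np.sqrt(3)])   # S2 ordinates
    wts = np.array([1.0, 1.0])                          # quadrature weights
    phi = np.zeros(nx)                                  # scalar flux
    for _ in range(max_it):
        src = 0.5 * (sig_s * phi + q)                   # isotropic emission
        phi_new = np.zeros(nx)
        for mu, w in zip(mus, wts):
            psi_in = 0.0                                # vacuum boundary
            cells = range(nx) if mu > 0 else range(nx - 1, -1, -1)
            for i in cells:                             # directional sweep
                a = abs(mu) / dx
                psi_c = (src[i] + 2 * a * psi_in) / (sig_t + 2 * a)
                psi_in = 2 * psi_c - psi_in             # outgoing edge flux
                phi_new[i] += w * psi_c
        if np.max(np.abs(phi_new - phi)) < tol:
            return phi_new
        phi = phi_new
    return phi
```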
7.
Feng-Nan Hwang Zih-Hao Wei Tsung-Ming Huang Weichung Wang 《Journal of computational physics》2010,229(8):2932-2947
We develop a parallel Jacobi–Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems, with application in quantum dot simulation. A Jacobi–Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and varied parallel operations, linear solvers, preconditioning schemes, and ease of use. The parallel eigenvalue solver is then used to solve higher-degree polynomial eigenvalue problems arising in numerical simulations of three-dimensional quantum dots governed by Schrödinger's equation. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi–Davidson algorithm, very efficiently in parallel. The overall performance is also quite satisfactory: we have observed near-perfect superlinear speedup using up to 320 processors, and the parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes using 272 Intel 3.0 GHz processors.
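For small dense instances, the polynomial (here quadratic) eigenvalue problem can be solved by companion linearization, which illustrates the problem class; the paper instead applies parallel Jacobi–Davidson to large sparse problems, so this numpy sketch is only a reference baseline and the function name is illustrative:

```python
import numpy as np

def quadratic_eig(M, C, K):
    """Solve (lam^2 M + lam C + K) v = 0 via first companion linearization
    A z = lam B z with z = [v; lam v]."""
    n = M.shape[0]
    A = np.block([[np.zeros((n, n)), np.eye(n)], [-K, -C]])
    B = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), M]])
    lam, Z = np.linalg.eig(np.linalg.solve(B, A))   # dense, assumes M invertible
    return lam, Z[:n, :]                            # eigenvector is the top block
```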
8.
GPU Acceleration of the Particle-Mesh Ewald (PME) Algorithm    Cited by: 1 (self-citations: 0, others: 1)
We discuss accelerating the long-range electrostatic force calculation of molecular dynamics simulations on the GPU in the NVIDIA CUDA development environment. The Particle-Mesh Ewald (PME) method is decomposed into five parts: parameter determination, discretization of point charges onto the mesh, Fourier transform of the discretized mesh, evaluation of the electrostatic energy, and evaluation of the electrostatic forces; the GPU implementation of each part is analyzed. The method has been applied successfully to simulations of seven biomolecular systems of different sizes, achieving a speedup of about 7x. The program can be coupled into existing molecular dynamics packages, or serve as part of a GPU molecular dynamics code under further development, significantly accelerating conventional molecular dynamics programs.
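The second stage of the decomposition, discretizing point charges onto the mesh, can be sketched in 1D. Nearest-grid-point assignment is used for brevity (real PME uses B-spline interpolation), and the function name is illustrative:

```python
import numpy as np

def spread_charges(positions, charges, n_grid, box):
    """Assign point charges to a periodic 1D mesh (nearest grid point),
    producing the charge density that the FFT stage then transforms."""
    rho = np.zeros(n_grid)
    h = box / n_grid                           # mesh spacing
    for x, q in zip(positions, charges):
        idx = int(round(x / h)) % n_grid       # wrap periodically
        rho[idx] += q
    return rho

# Next stage of the pipeline: rho_k = np.fft.fft(rho)
```

A useful invariant of any charge-assignment scheme is that the total charge on the mesh equals the sum of the point charges.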
9.
Nasser Mohieddin Abukhdeir Dionisios G. Vlachos Markos Katsoulakis Michael Plexousakis 《Journal of computational physics》2011,230(14):5704-5715
Spectral methods for simulating a mesoscopic diffusion model of surface pattern formation are evaluated for long simulation times. Backward-differencing time integration, coupled with an underlying Newton–Krylov nonlinear solver (SUNDIALS-CVODE), is found to substantially accelerate simulations without the typical requirement of preconditioning. Quasi-equilibrium simulations of patterned phases predicted by the model are shown to agree well with linear stability analysis. Simulation results on the effect of repulsive particle–particle interactions on pattern relaxation time and short/long-range order are discussed.
10.
11.
Three-dimensional electromagnetic particle simulation is based on the finite-difference time-domain (FDTD) method and the particle-in-cell (PIC) method. Exploiting the characteristics of FDTD and PIC, the whole simulation region can be divided into subdomains, with each computing process simulating one subdomain and exchanging subdomain boundary data by message passing to realize parallel computation. Following this basic approach, a parallel algorithm was designed and the factors affecting its parallel speedup were analyzed. The algorithm was implemented and verified in the three-dimensional electromagnetic particle simulation code CHIPIC3D, and the parallel version of CHIPIC3D was then used to simulate two typical high-power microwave sources, a magnetically insulated line oscillator and a relativistic klystron, demonstrating that the parallel algorithm achieves good speedup.
Keywords:
electromagnetic particle simulation
finite-difference time-domain
parallel computing
high-power microwave sources
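The subdomain-with-boundary-exchange idea can be mimicked serially: give each subdomain one ghost cell per side, copy neighbour data into the ghosts (standing in for MPI messages), then let each "process" update its own interior independently. A 1D diffusion toy under those assumptions, not the code's actual FDTD/PIC update:

```python
import numpy as np

def decomposed_step(subdomains, alpha=0.25):
    """One explicit update on a 1D field split into subdomain arrays that
    each carry one ghost cell per side; reflecting walls at the ends."""
    last = len(subdomains) - 1
    for i, u in enumerate(subdomains):          # halo exchange
        u[0] = subdomains[i - 1][-2] if i > 0 else u[1]
        u[-1] = subdomains[i + 1][1] if i < last else u[-2]
    out = []
    for u in subdomains:                        # independent local updates
        v = u.copy()
        v[1:-1] = u[1:-1] + alpha * (u[2:] - 2 * u[1:-1] + u[:-2])
        out.append(v)
    return out
```

With correct halo exchange the decomposed update matches a single-domain update, and the reflecting-wall scheme conserves the total field sum, which gives an easy correctness check.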
12.
C.-C. Su M.R. Smith F.-A. Kuo J.-S. Wu C.-W. Hsieh K.-C. Tseng 《Journal of computational physics》2012,231(23):7932-7958
In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on clusters of Graphics Processing Units (GPUs) is presented. An all-device (i.e. GPU) computational approach is adopted in which the entire computation is performed on the GPU device, leaving the CPU idle during all stages of the computation, including particle moving, indexing, particle collisions, and state sampling. Communication between the GPU and host is performed only to enable multiple-GPU computation. Results show that the computational expense can be reduced by factors of 15 and 185 when using a single GPU and 16 GPUs, respectively, compared with a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs, as compared with a single GPU, for simulations using 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are employed to demonstrate the capability of the present parallel DSMC method.
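The "indexing" stage the abstract lists, grouping particles by cell before collision partner selection and sampling, is essentially a counting sort. A serial numpy sketch of the data layout (the GPU code would build it with parallel scans instead):

```python
import numpy as np

def index_particles(cell_ids, n_cells):
    """Group particles by cell: return a stable permutation that sorts
    particles by cell id, plus per-cell offsets and counts."""
    counts = np.bincount(cell_ids, minlength=n_cells)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))  # cell offsets
    order = np.argsort(cell_ids, kind='stable')             # particle permutation
    return order, starts, counts
```

Particles of cell `c` are then `order[starts[c] : starts[c] + counts[c]]`, which is exactly what a collision kernel needs.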
13.
For the research and design of the ADS granular-flow target concept, the Institute of Modern Physics, CAS has developed the Monte Carlo simulation software GMT (GPU-accelerated Monte Carlo Transport program). To improve the computational efficiency of GMT, the application and development of MPI within GMT were studied: large-scale random-number generation was distributed across the sub-processes, and rapid file reading and writing replaced the MPI data-communication functions, which greatly improves computational efficiency. Calculations at different scales were performed to study the relationship among the number of processes, speedup, and efficiency, determining the maximum useful number of processes and the process count at which parallel efficiency peaks; this provides a scientific basis for researchers to balance computational resources against computational efficiency. The successful application of MPI in GMT makes full and efficient use of computing resources, greatly improves computational efficiency, resolves the long run times and instability of Monte Carlo large-scale event simulations, and has played an important role in large-scale scanning calculations for the spallation target.
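Distributing independent random streams to sub-processes, as the abstract describes, can be sketched with numpy's `SeedSequence.spawn`; this is a stand-in for whatever generator scheme GMT actually uses:

```python
import numpy as np

def spawn_streams(master_seed, n_procs):
    """Derive one statistically independent random stream per MPI rank
    from a single master seed, reproducibly."""
    root = np.random.SeedSequence(master_seed)
    return [np.random.default_rng(s) for s in root.spawn(n_procs)]
```

Each rank draws only from its own stream, so results are reproducible regardless of how many events each process simulates.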
14.
15.
GPU Acceleration of Numerical Simulations of Shock-Flame Interaction    Cited by: 1 (self-citations: 0, others: 1)
To assess the computing capability of graphics processing units (GPUs) in computational fluid dynamics, a typical compressible reacting flow, the interaction of a shock wave with a flame interface, was simulated with a heterogeneous CPU/GPU parallel approach; the parallelization scheme was optimized, and the influence of grid resolution on both the results and the acceleration was examined. The results show that the GPU simulation agrees with a conventional 8-thread MPI message-passing simulation. The run time of both methods grows linearly with the number of grid cells, but the GPU time is markedly lower than the MPI time. For a small grid (1.6×10⁴ cells), the GPU speedup in average wall time per time step is 8.6; the speedup decreases as the grid is refined, but for a large grid (4.2×10⁶ cells) it still reaches 5.9. The GPU-based heterogeneous parallel algorithm thus offers a good route to high-resolution, large-scale computation of compressible reacting flows.
16.
17.
A Conservative Parallel Scheme for Diffusion Equations    Cited by: 4 (self-citations: 4, others: 0)
In practical radiation-hydrodynamics calculations the diffusion equations dominate the computational cost, so parallel computation is essential. We study efficient parallel methods that are easy to implement on parallel machines and, by means of predictor-corrector and other constructions, build parallel schemes that retain the conservation property of the implicit scheme while preserving the required accuracy and unconditional stability, so as to meet the needs of large-scale numerical solution of radiation-hydrodynamics problems.
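One way such a predictor-corrector construction can stay conservative is to predict each processor-interface flux explicitly from the old time level and then advance every block implicitly using that same shared flux on both sides, so interface contributions cancel in the global sum. The sketch below is a minimal stand-in under those assumptions, not the paper's scheme; all names are illustrative:

```python
import numpy as np

def conservative_parallel_step(u_blocks, dt, dx, kappa=1.0):
    """Predictor-corrector sketch for u_t = kappa u_xx on decomposed blocks:
    explicit interface fluxes, implicit (backward Euler) interior solves."""
    # predictor: one explicit flux per interior interface, computed once
    fluxes = [-kappa * (u_blocks[i + 1][0] - u_blocks[i][-1]) / dx
              for i in range(len(u_blocks) - 1)]
    r = kappa * dt / dx**2
    new_blocks = []
    for i, u in enumerate(u_blocks):
        n = len(u)
        A = np.eye(n) * (1 + 2 * r)          # backward Euler diffusion matrix
        for j in range(n - 1):
            A[j, j + 1] = A[j + 1, j] = -r
        A[0, 0] -= r; A[-1, -1] -= r         # zero-flux closure at block edges
        rhs = u.astype(float).copy()
        if i > 0:                            # shared flux through left interface
            rhs[0] += dt * fluxes[i - 1] / dx
        if i < len(u_blocks) - 1:            # shared flux through right interface
            rhs[-1] -= dt * fluxes[i] / dx
        new_blocks.append(np.linalg.solve(A, rhs))
    return new_blocks
```

Because every row and column of each block matrix sums to one and the interface terms cancel pairwise, the global sum of the solution is conserved exactly.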
18.
A canonical molecular dynamics (MD) simulation was accelerated by an efficient implementation of the multiple-timestep integrator algorithm combined with the periodic fast multipole method (MEFMM) for both Coulombic and van der Waals interactions. Although a significant reduction in computational cost was obtained previously with the integrated method in which the MEFMM was used only for Coulombic interactions (Kawata, M., and Mikami, M., J. Comput. Chem., 2000, in press), extending the method to van der Waals interactions yields a further acceleration of the overall MD calculation by a factor of about two. Compared with conventional methods, such as the velocity-Verlet algorithm combined with the Ewald method (timestep of 0.25 fs), the speedup from the extended integrated method amounts to a factor of 500 for a 100 ps simulation. The extended method therefore substantially reduces the computational effort of large-scale MD simulations.
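A two-level multiple-timestep (RESPA-style) integrator, with the slow force held over the outer step and the fast force integrated in inner velocity-Verlet substeps, can be sketched as follows (1D, unit mass; whether this matches the paper's exact splitting is an assumption):

```python
def respa_step(x, v, f_fast, f_slow, dt, n_inner, mass=1.0):
    """One multiple-timestep step: outer half-kicks from the slow force
    (the part the abstract evaluates with the MEFMM), n_inner
    velocity-Verlet substeps for the fast force."""
    v = v + 0.5 * dt * f_slow(x) / mass          # outer half kick (slow)
    h = dt / n_inner
    for _ in range(n_inner):                     # inner loop: fast force only
        v = v + 0.5 * h * f_fast(x) / mass
        x = x + h * v
        v = v + 0.5 * h * f_fast(x) / mass
    v = v + 0.5 * dt * f_slow(x) / mass          # outer half kick (slow)
    return x, v
```

The payoff is that the expensive slow force is evaluated once per outer step instead of once per inner substep.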
19.
20.
B. J. Block M. Lukáčová-Medvid’ová P. Virnau L. Yelash 《The European physical journal. Special topics》2012,210(1):119-132
The aim of the present paper is to report on our recent results for GPU-accelerated simulations of compressible flows. For the numerical simulation, the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic-based evolution Galerkin operator has been used. For time discretization we have applied the explicit third-order Runge-Kutta method. Evaluation of the genuinely multidimensional evolution operator has been accelerated using the GPU implementation. We have obtained a speedup of up to 30 (in comparison to a single CPU core) for the calculation of the evolution Galerkin operator on a typical discretization mesh consisting of 16384 mesh cells.
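The "explicit third-order Runge-Kutta method" is commonly the strong-stability-preserving Shu–Osher scheme; whether the paper used this exact variant is an assumption, but a minimal step looks like:

```python
def ssp_rk3_step(u, t, dt, rhs):
    """One step of third-order strong-stability-preserving Runge-Kutta
    (Shu-Osher form) for u' = rhs(t, u)."""
    k1 = u + dt * rhs(t, u)                               # forward Euler stage
    k2 = 0.75 * u + 0.25 * (k1 + dt * rhs(t + dt, k1))    # convex combination
    return u / 3.0 + 2.0 / 3.0 * (k2 + dt * rhs(t + 0.5 * dt, k2))
```

Each stage is a convex combination of forward-Euler updates, which is what makes the scheme strong-stability-preserving and popular with discontinuous Galerkin spatial discretizations.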