1.

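GPU acceleration of the Particle-Mesh Ewald (PME) algorithm Total citations: 1 (self-citations: 0, citations by others: 1)

Xu Ji, Ge Wei, Ren Ying, Li Jinghai 《Chinese Journal of Computational Physics》 2010,27(4):548-554

This paper discusses GPU acceleration, in the NVIDIA CUDA development environment, of the long-range electrostatic force calculation in molecular dynamics simulations. The Particle-Mesh Ewald (PME) method is adopted and decomposed into five parts: parameter determination, discretization of point charges onto a grid, Fourier transform of the discrete grid, electrostatic energy calculation, and electrostatic force calculation; the GPU implementation of each part is analyzed in turn. The method has been applied successfully to simulations of seven biomolecular systems of different sizes, achieving a speedup of about 7×. The program can be coupled to existing molecular dynamics software, or serve as part of a GPU molecular dynamics program under further development, significantly accelerating conventional molecular dynamics programs.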
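The grid-discretization step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 1D periodic box and linear (cloud-in-cell) weights instead of the B-spline weights that PME codes normally use, and the function name `spread_charges` and its parameters are hypothetical.

```python
import math

def spread_charges(positions, charges, box_length, n_grid):
    """Spread point charges onto a 1D periodic grid with linear weights.

    Simplifying assumption: linear (cloud-in-cell) weights, not B-splines.
    """
    grid = [0.0] * n_grid
    h = box_length / n_grid                  # grid spacing
    for x, q in zip(positions, charges):
        s = (x % box_length) / h             # position in grid units
        i = int(math.floor(s)) % n_grid      # left grid point
        frac = s - math.floor(s)             # offset toward the right grid point
        grid[i] += q * (1.0 - frac)
        grid[(i + 1) % n_grid] += q * frac   # periodic wrap at the box edge
    return grid

# Two opposite charges in a box of length 4 discretized on 4 grid points.
grid = spread_charges([0.25, 3.9], [1.0, -1.0], box_length=4.0, n_grid=4)
```

Each charge is shared between its two nearest grid points, so the total charge on the grid equals the total particle charge. On a GPU, one thread per particle would need atomic additions because several particles can write to the same grid point.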

2.

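Efficient OpenMP parallelization and GPU-accelerated implementation of tight-binding time-dependent density functional theory

Fan Guohong, Han Keli, He Guozhong 《Chinese Journal of Chemical Physics》 2013,26(6):635-645

Time-dependent density functional theory in the tight-binding approximation is implemented with efficient acceleration on multi-core and GPU systems and applied to excited-state electronic structure calculations for systems of hundreds to thousands of atoms. Sparse matrices and OpenMP parallelization are used to accelerate construction of the Hamiltonian matrix, while the most time-consuming part, the ground-state diagonalization, is accelerated with a double-precision GPU implementation. The GPU-accelerated ground-state calculation reaches a speedup of 8.73× while preserving accuracy. For excited states, a Krylov-subspace iterative algorithm, OpenMP parallelization, and GPU acceleration are used to solve the large TDDFT matrix for its eigenvalues and eigenvectors, greatly reducing the number of iterations and the final solution time. With the matrix-vector products accelerated on the GPU, the Krylov algorithm converges quickly, giving a 206× speedup over a program using the conventional algorithm with CPU parallelization. Calculations on a series of small and large molecular systems show that, compared with the first-principles CIS method and time-dependent density functional methods, the program obtains reasonable and accurate results at very low computational cost.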

3.

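Parallel algorithms for solving elliptic partial differential equations on the GPU

Cao Jianwei, Xu Xiang, Wang Younian 《Chinese Journal of Computational Physics》 2015,32(4):475-481

GPU-based CUDA acceleration is studied for the Jacobi iteration and the DRM algorithm for solving elliptic partial differential equations. The GPU-accelerated algorithms are verified with two test cases on a GTX570 graphics card. The results show that, with accuracy preserved, the Jacobi iteration benefits most from GPU acceleration: in double precision its speedup reaches about 14× with an efficiency of about 53%, while the DRM algorithm reaches at most a 3.8× speedup in double precision with an efficiency of about 15%.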
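To make the Jacobi iteration mentioned here concrete, the following is a minimal serial Python sketch for the model problem -∇²u = f on a uniform 2D grid with fixed boundary values. The model problem, the function names `jacobi_sweep` and `solve`, and the tolerance are assumptions chosen for illustration; the paper's test cases and CUDA kernels are not reproduced.

```python
def jacobi_sweep(u, f, h):
    """One Jacobi sweep for -Laplace(u) = f; boundary values stay fixed."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Every point reads only the previous iterate, so all updates
            # are independent of each other.
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                + h * h * f[i][j])
    return new

def solve(u, f, h, tol=1e-10, max_iter=10000):
    """Iterate until the largest pointwise change drops below tol."""
    for it in range(max_iter):
        new = jacobi_sweep(u, f, h)
        diff = max(abs(new[i][j] - u[i][j])
                   for i in range(len(u)) for j in range(len(u[0])))
        u = new
        if diff < tol:
            return u, it + 1
    return u, max_iter

# Laplace equation (f = 0) with u = 1 on the whole boundary: exact solution is 1.
n = 5
u0 = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0 for j in range(n)]
      for i in range(n)]
f0 = [[0.0] * n for _ in range(n)]
solution, iterations = solve(u0, f0, h=0.25)
```

Because each grid point depends only on values from the previous sweep, the two inner loops map naturally onto one GPU thread per grid point, which is a plausible reason for the method's high GPU efficiency.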

4.

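GPU acceleration of numerical simulation of shock-flame interaction Total citations: 1 (self-citations: 0, citations by others: 1)

Jiang Hua, Dong Gang, Chen Xiao 《Chinese Journal of Computational Physics》 2016,33(1):23-29

To assess the computing capability of graphics processing units (GPUs) in computational fluid dynamics, a CPU/GPU heterogeneous parallel approach is used to simulate a typical compressible reacting flow, the interaction of a shock wave with a flame interface. The parallel scheme is optimized, and the effect of grid resolution on the results and on the acceleration performance is examined. The GPU parallel results are identical to those of a conventional message-passing MPI computation with 8 threads. For both approaches the computing time grows linearly with the number of grid cells, but the GPU time is markedly lower than the MPI time. For a small grid (1.6×10⁴ cells), the speedup in average time per time step is 8.6; the speedup falls somewhat as the grid grows, but still reaches 5.9 for a larger grid (4.2×10⁶ cells). The GPU-based heterogeneous parallel algorithm offers a good route to high-resolution, large-scale computation of compressible reacting flows.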

5.

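NeuDATool: an open-source neutron scattering data analysis software package supporting GPU hardware acceleration and cross-node parallelism on computer clusters

Ma Changli, Cheng He, Zuo Taisen, Jiao Guisheng, Han Zehua, Qin Hong 《Chinese Journal of Chemical Physics》 2020,33(6):727-732

Empirical potential structure refinement (EPSR) is a software package for analyzing neutron scattering data, developed in the 1980s by the disordered materials group at the British spallation neutron source. Its goal is to reconstruct the three-dimensional atomic structure of a sample from neutron scattering data. Over the past decades EPSR has been widely used for neutron scattering data analysis and has given experimental users reliable results. However, EPSR is a Fortran program based on shared-memory parallel computing (OpenMP); it supports neither cross-node parallelism on computer server clusters nor GPU acceleration, which limits its analysis speed. With server clusters now widely deployed and GPU acceleration in common use, rewriting the EPSR program to improve its speed has become necessary. This work uses object-oriented C++ to develop NeuDATool, an open-source package implementing the EPSR algorithm; the software achieves cross-node cluster parallelism and GPU acceleration through MPI and CUDA C. The software was tested with neutron scattering data for liquid water and glassy silica. The tests show that it correctly reconstructs the three-dimensional atomic structure of the samples, and that for simulated systems of more than 100,000 atoms GPU acceleration runs more than 400 times faster than the serial CPU algorithm. NeuDATool offers a new option for neutron users, especially experimental scientists who are familiar with C++ programming and wish to define their own analysis algorithms.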

6.

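A method for accelerating view factor calculation for complex structures using the GPU

《Journal of Engineering Thermophysics》 2017,(6)

Computing the infrared radiation signature of a target with the discrete transfer method first requires the view factors, and for complex targets this calculation is very time-consuming. To handle occlusion testing in the view factor calculation, a vector method is proposed that does not require solving for the intersection between the occluding surface and the line connecting the centers of two surface elements. A GPU parallel program based on the vector method was developed, and its correctness was verified on a model with an analytical solution. Finally, for a complex model of an aero-engine exhaust system, the speedup of the vector-method GPU program over a single-core CPU program was evaluated on a workstation with a 448-core GPU and a 3.4 GHz CPU. The results show that with 13670 surface elements the vector method achieves a speedup of about 73×, which is 1.4 times the speedup obtained with the line-surface intersection method. When the infrared radiation signature of the exhaust system is computed with the discrete transfer method, running only the view factor module on the GPU reduces the computing time by 45%.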

7.

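Effect of surfaces on <100> interstitial dislocation loops in iron

Liang Jinjie, Gao Ning, Li Yuhong 《Acta Physica Sinica》 2020,(3):241-248

During irradiation damage, the formation and dynamics of interstitial dislocation loops strongly affect the service behavior of materials under irradiation. In the commonly used alloys based on body-centered cubic iron, 1/2<111> and <100> are the two main types of dislocation loop, and their effect on irradiation damage has long been a focus of nuclear materials research. Earlier work studied in depth the interaction between the {111} surface and a single 1/2<111> loop and found that the surface does have an important effect on loop properties. Here molecular dynamics is used to study in detail, at the atomic scale, how another important surface, the iron {100} surface, affects the dynamics of <100> interstitial dislocation loops. The simulations show that the relation between the loop's Burgers vector and the surface normal, the depth below the surface, the interaction between loops, and the temperature all strongly affect the loop-surface interaction; the evolution of the Burgers vector under the influence of the surface and the one-dimensional motion of <100> loops during this process are observed for the first time. Based on these results, the effect of <100> loops on the irradiation damage structure of the surface is examined in detail and their contribution to the concave-convex structure of the surface is given. These results offer a possible explanation for the evolution of material surfaces during irradiation.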

8.

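Efficient GPU parallel computing for direct simulation Monte Carlo of gas dynamics

He Yongxiang, Liu Xin, Zhao Haibo 《Chinese Journal of Computational Physics》 2015,(2):169-176

A parallel direct simulation Monte Carlo (DSMC) algorithm based on the Compute Unified Device Architecture (CUDA) is implemented, and the existing parallel algorithm for data transfer between multiple graphics processing units (GPUs) is improved. Two-dimensional Couette flow and two-dimensional lid-driven cavity flow are simulated, and the results and computing times of CPU, single-GPU, and multi-GPU computations are compared quantitatively. Single-GPU computation is 10 to 30 times faster than CPU computation, dual-GPU computation is 40 to 60 times faster, and the multi-GPU parallel efficiency approaches 100%, with accuracy well preserved.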

9.

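Molecular dynamics simulation of electron-irradiation-induced structural evolution in sodium borosilicate glass

Yuan Wei, Peng Haibo, Du Xin, Lü Peng, Shen Yanghao, Zhao Yan, Chen Liang, Wang Tieshan 《Acta Physica Sinica》 2017,66(10):106102-106102

Sodium borosilicate glass is a candidate material for vitrified high-level radioactive waste forms, and many experiments have studied the effects of electron or heavy-ion irradiation on this kind of glass. Theoretical and simulation work, however, is scarce and so far concentrates on heavy-ion irradiation; simulations of electron irradiation have not yet been reported. This paper proposes a new molecular dynamics method for simulating the structural evolution of glass induced by electron irradiation. The method rests on an experimental observation: Raman results have confirmed that molecular oxygen is present in glass after high-dose electron irradiation. Because this molecular oxygen does not interact with other particles, high-dose electron irradiation can be mimicked by progressively removing a number of oxygen atoms from the system, which yields the structure of the glass after irradiation. The simulations show that as more oxygen atoms are removed, the mean Si-O-Si bond angle decreases gradually; the number of small rings increases as oxygen is depleted; some [BO4] units convert to [BO3] units, and this conversion eventually saturates; and after a large amount of oxygen has been removed, the sodium shows clear phase separation. These structural features agree well with the structural changes observed experimentally in electron-irradiated borosilicate glass. The proposed method may therefore offer a new approach to simulating electron irradiation effects in borosilicate glass by molecular dynamics.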

10.

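Damage mechanism of HgCdTe semiconductor material irradiated by excimer laser Total citations: 2 (self-citations: 0, citations by others: 2)

Qi Shuming, Chen Chuansong, Zhou Xinling, Guo Juan, Wang Juan, Man Baoyuan 《Journal of Quantum Optics》 2009,15(1):76-83

The surface of HgCdTe wafers irradiated by a high-intensity pulsed 248 nm excimer laser was examined by optical microscopy and scanning electron microscopy, revealing phenomena very different from those seen when HgCdTe wafers are irradiated by infrared lasers. The study shows that damage to HgCdTe by an infrared 1064 nm laser is mainly photothermal, whereas damage by the ultraviolet 248 nm excimer laser involves both photochemical and photothermal effects. The mechanical damage the excimer laser causes in the crystal is analyzed, and the mechanism by which ripples form in the irradiated region is discussed; the laser-driven acoustic wave model is found to explain the ripples on the HgCdTe crystal surface better than the optical model and the thermal-conduction wave model.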

11.

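Shortest path computation in a dual parallel environment

Sun Yuqiang, Li Yinyin, Gu Yuwan 《Journal of Applied Acoustics》 2017,25(3):195-196, 230

Parallel computing and the shortest path problem have become active research topics. Traditional shortest path algorithms can no longer meet the processing demands of explosive data growth; for large networks in particular, the computing time and storage required grow substantially. The MapReduce model offers a new way to solve the shortest path problem, and GPUs provide strong parallel computing capability and memory bandwidth, a clear advantage over CPUs. By analyzing the MapReduce model and the GPU execution process, the paper identifies the problems of shortest path parallelization based on MapReduce alone, which degrade system performance. Its contribution is to combine MapReduce and the GPU into a dual parallel model that preprocesses data in parallel and adds a dynamic data processor to reduce the data transfer and synchronization overhead in shortest path computation. Experiments compare the methods by average speedup, a performance metric for parallel algorithms, and show that shortest path computation in the dual parallel environment achieves a higher speedup.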
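As a rough illustration of how shortest paths can be phrased in map and reduce steps, the sketch below runs an in-memory, Bellman-Ford-style relaxation in plain Python. It is only an assumption about the general pattern: the paper's GPU stage and its dynamic data processor are not modeled, and the names `map_phase`, `reduce_phase`, and `shortest_paths` are hypothetical.

```python
def map_phase(graph, dist):
    """Emit (node, candidate distance) for each known node and its out-edges."""
    for u, d in dist.items():
        yield u, d                      # keep the current best distance
        for v, w in graph.get(u, []):
            yield v, d + w              # candidate distance through u

def reduce_phase(pairs):
    """Keep the minimum candidate distance per node."""
    best = {}
    for node, d in pairs:
        if node not in best or d < best[node]:
            best[node] = d
    return best

def shortest_paths(graph, source):
    """Repeat map and reduce rounds until no distance changes."""
    dist = {source: 0}
    while True:
        new = reduce_phase(map_phase(graph, dist))
        if new == dist:
            return dist
        dist = new

# Small directed graph with non-negative edge weights.
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances = shortest_paths(graph, "a")
```

Every round passes the whole distance table between the map and reduce steps. That per-round data movement and synchronization is the kind of overhead the abstract says a MapReduce-only approach suffers from.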

12.

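GPU acceleration of porous media flow simulation based on the lattice Boltzmann method Total citations: 1 (self-citations: 0, citations by others: 1)

Zhu Lianhua, Guo Zhaoli 《Chinese Journal of Computational Physics》 2015,32(1):20-26

Using the NVIDIA CUDA platform, pore-scale simulation of flow in porous media with the lattice Boltzmann method is accelerated on the GPU in combination with a sparse storage algorithm, and the performance of this algorithm is tested against the basic algorithm. The performance of the algorithm is compared on different GPUs for the LBGK and MRT collision models in single and double precision. The tests show that on the GPU the sparse storage algorithm greatly increases computing speed and saves device memory relative to the basic algorithm, and reaches a speedup of two orders of magnitude over a serial CPU program. On a GPU with a newer architecture, the MRT and LBGK collision models run at the same speed in both single and double precision, whereas on an older-generation GPU the precision has a considerable effect on the speed of the MRT collision model.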
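One common way to realize sparse storage for lattice Boltzmann simulations in porous media is to number only the fluid (pore) nodes and keep a neighbor table for the streaming step. The Python sketch below shows that idea for a D2Q9 lattice. It is an assumption about the general technique, not the data layout used in the paper, and `build_sparse_lattice` is a hypothetical name.

```python
# D2Q9 lattice velocities: rest, four axis directions, four diagonals.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]

def build_sparse_lattice(is_fluid):
    """Number only fluid nodes and build a neighbor table for streaming.

    is_fluid[y][x] is True for pore (fluid) nodes and False for solid nodes.
    neighbors[k][d] is the compact index of the node reached from fluid
    node k along direction d, or -1 if that node is solid.
    Periodic wrap at the domain edges is assumed.
    """
    ny, nx = len(is_fluid), len(is_fluid[0])
    index_of = {}
    for y in range(ny):
        for x in range(nx):
            if is_fluid[y][x]:
                index_of[(x, y)] = len(index_of)
    neighbors = [None] * len(index_of)
    for (x, y), k in index_of.items():
        neighbors[k] = [index_of.get(((x + cx) % nx, (y + cy) % ny), -1)
                        for cx, cy in VELOCITIES]
    return index_of, neighbors

# A single horizontal channel: 3 fluid nodes out of 9 lattice nodes.
geometry = [[False, False, False],
            [True,  True,  True],
            [False, False, False]]
index_of, neighbors = build_sparse_lattice(geometry)
```

Distribution functions are then stored only for the fluid nodes, here 3 entries per direction instead of 9, so memory use scales with the porosity rather than with the full bounding box.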

13.

Accelerated GPU simulation of compressible flow by the discontinuous evolution Galerkin method

B. J. Block, M. Lukáčová-Medvid’ová, P. Virnau, L. Yelash 《The European physical journal. Special topics》 2012,210(1):119-132

The aim of the present paper is to report on our recent results for GPU accelerated simulations of compressible flows. For numerical simulation the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic based evolution Galerkin operator has been used. For time discretization we have applied the explicit third order Runge-Kutta method. Evaluation of the genuinely multidimensional evolution operator has been accelerated using the GPU implementation. We have obtained a speedup up to 30 (in comparison to a single CPU core) for the calculation of the evolution Galerkin operator on a typical discretization mesh consisting of 16384 mesh cells.

14.

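GPU parallel computing for dissipative particle dynamics Total citations: 1 (self-citations: 0, citations by others: 1)

Lin Chensen, Chen Shuo, Li Qiliang, Yang Zhigang 《Acta Physica Sinica》 2014,63(10):104702-104702

The implementation of dissipative particle dynamics (DPD) with graphics processing unit (GPU) parallel computing based on the Compute Unified Device Architecture is studied. The algorithm mapping model, parallel updating of the cell-list arrays, random number generation, memory access optimization, and load balancing are discussed in detail. Poiseuille flow and flow through a sudden expansion and contraction are then simulated to verify the correctness of the GPU results. The results show that GPU parallel computing in dissipative particle dynamics achieves a speedup of about 20× over serial computation on the central processing unit.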
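The cell-list method mentioned in this abstract can be sketched serially as follows. This is a minimal 2D Python illustration with a periodic square box, not the parallel array update described in the paper; the function names and the example parameters are hypothetical.

```python
def build_cell_list(positions, box, cutoff):
    """Bin 2D particles into square cells of side >= cutoff (periodic box)."""
    n_cells = max(1, int(box // cutoff))        # cells per dimension
    size = box / n_cells
    cells = {}
    for idx, (x, y) in enumerate(positions):
        key = (int((x % box) / size) % n_cells,
               int((y % box) / size) % n_cells)
        cells.setdefault(key, []).append(idx)
    return cells, n_cells

def neighbor_pairs(positions, box, cutoff):
    """Return all pairs (i, j), i < j, closer than cutoff (minimum image)."""
    cells, n = build_cell_list(positions, box, cutoff)
    pairs = set()
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):                   # scan the 3 x 3 block of cells
            for dy in (-1, 0, 1):
                other = cells.get(((cx + dx) % n, (cy + dy) % n), [])
                for i in members:
                    for j in other:
                        if i < j:
                            ddx = positions[i][0] - positions[j][0]
                            ddy = positions[i][1] - positions[j][1]
                            ddx -= box * round(ddx / box)   # minimum image
                            ddy -= box * round(ddy / box)
                            if ddx * ddx + ddy * ddy < cutoff * cutoff:
                                pairs.add((i, j))
    return sorted(pairs)

positions = [(0.1, 0.1), (0.4, 0.1), (9.9, 0.1), (5.0, 5.0)]
pairs = neighbor_pairs(positions, box=10.0, cutoff=1.0)
```

Only particles in the same or adjacent cells are tested, so the cost grows roughly linearly with particle number instead of quadratically. Particle 2 is paired with particles 0 and 1 through the periodic boundary.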

15.

Development of a GPU-based high-performance radiative transfer model for the Infrared Atmospheric Sounding Interferometer (IASI)

Bormin Huang, Jarno Mielikainen, Hyunjong Oh, Hung-Lung Allen Huang 《Journal of computational physics》 2011,230(6):2207-2221

Satellite-observed radiance is a nonlinear functional of surface properties and atmospheric temperature and absorbing gas profiles as described by the radiative transfer equation (RTE). In the era of hyperspectral sounders with thousands of high-resolution channels, the computation of the radiative transfer model becomes more time-consuming. The radiative transfer model performance in operational numerical weather prediction systems still limits the number of channels we can use in hyperspectral sounders to only a few hundred. To take full advantage of such high-resolution infrared observations, a computationally efficient radiative transfer model is needed to facilitate satellite data assimilation. In recent years the programmable commodity graphics processing unit (GPU) has evolved into a highly parallel, multi-threaded, many-core processor with tremendous computational speed and very high memory bandwidth. The radiative transfer model is very suitable for GPU implementation, taking advantage of the hardware’s efficiency and parallelism, because radiances of many channels can be calculated in parallel on GPUs. In this paper, we develop a GPU-based high-performance radiative transfer model for the Infrared Atmospheric Sounding Interferometer (IASI) launched in 2006 onboard the first European meteorological polar-orbiting satellite, METOP-A. Each IASI spectrum has 8461 spectral channels. The IASI radiative transfer model consists of three modules. The first module for computing the regression predictors takes less than 0.004% of CPU time, while the second module for transmittance computation and the third module for radiance computation take approximately 92.5% and 7.5%, respectively. Our GPU-based IASI radiative transfer model is developed to run on a low-cost personal supercomputer with four GPUs with a total of 960 compute cores, delivering near 4 TFlops theoretical peak performance.
By massively parallelizing the second and third modules, we reached 364× speedup for 1 GPU and 1455× speedup for all 4 GPUs, both with respect to the original CPU-based single-threaded Fortran code with the -O2 compiling optimization. The significant 1455× speedup using a computer with four GPUs means that the proposed GPU-based high-performance forward model is able to compute one day’s amount of 1,296,000 IASI spectra within nearly 10 min, whereas the original single CPU-based version will impractically take more than 10 days. This model runs over 80% of the theoretical memory bandwidth with asynchronous data transfer. A novel CPU–GPU pipeline implementation of the IASI radiative transfer model is proposed. The GPU-based high-performance IASI radiative transfer model is suitable for the assimilation of the IASI radiance observations into the operational numerical weather forecast model.
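The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the abstract, treating "nearly 10 min" as exactly 10 minutes, which is an assumption; the variable names are illustrative.

```python
# Figures quoted in the abstract.
speedup_1gpu = 364.0
speedup_4gpu = 1455.0
n_gpus = 4
spectra_per_day = 1_296_000
gpu_minutes = 10.0            # "nearly 10 min", taken here as exactly 10

# Multi-GPU scaling efficiency relative to a single GPU.
scaling_efficiency = speedup_4gpu / (n_gpus * speedup_1gpu)

# CPU time implied by the 4-GPU speedup.
cpu_days_implied = gpu_minutes * speedup_4gpu / (60 * 24)

# Throughput of the 4-GPU system.
spectra_per_second = spectra_per_day / (gpu_minutes * 60)

print(f"scaling efficiency: {scaling_efficiency:.4f}")
print(f"implied CPU time: {cpu_days_implied:.1f} days")
print(f"throughput: {spectra_per_second:.0f} spectra/s")
```

Under that assumption the four GPUs scale at about 99.9% efficiency, the implied single-CPU time is roughly 10.1 days, consistent with the "more than 10 days" stated, and the throughput is 2160 spectra per second.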

16.

High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster

Dimitri Komatitsch, Gordon Erlebacher, Dominik Göddeke, David Michéa 《Journal of computational physics》 2010,229(20):7692-7714

We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting for instance from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours is implemented successfully in single precision, maximizing the performance of current generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages in order to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and, depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
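The mesh coloring idea can be sketched in Python as a greedy coloring in which elements that share a node never receive the same color. The greedy strategy, the function names, and the 1D example mesh are assumptions for illustration; the abstract does not say which coloring algorithm the authors use.

```python
def color_elements(elements):
    """Greedy coloring: elements sharing any node get different colors."""
    node_to_elems = {}
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems.setdefault(n, []).append(e)
    colors = [-1] * len(elements)
    for e, nodes in enumerate(elements):
        used = {colors[o] for n in nodes for o in node_to_elems[n]
                if colors[o] >= 0}
        c = 0
        while c in used:                 # smallest color not used by a neighbor
            c += 1
        colors[e] = c
    return colors

def assemble(elements, contributions, n_nodes):
    """Sum per-element nodal contributions one color at a time."""
    colors = color_elements(elements)
    total = [0.0] * n_nodes
    for c in range(max(colors) + 1):
        # Elements of one color touch disjoint nodes, so on a GPU this
        # loop could run in parallel without atomic updates.
        for e, nodes in enumerate(elements):
            if colors[e] == c:
                for local, n in enumerate(nodes):
                    total[n] += contributions[e][local]
    return total, colors

# A chain of four two-node elements on five nodes.
elements = [(0, 1), (1, 2), (2, 3), (3, 4)]
contributions = [[1.0, 1.0]] * 4
total, colors = assemble(elements, contributions, n_nodes=5)
```

In this chain the elements alternate between two colors, and each interior node receives contributions from two elements of different colors, so no two simultaneous writes would target the same node.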

17.

Lattice Boltzmann simulations on GPUs with ESPResSo

D. Roehm, A. Arnold 《The European physical journal. Special topics》 2012,210(1):89-100

For the dynamics of macromolecules in solution, hydrodynamic interactions mediated by the solvent molecules often play an important role, although one is not interested in the dynamics of the solvent itself. In computer simulations one can therefore save a large amount of computer time by replacing the solvent with a lattice fluid. The macromolecules are propagated by Molecular Dynamics (MD), while the fluid is governed by the fluctuating Lattice-Boltzmann (LB) equation. We present a fluctuating LB implementation for a single graphics card (GPU) coupled to an MD simulation running on conventional processors (CPUs). Particular emphasis lies on the optimization of the combined code. In our implementation, the LB update is performed in parallel with the force calculation on the CPU, which often completely hides the additional computational cost of the LB. Compared to our parallel LB implementation on a conventional quad-core CPU, the GPU LB is 50 times faster, and we show that a whole commodity cluster with Infiniband interconnect cannot outperform a single GPU in strong scaling. The presented code is part of the open source simulation package ESPResSo.

18.

GPU accelerated simulations of bluff body flows using vortex particle methods

Diego Rossinelli, Michael Bergdorf, Georges-Henri Cottet, Petros Koumoutsakos 《Journal of computational physics》 2010,229(9):3316-3333

We present a GPU accelerated solver for simulations of bluff body flows in 2D using a remeshed vortex particle method and the vorticity formulation of the Brinkman penalization technique to enforce boundary conditions. The efficiency of the method relies on fast and accurate particle-grid interpolations on GPUs for the remeshing of the particles and the computation of the field operators. The GPU implementation uses OpenGL so as to perform efficient particle-grid operations and a CUFFT-based solver for the Poisson equation with unbounded boundary conditions. The accuracy and performance of the GPU simulations and their relative advantages/drawbacks over CPU based computations are reported in simulations of flows past an impulsively started circular cylinder at Reynolds numbers between 40 and 9500. The results indicate up to two orders of magnitude speed up of the GPU implementation over the respective CPU implementations. The accuracy of the GPU computations depends on the Re number of the flow. For Re up to 1000 there is little difference between GPU and CPU calculations but this agreement deteriorates (albeit remaining to within 5% in drag calculations) for higher Re numbers as the single precision of the GPU adversely affects the accuracy of the simulations.