
1.

常珊 孔韧 李春华 陈慰祖 王存新 《计算物理》2008,25(2):241-246

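Two different parallel programming approaches based on the Message Passing Interface (MPI) were used to modify the Autodock program. The modified programs were applied to the docking system of HIV-1 protease and the small-molecule inhibitor XK263, and their speedup and parallel efficiency were measured. The results show that both parallel versions of Autodock complete the calculation well, and that the Scheme II parallel program achieves the higher speedup and parallel efficiency.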

2.

Parallel design of MC programs and measures for improving speedup (total citations: 4; self-citations: 0; citations by others: 4)

邓力 谢仲生 黄正丰 许海燕 《计算物理》2001,18(2):177-180

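The parallel design of an MC (Monte Carlo) program involves the algorithm and the partitioning into modules, which directly determine the parallel efficiency. The coupled neutron-γ transport Monte Carlo code MCNP was parallelized under both PVM and MPI. Thanks to its modular design, the parallel efficiency is excellent: the speedups of both the PVM and MPI versions grow linearly with the number of processors. Compared with PVM, MPI is more adaptable, and in most cases its efficiency is higher than that of PVM. The results of the parallel MCNP code are reliable, and the MPI version reaches parallel efficiencies of 99%, 97%, and 89% on 16, 32, and 64 processors, respectively.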
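The efficiencies quoted in this abstract can be related to speedup through the usual definition E = S / p (speedup divided by processor count). The sketch below assumes that definition applies to the reported figures; the function name `speedup_from_efficiency` is illustrative and does not come from the paper.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Return the speedup S implied by parallel efficiency E = S / p."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return efficiency * processors


# Efficiencies reported in the abstract for the MPI version of MCNP.
reported = {16: 0.99, 32: 0.97, 64: 0.89}

for p, e in reported.items():
    s = speedup_from_efficiency(e, p)
    print(f"{p:3d} processors: efficiency {e:.0%}, implied speedup {s:.2f}")
```

Under this definition, an efficiency that falls as processors are added still corresponds to a growing speedup, which is consistent with the near-linear growth the abstract describes.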

3.

A three-dimensional parallel smoothed particle hydrodynamics code

王裴 洪滔 《计算物理》2006,23(4):431-435

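This paper presents CSPH3D, a three-dimensional parallel smoothed particle hydrodynamics code developed on a message-passing parallel programming platform. It covers the numerical scheme, the parallelization strategy, the logic of the parallel program, and a technique for accelerating the neighbor-particle search. Simulations of three-dimensional micro-ejection and oblique penetration show that CSPH3D handles such problems well and with high parallel efficiency. For a micro-ejection case with 1,527,402 particles and an oblique penetration case with 1,454,225 particles, the parallel efficiency reaches 80% on 100 processors.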

4.

CFD on parallel platforms, Part I: establishing the standard (total citations: 1; self-citations: 0; citations by others: 1)

吴淞涛 徐纲 蒋康涛 黄伟光 袁新 《工程热物理学报》2001,22(3):307-309

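To address the main problems of applying distributed network parallel techniques to computational fluid dynamics (CFD), a "general CFD interface standard for parallel platforms" is proposed. The standard simplifies the conversion of conventional CFD programs into parallel programs, so that different CFD core programs can be parallelized conveniently and quickly and applied to different test cases. It also improves the portability of parallel CFD programs: the CFD core layer and the parallel boundary layer can be used on different parallel platforms without any modification. Based on this standard, a WINSOCK parallel platform for Windows NT was developed and a PVM parallel platform for Linux was set up.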

5.

Parallel plotting techniques for tidal analysis support software in a multi-core environment

单维锋 陈福明 李军 《应用声学》2017,25(5):140-142

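To make the Baytap-G tidal analysis software easier for researchers to use, a support tool was designed in C#.NET. It wraps Baytap-G, automatically converts the input data format, automatically extracts the output data, and visualizes water level, amplitude, and phase data. After introducing the Task parallel programming model in C#.NET, the paper discusses in detail the task decomposition for plotting amplitude and phase charts, the design considerations for parallelization, and the implementation. Experiments show that a well-designed parallel program can make full use of the computing resources of a multi-core machine and improve run-time efficiency, but that too many tasks or an uneven workload usually degrade parallel efficiency.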

6.

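Parallelization of the two-dimensional multigroup radiation transport code LARED-R-1 (total citations: 3; self-citations: 2; citations by others: 1)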

张爱清 莫则尧 《计算物理》2007,24(2):146-152

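A directed graph is used to describe the data dependencies, and an existing parallel pipelined flux sweeping algorithm is applied to parallelize LARED-R-1, a two-dimensional radiation transport code on nonconforming grids. Message buffering is also used to improve the performance of the parallel program. In tests on a typical problem size (100 groups, 3800 mesh cells, 40 directions), the parallel program achieves parallel efficiencies of 80% and 53% on 64 and 128 processors of a parallel machine, respectively.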

7.

Heterogeneous parallel optimization of the density matrix renormalization group

陈富州 程晨 罗洪刚 《物理学报》2019,68(12):120202-120202

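The density matrix renormalization group (DMRG) method achieves high accuracy for the ground state of one-dimensional strongly correlated lattice models, but reaching similar accuracy for two-dimensional or quasi-two-dimensional problems usually requires considerable computation and storage. This paper proposes a new heterogeneous parallel strategy for DMRG that exploits the computing power of both the central processing unit (CPU) and the graphics processing unit (GPU). For the most time-consuming part, the diagonalization of the Hamiltonian, the data are stored in a distributed manner and a load-balancing strategy between CPU and GPU is given. Taking the fermionic Hubbard model as an example, the heterogeneous parallel program is tested for different numbers of DMRG kept states, and the corresponding performance benchmarks are reported. When applied to a four-leg ladder, the charge density stripes commonly seen in high-temperature superconductors are observed; in this case the number of kept states reaches 10⁴ and less than 12 GB of GPU memory is used.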

8.

A parallel solver design for the unsteady N-S equations (total citations: 1; self-citations: 0; citations by others: 1)

李雪松 徐建中 《工程热物理学报》2008,29(1)

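Parallel computing has become a practical and effective way to handle unsteady calculations and the ever-growing computational cost in computational fluid dynamics (CFD). The paper first studies a parallel domain decomposition strategy that is simple and efficient but requires a compatible algorithm. For this purpose it adopts DP-LUR, an implicit method that is fully compatible with parallel execution. Using dual time stepping, the DP-LUR method is extended to unsteady calculations without changing its original properties. Finally, the main difficulties in parallel programming are analyzed and a solution is proposed, namely using intermediate data to separate node indexing from processing, and the overall structure of the parallel program is given.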

9.

Development and testing of a parallel regional atmospheric model program based on the JASMIN framework

徐幼平 程煜峰 王斌 郭红 普业 程锐 《计算物理》2017,34(1):47-60

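Taking an independently developed regional mesoscale rainstorm atmospheric model as the subject, a component-based, hierarchical, large-scale, efficient parallel program for the regional atmospheric model is built on the JASMIN parallel programming framework. Its correctness, parallel performance, and high-resolution simulation quality are verified on typical weather cases. The results show that the new JASMIN-based program agrees well with the original serial model: it retains the good forecast skill of the original model, significantly improves large-scale parallel performance and scalability, and yields better forecasts when the model resolution is further increased.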

10.

An FFT parallel solver based on the JASMIN framework and its application

郭红 曹小林 胡晓燕 《计算物理》2011,28(4):475-480

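Parallel applications that use the FFTW (Fastest Fourier Transform in the West) parallel package face several problems: the problem size is hard to scale, the data structures must change substantially, communication between different data structures is difficult to implement, and the interface is not fixed. To address these, an FFT parallel solver is designed and implemented within the JASMIN framework. The solver encapsulates parallel computing details such as distributed data storage and data communication. It computes multidimensional FFTs in parallel by redistributing the stored data and calling one-dimensional FFTs, and it provides a standard interface so that users can easily perform parallel FFTs. Numerical tests show that the solver has good parallel performance. It has been applied in a parallel program for the numerical simulation of laser-plasma filamentation instability, where its parallel efficiency exceeds 80% on 2048 processors.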
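The idea of building a multidimensional transform from one-dimensional transforms plus data redistribution can be illustrated serially. The sketch below is not the JASMIN solver and does not use FFTW: it uses a naive O(n²) DFT in place of a fast transform, and a plain `transpose` stands in for the redistribution step that a distributed code would perform with communication. All function names are hypothetical.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(matrix):
    """Swap rows and columns; a stand-in for redistributing data between processes."""
    return [list(column) for column in zip(*matrix)]


def dft_2d_by_rows(matrix):
    """2-D DFT built only from 1-D transforms along rows plus two transposes."""
    stage1 = [dft_1d(row) for row in matrix]             # transform along each row
    stage2 = [dft_1d(row) for row in transpose(stage1)]  # transform along each column
    return transpose(stage2)                             # restore the original layout


def dft_2d_direct(matrix):
    """Reference 2-D DFT taken straight from the definition, used only for checking."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                for r in range(rows)
                for c in range(cols)
            )
            for v in range(cols)
        ]
        for u in range(rows)
    ]
```

In a distributed setting each process would own a block of rows, so every call to `dft_1d` is local and only the transposes require communication.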

11.

Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project

Andrade X Alberdi-Rodriguez J Strubbe DA Oliveira MJ Nogueira F Castro A Muguerza J Arruabarrena A Louie SG Aspuru-Guzik A Rubio A Marques MA 《J Phys Condens Matter》2012,24(23):233202

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

12.

An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials

Zahra Shojaaee M. Reza Shaebani Lothar Brendel János Török Dietrich E. Wolf 《Journal of computational physics》2012,231(2):612-628

A fully parallel version of the contact dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme to calculate the contact forces is left domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations required for convergence by the partially parallel updates at the domain boundaries becomes negligible with increasing number of particles, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on simulation results.

13.

Parallel finite element simulations of incompressible viscous fluid flow by domain decomposition with Lagrange multipliers

Christian A. Rivera Mourad Heniche Roland Glowinski Philippe A. Tanguy 《Journal of computational physics》2010,229(13):5123-5143

A parallel approach to solve three-dimensional viscous incompressible fluid flow problems using discontinuous pressure finite elements and a Lagrange multiplier technique is presented. The strategy is based on non-overlapping domain decomposition methods, and Lagrange multipliers are used to enforce continuity at the boundaries between subdomains. The novelty of the work is the coupled approach for solving the velocity–pressure–Lagrange multiplier algebraic system of the discrete Navier–Stokes equations by a distributed-memory parallel ILU(0)-preconditioned Krylov method. A penalty function on the interface constraint equations is introduced to avoid the failure of the ILU factorization algorithm. To ensure portability of the code, a message-passing distributed-memory model with MPI is employed. The method has been tested on different benchmark cases such as the lid-driven cavity and pipe flow with unstructured tetrahedral grids. It is found that the partition algorithm and the order of the physical variables are central to parallelization performance. A speed-up in the range of 5–13 is obtained with 16 processors. Finally, the algorithm is tested on an industrial case using up to 128 processors. Compared with the literature, the speed-ups obtained on distributed- and shared-memory computers are found to be very competitive.

14.

Comparative study of scattering in FXRMC and MCNP4B for flash radiography

刘军 刘进 施将君 李必勇 刘瑞根 管永红 肖智强 《强激光与粒子束》2006,18(6):1014-1018

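Scattering is an important issue in high-energy radiographic imaging, and Monte Carlo simulation is an important means of determining how scattering affects the extraction of object information. The features and tally methods of the FXRMC and MCNP4B codes are briefly introduced, and comparative calculations are carried out for different radiographic models under identical input parameters. The results show that the scattered exposures computed by the two codes differ by less than 5% in relative terms, indicating good agreement. Comparison with experiment shows that the scattering distributions simulated by both codes are broadly consistent with the measurements, so both can be used for simulation studies of high-energy flash radiography. Some suggestions on scattering verification are also given.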

15.

A new electromagnetic particle-in-cell model with adaptive mesh refinement for high-performance parallel computation

Keizo Fujimoto 《Journal of computational physics》2011,230(23):8508-8526

A new electromagnetic particle-in-cell (EMPIC) model with adaptive mesh refinement (AMR) has been developed to achieve high-performance parallel computation on distributed-memory systems. To minimize the amount and frequency of inter-processor communication, the present study uses the staggering grid scheme with the charge conservation method, which consists only of local operations. However, the scheme provides no numerical damping for electromagnetic waves regardless of the wavenumber, which results in significant noise in the refinement region that eventually masks the physical signals. To suppress the electromagnetic noise, the present study introduces a smoothing method that applies numerical damping preferentially to short-wavelength modes. The test simulations show that even weak smoothing drastically reduces the noise, so that AMR can be implemented in the staggering grid scheme. The computational load balance among the processors is maintained by a new method, termed the adaptive block technique, for the domain decomposition parallelization. The adaptive block technique controls the subdomain (block) structure dynamically as the system evolves, such that all the blocks have almost the same number of particles. The performance of the present code is evaluated for simulations of current sheet evolution. The test simulations demonstrate that the use of the adaptive block technique together with the staggering grid scheme significantly enhances the parallel efficiency of the AMR-EMPIC model.

16.

Parallelization of analytical Hartree-Fock and density functional theory Hessian calculations. Part I: parallelization of coupled-perturbed Hartree-Fock equations

PRAKASHAN P. KORAMBATH JING KONG THOMAS R. FURLANI MARTIN HEAD-GORDON 《Molecular physics》2013,111(11):1755-1761

Solving the coupled-perturbed Hartree-Fock (CPHF) equations is the most time consuming part in the analytical computation of second derivatives of the molecular energy with respect to the nuclei. This paper describes a unique parallelization approach for solving the CPHF equations. The computational load is divided by the nuclear perturbations and distributed evenly among the computing nodes. The parallel algorithm is scalable with respect to the size of the molecule, i.e. the larger the molecule, the greater the parallel speedup. The memory storage requirements are also distributed among the processors, with little communication among the processors. The method is implemented in the Q-Chem software package and its performance is discussed. This work represents the first step in a research project to parallelize analytical frequency calculations at Hartree-Fock and density functional theory levels.
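Dividing the work by nuclear perturbation and spreading it evenly over nodes can be pictured as a simple block partition. The sketch below is not the Q-Chem implementation; it assumes three Cartesian perturbations per atom and a static assignment in which block sizes differ by at most one. The function name `partition_perturbations` is hypothetical.

```python
def partition_perturbations(n_atoms: int, n_nodes: int) -> list:
    """Split the 3 * n_atoms perturbation indices into n_nodes nearly equal blocks."""
    if n_atoms < 0:
        raise ValueError("n_atoms must be non-negative")
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    total = 3 * n_atoms                      # x, y, z displacement for every atom
    base, extra = divmod(total, n_nodes)     # the first `extra` nodes get one more
    blocks = []
    start = 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks


# Example: 5 atoms (15 perturbations) shared among 4 nodes.
for node, block in enumerate(partition_perturbations(5, 4)):
    print(f"node {node}: perturbations {block}")
```

With more atoms there are more perturbations to share out, which fits the abstract's remark that larger molecules give greater parallel speedup.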

17.

Recent radiation damage studies and developments of the Marlowe code

C.J. Ortiz A. Souidi C.S. Becquart C. Domain 《Radiation Effects and Defects in Solids》2013,168(7):592-602

Radiation damage in materials relevant to applications evolves over time scales spanning from the femtosecond (the characteristic time for an atomic collision) to decades (the aging time expected for nuclear materials). The relevant kinetic energies of atoms span from thermal motion to the MeV range. The question motivating this contribution is to identify the relationship between elementary atomic displacements triggered by irradiation and the subsequent microstructural evolution of metals in the long term. The Marlowe code, based on the binary collision approximation (BCA), is used to simulate the sequences of atomic displacements generated by energetic primary recoils, and the Object Kinetic Monte Carlo code LAKIMOCA, parameterized on a range of ab initio calculations, is used to predict the subsequent long-term evolution of point defects and clusters thereof. In agreement with full Molecular Dynamics, BCA displacement cascades in body-centered cubic (BCC) Fe and a face-centered cubic (FCC) Fe-Ni-Cr alloy display recursive properties that are found useful for predictions in the long term. The case of defect evolution in W due to external irradiation with energetic H and He is also discussed. To this purpose, it was useful to extend the inelastic energy loss model available in Marlowe up to the Bethe regime. The last version of the Marlowe code (version 15) was delivered before message-passing software (such as MPI) was available, but the code was structured so as to permit parallel execution in a distributed-memory environment. This makes it possible to obtain N different cascades simultaneously using N independent nodes without any communication between processors. The parallelization of the code using MPI was recently achieved by one author of this report (C.J.O.). Typically, the parallelized version of Marlowe allows simulating millions of displacement cascades using a limited number of processors (<64) within only a few hours of CPU time.

18.

Parallel computing methods in the 3D fully electromagnetic particle code NEPTUNE

陈军 莫则尧 董烨 杨温渊 董志伟 《强激光与粒子束》2011,23(11)

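Several parallel computing methods used in the NEPTUNE code are described. A two-level "block-patch" parallel domain decomposition allows the computation to scale to thousands of processor cores. Structured meshes are generated in parallel with an adaptive technique based on the complex geometry, and invalid cells are removed from the original regular region, which greatly reduces memory usage and parallel execution time. Building on the classical Boris and SOR iterative methods, a parallel solver for the Poisson equation on irregular regions is proposed that uses red-black ordering and geometric constraints. With these methods, a parallel efficiency of 51.8% is obtained on 1024 processor cores when NEPTUNE is used to simulate a MILO device.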
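Red-black ordering makes SOR amenable to parallel execution because every point of one color depends only on points of the other color. The sketch below shows this on a plain square grid with zero Dirichlet boundaries; it is a serial illustration under those assumptions and does not reproduce NEPTUNE's irregular regions, geometric constraints, or the Boris method. The names `red_black_sor` and `max_residual`, the relaxation factor, and the sweep count are illustrative choices.

```python
def red_black_sor(f, h, omega=1.5, sweeps=200):
    """Solve laplacian(phi) = f on a square grid with phi = 0 on the boundary."""
    n = len(f)
    phi = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        for color in (0, 1):  # 0 = "red" points, 1 = "black" points
            # All points of one color can be updated independently of each other.
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 != color:
                        continue
                    neighbours = phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1]
                    gauss_seidel = 0.25 * (neighbours - h * h * f[i][j])
                    phi[i][j] = (1.0 - omega) * phi[i][j] + omega * gauss_seidel
    return phi


def max_residual(phi, f, h):
    """Largest absolute residual of the 5-point discrete Poisson equation."""
    n = len(f)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            laplacian = (
                phi[i - 1][j] + phi[i + 1][j] + phi[i][j - 1] + phi[i][j + 1] - 4.0 * phi[i][j]
            ) / (h * h)
            worst = max(worst, abs(laplacian - f[i][j]))
    return worst


# Example: uniform source on a 9 x 9 grid covering the unit square.
n, h = 9, 1.0 / 8
source = [[-1.0] * n for _ in range(n)]
solution = red_black_sor(source, h)
print("max residual:", max_residual(solution, source, h))
```

In a domain-decomposed code each process would update its own red points, exchange boundary values, then update its black points, so the two-color sweep maps naturally onto a halo exchange per half-sweep.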