期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

黄娇凤钟敏刘进景越峰刘军施将君《强激光与粒子束》2012,24(12):2965-2969

研究了闪光X射线辐射照像蒙特卡罗程序(FXRMC)在MPI平台下的并行计算实现,给出了实现过程中并行随机数的产生方法。对并行程序的测试结果表明:并行程序与串行程序结果一致,加速比比较理想,呈线性增长,并行效率在16个处理器上可达80%以上。算例的结果说明了并行化可有效地解决程序计算散射技术性能时的耗时问题,从而有效化解FXRMC耗时和大规模计算的困难,提高了FXRMC程序的计算规模和计算速度,达到了研究要求。 (Institute of Fluid Physics, CAEP, P. O. Box 919-105, Mianyang 621900, China) 相似文献

2.

二维多群辐射输运程序LARED-R-1的并行化 总被引：3，自引：2，他引：1

张爱清莫则尧《计算物理》2007,24(2):146-152

利用有向图描述数据依赖关系,应用已有的并行流水线通量扫描算法,实现基于非协调网格的二维辐射输运程序LARED-R-1的并行化.同时,采用消息缓冲技术提高并行程序的性能.经测试,对于典型的问题规模(100群、3800个网格单元、40个方向),在某并行机的64个和128个处理器上,并行程序分别获得80%和53%的并行效率. 相似文献

3.

紧束缚近似含时密度泛函理论的高效OpenMP并行化和GPU加速实现

范果红韩克利何国钟《化学物理学报》2013,26(6):635-645

紧束缚近似的含时密度泛函理论在多核和GPU系统下的高效加速实现,并应用于拥有成百上千原子体系的激发态电子结构计算．程序中采用了稀疏矩阵和OpenMP并行化来加速哈密顿矩阵的构建,而最为耗时的基态对角化部分通过双精度的GPU加速来实现．基态的GPU加速能够在保持计算精度的基础上达到8.73倍的加速比．激发态计算采用了基于Krylov子空间迭代算法,OpenMP并行化和GPU加速等方法对激发态计算的大规模TDDFT矩阵进行求解,从而得到本征值和本征矢,大大减少了迭代的次数和最终的求解时间．采用GPU对矩阵矢量相乘进行加速后的Krylov算法能够很快地达到收敛,使得相比于采用常规算法和CPU并行化的程序能够加速206倍．程序在一系列的小分子体系和大分子体系上的计算表明,相比基于第一性原理的CIS方法和含时密度泛函方法,程序能够花费很少的计算量取得合理而精确结果．相似文献

4.

An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials

Zahra Shojaaee M. Reza Shaebani Lothar Brendel János Török Dietrich E. Wolf 《Journal of computational physics》2012,231(2):612-628

A fully parallel version of the contact dynamics (CD) method is presented in this paper. For large enough systems, 100% efficiency has been demonstrated for up to 256 processors using a hierarchical domain decomposition with dynamic load balancing. The iterative scheme to calculate the contact forces is left domain-wise sequential, with data exchange after each iteration step, which ensures its stability. The number of additional iterations required for convergence by the partially parallel updates at the domain boundaries becomes negligible with increasing number of particles, which allows for an effective parallelization. Compared to the sequential implementation, we found no influence of the parallelization on simulation results. 相似文献

5.

A numerical method for incompressible viscous flow simulation

Ramesh Natarajan 《Journal of computational physics》1992,100(2)

We describe a numerical scheme for computing time-dependent solutions of the incompressible Navier-Stokes equations in the primitive variable formulation. This scheme uses finite elements for the space discretization and operator splitting techniques for the time discretization. The resulting discrete equations are solved using specialized nonlinear optimization algorithms that are computationally efficient and have modest storage requirements. The basic numerical kernel is the preconditioned conjugate gradient method for symmetric, positive-definite, sparse matrix systems, which can be efficiently implemented on the architectures of vector and parallel processing supercomputers. 相似文献

6.

Kohn-Sham self-interaction correction in real time

Hofmann D Körzdörfer T Kümmel S 《Physical review letters》2012,108(14):146401

相似文献

7.

Parallel finite element simulations of incompressible viscous fluid flow by domain decomposition with Lagrange multipliers

Christian A. Rivera Mourad Heniche Roland Glowinski Philippe A. Tanguy 《Journal of computational physics》2010,229(13):5123-5143

A parallel approach to solve three-dimensional viscous incompressible fluid flow problems using discontinuous pressure finite elements and a Lagrange multiplier technique is presented. The strategy is based on non-overlapping domain decomposition methods, and Lagrange multipliers are used to enforce continuity at the boundaries between subdomains. The novelty of the work is the coupled approach for solving the velocity–pressure-Lagrange multiplier algebraic system of the discrete Navier–Stokes equations by a distributed memory parallel ILU (0) preconditioned Krylov method. A penalty function on the interface constraints equations is introduced to avoid the failure of the ILU factorization algorithm. To ensure portability of the code, a message based memory distributed model with MPI is employed. The method has been tested over different benchmark cases such as the lid-driven cavity and pipe flow with unstructured tetrahedral grids. It is found that the partition algorithm and the order of the physical variables are central to parallelization performance. A speed-up in the range of 5–13 is obtained with 16 processors. Finally, the algorithm is tested over an industrial case using up to 128 processors. In considering the literature, the obtained speed-ups on distributed and shared memory computers are found very competitive. 相似文献

8.

MC程序并行设计及提高加速比措施 总被引：4，自引：0，他引：4

邓力谢仲生黄正丰许海燕《计算物理》2001,18(2):177-180

MC程序的并行设计涉及算法及模块划分,它直接关系到并行加速效率的高低,中子－γ耦合输运蒙特卡罗程序MCNP经过行改造,实现了PVM和MPI两种系统下的并行化,由于作了模块化设计,并行加速效率极佳,PVM版和MPI版并行程序在多个处理器下的加速比均呈线性增长,相比PVM,MPI的适应性列强,多数情况下其效率高于OPVM,并行MCNP程序的计算结果可靠,MPI并行程序在16、32和64个处理器上的并行效率分别达到99％、97％和89％。相似文献

9.

基于数值原子轨道基组的第一性原理计算软件ABACUS

下载免费PDF全文

刘晓辉陈默涵李鹏飞沈瑜任新国郭光灿何力新《物理学报》2015,64(18):187104-187104

随着超级计算机硬件和数值算法迅速发展, 使得目前利用密度泛函理论研究上千个原子体系的电子能带和结构等性质变得可行. 数值原子轨道基组由于其基组较小和局域等特性, 可以很好地与电子结构计算中的线性标度算法等的新算法结合, 用来研究较大尺寸的物理体系. 本文详细介绍了一款中国科学技术大学量子信息重点实验室自主开发的基于数值原子轨道基组的第一性原理计算软件 Atomic-orbital Based Ab-initio Computation at UStc. 大量的测试结果表明: 该软件具有很好的准确性和较高的并行效率, 可以用于包含1000个原子左右的系统的电子结构和原子结构的研究以及分子动力学模拟计算. 相似文献

10.

Importance of electronic self-consistency in the TDDFT based treatment of nonadiabatic molecular dynamics

T. A.?Niehaus Email author D.?Heringer B.?Torralva Th.?Frauenheim 《The European Physical Journal D - Atomic, Molecular, Optical and Plasma Physics》2005,35(3):467-477

相似文献

11.

Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

Giorgos Arampatzis Markos A. Katsoulakis Petr Plecháč Michela Taufer Lifan Xu 《Journal of computational physics》2012,231(23):7795-7814

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel Fractional step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes, (a) are partially asynchronous on each fractional step time-window, and (b) are characterized by their communication schedule between processors.The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods. 相似文献

12.

3维全电磁粒子软件NEPTUNE中的并行计算方法

下载免费PDF全文

陈军莫则尧董烨杨温渊董志伟《强激光与粒子束》2011,23(11)

介绍了NEPTUNE软件采用的一些并行计算方法：采用“块-网格片”二层并行区域分解方法,使计算规模能够扩展到上千个处理器核。基于复杂几何特征采用自适应技术并行生成结构网格,在原有规则区域的基础上剔除无效网格,大幅降低了存储量和并行执行时间。在经典的Boris和SOR迭代方法基础上,采用红黑排序和几何约束,提出了非规则区域上的Poisson方程并行求解方法。采用这些方法后,当使用NEPTUNE软件模拟MILO器件时,可在1 024个处理器核上获得51.8%的并行效率。 相似文献

13.

二维柱几何非定常中子输运方程基于格式的界面预估校正并行算法

魏军侠袁光伟阳述林申卫东《计算物理》2012,29(2):198-204

针对二维柱几何非定常中子输运方程的Sn-间断有限元方法,提出基于格式的界面预估校正并行算法.数值算例表明,该并行算法在精度与并行度等诸方面均具有良好的性质,与已有的基于隐式格式的并行扫描算法相比,对于二维中子输运大规模计算问题,并行计算效率较高,并行加速比可增加-倍以上,且可保持原隐式格式的计算精度. 相似文献

14.

Concurrent implicit spectral deferred correction scheme for low-Mach number combustion with detailed chemistry

François P. Hamon Marcus S. Day Michael L. Minion 《Combustion Theory and Modelling》2019,23(2):279-309

We present a parallel multi-implicit time integration scheme for the advection-diffusion-reaction systems arising from the equations governing low-Mach number combustion with complex chemistry. Our strategy employs parallelisation across the method to accelerate the serial Multi-Implicit Spectral Deferred Correction (MISDC) scheme used to couple the advection, diffusion, and reaction processes. In our approach, the diffusion solves and the reaction solves are performed concurrently by different processors. Our analysis shows that the proposed parallel scheme is stable for stiff problems and that the sweeps converge to the fixed-point solution at a faster rate than with serial MISDC. We present numerical examples to demonstrate that the new algorithm is high-order accurate in time, and achieves a parallel speedup compared to serial MISDC. 相似文献

15.

适合DSP处理的低内存并行SPIHT算法

陈升来黄廉卿《光学技术》2006,32(4):587-590

针对SPIHT(set partitioning in hierarchical trees)算法的编码过程具有重复运算、存储量大等问题,提出了一种适合于DSP(digital signal processors)处理的低内存并行SPIHT算法。该算法采用乒乓缓存策略,使得数据的传输和编码能够同时进行。通过引入基于行的整型提升方案,使得只需经少量行变换就能进行列变换,提高了小波的变换速度。根据DSP的并行特性和SPIHT算法的缺点,采用“改进的最大幅值求取方法”、“误差位数以及绝对零值和绝对零集合”、“最大值与零值图”和“单棵零树编码”等多种方法对其进行了改进,大大缓解了对内存的压力,减少了算法的运算量。该算法与LZC(listless zerotree coding)算法相比,重构图像的峰值信噪比相当,但速度提高了2倍,能满足一般的实时压缩要求。相似文献

16.

Large-scale simulations on multiple Graphics Processing Units (GPUs) for the direct simulation Monte Carlo method

C.-C. Su M.R. Smith F.-A. Kuo J.-S. Wu C.-W. Hsieh K.-C. Tseng 《Journal of computational physics》2012,231(23):7932-7958

In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on Graphics Processing Units (GPUs) clusters is presented. An all-device (i.e. GPU) computational approach is adopted where the entire computation is performed on the GPU device, leaving the CPU idle during all stages of the computation, including particle moving, indexing, particle collisions and state sampling. Communication between the GPU and host is only performed to enable multiple-GPU computation. Results show that the computational expense can be reduced by 15 and 185 times when using a single GPU and 16 GPUs respectively when compared to a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs as compared to a single GPU for simulations using 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are employed to demonstrate the excellent capability of the current parallel DSMC method. 相似文献

17.

A new electromagnetic particle-in-cell model with adaptive mesh refinement for high-performance parallel computation

Keizo Fujimoto 《Journal of computational physics》2011,230(23):8508-8526

A new electromagnetic particle-in-cell (EMPIC) model with adaptive mesh refinement (AMR) has been developed to achieve high-performance parallel computation in distributed memory system. For minimizing the amount and frequency of inter-processor communications, the present study uses the staggering grid scheme with the charge conservation method, which consists only of the local operations. However, the scheme provides no numerical damping for electromagnetic waves regardless of the wavenumber, which results in significant noise in the refinement region that eventually covers over physical signals. In order to suppress the electromagnetic noise, the present study introduces a smoothing method which gives numerical damping preferentially for short wavelength modes. The test simulations show that only a weak smoothing results in drastic reduction in the noise, so that the implementation of the AMR is possible in the staggering grid scheme. The computational load balance among the processors is maintained by a new method termed the adaptive block technique for the domain decomposition parallelization. The adaptive block technique controls the subdomain (block) structure dynamically associated with the system evolution, such that all the blocks have almost the same number of particles. The performance of the present code is evaluated for the simulations of the current sheet evolution. The test simulations demonstrate that the usage of the adaptive block technique as well as the staggering grid scheme enhances significantly the parallel efficiency of the AMR-EMPIC model. 相似文献

18.

LUCKY_TD code for solving the time-dependent transport equation with the use of parallel computations

A. V. Moryakov 《Physics of Atomic Nuclei》2016,79(8):1242-1245

An algorithm for solving the time-dependent transport equation in the P_mS_n group approximation with the use of parallel computations is presented. The algorithm is implemented in the LUCKY_TD code for supercomputers employing the MPI standard for the data exchange between parallel processes. 相似文献

19.

GPU-accelerated T-matrix algorithm for light-scattering simulations

Giovanni Iadarola Carlo Forestiere Luca Dal Negro Fabio Villone Giovanni Miano 《Journal of computational physics》2012,231(17):5640-5652

Modern graphical processing units (GPUs) have recently become a pervasive technology able to rapidly solve large parallel problems which previously required runs on clusters or supercomputers. In this paper we propose an effective strategy to parallelize the T-matrix method on GPUs in order to speed-up light scattering simulations. We have tackled two of the most computationally intensive scattering problems that are of interest in nano-optics: the scattering from an isolated non-axisymmetric particle and from an agglomerate of arbitrary shaped particles. We show that fully exploiting the GPU potential we can achieve more than 20 times (20×) acceleration over sequential execution in the investigated scenarios, opening exciting prospectives in the analysis and the design of optical nanostructures. 相似文献

20.

激发态过程的多体理论方法 总被引：2，自引：0，他引：2

黄美纯《发光学报》2005,26(3):273-284

描述多电子体系的绝大部分参量可实验测量,如吸收光谱、发光光谱和激子效应等,都涉及电子激发态的正确描述。密度泛函理论(DFT)框架内的局域密度近似(LDA)作为第一性原理基态理论,即基于Kohn-Sham方程的解,是研究多粒子体系基态性质非常有力的工具。然而,体系激发态的第一性原理理论及其计算要比基态的理论计算复杂得多。关键问题在于描写基态和激发态时,粒子间的交换关联相互作用并不相同,而对于非均匀相互作用多粒子体系的交换关联能至今仍不清楚。不过,近年来关于激发态问题的研究,先后发展了许多描述电子激发态的理论,最重要的是基于准粒子概念和Green函数方程的多体微扰理论和含时间密度泛函理论(TDDFT)以及与此相关的描述电子-空穴相互作用的Bethe-Salpeter方程在凝聚态物理问题中的应用。其中最关键的物理量是粒子的自能算符Σ,它描述Hartree近似之外的交换和关联效应。虽然这些理论不可避免地也要引入某些近似,如对于Σ的一个好的近似就是Hedin的GW近似方法。对许多实际凝聚态体系的计算机模拟结果表明,GW近似是描述激发态问题相当成功的理论方法。将Hartree-Fock(HF)理论与LDA相结合,但采用非局域屏蔽交换代替HF方法中的局域非屏蔽交换相互作用,建立广义的KS方程(GKS),得到所谓屏蔽交换局域密度近似(sX-LDA)方法。我们在平面波自洽场方法PWscf程序包的基础上,发展了PW scf-sX-LDA方法,也是处理激发态问题及材料设计的有效方法。将评述激发态过程多体理论各种方法的发展和意义,讨论这些多体理论方法之间的联系和差异,并在此基础上介绍它们在解决半导体带带跃迁(或带隙偏小问题)、半导体及其微结构中的激子效应等重要领域的应用和成果。相似文献