Similar Literature
 20 similar documents retrieved (search time: 125 ms)
1.
Taking an independently developed regional mesoscale heavy-rainfall atmospheric model as the object of study, a component-based, hierarchical, large-scale, high-efficiency parallel program for the regional atmospheric model was built on the JASMIN parallel programming framework, and the correctness, parallel performance, and high-resolution simulation quality of the parallel code were verified on typical weather cases. The results show that the new JASMIN-based code is computationally consistent with the original serial model: it preserves the original model's good forecast performance while markedly improving large-scale parallel performance and scalability, and it yields better forecasts once the model resolution is further increased.

2.
Accurate and efficient solution of the three-dimensional multigroup neutron diffusion equation is the foundation of nuclear reactor core design and fuel management. The finite difference method is simple, accurate, and mature for this equation, but its large computational and memory costs severely limit the problem sizes and applications it can handle. This paper studies a finite-difference treatment of the 3D multigroup neutron diffusion equation based on large-scale parallel computing: the equation is discretized with a central finite-difference scheme; large-scale parallelism is realized through spatial domain decomposition under the MPI programming model; and a multigroup, multi-domain coupled PGMRES algorithm provides parallel acceleration. The ParaFiDi code was developed on a cluster server and verified on several benchmark problems, including IAEA3D and PHWR. Numerical results show that ParaFiDi achieves high accuracy and high computational efficiency.
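The central-difference discretization described above can be illustrated in one dimension. The sketch below solves a one-group 1-D diffusion equation with a plain Gauss-Seidel sweep; the paper's actual solver is 3-D, multigroup, MPI-parallel, and PGMRES-accelerated, so every parameter here (cross sections, slab width, grid size) is an illustrative assumption, not data from ParaFiDi.

```python
# Minimal 1-D, one-group analogue of the central-difference neutron
# diffusion discretization: -D phi'' + sigma_a phi = S, zero-flux
# boundaries, solved by serial Gauss-Seidel for illustration only.
D, sig_a, S = 1.0, 0.5, 1.0       # diffusion coeff., absorption, source
n, L = 50, 10.0                   # interior points, slab width
h = L / (n + 1)

phi = [0.0] * (n + 2)             # flux, zero boundary conditions
diag = 2.0 * D / h**2 + sig_a     # diagonal of the FD operator
off = -D / h**2                   # off-diagonal coupling

for sweep in range(2000):         # Gauss-Seidel iteration
    for i in range(1, n + 1):
        phi[i] = (S - off * (phi[i-1] + phi[i+1])) / diag

# residual check: A*phi should reproduce the source S at every node
res = max(abs(diag * phi[i] + off * (phi[i-1] + phi[i+1]) - S)
          for i in range(1, n + 1))
```

The converged flux is symmetric about the slab midplane, as the symmetric source and boundary conditions require.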

3.
Network-parallel computation of three-dimensional N-S solutions for the multiple blade rows of a four-stage turbine   Cited: 6 (self-citations: 1, by others: 5)
On the SGI workstation network of the State Key Laboratory of Scientific and Engineering Computing, a multi-blade-row three-dimensional N-S solver was developed on the PVM parallel software platform, and the internal flow field of a four-stage power turbine was computed and preliminarily analyzed.

4.
Numerical simulation of space-charge effects with a BEM-based Poisson solver   Cited: 1 (self-citations: 1, by others: 0)
To simulate the behavior of intense beams in accelerators and their transport lines, a multi-particle tracking program with space-charge effects (PTP-SC) was developed in C++. Building on the classical PIC method, it solves the Poisson equation using the boundary element method (BEM) on a non-uniform mesh. Simulated beam distributions in free space agree well with analytical results. Simulation results for an injection line are presented and compared against ORBIT and TRACE 3-D; the comparison shows good agreement with the numerical ORBIT code. The program can be used for space-charge simulations in linear accelerators and cyclotrons.

5.
Conjugate heat transfer simulation of turbine cascades   Cited: 3 (self-citations: 0, by others: 3)
This paper establishes a conjugate heat transfer model coupling the cascade flow field and the blade temperature field in turbomachinery, develops a temperature-field solver in three-dimensional non-orthogonal curvilinear coordinates, and computes temperature distributions for solid and hollow blades. The application of domain decomposition to the coupled and parallel computations is described; the parallelization of the code is studied using a rectangular-domain temperature solver as an example; finally, a partitioned coupled computation of the cascade flow solver and the blade heat-conduction solver is carried out.

6.
The three-dimensional neutron-photon transport Monte Carlo code MCMG has been extended with material-specific collision mechanisms; its geometry cells and surfaces are dynamically extensible, and the random-number period has been enlarged to 2^61. It supports coupled multigroup/continuous-energy cross-section calculations, with multigroup scattering expanded to P5 and neutron upscattering taken into account, and it is equipped with general-purpose and special-purpose multigroup cross-section libraries. MCMG simulations agree with the MCNP code and with experiment, run 2-4 times faster than MCNP in serial, and can be parallelized over tens of thousands of processor cores.

7.
As the resolution of spatial light modulators used for computed holography keeps growing, the computational load of dynamic three-dimensional holographic display grows with it, placing new demands on hologram computation speed. GPU parallel computing is used here to accelerate the layer-oriented hologram calculation: the method exploits the GPU's massive multithreading together with the two-dimensional Fourier transforms of the layer method to speed up the Fresnel diffraction computation, while direct use of low-level GPU resources and CUDA stream processing effectively reduces intermediate waiting delays. Speed comparisons show a large improvement over CPU computation: the GPU-parallel method is roughly 10 times faster than the CPU-based method.
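The per-layer kernel of the layer-oriented method is a Fresnel diffraction evaluated through Fourier transforms. Below is a 1-D transfer-function sketch of that step using a naive pure-Python DFT; the paper performs 2-D FFTs in CUDA, and the wavelength, sampling pitch, and propagation distance here are hypothetical values chosen only to exercise the math.

```python
# Toy 1-D Fresnel propagation by the transfer-function method:
# U_z = IDFT( DFT(U_0) * H(f) ),  H(f) = exp(-i*pi*lambda*z*f^2).
# Since |H| = 1, the propagation is unitary and conserves energy.
import cmath

def dft(u, sign):
    n = len(u)
    return [sum(u[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                for m in range(n)) for k in range(n)]

def fresnel_propagate(u0, dx, lam, z):
    n = len(u0)
    spec = dft(u0, -1)                                 # forward DFT
    out = []
    for k in range(n):
        fx = (k if k < n // 2 else k - n) / (n * dx)   # spatial frequency
        h = cmath.exp(-1j * cmath.pi * lam * z * fx * fx)
        out.append(spec[k] * h)
    u1 = dft(out, +1)                                  # inverse DFT
    return [v / n for v in u1]                         # normalization

# a small test field: a one-sample "aperture"
u0 = [0.0] * 16
u0[8] = 1.0
u1 = fresnel_propagate(u0, dx=10e-6, lam=633e-9, z=0.01)
energy0 = sum(abs(v) ** 2 for v in u0)
energy1 = sum(abs(v) ** 2 for v in u1)
```

The GPU version replaces the O(n^2) DFT with a batched 2-D FFT per depth layer, which is where the reported 10x speedup comes from.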

8.
Parallel computation of a fixture based on the Tahoe framework   Cited: 1 (self-citations: 0, by others: 1)
On the basis of the open-source Tahoe framework, combined with the finite element pre/post-processors MSC.Patran and Tecplot, a complex fixture was modeled. Through domain decomposition, interface coding, and the PCG (preconditioned conjugate gradient) iterative solver provided by PHG, serial and parallel computations of a model with 2.62 million degrees of freedom were carried out successfully. The results show that the parallel computation converges faster: with 4 processes, the parallel run takes less than 1/4 of the serial time. The correctness of the results was verified by comparison with the commercial code MSC.Nastran. The parallel performance of the model was studied on a large parallel computer, obtaining speedups with up to 32 processes. The study shows that the modified Tahoe framework is feasible for parallel structural analysis with large numbers of degrees of freedom, and that the parallel computation scales approximately linearly as compute nodes are added.
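The PCG iteration named above is a short, standard algorithm. The sketch below implements it with a Jacobi (diagonal) preconditioner on a small dense SPD system; the PHG solver of course operates on distributed sparse FEM matrices, so this is an illustration of the iteration only.

```python
# Minimal preconditioned conjugate gradient (PCG) with a Jacobi
# preconditioner M = diag(A), for a symmetric positive-definite A.
def pcg(A, b, tol=1e-10, max_iter=200):
    n = len(b)
    x = [0.0] * n
    r = b[:]                                  # residual for x0 = 0
    minv = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [minv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if max(abs(v) for v in r) < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# small SPD test system: a 1-D Laplacian stiffness matrix
n = 20
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 2.0
    if i > 0: A[i][i-1] = -1.0
    if i < n - 1: A[i][i+1] = -1.0
b = [1.0] * n
x = pcg(A, b)
res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))
```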

9.
This paper reports the initial development of NEPTUNE2D, an independently developed 2.5-dimensional parallel electromagnetic particle-in-cell simulation code. Built on the JASMIN parallel adaptive structured-mesh infrastructure, the code offers high parallel efficiency, strong scalability, and dynamic load balancing. It replaces the traditional algorithm with a new PIC scheme that avoids solving the Poisson equation to correct the electric field, making it better suited to large-scale parallel computation. The code supports device simulation in r-z coordinates and can be applied to rapid simulation-based design of high-power microwave and vacuum electronic devices. The basic physics modules (electromagnetic field update, particle push, field injection/extraction, and particle emission/absorption) have been completed, and their correctness has been verified on coaxial-line, circular-waveguide, coaxial-diode, and foilless-diode test cases. Finally, a high-efficiency coaxial relativistic backward-wave oscillator was designed with NEPTUNE2D, and PIC simulation results and parallel performance measurements are presented.
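Two of the kernels named above, the particle push and the charge deposition, can be sketched in a 1-D electrostatic toy. NEPTUNE2D is a 2.5-D electromagnetic r-z code with a charge-conserving current deposit; the cloud-in-cell deposit and uniform field below are generic illustrative assumptions, not its algorithm.

```python
# 1-D PIC building blocks: cloud-in-cell (CIC) charge deposition on a
# periodic grid, and a leapfrog particle push in a given field.
def deposit_cic(positions, q, nx, dx):
    """Each particle's charge is shared linearly between the two grid
    nodes that bracket it; summed rho*dx recovers the total charge."""
    rho = [0.0] * nx
    for x in positions:
        cell = int(x / dx) % nx
        frac = x / dx - int(x / dx)
        rho[cell] += q * (1.0 - frac) / dx
        rho[(cell + 1) % nx] += q * frac / dx
    return rho

def leapfrog_push(x, v, e_field, qm, dt):
    """Standard leapfrog: velocity at half steps, position at whole steps."""
    v = v + qm * e_field * dt
    x = x + v * dt
    return x, v

nx, dx, q = 8, 0.5, 1.0
parts = [0.3, 1.1, 2.75, 3.9]
rho = deposit_cic(parts, q, nx, dx)
total_charge = sum(r * dx for r in rho)   # should equal q * len(parts)
x1, v1 = leapfrog_push(0.0, 0.0, 1.0, 1.0, 0.1)
```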

10.
Network-parallel computation of multi-signal nonlinear beam-wave interaction in traveling-wave tubes   Cited: 2 (self-citations: 0, by others: 2)
Network parallel computing technology is applied to the study of microwave tubes. A TCP/IP-based network parallel computing model is proposed for computing the 2.5-dimensional multi-signal nonlinear beam-wave interaction in traveling-wave tubes, and a network-parallel program simulating the interaction process was written that runs on computer networks composed of workstations and PCs. The results show that the network-parallel algorithm effectively reduces the computation time of the multi-signal nonlinear beam-wave interaction and improves productivity.

11.
Large-scale parallel numerical simulation of groundwater flow based on JASMIN   Cited: 1 (self-citations: 0, by others: 1)
To address the bottlenecks of long run times and large memory footprints in groundwater flow simulations with fine mesh resolution and long time spans, a patch-based core algorithm and a ghost-region communication mechanism were developed on top of the MODFLOW 3D transient flow method, and the large-scale parallel groundwater simulation code JOGFLOW was built on the JASMIN framework. Correctness and performance were verified by simulating groundwater flow at the Yanminghu well field in Zhongmu County, Zhengzhou, Henan; scalability was tested on a hypothetical finely meshed conceptual groundwater model. Relative to a 32-core run, parallel efficiency reaches 77.2% on 512 processors and 67.5% on 1,024. The results show that JOGFLOW offers good performance and scalability, makes effective use of hundreds to thousands of compute cores, and supports large-scale parallel simulation of groundwater models with tens of millions of cells or more.

12.
The development of massively parallel processing demands good scalability from parallel applications. Taking a two-dimensional electromagnetic plasma particle-in-cell parallel code as an example, this paper describes the application of near-optimal scalability analysis: given the measured performance of a small system, the analysis indicates how many processors a larger system can "reasonably" be run on.
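The scalability bookkeeping that runs through these abstracts reduces to two ratios, speedup and parallel efficiency, plus a model for projecting to larger machines. The sketch below uses a simple Amdahl-law projection with made-up timings; it is a generic illustration, not the near-optimal scalability analysis of the paper.

```python
# Speedup, parallel efficiency, and an Amdahl-law projection from a
# small measured run to a larger hypothetical machine.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    return speedup(t_serial, t_parallel) / p

def amdahl_speedup(serial_fraction, p):
    """Projected speedup on p processors when a fixed fraction of the
    work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# hypothetical measurement: 100 s serial, 3.5 s on 32 cores
s32 = speedup(100.0, 3.5)             # ~28.6
e32 = efficiency(100.0, 3.5, 32)      # ~0.89
# serial fraction implied by the measurement, then projection
f = (32.0 / s32 - 1.0) / (32.0 - 1.0)
s1024 = amdahl_speedup(f, 1024)       # diminishing returns at scale
```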

13.
We extend the multi-level Monte Carlo (MLMC) in order to quantify uncertainty in the solutions of multi-dimensional hyperbolic systems of conservation laws with uncertain initial data. The algorithm is presented and several issues arising in the massively parallel numerical implementation are addressed. In particular, we present a novel load balancing procedure that ensures scalability of the MLMC algorithm on massively parallel hardware. A new code is described and applied to simulate uncertain solutions of the Euler equations and ideal magnetohydrodynamics (MHD) equations. Numerical experiments showing the robustness, efficiency and scalability of the proposed algorithm are presented.
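The MLMC idea above rests on the telescoping sum E[P_L] = E[P_0] + sum over l of E[P_l - P_{l-1}], with coarse and fine solutions driven by the same random input so the corrections have small variance. The toy below replaces the paper's hyperbolic solver with an Euler-Maruyama step count for the scalar SDE dS = S dW (whose exact mean is known); the level structure and sample counts are illustrative assumptions.

```python
# Toy multi-level Monte Carlo estimator for E[S(1)] with dS = S dW,
# S(0) = 1.  Coarse paths reuse (sum) the fine Brownian increments.
import random

def euler_path(level, increments):
    """Euler-Maruyama on [0,1] with 2**level steps, consuming
    pre-drawn finest-level Brownian increments."""
    nfine = len(increments)
    stride = nfine // 2 ** level          # coarse steps sum fine dW's
    s = 1.0
    for i in range(0, nfine, stride):
        dw = sum(increments[i:i + stride])
        s += s * dw
    return s

def mlmc_estimate(levels, samples, rng):
    est = 0.0
    nfine = 2 ** levels
    for l in range(levels + 1):
        acc = 0.0
        for _ in range(samples[l]):
            dw = [rng.gauss(0.0, (1.0 / nfine) ** 0.5) for _ in range(nfine)]
            fine = euler_path(l, dw)
            acc += fine - (euler_path(l - 1, dw) if l > 0 else 0.0)
        est += acc / samples[l]
    return est

rng = random.Random(42)
est = mlmc_estimate(levels=3, samples=[4000, 1000, 400, 200], rng=rng)
# E[S(1)] = 1 exactly for this martingale, at every discretization level
```

In a production MLMC code (as in the paper) the per-level sample counts are chosen from estimated variances, which is exactly what makes load balancing across levels nontrivial.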

14.
A parallel implementation of the electromagnetic dual-primal finite element tearing and interconnecting algorithm (FETI-DPEM) is designed for general three-dimensional (3D) electromagnetic large-scale simulations. As a domain decomposition implementation of the finite element method, the FETI-DPEM algorithm provides fully decoupled subdomain problems and an excellent numerical scalability, and thus is well suited for parallel computation. The parallel implementation of the FETI-DPEM algorithm on a distributed-memory system using the message passing interface (MPI) is discussed in detail along with a few practical guidelines obtained from numerical experiments. Numerical examples are provided to demonstrate the efficiency of the parallel implementation.

15.
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
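Real-time TDDFT propagates psi(t + dt) = exp(-i H dt) psi(t). The sketch below uses the unitary Crank-Nicolson (Cayley) form (1 + i H dt/2)^-1 (1 - i H dt/2) on a hypothetical 2x2 Hermitian matrix and checks the norm conservation that real-time propagation relies on; octopus works on real-space grids with far more sophisticated propagators, so this is a minimal illustration of the principle only.

```python
# Cayley (Crank-Nicolson) time step for i d/dt psi = H psi with a
# 2x2 Hermitian H; exactly unitary, so |psi| is conserved.
def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def solve2(m, b):               # 2x2 complex solve by Cramer's rule
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - b[0] * m[1][0]) / det]

def cayley_step(h, psi, dt):
    a = [[1 + 0.5j * dt * h[0][0], 0.5j * dt * h[0][1]],
         [0.5j * dt * h[1][0], 1 + 0.5j * dt * h[1][1]]]
    b = [[1 - 0.5j * dt * h[0][0], -0.5j * dt * h[0][1]],
         [-0.5j * dt * h[1][0], 1 - 0.5j * dt * h[1][1]]]
    return solve2(a, matvec(b, psi))

h = [[1.0, 0.3], [0.3, 2.0]]    # hypothetical Hermitian "Hamiltonian"
psi = [1.0 + 0j, 0j]
for _ in range(500):
    psi = cayley_step(h, psi, dt=0.05)
norm = abs(psi[0]) ** 2 + abs(psi[1]) ** 2
```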

16.
We address the failure in scalability of large-scale parallel simulations that are based on (semi-)implicit time-stepping and hence on the solution of linear systems on thousands of processors. We develop a general algorithmic framework based on domain decomposition that removes the scalability limitations and leads to optimal allocation of available computational resources. It is a non-intrusive approach as it does not require modification of existing codes. Specifically, we present here a two-stage domain decomposition method for the Navier–Stokes equations that combines features of discontinuous and continuous Galerkin formulations. At the first stage the domain is subdivided into overlapping patches and within each patch a C0 spectral element discretization (second stage) is employed. Solution within each patch is obtained separately by applying an efficient parallel solver. Proper inter-patch boundary conditions are developed to provide solution continuity, while a Multilevel Communicating Interface (MCI) is developed to provide efficient communication between the non-overlapping groups of processors of each patch. The overall strong scaling of the method depends on the number of patches and on the scalability of the standard solver within each patch. This dual path to scalability provides great flexibility in balancing accuracy with parallel efficiency. The accuracy of the method has been evaluated in solutions of steady and unsteady 3D flow problems including blood flow in the human intracranial arterial tree. Benchmarks on BlueGene/P, CRAY XT5 and Sun Constellation Linux Cluster have demonstrated good performance on up to 96,000 cores, solving up to 8.21B degrees of freedom in unsteady flow problem. The proposed method is general and can be potentially used with other discretization methods or in other applications.
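The overlapping-patch idea in the first stage above goes back to alternating Schwarz iteration: solve each patch with Dirichlet data taken from the current iterate on the other patch, and the overlap drives convergence. The 1-D Poisson toy below illustrates only that mechanism; the paper's patches carry spectral-element discretizations and parallel solvers, and the grid and patch layout here are illustrative assumptions.

```python
# Alternating Schwarz for -u'' = 1 on (0,1), u(0)=u(1)=0, with two
# overlapping patches, each solved exactly by the Thomas algorithm.
def thomas(sub, diag, sup, rhs):
    """Tridiagonal solve (Thomas algorithm)."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = sup[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i-1]
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i-1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i+1]
    return x

def solve_patch(u, lo, hi, h):
    """Solve the patch lo..hi with Dirichlet data from the current
    iterate at the interface nodes lo-1 and hi+1."""
    n = hi - lo + 1
    rhs = [h * h] * n
    rhs[0] += u[lo - 1]
    rhs[-1] += u[hi + 1]
    u[lo:hi + 1] = thomas([-1.0] * n, [2.0] * n, [-1.0] * n, rhs)

N = 40
h = 1.0 / N
u = [0.0] * (N + 1)               # Dirichlet: u[0] = u[N] = 0
for _ in range(30):               # alternating Schwarz sweeps
    solve_patch(u, 1, 24, h)      # left patch,  (0, 0.625)
    solve_patch(u, 16, N - 1, h)  # right patch, (0.375, 1)

# exact solution is x(1-x)/2, reproduced exactly by central differences
err = max(abs(u[i] - (i * h) * (1 - i * h) / 2.0) for i in range(N + 1))
```

Wider overlap means faster convergence but more duplicated work per sweep, the same accuracy/efficiency trade-off the paper balances at scale.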

17.
Based on the PANDA in-house parallel computing platform, algorithms for harmonic response analysis under multi-point base excitation using modal superposition were designed and implemented in parallel, producing a corresponding parallel solver module. The module's correctness was verified against commercial finite element software; it was then applied to an opto-mechanical assembly, achieving efficient parallel solution of an ultra-large model with 1.188 billion degrees of freedom. The results show that the developed multi-point base-excitation harmonic response module supports efficient, high-fidelity numerical simulation of complex equipment, with strong parallel computing capability scaling to several thousand CPU cores, far beyond existing general-purpose commercial finite element software.
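Modal superposition computes the steady-state harmonic response as a sum over modes, x(w) = sum of phi_i (phi_i . F) / (w_i^2 - w^2 + 2 j zeta_i w_i w) for mass-normalized modes. The 2-DOF sketch below checks this against a direct complex solve; the matrices, damping ratio, and load are hypothetical, and the paper's module of course does this in parallel for billions of DOF with multi-point base excitation.

```python
# Modal-superposition harmonic response of a 2-DOF system, verified
# against a direct solve of (K - w^2 M + j w C) x = F with M = I.
import math

s = 1.0 / math.sqrt(2.0)                     # mass-normalized modes of
modes = [([s, s], 1.0), ([s, -s], 3.0)]      # K = [[2,-1],[-1,2]], M = I
zeta = 0.02
force = [1.0, 0.0]

def harmonic_response(w):
    x = [0j, 0j]
    for phi, lam in modes:
        wi = math.sqrt(lam)
        q = (phi[0] * force[0] + phi[1] * force[1]) / \
            (lam - w * w + 2j * zeta * wi * w)
        x[0] += q * phi[0]
        x[1] += q * phi[1]
    return x

# direct solve with C assembled from the same modal damping
w = 1.3
C = [[0.0, 0.0], [0.0, 0.0]]
for phi, lam in modes:
    wi = math.sqrt(lam)
    for i in range(2):
        for j in range(2):
            C[i][j] += 2.0 * zeta * wi * phi[i] * phi[j]
K = [[2.0, -1.0], [-1.0, 2.0]]
A = [[K[i][j] - (w * w if i == j else 0.0) + 1j * w * C[i][j]
      for j in range(2)] for i in range(2)]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
direct = [(force[0] * A[1][1] - A[0][1] * force[1]) / det,
          (A[0][0] * force[1] - force[0] * A[1][0]) / det]
modal = harmonic_response(w)
```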

18.
Parallel computing is one of the most effective means of large-scale numerical simulation of explosion and impact problems. Addressing the complexity of parallelizing Euler-method codes, the overall parallelization strategy of the three-dimensional multi-material elastic-plastic hydrodynamics code MMIC-3D is described; the corresponding parallel program PMMIC-3D was designed on the Message Passing Interface (MPI), and a practical program test plan was devised. Using the numerical simulation of shaped-charge jet formation as a test case, speedup, parallel efficiency, and scalability were measured on an eight-node cluster, and the factors affecting parallel performance were analyzed.

19.
Ren Jian, Wu Linping, Shen Weidong. Chinese Journal of Computational Physics, 2015, 32(4): 431-436
Based on RHSn2D, a coupled radiation-hydrodynamics and particle-transport code built on the JASMIN parallel-application support framework, and using a parallel strategy that fixes the number of processors per minimal patch group, the parallel scale for a realistic model was extended to 8,192 cores with a parallel efficiency of about 16%. Integrated timing analysis of the code verifies the influence of the MPI collective communication in the framework's underlying parallel environment on the parallel optimization algorithm, and in particular on the radiation-hydrodynamics compute time.

20.
The discrete ordinates method (DOM) and finite-volume method (FVM) are used extensively to solve the radiative transfer equation (RTE) in furnaces and combusting mixtures due to their balance between numerical efficiency and accuracy. These methods produce a system of coupled partial differential equations which are typically solved using space-marching techniques since they converge rapidly for constant coefficient spatial discretization schemes and non-scattering media. However, space-marching methods lose their effectiveness when applied to scattering media because the intensities in different directions become tightly coupled. When these methods are used in combination with high-resolution limited total-variation-diminishing (TVD) schemes, the additional non-linearities introduced by the flux limiting process can result in excessive iterations for most cases or even convergence failure for scattering media. Space-marching techniques may also not be quite as well-suited for the solution of problems involving complex three-dimensional geometries and/or for use in highly-scalable parallel algorithms. A novel pseudo-time marching algorithm is therefore proposed herein to solve the DOM or FVM equations on multi-block body-fitted meshes using a highly scalable parallel-implicit solution approach in conjunction with high-resolution TVD spatial discretization. Adaptive mesh refinement (AMR) is also employed to properly capture disparate solution scales with a reduced number of grid points. The scheme is assessed in terms of discontinuity-capturing capabilities, spatial and angular solution accuracy, scalability, and serial performance through comparisons to other commonly employed solution techniques. The proposed algorithm is shown to possess excellent parallel scaling characteristics and can be readily applied to problems involving complex geometries. In particular, greater than 85% parallel efficiency is demonstrated for a strong scaling problem on up to 256 processors. Furthermore, a speedup of a factor of at least two was observed over a standard space-marching algorithm using a limited scheme for optically thick scattering media. Although the time-marching approach is approximately four times slower for absorbing media, it vastly outperforms standard solvers when parallel speedup is taken into account. The latter is particularly true for geometrically complex computational domains.
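The "high-resolution limited TVD schemes" discussed above hinge on a slope limiter such as minmod, whose defining property is that total variation never increases. The toy 1-D linear advection solver below (periodic, MUSCL reconstruction with upwind flux, explicit Euler) checks that property; it is a generic illustration of a limited TVD scheme, not the paper's DOM/FVM discretization.

```python
# MUSCL + minmod limiter for u_t + u_x = 0 on a periodic grid.
# For CFL <= 0.5 the scheme is TVD: TV(u^{n+1}) <= TV(u^n).
def minmod(a, b):
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def tv(u):
    n = len(u)
    return sum(abs(u[(i + 1) % n] - u[i]) for i in range(n))

def step(u, cfl):
    n = len(u)
    slope = [minmod(u[i] - u[i-1], u[(i + 1) % n] - u[i]) for i in range(n)]
    # upwind (advection speed > 0): interface value from the left cell
    flux = [u[i] + 0.5 * slope[i] for i in range(n)]      # at i + 1/2
    return [u[i] - cfl * (flux[i] - flux[i-1]) for i in range(n)]

u = [1.0 if 10 <= i < 20 else 0.0 for i in range(50)]     # square pulse
tv0 = tv(u)
for _ in range(100):
    u = step(u, cfl=0.4)
tv_end = tv(u)
```

The conservative flux-difference form also preserves the integral of u exactly, which is why the same limiter machinery can be embedded in an implicit pseudo-time iteration without losing conservation.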
