1.
2.
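Accurate and efficient solution of the three-dimensional multigroup neutron diffusion equation is the foundation of nuclear power reactor core design and fuel management. Solving the equation with the finite difference method is simple, accurate, and mature; however, the method's computational and storage costs are both high, which severely limits the problem sizes and applications it can handle. Building on large-scale parallel computing, this paper studies the finite difference method for the three-dimensional multigroup neutron diffusion equation: the equation is discretized with a central finite difference scheme; large-scale parallelism is implemented through spatial domain decomposition under the MPI parallel programming model; and a multigroup, multi-domain coupled PGMRES algorithm is used for parallel acceleration. The ParaFiDi code was developed on a cluster server and verified against several benchmark problems, including IAEA3D and PHWR. The numerical results show that ParaFiDi achieves high accuracy and high computational efficiency.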
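As a rough illustration of the central finite difference discretization mentioned in this abstract, the sketch below solves a much simpler problem: a one-dimensional, one-group, fixed-source diffusion equation, -D φ'' + Σa φ = S, with zero flux at both ends, in serial. The function name, the parameters, and the use of the Thomas algorithm are assumptions made for this example; the abstract's ParaFiDi code is three-dimensional, multigroup, MPI-parallel, and uses PGMRES, none of which is reproduced here.

```python
import math


def solve_diffusion_1d(n_nodes, length, d_coef, sigma_a, source):
    """Solve -D phi'' + Sigma_a phi = S on (0, length) with phi = 0 at both ends.

    Central differences on n_nodes interior nodes give a tridiagonal system,
    which is solved directly with the Thomas algorithm (illustrative choice).
    """
    h = length / (n_nodes + 1)
    off = -d_coef / h**2                   # sub- and super-diagonal entries
    diag = 2.0 * d_coef / h**2 + sigma_a   # main diagonal entries

    # Forward elimination
    c_prime = [0.0] * n_nodes
    d_prime = [0.0] * n_nodes
    c_prime[0] = off / diag
    d_prime[0] = source / diag
    for i in range(1, n_nodes):
        denom = diag - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (source - off * d_prime[i - 1]) / denom

    # Back substitution
    phi = [0.0] * n_nodes
    phi[-1] = d_prime[-1]
    for i in range(n_nodes - 2, -1, -1):
        phi[i] = d_prime[i] - c_prime[i] * phi[i + 1]
    return phi


def exact_flux(x, length, d_coef, sigma_a, source):
    """Analytic solution of the same model problem, used only for checking."""
    k = math.sqrt(sigma_a / d_coef)
    return source / sigma_a * (
        1.0 - math.cosh(k * (x - length / 2.0)) / math.cosh(k * length / 2.0)
    )
```

Comparing `solve_diffusion_1d` with `exact_flux` at the grid nodes shows the second-order accuracy expected of central differences: halving the mesh spacing should reduce the error by roughly a factor of four.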
3.
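Network-parallel computation of three-dimensional N-S solutions for the multiple blade rows of a four-stage turbine. Total citations: 6 (self-citations: 1, citations by others: 5)
A parallel code for three-dimensional N-S solutions of multiple blade rows was developed on the SGI workstation network of the State Key Laboratory of Scientific and Engineering Computing, using the PVM parallel software platform. The code was used to compute the internal flow field of a four-stage power turbine, and a preliminary analysis of the results is given.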
4.
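Numerical simulation of space charge effects based on a BEM solution of the Poisson equation. Total citations: 1 (self-citations: 1, citations by others: 0)
To simulate the behavior of intense beams in accelerators and their transport lines, a multi-particle tracking program that includes space charge effects (PTP-SC) was developed in C++. Building on the classical PIC method, it solves the Poisson equation with the boundary element method (BEM) on a non-equidistant grid. Simulation results for a beam distribution in free space agree well with the analytical results. Simulation results for an injection line are given and compared with those of ORBIT and TRACE 3-D. The comparison shows good agreement with ORBIT, which also uses a numerical method. The program can be used to simulate space charge effects in linear accelerators and cyclotrons.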
5.
6.
7.
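As the resolution and size of spatial light modulators increase, the computational load of three-dimensional dynamic holographic display grows with them, placing new demands on hologram computation speed. GPU parallel processing is used to implement a fast layer-based computation of holograms: the method exploits GPU multithreading and the two-dimensional image Fourier transforms of the layer-based method to accelerate the Fresnel diffraction transform; in addition, calling low-level GPU resources and using stream processing in CUDA effectively reduces intermediate waiting time. A comparison of computation speeds shows that the GPU-based parallel method is about 10 times faster than the CPU-based method.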
8.
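Parallel computation of a fixture based on the Tahoe framework. Total citations: 1 (self-citations: 0, citations by others: 1)
A complex fixture was modeled with the open-source Tahoe framework combined with the finite element pre- and post-processing programs MSC.Patran and Tecplot. Through domain decomposition, interface programming, and the PCG (preconditioned conjugate gradient) iterative solver provided in PHG, serial and parallel computations of a model with 262×10^4 degrees of freedom were carried out successfully. The results show that the parallel computation converges faster, and that the 4-process parallel run takes less than 1/4 of the serial run time. The correctness of the results was verified by comparison with the commercial program MSC.Nastran. The parallel performance of the model was studied on a large parallel computer, and speedups were obtained for up to 32 processes. The study shows that the improved Tahoe framework is feasible for parallel structural analysis with large numbers of degrees of freedom, and that the speedup is essentially linear as compute nodes are added.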
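The preconditioned conjugate gradient iteration named in this abstract can be sketched in a few lines. The version below is a serial, textbook PCG with a Jacobi (diagonal) preconditioner applied to a small tridiagonal test matrix; the preconditioner choice, the function names `pcg` and `make_tridiagonal`, and the test matrix are assumptions for illustration and say nothing about how PHG or the modified Tahoe framework implement their solver.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, inv_diag, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive definite system A x = b.

    matvec   -- function returning A @ v for a list v
    inv_diag -- reciprocals of the diagonal of A (the Jacobi preconditioner)
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - A x for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, iteration
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def make_tridiagonal(diagonal):
    """Return a matvec for the matrix with this diagonal and -1 on both off-diagonals."""
    n = len(diagonal)

    def matvec(v):
        out = []
        for i in range(n):
            s = diagonal[i] * v[i]
            if i > 0:
                s -= v[i - 1]
            if i < n - 1:
                s -= v[i + 1]
            out.append(s)
        return out

    return matvec
```

In a domain-decomposed setting, the matrix-vector product and the inner products are the steps that require communication between processes; the rest of the iteration is local to each subdomain.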
9.
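This paper describes the initial development of NEPTUNE2D, an independently developed 2.5-dimensional parallel electromagnetic particle simulation code. The code is built on JASMIN, a parallel adaptive structured mesh support framework; it has high parallel efficiency and good scalability, and supports dynamic load balancing. A new PIC algorithm replaces the traditional one and avoids solving the Poisson equation to correct the electric field, which makes it better suited to large-scale parallel computing. The code supports device simulation in r-z coordinates and can be applied to the rapid simulation and design of high-power microwave devices and vacuum electron devices. The basic physics modules have been completed (electromagnetic field update, particle push, electromagnetic field injection/extraction, and particle emission/absorption), and their correctness has been verified with simulations of a coaxial line, a circular waveguide, a coaxial diode, and a foilless diode. Finally, NEPTUNE2D was used to design a high-efficiency coaxial relativistic backward-wave oscillator, and particle simulation results and parallel performance test results are given.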
10.
11.
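Large-scale parallel numerical simulation of groundwater flow based on JASMIN. Total citations: 1 (self-citations: 0, citations by others: 1)
Groundwater flow simulations with fine grids and long time spans suffer from long computation times and high storage costs. To address these bottlenecks, a patch-based core algorithm and a ghost-region-based communication mechanism are proposed, building on the MODFLOW method for three-dimensional transient flow, and the large-scale parallel groundwater flow simulation code JOGFLOW was developed on the JASMIN framework. The correctness and performance of the code were verified by simulating groundwater flow at the Yanming Lake water source in Zhongmu County, Zhengzhou, Henan; scalability was tested on a hypothetical conceptual groundwater model with a fine grid. Relative to the parallel run on 32 cores, the parallel efficiency on 512 and 1,024 processors reaches 77.2% and 67.5%, respectively. The numerical results show that JOGFLOW has good performance and scalability, makes effective use of hundreds to thousands of cores, and supports large-scale parallel computation of groundwater flow models with more than ten million grid cells.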
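The efficiency figures quoted relative to a 32-core baseline can be converted into speedups with a short calculation. The sketch below assumes the common definition of relative parallel efficiency, E = (T_base × p_base) / (T_p × p), which the abstract does not state explicitly; the function name is hypothetical.

```python
def relative_speedup(efficiency, base_cores, cores):
    """Speedup T_base / T_p implied by a relative parallel efficiency.

    Assumes E = (T_base * base_cores) / (T_p * cores), so that
    T_base / T_p = E * cores / base_cores.
    """
    return efficiency * cores / base_cores


# Figures from the abstract: efficiencies relative to the 32-core run.
speedup_512 = relative_speedup(0.772, 32, 512)
speedup_1024 = relative_speedup(0.675, 32, 1024)
```

Under that assumption, the 512-processor run is about 12.4 times faster than the 32-core run (against an ideal factor of 16), and the 1,024-processor run is about 21.6 times faster (against an ideal factor of 32).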
12.
13.
We extend the multi-level Monte Carlo (MLMC) method to quantify uncertainty in the solutions of multi-dimensional hyperbolic systems of conservation laws with uncertain initial data. The algorithm is presented and several issues arising in the massively parallel numerical implementation are addressed. In particular, we present a novel load balancing procedure that ensures scalability of the MLMC algorithm on massively parallel hardware. A new code is described and applied to simulate uncertain solutions of the Euler equations and ideal magnetohydrodynamics (MHD) equations. Numerical experiments showing the robustness, efficiency and scalability of the proposed algorithm are presented.
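The core idea of MLMC, a telescoping sum of level corrections with most samples taken on the cheapest level, can be shown on a toy problem. In the sketch below the "solver" is a midpoint-rule approximation of the integral of exp(ωx) over [0, 1] with ω uniform on [0, 1]; this toy quantity, the function names, and the sample counts are all assumptions for illustration and stand in for the finite volume solvers for conservation laws that the abstract refers to. Load balancing is not addressed.

```python
import math
import random


def midpoint_integral(omega, level):
    """Midpoint-rule approximation of the integral of exp(omega * x) on [0, 1].

    The mesh has 2**level cells, so higher levels are more accurate and cost more.
    """
    n = 2 ** level
    h = 1.0 / n
    return h * sum(math.exp(omega * (j + 0.5) * h) for j in range(n))


def mlmc_estimate(max_level, samples_per_level, rng):
    """Estimate E[P_L] as E[P_0] + sum over levels of E[P_l - P_(l-1)].

    Each correction term uses the same random input omega on the fine and
    coarse mesh, which keeps its variance small and lets the finer levels
    get by with far fewer samples.
    """
    estimate = 0.0
    for level in range(max_level + 1):
        n_samples = samples_per_level[level]
        total = 0.0
        for _ in range(n_samples):
            omega = rng.random()
            fine = midpoint_integral(omega, level)
            coarse = midpoint_integral(omega, level - 1) if level > 0 else 0.0
            total += fine - coarse
        estimate += total / n_samples
    return estimate


# Example: five levels, with sample counts decreasing on the finer meshes.
result = mlmc_estimate(4, [4000, 1000, 250, 60, 15], random.Random(0))
```

In a parallel implementation, the samples within a level are independent and can be distributed freely, but their cost differs greatly between levels, which is why load balancing becomes a central concern.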
14.
A parallel implementation of the electromagnetic dual-primal finite element tearing and interconnecting algorithm (FETI-DPEM) is designed for general three-dimensional (3D) electromagnetic large-scale simulations. As a domain decomposition implementation of the finite element method, the FETI-DPEM algorithm provides fully decoupled subdomain problems and excellent numerical scalability, and thus is well suited for parallel computation. The parallel implementation of the FETI-DPEM algorithm on a distributed-memory system using the message passing interface (MPI) is discussed in detail along with a few practical guidelines obtained from numerical experiments. Numerical examples are provided to demonstrate the efficiency of the parallel implementation.
15.
Andrade X, Alberdi-Rodriguez J, Strubbe DA, Oliveira MJ, Nogueira F, Castro A, Muguerza J, Arruabarrena A, Louie SG, Aspuru-Guzik A, Rubio A, Marques MA 《J Phys Condens Matter》2012,24(23):233202
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
16.
We address the failure in scalability of large-scale parallel simulations that are based on (semi-)implicit time-stepping and hence on the solution of linear systems on thousands of processors. We develop a general algorithmic framework based on domain decomposition that removes the scalability limitations and leads to optimal allocation of available computational resources. It is a non-intrusive approach as it does not require modification of existing codes. Specifically, we present here a two-stage domain decomposition method for the Navier–Stokes equations that combines features of discontinuous and continuous Galerkin formulations. At the first stage the domain is subdivided into overlapping patches and within each patch a C0 spectral element discretization (second stage) is employed. Solution within each patch is obtained separately by applying an efficient parallel solver. Proper inter-patch boundary conditions are developed to provide solution continuity, while a Multilevel Communicating Interface (MCI) is developed to provide efficient communication between the non-overlapping groups of processors of each patch. The overall strong scaling of the method depends on the number of patches and on the scalability of the standard solver within each patch. This dual path to scalability provides great flexibility in balancing accuracy with parallel efficiency. The accuracy of the method has been evaluated in solutions of steady and unsteady 3D flow problems including blood flow in the human intracranial arterial tree. Benchmarks on BlueGene/P, CRAY XT5 and Sun Constellation Linux Cluster have demonstrated good performance on up to 96,000 cores, solving up to 8.21B degrees of freedom in an unsteady flow problem. The proposed method is general and can be potentially used with other discretization methods or in other applications.
17.
18.
19.
20.
Marc R.J. Charest, Clinton P.T. Groth, Ömer L. Gülder 《Journal of computational physics》2012,231(8):3023-3040
The discrete ordinates method (DOM) and finite-volume method (FVM) are used extensively to solve the radiative transfer equation (RTE) in furnaces and combusting mixtures due to their balance between numerical efficiency and accuracy. These methods produce a system of coupled partial differential equations which are typically solved using space-marching techniques since they converge rapidly for constant coefficient spatial discretization schemes and non-scattering media. However, space-marching methods lose their effectiveness when applied to scattering media because the intensities in different directions become tightly coupled. When these methods are used in combination with high-resolution limited total-variation-diminishing (TVD) schemes, the additional non-linearities introduced by the flux limiting process can result in excessive iterations for most cases or even convergence failure for scattering media. Space-marching techniques may also not be quite as well-suited for the solution of problems involving complex three-dimensional geometries and/or for use in highly-scalable parallel algorithms. A novel pseudo-time marching algorithm is therefore proposed herein to solve the DOM or FVM equations on multi-block body-fitted meshes using a highly scalable parallel-implicit solution approach in conjunction with high-resolution TVD spatial discretization. Adaptive mesh refinement (AMR) is also employed to properly capture disparate solution scales with a reduced number of grid points. The scheme is assessed in terms of discontinuity-capturing capabilities, spatial and angular solution accuracy, scalability, and serial performance through comparisons to other commonly employed solution techniques. The proposed algorithm is shown to possess excellent parallel scaling characteristics and can be readily applied to problems involving complex geometries. 
In particular, greater than 85% parallel efficiency is demonstrated for a strong scaling problem on up to 256 processors. Furthermore, a speedup of a factor of at least two was observed over a standard space-marching algorithm using a limited scheme for optically thick scattering media. Although the time-marching approach is approximately four times slower for absorbing media, it vastly outperforms standard solvers when parallel speedup is taken into account. The latter is particularly true for geometrically complex computational domains.