Similar literature (10 results)
1.
While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results of the semi-implicit method for pressure linked equations (SIMPLE) and the pressure implicit with operator splitting (PISO) methods implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of a full port over a partial acceleration of these algorithms run on unstructured meshes. We found that the full-port strategy requires adjusting the internal data structures to the new hardware, and we propose a convenient format for storing internal data structures on GPUs. Our implementation is validated on standard steady and unsteady problems, and its computational efficiency is checked by comparing its results and run times with those of standard software (OpenFOAM) run on a central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.

2.
Several next generation high performance computing platforms are or will be based on the so‐called many‐core architectures, which represent a significant departure from commodity multi‐core architectures. A key issue in transitioning large‐scale simulation codes from multi‐core to many‐core systems is closing the serial performance gap, that is, overcoming the large difference in single‐core performance between multi‐core and many‐core systems. In this paper, we discuss how this problem was addressed for a 3D unstructured mesh hydrodynamics code, describe how Amdahl's law can be used to estimate performance targets and guide optimization efforts, and present timing studies performed on multi‐core and many‐core platforms. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
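As a rough illustration of how Amdahl's law can set a performance target, the sketch below computes the predicted speedup for a given parallel fraction and inverts the formula to find the fraction that must run in parallel to reach a target. The function names and the sample numbers are assumptions made for illustration; they are not taken from the paper.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Speedup predicted by Amdahl's law: S = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def required_parallel_fraction(target_speedup, n_workers):
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n)."""
    if n_workers <= 1:
        raise ValueError("need more than one worker to gain any speedup")
    p = (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n_workers)
    if p > 1.0:
        raise ValueError("target speedup is unreachable with this many workers")
    return p


if __name__ == "__main__":
    # Hypothetical numbers: 95% of the run time parallelises over 60 cores.
    print(amdahl_speedup(0.95, 60))
    # Fraction that would have to be parallel to reach a 30x speedup.
    print(required_parallel_fraction(30.0, 60))
```

The inverse form is the one that guides optimization: it states how much of the remaining serial time has to be removed before adding cores pays off.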

3.
Because smoothed particle hydrodynamics (SPH) is based on interactions with the closest neighbouring particles, the implementation of the neighbour list is key to the performance of the code. The efficiency of the method depends directly on how the neighbour list is built and used. In the present work, the available searching algorithms for SPH codes are analyzed. Different gridding algorithms are evaluated, the gains in efficiency obtained from reordering of particles are investigated, and the cell‐linked list and Verlet list methods are studied to create the neighbour list. Furthermore, an innovative searching procedure based on a dynamic updating of the Verlet list is proposed. The efficiency of the algorithms is analyzed in terms of computational time and memory requirements. Copyright © 2010 John Wiley & Sons, Ltd.
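A minimal sketch of a cell-linked list neighbour search in 2D, assuming a fixed interaction radius `h` and cells of side `h`, so that every neighbour of a particle lies in its own cell or one of the 8 adjacent cells. The function names are illustrative; this is a generic textbook form, not the implementation evaluated in the paper, and it omits the Verlet list and particle reordering.

```python
from collections import defaultdict
from itertools import product
from math import floor


def build_cells(points, h):
    """Bin particle indices into square cells of side h."""
    cells = defaultdict(list)
    for index, (x, y) in enumerate(points):
        cells[(floor(x / h), floor(y / h))].append(index)
    return cells


def neighbour_pairs(points, h):
    """Return sorted pairs (i, j), i < j, whose distance is at most h."""
    cells = build_cells(points, h)
    h_squared = h * h
    pairs = []
    for (cx, cy), members in cells.items():
        # Only the cell itself and its 8 adjacent cells can hold neighbours.
        for dx, dy in product((-1, 0, 1), repeat=2):
            others = cells.get((cx + dx, cy + dy))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j:  # each unordered pair is accepted exactly once
                        ddx = points[i][0] - points[j][0]
                        ddy = points[i][1] - points[j][1]
                        if ddx * ddx + ddy * ddy <= h_squared:
                            pairs.append((i, j))
    return sorted(pairs)


if __name__ == "__main__":
    particles = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (0.9, 0.9)]
    print(neighbour_pairs(particles, 1.0))
```

Compared with testing all pairs, the search cost grows roughly linearly with the number of particles when the particle density per cell is bounded.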

4.
We demonstrate that radically differing implementations of finite element methods (FEMs) are needed on multi‐core (CPU) and many‐core (GPU) architectures, if their respective performance potential is to be realised. Our numerical investigations using a finite element advection–diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high‐level structure of the implementation. Making these commitments to achieve high performance for a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many‐core architectures, whereas redundancy‐free data structures that are accessed indirectly are faster on multi‐core architectures. The Addto algorithm for global assembly is optimal on multi‐core architectures, whereas the Local Matrix Approach is optimal on many‐core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value in making the correct choice of algorithm and data structure when implementing FEMs, spectral element methods and low‐order discontinuous Galerkin methods on modern high‐performance architectures. Copyright © 2012 John Wiley & Sons, Ltd.
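The contrast between the two assembly strategies can be sketched on a 1D mesh of linear elements. This is a simplified reading of the two approaches and an assumption on my part, not the paper's code: "Addto" is taken to mean adding each local element matrix into one global matrix, and the Local Matrix Approach is taken to mean keeping the local matrices unassembled and applying them through a gather, a local product, and a scatter-add. A dense global matrix is used only to keep the example short.

```python
def local_stiffness(h):
    """Local stiffness matrix of a 1D linear element of length h."""
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def addto_assemble(elements, local_mats, n_nodes):
    """Addto-style assembly: add every local entry into the global matrix."""
    global_mat = [[0.0] * n_nodes for _ in range(n_nodes)]
    for nodes, k_local in zip(elements, local_mats):
        for a, i in enumerate(nodes):
            for b, j in enumerate(nodes):
                global_mat[i][j] += k_local[a][b]
    return global_mat


def matvec(mat, x):
    """Dense matrix-vector product with the assembled matrix."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in mat]


def lma_matvec(elements, local_mats, x, n_nodes):
    """Local-matrix product: gather, multiply locally, scatter-add."""
    y = [0.0] * n_nodes
    for nodes, k_local in zip(elements, local_mats):
        x_local = [x[i] for i in nodes]  # gather nodal values
        for a, i in enumerate(nodes):
            y[i] += sum(k_local[a][b] * x_local[b] for b in range(len(nodes)))
    return y


if __name__ == "__main__":
    elements = [(0, 1), (1, 2), (2, 3)]  # node indices of each element
    local_mats = [local_stiffness(0.5) for _ in elements]
    x = [0.0, 1.0, 4.0, 9.0]
    print(matvec(addto_assemble(elements, local_mats, 4), x))
    print(lma_matvec(elements, local_mats, x, 4))
```

Both routes produce the same vector. The local-matrix route stores shared-node contributions redundantly and does more arithmetic, which matches the trade-off the abstract describes.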

5.
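The method of manufactured solutions is one of the important approaches for verifying the correctness of codes that couple complex multiphysics processes, yet manufactured-solution models for coupled hydrodynamics and radiation on two-dimensional Lagrangian meshes with large deformation are rare. To meet the need for verifying Lagrangian radiation hydrodynamics codes, this work starts from the two-dimensional Lagrangian radiation hydrodynamics equations and, using a coordinate transformation technique, derives the relations between the derivatives of physical variables in Lagrangian space and in Eulerian space. A method for constructing manufactured solutions for coupled radiation hydrodynamics is studied, and a class of two-dimensional manufactured-solution models with no source term in the mass equation is constructed. The models are applied to verify the radiation hydrodynamics calculations of the unstructured Lagrangian code LAD2D, providing an effective tool for radiation diffusion calculations on meshes that move with the fluid. The numerical results show that the observed convergence order of the simulations agrees with the theoretical analysis.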
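In a manufactured-solution study, the observed convergence order is usually estimated from the errors on two meshes, assuming the error behaves like C·h^p. The sketch below shows that standard estimate; the function names and the sample errors are made up for illustration and are not results from the LAD2D study.

```python
from math import log


def observed_order(h_coarse, err_coarse, h_fine, err_fine):
    """Estimate p from err ~ C * h**p using two mesh sizes."""
    if min(h_coarse, h_fine, err_coarse, err_fine) <= 0.0:
        raise ValueError("mesh sizes and errors must be positive")
    if h_coarse == h_fine:
        raise ValueError("the two mesh sizes must differ")
    return log(err_coarse / err_fine) / log(h_coarse / h_fine)


def observed_orders(mesh_sizes, errors):
    """Observed order between each pair of consecutive refinements."""
    levels = list(zip(mesh_sizes, errors))
    return [
        observed_order(h0, e0, h1, e1)
        for (h0, e0), (h1, e1) in zip(levels, levels[1:])
    ]


if __name__ == "__main__":
    # Hypothetical errors that follow err = 3 * h**2 exactly.
    sizes = [0.1, 0.05, 0.025]
    errs = [3.0 * h ** 2 for h in sizes]
    print(observed_orders(sizes, errs))
```

Agreement between the values returned here and the formal order of the scheme is the check that the abstract's final sentence refers to.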

6.
We present a nodal Godunov method for Lagrangian shock hydrodynamics. The method is designed to operate on three‐dimensional unstructured grids composed of tetrahedral cells. A node‐centered finite element formulation avoids mesh stiffness, and an approximate Riemann solver in the fluid reference frame ensures a stable, upwind formulation. This choice leads to a non‐zero mass flux between control volumes, even though the mesh moves at the fluid velocity, but eliminates volume errors that arise due to the difference between the fluid velocity and the contact wave speed. A monotone piecewise linear reconstruction of primitive variables is used to compute interface unknowns and recover second‐order accuracy. The scheme has been tested on a variety of standard test problems and exhibits first‐order accuracy on shock problems and second‐order accuracy on smooth flows using meshes of up to O(10^6) tetrahedra. Copyright © 2014 John Wiley & Sons, Ltd.

7.
A key choice in the development of arbitrary Lagrangian‐Eulerian solution algorithms is how to move the computational mesh. The most common approaches are to apply smoothing and relaxation techniques or to compute a mesh velocity field that produces smooth mesh displacements. We present a method in which the mesh velocity is specified by the irrotational component of the fluid velocity as computed from a Helmholtz decomposition, and excess compression of mesh cells is treated through a noniterative, local spring‐force model. This approach allows distinct and separate control over rotational and translational modes. The utility of the new mesh motion algorithm is demonstrated on a number of 3D test problems, including problems that involve both shocks and significant amounts of vorticity.

8.
In this paper, the first in a series, the authors have developed and implemented new computational algorithms for improving the scalability of CFD simulations on emerging architectures such as multicore high performance computing (HPC) platforms. These algorithmic developments and implementations are classified into three categories: (i) improved partitioning for multicore platforms, (ii) improved and optimized communication for HPC and (iii) enhanced scalability using computer science based methods. In the first category, the multilevel partitioning strategy was modified to reduce the number of out‐of‐core communications. This resulted in noticeable speedup even for small cases. In the second category, the authors developed a next generation communication procedure optimized for the architecture and the partitioning, which also resulted in noticeable speedups. In the third category, improvements in memory management were implemented. This again resulted in a speedup of nearly 10%. Together, the three algorithmic implementations yielded ideal and at times superlinear scalability up to 3000 processors. In general, the scalability results are very promising and indicate that the approach has great potential for more complicated multidisciplinary problems such as fluid–structure interaction and aeroelastic simulations. Copyright © 2013 John Wiley & Sons, Ltd.

9.
High performance computing (HPC) systems are currently at a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which will prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy represents the whole time-integration algorithm using only three basic algebraic operations: the sparse matrix–vector product (SpMV), a linear combination of vectors and the dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
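The three algebraic kernels named in the abstract can be sketched in a few lines. The version below assumes a CSR (compressed sparse row) matrix layout and plain Python lists; the function names and the explicit diffusion step in the usage example are illustrative assumptions, and the paper's decomposition of the nonlinear operators into two SpMV operations is not reproduced here.

```python
def spmv(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x for a CSR matrix."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def axpy(alpha, x, y):
    """Linear combination of vectors: alpha * x + y."""
    return [alpha * x_i + y_i for x_i, y_i in zip(x, y)]


def dot(x, y):
    """Dot product of two vectors."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


if __name__ == "__main__":
    # 1D discrete Laplacian on 3 nodes, [[-2, 1, 0], [1, -2, 1], [0, 1, -2]].
    row_ptr = [0, 2, 5, 7]
    col_idx = [0, 1, 0, 1, 2, 1, 2]
    values = [-2.0, 1.0, 1.0, -2.0, 1.0, 1.0, -2.0]
    u = [1.0, 2.0, 3.0]
    # One explicit step u_new = u + dt * (L u), built only from the kernels.
    u_new = axpy(0.1, spmv(row_ptr, col_idx, values, u), u)
    print(u_new, dot(u_new, u_new))
```

Restricting a solver to kernels of this kind means that porting it to a new architecture only requires porting these three operations.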

10.
A semi-analytic solution is described for planar radiative shock waves in the equilibrium diffusion (1−T) limit. The solution requires finding numerically the root of a polynomial and integrating a nonlinear ordinary differential equation. This solution may be used as a test problem to verify computer codes that use the equilibrium–diffusion radiation model, or for more advanced radiation models in the optically-thick limit. The structure of the shock profiles is also discussed, including new accurate estimates on the conditions for continuous solutions. We also discuss how the Zel’dovich spike may be estimated from the equilibrium diffusion solution. Finally, results from a computer code are shown to compare well with the semi-analytic solution.
