1.
This paper presents recent efforts to expand the scalability of the TermoFluids multi-physics Computational Fluid Dynamics (CFD) code, aiming to achieve petascale capacity for a single simulation. We describe the different aspects of the code that we improved in order to run it efficiently on 131,072 CPU cores. This work was developed on the BlueGene/Q Mira supercomputer at the Argonne Leadership Computing Facility, where we obtained feedback at the targeted scale. In summary, this is a practical paper reporting our experience in reaching the petascale paradigm for a single simulation with TermoFluids.
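As a rough aid for reading scaling claims like this one, strong-scaling efficiency is commonly computed as E(p) = T(p_ref)·p_ref / (T(p)·p). A minimal sketch with purely illustrative numbers (not measurements from the paper):

```python
def strong_scaling_efficiency(t_ref, p_ref, t, p):
    """Parallel efficiency relative to a reference run.
    t_ref: wall time on p_ref cores; t: wall time on p cores."""
    return (t_ref * p_ref) / (t * p)

# Hypothetical numbers for illustration only (not values from the paper):
# 80 s on 16,384 cores vs 12 s on 131,072 cores.
print(strong_scaling_efficiency(80.0, 16_384, 12.0, 131_072))  # ~0.83
```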
2.
The implicit lower–upper symmetric Gauss–Seidel (LU-SGS) solver is combined with the line-implicit technique to improve convergence on the very anisotropic grids necessary for resolving boundary layers. The computational fluid dynamics code used is Edge, a Navier–Stokes flow solver for unstructured grids based on a dual-grid, edge-based formulation. Multigrid acceleration is applied to accelerate convergence to steady state. LU-SGS works in parallel and gives better linear scaling with respect to the number of processors than the explicit scheme. The ordering techniques investigated show that node numbering does influence convergence, and that orderings from Delaunay and advancing-front grid generation were among the best tested. 2D Reynolds-averaged Navier–Stokes computations clearly show the efficiency of our novel line-implicit LU-SGS approach, which is four times faster than implicit LU-SGS and line-implicit Runge–Kutta. Implicit LU-SGS for the Euler equations and line-implicit LU-SGS for the Reynolds-averaged Navier–Stokes equations are at least twice as fast as explicit and line-implicit Runge–Kutta, respectively, for 2D and 3D cases. For 3D Reynolds-averaged Navier–Stokes, multigrid did not accelerate convergence and therefore may not be needed.
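To make the solver structure concrete: LU-SGS is built on symmetric Gauss–Seidel sweeps in which only the diagonal is inverted and the off-diagonal couplings are lagged. Below is a minimal scalar sketch of one such sweep; the actual solver operates on the block system of the implicit flow discretisation and, in the line-implicit variant, additionally solves the grid lines in the boundary layer exactly.

```python
import numpy as np

def sgs_sweep(A, b, x):
    """One symmetric Gauss-Seidel sweep (forward then backward) for Ax = b,
    the scalar analogue of the LU-SGS idea: invert only the diagonal and
    lag the off-diagonal couplings, sweeping in both directions."""
    n = len(b)
    for i in range(n):            # forward (lower-triangular) sweep
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    for i in reversed(range(n)):  # backward (upper-triangular) sweep
        x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

# Usage on a small diagonally dominant system:
A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
x = np.zeros(3)
for _ in range(20):
    sgs_sweep(A, b, x)
print(x, np.allclose(A @ x, b, atol=1e-8))
```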
3.
We develop an efficient, parallel gas-kinetic solver for computing both continuum and non-continuum flows over non-Cartesian geometries by utilising the unified gas kinetic scheme (UGKS). UGKS, however, requires the computationally expensive update of a six-dimensional phase space at each time step, restricting its application to canonical laminar problems and simple geometries. In this paper, we demonstrate that the range of applications of UGKS can be widened by parallelising it and combining it with a recently developed Cartesian grid method (UGKS-CGM). We show that both the Cartesian grid methodology and the UGKS parallelisation perform and scale well on a range of numerical test cases, even for very large numbers of cores. Finally, we demonstrate that the solver accurately computes canonical turbulence at low Knudsen numbers. These results show that the parallelised UGKS code can be used to study the non-equilibrium effects of rarefaction on laminar and turbulent non-continuum flows.
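To see where the six-dimensional cost comes from: kinetic schemes evolve a distribution function f(x, v, t), so storage and work scale with the product of the spatial and velocity grids. The toy below is a 1D-in-space, 1D-in-velocity discrete-velocity BGK relaxation step with illustrative parameters; it is not the UGKS flux construction itself, only the phase-space update pattern.

```python
import numpy as np

nx, nv = 200, 32
x = np.linspace(0.0, 1.0, nx)
v = np.linspace(-4.0, 4.0, nv)           # discrete velocity abscissae
dv = v[1] - v[0]
maxwell = np.exp(-v**2 / 2) / np.sqrt(2 * np.pi)
f = maxwell[None, :] * (1 + 0.1 * np.sin(2 * np.pi * x))[:, None]

def bgk_step(f, dt, dx, tau):
    """Transport + relaxation of f(x, v) on a periodic 1D grid."""
    rho = f.sum(axis=1) * dv              # density moment
    u = (f * v).sum(axis=1) * dv / rho    # bulk velocity moment
    # Maxwellian equilibrium per cell (unit temperature for brevity)
    feq = rho[:, None] / np.sqrt(2 * np.pi) \
        * np.exp(-(v[None, :] - u[:, None])**2 / 2)
    # First-order upwind transport in x for each discrete velocity
    fp = np.where(v[None, :] > 0,
                  np.roll(f, 1, axis=0), np.roll(f, -1, axis=0))
    return f - dt / dx * np.abs(v)[None, :] * (f - fp) + dt / tau * (feq - f)

f = bgk_step(f, dt=1e-3, dx=x[1] - x[0], tau=1e-2)
print(f.shape)  # (200, 32): phase-space storage grows as nx * nv
```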
4.
This work presents a parallel numerical strategy for transporting Lagrangian particles in a fluid using a dynamic load balancing strategy. Both the fluid and particle solvers are parallel, with two levels of parallelism: the first level is based on a substructuring technique and uses the Message Passing Interface (MPI) as the communication library; the second level consists of OpenMP pragmas for loop parallelisation at the node level. When dealing with transient flows, there are two main alternatives for coupling these solvers. On the one hand, a single-code approach solves the particle equations once the fluid solution has been obtained at the end of a time step, using the same instance of the same code. On the other hand, a multi-code approach overlaps the transport of the particles with the next time-step solution of the fluid equations, and thus achieves asynchronism; in this case, different codes or two instances of the same code can be used. Both approaches are presented. In addition, a dynamic load balancing library is used on top of the OpenMP pragmas in order to continuously exploit all the resources available at the node level, thus improving the load balance and the efficiency of the parallelisation.
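A minimal sketch of the load-imbalance metric such a strategy targets: when particle work dominates, the cost of a time step is set by the most loaded rank, so the imbalance is the maximum per-rank load over the mean. Function names and numbers below are illustrative, not the API of the paper's library.

```python
def imbalance(loads):
    """Max-over-mean load imbalance; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0

def greedy_rebalance(loads):
    """Move one unit of work at a time from the most- to the least-loaded
    rank until no further improvement is possible; a toy stand-in for what
    a runtime load-balancing layer does across threads or ranks."""
    loads = list(loads)
    while max(loads) - min(loads) > 1:
        loads[loads.index(max(loads))] -= 1
        loads[loads.index(min(loads))] += 1
    return loads

ranks = [900, 120, 60, 20]      # particles owned per rank after advection
print(imbalance(ranks))         # ~3.27: rank 0 dominates the time step
print(greedy_rebalance(ranks))  # [275, 275, 275, 275]
```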
5.
While new power-efficient computer architectures exhibit spectacular theoretical peak performance, they require specific conditions to operate efficiently, which makes porting complex algorithms a challenge. Here, we report results for the semi-implicit method for pressure-linked equations (SIMPLE) and the pressure-implicit with operator splitting (PISO) method implemented on the graphics processing unit (GPU). We examine the advantages and disadvantages of a full port over a partial acceleration of these algorithms on unstructured meshes. We find that the full-port strategy requires adjusting the internal data structures to the new hardware, and we propose a convenient format for storing them on the GPU. Our implementation is validated on standard steady and unsteady problems, and its computational efficiency is checked by comparing its results and run times with those of standard software (OpenFOAM) running on the central processing unit (CPU). The results show that a server-class GPU outperforms a server-class dual-socket multi-core CPU system running essentially the same algorithm by up to a factor of 4.
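One common layout adjustment when porting unstructured-mesh solvers to GPUs is switching from array-of-structures (AoS) to structure-of-arrays (SoA) storage, so that neighbouring threads read contiguous memory (coalesced access). A minimal sketch of the general idea, not necessarily the specific format proposed in the paper:

```python
import numpy as np

n_cells = 8

# AoS: one record per cell -> u, v, p of cell i are adjacent in memory,
# but all the u's are strided, which defeats coalescing on a GPU.
aos = np.zeros(n_cells, dtype=[("u", "f8"), ("v", "f8"), ("p", "f8")])

# SoA: one contiguous array per field -> thread i reading u[i] sits next
# to thread i+1 reading u[i+1], a coalesced access pattern.
soa = {name: np.ascontiguousarray(aos[name]) for name in ("u", "v", "p")}

print(aos.strides, soa["u"].strides)  # (24,) vs (8,): SoA is unit-stride
```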
6.
Many problems in fluid modelling require the efficient solution of highly anisotropic elliptic partial differential equations (PDEs) in 'flat' domains. For example, in numerical weather and climate prediction, an elliptic PDE for the pressure correction has to be solved at every time step in a thin spherical shell representing the global atmosphere. This elliptic solve can be one of the computationally most demanding components of semi-implicit semi-Lagrangian time stepping methods, which are popular because they allow larger model time steps and better overall performance. With increasing model resolution, algorithmically efficient and scalable solvers are essential for running the code under tight operational time constraints. We discuss the theory and practical application of bespoke geometric multigrid preconditioners for equations of this type. The algorithms deal with the strong anisotropy in the vertical direction by using the tensor-product approach originally analysed by Börm and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219–234]. We extend the analysis to three dimensions under slightly weakened assumptions and numerically demonstrate its efficiency for the solution of the elliptic PDE for the global pressure correction in atmospheric forecast models. For this, we compare the performance of different multigrid preconditioners on a tensor-product grid with a semi-structured, quasi-uniform horizontal mesh and a one-dimensional vertical grid. The code is implemented in the Distributed and Unified Numerics Environment (DUNE), which provides an easy-to-use and scalable environment for algorithms operating on tensor-product grids. Parallel scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR supercomputer.
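The core of the tensor-product approach is to treat the dominant vertical coupling exactly, which reduces to one tridiagonal solve per vertical column while the weak horizontal coupling is handled iteratively. A minimal sketch with illustrative sizes and coefficients (not the paper's operator):

```python
import numpy as np
from scipy.linalg import solve_banded

n_cols, nz = 16, 64           # horizontal columns x vertical levels
gamma = 1e4                   # vertical-to-horizontal anisotropy ratio

# 1D vertical operator -gamma * d2/dz2 + I in banded storage
band = np.zeros((3, nz))
band[0, 1:] = -gamma          # superdiagonal
band[1, :] = 2 * gamma + 1.0  # diagonal
band[2, :-1] = -gamma         # subdiagonal

def vertical_line_relax(rhs):
    """Exact vertical solve in every column: one tridiagonal system per
    column, the smoother at the heart of the tensor-product multigrid."""
    out = np.empty_like(rhs)
    for j in range(n_cols):
        out[j] = solve_banded((1, 1), band, rhs[j])
    return out

rhs = np.random.default_rng(0).standard_normal((n_cols, nz))
u = vertical_line_relax(rhs)
print(u.shape)  # (16, 64)
```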
7.
We present a parallel multi-implicit time integration scheme for the advection-diffusion-reaction systems arising from the equations governing low-Mach-number combustion with complex chemistry. Our strategy employs parallelisation across the method to accelerate the serial Multi-Implicit Spectral Deferred Correction (MISDC) scheme used to couple the advection, diffusion, and reaction processes. In our approach, the diffusion solves and the reaction solves are performed concurrently by different processors. Our analysis shows that the proposed parallel scheme is stable for stiff problems and that its sweeps converge to the fixed-point solution at a faster rate than serial MISDC. We present numerical examples demonstrating that the new algorithm is high-order accurate in time and achieves a parallel speedup over serial MISDC.
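For reference, the backbone of an SDC sweep on the scalar model problem y' = λy looks as follows. MISDC splits λ into advection, diffusion, and reaction parts with separate implicit solves, and the paper's contribution is running the diffusion and reaction solves concurrently; both refinements are omitted in this single-implicit sketch.

```python
import numpy as np

def sdc_sweep(y, lam, t):
    """One implicit SDC correction sweep over substep nodes t[0..M];
    y holds the previous iterate's node values and is updated in place."""
    f_old = lam * y.copy()
    for m in range(len(t) - 1):
        dt = t[m + 1] - t[m]
        # node-to-node quadrature of the old right-hand side (trapezoid)
        quad = 0.5 * dt * (f_old[m] + f_old[m + 1])
        # backward-Euler-type implicit correction solve (scalar: divide)
        y[m + 1] = (y[m] - dt * f_old[m + 1] + quad) / (1.0 - dt * lam)
    return y

lam, T, M = -5.0, 0.2, 4
t = np.linspace(0.0, T, M + 1)
y = np.full(M + 1, 1.0)        # crude provisional iterate at the nodes
for sweep in range(6):         # each sweep raises the formal order
    y = sdc_sweep(y, lam, t)
# Approaches exp(lam*T); the fixed point is the trapezoid-rule solution.
print(y[-1], np.exp(lam * T))
```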