Similar Documents (20 results)
1.
Recently, an implicit, nonlinearly consistent, energy- and charge-conserving one-dimensional (1D) particle-in-cell (PIC) method has been proposed for multi-scale, full-f kinetic simulations [G. Chen et al., J. Comput. Phys. 230 (18) (2011)]. The method employs a Jacobian-free Newton–Krylov (JFNK) solver, capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle-orbit computations from the field solver, while remaining fully self-consistent. This paper describes a very efficient, mixed-precision hybrid CPU–GPU implementation of the 1D implicit PIC algorithm that exploits this feature. The JFNK solver is kept on the CPU in double precision (DP), while the implicit, charge-conserving, and adaptive particle mover is implemented on a GPU (graphics processing unit) using CUDA in single precision (SP). Performance-oriented optimizations are introduced with the aid of the roofline model. The implicit particle mover is shown to achieve up to 400 GOp/s on an NVIDIA GeForce GTX 580, which corresponds to 25% of the GPU's theoretical peak performance and is about 100 times faster than an equivalent compiler-optimized execution on a single CPU core (Intel Xeon X5460). For the test case chosen, the mixed-precision hybrid CPU–GPU solver is shown to outperform the DP CPU-only serial version by a factor of ~100, without apparent loss of robustness or accuracy, in a challenging long-timescale ion acoustic wave simulation.
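
The central structural idea here, a double-precision JFNK outer solve wrapped around a single-precision inner kernel, can be illustrated with a toy sketch. The code below is not the paper's PIC scheme: it applies the same mixed-precision segregation to an implicit-midpoint oscillator, with `mover_sp` standing in for the SP particle push and SciPy's `newton_krylov` for the DP JFNK solver (the problem, step size and names are illustrative assumptions).

```python
import numpy as np
from scipy.optimize import newton_krylov

# Toy implicit-midpoint step for x'' = -x, state y = (x, v).
# The inner "mover" runs in float32 (stand-in for the SP GPU particle push);
# the residual handed to the DP Jacobian-free Newton-Krylov solver is float64.

dt = 0.05

def mover_sp(y_new, y_old):
    """Single-precision inner kernel: midpoint right-hand side."""
    ym = ((y_new + y_old) * 0.5).astype(np.float32)
    return np.array([ym[1], -ym[0]], dtype=np.float32)

def residual_dp(y_new, y_old):
    """Double-precision nonlinear residual for the implicit step."""
    return y_new - y_old - dt * mover_sp(y_new, y_old).astype(np.float64)

y = np.array([1.0, 0.0])
for _ in range(200):
    y = newton_krylov(lambda z, y0=y: residual_dp(z, y0), y, f_tol=1e-6)
# Implicit midpoint conserves the quadratic energy up to solver/SP round-off.
print("energy drift:", 0.5 * (y[0]**2 + y[1]**2) - 0.5)
```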

2.
We have developed a GPU-based molecular dynamics simulation for the study of flows of fluids with anisotropic molecules such as liquid crystals. An application of the simulation to the study of macroscopic flow (backflow) generation by molecular reorientation in a nematic liquid crystal under an applied electric field is presented. The computations of intermolecular force and torque are parallelized on the GPU using the cell-list method, and an efficient algorithm to update the cell lists is proposed. Important implementation issues for computations that involve a large number of arithmetic operations and large amounts of data on a GPU with limited high-speed memory resources are addressed in detail. Despite the relatively low GPU occupancy in the calculation of intermolecular force and torque, the computation on a recent GPU is about 50 times faster than on a single core of a recent CPU, so that simulations involving a large number of molecules are possible on a personal computer. The GPU-based simulation should allow an extensive investigation of the molecular-level mechanisms underlying various macroscopic flow phenomena in fluids with anisotropic molecules.
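
As a rough illustration of the cell-list bookkeeping used for the force and torque computation (a serial NumPy sketch, not the paper's CUDA implementation; box size, cutoff and particle data are made up):

```python
import numpy as np

def build_cell_list(pos, box, rcut):
    """Assign each particle to a cubic cell of side >= rcut."""
    ncell = max(1, int(box // rcut))          # cells per dimension (assumed >= 3)
    cell_size = box / ncell
    idx = np.floor(pos / cell_size).astype(int) % ncell   # (N, 3) cell indices
    cells = {}
    for p, c in enumerate(map(tuple, idx)):
        cells.setdefault(c, []).append(p)
    return cells, ncell

def neighbor_pairs(pos, box, rcut):
    """Yield particle pairs closer than rcut, scanning only the 27 adjacent cells."""
    cells, ncell = build_cell_list(pos, box, rcut)
    for (cx, cy, cz), plist in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
                    for i in plist:
                        for j in cells.get(nb, []):
                            if j <= i:
                                continue
                            d = pos[i] - pos[j]
                            d -= box * np.round(d / box)   # minimum image
                            if np.dot(d, d) < rcut * rcut:
                                yield i, j

rng = np.random.default_rng(0)
pos = rng.random((200, 3)) * 10.0
pairs = list(neighbor_pairs(pos, box=10.0, rcut=2.5))
print(len(pairs), "pairs within cutoff")
```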

3.
The aim of the present paper is to report our recent results on GPU-accelerated simulations of compressible flows. For the numerical simulation, the adaptive discontinuous Galerkin method with the multidimensional bicharacteristic-based evolution Galerkin operator has been used. For time discretization we have applied the explicit third-order Runge–Kutta method. Evaluation of the genuinely multidimensional evolution operator has been accelerated using the GPU implementation. We obtained a speedup of up to 30 (in comparison to a single CPU core) for the calculation of the evolution Galerkin operator on a typical discretization mesh consisting of 16384 mesh cells.
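
For reference, one explicit third-order Runge-Kutta step of the kind referred to above can be written compactly. The sketch below uses the common strong-stability-preserving (Shu-Osher) variant, and the advection right-hand side is only a placeholder, not the evolution Galerkin operator of the paper.

```python
import numpy as np

def ssp_rk3_step(u, dt, rhs):
    """One explicit SSP third-order Runge-Kutta step for du/dt = rhs(u)."""
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2))

# Example: linear advection du/dt = -a du/dx on a periodic grid, upwind in space.
a, n = 1.0, 200
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
u = np.exp(-200 * (x - 0.5) ** 2)
rhs = lambda v: -a * (v - np.roll(v, 1)) / dx   # first-order upwind flux
for _ in range(100):
    u = ssp_rk3_step(u, 0.4 * dx / a, rhs)
# Periodic upwind fluxes telescope, so the total mass is conserved to round-off.
print("mass change:", u.sum() * dx - (np.exp(-200 * (x - 0.5) ** 2)).sum() * dx)
```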

4.
GPU acceleration of numerical simulations of shock–flame interaction
蒋华, 董刚, 陈霄. 《计算物理》 (Chinese Journal of Computational Physics), 2016, 33(1): 23-29
To assess the computational capability of graphics processing units (GPUs) in computational fluid dynamics, a CPU/GPU heterogeneous parallel approach is used to numerically simulate a typical compressible reactive flow, the interaction between a shock wave and a flame interface. The parallel scheme is optimized, and the influence of different grid resolutions on the computed results and on the acceleration performance is examined. The results show that, compared with conventional message-passing (MPI) parallel computation with 8 threads, the GPU parallel simulations produce the same results as the MPI simulations; the computing time of both methods grows linearly with the number of grid cells, but the GPU computing time is markedly lower than that of MPI. For a small grid (1.6×10^4 cells), the GPU speedup of the average time per time step is 8.6; the speedup decreases as the number of cells increases, but for a larger grid (4.2×10^6 cells) it still reaches 5.9. The GPU-based heterogeneous parallel acceleration algorithm provides a good route to high-resolution, large-scale computations of compressible reactive flows.

5.
In this paper, a finite difference code for Direct and Large Eddy Simulation (DNS/LES) of incompressible flows is presented. This code is an intermediate tool between fully spectral Navier–Stokes solvers (limited to academic geometry through Fourier or Chebyshev representation) and more versatile codes based on standard numerical schemes (typically only second-order accurate). The interest of high-order schemes is discussed in terms of ease of implementation, computational efficiency and accuracy improvement, considered through simplified benchmark problems and practical calculations. The equivalence rules between operations in physical and spectral spaces are efficiently used to solve the Poisson equation introduced by the projection method. It is shown that for the pressure treatment, an accurate Fourier representation can be used for more flexible boundary conditions than periodicity or free-slip. Using the concept of the modified wave number, the incompressibility can be enforced up to the machine accuracy. The benefit offered by this alternative method is found to be very satisfactory, even when a formal second-order error is introduced locally by boundary conditions that are neither periodic nor symmetric. The usefulness of high-order schemes combined with an immersed boundary method (IBM) is also demonstrated despite the second-order accuracy introduced by this wall modelling strategy. In particular, the interest of a partially staggered mesh is exhibited in this specific context. Three-dimensional calculations of transitional and turbulent channel flows emphasize the ability of the present high-order schemes to reduce the computational cost for a given accuracy. The main conclusion of this paper is that finite difference schemes with quasi-spectral accuracy can be very efficient for DNS/LES of incompressible flows, while allowing flexibility in the boundary conditions and ease of code development. Therefore, this compromise fits particularly well for very high-resolution simulations of turbulent flows with relatively complex geometries without requiring heavy numerical developments.
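
The modified-wavenumber idea mentioned above, i.e. solving the pressure Poisson equation in Fourier space with the wavenumber replaced by the one the finite-difference operator actually represents, can be sketched in one periodic dimension. The sketch uses second-order central differences rather than the paper's high-order schemes; grid and right-hand side are illustrative.

```python
import numpy as np

# Solve d2p/dx2 = f on a periodic grid so that the discrete (central-difference)
# Laplacian of p matches f to machine accuracy, using the modified wavenumber
# k' with  k'^2 = (2 - 2*cos(k*dx)) / dx^2  instead of k^2.

n = 64
L = 2 * np.pi
x = np.arange(n) * L / n
dx = L / n
f = np.cos(3 * x)                       # right-hand side

k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
k2_mod = (2.0 - 2.0 * np.cos(k * dx)) / dx**2   # modified wavenumber squared

fhat = np.fft.fft(f)
phat = np.zeros_like(fhat)
nonzero = k2_mod > 1e-14
phat[nonzero] = -fhat[nonzero] / k2_mod[nonzero]    # zero-mean solution
p = np.real(np.fft.ifft(phat))

# Verify: the second-order discrete Laplacian of p reproduces f to round-off.
lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2
print("max residual:", np.abs(lap - f).max())
```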

6.
The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores combined with high memory bandwidth, a recent GPU offers substantial resources for general-purpose computing. First, we apply this new technology to Monte Carlo simulations of the two-dimensional ferromagnetic square-lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a current CPU core. An implementation of the three-dimensional ferromagnetic cubic-lattice Ising model on a GPU is able to generate results up to 35 times faster than on a current CPU core. As a proof of concept we calculate the critical temperature of the 2D and 3D Ising models using finite-size scaling techniques. Theoretical results for the 2D Ising model and previous simulation results for the 3D Ising model can be reproduced.
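
A minimal serial sketch of the checkerboard update (NumPy rather than CUDA; on the GPU each thread updates one site of the active color, which is safe because same-colored sites never neighbor each other). Lattice size and temperature below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
L, beta = 64, 0.44                     # lattice size, inverse temperature
spins = rng.choice([-1, 1], size=(L, L))

ix, iy = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
color = (ix + iy) % 2                  # checkerboard coloring

def sweep(spins):
    for c in (0, 1):                   # update one color at a time
        nb = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0) +
              np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nb          # energy change of flipping each spin
        accept = (color == c) & (rng.random((L, L)) < np.exp(-beta * dE))
        spins = np.where(accept, -spins, spins)
    return spins

for _ in range(200):
    spins = sweep(spins)
print("magnetization per spin:", spins.mean())
```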

7.
A spectral algorithm based on the immersed boundary conditions (IBC) concept is developed for simulations of viscous flows with moving boundaries. The algorithm uses a fixed computational domain with the flow domain immersed inside it. Boundary conditions along the edges of the time-dependent flow domain enter the algorithm in the form of internal constraints. Spectral spatial discretization uses Fourier expansions in the stream-wise direction and Chebyshev expansions in the normal-to-the-wall direction. Up to fourth-order implicit temporal discretization methods have been implemented. It has been demonstrated that the algorithm delivers the theoretically predicted accuracy in both time and space. The performance of various linear solvers employed in the solution process has been evaluated, and a new class of solver that takes advantage of the structure of the coefficient matrix has been proposed. The new solver results in a significant acceleration of computations as well as a substantial reduction in memory requirements.

8.
An immersed boundary method is proposed in the framework of a discrete stream function formulation for incompressible flows. In order to impose the no-slip boundary condition, the forcing term is determined implicitly by solving a linear system. The number of unknowns of the linear system is the same as the number of Lagrangian points representing the body surface, so the extra cost of the force calculation is negligible compared with that of the basic flow solver. In order to handle three-dimensional flows at moderate Reynolds numbers, a parallelized flow solver based on the present method is developed using a domain decomposition strategy. To verify the accuracy of the immersed boundary method proposed in this work, flow problems of different complexity (decaying vortices, flows over stationary and oscillating cylinders and a stationary sphere, and flow over a low-aspect-ratio flat plate) are simulated, and the results are in good agreement with the experimental or computational data in the previously published literature.
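
The implicit forcing step described above, determining the Lagrangian forces from a linear system whose size equals the number of boundary markers, can be illustrated with a small one-dimensional sketch. The regularized delta function, grid, time step and target velocity below are illustrative choices, not the paper's formulation.

```python
import numpy as np

# 1D sketch: enforce a prescribed velocity U_b at a few Lagrangian markers by
# solving a small linear system for the marker forces (size = number of markers),
# as in implicit immersed-boundary forcing.

n, h, dt = 64, 1.0 / 64, 1e-2
xg = np.arange(n) * h                          # Eulerian grid
xb = np.array([0.31, 0.5, 0.66])               # Lagrangian markers
U_b = np.array([0.0, 0.0, 0.0])                # desired boundary velocity

def delta(r, h):
    """A simple 2-point hat regularized delta function (illustrative)."""
    r = np.abs(r) / h
    return np.where(r < 1.0, (1.0 - r) / h, 0.0)

# Interpolation matrix J (markers x grid) and spreading matrix S (grid x markers).
J = delta(xb[:, None] - xg[None, :], h) * h    # rows sum to ~1
S = J.T / h

u_star = np.sin(2 * np.pi * xg)                # intermediate velocity field
A = dt * J @ S                                 # (n_markers x n_markers) system
f = np.linalg.solve(A, U_b - J @ u_star)       # implicit marker forces
u_new = u_star + dt * (S @ f)

print("velocity error at markers:", np.abs(J @ u_new - U_b).max())
```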

9.
张锐, 文立华, 校金友. 《计算物理》 (Chinese Journal of Computational Physics), 2015, 32(3): 299-309
A high-efficiency, high-accuracy GPU parallel computing method for large-scale acoustic boundary element analysis is proposed. Based on the Burton-Miller boundary integral equation, a parallel computing scheme suited to the GPU is derived and a GPU-accelerated version of the conventional boundary element method is implemented. To improve the efficiency of the prototype algorithm, GPU data-caching optimizations are studied. Because the double-precision floating-point performance of GPUs is relatively low, a double-single precision algorithm built on single-precision arithmetic is investigated to reduce numerical error. Numerical examples show that the improved algorithm reaches a GPU utilization of up to 89.8%, with numerical accuracy comparable to using double precision directly, while the computing time is only 1/28 and the GPU memory consumption only half of the double-precision version. On an ordinary PC (8 GB RAM, NVIDIA GeForce 660 Ti), the method can rapidly complete large-scale acoustic boundary element analyses with more than 3 million degrees of freedom, outperforming the fast boundary element method in both speed and memory consumption.
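
The double-single idea, representing one value as an unevaluated sum of two single-precision numbers and using error-free transformations for the arithmetic, can be sketched as follows (Knuth's two-sum; NumPy float32 stands in for the GPU's native single precision, and the accumulation test is only illustrative):

```python
import numpy as np

def two_sum(a, b):
    """Error-free float32 addition: returns (s, e) with s + e == a + b exactly."""
    s = np.float32(a + b)
    v = np.float32(s - a)
    e = np.float32((a - np.float32(s - v)) + (b - v))
    return s, e

def ds_add(x_hi, x_lo, y_hi, y_lo):
    """Add two double-single numbers (hi, lo), renormalizing the result."""
    s, e = two_sum(x_hi, y_hi)
    e = np.float32(e + np.float32(x_lo + y_lo))
    return two_sum(s, e)

# Accumulate 10^5 copies of pi/10^5: plain float32 drifts, double-single does not.
term = np.float64(np.pi) / 1e5
t_hi = np.float32(term)
t_lo = np.float32(term - np.float64(t_hi))     # split the DP value into hi + lo

acc32 = np.float32(0.0)
hi, lo = np.float32(0.0), np.float32(0.0)
for _ in range(10**5):
    acc32 = np.float32(acc32 + t_hi)
    hi, lo = ds_add(hi, lo, t_hi, t_lo)

print("float32 error      :", abs(np.float64(acc32) - np.pi))
print("double-single error:", abs(np.float64(hi) + np.float64(lo) - np.pi))
```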

10.
In this work, local grid refinement is addressed using a nested Cartesian grid formulation. The method is developed for simulating unsteady viscous incompressible flows with complex immersed boundaries. A finite-volume formulation based on globally second-order accurate central-difference schemes is adopted here in conjunction with a two-step fractional-step procedure. The key aspects that needed to be considered in developing such a nested grid solver are the proper imposition of interface conditions on the nested-block boundaries, and the accurate discretization of the governing equations in cells that have a block interface as a control surface. The interpolation procedure adopted in the study allows systematic development of a discretization scheme that preserves the global second-order spatial accuracy of the underlying solver, and as a result a highly efficient and accurate nested-grid discretization method is developed. The proposed nested grid method has been tested extensively through simulations of four different classes of unsteady incompressible viscous flows, thereby demonstrating its performance in the solution of various complex flow-structure interactions. The numerical examples include a lid-driven cavity flow and Pearson vortex problems, flow past a circular cylinder symmetrically installed in a channel, flow past an elliptic cylinder at an angle of attack, and flow past two tandem circular cylinders of unequal diameters. For the numerical simulations of flows past bluff bodies an immersed boundary (IB) method has been implemented, in which the solid object is represented by a distributed body force in the Navier–Stokes equations. The main advantages of the implemented immersed boundary method are that the simulations can be performed on a regular Cartesian grid and applied to multiple nested (Cartesian) structured grid blocks without any difficulty. Through these numerical experiments, the strength of the solver in accurately simulating various complex flows past different forms of immersed boundaries is demonstrated; the nested Cartesian grid method is combined with the fractional-step algorithm to speed up the solution procedure.

11.
KSSOLV (Kohn-Sham Solver) is a MATLAB (Matrix Laboratory) toolbox for solving the Kohn-Sham equations (KS-DFT) in a plane-wave basis. In ground-state KS-DFT calculations, the diagonalization of the Kohn-Sham Hamiltonian within the self-consistent field iteration is usually the most expensive part. To enable personal computers to perform medium-sized KS-DFT calculations with several hundred atoms, this paper proposes a hybrid CPU-GPU programming scheme that accelerates the iterative diagonalization algorithms implemented in KSSOLV by calling MATLAB's built-in Parallel Computing Toolbox. The performance of KSSOLV-GPU is compared on three GPUs: RTX 3090, V100 and A100. The results show that, for a bulk silicon system containing 128 atoms, the hybrid CPU-GPU implementation achieves a speedup of about 10 relative to serial CPU computation. In particular, it also performs well on the latest consumer GPU, the RTX 3090. It is foreseeable that in the near future, with MATLAB's powerful visualization capabilities and GPU acceleration, KSSOLV-GPU will allow routine DFT calculation, analysis and visualization on a personal computer equipped with a consumer GPU, lowering the barrier to entry in the field of materials modeling and computation.

12.
We present a space–time adaptive solver for single- and multi-phase compressible flows that couples average-interpolating wavelets with high-order finite volume schemes. The solver introduces the concept of wavelet blocks, handles large jumps in resolution and employs local time-stepping for efficient time integration. We demonstrate that the inherently sequential wavelet-based adaptivity can be implemented efficiently on multicore computer architectures using task-based parallelism built around these wavelet blocks. We validate our computational method on a number of benchmark problems and we present simulations of shock-bubble interaction at different Mach numbers, demonstrating the accuracy and computational performance of the method.

13.
A global stability analysis method is constructed on the basis of a domain-decomposition Stokes spectral-element algorithm that is free of time-splitting error. A Jacobian-free inexact Newton-Krylov algorithm is used to compute steady solutions of the incompressible Navier-Stokes equations, with the time-advancement step of the Stokes algorithm serving as the preconditioner for the Newton iteration; on this basis, the Arnoldi method is applied to the resulting large-scale eigenvalue problems to analyze the stability of complex flows. The method treats steady and unsteady computations in a unified way, has no time-splitting error, does not require explicit construction of the Jacobian matrix, reduces memory usage and computational cost, and accelerates iterative convergence. Computations of the Kovasznay flow, which has an analytical solution, show that the high-order spectral element method achieves exponentially convergent spectral accuracy. Computations of the various steady solutions of the subcritical symmetrically driven square-cavity flow and their stability analysis verify the feasibility of the method.
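
The two matrix-free building blocks mentioned here, a Jacobian-free Newton-Krylov solve for the steady state and an Arnoldi eigenvalue computation applied to the finite-difference Jacobian action at that state, can be sketched generically. SciPy's `newton_krylov` and ARPACK `eigs` stand in for the solvers, a small 1D Bratu problem stands in for the spectral-element discretization, and no Stokes preconditioner is used; everything below is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import newton_krylov
from scipy.sparse.linalg import LinearOperator, eigs

# Illustrative steady problem F(u) = 0: the 1D Bratu equation
#   u'' + lam * exp(u) = 0,  u(0) = u(1) = 0,  discretized at n interior points.
n, lam = 64, 1.0
h = 1.0 / (n + 1)

def F(u):
    upad = np.concatenate(([0.0], u, [0.0]))
    return (upad[2:] - 2.0 * upad[1:-1] + upad[:-2]) / h**2 + lam * np.exp(u)

# 1) Jacobian-free Newton-Krylov for the steady state (no explicit Jacobian).
u_star = newton_krylov(F, np.zeros(n), f_tol=1e-8)

# 2) Matrix-free Jacobian action J(u*) v by a first-order finite difference,
#    wrapped as a LinearOperator, then Arnoldi (ARPACK) for the rightmost modes.
eps = 1e-7
Jv = lambda v: (F(u_star + eps * v) - F(u_star)) / eps
J = LinearOperator((n, n), matvec=Jv, dtype=float)
vals = eigs(J, k=5, which="LR", ncv=60, tol=1e-8, return_eigenvectors=False)

# Negative real parts indicate that the computed steady state is linearly stable.
print("rightmost eigenvalues:", np.sort(vals.real)[::-1])
```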

14.
We implement a high-order finite-element application, which performs the numerical simulation of seismic wave propagation resulting, for instance, from earthquakes at the scale of a continent or from active seismic acquisition experiments in the oil industry, on a large cluster of NVIDIA Tesla graphics cards using the CUDA programming environment and non-blocking message passing based on MPI. Contrary to many finite-element implementations, ours is implemented successfully in single precision, maximizing the performance of current-generation GPUs. We discuss the implementation and optimization of the code and compare it to an existing, highly optimized implementation in C and MPI on a classical cluster of CPU nodes. We use mesh coloring to efficiently handle summation operations over degrees of freedom on an unstructured mesh, and non-blocking MPI messages in order to overlap the communications across the network and the data transfer to and from the device via PCIe with calculations on the GPU. We perform a number of numerical tests to validate the single-precision CUDA and MPI implementation and assess its accuracy. We then analyze performance measurements and, depending on how the problem is mapped to the reference CPU cluster, we obtain a speedup of 20x or 12x.
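
The mesh coloring mentioned above amounts to graph coloring of the element connectivity: elements of one color share no mesh nodes, so their contributions to the global degrees of freedom can be summed concurrently without atomic operations. A greedy sketch on a made-up connectivity array:

```python
import numpy as np

def color_elements(connectivity):
    """Greedy coloring: elements sharing any node get different colors."""
    n_elem = len(connectivity)
    # Build a node -> elements map to find conflicting (node-sharing) elements.
    node_to_elems = {}
    for e, nodes in enumerate(connectivity):
        for node in nodes:
            node_to_elems.setdefault(node, []).append(e)
    colors = -np.ones(n_elem, dtype=int)
    for e, nodes in enumerate(connectivity):
        used = {colors[other]
                for node in nodes for other in node_to_elems[node]
                if colors[other] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors

# Tiny example: a strip of quadrilateral elements sharing nodes.
connectivity = [(0, 1, 5, 4), (1, 2, 6, 5), (2, 3, 7, 6), (4, 5, 9, 8)]
print(color_elements(connectivity))   # node-sharing neighbors get distinct colors

# During assembly one loops over colors; all elements of a given color write to
# disjoint global degrees of freedom, so the summation needs no atomics.
```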

15.
The direct numerical simulation of receptivity, instability and transition of hypersonic boundary layers requires high-order accurate schemes because lower-order schemes do not have an adequate accuracy level to compute the large range of time and length scales in such flow fields. The main limiting factor in the application of high-order schemes to practical boundary-layer flow problems is the numerical instability of high-order boundary closure schemes on the wall. This paper presents a family of high-order non-uniform grid finite difference schemes with stable boundary closures for the direct numerical simulation of hypersonic boundary-layer transition. By using an appropriate grid stretching, and clustering grid points near the boundary, high-order schemes with stable boundary closures can be obtained. The order of the schemes ranges from first order at the lowest to the global spectral collocation method at the highest. The accuracy and stability of the new high-order numerical schemes are tested by numerical simulations of the linear wave equation and two-dimensional incompressible flat-plate boundary layer flows. The high-order non-uniform-grid schemes (up to 11th order) are subsequently applied to the simulation of the receptivity of a hypersonic boundary layer to free-stream disturbances over a blunt leading edge. The steady and unsteady results show that the new high-order schemes are stable and are able to produce high accuracy for computations of the nonlinear two-dimensional Navier–Stokes equations for wall-bounded supersonic flow.
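
The basic building block of such schemes, finite-difference weights of a given derivative order on an arbitrarily spaced (e.g. wall-clustered) stencil, can be generated by solving the Taylor-series moment conditions directly. The sketch below uses a plain Vandermonde-type solve rather than Fornberg's recursive algorithm, and the stencil and stretching are illustrative:

```python
import numpy as np
from math import factorial

def fd_weights(x_stencil, x0, m):
    """Weights w so that sum_j w_j f(x_j) approximates the m-th derivative at x0."""
    n = len(x_stencil)
    d = np.asarray(x_stencil, dtype=float) - x0
    # Moment conditions: sum_j w_j d_j^k / k! = delta_{k,m} for k = 0..n-1.
    A = np.array([d**k / factorial(k) for k in range(n)])
    b = np.zeros(n)
    b[m] = 1.0
    return np.linalg.solve(A, b)

# Grid points clustered near the wall (x = 0) by a stretching map.
s = np.linspace(0.0, 1.0, 9)
x = 1.0 - np.cos(0.5 * np.pi * s)        # clustered near 0, like boundary-layer grids

# One-sided 5-point first-derivative weights at the wall, tested on f = sin(x).
w = fd_weights(x[:5], x[0], 1)
approx = w @ np.sin(x[:5])
print("one-sided d/dx error at the wall:", abs(approx - np.cos(x[0])))
```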

16.
In this study, the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on clusters of Graphics Processing Units (GPUs) is presented. An all-device (i.e. GPU) computational approach is adopted, in which the entire computation is performed on the GPU, leaving the CPU idle during all stages of the computation, including particle moving, indexing, particle collisions and state sampling. Communication between the GPU and the host is performed only to enable multiple-GPU computation. Results show that the computational expense can be reduced by factors of 15 and 185 when using a single GPU and 16 GPUs, respectively, compared to a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs, as compared to a single GPU, for simulations using 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime are employed to demonstrate the excellent capability of the current parallel DSMC method.

17.
We use a graphics processing unit (GPU) for fast computations of Monte Carlo integrations. Two widely used Monte Carlo integration programs, VEGAS and BASES, are parallelized for running on a GPU. Using W + multi-gluon production processes at the LHC, we test the integrated cross sections and execution times for programs written in FORTRAN running on the CPU and those running on a GPU. The integrated results agree with each other within statistical errors. The programs run about 50 times faster on the GPU than on the CPU.
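
The structure that makes such integrations GPU-friendly, many independent weighted samples followed by a reduction, is easy to sketch. The example below is a plain uniform-sampling Monte Carlo estimate of a toy integral, not the adaptive VEGAS/BASES algorithm; the integrand and sample counts are made up.

```python
import numpy as np

# Estimate I = integral over [0,1]^2 of exp(-10*(x^2 + y^2)) dx dy by Monte Carlo.
# Each sample is independent, so the map (evaluate f) and the reduce (mean)
# parallelize trivially across GPU threads; here NumPy vectorization stands in.

rng = np.random.default_rng(42)

def f(x, y):
    return np.exp(-10.0 * (x * x + y * y))

def mc_integrate(n_samples):
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    vals = f(x, y)
    mean = vals.mean()
    err = vals.std(ddof=1) / np.sqrt(n_samples)   # statistical error estimate
    return mean, err

for n in (10**4, 10**5, 10**6):
    est, err = mc_integrate(n)
    print(f"n = {n:8d}: I = {est:.6f} +/- {err:.6f}")

# Reference: the integral factorizes into two identical 1D Gaussian-type
# integrals, giving I approximately 0.0785, which the estimates approach
# within their statistical errors.
```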

18.
The ideal MHD equations are a central model in astrophysics, and their solution relies upon stable numerical schemes. We present an implementation of a new method, which possesses excellent stability properties. Numerical tests demonstrate that the theoretical stability properties are valid in practice with negligible compromises to accuracy. The result is a highly robust scheme with state-of-the-art efficiency. The scheme’s robustness is due to entropy stability, positivity and properly discretised Powell terms. The implementation takes the form of a modification of the MHD module in the FLASH code, an adaptive mesh refinement code. We compare the new scheme with the standard FLASH implementation for MHD. Results show comparable accuracy to standard FLASH with the Roe solver, but greatly improved efficiency and stability, particularly for high Mach number flows and low plasma β. The tests include 1D shock tubes, 2D instabilities and highly supersonic 3D turbulence. We consider turbulent flows with RMS sonic Mach numbers up to 10, typical of gas flows in the interstellar medium. We investigate both strong initial magnetic fields and magnetic field amplification by the turbulent dynamo starting from extremely high plasma β. The energy spectra show a reasonable decrease in dissipation with grid refinement, and at a resolution of 512³ grid cells we identify a narrow inertial range with the expected power law scaling. The turbulent dynamo exhibits exponential growth of magnetic pressure, with the growth rate higher for solenoidal forcing than for compressive forcing. Two versions of the new scheme are presented, using relaxation-based 3-wave and 5-wave approximate Riemann solvers, respectively. The 5-wave solver is more accurate in some cases, and its computational cost is close to that of the 3-wave solver.

19.
We perform numerical experiments of a dipole crashing into a wall, a generic event in two-dimensional incompressible flows with solid boundaries. The Reynolds number (Re) is varied from 985 to 7880, and no-slip boundary conditions are approximated by Navier boundary conditions with a slip length proportional to Re⁻¹. Energy dissipation is shown to first set up within a vorticity sheet of thickness proportional to Re⁻¹ in the neighborhood of the wall, and to continue as this sheet rolls up into a spiral and detaches from the wall. The energy dissipation rate integrated over these regions appears to converge towards Re-independent values, indicating the existence of energy dissipating structures that persist in the vanishing viscosity limit.

20.
The lattice Boltzmann method, now widely used for a variety of applications, has also been extended to model multiphase flows through different formulations. While already applied to many different configurations in low Weber and Reynolds number regimes, applications to higher Weber/Reynolds numbers or larger density/viscosity ratios are still the topic of active research. In this study, through a combination of a decoupled phase-field formulation, the conservative Allen–Cahn equation, and a cumulant-based collision operator for a low-Mach pressure-based flow solver, we present an algorithm that can be used for higher Reynolds/Weber numbers. The algorithm was validated through a variety of test cases, starting with the Rayleigh–Taylor instability in both 2D and 3D, followed by the impact of a droplet on a liquid sheet. In all simulations, the solver correctly captured the flow dynamics and matched reference results very well. As the final test case, the solver was used to model droplet splashing on a thin liquid sheet in 3D with a density ratio of 1000 and a kinematic viscosity ratio of 15, matching the water/air system at We = 8000 and Re = 1000. Results showed that the solver correctly captured the fingering instabilities at the crown rim and their subsequent breakup, in agreement with experimental and numerical observations reported in the literature.
