Accelerating fully resolved simulation of particle-laden flows on heterogeneous computer architectures
Institution:1. International Joint Laboratory on Clean Energy Science and Technology, Beijing Key Laboratory of Process Fluid Filtration and Separation, College of Mechanical and Transportation Engineering, China University of Petroleum-Beijing, Beijing, 102249, China;2. School of Chemical and Process Engineering, University of Leeds, Leeds, LS2 9JT, UK
Abstract:An efficient computing framework, PFlows, for fully resolved direct numerical simulation of particle-laden flows was accelerated on NVIDIA Graphics Processing Units (GPUs) and GPU-like accelerator (DCU) cards. The framework couples the lattice Boltzmann method (LBM) for fluid flow with the immersed boundary method for fluid-particle interaction and the discrete element method for particle collisions, using two fixed Eulerian meshes and one moving Lagrangian point mesh, respectively. All parts are accelerated with a fine-grained parallelism technique using CUDA on GPUs, and further using HIP on DCU cards: the calculation for each fluid grid, each immersed boundary point, each particle motion, and each pair-wise particle collision is handled by one compute thread. Coalesced memory access to the LBM distribution functions, stored in a Structure-of-Arrays data layout, maximizes utilization of the hardware bandwidth. A parallel reduction in shared memory over the immersed boundary point data reduces global-memory traffic when integrating the hydrodynamic force on each particle. MPI is further used for computing on heterogeneous architectures with multiple CPUs and GPUs/DCUs, and communication between adjacent processors is hidden by overlapping it with computation. Two benchmark cases, a pure fluid flow and a particle-laden flow, were conducted for code validation. On a single accelerator, a V100 GPU achieves a 7.1–11.1× speedup and a single DCU a 5.6–8.8× speedup over a single 32-core Xeon CPU. On multiple accelerators, the parallel efficiency is 0.5–0.8 for weak scaling and 0.68–0.9 for strong scaling on up to 64 DCU cards, even for a dense flow (φ = 20%). The peak performance reaches 179 giga lattice updates per second (GLUPS) on 256 DCU cards with 1 billion grid points and 1 million particles.
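The Structure-of-Arrays layout mentioned above is what makes the distribution-function accesses coalesce: when each thread handles one fluid grid cell, consecutive threads should touch consecutive addresses. A minimal sketch of the indexing idea, in plain C with hypothetical names (`soa_index`, `aos_index`, a D3Q19 lattice) since the actual PFlows data structures are not given in the abstract:

```c
#include <stddef.h>

/* Hypothetical D3Q19 lattice: Q discrete velocities per cell. */
#define Q 19

/* Structure-of-Arrays: the distribution functions are stored as Q
 * contiguous planes of ncells values each.  Threads assigned to
 * consecutive cells (cell, cell+1, ...) then read consecutive
 * addresses within a plane, which a GPU can coalesce into a single
 * wide memory transaction. */
static inline size_t soa_index(size_t q, size_t cell, size_t ncells) {
    return q * ncells + cell;
}

/* Array-of-Structures, for comparison: all Q populations of one cell
 * are adjacent, so consecutive threads stride by Q elements and the
 * accesses cannot be coalesced. */
static inline size_t aos_index(size_t q, size_t cell) {
    return cell * Q + q;
}
```

With SoA, neighbouring cells differ by 1 in the index for a fixed velocity direction q; with AoS they differ by Q, which is the access pattern the paper's layout choice avoids.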
Finally, a large-scale simulation of a gas-solid flow with 1.6 billion grid points and 1.6 million particles was conducted using only 32 DCU cards. This simulation shows that the present framework is promising for large-scale simulations of particle-laden flows in the upcoming exascale computing era.
Keywords:Lattice Boltzmann method (LBM)  Immersed boundary method (IBM)  Particle-laden flows  Heterogeneous acceleration  Graphics Processing Units (GPU)  Giga lattice updates per second (GLUPS)  Particle-resolved direct numerical simulation (PR-DNS)  Arbitrary Lagrangian-Eulerian (ALE)
This article is indexed in ScienceDirect and other databases.