
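Performance Optimization of GPU Parallel Computing for the Three-Dimensional Pseudopotential MRT LBM Model
Cite this article: PENG Hao, SHAN Minglei, ZHU Changping, YAO Cheng. Performance Optimization of GPU Parallel Computing for the Three-Dimensional Pseudopotential MRT LBM Model [J]. 计算物理 (Chinese Journal of Computational Physics), 2018, 35(5): 554-562.
Authors: PENG Hao  SHAN Minglei  ZHU Changping  YAO Cheng
Affiliations: 1. Changzhou Key Laboratory of Sensor Networks and Environmental Sensing and Jiangsu Key Laboratory of Power Transmission and Distribution Equipment Technology, Hohai University, Changzhou 213022, China; 2. Jiangsu Provincial Collaborative Innovation Center of World Water Valley and Water Ecological Civilization, Nanjing 211100, China
Funding: Supported by the National Key Research and Development Program of China (2016YFC0401606), the Key Research and Development Program of Jiangsu Province (BE2016056), and the Natural Science Foundation of Jiangsu Province (SBK2014043338)
Abstract (translated from the Chinese): In the pseudopotential model of the lattice Boltzmann method, the computations between lattice nodes are not fully local. Parallel execution therefore needs more global memory reads and writes, more registers, and more thread synchronization, which lowers GPU parallel efficiency. To address these limitations, this paper takes a multi-relaxation-time pseudopotential model on the three-dimensional fifteen-velocity (D3Q15) lattice and uses gas-liquid phase separation as the test case. Global memory read and write efficiency is improved through coalesced access. A "directional transfer" algorithm is proposed so that nodes on the lattice boundary obtain data from neighboring nodes more efficiently. Finally, the influence of various factors on computational efficiency under different resource allocations is explored, and a method for choosing the optimal resource allocation is summarized.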

Keywords: LBM  pseudopotential model  GPU  parallel computing  performance optimization
Received: 2017-05-17
Revised: 2017-07-21
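
The non-locality described in the abstract can be illustrated with a minimal pure-Python sketch of a Shan-Chen-type interaction force on a D3Q15 lattice. This is not the authors' implementation: the function name `interaction_force`, the periodic wrap-around, the interaction strength `G`, and the use of the standard D3Q15 lattice weights (1/9 for axis links, 1/72 for diagonal links) as force weights are all assumptions made for illustration. The point is only that the force at one node reads the pseudopotential at 14 neighboring nodes, which is what forces extra memory traffic and synchronization on a GPU.

```python
import itertools

# D3Q15 non-rest velocities: 6 axis-aligned and 8 cube-diagonal directions.
AXIS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
DIAG = list(itertools.product((1, -1), repeat=3))

# Assumption: the standard D3Q15 lattice weights are reused as force weights.
LINKS = [(e, 1.0 / 9.0) for e in AXIS] + [(e, 1.0 / 72.0) for e in DIAG]


def interaction_force(psi, x, y, z, G=-1.0):
    """Shan-Chen-type force at node (x, y, z) of a periodic grid psi[x][y][z].

    F = -G * psi(x) * sum_i w_i * psi(x + e_i) * e_i
    """
    nx, ny, nz = len(psi), len(psi[0]), len(psi[0][0])
    acc = [0.0, 0.0, 0.0]
    for (ex, ey, ez), w in LINKS:
        # Non-local step: each term reads a *neighboring* node's value.
        p = psi[(x + ex) % nx][(y + ey) % ny][(z + ez) % nz]
        acc[0] += w * p * ex
        acc[1] += w * p * ey
        acc[2] += w * p * ez
    scale = -G * psi[x][y][z]
    return tuple(scale * a for a in acc)


def uniform_grid(n, value):
    """Helper (hypothetical): n x n x n grid filled with one value."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


# A uniform field gives (numerically) zero force by lattice symmetry.
psi = uniform_grid(4, 1.0)
print(interaction_force(psi, 0, 0, 0))

# Changing only a neighbor changes the force at (0, 0, 0).
psi[1][0][0] = 2.0
print(interaction_force(psi, 0, 0, 0))
```

In a GPU kernel the same neighbor reads cross thread-block boundaries, which is the situation the abstract's "directional transfer" algorithm targets; the abstract does not give that algorithm's details, so it is not sketched here.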

Performance Optimization of 3D Pseudopotential Multi-Relaxation-Time Lattice Boltzmann Model on GPU
PENG Hao, SHAN Minglei, ZHU Changping, YAO Cheng. Performance Optimization of 3D Pseudopotential Multi-Relaxation-Time Lattice Boltzmann Model on GPU [J]. Chinese Journal of Computational Physics, 2018, 35(5): 554-562.
Authors:PENG Hao  SHAN Minglei  ZHU Changping  YAO Cheng
Institution: 1. Changzhou Key Laboratory of Sensor Networks and Environmental Sensing, Jiangsu Key Laboratory of Power Transmission and Distribution Equipment Technology, Hohai University, Changzhou 213022, China; 2. Jiangsu Provincial Collaborative Innovation Center of World Water Valley and Water Ecological Civilization, Nanjing 211100, China
Abstract: The pseudopotential model of the lattice Boltzmann method is partially non-local, because the pseudopotential calculation couples neighboring lattice nodes, and this requires thread synchronization in a parallel implementation. In addition, the model uses a large number of registers and spends considerable time on global memory accesses during the calculation. Together these effects lower computational efficiency. In this paper, a multi-relaxation-time (MRT) 3D pseudopotential model on the D3Q15 lattice is adopted as an example to investigate the performance of GPU-based parallel computing. To address the limitations of parallel computing for the pseudopotential model, the efficiency of global memory reads and writes is improved by using coalesced access. To improve the efficiency with which nodes on the lattice boundary retrieve data from their neighbors, a "Directional Transfer" algorithm is proposed. The role of computing resource configuration is investigated with different block sizes, and an optimal resource configuration scheme is obtained.
Keywords:LBM  pseudopotential model  GPU  parallel computing  performance optimization  
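
As a rough illustration of why memory layout matters for the coalesced access mentioned in the abstract, the sketch below compares two common ways of storing the 15 distribution functions of a D3Q15 lattice in one linear array. The abstract does not state which layout the authors use; the function names (`index_aos`, `index_soa`, `warp_stride`), the one-thread-per-node mapping, and the warp size of 32 are assumptions for illustration only.

```python
Q = 15  # number of discrete velocities in D3Q15


def index_aos(node, q, n_nodes):
    """Array-of-structures: the 15 distributions of one node are contiguous."""
    return node * Q + q


def index_soa(node, q, n_nodes):
    """Structure-of-arrays: one contiguous array per velocity direction."""
    return q * n_nodes + node


def warp_stride(index_fn, q, n_nodes, warp_size=32):
    """Element stride between consecutive threads of one warp.

    Assumes thread t handles node t and every thread reads direction q.
    """
    addrs = [index_fn(t, q, n_nodes) for t in range(warp_size)]
    strides = {b - a for a, b in zip(addrs, addrs[1:])}
    assert len(strides) == 1  # both layouts give a constant stride
    return strides.pop()


n_nodes = 64 * 64 * 64
print("AoS stride:", warp_stride(index_aos, 3, n_nodes))
print("SoA stride:", warp_stride(index_soa, 3, n_nodes))
```

With the array-of-structures layout, neighboring threads touch addresses 15 elements apart, so one warp-wide read spans many memory segments. With the structure-of-arrays layout the stride is 1, so the same read covers one contiguous run of addresses, which is the access pattern GPU hardware can combine into few memory transactions.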
This article is indexed in CNKI and other databases.