1.
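An efficient, high-accuracy GPU parallel computing method is proposed for large-scale acoustic boundary element analysis. Starting from the Burton-Miller boundary integral equation, a parallel formulation suited to the GPU is derived and a GPU-accelerated version of the conventional boundary element method is implemented. To improve the efficiency of this prototype algorithm, GPU data-caching optimizations are studied. Because GPUs have relatively low double-precision floating-point throughput, a double-single precision algorithm built on single-precision floating-point arithmetic is studied to reduce numerical error. Numerical examples show that the improved algorithm reaches a GPU utilization efficiency of up to 89.8%, with numerical accuracy comparable to using double precision directly, while needing only 1/28 of the computation time and half of the GPU memory. On an ordinary PC (8 GB RAM, NVIDIA GeForce 660 Ti graphics card) the method quickly completes large-scale acoustic boundary element analyses with more than 3 million degrees of freedom, and both its computation speed and its memory consumption are better than those of the fast boundary element method.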
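The abstract above does not say how its double-single arithmetic is built. As a rough illustration only, the sketch below shows one common construction: a value is stored as an unevaluated sum of two single-precision numbers (hi, lo), and additions use Knuth's error-free two-sum. Single precision is emulated in plain Python by rounding through `struct`; all function names here are hypothetical and are not taken from the paper.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float):
    """Knuth's error-free addition in emulated single precision.

    Returns (s, e) with s = fl(a + b) and a + b = s + e exactly.
    """
    s = f32(a + b)
    v = f32(s - a)
    e = f32(f32(a - f32(s - v)) + f32(b - v))
    return s, e


def ds_from_float(x: float):
    """Split a double into a (hi, lo) pair of single-precision values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def ds_add(x, y):
    """Add two double-single numbers given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])      # exact sum of the high parts
    e = f32(e + f32(x[1] + y[1]))   # fold in the low parts
    return two_sum(s, e)            # renormalize into (hi, lo)


def ds_to_float(x) -> float:
    """Recombine a (hi, lo) pair into a Python float."""
    return x[0] + x[1]


if __name__ == "__main__":
    # Accumulate 0.1 ten thousand times in plain single and in double-single.
    single = 0.0
    ds = (0.0, 0.0)
    tenth = ds_from_float(0.1)
    for _ in range(10000):
        single = f32(single + f32(0.1))
        ds = ds_add(ds, tenth)
    print("single precision:", single)
    print("double-single   :", ds_to_float(ds))
```

The pair costs the same storage as one double, so the memory saving quoted in the abstract must come from elsewhere in that implementation; the sketch only illustrates how single-precision hardware can recover most of the lost digits.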
2.
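In contrast to conventional synchronous parallel strategies, this work operates within an asynchronous parallel computing framework. The most commonly used total variation (TV) minimization reconstruction model is recast as a fixed-point iteration problem and solved with the asynchronous alternating direction method (ADM), yielding an asynchronous ADM iterative reconstruction algorithm based on the TV minimization model, namely the asynchronous alternating direction total variation minimization algorithm (Async-ADTVM). The algorithm is tested on a graphics processing unit (GPU) cluster using the Message Passing Interface, which further improves the computational efficiency of the original iterative reconstruction algorithm based on the TV minimization model. Experiments show that the algorithm is slightly more accurate than ADTVM and, when the GPUs differ in performance, achieves a higher speedup than conventional multi-GPU acceleration strategies.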
3.
4.
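An efficient accelerated implementation of tight-binding time-dependent density functional theory on multicore and GPU systems is presented and applied to excited-state electronic structure calculations for systems with hundreds to thousands of atoms. The program uses sparse matrices and OpenMP parallelization to speed up construction of the Hamiltonian matrix, while the most time-consuming part, the ground-state diagonalization, is accelerated on the GPU in double precision. GPU acceleration of the ground state achieves a speedup of 8.73 while preserving accuracy. For the excited states, a Krylov subspace iterative algorithm combined with OpenMP parallelization and GPU acceleration is used to solve the large-scale TDDFT matrix for its eigenvalues and eigenvectors, greatly reducing the number of iterations and the total solution time. With the matrix-vector products accelerated on the GPU, the Krylov algorithm converges quickly, giving a speedup of 206 over a program that uses a conventional algorithm with CPU parallelization. Calculations on a series of small and large molecular systems show that, compared with the first-principles CIS method and time-dependent density functional methods, the program obtains reasonable and accurate results at very low computational cost.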
5.
6.
7.
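GPU acceleration of numerical simulation of shock-flame interface interaction. Total citations: 1, self-citations: 0, citations by others: 1
To examine the capability of the graphics processing unit (GPU) in computational fluid dynamics, a typical compressible reacting flow, the interaction of a shock wave with a flame interface, is simulated with a CPU/GPU heterogeneous parallel approach. The parallel scheme is optimized, and the effect of grid resolution on the results and on the acceleration performance is examined. Compared with a conventional message-passing MPI computation using 8 threads, the GPU simulation gives the same results. The computation time of both approaches grows linearly with the number of grid cells, but the GPU time is markedly lower than the MPI time. For a small grid (1.6×10^4 cells), the GPU speedup in average time per time step is 8.6; the speedup drops somewhat as the grid grows, but still reaches 5.9 for a larger grid (4.2×10^6 cells). The GPU-based heterogeneous parallel algorithm offers a good route to high-resolution, large-scale computation of compressible reacting flows.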
8.
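Acta Optica Sinica (《光学学报》), 2016, (8)
Sparse representation is a promising way to represent image information and has been applied to image target detection. Solving for the sparse coefficients with the orthogonal matching pursuit (OMP) algorithm is computationally expensive and cannot meet the demands of fast processing. The recursive idea of the Kalman filter is therefore introduced, and a fast OMP (FastOMP) algorithm for computing sparse coefficients is proposed. Using the Hermitian lemma, the current information is updated from the state at the previous step, which avoids recomputing high-dimensional matrix data. To improve execution efficiency, a parallel computing method based on GPU/CUDA (graphics processing unit/compute unified device architecture) is proposed, which makes full use of the GPU's parallel computing capability and speeds up FastOMP. Experimental results show that, compared with the conventional OMP algorithm, FastOMP substantially shortens computation time and improves detection accuracy.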
9.
10.
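With the rapid development of computer science and technology, real-time image processing is increasingly used in embedded systems, yet conventional hardware offers limited parallelism because of its architecture. To address the need for high-performance image computation in video surveillance, machine vision, video compression, medical image analysis, and similar fields, a high-performance image processing solution is proposed that combines the OpenCL software model with an FPGA in a heterogeneous configuration and implements image display and OpenCL acceleration. Taking the Sobel edge detection algorithm as the object of study, the parallelism of the algorithm is analyzed and the kernel is accelerated with OpenCL in the system. Performance tests compare the result with the basic ARM platform and with the OpenCL shared-memory acceleration mechanism to study the acceleration effect. The experimental data show that, for images of different resolutions, the OpenCL acceleration subsystem delivers roughly a 100-fold performance improvement over software processing on the on-chip ARM hard core for the same function.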
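For readers unfamiliar with the Sobel operator named in this abstract, the following is a minimal pure-Python sketch of the per-pixel computation. It is not the paper's OpenCL kernel: the function name, the |Gx| + |Gy| magnitude approximation, and the zeroed border are assumptions made for illustration. Each output pixel depends only on its 3×3 neighborhood, which is why the algorithm maps naturally onto one work-item per pixel.

```python
def sobel_magnitude(img):
    """Return the |Gx| + |Gy| Sobel response for a grayscale image.

    img is a list of rows of numbers. Border pixels are left at 0
    (an assumption; other border policies are equally valid).
    """
    kx = [[-1, 0, 1],
          [-2, 0, 2],
          [-1, 0, 1]]       # horizontal gradient kernel
    ky = [[-1, -2, -1],
          [0, 0, 0],
          [1, 2, 1]]        # vertical gradient kernel
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            # Accumulate both kernels over the 3x3 neighborhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = img[y + dy][x + dx]
                    gx += kx[dy + 1][dx + 1] * p
                    gy += ky[dy + 1][dx + 1] * p
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    # A vertical step edge: dark left half, bright right half.
    step = [[0, 0, 10, 10] for _ in range(4)]
    for row in sobel_magnitude(step):
        print(row)
```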
11.
12.
Changsheng Huang, Baochang Shi, Nanzhong He & Zhenhua Chai. Advances in Applied Mathematics and Mechanics, 2015, 7(1): 1-12
The lattice Boltzmann method (LBM) can gain a large performance benefit from graphics processing unit (GPU) computing, so GPU-based or multi-GPU-based LBM is a promising and competent candidate for the study of large-scale fluid flows. However, the multi-GPU lattice Boltzmann algorithm has not been studied extensively, especially for simulations of flow in complex geometries. In this paper, by coupling with the message passing interface (MPI) technique, we present an implementation of multi-GPU LBM for fluid flow through porous media, together with some optimization strategies based on the data structure and layout, which noticeably reduce memory access and completely hide the communication time. The performance of the algorithm is then tested on a one-node cluster equipped with four Tesla C1060 GPU cards, where up to 1732 MFLUPS is achieved for the Poiseuille flow and a nearly linear speedup with the number of GPUs is also observed.
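The abstract does not state which lattice or collision operator the authors use. Purely as an illustration of the per-node work that an LBM code parallelizes, and of the MFLUPS (million fluid lattice updates per second) figure it reports, here is a sketch of a single-node D2Q9 BGK collision. The D2Q9/BGK choice and every name below are assumptions.

```python
# D2Q9 lattice: weights and discrete velocities (standard values).
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for w, (cx, cy) in zip(W, C):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def bgk_collide(f, tau):
    """Relax the nine populations of one node toward equilibrium (BGK)."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


def mflups(nx, ny, steps, seconds):
    """Million lattice updates per second for an nx-by-ny grid."""
    return nx * ny * steps / seconds / 1e6


if __name__ == "__main__":
    f = equilibrium(1.0, 0.1, 0.0)
    f[1] += 0.01                      # push the node out of equilibrium
    post = bgk_collide(f, tau=0.8)
    print("mass before/after:", sum(f), sum(post))
    print("MFLUPS:", mflups(1000, 1000, 10, 1.0))
```

Because the collision touches only the populations of one node, every node can be updated by its own GPU thread; the streaming step that follows is where the data layout and inter-GPU communication discussed in the abstract come into play.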
13.
Tobias Preis, Peter Virnau, Wolfgang Paul, Johannes J. Schneider. Journal of Computational Physics, 2009, 228(12): 4468-4477
The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores combined with high memory bandwidth, a recent GPU offers substantial resources for general-purpose computing. First, we apply this new technology to Monte Carlo simulations of the two-dimensional ferromagnetic square-lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a current CPU core. An implementation of the three-dimensional ferromagnetic cubic-lattice Ising model on a GPU is able to generate results up to 35 times faster than on a current CPU core. As a proof of concept we calculate the critical temperature of the 2D and 3D Ising models using finite-size scaling techniques. Theoretical results for the 2D Ising model and previous simulation results for the 3D Ising model can be reproduced.
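The checkerboard idea mentioned in this abstract can be sketched on the CPU as follows. This is a plain Metropolis version written for illustration, not the authors' CUDA variant; it assumes coupling J = 1, k_B = 1, periodic boundaries, and an even lattice size, and the function name is hypothetical. Sites of one color have neighbors only of the other color, so all sites of a color can be updated independently, which on a GPU would mean one thread per site.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a 2D Ising lattice in checkerboard order.

    spins: square list of lists with entries +1 / -1 (even side length).
    beta:  inverse temperature 1/T (J = 1, k_B = 1 assumed).
    rng:   a random.Random instance.
    """
    n = len(spins)
    for color in (0, 1):
        # Every site of this color could be processed in parallel.
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != color:
                    continue
                nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j]
                      + spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
                d_e = 2 * spins[i][j] * nb   # energy change if flipped
                if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                    spins[i][j] = -spins[i][j]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 16
    lattice = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
    for _ in range(200):
        checkerboard_sweep(lattice, beta=0.6, rng=rng)
    m = abs(sum(sum(row) for row in lattice)) / (n * n)
    print("|magnetization| per spin:", m)
```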
14.
In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA (Compute Unified Device Architecture) is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and the CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model.
15.
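Common image-based flame detection algorithms extract a single feature of the flame in the image, or an effective combination of such features, as the basis for recognition; they need large numbers of training samples for learning and parameter optimization, and their recognition rate depends heavily on feature selection. Considering the overall characteristics of flame, this paper proposes an image-based fire detection method that combines a color model with a sparse representation model. A color model is first built in HIS space to preprocess the fire image and extract suspect regions. A sparse representation model is then established, with feature dictionaries for flames and flame-like objects constructed by principal component analysis. Finally, l1-minimization is used to compute the minimum approximation residual between test samples and training samples, which classifies flames and interfering objects. Experimental results show that the method improves both the classification accuracy and the recognition speed for fire images.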
16.
A new lattice Bhatnagar-Gross-Krook (LBGK) model for a class of generalized Burgers equations is proposed. It is a general LBGK model for nonlinear Burgers equations with a source term in a space of arbitrary dimension. The linear stability of the model is also studied. The model is numerically tested on three problems in spaces of different dimension, and the numerical results are compared with either analytic solutions or numerical results obtained by other methods. Satisfactory results are obtained by the numerical simulations.
17.
The lattice Boltzmann method is a recently developed numerical scheme for simulating a variety of physical systems. In this paper a new lattice Bhatnagar-Gross-Krook (LBGK) model for two-dimensional incompressible magnetohydrodynamics (IMHD) is presented. The model is an extension of a hydrodynamic lattice BGK model with 9 velocities on a square lattice, resulting in a model with 17 velocities. Most of the existing LBGK models for MHD can be viewed as compressible schemes used to simulate incompressible flows. The compressible effect might lead to some undesirable errors in numerical simulations. In our model the compressible effect has been overcome successfully. The model is then applied to the Hartmann flow, giving reasonable results.
18.
MA Chang-Feng, SHI Bao-Chang, CHEN Xing-Wang. Communications in Theoretical Physics (《理论物理通讯》), 2005, 44(5): 917-920
The lattice Boltzmann method is a recently developed numerical scheme for simulating a variety of physical systems. In this paper a new lattice Bhatnagar-Gross-Krook (LBGK) model for two-dimensional incompressible magnetohydrodynamics (IMHD) is presented. The model is an extension of a hydrodynamic lattice BGK model with 9 velocities on a square lattice, resulting in a model with 17 velocities. Most of the existing LBGK models for MHD can be viewed as compressible schemes used to simulate incompressible flows. The compressible effect might lead to some undesirable errors in numerical simulations. In our model the compressible effect has been overcome successfully. The model is then applied to the Hartmann flow, giving reasonable results.
19.
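Conventional hyperspectral remote sensing image classification algorithms concentrate on spectral information. As the spatial resolution of hyperspectral images increases, ground objects of the same class tend to cluster spatially, so using spatial characteristics effectively in the classification algorithm is critical to improving accuracy. However, while high resolution provides this spatial clustering, it also makes the differences at the edges between different ground objects more pronounced. If the spectral information of neighboring pixels is introduced directly, without screening those pixels, into a joint spatial-spectral sparse representation for image segmentation, the classification error is large and convergence slows considerably. The spectral angle is therefore introduced into the joint spatial-spectral sparse representation framework, and a joint spatial-spectral sparse representation classification algorithm based on neighborhood segmentation is proposed. The algorithm uses the spectral angle to compute the spatial similarity of adjacent pixels, discards neighboring pixels with low similarity, treats neighboring pixels with high similarity as the same ground object, and introduces them into the joint spatial-spectral sparse representation model. The model is solved with the joint subspace pursuit and joint orthogonal matching pursuit operators, and classification follows the minimum reconstruction error criterion. Simulation experiments on typical AVIRIS and ROSIS spectral image data show that, as the spectral angle segmentation threshold increases, classification accuracy rises steadily both for complex hyperspectral images and for hyperspectral images with smooth regions, which demonstrates the need for neighborhood segmentation in joint spatial-spectral sparse representation classification.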
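The neighbor-screening step described in this abstract relies on the spectral angle between two pixel spectra. A minimal sketch of that measure follows, together with a simple filter that keeps only the neighbors close to the center pixel. The function names are hypothetical, and treating the threshold as a maximum angle in radians is an assumption, since the abstract does not say how its threshold is defined.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra given as equal-length sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def similar_neighbors(center, neighbors, max_angle):
    """Keep only the neighbor spectra within max_angle of the center spectrum."""
    return [p for p in neighbors if spectral_angle(center, p) <= max_angle]


if __name__ == "__main__":
    center = [1.0, 2.0, 3.0]
    window = [[2.0, 4.0, 6.0],    # same shape, brighter: angle near 0
              [3.0, 2.0, 1.0]]    # different shape: large angle
    print(similar_neighbors(center, window, max_angle=0.1))
```

The spectral angle ignores overall brightness, so a pixel of the same material under different illumination still counts as similar; only the retained neighbors would then enter the joint sparse model.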
20.
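Two-dimensional manifolds of nonlinear systems usually have complex geometry and carry rich dynamical information, so computing and visualizing them inevitably involves a large amount of numerical computation. Carrying out these computations efficiently is therefore the key problem. In view of the current trend toward heterogeneous computers (with multicore CPUs and general-purpose GPUs), this paper proposes a fast manifold computation method for the new generation of computing platforms that balances accuracy and generality. The algorithm splits the work into two parts, orbit extension and triangle generation: the former is computationally heavy but uniform and suits the GPU, while the latter is light but intricate and suits the CPU. Computation of the stable manifold of the origin of the Lorenz system shows that the algorithm fully exploits the combined performance of the heterogeneous platform and can substantially increase the computation speed…
Keywords:
unstable manifold
manifold computation
heterogeneous computing
Lorenz system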
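The orbit-extension step named in this abstract amounts to integrating many independent trajectories of the system. As an illustration only, the sketch below advances one Lorenz orbit with a classical fourth-order Runge-Kutta step. The integrator, the classical parameter values (sigma = 10, rho = 28, beta = 8/3), and the function names are assumptions; the abstract does not specify them. A negative `dt` integrates backward in time, which is the usual way to follow a stable manifold.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt):
    """Advance state by one classical Runge-Kutta step of size dt."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def extend_orbit(start, dt, steps):
    """Return the list of points visited from start over the given steps."""
    orbit = [tuple(start)]
    for _ in range(steps):
        orbit.append(rk4_step(lorenz, orbit[-1], dt))
    return orbit


if __name__ == "__main__":
    # Each starting point is independent, so on a GPU every orbit
    # could be extended by its own thread.
    starts = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
    for p in starts:
        print(p, "->", extend_orbit(p, dt=0.01, steps=100)[-1])
```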