21.
Molecular dynamics simulation of complex multiphase flow on a computer cluster with GPUs
Compute Unified Device Architecture (CUDA) was used to design and implement molecular dynamics (MD) simulations on graphics
processing units (GPUs). With an NVIDIA Tesla C870, a 20–60 fold speedup over one core of the Intel Xeon 5430 CPU was
achieved, reaching up to 150 Gflops. MD simulation of cavity flow and particle-bubble interaction in liquid was implemented
on multiple GPUs using the Message Passing Interface (MPI). Up to 200 GPUs were tested on a special network topology, which
achieved good scalability. The capability of GPU clusters for large-scale molecular dynamics simulation of meso-scale flow
behavior was thereby demonstrated.
Supported by the National Natural Science Foundation of China (Grant Nos. 20336040, 20221603 and 20490201), and the Chinese
Academy of Sciences (Grant No. Kgcxz-yw-124)
22.
《Journal of computational science》2014,5(5):701-708
Hermitian radial basis functions implicits is a method capable of reconstructing implicit surfaces from first-order Hermitian data. When globally supported radial functions are used, a dense symmetric linear system must be solved. In this work, we explore a matrix-free implementation of the Conjugate Gradients method on the GPU to solve such a linear system. The proposed method rebuilds the matrix in parallel, on demand, at each iteration. As a result, it is able to compute the Hermitian-based interpolant for datasets that could not otherwise be handled because of the memory their linear systems demand.
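The matrix-free idea in this abstract can be illustrated with a small sketch. It runs Conjugate Gradients while regenerating every matrix entry on demand, so the dense matrix is never stored. This is a CPU-only, plain-Python illustration under stated assumptions, not the authors' GPU code: it uses an ordinary Gaussian RBF interpolation kernel instead of the Hermitian formulation, and the names `kernel`, `matvec`, `cg_matrix_free` and the shape parameter `eps` are hypothetical.

```python
import math

def kernel(p, q, eps=2.0):
    """Gaussian RBF between two 2D points (assumed kernel, positive definite)."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-eps * eps * d2)

def matvec(points, x):
    """Compute A @ x where A[i][j] = kernel(points[i], points[j]).

    Entries are rebuilt on demand, so A is never stored. Each output
    row is independent, so on a GPU one thread could handle one row.
    """
    return [sum(kernel(p, q) * xj for q, xj in zip(points, x)) for p in points]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg_matrix_free(points, b, tol=1e-10, max_iter=200):
    """Solve A x = b with Conjugate Gradients using only matvec."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ap = matvec(points, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Interpolate five sample values; weights solve the RBF system.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
values = [1.0, 2.0, 0.5, -1.0, 0.0]
weights = cg_matrix_free(points, values)
residual = [bi - yi for bi, yi in zip(values, matvec(points, weights))]
```

The trade-off is extra arithmetic per iteration in exchange for memory that grows linearly, not quadratically, with the number of points.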
23.
In this work we explore the performance of CUDA in quenched lattice SU(2) simulations. CUDA, the NVIDIA Compute Unified Device Architecture, is a hardware and software architecture developed by NVIDIA for computing on the GPU. We present an analysis and performance comparison between the GPU and CPU in single and double precision. Analyses with multiple GPUs and two different architectures (G200 and Fermi) are also presented. To obtain high performance, the code must be optimized for the GPU architecture, i.e., the implementation must exploit the memory hierarchy of the CUDA programming model.
24.
Arlindo R. Galvão Filho, Lauro C. Martins de Paula, Clarimar José Coelho, Telma Woerle de Lima, Anderson da Silva Soares 《Mathematical Methods in the Applied Sciences》2016,39(3):405-411
Mathematical models are of great value in epidemiology: they help explain the dynamics of infectious diseases and support the design of effective control strategies. The classical approach is to use differential equations to describe, in a quantitative manner, the spread of diseases within a particular population. An alternative approach is to represent each individual in the population as a string or vector of characteristic data and simulate the contagion and recovery processes by computational means. This type of model, referred to in the literature as MBI (models based on individuals), has the advantage of being flexible, as the characteristics of each individual can be quite complex, involving, for instance, age, sex, pre‐existing health conditions, environmental factors, and social habits. However, for simulations involving large populations, MBI may require a large computational effort in terms of memory storage and processing time. To cope with this heavy computational effort, this paper proposes a parallel implementation of MBI using a CUDA-compatible graphics processing unit. It was found that, even in the case of a simple susceptible–infected–recovered model, the computational gains in terms of processing time are significant. Copyright © 2015 John Wiley & Sons, Ltd.
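A minimal sketch of an individual-based susceptible–infected–recovered model is given here to make the idea concrete. It is a serial, plain-Python illustration and not the paper's CUDA implementation; the update rule (infection probability proportional to the infected fraction) and the parameter names `beta` and `gamma` are assumptions chosen for simplicity.

```python
import random

S, I, R = 0, 1, 2  # individual states: susceptible, infected, recovered

def step(states, beta, gamma, rng):
    """Advance every individual by one time step.

    Each update reads only the individual's own state and the infected
    fraction from the previous step, so on a GPU one thread per
    individual could execute the loop body.
    """
    infected_fraction = states.count(I) / len(states)
    new_states = []
    for s in states:
        if s == S and rng.random() < beta * infected_fraction:
            new_states.append(I)      # contagion
        elif s == I and rng.random() < gamma:
            new_states.append(R)      # recovery
        else:
            new_states.append(s)      # no change
    return new_states

def simulate(n, n_infected, beta, gamma, steps, seed=0):
    """Return a list of (S, I, R) counts, one tuple per time step."""
    rng = random.Random(seed)
    states = [I] * n_infected + [S] * (n - n_infected)
    history = []
    for _ in range(steps):
        states = step(states, beta, gamma, rng)
        history.append((states.count(S), states.count(I), states.count(R)))
    return history

history = simulate(n=1000, n_infected=10, beta=0.3, gamma=0.1, steps=50)
```

In a richer model each individual would carry a vector of attributes (age, health conditions, and so on) in place of the single state integer used here.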
25.
26.
27.
In biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, with uses such as homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach has been proposed to align multiple sequences successfully by adopting iterative pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time-consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. The experimental results show that CUDA ClustalW v1.0 achieves a speedup of more than 33× in overall execution time compared with ClustalW v2.0.11.
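The pairwise alignments that the progressive approach relies on can be sketched as follows. This is a generic Needleman-Wunsch global alignment score in plain Python, not the ClustalW or CUDA ClustalW code; the scoring values (match +1, mismatch -1, linear gap -1) and the names `nw_score` and `pairwise_scores` are assumptions for the example.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only the previous row of the dynamic-programming table is kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def pairwise_scores(seqs):
    """Score every pair of sequences.

    The pairs are independent of one another, which is what makes this
    stage a natural candidate for parallel execution.
    """
    n = len(seqs)
    return [[nw_score(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

seqs = ["GATTACA", "GATTAC", "GCATGCG"]
scores = pairwise_scores(seqs)
```

With many sequences the number of pairs grows quadratically, which is why this stage tends to dominate the running time of a progressive aligner.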
28.
ZHU XiaoSong, CHENG Liang, LU Lin & TENG Bin; State Key Laboratory of Coastal and Offshore Engineering, Dalian University of Technology, Dalian, China; School of Civil and Resource Engineering, The University of Western Australia, Perth, WA, Australia; Centre for Deepwater Engineering, Dalian 《Science China Physics, Mechanics & Astronomy》2011,(3)
The Moving Particle Semi-implicit (MPS) method performs well in simulating violent free surface flow and has hence become popular in the area of fluid flow simulation. However, searching for neighbouring particles and solving the large sparse matrix equations (Poisson-type equations) are very time-consuming. In order to utilize the tremendous parallel computing power of Graphics Processing Units (GPUs), this study has developed a GPU-based MPS model employing the Compute Unified Device Ar...
29.
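To further increase the concurrency of n(MC)³, a CPU-GPU cooperative Bayesian phylogenetic inference algorithm, this paper proposes an optimized algorithm built on n(MC)³. It modifies the parallelization strategy, reorders the computations, weakens the dependencies between adjacent compute nodes, and makes better use of idle GPU units. The result is a higher speedup and a significant improvement in the algorithm's performance.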
30.
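Abstract: To speed up image pyramid creation on a single compute node, this study first applies GPU parallelism to accelerate the image resampling algorithm, which is the core step of pyramid creation. The data volume changes continually while a pyramid is being built, and it directly affects the efficiency of GPU resampling. A threshold-based algorithm for building remote sensing image pyramids is therefore proposed. The algorithm combines GPU-parallel and CPU-serial resampling of remote sensing images, selects between them dynamically according to a threshold while the pyramid is built, and is applied to pyramid management for land remote sensing images. Experiments on a 24-bit remote sensing image of 10371×7945 pixels show that: (1) the GPU-based parallel resampling algorithm is the fastest, 10 times as fast as the CPU-based serial resampling algorithm; (2) the proposed algorithm builds pyramids more than 3 times as fast as ArcGIS 9.3.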
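The threshold-based selection described in this abstract can be sketched as a planning step. The abstract does not give the threshold value, the halving rule, or the stopping size, so `threshold_pixels`, `min_size`, the round-up halving, and the function name `pyramid_plan` are all assumptions; the sketch only decides which resampler each level would use and performs no actual resampling.

```python
def pyramid_plan(width, height, threshold_pixels, min_size=256):
    """List (level, width, height, path) for each pyramid level.

    Each level halves the previous one, rounding up (assumed rule).
    Levels with at least threshold_pixels output pixels go to the
    hypothetical GPU resampler; smaller levels go to the serial CPU
    resampler, where transfer overhead would outweigh the GPU gain.
    """
    plan = []
    level = 0
    while max(width, height) > min_size:
        width, height = (width + 1) // 2, (height + 1) // 2
        level += 1
        path = "gpu" if width * height >= threshold_pixels else "cpu"
        plan.append((level, width, height, path))
    return plan

# Image size taken from the abstract; the threshold is an assumed value.
plan = pyramid_plan(10371, 7945, threshold_pixels=1_000_000)
for entry in plan:
    print(entry)
```

In practice the threshold would be found by timing both resamplers on the target hardware and picking the data size at which the GPU path starts to win.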