61.

GPU‐accelerated molecular mechanics computations

Athanasios Anthopoulos Ian Grimstead Andrea Brancale 《Journal of computational chemistry》2013,34(26):2249-2260

In this article, we describe an improved cell‐list approach designed to match the Kepler architecture of general‐purpose graphics processing units (GPGPUs). We explain how our approach improves load balancing for the above algorithm and how warp intrinsics are used to implement Newton's third law for the nonbonded force calculations. We also talk through our approach to exclusions handling together with a method to calculate bonded forces and 1–4 electrostatic scaling using a single CUDA kernel. Performance benchmarks are included in the last sections to show the linear scaling of our implementation using a step minimization method. In addition, multiple performance benchmarks demonstrate the contribution of various optimizations we used for our implementations. © 2013 Wiley Periodicals, Inc.
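
The cell-list idea and the use of Newton's third law mentioned in this abstract can be illustrated with a small serial sketch. It is not the authors' GPU kernel: the function names, the toy soft-repulsion force, and the absence of periodic boundaries are all assumptions made for illustration. Particles are binned into cubic cells whose edge equals the cutoff, so each particle only needs to be checked against its own and the 26 adjacent cells, and every pair is evaluated once with the force applied to both partners with opposite sign.

```python
import itertools
import math
from collections import defaultdict


def build_cells(positions, cutoff):
    """Bin particle indices into cubic cells whose edge equals the cutoff."""
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(i)
    return cells


def pair_forces(positions, cutoff):
    """Accumulate a toy soft-repulsion force (hypothetical) using a cell list.

    Each unordered pair is visited once; the force is added to particle i and
    subtracted from particle j (Newton's third law).
    """
    cells = build_cells(positions, cutoff)
    forces = [[0.0, 0.0, 0.0] for _ in positions]
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue  # empty or non-existent neighbouring cell
            for i in members:
                for j in neighbours:
                    if j <= i:
                        continue  # skip self-pairs and already-visited pairs
                    d = [positions[i][k] - positions[j][k] for k in range(3)]
                    r = math.sqrt(sum(c * c for c in d))
                    if r == 0.0 or r >= cutoff:
                        continue
                    scale = (cutoff - r) / r  # toy force magnitude divided by r
                    for k in range(3):
                        forces[i][k] += scale * d[k]
                        forces[j][k] -= scale * d[k]
    return forces
```

Because every contribution is added once with each sign, the components of the total force sum to zero up to rounding, which is a convenient sanity check for this kind of pair loop.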

62.

GPU-accelerated atom and dynamic bond visualization using hyperballs: a unified algorithm for balls, sticks, and hyperboloids

Chavent M Vanel A Tek A Levy B Robert S Raffin B Baaden M 《Journal of computational chemistry》2011,32(13):2924-2935

Ray casting on graphics processing units (GPUs) opens new possibilities for molecular visualization. We describe the implementation and calculation of diverse molecular representations such as licorice, ball-and-stick, space-filling van der Waals spheres, and approximated solvent-accessible surfaces using GPUs. We introduce HyperBalls, an improved ball-and-stick representation replacing tubes, linking the atom spheres by hyperboloids that can smoothly connect them. This type of depiction is particularly useful to represent dynamic phenomena, such as the evolution of noncovalent bonds. It is furthermore well suited to represent coarse-grained models and spring networks. All these representations can be defined by a single general algebraic equation that is adapted for the ray-casting technique and is well suited for execution on the GPU. Using GPU capabilities, this implementation can routinely, accurately, and interactively render molecules ranging from a few atoms up to huge macromolecular assemblies with more than 500,000 particles. In simple cases, based only on spheres, we have been able to display up to two million atoms smoothly.

63.

Avoiding the van der Waals endpoint problem using serial atomic insertion

Boresch S Bruckner S 《Journal of computational chemistry》2011,32(11):2449-2458

In the past, analyses of the so-called van der Waals end point problem focused on thermodynamic integration. Here we investigate which of the recommendations, such as the need for soft-core potentials, are still valid when Bennett's acceptance ratio method is used. We show that in combination with Bennett's acceptance ratio method, intermediate states characterized by the coupling parameter λ can be replaced by intermediate states in which Lennard-Jones interactions are turned on or off on an "atom by atom" basis. By doing so, there is no necessity to use soft-core potentials. In fact, one can compute free energy differences without dedicated code, making it possible to use any molecular dynamics program to compute alchemical free energy differences. Such an approach, which we illustrate by several examples, makes it possible to exploit the tremendous computational power of the graphics processing unit.

64.

Pattern-recognition system, designed on GPU, for discriminating between injured normal and pathological knee cartilage

Spiros Kostopoulos Konstantinos Sidiropoulos Dimitris Glotsos Emmanouil Athanasiadis Konstantina Boutsikou Eleftherios Lavdas Georgia Oikonomou Ioannis V. Fezoulidis Marianna Vlychou Michael Hantes Dionisis Cavouras 《Magnetic resonance imaging》2013

The aim was to design a pattern-recognition (PR) system for discriminating between normal and pathological knee articular cartilage of the medial femoral (MFC) and tibial condyles (MTC). The data set comprised segmented regions of interest (ROIs) from coronal and sagittal 3-T magnetic resonance images of the MFC and MTC cartilage of young patients, 28 with abnormality-free knees and 16 with pathological findings. The PR system was designed employing the probabilistic neural network classifier, textural features from the segmented ROIs and the leave-one-out evaluation method, while the PR system's precision on “unseen” data was assessed by employing the external cross-validation method. Optimal system design was accomplished on a consumer graphics processing unit (GPU) using Compute Unified Device Architecture parallel programming. PR system design on the GPU required about 3.5 min against 15 h on a CPU-based system. Highest classification accuracies for the MFC and MTC cartilages were 93.2% and 95.5%, and accuracies on “unseen” data were 89% and 86%, respectively. The proposed PR system is housed in a PC equipped with a consumer GPU; it may be easily retrained when new verified data are incorporated in its repository and may be of value as a second-opinion tool in a clinical environment.
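
As a rough illustration of the two ingredients named in this abstract, a probabilistic neural network (PNN) classifier and leave-one-out evaluation, the following minimal sketch may help. It is not the authors' system: the function names, the Gaussian kernel width `sigma`, and the two-feature toy samples are assumptions. A PNN scores each class by the average Gaussian kernel response of its training samples at the query point and picks the class with the highest score; leave-one-out repeats this with each sample held out in turn.

```python
import math


def pnn_classify(train, query, sigma):
    """Return the label whose mean Gaussian kernel response at `query` is largest."""
    scores = {}
    counts = {}
    for features, label in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, query))
        scores[label] = scores.get(label, 0.0) + math.exp(-d2 / (2.0 * sigma ** 2))
        counts[label] = counts.get(label, 0) + 1
    return max(scores, key=lambda lab: scores[lab] / counts[lab])


def leave_one_out_accuracy(samples, sigma):
    """Classify every sample with a PNN trained on all the other samples."""
    hits = 0
    for k, (features, label) in enumerate(samples):
        rest = samples[:k] + samples[k + 1:]
        if pnn_classify(rest, features, sigma) == label:
            hits += 1
    return hits / len(samples)


# Hypothetical two-feature toy data standing in for textural features.
toy_samples = [
    ((0.0, 0.1), "normal"), ((0.2, 0.0), "normal"), ((0.1, 0.2), "normal"),
    ((3.0, 3.1), "pathological"), ((3.2, 2.9), "pathological"), ((2.9, 3.0), "pathological"),
]
toy_accuracy = leave_one_out_accuracy(toy_samples, sigma=0.5)
```

Each held-out classification is independent of the others, which is one reason this kind of exhaustive design search parallelizes well.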

65.

An efficient reconstruction algorithm for truncated projection data in circular-trajectory cone-beam CT

汪先超 闫镔* 刘宏奎 李磊 魏星 胡国恩 《物理学报》2013,62(9):98702-098702

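Based on a data-rebinning method, this paper proposes the T-BPF (Tent-BPF) algorithm. The algorithm first rebins the cone-beam projection data into parallel-beam projection data and then reconstructs the rebinned data with a derived BPF-type algorithm. T-BPF turns the varying angular integration limits in the backprojection of the original BPF algorithm into fixed ones, so the nested loops of the backprojection no longer depend on one another, which means that T-BPF is more parallelizable than the original BPF algorithm. Experimental results show that, when a GPU is used to accelerate the reconstruction of a 256³ Shepp-Logan phantom in parallel, T-BPF reaches a speedup of 1036 while preserving reconstruction quality, a large improvement over the original BPF algorithm. T-BPF provides a method for fast 3D image reconstruction from truncated projection data. Keywords: X-ray optics, CT, image reconstruction, GPU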

66.

Prestack reverse-time migration method for seismic data from complex structures

石颖 柯璇 田东升 张惠瑜 《数学的实践与认识》2013,43(10)

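Prestack reverse-time migration can accurately image complex subsurface structures. The acoustic wave equation is solved with a high-order finite-difference method, and a way of choosing sampling intervals that satisfy the stability condition is given. The prestack reverse-time migration algorithm for seismic data is implemented with GPU/CPU acceleration, which greatly improves computational efficiency; the algorithm also uses random boundary conditions, which saves a large amount of storage. The influence of velocity-model variation on the imaging result is analyzed. Tests on complex seismic data show that the prestack reverse-time migration algorithm images steeply dipping structures clearly and also images salt-dome boundaries and internal structures well.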
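
The stability condition on the sampling intervals can be made concrete with the standard von Neumann bound for an explicit scheme that is second order in time and uses central differences of even order in space on a uniform grid. This is a textbook bound given as a sketch; the abstract does not state which criterion the paper uses, and the function name and parameters below are assumptions. With second-derivative coefficients a_0, a_1, ..., a_M, grid spacing h, maximum velocity v_max, and D spatial dimensions, the time step must satisfy dt <= 2h / (v_max * sqrt(D * (|a_0| + 2 * sum |a_k|))).

```python
import math

# Central-difference coefficients (a_0, a_1, ..., a_M) for the second derivative.
SECOND_DERIVATIVE_COEFFS = {
    2: (-2.0, 1.0),
    4: (-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0),
    6: (-49.0 / 18.0, 3.0 / 2.0, -3.0 / 20.0, 1.0 / 90.0),
    8: (-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0),
}


def max_stable_dt(v_max, h, order=4, ndim=2):
    """Largest stable time step for a 2nd-order-in-time acoustic FD scheme.

    v_max : maximum velocity in the model
    h     : grid spacing, assumed equal along every axis
    order : spatial accuracy order of the central stencil (2, 4, 6 or 8)
    ndim  : number of spatial dimensions
    """
    a = SECOND_DERIVATIVE_COEFFS[order]
    weight = abs(a[0]) + 2.0 * sum(abs(c) for c in a[1:])
    return 2.0 * h / (v_max * math.sqrt(ndim * weight))
```

For example, `max_stable_dt(4500.0, 10.0, order=8, ndim=2)` gives the upper limit on the time step, in seconds, for a hypothetical model with a 10 m grid and a maximum velocity of 4500 m/s; higher-order stencils and more dimensions both tighten the limit.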

67.

Image stack alignment in full‐field X‐ray absorption spectroscopy using SIFT_PyOCL

Pierre Paleo Emeline Pouyet Jérôme Kieffer 《Journal of synchrotron radiation》2014,21(2):456-461

Full‐field X‐ray absorption spectroscopy experiments allow the acquisition of millions of spectra within minutes. However, the construction of the hyperspectral image requires an image alignment procedure with sub‐pixel precision. While the image correlation algorithm has originally been used for image re‐alignment using translations, the Scale Invariant Feature Transform (SIFT) algorithm (which is by design robust versus rotation, illumination change, translation and scaling) presents an additional advantage: the alignment can be limited to a region of interest of any arbitrary shape. In this context, a Python module, named SIFT_PyOCL, has been developed. It implements a parallel version of the SIFT algorithm in OpenCL, providing high‐speed image registration and alignment both on processors and graphics cards. The performance of the algorithm allows online processing of large datasets.

68.

Research on a fast Fourier transform algorithm for huge images based on GPU and blocking techniques (in English)

杨雪 李学友 李家国 马骏 张力 杨健 杜全叶 《光谱学与光谱分析》2014,34(2):498

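The fast Fourier transform (FFT) is a basic method in remote sensing image processing. As the capacity to acquire remote sensing images with high spectral, spatial, and temporal resolution grows, using the FFT to process huge remote sensing images quickly and effectively has become an important step and an active research topic in remote sensing image processing. The FFT is one of the basic image processing algorithms and serves purposes such as stripe-noise removal, image compression, and image registration. CUFFT is the GPU-based FFT library provided by NVIDIA. FFTW is a CPU-based FFT library developed on the PC platform by a computer group at an MIT laboratory and is currently the fastest CPU-based FFT library. Both implementations share one problem: when the available host memory or device memory is smaller than the image, memory overflows. To address this problem, the paper proposes a fast Fourier transform algorithm for huge remote sensing images based on the GPU and a blocking technique (huge remote fast Fourier transform, HRFFT). By modifying the FFT algorithm in the CUFFT library of CUDA, it resolves the host and device memory overflow for huge images. Experiments on CCD images from the HJ-1A satellite compare the method with other algorithms and show that it is sound. In practical use, the proposed HRFFT algorithm improved the processing results and the quality of the remote sensing images while speeding up processing and saving computation time.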
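
One way to see why a 2-D transform can be computed in blocks is its separability: a 2-D DFT equals 1-D transforms over all rows followed by 1-D transforms over all columns, so only one strip of rows or columns has to be resident at a time. The sketch below illustrates that general idea only; it is not the HRFFT algorithm, it uses a naive O(n²) DFT from the standard library instead of CUFFT or FFTW, and the function names and the `block` parameter are assumptions.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def blocked_dft2(image, block):
    """2-D DFT computed strip by strip: rows first, then columns.

    Only `block` rows (or columns) are transformed per step; in a GPU setting
    this is where a strip would be copied to and from limited device memory.
    """
    rows, cols = len(image), len(image[0])
    work = [list(row) for row in image]
    for start in range(0, rows, block):  # pass 1: strips of rows
        for r in range(start, min(start + block, rows)):
            work[r] = dft(work[r])
    for start in range(0, cols, block):  # pass 2: strips of columns
        for c in range(start, min(start + block, cols)):
            column = dft([work[r][c] for r in range(rows)])
            for r in range(rows):
                work[r][c] = column[r]
    return work
```

The strip size only changes how much data is touched at once, not the result, so `blocked_dft2(image, 1)` and `blocked_dft2(image, len(image))` agree up to rounding.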

69.

A massively parallel GPU‐accelerated model for analysis of fully nonlinear free surface waves

A. P. Engsig‐Karup Morten G. Madsen Stefan L. Glimberg 《International Journal for Numerical Methods in Fluids》2012,70(1):20-36

We implement and evaluate a massively parallel and scalable algorithm based on a multigrid preconditioned Defect Correction method for the simulation of fully nonlinear free surface flows. The simulations are based on a potential model that describes wave propagation over uneven bottoms in three space dimensions and is useful for fast analysis and prediction purposes in coastal and offshore engineering. A dedicated numerical model based on the proposed algorithm is executed in parallel on an affordable, modern, special‐purpose graphics processing unit (GPU). The model is based on a low‐storage flexible‐order accurate finite difference method that is known to be efficient and scalable on a CPU core (single thread). To achieve parallel performance of the relatively complex numerical model, we investigate a new trend in high‐performance computing where many‐core GPUs are utilized as high‐throughput co‐processors to the CPU. We describe and demonstrate how this approach makes it possible to do fast desktop computations for large nonlinear wave problems in numerical wave tanks (NWTs) with close to 50/100 million total grid points in double/single precision with 4 GB global device memory available. A new code base has been developed in C++ and compute unified device architecture C and is found to improve the runtime by more than an order of magnitude in double precision arithmetic for the same accuracy over an existing CPU (single thread) Fortran 90 code when executed on a single modern GPU. These significant improvements are achieved by carefully implementing the algorithm to minimize data transfer and take advantage of the massive multi‐threading capability of the GPU device. Copyright © 2011 John Wiley & Sons, Ltd.

70.

A note on generating finer‐grain parallelism in a representation tree

Christof Vömel 《Numerical Linear Algebra with Applications》2012,19(5):869-879

The representation tree lies at the heart of the algorithm of Multiple Relatively Robust Representations for computing orthogonal eigenvectors of a symmetric tridiagonal matrix without Gram–Schmidt. A representation tree describes the incremental shift relations between relatively robust representations of eigenvalue clusters of an unreduced tridiagonal matrix, which are needed to strongly separate close eigenvalues in the relative sense. At the bottom of the representation tree, each leaf defines a relatively isolated eigenvalue to high relative accuracy. The shape of the representation tree plays a pivotal role for complexity and available parallelism: a deeper tree consisting of multiple levels of nodes involves tasks associated to more work (i.e., eigenvalue refinement to resolve eigenvalue clusters) and less parallelism (i.e., a longer critical path as well as potential data movement and synchronization). An embarrassingly parallel, ideal tree on the other hand consists of a root and leaves only. As highly parallel hybrid graphics processing unit/multicore platforms with large memory now become available as commodity platforms, exploiting parallelism in traditional algorithms becomes key to modernizing the components of standard software libraries such as LAPACK. This paper focuses on LAPACK's Multiple Relatively Robust Representations algorithm and investigates the critical case where a representation tree contains a long sequential chain of large (fat) nodes that hamper parallelism. This key problem needs to be addressed as it concerns all sorts of computing environments, distributed computing, symmetric multiprocessor, as well as hybrid graphics processing unit/multicore architectures. We present an improved representation tree that often offers a significantly shorter critical path and finer computational granularity of smaller tasks that are easier to schedule. 
In a study of selected synthetic and application matrices, we show that an average 75% reduction in the length of the critical path and 82% reduction in task granularity can be achieved. Copyright © 2011 John Wiley & Sons, Ltd.