Similar literature
 Found 20 similar documents; search took 31 ms
1.
We propose an algorithm to evaluate the Coulomb potential in ab initio density functional calculations on a graphics processing unit (GPU). The numerical accuracy required for the algorithm is investigated in detail. It is shown that a GPU, which natively supports only single-precision floating-point numbers, can take part in the major computational tasks. Because of the limited size of the working memory, the Gauss-Rys quadrature used to evaluate the electron repulsion integrals (ERIs) is investigated in detail, and an error analysis of the quadrature is performed. A new interpolation formula for the roots and weights is presented, which is suitable for processors of the single-instruction multiple-data type. It is proposed to calculate only small ERIs on the GPU; ERIs can be classified efficiently with the upper-bound formula. The algorithm is implemented on an NVIDIA GeForce 8800 GTX and in the Gaussian 03 program suite. It is applied to the test molecules Taxol and Valinomycin. The calculated total energies are essentially the same as the reference ones. The preliminary results show a considerable speedup over a commodity microprocessor.
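The idea of sending only small ERIs to the single-precision device can be illustrated with a toy sketch. This is not the authors' implementation: the integral values, the per-integral upper bounds, the threshold, and the helper names (`to_float32`, `split_by_bound`, `mixed_precision_sum`) are all hypothetical, and single precision is merely emulated on the CPU by rounding through a 4-byte float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_by_bound(integrals, threshold):
    """Partition (value, upper_bound) pairs by comparing the bound with a threshold."""
    small = [value for value, bound in integrals if bound < threshold]
    large = [value for value, bound in integrals if bound >= threshold]
    return small, large


def mixed_precision_sum(integrals, threshold):
    """Sum integrals, rounding only the small ones to single precision."""
    small, large = split_by_bound(integrals, threshold)
    single_part = sum(to_float32(v) for v in small)  # stands in for the GPU path
    double_part = sum(large)                         # stands in for the CPU path
    return single_part + double_part


# Hypothetical (value, upper_bound) pairs; real bounds would come from a screening formula.
integrals = [(2.1, 3.0), (1.0e-3, 2.0e-3), (4.0e-5, 5.0e-5), (0.7, 1.0)]
threshold = 1.0e-2

exact = sum(value for value, _ in integrals)
mixed = mixed_precision_sum(integrals, threshold)
error = abs(mixed - exact)
```

Because the relative rounding error of single precision is about 6e-8, restricting it to values below the threshold keeps the absolute error of the total far smaller than rounding every term would.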

2.
A new parallel algorithm and its implementation for the RI‐MP2 energy calculation utilizing peta‐flop‐class many‐core supercomputers are presented. Several improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual‐level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi‐node and multi‐GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi‐node and multi‐GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.

3.
The NCI approach is a modern tool to reveal chemical noncovalent interactions. It is particularly attractive for describing ligand–protein binding. A custom implementation of NCI using promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three versions of the code is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual‐GPU version leads to a 39‐fold improvement for the biggest instance compared to the optimal OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.

4.
The approach used to calculate the two‐electron integrals by many electronic structure packages, including the generalized atomic and molecular electronic structure system‐UK, has been designed for CPU‐based compute units. We redesigned the two‐electron compute algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on the (ss|ss) type integrals. This strategy is general for Fortran‐based codes and uses the Accelerator compiler from Portland Group International and GPU‐based accelerators from Nvidia. The evaluation of (ss|ss) type integrals within calculations using Hartree Fock ab initio methods and density functional theory is accelerated by single and quad GPU hardware systems by factors of 43 and 153, respectively. The overall speedup for a single self consistent field cycle is at least a factor of eight on a single GPU compared with a single CPU. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011

5.
We describe a set of algorithms that make it possible to simulate dihydrofolate reductase (DHFR, a common benchmark) with the AMBER all‐atom force field at 160 nanoseconds/day on a single Intel Core i7 5960X CPU (no graphics processing unit (GPU), 23,786 atoms, particle mesh Ewald (PME), 8.0 Å cutoff, correct atom masses, reproducible trajectory, CPU with 3.6 GHz, no turbo boost, 8 AVX registers). The new features include a mixed multiple time‐step algorithm (reaching 5 fs), a tuned version of LINCS to constrain bond angles, the fusion of pair list creation and force calculation, pressure coupling with a “densostat,” and exploitation of new CPU instruction sets like AVX2. The impact of Intel's new transactional memory, atomic instructions, and sloppy pair lists is also analyzed. The algorithms map well to GPUs and can automatically handle most Protein Data Bank (PDB) files including ligands. An implementation is available as part of the YASARA molecular modeling and simulation program from www.YASARA.org. © 2015 The Authors Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

6.
We investigated the performance of heterogeneous computing with graphics processing units (GPUs) and many integrated core (MIC) with 20 CPU cores (20×CPU). As a practical example toward large scale electronic structure calculations using grid‐based methods, we evaluated the Hartree potentials of silver nanoparticles of various sizes (3.1, 3.7, 4.9, 6.1, and 6.9 nm) via a direct integral method supported by the sinc basis set. The so‐called work stealing scheduler was used for efficient heterogeneous computing via the balanced dynamic distribution of workloads between all processors on a given architecture without any prior information on their individual performances. 20×CPU + 1GPU was up to ~1.5 and ~3.1 times faster than 1GPU and 20×CPU, respectively. 20×CPU + 2GPU was ~4.3 times faster than 20×CPU. The performance enhancement by CPU + MIC was considerably lower than expected because of the large initialization overhead of MIC, although its theoretical performance is similar to that of CPU + GPU. © 2016 Wiley Periodicals, Inc.

7.
We present the first graphical processing unit (GPU) coprocessor‐enabled version of the Order‐N Electronic Total Energy Package (ONETEP) code for linear‐scaling first principles quantum mechanical calculations on materials. This work focuses on porting to the GPU the parts of the code that involve atom‐localized fast Fourier transform (FFT) operations. These are among the most computationally intensive parts of the code and are used in core algorithms such as the calculation of the charge density, the local potential integrals, the kinetic energy integrals, and the nonorthogonal generalized Wannier function gradient. We have found that direct porting of the isolated FFT operations did not provide any benefit. Instead, it was necessary to tailor the port to each of the aforementioned algorithms to optimize data transfer to and from the GPU. A detailed discussion of the methods used and tests of the resulting performance are presented, which show that individual steps in the relevant algorithms are accelerated by a significant amount. However, the transfer of data between the GPU and host machine is a significant bottleneck in the reported version of the code. In addition, an initial investigation into a dynamic precision scheme for the ONETEP energy calculation has been performed to take advantage of the enhanced single precision capabilities of GPUs. The methods used here result in no disruption to the existing code base. Furthermore, as the developments reported here concern the core algorithms, they will benefit the full range of ONETEP functionality. Our use of a directive‐based programming model ensures portability to other forms of coprocessors and will allow this work to form the basis of future developments to the code designed to support emerging high‐performance computing platforms. Copyright © 2013 Wiley Periodicals, Inc.

8.
We introduce a complete implementation of a viscoelastic model for numerical simulations of the phase separation kinetics in dynamically asymmetric systems, such as polymer blends and polymer solutions, on a graphics processing unit (GPU) using CUDA, and discuss the algorithms and optimizations in detail. From studies of a polymer solution, we show that the GPU-based implementation correctly reproduces the accepted results and provides a speedup of about 190 times over a single central processing unit (CPU). Further accuracy analysis demonstrates that both the single and the double precision calculations on the GPU are sufficient to produce high-quality results in numerical simulations of the viscoelastic model. Therefore, the GPU-based viscoelastic model is very promising for studying many phase separation processes of experimental and theoretical interest that often take place on large length and time scales and are not easily addressed by a conventional implementation running on a single CPU.

9.
We present a way to improve the performance of the electronic structure Vienna Ab initio Simulation Package (VASP) program. We show that high-performance computers equipped with graphics processing units (GPUs) as accelerators may drastically reduce the computation time when the time-consuming sections of the code are offloaded to the graphics chips. The procedure consists of (i) profiling the performance of the code to isolate the time-consuming parts, (ii) rewriting these so that the algorithms become better suited to the chosen graphics accelerator, and (iii) optimizing memory traffic between the host computer and the GPU accelerator. We chose to accelerate VASP with NVIDIA GPUs using CUDA. We compare the GPU and original versions of VASP by evaluating the Davidson and RMM-DIIS algorithms on chemical systems of up to 1100 atoms. In these tests, the total time is reduced by a factor between 3 and 8 when running on n (CPU core + GPU) compared to n CPU cores only, without any accuracy loss. © 2012 Wiley Periodicals, Inc.

10.
Space charge effects play important roles in the performance of various types of mass analyzers. Simulation of space charge effects is often limited by the available computing capability. In this study, we evaluate the use of a graphics processing unit (GPU) to accelerate ion trajectory simulation. Simulation using a GPU has been compared with simulation on a multi-core central processing unit (CPU), and an acceleration of about 390 times has been obtained using a single computer for simulation of up to 10⁵ ions in quadrupole ion traps. Characteristics of trapped ions can be investigated at detailed levels within a reasonable simulation time. Space charge effects on the trapping capacities of linear and 3D ion traps, ion cloud shapes, ion motion frequency shifts, and mass spectrum peak coalescence between two ion clouds of close m/z are studied with the ion trajectory simulation using a GPU.

11.
Empirical potential structure refinement (EPSR) is a neutron scattering data analysis algorithm and a software package. It was developed by the disordered materials group at the British spallation neutron source (ISIS) in the 1980s, and aims to construct the most probable atomic structures of disordered materials in the field of chemical physics. It has been used extensively during the past decades and has generated reliable results. However, it implements a shared-memory architecture with open multi-processing (OpenMP). With the extensive construction of supercomputer clusters and the widespread use of graphics processing unit (GPU) acceleration technology, it is now possible to rebuild EPSR with these techniques in an effort to improve its calculation speed. In this study, an open source framework, NeuDATool, is proposed. It is programmed in the object-oriented language C++, can be parallelized across nodes within a computer cluster, and supports GPU acceleration. The performance of NeuDATool has been tested with water and amorphous silica neutron scattering data. The tests show that the software can reconstruct the correct microstructure of the samples, and that the calculation with GPU acceleration can be more than 400 times faster than the serial CPU algorithm for a simulation box of about 100 thousand atoms. NeuDATool provides another choice for implementing simulations in the (neutron) diffraction community, especially for experts who are familiar with C++ programming and want to define specific algorithms for their analysis.

12.
We describe a complete implementation of all‐atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009

13.
We accelerated an ab initio molecular QMC calculation by using GPGPU. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and performed on the GPU. The performance on a (single core CPU + GPU) is compared with that on a (single core CPU with double precision), yielding calculations that are 23.6 (11.0) times faster with single (double) precision treatment on the GPU. The energy deviation caused by the single precision treatment was found to be within the accuracy required in the calculation, ~10⁻⁵ hartree. The accelerated computational nodes equipped with GPUs are combined to form a hybrid MPI cluster, on which we confirmed that the performance scales linearly with the number of nodes. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011

14.
During the past few years, graphics processing units (GPUs) have become extremely popular in the high performance computing community. In this study, we present an implementation of an acceleration engine for the solvent–solvent interaction evaluation of molecular dynamics simulations. By careful optimization of the algorithm, speed‐ups of up to a factor of 54 (single‐precision GPU vs. double‐precision CPU) could be achieved. The accuracy of the single‐precision GPU implementation is carefully investigated, and the reduced precision does not influence structural, thermodynamic, and dynamic quantities. Therefore, the implementation enables users of the GROMOS software for biomolecular simulation to run the solvent–solvent interaction evaluation on a GPU, and thus to speed up their simulations by a factor of 6–9. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

15.
16.
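The CHARMM force field, which is widely used in computational studies of biomolecules, was applied in the Windows computer cluster server (WCCS) environment, and general-purpose graphics processing unit (GPU) parallel computation of the force field and of the molecular dynamics simulation program was implemented. Molecular dynamics simulations of several polypeptide chains show that GPU computation is far faster than CPU computation. Compared with a 64-bit Athlon 2.0G, molecular dynamics simulation on an NVIDIA GeForce 8800 GT graphics card was at least 10 times faster, and this speedup ratio grows as the simulated system and the block size increase. A larger simulated system relatively reduces the idle time of the GPU parallel units, and a larger block size relatively reduces the buffer size, so that the computational efficiency of a single block improves. In the test samples, the speedup ratio reached more than 28. GPU computation was also used to run a molecular dynamics simulation of a polypeptide chain containing 397 atoms, giving the change of the hydrogen-bond distribution over time.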

17.
In this study, an early‐working algorithm is designed to evaluate derivatives of electron repulsion integrals (DERIs) for heavy‐element systems. The algorithm is constructed to extend the accompanying coordinate expansion and transferred recurrence relation (ACE‐TRR) method, which was developed for rapid evaluation of electron repulsion integrals (ERIs) in our previous article (M. Hayami, J. Seino, and H. Nakai, J. Chem. Phys. 2015, 142, 204110). The algorithm was formulated using the Gaussian derivative rule to decompose a DERI into two ERIs with the same sets of exponents, different sets of contraction coefficients, and different angular momenta. The algorithms designed for segmented and general contraction basis sets are presented as well. Numerical assessments of the central processing unit time of gradients for molecules were conducted to demonstrate the high efficiency of the ACE‐TRR method for systems containing heavy elements. Such systems may include metal complexes and metal clusters, whose basis sets contain functions with long contractions and high angular momenta.

18.
A new second‐order perturbation theory (MP2) approach is presented for closed shell energy evaluations. The new algorithm has a significantly lower memory footprint, a lower FLOP (floating point operations) count, and a lower time to solution compared to previously implemented parallel MP2 methods in the GAMESS software package. Additionally, this algorithm features an adaptive approach for the disk/distributed memory storage of the MP2 amplitudes. The algorithm works well on a single workstation, small cluster, and large Cray cluster, and it allows one to perform large calculations with thousands of basis functions in a matter of hours on a single workstation. The same algorithm has been adapted for graphical processing unit (GPU) architecture. The performance of the new GPU algorithm is also discussed. © 2016 Wiley Periodicals, Inc.

19.
We present an algorithm for the rapid computation of electron repulsion integrals (ERIs) over Gaussian basis functions based on the accompanying coordinate expansion (ACE) formula. The present algorithm uses equations termed angular momentum reduced expressions and introduces two types of recurrence relations to ACE formulas. Numerical efficiencies are assessed for (pp|pp) and (sp sp|sp sp) ERIs by using the floating-point operation count. The algorithm is suitable for calculating ERIs for the same exponents but different angular momentum functions, such as L shells and derivatives of ERIs. The present algorithm is also capable of calculating ERIs with highly contracted Gaussian basis functions.

20.
We test the relative performance of two different approaches to the computation of forces for molecular dynamics simulations on graphics processing units. A “vertex‐based” approach, where one computing thread is started per particle, is compared to an “edge‐based” approach, where one thread is started for each potentially non‐zero interaction. We find that the former is more efficient for systems with many simple interactions per particle, while the latter is more efficient if the system has more complicated interactions or fewer of them. By comparing computation times on more and less recent graphics processing unit technology, we predict that, if the current trend of increasing the number of processing cores (as opposed to their computing power) continues, the “edge‐based” approach will gradually become the most efficient choice in an increasing number of cases. © 2014 Wiley Periodicals, Inc.
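The two decompositions can be contrasted with a small serial sketch in which each loop iteration stands in for one GPU thread. Everything here is an illustrative assumption rather than the paper's code: the 1D harmonic `pair_force`, the function names, and the choice to let each edge "thread" update both end points (which on a real GPU would require atomic additions).

```python
def pair_force(xi, xj, k=1.0):
    """Force on particle i due to particle j for a toy 1D harmonic spring."""
    return -k * (xi - xj)


def forces_vertex_based(positions, neighbors):
    """One 'thread' per particle: each iterates over its own neighbor list."""
    forces = [0.0] * len(positions)
    for i, xi in enumerate(positions):      # iteration i plays the role of thread i
        for j in neighbors[i]:
            forces[i] += pair_force(xi, positions[j])
    return forces


def forces_edge_based(positions, edges):
    """One 'thread' per interaction: the result is scattered to both particles."""
    forces = [0.0] * len(positions)
    for i, j in edges:                      # each iteration plays the role of one thread
        f = pair_force(positions[i], positions[j])
        forces[i] += f                      # would be an atomic add on a GPU
        forces[j] -= f                      # equal and opposite force on j
    return forces


# Three particles on a line, bonded 0-1 and 1-2.
positions = [0.0, 1.0, 3.0]
edges = [(0, 1), (1, 2)]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

vertex_result = forces_vertex_based(positions, neighbors)
edge_result = forces_edge_based(positions, edges)
```

Both routines return the same forces; they differ only in how the work is divided. The vertex-based loop does a variable amount of work per "thread" with no write conflicts, while the edge-based loop does uniform work per "thread" at the cost of conflicting writes.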
