Similar Articles
20 similar articles found (search time: 0 ms)
1.
    
The approach used to calculate two-electron integrals in many electronic structure packages, including the Generalized Atomic and Molecular Electronic Structure System-UK (GAMESS-UK), was designed for CPU-based compute units. We redesigned the two-electron integral algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on (ss|ss)-type integrals. The strategy is general for Fortran-based codes and uses the Accelerator compiler from Portland Group International together with GPU-based accelerators from Nvidia. The evaluation of (ss|ss)-type integrals within Hartree-Fock ab initio and density functional theory calculations is accelerated by factors of 43 and 153 on single- and quad-GPU hardware systems, respectively. The overall speedup for a single self-consistent-field cycle is at least a factor of eight on a single GPU compared with a single CPU. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
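To make the quantity concrete: the (ss|ss) integral over four primitive Gaussians has a closed form involving the zeroth-order Boys function. The sketch below is a plain-Python CPU reference using the standard textbook formula, not the GAMESS-UK or GPU code from the paper; all names are illustrative.

```python
import math

def boys_f0(t):
    """Zeroth-order Boys function: F0(t) = (1/2) sqrt(pi/t) erf(sqrt(t))."""
    if t < 1e-12:
        return 1.0 - t / 3.0  # Taylor expansion near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))

def eri_ssss(a, A, b, B, c, C, d, D):
    """Closed-form (ss|ss) ERI over four unnormalized primitive Gaussians
    with exponents a, b, c, d centred at 3-vectors A, B, C, D."""
    p, q = a + b, c + d
    AB2 = sum((A[i] - B[i]) ** 2 for i in range(3))
    CD2 = sum((C[i] - D[i]) ** 2 for i in range(3))
    P = [(a * A[i] + b * B[i]) / p for i in range(3)]
    Q = [(c * C[i] + d * D[i]) / q for i in range(3)]
    PQ2 = sum((P[i] - Q[i]) ** 2 for i in range(3))
    K = math.exp(-a * b / p * AB2) * math.exp(-c * d / q * CD2)
    T = p * q / (p + q) * PQ2
    return 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q)) * K * boys_f0(T)
```

A GPU port evaluates many such shell quartets in parallel, which is where speedup factors of the reported magnitude come from.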

2.
    
Using a grid-based method to search for critical points in the electron density, we show how to accelerate such a method with graphics processing units (GPUs). Contrasting the GPU implementation with its central processing unit (CPU) counterpart, we found a large difference in elapsed time, with the GPU implementations always fastest. We tested two GPUs, one intended for video games and one for high-performance computing (HPC), and two last-generation CPUs, one used in common personal computers and one used for HPC. Although our parallel algorithm scales quite well on CPUs, the same implementation runs around 10x faster on GPUs than on 16 CPU cores, for any of the tested GPUs and CPUs. We found that a GPU intended for video games can be used for our application without any problem and delivers remarkable performance; in fact, it competes with the HPC GPU, particularly when single precision is used. © 2014 Wiley Periodicals, Inc.
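As a sketch of the grid-based idea (not the authors' code), the fragment below evaluates a toy promolecular density built from unit Gaussians on a cubic grid and keeps points where the gradient norm nearly vanishes. A production search would refine each candidate with Newton steps, and a GPU version would assign grid points to threads; all parameters here are illustrative.

```python
import math

def density_and_grad(point, atoms):
    """Toy promolecular density rho = sum_i exp(-|r - R_i|^2) and its gradient."""
    rho, grad = 0.0, [0.0, 0.0, 0.0]
    for R in atoms:
        d2 = sum((point[k] - R[k]) ** 2 for k in range(3))
        g = math.exp(-d2)
        rho += g
        for k in range(3):
            grad[k] += -2.0 * (point[k] - R[k]) * g
    return rho, grad

def candidate_critical_points(atoms, lo=-2.0, hi=2.0, n=21, tol=1e-2):
    """Scan a cubic grid and keep points where |grad rho| falls below tol."""
    h = (hi - lo) / (n - 1)
    hits = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                p = (lo + i * h, lo + j * h, lo + k * h)
                _, g = density_and_grad(p, atoms)
                if math.sqrt(sum(c * c for c in g)) < tol:
                    hits.append(p)
    return hits
```

Each grid point is independent of the others, which is exactly why the method maps so well onto GPU threads.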

3.
    
A unified, computer algebra system-based code-generation scheme for computational quantum-chemistry programs is presented. Generation of electron-repulsion integrals and their derivatives, as well as the exchange-correlation potential and its derivatives, is discussed. Application to general-purpose computing on graphics processing units is considered.

4.
    
A high-performance implementation of the coupled-cluster singles, doubles, and perturbative triples method [CCSD(T)] is developed in the Massively Parallel Quantum Chemistry program. Novel features include: (1) reduced memory requirements via a density-fitting (DF) CCSD implementation utilizing distributed lazy evaluation for tensors with more than two unoccupied indices and (2) the ability to efficiently utilize many-core nodes (Intel Xeon Phi) and heterogeneous nodes with multiple NVIDIA GPUs on each node. All data that grow faster than quadratically with system size are distributed among processes. Excellent strong scaling is observed on distributed-memory computers equipped with conventional CPUs, Intel Xeon Phi processors, and heterogeneous nodes with multiple NVIDIA GPUs. Canonical CCSD(T) energies can be evaluated for systems containing 200 electrons and 1000 basis functions in a few days using a small commodity cluster, with even larger computations possible on leadership-class computing resources.

5.
    
The capabilities of polarizable force fields for alchemical free energy calculations have been limited by the high computational cost and complexity of the underlying potential energy functions. In this work, we present a GPU-based general alchemical free energy simulation platform for the polarizable AMOEBA potential. Tinker-OpenMM, the OpenMM implementation of the AMOEBA simulation engine, has been modified to enable both absolute and relative alchemical simulations on GPUs, which leads to a ~200-fold improvement in simulation speed over a single CPU core. We show that free energy values calculated using this platform agree, within statistical error, with the results of Tinker simulations for the hydration of organic compounds and the binding of host-guest systems. In addition to absolute binding, we designed a relative alchemical approach for computing relative binding affinities of ligands to the same host, in which a special path avoids numerical instability due to polarization between the different ligands that bind to the same site. This scheme is general and does not require ligands to share similar scaffolds. We show that relative hydration and binding free energies calculated using this approach match those computed from the absolute free energy approach. © 2017 Wiley Periodicals, Inc.
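The free-energy machinery behind such comparisons can be illustrated with the Zwanzig free-energy perturbation estimator, sketched here in plain Python with a log-sum-exp guard. This is the generic textbook formula, not the Tinker-OpenMM implementation, and the default kT is an assumed value (~300 K in kcal/mol).

```python
import math

def fep_delta_a(delta_u_samples, kT=0.593):
    """Zwanzig estimator: dA = -kT * ln < exp(-dU/kT) >_0, where the
    samples dU = U1 - U0 are collected in the reference state 0."""
    n = len(delta_u_samples)
    m = min(delta_u_samples)          # shift for numerical stability
    s = sum(math.exp(-(du - m) / kT) for du in delta_u_samples)
    return m - kT * math.log(s / n)   # log-sum-exp form of the estimator
```

By Jensen's inequality the estimate never exceeds the mean of the sampled energy differences, which makes a convenient sanity check.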

6.
王治钒, 何冰, 路艳朝, 王繁. 《化学学报》 (Acta Chimica Sinica), 2022, 80(10): 1401-1409
Our previous work showed that combining single-precision arithmetic with consumer-grade graphics processing units (GPUs) significantly accelerates coupled-cluster CCSD (coupled-cluster with single and double excitations) and CCSD(T) (CCSD augmented by a perturbative treatment of triple excitations) calculations. However, because CCSD(T) calculations have large memory demands and consumer GPUs have limited memory, our previously developed CCSD(T) program could only treat systems with 300-400 basis functions when spatial symmetry was not exploited. Treating the two-electron integrals with the density-fitting (DF) approximation markedly reduces the memory requirements of CCSD(T). In this work we develop a DF-CCSD(T) program that combines the DF approximation with single-precision arithmetic; it can perform single-point energy calculations on systems without symmetry containing up to 700 basis functions, and on symmetric systems containing up to 1700 basis functions. On a compute node equipped with an Intel i9-10900K CPU and an RTX 3090 GPU, single-precision GPU computation accelerates the CCSD part by a factor of 16 and the (T) part by roughly a factor of 40 compared with double-precision computation on the CPU, while the error introduced by single precision is negligible. During development, we also built a library of symmetry-adapted matrix operations that runs on either GPUs or CPUs in single or double precision; this library significantly lowers the difficulty of developing coupled-cluster codes that exploit spatial symmetry.
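The density-fitting contraction this abstract relies on replaces four-index integrals by three-index ones: (ij|kl) ~ sum over P,Q of (ij|P) [J^-1]_PQ (Q|kl), so only a three-index tensor needs to be stored. A minimal pure-Python sketch of that assembly with toy dimensions (illustrative only, not the DF-CCSD(T) code):

```python
def df_eri(three_index, j_inv):
    """Assemble approximate four-index ERIs from DF quantities:
    (ij|kl) ~= sum_PQ (ij|P) * Jinv[P][Q] * (Q|kl).
    three_index[i][j][P] holds (ij|P); j_inv is the inverse Coulomb metric."""
    n = len(three_index)
    naux = len(j_inv)
    eri = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    val = 0.0
                    for P in range(naux):
                        for Q in range(naux):
                            val += (three_index[i][j][P] * j_inv[P][Q]
                                    * three_index[k][l][Q])
                    eri[i][j][k][l] = val
    return eri
```

The memory saving is the point: the three-index tensor is O(n^2 * naux) instead of O(n^4), which is what lets the GPU code reach much larger basis sets.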

7.
    
A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nose-Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33-119 fold increase in performance is achieved compared with the serial code, while the use of two GPUs leads to a 69-287 fold improvement and three GPUs yield a 101-377 fold speedup. © 2015 Wiley Periodicals, Inc.

8.
    
We present the first graphical processing unit (GPU) coprocessor-enabled version of the Order-N Electronic Total Energy Package (ONETEP) code for linear-scaling first principles quantum mechanical calculations on materials. This work focuses on porting to the GPU the parts of the code that involve atom-localized fast Fourier transform (FFT) operations. These are among the most computationally intensive parts of the code and are used in core algorithms such as the calculation of the charge density, the local potential integrals, the kinetic energy integrals, and the nonorthogonal generalized Wannier function gradient. We have found that direct porting of the isolated FFT operations did not provide any benefit. Instead, it was necessary to tailor the port to each of the aforementioned algorithms to optimize data transfer to and from the GPU. A detailed discussion of the methods used and tests of the resulting performance are presented, which show that individual steps in the relevant algorithms are accelerated by a significant amount. However, the transfer of data between the GPU and host machine is a significant bottleneck in the reported version of the code. In addition, an initial investigation into a dynamic precision scheme for the ONETEP energy calculation has been performed to take advantage of the enhanced single-precision capabilities of GPUs. The methods used here result in no disruption to the existing code base. Furthermore, as the developments reported here concern the core algorithms, they will benefit the full range of ONETEP functionality. Our use of a directive-based programming model ensures portability to other forms of coprocessors and will allow this work to form the basis of future developments to the code designed to support emerging high-performance computing platforms. Copyright © 2013 Wiley Periodicals, Inc.

9.
    
The NCI approach is a modern tool to reveal noncovalent chemical interactions and is particularly attractive for describing ligand-protein binding. A custom implementation of NCI using the promolecular density is presented. It is designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three code versions is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces computation time. On a single compute node, the dual-GPU version yields a 39-fold improvement for the largest instance compared with the optimal OpenMP parallel run (C code, icc compiler) on 16 CPU cores. Energy consumption measurements carried out on both CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
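The NCI descriptor itself is the reduced density gradient, s = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3)), evaluated on a grid; low-s, low-density regions mark noncovalent interactions. A toy promolecular version in plain Python, with unit Gaussians standing in for atomic densities (illustrative only, not the CUDA code measured in the paper):

```python
import math

def reduced_density_gradient(point, atoms):
    """NCI descriptor s = |grad rho| / (2 (3 pi^2)^(1/3) rho^(4/3)),
    here for a toy promolecular density of unit Gaussians."""
    rho, grad = 0.0, [0.0, 0.0, 0.0]
    for R in atoms:
        d2 = sum((point[k] - R[k]) ** 2 for k in range(3))
        g = math.exp(-d2)
        rho += g
        for k in range(3):
            grad[k] -= 2.0 * (point[k] - R[k]) * g
    gnorm = math.sqrt(sum(c * c for c in grad))
    return gnorm / (2.0 * (3.0 * math.pi ** 2) ** (1.0 / 3.0) * rho ** (4.0 / 3.0))
```

At the midpoint between two identical "atoms" the gradient cancels and s vanishes, which is the signature the NCI method looks for in interaction regions.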

10.
    
We investigated the performance of heterogeneous computing with graphics processing units (GPUs) and a many-integrated-core (MIC) coprocessor alongside 20 CPU cores (20×CPU). As a practical example toward large-scale electronic structure calculations using grid-based methods, we evaluated the Hartree potentials of silver nanoparticles of various sizes (3.1, 3.7, 4.9, 6.1, and 6.9 nm) via a direct integral method supported by the sinc basis set. The so-called work stealing scheduler was used for efficient heterogeneous computing via the balanced dynamic distribution of workloads between all processors on a given architecture, without any prior information on their individual performance. 20×CPU + 1GPU was up to ~1.5 and ~3.1 times faster than 1GPU and 20×CPU, respectively, and 20×CPU + 2GPU was ~4.3 times faster than 20×CPU. The performance enhancement from CPU + MIC was considerably lower than expected because of the large initialization overhead of the MIC, although its theoretical performance is similar to that of CPU + GPU. © 2016 Wiley Periodicals, Inc.
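A work-stealing scheduler of the kind described can be sketched in a few lines: each worker drains its own deque from the front and, when empty, steals from the back of another worker's deque, so faster processors automatically absorb more work. This thread-based Python toy only shows the balancing idea; it is not the paper's heterogeneous CPU/GPU/MIC scheduler.

```python
import threading
from collections import deque

def run_work_stealing(task_lists, work_fn):
    """One worker per task list; each drains its own deque from the front
    and steals from the back of other workers' deques when its own is empty."""
    n_workers = len(task_lists)
    deques = [deque(tasks) for tasks in task_lists]
    results, lock = [], threading.Lock()

    def worker(wid):
        while True:
            task = None
            try:
                task = deques[wid].popleft()        # own queue: pop front
            except IndexError:
                for other in range(n_workers):      # own queue empty: steal
                    if other == wid:
                        continue
                    try:
                        task = deques[other].pop()  # steal from the back
                        break
                    except IndexError:
                        pass
            if task is None:
                return                              # nothing left anywhere
            out = work_fn(task)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Stealing from the opposite end of the deque minimizes contention between the owner and the thief, the usual design choice in work-stealing runtimes.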

11.
12.
    
Spatial stochastic simulation is a valuable technique for studying reactions in biological systems. With the availability of high-performance computing (HPC), the method is poised to allow integration of data from structural, single-molecule, and biochemical studies into coherent computational models of cells. Here, we introduce the Lattice Microbes software package for simulating such cell models on HPC systems. The software performs either well-stirred or spatially resolved stochastic simulations with approximated cytoplasmic crowding in a fast and efficient manner. Our new algorithm efficiently samples the reaction-diffusion master equation using NVIDIA graphics processing units and is shown to be two orders of magnitude faster than exact sampling for large systems while maintaining an accuracy of ~0.1%. Display of cell models and animation of reaction trajectories involving millions of molecules is facilitated using a plug-in to the popular VMD visualization platform. The Lattice Microbes software is open source and available for download at http://www.scs.illinois.edu/schulten/lm. © 2012 Wiley Periodicals, Inc.
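The sampling kernel behind such simulations is the Gillespie direct method. A minimal well-stirred example for a reversible isomerization A <-> B is sketched below; it is an illustrative toy, not the GPU reaction-diffusion sampler in Lattice Microbes, and all rate values are arbitrary.

```python
import math
import random

def gillespie_ab(n_a, n_b, k_f, k_r, t_end, rng):
    """Direct-method stochastic simulation of A <-> B: the forward
    propensity is k_f * n_a, the reverse propensity is k_r * n_b."""
    t = 0.0
    while True:
        a_f, a_r = k_f * n_a, k_r * n_b
        a_tot = a_f + a_r
        if a_tot == 0.0:
            return n_a, n_b
        # exponentially distributed waiting time to the next event
        t += -math.log(1.0 - rng.random()) / a_tot
        if t > t_end:
            return n_a, n_b
        if rng.random() * a_tot < a_f:
            n_a, n_b = n_a - 1, n_b + 1   # forward reaction fires
        else:
            n_a, n_b = n_a + 1, n_b - 1   # reverse reaction fires
```

A spatially resolved solver applies the same two draws (waiting time, then event) to the reaction-diffusion master equation, with diffusion hops as additional events.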

13.
    
The Dynamo module library has been developed for the simulation of molecular systems using hybrid quantum mechanical (QM) and molecular mechanical (MM) potentials. Dynamo is not a program package but a library of Fortran 90 modules that can be employed by those interested in writing their own programs for performing molecular simulations. The library supports a range of different types of molecular calculation including geometry optimizations, reaction-path determinations, and molecular dynamics and Monte Carlo simulations. This article outlines the general structure and capabilities of the library and describes in detail Dynamo's semiempirical QM/MM hybrid potential. Results are presented to indicate three particular aspects of this implementation—the handling of long-range nonbonding interactions, the nature of the boundary between the quantum mechanical and molecular mechanical atoms and how to perform path-integral hybrid-potential molecular dynamics simulations. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 1088–1100, 2000

14.
Cited 3 times (0 self-citations, 3 by others)
Molecular mechanics simulations offer a computational approach to study the behavior of biomolecules at atomic detail, but such simulations are limited in size and timescale by the available computing resources. State-of-the-art graphics processing units (GPUs) can perform over 500 billion arithmetic operations per second, a tremendous computational resource that can now be utilized for general purpose computing as a result of recent advances in GPU hardware and software architecture. In this article, an overview of recent advances in programmable GPUs is presented, with an emphasis on their application to molecular mechanics simulations and the programming techniques required to obtain optimal performance in these cases. We demonstrate the use of GPUs for the calculation of long-range electrostatics and nonbonded forces for molecular dynamics simulations, where GPU-based calculations are typically 10-100 times faster than heavily optimized CPU-based implementations. The application of GPU acceleration to biomolecular simulation is also demonstrated through the use of GPU-accelerated Coulomb-based ion placement and calculation of time-averaged potentials from molecular dynamics trajectories. A novel approximation to Coulomb potential calculation, the multilevel summation method, is introduced and compared with direct Coulomb summation. In light of the performance obtained for this set of calculations, future applications of graphics processors to molecular dynamics simulations are discussed.
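Direct Coulomb summation, the baseline the multilevel summation method is compared against, evaluates V(r) = sum_i q_i / |r - r_i| independently at every grid point, which is what makes it so GPU-friendly. A minimal sketch in Gaussian units (hypothetical inputs, not the authors' kernels):

```python
import math

def coulomb_potential(point, charges):
    """Direct Coulomb summation: V(r) = sum_i q_i / |r - r_i|.
    charges is a list of (q, position) pairs; positions are 3-tuples."""
    v = 0.0
    for q, pos in charges:
        r = math.sqrt(sum((point[k] - pos[k]) ** 2 for k in range(3)))
        v += q / r  # assumes the evaluation point is not on a charge
    return v
```

On a GPU, each thread evaluates one grid point while the charge list streams through fast shared memory, the pattern described in the article.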

15.
    
We propose an algorithm to evaluate the Coulomb potential in ab initio density functional calculations on the graphics processing unit (GPU). The numerical accuracy required by the algorithm is investigated in detail. It is shown that the GPU, which natively supports only single-precision floating-point numbers, can take part in the major computational tasks. Because of the limited size of the working memory, the Gauss-Rys quadrature used to evaluate the electron repulsion integrals (ERIs) is investigated in detail, and an error analysis of the quadrature is performed. A new interpolation formula for the roots and weights is presented, which is suitable for processors of the single-instruction multiple-data type. We propose calculating only the small ERIs on the GPU; the ERIs can be classified efficiently with an upper-bound formula. The algorithm is implemented on an NVIDIA GeForce 8800 GTX and the Gaussian 03 program suite and applied to the test molecules Taxol and Valinomycin. The total energies calculated are essentially the same as the reference ones. Preliminary results show a considerable speedup over a commodity microprocessor.

16.
    
The computation of electron repulsion integrals (ERIs) is the most time-consuming step in density functional calculations using Gaussian basis sets. Many temporary ERIs are calculated, and most are stored in slower storage, such as cache or memory, because of the shortage of registers, the fastest storage in a central processing unit (CPU). Moreover, the heavy register usage makes it difficult to launch enough concurrent threads on a graphics processing unit (GPU) to hide latency. Hence, we propose to optimize the calculation order of one-center ERIs to minimize the number of registers used, and to calculate each ERI with three or six cooperating threads. The performance of this method is measured on a recent CPU and a GPU. The proposed approach is found to be efficient for high-angular-momentum basis functions on a GPU. When combined with a recent GPU, it accelerates the computation almost 4-fold. © 2014 Wiley Periodicals, Inc.

17.
    
We present here a set of algorithms that completely rewrites the Hartree–Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc.
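One of the routines named here, Direct Inversion in the Iterative Subspace (DIIS), extrapolates a new Fock matrix from previous iterates by minimizing the norm of a linear combination of error vectors subject to the coefficients summing to one. A small self-contained sketch of the coefficient solve, using the standard augmented linear system (illustrative only, not the QSL's GPU linear-algebra path):

```python
def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def diis_coefficients(error_vectors):
    """DIIS: minimize |sum_i c_i e_i| subject to sum_i c_i = 1, via the
    augmented system with a Lagrange multiplier."""
    m = len(error_vectors)
    b = [[sum(x * y for x, y in zip(ei, ej)) for ej in error_vectors]
         for ei in error_vectors]
    # augmented system: [[B, -1], [-1, 0]] @ [c, lam] = [0, ..., 0, -1]
    mat = [row + [-1.0] for row in b] + [[-1.0] * m + [0.0]]
    rhs = [0.0] * m + [-1.0]
    return solve(mat, rhs)[:m]
```

In practice the subspace is small (a handful of iterates), so the cost lives in forming the error-vector overlaps, which is exactly the kind of dense linear algebra a GPU library handles well.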

18.
    
We test the relative performances of two different approaches to the computation of forces for molecular dynamics simulations on graphics processing units. A "vertex-based" approach, where a computing thread is started per particle, is compared to an "edge-based" approach, where a thread is started per each potentially non-zero interaction. We find that the former is more efficient for systems with many simple interactions per particle while the latter is more efficient if the system has more complicated interactions or fewer of them. By comparing computation times on more and less recent graphics processing unit technology, we predict that, if the current trend of increasing the number of processing cores—as opposed to their computing power—remains, the "edge-based" approach will gradually become the most efficient choice in an increasing number of cases. © 2014 Wiley Periodicals, Inc.
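The two decompositions can be mimicked serially: a "vertex-based" loop computes every pair force twice (once per endpoint, as a thread-per-particle kernel would), while an "edge-based" loop visits each pair once and applies Newton's third law. A toy linear pair force keeps the sketch short; the paper's actual kernels are CUDA, so this is only an illustration of the decomposition.

```python
def pair_force(ri, rj):
    """Toy pair force on particle i from particle j (linear repulsion)."""
    return tuple(ri[k] - rj[k] for k in range(3))

def forces_vertex(positions):
    """'Vertex-based': one iteration per particle; each pair force is
    computed twice, once from each endpoint (no write conflicts)."""
    n = len(positions)
    forces = []
    for i in range(n):
        fi = [0.0, 0.0, 0.0]
        for j in range(n):
            if j != i:
                fij = pair_force(positions[i], positions[j])
                for k in range(3):
                    fi[k] += fij[k]
        forces.append(fi)
    return forces

def forces_edge(positions):
    """'Edge-based': one iteration per pair, applying Newton's third law
    (half the force evaluations, but two accumulations per pair)."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            fij = pair_force(positions[i], positions[j])
            for k in range(3):
                forces[i][k] += fij[k]
                forces[j][k] -= fij[k]
    return forces
```

The trade-off the paper measures is visible even here: the vertex form does redundant force evaluations but accumulates privately, while the edge form halves the evaluations at the cost of scattered (on a GPU, atomic) accumulation.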

19.
    
We advocate domain-specific virtual processors (DSVPs) as a portability layer for expressing and executing domain-specific computational workloads on modern heterogeneous HPC architectures, with applications in quantum chemistry. Specifically, in this article we extend, generalize, and better formalize the concept of a domain-specific virtual processor as applied to scientific high-performance computing. In particular, we introduce a system-wide recursive (hierarchical) hardware encapsulation mechanism into the DSVP architecture and specify a concrete microarchitectural design of an abstract DSVP from which specialized implementations can be derived for specific scientific domains. Subsequently, we demonstrate an example of a domain-specific virtual processor specialized for numerical tensor algebra workloads, implemented in the ExaTENSOR library developed by the author with a primary focus on quantum many-body computational workloads on large-scale GPU-accelerated HPC platforms.

20.
A new conservation treatment for outdoor bronze sculptures based on inorganic-organic copolymers (ORMOCER®s) was designed which meets the special requirements defined for this field of conservation. Following a systematic research strategy, a great variety of starting compounds (main components: 3-glycidoxypropyltrimethoxysilane or γ-methacryloxypropyltrimethoxysilane), curing conditions, hardeners and additives were considered for modifying the sol-gel materials. The characterization of the lacquers included IR, GPC and viscosity measurements as well as water and epoxy titrations. Screening was carried out by evaluation of the protective effect on bronze substrates. The material properties of the coatings were adjusted to optimize adhesion and resistance against weathering. Particular emphasis was given to curing the sol-gel polymer at ambient temperatures while retaining reversibility even after aging.
