Similar Articles
10 similar articles found (search time: 147 ms)
1.
A massively parallel algorithm for analytical energy gradient calculations based on the resolution-of-the-identity Møller–Plesset perturbation (RI-MP2) method from a restricted Hartree–Fock reference is presented for geometry optimizations and one-electron property calculations of large molecules. The algorithm is designed for massively parallel computation on multicore supercomputers using a hybrid Message Passing Interface (MPI) / Open Multi-Processing (OpenMP) programming model. A two-dimensional hierarchical MP2 parallelization scheme employs a very large number of MPI processes (more than 1000) to accelerate the computationally demanding O(N^5) steps, such as the calculation of the occupied–occupied and virtual–virtual blocks of the MP2 one-particle density matrix and of the MP2 two-particle density matrices. The performance of the new parallel algorithm is assessed with test calculations on several large molecules, including the buckycatcher C60@C60H28 (144 atoms; 1820 atomic orbitals (AOs) with the def2-SVP basis set, 3888 AOs with def2-TZVP), the nanographene dimer (C96H24)2 (240 atoms; 2928 AOs with def2-SVP, 6432 AOs with cc-pVTZ), and the trp-cage protein 1L2Y (304 atoms; 2906 AOs with def2-SVP), using up to 32,768 nodes and 262,144 central processing unit (CPU) cores of the K computer. Geometry optimization results for the trp-cage protein 1L2Y at the RI-MP2/def2-SVP level on 3072 nodes (24,576 cores) of the K computer are presented and discussed to assess the efficiency of the proposed algorithm. © 2017 Wiley Periodicals, Inc.
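As a toy illustration of the two-dimensional distribution described in this abstract, the sketch below tiles the (occupied, virtual) index pairs of an O(N^5)-type loop over a rectangular process grid; the grid shape, sizes, and function names are hypothetical, not the paper's actual mapping:

```python
def block_range(n, nblocks, b):
    """Contiguous slice of range(n) owned by block b out of nblocks."""
    base, rem = divmod(n, nblocks)
    start = b * base + min(b, rem)
    return range(start, start + base + (1 if b < rem else 0))

def local_pairs(n_occ, n_virt, prow, pcol, my_row, my_col):
    """(i, a) index pairs owned by process (my_row, my_col) on a prow x pcol grid:
    occupied indices are split across grid rows, virtual indices across columns."""
    return [(i, a)
            for i in block_range(n_occ, prow, my_row)
            for a in block_range(n_virt, pcol, my_col)]
```

The union of `local_pairs` over all grid positions covers every index pair exactly once, which is the invariant any such static work distribution must satisfy.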

2.
3.
The computation of electron repulsion integrals (ERIs) is the most time-consuming step in density functional calculations with Gaussian basis sets. Many intermediate ERIs are calculated, and most must be kept in slower storage, such as cache or main memory, because of the shortage of registers, the fastest storage in a central processing unit (CPU). Moreover, the heavy register usage makes it difficult to launch enough concurrent threads on a graphics processing unit (GPU) to hide latency. We therefore propose optimizing the evaluation order of one-center ERIs to minimize the number of registers used, and calculating each ERI with three or six cooperating threads. The performance of this method is measured on a recent CPU and a GPU. The proposed approach is found to be efficient for high-angular-momentum basis functions on a GPU, where it accelerates the computation almost 4-fold. © 2014 Wiley Periodicals, Inc.
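The register-pressure idea in this abstract can be illustrated with a generic scheduling toy (not the paper's actual algorithm): for a fixed dependency graph of intermediates, different evaluation orders require different peak numbers of simultaneously live values, which is what determines register demand:

```python
def max_live(schedule, deps, outputs):
    """Peak number of simultaneously live intermediates for an evaluation order.
    schedule: topologically ordered list of node names.
    deps: dict mapping each node to the nodes it consumes.
    outputs: nodes that must stay live to the end (final results)."""
    last_use = {}
    for step, node in enumerate(schedule):
        for d in deps[node]:
            last_use[d] = step          # later steps overwrite: true last use
    live, peak = set(), 0
    for step, node in enumerate(schedule):
        live.add(node)
        peak = max(peak, len(live))
        for d in deps[node]:
            if last_use[d] == step and d not in outputs:
                live.discard(d)         # operand dead after its last use
    return peak
```

Evaluating each combining intermediate as soon as its operands are ready frees those operands early and lowers the peak register count.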

4.
KSSOLV (Kohn-Sham Solver) is a MATLAB (Matrix Laboratory) toolbox for solving Kohn-Sham density functional theory (KS-DFT) with a plane-wave basis set. In KS-DFT calculations, the most expensive part is commonly the diagonalization of the Kohn-Sham Hamiltonian within the self-consistent field (SCF) scheme. To enable a personal computer to perform medium-sized KS-DFT calculations containing hundreds of atoms, we present a hybrid CPU-GPU implementation that accelerates the iterative diagonalization algorithms in KSSOLV using the MATLAB built-in Parallel Computing Toolbox. We compare the performance of KSSOLV-GPU on three GPUs (RTX 3090, V100, and A100) against the conventional CPU implementation of KSSOLV; the numerical results demonstrate that the hybrid CPU-GPU implementation achieves a speedup of about 10-fold over sequential CPU calculations for bulk silicon systems containing up to 128 atoms.
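For readers unfamiliar with iterative diagonalization, the NumPy sketch below shows the simplest such scheme, subspace iteration with a Rayleigh-Ritz step, for the lowest few eigenpairs of a symmetric matrix; KSSOLV's actual solvers (Davidson/LOBPCG-type) are more sophisticated, and the shift strategy here is only illustrative:

```python
import numpy as np

def lowest_eigs(A, k, iters=500):
    """Subspace iteration for the k lowest eigenpairs of symmetric A.
    Iterating with B = shift*I - A reverses the spectrum, so the dominant
    subspace of B is the lowest subspace of A."""
    n = A.shape[0]
    shift = np.linalg.norm(A, ord=2)           # largest |eigenvalue| of A
    B = shift * np.eye(n) - A
    V = np.linalg.qr(np.random.default_rng(0).standard_normal((n, k)))[0]
    for _ in range(iters):
        V = np.linalg.qr(B @ V)[0]             # power step + re-orthogonalization
    H = V.T @ A @ V                            # Rayleigh-Ritz in the subspace
    w, U = np.linalg.eigh(H)
    return w, V @ U
```

Convergence is governed by the gap between the k-th and (k+1)-th eigenvalues, which is why production codes add preconditioning and adaptive subspace expansion.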

5.
The NCI approach is a modern tool for revealing noncovalent chemical interactions and is particularly attractive for describing ligand–protein binding. A custom implementation of NCI using the promolecular density is presented, designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three code versions is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version yields a 39-fold improvement on the largest instance over the best OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy-consumption measurements on both the CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
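The quantity at the heart of NCI is the reduced density gradient s = |∇ρ| / (2(3π²)^(1/3) ρ^(4/3)) evaluated on a promolecular density (a sum of frozen atomic densities). A minimal sketch, assuming a single-exponential atomic density per atom; real NCI implementations use fitted per-element multi-exponential tables:

```python
import numpy as np

RDG_CONST = 2.0 * (3.0 * np.pi**2) ** (1.0 / 3.0)

def promolecular_rdg(point, atoms):
    """Reduced density gradient at one grid point.
    atoms: list of (position, coeff, zeta) with a toy atomic density
    rho_i(r) = coeff * exp(-r / zeta) centered on each atom."""
    point = np.asarray(point, float)
    rho, grad = 0.0, np.zeros(3)
    for pos, coeff, zeta in atoms:
        d = point - np.asarray(pos, float)
        r = np.linalg.norm(d)
        rho_i = coeff * np.exp(-r / zeta)
        rho += rho_i
        if r > 1e-12:
            grad += -rho_i / zeta * d / r   # gradient of coeff*exp(-r/zeta)
    return np.linalg.norm(grad) / (RDG_CONST * rho ** (4.0 / 3.0))
```

Low-s regions between atoms are exactly the troughs NCI plots to reveal noncovalent contacts: midway between two identical atoms the density gradients cancel and s drops to zero.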

6.
We accelerated an ab initio molecular quantum Monte Carlo (QMC) calculation using general-purpose GPU (GPGPU) computing. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and executed on the GPU. The performance of a single CPU core plus GPU is compared with that of a single CPU core in double precision, yielding 23.6-fold (11.0-fold) faster calculations for single (double) precision on the GPU. The energy deviation caused by the single-precision treatment was found to be within the accuracy required for the calculation, ~10^-5 hartree. The GPU-equipped compute nodes were combined into a hybrid MPI cluster, on which we confirmed that the performance scales linearly with the number of nodes. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
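The single- versus double-precision accuracy check described above can be mimicked on a CPU by accumulating the same quantities in float32 and float64 and comparing the results; a minimal sketch (the data and threshold in the usage are illustrative, not the paper's):

```python
import numpy as np

def precision_deviation(values):
    """Absolute deviation between single- and double-precision accumulation
    of the same values: the kind of check used to justify running the GPU
    bottleneck kernel in single precision."""
    e64 = np.sum(np.asarray(values, dtype=np.float64))
    e32 = np.sum(np.asarray(values, dtype=np.float32), dtype=np.float32)
    return abs(float(e32) - e64)
```

If the deviation stays below the chemical accuracy target of the calculation (here, ~10^-5 hartree), the faster single-precision path is acceptable for that kernel.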

7.
The approach used to calculate two-electron integrals in many electronic structure packages, including the Generalized Atomic and Molecular Electronic Structure System-UK (GAMESS-UK), was designed for CPU-based compute units. We redesigned the two-electron integral algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on (ss|ss)-type integrals. The strategy is general for Fortran-based codes and uses the Accelerator compiler from Portland Group International together with GPU-based accelerators from Nvidia. The evaluation of (ss|ss)-type integrals within Hartree–Fock ab initio and density functional theory calculations is accelerated by factors of 43 and 153 on single- and quad-GPU hardware systems, respectively. The overall speedup for a single self-consistent field cycle is at least a factor of eight on a single GPU compared with a single CPU. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
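(ss|ss)-type integrals are a natural first target for GPU porting because they have a closed form: four s-type Gaussians reduce to a prefactor, two exponentials, and one Boys-function evaluation. A self-contained sketch of that textbook formula over unnormalized primitives (an illustration, not GAMESS-UK's code):

```python
import math

def boys_f0(t):
    """Boys function F0(t) = (1/2) sqrt(pi/t) erf(sqrt(t)), with F0(0) = 1."""
    if t < 1e-12:
        return 1.0 - t / 3.0                    # series limit near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))

def eri_ssss(a, A, b, B, c, C, d, D):
    """(ss|ss) electron repulsion integral over four unnormalized s Gaussians
    with exponents a, b, c, d at centers A, B, C, D (3-tuples)."""
    p, q = a + b, c + d
    P = [(a * A[i] + b * B[i]) / p for i in range(3)]   # bra product center
    Q = [(c * C[i] + d * D[i]) / q for i in range(3)]   # ket product center
    rab2 = sum((A[i] - B[i]) ** 2 for i in range(3))
    rcd2 = sum((C[i] - D[i]) ** 2 for i in range(3))
    rpq2 = sum((P[i] - Q[i]) ** 2 for i in range(3))
    pre = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    return (pre * math.exp(-a * b / p * rab2) * math.exp(-c * d / q * rcd2)
            * boys_f0(p * q / (p + q) * rpq2))
```

The formula is symmetric under exchange of the bra and ket charge distributions, a property screening and batching schemes exploit.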

8.
A low-computational-cost algorithm and its parallel implementation for periodic divide-and-conquer density-functional tight-binding (DC-DFTB) calculations are presented. The developed algorithm enables rapid computation of the interaction between atomic partial charges, the bottleneck for applications to large systems, by means of multipole-based and interpolation-based approaches for the long- and short-range contributions, respectively. The numerical errors of energies and forces with respect to the conventional Ewald-based technique can be controlled through the multipole expansion order, the level of unit-cell replication, and the interpolation grid size. The parallel performance of four evaluation schemes, combining previous approaches and the proposed one, is assessed using test calculations on a cubic water box on the K computer. The largest benchmark system consisted of 3,295,500 atoms; DC-DFTB energies and forces for this system were obtained in only a few minutes when the proposed algorithm was activated and parallelized over 16,000 nodes of the K computer. High performance on a single-node workstation was also confirmed. Beyond liquid water, the feasibility of the method was examined on solid systems such as the diamond form of carbon, the face-centered cubic form of copper, and the rock-salt form of sodium chloride. © 2017 Wiley Periodicals, Inc.
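The multipole idea for the long-range part can be illustrated at lowest order: a distant group of partial charges is replaced by its net charge at a single center, turning an O(n·m) pairwise sum into a single term. A toy sketch in atomic units (the paper's scheme uses higher multipole orders plus interpolation for the short range):

```python
import numpy as np

def direct_energy(q1, x1, q2, x2):
    """Exact Coulomb interaction energy between two groups of point charges."""
    return sum(qi * qj / np.linalg.norm(ri - rj)
               for qi, ri in zip(q1, x1) for qj, rj in zip(q2, x2))

def monopole_energy(q1, x1, q2, x2):
    """Lowest-order multipole approximation: each group collapses to its total
    charge at its charge-weighted center (assumes nonzero net charge)."""
    Q1, Q2 = sum(q1), sum(q2)
    c1 = sum(qi * ri for qi, ri in zip(q1, x1)) / Q1
    c2 = sum(qi * ri for qi, ri in zip(q2, x2)) / Q2
    return Q1 * Q2 / np.linalg.norm(c1 - c2)
```

Placing the monopole at the charge-weighted center cancels the dipole term, so the error falls off as (group size / separation)^2, which is why the approximation is reserved for well-separated cells.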

9.
Modern graphics processing units (GPUs) are flexibly programmable and have peak computational throughput significantly higher than conventional CPUs. Herein, we describe the design and implementation of PAPER, an open-source implementation of Gaussian molecular shape overlay for NVIDIA GPUs. We demonstrate one- to two-order-of-magnitude speedups on high-end commodity GPU hardware relative to a reference CPU implementation of the shape-overlay algorithm, and speedups of over one order of magnitude relative to the commercial OpenEye ROCS package. In addition, we describe the errors incurred by approximations used in common implementations of the algorithm. © 2009 Wiley Periodicals, Inc. J Comput Chem 2010
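Gaussian shape overlay rests on representing each atom as a spherical Gaussian, so the first-order overlap volume between two molecules is a sum of closed-form pairwise Gaussian overlap integrals. A sketch with a single shared width parameter for all atoms (the weight and width constants are illustrative, not PAPER's fitted values):

```python
import numpy as np

P_W = 2.7      # Gaussian atom weight (illustrative, ROCS-style convention)
ALPHA = 0.81   # shared Gaussian width parameter (illustrative)

def shape_overlap(xyz_a, xyz_b):
    """First-order Gaussian shape-overlap volume between two rigid molecules.
    For two atoms at distance d with the same width ALPHA, the pair integral is
    P_W^2 * (pi / (2*ALPHA))^{3/2} * exp(-(ALPHA/2) * d^2)."""
    total = 0.0
    for ra in xyz_a:
        for rb in xyz_b:
            d2 = float(np.sum((np.asarray(ra, float) - np.asarray(rb, float)) ** 2))
            total += P_W ** 2 * (np.pi / (2 * ALPHA)) ** 1.5 \
                     * np.exp(-(ALPHA / 2) * d2)
    return total
```

Maximizing this quantity over rigid-body rotations and translations of one molecule is the shape-overlay optimization that PAPER offloads to the GPU; each pose evaluation is an embarrassingly parallel double loop like the one above.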

10.
Graphical processing units (GPUs) are emerging in computational chemistry, with applications to Hartree–Fock (HF) methods and electron-correlation theories. However, ab initio calculations on large molecules face technical difficulties such as slow memory transfer between the central processing unit and the GPU, and the limited size of GPU memory. The divide-and-conquer (DC) method, a linear-scaling scheme that divides a total system into several fragments, can avoid these bottlenecks by separately solving the local equations of individual fragments. In addition, the resolution-of-the-identity (RI) approximation enables an effective reduction in computational cost with respect to GPU memory. The present study implemented a DC-RI-HF code on GPUs using math libraries, which guarantees compatibility with future developments of the GPU architecture. Numerical applications confirmed that the present GPU code significantly accelerates HF calculations while maintaining accuracy. © 2014 Wiley Periodicals, Inc.
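The memory saving from the RI approximation mentioned above comes from replacing the O(N^4) four-center integral tensor with three-center factors and an auxiliary metric, (ij|P) and (P|Q), combined as (ij|kl) ≈ Σ_PQ (ij|P)[J^-1]_PQ (Q|kl). A NumPy sketch of that contraction with random toy data (array names and shapes are assumptions for illustration):

```python
import numpy as np

def ri_four_center(B, J):
    """Assemble approximate four-center integrals from RI factors.
    B: three-center integrals (ij|P), flattened to shape (n*n, naux).
    J: symmetric positive-definite two-center metric (P|Q), shape (naux, naux).
    Storage is O(n^2 * naux) for the factors versus O(n^4) for the result."""
    L = np.linalg.cholesky(np.linalg.inv(J))  # J^-1 = L @ L.T
    X = B @ L                                 # "RI vectors", shape (n*n, naux)
    return X @ X.T                            # approximate (ij|kl) matrix
```

Because the result is X @ X.T, the approximate integral matrix is automatically symmetric and positive semidefinite, and in practice codes contract X directly into Coulomb and exchange builds without ever forming the n^2 x n^2 product.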
