Similar Articles
10 similar articles found (search time: 147 ms)
1.
A massively parallel algorithm for analytical energy gradient calculations based on the resolution-of-the-identity Møller–Plesset perturbation (RI-MP2) method from a restricted Hartree–Fock reference is presented for geometry optimizations and one-electron property calculations of large molecules. The algorithm is designed for massively parallel computation on multicore supercomputers using a hybrid Message Passing Interface (MPI) / Open Multi-Processing (OpenMP) programming model. A two-dimensional hierarchical MP2 parallelization scheme employs a very large number of MPI processes (more than 1000) to accelerate the computationally demanding O(N^5) steps, such as the calculation of the occupied–occupied and virtual–virtual blocks of the MP2 one-particle density matrix and of the MP2 two-particle density matrices. The performance of the new parallel algorithm is assessed with test calculations on several large molecules, including the buckycatcher C60@C60H28 (144 atoms; 1820 atomic orbitals (AOs) with the def2-SVP basis set, 3888 AOs with def2-TZVP), the nanographene dimer (C96H24)2 (240 atoms; 2928 AOs with def2-SVP, 6432 AOs with cc-pVTZ), and the trp-cage protein 1L2Y (304 atoms; 2906 AOs with def2-SVP), using up to 32,768 nodes and 262,144 central processing unit (CPU) cores of the K computer. Geometry optimization results for the trp-cage protein 1L2Y at the RI-MP2/def2-SVP level on 3072 nodes (24,576 cores) of the K computer are presented and discussed to assess the efficiency of the proposed algorithm. © 2017 Wiley Periodicals, Inc.
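As a toy illustration of the two-dimensional distribution described in this abstract, the sketch below tiles the (occupied, virtual) index pairs of an O(N^5)-type loop over a rectangular process grid; the grid shape, sizes, and function names are hypothetical, not the paper's actual mapping:

```python
def block_range(n, nblocks, b):
    """Contiguous slice of range(n) owned by block b out of nblocks."""
    base, rem = divmod(n, nblocks)
    start = b * base + min(b, rem)
    return range(start, start + base + (1 if b < rem else 0))

def local_pairs(n_occ, n_virt, prow, pcol, my_row, my_col):
    """(i, a) index pairs owned by process (my_row, my_col) on a prow x pcol grid:
    occupied indices are split across grid rows, virtual indices across columns."""
    return [(i, a)
            for i in block_range(n_occ, prow, my_row)
            for a in block_range(n_virt, pcol, my_col)]
```

The union of `local_pairs` over all grid positions covers every index pair exactly once, which is the invariant any such static work distribution must satisfy.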

2.
3.
The computation of electron repulsion integrals (ERIs) is the most time-consuming step in density functional calculations with Gaussian basis sets. Many intermediate ERIs are calculated, and most must be kept in slower storage, such as cache or main memory, because of the shortage of registers, the fastest storage in a central processing unit (CPU). Moreover, the heavy register usage makes it difficult to launch enough concurrent threads on a graphics processing unit (GPU) to hide latency. We therefore propose optimizing the evaluation order of one-center ERIs to minimize the number of registers used, and calculating each ERI with three or six cooperating threads. The performance of this method is measured on a recent CPU and a GPU. The proposed approach is found to be efficient for high-angular-momentum basis functions on a GPU, where it accelerates the computation almost 4-fold. © 2014 Wiley Periodicals, Inc.
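The register-pressure idea in this abstract can be illustrated with a generic scheduling toy (not the paper's actual algorithm): for a fixed dependency graph of intermediates, different evaluation orders require different peak numbers of simultaneously live values, which is what determines register demand:

```python
def max_live(schedule, deps, outputs):
    """Peak number of simultaneously live intermediates for an evaluation order.
    schedule: topologically ordered list of node names.
    deps: dict mapping each node to the nodes it consumes.
    outputs: nodes that must stay live to the end (final results)."""
    last_use = {}
    for step, node in enumerate(schedule):
        for d in deps[node]:
            last_use[d] = step          # later steps overwrite: true last use
    live, peak = set(), 0
    for step, node in enumerate(schedule):
        live.add(node)
        peak = max(peak, len(live))
        for d in deps[node]:
            if last_use[d] == step and d not in outputs:
                live.discard(d)         # operand dead after its last use
    return peak
```

Evaluating each combining intermediate as soon as its operands are ready frees those operands early and lowers the peak register count.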

4.
KSSOLV (Kohn-Sham Solver) is a MATLAB (Matrix Laboratory) toolbox for solving Kohn-Sham density functional theory (KS-DFT) with a plane-wave basis set. In KS-DFT calculations, the most expensive part is commonly the diagonalization of the Kohn-Sham Hamiltonian within the self-consistent field (SCF) scheme. To enable a personal computer to perform medium-sized KS-DFT calculations containing hundreds of atoms, we present a hybrid CPU-GPU implementation that accelerates the iterative diagonalization algorithms in KSSOLV using the MATLAB built-in Parallel Computing Toolbox. We compare the performance of KSSOLV-GPU on three GPUs (RTX 3090, V100, and A100) against the conventional CPU implementation of KSSOLV; the numerical results demonstrate that the hybrid CPU-GPU implementation achieves a speedup of about 10-fold over sequential CPU calculations for bulk silicon systems containing up to 128 atoms.
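For readers unfamiliar with iterative diagonalization, the NumPy sketch below shows the simplest such scheme, subspace iteration with a Rayleigh-Ritz step, for the lowest few eigenpairs of a symmetric matrix; KSSOLV's actual solvers (Davidson/LOBPCG-type) are more sophisticated, and the shift strategy here is only illustrative:

```python
import numpy as np

def lowest_eigs(A, k, iters=500):
    """Subspace iteration for the k lowest eigenpairs of symmetric A.
    Iterating with B = shift*I - A reverses the spectrum, so the dominant
    subspace of B is the lowest subspace of A."""
    n = A.shape[0]
    shift = np.linalg.norm(A, ord=2)           # largest |eigenvalue| of A
    B = shift * np.eye(n) - A
    V = np.linalg.qr(np.random.default_rng(0).standard_normal((n, k)))[0]
    for _ in range(iters):
        V = np.linalg.qr(B @ V)[0]             # power step + re-orthogonalization
    H = V.T @ A @ V                            # Rayleigh-Ritz in the subspace
    w, U = np.linalg.eigh(H)
    return w, V @ U
```

Convergence is governed by the gap between the k-th and (k+1)-th eigenvalues, which is why production codes add preconditioning and adaptive subspace expansion.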

5.
The NCI approach is a modern tool for revealing noncovalent chemical interactions and is particularly attractive for describing ligand–protein binding. A custom implementation of NCI using the promolecular density is presented, designed to leverage the computational power of NVIDIA graphics processing unit (GPU) accelerators through the CUDA programming model. The performance of three code versions is examined on a test set of 144 systems. NCI calculations are particularly well suited to the GPU architecture, which drastically reduces the computational time. On a single compute node, the dual-GPU version yields a 39-fold improvement on the largest instance over the best OpenMP parallel run (C code, icc compiler) with 16 CPU cores. Energy-consumption measurements on both the CPU and GPU NCI tests show that the GPU approach provides substantial energy savings. © 2017 Wiley Periodicals, Inc.
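The quantity at the heart of NCI is the reduced density gradient s = |∇ρ| / (2(3π²)^(1/3) ρ^(4/3)) evaluated on a promolecular density (a sum of frozen atomic densities). A minimal sketch, assuming a single-exponential atomic density per atom; real NCI implementations use fitted per-element multi-exponential tables:

```python
import numpy as np

RDG_CONST = 2.0 * (3.0 * np.pi**2) ** (1.0 / 3.0)

def promolecular_rdg(point, atoms):
    """Reduced density gradient at one grid point.
    atoms: list of (position, coeff, zeta) with a toy atomic density
    rho_i(r) = coeff * exp(-r / zeta) centered on each atom."""
    point = np.asarray(point, float)
    rho, grad = 0.0, np.zeros(3)
    for pos, coeff, zeta in atoms:
        d = point - np.asarray(pos, float)
        r = np.linalg.norm(d)
        rho_i = coeff * np.exp(-r / zeta)
        rho += rho_i
        if r > 1e-12:
            grad += -rho_i / zeta * d / r   # gradient of coeff*exp(-r/zeta)
    return np.linalg.norm(grad) / (RDG_CONST * rho ** (4.0 / 3.0))
```

Low-s regions between atoms are exactly the troughs NCI plots to reveal noncovalent contacts: midway between two identical atoms the density gradients cancel and s drops to zero.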

6.
We accelerated an ab initio molecular quantum Monte Carlo (QMC) calculation using general-purpose GPU (GPGPU) computing. Only the bottleneck part of the calculation is replaced by a CUDA subroutine and executed on the GPU. The performance of a single CPU core plus GPU is compared with that of a single CPU core in double precision, yielding 23.6-fold (11.0-fold) faster calculations for single (double) precision on the GPU. The energy deviation caused by the single-precision treatment was found to be within the accuracy required for the calculation, ~10^-5 hartree. The GPU-equipped compute nodes were combined into a hybrid MPI cluster, on which we confirmed that the performance scales linearly with the number of nodes. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
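The single- versus double-precision accuracy check described above can be mimicked on a CPU by accumulating the same quantities in float32 and float64 and comparing the results; a minimal sketch (the data and threshold in the usage are illustrative, not the paper's):

```python
import numpy as np

def precision_deviation(values):
    """Absolute deviation between single- and double-precision accumulation
    of the same values: the kind of check used to justify running the GPU
    bottleneck kernel in single precision."""
    e64 = np.sum(np.asarray(values, dtype=np.float64))
    e32 = np.sum(np.asarray(values, dtype=np.float32), dtype=np.float32)
    return abs(float(e32) - e64)
```

If the deviation stays below the chemical accuracy target of the calculation (here, ~10^-5 hartree), the faster single-precision path is acceptable for that kernel.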

7.
The approach used to calculate two-electron integrals in many electronic structure packages, including the Generalized Atomic and Molecular Electronic Structure System-UK (GAMESS-UK), was designed for CPU-based compute units. We redesigned the two-electron integral algorithm for acceleration on a graphical processing unit (GPU). We report the acceleration strategy and illustrate it on (ss|ss)-type integrals. The strategy is general for Fortran-based codes and uses the Accelerator compiler from Portland Group International together with GPU-based accelerators from Nvidia. The evaluation of (ss|ss)-type integrals within Hartree–Fock ab initio and density functional theory calculations is accelerated by factors of 43 and 153 on single- and quad-GPU hardware systems, respectively. The overall speedup for a single self-consistent field cycle is at least a factor of eight on a single GPU compared with a single CPU. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
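(ss|ss)-type integrals are a natural first target for GPU porting because they have a closed form: four s-type Gaussians reduce to a prefactor, two exponentials, and one Boys-function evaluation. A self-contained sketch of that textbook formula over unnormalized primitives (an illustration, not GAMESS-UK's code):

```python
import math

def boys_f0(t):
    """Boys function F0(t) = (1/2) sqrt(pi/t) erf(sqrt(t)), with F0(0) = 1."""
    if t < 1e-12:
        return 1.0 - t / 3.0                    # series limit near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))

def eri_ssss(a, A, b, B, c, C, d, D):
    """(ss|ss) electron repulsion integral over four unnormalized s Gaussians
    with exponents a, b, c, d at centers A, B, C, D (3-tuples)."""
    p, q = a + b, c + d
    P = [(a * A[i] + b * B[i]) / p for i in range(3)]   # bra product center
    Q = [(c * C[i] + d * D[i]) / q for i in range(3)]   # ket product center
    rab2 = sum((A[i] - B[i]) ** 2 for i in range(3))
    rcd2 = sum((C[i] - D[i]) ** 2 for i in range(3))
    rpq2 = sum((P[i] - Q[i]) ** 2 for i in range(3))
    pre = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    return (pre * math.exp(-a * b / p * rab2) * math.exp(-c * d / q * rcd2)
            * boys_f0(p * q / (p + q) * rpq2))
```

The formula is symmetric under exchange of the bra and ket charge distributions, a property screening and batching schemes exploit.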

8.
A low-computational-cost algorithm and its parallel implementation for periodic divide-and-conquer density-functional tight-binding (DC-DFTB) calculations are presented. The developed algorithm enables rapid computation of the interaction between atomic partial charges, the bottleneck for applications to large systems, by means of multipole-based and interpolation-based approaches for the long- and short-range contributions, respectively. The numerical errors of energies and forces with respect to the conventional Ewald-based technique can be controlled through the multipole expansion order, the level of unit-cell replication, and the interpolation grid size. The parallel performance of four evaluation schemes, combining previous approaches and the proposed one, is assessed using test calculations on a cubic water box on the K computer. The largest benchmark system consisted of 3,295,500 atoms; DC-DFTB energies and forces for this system were obtained in only a few minutes when the proposed algorithm was activated and parallelized over 16,000 nodes of the K computer. High performance on a single-node workstation was also confirmed. Beyond liquid water, the feasibility of the method was examined on solid systems such as the diamond form of carbon, the face-centered cubic form of copper, and the rock-salt form of sodium chloride. © 2017 Wiley Periodicals, Inc.
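The multipole idea for the long-range part can be illustrated at lowest order: a distant group of partial charges is replaced by its net charge at a single center, turning an O(n·m) pairwise sum into a single term. A toy sketch in atomic units (the paper's scheme uses higher multipole orders plus interpolation for the short range):

```python
import numpy as np

def direct_energy(q1, x1, q2, x2):
    """Exact Coulomb interaction energy between two groups of point charges."""
    return sum(qi * qj / np.linalg.norm(ri - rj)
               for qi, ri in zip(q1, x1) for qj, rj in zip(q2, x2))

def monopole_energy(q1, x1, q2, x2):
    """Lowest-order multipole approximation: each group collapses to its total
    charge at its charge-weighted center (assumes nonzero net charge)."""
    Q1, Q2 = sum(q1), sum(q2)
    c1 = sum(qi * ri for qi, ri in zip(q1, x1)) / Q1
    c2 = sum(qi * ri for qi, ri in zip(q2, x2)) / Q2
    return Q1 * Q2 / np.linalg.norm(c1 - c2)
```

Placing the monopole at the charge-weighted center cancels the dipole term, so the error falls off as (group size / separation)^2, which is why the approximation is reserved for well-separated cells.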

9.
Modern graphics processing units (GPUs) are flexibly programmable and have peak computational throughput significantly higher than conventional CPUs. Herein, we describe the design and implementation of PAPER, an open-source implementation of Gaussian molecular shape overlay for NVIDIA GPUs. We demonstrate one- to two-order-of-magnitude speedups on high-end commodity GPU hardware relative to a reference CPU implementation of the shape-overlay algorithm, and speedups of over one order of magnitude relative to the commercial OpenEye ROCS package. In addition, we describe the errors incurred by approximations used in common implementations of the algorithm. © 2009 Wiley Periodicals, Inc. J Comput Chem 2010
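Gaussian shape overlay rests on representing each atom as a spherical Gaussian, so the first-order overlap volume between two molecules is a sum of closed-form pairwise Gaussian overlap integrals. A sketch with a single shared width parameter for all atoms (the weight and width constants are illustrative, not PAPER's fitted values):

```python
import numpy as np

P_W = 2.7      # Gaussian atom weight (illustrative, ROCS-style convention)
ALPHA = 0.81   # shared Gaussian width parameter (illustrative)

def shape_overlap(xyz_a, xyz_b):
    """First-order Gaussian shape-overlap volume between two rigid molecules.
    For two atoms at distance d with the same width ALPHA, the pair integral is
    P_W^2 * (pi / (2*ALPHA))^{3/2} * exp(-(ALPHA/2) * d^2)."""
    total = 0.0
    for ra in xyz_a:
        for rb in xyz_b:
            d2 = float(np.sum((np.asarray(ra, float) - np.asarray(rb, float)) ** 2))
            total += P_W ** 2 * (np.pi / (2 * ALPHA)) ** 1.5 \
                     * np.exp(-(ALPHA / 2) * d2)
    return total
```

Maximizing this quantity over rigid-body rotations and translations of one molecule is the shape-overlay optimization that PAPER offloads to the GPU; each pose evaluation is an embarrassingly parallel double loop like the one above.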

10.
Graphical processing units (GPUs) are emerging in computational chemistry, with applications to Hartree–Fock (HF) methods and electron-correlation theories. However, ab initio calculations on large molecules face technical difficulties such as slow memory transfer between the central processing unit and the GPU, and the limited size of GPU memory. The divide-and-conquer (DC) method, a linear-scaling scheme that divides a total system into several fragments, can avoid these bottlenecks by separately solving the local equations of individual fragments. In addition, the resolution-of-the-identity (RI) approximation enables an effective reduction in computational cost with respect to GPU memory. The present study implemented a DC-RI-HF code on GPUs using math libraries, which guarantees compatibility with future developments of the GPU architecture. Numerical applications confirmed that the present GPU code significantly accelerates HF calculations while maintaining accuracy. © 2014 Wiley Periodicals, Inc.
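The memory saving from the RI approximation mentioned above comes from replacing the O(N^4) four-center integral tensor with three-center factors and an auxiliary metric, (ij|P) and (P|Q), combined as (ij|kl) ≈ Σ_PQ (ij|P)[J^-1]_PQ (Q|kl). A NumPy sketch of that contraction with random toy data (array names and shapes are assumptions for illustration):

```python
import numpy as np

def ri_four_center(B, J):
    """Assemble approximate four-center integrals from RI factors.
    B: three-center integrals (ij|P), flattened to shape (n*n, naux).
    J: symmetric positive-definite two-center metric (P|Q), shape (naux, naux).
    Storage is O(n^2 * naux) for the factors versus O(n^4) for the result."""
    L = np.linalg.cholesky(np.linalg.inv(J))  # J^-1 = L @ L.T
    X = B @ L                                 # "RI vectors", shape (n*n, naux)
    return X @ X.T                            # approximate (ij|kl) matrix
```

Because the result is X @ X.T, the approximate integral matrix is automatically symmetric and positive semidefinite, and in practice codes contract X directly into Coulomb and exchange builds without ever forming the n^2 x n^2 product.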
