Similar Articles
20 similar articles found.
1.
A parallel direct self-consistent field (SCF) algorithm for distributed memory computers is described. Key features of the algorithm are its ability to achieve a load balance dynamically, its modest memory requirements per processor, and its ability to utilize the full eightfold index permutation symmetry of the two-electron integrals despite the fact that entire copies of the Fock and density matrices are not present in each processor's local memory. The algorithm is scalable and, accordingly, has the potential to function efficiently on hundreds of processors. With the algorithm described here, a calculation employing several thousand basis functions can be carried out on a distributed memory machine with 100 or more processors each with just 4 MBytes of RAM and no disk. The Fock matrix build portion of the algorithm has been implemented on a 16-node Intel iPSC/2. Results from benchmark calculations are encouraging. The algorithm shows excellent load balance when run on 4, 8, or 16 processors and displays almost ideal speed-up in going from 4 to 16 processors. Preliminary benchmark calculations have also been carried out on an Intel Paragon. © 1995 by John Wiley & Sons, Inc.
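A minimal Python sketch of the eightfold index permutation symmetry exploited above (illustrative only; the function name is hypothetical, and production codes loop over shells rather than individual basis functions):

```python
def unique_quartets(n):
    """Enumerate symmetry-unique two-electron integral indices (ij|kl).

    The integral (ij|kl) is invariant under i<->j, k<->l, and (ij)<->(kl),
    so only quartets with i >= j, k >= l, and compound index ij >= kl need
    to be computed and stored.
    """
    quartets = []
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j          # compound index for pair (i, j)
            for k in range(i + 1):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l
                    if ij >= kl:
                        quartets.append((i, j, k, l))
    return quartets

# For n basis functions there are m = n(n+1)/2 pairs and m(m+1)/2 unique
# quartets: an ~8-fold saving over the full n**4 index space.
print(len(unique_quartets(4)))  # m = 10 pairs -> 55 unique quartets
```

Distributing these unique quartets among processors, instead of the full n⁴ index space, is what makes dynamic load balancing of the Fock build worthwhile.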

2.
We have implemented a parallel divide-and-conquer method for semiempirical quantum mechanical calculations, using the standard message passing library, the Message Passing Interface (MPI). In this parallel version, the memory needed to store the Fock and density matrix elements is distributed among the processors, which alleviates the demanding memory requirements for very large molecules. While the parallel construction of the matrix elements is straightforward, the parallel Fock matrix diagonalization is achieved via the divide-and-conquer method. Geometry optimization is also implemented with parallel gradient calculations. The code has been tested on a Cray T3E parallel computer, and impressive speedups have been achieved. Our results indicate that the divide-and-conquer method is efficient for parallel implementation. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1101–1109, 1998

3.
A parallel Fock matrix construction program for FMO‐MO method has been developed with the distributed shared memory model. To construct a large‐sized Fock matrix during FMO‐MO calculations, a distributed parallel algorithm was designed to make full use of local memory to reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% at 1,024 processors. A large FMO‐MO application on the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6‐31G level of theory, with the frontier orbitals being extracted by a Sakurai‐Sugiura eigensolver. It takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigen‐components on a PC cluster system using 256 processors. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

4.
A massively parallel algorithm for analytical energy gradient calculations based on the resolution-of-identity Møller–Plesset perturbation (RI‐MP2) method from the restricted Hartree–Fock reference is presented for geometry optimization calculations and one‐electron property calculations of large molecules. This algorithm is designed for massively parallel computation on multicore supercomputers applying the Message Passing Interface (MPI) and Open Multi‐Processing (OpenMP) hybrid parallel programming model. In this algorithm, a two‐dimensional hierarchical MP2 parallelization scheme is applied using a huge number of MPI processes (more than 1000) to accelerate the computationally demanding O(N^5) steps, such as the calculation of the occupied–occupied and virtual–virtual blocks of the MP2 one‐particle density matrix and the MP2 two‐particle density matrices. The new parallel algorithm's performance is assessed using test calculations on several large molecules such as the buckycatcher C60@C60H28 (144 atoms; 1820 atomic orbitals (AOs) for the def2‐SVP basis set and 3888 AOs for def2‐TZVP), the nanographene dimer (C96H24)2 (240 atoms; 2928 AOs for def2‐SVP and 6432 AOs for cc‐pVTZ), and the trp‐cage protein 1L2Y (304 atoms; 2906 AOs for def2‐SVP), using up to 32,768 nodes and 262,144 central processing unit (CPU) cores of the K computer. The results of geometry optimization calculations of the trp‐cage protein 1L2Y at the RI‐MP2/def2‐SVP level using 3072 nodes and 24,576 cores of the K computer are presented and discussed to assess the efficiency of the proposed algorithm. © 2017 Wiley Periodicals, Inc.

5.
Numerical errors in total energy values in large-scale Hartree–Fock calculations are discussed. To obtain total energy values within chemical accuracy, 0.01 kcal/mol, stricter numerical accuracy is required as the basis size increases. For molecules with basis sets of about 10,000 functions, such as proteins, numerical accuracy for total energy values must be retained to at least 11 digits (i.e., to the order of 1.0D-10) to keep the accumulation of numerical errors below chemical accuracy (0.01 kcal/mol). With this criterion, we performed a sensitivity analysis of numerical accuracy in Hartree–Fock calculations by uniformly replacing the last bit of the mantissa of a double-precision real number with zero in the Fock matrix construction step, the total energy calculation step, and the Fock matrix diagonalization step. Using a partial summation technique in the Fock matrix generation step, the numerical error in the total energy of molecules with basis sizes greater than 10,000 was within chemical accuracy (0.01 kcal/mol), whereas with the conventional method the numerical error exceeded chemical accuracy already at several thousand basis functions. Computation of one Fock matrix element on parallel machines can include the partial summation technique automatically, so that parallel calculation yields not only high performance but also more precise numerical solutions than the conventional sequential algorithm. We also found that the numerical error of the Householder-QR diagonalization routine is equal to or less than chemical accuracy, even with a matrix size of 10,000. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 443–454, 1999
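The partial summation idea referred to above can be illustrated by pairwise (recursive-halving) accumulation, a standard way to keep round-off growth logarithmic rather than linear; the sketch below is illustrative, not the paper's implementation:

```python
import math

def naive_sum(xs):
    """Left-to-right accumulation: rounding error grows ~linearly in len(xs)."""
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    """Recursive halving: partial sums combine operands of similar magnitude,
    so the error bound grows only ~logarithmically in len(xs)."""
    n = len(xs)
    if n <= 2:
        return sum(xs)
    mid = n // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

data = [0.1] * (2 ** 18)
exact = math.fsum(data)                 # correctly rounded reference
err_naive = abs(naive_sum(data) - exact)
err_pair = abs(pairwise_sum(data) - exact)
print(err_pair < err_naive)             # pairwise accumulation is more accurate
```

Summing each processor's partial Fock contribution separately before the final reduction has the same structure, which is why the parallel code is more accurate than the sequential one.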

6.
One of the most commonly used means to characterize potential energy surfaces of reactions and chemical systems is the Hessian calculation, whose analytic evaluation is computationally and memory demanding. A new scalable distributed data analytic Hessian algorithm is presented. Features of the distributed data parallel coupled perturbed Hartree-Fock (CPHF) are (a) columns of density-like and Fock-like matrices are distributed among processors, (b) an efficient static load balancing scheme achieves good work load distribution among the processors, (c) network communication time is minimized, and (d) numerous performance improvements in analytic Hessian steps are made. As a result, the new code has good performance which is demonstrated on large biological systems.

7.
It is shown that compression of the two-electron integrals and their indices significantly improves the efficiency of the conventional self-consistent field (SCF) algorithm for solving the Hartree-Fock equation by decreasing the Fock matrix calculation time. The improvement comes not only from the reduced integral file size, but mainly because data compression reduces or even eliminates cache conflicts in data transfer from the hard drive to main memory. Thus, the conventional SCF algorithm with data compression becomes very efficient and permits large-scale Hartree-Fock calculations. The largest Hartree-Fock calculation was performed for the RNA structure 433D from the PDB data bank, with 6080 basis functions formed from the 6-31G basis, on a workstation with a 1 GHz Alpha processor.
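One simple form of index compression is to pack the four basis-function indices of each stored integral into a single machine word, so one packed integer replaces four separate indices on disk. The sketch below shows the idea (hypothetical helper names; the paper's actual compression scheme is not reproduced here):

```python
def pack_indices(i, j, k, l, bits=16):
    """Pack four basis-function indices into a single integer.

    Storing one packed word instead of four separate indices shrinks the
    integral file and improves cache behaviour when the file is streamed
    back into memory during SCF iterations.
    """
    assert max(i, j, k, l) < (1 << bits)
    return (i << (3 * bits)) | (j << (2 * bits)) | (k << bits) | l

def unpack_indices(word, bits=16):
    """Recover (i, j, k, l) from a packed word."""
    mask = (1 << bits) - 1
    return ((word >> (3 * bits)) & mask, (word >> (2 * bits)) & mask,
            (word >> bits) & mask, word & mask)

quartet = (123, 45, 6078, 7)
assert unpack_indices(pack_indices(*quartet)) == quartet
```

With 16 bits per index, one 64-bit word addresses up to 65,536 basis functions, comfortably covering the 6080-function calculation described above.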

8.
A new parallel algorithm has been developed for calculating the analytic energy derivatives of full accuracy second order Møller‐Plesset perturbation theory (MP2). Its main projected application is the optimization of geometries of large molecules, in which noncovalent interactions play a significant role. The algorithm is based on the two‐step MP2 energy calculation algorithm developed recently and implemented into the quantum chemistry program, GAMESS. Timings are presented for test calculations on taxol (C47H51NO14) with the 6‐31G and 6‐31G(d) basis sets (660 and 1032 basis functions, 328 correlated electrons) and luciferin (C11H8N2O3S2) with aug‐cc‐pVDZ and aug‐cc‐pVTZ (530 and 1198 basis functions, 92 correlated electrons). The taxol 6‐31G(d) calculations are also performed with up to 80 CPU cores. The results demonstrate the high parallel efficiency of the program. © 2007 Wiley Periodicals, Inc. J Comput Chem, 2007

9.
A new parallel algorithm has been developed for second‐order Møller–Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interaction. Tests on a moderate number of processors (2–16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C47H51NO14) and luciferin (C11H8N2O3S2), the former with the 6‐31G* and 6‐311G** basis sets (1032 and 1484 basis functions, 164 correlated orbitals), and the latter with the aug‐cc‐pVDZ and aug‐cc‐pVTZ basis sets (530 and 1198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C130H10 (1970 basis functions, 265 correlated orbitals) completed in less than 2 h on 128 processors. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 407–413, 2006

10.
Several parallel algorithms for Fock matrix construction are described. The algorithms calculate only the unique integrals, distribute the Fock and density matrices over the processors of a massively parallel computer, use blocking techniques to construct the distributed data structures, and use clustering techniques on each processor to maximize data reuse. Algorithms based on both square and row-blocked distributions of the Fock and density matrices are described and evaluated. Variants of the algorithms are discussed that use either triple-sort or canonical ordering of integrals, and dynamic or static task clustering schemes. The algorithms are shown to adapt to screening, with communication volume scaling down with computation costs. Modeling techniques are used to characterize algorithm performance. Given the characteristics of existing massively parallel computers, all the algorithms are shown to be highly efficient for problems of moderate size. The algorithms using the row-blocked data distribution are the most efficient. © 1996 by John Wiley & Sons, Inc.
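A row-blocked distribution like the one evaluated above can be sketched as a simple owner map (illustrative Python; real implementations distribute blocks of rows together with screening metadata):

```python
def row_block_owner(row, n_rows, n_procs):
    """Map a Fock/density matrix row to the processor that stores it under a
    row-blocked distribution: contiguous blocks of rows per processor, with
    the first (n_rows % n_procs) processors holding one extra row so the
    storage stays balanced."""
    base, extra = divmod(n_rows, n_procs)
    # Processors 0..extra-1 own (base + 1) rows; the rest own base rows.
    cutoff = extra * (base + 1)
    if row < cutoff:
        return row // (base + 1)
    return extra + (row - cutoff) // base

# 10 rows on 3 processors -> block sizes 4, 3, 3
owners = [row_block_owner(r, 10, 3) for r in range(10)]
print(owners)  # [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Because each processor owns a contiguous band of rows, integral contributions touching those rows can be accumulated locally and communicated in a single block, which is why the row-blocked variants communicate least.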

11.
We present an outline of the parallel implementation of our pseudospectral electronic structure program, Jaguar, including the algorithm and timings for the Hartree–Fock and analytic gradient portions of the program. We also present the parallel algorithm and timings for our Lanczos eigenvector refinement code and demonstrate that its performance is superior to the ScaLAPACK diagonalization routines. The overall efficiency of our code increases as the size of the calculation is increased, demonstrating actual as well as theoretical scalability. For our largest test system, alanine pentapeptide [818 basis functions in the cc-pVTZ(-f) basis set], our Fock matrix assembly procedure has an efficiency of nearly 90% on a 16-processor SP2 partition. The SCF portion for this case (including eigenvector refinement) has an overall efficiency of 87% on a partition of 8 processors and 74% on a partition of 16 processors. Finally, our parallel gradient calculations have a parallel efficiency of 84% on 8 processors for porphine (430 basis functions). © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1017–1029, 1998

12.
One of the key methods in quantum chemistry, the Hartree-Fock SCF method, performs poorly on typical vector supercomputers. A significant acceleration of calculations of this type requires the development and implementation of a parallel SCF algorithm. In this paper various parallelization strategies are discussed, comparing local and global communication management as well as sequential and distributed Fock-matrix updates. Programs based on these algorithms are benchmarked on transputer networks and two IBM MIMD prototypes. The portability of the code is demonstrated by porting the initial Helios version to other operating systems such as Parallel VM/SP and PARIX. Based on the PVM libraries, a platform-independent version has been developed for heterogeneous workstation clusters as well as for massively parallel computers.

13.
Gaussian 94 is a series of electronic structure programs: an integrated system for modeling a broad range of molecular systems under a variety of conditions, performing its calculations from the basic laws of quantum chemistry. This version includes methods and algorithms for scalable massively parallel systems such as the Cray T3E supercomputer. In this study, we discuss the performance of Gaussian using large numbers of processors. In particular, we analyze the scalability of methods such as Hartree–Fock and density functional theory (DFT), including first and second derivatives. In addition, we explore scalability for CIS, MP2, and MCSCF calculations. Scalability and speedups were investigated for most of the examples with up to 64 processing elements. A single-point energy calculation (B3-LYP/6-311++G(3df,3p)) was tested with up to 512 processing elements. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1053–1063, 1998

14.
We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Sæbø–Almlöf direct integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400–500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6–8) in a matter of minutes with modern PC-based parallel computers.
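Prescreening of AO integrals is commonly done with the Cauchy–Schwarz bound; the sketch below illustrates that standard idea (it is an assumption that this is the bound used here, and the helper names are hypothetical):

```python
import math

def schwarz_screened_pairs(diag, threshold=1e-10):
    """Cauchy-Schwarz screening: |(ij|kl)| <= sqrt((ij|ij)) * sqrt((kl|kl)),
    so a function pair whose diagonal-integral bound is tiny can be skipped
    before any expensive four-index integral is computed.

    `diag[(i, j)]` holds the precomputed diagonal integrals (ij|ij).
    """
    q = {pair: math.sqrt(value) for pair, value in diag.items()}
    qmax = max(q.values())
    # Keep a pair only if it could survive pairing with the largest partner.
    return {pair for pair in q if q[pair] * qmax >= threshold}

# Toy diagonal integrals: the negligible pair (1, 1) is screened out.
diag = {(0, 0): 1.0, (1, 0): 1e-4, (1, 1): 1e-25}
print(sorted(schwarz_screened_pairs(diag)))  # [(0, 0), (1, 0)]
```

Since the bound is rigorous, screening discards only integrals guaranteed to fall below the threshold, so the transformed-integral accuracy is controlled rather than approximated.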

15.
An algorithm for massively parallel computers is developed for energy calculations of second-order Møller–Plesset (MP2) perturbation theory with numerical quadratures. Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) technologies are utilized for inter-node and intra-node parallelization, respectively. Computational tasks and intermediates are distributed across nodes by dividing quadrature points, and the distributed data are stored in memory. Benchmark calculations were performed on 256–8,192 CPU cores, and we observed speed-ups of 4,534–6,266 for 8,192 cores. A large calculation for fullerene (C60) with aug-cc-pCVTZ (3,540 basis functions) was completed in ca. 4.8 h on 8,192 cores without invoking molecular symmetry.
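The division of work by quadrature points can be sketched as a simple round-robin owner map (illustrative only; the actual MPI decomposition and in-memory layout are more involved):

```python
def distribute_quadrature_points(n_points, n_nodes):
    """Round-robin assignment of quadrature points to nodes: each node keeps
    its own points, and the intermediates built on them, resident in memory.
    This is the inter-node (MPI) half of an MPI+OpenMP hybrid scheme; within
    a node, OpenMP threads would then share the work for each local point."""
    return {node: list(range(node, n_points, n_nodes))
            for node in range(n_nodes)}

plan = distribute_quadrature_points(10, 3)
print(plan)  # {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```

Every point is owned by exactly one node, so the per-point intermediates never need to be replicated, which is what keeps the distributed data within memory.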

16.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed. This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, normal mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton–Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton–Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1716–1723, 1998
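The master/slave pattern described above can be sketched with a thread pool standing in for MPI slaves and finite differences standing in for the actual second derivatives (all names and the test function are illustrative, not the authors' code):

```python
from concurrent.futures import ThreadPoolExecutor

def hessian_master_slave(f, x, h=1e-5, workers=4):
    """Master/slave Hessian: the master enumerates unique (i, j) components
    and hands each one to a worker; workers return central finite-difference
    second derivatives and the master assembles the symmetric matrix."""
    n = len(x)

    def component(ij):
        i, j = ij
        def fp(di, dj):
            y = list(x)
            y[i] += di * h
            y[j] += dj * h
            return f(y)
        return ij, (fp(1, 1) - fp(1, -1) - fp(-1, 1) + fp(-1, -1)) / (4 * h * h)

    tasks = [(i, j) for i in range(n) for j in range(i + 1)]  # unique pairs
    hess = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:     # the "slaves"
        for (i, j), val in pool.map(component, tasks):
            hess[i][j] = hess[j][i] = val                     # symmetry
    return hess

# Quadratic test function f(x) = x0^2 + 3*x1^2: exact Hessian [[2, 0], [0, 6]]
H = hessian_master_slave(lambda v: v[0] ** 2 + 3 * v[1] ** 2, [0.5, -0.2])
print(abs(H[0][0] - 2.0) < 1e-4 and abs(H[1][1] - 6.0) < 1e-4)  # True
```

Because each (i, j) component is independent, the assembly order does not matter, which is what makes the simple master/slave distribution effective on both memory models.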

17.
A parallel full configuration interaction (FCI) code, implemented on a distributed memory MPP computer, has been modified in order to use a direct algorithm to compute the lists of mono- and biexcitations each time they are needed. We were able to perform FCI calculations on the ground state of the acetylene molecule with two different basis sets, corresponding to more than 2.5 and 5 billion Slater determinants, respectively. The calculations were performed on a Cray-T3D and a Cray-T3E, both machines having 128 processors. Performance and comparison between the two computers are reported and discussed. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 658–672, 1998

18.
An efficient vector processing algorithm for generating PK supermatrices has been developed, aimed in particular at calculations on large molecules. The algorithm utilizes the recurrence relations for electron repulsion integrals. The PK supermatrices are listed in a nearly canonical order so that the Fock matrix generation is efficiently vectorized, with no temporary ERI or PK files being used. This is effected by partitioning the basis set (atomic orbitals) into subsets of appropriate sizes; the partition approach is named the three-dimensional partial space method. A high-speed Hartree–Fock calculation including integrals and SCF procedures is achieved. © 1992 by John Wiley & Sons, Inc.

19.
We discuss issues in developing scalable parallel algorithms and focus on the distribution, as opposed to the replication, of key data structures. Replication of large data structures limits the maximum calculation size by imposing a low ratio of processors to memory. Only applications which distribute both data and computation across processors are truly scalable. The use of shared data structures that may be independently accessed by each process even in a distributed memory environment greatly simplifies development and provides a significant performance enhancement. We describe tools we have developed to support this programming paradigm. These tools are used to develop a highly efficient and scalable algorithm to perform self-consistent field calculations on molecular systems. A simple and classical strip-mining algorithm suffices to achieve an efficient and scalable Fock matrix construction in which all matrices are fully distributed. By strip mining over atoms, we also exploit all available sparsity and pave the way to adopting more sophisticated methods for summation of the Coulomb and exchange interactions. © 1996 by John Wiley & Sons, Inc.

20.
The efficient evaluation of polarizable molecular mechanics potentials on distributed memory parallel computers is discussed. The program executes at 7–10 Mflops/node on a 32-node CM-5 partition and is 19 times faster than comparable code running on a single-processor HP 9000/735. On the parallel computer, matrix inversion becomes a practical alternative to the commonly used iterative method for the calculation of induced dipole moments. The former method is useful in cases such as free-energy perturbation (FEP) simulations, which require highly accurate induced dipole moments. Matrix inversion is performed 110 times faster on the CM-5 than on the HP. We show that the accuracy needed for FEP calculations with polarization can be obtained either by matrix inversion or by performing a large number of iteration cycles to satisfy convergence tolerances of less than 10^-6 D. © 1995 by John Wiley & Sons, Inc.
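The two routes compared above, direct solution versus self-consistent iteration, can be sketched on a toy two-site polarizable model (hypothetical parameters; real codes solve a 3N×3N dipole interaction system rather than two scalars):

```python
def induced_dipoles_direct(a1, a2, t, e1, e2):
    """Solve the coupled induced-dipole equations
       mu1 = a1*(e1 + t*mu2),  mu2 = a2*(e2 + t*mu1)
    exactly: the 2x2 scalar analogue of the matrix-inversion route."""
    det = 1.0 - a1 * a2 * t * t
    return (a1 * (e1 + t * a2 * e2) / det,
            a2 * (e2 + t * a1 * e1) / det)

def induced_dipoles_iterative(a1, a2, t, e1, e2, tol=1e-10):
    """Self-consistent iteration, the usual alternative: cheap per cycle,
    but the achievable accuracy is set by the convergence tolerance."""
    mu1 = mu2 = 0.0
    while True:
        new1 = a1 * (e1 + t * mu2)
        new2 = a2 * (e2 + t * new1)
        if abs(new1 - mu1) < tol and abs(new2 - mu2) < tol:
            return new1, new2
        mu1, mu2 = new1, new2

# Toy two-site model (hypothetical parameters): both routes agree.
exact = induced_dipoles_direct(0.5, 0.3, 0.4, 1.0, -0.5)
approx = induced_dipoles_iterative(0.5, 0.3, 0.4, 1.0, -0.5)
print(max(abs(x - y) for x, y in zip(exact, approx)) < 1e-8)  # True
```

Tightening `tol` drives the iterative result toward the direct solution at the cost of more cycles, which mirrors the tolerance-versus-cycles trade-off described in the abstract.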
