Similar Literature
20 similar records found.
1.
In the context of iterative ab initio procedures such as self-consistent field (SCF) and multiconfigurational SCF, the full capacity of contemporary parallel computers can only be utilized if the disk and input/output (I/O) capacity are fully exploited before the implementation turns to an integral-direct strategy. A recent report on parallel semidirect SCF (http://www.tc.cornell.edu/er/media/1996/collabrate.html, http://www.fp.mcs.anl.gd/grand-challenges/chem/nondirect/index.html) demonstrated that super-linear speedups are achievable for algorithms that exploit scalable parallel I/O. In the I/O-intensive SCF iterations of that implementation, however, a static load balancing was employed, dictated by the initial iteration, in which integral evaluation dominates the central processing unit activity and thus determines the load balance. In the present paper we present the first implementation in which load balancing is achieved throughout the whole SCF procedure, i.e. also in the subsequent iterations. The improved scalability of our new algorithm is demonstrated in test calculations; for example, in a 63-node calculation a speedup of 104 was observed in the computation of the two-electron integral contribution to the Fock matrix. Contribution to the Björn Roos Honorary Issue. Acknowledgement. We thank J. Nieplocha for valuable help and for making the toolkit (including ChemIO) available to us. R.L. acknowledges the Intelligent Modeling Laboratory and the University of Tokyo for financial support during his stay in Japan.
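The dynamic load-balancing idea described in this abstract can be illustrated with a minimal sketch (not the authors' code): each worker repeatedly draws the next integral-batch index from a shared counter, so the distribution of work adapts in every SCF iteration rather than being frozen by the first one. All names here are hypothetical, and a multiprocessing counter stands in for the distributed shared counter (e.g., a NXTVAL-style service) that a real parallel SCF code would use.

```python
from multiprocessing import Lock, Process, Value

def next_batch(counter, lock):
    """Atomic fetch-and-increment of the shared batch counter."""
    with lock:
        batch = counter.value
        counter.value += 1
    return batch

def worker(rank, counter, lock, n_batches):
    # Workers pull batch indices until the pool is exhausted, so a fast
    # node automatically processes more batches than a slow one.
    done = 0
    batch = next_batch(counter, lock)
    while batch < n_batches:
        # ... evaluate integral batch `batch`, add its Fock contribution ...
        done += 1
        batch = next_batch(counter, lock)
    print(f"rank {rank} processed {done} batches")

if __name__ == "__main__":
    n_batches, n_workers = 1000, 4
    counter, lock = Value("i", 0), Lock()
    workers = [Process(target=worker, args=(r, counter, lock, n_batches))
               for r in range(n_workers)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```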

2.
Summary. One of the key methods in quantum chemistry, the Hartree-Fock SCF method, performs poorly on typical vector supercomputers. A significant acceleration of calculations of this type requires the development and implementation of a parallel SCF algorithm. In this paper various parallelization strategies are discussed, comparing local and global communication management as well as sequential and distributed Fock-matrix updates. Programs based on these algorithms are benchmarked on transputer networks and two IBM MIMD prototypes. The portability of the code is demonstrated by porting the initial Helios version to other operating systems such as Parallel VM/SP and PARIX. Based on the PVM libraries, a platform-independent version has been developed for heterogeneous workstation clusters as well as for massively parallel computers.

3.
Summary. The RHF and geometry optimization sections of the ab initio quantum chemistry code GAMESS have been optimized for a network of parallel microprocessors, Inmos T800-20 transputers, using both indirect and direct SCF techniques. The results indicate great scope for the implementation of such codes on small parallel computer systems: very high efficiencies were achieved, particularly for direct SCF and for geometry optimization with large basis sets. Although the work was performed on one particular parallel system, the Meiko Computing Surface, it is applicable to a wide range of parallel systems with both shared and distributed memory.

4.
We implemented our gauge-including atomic orbital (GIAO) NMR chemical shielding program on a workstation cluster, using the parallel virtual machine (PVM) message-passing system. On a modest number of nodes, we achieved close to linear speedup. This program is characterized by several novel features. It uses the new integral program of Wolinski, which calculates integrals in vectorized batches, increasing efficiency and simplifying parallelization. The self-consistent field (SCF) step includes a multi-Fock algorithm, i.e., the simultaneous calculation of several Fock matrices with the same integral set, which increases the efficiency of the direct SCF procedure. The SCF diagonalization step, which is difficult to parallelize, has been replaced by pseudodiagonalization. The latter, widely used in semiempirical programs, becomes important in ab initio calculations above a certain size, because the ultimate scaling of the diagonalization step is steeper than that of the integral computation. Examples of the calculation of NMR shieldings in large systems at the SCF level are shown. Parallelization of the density functional code is underway. © 1997 by John Wiley & Sons, Inc. J Comput Chem 18: 816-825, 1997.
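A hedged sketch of the multi-Fock idea mentioned above: in a direct SCF, the two-electron integrals are recomputed on every pass, so contracting each batch with several density matrices at once amortizes the dominant integral cost over all Fock matrices. The batch generator below is a hypothetical stand-in for on-the-fly integral evaluation, and only the Coulomb part is shown.

```python
import numpy as np

def integral_batches(n, batch=4, seed=0):
    """Hypothetical stand-in for direct-SCF integral evaluation: yields
    slabs of (pq|rs) over the first index, as if recomputed on the fly."""
    rng = np.random.default_rng(seed)   # fixed seed: repeated passes agree
    eri = rng.random((n, n, n, n))
    for p0 in range(0, n, batch):
        yield p0, eri[p0:p0 + batch]

def multi_fock(n, densities):
    """One pass over the integral batches builds the Coulomb part of ALL
    Fock matrices; one pass per density would cost len(densities) times
    as much integral work. (Exchange is omitted for brevity.)"""
    focks = [np.zeros((n, n)) for _ in densities]
    for p0, slab in integral_batches(n):
        rows = slice(p0, p0 + slab.shape[0])
        for f, d in zip(focks, densities):
            f[rows] += np.einsum("pqrs,rs->pq", slab, d)
    return focks

n = 12
rng = np.random.default_rng(1)
focks = multi_fock(n, [rng.random((n, n)) for _ in range(3)])
print([f.shape for f in focks])
```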

5.
A vector-efficient implementation of the McMurchie-Davidson algorithm for the calculation of one- and two-electron molecular integrals is presented, as available in the Cray version of the ASTERIX program system. The implementation and performance of a vector-oriented strategy for the generation and processing of the P supermatrix are also discussed. This program system has been applied to the ab initio SCF computation of the ground-state wave function of the [V10O28]6− ion, with a basis set of triple-zeta quality for the valence shell of oxygen, generating 1404 GTOs and 574 CGTOs for the complete system. The performance and the bottlenecks of the integral calculation are discussed as a function of the integral classes. Two-dimensional maps of the electrostatic potential are presented for this molecule and compared to experimental information about proton fixation.

6.
Summary. The development and implementation of a parallel direct self-consistent field (SCF) Hartree-Fock algorithm, with gradients and random phase approximation solutions, is presented. Important details of the structure of the parallel version of DISCO and preliminary results for calculations on the Concurrent Supercomputing Consortium Intel Touchstone Delta parallel computer system are reported. The data show that the algorithms are efficiently parallelized and that the throughput of a one-processor CRAY X-MP is reached with about 16 nodes on the Delta. The data also indicate that sequential code which was not a bottleneck on traditional supercomputers can become time-critical on parallel computers. This work was performed under the auspices of the Office of Basic Energy Sciences, Division of Chemical Sciences, U.S. Department of Energy, under contract DE-AC06-76RLO 1830 for Pacific Northwest Laboratory, which is operated by Battelle Memorial Institute for the U.S. Department of Energy.

7.
Transputers make it possible to build networks of parallel processors with varying topology. Owing to the architecture of the processors, it is appropriate to use the MIMD (multiple instruction, multiple data) concept of parallel computing, and the most suitable programming language is OCCAM. We investigate the use of transputer networks in computational chemistry, starting with the direct SCF method. The most time-consuming step, the calculation of the two-electron integrals, is executed in parallel: each node in the network calculates whole batches of integrals. The main program is written in OCCAM; for some large-scale arithmetic routines running on a single node, however, we used FORTRAN subroutines from standard ab initio programs to reduce the programming effort. Test calculations show that the integral calculation step can be parallelized very efficiently: we observe a speed-up of almost 8 using eight network processors. Even taking the scalar part of the SCF iteration into account, the speed-up is no less than 7.1.
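A minimal stand-in (with a placeholder workload, not real integral code) for the batch-parallel integral step just described: whole batches are farmed out to worker processes, and the per-batch contributions are summed afterwards, as Fock contributions would be.

```python
import math
from concurrent.futures import ProcessPoolExecutor

def fake_integral_batch(batch_id, size=50_000):
    """Hypothetical stand-in for evaluating one whole batch of two-electron
    integrals; returns that batch's (scalar) contribution."""
    return sum(math.sin(batch_id + k * 1e-6) for k in range(size))

if __name__ == "__main__":
    # Each worker process evaluates complete batches, mirroring the
    # one-batch-per-node farm; results are reduced on the master side.
    with ProcessPoolExecutor(max_workers=8) as pool:
        contributions = list(pool.map(fake_integral_batch, range(64)))
    print(f"accumulated contribution: {sum(contributions):.6f}")
```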

8.
The parallel implementation of a recently developed hybrid scheme for molecular dynamics (MD) simulations (Milano and Kawakatsu, J Chem Phys 2009, 130, 214106), in which self-consistent field theory (SCF) and particle models are combined, is described. Because of the peculiar formulation of the hybrid method, in which single particles interact with density fields, the most computationally expensive part of the hybrid particle-field MD simulation can be efficiently parallelized using a straightforward particle decomposition algorithm. Benchmarks are reported, including comparisons of serial MD and MD-SCF program profiles, of serial and parallel MD-SCF program profiles, and of the parallel code against the efficient MD program GROMACS 4.5.4. The benchmark results indicate that the proposed parallelization scheme is very efficient and opens the way to molecular simulations of large-scale systems at reasonable computational cost. © 2012 Wiley Periodicals, Inc.
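A toy illustration (not the MD-SCF functional) of why particle decomposition fits the hybrid particle-field scheme: each particle interacts only with a smooth density field evaluated at its own position, so force evaluation is independent per particle slice and needs no particle-particle communication. The sinusoidal field below is a placeholder.

```python
import numpy as np

def field_forces(positions, box):
    """Toy particle-field force: each particle feels the gradient of a
    smooth density field evaluated at its own position only, so no
    particle-particle terms (and no inter-rank communication) are needed."""
    k = 2.0 * np.pi / box
    return -k * np.cos(k * positions)   # -grad of a sin(k x) toy field

def parallel_forces(positions, n_ranks, box=10.0):
    # Particle decomposition: each "rank" owns an equal slice of particles
    # and computes its forces independently; the slices are concatenated.
    slices = np.array_split(positions, n_ranks)
    return np.concatenate([field_forces(s, box) for s in slices])

rng = np.random.default_rng(0)
pos = rng.random((1000, 3)) * 10.0
# The decomposed result matches the serial one exactly.
assert np.allclose(parallel_forces(pos, 4), field_forces(pos, 10.0))
```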

9.
Summary. A distributed memory programming model was used in a fully parallel implementation of the ab initio integral evaluation program ARGOS (R. Pitzer (1973) J. Chem. Phys. 58:3111) on shared memory UNIX computers. The method used is applicable to many similar problems, including derivative integral evaluation. Only a few lines of the existing sequential FORTRAN source required modification. Initial timings on several multi-processor computers are presented. A simplified version of the programming tool used is also presented, and general consideration is given to the parallel implementation of quantum chemistry algorithms. Work performed at Argonne National Laboratory under the auspices of the Division of Chemical Sciences, Office of Basic Energy Sciences, U.S. Department of Energy, under contract W-31-109-Eng-38. Pacific Northwest Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.

10.
This is the first of a series of papers on the ab initio calculation of the second, third, and fourth derivatives of the energy with respect to nuclear coordinates. Knowledge of these derivatives yields anharmonic spectroscopic constants. Here we present efficient formulae for the analytic evaluation of these derivatives for closed-shell SCF wave functions. We discuss our implementation of the third-derivative formula, in particular the integral and vectorization procedures. Applications are reported for H2S, CHOF, and HCCF.

11.
We present a parallel implementation of the integral equation formalism of the polarizable continuum model (PCM) for Hartree-Fock and density functional theory calculations of energies and of linear, quadratic, and cubic response functions. The contributions to the free energy of the solute due to the polarizable continuum have been implemented using a master-slave approach with load balancing, to ensure good scalability also on parallel machines with a slow interconnect. We demonstrate the good scaling behavior of the code through calculations of Hartree-Fock energies and linear, quadratic, and cubic response functions for a modest-sized sample molecule. We also explore the behavior of the parallel PCM code when used in conjunction with a recent scheme for storing the two-electron integrals in the memory of the different slaves in order to achieve superlinear scaling in the parallel calculations.
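A generic master-slave skeleton with load balancing of the kind this abstract refers to: slaves pull work items from a shared queue, so uneven task costs do not leave processors idle. The squaring "work" is a placeholder for a real PCM free-energy contribution; nothing here is the actual code.

```python
from multiprocessing import Process, Queue

def slave(rank, tasks, results):
    # Slaves pull work until they see the sentinel, so uneven task costs
    # are balanced automatically across processors.
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put((rank, item, item ** 2))   # placeholder "work"

if __name__ == "__main__":
    n_slaves, n_tasks = 4, 100
    tasks, results = Queue(), Queue()
    slaves = [Process(target=slave, args=(r, tasks, results))
              for r in range(n_slaves)]
    for p in slaves:
        p.start()
    for t in range(n_tasks):
        tasks.put(t)
    for _ in slaves:
        tasks.put(None)                        # one sentinel per slave
    done = [results.get() for _ in range(n_tasks)]
    for p in slaves:
        p.join()
    print(f"collected {len(done)} results from {n_slaves} slaves")
```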

12.
We report herein the implementation of a second-order Møller-Plesset perturbation theory (MP2) program on the IBM LCAP parallel supercomputers. The LCAP systems comprise IBM 308X hosts and 10 FPS-X64 attached processing units (APs). The APs are interconnected by a 512-Mbyte shared memory, which allows rapid interprocessor communication. All the computationally demanding steps of the MP2 procedure execute efficiently in parallel. Parallel computation of the two-electron integrals is accomplished by distributing the loop over shell blocks among the APs. Parallel Fock matrix formation is achieved by having each AP evaluate the contribution of its own integral sublist to the total Fock matrix; the contributions are added together on the host, and the sum is diagonalized either on the host or on a single AP. The parallel implementations of the integral transformation and of the MP2 calculation are less straightforward; in each case, the use of the shared memory is essential for an efficient implementation. Details of the implementations and performance data are given.
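The "each AP owns an integral sublist" Fock build described above reduces to a map over private partial matrices followed by a host-side sum. The sketch below uses random (p, q, value) entries as placeholder integrals; only the decomposition-plus-reduction structure is meant to be faithful.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

N = 8  # basis size of the toy Fock matrix

def partial_fock(sublist):
    """One processor's contribution: accumulate its own sublist of
    (p, q, value) integral entries into a private partial Fock matrix."""
    f = np.zeros((N, N))
    for p, q, v in sublist:
        f[p, q] += v
    return f

rng = np.random.default_rng(0)
integrals = [(int(rng.integers(N)), int(rng.integers(N)), rng.random())
             for _ in range(10_000)]
sublists = [integrals[r::4] for r in range(4)]   # round-robin sublists

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_fock, sublists))
fock = sum(partials)                              # "host-side" reduction
assert np.allclose(fock, partial_fock(integrals)) # same as a serial build
```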

13.
The fragment molecular orbital method in GAMESS is parallelized in a multithreaded OpenMP implementation combined with the MPI version of the two-level generalized distributed data interface. The energy and the analytic gradient, in the gas phase and in the polarizable continuum model of solvation, are parallelized in this hybrid three-level scheme, achieving a large reduction in memory footprint and a high parallel efficiency on Intel Xeon Phi processors. The parallel efficiency is demonstrated on the Stampede2 and Theta supercomputers using up to 2048 nodes (262 144 threads).

14.
We present an outline of the parallel implementation of our pseudospectral electronic structure program, Jaguar, including the algorithm and timings for the Hartree-Fock and analytic gradient portions of the program. We also present the parallel algorithm and timings for our Lanczos eigenvector refinement code and demonstrate that its performance is superior to that of the ScaLAPACK diagonalization routines. The overall efficiency of our code increases with the size of the calculation, demonstrating actual as well as theoretical scalability. For our largest test system, alanine pentapeptide [818 basis functions in the cc-pVTZ(-f) basis set], our Fock matrix assembly procedure has an efficiency of nearly 90% on a 16-processor SP2 partition. The SCF portion for this case (including eigenvector refinement) has an overall efficiency of 87% on a partition of 8 processors and 74% on a partition of 16 processors. Finally, our parallel gradient calculations have a parallel efficiency of 84% on 8 processors for porphine (430 basis functions). © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1017-1029, 1998.
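For orientation, a plain serial Lanczos sketch of the machinery on which such eigenvector refinement builds (this is not Jaguar's parallel algorithm): an m-step Krylov recurrence with full reorthogonalization yields a small tridiagonal matrix whose extremal eigenpairs approximate those of the original matrix.

```python
import numpy as np

def lanczos(a, v0, m):
    """m-step Lanczos with full reorthogonalization: returns Ritz values
    and Ritz vectors approximating extremal eigenpairs of symmetric a."""
    n = v0.size
    q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(m - 1)
    q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = a @ q[:, j]
        alpha[j] = q[:, j] @ w
        w -= alpha[j] * q[:, j]
        if j > 0:
            w -= beta[j - 1] * q[:, j - 1]
        w -= q[:, :j + 1] @ (q[:, :j + 1].T @ w)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            q[:, j + 1] = w / beta[j]
    t = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    theta, s = np.linalg.eigh(t)
    return theta, q @ s

rng = np.random.default_rng(0)
b = rng.random((200, 200))
a = (b + b.T) / 2.0                     # symmetric test matrix
theta, ritz = lanczos(a, rng.random(200), m=40)
print(abs(theta[-1] - np.linalg.eigvalsh(a)[-1]))  # extremal pair converges fast
```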

15.
We have implemented a parallel version of the semiempirical divide-and-conquer program DivCon previously developed in our laboratory. By utilizing a parallel machine we are able to leverage the linear scaling of the divide-and-conquer algorithm itself to perform semiempirical calculations on large biomolecules. The utility of the implementation is demonstrated with a partial geometry optimization of hen egg white lysozyme in the gas phase. Received: 5 December 1997 / Accepted: 13 February 1998 / Published online: 17 June 1998.

16.
A parallel version of the valence bond program TURTLE has been developed. In this version the calculation of the matrix elements is distributed over the processors. The implementation uses the message-passing interface (MPI) and is therefore portable. The parallel version of the program is shown to be quite efficient, with a speed-up of 55 on 64 processors. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 665-672, 2001.

17.
By means of a comparison with the CNDO method we show that the HAM/3 formalism is a legitimate version of the SCF method. It is distinguished from the standard Roothaan-Hartree-Fock version of the SCF method by an unconventional exploitation of an inherent degree of freedom.

18.
We have implemented a parallel divide-and-conquer method for semiempirical quantum mechanical calculations, using the standard message-passing library, the message passing interface (MPI). In this parallel version, the memory needed to store the Fock and density matrix elements is distributed among the processors. This memory distribution solves the problem of the demanding memory requirements for very large molecules. While the parallel construction of the matrix elements is straightforward, the parallel Fock matrix diagonalization is achieved via the divide-and-conquer method. Geometry optimization is also implemented, with parallel gradient calculations. The code has been tested on a Cray T3E parallel computer, and impressive speedups have been achieved. Our results indicate that the divide-and-conquer method is efficient for parallel implementation. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1101-1109, 1998.
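A toy sketch of the divide-and-conquer step that replaces the global Fock diagonalization: the matrix is split into subsystem blocks (here non-overlapping, for simplicity), each block is diagonalized independently (in a parallel code, one block per processor), and a density matrix is assembled from the subsystem eigenvectors. A real divide-and-conquer code uses overlapping buffered subsystems and a common Fermi level; only the block-parallel structure is shown.

```python
import numpy as np

def dnc_density(fock, blocks, n_occ):
    """Diagonalize each subsystem block of the Fock matrix independently
    and assemble a block-diagonal density matrix from the lowest
    subsystem eigenvectors; each block is an independent task."""
    density = np.zeros_like(fock)
    for idx, occ in zip(blocks, n_occ):
        _, vecs = np.linalg.eigh(fock[np.ix_(idx, idx)])
        c = vecs[:, :occ]                          # occupied subsystem MOs
        density[np.ix_(idx, idx)] = 2.0 * c @ c.T  # closed-shell block
    return density

rng = np.random.default_rng(0)
a = rng.random((12, 12))
fock = (a + a.T) / 2.0                             # symmetric toy Fock matrix
blocks = [np.arange(0, 6), np.arange(6, 12)]
dm = dnc_density(fock, blocks, n_occ=[2, 2])
print(np.trace(dm))                                # 2 * total occupied = 8.0
```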
