Similar Documents (20 records found)
1.
We developed a novel parallel algorithm for large-scale Fock matrix calculation with small locally distributed memory architectures, and named it the "RT parallel algorithm." The RT parallel algorithm incorporates integral screening, which is indispensable for reducing computing time for large-scale biological molecules. Its primary characteristic is parallel efficiency, achieved by a well-balanced reduction of both communication and computation volume. Only the density matrix data necessary for the Fock matrix calculation are communicated, and once communicated, the data are reused for as many calculations as possible. The RT parallel algorithm is a scalable method because the required memory volume does not depend on the number of basis functions. The algorithm automatically includes a partial summing technique that is indispensable for maintaining computing accuracy, and can also incorporate some conventional methods to reduce calculation times. In our analysis, the RT parallel algorithm had better performance than other methods on massively parallel processors. The RT parallel algorithm is most suitable for massively parallel and distributed Fock matrix calculations for large-scale biological molecules with thousands of basis functions or more.
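The central idea (communicate only the density-matrix blocks a worker's integral batches actually need, and reuse each block for as many batches as possible) can be illustrated with a small Python sketch. The block layout, the fake integral batches, and the helper names below are illustrative assumptions, not the RT algorithm's actual implementation.

```python
# Hypothetical sketch: each worker owns a set of (R, T) shell-pair blocks,
# fetches only the density blocks its integral batches need, and reuses every
# fetched block for all of its local batches. Bookkeeping is illustrative only.
import numpy as np

nshell, bs = 8, 4                      # shells and (uniform) block size, toy values
rng = np.random.default_rng(0)
D = rng.standard_normal((nshell * bs, nshell * bs))
D = 0.5 * (D + D.T)                    # "global" density matrix held elsewhere

def fetch_block(M, i, j):
    """Stand-in for a remote get of one shell block of a distributed matrix."""
    return M[i*bs:(i+1)*bs, j*bs:(j+1)*bs].copy()

def local_fock_contribution(my_rt_pairs):
    """Coulomb-like contribution J[r,t] += sum_{u,v} (rt|uv) D[u,v], with each
    needed D block communicated once and then reused for every batch."""
    J = np.zeros_like(D)
    cache = {}                                       # density blocks already received
    for (r, t) in my_rt_pairs:
        for u in range(nshell):
            for v in range(u + 1):
                if (u, v) not in cache:              # communicate only on first use
                    cache[(u, v)] = fetch_block(D, u, v)
                eri = rng.standard_normal((bs, bs, bs, bs)) * 1e-2  # fake (rt|uv) batch
                J[r*bs:(r+1)*bs, t*bs:(t+1)*bs] += np.einsum(
                    "rtuv,uv->rt", eri, cache[(u, v)])
    return J, len(cache)                             # blocks actually communicated

J_part, nblocks = local_fock_contribution([(0, 0), (1, 0), (1, 1)])
print("communicated density blocks:", nblocks)
```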

2.
We have implemented a parallel divide-and-conquer method for semiempirical quantum mechanical calculations. The standard message passing library, the message passing interface (MPI), was used. In this parallel version, the memory needed to store the Fock and density matrix elements is distributed among the processors. This memory distribution addresses the demanding memory requirements of very large molecules. While the parallel calculation for construction of matrix elements is straightforward, the parallel calculation of the Fock matrix diagonalization is achieved via the divide-and-conquer method. Geometry optimization is also implemented with parallel gradient calculations. The code has been tested on a Cray T3E parallel computer, and impressive speedup of calculations has been achieved. Our results indicate that the divide-and-conquer method is efficient for parallel implementation. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1101–1109, 1998
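A minimal sketch of a divide-and-conquer density build, under strong simplifying assumptions (disjoint partitions, a fixed Fermi level, a toy Fock matrix); the actual method uses buffered, overlapping subsystems and determines the Fermi level from the total electron count.

```python
# Each subsystem block of the Fock matrix is diagonalized independently, and the
# global density is assembled from Fermi-weighted subsystem densities.
import numpy as np

def dc_density(F, partitions, e_fermi, beta=10.0):
    """Assemble an approximate global density matrix from subsystem
    diagonalizations, weighting each orbital by a Fermi occupation."""
    n = F.shape[0]
    D = np.zeros((n, n))
    for idx in partitions:                       # idx: AO indices of one subsystem
        Fsub = F[np.ix_(idx, idx)]               # only this block is diagonalized
        eps, C = np.linalg.eigh(Fsub)
        occ = 2.0 / (1.0 + np.exp(beta * (eps - e_fermi)))   # Fermi occupations
        Dsub = (C * occ) @ C.T
        D[np.ix_(idx, idx)] += Dsub
    return D

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 8)); F = 0.5 * (F + F.T)
D = dc_density(F, partitions=[np.arange(0, 4), np.arange(4, 8)], e_fermi=0.0)
print(np.trace(D))   # approximate electron count for this Fermi level
```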

3.
A parallel Fock matrix construction program for the FMO‐MO method has been developed with the distributed shared memory model. To construct a large‐sized Fock matrix during FMO‐MO calculations, a distributed parallel algorithm was designed to make full use of local memory to reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% at 1,024 processors. A large FMO‐MO application on the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6‐31G level of theory, with the frontier orbitals being extracted by a Sakurai‐Sugiura eigensolver. It takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigen‐components on a PC cluster system using 256 processors. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

4.
The implementation of the HONDO program on the Loosely Coupled Array of Processors (LCAP) parallel computer system assembled in our laboratory is presented. We discuss a general strategy used to maintain a high level of compatibility between the serial version and the parallel version of the code. We report the implementation of energy-gradient calculation for SCF wavefunctions. The integral and integral derivative programs display high parallel efficiency, and so does the SCF part in the case of very large basis sets.

5.
Ab initio electronic structure approaches in which electron correlation explicitly appears have been the subject of much recent interest. Because these methods accelerate the rate of convergence of the energy and properties with respect to the size of the one-particle basis set, they promise to make accuracies of better than 1 kcal/mol computationally feasible for larger chemical systems than can be treated at present with such accuracy. The linear R12 methods of Kutzelnigg and co-workers are currently the most practical means to include explicit electron correlation. However, the application of such methods to systems of chemical interest faces severe challenges, most importantly, the still steep computational cost of such methods. Here we describe an implementation of the second-order Møller-Plesset method with terms linear in the interelectronic distances (MP2-R12) which has a reduced computational cost due to the use of two basis sets. The use of two basis sets in MP2-R12 theory was first investigated recently by Klopper and Samson and is known as the auxiliary basis set (ABS) approach. One of the basis sets is used to describe the orbitals and another, the auxiliary basis set, is used for approximating matrix elements occurring in the exact MP2-R12 theory. We further extend the applicability of the approach by parallelizing all steps of the integral-direct MP2-R12 energy algorithm. We discuss several variants of the MP2-R12 method in the context of parallel execution and demonstrate that our implementation runs efficiently on a variety of distributed memory machines. Results of preliminary applications indicate that the two-basis (ABS) MP2-R12 approach cannot be used safely when small basis sets (such as augmented double- and triple-zeta correlation consistent basis sets) are utilized in the orbital expansion. Our results suggest that basis set reoptimization or further modifications of the explicitly correlated ansatz and/or standard approximations for matrix elements are necessary in order to make the MP2-R12 method sufficiently accurate when small orbital basis sets are used. The computer code is a part of the latest public release of Sandia's Massively Parallel Quantum Chemistry program available under the GNU General Public License.

6.
We present a parallel implementation of second-order Møller-Plesset perturbation theory with the resolution-of-the-identity approximation (RI-MP2). The implementation is based on a recent improved sequential implementation of RI-MP2 within the Turbomole program package and employs the message passing interface (MPI) standard for communication between distributed memory nodes. The parallel implementation extends the applicability of canonical MP2 to considerably larger systems. Examples are presented for full geometry optimizations with up to 60 atoms and 3300 basis functions and MP2 energy calculations with more than 200 atoms and 7000 basis functions.
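The step that makes RI-MP2 cheap, assembling (ia|jb) from three-index factors instead of full four-index integrals, can be sketched as follows. Dimensions and the random B tensor are placeholders; a production code would build B from (ia|P) and (P|Q)^-1/2 integrals.

```python
# Toy closed-shell MP2 energy from RI-factored integrals (ia|jb) = sum_P B[i,a,P] B[j,b,P].
import numpy as np

no, nv, naux = 4, 10, 40
rng = np.random.default_rng(2)
B = rng.standard_normal((no, nv, naux)) * 0.05       # B[i,a,P], placeholder values
eps_o = -1.0 - np.arange(no) * 0.1                   # occupied orbital energies (toy)
eps_v = 0.5 + np.arange(nv) * 0.1                    # virtual orbital energies (toy)

iajb = np.einsum("iaP,jbP->iajb", B, B)              # (ia|jb) via the RI factorization
denom = (eps_o[:, None, None, None] + eps_o[None, None, :, None]
         - eps_v[None, :, None, None] - eps_v[None, None, None, :])
# E_MP2 = sum (ia|jb) [2 (ia|jb) - (ib|ja)] / (eps_i + eps_j - eps_a - eps_b)
e_mp2 = np.einsum("iajb,iajb->",
                  iajb * (2 * iajb - iajb.transpose(0, 3, 2, 1)), 1.0 / denom)
print("toy RI-MP2 correlation energy:", e_mp2)
```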

7.
Summary: In the last decade, developments in computer architectures have strongly influenced and motivated the evolution of algorithms for large-scale scientific computing. The unifying theme of the parallel algorithm group at CERFACS is the exploitation of vector and parallel computers in the solution of large-scale problems arising in computational science and engineering. The choice of a portable approach often leads to a loss in average performance per computer with respect to a machine-dependent implementation of the code. However, we show that, in dense as well as in sparse linear algebra, efficiency and portability can be combined. To illustrate our approach, we discuss results obtained on a wide range of shared-memory multiprocessors including the Alliant FX/80, the IBM 3090E/3VF, the IBM 3090J/6VF, the CRAY-2, and the CRAY Y-MP.

8.
Summary: A scalable integral-direct, distributed-data parallel algorithm for four-index transformation is presented. The algorithm was implemented in the context of the second-order Møller-Plesset (MP2) energy evaluation, yet it is easily adapted for other electron correlation methods where only MO integrals with two indices in the virtual orbital space are required. The major computational steps of the MP2 energy are the two-electron integral evaluation, O(N^4), and the transformation into the MO basis, O(O·N^4), where N is the number of basis functions and O the number of occupied orbitals. The associated maximal communication costs scale as O(n_Σ·O^2·V·N), where V and n_Σ denote the number of virtual orbitals and the number of symmetry-unique shells, respectively. The largest local and global memory requirements are O(N^2) for the MO coefficients and O(O·V·N) for the three-quarter-transformed integrals, respectively. Several aspects of the implementation such as symmetry treatment, integral prescreening, and the distribution of data and computational tasks are discussed. The parallel efficiency of the algorithm is demonstrated by calculations on the phenanthrene molecule, with 762 primitive Gaussians contracted to 412 basis functions. The calculations were performed on an IBM SP2 with 48 nodes. The measured wall-clock time on 48 nodes is less than 15 min for this calculation, and the speedup relative to single-node execution is estimated to be 527. This superlinear speedup is a result of exploiting both the compute power and the aggregate memory of the parallel computer. The latter reduces the number of passes through the AO integral list, and hence the operation count of the calculation. The test calculations also show that the evaluation of the two-electron integrals dominates the calculation, despite the higher scaling of the transformation step.
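A toy illustration of the stepwise four-index transformation described above, keeping only two virtual MO indices as needed for MP2. The AO integral tensor is a random placeholder; a direct, distributed implementation would generate it batch-wise and spread the quarter transformations over processes.

```python
# Four quarter transformations, each a matrix contraction over one AO index.
import numpy as np

n, no, nv = 12, 3, 5                     # AO, occupied, virtual dimensions (toy)
rng = np.random.default_rng(3)
ao = rng.standard_normal((n, n, n, n))   # (mu nu | lam sig); symmetry ignored here
Co = rng.standard_normal((n, no))        # occupied MO coefficients (placeholder)
Cv = rng.standard_normal((n, nv))        # virtual MO coefficients (placeholder)

t1 = np.einsum("mnls,mi->inls", ao, Co)      # first index  -> occupied i
t2 = np.einsum("inls,na->ials", t1, Cv)      # second index -> virtual  a
t3 = np.einsum("ials,lj->iajs", t2, Co)      # third index  -> occupied j
iajb = np.einsum("iajs,sb->iajb", t3, Cv)    # fourth index -> virtual  b
print(iajb.shape)                            # (no, nv, no, nv)
```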

9.
A parallel direct self-consistent field (SCF) algorithm for distributed memory computers is described. Key features of the algorithm are its ability to achieve a load balance dynamically, its modest memory requirements per processor, and its ability to utilize the full eightfold index permutation symmetry of the two-electron integrals despite the fact that entire copies of the Fock and density matrices are not present in each processor's local memory. The algorithm is scalable and, accordingly, has the potential to function efficiently on hundreds of processors. With the algorithm described here, a calculation employing several thousand basis functions can be carried out on a distributed memory machine with 100 or more processors each with just 4 MBytes of RAM and no disk. The Fock matrix build portion of the algorithm has been implemented on a 16-node Intel iPSC/2. Results from benchmark calculations are encouraging. The algorithm shows excellent load balance when run on 4, 8, or 16 processors and displays almost ideal speed-up in going from 4 to 16 processors. Preliminary benchmark calculations have also been carried out on an Intel Paragon. © 1995 by John Wiley & Sons, Inc.
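The dynamic load-balancing idea (workers pull the next shell-quartet batch from a shared counter rather than following a static partition) can be mimicked with threads and a lock standing in for a distributed fetch-and-add; the per-quartet "work" below is a placeholder, not real integral evaluation.

```python
# Workers repeatedly grab the next shell quartet from a shared counter, so uneven
# per-quartet cost is absorbed automatically.
import threading, itertools, time, random

nshell = 6
tasks = list(itertools.combinations_with_replacement(range(nshell), 4))  # shell quartets
counter = itertools.count()
lock = threading.Lock()
done = [0, 0, 0, 0]                                  # batches processed per worker

def worker(rank):
    while True:
        with lock:                                   # stand-in for an atomic fetch-and-add
            i = next(counter)
        if i >= len(tasks):
            return
        time.sleep(random.uniform(0, 1e-3))          # uneven per-quartet cost
        done[rank] += 1

threads = [threading.Thread(target=worker, args=(r,)) for r in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("batches per worker:", done)                   # roughly balanced despite uneven cost
```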

10.
The treatment of relativity and electron correlation on an equal footing is essential for the computation of systems containing heavy elements. Correlation treatments that are based on four‐component Dirac–Hartree–Fock calculations presently provide the most accurate, albeit costly, way of taking relativity into account. The requirement of having two expansion basis sets for the molecular wave function puts a high demand on computer resources. The treatment of larger systems is thereby often prohibited by the very large run times and files that arise in a conventional Dirac–Hartree–Fock approach. A possible solution for this bottleneck is a parallel approach that not only reduces the turnaround time but also spreads out the large files over a number of local disks. Here, we present a distributed‐memory parallelization of the program package MOLFDIR for the integral generation, Dirac–Hartree–Fock and four‐index MS transformation steps. This implementation scales best for large AO spaces and moderately sized active spaces. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 1176–1186, 2000

11.
We discuss issues in developing scalable parallel algorithms and focus on the distribution, as opposed to the replication, of key data structures. Replication of large data structures limits the maximum calculation size by imposing a low ratio of processors to memory. Only applications which distribute both data and computation across processors are truly scalable. The use of shared data structures that may be independently accessed by each process even in a distributed memory environment greatly simplifies development and provides a significant performance enhancement. We describe tools we have developed to support this programming paradigm. These tools are used to develop a highly efficient and scalable algorithm to perform self-consistent field calculations on molecular systems. A simple and classical strip-mining algorithm suffices to achieve an efficient and scalable Fock matrix construction in which all matrices are fully distributed. By strip mining over atoms, we also exploit all available sparsity and pave the way to adopting more sophisticated methods for summation of the Coulomb and exchange interactions. © 1996 by John Wiley & Sons, Inc.

12.
In this work, we present a parallel approach to complete and restricted active space second‐order perturbation theory (CASPT2/RASPT2). We also make an assessment of the performance characteristics of its particular implementation in the Molcas quantum chemistry programming package. Parallel scaling is limited by memory and I/O bandwidth instead of available cores. Significant time savings for calculations on large and complex systems can be achieved by increasing the number of processes on a single machine, as long as memory bandwidth allows, or by using multiple nodes with a fast, low‐latency interconnect. We found that parallel efficiency drops below 50% when using 8–16 cores on the shared‐memory architecture, or 16–32 nodes on the distributed‐memory architecture, depending on the calculation. This limits the scalability of the implementation to a moderate number of processes. Nonetheless, calculations that took more than 3 days on a serial machine could be performed in less than 5 h on an InfiniBand cluster, where the individual nodes were not even capable of running the calculation because of memory and I/O requirements. This ensures the continuing study of larger molecular systems by means of CASPT2/RASPT2 through the use of the aggregated computational resources offered by distributed computing systems. © 2013 Wiley Periodicals, Inc.

13.
Summary: The Bending Corrected Rotating Linear Model (BCRLM), developed by Hayes and Walker, is a simple approximation to the true multidimensional scattering problem for reactions of the type A + BC → AB + C. While the BCRLM method is simpler than methods designed to obtain accurate three-dimensional quantum scattering results, this simplicity turns out to be a major advantage for our benchmarking studies. The computer code used to obtain BCRLM scattering results is written for the most part in standard FORTRAN and has been ported to several scalar, vector, and parallel architecture computers including the IBM 3090-600J, the Cray XMP and YMP, the Ardent Titan, IBM RISC System/6000, Convex C-1, and the MIPS 2000. Benchmark results are reported for each of these machines with an emphasis on comparing the scalar, vector, and parallel performance of the standard code with minimal modifications. Detailed analysis of the mapping of the BCRLM approach onto both shared and distributed memory parallel architecture machines indicates the importance of introducing several key changes in the basic strategy and algorithms used to calculate scattering results. This analysis of the BCRLM approach provides some insights into optimal strategies for mapping three-dimensional quantum scattering methods, such as the Parker-Pack method, onto shared or distributed memory parallel computers.

14.
Summary: A distributed memory programming model was used in a fully parallel implementation of the ab initio integral evaluation program ARGOS (R. Pitzer (1973) J. Chem. Phys. 58:3111) on shared memory UNIX computers. The method used is applicable to many similar problems, including derivative integral evaluation. Only a few lines of the existing sequential FORTRAN source required modification. Initial timings on several multi-processor computers are presented. A simplified version of the programming tool used is also presented, and general consideration is given to the parallel implementation of quantum chemistry algorithms. Work performed at Argonne National Laboratory under the auspices of the Division of Chemical Sciences, Office of Basic Energy Sciences, U.S. Department of Energy under contract W-31-109-Eng-38. Pacific Northwest Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.

15.
A number of hydrogen-bond related quantities (geometries, interaction energies, dipole moments, dipole moment derivatives, and harmonic vibrational frequencies) were calculated at the Hartree-Fock, MP2, and different DFT levels for the HCN dimer and the periodic HCN crystal. The crystal calculations were performed with the Hartree-Fock program CRYSTAL92, which routinely allows an a posteriori electron-correlation correction of the Hartree-Fock lattice energy using different correlation-only functionals. Here, we have gone beyond this procedure by also calculating the electron-correlation energy correction during the structure optimization, i.e., after each CRYSTAL92 Hartree-Fock energy evaluation, the a posteriori density functional scheme was applied. In a similar manner, we optimized the crystal structure at the MP2 level, i.e., for each Hartree-Fock CRYSTAL92 energy evaluation, an MP2 correction was performed by summing the MP2 pair contributions from all HCN molecules within a specified cutoff distance. The crystal cell parameters are best reproduced at the Hartree-Fock and the nongradient-corrected HF + LDA and HF + VWN levels. The BSSE-corrected MP2 method and the HF + P91, HF + LDA, and HF + VWN methods give lattice energies in close agreement with the ZPE-corrected experimental lattice energy. The (HCN)2 dimer properties are best reproduced at the MP2 level, at the gradient-corrected DFT levels, and with the B3LYP and BHHLYP methods. © 1996 John Wiley & Sons, Inc.
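A schematic sketch of the procedure of applying the a posteriori correction inside the optimization loop rather than only to the final energy: the optimizer then sees the corrected energy surface. Both energy functions below are toy stand-ins, not CRYSTAL92 output or a real correlation functional.

```python
# Optimizing E_HF(x) alone versus E_HF(x) + E_corr(x): the corrected surface
# shifts the optimum cell parameters slightly, as described in the abstract.
import numpy as np
from scipy.optimize import minimize

def e_hf(x):                 # placeholder "Hartree-Fock" energy of two cell parameters
    return (x[0] - 4.0)**2 + (x[1] - 6.0)**2

def e_corr(x):               # placeholder a posteriori correlation correction
    return -0.1 * np.exp(-0.5 * ((x[0] - 3.8)**2 + (x[1] - 5.9)**2))

res_hf   = minimize(e_hf, x0=[4.5, 6.5])                           # HF-only structure
res_corr = minimize(lambda x: e_hf(x) + e_corr(x), x0=[4.5, 6.5])  # corrected surface
print(res_hf.x, res_corr.x)  # the corrected optimum is shifted
```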

16.
17.
A parallel algorithm for four-index transformation and MP2 energy evaluation for distributed memory parallel (MIMD) machines is presented. The underlying serial algorithm for the present parallel effort is the four-index transform. The scheme works through parallelization over AO integrals and, therefore, spreads the O(n^3) memory requirement across the processors, reducing it to O(n^2). In this sense, the scheme superimposes a shared memory architecture onto the distributed memory setup. A detailed analysis of the algorithm is presented for networks with 4, 6, 8, 10, and 12 processors employing a smaller test case of 86 contractions. Model direct MP2 calculations for systems of sizes ranging from 160 to 238 basis functions are reported for 11- and 22-processor networks. A gain of at least 40% is observed for the larger systems. © 1997 by John Wiley & Sons, Inc.

18.
The full capacity of contemporary parallel computers can, in the context of iterative ab initio procedures such as self-consistent field (SCF) and multiconfigurational SCF, only be utilized if disk and input/output (I/O) capacity are fully exploited before the implementation turns to an integral-direct strategy. In a recent report on parallel semidirect SCF (http://www.tc.cornell.edu/er/media/1996/collabrate.html, http://www.fp.mcs.anl.gd/grand-challenges/chem/nondirect/index.html) it was demonstrated that super-linear speedups are achievable for algorithms that exploit scalable parallel I/O. In the I/O-intensive SCF iterations of that implementation, however, a static load balance was employed, dictated by the initial iteration, in which integral evaluation dominates the central processing unit activity and thus determines the load balance. In the present paper we present the first implementation in which load balancing is achieved throughout the whole SCF procedure, i.e. also in subsequent iterations. The improved scalability of our new algorithm is demonstrated in some test calculations; for example, in a 63-node calculation a speedup of 104 was observed in the computation of the two-electron integral contribution to the Fock matrix. Contribution to the Björn Roos Honorary Issue. Acknowledgement: We thank J. Nieplocha for valuable help and for making the toolkit (including ChemIO) available to us. R.L. acknowledges the Intelligent Modeling Laboratory and the University of Tokyo for financial support during his stay in Japan.

19.
Three improvements on the direct self-consistent field method are proposed and tested which together increase CPU efficiency by about 50%: (i) selective storage of costly integral batches; (ii) an improved integral bound for prescreening; (iii) decomposition of the current density matrix into a linear combination of previous density matrices (for which the two-electron contributions to the Fock matrix are already available) and a remainder ΔD, which is minimized; construction of the current Fock matrix then only requires processing of the small ΔD, which enhances prescreening.
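Improvement (iii) can be sketched with a least-squares fit of the current density to previous densities; G() below is a placeholder linear map standing in for the two-electron Fock build, and it is this linearity that makes the decomposition exact while leaving only a small ΔD to be processed with prescreening.

```python
# Express D_cur as sum_i c_i D_i + dD, reuse the stored G(D_i), and rebuild only G(dD).
import numpy as np

rng = np.random.default_rng(4)
n = 10
A = rng.standard_normal((n*n, n*n)) * 0.1            # fake linear two-electron map
G = lambda D: (A @ D.ravel()).reshape(n, n)          # G is linear in D, like J/K builds

prev_D = [rng.standard_normal((n, n)) for _ in range(3)]
prev_G = [G(D) for D in prev_D]                      # stored from earlier iterations
D_cur = 0.7*prev_D[0] + 0.2*prev_D[1] + rng.standard_normal((n, n)) * 0.01

# Least-squares coefficients minimizing ||D_cur - sum_i c_i D_i||_F
M = np.stack([D.ravel() for D in prev_D], axis=1)
c, *_ = np.linalg.lstsq(M, D_cur.ravel(), rcond=None)
dD = D_cur - (M @ c).reshape(n, n)                   # small remainder to be processed

F_two = sum(ci * Gi for ci, Gi in zip(c, prev_G)) + G(dD)
print(np.allclose(F_two, G(D_cur)), np.linalg.norm(dD))   # exact by linearity; dD is small
```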

20.
Summary: One of the key methods in quantum chemistry, the Hartree-Fock SCF method, performs poorly on typical vector supercomputers. A significant acceleration of calculations of this type requires the development and implementation of a parallel SCF algorithm. In this paper various parallelization strategies are discussed, comparing local and global communication management as well as sequential and distributed Fock-matrix updates. Programs based on these algorithms are benchmarked on transputer networks and two IBM MIMD prototypes. The portability of the code is demonstrated by porting the initial Helios version to other operating systems such as Parallel VM/SP and PARIX. Based on the PVM libraries, a platform-independent version has been developed for heterogeneous workstation clusters as well as for massively parallel computers.
