Similar literature
 20 similar documents found (search time: 31 ms)
1.
We developed a new parallel density-functional canonical molecular-orbital program for large molecules based on the resolution of the identity method. In this study, all huge matrices were decomposed and saved to the distributed local memory. The routines of the analytical molecular integrals and numerical integrals of the exchange-correlation terms were parallelized using the single program multiple data method. A conventional linear algebra matrix library, ScaLAPACK, was used for matrix operations, such as diagonalization, multiplication, and inversion. Anderson's mixing method was adopted to accelerate the self-consistent field (SCF) convergence. Using this program, we calculated the canonical wavefunctions of a 306-residue protein, insulin hexamer (26,790 orbitals), and a 133-residue protein, interleukin (11,909 orbitals) by the direct-SCF method. For insulin hexamer, the total parallelization efficiency of the first SCF iteration was estimated to be 82% using 64 Itanium 2 processors connected at 3.2 GB/s (SGI Altix3700), and the calculation successfully converged at the 17th SCF iteration. By adopting the update method, the computational time of the first and the final SCF loops was 229 min and 156 min, respectively. The whole computational time including the calculation before the SCF loop was 2 days and 17 h. This study put the calculations of the canonical wavefunction of 30,000 orbitals to practical use.
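
The abstract names Anderson's mixing method but gives no details of its use. As a rough illustration only, the sketch below applies depth-1 Anderson mixing to a small linear fixed-point problem x = g(x). The function names `anderson_mix` and `toy_map`, the damping factor `beta`, and the 2x2 test map are all assumptions made for this example; in an SCF code the vector would be a density or Fock matrix, and the program described above may use a different depth or variant.

```python
def anderson_mix(g, x0, beta=0.5, tol=1e-10, max_iter=500):
    """Depth-1 Anderson mixing for x = g(x). Returns (x, evaluations of g)."""
    x = list(x0)
    x_prev, r_prev = None, None
    for it in range(1, max_iter + 1):
        gx = g(x)
        r = [gi - xi for gi, xi in zip(gx, x)]  # residual g(x) - x
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            return x, it
        if r_prev is None:
            # First step: no history yet, so this reduces to simple mixing.
            x_prev, r_prev = x, r
        # theta minimizes || r - theta * (r - r_prev) || in the 2-norm.
        dr = [a - b for a, b in zip(r, r_prev)]
        denom = sum(d * d for d in dr)
        theta = sum(a * d for a, d in zip(r, dr)) / denom if denom > 0.0 else 0.0
        # Extrapolated input and residual built from the last two iterates.
        x_bar = [a - theta * (a - b) for a, b in zip(x, x_prev)]
        r_bar = [a - theta * (a - b) for a, b in zip(r, r_prev)]
        x_prev, r_prev = x, r
        x = [a + beta * b for a, b in zip(x_bar, r_bar)]
    raise RuntimeError("Anderson mixing did not converge")


def toy_map(x):
    """Hypothetical contractive linear map standing in for one SCF cycle."""
    return [0.5 * x[0] + 0.2 * x[1] + 1.0,
            0.2 * x[0] + 0.4 * x[1] + 2.0]


solution, n_evals = anderson_mix(toy_map, [0.0, 0.0])
print(solution, n_evals)
```

For this toy map the exact fixed point is (1/0.26, 1.2/0.26), which the iteration reproduces to within the residual tolerance.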

2.
A parallel direct self-consistent field (SCF) algorithm for distributed memory computers is described. Key features of the algorithm are its ability to achieve a load balance dynamically, its modest memory requirements per processor, and its ability to utilize the full eightfold index permutation symmetry of the two-electron integrals despite the fact that entire copies of the Fock and density matrices are not present in each processor's local memory. The algorithm is scalable and, accordingly, has the potential to function efficiently on hundreds of processors. With the algorithm described here, a calculation employing several thousand basis functions can be carried out on a distributed memory machine with 100 or more processors each with just 4 MBytes of RAM and no disk. The Fock matrix build portion of the algorithm has been implemented on a 16-node Intel iPSC/2. Results from benchmark calculations are encouraging. The algorithm shows excellent load balance when run on 4, 8, or 16 processors and displays almost ideal speed-up in going from 4 to 16 processors. Preliminary benchmark calculations have also been carried out on an Intel Paragon. © 1995 by John Wiley & Sons, Inc.
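
The eightfold index permutation symmetry mentioned above means that, for real basis functions, (ij|kl) is unchanged under i↔j, k↔l, and (ij)↔(kl). The sketch below is a generic illustration, not the algorithm of the paper: it enumerates one canonical representative per symmetry class and checks the count against the closed-form formula. The function names are hypothetical.

```python
def unique_eri_count(n):
    """Number of symmetry-unique two-electron integrals for n real basis functions."""
    pairs = n * (n + 1) // 2          # unique index pairs (i >= j)
    return pairs * (pairs + 1) // 2   # unique pairs of pairs (ij >= kl)


def canonical_quartets(n):
    """Yield one (i, j, k, l) per class, with i >= j, k >= l and pair (ij) >= pair (kl)."""
    for i in range(n):
        for j in range(i + 1):
            for k in range(i + 1):
                # When k == i the second pair may not exceed the first, so l <= j.
                l_max = j if k == i else k
                for l in range(l_max + 1):
                    yield (i, j, k, l)


n = 4
enumerated = sum(1 for _ in canonical_quartets(n))
print(enumerated, unique_eri_count(n), n ** 4)
```

For 100 basis functions the formula gives 12,753,775 unique integrals out of 10^8 index combinations, a reduction by a factor of about 7.8 that approaches 8 as the basis grows.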

3.
We implemented our gauge-including atomic orbital (GIAO) NMR chemical shielding program on a workstation cluster, using the parallel virtual machine (PVM) message-passing system. On a modest number of nodes, we achieved close to linear speedup. This program is characterized by several novel features. It uses the new integral program of Wolinski that calculates integrals in vectorized batches, increases efficiency, and simplifies parallelization. The self-consistent field (SCF) step includes a multi-Fock algorithm, i.e., the simultaneous calculation of several Fock matrices with the same integral set, increasing the efficiency of the direct SCF procedure. The SCF diagonalization step, which is difficult to parallelize, has been replaced by pseudodiagonalization. The latter, widely used in semiempirical programs, becomes important in ab initio type calculations above a certain size, because the ultimate scaling of the diagonalization step is steeper than that of integral computation. Examples of the calculation of the NMR shieldings in large systems at the SCF level are shown. Parallelization of the density functional code is underway. © 1997 by John Wiley & Sons, Inc. J Comput Chem 18: 816–825, 1997

4.
We describe the implementation of a parallel, in-core, integral-direct Hartree-Fock and density functional theory code for the efficient calculation of Hartree-Fock wave functions and density functional theory. The algorithm is based on a parallel master-slave algorithm, and the two-electron integrals calculated by a slave are stored in available local memory. To ensure the greatest computational savings, the master node keeps track of all integral batches stored on the different slaves. The code can reuse undifferentiated two-electron integrals both in the wave function optimization and in the evaluation of second-, third-, and fourth-order molecular properties. Superlinear scaling is achieved in a series of test examples, with speedups of up to 55 achieved for calculations run on medium-sized molecules on 16 processors with respect to the time used on a single processor.

5.
Summary: The exact solution of the time-dependent Schrödinger equation is obtained using a parallel implementation of the standard grid techniques. Most of the operations involved in this calculation may be executed concurrently for each of the grid points. For the few operations which may not be executed concurrently, we have implemented parallel algorithms. In our two-dimensional implementation on the Connection Machine, we have obtained optimal speed-up; that is, by using N processors we achieve a speed-up which is proportional to N. In addition to the discussion of our 2-dimensional implementation, we shall discuss our proposed 3-dimensional implementation of these grid techniques.

6.
A parallel algorithm for solving the coupled-perturbed MCSCF (CPMCSCF) equations and analytic nuclear second derivatives of CASSCF wave functions is presented. A parallel scheme for evaluating derivative integrals and their subsequent use in constructing other derivative quantities is described. The task of solving the CPMCSCF equations is approached using a parallelization scheme that partitions the electronic hessian matrix over all processors as opposed to simple partitioning of the 3N solution vectors among the processors. The scalability of the current algorithm, up to 128 processors, is demonstrated. Using three test cases, results indicate that the parallelization of derivative integral evaluation through a simple scheme is highly effective regardless of the size of the basis set employed in the CASSCF energy calculation. Parallelization of the construction of the MCSCF electronic hessian during solution of the CPMCSCF equations varies quantitatively depending on the nature of the hessian itself, but is highly scalable in all cases.

7.
We present an outline of the parallel implementation of our pseudospectral electronic structure program, Jaguar, including the algorithm and timings for the Hartree–Fock and analytic gradient portions of the program. We also present the parallel algorithm and timings for our Lanczos eigenvector refinement code and demonstrate that its performance is superior to the ScaLAPACK diagonalization routines. The overall efficiency of our code increases as the size of the calculation is increased, demonstrating actual as well as theoretical scalability. For our largest test system, alanine pentapeptide [818 basis functions in the cc-pVTZ(-f) basis set], our Fock matrix assembly procedure has an efficiency of nearly 90% on a 16-processor SP2 partition. The SCF portion for this case (including eigenvector refinement) has an overall efficiency of 87% on a partition of 8 processors and 74% on a partition of 16 processors. Finally, our parallel gradient calculations have a parallel efficiency of 84% on 8 processors for porphine (430 basis functions). © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1017–1029, 1998

8.
Parallelization of the SCF method for closed-shell molecules on the highly parallel transputer-based system PARAM is described. The parallelization has been implemented on three different hardware and software environments: (1) a network of 64 bare transputers; (2) configuration 1 plus a back-end file system (BFS); and (3) configuration 2 with one INTEL i860 processor. The evaluation of electron repulsion integrals (ERIs) and setting up of the Fock matrix is carried out in parallel on 64 nodes using minimal communication strategies. A good load balance is achieved for ERI evaluation with the help of bounds, local symmetry features, and the shell concept, as well as a data randomization technique, resulting in almost linear speedup (for ERI evaluation). In configurations 2 and 3, BFS is used for parallel storage and retrieval of ERIs. Further, in configuration 3, matrix operations are implemented as remote procedure calls on the i860 processor. Routine techniques of level shifting and extrapolation are used for accelerating SCF convergence. The resulting package, INDMOL, has been tested for some randomly selected molecules having up to 255 contractions. Using configuration 3, a factor of 2 to 5 in computation time is gained over configuration 1 for the systems for which the ERIs cannot be stored in the distributed core memory. In summary, a heterogeneous system, as in configuration 3, can indeed be optimally exploited for programming the peculiar, diverse requirements of the SCF procedure. © 1993 John Wiley & Sons, Inc.

9.
Several parallel algorithms for Fock matrix construction are described. The algorithms calculate only the unique integrals, distribute the Fock and density matrices over the processors of a massively parallel computer, use blocking techniques to construct the distributed data structures, and use clustering techniques on each processor to maximize data reuse. Algorithms based on both square and row-blocked distributions of the Fock and density matrices are described and evaluated. Variants of the algorithms are discussed that use either triple-sort or canonical ordering of integrals, and dynamic or static task clustering schemes. The algorithms are shown to adapt to screening, with communication volume scaling down with computation costs. Modeling techniques are used to characterize algorithm performance. Given the characteristics of existing massively parallel computers, all the algorithms are shown to be highly efficient for problems of moderate size. The algorithms using the row-blocked data distribution are the most efficient. © 1996 by John Wiley & Sons, Inc.

10.
A parallel algorithm for four-index transformation and MP2 energy evaluation for distributed memory parallel (MIMD) machines is presented. The underlying serial algorithm for the present parallel effort is the four-index transform. The scheme works through parallelization over AO integrals and, therefore, spreads the O(n^3) memory requirement across the processors, reducing it to O(n^2). In this sense, the scheme superimposes a shared memory architecture onto the distributed memory setup. A detailed analysis of the algorithm is presented for networks with 4, 6, 8, 10, and 12 processors employing a smaller test case of 86 contractions. Model direct MP2 calculations for systems of sizes ranging from 160 to 238 basis functions are reported for 11- and 22-processor networks. A gain of at least 40% is observed for the larger systems. © 1997 by John Wiley & Sons, Inc.

11.
A time-saving way of programming an SCF method for molecules (SCF-MO-LCGO method) is proposed, which is based on an economic way of handling the two-electron multicenter integrals.

12.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed. This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, normal mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton–Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton–Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1716–1723, 1998
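
The speedups quoted above can be converted into parallel efficiencies (speedup divided by processor count). The sketch below does that arithmetic and also computes the Karp-Flatt experimentally determined serial fraction, a standard diagnostic that the abstract itself does not report. The function names and the `reported` table layout are assumptions for this example; the numbers are taken from the abstract.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


def karp_flatt(speedup, processors):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# (label, speedup, processors) as reported in the abstract above.
reported = [
    ("Hessian, Gln-tRNA", 6.8, 24),
    ("Hessian, GlnRS", 11.2, 54),
    ("Newton-Raphson, Gln-tRNA", 6.3, 30),
    ("Newton-Raphson, GlnRS", 12.0, 62),
]

for label, s, p in reported:
    eff = parallel_efficiency(s, p)
    frac = karp_flatt(s, p)
    print(f"{label}: efficiency {eff:.1%}, serial fraction {frac:.3f}")
```

The efficiencies come out at roughly 19% to 28%, which is consistent with a master/slave scheme in which a single master assembles all results.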

13.
We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebø-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400–500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6–8) in a matter of minutes with modern PC-based parallel computers.

14.
An unconventional SCF method for calculations on large molecules with more than 100 basis functions is described. Storage problems which arise in conventional SCF schemes when storing more than 10^7 integrals are avoided by repeated calculation of integrals. The resulting increase in computational times is kept at a reasonable level by (a) improving the initial guess, (b) accelerating convergence, (c) employing a recursive construction of the Fock matrix, and (d) eliminating insignificant integrals from the calculation by a density-weighted cutoff criterion. Sample calculations show that, compared with conventional SCF calculations, computational times increase by 25%–75% depending on the basis set and the shape of the molecule.

15.
In recent years, computer simulations have become increasingly useful when trying to understand the complex dynamics of biochemical networks, particularly in stochastic systems. In such situations stochastic simulation is vital in gaining an understanding of the inherent stochasticity present, as these models are rarely analytically tractable. However, a stochastic approach can be computationally prohibitive for many models. A number of approximations have been proposed that aim to speed up stochastic simulations. However, the majority of these approaches are fundamentally serial in terms of central processing unit (CPU) usage. In this paper, we propose a novel simulation algorithm that utilises the potential of multi-core machines. This algorithm partitions the model into smaller sub-models. These sub-models are then simulated, in parallel, on separate CPUs. We demonstrate that this method is accurate and can speed-up the simulation by a factor proportional to the number of processors available.

16.
In order to calculate the one- and two-electron, two-center integrals over Slater-type orbitals with non-integer n, use is made of elliptical coordinates for the monoelectronic, hybrid, and Coulomb integrals. For the exchange integrals, the atomic orbitals are translated to a common center. The final integration is performed by Gaussian quadrature. As an example, an SCF ab initio calculation is performed for the LiH molecule, both with integer and non-integer principal quantum number.

17.
A new algorithm for parallel calculation of the second derivatives (Hessian) of the conformational energy function of biomolecules in internal coordinates is proposed. The basic scheme of this algorithm is the division of the entire calculation of the Hessian matrix (called "task") into subtasks and the optimization of the assignment of processors to each subtask by considering both the load balancing and reduction of the communication cost. A genetic algorithm is used for this optimization considering the dependencies between subtasks. We applied this method to a glutaminyl transfer RNA (Gln-tRNA) molecule for which the scalability of our previously developed parallel algorithm decreased significantly when a large number of processors was used. The speedup for the calculation was 32.6 times with 60 processors, which is considerably better than the speedup for our previously reported parallel algorithm. The elapsed time for the calculation of subtasks, data sending, and data receiving was analyzed, and the effect of the optimization using the genetic algorithm is discussed.

18.
We present here a set of algorithms that completely rewrites the Hartree–Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS‐US, GAMESS‐UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6‐31G Self‐Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one‐ and two‐electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc.

19.
Ab initio LCAO-MO-SCF Gaussian basis function calculations have been performed for chlorpromazine and promazine. By prescreening for the size of the integrals before calculation, it was only necessary to calculate 12 million out of the possible 38.5 million integrals for chlorpromazine. By a novel procedure of processing the integral tapes for the SCF it was possible to cut down significantly on the amount of time for the SCF. The SCF calculations converged smoothly for both promazine and chlorpromazine. There is a sizeable energy gap between the energy of the highest occupied molecular orbitals in these molecules (which is of the order of −0.3 a.u.) and the lowest unoccupied molecular orbital (which is of the order of +0.15 a.u.). The gross atomic populations of chlorpromazine and promazine resemble each other and differ only somewhat on the carbon atom to which the substituent is attached and the carbons and their hydrogens adjacent to it.
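
The prescreening figures above invite a quick consistency check. The abstract does not state the basis-set size or how the 38.5 million total was counted; the sketch below simply assumes it is the number of permutationally unique two-electron integrals and finds the basis size whose count is closest. The function names and the search limit are hypothetical.

```python
def unique_eri_count(n):
    """Symmetry-unique two-electron integrals for n real basis functions."""
    pairs = n * (n + 1) // 2
    return pairs * (pairs + 1) // 2


def closest_basis_size(total, n_max=1000):
    """Basis size whose unique-integral count is nearest to `total`."""
    return min(range(1, n_max + 1), key=lambda n: abs(unique_eri_count(n) - total))


possible = 38_500_000    # "possible" integrals quoted in the abstract
computed = 12_000_000    # integrals actually calculated after prescreening

n_basis = closest_basis_size(possible)
kept_fraction = computed / possible
print(n_basis, unique_eri_count(n_basis), f"{kept_fraction:.1%}")
```

Under that assumption the total matches 132 basis functions (38,531,031 unique integrals), and prescreening left about 31% of the integrals to be computed.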

20.
Cluster calculations which model chemisorption on a surface are often composed of substrate atoms arranged in a periodic manner. This pseudo-lattice symmetry of a cluster is used to reduce the number of 2-electron integrals computed in an SCF calculation by evaluating only unique integrals identified by lattice displacement vectors. The method, without using any explicit symmetry, is shown to be competitive with calculations which utilize point group symmetry. It is also demonstrated that the pseudo-lattice method markedly reduces the number of 2-electron integrals in multi-layer clusters which have little or no symmetry.
