Similar Documents
20 similar documents found (search time: 15 ms)
1.
We have developed a parallel version of our pseudospectral localized Møller–Plesset electronic structure code. We present timings for molecules up to 1010 basis functions and parallel speedup for molecules in the range of 260–658 basis functions. We demonstrate that the code is scalable; that is, a larger number of nodes can be efficiently utilized as the size of the molecule increases. By taking advantage of the available distributed memory and disk space of a scalable parallel computer, the parallel code can calculate LMP2 energies of molecules too large to be done on workstations. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1030–1038, 1998

2.
We developed a novel parallel algorithm for large-scale Fock matrix calculation on architectures with small locally distributed memory, and named it the "RT parallel algorithm." The RT parallel algorithm builds in integral screening, which is indispensable for reducing computing times for large-scale biological molecules. Its primary characteristic is parallel efficiency, achieved by a balanced reduction of both communication and computation volume: only the density matrix data necessary for the Fock matrix calculation are communicated, and data, once communicated, are reused for as many calculations as possible. The RT parallel algorithm is a scalable method because the required memory volume does not depend on the number of basis functions. The algorithm automatically includes a partial-summing technique that is indispensable for maintaining computing accuracy, and it can also incorporate conventional methods for reducing calculation times. In our analysis, the RT parallel algorithm performed better than other methods on massively parallel processors. It is most suitable for massively parallel, distributed Fock matrix calculations on large-scale biological molecules with thousands of basis functions or more.
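The communicate-once-and-reuse idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' code: the block keys, the `schwarz_bound` table, and the threshold are all hypothetical stand-ins for the real screening machinery.

```python
# Hypothetical sketch: communicate a density-matrix block only when an
# integral-screening bound says its Fock contribution matters, and cache
# blocks so each one is transferred at most once.

def build_fock_contributions(density_blocks, schwarz_bound, threshold=1e-10):
    """Return the block keys surviving screening and the transfer count."""
    cache = {}                      # blocks already "communicated"
    transfers = 0
    surviving = []
    for (i, j), bound in schwarz_bound.items():
        if bound < threshold:       # screened out: no communication, no work
            continue
        if (i, j) not in cache:     # communicate once, then reuse
            cache[(i, j)] = density_blocks[(i, j)]
            transfers += 1
        surviving.append((i, j))
    return surviving, transfers

# Toy data: four blocks, two of which fall below the screening threshold.
density = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.9}
bounds  = {(0, 0): 1e-2, (0, 1): 1e-12, (1, 0): 1e-12, (1, 1): 5e-3}
kept, n_transfers = build_fock_contributions(density, bounds)
```

Screening thus reduces communication volume and computation together, which is the balance the abstract attributes to the RT algorithm.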

3.
We have implemented a parallel divide-and-conquer method for semiempirical quantum mechanical calculations. The standard message passing library, the message passing interface (MPI), was used. In this parallel version, the memory needed to store the Fock and density matrix elements is distributed among the processors. This distribution removes the prohibitive memory requirement for very large molecules. While the parallel construction of matrix elements is straightforward, the parallel Fock matrix diagonalization is achieved via the divide-and-conquer method. Geometry optimization is also implemented with parallel gradient calculations. The code has been tested on a Cray T3E parallel computer, and impressive speedups have been achieved. Our results indicate that the divide-and-conquer method is efficient for parallel implementation. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1101–1109, 1998
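Distributing the Fock and density matrices across processors, as described above, amounts to assigning each matrix row an owner rank. A toy partitioner (hypothetical, not the authors' MPI code) makes the idea concrete:

```python
# Illustrative sketch: distribute the rows of the Fock/density matrices
# across MPI-like ranks so that no single processor holds the full matrix.

def row_partition(n_basis, n_procs):
    """Assign each matrix row to a rank, as evenly as possible."""
    base, extra = divmod(n_basis, n_procs)
    owner, start = [], 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)   # first ranks get one extra
        owner.append(range(start, start + count))
        start += count
    return owner

rows = row_partition(10, 3)   # 10 basis functions over 3 processors
```

Each rank then allocates only its own rows, so per-processor memory shrinks as processors are added, which is what lets the method reach very large molecules.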

4.
A parallel Fock matrix construction program for FMO‐MO method has been developed with the distributed shared memory model. To construct a large‐sized Fock matrix during FMO‐MO calculations, a distributed parallel algorithm was designed to make full use of local memory to reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% at 1,024 processors. A large FMO‐MO application on the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6‐31G level of theory, with the frontier orbitals being extracted by a Sakurai‐Sugiura eigensolver. It takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigen‐components on a PC cluster system using 256 processors. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

5.
We present the parallelization of a quantum-chemical tree-code for linear-scaling computation of the Coulomb matrix. Load balance is achieved with equal time partitioning, a measurement-based domain-decomposition algorithm that exploits the small variation of the density between self-consistent-field cycles. The efficiency of equal time partitioning is illustrated by several tests involving both finite and periodic systems; it delivers 91%-98% efficiency with 128 processors in the most time-consuming part of the Coulomb matrix calculation. The current parallel quantum-chemical tree-code delivers 63%-81% overall efficiency on 128 processors with fine-grained parallelism (fewer than two heavy atoms per processor).
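The measurement-based decomposition described above can be sketched simply: times measured in the previous SCF cycle are used to cut the ordered task list into contiguous chunks of roughly equal total time. This is an illustrative greedy version, not the paper's implementation:

```python
# Minimal sketch of "equal time" partitioning: choose contiguous chunk
# boundaries so each processor receives roughly equal measured work.

def equal_time_partition(task_times, n_procs):
    """Split tasks (kept in order) into n_procs chunks of ~equal total time."""
    target = sum(task_times) / n_procs
    chunks, current, acc = [], [], 0.0
    for i, t in enumerate(task_times):
        current.append(i)
        acc += t
        # close the chunk once it reaches the target, keeping tasks in reserve
        # so every remaining processor still gets a chunk
        if acc >= target and len(chunks) < n_procs - 1:
            chunks.append(current)
            current, acc = [], 0.0
    chunks.append(current)
    return chunks

times = [2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0]   # measured last cycle
parts = equal_time_partition(times, 3)
```

Because the density (and hence the per-task cost) changes little between SCF cycles, last cycle's measurements remain a good predictor for the next cycle's load.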

6.
A parallel direct self-consistent field (SCF) algorithm for distributed memory computers is described. Key features of the algorithm are its ability to achieve a load balance dynamically, its modest memory requirements per processor, and its ability to utilize the full eightfold index permutation symmetry of the two-electron integrals despite the fact that entire copies of the Fock and density matrices are not present in each processor's local memory. The algorithm is scalable and, accordingly, has the potential to function efficiently on hundreds of processors. With the algorithm described here, a calculation employing several thousand basis functions can be carried out on a distributed memory machine with 100 or more processors each with just 4 MBytes of RAM and no disk. The Fock matrix build portion of the algorithm has been implemented on a 16-node Intel iPSC/2. Results from benchmark calculations are encouraging. The algorithm shows excellent load balance when run on 4, 8, or 16 processors and displays almost ideal speed-up in going from 4 to 16 processors. Preliminary benchmark calculations have also been carried out on an Intel Paragon. © 1995 by John Wiley & Sons, Inc.
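The eightfold permutation symmetry mentioned above means only canonical index quartets need to be computed. A minimal illustration (not the paper's code) enumerates the unique two-electron integrals for a toy basis of n = 4 functions:

```python
# Sketch of the eightfold index permutation symmetry of two-electron
# integrals (ij|kl): enumerate only canonical quartets with i >= j,
# k >= l, and compound pair index (ij) >= (kl), so each unique integral
# is generated exactly once.

def canonical_quartets(n):
    for i in range(n):
        for j in range(i + 1):
            ij = i * (i + 1) // 2 + j          # compound pair index
            for k in range(i + 1):
                for l in range(k + 1):
                    kl = k * (k + 1) // 2 + l
                    if ij >= kl:
                        yield (i, j, k, l)

quartets = list(canonical_quartets(4))
n_pairs = 4 * 5 // 2                       # 10 unique (ij) pairs
n_unique = n_pairs * (n_pairs + 1) // 2    # 55 unique quartets
```

For n basis functions this cuts the integral count from n^4 ordered quartets to roughly n^4/8, which is exactly the saving the abstract's algorithm preserves even with distributed Fock and density matrices.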

7.
We describe an efficient algorithm for carrying out a “divide-and-conquer” fit of a molecule's electronic density on massively parallel computers. Near linear speedups are achieved with up to 48 processors on a Cray T3E, and our results indicate that similar efficiencies could be attained on an even greater number of processors. To achieve optimum efficiency, the algorithm combines coarse and fine-grain parallelization and adapts itself to the existing ratio of processors to subsystems. The subsystems employed in our divide-and-conquer approach can also be made smaller or bigger, depending on the number of processors available. This allows us to further reduce the wallclock time and improve the method's overall efficiency. The strategies implemented in this paper can be extended to any other divide-and-conquer method used within an ab initio, density functional, or semi-empirical quantum mechanical program. Received: 15 September 1997 / Accepted: 21 January 1998

8.
We present details of our efficient implementation of full accuracy unrestricted open‐shell second‐order canonical Møller–Plesset (MP2) energies, both serial and parallel. The algorithm is based on our previous restricted closed‐shell MP2 code using the Saebo–Almlöf direct integral transformation. Depending on system details, UMP2 energies take from less than 1.5 to about 3.0 times as long as a closed‐shell RMP2 energy on a similar system using the same algorithm. Several examples are given including timings for some large stable radicals with 90+ atoms and over 3600 basis functions. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011

9.
A parallel implementation of the internally contracted (IC) multireference configuration interaction (MRCI) module of the MOLPRO quantum chemistry program is described. The global array (GA) toolkit has been used in order to map an existing disk-paging small-memory algorithm onto a massively parallel supercomputer, where disk storage is replaced by the combined memory of all processors. This model has enabled a rather complicated code to be ported to the parallel environment without the need for the wholesale redesign of algorithms and data structures. Examples show that the parallel ICMRCI program can deliver results in a fraction of the time needed for equivalent uncontracted MRCI computations. Further examples demonstrate that ICMRCI computations with up to 10^7 variational parameters, equivalent to uncontracted MRCI with 10^9 configurations, are feasible. The largest calculation demonstrates a parallel efficiency of about 80% on 128 nodes of a Cray T3E-300. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1215–1228, 1998
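The "combined memory replaces disk" model described above can be sketched as one logical array partitioned over the memory of several ranks, with each access routed to the element's owner. This toy class is only an illustration of the idea, not the Global Array toolkit's actual API:

```python
# Toy sketch of a global-array-style store: one logical array is split
# across the local memory of n_procs "processors"; put/get route each
# request to the owning rank instead of paging to disk.

class GlobalArray:
    def __init__(self, n, n_procs):
        self.chunk = -(-n // n_procs)            # ceil(n / n_procs)
        self.local = [dict() for _ in range(n_procs)]   # per-rank memory

    def owner(self, i):
        return i // self.chunk                   # contiguous block ownership

    def put(self, i, value):
        self.local[self.owner(i)][i] = value

    def get(self, i):
        return self.local[self.owner(i)].get(i, 0.0)

ga = GlobalArray(n=10, n_procs=3)    # chunk size 4: ranks hold 4/4/2 elements
ga.put(7, 2.5)
```

Swapping the disk-paging layer for such one-sided remote access is what let the existing small-memory algorithm run unchanged in structure on the parallel machine.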

10.
In this paper, the SHARK integral generation and digestion engine is described. In essence, SHARK is based on a reformulation of the popular McMurchie/Davidson approach to molecular integrals. This reformulation leads to an efficient algorithm that is driven by BLAS level 3 operations. The algorithm is particularly efficient for high angular momentum basis functions (up to L = 7 is available by default, but the algorithm is programmed for arbitrary angular momenta). SHARK features a significant number of specific programming constructs that are designed to greatly simplify the workflow in quantum chemical program development and avoid undesirable code duplication to the largest possible extent. SHARK can handle segmented, generally, and partially generally contracted basis sets. It can be used to generate a host of one- and two-electron integrals over various kernels, including two-, three-, and four-index repulsion integrals, integrals over Gauge Including Atomic Orbitals (GIAOs), relativistic integrals, and integrals featuring a finite nucleus model. SHARK provides routines to evaluate Fock-like matrices, generate integral transformations, and perform related tasks. SHARK is the essential engine inside the ORCA package that drives essentially all tasks related to integrals over basis functions in version ORCA 5.0 and higher. Since the core of SHARK is based on low-level Basic Linear Algebra Subprograms (BLAS) operations, it is expected to perform well not only on present-day but also on future hardware, provided that the hardware manufacturer supplies a properly optimized BLAS library for matrix and vector operations. Representative timings and comparisons to the Libint library used by ORCA are reported for Intel i9 and Apple M1 Max processors.

11.
A new parallel algorithm has been developed for second‐order Møller–Plesset perturbation theory (MP2) energy calculations. Its main projected applications are for large molecules, for instance, for the calculation of dispersion interaction. Tests on a moderate number of processors (2–16) show that the program has high CPU and parallel efficiency. Timings are presented for two relatively large molecules, taxol (C47H51NO14) and luciferin (C11H8N2O3S2), the former with the 6‐31G* and 6‐311G** basis sets (1032 and 1484 basis functions, 164 correlated orbitals), and the latter with the aug‐cc‐pVDZ and aug‐cc‐pVTZ basis sets (530 and 1198 basis functions, 46 correlated orbitals). An MP2 energy calculation on C130H10 (1970 basis functions, 265 correlated orbitals) completed in less than 2 h on 128 processors. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 407–413, 2006

12.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed. This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, normal mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton–Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton–Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1716–1723, 1998
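The master/slave pattern described above (master farms out Hessian components, slaves compute, master assembles) can be sketched with a thread pool standing in for the slave processors. The quadratic energy function is a toy, and all names are illustrative; the paper's code uses MPI:

```python
# Master/slave sketch: slaves each compute one Hessian column of a toy
# energy E = sum_i (i+1) * x_i^2, and the master assembles the matrix.

from concurrent.futures import ThreadPoolExecutor

def hessian_column(j, n):
    # Second derivatives of the toy energy: a diagonal Hessian
    return [2.0 * (j + 1) if i == j else 0.0 for i in range(n)]

def parallel_hessian(n, n_workers=4):
    # "slaves": the pool workers each evaluate independent columns
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        cols = list(pool.map(lambda j: hessian_column(j, n), range(n)))
    # "master": assemble the columns into a row-major matrix
    return [[cols[j][i] for j in range(n)] for i in range(n)]

H = parallel_hessian(3)
```

The columns are independent, which is why the scheme scales until the assembly step and load imbalance begin to dominate, consistent with the sub-linear speedups reported above.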

13.
A program to optimize the structure of large molecules at the Hartree–Fock level of theory running concurrently on a network of workstations is presented. Problems encountered in obtaining nearly optimal speedup and their solutions are discussed. A simple scheduling algorithm is presented that enables up to 99.5% of the code to run in parallel. © 1993 John Wiley & Sons, Inc.
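A 99.5% parallel fraction bounds the attainable speedup through Amdahl's law, S(N) = 1 / ((1 - p) + p/N). A quick check of what p = 0.995 implies:

```python
# Amdahl's law: with parallel fraction p, the serial remainder (1 - p)
# caps the speedup on N processors.

def amdahl_speedup(p, n_procs):
    return 1.0 / ((1.0 - p) + p / n_procs)

s8 = amdahl_speedup(0.995, 8)      # speedup on 8 workstations
s_limit = 1.0 / (1.0 - 0.995)      # asymptotic ceiling as N grows
```

So even on a modest 8-machine network the 0.5% serial remainder costs little (speedup about 7.7 of 8), while the hard ceiling is a factor of 200 regardless of how many workstations are added.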

14.
The recently described Fourier Transform Coulomb (FTC) algorithm for fast and accurate calculation of Density Functional Theory (DFT) gradients (Füsti-Molnar, J Chem Phys 2003, 119, 11080) has been parallelized. We present several calculations showing the speed and accuracy of our new parallel FTC gradient code, comparing its performance with our standard DFT code. For that part of the total derivative Coulomb potential that can be evaluated in plane wave space, the current parallel FTC gradient algorithm is up to 200 times faster in total than our classical all-integral algorithm, depending on the system size and basis set, with essentially no loss in accuracy. Proposed modifications should further improve the overall performance relative to the classical algorithm.

15.
A parallel full configuration interaction (FCI) code, implemented on a distributed memory MPP computer, has been modified in order to use a direct algorithm to compute the lists of mono- and biexcitations each time they are needed. We were able to perform FCI calculations on the ground state of the acetylene molecule with two different basis sets, corresponding to more than 2.5 and 5 billion Slater determinants, respectively. The calculations were performed on a Cray-T3D and a Cray-T3E, both machines having 128 processors. Performance and comparison between the two computers are reported and discussed. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 658–672, 1998

16.
A new algorithm for efficient evaluation of two-electron repulsion integrals (ERIs) using uncontracted geometrical-type Gaussian basis functions is presented. Integrals are evaluated by the Habitz and Clementi method. The use of uncontracted geometrical basis sets allows grouping of basis functions into shells (s, sp, spd, or spdf) and processing of integrals in blocks (shell quartets). By utilizing information common to a block of integrals, this method achieves high efficiency. This technique has been incorporated into the KGNMOL molecular interaction program. Representative timings for a number of molecules with different basis sets are presented. The new code is found to be significantly faster than the previous program. For ERIs involving only s and p functions, the new algorithm is a factor of two faster than previously. The new program is also found to be competitive when compared with other standard molecular packages, such as HONDO-8 and Gaussian 86.
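The payoff of shell-quartet blocking described above is that per-block setup work is amortized over every integral inside the block. A counting sketch (illustrative only, with a made-up three-shell basis) shows the ratio:

```python
# Sketch of shell-quartet blocking: basis functions are grouped into
# shells, and the quartet loop prepares shared intermediates once per
# block rather than once per individual integral.

shells = [("s", 1), ("sp", 4), ("spd", 9)]   # (type, n_functions) per shell

def count_work(shells):
    quartet_setups = 0      # per-block preparation, done once per quartet
    integrals = 0           # individual ERIs evaluated inside the blocks
    n = len(shells)
    for a in range(n):
        for b in range(n):
            for c in range(n):
                for d in range(n):
                    quartet_setups += 1
                    integrals += (shells[a][1] * shells[b][1]
                                  * shells[c][1] * shells[d][1])
    return quartet_setups, integrals

setups, n_ints = count_work(shells)   # 81 block setups cover 38416 ERIs
```

Here 81 block preparations serve 38,416 integrals, and the amortization grows with shell size, which is why the gain is largest for the higher-angular-momentum spd/spdf shells.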

17.
We comment on the paper [Song et al., J. Comput. Chem. 2009, 30, 399] and discuss the efficiency of the orbital optimization and gradient evaluation in the Valence Bond Self Consistent Field (VBSCF) method. We note that Song et al. neglect to properly reference Broer et al., who published an algorithm [Broer and Nieuwpoort, Theor. Chim. Acta 1988, 73, 405] that uses a Fock matrix to compute a matrix element between two different determinants and can be used for orbital optimization. Further, Song et al. publish a misleading comparison with our VBSCF algorithm [Dijkstra and van Lenthe, J. Chem. Phys. 2000, 113, 2100; van Lenthe et al., Mol. Phys. 1991, 73, 1159] that lets them compare their algorithm favorably with ours. We give detailed timings in terms of the different orbital types in the calculation, together with actual timings for the example cases. © 2012 Wiley Periodicals, Inc.

18.
This article presents a new MR‐MP2 code (Multi‐Reference Møller–Plesset 2nd order) suitable for the computation of MR‐MP2 energies of extended systems with strong near-degeneracy effects (e.g., open-shell systems). It is based on the DIESEL program package developed by Hanrath and Engels. Due to improved algorithms, the new code is able to handle systems with 400–500 basis functions and more than 100 electrons. The code is made for parallel computers with distributed memory, but can also be run on local machines. It possesses two integral interfaces (MOLCAS, TURBOMOLE). The algorithms are briefly introduced, and timings for the neocarzinostatin chromophore are presented. The efficiencies of the codes obtained with Intel or GNU compilers are compared. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 1055–1062, 2006

19.
We describe the implementation of a parallel, in-core, integral-direct code for the efficient calculation of Hartree-Fock wave functions and density functional theory energies. The algorithm is based on a parallel master-slave model, and the two-electron integrals calculated by a slave are stored in its available local memory. To ensure the greatest computational savings, the master node keeps track of all integral batches stored on the different slaves. The code can reuse undifferentiated two-electron integrals both in the wave function optimization and in the evaluation of second-, third-, and fourth-order molecular properties. Superlinear scaling is achieved in a series of test examples, with speedups of up to 55 achieved for calculations run on medium-sized molecules on 16 processors with respect to the time used on a single processor.
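The master's bookkeeping role described above can be sketched as a table mapping each stored integral batch to the slave that holds it, with a miss meaning the batch must be recomputed. The class and names are hypothetical illustrations, not the paper's code:

```python
# Illustrative sketch: the master records which slave holds each stored
# two-electron-integral batch; later property evaluations reuse stored
# batches and only recompute the ones nobody kept.

class Master:
    def __init__(self):
        self.location = {}          # batch id -> slave rank holding it
        self.recomputed = 0

    def store(self, batch_id, rank):
        self.location[batch_id] = rank

    def fetch(self, batch_id):
        """Return the rank holding the batch, or None if it must be recomputed."""
        if batch_id in self.location:
            return self.location[batch_id]
        self.recomputed += 1
        return None

master = Master()
for b, r in [(0, 1), (1, 2), (2, 1)]:   # batches stored during wave function opt
    master.store(b, r)
hits = [master.fetch(b) for b in (0, 1, 2, 3)]
```

Reusing in-core batches across the SCF and the higher-order property evaluations is what makes the reported superlinear speedups possible: adding slaves adds aggregate memory, so a larger fraction of integrals avoids recomputation entirely.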

20.
In this paper we present an efficient parallelization of the ONX algorithm for linear computation of the Hartree-Fock exchange matrix [J. Chem. Phys. 106, 9708 (1997)]. The method used is based on the equal time (ET) partitioning recently introduced [J. Chem. Phys. 118, 9128 (2003)] and [J. Chem. Phys. 121, 6608 (2004)]. ET exploits the slow variation of the density matrix between self-consistent-field iterations to achieve load balance. The method is presented and some benchmark calculations are discussed for gas phase and periodic systems with up to 128 processors. The current parallel ONX code is able to deliver up to 77% overall efficiency for a cluster of 50 water molecules on 128 processors (2.56 processors per heavy atom) and up to 87% for a box of 64 water molecules (two processors per heavy atom) with periodic boundary conditions.
