Similar Literature (20 results)
1.
We have developed a new hybrid (MPI+OpenMP) parallelization scheme for molecular dynamics (MD) simulations by combining a cell-wise version of the midpoint method with pair-wise Verlet lists. In this scheme, which we call the midpoint cell method, simulation space is divided into subdomains, each of which is assigned to an MPI processor. Each subdomain is further divided into small cells. The interaction between two particles existing in different cells is computed in the subdomain containing the midpoint cell of the two cells where the particles reside. In each MPI processor, cell pairs are distributed over OpenMP threads for shared memory parallelization. The midpoint cell method keeps the advantages of the original midpoint method while avoiding per-particle-pair midpoint checks: the midpoint cell of each cell pair is determined once, prior to the MD simulation. Distributing cell pairs over OpenMP threads allows for more efficient shared memory parallelization than distributing atom indices over threads. Furthermore, grouping particle data by cell improves memory access and reduces the number of cache misses. The parallel performance of the midpoint cell method on the K computer showed scalability up to 512 and 32,768 cores for systems of 20,000 and 1 million atoms, respectively. One MD time step with long-range interactions could be calculated within 4.5 ms even for a 1-million-atom system with particle-mesh Ewald electrostatics. © 2014 Wiley Periodicals, Inc.
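To make the ownership rule concrete, the following minimal Python sketch (not the actual midpoint cell implementation; all names are hypothetical) assigns a cell pair to the MPI rank whose subdomain contains the midpoint cell, assuming a block distribution of cells over a 3D rank grid and ignoring periodic wrap-around and the paper's tie-breaking rules.

    def midpoint_cell(ci, cj):
        # component-wise midpoint of two 3D cell indices; integer division breaks ties toward the lower index
        return tuple((a + b) // 2 for a, b in zip(ci, cj))

    def owning_rank(cell, cells_per_dim, ranks_per_dim):
        # block-distribute cells over a 3D grid of MPI ranks and return the rank owning `cell`
        sub = [c * r // n for c, n, r in zip(cell, cells_per_dim, ranks_per_dim)]
        return (sub[0] * ranks_per_dim[1] + sub[1]) * ranks_per_dim[2] + sub[2]

    # every rank can evaluate this rule locally, so no per-pair midpoint checks are needed during the run
    cells_per_dim, ranks_per_dim = (16, 16, 16), (4, 4, 2)
    print(owning_rank(midpoint_cell((2, 3, 4), (6, 3, 5)), cells_per_dim, ranks_per_dim))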

2.
An efficient parallelization scheme for classical molecular dynamics simulations with flexible, polarizable empirical potentials is presented. It is based on the standard Ewald summation technique to handle the long-range electrostatic and induction interactions. The algorithm for this parallelization scheme is designed for systems containing several thousand polarizable sites in the simulation box. Its performance is evaluated during molecular dynamics simulations under periodic boundary conditions with unit cell sizes ranging from 128 to 512 molecules, employing two flexible polarizable water models [DC(F) and TTM2.1-F] containing 1 and 3 polarizable sites, respectively. The time-to-solution for these two polarizable models is compared with that for a flexible, pairwise-additive water model (TIP4F). The benchmarks were performed on both shared and distributed memory platforms. As a result of the efficient calculation of the induced dipole moments, superlinear scaling as a function of the number of processors is observed. To the best of our knowledge, these are the first reported results of parallel scaling and performance for simulations of liquid water with a polarizable potential under periodic boundary conditions.
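As a rough illustration of the induction step whose efficient evaluation drives the reported scaling, here is a hedged Python/NumPy sketch of a self-consistent induced-dipole iteration for point polarizabilities in free space; it omits Ewald summation, damping, and periodic boundaries, and all names are illustrative rather than taken from the DC(F)/TTM2.1-F implementations.

    import numpy as np

    def induced_dipoles(pos, alpha, e0, tol=1e-8, max_iter=200):
        # iterate mu_i = alpha_i * (E0_i + sum_{j != i} T_ij mu_j) to self-consistency
        n = len(pos)
        mu = alpha[:, None] * e0                       # zeroth-order guess
        for _ in range(max_iter):
            field = e0.copy()
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    r = pos[i] - pos[j]
                    d = np.linalg.norm(r)
                    rhat = r / d
                    t = (3.0 * np.outer(rhat, rhat) - np.eye(3)) / d**3   # dipole field tensor
                    field[i] += t @ mu[j]
            mu_new = alpha[:, None] * field
            if np.max(np.abs(mu_new - mu)) < tol:
                break
            mu = mu_new
        return mu_new

    pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 3.0], [3.0, 0.0, 0.0]])
    alpha = np.array([1.4, 1.4, 1.4])                  # isotropic site polarizabilities (arbitrary units)
    e0 = np.tile([0.0, 0.0, 0.1], (3, 1))              # permanent/external field at each site
    print(induced_dipoles(pos, alpha, e0))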

3.
Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, in which each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such methods is that the interprocessor communication scales as N (the number of atoms) independent of P (the number of processors). Thus, their parallel efficiency falls off rapidly when large numbers of processors are used. In this article a new parallel method for simulating macromolecular or small-molecule systems is presented, called force-decomposition. Its memory and communication costs scale as N/√P, allowing larger problems to be run faster on greater numbers of processors. Like replicated-data techniques, and in contrast to spatial-decomposition approaches, the new method can be simply load balanced and performs well even for irregular simulation geometries. The implementation of the algorithm in a prototypical macromolecular simulation code ParBond is also discussed. On a 1024-processor Intel Paragon, ParBond runs a standard benchmark simulation of solvated myoglobin with a parallel efficiency of 61% and at 40 times the speed of a vectorized version of CHARMM running on a single Cray Y-MP processor. © 1996 by John Wiley & Sons, Inc.
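A minimal sketch of the block assignment behind the N/√P scaling (hypothetical names, not the ParBond code): with P a perfect square, each processor owns one block of the N x N force matrix and therefore needs the coordinates of only about 2N/√P atoms.

    import math

    def force_decomposition_block(rank, n_atoms, n_procs):
        # processor `rank` owns one block of the force matrix; it must receive the positions of
        # the atoms in its row strip and its column strip (about 2N/sqrt(P) atoms in total)
        q = math.isqrt(n_procs)
        assert q * q == n_procs, "this sketch assumes a square processor count"
        block = -(-n_atoms // q)                       # ceil(N / sqrt(P))
        row, col = divmod(rank, q)
        rows = range(row * block, min((row + 1) * block, n_atoms))
        cols = range(col * block, min((col + 1) * block, n_atoms))
        return rows, cols

    rows, cols = force_decomposition_block(rank=5, n_atoms=1000, n_procs=16)
    print(len(rows), len(cols))                        # 250 250 -> 500 atom positions needed, not 1000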

4.
A parallel algorithm for four-index transformation and MP2 energy evaluation on distributed memory parallel (MIMD) machines is presented. The underlying serial algorithm for the present parallel effort is the four-index transform. The scheme parallelizes over AO integrals and therefore spreads the O(n^3) memory requirement across the processors, reducing it to O(n^2). In this sense, the scheme superimposes a shared memory architecture onto the distributed memory setup. A detailed analysis of the algorithm is presented for networks with 4, 6, 8, 10, and 12 processors employing a small test case of 86 contractions. Model direct MP2 calculations for systems of sizes ranging from 160 to 238 basis functions are reported for 11- and 22-processor networks. A gain of at least 40% is observed for the larger systems. © 1997 by John Wiley & Sons, Inc.
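The parallelization over AO integrals can be pictured as each processor performing the first quarter-transformation only for its own slab of the leading AO index and then summing the partial results; the sketch below (NumPy, hypothetical names, the global reduction shown as a plain Python sum) only illustrates that idea and is not the published algorithm.

    import numpy as np

    def quarter_transform_slab(eri_slab, c_occ, mu_slab):
        # partial first quarter-transform over the locally held AO indices mu:
        # (i nu|lam sig)_partial = sum_{mu in slab} C[mu, i] (mu nu|lam sig)
        return np.einsum('mnls,mi->inls', eri_slab, c_occ[mu_slab, :])

    n, nocc = 6, 2
    eri = np.random.rand(n, n, n, n)                   # toy AO integrals (mu nu|lam sig)
    c_occ = np.random.rand(n, nocc)                    # occupied MO coefficients
    slabs = [list(range(0, 3)), list(range(3, 6))]     # AO indices owned by "processor" 0 and 1

    # summing the slab contributions (an all-reduce in a real distributed code) recovers the full step
    partial = sum(quarter_transform_slab(eri[s], c_occ, s) for s in slabs)
    assert np.allclose(partial, np.einsum('mnls,mi->inls', eri, c_occ))
    print(partial.shape)                               # (2, 6, 6, 6)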

5.
We discuss issues in developing scalable parallel algorithms and focus on the distribution, as opposed to the replication, of key data structures. Replication of large data structures limits the maximum calculation size by imposing a low ratio of processors to memory. Only applications which distribute both data and computation across processors are truly scalable. The use of shared data structures that may be independently accessed by each process even in a distributed memory environment greatly simplifies development and provides a significant performance enhancement. We describe tools we have developed to support this programming paradigm. These tools are used to develop a highly efficient and scalable algorithm to perform self-consistent field calculations on molecular systems. A simple and classical strip-mining algorithm suffices to achieve an efficient and scalable Fock matrix construction in which all matrices are fully distributed. By strip mining over atoms, we also exploit all available sparsity and pave the way to adopting more sophisticated methods for summation of the Coulomb and exchange interactions. © 1996 by John Wiley & Sons, Inc.

6.
Parallel replica dynamics simulation methods appropriate for the simulation of chemical reactions in molecular systems with many conformational degrees of freedom have been developed and applied to study the microsecond-scale pyrolysis of n-hexadecane in the temperature range of 2100-2500 K. The algorithm uses a transition detection scheme that is based on molecular topology, rather than energetic basins. This algorithm allows efficient parallelization of small systems even when using more processors than particles (in contrast to more traditional parallelization algorithms), and even when there are frequent conformational transitions (in contrast to previous implementations of the parallel replica algorithm). The parallel efficiency for pyrolysis initiation reactions was over 90% on 61 processors for this 50-atom system. The parallel replica dynamics technique results in reaction probabilities that are statistically indistinguishable from those obtained from direct molecular dynamics, under conditions where both are feasible, but allows simulations at temperatures as much as 1000 K lower than direct molecular dynamics simulations. The rate of initiation displayed Arrhenius behavior over the entire temperature range, with an activation energy and frequency factor of E_a = 79.7 kcal/mol and log(A/s^-1) = 14.8, respectively, in reasonable agreement with experiment and empirical kinetic models. Several interesting unimolecular reaction mechanisms were observed in simulations of the chain propagation reactions above 2000 K, which are not included in most coarse-grained kinetic models. More studies are needed in order to determine whether these mechanisms are experimentally relevant, or specific to the potential energy surface used.
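For a quick sanity check of the reported fit, the quoted Arrhenius parameters (E_a = 79.7 kcal/mol, log(A/s^-1) = 14.8) can be plugged into k(T) = A exp(-E_a/RT); the short script below does this at the two ends of the simulated temperature range (the gas constant value is standard, everything else comes from the abstract).

    import math

    R = 1.987204e-3                  # gas constant, kcal/(mol K)
    EA = 79.7                        # fitted activation energy, kcal/mol
    A = 10.0 ** 14.8                 # fitted frequency factor, 1/s

    for T in (2100.0, 2500.0):
        k = A * math.exp(-EA / (R * T))
        print(f"T = {T:.0f} K: k = {k:.2e} 1/s, mean initiation time ~ {1.0 / k:.2e} s")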

7.
Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase the power of MD to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in the PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations that are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of the improved performance on the PS3's Cell processor over more traditional processors.

8.
A two-level hierarchical parallelization scheme for the divide-and-conquer method including second-order Møller–Plesset perturbation (MP2) theory is presented. The scheme combines coarse-grain parallelization, which assigns each subsystem to a group of processors, with fine-grain parallelization, in which the computational tasks for evaluating the MP2 correlation energy of the assigned subsystem are distributed among the processors in the group. Test calculations demonstrate that the present scheme shows high parallel efficiency and makes MP2 calculations practical for very large molecules. © 2011 Wiley Periodicals, Inc. J Comput Chem, 2011
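The two levels can be pictured as a mapping of each global rank to a (group, local rank) pair plus a round-robin assignment of divide-and-conquer subsystems to groups; the toy sketch below only illustrates that bookkeeping (hypothetical names; a real implementation would split MPI communicators).

    def two_level_map(world_rank, group_size):
        # coarse grain: which processor group this rank belongs to; fine grain: its rank within the group
        return divmod(world_rank, group_size)

    def subsystem_schedule(n_subsystems, n_procs, group_size):
        # round-robin assignment of subsystems to processor groups
        n_groups = n_procs // group_size
        schedule = {g: [] for g in range(n_groups)}
        for s in range(n_subsystems):
            schedule[s % n_groups].append(s)
        return schedule

    print(two_level_map(13, group_size=4))                     # rank 13 -> group 3, local rank 1
    print(subsystem_schedule(n_subsystems=10, n_procs=16, group_size=4))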

9.
Parallelization is an effective way to reduce the computational time needed for molecular dynamics simulations. We describe a new parallelization method, the distributed-diagonal force decomposition method, which extends and improves existing force decomposition methods. Our new method requires less data communication during molecular dynamics simulations than replicated-data and current force decomposition methods, increasing the parallel efficiency. It also dynamically balances the processors' computational load throughout the simulation. The method is readily implemented in existing molecular dynamics codes and has been incorporated into the CHARMM program, allowing its immediate use in conjunction with the many molecular dynamics simulation techniques already present in the program. We also present the design of the Force Decomposition Machine, a cluster of personal computers and networks that is tailored to running molecular dynamics simulations using the distributed-diagonal force decomposition method. The design is expandable and provides various degrees of fault resilience. The approach is easily adaptable to computers with graphics processing units because it is independent of the processor type being used.

10.
One of the most commonly used means to characterize potential energy surfaces of reactions and chemical systems is the Hessian calculation, whose analytic evaluation is demanding in both computation and memory. A new scalable distributed-data analytic Hessian algorithm is presented. Features of the distributed-data parallel coupled perturbed Hartree-Fock (CPHF) are: (a) columns of density-like and Fock-like matrices are distributed among processors, (b) an efficient static load balancing scheme achieves a good workload distribution among the processors, (c) network communication time is minimized, and (d) numerous performance improvements are made in the analytic Hessian steps. As a result, the new code has good performance, which is demonstrated on large biological systems.

11.
A parallel Fock matrix construction program for the FMO-MO method has been developed with the distributed shared memory model. To construct a large-sized Fock matrix during FMO-MO calculations, a distributed parallel algorithm was designed to make full use of local memory to reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% on 1,024 processors. A large FMO-MO application to the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6-31G level of theory, with the frontier orbitals extracted by a Sakurai-Sugiura eigensolver. On a PC cluster system using 256 processors, the FMO calculation takes 11.3 h, the Fock matrix construction 49.1 h, and the extraction of 94 eigen-components 10 min. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

12.
The evaluation of interactions between nearby particles constitutes the majority of the computational workload in classical molecular dynamics (MD) simulations. In this paper, we introduce a new method for the parallelization of range-limited particle interactions that proves particularly suitable for MD applications. Because it applies not only to pairwise interactions but also to interactions involving three or more particles, the method can be used to evaluate both nonbonded and bonded forces in an MD simulation. It requires less interprocessor data transfer than traditional spatial decomposition methods at all but the lowest levels of parallelism. It gains an additional practical advantage in certain commonly used interprocessor communication networks by distributing the communication burden more evenly across network links and by decreasing the associated latency. When used to parallelize MD, it further reduces communication requirements by allowing the computations associated with short-range nonbonded interactions, long-range electrostatics, bonded interactions, and particle migration to use much of the same communicated data. We also introduce certain variants of this method that can significantly improve the balance of computational load across processors.

13.
A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPS with 705,024 cores. Platypus exhibited increasing speedup up to 20,000 processor cores for HF/cc-pVDZ and B3LYP/cc-pVDZ calculations, and up to 10,000 processor cores for CASCI(16,16)/6-31G** calculations. We also performed excited-state QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water using over 4,000 processor cores. It also completed a 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulation of SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

14.
We present a new version of the program package nMoldyn, which was originally developed for the neutron-scattering-oriented analysis of molecular dynamics simulations of macromolecular systems (Kneller et al., Comput. Phys. Commun. 1995, 91, 191) and was later rewritten to include in-depth time series analyses and a graphical user interface (Rog et al., J. Comput. Chem. 2003, 24, 657). The main improvement in this new version, and the focus of this article, is the parallelization of all the analysis algorithms for use on multicore desktop computers as well as distributed-memory computing clusters. The parallelization is based on a task farming approach, which maintains a simple program structure and permits easy modification and extension of the code to integrate new analysis methods. © 2012 Wiley Periodicals, Inc.
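The task farming idea can be illustrated with Python's standard multiprocessing pool handing out independent per-frame analysis tasks to worker processes; this is only a generic sketch with made-up frame data and function names, not nMoldyn's actual code (which also targets distributed-memory clusters).

    from multiprocessing import Pool

    def analyse_frame(frame):
        # placeholder per-frame analysis, e.g. a mean squared distance from the origin
        return sum(x * x + y * y + z * z for x, y, z in frame) / len(frame)

    if __name__ == "__main__":
        # toy "trajectory": 100 frames of 10 particles each
        trajectory = [[(0.1 * i, 0.01 * f, 0.0) for i in range(10)] for f in range(100)]
        with Pool() as farm:                       # the worker pool plays the role of the task farm
            per_frame = farm.map(analyse_frame, trajectory, chunksize=10)
        print(len(per_frame), "frames analysed")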

15.
A massively parallel version of the configuration interaction (CI) section of the COLUMBUS multireference singles and doubles CI (MRCISD) program system is described. In an extension of our previous parallelization work, which was based on message passing, the global array (GA) toolkit has now been used. For each process, these tools permit asynchronous and efficient access to logical blocks of 1- and 2-dimensional (2-D) arrays physically distributed over the memory of all processors. The GAs are available on most of the major parallel computer systems, enabling very convenient portability of our parallel program code. To demonstrate the features of the parallel COLUMBUS CI code, benchmark calculations on selected MRCI and SRCI test cases are reported for the CRAY T3D, Intel Paragon, and IBM SP2. Excellent scaling with the number of processors up to 256 processors (CRAY T3D) was observed. The CI section of a 19-million-configuration MRCISD calculation was carried out within 20 min of wall clock time on 256 processors of a CRAY T3D. Computations with 38 million configurations were performed recently; calculations with up to about 100 million configurations seem possible in the near future. © 1997 by John Wiley & Sons, Inc.

16.
An account is given of experience gained in implementing computational chemistry application software, including quantum chemistry and macromolecular refinement codes, on distributed memory parallel processors. In quantum chemistry we consider the coarse-grained implementation of Gaussian integral and derivative integral evaluation, the direct-SCF computation of an uncorrelated wavefunction, the 4-index transformation of two-electron integrals, and the direct-CI calculation of correlated wavefunctions. In the refinement of macromolecular conformations, we describe domain decomposition techniques used in implementing general-purpose molecular mechanics, molecular dynamics, and free energy perturbation calculations. Attention is focused on performance figures obtained on the Intel iPSC/2 and iPSC/860 hypercubes, which are compared with those obtained on a Cray Y-MP/464 and a Convex C-220 minisupercomputer. From these data we deduce the cost effectiveness of parallel processors in the field of computational chemistry.

17.
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms to minimize the number of operations and to improve numerical stability, (iv) using OpenMP to parallelize those sections of the code for which distributed-memory parallelization involves an unfavorable computing/communication time ratio, and (v) careful memory management to minimize simultaneous access of distant memory sections. The new code enables us to run molecular dynamics simulations of protein systems with size exceeding 100,000 amino-acid residues, reaching over 1 ns/day (1 μs/day on the all-atom timescale) with 24 cores for proteins of this size. Parallel performance of the code and a comparison of its performance with that of AMBER, GROMACS, and MARTINI 3 are presented.
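Point (i), the interaction lists, amounts to precomputing which site pairs lie within the interaction cutoff so the force loop touches only those pairs; the sketch below shows the simplest O(N^2) construction (illustrative only; production coarse-grained codes combine such lists with cell decompositions and a skin distance to avoid rebuilding every step).

    def build_interaction_list(coords, cutoff):
        # return all (i, j) pairs closer than `cutoff`; the force routine then loops over this list only
        cut2 = cutoff * cutoff
        pairs = []
        for i in range(len(coords) - 1):
            xi, yi, zi = coords[i]
            for j in range(i + 1, len(coords)):
                xj, yj, zj = coords[j]
                dx, dy, dz = xi - xj, yi - yj, zi - zj
                if dx * dx + dy * dy + dz * dz < cut2:
                    pairs.append((i, j))
        return pairs

    coords = [(0.5 * k, 0.0, 0.0) for k in range(100)]     # toy bead positions
    print(len(build_interaction_list(coords, cutoff=2.0)))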

18.
The growing interest in the complexity of biological interactions is continuously driving the need to increase system size in biophysical simulations, requiring not only powerful and advanced hardware but also adaptable software that can accommodate a large number of atoms interacting through complex force fields. To address this, we developed and implemented strategies in the GENESIS molecular dynamics package designed for large numbers of processors. Long-range electrostatic interactions were parallelized by minimizing the number of processes involved in communication. A novel algorithm was implemented for nonbonded interactions to increase single instruction multiple data (SIMD) performance, reducing memory usage for ultra-large systems. Memory usage for neighbor searches in real-space nonbonded interactions was reduced by approximately 80%, leading to significant speedup. Using experimental data describing physical 3D chromatin interactions, we constructed the first atomistic model of an entire gene locus (GATA4). Taken together, these developments enabled the first billion-atom simulation of an intact biomolecular complex, achieving scaling to 65,000 processes (130,000 processor cores) with 1 ns/day performance. Published 2019. This article is a U.S. Government work and is in the public domain in the USA.

19.
We developed a software package (RedMD) to perform molecular dynamics simulations and normal mode analysis of reduced models of proteins, nucleic acids, and their complexes. With RedMD one can perform molecular dynamics simulations in a microcanonical ensemble, with Berendsen and Langevin thermostats, and with Brownian dynamics. We provide force field and topology generators based on the one-bead-per-residue/nucleotide elastic network model and its extensions. The user can change the force field parameters with command line options passed to the generators, and the generators can be modified, for example, to add new potential energy functions. A normal mode analysis tool is available for elastic or anisotropic network models. The program is written in the C and C++ languages, and the structure/topology of a molecule is described in an XML format. OpenMP technology for shared-memory architectures was used for code parallelization. The code is distributed under the GNU public licence and is available at http://bionano.icm.edu.pl/software/. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009
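A hedged NumPy sketch of the kind of one-bead-per-residue model RedMD works with: the anisotropic network model Hessian built from harmonic springs between beads within a cutoff, diagonalized to obtain normal modes (the cutoff and spring constant below are illustrative values, not RedMD defaults).

    import numpy as np

    def anm_hessian(coords, cutoff=15.0, k=1.0):
        # anisotropic network model: a harmonic spring between every bead pair within `cutoff`
        n = len(coords)
        hess = np.zeros((3 * n, 3 * n))
        for i in range(n - 1):
            for j in range(i + 1, n):
                d = coords[j] - coords[i]
                r2 = float(d @ d)
                if r2 > cutoff * cutoff:
                    continue
                block = -k * np.outer(d, d) / r2           # 3x3 off-diagonal super-element
                hess[3*i:3*i+3, 3*j:3*j+3] = block
                hess[3*j:3*j+3, 3*i:3*i+3] = block
                hess[3*i:3*i+3, 3*i:3*i+3] -= block
                hess[3*j:3*j+3, 3*j:3*j+3] -= block
        return hess

    coords = np.random.rand(50, 3) * 30.0                  # toy one-bead-per-residue coordinates (angstroms)
    eigvals, modes = np.linalg.eigh(anm_hessian(coords))
    print(eigvals[:8])                                     # the six lowest (~0) eigenvalues are rigid-body modes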

20.
Two algorithms are presented for parallel direct computation of energies with second-order perturbation theory. Closed-shell MP2 theory as well as the open-shell perturbation theories OPT2(2) and ZAPT2 have been implemented. The algorithms are designed for distributed memory parallel computers. The first algorithm exhibits an excellent load balance and scales well when relatively few processors are used, but a large communication overhead reduces its efficiency for larger numbers of processors. The other algorithm employs very little interprocessor communication and scales well for large systems. In both implementations the memory requirement has been reduced by allowing the two-electron integral transformation to be performed in multiple passes and by distributing the (partially) transformed integrals among processors. Results are presented for systems with up to 327 basis functions. © 1995 John Wiley & Sons, Inc.
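For reference, the quantity both algorithms ultimately accumulate is the closed-shell MP2 correlation energy; the NumPy sketch below spells out the standard formula from dense MO-basis integrals for clarity (a parallel direct code would instead generate and distribute the transformed integrals in batches; all names are hypothetical).

    import numpy as np

    def mp2_energy(mo_eri, eps, nocc):
        # E2 = sum_{ijab} (ia|jb) [2 (ia|jb) - (ib|ja)] / (e_i + e_j - e_a - e_b), chemists' notation
        e_o, e_v = eps[:nocc], eps[nocc:]
        ovov = mo_eri[:nocc, nocc:, :nocc, nocc:]          # the (ia|jb) block
        denom = (e_o[:, None, None, None] - e_v[None, :, None, None]
                 + e_o[None, None, :, None] - e_v[None, None, None, :])
        return float(np.sum(ovov * (2.0 * ovov - ovov.transpose(0, 3, 2, 1)) / denom))

    # tiny random example just to show the call signature (numbers are not physical)
    n, nocc = 6, 2
    rng = np.random.default_rng(0)
    mo_eri = rng.random((n, n, n, n)) * 0.01
    eps = np.sort(np.concatenate([-1.0 - rng.random(nocc), rng.random(n - nocc)]))
    print(mp2_energy(mo_eri, eps, nocc))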
