Similar Documents
20 similar documents found.
1.
We identify hardware that is optimal to produce molecular dynamics (MD) trajectories on Linux compute clusters with the GROMACS 2018 simulation package. To this end, we benchmark the GROMACS performance on a diverse set of compute nodes and relate it to the costs of the nodes, which may include their lifetime costs for energy and cooling. In agreement with our earlier investigation using GROMACS 4.6 on hardware of 2014, the performance to price ratio of consumer GPU nodes is considerably higher than that of CPU nodes. However, with GROMACS 2018, the optimal CPU to GPU processing power balance has shifted even more toward the GPU. Hence, nodes optimized for GROMACS 2018 and later versions enable a significantly higher performance to price ratio than nodes optimized for older GROMACS versions. Moreover, the shift toward GPU processing makes it possible to cheaply upgrade old nodes with recent GPUs, yielding essentially the same performance as comparable brand-new hardware. © 2019 Wiley Periodicals, Inc.
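The cost metric described above can be reproduced with simple arithmetic. The sketch below compares nodes by trajectory yield per euro when lifetime energy and cooling costs are folded into the node price; all prices, power draws, and performance numbers are hypothetical placeholders, not values from the study.

```python
# Hypothetical performance-to-price comparison that folds lifetime energy and
# cooling costs into the node price. All numbers are placeholders, not values
# from the benchmark study.

def ns_per_euro(ns_per_day, node_price_eur, power_draw_w,
                lifetime_years=5.0, eur_per_kwh=0.25, cooling_overhead=0.5):
    """Trajectory yield per euro over the node's lifetime."""
    hours = lifetime_years * 365 * 24
    energy_cost = power_draw_w / 1000.0 * hours * eur_per_kwh * (1 + cooling_overhead)
    total_cost = node_price_eur + energy_cost
    total_ns = ns_per_day * 365 * lifetime_years
    return total_ns / total_cost

nodes = {
    "CPU-only node":      ns_per_euro(ns_per_day=30.0, node_price_eur=4000, power_draw_w=350),
    "CPU + consumer GPU": ns_per_euro(ns_per_day=90.0, node_price_eur=2500, power_draw_w=450),
}
for name, value in nodes.items():
    print(f"{name}: {value:.2f} ns per euro over the hardware lifetime")
```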

2.
The parallel implementation of a multireference configuration interaction program based on hole-particle symmetry is described. The parallelization platform is an Intel-architecture cluster consisting of 12 nodes, each equipped with two 2.4-GHz Xeon processors, 3 GB of memory, and a 36-GB disk, connected by a Gigabit Ethernet switch. The dependence of the speedup on molecular symmetry and task granularity is discussed. Test calculations show that the scaling with the number of nodes is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h) when the number of nodes is doubled. The largest calculation performed on this cluster involves 5.6 × 10^8 CSFs.
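The quoted doubling factors translate directly into parallel efficiencies: if doubling the node count multiplies the throughput by f, each doubling step runs at efficiency f/2, and k consecutive doublings give a cumulative speedup of f^k. A minimal arithmetic sketch (the node counts are illustrative):

```python
# Cumulative speedup and per-doubling efficiency for a fixed scaling factor f
# (the factor by which throughput grows when the node count is doubled).
for label, f in [("C1/Cs", 1.9), ("C2v", 1.65), ("D2h", 1.55)]:
    for k in range(1, 4):                 # 2, 4, 8 nodes
        speedup = f ** k
        efficiency = (f / 2.0) ** k       # relative to the ideal factor 2**k
        print(f"{label}: {2**k:2d} nodes  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```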

3.
We implemented our gauge-including atomic orbital (GIAO) NMR chemical shielding program on a workstation cluster, using the parallel virtual machine (PVM) message-passing system. On a modest number of nodes, we achieved close to linear speedup. This program is characterized by several novel features. It uses the new integral program of Wolinski that calculates integrals in vectorized batches, increases efficiency, and simplifies parallelization. The self-consistent field (SCF) step includes a multi-Fock algorithm, i.e., the simultaneous calculation of several Fock matrices with the same integral set, increasing the efficiency of the direct SCF procedure. The SCF diagonalization step, which is difficult to parallelize, has been replaced by pseudodiagonalization. The latter, widely used in semiempirical programs, becomes important in ab initio type calculations above a certain size, because the ultimate scaling of the diagonalization step is steeper than that of integral computation. Examples of the calculation of the NMR shieldings in large systems at the SCF level are shown. Parallelization of the density functional code is underway. © 1997 by John Wiley & Sons, Inc. J Comput Chem 18: 816–825, 1997
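The multi-Fock idea, contracting one batch of two-electron integrals with several density matrices at once so that the expensive integral evaluation is amortized, can be illustrated with a dense NumPy sketch. Real GIAO/direct-SCF codes work shell batch by shell batch and never store the full integral tensor; the array sizes and contractions below are purely illustrative, not the program's actual algorithm.

```python
import numpy as np

# Illustrative dense version of the "multi-Fock" contraction: one integral
# batch (pq|rs) is reused for several density matrices D_k. Only toy Coulomb
# and exchange contractions on random data are shown.
n, n_dens = 10, 3
rng = np.random.default_rng(0)
eri = rng.standard_normal((n, n, n, n))
densities = [rng.standard_normal((n, n)) for _ in range(n_dens)]
densities = [0.5 * (d + d.T) for d in densities]

focks = []
for D in densities:                                    # the same 'eri' batch is reused
    J = np.einsum('pqrs,rs->pq', eri, D)               # Coulomb-like contraction
    K = np.einsum('prqs,rs->pq', eri, D)               # exchange-like contraction
    focks.append(J - 0.5 * K)
print(len(focks), "Fock-like matrices built from one integral batch")
```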

4.
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
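The claim that power and cooling can exceed the purchase price over a node's service life is easy to check: the break-even time is simply the node price divided by the yearly energy bill. The node price, power draw, electricity tariff, and cooling overhead below are placeholder assumptions, not numbers from the study.

```python
# Hypothetical break-even estimate: after how many years does the cumulative
# electricity + cooling bill exceed the purchase price of the node?
node_price_eur   = 3000.0   # placeholder purchase price
power_draw_kw    = 0.45     # placeholder average draw under load
eur_per_kwh      = 0.25     # placeholder electricity tariff
cooling_overhead = 0.5      # placeholder: cooling adds 50% on top of the IT power

cost_per_year = power_draw_kw * 24 * 365 * eur_per_kwh * (1 + cooling_overhead)
print(f"energy + cooling per year: {cost_per_year:.0f} EUR")
print(f"break-even after {node_price_eur / cost_per_year:.1f} years")
```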

5.
A two-level hierarchical scheme, generalized distributed data interface (GDDI), implemented into GAMESS is presented. Parallelization is accomplished first at the upper level by assigning computational tasks to groups. Then each group does parallelization at the lower level, by dividing its task into smaller work loads. The types of computations that can be used with this scheme are limited to those for which nearly independent tasks and subtasks can be assigned. Typical examples implemented, tested, and analyzed in this work are numeric derivatives and the fragment molecular orbital method (FMO) that is used to compute large molecules quantum mechanically by dividing them into fragments. Numeric derivatives can be used for algorithms based on them, such as geometry optimizations, saddle-point searches, frequency analyses, etc. This new hierarchical scheme is found to be a flexible tool easily utilizing network topology and delivering excellent performance even on slow networks. In one of the typical tests, on 16 nodes the scalability of GDDI is 1.7 times better than that of the standard parallelization scheme DDI and on 128 nodes GDDI is 93 times faster than DDI (on a multihub Fast Ethernet network). FMO delivered scalability of 80-90% on 128 nodes, depending on the molecular system (water clusters and a protein). A numerical gradient calculation for a water cluster achieved a scalability of 70% on 128 nodes. It is expected that GDDI will become a preferred tool on massively parallel computers for appropriate computational tasks.
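The two-level idea — distribute coarse tasks (e.g., FMO fragments or displaced geometries for numerical derivatives) over groups of nodes, then split each task over the members of its group — can be sketched in plain Python. The group sizes, task names, and round-robin assignment below are illustrative choices, not the actual GDDI implementation.

```python
# Illustrative two-level work distribution: coarse tasks go to groups of
# ranks, fine-grained subtasks go to the ranks inside each group (round-robin).
def two_level_schedule(n_ranks, n_groups, tasks, subtasks_per_task):
    ranks = list(range(n_ranks))
    group_size = n_ranks // n_groups
    groups = [ranks[g * group_size:(g + 1) * group_size] for g in range(n_groups)]
    schedule = {r: [] for r in ranks}
    for t, task in enumerate(tasks):                 # upper level: task -> group
        members = groups[t % n_groups]
        for s in range(subtasks_per_task):           # lower level: subtask -> rank
            schedule[members[s % len(members)]].append((task, s))
    return schedule

sched = two_level_schedule(n_ranks=8, n_groups=4,
                           tasks=[f"fragment-{i}" for i in range(10)],
                           subtasks_per_task=6)
for rank, work in sched.items():
    print(f"rank {rank}: {len(work)} subtasks")
```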

6.
Several methods for density matrix propagation in parallel computing environments are proposed and evaluated. It is demonstrated that the large communication overhead associated with each propagation step (two-sided multiplication of the density matrix by an exponential propagator and its conjugate) may be avoided and the simulation recast in a form that requires virtually no inter-thread communication. Good scaling is demonstrated on a 128-core (16 nodes, 8 cores each) cluster.
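For reference, the communication-heavy step that the recast formulation avoids is the textbook two-sided propagation ρ ← P ρ P†, with P = exp(−iH dt). A serial NumPy/SciPy sketch of that step, with a random Hermitian matrix standing in for a real spin Hamiltonian:

```python
import numpy as np
from scipy.linalg import expm

# Textbook two-sided density-matrix propagation step: rho <- P rho P^dagger.
# This is the step whose naive parallel form needs heavy communication; the
# Hamiltonian here is a random Hermitian stand-in.
n, dt = 64, 1e-3
rng = np.random.default_rng(1)
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = 0.5 * (H + H.conj().T)                     # make it Hermitian
rho = np.eye(n, dtype=complex) / n             # normalized initial state

P = expm(-1j * H * dt)                         # exponential propagator
for _ in range(10):
    rho = P @ rho @ P.conj().T                 # two-sided multiplication
print("trace preserved:", np.isclose(np.trace(rho).real, 1.0))
```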

7.
GROMACS: fast, flexible, and free
This article describes the software suite GROMACS (Groningen MAchine for Chemical Simulation) that was developed at the University of Groningen, The Netherlands, in the early 1990s. The software, written in ANSI C, originates from a parallel hardware project, and is well suited for parallelization on processor clusters. By careful optimization of neighbor searching and of inner loop performance, GROMACS is a very fast program for molecular dynamics simulation. It does not have a force field of its own, but is compatible with GROMOS, OPLS, AMBER, and ENCAD force fields. In addition, it can handle polarizable shell models and flexible constraints. The program is versatile, as force routines can be added by the user, tabulated functions can be specified, and analyses can be easily customized. Nonequilibrium dynamics and free energy determinations are incorporated. Interfaces with popular quantum-chemical packages (MOPAC, GAMESS-UK, GAUSSIAN) are provided to perform mixed MM/QM simulations. The package includes about 100 utility and analysis programs. GROMACS is in the public domain and distributed (with source code and documentation) under the GNU General Public License. It is maintained by a group of developers from the Universities of Groningen, Uppsala, and Stockholm, and the Max Planck Institute for Polymer Research in Mainz. Its Web site is http://www.gromacs.org.

8.
We report on a python interface to the GROMACS molecular simulation package, GromPy (available at https://github.com/GromPy). This application programming interface (API) uses the ctypes python module that allows function calls to shared libraries, for example, written in C. To the best of our knowledge, this is the first reported interface to the GROMACS library that uses direct library calls. GromPy can be used for extending the current GROMACS simulation and analysis modes. In this work, we demonstrate that the interface enables hybrid Monte-Carlo/molecular dynamics (MD) simulations in the grand-canonical ensemble, a simulation mode that is currently not implemented in GROMACS. For this application, the interplay between GromPy and GROMACS requires only minor modifications of the GROMACS source code, not affecting the operation, efficiency, and performance of the GROMACS applications. We validate the grand-canonical application against MD in the canonical ensemble by comparison of equations of state. The results of the grand-canonical simulations are in complete agreement with MD in the canonical ensemble. The python overhead of the grand-canonical scheme is only minimal. © 2012 Wiley Periodicals, Inc.
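The mechanism GromPy relies on — calling functions in a compiled shared library directly from Python via ctypes — follows the generic pattern below. The sketch uses the standard C math library rather than the GROMACS library, so the symbols shown are not GromPy's actual API.

```python
import ctypes
import ctypes.util

# Generic ctypes pattern: load a shared library and declare argument/return
# types before calling into it. The C math library stands in for libgromacs;
# GromPy's actual calls and symbols differ.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))   # 1.0, computed by the C library, not by Python
```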

9.
Communication algorithms, tailored for molecular dynamics simulation on d-meshes, are evaluated in terms of communication efficiency. It has been shown elsewhere that d-meshes are better than other regular topologies, e.g., hypercubes and standard toroidal 4-meshes, when compared in their diameter and average distance among nodes. Collective communication is needed in molecular dynamics simulation for the distribution of coordinates and calculation and distribution of new energies. We show that both collective communication patterns used in molecular dynamics can be efficiently solved with congestion-free algorithms for all-to-all communication based on store-and-forward routing and routing tables. Our results indicate that d-meshes compete with hypercubes in parallel computers. Therefore d-meshes can also be used as a communication upgrade of existing molecular dynamics simulation platforms and can be successfully applied to perform fast molecular dynamics simulation.
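The comparison metrics mentioned here (diameter and average node distance) are easy to compute for small networks with networkx. In the sketch below, a 64-node 3-D torus stands in for a mesh-like topology and is contrasted with a 64-node hypercube; the paper's d-meshes are a different construction, so this only illustrates the metric, not the paper's result.

```python
import networkx as nx

# Compare two 64-node regular topologies by diameter and average distance.
torus = nx.grid_graph(dim=[4, 4, 4], periodic=True)   # 64-node 3-D torus
hypercube = nx.hypercube_graph(6)                     # 64-node hypercube

for name, g in [("3-D torus", torus), ("hypercube", hypercube)]:
    print(f"{name}: diameter = {nx.diameter(g)}, "
          f"average distance = {nx.average_shortest_path_length(g):.2f}")
```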

10.
We have developed a parallel version of our pseudospectral localized Møller–Plesset electronic structure code. We present timings for molecules up to 1010 basis functions and parallel speedup for molecules in the range of 260–658 basis functions. We demonstrate that the code is scalable; that is, a larger number of nodes can be efficiently utilized as the size of the molecule increases. By taking advantage of the available distributed memory and disk space of a scalable parallel computer, the parallel code can calculate LMP2 energies of molecules too large to be done on workstations. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1030–1038, 1998

11.
A parallel version of the popular molecular mechanics package AMBER suitable for execution on workstation clusters has been developed. Computer-intensive portions of molecular dynamics or free-energy perturbation computations, such as nonbonded pair list generation or calculation of nonbonded energies and forces, are distributed across a collection of Unix workstations linked by Ethernet or FDDI connections. This parallel implementation utilizes the message-passing software PVM (Parallel Virtual Machine) from Oak Ridge National Laboratory to coordinate data exchange and processor synchronization. Test simulations performed for solvated peptide, protein, and lipid bilayer systems indicate that reasonable parallel efficiency (70–90%) and computational speedup (2–5 × serial computer runtimes) can be achieved with small workstation clusters (typically six to eight machines) for typical biomolecular simulation problems. PVM-AMBER is also easily and rapidly portable to different hardware platforms due to the availability of PVM for numerous computers. The current version of PVM-AMBER has been tested successfully on Silicon Graphics, IBM RS6000, DEC ALPHA, and HP 735 workstation clusters and heterogeneous clusters of these machines, as well as on CRAY T3D and Kendall Square KSR2 parallel supercomputers. Thus, PVM-AMBER provides a simple and cost-effective mechanism for parallel molecular dynamics simulations on readily available hardware platforms. Factors that affect the efficiency of this approach are discussed. © 1995 by John Wiley & Sons, Inc.

12.
We carried out density functional calculations to study the adsorption of Co13 clusters on graphene. Several free isomers were deposited at different positions with respect to the hexagonal lattice nodes, allowing us to study even the hcp 2d isomer, which was recently obtained as the most stable one. Surprisingly, the Co13 clusters attached to graphene prefer icosahedron-like structures in which the low-lying isomer is much distorted; in such structures, they are linked with more bonds than those reported in previous works. For any isomer, the most stable position binds to graphene by the Co atoms that can lose electrons. We find that the charge transfer between graphene and the clusters is small enough to conclude that the Co–graphene binding is not ionic-like but chemical. Besides, the same order of stability among the different isomers on doped graphene is kept. These findings could also be of interest for magnetic clusters on graphenic nanostructures such as ribbons and nanotubes.

13.
Compute Unified Device Architecture (CUDA) was used to design and implement molecular dynamics (MD) simulations on graphics processing units (GPU). With an NVIDIA Tesla C870, a 20–60 fold speedup over one core of an Intel Xeon 5430 CPU was achieved, reaching up to 150 Gflops. MD simulation of cavity flow and particle-bubble interaction in liquid was implemented on multiple GPUs using a message passing interface (MPI). Up to 200 GPUs were tested on a special network topology, achieving good scalability. The capability of GPU clusters for large-scale molecular dynamics simulation of meso-scale flow behavior was, therefore, demonstrated. Supported by the National Natural Science Foundation of China (Grant Nos. 20336040, 20221603 and 20490201), and the Chinese Academy of Sciences (Grant No. Kgcxz-yw-124)

14.
A program to optimize the structure of large molecules at the Hartree–Fock level of theory running concurrently on a network of workstations is presented. Problems encountered in obtaining nearly optimal speedup and their solutions are discussed. A simple scheduling algorithm is presented that enables up to 99.5% of the code to run in parallel. © 1993 John Wiley & Sons, Inc.
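The practical value of getting 99.5% of the code to run in parallel follows directly from Amdahl's law, S(N) = 1 / ((1 − p) + p/N). A quick sketch with p = 0.995 (the worker counts are illustrative):

```python
# Amdahl's law: maximum speedup on N workers when a fraction p of the work
# is parallelizable. With p = 0.995 the serial 0.5% quickly dominates.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

p = 0.995
for n in (4, 16, 64, 256):
    print(f"{n:4d} workers: speedup {amdahl_speedup(p, n):6.1f} (ideal {n})")
```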

15.
We report major algorithmic improvements of the UNRES package for physics-based coarse-grained simulations of proteins. These include (i) introduction of interaction lists to optimize computations, (ii) transforming the inertia matrix to a pentadiagonal form to reduce computing and memory requirements, (iii) removing explicit angles and dihedral angles from energy expressions and recoding the most time-consuming energy/force terms to minimize the number of operations and to improve numerical stability, (iv) using OpenMP to parallelize those sections of the code for which distributed-memory parallelization involves unfavorable computing/communication time ratio, and (v) careful memory management to minimize simultaneous access of distant memory sections. The new code enables us to run molecular dynamics simulations of protein systems with size exceeding 100,000 amino-acid residues, reaching over 1 ns/day (1 μs/day in all-atom timescale) with 24 cores for proteins of this size. Parallel performance of the code and comparison of its performance with that of AMBER, GROMACS and MARTINI 3 is presented.
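The payoff of a pentadiagonal inertia matrix (point ii) is that it can be stored and factorized in banded form, in O(n) memory and time rather than O(n²) storage and O(n³) work for a dense solve. The SciPy sketch below uses a random, diagonally dominant pentadiagonal matrix as a stand-in, not the actual UNRES inertia matrix.

```python
import numpy as np
from scipy.linalg import solve_banded

# Solve A x = b for a pentadiagonal A using banded storage (5 rows x n cols)
# instead of the full dense matrix. A is a random diagonally dominant stand-in.
n = 2000
rng = np.random.default_rng(2)
A = np.zeros((n, n))
for offset in (-2, -1, 0, 1, 2):
    values = rng.standard_normal(n - abs(offset)) + (10.0 if offset == 0 else 0.0)
    A += np.diag(values, k=offset)
b = rng.standard_normal(n)

ab = np.zeros((5, n))                       # banded storage: ab[2 + i - j, j] = A[i, j]
for i in range(n):
    for j in range(max(0, i - 2), min(n, i + 3)):
        ab[2 + i - j, j] = A[i, j]

x = solve_banded((2, 2), ab, b)             # O(n) banded solve
print("max residual:", np.max(np.abs(A @ x - b)))
```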

16.
We address the problem of diffusion through dynamic Ising network structures using random walkers (RWs) whose net displacements are partitioned into two contributions, arising from (1) transport through neighboring "conducting" clusters and (2) self-diffusion of the site on which the RW finds itself, respectively. At finite temperatures, the conducting clusters in the network exhibit correlated dynamic behavior, making our model system different from most prior published work, which has largely been at the random percolation limit. We also present a novel heuristic scaling analysis for this system that utilizes a new scaling exponent θ_z for representing RW trapping time as a function of "distance" from the dynamic percolation transition. Simulation results in two-dimensional networks show that when θ_z = 2, a value found from independent physical arguments, our scaling equations appear to capture universal behavior in the system, at both the random percolation (infinite temperature) and finite temperature conditions studied. This study suggests that the model and the scaling approach given here should prove useful for studying transport in physical systems showing dynamic disorder.

17.
Summary A scalable integral-direct, distributed-data parallel algorithm for the four-index transformation is presented. The algorithm was implemented in the context of the second-order Møller–Plesset (MP2) energy evaluation, yet it is easily adapted for other electron correlation methods where only MO integrals with two indices in the virtual orbital space are required. The major computational steps of the MP2 energy are the two-electron integral evaluation, O(N^4), and the transformation into the MO basis, O(O N^4), where N is the number of basis functions and O the number of occupied orbitals, respectively. The associated maximal communication costs scale as O(n_Σ O^2 V N), where V and n_Σ denote the number of virtual orbitals and the number of symmetry-unique shells. The largest local and global memory requirements are O(N^2) for the MO coefficients and O(O V N) for the three-quarter transformed integrals, respectively. Several aspects of the implementation such as symmetry treatment, integral prescreening, and the distribution of data and computational tasks are discussed. The parallel efficiency of the algorithm is demonstrated by calculations on the phenanthrene molecule, with 762 primitive Gaussians contracted to 412 basis functions. The calculations were performed on an IBM SP2 with 48 nodes. The measured wall clock time on 48 nodes is less than 15 min for this calculation, and the speedup relative to single-node execution is estimated at 527. This superlinear speedup is a result of exploiting both the compute power and the aggregate memory of the parallel computer. The latter reduces the number of passes through the AO integral list, and hence the operation count of the calculation. The test calculations also show that the evaluation of the two-electron integrals dominates the calculation, despite the higher scaling of the transformation step.
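The operation being parallelized is the standard quarter-by-quarter AO→MO integral transformation. A dense serial NumPy sketch on tiny random arrays makes the quarter-wise structure explicit; a real code is integral-direct and distributes the work in batches, as described above.

```python
import numpy as np

# Dense serial sketch of the AO -> MO four-index transformation needed for
# MP2, done quarter by quarter. Tiny random arrays stand in for real
# integrals and MO coefficients.
n, nocc, nvirt = 8, 3, 5
rng = np.random.default_rng(3)
eri_ao = rng.standard_normal((n, n, n, n))      # (pq|rs) in the AO basis
C_occ = rng.standard_normal((n, nocc))          # occupied MO coefficients
C_virt = rng.standard_normal((n, nvirt))        # virtual MO coefficients

half = np.einsum('pi,pqrs->iqrs', C_occ, eri_ao)         # 1st quarter: O(O N^4) work
half = np.einsum('qa,iqrs->iars', C_virt, half)          # 2nd quarter
three_quarter = np.einsum('rj,iars->iajs', C_occ, half)  # 3rd quarter
ovov = np.einsum('sb,iajs->iajb', C_virt, three_quarter) # 4th quarter: (ia|jb)
print("MO integrals (ia|jb):", ovov.shape)               # (nocc, nvirt, nocc, nvirt)
```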

18.
A custom code for molecular dynamics simulations has been designed to run on CUDA-enabled NVIDIA graphics processing units (GPUs). The double-precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse-grained and atomistic models, holonomic constraints, Nosé–Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard-Jones and Gay-Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n-hexane/2-propanol mixture; and a liquid crystal mesogen, 2-(4-butyloxyphenyl)-5-octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33–119 fold increase in performance is achieved compared with the serial code while the use of two GPUs leads to a 69–287 fold improvement and three GPUs yield a 101–377 fold speedup. © 2015 Wiley Periodicals, Inc.
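The kind of kernel that dominates such a code — a Lennard-Jones interaction evaluated only for pairs within a cutoff — can be written serially in a few lines of NumPy. This is a generic sketch in reduced units with a brute-force pair search, not the GPU implementation described in the abstract.

```python
import numpy as np

# Generic serial Lennard-Jones energy with a distance cutoff (reduced units,
# brute-force pair search, cubic periodic box). A GPU code evaluates the same
# kind of pair list in parallel on the device.
def lj_energy(pos, box, rcut=2.5, epsilon=1.0, sigma=1.0):
    i, j = np.triu_indices(len(pos), k=1)          # all unique pairs
    d = pos[i] - pos[j]
    d -= box * np.round(d / box)                   # minimum-image convention
    r2 = np.einsum('ij,ij->i', d, d)
    mask = r2 < rcut ** 2                          # pairs inside the cutoff
    inv6 = (sigma ** 2 / r2[mask]) ** 3
    return np.sum(4.0 * epsilon * (inv6 ** 2 - inv6))

rng = np.random.default_rng(4)
box = 10.0
positions = rng.uniform(0.0, box, size=(500, 3))
print("LJ energy:", lj_energy(positions, box))
```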

19.
Summary One of the key methods in quantum chemistry, the Hartree-Fock SCF method, performs poorly on typical vector supercomputers. A significant acceleration of calculations of this type requires the development and implementation of a parallel SCF algorithm. In this paper various parallelization strategies are discussed, comparing local and global communication management as well as sequential and distributed Fock-matrix updates. Programs based on these algorithms are benchmarked on transputer networks and two IBM MIMD prototypes. The portability of the code is demonstrated with the porting of the initial Helios version to other operating systems like Parallel VM/SP and PARIX. Based on the PVM libraries, a platform-independent version has been developed for heterogeneous workstation clusters as well as for massively parallel computers.
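The "distributed Fock-matrix update" strategy compared here boils down to each process building a partial Fock matrix from its own share of the integrals and then performing a global reduction. A minimal mpi4py sketch of that pattern, with random partial contributions in place of real integral batches:

```python
import numpy as np
from mpi4py import MPI

# Replicated-data SCF pattern: every rank builds a partial Fock matrix from
# its own share of the two-electron integrals, then a global sum produces the
# full Fock matrix on all ranks. Random data stands in for real integrals.
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 50
rng = np.random.default_rng(seed=rank)
partial_fock = rng.standard_normal((n, n))            # this rank's contribution
partial_fock = 0.5 * (partial_fock + partial_fock.T)

full_fock = np.empty_like(partial_fock)
comm.Allreduce(partial_fock, full_fock, op=MPI.SUM)    # global Fock-matrix update

if rank == 0:
    print(f"summed Fock matrix over {size} ranks, norm {np.linalg.norm(full_fock):.2f}")
```

Run, for example, with `mpirun -np 4 python fock_allreduce.py` (the script name is arbitrary).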

20.
The ProBiS algorithm performs a local structural comparison of the query protein surface against the nonredundant database of protein structures. It finds proteins that have binding sites in common with the query protein. Here, we present a new parallelized algorithm, Parallel-ProBiS, for detecting similar binding sites on clusters of computers. The obtained speedups of the parallel ProBiS scale almost ideally with the number of computing cores up to about 64 computing cores. Scaling is better for larger than for smaller query proteins. For a protein with almost 600 amino acids, the maximum speedup of 180 was achieved on two interconnected clusters with 248 computing cores. Source code of Parallel-ProBiS is available for download free for academic users at http://probis.cmm.ki.si/download. © 2012 Wiley Periodicals, Inc.
