Similar articles
Found 20 similar articles; search took 296 ms
1.
We show how to accelerate a grid‐based search for the critical points of the electron density with graphics processing units (GPUs). Compared with the implementation on central processing units (CPUs), the GPU implementation takes markedly less time. We tested two GPUs, one designed for video games and the other for high‐performance computing (HPC). On the CPU side, two latest‐generation processors were tested, one used in common personal computers and the other for HPC. Although our parallel algorithm scales quite well on CPUs, the same implementation on GPUs runs around 10× faster than 16 CPUs, with any of the tested GPUs and CPUs. We found that a GPU designed for video games can be used without any problem for our application and delivers remarkable performance; in fact, this GPU competes with an HPC GPU, in particular when single precision is used. © 2014 Wiley Periodicals, Inc.

2.
In this article we report on the coupled-cluster factorization problem. We describe the first implementation that optimizes (i) the contraction order for each term, (ii) the identification of reusable intermediates, and (iii) the selection and factoring out of common factors simultaneously, considering all projection levels in a single step. The optimization is achieved by means of a genetic algorithm. Taking a one-term-at-a-time strategy as reference, our factorization yields speedups of up to 4 (for intermediate excitation levels, smaller basis sets). We derive a theoretical lower bound for the highest order scaling cost and show that it is met by our implementation. Additionally, we report on the performance of the resulting highly excited coupled-cluster algorithms and find significant improvements with respect to the implementation of Kállay and Surján [J. Chem. Phys. 115, 2945 (2001)] and comparable performance with respect to MOLPRO's handwritten and dedicated open shell coupled cluster with singles and doubles substitutions implementation [P. J. Knowles, C. Hampel, and H.-J. Werner, J. Chem. Phys. 99, 5219 (1993)].

3.
During the past few years, graphics processing units (GPUs) have become extremely popular in the high performance computing community. In this study, we present an implementation of an acceleration engine for the solvent–solvent interaction evaluation of molecular dynamics simulations. By careful optimization of the algorithm, speed‐ups of up to a factor of 54 (single‐precision GPU vs. double‐precision CPU) could be achieved. The accuracy of the single‐precision GPU implementation is carefully investigated; the reduced precision does not influence structural, thermodynamic, and dynamic quantities. Therefore, the implementation enables users of the GROMOS software for biomolecular simulation to run the solvent–solvent interaction evaluation on a GPU, and thus to speed up their simulations by a factor of 6–9. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010

4.
Virtual screening of large libraries of small compounds requires fast and reliable automatic docking methods. In this article we present a parallel implementation of a genetic algorithm (GA) and the implementation of an enhanced genetic algorithm (EGA) with niching that lead to remarkable speedups compared to the original version AutoDock 3.0. The niching concept is introduced naturally by sharing genetic information between evolutions of subpopulations that run independently, each on one CPU. A unique set of additionally introduced search parameters that control this information flow has been obtained for drug‐like molecules based on the detailed study of three test cases of different complexity. The average docking time for one compound is 8.6 s using eight R10,000 processors running at 200 MHz in an Origin 2000 computer. Different genetic algorithms with and without local search (LS) have been compared on an equal workload basis, showing EGA/LS to be superior to all alternatives because it finds lower energy solutions faster and more often, particularly for high dimensionality problems. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1971–1982, 2001

5.
A new parallel algorithm and its implementation for the RI‐MP2 energy calculation utilizing peta‐flop‐class many‐core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual‐level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes and (2) a new data communication scheme that reduces network communication overhead. A multi‐node and multi‐GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. The peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer. The peak performance of the multi‐node and multi‐GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5. © 2016 Wiley Periodicals, Inc.

6.
We present a highly parallel algorithm to convert internal coordinates of a polymeric molecule into Cartesian coordinates. Traditionally, converting the structures of polymers (e.g., proteins) from internal to Cartesian coordinates has been performed serially, due to an inherent linear dependency along the polymer chain. We show this dependency can be removed using a tree-based concatenation of coordinate transforms between segments, and then parallelized efficiently on graphics processing units (GPUs). The conversion algorithm is applicable to protein engineering and fitting protein structures to experimental data, and we observe an order of magnitude speedup using parallel processing on a GPU compared to serial execution on a CPU.
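
The idea of removing the serial dependency by concatenating transforms in a tree can be illustrated with a small sketch. This is not the authors' code: it assumes each chain segment is described by a 4×4 homogeneous rigid transform (an assumption made for illustration), and it uses a Hillis–Steele inclusive scan, one common way to compute all prefix products of an associative operation in about log2(n) data-parallel rounds. The function names `random_rigid_transforms`, `serial_prefix`, and `scan_prefix` are hypothetical.

```python
import numpy as np


def random_rigid_transforms(n, seed=0):
    """Return n random 4x4 homogeneous rigid transforms (illustrative input only)."""
    rng = np.random.default_rng(seed)
    T = np.tile(np.eye(4), (n, 1, 1))
    for i in range(n):
        q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
        if np.linalg.det(q) < 0:  # flip one column so q is a proper rotation
            q[:, 0] = -q[:, 0]
        T[i, :3, :3] = q
        T[i, :3, 3] = rng.normal(size=3)
    return T


def serial_prefix(T):
    """Reference: walk the chain, accumulating T[0] @ T[1] @ ... @ T[i]."""
    out = np.empty_like(T)
    acc = np.eye(4)
    for i in range(len(T)):
        acc = acc @ T[i]
        out[i] = acc
    return out


def scan_prefix(T):
    """Inclusive scan: every round combines each entry with the one `step` to its left.

    All products within a round are independent, so on a GPU each round
    could run as one batched matrix multiplication.
    """
    out = T.copy()
    n = len(out)
    step = 1
    while step < n:
        nxt = out.copy()
        # Left factor covers the earlier segment, so the chain order is preserved.
        nxt[step:] = out[:-step] @ out[step:]
        out = nxt
        step *= 2
    return out


if __name__ == "__main__":
    T = random_rigid_transforms(100)
    diff = np.abs(scan_prefix(T) - serial_prefix(T)).max()
    print("max deviation from serial result:", diff)
```

The scan performs more multiplications in total than the serial walk, but only a logarithmic number of dependent rounds; a work-efficient tree (up-sweep and down-sweep) would reduce the total work at the cost of a slightly longer sketch.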

7.
The quantum yields for photouncaging reactions are mostly determined relative to other uncaging reactions, often using 1‐(2‐nitrophenyl)ethyl‐phosphate (“caged phosphate”). Herein, we demonstrate that the quantum yields acquired by using this method can be off by an order of magnitude at the typical irradiation wavelengths around 350 nm and describe an easy‐to‐use alternative procedure using inexpensive azobenzene.

8.
We report the porting of the Divide‐Expand‐Consolidate Resolution of the Identity second‐order Møller–Plesset perturbation (DEC‐RI‐MP2) method to graphics processing units (GPUs) using OpenACC compiler directives. It is shown that the OpenACC compiler directives implementation efficiently accelerates the rate‐determining step of the DEC‐RI‐MP2 method with minor implementation effort. Moreover, the GPU acceleration results in a better load balance and thus in an overall scaling improvement of the DEC algorithm. The resulting cross‐platform hybrid MPI/OpenMP/OpenACC implementation has scalable and portable performance on heterogeneous HPC architectures. The GPU‐enabled code was benchmarked using a reduced version of the S12L test set of Stefan Grimme (Grimme, Chem. Eur. J. 2012, 18, 9955) consisting of supramolecular complexes up to 158 atoms and 4292 contracted basis functions (cc‐pVTZ). The test set results demonstrate the general applicability of the DEC‐RI‐MP2 method, showing results consistent with the DEC‐RI‐MP2 introductory paper (Baudin et al., J. Chem. Phys. 2016, 144, 054102) on molecules of complicated electronic structures. © 2016 Wiley Periodicals, Inc.

9.
We present an algorithm to efficiently compute accurate volumes and surface areas of macromolecules on graphical processing unit (GPU) devices using an analytic model which represents atomic volumes by continuous Gaussian densities. The volume of the molecule is expressed by means of the inclusion–exclusion formula, which is based on the summation of overlap integrals among multiple atomic densities. The surface area of the molecule is obtained by differentiation of the molecular volume with respect to atomic radii. The many‐body nature of the model makes a port to GPU devices challenging. To our knowledge, this is the first reported full implementation of this model on GPU hardware. To accomplish this, we have used recursive strategies to construct the tree of overlaps and to accumulate volumes and their gradients on the tree data structures so as to minimize memory contention. The algorithm is used in the formulation of a surface area‐based non‐polar implicit solvent model implemented as an open source plug‐in (named GaussVol) for the popular OpenMM library for molecular mechanics modeling. GaussVol is 50 to 100 times faster than our best optimized implementation for the CPUs, achieving speeds in excess of 100 ns/day with 1 fs time‐step for protein‐sized systems on commodity GPUs. © 2017 Wiley Periodicals, Inc.
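
The inclusion–exclusion volume described above can be written out explicitly. The following is a sketch under stated assumptions, not a transcription of the paper: each atom $i$ is taken to carry a spherical Gaussian density with height $p_i$ and exponent $\alpha_i$ (both symbols are assumptions introduced here; in models of this kind $\alpha_i$ is usually tied to the atomic radius $R_i$), and $d_{ij}$ denotes the distance between atoms $i$ and $j$. The pair overlap follows from the standard Gaussian product rule.

```latex
\begin{align}
  \rho_i(\mathbf{r}) &= p_i \exp\!\left(-\alpha_i \lvert \mathbf{r}-\mathbf{r}_i \rvert^2\right),
  \qquad
  V_i = \int \rho_i \,\mathrm{d}^3 r = p_i \left(\frac{\pi}{\alpha_i}\right)^{3/2}, \\
  V_{ij} &= \int \rho_i \rho_j \,\mathrm{d}^3 r
          = p_i p_j \exp\!\left(-\frac{\alpha_i \alpha_j}{\alpha_i+\alpha_j}\, d_{ij}^2\right)
            \left(\frac{\pi}{\alpha_i+\alpha_j}\right)^{3/2}, \\
  V &= \sum_i V_i - \sum_{i<j} V_{ij} + \sum_{i<j<k} V_{ijk} - \cdots, \\
  A &= \sum_i \frac{\partial V}{\partial R_i}.
\end{align}
```

Higher-order overlaps such as $V_{ijk}$ follow by applying the product rule repeatedly, since the product of two Gaussians is again a Gaussian; this is what makes a recursive tree of overlaps a natural data structure for the sum.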

10.
The capabilities of the polarizable force fields for alchemical free energy calculations have been limited by the high computational cost and complexity of the underlying potential energy functions. In this work, we present a GPU‐based general alchemical free energy simulation platform for the polarizable potential AMOEBA. Tinker‐OpenMM, the OpenMM implementation of the AMOEBA simulation engine, has been modified to enable both absolute and relative alchemical simulations on GPUs, which leads to a ∼200‐fold improvement in simulation speed over a single CPU core. We show that free energy values calculated using this platform agree with the results of Tinker simulations for the hydration of organic compounds and binding of host–guest systems within the statistical errors. In addition to absolute binding, we designed a relative alchemical approach for computing relative binding affinities of ligands to the same host, where a special path was applied to avoid numerical instability due to polarization between the different ligands that bind to the same site. This scheme is general and does not require ligands to have similar scaffolds. We show that relative hydration and binding free energies calculated using this approach match those computed from the absolute free energy approach. © 2017 Wiley Periodicals, Inc.

11.
Molecular dynamics (MD) simulations are a vital tool in chemical research, as they are able to provide an atomistic view of chemical systems and processes that is not obtainable through experiment. However, large‐scale MD simulations require access to multicore clusters or supercomputers that are not always available to all researchers. Recently, scientists have returned to exploring the power of graphics processing units (GPUs) for various applications, such as MD, enabled by the recent advances in hardware and integrated programming interfaces such as NVIDIA's CUDA platform. One area of particular interest within the context of chemical applications is that of aqueous interfaces, the salt solutions of which have found application as model systems for studying atmospheric processes as well as physical behaviors such as the Hofmeister effect. Here, we present results of GPU‐accelerated simulations of the liquid–vapor interface of aqueous sodium iodide solutions. Analysis of various properties, such as density and surface tension, demonstrates that our model is consistent with previous studies of similar systems. In particular, we find that the current combination of water and ion force fields, coupled with the ability to simulate surfaces of differing area enabled by GPU hardware, is able to reproduce the experimental trend of increasing salt solution surface tension relative to pure water. In terms of performance, our GPU implementation performs equivalently to CHARMM running on 21 CPUs. Finally, we address possible issues with the accuracy of MD simulations caused by nonstandard single‐precision arithmetic implemented on current GPUs. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2011

12.
We describe the implementation of a parallel, in-core, integral-direct code for the efficient calculation of Hartree-Fock and density functional theory wave functions. The algorithm is based on a parallel master-slave scheme, and the two-electron integrals calculated by a slave are stored in available local memory. To ensure the greatest computational savings, the master node keeps track of all integral batches stored on the different slaves. The code can reuse undifferentiated two-electron integrals both in the wave function optimization and in the evaluation of second-, third-, and fourth-order molecular properties. Superlinear scaling is achieved in a series of test examples, with speedups of up to 55 achieved for calculations run on medium-sized molecules on 16 processors with respect to the time used on a single processor.

13.
Ray casting on graphics processing units (GPUs) opens new possibilities for molecular visualization. We describe the implementation and calculation of diverse molecular representations such as licorice, ball-and-stick, space-filling van der Waals spheres, and approximated solvent-accessible surfaces using GPUs. We introduce HyperBalls, an improved ball-and-stick representation replacing tubes, linking the atom spheres by hyperboloids that can smoothly connect them. This type of depiction is particularly useful to represent dynamic phenomena, such as the evolution of noncovalent bonds. It is furthermore well suited to represent coarse-grained models and spring networks. All these representations can be defined by a single general algebraic equation that is adapted for the ray-casting technique and is well suited for execution on the GPU. Using GPU capabilities, this implementation can routinely, accurately, and interactively render molecules ranging from a few atoms up to huge macromolecular assemblies with more than 500,000 particles. In simple cases, based only on spheres, we have been able to display up to two million atoms smoothly.

14.
We describe a method to impose constraints in a molecular dynamics simulation. A technique developed to solve the special case of a linear topology (MILC SHAKE) is hybridized with the SHAKE algorithm. The methodology, which we term MILC‐hybridized SHAKE (or MILCH SHAKE), applies to more complex topologies. Here we consider the important case of all‐atom models of alkanes. Exploiting the mass difference between carbon and hydrogen, we show that for higher alkanes MILCH SHAKE can be an order of magnitude faster than SHAKE. © 2009 Wiley Periodicals, Inc. J Comput Chem, 2009

15.
An efficient implementation of the canonical molecular dynamics simulation using the reversible reference system propagator algorithm (r‐RESPA) combined with the particle mesh Ewald method (PMEM) and with the macroscopic expansion of the fast multipole method (MEFMM) was examined. The performance of the calculations was evaluated for systems with 3000, 9999, 30,000, 60,000, and 99,840 particles. For a given accuracy, the optimal conditions for minimizing the CPU time for the implementation of the Ewald method, the PMEM, and the MEFMM were first analyzed. Using the optimal conditions, we evaluated the performance and the reliability of the integrated methods. For all the systems examined, the r‐RESPA with the PMEM was about twice as fast as the r‐RESPA with the MEFMM. The difference arose from the difference in the numerical complexities of the fast Fourier transform in the PMEM and from the transformation of the multipole moments into the coefficients of the local field expansion in the MEFMM. Compared with conventional methods, such as the velocity‐Verlet algorithm with the Ewald method, significant speedups were obtained by the integrated methods; the speedup of the calculation was a function of system size, and was a factor of 100 for a system with 3000 particles and increased to a factor of 700 for a system with 99,840 particles. These integrated calculations are, therefore, promising for realizing large‐scale molecular dynamics simulations for complex systems. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 201–217, 2000

16.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed. This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, normal mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton–Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton–Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1716–1723, 1998
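
The distribute-and-assemble pattern described above can be sketched in a few lines. This is an illustration only, with several stand-ins that are assumptions rather than details from the paper: a thread pool replaces the MPI processes, each task computes one Hessian row by central finite differences of a user-supplied gradient (the paper works with analytic derivatives in internal coordinates), and the names `hessian_row` and `parallel_hessian` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def hessian_row(grad, x, k, h=1e-5):
    """Worker task: row k of the Hessian from a central difference of the gradient."""
    e = np.zeros_like(x)
    e[k] = h
    return k, (grad(x + e) - grad(x - e)) / (2.0 * h)


def parallel_hessian(grad, x, n_workers=4):
    """Coordinator: hand out one row per task, then assemble the full matrix."""
    n = len(x)
    H = np.empty((n, n))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for k, row in pool.map(lambda k: hessian_row(grad, x, k), range(n)):
            H[k] = row
    # Symmetrize to average out finite-difference noise.
    return 0.5 * (H + H.T)


if __name__ == "__main__":
    # Toy energy E(x) = 0.5 * x^T A x, whose exact Hessian is A.
    A = np.array([[2.0, 0.5], [0.5, 1.0]])
    H = parallel_hessian(lambda x: A @ x, np.array([0.3, -0.7]))
    print(np.abs(H - A).max())
```

Because all results pass through a single coordinator, the assembly step is serial, which is one plausible reason for speedups well below the processor count, such as those reported in the abstract.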

17.
We describe a set of algorithms that make it possible to simulate dihydrofolate reductase (DHFR, a common benchmark) with the AMBER all‐atom force field at 160 nanoseconds/day on a single Intel Core i7 5960X CPU (no graphics processing unit (GPU), 23,786 atoms, particle mesh Ewald (PME), 8.0 Å cutoff, correct atom masses, reproducible trajectory, CPU with 3.6 GHz, no turbo boost, 8 AVX registers). The new features include a mixed multiple time‐step algorithm (reaching 5 fs), a tuned version of LINCS to constrain bond angles, the fusion of pair list creation and force calculation, pressure coupling with a “densostat,” and exploitation of new CPU instruction sets like AVX2. The impact of Intel's new transactional memory, atomic instructions, and sloppy pair lists is also analyzed. The algorithms map well to GPUs and can automatically handle most Protein Data Bank (PDB) files including ligands. An implementation is available as part of the YASARA molecular modeling and simulation program from www.YASARA.org . © 2015 The Authors Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

18.
Nanoparticles can influence the properties of polymer materials by a variety of mechanisms. With fullerene, carbon nanotube, and clay or graphene sheet nanocomposites in mind, we investigate how particle shape influences the melt shear viscosity η and the tensile strength τ, which we determine via molecular dynamics simulations. Our simulations of compact (icosahedral), tube or rod‐like, and sheet‐like model nanoparticles, all at a volume fraction φ ≈ 0.05, indicate an order of magnitude increase in the viscosity η relative to the pure melt. This finding evidently cannot be explained by continuum hydrodynamics, and we provide evidence that the η increase in our model nanocomposites has its origin in chain bridging between the nanoparticles. We find that this increase is the largest for the rod‐like nanoparticles and least for the sheet‐like nanoparticles. Curiously, the enhancements of η and τ exhibit opposite trends with increasing chain length N and with particle shape anisotropy. Evidently, the concept of bridging chains alone cannot account for the increase in τ, and we suggest that the deformability or flexibility of the sheet nanoparticles contributes to nanocomposite strength and toughness by reducing the relative value of the Poisson ratio of the composite. The molecular dynamics simulations in the present work focus on the reference case where the modification of the melt structure associated with glass‐formation and entanglement interactions should not be an issue. Since many applications require good particle dispersion, we also focus on the case where the polymer‐particle interactions favor nanoparticle dispersion. Our simulations point to a substantial contribution of nanoparticle shape to both mechanical and processing properties of polymer nanocomposites. © 2007 Wiley Periodicals, Inc. J Polym Sci Part B: Polym Phys 45: 1882–1897, 2007

19.
In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors. The parallelization efficiency for the Coulomb terms can be somewhat smaller (speedup ~15-25 for 30 processors), but their contribution to the total calculation time is small. Thus, the parallel program completes a Becke3-Lee-Yang-Parr energy and gradient calculation on the Ag-TB2-helicate in less than 4 h on 30 processors. We also present the necessary extension of the Lagrangian formalism, which enables the calculation of the TDDFT excited state properties in the frozen-core approximation. The algorithms described in this work are implemented into the ORCA electronic structure system.

20.
A custom code for molecular dynamics simulations has been designed to run on CUDA‐enabled NVIDIA graphics processing units (GPUs). The double‐precision code simulates multicomponent fluids, with intramolecular and intermolecular forces, coarse‐grained and atomistic models, holonomic constraints, Nosé–Hoover thermostats, and the generation of distribution functions. Algorithms to compute Lennard‐Jones and Gay‐Berne interactions, and the electrostatic force using Ewald summations, are discussed. A neighbor list is introduced to improve scaling with respect to system size. Three test systems are examined: SPC/E water; an n‐hexane/2‐propanol mixture; and a liquid crystal mesogen, 2‐(4‐butyloxyphenyl)‐5‐octyloxypyrimidine. Code performance is analyzed for each system. With one GPU, a 33‐ to 119‐fold increase in performance is achieved compared with the serial code, while the use of two GPUs leads to a 69‐ to 287‐fold improvement and three GPUs yield a 101‐ to 377‐fold speedup. © 2015 Wiley Periodicals, Inc.
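
How a neighbor list cuts the cost of short-range forces can be shown with a minimal sketch. It is not taken from the code described above: it assumes a cubic periodic box, a Verlet-style list built with a radius slightly larger than the interaction cutoff, and a plain Lennard‐Jones energy in reduced units; the function names and parameters (`build_neighbor_list`, `lj_energy`, `r_list`, `r_cut`) are hypothetical.

```python
import numpy as np


def minimum_image(d, box):
    """Wrap displacement vectors into the nearest periodic image (cubic box)."""
    return d - box * np.round(d / box)


def build_neighbor_list(pos, box, r_list):
    """Return all pairs (i < j) closer than r_list.

    The list is built by brute force here; it is rebuilt only occasionally,
    so every step in between only visits the stored pairs.
    """
    i, j = np.triu_indices(len(pos), k=1)
    d = minimum_image(pos[i] - pos[j], box)
    mask = np.einsum("ij,ij->i", d, d) < r_list ** 2
    return np.stack([i[mask], j[mask]], axis=1)


def lj_energy(pos, box, pairs, r_cut, eps=1.0, sigma=1.0):
    """Truncated Lennard-Jones energy summed over the given pairs."""
    d = minimum_image(pos[pairs[:, 0]] - pos[pairs[:, 1]], box)
    r2 = np.einsum("ij,ij->i", d, d)
    r2 = r2[r2 < r_cut ** 2]
    s6 = (sigma ** 2 / r2) ** 3
    return float(np.sum(4.0 * eps * (s6 ** 2 - s6)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = np.arange(6) * 1.2  # jittered 6x6x6 lattice in a box of edge 7.2
    pos = np.array(np.meshgrid(g, g, g)).reshape(3, -1).T
    pos = pos + rng.uniform(-0.1, 0.1, pos.shape)
    pairs = build_neighbor_list(pos, box=7.2, r_list=2.8)
    print(len(pairs), "listed pairs of", len(pos) * (len(pos) - 1) // 2)
    print("energy:", lj_energy(pos, 7.2, pairs, r_cut=2.5))
```

The margin between `r_list` and `r_cut` lets the list stay valid for several steps while particles move. Building the list itself in linear time would additionally need a cell decomposition, which is omitted here for brevity.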
