Similar Articles
20 similar articles retrieved (search time: 15 ms)
1.
A parallel algorithm for efficient calculation of the second derivatives (Hessian) of the conformational energy in internal coordinates is proposed. This parallel algorithm is based on the master/slave model. A master processor distributes the calculations of components of the Hessian to one or more slave processors that, after finishing their calculations, send the results to the master processor that assembles all the components of the Hessian. Our previously developed molecular analysis system for conformational energy optimization, normal mode analysis, and Monte Carlo simulation for internal coordinates is extended to use this parallel algorithm for Hessian calculation on a massively parallel computer. The implementation of our algorithm uses the message passing interface and works effectively on both distributed-memory parallel computers and shared-memory parallel computers. We applied this system to the Newton–Raphson energy optimization of the structures of glutaminyl transfer RNA (Gln-tRNA) with 74 nucleotides and glutaminyl-tRNA synthetase (GlnRS) with 540 residues to analyze the performance of our system. The parallel speedups for the Hessian calculation were 6.8 for Gln-tRNA with 24 processors and 11.2 for GlnRS with 54 processors. The parallel speedups for the Newton–Raphson optimization were 6.3 for Gln-tRNA with 30 processors and 12.0 for GlnRS with 62 processors. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1716–1723, 1998  相似文献   
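As a concrete illustration of the master/slave pattern described above, the sketch below distributes independent Hessian blocks over MPI ranks and collects the results on rank 0. It is a minimal, self-contained example and not the authors' code: the block count, block size, and the dummy compute_hessian_block() routine are placeholders standing in for the real second-derivative evaluation in internal coordinates.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NBLOCKS  64        /* number of independent Hessian blocks (illustrative) */
    #define BLOCKLEN 256       /* doubles per block (illustrative) */

    /* Placeholder for the actual evaluation of one block of second derivatives. */
    static void compute_hessian_block(int block, double *buf, int len) {
        for (int i = 0; i < len; ++i)
            buf[i] = block + 0.001 * i;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        double *buf = malloc(BLOCKLEN * sizeof(double));

        if (rank == 0 && size > 1) {            /* master: hand out blocks, assemble results */
            MPI_Status st;
            int next = 0, done = 0;
            int active = (size - 1 < NBLOCKS) ? size - 1 : NBLOCKS;
            for (int p = 1; p <= active; ++p) { /* prime each slave with one block index */
                MPI_Send(&next, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
                ++next;
            }
            while (done < NBLOCKS) {
                MPI_Recv(buf, BLOCKLEN, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &st);
                ++done;                         /* real code would copy buf into the full Hessian here */
                if (next < NBLOCKS) {           /* keep the sending slave busy with the next block */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0, MPI_COMM_WORLD);
                    ++next;
                }
            }
            int stop = -1;                      /* release every slave */
            for (int p = 1; p < size; ++p)
                MPI_Send(&stop, 1, MPI_INT, p, 0, MPI_COMM_WORLD);
            printf("assembled %d Hessian blocks\n", done);
        } else if (rank > 0) {                  /* slave: compute whichever block is assigned */
            for (;;) {
                int block;
                MPI_Recv(&block, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                if (block < 0) break;
                compute_hessian_block(block, buf, BLOCKLEN);
                MPI_Send(buf, BLOCKLEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }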

2.
One of the most commonly used means to characterize potential energy surfaces of reactions and chemical systems is the Hessian calculation, whose analytic evaluation is computationally and memory demanding. A new scalable distributed data analytic Hessian algorithm is presented. Features of the distributed data parallel coupled perturbed Hartree-Fock (CPHF) are (a) columns of density-like and Fock-like matrices are distributed among processors, (b) an efficient static load balancing scheme achieves good work load distribution among the processors, (c) network communication time is minimized, and (d) numerous performance improvements in analytic Hessian steps are made. As a result, the new code has good performance which is demonstrated on large biological systems.  相似文献   
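To make point (b) concrete, the following toy sketch shows one simple static load-balancing strategy of this general kind: matrix columns, pre-sorted by an estimated cost, are assigned before the CPHF iterations begin to whichever processor currently has the smallest accumulated load. The cost estimates and counts are invented for illustration; the paper's actual balancing scheme may differ.

    #include <stdio.h>

    #define NCOLS 12
    #define NPROC 4

    int main(void) {
        /* per-column cost estimates, e.g. counts of significant shell pairs (made up) */
        double cost[NCOLS] = {9, 7, 7, 6, 5, 5, 4, 3, 3, 2, 2, 1};
        double load[NPROC] = {0};
        int owner[NCOLS];

        for (int c = 0; c < NCOLS; ++c) {       /* columns visited in decreasing-cost order */
            int best = 0;
            for (int p = 1; p < NPROC; ++p)     /* pick the currently least-loaded processor */
                if (load[p] < load[best]) best = p;
            owner[c] = best;
            load[best] += cost[c];
        }
        for (int p = 0; p < NPROC; ++p)
            printf("processor %d: estimated load %.0f\n", p, load[p]);
        for (int c = 0; c < NCOLS; ++c)
            printf("column %d -> processor %d\n", c, owner[c]);
        return 0;
    }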

3.
We investigate and test an algorithm suitable for the parallel calculation of the potential energy of a protein, or its spatial gradient, when the protein atoms interact via pair potentials. This algorithm is similar to one previously proposed, but it is more efficient, having half the interprocessor communications costs. For a given protein, we show that there is an optimal number of processors that gives a maximum speedup of the potential energy calculation compared to a sequential machine. (Using more than the optimum number of processors actually increases the computation time). With the optimum number the computation time is proportional to the protein size N. This is a considerable improvement in performance compared to sequential machines, where the computation time is proportional to N2. We also show that the dependence of the maximum speedup on the message latency time is relatively weak.  相似文献   
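The existence of an optimal processor count, the linear-in-N cost at that optimum, and the weak latency dependence are all reproduced by a simple cost model (an assumption for orientation, not taken from the paper): if the pair-interaction work per processor scales as a*N^2/P and the per-step communication overhead grows with the processor count as b*P, with b set largely by the message latency, then T(P) = a*N^2/P + b*P is minimized at P_opt = N*sqrt(a/b), where T(P_opt) = 2*N*sqrt(a*b) grows only linearly with N, and the maximum speedup T(1)/T(P_opt) ≈ (N/2)*sqrt(a/b) depends on the latency only through a factor of 1/sqrt(b).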

4.
We have implemented a parallel divide-and-conquer method for semiempirical quantum mechanical calculations, using the standard message passing library, the message passing interface (MPI). In this parallel version, the memory needed to store the Fock and density matrix elements is distributed among the processors, which addresses the demanding memory requirements of very large molecules. While the parallel construction of the matrix elements is straightforward, the parallel Fock matrix diagonalization is achieved via the divide-and-conquer method. Geometry optimization is also implemented, with parallel gradient calculations. The code has been tested on a Cray T3E parallel computer, and impressive speedups have been achieved. Our results indicate that the divide-and-conquer method is well suited to parallel implementation. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1101–1109, 1998

5.
We developed a novel parallel algorithm for large-scale Fock matrix calculation on architectures with small, locally distributed memory, and named it the "RT parallel algorithm." The RT parallel algorithm incorporates integral screening, which is indispensable for reducing computing times for large-scale biological molecules. Its primary characteristic is parallel efficiency, achieved by a well-balanced reduction of both communication and computation volume. Only the density matrix data necessary for the Fock matrix calculation are communicated, and data, once communicated, are reused in as many calculations as possible. The RT parallel algorithm is scalable because the required memory does not depend on the number of basis functions. The algorithm automatically includes a partial-summing technique that is indispensable for maintaining numerical accuracy, and it can also incorporate conventional methods for reducing calculation times. In our analysis, the RT parallel algorithm outperformed other methods on massively parallel processors and is best suited to massively parallel, distributed Fock matrix calculations for large-scale biological molecules with thousands of basis functions or more.
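The abstract does not spell out the screening criterion; a standard Cauchy-Schwarz bound of the kind commonly used for this purpose is sketched below (self-contained, with invented numbers). Here q[p] stands in for sqrt((pq|pq)) of shell pair p, and a two-electron integral quartet is skipped whenever the product of the two estimates falls below a threshold.

    #include <stdio.h>

    #define NPAIRS 6

    int main(void) {
        /* illustrative Schwarz estimates q[p] ~ sqrt((pq|pq)) for six shell pairs */
        double q[NPAIRS] = {1.0, 0.3, 1e-3, 0.05, 1e-6, 0.8};
        const double thresh = 1e-7;
        long kept = 0, skipped = 0;

        for (int ij = 0; ij < NPAIRS; ++ij)
            for (int kl = 0; kl <= ij; ++kl) {
                /* |(ij|kl)| <= q[ij]*q[kl], so small products cannot contribute */
                if (q[ij] * q[kl] < thresh) { ++skipped; continue; }
                ++kept;   /* here the real code would evaluate (ij|kl) and contract it
                             with density matrix elements into the Fock matrix */
            }
        printf("quartets kept: %ld, screened out: %ld\n", kept, skipped);
        return 0;
    }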

6.
In this paper, we present the implementation of efficient approximations to time-dependent density functional theory (TDDFT) within the Tamm-Dancoff approximation (TDA) for hybrid density functionals. For the calculation of the TDDFT/TDA excitation energies and analytical gradients, we combine the resolution of identity (RI-J) algorithm for the computation of the Coulomb terms and the recently introduced "chain of spheres exchange" (COSX) algorithm for the calculation of the exchange terms. It is shown that for extended basis sets, the RIJCOSX approximation leads to speedups of up to 2 orders of magnitude compared to traditional methods, as demonstrated for hydrocarbon chains. The accuracy of the adiabatic transition energies, excited state structures, and vibrational frequencies is assessed on a set of 27 excited states for 25 molecules with the configuration interaction singles and hybrid TDDFT/TDA methods using various basis sets. Compared to the canonical values, the typical error in transition energies is of the order of 0.01 eV. Similar to the ground-state results, excited state equilibrium geometries differ by less than 0.3 pm in the bond distances and 0.5° in the bond angles from the canonical values. The typical error in the calculated excited state normal coordinate displacements is of the order of 0.01, and relative error in the calculated excited state vibrational frequencies is less than 1%. The errors introduced by the RIJCOSX approximation are, thus, insignificant compared to the errors related to the approximate nature of the TDDFT methods and basis set truncation. For TDDFT/TDA energy and gradient calculations on Ag-TB2-helicate (156 atoms, 2732 basis functions), it is demonstrated that the COSX algorithm parallelizes almost perfectly (speedup ~26-29 for 30 processors). The exchange-correlation terms also parallelize well (speedup ~27-29 for 30 processors). The solution of the Z-vector equations shows a speedup of ~24 on 30 processors. The parallelization efficiency for the Coulomb terms can be somewhat smaller (speedup ~15-25 for 30 processors), but their contribution to the total calculation time is small. Thus, the parallel program completes a Becke3-Lee-Yang-Parr energy and gradient calculation on the Ag-TB2-helicate in less than 4 h on 30 processors. We also present the necessary extension of the Lagrangian formalism, which enables the calculation of the TDDFT excited state properties in the frozen-core approximation. The algorithms described in this work are implemented into the ORCA electronic structure system.  相似文献   

7.
The atomistic molecular dynamics program YASP has been parallelized for shared-memory computer architectures. Parallelization was restricted to the most CPU-time-consuming parts: neighbor-list construction, calculation of nonbonded, angle and dihedral forces, and constraints. Most of the sequential FORTRAN code was kept; parallel constructs were inserted as compiler directives using the OpenMP standard. Only in the case of the neighbor list did the data structure have to be changed. The parallel code achieves a useful speedup over the sequential version for systems of several thousand atoms and above. On an IBM Regatta p690+, the throughput increases with the number of processors up to a maximum of 12-16 processors depending on the characteristics of the simulated systems. On dual-processor Xeon systems, the speedup is about 1.7.  相似文献   
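A minimal sketch of this style of shared-memory parallelization is shown below: the sequential loop structure is retained and an OpenMP directive distributes the iterations over threads, with the potential energy accumulated through a reduction clause. The all-pairs loop and dummy pair potential merely stand in for YASP's neighbor-list-based nonbonded kernel.

    #include <stdio.h>
    #include <omp.h>

    #define NATOMS 4000

    int main(void) {
        static double x[NATOMS], f[NATOMS];
        for (int i = 0; i < NATOMS; ++i)
            x[i] = 0.01 * i;

        double epot = 0.0;
        /* atoms are distributed across threads; the scalar energy is combined with a
           reduction, and f[i] is written only by the thread that owns iteration i */
        #pragma omp parallel for reduction(+:epot) schedule(static)
        for (int i = 0; i < NATOMS; ++i) {
            double fi = 0.0;
            for (int j = 0; j < NATOMS; ++j) {   /* stand-in for a neighbor-list loop */
                if (j == i) continue;
                double r  = x[i] - x[j];
                double r2 = r * r + 1.0;         /* softened dummy pair potential */
                epot += 0.5 / r2;
                fi   += r / (r2 * r2);
            }
            f[i] = fi;
        }
        printf("threads used: %d, epot = %f, f[0] = %f\n",
               omp_get_max_threads(), epot, f[0]);
        return 0;
    }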

8.
A parallel algorithm for solving the coupled-perturbed MCSCF (CPMCSCF) equations and analytic nuclear second derivatives of CASSCF wave functions is presented. A parallel scheme for evaluating derivative integrals and their subsequent use in constructing other derivative quantities is described. The task of solving the CPMCSCF equations is approached using a parallelization scheme that partitions the electronic Hessian matrix over all processors, as opposed to simple partitioning of the 3N solution vectors among the processors. The scalability of the current algorithm, up to 128 processors, is demonstrated. Using three test cases, results indicate that the parallelization of derivative integral evaluation through a simple scheme is highly effective regardless of the size of the basis set employed in the CASSCF energy calculation. Parallelization of the construction of the MCSCF electronic Hessian during solution of the CPMCSCF equations varies quantitatively depending on the nature of the Hessian itself, but is highly scalable in all cases.

9.
Dynamics simulations of molecular systems are notoriously computationally intensive. Using parallel computers for these simulations is important for reducing their turnaround time. In this article we describe a parallelization of the simulation program CHARMM for the Intel iPSC/860, a distributed memory multiprocessor. In the parallelization, the computational work is partitioned among the processors for core calculations including the calculation of forces, the integration of equations of motion, the correction of atomic coordinates by constraint, and the generation and update of data structures used to compute nonbonded interactions. Processors coordinate their activity using synchronous communication to exchange data values. Key data structures used are partitioned among the processors in nearly equal pieces, reducing the memory requirement per node and making it possible to simulate larger molecular systems. We examine the effectiveness of the parallelization in the context of a case study of a realistic molecular system. While effective speedup was achieved for many of the dynamics calculations, other calculations fared less well due to growing communication costs for exchanging data among processors. The strategies we used are applicable to parallelization of similar molecular mechanics and dynamics programs for distributed memory multiprocessors. © 1992 by John Wiley & Sons, Inc.  相似文献   

10.
We present an outline of the parallel implementation of our pseudospectral electronic structure program, Jaguar, including the algorithm and timings for the Hartree–Fock and analytic gradient portions of the program. We also present the parallel algorithm and timings for our Lanczos eigenvector refinement code and demonstrate that its performance is superior to the ScaLAPACK diagonalization routines. The overall efficiency of our code increases as the size of the calculation is increased, demonstrating actual as well as theoretical scalability. For our largest test system, alanine pentapeptide [818 basis functions in the cc-pVTZ(-f) basis set], our Fock matrix assembly procedure has an efficiency of nearly 90% on a 16-processor SP2 partition. The SCF portion for this case (including eigenvector refinement) has an overall efficiency of 87% on a partition of 8 processors and 74% on a partition of 16 processors. Finally, our parallel gradient calculations have a parallel efficiency of 84% on 8 processors for porphine (430 basis functions). © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1017–1029, 1998  相似文献   

11.
Genetic Algorithm and Neural Network Study of CPSA and Hydrophobicity Parameters
"Net" atomic surface areas were calculated by molecular mechanics, and the charge-weighted partial surface areas (CPSA) of the compounds were computed by quantum chemical methods. When genetic algorithm and neural network methods were used to correlate the modified CPSA descriptors with the hydrophobicity parameters of organic alcohols, the modified CPSA proved effective for structure-activity relationship studies; the procedure is simple and easy to apply, and both multivariate statistical methods gave satisfactory results.

12.
Protein sequence database search based on tandem mass spectrometry is an essential method for protein identification. As the computational demand increases, parallel computing has become an important technique for accelerating proteomics data analysis. In this paper, we discuss several factors which could affect the runtime of the pFind search engine and build an estimation model. Based on this model, effective on‐line and off‐line scheduling methods were developed. An experiment on the public dataset from PhosphoPep consisting of 100 RAW files of phosphopeptides shows that the speedup on 100 processors is 83.7. The parallel version can complete the identification task within 9 min, while a stand‐alone process on a single PC takes more than 10 h. On another larger dataset consisting of 1 366 471 spectra, the speedup on 320 processors is 258.9 and the efficiency is 80.9%. Our approach can be applied to other similar search engines. Copyright © 2010 John Wiley & Sons, Ltd.  相似文献   
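For reference, the efficiency quoted here is simply speedup divided by processor count, E = S/P: 258.9/320 ≈ 0.809 reproduces the 80.9% figure, and the 100-processor run corresponds to E = 83.7/100 = 83.7%.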

13.
A linear-scaling scheme for estimating the electronic energy, gradients, and Hessian of a large molecule at ab initio level of theory based on fragment set cardinality is presented. With this proposition, a general, cardinality-guided molecular tailoring approach (CG-MTA) for ab initio geometry optimization of large molecules is implemented. The method employs energy gradients extracted from fragment wave functions, enabling computations otherwise impractical on PC hardware. Further, the method is readily amenable to large scale coarse-grain parallelization with minimal communication among nodes, resulting in a near-linear speedup. CG-MTA is applied for density-functional-theory-based geometry optimization of a variety of molecules including alpha-tocopherol, taxol, gamma-cyclodextrin, and two conformations of polyglycine. In the tests performed, energy and gradient estimates obtained from CG-MTA during optimization runs show an excellent agreement with those obtained from actual computation. Accuracy of the Hessian obtained employing CG-MTA provides good hope for the application of Hessian-based geometry optimization to large molecules.  相似文献   
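Schematically, tailoring approaches of this kind estimate the total energy from fragment energies by inclusion-exclusion over the fragment sets, E ≈ Σ_i E(F_i) − Σ_{i<j} E(F_i ∩ F_j) + Σ_{i<j<k} E(F_i ∩ F_j ∩ F_k) − ..., with the sign of each term determined by the cardinality of the corresponding intersection; gradients and Hessian elements are assembled from the analogous fragment quantities in the same fashion. This is the general MTA idea written out for orientation, not the specific CG-MTA working equations.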

14.
We describe the implementation of a parallel, in-core, integral-direct code for the efficient calculation of Hartree-Fock and density functional theory wave functions. The implementation is based on a parallel master-slave algorithm, and the two-electron integrals calculated by a slave are stored in its available local memory. To ensure the greatest computational savings, the master node keeps track of all integral batches stored on the different slaves. The code can reuse undifferentiated two-electron integrals both in the wave function optimization and in the evaluation of second-, third-, and fourth-order molecular properties. Superlinear scaling is achieved in a series of test examples, with speedups of up to 55 on 16 processors, relative to a single processor, for calculations on medium-sized molecules.

15.
Virtual screening of large libraries of small compounds requires fast and reliable automatic docking methods. In this article we present a parallel implementation of a genetic algorithm (GA) and the implementation of an enhanced genetic algorithm (EGA) with niching that lead to remarkable speedups compared to the original AutoDock 3.0. The niching concept is introduced naturally by sharing genetic information between evolutions of subpopulations that run independently, each on one CPU. A unique set of additionally introduced search parameters that control this information flow has been obtained for drug-like molecules, based on a detailed study of three test cases of different complexity. The average docking time for one compound is 8.6 s using eight R10000 processors running at 200 MHz on an Origin 2000 computer. Different genetic algorithms with and without local search (LS) have been compared on an equal-workload basis, showing EGA/LS to be superior to all alternatives because it finds lower-energy solutions faster and more often, particularly for high-dimensionality problems. © 2001 John Wiley & Sons, Inc. J Comput Chem 22: 1971–1982, 2001
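The subpopulation scheme lends itself to a compact island-model sketch, shown below: each MPI rank evolves its own subpopulation and, every few generations, exchanges its best individual with a neighbor in a ring, the immigrant replacing the worst local member. The toy "sphere" fitness, the crude mutate-and-select step, and all parameters are placeholders rather than AutoDock's scoring function or the EGA's actual search-parameter settings.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define POP 20
    #define DIM 8
    #define GENS 200
    #define MIGRATE_EVERY 25

    static double fitness(const double *g) {          /* stand-in for the docking energy */
        double s = 0.0;
        for (int d = 0; d < DIM; ++d) s += g[d] * g[d];
        return s;                                     /* lower is better */
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        srand(1234 + rank);

        double pop[POP][DIM];
        for (int i = 0; i < POP; ++i)
            for (int d = 0; d < DIM; ++d)
                pop[i][d] = 2.0 * rand() / RAND_MAX - 1.0;

        for (int gen = 1; gen <= GENS; ++gen) {
            /* crude stand-in for selection/crossover/mutation: perturb one member,
               keep the better of the two */
            int i = rand() % POP;
            double trial[DIM];
            for (int d = 0; d < DIM; ++d)
                trial[d] = pop[i][d] + 0.1 * (2.0 * rand() / RAND_MAX - 1.0);
            if (fitness(trial) < fitness(pop[i]))
                for (int d = 0; d < DIM; ++d) pop[i][d] = trial[d];

            if (gen % MIGRATE_EVERY == 0 && size > 1) {
                /* send the local best to the next rank in a ring, receive from the previous */
                int best = 0;
                for (int k = 1; k < POP; ++k)
                    if (fitness(pop[k]) < fitness(pop[best])) best = k;
                double incoming[DIM];
                int to = (rank + 1) % size, from = (rank - 1 + size) % size;
                MPI_Sendrecv(pop[best], DIM, MPI_DOUBLE, to, 0,
                             incoming, DIM, MPI_DOUBLE, from, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                int worst = 0;                        /* immigrant replaces the worst local member */
                for (int k = 1; k < POP; ++k)
                    if (fitness(pop[k]) > fitness(pop[worst])) worst = k;
                for (int d = 0; d < DIM; ++d) pop[worst][d] = incoming[d];
            }
        }
        int best = 0;
        for (int k = 1; k < POP; ++k)
            if (fitness(pop[k]) < fitness(pop[best])) best = k;
        printf("rank %d best fitness %.4f\n", rank, fitness(pop[best]));
        MPI_Finalize();
        return 0;
    }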

16.
The search for efficient and predictive methods to describe the protein folding process at the all-atom level remains an important grand computational challenge. The development of multi-teraflop architectures, such as the IBM BlueGene used in this study, has been motivated in part by the large computational requirements of such studies. Here we report the predictive all-atom folding of the forty-amino-acid HIV accessory protein using an evolutionary stochastic optimization technique. We implemented the optimization method as a master-client model on an IBM BlueGene, where the algorithm scales nearly perfectly from 64 to 4096 processors in virtual processor mode. Starting from a completely extended conformation, we optimize a population of 64 conformations of the protein in our all-atom free-energy model PFF01. Using 2048 processors, the algorithm predictively folds the protein to a near-native conformation with an RMS deviation of 3.43 Å in < 24 h.

17.
The spatial stochastic simulation of biochemical systems requires significant calculation efforts. Parallel discrete-event simulation is a promising approach to accelerate the execution of simulation runs. However, achievable speedup depends on the parallelism inherent in the model. One of our goals is to explore this degree of parallelism in the Next Subvolume Method type simulations. Therefore we introduce the Abstract Next Subvolume Method, in which we decouple the model representation from the sequential simulation algorithms, and prove that state trajectories generated by its executions statistically accord with those generated by the Next Subvolume Method. The experimental performance analysis shows that optimistic synchronization algorithms, together with careful controls over the speculative execution, are necessary to achieve considerable speedup and scalability in parallel spatial stochastic simulation of chemical reactions. Our proposed method facilitates a flexible incorporation of different synchronization algorithms, and can be used to select the proper synchronization algorithm to achieve the efficient parallel simulation of chemical reactions.  相似文献   

18.
Eigensolving (diagonalizing) small dense matrices threatens to become a bottleneck in the application of massively parallel computers to electronic structure methods. Because the computational cost of electronic structure methods typically scales as O(N³) or worse, even teraflop computer systems with thousands of processors will often confront problems with N ≲ 10,000. At present, diagonalizing an N×N matrix on P processors is not efficient when P is large compared to N. The loss of efficiency can make diagonalization a bottleneck on a massively parallel computer, even though it is typically a minor operation on conventional serial machines. This situation motivates a search both for improved methods and for identification of the computer characteristics that would be most productive to improve. In this paper, we compare the performance of several parallel and serial methods for solving dense real symmetric eigensystems on a distributed-memory message-passing parallel computer. We focus on matrices of size N = 200 and processor counts P = 1 to P = 512, with execution on the Intel Touchstone DELTA computer. The best eigensolver method is found to depend on the number of available processors. Of the methods tested, a recently developed Blocked Factored Jacobi (BFJ) method is the slowest for small P but the fastest for large P. Its speed is a complicated, non-monotonic function of the number of processors used. A detailed performance analysis of the BFJ method shows that (1) the factor most responsible for limited speedup is communication startup cost; (2) with current communication costs, the maximum achievable parallel speedup is modest (one order of magnitude) compared to the best serial method; and (3) the fastest solution is often achieved by using fewer than the maximum number of available processors. Pacific Northwest Laboratory is operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.

19.
The parallel implementation of a multireference configuration interaction program based on hole-particle symmetry is described. The parallelization platform is an Intel-architecture cluster consisting of 12 nodes, each equipped with two 2.4-GHz Xeon processors, 3 GB of memory, and a 36-GB disk, connected by a Gigabit Ethernet switch. The dependence of the speedup on molecular symmetry and task granularity is discussed. Test calculations show that the speedup gained when the number of nodes is doubled is about 1.9 (for C1 and Cs), 1.65 (for C2v), and 1.55 (for D2h). The largest calculation performed on this cluster involves 5.6 × 10⁸ CSFs.

20.
A feature variable selection method based on disjoint principal component analysis (Disjoint PCA) and a genetic algorithm (GA) was developed and used to identify differentially expressed genes from gene expression profile data. In this method, disjoint PCA evaluates the power of a gene subset to discriminate between two classes of samples; the GA searches for the gene subset with the strongest discriminating power; and the chance correlation of the identified genes is assessed statistically. Because the method accounts for cooperative effects among genes, it is closer to the underlying biological processes, and the identified genes therefore show better differential expression. Applied to microarray data from hepatocellular carcinoma (HCC) samples, the method identified genes with strong discriminating power, outperforming the commonly used significance analysis of microarrays (SAM) method.
