Similar documents
20 similar documents found (search time: 436 ms)
1.
This paper analyses the performance of several versions of a block-parallel algorithm for applying Neville elimination on a distributed-memory parallel computer. Neville elimination is a procedure that transforms a square matrix A into an upper triangular one. The analysis takes into account the algorithm's behaviour in terms of execution time, efficiency, and scalability. Special attention has been paid to the scalability of the algorithms, seeking to establish the relationship between the block size and the performance obtained in this metric. It is worth emphasizing the high efficiency achieved in the studied cases, and that the experimental results confirm the theoretical approximation; we have therefore obtained an analysis tool with high predictive ability. Finally, we present Neville elimination as an efficient tool for detecting point sources in cosmic microwave background maps.
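The serial building block of this paper can be illustrated with a short sketch. This is a minimal, unpivoted serial version for illustration only (the paper's contribution is the block-parallel distributed-memory variant); in Neville elimination, each entry below the diagonal is zeroed using the row immediately above it, rather than the pivot row as in Gaussian elimination:

```python
import numpy as np

def neville_elimination(A):
    """Transform a square matrix into upper triangular form by Neville
    elimination: zero column j from the bottom up, each row using the
    row directly above it.  Minimal sketch without row exchanges."""
    A = A.astype(float).copy()
    n = A.shape[0]
    for j in range(n - 1):
        for i in range(n - 1, j, -1):
            if A[i - 1, j] != 0:
                A[i, :] -= (A[i, j] / A[i - 1, j]) * A[i - 1, :]
            # a zero entry above a nonzero one would require a row
            # exchange, which this sketch omits
    return A
```

Because each step subtracts a multiple of one row from another, the determinant is preserved, which gives a quick sanity check on the result.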

2.
We present an outline of the parallel implementation of our pseudospectral electronic structure program, Jaguar, including the algorithm and timings for the Hartree–Fock and analytic gradient portions of the program. We also present the parallel algorithm and timings for our Lanczos eigenvector refinement code and demonstrate that its performance is superior to the ScaLAPACK diagonalization routines. The overall efficiency of our code increases as the size of the calculation is increased, demonstrating actual as well as theoretical scalability. For our largest test system, alanine pentapeptide [818 basis functions in the cc-pVTZ(-f) basis set], our Fock matrix assembly procedure has an efficiency of nearly 90% on a 16-processor SP2 partition. The SCF portion for this case (including eigenvector refinement) has an overall efficiency of 87% on a partition of 8 processors and 74% on a partition of 16 processors. Finally, our parallel gradient calculations have a parallel efficiency of 84% on 8 processors for porphine (430 basis functions). © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1017–1029, 1998

3.
We describe an efficient algorithm for carrying out a “divide-and-conquer” fit of a molecule's electronic density on massively parallel computers. Near linear speedups are achieved with up to 48 processors on a Cray T3E, and our results indicate that similar efficiencies could be attained on an even greater number of processors. To achieve optimum efficiency, the algorithm combines coarse and fine-grain parallelization and adapts itself to the existing ratio of processors to subsystems. The subsystems employed in our divide-and-conquer approach can also be made smaller or bigger, depending on the number of processors available. This allows us to further reduce the wallclock time and improve the method's overall efficiency. The strategies implemented in this paper can be extended to any other divide-and-conquer method used within an ab initio, density functional, or semi-empirical quantum mechanical program. Received: 15 September 1997 / Accepted: 21 January 1998

4.
A new algorithm for parallel calculation of the second derivatives (Hessian) of the conformational energy function of biomolecules in internal coordinates is proposed. The basic scheme of this algorithm is the division of the entire calculation of the Hessian matrix (called the "task") into subtasks, and the optimization of the assignment of processors to each subtask by considering both load balancing and reduction of the communication cost. A genetic algorithm is used for this optimization, taking into account the dependencies between subtasks. We applied this method to a glutaminyl transfer RNA (Gln-tRNA) molecule, for which the scalability of our previously developed parallel algorithm decreased significantly when a large number of processors was used. The speedup for the calculation was 32.6 times with 60 processors, which is considerably better than the speedup for our previously reported parallel algorithm. The elapsed time for the calculation of subtasks, data sending, and data receiving was analyzed, and the effect of the optimization using the genetic algorithm is discussed.

5.
Gaussian-94 is a series of electronic structure programs. It is an integrated system for modeling a broad range of molecular systems under a variety of conditions, performing its calculations from the basic laws of quantum chemistry. This new version includes methods and algorithms for scalable massively parallel systems such as the Cray T3E supercomputer. In this study, we discuss the performance of Gaussian using a large number of processors. In particular, we analyze the scalability of methods such as Hartree–Fock and density functional theory (DFT), including first and second derivatives. In addition, we explore scalability for CIS, MP2, and MCSCF calculations. Scalability and speedups were investigated for most of the examples with up to 64 processing elements. A single-point energy calculation (B3-LYP/6-311++G(3df,3p)) was tested with up to 512 processing elements. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1053–1063, 1998

6.
We present a method of parallelizing flat-histogram Monte Carlo simulations, which yield the free energy of a molecular system as output. In the serial version, a constant probability distribution, as a function of any system parameter, is obtained by updating an external potential that is added to the system Hamiltonian; this external potential is related to the free energy. In the parallel implementation, the simulation is distributed over different processors. At regular intervals the modifying potential is summed over all processors and distributed back to every processor, thus spreading the information about which parts of parameter space have been explored. This implementation is shown to decrease the execution time linearly with the number of processors added.
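The periodic sum-and-redistribute step described above can be sketched in a toy, single-process form. This is an illustrative model only: the "walkers" stand in for MPI ranks, the synchronization step plays the role of an Allreduce, and the one-dimensional bin index, the bias increment `delta`, and the interval `sync_every` are all hypothetical choices, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def parallel_flat_histogram(n_bins=10, n_walkers=4, n_steps=2000,
                            sync_every=100, delta=0.1):
    """Toy parallel flat-histogram run: each 'walker' (processor) accumulates
    local updates to the external bias potential; at regular intervals the
    updates are summed over all walkers and broadcast back, so every walker
    learns which bins the others have already explored."""
    bias = np.zeros(n_bins)                          # shared modifying potential
    local = [np.zeros(n_bins) for _ in range(n_walkers)]
    state = [rng.integers(n_bins) for _ in range(n_walkers)]
    for step in range(1, n_steps + 1):
        for w in range(n_walkers):
            trial = (state[w] + rng.choice([-1, 1])) % n_bins
            # Metropolis acceptance on the biased (flat-histogram) potential
            dV = (bias[trial] + local[w][trial]) \
               - (bias[state[w]] + local[w][state[w]])
            if dV <= 0 or rng.random() < np.exp(-dV):
                state[w] = trial
            local[w][state[w]] += delta              # penalize the visited bin
        if step % sync_every == 0:                   # the "communication" step
            bias += sum(local)
            local = [np.zeros(n_bins) for _ in range(n_walkers)]
    return bias
```

After a long run the accumulated bias is roughly uniform over the bins for this featureless toy system, mirroring how the real method flattens the histogram over the chosen parameter.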

7.
We describe two algorithms for the parallel calculation of a CHARMm-like force field in macromolecules. For a molecule with a given number of atoms, we show that there is an optimal number of processors leading to a minimum computation time. At the optimum, both the number of processors and the computation time are proportional to the number of atoms. © 1992 by John Wiley & Sons, Inc.

8.
We describe in detail our high-performance density matrix renormalization group (DMRG) algorithm for solving the electronic Schrödinger equation. We illustrate the linear scalability of our algorithm with calculations on up to 64 processors. The use of massively parallel machines in conjunction with our algorithm considerably extends the range of applicability of the DMRG in quantum chemistry.

9.
We present ONETEP (order-N electronic total energy package), a density functional program for parallel computers whose computational cost scales linearly with the number of atoms and the number of processors. ONETEP is based on our reformulation of the plane wave pseudopotential method which exploits the electronic localization that is inherent in systems with a nonvanishing band gap. We summarize the theoretical developments that enable the direct optimization of strictly localized quantities expressed in terms of a delocalized plane wave basis. These same localized quantities lead us to a physical way of dividing the computational effort among many processors to allow calculations to be performed efficiently on parallel supercomputers. We show with examples that ONETEP achieves excellent speedups with increasing numbers of processors and confirm that the time taken by ONETEP as a function of increasing number of atoms for a given number of processors is indeed linear. What distinguishes our approach is that the localization is achieved in a controlled and mathematically consistent manner so that ONETEP obtains the same accuracy as conventional cubic-scaling plane wave approaches and offers fast and stable convergence. We expect that calculations with ONETEP have the potential to provide quantitative theoretical predictions for problems involving thousands of atoms such as those often encountered in nanoscience and biophysics.

10.
Normal mode analysis plays an important role in relating the conformational dynamics of proteins to their biological function. The subspace iteration method is a numerical procedure for normal mode analysis that has enjoyed widespread success in the structural mechanics community due to its numerical stability and computational efficiency in calculating the lowest normal modes of large systems. Here, we apply the subspace iteration method to proteins to demonstrate its advantageous properties in this area of computational protein science. An effective algorithm for choosing the number of iteration vectors in the method is established, offering a considerable improvement over the original implementation. In the present application, computational time scales linearly with the number of normal modes computed. Additionally, the method lends itself naturally to normal mode analyses of multiple neighboring macromolecular conformations, as demonstrated in a conformational change pathway analysis of adenylate kinase. These properties, together with its computational robustness and intrinsic scalability to multiple processors, render the subspace iteration method an effective and reliable computational approach to protein normal mode analysis. © 2009 Wiley Periodicals, Inc. J Comput Chem 2010
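The core of the subspace iteration method can be sketched in a few lines of dense linear algebra. This is a minimal illustrative version assuming an identity mass matrix and a symmetric positive-definite stiffness matrix; production codes use sparse factorizations, shifting, and convergence tests rather than a fixed iteration count, and the choice of extra iteration vectors here is a common rule of thumb, not the paper's tuned algorithm:

```python
import numpy as np

def subspace_iteration(K, n_modes, n_vectors=None, n_iter=50):
    """Compute the lowest eigenpairs of a symmetric positive-definite
    stiffness matrix K by block inverse iteration with a Rayleigh-Ritz
    projection each sweep.  Extra iteration vectors (n_vectors > n_modes)
    accelerate convergence of the wanted modes."""
    n = K.shape[0]
    q = n_vectors or min(2 * n_modes, n_modes + 8)
    rng = np.random.default_rng(1)
    X = rng.standard_normal((n, q))
    for _ in range(n_iter):
        Y = np.linalg.solve(K, X)        # inverse iteration on the block
        Kr = Y.T @ K @ Y                 # projected stiffness
        Mr = Y.T @ Y                     # projected (identity) mass
        # small generalized eigenproblem Kr v = lam Mr v via Cholesky of Mr
        L = np.linalg.cholesky(Mr)
        Linv = np.linalg.inv(L)
        lam, V = np.linalg.eigh(Linv @ Kr @ Linv.T)
        X = Y @ (Linv.T @ V)             # orthonormal Ritz vectors, next sweep
    return lam[:n_modes], X[:, :n_modes]
```

The Ritz step re-orthonormalizes the block every sweep, which is what keeps the iteration numerically stable even though each column individually drifts toward the lowest mode.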

11.
Two algorithms are presented for parallel direct computation of energies with second-order perturbation theory. Closed-shell MP2 theory as well as the open-shell perturbation theories OPT2(2) and ZAPT2 have been implemented. The algorithms are designed for distributed memory parallel computers. The first algorithm exhibits an excellent load balance and scales well when relatively few processors are used, but a large communication overhead reduces the efficiency for larger numbers of processors. The other algorithm employs very little interprocessor communication and scales well for large systems. In both implementations the memory requirement has been reduced by allowing the two-electron integral transformation to be performed in multiple passes and by distributing the (partially) transformed integrals between processors. Results are presented for systems with up to 327 basis functions. © 1995 John Wiley & Sons, Inc.

12.
13.
Eigensolving (diagonalizing) small dense matrices threatens to become a bottleneck in the application of massively parallel computers to electronic structure methods. Because the computational cost of electronic structure methods typically scales as O(N^3) or worse, even teraflop computer systems with thousands of processors will often confront problems with N ≲ 10,000. At present, diagonalizing an N×N matrix on P processors is not efficient when P is large compared to N. The loss of efficiency can make diagonalization a bottleneck on a massively parallel computer, even though it is typically a minor operation on conventional serial machines. This situation motivates a search for both improved methods and identification of the computer characteristics that would be most productive to improve.

In this paper, we compare the performance of several parallel and serial methods for solving dense real symmetric eigensystems on a distributed-memory message-passing parallel computer. We focus on matrices of size N=200 and processor counts P=1 to P=512, with execution on the Intel Touchstone DELTA computer. The best eigensolver method is found to depend on the number of available processors. Of the methods tested, a recently developed Blocked Factored Jacobi (BFJ) method is the slowest for small P, but the fastest for large P. Its speed is a complicated non-monotonic function of the number of processors used. A detailed performance analysis of the BFJ method shows that: (1) the factor most responsible for limited speedup is communication startup cost; (2) with current communication costs, the maximum achievable parallel speedup is modest (one order of magnitude) compared to the best serial method; and (3) the fastest solution is often achieved by using fewer than the maximum number of available processors.

Pacific Northwest Laboratory is operated for the U.S. Department of Energy (DOE) by Battelle Memorial Institute under contract DE-AC06-76RLO 1830
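The BFJ method is built on Jacobi rotations, which zero one off-diagonal element at a time; its parallel advantage comes from blocking and factoring the rotations, not from the basic step itself. A serial, unblocked cyclic sweep, shown here purely for illustration, looks like this:

```python
import numpy as np

def jacobi_sweep(A):
    """One cyclic sweep of classical Jacobi rotations on a real symmetric
    matrix.  Each rotation zeroes A[p, q]; repeated sweeps drive A to
    diagonal form, with the eigenvalues on the diagonal."""
    A = A.copy()
    n = A.shape[0]
    for p in range(n - 1):
        for q in range(p + 1, n):
            if abs(A[p, q]) > 1e-15:
                # rotation angle that annihilates the (p, q) element:
                # tan(2*theta) = 2*A[p,q] / (A[q,q] - A[p,p])
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = c; J[q, q] = c
                J[p, q] = s; J[q, p] = -s
                A = J.T @ A @ J          # similarity transform preserves spectrum
    return A
```

Each rotation is cheap and rotations on disjoint index pairs commute, which is exactly the structure a parallel blocked variant exploits.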

14.
Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, in which each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such methods is that the interprocessor communication scales as N (the number of atoms) independent of P (the number of processors). Thus, their parallel efficiency falls off rapidly when large numbers of processors are used. In this article a new parallel method for simulating macromolecular or small-molecule systems is presented, called force-decomposition. Its memory and communication costs scale as N/√P, allowing larger problems to be run faster on greater numbers of processors. Like replicated-data techniques, and in contrast to spatial-decomposition approaches, the new method can be simply load balanced and performs well even for irregular simulation geometries. The implementation of the algorithm in a prototypical macromolecular simulation code ParBond is also discussed. On a 1024-processor Intel Paragon, ParBond runs a standard benchmark simulation of solvated myoglobin with a parallel efficiency of 61% and at 40 times the speed of a vectorized version of CHARMM running on a single Cray Y-MP processor. © 1996 by John Wiley & Sons, Inc.
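The N/√P scaling comes from tiling the N×N force matrix into √P × √P blocks, so each processor touches only two strips of atoms. A minimal sketch of that partitioning (assuming, for simplicity, that P is a perfect square dividing N evenly — the real scheme handles general counts):

```python
import math

def force_decomposition(n_atoms, n_procs):
    """Assign each processor one block F_rc of the N x N force matrix.
    Processor (r, c) needs the positions of only the atoms in row block r
    and column block c, so its storage and communication scale as
    N/sqrt(P), unlike replicated-data schemes where every processor
    stores all N positions."""
    s = math.isqrt(n_procs)
    assert s * s == n_procs and n_atoms % s == 0, \
        "sketch assumes P is a perfect square and divides N"
    block = n_atoms // s
    plan = {}
    for p in range(n_procs):
        r, c = divmod(p, s)
        rows = list(range(r * block, (r + 1) * block))
        cols = list(range(c * block, (c + 1) * block))
        plan[p] = (rows, cols)
    return plan
```

For N = 64 atoms on P = 16 processors, each processor holds at most 2·(64/4) = 32 positions instead of all 64, and every atom pair is covered by exactly one block.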

15.
16.
In this paper we present an efficient parallelization of the ONX algorithm for linear computation of the Hartree-Fock exchange matrix [J. Chem. Phys. 106, 9708 (1997)]. The method used is based on the equal time (ET) partitioning recently introduced [J. Chem. Phys. 118, 9128 (2003)] and [J. Chem. Phys. 121, 6608 (2004)]. ET exploits the slow variation of the density matrix between self-consistent-field iterations to achieve load balance. The method is presented and some benchmark calculations are discussed for gas phase and periodic systems with up to 128 processors. The current parallel ONX code is able to deliver up to 77% overall efficiency for a cluster of 50 water molecules on 128 processors (2.56 processors per heavy atom) and up to 87% for a box of 64 water molecules (two processors per heavy atom) with periodic boundary conditions.

17.
In this paper, we present a simple and efficient whole genome alignment method using maximal exact matches (MEMs). The major problem with using MEM anchors is that the number of hits in non-homologous regions increases exponentially when shorter MEM anchors are used to detect more homologous regions. To deal with this problem, we have developed a fast and accurate anchor filtering scheme based on simple match extension with minimum percent identity and extension length criteria. Due to its simplicity and accuracy, all MEM anchors in a pair of genomes can be exhaustively tested and filtered. In addition, by incorporating the translation technique, the alignment quality and speed of our genome alignment algorithm have been further improved. As a result, our genome alignment algorithm, GAME (Genome Alignment by Match Extension), performs competitively with existing algorithms and can align large whole genomes, e.g., A. thaliana, without the large memory and parallel processors typically required. This is shown using an experiment that compares the performance of BLAST, BLASTZ, PatternHunter, MUMmer and our algorithm in aligning all 45 pairs of 10 microbial genomes. The scalability of our algorithm is shown in another experiment where all pairs of five chromosomes in A. thaliana were compared.
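The match-extension filter described above can be sketched as follows. The thresholds (`min_identity`, `ext`) and the exact scoring of the flanks are illustrative assumptions, not the paper's tuned criteria; the idea is only that an anchor survives if its flanking regions still look homologous when extended:

```python
def keep_anchor(s1, s2, i, j, mem_len, min_identity=0.8, ext=30):
    """Filter a maximal-exact-match anchor s1[i:i+mem_len] == s2[j:j+mem_len]
    by simple match extension: count matches over up to `ext` positions on
    each side and keep the anchor only if the flank identity is high enough.
    Thresholds here are hypothetical, for illustration."""
    matches = total = 0
    for k in range(1, ext + 1):                  # extend to the left
        if i - k < 0 or j - k < 0:
            break
        total += 1
        matches += (s1[i - k] == s2[j - k])
    for k in range(ext):                         # extend to the right
        a, b = i + mem_len + k, j + mem_len + k
        if a >= len(s1) or b >= len(s2):
            break
        total += 1
        matches += (s1[a] == s2[b])
    if total == 0:
        return True                              # nothing to extend into
    return matches / total >= min_identity
```

An anchor sitting inside a genuinely homologous region extends with high identity and is kept; a spurious short match between unrelated regions extends into mismatches and is discarded.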

18.
Many systems of great importance in material science, chemistry, solid-state physics, and biophysics require forces generated from an electronic structure calculation, as opposed to an empirically derived force law to describe their properties adequately. The use of such forces as input to Newton's equations of motion forms the basis of the ab initio molecular dynamics method, which is able to treat the dynamics of chemical bond-breaking and -forming events. However, a very large number of electronic structure calculations must be performed to compute an ab initio molecular dynamics trajectory, making the efficiency as well as the accuracy of the electronic structure representation critical issues. One efficient and accurate electronic structure method is the generalized gradient approximation to the Kohn-Sham density functional theory implemented using a plane-wave basis set and atomic pseudopotentials. The marriage of the gradient-corrected density functional approach with molecular dynamics, as pioneered by Car and Parrinello (R. Car and M. Parrinello, Phys Rev Lett 1985, 55, 2471), has been demonstrated to be capable of elucidating the atomic scale structure and dynamics underlying many complex systems at finite temperature. However, despite the relative efficiency of this approach, it has not been possible to obtain parallel scaling of the technique beyond several hundred processors on moderately sized systems using standard approaches. Consequently, the time scales that can be accessed and the degree of phase space sampling are severely limited. To take advantage of next generation computer platforms with thousands of processors such as IBM's BlueGene, a novel scalable parallelization strategy for Car-Parrinello molecular dynamics is developed using the concept of processor virtualization as embodied by the Charm++ parallel programming system. 
Charm++ allows the diverse elements of a Car-Parrinello molecular dynamics calculation to be interleaved with low latency such that unprecedented scaling is achieved. As a benchmark, a system of 32 water molecules, a common system size employed in the study of the aqueous solvation and chemistry of small molecules, is shown to scale on more than 1500 processors, which is impossible to achieve using standard approaches. This degree of parallel scaling is expected to open new opportunities for scientific inquiry.

19.
The Effective Fragment Potential (EFP) method for solvation decreases the cost of a fully quantum mechanical calculation by dividing a chemical system into an ab initio region that contains the solute plus some number of solvent molecules, if desired, and an "effective fragment" region that contains the remaining solvent molecules. Interactions introduced with this fragment region (for example, Coulomb and polarization interactions) are added as one-electron terms to the total system Hamiltonian. As larger systems and dynamics are just starting to be studied with the EFP method, more needs to be done to decrease the calculation time of the method. This article considers parallelization of both the EFP fragment-fragment and mixed quantum mechanics (QM)-EFP interaction energy and gradient computation within the GAMESS suite of programs. The iteratively self-consistent polarization term is treated with a new algorithm that makes use of nonblocking communication to obtain better scalability. Results show that reasonable speedup is achieved with a variety of sizes of water clusters and number of processors.

20.
A parallel algorithm for solving the coupled-perturbed MCSCF (CPMCSCF) equations and analytic nuclear second derivatives of CASSCF wave functions is presented. A parallel scheme for evaluating derivative integrals and their subsequent use in constructing other derivative quantities is described. The task of solving the CPMCSCF equations is approached using a parallelization scheme that partitions the electronic Hessian matrix over all processors, as opposed to simply partitioning the 3N solution vectors among the processors. The scalability of the current algorithm, up to 128 processors, is demonstrated. Using three test cases, results indicate that the parallelization of derivative integral evaluation through a simple scheme is highly effective regardless of the size of the basis set employed in the CASSCF energy calculation. Parallelization of the construction of the MCSCF electronic Hessian during solution of the CPMCSCF equations varies quantitatively depending on the nature of the Hessian itself, but is highly scalable in all cases.


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号