
1.

Parallel Fock matrix construction with distributed shared memory model for the FMO‐MO method

Hiroaki Umeda Yuichi Inadomi Toshio Watanabe Toru Yagi Takayoshi Ishimoto Tsutomu Ikegami Hiroto Tadano Tetsuya Sakurai Umpei Nagashima 《Journal of computational chemistry》2010,31(13):2381-2388

A parallel Fock matrix construction program for the FMO‐MO method has been developed with the distributed shared memory model. To construct a large Fock matrix during FMO‐MO calculations, a distributed parallel algorithm was designed to make full use of local memory and so reduce communication, and was implemented on the Global Array toolkit. A benchmark calculation for a small system indicates that the parallelization efficiency of the matrix construction portion is as high as 93% at 1,024 processors. A large FMO‐MO application on the epidermal growth factor receptor (EGFR) protein (17,246 atoms and 96,234 basis functions) was also carried out at the HF/6‐31G level of theory, with the frontier orbitals being extracted by a Sakurai‐Sugiura eigensolver. On a PC cluster system using 256 processors, it takes 11.3 h for the FMO calculation, 49.1 h for the Fock matrix construction, and 10 min to extract 94 eigen‐components. © 2010 Wiley Periodicals, Inc. J Comput Chem, 2010
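As an illustration only, the idea of splitting a Fock matrix build across workers and then summing the partial results can be sketched as follows. This is not the Global Array implementation described in the abstract: the closed-shell formula F = H + sum over kl of D_kl [(ij|kl) - 0.5 (ik|jl)], the synthetic integrals, the round-robin split over the index k, and all function names are assumptions made for the sketch.

```python
import random

def make_symmetric(n, rng):
    """Random symmetric n x n matrix (stand-in for H or D)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m

def make_eri(n, rng, rank=3):
    """Synthetic integrals (ij|kl) = sum_p B_p[i][j] * B_p[k][l] (hypothetical data)."""
    factors = [make_symmetric(n, rng) for _ in range(rank)]
    def eri(i, j, k, l):
        return sum(b[i][j] * b[k][l] for b in factors)
    return eri

def fock_partial(n, eri, dens, k_values):
    """Two-electron contribution restricted to the k indices owned by one worker."""
    g = [[0.0] * n for _ in range(n)]
    for k in k_values:
        for l in range(n):
            d = dens[k][l]
            for i in range(n):
                for j in range(n):
                    # Coulomb term minus half the exchange term
                    g[i][j] += d * (eri(i, j, k, l) - 0.5 * eri(i, k, j, l))
    return g

def build_fock(n, hcore, eri, dens, n_workers):
    """Sum the partial matrices of all workers onto the one-electron part."""
    fock = [row[:] for row in hcore]
    for w in range(n_workers):
        part = fock_partial(n, eri, dens, range(w, n, n_workers))
        for i in range(n):
            for j in range(n):
                fock[i][j] += part[i][j]
    return fock

rng = random.Random(0)
n = 4
hcore = make_symmetric(n, rng)
dens = make_symmetric(n, rng)
eri = make_eri(n, rng)
serial = build_fock(n, hcore, eri, dens, n_workers=1)
split = build_fock(n, hcore, eri, dens, n_workers=3)
```

Here the workers run one after another in a single process; in a real distributed build each partial matrix would be computed on a separate process and accumulated into shared or distributed storage.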

2.

A novel parallel algorithm for large-scale Fock matrix construction with small locally distributed memory architectures: RT parallel algorithm

Takashima H Yamada S Obara S Kitamura K Inabata S Miyakawa N Tanabe K Nagashima U 《Journal of computational chemistry》2002,23(14):1337-1346

We developed a novel parallel algorithm for large-scale Fock matrix calculation with small locally distributed memory architectures, and named it the "RT parallel algorithm." The RT parallel algorithm incorporates integral screening, which is indispensable for reducing computing times with large-scale biological molecules. The primary characteristic of this algorithm is parallel efficiency, which is achieved by a well-balanced reduction of both communication and computation volume. Only the density matrix data necessary for Fock matrix calculations are communicated, and data, once communicated, are reused for calculations as many times as possible. The RT parallel algorithm is a scalable method because the required memory volume does not depend on the number of basis functions. This algorithm automatically includes a partial summing technique that is indispensable for maintaining computing accuracy, and can also include some conventional methods to reduce calculation times. In our analysis, the RT parallel algorithm had better performance than other methods for massively parallel processors. The RT parallel algorithm is most suitable for massively parallel and distributed Fock matrix calculations for large-scale biological molecules with thousands of basis functions or more.
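The abstract does not say which screening criterion is used. As a minimal sketch, assuming the common Schwarz bound |(ij|kl)| <= sqrt((ij|ij)) * sqrt((kl|kl)), integral screening in a Coulomb-type contraction could look like the following. The synthetic integrals, the threshold, and the function names are hypothetical and chosen only for illustration.

```python
import math
import random

def make_factors(n, rank, rng):
    """Pair factors B_p[i][j], symmetric and decaying with |i - j| to mimic locality."""
    factors = []
    for _ in range(rank):
        b = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                b[i][j] = b[j][i] = rng.uniform(-1.0, 1.0) * math.exp(-2.0 * abs(i - j))
        factors.append(b)
    return factors

def eri(factors, i, j, k, l):
    """Synthetic (ij|kl) = sum_p B_p[i][j] * B_p[k][l]; it satisfies the Schwarz bound."""
    return sum(b[i][j] * b[k][l] for b in factors)

def coulomb(n, factors, dens, threshold):
    """J_ij = sum_kl D_kl (ij|kl), skipping terms whose Schwarz bound is below threshold."""
    q = [[math.sqrt(eri(factors, i, j, i, j)) for j in range(n)] for i in range(n)]
    dmax = max(abs(x) for row in dens for x in row)
    jmat = [[0.0] * n for _ in range(n)]
    skipped = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    if q[i][j] * q[k][l] * dmax < threshold:
                        skipped += 1  # contribution is provably below threshold
                        continue
                    jmat[i][j] += dens[k][l] * eri(factors, i, j, k, l)
    return jmat, skipped

rng = random.Random(1)
n = 6
threshold = 1e-6
factors = make_factors(n, 3, rng)
dens = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
exact, _ = coulomb(n, factors, dens, threshold=0.0)
screened, skipped = coulomb(n, factors, dens, threshold=threshold)
max_err = max(abs(exact[i][j] - screened[i][j]) for i in range(n) for j in range(n))
```

Each skipped term is smaller than the threshold in magnitude, so the error in any matrix element is bounded by n * n * threshold.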

3.

Application of MDGRAPE-3, a special purpose board for molecular dynamics simulations, to periodic biomolecular systems

Kikugawa G Apostolov R Kamiya N Taiji M Himeno R Nakamura H Yonezawa Y 《Journal of computational chemistry》2009,30(1):110-118

We describe the application of a special purpose board for molecular dynamics simulations, named MDGRAPE-3, to the problem of simulating periodic biomolecular systems. MDGRAPE-3 is the latest board in a series of hardware accelerators designed to calculate the nonbonding long-range interactions much more rapidly than normal processors. So far, MDGRAPEs have mainly been applied to isolated systems, where a very large number of nonbonded interactions are calculated without any distance cutoff. However, in order to regulate the density and pressure during simulations of membrane-embedded protein systems, one has to evaluate interactions under periodic boundary conditions. For this purpose, we implemented the Particle-Mesh Ewald (PME) method, and its approximation with distance cutoffs and charge neutrality as proposed by Wolf et al., using MDGRAPE-3. When the two methods were applied to simulations of two periodic biomolecular systems, a single MDGRAPE-3 ran 30-40 times faster than a single conventional processor in both cases. Both methods are shown to yield the same molecular structures and dynamics of the systems.
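For orientation, the cutoff-based approximation attributed to Wolf et al. can be sketched as a damped, shifted pairwise sum with a self-energy correction. The sketch below makes several assumptions not taken from the abstract: an isolated cluster with no periodic images, units in which the Coulomb constant is 1, and illustrative values for the damping parameter alpha and the cutoff r_cut. The function name wolf_energy is hypothetical.

```python
import math

def wolf_energy(positions, charges, alpha, r_cut):
    """Damped, shifted Coulomb sum with cutoff, plus the self-energy correction."""
    shift = math.erfc(alpha * r_cut) / r_cut  # makes the pair term vanish at r_cut
    pair = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < r_cut:
                pair += charges[i] * charges[j] * (math.erfc(alpha * r) / r - shift)
    # Self term: removes each charge's interaction with its own neutralizing charge
    self_term = (shift / 2.0 + alpha / math.sqrt(math.pi)) * sum(q * q for q in charges)
    return pair - self_term

# Two opposite unit charges one length unit apart; the bare Coulomb energy is -1.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
charges = [1.0, -1.0]
energy = wolf_energy(positions, charges, alpha=0.2, r_cut=10.0)
```

For this neutral pair the damped sum stays close to the bare Coulomb value, which is the behavior the charge-neutrality argument relies on. A periodic simulation would additionally apply the minimum-image convention when computing distances.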

4.

Development of hardware accelerator for molecular dynamics simulations: a computation board that calculates nonbonded interactions in cooperation with fast multipole method

Amisaki T Toyoda S Miyagawa H Kitamura K 《Journal of computational chemistry》2003,24(5):582-592

Evaluation of long-range Coulombic interactions still represents a bottleneck in the molecular dynamics (MD) simulations of biological macromolecules. Despite the advent of sophisticated fast algorithms, such as the fast multipole method (FMM), accurate simulations still demand a great amount of computation time due to the accuracy/speed trade-off inherently involved in these algorithms. Unless higher order multipole expansions, which are extremely expensive to evaluate, are employed, a large amount of the execution time is still spent in directly calculating particle-particle interactions within the nearby region of each particle. To reduce this execution time for pair interactions, we developed a computation unit (board), called MD-Engine II, that calculates nonbonded pairwise interactions using specially designed hardware. Four custom arithmetic processors and a processor for memory manipulation ("particle processor") are mounted on the computation board. The arithmetic processors are responsible for calculation of the pair interactions. The particle processor plays a central role in realizing efficient cooperation with the FMM. The results of a series of 50-ps MD simulations of a protein-water system (50,764 atoms) indicated that a more stringent setting of accuracy in FMM computation, compared with those previously reported, was required for accurate simulations over long time periods. Such a level of accuracy was efficiently achieved using the cooperative calculations of the FMM and MD-Engine II. On an Alpha 21264 PC, the FMM computation at a moderate but tolerable level of accuracy was accelerated by a factor of 16.0 using three boards. At a high level of accuracy, the cooperative calculation achieved a 22.7-fold acceleration over the corresponding conventional FMM calculation. In the cooperative calculations of the FMM and MD-Engine II, it was possible to achieve more accurate computation at a comparable execution time by incorporating larger nearby regions.
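The division of labor described above, direct pair sums for the nearby region and multipole expansions for everything else, can be sketched by sorting particle pairs into near and far sets with a cell grid. This is an illustration under assumptions: the cell size, the "same or adjacent cell" definition of the nearby region, and the function names are hypothetical, and the far part is evaluated directly here rather than by multipole expansion so that the split can be checked.

```python
import math
import random

def cell_of(position, cell_size):
    """Integer cell index of a particle on a uniform grid."""
    return tuple(int(c // cell_size) for c in position)

def split_energy(positions, charges, cell_size):
    """Return (near, far) Coulomb energies; near = pairs in the same or adjacent cells."""
    cells = [cell_of(p, cell_size) for p in positions]
    near = far = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            e = charges[i] * charges[j] / math.dist(positions[i], positions[j])
            if all(abs(a - b) <= 1 for a, b in zip(cells[i], cells[j])):
                near += e  # the part a pairwise accelerator board would handle
            else:
                far += e   # the part an FMM would approximate with multipoles
    return near, far

rng = random.Random(2)
positions = [tuple(rng.uniform(0.0, 4.0) for _ in range(3)) for _ in range(30)]
charges = [rng.choice((-1.0, 1.0)) for _ in positions]
near, far = split_energy(positions, charges, cell_size=1.0)
total = sum(
    charges[i] * charges[j] / math.dist(positions[i], positions[j])
    for i in range(len(positions)) for j in range(i + 1, len(positions))
)
```

Enlarging the nearby region (for example, by increasing cell_size) moves more pairs into the exact direct sum, which is the trade-off the abstract refers to when it mentions incorporating larger nearby regions.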