1.
Brownian dynamics simulations on CPU and GPU with BD_BOX
There has been growing interest in simulating biological processes under in vivo conditions due to recent advances in experimental techniques dedicated to studying single-particle behavior in crowded environments. We have developed a software package, BD_BOX, for multiscale Brownian dynamics simulations. BD_BOX can simulate either single molecules or multicomponent systems of diverse, interacting molecular species using flexible, coarse-grained bead models. BD_BOX is written in C and employs modern computer architectures and technologies; these include MPI for distributed-memory architectures, OpenMP for shared-memory platforms, the NVIDIA CUDA framework for GPGPU, and SSE vectorization for the CPU.
2.
A large-eddy simulation methodology for high-performance parallel computation of statistically fully inhomogeneous turbulent flows on structured grids is presented. Strategies and algorithms to improve the memory efficiency as well as the parallel performance of the subgrid-scale model, the factored scheme, and the Poisson solver on shared-memory parallel platforms are proposed and evaluated. A novel combination of one-dimensional red-black/line Gauss-Seidel and two-dimensional red-black/line Gauss-Seidel methods is shown to provide high efficiency and performance for multigrid relaxation of the Poisson equation. Parallel speedups are measured on various shared/distributed-memory systems. Validations of the code are performed in large-eddy simulations of turbulent flows through a straight channel and a square duct. Results obtained from the present solver employing a Lagrangian dynamic subgrid-scale model show good agreement with other available data. The capability of the code for more complex flows is assessed by performing a large-eddy simulation of the tip-leakage flow in a linear cascade. Copyright © 2006 John Wiley & Sons, Ltd.
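As an illustration of the relaxation kernel this abstract refers to, the following is a minimal C/OpenMP sketch of one point-wise red-black Gauss-Seidel sweep for a 2-D Poisson equation; the grid size, array names, and uniform spacing are assumptions, and the paper's line-relaxation and multigrid layers are omitted.

```c
#define NX 256
#define NY 256

/* One red-black Gauss-Seidel sweep for the 2-D Poisson equation
 * laplacian(p) = f on a uniform grid with spacing h.  Points of the
 * same colour are independent, so each colour is updated in a fully
 * parallel OpenMP loop. */
static void rb_gs_sweep(double p[NX][NY], const double f[NX][NY], double h)
{
    for (int colour = 0; colour < 2; ++colour) {
        #pragma omp parallel for
        for (int i = 1; i < NX - 1; ++i) {
            /* pick the first interior j with (i + j) % 2 == colour */
            int jstart = 1 + ((i + 1 + colour) % 2);
            for (int j = jstart; j < NY - 1; j += 2) {
                p[i][j] = 0.25 * (p[i - 1][j] + p[i + 1][j]
                                + p[i][j - 1] + p[i][j + 1]
                                - h * h * f[i][j]);
            }
        }
    }
}
```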
3.
Molecular dynamics simulation in a multi-core environment
In this work, a classical molecular dynamics simulation program is parallelized with OpenMP in a multi-core environment. Two main optimizations of the simulation are also applied: molecule sorting and the use of SIMD instructions. A 4.13x performance improvement is obtained on 4 cores, raising the scale of classical molecular dynamics simulation to 4000 molecules × 10^7 total simulation steps.
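For illustration only, a minimal C sketch of the kind of OpenMP plus SIMD force loop described above might look as follows; the Lennard-Jones form, the O(N²) neighbour loop, and all identifiers are assumptions rather than the paper's actual code.

```c
/* Sketch of a Lennard-Jones force loop: the outer particle loop is
 * threaded with OpenMP and the inner loop is marked for SIMD
 * vectorization.  Units are reduced (epsilon = sigma = 1). */
void compute_forces(int n, const double *x, const double *y, const double *z,
                    double *fx, double *fy, double *fz)
{
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        double fxi = 0.0, fyi = 0.0, fzi = 0.0;
        #pragma omp simd reduction(+:fxi,fyi,fzi)
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;
            double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
            double r2 = dx * dx + dy * dy + dz * dz;
            double inv2 = 1.0 / r2, inv6 = inv2 * inv2 * inv2;
            double s = 24.0 * inv2 * inv6 * (2.0 * inv6 - 1.0);
            fxi += s * dx; fyi += s * dy; fzi += s * dz;
        }
        fx[i] = fxi; fy[i] = fyi; fz[i] = fzi;
    }
}
```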
4.
Janus J. Eriksen, Molecular Physics, 2017, 115(17-18): 2086-2101
It is demonstrated how the non-proprietary OpenACC standard of compiler directives may be used to compactly and efficiently accelerate the rate-determining steps of two of the most routinely applied many-body methods of electronic structure theory, namely the second-order Møller-Plesset (MP2) model in its resolution-of-the-identity approximated form and the (T) triples correction to the coupled cluster singles and doubles model (CCSD(T)). By means of compute directives as well as optimised device math libraries, the operations involved in the energy kernels have been ported to graphics processing unit (GPU) accelerators, and the associated data transfers correspondingly optimised to such a degree that the final implementations (using double- and/or single-precision arithmetic) are capable of scaling to systems as large as the capacity of the host central processing unit (CPU) main memory allows. The performance of the hybrid CPU/GPU implementations is assessed through calculations on test systems of alanine amino acid chains using one-electron basis sets of increasing size (ranging from double- to pentuple-ζ quality). For all but the smallest problem sizes of the present study, the optimised accelerated codes (using a single multi-core CPU host node in conjunction with six GPUs) are found to be capable of reducing the total time-to-solution by at least an order of magnitude over optimised, OpenMP-threaded CPU-only reference implementations.
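As a hedged illustration of the OpenACC compute directives mentioned above, the sketch below offloads a generic contraction-style loop to a GPU; the actual MP2 and (T) kernels additionally rely on optimised device math libraries, and all names and bounds here are assumptions.

```c
/* Offload a dense contraction-like loop nest with OpenACC: inputs are
 * copied to the device once, the result is copied back once, and the
 * two outer loops are collapsed into a single parallel kernel. */
void contract(int n, const double *restrict a,
              const double *restrict b, double *restrict c)
{
    #pragma acc data copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
    {
        #pragma acc parallel loop collapse(2)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                double sum = 0.0;
                #pragma acc loop seq
                for (int k = 0; k < n; ++k)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
    }
}
```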
5.
The main computing phases of numerical methods for solving partial differential equations are the algebraic system assembly and the iterative solver. This work focuses on the first task, in the context of a hybrid MPI+X paradigm. The matrix assembly consists of a loop over the elements, faces, edges or nodes of the MPI partitions to compute element matrices and vectors, followed by their assembly. In an MPI+X hybrid parallelism context, X has traditionally consisted of loop parallelism using OpenMP, with different techniques to avoid the race condition, all of which present efficiency or implementation drawbacks. We propose an alternative based on task parallelism using some extensions to the OpenMP programming model. In addition, dynamic load balancing is applied, which is especially efficient in the presence of hybrid meshes. This paper presents the proposed methodology, its implementation, and its validation through the solution of large computational mechanics problems on up to 16k cores.
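A minimal C sketch of task-based element assembly in OpenMP is given below; it is not the paper's implementation, and it avoids the race condition with atomic scatter-adds rather than with the OpenMP extensions the authors propose. The element kernel and connectivity layout are placeholder assumptions.

```c
#define NNODE_PER_ELEM 4

/* Each element contribution becomes an OpenMP task; the scatter into
 * the shared right-hand side uses atomic updates so that concurrent
 * tasks touching the same node do not race. */
void assemble(int nelem, const int (*conn)[NNODE_PER_ELEM], double *rhs,
              void (*element_kernel)(int, double[NNODE_PER_ELEM]))
{
    #pragma omp parallel
    #pragma omp single
    {
        for (int e = 0; e < nelem; ++e) {
            #pragma omp task firstprivate(e)
            {
                double elem_rhs[NNODE_PER_ELEM];
                element_kernel(e, elem_rhs);          /* element contribution */
                for (int a = 0; a < NNODE_PER_ELEM; ++a) {
                    #pragma omp atomic
                    rhs[conn[e][a]] += elem_rhs[a];   /* race-free scatter-add */
                }
            }
        }
    }
}
```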
6.
An implementation of the finite volume method is presented for the simulation of three-dimensional flows in complex geometries, using block-structured body-fitted grids and an improved linear interpolation scheme. The interfaces between blocks are treated in a fully implicit manner through modified linear solvers. The cells across block interfaces can match one-to-one or many-to-one. In addition, the use of sliding block interfaces allows the incorporation of moving rigid bodies inside the flow domain. An algebraic multigrid solver has been developed that works with this block-structured approach, speeding up the iterations for the pressure. The flow solver is parallelized by domain decomposition using OpenMP, based on the same grid block structure. Application examples are presented that demonstrate these capabilities. This numerical model has been made freely available by the authors. G. Usera was supported by FPI/DPI2003-06725-C02-01 from DGI, Ministerio de Educación y Cultura y Fondos FEDER, Spain, grants 33/07 PDT, I+D CSIC, Uruguay.
7.
The Variable Neighborhood Search (VNS) is a recent metaheuristic that combines a series of random and improving local searches based on systematically changed neighborhoods. When a local minimum is reached, a shake procedure performs a random search, which determines a new starting point for running an improving search. The use of interchange moves provides a simple implementation of the VNS algorithm for the p-Median Problem. Several strategies for the parallelization of the VNS are considered and coded in C using OpenMP. They are compared on a shared-memory machine using large instances.
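For illustration, one possible OpenMP parallelization of the interchange-move local search is sketched below in C; the incremental cost evaluation is left as a placeholder and all identifiers are assumptions, not the authors' code.

```c
#include <float.h>

/* Evaluate all interchange moves (swap a current median `out` for a
 * candidate `in`) in parallel and return the cost of the best move.
 * cost_after_swap() is a placeholder for the incremental evaluation. */
double best_interchange(int p, int n, const int *medians, const int *candidates,
                        double (*cost_after_swap)(int out, int in),
                        int *best_out, int *best_in)
{
    double best = DBL_MAX;
    #pragma omp parallel
    {
        double loc_best = DBL_MAX;
        int loc_out = -1, loc_in = -1;
        #pragma omp for collapse(2) nowait
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < n; ++j) {
                double c = cost_after_swap(medians[i], candidates[j]);
                if (c < loc_best) { loc_best = c; loc_out = medians[i]; loc_in = candidates[j]; }
            }
        #pragma omp critical
        if (loc_best < best) { best = loc_best; *best_out = loc_out; *best_in = loc_in; }
    }
    return best;
}
```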
8.
FFLUX is a novel force field based on quantum topological atoms, combining multipolar electrostatics with IQA intra-atomic and interatomic energy terms. The program FEREBUS calculates the hyperparameters of models produced by the machine learning method kriging. Calculation of the kriging hyperparameters (θ and p) requires the optimization of the concentrated log-likelihood. FEREBUS uses Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms to find its maximum. PSO and DE are two heuristic algorithms that each use a set of particles or vectors to explore the space in which the log-likelihood is defined, searching for the maximum. The log-likelihood is a computationally expensive function, which needs to be calculated several times during each optimization iteration. The cost scales quickly with the problem dimension, and speed becomes critical in model generation. We present the strategy used to parallelize FEREBUS and the optimization of the log-likelihood through PSO and DE. The code is parallelized in two ways. MPI parallelization distributes the particles or vectors among the different processes, whereas the OpenMP implementation takes care of the calculation of the log-likelihood, which involves the calculation and inversion of a particular matrix, whose size increases quickly with the dimension of the problem. The run time shows a speed-up of 61 times going from a single core to 90 cores, with a saving, in one case, of ~98% of the single-core time. In fact, the parallelization scheme presented reduces the computational time from 2871 s for a single-core calculation to 41 s for a 90-core calculation. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
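As a hedged sketch of the OpenMP-threaded portion described above, the following C routine fills a kriging correlation matrix from the hyperparameters θ and p; the matrix factorization/inversion step and the MPI distribution of particles are omitted, and the identifiers are assumptions rather than FEREBUS source.

```c
#include <math.h>

/* Build the kriging correlation matrix R, with
 * R(i,j) = exp(-sum_k theta_k * |x_ik - x_jk|^p_k), using one OpenMP
 * thread team; the upper and lower triangles are filled symmetrically. */
void build_correlation(int n, int dim, const double *x,      /* n x dim training points */
                       const double *theta, const double *p, /* hyperparameters */
                       double *R)                            /* n x n output */
{
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        R[i * n + i] = 1.0;
        for (int j = i + 1; j < n; ++j) {
            double s = 0.0;
            for (int k = 0; k < dim; ++k)
                s += theta[k] * pow(fabs(x[i * dim + k] - x[j * dim + k]), p[k]);
            R[i * n + j] = R[j * n + i] = exp(-s);
        }
    }
}
```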
9.
An efficient accelerated implementation of time-dependent density functional theory within the tight-binding approximation is presented for multi-core and GPU systems and applied to excited-state electronic structure calculations of systems containing hundreds to thousands of atoms. The program uses sparse matrices and OpenMP parallelization to accelerate the construction of the Hamiltonian matrix, while the most time-consuming part, the ground-state diagonalization, is accelerated on the GPU in double precision. The GPU acceleration of the ground state reaches a speedup of 8.73x while preserving numerical accuracy. For the excited states, a Krylov-subspace iterative algorithm combined with OpenMP parallelization and GPU acceleration is used to solve the large TDDFT matrix for its eigenvalues and eigenvectors, greatly reducing the number of iterations and the final solution time. With the matrix-vector products accelerated on the GPU, the Krylov algorithm converges quickly, yielding a 206x speedup over a program using the conventional algorithm with CPU parallelization. Calculations on a series of small and large molecular systems show that, compared with first-principles CIS and time-dependent density functional methods, the program obtains reasonable and accurate results at a fraction of the computational cost.
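For illustration, the core operation of the Krylov iteration mentioned above, a sparse matrix-vector product in CSR format, is sketched below with OpenMP threading; in the program itself this product is offloaded to the GPU, and the array names follow the usual CSR convention rather than the actual code.

```c
/* y = A * x for a sparse matrix A stored in compressed sparse row (CSR)
 * format; each row is independent, so the row loop is threaded. */
void spmv_csr(int nrows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < nrows; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```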
10.
Recent developments in high-performance computer architecture have a significant effect on all fields of scientific computing. Linear algebra, and especially the solution of linear systems of equations, lies at the heart of many applications in scientific computing. This paper describes and analyzes three parallel versions of dense direct methods, namely Gaussian elimination and its LU-factorization form, used for linear system solving on a multicore processor with an OpenMP interface. More specifically, we present two naive parallel algorithms based on row-block and row-cyclic data distribution, and we put special emphasis on a third parallel algorithm based on the pipeline technique. Further, we propose an implementation of the pipelining technique in OpenMP. Experimental results on a multicore CPU show that the proposed OpenMP pipeline implementation achieves good overall performance compared to the other two naive parallel methods. Finally, we propose a simple, fast, and reasonably accurate analytical model to predict the performance of the direct methods with the pipelining technique.
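As an illustrative sketch (not the authors' code), the following C routine shows the straightforward parallel Gaussian elimination that the naive versions are based on; the row-cyclic distribution is obtained with a static,1 schedule, and the pipelined version, which additionally overlaps consecutive pivot steps, is not reproduced here.

```c
/* Forward elimination of a dense n x n system A*x = b (row-major A,
 * no pivoting).  At every pivot step k the update of the remaining
 * rows is shared among OpenMP threads. */
void gaussian_elimination(int n, double *a, double *b)
{
    for (int k = 0; k < n - 1; ++k) {
        #pragma omp parallel for schedule(static, 1)
        for (int i = k + 1; i < n; ++i) {
            double m = a[i * n + k] / a[k * n + k];
            for (int j = k; j < n; ++j)
                a[i * n + j] -= m * a[k * n + j];
            b[i] -= m * b[k];
        }
    }
}
```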