Similar literature
1.
This work presents a parallel numerical strategy to transport Lagrangian particles in a fluid using dynamic load balancing. Both fluid and particle solvers are parallel, with two levels of parallelism. The first level is based on a substructuring technique and uses the message passing interface (MPI) as the communication library; the second level consists of OpenMP pragmas for loop parallelisation at the node level. When dealing with transient flows, there are two main alternatives for coupling these solvers. On the one hand, a single-code approach consists in solving the particle equations once the fluid solution has been obtained at the end of a time step, using the same instance of the same code. On the other hand, a multi-code approach enables one to overlap the transport of the particles with the next time-step solution of the fluid equations, and thus obtain asynchronism. In this case, different codes or two instances of the same code can be used. Both approaches will be presented. In addition, a dynamic load balancing library is used on top of the OpenMP pragmas in order to continuously exploit all the resources available at the node level, thus improving the load balance and the efficiency of the parallelisation.

2.
An efficient parallel spectral method for direct numerical simulations of transitional and turbulent flows is described in this paper. The parallelization is classically based on a bidimensional domain decomposition, but has been specifically developed for a solenoidal Fourier–Chebyshev spectral approximation where, in one Fourier direction, the number of modes is very large compared with the two other directions. The approach therefore differs from classical libraries developed for cubic Fourier boxes. The strategy uses the message-passing interface (MPI) for message passing among nodes and is fairly portable. One of the original features of this paper is the use of efficient hybrid programming with MPI for inter-node communication and coarse-grain parallelism using OpenMP for shared-memory computation across cores, instead of the classical hybrid programming with MPI and fine-grain parallelism at the loop level with OpenMP directives. This hybrid parallelism has been tested on the recent generation of high-performance parallel supercomputers involving a few tens of cores per node. Performance is evaluated on different massively parallel platforms with low-frequency and high-frequency processors. We demonstrate that spectral methods, which are known to be inherently ill-fitted for the new generation of high-performance distributed-memory computers, can be implemented efficiently using this hybrid programming with good scalability and a very fast wall-clock time per iteration. New numerical experiments are therefore now accessible on petascale computers, while keeping the attractive features of spectral methods such as accuracy, exponential convergence, computational efficiency and conservative properties. This is illustrated by a direct numerical simulation of the transition of the boundary layers developing from the entrance section of a plane channel and interacting to merge into a fully turbulent flow. Copyright © 2012 John Wiley & Sons, Ltd.

3.
The paper focuses on the development of a numerical code for the computation of basins of attraction using parallel programming. Two different approaches based on the message passing interface (MPI) standard are presented; the performance analysis encourages the use of massive communication between nodes only on architectures with few cores. The critical issues arising from the study of a generic dynamical system are discussed, while the computation of basins is performed on a benchmark system described by Duffing's equation. We paid attention to optimizing the computing time as well as the work load on each node in order to develop an efficient and portable code. For the presented codes, both the scalability of an implementation on a professional cluster and the capabilities of the parallelism in computing basins with a large set of initial conditions have been tested.

4.
Nowadays, high performance computing (HPC) systems are experiencing a disruptive moment, with a variety of novel architectures and frameworks and no clarity as to which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The proposed strategy consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix–vector product (SpMV), linear combination of vectors, and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective is to understand the challenges of implementing CFD codes on new architectures.
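To make the "three basic algebraic operations" idea concrete, the following is a minimal sketch, not the paper's code: a conjugate-gradient solve written so that every step is a sparse matrix–vector product, a linear combination of vectors, or a dot product. The CSR storage layout, the function names and the small 1D Poisson test matrix are all assumptions chosen for illustration.

```python
# Illustrative sketch only: every step of the solver below is one of three
# kernels (SpMV, linear combination, dot product). Names are hypothetical.

def spmv(indptr, indices, data, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR format."""
    return [
        sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        for i in range(len(indptr) - 1)
    ]

def axpy(a, x, y):
    """Linear combination a*x + y."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def dot(x, y):
    """Dot product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(indptr, indices, data, b, tol=1e-12, max_iter=100):
    """Solve A x = b for a symmetric positive definite A using the three kernels."""
    x = [0.0] * len(b)
    r = list(b)            # residual for the zero initial guess
    p = list(r)            # search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr <= tol * tol:
            break
        Ap = spmv(indptr, indices, data, p)
        alpha = rr / dot(p, Ap)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)
        rr = rr_new
    return x

# Assumed test problem: tridiag(-1, 2, -1) of size 4, stored in CSR.
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
b = [1.0, 0.0, 0.0, 1.0]
x = conjugate_gradient(indptr, indices, data, b)
```

Because the solver touches the data only through `spmv`, `axpy` and `dot`, porting it to another architecture would in principle mean re-implementing just those three kernels, which is the kind of modularity the abstract describes.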

5.
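Research on a parallel implicit computational method for the Navier-Stokes equations on hybrid grids   Total citations: 2, self-citations: 0, citations by others: 2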
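Structured grids have difficulty handling complex geometries, and unstructured grids cannot compute viscous flows with boundary layers. To address these shortcomings, a vertex-based implicit algorithm on hybrid grids was developed, which successfully resolves the problems of viscous flow-field computation for complex geometries and of efficiency that are hard to handle in engineering applications. For large-scale engineering problems, a parallel computing method with coloured, layered communication based on MPI was also developed. Spatial discretization uses either the third-order upwind HLLEW (Harten-Lax-Van Leer-Einfeldt-Wada) scheme, developed from the Roe scheme, or the AUSM scheme; the turbulence model is the k-?? two-equation model; and because an equivalent parallel form of LU-SGS is difficult to obtain, time advancement uses an implicit algorithm based on the DP-LUR (Data-Parallel Lower-Upper Relaxation) scheme. The CFL number can reach the order of 10^5, and the parallel efficiency remains above 90% from 2 to 128 CPUs, which greatly improves computational efficiency. In the test cases, the flow field of the standard M6 wing model was computed first to verify the reliability of the method; viscous and inviscid hybrid-grid results for the standard DLR-F6 wing-body configuration were then compared to further validate the hybrid-grid method; finally, the DLR-WBNP wing-body configuration with an externally mounted engine was computed, accurately simulating the mutual interference flow between the external store and the supercritical wing. Using from 4 CPUs (16 cores) to 24 CPUs (96 cores), the computing time for 2000 steps did not exceed...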

6.
With the increasing heterogeneity and on-node parallelism of high-performance computing hardware, a major challenge is to develop portable and efficient algorithms and software. In this work, we present our implementation of a portable code to perform surface reconstruction using NVIDIA's Thrust library. Surface reconstruction is a technique commonly used in volume tracking methods for simulations of multimaterial flow with interfaces. We have designed a 3D mesh data structure that is easily mapped to the 1D vectors used by Thrust and at the same time is simple to use and uses familiar data structure terminology (such as cells, faces, vertices, and edges). With this new data structure in place, we have implemented a piecewise linear interface reconstruction algorithm in 3 dimensions that effectively exploits the symmetry present in a uniform rectilinear computational cell. Finally, we report performance results, which show that a single implementation of these algorithms can be compiled to multiple backends (specifically, multi-core CPUs, NVIDIA GPUs, and Intel Xeon Phi processors), making efficient use of the available parallelism on each. We also compare the performance of our implementation to a legacy FORTRAN implementation using the Message Passing Interface (MPI) and show performance parity on single and multi-core CPUs and good parallel speed-ups on GPUs. Our research demonstrates the advantage of performance portability of the underlying data-parallel programming model.

7.
The design of the extreme-scale platforms that are expected to become available in the forthcoming decade will represent a convergence of technological trends and the boundary conditions imposed by over half a century of algorithm and application software development. These platforms will be hierarchical because they provide coarse grain parallelism between nodes and fine grain parallelism within each node. They are also expected to be very heterogeneous since multi-core chips and accelerators have completely different architectures and potentials. It is clear that such a degree of complexity will embody radical changes that will render obsolete the current software infrastructure for large-scale scientific applications. In this paper, we illustrate a hierarchical algorithmic approach for the implementation of an efficient parallel sparse linear solver that combines direct and iterative methods. Such a hybrid approach exploits the advantages of both numerical techniques and enables the use of several levels and grains of parallelism. This combination expresses different levels of parallelism and permits an optimal trade-off between numerical and parallel efficiency. Consequently, such a numerical technique appears to be a promising candidate for intensive simulations on future many-core parallel platforms.

8.
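Parallel numerical simulation of muzzle reacting flow   Total citations: 1, self-citations: 0, citations by others: 1
郭则庆  姜孝海  王杨 《计算力学学报》 (Chinese Journal of Computational Mechanics), 2013, 30(1): 111-116, 123
The muzzle reacting flow containing a projectile moving at high speed was simulated numerically using the axisymmetric multi-species N-S equations. The governing equations were solved with a time-splitting method, in parallel on multiple cores of a large computer using MPI. The convective terms were treated with the second-order AUSM+ scheme and MUSCL interpolation; the propellant gas was modelled as a hydrogen-air mixture with a reaction mechanism of 9 species and 19 elementary reaction steps. The mesh motion caused by the projectile was handled with the overset grid method. The parallel validation case agrees with the serial result, and the efficiency with 20 CPUs is 64%. Based on the numerical results, the gas-dynamic and chemical-kinetic processes during launch are discussed in detail, and the influence of the chemical reactions on the development of the muzzle flow field is analysed by comparing the results computed under two conditions. The results show that the algorithm simulates the effects of the projectile and of the chemical reactions on the muzzle flow field reasonably accurately and greatly increases the computing speed.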

9.
The singular nature of the elastic fields produced by dislocations presents conceptual challenges and computational difficulties in the implementation of discrete dislocation-based models of plasticity. In the context of classical elasticity, attempts to regularize the elastic fields of discrete dislocations encounter intrinsic difficulties. On the other hand, in gradient elasticity, the issue of singularity can be removed at the outset and smooth elastic fields of dislocations are available. In this work we consider theoretical and numerical aspects of the non-singular theory of discrete dislocation loops in gradient elasticity of Helmholtz type, with interest in its applications to three dimensional dislocation dynamics (DD) simulations. The gradient solution is developed and compared to its singular and non-singular counterparts in classical elasticity using the unified framework of eigenstrain theory. The fundamental equations of curved dislocation theory are given as non-singular line integrals suitable for numerical implementation using fast one-dimensional quadrature. These include expressions for the interaction energy between two dislocation loops and the line integral form of the generalized solid angle associated with dislocations having a spread core. The single characteristic length scale of Helmholtz elasticity is determined from independent molecular statics (MS) calculations. The gradient solution is implemented numerically within our variational formulation of DD, with several examples illustrating the viability of the non-singular solution. The displacement field around a dislocation loop is shown to be smooth, and the loop self-energy non-divergent, as expected from atomic configurations of crystalline materials. The loop nucleation energy barrier and its dependence on the applied shear stress are computed and shown to be in good agreement with atomistic calculations. 
DD simulations of Lomer–Cottrell junctions in Al show that the strength of the junction and its configuration are easily obtained, without ad-hoc regularization of the singular fields. Numerical convergence studies related to the implementation of the non-singular theory in DD are presented.

10.
An implementation of the finite volume method is presented for the simulation of three dimensional flows in complex geometries, using block structured body fitted grids and an improved linear interpolation scheme. The interfaces between blocks are treated in a fully implicit manner, through modified linear solvers. The cells across block interfaces can be matching one-to-one or many-to-one. In addition, the use of sliding block interfaces allows the incorporation of moving rigid bodies inside the flow domain. An algebraic multigrid solver has been developed that works with this block structured approach, speeding up the iterations for the pressure. The flow solver is parallelized by domain decomposition using OpenMP, based on the same grid block structure. Application examples are presented that demonstrate these capabilities. This numerical model has been made freely available by the authors. G. Usera was supported by FPI/DPI2003-06725-C02-01 from DGI, Ministerio de Educación y Cultura y Fondos FEDER, Spain, grants 33/07 PDT, I+D CSIC, Uruguay.

11.
In this paper, the OpenACC heterogeneous parallel programming model is successfully applied to the modification and acceleration of the three-dimensional Tokamak magnetohydrodynamical code (CLT). By combining OpenACC and MPI, CLT is further parallelised across multiple GPUs. Significant speedup ratios are achieved on NVIDIA TITAN Xp and TITAN V GPUs with very few modifications of CLT. Furthermore, the validity of the double precision calculations on these two graphics cards has also been strictly verified with the m/n = 2/1 resistive tearing mode instability in Tokamak.

12.
In this work we present a robust interface coupling algorithm called Compact Interface quasi-Newton (CIQN). It is designed for computationally intensive applications using an MPI multi-code partitioned scheme. The algorithm allows information from previous time steps to be reused, a feature that has previously been proposed to accelerate convergence. Through algebraic manipulation, efficient use of the computational resources is achieved by avoiding the construction of dense matrices, reducing every multiplication to a matrix–vector product, and reusing the computationally expensive loops. This leads to a compact version of the original quasi-Newton algorithm. Together with efficient communication, this gives efficient scalability up to 4800 cores, as we show in this paper. Three examples with qualitatively different dynamics are shown to prove that the algorithm can efficiently deal with added mass instability and two-field coupled problems. We also show that reusing histories and filtering does not necessarily make a more robust scheme and, finally, we prove the necessity of this HPC version of the algorithm. The novelty of this article lies in the HPC-focused implementation of the algorithm, detailing how to fuse and combine the composing blocks to obtain a scalable MPI implementation. Such an implementation is mandatory in large-scale cases, for which the contact surface cannot be stored in a single computational node, or the number of contact nodes is not negligible compared with the size of the domain.

13.
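A three-dimensional explicit material point method parallel code, MPM3DMP, was developed using OpenMP. To avoid data races during the node update phase, domain decomposition is used to split the background grid into uniform subdomains; each thread updates the nodal variables of one subdomain, and the updated nodal variables are then assembled into the global grid. The particle update phase is parallelised by loop decomposition. Three computational models of Taylor bar impact were tested on a server with two quad-core Intel Woodcrest CPUs: the coarse model reached a speedup of 3.82 on 4 cores and 6.23 on 8 cores; the medium model 3.79 on 4 cores and 6.23 on 8 cores; and the fine model 3.75 on 4 cores and 6.26 on 8 cores. The parallel code presented here therefore has good parallel efficiency and scalability.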
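The subdomain-ownership idea in this abstract can be sketched in a few lines. The example below is a 1D illustration under stated assumptions, not the MPM3DMP implementation: each worker handles the particles in one block of background-grid cells, accumulates nodal mass into a private buffer, and the buffers are summed into the global nodal array afterwards, so no two workers ever write to the same memory. The function names, the linear shape functions and the use of Python threads (which show the data-ownership pattern but give no real speedup because of the interpreter lock) are all assumptions.

```python
# Illustrative sketch only (1D, hypothetical names): race-free particle-to-grid
# mass scatter using one private nodal buffer per block of grid cells.
from concurrent.futures import ThreadPoolExecutor

def scatter_block(first, last, particles, h, n_cells):
    """Accumulate nodal mass from particles located in cells [first, last)."""
    local = [0.0] * (last - first + 1)       # private buffer: nodes first..last
    for pos, mass in particles:
        cell = min(int(pos / h), n_cells - 1)
        if first <= cell < last:
            w = pos / h - cell               # linear shape-function weight
            local[cell - first] += mass * (1.0 - w)
            local[cell - first + 1] += mass * w
    return first, local

def scatter_parallel(particles, h, n_cells, n_blocks):
    """Split the grid into uniform blocks, scatter per block, then assemble."""
    size = -(-n_cells // n_blocks)           # ceiling division
    blocks = [(s, min(s + size, n_cells)) for s in range(0, n_cells, size)]
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        results = list(pool.map(
            lambda b: scatter_block(b[0], b[1], particles, h, n_cells), blocks))
    nodal = [0.0] * (n_cells + 1)
    for first, local in results:             # assembly: shared boundary nodes are summed
        for i, m in enumerate(local):
            nodal[first + i] += m
    return nodal

# Assumed test data: 80 unit-mass particles on a grid of 8 cells of width 1.
particles = [(0.1 * k + 0.05, 1.0) for k in range(80)]
nodal_parallel = scatter_parallel(particles, 1.0, 8, 4)
nodal_serial = scatter_block(0, 8, particles, 1.0, 8)[1]
```

The only nodes touched by two workers are those on block boundaries, and they are combined in the serial assembly loop rather than during the concurrent phase.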

14.
Assessing the mobility of off-road vehicles is a complex task that most often falls back on semi-empirical approaches to quantifying the vehicle–terrain interaction. Herein, we concentrate on physics-based methodologies for wheeled vehicle mobility that factor in both tire flexibility and terrain deformation within a fully three-dimensional multibody system approach. We represent the tire based on the absolute nodal coordinate formulation (ANCF), a nonlinear finite element approach that captures multi-layered, orthotropic shell elements constrained to the wheel rim. The soil is modeled as a collection of discrete elements that interact through contact, friction, and cohesive forces. The resulting vehicle/tire/terrain interaction problem has several millions of degrees of freedom and is solved in an explicit co-simulation framework, built upon and now available in the open-source multi-physics package Chrono. The co-simulation infrastructure is developed using a Message Passing Interface (MPI) layer for inter-system communication and synchronization, with additional parallelism leveraged through a shared-memory paradigm. The formulation and software framework presented in this investigation are proposed for the analysis of the dynamics of off-road wheeled vehicle mobility. Its application is demonstrated by numerical sensitivity studies on available drawbar pull, terrain resistance, and sinkage with respect to parameters such as tire inflation pressure and soil cohesion. The influence of a rigid tire assumption on mobility is also discussed.

15.
The divide-and-conquer paradigm of iterative domain decomposition or substructuring has become a practical tool in computational fluid dynamics applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. We illustrate these features on the classic model problem of flow over a backstep using Newton's method as the non-linear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately and can be combined synergistically. We include sample performance results from an Intel iPSC/860 hypercube implementation.

16.
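Optimizing groundwater model parameters with a parallel PEST algorithm   Total citations: 5, self-citations: 0, citations by others: 5
The PEST parameter optimization program, which is based on the Levenberg-Marquardt algorithm, converges quickly and is robust, and it has been applied successfully many times in studies of groundwater model parameter optimization. However, for large-scale, high-resolution and highly complex groundwater simulations, parameter optimization with PEST requires a great deal of computing time and is inefficient. In this paper the PEST algorithm is parallelised with OpenMP so that the parameter optimization can be computed in parallel on shared-memory parallel computers. The method is applied to the parameter optimization of a regional groundwater model of the Beishan area in Gansu. Parallel experiments show that the parallelised PEST improves the efficiency of groundwater model parameter optimization by a factor of 3.7.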

17.
Parallel processing techniques have been used in the past to provide high performance computing resources for activities such as Computational Fluid Dynamics. This is normally achieved using specialized hardware and software, the expense of which would be difficult to justify for many fire engineering practices. In this paper, we demonstrate how typical office-based PCs attached to a local area network have the potential to offer the benefits of parallel processing with minimal costs associated with the purchase of additional hardware or software. A dynamic load balancing scheme was devised to allow the effective use of the software on heterogeneous PC networks. This scheme ensured that the impact between the parallel processing task and other computer users on the network was minimized thus allowing practical parallel processing within a conventional office environment. Copyright © 2006 John Wiley & Sons, Ltd.

18.
The Bouc–Wen model for smooth hysteresis has received increasing interest in the last few years due to the ease of its numerical implementation and its ability to represent a wide range of hysteresis loop shapes. This model consists of a first-order nonlinear differential equation containing parameters that can be chosen, using identification procedures, to approximate the behavior of a given physical hysteretic system. Despite a large body of literature dedicated to the Bouc–Wen model, the relationship between the parameters that appear in the differential equation and the shape of the resulting hysteresis loop is little understood. The objective of this paper is to fill this gap by analytically exploring this relationship using a new form of the model called the normalized one. The mathematical framework introduced in this study formalizes the vague notion of "loop shape" into precise quantities whose variation with the Bouc–Wen model parameters is analyzed. In light of this analysis, the parameters of the Bouc–Wen model are re-interpreted.
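As an illustration of the "ease of numerical implementation" mentioned in this abstract, the sketch below integrates the classical (not the normalized) Bouc–Wen equation for the hysteretic variable z, dz/dt = A·ẋ − β·|ẋ|·|z|^(n−1)·z − γ·ẋ·|z|^n, with a forward Euler step. The parameter values, the sinusoidal displacement input, the step size and the function name are assumptions picked for the example, and n ≥ 1 is assumed so that the power of |z| is defined at z = 0.

```python
# Illustrative sketch only: forward-Euler integration of the classical
# Bouc-Wen hysteretic variable z(t) for a prescribed displacement x(t).
# Parameter values and names are assumptions, not taken from the paper.
import math

def bouc_wen_z(x_of_t, t_end, dt, A=1.0, beta=0.5, gamma=0.5, n=1.0):
    """Return a list of (t, x, z) samples; assumes n >= 1."""
    steps = int(round(t_end / dt))
    z = 0.0
    history = [(0.0, x_of_t(0.0), z)]
    for k in range(steps):
        t = k * dt
        xd = (x_of_t(t + dt) - x_of_t(t)) / dt      # finite-difference velocity
        zd = (A * xd
              - beta * abs(xd) * abs(z) ** (n - 1.0) * z
              - gamma * xd * abs(z) ** n)
        z += dt * zd
        history.append((t + dt, x_of_t(t + dt), z))
    return history

# One cycle of a sinusoidal displacement with amplitude 3.
amplitude = 3.0
history = bouc_wen_z(lambda t: amplitude * math.sin(t), 2.0 * math.pi, 1e-3)
z_values = [z for _, _, z in history]
z_downward = history[3142][2]    # x is near 0 and decreasing (t close to pi)
z_upward = history[-1][2]        # x is near 0 and increasing (t close to 2*pi)
```

Plotting z against x from `history` traces a hysteresis loop: z takes different values at the same displacement depending on the direction of motion, and for these parameters it stays within the bound (A/(β+γ))^(1/n) = 1.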

19.
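In three-dimensional numerical studies of gaseous detonation, the required grid resolution and the size of the computational domain lead to grid counts that consume enormous computing resources, which poses a great challenge for numerical simulation. To address this difficulty, this paper discretises the reactive Euler equations in space with a fifth-order WENO scheme, develops a high-accuracy dynamic parallel code based on the MPI (Message Passing Interface) parallel model, and computes the propagation of a detonation wave in a three-dimensional square duct containing obstacles. The results show that high-accuracy dynamic parallel computation simulates the propagation of three-dimensional gaseous detonation waves in large ducts well, improving both the computational efficiency and the resolution of the detonation front. Compared with high-accuracy static parallel computation, the dynamic approach reduces the time spent communicating interface data and thus further improves computational efficiency. The high-accuracy dynamic parallel code therefore provides an effective means of exploring new physical mechanisms of three-dimensional gaseous detonation.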

20.
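On a distributed-memory cluster in an MPI environment, a domain-partitioned parallel method for the fifth-order WENO-RF scheme was implemented, taking into account the characteristics of the high-order WENO-RF scheme; the computational accuracy is not affected by the partitioning or by the number of nodes. Using this partitioned parallel algorithm, a direct numerical simulation of a three-dimensional compressible temporally developing mixing layer was carried out as an example. It verified the accuracy of the parallel algorithm and showed that parallel computation on a cluster can significantly extend the computing capability of microcomputers, has high parallel efficiency, reduces the wall-clock time of the computation, and is suitable for large-scale numerical simulation on small high-speed local area networks.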
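The claim that accuracy is unaffected by the partitioning can be illustrated with ghost cells. The sketch below is a serial stand-in under stated assumptions, not the authors' MPI code: a periodic 1D field is split into blocks, each block is padded with ghost cells copied from its neighbours (the copy that an MPI halo exchange would perform), and a wide stencil is applied block by block. A fourth-order central difference replaces the WENO-RF reconstruction to keep the example short; all names are hypothetical.

```python
# Illustrative sketch only: block-wise stencil evaluation with ghost cells
# reproduces the unpartitioned result exactly. A simple central stencil
# stands in for WENO-RF; names and sizes are assumptions.
import math

HALO = 2  # ghost cells per side required by a 5-point stencil

def stencil(u, i):
    """Fourth-order central first-derivative stencil on unit grid spacing."""
    return (u[i - 2] - 8.0 * u[i - 1] + 8.0 * u[i + 1] - u[i + 2]) / 12.0

def apply_global(u):
    """Apply the stencil to the whole periodic field at once."""
    padded = u[-HALO:] + u + u[:HALO]         # periodic ghost cells
    return [stencil(padded, i + HALO) for i in range(len(u))]

def apply_partitioned(u, n_blocks):
    """Apply the stencil block by block; assumes len(u) is divisible by n_blocks."""
    n = len(u)
    size = n // n_blocks
    out = []
    for b in range(n_blocks):
        lo, hi = b * size, (b + 1) * size
        # Stand-in for the halo exchange: copy HALO cells from each neighbour.
        left = [u[(lo - HALO + k) % n] for k in range(HALO)]
        right = [u[(hi + k) % n] for k in range(HALO)]
        local = left + u[lo:hi] + right
        out.extend(stencil(local, i + HALO) for i in range(size))
    return out

u = [math.sin(2.0 * math.pi * k / 32) for k in range(32)]
reference = apply_global(u)
four_blocks = apply_partitioned(u, 4)
eight_blocks = apply_partitioned(u, 8)
```

Each block sees exactly the same neighbouring values as the global computation, so the results agree to the last bit regardless of the number of blocks; the ghost layer only needs to be as wide as the stencil reach.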
