Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
This paper describes the implementation and performance of a parallel solver for the direct numerical simulation of the three‐dimensional, time‐dependent Navier–Stokes equations on distributed‐memory, massively parallel computers. The feasibility of this approach for studying Marangoni flow instability in half‐zone liquid bridges is examined. The results indicate that the incompressible, non‐linear Navier–Stokes problem governing the behavior of Marangoni flows can be parallelized effectively on a distributed‐memory parallel machine by remapping the distributed data structure. The numerical code is based on a three‐dimensional Simplified Marker and Cell (SMAC) primitive variable method applied to a staggered finite difference grid. Using this method, the problem is split into two problems, one parabolic and the other elliptic. A parallel algorithm, explicit in time, is utilized to solve the parabolic equations. A parallel multisplitting kernel is introduced for the solution of the pseudo‐pressure elliptic equation, which is the most time‐consuming part of the algorithm. A grid‐partition strategy is used in the parallel implementations of both the parabolic equations and the multisplitting elliptic kernel. Message Passing Interface (MPI) calls are coded for the boundary conditions; this protocol is portable to different systems that support this interface for interprocessor communication. Numerical experiments illustrate good numerical properties and parallel efficiency. In particular, good scalability on a large number of processors can be achieved as long as the granularity of the parallel application is not too small. However, as the number of processors increases, the speed‐up always remains below the ideal linear speed‐up. The communication timings indicate that complex practical calculations, such as the solution of the Navier–Stokes equations for the numerical simulation of the instability of Marangoni flows, can be expected to run on a massively parallel machine with good efficiency. Copyright © 1999 John Wiley & Sons, Ltd.

2.
In this paper, we present an application of a parallel‐in‐time algorithm for the solution of the unsteady Navier–Stokes model equations, which are of parabolic–elliptic type. The method is based on the alternating use of a coarse global sequential solver and a fine local parallel one. A standard first‐order finite volume/finite difference approach is used to discretize the unsteady two‐dimensional Navier–Stokes equations. The Taylor vortex decay problem and the confined flow around a square cylinder were selected as unsteady flow examples to illustrate and analyse the properties of the parallel‐in‐time method through numerical experiments. The influence of several parameters on the computing time required for a parallel‐in‐time calculation on a PC cluster was examined: the number of processors, the number of iterations for convergence, the resolution of the spatial domain and the ratio of the time‐step sizes of the coarse and fine grids. Significant savings in computing time were achieved compared with the single‐processor computing time, particularly when the spatial dimension of the problem is low and the temporal scale is large. Copyright © 2004 John Wiley & Sons, Ltd.
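The alternation between a coarse sequential solver and a fine parallel one can be illustrated with a minimal sketch in the spirit of the parareal scheme. Everything concrete here is an assumption made for illustration: the scalar test equation y' = -y stands in for the Navier–Stokes equations, explicit Euler is used for both propagators, and the names `euler`, `parareal` and `serial_fine` are hypothetical. The fine propagations inside each iteration are independent per time slice, which is where a real code would distribute work across processors.

```python
def euler(y, t0, t1, steps, lam=-1.0):
    """Advance y' = lam * y from t0 to t1 with `steps` explicit Euler steps."""
    dt = (t1 - t0) / steps
    for _ in range(steps):
        y = y + dt * lam * y
    return y


def parareal(y0, t_end, n_slices, n_iters, coarse_steps=1, fine_steps=50):
    """Return the solution at the time-slice boundaries after n_iters corrections."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, i):  # cheap propagator, applied sequentially
        return euler(y, ts[i], ts[i + 1], coarse_steps)

    def fine(y, i):  # accurate propagator, independent per slice
        return euler(y, ts[i], ts[i + 1], fine_steps)

    # Initial guess: one sequential sweep with the coarse solver.
    u = [y0]
    for i in range(n_slices):
        u.append(coarse(u[i], i))

    for _ in range(n_iters):
        # These two lists only use values from the previous iterate,
        # so every entry could be computed on a different processor.
        f_old = [fine(u[i], i) for i in range(n_slices)]
        g_old = [coarse(u[i], i) for i in range(n_slices)]
        # Sequential correction sweep: new coarse value plus fine-minus-coarse defect.
        new = [y0]
        for i in range(n_slices):
            new.append(coarse(new[i], i) + f_old[i] - g_old[i])
        u = new
    return u


def serial_fine(y0, t_end, n_slices, fine_steps=50):
    """Reference solution: the fine solver applied slice after slice."""
    u = [y0]
    for i in range(n_slices):
        t0, t1 = t_end * i / n_slices, t_end * (i + 1) / n_slices
        u.append(euler(u[i], t0, t1, fine_steps))
    return u
```

With as many iterations as time slices, the sketch reproduces the serial fine solution up to rounding; any saving in wall-clock time comes from stopping after far fewer iterations, which is consistent with the abstract's attention to the number of iterations needed for convergence.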

3.
The use of ILU(0) factorization as a preconditioner is quite frequent when solving linear systems of CFD computations. This is because of its efficiency and moderate memory requirements. For a small number of processors, this preconditioner, parallelized through coloring methods, shows little savings when compared with a sequential one using adequate reordering of the unknowns. Level scheduling techniques are applied to obtain the same preconditioning efficiency as in a sequential case, while taking advantage of parallelism through block algorithms. Numerical results obtained from the parallel solution of the compressible Navier–Stokes equations show that this technique gives interesting savings in computational times on a small number of processors of shared‐memory computers. In addition, it does this while keeping all the benefits of an ILU(0) factorization with an adequate reordering of the unknowns, and without the loss of efficiency of factorization associated with a more scalable coloring strategy. Copyright © 1999 John Wiley & Sons, Ltd.
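Level scheduling can be sketched for the forward substitution that applies a lower-triangular factor. This is a generic illustration, not the block algorithm of the paper: the row-dictionary storage and the names `level_schedule` and `forward_solve` are assumptions. Rows are grouped into levels so that each row depends only on rows in earlier levels; rows within one level are mutually independent and could be handed to different processors.

```python
def level_schedule(lower_rows):
    """Group rows of a lower-triangular matrix into dependency levels.

    lower_rows[i] is a dict {j: L_ij} holding the strictly lower
    entries (j < i) of row i.
    """
    level = [0] * len(lower_rows)
    for i, row in enumerate(lower_rows):
        # A row sits one level above the deepest row it depends on.
        level[i] = 1 + max((level[j] for j in row), default=-1)
    n_levels = max(level, default=-1) + 1
    groups = [[] for _ in range(n_levels)]
    for i, lev in enumerate(level):
        groups[lev].append(i)
    return groups


def forward_solve(lower_rows, diag, b):
    """Solve L x = b, visiting rows level by level."""
    x = [0.0] * len(b)
    for group in level_schedule(lower_rows):
        # Rows in `group` do not reference each other, so this inner
        # loop is the part that could run in parallel.
        for i in group:
            s = sum(value * x[j] for j, value in lower_rows[i].items())
            x[i] = (b[i] - s) / diag[i]
    return x
```

Because the rows are still processed in a dependency-respecting order, the result is the same as that of the sequential substitution, which matches the abstract's point that level scheduling keeps the sequential preconditioning quality.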

4.
A parallel finite volume method for the Navier–Stokes equations with adaptive hybrid prismatic/tetrahedral grids is presented and evaluated in terms of parallel performance. A new method of domain partitioning for complex 3D hybrid meshes is also presented. It is based on orthogonal bisection of a special octree corresponding to the hybrid mesh. The octree is generated automatically and can handle any type of 3D geometry and domain connectivity. One important property of the octree-based partitioning that is exploited is the octree's ability to yield load-balanced partitions that follow the shape of the geometry. This biasing of the octree results in a reduced number of grid elements on the interpartition boundaries and thus fewer data to communicate among processors. Furthermore, the octree-based partitioning gives similar quality of partitions for very different geometries, while requiring minimal user interaction and little computational time. The partitioning method is evaluated in terms of quality of the subdomains as well as execution time. Viscous flow simulations for different geometries are employed to examine the effectiveness of the octree-based partitioning and to test the scalability of parallel execution of the Navier–Stokes solver and hybrid grid adapter on two different parallel systems, the Intel Paragon and the IBM SP2. © 1998 John Wiley & Sons, Ltd.

5.
A parallel semi-explicit iterative finite element computational procedure for modelling unsteady incompressible fluid flows is presented. During the procedure, element flux vectors are calculated in parallel and then assembled into global flux vectors. Equilibrium iterations which introduce some ‘local implicitness’ are performed at each time step. The number of equilibrium iterations is governed by an implicitness parameter. The present technique retains the advantages of purely explicit schemes, namely (i) the parallel speed-up is equal to the number of parallel processors if the small communication overhead associated with purely explicit schemes is ignored and (ii) the computation time as well as the core memory required is linearly proportional to the number of elements. The incompressibility condition is imposed by using the artificial compressibility technique. A pressure-averaging technique which allows the use of equal-order interpolations for both velocity and pressure, thus simplifying the formulation, is employed. Using a standard Galerkin approximation, three benchmark steady and unsteady problems are solved to demonstrate the accuracy of the procedure. In all calculations the Reynolds number is less than 500. At these Reynolds numbers it was found that the physical dissipation is sufficient to stabilize the convective term with no need for additional upwind-type dissipation. © 1998 John Wiley & Sons, Ltd.

6.
A parallel solver based on domain decomposition is presented for the solution of large algebraic systems arising in the finite element discretization of mechanical problems. It is hybrid in the sense that it combines a direct factorization of the local subdomain problems with an iterative treatment of the interface system by a parallel GMRES algorithm. An important feature of the proposed solver is the use of a set of Lagrange multipliers to enforce continuity of the finite element unknowns at the interface. A projection step and a preconditioner are proposed to control the conditioning of the interface matrix. The decomposition of the finite element mesh is formulated as a graph partitioning problem. A two-step approach is used where an initial decomposition is optimized by non-deterministic heuristics to increase the quality of the decomposition. Parallel simulations of a Navier–Stokes flow problem carried out on a Convex Exemplar SPP system with 16 processors show that the use of optimized decompositions and the preconditioning step are keys to obtaining high parallel efficiencies. Typical parallel efficiencies range above 80%. © 1998 John Wiley & Sons, Ltd.

7.
We develop a parallel computational algorithm for simulating models of gel dynamics where the gel is described by two phases, a networked polymer and a fluid solvent. The models consist of transport equations for the two phases, two coupled momentum equations, and a volume‐averaged incompressibility constraint. Multigrid with a Vanka‐type box‐relaxation scheme is used as a preconditioner for the Krylov subspace solver (GMRES) to solve the momentum and incompressibility equations. Numerical experiments on a model problem illustrate the efficiency, robustness and scalability of the algorithm. Copyright © 2008 John Wiley & Sons, Ltd.

8.
Elapsed time is always one of the most important performance measures for polymer injection moulding simulation. Solving the pressure correction equations is the most time-consuming part of mould filling simulation using the finite volume method with SIMPLE-like algorithms. Algebraic multigrid (AMG) is one of the most promising methods for this type of elliptic equation. It performs better than common one-level iterative methods, especially for large problems, and it is also suitable for parallel computing. However, AMG is not easy to apply because of its complex theory and its poor generality across the wide range of computational fluid dynamics applications. This paper presents a robust and efficient parallel AMG solver, A1-pAMG, for 3D mould filling simulation of injection moulding. Numerical experiments demonstrate that A1-pAMG has better parallel performance than the classical AMG and is also algorithmically scalable for 3D unstructured problems.

9.
We present a parallel fully implicit algorithm for the large eddy simulation (LES) of incompressible turbulent flows on unstructured meshes in three dimensions. The LES governing equations are discretized by a stabilized Galerkin finite element method in space and an implicit second-order backward differentiation scheme in time. To efficiently solve the resulting large nonlinear systems, we present a highly parallel Newton-Krylov-Schwarz algorithm based on domain decomposition techniques. An analytic Jacobian is applied in order to obtain the best achievable performance. Two benchmark problems, the lid-driven cavity and flow past a square cylinder, are employed to validate the proposed algorithm. We then apply the algorithm to the LES of turbulent flows past a full-size high-speed train with realistic geometry and operating conditions. The numerical results show that the algorithm is both accurate and efficient and exhibits good scalability and parallel efficiency with tens of millions of degrees of freedom on a computer with up to 4096 processors. To understand the numerical behavior of the proposed fully implicit scheme, we study several important issues, including the choice of linear solver, the overlap size of the subdomains, and, especially, the accuracy of the Jacobian matrix. The results show that an exact Jacobian is necessary for the efficiency and the robustness of the proposed LES solver.

10.
The convergence rate of a methodology for solving incompressible flow in general curvilinear co‐ordinates is analyzed. Double‐staggered grids (DSGs), each defined by the same boundaries as the physical domain, are used for discretization. Both grids are MAC quadrilateral meshes with scalar variables (pressure, temperature, etc.) arranged at the center and the Cartesian velocity components at the middle of the sides of the mesh cells. The problem was checked against benchmark solutions of natural convection in a squeezed cavity, heat transfer in concentric horizontal cylindrical annuli, and a hot cylinder in a duct. Poisson's pressure‐correction equations that arise from the SIMPLE‐like procedure are solved by several methods: successive overrelaxation, symmetric overrelaxation, modified incomplete factorization preconditioner, conjugate gradient (CG), and CG with preconditioner. A genetic algorithm was developed to solve problems of numerical optimization of SIMPLE‐like calculation time in a space of iteration numbers and relaxation parameters. The application provides a means of making an unbiased comparison between the DSGs method and the widely used interpolation method. Furthermore, the convergence rate was demonstrated by application to the calculation of natural convection heat transfer in concentric horizontal cylindrical annuli. Calculation times when DSGs were used were 2–10 times shorter than those achieved by interpolation. With the DSGs method, calculation time increases slightly with increasing non‐orthogonality of the grids, whereas an interpolation method calls for very small iteration parameters that lead to unacceptable calculation times. Copyright © 2007 John Wiley & Sons, Ltd.

11.
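A high-order spectral element domain decomposition algorithm for steady lid-driven cavity flow   Total citations: 2 (self-citations: 0, citations by others: 2)
The steady incompressible Navier-Stokes equations are solved mainly with a Jacobian-free Newton-Krylov method. The unsteady time-advancement step of a domain decomposition Stokes algorithm based on the high-order spectral element method serves as the preconditioner for the Newton iteration. This avoids the explicit assembly of the Jacobian matrix required by the traditional Newton method, saves memory, lowers the condition number of the linear system in the Newton iteration, and requires no implicit solution of the nonlinear convection term, which greatly accelerates convergence. Results for Kovasznay flow, which has an analytical solution, show that the high-order spectral element method attains spectral accuracy with exponential convergence in space and that the Newton iteration for the steady solution converges quadratically. The two-dimensional cavity flow driven by a lid moving at uniform velocity is simulated, and the results agree well with benchmark solutions, indicating that the method is accurate and reliable. The cavity flow driven by a lid with a sinusoidal velocity profile at Re=800 is also considered: in addition to the known stable symmetric solution and pair of stable asymmetric solutions, a new pair of unstable asymmetric solutions is obtained.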
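The key ingredient of a Jacobian-free Newton-Krylov method is that the Krylov solver only needs products of the Jacobian with a vector, and these can be approximated by a finite difference of the residual. The sketch below shows just that product on a small hypothetical residual; the function names, the 2x2 test system and the step size `eps` are assumptions for illustration and are unrelated to the spectral element code of the paper.

```python
def jacobian_vector_product(residual, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference, without forming J.

    Uses J(u) v ~ (F(u + eps * v) - F(u)) / eps.
    """
    base = residual(u)
    shifted = residual([ui + eps * vi for ui, vi in zip(u, v)])
    return [(s - b) / eps for s, b in zip(shifted, base)]


def example_residual(u):
    """Hypothetical nonlinear residual F(x, y) used only for this demo."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]


def example_jacobian_times(u, v):
    """Analytic J(u) v for example_residual, for comparison."""
    x, y = u
    return [2.0 * x * v[0] + v[1], v[0] + 2.0 * y * v[1]]
```

In a full solver this product would be passed to a Krylov method such as GMRES inside each Newton step; only residual evaluations are needed, which is why no Jacobian matrix has to be assembled or stored.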

12.
13.
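A parallel analysis system is implemented for computing damage (D) evolution in rock masses under the coupled action of multiple fields, including the seepage field (H), the temperature field (T) and the stress field (M). The overall implementation scheme, the linear equation solver, the parallel communication optimization strategy and efficient treatments for the different field problems are described in detail. Computations on the HM and TM test models show that, when a reasonable number of processors is used, the program achieves its best performance, with near-linear speed-up and an efficiency above 82%; at the default accuracy, one coupled computation of a model with ten million degrees of freedom takes a steady 200 s. Numerical simulations of three-dimensional hydraulic fracturing and of surface cracking of materials caused by thermal loading reproduce the observed physical behavior well and demonstrate the broad application prospects of the system.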

14.
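A parallel computing method for fully coupled multiphase flow analysis   Total citations: 2 (self-citations: 0, citations by others: 2)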
王希诚 《力学学报》1999,31(3):276-284
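A parallel computing method for the fully coupled analysis of heat, water and vapor flow in porous media is studied. The model adopts a modified effective stress concept that accounts for the capillary pressure relationship, and it considers phase change and latent heat transfer. The primary variables are displacement, capillary pressure, vapor pressure and temperature. The parallel program is implemented on the Dawning 1000A at the National High Performance Computing Center (Beijing) with the PVM (Parallel Virtual Machine) software system. Test problems show high parallel speed-up and efficiency.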

15.
邓小毛  廖子菊 《力学学报》2022,54(12):3513-3523
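Unstructured-mesh numerical algorithms for three-dimensional fluid-structure interaction problems have important applications in many engineering fields. Existing numerical methods are mainly based on partitioned algorithms, in which the fluid and solid regions are solved separately; they therefore suffer from slow convergence and from stability problems caused by the added-mass effect. In addition, the parallel scalability of such algorithms is limited, which restricts their use in large-scale computations. For three-dimensional unsteady fluid-structure interaction problems, this paper proposes a fully implicit, fully coupled, scalable parallel algorithm based on domain decomposition. The governing equations are first formulated in the arbitrary Lagrangian-Eulerian framework and then discretized with a second-order implicit backward differentiation scheme in time and an unstructured stabilized finite element method in space. For the large nonlinear discrete system, a Newton-Krylov-Schwarz (NKS) parallel solver is constructed that combines an inexact Newton method, a Krylov subspace iterative method and a domain decomposition Schwarz preconditioner, so that the fluid, solid and moving-mesh equations are solved together in a single monolithic step. The accuracy of the numerical method is verified with a standard benchmark of flow around an elastic obstacle. Numerical performance tests show that the fully implicit, fully coupled algorithm is stable and remains robust under different physical parameters; on the Tianhe-2 supercomputer, a parallel efficiency of 91% is obtained when the parallel scale increases from 192 to 3072 processor cores. The performance results indicate that the NKS algorithm constructed here is expected to be applicable to complex...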

16.
In the present work a new iterative method for solving the Navier-Stokes equations is designed. In a previous paper a coupled node fill-in preconditioner for iterative solution of the Navier-Stokes equations proved to increase the convergence rate considerably compared with traditional preconditioners. The further development of the present iterative method is based on the same storage scheme for the equation matrix as for the coupled node fill-in preconditioner. This storage scheme separates the velocity, the pressure and the coupling of pressure and velocity coefficients in the equation matrix. The separation storage scheme allows for an ILU factorization of both the velocity and pressure unknowns. With the inner-outer solution scheme the velocity unknowns are eliminated before the resulting equation system for the pressures is solved iteratively. After the pressure unknowns have been found, the pressures are substituted into the original equation system and the velocities are also found iteratively. The behaviour of the inner-outer iterative solution algorithm is investigated in order to find optimal convergence criteria for the inner iterations and compared with the solution algorithm for the original equation system. The results show that the coupled node fill-in preconditioner of the original equation system is more efficient than the coupled node fill-in preconditioner of the reduced equation system. However, the solution technique of the reduced equation system reveals properties which may be advantageous in future solution algorithms.

17.
This paper describes parallel simulation techniques for the discrete element method (DEM) on multi-core processors. Recently, multi-core CPU and GPU processors have attracted much attention in accelerating computer simulations in various fields. We propose a new algorithm for multi-thread parallel computation of DEM, which makes effective use of the available memory and accelerates the computation. This study shows that memory usage is drastically reduced by using this algorithm. To show the practical use of DEM in industry, a large-scale powder system is simulated with a complicated drive unit. We compared the performance of the simulation between the latest GPU and CPU processors with optimized programs for each processor. The results show that the difference in performance is not substantial when using either GPUs or CPUs with a multi-thread parallel algorithm. In addition, the DEM algorithm is shown to have high scalability in a multi-thread parallel computation on a CPU.

18.
This paper describes the development of a parallel three‐dimensional unstructured non‐isothermal flow solver for the simulation of the injection molding process. The numerical model accounts for multiphase flow in which the melt and air regions are considered to be a continuous incompressible fluid with distinct physical properties. This aspect avoids the complex reconstruction of the interface. A collocated finite volume method is employed, which can switch between first‐ and second‐order accuracy in both space and time. The pressure implicit with splitting of operators algorithm is used to compute the transient flow variables and couple velocity and pressure. The temperature equation is solved using a transport equation with convection and diffusion terms. An upwind differencing scheme is used for the discretization of the convection term to enforce a bounded solution. In order to capture the sharp interface, a bounded compressive high‐resolution scheme is employed. Parallelization of the code is achieved using the PETSc framework and a single program multiple data message passing model. Predicted numerical solutions for several example problems are considered. The first case validates the solution algorithm for moderate Reynolds number flows using a structured mesh. The second case employs an unstructured hybrid mesh showing the capability of the solver to describe highly viscous flows closer to realistic injection molding conditions. The final case presents the non‐isothermal filling of a thick cavity using three mesh sizes and up to 80 processors to assess parallel performance. The proposed algorithm is shown to have good accuracy and scalability. Copyright © 2008 John Wiley & Sons, Ltd.

19.
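Design of fast and parallel algorithms for a concrete mesomechanics analysis program   Total citations: 1 (self-citations: 0, citations by others: 1)
For a concrete mesomechanics analysis program, the shortcomings of its computational method and efficiency are analyzed first. An algorithm is then proposed that uses sparse matrix and sparse vector techniques to assemble the finite element stiffness matrix efficiently, and a dual-threshold incomplete Cholesky preconditioner is combined with the CG method to solve the sparse linear systems efficiently. Next, an overall parallel design is proposed that integrates the distribution of finite elements with the distribution of unknowns, and parallel algorithms are designed for the core kernels: stiffness matrix assembly, dual-threshold incomplete Cholesky factorization, sparse matrix times dense vector multiplication, and sparse vector addition. Finally, numerical experiments on two concrete numerical specimens are carried out on a cluster of eight Intel Xeon nodes, each with 2 CPUs, connected by Gigabit Ethernet. The first specimen has 44117 mesh points and 53200 finite elements; the second has 71013 mesh points and 78800 finite elements. For the first specimen, the original serial program needs 984.83 hours (about 41 days) for the full 567 load steps; the serial algorithm of this paper reduces the simulation time to 22531 seconds (about 6.26 hours), and the parallel algorithm on 16 CPUs reduces it further to 3860 seconds (about 1.07 hours). For the second specimen, the original serial program needs 467.19 hours (about 19.5 days) for the full 94 load steps; the serial algorithm of this paper reduces the simulation time to 11453 seconds (about 3.18 hours), and the parallel algorithm on 16 CPUs reduces it further to 1704 seconds (about 28.4 minutes). The improved serial algorithm and the parallel algorithm design greatly shorten the computation time, which is of great significance for accelerating research on the mechanical properties of concrete.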
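The timings reported in this abstract can be turned into speed-up and efficiency figures with a few lines of arithmetic. The sketch below does that; the derived ratios are not stated in the abstract, and the definition of parallel efficiency as speed-up divided by the number of CPUs, measured against the improved serial algorithm, is an assumption. The names `speedup` and `summarize` are hypothetical.

```python
HOUR = 3600.0   # seconds per hour
N_CPUS = 16     # CPUs used for the parallel runs in the abstract

# Wall-clock times in seconds, taken from the abstract.
specimens = {
    "specimen 1": {"original_s": 984.83 * HOUR, "serial_s": 22531.0, "parallel_s": 3860.0},
    "specimen 2": {"original_s": 467.19 * HOUR, "serial_s": 11453.0, "parallel_s": 1704.0},
}


def speedup(time_before_s, time_after_s):
    """Ratio of the old run time to the new run time."""
    return time_before_s / time_after_s


def summarize(times, n_cpus=N_CPUS):
    """Derive algorithmic and parallel gains from one specimen's timings."""
    algorithmic = speedup(times["original_s"], times["serial_s"])
    parallel = speedup(times["serial_s"], times["parallel_s"])
    return {
        "algorithmic_speedup": algorithmic,         # original serial -> improved serial
        "parallel_speedup": parallel,               # improved serial -> 16 CPUs
        "parallel_efficiency": parallel / n_cpus,   # assumed definition
    }


if __name__ == "__main__":
    for name, times in specimens.items():
        s = summarize(times)
        print(f"{name}: algorithmic x{s['algorithmic_speedup']:.1f}, "
              f"parallel x{s['parallel_speedup']:.2f}, "
              f"efficiency {s['parallel_efficiency']:.1%}")
```

Under these assumptions the algorithmic improvement accounts for roughly a 150-fold reduction for each specimen, while the 16-CPU run adds a further factor of about 6 to 7, so most of the reported saving comes from the improved serial algorithm.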

20.
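Motivated by the urgent need for large-scale, high-fidelity numerical simulation of dynamics in the development of large, complex equipment, this paper reviews domestic and international research progress on large-scale modal analysis and the subsequent parallel dynamics computations, and introduces the team's research and development work on parallel structural dynamics computing with the JAUMIN framework and the PANDA software platform. A demonstration case based on the Shenguang-III large optomechanical facility shows that the parallel scalability of PANDA for dynamics reaches the level of "hundreds of millions of degrees of freedom on tens of thousands of cores", with a parallel efficiency above 50% on ten thousand cores, far exceeding the analysis capability of the commercial software currently available in China. The design philosophy of "developing application software on top of a framework" has become the mainstream approach to developing large-scale finite element programs and will play a key role in improving development efficiency and in promoting software practicality and parallel scalability.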
