Similar literature
20 similar documents found (search time: 484 ms)
1.
This paper presents the parallelization aspects of a solution method for the fully coupled 3D compressible Navier-Stokes equations. The algorithmic thrust of the approach, embedded in a finite element code NS3D, is the linearization of the governing equations through Newton methods, followed by a fully coupled solution of velocities and pressure at each non-linear iteration by preconditioned conjugate gradient-like iterative algorithms. For the matrix assembly, as well as for the linear equation solver, efficient coarse-grain parallel schemes have been developed for shared memory machines, as well as for networks of workstations, with a moderate number of processors. The parallel iterative schemes, in particular, circumvent some of the difficulties associated with domain decomposition methods, such as geometry bookkeeping and the sometimes drastic convergence slow-down of partitioned non-linear problems.

2.
Multiblock methods are often employed to compute flows in complex geometries. While such methods lend themselves in a natural way to coarse-grain parallel processing by the distribution of different blocks to different processors, in some situations a fine-grain data-parallel implementation may be more appropriate. A study is presented of the resolution of the Euler equations for compressible flow on a block-structured mesh, illustrating the advantages of the data-parallel approach. Particular emphasis is placed on a dynamic block management strategy that allows computations to be undertaken only for blocks where useful work is to be performed. In addition, appropriate choices of initial and boundary conditions that enhance solution convergence are presented. Finally, code portability between five different massively parallel computer systems is examined and an analysis of the performance results obtained on different parallel systems is presented.

3.
One of the main factors limiting the widespread use of computational fluid dynamics codes for engineering design is their very large requirements both in terms of computer memory and CPU time. Distributed memory parallel computers offer both the potential for a dramatic improvement in cost/performance over conventional supercomputers and the scalability to large numbers of processors that is required if performance beyond that of current supercomputers is to be achieved. As part of an evaluation to explore the potential of such machines for computational fluid mechanics applications, a concurrent algorithm for the solution of the Navier-Stokes equations has been developed and demonstrated on a hypercube parallel computer. The algorithm is based on a domain decomposition of a well-established serial pressure correction algorithm. The algorithm is demonstrated on both a 32-node scalar and eight-node vector Intel iPSC/2 for complicated two-dimensional laminar and turbulent flow problems with different grid sizes and numbers of processors. Speed-ups relative to a single processor of 12.9 with 16 processors and 20.2 with 32 processors are achieved on a scalar iPSC/2, demonstrating the parallel efficiency of the algorithm. Measured performance on a 32-node scalar iPSC/2 exceeds one-sixth that of a Cray X-MP running the original serial algorithm. The performance of the algorithm on an eight-node vector iPSC/2 exceeds that of the larger scalar hypercube and is about one-fifth that of the Cray X-MP. With cost/performance more than 10 times better than the Cray, these results dramatically show the cost effectiveness of vector hypercubes for this class of fluid mechanics algorithm.
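The speed-ups quoted in this abstract can be turned into parallel efficiencies with one line of arithmetic. The sketch below assumes the usual definition, efficiency = speed-up / number of processors; the function name `parallel_efficiency` is illustrative and does not come from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Parallel efficiency, assumed here to be speed-up divided by processor count."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Speed-ups quoted in the abstract for the scalar iPSC/2: {processors: speed-up}.
reported = {16: 12.9, 32: 20.2}
efficiencies = {p: parallel_efficiency(s, p) for p, s in reported.items()}

for p, e in efficiencies.items():
    print(f"{p} processors: efficiency = {e:.2f}")
```

Under that definition the efficiency falls from about 0.81 on 16 processors to about 0.63 on 32.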

4.
The paper describes a domain decomposition strategy which allows high efficiency parallel simulations of turbomachinery flows. The implicit ADI parallel solver is based on the scalar approximate factorisation. The Navier-Stokes (NS) and turbulence model equations are discretised by centred finite differences. The results prove that the parallel calculations with domain decomposition, in which each sub-domain explicitly communicates with the adjacent ones at the end of each implicit iteration, may suffer from a considerable deterioration of the convergence rate. A simple sub-iterative domain recoupling strategy allows recovering the convergence rate of a single processor computation. The strategy is carefully analysed and optimised in terms of inter-processor data communication rate and algorithm memory requirements. The span-wise domain decomposition is particularly suited for turbomachinery flows and is applied to a radial impeller and to an axial turbine stator and stage to prove the validity and the accuracy of the proposed approach. The results indicate that the parallel recoupled algorithm usually reaches efficiencies of 0.8, with peaks over 0.9 with up to 16 processors, thereby allowing a considerable speed-up of design and verification calculations.

5.
A comparison of a new parallel block-implicit method and the parallel pressure correction procedure for the solution of the incompressible Navier–Stokes equations is presented. The block-implicit algorithm is based on a pressure equation. The system of non-linear equations is solved by Newton's method. For the solution of the linear algebraic systems the Bi-CGSTAB algorithm with incomplete lower–upper (ILU) decomposition of the matrix is applied. Domain decomposition serves as a strategy for the parallelization of the algorithms. Different algorithms for the parallel solution of the linear system of algebraic equations in conjunction with the pressure correction procedure are proposed. Three different flows are predicted with the parallel algorithms. Results and efficiency data of the block-implicit method are compared with the parallel version of the pressure correction algorithm. The block-implicit method is characterized by stable convergence behaviour, high numerical efficiency, insensitivity to relaxation parameters and high spatial accuracy. © 1997 John Wiley & Sons, Ltd.

6.
The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier–Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations. Copyright © 2000 John Wiley & Sons, Ltd.

7.
In this paper a parallel multigrid finite volume solver for the prediction of steady and unsteady flows in complex geometries is presented. For the handling of the complexity of the geometry and for the parallelization a unified approach connected with the concept of block-structured grids is employed. The parallel implementation is based on grid partitioning with automatic load balancing and follows the message-passing concept, ensuring a high degree of portability. A high numerical efficiency is obtained by a non-linear multigrid method with a pressure correction scheme as smoother. By a number of numerical experiments on various parallel computers the method is investigated with respect to its numerical and parallel efficiency. The results illustrate that the high performance of the underlying sequential multigrid algorithm can largely be retained in the parallel implementation and that the proposed method is well suited for solving complex flow problems on parallel computers with high efficiency.

8.
9.
10.
This paper describes the implementation of a fire field model in the parallel computing environment offered by multiple transputers. The fire model is built into the general purpose SIMPLE-based CFD code HARWELL-FLOW3D. The technique of domain decomposition has been applied to convert the conventional serial version of FLOW3D into a code capable of efficiently utilizing an arbitrary number of transputers. Fire simulations consisting of up to 24 000 computational cells are performed on parallel systems with up to 15 processors. The run time for this simulation has been reduced from over 4 days on a single processor to just over 8 h on the 15-processor system. An interactive graphics system has also been developed which runs in parallel with the main computations.

11.
A finite volume numerical method for the prediction of fluid flow and heat transfer in simple geometries was parallelized using a domain decomposition approach. The method is implicit, uses a colocated arrangement of variables and is based on the SIMPLE algorithm for pressure-velocity coupling. Discretization is based on second-order central difference approximations. The algebraic equation systems are solved by the ILU method of Stone [1]. To accelerate the convergence, a multigrid technique was used. The efficiency was examined on three different parallel computers for laminar flow in a pipe with an orifice and natural convection in a closed cavity. It is shown that the total efficiency is made up of three major factors: numerical efficiency, parallel efficiency and load-balancing efficiency. The first two factors were thoroughly investigated, and a model for predicting the parallel efficiency on various computers is presented. Test calculations indicate reasonable total efficiency and favourable dependence on grid size and the number of processors.
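The abstract states that the total efficiency is made up of three factors but does not give the formula. A minimal sketch, assuming the factors combine as a simple product (a common convention, not confirmed by this abstract), with purely hypothetical input values:

```python
def total_efficiency(numerical: float, parallel: float, load_balance: float) -> float:
    """Combine the three efficiency factors.

    Assumption: the factors multiply. The abstract only says the total
    efficiency is "made up of" them.
    """
    for name, value in (("numerical", numerical),
                        ("parallel", parallel),
                        ("load_balance", load_balance)):
        if value <= 0.0:
            raise ValueError(f"{name} efficiency must be positive")
    return numerical * parallel * load_balance


# Hypothetical factor values, chosen only to illustrate the arithmetic.
example = total_efficiency(numerical=0.9, parallel=0.8, load_balance=0.95)
print(f"total efficiency = {example:.3f}")
```

With this form, a modest loss in each factor compounds: three factors that each look acceptable give a total of roughly 0.68.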

12.
The Open Accelerator (OpenACC) application programming interface is a relatively new parallel computing standard. In this paper, particle-based flow field simulations are examined as a case study of OpenACC parallel computation. The parallel conversion process of the OpenACC standard is explained, and further, the performance of the flow field parallel model is analysed using different directive configurations and grid schemes. With careful implementation and optimisation of the data transportation in the parallel algorithm, a speedup factor of 18.26× is possible. In contrast, a speedup factor of just 11.77× was achieved with the conventional Open Multi-Processing (OpenMP) parallel mode on a 20-core computer. These results demonstrate that optimised feature settings greatly influence the degree of speedup, and models involving larger numbers of calculations exhibit greater efficiency and higher speedup factors. In addition, the OpenACC parallel mode is found to have good portability, making it easy to implement parallel computation from the original serial model.

13.
The parallel implementation of the Direct Simulation Monte Carlo (DSMC) method on memory-distributed machines using unstructured mesh is reported. Physical domain decomposition is used to distribute the workload among multiple processors. A high-speed driven cavity flow is used as the benchmark problem for the validation of the parallel implementation. Three static partitioning techniques including simple coordinate partitioning, two-step partitioning (JOSTLE) and multi-level partitioning (METIS) are used for static domain decomposition, respectively. A cell renumbering technique is used to improve the memory management efficiency. Results of parallel efficiency show that two-step partitioning using JOSTLE performs the best, with 63% up to 25 processors, due to better load balancing among the processors. The powerful computational capability of the parallel implementation is demonstrated by computing a 2-D, near-continuum, hypersonic flow over a cylinder as well as a 3-D hypersonic flow over a sphere, respectively, both using 25 processors. Results compare reasonably well with previous simulated and experimental studies.
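To make the load-balancing remark concrete, the following sketch shows one plausible reading of "simple coordinate partitioning": sort the cells along one coordinate and cut the sorted list into near-equal chunks. This is an illustration, not the paper's code; the function names, the cell coordinates and the per-cell particle counts are all hypothetical, and the use of particle count as the measure of work is an assumption.

```python
def coordinate_partition(cell_x, n_parts):
    """Assign cells to parts by slicing the x-sorted cell list into near-equal chunks."""
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    order = sorted(range(len(cell_x)), key=lambda i: cell_x[i])
    owner = [0] * len(cell_x)
    base, extra = divmod(len(cell_x), n_parts)
    start = 0
    for part in range(n_parts):
        size = base + (1 if part < extra else 0)  # spread the remainder
        for i in order[start:start + size]:
            owner[i] = part
        start += size
    return owner


def load_imbalance(owner, work, n_parts):
    """Heaviest part's load divided by the average load (1.0 means perfect balance)."""
    loads = [0.0] * n_parts
    for part, w in zip(owner, work):
        loads[part] += w
    return max(loads) / (sum(loads) / n_parts)


# Hypothetical data: cell centroid x-coordinates and particles per cell.
cell_x = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
particles = [10, 40, 10, 10, 10, 40]

owner = coordinate_partition(cell_x, 3)
imbalance = load_imbalance(owner, particles, 3)
print(owner, imbalance)
```

Each part receives two cells, yet the part holding the two dense cells carries twice the average particle load. An equal cell count is therefore not the same as an equal workload, which is the kind of imbalance a graph-based partitioner weighted by work is meant to reduce.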

14.
FLITE3D is a multigrid Euler solver. It is used extensively by British Aerospace in aircraft design and simulation. This paper presents experiences in parallelizing this industrial code. Owing to the employment of an agglomeration-based multigrid technique, the communication overhead on the coarser meshes could readily erode any gain from the use of parallel computers. The parallelization of the code therefore required careful design and implementation. The strategy adopted in the parallelization of the code, including the use of data structures and communication primitives, is described. Numerical results are presented to demonstrate the efficiency of the resulting parallel code. Copyright © 2001 John Wiley & Sons, Ltd.

15.
In this study, we present an adaptive moving boundary computation technique with parallel implementation on a distributed memory multi-processor system for large scale thermo-fluid and interfacial flow computations. The solver utilizes an Eulerian-Lagrangian method to track moving (Lagrangian) interfaces explicitly on the stationary (Eulerian) Cartesian grid where the flow fields are computed. We address the domain decomposition strategies of the Eulerian-Lagrangian method by illustrating the intricate complexity of the computation involved on two different spaces interactively and consequently, and then propose a trade-off approach aiming for parallel scalability. Spatial domain decomposition is adopted for both the Eulerian and Lagrangian domains due to easy load balancing and data locality for minimum communication between processors. In addition, a parallel cell-based unstructured adaptive mesh refinement (AMR) technique is implemented for flexible local refinement and evenly distributed computational workload among processors. Selected cases are presented to highlight the computational capabilities, including Faraday type interfacial waves with capillary and gravitational forcing, flows around varied geometric configurations and induced by boundary conditions and/or body forces, and thermo-fluid dynamics with phase change. With the aid of the present techniques, large scale challenging moving boundary problems can be effectively addressed.

16.
A parallel large eddy simulation code that adopts a domain decomposition method has been developed for large-scale computation of turbulent flows around an arbitrarily shaped body. For the temporal integration of the unsteady incompressible Navier–Stokes equations, a fractional 4-step splitting algorithm is adopted, and for the modelling of small eddies in turbulent flows, the Smagorinsky model is used. For the parallelization of the code, METIS and Message Passing Interface libraries are used, respectively, to partition the computational domain and to communicate data between processors. To validate the parallel architecture and to estimate its performance, a three-dimensional laminar driven cavity flow inside a cubical enclosure has been solved. To validate the turbulence calculation, the turbulent channel flows at Re_τ = 180 and 1050 are simulated and compared with previous results. Then, a backward facing step flow is solved and compared with a DNS result for overall code validation. Finally, the turbulent flow around the MIRA model at Re = 2.6 × 10^6 is simulated by using approximately 6.7 million nodes. The scalability curve obtained from this simulation shows that scalable results are obtained. The calculated drag coefficient agrees better with the experimental result than those previously obtained by using two-equation turbulence models. Copyright © 2007 John Wiley & Sons, Ltd.

17.
An efficient parallel spectral method for direct numerical simulations of transitional and turbulent flows is described in this paper. The parallelization is classically based on a bidimensional domain decomposition, but has been specifically developed for a solenoidal Fourier–Chebyshev spectral approximation where in one Fourier direction, the number of modes is very large compared with the two other directions. The approach therefore differs from classical libraries developed for cubic Fourier boxes. The strategy uses the message-passing interface (MPI) for message passing among nodes and is fairly portable. One of the originalities of this paper is the use of an efficient hybrid programming with MPI for inter-node communication and a coarse grain parallelism using OpenMP for core shared-memory computation, instead of the classical hybrid programming with MPI and a fine granularity parallelism at the loop level with OpenMP directives. This hybrid parallelism has been tested on the recent generation of high-performance parallel supercomputers involving a few tens of cores per node. Performance is evaluated on different massively parallel platforms with low-frequency and high-frequency processors. We demonstrate that spectral methods, which are known to be inherently ill-fitted for the new generation of high-performance distributed-memory computers, can be implemented efficiently using this hybrid programming with good scalability and a very fast wall-clock time per iteration. New numerical experiments are therefore now accessible on petascale computers, while keeping the attractive features of spectral methods such as accuracy, exponential convergence, computational efficiency and conservative properties. This is illustrated by a direct numerical simulation of the transition of the boundary layers developing from the entrance section of a plane channel and interacting to merge into a fully turbulent flow. Copyright © 2012 John Wiley & Sons, Ltd.

18.
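For a Transputer parallel computer system with four processors, several parallelization strategies based on the SIMPLEC algorithm are established for the numerical simulation of wind pressure on buildings: a domain-partition parallel strategy, an equation-parallel strategy and a dual parallel strategy. The computational flow, data communication and parallel efficiency of each strategy are analysed and compared, and the strategies are verified through example calculations.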

19.
The parallel implementation of an unstructured-grid, three-dimensional, semi-implicit finite difference and finite volume model for the free surface Navier–Stokes equations (UnTRIM) is presented and discussed. The new developments are aimed to make the code available for high-performance computing in order to address larger, complex problems in environmental free surface flows. The parallelization is based on the mesh partitioning method and message passing and has been achieved without negatively affecting any of the advantageous properties of the serial code, such as its robustness, accuracy and efficiency. The key issue is a new, autonomous parallel streamline backtracking algorithm, which allows using semi-Lagrangian methods in decomposed meshes without compromising the scalability of the code. The implementation has been carefully verified not only with simple, abstract test cases illustrating the application domain of the code but also with advanced, high-resolution models presently applied for research and engineering projects. The scheme performance and accuracy aspects are researched and discussed. Copyright © 2008 John Wiley & Sons, Ltd.

20.
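Design of fast and parallel algorithms in a mesoscale mechanics analysis program for concrete (total citations: 1, self-citations: 0, citations by others: 1)
For a mesoscale mechanics analysis program for concrete, after analysing the shortcomings of its computational methods and efficiency, an algorithm is proposed that uses sparse matrix and sparse vector techniques to assemble the finite element stiffness matrix efficiently, and a dual-threshold incomplete Cholesky factorization preconditioner is combined with the CG method to solve the sparse linear systems efficiently. An overall parallel design is then proposed that combines the distribution of finite elements with the distribution of unknowns, and corresponding parallel algorithms are designed for the core kernels: stiffness matrix assembly, dual-threshold incomplete Cholesky factorization, sparse matrix by dense vector multiplication, and sparse vector addition. Finally, numerical experiments on two concrete numerical specimens were carried out on a cluster of eight Intel Xeon nodes, each with 2 CPUs, connected by Gigabit Ethernet. The first specimen contains 44117 grid points and 53200 finite elements; the second contains 71013 grid points and 78800 finite elements. For the first specimen, the original serial program needs 984.83 hours (about 41 days) for the full 567 loading steps; with the serial algorithm of this paper the simulation time drops to 22531 seconds (about 6.26 hours), and with the parallel algorithm on 16 CPUs it drops further to 3860 seconds (about 1.07 hours). For the second specimen, the original serial program needs 467.19 hours (about 19.5 days) for the full 94 loading steps; with the serial algorithm of this paper the simulation time drops to 11453 seconds (about 3.18 hours), and with the parallel algorithm on 16 CPUs it drops further to 1704 seconds (about 28.4 minutes). The improved serial algorithm and the parallel algorithm design greatly shorten the computing time, which is significant for accelerating the analysis of the mechanical properties of concrete.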
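The two serial ingredients named in this abstract, sparse stiffness assembly and a preconditioned CG solve, can be sketched on a toy problem. This is not the authors' program: the 1D chain of two-node elements is a stand-in for their concrete mesh, a simple Jacobi (diagonal) preconditioner is swapped in for the dual-threshold incomplete Cholesky factorization, and all function names are hypothetical.

```python
def assemble_chain(n_elems, k=1.0):
    """Assemble the stiffness matrix of a 1D chain of two-node elements.

    Node 0 is fixed and eliminated, leaving n_elems unknowns. Each row is a
    dict {column: value}, so only nonzero entries are stored.
    """
    rows = [dict() for _ in range(n_elems)]
    ke = ((k, -k), (-k, k))                      # element stiffness matrix
    for e in range(n_elems):
        unknowns = (e - 1, e)                    # -1 marks the fixed node
        for a, i in enumerate(unknowns):
            for b, j in enumerate(unknowns):
                if i >= 0 and j >= 0:
                    rows[i][j] = rows[i].get(j, 0.0) + ke[a][b]
    return rows


def matvec(rows, x):
    """Sparse matrix times dense vector."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg_jacobi(rows, b, tol=1e-10, max_iter=1000):
    """Conjugate gradients with a Jacobi preconditioner (illustrative stand-in)."""
    n = len(b)
    inv_diag = [1.0 / rows[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]   # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        ap = matvec(rows, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Four elements of unit stiffness, unit force on the free end.
K = assemble_chain(4)
u, iterations = pcg_jacobi(K, [0.0, 0.0, 0.0, 1.0])
print(u, iterations)
```

For this load the exact displacements grow linearly along the chain (1, 2, 3, 4), which gives a quick check on both the assembly and the solver. A production code would use a stronger preconditioner, such as the incomplete Cholesky variant the abstract describes, since Jacobi scaling does little for ill-conditioned stiffness matrices.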
