Similar articles (20 found)
1.
2.
Intel's latest Xeon Phi processor, Knights Landing (KNL), has the potential to provide over 2.6 TFLOPS. However, to obtain maximum performance on the KNL, significant refactoring and optimization of application codes are still required to exploit the key architectural innovations that KNL features: wide vector units, a many‐core node design, and a deep memory hierarchy. The experience and insights gained in porting and running FEFLO (a typical edge‐based finite element code for the solution of compressible and incompressible flows) on the KNL platform are described in this paper. In particular, optimizations used to extract on‐node parallelism via vectorization and multithreading and to improve internode communication are considered. These optimizations resulted in a 2.3× performance gain on 16‐node runs of FEFLO, with the potential for larger performance gains as the code is scaled beyond 16 nodes. The impact of the different configurations of KNL's on‐package MCDRAM (Multi‐Channel DRAM) memory on FEFLO's performance is also explored. Finally, the performance of the optimized versions of FEFLO for KNL and Haswell (Intel Xeon) is compared.

3.
Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three‐dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per‐block shared memory is also considered. The performance of the solver is demonstrated on two benchmark cases: a NACA0012 wing and a missile. For a variety of mesh sizes, an average speed‐up factor of roughly 9.5× is observed over the equivalent parallelized OpenMP code running on a quad‐core CPU, and roughly 33× over the equivalent code running in serial. Copyright © 2010 John Wiley & Sons, Ltd.

4.
In this paper, the OpenACC heterogeneous parallel programming model is successfully applied to modify and accelerate the three-dimensional Tokamak magnetohydrodynamical code (CLT). By combining OpenACC and MPI, CLT is further parallelised across multiple GPUs. Significant speedup ratios are achieved on NVIDIA TITAN Xp and TITAN V GPUs, respectively, with very few modifications of CLT. Furthermore, the validity of the double precision calculations on these two graphics cards has been rigorously verified with the m/n = 2/1 resistive tearing mode instability in a Tokamak.

5.
Local isotropy theory is examined using direct numerical simulation in a fully developed pipe flow at two Reynolds numbers, Re_τ = 1285.6 and 684.8. The approach to local isotropy is assessed with reference to the two classical Kolmogorov equations for longitudinal and transverse velocity structure functions. The results for the second‐order longitudinal structure functions in both the dissipative and inertial ranges indicate an improved agreement with the local isotropy hypothesis as the centreline is approached. However, the transverse structure functions satisfy isotropy in neither the dissipative range nor the inertial range. The distribution of the longitudinal and transverse structure functions also shows a substantial Reynolds number dependence in the logarithmic region of the flow and beyond. The results for the third‐order longitudinal structure function demonstrate an increased Reynolds number influence, and a deteriorating tendency to local isotropy for large separations. Contour images of axial velocity differences in the dissipative and inertial ranges exhibit interesting patterns in relation to those of the instantaneous axial velocity. Finally, the results obtained in this investigation are in very good agreement with other published experimental and numerical data on channel and duct flows. Copyright © 2010 John Wiley & Sons, Ltd.

6.
We describe the performance of Chicoma, a 3D unstructured mesh compressible flow solver, on graphics processing unit (GPU) hardware. The approach used to deploy the solver on GPU architectures derives from the threaded multicore execution model used in Chicoma, and attempts to improve memory performance via the application of graph theory techniques. The result is a scheme that can be deployed on the GPU with high‐level programming constructs, for example, compiler directives, rather than low‐level programming extensions. With an NVIDIA Fermi‐class GPU (NVIDIA Corp., Santa Clara, CA, USA) and double precision floating point arithmetic, we observe performance gains of 4–5× on problem sizes of 10^6–10^7 tetrahedra. We also compare GPU performance to threaded multicore performance with OpenMP and demonstrate hybrid multicore‐GPU calculations with adaptive mesh refinement. Published 2012. This article is a US Government work and is in the public domain in the USA.

7.
The accuracy of drag prediction in the unstructured mesh CFD solver of the TAS (Tohoku University Aerodynamic Simulation) code is discussed using a drag decomposition method. The method decomposes total drag into wave, profile, induced and spurious drag components, the last resulting from numerical diffusion and errors. A mesh resolution analysis is conducted with the drag decomposition method. The effect of U‐MUSCL reconstruction, an advanced unstructured mesh scheme, is also investigated with the same method. The computational results show that the drag decomposition method reliably predicts drag and is capable of meaningful drag decomposition. The accuracy of drag prediction is increased by eliminating the spurious drag component from the total drag. It is also confirmed that the physical drag components are almost independent of the mesh resolution and scheme modification. Copyright © 2007 John Wiley & Sons, Ltd.

8.
The aim of this work is to present a new model, based on the volume of fluid method and the algebraic slip mixture model, for solving multiphase gas–fluid flows with different interface scales and the transitions among them. The interface scale is characterized by a measure of the grid, which acts as a geometrical filter and is related to the accuracy of the solution; in this sense, the presented coupled model allows the grid requirements to be reduced for a given accuracy. With this objective in mind, a generalization of the algebraic slip mixture model is proposed to solve problems involving small‐scale and large‐scale interfaces in a unified framework, taking special care to preserve the conservativeness of the fluxes. This model is implemented using the OpenFOAM® libraries to generate a tool capable of solving large problems on high‐performance computing facilities. Several examples are solved as a validation of the presented model, including new quantitative measurements to assess the advantages of the method. Copyright © 2014 John Wiley & Sons, Ltd.

9.
An effective way of using computational fluid dynamics (CFD) to simulate flow about a rotating device (for example, a wind or marine turbine) is to embed a rotating region of cells inside a larger, stationary domain, with a sliding interface between them. This paper describes a simple but effective method for implementing this as an internal Dirichlet boundary condition, with interfacial values obtained by interpolation from halo nodes. The method is tested in two finite‐volume codes: one using block‐structured meshes and the other unstructured meshes. Validation is performed for flow around simple, isolated, rotating shapes (cylinder, sphere and cube), comparing, where possible, with experiment and the alternative CFD approach of a fixed grid with moving walls. Flow variables are shown to vary smoothly across the sliding interface. Simulations of a tidal‐stream turbine, including both rotor and support, are then performed and compared with towing‐tank experiments. Comparison between CFD and experiment is made for thrust and power coefficients as a function of tip‐speed ratio (TSR) using Reynolds‐averaged Navier–Stokes turbulence models and large‐eddy simulation (LES). Performance of most models is good near the optimal TSR, but simulations underestimate mean thrust and power coefficients in off‐design conditions, with the standard k–ε turbulence model performing noticeably worse than the shear stress transport k–ω and Reynolds‐stress‐transport closures. LES gave good predictions of mean load coefficients and vital information about wake structures but at substantial computational cost. Grid‐sensitivity studies suggest that Reynolds‐averaged Navier–Stokes models give acceptable predictions of mean power and thrust coefficients on a single device using a mesh of about 4 million cells. Copyright © 2013 John Wiley & Sons, Ltd.

10.
11.
The application of unsteady computational fluid dynamics (CFD) codes to aeroelastic calculations leads to a large number of degrees of freedom, making them computationally expensive. Reduced‐order models (ROMs) have therefore been developed; an ROM is a system of equations which is able to reproduce the solutions of the full set of equations with reasonable accuracy, but which is of lower order. ROMs have been the focus of research in various engineering situations, but it is only relatively recently that such techniques have begun to be introduced into CFD. In order for the reduced systems to be generally applicable to aeroelastic calculations, it is necessary to have continuous time models that can be put into discrete form for different time steps. While some engineering reduction schemes can produce time‐continuous models directly, the majority of methods reported in CFD initially produce discrete time or discrete frequency models. Such models are restricted in their applicability, and in order to overcome this situation, a continuous time ROM must be extracted from the discrete time system. This process can most simply be achieved by inverting the transformation from continuous to discrete time that was initially used to discretize the CFD scheme. However, an alternative method reported in the literature is based on continuous time sampling, even when this is not used for the initial discretization of the CFD code. This paper focuses on one particular method for ROM generation, the eigensystem realization algorithm (ERA), that has been used in the CFD field. This is implemented to produce a discrete time ROM from a standard CFD code, which can be used to investigate methods for obtaining continuous ROMs and the limitations of the resulting models. Copyright © 2006 John Wiley & Sons, Ltd.
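
As a rough illustration of "inverting the transformation from continuous to discrete time", the sketch below assumes the discrete model was obtained with the bilinear (Tustin) transform. That choice is an assumption made here for illustration only; the abstract does not say which discretization the CFD code uses. The function names are hypothetical, and only the state matrix is handled.

```python
import numpy as np


def tustin_discretize(a_cont, dt):
    """Discrete state matrix from a continuous one via the bilinear (Tustin) map.

    A_d = (I - dt/2 * A_c)^{-1} (I + dt/2 * A_c)
    (Assumed discretization; the source does not name the scheme.)
    """
    eye = np.eye(a_cont.shape[0])
    return np.linalg.solve(eye - 0.5 * dt * a_cont, eye + 0.5 * dt * a_cont)


def tustin_to_continuous(a_disc, dt):
    """Invert the bilinear map: A_c = (2/dt) (A_d - I) (A_d + I)^{-1}."""
    eye = np.eye(a_disc.shape[0])
    return (2.0 / dt) * (a_disc - eye) @ np.linalg.inv(a_disc + eye)


def rediscretize(a_disc, dt_old, dt_new):
    """Move a discrete state matrix from one time step to another
    by passing through the recovered continuous model."""
    return tustin_discretize(tustin_to_continuous(a_disc, dt_old), dt_new)
```

Once the continuous matrix is recovered, it can be discretized again at a different time step, which is the property the abstract says aeroelastic calculations need.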

12.
Several next generation high performance computing platforms are or will be based on the so‐called many‐core architectures, which represent a significant departure from commodity multi‐core architectures. A key issue in transitioning large‐scale simulation codes from multi‐core to many‐core systems is closing the serial performance gap, that is, overcoming the large difference in single‐core performance between multi‐core and many‐core systems. In this paper, we discuss how this problem was addressed for a 3D unstructured mesh hydrodynamics code, describe how Amdahl's law can be used to estimate performance targets and guide optimization efforts, and present timing studies performed on multi‐core and many‐core platforms. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.
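
To make the role of Amdahl's law concrete, here is a minimal sketch of how it can turn a speedup target into a requirement on the parallel fraction of a code. It is not taken from the paper; the function names are hypothetical and the model is the textbook form of the law, ignoring communication and memory effects.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Ideal speedup S = 1 / ((1 - p) + p / n) for parallel fraction p on n cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def required_parallel_fraction(target_speedup: float, n_cores: int) -> float:
    """Invert Amdahl's law: the parallel fraction needed to reach a target speedup.

    From 1/S = 1 - p (1 - 1/n) it follows that p = (1 - 1/S) / (1 - 1/n).
    """
    if n_cores < 2:
        raise ValueError("n_cores must be at least 2")
    if not 1.0 <= target_speedup <= n_cores:
        raise ValueError("target_speedup must lie in [1, n_cores]")
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / n_cores)
```

Calling `required_parallel_fraction` with the core count of a many‐core node gives a performance target in the spirit of the abstract: the larger the core count, the smaller the serial remainder the code can afford.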

13.
It has been well established that large‐scale structures, usually called coherent structures, exist in many transitional and turbulent flows. The topology and range of scales of those large‐scale structures vary from flow to flow, for example counter‐rotating vortices in wake flows, and streaks and hairpin vortices in turbulent boundary layers. There has been relatively little study of large‐scale structures in separated and reattached transitional flows. Large‐eddy simulation (LES) is employed in the current study to investigate a separated boundary layer transition under 2% free‐stream turbulence on a flat plate with a blunt leading edge. The Reynolds number based on the inlet free‐stream velocity and the plate thickness is 6500. A dynamic subgrid‐scale model is employed to compute the subgrid‐scale stresses more accurately in the current transitional flow case. Flow visualization has shown that the Kelvin–Helmholtz rolls, which are so clearly visible under no free‐stream turbulence (NFST), are not as apparent in the present study. The Lambda‐shaped vortical structures, which can be clearly seen in the NFST case, can hardly be identified in the free‐stream turbulence (FST) case. Generally speaking, free‐stream turbulence leads to an early breakdown of the boundary layer, and hence increases the randomization of the vortical structures and degrades the spanwise coherence of those large‐scale structures. Copyright © 2005 John Wiley & Sons, Ltd.

14.
Legacy codes remain a crucial element of today's simulation-based engineering ecosystem due to the extensive validation process and investment in such software. The rapid evolution of high-performance computing architectures necessitates the modernization of these codes. One approach to modernization is a complete overhaul of the code. However, this could require extensive investments, such as rewriting in modern languages, new data constructs, etc., which will necessitate systematic verification and validation to re-establish the credibility of the computational models. The current study advocates a more incremental approach and is the culmination of several modernization efforts on the legacy code MFIX, an open-source computational fluid dynamics code that has evolved over several decades, is widely used for multiphase flows, and is still being developed by the National Energy Technology Laboratory. Two different modernization approaches, 'bottom-up' and 'top-down', are illustrated. Preliminary results show up to 8.5x improvement at the selected kernel level with the first approach, and up to 50% improvement in total simulated time with the latter, for the demonstration cases and target HPC systems employed.

15.
Reduced‐Order Models (ROMs) have been the focus of research in various engineering situations, but it is only relatively recently that such techniques have begun to be introduced into the CFD field. The purpose of generating such models is to capture the dominant dynamics of the full set of CFD equations, but at much lower cost. One method that has been successfully implemented in the field of fluid flows is based on the calculation of the linear pulse responses of the CFD scheme coupled with an Eigensystem Realization Algorithm (ERA), resulting in a compact aerodynamic model. The key to the models is the identification of the linear responses of the non‐linear CFD code. Two different methods have been developed and reported in the literature for linear response identification: the first method linearizes the CFD code and the second method uses Volterra theory and the non‐linear code. As these methods were developed independently, they have not previously been brought together and compared. This paper first explains the subtle but fundamental differences between the two methods. In addition, a series of test cases are shown to demonstrate the performance and drawbacks of the ROMs derived from the different linear responses. The conclusions of this study provide useful guidance for the implementation of either of the two approaches to obtain the linear responses of an existing CFD code. Copyright © 2006 John Wiley & Sons, Ltd.
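
The step from a pulse response to a compact state‐space model can be sketched with the textbook single‐input, single‐output form of ERA. This is a generic illustration under stated assumptions, not the authors' implementation: the function names, the square Hankel matrices and the scalar input and output are all choices made here.

```python
import numpy as np


def pulse_response(a, b, c, d, n_steps):
    """Unit-pulse response of x[k+1] = A x[k] + B u[k], y[k] = C x[k] + D u[k].

    y[0] = D and y[k] = C A^(k-1) B for k >= 1.
    """
    y = np.empty(n_steps)
    y[0] = d
    x = b.copy()
    for k in range(1, n_steps):
        y[k] = (c @ x).item()
        x = a @ x
    return y


def era_siso(pulse, order):
    """Identify a discrete state-space model (A, B, C, D) of the given order
    from a scalar pulse response, using the Eigensystem Realization Algorithm."""
    y = np.asarray(pulse, dtype=float)
    m = (len(y) - 1) // 2  # Hankel size; the largest sample index used is 2*m
    if order < 1 or order > m:
        raise ValueError("order must lie between 1 and (len(pulse) - 1) // 2")

    # Hankel matrices of Markov parameters: H0[i, j] = y[i+j+1], H1[i, j] = y[i+j+2]
    idx = np.arange(m)[:, None] + np.arange(m)[None, :]
    h0 = y[idx + 1]
    h1 = y[idx + 2]

    # A truncated SVD of H0 keeps the `order` dominant modes
    u, s, vt = np.linalg.svd(h0)
    u, s, vt = u[:, :order], s[:order], vt[:order, :]
    sqrt_s = np.sqrt(s)

    # A = S^(-1/2) U^T H1 V S^(-1/2); B and C come from the balanced factors of H0
    a = (u.T @ h1 @ vt.T) / sqrt_s[:, None] / sqrt_s[None, :]
    b = (sqrt_s[:, None] * vt)[:, :1]
    c = (u * sqrt_s[None, :])[:1, :]
    d = y[0]
    return a, b, c, d
```

In a CFD setting the `pulse` array would hold the sampled linear response of an aerodynamic output to a unit pulse in one input; how that response is obtained (linearized code or Volterra theory) is the subject of the abstract and is not modelled here.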

16.
The use of ILU(0) factorization as a preconditioner is quite frequent when solving linear systems of CFD computations. This is because of its efficiency and moderate memory requirements. For a small number of processors, this preconditioner, parallelized through coloring methods, shows little savings when compared with a sequential one using adequate reordering of the unknowns. Level scheduling techniques are applied to obtain the same preconditioning efficiency as in a sequential case, while taking advantage of parallelism through block algorithms. Numerical results obtained from the parallel solution of the compressible Navier–Stokes equations show that this technique gives interesting savings in computational times on a small number of processors of shared‐memory computers. In addition, it does this while keeping all the benefits of an ILU(0) factorization with an adequate reordering of the unknowns, and without the loss of efficiency of factorization associated with a more scalable coloring strategy. Copyright © 1999 John Wiley & Sons, Ltd.
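
The idea behind level scheduling can be sketched on a sparse lower‐triangular solve, the kernel that applying an ILU(0) preconditioner repeats. The code below is a generic illustration, not the block algorithm of the paper: it assumes the matrix is stored in CSR form with an explicit diagonal, and the function names are hypothetical.

```python
def level_schedule(indptr, indices):
    """Group the rows of a sparse lower-triangular CSR matrix into levels.

    Row i belongs to level 1 + max(level of every row j < i it depends on),
    or to level 0 if it has no off-diagonal entries. Rows in the same level
    do not depend on each other and could be processed concurrently.
    """
    n = len(indptr) - 1
    level = [0] * n
    for i in range(n):
        deps = [level[j] for j in indices[indptr[i]:indptr[i + 1]] if j < i]
        level[i] = 1 + max(deps) if deps else 0
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def forward_solve_by_level(indptr, indices, data, b, groups):
    """Solve L x = b level by level (serial here; the inner loop over `rows`
    is the part a threaded implementation would distribute)."""
    x = [0.0] * len(b)
    for rows in groups:
        for i in rows:
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif j < i:
                    acc -= data[k] * x[j]
            if diag is None:
                raise ValueError(f"row {i} has no diagonal entry")
            x[i] = acc / diag
    return x
```

Because the unknowns are visited in an order compatible with the sequential one, the result matches the sequential solve exactly, which is the property the abstract contrasts with coloring strategies that change the factorization.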

17.
The use of natural gas (instead of liquid or solid fuels) is attracting increasing interest in many applications (gas turbines, boilers, internal combustion engines) because of the greater attention paid to environmental issues. To facilitate the development of these applications, computer models are being developed to simulate gaseous injection, air entrainment and the ensuing combustion. This paper introduces a new method for modelling the injection process of gaseous fuels that aims to hold down grid requirements, so that other phenomena, such as combustion or valve and piston motion in reciprocating internal combustion engines, can be simulated as well. After a short overview of existing models, the transient jet model and the evaluation of inflow conditions are described in detail. Then a basic study of the grid effects on the jet evolution is presented. The model is updated and validated by comparing numerical results with available experimental data for two different operating conditions: a subsonic and a supersonic under‐expanded case. The model proves fast enough to be used in a multi‐dimensional code and accurate enough to follow the real gas jet evolution. Copyright © 2009 John Wiley & Sons, Ltd.

18.
Simulation of nano‐scale channel flows using a coupled Navier–Stokes/Molecular Dynamics (MD) method is presented. The flow cases serve as examples of the application of a multi‐physics computational framework put forward in this work. The framework employs a set of (partially) overlapping sub‐domains in which different levels of physical modelling are used to describe the flow. This way, numerical simulations based on the Navier–Stokes equations can be extended to flows in which the continuum and/or Newtonian flow assumptions break down in regions of the domain, by locally increasing the level of detail in the model. Then, the use of multiple levels of physical modelling can reduce the overall computational cost for a given level of fidelity. The present work describes the structure of a parallel computational framework for such simulations, including details of a Navier–Stokes/MD coupling, the convergence behaviour of coupled simulations as well as the parallel implementation. For the cases considered here, micro‐scale MD problems are constructed to provide viscous stresses for the Navier–Stokes equations. The first problem is the planar Poiseuille flow, for which the viscous fluxes on each cell face in the finite‐volume discretization are evaluated using MD. The second example deals with fully developed three‐dimensional channel flow, with molecular level modelling of the shear stresses in a group of cells in the domain corners. An important aspect in using shear stresses evaluated with MD in Navier–Stokes simulations is the scatter in the data due to the sampling of a finite ensemble over a limited interval. In the coupled simulations, this prevents the convergence of the system in terms of the reduction of the norm of the residual vector of the finite‐volume discretization of the macro‐domain. Solutions to this problem are discussed in the present work, along with an analysis of the effect of number of realizations and sample duration. The averaging of the apparent viscosity for each cell face, i.e. the ratio of the shear stress predicted from MD and the imposed velocity gradient, over a number of macro‐scale time steps is shown to be a simple but effective method to reach a good level of convergence of the coupled system. Finally, the parallel efficiency of the developed method is demonstrated. Copyright © 2009 John Wiley & Sons, Ltd.
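
The averaging described above can be sketched for a single cell face as follows. The abstract only states that the apparent viscosity is averaged over a number of macro‐scale time steps; the fixed‐length moving window, the class name and its interface are assumptions made here for illustration.

```python
from collections import deque


class ApparentViscosityAverager:
    """Running mean of the apparent viscosity on one cell face.

    The apparent viscosity is the ratio of the MD shear stress to the imposed
    velocity gradient. Averaging it over the last `window` macro-scale time
    steps damps the statistical scatter of the MD samples.
    (A moving window is an assumption; the source does not specify the scheme.)
    """

    def __init__(self, window: int):
        if window < 1:
            raise ValueError("window must be at least 1")
        self._samples = deque(maxlen=window)  # oldest sample drops out automatically

    def update(self, md_shear_stress: float, velocity_gradient: float) -> float:
        """Add the sample from the current macro time step and return the mean."""
        if velocity_gradient == 0.0:
            raise ValueError("velocity_gradient must be non-zero")
        self._samples.append(md_shear_stress / velocity_gradient)
        return sum(self._samples) / len(self._samples)
```

In a coupled solver, one such averager per MD‐coupled face would be updated every macro time step, and the returned value would replace the raw MD estimate when the viscous flux is assembled.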

19.
A computational study of a high‐fidelity, implicit large‐eddy simulation (ILES) technique with and without the dynamic Smagorinsky subgrid‐scale (SGS) model is conducted to examine the contribution of the SGS model to solutions of transitional flow over the SD7003 airfoil section. ILES without an SGS model has been shown in the past to produce results comparable, and sometimes favorable, to traditional SGS‐based large‐eddy simulation (LES) when applied to canonical turbulent flows. This paper evaluates the necessity of the SGS model for low‐Reynolds number airfoil applications to affirm the use of ILES without SGS modeling for a broader class of problems, such as those pertaining to micro air vehicles and low‐pressure turbines. It is determined that the addition of the dynamic Smagorinsky model does not significantly affect the time‐mean flow or statistical quantities measured around the airfoil section for the spatial resolutions and Reynolds numbers examined in this study. Additionally, the robustness and reduced computational cost of ILES without the SGS model demonstrate the attractiveness of ILES as an alternative to traditional LES. Published 2012. This article is a US Government work and is in the public domain in the USA.

20.
The direct injection of CO2 into the deep ocean is one feasible way to mitigate global warming, although there is concern about its environmental impact near the injection point. To minimize its biological impact, it is necessary to make the CO2 disperse as quickly as possible, and injection through a pipe towed by a moving ship is said to be effective for this purpose. Because the injection ship moves over a spatial scale of O(10^2 km), a mesoscale model is necessary to analyse the dispersion of CO2. At the same time, since it is important to investigate the high CO2 concentration near the injection point, a small‐scale model is also required. Therefore, in this study, a numerical model was developed to analyse CO2 dispersion in the deep ocean by using a fixed mesoscale grid system and a moving small‐scale grid system, the latter of which is nested in the former and moves along the trajectory of the ship. To overcome the artificial diffusion of mass concentration at the interface of the two grid systems and to keep the spatial accuracy almost the same as that of the small‐scale grid, a particle Laplacian method was adopted and newly modified for anisotropic diffusion in the ocean. Copyright © 2010 John Wiley & Sons, Ltd.
