Similar literature
 20 similar documents found (search time: 78 ms)
1.
The application of the Lanczos algorithm in Newton-like methods for solving nonlinear systems of equations arising in nonlinear structural finite element analysis is presented. It is shown that, with appropriate preconditioners, iterative methods can be developed which are robust and efficient even for ill-conditioned problems. Though the real advantage of iterative solvers seems to lie on distributed memory machines, even on serial machines the performance can be improved compared with direct solvers while saving memory. With a specific modification of the Lanczos algorithm in combination with arc-length procedures, a further speed-up of the nonlinear analysis can be achieved. For parallel implementations, domain decomposition methods are used. A parallel preconditioning strategy based on an incomplete factorisation method is presented. The quality and efficiency of two different domain decomposition methods are discussed for an example of a large shell structure. This work was supported by the BMBF (Bundesministerium für Bildung und Forschung) of Germany.

2.
We consider additive two‐level preconditioners, with a local and a global component, for the Schur complement system arising in non‐overlapping domain decomposition methods. We propose two new parallelizable local preconditioners. The first one is a computationally cheap but numerically relevant alternative to the classical block Jacobi preconditioner. The second one exploits all the information from the local Schur complement matrices and demonstrates attractive numerical behaviour on heterogeneous and anisotropic problems. We also propose two implementations based on approximate Schur complement matrices that are cheaper alternatives for constructing the given preconditioners but that preserve their good numerical behaviour. Through extensive computational experiments we study the numerical scalability and the robustness of the proposed preconditioners and compare their numerical performance with well‐known robust preconditioners such as BPS and the balancing Neumann–Neumann method. Finally, we describe a parallel implementation of some of the proposed techniques on distributed memory computers and report parallel performance. Copyright © 2001 John Wiley & Sons, Ltd.

3.
In this paper, we propose efficient parallel implementations of the auction/sequential shortest path and the ε-relaxation algorithms for solving the linear minimum cost flow problem. In the parallel auction algorithm, several augmenting paths can be found simultaneously, each of them starting from a different node with positive surplus. Convergence results for an asynchronous version of the algorithm are also given. For the ε-relaxation method, parallel versions already exist, implemented on the CM-5 and CM-2; our implementation is the first on a shared memory multiprocessor. We have obtained significant speedup values for the algorithms considered, which indicates that our implementations are effective and efficient.

4.
Finite element meshes and node numberings suitable for parallel solution with equally loaded processors are presented for linear orthotropic elliptic partial differential equations. These problems are of great importance, for instance in the oil and airfoil industries. The linear systems of equations are solved by the conjugate gradient method preconditioned by modified incomplete factorization (MIC). The basic method presented is based on fronts of uncoupled nodes and, unlike earlier methods, has the advantage of not requiring a specific orientation of the mesh. This method is, however, in general restricted to a small degree of anisotropy in the differential equation. Another method, which does not suffer from this limitation, uses rotation of the differential equation and spectral equivalence. The rotation is made in such a way that the basic method is applicable in the new co-ordinate system. The spectral equivalence property is used to estimate the condition number of the preconditioned system. Both methods are suitable for implementation on parallel computers. The computer architecture could be single instruction multiple data (SIMD) as well as multiple instruction multiple data (MIMD) with shared or distributed memory. Implementation of the basic method on a shared memory parallel computer shows a significant improvement from the MIC method compared with the diagonal scaling preconditioning method.

5.
In this paper, a parallel implementation of Wang’s method for solving tridiagonal systems of equations on a multiprocessor machine using the occam language is presented. The parallel algorithm has been designed for shared and distributed memory machines that support data parallelism and message passing. The overall performance of this implementation on each of 9 processors is given. The communication times are very important, and any improvement in communication would have a significant effect on the performance of the implementation. The significance of these results is discussed.

6.
1. Introduction. Consider the symmetric positive definite (SPD) systems of linear equations that arise in finite element discretizations of many second-order self-adjoint elliptic boundary value problems. To solve this class of linear systems iteratively, Axelsson and Vassilevski [1-4] presented the algebraic multilevel iteration (AMLI) methods by reasonably utilizing the multigrid technique and the polynomial acceleration strategy. These methods are among the most efficient iterative solvers because their preconditioning matrices are spectrally equivalent…

7.
The development of parallel simulation technology is seen as an enabler for the implementation of the virtual factory concept, the integrated simulation of all the systems in a factory. One important parallel simulation protocol, the asynchronous deadlock avoidance algorithm proposed by Chandy, Misra, and Bryant, has usually been discussed in the context of distributed memory systems. Also, null messages have normally been associated with this approach for deadlock avoidance. This paper presents a new implementation of the CMB protocol designed for shared memory multiprocessor systems. We have successfully used this protocol, which we call the CMB-SMP protocol, to achieve useful speedups in a manufacturing simulation application, despite the fine granularity of event processing. The implementation eliminates the need for sending null messages, without causing deadlock in the simulation. Double buffering is also used to reduce the overhead of buffer locking. It is shown that the CMB-SMP protocol outperforms a synchronous super-step protocol in terms of the speedups achieved. The paper also discusses the cache behaviour of the CMB-SMP protocol implementation since cache misses are very expensive with today's high clock speed processors.

8.
In this paper we study divisible load scheduling in systems with limited memory. Divisible loads are parallel computations which can be divided into independent parts processed in parallel on remote computers, and the part sizes may be arbitrary. The distributed system is a heterogeneous single-level tree. The total size of the processor memories is too small to accommodate the whole load at any moment of time. Therefore, the load is distributed in many rounds. Memory reservations have a block nature. The problem consists in distributing the load, taking into account communication time, computation time, and limited memory buffers, so that the whole processing finishes as early as possible. This problem is both combinatorial and algebraic in nature. Therefore, hybrid algorithms are given to solve it. Two algorithms are proposed to solve the combinatorial component. A branch-and-bound algorithm is nearly unusable due to its complexity. A genetic algorithm is then proposed, with more tractable execution times. For a given solution of the combinatorial part, we formulate the solution of the algebraic part as a linear programming problem. An extensive computational study is performed to analyze the impact of various system parameters on the quality of the solutions. From this we were able to draw inferences about the nature of the scheduling problem.

9.
The intention of the paper is to give an introduction to the OpTiX-II Software Environment, which supports the parallel and distributed solution of mathematical nonlinear programming problems. First, a brief summary of nonsequential solution concepts for nonlinear optimization on multiprocessor systems will be given. The focus of attention will be put on coarse-grained parallelization and its implementation on multi-computer clusters. The conceptual design objectives for the OpTiX-II Software Environment will be presented as well as the implementation on a workstation cluster, a transputer system and a multiprocessor workstation (shared memory). The OpTiX-II system supports the steps from the formulation of nonlinear optimization problems to their solution on networks of (parallel) computers. In order to demonstrate the use of OpTiX-II, the solution of a nonlinear optimization problem from the field of structural design is discussed and some numerical test results are supplied.

10.
The paper presents a set of parallel iterative solvers and preconditioners for the efficient solution of systems of linear equations arising in high-order finite element approximations of boundary value problems for the 3-D time-harmonic Maxwell equations on unstructured tetrahedral grids. Balancing geometric domain decomposition techniques, combined with an algebraic multigrid approach and coarse-grid correction using hierarchic basis functions, are exploited to achieve high solver performance and a low memory load on supercomputers with shared and distributed memory. Test results for model and real-life problems show the efficiency and scalability of the presented algorithms.

11.
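Zhang Sheng, 《计算数学》 (Mathematica Numerica Sinica), 1993, 15(2): 235-241
§0. Introduction. Domain decomposition is closely related to the mathematical foundations of parallel computation for the numerical solution of differential equations. The preconditioned conjugate gradient method is one of the main approaches to domain decomposition, and finding a good preconditioner is the key issue. This paper gives a fairly general method: the preconditioning process consists of one small global problem and several independent local subproblems, and there is great freedom in choosing both the global and the local problems. The estimate of the preconditioned condition number is determined by certain properties of the global and local problems…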
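
As an illustration of the preconditioned conjugate gradient iteration that this abstract builds on, here is a minimal sketch in plain Python. It is not the method of the cited paper: the function names (`pcg`, `jacobi_preconditioner`, `dot`, `matvec`) are hypothetical, and the diagonal (Jacobi) preconditioner is only a stand-in for the paper's combination of one small global problem and independent local subproblems, which would be supplied through the same `apply_prec` argument.

```python
def dot(u, v):
    """Euclidean inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def jacobi_preconditioner(A):
    """Placeholder preconditioner (an assumption, not the paper's):
    returns a function that divides the residual by the diagonal of A."""
    diag = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diag)]


def pcg(A, b, apply_prec, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for a symmetric positive
    definite matrix A. apply_prec(r) must return an approximation of
    A^{-1} r. Returns the approximate solution and the iteration count."""
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for the zero start vector
    z = apply_prec(r)           # preconditioned residual
    p = list(z)                 # first search direction
    rz = dot(r, z)
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, it
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_prec(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Passing the preconditioner as a function keeps the iteration independent of how the preconditioner is built, so a domain decomposition preconditioner could replace the diagonal one without changing `pcg`.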

12.
In this paper, we address the problem of solving sparse symmetric linear systems on parallel computers. With further restrictive assumptions on the matrix (e.g., bidiagonal or tridiagonal structure), several direct methods may be used. These methods give ideas for constructing efficient data parallel preconditioners for general positive definite symmetric matrices. We describe two examples of such preconditioners for which the factorization (i.e., the construction of the preconditioning matrix) turns out to be parallel. This revised version was published online in June 2006 with corrections to the Cover Date.

13.
The inherent structure of cellular automata is trivially parallelizable and can directly benefit from massively parallel machines in computationally intensive problems. This paper presents both block synchronous and block pipeline (with asynchronous message passing) parallel implementations of cellular automata on distributed memory (message-passing) architectures. A structural design problem is considered to study the performance of the various cellular automata implementations. The synchronous parallel implementation is a mixture of Jacobi and Gauss–Seidel style iteration, which becomes more Jacobi-like as the number of processors increases. Therefore, as the number of processors increases, it exhibits divergence because of the mathematical characteristics of the Jacobi iteration matrix for the structural problem. The proposed pipeline implementation preserves convergence by simulating a pure Gauss–Seidel style row-wise iteration. Numerical results for the analysis and design of a cantilever plate made of composite material show that the pipeline update scheme is convergent and successfully generates optimal designs.
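
The mixture of Jacobi and Gauss–Seidel updates described in this abstract can be sketched for a small dense linear system. This is a serial illustration under stated assumptions, not the paper's cellular automata code: the names `hybrid_sweep` and `iterate` are hypothetical, and each "processor" is modelled as a contiguous block of rows that uses fresh values inside its own block and values from the previous sweep everywhere else.

```python
def hybrid_sweep(A, b, x, num_blocks):
    """One sweep of a block-hybrid iteration for A x = b.

    Rows are split into num_blocks contiguous blocks. Inside a block the
    update is Gauss-Seidel style (values already updated in this sweep
    are reused); across blocks it is Jacobi style (only values from the
    previous sweep are visible)."""
    n = len(b)
    x_old = list(x)   # state at the start of the sweep
    x_new = list(x)   # state being built during the sweep
    bounds = [k * n // num_blocks for k in range(num_blocks + 1)]
    for k in range(num_blocks):
        lo, hi = bounds[k], bounds[k + 1]
        for i in range(lo, hi):
            s = 0.0
            for j in range(n):
                if j == i:
                    continue
                # Own block: current values; other blocks: previous sweep.
                xj = x_new[j] if lo <= j < hi else x_old[j]
                s += A[i][j] * xj
            x_new[i] = (b[i] - s) / A[i][i]
    return x_new


def iterate(A, b, num_blocks, sweeps):
    """Run a fixed number of sweeps from the zero vector."""
    x = [0.0] * len(b)
    for _ in range(sweeps):
        x = hybrid_sweep(A, b, x, num_blocks)
    return x
```

With `num_blocks=1` this is a pure Gauss–Seidel sweep, and with `num_blocks` equal to the number of unknowns it is a pure Jacobi sweep. For the 3×3 symmetric positive definite matrix with unit diagonal and all off-diagonal entries 0.8, the one-block iteration converges, whereas the three-block iteration diverges because the Jacobi iteration matrix has spectral radius 1.6.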

14.
We propose a new preconditioner, DASP (discrete approximate spectral preconditioner), based on existing well-known preconditioners and our computational experience. Parallel preconditioning strategies for large-scale partial difference equation systems arising from partial differential equations are investigated. Numerical results are given to show the efficiency and effectiveness of the new preconditioner for both model problems and real applications in petroleum reservoir simulation.

15.
Computers with multiple processor cores using shared memory are now ubiquitous. In this paper, we present several parallel geometric algorithms that specifically target this environment, with the goal of exploiting the additional computing power. The algorithms we describe are (a) 2-/3-dimensional spatial sorting of points, as is typically used for preprocessing before using incremental algorithms, (b) d-dimensional axis-aligned box intersection computation, and finally (c) 3D bulk insertion of points into Delaunay triangulations, which can be used for mesh generation algorithms, or simply for constructing 3D Delaunay triangulations. For the latter, we introduce as a foundational element the design of a container data structure that both provides concurrent addition and removal operations and is compact in memory. This makes it especially well-suited for storing large dynamic graphs such as Delaunay triangulations. We show experimental results for these algorithms, using our implementations based on the Computational Geometry Algorithms Library (CGAL). This work is a step towards what we hope will become a parallel mode for CGAL, where algorithms automatically use the available parallel resources without requiring significant user intervention.

16.
Optimal control problems constrained by a partial differential equation (PDE) arise in various important applications, such as in engineering and the natural sciences. Normally the problems are of very large scale, so iterative solution methods must be used. The choice of an iteration method in conjunction with an efficient preconditioner is therefore essential. In this paper, we consider a new iteration method and a new preconditioning technique for an elliptic PDE-constrained optimal control problem with a distributed control function. Some iteration methods and preconditioners used earlier in the literature are compared, both analytically and numerically, with the new iteration method and preconditioner.

17.
Two kinds of parallel preconditioners for the solution of large sparse linear systems which arise from the 2-D 5-point finite difference discretization of a convection-diffusion equation are introduced. The preconditioners are based on the SSOR or MILU preconditioners and can be implemented on parallel computers with distributed memories. One is the block preconditioner, in which the interface components of the coefficient matrix between blocks are ignored to attain parallelism in the forward-backward substitutions. The other is the modified block preconditioner, in which the block preconditioner is modified by taking the interface components into account. The effect of these preconditioners on the convergence of preconditioned iterative methods and timing results on the parallel computer (Cenju) are presented.

18.
This paper is concerned with the numerical solution of a symmetric indefinite system which is a generalization of the Karush–Kuhn–Tucker system. Following the recent approach of Lukšan and Vlček, we propose to solve this system by a preconditioned conjugate gradient (PCG) algorithm and we devise two indefinite preconditioners with good theoretical properties. In particular, for one of these preconditioners, the finite termination property of the PCG method is stated. The PCG method combined with a parallel version of these preconditioners is used as the inner solver within an inexact Interior‐Point (IP) method for the solution of large and sparse quadratic programs. The numerical results obtained by a parallel code implementing the IP method on distributed memory multiprocessor systems enable us to confirm the effectiveness of the proposed approach for problems with special structure in the constraint matrix and in the objective function. Copyright © 2002 John Wiley & Sons, Ltd.

19.
In mining supply chains, large combinatorial optimization problems arise. These are NP-hard and typically require a large amount of computing resources to solve. In particular, the run-time overheads can become increasingly prohibitive with increasing problem sizes. Parallel methods provide a way to manage such run-time issues by utilising several processors in independent or shared memory architectures. However, it is not obvious how to adapt serial optimisation algorithms to perform best in a parallel environment. Here, we consider a resource constrained scheduling problem which is motivated by mining supply chains, present two popular meta-heuristics, ant colony optimization (ACO) and simulated annealing, and investigate how best to parallelize these methods on a shared memory architecture consisting of several cores. ACO’s solution construction framework is inherently parallel, allowing a relatively straightforward parallel implementation. However, for best performance, ACO needs an element of local search. This significantly complicates the parallelization. Several alternative schemes for parallel ACO with elements of local search are considered and evaluated empirically. We find that ACO with local search is the most effective single-threaded algorithm. The best parallel implementation can obtain results of similar quality to the serial method in significantly less elapsed time.

20.
This paper deals with the preconditioning of the curl-curl operator. We use H(curl)-conforming finite elements for the discretization of our corresponding magnetostatic model problem. Jumps in the material parameters influence the condition of the problem. We will demonstrate by theoretical estimates and numerical experiments that hierarchical matrices are well suited to construct efficient parallel preconditioners for the fast and robust iterative solution of such problems.
