首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
For the parallel integration of stiff initial value problems (IVPs) three main approaches can be distinguished: approaches based on parallelism across the problem, on parallelism across the method and on parallelism across the steps. The first type of parallelism does not require special integration methods can be exploited within any available IVP solver. The methodparallel approach received some attention in the case of Runge-Kutta based methods. For these methods, the required number of processors is roughly half the order of the generating Runge-Kutta method and the speed-up with respect to a good sequential IVP solver is about a factor 2. The third type of parallelism (step-parallelism) can be achieved in any IVP solver based on predictor-corrector iteration. Most step-parallel methods proposed so far employ a large number of processors, but lack the property of robustness, due to a poor convergence behaviour in the iteration process. Hence, the effective speed-up is rather poor. The step-parallel iteraction process proposed in the present paper is less massively parallel, but turns out to be sufficiently robust to solve the four-stage Radau IIA corrector used in our experiments within a few effective iterations per step and to achieve speed-up factors up to 10 with respect to the best sequential codes.The research reported in this paper was partly supported by the Technology Foundation (STW) in the Netherlands.  相似文献   

2.
Multi-step quasi-Newton methods for optimization   总被引:4,自引:0,他引:4  
Quasi-Newton methods update, at each iteration, the existing Hessian approximation (or its inverse) by means of data deriving from the step just completed. We show how “multi-step” methods (employing, in addition, data from previous iterations) may be constructed by means of interpolating polynomials, leading to a generalization of the “secant” (or “quasi-Newton”) equation. The issue of positive-definiteness in the Hessian approximation is addressed and shown to depend on a generalized version of the condition which is required to hold in the original “single-step” methods. The results of extensive numerical experimentation indicate strongly that computational advantages can accrue from such an approach (by comparison with “single-step” methods), particularly as the dimension of the problem increases.  相似文献   

3.
In this paper we propose a new framework for designing a delay differential equation (DDE) solver which works with any supplied initial value problem (IVP) solver that is based on a standard step-by-step approach, such as Runge-Kutta or linear multi-step methods, and can provide dense output. This is done by treating a general DDE as a special example of a discontinuous IVP. Using this interpretation we develop an efficient technique to solve the resulting discontinuous IVP. We also give a more clear process for the numerical techniques used when solving the implicit equations that arise on a time step, such as when the underlying IVP solver is implicit or the delay vanishes. The new modular design for the resulting simulator we introduce, helps to accelerate the utilization of advances in the different components of an effective numerical method. Such components include the underlying discrete formula, the interpolant for dense output, the strategy for handling discontinuities and the iteration scheme for solving any implicit equations that arise.  相似文献   

4.
The extended backward differentiation formulas (EBDFs) and theirmodified form (MEBDF) were proposed by Cash in the 1980s forsolving initial value problems (IVPs) for stiff systems of ordinarydifferential equations (ODEs). In a recent performance evaluationof various IVP solvers, including a variable-step-variable-orderimplementation of the MEBDF method by Cash, it turned out thatthe MEBDF code often performs more efficiently than codes likeRADAU5, DASSL and VODE. This motivated us to look at possibleparallel implementations of the MEBDF method. Each MEBDF stepessentially consists of successively solving three non-linearsystems by means of modified Newton iteration using the sameJacobian matrix. In a direct implementation of the MEBDF methodon a parallel computer system, the only scope for (coarse grain)parallelism consists of a number of parallel vector updates.However, all forward–backward substitutions and all right-hand-sideevaluations have to be done in sequence. In this paper, ourstarting point is the original (unmodified) EBDF method. Asa consequence, two different Jacobian matrices are involvedin the modified Newton method, but on a parallel computer system,the effective Jacobian-evaluation and the LU decomposition costsare not increased. Furthermore, we consider the simultaneoussolution, rather than the successive solution, of the threenon-linear systems, so that in each iteration the forward–backwardsubstitutions and the right-hand-side evaluations can be doneconcurrently. A mutual comparison of the performance of theparallel EBDF approach and the MEBDF approach shows that wecan expect a speed-up factor of about 2 on three processors.  相似文献   

5.
Parallel linear system solvers for Runge-Kutta methods   总被引:1,自引:0,他引:1  
If the nonlinear systems arising in implicit Runge-Kutta methods like the Radau IIA methods are iterated by (modified) Newton, then we have to solve linear systems whose matrix of coefficients is of the form I-A hJ with A the Runge-Kutta matrix and J an approximation to the Jacobian of the righthand side function of the system of differential equations. For larger systems of differential equations, the solution of these linear systems by a direct linear solver is very costly, mainly because of the LU-decompositions. We try to reduce these costs by solving the linear systems by a second (inner) iteration process. This inner iteration process is such that each inner iteration again requires the solution of a linear system. However, the matrix of coefficients in these new linear systems is of the form I - B hJ where B is similar to a diagonal matrix with positive diagonal entries. Hence, after performing a similarity transformation, the linear systems are decoupled into s subsystems, so that the costs of the LU-decomposition are reduced to the costs of s LU-decompositions of dimension d. Since these LU-decompositions can be computed in parallel, the effective LU-costs on a parallel computer system are reduced by a factor s 3 . It will be shown that matrices B can be constructed such that the inner iterations converge whenever A and J have their eigenvalues in the positive and nonpositive halfplane, respectively. The theoretical results will be illustrated by a few numerical examples. A parallel implementation on the four-processor Cray-C98/4256 shows a speed-up ranging from at least 2.4 until at least 3.1 with respect to RADAU5 applied in one-processor mode.  相似文献   

6.
A key algorithmic element of a real-time trajectory optimization hardware/software implementation is presented, the search step solver. This is one piece of an algorithm whose overall goal is to make nonlinear trajectory optimization fast enough to provide real-time commands during guidance of a vehicle such as an aeromaneuvering orbiter or the National Aerospace Plane. Many methods of nonlinear programming require the solution of a quadratic program (QP) at each iteration to determine the search step. In the trajectory optimization case, the QP has a special dynamic programming structure, an LQR-like structure. The algorithm exploits this special structure with a divide-and-conquer type of parallel implementation. A hypercube message-passing parallel machine, the INTEL iPSC/2, has been used. The algorithm solves a (p·N)-stage problem onN processors inO(p + log2 N) operations. The algorithm yields a factor of 8 speed-up over the fastest known serial algorithm when solving a 1024-stage test problem on 32 processors.This research was supported in part by the National Aeronautics and Space Administration under Grant No. NAG-1-1009.  相似文献   

7.
This paper deals with solving stiff systems of differential equations by implicit Multistep Runge-Kutta (MRK) methods. For this type of methods, nonlinear systems of dimension sd arise, where s is the number of Runge-Kutta stages and d the dimension of the problem. Applying a Newton process leads to linear systems of the same dimension, which can be very expensive to solve in practice. With a parallel iterative linear system solver, especially designed for MRK methods, we approximate these linear systems by s systems of dimension d, which can be solved in parallel on a computer with s processors. In terms of Jacobian evaluations and LU-decompositions, the k-steps-stage MRK applied with this technique is on s processors equally expensive as the widely used k-step Backward Differentiation Formula on 1 processor, whereas the stability properties are better than that of BDF. A simple implementation of both methods shows that, for the same number of Newton iterations, the accuracy delivered by the new method is higher than that of BDF.  相似文献   

8.
This paper introduces the design and implementation of two parallel dual simplex solvers for general large scale sparse linear programming problems. One approach, called PAMI, extends a relatively unknown pivoting strategy called suboptimization and exploits parallelism across multiple iterations. The other, called SIP, exploits purely single iteration parallelism by overlapping computational components when possible. Computational results show that the performance of PAMI is superior to that of the leading open-source simplex solver, and that SIP complements PAMI in achieving speedup when PAMI results in slowdown. One of the authors has implemented the techniques underlying PAMI within the FICO Xpress simplex solver and this paper presents computational results demonstrating their value. In developing the first parallel revised simplex solver of general utility, this work represents a significant achievement in computational optimization.  相似文献   

9.
A parallel, "across the method" implementation of a stiff ODE solver is presented. The construction of the methods is outlined and the main implementational issues are discussed. Performance results, using MPI on the IBM SP-2, are presented and they indicate that a speed-up between 3 and 5 typically can be obtained compared to state-of-the-art sequential codes.  相似文献   

10.
As a special case of our main result, we show that for all L> 0, each k-nearest neighbor graph in d dimensions excludes Kh as a depth L minor if h = Ω(Ld). More generally, we prove that the overlap graphs defined by Miller, Teng, Thurston and Vavasis (1993) have this combinatorial property. By a construction of Plotkin, Rao and Smith (1994), our result implies that overlap graphs have “good” cut-covers, answering an open question of Kaklamanis, Krizanc and Rao (1993). Consequently, overlap graphs can be emulated on hypercube graphs with a constant factor of slow-down and on butterfly graphs with a factor of O(log* n) slow-down. Therefore, computations on overlap graphs, such as finite element and finite difference methods on “well-conditioned” meshes and image processing on k-nearest neighbor graphs, can be performed on hypercubic parallel machines with a linear speed-up. Our result, in conjunction with a result of Plotkin, Rao and Smith, also yields a combinatorial proof that overlap graphs have separators of sublinear size. We also show that with high probability, the Delaunay diagram, the relative neighborhood graph, and the k-nearest neighbor graph of a random point set exclude Kh as a depth L minor if h = Ω(Ld/2 log n).  相似文献   

11.
This paper investigates iterated Multistep Runge-Kutta methods of Radau type as a class of explicit methods suitable for parallel implementation. Using the idea of van der Houwen and Sommeijer [18], the method is designed in such a way that the right-hand side evaluations can be computed in parallel. We use stepsize control and variable order based on iterated approximation of the solution. A code is developed and its performance is compared with codes based on iterated Runge-Kutta methods of Gauss type and various Dormand and Prince pairs [15]. The accuracy of some of our methods are comparable with the PIRK10 methods of van der Houwen and Sommeijer [18], but require fewer processors. In addition at very stringent tolerances these new methods are competitive with RK78 pairs in a sequential implementation.  相似文献   

12.
In this contribution to the Proceedings of the “International Symposium on New Aspects of Numerical Analysis in the Light of Recent Technology” held at Stresa on 13 until 18 September 1993, we give a survey of recent research at CWI for solving the implicit relations arising in ODEIVP methods on parallel computers. Starting with a General Linear method as introduced by Butcher, three forms of parallelism for solving the associated implicit equations are discussed, viz. (i) parallelism across the stages within a single step (stage parallelism), (ii) parallel preconditioners, and (iii) parallelism across the steps (step parallelism). The structure of these type of parallel methods will be described.  相似文献   

13.
How can small-scale parallelism best be exploited in the solution of nonstiff initial value problems? It is generally accepted that only modest gains inefficiency are possible, and it is often the case that “fast” parallel algorithms have quite crude error control and stepsize selection components. In this paper we consider the possibility of using parallelism to improvereliability andfunctionality rather than efficiency. We present an algorithm that can be used with any explicit Runge-Kutta formula. The basic idea is to take several smaller substeps in parallel with the main step. The substeps provide an interpolation facility that is essentially free, and the error control strategy can then be based on a defect (residual) sample. If the number of processors exceeds (p ? 1)/2, wherep is the order of the Runge-Kutta formula, then the interpolant and the error control scheme satisfy very strong reliability conditions. Further, for a given orderp, the asymptotically optimal values for the substep lengths are independent of the problem and formula and hence can be computed a priori. Theoretical comparisons between the parallel algorithm and optimal sequential algorithms at various orders are given. We also report on numerical tests of the reliability and efficiency of the new algorithm, and give some parallel timing statistics from a 4-processor machine.  相似文献   

14.
In this paper we present algorithms for drawing series parallel digraphs with as much symmetry as possible. The first step is to compute a certain kind of automorphism, called an “upward planar automorphism” for an input series parallel digraph. The next step uses these automorphisms to construct a symmetric drawing of the graph. We present several variations of the second step, with visibility drawings, “bus-orthogonal” drawings, and polyline drawings. All algorithms run in linear time.  相似文献   

15.
Banded linear systems occur frequently in mathematics and physics. However, direct solvers for large systems cannot be performed in parallel without communication. The aim of this paper is to develop a general asymmetric banded solver with a direct approach that scales across many processors efficiently. The key mechanism behind this is that reduction to a row-echelon form is not required by the solver. The method requires more floating point calculations than a standard solver such as LU decomposition, but by leveraging multiple processors the overall solution time is reduced. We present a solver using a superposition approach that decomposes the original linear system into q subsystems, where q is the number of superdiagonals. These methods show optimal computational cost when q processors are available because each system can be solved in parallel asynchronously. This is followed by a q×q dense constraint matrix problem that is solved before a final vectorized superposition is performed. Reduction to row echelon form is not required by the solver, and hence the method avoids fill-in. The algorithm is first developed for tridiagonal systems followed by an extension to arbitrary banded systems. Accuracy and performance is compared with existing solvers and software is provided in the supplementary material.  相似文献   

16.
椭圆离散方程并行预条件子的局部构造算法 Ⅰ.基本方法   总被引:1,自引:0,他引:1  
孙家昶 《计算数学》1995,17(2):143-153
用有限元或差分法离散所得的大型稀疏椭圆型线性代数方程组Au=f(1)构造高效率的迭代算法,是目前计算方法的一个极其活跃的方向.  相似文献   

17.
This paper derives a new class of general linear methods (GLMs) intended for the solution of stiff ordinary differential equations (ODEs) on parallel computers. Although GLMs were introduced by Butcher in the 1960s, the task of deriving formulas from the class with properties suitable for specific applications is far from complete. This paper is a contribution to that work. Our new methods have several properties suited for the solution of stiff ODEs on parallel computers. They are strictly diagonally implicit, allowing parallelism in the Newton iteration used to solve the nonlinear equations arising from the implicitness of the formula. The stability matrix has no spurious eigenvalues (that is, only one eigenvalue of the stability matrix is non-zero), resulting in a solution free from contamination from spurious solutions corresponding to non-dominant, non-zero eigenvalues. From these two properties arises the name DIMSEM, for Diagonally IMplicit Single-Eigenvalue Method. The methods have high stage order, avoiding the phenomenon of order reduction that occurs, for example, with some Runge-Kutta methods. The methods are L-stable, with the result that the chosen stepsize is dictated by convergence requirements rather than stability considerations imposed by the stiffness of the problem. An introduction to GLMs is given and some order barriers for DIMSEMs are presented. DIMSEMs of orders 2-6 are derived, as well as an L-stable class of diagonal methods of all orders which do not, however, possess the single-eigenvalue property. A fixed-order, variable-stepsize implementation of the DIMSEMs is described, including the derivation of local error estimators, and the results of testing on both sequential and parallel computers is presented. The testing shows the DIMSEMs to be competitive with fixed-order versions of the popular solver LSODE on a practical test problem.  相似文献   

18.
The “two-fluid” mathematical model for turbulent combustion is applied to a one-dimensional, premixed, stabilized ducted flame. The flame is assumed to consist of two interspersed fluids (“reactants” and “products”), each characterized by its own properties and interacting through the exchange of mass, heat, and momentum. The distributions of pressure, densities, velocities, and volume fractions across the duct were successfully simulated. From a parametric study on the effects of the empirical constants involved in the interfluid relations, the significant dependence of the system on the parameters that characterize the mass transfer rate and the relative effect of mass transfer to momentum transfer was confirmed. The application of the model to transient states proved its ability to predict system oscillations.  相似文献   

19.
The purpose of this paper is to create a theoretical frameworkfor parallelization of Runge-Kutta methods. We investigate theinherent potential for parallelism by considering digraphs ofRunge-Kutta matrices. By highlighting the important role ofthe underlying sparsity pattern, this approach narrows the fielddown to certain types of methods. These are further investigatedby two techniques: perturbed collocation and elementary differentials.Our analysis leads to singly diagonally implicit fourth-orderL-stable methods that can be implemented on two processors (inMIMD architecture) with computational cost of two ‘conventional’stages. We also debate local error control and present a techniquethat can be used to this end.  相似文献   

20.
Our main interest in this paper is to translate from “natural language” into “system theoretical language”. This is of course important since a statement in system theory can be analyzed mathematically or computationally. We assume that, in order to obtain a good translation, “system theoretical language” should have great power of expression. Thus we first propose a new frame of system theory, which includes the concepts of “measurement” as well as “state equation”. And we show that a certain statement in usual conversation, i.e., fuzzy modus ponens with the word “very”, can be translated into a statement in the new frame of system theory. Though our result is merely one example of the translation from “natural language” into “system theoretical language”, we believe that our method is fairly general.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号