This paper gives global and uniform convergence estimates for a class of subspace correction iterative methods (based on space decomposition) applied to unconstrained convex optimization problems. Multigrid and domain decomposition methods for solving nonlinear elliptic boundary value problems are also discussed as special examples.
Solving large-scale linear systems efficiently plays an important role in a petroleum reservoir simulator, and the key part is choosing an effective parallel preconditioner. Choosing a good preconditioner now goes beyond pure algebra. An integrated preconditioner should include such components as the physical background, the characteristics of the PDE mathematical model, the nonlinear solution method, linear
This paper describes the first algorithm to compute the greatest common divisor (GCD) of two n-bit integers using a modular representation for intermediate values U, V and also for the result. It is based on a reduction step, similar to one used in the accelerated algorithm [T. Jebelean, A generalization of the binary GCD algorithm, in: ISSAC '93: International Symposium on Symbolic and Algebraic Computation, Kiev, Ukraine, 1993, pp. 111–116; K. Weber, The accelerated integer GCD algorithm, ACM Trans. Math. Softw. 21 (1995) 111–122] when U and V are close to the same size, that replaces U by (U−bV)/p, where p is one of the prime moduli and b is the unique integer in the interval (−p/2, p/2) such that b ≡ UV⁻¹ (mod p). When the algorithm is executed on a bit common CRCW PRAM with O(n log n log log log n) processors, it takes O(n) time in the worst case. A heuristic model of the average case yields O(n/log n) time on the same number of processors.
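The reduction step described above can be sketched on ordinary Python integers. This is only an illustration of the arithmetic, assuming p is an odd prime that does not divide V; it does not use the modular representation or the parallel machinery of the paper, and the function names `centered_residue` and `reduction_step` are hypothetical.

```python
from math import gcd


def centered_residue(x, p):
    """Return the representative of x mod p in the interval (-p/2, p/2), for odd p."""
    r = x % p
    return r - p if r > p // 2 else r


def reduction_step(u, v, p):
    """One reduction U -> (U - b*V) / p.

    Assumes p is an odd prime that does not divide v, so that v has an
    inverse modulo p and the division by p is exact.
    """
    # b is the unique integer in (-p/2, p/2) with b = u * v^(-1) (mod p).
    b = centered_residue(u * pow(v, -1, p), p)
    quotient, remainder = divmod(u - b * v, p)
    if remainder != 0:
        raise ArithmeticError("u - b*v is not divisible by p")
    return b, quotient


if __name__ == "__main__":
    u, v, p = 1000, 37, 13
    b, reduced = reduction_step(u, v, p)
    # Because p does not divide v, the step leaves the GCD unchanged.
    print(b, reduced, gcd(reduced, v) == gcd(u, v))
```

Since p does not divide V, any common divisor of U and V also divides (U−bV)/p, and vice versa, which is why the step preserves the GCD in this setting.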
Integration of the subsurface flow equation by finite elements (FE) in space and finite differences (FD) in time requires the repeated solution of sparse symmetric positive definite systems of linear equations. Iterative techniques based on preconditioned conjugate gradients (PCG) are among the most attractive tools for solving the problem on sequential computers. A present challenge is to make PCG attractive in a parallel computing environment as well. To this end, a key factor is the development of an efficient parallel preconditioner. FSAI (factorized sparse approximate inverse) and enlarged FSAI, which rely on an approximate inverse of the coefficient matrix, appear to be most promising parallel preconditioners. In the present paper, PCG with FSAI, diagonal, and pARMS (parallel algebraic recursive multilevel solvers) preconditioners is implemented on the IBM SP4/512 and CLX/768 supercomputers with up to 32 processors to solve large underground flow problems. The results show that FSAI may allow for a parallel relative efficiency larger than 50% on the largest problems with p=32 processors. Moreover, FSAI turns out to be significantly less expensive and more robust than pARMS. Finally, it is shown that the efficiency for p in the upper range may be much improved if PCG–FSAI is implemented on CLX.
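As a rough illustration of the PCG iteration mentioned above, the following sketch uses the diagonal (Jacobi) preconditioner, the simplest of the three compared in the abstract. It is sequential and stores the matrix as dense nested lists, both of which are simplifying assumptions; FSAI and pARMS are not shown, and the function name `pcg_jacobi` is hypothetical.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by PCG.

    The preconditioner is the inverse of the diagonal of A.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)

    def matvec(x):
        return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

    def dot(x, y):
        return sum(xi * yi for xi, yi in zip(x, y))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                  # residual b - A x for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]   # preconditioned residual
    p = list(z)                                  # first search direction
    rz = dot(r, z)

    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]  # equals A times [1, 2, 3]
    solution, iterations = pcg_jacobi(A, b)
    print(solution, iterations)
```

Swapping in another preconditioner only changes how `z` is computed from `r`; the rest of the loop stays the same, which is why the choice of preconditioner can be studied independently of the CG iteration itself.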
Mixed integer programming (MIP) models are extensively used to aid strategic and tactical decision making in many business sectors. Solving MIP models is a computationally intensive process and there is a need to develop solution approaches that enable larger models to be solved within acceptable timeframes. In this paper, we describe the implementation of a two-stage parallel branch and bound (PB&B) algorithm for MIP. In stage 1 of the algorithm, a multiple heuristic search is implemented in which a number of alternative search trees are investigated using a forest search in the hope of finding a good solution quickly. In stage 2, the search is reorganized so that the branches of a chosen tree are investigated in parallel. A new heuristic is introduced, based on a best projection criterion, which evaluates alternative B&B trees in order to choose one for investigation in stage 2 of the algorithm. The heuristic also serves as a way of implementing a quality load balancing scheme for stage 2 of the algorithm. The results of experimental investigations are reported for a range of models taken from the MIPLIB library of benchmark problems.
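For orientation, the textbook best projection estimate used for node selection in branch and bound can be sketched as follows. This is the classical criterion for a minimization problem, not the tree-evaluation heuristic of the paper, whose details the abstract does not give. The `Node` class, its fields, and the function names are hypothetical, and the sketch assumes an incumbent value is available and that the root LP solution is not integer feasible.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lp_bound: float       # LP relaxation objective at this node (minimization)
    infeasibility: float  # sum of integer infeasibilities of the LP solution


def best_projection_estimate(node, root, incumbent_value):
    """Estimate the best integer objective reachable below `node`.

    The slope measures how much objective degradation was paid per unit of
    integer infeasibility removed between the root and the incumbent.
    Assumes root.infeasibility > 0.
    """
    slope = (incumbent_value - root.lp_bound) / root.infeasibility
    return node.lp_bound + slope * node.infeasibility


def select_node(open_nodes, root, incumbent_value):
    """Pick the open node with the smallest projected objective."""
    return min(
        open_nodes,
        key=lambda n: best_projection_estimate(n, root, incumbent_value),
    )


if __name__ == "__main__":
    root = Node("root", lp_bound=10.0, infeasibility=2.0)
    candidates = [
        Node("a", lp_bound=11.0, infeasibility=1.5),
        Node("b", lp_bound=12.0, infeasibility=0.5),
    ]
    print(select_node(candidates, root, incumbent_value=14.0).name)
```

In the usage example, node "b" has the worse LP bound but is much closer to integrality, so its projected value is lower and it is selected ahead of node "a".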
A model for parallel and distributed programs, the dynamic process graph (DPG), is investigated under graph-theoretic and complexity aspects. Such graphs embed constructors for parallel programs, synchronization mechanisms as well as conditional branches. They are capable of representing all possible executions of a parallel or distributed program in a very compact way. The size of this representation can be as small as logarithmic with respect to the size of any execution of the program.
In a preceding paper [A. Jakoby, et al., Scheduling dynamic graphs, in: Proc. 16th Symposium on Theoretical Aspects in Computer Science STACS'99, LNCS, vol. 1563, Springer, 1999, pp. 383–392] we have analysed the expressive power of the general model and various variants of it. We have considered the scheduling problem for DPGs given enough parallelism taking into account communication delays between processors when exchanging data. Given a DPG the question arises whether it can be executed (that means whether the corresponding parallel program has been specified correctly), and what is its minimum schedule length.
In this paper we study a subclass of dynamic process graphs called -output DPGs, which are appropriate in many situations, and investigate their expressive power. In a previous paper we have shown that the problem to determine the minimum schedule length is still intractable for this subclass, namely this problem is -complete as is the general case. Here we will investigate structural properties of the executions of such graphs. A natural graph-theoretic conjecture that executions must always split into components that are isomorphic to subgraphs turns out to be wrong. We are able to prove a weaker property. This implies a quadratic upper bound on the schedule length that may be necessary in the worst case, in contrast to the general case, where the optimal schedule length may be exponential with respect to the size of the representing DPG. Making this bound constructive, we obtain an approximation to a -complete problem. Computing such a schedule and then executing the program can be done on a parallel machine in polynomial time in a highly distributive fashion.
We investigate a new parallel all-optical clock recovery scheme based on heterodyne beats of an optical sideband-filtered signal. The oscillating clock signal is recovered when the filtered sideband is combined with a stable local oscillator. The filtering is performed with an optical resonator, which by nature allows multiwavelength operation. The local oscillator could be realized by a multiwavelength laser whose emission wavelengths are injection seeded with the carrier wavelengths of the input data. The output signal of such a configuration benefits from a reduced bit-pattern effect and a stable offset level. The sideband filtering is demonstrated for 23 simultaneous channels on a 100 GHz DWDM grid, each hosting a data stream of 10 Gbit/s.