首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 546 毫秒
1.
An improved GPBi-CG algorithm suitable for distributed parallel computing   总被引:1,自引:0,他引:1  
An improved generalized product-type bi-conjugate gradient (GPBi-CG) method (IGPBi-CG method, in brief) for solving large sparse linear systems with unsymmetrical coefficient matrices is proposed for distributed parallel environments. The method reduces three global synchronization points to two by reconstructing GPBi-CG method and the communication time required for the inner product can be efficiently overlapped with useful computation. The cost is only slightly increased computation time, which can be ignored compared with the reduction of communication time. Performance and isoefficiency analysis show that the IGPBi-CG method has better parallelism and scalability than the GPBi-CG method. Numerical experiments show that the scalability can be improved by a factor greater than 1.5 and the improvement in parallel communication performance approaches 33.3˙%.  相似文献   

2.
In this paper we propose and describe a parallel implementation of a block preconditioner for the solution of saddle point linear systems arising from Finite Element (FE) discretization of 3D coupled consolidation problems. The Mixed Constraint Preconditioner developed in [L. Bergamaschi, M. Ferronato, G. Gambolati, Mixed constraint preconditioners for the solution to FE coupled consolidation equations, J. Comput. Phys., 227(23) (2008), 9885-9897] is combined with the parallel FSAI preconditioner which is used here to approximate the inverses of both the structural (1, 1) block and an appropriate Schur complement matrix. The resulting preconditioner proves effective in the acceleration of the BiCGSTAB iterative solver. Numerical results on a number of test cases of size up to 2×106 unknowns and 1.2×108 nonzeros show the perfect scalability of the overall code up to 256 processors.  相似文献   

3.
In this paper we propose and describe a parallel implementation of a block preconditioner for the solution of saddle point linear systems arising from Finite Element (FE) discretization of 3D coupled consolidation problems. The Mixed Constraint Preconditioner developed in [L. Bergamaschi, M. Ferronato, G. Gambolati, Mixed constraint preconditioners for the solution to FE coupled consolidation equations, J. Comput. Phys., 227(23) (2008), 9885–9897] is combined with the parallel FSAI preconditioner which is used here to approximate the inverses of both the structural (1, 1) block and an appropriate Schur complement matrix. The resulting preconditioner proves effective in the acceleration of the BiCGSTAB iterative solver. Numerical results on a number of test cases of size up to 2×106 unknowns and 1.2×108 nonzeros show the perfect scalability of the overall code up to 256 processors.  相似文献   

4.
The paper is devoted to developing the new time- and memory-efficient algorithm BiCGSTABmem for solving the inverse gravimetry problem of determination of a variable density in a layer using the gravitational data. The problem is in solving the linear Fredholm integral equation of the first kind. After discretization of the domain and approximation of the integral operator, this problem is reduced to solving a large system of linear algebraic equations. It is shown that the matrix of coefficients is the Toeplitz-block-Toeplitz one in the case of the horizontal layer. For calculating and storing the elements of this matrix, we construct an efficient method, which significantly reduces the required memory and time. For the case of the curvilinear layer, we construct a method for approximating the parts of the matrix by a Toeplitz-block-Toeplitz one. This allows us to exploit the same efficient method for storing and processing the coefficient matrix in the case of a curvilinear layer. To solve the system of linear equations, we constructed the parallel algorithm on the basis of the stabilized biconjugated gradient method with using the Toeplitz-block-Toeplitz structure of the matrix. We implemented the BiCGSTAB and BiCGSTABmem algorithms for the Uran cluster supercomputer using the hybrid MPI + OpenMP technology. A model problem with synthetic data was solved for a large grid. It was shown that the new BiCGSTABmem algorithm reduces the computation time in comparison with the BiCGSTAB. Scalability of the parallel algorithm was studied.  相似文献   

5.
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.  相似文献   

6.
刘瑶宁 《计算数学》2022,44(2):187-205
一类空间分数阶扩散方程经过有限差分离散后所得到的离散线性方程组的系数矩阵是两个对角矩阵与Toeplitz型矩阵的乘积之和.在本文中,对于几乎各向同性的二维或三维空间分数阶扩散方程的离散线性方程组,采用预处理Krylov子空间迭代方法,我们利用其系数矩阵的特殊结构和具体性质构造了一类分块快速正则Hermite分裂预处理子.通过理论分析,我们证明了所对应的预处理矩阵的特征值大部分都聚集于1的附近.数值实验也表明,这类分块快速正则Hermite分裂预处理子可以明显地加快广义极小残量(GMRES)方法和稳定化的双共轭梯度(BiCGSTAB)方法等Krylov子空间迭代方法的收敛速度.  相似文献   

7.
In this paper, we establish structural properties for the class of complement reducible graphs or cographs, which enable us to describe efficient parallel algorithms for recognizing cographs and for constructing the cotree of a graph if it is a cograph; if the input graph is not a cograph, both algorithms return an induced P4. For a graph on n vertices and m edges, both our cograph recognition and cotree construction algorithms run in time and require O((n+m)/logn) processors on the EREW PRAM model of computation. Our algorithms are motivated by the work of Dahlhaus (Discrete Appl. Math. 57 (1995) 29–44) and take advantage of the optimal O(logn)-time computation of the co-connected components of a general graph (Theory Comput. Systems 37 (2004) 527–546) and of an optimal O(logn)-time parallel algorithm for computing the connected components of a cograph, which we present. Our results improve upon the previously known linear-processor parallel algorithms for the problems (Discrete Appl. Math. 57 (1995) 29–44; J. Algorithms 15 (1993) 284–313): we achieve a better time-processor product using a weaker model of computation and we provide a certificate (an induced P4) whenever our algorithms decide that the input graphs are not cographs.  相似文献   

8.
We propose block ILU (incomplete LU) factorization preconditioners for a nonsymmetric block-tridiagonal M-matrix whose computation can be done in parallel based on matrix blocks. Some theoretical properties for these block ILU factorization preconditioners are studied and then we describe how to construct them effectively for a special type of matrix. We also discuss a parallelization of the preconditioner solver step used in nonstationary iterative methods with the block ILU preconditioners. Numerical results of the right preconditioned BiCGSTAB method using the block ILU preconditioners are compared with those of the right preconditioned BiCGSTAB using a standard ILU factorization preconditioner to see how effective the block ILU preconditioners are.  相似文献   

9.
We study the computation, communication and synchronization requirements related to the construction and search of parallel segment trees in an architecture independent way. Our proposed parallel algorithms are optimal in space and time compared to the corresponding sequential algorithms utilized to solve the introduced problems and are described in the context of the bulk-synchronous parallel (BSP) model of computation. Our methods are more scalable and can thus be made to work for larger values of processor size p relative to problem size n than other segment tree related algorithms that have been described on other realistic distributed-memory parallel models and also provide a natural way to approach searching problems on latency-tolerant models of computation that maintains a balanced query load among the processors.  相似文献   

10.
We present a high-order parallel algorithm, which requires only the minimum interprocessor communication dictated by the physical nature of the problem at hand. The parallelization is achieved by domain decomposition. The discretization in space is performed using the Local Fourier Basis method. The continuity conditions on the interfaces are enforced by adding homogeneous solutions. Such solutions often have fast decay properties, which can be utilized to minimize interprocessor communication. In effect, the predominant part of the computation is performed independently in the subdomains (processors) or using only local communication. A novel element of the present parallel algorithm is the incorporation of a Nonlinear Galerkin strategy to accelerate the computation and stabilize the time integration process. The basic idea of this approach consists of decomposition of the variables into large scale and small scale components with different treatment of these large and small scales. The combination of the Multidomain Fourier techniques with the Nonlinear Galerkin (NLG) algorithm is applied here to solve incompressible Navier–Stokes equations. Results are presented on direct numerical simulation of two-dimensional homogeneous turbulence using the NLG method. © 1997 John Wiley & Sons, Inc. Numer Methods Partial Differential Eq 13: 699–715, 1997  相似文献   

11.
In this paper a parallel direct Schur–Fourier decomposition (DSFD) algorithm for the direct solution of arbitrary order discrete Poisson equations on parallel computers is proposed. It is based on a combination of a Direct Schur method and a Fourier decomposition and allows to solve each Poisson equation almost to machine accuracy using only one communication episode. Thus, it is well suited for loosely coupled parallel computers, that have a high network latency compared with the CPU performance. Several three‐dimensional direct numerical simulations (DNS) of wall‐bounded turbulent incompressible flows have been carried out using the DSFD algorithm. Numerical examples illustrating the robustness and scalability of the method on a PC cluster with a conventional 100 Mbits/s network are also presented. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

12.
This article is concerned with solving the high order Stein tensor equation arising in control theory. The conjugate gradient squared (CGS) method and the biconjugate gradient stabilized (BiCGSTAB) method are attractive methods for solving linear systems. Compared with the large-scale matrix equation, the equivalent tensor equation needs less storage space and computational costs. Therefore, we present the tensor formats of CGS and BiCGSTAB methods for solving high order Stein tensor equations. Moreover, a nearest Kronecker product preconditioner is given and the preconditioned tensor format methods are studied. Finally, the feasibility and effectiveness of the new methods are verified by some numerical examples.  相似文献   

13.
The global bi-conjugate gradient (Gl-BCG) method is an attractive matrix Krylov subspace method for solving nonsymmetric linear systems with multiple right-hand sides, but it often show irregular convergence behavior in many applications. In this paper, we present a new family of global A-biorthogonal methods by using short two-term recurrences and formal orthogonal polynomials, which contain the global bi-conjugate residual (Gl-BCR) algorithm and its improved version. Finally, numerical experiments illustrate that the proposed methods are highly competitive and often superior to originals.  相似文献   

14.
A parallel asynchronous Newton algorithm for unconstrained optimization   总被引:1,自引:0,他引:1  
A new approach to the solution of unconstrained optimization problems is introduced. It is based on the exploitation of parallel computation techniques and in particular on an asynchronous communication model for the data exchange among concurrent processes. The proposed approach arises by interpreting the Newton method as being composed of a set of iterative and independent tasks that can be mapped onto a parallel computing system for the execution.Numerical experiments on the resulting algorithm have been carried out to compare parallel versions using synchronous and asynchronous communication mechanisms in order to assess the benefits of the proposed approach on a variety of parallel computing architectures. It is pointed out that the proposed asynchronous Newton algorithm is preferable for medium and large-scale problems, in the context of both distributed and shared memory architectures.This research work was partially supported by the National Research Council of Italy, within the special project Sistemi Informatici e Calcolo Parallelo, under CNR Contract No. 90.00675.PF69.  相似文献   

15.
The generalized product bi-conjugate gradient(GPBiCG(m,l))method has been recently proposed as a hybrid variant of the GPBi CG and the Bi CGSTAB methods to solve the linear system Ax=b with non-symmetric coefficient matrix,and its attractive convergence behavior has been authenticated in many numerical experiments.By means of the Kronecker product and the vectorization operator,this paper aims to develop the GPBi CG(m,l)method to solve the general matrix equation■ and the general discrete-time periodic matrix equations■ which include the well-known Lyapunov,Stein,and Sylvester matrix equations that arise in a wide variety of applications in engineering,communications and scientific computations.The accuracy and efficiency of the extended GPBi CG(m,l)method assessed against some existing iterative methods are illustrated by several numerical experiments.  相似文献   

16.
The global bi-conjugate gradient (Gl-BCG) method is an attractive matrix Krylov subspace method for solving nonsymmetric linear systems with multiple right-hand sides, but it often show irregular convergence behavior in many applications. In this paper, we present a new family of global A-biorthogonal methods by using short two-term recurrences and formal orthogonal polynomials, which contain the global bi-conjugate residual (Gl-BCR) algorithm and its improved version. Finally, numerical experiments illustrate that the proposed methods are highly competitive and often superior to originals.  相似文献   

17.
In this paper we propose a fundamentally different conjugate gradient method, in which the well-known parameter βk is computed by an approximation of the Hessian/vector product through finite differences. For search direction computation, the method uses a forward difference approximation to the Hessian/vector product in combination with a careful choice of the finite difference interval. For the step length computation we suggest an acceleration scheme able to improve the efficiency of the algorithm. Under common assumptions, the method is proved to be globally convergent. It is shown that for uniformly convex functions the convergence of the accelerated algorithm is still linear, but the reduction in function values is significantly improved. Numerical comparisons with conjugate gradient algorithms including CONMIN by Shanno and Phua [D.F. Shanno, K.H. Phua, Algorithm 500, minimization of unconstrained multivariate functions, ACM Trans. Math. Softw. 2 (1976) 87–94], SCALCG by Andrei [N. Andrei, Scaled conjugate gradient algorithms for unconstrained optimization, Comput. Optim. Appl. 38 (2007) 401–416; N. Andrei, Scaled memoryless BFGS preconditioned conjugate gradient algorithm for unconstrained optimization, Optim. Methods Softw. 22 (2007) 561–571; N. Andrei, A scaled BFGS preconditioned conjugate gradient algorithm for unconstrained optimization, Appl. Math. Lett. 20 (2007) 645–650], and new conjugacy condition and related new conjugate gradient by Li, Tang and Wei [G. Li, C. Tang, Z. Wei, New conjugacy condition and related new conjugate gradient methods for unconstrained optimization, J. Comput. Appl. Math. 202 (2007) 523–539] or truncated Newton TN by Nash [S.G. Nash, Preconditioning of truncated-Newton methods, SIAM J. on Scientific and Statistical Computing 6 (1985) 599–616] using a set of 750 unconstrained optimization test problems show that the suggested algorithm outperforms these conjugate gradient algorithms as well as TN.  相似文献   

18.
A parallel inexact Newton method with a line search is proposed for two-stage quadratic stochastic programs with recourse. A lattice rule is used for the numerical evaluation of multi-dimensional integrals, and a parallel iterative method is used to solve the quadratic programming subproblems. Although the objective only has a locally Lipschitz gradient, global convergence and local superlinear convergence of the method are established. Furthermore, the method provides an error estimate which does not require much extra computation. The performance of the method is illustrated on a CM5 parallel computer.This work was supported by the Australian Research Council and the numerical experiments were done on the Sydney Regional Centre for Parallel Computing CM5.  相似文献   

19.
A parallel system consists of a parallel algorithm and a parallel machine that supports the implementation of the algorithm. The scalability of a parallel system is a measure of its capability to increase speedup in proportion to the number of processors, or its capability to keep a constant efficiency as the number of processors increases. The present paper is devoted to the investigation of the average-case scalability of parallel algorithms executing on multicomputers with symmetric static networks, including the completely connected network, ring, hypercube, and torus. In particular, we characterize the communication overhead such that the expected efficiency can be kept at certain constant level, and that the number of tasks grows at the rate Θ(P log P).  相似文献   

20.
We consider the Schrödinger system with Newton-type interactions that was derived by R. Klein, A. Majda and K. Damodaran (1995) [17] to modelize the dynamics of N nearly parallel vortex filaments in a 3-dimensional homogeneous incompressible fluid. The known large time existence results are due to C. Kenig, G. Ponce and L. Vega (2003) [16] and concern the interaction of two filaments and particular configurations of three filaments. In this article we prove large time existence results for particular configurations of four nearly parallel filaments and for a class of configurations of N   nearly parallel filaments for any N?2N?2. We also show the existence of travelling wave type dynamics. Finally we describe configurations leading to collision.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号