首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
A systolic array for band matrixQR factorization is presented and its connection with various other arrays explained. It is shown how the computations should be performed on banded rectangular matrices which appear in several numerical problems.  相似文献   

2.
In this paper we will present a parallel algorithm to generate the permutations of at mostk out ofn objects. The architecture consists of a linear processor array and a selector. When one single processor array is available, a parallel algorithm to generate permutations is presented which achieves the best possible speedup for any givenk. Also, this algorithm can easily be modified to generate combinations. When multiple processor arrays are available, a parallel scheme is proposed to speed up the generation by fully utilizing these processor arrays. The degree of parallelism is related to the number of available processor arrays.  相似文献   

3.
The discretized linear elasticity problem is solved by the preconditioned conjugate gradient (pcg) method. Mainly we consider the linear isotropic case but we also comment on the more general linear orthotropic problem. The preconditioner is based on the separate displacement component (sdc) part of the equations of elasticity. The preconditioning system consists of two or three subsystems (in two or three dimensions) also called inner systems, each of which is solved by the incomplete factorization pcg-method, i.e., we perform inner iterations. A finite element discretization and node numbering giving a high degree of partial parallelism with equal processor load for the solution of these systems by the MIC(0) pcg method is presented. In general, the incomplete factorization requires an M-matrix. This property is studied for the elasticity problem. The rate of convergence of the pcg-method is analysed for different preconditionings based on the sdc-part of the elasticity equations. In the following two parts of this trilogy we will focus more on parallelism and implementation aspects. © 1998 John Wiley & Sons, Ltd.  相似文献   

4.
Summary LetA be a realm×n matrix with full row rankm. In many algorithms in engineering and science, such as the force method in structural analysis, the dual variable method for the Navier-Stokes equations or more generally null space methods in quadratic programming, it is necessary to compute a basis matrixB for the null space ofA. HereB isn×r, r=n–m, of rankr, withAB=0. In many instancesA is large and sparse and often banded. The purpose of this paper is to describe and test a variation of a method originally suggested by Topcu and called the turnback algorithm for computing a banded basis matrixB. Two implementations of the algorithm are given, one using Gaussian elimination and the other using orthogonal factorization by Givens rotations. The FORTRAN software was executed on an IBM 3081 computer with an FPS-164 attached array processor at the Triangle Universities Computing Center and on a CYBER 205 vector computer. Test results on a variety of structural analysis problems including two- and three-dimensional frames, plane stress, plate bending and mixed finite element problems are discussed. These results indicate that both implementations of the algorithm yielded a well-conditioned, banded, basis matrixB whenA is well-conditioned. However, the orthogonal implementation yielded a better conditionedB for large, illconditioned problems.The research by these authors was supported by the U.S. Air Force under grant No. AFOSR-83-0255 and by the National Science Foundation under grant No. MCS-82-19500The research by these authors was supported by the Applied Mathematical Sciences Program of the U.S. Department of Energy, under contract to Martin Marietta Energy Systems, Inc.  相似文献   

5.
The main goal of this paper is to highlight the relationship between the suffix array of a text and its Lyndon factorization. It is proved in [15] that one can obtain the Lyndon factorization of a text from its suffix array. Conversely, here we show a new method for constructing the suffix array of a text that takes advantage of its Lyndon factorization. The surprising consequence of our results is that, in order to construct the suffix array, the local suffixes inside each Lyndon factor can be separately processed, allowing different implementative scenarios, such as online, external and internal memory, or parallel implementations. Based on our results, the algorithm that we propose sorts the suffixes by starting from the leftmost Lyndon factors, even if the whole text or the complete Lyndon factorization are not yet available.  相似文献   

6.
Null Space Algorithm and Spanning Trees in Solving Darcy's Equation   总被引:1,自引:0,他引:1  
A Null Space algorithm is considered to solve the augmented system produced by the mixed finite element approximation of Darcy's Law. The method is based on the combination of a LU factorization technique for sparse matrices with an iterative Krylov solver. The computational efficiency of the method relies on the use of spanning trees to compute the LU factorization without fill-in and on a suitable stopping criterion for the iterative solver. We experimentally investigate its performance on a realistic set of selected application problems.This revised version was published online in October 2005 with corrections to the Cover Date.  相似文献   

7.
We have considered the systolic implementation of several methods for updating the Cholesky factorization. For positive rank-k changes there are simple one-pass arrays that implement algorithms based on elimination and plane rotations. In the case of negative rank-one changes, we do not feel that the standard algorithm [2] has a practical implementation. We have introduced a new algorithm for the case of a negative rank-k change and provided an attractive two-pass systolic implementation.  相似文献   

8.
We propose a variant of parallel block incomplete factorization preconditioners for a symmetric block-tridiagonalH-matrix. Theoretical properties of these block preconditioners are compared with those of block incomplete factorization preconditioners for the corresponding comparison matrix. Numerical results of the preconditioned CG(PCG) method using these block preconditioners are compared with those of PCG using other types of block incomplete factorization preconditioners. Lastly, parallel computations of the block incomplete factorization preconditioners are carried out on the Cray C90.  相似文献   

9.
We consider a pipeline array of processors where local communications are based on a producer-consumer mechanism. It is assumed that any processor contains a buffer of size one, and that the cycle-time of any processor is a periodic sequence of period s(j),s(j + 1),...,s(n),...,s(j − 1), where s is a fixed permutation of 1,2,...,n, and j may vary from one processor to another. We give upper and lower bounds for the performance of the system.  相似文献   

10.
For 30 years the Lempel–Ziv factorization LZ x of a string xx[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. Traditionally the standard method for computing LZ x was based on Θ(n)-time (or, depending on the measure used, O(n log n)-time) processing of the suffix tree ST x of x. Recently Abouelhoda et al. proposed an efficient Lempel–Ziv factorization algorithm based on an “enhanced” suffix array – that is, a suffix array SA x together with supporting data structures, principally an “interval tree”. In this paper we introduce a collection of fast space-efficient algorithms for LZ factorization, also based on suffix arrays, that in theory as well as in many practical circumstances are superior to those previously proposed; one family out of this collection achieves true Θ(n)-time alphabet-independent processing in the worst case by avoiding tree structures altogether. The work of the first and third authors was supported in part by grants from the Natural Sciences & Engineering Research Council of Canada.  相似文献   

11.
This paper presents a new QRD factorization of a rectangular Vandermonde matrix for a special point distribution, including the symmetric case, based on ak-dimensional block decomposition of the matrix and some properties of the Kronecker product. The computational reduction factor with respect to any QR method isk 2, in the general case, and 4 in the symmetric one. By the resulting matrix factorization, new formulas are devised for the least squares system solution, whose implementation produces an algorithm of reduced computational cost and computer storage. Finally the perturbation bounds of this new factorization are devised.  相似文献   

12.
Some properties and applications of meromorphic factorization of matrix functions are studied. It is shown that a meromorphic factorization of a matrix function G allows one to characterize the kernel of the Toeplitz operator with symbol G without actually having to previously obtain a Wiener–Hopf factorization. A method to turn a meromorphic factorization into a Wiener–Hopf one which avoids having to factorize a rational matrix that appears, in general, when each meromorphic factor is treated separately, is also presented. The results are applied to some classes of matrix functions for which the existence of a canonical factorization is studied and the factors of a Wiener–Hopf factorization are explicitly determined. Submitted: April 15, 2007. Revised: October 26, 2007. Accepted: December 12, 2007.  相似文献   

13.
We consider a two-stage manufacturing system composed of a batch processor and its upstream processor. Jobs exit the upstream processor and join a queue in front of the batch processor, where they wait to be processed. The batch processor has a finite capacity Q, and its processing time is independent of the number of jobs loaded into the batch processor. In certain manufacturing systems (including semiconductor wafer fabrication), a processing time window exists from the time instance the job exits the upstream processor till the time instance it enters the batch processor. If a job is not processed before reaching the end of its processing time window, job rework or validation is required. We model this drawback by assigning a reward R for each successfully processed job by the upstream processor, and a penalty C for each job that reaches the end of its processing time window without being processed by the batch processor. We initially assume an infinite job source in front of the serial processor and also assume that the batch processor is operated under a threshold policy. We provide a method for controlling the production of the serial processor, considering the processing time window between the upstream processor and the downstream batch processor. We then show how the serial processor control policy can be modified when the serial processor also experiences intermittent job arrival.  相似文献   

14.
Summary A procedure for calculating the mean squared residual and the trace of the influence matrix associated with a polynomial smoothing spline of degree 2m–1 using an orthogonal factorization is presented. The procedure substantially overcomes the problem of ill-conditioning encountered by a recently developed method which employs a Cholesky factorization, but still requires only orderm 2 n operations and ordermn storage.  相似文献   

15.
We present the parallelization of a linear programming solver using a primal-dual interior point method on one of the heterogeneous processors, namely the Cell BE processor. Focus is given to Cholesky factorization as it is the most computationally expensive kernel in interior point methods. To make it easier to develop and port to other heterogeneous systems, we propose a two-phase implementation procedure where we first develop a shared-memory multithreaded application that executes only on the main processor, and then offload the compute-intensive tasks to execute on the synergistic processors (Cell accelerator cores). We used parent–child supernode amalgamation to increase sizes of the blocks, but we noticed that the existence of many small blocks cause significant performance degradation. To reduce the overhead of small blocks, we extend the block fan-out algorithm such that small blocks are aggregated into large blocks without adding extra zeros. We also use another type of amalgamation that can merge any two consecutive supernodes and use it to avoid having very small blocks in a composed block. The suggested block aggregation method is able to speedup the whole LP solver of up to 2.5 when compared to using parent–child supernode amalgamation alone.  相似文献   

16.
We propose block ILU (incomplete LU) factorization preconditioners for a nonsymmetric block-tridiagonal M-matrix whose computation can be done in parallel based on matrix blocks. Some theoretical properties for these block ILU factorization preconditioners are studied and then we describe how to construct them effectively for a special type of matrix. We also discuss a parallelization of the preconditioner solver step used in nonstationary iterative methods with the block ILU preconditioners. Numerical results of the right preconditioned BiCGSTAB method using the block ILU preconditioners are compared with those of the right preconditioned BiCGSTAB using a standard ILU factorization preconditioner to see how effective the block ILU preconditioners are.  相似文献   

17.
We present algorithms to determine the number of nonzeros in each row and column of the factors of a sparse matrix, for both the QR factorization and the LU factorization with partial pivoting. The algorithms use only the nonzero structure of the input matrix, and run in time nearly linear in the number of nonzeros in that matrix. They may be used to set up data structures or schedule parallel operations in advance of the numerical factorization.The row and column counts we compute are upper bounds on the actual counts. If the input matrix is strong Hall and there is no coincidental numerical cancellation, the counts are exact for QR factorization and are the tightest bounds possible for LU factorization.These algorithms are based on our earlier work on computing row and column counts for sparse Cholesky factorization, plus an efficient method to compute the column elimination tree of a sparse matrix without explicitly forming the product of the matrix and its transpose.This revised version was published online in October 2005 with corrections to the Cover Date.  相似文献   

18.
We describe the design and implementation of a parallel QR decomposition algorithm for a large sparse matrix A . The algorithm is based on the multifrontal approach and makes use of Householder transformations. The tasks are distributed among processors according to an assembly tree which is built from the symbolic factorization of the matrix A T A . We first address uniprocessor issues and then discuss the multiprocessor implementation of the method. We consider the parallelization of both the factorization phase and the solve phase. We use relaxation of the sparsity structure of both the original matrix and the frontal matrices to improve the performance. We show that, in this case, the use of Level 3 BLAS can lead to very significant gains in performance. We use the eight processor Alliant˜FX/80 at CERFACS to illustrate our discussion.  相似文献   

19.
Least squares problems occur in many branches of science. Typicallythere may be a large number of data points or observations andonly a small to moderate number of variables. On sequentialmachines these problems can be time-consuming and thereforethe use of parallel machines to solve large least-squares problemsmay well yield substantial savings. The solution of least-squaresproblems by a QR factorization using Givens rotations seemsto be particularly suitable for a parallel machine, becausethere is much choice in the order of the Givens rotations andmany Givens rotations can be carried out in parallel. In this paper, an implementation of a QR factorization on theIntel hypercube is described. Each row of the least-squaresmatrix is assigned to a processor and most of the rotationsinvolve rows within one processor in the usual case when eachprocessor receives several rows. However, it is also necessaryto carry out rotations involving rows in different processorsand we call these rotations merges. Two ways of implementingthe merges are described and they are compared on the groundsof load balance and the number of communications required. Onefeature of the implementations is that processors can continueto do Givens rotations on rows within the processor while waitingfor messages that are required for merges. There is also someflexibility in the order of the merges and this can be incorporatedinto the algorithm. For each column, the merges are carriedout according to a tree structure and the choices of trees andtheir roots are discussed. Numerical results are given to showthe usefulness and efficiency of the proposed algorithms.  相似文献   

20.
We propose a new inertia‐revealing factorization for sparse symmetric matrices. The factorization scheme and the method for extracting the inertia from it were proposed in the 1960s for dense, banded, or tridiagonal matrices, but they have been abandoned in favor of faster methods. We show that this scheme can be applied to any sparse symmetric matrix and that the fill in the factorization is bounded by the fill in the sparse QR factorization of the same matrix (but is usually much smaller). We describe our serial proof‐of‐concept implementation and present experimental results, studying the method's numerical stability and performance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号