Similar literature
1.
A one-sided Jacobi hyperbolic singular value decomposition (HSVD) algorithm, using a massively parallel graphics processing unit (GPU), is developed. The algorithm also serves as the final stage of solving a symmetric indefinite eigenvalue problem. Numerical testing demonstrates the gains in speed and accuracy over sequential and MPI-parallelized variants of similar Jacobi-type HSVD algorithms. Finally, possibilities of hybrid CPU–GPU parallelism are discussed.

2.
David E. Keyes 《PAMM》2007,7(1):1026401-1026402
Towards Optimal Petascale Simulations (TOPS) is a scalable solver software project based on domain decomposed parallelization to research, implement, and support in collaborations with users an open-source package for large-scale discretized PDE problems. Optimal complexity methods, such as multigrid/multilevel preconditioners, keep the time spent in dominant algebraic kernels close to linear in discrete problem size as the applications scale on massively parallel computers. Krylov accelerators and Jacobian-free variants of Newton's method, as appropriate, are wrapped around the multilevel methods to deliver robustness in multirate, multiscale coupled systems, which are solved either implicitly or in more traditional forms of operator splitting. The TOPS software framework is being extended beyond direct computational simulation to computational optimization, including design, control, and inverse problems. We outline and illustrate the philosophy of TOPS. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

3.
The article develops a hybrid variational Bayes (VB) algorithm that combines the mean-field and stochastic linear regression fixed-form VB methods. The new estimation algorithm can be used to approximate any posterior without relying on conjugate priors. We propose a divide and recombine strategy for the analysis of large datasets, which partitions a large dataset into smaller subsets and then combines the variational distributions that have been learned in parallel on each separate subset using the hybrid VB algorithm. We also describe an efficient model selection strategy using cross-validation, which is straightforward to implement as a by-product of the parallel run. The proposed method is applied to fitting generalized linear mixed models. The computational efficiency of the parallel and hybrid VB algorithm is demonstrated on several simulated and real datasets. Supplementary material for this article is available online.

4.
This paper discusses the massively parallel solution of linear network programs. It integrates the general algorithmic framework of proximal minimization with D-functions (PMD) with primal-dual row-action algorithms. Three alternative algorithmic schemes are studied: quadratic proximal point, entropic proximal point, and least 2-norm perturbations. Each solves a linear network problem by solving a sequence of nonlinear approximations. The nonlinear subproblems decompose for massively parallel computing. The three algorithms are implemented on a Connection Machine CM-2 with up to 32K processing elements, and problems with up to 16 million variables are solved. A comparison of the three algorithms establishes their relative efficiency. Numerical experiments also establish the best internal tactics which can be used when implementing proximal minimization algorithms. Finally, the new algorithms are compared with an implementation of the network simplex algorithm executing on a CRAY Y-MP vector supercomputer.

5.
We propose a massively parallelizable algorithm for the classical assignment problem. The algorithm operates like an auction whereby unassigned persons bid simultaneously for objects thereby raising their prices. Once all bids are in, objects are awarded to the highest bidder. The algorithm can also be interpreted as a Jacobi-like relaxation method for solving a dual problem. Its (sequential) worst-case complexity, for a particular implementation that uses scaling, is O(NA log(NC)), where N is the number of persons, A is the number of pairs of persons and objects that can be assigned to each other, and C is the maximum absolute object value. Computational results show that, for large problems, the algorithm is competitive with existing methods even without the benefit of parallelism. When executed on a parallel machine, the algorithm exhibits substantial speedup. Work supported by Grant NSF-ECS-8217668. Thanks are due to J. Kennington and L. Hatay of Southern Methodist Univ. for contributing some of their computational experience.
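The bidding and awarding rounds described in this abstract can be illustrated with a minimal sketch. It assumes a dense square value matrix and a fixed bid increment `eps`, and it omits the scaling that the quoted complexity bound relies on. The function name `auction_assignment` and its parameters are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def auction_assignment(values, eps=None):
    """Jacobi-style auction sketch for a dense n x n assignment problem.

    values[i][j] is the value of object j to person i (hypothetical layout).
    Returns (assigned, prices) where assigned[i] is the object held by person i.
    """
    n = len(values)
    if eps is None:
        # Assumption: integer values, so eps < 1/n is enough for optimality.
        eps = 1.0 / (n + 1)
    prices = [0.0] * n
    owner = [None] * n      # owner[j]: person currently holding object j
    assigned = [None] * n   # assigned[i]: object currently held by person i

    while any(obj is None for obj in assigned):
        # Bidding phase: all unassigned persons bid using the same prices.
        bids = {}  # object -> (highest bid so far, bidder)
        for i in range(n):
            if assigned[i] is not None:
                continue
            net = [values[i][j] - prices[j] for j in range(n)]
            best = max(range(n), key=net.__getitem__)
            second = max((net[j] for j in range(n) if j != best),
                         default=net[best])
            bid = prices[best] + (net[best] - second) + eps
            if best not in bids or bid > bids[best][0]:
                bids[best] = (bid, i)

        # Assignment phase: each object goes to its highest bidder.
        for j, (bid, i) in bids.items():
            if owner[j] is not None:
                assigned[owner[j]] = None  # previous holder is displaced
            owner[j] = i
            assigned[i] = j
            prices[j] = bid
    return assigned, prices
```

For example, `auction_assignment([[10, 5, 8], [7, 9, 6], [8, 7, 4]])` assigns objects 2, 1 and 0 to persons 0, 1 and 2, for a total value of 25. The bidding loop over persons is the part that could run in parallel, since every bid in a round reads the same price vector.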

6.
The purpose of this study is to describe a data parallel primal-dual augmenting path algorithm for the dense linear many-to-one assignment problem also known as semi-assignment. This problem could for instance be described as assigning n persons to m (< n) job groups. The algorithm is tailored specifically for massive SIMD parallelism and employs, in this context, a new efficient breadth-first-search augmenting path technique which is found to be faster than the shortest augmenting path search normally used in sequential algorithms for this problem. We show that the best known sequential computational complexity of O(mn^2) for dense problems is reduced to the parallel complexity of O(mn) on a machine with n processors supporting reductions in O(1) time. The algorithm is easy to implement efficiently on commercially available massively parallel computers. A range of numerical experiments are performed on a Connection Machine CM200 and a MasPar MP-2. The tests show the good performance of the proposed algorithm.

7.
Dessole M., Marcuzzi F. 《Numerical Algorithms》2021,86(3):1243-1263

In this paper, we present PARASOF, an algorithm for the solution of linear systems with BABD matrices on massively parallel computing systems such as graphics processing units (GPUs). The algorithm is compared with state-of-the-art algorithms, in particular SOF, which inspired it and whose stability properties it shares. We detail its design and implementation issues and give the main figures of its theoretical and experimental performance.


8.
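A discrete artificial bee colony algorithm for the hybrid flow shop scheduling problem   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper presents a discrete artificial bee colony algorithm (HDABC) for the hybrid flow shop (HFS) scheduling problem. A job-permutation encoding is used, and four neighborhood structures are designed. Employed bees are dispatched in turn to each solution in the solution set and carry out the exploitation search with a local search strategy that incorporates problem-specific features. Onlooker bees randomly select two solutions and take the better one as the current solution for a further search. Scout bees use three strategies to escape local minima. Experiments on 34 HFS problems with identical parallel machines and 2 real-world HFS scheduling problems with heterogeneous parallel machines, together with comparisons against representative algorithms from the current literature, show that the proposed algorithm performs well in both computation time and solution quality.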

9.
A new heuristic algorithm to perform tabu search on the Quadratic Assignment Problem (QAP) is developed. A massively parallel implementation of the algorithm on the Connection Machine CM-2 is provided. The implementation uses n^2 processors, where n is the size of the problem. The elements of the algorithm, called Par_tabu, include dynamically changing tabu list sizes, aspiration criterion and long term memory. A new intensification strategy based on intermediate term memory is proposed and shown to be promising especially while solving large QAPs. The combination of all these elements gives a very efficient heuristic for the QAP: the best known or improved solutions are obtained in a significantly smaller number of iterations than in other comparative studies. Combined with the implementation on CM-2, this approach provides suboptimal solutions to QAPs of bigger dimensions in reasonable time.

10.
Parallel preconditioned conjugate gradient algorithm on GPU   Cited by: 1 (self-citations: 0, citations by others: 1)
We propose a parallel implementation of the Preconditioned Conjugate Gradient algorithm on a GPU platform. The preconditioning matrix is an approximate inverse derived from the SSOR preconditioner. Used through sparse matrix–vector multiplication, the proposed preconditioner is well suited for the massively parallel GPU architecture. As compared to CPU implementation of the conjugate gradient algorithm, our GPU preconditioned conjugate gradient implementation is up to 10 times faster (8 times faster at worst).
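As a rough illustration of why an approximate inverse suits this setting, the sketch below runs preconditioned conjugate gradients in plain Python with a preconditioner that is applied by multiplication only. It assumes one particular construction that the abstract does not spell out: the SSOR matrix with relaxation parameter 1, M = (D + L) D^{-1} (D + L^T), whose triangular inverses are replaced by first-order Neumann truncations, giving M^{-1} ≈ K^T D K with K = (I - D^{-1} L) D^{-1}. The names `ssor_approx_inverse` and `pcg` are hypothetical, and dense lists stand in for the sparse GPU kernels.

```python
def ssor_approx_inverse(A):
    """Return a function r -> K^T D K r for symmetric A = L + D + L^T.

    Assumption: first-order Neumann truncation of the SSOR inverse, omega = 1.
    """
    n = len(A)
    d = [A[i][i] for i in range(n)]

    def apply(r):
        t = [r[i] / d[i] for i in range(n)]
        # u = (I - D^{-1} L) t uses the strictly lower triangle of A.
        u = [t[i] - sum(A[i][j] * t[j] for j in range(i)) / d[i]
             for i in range(n)]
        # z = (I - D^{-1} L^T) u uses the strictly upper triangle of A.
        return [u[i] - sum(A[i][j] * u[j] for j in range(i + 1, n)) / d[i]
                for i in range(n)]

    return apply


def pcg(A, b, apply_minv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for a symmetric positive definite A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    x = [0.0] * n
    r = list(b)                 # residual for the zero initial guess
    z = apply_minv(r)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Because the preconditioner involves only products with parts of A and divisions by the diagonal, no sequential triangular solve is needed. That is the property the abstract points to when it calls the preconditioner well suited for GPUs.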

11.
A hybrid algorithm for computing the determinant of a matrix whose entries are polynomials is presented. It is based on the dimension-decreasing algorithm [22] and the parallel algorithm for computing a symbolic determinant of [19]. First, through the dimension-decreasing algorithm, a given multivariate matrix can be converted to a bivariate matrix. Then, the parallel algorithm can be applied to effectively compute the determinant of the bivariate matrix. Experimental results show that the new algorithm not only greatly reduces the intermediate expression swell in the process of symbolic computation, but also achieves a higher degree of parallelism than the parallel algorithm of [19] used alone.

12.
In this paper, we study efficient parallel implementation for hybrid iterative methods BiCGSTAB and BiCGSTAB(ℓ) with ℓ = 2 on the CRAY C90, and the efficiency of their parallel performance is evaluated. Numerical experiments suggest that on the CRAY C90 a parallel inner product algorithm called PDOTB be used for the parallelization of hybrid iterative methods containing sensitive values of inner products. Lastly, it is shown that the number of iterations in which parallel hybrid iterative methods satisfy a certain convergence criterion depends on the number of processors to be used.

13.
This paper describes several massively parallel implementations for a global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT’s design challenges imposed by memory requirements and data dependency. Three design aspects in topology, data structures, and task allocation are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200 processor cluster.

14.
This paper presents a parallel hybrid exact multi-objective approach which combines two metaheuristics – a genetic algorithm (GA) and a memetic algorithm (MA), with an exact method – a branch and bound (B&B) algorithm. This approach profits from the exploration power of the GA, the intensification capability of the MA, and the ability of the B&B to provide optimal solutions with proof of optimality. To fully exploit the resources of a computational grid, the hybrid method is parallelized according to three well-known parallel models – the island model for the GA, the multi-start model for the MA and the parallel tree exploration model for the B&B. The resulting method has been tested and validated on a bi-objective flow-shop scheduling problem. The approach made it possible to solve exactly, for the first time, an instance of the problem with 50 jobs on 5 machines. More than 400 processors belonging to 4 different administrative domains contributed to the resolution process for more than 6 days.

15.
Flow in a multiaquifer porous system can be simulated by the so-called “quasi three-dimensional” models. When heterogeneous and/or aquitards with nonlinear hydrogeologic behavior are considered, a fully numerical approach is required for the model solution. If the finite element method is used to integrate the partial differential flow equations, the final solution of large systems is required. In the present article, an original iterative solution strategy is developed. The global system is decoupled into a number of smaller subsystems consistent with the geologic structure (aquitards and aquifers) of the multiaquifer system. The aquifer and the aquitard equations are solved separately with the modified conjugate gradient and the Thomas algorithms, respectively, while the final coupled solution is obtained with an iterative procedure equivalent to a Block Jacobi scheme. The procedure can be efficiently implemented on a parallel super-computer distributing the computational load, so that two successive blocks (related to an aquifer and the underlying aquitard) are solved on the same processor. The procedure is analyzed with linear porous media, where the convergence is theoretically ensured. The results obtained with a realistic linear multiaquifer system, employing a massively parallel computer like the CRAY T3D, have pointed out the high degree of parallelization of the algorithm. Comparison with the parallel implementation of the Block SOR and Block Gauss-Seidel schemes shows that parallel Block Jacobi performs significantly better with a reduction of the elapsed times, which depends on the rate of leakage between neighboring aquifers. © 1997 John Wiley & Sons, Inc.
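The decoupling into subsystems that are solved separately and then coupled by an outer iteration can be sketched as a generic Block Jacobi loop. The sketch is an assumption-laden stand-in: it uses a small dense matrix and Gaussian elimination for every block, whereas the abstract uses a modified conjugate gradient method for aquifer blocks and the Thomas algorithm for aquitard blocks. The names `solve_dense` and `block_jacobi` are hypothetical.

```python
def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for one small dense block."""
    n = len(rhs)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for k in range(n):
        piv = max(range(k, n), key=lambda row: abs(aug[row][k]))
        aug[k], aug[piv] = aug[piv], aug[k]
        for row in range(k + 1, n):
            f = aug[row][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[row][col] -= f * aug[k][col]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def block_jacobi(A, b, blocks, sweeps=200, tol=1e-12):
    """Block Jacobi sketch: blocks is a list of index lists partitioning 0..n-1."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_new = [0.0] * n
        # Each block uses only the previous iterate, so the block solves
        # are independent and could be distributed over processors.
        for idx in blocks:
            inside = set(idx)
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n)
                              if j not in inside)
                   for i in idx]
            sub = [[A[i][j] for j in idx] for i in idx]
            for i, xi in zip(idx, solve_dense(sub, rhs)):
                x_new[i] = xi
        change = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if change <= tol:
            break
    return x
```

In this sketch the coupling between blocks enters only through the right-hand side, which loosely mirrors the abstract's remark that performance depends on the rate of leakage between neighboring aquifers: weaker coupling means fewer outer sweeps.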

16.
An efficient parallel algorithm for the time dependent incompressible Navier–Stokes equations is developed in this paper. The time discretization is based on a direction splitting method which only requires solving a sequence of one-dimensional Poisson type equations at each time step. Then, a spectral-element method is used to approximate these one-dimensional problems. A Schur-complement approach is used to decouple the computation of interface nodes from that of interior nodes, allowing an efficient parallel implementation. The unconditional stability of the fully discretized scheme is rigorously proved for the two-dimensional case. Numerical results are presented to show that this algorithm retains the same order of accuracy as the usual spectral-element projection-type schemes but is much more efficient, particularly on massively parallel computers.

17.
An iterative algorithm for the numerical solution of the Helmholtz problem is considered. It is difficult to solve the problem numerically, in particular, when the imaginary part of the wave number is zero or small. We develop a parallel iterative algorithm based on a rational iteration and a nonoverlapping domain decomposition method for such a non-Hermitian, non-coercive problem. Algorithm parameters (artificial damping and relaxation) are introduced to accelerate the convergence speed of the iteration. Convergence analysis and effective strategies for finding efficient algorithm parameters are presented. Numerical results carried out on an nCUBE2 are given to show the efficiency of the algorithm. To reduce the boundary reflection, we employ a hybrid absorbing boundary condition (ABC) which combines the first-order ABC and the physical $Q$ ABC. Computational results comparing the hybrid ABC with non-hybrid ones are presented. Received May 19, 1994 / Revised version received March 25, 1997

18.
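To improve the material-flow balance of production lines and tighten the timing between stages, and to extend general reentrant flexible flow shop scheduling theory, this work studies the dynamic reentrant flexible flow shop problem with unrelated parallel machines at each stage, with the objective of minimizing the total weighted completion time. The processing time of a job at each stage depends on the machine that processes it. Because the problem is NP-hard, an integer programming model is first established. Next, a cell-matrix encoding scheme is designed, and a hybrid algorithm that combines a discrete artificial bee colony algorithm with a genetic algorithm is proposed to obtain near-optimal solutions. Finally, to evaluate the performance of the hybrid algorithm, it is compared with several metaheuristic algorithms on problems of different sizes. The experimental results demonstrate the effectiveness of the proposed algorithm.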

19.
This work studies the build-up method for the global minimization problem for molecular conformation, especially protein folding. The problem is hard to solve for large molecules using general minimization approaches because of the enormous amount of required computation. We therefore propose a build-up process to systematically construct the optimal molecular structures. A prototype algorithm is designed using the anisotropic effective energy simulated annealing method at each build-up stage. The algorithm has been implemented on the Intel iPSC/860 parallel computer, and tested with the Lennard-Jones microcluster conformation problem. The experiments showed that the algorithm was effective for relatively large test problems, and also very suitable for massively parallel computation. In particular, for the 72-atom Lennard-Jones microcluster, the algorithm found a structure whose energy is lower than any others found in previous studies.
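For orientation, the objective in the Lennard-Jones microcluster test problem is a sum of pair energies. The sketch below evaluates it in the standard reduced-unit form, 4 * sum over pairs of (r^-12 - r^-6). The abstract does not state the formula or the units used, so this form is an assumption. The function name `lennard_jones_energy` is hypothetical, and the build-up and simulated annealing steps are not shown.

```python
def lennard_jones_energy(coords):
    """Total Lennard-Jones energy of a cluster in reduced units.

    Assumption: epsilon = sigma = 1, so each pair contributes
    4 * (r**-12 - r**-6) with a minimum of -1 at r = 2**(1/6).
    coords is a list of (x, y, z) tuples.
    """
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            # Squared distance avoids an unnecessary square root.
            r2 = sum((a - c) ** 2 for a, c in zip(coords[i], coords[j]))
            inv6 = 1.0 / r2 ** 3
            energy += 4.0 * (inv6 * inv6 - inv6)
    return energy
```

Every pair term is independent of the others, so the double loop is the natural place to parallelize, and its quadratic cost in the number of atoms is one reason why larger clusters need so much computation.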
