Similar Literature
1.
Parallel branch, cut, and price for large-scale discrete optimization   (Total citations: 2; self-citations: 0; citations by others: 2)
In discrete optimization, most exact solution approaches are based on branch and bound, which is conceptually easy to parallelize in its simplest forms. More sophisticated variants, such as the so-called branch, cut, and price algorithms, are more difficult to parallelize because of the need to share large amounts of knowledge discovered during the search process. In the first part of the paper, we survey the issues involved in parallelizing such algorithms. We then review the implementation of SYMPHONY and COIN/BCP, two existing frameworks for implementing parallel branch, cut, and price. These frameworks have limited scalability, but are effective on small numbers of processors. Finally, we briefly describe our next-generation framework, which improves scalability and further abstracts many of the notions inherent in parallel BCP, making it possible to implement and parallelize more general classes of algorithms. Mathematics Subject Classification (1991): 65K05, 68N99, 68W10, 90-04, 90-08, 90C06, 90C09, 90C10, 90C11, 90C57
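The "simplest form" of branch and bound mentioned above can be sketched as a depth-first search on a 0/1 knapsack problem. The knapsack problem is our illustrative choice and does not come from the paper. A fractional-knapsack upper bound prunes subtrees. Each node left on the stack is an independent piece of work that a parallel version could hand to another processor, and the incumbent `best` is the shared knowledge such a version would have to communicate. Cut and price generation are not shown.

```python
from typing import List


def fractional_bound(vals: List[float], wts: List[float], capacity: float,
                     k: int, value: float, weight: float) -> float:
    """Upper bound for a node: fill the remaining room greedily, allowing one fractional item.
    Assumes the items are sorted by decreasing value/weight ratio."""
    bound, room = value, capacity - weight
    for v, w in zip(vals[k:], wts[k:]):
        if w <= room:
            room -= w
            bound += v
        else:
            bound += v * room / w
            break
    return bound


def knapsack_bb(values: List[float], weights: List[float], capacity: float) -> float:
    """Depth-first branch and bound for the 0/1 knapsack problem (weights must be positive)."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    vals = [values[i] for i in order]
    wts = [weights[i] for i in order]
    best = 0.0                      # incumbent: the knowledge a parallel version must share
    stack = [(0, 0.0, 0.0)]         # open nodes: (next item index, value so far, weight so far)
    while stack:
        k, value, weight = stack.pop()
        if weight > capacity:
            continue                # infeasible node
        best = max(best, value)
        if k == len(vals) or fractional_bound(vals, wts, capacity, k, value, weight) <= best:
            continue                # leaf, or pruned: this subtree cannot beat the incumbent
        stack.append((k + 1, value, weight))                     # branch: skip item k
        stack.append((k + 1, value + vals[k], weight + wts[k]))  # branch: take item k
    return best


print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))  # 220.0
```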

2.
We present a general framework for multidimensional classification that captures the pairwise interactions between class variables. The pairwise class interactions are encoded using a collection of base classifiers (Phase 1), whose class predictions are combined in a Markov random field that is subsequently used for multidimensional inference (Phase 2); thus, the framework can be positioned between multilabel Bayesian classifiers and label transformation-based approaches. Our proposal leads to a general framework supporting a wide range of base classifiers in the first phase as well as different inference methods in the second phase. We describe the basic framework and its main properties, as well as strategies for ensuring the scalability of the framework. We include a detailed experimental evaluation based on a range of publicly available databases, in which we analyze the overall performance of the framework and test the behavior of the different scalability strategies proposed. A comparison with other state-of-the-art multidimensional classifiers shows that the proposed framework either outperforms or is competitive with the tested straw-man methods.
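Phase 2 can be illustrated with a rough sketch under several assumptions. Each pairwise base classifier is taken to output a probability table over the joint labels of its two class variables. The logs of those probabilities serve as pairwise potentials. The jointly most probable assignment is found by exhaustive enumeration. The table format, the helper names, and the brute-force search are our assumptions, not the paper's method. Enumeration grows exponentially with the number of class variables, which is why scalability strategies for inference matter.

```python
import itertools
import math
from typing import Dict, List, Tuple

# Hypothetical Phase-1 output: for each pair of class variables (i, j), a table mapping
# (label of i, label of j) -> probability predicted by that pair's base classifier.
PairTables = Dict[Tuple[int, int], Dict[Tuple[int, int], float]]


def map_inference(num_labels: List[int], tables: PairTables) -> Tuple[int, ...]:
    """Most probable joint labeling under a pairwise MRF with log-probability potentials."""
    best, best_score = None, -math.inf
    for labels in itertools.product(*(range(k) for k in num_labels)):
        score = sum(math.log(table[(labels[i], labels[j])]) for (i, j), table in tables.items())
        if score > best_score:
            best, best_score = labels, score
    return best


def peaked(favorite: Tuple[int, int]) -> Dict[Tuple[int, int], float]:
    """Toy table for two binary variables: 0.7 on one joint label, 0.1 on the other three."""
    return {(a, b): 0.7 if (a, b) == favorite else 0.1 for a in (0, 1) for b in (0, 1)}


tables = {(0, 1): peaked((0, 1)), (1, 2): peaked((1, 1)), (0, 2): peaked((0, 1))}
print(map_inference([2, 2, 2], tables))  # (0, 1, 1)
```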

3.
This paper presents a detailed analysis of the scalability and parallelization of Local Search algorithms for constraint-based and SAT (Boolean satisfiability) solvers. We propose a framework to estimate the parallel performance of a given algorithm by analyzing the runtime behavior of its sequential version. By approximating the runtime distribution of the sequential process with statistical methods, the runtime behavior of the parallel process can be predicted by a model based on order statistics. We apply this approach to study the parallel performance of a constraint-based Local Search solver (Adaptive Search), two SAT Local Search solvers (namely Sparrow and CCASAT), and a propagation-based constraint solver (Gecode, with a random labeling heuristic). We compare the performance predicted by our model to actual parallel implementations of those methods using up to 384 processes, and show that the model is accurate, predicting performance close to the empirical data. Moreover, across the different types of problems studied, the tested solvers exhibit different behaviors, and their runtime distributions can be approximated by two types of distributions: exponential (shifted and non-shifted) and lognormal. Our results show that the proposed framework estimates the runtime of the parallel algorithm with an average discrepancy of 21% w.r.t. the empirical data across all the experiments with the maximum allowed number of processors for each technique.
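The order-statistics argument can be sketched in a few lines. Assume independent multi-walk parallelism: n copies of a randomized solver run at once, and the first to finish wins. The parallel runtime is then the minimum of n i.i.d. samples of the sequential runtime, with expectation ∫₀^∞ (1 − F(t))ⁿ dt. For a shifted exponential with shift x₀ and rate λ this equals x₀ + 1/(nλ). The sketch evaluates the integral numerically for any CDF. The parameter values are made up, and no distribution fitting is performed.

```python
import math
from typing import Callable

CDF = Callable[[float], float]


def expected_min_runtime(cdf: CDF, n: int, t_max: float, steps: int = 100_000) -> float:
    """E[min of n i.i.d. nonnegative runtimes] = integral_0^inf (1 - F(t))^n dt,
    truncated at t_max and evaluated with the trapezoid rule."""
    h = t_max / steps
    total = 0.5 * ((1 - cdf(0.0)) ** n + (1 - cdf(t_max)) ** n)
    total += sum((1 - cdf(k * h)) ** n for k in range(1, steps))
    return total * h


def shifted_exponential(x0: float, lam: float) -> CDF:
    """CDF of x0 + Exponential(lam); x0 = 0 is the non-shifted case."""
    return lambda t: 0.0 if t < x0 else 1.0 - math.exp(-lam * (t - x0))


def lognormal(mu: float, sigma: float) -> CDF:
    """CDF of exp(N(mu, sigma^2))."""
    return lambda t: 0.0 if t <= 0 else 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


F = shifted_exponential(x0=1.0, lam=0.5)    # made-up parameters
t1 = expected_min_runtime(F, 1, t_max=60.0)
for n in (8, 64):
    tn = expected_min_runtime(F, n, t_max=60.0)
    print(n, round(tn, 4), round(t1 / tn, 2))  # closed form: E[min] = x0 + 1/(n * lam)
print(round(expected_min_runtime(lognormal(0.0, 0.5), 16, t_max=30.0), 4))
```

In the shifted case, the predicted speedup saturates at (x₀ + 1/λ)/x₀ as n grows. The shift therefore caps what adding processors can buy.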

4.
A parallel system consists of a parallel algorithm and a parallel machine that supports the implementation of the algorithm. The scalability of a parallel system is a measure of its capability to increase speedup in proportion to the number of processors, or its capability to keep a constant efficiency as the number of processors increases. The present paper investigates the average-case scalability of parallel algorithms executing on multicomputers with symmetric static networks, including the completely connected network, ring, hypercube, and torus. In particular, we characterize the communication overhead such that the expected efficiency can be kept at a certain constant level with the number of tasks growing at the rate Θ(P log P).
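The efficiency argument can be made concrete with a simple cost model. Let T_o be the total overhead summed over all P processors. Then P·T_P = W + T_o, so E = W / (W + T_o). Assume, purely for illustration, an overhead that grows like c·P·log₂P. Keeping E constant then requires W to grow as Θ(P log P), in line with the growth rate above, while a fixed W sees E decay. The overhead model and its constants are assumptions, not the paper's average-case analysis.

```python
import math
from typing import Callable


def efficiency(work: float, p: int, overhead: Callable[[int], float]) -> float:
    """E = W / (p * T_p), using p * T_p = W + T_o(p) (useful work plus total overhead on all processors)."""
    return work / (work + overhead(p))


def hypothetical_overhead(p: int, c: float = 3.0) -> float:
    """Assumed total communication overhead, growing like c * p * log2(p)."""
    return c * p * math.log2(p)


for p in (2, 16, 256):
    fixed = efficiency(1000.0, p, hypothetical_overhead)                    # fixed W: efficiency decays
    scaled = efficiency(5.0 * p * math.log2(p), p, hypothetical_overhead)   # W = Theta(p log p)
    print(p, round(fixed, 3), round(scaled, 3))  # scaled column stays at 5 / (5 + 3) = 0.625
```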

5.
Factorized sparse approximate inverse (FSAI) preconditioners are robust algorithms for symmetric positive definite matrices, which are particularly attractive in a parallel computational environment because of their inherent and almost perfect scalability. Their degree of parallelism even exceeds what current computational architectures can actually exploit. In this work, we present two new approaches for FSAI preconditioners with the aim of improving the algorithm's effectiveness by adding some sequentiality to the native formulation. The first one, denoted as block tridiagonal FSAI, is based on a block tridiagonal factorization strategy, whereas the second one, domain decomposition FSAI, is built by reordering the matrix graph according to a multilevel k-way partitioning method followed by a bandwidth minimization algorithm. We test these preconditioners by solving a set of symmetric positive definite problems arising from different engineering applications. The results are evaluated in terms of performance, scalability, and robustness, showing that both strategies lead to faster-converging schemes, in both the number of iterations and total computational time, compared with the native FSAI, with no significant loss in the algorithmic degree of parallelism.
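The parallelism of the native FSAI comes from its row-by-row construction, sketched here in dense toy form. For each row i, solve a small SPD system restricted to the allowed sparsity pattern, then scale the result so that G A Gᵀ has a unit diagonal. Every row is independent, so rows can be assigned to different processors. The dense storage, the chosen pattern, and the helper names are illustrative assumptions. The block tridiagonal and domain decomposition variants of the paper are not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for the small per-row systems."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fsai(a: Matrix, pattern: Sequence[Sequence[int]]) -> Matrix:
    """Lower-triangular G with diag(G A G^T) = 1. pattern[i] lists the allowed columns of row i
    (all <= i, including i). Loop iterations are independent, so rows can be built in parallel."""
    n = len(a)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s = sorted(pattern[i])
        sub = [[a[r][c] for c in s] for r in s]          # A restricted to the pattern of row i
        y = solve(sub, [1.0 if j == i else 0.0 for j in s])
        scale = 1.0 / math.sqrt(y[s.index(i)])           # y_i = e_i^T A_SS^{-1} e_i > 0 for SPD A
        for j, val in zip(s, y):
            g[i][j] = val * scale
    return g


def g_a_gt(g: Matrix, a: Matrix) -> Matrix:
    """Preconditioned matrix G A G^T (dense, for checking)."""
    n = len(a)
    ga = [[sum(g[i][k] * a[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(ga[i][k] * g[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
G = fsai(A, [[0], [0, 1], [1, 2]])   # pattern = lower triangle of A
print([round(g_a_gt(G, A)[i][i], 12) for i in range(3)])  # unit diagonal
```

With a full lower-triangular pattern, the same construction reproduces the exact inverse Cholesky factor, so G A Gᵀ = I. Sparser patterns trade accuracy for cheaper, still independent, row solves.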

6.
Bioinformatics is experiencing a rapid and overwhelming accumulation of molecular sequence data, predominantly driven by novel wet-lab sequencing techniques. This trend poses scalability challenges for tool developers. In the field of phylogenetic inference (reconstruction of evolutionary trees from molecular sequence data), scalability is becoming an increasingly important issue for operations other than the tree reconstruction itself. In this paper we focus on post-analysis tasks in reconstructing very large trees, specifically the step of building (extended) majority rule consensus trees from a collection of equally plausible trees or a collection of bootstrap replicate trees. To this end we present non-parallel optimizations that establish our implementation as the fastest exact implementation in phylogenetics, together with novel parallelized routines that are the first of their kind. Our non-parallel optimizations improve performance by a factor of 50 compared with the previous version of our code, and we achieve a maximum speedup of 5.5 on an 8-core Nehalem node for building consensus trees comprising up to 55,000 organisms. We also present a parallel approach for drawing bootstrap support values on a candidate tree, and experimentally assess our approach in order to better understand read-only versus read-write parallel hash table accesses on multi-core systems.
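At its core, a majority-rule consensus counts bipartitions (splits) across trees in a hash table and keeps those that appear in more than half of them. The sketch assumes the trees have already been reduced to lists of non-trivial splits, with Newick parsing omitted. Each split is stored by the side that does not contain a fixed reference taxon, so that A|B and B|A hash identically. Only the plain (not extended) majority rule is shown, with none of the paper's optimizations.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Split = FrozenSet[str]


def canonical(side: Iterable[str], taxa: FrozenSet[str], ref: str) -> Split:
    """Represent a bipartition by the side that does not contain the reference taxon."""
    side = frozenset(side)
    return taxa - side if ref in side else side


def majority_rule(trees: List[List[Iterable[str]]], taxa: Iterable[str]) -> List[Split]:
    """Splits present in more than half of the input trees (each tree given as a list of splits)."""
    all_taxa = frozenset(taxa)
    ref = min(all_taxa)
    counts: Counter = Counter()
    for splits in trees:
        # a set, so each split is counted at most once per tree
        counts.update({canonical(s, all_taxa, ref) for s in splits})
    return sorted((s for s, k in counts.items() if k > len(trees) / 2),
                  key=lambda s: (len(s), sorted(s)))


trees = [[{"A", "B"}, {"D", "E"}],   # ((A,B),C,(D,E))
         [{"A", "B"}, {"C", "D"}],   # ((A,B),E,(C,D))
         [{"A", "C"}, {"D", "E"}]]   # ((A,C),B,(D,E))
print(majority_rule(trees, "ABCDE"))
```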

7.
General sparse hybrid solvers are commonly used kernels for solving a wide range of scientific and engineering problems. This work addresses the current problems of efficiently solving general sparse linear equations with direct/iterative hybrid solvers on many-core distributed clusters. We briefly discuss the solution stages of the Maphys, HIPS, and PDSLin hybrid solvers for large sparse linear systems, along with their major algorithmic differences. In this category of solvers, different methods with sophisticated preconditioning algorithms have been suggested to resolve the trade-off between memory and convergence. Such solutions require a hierarchical level of parallelism that is better suited to modern supercomputers, allowing them to scale to thousands of processors using a Schur complement framework. We study the effect of reordering and analyze the performance, scalability, and memory usage of each solve phase of the PDSLin, Maphys, and HIPS hybrid solvers using a large set of challenging matrices arising from different actual applications, and compare the results with the SuperLU_DIST direct solver. We specifically focus on the level of parallel mechanisms used by the hybrid solvers and its effect on scalability. Tuning and Analysis Utilities (TAU) is employed to profile heap memory usage and to measure communication volume. The tests are run on high-performance, large-memory clusters using up to 512 processors.
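The Schur complement framework these hybrid solvers share can be shown in a minimal dense sketch. First eliminate the interior unknowns with a direct solve. Then form the interface Schur complement S = A_ΓΓ − A_ΓI A_II⁻¹ A_IΓ and solve for the interface unknowns. Finally, back-substitute for the interior. In a real hybrid solver, A_II is block diagonal, with one block per subdomain factored in parallel, and the interface system is solved iteratively with a preconditioner. Here both steps use a plain dense solver, and all helper names are ours.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_many(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B for a matrix right-hand side (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    m = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def schur_solve(a_ii: Matrix, a_ig: Matrix, a_gi: Matrix, a_gg: Matrix,
                f_i: List[float], f_g: List[float]) -> Tuple[List[float], List[float]]:
    """Solve [[A_II, A_IG], [A_GI, A_GG]] [x_I; x_G] = [f_I; f_G] through the interface Schur complement."""
    # 1. Interior solve: A_II^{-1} [A_IG | f_I] (per subdomain, in parallel, in a real solver)
    z = solve_many(a_ii, [row + [rhs] for row, rhs in zip(a_ig, f_i)])
    w = [row[:-1] for row in z]          # A_II^{-1} A_IG
    y = [row[-1] for row in z]           # A_II^{-1} f_I
    # 2. S = A_GG - A_GI A_II^{-1} A_IG and reduced right-hand side g = f_G - A_GI A_II^{-1} f_I
    agw = matmul(a_gi, w)
    s = [[a_gg[i][j] - agw[i][j] for j in range(len(f_g))] for i in range(len(f_g))]
    g = [[f_g[i] - sum(x * v for x, v in zip(a_gi[i], y))] for i in range(len(f_g))]
    # 3. Interface solve (iterative and preconditioned in hybrid solvers; direct here)
    x_g = [row[0] for row in solve_many(s, g)]
    # 4. Back-substitution: x_I = A_II^{-1} (f_I - A_IG x_G)
    x_i = [y[k] - sum(w[k][j] * x_g[j] for j in range(len(x_g))) for k in range(len(y))]
    return x_i, x_g


# 3x3 example: interior unknowns {0, 1}, interface unknown {2}; exact solution x = (1, 2, 3)
x_i, x_g = schur_solve([[4.0, 1.0], [1.0, 3.0]], [[1.0], [0.0]], [[1.0, 0.0]], [[2.0]], [9.0, 7.0], [7.0])
print(x_i, x_g)
```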

8.
This paper presents an application of parallel computing techniques to the solution of an important class of planning problems known as generalized networks. Three parallel primal simplex variants for solving generalized network problems are presented. Data structures used in a sequential generalized network code are briefly discussed, and their extension to a parallel implementation of one of the primal simplex variants is given. Computational testing of the sequential and parallel codes, both written in Fortran, was done on the CRYSTAL multicomputer at the University of Wisconsin, and the computational results are presented. Maximum efficiency occurred for multiperiod generalized network problems, where a speedup approximately linear in the number of processors was achieved. This research was supported in part by NSF grants DCR-8503148 and CCR-8709952 and by AFOSR grant AFOSR-86-0194.

9.
We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture and exploiting scalable and efficient parallel implementations of algorithms for ultra-high-precision linear algebra. Additionally, we have delineated and implemented the algorithmic and coding changes required to address problems several orders of magnitude larger, dealing with the limits of scalability from the memory footprint, computational efficiency, reliability, and interconnect perspectives.

10.
A new parallel coarsening algorithm for algebraic multigrid   (Total citations: 1; self-citations: 0; citations by others: 1)
Xu Xiaowen (徐小文), Mo Zeyao (莫则尧). Mathematica Numerica Sinica (计算数学), 2005, 27(3): 325-336
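In recent years, driven by large-scale scientific computing problems in practical applications, implementing algebraic multigrid (AMG) algorithms on large-scale parallel computers has become a research focus in numerical computing. For the classical AMG method, this paper proposes a new parallel grid coarsening algorithm, the multi-stage parallel RS algorithm (MPRS). We have integrated the new algorithm into Hypre, a high-performance preconditioner software package. Extensive numerical experiments show that the new algorithm is applicable to a wider range of problems and, compared with other parallel coarsening algorithms, markedly improves the scalability of parallel AMG computations. For the three-dimensional Poisson equation discretized by a 27-point finite-difference scheme, with 8 million unknowns solved by parallel AMG on 64 processors, the new algorithm reduces … by nearly 60 … compared with the RS3 algorithm. For the … three-dimensional Poisson equation with nearly 320,000 unknowns, solved by parallel AMG-GMRES on 16 processors, the new algorithm requires roughly half as many iterations as the other coarsening algorithms, demonstrating good algorithmic scalability.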
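For reference, here is the classical sequential Ruge–Stüben (RS) first-pass coarsening that parallel RS variants build on. Point i strongly depends on j when −a_ij ≥ θ·max_{k≠i}(−a_ik). Undecided points with the largest measure λ become C-points, and the points that strongly depend on them become F-points. The sketch uses dense toy storage. The value θ = 0.25 is a commonly used threshold but is an assumption here. The second pass and the MPRS parallel stages are not shown.

```python
from typing import List, Set


def strong_dependencies(a: List[List[float]], theta: float = 0.25) -> List[Set[int]]:
    """S[i] = {j : -a_ij >= theta * max_{k != i} (-a_ik)} (classical RS strength of connection)."""
    n = len(a)
    s = []
    for i in range(n):
        biggest = max((-a[i][k] for k in range(n) if k != i), default=0.0)
        s.append({j for j in range(n) if j != i and biggest > 0 and -a[i][j] >= theta * biggest})
    return s


def rs_first_pass(a: List[List[float]], theta: float = 0.25) -> str:
    """Return a C/F splitting as a string, one character per point."""
    n = len(a)
    s = strong_dependencies(a, theta)
    s_t = [set() for _ in range(n)]            # s_t[j] = points that strongly depend on j
    for i in range(n):
        for j in s[i]:
            s_t[j].add(i)
    lam = [len(s_t[j]) for j in range(n)]       # initial measure: number of points depending on j
    state = ["U"] * n                            # U = undecided, C = coarse, F = fine
    while "U" in state:
        j = max((k for k in range(n) if state[k] == "U"), key=lambda k: lam[k])
        state[j] = "C"
        for i in s_t[j]:                         # points depending on the new C-point become F
            if state[i] == "U":
                state[i] = "F"
                for k in s[i]:                   # their other strong neighbours become better C candidates
                    if state[k] == "U":
                        lam[k] += 1
        for k in s[j]:
            if state[k] == "U":
                lam[k] -= 1
    return "".join(state)


n = 7
laplace_1d = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
print(rs_first_pass(laplace_1d))  # FCFCFCF
```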

11.
V. Hernández, J. E. Roman, A. Tomás. PAMM, 2007, 7(1): 2020083-2020084
This work presents a new implementation of a Krylov-Schur eigensolver in SLEPc (Scalable Library for Eigenvalue Problem Computations), a software library for the solution of large, sparse eigenvalue problems on parallel computers. Some parallel performance results are given, showing better scalability compared to ARPACK. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

12.
PARALLEL AUXILIARY SPACE AMG FOR H(curl) PROBLEMS   (Total citations: 2; self-citations: 0; citations by others: 2)
In this paper we review a number of auxiliary space based preconditioners for the second-order definite and semi-definite Maxwell problems discretized with the lowest-order Nédélec finite elements. We discuss the parallel implementation of the most promising of these methods, those derived from the recent Hiptmair-Xu (HX) auxiliary space decomposition [Hiptmair and Xu, SIAM J. Numer. Anal., 45 (2007), pp. 2483-2509]. An extensive set of numerical experiments demonstrates the scalability of our implementation on large-scale H(curl) problems.

13.
The solution of large sparse linear systems is often the most time-consuming part of many science and engineering applications. Computational fluid dynamics, circuit simulation, power network analysis, and material science are just a few examples of the application areas in which large sparse linear systems need to be solved effectively. In this paper, we introduce a new parallel hybrid sparse linear system solver for distributed memory architectures that contains both direct and iterative components. We show that by using our solver one can alleviate the drawbacks of direct and iterative solvers, achieving better scalability than with direct solvers and more robustness than with classical preconditioned iterative solvers. Comparisons to well-known direct and iterative solvers on a parallel architecture are provided.

14.
This paper discusses techniques for computing a few selected eigenvalue–eigenvector pairs of large and sparse symmetric matrices. A recently developed class of techniques for this type of problem is based on integrating the matrix resolvent operator along a complex contour that encloses the interval containing the eigenvalues of interest. This paper considers such contour integration techniques from a domain decomposition viewpoint and proposes two schemes. The first scheme can be seen as an extension of domain decomposition linear system solvers in the framework of contour integration methods for eigenvalue problems, such as FEAST. The second scheme focuses on integrating the resolvent operator primarily along the interface region defined by adjacent subdomains. A parallel implementation of the proposed schemes is described, and results on distributed computing environments are reported. These results show that domain decomposition approaches can lead to reduced run times and improved scalability.
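The contour-integration idea is easiest to see on scalars. Apply the trapezoid rule to (1/2πi)∮ dz/(z − λ) on a circle: the result is approximately 1 when λ lies inside the circle and approximately 0 when it lies outside. For a symmetric matrix, the same quadrature applied to the resolvent (z_j I − A)⁻¹ acts as an approximate spectral projector onto the eigenvectors inside the contour. Each quadrature node costs one shifted linear solve, and those solves are where domain decomposition enters. The node count and placement (N = 16, midpoint-shifted) are illustrative choices, not taken from the paper.

```python
import cmath
import math


def contour_filter(lam: complex, center: complex, radius: float, n_nodes: int = 16) -> complex:
    """Trapezoid-rule value of (1/(2*pi*i)) * closed integral of dz / (z - lam) over |z - center| = radius."""
    total = 0j
    for j in range(n_nodes):
        w = cmath.exp(2j * math.pi * (j + 0.5) / n_nodes)   # node on the unit circle
        z = center + radius * w
        dz = 1j * radius * w * (2 * math.pi / n_nodes)       # dz = i r e^{i theta} d(theta)
        total += dz / (z - lam)                              # matrix case: one shifted solve per node
    return total / (2j * math.pi)


# Made-up spectrum; the contour (center 1, radius 1) encloses the interval [0, 2].
for lam in (0.5, 1.5, 3.0, 5.0):
    print(lam, round(contour_filter(lam, 1.0, 1.0).real, 6))
```

With these nodes the quadrature equals 1/(1 + ((λ − c)/r)^N) exactly. The filter therefore sharpens as the number of nodes N grows, at the price of more shifted solves.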

16.
The mathematical foundation of an algorithm for fast and accurate evaluation of singular integral transforms was given by Daripa [9,10,12]. By construction, the algorithm offers good parallelization opportunities and a lower computational complexity when compared with methods based on quadrature rules. In this paper we develop a parallel version of the fast algorithm by redefining the inherently sequential recurrences present in the original sequential formulation. The parallel version only utilizes a linear neighbor-to-neighbor communication path, which makes the algorithm very suitable for any distributed memory architecture. Numerical results and theoretical estimates show good parallel scalability of the algorithm. This revised version was published online in June 2006 with corrections to the Cover Date.
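The paper's specific reformulation is not described in the abstract. A standard way to parallelize a first-order linear recurrence x_k = a_k x_{k−1} + b_k relies on the fact that composing affine maps is associative. The generic sketch below works in three phases. It splits the index range into blocks and composes each block's maps independently. A short sweep then passes each block's entry value to its neighbour, and each block replays its recurrence from that value. This illustrates the general technique only, not Daripa's algorithm, and all names are ours.

```python
from typing import List, Tuple

Affine = Tuple[float, float]   # (a, b) represents the map x -> a * x + b


def compose(first: Affine, second: Affine) -> Affine:
    """Map 'apply first, then second'; this composition is associative, which is what allows blocking."""
    return (second[0] * first[0], second[0] * first[1] + second[1])


def recurrence_sequential(a: List[float], b: List[float], x0: float) -> List[float]:
    out, x = [], x0
    for ak, bk in zip(a, b):
        x = ak * x + bk
        out.append(x)
    return out


def recurrence_blocked(a: List[float], b: List[float], x0: float, n_blocks: int = 4) -> List[float]:
    n = len(a)
    size = max(1, -(-n // n_blocks))                      # ceiling division, at least 1
    blocks = [range(s, min(s + size, n)) for s in range(0, n, size)]
    # Phase 1 (independent per block): compose the block's maps into one affine map.
    summaries = []
    for idx in blocks:
        acc: Affine = (1.0, 0.0)                          # identity map
        for k in idx:
            acc = compose(acc, (a[k], b[k]))
        summaries.append(acc)
    # Phase 2 (short sweep over blocks): value entering each block, passed neighbour to neighbour.
    starts, x = [], x0
    for sa, sb in summaries:
        starts.append(x)
        x = sa * x + sb
    # Phase 3 (independent per block): replay the recurrence from the known entry value.
    out = [0.0] * n
    for idx, x in zip(blocks, starts):
        for k in idx:
            x = a[k] * x + b[k]
            out[k] = x
    return out


a = [0.5, 1.5, -1.0, 2.0, 0.25, 1.0, -0.5, 3.0, 0.75, 1.25]
b = [1.0, -2.0, 0.5, 0.0, 3.0, -1.0, 2.0, 0.5, -0.25, 1.0]
print(max(abs(p - q) for p, q in zip(recurrence_blocked(a, b, 2.0, 3), recurrence_sequential(a, b, 2.0))))
```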

17.
A two-level OpenMP + MPI parallel implementation is used to numerically solve a model kinetic equation for problems with complex three-dimensional geometry. The scalability and robustness of the method are demonstrated by computing the classical gas flow through a circular pipe of finite length and the flow past a reentry vehicle model. It is shown that the two-level model significantly speeds up the computations and improves the scalability of the method.

18.
Parallel Variable Distribution for Constrained Optimization   (Total citations: 1; self-citations: 0; citations by others: 1)
In the parallel variable distribution (PVD) framework for solving optimization problems, the variables are distributed among parallel processors, with each processor having the primary responsibility for updating its block of variables while allowing the remaining secondary variables to change in a restricted fashion along some easily computable directions. For constrained nonlinear programs, convergence theory for PVD algorithms was previously available only for the case of a convex feasible set. Additionally, one either had to assume that the constraints are block-separable or use exact projected gradient directions for the change of secondary variables. In this paper, we propose two new variants of PVD for the constrained case. Without assuming convexity of constraints, but assuming block-separable structure, we show that PVD subproblems can be solved inexactly by solving their quadratic programming approximations. This extends PVD to nonconvex (separable) feasible sets and provides a constructive, practical way of solving the parallel subproblems. For inseparable constraints, but assuming convexity, we develop a PVD method based on suitable approximate projected gradient directions. The approximation criterion is based on a certain error bound result, and it is readily implementable. Using such approximate directions may be especially useful when the projection operation is computationally expensive.
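A stripped-down picture of one PVD iteration: each processor minimizes the objective over its own block with the other variables held fixed, and a synchronization step then keeps the best candidate. Keeping the best candidate is one of several possible synchronization choices. The sketch makes three simplifying assumptions. The objective is an unconstrained, strictly convex quadratic. Each block is a single variable. The secondary variables do not move at all. It therefore omits exactly the constrained and inexact features the paper is about.

```python
from typing import List

Matrix = List[List[float]]


def objective(q: Matrix, c: List[float], x: List[float]) -> float:
    """f(x) = 0.5 * x^T Q x - c^T x (Q symmetric positive definite)."""
    n = len(x)
    quad = sum(x[i] * q[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(ci * xi for ci, xi in zip(c, x))


def block_minimize(q: Matrix, c: List[float], x: List[float], i: int) -> List[float]:
    """One 'processor': exact minimization over its block (here the single variable x_i), others fixed."""
    y = x[:]
    y[i] = (c[i] - sum(q[i][j] * x[j] for j in range(len(x)) if j != i)) / q[i][i]
    return y


def pvd_sketch(q: Matrix, c: List[float], x: List[float], iters: int = 500) -> List[float]:
    for _ in range(iters):
        candidates = [block_minimize(q, c, x, i) for i in range(len(x))]   # parallel phase
        x = min(candidates, key=lambda y: objective(q, c, y))              # synchronization: keep the best
    return x


Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
c = [1.0, 2.0, 3.0]
x = pvd_sketch(Q, c, [0.0, 0.0, 0.0])
print([round(sum(Q[i][j] * x[j] for j in range(3)) - c[i], 8) for i in range(3)])  # residual Qx - c
```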

19.
In this note, we derive the complete recursive structure of the Birge and Qi factorization for interior point methods (IPM) for tree-structured linear programs as they appear in multistage stochastic programs. This recursive structure allows for an elegant implementation on parallel hardware, since multiple versions of the same program may be run on different processors. Our preliminary computational experiment, conducted on a Beowulf cluster, demonstrates the scalability of this approach.

20.
Model reduction is an area of fundamental importance in many modeling and control applications. In this paper we analyze the use of parallel computing in model reduction methods based on balanced truncation of large-scale dense systems. The methods require the computation of the Gramians of a linear time-invariant system. Using a sign function-based solver for computing full-rank factors of the Gramians yields some favorable computational aspects in the subsequent computation of the reduced-order model, particularly for non-minimal systems. As sign function-based computations only require efficient implementations of basic linear algebra operations readily available, e.g., in BLAS, LAPACK, and ScaLAPACK, good performance of the resulting algorithms on parallel computers is to be expected. Our experimental results on a PC cluster show the performance and scalability of the parallel implementation.
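The sign-function approach can be sketched for a tiny dense case, assuming A is stable. Applying the Newton iteration for the matrix sign function to [[A, Q], [0, −Aᵀ]] gives the coupled iteration below. In the limit, P = Q_∞/2 solves A P + P Aᵀ + Q = 0, and with Q = B Bᵀ, P is the controllability Gramian. Unlike the paper's solver, this toy version iterates on full 2×2 matrices rather than full-rank factors. It also uses hand-written helpers instead of BLAS, LAPACK, or ScaLAPACK.

```python
from typing import List

Matrix = List[List[float]]


def mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def inv2(a: Matrix) -> Matrix:
    """Inverse of a 2x2 matrix (a larger code would use an LU-based solver instead)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def average(x: Matrix, y: Matrix) -> Matrix:
    return [[(u + v) / 2 for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def lyap_sign(a: Matrix, q: Matrix, tol: float = 1e-12, max_iter: int = 100) -> Matrix:
    """Solve A P + P A^T + Q = 0 for stable A via the sign-function Newton iteration
    A_{k+1} = (A_k + A_k^{-1}) / 2,  Q_{k+1} = (Q_k + A_k^{-1} Q_k A_k^{-T}) / 2,  P = Q_inf / 2."""
    for _ in range(max_iter):
        a_inv = inv2(a)
        q = average(q, mul(mul(a_inv, q), transpose(a_inv)))
        a_next = average(a, a_inv)
        change = max(abs(u - v) for ra, rn in zip(a, a_next) for u, v in zip(ra, rn))
        a = a_next
        if change < tol:
            break
    return [[v / 2 for v in row] for row in q]


A = [[-1.0, 0.5], [0.0, -2.0]]
B = [[1.0], [1.0]]
P = lyap_sign(A, mul(B, transpose(B)))   # controllability Gramian of (A, B)
R = [[x + y + z for x, y, z in zip(r1, r2, r3)]
     for r1, r2, r3 in zip(mul(A, P), mul(P, transpose(A)), mul(B, transpose(B)))]
print(max(abs(v) for row in R for v in row))  # residual of the Lyapunov equation (should be tiny)
```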
