期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel tabu search message-passing synchronous strategies for task scheduling under precedence constraints

Stella C. S. Porto Celso C. Ribeiro 《Journal of Heuristics》1996,1(2):207-223

This paper presents parallelization strategies for a tabu search algorithm for the task scheduling problem on heterogeneous processors under task precedence constraints. Parallelization relies exclusively on the decompostion of the solution space exploration. Four different parallel strategies are proposed and implemented on an asynchronous parallel machine under PVM: the master-slave model, with two different schemes for improved load balancing, and the single-program-multiple-data model, with single-token and multiple-token message passing schemes. The comparative analysis of these strategies shows that the tabu search approach for this problem is very suitable to the parallelization of the neighborhood search, with efficiency results almost always close to one for problems over a certain size. 相似文献

2.

A simple load balancing problem with decentralized information

Dimitrios G. Pandelis Demosthenis Teneketzis 《Mathematical Methods of Operations Research》1996,44(1):97-113

The following load balancing problem is investigated in discrete time: A service system consists of two service stations and two controllers, one in front of each station. The service stations provide the same service with identical service time distributions and identical waiting costs. Customers requiring service arrive at a controller's site and are routed to one of the two stations by the controller. The processes describing the two arrival streams are identical. Each controller has perfect knowledge of the workload in its own station and receives information about the other station's workload with one unit of delay. The controllers' routing strategies that minimize the customers' total flowtime are determined for a certain range of the parameters that describe the arrival process and the service distribution. Specifically, we prove that optimal routing strategies are characterized by thresholds that are either precisely specified or take one of two possible values. 相似文献

3.

Optimal algorithm for scheduling large divisible workload on heterogeneous system

Shang Mingsheng 《Applied Mathematical Modelling》2008

Optimal algorithms for scheduling divisible load on heterogeneous system are considered in this paper. The platform model we use is general and realistic, in which the mode of communication is non-blocking message receiving, and processors and communication links may have different speeds and arbitrary start-up overheads. The objective is to minimize the processing time of the entire workload. The main contributions are: (1) closed-form expressions for the processing time and the fraction of workload for each processor are derived; (2) the influence of start-up overheads on the optimal processing time is analyzed; (3) for system of bounded number of processors and large workload, optimal sequence and algorithm for workload distribution are proposed. Moreover, some numerical examples are presented to illustrate the analysis. 相似文献

4.

A parallel implementation of the nested decomposition algorithm for multistage stochastic linear programs

John R. Birge Christopher J. Donohue Derek F. Holmes Oleg G. Svintsitski 《Mathematical Programming》1996,75(2):327-352

Multistage stochastic linear programs can represent a variety of practical decision problems. Solving a multistage stochastic program can be viewed as solving a large tree of linear programs. A common approach for solving these problems is the nested decomposition algorithm, which moves up down the tree by solving nodes and passing information among nodes. The natural independence of subtrees suggests that much of the computational effort of the nested decomposition algorithm can run in parallel across small numbers of fast processors. This paper explores the advantages of such parallel implementations over serial implementations and compares alternative sequencing protocols for parallel processors. Computational experience on a large test set of practical problems with up to 1.5 million constraints and almost 5 million variables suggests that parallel implementations may indeed work well, but they require careful attention to processor load balancing. Supported in part by the National Science Foundation under Grants DDM-9215921 and SES-9211937. 相似文献

5.

On estimating workload in interval branch-and-bound global optimization algorithms

José L. Berenguel L. G. Casado I. García Eligius M. T. Hendrix 《Journal of Global Optimization》2013,56(3):821-844

In general, solving Global Optimization (GO) problems by Branch-and-Bound (B&B) requires a huge computational capacity. Parallel execution is used to speed up the computing time. As in this type of algorithms, the foreseen computational workload (number of nodes in the B&B tree) changes dynamically during the execution, the load balancing and the decision on additional processors is complicated. We use the term left-over to represent the number of nodes that still have to be evaluated at a certain moment during execution. In this work, we study new methods to estimate the left-over value based on the observed amount of pruning. This provides information about the remaining running time of the algorithm and the required computational resources. We focus on their use for interval B&B GO algorithms. 相似文献

6.

The design of an efficient load balancing algorithm employing block design

Ilyong Chung Yongeun Bae 《Journal of Applied Mathematics and Computing》2004,14(1-2):343-351

In order to maintain load balancing in a distributed system, we should obtain workload information from all the nodes on network. This processing requiresO(v ²) communication overhead, wherev is the number of nodes. In this paper, we present a new synchronous dynamic distributed load balancing algorithm on a (v, k+1, 1)-configured network applying a symmetric balanced incomplete block design, wherev=k ²+k+1. Our algorithm needs only $O(v\sqrt v )$ communication overhead and each node receives workload information from all the nodes without redundancy. Therefore, load balancing is maintained since every link has the same amount of traffic for transferring workload information. 相似文献

7.

Load balancing in the parallel optimization of block-angular linear programs

S. Kingsley Gnanendran James K. Ho 《Mathematical Programming》1993,62(1-3):41-67

Decomposition algorithms for block-angular linear programs give rise to a natural, coarse-grained parallelism that can be exploited by processing the subproblems concurrently within a distributed-memory environment. The parallel efficiency of the distributed approach, however, is critically dependent on the duration of the inherently serial master phase relative to that of the bottleneck subproblem. This paper investigates strategies for improving efficiency in distributed Dantzig—Wolfe decomposition by better balancing the load between the master and subproblem processors. We report computational experience on an Intel iPSC/2 hypercube multiprocessor with test problems having dimensions up to about 30 000 rows, 87 000 columns, and 200 coupling constraints.This paper is dedicated to Phil Wolfe on the occasion of his 65th birthday. 相似文献

8.

Periodic load balancing

Hjálmtýsson Gísli Whitt Ward 《Queueing Systems》1998,30(1-2):203-250

Multiprocessor load balancing aims to improve performance by moving jobs from highly loaded processors to more lightly loaded processors. Some schemes allow only migration of new jobs upon arrival, while other schemes allow migration of jobs in progress. A difficulty with all these schemes, however, is that they require continuously maintaining detailed state information. In this paper we consider the alternative of periodic load balancing, in which the loads are balanced only at each T time units for some appropriate T. With periodic load balancing, state information is only needed at the balancing times. Moreover, it is often possible to use slightly stale information collected during the interval between balancing times. In this paper we study the performance of periodic load balancing. We consider multiple queues in parallel with unlimited waiting space to which jobs come either in separate independent streams or by assignment (either random or cyclic) from a single stream. Resource sharing is achieved by periodically redistributing the jobs or the work in the system among the queues. The performance of these systems of queues coupled by periodic load balancing depends on the transient behavior of a single queue. We focus on useful approximations obtained by considering a large number of homogeneous queues and a heavy load. When the number of queues is sufficiently large, the number of jobs or quantity of work at each queue immediately after redistribution tends to evolve deterministically, by the law of large numbers. The steady-state (limiting) value of this deterministic sequence is obtained as the solution of a fixed point equation, where the initial value is equal to the expected transient value over the interval between successive redistributions conditional on the initial value. A refined approximation based on the central limit theorem is a normal distribution, where the mean and variance are obtained by solving a pair of fixed-point equations. With higher loads, which is natural to consider when load balancing is performed, a heavy-traffic limit theorem shows that one-dimensional reflected Brownian motion can be used to approximately describe system performance, even with general arrival and service processes. With these approximations, we show how performance depends on the assumed arrival pattern of jobs and the model parameters. We do numerical calculations and conduct simulation experiments to show the accuracy of the approximations. This revised version was published online in June 2006 with corrections to the Cover Date. 相似文献

9.

Additive splitting methods for elliptic-parabolic problems

Raimondas &#;iegis Areti Papastavrou Aivars Zemitis 《Annali dell'Universita di Ferrara》2000,46(1):291-306

Three finite-difference splitting schemes are proposed for numerical solution of the nonlinear 3D parabolic-elliptic problem. Adaptive front-tracking and time-stepping strategies are included into the algorithms. Parallelization of the algorithms is done using the domain decomposition method. The 1D decomposition of the computational domain is used in order to obtain the optimal computational load balancing among processors and to minimize the frequency of data communications. A redistribution of the computational domain among processors is done dynamically during computations.

Sunto In questo lavoro vengono proposti tre diversi schemi di decomposizione alle differenze finite per la risoluzione numerica di problemi ellittico-parabolici non lineari in 3D. Negli algoritmi sono incluse strategie adattative front-tracking e time-stepping. La parallelizzazione degli algoritmi è realizzata usando il metodo della decomposizione di domini. Viene impiegata la decomposizione 1D del dominio computazionale per ottenere il bilanciameto ottimale del carico computazionale tra i processori e per minizzare la frequenza della comunicazione dei dati. Durante le computazioni, infine, viene realizzata dinamicamente la ridistribuzione dei domini computazionali.

相似文献

10.

The design of simple subcontracting rules for make-to-order shops: An assessment by simulation

Matthias Thürer Mark Stevenson Ting Qu Moacir Godinho Filho 《European Journal of Operational Research》2014

Subcontracting can be an important means of overcoming capacity shortages and of workload balancing, especially in make-to-order companies characterized by high variety, high demand variation and a job shop configuration. But there is a lack of simple, yet powerful subcontracting rules suitable for such contexts. The few existing rules were developed for single work center shops and neglect the actual subcontracting lead time, meaning some subcontracted jobs are destined to become tardy. This study uses Workload Control theory on matching required and available capacity over time to propose four new rules that address these shortcomings. The new rules are compared against four existing rules using an assembly job shop simulation model where the final, assembled product consists of several sub-assemblies that either flow through an internal job shop or are subcontracted. The best new rules stabilize the direct load queuing in front of a work center and significantly improve performance compared to the existing rules. For example, when the workload exceeds capacity by 10%, a 50% reduction in percentage tardy can be achieved. By examining how the workload behaves over time, we reveal that improvements come from selectively subcontracting the sub-assemblies that would otherwise cause overloads, thereby cutting off peaks in the workload. 相似文献

11.

Space-filling curves for 2-simplicial meshes created with bisections and reflections

Joseph M. Maubach 《Applications of Mathematics》2005,50(3):309-321

Numerical experiments in J. Maubach: Local bisection refinement and optimal order algebraic multilevel preconditioners, PRISM-97 conference Proceedings, 1977, 121–136 indicated that the refinement with the use of local bisections presented in J. Maubach: Local bisection refinement for n-simplicial grids generated by reflections, SIAM J. Sci. Comput. 16 (1995), 210–227 leads to highly locally refined computational 2-meshes which can be very efficiently load-balanced with the use of a space-filling curve. This paper introduces the construction of this curve which can be produced at almost no costs, proofs that all its properties are invariant under local bisection, and comments on the 3-dimensional case.With the use of a space-filling curve (which passes through all triangular elements), load balancing over several processors is trivial: The load can be distributed over N processors by cutting the curve into N almost equilength parts. Each processor then operates on the triangles which are passed by its part of the curve. 相似文献

12.

A study of average-case speedup and scalability of parallel computations on static networks

《Mathematical and Computer Modelling》1999,29(9):83-94

A parallel system consists of a parallel algorithm and a parallel machine that supports the implementation of the algorithm. The scalability of a parallel system is a measure of its capability to increase speedup in proportion to the number of processors, or its capability to keep a constant efficiency as the number of processors increases. The present paper is devoted to the investigation of the average-case scalability of parallel algorithms executing on multicomputers with symmetric static networks, including the completely connected network, ring, hypercube, and torus. In particular, we characterize the communication overhead such that the expected efficiency can be kept at certain constant level, and that the number of tasks grows at the rate Θ(P log P). 相似文献

13.

并行间断有限元算法求解Navier-Stokes方程

下载免费PDF全文

马欣荣段治健谢公南刘三阳《应用数学和力学》2017,38(12):1377-1388

间断Galerkin有限元方法非常适合在非结构网格上高精度求解Navier-Stokes方程,然而其十分耗费计算资源.为了提高计算效率,提出了高效的MIMD并行算法.采用隐式时间离散GMRES+LU SGS格式,结合多重网格方法,当地时间步长加速算法收敛.为了保证各处理器间负载平衡,采用区域分解二级图方法划分网格,实现内存合理分配,数据只在相邻处理器间传递.数值模拟了RAE2822翼型和M6黏性绕流,加速比基本呈线性变化且接近理想值.结果表明了该算法能有效减少计算时间、合理分配内存,具有较高的加速比和并行效率,适合于MIMD粗粒度科学计算. 相似文献

14.

Parallel asynchronous tabu search for multicommodity location-allocation with balancing requirements

Teodor Gabriel Crainic Michel Toulouse Michel Gendreau 《Annals of Operations Research》1996,63(2):277-299

We study and compare asynchronous parallelization strategies for tabu search, and evaluate the impact on performance and solution quality of some important algorithmic design parameters: number of processors, handling of exchanged information, etc. Parallelization approaches are implemented and compared by using a tabu search algorithm for multicommodity location-allocation problems with balancing requirements. 相似文献

15.

Recommendations on performance of parallel DSMC algorithm in solving subsonic nanoflows

Ehsan Roohi Masoud Darbandi 《Applied Mathematical Modelling》2012,36(5):2314-2321

We investigate the efficiency of a parallel direct simulation Monte Carlo (PDSMC) algorithm in solving the rarefied subsonic flow through a nanochannel. We use MPI library to transfer data between the processors. It is observed that PDSMC solver shows ideal speed up if sufficient workload is provided for each of processors. Additionally, this study shows that the computational time and speed up of the extended PDSMC solver do not depend (or slightly depend) on the number of cells. In contrary, increasing the total number of particles would result in a better efficiency of the PDSMC. 相似文献

16.

Strong stability of Nash equilibria in load balancing games

CHEN Bo LI SongSong ZHANG YuZhong 《中国科学数学(英文版)》2014,57(7):1361-1374

We study strong stability of Nash equilibria in load balancing games of m(m 2)identical servers,in which every job chooses one of the m servers and each job wishes to minimize its cost,given by the workload of the server it chooses.A Nash equilibrium(NE)is a strategy profile that is resilient to unilateral deviations.Finding an NE in such a game is simple.However,an NE assignment is not stable against coordinated deviations of several jobs,while a strong Nash equilibrium(SNE)is.We study how well an NE approximates an SNE.Given any job assignment in a load balancing game,the improvement ratio(IR)of a deviation of a job is defined as the ratio between the pre-and post-deviation costs.An NE is said to be aρ-approximate SNE(ρ1)if there is no coalition of jobs such that each job of the coalition will have an IR more thanρfrom coordinated deviations of the coalition.While it is already known that NEs are the same as SNEs in the 2-server load balancing game,we prove that,in the m-server load balancing game for any given m 3,any NE is a(5/4)-approximate SNE,which together with the lower bound already established in the literature yields a tight approximation bound.This closes the final gap in the literature on the study of approximation of general NEs to SNEs in load balancing games.To establish our upper bound,we make a novel use of a graph-theoretic tool. 相似文献

17.

Parallel adaptive hp-refinement techniques for conservation laws

Karen D. Devine Joseph E. Flaherty 《Applied Numerical Mathematics》1996,20(4):367-386

We describe an adaptive hp-refinement local finite element procedure for the parallel solution of hyperbolic systems of conservation laws on rectangular domains. The local finite element procedure utilizes spaces of piecewise-continuous polynomials of arbitrary degree and coordinated explicit Runge-Kutta temporal integration. A solution limiting procedure produces monotonic solutions near discontinuities while maintaining high-order accuracy near smooth extrema. A modified tiling procedure maintains processor load balance on parallel, distributed-memory MIMD computers by migrating finite elements between processors in overlapping neighborhoods to produce locally balanced computations. Grids are stored in tree data structures, with finer grids being offspring of coarser ones. Within each grid, AVL trees simplify the transfer of information between neighboring processors and the insertion and deletion of elements as they migrate between processors. Computations involving Burgers' and Euler's equations of inviscid flow demonstrate the effectiveness of the hp-refinement and balancing procedures relative to non-balanced adaptive and balanced non-adaptive procedures. 相似文献

18.

Approaches to Machine Load Balancing in Flexible Manufacturing Systems

J. M. Wilson 《The Journal of the Operational Research Society》1992,43(5):415-423

The load balancing problem for a flexible manufacturing system concerns the allocation of operations to machines and of tools to magazines with limited capacity, while seeking to balance the workload on all machines. Previous attempts to tackle this problem have used integer programming and a specialized branch and bound procedure has been developed. A modified integer programming approach is proposed here. The problem has certain features which can be used advantageously for an approximate solution technique. The approximation technique is described and computational results presented. Extensions to the problem of pooling machines are also considered. 相似文献

19.

Balancing Workloads and Minimizing Set-up Costs in the Parallel Processing Shop

Richard H. Deane Emett R. White 《The Journal of the Operational Research Society》1975,26(1):45-53

This paper presents an optimal scheduling algorithm for minimizing set-up costs in the parallel processing shop while meeting workload balancing restrictions.There are M independent batch type jobs which have sequence dependent set-up costs and N parallel processing machines. Each of the M jobs must be processed on exactly one of the N available machines. It is desirable to minimize total changeover costs with the restriction that each machine workload assignment T_n be within P units of the average machine assignment. The paper describes a static problem in which all jobs are available at time zero. The sequence dependent change over costs are identical for each machine. An extension of the algorithm handles nonidentical processor problems.A combinatorial programming approach to the problem is used. For the special case of identical processors, the problem can be treated as a multi-salesman travelling salesman problem. A general branch and bound algorithm and numerical results are given. 相似文献

20.

Three-level parallel J-Jacobi algorithms for Hermitian matrices

Sanja Singer Saša Singer 《Applied mathematics and computation》2012,218(9):5704-5725

The paper describes several efficient parallel implementations of the one-sided hyperbolic Jacobi-type algorithm for computing eigenvalues and eigenvectors of Hermitian matrices. By appropriate blocking of the algorithms an almost ideal load balancing between all available processors/cores is obtained. A similar blocking technique can be used to exploit local cache memory of each processor to further speed up the process. Due to diversity of modern computer architectures, each of the algorithms described here may be the method of choice for a particular hardware and a given matrix size. All proposed block algorithms compute the eigenvalues with relative accuracy similar to the original non-blocked Jacobi algorithm. 相似文献