Similar literature
Found 20 similar articles; search took 78 ms.
1.
The process allocation problem (PAP), concerned with the assignment of a number of communicating processes to a certain number (not known a priori) of identical processors in a telecommunications environment, is examined. The objective is to minimize the total message-passing between processes residing on different processors subject to constraints on the processing power and storage capacity (code-, data-storage and occupancy) of the processors. Constraints imposed on the co-location of certain processes on the same processor are also included. The problem is formulated as a 0-1 linear maximization problem, taking into account only the number of processes involved, while the number of processors required is produced automatically with the optimum solution. An implicit enumeration algorithm is developed which produces an optimum message-passing allocation. Computational results for a set of random problems with characteristics similar to a real-world application in telecommunications are also presented.

2.
Dynamic load balancing in multicomputers can improve the utilization of processors and the efficiency of parallel computations by migrating the workload across processors at runtime. We present a survey and critique of dynamic load balancing strategies that are iterative: that is, workload migration is carried out by transferring processes between nearest-neighbour processors. Iterative strategies have become prominent in recent years because of the increasing popularity of point-to-point interconnection networks for multicomputers.
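
As an illustration of what an iterative nearest-neighbour strategy can look like, the following is a minimal sketch of a diffusion-style scheme on a ring of processors. It is not taken from the surveyed paper: the ring topology, the function names `diffusion_step` and `balance`, and the parameter `alpha` are assumptions chosen for the example, and load is treated as a divisible quantity rather than as whole processes.

```python
def diffusion_step(loads, alpha=0.25):
    """One synchronous diffusion step on a ring (assumed topology).

    Each processor moves alpha * (load difference) to or from each of its
    two nearest neighbours, so the total load is conserved.
    """
    n = len(loads)
    new_loads = []
    for i in range(n):
        left = loads[(i - 1) % n]
        right = loads[(i + 1) % n]
        new_loads.append(
            loads[i] + alpha * (left - loads[i]) + alpha * (right - loads[i])
        )
    return new_loads


def balance(loads, alpha=0.25, tol=1e-6, max_steps=10_000):
    """Repeat diffusion steps until max load minus min load is at most tol."""
    steps = 0
    while max(loads) - min(loads) > tol and steps < max_steps:
        loads = diffusion_step(loads, alpha)
        steps += 1
    return loads, steps


if __name__ == "__main__":
    final, steps = balance([8.0, 0.0, 0.0, 0.0])
    print(final, steps)
```

Each processor only needs the loads of its two neighbours, which is why schemes of this kind suit point-to-point interconnection networks.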

3.
We consider an on-line list scheduling problem of multi-core processor tasks with virtualization to minimize makespan. The competitive ratio of an on-line algorithm is shown for every specific m, where m is the number of processors. Better on-line algorithms are presented for a small number of processors.
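
For orientation, the classical on-line list scheduling rule (Graham's rule) can be sketched as below. This is only the textbook baseline for tasks that each need a single processor, under the assumption that task lengths become known on arrival. It does not model the multi-core tasks or virtualization studied in the paper, and the function name `list_schedule` is hypothetical.

```python
import heapq


def list_schedule(task_lengths, m):
    """Greedy on-line list scheduling on m identical processors.

    Tasks are taken in arrival order and each is placed on the processor
    that currently has the smallest load. Returns (makespan, assignment),
    where assignment[i] is the processor index chosen for task i.
    """
    # Heap of (current load, processor index); ties go to the lower index.
    heap = [(0.0, p) for p in range(m)]
    heapq.heapify(heap)
    assignment = []
    for length in task_lengths:
        load, p = heapq.heappop(heap)
        assignment.append(p)
        heapq.heappush(heap, (load + length, p))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


if __name__ == "__main__":
    print(list_schedule([2, 3, 4, 1], m=2))
```

For this single-processor-task setting, Graham's rule is known to be (2 - 1/m)-competitive, which gives a sense of the kind of bound "for every specific m" that the abstract refers to.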

4.
In multiprocessors with static allocation of processes to processors, scheduling can be done locally for each processor. The scheduling strategy may have a dramatic effect on the execution time of a parallel program. Finding an optimal schedule is an NP-hard problem, and very little is known about how close heuristic solutions get. The major result here is a theorem stating that if certain program parameters, which can be obtained from an execution of the program on a single processor, are known, the execution time of the optimal schedule can be calculated within a factor equal to the largest number of border processes on one processor. Border processes are processes which communicate with other processors. The program parameters are obtained using a previously developed tool. Due to the generality of this theorem, the proof is rather complex because it has to cover a large range of situations. The theorem itself, however, is easy to apply, making it possible to compare the performance of different scheduling strategies with the optimal case. The proof also gives important hints on how to design efficient scheduling algorithms for statically allocated programs.

5.
In this paper we describe a hybrid heuristic approach, which combines Genetic Algorithms and Tabu Thresholding, for the static allocation of interacting processes onto a parallel target system, where the number of processes is greater than the number of available processors. This problem is known to be NP-hard and finds many practical applications, given the increasing diffusion of distributed and parallel computing systems. The algorithm handles infeasibilities due to processor overload by incorporating them into the objective function and by adapting the mutation operator. Global search is performed on the set of local optima obtained by a repair search operator based on a Tabu Thresholding procedure. Extensive computational testing on randomly generated instances with up to 100 processes, characterized by different target network topologies with 4 to 25 processors, shows that the algorithm compares favorably with other approaches from the literature. The proposed approach has also been extended to the allocation of parallel objects and classes, where an additional co-residence constraint between each parallel object and the associated class arises.

6.
This paper considers a communication system which consists of many processors and studies the problem of improving its reliability by adopting the recovery techniques of checkpoint and rollback. When either a processor failure or a communication error has occurred, the rollback recovery for the processors associated with such an event is executed to the most recent checkpoint, so that a consistent state is maintained in the whole system. The stochastic model with the above recovery techniques is formulated using the theory of Markov renewal processes. The mean time to take a checkpoint and the expected numbers of rollback recoveries caused by processor failures and communication errors are derived. Further, an optimal checkpointing interval which minimizes the expected cost is analytically discussed.

7.
The article considers the execution of a computational process represented by a bilogic graph on a homogeneous multiprocessor system (MS). A series of static-dynamic dispatching methods is considered with the aid of simulation. The statistical material is generated by simulating five real processes on MS with different numbers of processors. The dispatching methods are compared on three levels: efficiency of MS utilization, method complexity, and accuracy of the heuristic method. For an arbitrary program defined by a bilogic graph with unit length operators, a preliminary analysis technique is proposed to select the most appropriate method of parallel processing for the program and the number of processors maximizing the MS utilization efficiency (for BÉSM-6 the corresponding program is available). Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova AN SSSR, Vol. 111, pp. 162–176, 1981.

8.
In classical scheduling theory, it is widely assumed that a task can be processed by only one processor at a time. With the rapid development of technology, this assumption is no longer valid. In this work we present a problem of scheduling tasks, each of which requires a set of processors simultaneously for its processing and can be executed on several alternative sets of processors. Scheduling algorithms based on dynamic and linear programming are presented that construct minimum length non-preemptive and preemptive schedules, respectively. Results of computational experiments are also reported. This research was partially supported by a KBN grant and by project CRIT.

9.
王德人  孙宝云 《计算数学》1991,13(3):297-306
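… is a continuous diagonal mapping, $A=(a_{ij})\in L(\mathbb{R}^n)$ is a monotone matrix, $B\in L(\mathbb{R}^n)$ is a nonnegative matrix, and $b\in\mathbb{R}^n$ is a known vector. The system (1.1) has a rich practical background: many problems of solving nonlinear differential equations reduce to solving (1.1) after finite element or finite difference discretization. In particular, the weakly nonlinear elliptic equations and the Stefan problems discussed in [7], [10] and [11] can all be treated as special cases of (1.1).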

10.
Three parallel space-decomposition minimization (PSDM) algorithms, based on the parallel variable transformation (PVT) and the parallel gradient distribution (PGD) algorithms (O.L. Mangasarian, SIAM Journal on Control and Optimization, vol. 33, no. 6, pp. 1916–1925), are presented for solving convex or nonconvex unconstrained minimization problems. The PSDM algorithms decompose the variable space into subspaces and distribute the decomposed subproblems among parallel processors. It is shown that if all decomposed subproblems are uncoupled from each other, they can be solved independently. Otherwise, the parallel algorithms presented in this paper can be used. Numerical experiments show that these parallel algorithms can save processor time, particularly for medium and large-scale problems. Up to six parallel processors are connected by Ethernet networks to solve four large-scale minimization problems. The results are compared with those obtained by using sequential algorithms run on a single processor. An application of the PSDM algorithms to the training of multilayer Adaptive Linear Neurons (Madaline) and a new parallel architecture for such parallel training are also presented.

11.
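Local construction algorithms for parallel preconditioners of discretized elliptic equations I. Basic method   Total citations: 1, self-citations: 0, citations by others: 1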
孙家昶 《计算数学》1995,17(2):143-153
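Constructing highly efficient iterative algorithms for the large sparse elliptic linear algebraic system $Au=f$ (1), obtained by finite element or finite difference discretization, is currently an extremely active direction in computational methods.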

12.
Applying computationally expensive simulations in design or process optimization results in long-running solution processes even when using a state-of-the-art distributed algorithm and hardware. Within these simulation-based optimization problems the optimizer has to treat the simulation systems as black boxes. The distributed solution of this kind of optimization problem demands efficient utilization of resources (i.e. processors) and evaluation of the solution quality. Analyzing the parallel performance is therefore an important task in the development of adequate distributed approaches, taking into account the numerical algorithm, its implementation, and the hardware architecture used. In this paper, simulation-based optimization problems are characterized and a distributed solution algorithm is presented. Different performance analysis techniques (e.g. scalability analysis, computational complexity) are discussed and a new approach integrating parallel performance and solution quality is developed. This approach combines a priori and a posteriori techniques and can be applied in early stages of the solution process. The feasibility of the approach is demonstrated by applying it to three different classes of simulation-based optimization problems from groundwater management.

13.
The problem of optimally scheduling n tasks in a parallel processor system is studied. The tasks are malleable, i.e., a task may be executed by several processors simultaneously and the processing speed of a task is a nonlinear function of the number of processors allocated to it. The total number of processors is m, which is an upper bound on the number of processors that can be used by all the tasks simultaneously. It is assumed that the number of processors is sufficient to process all the tasks simultaneously, i.e. $n \le m$. The objective is to find a task schedule and a processor allocation such that the overall task completion time, i.e. the makespan, is minimized. The problem is motivated by real-life applications of parallel computer systems in scientific computing of highly parallelizable tasks. An O(n) algorithm is presented to solve this problem when all the processing speed functions are convex. If these functions are all concave and the number of tasks is a constant, the problem can be solved in polynomial time. A relaxed problem, in which the number of processors allocated to each task is not required to be an integer, can be solved in $O(n \max\{m, n \log^2 m\})$ time. It is proved that the minimum makespan values for the original and relaxed problems coincide. For n=2 or n=3, an optimal solution for the relaxed problem can be converted into an optimal solution for the original problem in constant time.

14.
We consider a fault tolerant broadcast network of n processors each holding one bit of information. The goal is to compute a given Boolean function on the n bits. In each step, a processor may broadcast one bit of information. Each listening processor receives the bit that was broadcast with error probability bounded by a fixed constant ?. The errors in different steps, as well as for different receiving processors in the same step, are mutually independent. The protocols that are considered in this model are oblivious protocols: at each step, the processors that broadcast are fixed in advance and independent of the input and the outcome of previous steps. We present here the first linear complexity protocols for several classes of Boolean functions. This answers an open question of Yao (Invited talk in the 5th ISTCS Conf., 1997), considering this fault tolerant model that was introduced by El Gamal (Open problems presented at the 1984 workshop on Specific Problems in Communication and Computation sponsored by Bell Communication Research) and studied also by Gallager [10]. © 2008 Wiley Periodicals, Inc. Random Struct. Alg., 2009

15.
This paper deals with a new class of parallel asynchronous iterative algorithms for the solution of nonlinear systems of equations. The main feature of the new class of methods presented here is the possibility of flexible communication between processors. In particular partial updates can be exchanged. Approximation of the associated fixed point mapping is also considered. A detailed convergence study is presented. A connection with the Schwarz alternating method is made for the solution of nonlinear boundary value problems. Computational results on a shared memory multiprocessor IBM 3090 are briefly presented.

16.
Optimal algorithms for scheduling divisible load on heterogeneous systems are considered in this paper. The platform model we use is general and realistic: the mode of communication is non-blocking message receiving, and processors and communication links may have different speeds and arbitrary start-up overheads. The objective is to minimize the processing time of the entire workload. The main contributions are: (1) closed-form expressions for the processing time and the fraction of workload for each processor are derived; (2) the influence of start-up overheads on the optimal processing time is analyzed; (3) for systems with a bounded number of processors and a large workload, an optimal sequence and an algorithm for workload distribution are proposed. Moreover, some numerical examples are presented to illustrate the analysis.

17.
Parallel processors are becoming an attractive option for solving large nonlinear optimization problems, and partially separable methods are ideal candidates for parallel computing. This paper proposes implementation techniques for such methods. Computational experiments on an IBM 3090-200 and on a simulated multiprocessor are presented. The performance of both implementations is compared against a reference serial implementation.

18.
An algorithm is proposed for the restructuring of arithmetic expressions involving unary functions to a form convenient for parallel computation. Upper bounds are presented for the restructuring time on a serial processor, the parallel computing time, and the required number of processors. Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova Akad. Nauk SSSR, Vol. 175, pp. 37–52, 1988.

19.
Approximation schemes for optimal compression with static and sliding dictionaries, which can run on a simple array of processors with distributed memory and no interconnections, are presented. These approximation algorithms can be implemented on both small and large scale parallel systems. The sliding dictionary method requires large files on large scale systems. As far as lossless image compression is concerned, arithmetic encoders enable the best lossless compressors but they are often ruled out because they are too complex. Storer extended dictionary text compression to bi-level images to avoid arithmetic encoders (BLOCK MATCHING). We were able to partition an image into up to a hundred areas and to apply the BLOCK MATCHING heuristic independently to each area with no loss of compression effectiveness. Therefore, the approach is suitable for a small scale parallel system at no communication cost. On the other hand, bi-level image compression seems to require communication on large scale systems. With regard to grey scale and color images, parallelizable lossless image compression (PALIC) is a highly parallelizable and scalable lossless compressor since it is applied independently to blocks of 8 × 8 pixels. We experimented with the BLOCK MATCHING and PALIC heuristics on up to 32 processors of a machine with 256 Intel Xeon 3.06 GHz processors, using a test set of large topographic bi-level images and color images in RGB format. We obtained the expected speed-up of the compression and decompression times, achieving parallel running times about 25 times faster than the sequential ones. Finally, scalable algorithms computing static and sliding dictionary optimal text compression on an exclusive read, exclusive write shared memory parallel machine are presented. On the same model, compression by block matching of bi-level images is shown, which can be implemented on a full binary tree architecture under some realistic assumptions with no scalability issues.

20.
In this paper, a parallel implementation of Wang’s method for solving tridiagonal systems of equations on a multiprocessor machine using the occam language is presented. The parallel algorithm has been designed for shared and distributed memory machines that support data parallelism and message passing. The overall performance of this implementation on each of 9 processors is given. The communication times are very important, and any improvement in this communication would significantly affect the performance of the implementation. The significance of these results is discussed.
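
For reference, a serial baseline of the kind against which parallel tridiagonal solvers are often measured is the Thomas algorithm, sketched below in Python. This is not Wang’s method and not the occam implementation described in the abstract. The function name `thomas_solve` and the array layout are assumptions for illustration, and the sketch does no pivoting, so it assumes a well-conditioned (for example diagonally dominant) system.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    a: sub-diagonal   (length n, a[0] is unused)
    b: main diagonal  (length n)
    c: super-diagonal (length n, c[n-1] is unused)
    d: right-hand side (length n)
    No pivoting is performed (assumes, e.g., diagonal dominance).
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Matrix [[2,1,0],[1,2,1],[0,1,2]] with exact solution [1, 2, 3].
    print(thomas_solve([0, 1, 1], [2, 2, 2], [1, 1, 0], [4, 8, 8]))
```

The forward and backward sweeps are inherently sequential, which is the motivation for partition-based parallel schemes such as the one the abstract implements.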
