首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 359 毫秒
1.
Three parallel space-decomposition minimization (PSDM) algorithms, based on the parallel variable transformation (PVT) and the parallel gradient distribution (PGD) algorithms (O.L. Mangasarian, SIMA Journal on Control and Optimization, vol. 33, no. 6, pp. 1916–1925.), are presented for solving convex or nonconvex unconstrained minimization problems. The PSDM algorithms decompose the variable space into subspaces and distribute these decomposed subproblems among parallel processors. It is shown that if all decomposed subproblems are uncoupled of each other, they can be solved independently. Otherwise, the parallel algorithms presented in this paper can be used. Numerical experiments show that these parallel algorithms can save processor time, particularly for medium and large-scale problems. Up to six parallel processors are connected by Ethernet networks to solve four large-scale minimization problems. The results are compared with those obtained by using sequential algorithms run on a single processor. An application of the PSDM algorithms to the training of multilayer Adaptive Linear Neurons (Madaline) and a new parallel architecture for such parallel training are also presented.  相似文献   

2.
We present a new parallel method for verified global optimization, using a centralized mediator for the dynamic load balancing. The new approach combines the advantages of two previous models, the master slave model and the processor farm. Numerical results show the efficiency of this new method. For a large number of problems at least linear speedup is reached. The efficiency of this new method is also confirmed by a comparison with other parallel methods for verified global optimization. A theoretical study proves that using the best-first strategy to choose the next box for subdivision, no real superlinear speedup may be expected concerning the number of iterations. Moreover, the potential of parallelization of methods of verified global optimization is discussed in general.  相似文献   

3.
This paper presents parallelization strategies for a tabu search algorithm for the task scheduling problem on heterogeneous processors under task precedence constraints. Parallelization relies exclusively on the decompostion of the solution space exploration. Four different parallel strategies are proposed and implemented on an asynchronous parallel machine under PVM: the master-slave model, with two different schemes for improved load balancing, and the single-program-multiple-data model, with single-token and multiple-token message passing schemes. The comparative analysis of these strategies shows that the tabu search approach for this problem is very suitable to the parallelization of the neighborhood search, with efficiency results almost always close to one for problems over a certain size.  相似文献   

4.
《Optimization》2012,61(1-2):95-114
We solve an optimal control problem for controlled parabolic Ito equations by a stochastic quasigradient method. Because of high amounts of computation time required by numerical solution of such problems we investigate the parallelization of the algorithm. We distribute the computations of space stages over several processor nodes of a parallel computer. We obtain an efficient algorithm with low communication cost by using a ring topology  相似文献   

5.
A parallel software package designed for the numerical simulation of three-dimensional viscous gas flows is presented. The numerical algorithm is based on kinetically consistent difference schemes used on locally refined grids. The software package has been tested in various super-and subsonic flow problems. It provides an opportunity for the direct simulation of turbulent flows. The efficiency of parallelization is analyzed depending on the problem size and the number of processors.  相似文献   

6.
The framework of this paper is the parallelization of a plasticity algorithm that uses an implicit method and an incremental approach. More precisely, we will focus on some specific parallel sparse linear algebra algorithms which are the most time-consuming steps to solve efficiently such an engineering application. First, we present a general algorithm which computes an efficient static scheduling of block computations for parallel sparse linear factorization. The associated solver, based on a supernodal fan-in approach, is fully driven by this scheduling. Second, we describe a scalable parallel assembly algorithm based on a distribution of elements induced by the previous distribution for the blocks of the sparse matrix. We give an overview of these algorithms and present performance results on an IBM SP2 for a collection of grid and irregular problems. This revised version was published online in June 2006 with corrections to the Cover Date.  相似文献   

7.
In this paper, a parallel asynchronous information algorithm for solving multidimensional Lipschitz global optimization problems, where times for evaluating the objective function can be different from point to point, is proposed. This method uses the nested optimization scheme and a new parallel asynchronous global optimization method for solving core univariate subproblems generated by the nested scheme. The properties of the scheme related to parallel computations are investigated. Global convergence conditions for the new method and theoretical conditions of speed up, which can be reached by using asynchronous parallelization in comparison with the pure sequential case, are established. Numerical experiments comparing sequential, synchronous, and asynchronous algorithms are also reported.  相似文献   

8.
As other metaheuristics, Scatter Search can gain from a parallel implementation. However, it is not clear beforehand which of the possible alternative parallelization strategies is more effective. To address this question, it has been selected as a test bed for empirical testing a classical combinatorial optimization problem, namely 0–1 knapsack problem, for which the sequential application of Scatter Search is well known. Two phases or groups of steps in the Scatter Search template have been identified as natural candidates for parallelization and several ways of carrying out that parallelization are proposed. An extensive experimental analysis involving 18 different parallel algorithms has been carried out testing the combinations of these alternative parallelization strategies on a battery of large random instances generated using a public code from the literature. The interpretation of the ANOVA results gives cues about the significance of the alternatives used in each phase and about the effect of the dynamic (vs. static) updating of the RefSet. A low average efficiency has been observed, although it has to be taken into account that, due to the termination condition used, not all algorithms tested carry out the same number of iterations.  相似文献   

9.
Using Balakrishnan's epsilon problem formulation (Ref. 1) and the Rayleigh-Ritz method with an orthogonal polynomial function basis, optimal control problems are transformed from the standard two-point boundary-value problem to a nonlinear programming problem. The resulting matrix-vector equations describing the optimal solution have standard parallel solution methods for implementation on parallel processor arrays. The method is modified to handle inequality constraints, and some results are presented under which specialized nonlinear functions, such as sines and cosines, can be handled directly. Some computational results performed on an Intel Sugarcube are presented to illustrate that considerable computational savings can be realized by using the proposed solution method.  相似文献   

10.
《Optimization》2012,61(3-4):261-275
This paper deals with a new parallel method for solving one-dimensional global optimization problems. We present the formulation of the decision rules of this method, the conditions for its nonredundant parallelization, the generalization for solving multi-dimensional problems and its implementation on a transputer system. There is also an account of some numerical experiments.  相似文献   

11.
A new decomposition method for multistage stochastic linear programming problems is proposed. A multistage stochastic problem is represented in a tree-like form and with each node of the decision tree a certain linear or quadratic subproblem is associated. The subproblems generate proposals for their successors and some backward information for their predecessors. The subproblems can be solved in parallel and exchange information in an asynchronous way through special buffers. After a finite time the method either finds an optimal solution to the problem or discovers its inconsistency. An analytical illustrative example shows that parallelization can speed up computation over every sequential method. Computational experiments indicate that for large problems we can obtain substantial gains in efficiency with moderate numbers of processors.This work was partly supported by the International Institute for Applied Systems Analysis, Laxenburg, Austria.  相似文献   

12.
The global minimization of large-scale partially separable non-convex problems over a bounded polyhedral set using a parallel branch and bound approach is considered. The objective function consists of a separable concave part, an unseparated convex part, and a strictly linear part, which are all coupled by the linear constraints. These large-scale problems are characterized by having the number of linear variables much greater than the number of nonlinear variables. An important special class of problems which can be reduced to this form are the synomial global minimization problems. Such problems often arise in engineering design, and previous computational methods for such problems have been limited to the convex posynomial case. In the current work, a convex underestimating function to the objective function is easily constructed and minimized over the feasible domain to get both upper and lower bounds on the global minimum function value. At each minor iteration of the algorithm, the feasible domain is divided into subregions and convex underestimating problems over each subregion are solved in parallel. Branch and bound techniques can then be used to eliminate parts of the feasible domain from consideration and improve the upper and lower bounds. It is shown that the algorithm guarantees that a solution is obtained to within any specified tolerance in a finite number of steps. Computational results obtained on the four processor Cray 2, both sequentially and in parallel on all four processors, are also presented.  相似文献   

13.
A parallel algorithm for constrained concave quadratic global minimization   总被引:2,自引:0,他引:2  
The global minimization of large-scale concave quadratic problems over a bounded polyhedral set using a parallel branch and bound approach is considered. The objective function consists of both a concave part (nonlinear variables) and a strictly linear part, which are coupled by the linear constraints. These large-scale problems are characterized by having the number of linear variables much greater than the number of nonlinear variables. A linear underestimating function to the concave part of the objective is easily constructed and minimized over the feasible domain to get both upper and lower bounds on the global minimum function value. At each minor iteration of the algorithm, the feasible domain is divided into subregions and linear underestimating problems over each subregion are solved in parallel. Branch and bound techniques can then be used to eliminate parts of the feasible domain from consideration and improve the upper and lower bounds. It is shown that the algorithm guarantees that a solution is obtained to within any specified tolerance in a finite number of steps. Computational results are presented for problems with 25 and 50 nonlinear variables and up to 400 linear variables. These results were obtained on a four processor CRAY2 using both sequential and parallel implementations of the algorithm. The average parallel solution time was approximately 15 seconds for problems with 400 linear variables and a relative tolerance of 0.001. For a relative tolerance of 0.1, the average computation time appears to increase only linearly with the number of linear variables.  相似文献   

14.
Based on the corrected finite pointset method (CFPM) with CPU-GPU heteroid parallelization (CFPM-GPU), a high-efficiency, accurate and fast parallel algorithm was developed for the high-dimensional phase separation phenomena governed by the multi-component Cahn-Hilliard (C-H) equation in complex domains. The proposed parallel algorithm with the CFPM-GPU was built in a process like: ① introduce the Wendland weight function into the discretization of the finite pointset method (FPM) scheme for the 1st/2nd spatial derivatives, based on the Taylor series and the weighted least square concept; ② use the above FPM scheme twice to approximate the 4th spatial derivative in the C-H equation, which is called the CFPM method; ③ for the first time establish an accelerating parallel algorithm for the CFPM with local matrices by means of a single GPU card based on the CUDA programming. Two benchmark problems of 2D radially and 3D spherically symmetric C-H equations were first solved to test the accuracy and high-efficiency of the proposed CFPM-GPU, and the acceleration ratio of the GPU parallelization to the single CPU computation is about 160. Subsequently, the proposed CFPM-GPU was used to predict the 2D/3D multi-phase separation phenomena in complex domains, and the prediction was compared with other numerical results. The numerical results show that, the proposed CFPM-GPU is valid and high-efficiency to simulate the 2D/3D multi-phase separation cases in complex domains. © 2023 Editorial Office of Applied Mathematics and Mechanics. All rights reserved.  相似文献   

15.
An efficient sparse LU factorization algorithm on popular shared memory multi-processors is presented. Pipelining parallelism is essential to achieve higher parallel efficiency and it is exploited with a left-right looking algorithm. No global barrier is used and a completely asynchronous scheduling scheme is one central point of the implementation. The algorithm has been successfully tested on SUN Enterprise, DEC AlphaServer, SGI Origin 2000 and Cray T90 and J90 parallel computers, delivering up to 2.3 GFlop/s on an eight processor DEC AlphaServer for medium-size semiconductor device simulations and structural engineering problems.  相似文献   

16.
A method for parallel construction of a classifier ensemble for solving the problem of localization of neuron sources within the brain on the basis of the analysis of electroencephalography signals is described. The idea of the proposed parallel numerical method consists in the consideration of the source parameters as attributes of decision tress constructed in parallel. The method is based on formation of a training data set from an experimental signal and construction of a classifier on the basis of the value of error of the potential, that is, the difference between the measured and model values of the potential. The efficiency of parallelization of the localization problem, namely, the data distribution between processors, and the distributed training of the ensembles of decision trees are considered. Analysis of the scalability of the problem of construction of a classifier ensemble with a increase in the number of processors in the course of solution of the problem of localization of a neuron source on multiprocessor computational complexes is presented. The parallel source localization algorithm is developed for architectures with either common or distributed memory. The algorithm is realized using the MPI technology; a hybrid model of parallel calculations using MPI and OpenMPI is also discussed.  相似文献   

17.
Parallel analogs of the variants of the incomplete Cholesky-conjugate gradient method and the modified incomplete Cholesky-conjugate gradient method for solving elliptic equations on uniform triangular and unstructured triangular grids on parallel computer systems with the MIMD architecture are considered. The construction of parallel methods is based on the use of various variants of ordering the grid points depending on the decomposition of the computation domain. Results of the theoretic and experimental studies of the convergence rate of these methods are presented. The solution of model problems on a moderate number processors is used to examine the efficiency of the proposed parallel methods.  相似文献   

18.
Parallel Variable Distribution for Constrained Optimization   总被引:1,自引:0,他引:1  
In the parallel variable distribution framework for solving optimization problems (PVD), the variables are distributed among parallel processors with each processor having the primary responsibility for updating its block of variables while allowing the remaining secondary variables to change in a restricted fashion along some easily computable directions. For constrained nonlinear programs convergence theory for PVD algorithms was previously available only for the case of convex feasible set. Additionally, one either had to assume that constraints are block-separable, or to use exact projected gradient directions for the change of secondary variables. In this paper, we propose two new variants of PVD for the constrained case. Without assuming convexity of constraints, but assuming block-separable structure, we show that PVD subproblems can be solved inexactly by solving their quadratic programming approximations. This extends PVD to nonconvex (separable) feasible sets, and provides a constructive practical way of solving the parallel subproblems. For inseparable constraints, but assuming convexity, we develop a PVD method based on suitable approximate projected gradient directions. The approximation criterion is based on a certain error bound result, and it is readily implementable. Using such approximate directions may be especially useful when the projection operation is computationally expensive.  相似文献   

19.
This is the third part of a trilogy on parallel solution of the linear elasticity problem. We consider the separate displacement ordering for a plain isotropic problem with full Dirichlet boundary conditions. The parallel solution methods presented in the first two parts of the trilogy are here generalised to higher order by using hierarchical finite elements. We discuss node numberings on regular grids for high degree of parallelism and even processor load as well as the problem of stability of the modified incomplete Cholesky factorisations used. Several preconditioning techniques for the conjugate gradient method are studied and compared for quadratic finite elements. Bounds for the condition numbers of the corresponding preconditioning methods are derived, and computer experiments are performed in order to confirm the theory and give recommendations on the choice of method. The parallel implementation is performed by message passing interface. Copyright © 2012 John Wiley & Sons, Ltd.  相似文献   

20.
A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation, while maintaining greater computation flexibility (e.g. differences in processors speed; differences in the data input to processors). We construct asynchronous (AS) finite difference schemes for the solution of PDEs by removing the synchronization constraint. We analyze the numerical properties of these schemes. Based on the analysis, we develop corrected-asynchronous (CA) finite difference schemes which are specifically constructed for an asynchronous processing. We present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although our discussion concentrates on the Euler scheme it should serve only as a sample, as it can be extended to other schemes and other PDEs.These schemes are implemented on the shared-memory multi-userSequent Balance machine. Numerical results for one and two dimensional problems are presented. It is shown experimentally that synchronization penalty can be about 50% of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time-level, and with the problem dimension. The efficiency of the AS may reach 90% and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient but provides more accurate results for intermediate (non steady-state) values. The results show the potential of developing asynchronous finite deference schemes for steady-state as well as non steadystate problems.This research was partially supported by a grant from The Basic Research Foundation administrated by The Israel Academy of Sciences and Humanities.A reduced version of the paper was presented at the 4th SIAM Conference on Parallel Processing for Scientific Computing, Dec. 11–13, 1989, Chicago, USA.The work by this author was supported by research grant 337 of the Israeli National Council for Research and Development in the years 1990–1991.This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NASI-18107 while the author was in residence at the Institute for Computer Applications in Sciences and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23665, USA.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号