首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Data level parallelism is a type of parallelism whereby operations are performed on many data elements concurrently, by many processors. These operations are (more or less) identical, and are executed in a synchronous, orderly fashion. This type of parallelism is used by massively parallel SIMD (i.e., Single Instruction, Multiple Data) architectures, like the Connection Machine CM-2, the AMT DAP and Masspar, and MIMD (i.e., Multiple Instruction, Multiple Data) architectures, like the Connection Machine CM-5. Data parallelism can also be described by a theoretical model of computation: the Vector-Random Access Machine (V-RAM).In this paper we discuss practical approaches to the data-parallel solution of large scale optimization problems with network—or embedded-network—structures. The following issues are addressed: (1) The concept of dataparallelism, (2) algorithmic principles that lead to data-parallel decomposition of optimization problems with network—or embedded-network—structures, (3) specific algorithms for several network problems, (4) data-structures needed for efficient implementations of the algorithms, and (5) empirical results that highlight the performance of the algorithms on a data-parallel computer, the Connection Machine CM-2.  相似文献   

2.
Estimating the entries of a large matrix to satisfy a set of internal consistency relations is a problem with several applications in economics, urban and regional planning, transportation, statistics and other areas. It is known as theMatrix Balancing Problem. Matrix balancing applications arising from the estimation of telecommunication or transportation traffic and from multi-regional trade flows give rise to huge optimization problems. In this report, we show that the RAS algorithm can be specialized for vector and parallel computing and used for the solution of very large problems. The algorithm is specialized for vector computations on a CRAY X-MP and is parallelized on an Alliant FX/8. A variant of the algorithm — developed here for its potential parallelism — turns out to be more efficient than the original algorithm even when implemented serially. We use the algorithms to estimate disaggregated input/output tables and a multi-regional trade flow table of the U.S. The larger problem solved has approximately 12 000 constraints and over 370 000 nonlinear variables. This is the first of two papers that aim at the solution of very large matrix balancing problems. Zenios [20] is using the same algorithm for the same models on a massively parallel Connection Machine CM-2.Research partially supported by NSF grants ECS-8718971 and CCR-8811135, and AFOSR grant 89-0145. Computing resources were made available through the ACRF at Argonne National Laboratory and CRAY Research, Inc.  相似文献   

3.
An improved parallel hybrid bi-conjugate gradient method (IBiCGSTAB(2) method, in brief) for solving large sparse linear systems with nonsymmetric coefficient matrices is proposed for distributed parallel environments. The method reduces five global synchronization points to two by reconstructing the BiCGSTAB(2) method in [G.L.G. Sleijpen, H.A. van der Vorst, Hybrid bi-conjugate gradient methods for CFD problems, in: M. Hafez, K. Oshima (Eds.), Computational Fluid Dynamics Review 1995, John Wiley & Sons Ltd, Chichester, 1995, pp. 457–476] and the communication time required for the inner product can be efficiently overlapped with useful computation. The cost is only slightly increased computation time, which can be ignored, compared with the reduction of communication time. Performance and isoefficiency analysis shows that the IBiCGSTAB(2) method has better parallelism and scalability than the BiCGSTAB(2) method. Numerical experiments show that the scalability can be improved by a factor greater than 2.5 and the improvement in parallel communication performance approaches 60%.  相似文献   

4.
The convex hull of a finite set of points in the plane can be computed in constant time using a polynomial number of processors.This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grant NSERC - A3336.  相似文献   

5.
An improved GPBi-CG algorithm suitable for distributed parallel computing   总被引:1,自引:0,他引:1  
An improved generalized product-type bi-conjugate gradient (GPBi-CG) method (IGPBi-CG method, in brief) for solving large sparse linear systems with unsymmetrical coefficient matrices is proposed for distributed parallel environments. The method reduces three global synchronization points to two by reconstructing GPBi-CG method and the communication time required for the inner product can be efficiently overlapped with useful computation. The cost is only slightly increased computation time, which can be ignored compared with the reduction of communication time. Performance and isoefficiency analysis show that the IGPBi-CG method has better parallelism and scalability than the GPBi-CG method. Numerical experiments show that the scalability can be improved by a factor greater than 1.5 and the improvement in parallel communication performance approaches 33.3˙%.  相似文献   

6.
Parallel processors are becoming an attractive option for meeting the requirements to solve large nonlinear optimization problems and the partially separable methods are ideal candidates for parallel computing. This paper proposes implementation techniques for such methods. Computational experiments on an IBM 3090-200 and on a simulated multiprocessor are presented. The performance of both implementations is compared against a reference serial implementation.  相似文献   

7.
Industry and government routinely solve deterministic mathematical programs for planning and schelduling purposes, some involving thousands of variables with a linear or non-linear objective and inequality constraints. The solutions obtained are often ignored because they do not properly hedge against future contingencies. It is relatively easy to reformulate models to include uncertainty. The bottleneck has been (and is) our capability to solve them. The time is now ripe for finding a way to do so. To this end, we describe in this paper how large-scale system methods for solving multi-staged systems, such as Bender's Decomposition, high-speed sampling or Monte Carlo simulation, and parallel processors can be combined to solve some important planning problems involving uncertainty. For example, parallel processors may make it possible to come to better grips with the fundamental problems of planning, scheduling, design, and control of complex systems such as the economy, an industrial enterprise, an energy system, a water-resource system, military models for planning-and-control, decisions about investment, innovation, employment, and health-delivery systems.  相似文献   

8.
Absorbing boundary conditions have been developed for various types of problems to truncate infinite domains in order to perform computations. But absorbing boundary conditions have a second, recent and important application: parallel computing. We show that absorbing boundary conditions are essential for a good performance of the Schwarz waveform relaxation algorithm applied to the wave equation. In turn this application gives the idea of introducing a layer close to the truncation boundary which leads to a new way of optimizing absorbing boundary conditions for truncating domains. We optimize the conditions in the case of straight boundaries and illustrate our analysis with numerical experiments both for truncating domains and the Schwarz waveform relaxation algorithm.

  相似文献   


9.
For solving inverse gravimetry problems, efficient stable parallel algorithms based on iterative gradient methods are proposed. For solving systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, a parallel matrix sweep algorithm, a square root method, and a conjugate gradient method with preconditioner are proposed. The algorithms are implemented numerically on a parallel computing system of the Institute of Mathematics and Mechanics (PCS-IMM), NVIDIA graphics processors, and an Intel multi-core CPU with some new computing technologies. The parallel algorithms are incorporated into a system of remote computations entitled “Specialized Web-Portal for Solving Geophysical Problems on Multiprocessor Computers.” Some problems with “quasi-model” and real data are solved.  相似文献   

10.
We provide a detailed hands-on tutorial for the R add-on package mboost. The package implements boosting for optimizing general risk functions utilizing component-wise (penalized) least squares estimates as base-learners for fitting various kinds of generalized linear and generalized additive models to potentially high-dimensional data. We give a theoretical background and demonstrate how mboost can be used to fit interpretable models of different complexity. As an example we use mboost to predict the body fat based on anthropometric measurements throughout the tutorial.  相似文献   

11.
12.
This paper gives a parallel computing scheme for minimizing a twice continuously differentiable function with the form
where x = (xT1,…,xTm)T and xi Rni, ∑mi = 1ni = n, and n a very big number. It is proved that we may use m parallel processors and an iterative procedure to find a minimizer of ƒ(x). The convergence and convergence rate are given under some conditions. The conditions for finding a global minimizer of ƒ(x by using this scheme are given, too. A similar scheme can also be used parallelly to solve a large scale system of nonlinear equations in the similar way. A more general case is also investigated.  相似文献   

13.
The paper describes a parallel algorithm for computing an n-dimensional Delaunay tessellation using a divide-conquer strategy. Its implementation (using MPI library for C) in the case n = 2, relied on restricted areas to discard non-Delaunay edges, is executed easily on PC clusters. We shows that the convexity is a crucial factor of efficiency of the parallel implementation over the corresponding sequential one.  相似文献   

14.
This paper focuses on the Vehicle Routing Problem with Stochastic Demands (VRPSD) and discusses how Parallel and Distributed Computing Systems can be employed to efficiently solve the VRPSD. Our approach deals with uncertainty in the customer demands by considering safety stocks, i.e. when designing the routes, part of the vehicle capacity is reserved to deal with potential emergency situations caused by unexpected demands. Thus, for a given VRPSD instance, our algorithm considers different levels of safety stocks. For each of these levels, a different scenario is defined. Then, the algorithm solves each scenario by integrating Monte Carlo simulation inside a heuristic-randomization process. This way, expected variable costs due to route failures can be naturally estimated even when customers’ demands follow a non-normal probability distribution. Use of parallelization strategies is then considered to run multiple instances of the algorithm in a concurrent way. The resulting concurrent solutions are then compared and the one with the minimum total costs is selected. Two numerical experiments allow analyzing the algorithm’s performance under different parallelization schemas.  相似文献   

15.
Many parallel iterative algorithms for solving symmetric, positive definite problems proceed by solving in each iteration, a number of independent systems on subspaces. The convergence of such methods is determined by the spectrum of the sums of orthogonal projections on those subspaces, while the convergence of a related sequential method is determined by the spectrum of the product of complementary projections. We study spectral properties of sums of orthogonal projections and in the case of two projections, characterize the spectrum of the sum completely in terms of the spectrum of the product.This work was supported in part by the Norwegian Research Council for Science and the Humanities under grant D.01.08.054 and by The Royal Norwegian Council for Scientific and Industrial Research under grant IT2.28.28484; also supported in part by the Air Force Office of Scientific Research under grant AFOSR-86-0126 and by the National Science Foundation under grant DMS-8704169.  相似文献   

16.
The Approximating Sequence Method for computation of average cost optimal stationary policies in denumerahle state Markov decision chains, introduced in Sennott (1994), is reviewed. New methods for verifying the assumptions are given. These are useful for models with multidimensional state spaces that satisfy certain mild structural properties. The results are applied to four problems in the optimal routing of packets to parallel queues. Numerical results are given for one of the models.  相似文献   

17.
A fault diagnosis model for multiprocessor computers is proposed. Under normal operating mode each processor executes its own data. When an error occurs, the system is switched to the diagnostic mode. Previous input data for each processor is shifted to a different unit, to obtain a set of comparison results. We show that analysis of the test data to diagnose or locate faulty processors is equivalent to a 2-satisfiability problem. Under the assumption that discrepancy in a comparison result occurs if and only if at least one of the processors (being compared) is faulty, we prove that all the faulty processors can be diagnosed in O(n2) time, where n denotes the number of processors in the system.  相似文献   

18.
19.
The RngStreams software package provides one viable solution to the problem of creating independent random number streams for simulations in parallel processing environments. Techniques are presented for effectively using RngStreams with C++ programs that are parallelized via OpenMP or MPI. Ways to access the backbone generator from RngStreams in R through the parallel and rstream packages are also described. The ideas in the paper are illustrated with both a simple running example and a Monte Carlo integration application.  相似文献   

20.
Fractal video compression is a relatively new video compression method. Its attraction is due to the high compression ratio and the simple decompression algorithm. But its computational complexity is high and as a result parallel algorithms on high performance machines become one way out. In this study we partition the matching search, which occupies the majority of the work in a fractal video compression process, into small tasks and implement them in two distributed computing environments, one using DCOM and the other using .NET Remoting technology, based on a local area network consists of loosely coupled PCs. Experimental results show that the parallel algorithm is able to achieve a high speedup in these distributed environments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号