Similar Documents
1.
Periodic steady states are an important class of operating regimes in scientific and engineering systems. Computing them is far more expensive than solving the corresponding initial value problems, so the need for parallel computation is even more pressing. We propose a parallel method for computing time-periodic solutions of parabolic equations: a waveform relaxation method based on domain decomposition (the Schwarz method), which only requires solving lower-dimensional periodic problems on the subdomains. We analyse the convergence of the method under two different transmission conditions and support the theoretical results with numerical experiments.
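To make the idea concrete, the following is a minimal Python sketch, not the authors' code, of Schwarz waveform relaxation with classical Dirichlet transmission conditions for a one-dimensional time-periodic heat equation; the discretization (backward Euler, two overlapping subdomains) and all parameter values are illustrative assumptions. Each subdomain solves its own lower-dimensional time-periodic problem as a single all-at-once sparse system.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Time-periodic 1D heat equation u_t = nu*u_xx + f(x,t), u(0,t)=u(1,t)=0,
# with f periodic of period T; we seek the time-periodic solution.
nu, T = 0.1, 1.0
N, Nt = 60, 64                       # interior grid points, time steps per period
h, dt = 1.0 / (N + 1), T / Nt
x = np.linspace(h, 1.0 - h, N)       # interior nodes
t = dt * np.arange(1, Nt + 1)        # time levels 1..Nt (level 0 == level Nt)
f = np.sin(np.pi * x)[None, :] * np.sin(2 * np.pi * t / T)[:, None]

def periodic_subdomain_solve(idx, left_trace, right_trace):
    """Solve the time-periodic backward-Euler problem on the nodes x[idx],
    with Dirichlet traces (length Nt) prescribed at the two subdomain ends."""
    m = len(idx)
    lap = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m)) / h**2
    B = sp.identity(m) + dt * nu * lap           # one backward-Euler step matrix
    P = sp.diags([np.ones(Nt - 1)], [-1], shape=(Nt, Nt)).tolil()
    P[0, Nt - 1] = 1.0                           # periodic wrap: u^0 = u^Nt
    K = sp.kron(sp.identity(Nt), B) - sp.kron(P, sp.identity(m))
    rhs = dt * f[:, idx]
    rhs[:, 0] += dt * nu / h**2 * left_trace     # boundary data enters the rhs
    rhs[:, -1] += dt * nu / h**2 * right_trace
    return spsolve(K.tocsc(), rhs.ravel()).reshape(Nt, m)

# Two overlapping subdomains (index sets into the interior grid).
idx1 = np.arange(0, 38)              # left subdomain:  x[0..37]
idx2 = np.arange(26, N)              # right subdomain: x[26..N-1]
U = np.zeros((Nt, N))                # current waveform iterate on the whole grid

for k in range(20):                  # waveform relaxation sweeps
    # Dirichlet transmission: traces come from the previous global iterate,
    # so the two subdomain solves are independent (parallelisable).
    u1 = periodic_subdomain_solve(idx1, np.zeros(Nt), U[:, idx1[-1] + 1])
    u2 = periodic_subdomain_solve(idx2, U[:, idx2[0] - 1], np.zeros(Nt))
    U_new = np.hstack([u1[:, :32], u2[:, 32 - idx2[0]:]])   # glue the pieces
    err = np.max(np.abs(U_new - U))
    U = U_new
    if err < 1e-10:
        break
print("WR sweeps:", k + 1, "last update:", err)
```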

2.
The paper proposes two parallel and cyclic algorithms for solving systems of equilibrium problems in Hilbert spaces. The algorithms combine two methods, the diagonal subgradient method and the projection method, with parallel or cyclic computations. The results can be considered improvements, in terms of computational steps, over several previously known methods for systems of equilibrium problems, and the algorithms also allow several assumptions imposed on the bifunctions to be relaxed. Strong convergence theorems are established under suitable conditions.

3.
For the parallel integration of nonstiff initial value problems (IVPs), three main approaches can be distinguished: approaches based on “parallelism across the problem”, on “parallelism across the method” and on “parallelism across the steps”. The first type of parallelism does not require special integration methods and can be exploited within any available IVP solver. The method-parallelism approach received much attention, particularly within the class of explicit Runge-Kutta methods originating from fixed point iteration of implicit Runge-Kutta methods of Gaussian type. The construction and implementation on a parallel machine of such methods is extremely simple. Since the computational work per processor is modest with respect to the number of data to be exchanged between the various processors, this type of parallelism is most suitable for shared memory systems. The required number of processors is roughly half the order of the generating Runge-Kutta method and the speed-up with respect to a good sequential IVP solver is about a factor 2. The third type of parallelism (step-parallelism) can be achieved in any IVP solver based on predictor-corrector iteration and requires the processors to communicate after each full iteration. If the iterations have sufficient computational volume, then the step-parallel approach may be suitable for implementation on distributed memory systems. Most step-parallel methods proposed so far employ a large number of processors, but lack the property of robustness, due to a poor convergence behaviour in the iteration process. Hence, the effective speed-up is rather poor. The dynamic step-parallel iteration process proposed in the present paper is less massively parallel, but turns out to be sufficiently robust to achieve speed-up factors up to 15.
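As an illustration of the method-parallel approach described above, here is a small sketch, under simplifying assumptions, of fixed-point iteration of the 2-stage Gauss implicit Runge-Kutta method; within each sweep the stage derivative evaluations are mutually independent, which is where the (roughly two-fold) parallelism comes from. This is not code from the paper.

```python
import numpy as np

# 2-stage Gauss implicit Runge-Kutta method (order 4): Butcher tableau.
s3 = np.sqrt(3.0)
A = np.array([[0.25, 0.25 - s3 / 6.0],
              [0.25 + s3 / 6.0, 0.25]])
b = np.array([0.5, 0.5])
c = np.array([0.5 - s3 / 6.0, 0.5 + s3 / 6.0])

def step(f, t, y, h, iters=6):
    """One step of the iterated Gauss method: the implicit stage equations
    K_i = f(t + c_i h, y + h * sum_j a_ij K_j) are solved by fixed-point
    (predictor-corrector) iteration.  Within each sweep the two evaluations
    of f are independent: that is the 'parallelism across the method'."""
    K = np.tile(f(t, y), (2, 1))                 # predictor: K_i = f(t, y)
    for _ in range(iters):
        Y = y + h * (A @ K)                      # stage values
        # The next two evaluations are independent and could run on two processors.
        K = np.vstack([f(t + c[0] * h, Y[0]),
                       f(t + c[1] * h, Y[1])])
    return y + h * (b @ K)

# Example nonstiff problem: y' = -y + sin(t), y(0) = 1.
f = lambda t, y: -y + np.sin(t)
t, y, h = 0.0, np.array([1.0]), 0.05
for _ in range(200):
    y = step(f, t, y, h)
    t += h
exact = 0.5 * (np.sin(t) - np.cos(t)) + 1.5 * np.exp(-t)
print("t =", t, " y =", y[0], " error =", abs(y[0] - exact))
```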

4.
During the past decades, explicit finite element approximate inverse preconditioning methods have been extensively used for efficiently solving sparse linear systems on multiprocessor systems. The effectiveness of explicit approximate inverse preconditioning schemes relies on the use of efficient preconditioners that are close approximants to the coefficient matrix and are fast to compute in parallel. New parallel computational techniques are proposed for the parallelization of the Optimized Banded Generalized Approximate Inverse Finite Element Matrix (OBGAIFEM) algorithm, based on the concept of the “fish bone” computational approach, and for Explicit Preconditioned Conjugate Gradient type methods on a General Purpose Graphics Processing Unit (GPGPU). The proposed parallel methods have been implemented using the Compute Unified Device Architecture (CUDA) developed by NVIDIA. Finally, numerical results are presented for the performance of finite element explicit approximate inverse preconditioning when solving characteristic two-dimensional boundary value problems on a massively parallel GPU. The CUDA implementation issues of the proposed methods are also discussed.
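The OBGAIFEM preconditioner itself is not reproduced here; as a rough illustration of the explicit approximate inverse idea, the sketch below uses a truncated Neumann-series approximate inverse as a stand-in, so that the preconditioning step of the conjugate gradient method reduces to sparse matrix-vector products, the kind of operation that maps well to GPUs. The test matrix and parameters are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def model_matrix(n):
    """5-point discretization on an n x n grid (stand-in for the FE matrices)."""
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(I, T) + sp.kron(T, I)).tocsr()

n = 40
A = model_matrix(n)
b = np.ones(A.shape[0])

# Explicit approximate inverse (a crude stand-in for OBGAIFEM): the truncated
# Neumann series M = 2*D^{-1} - D^{-1} A D^{-1} ~ A^{-1}, kept in factored form
# so that applying M is just sparse matrix-vector products.
Dinv = sp.diags(1.0 / A.diagonal())
apply_M = lambda r: Dinv @ (2.0 * r - A @ (Dinv @ r))

def pcg(A, b, apply_M, tol=1e-8, maxiter=500):
    """Explicit preconditioned CG: preconditioning is an explicit mat-vec."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_M(r)
    p = z.copy()
    rz = r @ z
    for k in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z = apply_M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

x, iters = pcg(A, b, apply_M)
print("PCG iterations with explicit approximate inverse:", iters)
print("residual norm:", np.linalg.norm(b - A @ x))
```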

6.
Model reduction is an area of fundamental importance in many modeling and control applications. In this paper we analyze the use of parallel computing in model reduction methods based on balanced truncation of large-scale dense systems. The methods require the computation of the Gramians of a linear time-invariant system. Using a sign function-based solver for computing full-rank factors of the Gramians yields some favorable computational aspects in the subsequent computation of the reduced-order model, particularly for non-minimal systems. As sign function-based computations only require efficient implementations of basic linear algebra operations readily available, e.g., in BLAS, LAPACK, and ScaLAPACK, good performance of the resulting algorithms on parallel computers is to be expected. Our experimental results on a PC cluster show the performance and scalability of the parallel implementation.
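A serial NumPy sketch of the sign-function iteration for a factor of the controllability Gramian is shown below, checked against a dense Lyapunov solver. It only illustrates the kernel structure (matrix inversions and products, i.e. the BLAS/LAPACK-type operations the abstract refers to) and is not the paper's ScaLAPACK-based parallel implementation; the test matrices are arbitrary assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(0)
n, p = 60, 3
M = rng.standard_normal((n, n))
# Shift M so that A is stable (spectral abscissa -1): a throwaway test matrix.
A = M - (np.max(np.linalg.eigvals(M).real) + 1.0) * np.eye(n)
B = rng.standard_normal((n, p))

# Sign-function iteration for the Lyapunov equation A*W + W*A^T + B*B^T = 0:
#   A_{k+1} = (A_k + A_k^{-1}) / 2,   S_{k+1} = [S_k, A_k^{-1} S_k] / sqrt(2),
# with W = S S^T / 2 at convergence.  Only inversions and matrix products are
# needed, i.e. the kind of kernels BLAS/LAPACK/ScaLAPACK provide.
Ak, S = A.copy(), B.copy()
for k in range(100):
    Ainv = np.linalg.inv(Ak)
    S = np.hstack([S, Ainv @ S]) / np.sqrt(2.0)
    R = np.linalg.qr(S.T, mode='r')        # column compression: S S^T == R^T R
    S = R.T
    Anew = 0.5 * (Ak + Ainv)
    if np.linalg.norm(Anew - Ak, 1) <= 1e-12 * np.linalg.norm(Ak, 1):
        Ak = Anew
        break
    Ak = Anew

W_sign = 0.5 * (S @ S.T)
W_ref = solve_continuous_lyapunov(A, -B @ B.T)   # dense reference solution
print("sign iterations:", k + 1)
print("relative error vs dense solver:",
      np.linalg.norm(W_sign - W_ref) / np.linalg.norm(W_ref))
```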

7.
By an equivalent reformulation of the linear complementarity problem into a system of fixed-point equations, we construct modulus-based synchronous multisplitting iteration methods based on multiple splittings of the system matrix. These iteration methods are suitable for high-speed parallel multiprocessor systems and include multisplitting relaxation methods such as Jacobi, Gauss–Seidel, successive overrelaxation, and accelerated overrelaxation of the modulus type as special cases. We establish the convergence theory of these modulus-based synchronous multisplitting iteration methods and their relaxed variants when the system matrix is an H₊-matrix. Numerical results show that these new iteration methods can achieve high parallel computational efficiency in actual implementations.
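As a minimal illustration, assuming a single Jacobi-type splitting rather than the paper's synchronous multisplitting across processors, the sketch below implements the modulus-based fixed-point reformulation for a small linear complementarity problem with an M-matrix (a special case of an H₊-matrix).

```python
import numpy as np

def modulus_splitting_lcp(A, q, gamma=1.0, tol=1e-10, maxiter=1000):
    """Modulus-based splitting iteration for the LCP(q, A):
    find z >= 0 with w = A z + q >= 0 and z^T w = 0.
    Substituting z = (|x| + x)/gamma and w = Omega (|x| - x)/gamma turns the
    LCP into the fixed-point equation
        (M + Omega) x = N x + (Omega - A)|x| - gamma*q,   with A = M - N.
    Here: Jacobi splitting M = diag(A) and Omega = diag(A), so each sweep is a
    diagonal solve, trivially parallel across components."""
    D = np.diag(np.diag(A))
    N = D - A                          # Jacobi splitting A = D - N
    Omega = D
    diag_lhs = np.diag(D + Omega)
    x = np.zeros(len(q))
    for k in range(maxiter):
        x_new = (N @ x + (Omega - A) @ np.abs(x) - gamma * q) / diag_lhs
        if np.linalg.norm(x_new - x, np.inf) <= tol:
            x = x_new
            break
        x = x_new
    return (np.abs(x) + x) / gamma, k + 1

# Example: a tridiagonal M-matrix (hence an H-matrix with positive diagonal).
n = 100
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
q = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
z, iters = modulus_splitting_lcp(A, q)
w = A @ z + q
print("iterations:", iters)
print("min z:", z.min(), " min w:", w.min(), " max |z_i w_i|:", np.abs(z * w).max())
```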

8.
Rollout algorithms are innovative methods, recently proposed by Bertsekas et al. [3], for solving NP-hard combinatorial optimization problems. The main advantage of these approaches is their capability of magnifying the effectiveness of any given heuristic algorithm. However, one of the main limitations of rollout algorithms in solving large-scale problems is their computational complexity. Innovative versions of rollout algorithms, aimed at reducing the computational complexity in sequential environments, have been proposed in our previous work [9]. In this paper, we show that a further reduction can be accomplished by using parallel technologies. Indeed, rollout algorithms have very appealing characteristics that make them suitable for efficient and effective implementations in parallel environments, thus extending their range of relevant practical applications. We propose two strategies for parallelizing rollout algorithms and we analyze their performance by considering a shared-memory paradigm. The computational experiments have been carried out on an SGI Origin 2000 with 8 processors, considering two classical combinatorial optimization problems. The numerical results show that a good reduction of the execution time can be obtained by exploiting parallel computing systems.
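A hedged sketch of the rollout principle on a toy travelling salesman instance follows: the base heuristic is nearest-neighbour completion, and at each step every candidate move is scored by one rollout of the base heuristic. Those candidate evaluations are independent, which is exactly what a shared-memory parallelization can distribute. The instance and the base heuristic are illustrative choices, not the problems or heuristics used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.random((30, 2))                       # random TSP instance
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

def nn_completion(tour, remaining):
    """Base heuristic: complete a partial tour greedily (nearest neighbour)."""
    tour, remaining = list(tour), set(remaining)
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda j: D[last, j])
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def tour_length(tour):
    return sum(D[tour[i], tour[i + 1]] for i in range(len(tour) - 1)) + D[tour[-1], tour[0]]

def rollout(start=0):
    """Rollout: at each step, score every feasible next city by the length of
    the tour the base heuristic produces from there, and commit to the best.
    The scoring loop over candidates is embarrassingly parallel."""
    tour, remaining = [start], set(range(len(D))) - {start}
    while remaining:
        # Each candidate evaluation below is independent (parallelisable).
        scores = {j: tour_length(nn_completion(tour + [j], remaining - {j}))
                  for j in remaining}
        best = min(scores, key=scores.get)
        tour.append(best)
        remaining.remove(best)
    return tour

base = nn_completion([0], set(range(len(D))) - {0})
roll = rollout(0)
print("nearest-neighbour tour length:", tour_length(base))
print("rollout tour length:          ", tour_length(roll))
```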

9.
Nowadays, with the volume of data growing at an unprecedented rate, large-scale data mining and knowledge discovery have become a new challenge. Rough set theory for knowledge acquisition has been successfully applied in data mining. The recently introduced MapReduce technique has received much attention from both the scientific community and industry for its applicability in big data analysis. To mine knowledge from big data, we present parallel large-scale rough set based methods for knowledge acquisition using MapReduce in this paper. We implemented them on several representative MapReduce runtime systems: Hadoop, Phoenix and Twister. Performance comparisons on these runtime systems are reported in this paper. The experimental results show that (1) the computational time is generally lowest on Twister when the same number of cores is employed; (2) Hadoop has the best speedup for larger data sets; (3) Phoenix has the best speedup for smaller data sets. The excellent speedups also demonstrate that the proposed parallel methods can effectively process very large data on different runtime systems. Pitfalls and advantages of these runtime systems are also illustrated through our experiments, which can help users decide which runtime system to use in their applications.
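The following is a toy, single-process simulation of the map/reduce structure (not Hadoop, Phoenix or Twister code): the map tasks build per-split equivalence classes over the condition attributes, the associative reduce merges them, and the rough set lower and upper approximations of a decision class are read off the merged classes. The decision table is a made-up example.

```python
from collections import defaultdict
from functools import reduce

# Toy decision table: each record is ((condition attributes), decision).
# In a real deployment each split would be a block of a much larger table.
splits = [
    [((1, 0), 'yes'), ((1, 1), 'no'), ((0, 0), 'yes')],
    [((1, 0), 'yes'), ((0, 1), 'no'), ((0, 0), 'no')],
    [((1, 1), 'no'),  ((1, 0), 'yes')],
]

def map_phase(split):
    """Map: emit condition-attribute tuple -> set of decisions seen in this split."""
    out = defaultdict(set)
    for cond, dec in split:
        out[cond].add(dec)
    return out

def reduce_phase(a, b):
    """Reduce: merge the per-split decision sets of each equivalence class."""
    merged = defaultdict(set, {k: set(v) for k, v in a.items()})
    for k, v in b.items():
        merged[k] |= v
    return merged

# The map calls are independent (one per data block); the reduce is associative.
classes = reduce(reduce_phase, map(map_phase, splits))

target = 'yes'
lower = [c for c, decs in classes.items() if decs == {target}]
upper = [c for c, decs in classes.items() if target in decs]
print("equivalence classes:", dict(classes))
print("lower approximation of 'yes':", lower)
print("upper approximation of 'yes':", upper)
```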

10.
Banded Toeplitz systems of linear equations arise in many application areas and have been well studied in the past. Recently, significant advancement has been made in algorithm development of fast parallel scalable methods to solve tridiagonal Toeplitz problems. In this paper we will derive a new algorithm for solving symmetric pentadiagonal Toeplitz systems of linear equations based upon a technique used in [J.M. McNally, L.E. Garey, R.E. Shaw, A split-correct parallel algorithm for solving tri-diagonal symmetric Toeplitz systems, Int. J. Comput. Math. 75 (2000) 303-313] for tridiagonal Toeplitz systems. A common example which arises in natural quintic spline problems will be used to demonstrate the algorithm’s effectiveness. Finally, computational results and comparisons will be presented.
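For reference, a symmetric pentadiagonal Toeplitz system of the quintic-spline type can be solved with a standard sequential banded solver as sketched below; this is only a baseline for checking results, not the split-correct parallel algorithm derived in the paper, and the stencil weights are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_banded

# Symmetric pentadiagonal Toeplitz system of the kind arising from natural
# quintic splines; the stencil weights below are illustrative.
n = 2000
off2, off1, main = 1.0, 26.0, 66.0      # second/first off-diagonals, main diagonal
rhs = np.sin(np.linspace(0.0, 4.0 * np.pi, n))

# Banded storage for solve_banded((2, 2), ...): rows hold the diagonals from
# the 2nd superdiagonal down to the 2nd subdiagonal.
ab = np.zeros((5, n))
ab[0, 2:] = off2
ab[1, 1:] = off1
ab[2, :] = main
ab[3, :-1] = off1
ab[4, :-2] = off2

x = solve_banded((2, 2), ab, rhs)

# Check the residual against a dense construction of the same Toeplitz matrix.
A = (main * np.eye(n)
     + off1 * (np.eye(n, k=1) + np.eye(n, k=-1))
     + off2 * (np.eye(n, k=2) + np.eye(n, k=-2)))
print("max residual:", np.max(np.abs(A @ x - rhs)))
```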

11.
Felipe Albrecht, Nelson Borges, PAMM 2007, 7(1): 2120007-2120008
Molecular phylogenetics studies the evolutionary relationships between living beings, protein and genetic sequences, and other molecular taxa. In recent years there has been increased interest in producing large and accurate phylogenetic trees using numerical computational methods. One class of approaches is the distance matrix method, in which the matrix entries represent the dissimilarities between the taxa under study. Constructing large and accurate trees for a large number of taxa increases the computational load, and parallel computing can be helpful. So far we have tested several methods for solving the linear systems involved, which has led to important conclusions, such as which of the tested methods performs best, and has also raised some questions about communication.

12.
Calcium waves are modeled by parabolic partial differential equations, whose simulation codes contain Krylov subspace methods as computational kernels. This paper presents GPU-based parallel computations for the conjugate gradient method applied to the finite difference discretization of a Poisson equation as a prototype problem for the computational kernel. The CUDA algorithm tests the three memory systems of a CUDA-enabled GPU: global memory, texture memory, and shared memory. Due to the caching mechanism and coalesced read/write operations, the CUDA algorithm using global memory and single precision floating point numbers outperforms the algorithms accessing texture memory and shared memory.

13.
We evaluate UNITY – a computational model, specification language and proof system defined by Chandy and Misra [5] for the development of parallel and distributed programs – as a platform for simulation model specification and analysis. We describe a UNITY-based methodology for the construction, analysis and execution of simulation models. The methodology starts with a simulation model specification in the form of a set of coupled state transition systems. Mechanical methods for mapping the transition systems first into a set of formal assertions, permitting formal verification of the transition systems, and second into an executable program are described. The methodology provides a means to independently verify the correctness of the transition systems: one can formally specify properties that the model should obey and prove them as theorems using the formal specification. The methodology is illustrated through generation of a simulation program solving the machine interference problem using the Time Warp protocol on a distributed memory parallel architecture.

14.
1. Introduction. Consider the large sparse system of linear equations $Ax = b$ (1.1), where, for a fixed positive integer $\alpha$, $A \in L(R^n)$ is a symmetric positive definite (SPD) matrix having a blocked form; $x, b \in R^n$ are the unknown and the known vectors, respectively, having the corresponding blocked forms; and $n_i$ ($n_i \le n$, $i = 1, 2, \ldots, \alpha$) are given positive integers satisfying $\sum_{i=1}^{\alpha} n_i = n$. This system of linear equations often arises in suitable finite element discretizations of many second-order self-ad…

15.
A synchronous concurrent algorithm (SCA) is a parallel deterministic algorithm based on a network of modules and channels, computing and communicating data in parallel, and synchronised by a global clock with discrete time. Many types of algorithms, computer architectures, and mathematical models of physical and biological systems are examples of SCAs. For example, conventional digital hardware is made from components that are SCAs and many computational models possess the essential features of SCAs, including systolic arrays, neural networks, cellular automata and coupled map lattices. In this paper we formalise the general concept of an SCA equipped with a global clock in order to analyse precisely (i) specifications of their spatio-temporal behaviour; and (ii) the senses in which the algorithms are correct. We start the mathematical study of SCA computation, specification and correctness using methods based on computation on many-sorted topological algebras and equational logic. We show that specifications can be given equationally and, hence, that the correctness of SCAs can be reduced to the validity of equations in certain computable algebras. Since the idea of an SCA is general, our methods and results apply to each of the particular classes of algorithms and dynamical systems above.
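A minimal executable instance of an SCA is sketched below: a ring of identical modules driven by a discrete global clock, with the module function chosen here as an elementary cellular automaton rule. The harness, not the particular rule, is the point; this is an illustration rather than the paper's formal algebraic machinery.

```python
# A minimal synchronous concurrent algorithm: n modules connected in a ring,
# each reading its neighbours' channels and updating on a global clock tick.
# The module function here is elementary cellular automaton rule 90; swapping
# it covers systolic arrays, coupled map lattices, etc.

def module(left, centre, right):
    """The local computation every module performs at each clock tick."""
    return left ^ right                  # rule 90: XOR of the two neighbours

def sca_run(state, ticks):
    """Synchronous update: V(t+1, i) = module(V(t, i-1), V(t, i), V(t, i+1))."""
    n = len(state)
    history = [state]
    for _ in range(ticks):
        state = [module(state[(i - 1) % n], state[i], state[(i + 1) % n])
                 for i in range(n)]      # all modules read the same clock cycle
        history.append(state)
    return history

init = [0] * 31
init[15] = 1                             # a single seed cell
for row in sca_run(init, 15):
    print(''.join('#' if v else '.' for v in row))
```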

16.
David E. Keyes, PAMM 2007, 7(1): 1026401-1026402
Towards Optimal Petascale Simulations (TOPS) is a scalable solver software project based on domain-decomposed parallelization whose goal is to research, implement, and support, in collaboration with users, an open-source package for large-scale discretized PDE problems. Optimal complexity methods, such as multigrid/multilevel preconditioners, keep the time spent in dominant algebraic kernels close to linear in discrete problem size as the applications scale on massively parallel computers. Krylov accelerators and Jacobian-free variants of Newton's method, as appropriate, are wrapped around the multilevel methods to deliver robustness in multirate, multiscale coupled systems, which are solved either implicitly or in more traditional forms of operator splitting. The TOPS software framework is being extended beyond direct computational simulation to computational optimization, including design, control, and inverse problems. We outline and illustrate the philosophy of TOPS.

17.
Methods of constructing preconditioners for explicit iterative methods for solving systems of linear algebraic equations with sparse matrices are considered in this work. The techniques considered can, first of all, be realized within the framework of the simplest data structures; secondly, the graph structures of the corresponding algorithms are well adapted to realization on parallel computers; thirdly, in conjunction with modifications of Chebyshev methods they make it possible to construct rather effective computational algorithms. Experimental data are presented which demonstrate the effect of the proposed preconditioning techniques on the distribution of the eigenvalues of the matrices of systems arising in the discretization of two-dimensional elliptic boundary-value problems. Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova AN SSSR, Vol. 139, pp. 51-60, 1984.
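As an illustration of preconditioned explicit (Chebyshev-type) iteration, the sketch below runs the classical Chebyshev recurrence on a diagonally scaled two-dimensional elliptic model problem; the iteration uses only matrix-vector products and vector updates, which is what makes such methods attractive on parallel computers. The model problem, the diagonal preconditioner and the use of eigsh to obtain spectral bounds are all illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, eigsh

# Model problem: 2D Laplacian plus a variable zero-order term on an m x m grid.
m = 40
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))
     + sp.diags(1.0 + np.linspace(0.0, 2.0, m * m))).tocsr()
b = np.ones(m * m)

# Explicit (diagonal) preconditioning: applying it is a purely local operation.
dinv = 1.0 / A.diagonal()
prec = lambda v: dinv * v

# Bounds on the spectrum of the preconditioned operator, estimated here with
# eigsh on the symmetrically scaled matrix (in practice Gershgorin-type or
# adaptive estimates would be used instead).
sq = np.sqrt(dinv)
sym = LinearOperator(A.shape, matvec=lambda v: sq * (A @ (sq * v)), dtype=float)
lmax = eigsh(sym, k=1, which='LA', return_eigenvectors=False)[0]
lmin = eigsh(sym, k=1, which='SA', return_eigenvectors=False)[0]

# Chebyshev iteration on the preconditioned system: only mat-vecs and axpys.
theta, delta = (lmax + lmin) / 2.0, (lmax - lmin) / 2.0
sigma1 = theta / delta
rho = 1.0 / sigma1
x = np.zeros_like(b)
z = prec(b - A @ x)                  # preconditioned residual
d = z / theta
for k in range(300):
    x = x + d
    z = z - prec(A @ d)
    rho_new = 1.0 / (2.0 * sigma1 - rho)
    d = rho_new * rho * d + (2.0 * rho_new / delta) * z
    rho = rho_new
    res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
    if res < 1e-8:
        break
print("Chebyshev iterations:", k + 1, " relative residual:", res)
```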

18.
The article is of the nature of a survey and is devoted to direct (exact) methods of solving systems of linear equations, examined from the point of view of their computational complexity. The construction of most of the algorithms is outlined. The paper consists of two parts. Sequential methods of solving systems of linear equations are examined in the first part. It includes the algorithms of Gauss and of Konoval'tsev, Strassen's algorithm and its modifications, the Yun-Gustavson results for Toeplitz systems, etc. The second part is devoted to parallel methods of solving systems of linear equations. Examined here are the parallelization of the Gauss algorithm, the results of Hyafil and Kung on the complexity of the parallel solution of triangular systems, Csanky's results based on the parallelization of Leverrier's method, Hyafil's general result on the parallelization of a straight-line program for computing polynomials, and Stone's algorithm for the parallel solution of tridiagonal systems. Several new bounds are derived. In particular, if a pair of (n×n)-matrices can be multiplied sequentially by a straight-line program of complexity $O(n^{\alpha})$, then it is possible to solve an arbitrary system of m linear equations in n unknowns on p processors with complexity $$O\left( \frac{\max(m, n)\,\min(m, n)^{\alpha - 1}}{p} + \min(m, n)\log_2 \max(m, n) \right),$$ and to solve a triangular system of size n with complexity $$O\left( \frac{n^2}{p} + \frac{n}{p^{1/\alpha}}\log_2^{1 - \frac{1}{\alpha}} n + \log_2^2 n \right).$$

19.
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
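The sketch below simulates, in NumPy on a single machine, the straightforward parallel k-means formulation that the paper takes as its baseline: each "node" processes its own data block, and a global reduction of partial sums and counts (standing in for an MPI allreduce) produces the new centroids. The data and partitioning are synthetic assumptions; the paper's contribution is precisely to remove this global reduction step.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: three Gaussian blobs, scattered over 8 "processing nodes".
centres_true = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 6.0]])
data = np.vstack([c + rng.standard_normal((400, 2)) for c in centres_true])
rng.shuffle(data)
blocks = np.array_split(data, 8)         # the per-node partitions

k = 3
centroids = data[rng.choice(len(data), k, replace=False)]

for it in range(50):
    # Local phase: each node works only on its own block (runs in parallel).
    partial_sums = np.zeros((len(blocks), k, 2))
    partial_counts = np.zeros((len(blocks), k))
    for p, block in enumerate(blocks):
        dists = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            partial_sums[p, j] = block[labels == j].sum(axis=0)
            partial_counts[p, j] = np.count_nonzero(labels == j)
    # Global reduction: the step the paper seeks to eliminate (MPI_Allreduce-like).
    sums = partial_sums.sum(axis=0)
    counts = partial_counts.sum(axis=0)
    new_centroids = np.where(counts[:, None] > 0,
                             sums / np.maximum(counts, 1)[:, None], centroids)
    if np.allclose(new_centroids, centroids):
        centroids = new_centroids
        break
    centroids = new_centroids

print("iterations:", it + 1)
print("centroids:\n", np.round(centroids, 2))
```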

20.
As the computational power of high-performance computing systems continues to increase by using a huge number of cores or specialized processing units, high-performance computing applications are increasingly prone to faults. In this paper, we present a new class of numerical fault tolerance algorithms to cope with node crashes in parallel distributed environments. This new resilient scheme is designed at the application level and does not require extra resources, that is, computational units or computing time, when no fault occurs. In the framework of iterative methods for the solution of sparse linear systems, we present numerical algorithms to extract relevant information from available data after a fault, assuming a separate mechanism ensures fault detection. After data extraction, a well-chosen part of the missing data is regenerated through interpolation strategies to constitute meaningful inputs to restart the iterative scheme. We have developed these methods, referred to as interpolation-restart techniques, for Krylov subspace linear solvers. After a fault, lost entries of the current iterate computed by the solver are interpolated to define a new initial guess to restart the Krylov method. A well-suited initial guess is computed by using the entries of the faulty iterate available on surviving nodes. We present two interpolation policies that preserve key numerical properties of well-known linear solvers, namely, the monotonic decrease of the A-norm of the error of the conjugate gradient method or the residual norm decrease of the generalized minimal residual (GMRES) algorithm. The qualitative numerical behavior of the resulting scheme has been validated with sequential simulations, when the number of faults and the amount of data losses are varied. Finally, the computational costs associated with the recovery mechanism have been evaluated through parallel experiments.
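A sequential simulation in the spirit of the linear interpolation policy is sketched below, under the assumption of an SPD model problem and a single simulated node crash: the lost entries of the CG iterate are regenerated by solving the local system built from the surviving entries, and CG is restarted from the repaired iterate. This is an illustration, not the authors' implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# SPD model problem: 2D 5-point Laplacian on an m x m grid.
m = 40
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
A = (sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))).tocsr()
n = A.shape[0]
b = np.ones(n)

def cg_steps(x, nsteps):
    """Plain CG iterations from x (stands in for the solver running between faults)."""
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(nsteps):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

x = cg_steps(np.zeros(n), 60)
print("residual before the fault:", np.linalg.norm(b - A @ x))

# Simulated node crash: the block of entries owned by one node is lost.
lost = np.arange(n // 4, n // 2)
kept = np.setdiff1d(np.arange(n), lost)
x_faulty = x.copy()
x_faulty[lost] = 0.0

# Interpolation-restart, linear interpolation policy: regenerate lost entries
# from the surviving ones via  A[lost,lost] x_lost = b[lost] - A[lost,kept] x_kept,
# then restart the Krylov iteration from the repaired iterate.
x_interp = x_faulty.copy()
rhs_local = b[lost] - A[lost, :][:, kept] @ x_faulty[kept]
x_interp[lost] = spsolve(A[lost, :][:, lost].tocsc(), rhs_local)

for x0, tag in [(x_faulty, "restart from the faulty iterate      "),
                (x_interp, "restart from the interpolated iterate")]:
    print(tag, "| residual now:", np.linalg.norm(b - A @ x0),
          "| after 60 more CG steps:", np.linalg.norm(b - A @ cg_steps(x0, 60)))
```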

