期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel split least‐squares mixed finite element method for parabolic problem

《Numerical Methods for Partial Differential Equations》2018,34(4):1282-1300

Based on overlapping domain decomposition, a new class of parallel split least‐squares (PSLS) mixed finite element methods is presented for solving parabolic problem. The algorithm is fully parallel. In the overlapping domains, the partition of unity is applied to distribute the corrections reasonably, which makes that the new method only needs one or two iteration steps to reach given accuracy at each time step while the classical Schwarz alternating methods need many iteration steps. The dependence of the convergence rate on the spacial mesh size, time increment, iteration times, and subdomains overlapping degree is analyzed. Some numerical results are reported to confirm the theoretical analysis. 相似文献

2.

热传导方程的一类新型重叠型并行区域分解有限差分算法

田敏羊丹平《高等学校计算数学学报》2008,30(1):14-25

0引言随着大规模科学工程计算的发展和计算精度要求的提高,区域分解和并行计算的发展越来越受到人们的重视.区域分解方法把复杂或大型的问题分解成若干重叠或非重叠子区域上的子问题,再在子区域上利用各种算法求解子问题.借助于区域分解,各子区域之间的计算可以并行,这引起了人们的研究兴趣和极大的应用前景.重叠型区域分解法的原始思想来源于Schwarz交替法.近年来建立在Schwarz交替法基础上的区域分解法在理论分析和实际应用中取得令人注目的发展,已成为一种有效的迭代方法.经典的Schwarz交替法本质上是串行的.随着并行计算的发展,出现了多种可完全并行化的Schwarz算法相似文献

3.

One‐level Newton–Krylov–Schwarz algorithm for unsteady non‐linear radiation diffusion problem

Serguei Ovtchinnikov Xiao‐Chuan Cai 《Numerical Linear Algebra with Applications》2004,11(10):867-881

In this paper, we present a parallel Newton–Krylov–Schwarz (NKS)‐based non‐linearly implicit algorithm for the numerical solution of the unsteady non‐linear multimaterial radiation diffusion problem in two‐dimensional space. A robust solver technology is required for handling the high non‐linearity and large jumps in material coefficients typically associated with simulations of radiation diffusion phenomena. We show numerically that NKS converges well even with rather large inflow flux boundary conditions. We observe that the approach is non‐linearly scalable, but not linearly scalable in terms of iteration numbers. However, CPU time is more important than the iteration numbers, and our numerical experiments show that the algorithm is CPU‐time‐scalable even without a coarse space given that the mesh is fine enough. This makes the algorithm potentially more attractive than multilevel methods, especially on unstructured grids, where course grids are often not easy to construct. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

4.

Parallel algorithm combined with mixed element procedure for compressible miscible displacement problem

Jiansong Zhang Danping Yang Hui Guo Yan Qu 《Numerical Algorithms》2017,76(4):993-1019

Based on overlapping domain decomposition, we construct a parallel mixed finite element algorithm for solving the compressible miscible displacement problem in porous media. The algorithm is fully parallel. We consider the relation between the convergence rate and discretization parameters, including the overlapping degree of the subspaces. We give the corresponding error estimate, which tells us that only two iterations are needed to reach to given accuracy at each time level. Numerical results are presented to confirm our theoretical analysis. 相似文献

5.

Parallel characteristic finite difference method for convection–diffusion equations

Jiansong Zhang Danping Yang 《Numerical Methods for Partial Differential Equations》2011,27(4):854-866

Based on the overlapping domain decomposition, an efficient parallel characteristic finite difference scheme is proposed for solving convection‐diffusion equations numerically. We give the optimal convergence order in error estimate analysis, which shows that we just need to iterate once or twice at each time level to reach the optimal convergence order. Numerical experiments also confirm the theoretical analysis. © 2009 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq 27: 854–866, 2011 相似文献

6.

On the parallel complexity of hierarchical clustering and CC‐complete problems

Raymond Greenlaw Sanpawat Kantabutra 《Complexity》2008,14(2):18-28

Complex data sets are often unmanageable unless they can be subdivided and simplified in an intelligent manner. Clustering is a technique that is used in data mining and scientific analysis for partitioning a data set into groups of similar or nearby items. Hierarchical clustering is an important and well‐studied clustering method involving both top‐down and bottom‐up subdivisions of data. In this article we address the parallel complexity of hierarchical clustering. We describe known sequential algorithms for top‐down and bottom‐up hierarchical clustering. The top‐down algorithm can be parallelized, and when there are n points to be clustered, we provide an O(log n)‐time, n²‐processor Crew Pram algorithm that computes the same output as its corresponding sequential algorithm. We define a natural decision problem based on bottom‐up hierarchical clustering, and add this HIERARCHICAL CLUSTERING PROBLEM (HCP) to the slowly growing list of CC‐complete problems, thereby showing that HCP is one of the computationally most difficult problems in the COMPARATOR CIRCUIT VALUE PROBLEM class. This class contains a variety of interesting problems, and now for the first time a problem from data mining as well. By proving that HCP is CC‐complete, we have demonstrated that HCP is very unlikely to have an NC algorithm. This result is in sharp contrast to the NC algorithm which we give for the top‐down sequential approach, and the result surprisingly shows that the parallel complexities of the top‐down and bottom‐up approaches are different, unless CC equals NC. In addition, we provide a compendium of all known CC‐complete problems. © 2008 Wiley Periodicals, Inc. Complexity, 2008 相似文献

7.

Nonoverlapping domain decomposition characteristic finite differences for three‐dimensional convection‐diffusion equations

Changfeng Li Yirang Yuan 《Numerical Methods for Partial Differential Equations》2012,28(1):17-37

One domain decomposition method modified with characteristic differences is presented for non‐periodic three‐dimensional equations by multiply‐type quadratic interpolation and variant time‐step technique. This method consists of reduced‐scale, two‐dimensional computation on subdomain interface boundaries and fully implicit subdomain computation in parallel. A computational algorithm is outlined and an error estimate in discrete l²‐ norm is established by introducing new inner products and norms. Finally, numerical examples are given to illustrate the theoretical results, efficiency and parallelism of this method. © 2010 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq 28: 17‐37, 2012 相似文献

8.

A direct Schur–Fourier decomposition for the efficient solution of high‐order Poisson equations on loosely coupled parallel computers

F. X. Trias M. Soria C. D. Prez‐Segarra A. Oliva 《Numerical Linear Algebra with Applications》2006,13(4):303-326

In this paper a parallel direct Schur–Fourier decomposition (DSFD) algorithm for the direct solution of arbitrary order discrete Poisson equations on parallel computers is proposed. It is based on a combination of a Direct Schur method and a Fourier decomposition and allows to solve each Poisson equation almost to machine accuracy using only one communication episode. Thus, it is well suited for loosely coupled parallel computers, that have a high network latency compared with the CPU performance. Several three‐dimensional direct numerical simulations (DNS) of wall‐bounded turbulent incompressible flows have been carried out using the DSFD algorithm. Numerical examples illustrating the robustness and scalability of the method on a PC cluster with a conventional 100 Mbits/s network are also presented. Copyright © 2005 John Wiley & Sons, Ltd. 相似文献

9.

Optimal server allocation in general,finite, multi‐server queueing networks

J. MacGregor Smith F. R. B. Cruz T. van Woensel 《商业与工业应用随机模型》2010,26(6):705-736

Queueing networks with finite buffers, multiple servers, arbitrary acyclic, series‐parallel topologies, and general service time distributions are considered in this paper. An approach to optimally allocate servers to series, merge, and split topologies and their combinations is demonstrated. The methodology builds on two‐moment approximations to the service time distribution embedded in the generalized expansion method for computing the performance measures in complex finite queueing networks and Powell's algorithm for optimally allocating servers to the network topology. Convexity of the objective function along with results from computational experiments is presented for showing the efficacy of the methodology. Copyright © 2009 John Wiley & Sons, Ltd. 相似文献

10.

A parallel radix‐4 block cyclic reduction algorithm

M. Myllykoski T. Rossi 《Numerical Linear Algebra with Applications》2014,21(4):540-556

A conventional block cyclic reduction algorithm operates by halving the size of the linear system at each reduction step, that is, the algorithm is a radix‐2 method. An algorithm analogous to the block cyclic reduction known as the radix‐q partial solution variant of the cyclic reduction (PSCR) method allows the use of higher radix numbers and is thus more suitable for parallel architectures as it requires fever reduction steps. This paper presents an alternative and more intuitive way of deriving a radix‐4 block cyclic reduction method for systems with a coefficient matrix of the form tridiag{ ? I,D, ? I}. This is performed by modifying an existing radix‐2 block cyclic reduction method. The resulting algorithm is then parallelized by using the partial fraction technique. The parallel variant is demonstrated to be less computationally expensive when compared to the radix‐2 block cyclic reduction method in the sense that the total number of emerging subproblems is reduced. The method is also shown to be numerically stable and equivalent to the radix‐4 PSCR method. The numerical results archived correspond to the theoretical expectations. Copyright © 2013 John Wiley & Sons, Ltd. 相似文献

11.

An H‐matrix Type Preconditioner For Frictional Contact Problems

Lotfi Abdelhakim 《PAMM》2004,4(1):348-349

The bilateral or unilateral contact problem with Coulomb friction between two elastic bodies is considered [1]. An algorithm is introduced to solve the resulting finite element system by a non‐overlapping domain decomposition method [2, 3]. The global problem is transformed to a independant local problems posed in each bodie and a problem posed on the contact surface (the interface problem). The solution is obtained by using a successive approximation method, in each step of this algorithm we solve two intermediate problems the first with prescribed tangential pressure and the second with prescribed normal pressure [8]. Our preconditioner construction is based on the application of the H‐matrix technique [6, 7] together with the representation of the H^1/2 seminorm by a sum of partial seminorms [4]. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim) 相似文献

12.

Discretized Multisplitting AOR Waveform Relaxation Algorithms for Initial Value Problem of Systems of ODEs

谷同祥李文强《数学季刊》1997,(4)

1.IntroductionInthedevelopmentofnewelectricalcircuits,thesimulationofthebehaviourofthecircuithasbecomeanessentialtoolforelectricalengineers.Fromthelayoutofthecircuitanonlinearsystemofordinarydifferentialequationsisgeneratedwhichdescribesthedynamicalbehaviourofthecircuit.Inthesimulationofverylargescaleintegrated(VLSI)circuitsthedimensionofthesystemofODEscanbecomeverylarge.Moreoversincethesystemisstiff,solvingthesesystemsisaverycomputionallyintensivetaskandtheuseofsupercomputersbecomesin-evit… 相似文献

13.

An accurate parallel block Gram–Schmidt algorithm without reorthogonalization

Denis Vanderstraeten 《Numerical Linear Algebra with Applications》2000,7(4):219-236

The modified Gram–Schmidt (MGS) orthogonalization process—used for example in the Arnoldi algorithm—often constitutes the bottleneck that limits parallel efficiencies. Indeed, a number of communications, proportional to the square of the problem size, are required to compute the dot‐products. A block formulation is attractive but it suffers from potential numerical instability. In this paper, we address this issue and propose a simple procedure that allows the use of a block Gram—Schmidt algorithm while guaranteeing a numerical accuracy close to that of MGS. The main idea is to determine the size of the blocks dynamically. The main advantages of this dynamic procedure are two‐fold: first, high performance matrix–vector multiplications can be used to decrease the execution time. Next, in a parallel environment, the number of communications is reduced. Performance comparisons with the alternative Iterated CGS also show an improvement for a moderate number of processors. Copyright © 2000 John Wiley & Sons, Ltd. 相似文献

14.

Local preconditioners for two‐level non‐overlapping domain decomposition methods

L. M. Carvalho L. Giraud G. Meurant 《Numerical Linear Algebra with Applications》2001,8(4):207-227

We consider additive two‐level preconditioners, with a local and a global component, for the Schur complement system arising in non‐overlapping domain decomposition methods. We propose two new parallelizable local preconditioners. The first one is a computationally cheap but numerically relevant alternative to the classical block Jacobi preconditioner. The second one exploits all the information from the local Schur complement matrices and demonstrates an attractive numerical behaviour on heterogeneous and anisotropic problems. We also propose two implementations based on approximate Schur complement matrices that are cheaper alternatives to construct the given preconditioners but that preserve their good numerical behaviour. Through extensive computational experiments we study the numerical scalability and the robustness of the proposed preconditioners and compare their numerical performance with well‐known robust preconditioners such as BPS and the balancing Neumann–Neumann method. Finally, we describe a parallel implementation on distributed memory computers of some of the proposed techniques and report parallel performances. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献

15.

New fast divide‐and‐conquer algorithms for the symmetric tridiagonal eigenvalue problem

下载免费PDF全文

Shengguo Li Xiangke Liao Jie Liu Hao Jiang 《Numerical Linear Algebra with Applications》2016,23(4):656-673

In this paper, two accelerated divide‐and‐conquer (ADC) algorithms are proposed for the symmetric tridiagonal eigenvalue problem, which cost O(N²r) flops in the worst case, where N is the dimension of the matrix and r is a modest number depending on the distribution of eigenvalues. Both of these algorithms use hierarchically semiseparable (HSS) matrices to approximate some intermediate eigenvector matrices, which are Cauchy‐like matrices and are off‐diagonally low‐rank. The difference of these two versions lies in using different HSS construction algorithms, one (denoted by ADC1) uses a structured low‐rank approximation method and the other (ADC2) uses a randomized HSS construction algorithm. For the ADC2 algorithm, a method is proposed to estimate the off‐diagonal rank. Numerous experiments have been carried out to show their stability and efficiency. These algorithms are implemented in parallel in a shared memory environment, and some parallel implementation details are included. Comparing the ADCs with highly optimized multithreaded libraries such as Intel MKL, we find that ADCs could be more than six times faster for some large matrices with few deflations. Copyright © 2016 John Wiley & Sons, Ltd. 相似文献

16.

Single and multi‐interval Legendre spectral methods in time for parabolic equations

Jian‐guo Tang He‐ping Ma 《Numerical Methods for Partial Differential Equations》2006,22(5):1007-1034

In this article, we take the parabolic equation with Dirichlet boundary conditions as a model to present the Legendre spectral methods both in spatial and in time. Error analysis for the single/multi‐interval schemes in time is given. For the single interval spectral method in time, we obtain a better error estimate in L²‐norm. For the multi‐interval spectral method in time, we obtain the L²‐optimal error estimate in spatial. By choosing approximate trial and test functions, the methods result in algebraic systems with sparse forms. A parallel algorithm is constructed for the multi‐interval scheme in time. Numerical results show the efficiency of the methods. The methods are also applied to parabolic equations with Neumann boundary conditions, Robin boundary conditions and some nonlinear PDEs. © 2005 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq, 2006 相似文献

17.

Testing Parallel Variable Transformation 总被引：2，自引：0，他引：2

Eiki Yamakawa Masao Fukushima 《Computational Optimization and Applications》1999,13(1-3):253-274

This paper studies performance of the parallel variable transformation (PVT) algorithm for unconstrained nonlinear optimization through numerical experiments on a Fujitsu VPP500, one of the most up-to-date vector parallel computers. Special attention is paid to a particular form of the PVT algorithm that is regarded as a generalization of the block Jacobi algorithm that allows overlapping of variables among processors. Implementation strategies on the VPP500 are described in detail and results of numerical experiments are reported. 相似文献

18.

A memory‐efficient finite volume method for advection‐diffusion‐reaction systems with nonsmooth sources

下载免费PDF全文

Jonas Schäfer Xuan Huang Stefan Kopecz Philipp Birken Matthias K. Gobbert Andreas Meister 《Numerical Methods for Partial Differential Equations》2015,31(1):143-167

We present a parallel matrix‐free implicit finite volume scheme for the solution of unsteady three‐dimensional advection‐diffusion‐reaction equations with smooth and Dirac‐Delta source terms. The scheme is formally second order in space and a Newton–Krylov method is employed for the appearing nonlinear systems in the implicit time integration. The matrix‐vector product required is hardcoded without any approximations, obtaining a matrix‐free method that needs little storage and is well‐suited for parallel implementation. We describe the matrix‐free implementation of the method in detail and give numerical evidence of its second‐order convergence in the presence of smooth source terms. For nonsmooth source terms, the convergence order drops to one half. Furthermore, we demonstrate the method's applicability for the long‐time simulation of calcium flow in heart cells and show its parallel scaling. © 2014 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq31: 143–167, 2015 相似文献

19.

Real‐time road traffic forecasting using regime‐switching space‐time models and adaptive LASSO

Yiannis Kamarianakis Wei Shen Laura Wynter 《商业与工业应用随机模型》2012,28(4):297-315

Smart transportation technologies require real‐time traffic prediction to be both fast and scalable to full urban networks. We discuss a method that is able to meet this challenge while accounting for nonlinear traffic dynamics and space‐time dependencies of traffic variables. Nonlinearity is taken into account by a union of non‐overlapping linear regimes characterized by a sequence of temporal thresholds. In each regime, for each measurement location, a penalized estimation scheme, namely the adaptive absolute shrinkage and selection operator (LASSO), is implemented to perform model selection and coefficient estimation simultaneously. Both the robust to outliers least absolute deviation estimates and conventional LASSO estimates are considered. The methodology is illustrated on 5‐minute average speed data from three highway networks. Copyright © 2012 John Wiley & Sons, Ltd. 相似文献

20.

An NC Parallel Algorithm for Edge-Coloring Series–Parallel Multigraphs

Xiao Zhou Hitoshi Suzuki Takao Nishizeki 《Journal of Algorithms in Cognition, Informatics and Logic》1997,23(2):359-374

Many combinatorial problems can be efficiently solved in parallel for series–parallel multigraphs. The edge-coloring problem is one of a few combinatorial problems for which no NC parallel algorithm has been obtained for series–parallel multigraphs. This paper gives an NC parallel algorithm for the problem on series–parallel multigraphsG. It takesO(log n) time withO(Δn/log n) processors, wherenis the number of vertices and Δ is the maximum degree ofG. 相似文献