Similar literature
 Found 20 similar documents; search took 31 ms
1.
In this paper, a parallel implementation of Wang's method for solving tridiagonal systems of equations on a multiprocessor machine using the occam language is presented. The parallel algorithm has been designed for shared- and distributed-memory machines that support data parallelism and message passing. The overall performance of this implementation on each of 9 processors is given. Communication times are very important, and any improvement in communication would have a significant effect on the performance of the implementation. The significance of these results is discussed.
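As a point of reference for the tridiagonal solve this abstract refers to, the following is a minimal serial sketch using the Thomas algorithm. It is not Wang's parallel method and not the authors' occam code; the function name `solve_tridiagonal` and the array layout are assumptions made for illustration, and the sketch assumes a matrix that needs no pivoting (for example, a diagonally dominant one).

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    Hypothetical helper: assumes no pivoting is required.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal after elimination
    d = [0.0] * n  # modified right-hand side after elimination

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: recover the unknowns from last to first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep is inherently sequential, which is the dependency that partition-style parallel methods are designed to break up.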

2.
We present the Multiscale Coupling Library and Environment: MUSCLE 2. This multiscale component-based execution environment has a simple-to-use Java, C++, C, Python and Fortran API, compatible with MPI, OpenMP and threading codes. We demonstrate its local and distributed computing capabilities and compare its performance to MUSCLE 1, file copy, MPI, MPWide, and GridFTP. The local throughput of MPI is about two times higher, so very tightly coupled code should use MPI as a single submodel of MUSCLE 2; the distributed performance of GridFTP is lower, especially for small messages. We test the performance of a canal system model with MUSCLE 2, where it introduces an overhead as small as 5% compared to MPI.

3.
This paper studies the performance of some computational models of filtration combustion of gases on multi-core computers. The analysis is restricted to models based on explicit difference schemes. In particular, an explicit two-level parallel algorithm with an adaptive mesh is constructed. Two shared-memory parallelization methods are applied: the straightforward application of OpenMP directives and a special distribution of data among the threads. The simulation shows that the latter method has a substantial performance advantage.

4.
The High Performance Fortran (HPF) language and the Message Passing Interface (MPI) are two widely used methods to achieve parallelism on today's clusters and multiprocessor supercomputers. HPF is a distinct language providing extensions to Fortran 90/95 to express parallel execution paths and regions. MPI is a library of communication calls that can be inserted into modern high-level languages (C and Fortran). This paper discusses the use of the two approaches in a parallel finite element application for liquid composite manufacturing process modeling. The unstructured nature of the code provides an excellent opportunity to test both the computation and communication effectiveness of the two approaches. We discuss performance results based on implementations conducted on a modern massively parallel computing platform with a highly tuned processor interconnection network.

5.
Swirling jets undergoing vortex breakdown occur in many technical applications, e.g. vortex burners, turbines and jet engines. To simulate the highly nonlinear dynamics of the flow, it is necessary to use high-order numerical methods, leading to increased computational cost. To be able to perform simulations in an acceptable turnaround time, an available LES code for solving the filtered compressible Navier-Stokes equations in cylindrical coordinates using compact finite-difference schemes was parallelized for massively parallel architectures. The parallelization was done following the ghost-cell approach for filtering in the three spatial directions. The inter-process communication is handled using the Message Passing Interface (MPI). The weak and strong scaling properties of the code indicate that it can be used for massively parallel simulations using several thousand processors. (© 2010 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)

6.
The cellular automaton model of computation has drawn the interest of researchers from different disciplines including computer science, biology, mathematics, economics, biochemistry and philosophy. Although a cellular automaton is based on a set of simple rules, over time complex patterns may evolve. We present computational methods for implementing and optimizing a well-known two-state cellular automaton, Conway's Game of Life, on a 16-core Intel Xeon. The evaluation is based on three multicore algorithms. The first algorithm is coherent and utilizes shared memory and barrier synchronization. The remaining two algorithms are distributed and utilize private memories and explicit core-to-core message passing. We provide a link to our open source simulation software.
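For readers unfamiliar with the automaton named in this abstract, the following is a minimal serial sketch of one Game of Life generation. It is not the authors' multicore software; the function name `life_step` and the choice of an unbounded grid stored as a set of live cells are assumptions made for illustration.

```python
from collections import Counter


def life_step(live):
    """Return the next generation from a set of live (row, col) cells.

    Hypothetical helper: the grid is unbounded and only live cells are stored.
    """
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next step with exactly 3 live neighbours,
    # or with exactly 2 if it is already alive.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }
```

Each cell's next state depends only on its eight neighbours in the current generation, which is why the grid can be partitioned across cores with synchronization or message exchange only at partition boundaries.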

7.
This paper discusses peculiarities in the communication behavior of modern computational clusters with a large number of processing elements during message transfer. The authors propose an approach to measuring communication characteristics based on synthetic MPI tests. Visualization tools and an algorithm for clustering the output data of MPI tests are proposed for analyzing these peculiarities. The developed toolkit makes it possible to uncover such peculiarities as the effect of external disturbances on the data transfer rate, the topological structure of communications, and incorrect operation of the computational cluster units. Data on the Chebyshev SKIF-60, Lomonosov SKIF T500+, MVS-100K, and BlueGene/P clusters are given.

8.
General sparse hybrid solvers are commonly used kernels for solving a wide range of scientific and engineering problems. This work addresses the current problems of efficiently solving general sparse linear equations with direct/iterative hybrid solvers on many-core distributed clusters. We briefly discuss the solution stages of the Maphys, HIPS, and PDSLin hybrid solvers for large sparse linear systems, together with their major algorithmic differences. In this category of solvers, different methods with sophisticated preconditioning algorithms are suggested to resolve the trade-off between memory and convergence. Such solutions require a certain hierarchical level of parallelism that is more suitable for modern supercomputers and allows scaling to thousands of processors using the Schur complement framework. We study the effect of reordering and analyze the performance, scalability, and memory usage of each solve phase of the PDSLin, Maphys, and HIPS hybrid solvers using a large set of challenging matrices arising from different actual applications, and compare the results with the SuperLU_DIST direct solver. We specifically focus on the level of parallel mechanisms used by the hybrid solvers and the effect on scalability. Tuning and Analysis Utilities (TAU) is employed to profile heap memory usage and to measure communication volume. The tests are run on high-performance large-memory clusters using up to 512 processors.

9.
A method for the simultaneous solution of large and sparse linearized equation sets and the corresponding eigenvalue problems is presented. Such problems arise from the discretization and solution of nonlinear problems with the finite element method and Newton iteration. The method is based on a parallel version of GMRES(m) preconditioned by deflation. The parallel code exploits the architecture of computational clusters using MPI (Message Passing Interface). The convergence rate, the parallel speedup and the memory requirements of the proposed method are reported and evaluated.

10.
This paper deals with an NP-hard string problem from the bioinformatics field: the repetition-free longest common subsequence problem. This problem has attracted increasing interest in recent years, which has resulted in the application of several pure as well as hybrid metaheuristics. However, the literature lacks a comprehensive comparison between those approaches. Moreover, it has been shown that general-purpose integer linear programming solvers are very efficient for solving many of the problem instances that have been used so far in the literature. Therefore, in this work we extend the available benchmark set, adding larger instances to which integer linear programming solvers can no longer be applied. Moreover, we provide a comprehensive comparison of the approaches found in the literature. Based on the results we propose a hybrid between two of the best methods, which turns out to inherit the complementary strengths of both methods.

11.
In this study we introduce strategies for a load-balanced parallelization of sparse matrix computations on a cluster of PCs with minimum communication overhead. Based on these strategies, a parallel sparse Conjugate Gradient algorithm for CFD computations is developed. The proposed parallel algorithm is implemented on Anu-cluster, a cluster of eight PCs, under the ANULIB message-passing environment. The parallel sparse code is tested on both linear and non-linear problems and found to give good performance. Results are compared with those from dense matrix computations.

12.
This paper develops a technique for numerically solving hybrid optimal control problems. The theoretical foundation of the approach is a recently developed methodology by S.C. Bengea and R.A. DeCarlo [Optimal control of switching systems, Automatica. A Journal of IFAC 41 (1) (2005) 11–27] for solving switched optimal control problems through embedding. The methodology is extended to incorporate hybrid behavior stemming from autonomous (uncontrolled) switches that results in plant equations with piecewise smooth vector fields. We demonstrate that when the system has no memory, the embedding technique can be used to reduce the hybrid optimal control problem for such systems to the traditional one. In particular, we show that the solution methodology does not require mixed integer programming (MIP) methods, but rather can utilize traditional nonlinear programming techniques such as sequential quadratic programming (SQP). By dramatically reducing the computational complexity over existing approaches, the proposed techniques make optimal control highly appealing for hybrid systems. This appeal is concretely demonstrated in an exhaustive application to a unicycle model that contains both autonomous and controlled switches; optimal and model predictive control solutions are given for two types of models using both a minimum energy and minimum time performance index. Controller performance is evaluated in the presence of a step frictional disturbance and parameter uncertainties which demonstrates the robustness of the controllers.

13.
StarSs is a task-based programming model that allows sequential applications to be parallelized by annotating the code with compiler directives. The model further supports transparent execution of designated tasks on heterogeneous platforms, including clusters of GPUs. This paper focuses on the methodology and tools that complement the programming model, forming a consistent development environment with the objective of simplifying the life of application developers. The programming environment includes the tools TAREADOR and TEMANEJO, which have been designed specifically for StarSs. TAREADOR, a Valgrind-based tool, allows a top-down development approach by assisting the programmer in identifying tasks and their data dependencies across all concurrency levels of an application. TEMANEJO is a graphical debugger that supports the programmer by visualizing the task dependency tree and by allowing task scheduling or dependencies to be manipulated. These tools are complemented with a set of performance analysis tools (Scalasca, Cube and Paraver) that enable fine-tuning of StarSs applications.

14.
The inherent structure of cellular automata is trivially parallelizable and can directly benefit from massively parallel machines in computationally intensive problems. This paper presents both block synchronous and block pipeline (with asynchronous message passing) parallel implementations of cellular automata on distributed memory (message-passing) architectures. A structural design problem is considered to study the performance of the various cellular automata implementations. The synchronous parallel implementation is a mixture of Jacobi and Gauss–Seidel style iteration, and it becomes more Jacobi-like as the number of processors increases. Therefore, as the number of processors increases, it exhibits divergence because of the mathematical characteristics of the Jacobi iteration matrix for the structural problem. The proposed pipeline implementation preserves convergence by simulating a pure Gauss–Seidel style row-wise iteration. Numerical results for analysis and design of a cantilever plate made of composite material show that the pipeline update scheme is convergent and successfully generates optimal designs.
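The distinction between the two iteration styles mentioned in this abstract can be sketched on a small dense linear system. This is a generic illustration, not the authors' cellular automaton code; the function names `jacobi_sweep` and `gauss_seidel_sweep` are assumptions. A Jacobi sweep computes every new value from the previous iterate only, while a Gauss–Seidel sweep reuses values already updated earlier in the same sweep.

```python
def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every new entry uses only the old iterate x."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep: entries updated earlier in the sweep are reused."""
    x = list(x)  # work on a copy so the caller's iterate is untouched
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x
```

On a diagonally dominant system such as `A = [[4, 1], [1, 3]]`, `b = [1, 2]` both sweeps converge, so this sketch shows only the difference in data dependencies. The divergence reported in the abstract is a property of the Jacobi iteration matrix of the authors' structural problem and is not reproduced here.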

15.
The problem of multidimensional scaling with city-block distances in the embedding space is reduced to a two-level optimization problem consisting of a combinatorial problem at the upper level and a quadratic programming problem at the lower level. A hybrid method is proposed that combines randomized search for the upper-level problem with a standard quadratic programming algorithm for the lower-level problem. Several algorithms for the combinatorial problem have been tested, and an evolutionary global search algorithm has proved most suitable. An experimental code of the proposed hybrid multidimensional scaling algorithm is developed and tested using several test problems of two- and three-dimensional scaling.

16.
We study the dynamic admission control for a finite shared buffer with support of multiclass traffic under Markovian assumptions. The problem is often referred to as buffer sharing in the literature. From the linear programming (LP) formulation of the continuous-time Markov decision process (MDP), we construct a hierarchy of increasingly stronger LP relaxations where the hierarchy levels equal the number of job classes. Each relaxation in the hierarchy is obtained by projecting the original achievable performance region onto a polytope of simpler structure. We propose a heuristic policy for admission control, which is based on the theory of Marginal Productivity Index (MPI) and the Lagrangian decomposition of the first order LP relaxation. The dual of the relaxed buffer space constraint in the first order LP relaxation is used as a proxy to the cost of buffer space. Given that each of the decomposed queueing admission control problems satisfies the indexability condition, the proposed heuristic accepts a new arrival if there is enough buffer space left and the MPI of the current job class is greater than the incurred cost of buffer usage. Our numerical examples for the cases of two and eight job classes show the near-optimal performance of the proposed MPI heuristic.
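The acceptance rule stated in this abstract can be written as a one-line predicate. The sketch below is only an illustration of that rule: the function name `admit` and all parameter names are assumptions, and how the index value and the buffer cost are actually computed (from the LP relaxation and its dual) is not shown here.

```python
def admit(free_space, required_space, index_value, buffer_cost):
    """Hypothetical admission test for one arriving job.

    free_space:     buffer space currently available
    required_space: buffer space the arriving job needs
    index_value:    Marginal Productivity Index of the job's class (assumed given)
    buffer_cost:    proxy cost of buffer usage (assumed given)
    """
    # Accept only if the job fits and its index strictly exceeds the cost.
    return free_space >= required_space and index_value > buffer_cost
```

For example, `admit(3, 1, 2.0, 1.5)` accepts the job, while `admit(3, 1, 1.0, 1.5)` rejects it because the index does not exceed the buffer cost.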

17.
We apply to fixed charge network flow (FCNF) problems a general hybrid solution method that combines constraint programming and linear programming. FCNF problems test the hybrid approach on problems that are already rather well suited for a classical 0–1 model. They are solved by means of a global constraint that generates specialized constraint propagation algorithms and a projected relaxation that can be rapidly solved as a minimum cost network flow problem. The hybrid approach ran about twice as fast as a commercial mixed integer programming code on fixed charge transportation problems with its advantage increasing with problem size. For general fixed charge transshipment problems, however, it has no effect because the implemented propagation methods are weak.

18.
We consider the problem of canonical labeling in anonymous directed split-stars. This paper proposes a distributed algorithm for finding the vertex sets with specified leading symbols in directed split-stars, with linear message complexity and constant time complexity. The algorithm runs on an asynchronous timing model without shared memory. In addition, our algorithm generalizes the previous distributed algorithms on directed split-stars known to us.

19.
The finite difference method is an important methodology in the approximation of waves. In this paper, we study two implicit finite difference schemes for the simulation of waves: the weighted alternating direction implicit (ADI) scheme and the locally one-dimensional (LOD) scheme. The approximation errors, stability conditions, and dispersion relations for both schemes are investigated. Our analysis shows that the LOD implicit scheme has less dispersion error than the ADI scheme. Moreover, the unconditional stability of both schemes with arbitrary spatial accuracy is established for the first time. In order to improve computational efficiency, numerical algorithms based on the Message Passing Interface (MPI) are implemented. Numerical examples of wave propagation in a three-layer model and a standard complex model are presented. Our analysis and comparisons show that both ADI and LOD schemes are able to efficiently and accurately simulate wave propagation in complex media.

20.
We describe a new parallel implementation, mplrs, of the vertex enumeration code lrs that uses the MPI parallel environment and can be run on a network of computers. The implementation makes use of a C wrapper that essentially uses the existing lrs code with only minor modifications. mplrs was derived from the earlier parallel implementation plrs, written by G. Roumanis in C++, which runs on a shared memory machine. By improving load balancing we are able to greatly improve performance for medium to large scale parallelization of lrs. We report computational results comparing parallel and sequential codes for vertex/facet enumeration problems for convex polyhedra. The problems chosen span the range from simple to highly degenerate polytopes. For most problems tested, the results clearly show the advantage of using the parallel implementation mplrs of the reverse search based code lrs, even when as few as 8 cores are available. For some problems almost linear speedup was observed up to 1200 cores, the largest number of cores tested. The software that was reviewed as part of this submission is included in lrslib-062.tar.gz which has MD5 hash be5da7b3b90cc2be628dcade90c5d1b9.
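Since this abstract quotes an MD5 hash for the reviewed archive, a downloaded copy could be checked along the following lines. This is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_hex` and `matches_expected` are assumptions, and the expected value is copied from the abstract.

```python
import hashlib
import io

# MD5 hash quoted in the abstract for lrslib-062.tar.gz.
EXPECTED_MD5 = "be5da7b3b90cc2be628dcade90c5d1b9"


def md5_hex(stream, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream, expected=EXPECTED_MD5):
    """Hypothetical check: does the stream's MD5 digest equal the expected hash?"""
    return md5_hex(stream) == expected.lower()
```

Assuming the archive has been saved locally under the file name given in the abstract, it can be opened with `open("lrslib-062.tar.gz", "rb")` and the resulting file object passed to `matches_expected`. MD5 detects accidental corruption but is not a safeguard against deliberate tampering.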
