Similar Articles
1.
An optimized MPI+OpenACC implementation of large-eddy simulation that performs efficiently on CPU/GPU systems is presented. The code was validated on wave boundary-layer flows against numerical and experimental data from the literature. A direct solver based on the fast Fourier transform (FFT) was developed for the pressure Poisson equation, taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution on CPUs and is 10 times faster in computational time than a typical iterative preconditioned conjugate gradient solver running on GPUs. For parallel performance, an overlapping strategy was developed to reduce the overhead of MPI communication when using GPUs; as a result, the weak scaling of the algorithm improved by up to 30%. Finally, a large-scale simulation (Re = \(2 \times 10^5\)) on a grid of \(4 \times 10^8\) cells was run and its performance analyzed. The simulation used up to 512 nodes (512 GPUs + 6144 CPU cores) on Piz Daint, one of the top 10 supercomputers in the world at the time of writing. In overall computational time, the GPU version was 4.2 times faster than the CPU version. The parallel efficiency of this strategy (47%) is competitive with state-of-the-art CPU implementations, and the approach has the potential to exploit modern supercomputing capabilities.
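The FFT-based pressure solver relies on periodicity: in Fourier space the Poisson equation decouples into one scalar equation per mode. The sketch below shows that idea in one dimension only, with a hand-written radix-2 FFT. The 1D setting, the function names, and the zero-mean normalization are our assumptions; this is not the authors' 3D MPI/OpenACC solver.

```python
# Illustrative sketch (assumption): 1D periodic Poisson solve u'' = f via FFT,
# not the paper's 3D solver.
import cmath
import math


def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(a):
    """Inverse FFT via conjugation: ifft(a) = conj(fft(conj(a))) / n."""
    n = len(a)
    return [v.conjugate() / n for v in fft([x.conjugate() for x in a])]


def solve_periodic_poisson(f, length=2 * math.pi):
    """Solve u'' = f on a uniform periodic grid; returns the zero-mean solution.

    f must have zero mean, the solvability condition for periodic data.
    """
    n = len(f)
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n
    for k in range(n):
        # FFT index k -> signed mode number m in (-n/2, ..., n/2 - 1)
        m = k if k < n // 2 else k - n
        if m != 0:
            wavenumber = 2 * math.pi * m / length
            # each mode decouples: -wavenumber^2 * u_hat = f_hat
            u_hat[k] = -f_hat[k] / (wavenumber * wavenumber)
        # m == 0 stays 0: the free additive constant is fixed by zero mean
    return [v.real for v in ifft(u_hat)]


# Usage: u = sin(x) satisfies u'' = -sin(x)
n = 64
x = [2 * math.pi * j / n for j in range(n)]
u = solve_periodic_poisson([-math.sin(xj) for xj in x])
print(max(abs(uj - math.sin(xj)) for uj, xj in zip(u, x)))  # round-off level
```

The cost is dominated by the FFTs, O(n log n), with no iteration count to control, which is the kind of advantage over an iterative PCG solver that the abstract reports.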

2.
We propose a new algorithm, the DomEig algorithm, for obtaining the line-sum-symmetric similarity scaling of a given irreducible, essentially nonnegative matrix A. It is based on results concerning the minimum dominant eigenvalue of an essentially nonnegative matrix under trace-preserving perturbations of its diagonal. In this note we relate the minimum dominant eigenvalue problem to the problem of determining the diagonal scaling matrix for line-sum symmetry. We present the DomEig algorithm, prove its convergence, and briefly discuss a comparison with the DSS algorithm, another algorithm often used for line-sum symmetry. The experiments suggest that, for matrices of order greater than 50, the convergence rate, measured either in flop counts or in CPU time, is significantly greater for DomEig than for DSS, and the improvement grows as the order increases. The algorithm may be useful in applications such as the scaling of large social accounting matrices.
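To make the target property concrete, the sketch below computes a line-sum-symmetric similarity scaling D A D^{-1} with a simple coordinate-wise balancing sweep. It is neither DomEig nor necessarily the DSS algorithm; the update rule, function names, and tolerances are our own illustration.

```python
# Illustrative sketch (assumption): coordinate-wise balancing, not DomEig or DSS.
import math


def scaled(a, d):
    """Return D A D^{-1} for D = diag(d)."""
    n = len(a)
    return [[d[i] * a[i][j] / d[j] for j in range(n)] for i in range(n)]


def line_sum_symmetric_scaling(a, tol=1e-12, max_sweeps=10_000):
    """Find positive d so that D A D^{-1} has equal i-th row and column sums.

    a is assumed irreducible with nonnegative off-diagonal entries (n >= 2).
    Diagonal entries are unchanged by the similarity and appear in both sums,
    so only the off-diagonal sums need balancing.
    """
    n = len(a)
    d = [1.0] * n
    for _ in range(max_sweeps):
        balanced = True
        for i in range(n):
            row = sum(d[i] * a[i][j] / d[j] for j in range(n) if j != i)
            col = sum(d[j] * a[j][i] / d[i] for j in range(n) if j != i)
            if abs(row - col) > tol * max(row, col):
                balanced = False
            # scaling d[i] by s multiplies row i by s and column i by 1/s,
            # so s = sqrt(col / row) makes the two off-diagonal sums equal
            d[i] *= math.sqrt(col / row)
        if balanced:
            break
    return d


# A negative diagonal entry is allowed: the matrix is essentially nonnegative.
a = [[0.0, 1.0, 2.0],
     [3.0, -1.0, 1.0],
     [1.0, 4.0, 0.0]]
b = scaled(a, line_sum_symmetric_scaling(a))
for i in range(3):
    print(sum(b[i]), sum(b[j][i] for j in range(3)))  # each pair should agree
```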

3.
We examine the optimal scaling and the efficiency of the pseudo-marginal random walk Metropolis algorithm using a recently derived result on its limiting efficiency as the dimension \(d\rightarrow \infty \). We prove that the optimal scaling for a given target varies by less than 20% across a wide range of distributions for the noise in the estimate of the target, and that any scaling within 20% of the optimal one is at least 70% efficient. We demonstrate that this phenomenon occurs even outside the range of noise distributions for which we rigorously prove it. We then conduct a simulation study on an example with d = 10 in which importance sampling is used to estimate the target density, and we also examine results from an existing simulation study with d = 5 in which a particle filter was used. Our key conclusions also hold in these examples.
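The mechanics of the pseudo-marginal random walk Metropolis algorithm can be sketched as follows. The Gaussian noise model on the log-estimate, the parameter names `ell` and `noise_sd`, and the values used below are our assumptions for illustration; they are not the optimal values derived in the paper.

```python
# Illustrative sketch (assumption): log-normal estimator noise, arbitrary ell.
import math
import random


def pseudo_marginal_rwm(log_target, d, n_iter, ell, noise_sd, seed=0):
    """Pseudo-marginal random walk Metropolis on R^d.

    The exact target is replaced by a noisy estimate whose log carries additive
    Gaussian noise N(-noise_sd^2/2, noise_sd^2), so the estimate is unbiased.
    The proposal is x + (ell / sqrt(d)) * N(0, I), the usual dimension scaling.
    """
    rng = random.Random(seed)

    def log_estimate(x):
        return log_target(x) + rng.gauss(-0.5 * noise_sd ** 2, noise_sd)

    x = [0.0] * d
    log_est_x = log_estimate(x)
    step = ell / math.sqrt(d)
    chain, accepted = [], 0
    for _ in range(n_iter):
        y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        log_est_y = log_estimate(y)
        # Accept with probability min(1, est(y) / est(x)). The estimate at the
        # current point is reused, never refreshed: this keeps the exact target
        # invariant despite the noise.
        if rng.random() < math.exp(min(0.0, log_est_y - log_est_x)):
            x, log_est_x = y, log_est_y
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_iter


def std_normal_log_density(x):
    # unnormalized log density of N(0, I)
    return -0.5 * sum(v * v for v in x)


chain, acc_rate = pseudo_marginal_rwm(std_normal_log_density, d=10, n_iter=5000,
                                      ell=2.0, noise_sd=1.0)
```

Varying `ell` for a fixed `noise_sd` and measuring efficiency is the kind of experiment in which the paper's robustness result (near-optimal efficiency within 20% of the optimal scaling) would show up.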

4.
UNIRANDI is a stochastic local search algorithm that performs line searches from starting points along good random directions. In this paper, we focus on a modified version of this method. In addition to random directions, the new algorithm considers more promising directions in order to speed up the optimization process. Its performance is tested empirically on standard test functions in terms of function evaluations, success rates, error values, and CPU time, and it is compared with the previous version as well as with other local search methods. Numerical results show that the new method is promising in terms of robustness and efficiency.
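The core mechanism, line searches along random directions, can be sketched as below. The step-doubling line search, the step-halving rule on failure, and all names and defaults are our assumptions; the sketch omits the "more promising directions" that the modified method adds.

```python
# Illustrative sketch (assumption): random-direction search in the spirit of
# UNIRANDI; step rules and defaults are ours.
import math
import random


def random_direction_search(f, x0, step=1.0, min_step=1e-8, max_evals=20_000, seed=0):
    """Local search along random directions with a step-doubling line search."""
    rng = random.Random(seed)
    x, fx, evals = list(x0), f(x0), 1
    while step > min_step and evals < max_evals:
        g = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in g)) or 1.0
        u = [v / norm for v in g]  # random unit direction
        improved = False
        for sign in (1.0, -1.0):  # try the direction and its opposite
            y = [xi + sign * step * ui for xi, ui in zip(x, u)]
            fy = f(y)
            evals += 1
            if fy < fx:
                # line search: keep doubling the step while f keeps decreasing
                h = step
                while evals < max_evals:
                    h *= 2.0
                    z = [xi + sign * h * ui for xi, ui in zip(x, u)]
                    fz = f(z)
                    evals += 1
                    if fz >= fy:
                        break
                    y, fy = z, fz
                x, fx, improved = y, fy, True
                break
        if not improved:
            step *= 0.5  # both directions failed: shrink the trial step
    return x, fx, evals


def sphere(x):
    return sum(v * v for v in x)


best_x, best_f, n_evals = random_direction_search(sphere, [3.0, -2.0])
```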

5.
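This paper generalizes the model of Okada & Imaizumi et al. and proposes a new nonmetric multidimensional scaling method for asymmetric dissimilarity matrices. In the model, each object is represented by a point together with a hypersphere in a Minkowski metric space, and the radius of the hypersphere reflects the asymmetry of the corresponding object. An algorithm for computing the point coordinates and the sphere radii is given. It uses an algebraic approach, converges faster than the original method, and saves computation time. Finally, a numerical example is presented.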

6.
This paper introduces a formulation for the Minimum Dominating Cycle Problem and investigates a branch-and-cut algorithm based on that formulation. So far, the algorithm contains no primal heuristics; nevertheless, it solved to proven optimality, in acceptable CPU times, all test instances with up to 120 vertices.
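The problem seeks a minimum cycle C such that every vertex of the graph lies on C or is adjacent to a vertex of C. A feasibility check of this kind is a natural building block, for instance when testing solutions produced by a primal heuristic. The sketch below shows only that check, on a hypothetical adjacency-set representation; it is not the paper's formulation or its branch-and-cut code.

```python
# Illustrative sketch (assumption): feasibility test for a dominating cycle,
# using a hypothetical dict-of-sets graph representation.
def is_dominating_cycle(adj, cycle):
    """Check that `cycle` is a simple cycle of the graph and dominates every vertex.

    adj maps each vertex to the set of its neighbours (undirected graph).
    A vertex is dominated if it lies on the cycle or is adjacent to a cycle vertex.
    """
    if len(cycle) < 3 or len(set(cycle)) != len(cycle):
        return False
    # consecutive vertices (wrapping around) must be joined by edges
    for u, v in zip(cycle, cycle[1:] + cycle[:1]):
        if v not in adj[u]:
            return False
    on_cycle = set(cycle)
    return all(v in on_cycle or adj[v] & on_cycle for v in adj)


# 4-cycle 0-1-2-3 plus pendant vertices 4 (attached to 0) and 5 (attached to 2)
adj = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3, 5}, 3: {0, 2}, 4: {0}, 5: {2}}
print(is_dominating_cycle(adj, [0, 1, 2, 3]))  # True
```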

7.
Parallel preconditioned conjugate gradient algorithm on GPU
We propose a parallel implementation of the preconditioned conjugate gradient algorithm on a GPU platform. The preconditioning matrix is an approximate inverse derived from the SSOR preconditioner. Because it is applied through sparse matrix-vector multiplication, the proposed preconditioner is well suited to the massively parallel GPU architecture. Compared with a CPU implementation of the conjugate gradient algorithm, our GPU preconditioned conjugate gradient implementation is up to 10 times faster (8 times faster at worst).
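The structure of the preconditioned conjugate gradient iteration is sketched below. The paper's preconditioner is an SSOR-derived approximate inverse applied through sparse matrix-vector products; here we substitute a plain Jacobi preconditioner (also applied explicitly, without a triangular solve) and use dense Python lists. Both substitutions are our assumptions.

```python
# Illustrative sketch (assumption): dense PCG with a Jacobi preconditioner
# standing in for the paper's SSOR-based approximate inverse.
import math


def pcg(a, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite matrix a.

    apply_preconditioner(r) must return M^{-1} r for an SPD approximation M of a.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                      # residual b - A x for x = 0
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    for it in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


a = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]


def jacobi(r):
    # M = diag(a): applying M^{-1} is an elementwise product, easy to parallelize
    return [ri / a[i][i] for i, ri in enumerate(r)]


x, iters = pcg(a, b, jacobi)
```

Every step of the loop is a matrix-vector product, a dot product, or a vector update, which is why a preconditioner that is itself applied as a matrix-vector product maps well onto a GPU.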

8.
We approximate the objective function of the fixed charge network flow (FCNF) problem by a piecewise linear one and construct a concave piecewise linear network flow (CPLNF) problem. A proper choice of parameters in the CPLNF problem guarantees the equivalence between the two problems. We propose a heuristic algorithm for the FCNF problem that solves a sequence of CPLNF problems, using the dynamic cost updating procedure (DCUP) to find a solution to each of them. Preliminary numerical experiments show the effectiveness of the proposed algorithm; in particular, it provides a better solution than the dynamic slope scaling procedure in less CPU time. Research was partially supported by NSF and Air Force grants.
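One natural concave piecewise-linear approximation of a fixed-charge arc cost is shown below, assuming a single breakpoint `eps` per arc. We do not claim it is exactly the paper's CPLNF construction, and DCUP itself is not shown.

```python
# Illustrative sketch (assumption): single-breakpoint concave approximation
# of a fixed-charge arc cost.
def fixed_charge_cost(x, c, s):
    """Arc cost in the FCNF problem: linear cost c*x plus fixed charge s if the arc is used."""
    return c * x + s if x > 0 else 0.0


def concave_pl_cost(x, c, s, eps):
    """Concave piecewise-linear approximation with breakpoint eps > 0.

    Slope c + s/eps on [0, eps], then slope c; both pieces equal c*eps + s at eps,
    and the slopes decrease, so the function is concave.
    """
    if x <= eps:
        return (c + s / eps) * x
    return c * x + s


# The approximation matches the fixed-charge cost at 0 and for every x >= eps,
# so it is exact whenever no arc carries a positive flow smaller than eps.
c, s, eps = 2.0, 10.0, 0.5
for x in (0.0, 0.5, 1.0, 3.0):
    assert concave_pl_cost(x, c, s, eps) == fixed_charge_cost(x, c, s)
```

Concavity is what makes the approximating problem attractive: a concave cost network flow problem attains its minimum at an extreme point of the flow polytope, just as the fixed-charge problem does.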

9.
10.
For solving systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, the parallel matrix sweep algorithm, the preconditioned conjugate gradient method, and the square root method are proposed and implemented numerically on multi-core Intel CPUs with NVIDIA graphics processors. The efficiency of the parallel algorithms is investigated and optimized for a problem with quasi-model data.
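The matrix sweep method is a block generalization of the tridiagonal Thomas algorithm. The scalar version is sketched below; in the block version, the divisions become solves with small dense blocks. The argument conventions and names are ours, and the sketch uses no pivoting.

```python
# Illustrative sketch (assumption): scalar Thomas algorithm; the block
# "matrix sweep" replaces each division by a small dense solve.
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored. No pivoting: assumes, e.g., diagonal dominance.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward sweep eliminates the sub-diagonal
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# 1D Laplacian-like system whose exact solution is [1, 2, 3, 4]
x = thomas_solve([0.0, -1.0, -1.0, -1.0], [2.0, 2.0, 2.0, 2.0],
                 [-1.0, -1.0, -1.0, 0.0], [0.0, 0.0, 0.0, 5.0])
```

The forward sweep is inherently sequential, which is why parallel variants, such as the parallel matrix sweep named above, are needed on multi-core and GPU hardware.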

11.
Multidimensional scaling with the city-block norm in the embedding space is considered. Constructing the corresponding algorithm reduces to minimizing a piecewise quadratic function. A two-level algorithm is developed that combines combinatorial minimization at the upper level with local minimization at the lower level. Results of an experimental investigation of the algorithm's efficiency are presented, together with examples of its application to the visualization of multidimensional data.
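Why the objective is piecewise quadratic: within any region of configuration space where the sign of every coordinate difference x_ik - x_jk is fixed, each city-block distance is linear in the coordinates, so the least-squares stress is a quadratic function there. Fixing those sign patterns is the combinatorial choice made at the upper level. The unweighted raw-stress form below is our assumption.

```python
# Illustrative sketch (assumption): unweighted raw stress with city-block distances.
from itertools import combinations


def city_block(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def stress(points, delta):
    """Least-squares stress: sum over pairs i < j of (delta_ij - ||x_i - x_j||_1)^2."""
    return sum((delta[i][j] - city_block(points[i], points[j])) ** 2
               for i, j in combinations(range(len(points)), 2))


points = [(0, 0), (1, 0), (1, 2)]
delta = [[0, 1, 3],
         [1, 0, 2],
         [3, 2, 0]]
print(stress(points, delta))  # 0: the configuration reproduces delta exactly
```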

12.
In this paper, design optimization of aircraft wing structures with multiple frequency constraints is considered. An optimality criterion algorithm along with a scaling procedure is used. Large-scale structural design problems are considered to demonstrate the reliability and efficiency of the algorithm: a simplified fighter wing and an intermediate-complexity wing serve as design examples. Design histories and the first few frequencies of the initial and final designs are presented.

13.
14.
In this paper, we consider a two-machine flowshop scheduling problem in which the waiting time of each job between the two machines cannot exceed a given time limit. For the objective of minimizing makespan, we identify several dominance properties of the problem and develop a branch-and-bound (B&B) algorithm that uses them. Computational tests on randomly generated problems were performed to evaluate the B&B algorithm, and the results show that it can solve problems with up to 150 jobs in a reasonable amount of CPU time.

15.
An algorithm for unconstrained minimization is proposed that is invariant to nonlinear scaling of a strictly convex quadratic function and that generates mutually conjugate directions for extended quadratic functions. It is derived for inexact line searches and is designed for general use. In numerical tests (eight test functions, dimensionality up to 1000), it compares favorably with the 1975 Dixon algorithm on which it is based.

16.
We present a branch and bound algorithm for a two-machine re-entrant flowshop scheduling problem with the objective of minimizing total tardiness. In the re-entrant flowshop considered here, every job is processed twice on each machine, in the order machine 1, machine 2, machine 1, machine 2. By regarding a job as a pair of sub-jobs, each representing one pass through the two machines, we develop dominance properties, a lower bound, and heuristic algorithms for the problem, and use them to build a branch and bound algorithm. To evaluate the algorithms, computational experiments were performed on randomly generated test problems. The results show that the branch and bound algorithm can solve problems with up to 20 sub-jobs in a reasonable amount of CPU time, and that the average percentage gap of the heuristic solutions is about 13%.
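The sub-job view can be made concrete with a routine that evaluates a given sub-job sequence. It assumes each pass is a (machine 1, machine 2) operation pair, that a job's second pass cannot start on machine 1 before its first pass finishes on machine 2, and that operations start as early as possible; the data layout is hypothetical.

```python
# Illustrative sketch (assumption): evaluating a sub-job permutation in a
# two-machine re-entrant flowshop.
def total_tardiness(sequence, p, due):
    """Total tardiness of a sub-job sequence in a two-machine re-entrant flowshop.

    sequence: list of (job, pass) pairs with pass 0 or 1, processed in this order
              on both machines (a permutation schedule of sub-jobs).
    p[job][pass] = (time on machine 1, time on machine 2); due[job] = due date.
    A job's tardiness is measured at the machine-2 completion of its second pass.
    """
    c1 = c2 = 0.0            # last completion times on machines 1 and 2
    first_pass_done = {}     # job -> machine-2 completion time of its first pass
    total = 0.0
    for job, k in sequence:
        t1, t2 = p[job][k]
        if k == 1:
            if job not in first_pass_done:
                raise ValueError(f"second pass of {job!r} scheduled before its first pass")
            ready = first_pass_done[job]   # re-entry waits for the first pass
        else:
            ready = 0.0
        c1 = max(c1, ready) + t1
        c2 = max(c2, c1) + t2
        if k == 0:
            first_pass_done[job] = c2
        else:
            total += max(0.0, c2 - due[job])
    return total


p = {"A": [(1, 2), (1, 1)], "B": [(2, 1), (1, 2)]}
seq = [("A", 0), ("B", 0), ("A", 1), ("B", 1)]
print(total_tardiness(seq, p, {"A": 4, "B": 6}))  # 2.0
```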

17.
An interval method for bounding level sets is presented, modified to increase its efficiency and to obtain sharper bounding boxes. The new algorithm was tested on standard global optimization test problems. The results show that, while the modified method gives a more valuable result set with guaranteed reliability, it is competitive with non-interval methods in terms of CPU time and number of function evaluations. This work was supported by Grant OTKA 1074/1987, and in part by DAAD Fellowship No. 314/108/004/8 during the author's stay at Düsseldorf University.
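The basic idea of bounding a level set {x : f(x) <= c} with interval arithmetic is sketched below in one dimension. The example function, names, and tolerance are our assumptions, and the sketch does not round outward, so unlike a real interval library its enclosures are not rigorous under floating-point rounding.

```python
# Illustrative sketch (assumption): 1D interval bisection for a level set of
# f(x) = x**2 - 2, without outward rounding.
import math


def square_interval(lo, hi):
    """Tight interval enclosure of {x*x : lo <= x <= hi}."""
    if lo >= 0.0:
        return lo * lo, hi * hi
    if hi <= 0.0:
        return hi * hi, lo * lo
    return 0.0, max(lo * lo, hi * hi)


def f_interval(lo, hi):
    """Interval extension of the example function f(x) = x**2 - 2."""
    s_lo, s_hi = square_interval(lo, hi)
    return s_lo - 2.0, s_hi - 2.0


def bound_level_set(f_int, lo, hi, level, eps=1e-3):
    """Cover {x in [lo, hi] : f(x) <= level} by boxes using interval bisection.

    Returns (inner, boundary): inner boxes lie entirely in the level set;
    boundary boxes are narrower than eps and remain undecided.
    """
    inner, boundary = [], []
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        f_lo, f_hi = f_int(a, b)
        if f_lo > level:
            continue                      # box certainly outside the level set
        if f_hi <= level:
            inner.append((a, b))          # box certainly inside
        elif b - a < eps:
            boundary.append((a, b))       # undecided but small enough
        else:
            mid = 0.5 * (a + b)
            stack.extend([(a, mid), (mid, b)])
    return inner, boundary


inner, boundary = bound_level_set(f_interval, -3.0, 3.0, 0.0)
boxes = inner + boundary
hull = (min(a for a, _ in boxes), max(b for _, b in boxes))
print(hull, (-math.sqrt(2), math.sqrt(2)))  # hull encloses the exact level set
```

Sharper interval extensions discard more boxes early, which is the lever the abstract's "sharper bounding boxes" refers to.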

18.
We consider the problem of scheduling n independent jobs on m unrelated parallel machines with the objective of minimizing total tardiness. In unrelated parallel-machine scheduling, the processing time of a job may differ from machine to machine. We develop several dominance properties and lower bounds for the problem and propose a branch and bound algorithm that uses them. Computational experiments show that the algorithm finds optimal solutions for problems with up to five machines and 20 jobs in a reasonable amount of CPU time.

19.
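Prestack reverse-time migration (RTM) can accurately image complex subsurface structures. The acoustic wave equation is solved with a high-order finite-difference method, and a way of choosing sampling intervals that satisfy the stability condition is given. The prestack RTM algorithm for seismic data is implemented with GPU/CPU acceleration, which greatly improves computational efficiency; the algorithm also uses random boundary conditions, which saves a large amount of storage. The influence of changes in the velocity model on the imaging results is analyzed. Imaging tests on complex seismic data show that the RTM algorithm images steeply dipping structures clearly and also performs well on salt-dome boundaries and internal structures.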
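For the stability-limited choice of the time step, a standard von Neumann bound for the leapfrog scheme with 2M-th-order central differences can be computed as below. It assumes equal grid spacing h in all dimensions and the standard central-difference coefficients; the paper's exact stability criterion may differ.

```python
# Illustrative sketch (assumption): von Neumann time-step bound for a
# second-order-in-time, 2M-th-order-in-space acoustic scheme on a uniform grid.
import math


def second_derivative_coeffs(m_half):
    """Central-difference weights c_0..c_M for d2/dx2, accuracy order 2M, unit spacing."""
    M = m_half
    c = [0.0] * (M + 1)
    for m in range(1, M + 1):
        c[m] = 2.0 * (-1) ** (m + 1) * math.factorial(M) ** 2 / (
            m * m * math.factorial(M - m) * math.factorial(M + m))
    c[0] = -2.0 * sum(c[1:])  # weights of a second-derivative stencil sum to zero
    return c


def max_stable_dt(v_max, h, dims, m_half):
    """Largest dt with (v_max*dt)^2 * dims * symbol / h^2 <= 4 (leapfrog stability)."""
    c = second_derivative_coeffs(m_half)
    # magnitude of the 1D stencil symbol at the Nyquist wavenumber k*h = pi,
    # where it peaks for these standard coefficients
    symbol = abs(c[0] + 2.0 * sum(c[m] * (-1) ** m for m in range(1, m_half + 1)))
    return 2.0 * h / (v_max * math.sqrt(dims * symbol))


# 2D, unit velocity and spacing: 2nd order gives 1/sqrt(2), 4th order sqrt(3/8)
print(max_stable_dt(1.0, 1.0, 2, 1), max_stable_dt(1.0, 1.0, 2, 2))
```

Higher spatial order reduces numerical dispersion but tightens this time-step limit slightly, which is the trade-off behind choosing the sampling intervals.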

20.
A two-level global optimization algorithm for multidimensional scaling (MDS) with the city-block metric is proposed, exploiting the piecewise quadratic structure of the objective function. At the upper level, a combinatorial global optimization problem is solved by branch and bound, where the objective function is defined as the minimum of a quadratic programming problem. The latter is solved at the lower level by a standard quadratic programming algorithm. The proposed algorithm has been applied to auxiliary and practical problems whose global optimization counterparts had dimensionality up to 24.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417