Similar Documents
20 similar documents found (search time: 46 ms)
1.
To investigate the compression of digital medical CT images, 2,186 clinically significant digital lung CT images were selected and compressed without visual loss using the wavelet transform, and the compression results were statistically analyzed and evaluated. In a blinded visual assessment by lung-imaging experts, the agreement rate between the compressed images and the pathological diagnoses was 91.89% (95% CI: 79%-98%), and the difference in overall scores before and after compression was not statistically significant. The wavelet transform thus achieves effective visually lossless compression of digital lung CT images.
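A minimal sketch of the technique, using the PyWavelets library; the wavelet family, decomposition depth, and the fraction of coefficients kept are illustrative assumptions, not the study's parameters, and the entropy coding of the sparse coefficients is omitted:

```python
# Visually lossless wavelet compression sketch (parameters assumed).
import numpy as np
import pywt

def compress_slice(img, wavelet="bior4.4", level=3, keep=0.05):
    """Zero all but the largest `keep` fraction of wavelet coefficients."""
    coeffs = pywt.wavedec2(img.astype(np.float32), wavelet, level=level)
    arr, slices = pywt.coeffs_to_array(coeffs)
    thresh = np.quantile(np.abs(arr), 1.0 - keep)  # magnitude cutoff
    arr[np.abs(arr) < thresh] = 0.0                # sparsify: the source of bit savings
    coeffs = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet)

img = np.random.rand(512, 512)                     # stand-in for a CT slice
rec = compress_slice(img)[:512, :512]
psnr = 10 * np.log10(1.0 / np.mean((img - rec) ** 2))
```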

2.
Numerical methods related to Krylov subspaces are widely used in large sparse numerical linear algebra. Vectors in these subspaces are manipulated via their representation onto orthonormal bases. Nowadays, on serial computers, the method of Arnoldi is considered as a reliable technique for constructing such bases. However, although easily parallelizable, this technique is not as scalable as expected, owing to its communication requirements. In this work we examine alternative methods aimed at overcoming this drawback. Since they retrieve upon completion the same information as Arnoldi's algorithm does, they enable us to design a wide family of stable and scalable Krylov approximation methods for various parallel environments. We present timing results obtained from their implementation on two distributed-memory multiprocessor supercomputers: the Intel Paragon and the IBM Scalable POWERparallel SP2. © 1997 John Wiley & Sons, Ltd.
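For reference, a minimal NumPy sketch of the Arnoldi process that this entry takes as the serial baseline; the inner sequence of dot products is exactly the communication bottleneck discussed:

```python
# Arnoldi iteration: orthonormal basis of span{b, Ab, A^2 b, ...}
# via modified Gram-Schmidt.
import numpy as np

def arnoldi(A, b, k):
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):            # dot products: the scalability bottleneck
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:           # happy breakdown: invariant subspace found
            return Q[:, :j + 1], H[:j + 2, :j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H                           # satisfies A @ Q[:, :k] == Q @ H

A = np.random.rand(100, 100)
Q, H = arnoldi(A, np.random.rand(100), 10)
```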

3.
The ability of the modern graphics processors to operate on large matrices in parallel can be exploited for solving constrained image deblurring problems in a short time. In particular, in this paper we propose the parallel implementation of two iterative regularization methods: the well known expectation maximization algorithm and a recent scaled gradient projection method. The main differences between the considered approaches and their impact on the parallel implementations are discussed. The effectiveness of the parallel schemes and the speedups over standard CPU implementations are evaluated on test problems arising from astronomical images.
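For Poisson image data, the expectation-maximization iteration mentioned above is the Richardson-Lucy update; a minimal CPU sketch with an assumed uniform blur kernel (the GPU versions parallelize exactly these convolutions and elementwise operations):

```python
# Richardson-Lucy (EM) deblurring sketch; kernel and iteration count assumed.
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(y, psf, n_iter=50, eps=1e-12):
    x = np.full_like(y, y.mean())                # flat nonnegative start
    psf_flip = psf[::-1, ::-1]                   # adjoint of the blur operator
    for _ in range(n_iter):
        hx = fftconvolve(x, psf, mode="same")
        ratio = y / np.maximum(hx, eps)          # data / model prediction
        x *= fftconvolve(ratio, psf_flip, mode="same")  # multiplicative EM step
    return x

psf = np.ones((5, 5)) / 25.0                     # assumed blur kernel
y = fftconvolve(np.random.rand(64, 64), psf, mode="same")
x = richardson_lucy(y, psf)
```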

4.
We present a method for compressing binary images via monochromatic pattern substitution. The method loses essentially no compression effectiveness if the image is partitioned into up to approximately a thousand blocks and each block is compressed independently; it can therefore be implemented on a distributed system with no interprocessor communication. In the theoretical context of unbounded parallelism, interprocessor communication is needed. Compression effectiveness then has a bell-shaped behaviour, which again becomes competitive with the sequential performance when the highest degree of parallelism is reached. Finally, the method achieves a speed-up when applied sequentially to an image partitioned into up to 256 blocks, and this speed-up carries over to a parallel implementation on a small-scale system.
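A minimal sketch of the communication-free scheme: partition the bitmap into blocks and compress each block independently in a worker pool. Here zlib stands in for the paper's monochromatic pattern substitution coder, an assumption made only to keep the sketch self-contained:

```python
# Independent per-block compression; zlib is a stand-in coder (assumption).
import zlib
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def blocks(img, bs):
    for i in range(0, img.shape[0], bs):
        for j in range(0, img.shape[1], bs):
            yield img[i:i + bs, j:j + bs]

def compress_block(block):
    return zlib.compress(np.packbits(block).tobytes())

if __name__ == "__main__":
    img = (np.random.rand(1024, 1024) > 0.9).astype(np.uint8)  # binary image
    with ProcessPoolExecutor() as pool:          # no inter-worker communication
        parts = list(pool.map(compress_block, blocks(img, 64)))
    ratio = (img.size / 8) / sum(len(p) for p in parts)
```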

5.
We describe an O((log n)^2)-time parallel algorithm, using n processors, for finding the lexicographically first depth-first search tree in the random graph G(n, p) with p fixed. The problem itself is P-complete, and so is unlikely to be efficiently parallelizable in general.
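For context, the sequential rule the parallel algorithm reproduces: a depth-first search that always descends to the lowest-numbered unvisited neighbor, the tie-breaking that makes the output tree P-complete to compute:

```python
# Lexicographically first DFS tree: always visit the smallest unvisited neighbor.
def lex_first_dfs_tree(adj, root=0):
    """adj: dict mapping vertex -> sorted list of neighbors."""
    parent = {root: None}
    stack = [root]
    while stack:
        u = stack[-1]
        for v in adj[u]:                  # neighbors in increasing order
            if v not in parent:
                parent[v] = u             # tree edge (u, v)
                stack.append(v)
                break
        else:
            stack.pop()                   # every neighbor visited: backtrack
    return parent

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(lex_first_dfs_tree(adj))            # {0: None, 1: 0, 2: 1, 3: 2}
```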

6.
Extreme-scale simulation requires fast and scalable algorithms, such as multigrid methods. To achieve asymptotically optimal complexity, it is essential to employ a hierarchy of grids. The cost of solving the coarsest-grid system can often be neglected in sequential computations, but cannot be ignored in massively parallel executions, where the coarsest grid can be large and its efficient solution becomes a challenging task. We propose solving the coarse-grid system using modern, approximate sparse direct methods and investigate the expected gains compared with traditional iterative methods. Since the coarse-grid system only requires an approximate solution, we show that we can leverage block low-rank techniques, combined with the use of single-precision arithmetic, to significantly reduce the computational requirements of the direct solver. At extreme scale, the coarse-grid system is too large for a sequential solution, but too small to permit massively parallel efficiency. We show that agglomeration of the coarse-grid system onto a subset of processors is necessary for the sparse direct solver to achieve performance. We demonstrate the efficiency of the proposed method on a Stokes-type saddle-point system solved with a monolithic Uzawa multigrid method. In particular, we show that an approximate sparse direct solver for the coarse-grid system can outperform a preconditioned minimal residual iterative method. This is demonstrated for the multigrid solution of systems of order up to 10^11 degrees of freedom on a petascale supercomputer using 43,200 processes.
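A minimal two-grid sketch of the design point: the coarse system is factorized once by a sparse direct solver and reused every cycle, with single precision standing in for the approximate solve (block low-rank compression is beyond a sketch; the 1-D Poisson problem is illustrative):

```python
# Two-grid cycle with a direct coarse solve (1-D Poisson stand-in).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 2**10 - 1
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csc")
nc = (n - 1) // 2
P = sp.lil_matrix((n, nc))                    # linear-interpolation prolongator
for j in range(nc):
    P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
P = P.tocsc()
Ac = (P.T @ A @ P).astype(np.float32)         # Galerkin coarse operator, single precision
coarse_lu = splu(Ac.tocsc())                  # factorize once, reuse every cycle

def two_grid(b, x, nu=2, omega=2.0 / 3.0):
    for _ in range(nu):                       # pre-smoothing (weighted Jacobi, D = 2I)
        x = x + omega * (b - A @ x) / 2.0
    r = P.T @ (b - A @ x)                     # restricted residual
    x = x + P @ coarse_lu.solve(r.astype(np.float32)).astype(np.float64)
    for _ in range(nu):                       # post-smoothing
        x = x + omega * (b - A @ x) / 2.0
    return x

b, x = np.ones(n), np.zeros(n)
for _ in range(20):
    x = two_grid(b, x)
print(np.linalg.norm(b - A @ x))
```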

7.
The inherent structure of cellular automata is trivially parallelizable and can directly benefit from massively parallel machines in computationally intensive problems. This paper presents both block-synchronous and block-pipeline (with asynchronous message passing) parallel implementations of cellular automata on distributed-memory (message-passing) architectures. A structural design problem is considered to study the performance of the various cellular automata implementations. The synchronous parallel implementation is a mixture of Jacobi- and Gauss–Seidel-style iteration, becoming more Jacobi-like as the number of processors increases. It therefore exhibits divergence, because of the mathematical characteristics of the Jacobi iteration matrix for the structural problem, as the number of processors increases. The proposed pipeline implementation preserves convergence by simulating a pure Gauss–Seidel-style row-wise iteration. Numerical results for the analysis and design of a cantilever plate made of composite material show that the pipeline update scheme is convergent and successfully generates optimal designs.
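A minimal sketch of the two update disciplines at issue: a Jacobi-style sweep reads only the previous generation, while a Gauss–Seidel-style row-wise sweep consumes values already updated in the same pass (the behavior the pipeline implementation preserves); the local averaging rule is illustrative:

```python
# Jacobi vs. Gauss-Seidel style cell updates on a 2-D grid (toy local rule).
import numpy as np

def rule(c, n, s, w, e):
    return 0.2 * (c + n + s + w + e)          # illustrative averaging rule

def jacobi_sweep(u):
    v = u.copy()                              # every read is from the old generation
    v[1:-1, 1:-1] = rule(u[1:-1, 1:-1], u[:-2, 1:-1],
                         u[2:, 1:-1], u[1:-1, :-2], u[1:-1, 2:])
    return v

def gauss_seidel_sweep(u):
    for i in range(1, u.shape[0] - 1):        # row-wise, in place: rows above
        for j in range(1, u.shape[1] - 1):    # contribute freshly updated values
            u[i, j] = rule(u[i, j], u[i - 1, j], u[i + 1, j],
                           u[i, j - 1], u[i, j + 1])
    return u
```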

8.
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

9.
We present a case study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multicore processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modeling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. This article has supplementary material online.
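A minimal sketch of why the population-based methods in entries 8 and 9 map so well onto many-core hardware: every chain's proposal, density evaluation, and accept/reject step is the same arithmetic applied elementwise across the population. NumPy vectorization stands in for the GPU kernels; the target and tuning are illustrative:

```python
# Many parallel Metropolis chains advanced in lockstep (vectorized).
import numpy as np

def log_target(x):
    return -0.5 * x**2                        # standard normal, illustrative

rng = np.random.default_rng(0)
n_chains, n_steps, step = 10_000, 1_000, 0.5
x = rng.normal(size=n_chains)                 # one state per chain
lp = log_target(x)
for _ in range(n_steps):
    prop = x + step * rng.normal(size=n_chains)    # all proposals at once
    lp_prop = log_target(prop)
    accept = np.log(rng.random(n_chains)) < lp_prop - lp
    x = np.where(accept, prop, x)             # elementwise accept/reject
    lp = np.where(accept, lp_prop, lp)
print(x.mean(), x.std())                      # should be near 0 and 1
```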

10.
An algorithm is proposed for the restructuring of arithmetic expressions involving unary functions to a form convenient for parallel computation. Upper bounds are presented for the restructuring time on a serial processor, the parallel computing time, and the required number of processors.
Translated from Zapiski Nauchnykh Seminarov Leningradskogo Otdeleniya Matematicheskogo Instituta im. V. A. Steklova Akad. Nauk SSSR, Vol. 175, pp. 37–52, 1988.

11.
The problem of optimally scheduling n tasks in a parallel processor system is studied. The tasks are malleable, i.e., a task may be executed by several processors simultaneously, and the processing speed of a task is a nonlinear function of the number of processors allocated to it. The total number of processors is m, which is an upper bound on the number of processors that can be used by all the tasks simultaneously. It is assumed that the number of processors is sufficient to process all the tasks simultaneously, i.e., n ≤ m. The objective is to find a task schedule and a processor allocation such that the overall task completion time, i.e., the makespan, is minimized. The problem is motivated by real-life applications of parallel computer systems in scientific computing of highly parallelizable tasks. An O(n) algorithm is presented to solve this problem when all the processing speed functions are convex. If these functions are all concave and the number of tasks is a constant, the problem can be solved in polynomial time. A relaxed problem, in which the number of processors allocated to each task is not required to be integer, can be solved in O(n max{m, n log₂ m}) time. It is proved that the minimum makespan values for the original and relaxed problems coincide. For n = 2 or n = 3, an optimal solution for the relaxed problem can be converted into an optimal solution for the original problem in constant time.
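A minimal sketch of the relaxed (non-integer allocation) problem: bisect on the makespan T, where a task with work w and speed function f needs f⁻¹(w/T) processors, and T is feasible when the allocations sum to at most m. The concave speed function and tolerances are illustrative assumptions, and this is a plain numerical illustration rather than the paper's algorithm:

```python
# Bisection on the makespan for the relaxed malleable-task problem.
import numpy as np
from scipy.optimize import brentq

works = np.array([4.0, 9.0, 6.0])             # processing requirements w_i
m = 8.0                                       # total processors

def speed(p):
    return np.sqrt(p)                         # concave speed function (assumed)

def procs_needed(w, T):
    if speed(m) < w / T:                      # even all m processors are too slow
        return np.inf
    return brentq(lambda p: speed(p) - w / T, 1e-12, m)

def feasible(T):
    return sum(procs_needed(w, T) for w in works) <= m

lo, hi = 0.1, 100.0
while hi - lo > 1e-6:                         # allocations shrink as T grows
    T = 0.5 * (lo + hi)
    lo, hi = ((T, hi) if not feasible(T) else (lo, T))
print(hi, [procs_needed(w, hi) for w in works])
```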

12.
Parallel algorithms for evaluating arithmetic expressions generally assume the computation tree form to be at hand. The computation tree form can be generated within the same resource bounds as those in which the parenthesis matching problem can be solved. We provide a new cost-optimal parallel algorithm for the latter problem, which runs in O(log n) time using O(n/log n) processors on a PRAM. We also prove that the algorithm is the fastest possible, independent of the number of processors available.
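The sequential baseline against which the parallel algorithm is cost-optimal: a single stack scan that matches all parentheses in linear time:

```python
# Linear-time sequential parenthesis matching (the cost baseline).
def match_parens(s):
    match, stack = {}, []
    for i, ch in enumerate(s):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()                   # assumes well-formed input
            match[i], match[j] = j, i
    return match

print(match_parens("(()())"))                 # pairs (0,5), (1,2), (3,4)
```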

13.
This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLPs). Job balance, speedup and scalability are of primary interest in evaluating the efficiency of the new algorithm. The scalability of a parallel algorithm is a measure of its capacity to increase performance with respect to the number of processors used. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLPs, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm is shown to be scalable and gives better results for large problems. Motivation and justification for solving large MOLPs are also included.

14.
We study the computation, communication and synchronization requirements related to the construction and search of parallel segment trees in an architecture-independent way. Our proposed parallel algorithms are optimal in space and time compared with the corresponding sequential algorithms for the problems introduced, and are described in the context of the bulk-synchronous parallel (BSP) model of computation. Our methods are more scalable, and can thus be made to work for larger values of the processor count p relative to the problem size n, than other segment-tree algorithms that have been described for other realistic distributed-memory parallel models; they also provide a natural way to approach searching problems on latency-tolerant models of computation while maintaining a balanced query load among the processors.
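A minimal sequential segment tree for stabbing queries, the structure whose construction and search the entry parallelizes under the BSP model; integer endpoints are assumed for simplicity:

```python
# Geometric segment tree: insert intervals, report those covering a query point.
class SegTree:
    def __init__(self, lo, hi):               # node spans the range [lo, hi)
        self.lo, self.hi, self.segs = lo, hi, []
        self.left = self.right = None
        if hi - lo > 1:
            mid = (lo + hi) // 2
            self.left, self.right = SegTree(lo, mid), SegTree(mid, hi)

    def insert(self, a, b, seg):              # seg covers [a, b)
        if b <= self.lo or self.hi <= a:
            return
        if a <= self.lo and self.hi <= b:
            self.segs.append(seg)             # canonical node fully covered
            return
        self.left.insert(a, b, seg)
        self.right.insert(a, b, seg)

    def stab(self, x):                        # all stored intervals containing x
        out = list(self.segs)
        if self.left is not None:
            child = self.left if x < self.left.hi else self.right
            out += child.stab(x)
        return out

t = SegTree(0, 16)
for seg in [(2, 9), (4, 12), (7, 8)]:
    t.insert(*seg, seg)
print(sorted(t.stab(7)))                      # [(2, 9), (4, 12), (7, 8)]
```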

15.
A Local Construction Algorithm for Parallel Preconditioners for Discretized Elliptic Equations, I: The Basic Method   (cited by 1: 0 self-citations, 1 by others)
Sun Jiachang, 《计算数学》 (Mathematica Numerica Sinica), 1995, 17(2): 143-153
Constructing efficient iterative algorithms for the large sparse elliptic linear algebraic systems Au = f that arise from finite element or finite difference discretization is currently an extremely active direction in computational methods.
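As a generic illustration of the setting, not the paper's local construction, here is preconditioned conjugate gradients on a discretized elliptic system Au = f with a block-Jacobi preconditioner, one of the simplest preconditioners assembled purely from local pieces:

```python
# CG with a block-Jacobi (locally constructed) preconditioner, 2-D Poisson.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, LinearOperator, splu

n = 64
I, T = sp.identity(n), sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()   # 5-point Laplacian, N = n*n unknowns
f = np.ones(n * n)

nb = 8                                        # number of local blocks
bs = n * n // nb
lus = [splu(A[i * bs:(i + 1) * bs, i * bs:(i + 1) * bs].tocsc()) for i in range(nb)]

def apply_M(r):                               # blocks solved independently: local work
    return np.concatenate([lus[i].solve(r[i * bs:(i + 1) * bs]) for i in range(nb)])

M = LinearOperator(A.shape, matvec=apply_M)
u, info = cg(A, f, M=M, atol=1e-8)
print(info, np.linalg.norm(A @ u - f))
```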

16.
Inpainting is an image interpolation problem with broad applications in image and vision analysis. Described in the current expository paper are our recent efforts in developing universal inpainting models based on the Bayesian and variational principles. Discussed in detail are several variational inpainting models built upon geometric image models, the associated Euler-Lagrange PDEs and their geometric and dynamic interpretations, as well as effective computational approaches. Novel efforts are then made to further extend this systematic variational framework to the inpainting of oscillatory textures, interpolation of missing wavelet coefficients as in the wireless transmission of JPEG2000 images, as well as light-adapted inpainting schemes motivated by Weber's law in visual perception. All these efforts lead to the conclusion that unlike many familiar image processors such as denoising, segmentation, and compression, the performance of a variational/Bayesian inpainting scheme much more crucially depends on whether the image prior model well resolves the spatial coupling (or geometric correlation) of image features. As a highlight, we show that the Besov image models appear to be less interesting for image inpainting in the wavelet domain, highly contrary to their significant roles in thresholding-based denoising and compression. Thus geometry is the single most important keyword throughout this paper. © 2005 Wiley Periodicals, Inc.
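A minimal sketch of the variational idea in its simplest instance, harmonic inpainting: gradient descent on the Dirichlet energy, whose Euler-Lagrange equation is Laplace's equation on the missing region with the known pixels as boundary data. The geometric models discussed in the paper (e.g., TV) replace this Laplacian with curvature-type terms:

```python
# Harmonic inpainting: diffuse known pixels into the hole via u_t = Laplacian(u).
import numpy as np

def inpaint_harmonic(img, mask, n_iter=2000, dt=0.2):
    """mask == True marks missing pixels; known pixels stay fixed."""
    u = img.copy()
    u[mask] = img[~mask].mean()               # neutral initial fill
    for _ in range(n_iter):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        u[mask] += dt * lap[mask]             # evolve only inside the hole
    return u

img = np.fromfunction(lambda i, j: np.sin(i / 8) + np.cos(j / 8), (64, 64))
mask = np.zeros((64, 64), bool)
mask[24:40, 24:40] = True                     # square hole to fill
rec = inpaint_harmonic(img, mask)
```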

17.
Large-scale reservoir simulations are extremely time-consuming because of the solution of large-scale linear systems arising from the Newton or Newton–Raphson iterations. The problem becomes even worse when highly heterogeneous geological models are employed. This paper introduces a family of multi-stage preconditioners for parallel black oil simulations, which are based on the famous constrained pressure residual preconditioner. Numerical experiments demonstrate that our preconditioners are robust, efficient, and scalable. Copyright © 2015 John Wiley & Sons, Ltd.
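A minimal sketch of the two-stage constrained pressure residual (CPR) idea this family builds on: solve a restricted pressure subsystem first, then smooth the remaining full-system residual. The block layout, coupling strengths, and solvers here are illustrative assumptions, far simpler than a black oil simulator's:

```python
# Two-stage CPR-style preconditioner sketch (illustrative block system).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, spilu, LinearOperator, gmres

n = 200                                       # unknowns per field
Ap = sp.diags([-1, 2.2, -1], [-1, 0, 1], shape=(n, n))   # pressure block
As = sp.identity(n) * 1.5                     # saturation block (assumed)
C = sp.random(n, n, density=0.01, random_state=0) * 0.1  # weak coupling (assumed)
A = sp.bmat([[Ap, C], [C.T, As]]).tocsc()

Wp = sp.hstack([sp.identity(n), sp.csr_matrix((n, n))]).tocsr()  # pressure restriction
pressure_lu = splu((Wp @ A @ Wp.T).tocsc())   # stage 1: pressure subsystem solve
smoother = spilu(A, drop_tol=1e-3)            # stage 2: global ILU smoothing

def cpr(r):
    x = Wp.T @ pressure_lu.solve(Wp @ r)      # pressure correction
    return x + smoother.solve(r - A @ x)      # smooth what the pressure solve missed

b = np.ones(2 * n)
x, info = gmres(A, b, M=LinearOperator(A.shape, matvec=cpr), atol=1e-8)
print(info, np.linalg.norm(A @ x - b))
```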

18.
We develop the theory of information-based complexity from a parallel point of view. For a model of computation with p processors, each capable of arithmetic operations and decisions, we analyze the complexity of function approximation, numerical integration, and the solution of Fredholm integral equations. We obtain tight bounds on the complexity, considered as a function of three variables simultaneously: the number of processors, the required precision, and (in the case of approximation and integral equations) the number of points at which the approximate solution is to be determined.

19.
In this paper we define the binary tree algebraic computation (BTAC) problem and develop an efficient parallel algorithm for solving this problem. A variety of graph problems (minimum covering set, minimum r-dominating set, maximum matching set, etc.) for trees and two terminal series parallel (TTSP) graphs can be converted to instances of the BTAC problem. Thus efficient parallel algorithms for these problems are obtained systematically by using the BTAC algorithm. The parallel computation model is an exclusive read exclusive write PRAM. The algorithms for tree problems run in O(log n) time with O(n) processors. The algorithms for TTSP graph problems run in O(log m) time with O(m) processors where n (m) is the number of vertices (edges) in the input graph. These algorithms are within an O(log n) factor of optimal.
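The tree problems listed reduce to a bottom-up dynamic program over the tree, which is what the BTAC formulation captures algebraically; a minimal sequential sketch for minimum dominating set on a tree (the parallel algorithm evaluates the same recurrence by tree contraction):

```python
# Minimum dominating set on a tree via bottom-up DP (sequential analogue).
def min_dominating_set(adj, root=0):
    INF = float("inf")

    def dp(u, parent):
        # costs for: u in the set / u dominated by a child / u not yet dominated
        in_set, undom, best_swap = 1, 0, INF
        for v in adj[u]:
            if v == parent:
                continue
            a, b, c = dp(v, u)
            in_set += min(a, b, c)            # children are free once u is chosen
            child_min = min(a, b)             # child must not rely on u
            undom += child_min
            best_swap = min(best_swap, a - child_min)
        return in_set, undom + best_swap, undom

    a, b, _ = dp(root, -1)                    # root may not stay undominated
    return min(a, b)

adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
print(min_dominating_set(adj))                # 2, e.g. the set {0, 1}
```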

20.
This paper presents two integer linear programming (ILP) models for cyclic scheduling of tasks with unit/general processing time. Our work is motivated by digital signal processing (DSP) applications on FPGAs (Field-Programmable Gate Arrays)—hardware architectures hosting several sets of identical arithmetic units. These hardware units can be formalized as dedicated sets of parallel identical processors. We propose a method to find an optimal periodic schedule of DSP algorithms on such architectures where the number of available arithmetic units must be determined by the scheduling algorithm with respect to the capacity of the FPGA circuit. The emphasis is put on the efficiency of the ILP models. We show the advantages of our models in comparison with common ILP models on benchmarks and randomly generated instances.
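A minimal sketch of the flavor of such an ILP, written with PuLP: unit-time tasks placed into the slots of a fixed period on identical units, minimizing the number of units, with simple intra-iteration precedences. The instance, the period, and the omission of inter-iteration (cyclic) precedences are all simplifying assumptions:

```python
# Tiny ILP sketch: place unit-time tasks within a period, minimize units used.
import pulp

period = 4
tasks = ["a", "b", "c", "d", "e", "f"]
prec = [("a", "b"), ("a", "c"), ("b", "d")]   # u must finish before v starts

prob = pulp.LpProblem("cyclic_sched", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, range(period)), cat="Binary")
units = pulp.LpVariable("units", lowBound=0, cat="Integer")
prob += units                                 # objective: arithmetic units needed

for t in tasks:                               # each task starts exactly once
    prob += pulp.lpSum(x[t][s] for s in range(period)) == 1
for s in range(period):                       # per-slot load fits on the units
    prob += pulp.lpSum(x[t][s] for t in tasks) <= units
for u, v in prec:                             # start(v) >= start(u) + 1
    prob += (pulp.lpSum(s * x[v][s] for s in range(period))
             >= pulp.lpSum(s * x[u][s] for s in range(period)) + 1)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(units))                      # 2 for this instance
```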
