首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We show that the polar as well as the pseudo-polar FFT can be computed very accurately and efficiently by the well-known nonequispaced FFT. Furthermore, we discuss the reconstruction of a 2d signal from its Fourier transform samples on a (pseudo)polar grid by means of the inverse nonequispaced FFT.  相似文献   

2.
An efficient calculation of NFFT (nonequispaced fast Fourier transforms) is always a challenging task in a variety of application areas, from medical imaging to radio astronomy to chemical simulation. In this article, a new theoretical derivation is proposed for NFFT based on gridding algorithm and new strategies are proposed for the implementation of both forward NFFT and its inverse on both CPU and GPU. The GPU-based version, namely CUNFFT, adopts CUDA (Compute Unified Device Architecture) technology, which supports a fine-grained parallel computing scheme. The approximation errors introduced in the algorithm are discussed with respect to different window functions. Finally, benchmark calculations are executed to illustrate the accuracy and performance of NFFT and CUNFFT. The results show that CUNFFT is not only with high accuracy, but also substantially faster than conventional NFFT on CPU.  相似文献   

3.
Stefan Kunis 《PAMM》2008,8(1):10977-10978
Recently, the fast Fourier transform (FFT) has been generalised for arbitrary sampling nodes by the use of approximation schemes. We show that such nonequispaced FFTs can be implemented without oversampling, i.e., no extra memory besides the input and output array is needed. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

4.
The discrete Fourier transform in d dimensions with equispaced knots in space and frequency domain can be computed by the fast Fourier transform (FFT) in arithmetic operations. In order to circumvent the ‘curse of dimensionality’ in multivariate approximation, interpolations on sparse grids were introduced. In particular, for frequencies chosen from an hyperbolic cross and spatial knots on a sparse grid fast Fourier transforms that need only arithmetic operations were developed. Recently, the FFT was generalised to nonequispaced spatial knots by the so-called NFFT. In this paper, we propose an algorithm for the fast Fourier transform on hyperbolic cross points for nonequispaced spatial knots in two and three dimensions. We call this algorithm sparse NFFT (SNFFT). Our new algorithm is based on the NFFT and an appropriate partitioning of the hyperbolic cross. Numerical examples confirm our theoretical results.  相似文献   

5.
Efficient Reconstruction of Functions on the Sphere from Scattered Data   总被引:1,自引:0,他引:1  
Recently, fast and reliable algorithms for the evaluation of spherical harmonic expansions have been developed. The corresponding sampling problem is the computation of Fourier coefficients of a function from sampled values at scattered nodes. We consider a least squares approximation and an interpolation of the given data. Our main result is that the rate of convergence of the two proposed iterative schemes depends only on the mesh norm and the separation distance of the nodes. In conjunction with the nonequispaced FFT on the sphere, the reconstruction of N2 Fourier coefficients from M reasonably distributed samples is shown to take O(N2 log2 N + M) floating point operations. Numerical results support our theoretical findings.  相似文献   

6.
In this paper, a conservative compact difference scheme is proposed for the two‐dimensional nonlinear Zakharov equation with periodic boundary condition and initial condition. The proposed scheme not only conserve the mass and energy in the discrete level but also are efficient in practical experiments because the Fast Fourier transform (FFT) can be used to speed up the numerical computation. By using the standard energy method and induction argument, we can establish rigorously the unconditional and optimal H2‐error estimates. Some numerical examples are provided to support our theoretical results and show the accuracy and efficiency of the new scheme.  相似文献   

7.
A note on fast Fourier transforms for nonequispaced grids   总被引:1,自引:0,他引:1  
In this paper, we are concerned with fast Fourier transforms for nonequispaced grids. We propose a general efficient method for the fast evaluation of trigonometric polynomials at nonequispaced nodes based on the approximation of the polynomials by special linear combinations of translates of suitable functions ϕ. We derive estimates for the approximation error. In particular, we improve the estimates given by Dutt and Rokhlin [7]. As a practical consequence, we obtain a criterion for the choice of the parameters involved in the fast transforms. This revised version was published online in August 2006 with corrections to the Cover Date.  相似文献   

8.
The objective of this work is the extension of the classical computational homogenization scheme for continuous micro–structures to the homogenization of atomistic systems. The atomistic setting is simulated by applying the so–called atomistic finite element method. In particular the influence of atomistic defects onto the macroscopic material behavior is highlighted. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

9.
Calcium waves are modeled by parabolic partial differential equations, whose simulation codes contain Krylov subspace methods as computational kernels. This paper presents GPU-based parallel computations for the conjugate gradient method applied to the finite difference discretization of a Poisson equation as prototype problem for the computational kernel. The CUDA algorithm tests the three memory systems of global memory, texture memory, and shared memory of a CUDA-enabled GPU. Due to the caching mechanism and coalesced read/write operations, the CUDA algorithm using global memory and single precision floating point numbers outperforms algorithms accessing texture memory and the shared memory. (© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

10.
The two-grid method is a technique to solve the linear system of algebraic equations for reducing the computational cost. In this study, the two-grid procedure has been combined with the EFG method for solving nonlinear partial differential equations. The two-grid FEM has been introduced in various forms. The well-known two-grid FEM is a three-step method that has been proposed by Bajpai and Nataraj (Comput. Math. Appl. 2014;68:2277–2291) that the new proposed scheme is an ecient procedure for solving important nonlinear partial differential equations such as Navier–Stokes equation. By applying shape functions of IMLS approximation in the EFG method, a new technique that is called interpolating EFG (IEFG) can be obtained. In the current investigation, we combine the two-grid algorithm with the IEFG method for solving the nonlinear Rosenau-regularized long-wave (RRLW) equation. In other hand, we demonstrate that solutions of steps 1, 2, and 3 exist and are unique and also we achieve an error estimate for them. Moreover, three test problems in one- and two-dimensional cases are given which support accuracy and efficiency of the proposed scheme.  相似文献   

11.
A straightforward discretisation of problems in high dimensions often leads to an exponential growth in the number of degrees of freedom. Sparse grid approximations allow for a severe decrease in the number of used Fourier coefficients to represent functions with bounded mixed derivatives and the fast Fourier transform (FFT) has been adapted to this thin discretisation. We show that this so called hyperbolic cross FFT suffers from an increase of its condition number for both increasing refinement and increasing spatial dimension.  相似文献   

12.
In this paper we implement the moving mesh PDE method for simulating the blowup in reaction–diffusion equations with temporal and spacial nonlinear nonlocal terms. By a time-dependent transformation, the physical equation is written into a Lagrangian form with respect to the computational variables. The time-dependent transformation function satisfies a parabolic partial differential equation — usually called moving mesh PDE (MMPDE). The transformed physical equation and MMPDE are solved alternately by central finite difference method combined with a backward time-stepping scheme. The integration time steps are chosen to be adaptive to the blowup solution by employing a simple and efficient approach. The monitor function in MMPDEs plays a key role in the performance of the moving mesh PDE method. The dominance of equidistribution is utilized to select the monitor functions and a formal analysis is performed to check the principle. A variety of numerical examples show that the blowup profiles can be expressed correctly in the computational coordinates and the blowup rates are determined by the tests.  相似文献   

13.
Particle methods are a powerful tool to model dynamic systems. Thereby, the system is discretized by a large number of particles, which are interacting via local, predefined particle-particle interaction laws. The resulting computational effort includes neighborhood search, computation of interaction forces and state update via time integration. Particle methods are used in a lot of different fields of applications like computer science, physics and engineering sciences. As the analyzed systems' number of particles constantly grow, performance enhancement has become an important part of present algorithm development. Besides the well-established approach of algorithm parallelization on multi-core CPUs or CPU clusters, modern graphics processing units (GPUs) present a different and trend-setting possibility for massive parallelization even on desktop computers. Among the top four supercomputers of the world, three are already using NVIDIA GPUs. In late 2006, NVIDIA introduced the first GPUs optimized for general purpose calculations. This was followed by the introduction of a new computing architecture differing from the standard graphics user-interface like OpenGL. This architecture is called Compute Unified Device Architecture (CUDA). It enables the user to program the GPU using standard C commands with few additional runtime functions. The differences in architecture between CPU and GPU result in a completely different algorithm implementation. So, a performance evaluation of different types of particle systems implemented on a GPU using CUDA and on a standard CPU is presented. (© 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

14.
Maxim Naumov 《PAMM》2012,12(1):11-14
An implementation of the incomplete-LU/Cholesky preconditioned block-iterative methods on the Graphics Processing Units (GPUs) using the CUDA parallel programming model is presented. In particular, we focus on the tradeoffs associated with the sparse matrix-vector multiplication with multiple vectors, sparse triangular solve with multiple right-hand-sides (rhs) as well as incomplete factorization with 0 fill-in. We use these building blocks to implement the block-CG and BiCGStab iterative methods for the symmetric positive definite (s.p.d.) and nonsymmetric linear systems, respectively. Also, in our numerical experiments we show that the implementation of the preconditioned block-iterative methods using the CUSPARSE library on the GPU achieves an average of 3× speedup over their MKL implementation on the CPU. (© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)  相似文献   

15.
This paper describes an algorithm for solving structured nonsmooth convex optimization problems using the optimal subgradient algorithm (OSGA), which is a first-order method with the complexity \(\mathcal {O}(\varepsilon ^{-2})\) for Lipschitz continuous nonsmooth problems and \(\mathcal {O}(\varepsilon ^{-1/2})\) for smooth problems with Lipschitz continuous gradient. If the nonsmoothness of the problem is manifested in a structured way, we reformulate the problem so that it can be solved efficiently by a new setup of OSGA (called OSGA-V) with the complexity \(\mathcal {O}(\varepsilon ^{-1/2})\). Further, to solve the reformulated problem, we equip OSGA-O with an appropriate prox-function for which the OSGA-O subproblem can be solved either in a closed form or by a simple iterative scheme, which decreases the computational cost of applying the algorithm for large-scale problems. We show that applying the new scheme is feasible for many problems arising in applications. Some numerical results are reported confirming the theoretical foundations.  相似文献   

16.
The fast Fourier transform (FFT) is one of the most successful numerical algorithms of the 20th century and has found numerous applications in many branches of computational science and engineering. The FFT algorithm can be derived from a particular matrix decomposition of the discrete Fourier transform (DFT) matrix. In this paper, we show that the quantum Fourier transform (QFT) can be derived by further decomposing the diagonal factors of the FFT matrix decomposition into products of matrices with Kronecker product structure. We analyze the implication of this Kronecker product structure on the discrete Fourier transform of rank‐1 tensors on a classical computer. We also explain why such a structure can take advantage of an important quantum computer feature that enables the QFT algorithm to attain an exponential speedup on a quantum computer over the FFT algorithm on a classical computer. Further, the connection between the matrix decomposition of the DFT matrix and a quantum circuit is made. We also discuss a natural extension of a radix‐2 QFT decomposition to a radix‐d QFT decomposition. No prior knowledge of quantum computing is required to understand what is presented in this paper. Yet, we believe this paper may help readers to gain some rudimentary understanding of the nature of quantum computing from a matrix computation point of view.  相似文献   

17.
The radial basis function (RBF) collocation method uses global shape functions to interpolate and collocate the approximate solution of PDEs. It is a truly meshless method as compared to some of the so‐called meshless or element‐free finite element methods. For the multiquadric and Gaussian RBFs, there are two ways to make the solution converge—either by refining the mesh size h, or by increasing the shape parameter c. While the h‐scheme requires the increase of computational cost, the c‐scheme is performed without extra effort. In this paper we establish by numerical experiment the exponential error estimate ? ~ Oc?h) where 0 < λ < 1. We also propose the use of residual error as an error indicator to optimize the selection of c. © 2003 Wiley Periodicals, Inc. Numer Methods Partial Differential Eq 19: 571–594, 2003  相似文献   

18.
19.
Based on the corrected finite pointset method (CFPM) with CPU-GPU heteroid parallelization (CFPM-GPU), a high-efficiency, accurate and fast parallel algorithm was developed for the high-dimensional phase separation phenomena governed by the multi-component Cahn-Hilliard (C-H) equation in complex domains. The proposed parallel algorithm with the CFPM-GPU was built in a process like: ① introduce the Wendland weight function into the discretization of the finite pointset method (FPM) scheme for the 1st/2nd spatial derivatives, based on the Taylor series and the weighted least square concept; ② use the above FPM scheme twice to approximate the 4th spatial derivative in the C-H equation, which is called the CFPM method; ③ for the first time establish an accelerating parallel algorithm for the CFPM with local matrices by means of a single GPU card based on the CUDA programming. Two benchmark problems of 2D radially and 3D spherically symmetric C-H equations were first solved to test the accuracy and high-efficiency of the proposed CFPM-GPU, and the acceleration ratio of the GPU parallelization to the single CPU computation is about 160. Subsequently, the proposed CFPM-GPU was used to predict the 2D/3D multi-phase separation phenomena in complex domains, and the prediction was compared with other numerical results. The numerical results show that, the proposed CFPM-GPU is valid and high-efficiency to simulate the 2D/3D multi-phase separation cases in complex domains. © 2023 Editorial Office of Applied Mathematics and Mechanics. All rights reserved.  相似文献   

20.
The discrete prolate spheroidal sequences (DPSS's) provide an efficient representation for discrete signals that are perfectly timelimited and nearly bandlimited. Due to the high computational complexity of projecting onto the DPSS basis – also known as the Slepian basis – this representation is often overlooked in favor of the fast Fourier transform (FFT). We show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements. The complexity of the resulting algorithms is comparable to the FFT, and scales favorably as the quality of the desired approximation is increased. In the process of bounding the complexity of these algorithms, we also establish new nonasymptotic results on the eigenvalue distribution of discrete time–frequency localization operators. We then demonstrate how these algorithms allow us to efficiently compute the solution to certain least-squares problems that arise in signal processing. We also provide simulations comparing these fast, approximate Slepian methods to exact Slepian methods as well as the traditional FFT based methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号