Similar literature
 20 similar documents found; search time 31 ms
1.
In this short review we present the developments over the last five decades that have led to the use of Graphics Processing Units (GPUs) for astrophysical simulations. Since the introduction of NVIDIA's Compute Unified Device Architecture (CUDA) in 2007, the GPU has become a valuable tool for N-body simulations, and it is now so popular that almost all papers about high-precision N-body simulations use methods that are accelerated by GPUs. With GPU hardware becoming more advanced and being used for more sophisticated algorithms such as gravitational tree-codes, we see a bright future for GPU-like hardware in computational astrophysics.

2.
We use a graphics processing unit (GPU) for fast calculations of helicity amplitudes of quark and gluon scattering processes in massless QCD. New HEGET (HELAS Evaluation with GPU Enhanced Technology) codes for gluon self-interactions are introduced, and a C++ program to convert the MadGraph generated FORTRAN codes into HEGET codes in CUDA (a C-platform for general purpose computing on GPU) is created. Because of the proliferation of the number of Feynman diagrams and the number of independent color amplitudes, the maximum number of final state jets we can evaluate on a GPU is limited to 4 for pure gluon processes ($gg\rightarrow 4g$), or 5 for processes with one or more quark lines such as $q\overline{q}\rightarrow 5g$ and $qq\rightarrow qq+3g$. Compared with the usual CPU-based programs, we obtain 60–100 times better performance on the GPU, except for 5-jet production processes and the $gg\rightarrow 4g$ processes, for which the GPU gain over the CPU is about 20.

3.
We use a graphics processing unit (GPU) for fast computations of Monte Carlo integrations. Two widely used Monte Carlo integration programs, VEGAS and BASES, are parallelized for running on a GPU. Using $W^{+}$ plus multi-gluon production processes at the LHC, we compare the integrated cross sections and execution times of programs written in FORTRAN and running on the CPU with those of programs running on a GPU. The integrated results agree with each other within statistical errors. The programs run about 50 times faster on the GPU than on the CPU.
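
The phrase "agree within statistical errors" can be illustrated with a minimal sketch. It uses plain uniform-sampling Monte Carlo in pure Python, not the adaptive algorithms of VEGAS or BASES, and the names `mc_integrate` and `agree`, the test integrand, and the 3-sigma threshold are all assumptions made for illustration only.

```python
import math
import random


def mc_integrate(f, a, b, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over [a, b].

    Returns (estimate, standard_error). This is uniform sampling, not the
    adaptive importance sampling performed by VEGAS or BASES.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))
        total += y
        total_sq += y * y
    mean = total / n
    # Guard against tiny negative values caused by rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    width = b - a
    return width * mean, width * math.sqrt(variance / n)


def agree(est1, err1, est2, err2, n_sigma=3.0):
    """True if two estimates differ by at most n_sigma combined errors."""
    return abs(est1 - est2) <= n_sigma * math.hypot(err1, err2)


if __name__ == "__main__":
    # Two independent runs of the same integral (sin x on [0, pi], exact value 2).
    run_a = mc_integrate(math.sin, 0.0, math.pi, 100_000, seed=1)
    run_b = mc_integrate(math.sin, 0.0, math.pi, 100_000, seed=2)
    print(run_a, run_b, agree(*run_a, *run_b))
```

Each sample in the loop is independent of the others, which is the property that makes this kind of integration a natural fit for parallel hardware.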

4.
Solvent-mediated hydrodynamic interactions between colloidal particles can significantly alter their dynamics. We discuss the implementation of Stokesian dynamics in leading approximation for streaming processors as provided by the compute unified device architecture (CUDA) of recent graphics processors (GPUs). Thereby, the simulation of explicit solvent particles is avoided and hydrodynamic interactions can easily be accounted for in already available, highly accelerated molecular dynamics simulations. Special emphasis is put on efficient memory access and numerical stability. The algorithm is applied to the periodic sedimentation of a cluster of four suspended particles. Finally, we investigate the runtime performance of generic memory access patterns of complexity $O(N^2)$ for various GPU algorithms relying on either hardware cache or shared memory.

5.
This paper presents a parallel algorithm implemented on graphics processing units (GPUs) for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements a non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside a source domain. NGIM reduces the computational time cost of the direct field evaluation at N observers due to N co-located sources from $O(N^2)$ to $O(N)$ in the static and low-frequency regimes, to $O(N \log N)$ in the high-frequency regime, and between these costs in the mixed-frequency regime. Memory requirements scale as $O(N)$ in all frequency regimes. Several important differences between the CPU and GPU implementations of the NGIM are required to obtain optimal performance on the respective platforms. In particular, in the CPU implementations all operations, where possible, are pre-computed and stored in memory in a preprocessing stage. This reduces the computational time but significantly increases the memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory handling techniques are used to accelerate the computations. The significant latency of GPU global memory access is hidden by implementing coalesced reading, which requires arranging many array elements in contiguous parts of memory. In contrast to the CPU version, most of the steps in the GPU implementations are executed on the fly and only necessary arrays are kept in memory. This results in significantly reduced memory consumption, an increased problem size N that can be handled, and reduced computational time on GPUs. The obtained GPU–CPU speed-up ratios range from 150 to 400 depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.

6.
The graphics processing unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now offers great capability for scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution of the Boltzmann equation involves the discrete ordinates (Sn) method and the procedure of source iteration. In this paper, we present a GPU-accelerated simulation of one-energy-group, time-independent, deterministic discrete ordinates particle transport in 3D Cartesian geometry (Sweep3D). The performance of the GPU simulations is reported for simulations with vacuum boundary conditions. The relative advantages and disadvantages of the GPU implementation, simulation on multiple GPUs, the programming effort, and code portability are also discussed. The results show that the overall performance speedup of one NVIDIA Tesla M2050 GPU ranges from 2.56 compared with one Intel Xeon X5670 chip to 8.14 compared with one Intel Core Q6600 chip for no flux fixup. The simulation with flux fixup on one M2050 is 1.23 times faster than on one X5670.

7.
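Third-generation GPU high-performance computing technology based on the NVIDIA CUDA (Compute Unified Device Architecture) architecture is studied. The basic data-processing algorithms of a pulse-compression radar, namely pulse compression and coherent integration, are implemented on an NVIDIA GTX470 GPU with 448 processing cores. Following the GPU's parallel processing architecture, optimized parallel designs of pulse compression and coherent integration are developed that map the algorithms effectively onto the 448 processing cores of the GTX470, giving a GPU parallel implementation of the basic pulse-compression radar processing algorithms. The quality of the processing results and the real-time performance are evaluated.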
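
The two processing steps named above can be sketched on the CPU in pure Python. This is only an illustration under stated assumptions: the abstract does not specify the radar waveform, so a Barker-13 phase code is assumed here, and the helper names `matched_filter`, `coherent_integration`, and `make_echo` are hypothetical.

```python
# Barker-13 phase code (an assumed waveform; the source does not name one).
BARKER_13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def matched_filter(received, code):
    """Pulse compression by cross-correlating the samples with the known code.

    Output index k is the correlation with the code aligned at sample k.
    """
    n_out = len(received) - len(code) + 1
    return [
        sum(received[k + i] * code[i] for i in range(len(code)))
        for k in range(n_out)
    ]


def coherent_integration(pulses):
    """Sum the compressed outputs of several pulses sample by sample."""
    return [sum(samples) for samples in zip(*pulses)]


def make_echo(delay, length, code=BARKER_13):
    """Noise-free echo: the code placed at `delay` in a zero-filled window."""
    echo = [0.0] * length
    for i, chip in enumerate(code):
        echo[delay + i] = float(chip)
    return echo


if __name__ == "__main__":
    compressed = matched_filter(make_echo(20, 64), BARKER_13)
    integrated = coherent_integration([compressed] * 4)
    print(max(compressed), compressed.index(max(compressed)), max(integrated))
```

Every output sample of `matched_filter` is computed independently of the others, so one plausible GPU mapping assigns one output sample to each thread; whether the paper uses that mapping is not stated in the abstract.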

8.
Physics Letters B, 2001, 504(3): 247-253
We examine the possible tests of violation of the gravitational equivalence principle (VEP) at a muon storage ring via neutrino oscillation experiments. If the gravitational interactions of the neutrinos are not diagonal in the flavour basis and the gravitational interaction eigenstates have different couplings to the gravitational field, this leads to neutrino oscillation. If one starts with a μ+ beam, then the appearance of τ±, e+ and μ in the final state is the signal for neutrino oscillation. We have estimated the number of μ events in this scenario in νμN deep inelastic scattering. The final-state lepton energy distribution can be used to distinguish the VEP scenario from the others. A large area of VEP parameter space can be explored at a future muon storage ring facility with moderate beam energy.

9.
We present a Monte Carlo study of dijet angular distributions at $\sqrt{s}=14$ TeV. First we perform a next-to-leading order QCD study; we calculate the distributions in four different bins of dijet invariant mass using different Monte Carlo programs and different jet algorithms, and we also investigate the systematic uncertainties coming from the choice of the parton distribution functions and the renormalization and factorization scales. In the second part of this paper, we present the effects on the distributions coming from a model including gravitational scattering and black hole formation in a world with large extra dimensions. Assuming a 25% systematic uncertainty, we report a discovery potential for the mass bin $1<M_{jj}<2$ TeV at 10 pb$^{-1}$ integrated luminosity.

10.
We use the graphics processing unit (GPU) for fast calculations of helicity amplitudes of physics processes. As our first attempt, we compute $u\overline{u}\rightarrow n\gamma$ (n=2 to 8) processes in pp collisions at $\sqrt{s}=14$ TeV by transferring the MadGraph generated HELAS amplitudes (FORTRAN) into newly developed HEGET (HELAS Evaluation with GPU Enhanced Technology) codes written in CUDA, a C-platform developed by NVIDIA for general purpose computing on the GPU. Compared with the usual CPU programs, we obtain a 40–150 times better performance on the GPU.

11.
It is shown that all torsion-free vacuum solutions of the model of de Sitter (dS) gauge theory of gravity are the vacuum solutions of Einstein field equations with the same positive cosmological constant. Furthermore, for the gravitational theories with more general quadratic gravitational Lagrangian ($F^2 + T^2$), the torsion-free vacuum solutions are also the vacuum solutions of Einstein field equations.

12.
A fast mesh deformation method using explicit interpolation
A novel mesh deformation algorithm for unstructured polyhedral meshes is developed utilizing a tree-code optimization of a simple direct interpolation method. The algorithm is shown to provide mesh quality that is competitive with radial basis function based methods, with markedly better performance in preserving boundary layer orthogonality in viscous meshes. The parallelization of the algorithm is described, and the algorithm cost is demonstrated to be O(n log n). The parallel implementation was used to deform meshes of 100 million nodes on nearly 200 processors demonstrating that the method scales to large mesh sizes. Results are provided for a simulation of a high Reynolds number fluid–structure interaction case using this technique.

13.
The history of the question of whether gravitational waves, whose existence is predicted by the General Relativity Theory, can be detected is briefly presented. The schemes of a cryofiber interferometer, which we propose to use as a detector of gravitational waves with amplitude $|\delta g_{ij}| = 10^{-20}$, are described. We also consider other uses of the cryofiber interferometer in both applied and fundamental contexts, including laboratory experiments in which, according to the estimates, dark energy density variations can be detected. We briefly describe the optical scheme of a compact interferometric detector of vibrations of a mirror fixed at the end of a massive gravitational antenna; the compactness allows construction of a cryogenic version with cooling of all the elements of such a recording system.

14.
Considering the octet baryons in relativistic mean field theory and selecting entropy per baryon S=1, we calculate and discuss the influence of U bosons on the equation of state, mass-radius relation, moment of inertia and gravitational redshift of massive protoneutron stars (PNSs). The effective coupling constant $g_U$ of U bosons and nucleons is selected from 0 to 70 GeV$^{-2}$. The results indicate that U bosons will stiffen the equation of state (EOS). The influence of U bosons on the pressure is more obvious at low density than at high density, while the influence of U bosons on the energy density is more obvious at high density than at low density. The U bosons play a significant role in increasing the maximum mass and radius of a PNS. When the value of $g_U$ changes from 0 to 70 GeV$^{-2}$, the maximum mass of a massive PNS increases from $2.11M_\odot$ to $2.58M_\odot$, and the radius of a PNS corresponding to PSR J0348+0432 increases from 13.71 km to 24.35 km. The U bosons will increase the moment of inertia and decrease the gravitational redshift of a PNS. For the PNS of the massive PSR J0348+0432, the radius and moment of inertia vary directly with $g_U$, and the gravitational redshift varies approximately inversely with $g_U$.

15.
Brane factories     
We propose that higher-dimensional extended objects (p-branes) are created by super-Planckian scattering processes in theories with TeV scale gravity. As an example, we compute the cross section for p-brane creation in a (n+4)-dimensional spacetime with asymmetric compactification. We find that the cross section for the formation of a brane which is wound on a compact submanifold of size of the fundamental gravitational scale is larger than the cross section for the creation of a spherically symmetric black hole. Therefore, we predict that branes are more likely to be created than black holes in super-Planckian scattering processes in these manifolds. The higher rate of p-brane production has important phenomenological consequences, as it significantly enhances possible detection of non-perturbative gravitational events in future hadron colliders and cosmic-ray detectors.

16.
We present exact analytic solutions describing the equilibrium states available to a one-dimensional, self-gravitating cloud of gas subject to an external constant gravitational acceleration due to a plane of "stars". The gas is taken to be heated at a rate proportional to the local gas density and to be cooling by both radiation and conduction. The solutions are valid for a thermal conductivity which is an arbitrary function of gas temperature, $T$, and for radiative cooling which is proportional to the local gas density, $\rho$, multiplied by an arbitrary function of gas pressure, $p$. Illustrations of the general spatial dependence are given for the cases where the radiative cooling is proportional to $\rho^2 T$, and in which the thermal conductivity is either constant, or proportional to $T^a$ ($a > 0$) in the limits of $T$ tending to zero or infinity, respectively. We show that the phenomenon of density "inversion", reported earlier, is indeed ameliorated by the radiative cooling term, as we had speculated it might be, but is not removed. This indicates that the phenomenon of density inversion is of rugged quality, persisting under a wide variety of conditions and, therefore, of general astrophysical import. We also show that, depending on the ratios of various parameters entering the problem, there is a new phenomenon possible in which the gas temperature has a local minimum at some non-central location so that a wedge of cool gas is in equilibrium surrounded by a hot medium. We have done these calculations as an aid to understanding the complicated behavior of interstellar gas clouds in particular, and the general physical interplay between force balance and energy balance in models of gas clouds more realistic than those heretofore available.

17.
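GPU acceleration of numerical simulation of the interaction between a shock wave and a flame interface
Jiang Hua, Dong Gang, Chen Xiao. 《计算物理》 (Chinese Journal of Computational Physics), 2016, 33(1): 23-29
To examine the computing capability of graphics processing units (GPUs) in computational fluid dynamics, a typical compressible reacting flow, the interaction between a shock wave and a flame interface, is simulated numerically using a CPU/GPU heterogeneous parallel approach. The parallel scheme is optimized, and the effects of grid resolution on the computed results and on the acceleration performance are examined. The results show that the GPU parallel simulation gives the same results as conventional message-passing (MPI) parallel computation with 8 threads. The computing time of both methods grows linearly with the number of grid cells, but the GPU computing time is markedly lower than that of MPI. For a small grid ($1.6\times10^{4}$ cells), the GPU speedup in average time per time step is 8.6; the speedup declines somewhat as the number of cells increases, but for a fairly large grid ($4.2\times10^{6}$ cells) it still reaches 5.9. The GPU-based heterogeneous parallel acceleration algorithm provides a good approach to high-resolution, large-scale computation of compressible reacting flows.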

18.
19.
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9× to 12× speedup over the original CPU version, in addition to speedup from multi-threading. This is 2× faster than the fully-optimized GPU version, indicating the importance of optimizing CPU implementations.
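
For readers unfamiliar with the method being optimized, a minimal scalar Metropolis sketch follows. The abstract does not say which system the paper samples, so the target (a one-dimensional standard normal), the uniform proposal, and the names `metropolis` and `log_standard_normal` are assumptions chosen purely for illustration; no vectorization or memory coalescing is shown.

```python
import math
import random


def metropolis(log_prob, x0, n_steps, step_size, seed=0):
    """Random-walk Metropolis sampler with a symmetric uniform proposal.

    Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)
    x = x0
    lp = log_prob(x)
    samples = []
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)
        lp_new = log_prob(x_new)
        # Accept with probability min(1, p(x_new) / p(x)).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = x_new, lp_new
            accepted += 1
        # A rejected move repeats the current state in the chain.
        samples.append(x)
    return samples, accepted / n_steps


def log_standard_normal(x):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    chain, rate = metropolis(log_standard_normal, 0.0, 50_000, 2.0, seed=3)
    mean = sum(chain) / len(chain)
    print(mean, rate)
```

Each step depends on the previous state, so a single chain is inherently sequential; gains from vectorization or GPUs therefore have to come from work inside each step or from running many chains at once.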

20.
In General Relativity, the graviton interacts in the three-graviton vertex with a tensor that is not the energy-momentum tensor of the gravitational field. We consider the possibility that the graviton interacts with the definite gravitational energy-momentum tensor that we previously found in the $G^2$ approximation. This tensor, in a gauge where nonphysical degrees of freedom do not contribute, is remarkable because it gives positive gravitational energy density for the Newtonian center in the same manner as the electromagnetic energy-momentum tensor does for the Coulomb center. We show that the assumed three-graviton vertex does not lead to a contradiction with the precession of Mercury's perihelion. In the S-matrix approach used here, the external gravitational field has only a subsidiary role, similar to the external field in quantum electrodynamics. This approach with the assumed vertex leads to a gravitational field that cannot be obtained from a consistent gravity equation.
