Similar Articles
 20 similar articles found
1.
 To study the effect of ion thruster plumes on spacecraft, the charge-exchange ions in an ion thruster plume were simulated with the particle-in-cell Monte Carlo collision (PIC-MCC) method. Using the Compute Unified Device Architecture (CUDA), a GPU-based parallel particle simulation code was developed. Random numbers are generated with a parallel MT19937 pseudo-random number generator, and the electric-field equation is solved by an algebraic multigrid method in the full approximation storage scheme. In r-z axisymmetric coordinates, the mean current density obtained at z = 0 m is 4.5×10⁻⁵ A/m², and the GPU results agree with the CPU simulation results. On an NVIDIA GeForce 9400 GT graphics card with 16 cores, speedups of 4.5 to 10.0 over an Intel Core 2 E6300 CPU are achieved.
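A minimal sketch of the per-thread random-number setup such a GPU particle code needs, using cuRAND. The paper uses a parallel MT19937; for brevity this sketch substitutes cuRAND's per-thread XORWOW generator, and the collision-test kernel and its probability parameter p_cex are illustrative stand-ins for the MCC stage, not the authors' code.

```cuda
#include <curand_kernel.h>

// One RNG state per thread. The paper's generator is a parallel MT19937;
// XORWOW is used here only to keep the sketch short.
__global__ void init_rng(curandState *states, unsigned long long seed, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) curand_init(seed, i, 0, &states[i]);
}

// Hypothetical charge-exchange collision test: each particle collides with
// probability p_cex -- a stand-in for the Monte Carlo collision stage.
__global__ void mcc_step(curandState *states, int *collided, float p_cex, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        curandState s = states[i];            // advance a register copy
        collided[i] = curand_uniform(&s) < p_cex;
        states[i] = s;                        // store the advanced state
    }
}
```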

2.
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed in addition, are less frequently discussed. In this paper, we present an analysis of several optimizations applied to both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and its GPU equivalent, explicit memory coalescing, are found to be critical to achieving good performance of this algorithm in both environments. The fully optimized CPU version achieves a 9× to 12× speedup over the original CPU version, in addition to the speedup from multi-threading. This is 2× faster than the fully optimized GPU version, underscoring the importance of optimizing CPU implementations.
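To illustrate the coalescing point, here is a toy CUDA kernel pair contrasting coalesced and strided global-memory access; kernel and parameter names are illustrative, not taken from the paper.

```cuda
// Coalesced: consecutive threads read consecutive addresses, so a warp's
// 32 loads merge into a few wide memory transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads touch addresses `stride` elements apart, so
// each warp issues many separate transactions and effective bandwidth drops.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i * stride < n) out[i] = in[i * stride];
}
```

On the CPU the analogous requirement is keeping data contiguous and unit-stride so SIMD loads stay full.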

3.
We present a GPU calculation with the Compute Unified Device Architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. We propose a quasi-block synchronization algorithm and use it to realize the Wolff single-cluster Monte Carlo simulation with CUDA, performing parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation for the two-dimensional Ising model at the critical temperature with linear size L = 4096 is 5.60 times as fast as the calculation on a current CPU core. For the three-dimensional Ising model with linear size L = 256, the GPU calculation is 7.90 times as fast as the CPU calculation. The idea of quasi-block synchronization can be used not only in cluster algorithms but also in many settings where synchronization of all threads is required.
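For reference, the bond-activation probability that drives cluster growth in the Wolff algorithm for the ferromagnetic Ising model is the standard

```latex
P_{\mathrm{add}} = 1 - e^{-2\beta J}
```

applied to each nearest neighbour whose spin is parallel to the cluster; the GPU parallelization changes how these additions are scheduled across threads, not the probability itself.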

4.
In this study, we present the application of the two-dimensional direct simulation Monte Carlo (DSMC) method using an MPI-CUDA parallelization paradigm on clusters of Graphics Processing Units (GPUs). An all-device (i.e. GPU) computational approach is adopted in which the entire computation is performed on the GPU, leaving the CPU idle during all stages of the computation, including particle moving, indexing, particle collisions and state sampling. Communication between the GPU and host is performed only to enable multiple-GPU computation. Results show that the computational time can be reduced by factors of 15 and 185 when using a single GPU and 16 GPUs, respectively, compared to a single core of an Intel Xeon X5670 CPU. The demonstrated parallel efficiency is 75% when using 16 GPUs, relative to a single GPU, for simulations with 30 million simulated particles. Finally, several very large-scale simulations in the near-continuum regime demonstrate the excellent capability of the current parallel DSMC method.
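A minimal sketch of the all-device particle-moving stage, assuming a struct-of-arrays particle layout resident in GPU memory; the array names and the simple free-flight update are illustrative, not the paper's implementation.

```cuda
// Ballistic free flight for all simulated particles in one kernel launch.
// Particle arrays stay in GPU global memory between the DSMC stages
// (move, index, collide, sample), so no host transfers are needed.
__global__ void move_particles(float *x, float *y,
                               const float *vx, const float *vy,
                               float dt, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] += vx[i] * dt;   // boundary and surface handling omitted
        y[i] += vy[i] * dt;
    }
}
```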

5.
 We discuss theoretical predictions for W-pair production and decay at LEP2 and higher energies in a form suitable for comparison with raw data. We present a practical framework for calculating the uncertainties of predictions given by the KORALW and grc4f Monte Carlo programs. As an example we use observables in the four-quark decay channel: the total four-quark (four-jet) cross section and the two-quark/jet invariant-mass distribution and cross section, in the case where the other two quarks may escape detection. Effects of QED bremsstrahlung, effective couplings, running widths, the Coulomb interaction and the complete tree-level set of diagrams are discussed. We also revisit the technical precision of the new version 1.21 of the KORALW Monte Carlo code as well as of version 1.2(26) of grc4f. Finally, we find the predictions of the two programs to have an overall physical uncertainty of 2%. As a side result we show, on the example of an invariant-mass distribution, the strong interplay of spin correlations and detector cut-offs in the case of four-fermion final states.

6.
The Compute Unified Device Architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to standard C. With a continuously increasing number of cores combined with high memory bandwidth, recent GPUs offer enormous resources for general-purpose computing. We first apply this technology to Monte Carlo simulations of the two-dimensional ferromagnetic square-lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a current CPU core. An implementation of the three-dimensional ferromagnetic cubic-lattice Ising model on a GPU generates results up to 35 times faster than a current CPU core. As a proof of concept, we calculate the critical temperature of the 2D and 3D Ising models using finite-size scaling techniques; theoretical results for the 2D Ising model and previous simulation results for the 3D Ising model are reproduced.
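As a concrete illustration, here is a minimal sketch of one checkerboard half-sweep for the 2D Ising model, assuming spins stored as ±1 integers and a pre-generated array of uniform random numbers; names and layout are illustrative, not the paper's code.

```cuda
// Update one sublattice (color = 0 or 1) of an L x L Ising lattice with
// periodic boundaries. Same-color sites share no bonds, so all of them can
// be updated in parallel -- the essence of the checkerboard scheme.
__global__ void checkerboard_sweep(int *spin, const float *rnd,
                                   int L, float beta, int color) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= L || y >= L || ((x + y) & 1) != color) return;
    int idx = y * L + x;
    int nb = spin[y * L + (x + 1) % L] + spin[y * L + (x + L - 1) % L]
           + spin[((y + 1) % L) * L + x] + spin[((y + L - 1) % L) * L + x];
    float dE = 2.0f * spin[idx] * nb;               // energy change, J = 1
    if (dE <= 0.0f || rnd[idx] < expf(-beta * dE))  // Metropolis criterion
        spin[idx] = -spin[idx];
}
```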

7.
The answers to data assimilation questions can be expressed as path integrals over all possible state and parameter histories. We show how these path integrals can be evaluated numerically using a Markov chain Monte Carlo method designed to run in parallel on a graphics processing unit (GPU). We demonstrate the application of the method to an example with a transmembrane voltage time series of a simulated neuron as input, using a Hodgkin–Huxley neuron model. By taking advantage of GPU computing, we gain a parallel speedup factor of up to about 300 compared to an equivalent serial computation on a CPU, with performance increasing as the length of the observation time used for data assimilation increases.

8.
We present a Monte Carlo study of dijet angular distributions at $\sqrt{s}=14$ TeV. First we perform a next-to-leading order QCD study; we calculate the distributions in four different bins of dijet invariant mass using different Monte Carlo programs and different jet algorithms, and we also investigate the systematic uncertainties coming from the choice of the parton distribution functions and the renormalization and factorization scales. In the second part of this paper, we present the effects on the distributions coming from a model including gravitational scattering and black hole formation in a world with large extra dimensions. Assuming a 25% systematic uncertainty, we report a discovery potential for the mass bin 1 < M_jj < 2 TeV at 10 pb⁻¹ integrated luminosity.

9.
Inclusive baryon-antibaryon pair production was studied in two-photon events collected at the e⁺e⁻ collider TRISTAN, corresponding to an integrated luminosity of 303 pb⁻¹. Correlations between a baryon and an antibaryon were studied for their flavors (p or Λ) and their momentum vectors. The experimental results were compared with the expectations from a jet-fragmentation Monte Carlo simulation. We found that although the ratios of the cross sections of different baryon-flavor combinations are consistent with the Monte Carlo expectations, the cross section shows an excess over the Monte Carlo expectation in the low invariant-mass region of final-state particles at large angles, indicating a significant contribution from higher-order QCD or non-perturbative effects. The experimental data show no narrow azimuthal-angle correlation of the kind expected from a jet-fragmentation Monte Carlo. A search for exclusive Λ pair production was also made; no candidates were found, and an upper limit on the cross section was obtained.

10.
We have implemented the leading-color n-gluon amplitudes using the Berends–Giele recursion relations on a multi-threaded GPU. Speed-up factors between 150 and 300 are obtained compared to the CPU-based implementation of the same event generator. In this first paper, we study the feasibility of a GPU-based event generator with an emphasis on the constraints imposed by the hardware. Some studies of Monte Carlo convergence and accuracy are presented for pp → 2, …, 10 jet observables using of the order of 10¹¹ events.

11.
We present an implementation of the calculation of the production of W⁺W⁺ plus two jets at hadron colliders, at next-to-leading order (NLO) in QCD, in the POWHEG framework, a method that allows NLO calculations to be interfaced to shower Monte Carlo programs. This is the first 2 → 4 process to be described to NLO accuracy within a shower Monte Carlo framework. The implementation was built within the POWHEG BOX package. We discuss a few technical improvements that were needed in the POWHEG BOX to deal with the computationally intensive nature of the NLO calculation, and argue that further improvements are possible, so that the method can match the complexity reached today in NLO calculations. We have interfaced our POWHEG implementation with PYTHIA and HERWIG, and present some phenomenological results, discussing similarities and differences between the pure NLO and the POWHEG+PYTHIA calculations for both inclusive and more exclusive distributions. We have made the relevant code available at the POWHEG BOX web site.

12.
A short review is given of the quantum statistical Monte Carlo method based on the equivalence theorem [1] that d-dimensional quantum systems are mapped onto (d+1)-dimensional classical systems. The convergence property of this approximate transformation is discussed in detail. Some applications of this general approach to quantum spin systems are reviewed. A new Monte Carlo method, the "thermo field Monte Carlo method," is presented, which extends the projection Monte Carlo method from zero temperature to finite temperatures.
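The mapping rests on the Suzuki–Trotter decomposition; in its textbook form, for a Hamiltonian split into non-commuting parts H = A + B,

```latex
Z = \operatorname{Tr} e^{-\beta (A+B)}
  = \lim_{n \to \infty} \operatorname{Tr} \left( e^{-\beta A/n}\, e^{-\beta B/n} \right)^{n},
```

and the finite-n approximant is a classical system whose extra (Trotter) index supplies the additional dimension.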

13.
14.
This paper presents a parallel algorithm implemented on graphics processing units (GPUs) for rapidly evaluating spatial convolutions between the Helmholtz potential and a large-scale source distribution. The algorithm implements a non-uniform grid interpolation method (NGIM), which uses amplitude and phase compensation and spatial interpolation from a sparse grid to compute the field outside a source domain. NGIM reduces the computational cost of the direct field evaluation at N observers due to N co-located sources from O(N²) to O(N) in the static and low-frequency regimes, to O(N log N) in the high-frequency regime, and to a cost between these in the mixed-frequency regime. Memory requirements scale as O(N) in all frequency regimes. Several important differences between the CPU and GPU implementations of the NGIM are required to obtain optimal performance on the respective platforms. In particular, in the CPU implementations all operations, where possible, are pre-computed and stored in memory in a preprocessing stage. This reduces the computational time but significantly increases memory consumption. In the GPU implementations, where memory handling is often a critical bottleneck, several special memory-handling techniques are used to accelerate the computations. The significant latency of GPU global-memory access is hidden by coalesced reading, which requires arranging many array elements in contiguous parts of memory. In contrast to the CPU version, most steps in the GPU implementations are executed on the fly and only the necessary arrays are kept in memory. This results in significantly reduced memory consumption, an increased problem size N that can be handled, and reduced computational time on GPUs. The obtained GPU-CPU speed-up ratios range from 150 to 400 depending on the required accuracy and problem size. The presented method and its CPU and GPU implementations can find important applications in various fields of physics and engineering.
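A toy illustration of the precompute-versus-recompute trade-off described above, using a simple phase factor; everything here is a hypothetical stand-in, not the NGIM code.

```cuda
// CPU-style strategy ported naively: read a precomputed phase table
// (memory-bound, and the table costs O(N) extra storage).
__global__ void apply_phase_table(const float *table, float *field, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] *= table[i];
}

// GPU-style strategy: recompute the phase on the fly from the wavenumber k
// and source-observer distance r[i]; a few extra FLOPs, but no table, which
// frees memory and allows larger problem sizes N.
__global__ void apply_phase_onfly(const float *r, float *field, float k, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) field[i] *= cosf(k * r[i]);
}
```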

15.
A search for QCD-instanton-induced events in deep inelastic ep scattering has been performed with the ZEUS detector at the HERA collider, using data corresponding to an integrated luminosity of 38 pb⁻¹. A kinematic range defined by cuts on the photon virtuality, Q² > 120 GeV², and on the Bjorken scaling variable, x > 10⁻³, has been investigated. The QCD-instanton-induced events were modelled by the Monte Carlo generator QCDINS. A background-independent, conservative 95% confidence-level upper limit for the instanton cross section of 26 pb is obtained, to be compared with the theoretically expected value of 8.9 pb.

16.
We propose a new Monte Carlo method for calculating eigenvalues of transfer matrices, leading to free energies and correlation lengths of classical and quantum many-body systems. Generally, this method can be applied to the calculation of the maximum eigenvalue of a nonnegative matrix Â such that all the matrix elements of Âᵏ are strictly positive for some integer k. The method is based on a new representation of the maximum eigenvalue of the matrix Â as the thermal average of a certain observable of a many-body system. One can therefore easily calculate the maximum eigenvalue of a transfer matrix, and hence the free energy, in standard Monte Carlo simulations such as the Metropolis algorithm. As test cases, we calculate the free energies of the square-lattice Ising model and of the spin-1/2 XY Heisenberg chain. We also prove two useful theorems on ergodicity in quantum Monte Carlo algorithms, or more generally, on the ergodicity of Monte Carlo algorithms using our new representation of the maximum eigenvalue of the matrix Â.
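For context, the elementary identity underlying any stochastic estimate of the maximal eigenvalue of such a matrix is the Perron–Frobenius / power-method limit (a standard result, not the paper's specific thermal-average representation):

```latex
\lambda_{\max}(\hat{A}) \;=\; \lim_{k \to \infty}
  \frac{ v^{\mathsf{T}} \hat{A}^{\,k+1} v }{ v^{\mathsf{T}} \hat{A}^{\,k} v },
\qquad v \ \text{entrywise positive.}
```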

17.
Triple-differential dijet cross sections in ep interactions are presented in the region of photon virtualities 2 < Q² < 80 GeV², inelasticities 0.1 < y < 0.85, jet transverse energies E*_T,1 > 7 GeV and E*_T,2 > 5 GeV, and a restricted range of jet pseudorapidities. The measurements are made in the centre-of-mass frame, using an integrated luminosity of 57 pb⁻¹. The data are compared with NLO QCD calculations and LO Monte Carlo programs with and without a resolved virtual-photon contribution. The NLO QCD calculations fail to describe the region of low Q² and low jet transverse energies, in contrast to a LO Monte Carlo generator that includes direct and resolved photon interactions with both transversely and longitudinally polarised photons. Initial- and final-state parton showers are tested as a mechanism for including higher-order QCD effects in low-E_T jet production.

18.
We present two parallel implementations of the bond fluctuation model on graphics processors that outperform an equivalent single-CPU implementation by a factor of up to 50. The first algorithm is a parallelized version of an accelerated MC method published earlier in [S. Nedelcu, J.-U. Sommer, Single chain dynamics in polymer networks: a Monte Carlo study, J. Chem. Phys. 130 (2009) 204902]. In this first algorithm we use the parallel domain-decomposition technique to avoid monomer collisions. In contrast, in the second algorithm we associate each monomer with a parallel process, and moves are attempted for all monomers in the system simultaneously. In both cases, only monomer moves that result in allowed bonds and preserve lattice occupancy are accepted. To validate the correctness of the GPU algorithms we simulated monodisperse polymer melts at monomer number density 0.5 and compared static and dynamical properties with standard CPU implementations. We found good agreement between the CPU and GPU results, which demonstrates the equivalence of the serial and parallel implementations. The influence of higher monomer number density is discussed.

19.
The long-term goal of this work is to develop a robust simulator for studying electromagnetic (EM) scattering from vegetation. To cope with the intensive computational burden resulting from the large number of samples, we utilize the Graphics Processing Unit (GPU), which has advanced greatly in recent years. In this paper, the Compute Unified Device Architecture (CUDA) is combined with the four-path method to predict the EM scattering properties of scatterers sampled by the Monte Carlo method in a two-layer canopy model. A speedup of 77.8 times over the original serial algorithm on a Core(TM) i5 CPU is readily obtained with a GTS250 GPU as coprocessor.

20.
The transition temperature obtained from recent Monte Carlo calculations for the Quartet Ising model on the fcc lattice deviated by 17% from the exact transition temperature T_c^SD required by self-duality, which we have since proven. Here we use Monte Carlo results for the internal energy, which agree well with low- and high-temperature series, to determine the entropy and free energy, and obtain a T_c in excellent agreement (±0.1%) with the exact value. The Quartet model on the hcp lattice is shown to be self-dual as well; the rapidly converging series for the fcc and hcp lattices differ only at higher order.
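The standard thermodynamic-integration route from internal energy to free energy used in such analyses is, assuming q states per site so that the infinite-temperature entropy is N k_B ln q,

```latex
\beta F(\beta) \;=\; -N \ln q \;+\; \int_0^{\beta} U(\beta')\, \mathrm{d}\beta',
\qquad S = \frac{U - F}{T},
```

so an internal-energy curve that matches the low- and high-temperature series pins down S and F, and hence T_c.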
