Found 20 similar articles.
1.
Carolyn L. Phillips Joshua A. Anderson Sharon C. Glotzer 《Journal of computational physics》2011,230(19):7191-7201
Brownian Dynamics (BD), also known as Langevin Dynamics, and Dissipative Particle Dynamics (DPD) are implicit solvent methods commonly used in models of soft matter and biomolecular systems. The interaction of the numerous solvent particles with larger particles is coarse-grained as a Langevin thermostat applied to individual particles or to particle pairs. The Langevin thermostat requires a pseudo-random number generator (PRNG) to generate the stochastic force applied to each particle or pair of neighboring particles during each time step in the integration of Newton’s equations of motion. In a Single-Instruction-Multiple-Thread (SIMT) GPU parallel computing environment, small batches of random numbers must be generated over thousands of threads and millions of kernel calls. In this communication we introduce a one-PRNG-per-kernel-call-per-thread scheme, in which a micro-stream of pseudorandom numbers is generated in each thread and kernel call. These high-quality, statistically robust micro-streams require no global memory for state storage, are more computationally efficient than other PRNG schemes in memory-bound kernels, and uniquely enable the DPD simulation method without requiring communication between threads.
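The micro-stream idea in this abstract can be illustrated with a small sketch. It is only a sketch under stated assumptions: the hash (a SplitMix64-style mixing function), the key layout, and the names `mix64`, `micro_stream`, and `pair_key` are hypothetical choices made here for illustration, not the generator from the paper, and no statistical validation is implied. The point it shows is that each stream is derived on the fly from (seed, time step, key), so no generator state has to be stored between calls, and a symmetric pair key lets two threads derive the same random numbers for a particle pair without communicating.

```python
MASK64 = (1 << 64) - 1

def mix64(x):
    """SplitMix64-style mixing of a 64-bit integer (assumed hash, not the paper's)."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

def micro_stream(seed, timestep, key, count):
    """Return `count` uniforms in [0, 1) that depend only on (seed, timestep, key)."""
    # Derive a stream-specific base value; nothing is read from or written to stored state.
    base = mix64(mix64(mix64(seed) ^ timestep) ^ key)
    out = []
    for k in range(count):
        bits = mix64((base + k) & MASK64)
        out.append((bits >> 11) / float(1 << 53))  # keep the top 53 bits
    return out

def pair_key(i, j):
    """Symmetric key: particles i and j (indices below 2**32) map to the same stream."""
    lo, hi = (i, j) if i < j else (j, i)
    return (hi << 32) | lo
```

In a DPD-like setting, the thread handling particle 3 and the thread handling particle 7 would both call `micro_stream(seed, step, pair_key(3, 7), n)` and obtain identical values for their shared pair.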
2.
3.
This paper presents a parallel implementation of fractional solvers for the incompressible Navier–Stokes equations using an algebraic approach. Under this framework, predictor–corrector and incremental projection schemes are seen as sub-classes of the same class, making their differences and similarities apparent. An additional advantage of this approach is that it sets a common basis for a parallelization strategy, which can be extended to other split techniques or to compressible flows.
4.
5.
Massimo Bernaschi Mauro Bisson Massimiliano Fatica 《The European Physical Journal B - Condensed Matter and Complex Systems》2015,88(6):158
Graphics processing units (GPUs) are currently used as a cost-effective platform for computer simulations and big-data processing. Large-scale applications require that multiple GPUs work together, but the efficiency obtained with clusters of GPUs is, at times, sub-optimal because the GPU features are not exploited at their best. We describe how it is possible to achieve an excellent efficiency for applications in statistical mechanics, particle dynamics and network analysis by using suitable memory access patterns and mechanisms like CUDA streams, profiling tools, etc. Similar concepts and techniques may also be applied to other problems, such as the solution of Partial Differential Equations.
6.
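Pulsed power technology is widely used in industrial and biomedical applications, many of which require high-voltage pulses with currents of several hundred amperes. Although solid-state Marx generators have been studied for many years, the widely used through-hole packaged IGBT and MOSFET power semiconductor switches are usually rated below 100 A, which cannot meet the needs of low-impedance loads. To raise the output pulse current amplitude, two topologies for a pulsed power supply built from multiple parallel Marx generators are proposed: in the first, multiple Marx generators are connected directly in parallel; in the second, the parallel Marx generators share one set of charging switches. An FPGA provides the charge and discharge control signals, an isolated drive scheme based on magnetic cores threaded in series provides synchronous driving with negative voltage bias, and the main circuit is a half-bridge solid-state square-wave Marx circuit whose main switches are IGBTs with fast turn-on and high current capability. Experimental results show that a pulse generator with 6 directly paralleled 16-stage Marx circuits can output high-voltage square-wave pulses at a repetition rate of 100 Hz with an amplitude of up to 10 kV, a peak current of up to 300 A into a 30 Ω load, and a rise time of 230 ns. A generator of 6 paralleled 4-stage Marx circuits sharing the charging switches delivers a peak output current of up to 300 A into a 5 Ω resistive load, a maximum output current of up to 460 A, and a rise time of 272 ns. These results indicate that paralleling multiple Marx generators effectively reduces the internal resistance of the system and improves its load-driving capability. The improved parallel scheme achieves high-current pulse output while using nearly half as many switches, which improves the noise immunity of the system and lowers the cost of the pulsed power supply; adding parallel wires between stages further improves current sharing.
7.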
A new multi-block hybrid compact–WENO finite-difference method for the massively parallel computation of compressible flows is presented. In contrast to earlier methods, our approach breaks the global dependence of compact methods by using explicit finite-difference methods at block interfaces and is fully conservative. The resulting method is fifth- and sixth-order accurate for the convective and diffusive fluxes, respectively. The impact of the explicit interface treatment on the stability and accuracy of the multi-block method is quantified for the advection and diffusion equations. Numerical errors increase slightly as the number of blocks is increased. It is also found that the maximum allowable time steps increase with the number of blocks. The method demonstrates excellent scalability on up to 1264 processors.
8.
Magnetic Resonance Imaging (MRI) uses non-ionizing radiation and is safer than CT and X-ray imaging. MRI is widely used around the globe for medical diagnostics. One main limitation of MRI is its long data acquisition time. Parallel MRI (pMRI) was introduced in the late 1990s to reduce the MRI data acquisition time. In pMRI, data is acquired by under-sampling the Phase Encoding (PE) steps, which introduces aliasing artefacts in the MR images. SENSitivity Encoding (SENSE) is a pMRI-based method that reconstructs a fully sampled MR image from the acquired under-sampled data using the sensitivity information of the receiver coils. In SENSE, precise estimation of the receiver coil sensitivity maps is vital to obtain good quality images. The Eigen-value method (a method recently proposed in the literature for estimating receiver coil sensitivity information) does not require a pre-scan image, unlike other conventional methods of sensitivity estimation. However, the Eigen-value method is computationally intensive and takes a significant amount of time to estimate the receiver coil sensitivity maps. This work proposes a parallel framework for the Eigen-value method of receiver coil sensitivity estimation that exploits its inherent parallelism using Graphics Processing Units (GPUs). We evaluated the performance of the proposed algorithm on in-vivo and simulated MRI datasets (i.e. human head and simulated phantom datasets) with Peak Signal-to-Noise Ratio (PSNR) and Artefact Power (AP) as evaluation metrics. The results show that the proposed GPU implementation reduces the execution time of the Eigen-value method of receiver coil sensitivity estimation (providing up to 30 times speed up in our experiments) without degrading the quality of the reconstructed image.
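The two evaluation metrics named in this abstract can be sketched as follows. The abstract does not give its formulas, so the definitions below are assumptions: PSNR as 10·log10(peak² / MSE), and artefact power as the squared magnitude difference between reference and reconstruction, normalized by the squared magnitude of the reference. Images are represented as flat lists of pixel values, and the function names are hypothetical.

```python
import math

def psnr(reference, reconstructed, peak=None):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if peak is None:
        peak = max(reference)  # assumed convention: peak taken from the reference image
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def artefact_power(reference, reconstructed):
    """Normalized squared magnitude error; 0 means no residual artefacts."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    num = sum((abs(r) - abs(x)) ** 2 for r, x in zip(reference, reconstructed))
    den = sum(abs(r) ** 2 for r in reference)
    return num / den
```

Higher PSNR and lower artefact power both indicate a reconstruction closer to the reference.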
9.
A set of parallel replicas of a single simulation can be statistically coupled to closely approximate long trajectories. In many cases, this produces nearly linear speedup over a single simulation (M times faster with M simulations), bringing previously intractable problems within reach of large computer clusters. Interestingly, by varying the coupling of the parallel simulations, it is possible in some systems to obtain greater than linear speedup. The methods are generalizable to any search algorithm with long residence times in intermediate states.
10.
We perform a proof-of-concept implementation of the massively parallel algorithm [P. M. Lushnikov, Opt. Lett. 27, 939 (2002)] for simulation of dispersion-managed wavelength-division-multiplexed optical fiber systems. Linear scalability of the algorithm with the number of computer cores is demonstrated. An exact result for the accuracy of the implemented algorithm is found analytically, confirmed numerically, and compared with the accuracy of the standard split-step algorithm.
11.
12.
We study the asymptotic scaling properties of a massively parallel algorithm for discrete-event simulations where the discrete events are Poisson arrivals. The evolution of the simulated time horizon is analogous to a nonequilibrium surface. Monte Carlo simulations and a coarse-grained approximation indicate that the macroscopic landscape in the steady state is governed by the Edwards-Wilkinson Hamiltonian. Since the efficiency of the algorithm corresponds to the density of local minima in the associated surface, our results imply that the algorithm is asymptotically scalable.
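The link between efficiency and the density of local minima can be illustrated with a toy model. The update rule below is an assumption, since the abstract does not spell it out: processing elements sit on a ring, each holds a local simulated time, and an element may advance only if its time does not exceed that of either neighbor; it then advances by an exponentially distributed increment (Poisson arrivals). The function name and parameters are hypothetical.

```python
import random

def simulate_time_horizon(num_pes, steps, seed=0):
    """Evolve local simulated times on a ring; return (times, utilization per step)."""
    rng = random.Random(seed)
    tau = [0.0] * num_pes  # simulated time horizon, one entry per processing element
    utilization = []
    for _ in range(steps):
        # An element is active if it is a local minimum of the horizon (ties allowed).
        # All decisions use the horizon as it was at the start of the step.
        active = [
            i for i in range(num_pes)
            if tau[i] <= tau[i - 1] and tau[i] <= tau[(i + 1) % num_pes]
        ]
        for i in active:
            tau[i] += rng.expovariate(1.0)  # waiting time of a unit-rate Poisson process
        utilization.append(len(active) / num_pes)
    return tau, utilization
```

The fraction of active elements per step is the density of local minima of the horizon, which is the quantity the abstract identifies with the efficiency of the algorithm. Because the global minimum is always a local minimum, at least one element advances in every step.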
13.
Andrade X Alberdi-Rodriguez J Strubbe DA Oliveira MJ Nogueira F Castro A Muguerza J Arruabarrena A Louie SG Aspuru-Guzik A Rubio A Marques MA 《J Phys Condens Matter》2012,24(23):233202
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
14.
15.
This paper discusses the use of aperiodic (binary or ternary) sequences in combining pseudorandom number generators (RNG).
We introduce a method for combining two or three RNGs using cut and project sequences. This combination method produces aperiodic
number sequences having no lattice structure. Theoretical results are announced.
This work was partially supported by the Bell Canada University Laboratory, NSERC of Canada and FCAR of Québec.
Presented by L.-S. Guimond at the DI-CRM Workshop held in Prague, 18–21 June 2000.
16.
The paper focuses on the practical application of parallel computing techniques to uncertainty assessment in the simulation of heat transfer, mechanical, and some other problems related to deterministic analysis of NPP safety. A methodology is developed and implemented in the VARIA computer code, which runs multiple simulations simultaneously on a parallel computing system and then performs statistical analysis of the array of their results. The current version of the code allows automated preparation and execution of multivariate simulations of the thermal and mechanical behavior of pressurized water reactor structures by best-estimate (BE) codes in the scope of NPP safety assessment under severe accident conditions. The number of simultaneously launched tasks is limited only by the computer cluster capacity. The VARIA code is verified on a multivariate simulation, with the HEFEST code, of the thermal behavior of a core melt in the VVER-440 reactor vessel during a severe accident. The influence of the variation of input parameters (decay heat value and coefficients of the applied convective heat transfer model) on the simulation results is studied. It is concluded that the potential field of application of the program extends beyond the analysis of severe accidents at NPPs and also includes software product quality assurance and analysis of the uncertainties of the obtained simulation results.
17.
We have measured the susceptibility of a three-dimensional Ising system in a box of size 24³. Our results do not agree within four standard deviations with the previous result obtained with a special-purpose machine. The discrepancy is due, in our opinion, to the different random number generators used.
18.
High efficiency redundant binary number representations for parallel arithmetic on optical computers
A family of redundant binary number representations, obtained by generalization of the RB (redundant binary) number representation, is introduced. All these number representations are suitable for optical computing and have properties similar to the RB representation. In particular, the p-RB (packed redundant binary) number representation introduced in this work has efficiency greater than both RB and MSD (modified signed digit) representations. With p-RB numbers the algebraic sum is always permitted in constant time for any efficiency value. p-RB representations also fit in a natural way the 2's complement binary number system. Symbolic substitution truth tables for the algebraic sum and several examples of computation are also given.
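As background for the RB representation this abstract generalizes, here is a minimal sketch of plain redundant binary numerals, assuming the usual digit set {-1, 0, 1} with radix 2. It shows only why the representation is redundant and how a numeral maps back to an ordinary integer; it does not implement the p-RB packing or the constant-time sum from the paper, and the function names are hypothetical.

```python
def rb_value(digits):
    """Integer value of a redundant binary numeral, most significant digit first."""
    value = 0
    for d in digits:
        if d not in (-1, 0, 1):
            raise ValueError("redundant binary digits must be -1, 0, or 1")
        value = 2 * value + d
    return value

def rb_split(digits):
    """Split a numeral into two ordinary binary words (positive, negative).

    The numeral's value equals positive - negative, which is one way to
    convert back to a conventional binary number.
    """
    positive = 0
    negative = 0
    for d in digits:
        positive = 2 * positive + (1 if d == 1 else 0)
        negative = 2 * negative + (1 if d == -1 else 0)
    return positive, negative
```

For example, `[0, 1]`, `[1, -1]`, and `[1, -1, -1]` all evaluate to 1, which is the redundancy that lets carries be absorbed locally during addition.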
19.
Diego Rossinelli Michael Bergdorf Georges-Henri Cottet Petros Koumoutsakos 《Journal of computational physics》2010,229(9):3316-3333
We present a GPU-accelerated solver for simulations of bluff body flows in 2D using a remeshed vortex particle method and the vorticity formulation of the Brinkman penalization technique to enforce boundary conditions. The efficiency of the method relies on fast and accurate particle-grid interpolations on GPUs for the remeshing of the particles and the computation of the field operators. The GPU implementation uses OpenGL so as to perform efficient particle-grid operations and a CUFFT-based solver for the Poisson equation with unbounded boundary conditions. The accuracy and performance of the GPU simulations and their relative advantages/drawbacks over CPU-based computations are reported in simulations of flows past an impulsively started circular cylinder at Reynolds numbers between 40 and 9500. The results indicate up to two orders of magnitude speed up of the GPU implementation over the respective CPU implementations. The accuracy of the GPU computations depends on the Re number of the flow. For Re up to 1000 there is little difference between GPU and CPU calculations, but this agreement deteriorates (albeit remaining to within 5% in drag calculations) for higher Re numbers as the single precision of the GPU adversely affects the accuracy of the simulations.
20.
X. Fabian F. Mauger G. Quéméner Ph. Velten G. Ban C. Couratin P. Delahaye D. Durand B. Fabre P. Finlay X. Fléchard E. Liénard A. Méry O. Naviliat-Cuncic B. Pons T. Porobic N. Severijns J. C. Thomas 《Hyperfine Interactions》2015,232(1-3):87-95
We performed classical molecular dynamics (MD) simulations to search for the conditions for efficient sympathetic cooling of highly charged ions (HCIs) in a linear Paul trap. Small two-component ion Coulomb crystals consisting of laser-cooled ions and HCIs were characterized by the results of the MD simulations. We found that the spatial distribution is determined not only by the charge-to-mass ratio but also by the space charge effect. Moreover, the simulation results suggest that the temperature of HCIs does not necessarily decrease as the number of laser-cooled ions increases in the case of linear ion crystals. We also determined the cooling limit of sympathetically cooled ¹⁶⁵Ho¹⁴⁺ ions in small linear ion Coulomb crystals. The present results show that sub-milli-Kelvin temperatures of at least 10 Ho¹⁴⁺ ions will be achieved by sympathetic cooling with a single laser-cooled Be⁺.