Similar Documents
20 similar documents retrieved (search time: 250 ms)
1.
Molecular dynamics is a popular methodology for investigating the properties of liquids. In this article, the historical development of the subject and its current status will be briefly reviewed. The different parallelisation strategies that are commonly used are discussed, highlighting their relative strengths and weaknesses. Particular attention is given to the software and hardware aspects of implementing these algorithms on the 'Beowulf class' of parallel computers. Finally, three different examples of parallel molecular dynamics studies on a Beowulf computer will be discussed, which are indicative of the range of potential applications.
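To make the simplest of those parallelisation strategies concrete, here is a minimal Python sketch of replicated-data (atom-decomposition) Lennard-Jones force evaluation: every worker sees all positions but computes forces only for its own slice of atoms. The system size, reduced units, and lack of periodic boundaries are all illustrative assumptions; this is not the article's code.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

N = 512
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(N, 3))   # random coordinates, reduced units

def forces_for(slice_ids, pos=pos):
    """Lennard-Jones forces for one worker's slice of atoms (replicated data)."""
    f = np.zeros((len(slice_ids), 3))
    for k, i in enumerate(slice_ids):
        d = pos[i] - np.delete(pos, i, axis=0)        # vectors to all other atoms
        r2 = np.sum(d * d, axis=1)
        inv6 = r2 ** -3
        mag = 24.0 * (2.0 * inv6 ** 2 - inv6) / r2    # pair force magnitude / r
        f[k] = np.sum(mag[:, None] * d, axis=0)
    return f

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        chunks = np.array_split(np.arange(N), 4)      # atom decomposition
        f = np.vstack(list(pool.map(forces_for, chunks)))
```

Domain decomposition, the usual alternative on Beowulf clusters, would instead assign spatial cells to workers and communicate only halo regions.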

2.
According to the prevailing view to date, second-generation (2G) high-temperature superconducting (HTS) wires offer better in-field performance than first-generation (1G) wires. This study, however, finds that at liquid-nitrogen temperature and in low parallel fields (parallel to the wide face of the wire), the opposite is true. The reliability of this finding is confirmed by combining our own measurements with a survey of the literature. HTS cables offer significant performance advantages for long-distance and large-city power transmission and promise major technological change in China's electric power sector. In HTS cable and similar applications, low parallel field is the typical operating condition; based on the data obtained in this study, and taking price into account, 1G wires currently hold a clear relative advantage in these application areas.

3.
Neural network-based image processing algorithms present numerous advantages due to their supervised adjustable properties. Among various neural network architectures, dynamic neural networks such as Hopfield and Cellular networks have been found inherently suitable for filtering applications. Combining the supervised and filtering features of dynamic neural networks, this paper presents a dynamic neural filtering technique based on the Hopfield neural network architecture. The filtering technique has also been implemented using phase-only joint transform correlation (POJTC) for optical image processing applications. The filtering structure is basically similar to the Hopfield neural network structure, except for the adjustable filter mask and a 2D convolution operation in place of the weight matrix operations. The dynamic neural filtering architecture can be trained with the back-propagation learning algorithm. POJTC offers significant advantages in summing the cross-correlations of bipolar data by phase-encoding the bipolar data in parallel. The image feature extraction performance of the proposed optical system is reported for various image processing applications using a simulation program.
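As a rough illustration of the adjustable filter mask, the sketch below learns a 3x3 mask by gradient descent on a synthetic target, standing in for the back-propagation update of the paper's filter; the target filter, image size, iteration count, and learning rate are assumptions, and the optical POJTC stage is not modelled.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))                    # input image
true_w = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], float)
t = correlate2d(x, true_w, mode="valid")             # target = edge-filtered image

w = np.zeros((3, 3))                                 # learnable filter mask
lr = 1e-4
for _ in range(500):
    y = correlate2d(x, w, mode="valid")              # filtering = 2D correlation
    r = y - t                                        # residual
    grad = correlate2d(x, r, mode="valid")           # dL/dw for L = 0.5*||r||^2
    w -= lr * grad                                   # gradient-descent update
```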

4.
In magnetic resonance imaging, highly parallel imaging using coil arrays with a large number of elements is an area of growing interest. With increasing channel numbers for parallel acquisition, the increased reconstruction time and extensive computer memory requirements have become significant concerns. In this work, principal component analysis (PCA) is used to develop a channel compression technique. This technique efficiently reduces the size of parallel imaging data acquired from a multichannel coil array, thereby significantly reducing the reconstruction time and computer memory requirement without undermining the benefits of multichannel coil arrays. Clinical data collected with a 32-channel cardiac coil are used in all of the experiments. The performance of the proposed method on parallel, partially acquired data, as well as fully acquired data, was evaluated. Experimental results show that the proposed method dramatically reduces the processing time without considerable degradation in the quality of reconstructed images. It is also demonstrated that this PCA technique can be used to perform intensity correction in parallel imaging applications.
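A minimal sketch of SVD-based coil compression of the kind described, with random data standing in for the 32-channel clinical acquisitions; the 95% energy threshold and array sizes are assumptions, not the paper's settings.

```python
import numpy as np

# Synthetic stand-in for multichannel k-space: 32 coils, 256 x 256 samples.
n_coils, ny, nx = 32, 256, 256
rng = np.random.default_rng(0)
kspace = rng.standard_normal((n_coils, ny, nx)) + 1j * rng.standard_normal((n_coils, ny, nx))

# Stack every k-space sample as a row over coils, then take the principal
# components of the coil dimension via an SVD.
X = kspace.reshape(n_coils, -1).T                    # (n_samples, n_coils)
_, s, Vh = np.linalg.svd(X, full_matrices=False)

# Keep just enough virtual coils to retain 95% of the signal energy.
energy = np.cumsum(s**2) / np.sum(s**2)
n_virtual = int(np.searchsorted(energy, 0.95) + 1)

# Project onto the leading components: fewer channels enter reconstruction.
compressed = (X @ Vh.conj().T[:, :n_virtual]).T.reshape(n_virtual, ny, nx)
```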

5.
The Photon-Ion Spectrometer at PETRA III (PIPE, for short) is a permanently installed user facility at the "Variable Polarization XUV Beamline" P04 of the synchrotron light source PETRA III operated by DESY in Hamburg, Germany. The careful design of the PIPE ion optics, in combination with the record-high photon flux at P04, has led to a breakthrough in experimental studies of photon interactions with ionized small quantum systems. This short review provides an overview of the published scientific results from photon-ion merged-beams experiments at PIPE obtained since the start of P04 operations in 2013. The topics covered comprise photoionization of ions of astrophysical relevance, quantitative studies of multi-electron processes upon inner-shell photoexcitation and photoionization of negative and positive atomic ions, precision spectroscopy of photoionization resonances, and photoionization and photofragmentation of molecular ions and of endohedral fullerene ions.

6.
Taking an independently developed regional mesoscale heavy-rainfall atmospheric model as the research object, a component-based, hierarchical, large-scale and highly efficient parallel program for the regional atmospheric model is built on the JASMIN parallel programming framework. Using typical weather cases, the correctness, parallel performance, and high-resolution simulation quality of the parallel model code are verified. The results show that the new JASMIN-based model code is computationally consistent with the original serial model: it not only preserves the good forecasting skill of the original model, but also markedly improves large-scale parallel performance and scalability, and yields better forecasts when the model resolution is further increased.

7.
Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching, with different starting locations and some gaps and errors between the two data sequences. Perhaps the best-known application of sequence matching is the testing of DNA or protein sequences against genome databases. The Smith–Waterman algorithm is a method for precisely characterizing how well two sequences can be aligned and for determining the optimal alignment of those two sequences. Like many applications in computational science, the Smith–Waterman algorithm is constrained by memory access speed and can be accelerated significantly by using graphics processors (GPUs) as the compute engine. In this work we show that effective use of the GPU requires a novel reformulation of the Smith–Waterman algorithm. The performance of this new version of the algorithm is demonstrated using the SSCA#1 (Bioinformatics) benchmark running on one GPU and on up to four GPUs executing in parallel. The results indicate that for large problems a single GPU is up to 45 times faster than a CPU for this application, and the parallel implementation shows linear speed-up on up to four GPUs.
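The data dependence that makes a GPU reformulation possible is visible in an anti-diagonal ("wavefront") version of Smith–Waterman: every cell on anti-diagonal d = i + j depends only on diagonals d-1 and d-2, so a whole diagonal can be updated at once. Below is a minimal NumPy sketch with a linear gap penalty; the scoring values are assumptions, and this is not the SSCA#1 code.

```python
import numpy as np

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed by anti-diagonal wavefronts."""
    m, n = len(a), len(b)
    a_arr = np.frombuffer(a.encode(), dtype=np.uint8)
    b_arr = np.frombuffer(b.encode(), dtype=np.uint8)
    H = np.zeros((m + 1, n + 1))
    best = 0.0
    for d in range(2, m + n + 1):                       # one wavefront per step
        i = np.arange(max(1, d - n), min(m, d - 1) + 1)
        j = d - i                                       # cells with i + j = d
        sub = np.where(a_arr[i - 1] == b_arr[j - 1], match, mismatch)
        H[i, j] = np.maximum.reduce([                   # whole diagonal at once
            np.zeros(len(i)),
            H[i - 1, j - 1] + sub,                      # diagonal d-2
            H[i - 1, j] + gap,                          # diagonal d-1
            H[i, j - 1] + gap,                          # diagonal d-1
        ])
        best = max(best, H[i, j].max())
    return best

print(smith_waterman("GGTTGACTA", "TGTTACGG"))          # local alignment score
```

On a GPU, each wavefront maps naturally onto a vector of threads, which is the essence of the reformulation the abstract describes.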

8.
Echo Planar Imaging (EPI) is a neuroimaging tool for clinical practice and research investigation. Due to odd-even echo phase inconsistencies, however, EPI suffers from Nyquist N/2 ghost artifacts. In standard neuroimaging protocols, EPI artifacts are suppressed using phase correction techniques that require reference data collected from a reference scan. Because reference-scan based techniques are sensitive to subject motion, EPI performance is sub-optimal in neuroimaging applications. In this technical note, we present a novel EPI data processing technique which we call Parallel EPI Artifact Correction (PEAC). By introducing an implicit data constraint associated with multi-coil sensitivity in parallel imaging, PEAC converts phase correction into a constrained problem that can be resolved using an iterative algorithm. This enables "reference-less" EPI that can improve neuroimaging performance. In the presented work, PEAC is investigated using a standard functional magnetic resonance imaging (fMRI) protocol with multi-slice 2D EPI. It is demonstrated that PEAC can suppress ghost artifacts as effectively as the standard reference-scan based phase correction technique used on a clinical MRI system. We also found that PEAC can achieve dynamic phase correction when motion occurs.
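For orientation, the sketch below reproduces the underlying odd-even problem on synthetic data and removes the N/2 ghost with a brute-force search over a zeroth- and first-order odd-echo phase; this reference-less toy is not the authors' PEAC algorithm, which instead exploits multi-coil sensitivity constraints.

```python
import numpy as np

# Object confined to the centre rows, so N/2 ghosts land in rows known
# to be empty (illustration only; sizes and phase terms are assumed).
ny, nx = 64, 64
phantom = np.zeros((ny, nx))
phantom[24:40, 16:48] = 1.0

k = np.fft.fft2(phantom)                     # k-space, ky along axis 0
hyb = np.fft.ifft(k, axis=1)                 # hybrid (ky, x) space
x = np.arange(nx) - nx / 2
hyb[1::2] *= np.exp(1j * (0.3 + 0.05 * x))   # imposed odd-even inconsistency

def ghost_energy(a, b):
    h = hyb.copy()
    h[1::2] *= np.exp(-1j * (a + b * x))     # trial odd-echo correction
    img = np.abs(np.fft.ifft(h, axis=0))
    return img[:24].sum() + img[40:].sum()   # energy where the object is absent

# Brute-force search over zeroth- and first-order phase terms.
grid_a = np.linspace(-0.5, 0.5, 41)
grid_b = np.linspace(-0.1, 0.1, 41)
best = min((ghost_energy(a, b), a, b) for a in grid_a for b in grid_b)
print(best[1], best[2])                      # recovers ~0.3 and ~0.05
```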

9.
Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations.
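Powder averaging parallelises so well because crystallite orientations are independent. A minimal mpi4py sketch of the distributed-memory version follows; the axial line-position formula, Gaussian broadening, and grid sizes are illustrative assumptions, not the paper's simulation code.

```python
# Run with: mpiexec -n 4 python powder_mpi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_orient = 10000                              # powder orientations (assumed)
freqs = np.linspace(-50e3, 50e3, 2048)        # spectral axis in Hz

# Each rank sums lines for an interleaved slice of orientations; the
# orientations are independent, hence the near-perfect speedup reported.
local = np.zeros_like(freqs)
for kk in range(rank, n_orient, size):
    theta = np.arccos(1 - 2 * (kk + 0.5) / n_orient)   # polar-angle grid
    shift = 10e3 * (3 * np.cos(theta) ** 2 - 1) / 2    # axial anisotropy line
    local += np.exp(-0.5 * ((freqs - shift) / 500.0) ** 2)  # broadened line

spectrum = np.zeros_like(freqs) if rank == 0 else None
comm.Reduce(local, spectrum, op=MPI.SUM, root=0)       # sum partial spectra
if rank == 0:
    np.save("powder_spectrum.npy", spectrum)
```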

10.
The acquisition time of three-dimensional magnetic resonance imaging (3-D MRI) is too long to tolerate in many clinical applications. At present, parallel MRI (pMRI) and partial Fourier (PF) with homodyne detection, including 2-D pMRI and PF_pMRI (the combination of PF and pMRI), are often used to accelerate data sampling in 3-D MRI. However, the performances of 2-D pMRI and PF_pMRI have seldom been discussed. In this paper, we choose GRAPPA (generalized auto-calibrating partially parallel acquisition) as a representative pMRI method and compare the performance of 2-D GRAPPA and PF_GRAPPA in terms of noise standard deviation (SD), root-mean-square error (RMSE) and g factor through a series of in vitro experiments. Phantom experiments show that the SD, RMSE and g-factor values of PF_GRAPPA are lower than those of 2-D GRAPPA at the same acceleration factor, demonstrating that PF_GRAPPA performs better than 2-D GRAPPA. PF_GRAPPA can be used with any imaging slab thickness, while 2-D GRAPPA can only be used with thick slabs because of difficulties in determining the fitting coefficients, which result from imperfect RF pulses. In vivo brain experiment results also show that the performance of PF_GRAPPA is better than that of 2-D GRAPPA.
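The partial Fourier ingredient can be sketched in one dimension: homodyne reconstruction doubles the asymmetrically sampled half of k-space, estimates a low-resolution phase from the symmetric centre, demodulates that phase, and keeps the real part. Everything below (the object, the 5/8 sampling fraction) is assumed, and the GRAPPA fitting step is not shown.

```python
import numpy as np

n = 256
mag = np.exp(-((np.arange(n) - 128) / 40.0) ** 2)        # object magnitude
obj = mag * np.exp(1j * 0.5 * np.sin(2 * np.pi * np.arange(n) / n))  # smooth phase

k = np.fft.fftshift(np.fft.fft(obj))                     # DC at index n//2
n_acq = int(0.625 * n)                                   # acquire 5/8 of k-space
k_pf = np.zeros(n, complex)
k_pf[:n_acq] = k[:n_acq]

# Low-resolution phase estimate from the symmetric centre of k-space.
sym = np.zeros(n, complex)
sym[n - n_acq:n_acq] = k[n - n_acq:n_acq]
phase_est = np.exp(1j * np.angle(np.fft.ifft(np.fft.ifftshift(sym))))

# Homodyne weighting: double the asymmetric half, zero the unacquired half,
# then demodulate the phase and keep the real part.
w = np.ones(n)
w[:n - n_acq] = 2.0
w[n_acq:] = 0.0
img = np.fft.ifft(np.fft.ifftshift(k_pf * w))
recon = np.real(img * np.conj(phase_est))                # recovers ~mag
```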

11.
Typical applications of wireless sensor networks (WSN), such as in Industry 4.0 and smart cities, involve acquiring and processing large amounts of data in federated systems. Important challenges arise for machine learning algorithms in this scenario, such as reducing energy consumption and minimizing data exchange between devices in different zones. This paper introduces a novel method for accelerated training of parallel Support Vector Machines (pSVMs), based on ensembles, tailored to these kinds of problems. To achieve this, the training set is split into several Voronoi regions. These regions are small enough to permit faster parallel training of SVMs, reducing the computational load. Results from experiments comparing the proposed method with a single SVM and a standard ensemble of SVMs demonstrate that this approach can provide comparable performance while limiting the number of regions required to solve classification tasks. These advantages facilitate the development of energy-efficient policies in WSN.
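A minimal sketch of the region-wise ensemble idea using scikit-learn: k-means centres define the Voronoi regions, one SVM is trained per region (these fits are independent, so they could run in parallel on separate nodes), and samples are routed to the SVM of their nearest centre. The dataset, number of regions, and the single-class guard are assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# Split the training set into Voronoi regions; each region is small
# enough that its SVM trains quickly.
k = 8
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
experts = []
for c in range(k):
    mask = km.labels_ == c
    if np.unique(y[mask]).size < 2:           # degenerate region: one label only
        experts.append(int(y[mask][0]))
    else:
        experts.append(SVC(kernel="rbf").fit(X[mask], y[mask]))

def predict(X_new):
    """Route each sample to the SVM of its nearest Voronoi centre."""
    region = km.predict(X_new)
    out = np.empty(len(X_new), dtype=int)
    for c in range(k):
        m = region == c
        if m.any():
            out[m] = experts[c] if isinstance(experts[c], int) else experts[c].predict(X_new[m])
    return out
```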

12.
Heat transfer performance of low-temperature co-fired ceramic (LTCC) substrates with embedded microchannels
As the number of electronic components and the integration density accommodated by system-in-package (SIP) devices grow rapidly, traditional heat dissipation methods (thermal vias, air cooling, etc.) increasingly fail to meet the thermal management needs of system-level packaging. Using low-temperature co-fired ceramic (LTCC), one of the most common packaging substrate materials, three types of microchannels embedded in LTCC substrates were designed and fabricated: straight, serpentine, and spiral (0.3 mm high, with widths of 0.4, 0.5, and 0.8 mm, respectively). The effects of microchannel network structure, fluid mass flow rate, Reynolds number, and material thermal conductivity on the heat transfer performance of LTCC substrates with embedded microchannels were analyzed by combining numerical simulation with infrared thermal imaging measurements. The experiments show that with a deionized-water flow rate of 10 mL/min and an equivalent heat-source power of 2 W/cm², the maximum substrate temperature drops by 75.4 °C for the straight microchannel at an input pump pressure difference of 3.1 kPa, by 80.2 °C for the serpentine microchannel at 85.8 kPa, and by 86.7 °C for the spiral microchannel at 103.1 kPa. Among the three, the straight microchannel has the smallest Reynolds number and the best heat dissipation performance at the same input pump pressure difference. At the same channel layout density and flow rate, the narrow straight microchannel (0.4 mm) lowers the substrate temperature by 10 °C more than the wide one (0.8 mm). In addition, increasing the thermal conductivity of the packaging material helps improve the heat transfer performance of the microchannels.
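As a worked example of the Reynolds-number comparison, the short script below evaluates Re = ρvD_h/μ for the three channel widths, assuming room-temperature water properties and, for simplicity, that the full 10 mL/min passes through a single rectangular channel (the real substrate splits the flow among parallel branches, so these values are illustrative only).

```python
# Water properties at ~25 degC (assumed): rho = 997 kg/m^3, mu = 8.9e-4 Pa*s.
rho, mu = 997.0, 8.9e-4
Q = 10e-6 / 60.0                        # 10 mL/min in m^3/s (single channel assumed)
h = 0.3e-3                              # channel height, m

for w in (0.4e-3, 0.5e-3, 0.8e-3):
    A = w * h                           # cross-sectional area
    Dh = 2 * w * h / (w + h)            # hydraulic diameter of a rectangle
    v = Q / A                           # mean flow velocity
    Re = rho * v * Dh / mu
    print(f"width {w*1e3:.1f} mm: v = {v:.2f} m/s, Re = {Re:.0f}")
```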

13.
Signal-to-noise ratio (SNR) is a critical factor in MR-guided high-intensity focused ultrasound (HIFU) for local heating, as it affects the accuracy of temperature measurement. To achieve high SNR and higher temporal resolution, dedicated coil arrays for MR-guided HIFU applications need to be developed. In this work, a flexible 9-channel coil array was designed and constructed at 3 T to achieve fast temperature mapping for MR-guided HIFU applications on rabbit leg muscle. Coil performance was evaluated for SNR and parallel imaging capability in in-vivo studies. Compared to a commercially available 4-channel flexible coil array, the dedicated 9-channel coil array has a much higher SNR, with at least a 2.6-fold increase in the region of interest (ROI). The inverse g-factor maps demonstrated that the dedicated 9-channel coil array has better parallel imaging capability than the Flex Small 4. With acceleration normal to the array direction, both coil arrays showed much higher g-factors than with acceleration along the array direction. Room-temperature mapping was performed in vivo to evaluate temperature measurement accuracy. The precision of the 9-channel coil, ±0.18 °C without acceleration and ±0.56 °C with acceleration at R = 2 × 2, improved by an order of magnitude over that of the 4-channel coil, which was ±1.45 °C without acceleration and ±3.52 °C at R = 2 × 2. In fast temperature imaging of the heated rabbit leg muscle, a high temporal resolution of 3.3 s with a temperature measurement precision of ±0.56 °C was achieved using the dedicated 9-channel coil. This study demonstrates that the dedicated 9-channel coil array for rabbit leg imaging provides improved SNR, parallel imaging capability, and temperature measurement accuracy compared to a commercial 4-channel coil, and that it achieves fast temperature mapping in practical MR-guided HIFU applications.
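MR temperature maps of this kind are conventionally derived from the proton-resonance-frequency (PRF) shift; higher SNR in the phase images translates directly into better temperature precision. A minimal sketch of the phase-to-temperature conversion, with an assumed PRF coefficient and echo time (the paper's sequence parameters are not given here):

```python
import numpy as np

# PRF-shift thermometry constants (assumed): alpha = -0.01 ppm/degC,
# B0 = 3 T, TE = 10 ms.
GAMMA = 42.576e6        # proton gyromagnetic ratio, Hz/T
alpha = -0.01e-6        # PRF change coefficient, 1/degC
B0, TE = 3.0, 10e-3

def delta_T(phase_now, phase_ref):
    """Temperature change from the phase difference of two GRE images."""
    dphi = np.angle(np.exp(1j * (phase_now - phase_ref)))  # wrap to (-pi, pi]
    return dphi / (2 * np.pi * GAMMA * alpha * B0 * TE)

# Example: a 0.08 rad phase drop corresponds to about +1 degC here.
print(delta_T(0.0, 0.08))
```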

14.
This paper describes a coupling framework for the parallel execution of different solvers in multi-physics and multi-domain simulations with an arbitrary number of adjacent zones connected by different physical or overlapping interfaces. The coupling architecture is based on the execution of several instances of the same coupling code and relies on the use of smart edges (i.e., separate processes) dedicated to managing the exchange of information between two adjacent regions. The collection of solvers and coupling sessions forms a flexible and modular system, where the data exchange is handled by independent servers, each dedicated to a single interface connecting two solver sessions. The accuracy and performance of the strategy are assessed for turbomachinery applications involving Conjugate Heat Transfer (CHT) analysis and Sliding Plane (SP) interfaces.
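A toy version of the smart-edge idea with mpi4py: two single-rank "solver" processes exchange interface data only through a third process dedicated to their shared interface. The rank layout, the 1D interface, and the relaxation update are illustrative assumptions, not the paper's implementation.

```python
# Run with: mpiexec -n 3 python smart_edge.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n_iface, n_steps = 128, 5

if rank in (0, 1):                            # a solver session
    field = np.full(n_iface, float(rank))
    for step in range(n_steps):
        comm.Send(field, dest=2, tag=step)            # post interface state
        neighbour = np.empty(n_iface)
        comm.Recv(neighbour, source=2, tag=step)      # get partner's state
        field = 0.5 * (field + neighbour)             # toy relaxation update
else:                                         # the smart edge (interface server)
    for step in range(n_steps):
        a = np.empty(n_iface)
        b = np.empty(n_iface)
        comm.Recv(a, source=0, tag=step)
        comm.Recv(b, source=1, tag=step)
        # A real smart edge would interpolate/project between the two
        # grids here; this sketch just forwards the buffers.
        comm.Send(b, dest=0, tag=step)
        comm.Send(a, dest=1, tag=step)
```

Because each interface has its own dedicated process, adding more zones only adds more independent servers, which is what makes the system modular.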

15.
The computational capability of a coarse-grained reconfigurable array (CGRA) can be significantly restrained by data and context memory bandwidth bottlenecks. Traditionally, two methods have been used to resolve this problem. One method loads the context into the CGRA at run time. This method occupies very little on-chip memory but induces very large latency, which leads to low computational efficiency. The other method adopts a multi-context structure, loading the context into the on-chip context memory at the boot phase. Broadcasting the pointer of a set of contexts changes the hardware configuration on a cycle-by-cycle basis. The size of the context memory induces a large area overhead in multi-context structures, which results in major restrictions on application complexity. This paper proposes a Predictable Context Cache (PCC) architecture to address the above context issues by buffering the context inside the CGRA. In this architecture, context is dynamically transferred into the CGRA. Utilizing a PCC significantly reduces the on-chip context memory, and the complexity of the applications running on the CGRA is no longer restricted by its size. For the data bandwidth issue, data preloading is the most frequently used approach to hide input data latency and speed up data transmission. Rather than fundamentally reducing the amount of input data, it overlaps data transfer with computation. However, data preloading cannot work efficiently because data transmission becomes the critical path as the reconfigurable array scales up. This paper therefore also presents a Hierarchical Data Memory (HDM) architecture as a solution to this efficiency problem. In this architecture, high internal bandwidth is provided to buffer both reused input data and intermediate data. The HDM architecture relieves the external memory of the data transfer burden, significantly improving performance. As a result of using PCC and HDM, experiments running mainstream video decoding programs achieved performance improvements of 13.57%–19.48% with a reasonable memory size. Accordingly, 1080p@35.7fps H.264 high-profile video decoding can be achieved on the PCC and HDM architecture at a 200 MHz working frequency. Further, the size of the on-chip context memory no longer restricts complex applications, which execute efficiently on the PCC and HDM architecture.

16.
17.
Owing to their merits of being non-destructive, fast, and highly sensitive, optical techniques have been developed for experimental mechanics and optical measurement. In commercial optical systems, speed performance is increasingly important and real-time operation is pursued. Among the many acceleration methods, the use of parallel computing hardware has proven effective. In this paper, the main principles of parallel computing at the application level are introduced; the hardware platforms that support parallel computing are compared; and the applications of parallel computing in experimental mechanics and optical measurement are reviewed. Parallel hardware platforms are shown to be useful for accelerating a variety of problems. When the computation is time-consuming or real-time performance is required, hardware acceleration is an approach worth considering.

18.
A novel intrinsically decoupled transmit and receive radio-frequency coil element is presented for applications in parallel imaging and parallel excitation techniques in high-field magnetic resonance imaging. Decoupling is achieved by a twofold strategy: during transmission, elements are driven by current sources, while during signal reception, resonant elements are switched to a high-input-impedance preamplifier. To avoid B0 distortions caused by magnetic impurities or DC currents, a resonant transmission line is used to relocate electronic components away from the vicinity of the imaged object. The performance of a four-element array for a 3 T magnetic resonance tomograph is analyzed by means of simulation, measurements of electromagnetic fields, and bench experiments. The feasibility of parallel acquisition and parallel excitation is demonstrated and compared to that of a conventional power-source-driven array of equivalent geometry. Due to their intrinsic decoupling, the current-controlled elements are ideal basic building blocks for multi-element transmit and receive arrays of flexible geometry.

19.
An increasing number of massively-parallel supercomputers are based on heterogeneous node architectures combining traditional, powerful multicore CPUs with energy-efficient GPU accelerators. Such systems offer high computational performance with modest power consumption. As the industry trend of closer integration of CPU and GPU silicon continues, these architectures are a possible template for future exascale systems. Given the longevity of large-scale parallel HPC applications, it is important that there is a mechanism for easy migration to such hybrid systems. The OpenACC programming model offers a directive-based method for porting existing codes to run on hybrid architectures. In this paper, we describe our experiences in porting the Himeno benchmark to run on the Cray XK6 hybrid supercomputer. We describe the OpenACC programming model and the changes needed in the code, both to port the functionality and to tune the performance. Despite the additional PCIe-related overheads when transferring data from one GPU to another over the Cray Gemini interconnect, we find the application gives very good performance and scales well. Of particular interest is the facility to launch OpenACC kernels and data transfers asynchronously, which speeds the Himeno benchmark by 5%–10%. Comparing performance with an optimised code on a similar CPU-based system (using 32 threads per node), we find the OpenACC GPU version to be just under twice the speed in a node-for-node comparison. This speed-up is limited by the computational simplicity of the Himeno benchmark and is likely to be greater for more complicated applications.

20.
This paper describes a number of array post-processing methods developed for scanning applications in non-destructive evaluation. The approach is to capture and process the full matrix of all transmit-receive time-domain signals from the array. Post-processing the data in this way enables a multitude of imaging modalities to be implemented, including many that could not feasibly be achieved using conventional parallel firing techniques. The authors have previously published work on imaging algorithms for improving the characterisation of defects in solids by post-processing the data from a static linear ultrasonic array. These algorithms are extended and applied to data from a scanned array. This allows the effective aperture and range of probing angles to be increased, hence improving imaging and defect characterisation performance. Practical implementation issues such as scanning speed and data transfer are discussed.
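One widely used full-matrix post-processing modality is delay-and-sum focusing at every image point (the Total Focusing Method). The sketch below builds synthetic full-matrix data for a single point scatterer and images it; the array geometry, wave speed, and sampling rate are assumptions rather than the paper's setup.

```python
import numpy as np

# fmc[t, r, s]: trace received on element r after firing element s.
c, fs = 5900.0, 50e6                     # wave speed (m/s), sampling rate (Hz)
n_el, pitch, nt = 16, 0.6e-3, 2048
el_x = (np.arange(n_el) - (n_el - 1) / 2) * pitch   # element x positions, z = 0

# Synthesise FMC data for one scatterer at (0, 10 mm): a unit spike at the
# transmit-plus-receive time of flight for every element pair.
d = np.hypot(el_x - 0.0, 10e-3)
fmc = np.zeros((nt, n_el, n_el))
for s in range(n_el):
    for r in range(n_el):
        fmc[int(round((d[s] + d[r]) / c * fs)), r, s] = 1.0

# Post-process: focus every transmit-receive pair at every image pixel.
xs = np.linspace(-5e-3, 5e-3, 81)
zs = np.linspace(5e-3, 15e-3, 81)
rr, ss = np.meshgrid(np.arange(n_el), np.arange(n_el), indexing="ij")
img = np.zeros((len(zs), len(xs)))
for iz, z in enumerate(zs):
    for ix, xpix in enumerate(xs):
        dist = np.hypot(el_x - xpix, z)                  # pixel-to-element paths
        idx = np.round((dist[rr] + dist[ss]) / c * fs).astype(int)
        img[iz, ix] = fmc[np.clip(idx, 0, nt - 1), rr, ss].sum()
# img peaks at the scatterer; merging datasets from a scanned array would
# extend the effective aperture in the same delay-and-sum framework.
```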
