
1.

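王晓青 王向军 《光学学报》2019,39(3):266-272

A real-time object detection algorithm for embedded graphics processing units (GPUs) is proposed. Because embedded platforms have few compute units and limited processing speed, an improved lightweight object detection model based on the YOLO-V3 (You Only Look Once, Version 3) architecture is presented. The model is trained offline on vehicle targets, and the trained model is then deployed on the embedded platform for online detection. Experiments show that, on the embedded platform, the proposed method processes 640 pixel × 480 pixel video at more than 23 frame/s.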

2.

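Small-target detection against a sea-sky background based on gray-level characteristics. Total citations: 3; self-citations: 0; citations by others: 3

董宇星 刘伟宁 《中国光学》2010,15(3):252-256

Small-target detection against a complex sea-sky background suffers from interference such as waves and clouds. An algorithm is proposed that first extracts the sea-sky line and then applies one-dimensional maximum-entropy threshold segmentation to detect targets in specific regions in the sky, on the sea surface, or near the sea-sky line. The algorithm uses the row-wise mean gray level of the sky and the sea surface, together with gradient and morphological operations, to detect edges at candidate positions of the sea-sky line, and then fits the line with the robust Hough-transform line detector to locate it accurately. For a 256 pixel × 256 pixel bitmap, locating the sea-sky line takes 4.1 ms and detecting the target takes 5.3 ms, which meets the real-time requirements of high-frame-rate image processing. Experiments show that the algorithm detects small targets quickly and accurately and greatly reduces the false-alarm rate.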
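
The one-dimensional maximum-entropy thresholding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common Kapur-style formulation (choose the gray level that maximizes the summed entropy of the two classes); the abstract does not say which variant the authors used, and the function name `max_entropy_threshold` is hypothetical.

```python
import math

def max_entropy_threshold(hist):
    """Return the gray level t that maximizes the summed entropy of the
    classes [0..t] and [t+1..end] of a gray-level histogram.
    Assumption: Kapur-style criterion, not necessarily the paper's variant."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    p = [h / total for h in hist]          # normalized histogram
    best_t, best_score = 0, float("-inf")
    cum = 0.0                              # probability mass of the low class
    for t in range(len(p) - 1):
        cum += p[t]
        if cum <= 0.0 or cum >= 1.0:
            continue                       # one class would be empty
        h_low = -sum(q / cum * math.log(q / cum)
                     for q in p[: t + 1] if q > 0)
        h_high = -sum(q / (1 - cum) * math.log(q / (1 - cum))
                      for q in p[t + 1:] if q > 0)
        if h_low + h_high > best_score:
            best_t, best_score = t, h_low + h_high
    return best_t

# A toy 8-level histogram with a dark cluster and a bright cluster.
print(max_entropy_threshold([10, 10, 0, 0, 0, 0, 10, 10]))
```

In the pipeline described above, such a threshold would be applied only inside the regions selected around the sea-sky line, not to the whole frame.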

3.

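Methods for improving the real-time performance of an R-OTDR system

程碧钊 王东 王宇 靳宝全 《光学技术》2019,45(6):701-706

In long-distance distributed Raman temperature sensing (R-OTDR) systems, the large data volume limits real-time performance. A method is proposed that uses parallel computation to perform high-count cumulative averaging, improving both the signal-to-noise ratio and the real-time performance of the system. A central processing unit (CPU) and a graphics processing unit (GPU) cooperate to speed up data processing. The CPU controls the acquisition hardware to read the sensing data, which are then denoised by cumulative averaging. Under the Compute Unified Device Architecture (CUDA), kernel functions on the GPU distribute the averaging passes across threads, and the computation runs in a 10-thread mode. Experiments show that, over a 10 km sensing distance, using 50000 averages instead of the original system's 30000 reduces the measurement error from ±1.1 °C to ±0.5 °C, and that CPU-GPU cooperation reduces the computation time for 50000 averages from 3890 ms to 1.472 ms, improving real-time performance.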
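
Why more averaging passes lower the error can be shown with a small serial sketch. It is not the authors' CUDA code: the flat test signal, the Gaussian noise model, and the names `accumulate_average` and `noisy_trace` are assumptions made for illustration. Each sample position is averaged independently of the others, which is what makes the operation easy to spread across GPU threads.

```python
import random
import statistics

def accumulate_average(traces):
    """Average repeated traces point by point."""
    n = len(traces)
    sums = [0.0] * len(traces[0])
    for trace in traces:
        for i, v in enumerate(trace):
            sums[i] += v                   # each index i is independent work
    return [s / n for s in sums]

def noisy_trace(signal, sigma, rng):
    """Hypothetical measurement: the true signal plus Gaussian noise."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

rng = random.Random(0)
signal = [1.0] * 200                       # flat toy signal (assumption)
few = accumulate_average([noisy_trace(signal, 1.0, rng) for _ in range(10)])
many = accumulate_average([noisy_trace(signal, 1.0, rng) for _ in range(1000)])

# Residual noise should shrink roughly as 1/sqrt(number of averages).
print(statistics.pstdev(few), statistics.pstdev(many))
```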

4.

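Parallel optimization of a moving-target segmentation algorithm on an embedded GPU

张刚 马震环 雷涛 崔毅 张三喜 《应用光学》2019,40(6):1067-1076

In electro-optical surveillance systems, the PBAS (pixel-based adaptive segmenter) algorithm, widely used for moving-target segmentation, is computationally complex and has many parameters, which makes real-time segmentation difficult. Because PBAS processes every pixel of the image independently and is therefore well suited to GPU parallel acceleration, a parallel, optimized implementation was developed on the embedded GPU platform Jetson TX2. The algorithm was optimized in three respects: data storage layout, use of shared memory, and the random-number generation mechanism. Experiments show that, for a mid-wave infrared video sequence at 480×320 pixels, the optimized parallel method reaches 132 fps, which meets real-time processing requirements.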

5.

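Infrared dim and small target detection algorithm based on an embedded GPU

范鹏程 张卫国 刘万刚 张卫 黄维东 刘国栋 徐晓枫 《应用光学》2020,41(5):1089-1095

Infrared dim and small targets occupy few pixels and have low contrast, while the imaging frame rate is high, the image data volume is large, and detection must run in real time. Because the detection algorithm for such targets is well suited to GPU parallel computing, a parallel, optimized implementation was developed on the embedded GPU platform Jetson TX2. Optimizations were made in three areas: detection algorithm design, memory access, and debugging and tuning. Experiments show that, for infrared video at 640×480 pixels, the optimized parallel detection algorithm completes its computation within 10 ms, which meets real-time processing requirements.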

6.

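Moving-target detection in dynamic images based on a stability matrix

郝志成 吴川 《光学学报》2009,29(11)

To extract moving targets from video sequences quickly and accurately, an adaptive background-model estimation method is proposed. Exploiting the different temporal behavior of background and foreground, a stability matrix function of the image is constructed. Changes in the matrix elements automatically separate background points from foreground points, and upper and lower saturation values are set on the stability matrix so that the algorithm senses abrupt background changes within a short time; the background model can thus be built quickly and updated in real time. The registration of dynamic image sequences is also analyzed: local feature template images are selected, and projection matching is used to simplify the computation and quickly estimate the global motion vector. Experiments show that the background estimation converges quickly, needing only 10 frames to establish a stable background. For a 500 pixel × 200 pixel dynamic image sequence, the whole algorithm takes only 35 ms, which fully meets the engineering requirement of 25 frame/s.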

7.

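Improved Canny image edge detection algorithm

李俊山 赵方舟 郭莉莎 马颖 《光子学报》2011,(z1):50-54

Edge detection is a very important step in image processing. To resolve the conflict between noise suppression and detail preservation in edge detection, an improved edge detection algorithm based on the Canny operator is proposed. A wavelet transform and an improved adaptive median filter replace the Gaussian filter of the Canny algorithm for denoising, and a 3×3 neighborhood replaces the 2×2 neighborhood of the Canny algorithm for computing the gradient magnitude. Simulation results demonstrate the performance of the improved algorithm...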
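
Computing the gradient magnitude over a 3×3 neighborhood can be sketched as follows. The abstract does not state which 3×3 template the authors use, so the Sobel templates here are an assumption, and `gradient_magnitude_3x3` is a hypothetical name.

```python
import math

def gradient_magnitude_3x3(img):
    """Gradient magnitude from 3x3 Sobel templates (assumed template).
    img is a list of rows of numbers; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out

# A vertical step edge: the response is concentrated on the interior columns.
step = [[0, 0, 10, 10] for _ in range(4)]
print(gradient_magnitude_3x3(step)[1])
```

Compared with a 2×2 difference, the wider support averages over more pixels, which is one plausible reason for preferring it when noise is a concern.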

8.

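Improvement of the Canny algorithm and its hardware implementation. Total citations: 16; self-citations: 0; citations by others: 16

韦海萍 赵保军 唐林波 何佩琨 《光学技术》2006,32(2):263-266

The traditional Canny edge detection algorithm first smooths the image and then computes the image gradient with two-dimensional filters to obtain a gradient histogram. Dual thresholds are selected from the gradient histogram, and non-maximum suppression based on the dual thresholds finally yields the edge image. To make the algorithm easy to implement in hardware, the traditional Canny algorithm is suitably modified: templates replace the convolution of the original algorithm, so that the algorithm can be implemented in an FPGA (Field Programmable Gate Array) entirely with schematics.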

9.

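Multi-GPU acceleration of the SART iterative reconstruction algorithm based on CUDA

雷德川 陈浩 王远 张成鑫 陈云斌 胡栋材 《强激光与粒子束》2013,25(9):2418-2422

To reduce the long computation time of the SART iterative reconstruction algorithm, a multi-GPU accelerated iterative reconstruction algorithm is proposed that builds on a single-GPU implementation and uses the parallel computing capability of multiple GPUs. Experiments show that, compared with CPU reconstruction, GPU reconstruction is markedly faster without degrading the quality of the reconstructed image, and that adding GPUs speeds up reconstruction further.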

10.

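GPU-based parallel implementation of wavefront generation for a liquid-crystal atmospheric turbulence simulator

倪小龙 刘智 孔悦 刘丹 《强激光与粒子束》2014,26(3):031011-71

To give a liquid-crystal atmospheric turbulence simulator real-time simulation capability, a GPU-based real-time wavefront generation method is proposed under a general-purpose GPU computing architecture. The turbulence wavefront generation method is described with regard to the simulator's high resolution and high precision, and the CUDA general-purpose computing architecture is discussed. A GPU-based wavefront generation model is built and then optimized through parallelization and the use of shared memory. Experimental comparisons of wavefront generation on the CPU and the GPU are given. The results show that generating a 256×256 wavefront from 192 Zernike polynomial terms takes 2.5 ms on average on the GPU, about two orders of magnitude less time than on the CPU, which meets the requirements of real-time wavefront generation.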

11.

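NeuDATool: an open-source neutron scattering data analysis package supporting GPU hardware acceleration and cross-node parallelism on computer clusters

马长利 程贺 左太森 焦贵省 韩泽华 秦虹 《化学物理学报》2020,33(6):727-732

Empirical potential structure refinement (EPSR) is software for analyzing neutron scattering data, developed in the 1980s by the disordered materials group at the UK spallation neutron source. Its goal is to reconstruct the three-dimensional atomic structure of a sample from neutron scattering data. Over the past decades EPSR has been widely used for neutron scattering data analysis and has given experimental users reliable results. However, EPSR is a Fortran program based on shared-memory parallel computing (OpenMP) and supports neither cross-node parallel acceleration on server clusters nor GPU acceleration, which limits its analysis speed. With the widespread construction of server clusters and the general use of GPU acceleration, it is necessary to rewrite the EPSR program to increase computation speed. This work uses object-oriented C++ to develop NeuDATool, an open-source package that implements the EPSR algorithm; it achieves cross-node cluster parallelism and GPU acceleration through MPI and CUDA C. The software was tested with neutron scattering data from liquid water and glassy silica. The tests show that it correctly reconstructs the three-dimensional atomic structure of the samples and that, for simulated systems of more than 100,000 atoms, GPU acceleration is more than 400 times faster than the serial CPU algorithm. NeuDATool offers a new option for neutron users, especially experimental scientists who are familiar with C++ programming and want to define their own analysis algorithms.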

12.

Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors

Ali Khajeh-Saeed Stephen Poole J. Blair Perot 《Journal of computational physics》2010,229(11):4247-4258

Finding regions of similarity between two very long data streams is a computationally intensive problem referred to as sequence alignment. Alignment algorithms must allow for imperfect sequence matching with different starting locations and some gaps and errors between the two data sequences. Perhaps the most well known application of sequence matching is the testing of DNA or protein sequences against genome databases. The Smith–Waterman algorithm is a method for precisely characterizing how well two sequences can be aligned and for determining the optimal alignment of those two sequences. Like many applications in computational science, the Smith–Waterman algorithm is constrained by the memory access speed and can be accelerated significantly by using graphics processors (GPUs) as the compute engine. In this work we show that effective use of the GPU requires a novel reformulation of the Smith–Waterman algorithm. The performance of this new version of the algorithm is demonstrated using the SSCA#1 (Bioinformatics) benchmark running on one GPU and on up to four GPUs executing in parallel. The results indicate that for large problems a single GPU is up to 45 times faster than a CPU for this application, and the parallel implementation shows linear speed up on up to 4 GPUs.
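
For readers unfamiliar with the algorithm, here is a minimal serial sketch of the textbook Smith–Waterman recurrence. It is not the GPU reformulation the abstract describes. The linear gap penalty and the scoring values are illustrative assumptions, and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b.
    Assumptions: linear gap penalty and illustrative scoring values."""
    prev = [0] * (len(b) + 1)              # previous row of the score table
    best = 0
    for ch_a in a:
        curr = [0]                         # column 0 is always 0
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # Local alignment: a cell never drops below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Each cell depends on its left, upper, and upper-left neighbors, so the cells along one anti-diagonal are mutually independent. That dependency pattern is what any parallel formulation has to work around.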

13.

CUDA-based four-path method with application to the EM scattering of a two-layer canopy

《Waves in Random and Complex Media》2013,23(3):529-541

The long-term goal of this work is to develop a robust simulator for studying EM scattering from vegetation. To overcome the intensive computational burden that results from large sampling numbers, we use the graphics processing unit (GPU), which has developed greatly in recent years. In this paper, Compute Unified Device Architecture (CUDA) is combined with the four-path method to predict the EM scattering properties of scatterers sampled with the Monte Carlo method in a two-layer canopy model. With a GTS250 GPU as a coprocessor, a speedup of 77.8 times is obtained over the original serial algorithm running on a Core(TM) i5 CPU.

14.

Importance of explicit vectorization for CPU and GPU software performance

Neil G. Dickson Kamran Karimi Firas Hamze 《Journal of computational physics》2011,230(13):5383-5398

Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be employed additionally, are less frequently discussed. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU implementations of a particular computationally intensive Metropolis Monte Carlo algorithm. Explicit vectorization on the CPU and the equivalent, explicit memory coalescing, on the GPU are found to be critical to achieving good performance of this algorithm in both environments. The fully-optimized CPU version achieves a 9× to 12× speedup over the original CPU version, in addition to speedup from multi-threading. This is 2× faster than the fully-optimized GPU version, indicating the importance of optimizing CPU implementations.

15.

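Online obstacle detection system for plant-protection UAVs based on structured light

陈念 吴开华 王文杰 《应用光学》2018,39(3):343-348

To keep plant-protection unmanned aerial vehicles (UAVs) safe during flight operations, they need automatic obstacle avoidance, so an obstacle detection method based on structured-light vision is proposed. To improve the real-time performance of obstacle detection, the work focuses on an obstacle detection system built on an embedded platform, in which the parallel computation of the obstacle image processing algorithm is mapped onto GPU hardware resources, greatly improving run-time efficiency. Experiments show that, with the obstacle contour lines kept complete, the CPU-GPU implementation of the processing algorithm achieves a speedup of about 46.15 over the CPU implementation, with an acquisition and processing time of about 48.985 ms. The system gives clear processing results and good real-time performance, laying a foundation for real-time obstacle detection and, subsequently, automatic obstacle avoidance on plant-protection UAVs.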

16.

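Application of a many-core processing architecture to phase-coded pulse echo detection for underwater vehicles. Total citations: 1; self-citations: 0; citations by others: 1

詹飞 马晓川 杨力 《声学学报》2018,43(4):445-452

New target detection schemes such as wideband coded pulses and multiple-input multiple-output sharply increase computation and data storage requirements. Based on the data-level parallelism of the phase-coded pulse echo detection algorithm for underwater vehicles, the many-core architecture of the graphics processing unit (GPU) is applied, and a parallel implementation framework for the algorithm on the GPU is designed with regard to task allocation strategy, data processing flow, GPU hardware resource utilization, and memory access. Lake-trial data were used to test a desktop GPU platform, an embedded GPU platform, and a traditional vehicle signal processing platform based on multi-core digital signal processors (DSPs). Compared with the multi-core DSP platform, the embedded GPU platform has advantages in power consumption and computing performance. The results show that an embedded GPU platform greatly improves performance per watt, simplifies system design, and meets the large-data-volume, low-power, real-time needs of new vehicle detection systems.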

17.

Cryptanalysis of an ergodic chaotic encryption algorithm

王兴元 谢旖欣 秦学 《中国物理 B》2012,21(4):40504-040504

In this paper, we analyze the security of, and possible attacks on, a new symmetric-key encryption algorithm based on the ergodicity of a logistic map. We use mathematical induction to prove that the algorithm can be broken by a chosen-plaintext attack, and we give an example of the attack. Based on this cryptanalysis, we improve the original algorithm and briefly cryptanalyze the improved version. Compared with the original, the improved algorithm resists a chosen-plaintext attack while retaining many of the original's advantages, such as encryption speed, sensitive dependence on the key, and strong anti-attack capability.

18.

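Parallel acceleration of layer-based computation of holograms of three-dimensional objects

肖波 郑华东 刘柯健 李飞 高智方 《应用光学》2019,40(4):620-626

As the resolution and size of spatial light modulators grow, the computational load of dynamic three-dimensional holographic display grows as well, placing new demands on hologram computation speed. GPU parallel computing is used to implement fast layer-based hologram computation. The method exploits GPU multithreading and the two-dimensional Fourier transforms of images in the layer-based method to accelerate the Fresnel diffraction transform. It also calls low-level GPU resources and uses CUDA streams to reduce intermediate waiting latency. A comparison of computation speeds shows that the GPU-based parallel method is about 10 times faster than the CPU-based method.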

19.

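Heterogeneous algorithm for computing two-dimensional unstable manifolds of continuous-time systems

李清都 谭宇玲 杨芳艳 《物理学报》2011,60(3):30206-030206

Two-dimensional manifolds of nonlinear systems usually have complex geometric structure and rich dynamical information, so computing and visualizing them involves a large amount of unavoidable numerical computation. How to carry out these computations efficiently therefore becomes the key problem. Given the trend of computers toward heterogeneous architectures (multi-core CPUs plus general-purpose GPUs), this paper proposes a fast manifold computation method for the new generation of computing platforms that balances accuracy and generality. The algorithm splits the work into two parts, trajectory extension and triangle generation: the former is computationally heavy but uniform and suits the GPU, while the latter is light but complex and suits the CPU. Computation of the stable manifold of the origin of the Lorenz system shows that the algorithm makes full use of the combined performance of a heterogeneous platform and can greatly increase computation speed. Keywords: unstable manifold; manifold computation; heterogeneous computing; Lorenz system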

20.

GPU-based single-cluster algorithm for the simulation of the Ising model

Yukihiro Komura Yutaka Okabe 《Journal of computational physics》2012,231(4):1209-1215

We present the GPU calculation with the compute unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L = 4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L = 256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
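
As background for this abstract, a serial Wolff single-cluster update for the two-dimensional Ising model can be sketched as below. It is the textbook CPU version and does not reproduce the quasi-block synchronization scheme of the paper. The coupling J = 1, the periodic boundaries, and the name `wolff_step` are assumptions.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """Grow and flip one Wolff cluster on an L x L periodic lattice.
    spins is a flat list of +1/-1 values; returns the cluster size.
    Assumptions: ferromagnetic coupling J = 1, periodic boundaries."""
    p_add = 1.0 - math.exp(-2.0 * beta)    # bond activation probability
    start = rng.randrange(L * L)
    seed_spin = spins[start]
    spins[start] = -seed_spin              # flipping doubles as "visited"
    stack = [start]
    size = 1
    while stack:
        site = stack.pop()
        x, y = site % L, site // L
        neighbors = (
            (x + 1) % L + y * L,
            (x - 1) % L + y * L,
            x + ((y + 1) % L) * L,
            x + ((y - 1) % L) * L,
        )
        for nb in neighbors:
            # Only aligned, not-yet-flipped neighbors can join the cluster.
            if spins[nb] == seed_spin and rng.random() < p_add:
                spins[nb] = -seed_spin
                stack.append(nb)
                size += 1
    return size

rng = random.Random(1)
L = 16
spins = [1] * (L * L)
beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))  # 2D critical point, J = 1
print(wolff_step(spins, L, beta_c, rng))
```

The inner loop over newly added spins is the part the abstract says is computed in parallel. In this serial sketch the stack processes them one at a time.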