Similar articles
 20 similar articles found (search time: 0 ms)
1.
Current and future requirements for adaptive real-time image compression challenge even the capabilities of highly parallel realizations in terms of hardware performance. Previously proposed linear array structures for full-search vector quantization do not offer scalability and adaptivity in this context, because they require separate data/control pins for dynamically updating the codevectors and complicated interlock mechanisms to ensure that the regular data flow is not corrupted as a result of updates. We explore the design space for full-search vector quantizers and propose a novel linear processor array architecture in which global wiring is limited to clock and power supply distribution, thus allowing high-speed processing in spite of only limited communication with the host via the boundary processors. The resulting fully pipelined design is not only area-efficient for VLSI implementation but is also readily scalable and offers extremely high performance.
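As a point of reference for the abstract above, full-search vector quantization compares an input vector against every codevector and keeps the closest one. The following is a minimal software sketch only, not the paper's processor array; the function names, the squared-error distortion measure, and the toy codebook are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def full_search_encode(vector, codebook):
    """Return (index, distortion) of the codevector closest to `vector`.

    Every codevector is examined, which is what "full search" means;
    ties are resolved in favour of the lowest index.
    """
    best_index = 0
    best_dist = squared_distance(vector, codebook[0])
    for i in range(1, len(codebook)):
        d = squared_distance(vector, codebook[i])
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Hypothetical 2-dimensional codebook with four codevectors.
codebook = [(0, 0), (10, 10), (0, 10), (10, 0)]
index, distortion = full_search_encode((9, 2), codebook)
```

The loop body is independent for each codevector, which is why the search maps naturally onto a linear array in which each processor holds one codevector.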

2.
This paper presents a Computational Memory architecture for MPEG-4 applications on mobile devices. The proposed architecture performs real-time block-based motion estimation, which is the most computationally intensive task in the video encoder. It uses the exhaustive block-matching algorithm (EBMA) for motion estimation. The architecture consists of embedded SRAMs and a number of block-matching units that work in parallel to process video data while it is stored in the memory. The block-matching units access the embedded SRAMs simultaneously, which increases the speed of the architecture. The architecture processes CIF video sequences (frame size 352 × 288 pixels) with a block size of 16 × 16 pixels and a search range of ±15 pixels. It has been designed, prototyped, and simulated in 0.18 μm TSMC CMOS technology. Simulation shows that the architecture processes up to 126 CIF frames per second at a clock frequency of 100 MHz. The synthesized prototype includes 200 KB of memory, has an area of 33.75 mm², and consumes 986.96 mW at 100 MHz.
Mohammed Sayed received his B.Sc. degree from Zagazig University, Zagazig, Egypt, in 1997 and a postgraduate diploma in VLSI design from the Information Technology Institute (ITI), Cairo, Egypt, in 1998. In 2003 he received his M.Sc. degree from the University of Calgary, Calgary, Canada. From 1998 to 2001 he was a research and teaching assistant at the Electronics & Communications Engineering Department, Zagazig University, Egypt. In 2001 he became a research assistant at the Department of Electrical and Computer Engineering, University of Calgary, Canada. His current research interests are System-on-Chip, Embedded Memories, and Digital Video Processing. Mr. Sayed has received a number of scholarships and awards, including the iCORE Scholarship from 2003 to 2005, the SMC Industrial Collaboration Award in June 2003, and the Micronet Annual Workshop Best Paper Award in April 2002. He has a number of journal and conference publications and a number of contributions to the MPEG-4 standard (ISO/IEC JTC1/SC29/WG11 MPEG2002/ M8562 and M8563).
Wael Badawy is an associate professor in the Department of Electrical and Computer Engineering. He holds an adjunct professorship in the Department of Mechanical Engineering, University of Alberta. Dr. Badawy's research interests are in the areas of microelectronics, VLSI architectures for low-bit-rate video applications, digital video processing, low-power design methodologies, and VLSI prototyping. His research involves designing new models, techniques, algorithms, architectures, and low-power prototypes for novel systems and consumer products. Dr. Badawy has authored and co-authored more than 100 peer-reviewed journal and conference papers and about 30 technical reports. He is the Guest Editor for the special issue on System on Chip for Real-Time Applications in the Canadian Journal on Electrical and Computer Engineering, the Technical Chair for the 2002 International Workshop on SoC for real-time applications, and a technical reviewer for several IEEE journals and conferences. He is currently a member of the IEEE-CAS Technical Committee on Communication. Dr. Badawy was honored with the “2002 Petro Canada Young Innovator Award”, the “2001 Micralyne Microsystems Design Award”, and the 1998 Upsilon Pi Epsilon Honor Society and IEEE Computer Society Award for Academic Excellence in Computer Disciplines. He is currently the Chairman of the Canadian Advisor Committee (CAC) and Head of the Canadian Delegation on ISO/IEC/JTC1/SC6 “Telecommunications and Information Exchange Between Systems”. He is a Member of the Canadian Advisory Committee for the Standards Council of Canada, Subcommittee 29: Coding of Audio, Picture Multimedia and Hypermedia Information, and a Canadian Delegate to the ISO/IEC MPEG standard committee. He is a voting Member of the VSI Alliance. He is also the Chair of the IEEE-Southern Alberta Society-Computer Chapter.

3.
This paper describes an area and power-efficient VLSI approach for implementing the discrete wavelet transform on streaming multielectrode neurophysiological data in real time. The VLSI implementation is based on the lifting scheme for wavelet computation using the symmlet4 basis with quantized coefficients and integer fixed-point data precision to minimize hardware demands. The proposed design is driven by the need to compress neural signals recorded with high-density microelectrode arrays implanted in the cortex prior to data telemetry. Our results indicate that signal integrity is not compromised by quantization down to 5-bit filter coefficient and 10-bit data precision at intermediate stages. Furthermore, results from analog simulation and modeling show that a hardware-minimized computational core executing filter steps sequentially is advantageous over the pipeline approach commonly used in DWT implementations. The design is compared to that of a B-spline approach that minimizes the number of multipliers at the expense of increasing the number of adders. The performance demonstrates that in vivo real-time DWT computation is feasible prior to data telemetry, permitting large savings in bandwidth requirements and communication costs given the severe limitations on size, energy consumption and power dissipation of an implantable device.
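To illustrate the lifting scheme mentioned in the abstract above, here is a minimal integer sketch. It deliberately substitutes the much simpler Haar wavelet for the symmlet4 basis used in the paper, because the paper's quantized filter coefficients are not given here; the function names and the sample data are assumptions. The structure is the same in both cases: split the signal, predict one half from the other, then update.

```python
def haar_lifting_forward(signal):
    """One level of an integer Haar transform built from lifting steps.

    `signal` must have even length. Returns (approx, detail).
    """
    even = signal[0::2]   # split: even-indexed samples
    odd = signal[1::2]    # split: odd-indexed samples
    # Predict step: each odd sample is predicted by its even neighbour.
    detail = [o - e for e, o in zip(even, odd)]
    # Update step: adjust the even samples so they track the pair average.
    approx = [e + (d >> 1) for e, d in zip(even, detail)]
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order with flipped signs."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


# Hypothetical sample values; integer-only arithmetic throughout.
samples = [12, 15, 9, 4, 7, 7, 30, 2]
approx, detail = haar_lifting_forward(samples)
restored = haar_lifting_inverse(approx, detail)
```

Because each lifting step is undone exactly by its mirror image, the integer transform is perfectly reversible, and it needs only additions, subtractions, and shifts, which is the property that makes lifting attractive for hardware with tight area and power budgets.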

4.
李丽华 《电子科技》2006,(11):27-30
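The recently proposed MPEG lossless audio coding standard, MPEG-4 Audio SLS (Scalable to Lossless), provides a unified audio model that combines lossy audio coding, lossless audio coding, and fine-grain scalable audio coding in a single framework. We propose two methods for improving the coding efficiency of SLS: context-based arithmetic coding and low-energy mode coding. Both methods operate within the current SLS framework, so they preserve its desirable features, such as fine-grain scalability, while improving its lossless compression performance.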

5.
周汀  陈旭昀  章倩苓  李蔚 《电子学报》1998,26(5):51-55,85
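We propose a VLSI architecture for a vector quantization encoder based on the minimum absolute error measure (MMAE). The architecture uses an inequality criterion on the value of the error measure, a pre-sorted codebook, and a nearest-neighbor search algorithm. Together with binary search and a dedicated structure for computing and comparing the error measure, this greatly reduces the implementation size of the system. Parallel pipelining and other design techniques give a processing speed of one encoded vector every 8 clock cycles. The whole system was described in the hardware description language VHDL and was verified and synthesized in the Synopsys environment.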

6.
VLSI Architecture Design of an MPEG-4 Motion Compensation Processor   Cited by: 2 (self-citations: 0, citations by others: 2)
王占辉  刘大明  刘龙   《电子器件》2005,28(3):546-550
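Motion compensation in MPEG-4 encoding and decoding involves complex control and high data throughput, which makes it difficult to implement. This paper proposes a hardware implementation of motion compensation suited to MPEG-4 that resolves difficult issues such as timing allocation and input/output control. The design was described in VHDL in the Xilinx ISE6.1i integrated development environment and was simulated and verified with electronic design automation (EDA) tools. Simulation and synthesis results show that the processor is logically correct, meets the real-time encoding requirements of MPEG-4 Core Profile@Level 2, and can be used in a VLSI implementation of MPEG-4.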

7.
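Based on the characteristics of MPEG-4 texture padding, this paper designs a pipelined repetitive padding processor for MPEG-4. Ping-pong RAM is used to realize the high-speed pipeline, and padding PE units carry out efficient MPEG-4 repetitive padding. Simulation and synthesis results show that the VOP padding processor is logically correct, meets the real-time encoding requirements of MPEG-4 Core Profile@Level 2, and can be used in a VLSI implementation of MPEG-4.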

8.
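Research and Implementation of a Fast and Efficient MPEG-4 Motion Estimation Hardware Architecture   Cited by: 6 (self-citations: 0, citations by others: 6)
This paper proposes a highly parallel, multi-pipelined hardware architecture that implements the full-search block-matching motion estimation algorithm of MPEG-4 Video. The architecture searches in real time for the best-matching motion vector of every pixel block using full-search block matching. Its advantages include high computation speed, accurate motion vectors, a small on-chip memory requirement, a low operating clock, and low power consumption, so it can meet the needs of both mobile video services and high-definition video services. The architecture is implemented with Fujitsu's CE66 library.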
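The full-search block-matching algorithm named in the abstract above can be sketched in software as follows. This is an illustrative model only, not the hardware architecture of the paper: the sum of absolute differences (SAD) as the matching cost, the function names, and the toy frames are all assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Try every displacement within +/-search_range and return
    (cost, dx, dy) for the lowest SAD. Candidates that would fall
    outside the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + n <= height and
                      0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Hypothetical 8 x 8 frames: `cur` is `ref` shifted by 2 pixels
# horizontally and 1 pixel vertically (clamped at the frame border).
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
cost, dx, dy = full_search(cur, ref, bx=2, by=2, n=2, search_range=2)
```

Each candidate displacement is evaluated independently of the others, which is what allows hardware designs to compute many SAD values in parallel and in pipelined fashion.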

9.
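Multimedia systems are increasingly widely used for disseminating and recording information and have become the mainstream of information dissemination. Video accounts for a large share of multimedia transmission. Because the volume of video data is very large, storing and transmitting it is difficult in practice, and many compression standards have therefore been established.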

10.
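An Object-Based Scalable Wavelet Coder   Cited by: 1 (self-citations: 1, citations by others: 0)
This paper proposes an embedded zerotree wavelet coder for arbitrarily shaped objects. The coder first applies a shape-adaptive discrete wavelet transform (SA-DWT) to the arbitrarily shaped object and then uses shape-adaptive predictive embedded zerotree wavelet (SA-PEZW) coding to encode the transformed shape and texture information together. The result is a scalable bitstream from which the shape information can be reconstructed exactly. Experiments on standard test sequences show that the coder achieves high coding efficiency and good performance.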

11.
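Improvement of the MPEG-2 Quantization Strategy   Cited by: 2 (self-citations: 0, citations by others: 2)
Building on the MPEG-2 TM5 quantization strategy, this paper makes the following improvements. (1) Based on the characteristics of human vision, a new algorithm for computing the macroblock activity coefficient is proposed, so that the coefficient better reflects the sensitivity of the human eye to the image. (2) Bits are allocated adaptively to different regions of a frame according to the complexity of each region, which reduces the uneven image quality caused by allocating bits equally to every macroblock. Computer simulation results show that the coding signal-to-noise ratio of images coded with this algorithm improves by more than 0.6 dB on average, and the subjective image quality also improves considerably.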
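For context, the baseline that the abstract above sets out to improve is the adaptive quantization step of Test Model 5 (TM5). The sketch below follows the TM5 formulas as they are commonly stated; it does not reproduce the paper's new activity algorithm, which is not described here. The function names and the choice of flat 64-element lists for 8 × 8 sub-blocks are assumptions.

```python
def block_variance(block):
    """Variance of one 8 x 8 luminance sub-block given as a flat list."""
    mean = sum(block) / len(block)
    return sum((p - mean) ** 2 for p in block) / len(block)


def spatial_activity(sub_blocks):
    """TM5 spatial activity: 1 plus the smallest sub-block variance.

    TM5 takes the minimum over the frame-organized and field-organized
    luminance sub-blocks of a macroblock; the caller supplies them here.
    """
    return 1 + min(block_variance(b) for b in sub_blocks)


def normalized_activity(act, avg_act):
    """TM5 normalized activity, which always lies between 0.5 and 2."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def macroblock_quantizer(q_ref, n_act):
    """Scale the reference quantizer by the activity, clipped to 1..31."""
    return max(1, min(31, round(q_ref * n_act)))


# A perfectly flat macroblock has the lowest possible activity, so it
# receives a finer quantizer than the reference value.
flat_sub_blocks = [[128] * 64 for _ in range(4)]
act = spatial_activity(flat_sub_blocks)
n_act = normalized_activity(act, avg_act=400)
mquant = macroblock_quantizer(q_ref=10, n_act=n_act)
```

In this baseline the activity depends only on sub-block variance, so smooth regions get finer quantization and busy regions coarser quantization regardless of other perceptual factors.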

12.
The need for higher data rates is ever rising as wireless communications standards move from the third to the fourth generation. Turbo-Codes are the prevalent channel codes for wireless systems due to their excellent forward error correction capability. So far, research has mainly focused on components of high-throughput Turbo-Decoders. In this paper we explore the Turbo-Decoder design space anew, under both system design and deep-submicron implementation aspects. Our approach incorporates all levels of design, from I/O behavior down to floorplanning, taking deep-submicron effects into account. Its scalability allows optimized architectures to be derived that are tailored to the given throughput and target technology. We present results for 3GPP-compliant Turbo-Decoders beyond 100 Mbit/s synthesized on a 0.18 μm standard cell library.

13.
应骏  李莉 《电视技术》2007,31(8):29-31
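Taking into account the parallel digital signal processing architecture that TI's TMS320DM320 provides for media processing, this paper analyzes the implementation of the MPEG-4 algorithm itself, adopts a pipelined design, and proposes an MPEG-4 decoding algorithm targeted at the DM320. The utilization of each on-chip resource is analyzed, and directions for future optimization are proposed.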

14.
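Grid security is a key issue in grid computing. This paper analyzes the security problems that exist in grids and the security requirements of grids. Through a formal abstraction of grid security elements, it proposes an extensible grid security architecture and briefly analyzes some of the properties of that architecture.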

15.
The TANGRAM VLSI co-processor is intended as a building block for use in system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation-intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms or support for different video formats in software. Communication with a host CPU and video decoding hardware is done via the very common PI-bus on-chip interface. TANGRAM directly interfaces with the ITU-R601/656 digital video output. VHDL implementation and synthesis for a 0.35 standard-cell library provide an estimate of 100 MHz achievable clock frequency (worst-case), 52 mm² overall area and 1 Watt power dissipation. TANGRAM has sufficient performance for rendering of MPEG-4 Main Profile@Layer3 scenes (ITU-R 601).

16.
In this paper, we propose a cost-effective architecture of a variable length decoder (VLD) for MPEG-2 and AVS. In order to save buffer memory between VLD and IDCT and to accelerate decoding, block-based pipeline buffers are adopted. Inverse scan (IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and easier system integration. A novel group-based architecture with an optimized look-up table is used for MPEG-2, and a new memory-efficient architecture with mixed memory organization is used for AVS. We share modules between MPEG-2 and AVS as much as possible, such as the flush unit, the buffer controller and the buffers. Moreover, we propose a merged IQ scheme and a merged RAMs scheme. Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constraint of 125 MHz. The simulation results show that it can achieve real-time decoding of formats such as HD1080i (1,920 × 1,088 at 30 MHz) video of AVS and MPEG-2. Furthermore, we propose an effective design of the buffers between VLD and IDCT according to the IDCT architecture, a cost-efficient IQ architecture with full flexibility, and an efficient scheme for accelerating VLC decoding.
Yun He

17.
18.
Evolving video coding standards demand functional flexibility for implementations, not only at design time but also after fabrication. This paper presents a System-on-Chip design approach with a feasible combination of performance, scalability, programmability, area efficiency, and design time effort for a video encoder. The encoder is based on a homogeneous master-slave processor architecture. Each slave encodes a part of the frame in the Single Program Multiple Data (SPMD) data parallel model. Both shared and distributed memory architectures are presented. Design effort is reduced by identical program codes, automated assembly of software and hardware modules independent of the number and type of processors, as well as our flexible on-chip communication network called Heterogeneous IP Block Interconnection (HIBI). A case study implementation with two to ten simple ARM7 processors, a 32-bit HIBI bus and non-optimized processor-independent software gives a performance of 6 to 53 fps for QCIF. The whole encoder area ranges from 173 to 770 kgates excluding the memories. The relation scales reasonably well to systems with more powerful processors and optimized code. The optimization of the communication network shows that with more than six slaves even a serial HIBI connection with 100 MHz speed is feasible. HIBI and the parallelization approach allow exploration and optimization of the communication both at the application and architecture layers.
Tero Kangas, MSc ’01, Tampere University of Technology (TUT). Since 1999 he has been working as a research scientist in the Institute of Digital and Computer Systems (DCS) at TUT. Currently he is working towards his PhD degree and his main research topics are system architectures and SoC design methodologies in multimedia applications.
Kimmo Kuusilinna, PhD ’01, TUT. His main research interests include system-level design and verification, interconnection networks, and parallel memories. Currently he is working as a senior research engineer at the Nokia Research Center.
Timo D. Hämäläinen, MSc ’93, PhD ’97, TUT. He acted as a senior research scientist and project manager at TUT in 1997-2001. He was nominated to full professor at TUT/Institute of Digital and Computer Systems in 2001. He heads the DACI research group that focuses on three main lines: wireless local area networking and wireless sensor networks, high-performance DSP/HW based video encoding, and interconnection networks with design flow tools for heterogeneous SoC platforms.

19.
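This paper proposes a threshold-based distributed iterative algorithm. Unlike existing algorithms, it is tailored to the scalable network switch scheduling architecture and gives the scheduler with the highest priority two iterations. The first iteration uses a threshold to find the longer VOQs (virtual output queues) and is completed one time slot before the highest-priority time slot, which shortens the signal processing time. Simulation results show that, compared with existing algorithms, the algorithm clearly improves delay performance and throughput under heavy uniform traffic. Its hardware cost is also low, so it achieves a good trade-off between performance and complexity.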

20.
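This paper focuses on the communication architecture of the DMIF protocol introduced by the MPEG-4 standard, and outlines the functions of the first version of DMIF and the functional extensions planned for the second version.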
