1.
Current and future requirements for adaptive real-time image compression challenge even the capabilities of highly parallel realizations in terms of hardware performance. Previously proposed linear array structures for full-search vector quantization do not offer scalability and adaptivity in this context, because they require separate data/control pins for dynamically updating the codevectors and complicated interlock mechanisms to ensure that the regular data flow is not corrupted as a result of updates. We explore the design space for full-search vector quantizers and propose a novel linear processor array architecture in which global wiring is limited to clock and power supply distribution, thus allowing high-speed processing in spite of only limited communication with the host via the boundary processors. The resulting fully pipelined design is not only area-efficient for VLSI implementation but is also readily scalable and offers extremely high performance.
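For readers unfamiliar with the operation this array accelerates, the following is a minimal software sketch of full-search vector quantization. It is not the authors' architecture: the function names and the tiny codebook are illustrative assumptions, and the sequential loop stands in for work that a processor array would spread over many cells.

```python
from typing import Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def full_search_vq(vector: Sequence[float],
                   codebook: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Compare the input against every codevector and return the index
    and distortion of the closest one (ties go to the lowest index)."""
    best_index, best_dist = -1, float("inf")
    for index, codevector in enumerate(codebook):
        dist = squared_distance(vector, codevector)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Hypothetical 3-entry codebook of 4-dimensional codevectors.
codebook = [(0, 0, 0, 0), (10, 10, 10, 10), (0, 10, 0, 10)]
index, distortion = full_search_vq((1, 9, 0, 8), codebook)
print(index, distortion)  # 2 6
```

Only the index is transmitted; an adaptive coder additionally has to update `codebook` entries at run time, which is the step the abstract says earlier array designs handled poorly.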
2.
This paper presents a Computational Memory architecture for MPEG-4 applications with mobile devices. The proposed architecture
is used for real-time block-based motion estimation, which is the most computationally intensive task in the video encoder.
It uses the exhaustive block-matching algorithm (EBMA) for motion estimation. The proposed architecture consists of embedded
SRAMs and a number of block-matching units working in parallel to process video data while stored in the memory. The block-matching
units access the embedded SRAMs simultaneously, which increases the speed of the architecture.
The architecture processes CIF format video sequences (i.e., the frame size is 352 × 288 pixels) with block size of 16 × 16
pixels and ±15 pixels search range. The proposed architecture has been designed, prototyped, and simulated for 0.18 μm TSMC
CMOS technology. The simulation shows that the proposed architecture processes up to 126 CIF frames per second at a clock
frequency of 100 MHz. The synthesized prototype of the proposed architecture includes 200 KB of memory, has an area of 33.75
mm² and consumes 986.96 mW at 100 MHz.
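As an illustration of the algorithm named in this abstract, here is a minimal software sketch of exhaustive block matching with a sum-of-absolute-differences (SAD) cost. The SAD criterion, the function names, and the synthetic test frames are assumptions made for the example; the abstract does not state the matching criterion or any implementation detail. The last function derives a rough cycle budget from the figures the abstract does give.

```python
def sad(cur, ref, bx, by, dx, dy, block):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(block) for x in range(block))


def ebma(cur, ref, bx, by, block=16, search=15):
    """Test every displacement in [-search, +search]^2 that stays inside
    the reference frame; return ((dx, dy), cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + block <= width and
                      0 <= by + dy and by + dy + block <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, block)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def make_frame(width, height, shift_x=0, shift_y=0):
    """Synthetic test pattern (hypothetical, for the demo only)."""
    return [[(3 * (x + shift_x) ** 2 + 5 * (y + shift_y) ** 2
              + (x + shift_x) * (y + shift_y)) % 251
             for x in range(width)] for y in range(height)]


def cycles_per_macroblock(clock_hz, fps, width, height, block=16):
    """Clock cycles available per macroblock at a given frame rate."""
    macroblocks = (width // block) * (height // block)
    return clock_hz / (fps * macroblocks)


reference = make_frame(12, 12)
current = make_frame(12, 12, shift_x=2, shift_y=1)  # content moved by (2, 1)
print(ebma(current, reference, bx=4, by=4, block=4, search=2))  # ((2, 1), 0)
print(round(cycles_per_macroblock(100e6, 126, 352, 288)))       # 2004
```

With a ±15 search range each 16 × 16 macroblock has up to 31 × 31 = 961 candidate positions, so the quoted 126 frames per second at 100 MHz leaves roughly two clock cycles per candidate. That is only achievable with many block-matching units working in parallel, which is consistent with the architecture described.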
Mohammed Sayed received his B.Sc. degree from Zagazig University, Zagazig, Egypt, in 1997 and a postgraduate diploma in VLSI design from
the Information Technology Institute (ITI), Cairo, Egypt, in 1998. In 2003 he received his M.Sc. degree from University of
Calgary, Calgary, Canada. From 1998 to 2001 he was a research and teaching assistant at the Electronics & Communications Engineering
Department, Zagazig University, Egypt. In 2001 he became a research assistant at the Department of Electrical and Computer
Engineering, University of Calgary, Canada. His current research interests are System-on-Chip, Embedded Memories, and Digital
Video Processing.
Mr. Sayed received a number of scholarships and awards such as iCORE Scholarship from 2003 to 2005, SMC Industrial Collaboration
Award in June 2003, and the Micronet Annual Workshop Best Paper Award in April 2002. He has a number of journal and conference
publications and a number of contributions to the MPEG-4 standard (ISO/IEC JTC1/SC29/WG11 MPEG2002/ M8562 and M8563).
Wael Badawy is an associate professor in the Department of Electrical and Computer Engineering. He holds an adjunct professorship in the
Department of Mechanical Engineering, University of Alberta.
Dr. Badawy's research interests are in the areas of: Microelectronics, VLSI architectures for video applications with low-bit
rate applications, digital video processing, low power design methodologies, and VLSI prototyping. His research involves designing
new models, techniques, algorithms, architectures and low power prototypes for novel systems and consumer products. Dr. Badawy
has authored or co-authored more than 100 peer reviewed Journal and Conference papers and about 30 technical reports. He is the
Guest Editor for the special issue on System on Chip for Real-Time Applications in the Canadian Journal on Electrical and
Computer Engineering, the Technical Chair for the 2002 International Workshop on SoC for real-time applications, and a technical
reviewer in several IEEE journals and conferences. He is currently a member of the IEEE-CAS Technical Committee on Communication.
Dr. Badawy was honored with the “2002 Petro Canada Young Innovator Award”, “2001 Micralyne Microsystems Design Award” and
the 1998 Upsilon Pi Epsilon Honor Society and IEEE Computer Society Award for Academic Excellence in Computer Disciplines.
He is currently the Chairman of the Canadian Advisory Committee (CAC) and Head of the Canadian Delegation on ISO/IEC/JTC1/SC6
“Telecommunications and Information Exchange Between Systems”. He is a member of the Canadian Advisory Committee for the Standards Council
of Canada, Subcommittee 29: Coding of Audio, Picture Multimedia and Hypermedia Information, and a Canadian delegate to the ISO/IEC
MPEG standard committee. He is a voting member of the VSI Alliance. He is also the Chair of the IEEE-Southern Alberta Society-Computer
Chapter.
3.
Oweiss K.G. Mason A. Suhail Y. Kamboh A.M. Thomson K.E. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(6):1266-1278
This paper describes an area and power-efficient VLSI approach for implementing the discrete wavelet transform on streaming multielectrode neurophysiological data in real time. The VLSI implementation is based on the lifting scheme for wavelet computation using the symmlet4 basis with quantized coefficients and integer fixed-point data precision to minimize hardware demands. The proposed design is driven by the need to compress neural signals recorded with high-density microelectrode arrays implanted in the cortex prior to data telemetry. Our results indicate that signal integrity is not compromised by quantization down to 5-bit filter coefficient and 10-bit data precision at intermediate stages. Furthermore, results from analog simulation and modeling show that a hardware-minimized computational core executing filter steps sequentially is advantageous over the pipeline approach commonly used in DWT implementations. The design is compared to that of a B-spline approach that minimizes the number of multipliers at the expense of increasing the number of adders. The performance demonstrates that in vivo real-time DWT computation is feasible prior to data telemetry, permitting large savings in bandwidth requirements and communication costs given the severe limitations on size, energy consumption and power dissipation of an implantable device.
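The lifting scheme mentioned in this abstract can be illustrated with a short integer-only sketch. Note the substitution: the paper uses the symmlet4 basis with quantized coefficients, whereas the code below uses the simpler reversible 5/3 lifting steps as a stand-in, with border samples clamped. All names are hypothetical. The point is only to show the predict/update structure and why it suits integer hardware.

```python
def lifting_53_forward(signal):
    """One level of integer 5/3 lifting on an even-length list.
    Returns (approximation, detail) coefficient lists."""
    even, odd = signal[0::2], signal[1::2]
    n = len(odd)
    # Predict step: each odd sample minus the mean of its even neighbours.
    detail = [odd[i] - ((even[i] + even[min(i + 1, n - 1)]) >> 1)
              for i in range(n)]
    # Update step: correct the even samples using neighbouring details.
    approx = [even[i] + ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
              for i in range(n)]
    return approx, detail


def lifting_53_inverse(approx, detail):
    """Undo the lifting steps in reverse order; exact for integers."""
    n = len(detail)
    even = [approx[i] - ((detail[max(i - 1, 0)] + detail[i] + 2) >> 2)
            for i in range(n)]
    odd = [detail[i] + ((even[i] + even[min(i + 1, n - 1)]) >> 1)
           for i in range(n)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


samples = [3, 7, 1, 8, 2, 9, 4, 4]
approx, detail = lifting_53_forward(samples)
assert lifting_53_inverse(approx, detail) == samples  # perfect reconstruction
```

Each step needs only additions and shifts and overwrites one half of the samples in place, which is why a small core that executes the steps sequentially, as the abstract argues, can be attractive when area and power are tightly constrained.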
4.
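The recently proposed MPEG lossless audio coding standard, MPEG-4 Audio SLS (Scalable to Lossless), provides a unified audio model that combines lossy audio coding, lossless audio coding, and fine-grain scalable audio coding in a single framework. We propose two methods for improving the coding efficiency of SLS: context-based arithmetic coding and low-energy-mode coding. Both methods operate within the current SLS framework, preserving desirable features such as fine-grain scalability while improving its lossless compression performance.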
5.
6.
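Motion compensation in MPEG-4 encoding and decoding involves complex control and high data throughput and is difficult to implement. To address this, a hardware implementation scheme for motion compensation suited to MPEG-4 is proposed, resolving hard problems such as timing allocation and input/output control. The scheme has been described in VHDL in the Xilinx ISE 6.1i integrated development environment and simulated and verified with electronic design automation (EDA) tools. Simulation and synthesis results show that the processor's logic is fully correct, that it meets the real-time encoding requirements of MPEG-4 Core Profile at Level 2, and that it can be used in a VLSI implementation of MPEG-4.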
7.
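Based on the characteristics of MPEG-4 texture padding, this paper designs the MPEG-4 repetitive padding stage with a pipelined structure. Ping-pong RAMs implement the high-speed pipeline, and padding processing elements (PEs) perform efficient MPEG-4 repetitive padding. Simulation and synthesis results show that the VOP padding processor designed here is logically correct, meets the real-time encoding requirements of MPEG-4 Core Profile at Level 2, and can be used in a VLSI implementation of MPEG-4.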
8.
9.
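Multimedia systems are increasingly used for disseminating and recording information and have become the mainstream of information dissemination. Video accounts for a large share of multimedia transmission. Because the volume of video data is enormous, making storage and transmission difficult in practice, many compression standards have been established.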
10.
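An Object-Based Scalable Wavelet Coder
An embedded zerotree wavelet coder for arbitrarily shaped objects is proposed. The coder first applies the shape-adaptive discrete wavelet transform (SA-DWT) to the arbitrarily shaped object, then uses shape-adaptive predictive embedded zerotree wavelet (SA-PEZW) coding to encode the transformed shape and texture information together, yielding a scalable bitstream from which the shape information can be reconstructed exactly. Experiments on standard test sequences show that the coder achieves high coding efficiency and performs well.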
11.
12.
Michael J. Thul Frank Gilbert Timo Vogt Gerd Kreiselmaier Norbert Wehn 《Journal of Signal Processing Systems》2005,39(1-2):63-77
The need for higher data rates keeps rising as wireless communications standards move from the third to the fourth generation. Turbo-Codes are the prevalent channel codes for wireless systems due to their excellent forward error correction capability. So far, research has mainly focused on components of high-throughput Turbo-Decoders. In this paper we explore the Turbo-Decoder design space anew, under both system design and deep-submicron implementation aspects. Our approach incorporates all levels of design, from I/O behavior down to floorplanning, taking deep-submicron effects into account. Its scalability allows optimized architectures to be derived that are tailored to the given throughput and target technology. We present results for 3GPP-compliant Turbo-Decoders beyond 100 Mbit/s synthesized on a 0.18 μm standard cell library.
13.
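Taking into account the parallel digital signal processing architecture for media processing in TI's TMS320DM320, this paper analyzes the implementation of the MPEG-4 algorithm itself, adopts a pipelined design, and proposes an MPEG-4 decoding algorithm targeted at the DM320. It analyzes the utilization of the chip's internal resources and suggests directions for future optimization.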
14.
15.
M. Berekovic P. Pirsch T. Selinger K.-I.- Wels C. Miro A. Lafage C. Heer G. Ghigo 《The Journal of VLSI Signal Processing》2002,31(2):157-171
The TANGRAM VLSI co-processor is intended as a building block for use in system-on-chip (SOC) designs for the versatile MPEG-4 multimedia standard. It is designed to perform the computation-intensive final step of MPEG-4 video decoding: compositing of scenes at the display. This includes warping and alpha blending of multiple full-screen video textures in real time. TANGRAM consists of a RISC control processor and multiple powerful arithmetic units that perform rendering calculations directly in hardware. This hybrid architecture enables adaptation to changes in algorithms or support for different video formats in software. Communication with a host CPU and video decoding hardware is done via the very common PI-bus on-chip interface. TANGRAM directly interfaces with the ITU-R601/656 digital video output. VHDL implementation and synthesis for a 0.35 μm standard-cell library provide an estimate of 100 MHz achievable clock frequency (worst-case), 52 mm² overall area and 1 Watt power dissipation. TANGRAM has sufficient performance for rendering of MPEG-4 Main Profile@Layer3 scenes (ITU-R 601).
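The alpha-blending step of scene compositing can be sketched per pixel as follows. This is a generic 8-bit formulation with rounding, written as an assumption for illustration; the abstract does not give TANGRAM's arithmetic, and the function names are hypothetical.

```python
def alpha_blend(foreground, background, alpha):
    """Blend two 8-bit samples; alpha = 255 shows only the foreground,
    alpha = 0 only the background. The +127 rounds to nearest."""
    return (alpha * foreground + (255 - alpha) * background + 127) // 255


def composite_row(fg_row, bg_row, alpha_row):
    """Blend one row of a foreground texture over a background row."""
    return [alpha_blend(f, b, a)
            for f, b, a in zip(fg_row, bg_row, alpha_row)]


print(composite_row([200, 200, 200], [100, 100, 100], [0, 128, 255]))
# [100, 150, 200]
```

Compositing several full-screen textures repeats this operation for every pixel of every layer in every frame, which gives a sense of why dedicated arithmetic units are used for it.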
16.
In this paper, we propose a cost-effective architecture of variable length decoder (VLD) for MPEG-2 and AVS. In order to save
the buffer memory between VLD and IDCT and accelerate decoding speed, block-based pipeline buffers are adopted. Inverse scan
(IScan) and inverse quantisation (IQ) are also merged into this architecture for cost-effective implementation and for easier
system integration. A novel group-based architecture with the optimized look-up table is used for MPEG-2 and a new memory-efficient
architecture with mixed memory organization is used for AVS. We use shared modules in both MPEG-2 and AVS as much as possible,
such as the flush unit, the buffer controller and the buffers. Moreover, we propose a merged IQ scheme and a merged RAM scheme.
Based on 0.18 μm CMOS technology, the proposed design consumes about 11.5 K gates at a clock constraint of 125 MHz. The simulation
results show that it can achieve real-time decoding, such as HD1080i (1,920 × 1,088 at 30 MHz) format video of AVS and MPEG-2.
Furthermore, we propose an effective design of the buffers between VLD and IDCT according to the IDCT architecture, a cost-efficient
IQ architecture with full flexibility and an efficient scheme for accelerating VLC decoding.
17.
18.
Tero Kangas Timo D. Hämäläinen Kimmo Kuusilinna 《The Journal of VLSI Signal Processing》2006,44(1-2):79-95
Evolving video coding standards demand functional flexibility for implementations, not only at design time but also after
fabrication. This paper presents a System-on-Chip design approach with a feasible combination of performance, scalability,
programmability, area efficiency, and design time effort for a video encoder. The encoder is based on a homogeneous master-slave
processor architecture. Each slave encodes a part of the frame in the Single Program Multiple Data (SPMD) data parallel model.
Both shared and distributed memory architectures are presented. Design effort is reduced by identical program codes, automated
assembly of software and hardware modules independent of the number and type of processors, as well as our flexible on-chip
communication network called Heterogeneous IP Block Interconnection (HIBI). A case study implementation with two to ten simple
ARM7 processors, a 32-bit HIBI bus and non-optimized processor-independent software achieves 6 to 53 fps for
QCIF. The whole encoder area ranges from 173 to 770 kgates excluding the memories. The relation scales reasonably well to
systems with more powerful processors and optimized code. The optimization of the communication network shows that with more
than six slaves even a serial HIBI connection with 100 MHz speed is feasible. HIBI and the parallelization approach allow
exploration and optimization of the communication both at the application and architecture layers.
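The SPMD data-parallel model described in this abstract can be illustrated by dividing a frame's macroblocks among the slave processors. The abstract does not say how the frame is actually partitioned, so the contiguous, near-equal split below is a hypothetical choice made for the example, as are the function names.

```python
def partition_macroblocks(total, slaves):
    """Split macroblock indices 0..total-1 into contiguous half-open
    ranges, one per slave, with sizes differing by at most one."""
    base, extra = divmod(total, slaves)
    ranges, start = [], 0
    for slave in range(slaves):
        size = base + (1 if slave < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# QCIF is 176 x 144 pixels, i.e. 11 x 9 = 99 macroblocks of 16 x 16.
qcif_macroblocks = (176 // 16) * (144 // 16)
print(partition_macroblocks(qcif_macroblocks, 4))
# [(0, 25), (25, 50), (50, 75), (75, 99)]
```

Every slave runs the same encoding program on its own range, which matches the abstract's point that identical program code keeps the design effort independent of the number of processors.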
Tero Kangas, MSc ’01, Tampere University of Technology (TUT). Since 1999 he has been working as a research scientist in the Institute
of Digital and Computer Systems (DCS) at TUT. Currently he is working towards his PhD degree and his main research topics
are system architectures and SoC design methodologies in multimedia applications.
Kimmo Kuusilinna, PhD ’01, TUT. His main research interests include system-level design and verification, interconnection networks, and parallel
memories. Currently he is working as a senior research engineer at the Nokia Research Center.
Timo D. Hämäläinen, MSc ’93, PhD ’97, TUT. He acted as a senior research scientist and project manager at TUT in 1997-2001. He was nominated
to full professor at TUT/Institute of Digital and Computer Systems in 2001. He heads the DACI research group that focuses
on three main lines: wireless local area networking and wireless sensor networks, high-performance DSP/HW based video encoding,
and interconnection networks with design flow tools for heterogeneous SoC platforms.
19.
20.
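This paper focuses on the communication architecture of the DMIF protocol introduced by the MPEG-4 standard, and outlines the functionality of the first version of DMIF and the functional extensions planned for the second version.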