Found 20 similar documents (search time: 15 ms)
1.
Daisuke Miyazaki Shoji Kawahito 《Analog Integrated Circuits and Signal Processing》2000,25(3):235-244
This paper presents a new scheme of a low-power area-efficient pipelined A/D converter using a single-ended amplifier. The proposed multiply-by-two single-ended amplifier using switched capacitor circuits has smaller DC bias current compared to the conventional fully-differential scheme, and has a small capacitor mismatch sensitivity, allowing us to use a smaller capacitance. The simple high-gain dynamic-biased regulated cascode amplifier also has an excellent switching response. These properties lead to the low-power area-efficient design of high-speed A/D converters. The estimated power dissipation of the 10-b pipelined A/D converter is less than 12 mW at 20 MSample/s.
2.
3.
Lee H.-Y. Park I.-C. 《IEEE transactions on circuits and systems. I, Regular papers》2007,54(4):889-900
This paper presents an area-efficient algorithm for the pipelined processing of the fast Fourier transform (FFT). The proposed algorithm decomposes a discrete Fourier transform (DFT) into two balanced sub-DFTs in order to minimize the total number of twiddle factors to be stored in tables. The radix in the proposed decomposition is adaptively changed according to the remaining transform length so that the transform lengths of the sub-DFTs resulting from the decomposition are as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total size of twiddle factor tables compared to a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several implementation techniques are proposed to reduce area, such as a simple index generator for twiddle factors and add/subtract units combined with the two's complement operation.
4.
5.
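To improve the floating-point computation efficiency of the NIOS system, functional modules for single-precision floating-point addition, subtraction, and multiplication were implemented in Verilog and their function was verified with waveform simulation. Following the NIOS II custom instruction specification, this functionality was added to SOPC Builder, extending the processor with new hardware-based floating-point instructions that can be used in the NIOS software environment. A comparison of results and execution time between the NIOS II software floating-point computation and the new hardware instructions confirms the advantage of the hardware instructions and provides a more efficient option for floating-point arithmetic under NIOS.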
6.
Tay-Jyi Lin Hung-Yueh Lin Chie-Min Chao Chih-Wei Liu Chein-Wei Jen 《The Journal of VLSI Signal Processing》2006,42(2):127-138
A multimedia system-on-a-chip (SoC) usually contains one or more programmable digital signal processors (DSP) to accelerate
data-intensive computations. But most of these DSP cores are designed originally for standalone applications, and they must
have some overlapped (and redundant) components with the host microprocessor. This paper presents a compact DSP for multi-core
systems, which is fully programmable and has been optimized to execute a set of signal processing kernels very efficiently.
The DSP core was designed concurrently with its automatic software generator based on high-level synthesis. Moreover, it performs
lightweight arithmetic—the static floating-point (SFP), which approximates the quality of floating-point (FP) operations with
the hardware similar to that of the integer arithmetic. In our simulations, the compact DSP and its auto-generated software
can achieve 3X performance (estimated in cycles) of those DSP cores in the dual-core baseband processors with similar computing resources.
Besides, the 16-bit SFP has above 40 dB signal to round-off noise ratio over the IEEE single-precision FP, and it even outperforms
the hand-optimized programs based on the 32-bit integer arithmetic. The 24-bit SFP has above 64 dB quality, of which the maximum
precision is identical to that of the single-precision FP. Finally, the DSP core has been implemented and fabricated in the
UMC 0.18 μm 1P6M CMOS technology. It can operate at 314.5 MHz while consuming 52 mW average power. The core size is only 1.5 mm × 1.5 mm
including the 16 KB on-chip memory and the AMBA AHB interface.
This work was supported by the National Science Council, Taiwan under Grant NSC93-2220-E-009-017. Besides, the authors would
like to thank the National Chip Implementation Center (CIC) for chip fabrication.
Tay-Jyi Lin received the BS degree in electrical and control engineering from National Chiao Tung University, Taiwan, in 1998. He is
working toward the PhD degree in the Department of Electronics Engineering and the Institute of Electronics, National Chiao
Tung University. His current researches include the heterogeneous computing platform for embedded multimedia systems, complexity-aware
architecture design, and high-performance/low-power digital signal processors.
Hung-Yueh Lin received the BS and the MS degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2002 and 2004,
respectively. He is now with MediaTek, Inc., Hsinchu, Taiwan. His research interests include lightweight computer arithmetic
and DSP architecture.
Chie-Min Chao received the BS degree in electronics engineering from National Chiao Tung University, Taiwan, in 2003, where he is currently
pursuing his MS degree. His researches include system software development, VLSI system design, and DSP architecture.
Chih-Wei Liu received the BS and the PhD degrees in electrical engineering from National Tsing Hua University, Taiwan, in 1991 and 1999,
respectively. From 1999 to 2000, he was an integrated circuit design engineer at the Electronics Research and Service Organization
(ERSO) of Industrial Technology Research Institute (ITRI), Taiwan. Then, near the end of 2000, he started to work for the
SoC Technology Center (STC) of ITRI as a project leader and eventually left ITRI at the end of Oct., 2003. He is currently
with the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Taiwan, as
an assistant professor. His current research interests include SoC and VLSI system design, processor architecture, digital
signal processing, digital communications, and coding theory.
Chein-Wei Jen received the BS degree from National Chiao Tung University, Taiwan, in 1970, the MS degree from Stanford University in 1977,
and the PhD degree from National Chiao Tung University in 1983. From 1981 to 2004, he was with the Department of Electronics
Engineering and the Institute of Electronics at National Chiao Tung University. Dr Jen was given the Outstanding Electrical
Engineering Professor Award by the Chinese Institute of Electrical Engineering in 2002. He is currently the General Director
of the SoC Technology Center at Industrial Technology Research Institute, the Adviser of National SoC Program, and the Managing
Director of the Board of the Taiwan IC Design Society. His research interests include SoC design, VLSI architectures, multimedia
processing, and design automation. He holds seven patents and has published over 50 journal and 100 conference papers in these
areas.
7.
8.
9.
10.
A novel architecture of a high-precision, floating-point special arithmetic function unit (SFU) for elementary transcendental functions is presented in this paper to provide area efficiency as well as high performance for programmable vertex shaders. From the architecture point of view, the evaluation of the quadratic approximation for special functions is performed by sharing the SIMD vector unit in the shader architecture to minimize processing latency and to reduce the area cost of the SFU. An optimized minimax approach is also proposed to obtain finite-length, normalized quadratic coefficients for high precision. The experimental results show that the proposed SFU can significantly reduce area cost and that, by adopting the proposed SFU, a vertex shader with a transport triggered architecture (TTA) can achieve a 15.0% improvement on average in performance/area ratio for various shading benchmarks.
11.
Ardavan Pedram John D. McCalpin Andreas Gerstlauer 《Journal of Signal Processing Systems》2014,77(1-2):169-190
FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to achieve the desired levels of performance. Starting with a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, on-chip interconnect, and FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while maintaining power and area efficiency comparable to that of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that can be contained in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.
12.
Olivieri M. Pappalardo F. Smorfa S. Visalli G. 《Circuits and Systems II: Express Briefs, IEEE Transactions on》2007,54(8):685-689
Leading zero anticipation (LZA) with error correction is a widely adopted technique in the implementation of high-speed IEEE-754-compliant floating-point units (FPUs), which are critical for area and power in multimedia-oriented systems-on-chips. We investigated a novel LZA algorithm that allows us to remove the error correction circuitry by reducing the error rate below a commonly accepted limit for image processing applications, which is not achieved by previous techniques. We embedded our technique into a complete FPU, obtaining both area savings and an overall FPU latency reduction with respect to traditional designs.
13.
IEEE-754 rounding support increases the critical delay for floating-point multipliers. Except for round-to-zero mode, all IEEE rounding modes test the (n−2) least significant product bits for a one. The result of the test is indicated by the sticky bit. Since fast generation of the sticky bit is critical for performance, various sticky-bit generation designs have been developed. This paper presents a comparison of previous fast sticky-bit generation designs and proposes a novel design that is independent of the multiplier's hardware. Thus, the proposed design can be used in any floating-point multiplier or any floating-point multiply-accumulate circuit. The proposed method is one of the fastest among all methods, and it uses the minimum hardware resources among the designs that use the same idea.
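As a rough illustration of what the sticky bit in the abstract above represents, the following Python sketch computes it in two ways: directly from the product, and from the operands alone. The second variant relies on the standard identity that the number of trailing zero bits of a product equals the sum of the trailing zero counts of its nonzero factors. It is shown only as one known way to avoid waiting for the product and is not claimed to be the design proposed in the paper. The function names and the `low_bits` parameter are hypothetical.

```python
def sticky_from_product(a: int, b: int, low_bits: int) -> int:
    """Sticky bit by definition: OR of the low_bits least significant product bits."""
    product = a * b
    mask = (1 << low_bits) - 1
    return int((product & mask) != 0)


def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


def sticky_from_operands(a: int, b: int, low_bits: int) -> int:
    """Sticky bit without forming the product (assumed illustration, not the paper's circuit).

    For nonzero a and b, trailing_zeros(a * b) == trailing_zeros(a) + trailing_zeros(b),
    so the low bits of the product are all zero exactly when that sum is >= low_bits.
    """
    if a == 0 or b == 0:
        return 0
    return int(trailing_zeros(a) + trailing_zeros(b) < low_bits)
```

For n-bit significands, `low_bits` would be set to n − 2 to match the test described in the abstract.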
14.
15.
Nigel Boston 《The Journal of VLSI Signal Processing》2005,39(3):323-331
An extension of a polynomial consists of the polynomial plus higher power terms. Given a polynomial with real coefficients and an integer larger than its degree, a method is given that produces a finite list of extensions of degree this larger integer such that this list necessarily contains the extension whose largest root is as small as possible. This extension is called the pole radius minimizer. The pole radius minimizer is then found by the finite check of comparing the polynomials in the list. The method is applied to obtain filter transformations that are optimal as regards throughput, but also have considerable savings in hardware overhead compared with standard methods such as Scattered Lookahead and Minimum Order Augmentation. The table in Section 5 gives an explicit comparison for various kinds of filters.
Nigel Boston Undergraduate degree from Cambridge University, UK, followed by a mathematics Ph.D. from Harvard in 1987. After a year at IHES, France, and two years as a Morrey Assistant Professor at Berkeley, went to the University of Illinois at Urbana-Champaign. Was founding director of the Illinois Center for Cryptography and Information Protection in the Coordinated Science Lab at UIUC and organized the first three Midwest Arithmetical Geometry in Cryptography meetings. Left UIUC in 2002 to become a Full Professor at the University of Wisconsin, with a split appointment in mathematics and ECE and affiliate appointment in CS. Working on applications of algebra and number theory to engineering in areas such as cryptography, coding theory, watermarking, and biometrics.
16.
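To handle the large signal dynamic range in shortwave wideband receiver systems, a custom 24-bit floating-point format was defined, and pipelined addition and multiplication units for this format were designed. After analyzing the advantages and disadvantages of various FIR filter structures, a transposed-form FIR correction filter design suited to the characteristics of FPGAs is given. Finally, with a wideband signal at a data rate of 2.5 MS/s as input and the Altera EP2S60F672C5 device as the hardware platform, a 250th-order FIR correction filter for a 10-channel shortwave wideband receiver was implemented in simulation, reaching a maximum operating rate above 130 MHz.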
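The transposed-form FIR structure named in the abstract above can be sketched in software as follows. This is a minimal behavioral model in Python using ordinary floats, not the 24-bit custom floating-point format or the FPGA implementation from the paper. The function name `fir_transposed` is hypothetical.

```python
def fir_transposed(samples, coeffs):
    """Transposed-form FIR filter (behavioral sketch).

    Each input sample is multiplied by every coefficient at once, and the
    partial sums move through a chain of registers toward the output.
    """
    n_taps = len(coeffs)
    state = [0.0] * (n_taps - 1)  # registers between the adders
    out = []
    for x in samples:
        # Output uses the first tap plus the partial sum held in the first register.
        y = coeffs[0] * x + (state[0] if state else 0.0)
        # Register update: state[i] <- coeffs[i + 1] * x + old state[i + 1].
        # Ascending order guarantees state[i + 1] still holds its old value.
        for i in range(n_taps - 2):
            state[i] = coeffs[i + 1] * x + state[i + 1]
        if state:
            state[-1] = coeffs[-1] * x
        out.append(y)
    return out
```

In hardware, the appeal of this form is that the input is broadcast to all multipliers and each adder is followed by a register, so the adder chain is naturally pipelined.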
17.
Double-precision floating-point arithmetic, specifically multiplication, is widely used in many scientific and signal processing applications. In general, a double-precision floating-point multiplier requires a large 53×53 mantissa multiplication to obtain the final result. This mantissa multiplication limits both the area and the performance of the operation. This paper presents a novel way to reduce this large multiplication. The proposed approach uses less multiplication hardware than the traditional method. The multiplication is done using the Karatsuba technique. The design specifically targets Field Programmable Gate Array (FPGA) platforms, and it has also been evaluated on an ASIC flow. The proposed module gives excellent performance with efficient use of resources. The design is fully compatible with the IEEE standard precision. The proposed module has shown better performance in comparison with the best reported multipliers in the literature.
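The Karatsuba technique mentioned in the abstract above replaces four half-width multiplications with three. The sketch below shows a single Karatsuba level on integer mantissas in Python. The split point of 27 bits and the function name `karatsuba_split` are assumptions made for illustration, and the paper's actual partitioning of the 53-bit mantissa may differ.

```python
def karatsuba_split(a: int, b: int, half_bits: int = 27) -> int:
    """Multiply two non-negative integers with one level of Karatsuba.

    The operands are split into high and low parts at half_bits (an assumed
    split point), and the cross term is recovered from a single extra product.
    """
    mask = (1 << half_bits) - 1
    a_hi, a_lo = a >> half_bits, a & mask
    b_hi, b_lo = b >> half_bits, b & mask

    hi = a_hi * b_hi                                # product of high parts
    lo = a_lo * b_lo                                # product of low parts
    mid = (a_hi + a_lo) * (b_hi + b_lo) - hi - lo   # equals a_hi*b_lo + a_lo*b_hi

    return (hi << (2 * half_bits)) + (mid << half_bits) + lo
```

The saving comes at the cost of extra additions and a slightly wider middle multiplier, since the sums `a_hi + a_lo` and `b_hi + b_lo` can be one bit wider than the halves.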
18.
Imtiaz Sajid M. M. Ahmed Sotirios G. Ziavras 《Journal of Signal Processing Systems》2012,67(2):157-166
The square root is a basic arithmetic operation in image and signal processing. We present a novel pipelined architecture to implement N-bit fixed-point square root operation on an FPGA using a non-restoring pipelined algorithm that does not require floating-point hardware. Pipelining hazards in its hardware realization are avoided by modifying the classic non-restoring algorithm, thus resulting in a 13% improved latency. Furthermore, the proposed architecture is flexible allowing modification as per individual application needs. It is demonstrated that the proposed architecture is approximately four times faster than its popular counterparts and at the same time it consumes 50% less energy for envelope detection at 268 MHz sampling rate.
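For reference, the classic non-restoring integer square root that the abstract above takes as its starting point can be sketched in Python as follows. This is the textbook digit-by-digit form, not the modified pipelined variant proposed in the paper. The function name `isqrt_nonrestoring` and the `n_bits` parameter are hypothetical.

```python
def isqrt_nonrestoring(d: int, n_bits: int) -> int:
    """Return floor(sqrt(d)) for 0 <= d < 2**n_bits, with n_bits even.

    One result bit is produced per iteration. A negative partial remainder
    is not restored; the next iteration adds instead of subtracts.
    """
    q = 0  # root, built most significant bit first
    r = 0  # signed partial remainder
    for i in range(n_bits // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3  # next two bits of the radicand
        if r >= 0:
            r = (r << 2) + pair - ((q << 2) | 1)
        else:
            r = (r << 2) + pair + ((q << 2) | 3)
        q = (q << 1) | (1 if r >= 0 else 0)
    return q
```

Each loop iteration maps naturally to one pipeline stage containing a single adder/subtractor, which is why this family of algorithms is attractive for FPGA implementations.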
19.
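Implementing fast, low-power, and area-efficient multipliers is crucial for high-performance microprocessors such as DSP and RISC processors. This paper discusses in detail a new enhanced multiple-output domino logic (EMODL) and a sizing optimization method for its n-MOS evaluation tree, and uses them to implement a high-speed, low-power 20 × 20-bit pipelined multiplier. Finally, HSPICE simulation confirms the advantages of the multiplier structure: short pipeline latency (twice the system clock period), high operating speed (100 MOPS), and low power consumption (23.94 mW).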
20.
Arithmetic operations and expression evaluations are fundamental in computing models. This paper firstly designs arithmetic membranes without priority rules for basic arithmetic operations, and then proposes an algorithm to construct expression P systems based on several of such membranes after designing synchronous and asynchronous transmission strategies among the membranes. For any arithmetic expression, an expression P system can be built to evaluate it effectively. Finally, we discuss different parallelism strategies through which different expression P systems can be built for an arithmetic expression.