1.

Low-Power Area-Efficient Pipelined A/D Converter Design Using a Single-Ended Amplifier

Daisuke Miyazaki, Shoji Kawahito. Analog Integrated Circuits and Signal Processing, 2000, 25(3): 235-244

This paper presents a new low-power, area-efficient pipelined A/D converter scheme that uses a single-ended amplifier. The proposed multiply-by-two single-ended amplifier, built from switched-capacitor circuits, draws less DC bias current than the conventional fully differential scheme and has low sensitivity to capacitor mismatch, which allows smaller capacitances to be used. The simple high-gain, dynamic-biased regulated cascode amplifier also has an excellent switching response. Together, these properties enable low-power, area-efficient design of high-speed A/D converters. The estimated power dissipation of the 10-bit pipelined A/D converter is less than 12 mW at 20 MSample/s.
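The multiply-by-two amplifier is the residue amplifier of each pipeline stage. The behavioral sketch below shows what that amplifier computes. It assumes an ideal 1.5-bit/stage architecture with comparator thresholds at ±Vref/4, a common choice that the abstract does not state, and it models no amplifier non-idealities. The names `stage_1p5bit` and `pipeline_adc` are hypothetical. The `offset` parameter shows why the stage redundancy tolerates comparator offsets of up to Vref/4.

```python
def stage_1p5bit(vin: float, vref: float = 1.0, offset: float = 0.0) -> tuple[int, float]:
    """Hypothetical ideal 1.5-bit stage: coarse decision d in {-1, 0, 1}
    (comparators at +/-vref/4, both shifted by `offset`) and the residue
    2*vin - d*vref produced by the multiply-by-two amplifier."""
    if vin > vref / 4 + offset:
        d = 1
    elif vin < -vref / 4 + offset:
        d = -1
    else:
        d = 0
    return d, 2.0 * vin - d * vref


def pipeline_adc(vin: float, stages: int = 10, vref: float = 1.0, offset: float = 0.0) -> float:
    """Cascade `stages` stages and digitally recombine the decisions:
    estimate = sum(d_i * vref / 2**i) for i = 1..stages."""
    estimate, weight, v = 0.0, vref / 2, vin
    for _ in range(stages):
        d, v = stage_1p5bit(v, vref, offset)
        estimate += d * weight
        weight /= 2
    return estimate
```

Because every residue stays within ±Vref, the reconstruction error is bounded by Vref / 2**stages, even with a 0.2·Vref comparator offset.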

2.

A Deeply Pipelined Floating-Point Adder

Shao Jie, Wu Wanleng, Yu Hancheng. Chinese Journal of Electron Devices, 2007, 30(3): 911-914

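With the development of digital signal processing technology, FPGAs are increasingly used to implement high-performance scientific computing on high-speed hardware. This paper raises the throughput of a floating-point adder by increasing its number of pipeline stages, and it explores the feasibility of using the abundant flip-flops inside an FPGA to raise the system clock frequency. A multipath floating-point adder architecture is proposed in which exponent and mantissa operations are separated, as are addition and subtraction operations. For single-precision (32-bit) operands on an Altera Stratix II device, an 8-stage pipeline reaches speeds above 356 MHz.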
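A common way to split a floating-point adder into separate paths is the classic two-path (near/far) scheme. The abstract does not say that the proposed multipath adder uses exactly this split, so the sketch below is only a hedged illustration of why separating the paths shortens each one. It assumes the operation is a + b on single-precision values, and `fields` and `select_path` are hypothetical helper names.

```python
import struct


def fields(x: float) -> tuple[int, int, int]:
    """Split an IEEE-754 single-precision value into (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def select_path(a: float, b: float) -> str:
    """Classic two-path split for a + b (an assumption, not the paper's design).

    near: effective subtraction with exponent difference <= 1. Alignment is
          at most 1 bit, but cancellation may need a long normalization shift.
    far:  every other case. Alignment may be long, but normalization needs
          at most a 1-bit shift.
    """
    sa, ea, _ = fields(a)
    sb, eb, _ = fields(b)
    effective_subtraction = sa != sb
    if effective_subtraction and abs(ea - eb) <= 1:
        return "near"
    return "far"
```

Each path therefore needs only one long shifter, which is what makes deeper, shorter pipeline stages practical.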

3.

Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing

Lee H.-Y., Park I.-C. IEEE Transactions on Circuits and Systems I: Regular Papers, 2007, 54(4): 889-900

This paper presents an area-efficient algorithm for the pipelined processing of the fast Fourier transform (FFT). The proposed algorithm decomposes a discrete Fourier transform (DFT) into two balanced sub-DFTs to minimize the total number of twiddle factors that must be stored in tables. The radix of the decomposition is changed adaptively according to the remaining transform length, so that the sub-DFTs produced by the decomposition have lengths as close as possible. An 8192-point pipelined FFT processor designed for digital video broadcasting-terrestrial (DVB-T) systems saves 33% of general multipliers and 23% of the total twiddle-factor table size compared with a conventional pipelined FFT processor based on the radix-2² algorithm. In addition to the decomposition, several area-reducing implementation techniques are proposed, such as a simple twiddle-factor index generator and add/subtract units combined with the two's complement operation.
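The decomposition can be illustrated in software. The sketch below makes several simplifying assumptions. It handles only power-of-two lengths. It splits N = N1 × N2 with N1 chosen as the power of two nearest √N, a stand-in for the paper's adaptive radix choice. It is a recursive program rather than a pipelined datapath, and it does not model twiddle-factor tables. `balanced_split` and `fft_balanced` are hypothetical names. The input index is N2·n1 + m and the output index is k1 + N1·k2, with twiddle factors W_N^(m·k1) applied between the two stages.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used as the base case and as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]


def balanced_split(n: int) -> tuple[int, int]:
    """For a power-of-two n, choose n1 * n2 = n with n1 and n2 as close as possible."""
    n1 = 1 << ((n.bit_length() - 1) // 2)
    return n1, n // n1


def fft_balanced(x):
    n = len(x)
    if n <= 2:
        return dft(x)
    n1, n2 = balanced_split(n)
    # Stage 1: N2 sub-DFTs of length N1 over the decimated sequences x[m::N2].
    inner = [fft_balanced(x[m::n2]) for m in range(n2)]
    # Twiddle multiplication W_N^(m*k1) between the two sets of sub-DFTs.
    for m in range(n2):
        for k1 in range(n1):
            inner[m][k1] *= cmath.exp(-2j * cmath.pi * m * k1 / n)
    # Stage 2: N1 sub-DFTs of length N2 across m, written to index k1 + N1*k2.
    out = [0j] * n
    for k1 in range(n1):
        column = fft_balanced([inner[m][k1] for m in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = column[k2]
    return out
```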

4.

Design of a 43-Bit Floating-Point Pipelined Multiplier

Liang Feng, Shao Zhibiao, Sun Haijun. Chinese Journal of Electron Devices, 2006, 29(4): 1094-1096, 1102

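A floating-point pipelined multiplier IP core is proposed. The multiplier uses an improved third-order (radix-8) Booth algorithm to reduce the number of partial products, together with a Wallace-tree compression array that mixes different compressor types. The 5-2 compressors, 4-2 compressors and 64-bit carry-lookahead adder (CLA) on the critical path were optimized, which effectively reduces the multiplier's delay and area. FPGA simulation shows that the multiplier is 15.4% faster than a comparable multiplier unit recently offered by Altera.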
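Assuming "third-order Booth" means radix-8 recoding (the usual reading), the sketch below shows how the partial-product count shrinks. Each group of three multiplier bits, plus one overlap bit, becomes a digit in [-4, 4]. This is the textbook recoding, not the paper's improved variant, and `booth_radix8_digits` and `booth_radix8_multiply` are hypothetical names.

```python
def booth_radix8_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit unsigned multiplier y into radix-8 Booth digits in [-4, 4].
    Digit g reads bits 3g+2, 3g+1, 3g and the overlap bit 3g-1."""
    def bit(i: int) -> int:
        return (y >> i) & 1 if i >= 0 else 0

    groups = (n + 3) // 3  # ceil((n + 1) / 3): a zero top bit keeps the last digit >= 0
    return [-4 * bit(3 * g + 2) + 2 * bit(3 * g + 1) + bit(3 * g) + bit(3 * g - 1)
            for g in range(groups)]


def booth_radix8_multiply(x: int, y: int, n: int) -> int:
    """Sum one partial product per digit. Digits 0, 1, 2 and 4 are shifts of x;
    the 'hard multiple' 3x needs an extra adder in hardware."""
    three_x = x + (x << 1)
    total = 0
    for g, d in enumerate(booth_radix8_digits(y, n)):
        multiple = three_x if abs(d) == 3 else abs(d) * x
        total += (multiple if d >= 0 else -multiple) << (3 * g)
    return total
```

For a 10-bit multiplier this yields 4 partial products instead of 10, which is what shrinks the Wallace tree that follows.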

5.

Implementation of Floating-Point Custom Instructions for NIOS

Chen Peng, Cai Xuemei. Modern Electronics Technique, 2011, 34(10): 166-168

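To improve the floating-point computation efficiency of NIOS systems, functional modules for single-precision floating-point addition/subtraction and multiplication were implemented in Verilog and verified through waveform simulation. Following the NIOS II custom-instruction specification, these functions were added to SOPC Builder, extending the processor with new hardware-based floating-point instructions that can be used in the NIOS software environment. A comparison of results and execution times between NIOS II's built-in software floating-point computation and the new hardware instructions confirms the advantage of the hardware instructions, providing a more efficient option for floating-point arithmetic on NIOS.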

6.

A Compact DSP Core with Static Floating-Point Arithmetic

Tay-Jyi Lin, Hung-Yueh Lin, Chie-Min Chao, Chih-Wei Liu, Chein-Wei Jen. The Journal of VLSI Signal Processing, 2006, 42(2): 127-138

A multimedia system-on-a-chip (SoC) usually contains one or more programmable digital signal processors (DSPs) to accelerate data-intensive computations. However, most of these DSP cores were originally designed for standalone applications, so some of their components overlap with (and duplicate) those of the host microprocessor. This paper presents a compact DSP for multi-core systems that is fully programmable and optimized to execute a set of signal processing kernels very efficiently. The DSP core was designed concurrently with its automatic software generator, which is based on high-level synthesis. Moreover, it performs a lightweight arithmetic, static floating-point (SFP), which approximates the quality of floating-point (FP) operations with hardware similar to that of integer arithmetic. In our simulations, the compact DSP and its auto-generated software achieve 3× the performance (estimated in cycles) of the DSP cores in dual-core baseband processors with similar computing resources. In addition, 16-bit SFP achieves a signal-to-round-off-noise ratio above 40 dB relative to IEEE single-precision FP, and it even outperforms hand-optimized programs based on 32-bit integer arithmetic. 24-bit SFP achieves above 64 dB, and its maximum precision is identical to that of single-precision FP. Finally, the DSP core has been implemented and fabricated in UMC 0.18 μm 1P6M CMOS technology. It operates at 314.5 MHz while consuming 52 mW average power. The core size is only 1.5 mm × 1.5 mm, including the 16 KB on-chip memory and the AMBA AHB interface.

This work was supported by the National Science Council, Taiwan, under Grant NSC93-2220-E-009-017. The authors also thank the National Chip Implementation Center (CIC) for chip fabrication.

Tay-Jyi Lin received the BS degree in electrical and control engineering from National Chiao Tung University, Taiwan, in 1998. He is working toward the PhD degree in the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University. His current research includes heterogeneous computing platforms for embedded multimedia systems, complexity-aware architecture design, and high-performance/low-power digital signal processors.

Hung-Yueh Lin received the BS and MS degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2002 and 2004, respectively. He is now with MediaTek, Inc., Hsinchu, Taiwan. His research interests include lightweight computer arithmetic and DSP architecture.

Chie-Min Chao received the BS degree in electronics engineering from National Chiao Tung University, Taiwan, in 2003, where he is currently pursuing his MS degree. His research includes system software development, VLSI system design, and DSP architecture.

Chih-Wei Liu received the BS and PhD degrees in electrical engineering from National Tsing Hua University, Taiwan, in 1991 and 1999, respectively. From 1999 to 2000, he was an integrated circuit design engineer at the Electronics Research and Service Organization (ERSO) of the Industrial Technology Research Institute (ITRI), Taiwan. Near the end of 2000, he joined the SoC Technology Center (STC) of ITRI as a project leader, and he left ITRI at the end of October 2003. He is currently an assistant professor in the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Taiwan. His current research interests include SoC and VLSI system design, processor architecture, digital signal processing, digital communications, and coding theory.

Chein-Wei Jen received the BS degree from National Chiao Tung University, Taiwan, in 1970, the MS degree from Stanford University in 1977, and the PhD degree from National Chiao Tung University in 1983. From 1981 to 2004, he was with the Department of Electronics Engineering and the Institute of Electronics at National Chiao Tung University. Dr. Jen received the Outstanding Electrical Engineering Professor Award from the Chinese Institute of Electrical Engineering in 2002. He is currently the General Director of the SoC Technology Center at the Industrial Technology Research Institute, the Adviser of the National SoC Program, and the Managing Director of the Board of the Taiwan IC Design Society. His research interests include SoC design, VLSI architectures, multimedia processing, and design automation. He holds seven patents and has published over 50 journal and 100 conference papers in these areas.
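The abstract defines SFP only as floating-point quality on integer-like hardware. One plausible reading, assumed here and not taken from the paper, is that each variable's exponent is fixed at compile time from its known value range, which leaves a plain integer datapath. The sketch shows that idea together with the signal-to-round-off-noise ratio (SQNR) the abstract reports. Python doubles stand in for the single-precision reference, and all function names are hypothetical.

```python
import math


def static_exponent(max_abs: float) -> int:
    """Hypothetical compile-time choice: the smallest e with max_abs < 2**e (max_abs > 0)."""
    return math.floor(math.log2(max_abs)) + 1


def quantize(x: float, e: int, bits: int) -> float:
    """Signed `bits`-bit integer mantissa with a fixed (static) scale of 2**(e - bits + 1)."""
    step = 2.0 ** (e - bits + 1)
    top = 2 ** (bits - 1) - 1
    q = max(-top - 1, min(top, round(x / step)))
    return q * step


def sqnr_db(reference, approx) -> float:
    """Signal-to-round-off-noise ratio: 10*log10(sum(x^2) / sum((x - x_hat)^2))."""
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return 10 * math.log10(signal / noise)
```

For a sine of amplitude 0.9, a 16-bit mantissa gives roughly 97 dB and an 8-bit mantissa roughly 49 dB, about 6 dB per bit, as expected for uniform round-off.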

7.

Combinational Logic Design of a Single-Cycle Floating-Point Neuron

Wang Shoujue, Li Weijun, Chen Xu. Chinese Journal of Semiconductors, 2004, 25(11): 1505-1509

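This paper describes the hardware design of the single-cycle floating-point neuron in the general-purpose neural computer CASSANDRA-II. Based on the general hyper-surface neuron model, floating-point addition, multiplication and p-th power are implemented using combinational circuits and EPROM lookup tables. The result is a combinational-logic neuron that evaluates the floating-point expression |W(X-Y)|^p within a single cycle. The design gives general-purpose neural computer hardware greater adaptability and better network performance.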

9.

An Improved Digital Self-Calibration Algorithm for Pipelined A/D Converters

Qian Liming, Yao Jiannan, Wu Jin, Li Bing. Microelectronics, 2009, 39(1)

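Digital self-calibration algorithms are increasingly used in high-resolution pipelined ADCs. Pipelined ADCs based on digital self-calibration currently use a 1.5-bit/stage architecture in most cases. After analyzing the strengths and weaknesses of the various architectures, the authors chose a 2-bit/stage architecture, which has strong advantages in chip power and area, and designed an improved digital self-calibration algorithm suited to it. The improved algorithm solves the problem of inaccurate calibration parameters in existing digital self-calibration algorithms, so the calibrated output data are more accurate. Experimental results show that the improved algorithm greatly improves the linearity of the system.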
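Why accurate calibration parameters matter can be shown with a single stage whose residue-amplifier gain is off. The sketch below is a generic illustration under stated assumptions, not the paper's algorithm. It uses a 1.5-bit first stage for brevity, although the paper targets 2 bits/stage. It assumes an ideal 14-bit back end and a hypothetical true gain of 1.96. The gain is estimated by a simple foreground method, applying two known inputs in the same decision region and taking the slope of the digitized residue. All names are hypothetical.

```python
def first_stage(vin: float, gain: float) -> tuple[int, float]:
    """Hypothetical 1.5-bit first stage with a gain error (ideal gain is 2.0)."""
    d = 1 if vin > 0.25 else (-1 if vin < -0.25 else 0)
    return d, gain * vin - d


def backend(v: float, bits: int = 14) -> float:
    """Hypothetical ideal back-end ADC quantizing v in [-1, 1]."""
    scale = 2 ** (bits - 1)
    return round(v * scale) / scale


def convert(vin: float, true_gain: float, assumed_gain: float) -> float:
    """Digital reconstruction vin ~ (d + residue) / gain, using the assumed gain."""
    d, residue = first_stage(vin, true_gain)
    return (d + backend(residue)) / assumed_gain


def calibrate_gain(true_gain: float, v1: float = 0.3, v2: float = 0.9) -> float:
    """Foreground estimate: within one decision region the digitized residue
    has slope equal to the stage gain."""
    _, r1 = first_stage(v1, true_gain)
    _, r2 = first_stage(v2, true_gain)
    return (backend(r2) - backend(r1)) / (v2 - v1)


TRUE_GAIN = 1.96  # hypothetical capacitor-mismatch-induced gain
g_cal = calibrate_gain(TRUE_GAIN)
inputs = [k / 100 for k in range(-100, 101)]
err_nominal = max(abs(convert(v, TRUE_GAIN, 2.0) - v) for v in inputs)
err_calibrated = max(abs(convert(v, TRUE_GAIN, g_cal) - v) for v in inputs)
```

Reconstructing with the nominal gain of 2 leaves an error of about 2% of full scale. With the calibrated gain, the error drops to roughly the back-end quantization level, which is why an inaccurate calibration parameter directly limits linearity.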

10.

A Novel Architecture of Special Arithmetic Function Unit for Area-Efficient Programmable Vertex Shader

CHANG Yisong, WEI Jizeng, ZHAO Guoyu, GUO Wei, SUN Jizhou. Chinese Journal of Electronics, 2013, (3): 483-488

This paper presents a novel architecture for a high-precision, floating-point special arithmetic function unit (SFU) for elementary transcendental functions, designed to provide area efficiency as well as high performance in a programmable vertex shader. Architecturally, the quadratic approximation of the special functions is evaluated on the shader's shared SIMD vector unit, which minimizes processing latency and reduces the area of the SFU. An optimized minimax approach is also proposed to obtain finite-length, normalized quadratic coefficients for high precision. Experimental results show that the proposed SFU significantly reduces area cost, and that a vertex shader with a transport-triggered architecture (TTA) that adopts it achieves a 15.0% average improvement in performance/area ratio across various shading benchmarks.
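The table-plus-quadratic evaluation scheme can be sketched as follows. The leading bits of the argument select a segment, and the segment's three coefficients are combined with two multiply-adds (Horner form), the operations a shared SIMD multiply-add unit provides. As a hedged stand-in for the paper's optimized minimax fit, the coefficients come from interpolation at Chebyshev nodes, which is near-minimax. The 64-segment table, the example function log2(1 + x) and all names are assumptions.

```python
import math

SEGMENTS = 64  # hypothetical table size


def _quadratic_through(t, y):
    """Coefficients (a, b, c) of a + b*t + c*t**2 through three points (Newton form)."""
    f01 = (y[1] - y[0]) / (t[1] - t[0])
    f12 = (y[2] - y[1]) / (t[2] - t[1])
    f012 = (f12 - f01) / (t[2] - t[0])
    return (y[0] - f01 * t[0] + f012 * t[0] * t[1],
            f01 - f012 * (t[0] + t[1]),
            f012)


def build_table(f, segments: int = SEGMENTS):
    """Per-segment quadratic coefficients for f on [0, 1), fitted at the three
    Chebyshev nodes of each segment (a near-minimax approximation)."""
    h = 1.0 / segments
    nodes = [h / 2 * (1 - math.cos((2 * k + 1) * math.pi / 6)) for k in range(3)]
    return [_quadratic_through(nodes, [f(i * h + t) for t in nodes])
            for i in range(segments)]


def evaluate(table, x: float) -> float:
    """Table lookup on the leading bits of x, then two multiply-adds."""
    i = min(int(x * len(table)), len(table) - 1)
    a, b, c = table[i]
    t = x - i / len(table)
    return a + t * (b + t * c)
```

With 64 segments, the maximum error for log2(1 + x) on [0, 1) is below 10^-6.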

11.

A Highly Efficient Multicore Floating-Point FFT Architecture Based on Hybrid Linear Algebra/FFT Cores

Ardavan Pedram, John D. McCalpin, Andreas Gerstlauer. Journal of Signal Processing Systems, 2014, 77(1-2): 169-190

FFT algorithms have memory access patterns that prevent many architectures from achieving high computational utilization, particularly when parallel processing is required to reach the desired level of performance. Starting from a highly efficient hybrid linear algebra/FFT core, we co-design the on-chip memory hierarchy, the on-chip interconnect and the FFT algorithms for a multicore FFT processor. We show that it is possible to achieve excellent parallel scaling while keeping power and area efficiency comparable to those of the single-core solution. The result is an architecture that can effectively use up to 16 hybrid cores for transform sizes that fit in on-chip SRAM. When configured with 12 MiB of on-chip SRAM, our technology evaluation shows that the proposed 16-core FFT accelerator should sustain 388 GFLOPS of nominal double-precision performance, with power and area efficiencies of 30 GFLOPS/W and 2.66 GFLOPS/mm², respectively.
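As a quick consistency check, dividing the sustained throughput by the two efficiency figures gives the implied total power and silicon area. These totals are derived here, not reported in the abstract, and they assume that both efficiencies refer to the same 16-core, 12 MiB configuration.

```latex
P \approx \frac{388\ \mathrm{GFLOPS}}{30\ \mathrm{GFLOPS/W}} \approx 12.9\ \mathrm{W},
\qquad
A \approx \frac{388\ \mathrm{GFLOPS}}{2.66\ \mathrm{GFLOPS/mm^2}} \approx 146\ \mathrm{mm^2}
```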

12.

Analysis and Implementation of a Novel Leading Zero Anticipation Algorithm for Floating-Point Arithmetic Units

Olivieri M., Pappalardo F., Smorfa S., Visalli G. IEEE Transactions on Circuits and Systems II: Express Briefs, 2007, 54(8): 685-689

Leading zero anticipation (LZA) with error correction is a widely adopted technique in the implementation of high-speed IEEE-754-compliant floating-point units (FPUs), which are critical for area and power in multimedia-oriented systems-on-chips. We investigated a novel LZA algorithm that allows the error-correction circuitry to be removed, because it reduces the error rate below a commonly accepted limit for image processing applications, which previous techniques do not achieve. We embedded our technique into a complete FPU, obtaining both area savings and a reduction in overall FPU latency with respect to traditional designs.

13.

Evaluation of Sticky-Bit Generation Methods for Floating-Point Multipliers

Mustafa Gök, Metin Mete Özbilen. Journal of Signal Processing Systems, 2009, 56(1): 51-57

IEEE-754 rounding support increases the critical delay of floating-point multipliers. Except for round-to-zero, all IEEE rounding modes test whether any of the n-2 least significant product bits is one. The result of this test is indicated by the sticky bit. Because fast generation of the sticky bit is critical for performance, various sticky-bit generation designs have been developed. This paper compares previous fast sticky-bit generation designs and proposes a novel design that is independent of the multiplier's hardware. Thus, the proposed design can be used in any floating-point multiplier or floating-point multiply-accumulate circuit. The proposed method is one of the fastest of all the methods, and it uses the fewest hardware resources among the designs based on the same idea.
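One classic way to make the sticky bit independent of the multiplier array is to compute it from the operands' trailing zeros. The odd parts of the operands multiply to an odd number, so the product has exactly tz(a) + tz(b) trailing zeros. This is offered as a hedged illustration of the general idea, not necessarily the paper's proposed design. Here `k` plays the role of the n-2 low product bits, and the function names are hypothetical.

```python
def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


def sticky_from_product(a: int, b: int, k: int) -> int:
    """Reference: OR of the k least significant bits of the full product."""
    return int((a * b) & ((1 << k) - 1) != 0)


def sticky_from_operands(a: int, b: int, k: int) -> int:
    """Trailing-zero method: the low k product bits are all zero exactly when
    tz(a) + tz(b) >= k. Only the operands are needed, so this can run in
    parallel with any multiplier array."""
    return int(trailing_zeros(a) + trailing_zeros(b) < k)
```

An exhaustive check over all 6-bit significands (hidden bit set) confirms that both functions agree.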

14.

VHDL Design of a Floating-Point Adder IP Core

He Qingping, Liu Zuolian, Lin Shaowei. Shanxi Electronic Technology, 2006, (4): 34-36

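Floating-point addition is the most frequently used floating-point operation. Combining VHDL with FPGA programmable technology, the authors completed the VHDL design of a floating-point adder IP core that has a 5-stage pipeline, complies with the IEEE 754 floating-point standard and can be parameterized for single or double precision.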
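The steps such an adder performs can be written as a bit-accurate model, parameterized by exponent and fraction widths in the same way as the single/double-precision IP core. This is a hedged sketch, not the authors' VHDL. The five commented stages are one plausible partition of the work, not the paper's pipeline. The model handles only normal operands and normal results (no zero, subnormal, infinity or NaN inputs), and `fp_add`, `f64_bits` and `bits_f64` are hypothetical names. Rounding is round-to-nearest-even using guard, round and a "jammed" sticky bit.

```python
import struct


def f64_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_f64(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fp_add(a: int, b: int, ebits: int, fbits: int) -> int:
    """Add two normal IEEE-754 numbers given as (1 + ebits + fbits)-bit patterns."""
    emax = (1 << ebits) - 1

    def unpack(x):  # (sign, biased exponent, significand with hidden bit)
        return x >> (ebits + fbits), (x >> fbits) & emax, (x & ((1 << fbits) - 1)) | (1 << fbits)

    (sa, ea, ma), (sb, eb, mb) = unpack(a), unpack(b)
    # Stage 1: compare and swap so that operand a has the larger magnitude.
    if (ea, ma) < (eb, mb):
        sa, ea, ma, sb, eb, mb = sb, eb, mb, sa, ea, ma
    # Stage 2: align; 3 extra low bits hold guard, round and a jammed sticky bit.
    ma, mb, d = ma << 3, mb << 3, ea - eb
    if d >= fbits + 4:
        mb = 1  # every bit shifts out; only the sticky bit survives
    elif d > 0:
        mb = (mb >> d) | int(mb & ((1 << d) - 1) != 0)
    # Stage 3: effective addition or subtraction of the significands.
    m = ma + mb if sa == sb else ma - mb
    if m == 0:
        return 0  # exact cancellation gives +0 under round-to-nearest
    # Stage 4: normalize so the hidden bit sits at position fbits + 3.
    e = ea
    if m >> (fbits + 4):  # carry out: 1-bit right shift, keeping the sticky bit
        m, e = (m >> 1) | (m & 1), e + 1
    while not m >> (fbits + 3):  # cancellation: shift left
        m, e = m << 1, e - 1
    # Stage 5: round to nearest even on the 3 extra bits, then repack.
    extra, m = m & 7, m >> 3
    if extra > 4 or (extra == 4 and m & 1):
        m += 1
        if m >> (fbits + 1):  # rounding carried into a new leading bit
            m, e = m >> 1, e + 1
    if e >= emax:
        return (sa << (ebits + fbits)) | (emax << fbits)  # overflow to infinity
    if e <= 0:
        raise ValueError("subnormal result: not handled in this sketch")
    return (sa << (ebits + fbits)) | (e << fbits) | (m & ((1 << fbits) - 1))
```

With `ebits=11, fbits=52` the model reproduces Python's own double-precision addition bit for bit (for example, `bits_f64(fp_add(f64_bits(0.1), f64_bits(0.2), 11, 52)) == 0.1 + 0.2`), and `ebits=8, fbits=23` gives the single-precision variant.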

15.

Pipelined IIR Filter Architecture Using Pole-Radius Minimization

Nigel Boston. The Journal of VLSI Signal Processing, 2005, 39(3): 323-331

An extension of a polynomial consists of the polynomial plus higher-power terms. Given a polynomial with real coefficients and an integer larger than its degree, a method is given that produces a finite list of extensions of that larger degree, such that the list necessarily contains the extension whose largest root magnitude is as small as possible. This extension is called the pole-radius minimizer, and it is then found by the finite check of comparing the polynomials in the list. The method is applied to obtain filter transformations that are optimal with regard to throughput and also offer considerable savings in hardware overhead compared with standard methods such as Scattered Lookahead and Minimum Order Augmentation. The table in Section 5 gives an explicit comparison for various kinds of filters.

Nigel Boston received his undergraduate degree from Cambridge University, UK, followed by a Ph.D. in mathematics from Harvard in 1987. After a year at IHES, France, and two years as a Morrey Assistant Professor at Berkeley, he went to the University of Illinois at Urbana-Champaign. There he was founding director of the Illinois Center for Cryptography and Information Protection in the Coordinated Science Lab at UIUC and organized the first three Midwest Arithmetical Geometry in Cryptography meetings. He left UIUC in 2002 to become a Full Professor at the University of Wisconsin, with a split appointment in mathematics and ECE and an affiliate appointment in CS. He works on applications of algebra and number theory to engineering in areas such as cryptography, coding theory, watermarking, and biometrics.

16.

Design and Implementation of an FPGA-Based Floating-Point FIR Filter

Zhu Lei, Wang Bin. Microelectronics & Computer, 2007, 24(7): 59-62

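Because signals in shortwave (HF) wideband receiver systems have a large dynamic range, a custom 24-bit floating-point format was defined, and pipelined adder and multiplier units for this format were designed. After analyzing the strengths and weaknesses of various FIR filter structures, and taking the characteristics of FPGAs into account, the authors present a transposed-form FIR correction filter design. Finally, with a 2.5 MS/s wideband signal as input and an Altera EP2S60F672C5 device as the hardware platform, a 250th-order FIR correction filter for a 10-channel shortwave wideband receiver was implemented in simulation, with a maximum operating speed above 130 MHz.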
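In the transposed form, every input sample is multiplied by all coefficients at once, and the products are accumulated through a chain of registers. This places one adder and one register between taps, which suits FPGA pipelining. The sketch below shows the structure with ordinary Python numbers, not the paper's custom 24-bit floating-point units, and checks it against the direct-form convolution. The function names are hypothetical.

```python
def fir_direct(h, x):
    """Reference direct form: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_transposed(h, x):
    """Transposed form: regs[i] holds the partial sum that will reach the output
    i + 1 samples later; each update is one multiply-add per tap."""
    n_taps = len(h)
    regs = [0.0] * (n_taps - 1)
    y = []
    for sample in x:
        y.append(h[0] * sample + (regs[0] if regs else 0.0))
        # Ascending i reads regs[i + 1] before it is overwritten (old value).
        for i in range(n_taps - 1):
            nxt = regs[i + 1] if i + 1 < n_taps - 1 else 0.0
            regs[i] = h[i + 1] * sample + nxt
    return y
```

For example, feeding an impulse through `fir_transposed([1, 2, 3, 4], ...)` returns the coefficients in order, as the direct form does.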

17.

VLSI Implementation of Double-Precision Floating-Point Multiplier Using Karatsuba Technique

Manish Kumar Jaiswal, Ray C. C. Cheung. Circuits, Systems, and Signal Processing, 2013, 32(1): 15-27

Double-precision floating-point arithmetic, especially multiplication, is widely used in many scientific and signal processing applications. A double-precision floating-point multiplier generally requires a large 53×53-bit mantissa multiplication to produce the final result, and this mantissa multiplication limits both the area and the performance of the operation. This paper presents a novel way to reduce this large multiplication, using less multiplication hardware than the traditional method. The multiplication is performed with the Karatsuba technique. The design specifically targets Field Programmable Gate Array (FPGA) platforms, and it has also been evaluated in an ASIC flow. The proposed module gives excellent performance with efficient use of resources. The design is fully compatible with IEEE standard double precision, and it performs better than the best multipliers reported in the literature.
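A single Karatsuba level replaces the four sub-products of a schoolbook split with three. The sketch below splits the 53-bit significands into a 26-bit high part and a 27-bit low part. This split point is an assumption, since the paper's actual partitioning, likely tuned to FPGA DSP-block sizes, is not given in the abstract. `karatsuba_53` is a hypothetical name.

```python
def karatsuba_53(a: int, b: int) -> int:
    """One Karatsuba level for 53-bit significands: three sub-products instead of four."""
    half = 27  # hypothetical split: 26-bit high part, 27-bit low part
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    hh = a_hi * b_hi                               # 26 x 26 multiplier
    ll = a_lo * b_lo                               # 27 x 27 multiplier
    mid = (a_hi + a_lo) * (b_hi + b_lo) - hh - ll  # 28 x 28, replaces both cross products
    return (hh << (2 * half)) + (mid << half) + ll
```

The trade-off is that the middle product is one or two bits wider and needs extra adders and subtractors, in exchange for removing a whole sub-multiplier.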

18.

Novel Pipelined Architecture for Efficient Evaluation of the Square Root Using a Modified Non-Restoring Algorithm

Imtiaz Sajid, M. M. Ahmed, Sotirios G. Ziavras. Journal of Signal Processing Systems, 2012, 67(2): 157-166

The square root is a basic arithmetic operation in image and signal processing. We present a novel pipelined architecture that implements an N-bit fixed-point square root on an FPGA using a non-restoring pipelined algorithm that does not require floating-point hardware. Pipelining hazards in its hardware realization are avoided by modifying the classic non-restoring algorithm, which improves latency by 13%. Furthermore, the architecture is flexible and can be modified to suit individual application needs. The proposed architecture is shown to be approximately four times faster than its popular counterparts while consuming 50% less energy for envelope detection at a 268 MHz sampling rate.
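For reference, the sketch below shows the classic non-restoring integer square root that the paper starts from, not its hazard-free modification. Each iteration brings in two radicand bits and produces one root bit. When the partial remainder goes negative it is not restored; the next iteration adds instead of subtracting. `nonrestoring_isqrt` is a hypothetical name, and the radicand width `n` is assumed to be even.

```python
def nonrestoring_isqrt(d: int, n: int) -> int:
    """Classic non-restoring square root of an n-bit unsigned radicand (n even)."""
    q, r = 0, 0
    for i in range(n // 2 - 1, -1, -1):
        pair = (d >> (2 * i)) & 3  # next two radicand bits
        if r >= 0:
            r = (r << 2) + pair - ((q << 2) | 1)  # trial subtraction of 4q + 1
        else:
            r = (r << 2) + pair + ((q << 2) | 3)  # correction folded into an addition
        q = (q << 1) | (1 if r >= 0 else 0)
    return q
```

Each iteration is a single add/subtract whose sign selects both the next operation and the next root bit, so the loop unrolls naturally into one pipeline stage per root bit.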

19.

A High-Speed, Low-Power Pipelined Multiplier Implemented with EMODL

Wang Qi, Shao Bingxian. Research & Progress of Solid State Electronics, 2004, 24(3): 363-368

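Fast, low-power, area-efficient multipliers are crucial for high-performance microprocessors such as DSPs and RISC processors. This paper describes in detail a new enhanced multiple-output domino logic (EMODL) and a sizing optimization method for its n-MOS evaluation tree, and it uses this logic to implement a high-speed, low-power 20 × 20-bit pipelined multiplier. HSPICE simulation confirms the advantages of the multiplier architecture: low pipeline latency (2 system clock cycles), high speed (100 MOPS) and low power (23.94 mW).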

20.

Arithmetic Expression Evaluations with Membranes

GUO Ping, CHEN Haizhu, ZHENG Hui. Chinese Journal of Electronics, 2014, (1): 55-60

Arithmetic operations and expression evaluation are fundamental in computing models. This paper first designs arithmetic membranes without priority rules for the basic arithmetic operations. It then designs synchronous and asynchronous transmission strategies among the membranes and proposes an algorithm that constructs expression P systems from several such membranes. For any arithmetic expression, an expression P system can be built to evaluate it effectively. Finally, we discuss different parallelism strategies through which different expression P systems can be built for the same arithmetic expression.
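To make the idea of an arithmetic membrane concrete, the sketch below simulates one maximally parallel rewriting step on a multiset of objects, with operands encoded as copies of objects `a` and `b`. The two rule sets, for addition and subtraction without priorities, are hypothetical illustrations of the general technique, not the membranes constructed in the paper. They also assume that rules never compete for the same objects, so firing them in order is equivalent to the nondeterministic semantics.

```python
from collections import Counter

Rule = tuple[Counter, Counter]  # (objects consumed, objects produced)


def step(multiset: Counter, rules: list[Rule]) -> Counter:
    """One maximally parallel step: each rule fires as many times as the
    remaining objects allow."""
    available = Counter(multiset)
    produced = Counter()
    for consumed, output in rules:
        times = min(available[obj] // k for obj, k in consumed.items())
        for obj, k in consumed.items():
            available[obj] -= k * times
        for obj, k in output.items():
            produced[obj] += k * times
    return available + produced  # Counter addition drops zero counts


# Hypothetical membranes for operands m (copies of a) and n (copies of b):
ADD = [(Counter(a=1), Counter(c=1)), (Counter(b=1), Counter(c=1))]  # a -> c, b -> c
SUB = [(Counter(a=1, b=1), Counter())]  # a b -> nothing; leftovers give m - n
```

For instance, `step(Counter(a=3, b=4), ADD)` yields seven copies of `c`. Under SUB, leftover copies of `a` encode a positive difference and leftover copies of `b` a negative one.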