62 results found; search time: 15 ms
1.
This paper proposes a software pipelining framework, CALiBeR (Cluster-Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high-ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.
2.
3.
Hui Liu, Zili Shao, Meng Wang, Junzhao Du, Chun Jason Xue, Zhiping Jia 《Journal of Signal Processing Systems》2009,57(2):249-262
In this paper, we combine coarse-grained software pipelining with DVS (Dynamic Voltage/Frequency Scaling) to optimize the energy
consumption of stream-based multimedia applications on multi-core embedded systems. By exploiting the potential of multi-core
architectures and the characteristics of streaming applications, we propose a two-phase approach to the energy minimization
problem for periodic dependent tasks on multi-core processors with discrete voltage levels. In the first phase, we propose a
coarse-grained task-level software pipelining algorithm called RDAG, which transforms the periodic dependent tasks into a set
of independent tasks based on the retiming technique (Leiserson and Saxe, Algorithmica 6:5–35, 1991). In the second phase, we
propose two DVS scheduling algorithms for energy minimization. For single-core processors, we propose a pseudo-polynomial
algorithm based on dynamic programming that can achieve an optimal solution. For multi-core processors, we propose a novel
scheduling algorithm called SpringS, which works like a spring and can effectively reduce energy consumption by iteratively
adjusting task scheduling and voltage selection. We conduct experiments with a set of benchmarks from E3S (Dick 2008) and TGFF,
based on the power model of the AMD Mobile Athlon4 DVS processor. The experimental results show that our technique achieves an
average energy saving of 12.7% compared with the algorithms in Zhang et al. (2002).
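The single-core case described in this abstract (independent tasks, discrete voltage levels, a pseudo-polynomial dynamic program) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that each task offers a list of hypothetical `(time, energy)` options, one per voltage level, with integer times, and that all tasks must finish within one integer `deadline`. The function name `min_energy_levels` is made up for this example.

```python
from typing import List, Optional, Tuple

INF = float("inf")

def min_energy_levels(
    tasks: List[List[Tuple[int, float]]], deadline: int
) -> Optional[Tuple[float, List[int]]]:
    """Pick one (time, energy) option per task so that total time <= deadline
    and total energy is minimal. Runs in O(len(tasks) * deadline * levels),
    i.e. pseudo-polynomial in the deadline. Returns None if infeasible."""
    # best[t] = minimum energy to run the tasks seen so far in exactly t time units
    best = [INF] * (deadline + 1)
    best[0] = 0.0
    picks = []  # picks[i][t] = (level chosen for task i, time used before task i)
    for options in tasks:
        nxt = [INF] * (deadline + 1)
        pick = [None] * (deadline + 1)
        for t in range(deadline + 1):
            if best[t] == INF:
                continue
            for level, (dt, energy) in enumerate(options):
                if t + dt <= deadline and best[t] + energy < nxt[t + dt]:
                    nxt[t + dt] = best[t] + energy
                    pick[t + dt] = (level, t)
        best = nxt
        picks.append(pick)
    end = min(range(deadline + 1), key=lambda t: best[t])
    if best[end] == INF:
        return None
    # Walk the recorded choices backwards to recover the level of each task.
    levels = []
    t = end
    for pick in reversed(picks):
        level, t = pick[t]
        levels.append(level)
    levels.reverse()
    return best[end], levels

# Hypothetical data: option 0 is the fast, high-voltage level; option 1 is slow, low-voltage.
example_tasks = [[(2, 8.0), (4, 3.0)], [(1, 5.0), (3, 2.0)]]
result = min_energy_levels(example_tasks, deadline=6)
```

With the hypothetical data above, slowing down both tasks would take 7 time units and miss the deadline of 6, so the sketch slows only the first task, where the energy saving is larger.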
4.
This paper deals with the transformation and quantization carried out on each inter-predicted residual
block during video encoding, and with their reduced-complexity hardware implementation. H.264/AVC uses a 4 × 4 integer transform
derived from the 4 × 4 DCT. We propose a reduced-complexity algorithm and a pipelined structure for the core forward
integer transform module. A multiplier-less architecture is realized with fewer shifts and adds than existing
designs. The corresponding inverse transform is exactly reversible. Each transformed coefficient is quantized by a scalar
quantizer, and the quantization step size can vary from macroblock to macroblock. The proposed unified pipelined architecture
outperforms many recent implementations in terms of gate count and can process a 4 × 4 residual block in 4 clock
cycles.
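As background for the multiplier-less idea, the sketch below shows the standard H.264/AVC 4 × 4 forward core transform computed with the usual butterfly, which needs 8 additions and 2 shifts per 1-D pass. It is not the reduced-complexity algorithm proposed in the paper, whose details the abstract does not give. The function names are illustrative, and `reference_transform_4x4` is included only to check the butterfly against the matrix form Y = Cf · X · Cfᵀ.

```python
# Core forward transform matrix of H.264/AVC (scaling is folded into quantization).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def forward_core_transform_1d(x):
    """1-D butterfly: 8 additions/subtractions and 2 left shifts, no multiplies."""
    x0, x1, x2, x3 = x
    s03 = x0 + x3
    s12 = x1 + x2
    d03 = x0 - x3
    d12 = x1 - x2
    return [
        s03 + s12,          # row [1,  1,  1,  1]
        (d03 << 1) + d12,   # row [2,  1, -1, -2]
        s03 - s12,          # row [1, -1, -1,  1]
        d03 - (d12 << 1),   # row [1, -2,  2, -1]
    ]

def forward_core_transform_4x4(block):
    """Apply the 1-D butterfly to every row, then to every column."""
    rows = [forward_core_transform_1d(r) for r in block]
    cols = [forward_core_transform_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order

def matmul4(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def reference_transform_4x4(block):
    """Direct matrix form Cf * X * Cf^T, used only as a cross-check."""
    cf_t = [list(r) for r in zip(*CF)]
    return matmul4(matmul4(CF, block), cf_t)

residual = [
    [ 5, 11,  8, 10],
    [ 9,  8,  4, 12],
    [ 1, 10, 11,  4],
    [19,  6, 15,  7],
]
coeffs = forward_core_transform_4x4(residual)
```

The left shifts replace the multiplications by 2 in the transform matrix, which is what makes a multiplier-less datapath possible.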
5.
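Design of an FPGA-based 32-bit floating-point adder. Cited by: 2 (self-citations: 2, citations by others: 0)
Based on a comprehensive analysis of various floating-point adder algorithms, this paper proposes a 32-bit floating-point adder that conforms to the TI format standard and balances speed against area. The design was implemented on a Virtex-4 series FPGA, where it reaches a maximum frequency of 182.415 MHz with reasonable resource usage.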
6.
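Parallel techniques of the LS SIMD computer. Cited by: 2 (self-citations: 0, citations by others: 2)
This article mainly discusses the parallel techniques used in the LS SIMD computer: data parallelism, three-stage instruction pipelining, and parallel execution of three groups of instructions.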
7.
8.
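Software pipelining is an instruction scheduling technique that achieves instruction-level parallelism across loop iterations. It can overcome the impact of multi-cycle instruction latency on CPU performance and keep the loop kernel running at optimal efficiency. Starting with the C64X+, the TMS320C6X family of DSPs introduced SPLOOP technology: on the software side, instructions such as SPLOOP(D/W) and SPKERNEL were added; on the hardware side, dedicated modules such as a software pipeline buffer were added. By using modulo-scheduled software pipelining, SPLOOP effectively reduces code size and improves execution efficiency. In general, the loop code produced by the compiler with SPLOOP is of high quality, and programmers do not need to optimize it further by hand.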
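The general idea of software pipelining described here can be shown with a toy loop restructured into prologue, kernel, and epilogue. This is only a sequential Python model of the schedule, assuming a hypothetical three-stage loop body (load, multiply, add-and-store); it does not model SPLOOP, SPKERNEL, or the hardware loop buffer.

```python
def plain_loop(x, k, c):
    """Reference loop: each iteration runs its three stages back to back."""
    y = []
    for i in range(len(x)):
        a = x[i]         # stage 1: load
        b = a * k        # stage 2: multiply
        y.append(b + c)  # stage 3: add and store
    return y

def pipelined_loop(x, k, c):
    """Same computation, with stages of three consecutive iterations overlapped."""
    n = len(x)
    if n < 2:
        return plain_loop(x, k, c)  # too short to fill the pipeline
    y = []
    # Prologue: start iterations 0 and 1 so the kernel finds the pipeline full.
    a = x[0]       # stage 1 of iteration 0
    b = a * k      # stage 2 of iteration 0
    a = x[1]       # stage 1 of iteration 1
    # Kernel: each pass holds one stage from three different iterations.
    # The three statements read only values produced in earlier passes, so a
    # VLIW machine could issue them in the same cycle.
    for i in range(n - 2):
        y.append(b + c)  # stage 3 of iteration i
        b = a * k        # stage 2 of iteration i + 1
        a = x[i + 2]     # stage 1 of iteration i + 2
    # Epilogue: drain the two iterations still in flight.
    y.append(b + c)  # stage 3 of iteration n - 2
    b = a * k        # stage 2 of iteration n - 1
    y.append(b + c)  # stage 3 of iteration n - 1
    return y

samples = [1, 2, 3, 4, 5]
out = pipelined_loop(samples, k=3, c=1)
```

In a conventional software-pipelined loop the prologue and epilogue are emitted as extra code around the kernel, which is the code-size cost that a hardware loop buffer is meant to avoid.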
9.
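Based on an analysis of the principles of the DES algorithm, this paper describes in detail the design and simulation results of a DES encryption system described in VHDL and implemented on an FPGA. The system adopts a new pipeline design based on precomputed subkeys, which overcomes the shortcomings of traditional pipelined DES implementations and allows the system key to be refreshed dynamically. It further increases processing speed while reducing hardware resource consumption: the maximum clock frequency is 222.77 MHz and the encryption throughput is 14.26 Gb/s, 112 times that of the fastest software implementation. The system is also flexible in design, highly reliable, highly reusable, and easy to upgrade.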
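The reported clock frequency and throughput are consistent with a fully pipelined design that accepts one 64-bit DES block per clock cycle. The abstract does not state this explicitly, so the one-block-per-cycle rate in the short check below is an assumption; the 64-bit block size is that of DES itself. The function name is illustrative.

```python
DES_BLOCK_BITS = 64  # DES operates on 64-bit blocks

def pipelined_throughput_gbps(clock_mhz, block_bits=DES_BLOCK_BITS):
    """Throughput in Gb/s, ASSUMING one block completes every clock cycle."""
    return clock_mhz * 1e6 * block_bits / 1e9

throughput_gbps = pipelined_throughput_gbps(222.77)
# Software speed implied by the "112 times" claim, in Mb/s.
implied_software_mbps = throughput_gbps * 1000 / 112

print(f"{throughput_gbps:.2f} Gb/s, implied software baseline {implied_software_mbps:.0f} Mb/s")
```

Under that assumption the figure rounds to the 14.26 Gb/s given in the abstract, and the 112-fold speedup implies a software baseline of roughly 127 Mb/s.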
10.
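An assembly program optimization scheme for digital signal processors. Cited by: 1 (self-citations: 0, citations by others: 1)
Although most DSPs today support programming in C, in practical engineering work C is mostly used to write flow control and build the project framework, while specific algorithm modules and time-consuming functional modules are still written in assembly language. Assembly programming can exploit the characteristics of the device's own hardware structure to optimize and streamline the code, so that complex algorithms and functional modules achieve very good real-time performance. Approaching the problem from two aspects, instruction parallelism and software pipelining, and taking programs for ADI's TS101 series devices as examples, this paper summarizes general methods for optimizing DSP assembly programs.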