62 results found; search took 15 ms
1.
2.
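Parallel techniques in the LS SIMD computer  Total citations: 2, self-citations: 0, citations by others: 2
This article discusses the parallel techniques used in the LS SIMD computer: data parallelism, three-stage instruction pipelining, and parallel execution of three groups of instructions.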
3.
Hui Liu, Zili Shao, Meng Wang, Junzhao Du, Chun Jason Xue, Zhiping Jia 《Journal of Signal Processing Systems》2009,57(2):249-262
In this paper, we combine coarse-grained software pipelining with DVS (Dynamic Voltage/Frequency Scaling) for optimizing energy
consumption of stream-based multimedia applications on multi-core embedded systems. By exploiting the potential of multi-core
architecture and the characteristic of streaming applications, we propose a two-phase approach to solve the energy minimization
problem for periodic dependent tasks on multi-core processors with discrete voltage levels. With our approach, in the first
phase, we propose a coarse-grained task-level software pipelining algorithm called RDAG to transform the periodic dependent
tasks into a set of independent tasks based on the retiming technique (Leiserson and Saxe, Algorithmica 6:5–35, 1991). In the second phase, we propose two DVS scheduling algorithms for energy minimization. For single-core processors, we propose
a pseudo-polynomial algorithm based on dynamic programming that can achieve an optimal solution. For multi-core processors, we
propose a novel scheduling algorithm called SpringS which works like a spring and can effectively reduce energy consumption
by iteratively adjusting task scheduling and voltage selection. We conduct experiments with a set of benchmarks from E3S (Dick
2008) and TGFF () based on the power model of the AMD Mobile Athlon4 DVS processor. The experimental results show that our technique can achieve
12.7% energy saving compared with the algorithms in Zhang et al. (2002) on average.
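To make the single-core case concrete, here is a minimal sketch of a pseudo-polynomial dynamic program for choosing one discrete voltage level per task under a deadline. It is not the algorithm from the paper: the function name `min_energy`, the integer time units, and the (duration, energy) option pairs are assumptions made for illustration, and tasks are assumed to run back to back on one core.

```python
# Illustrative sketch only; min_energy and its data layout are hypothetical,
# not taken from the paper. Each task offers a list of (duration, energy)
# options, one per discrete voltage level, with integer durations.
INF = float("inf")

def min_energy(tasks, deadline):
    """Return the minimum total energy that meets the deadline, or INF."""
    # best[t] = least energy for the tasks processed so far using exactly t time units
    best = [0] + [INF] * deadline
    for options in tasks:
        nxt = [INF] * (deadline + 1)
        for used, energy_so_far in enumerate(best):
            if energy_so_far == INF:
                continue  # this amount of elapsed time is not reachable
            for duration, energy in options:
                finish = used + duration
                if finish <= deadline:
                    nxt[finish] = min(nxt[finish], energy_so_far + energy)
        best = nxt
    return min(best)

# Two tasks, each with a fast/high-energy and a slow/low-energy level.
example_tasks = [[(2, 8), (4, 3)], [(1, 6), (3, 2)]]
print(min_energy(example_tasks, 5))
```

The table has one entry per time unit up to the deadline, so the running time grows with the numeric value of the deadline rather than with its bit length, which is what makes this kind of approach pseudo-polynomial.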
4.
5.
This paper proposes a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be used by embedded system designers to explore different code optimization alternatives, that is, high-quality customized retiming solutions for desired throughput and program memory size requirements, while minimizing register pressure. An extensive set of experimental results is presented, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements. In order to empirically assess the effectiveness of clustering for high ILP applications, additional experiments are presented contrasting the performance achieved by software pipelined kernels executing on clustered and on centralized machines.
6.
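Based on a comprehensive analysis of various floating-point adder algorithms, this paper proposes a 32-bit floating-point adder that conforms to the TI format standard and balances speed against area. The design was implemented on a Virtex-4 series FPGA, reaching a maximum speed of 182.415 MHz with reasonable resource usage.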
7.
This paper deals with the process of Transformation and Quantization that is carried out on each inter-predicted residual
block in a video encoding process and their reduced complexity hardware implementation. H.264/AVC utilizes 4 × 4 integer transform,
which is derived from the 4 × 4 DCT. We propose a reduced complexity algorithm and a pipelined structure for the Core forward
integer transform module. A multiplier-less architecture is realized with fewer shifts and adds compared to existing
works. The corresponding inverse transform is exactly reversible. Each of the transformed coefficients is quantized by a scalar
quantizer. The quantization step size can be varied from macroblock to macroblock. The proposed unified pipelined architecture
outperforms many recent implementations in terms of gate count and is capable of processing a 4 × 4 residual block in 4 clock
cycles.
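For readers unfamiliar with the multiplier-less idea, the following sketch shows the widely used butterfly form of the H.264/AVC 4 × 4 forward core transform, which needs only additions, subtractions, and one-bit left shifts. It is not the reduced complexity algorithm proposed in the paper, and the function names are chosen here for illustration.

```python
# Illustrative sketch of the standard butterfly; names are hypothetical and
# this is not the paper's reduced-complexity design.
# Forward core transform matrix: Y = CF * X * CF^T
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def butterfly_1d(x0, x1, x2, x3):
    """One 4-point transform using only adds, subtracts, and shifts."""
    s03, s12 = x0 + x3, x1 + x2
    d03, d12 = x0 - x3, x1 - x2
    return [s03 + s12,          # row [1, 1, 1, 1]
            (d03 << 1) + d12,   # row [2, 1, -1, -2]
            s03 - s12,          # row [1, -1, -1, 1]
            d03 - (d12 << 1)]   # row [1, -2, 2, -1]

def forward_core_transform(block):
    """Apply the 1-D butterfly to every row, then to every column."""
    rows = [butterfly_1d(*r) for r in block]        # X * CF^T
    cols = [butterfly_1d(*c) for c in zip(*rows)]   # CF * (X * CF^T), stored by column
    return [list(r) for r in zip(*cols)]            # transpose back to row-major

def matrix_reference(block):
    """Direct matrix product, used only to cross-check the butterfly."""
    tmp = [[sum(CF[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * CF[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

residual = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
print(forward_core_transform(residual) == matrix_reference(residual))
```

Each 1-D pass costs eight additions and two shifts, which is why hardware implementations can avoid multipliers entirely.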
8.
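Software pipelining is an instruction scheduling technique that achieves instruction-level parallelism across loop iterations. It overcomes the impact of multi-cycle instruction latency on CPU performance and keeps the loop kernel running at optimal efficiency. Starting with the C64X+, the TMS320C6X series of DSPs introduced SPLOOP technology, which adds instructions such as SPLOOP(D/W) and SPKERNEL in software and dedicated modules such as a software pipeline buffer in hardware. Using a modulo-scheduled software pipelining mode, it effectively reduces code size and improves execution efficiency. In general, the loop code the compiler generates with SPLOOP is of high quality, and programmers do not need to hand-optimize it further.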
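The overlap that software pipelining creates can be sketched with an idealized model. The code below assumes every stage of the loop body takes one cycle, that there are no resource conflicts, and that a new iteration starts every `ii` cycles (the initiation interval). The function name and the stage names are hypothetical, and the model does not describe how SPLOOP hardware actually behaves.

```python
# Idealized sketch; software_pipeline and the stage names are hypothetical,
# and this is not a model of the SPLOOP hardware.
def software_pipeline(n_iters, stages, ii=1):
    """Return, for each cycle, the (stage, iteration) pairs that execute in it."""
    total_cycles = (n_iters - 1) * ii + len(stages)
    schedule = [[] for _ in range(total_cycles)]
    for it in range(n_iters):
        for offset, name in enumerate(stages):
            # Stage `offset` of iteration `it` is issued at cycle it*ii + offset.
            schedule[it * ii + offset].append((name, it))
    return schedule

stages = ["load", "mul", "store"]
for cycle, ops in enumerate(software_pipeline(4, stages)):
    print(cycle, ops)
```

With four iterations of three stages, the schedule spans 6 cycles instead of the 12 a strictly sequential loop would take. The cycles in which all three stages are active form the kernel; the cycles that ramp up before it and drain after it are the prologue and epilogue.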
9.
10.
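Research on assembly-language-based DSP source code optimization techniques  Total citations: 1, self-citations: 0, citations by others: 1
This paper presents a method for optimizing TMS320C6000 source code in assembly language by exploiting pipelining. The method improves code execution efficiency and provides a software guarantee for the real-time performance of complex algorithms in practical engineering. Taking an image measurement and tracking program as an example, it describes the method and steps for optimizing source code in assembly language using the pipeline structure, and compares the execution times of three programs before and after optimization. For the centroid tracking program with a 100 × 80 window, execution took 1640 μs before optimization and 48 μs after, a 34.2-fold reduction in execution time. This verifies the effect of assembly language and pipelining on TMS320C6000 source code and opens up wide room for applying more complex and effective algorithms under real-time conditions.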