5 search results (retrieved in 15 ms)
1.
Milidonis A., Dimitroulakos G., Galanis M. D., Kakarountas A. P., Theodoridis G., Goutis C., Catthoor F. 《Design Automation for Embedded Systems》2004,9(2):101-121
We present an automated framework that partitions the code and data types for the needs of data management in object-oriented source code. The goal is to identify the data types that are crucial from a data-management perspective and separate them from the rest of the code. In this way, the design complexity is reduced, allowing the designer to focus on the important parts of the code to perform further refinements and optimizations. To achieve this, static and dynamic analysis is performed on the initial C++ specification code. Based on the analysis results, the data types of the application are characterized as crucial or non-crucial. The initial code is then rewritten automatically so that the crucial data types and the code portions that manipulate them are separated from the rest of the code. Experiments on well-known multimedia and telecom applications demonstrate the correctness of the automated analysis and code rewriting, as well as the applicability of the framework in terms of execution time and memory requirements. Comparisons with Rational's Quantify™ suite show that Quantify™ fails to correctly analyze the initial code for the needs of data management.
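The crucial/non-crucial split described in the abstract can be illustrated with a minimal sketch based on dynamic access counts. The `classify_types` helper, the type names, and the 80% coverage threshold are all hypothetical illustrations, not details taken from the paper.

```python
# Hypothetical sketch of the dynamic-analysis step: mark the fewest
# data types whose accesses cover a given fraction of all dynamic
# accesses as "crucial"; the rest are non-crucial. The threshold and
# type names are assumptions for illustration only.
from collections import Counter

def classify_types(access_log, coverage=0.8):
    """Return the set of types classified as crucial, i.e. the most
    frequently accessed types that together cover `coverage` of all
    dynamic accesses recorded in `access_log`."""
    counts = Counter(access_log)
    total = sum(counts.values())
    crucial, covered = set(), 0
    for type_name, n in counts.most_common():
        if covered / total >= coverage:
            break                     # remaining types are non-crucial
        crucial.add(type_name)
        covered += n
    return crucial

# A toy access trace for three hypothetical data types:
log = ["Frame"] * 70 + ["Macroblock"] * 20 + ["Config"] * 10
print(sorted(classify_types(log)))    # ['Frame', 'Macroblock']
```

With this kind of profile, only the code portions manipulating the crucial types would be carved out for further refinement.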
2.
Athanasios Milidonis, Nikolaos Alachiotis, Vasileios Porpodas, Harris Michail, Georgios Panagiotakopoulos, Athanasios P. Kakarountas, Costas E. Goutis 《Journal of Signal Processing Systems》2010,59(3):281-296
We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits the more efficient prefetching of decoupled processors, which make use of the parallelism between address computation and application data processing that mainly exists in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system's performance. The application code is split into two parallel programs: the first runs on the Access processor and computes the addresses of the data in the memory hierarchy; the second processes the application data and runs on the Execute processor, a processor whose address space is limited to the register-file addresses. Each transfer of any block in the memory hierarchy up to the Execute processor's register file is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with uniprocessor architectures with (a) scratch-pad and (b) cache memory hierarchies, and with (c) existing decoupled architectures, showing higher normalized performance. The reason for this gain is the efficient data transfer that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to hide memory latency by using memory-management techniques to transfer data instead of fixed prefetching methods. Experimental results show that performance increases by up to almost 2 times compared to uniprocessor architectures with scratch-pad memories and by up to 3.7 times compared to those with caches. The proposed architecture achieves this performance without penalties in energy-delay-product costs.
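The Access/Execute split of the application code can be sketched in miniature: one routine does all addressing and operand delivery, the other only consumes operands. The function names, the stride-access pattern, and the sum-of-squares kernel are illustrative assumptions, not the paper's actual benchmarks.

```python
# Minimal sketch (hypothetical names) of the decoupled split: the
# Access side computes addresses and streams operands ahead, standing
# in for the Access processor plus DMA; the Execute side never forms
# an address, it only consumes incoming operands.
def access_processor(memory, stride, n):
    """Address computation + fetch: yields operands in the order the
    Execute side will consume them."""
    for i in range(n):
        yield memory[i * stride]      # the only place addresses exist

def execute_processor(operands):
    """Pure data processing, no addressing: an example kernel that
    accumulates the sum of squares of its operand stream."""
    acc = 0
    for x in operands:
        acc += x * x
    return acc

mem = list(range(16))
print(execute_processor(access_processor(mem, stride=2, n=8)))  # 560
```

Using a generator makes the decoupling explicit: the Execute side's "address space" is just the stream of values handed to it, mirroring the register-file-only view described in the abstract.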
3.
In this work, the authors propose a microscopic particle tracking system based on their previous work (Tien et al. in Exp Fluids 44(6):1015–1026, 2008). A three-pinhole plate, color-coded by color filters of different wavelengths, is utilized to create a triple-exposure pattern on the image-sensor plane for each particle, and each color channel of the color camera acts as an independent image sensor. This modification increases the particle image density of the original monochrome system by three times and eliminates the ambiguities caused by overlap of the triangular exposure patterns. A novel lighting method and a color-separation algorithm are proposed to overcome the measurement errors due to crosstalk between color filters. A complete post-processing procedure is developed to identify, locate and track the Lagrangian motions of the tracer particles and reconstruct the flow field; it includes a cascade-correlation peak-finding algorithm to resolve overlapping particles, a calibration-based method to calculate the depth location via an epipolar-line search, and a vision-based particle tracking algorithm. A 10X infinity-corrected microscope, back-lit by three individual high-power color LEDs each aligned to one of the pinholes, is used to image the flow. The imaging volume is 600 × 600 × 600 μm³. The experimental uncertainties of the system, verified with experiments, show that the location uncertainties are less than 0.10 and 0.08 μm for the in-plane components and less than 0.82 μm for the out-of-plane component, while the displacement uncertainties are 0.62 and 0.63 μm for the in-plane components and 0.77 μm for the out-of-plane component. The technique is applied to measure the flow over a backward-facing step in a micro-channel. The channel/step height is 600/250 μm. A steady flow with low particle density and an accelerating flow with high particle density are measured and compared to validate the flow field resolved by the two-frame tracking method. The Reynolds number in the current work varies from 0.033 to 0.825. A total of 20,592 vectors are reconstructed by time-averaged tracking of 156 image pairs from the steady-flow case, and roughly 400 vectors per image pair are reconstructed by two-frame tracking from the accelerating-flow case.
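The color-separation idea can be sketched as a linear unmixing problem: each camera channel sees a mixture of the three LED signals, so the true per-LED exposures are recovered by inverting a crosstalk matrix. The 3×3 matrix values, the `unmix` helper, and the use of Cramer's rule here are illustrative assumptions; the paper's actual calibration and algorithm may differ.

```python
# Hypothetical sketch of color-separation by crosstalk inversion:
# measured = C @ true, where C is an assumed 3x3 crosstalk matrix
# (how much each LED leaks into each R/G/B channel). We solve for
# `true` with Cramer's rule to keep the sketch dependency-free.
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def unmix(crosstalk, measured):
    """Solve crosstalk @ true = measured for the true LED exposures."""
    d = det3(crosstalk)
    true = []
    for col in range(3):
        m = [row[:] for row in crosstalk]
        for r in range(3):            # replace one column with the
            m[r][col] = measured[r]   # measured channel values
        true.append(det3(m) / d)
    return true

C = [[1.0, 0.1, 0.0],   # R channel: mostly red LED, slight green leak
     [0.1, 1.0, 0.1],   # G channel: leaks from both neighbors
     [0.0, 0.1, 1.0]]   # B channel: slight green leak
measured = [1.1, 1.2, 1.1]            # sensor readings with crosstalk
print([round(v, 3) for v in unmix(C, measured)])  # [1.0, 1.0, 1.0]
```

After unmixing, each recovered channel behaves as the independent image sensor the abstract describes, so the triple-exposure pattern can be decoded without filter crosstalk.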
4.
Michalis D. Galanis, Athanasios Milidonis, George Theodoridis, Dimitrios Soudris, Costas E. Goutis 《Design Automation for Embedded Systems》2005,10(1):27-47
In this paper, we propose a methodology for accelerating application segments by partitioning them between reconfigurable hardware blocks of different granularity. Critical parts are sped up on the coarse-grain reconfigurable hardware to meet the timing requirements of the application code mapped on the reconfigurable logic. The reconfigurable processing units are embedded in a generic hybrid system architecture which can model a large number of existing heterogeneous reconfigurable platforms. The fine-grain reconfigurable logic is realized by an FPGA unit, while the coarse-grain reconfigurable hardware is realized by our high-performance data-path. The methodology mainly consists of three stages: the analysis, the mapping of the application parts onto fine- and coarse-grain reconfigurable hardware, and the partitioning engine. A prototype software framework realizes the partitioning flow. In this work, the methodology is validated using five real-life applications. Detailed partitioning experiments show that the speedup relative to the all-FPGA mapping solution ranges from 1.5 to 4.0, while the specified timing constraints are satisfied for all the applications.
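The partitioning idea described above (and in the related entry that follows) can be illustrated with a small greedy sketch: move the most time-critical kernels onto the coarse-grain data-path until the mapped code meets its timing constraint. The kernel names, timings, the uniform coarse-grain speedup factor, and the `partition` helper are all hypothetical, not the paper's actual partitioning engine.

```python
# Illustrative sketch (assumed numbers/names) of partitioning between
# fine-grain (FPGA) and coarse-grain reconfigurable hardware: kernels
# start on the FPGA; the slowest ones are moved to the coarse-grain
# data-path, where they run `speedup_coarse` times faster, until the
# total execution time meets the deadline.
def partition(kernels, speedup_coarse, deadline):
    """kernels: {name: fpga_time}. Returns (coarse_set, total_time)
    after greedily moving the slowest kernels to coarse-grain HW."""
    times = dict(kernels)
    coarse = set()
    total = sum(times.values())
    for name in sorted(times, key=times.get, reverse=True):
        if total <= deadline:
            break                     # timing constraint already met
        saved = times[name] - times[name] / speedup_coarse
        total -= saved
        coarse.add(name)
    return coarse, total

ks = {"dct": 40.0, "quant": 10.0, "huffman": 30.0, "ctl": 20.0}
sel, t = partition(ks, speedup_coarse=4.0, deadline=60.0)
print(sorted(sel), t)                 # ['dct', 'huffman'] 47.5
```

The greedy choice mirrors the abstract's flow at a high level: critical parts go to the coarse-grain hardware only as far as needed to satisfy the timing constraint, leaving the rest on the fine-grain FPGA fabric.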
5.
Michalis D. Galanis, Athanassios Milidonis, Athanassios P. Kakarountas, Costas E. Goutis 《Microelectronics Journal》2006,37(6):554-564
In this paper, we propose a method for speeding up Digital Signal Processing applications by partitioning them between reconfigurable hardware blocks of different granularity and mapping critical parts of the applications onto coarse-grain reconfigurable hardware. The reconfigurable hardware blocks are embedded in a heterogeneous reconfigurable system architecture. The fine-grain part is implemented by an embedded FPGA unit, while for the coarse-grain reconfigurable hardware our high-performance coarse-grain data-path is used. The design flow mainly consists of three steps: the analysis procedure, the mapping onto the coarse-grain blocks, and the mapping onto the fine-grain hardware. In this work, the methodology is validated using five real-life applications: an OFDM transmitter, a medical imaging technique, a wavelet-based image compressor, a video compression scheme and a JPEG encoder. The experimental results show that the speedup, relative to an all-FPGA solution, ranges from 1.55 to 4.17 for the considered applications.