Similar Documents
20 similar documents found
1.
The complexity of hardware/software (HW/SW) interfacing and the lack of portability across different platforms restrain the widespread use of reconfigurable accelerators and limit designer productivity. Furthermore, communication between the SW and HW parts of codesigned applications is typically exposed to SW programmers and HW designers. In this work, we introduce a virtualization layer that allows reconfigurable application-specific coprocessors to access user-space virtual memory and share the memory address space with user applications. The layer, consisting of an operating system (OS) extension and a HW component, shifts the burden of moving data between processor and coprocessor from the programmer to the OS, lowers the complexity of interfacing, and hides physical details of the system. Not only does the virtualization layer enhance programming abstraction and portability, but it also performs runtime optimizations: by predicting future memory accesses and speculatively prefetching data, the virtualization layer improves coprocessor execution, and applications achieve better performance without any user intervention. We use two different reconfigurable systems-on-chip (SoCs) running Linux and codesigned applications to prove the viability of our concept. The applications run faster than their SW versions, and the overhead due to the virtualization is limited. Dynamic prefetching in the virtualization layer further reduces the abstraction overhead.
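A minimal sketch of the kind of stride-based access predictor an OS-level virtualization layer could use for speculative prefetching; the class, page size, and policy are illustrative assumptions, not the paper's actual mechanism:

```python
# Hypothetical stride predictor: if the coprocessor's accesses advance by a
# stable page stride, speculatively prefetch the next page.

class StridePrefetcher:
    """Predicts the next virtual page from the last observed stride."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.last_page = None
        self.stride = 0

    def observe(self, vaddr):
        """Record a coprocessor access and return addresses to prefetch."""
        page = vaddr // self.page_size
        prefetch = []
        if self.last_page is not None:
            stride = page - self.last_page
            if stride != 0 and stride == self.stride:
                # Stable stride: speculatively fetch the next page.
                prefetch.append((page + stride) * self.page_size)
            self.stride = stride
        self.last_page = page
        return prefetch


prefetcher = StridePrefetcher()
for addr in [0x1000, 0x2000, 0x3000, 0x4000]:
    hints = prefetcher.observe(addr)
    if hints:
        print(f"access {addr:#x} -> prefetch {hints[0]:#x}")
```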

2.
The number of document compression algorithms is increasing due to the expanded information exchange in our society and the rising quality demands for color documents. These call for high-quality, complex compression algorithms that keep memory size and channel capacity within realistic bounds. In addition, these algorithms must execute within a given real-time specification in order to reduce user wait times. In this article, an optimized RBN algorithm for coding true-color documents is used as a test vehicle. Unfortunately, the typical properties of such high-throughput algorithms restrict the possible real-time realizations. We believe that an application-specific design approach supported by powerful CAD tools is the most efficient implementation route for these types of applications when a reasonable time-to-market must be achieved. An efficient dedicated architecture is proposed, based on a lowly multiplexed cooperating data-path style. As for most complex video applications, cost minimization and performance optimization depend heavily on the memory organization; hence the latter is the most important topic of this article. The architectural design process has been traversed mostly manually, with the use of some prototype synthesis tools from the emerging CATHEDRAL-3 synthesis environment.

3.
4.
Streaming of continuous media over wireless links is a notoriously difficult problem, due to the stringent quality-of-service (QoS) requirements of continuous media and the unreliability of wireless links. We develop a streaming protocol for the real-time delivery of prerecorded continuous media from (to) a central base station to (from) multiple wireless clients within a wireless cell. Our protocol prefetches parts of the ongoing continuous media streams into prefetch buffers in the clients (base station), following a join-the-shortest-queue (JSQ) policy. By exploiting rate adaptation techniques of wireless data packet protocols, the JSQ policy dynamically allocates more transmission capacity to streams with small prefetched reserves. Our protocol uses channel probing to handle the location-dependent, time-varying, and bursty errors of wireless links. We evaluate our prefetching protocol through extensive simulations with VBR MPEG and H.263 encoded video traces. Our simulations indicate that for bursty VBR video with an average rate of 64 kb/s and typical wireless communication conditions, our prefetching protocol achieves client starvation probabilities on the order of 10^-4 and a bandwidth efficiency of 90% with prefetch buffers of 128 kbytes.
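A toy model of the JSQ policy described above: in each transmission slot the base station serves the stream whose client holds the smallest prefetched reserve; frame sizes, rates, and capacities are invented for illustration:

```python
# Join-the-shortest-queue prefetch scheduling over a shared wireless link.

import random

random.seed(1)
num_clients = 3
reserves = [0] * num_clients          # prefetched bytes per client
playout_rate = 8_000                  # bytes consumed per slot per client
slot_capacity = 30_000                # bytes the channel carries per slot

for slot in range(5):
    budget = slot_capacity
    while budget > 0:
        # JSQ policy: serve the client with the smallest reserve.
        target = min(range(num_clients), key=lambda c: reserves[c])
        frame = random.randint(4_000, 12_000)   # bursty VBR frame size
        if frame > budget:
            break
        reserves[target] += frame
        budget -= frame
    # Clients consume media; an empty reserve would mean starvation.
    reserves = [max(0, r - playout_rate) for r in reserves]
    print(f"slot {slot}: reserves = {reserves}")
```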

5.
Bursty continuous media streams with periodic playout deadlines (e.g., VBR-encoded video) are expected to account for a large portion of the traffic in the future Internet. By prefetching parts of ongoing streams into client buffers these bursty streams can be more efficiently accommodated in packet-switched networks. In this paper we develop a modular algorithm-theoretic framework for the fair and efficient transmission of continuous media over a bottleneck link. We divide the problem into the two subproblems of (i) assuring fairness, and (ii) efficiently utilizing the available link capacity. We develop and analyze algorithm modules for these two subproblems. Specifically, we devise a bin packing algorithm for subproblem (i), and a "layered prefetching" algorithm for subproblem (ii). Our simulation results indicate that the combination of these two algorithm modules compares favorably with existing monolithic solutions. This demonstrates the competitiveness of the decoupled modular algorithm framework, which provides a foundation for the development of refined algorithms for fair and efficient prefetching.
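A sketch of subproblem (i) only, using a first-fit heuristic to pack frames into link-capacity slots; the paper's actual bin packing variant and the layered prefetching of subproblem (ii) are not reproduced here:

```python
# First-fit bin packing of frame sizes into fixed-capacity link slots.

def first_fit(frames, bin_capacity):
    """Pack (stream, size) frames into slots of fixed capacity, first fit."""
    bins = []
    for stream_id, size in frames:
        for b in bins:
            if b["free"] >= size:
                b["free"] -= size
                b["streams"].append(stream_id)
                break
        else:
            bins.append({"free": bin_capacity - size, "streams": [stream_id]})
    return bins

frames = [("A", 700), ("B", 400), ("C", 300), ("D", 600), ("E", 200)]
for i, b in enumerate(first_fit(frames, bin_capacity=1000)):
    print(f"slot {i}: streams {b['streams']}, free {b['free']}")
```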

6.
This paper suggests a new approach for data compression during extracutaneous transmission of neural signals recorded by a high-density microelectrode array in the cortex. The approach exploits the temporal and spatial characteristics of the neural recordings to strip redundancy and infer the useful information early in the data stream. The proposed signal processing algorithms augment current filtering and amplification capability and may be a viable replacement for the on-chip spike detection and sorting currently employed to remedy bandwidth limitations. Temporal processing exploits the sparsifying capability of the discrete wavelet transform, while spatial processing reduces the number of physical channels through quasi-periodic eigendecomposition of the data covariance matrix. Our results demonstrate substantial improvements in terms of lower transmission bandwidth, reduced latency, and optimized processor utilization. We also demonstrate the improvements qualitatively in terms of superior denoising capability and higher fidelity of the obtained signals.
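A rough illustration of the two stages, assuming a one-level Haar wavelet with hard thresholding for temporal sparseness and an eigendecomposition of the channel covariance for spatial reduction; array sizes and the threshold are arbitrary stand-ins:

```python
# Temporal sparsification (Haar DWT + thresholding) and spatial reduction
# (covariance eigendecomposition) of multichannel neural data.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 256))        # 16 channels, 256 samples

# Temporal: one-level Haar transform, then zero small detail coefficients.
approx = (x[:, ::2] + x[:, 1::2]) / np.sqrt(2)
detail = (x[:, ::2] - x[:, 1::2]) / np.sqrt(2)
detail[np.abs(detail) < 1.0] = 0.0        # sparsify before transmission
kept = np.count_nonzero(detail) / detail.size
print(f"detail coefficients kept: {kept:.1%}")

# Spatial: project channels onto the dominant eigenvectors of the covariance.
cov = np.cov(x)
w, v = np.linalg.eigh(cov)
top = v[:, -4:]                           # keep 4 of 16 spatial components
reduced = top.T @ x
print(f"spatial channels: {x.shape[0]} -> {reduced.shape[0]}")
```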

7.
8.
A nonvolatile memory circuit using conventionally available components (transistors and magnetic switching cores) operates on the principle of the two distinct impedance levels of a switching core in the irreversible and reversible regions. It has the property of nondestructive read-out and requires no sensing amplifiers. It is believed that the circuit is useful for systems that require low memory capacity, from a few bits to a hundred bits of information.

9.
One major requirement in heterogeneous wireless networking environments is support for real-time services and applications, which are becoming increasingly popular and should be effectively supported end-to-end. Special emphasis should therefore be given to the design and dimensioning of the handover procedure, which becomes a challenge when a user is moving across heterogeneous systems. In this work we address the problem of handover time minimization for bandwidth-demanding, real-time services in fourth-generation wireless communication systems. In particular, we present a framework for handover execution, together with associated procedures that adapt to the available resources to achieve bounded latency; their performance is evaluated via simulations.

10.
11.
A combined OFDM/SDMA approach
Two major technical challenges in the design of future broadband wireless networks are the impairments of the propagation channel and the need for spectral efficiency. To mitigate the channel impairments, orthogonal frequency division multiplexing (OFDM) can be used, which transforms a frequency-selective channel into a set of frequency-flat channels. On the other hand, to achieve higher spectral efficiency, space division multiple access (SDMA) can be used, which reuses bandwidth by multiplexing signals based on their spatial signatures. In this paper, we present a combined OFDM/SDMA approach that couples the capabilities of the two techniques to tackle both challenges at once. We propose four algorithms, ranging from a low-complexity linear minimum mean squared error (MMSE) solution to the optimal maximum likelihood (ML) detector. By applying per-carrier successive interference cancellation (pcSIC), initially proposed for DS-CDMA, and introducing selective state insertion (SI), we achieve a good tradeoff between performance and complexity. A case study demonstrates that, compared to the MMSE approach, our pcSIC-SI-OFDM/SDMA algorithm obtains a performance gain of 10 dB at a BER of 10^-3, while being only three times more complex. On the other hand, it is two orders of magnitude less complex than the ML approach, for a performance penalty of only 2 dB.
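A minimal per-subcarrier linear MMSE detector, the low-complexity end of the range described above; the pcSIC and state-insertion refinements are omitted, and all dimensions and the noise level are invented:

```python
# Per-carrier MMSE detection for an OFDM/SDMA uplink with U users and A
# base-station antennas; OFDM makes each subcarrier a flat MIMO channel.

import numpy as np

rng = np.random.default_rng(2)
carriers, users, antennas, sigma2 = 64, 2, 4, 0.1

# Per-carrier flat channels and QPSK symbols.
H = (rng.standard_normal((carriers, antennas, users))
     + 1j * rng.standard_normal((carriers, antennas, users))) / np.sqrt(2)
s = (rng.choice([-1, 1], (carriers, users))
     + 1j * rng.choice([-1, 1], (carriers, users))) / np.sqrt(2)
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((carriers, antennas))
                               + 1j * rng.standard_normal((carriers, antennas)))
y = np.einsum("cau,cu->ca", H, s) + noise

errors = 0
for c in range(carriers):
    Hc = H[c]
    # MMSE filter: W = (H^H H + sigma^2 I)^-1 H^H
    W = np.linalg.solve(Hc.conj().T @ Hc + sigma2 * np.eye(users), Hc.conj().T)
    s_hat = W @ y[c]
    detected = (np.sign(s_hat.real) + 1j * np.sign(s_hat.imag)) / np.sqrt(2)
    errors += np.count_nonzero(detected != s[c])
print(f"symbol errors: {errors} / {carriers * users}")
```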

12.
An original reliability prediction procedure is presented. The physics of failure accounts for the failure mechanisms involved (a lognormal distribution is presumed), and the interactions (synergies) between technology factors that depend on the manufacturing techniques are considered. The basis of this methodology (called SYRP, for synergetic reliability prediction) is the assessment of failure-risk coefficients (FRC), based on fuzzy logic, for the potential failure mechanisms induced at each manufacturing step. These FRC are corrected at subsequent steps by considering the synergy of the manufacturing factors. At the end of the manufacturing process, final FRC are obtained for each potential failure mechanism, and the parameters of the lognormal distribution are calculated with a simple algorithm. Experimental results were obtained for four lots of the same type of semiconductor device, each lot manufactured with a slightly different technology. SYRP forecasts for these four lots agree well with accelerated life test results. This is a fairly good result, because SYRP was applied early, at the design phase.
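A purely illustrative sketch of the SYRP bookkeeping, assuming a multiplicative synergy update and an arbitrary mapping from final FRC to lognormal parameters; none of the numbers or rules below come from the paper:

```python
# Per-mechanism failure-risk coefficients (FRC), corrected at each
# manufacturing step by assumed synergy factors, then fed into an assumed
# FRC-to-lognormal-parameter mapping.

import math

# FRC per mechanism after the first step, and per-step synergy corrections.
frc = {"oxide_breakdown": 0.20, "electromigration": 0.10}
synergies = [
    {"oxide_breakdown": 1.3, "electromigration": 0.9},   # metallization step
    {"oxide_breakdown": 0.8, "electromigration": 1.5},   # passivation step
]

for step in synergies:
    for mech in frc:
        frc[mech] *= step[mech]

for mech, risk in frc.items():
    # Assumed mapping: higher final FRC -> shorter lognormal median life.
    mu = math.log(1e6) - 2.0 * risk      # log median time-to-failure (hours)
    sigma = 0.8                          # presumed lognormal shape
    print(f"{mech}: final FRC {risk:.3f}, "
          f"median life {math.exp(mu):,.0f} h, sigma {sigma}")
```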

13.
A novel PCI Express (Peripheral Component Interconnect Express) direct memory access (DMA) transaction method using the bridge chip PEX 8311 is proposed. Furthermore, a new method for optimizing PCI Express DMA transactions by improving both bus efficiency and DMA efficiency is presented. A finite state machine (FSM) responsible for data and address cycles on the PCI Express bus is introduced, and a continuous data burst is realized, which greatly improves bus efficiency. In the software design, a driver framework based on the Windows driver model (WDM) and three DMA optimizing options for the proposed PCI Express interface are presented to improve DMA efficiency. Experiments show that both read and write hardware transaction speeds exceed the PCI theoretical maximum (133 MBytes/s).
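A toy finite state machine in the spirit of the one described above, with one address phase followed by a continuous burst of data phases; the states and events are simplified inventions, not the PEX 8311 protocol:

```python
# Simplified FSM for a burst DMA transfer: a single address phase amortized
# over many data beats, which is what raises bus efficiency.

IDLE, ADDR, DATA = "IDLE", "ADDR", "DATA"

def run_dma_fsm(burst_len):
    state, beats, trace = IDLE, 0, []
    events = ["req"] + ["ack"] * (burst_len + 1)
    for ev in events:
        if state == IDLE and ev == "req":
            state = ADDR                 # drive the start address once
        elif state == ADDR and ev == "ack":
            state = DATA                 # then stream data beats
        elif state == DATA and ev == "ack":
            beats += 1
            if beats == burst_len:       # continuous burst complete:
                state = IDLE             # one address phase, many data phases
        trace.append(state)
    return trace

print(run_dma_fsm(burst_len=4))
```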

14.
ASIC design methodologies are assessed from the system designer's point of view by comparing the entire IC-related product cost, design schedule, functionality, and risks to those of designs containing standard devices. ASIC methodologies include programmable logic devices, gate arrays, standard cells, and full custom, all primarily in 2-µm CMOS, at production volumes of 1 to 100K units per year and at complexities of 500 to 20,000 gates per device. It is shown that "gates per pin" is the key determinant of total IC-related cost. Products containing ASICs cost less than those containing SSI/MSI, since ASICs raise the number of gates per pin from 2 to a range of 40-200. More surprising, products using ASIC devices cost less than products containing combinations of standard LSI/VLSI and SSI/MSI if their gates per pin is 2-3 times that of the products containing standard devices. Each design methodology has regions, or market segments, where it is competitive, but there are large regions of small cost differences between two ASIC methodologies. Currently, these regions use primarily the older methodologies, i.e., gate arrays at low production volumes and full custom at high volumes; they also provide future opportunities for standard cells. Currently, IC manufacturing cost accounts for about 15 percent of the logic-related total cost, field maintenance for 17 percent, device and system development for 11 percent, and systems-related manufacturing cost for 57 percent. These percentages are expected to migrate to 17, 20, 13, and 50 percent, respectively, by 1990. Our ASIC techno-economic assessment is summarized in 27 nomograms, figures, and charts.
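A back-of-the-envelope illustration of the gates-per-pin argument: for a fixed gate count, raising gates per pin shrinks the pin-driven share of system cost; the cost coefficients are invented, not the paper's data:

```python
# How gates per pin drives total IC-related cost for a fixed logic budget.

total_gates = 10_000
cost_per_pin = 0.50        # assumed system-level cost carried by each pin
cost_per_gate = 0.002      # assumed silicon cost per gate

for gates_per_pin in (2, 40, 200):       # SSI/MSI vs. ASIC densities
    pins = total_gates / gates_per_pin
    cost = pins * cost_per_pin + total_gates * cost_per_gate
    print(f"{gates_per_pin:>3} gates/pin -> {pins:>5.0f} pins, cost ${cost:,.0f}")
```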

15.
In computation-intensive applications oriented to multimedia data streams, a DSP (digital signal processor) must provide not only very strong data-processing capability but also high-bandwidth data input/output interfaces. Building on the enhanced Harvard architecture commonly used in traditional DSPs, this paper proposes a design for a DSP DMA interface structure and implements DMA parallel-transfer modes based on instruction-level and task-level parallelism. Experiments with six common DSP algorithm programs show that, with single-port RAM used as on-chip memory and with instructions containing on-chip memory accesses accounting for 42.2%-94.3% of all instructions, the proposed DMA interface completes the necessary data transfers at zero cost to the DSP, and performs DMA data transfers transparently to the host processor's programmer, effectively improving the performance of the DSP system.
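A toy cycle-level model of the zero-overhead claim: with single-port on-chip RAM, the DMA engine moves data only in the memory cycles the DSP core leaves idle; the instruction mix is a stand-in for the 42.2%-94.3% range above:

```python
# DMA cycle stealing on a single-port on-chip RAM: the core never stalls
# because DMA only uses memory cycles the core does not claim.

import random

random.seed(3)
mem_ratio = 0.6            # fraction of instructions accessing on-chip RAM
cycles, dma_words = 10_000, 0

for _ in range(cycles):
    core_uses_memory = random.random() < mem_ratio
    if not core_uses_memory:
        dma_words += 1     # DMA uses the otherwise-idle memory port

print(f"DMA moved {dma_words} words in {cycles} cycles "
      f"with zero stall cycles for the core")
```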

16.
Due to their small size, nanoscale devices are highly prone to process disturbances, which result in manufacturing defects. Some of the defects are randomly distributed throughout the nanodevice layer; other disturbances tend to be local and lead to cluster defects caused by factors such as layer misalignments, line-width variations, and contamination particles. In this paper, a method is first proposed for identifying cluster defects separately from random ones. Subsequently, a hardware repair structure is presented that repairs cluster defects with rectangular window transfer vectors using a range-matching content-addressable memory (RM-CAM), and random defects using defect-aware triple-modular-redundancy (DA-TMR) columns. It is shown that a combination of these two approaches is more effective for repairing defects at higher error rates with an acceptable overhead. The effectiveness of the technique is shown by examining defect recovery results for different fault distribution scenarios. The mapping circuit's hardware performance parameters are also presented for various memory sizes, and the speed, power dissipation, and overhead factors are reported.
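A sketch of the two repair paths, assuming a rectangular-window range match for cluster defects and a bitwise majority vote for random ones; window coordinates and the spare-mapping rule are hypothetical:

```python
# Cluster defects: redirect addresses inside a rectangular window to a
# spare region (range match, as an RM-CAM would). Random defects: outvote
# a faulty copy with triple-modular redundancy.

def rm_cam_lookup(row, col, windows):
    """Return a spare-region tag if (row, col) hits a defect window."""
    for i, (r0, r1, c0, c1) in enumerate(windows):
        if r0 <= row <= r1 and c0 <= col <= c1:
            return f"spare_{i}"
    return None

def tmr_vote(a, b, c):
    """Bitwise majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)

windows = [(10, 14, 3, 7), (40, 42, 0, 31)]      # rectangular cluster defects
print(rm_cam_lookup(12, 5, windows))              # -> spare_0
print(rm_cam_lookup(0, 0, windows))               # -> None (use TMR path)
print(bin(tmr_vote(0b1011, 0b1011, 0b0011)))      # one faulty copy outvoted
```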

17.
In systems-on-chip (SoCs), DMA, as a peripheral module, plays an important role in data transmission. However, shrinking SoC feature sizes make it prone to radiation-induced soft errors, especially in the DMA. This paper presents a fine-grained software-implemented fault tolerance scheme for SoCs, named DCRH, to enhance the reliability of DMA against soft errors. DCRH achieves fine-grained selective fault tolerance, protecting the DMA without interfering with other modules of the SoC. Furthermore, it is transparent to the user application because it operates at the driver layer. In this paper, we present our fault-source analysis for DMA based on the Xilinx Zynq-7010 SoC and the detailed design of DCRH. The method is then applied to a bare-metal MicroZed board, yielding a DCRH-enhanced DMA driver. Finally, SSIFFI is used in simulated DMA fault-injection experiments to validate DCRH. The experimental results show that DCRH achieves high fault coverage for DMA, above 97%, with stable performance.
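A hedged sketch of driver-layer selective fault tolerance for a DMA transfer, using a CRC check and retry as a generic stand-in; this is not the DCRH design itself:

```python
# Driver-layer pattern: checksum each DMA buffer and retry a corrupted
# transfer, leaving every other SoC module untouched and the application
# unaware.

import zlib

def faulty_dma_copy(src, flip_byte=None):
    """Stand-in for a DMA transfer; optionally injects a single-byte fault."""
    dst = bytearray(src)
    if flip_byte is not None:
        dst[flip_byte] ^= 0xFF
    return bytes(dst)

def dma_transfer_with_check(src, max_retries=3, inject_on_try=0):
    expected = zlib.crc32(src)
    for attempt in range(max_retries):
        fault = 5 if attempt == inject_on_try else None
        dst = faulty_dma_copy(src, flip_byte=fault)
        if zlib.crc32(dst) == expected:
            return dst, attempt
    raise IOError("DMA transfer failed after retries")

data = bytes(range(64))
_, attempts = dma_transfer_with_check(data)
print(f"transfer succeeded after {attempts + 1} attempt(s)")
```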

18.
Murad Abusubaih, Annals of Telecommunications, 2011, 66(11-12):635-642
The hidden node problem is a fundamental problem that severely degrades the performance of wireless networks. It occurs when nodes that do not hear each other transmit at the same time, leading to data packet collisions. IEEE 802.11 Wireless Local Area Networks (WLANs) try to solve this problem through the Request to Send/Clear to Send (RTS/CTS) mechanism. However, the mechanism is not wholly successful. The RTS/CTS idea is based on the assumption that all nodes in the vicinity of access points will hear CTS packets and consequently defer their transmissions. The shortcoming of RTS/CTS stems from the fact that these packets introduce high overhead if used extensively. In this article, we propose a hybrid approach for detecting hidden nodes in 802.11 WLANs, based mainly on adaptive learning about collisions in the network. We expect the approach to be useful for controlling the tuning of the RTS/CTS threshold and therefore to reduce the overhead these packets introduce. Detailed simulation experiments have shown the strength of the proposed approach compared with other approaches.
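A simple illustration of collision-driven RTS/CTS threshold tuning of the kind such an approach could feed: lower the threshold when collisions suggest hidden nodes, raise it when the channel is clean; the control constants are invented:

```python
# Adapt the RTS threshold (packet size above which RTS/CTS is used) from
# the observed collision rate.

def adapt_rts_threshold(threshold, collision_rate,
                        high=0.15, low=0.05, step=256,
                        t_min=256, t_max=2346):
    if collision_rate > high:        # likely hidden nodes: protect more packets
        threshold = max(t_min, threshold - step)
    elif collision_rate < low:       # channel is clean: cut RTS/CTS overhead
        threshold = min(t_max, threshold + step)
    return threshold

threshold = 2346                     # start with RTS/CTS effectively disabled
for rate in [0.02, 0.20, 0.25, 0.18, 0.04]:
    threshold = adapt_rts_threshold(threshold, rate)
    print(f"collision rate {rate:.2f} -> RTS threshold {threshold}")
```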

19.
Input vector control (IVC) is a popular technique for leakage power reduction. It utilizes the transistor stack effect in CMOS gates by applying a minimum leakage vector (MLV) to the primary inputs of combinational circuits during the standby mode. However, the IVC technique becomes less effective for circuits of large logic depth because the input vector at primary inputs has little impact on leakage of internal gates at high logic levels. In this paper, we propose a technique to overcome this limitation by replacing those internal gates in their worst leakage states with other library gates while maintaining the circuit's correct functionality during the active mode. This modification of the circuit does not require changes to the design flow, but it opens the door for further leakage reduction when the MLV is not effective. We then present a divide-and-conquer approach that integrates gate replacement, an optimal MLV searching algorithm for tree circuits, and a genetic algorithm to connect the tree circuits. Our experimental results on all the MCNC91 benchmark circuits reveal that 1) the gate replacement technique alone can achieve 10% leakage current reduction over the best known IVC methods with no delay penalty and little area increase; 2) the divide-and-conquer approach outperforms the best pure IVC method by 24% and the existing control point insertion method by 12%; and 3) compared with the leakage achieved by optimal MLV in small circuits, the gate replacement heuristic and the divide-and-conquer approach can reduce on average 13% and 17% leakage, respectively.
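A minimum-leakage-vector search for a toy two-gate circuit, illustrating the IVC idea at a scale where exhaustive search works; the per-state leakage table is invented and the gate-replacement step is not shown:

```python
# Exhaustive MLV search: leakage of a CMOS gate depends on its input state
# (stack effect), so we pick the primary-input vector minimizing total leakage.

from itertools import product

# Assumed leakage (arbitrary units) of a 2-input NAND per input state.
NAND_LEAKAGE = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 5}

def circuit_leakage(a, b, c):
    """Two NAND gates: n1 = NAND(a, b), out = NAND(n1, c)."""
    n1 = 1 - (a & b)
    return NAND_LEAKAGE[(a, b)] + NAND_LEAKAGE[(n1, c)]

best = min(product((0, 1), repeat=3), key=lambda v: circuit_leakage(*v))
print(f"minimum leakage vector: {best}, leakage {circuit_leakage(*best)}")
```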

20.
In this paper, we describe a methodology and flow for the systematic design of application-specific multiprocessor systems-on-chip (MPSoC). Our approach is based on a generic architecture platform which is used as a model throughout the design process. This model is modular, flexible, and scalable, making it possible to cover a large application field. The complete design flow from system specification to register transfer level (RTL) consists of two principal stages. The first stage is architecture exploration, where a system-level performance estimation method is required to find the best system architecture; the goal of this stage is to fix the optimal architectural parameters specific to the application. The second stage is the systematic design flow, in which the architectural parameters are used to produce the RTL architecture. This paper focuses on the definition of the architecture model and the systematic design flow, which has now been automated. The feasibility and effectiveness of this approach are illustrated by several telecommunication applications.
