首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Memory requirements for balanced computer architectures
Authors:HT Kung
Institution:Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15213, USA
Abstract:In this paper, a processing element (PE) is characterized by its computation bandwidth, I/O bandwidth, and the size of its local memory. In carrying out a computation, a PE is said to be balanced if the computing time equals the I/O time. Consider a balanced PE for some computation. Suppose that the computation band-width of the PE is increased by a factor of α relative to its I/O bandwidth. Then when carrying out the same computation the PE will be imbalanced; i.e., it will have to wait for I/O. A standard method of avoiding this I/O bottleneck is to reduce the overall I/O requirement of the PE by increasing the size of its local memory. This paper addresses the question of by how much the PE's local memory must be enlarged in order to restore balance.The following results are shown: For matrix computations such as matrix multiplication and Gaussian elimination, the size of the local memory must be increased by a factor of α2. For computations such as relaxation on a k-dimensional grid, the local memory must be enlarged by a factor of αk. For some other computations such as the FFT and sorting, the increase is exponential; i.e., the size of the new memory must be the size of the original memory to the αth power. All these results indicate that to design a balanced PE, the size of its local memory must be increased much more rapidly than its computation bandwidth. This phenomenon seems to be common for many computations where an output may depend on a large subset of the inputs.Implications of these results for some parallel computer architectures are also discussed. One particular result is that to balance an array of p linearly connected PEs for performing matrix computations such as matrix multiplication and matrix triangularization, the size of each PE's local memory must grow linearly with p. Thus, the larger the array is, the larger each PE's local memory must be.
Keywords:
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号