A parameterized ordering for cache-, register- and pipeline-efficient Givens QR decomposition |
| |
Authors: | James J. Carrig; Gerard G. L. Meyer |
| |
Institution: | (1) Sony Electronics, Inc., Santa Clara, CA 95054, USA; (2) Johns Hopkins University, Baltimore, MD 21218, USA |
| |
Abstract: | A parameterized ordering of Givens rotations and guidelines for choosing parameter values are presented in the context of QR
decomposition. Although a standard selection of parameter values retrieves an ordering that corresponds to a well-known algorithm,
we show that non-standard values decrease the execution time. We implement the new ordering on an Intel Pentium Pro system,
a single thin POWER2 processor of the IBM SP2, and a single R8000 processor of the SGI POWER Challenge XL. On each machine,
we observe performance that is more than twice that of the original ordering.
This revised version was published online in June 2006 with corrections to the Cover Date. |
| |
Keywords: | Givens QR algorithm; superscalar processors; 65F05; 65F25; 65Y10 |
This article is indexed in SpringerLink and other databases. |
|
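
For readers unfamiliar with the baseline that the abstract refers to, the following is a minimal sketch of Givens QR with one common textbook ordering (column by column, zeroing each column from the bottom up using adjacent rows). It is not the parameterized ordering of the paper, and it is only an assumption that this resembles the "well-known algorithm" mentioned in the abstract. The function names `givens` and `givens_qr` are hypothetical and chosen for illustration.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to eliminate
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Reduce A (a list of rows) to upper-triangular form with Givens rotations.

    Ordering used here (an illustrative assumption, not the paper's):
    for each column j, eliminate entries from the last row up to row j + 1,
    each time rotating row i against row i - 1.
    Returns the triangular factor R and the list of applied rotations.
    """
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    rotations = []
    for j in range(n):
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            rotations.append((i - 1, i, c, s))
            # Columns left of j are already zero in both rows, so start at j.
            for k in range(j, n):
                top, bot = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * top + s * bot
                R[i][k] = -s * top + c * bot
    return R, rotations


if __name__ == "__main__":
    R, rotations = givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    for row in R:
        print(row)
    print(len(rotations), "rotations applied")
```

The order in which the `(i, j)` eliminations are visited is what a reordering scheme would change: the set of entries to zero stays the same, while the sequence determines how often rows are reloaded from memory and how many independent rotations are available to the pipeline.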