Abstract
This paper demonstrates the design of efficient asynchronous bundled-data pipelines for the matrix-vector multiplication core of discrete cosine transforms (DCTs). The architecture is optimized for zero-valued and small-valued data, which are typical in DCT applications, yielding high average performance and low average power. The proposed bundled-data pipelines include novel data-dependent delay lines with integrated control circuitry that efficiently implement speculative completion sensing. The control circuits are based on a novel control-circuit template that simplifies the design of such nonlinear pipelines. Extensive post-layout back-end timing analysis was performed to gain confidence in the timing margins and to quantify performance and energy. Comparison with a synchronous counterpart suggests that our best asynchronous design yields 30% higher average throughput with negligible energy overhead.
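The intuition behind data-dependent delay lines can be illustrated with a small behavioral model. The sketch below is a simplified illustration, not the paper's circuit: it assumes each multiply's matched delay is chosen from three hypothetical taps (zero, small, full) based only on the data operand's magnitude, and it sums per-multiply delays as a crude proxy for average latency. The names `select_delay`, `matvec_with_delays`, the delay values, and the `SMALL_BITS` threshold are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical delay-line taps in arbitrary time units (not from the paper).
ZERO_DELAY = 1   # zero operand: product is trivially zero
SMALL_DELAY = 3  # small operand: few significant bits to resolve
FULL_DELAY = 8   # worst case: full-width multiply
SMALL_BITS = 4   # hypothetical threshold: |x| < 2**SMALL_BITS is "small"


def select_delay(x: int) -> int:
    """Pick a matched-delay tap from the operand magnitude (speculative sensing)."""
    if x == 0:
        return ZERO_DELAY
    if abs(x) < (1 << SMALL_BITS):
        return SMALL_DELAY
    return FULL_DELAY


def matvec_with_delays(matrix: List[List[int]],
                       vector: List[int]) -> Tuple[List[int], int]:
    """Compute y = M v and sum the data-dependent delay of every multiply."""
    result = []
    total_delay = 0
    for row in matrix:
        acc = 0
        for coeff, x in zip(row, vector):
            acc += coeff * x
            total_delay += select_delay(x)
        result.append(acc)
    return result, total_delay


def worst_case_delay(matrix: List[List[int]]) -> int:
    """Delay if every multiply is charged the full (clocked) worst case."""
    return sum(len(row) for row in matrix) * FULL_DELAY


if __name__ == "__main__":
    m = [[1, 2], [3, 4]]
    v = [0, 5]
    y, d = matvec_with_delays(m, v)
    print(y, d, worst_case_delay(m))  # [10, 20] 8 32
```

In this toy model, a vector dominated by zero and small values finishes in far less time than the worst-case bound a fixed clock period would have to assume, which is the qualitative effect the abstract attributes to speculative completion sensing.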
| Original language | English |
|---|---|
| Pages (from-to) | 448-461 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Very Large Scale Integration (VLSI) Systems |
| Volume | 13 |
| Issue number | 4 |
| State | Published - Apr 2005 |
Keywords
- Asynchronous pipelines
- Bundled-data pipelines
- Control circuit templates
- Discrete cosine transforms
- Matrix-vector multiplication
- Precharged full buffer
- True four-phase full buffer