Data-parallel Digital Signal Processors:

Algorithm Mapping, Architecture Scaling and Workload Adaptation

Sridhar Rajagopal

Ph.D. Dissertation Oral Defense

January 12, 2004

10 am to 12 pm

Duncan Hall 1049

 

 

Rice Calendar Notice

 

Thesis draft (.pdf format)

 

Contact: Sridhar Rajagopal, Duncan Hall 3038, sridhar@rice.edu, 713.348.2256.

Abstract [Small]:

Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded applications and signal processing in high-speed wireless communications are driving the need for high-performance digital signal processors (DSPs) capable of real-time processing. These applications exhibit significant data parallelism, use finite-precision arithmetic, demand power efficiency and require hundreds of arithmetic units in the DSP to meet real-time requirements. The thesis first presents the design of algorithms that map efficiently onto data-parallel DSPs and reduce the architecture complexity. It then demonstrates a design space exploration tool for designing a family of data-parallel DSPs that meet real-time requirements while minimizing power consumption. Finally, the thesis improves power efficiency by adapting the compute resources of data-parallel DSPs to run-time variations in the workload.

Abstract [As in thesis]:

Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded applications and signal processing in high-speed wireless communications are driving the need for high-performance digital signal processors (DSPs) capable of real-time processing. These applications exhibit significant data parallelism, use finite-precision arithmetic, demand power efficiency and require hundreds of arithmetic units in the DSP to meet real-time requirements. The thesis first presents the design of algorithms that map efficiently onto data-parallel DSPs and reduce the architecture complexity. It then demonstrates a design space exploration tool for designing a family of data-parallel DSPs that meet real-time requirements while minimizing power consumption. Finally, the thesis improves power efficiency by adapting the compute resources of data-parallel DSPs to run-time variations in the workload.

The thesis focuses on wireless base-stations as the application for designing data-parallel DSPs. Base-stations are classified into three categories to demonstrate the range of algorithms and performance requirements for data-parallel DSP design. A second-generation (2G) base-station is a voice-based system that provides 16 Kbps/user and employs single-user algorithms. A 3G base-station is a multimedia-based system that provides 128 Kbps/user and employs multi-user algorithms. A 4G base-station is a multiple-antenna system that provides 1 Mbps/user and employs multi-antenna algorithms for higher capacity. Data-parallel DSPs employ clusters of functional units to support hundreds of computations every clock cycle. These DSPs exploit instruction-level parallelism and subword parallelism within clusters, similar to a traditional VLIW (Very Long Instruction Word) DSP, and exploit data parallelism across clusters, similar to vector processors.

The first contribution of this thesis demonstrates that algorithms must be designed carefully to map efficiently onto data-parallel DSPs, and that this design can simultaneously reduce the complexity of the data-parallel DSP architecture. The thesis explores trade-offs in the use of subword parallelism, memory access patterns, inter-cluster communication and functional unit efficiency to design and map algorithms efficiently on data-parallel DSPs. The thesis demonstrates that communication patterns present in the algorithms can be exploited to make the inter-cluster communication network scale better with the number of clusters and to reduce the DSP complexity.

The second thesis contribution demonstrates a design space exploration framework for data-parallel DSPs that meets the real-time requirements of a given application while minimizing power consumption. The design space for data-parallel DSPs exhibits trade-offs between the number of arithmetic units per cluster, the number of clusters and the clock frequency needed to meet the real-time requirements of a given application. The presented exploration methodology searches this design space and provides a heuristic that minimizes power consumption and determines the number of adders, multipliers and clusters and the real-time clock frequency of the DSP.
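The trade-off described above can be illustrated with a minimal sketch. This is not the exploration tool or heuristic from the thesis: it is an exhaustive search over a small grid, and the power model, the constants V_MIN and V_PER_MHZ, the switch-overhead term and all function names are assumptions introduced only for illustration.

```python
from itertools import product

V_MIN = 0.8        # hypothetical minimum supply voltage (volts)
V_PER_MHZ = 0.002  # hypothetical: supply voltage needed per MHz of clock


def required_freq_mhz(add_ops, mul_ops, adders, multipliers, clusters):
    """Clock (MHz) needed to finish add_ops and mul_ops per second,
    assuming every unit retires one operation per cycle (idealized)."""
    cycles_per_second = max(add_ops / (adders * clusters),
                            mul_ops / (multipliers * clusters))
    return cycles_per_second / 1e6


def relative_power(adders, multipliers, clusters, freq_mhz):
    """Toy dynamic power model: capacitance * V^2 * f.
    Capacitance grows with the unit count plus an assumed quadratic
    intra-cluster switch overhead; voltage scales with frequency
    down to a floor."""
    n = adders + multipliers
    capacitance = clusters * (n + 0.05 * n * n)
    voltage = max(V_MIN, V_PER_MHZ * freq_mhz)
    return capacitance * voltage ** 2 * freq_mhz


def explore(add_ops, mul_ops, max_freq_mhz=1000.0):
    """Return the lowest-power feasible configuration, or None."""
    best = None
    for a, m, c in product(range(1, 6), range(1, 6), (8, 16, 32, 64, 128)):
        f = required_freq_mhz(add_ops, mul_ops, a, m, c)
        if f > max_freq_mhz:
            continue  # cannot meet the real-time requirement
        candidate = (relative_power(a, m, c, f), a, m, c, f)
        if best is None or candidate < best:
            best = candidate
    if best is None:
        return None
    power, a, m, c, f = best
    return {"adders": a, "multipliers": m, "clusters": c,
            "freq_mhz": f, "power": power}


if __name__ == "__main__":
    # Hypothetical workload: 20 G additions/s and 10 G multiplications/s.
    print(explore(20e9, 10e9))
```

Under these assumptions the search returns the number of adders and multipliers per cluster, the number of clusters and the clock frequency with the lowest modeled power among the configurations that meet the real-time requirement. A realistic model would also account for register files, memory and the inter-cluster network.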

The final thesis contribution improves power efficiency in data-parallel DSPs by varying the number and organization of functional units to adapt to the compute requirements of the application and by scaling voltage and frequency to meet the real-time processing requirements. The thesis presents the design of an adaptive multiplexer network that allows the number of active clusters to be varied at run-time by multiplexing the data from internal memory onto a selected number of clusters and turning off unused clusters using power gating.
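The adaptation policy can be sketched as follows. This is a simplified illustration, not the thesis design: the restriction to power-of-two cluster counts, the list of frequency levels, and the names adapt, parallelism, elements and cycles_per_element are all assumptions made for the example.

```python
import math


def adapt(parallelism, elements, cycles_per_element, deadline_s,
          total_clusters=32, freq_levels_mhz=(100, 200, 300, 400, 500)):
    """Pick the active cluster count and the lowest clock that meets the deadline.

    Assumes (hypothetically) that the multiplexer network supports only
    power-of-two numbers of active clusters and that the clock can be set
    to one of a few discrete levels.
    """
    usable = min(parallelism, total_clusters)
    if usable < 1:
        raise ValueError("workload exposes no data parallelism")
    # Largest power of two that does not exceed the usable cluster count.
    active = 1 << (usable.bit_length() - 1)
    # Each active cluster processes its share of elements one after another.
    cycles = math.ceil(elements / active) * cycles_per_element
    needed_mhz = cycles / deadline_s / 1e6
    for level in sorted(freq_levels_mhz):
        if level >= needed_mhz:
            return {"active_clusters": active,
                    "gated_clusters": total_clusters - active,
                    "freq_mhz": level}
    raise ValueError("no frequency level meets the deadline")


if __name__ == "__main__":
    # Hypothetical workload: 12-way data parallelism, 1200 elements,
    # 1000 cycles per element, 1 ms deadline.
    print(adapt(parallelism=12, elements=1200,
                cycles_per_element=1000, deadline_s=1e-3))
```

In this sketch, clusters beyond the usable data parallelism are reported as gated, standing in for power gating, and the clock is lowered to the slowest level that still meets the deadline, standing in for voltage and frequency scaling.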

 

Committee:

Dr. Joseph R. Cavallaro (Advisor and Chair)

Dr. Scott Rixner

Dr. Behnaam Aazhang

Dr. David B. Johnson