ASME Press Select Proceedings
International Conference on Advanced Computer Theory and Engineering (ICACTE 2009)
Xie Yi
ASME Press

Mat-Core is a research processor that aims to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data, where scalar and vector/matrix instructions are executed out of order. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues and also execute out of order. Like vector architectures, the data computation unit is organized in parallel lanes. However, on these parallel lanes Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition to scalar-vector and vector-vector instructions. An extension of the well-known scoreboard algorithm allows these instructions to be executed out of order on parallel pipelines. This paper describes the SystemC (a system-level modeling language) implementation of Mat-Core and evaluates its performance on vector and matrix kernels. With four parallel lanes, matrix registers of size 4×8 (32 elements each), a queue size of 10, a start-up time of 6 clock cycles, and a memory latency of 10 clock cycles, Mat-Core achieves about 1.4, 2.1, 4.2, 2.6, 4.2, and 6.4 FLOPs per clock cycle on scalar-vector multiplication, SAXPY, Givens, rank-1 update, vector-matrix multiplication, and matrix-matrix multiplication, respectively.
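To make the FLOPs-per-cycle figures easier to interpret, the sketch below gives plain Python reference versions of two of the kernels named above (SAXPY and a Givens rotation) together with conventional FLOP counts for all six. It is an illustration only, not the paper's SystemC model: the function names are hypothetical, one multiply or one add is assumed to count as one FLOP, and the abstract does not state the counting convention or problem sizes the authors used.

```python
# Illustrative sketch only; all names here are hypothetical, not from the paper.
# Assumption: one floating-point multiply or add counts as one FLOP.

def saxpy(a, x, y):
    """Return a*x + y elementwise (the SAXPY kernel)."""
    return [a * xi + yi for xi, yi in zip(x, y)]

def givens(c, s, x, y):
    """Apply a Givens rotation with coefficients c, s to vectors x and y."""
    xr = [c * xi + s * yi for xi, yi in zip(x, y)]
    yr = [c * yi - s * xi for xi, yi in zip(x, y)]
    return xr, yr

# Conventional FLOP counts (assumed convention, not taken from the paper).
def flops_scalar_vector(n):
    return n                 # one multiply per element

def flops_saxpy(n):
    return 2 * n             # one multiply and one add per element

def flops_givens(n):
    return 6 * n             # four multiplies and two adds per element pair

def flops_rank1(m, n):
    return 2 * m * n         # A += x * y^T on an m-by-n matrix

def flops_vector_matrix(m, n):
    return 2 * m * n         # length-m vector times an m-by-n matrix

def flops_matrix_matrix(m, k, n):
    return 2 * m * k * n     # (m-by-k) times (k-by-n)

def estimated_cycles(flops, flops_per_cycle):
    """Cycle estimate if a sustained rate applied to the whole kernel."""
    return flops / flops_per_cycle
```

For example, a SAXPY over one 32-element register's worth of data is 64 FLOPs under this convention, so `estimated_cycles(flops_saxpy(32), 2.1)` gives roughly 30 cycles, provided the reported rate of about 2.1 FLOPs per clock cycle holds at that problem size, which the abstract does not confirm.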

Key Words
1. Introduction
2. The Architecture of Out-of-Order Mat-Core Processor
3. Performance Evaluation of the Out-of-Order Mat-Core
4. Summaries