Mat-Core is a research processor that aims to exploit the increasing number of transistors per IC to improve the performance of a wide range of applications. It extends a general-purpose scalar processor with a matrix unit for processing vector/matrix data, where scalar and vector/matrix instructions are executed out of order. To hide memory latency, the extended matrix unit is decoupled into two components, address generation and data computation, which communicate through data queues and also execute out of order. Like vector architectures, the data computation unit is organized in parallel lanes. On these parallel lanes, however, Mat-Core can execute scalar-matrix, vector-matrix, and matrix-matrix instructions in addition...
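
The decoupling of address generation from data computation can be illustrated with a small software model. The sketch below is not Mat-Core's implementation: the function names, the lane count `NUM_LANES`, the row-major memory layout, and the choice of a scalar-matrix multiply are all assumptions made for illustration, and the model is sequential, so it does not capture out-of-order execution. It only shows one side producing operands into a data queue and the other side consuming them in lane-sized groups.

```python
from collections import deque

NUM_LANES = 4  # hypothetical lane count, not taken from the source


def address_generator(base, rows, cols):
    """Yield the flat addresses of a row-major rows x cols matrix (assumed layout)."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * cols + c


def load_stage(memory, addresses, data_queue):
    """Address-generation side: fetch each element and push it into the data queue."""
    for addr in addresses:
        data_queue.append(memory[addr])


def compute_stage(scalar, data_queue, count):
    """Data-computation side: pop operands and scale them, up to NUM_LANES per step."""
    result = []
    steps = 0
    while len(result) < count:
        # Each lane handles one element of the current group.
        group_size = min(NUM_LANES, count - len(result))
        group = [data_queue.popleft() for _ in range(group_size)]
        result.extend(scalar * x for x in group)
        steps += 1
    return result, steps


def scalar_matrix_multiply(memory, base, rows, cols, scalar):
    """Run the load side ahead of the compute side, linked only by the queue."""
    data_queue = deque()
    load_stage(memory, address_generator(base, rows, cols), data_queue)
    return compute_stage(scalar, data_queue, rows * cols)
```

In this model the load side runs entirely ahead of the compute side, which is the simplest way to show that the two halves interact only through the queue. In hardware the two would overlap in time, and that overlap is what lets memory latency be hidden.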