CUED Publications database

Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization

Bhardwaj, K and Havasi, M and Yao, Y and Brooks, DM and Lobato, JMH and Wei, GY (2019) Determining Optimal Coherency Interface for Many-Accelerator SoCs Using Bayesian Optimization. IEEE Computer Architecture Letters, 18. pp. 119-123. ISSN 1556-6056

Full text not available from this repository.


© 2002-2011 IEEE. The modern system-on-chip (SoC) of the current exascale computing era is complex. These SoCs not only consist of several general-purpose processing cores but also integrate many specialized hardware accelerators. Three common coherency interfaces are used to integrate the accelerators with the memory hierarchy: non-coherent, coherent with the last-level cache (LLC), and fully-coherent. However, using a single coherency interface for all the accelerators in an SoC can lead to significant overheads: in the non-coherent model, accelerators directly access the main memory, which can incur a considerable performance penalty; in the LLC-coherent model, the accelerators access the LLC but may suffer from a performance bottleneck due to contention between several accelerators; and the fully-coherent model, which relies on private caches, can incur non-trivial power/area overheads. Given the limitations of each of these interfaces, this paper proposes a novel performance-aware hybrid coherency interface, where different accelerators use different coherency models, decided at design time based on the target applications so as to optimize the overall system performance. A new Bayesian optimization based framework is also proposed to determine the optimal hybrid coherency interface, i.e., to use machine learning to select the best-performing coherency model for each of the accelerators in the SoC. For image processing and classification workloads, the proposed framework determined that a hybrid interface achieves up to 23 percent better performance compared to the other 'homogeneous' interfaces, where all the accelerators use a single coherency model.
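
The design space described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's framework: the cost model `toy_runtime`, its coefficients, and the `mem_intensity` inputs are hypothetical placeholders, and a plain exhaustive search stands in for Bayesian optimization (which would matter when each evaluation is an expensive simulation rather than a formula). The sketch only shows that each accelerator independently picks one of three coherency models, giving 3^N candidate configurations, and that the three homogeneous configurations are a subset of that space.

```python
from itertools import product

# The three coherency models named in the abstract.
MODELS = ("non-coherent", "llc-coherent", "fully-coherent")


def toy_runtime(assignment, mem_intensity):
    """Hypothetical cost model (NOT from the paper); lower is better.

    assignment    -- tuple with one coherency model per accelerator
    mem_intensity -- made-up memory-intensity value per accelerator
    """
    # Number of accelerators sharing the LLC, used to mimic contention.
    llc_users = assignment.count("llc-coherent")
    total = 0.0
    for model, intensity in zip(assignment, mem_intensity):
        if model == "non-coherent":
            # Every access goes to main memory.
            total += 1.0 + 2.0 * intensity
        elif model == "llc-coherent":
            # Cost grows with the number of accelerators contending for the LLC.
            total += 1.0 + 0.5 * intensity * llc_users
        else:
            # Private cache: assumed fixed overhead, no contention term.
            total += 1.2 + 0.8 * intensity
    return total


def best_hybrid(mem_intensity):
    """Exhaustively search all 3^N per-accelerator assignments."""
    n = len(mem_intensity)
    return min(product(MODELS, repeat=n),
               key=lambda a: toy_runtime(a, mem_intensity))


def best_homogeneous(mem_intensity):
    """Search only the three configurations where all accelerators agree."""
    n = len(mem_intensity)
    return min(((m,) * n for m in MODELS),
               key=lambda a: toy_runtime(a, mem_intensity))


# Illustrative inputs for four accelerators (invented values).
intensities = [0.1, 0.5, 1.0, 2.0]
hybrid = best_hybrid(intensities)
homogeneous = best_homogeneous(intensities)
print("hybrid:     ", hybrid, toy_runtime(hybrid, intensities))
print("homogeneous:", homogeneous, toy_runtime(homogeneous, intensities))
```

Because every homogeneous configuration is also a point in the hybrid space, the best hybrid assignment can never score worse than the best homogeneous one under the same cost model. The exponential growth of the space with the number of accelerators is what makes a sample-efficient search method attractive in place of the exhaustive loop used here.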

Item Type: Article
Divisions: Div F > Computational and Biological Learning
Depositing User: Cron Job
Date Deposited: 25 Oct 2019 21:12
Last Modified: 01 Sep 2020 05:58
DOI: 10.1109/LCA.2019.2910521