DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms [conference paper]

Conference

49th International Conference on Parallel Processing – August 2020

Authors

Ziyi Zhao, Zhang Jiang, Ximing Liu, Xiaoli Gong, Wenwen Wang, Pen-Chung Yew (professor)

Abstract

The scalability of a dynamic binary translation (DBT) system has become important due to the prevalence of multicore systems and large multi-threaded applications. Several recent efforts have addressed some critical issues in extending a DBT system to run on multicore platforms for better scalability. In this paper, we present a distributed DBT framework, called DQEMU, that goes beyond a single-node multicore processor and can be scaled up to a cluster of multi-node servers.

In such a distributed DBT system, we integrate a page-level directory-based data coherence protocol, a hierarchical locking mechanism, a delegation scheme for system calls, and a remote thread migration approach that are effective in reducing its overheads. We also propose several performance optimization strategies that include page splitting to mitigate false data sharing among nodes, data forwarding for latency hiding, and a hint-based locality-aware scheduling scheme. Comprehensive experiments have been conducted on DQEMU with micro-benchmarks and the PARSEC benchmark suite. The results show that DQEMU can scale beyond a single-node machine with reasonable overheads. For “embarrassingly parallel” benchmark programs, DQEMU can achieve near-linear speedup as the number of nodes increases, whereas the speedup of the current single-node, multicore version of QEMU flattens out for lack of computing resources.

Link to full paper

DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms

Keywords

parallel processing
