Two CS&E papers accepted to ICPP 2020
Two papers from the University of Minnesota’s Department of Computer Science and Engineering were accepted for publication at the 49th International Conference on Parallel Processing (ICPP). The conference was held virtually in August 2020.
The first paper, DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms, was submitted by Professor Pen-Chung Yew and his collaborators from the University of Georgia and Nankai University.
The second paper, First Time Miss: Low Overhead Mitigation for Shared Memory Cache Side Channels, was written by CS&E Ph.D. student Kartik Ramkrishnan, associate professor Stephen McCamant, professor Pen-Chung Yew, and associate professor Antonia Zhai.
DQEMU: A Scalable Emulator with Retargetable DBT on Distributed Platforms
The scalability of a dynamic binary translation (DBT) system has become important due to the prevalence of multicore systems and large multi-threaded applications. Several recent efforts have addressed some critical issues in extending a DBT system to run on multicore platforms for better scalability. In this paper, we present a distributed DBT framework, called DQEMU, that goes beyond a single-node multicore processor and can be scaled up to a cluster of multi-node servers.
In such a distributed DBT system, we integrate a page-level directory-based data coherence protocol, a hierarchical locking mechanism, a delegation scheme for system calls, and a remote thread migration approach that are effective in reducing its overheads. We also propose several performance optimization strategies, including page splitting to mitigate false data sharing among nodes, data forwarding for latency hiding, and a hint-based locality-aware scheduling scheme. Comprehensive experiments have been conducted on DQEMU with micro-benchmarks and the PARSEC benchmark suite. The results show that DQEMU can scale beyond a single-node machine with reasonable overheads. For "embarrassingly parallel" benchmark programs, DQEMU can achieve near-linear speedup as the number of nodes increases, as opposed to flattening out due to the lack of computing resources in the current single-node, multicore version of QEMU.
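The page-level directory-based coherence protocol the abstract mentions can be illustrated with a minimal sketch. The code below is not DQEMU's implementation; it is a simplified, MSI-style directory in Python, with illustrative class and state names, showing how a master node might track which cluster nodes hold a copy of each guest page and which copies must be invalidated on a write.

```python
# Illustrative sketch (not DQEMU's actual protocol): a page-level
# directory mapping each guest page to a coherence state and the set
# of nodes holding a copy of that page.
PAGE_SIZE = 4096

class PageDirectory:
    """States: 'S' = read-only copies on one or more nodes,
    'M' = a single node holds a writable (modified) copy."""
    def __init__(self):
        self.entries = {}  # page number -> {"state": str, "holders": set}

    def _page(self, addr):
        return addr // PAGE_SIZE

    def read(self, node, addr):
        """A node requests read access: it joins the sharer set.
        A modified page is downgraded so the reader gets a clean copy."""
        e = self.entries.setdefault(self._page(addr),
                                    {"state": "S", "holders": set()})
        if e["state"] == "M" and node not in e["holders"]:
            e["state"] = "S"          # previous owner must write back
        e["holders"].add(node)

    def write(self, node, addr):
        """A node requests write access: every other copy becomes stale
        and must be invalidated; the writer becomes exclusive owner."""
        e = self.entries.setdefault(self._page(addr),
                                    {"state": "S", "holders": set()})
        to_invalidate = e["holders"] - {node}
        e["state"], e["holders"] = "M", {node}
        return to_invalidate
```

Tracking sharers at page granularity keeps the directory small, but two threads writing unrelated data on the same page will ping-pong ownership, which is the false-sharing problem the paper's page-splitting optimization targets.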
First Time Miss: Low Overhead Mitigation for Shared Memory Cache Side Channels
Cache hits and misses are an important source of information leakage in cache side-channel attacks. An attacker observes a much faster cache access time if the cache line has previously been filled by the victim, and a much slower memory access time if the victim has not accessed the line, thus revealing to the attacker whether the victim has accessed that cache line.
For machines with private caches, this leakage can be mitigated by scheduling the victim and potential attackers on different cores, or by flushing the private caches after use. However, the latter is impractical for the large last-level cache. In this work, we propose a novel yet simple mitigation approach for cross-core attacks, called the FTM (first time miss) approach. To hide a hit in the shared cache, we make the access behave like a miss the first time a thread touches the cache line. The miss is simulated by buffering the cache line for a time similar to the memory access latency (i.e., a miss penalty) before sending it to the private cache. From the next access onwards, it is safe to allow cache hits on this line, because the attacker has already accessed it once and expects it to be filled anyway. Thus, every cache line appears to have been brought in by the attacker's own first access, and the access patterns of the victim are hidden.
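The timing behavior described above can be sketched in a few lines. This is an illustrative software model, not the paper's hardware design: the latency constants and class names are assumptions chosen for the example, and a real implementation would use a per-core buffer rather than per-thread sets.

```python
# Illustrative model of the FTM idea: a thread's first access to a
# shared cache line is served with miss-like latency even if the line
# is already cached, so the attacker cannot tell whether the victim
# warmed it. Latency values are made up for the example.
HIT_LATENCY, MISS_LATENCY = 10, 200   # cycles (illustrative)

class FTMCache:
    def __init__(self):
        self.lines = set()   # lines currently in the shared cache
        self.seen = {}       # thread id -> lines that thread has accessed

    def access(self, thread_id, line):
        seen = self.seen.setdefault(thread_id, set())
        first_time = line not in seen
        seen.add(line)
        hit = line in self.lines
        self.lines.add(line)             # fill the line on a real miss
        if hit and not first_time:
            return HIT_LATENCY           # safe: the thread filled or
                                         # probed this line before
        # Real miss, or a hit disguised as one: the line is buffered
        # for a memory-like delay before reaching the private cache.
        return MISS_LATENCY
```

In this model, even after a victim thread loads a line, an attacker's first probe of that line still observes `MISS_LATENCY`, so the probe reveals nothing about the victim's accesses.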
The hardware overhead of the FTM scheme is minimal because it only needs a small per-core buffer. Simulation-based evaluation on the SPEC and PARSEC benchmarks shows a low performance hit (< 0.1%) due to the low number of first-time misses in most application programs.