µSTT: Microarchitecture Design for Speculative Taint Tracking

Published in ICCD, 2025

@inproceedings{ustt,
  title     = {µSTT: Microarchitecture Design for Speculative Taint Tracking},
  author    = {Boru Chen and Rutvik Choudhary and Kaustubh Khulbe and Archie Lee and Adam Morrison and Christopher W. Fletcher},
  booktitle = {ICCD},
  year      = {2025}
}

Speculative execution attacks exploit attacker-controlled speculation to leak sensitive data through microarchitectural covert channels. Speculative Taint Tracking (STT) is a state-of-the-art hardware mechanism that blocks such attacks in three steps: it taints data produced by speculative loads, untaints that data once none of the loads it depends on is speculative anymore, and delays instructions that could create covert channels (transmitters) until their inputs are untainted. However, it remains unclear whether STT is feasible in hardware, because its hardware cost has not been analyzed in detail.
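The three STT rules above can be illustrated with a small software model. This is a sketch under stated assumptions, not the paper's hardware: it assumes taint is tracked as the age of the youngest load a value depends on (the "youngest root of taint" idea from the original STT), that ages grow for younger instructions, and that a hypothetical `visibility_point` marks the oldest instruction that can still be squashed. All function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical software model of STT's taint rules (not the paper's circuit).
# Assumption: ages increase for younger instructions, and a load is still
# speculative while its age is >= visibility_point.


@dataclass
class Value:
    yrot: Optional[int] = None  # age of youngest load this value depends on; None = no load root


def is_tainted(v: Value, visibility_point: int) -> bool:
    # Untainting needs no register update: a value becomes untainted as soon as
    # the visibility point moves past its youngest root.
    return v.yrot is not None and v.yrot >= visibility_point


def load_output(load_age: int) -> Value:
    # Tainting: data returned by a load is rooted at that load.
    return Value(yrot=load_age)


def alu_output(*srcs: Value) -> Value:
    # Propagation: a result depends on the youngest root among its sources.
    roots = [s.yrot for s in srcs if s.yrot is not None]
    return Value(yrot=max(roots) if roots else None)


def may_execute_transmitter(srcs: List[Value], visibility_point: int) -> bool:
    # Delaying: a transmitter waits until all of its inputs are untainted.
    return not any(is_tainted(s, visibility_point) for s in srcs)


secret = load_output(load_age=10)   # speculative load at age 10
addr = alu_output(secret, Value())  # address computed from the loaded value
print(may_execute_transmitter([addr], visibility_point=5))   # False: load 10 is still speculative
print(may_execute_transmitter([addr], visibility_point=11))  # True: load 10 is past the visibility point
```

Representing taint as a single age, rather than a set of loads, is what lets untainting happen implicitly as the visibility point advances.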

This paper presents the first in-depth hardware cost analysis of STT and identifies two key challenges: (1) the logic delay of taint propagation, which grows with the rename width, and (2) the area overhead of instruction delaying, which requires expensive CAM-style logic to enforce speculation safety.
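One way to see why taint propagation latency can grow with rename width: within a single rename group, an instruction may read a register written by an earlier instruction in the same group, so its taint cannot be computed until that producer's taint is known. The sketch below is a hypothetical software model, not the paper's analysis; it reuses the youngest-root-of-taint assumption and counts the longest chain of serially dependent taint computations in one group as a rough proxy for circuit depth. The tuple layout `(dst, srcs, is_load, age)` and the function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each instruction is (dst_reg, src_regs, is_load, age).
Instr = Tuple[str, List[str], bool, int]


def rename_group_taint(group: List[Instr],
                       reg_yrot: Dict[str, Optional[int]]):
    """Compute each instruction's youngest root of taint in program order.

    reg_yrot holds roots produced by older, already-renamed instructions.
    Returns (roots per instruction, longest in-group dependency chain).
    """
    yrot = dict(reg_yrot)          # latest root per architectural register
    chain: Dict[str, int] = {}     # in-group chain length ending at each register
    results: List[Optional[int]] = []
    longest = 0
    for dst, srcs, is_load, age in group:
        if is_load:
            y, c = age, 1          # a load's result is rooted at the load itself
        else:
            roots = [yrot[s] for s in srcs if yrot.get(s) is not None]
            y = max(roots) if roots else None
            # Must wait for any in-group producer of a source register.
            c = 1 + max((chain.get(s, 0) for s in srcs), default=0)
        yrot[dst], chain[dst] = y, c
        results.append(y)
        longest = max(longest, c)
    return results, longest


def serial_chain(width: int):
    # Worst case: r1 = load; r1 = r1 + 1; r1 = r1 + 1; ... within one group.
    group = [("r1", [], True, 100)]
    group += [("r1", ["r1"], False, 100 + i) for i in range(1, width)]
    return rename_group_taint(group, {})


print(serial_chain(4)[1])  # 4
print(serial_chain(8)[1])  # 8: the chain grows linearly with rename width
```

In this worst case each instruction's taint depends on the one before it, so a naive circuit evaluates a chain as long as the rename group.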

To address these challenges, we propose µSTT, a new microarchitecture for STT built on two mechanisms. First, the Age Matrix is a shallow taint-propagation circuit that removes 85% of the logic-delay overhead of prior STT designs while adding only 36% more area at the default rename width of 8. Second, the impede micro-op implements instruction delaying by reusing the core's existing read-after-write (RAW) dependency tracking. Together, these mechanisms make STT more practical: they reduce its hardware cost and maintain strong security at only 5% additional performance overhead relative to the original STT.
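The abstract does not describe the Age Matrix's internal organization, so the following is only a sketch of the textbook age matrix used in issue schedulers, offered as an assumption about the general idea. An n-by-n bit matrix records which slot was allocated before which, so the youngest member of any requested set can be found with one independent AND/OR check per slot rather than a serial chain of sequence-number comparisons. The class and method names are hypothetical; a plausible (unconfirmed) use is selecting the youngest root of taint among an instruction's source operands.

```python
from typing import Iterable, List, Optional


class AgeMatrix:
    """Hypothetical age matrix: older[i][j] is True when slot i was allocated before slot j."""

    def __init__(self, n: int):
        self.n = n
        self.valid = [False] * n
        self.older = [[False] * n for _ in range(n)]

    def allocate(self, k: int) -> None:
        # Every currently valid slot is older than the newly allocated slot k.
        for j in range(self.n):
            self.older[j][k] = self.valid[j]
            self.older[k][j] = False
        self.valid[k] = True

    def free(self, k: int) -> None:
        self.valid[k] = False

    def youngest(self, request: Iterable[int]) -> Optional[int]:
        """Return the youngest slot among the requested (valid) slots, or None.

        Slot i wins if it is older than no other requested slot. Each slot's
        check is independent of the others, so in hardware all n checks can
        be evaluated in parallel as shallow AND/OR reductions.
        """
        req: List[int] = list(request)
        winners = [i for i in req
                   if not any(self.older[i][j] for j in req if j != i)]
        return winners[0] if winners else None


m = AgeMatrix(4)
for slot in (2, 0, 3):        # allocation order: slot 2 oldest, then 0, then 3
    m.allocate(slot)
print(m.youngest({2, 0, 3}))  # 3
print(m.youngest({2, 0}))     # 0
```

The design trade-off this illustrates is area for depth: the matrix costs n squared bits, but selection depth no longer depends on how many candidates are compared in sequence.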