Background
Cloud providers now let you run code inside hardware-protected virtual machines — called Confidential VMs — where even the cloud operator cannot see what's happening inside. Intel TDX (Trust Domain Extensions) is one such technology, increasingly used to protect sensitive tasks like private AI inference, financial applications, and medical analytics.
TDX encrypts all guest memory and prevents the hypervisor (host) from reading or modifying guest data directly. Sounds airtight, but the host still shares physical hardware with the guest: caches, interconnects, and memory controllers all serve both host and guest traffic.
TDXRay systematically exploits these shared resources as side channels — indirect signals that leak what the encrypted VM is doing, without ever breaking the cryptographic boundary.
Four Side-Channel Primitives in TDX
We exploit four host-observable signals in Intel TDX. Each relies on a different shared resource, and together they cover granularities from whole pages down to individual cache lines.
Intel TDX exposes a host API to temporarily block/unblock guest memory page translations. We show a malicious hypervisor can repeatedly block all pages in a region and wait for faults — each fault reveals exactly which page the guest just accessed, building a deterministic page-level trace.
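The block-and-fault loop can be illustrated with a toy simulation. This is only a sketch of the idea, not the TDXRay kernel module: `ToyHost`, `block_all`, and `on_guest_access` are hypothetical names, and the model assumes the host unblocks the faulting page so the guest can make progress while every other page in the region is blocked again.

```python
from typing import Iterable, List, Set


class ToyHost:
    """Toy model of a host that can block guest page translations."""

    def __init__(self, pages: Iterable[int]) -> None:
        self.region: Set[int] = set(pages)   # pages the host wants to trace
        self.blocked: Set[int] = set()       # pages whose translation is blocked
        self.trace: List[int] = []           # page-level trace seen by the host

    def block_all(self) -> None:
        """Block every page in the traced region."""
        self.blocked = set(self.region)

    def on_guest_access(self, page: int) -> None:
        """Model one guest access; a blocked page faults to the host."""
        if page in self.blocked:
            self.trace.append(page)      # the fault reveals the page number
            self.block_all()             # re-arm the whole region ...
            self.blocked.discard(page)   # ... except the page the guest needs now


host = ToyHost(range(4))
host.block_all()
for page in [1, 1, 2, 1, 3, 3]:
    host.on_guest_access(page)
print(host.trace)
```

In this model, back-to-back accesses to the same page produce a single trace entry, because the page stays unblocked until the guest moves to a different one.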
Although TDX prevents the host from reading encrypted guest memory, a host read of a physical alias address still takes measurably longer when the guest has previously cached that line. The timing difference reveals whether the guest has the line cached, and even whether the guest read it or wrote it.
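Turning raw latencies into a cached/not-cached decision usually comes down to a calibrated threshold. The sketch below assumes the host has already collected two sets of calibration latencies; the function names and all cycle counts are invented for illustration and are not measurements from TDXRay. It follows the direction stated above (a slower alias read means the guest had the line cached) and leaves out the read-versus-write distinction.

```python
from statistics import median
from typing import Sequence


def calibrate_threshold(untouched: Sequence[int], touched: Sequence[int]) -> float:
    """Midpoint between the median latency of each calibration set."""
    return (median(untouched) + median(touched)) / 2


def guest_cached(latency: int, threshold: float) -> bool:
    """A slow alias read suggests the guest had the line cached."""
    return latency > threshold


# Made-up cycle counts, purely illustrative.
threshold = calibrate_threshold(untouched=[100, 104, 98], touched=[180, 175, 190])
print(guest_cached(170, threshold), guest_cached(105, threshold))
```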
Intel TSX hardware transactions abort when they encounter a conflicting cache line. We wrap a read of a guest memory alias inside a TSX transaction — if the guest has that line cached in L1, the transaction aborts. No timer needed, no frequency-scaling noise — a clean binary signal.
The mwait instruction pauses execution until a specific memory address is accessed. We show it works on the physical address — including TDX private memory. This provides cache-line-granular synchronization: the host can precisely align its measurements with guest execution.
The Framework
TDXRay is a Linux kernel module that combines all four primitives into a practical tracing system. It operates entirely within legitimate host interfaces, requires no guest cooperation, and produces cache-line-granular memory access traces of any unmodified confidential VM.
We validated TDXRay on the classic AES T-table cache attack — a standard benchmark for cache side channels. All four primitives successfully recover the secret key.
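For readers unfamiliar with the benchmark, the first-round leak in a T-table AES implementation can be sketched as follows. The simulation assumes 4-byte table entries, 64-byte cache lines, a line-aligned table, and known plaintext; it models only the first-round lookup and recovers only the upper four bits of one key byte. The helper names are hypothetical and this is not the evaluation code used for TDXRay.

```python
from collections import Counter
from typing import Iterable, Tuple

ENTRIES_PER_LINE = 16  # 64-byte cache line / 4-byte T-table entry (assumed)


def observed_line(plaintext_byte: int, key_byte: int) -> int:
    """Cache line touched by the first-round lookup T[p ^ k]."""
    return (plaintext_byte ^ key_byte) // ENTRIES_PER_LINE


def recover_upper_nibble(samples: Iterable[Tuple[int, int]]) -> int:
    """Majority vote over (plaintext_byte, observed_line) pairs.

    Since (p ^ k) >> 4 == (p >> 4) ^ (k >> 4), each sample votes for
    the key's upper nibble as (p >> 4) ^ line.
    """
    votes = Counter((p >> 4) ^ line for p, line in samples)
    return votes.most_common(1)[0][0]


secret_key_byte = 0xA7
samples = [(p, observed_line(p, secret_key_byte)) for p in (0x00, 0x3C, 0xFF, 0x51)]
print(hex(recover_upper_nibble(samples)))
```

The majority vote is there so that a few noisy observations do not change the result; with a noise-free trace a single sample would suffice.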
Evaluated on Intel Xeon 6736P (Granite Rapids) · TDX Module v2.0
Case Study: Private LLM Inference
Confidential computing is being rapidly adopted for private AI inference: users want their prompts kept secret from the cloud operator. TDXRay directly threatens this guarantee.
Before an LLM processes your prompt, it runs a tokenizer — a program that converts text into numeric IDs using a hash map stored in memory. Each word or sub-word lookup traverses a linked list in that hash map, creating a predictable, secret-dependent memory access pattern.
The attacker monitors which hash map nodes the tokenizer visits — and since the hash function is public, the bucket traversal uniquely identifies each token. Stitch the tokens together, and you have the full prompt.
Critically, this attack happens entirely on the CPU during tokenization — before inference even begins — meaning GPU confidentiality offers no protection.
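The tokenizer leak can be illustrated with a toy chained hash map. This sketch makes several assumptions that are not from the original: a CRC32 stand-in for the public hash function, a tiny five-word vocabulary, node IDs in place of memory addresses, every lookup hitting a vocabulary entry, and an attacker who can rebuild the same map layout offline. It is not the real tokenizer or the TDXRay attack code.

```python
import zlib
from typing import List, Tuple

NUM_BUCKETS = 8  # arbitrary size for the toy map
Chain = List[Tuple[int, str]]


def bucket_of(token: str) -> int:
    """Stand-in for the tokenizer's public hash function."""
    return zlib.crc32(token.encode()) % NUM_BUCKETS


def build_map(vocab: List[str]) -> List[Chain]:
    """Chained hash map: each bucket is a list of (node_id, token)."""
    buckets: List[Chain] = [[] for _ in range(NUM_BUCKETS)]
    for node_id, token in enumerate(vocab):
        buckets[bucket_of(token)].append((node_id, token))
    return buckets


def lookup_trace(buckets: List[Chain], token: str) -> List[int]:
    """Node IDs touched while walking the chain (the leaked signal)."""
    visited: List[int] = []
    for node_id, candidate in buckets[bucket_of(token)]:
        visited.append(node_id)
        if candidate == token:
            break
    return visited


def decode(buckets: List[Chain], trace: List[int]) -> str:
    """Attacker side: the last node visited holds the matched token."""
    last = trace[-1]
    for chain in buckets:
        for node_id, token in chain:
            if node_id == last:
                return token
    raise KeyError(last)


vocab = ["the", "cat", "sat", "on", "mat"]
buckets = build_map(vocab)
prompt = ["the", "cat", "sat"]
recovered = [decode(buckets, lookup_trace(buckets, t)) for t in prompt]
print(recovered)
```

The victim never reveals the token itself; the attacker only sees which nodes were touched and maps the final node back to a vocabulary entry.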
Defenses
We investigate both short-term software-level mitigations and long-term architectural changes that hardware vendors can adopt.
Replace the standard hash map with a data-oblivious map that produces indistinguishable access patterns regardless of input. We implement this for Llama 3.2's full 128K-token vocabulary and show acceptable performance overhead — inference time still dominates for real workloads.
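The simplest data-oblivious lookup is a full linear scan that touches every entry and selects the result arithmetically instead of exiting early. The sketch below shows that baseline only; it is not necessarily the construction implemented for the Llama 3.2 vocabulary, and Python itself gives no constant-time guarantees (for example, the string comparison may return early). The returned `touched` list exists purely to make the access pattern visible.

```python
from typing import List, Sequence, Tuple


def oblivious_lookup(entries: Sequence[Tuple[str, int]], key: str) -> Tuple[int, List[int]]:
    """Scan every entry; return (token_id or -1, indices touched)."""
    result = -1
    touched: List[int] = []
    for index, (token, token_id) in enumerate(entries):
        touched.append(index)            # every entry is visited, hit or miss
        match = int(token == key)        # 1 on match, 0 otherwise
        result = match * token_id + (1 - match) * result  # select without an early exit
    return result, touched


entries = [("the", 0), ("cat", 1), ("sat", 2)]
id_a, trace_a = oblivious_lookup(entries, "cat")
id_b, trace_b = oblivious_lookup(entries, "the")
print(id_a, id_b, trace_a == trace_b)
```

The cost is one pass over the whole vocabulary per lookup, which is the kind of overhead that has to be weighed against inference time.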
TDXRay leaves observable microarchitectural footprints: inflated page fault rates and elevated cache miss counts. A monitor inside the confidential VM can track these performance counters and detect when the host is tracing it. TDX supports virtualized performance counters, making this practical.
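A counter-based detector can be as simple as comparing the current rate against a baseline recorded under normal operation. The sketch below is a hypothetical illustration: the function name, the sample values, and the three-sigma rule are assumptions, and reading the actual virtualized performance counters is left out.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(baseline: Sequence[float], sample: float, k: float = 3.0) -> bool:
    """Flag a sample that exceeds the baseline mean by more than k standard deviations."""
    threshold = mean(baseline) + k * pstdev(baseline)
    return sample > threshold


# Invented per-interval cache-miss counts, for illustration only.
baseline = [100, 110, 90, 105, 95]
print(is_anomalous(baseline, 120), is_anomalous(baseline, 500))
```

A real monitor would also need to handle workloads whose baseline shifts over time, which this fixed-baseline check does not attempt.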
Load+Probe and TSX-Probe work because the cache coherence protocol doesn't distinguish cache lines with identical physical addresses but different encryption key IDs (HKIDs). Incorporating HKID into cache tags would allow these lines to coexist, eliminating the conflict-based timing signal.
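The effect of tagging can be shown with a deliberately simplified model. The `ToyCache` class below is hypothetical and captures nothing about real coherence protocols; it only contrasts tags built from the physical address alone with tags built from the physical address plus HKID.

```python
from typing import Dict, Tuple


class ToyCache:
    """Toy cache that reports whether an access collides with another HKID's line."""

    def __init__(self, tag_includes_hkid: bool) -> None:
        self.tag_includes_hkid = tag_includes_hkid
        self.lines: Dict[Tuple[int, ...], int] = {}  # tag -> HKID that owns the line

    def _tag(self, pa: int, hkid: int) -> Tuple[int, ...]:
        return (pa, hkid) if self.tag_includes_hkid else (pa,)

    def access(self, pa: int, hkid: int) -> bool:
        """Return True if this access displaced a line held under a different HKID."""
        tag = self._tag(pa, hkid)
        owner = self.lines.get(tag)
        conflict = owner is not None and owner != hkid
        self.lines[tag] = hkid
        return conflict


pa_only = ToyCache(tag_includes_hkid=False)
pa_only.access(0x1000, hkid=1)                   # guest caches the line
conflict_today = pa_only.access(0x1000, hkid=0)  # host alias collides

with_hkid = ToyCache(tag_includes_hkid=True)
with_hkid.access(0x1000, hkid=1)
conflict_fixed = with_hkid.access(0x1000, hkid=0)  # lines coexist
print(conflict_today, conflict_fixed)
```

With address-only tags the host's alias access collides with the guest's line, which is the observable event; once the HKID is part of the tag, the two lines no longer interact in this model.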
Research Team
* Equal contribution joint first authors