- The paper introduces a novel CFA scheme that uses deterministic hardware performance counters to verify control flows without expanding the TCB via software instrumentation.
- The methodology employs a dual-enclave design with a tracer and tracee, using ILP-based analysis of control flow graphs to validate execution integrity on commodity CPUs.
- Experimental evaluations demonstrate high detection rates of control flow tampering with manageable performance overhead through targeted ecall segmentation and caching techniques.
Introduction and Motivation
Trusted Execution Environments (TEEs) are widely adopted for executing security-critical code on untrusted platforms, with static attestation serving as the main trust establishment mechanism. This process, however, only inspects the enclave’s state at initialization, remaining oblivious to runtime attacks such as ROP and JOP that deviate control flow post-launch. Control Flow Attestation (CFA) addresses this gap by checking that only legitimate control flow paths are followed during execution.
The paper introduces HPCCFA, a robust CFA scheme for TEEs built upon hardware-backed control flow trace generation using deterministic Hardware Performance Counters (HPCs). Unlike prior work that either relies on software instrumentation (which is susceptible to manipulation if the enclave is compromised) or hardware changes (which are infeasible on commodity hardware), HPCCFA enables enclave-agnostic CFA using existing CPUs without software instrumentation, thus minimizing the Trusted Computing Base (TCB) expansion.
System Architecture
HPCCFA extends TEEs with CFA using a dual-enclave tracer/tracee design. The traced enclave (tracee) operates in an isolated TEE, and a dedicated tracer enclave—executed in parallel and authenticated via mutual attestation—verifies control flow traces generated from the tracee via HPC snapshots. Trace collection and control are handled solely by the enclave’s trusted Security Monitor (SM), which mediates trace transfer and enforces a memory access protocol that unblocks shared regions only after CFA validation.
The workflow proceeds as follows: When the tracee invokes an enclave call (ecall), the SM snapshots the HPCs and instruction pointer, sets verification state to pending, and delegates control to the tracer. After trace validation, if no violation is detected, shared memory is unlocked; otherwise, execution is halted prior to external interaction (Figure 1).
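The gating logic described above can be pictured as a small state machine. The following is a minimal sketch only: the class, method, and state names are hypothetical and do not correspond to Keystone's actual SM interface.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class CfaState(Enum):
    IDLE = auto()
    PENDING = auto()
    VERIFIED = auto()
    VIOLATION = auto()


@dataclass
class Snapshot:
    pc: int            # instruction pointer at the context switch
    counters: tuple    # HPC values at the context switch


@dataclass
class SecurityMonitor:
    """Toy model of the SM gating logic (hypothetical names, not a real API)."""
    state: CfaState = CfaState.IDLE
    shared_unlocked: bool = False
    trace: list = field(default_factory=list)

    def on_ecall(self, pc, counters):
        # Snapshot HPCs and instruction pointer, lock shared memory, mark pending.
        self.trace.append(Snapshot(pc, tuple(counters)))
        self.shared_unlocked = False
        self.state = CfaState.PENDING

    def on_tracer_verdict(self, ok):
        # The tracer reports whether the latest trace segment is legitimate.
        if self.state is not CfaState.PENDING:
            raise RuntimeError("no verification pending")
        if ok:
            self.state = CfaState.VERIFIED
            self.shared_unlocked = True
        else:
            # Violation: shared memory stays locked, so nothing leaves the enclave.
            self.state = CfaState.VIOLATION
            self.shared_unlocked = False

    def read_shared(self):
        if not self.shared_unlocked:
            raise PermissionError("shared memory locked until CFA passes")
        return "shared-data"
```

The key property the sketch tries to capture is that there is no path to `read_shared` succeeding without a positive tracer verdict for the most recent ecall.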
Figure 1: Verification workflow for a tracee/tracer tandem. Shared memory access is gated on successful tracer-side CFA on every enclave transition.
This architecture guarantees that any malicious modification of the control flow can be detected before data is exfiltrated from the enclave, with minimal TCB expansion (Figure 2).
Figure 2: High-level overview of tracer/tracee architecture in Keystone, highlighting modified API, protected trace buffers, IPC, and conditional memory unlocking.
The approach is orthogonal to the specific trace collection implementation but in this work is realized using deterministic HPC events (retired instructions, conditional branches, loads, etc.) to ensure exact prediction from binary analysis.
Control Flow Trace Generation and CFA Verification
HPCCFA’s attestation logic is predicated on static analysis to build a fine-grained control flow graph (CFG) at the level of basic blocks (BBs). Each BB is annotated with its deterministic impact on the selected hardware counters. On each enclave context switch, the SM records current HPC values and instruction pointer. For trace segments between two SM-managed context switches, the CFA algorithm determines whether the sequence could legitimately arise, given the CFG and the observed HPC deltas.
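As an illustration of the annotation step, the sketch below sums per-instruction counter increments into a per-BB vector and computes the delta between two snapshots. The counter order, the tiny mnemonic table, and the 64-bit counter width are assumptions made for this example, not values taken from the paper.

```python
# Assumed counter order: (retired instructions, conditional branches, loads).
# The mnemonic table is a hypothetical excerpt, not the paper's full mapping.
EVENTS = {
    "addi": (1, 0, 0),
    "lw":   (1, 0, 1),
    "beq":  (1, 1, 0),
    "jal":  (1, 0, 0),
}

COUNTER_MODULUS = 2 ** 64  # assumption: counters are 64 bits wide


def bb_vector(mnemonics):
    """Sum the deterministic counter increments of one basic block."""
    total = [0, 0, 0]
    for m in mnemonics:
        for i, inc in enumerate(EVENTS[m]):
            total[i] += inc
    return tuple(total)


def delta(before, after):
    """Counter difference between two snapshots, tolerating wraparound."""
    return tuple((a - b) % COUNTER_MODULUS for b, a in zip(before, after))
```

For example, `bb_vector(["lw", "addi", "beq"])` yields one vector per block that the verifier can later add up along candidate paths.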
For this, the CFA process identifies if a path exists between two nodes in the CFG whose cumulative counter effects exactly match the observed deltas, factoring in loops as arbitrary integer multiples and function calls/returns with stack-aware path selection. The verification problem is posed as an Integer Linear Program (ILP) that tests integer cone membership: whether a non-negative integer combination of loop vectors, added to a simple path vector, yields the measured counter values.
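One way to write the membership test down, using notation introduced here for illustration (it is not taken from the paper): let $\mathcal{P}(s,t)$ be the set of simple paths between the two snapshot locations $s$ and $t$, $\mathbf{v}_p$ the summed counter vector of path $p$, $\boldsymbol{\ell}_1,\dots,\boldsymbol{\ell}_k$ the counter vectors of the loops reachable from $p$, and $\boldsymbol{\Delta}$ the observed HPC deltas. A segment is then accepted if

```latex
\exists\, p \in \mathcal{P}(s,t),\ \lambda \in \mathbb{Z}_{\ge 0}^{k} :\quad
\mathbf{v}_p + \sum_{i=1}^{k} \lambda_i\, \boldsymbol{\ell}_i = \boldsymbol{\Delta}
```

This simplified form omits the stack-aware constraints on calls and returns mentioned above; each $\lambda_i$ counts how often loop $i$ is traversed.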
Loops are handled via path expansion, considering all non-negative integer combinations of the loop bodies that could account for the observed deltas, as depicted in Figure 3.
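To make the loop handling concrete, here is a minimal sketch that checks a single simple path against the observed deltas by bounded enumeration of loop multiplicities. It is a brute-force stand-in for an ILP solver, assumes all counter vectors are non-negative, and uses a hypothetical function name.

```python
def cone_member(path_vec, loop_vecs, observed):
    """Return loop multiplicities k_i with
    path_vec + sum(k_i * loop_vecs[i]) == observed, or None if none exist."""
    residual = tuple(o - p for o, p in zip(observed, path_vec))
    if any(r < 0 for r in residual):
        return None  # the simple path alone already exceeds the observation

    def search(i, residual):
        if i == len(loop_vecs):
            return () if all(r == 0 for r in residual) else None
        loop = loop_vecs[i]
        # A loop can run at most as often as its tightest counter allows.
        bounds = [r // c for r, c in zip(residual, loop) if c > 0]
        max_k = min(bounds) if bounds else 0
        for k in range(max_k + 1):
            rest = tuple(r - k * c for r, c in zip(residual, loop))
            tail = search(i + 1, rest)
            if tail is not None:
                return (k,) + tail
        return None

    return search(0, residual)
```

A full verifier would repeat this check for every candidate simple path between the two snapshot locations and accept the segment if any of them succeeds. Enumeration is exponential in the number of loops, which is one reason to hand realistic instances to an ILP solver instead.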
Figure 3: Example CFG section illustrating simple paths and transitively connected loops relevant for trace verification.
In practice, CFG recovery is complicated by indirect calls and dynamic dispatch. The implementation uses Ghidra for analysis and, where function pointers are present, supplements it with additional static analysis to over-approximate the CFG where necessary.
Implementation on Keystone/RISC-V
HPCCFA is prototyped on the open-source Keystone TEE platform for RISC-V with the StarFive VisionFive2 SoC. The enclave metadata and API are extended for role selection, trace buffer, attestation references, and attestation state management.
The SM is configured to exclusively access and manage RISC-V HPCs (e.g., minstret, mcycle, and event-specific counters). Counter configuration is locked to SM privilege to prevent manipulation by tenant code. Each enclave context switch triggers counter enabling/disabling and snapshotting, while the necessary register state is managed to avoid perturbing the deterministic counts.
CFG generation is performed via static analysis using Ghidra, with fine-grained mappings from all RISC-V instructions to their relevant deterministic HPC increments. The attestation side comprises a two-step process: pre-computation of feasible paths using Rust, and Python-based verification/exploration of the ILP model with the CBC solver (supporting fast batch validation).
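The path pre-computation step can be sketched as a depth-first enumeration of simple (cycle-free) paths together with their summed counter vectors. The paper reports doing this in Rust; the Python version below is only an illustrative sketch, with a hypothetical adjacency-list CFG encoding and hypothetical function names.

```python
def simple_paths(cfg, src, dst):
    """Yield every simple (cycle-free) path from src to dst.

    cfg maps a basic-block label to a list of successor labels.
    """
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        if node == dst:
            yield path
            continue
        for nxt in cfg.get(node, ()):
            if nxt not in path:  # never revisit a block: loops are handled separately
                stack.append((nxt, path + [nxt]))


def path_vector(path, bb_cost):
    """Sum the per-block counter vectors along a path."""
    return tuple(map(sum, zip(*(bb_cost[b] for b in path))))
```

Each resulting vector can serve as the fixed part of the membership test, with loop bodies contributing the variable part.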
Experimental Evaluation
HPCCFA's practical efficacy hinges on two metrics: CFA reliability (detection rate of manipulated control flows) and performance overhead (time for context switches and verification).
CFA Reliability
Robustness was assessed using example enclaves: a “Hello World” runtime and tweetnacl cryptography library routines. Initially, with only three HPCs (due to hardware limitations), CFA reliably detected control flow tampering for short, simple segments (e.g., 98% for “Hello World”), but detection deteriorated for longer, more complex regions (e.g., <1% for long tweetnacl segments). The system is designed to avoid false positives on legitimate control flows, with stress testing performed via synthetic basic-block manipulation and small random perturbations to emulate subtle code-reuse and data-oriented attacks.
Reliability is sensitive to both the segment length (number of basic blocks and loop nesting per measurement window) and the number of deterministic counters: more counters and more frequent measurement points (via inserted ecalls) reduce the density of “valid” measurement cones, dramatically reducing the likelihood of undetectable mimicry. Inserting additional ecalls to segment long execution traces enabled near-perfect detection rates across all attack variants, suggesting that the tradeoff between observation frequency and performance dominates detection effectiveness.
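The effect of counter dimensionality can be illustrated with a toy experiment: count how many alternative loop-iteration vectors produce the same counter totals as a legitimate execution when only the first few counters are observed. The loop vectors, the iteration cap, and the function name are invented for this sketch and are not taken from the paper's evaluation.

```python
from itertools import product


def collisions(loop_vecs, legit_mults, n_counters, max_iter=6):
    """Count alternative iteration vectors indistinguishable from the
    legitimate one when only the first n_counters counters are observed."""
    def total(mults):
        return tuple(
            sum(m * v[j] for m, v in zip(mults, loop_vecs))
            for j in range(n_counters)
        )

    target = total(legit_mults)
    legit = tuple(legit_mults)
    return sum(
        1
        for m in product(range(max_iter + 1), repeat=len(loop_vecs))
        if m != legit and total(m) == target
    )


# Hypothetical loop bodies: (retired instructions, conditional branches, loads).
LOOPS = [(3, 1, 1), (4, 1, 0), (5, 2, 1)]
for n in (1, 2, 3):
    print(n, collisions(LOOPS, (2, 1, 1), n))
```

In this toy setup the number of indistinguishable alternatives shrinks as counters are added, which mirrors the qualitative claim above; it says nothing about the actual rates measured on real enclaves.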
Performance Overhead
The dual-enclave mechanism introduces execution time overhead primarily due to frequent context switches and communication. For the tweetnacl workload, adding 49 extra measurement points (102,619 total segments vs. 30 baseline) resulted in a 14x slowdown. The verification procedure itself, dominated by ILP solving, incurred further cost but typically contributed less than inter-enclave IPC. Caching the results of repeated segment verifications further amortized this cost in practical workloads with repetitive traces.
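Verdict caching can be sketched as a memo table keyed on the segment's endpoints and observed deltas. This assumes that the verdict depends only on that key (and a fixed CFG); if call-stack context also matters, it would have to become part of the key. The class and type names are hypothetical.

```python
from typing import Callable, Dict, Tuple

# (start pc, end pc, observed HPC deltas)
Segment = Tuple[int, int, Tuple[int, ...]]


class VerdictCache:
    """Memoize verification verdicts so repeated segments skip the solver."""

    def __init__(self, solver: Callable[[Segment], bool]):
        self._solver = solver
        self._verdicts: Dict[Segment, bool] = {}
        self.solver_calls = 0

    def verify(self, segment: Segment) -> bool:
        if segment not in self._verdicts:
            # Cache miss: run the expensive check once and remember the result.
            self.solver_calls += 1
            self._verdicts[segment] = self._solver(segment)
        return self._verdicts[segment]
```

For a workload that repeats the same loop-heavy segment many times, only the first occurrence pays the solver cost; later occurrences are answered by a dictionary lookup. Context-switch and IPC costs are unaffected by this cache.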
Security Implications
The system enforces that only authenticated tracer enclaves can access traces, and that the tracee is only executable alongside its designated tracer—a TCB-level property established through attestation. The approach is resilient against memory manipulation and code-reuse techniques as long as measurement values cannot be tampered with. By not exposing trace data outside the TEE, side-channel exposure is minimized, though attacks leveraging microarchitectural covert channels are excluded from the current threat model.
The design supports extensions to synchronize verification windows with timer interrupts (not only ecall-delimited) to reduce temporal leakage in future iterations, with performance and completeness tradeoffs.
HPCCFA diverges from previous CFA approaches such as C-FLAT (Abera et al., 2016), LO-FAT (Dessouky et al., 2017), and GuaranTEE (2603.29749) in its reliance on hardware-only trace generation with deterministic counters, obviating the need for instrumentation susceptible to compromise. It is further distinguished from related signature-based ROP detection using HPCs [sigdrop2016] and from lattice-based trace matching by its integration of formal ILP-based validation with strong CFA semantics (i.e., validation against exact paths in the CFG and deterministic event deltas).
Practical and Theoretical Implications
Practically, HPCCFA provides a deployable CFA solution for TEEs with minimal hardware and TCB changes, suitable for enclaves deployed in untrusted, multi-tenant environments, especially when integrated with automated trace segmentation and verification caching. Theoretically, the work connects CFA reliability to the density of integer cones determined by HPC event selection and the choice of measurement intervals, suggesting that dynamic, per-segment event selection can further strengthen detection guarantees under constrained hardware.
The methodology is adaptable to other TEE platforms (e.g., Intel TDX, ARM CCA) with equivalent deterministic HPC support and SM orchestration, though instruction set complexity and CFG recovery fidelity pose migration challenges.
Avenues for future work include integrating hardware-assisted dynamic measurement (e.g., Intel PEBS), exploring TCB-verification co-location, and mitigating side-channel exposure via randomization and asynchronous verification.
Conclusion
HPCCFA advances the state of TEE CFA by introducing an enclave-agnostic, dual-enclave verification mechanism that exploits deterministic HPCs to safeguard control flow. The system robustly attests execution integrity and halts data leakage on detection, achieving near-perfect detection rates with sufficient instrumentation and counter dimensionality. The tradeoff between performance and reliability is quantitatively characterized, providing clarity for engineering deployments in diverse threat environments. This research lays groundwork for TEE-based CFA primitives deployable on commodity hardware, opening the path for future adaptive, event-driven attestation frameworks.