Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension

Published 5 Apr 2026 in cs.CR and cs.AR | (2604.04015v1)

Abstract: The growing complexity of real-time embedded systems demands strong isolation of software components into separate protection domains to reduce attack surfaces and limit fault propagation. However, application-supplied device interrupt handlers -- even untrusted -- have to remain in the kernel to minimize interrupt latency, undermining security and burdening manual certifications. Current hardware extensions accelerate interrupts only when the target protection domain is scheduled by the kernel; consequently, they are limited to improving average-case performance but not worst-case latency, and do not meet the requirements of critical real-time applications such as autonomous vehicles or robots. To overcome this limitation, we propose a novel hardware extension that enables direct, deterministic switching to the appropriate protection domain upon user-level interrupt arrival -- without kernel intervention -- even when that domain is dormant. Our hardware extension reduces worst-case latency by more than 50x with a 19% increase in core area (2% of total die area) and 4.1% increase in dynamic power. To the best of our knowledge, this is the first integrated mechanism to guarantee user-level interrupt delivery with a nanosecond-scale yet bounded worst-case latency.


Authors (3)

Summary

The paper introduces a hardware extension that enables deterministic user-level interrupt delivery using CAM-based IID lookup and shadow register banking to achieve 11-cycle latency.
It reports a more than 50× reduction in worst-case interrupt latency and a more than 60× reduction in jitter compared with traditional kernel-mediated handling.
The design enforces strict spatial and temporal isolation, supporting secure user-level integration in microkernel architectures for critical cyber-physical system deployments.

Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension

Introduction and Motivation

This work addresses a fundamental challenge: achieving deterministic, low-latency, user-level interrupt handling in real-time embedded systems that require strong temporal and spatial isolation of software components. The growing complexity and compositional diversity of cyber-physical systems (CPS), notably autonomous vehicles, medical devices, and industrial controllers, demand strong compartmentalization. Yet, to keep interrupt latency minimal, device interrupt handlers have traditionally run in the privileged kernel, avoiding user-kernel transitions at the cost of security (a larger attack surface and weaker fault containment) and harder system certification. Prior hardware extensions (e.g., Intel UIPI, the RISC-V N extension) deliver interrupts directly to user mode only opportunistically, when the destination process is currently scheduled; otherwise they fall back to unpredictable, high-latency kernel mediation. This bimodal behavior is incompatible with hard real-time requirements, where worst-case rather than average-case latency determines system correctness.

Hardware Extension Architecture

The proposed hardware extension, implemented on a RISC-V RV32IM base core, adds datapath logic and memory-backed microarchitectural units that deliver user-level interrupts deterministically and securely, regardless of whether the target process is currently scheduled. The architecture consists of the following key components:

Interrupt Identification Data Unit (IIDU): Performs a single-cycle match between incoming interrupt sources and the designated user-level protection domains using a content-addressable memory (CAM) structure.
Spatial Protection: On activation, the physical memory protection (PMP) configuration for the handler's protection domain is loaded from a dedicated tightly coupled memory (TCM), confining the handler's device and memory accesses.
Temporal Protection: An execution budget, also loaded from TCM, is enforced via an autonomous countdown timer. Upon budget expiration or on fault (e.g., unauthorized access), the hardware extension enforces immediate handler preemption and context restoration.
Context Management: In the highest-performance variant, the register file is shadow-banked to achieve near-zero context-switch time, with hybrid spilling to TCM when the interrupt nesting depth exceeds the number of available banks.
Figure 1: Proposed hardware extension (V5 variant) block diagram and data flow, highlighting TCM-backed PMP and timer management for deterministic and isolated user-level interrupt delivery.
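The interplay of the components listed above (CAM lookup, TCM-backed PMP configuration, budget countdown, hardware preemption) can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the paper's RTL: it assumes a CAM keyed by interrupt ID, one TCM slot per domain holding PMP regions and a cycle budget, one budget cycle consumed per memory access, and a fall-back to the kernel for unbound interrupts. All names (`UserIrqUnit`, `DomainConfig`, `PmpRegion`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PmpRegion:
    """Hypothetical PMP entry covering [base, base + size) with R/W permissions."""
    base: int
    size: int
    read: bool = True
    write: bool = False

    def allows(self, addr: int, is_write: bool) -> bool:
        inside = self.base <= addr < self.base + self.size
        return inside and (self.write if is_write else self.read)


@dataclass
class DomainConfig:
    """Per-domain state assumed to live in TCM: entry point, PMP regions, budget."""
    handler_entry: int
    pmp: list[PmpRegion]
    budget_cycles: int


class UserIrqUnit:
    """Behavioral model of the delivery path; not cycle-accurate."""

    def __init__(self) -> None:
        # Stand-in for the CAM: interrupt ID -> TCM slot. The hardware compares
        # all entries in parallel in one cycle; a dict lookup models the result.
        self.cam: dict[int, int] = {}
        self.tcm: list[DomainConfig] = []

    def bind(self, irq_id: int, cfg: DomainConfig) -> None:
        self.tcm.append(cfg)
        self.cam[irq_id] = len(self.tcm) - 1

    def deliver(self, irq_id: int, accesses: list[tuple[int, bool]]) -> str:
        """Run one handler invocation; each (addr, is_write) access costs one budget cycle."""
        slot = self.cam.get(irq_id)
        if slot is None:
            return "kernel"  # no user-level binding: the interrupt goes to the kernel
        cfg = self.tcm[slot]  # spatial (PMP) and temporal (budget) state, loaded together
        budget = cfg.budget_cycles  # arms the countdown timer
        for addr, is_write in accesses:
            if budget == 0:
                return "preempted:budget"  # timer expired: hardware preempts the handler
            budget -= 1
            if not any(r.allows(addr, is_write) for r in cfg.pmp):
                return "preempted:fault"  # PMP violation: hardware preempts the handler
        return "completed"
```

Note that the domain does not need to be "running" for `deliver` to succeed: everything the handler needs is found through the CAM and TCM, which is the property that makes the worst case independent of scheduling state.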

The extension integrates into modern single-address-space microkernel architectures; its hardware interface is abstracted as hardware-managed "threads" that are set up through system calls.
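The context-management policy from the component list above (shadow register banks, with hybrid spilling to TCM once nesting outgrows the banks) can be captured as a simple cost model. The assumptions here are illustrative and not taken from the paper: an RV32 context is the 31 general-purpose registers x1 to x31, switching to a free shadow bank costs nothing, and each spilled or restored register costs one TCM access cycle. `ShadowBankedContext` is a hypothetical name.

```python
class ShadowBankedContext:
    """Hypothetical cost model: shadow banks first, TCM spills beyond them.

    Assumptions (not from the paper): an RV32 context is the 31 general-purpose
    registers x1-x31, switching to a free shadow bank costs 0 cycles, and each
    spilled or restored register costs 1 TCM access cycle.
    """

    def __init__(self, shadow_banks: int, regs_per_context: int = 31) -> None:
        self.shadow_banks = shadow_banks
        self.regs = regs_per_context
        self.depth = 0  # current interrupt nesting depth

    def _cost_at(self, depth: int) -> int:
        # Levels 1..shadow_banks get a dedicated bank; deeper levels must spill
        # one context to TCM to free a bank (and restore it on return).
        return 0 if depth <= self.shadow_banks else self.regs

    def enter(self) -> int:
        """Enter a (possibly nested) handler; return context-save cycles."""
        self.depth += 1
        return self._cost_at(self.depth)

    def leave(self) -> int:
        """Return from the innermost handler; return context-restore cycles."""
        if self.depth == 0:
            raise RuntimeError("no active handler")
        cost = self._cost_at(self.depth)
        self.depth -= 1
        return cost
```

With two shadow banks, three nested entries cost 0, 0, and 31 cycles under these assumptions, and the returns mirror that in reverse. The design choice is that the common case (shallow nesting) pays nothing, while deep nesting still has a bounded, statically known cost.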

Design Space Exploration and Implementation

A suite of five implementation variants (V1–V5) is explored to quantify the latency–area–power trade-off:

V1 (Baseline): Minimal area overhead; all tables and the context stack reside in main SRAM and are accessed serially (38-cycle latency).
V2: A dedicated TCM for the context stack eliminates bus contention (29-cycle latency).
V3/V4: Register banking and dedicated TCM tables reduce context-switch overhead, giving further incremental latency reductions.
V5 (Lowest latency): CAM-based IID lookup, register banking, and dedicated TCMs for the stack and tables achieve 11-cycle delivery (more than a 50× reduction in worst-case latency relative to kernel mediation) at the cost of a 19% increase in core area.
Figure 2: Timing diagram of the V5 variant showing minimal sequential pipeline stages and fully parallelized table and context memory accesses, yielding 11-cycle latency.
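As a quick consistency check, the cycle counts in the variant list above can be compared with the kernel-mediation worst case reported in the evaluation (">800 cycles"). Because 800 is only a lower bound, every ratio below is also a lower bound. V3/V4 are left out because the text gives no cycle counts for them; the constant and function names are illustrative.

```python
# Cycle counts quoted in the variant list; V3/V4 are omitted because no
# figures are given for them in this summary.
VARIANT_CYCLES = {"V1": 38, "V2": 29, "V5": 11}

# The evaluation reports kernel mediation at ">800 cycles" worst case, so 800
# is a lower bound and every ratio computed from it is a lower bound too.
KERNEL_WORST_CASE_MIN = 800


def reduction(baseline_cycles: float, variant_cycles: float) -> float:
    """Latency reduction factor of a variant relative to a baseline."""
    return baseline_cycles / variant_cycles


for name, cycles in VARIANT_CYCLES.items():
    print(f"{name}: {cycles} cycles, at least "
          f"{reduction(KERNEL_WORST_CASE_MIN, cycles):.1f}x below kernel mediation")
```

For V5 the bound is about 72.7×, consistent with the "more than 50×" claim; against the 800-cycle bound alone, V1 (about 21.1×) and V2 (about 27.6×) do not reach 50×, and V5 is roughly 3.45× faster than V1.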

Area and power overheads are reported for every variant. V5, the largest, adds about 2% to total SoC die area (45 nm node), which bounds the area cost of all variants, and increases dynamic power by 4.1%; the authors consider this well within practical bounds for modern CPS use cases.
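The core-level figure (19% more core area) and the die-level figure (about 2% of the die) are consistent if the baseline core is a small part of the SoC. Assuming both percentages are measured against the same unmodified design (an assumption; the text does not state the baseline), the added area $\Delta A$ gives:

$$
\Delta A = 0.19\,A_{\text{core}} \approx 0.02\,A_{\text{die}}
\quad\Longrightarrow\quad
\frac{A_{\text{core}}}{A_{\text{die}}} \approx \frac{0.02}{0.19} \approx 0.105
$$

That is, the baseline core would occupy roughly a tenth of the die, so a sizeable core-level overhead shrinks to about 2% at the die level.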

API Integration and Software Abstraction

A kernel API lets user processes register user-level handlers through a secure registration flow that maps each interrupt ID to a handler entry point, a policy, and a PMP/budget domain. After registration, interrupts are handled by the hardware without kernel involvement; the workflow is backward compatible and requires only minor kernel source modifications.

Figure 3: User-level interrupt API and the software-hardware workflow, showing handler registration and activation steps.
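The registration flow described above (bind an interrupt ID to an entry point, a PMP domain, and a budget) can be sketched as kernel-side validation before anything is written to the hardware tables. The specific checks (IRQ ownership, no double binding, a budget cap, region ownership, entry point inside a granted region) are assumptions about what a secure flow would verify, not the paper's API; `IrqRegistry`, `register`, `RegistrationError`, and `MAX_BUDGET_CYCLES` are hypothetical.

```python
from dataclasses import dataclass


class RegistrationError(Exception):
    """Raised when a registration request fails validation."""


@dataclass(frozen=True)
class Region:
    base: int
    size: int

    def contains(self, other: "Region") -> bool:
        return self.base <= other.base and other.base + other.size <= self.base + self.size


@dataclass(frozen=True)
class Binding:
    owner: str
    entry: int
    regions: tuple[Region, ...]
    budget_cycles: int


class IrqRegistry:
    """Hypothetical kernel-side registration; all names are illustrative."""

    MAX_BUDGET_CYCLES = 10_000  # hypothetical cap chosen by the system integrator

    def __init__(self, owned: dict[str, list[Region]], irq_owner: dict[int, str]) -> None:
        self.owned = owned          # memory each process may delegate to its handler
        self.irq_owner = irq_owner  # device IRQ -> process allowed to claim it
        self.table: dict[int, Binding] = {}  # mirrors what would be written to CAM/TCM

    def register(self, caller: str, irq_id: int, entry: int,
                 regions: list[Region], budget_cycles: int) -> Binding:
        if self.irq_owner.get(irq_id) != caller:
            raise RegistrationError("caller does not own this interrupt")
        if irq_id in self.table:
            raise RegistrationError("interrupt already bound")
        if not 0 < budget_cycles <= self.MAX_BUDGET_CYCLES:
            raise RegistrationError("budget out of range")
        mine = self.owned.get(caller, [])
        for r in regions:
            if not any(m.contains(r) for m in mine):
                raise RegistrationError("region not owned by caller")
        # The entry point (one 4-byte RV32 instruction) must lie in a granted region.
        if not any(r.contains(Region(entry, 4)) for r in regions):
            raise RegistrationError("entry point outside granted regions")
        binding = Binding(caller, entry, tuple(regions), budget_cycles)
        self.table[irq_id] = binding
        return binding
```

Doing all validation once, at registration time, is what lets the delivery path itself stay kernel-free: the hardware only ever sees bindings that already passed these checks.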

Evaluation and Numerical Results

Latency Determinism

An evaluation on an FPGA platform shows that all variants of the extension outperform both software-only handling and contemporary opportunistic hardware (Intel UIPI, RISC-V N). When the target process is inactive, every baseline degrades sharply in the worst case (kernel mediation exceeds 800 cycles; Intel UIPI-style delivery is bimodal, with a low best case but a high worst case), whereas every extension variant keeps its minimal latency regardless of process activity.

Figure 4: Raw interrupt latency (CPU cycles) for various schemes; extension V5 achieves consistently low latency under all execution contexts.
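Why opportunistic delivery does not help worst-case analysis can be shown in a few lines: model each scheme as a function from "is the target domain scheduled?" to latency, and take the maximum. The 800-cycle kernel path is the lower bound reported above and 11 cycles is the V5 figure; the 20-cycle fast path is a hypothetical placeholder, since the summary gives no best-case number, and the result does not depend on it. All names are illustrative.

```python
from typing import Callable

# A scheme maps "is the target domain currently scheduled?" to delivery latency.
Scheme = Callable[[bool], int]


def opportunistic(fast_path: int, kernel_path: int) -> Scheme:
    """UIPI/N-style: fast only when the target is running, else kernel mediation."""
    return lambda running: fast_path if running else kernel_path


def deterministic(cycles: int) -> Scheme:
    """Extension-style: same latency whether or not the target is running."""
    return lambda running: cycles


def worst_case(scheme: Scheme) -> int:
    return max(scheme(running) for running in (True, False))


def spread(scheme: Scheme) -> int:
    """Best-to-worst latency spread across the two scheduling states."""
    latencies = [scheme(running) for running in (True, False)]
    return max(latencies) - min(latencies)


KERNEL_PATH = 800  # lower bound of kernel mediation from the evaluation
FAST_PATH = 20     # hypothetical: the summary gives no best-case figure
V5_CYCLES = 11     # V5 delivery latency quoted above

uipi_like = opportunistic(FAST_PATH, KERNEL_PATH)
v5 = deterministic(V5_CYCLES)
print(worst_case(uipi_like), worst_case(v5))  # 800 11
```

However small the fast path is made, the opportunistic scheme's worst case stays at the kernel path, and in this idealized model its best-to-worst spread is large while the deterministic scheme's is zero.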

Industrial Application Metrics

The paper evaluates two canonical CPS workloads with hard jitter and throughput requirements: pulse train output (PTO) generation and co-location with Modbus-RTU communication.

PTO Frequency and Jitter: Compared with kernel-based handling, the extension sustains PTO frequencies an order of magnitude higher and reduces jitter by more than 60× in the frequency ranges critical for robotics and industrial control.
Figure 5: Maximum achievable Pulse Train Output (PTO) frequency across evaluated interrupt delivery mechanisms.

Figure 6: Normalized jitter as a function of PTO frequency, demonstrating minimal timing variance for hardware extensions even under process interleaving.
Task Colocation & Load Isolation: Under high-rate Modbus-RTU interrupt traffic at 1 Mbps, a best-effort background recognition workload loses less than 15% of its throughput with V5, versus 52% with software mediation.
Figure 7: Achievable FPS for a background task as a function of foreground Modbus-RTU baud rate; hardware extensions maintain high throughput under heavy interrupt load.
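Both metrics in this subsection reduce to simple arithmetic on measured data. The sketch below assumes that normalized jitter is the peak-to-peak deviation of the pulse period divided by the nominal period, and that throughput loss is 1 minus loaded FPS over unloaded FPS; the paper's exact definitions may differ, and the function names are hypothetical. A "more than 60×" jitter improvement would then be the ratio of two such jitter values.

```python
def periods(edges: list[float]) -> list[float]:
    """Intervals between consecutive rising-edge timestamps."""
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


def normalized_jitter(edges: list[float], nominal_period: float) -> float:
    """Peak-to-peak period deviation divided by the nominal period (assumed metric)."""
    p = periods(edges)
    return (max(p) - min(p)) / nominal_period


def throughput_loss(unloaded_fps: float, loaded_fps: float) -> float:
    """Fraction of background throughput lost under interrupt load."""
    return 1.0 - loaded_fps / unloaded_fps


# Edges of a nominal 10-unit pulse train with one late edge at t=31.
print(normalized_jitter([0, 10, 20, 31, 40], nominal_period=10))  # 0.2
```

Under this definition, the reported figures mean V5 keeps `throughput_loss` below 0.15 at 1 Mbps, while software mediation reaches about 0.52.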

Isolation and Security

All variants enforce strict spatial and temporal isolation: an illegal memory access or budget overrun by a handler triggers immediate, automatic preemption by hardware, protecting the kernel and co-resident processes against attacks and denial-of-service conditions. Only a small amount of kernel code is needed to use the extension, so it can be deployed in existing microkernels with a negligible increase in source lines of code (SLoC).

Implications and Future Perspectives

The research extends the user-level interrupt model from opportunistic, best-effort acceleration to deterministic, hard real-time delivery for user-level domains, without expanding the kernel's attack surface. In practice, this makes real-time system consolidation and the integration of third-party or untrusted components feasible again, without sacrificing worst-case response times or requiring extensive re-certification.

From a theoretical perspective, the work identifies the minimal microarchitectural primitives required for worst-case latency isolation in a modern single-address-space operating system (SASOS), providing a template for future RISC-V or ARM IP extensions. The paper briefly notes that analogous mechanisms could be ported to MMU-based systems, suggesting future work on deterministic isolation in more general-purpose, possibly multicore, architectures.

Conclusion

The presented hardware extension achieves deterministic, secure user-level interrupt delivery in real-time processors with strong worst-case guarantees. Key results include a more than 50× reduction in worst-case latency, a more than 60× jitter improvement, and a modest area and power cost (about 2% of SoC die area and 4.1% more dynamic power), all while maintaining strict compartmentalization. The architecture is suitable for industrial, automotive, and safety-critical embedded CPS deployments and opens a path to principled consolidation of untrusted code in safety- and security-critical real-time systems (2604.04015).