- The paper introduces a heterogeneous SoC integrating ReckOn, a recurrent SNN accelerator, enabling energy-efficient neuromorphic edge computing on FPGAs.
- It validates two configurations, one microcontroller-based and one ARM-based, reaching above 96% validation accuracy on a binary navigation benchmark and demonstrating resource efficiency and scalability.
- The work shows flexible prototyping by bridging firmware and OS-level control, facilitating real-time online learning in diverse edge applications.
Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA
Introduction
This work presents a heterogeneous System-on-Chip (SoC) architecture aimed at advancing neuromorphic edge computing through the integration of an open-source recurrent Spiking Neural Network (SNN) accelerator, ReckOn, on FPGA platforms. The motivation arises from the increasing relevance of SNNs for energy-efficient, real-time computations in edge applications, and the prohibitive costs and inflexibility associated with custom silicon-based neuromorphic hardware. By leveraging FPGAs, the authors propose a reconfigurable, cost-effective, and open-source hardware platform for prototyping and deploying digital neuromorphic systems, targeting both research and future commercial neuromorphic edge devices.
System Architecture and Methodology
Integration Strategies
Two distinct SoC configurations were implemented and validated:
- X-HEEP Microcontroller-Based Architecture: Here, ReckOn is managed by X-HEEP, an open-source RISC-V microcontroller, synthesized on the FPGA. The microcontroller configures and controls ReckOn via an SPI interface, handling peripheral communications and dataset management through on-chip BRAM. The design emphasizes flexibility for diverse FPGA targets, facilitating firmware-driven experimentation.
- ARM Processor-Based Architecture: Utilizing the ARM Cortex-A9 processor within Xilinx Zynq-7000 SoCs, ReckOn operates as a coprocessor in the programmable logic (PL) fabric, while the processor, running a Linux environment, orchestrates operations over the AXI interconnect. This approach enables external dataset management, batch loading, and advanced software integration through Python/Jupyter interfaces, making it suitable for large-scale, high-data-volume edge workloads.
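The batch-loading idea in the ARM-based configuration can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' driver code: the function name `make_batches`, the `capacity_words` parameter, and the representation of a sample as a list of integers are all assumptions made for the example.

```python
from typing import Iterator, List

Sample = List[int]  # one sample = a list of 32-bit event words (assumed representation)


def make_batches(samples: List[Sample], capacity_words: int) -> Iterator[List[Sample]]:
    """Group samples into batches whose total word count fits a fixed buffer.

    `capacity_words` stands in for the size of the on-chip buffer; the real
    value depends on the design and is not taken from the paper.
    """
    batch: List[Sample] = []
    used = 0
    for sample in samples:
        if len(sample) > capacity_words:
            raise ValueError("a single sample does not fit in the buffer")
        # Start a new batch when the next sample would overflow the buffer,
        # so that no sample is ever split across two batches.
        if used + len(sample) > capacity_words:
            yield batch
            batch, used = [], 0
        batch.append(sample)
        used += len(sample)
    if batch:
        yield batch


# Hypothetical dataset: four samples of different lengths, buffer of 8 words.
dataset = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [10, 11, 12, 13, 14]]
batches = list(make_batches(dataset, capacity_words=8))
print(len(batches), "batches")
```

Keeping whole samples together means each batch ends on a sample boundary, which keeps the hardware-side decoding logic simple under this assumed scheme.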
Peripheral and Data Management
Both designs are built around a finite-state machine (FSM)-based Address Event Representation (AER) decoder responsible for temporal spike pattern encoding and communication between memory, processor, and accelerator. Data samples, encoded as 32-bit words (distinguishing spike, label, and end-of-sample events), are delivered to ReckOn via the AER protocol. The ARM-based architecture extends this design to support batch-wise dataset streaming to overcome BRAM limitations, optimizing both dataset size accommodation and runtime configurability.
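To make the 32-bit event words concrete, the following sketch packs and unpacks spike, label, and end-of-sample events. The bit layout used here (2-bit type, 22-bit time step, 8-bit payload) is purely hypothetical and chosen for illustration; the summary does not specify the actual field widths or ordering, and the names `EventType`, `encode_event`, and `decode_event` are likewise invented for the example.

```python
from enum import IntEnum
from typing import Tuple


class EventType(IntEnum):
    SPIKE = 0          # payload = input neuron address
    LABEL = 1          # payload = target class
    END_OF_SAMPLE = 2  # payload unused


# Assumed layout (illustrative only, not the format used in the paper):
# [31:30] event type | [29:8] time step | [7:0] payload
TYPE_SHIFT = 30
TIME_SHIFT = 8
TIME_MASK = (1 << 22) - 1
PAYLOAD_MASK = 0xFF


def encode_event(kind: EventType, time_step: int, payload: int = 0) -> int:
    """Pack one event into a 32-bit word under the assumed layout."""
    if not 0 <= time_step <= TIME_MASK:
        raise ValueError("time step out of range")
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload out of range")
    return (int(kind) << TYPE_SHIFT) | (time_step << TIME_SHIFT) | payload


def decode_event(word: int) -> Tuple[EventType, int, int]:
    """Unpack a 32-bit word into (event type, time step, payload)."""
    kind = EventType((word >> TYPE_SHIFT) & 0x3)
    time_step = (word >> TIME_SHIFT) & TIME_MASK
    payload = word & PAYLOAD_MASK
    return kind, time_step, payload


# A hypothetical sample: two input spikes, a label, then the end marker.
sample_words = [
    encode_event(EventType.SPIKE, 0, 12),
    encode_event(EventType.SPIKE, 3, 200),
    encode_event(EventType.LABEL, 5, 1),
    encode_event(EventType.END_OF_SAMPLE, 6),
]
decoded = [decode_event(w) for w in sample_words]
```

An FSM-based decoder would branch on the type field in much the same way that `decode_event` does, forwarding spikes to the accelerator and treating the end-of-sample word as the boundary between samples.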
ReckOn Accelerator Details
ReckOn is a task-agnostic, recurrent SNN accelerator implementing the e-prop algorithm for local, online learning. Capable of simulating up to 256 input/recurrent LIF neurons and 16 output LI neurons, it supports both classification and regression workflows. The architecture features on-chip SRAM for network weights, membrane states, and eligibility traces, and is interfaced via AER and SPI for input/output and configuration, respectively.
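The neuron dynamics behind such a network can be illustrated with a small floating-point sketch of one time step for recurrent LIF neurons and LI output neurons, following the standard discrete-time formulation commonly used with e-prop. It is not ReckOn's implementation: the hardware uses its own fixed-point arithmetic, the parameter values (`alpha`, `kappa`, `v_th`) and weights below are hypothetical, and the eligibility-trace update is omitted for brevity.

```python
from typing import List, Tuple


def lif_step(v: List[float], z_prev: List[int], x: List[int],
             w_in: List[List[float]], w_rec: List[List[float]],
             alpha: float = 0.9, v_th: float = 1.0) -> Tuple[List[float], List[int]]:
    """Advance a layer of recurrent LIF neurons by one time step."""
    v_new, z_new = [], []
    for j in range(len(v)):
        i_in = sum(w_in[j][i] * x[i] for i in range(len(x)))
        i_rec = sum(w_rec[j][i] * z_prev[i] for i in range(len(z_prev)))
        # Leak, integrate both input currents, and subtract the threshold
        # if neuron j spiked on the previous step (soft reset).
        vj = alpha * v[j] + i_in + i_rec - v_th * z_prev[j]
        v_new.append(vj)
        z_new.append(1 if vj >= v_th else 0)
    return v_new, z_new


def li_step(y: List[float], z: List[int], w_out: List[List[float]],
            kappa: float = 0.9) -> List[float]:
    """Advance non-spiking leaky-integrator (LI) output neurons by one step."""
    return [kappa * y[k] + sum(w_out[k][j] * z[j] for j in range(len(z)))
            for k in range(len(y))]


# Tiny hypothetical network: 2 inputs, 2 recurrent neurons, 1 output.
w_in = [[0.6, 0.6], [0.2, 0.0]]
w_rec = [[0.0, 0.0], [0.5, 0.0]]
w_out = [[1.0, -1.0]]

v, z, y = [0.0, 0.0], [0, 0], [0.0]
v, z = lif_step(v, z, [1, 1], w_in, w_rec)   # both inputs spike
y = li_step(y, z, w_out)
v2, z2 = lif_step(v, z, [0, 0], w_in, w_rec)  # no input on the next step
```

In this toy run the first recurrent neuron crosses the threshold on the first step and is pulled back below it on the second by the soft reset, while the output neuron simply accumulates the weighted spikes.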
Experimental Results
Resource Utilization
Both architectures were synthesized on a PYNQ-Z2 FPGA board (Zynq-7020 SoC). Key points include:
- X-HEEP Microcontroller Architecture: Consumed 45,651 LUTs, 145 DSPs, and 94 BRAMs (almost saturating available BRAM when accommodating both datasets and network state), operating at 10 MHz.
- ARM Processor Architecture: Used fewer resources (37,528 LUTs, 144 DSPs, 56 BRAMs) while running at 15 MHz, even though it includes additional logic for debug and batch operations. Keeping larger datasets in OS-managed memory and streaming them over AXI relieved the on-chip memory bottleneck.
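The relative savings of the ARM-based design can be worked out directly from the figures above. The short script below uses only the reported LUT, DSP, and BRAM counts; the helper name `relative_reduction` is chosen for the example.

```python
from typing import Dict

# Resource counts as reported for the two configurations.
xheep = {"LUT": 45651, "DSP": 145, "BRAM": 94}
arm = {"LUT": 37528, "DSP": 144, "BRAM": 56}


def relative_reduction(baseline: Dict[str, int], other: Dict[str, int]) -> Dict[str, float]:
    """Percentage by which `other` uses fewer resources than `baseline`."""
    return {k: 100.0 * (baseline[k] - other[k]) / baseline[k] for k in baseline}


savings = relative_reduction(xheep, arm)
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% fewer in the ARM-based design")
```

The calculation gives roughly 17.8% fewer LUTs and 40.4% fewer BRAMs, with DSP usage essentially unchanged, which matches the observation that moving dataset storage off-chip mainly relieves BRAM pressure.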
Functional Validation
Binary Decision Navigation Task
Replicating results from a previously taped-out ReckOn silicon implementation confirmed functional correctness. On a binary delayed-supervision navigation dataset, both SoC implementations (X-HEEP and ARM-based) matched the original RTL simulation and silicon performance, achieving validation accuracy above 96%: 96.8% (X-HEEP) and 96.4% (ARM), with training accuracy of 92.4% (X-HEEP) and 92.2% (ARM).
Braille Digit Classification
Extended evaluation on the Braille digits dataset (originally used for benchmarking neuromorphic devices) demonstrated the system’s ability to handle real-world spatio-temporal pattern recognition:
- 3-class subset (A, E, U): Achieved 90% test accuracy after 200 training epochs, with a peak validation accuracy of 93%.
- 4-class subsets (A, E, U, Space) and (A, E, O, U): Reached 78.8% and 60% test accuracy, respectively, with stable learning dynamics observed during training.
These results not only validate functional parity with the earlier ReckOn silicon implementation but also demonstrate the feasibility of on-chip online learning with state-of-the-art SNN algorithms on FPGA platforms.
Implications and Future Directions
The open-source, FPGA-based heterogeneous SoC implemented in this work provides a practical route for prototyping, evaluating, and deploying neuromorphic algorithms at the edge. Key implications include:
- Accessibility and Flexibility: By avoiding custom silicon, the presented system lowers the barrier to entry for neuromorphic hardware research and accelerates experimentation through open-source, reconfigurable platforms.
- Practicality for Edge AI: Demonstrated support for both firmware-level (bare metal) and OS-level (Linux) software stacks addresses a broad spectrum of edge scenarios, from ultra-low-power microcontroller-driven devices to more complex, multi-application edge nodes.
- Scalability: Efficient batching and management of large datasets position the ARM-based design for tasks beyond on-chip BRAM capacity, enabling more extensive, real-time learning scenarios in future work.
- Benchmarking and Interoperability: The system sets the stage for direct, empirical benchmarking against commercial neuromorphic processors (e.g., Loihi, SpiNNaker), particularly in conjunction with frameworks like Neuromorphic Intermediate Representation (NIR).
- Open-Source Ecosystem: Both ReckOn and X-HEEP are open, fostering community-driven innovation, transparent evaluation, and reproducibility.
Future directions involve systematic benchmarking versus other hardware in the NIR framework, scaling to more complex datasets and tasks, and exploring further architectural optimizations (e.g., power-aware dynamic configuration, advanced learning rules, real-time sensor integration).
Conclusion
This paper delivers a validated, open-source, heterogeneous SoC platform for neuromorphic edge computing on FPGA, integrating the recurrent SNN accelerator ReckOn. Across both the microcontroller-based and ARM-based architectures, the design demonstrates accurate on-chip learning on edge-relevant tasks, with resource-efficient implementations and flexible runtime management. The approach provides a robust foundation for future research and development of neuromorphic edge devices and benchmarks in an open, accessible hardware ecosystem.