Towards Agentic Test-Driven Quality Assurance for 6G Networks

Published 25 Apr 2026 in cs.NI and cs.SE | (2604.23285v1)

Abstract: This work proposes an agentic, intent-driven end-to-end (E2E) orchestration framework that integrates intent co-creation with a Test-Driven Quality Assurance paradigm. In this framework, autonomous agents iteratively refine a user's initial intent into a confirmed, auditable specification. Furthermore, the system automatically derives validation tests from these intents before provisioning, directly mirroring the Test-Driven Development workflow in software engineering to ensure proactive Service Level Agreement (SLA) compliance. The architecture is grounded in a standards-aligned knowledge representation using TM Forum (TMF) information models and catalogs. This enables deterministic graph traversal from high-level Product Offerings down to granular Service/Resource and Test specifications. We prototyped this architecture by extending OpenSlice with a message-driven, multi-agent pattern and integrating MCP-enabled (Model Context Protocol) tool access for real-time knowledge retrieval. Currently, our evaluation of the agents targets the intent co-creation phase as a baseline toward full-scale orchestration. Building on experiments with multiple open-source LLM backends integrated with the TMF-based knowledge base, we observe substantial variability in tool-use reliability and hallucination patterns, underscoring the critical importance of robust knowledge integration in agentic 6G systems.


Authors (4)

Summary

The paper introduces a novel framework combining agentic intent co-creation with test-driven quality assurance, ensuring precise SLO/SLA compliance in 6G networks.
It details the integration of autonomous agents, LLMs, and TM Forum standards to iteratively derive and enforce network provisioning tests.
The empirical evaluation finds substantial variability in tool-use reliability and hallucination behavior across open-source LLM backends, with GPT-OSS-20B performing best in dynamic NaaS scenarios.

Agentic Test-Driven Quality Assurance for 6G Networks: Formal Summary

Context and Motivation

The proliferation of 6G infrastructures introduces unprecedented heterogeneity across radio, transport, core, and cloud-edge domains, rendering manual and semi-automated operations insufficient for dynamic provisioning and compliance with ultra-low latency and stringent reliability requirements. Intent-Based Networking (IBN) mechanisms have gained momentum as a strategy to express desired outcomes rather than explicit network configurations, yet prevailing approaches commonly assume well-formed user intents and reactive quality assurance. These assumptions misalign with real-world scenarios, where operator requirements are often incomplete or ambiguous and service-level validations are tightly coupled to provisioning workflows.

Framework Architecture

The paper presents an end-to-end orchestration framework that integrates agentic, intent-driven processes with test-driven quality assurance (TDD-QA). The key architectural elements are:

Intent Co-Creation: Autonomous agents engage users/operators in iterative refinement loops to clarify and confirm ambiguous objectives, operational constraints, and quality/cost boundaries. The resultant intent forms a formal inventory entry that drives further orchestration.
Test-Driven Assurance: Upon intent confirmation, agents derive validation tests from the finalized intent prior to provisioning, adhering to TM Forum standards (e.g., TMF653). This process mirrors TDD in software engineering, ensuring that SLO/SLA compliance is achieved iteratively during resource activation and maintained throughout the service lifecycle.
Knowledge Representation: The framework models network intelligence as a directed knowledge graph grounded in TM Forum Information Framework (SID), mapping commercial product offerings down to service/resource specifications, associated test requirements, and translation rules.
Service Orchestration: A standards-compliant E2E orchestrator executes agent-generated actions via BPMN workflows, coordinating infrastructure controllers (e.g., RAN, SDN, 5G Core). Observability modules stream real-time telemetry into an SLA/Tests pipeline for closed-loop remediation.
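The knowledge-representation element above can be illustrated with a minimal sketch of deterministic traversal from a product offering down to service, resource, and test specifications. The graph contents, node names, and relation labels (`realizedBy`, `requires`, `validatedBy`) are assumptions made for illustration; they are not taken from the paper or from TM Forum catalogs.

```python
from collections import deque

# Hypothetical catalog graph: node -> list of (relation, child) edges.
# Node names and relations are illustrative only.
CATALOG_GRAPH = {
    "ProductOffering:LowLatencySlice": [("realizedBy", "ServiceSpec:URLLCSlice")],
    "ServiceSpec:URLLCSlice": [
        ("requires", "ResourceSpec:EdgeUPF"),
        ("requires", "ResourceSpec:RANSlice"),
        ("validatedBy", "TestSpec:E2ELatency"),
    ],
    "ResourceSpec:EdgeUPF": [("validatedBy", "TestSpec:UPFThroughput")],
    "ResourceSpec:RANSlice": [],
    "TestSpec:E2ELatency": [],
    "TestSpec:UPFThroughput": [],
}


def resolve_offering(graph, root):
    """Breadth-first traversal from a product offering, grouping reachable nodes by kind."""
    resolved = {"ServiceSpec": [], "ResourceSpec": [], "TestSpec": []}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # Edges are visited in insertion order, so repeated runs give the same result.
        for _relation, child in graph.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            kind = child.split(":", 1)[0]
            if kind in resolved:
                resolved[kind].append(child)
            queue.append(child)
    return resolved


plan = resolve_offering(CATALOG_GRAPH, "ProductOffering:LowLatencySlice")
```

Because the traversal follows catalog edges rather than model output, the set of test specifications attached to an offering does not depend on what an LLM generates, which is one plausible reading of "deterministic" in this context.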

Prototype Implementation

The reference implementation extends OpenSlice, an ETSI-governed open-source operations support system (OSS), with a message-driven, multi-agent orchestration pattern. Key features include:

Agents deployed as Spring Boot microservices, coordinated asynchronously via an ActiveMQ-backed service bus.
Model Context Protocol (MCP) enables agents to access platform tools (catalogs, slice provisioning, observability) without direct control over core logic, maintaining deterministic execution.
Modular "Agent Skills" encode domain-specific reasoning, constraints, and test workflows as context-optimized directories, facilitating on-demand loading and minimizing token consumption in LLM-based decision-making.
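The on-demand loading idea behind Agent Skills can be sketched as follows. This is an in-memory approximation under stated assumptions: the `SkillRegistry` class, the skill names, and the loader callables are hypothetical, and the prototype's actual skills are described as directories rather than Python objects.

```python
class SkillRegistry:
    """Keeps one-line skill summaries in context and loads full bodies only on demand."""

    def __init__(self, skills):
        # skills: name -> (summary, loader), where loader() returns the full skill text.
        self._skills = skills
        self._loaded = {}

    def index(self):
        """Short listing that would always be present in the agent prompt."""
        return "\n".join(
            f"{name}: {summary}" for name, (summary, _) in sorted(self._skills.items())
        )

    def load(self, name):
        """Load the full skill body on first use and cache it."""
        if name not in self._loaded:
            _summary, loader = self._skills[name]
            self._loaded[name] = loader()
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)


# Hypothetical skills; in a directory-based layout each loader would read a file.
registry = SkillRegistry({
    "latency-test": ("derive latency tests", lambda: "FULL latency test workflow"),
    "pricing": ("estimate cost over time", lambda: "FULL pricing rules"),
})

prompt_index = registry.index()   # cheap: summaries only
body = registry.load("pricing")   # expensive text is pulled in only when needed
```

Only the short index is paid for on every turn; the full body of a skill enters the context when the agent actually selects it, which is how such a layout could keep token consumption low.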

Empirical Evaluation

The evaluation centers on the intent co-creation phase, benchmarking agentic decision-making with multiple open-source LLMs integrated via MCP with the knowledge representation layer. Experiments involve:

Resolving complex Network-as-a-Service (NaaS) scenarios through sequential user/agent interaction, with explicit product and parameter constraints.
Measuring convergence to a "ground truth" consisting of (i) correct functional product composition, (ii) accurate fiscal/temporal reasoning, (iii) valid initialization payloads.
Assessing tool-use reliability and hallucination rates across reasoning-centric and tool-augmented LLM families, considering operational latency and resource consumption.
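The ground-truth comparison described above can be sketched as a per-criterion check. The `SessionOutcome` fields, the cost tolerance, and the product identifiers are assumptions for illustration; the paper's actual scoring procedure is not reproduced here, and temporal reasoning is folded into a single cost figure for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionOutcome:
    products: frozenset      # selected product identifiers
    total_cost: float        # cost estimate reported at the end of the session
    payload_valid: bool      # whether the initialization payload passed validation


def score_session(outcome, truth, cost_tolerance=0.01):
    """Return per-criterion pass/fail flags against a ground-truth outcome."""
    checks = {
        "composition": outcome.products == truth.products,
        "cost": abs(outcome.total_cost - truth.total_cost) <= cost_tolerance,
        "payload": outcome.payload_valid and truth.payload_valid,
    }
    # A session converges only if every individual criterion passes.
    checks["converged"] = all(checks.values())
    return checks


# Hypothetical ground truth and one model attempt that misses a product.
truth = SessionOutcome(frozenset({"slice-basic", "edge-compute"}), 120.0, True)
attempt = SessionOutcome(frozenset({"slice-basic"}), 120.0, True)
result = score_session(attempt, truth)
```

Reporting the criteria separately, rather than a single score, makes it visible whether a model failed on product selection, on cost reasoning, or on payload construction.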

Notable findings include:

GPT-OSS-20B: Exhibited the best overall performance, with accurate cost estimation and product selection, zero hallucinations, and compliance with confirmation requirements in sessions of at most 5 minutes.
Qwen3-32B/Qwen3-VL-8B/Magistral-24B: Initiated correct product recommendations but faltered in tool integration or session completion under changing constraints.
Lightweight LLM variants consistently hallucinated and invoked tools improperly, suggesting that a minimum viable model size is required for this task.
Strong claim: The critical risk in agentic OSS operations is not reasoning quality but tool-use reliability and hallucination control.

Practical and Theoretical Implications

Integrating agentic co-creation with test-driven quality assurance establishes a proactive, standards-grounded methodology for deterministic E2E orchestration in 6G networks. By elevating test derivation and SLO/SLA management to first-class entities within the orchestration lifecycle, the framework enforces that service provisioning is continuously guided and validated against explicitly confirmed user intents.
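The test-first ordering described here can be sketched as a small loop in which tests are derived from the confirmed intent before any provisioning call is made. The intent keys, the metric names, and the `fake_provision` stand-in are hypothetical; a real deployment would call the orchestrator and read telemetry instead.

```python
def derive_tests(intent):
    """Turn each numeric SLO in the confirmed intent into a named check (hypothetical mapping)."""
    tests = {}
    if "max_latency_ms" in intent:
        limit = intent["max_latency_ms"]
        tests["latency"] = lambda m, limit=limit: m["latency_ms"] <= limit
    if "min_throughput_mbps" in intent:
        floor = intent["min_throughput_mbps"]
        tests["throughput"] = lambda m, floor=floor: m["throughput_mbps"] >= floor
    return tests


def provision_until_green(intent, provision, max_attempts=3):
    """Derive tests first, then provision and re-provision until all tests pass."""
    tests = derive_tests(intent)  # tests exist before any resource is activated
    failed = []
    for attempt in range(1, max_attempts + 1):
        # The provisioner sees which tests failed last time so it can adjust.
        metrics = provision(attempt, failed)
        failed = sorted(name for name, check in tests.items() if not check(metrics))
        if not failed:
            return {"status": "compliant", "attempts": attempt}
    return {"status": "violated", "attempts": max_attempts, "failed": failed}


def fake_provision(attempt, failed):
    """Stand-in for an orchestrator call; from the second attempt on, latency improves."""
    if attempt == 1:
        return {"latency_ms": 18.0, "throughput_mbps": 150.0}
    return {"latency_ms": 7.0, "throughput_mbps": 150.0}


result = provision_until_green(
    {"max_latency_ms": 10, "min_throughput_mbps": 100}, fake_provision
)
```

As in test-driven development, the failing test on the first attempt is what drives the next provisioning step, and the service is reported compliant only once every derived test passes.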

Practically, this architecture:

Enables automated SLA tracing and closed-loop remediation for dynamic, heterogeneous 6G environments.
Facilitates modular scaling of agentic OSS through standardized skills and tool interfaces.
Reduces dependence on specialized AI knowledge by allowing domain experts to contribute modular knowledge bases.

Theoretically, the knowledge graph approach bridges semantic mismatches between high-level business objectives and low-level resource actions, supporting adaptive orchestration under evolving network conditions and offerings.

Speculation on Future Developments

Future research directions include:

Expanding NaaS benchmarks to validate deterministic convergence across diverse product and test scenarios.
Enhancing operational guardrails with schema validation and catalog-grounding gates.
Enriching knowledge representation and rules for granular traceability from derived test specifications back to original user intents.
Optimizing MCP tool portability and on-prem security for carrier-grade deployments.
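One way to picture the schema-validation and catalog-grounding gates mentioned above is a check that runs on every agent-proposed tool call before it reaches the platform. This is a sketch under assumptions: the field names, the catalog identifiers, and the `gate_tool_call` function are hypothetical and do not describe the authors' planned design.

```python
CATALOG_IDS = {"slice-basic", "edge-compute"}                  # hypothetical catalog entries
REQUIRED_FIELDS = {"product_id": str, "duration_hours": int}   # hypothetical schema


def gate_tool_call(payload, catalog_ids=CATALOG_IDS, schema=REQUIRED_FIELDS):
    """Return a list of violations; an empty list means the call may proceed."""
    errors = []
    # Schema validation: every required field must be present with the expected type.
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    unknown = set(payload) - set(schema)
    if unknown:
        errors.append(f"unexpected fields: {sorted(unknown)}")
    # Catalog grounding: reject identifiers that do not exist in the catalog,
    # which is how a hallucinated product would surface.
    product_id = payload.get("product_id")
    if isinstance(product_id, str) and product_id not in catalog_ids:
        errors.append(f"product not in catalog: {product_id}")
    return errors


ok = gate_tool_call({"product_id": "slice-basic", "duration_hours": 24})
rejected = gate_tool_call({"product_id": "slice-ultra", "duration_hours": "24"})
```

Returning the full list of violations, rather than stopping at the first, gives the agent enough feedback to correct the payload in a single retry.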

Long-term, agentic orchestration leveraging standards-based KR and TDD principles is poised to underpin carrier-grade, trustworthy autonomous networks, with further developments in schema validation, on-prem LLM execution, and explainable AI for multi-domain remediation.

Conclusion

The paper articulates a comprehensive, standards-aligned framework for agentic, intent-driven 6G orchestration with embedded test-driven quality assurance. Emphasizing proactive validation and modular knowledge integration, the work advances practical approaches to carrier-grade OSS autonomy, mitigating operational risks associated with tool unreliability and LLM hallucinations and charting an evolutionary path for scalable, trustworthy AI-native networking infrastructures (2604.23285).
