The Log is the Agent: Event-Sourced Reactive Graphs for Auditable, Forkable Agentic Systems

Published 21 May 2026 in cs.AI and cs.MA | (2605.21997v1)

Abstract: Most agent frameworks are built around the LLM: a conversation loop comes first, then tools, then rules, and finally a logging layer bolted on for observability, with state persisted as retrievable "memory." We describe ActiveGraph, a runtime that inverts this arrangement. The append-only event log is the source of truth; the working graph is a deterministic projection of that log; and behaviors--ordinary functions, classes, LLM-backed routines, or logic attached to typed edges--react to changes in the graph and emit new events. No component instructs another; coordination happens entirely through the shared graph. This single design decision yields three properties that retrieval-and-summarization memory systems do not provide: deterministic replay of any run from its log, cheap forking that branches a run at any event without re-executing the shared prefix, and end-to-end lineage from a high-level goal down to the individual model call that produced each artifact. We present the architecture, a determinism contract that makes replay sound, and a worked diligence example whose full causal structure is reconstructable from the log alone. We discuss--without claiming to demonstrate--why this substrate is unusually well suited to self-improving agents, and how it extends the BabyAGI lineage and prior graph-memory research.


Authors (1)

Yohei Nakajima

Summary

The paper introduces a log-first architecture where an append-only event log unifies agent state, ensuring deterministic replay and comprehensive auditability.
It demonstrates reactive functions and pattern-based subscriptions that eliminate traditional orchestrators, improving decentralized control and scalability.
Efficient forking and counterfactual reasoning enable safe self-improvement and robust compliance by reconstructing and comparing divergent agent behaviors.

Event-Sourced Architectures for Agentic Systems: ActiveGraph Runtime

Architectural Paradigm: Log-Primary Agentic Systems

ActiveGraph reconfigures the agent architecture by defining the append-only event log as the ground truth for agent state and treating conventional memory and conversational subsystems as derived projections. Behaviors are reactive functions that subscribe to graph patterns or event types; they respond to changes by invoking models, tools, or computations and emitting further events. The substrate has no orchestrator or workflow engine: agent coordination emerges from pattern-based subscriptions on a shared graph, extending the BabyAGI lineage and prior graph-memory research.

Figure 1: The ActiveGraph runtime cycle, where the append-only event log is the primary source, with replay folding log entries into a reactive graph and behaviors subscribing to graph-shape changes.

The architectural distinction is not merely philosophical: deterministic replay of any run, cheap forking at arbitrary points, and comprehensive lineage tracking emerge directly from treating the log as primary. Object state, relations, rules, tool calls, and even code changes all become events in a single linear history, which makes any run reconstructable from its log and supports counterfactual reasoning across runs.
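The idea that the graph is a projection of the log can be illustrated with a small fold. The following is a minimal sketch, not ActiveGraph's API: the `Event` and `Graph` classes, the event kinds (`object.created`, `object.patched`, `relation.created`), and the payload fields are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int        # position in the append-only log
    kind: str       # hypothetical event kind, see apply()
    payload: dict


@dataclass
class Graph:
    objects: dict = field(default_factory=dict)   # object id -> attributes
    relations: set = field(default_factory=set)   # (src, type, dst) triples


def apply(graph: Graph, event: Event) -> Graph:
    """Apply one event to the graph in place and return the graph."""
    p = event.payload
    if event.kind == "object.created":
        graph.objects[p["id"]] = dict(p["attrs"])
    elif event.kind == "object.patched":
        graph.objects[p["id"]].update(p["attrs"])
    elif event.kind == "relation.created":
        graph.relations.add((p["src"], p["type"], p["dst"]))
    # Other kinds (for example recorded model calls) leave the graph shape alone.
    return graph


def project(log: list[Event]) -> Graph:
    """Fold the whole log, in sequence order, into a fresh graph."""
    graph = Graph()
    for event in sorted(log, key=lambda e: e.seq):
        apply(graph, event)
    return graph


LOG = [
    Event(0, "object.created", {"id": "goal-1", "attrs": {"type": "Goal"}}),
    Event(1, "object.created", {"id": "q-1", "attrs": {"type": "Question"}}),
    Event(2, "relation.created", {"src": "q-1", "type": "derived_from", "dst": "goal-1"}),
    Event(3, "object.patched", {"id": "q-1", "attrs": {"status": "answered"}}),
]

state = project(LOG)
print(state.objects["q-1"])    # {'type': 'Question', 'status': 'answered'}
print(project(LOG) == state)   # True: the same log always folds to the same graph
```

Because `project` reads nothing except the log, discarding the graph loses no information; it can be rebuilt at any time, and folding only a prefix of the log yields the state as of that event.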

Determinism, Replay, and Auditability

ActiveGraph enforces a determinism contract: agent behaviors must be deterministic functions of their inputs, so that replaying a log produces identical state across runs. Interactive components, notably LLM-based behaviors and tools, are nondeterministic when first executed, but they become deterministic during replay because their responses are recorded as content-addressed events and returned from the record instead of being recomputed. Two replay modes define the operational guarantees: strict replay is used for validation and flags any divergence from the recorded run, while permissive replay is used for live agent runs.
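A content-addressed response record with strict and permissive modes might look like the sketch below. It is an illustration under assumptions: the class `CachedModel`, the exception `ReplayDivergence`, and the choice of SHA-256 over sorted-key JSON as the canonical form are not taken from the paper, and a plain dictionary stands in for responses recorded as events.

```python
import hashlib
import itertools
import json


def request_key(request: dict) -> str:
    """Content address: SHA-256 of a canonical JSON encoding of the request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayDivergence(Exception):
    """Raised in strict mode when a request has no recorded response."""


class CachedModel:
    def __init__(self, call_model, cache=None, strict=False):
        self.call_model = call_model            # the real, nondeterministic call
        self.cache = {} if cache is None else cache
        self.strict = strict
        self.live_calls = 0

    def __call__(self, request: dict) -> str:
        key = request_key(request)
        if key in self.cache:                   # recorded: return the saved answer
            return self.cache[key]
        if self.strict:                         # strict replay: a miss is a divergence
            raise ReplayDivergence(f"no recorded response for {key[:12]}")
        self.live_calls += 1                    # permissive: call live and record
        response = self.call_model(request)
        self.cache[key] = response
        return response


_counter = itertools.count(1)

def fake_model(request: dict) -> str:
    """Stand-in for an LLM: returns a different answer on every call."""
    return f"answer #{next(_counter)} to {request['prompt']}"


live = CachedModel(fake_model)
first = live({"model": "demo", "prompt": "Summarize the 10-K"})

replay = CachedModel(fake_model, cache=dict(live.cache), strict=True)
same = replay({"prompt": "Summarize the 10-K", "model": "demo"})  # key order differs
print(first == same, replay.live_calls)   # True 0
```

Sorting keys before hashing makes the address independent of dictionary ordering, which is one small part of the request-canonicalization problem; a request that was never recorded raises `ReplayDivergence` in strict mode instead of silently calling the model.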

Lineage is preserved at a granular level: every artifact produced by the agent is accompanied by provenance linking it to the exact behavior, triggering event, and model call. Auditing, rollback, and forensic reconstruction become straightforward operations over the log, which supports compliance and trust-critical scenarios.
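Reconstructing lineage then amounts to following provenance links backwards. The sketch below assumes a simplified provenance block with hypothetical fields `behavior`, `trigger`, and `call`; here `trigger` names the object whose appearance fired the behavior, which is a simplification of the triggering event described above, and the ids are invented for illustration.

```python
PROVENANCE = {
    # object id -> provenance block (field names and ids are hypothetical)
    "memo-1":  {"behavior": "write_memo",         "trigger": "claim-1", "call": "model-9"},
    "claim-1": {"behavior": "extract_claims",     "trigger": "doc-1",   "call": "model-7"},
    "doc-1":   {"behavior": "fetch_document",     "trigger": "q-1",     "call": "tool-3"},
    "q-1":     {"behavior": "generate_questions", "trigger": "goal-1",  "call": "model-2"},
    "goal-1":  {"behavior": None,                 "trigger": None,      "call": None},
}


def lineage(object_id: str, provenance: dict) -> list[tuple]:
    """Walk trigger links from an artifact back to the root goal."""
    chain = []
    seen = set()                      # guards against accidental cycles
    current = object_id
    while current is not None and current not in seen:
        seen.add(current)
        block = provenance[current]
        chain.append((current, block["behavior"], block["call"]))
        current = block["trigger"]
    return chain


for obj, behavior, call in lineage("memo-1", PROVENANCE):
    print(f"{obj:8} <- behavior={behavior} call={call}")
```

The walk starts at the memo and ends at the goal, listing the behavior and the recorded model or tool call responsible for each step along the way.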

Forking, Structural Comparison, and Counterfactual Reasoning

Forking an agent run is operationally efficient: the shared event prefix up to the branch point is replayed using cached model and tool responses, eliminating redundant costs. Subsequent execution in the forked run diverges only from the branch point onward, allowing rapid counterfactual evaluations and diffs. A structural diff identifies which objects, relations, and patches diverged as a result of the fork, supporting systematic exploration of agent policies or configurations.
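Fork and structural diff can be sketched over plain lists. This is a toy model, not the runtime's implementation: the `(kind, payload)` event shape, the `fork` and `structural_diff` helpers, and the `fork1/` id prefix are assumptions, and reuse of cached model responses is not modeled here.

```python
def project(log):
    """Fold a log of (kind, payload) events into (objects, relations)."""
    objects, relations = {}, set()
    for kind, payload in log:
        if kind == "object":
            objects[payload["id"]] = payload["attrs"]   # later events overwrite
        elif kind == "relation":
            relations.add((payload["src"], payload["type"], payload["dst"]))
    return objects, relations


def fork(log, k):
    """New log containing the first k events; events are treated as immutable."""
    return list(log[:k])


def structural_diff(log_a, log_b):
    objs_a, rels_a = project(log_a)
    objs_b, rels_b = project(log_b)
    shared = objs_a.keys() & objs_b.keys()
    return {
        "objects_only_in_a": sorted(objs_a.keys() - objs_b.keys()),
        "objects_only_in_b": sorted(objs_b.keys() - objs_a.keys()),
        "objects_changed": sorted(i for i in shared if objs_a[i] != objs_b[i]),
        "relations_only_in_a": sorted(rels_a - rels_b),
        "relations_only_in_b": sorted(rels_b - rels_a),
    }


parent = [
    ("object", {"id": "goal-1", "attrs": {"type": "Goal"}}),
    ("object", {"id": "q-1", "attrs": {"type": "Question", "status": "open"}}),
    ("object", {"id": "claim-1", "attrs": {"type": "Claim"}}),
    ("relation", {"src": "claim-1", "type": "answers", "dst": "q-1"}),
]

# Branch after the second event and continue differently.
child = fork(parent, 2)
child.append(("object", {"id": "q-1", "attrs": {"type": "Question", "status": "reframed"}}))
child.append(("object", {"id": "fork1/risk-1", "attrs": {"type": "Risk"}}))

diff = structural_diff(parent, child)
print(diff["objects_only_in_a"])    # ['claim-1']
print(diff["objects_only_in_b"])    # ['fork1/risk-1']
print(diff["objects_changed"])      # ['q-1']
print(len(parent))                  # 4: the parent log is untouched by the fork
```

The diff compares projected graphs rather than raw logs, so it reports what ended up different (objects, relations, changed attributes) rather than which events differ.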

Frames serve as transient, intra-run contexts that share the run's log, distinct from forks, which are durable branches with independent event logs. The design supports both exploratory and persistent branching within agentic workflows.

Empirical Demonstration: Auditable Agentic Diligence

The investment-diligence pack exemplifies the operational guarantees: when run, it produces a 671-event audit log with 93 objects and 76 relations, via 103 model calls and 48 tool calls, entirely traceable and reproducible. No orchestration code sequences these operations; behaviors fire reactively as new objects appear, demonstrating decentralized control. Each memo produced is fully auditable down to the originating model prompt and response—a property uniquely afforded by comprehensive event logging.

Implications for Self-Improving Agents

The ActiveGraph substrate is particularly suited to self-improving agent designs. Self-modification, typically hazardous due to the lack of rollback or provenance, becomes easier to control when rules and behaviors are themselves logged as events. Evaluating a self-improvement becomes a matter of forking the run, applying the change, and structurally diffing the outcomes, a pattern that is hard to achieve in architectures where state is fragmented or ephemeral. The paper discusses this suitability without claiming to demonstrate it; while speculative, the architectural affordance lays a credible foundation for agentic systems capable of safe experimentation and iteration over their own policies and code.

ActiveGraph diverges fundamentally from memory-layer systems like MemGPT, Zep, and Hindsight, which treat memory as a derived, audit-partial layer feeding stateless agents. Here, memory, rules, tool calls, and reasoning are unified as projections from a single authoritative log—substantially improving provenance and replayability. The blackboard architecture, a precursor model, suffered from brittle components and manually authored control; ActiveGraph modernizes the coordination model by leveraging LLM-backed behaviors and automated, pattern-based subscriptions, transcending former scalability limits.

Limitations and Operational Challenges

Operational caveats are named explicitly: reactive substrates can diverge or loop due to cascading triggers, mitigated by runtime budgets but not statically guaranteed. Replay performance scales with log length, necessitating checkpointing or compaction in long-lived runs (not implemented). Schema evolution, external tools with side effects, and concurrent write ordering present unresolved system-level challenges. Determinism is dynamically enforced, imposing responsibility on behavior authors and surfacing violations during replay, not at design time.
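A runtime budget of the kind mentioned above can be as simple as a ceiling on the number of events processed. The sketch below is an assumption about how such a budget could work, not the paper's mechanism; `run_reactive`, `BudgetExceeded`, and the `max_events` parameter are hypothetical names.

```python
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised when a reactive cascade exceeds its event budget."""


def run_reactive(seed_events, behaviors, max_events=100):
    """Process events until quiescence, or stop when the budget is spent."""
    log = []
    pending = deque(seed_events)
    while pending:
        if len(log) >= max_events:
            raise BudgetExceeded(f"stopped after {len(log)} events")
        event = pending.popleft()
        log.append(event)
        for behavior in behaviors:          # each behavior may emit new events
            pending.extend(behavior(event))
    return log


def countdown(event):
    """Terminating behavior: emits one more tick until n reaches 0."""
    if event["kind"] == "tick" and event["n"] > 0:
        return [{"kind": "tick", "n": event["n"] - 1}]
    return []


def ping(event):
    return [{"kind": "pong"}] if event["kind"] == "ping" else []


def pong(event):
    return [{"kind": "ping"}] if event["kind"] == "pong" else []


print(len(run_reactive([{"kind": "tick", "n": 3}], [countdown])))   # 4

try:
    run_reactive([{"kind": "ping"}], [ping, pong], max_events=10)
except BudgetExceeded as exc:
    print(exc)                                                     # stopped after 10 events
```

The budget catches the ping/pong loop only at run time, after ten events have been written; nothing in this scheme proves ahead of time that a set of behaviors terminates, which matches the caveat that termination is not statically guaranteed.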

Conclusion

ActiveGraph demonstrates that a log-first architecture confers deterministic replay, cheap counterfactual forking, and exhaustive lineage in agent systems. This inversion equips agentic frameworks for compliance, audit, counterfactual evaluation, and future self-improvement—capabilities unattainable in architectures layering memory over conversational or tool-driven workflows. Future work must quantify downstream task benefits and implement the self-improvement mechanisms enabled by this substrate, but the architectural guarantees position event-sourced reactive graphs as a robust foundation for next-generation agentic AI.


Explain it Like I'm 14

What is this paper about?

This paper introduces a new way to build AI “agents” (computer programs that plan and act with LLMs). Instead of treating the agent as a chat loop with a little logging on the side, it flips the idea: the log itself is the agent. Think of it like this:

The agent keeps a careful, never-changing diary (the log) of everything it does.
A live “map” (a graph) is rebuilt from that diary at any time.
Small helpers (behaviors) watch the map for changes, react, do work (including calling an AI model), and write new entries into the diary.

By making the diary the main thing, the system gains three superpowers: you can replay any run exactly, you can branch and try “what if” without redoing past steps, and you can trace every final answer back to the exact steps and model calls that created it.

What questions is it trying to answer?

In simple terms, the paper asks:

Can we build AI agents so that every action is recorded in a single, trustworthy history?
Can we make it easy to rewind and exactly replay what happened, even when AI models are usually random?
Can we cheaply “fork” a run at any point to try a different choice without paying again for all past model calls?
Can we give full “lineage” (a traceable chain) from a big goal down to every tiny decision and source?

The goal is not to make the agent smarter at tasks, but to make it more reliable, auditable, and experiment-friendly.

How does it work? (Methods and approach)

Here are the key ideas using everyday analogies:

The diary (event log) is the source of truth

Imagine a notebook where you only ever add new lines. Each line is an event: a goal set, a rule changed, a tool used, a model asked, or a fact produced.
You never edit old lines—this makes the history trustworthy and permanent.

The map (graph) is rebuilt from the diary

From the diary, the system builds a live “map” of objects (like goals, questions, documents, claims) and the connections between them (like “derived from” or “supports”).
If you lose the map, you can rebuild it exactly by rereading the diary.

Helpers (behaviors) react instead of a boss orchestrating

There’s no single master script. Instead, little helpers subscribe to patterns on the map, like “when a new company appears, create research questions.”
When the map changes in a way a helper cares about, it acts and writes new events to the diary. Those events may trigger other helpers, and so on.
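Here is a tiny sketch of helpers reacting to each other with no master script. It is only an illustration: the `Runtime` class, the `on` decorator, and matching on a simple `type` field are made-up simplifications (the paper describes subscriptions as graph-shape patterns), and the helper names are hypothetical.

```python
class Runtime:
    def __init__(self):
        self.log = []              # the diary: events are only ever appended
        self.subscriptions = []    # (object_type, helper) pairs

    def on(self, object_type):
        """Decorator: subscribe a helper to events of one object type."""
        def register(helper):
            self.subscriptions.append((object_type, helper))
            return helper
        return register

    def emit(self, event):
        """Write an event to the diary, then let matching helpers react."""
        self.log.append(event)
        for object_type, helper in self.subscriptions:
            if event["type"] == object_type:
                for new_event in helper(event):
                    self.emit(new_event)


rt = Runtime()


@rt.on("Company")
def create_questions(event):
    name = event["name"]
    return [
        {"type": "Question", "text": f"What is {name}'s revenue growth?"},
        {"type": "Question", "text": f"Who are {name}'s main competitors?"},
    ]


@rt.on("Question")
def plan_research(event):
    return [{"type": "ResearchTask", "question": event["text"]}]


rt.emit({"type": "Company", "name": "Acme"})
print([e["type"] for e in rt.log])
# ['Company', 'Question', 'ResearchTask', 'Question', 'ResearchTask']
```

Nothing in the code says "first make questions, then plan research." The order falls out of which helper cares about which kind of new entry.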

Making replay exact (determinism)

AI models can be random. To replay exactly, the system saves the model’s request and response in the diary.
On replay, it doesn’t call the model again; it just returns the saved answer. This guarantees the same results every time.
Helpers follow a “determinism contract”: they can’t secretly use randomness or the current clock without recording it. If they do, strict replay will catch the mismatch.

Forking like a video game save file

You can “fork” at event number k: copy the first k diary entries and then continue in a new branch.
Past model calls in the shared prefix are free on the fork because their saved answers are reused from the diary.
You can run the fork forward with different settings and compare outcomes.

Seeing differences and lineage

The system can show “what changed” between two runs by comparing their maps (objects, links, edits).
Every statement (like “Revenue grew 28%”) carries a breadcrumb trail back to the exact model call, rule, and evidence that created it.

What did they find, and why is it important?

The paper is a systems/architecture contribution (it doesn’t claim better task accuracy). It shows that treating the diary as the agent gives three big benefits:

Exact replay: You can rebuild the agent’s state and steps byte-for-byte from the diary, even though models are usually random, because responses are cached.
Cheap forks: You can branch at any point and test “what if” without re-paying for earlier model calls.
Full lineage: You can trace every final output back to the goal, rules, evidence, and exact model calls.

They demonstrate this with a worked example: an “investment diligence” pack. Starting from a company name, the system generates questions, researches documents, extracts claims with evidence, finds contradictions and risks, and writes a memo. The important part: every claim in the memo is traceable to its sources and model calls through the diary.

Why does this matter? (Implications and impact)

Better trust and audit: In fields like research, compliance, or finance, you need to explain not just the answer but how you got there. This system makes that easy by design.
Faster iteration: Forking lets you try new prompts or rules without paying again for the whole history. That speeds up testing and comparison.
Safer self-improvement: If an agent modifies its own rules or prompts, those changes are recorded. You can fork before the change, run after, and compare honestly. This enables careful, testable self-improvement.
Practical trade-offs: Saving all model/tool responses takes storage; very long runs may need future features like checkpoints; and helpers must follow the determinism rules to keep replays clean.

In short, by making the diary the center of the agent, the system turns reliability (replay), exploration (forking), and explainability (lineage) from hard problems into built‑in features.


Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a focused list of concrete gaps the paper leaves unresolved and that future work could address:

Deterministic replay in distributed settings: precise semantics for concurrent or distributed writers (ordering, contention resolution, exactly-once delivery, idempotence, CRDT/transaction model) are not specified.
Long-run scalability: no checkpointing/compaction or incremental replay; replaying million-event runs from scratch is acknowledged but unaddressed (design and correctness of snapshots, partial replay, log pruning).
Schema evolution at scale: a formal versioned type system, automated migrations, and guarantees for backward/forward compatibility across heterogeneous historical events are not detailed.
Static enforcement of the determinism contract: no formal methods, static analysis, or sandboxing are provided to prevent wall-clock randomness, hidden I/O, or global-state dependence; enforcement is only via post hoc strict replay.
External tool side effects: no mechanism to make forks “honest” when tools mutate the outside world (snapshots, mocks, transactional outboxes, idempotency protocols, or side-effect simulation on fork/replay).
LLM caching robustness: stability of request canonicalization/hashing across library versions and providers (e.g., JSON key ordering, tool/function definitions, streaming outputs, tokenizer/version drift) is not analyzed; ambiguity about handling provider model updates under a constant model id.
Code, prompt, and environment provenance: how behavior code, prompt templates, tool specs, and runtime environment are pinned/versioned and linked to events (e.g., content-addressed artifacts, container/image hashes) to ensure true replayability is not specified.
Storage footprint and retention: strategies for compression, deduplication across forks, garbage collection of unreachable objects, retention policies, and cost modeling are unaddressed.
Privacy and compliance: append-only logs conflict with “right to be forgotten”; selective redaction, encryption-at-rest, access control, and multi-tenant isolation policies are not described.
Integrity and trust: no cryptographic tamper-evidence (hash chains, signatures), chain-of-custody, or verifiable lineage mechanisms beyond shared ids are provided.
Behavior scheduling semantics: ordering, fairness, priority, backpressure, and starvation avoidance for behavior firing—while preserving determinism—remain unspecified.
Loop/cascade control beyond budgets: only coarse ceilings are offered; no static cycle detection, confluence analysis, or guarantees of termination/liveness for reactive chains.
Structural diff at scale: algorithms, complexity bounds, and human-centered representations for diffs over large graphs (semantic vs. syntactic diffs, noise reduction) are not given.
Frames semantics and merge: how parallel sub-contexts deterministically converge (conflict resolution, merge policies, associativity/commutativity guarantees) is not defined.
Multi-agent operation over a shared graph: namespaces, isolation, permissions, and conflict resolution between independent agents are open.
Provenance query performance: indexing, time-travel queries, and efficient lineage reconstruction over very large logs are not evaluated.
Empirical benefits: no quantitative evidence that replay/forking/lineage improve task accuracy, cost, latency, or reliability versus baselines; no benchmarks or ablations.
Domain generality: demonstrations are limited; applicability to real-time agents, high-throughput tool use, or domains with volatile external data isn’t shown.
Self-improvement pipeline: concrete procedures for proposing, selecting, and validating changes via fork-and-diff (objective functions, safety constraints, stopping criteria, avoiding Goodharting) are not developed.
Fault tolerance and recovery: atomic event commits, crash consistency, replays after mid-flight failures, and exactly-once behavior execution semantics are unspecified.
Identifier and time handling: guarantees for deterministic id generation across forks and runs, clock/timestamp provenance, and cross-timezone reproducibility are not detailed.
Non-deterministic tools and data drift: strategies to capture and archive volatile inputs (e.g., web pages), detect and handle drift, and decide when to invalidate cached responses are missing.
Type system and validation: strong typing, constraints/invariants, schema-level validation, and rejection of ill-formed or contradictory events are not elaborated.
Performance engineering: throughput/latency under load, reactive scheduling overhead, batching, and safe parallelization of behaviors while maintaining deterministic outcomes are unevaluated.
Integration with retrieval/memory: how to project embeddings/summaries as secondary indices from the log, maintain their consistency, and support efficient RAG over evolving state remains open.
Developer tooling: interactive debugging, step-through causality, visual lineage/diff UIs, and ergonomics for authoring/testing behaviors are not discussed.
Cross-platform reproducibility: sensitivity to library/runtime differences (JSON encoding, floating-point ops, tokenizer changes), and canonicalization libraries/protocols are not specified.
Governance and collaboration: policies for who can edit/fork/merge runs, review workflows for behavior/prompt changes, and provenance of policy changes are undefined.
Cost/budget management: predictive cost models, dynamic budget adaptation, and trade-offs between replay storage vs. live recomputation are not analyzed.


Practical Applications

Immediate Applications

Below are concrete, deployable use cases that leverage ActiveGraph’s event-sourced runtime, deterministic replay, fork-and-diff, and full lineage capture. Each item names likely sectors, emergent tools/products/workflows, and key assumptions/dependencies.

Agent CI/CD and observability
- Sectors: Software/LLMOps, Platform teams, SaaS
- Tools/Products/Workflows:
  - “Flight recorder” for agents that captures every step (events, prompts, tool calls) for postmortems
  - Agent regression testing suites that fork past runs to A/B prompts, tools, or policies without re-paying model costs
  - Deterministic offline replays (fixtures + content-addressed cache) for reproducible bug reports and support tickets
  - Visual lineage explorers and structural diff viewers integrated into MLOps dashboards
- Assumptions/Dependencies:
  - Teams adopt the determinism contract (no hidden I/O/time/randomness) and wrap tools through the framework
  - Storage and retention policies accommodate cached model/tool responses
  - Existing agents are adapted to behaviors and the event log projection

Audit-ready enterprise chatbots and RPA
- Sectors: Customer support, HR, IT helpdesk, Shared services
- Tools/Products/Workflows:
  - End-to-end audit trails of chatbot and automation decisions (who/what/why/when) with strict replay proofs
  - “Policy pack” behaviors that enforce and log compliance rules as events
  - Incident-response playbooks that fork a problematic conversation at the offending event to test mitigations
- Assumptions/Dependencies:
  - Governance acceptance of storing full interaction logs (PII handling, redaction, RBAC)
  - Tool side effects are acknowledged (first execution mutates the world; replay is read-only)

Investment and commercial due diligence with provenance
- Sectors: Finance (PE/VC/public markets), Corporate strategy, Market intelligence
- Tools/Products/Workflows:
  - Deploy the provided diligence pack to produce memos where every claim links to questions, sources, and evidence
  - Counterfactual forks to compare research strategies (e.g., different question generators, document retrieval settings)
  - Compliance-ready exports where lineage is the deliverable (traceable from goal to memo)
- Assumptions/Dependencies:
  - Accept that the paper claims systems guarantees, not accuracy gains; domain validation still required
  - Source access and RAG/document tools integrated as ActiveGraph tools

Legal review with chain-of-custody style lineage
- Sectors: Legal (contract analysis, e-discovery), Compliance
- Tools/Products/Workflows:
  - Contract/risk extractors where each extracted clause, assertion, and risk maps to source passages and model calls
  - “Red-team forks” to test alternative prompts and consistency checks before filing or disclosure
- Assumptions/Dependencies:
  - Secure storage for sensitive documents; encryption and access controls
  - Human-in-the-loop validation remains essential; the runtime supplies traceability, not legal judgment

Healthcare administrative automation with audit trails
- Sectors: Healthcare admin (prior auth documentation, claims coding support), Revenue cycle management
- Tools/Products/Workflows:
  - Event-level provenance for every suggestion and tool invocation (e.g., code lookup) to support audits
  - Fork-and-diff to assess the impact of updated policies or prompt templates on documentation outcomes
- Assumptions/Dependencies:
  - Limit to non-clinical decision support initially; clinical use demands additional validation and approvals
  - PHI handling compliant with HIPAA/GDPR; strong data governance

Scientific and academic research assistants with reproducible traces
- Sectors: Academia, Industrial R&D
- Tools/Products/Workflows:
  - “LLM lab notebook” where hypotheses, evidence, and claims are objects with links and provenance
  - Byte-identical replays for peer review and teaching reproducible methods
- Assumptions/Dependencies:
  - Researchers model tasks as behaviors and accept event log artifacts as part of publication supplements
  - Citation and source licensing policy integration

LLM-infused data/ETL with end-to-end provenance
- Sectors: Data engineering, Analytics, BI
- Tools/Products/Workflows:
  - Transformation steps (classification, extraction) logged as events with source–claim links for downstream audits
  - Structural diffs between pipeline versions to explain metric changes to stakeholders
- Assumptions/Dependencies:
  - Tooling wrappers around connectors/ETL steps; cost control for large runs via fixtures/checkpoints as systems scale

Cost and carbon optimization via cached replays
- Sectors: Any org with heavy evaluation workloads
- Tools/Products/Workflows:
  - Content-addressed cache to avoid re-running identical model calls during dev, QA, and demos
- Assumptions/Dependencies:
  - Cache hygiene and deduplication; disk budget proportional to run sizes

Policy pilots for transparent AI governance
- Sectors: Public sector, Regulated industries
- Tools/Products/Workflows:
  - FOIA-ready/traceable decision records of AI-assisted processes
  - Replayable impact assessments by forking at policy-change events and comparing outcomes
- Assumptions/Dependencies:
  - Clear retention/redaction standards; stakeholder training to interpret lineage artifacts

Long-Term Applications

The following opportunities likely require additional research, scaling, or integration (e.g., concurrency models, checkpoints, sector-specific validation, regulatory approvals).

Self-improving agents with fork-and-diff evaluation loops
- Sectors: Software/LLMOps, Autonomous systems
- Tools/Products/Workflows:
  - “Agent PRs”: propose a prompt/tool/policy change, fork at the proposal point, run forward, diff outcomes, auto-merge on predefined metrics
  - Continuous optimization harnesses that evaluate many candidates without re-paying for shared history
- Assumptions/Dependencies:
  - Robust metrics and guardrails; sandboxing to avoid harmful side effects
  - Governance for accepting/reverting self-modifications

Scalable multi-agent and distributed deployments
- Sectors: Enterprise platforms, Robotics fleets, Supply chain
- Tools/Products/Workflows:
  - Shared or sharded event logs with ordering/consistency for multiple agents
  - Conflict resolution and merge semantics for graph state across distributed writers
- Assumptions/Dependencies:
  - Concurrency control protocols; eventual consistency policies; security for cross-agent data sharing

Checkpointing, compaction, and long-horizon operations
- Sectors: Any long-running agentic workload (ops, monitoring, planning)
- Tools/Products/Workflows:
  - Periodic checkpoints/snapshots and log compaction to handle million-event runs
  - Time-travel queries over compacted histories
- Assumptions/Dependencies:
  - Sound snapshot/restore semantics that preserve replay guarantees

Safety-critical autonomy and digital twins
- Sectors: Energy, Manufacturing, Transportation/Robotics
- Tools/Products/Workflows:
  - “What-if” scenario planning by forking control runs into simulated twins to test policy changes before deployment
  - Post-incident analyses with strict replay and structural diff to isolate causal factors
- Assumptions/Dependencies:
  - High-fidelity simulators; certification processes; isolation of real-world side effects from simulated forks

Regulated clinical decision support with verifiable provenance
- Sectors: Healthcare (clinical), Pharma
- Tools/Products/Workflows:
  - End-to-end lineage from patient input to recommendation, including model/tool specifics for audit and validation
- Assumptions/Dependencies:
  - Clinical trials, FDA/EMA approvals, bias and safety evidence; robust PHI protections

Finance-grade autonomous workflows with audit and rollback
- Sectors: Banking, Trading, Insurance, AML/KYC
- Tools/Products/Workflows:
  - Agentic risk assessments, model-assisted investigations, and claims adjudication with replayable histories
  - “Change impact forks” to evaluate new risk policies or model versions on historic cases
- Assumptions/Dependencies:
  - Model risk management alignment (SR 11-7-like frameworks), regulator acceptance, strong controls for side effects

Government policy design and evaluation via counterfactuals
- Sectors: Public policy, Urban planning
- Tools/Products/Workflows:
  - Forking at policy-draft events to compare alternative interventions and produce transparent, auditable impact narratives
- Assumptions/Dependencies:
  - Data availability and validation; stakeholder interpretability of lineage and diffs

Marketplaces of reusable “packs” with lineage guarantees
- Sectors: Software ecosystems, Vertical AI vendors
- Tools/Products/Workflows:
  - Domain packs (e.g., safety audits, procurement, clinical triage) published with sample fixtures and reproducible runs
  - Certification badges for packs that pass strict replay and provenance checks
- Assumptions/Dependencies:
  - Community standards for schemas, migrations, and provenance; versioning and compatibility tooling

Explainable analytics and BI with causal tracebacks
- Sectors: Enterprise analytics, Risk/Compliance
- Tools/Products/Workflows:
  - Dashboards where every metric or insight has a click-through lineage to source data, transformations, and model calls
- Assumptions/Dependencies:
  - Harmonized schemas across data sources; integration with data catalogs and governance layers

Consumer-grade personal knowledge and automation with provenance
- Sectors: Daily life, Productivity
- Tools/Products/Workflows:
  - “Personal agent journals” that show why a recommendation or plan was produced, and allow users to fork scenarios safely
- Assumptions/Dependencies:
  - Simple UX for non-experts; private-by-default storage; local or encrypted cloud caches

Cross-cutting assumptions and dependencies

Determinism contract adherence: behaviors must avoid non-deterministic reads outside the framework; violations surface in strict replay.
Storage/governance: logs include prompts, tool arguments, and outputs; plan for PII/PHI handling, encryption, retention, and access control.
Tool side effects: first execution mutates the real world; replay is deterministic but does not undo side effects—sandboxes/simulations mitigate risk.
Model/provider changes: replay relies on cached responses for exactness; forks that change prompts/models will re-execute downstream calls.
Schema evolution and migration: operational discipline and tooling needed as object/event types evolve.
Performance/scaling: very long runs need checkpoints/compaction; distributed use requires ordering and concurrency models.

These applications flow directly from the paper’s core innovations: treating the append-only event log as the source of truth, projecting a deterministic graph from it, reacting via behaviors, caching model/tool responses for exact replay, and enabling cheap, honest counterfactual forks with structural diffs and full lineage.


Glossary

append-only event log: A write-once, ordered history of events where entries are only appended and never mutated, serving as the system’s source of truth. "The append-only event log (bottom) is the source of truth."
blackboard systems: An AI architecture where independent knowledge sources coordinate by reading from and writing to a shared knowledge structure. "Blackboard systems \cite{blackboard} from the 1970s and 80s organized problem solving around a shared knowledge structure that independent 'knowledge sources' read from and wrote back to, with no direct calls between them"
command-query responsibility segregation: A design pattern that separates operations that change state (commands) from those that read state (queries). "Event sourcing and command-query responsibility segregation are established patterns in data systems"
content-addressed cache: A cache keyed by hashes of full request content, enabling exact replay of model/tool outputs without re-executing them. "including a content-addressed cache that records model and tool responses so replay performs no new model calls"
counterfactual: A hypothetical alternative scenario used to evaluate “what if” changes without re-executing shared history. "cheap counterfactual forks over computations that include nondeterministic model calls"
Cypher: A graph query language used to express structural patterns over graphs. "a graph-shape pattern expressed in a Cypher subset"
determinism contract: A set of rules requiring behaviors to be deterministic functions of their inputs during replay (e.g., no uncontrolled I/O or randomness). "a determinism contract that makes replay sound"
deterministic fold: Computing current state by deterministically folding (reducing) an event log into a projection. "graph state is a deterministic fold over an append-only event log"
event sourcing: An architectural approach where system state is derived from an immutable sequence of events rather than being stored directly. "event sourcing and CQRS, in which state is a fold over an immutable event log"
fork: A branched run that shares a parent’s event prefix up to a cutoff and then proceeds independently. "A fork branches a run at a chosen event."
frame: A lightweight, in-run sub-context for parallel branches that reconverge within the same log. "The runtime offers a lighter-weight in-run primitive, the frame, for parallel sub-contexts that converge back within a single run and share that run's log."
incremental dataflow: A computation model where derived results update reactively as inputs change. "the core of incremental dataflow and of spreadsheet engines."
lineage: The end-to-end causal trace linking goals, decisions, and artifacts back to their origins. "end-to-end lineage from a high-level goal down to the individual model call that produced each artifact."
LLM-backed behavior: A behavior whose logic is implemented by an LLM call, with requests and responses logged as events. "an LLM-backed behavior, which is what makes ActiveGraph an agent rather than a workflow engine."
monotonic id space: An identifier sequence that only increases, avoiding collisions across branches. "After the cutoff the fork has its own monotonic id space, so ids do not collide."
permissive replay: A replay mode that allows deviations when hashes don’t match, issuing fresh calls while using cache for matches. "will pass permissive replay and fail strict replay"
provenance: Metadata capturing the origin and derivation of objects, including which behavior and event created them. "carries a provenance block naming the behavior that created it"
relation-behavior: Logic attached to a specific relation (edge) type, firing when that relation is established or changed. "and a relation-behavior: logic attached to a typed edge, so that the act of relating two objects can itself carry computation."
schema evolution: The process of changing event or object type definitions over time while preserving historical data. "Schema evolution (changing an event or object type after events using the old shape are already on disk) is handled by migration tooling but remains a real operational burden."
strict replay: A replay mode that requires the live reproduction of events to exactly match the recorded log, otherwise raising a divergence error. "A green strict replay is a proof that the run is reproducible."
structural diff: A comparison of graph structures between runs to identify differences in objects, relations, and patches. "A structural diff between parent and fork then reports exactly which objects, relations, and patches differ as a consequence of the change introduced at the fork point."
typed edge: A graph relationship with an explicit type, enabling semantics and edge-attached computation. "relation-behaviors make typed edges first-class carriers of logic"
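Several of the terms above (deterministic fold, fork, structural diff) fit together in a few lines. The following is a minimal sketch under stated assumptions: the `Event` shape, the event kinds (`object_created`, `object_patched`, `relation_created`), and the function names are hypothetical and are not taken from the paper. It also does not model the fork's separate id space or provenance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int       # position in the log
    kind: str      # hypothetical event kinds, handled in project()
    payload: dict


def project(log):
    """Fold the log, in order, into a graph of objects and typed relations."""
    objects, relations = {}, set()
    for ev in log:
        p = ev.payload
        if ev.kind == "object_created":
            objects[p["id"]] = dict(p["fields"])
        elif ev.kind == "object_patched":
            objects[p["id"]].update(p["fields"])
        elif ev.kind == "relation_created":
            relations.add((p["src"], p["type"], p["dst"]))
    return objects, relations


def fork(log, cutoff_seq):
    """Share the parent's prefix up to the cutoff; nothing is re-executed."""
    return [ev for ev in log if ev.seq <= cutoff_seq]


def structural_diff(parent_log, fork_log):
    """Report which objects and relations differ between two projections."""
    p_obj, p_rel = project(parent_log)
    f_obj, f_rel = project(fork_log)
    changed = {k for k in p_obj.keys() | f_obj.keys() if p_obj.get(k) != f_obj.get(k)}
    return {"objects": sorted(changed), "relations": sorted(p_rel ^ f_rel)}


parent = [
    Event(1, "object_created", {"id": "company", "fields": {"name": "Acme"}}),
    Event(2, "object_created", {"id": "claim", "fields": {"risk": "low"}}),
    Event(3, "relation_created", {"src": "claim", "type": "ABOUT", "dst": "company"}),
    Event(4, "object_patched", {"id": "claim", "fields": {"risk": "medium"}}),
]

# Branch at event 3 and take a different decision afterwards.
branch = fork(parent, cutoff_seq=3)
branch.append(Event(4, "object_patched", {"id": "claim", "fields": {"risk": "high"}}))

diff = structural_diff(parent, branch)
print(diff)  # {'objects': ['claim'], 'relations': []}
```

Because the projection depends only on the ordered events, folding the same log twice yields the same graph, and the diff isolates exactly the change introduced after the fork point.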

