
ForesightFlow: An Information Leakage Score Framework for Prediction Markets

Published 1 May 2026 in q-fin.TR, cs.CR, and q-fin.GN | (2605.00493v1)

Abstract: ForesightFlow is an Information Leakage Score (ILS) framework for detecting informed trading on decentralized prediction markets. For an event-resolved binary market, the score quantifies the fraction of the terminal information move priced in before the public news event. Three operational scope conditions (edge effect, non-trivial total move, anchor sensitivity) are stated as preconditions for interpretation. The score admits a Murphy-decomposition reading that connects label generation to the proper-scoring-rule literature. A pilot empirical evaluation surfaces three findings. First, a resolution-anchored proxy for the public-event timestamp does not separate event-resolved markets from a matched control population (Mann-Whitney p = 1e-6, separation reversed), demonstrating that proxy quality is itself a binding constraint. Second, the article-derived timestamp on a single high-stakes case shifts the score by 0.444 in magnitude relative to the proxy and lies on the opposite side of zero. Third, an audit of the publicly documented Polymarket insider record reveals that documented cases are systematically deadline-resolved, falling outside the original ILS scope (0 of 24 FFIC inventory markets satisfied original scope conditions). This last finding motivates a deadline-ILS extension introduced in Section 7, anchored at the public-event timestamp rather than the news timestamp, and equipped with a per-category exponential hazard baseline for the time-to-event distribution. The extension closes the gap between the methodology and the population in which insider trading has been empirically documented. An end-to-end evaluation of the extension on the 2026 U.S.-Iran conflict cluster is reported in a companion paper. We release the FFIC inventory, the resolution-typology classification of the 911,237-market corpus, and all code at github.com/ForesightFlow.

Authors (1)

Summary

  • The paper introduces the Information Leakage Score (ILS), which quantifies the fraction of a market's terminal price move that was already priced in before the public news event, as a signal of potential informed trading.
  • The framework integrates microstructure diagnostics and on-chain wallet analytics to distinguish genuine signals from noise in decentralized prediction markets.
  • A pilot evaluation on Polymarket data shows that news-timestamp quality limits the original ILS and that documented insider cases are deadline-resolved, motivating a deadline variant aimed at real-time market surveillance and regulatory monitoring.

ForesightFlow: An Information-Theoretic Detection Framework for Informed Trading in Decentralized Prediction Markets

Motivation and Context

Decentralized prediction markets, exemplified by platforms like Polymarket, transform diverse participant beliefs into dynamic price signals that have significant reference and coordination value for media, institutions, and the public. The transparency and pseudonymity inherent to these on-chain markets, however, create unique opportunities for market participants with material non-public information (MNPI) to extract anomalous profits, as evidenced by multi-million-dollar events around geopolitical crises, corporate disclosures, and regulatory decisions. Yet, the bulk of extant research focuses on identification of informed activity only after market resolution, a paradigm that fails to provide actionable detection during periods of live, consequential price movement.

Information Leakage Score: Formalization and Interpretation

The central methodological innovation of this framework is the Information Leakage Score (ILS), an axiomatic, information-theoretic label generator for resolved binary outcome markets. For a resolved market $M$, ILS quantifies the proportion of the market's terminal information move that was realized before the associated public news event, with the formal definition:

$$ILS(M) = \frac{p(T_{\text{news}}) - p(T_{\text{open}})}{p(T_{\text{res}}) - p(T_{\text{open}})}$$

where $p(\cdot)$ denotes the mid-quote price of the YES outcome at the specified timepoints: market opening, first public mention of event-relevant information, and resolution. The score is only interpretable under explicit scope conditions: nontrivial total information movement ($\geq 0.05$), opening prices away from degenerate consensus, and robustness to anchor timing for the news event.
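The ILS definition above, together with two of its scope conditions, can be sketched in a few lines of Python. This is a minimal illustration rather than the released implementation: the function name and the `edge_margin` value are assumptions (the text gives a number only for the minimum total move), and the anchor-robustness condition is not checked here.

```python
from typing import Optional


def information_leakage_score(
    p_open: float,
    p_news: float,
    p_res: float,
    min_total_move: float = 0.05,  # threshold stated in the text
    edge_margin: float = 0.05,     # assumed value; the text gives no number
) -> Optional[float]:
    """Return ILS for one resolved binary market, or None if out of scope.

    All prices are mid-quotes of the YES outcome in [0, 1].
    """
    total_move = p_res - p_open
    # Scope condition: the total information move must be non-trivial.
    if abs(total_move) < min_total_move:
        return None
    # Scope condition: the opening price must not sit at a degenerate
    # consensus near 0 or 1 (edge effect).
    if p_open < edge_margin or p_open > 1.0 - edge_margin:
        return None
    # Fraction of the terminal move already priced in at the news time.
    return (p_news - p_open) / total_move


# Market opens at 0.30, trades at 0.65 when the news breaks, resolves YES.
score = information_leakage_score(0.30, 0.65, 1.0)
print(score)  # approximately 0.5: half of the move preceded the news
```

A value near 0 means the price moved mostly after the news; a value near 1 means the move was largely complete beforehand. Values below 0 or above 1 are possible when the pre-news price moved against, or overshot, the final outcome.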

Crucially, the ILS is interpreted through the lens of the Murphy decomposition of the Brier score: high ILS values correspond to resolution (discriminative power) that is front-loaded in the pre-news window, indicative of price dynamics not justified by the public information set. This connects the microstructure empirics to the proper-scoring-rule literature and enables theoretically grounded, outcome-label definition without post-hoc outcome leakage.
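For readers unfamiliar with the Murphy decomposition, the sketch below computes the standard identity Brier = reliability - resolution + uncertainty on a toy sample. It illustrates the decomposition only; how the paper maps ILS onto the resolution term is not reproduced here. The function names and the toy data are assumptions made for illustration, and forecasts are grouped by exact value, the case in which the identity holds exactly.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    # Group outcomes by the distinct forecast value that preceded them.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        hit_rate = sum(obs) / len(obs)
        # Reliability: gap between the forecast and the observed frequency.
        reliability += len(obs) * (f - hit_rate) ** 2
        # Resolution: how far the group frequency departs from the base rate.
        resolution += len(obs) * (hit_rate - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return reliability / n, resolution / n, uncertainty


forecasts = [0.2, 0.2, 0.8, 0.8, 0.8]
outcomes = [0, 0, 1, 1, 0]
rel, res, unc = murphy_decomposition(forecasts, outcomes)
# The two printed numbers agree up to floating-point rounding.
print(brier_score(forecasts, outcomes), rel - res + unc)
```

A larger resolution term means the forecasts discriminate better between outcomes, which is the sense of "discriminative power" used in the paragraph above.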

Scope Delineation and Empirical Constraints

ForesightFlow codifies a detailed resolution typology, distinguishing between event-resolved contracts (resolution at a contemporaneous, observable event), deadline-resolved contracts ("will event X occur by deadline Y?"), and structurally unclassifiable markets. Systematic empirical analysis reveals that the most salient, documented cases of Polymarket informed trading (e.g., Iran strike, Venezuela events, major regulatory outcomes) are all deadline-resolved, not event-resolved. This mismatch required an extension of the methodology: the deadline-ILS variant, in which leakage is computed versus a baseline belief trajectory (often, a constant-hazard Bayesian prior) with the label anchored to public observation of the event itself. This closes the gap between method and validated ground truth.
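To make the constant-hazard baseline concrete: if the event arrives at a constant rate, the probability that it occurs before the deadline, given that it has not yet occurred at time t, is 1 - exp(-rate * (deadline - t)). The sketch below computes that baseline and a simple price-minus-baseline gap. The exact deadline-ILS formula is not given in this summary, so the gap function is an illustrative stand-in and not the paper's score; the function names, time units, and hazard rate are all assumptions.

```python
import math


def constant_hazard_baseline(t: float, deadline: float, hazard_rate: float) -> float:
    """P(event occurs in (t, deadline] | not yet occurred), constant hazard."""
    remaining = max(deadline - t, 0.0)
    return 1.0 - math.exp(-hazard_rate * remaining)


def excess_over_baseline(price: float, t: float, deadline: float,
                         hazard_rate: float) -> float:
    """Gap between the observed YES price and the baseline belief.

    Illustrative stand-in only; not the deadline-ILS defined in the paper.
    """
    return price - constant_hazard_baseline(t, deadline, hazard_rate)


# Hypothetical market: 30 days to the deadline, assumed hazard of 0.01 per day.
for day in (0, 10, 20, 29):
    baseline = constant_hazard_baseline(day, 30.0, 0.01)
    print(day, round(baseline, 4))
```

With no new information, the baseline decays toward zero as the deadline approaches, so a YES price that rises against this drift before the event is publicly observed is the kind of move a deadline-anchored score is meant to capture.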

Significantly, the framework identifies and corrects common sources of noise and spurious signal: false positives from edge-effect regimes (opening prices near resolution), trivial-movement markets, and anchor sensitivity in timestamp recovery. Only contracts satisfying all scope and typology criteria are retained for inference.
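Anchor sensitivity can be probed by recomputing the score while shifting the news timestamp and inspecting the spread. The following is a minimal sketch under stated assumptions: the price series is a sorted list of observation times with matching mid-quotes, the price at a given time is taken as the last observation at or before it, and the set of shifts is arbitrary. None of these choices, nor the function names, come from the paper.

```python
from bisect import bisect_right


def price_at(times, prices, t):
    """Last observed price at or before t; times must be sorted ascending."""
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t precedes the first observation")
    return prices[i]


def ils_anchor_spread(times, prices, t_news, shifts):
    """Return (min, max) of ILS when the news anchor is moved by each shift."""
    p_open, p_res = prices[0], prices[-1]
    total_move = p_res - p_open
    if total_move == 0:
        raise ValueError("ILS is undefined when the total move is zero")
    scores = [
        (price_at(times, prices, t_news + s) - p_open) / total_move
        for s in shifts
    ]
    return min(scores), max(scores)


# Toy series: times in minutes since open, prices are YES mid-quotes.
times = [0, 10, 20, 30, 40]
prices = [0.30, 0.35, 0.60, 0.90, 1.00]
low, high = ils_anchor_spread(times, prices, t_news=25, shifts=[-10, 0, 10])
print(round(low, 3), round(high, 3))
```

A wide spread between the two values indicates that the label depends heavily on where the news timestamp is placed, which is the situation the anchor-sensitivity condition is meant to flag.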

Data and Label Generation Pipeline

The architecture integrates heterogeneous data sources: Polymarket's CLOB and subgraph APIs (for historical trade, volume, and wallet identification), the UMA Optimistic Oracle (for evidence URLs and authoritative event timestamps), GDELT (for cross-verified news event anchoring), and on-chain wallet metadata from Polygon. The resolution evidence hierarchy is designed to recover "true" news event timestamps, with LLM-assisted matching reserved for ambiguous cases.

Empirical validation is performed both on an extensive market backfill (over 900k contracts, with category-specific filtering) and on a curated inventory of publicly documented insider trading episodes, the ForesightFlow Insider Cases (FFIC), spanning asset classes, event types, and Polymarket functional regimes.

Machine-Detectable Microstructure and On-Chain Wallet Signatures

The ForesightFlow detector incorporates a composite of real-time, causal features:

  • Classical market microstructure diagnostics, adapted to the discrete-outcome, bounded-price regime:
    • Order imbalance at multiple time resolutions.
    • VPIN-style toxicity (bucketed volume imbalance).
    • Rolling estimation of Kyle's lambda (price impact slope).
    • Variance ratio and two-sidedness (borrowed from the Signal Credibility Index) to distinguish durable, directional repricing from ephemeral or contested moves.
    • Trade-size kurtosis and Hawkes process-based self-excitation for time-clustering anomalies.
  • Wallet-level features exploiting pseudonymous on-chain identity:
    • Wallet-novelty score, aggregating wallet age, prior market participation, concentration of funding sources, and entry timing.
    • Cross-market wallet analysis for abnormal patterns of coordinated positioning.

The resulting feature vector accommodates both immediate microstructure distortion and identity-dependent forensics, supporting robust, interpretable detection without reliance on post-hoc profitability signals.
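Three of the microstructure features listed above can be sketched with the standard library alone. These are simplified textbook forms, not the framework's implementation: trades are assumed to arrive already signed (positive for buys, negative for sells), the VPIN-style measure averages over all completed buckets rather than a rolling window, and Kyle's lambda is estimated as the least-squares slope of price changes on signed volume. All function names and sample values are hypothetical.

```python
def order_imbalance(signed_volumes):
    """(buy volume - sell volume) / total volume, in [-1, 1]."""
    buys = sum(v for v in signed_volumes if v > 0)
    sells = -sum(v for v in signed_volumes if v < 0)
    total = buys + sells
    return (buys - sells) / total if total > 0 else None


def vpin_style_toxicity(signed_volumes, bucket_volume):
    """Mean absolute buy/sell imbalance over equal-volume buckets."""
    imbalances = []
    buy = sell = filled = 0.0
    for signed in signed_volumes:
        remaining = abs(signed)
        while remaining > 0:
            # Fill the current bucket, splitting the trade if it overflows.
            take = min(remaining, bucket_volume - filled)
            if signed > 0:
                buy += take
            else:
                sell += take
            filled += take
            remaining -= take
            if filled >= bucket_volume - 1e-12:
                imbalances.append(abs(buy - sell) / bucket_volume)
                buy = sell = filled = 0.0
    return sum(imbalances) / len(imbalances) if imbalances else None


def kyle_lambda(signed_volumes, price_changes):
    """Least-squares slope of price change on signed volume."""
    n = len(signed_volumes)
    mean_x = sum(signed_volumes) / n
    mean_y = sum(price_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in signed_volumes)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(signed_volumes, price_changes))
    return sxy / sxx


flow = [60, -40, 100, -100]                 # signed trade sizes
moves = [0.006, -0.004, 0.010, -0.010]      # price change per trade
print(order_imbalance(flow), vpin_style_toxicity(flow, 100), kyle_lambda(flow, moves))
```

All three functions use only data available up to the current trade, in keeping with the causal-feature requirement; in a live setting they would be recomputed over a trailing window.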

Empirical Findings and Methodological Implications

An initial pilot study, applying the ILS to event-resolved markets with proxy news timestamps, showed that ILS alone does not separate informed flow from the null population, primarily because of misalignment between proxy and true event timing and spurious signal from edge-effect markets. A high-resolution, article-anchored case study (Epstein-files market) revealed that even robust ILS signals can be consistent with non-insider price dynamics, underscoring the necessity of joint evaluation with wallet-level features.

The FFIC audit established that meaningful empirical power for detecting informed trading flows is only achievable for deadline-resolved markets, explicitly motivating the deadline-ILS formalism. The empirical companion paper (cited as [nechepurenko2026foresightflow_empirical]) details per-category hazard model estimation, case-wise deadline-ILS ($ILS^{\text{dl}}$) application, and cross-case consistency checks.

Limitations and Practical Impact

The framework is candid regarding both theoretical and operational limitations:

  • The proper-scoring-rule connection holds only under regularity assumptions on market calibration.
  • Label assignment is sensitive to news-timestamp recovery quality; some high-volume Polymarket markets remain outside indexable scope due to subgraph infrastructure limits.
  • The small number of publicly documented, unambiguously labeled insider cases restricts fine-grained detector performance claims, but longitudinal production monitoring accumulates power over time.
  • Current analysis is Polymarket-specific; cross-platform generalization (e.g., Kalshi) demands further harmonization and is not undertaken in this stage.

Theoretical and Practical Implications for Market Surveillance and Market Microstructure

ForesightFlow advances both the measurement and the operational detection of informed trading in decentralized, pseudonymous, and globally accessible markets. The ILS bridges market microstructure and proper-scoring-rule approaches, enabling the quantification of anticipatory information incorporation with explicit time anchoring and category adaptation. The fusion of on-chain wallet analytics and microstructure features provides an empirical foundation for market surveillance, regulatory assessment, and future market design.

Practically, the system provides actionable, real-time probability outputs for active markets, facilitating both transparency for the trading public and integrity monitoring for operators and regulators. Release of the entire codebase, typology, case inventories, and empirical artifacts as open infrastructure supports community-wide reproducibility and operationalization.

Conclusion

The ForesightFlow framework establishes a rigorous, scope- and category-aware methodology for detecting informed flow in decentralized prediction markets. By integrating an information-theoretic label formalized via the Murphy decomposition, category-conditional typologies, microstructure and wallet features, and multi-source timestamp anchoring, ForesightFlow pushes the frontier of real-time market forensics. The framework's methodological discipline (mandating causal features, robust labeling, and scope-aware interpretation) ensures that future empirical and practical enhancements will rest on replicable, interpretable, and actionable foundations for the trustworthy operation of prediction markets as public information infrastructure.

All system artifacts, code, and case inventories are publicly available at https://github.com/ForesightFlow and https://foresightflow.xyz.
