polymarket-ils-corpus-v1/v1.0 · CC-BY-4.0

Polymarket ILS Corpus

Population-scale Information Leakage Score (ILS) computations for 4,801 resolved Polymarket markets. For 99.9% of records (4,796 markets) the anchor is the proxy t_resolve − 24h; only 5 markets carry a genuinely event-anchored timestamp. Includes multi-window ILS variants (30min, 2h, 6h, 24h, 7d), top-wallet concentration (HHI), and scope-condition flags. For event-anchored analysis on a smaller, deeper sample, use polymarket-deadline-ils-v3.


Anchor note. 99.9% of records use t_news = t_resolve − 24h as the anchor (a structural proxy, not a recovered public-event timestamp). Positive ILS values may reflect resolution convergence as well as informed pre-event positioning. For event-anchored measurement on 88 markets with LLM-recovered T_event and bootstrap CIs, use polymarket-deadline-ils-v3. The anchor_type column ("event" or "proxy_24h") labels every record.
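The proxy anchor described above can be illustrated in a few lines of Python. This is a minimal sketch, not the corpus pipeline: the function names `proxy_anchor` and `price_before`, the representation of a price history as a time-sorted list of (timestamp, price) pairs, and the reading of p(t_news⁻) as "last price strictly before the anchor" are all assumptions made for illustration. The timestamps and prices are made up.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone


def proxy_anchor(t_resolve: datetime) -> datetime:
    """Structural proxy anchor: 24 hours before resolution."""
    return t_resolve - timedelta(hours=24)


def price_before(history, t_anchor):
    """Return the last price observed strictly before t_anchor, or None.

    `history` is assumed to be a list of (timestamp, price) tuples
    sorted by timestamp (a hypothetical layout, not the corpus schema).
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_left(timestamps, t_anchor)  # first index with ts >= t_anchor
    return history[idx - 1][1] if idx > 0 else None


# Made-up example market
t_resolve = datetime(2025, 3, 10, 12, 0, tzinfo=timezone.utc)
history = [
    (datetime(2025, 3, 1, 12, 0, tzinfo=timezone.utc), 0.30),
    (datetime(2025, 3, 9, 11, 0, tzinfo=timezone.utc), 0.65),
    (datetime(2025, 3, 9, 12, 0, tzinfo=timezone.utc), 0.70),  # exactly at the anchor, so excluded
]

t_news = proxy_anchor(t_resolve)
print(t_news.isoformat(), price_before(history, t_news))
```

Because the anchor is derived mechanically from t_resolve, nothing in this calculation knows when news actually became public, which is why the proxy records should be read as a screening signal.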

Corpus summary

| Metric | Value |
|---|---|
| Total markets | 4,801 |
| anchor_type = proxy_24h | 4,796 (99.9%) |
| anchor_type = event | 5 (0.1%) |
| Clean-scope markets (scope_all_pass) | 2,548 (53.1%) |
| HHI coverage | 1,391 (29.0%) |
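The HHI coverage figure counts markets that carry a top-wallet concentration value. This card does not spell out how that value is computed, so the following is only a sketch of the standard Herfindahl-Hirschman index (sum of squared shares). The function name `hhi` and the choice of per-wallet traded volume as the input are assumptions.

```python
def hhi(volumes):
    """Herfindahl-Hirschman index: sum of squared shares, in (0, 1].

    `volumes` is assumed to hold one non-negative traded volume per wallet.
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("HHI is undefined when total volume is not positive")
    return sum((v / total) ** 2 for v in volumes)


# Made-up wallets: one dominant trader versus four equal traders
concentrated = hhi([50, 30, 20])
even = hhi([25, 25, 25, 25])
print(round(concentrated, 2), round(even, 2))
```

Higher values indicate that a small number of wallets account for most of the volume; with n equal wallets the index is 1/n.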

Category breakdown

| Category | n |
|---|---|
| regulatory_decision | 3,443 |
| military_geopolitics | 902 |
| esports | 273 |
| corporate_disclosure | 183 |

ILS definition

ILS(M) = (p(t_news⁻) − p_open) / (p_resolve − p_open)

For 99.9% of records, t_news = t_resolve − 24h. In that case the score measures what fraction of the total open-to-resolution price move (p_resolve − p_open) had already occurred 24 hours before settlement.
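As a worked illustration of the definition, the ratio can be computed directly from three prices. This is a minimal sketch: the function name `ils` and the argument name `p_pre` (standing in for p(t_news⁻)) are chosen here for illustration, and the prices are made up.

```python
def ils(p_open: float, p_pre: float, p_resolve: float) -> float:
    """ILS = (p(t_news-) - p_open) / (p_resolve - p_open)."""
    delta_total = p_resolve - p_open
    if delta_total == 0:
        raise ValueError("ILS is undefined when p_resolve equals p_open")
    return (p_pre - p_open) / delta_total


# Made-up market: opens at 0.30, trades at 0.65 just before the anchor,
# and resolves YES (price 1.0).
score = ils(p_open=0.30, p_pre=0.65, p_resolve=1.0)
print(round(score, 3))  # 0.5

# Same market, but the pre-anchor price drifted away from the outcome.
wrong_way = ils(p_open=0.30, p_pre=0.20, p_resolve=1.0)
print(round(wrong_way, 3))  # -0.143
```

A value near 1 means most of the eventual move was already priced in before the anchor; a negative value means the price had moved away from the final outcome.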

Scope conditions

Two conditions are evaluated per record; records failing either are retained with scope_all_pass = false:

  1. Non-trivial move: |delta_total| ≥ ε (denominator well-conditioned)
  2. Edge effect: |p_open − 0.5| ≤ 0.4 (substantive uncertainty at opening)

53.1% of markets pass both conditions. The clean_scope_subset.jsonl file (2,548 rows) is the recommended analysis population.
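The two checks can be sketched as a small function. This is illustrative only: the value of ε is not stated in this card, so `eps=0.05` below is a placeholder assumption, delta_total is taken to be p_resolve − p_open (the ILS denominator), and the function and key names other than `scope_all_pass` are hypothetical.

```python
def scope_flags(p_open: float, p_resolve: float, eps: float = 0.05) -> dict:
    """Evaluate both scope conditions for one market.

    eps is a placeholder; the corpus's actual threshold is not given here.
    """
    delta_total = p_resolve - p_open
    non_trivial_move = abs(delta_total) >= eps   # condition 1
    edge_effect_ok = abs(p_open - 0.5) <= 0.4    # condition 2
    return {
        "non_trivial_move": non_trivial_move,
        "edge_effect_ok": edge_effect_ok,
        "scope_all_pass": non_trivial_move and edge_effect_ok,
    }


# Made-up markets
print(scope_flags(p_open=0.30, p_resolve=1.0))   # passes both conditions
print(scope_flags(p_open=0.95, p_resolve=1.0))   # opens too close to 1
print(scope_flags(p_open=0.50, p_resolve=0.51))  # move smaller than eps
```

Failing records are flagged rather than dropped, mirroring the corpus convention of retaining them with scope_all_pass = false.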

Files

| File | Description |
|---|---|
| data/ils_corpus_v1.parquet | Full corpus, Parquet (Snappy, ~950 KB) |
| data/ils_corpus_v1.jsonl.gz | Full corpus, JSONL gzipped (~680 KB) |
| data/clean_scope_subset.jsonl | 2,548 scope-passing markets |
| data/event_anchored_subset.jsonl | 5 genuinely event-anchored markets |
| data/scope_failure_breakdown.csv | Per-condition failure counts |
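The JSONL files can also be read without pandas. The helper below is a minimal standard-library sketch; `read_jsonl` is a hypothetical name, and it assumes one JSON object per line, with gzip compression inferred from a `.gz` suffix.

```python
import gzip
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSONL file (gzip-compressed if it ends in .gz) into a list of dicts."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        # Skip blank lines so a trailing newline does not break parsing.
        return [json.loads(line) for line in fh if line.strip()]


# Runs only if the corpus has been downloaded to the paths listed above.
corpus_path = Path("data/ils_corpus_v1.jsonl.gz")
if corpus_path.exists():
    records = read_jsonl(corpus_path)
    print(f"Loaded {len(records)} records")
```

The record count printed for the full corpus should match the total in the corpus summary; the same helper works for the uncompressed subset files.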

Quick start

import pandas as pd

df = pd.read_parquet("data/ils_corpus_v1.parquet")

# Recommended analysis population
clean = df[df["scope_all_pass"]]
print(f"Clean-scope markets: {len(clean)}")  # 2,548

# Event-anchored records only
event = df[df["anchor_type"] == "event"]
print(f"Event-anchored: {len(event)}")  # 5

print(clean.groupby("fflow_category")["ils"].describe())

Citation

@misc{nechepurenko2026ils-corpus-dataset,
  title     = {Polymarket ILS Corpus},
  author    = {Nechepurenko, Maksym},
  year      = {2026},
  publisher = {ForesightFlow / Devnull FZCO},
  url       = {https://github.com/ForesightFlow/datasets/tree/main/polymarket-ils-corpus},
  note      = {Version 1.0, CC-BY-4.0. Accompanies: Information Leakage at Population Scale (arXiv:2605.00459) and Per-Market Information Leakage and Order-Flow Skill (arXiv:2605.02287)}
}