Polymarket ILS Corpus
Population-scale Information Leakage Score (ILS) computations for 4,801 resolved Polymarket markets. For 99.9% of records (4,796 markets) the anchor is the proxy t_resolve − 24h; 5 markets carry a genuinely event-anchored timestamp. Includes multi-window ILS variants (30min, 2h, 6h, 24h, 7d), top-wallet concentration (HHI), and scope-condition flags. For event-anchored analysis on a smaller, deeper sample, use polymarket-deadline-ils-v3.
Anchor note. 99.9% of records use t_news = t_resolve − 24h as the anchor (a structural proxy, not a recovered public-event timestamp). Positive ILS values may reflect resolution convergence as well as informed pre-event positioning. For event-anchored measurement on 88 markets with LLM-recovered T_event and bootstrap CIs, use polymarket-deadline-ils-v3. The anchor_type column ("event" or "proxy_24h") labels every record.
Corpus summary
| Metric | Value |
|---|---|
| Total markets | 4,801 |
| anchor_type = proxy_24h | 4,796 (99.9%) |
| anchor_type = event | 5 (0.1%) |
| Clean-scope markets (scope_all_pass) | 2,548 (53.1%) |
| HHI coverage | 1,391 (29.0%) |
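For readers unfamiliar with HHI, the standard Herfindahl-Hirschman index is the sum of squared shares. The sketch below computes it from per-wallet volumes. How the corpus itself defines wallet shares (volume versus trade count, any top-N cutoff) is not stated here, so the function name `hhi` and its volume-based input are illustrative assumptions.

```python
def hhi(volumes):
    """Standard Herfindahl-Hirschman index: sum of squared shares.

    `volumes` is a hypothetical list of per-wallet traded volumes; the
    corpus may define wallet shares differently.
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum((v / total) ** 2 for v in volumes)


# One dominant wallet gives a value near 1; an even split over n wallets gives 1/n.
concentrated = hhi([900, 50, 50])
even = hhi([25, 25, 25, 25])
print(round(concentrated, 3), round(even, 3))
```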
Category breakdown
| Category | n |
|---|---|
| regulatory_decision | 3,443 |
| military_geopolitics | 902 |
| esports | 273 |
| corporate_disclosure | 183 |
ILS definition
ILS(M) = (p(t_news⁻) − p_open) / (p_resolve − p_open)
For 99.9% of records, t_news = t_resolve − 24h. In that case the score measures the fraction of the total open-to-resolution price move that had already occurred 24 hours before settlement.
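As a worked illustration of the formula, here is a minimal sketch that computes ILS from three prices. The function name `ils` and the example prices are hypothetical and not part of the corpus tooling.

```python
def ils(p_open: float, p_pre_news: float, p_resolve: float) -> float:
    """ILS(M) = (p(t_news-) - p_open) / (p_resolve - p_open)."""
    delta_total = p_resolve - p_open
    if delta_total == 0:
        # The ratio is undefined when the market never moved overall.
        raise ValueError("ILS is undefined when p_resolve equals p_open")
    return (p_pre_news - p_open) / delta_total


# Hypothetical market: opens at 0.40, trades at 0.70 just before t_news,
# and resolves YES (price 1.0). Half of the total move came before t_news.
example = ils(p_open=0.40, p_pre_news=0.70, p_resolve=1.0)
print(round(example, 3))  # 0.5
```

A negative value means the price moved away from the eventual outcome before t_news, and a value above 1 means it overshot.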
Scope conditions
Two conditions are evaluated per record; records failing either are retained with scope_all_pass = false:
- Non-trivial move: |delta_total| ≥ ε (denominator well-conditioned)
- Edge effect: |p_open − 0.5| ≤ 0.4 (substantive uncertainty at opening)
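The two checks can be sketched as follows. This is an illustration under two assumptions: delta_total is taken to be the ILS denominator p_resolve − p_open, and the threshold `EPSILON = 0.05` is a placeholder, since the value of ε is not given here. The function and key names are hypothetical apart from scope_all_pass.

```python
EPSILON = 0.05  # assumed placeholder; the actual value of ε is not stated


def scope_flags(p_open: float, p_resolve: float, epsilon: float = EPSILON) -> dict:
    """Evaluate both scope conditions for one market."""
    # Assumption: delta_total is the ILS denominator, p_resolve - p_open.
    delta_total = p_resolve - p_open
    non_trivial_move = abs(delta_total) >= epsilon
    edge_effect_ok = abs(p_open - 0.5) <= 0.4
    return {
        "non_trivial_move": non_trivial_move,
        "edge_effect_ok": edge_effect_ok,
        "scope_all_pass": non_trivial_move and edge_effect_ok,
    }


print(scope_flags(p_open=0.50, p_resolve=1.0))   # passes both conditions
print(scope_flags(p_open=0.95, p_resolve=1.0))   # opening price too close to 1
print(scope_flags(p_open=0.50, p_resolve=0.51))  # total move smaller than epsilon
```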
53.1% of markets pass both conditions. The clean_scope_subset.jsonl file (2,548 rows) is the recommended analysis population.
Files
| File | Description |
|---|---|
| data/ils_corpus_v1.parquet | Full corpus, Parquet (Snappy, ~950 KB) |
| data/ils_corpus_v1.jsonl.gz | Full corpus, JSONL gzipped (~680 KB) |
| data/clean_scope_subset.jsonl | 2,548 scope-passing markets |
| data/event_anchored_subset.jsonl | 5 genuinely event-anchored markets |
| data/scope_failure_breakdown.csv | Per-condition failure counts |
Quick start
import pandas as pd
df = pd.read_parquet("data/ils_corpus_v1.parquet")
# Recommended analysis population
clean = df[df["scope_all_pass"]]
print(f"Clean-scope markets: {len(clean)}") # 2,548
# Event-anchored records only
event = df[df["anchor_type"] == "event"]
print(f"Event-anchored: {len(event)}") # 5
# Per-category summary statistics of ILS on the clean-scope population
print(clean.groupby("fflow_category")["ils"].describe())
Citation
@misc{nechepurenko2026ils-corpus-dataset,
title = {Polymarket ILS Corpus},
author = {Nechepurenko, Maksym},
year = {2026},
publisher = {ForesightFlow / Devnull FZCO},
url = {https://github.com/ForesightFlow/datasets/tree/main/polymarket-ils-corpus},
note = {Version 1.0, CC-BY-4.0. Accompanies: Information Leakage at Population Scale (arXiv:2605.00459) and Per-Market Information Leakage and Order-Flow Skill (arXiv:2605.02287)}
}