Polymarket Stylized Facts Dataset
Per-market stylized-fact measurements (SF1–SF9) for 13,314 resolved Polymarket binary-event markets over 2026-04-21 to 2026-04-27. Empirical foundation for Paper 1 of the Event-Linked Perpetuals research programme.
This dataset provides nine stylized-fact measurements (SF1–SF9) computed on a stratified sample of 13,314 Polymarket binary-event markets that resolved between 2026-04-21 and 2026-04-27 UTC. It is released as the empirical foundation for Paper 1 of the four-paper Event-Linked Perpetuals research programme and as a shared baseline for prediction-market microstructure research.
Corpus
| Attribute | Value |
|---|---|
| Total markets | 13,314 |
| Source archive | PMXT v2, 168 files |
| Date range (resolved_at) | 2026-04-21 to 2026-04-27 UTC |
| Subsample rule | Stratified-by-day, seed 20260505 |
| Snapshot cutoff | 2026-04-27T23:59:59Z |
| Format | Parquet (primary), JSON (aggregates) |
Note: The companion paper (Paper 1) reports 13,298 markets; the released dataset contains 13,314 because a more complete UMA oracle cache was available at build time (2026-05-07). Stylized-fact headline values are unaffected. See CHANGELOG.md.
Counts by event class
| Event class | Count | Share of total | Share of three classes |
|---|---|---|---|
| sports | 6,800 | 51.1% | 77.9% |
| other | 4,584 | 34.4% | (excluded) |
| crypto | 1,518 | 11.4% | 17.4% |
| politics | 412 | 3.1% | 4.7% |
| Total | 13,314 | 100% | — |
Sports account for 77.9% of the three-class total (sports, crypto, and politics; other is excluded). This exceeds the pre-registered 70% cap, so the threshold consequence rule applies to the analysis sample.
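The shares in the table can be reproduced from the counts alone. The following is a minimal sketch, not the programme's own check: it assumes the cap is applied to the largest class's share of the three named classes, and the names `CAP` and `rule_triggered` are illustrative.

```python
# Counts copied from the "Counts by event class" table.
counts = {"sports": 6800, "other": 4584, "crypto": 1518, "politics": 412}

# Assumption: the 70% cap applies to the largest class's share of the
# three named classes, with "other" left out of the denominator.
CAP = 0.70

three_class = {name: n for name, n in counts.items() if name != "other"}
three_class_total = sum(three_class.values())
shares = {name: n / three_class_total for name, n in three_class.items()}

largest_class = max(shares, key=shares.get)
rule_triggered = shares[largest_class] > CAP

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print("largest class:", largest_class, "| rule triggered:", rule_triggered)
```

Dropping `other` from the denominator is what moves the sports share from 51.1% of all markets to 77.9% of the three-class total.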
Stylized facts headline values
| Stylized fact | Headline value | Pre-registered floor | Passed |
|---|---|---|---|
| SF1 boundary depth asymmetry ρ (base) | 1.72 | ≥ 1.5 | ✓ |
| SF1 boundary depth asymmetry ρ (resume) | 1.65 | ≥ 1.5 | ✓ |
| SF2 terminal jump magnitude \|Δ\| | 0.50 | ≥ 0.10 | ✓ |
| SF3 news vs control basis | 0.0132 / 0.0367 | descriptive | — |
| SF4 mid half-spread | 0.27 | descriptive | — |
| SF8 crypto surge factor | 24.62× | descriptive | — |
| SF8 politics surge factor | 0.68× | descriptive | — |
| SF9 12h-3h → 3h-1h depth ratio | 4.91 | descriptive | — |
Files
| File | Format | Description |
|---|---|---|
| data/markets-stylized-facts-v1.parquet | Parquet | One row per market; per-market SF1, SF2, SF4 columns |
| data/aggregates.json | JSON | Pooled and per-class aggregate values for all nine SFs |
| data/sf7-class-hour-v1.parquet | Parquet | SF7 hourly activity: 96 rows (4 classes × 24 hours) |
| data/sf9-bucket-aggregate-v1.parquet | Parquet | SF9 depth by time-to-resolution bucket: 5 rows |
Schema: markets-stylized-facts-v1.parquet
One row per market. SF1, SF2, and SF4 carry per-market columns; SF3 and SF5–SF9 are aggregate-only and are found in aggregates.json.
| Field | Type | Description |
|---|---|---|
| market_id | string | Polymarket condition ID, lowercase 0x-prefixed hex |
| question | string | Market question text |
| event_class | string | One of sports, politics, crypto, other |
| tags | list[string] | Polymarket tag names |
| created_at | string | ISO 8601 UTC |
| closed_at | string \| null | ISO 8601 UTC; null if not reported |
| resolved_at | string | ISO 8601 UTC (UMA OO settlement) |
| resolution_outcome | int8 | 0 (NO) or 1 (YES) |
| volume_total_usdc | float64 \| null | Cumulative trading volume |
| is_negrisk_member | bool | Whether part of a Polymarket negRisk group |
| negrisk_group_id | string \| null | Group identifier if applicable |
| sf_pass | string | CC-004 pass: resume or none |
| sf1_rho | float64 \| null | Boundary depth asymmetry ratio; null if no boundary observations (88% of markets) |
| sf2_terminal_jump_magnitude | float64 \| null | \|Δ index\| over [restime − 1h, restime]; null for 23% illiquidity cohort |
| sf4_half_spread_boundary_low | float64 \| null | Median half-spread when index < 0.10; null in v1 (not per-market in CC-004) |
| sf4_half_spread_low | float64 \| null | Median half-spread, index in [0.10, 0.30) |
| sf4_half_spread_mid | float64 \| null | Median half-spread, index in [0.30, 0.70] |
| sf4_half_spread_high | float64 \| null | Median half-spread, index in (0.70, 0.90] |
| sf4_half_spread_boundary_high | float64 \| null | Median half-spread when index > 0.90; null in v1 (not per-market in CC-004) |
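The five SF4 columns partition the index range, and the open and closed interval edges are easy to misread. As an illustration only, the helper below maps an index level to the column whose interval covers it. The function name `half_spread_bucket` is hypothetical, and the sketch assumes the index lies in [0, 1].

```python
def half_spread_bucket(index: float) -> str:
    """Return the SF4 column whose interval contains `index`.

    Interval edges follow the schema table: <0.10, [0.10, 0.30),
    [0.30, 0.70], (0.70, 0.90], >0.90.
    Assumption: the index is a level in [0, 1].
    """
    if not 0.0 <= index <= 1.0:
        raise ValueError(f"index must lie in [0, 1], got {index!r}")
    if index < 0.10:
        return "sf4_half_spread_boundary_low"
    if index < 0.30:
        return "sf4_half_spread_low"
    if index <= 0.70:
        return "sf4_half_spread_mid"
    if index <= 0.90:
        return "sf4_half_spread_high"
    return "sf4_half_spread_boundary_high"


# Edge values land on the closed side of each interval.
for level in (0.05, 0.10, 0.30, 0.70, 0.90, 0.95):
    print(level, "->", half_spread_bucket(level))
```

Both 0.30 and 0.70 fall in the mid bucket because that interval is closed at both ends, while 0.10 belongs to the low bucket and 0.90 to the high bucket.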
Quick start
import pandas as pd
df = pd.read_parquet("data/markets-stylized-facts-v1.parquet")
# Median terminal jump magnitude by event class (SF2)
jump_by_class = (
    df[df["sf2_terminal_jump_magnitude"].notna()]
    .groupby("event_class")["sf2_terminal_jump_magnitude"]
    .median()
    .sort_values(ascending=False)
)
print(jump_by_class)
# crypto 0.9995
# sports 0.5000
# politics 0.5000
# other 0.5000
# Median boundary depth asymmetry (SF1) by class
rho_by_class = (
    df[df["sf1_rho"].notna()]
    .groupby("event_class")["sf1_rho"]
    .median()
)
print(rho_by_class)
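Several per-market columns are mostly null (the schema quotes 88% for sf1_rho and 23% for sf2_terminal_jump_magnitude), so it can help to check coverage before aggregating. The sketch below is one way to do that; `null_share` is a hypothetical helper and the demo frame holds made-up values, not rows from the dataset.

```python
import pandas as pd


def null_share(frame: pd.DataFrame, columns: list[str]) -> pd.Series:
    """Fraction of rows that are null in each of the listed columns."""
    return frame[columns].isna().mean()


# Synthetic stand-in for the real file; values are invented for illustration.
demo = pd.DataFrame(
    {
        "sf1_rho": [1.7, None, None, None],
        "sf2_terminal_jump_magnitude": [0.5, 0.9995, None, 0.5],
    }
)
demo_shares = null_share(demo, ["sf1_rho", "sf2_terminal_jump_magnitude"])
print(demo_shares)
```

On the released file, the same call would be `null_share(df, ["sf1_rho", "sf2_terminal_jump_magnitude"])` with `df` loaded as above.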
import duckdb
con = duckdb.connect()
# SF9: depth growth approaching resolution
sf9 = con.execute("""
    SELECT bucket, bucket_lower_h, bucket_upper_h,
           pooled_median_depth_within_200bps_usdc,
           pooled_n_market_observations
    FROM read_parquet('data/sf9-bucket-aggregate-v1.parquet')
    ORDER BY bucket_lower_h DESC
""").df()
print(sf9)
Citation
@dataset{pmxt2026,
author = {Nechepurenko, Maksym},
title = {{Polymarket Stylized Facts Dataset}, v1},
year = {2026},
publisher = {GitHub},
url = {https://github.com/ForesightFlow/datasets/tree/main/pmxt-stylized-facts-v1},
note = {Tag: pmxt-stylized-facts-v1. Snapshot cutoff: 2026-04-27.}
}