ForesightFlow
pmxt-stylized-facts-v1/v1.0 · CC-BY-4.0

Polymarket Stylized Facts Dataset

Stylized-fact measurements (SF1–SF9) for 13,314 resolved Polymarket binary-event markets over 2026-04-21 to 2026-04-27, with SF1, SF2, and SF4 reported per market. Empirical foundation for Paper 1 of the Event-Linked Perpetuals research programme.

This dataset provides nine stylized-fact measurements (SF1–SF9) computed on a stratified sample of 13,314 Polymarket binary-event markets that resolved during 2026-04-21 to 2026-04-27 UTC. It is released as the empirical foundation for Paper 1 of the four-paper Event-Linked Perpetuals research programme and as a shared baseline for prediction-market microstructure research.

Corpus

| Attribute | Value |
|---|---|
| Total markets | 13,314 |
| Source archive | PMXT v2, 168 files |
| Date range (resolved_at) | 2026-04-21 to 2026-04-27 UTC |
| Subsample rule | Stratified-by-day, seed 20260505 |
| Snapshot cutoff | 2026-04-27T23:59:59Z |
| Format | Parquet (primary), JSON (aggregates) |

Note: The companion paper (Paper 1) reports 13,298 markets; the released dataset contains 13,314 due to a more complete UMA oracle cache available at build time (2026-05-07). Stylized-fact headline values are unaffected. See CHANGELOG.md.

Counts by event class

| Event class | Count | Share of total | Share of three classes |
|---|---|---|---|
| sports | 6,800 | 51.1% | 77.9% |
| other | 4,584 | 34.4% | (excluded) |
| crypto | 1,518 | 11.4% | 17.4% |
| politics | 412 | 3.1% | 4.7% |
| Total | 13,314 | 100% | |

Sports account for 77.9% of the three-class total (sports, crypto, and politics; other is excluded). This exceeds the pre-registered 70% cap, so the pre-registered consequence rule applies to the analysis sample.
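The shares in the table and the threshold check can be reproduced from the raw counts. The following is a minimal sketch in plain Python; the names `class_shares` and `SPORTS_CAP` are illustrative and not part of the dataset, and the 70% cap is taken from the sentence above.

```python
# Counts copied from the "Counts by event class" table.
COUNTS = {"sports": 6800, "other": 4584, "crypto": 1518, "politics": 412}
SPORTS_CAP = 0.70  # assumed reading of the pre-registered cap (three-class basis)


def class_shares(counts, excluded=("other",)):
    """Return {class: (share_of_total, share_of_non_excluded_or_None)}."""
    total = sum(counts.values())
    included_total = sum(n for cls, n in counts.items() if cls not in excluded)
    shares = {}
    for cls, n in counts.items():
        within = None if cls in excluded else n / included_total
        shares[cls] = (n / total, within)
    return shares


shares = class_shares(COUNTS)
for cls, (of_total, of_three) in shares.items():
    three = "(excluded)" if of_three is None else f"{of_three:.1%}"
    print(f"{cls:<9}{of_total:>7.1%}  {three}")

# The rule is triggered when the sports share of the three classes exceeds the cap.
sports_three_class_share = shares["sports"][1]
rule_triggered = sports_three_class_share > SPORTS_CAP
print("consequence rule triggered:", rule_triggered)
```

To run the same check against the released file, the counts can be taken from `df["event_class"].value_counts().to_dict()` after loading the Parquet file with pandas.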

Stylized facts headline values

| Stylized fact | Headline value | Pre-registered floor | Passed |
|---|---|---|---|
| SF1 boundary depth asymmetry ρ (base) | 1.72 | ≥ 1.5 | yes |
| SF1 boundary depth asymmetry ρ (resume) | 1.65 | ≥ 1.5 | yes |
| SF2 terminal jump magnitude \|Δ\| | 0.50 | ≥ 0.10 | yes |
| SF3 news vs control basis | 0.0132 / 0.0367 | descriptive | |
| SF4 mid half-spread | 0.27 | descriptive | |
| SF8 crypto surge factor | 24.62× | descriptive | |
| SF8 politics surge factor | 0.68× | descriptive | |
| SF9 12h-3h → 3h-1h depth ratio | 4.91 | descriptive | |

Files

| File | Format | Description |
|---|---|---|
| data/markets-stylized-facts-v1.parquet | Parquet | One row per market; per-market SF1, SF2, SF4 columns |
| data/aggregates.json | JSON | Pooled and per-class aggregate values for all nine SFs |
| data/sf7-class-hour-v1.parquet | Parquet | SF7 hourly activity: 96 rows (4 classes × 24 hours) |
| data/sf9-bucket-aggregate-v1.parquet | Parquet | SF9 depth by time-to-resolution bucket: 5 rows |

Schema: markets-stylized-facts-v1.parquet

One row per market. SF1, SF2, and SF4 carry per-market columns; SF3, SF5–SF9 are aggregate-only (in aggregates.json).

| Field | Type | Description |
|---|---|---|
| market_id | string | Polymarket condition ID, lowercase 0x-prefixed hex |
| question | string | Market question text |
| event_class | string | One of sports, politics, crypto, other |
| tags | list[string] | Polymarket tag names |
| created_at | string | ISO 8601 UTC |
| closed_at | string \| null | ISO 8601 UTC; null if not reported |
| resolved_at | string | ISO 8601 UTC (UMA OO settlement) |
| resolution_outcome | int8 | 0 (NO) or 1 (YES) |
| volume_total_usdc | float64 \| null | Cumulative trading volume |
| is_negrisk_member | bool | Whether part of a Polymarket negRisk group |
| negrisk_group_id | string \| null | Group identifier if applicable |
| sf_pass | string | CC-004 pass: resume or none |
| sf1_rho | float64 \| null | Boundary depth asymmetry ratio; null if no boundary observations (88% of markets) |
| sf2_terminal_jump_magnitude | float64 \| null | \|Δ index\| over [restime − 1h, restime]; null for 23% illiquidity cohort |
| sf4_half_spread_boundary_low | float64 \| null | Median half-spread when index < 0.10; null in v1 (not per-market in CC-004) |
| sf4_half_spread_low | float64 \| null | Median half-spread, index in [0.10, 0.30) |
| sf4_half_spread_mid | float64 \| null | Median half-spread, index in [0.30, 0.70] |
| sf4_half_spread_high | float64 \| null | Median half-spread, index in (0.70, 0.90] |
| sf4_half_spread_boundary_high | float64 \| null | Median half-spread when index > 0.90; null in v1 (not per-market in CC-004) |
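Because the timestamp fields are stored as ISO 8601 strings and several SF columns are nullable, it can help to parse the timestamps and check null coverage before any analysis. The sketch below assumes pandas; the helper names `add_lifetime_hours` and `null_share` and the derived column `lifetime_hours` are hypothetical, and the demo rows are made-up values rather than records from the dataset.

```python
import pandas as pd


def add_lifetime_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Parse the ISO 8601 string columns and add created-to-resolved hours."""
    out = df.copy()
    for col in ("created_at", "closed_at", "resolved_at"):
        # utc=True yields timezone-aware timestamps; nulls become NaT.
        out[col] = pd.to_datetime(out[col], utc=True)
    out["lifetime_hours"] = (
        (out["resolved_at"] - out["created_at"]).dt.total_seconds() / 3600
    )
    return out


def null_share(df: pd.DataFrame, columns) -> pd.Series:
    """Fraction of null values per column."""
    return df[list(columns)].isna().mean()


# Synthetic two-row frame using the schema's column names (values are invented).
demo = pd.DataFrame({
    "created_at": ["2026-04-20T12:00:00Z", "2026-04-26T00:00:00Z"],
    "closed_at": ["2026-04-21T11:00:00Z", None],
    "resolved_at": ["2026-04-21T12:00:00Z", "2026-04-27T06:00:00Z"],
    "sf1_rho": [None, 1.8],
    "sf2_terminal_jump_magnitude": [0.5, None],
})

enriched = add_lifetime_hours(demo)
coverage = null_share(enriched, ["sf1_rho", "sf2_terminal_jump_magnitude"])
print(enriched[["created_at", "resolved_at", "lifetime_hours"]])
print(coverage)
```

On the released file, `null_share` applied to `sf1_rho` and `sf2_terminal_jump_magnitude` should land near the 88% and 23% null rates quoted in the schema, which is a quick sanity check that the file loaded as expected.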

Quick start

import pandas as pd

df = pd.read_parquet("data/markets-stylized-facts-v1.parquet")

# Median terminal jump magnitude by event class (SF2)
jump_by_class = (
    df[df["sf2_terminal_jump_magnitude"].notna()]
    .groupby("event_class")["sf2_terminal_jump_magnitude"]
    .median()
    .sort_values(ascending=False)
)
print(jump_by_class)
# crypto    0.9995
# sports    0.5000
# politics  0.5000
# other     0.5000

# Median boundary depth asymmetry (SF1) by class
rho_by_class = (
    df[df["sf1_rho"].notna()]
    .groupby("event_class")["sf1_rho"]
    .median()
)
print(rho_by_class)

import duckdb

con = duckdb.connect()

# SF9: depth growth approaching resolution, listed from the bucket
# farthest from resolution to the nearest
sf9 = con.execute("""
    SELECT bucket, bucket_lower_h, bucket_upper_h,
           pooled_median_depth_within_200bps_usdc,
           pooled_n_market_observations
    FROM read_parquet('data/sf9-bucket-aggregate-v1.parquet')
    ORDER BY bucket_lower_h DESC
""").df()
print(sf9)

Citation

@dataset{pmxt2026,
  author    = {Nechepurenko, Maksym},
  title     = {{Polymarket Stylized Facts Dataset}, v1},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/ForesightFlow/datasets/tree/main/pmxt-stylized-facts-v1},
  note      = {Tag: pmxt-stylized-facts-v1. Snapshot cutoff: 2026-04-27.}
}