ForesightFlow
pmxt-behavioral-clusters-v1/v1.0 · CC-BY-4.0

PMXT Behavioral Clusters v1 — Non-Retail Polymarket Microstructure Dataset

Fill-side behavioral clusters, feature tiers, and per-market microstructure signatures derived from 13.36M OrderFilled events on the Polymarket CTFExchange (43,116 markets, 77,203 addresses, 2026-04-21 to 2026-04-27). Includes k-means (k=5) archetypes, 6 feature tiers, and a Spearman bilateral analysis with BH-FDR correction. Bundle 3 of the PMXT family; companion to Paper 4 of the Event-Linked Perpetuals research programme.

Fill-side behavioral clustering and per-market microstructure metrics for 43,116 Polymarket binary prediction markets over 2026-04-21 to 2026-04-27 UTC. The dataset provides the first public characterisation of non-retail participant behaviour on a decentralised prediction market platform and is released as the empirical foundation for Paper 4 of the Event-Linked Perpetuals research programme.

Corpus

| Attribute | Value |
|---|---|
| Markets | 43,116 |
| Unique addresses (aggregate only) | 77,203 |
| Total fills | 13,356,931 |
| Source | Polymarket CTFExchange OrderFilled events, Polygon mainnet |
| Block range | 86,008,447 – 86,107,178 |
| Date range | 2026-04-21 to 2026-04-27 UTC |
| Pipeline run | 2026-05-11 |

Feature-tier distribution

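| Tier | Addresses | % of addresses | Notional | % of notional |
|---|---|---|---|---|
| whale-tier (≥ $1M notional) | 68 | 0.1% | $184M | 28.0% |
| high-frequency-operator | 2,952 | 3.8% | $155M | 23.5% |
| power-trader | 6,738 | 8.7% | $197M | 29.9% |
| active-retail | 2,062 | 2.7% | $70M | 10.6% |
| high-breadth-operator | 2,025 | 2.6% | $7M | 1.1% |
| episodic-retail | 63,358 | 82.1% | $45M | 6.8% |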

Top 3 tiers (12.6% of addresses) control 81.4% of notional volume.
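The concentration figure can be checked directly from the tier table. The sketch below is illustrative only: it uses the rounded $M notional values printed above, so the notional share comes out near 81.5%, within rounding of the quoted 81.4% (which is the sum of the rounded per-tier percentages). The names `TIERS` and `concentration` are hypothetical and not part of the dataset.

```python
# Tier rows transcribed from the table above: (tier, addresses, notional in $M).
# Notional values are rounded to the nearest $1M, so results are approximate.
TIERS = [
    ("whale-tier", 68, 184),
    ("high-frequency-operator", 2952, 155),
    ("power-trader", 6738, 197),
    ("active-retail", 2062, 70),
    ("high-breadth-operator", 2025, 7),
    ("episodic-retail", 63358, 45),
]


def concentration(rows, top_n):
    """Return (address share %, notional share %) for the first top_n rows."""
    total_addresses = sum(r[1] for r in rows)
    total_notional = sum(r[2] for r in rows)
    top = rows[:top_n]
    address_pct = 100 * sum(r[1] for r in top) / total_addresses
    notional_pct = 100 * sum(r[2] for r in top) / total_notional
    return address_pct, notional_pct


address_pct, notional_pct = concentration(TIERS, 3)
print(f"top 3 tiers: {address_pct:.1f}% of addresses, "
      f"{notional_pct:.1f}% of notional")
```

The address counts sum to 77,203, matching the corpus total.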

k-means k=5 exploratory partition

| Cluster | Archetype | Addresses | Notional share |
|---|---|---|---|
| C1 | fill-MM | 16,786 | 46.0% |
| C2 | fill-LP | 13,626 | 45.1% |
| C0 | SPECIALIST | 13,775 | 7.5% |
| C3 | RETAIL | 20,033 | 0.8% |
| C4 | RETAIL | 14,733 | 0.6% |

Note: DBSCAN yielded a single cluster, indicating unimodal data, so k-means with k=5 is an exploratory fallback (silhouette = 0.227). Feature-tier classification is the recommended primary stratification.
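For context on the silhouette value: the coefficient ranges from -1 to 1, with values near 1 indicating compact, well-separated clusters and values near 0 indicating overlapping ones. The following is a minimal sketch of the mean silhouette computation on toy data, not the pipeline's code. Euclidean distance and the function name `silhouette_score` are assumptions here; the source does not state which metric or library was used.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the listed members, excluding i.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean intra-cluster distance
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across it
print(f"good partition: {good:.3f}, bad partition: {bad:.3f}")
```

On this toy data the geometry-respecting labels score close to 1 and the crossing labels score below 0, which is the scale against which a value such as 0.227 can be read.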

Three-gate verdict

| Gate | Result | Detail |
|---|---|---|
| G-FILL | PASS | 13.36M fills attributed via eth_getLogs |
| G-QUOTE-LIFE | FAIL (universal) | off-chain CLOB; no quote lifecycle data |
| G-BOOK | PASS (partial) | market-level best_bid/best_ask only |

Files

| File | Description |
|---|---|
| data/per_cluster_summary.json | k-means cluster centroids, 95% CI, archetype labels |
| data/per_cluster_aggregates.parquet | Cluster-level totals and summary statistics (5 rows) |
| data/per_address_tiers.parquet | Address-level tier and cluster assignments (77,203 rows) |
| data/tier_kmeans_crosstab.json | 6 tiers × 5 clusters cross-tabulation |
| data/tier_sensitivity.json | Tier populations at P90/P95/P99 threshold variants |
| data/per_market_microstructure.parquet | 43,116 markets × 28 metrics (PR, TS, OI, VPIN, SCI, Kyle's λ) |
| data/per_market_ils.parquet | ILS at 4 resolution anchor offsets (6,406 / 43,116 scope_pass) |
| data/per_market_archetype_share.parquet | Per (market, archetype) volume fraction |
| data/cluster_microstructure_bilateral_real.json | Spearman ρ, BH-FDR, Mann-Whitney, BCa CI (110 tests, 75 significant) |
| data/manipulation_patterns.json | Wash-volume candidates and book-depth swings |
| data/gate_report.json | Three-gate empirical verdict |
| docs/SCHEMA.md | Column-by-column documentation |
| docs/METHODOLOGY.md | Full methodology |
| docs/KNOWN_LIMITATIONS.md | Known limitations (10 items) |
| DATASHEET.md | Datasheets for Datasets format (Gebru et al. 2018) |
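The bilateral results in data/cluster_microstructure_bilateral_real.json are reported after BH-FDR correction across 110 tests. For readers unfamiliar with the procedure, here is a minimal sketch of Benjamini-Hochberg adjustment. It is not the pipeline's implementation: the function name `benjamini_hochberg`, the example p-values, and the 0.05 threshold are all assumptions for illustration (the significance level used by the dataset is not stated here; see docs/METHODOLOGY.md).

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) in the input order.

    alpha=0.05 is an illustrative default, not a value taken from the dataset.
    """
    m = len(p_values)
    # Indices of the p-values sorted ascending; rank 1 is the smallest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= alpha for q in adjusted]
    return adjusted, rejected


# Made-up p-values, purely for illustration.
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted, rejected = benjamini_hochberg(example_p)
for p, q, r in zip(example_p, adjusted, rejected):
    print(f"p={p:.3f}  adjusted={q:.5f}  significant={r}")
```

Note how the third p-value inherits the smaller adjusted value of the fourth: the step-up procedure takes a running minimum so that adjusted p-values never decrease as raw p-values increase.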

Related datasets (PMXT family)

| Bundle | DOI | Contents |
|---|---|---|
| Bundle 1: pmxt-stylized-facts-v1 | 10.5281/zenodo.20107449 | 13,314 markets, SF1–SF9, PMXT v2 archive |
| Bundle 2: pmxt-counterfactual-replay-v1 | 10.5281/zenodo.20108387 | E2/E3 resolution-zone counterfactuals |
| Bundle 3: pmxt-behavioral-clusters-v1 | 10.5281/zenodo.XXXXXXXX | This dataset |

Citation

@dataset{nechepurenko2026pmxt_clusters,
  author    = {Nechepurenko, Maksym},
  title     = {{PMXT Behavioral Clusters v1 --- Non-Retail Polymarket Microstructure Dataset}},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.XXXXXXXX},
  license   = {CC-BY-4.0}
}