Fill-side behavioral clusters, feature tiers, and per-market microstructure signatures from 13.36M OrderFilled events on Polymarket CTFExchange (43,116 markets, 77,203 addresses, 2026-04-21 to 2026-04-27). k-means k=5 archetypes, 6 feature tiers, Spearman bilateral analysis with BH-FDR correction. Bundle 3 of the PMXT family; companion to Paper 4 of the Event-Linked Perpetuals research programme.
Fill-side behavioral clustering and per-market microstructure metrics for 43,116 Polymarket binary prediction markets over 2026-04-21 to 2026-04-27 UTC. The dataset provides the first public characterisation of non-retail participant behaviour on a decentralised prediction market platform and is released as the empirical foundation for Paper 4 of the Event-Linked Perpetuals research programme.
Corpus
| Attribute | Value |
| --- | --- |
| Markets | 43,116 |
| Unique addresses (aggregate only) | 77,203 |
| Total fills | 13,356,931 |
| Source | Polymarket CTFExchange OrderFilled events, Polygon mainnet |
| Block range | 86,008,447 – 86,107,178 |
| Date range | 2026-04-21 to 2026-04-27 UTC |
| Pipeline run | 2026-05-11 |
Feature-tier distribution
| Tier | Addresses | % of addresses | Notional | % of notional |
| --- | --- | --- | --- | --- |
| whale-tier (≥ $1M notional) | 68 | 0.1% | $184M | 28.0% |
| high-frequency-operator | 2,952 | 3.8% | $155M | 23.5% |
| power-trader | 6,738 | 8.7% | $197M | 29.9% |
| active-retail | 2,062 | 2.7% | $70M | 10.6% |
| high-breadth-operator | 2,025 | 2.6% | $7M | 1.1% |
| episodic-retail | 63,358 | 82.1% | $45M | 6.8% |
The top three tiers (whale-tier, high-frequency-operator, power-trader) hold 12.6% of addresses and account for 81.4% of notional volume.
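The concentration figure above can be reproduced from the tier table. The following is a minimal sketch with the table values hard-coded; the variable names are illustrative and are not column names from the released files. Address share is recomputed from the counts, while notional share sums the published percentages, because the dollar figures are rounded to the nearest $1M and recomputing from them can differ in the last digit.

```python
# Tier table values copied from this README: (addresses, published % of notional).
TIERS = {
    "whale-tier": (68, 28.0),
    "high-frequency-operator": (2_952, 23.5),
    "power-trader": (6_738, 29.9),
    "active-retail": (2_062, 10.6),
    "high-breadth-operator": (2_025, 1.1),
    "episodic-retail": (63_358, 6.8),
}
TOP = ("whale-tier", "high-frequency-operator", "power-trader")

total_addresses = sum(n for n, _ in TIERS.values())
top_address_pct = 100 * sum(TIERS[t][0] for t in TOP) / total_addresses
top_notional_pct = sum(TIERS[t][1] for t in TOP)

print(total_addresses)             # 77203
print(round(top_address_pct, 1))   # 12.6
print(round(top_notional_pct, 1))  # 81.4
```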
k-means k=5 exploratory partition
| Cluster | Archetype | Addresses | Notional share |
| --- | --- | --- | --- |
| C1 | fill-MM | 16,786 | 46.0% |
| C2 | fill-LP | 13,626 | 45.1% |
| C0 | SPECIALIST | 13,775 | 7.5% |
| C3 | RETAIL | 20,033 | 0.8% |
| C4 | RETAIL | 14,733 | 0.6% |
Note: DBSCAN yielded 1 cluster (unimodal data); k-means k=5 is an exploratory fallback (silhouette = 0.227). Feature-tier classification is the recommended primary stratification.
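For readers less familiar with the silhouette coefficient quoted in the note: for each point it compares the mean distance to its own cluster (a) with the smallest mean distance to any other cluster (b), as (b - a) / max(a, b), and averages over all points. Values near 1 indicate well-separated clusters, so 0.227 indicates weak separation. The sketch below is a pure-Python illustration on toy 2-D points; it assumes Euclidean distance and does not use the dataset's features or the author's pipeline.

```python
from math import dist
from statistics import mean

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is included in the sum only).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean(dist(p, q) for q in other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = silhouette(points, [0, 1, 0, 1, 0, 1])   # labels cut across the groups
print(good > bad)  # True
```

A low score such as the one reported here is the reason the note recommends the feature tiers, rather than the k-means clusters, as the primary stratification.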
Three-gate verdict
| Gate | Result |
| --- | --- |
| G-FILL | PASS: 13.36M fills attributed via eth_getLogs |
| G-QUOTE-LIFE | FAIL universal: off-chain CLOB; no quote lifecycle data |
| G-BOOK | PASS partial: market-level best_bid/best_ask only |
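The G-FILL gate relies on `eth_getLogs`, the standard Ethereum JSON-RPC method for fetching event logs. As a hedged illustration of how the block range in the Corpus table could be paged through, the sketch below builds request payloads in fixed-size windows. The window size, the placeholder contract address, and the placeholder topic hash are assumptions for illustration; this is not the author's pipeline, and no network call is made.

```python
FIRST_BLOCK, LAST_BLOCK = 86_008_447, 86_107_178  # block range from the Corpus table
CHUNK = 2_000  # assumption: span limits per request vary by RPC provider
EXCHANGE_ADDRESS = "0x<CTFExchange address>"       # placeholder, not a real value
ORDER_FILLED_TOPIC = "0x<OrderFilled topic hash>"  # placeholder, not a real value

def block_windows(first, last, size):
    """Yield inclusive, contiguous (start, end) windows covering [first, last]."""
    start = first
    while start <= last:
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1

def get_logs_payload(start, end, request_id):
    """Build one eth_getLogs JSON-RPC request body (block numbers as hex strings)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_getLogs",
        "params": [{
            "fromBlock": hex(start),
            "toBlock": hex(end),
            "address": EXCHANGE_ADDRESS,
            "topics": [ORDER_FILLED_TOPIC],
        }],
    }

windows = list(block_windows(FIRST_BLOCK, LAST_BLOCK, CHUNK))
payloads = [get_logs_payload(s, e, i) for i, (s, e) in enumerate(windows)]
print(len(payloads))  # 50
```

Each payload would be sent as a POST body to a Polygon RPC endpoint of the reader's choice; smaller windows may be needed where a provider also caps the number of logs returned per call.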
Files
| File | Description |
| --- | --- |
| data/per_cluster_summary.json | k-means cluster centroids, 95% CI, archetype labels |
| data/per_cluster_aggregates.parquet | Cluster-level totals and summary statistics (5 rows) |
| data/per_address_tiers.parquet | Address-level tier and cluster assignments (77,203 rows) |
| data/tier_kmeans_crosstab.json | 6 tiers × 5 clusters cross-tabulation |
| data/tier_sensitivity.json | Tier populations at P90/P95/P99 threshold variants |
| data/per_market_microstructure.parquet | 43,116 markets × 28 metrics (PR, TS, OI, VPIN, SCI, Kyle's λ) |
| data/per_market_ils.parquet | ILS at 4 resolution anchor offsets (6,406 / 43,116 scope_pass) |
| data/per_market_archetype_share.parquet | Per (market, archetype) volume fraction |
| data/cluster_microstructure_bilateral_real.json | Spearman ρ, BH-FDR, Mann-Whitney, BCa CI (110 tests, 75 significant) |
| data/manipulation_patterns.json | Wash-volume candidates and book-depth swings |
| data/gate_report.json | Three-gate empirical verdict |
| docs/SCHEMA.md | Column-by-column documentation |
| docs/METHODOLOGY.md | Full methodology |
| docs/KNOWN_LIMITATIONS.md | Known limitations (10 items) |
| DATASHEET.md | Datasheets for Datasets format (Gebru et al. 2018) |
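The bilateral results file reports significance after BH-FDR correction. For readers who want to see what that correction does, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up procedure. The p-values and the `alpha=0.05` level are illustrative assumptions, not values taken from the dataset; docs/METHODOLOGY.md is the place to check the settings actually used.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

p = [0.025, 0.039, 0.029, 0.005, 0.5]  # illustrative p-values only
print(benjamini_hochberg(p))  # [True, True, True, True, False]
```

Note that 0.025 is rejected even though it exceeds its own rank threshold (2/5 × 0.05 = 0.02): the step-up rule rejects everything ranked below the largest p-value that passes.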
Related datasets (PMXT family)
| Bundle | DOI | Contents |
| --- | --- | --- |
| Bundle 1: pmxt-stylized-facts-v1 | 10.5281/zenodo.20107449 | 13,314 markets, SF1–SF9, PMXT v2 archive |
| Bundle 2: pmxt-counterfactual-replay-v1 | 10.5281/zenodo.20108387 | E2/E3 resolution-zone counterfactuals |
| Bundle 3: pmxt-behavioral-clusters-v1 | 10.5281/zenodo.XXXXXXXX | This dataset |
Citation
@dataset{nechepurenko2026pmxt_clusters,
author = {Nechepurenko, Maksym},
title = {{PMXT Behavioral Clusters v1 --- Non-Retail Polymarket Microstructure Dataset}},
year = {2026},
publisher = {Zenodo},
doi = {10.5281/zenodo.XXXXXXXX},
license = {CC-BY-4.0}
}