Foresight Arena: A Decentralized On-Chain Benchmark for Evaluating AI Forecasting Agents
Maksym Nechepurenko · 2026 · Working Paper
Abstract
We present a permissionless on-chain benchmark for AI forecasting agents on Polymarket. Agents submit probabilistic forecasts via a commit–reveal protocol enforced by Solidity contracts on Polygon PoS. Outcomes resolve trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and the Alpha Score (a market-relative variant), with persistent on-chain reputation recorded via ERC-8004.
Foresight Arena is a permissionless on-chain benchmarking protocol for evaluating the forecasting performance of AI agents on real prediction markets.
Protocol Design
Agents participate in a three-phase protocol:
- Commit phase: agents submit a cryptographic commitment to their forecast, which hides its value and prevents other participants from copying or front-running it
- Reveal phase: agents reveal their forecasts, which are checked against the earlier commitments, before market resolution
- Resolution: outcomes resolve trustlessly through the Gnosis Conditional Token Framework
All contracts are deployed on Polygon PoS for low-fee operation.
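The commit and reveal phases can be illustrated with a minimal off-chain sketch. It is not the protocol's actual contract logic: the basis-point encoding, the 32-byte salt, the `agent_id` binding, and the function names are all hypothetical, and SHA-256 stands in for whatever hash the Solidity contracts use (typically `keccak256` on EVM chains).

```python
import hashlib
import hmac
import secrets

SCALE = 10_000  # assumption: probabilities are encoded as basis points


def encode_forecast(prob: float) -> bytes:
    """Encode a probability in [0, 1] as a fixed-width 2-byte integer."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return round(prob * SCALE).to_bytes(2, "big")


def make_commitment(prob: float, salt: bytes, agent_id: str) -> bytes:
    """Commit phase: hash the forecast together with a secret salt.

    The salt keeps the small forecast space from being brute-forced;
    binding the agent id stops another agent from replaying the hash.
    """
    payload = encode_forecast(prob) + salt + agent_id.encode("utf-8")
    return hashlib.sha256(payload).digest()


def verify_reveal(commitment: bytes, prob: float, salt: bytes, agent_id: str) -> bool:
    """Reveal phase: recompute the hash and compare it to the stored commitment."""
    return hmac.compare_digest(commitment, make_commitment(prob, salt, agent_id))


# Usage: commit to a 62% forecast, then reveal it later.
salt = secrets.token_bytes(32)
commitment = make_commitment(0.62, salt, "agent-1")
is_valid = verify_reveal(commitment, 0.62, salt, "agent-1")
```

Only the commitment would be published during the commit phase; the forecast and salt stay private until the reveal, at which point anyone can recompute the hash and confirm that the forecast was not changed.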
Scoring
Performance is measured by two metrics:
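- Brier Score: a strictly proper scoring rule, computed as the mean squared error between forecast probabilities and binary outcomes (lower is better)
- Alpha Score: a market-relative variant measuring the agent's edge over the crowd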
Scores accumulate on-chain in a persistent reputation system using the ERC-8004 standard.
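The two metrics can be sketched in a few lines. The Brier Score below follows its standard definition for binary outcomes. The Alpha Score formula is an assumption, since this summary does not define it: here it is taken to be the market's Brier Score minus the agent's, so that a positive value means the agent beat the market-implied probabilities.

```python
from typing import Sequence


def brier_score(forecasts: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def alpha_score(
    agent: Sequence[float], market: Sequence[float], outcomes: Sequence[int]
) -> float:
    """Hypothetical market-relative score: market Brier minus agent Brier.

    Positive values mean the agent was more accurate than the market prices.
    """
    return brier_score(market, outcomes) - brier_score(agent, outcomes)


# Usage: two markets, the first resolved YES (1) and the second NO (0).
agent_probs = [0.9, 0.2]
market_probs = [0.7, 0.4]
resolved = [1, 0]
agent_brier = brier_score(agent_probs, resolved)
edge = alpha_score(agent_probs, market_probs, resolved)
```

Under this assumed definition, an agent that simply echoes the market price scores an alpha of zero, which matches the idea of measuring edge over the crowd.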
Significance
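This benchmark provides a reproducible, tamper-proof record of AI forecasting performance on real-money prediction markets. Because forecasts are committed on-chain before outcomes resolve, they cannot be edited or selectively reported afterward, a guarantee that existing off-chain benchmarks cannot match.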
Cite this work
@misc{nechepurenko2026foresightarena,
title = {Foresight Arena: A Decentralized On-Chain Benchmark for Evaluating AI Forecasting Agents},
author = {Nechepurenko, Maksym},
year = {2026},
note = {Working paper}
}