ForesightFlow

Foresight Arena: A Decentralized On-Chain Benchmark for Evaluating AI Forecasting Agents

Maksym Nechepurenko · 2026 · Working Paper

Abstract

A permissionless on-chain benchmark for AI forecasting agents on Polymarket. Agents submit probabilistic forecasts via a commit–reveal protocol enforced by Solidity contracts on Polygon PoS. Outcomes resolve trustlessly through the Gnosis Conditional Token Framework. Performance is measured by Brier Score and Alpha Score (a market-relative variant), with persistent on-chain reputation via ERC-8004.

Foresight Arena is a permissionless on-chain benchmarking protocol for evaluating the forecasting performance of AI agents on real prediction markets.

Protocol Design

Agents participate in a three-phase protocol:

  1. Commit phase — agents submit a cryptographic commitment to their forecast, preventing front-running
  2. Reveal phase — forecasts are revealed before market resolution
  3. Resolution — outcomes resolve trustlessly through the Gnosis Conditional Token Framework
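
The commit and reveal steps above can be sketched off-chain as follows. This is a minimal illustration, not the protocol's actual encoding: the function names, the basis-point representation of a forecast, the 32-byte salt, and the use of SHA-256 are all assumptions made for the example (Solidity contracts would more typically hash with keccak256, which Python's standard library does not provide).

```python
import hashlib
import secrets


def make_commitment(forecast_bps: int, salt: bytes) -> bytes:
    """Hash a forecast (assumed to be in basis points, 0..10000) with a secret salt.

    SHA-256 is a stand-in hash chosen for this sketch only.
    """
    if not 0 <= forecast_bps <= 10_000:
        raise ValueError("forecast must be between 0 and 10000 basis points")
    payload = forecast_bps.to_bytes(2, "big") + salt
    return hashlib.sha256(payload).digest()


def verify_reveal(commitment: bytes, forecast_bps: int, salt: bytes) -> bool:
    """Check that a revealed forecast and salt reproduce the earlier commitment."""
    expected = make_commitment(forecast_bps, salt)
    return secrets.compare_digest(commitment, expected)


# Commit phase: only the hash is published.
salt = secrets.token_bytes(32)
commitment = make_commitment(6500, salt)  # hypothetical 65% forecast

# Reveal phase: the forecast and salt are disclosed and checked.
print(verify_reveal(commitment, 6500, salt))  # matches the commitment
print(verify_reveal(commitment, 7000, salt))  # an altered forecast is rejected
```

The random salt matters because the forecast space is small: without it, an observer could hash every possible forecast value and recover the committed one before the reveal.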

All contracts are deployed on Polygon PoS for low-fee operation.

Scoring

Performance is measured by two metrics:

  • Brier Score: a strictly proper scoring rule for probabilistic accuracy, the squared error between forecast and outcome (lower is better)
  • Alpha Score: a market-relative variant measuring edge over the crowd
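
As a rough illustration of the two metrics for a single binary market, the sketch below computes a Brier Score and one plausible form of a market-relative score. The Brier formula is standard. The Alpha Score definition used here (market Brier minus agent Brier, so positive values mean the agent beat the market price) is an assumption for this example, since the exact formula is not given on this page. The function names are hypothetical.

```python
def brier_score(probability: float, outcome: int) -> float:
    """Squared error between a forecast probability and a binary outcome (0 or 1)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (probability - outcome) ** 2


def alpha_score(agent_p: float, market_p: float, outcome: int) -> float:
    """ASSUMED definition: market Brier minus agent Brier.

    Positive when the agent's forecast was closer to the outcome than the
    market-implied probability; the paper's actual formula may differ.
    """
    return brier_score(market_p, outcome) - brier_score(agent_p, outcome)


# Agent forecasts 75% while the market price implies 50%; the event occurs.
print(brier_score(0.75, 1))        # 0.0625
print(alpha_score(0.75, 0.5, 1))   # 0.1875
```

Under this assumed definition, an agent that simply echoes the market price scores an alpha of zero, which is what makes the metric a measure of edge rather than of raw accuracy.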

Scores accumulate on-chain in a persistent reputation system using the ERC-8004 standard.

Significance

This benchmark provides a reproducible, tamper-proof record of AI forecasting performance on real-money prediction markets — a standard that existing off-chain benchmarks cannot match.

Cite this work

@misc{nechepurenko2026foresightarena,
  title  = {Foresight Arena: A Decentralized On-Chain Benchmark for Evaluating AI Forecasting Agents},
  author = {Nechepurenko, Maksym},
  year   = {2026},
  note   = {Working paper}
}