For nearly a century, statistical inference has rested on assumed probability measures. We postulate that data comes from some distribution P, then derive tests and decisions from that assumption.
But what if the structure of the decision problem itself, a game between a Skeptic who seeks truth and a Nature that reveals data, were enough to give rise to all of probability?
This is the insight behind e-values and game-theoretic probability. An e-value is not a probability. It is a bet. It is the wealth of a gambler who wagered against the null hypothesis. When that wealth grows large, the evidence is strong, not because we computed a probability, but because the game would not allow it otherwise.
Imagine Forecaster claims hypothesis P is true. Skeptic pays $1 and specifies a bet E ≥ 0. If outcome ω is observed, Skeptic gets E(ω) back. The contract is fair to Forecaster when 𝔼P[E] ≤ 1: Skeptic cannot expect to profit if P is true. For a fair coin, for instance, Skeptic may bet E(heads) = 1.5 and E(tails) = 0.5; the expected payout is exactly the $1 paid.
Below: the distribution of e-values under two worlds. Under the null (left), e-values scatter around 1 — Skeptic breaks even on average. Under the alternative (right), e-values shift upward — Skeptic profits because truth is on her side.
The simplest e-variable is the likelihood ratio — the log-optimal bet, which maximizes expected wealth growth: E(ω) = q(ω)/p(ω), where p and q are the densities of the null P and the alternative Q. Under P it satisfies 𝔼P[E] = 1, so the bet is exactly fair.
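As a concrete illustration, a minimal sketch of a single-round likelihood-ratio bet, assuming a N(0, 1) null against a N(0.5, 1) alternative (both distributions, and the function name, are invented for this example):

```python
import numpy as np
from scipy.stats import norm

# Single-round likelihood-ratio e-value: E(x) = q(x) / p(x).
# Null P = N(0, 1), alternative Q = N(0.5, 1); under P, E_P[E(X)] = 1.
def likelihood_ratio_evalue(x, mu_alt=0.5):
    return norm.pdf(x, loc=mu_alt) / norm.pdf(x, loc=0.0)

rng = np.random.default_rng(0)
x_null = rng.normal(0.0, 1.0, size=100_000)    # data drawn from the null
print(likelihood_ratio_evalue(x_null).mean())  # ~1.0: the bet is exactly fair
```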
Testing a hypothesis as a sequential game. At each round, Skeptic declares a bet before seeing the data. Nature reveals the outcome. Wealth grows or shrinks.
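The same toy Gaussian setup, now played sequentially; a minimal sketch of the game loop, where Skeptic's bet is fixed before each outcome is revealed:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
wealth = 1.0     # Skeptic starts with $1
mu_true = 0.5    # Nature's actual drift (unknown to Skeptic)

for t in range(200):
    # Skeptic declares the bet before seeing the data: the Q-vs-P likelihood ratio.
    x = rng.normal(mu_true, 1.0)                     # Nature reveals the outcome
    e = norm.pdf(x, loc=0.5) / norm.pdf(x, loc=0.0)  # this round's e-value
    wealth *= e                                      # wealth grows or shrinks
print(f"wealth after 200 rounds: {wealth:.2f}")      # large => evidence against H0
```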
The same protocol applies in every domain. Here are two real-world applications of the testing-by-betting game:
Sequential monitoring of BOLD signal in a single brain voxel. Each fMRI volume provides one observation. The e-process accumulates evidence of activation without pre-specifying how many scans to collect.
Sequential comparison of two conversion rates. Each visitor provides a Bernoulli observation. Peek at the result after every user — the e-process remains valid. No need to wait for a fixed sample size.
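A hedged sketch of peeking with Bernoulli e-values. For simplicity it tests one arm's conversion rate against a point alternative (the rates 0.05 and 0.08 are invented); a genuine two-sample comparison needs a more careful construction:

```python
import numpy as np

# Test H0: conversion rate p = 0.05 against a point alternative p1 = 0.08
# with a Bernoulli likelihood-ratio e-process, peeking after every visitor.
p0, p1 = 0.05, 0.08
rng = np.random.default_rng(2)

wealth, alpha = 1.0, 0.05
for visitor in range(1, 10_001):
    x = rng.binomial(1, 0.08)                      # each visitor converts or not
    e = (p1 / p0) if x else ((1 - p1) / (1 - p0))  # Bernoulli likelihood ratio
    wealth *= e
    if wealth >= 1 / alpha:                        # peek after EVERY visitor
        print(f"reject H0 at visitor {visitor}, wealth {wealth:.1f}")
        break
```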
An e-process remains valid at any stopping time. You may peek whenever you wish and the conclusion holds. This is the gift of Ville's inequality: for a nonnegative supermartingale with W₀ = 1, the probability that Wt ever reaches 1/α is at most α.
The e-process framework leaves one degree of freedom: how much of your wealth to bet on each round. The betting fraction λt can be chosen differently, yielding different growth-risk profiles. Four canonical strategies, with a code sketch after this list:
All-In (λ=1): Bet everything every round. Maximum growth when right, maximum volatility. The product of raw e-values.
Conservative (λ fixed, small): Steady, predictable growth. Lower variance, slower accumulation. Good when you need robustness.
Empirically Adaptive: Start cautious (λ₁=0), then learn the optimal fraction from past e-values. Asymptotically log-optimal without knowing the alternative.
Log-Optimal: If you know the alternative Q, bet the Kelly fraction. Maximizes 𝔼Q[log Wt]. The gold standard, rarely achievable in practice.
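Below, a sketch of the wealth update Wt = Wt−1 · ((1−λt) + λt·Et) under three of these schedules; the grid-search rule is a simple stand-in for the empirically adaptive strategy, not the library's implementation:

```python
import numpy as np
from scipy.stats import norm

def run_wealth(evalues, lams):
    """Wealth with betting fractions: W_t = prod_s ((1 - lam_s) + lam_s * E_s).
    lam = 1 everywhere recovers the raw product of e-values (All-In)."""
    w = 1.0
    for e, lam in zip(evalues, lams):
        w *= (1 - lam) + lam * e
    return w

def adaptive_lams(evalues, grid=np.linspace(0, 1, 101)):
    """lam_1 = 0, then pick the fraction that would have maximized past
    log-wealth: a simple stand-in for the empirically adaptive rule."""
    lams, past = [], []
    for e in evalues:
        if not past:
            lams.append(0.0)
        else:
            arr = np.array(past)
            logw = [np.log(np.maximum((1 - g) + g * arr, 1e-12)).sum() for g in grid]
            lams.append(float(grid[int(np.argmax(logw))]))
        past.append(e)
    return lams

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.0, size=300)                   # observations with drift
evals = norm.pdf(x, loc=0.3) / norm.pdf(x, loc=0.0)  # per-round e-values
print(run_wealth(evals, [1.0] * 300))                # All-In
print(run_wealth(evals, [0.2] * 300))                # Conservative
print(run_wealth(evals, adaptive_lams(evals)))       # Empirically Adaptive
```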
Every e-process has a dual: a confidence sequence — a sequence of intervals that, with probability at least 1 − α, contains the true parameter at every time step simultaneously.
It starts wide (little data, much uncertainty), then tightens as evidence accumulates, but never loses its guarantee. Stop collecting data whenever the interval is narrow enough for your purpose.
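A self-contained sketch of such an interval, assuming unit-variance observations and Robbins' two-sided normal mixture with prior precision ρ; the library's parameterization may differ:

```python
import numpy as np

def normal_mixture_radius(t, alpha=0.05, rho=1.0):
    """Half-width of a time-uniform interval for the mean of unit-variance
    data, from Robbins' two-sided normal mixture supermartingale:
        M_t = sqrt(rho / (t + rho)) * exp(S_t**2 / (2 * (t + rho))).
    Inverting M_t < 1/alpha gives a band valid at every t simultaneously."""
    return np.sqrt((t + rho) * np.log((t + rho) / (rho * alpha**2))) / t

rng = np.random.default_rng(3)
x = rng.normal(0.3, 1.0, size=1000)       # true mean 0.3
t = np.arange(1, len(x) + 1)
mean = np.cumsum(x) / t
r = normal_mixture_radius(t)
# (mean - r, mean + r) covers 0.3 at ALL t simultaneously with prob >= 0.95
print(f"interval at t=1000: [{mean[-1] - r[-1]:.3f}, {mean[-1] + r[-1]:.3f}]")
```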
The connection between e-processes and physics runs deep. In particle physics, sequential monitoring of collision data yields an e-process that accumulates evidence for new particles — the Poisson likelihood ratio is a natural e-value. The TwoSidedNormalMixture class from martingales.py provides the core machinery for testing any sequential stream of observations: a mixture supermartingale that simultaneously gives an e-process and a time-uniform confidence sequence.
Particle Discovery at a Collider. A detector counts events in successive time windows. Under background only (null), counts follow Poisson(λ₀). If a new particle exists (alternative), counts follow Poisson(λ₁ > λ₀). Each window yields an e-value — the likelihood ratio. The product e-process accumulates evidence collision by collision.
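A sketch of this example; the rates λ₀ = 10 and λ₁ = 13 and the discovery threshold are chosen purely for illustration:

```python
import numpy as np
from scipy.stats import poisson

lam0, lam1 = 10.0, 13.0        # background rate vs. rate with a new particle
rng = np.random.default_rng(4)

wealth = 1.0
for window in range(1, 501):
    n = rng.poisson(lam1)      # counts in this window (alternative is true)
    # Poisson likelihood-ratio e-value: (lam1/lam0)**n * exp(-(lam1 - lam0))
    wealth *= poisson.pmf(n, lam1) / poisson.pmf(n, lam0)
    if wealth >= 1e6:          # "discovery" threshold, illustrative only
        print(f"strong evidence after {window} windows")
        break
```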
Sequential Mean Testing. The core of the library: testing whether a stream of observations has mean zero. The TwoSidedNormalMixture class (martingales.py) constructs a mixture supermartingale that yields both an e-process and a time-uniform confidence sequence. Under the null (μ = 0), the wealth fluctuates near 1. Under drift μ > 0, the running mean escapes the shrinking confidence band and the e-process grows — evidence accumulates at a rate proportional to μ².
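For concreteness, a self-contained sketch of the normal-mixture supermartingale; the actual TwoSidedNormalMixture interface may differ:

```python
import numpy as np

def normal_mixture_eprocess(x, rho=1.0):
    """E-process for H0: mean zero (unit-variance data), via Robbins'
    normal mixture: M_t = sqrt(rho/(t+rho)) * exp(S_t**2 / (2*(t+rho)))."""
    t = np.arange(1, len(x) + 1)
    s = np.cumsum(x)
    return np.sqrt(rho / (t + rho)) * np.exp(s**2 / (2 * (t + rho)))

rng = np.random.default_rng(5)
print(normal_mixture_eprocess(rng.normal(0.0, 1.0, 500))[-1])  # null: stays small
print(normal_mixture_eprocess(rng.normal(0.4, 1.0, 500))[-1])  # drift: explodes
```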
Game-theoretic probability was born, in part, from the study of financial markets. Shafer and Vovk's foundational work placed finance on game-theoretic footing: the market is Nature, the trader is Skeptic, and the absence of arbitrage is mathematically equivalent to the existence of a probability measure.
VaR Backtesting. A bank reports daily Value-at-Risk: the loss threshold it claims will be exceeded only 1% of the time. The backtest e-statistic (Example 16.9) is eqβ(x, r) = 𝟙{x > r} / (1−β) — a simple indicator bet that pays 1/(1−β) (here 100) on exceedance days and 0 otherwise. These sequential e-values are combined into an e-process via the betting-fraction approach (Equation 16.15, EProcessUpdater): Mt = Mt−1 · ((1−λ) + λEt). If the bank's model is honest, the e-process stays bounded. If risk is underestimated, evidence accumulates.
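A sketch of the backtest loop; the t-distributed losses and the constant (deliberately optimistic) VaR report are invented for illustration:

```python
import numpy as np

beta, lam = 0.99, 0.1          # VaR level and a fixed betting fraction
rng = np.random.default_rng(6)

M = 1.0
for day in range(1, 1001):
    loss = rng.standard_t(df=4)            # fat-tailed daily loss (invented)
    var_report = 2.0                       # bank's too-optimistic 99% VaR claim
    e = (loss > var_report) / (1 - beta)   # pays 100 on exceedance, else 0
    M *= (1 - lam) + lam * e               # betting-fraction update (Eq. 16.15)
    if M >= 100:                           # evidence threshold, e.g. 1/alpha
        print(f"risk model rejected on day {day}")
        break
```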
Market Regime Detection. Financial markets alternate between calm and turbulent regimes. The ConformalCUSUM class (cusum.py) implements a multiplicative e-detector: each observation yields an e-statistic e(x, r) = x²/σ₀² for variance (Example 16.8), and the detector accumulates Ct = max(Ct−1 · Et, ε) with truncation floor ε = 10⁻¹⁰. Under the null (σ = σ₀), 𝔼[log Et] ≈ −1.27, so the detector decays rapidly toward the floor. Under a volatility shift, it grows and triggers an alarm at the threshold (default 20), then resets.
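A sketch of the detector loop under these definitions, with an invented variance doubling halfway through the stream:

```python
import numpy as np

sigma0, threshold, floor = 1.0, 20.0, 1e-10
rng = np.random.default_rng(7)

C = 1.0
for t in range(1, 2001):
    sigma = sigma0 if t <= 1000 else 2 * sigma0   # volatility doubles at t=1000
    x = rng.normal(0.0, sigma)
    e = x**2 / sigma0**2              # variance e-statistic, E[e] = 1 under null
    C = max(C * e, floor)             # multiplicative e-detector with floor
    if C >= threshold:
        print(f"regime change flagged at t={t}")
        break                         # in the library, the detector then resets
```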
A brain scan divides the cortex into a lattice of volumetric pixels — voxels. Each voxel produces a noisy time series of BOLD signal. The question at every voxel: is there neural activation above baseline? This is a massively parallel sequential testing problem. Each voxel runs an independent TwoSidedNormalMixture e-process (martingales.py). The e-BH procedure (Definition 9.8) then selects discoveries: reject all voxels whose e-process exceeds K/(α · |D|), controlling FDR at level α under arbitrary dependence between voxels.
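A sketch of the e-BH selection rule on its own (the fMRI pipeline around it is omitted; the function name is mine):

```python
import numpy as np

def e_bh(evalues, alpha=0.05):
    """e-BH: reject the k-hat largest e-values, where k-hat is the largest k
    with (k-th largest e-value) >= K / (alpha * k). FDR <= alpha holds under
    arbitrary dependence between the e-values."""
    e = np.asarray(evalues)
    K = len(e)
    order = np.argsort(e)[::-1]             # indices sorted by e-value, descending
    ks = np.arange(1, K + 1)
    passing = e[order] >= K / (alpha * ks)
    if not passing.any():
        return np.array([], dtype=int)
    k_hat = ks[passing].max()
    return order[:k_hat]                    # indices of rejected (discovered) voxels

evals = np.array([50.0, 1.2, 200.0, 0.8, 35.0])
print(e_bh(evals, alpha=0.05))              # -> discoveries at indices 2, 0, 4
```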
Live Cortical Mapping. A top-down view of the cortex. Each cell is one voxel — a region of interest being tested in parallel. As fMRI volumes arrive, evidence accumulates. Cool regions show no activation (e-process near 1). Warm regions show growing evidence. When a voxel's e-process crosses 1/α, it lights up — a discovery. The scan progress bar shows acquisition time, and the counter tracks discoveries in real time.
Voxel Signals. Below, each ribbon is a single voxel's BOLD signal over time. Most voxels are null — pure noise around zero. A subset carry true activation (a small positive drift). The 3D view reveals the spatial structure: many parallel streams of evidence, tested simultaneously.
Parallel E-Processes. Each voxel's signal feeds a TwoSidedNormalMixture supermartingale. Under the null, wealth stays near 1. Under activation, it grows — the active voxels rise from the field like peaks on a landscape. The e-BH threshold adapts to the number of discoveries, controlling false discovery rate across all voxels simultaneously.
Merged E-Process. The arithmetic mean MK = (E1 + ··· + EK) / K is the canonical e-merging function (Proposition 8.3, Theorem 8.4). It essentially dominates all symmetric e-merging functions. At each scan volume, we average all K voxel e-processes into a single global test statistic. If the merged process crosses 1/α, there is global evidence that some activation exists — anywhere in the brain — while each individual voxel's test retains its own validity.
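A minimal sketch of mean merging, with an invented e-value matrix for three voxels over four volumes:

```python
import numpy as np

# Mean e-merging: the average of K e-processes is itself an e-process
# for the global null "no voxel is active".
E = np.array([[1.0, 2.0, 5.0, 60.0],   # an active voxel (invented numbers)
              [0.8, 0.9, 1.1, 1.0],    # a null voxel
              [1.2, 1.0, 0.7, 0.9]])   # a null voxel
merged = E.mean(axis=0)                # shape (K, T) -> (T,)
alpha = 0.05
print(merged >= 1 / alpha)             # global rejection once the mean crosses 20
```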