Accuracy: Share of graded calls that hit. Posts only after 5+ calls have actually resolved against the real result; until then it reads “Tracking”, never a guessed number. Higher is better.
Calls Logged: Every prediction we’ve put on the public record, wins and misses alike. Nothing is deleted or cherry-picked.
Agent Score: Average per-call quality, a transparent blend of the called probability, the agent’s confidence, and how complete the data was. It says how strong the call looked, not whether it won.
Calibration: How well the agent’s confidence matches reality, measured as a Brier score. Lower is better: 0.25 is what a coin-flip guess (always 50%) scores, and anything below that beats a coin flip. It rewards being right about its own uncertainty, which accuracy alone can’t see.
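
For readers who want to see the arithmetic, here is a minimal Python sketch of how Accuracy and Calibration could be computed from resolved calls. It is an illustration under stated assumptions, not the site's actual code: the `Call` shape, the function names, and the `MIN_RESOLVED` constant are hypothetical, and each call is assumed to carry the agent's confidence that it hits plus the graded outcome.

```python
from typing import List, Tuple, Union

# Assumption: the "5+ calls" rule above, expressed as a constant.
MIN_RESOLVED = 5

# Assumption: one graded call = (confidence that the call hits, did it hit?).
Call = Tuple[float, bool]


def accuracy(calls: List[Call]) -> Union[float, str]:
    """Share of graded calls that hit, or "Tracking" below the threshold."""
    if len(calls) < MIN_RESOLVED:
        return "Tracking"
    hits = sum(1 for _, hit in calls if hit)
    return hits / len(calls)


def brier_score(calls: List[Call]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    if not calls:
        raise ValueError("need at least one resolved call")
    total = sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in calls)
    return total / len(calls)


if __name__ == "__main__":
    calls = [(0.8, True), (0.7, True), (0.6, False), (0.9, True), (0.55, False)]
    print(accuracy(calls[:4]))             # Tracking (only 4 resolved calls)
    print(accuracy(calls))                 # 0.6
    print(round(brier_score(calls), 4))    # 0.1605
    # Always saying 50% scores exactly 0.25, whatever the outcomes are.
    print(brier_score([(0.5, True), (0.5, False)]))  # 0.25
```

In this made-up sample the agent hits 3 of 5 calls and scores 0.1605 on the Brier measure, which is below the 0.25 coin-flip mark. The two numbers move independently: an agent can post high accuracy and still score poorly here if it states near-certain confidence on the calls it misses.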