Brier score is the mean squared difference between our predicted probabilities and the actual outcomes (0 or 1). Lower is better.
- · 0.00 = perfect (every prediction was 0% or 100% and turned out right)
- · 0.14-0.18 = well-calibrated professional model (our target)
- · 0.25 = same as always predicting 50% (flipping a coin)
- · 0.30+ = model is broken / needs retrain
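
To make the scale above concrete, here is a minimal sketch of the standard Brier score calculation for binary outcomes. The function name `brier_score` and the plain-list inputs are assumptions for illustration, not FreshLoop's actual code.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.

    Hypothetical helper: `probs` are predicted probabilities in [0, 1],
    `outcomes` are the matching results (1 = event happened, 0 = it did not).
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Always predicting 50% scores 0.25 whatever the outcomes are.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25

# Fully confident and always right scores 0.0.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
```

The 50% case shows where the 0.25 reference point comes from: each sample contributes (0.5)^2 = 0.25 regardless of the result.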
Bias (pp) is how much the model over- or under-predicts on average, in percentage points.
- · 0 pp = predictions match reality exactly, on average
- · +5 pp = we predict ~5 percentage points higher than actual (overrate teams)
- · -5 pp = we predict ~5 percentage points lower than actual (underrate teams)
- · |bias| > 10 pp = time to retrain
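
As a rough sketch of how a bias figure like this could be computed, the example below assumes bias is the average predicted probability minus the observed outcome rate, expressed in percentage points. The name `bias_pp` and that exact definition are assumptions; the real calculation may differ.

```python
def bias_pp(probs, outcomes):
    """Average prediction minus observed outcome rate, in percentage points.

    Hypothetical helper: positive means the model predicts too high on
    average, negative means it predicts too low.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and the same length")
    n = len(probs)
    mean_prediction = sum(probs) / n
    outcome_rate = sum(outcomes) / n
    return 100.0 * (mean_prediction - outcome_rate)


# Predicting 60% on events that happen half the time: about +10 pp.
bias = bias_pp([0.6, 0.6, 0.6, 0.6], [1, 1, 0, 0])
print(round(bias, 1))
```

Note that bias only captures the average direction of error. A model can have 0 pp bias and still score a poor Brier score if its errors cancel out, which is why both numbers are shown.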
Data source: anonymized prediction-outcome pairs from every FreshLoop user whose daemon is set to share calibration data (sharing is on by default and can be turned off). No user IDs are stored with these samples. See privacy policy.