Teaching AI models to say “I’m not sure” ↗
Researchers at MIT CSAIL developed RLCR (Reinforcement Learning with Calibration Rewards), which trains models to report a calibrated confidence estimate alongside each answer. RLCR adds a Brier score term to the reward, penalizing mismatch between stated confidence and actual correctness. In experiments with 7B-parameter models, the method improved calibration by up to 90% with no loss of accuracy, generalized across both seen and unseen benchmarks, and outperformed post-hoc confidence methods.
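To make the reward shape concrete, here is a minimal sketch in Python. The function name and the exact way the correctness and Brier terms are combined are illustrative assumptions, not the paper's implementation; the key idea, as described above, is that the Brier penalty `(confidence - correctness)^2` is smallest when stated confidence tracks actual accuracy.

```python
def rlcr_reward(correct: bool, confidence: float) -> float:
    """Sketch of a calibration-aware reward: correctness minus a Brier penalty.

    correct:    whether the model's answer was right
    confidence: the model's stated confidence, in [0, 1]
    """
    y = 1.0 if correct else 0.0      # binary correctness reward
    brier = (confidence - y) ** 2    # penalty for miscalibrated confidence
    return y - brier

# A calibrated, correct answer earns nearly the full reward ...
print(rlcr_reward(correct=True, confidence=0.9))    # ~0.99
# ... while a confidently wrong answer is penalized.
print(rlcr_reward(correct=False, confidence=0.9))   # ~-0.81
```

Because the Brier score is a proper scoring rule, the expected penalty is minimized when the stated confidence equals the answer's true probability of being correct, which is what makes it a natural choice for a calibration term.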