PassBench Leaderboard

Last updated: 2026-05-29
# | Model | AS Score | G-Mean Speedup | Correctness | fast_1

AS Score is the headline metric; higher is better. Eager = 1.000 and TorchInductor = 0.706 are the reference baselines. See the paper for metric definitions.
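The leaderboard does not define its metrics here, so the following is only a rough sketch of how a column like G-Mean Speedup is commonly computed. It assumes the column is the geometric mean of per-task speedups, with speedup taken as baseline time divided by candidate time. The function name `geometric_mean_speedup` and the timings are hypothetical and are not taken from PassBench; the paper remains the authority on the actual definition.

```python
import math


def geometric_mean_speedup(baseline_times, candidate_times):
    """Geometric mean of per-task speedups (baseline time / candidate time).

    Assumption: this mirrors the usual meaning of a "G-Mean Speedup" column;
    it is not confirmed to be PassBench's exact definition.
    """
    if len(baseline_times) != len(candidate_times) or not baseline_times:
        raise ValueError("need two equal-length, non-empty sequences")

    log_sum = 0.0
    for base, cand in zip(baseline_times, candidate_times):
        if base <= 0 or cand <= 0:
            raise ValueError("timings must be positive")
        # Summing logs avoids overflow/underflow from a long running product.
        log_sum += math.log(base / cand)

    return math.exp(log_sum / len(baseline_times))


# Hypothetical timings in milliseconds for two tasks.
eager_ms = [2.0, 8.0]
compiled_ms = [1.0, 2.0]
print(geometric_mean_speedup(eager_ms, compiled_ms))
```

Under this assumption, a value above 1 means the candidate is faster than the baseline on average, and a value below 1 means it is slower. The geometric mean is the usual choice for ratios because a 2x speedup on one task and a 2x slowdown on another average to exactly 1.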