Papers
Topics
Authors
Recent
Search
2000 character limit reached

Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment

Published 5 Mar 2026 in cs.LG and cs.AI | (2603.05739v1)

Abstract: Best-of-N (BoN) sampling is a widely used inference-time alignment method for LLMs, whereby N candidate responses are sampled from a reference model and the one with the highest predicted reward according to a learned reward model is selected. Despite its widespread practical use, recent theoretical work has suggested that it is statistically suboptimal and vulnerable to reward hacking, the process by which models exploit weaknesses in the learned reward model to achieve high estimated reward without genuinely improving performance. We revisit this question under assumptions that more closely reflect practice than that of prior work. In particular, in contradistinction to earlier analyses that focused on expected true reward, which may not be meaningful in many practical settings, we investigate how inference-time alignment affects the win-rate, a pairwise comparison-based metric more closely aligned with how reward models are trained and evaluated in practice. We demonstrate that, under minimal conditions on the quality of the reference model and learned reward model, properly tuned BoN is both computationally and statistically optimal in achieving high win-rate, partially explaining its widespread practical success. Because BoN remains susceptible to reward-hacking in this setting, we propose a simple and practical variant that provably eliminates reward-hacking while maintaining optimal statistical performance. Finally, we show that prior approaches are provably suboptimal when considering win-rate, highlighting the importance of choosing appropriate objectives when analyzing inference-time alignment methods.

Authors (2)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.