Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Abstract

Speculative decoding enforces exact target-distribution matching, which can unnecessarily reduce token acceptance and limit speedups. This work reframes decoding around utility preservation and introduces Pivot-Aware Speculative Decoding, which rejects only critical tokens that materially affect downstream utility. A lightweight pivot-token detector enables higher acceptance with minimal utility loss. Experiments across tasks show up to 2.5x speedup while maintaining comparable task performance.
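The acceptance rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(token, p_target, q_draft)` tuple layout, and the per-token `pivot_flags` (standing in for the output of the pivot-token detector) are all assumptions made for illustration. The sketch also omits the residual resampling step that standard speculative decoding performs after a rejection.

```python
import random

def standard_accept(p_target, q_draft, rng):
    """Standard speculative-decoding test: accept with probability min(1, p/q)."""
    return rng.random() < min(1.0, p_target / q_draft)

def pivot_aware_accept(p_target, q_draft, is_pivot, rng):
    """Hypothetical relaxed rule: only tokens flagged as pivots face the test."""
    if not is_pivot:
        return True  # non-pivot draft tokens are accepted unconditionally
    return standard_accept(p_target, q_draft, rng)

def verify_draft(draft, pivot_flags, rng):
    """Return the accepted prefix of a drafted sequence.

    draft: list of (token, p_target, q_draft) tuples (assumed layout).
    pivot_flags: one bool per draft token, assumed to come from a detector.
    """
    accepted = []
    for (token, p, q), is_pivot in zip(draft, pivot_flags):
        if not pivot_aware_accept(p, q, is_pivot, rng):
            break  # the first rejection ends the accepted prefix
        accepted.append(token)
    return accepted

rng = random.Random(0)
draft = [("a", 0.0, 0.5), ("b", 0.9, 0.5), ("c", 0.0, 0.5), ("d", 0.9, 0.5)]
relaxed = verify_draft(draft, [False, True, True, False], rng)
strict = verify_draft(draft, [True, True, True, True], rng)
```

In this toy input, treating every token as a pivot recovers the standard rule, under which the first token is rejected. With the relaxed rule, the non-pivot first token is kept and verification continues until a pivot token fails the test. Accepting non-pivot tokens unconditionally no longer matches the target distribution exactly, which is the trade-off the abstract frames as utility preservation.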

Publication
NeurIPS 2025 Workshop on Efficient Reasoning
Yavuz Faruk Bakman
PhD Student in Computer Science, Capital One Responsible AI Fellow

My research interests include Trustworthy LLMs, Continual Learning, and Federated Learning.