Reject Only Critical Tokens: Pivot-Aware Speculative Decoding

Amir Ziashahabi, Yavuz Faruk Bakman, Duygu Nur Yaldiz, Mostafa El-Khamy, Sai Praneeth Karimireddy, Salman Avestimehr

November 2025

Abstract

Speculative decoding enforces exact target-distribution matching, which can unnecessarily reduce token acceptance and limit speedups. This work reframes decoding around utility preservation and introduces Pivot-Aware Speculative Decoding, which rejects only critical tokens that materially affect downstream utility. A lightweight pivot-token detector enables higher acceptance with minimal utility loss. Experiments across tasks show up to 2.5x speedup while maintaining comparable task performance.
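To make the acceptance rule concrete, the following is a minimal sketch of how "reject only critical tokens" could look in a greedy verification step. It is an illustration under stated assumptions, not the authors' implementation: the function name `verify_draft`, the `is_pivot` callable standing in for the pivot-token detector, and the greedy (argmax) setting are all hypothetical choices made here. The abstract does not specify the detector's inputs or the exact acceptance criterion.

```python
from typing import Callable, List, Sequence


def verify_draft(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    is_pivot: Callable[[int, int], bool],
) -> List[int]:
    """Return the tokens kept after one verification step.

    draft_tokens[i]  : token proposed by the draft model at position i
    target_tokens[i] : token the target model prefers at position i,
                       given the draft prefix (greedy choice, assumed)
    is_pivot(i, tok) : hypothetical detector; True when accepting draft
                       token `tok` at position i is judged to hurt utility
    """
    kept: List[int] = []
    for i, (d, t) in enumerate(zip(draft_tokens, target_tokens)):
        if d == t or not is_pivot(i, d):
            # Either the models agree, or the mismatch is judged harmless:
            # keep the draft token and continue.
            kept.append(d)
        else:
            # Critical mismatch: substitute the target's token and stop,
            # as in a standard speculative-decoding rejection.
            kept.append(t)
            break
    return kept


# Toy usage: the draft disagrees at positions 1 and 3,
# but only position 3 is flagged as a pivot.
draft = [1, 2, 3, 4]
target = [1, 9, 3, 8]
print(verify_draft(draft, target, lambda i, tok: i == 3))
```

With a detector that flags every position, the same function reduces to exact greedy matching and stops at the first disagreement, which is the stricter behavior the abstract contrasts against.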

Type

Conference paper

Publication

NeurIPS 2025 Workshop on Efficient Reasoning

Yavuz Faruk Bakman

PhD Student in Computer Science, Capital One Responsible AI Fellow