Trustworthy LLM

TruthTorchLM: A Comprehensive Library for Predicting Truthfulness in LLM Outputs

TruthTorchLM is an open-source library for estimating the truthfulness of LLM outputs. It unifies over 30 truthfulness methods spanning …

Duygu Nur Yaldiz, Yavuz Faruk Bakman, Sungmin Kang, Alperen Ozis, Hayrettin Eren Yildiz, Mitash Ashish Shah, Zhiqi Huang, Anoop Kumar, Alfy Samuel, Daben Liu, Sai Praneeth Karimireddy, Salman Avestimehr

Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering

We study uncertainty quantification for contextual question answering and propose a principled epistemic uncertainty measure derived …

Yavuz Faruk Bakman, Sungmin Kang, Zhiqi Huang, Duygu Nur Yaldiz, Catarina Belem, Chenyang Zhu, Anoop Kumar, Alfy Samuel, Daben Liu, Sai Praneeth Karimireddy, Salman Avestimehr

Reconsidering LLM Uncertainty Estimation Methods in the Wild

This paper studies practical deployment challenges for LLM uncertainty estimation beyond standard short-form QA evaluation. It analyzes …

Yavuz Faruk Bakman, Duygu Nur Yaldiz, Sungmin Kang, Tuo Zhang, Baturalp Buyukates, Salman Avestimehr, Sai Praneeth Karimireddy

Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs

In this work, we introduce the Learnable Response Scoring Function (LARS) for Uncertainty Estimation (UE) in generative Large Language …

Duygu Nur Yaldiz, Yavuz Faruk Bakman, Baturalp Buyukates, Anil Ramakrishna, Chenyang Tao, Dimitrios Dimitriadis, Jieyu Zhao, Salman Avestimehr

MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

Generative Large Language Models (LLMs) are widely utilized for their excellence in various tasks. However, their tendency to produce …

Yavuz Faruk Bakman, Duygu Nur Yaldiz, Baturalp Buyukates, Chenyang Tao, Dimitrios Dimitriadis, Salman Avestimehr