arXiv 2306.03341
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
By Kenneth Li, Oam Patel, et al.
Published 2023-06-06
We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, I…
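To make the core idea concrete, the following is a minimal NumPy sketch of shifting activations along per-head directions for a small subset of attention heads. It is an illustration under stated assumptions, not the authors' implementation: the function name `shift_head_activations`, the strength parameter `alpha`, the array shapes, and the random placeholder data are all hypothetical, and the sketch does not show how the directions or the head subset would be chosen.

```python
import numpy as np

def shift_head_activations(head_acts, directions, selected_heads, alpha=1.0):
    """Shift the activations of selected heads along their directions.

    head_acts:      array of shape (num_heads, head_dim), one vector per head.
    directions:     array of shape (num_heads, head_dim), one direction per head.
    selected_heads: indices of the (few) heads to intervene on.
    alpha:          hypothetical strength parameter; size of the shift.
    """
    shifted = head_acts.copy()  # leave the caller's array untouched
    for h in selected_heads:
        # Normalize so that alpha alone controls the magnitude of the shift.
        unit = directions[h] / np.linalg.norm(directions[h])
        shifted[h] += alpha * unit
    return shifted

# Placeholder data standing in for real model activations (assumption).
rng = np.random.default_rng(0)
head_acts = rng.normal(size=(8, 16))    # 8 heads, 16 dimensions each
directions = rng.normal(size=(8, 16))   # one illustrative direction per head
selected_heads = [1, 5]                 # intervene on only two of the heads

shifted = shift_head_activations(head_acts, directions, selected_heads, alpha=2.0)
```

Only the heads listed in `selected_heads` change; all other heads pass through unmodified, which mirrors the abstract's description of intervening on a limited number of attention heads.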