arXiv 2209.11895
In-context Learning and Induction Heads
By Catherine Olsson, Nelson Elhage, et al.
Published 2022-09-24
Wiki summary
"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop…
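The completion rule [A][B] ... [A] -> [B] described in the abstract can be illustrated at the level of token sequences. The following is a minimal sketch, not the paper's method: it assumes exact token matching and copies whatever followed the most recent earlier occurrence of the current token. The function name `induction_predict` is hypothetical, and a real induction head realizes this behavior through learned attention patterns rather than an explicit search.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Symbolic stand-in for the [A][B] ... [A] -> [B] rule.

    Hypothetical helper for illustration only: looks for the most recent
    earlier occurrence of the final token and returns the token that
    followed it, or None if the final token has not appeared before.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the final one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # Copy the token that followed the earlier match.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_predict(["A", "B", "C", "A"]))
```

With the sequence `["A", "B", "C", "A"]`, the final `A` matches the first token, so the sketch returns the token that followed it, `B`. When the final token has no earlier occurrence, there is nothing to copy and the sketch returns `None`.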