arXiv 2209.11895

In-context Learning and Induction Heads

By Catherine Olsson, Nelson Elhage, et al.

Published 2022-09-24

Wiki summary


"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop…
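The [A][B] ... [A] -> [B] completion rule described above can be illustrated with a minimal sketch. This is plain Python over a token list, not the attention-head mechanism itself and not code from the paper. The function name `induction_complete` and the choice to use the most recent earlier match are assumptions made for illustration.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def induction_complete(tokens: Sequence[T]) -> Optional[T]:
    """Guess the next token by finding an earlier copy of the last token
    and returning whatever followed it (hypothetical helper)."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest (an assumed tie-break).
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # tokens[i] plays the role of the earlier [A]; tokens[i + 1] is [B].
            return tokens[i + 1]
    # No earlier occurrence of the current token, so the rule makes no guess.
    return None


if __name__ == "__main__":
    print(induction_complete(["A", "B", "C", "D", "A"]))  # prints B
```

A real induction head realizes this behaviour through attention over the context rather than an explicit loop; the sketch only captures the input-output pattern.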
