arXiv 2510.13900
Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences
By Julian Minder, Clément Dumas, et al.
Published 2025-10-14
Finetuning on narrow domains has become an essential tool to adapt Large Language Models (LLMs) to specific tasks and to create models with known unusual properties that are useful for research. We show that narrow finetuning creates strong biases in LLM activations that can be interpreted to understand the finetuning domain. These biases can be discovered using simple tools from model diffing - the study of differe…