arXiv 2510.13900

Narrow Finetuning Leaves Clearly Readable Traces in Activation Differences

By Julian Minder, Clément Dumas, et al.

Published 2025-10-14

Finetuning on narrow domains has become an essential tool to adapt Large Language Models (LLMs) to specific tasks and to create models with known unusual properties that are useful for research. We show that narrow finetuning creates strong biases in LLM activations that can be interpreted to understand the finetuning domain. These biases can be discovered using simple tools from model diffing - the study of differe…
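As a rough illustration of what an "activation difference" between two models can look like, the sketch below averages the per-dimension difference between paired activation vectors from a base model and a finetuned model. This is a minimal sketch under stated assumptions, not the authors' pipeline: the activations are assumed to be already extracted from the same layer on the same inputs, the helper names `mean_activation_difference` and `l2_norm` are hypothetical, and the numbers are toy values rather than real model outputs.

```python
import math
from typing import List

Vector = List[float]


def mean_activation_difference(base_acts: List[Vector],
                               finetuned_acts: List[Vector]) -> Vector:
    """Average (finetuned - base) over paired activation vectors.

    Assumption: base_acts[i] and finetuned_acts[i] come from the same
    input, token position, and layer in the two models.
    """
    if not base_acts or len(base_acts) != len(finetuned_acts):
        raise ValueError("need two non-empty lists of equal length")
    dim = len(base_acts[0])
    total = [0.0] * dim
    for base_vec, ft_vec in zip(base_acts, finetuned_acts):
        if len(base_vec) != dim or len(ft_vec) != dim:
            raise ValueError("all activation vectors must share one dimension")
        for i in range(dim):
            total[i] += ft_vec[i] - base_vec[i]
    return [t / len(base_acts) for t in total]


def l2_norm(vec: Vector) -> float:
    """Euclidean length of a vector, a crude measure of how large the shift is."""
    return math.sqrt(sum(x * x for x in vec))


# Toy activations (hypothetical values): the finetuned model is shifted
# along the first hidden dimension only.
base = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]
finetuned = [[1.5, 0.0, 2.0], [0.5, 1.0, 2.0]]

diff = mean_activation_difference(base, finetuned)
print(diff, l2_norm(diff))
```

In this toy case the averaged difference is nonzero only in the dimension where the finetuned activations were shifted, which is the kind of consistent bias the abstract describes as interpretable.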
