arXiv 2510.27254

Languages are Modalities: Cross-Lingual Alignment via Encoder Injection

By Rajan Agarwal and Aarush Gupta

Published 2025-10-31


Instruction-tuned Large Language Models (LLMs) underperform on low-resource, non-Latin scripts due to tokenizer fragmentation and weak cross-lingual coupling. We present LLINK (Latent Language Injection for Non-English Knowledge), a compute-efficient language-as-modality method that conditions an instruction-tuned decoder without changing the tokenizer or retraining the decoder. First, we align sentence embeddings f…
