arXiv 2510.27254

Languages are Modalities: Cross-Lingual Alignment via Encoder Injection

By Rajan Agarwal and Aarush Gupta

Published 2025-10-31


Instruction-tuned Large Language Models (LLMs) underperform on low-resource, non-Latin scripts due to tokenizer fragmentation and weak cross-lingual coupling. We present LLINK (Latent Language Injection for Non-English Knowledge), a compute-efficient language-as-modality method that conditions an instruction-tuned decoder without changing the tokenizer or retraining the decoder. First, we align sentence embeddings f…
