arXiv 2509.04476

Training Text-to-Molecule Models with Context-Aware Tokenization

By Seojin Kim, Hyeontae Song, et al.

Published 2025-08-30

Recently, text-to-molecule models have shown great potential across various chemical applications, e.g., drug discovery. These models adapt language models to molecular data by representing molecules as sequences of atoms. However, they rely on atom-level tokenizations, which primarily focus on modeling local connectivity, thereby limiting the ability of models to capture the global structural context within molecul…
