arXiv 2509.04476
Training Text-to-Molecule Models with Context-Aware Tokenization
By Seojin Kim, Hyeontae Song, et al.
Published 2025-08-30
Recently, text-to-molecule models have shown great potential across various chemical applications, e.g., drug discovery. These models adapt language models to molecular data by representing molecules as sequences of atoms. However, they rely on atom-level tokenizations, which primarily focus on modeling local connectivity, thereby limiting the ability of models to capture the global structural context within molecul…