arXiv 2512.15567

Evaluating Large Language Models in Scientific Discovery

By Zhangde Song, Jieyu Lu, et al.

Published 2025-12-17

Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define researc…
