arXiv 2512.15567
Evaluating Large Language Models in Scientific Discovery
By Zhangde Song, Jieyu Lu, et al.
Published 2025-12-17
Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define researc…