arXiv 2512.15567
Evaluating Large Language Models in Scientific Discovery
By Zhangde Song, Jieyu Lu, et al.
Published 2025-12-17
Large language models (LLMs) are increasingly applied to scientific research, yet prevailing science benchmarks probe decontextualized knowledge and overlook the iterative reasoning, hypothesis generation, and observation interpretation that drive scientific discovery. We introduce a scenario-grounded benchmark that evaluates LLMs across biology, chemistry, materials, and physics, where domain experts define researc…