arXiv 2502.06867
Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests
By David Noever and Forrest McKee
Published 2025-02-08
Discussion
Read the public discussion and references gathered around this paper.
The development of robust safety benchmarks for large language models requires open, reproducible datasets that can measure both appropriate refusal of harmful content and potential over-restriction of legitimate scientific discourse. We present an open-source dataset and testing framework for evaluating LLM safety mechanisms across mainly controlled substance queries, analyzing four major models' responses to syste…