arXiv 2502.06867
Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests
By David Noever and Forrest McKee
Published 2025-02-08
The development of robust safety benchmarks for large language models requires open, reproducible datasets that can measure both appropriate refusal of harmful content and potential over-restriction of legitimate scientific discourse. We present an open-source dataset and testing framework for evaluating LLM safety mechanisms, mainly on controlled-substance queries, analyzing four major models' responses to syste…