arXiv 2502.06867

Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests

By David Noever and Forrest McKee

Published 2025-02-08

The development of robust safety benchmarks for large language models requires open, reproducible datasets that can measure both appropriate refusal of harmful content and potential over-restriction of legitimate scientific discourse. We present an open-source dataset and testing framework for evaluating LLM safety mechanisms, primarily on controlled-substance queries, analyzing four major models' responses to syste…
