arXiv 2502.06867

Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests

By David Noever and Forrest McKee

Published 2025-02-08

The development of robust safety benchmarks for large language models requires open, reproducible datasets that can measure both appropriate refusal of harmful content and potential over-restriction of legitimate scientific discourse. We present an open-source dataset and testing framework for evaluating LLM safety mechanisms, mainly on controlled-substance queries, analyzing four major models' responses to syste…
