arXiv 2508.03153
Estimating Worst-Case Frontier Risks of Open-Weight LLMs
By Eric Wallace, Olivia Watkins, et al.
Published 2025-08-05
In this paper, we study the worst-case frontier risks of releasing gpt-oss. We introduce malicious fine-tuning (MFT), where we attempt to elicit maximum capabilities by fine-tuning gpt-oss to be as capable as possible in two domains: biology and cybersecurity. To maximize biological risk (biorisk), we curate tasks related to threat creation and train gpt-oss in an RL environment with web browsing. To maximize cybers…