arXiv 2603.11214

Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

By Linus Folkerts, Will Payne, et al.

Published 2026-03-11

We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges (a 32-step corporate network attack and a 7-step industrial control system attack) that require chaining heterogeneous capabilities across extended action sequences. By comparing seven models released over an eighteen-month period (August 2024 to February 2026) at varying inference-time compute budgets, we obse…
