The “Bubble” of Risk: Improving Assessments for Offensive Cybersecurity Agents
Freedom to Tinker 2025-07-22
Summary:
Authored by Boyi Wei
Most frontier models today undergo some form of safety testing, including tests of whether they can help adversaries launch costly cyberattacks. But many of these assessments overlook a critical factor: adversaries can adapt and modify models in ways that expand the risk far beyond the perceived safety profile that static evaluations capture. At […]
The post The “Bubble” of Risk: Improving Assessments for Offensive Cybersecurity Agents appeared first on CITP Blog.