OpenAI and Paradigm introduced EVMbench, a benchmark designed to test AI agents against real smart contract vulnerabilities. The initiative reflects a growing recognition that artificial intelligence will accelerate both crypto exploits and the defensive auditing meant to stop them.
In a joint effort announced Wednesday, the firms said EVMbench evaluates an agent’s ability to detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine environments. The dataset draws from 120 curated vulnerabilities across 40 audits, including sponsored competitions and reviews tied to Tempo, the Layer 1 blockchain co-developed by Paradigm and Stripe. The benchmark's capability modes correspond to those tasks: detecting vulnerabilities, modifying contracts to patch them, and carrying out simulated fund-draining attacks in sandboxed environments.
Can AI Agents Secure Smart Contracts Before Attackers Exploit Them?
The release follows a series of DeFi security incidents. This month, lending protocol Moonwell suffered losses tied to vulnerable code written with AI assistance, while cross-chain liquidity protocol CrossCurve lost roughly $3 million in an exploit spanning multiple networks. According to OpenAI, measuring AI performance in “economically meaningful environments” is essential as models improve at reading and executing code.
“Smart contracts secure billions of dollars in assets, and AI agents are likely to be transformative for both attackers and defenders,” OpenAI wrote in a blog post.
The company added that encouraging defensive use of AI systems is increasingly important as agent capabilities expand. The framing positions EVMbench not just as a research tool but as a safeguard for onchain capital.
Competitive pressure is intensifying. Last year, Anthropic argued that AI agents had already progressed enough to identify smart contract vulnerabilities, potentially lowering the cost of exploits. Paradigm, traditionally focused on crypto investments, has also broadened its mandate to include frontier technologies such as artificial intelligence.
The key variable will be whether security teams adopt AI agents as standard auditing infrastructure rather than experimental tools. The next milestone to watch is independent testing that quantifies whether AI-driven audits reduce exploit frequency across live DeFi deployments.