Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense
The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles.
We introduce Next-Gen CAPTCHAs, a scalable defense framework comprising 27 newly designed CAPTCHA families for the GUI-agent era, built to secure the next-generation web against advanced agents. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between human users and artificial agents.
Figure: Performance comparison across frontier AI models on our benchmark (Pass@1 accuracy on 519 Next-Gen CAPTCHA puzzles)
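To make the reported metric concrete, the following is a minimal Python sketch of how Pass@1 accuracy could be computed, assuming Pass@1 means the fraction of puzzles solved on a single first attempt. The function names, the family labels, and the toy results are hypothetical placeholders for illustration; they are not taken from the benchmark or its evaluation code.

```python
from collections import defaultdict


def pass_at_1(results):
    """Overall Pass@1: share of puzzles solved on the first attempt.

    `results` is a list of (family, solved) pairs, one per puzzle,
    where `solved` records the outcome of the single first attempt.
    """
    if not results:
        raise ValueError("results must contain at least one puzzle")
    return sum(1 for _, solved in results if solved) / len(results)


def pass_at_1_by_family(results):
    """Pass@1 broken down per CAPTCHA family."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for family, solved in results:
        totals[family] += 1
        passed[family] += int(solved)
    return {family: passed[family] / totals[family] for family in totals}


# Hypothetical toy data: two made-up families with two puzzles each.
toy_results = [
    ("family_a", True),
    ("family_a", False),
    ("family_b", False),
    ("family_b", False),
]

overall = pass_at_1(toy_results)
per_family = pass_at_1_by_family(toy_results)
```

A per-family breakdown like this can show whether an agent's failures concentrate in particular task types rather than being spread evenly across the benchmark.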
Our CAPTCHAs target 5 fundamental human-agent gaps
Observation interpretation and grounding under partial observability
Multi-step evidence accumulation from motion and sequential reveals
Decision-boundary sensitivity to discrete quantities and counts
Working-memory consistency across interaction steps
Robust low-level execution of correct browser interactions
27 novel CAPTCHA families designed to defend against GUI agents
@article{liu2026nextgen,
  title={Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for
         Scalable and Diverse GUI-Agent Defense},
  author={Liu, Jiacheng and Luo, Yaxin and Cui, Jiacheng and
          Shang, Xinyi and Zhao, Xiaohan and Shen, Zhiqiang},
  year={2026}
}