Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense
The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles.
We introduce Next-Gen CAPTCHAs, a scalable defense framework comprising 27 newly designed CAPTCHA families for the GUI-agent era, built to secure the next-generation web against advanced agents. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents.
Performance comparison across frontier AI models on our benchmark†
† The benchmark contains 519 puzzles in total; due to inference cost constraints, GPT-5.2-xHigh and Claude-Opus-4.5 were evaluated on a 135-puzzle subset.
Pass@1 accuracy on Next-Gen CAPTCHA puzzles†
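As a rough illustration of the metric named above, Pass@1 can be read as the fraction of puzzles an agent solves on its first and only attempt. The sketch below assumes each evaluation record is a `(family, solved)` pair; the function names, the record layout, and the sample data are hypothetical and are not taken from the benchmark's released code.

```python
from collections import defaultdict


def pass_at_1(results):
    """Fraction of puzzles solved on the first attempt.

    `results` is an iterable of (family, solved) pairs, one per puzzle.
    This record layout is an assumption made for illustration.
    """
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    return sum(1 for _, solved in results if solved) / len(results)


def pass_at_1_by_family(results):
    """Pass@1 broken down per CAPTCHA family."""
    totals = defaultdict(lambda: [0, 0])  # family -> [solved, attempted]
    for family, solved in results:
        totals[family][0] += int(solved)
        totals[family][1] += 1
    return {family: s / n for family, (s, n) in totals.items()}


# Hypothetical first-attempt outcomes for two placeholder families.
sample = [
    ("family_a", True),
    ("family_a", False),
    ("family_b", False),
    ("family_b", False),
]
overall = pass_at_1(sample)
per_family = pass_at_1_by_family(sample)
```

When a model is scored on a subset (such as the 135-puzzle subset mentioned in the footnote), the denominator is the subset size, so its score is not directly comparable with scores computed over all 519 puzzles unless the subset is representative.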
Experience Next-Gen CAPTCHAs yourself (interactive demo on Hugging Face Spaces)
Our CAPTCHAs target 5 fundamental human-agent gaps
Observation interpretation and grounding under partial observability
Multi-step evidence accumulation from motion and sequential reveals
Decision-boundary sensitivity to discrete quantities and counts
Working-memory consistency across interaction steps
Robust low-level execution of correct browser interactions
27 novel CAPTCHA families designed to defend against GUI agents
@misc{liu2026nextgencaptchasleveragingcognitive,
title={Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense},
author={Jiacheng Liu and Yaxin Luo and Jiacheng Cui and Xinyi Shang and Xiaohan Zhao and Zhiqiang Shen},
year={2026},
eprint={2602.09012},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2602.09012},
}