Next-Gen CAPTCHAs

Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense

Jiacheng Liu*¹, Yaxin Luo*¹, Jiacheng Cui¹, Xinyi Shang¹,², Xiaohan Zhao¹, Zhiqiang Shen¹
¹ Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), ² University College London, * Equal contribution
Human Pass@1: 98.8% (31 s average)
Best AI Pass@1: 5.9% (GPT-5.2-xHigh)
Gap: 92.9 percentage points

Introducing Next-Gen CAPTCHAs

The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. Previous benchmarks established a baseline for evaluating multimodal agents, but recent reasoning-heavy models have effectively collapsed this security barrier, reaching pass rates as high as 90% on complex logic puzzles.

We introduce Next-Gen CAPTCHAs, a scalable defense framework with 27 newly designed CAPTCHA families built for the GUI-agent era and intended to secure the next-generation web against advanced agents. The framework exploits the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents.

Key Results

Performance comparison across frontier AI models on our benchmark

98.8% Human
5.9% GPT-5.2-xHigh (OpenAI; xHigh Thinking Mode, Max level)
3.2% Gemini-3-Flash (Google; High Thinking Mode)
3.0% Claude-Opus-4.5 (Anthropic; Extended High Thinking Mode)
1.3% Gemini-3-Pro (Google; High Thinking Mode)
1.3% Doubao-Seed-1.8 (ByteDance Doubao; High Thinking Mode)
0.9% Qwen3-VL-Plus (Alibaba; High Thinking Mode)

The benchmark contains 519 puzzles in total; due to inference cost constraints, GPT-5.2-xHigh and Claude-Opus-4.5 were evaluated on a 135-puzzle subset.
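Because two of the models were scored on only 135 puzzles, their Pass@1 figures carry noticeable sampling uncertainty. The sketch below shows one way to compute Pass@1 and a Wilson score interval for it. It is not the authors' evaluation code. The function names are hypothetical, and the page does not report raw solved counts: the count of 8 solved puzzles in the usage line is an illustrative value that happens to round to 5.9% on a 135-puzzle subset.

```python
import math


def pass_at_1(solved: int, attempted: int) -> float:
    """Fraction of puzzles solved on the first (and only) attempt."""
    if attempted <= 0 or not 0 <= solved <= attempted:
        raise ValueError("need 0 <= solved <= attempted and attempted > 0")
    return solved / attempted


def wilson_interval(solved: int, attempted: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = pass_at_1(solved, attempted)
    z2 = z * z
    denom = 1 + z2 / attempted
    center = (p + z2 / (2 * attempted)) / denom
    half = z * math.sqrt(p * (1 - p) / attempted
                         + z2 / (4 * attempted ** 2)) / denom
    return center - half, center + half


# Illustrative count only (assumption): 8 of 135 rounds to 5.9%.
rate = pass_at_1(8, 135)
low, high = wilson_interval(8, 135)
print(f"Pass@1 = {rate:.1%}, 95% interval = [{low:.1%}, {high:.1%}]")
```

With a subset this small, the interval spans several percentage points, so small differences between low-scoring models should be read with caution.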

Economic Asymmetry
The Browser-Use agent backed by GPT-5.2-xHigh spent $6.02 per puzzle to achieve a success rate of only 5.9%, while humans solve each puzzle in 31 seconds on average at no cost.
Time Barrier
High-reasoning models require 16-77 minutes per puzzle, versus sub-minute solve times for humans.
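The per-puzzle figures understate the asymmetry, because most attempts fail. A rough sketch of the arithmetic follows. It assumes each attempt is independent with the same success probability and that an attacker retries until success, so the expected number of attempts is 1 / success rate. The page states neither assumption, and the function names are hypothetical.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to obtain one solved puzzle under independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def slowdown_vs_human(agent_minutes: float, human_seconds: float) -> float:
    """How many times longer one agent attempt takes than one human solve."""
    if human_seconds <= 0:
        raise ValueError("human_seconds must be positive")
    return agent_minutes * 60 / human_seconds


# Figures taken from the page: $6.02 per puzzle, 5.9% success, 31 s human average.
cost = expected_cost_per_success(6.02, 0.059)
print(f"Expected cost per solved puzzle: ${cost:.2f}")

# The 16-77 minute range is reported across high-reasoning models.
for minutes in (16, 77):
    print(f"{minutes} min per attempt is {slowdown_vs_human(minutes, 31):.0f}x the human time")
```

Under these assumptions, the expected cost of one successful solve is roughly $102, and a single agent attempt takes about 31 to 149 times as long as a human solve.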

Leaderboard

Pass@1 accuracy on Next-Gen CAPTCHA puzzles

Rank | Model | Organization | Solving Rate

Try the Demo

Experience Next-Gen CAPTCHAs yourself

Launch Interactive Demo (opens in Hugging Face Spaces)

Cognitive Gap Categories

Our CAPTCHAs target 5 fundamental human-agent gaps

G1: Scene-Structure Inference

Observation interpretation and grounding under partial observability

Example families: Mirror, Shadow Direction, 3D Viewpoint, Backmost Layer

G2: Temporal Integration

Multi-step evidence accumulation from motion and sequential reveals

Example families: Spooky Circle, Structure From Motion, Trajectory Recovery

G3: Numerosity & Invariants

Decision-boundary sensitivity to discrete quantities and counts

Example families: Hole Counting, Color Counting, Subway Paths

G4: Latent-State Tracking

Working-memory consistency across interaction steps

Example families: Dice Roll Path, Box Folding, Temporal Object Continuity

G5: Perception-to-Action

Robust low-level execution of correct browser interactions

Example families: Static Jigsaw, Dynamic Jigsaw, Red Dot
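For anyone tabulating results per gap category, the taxonomy above could be encoded as a small lookup structure. The sketch below is an assumption about how one might represent it, not part of the released benchmark. The class and function names are hypothetical, and it covers only the 16 families named on this page, not all 27.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GapCategory:
    code: str
    name: str
    example_families: Tuple[str, ...]


# Only the families named on this page; the full benchmark has 27.
GAPS = (
    GapCategory("G1", "Scene-Structure Inference",
                ("Mirror", "Shadow Direction", "3D Viewpoint", "Backmost Layer")),
    GapCategory("G2", "Temporal Integration",
                ("Spooky Circle", "Structure From Motion", "Trajectory Recovery")),
    GapCategory("G3", "Numerosity & Invariants",
                ("Hole Counting", "Color Counting", "Subway Paths")),
    GapCategory("G4", "Latent-State Tracking",
                ("Dice Roll Path", "Box Folding", "Temporal Object Continuity")),
    GapCategory("G5", "Perception-to-Action",
                ("Static Jigsaw", "Dynamic Jigsaw", "Red Dot")),
)

# Reverse index: family name -> gap code.
FAMILY_TO_GAP: Dict[str, str] = {
    family: gap.code for gap in GAPS for family in gap.example_families
}


def gap_for_family(family: str) -> str:
    """Return the gap code for a listed family, or raise KeyError if unlisted."""
    return FAMILY_TO_GAP[family]


print(gap_for_family("Box Folding"))
```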

Citation

@misc{liu2026nextgencaptchasleveragingcognitive,
  title={Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense},
  author={Jiacheng Liu and Yaxin Luo and Jiacheng Cui and Xinyi Shang and Xiaohan Zhao and Zhiqiang Shen},
  year={2026},
  eprint={2602.09012},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.09012},
}