Next-Gen CAPTCHAs

Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense

Jiacheng Liu*1 Yaxin Luo*1 Jiacheng Cui1 Xinyi Shang1,2 Xiaohan Zhao1 Zhiqiang Shen1
1Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) 2University College London * Equal Contribution
Human Pass@1: 98.8% (31s avg)
Best AI Pass@1: 5.9% (GPT-5.2-xHigh)
Gap: 92.9%

Introducing Next-Gen CAPTCHAs

The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks established a baseline for evaluating multimodal agents, recent advancements in reasoning-heavy models have effectively collapsed this security barrier, achieving pass rates as high as 90% on complex logic puzzles.

We introduce Next-Gen CAPTCHAs, a scalable defense framework comprising 27 newly designed CAPTCHA families for the GUI-agent era, built to secure the next-generation web against advanced agents. We exploit the persistent human-agent "Cognitive Gap" in interactive perception, memory, decision-making, and action. By engineering dynamic tasks that require adaptive intuition rather than granular planning, we re-establish a robust distinction between biological users and artificial agents.

Key Results

Performance comparison across frontier AI models on our benchmark (519 puzzles)

98.8%  Human
5.9%   GPT-5.2-xHigh
3.2%   Gemini-3-Flash
3.0%   Claude-Opus4.5
1.3%   Gemini-3-Pro
1.3%   Doubao-Seed-1.8
0.9%   Qwen3-VL-Plus

Organizations and reasoning settings:
OpenAI (xHigh Thinking Mode, Max level)
Google (High Thinking Mode)
Anthropic (Extended High Thinking Mode)
ByteDance Doubao (High Thinking Mode)
Alibaba (High Thinking Mode)

Economic Asymmetry
GPT-5.2-xHigh spent $3,122 to achieve a success rate of only 5.9%, while humans solve a puzzle in 31 seconds on average at no cost.

Time Barrier
High-reasoning models require 16-77 minutes per puzzle, versus humans' sub-minute performance.
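
The arithmetic behind these two claims can be sketched as follows. This is a rough back-of-the-envelope calculation, not the authors' accounting: it assumes the $3,122 figure covers one full run over all 519 puzzles, and the names `summarize`, `cost_per_attempt`, and `cost_per_solve` are illustrative only.

```python
# Figures quoted on this page.
HUMAN_PASS = 98.8         # human Pass@1, in percent
BEST_AI_PASS = 5.9        # GPT-5.2-xHigh Pass@1, in percent
TOTAL_COST_USD = 3122.0   # ASSUMPTION: total spend for one run over all puzzles
NUM_PUZZLES = 519
HUMAN_SECONDS = 31        # average human solve time
AI_MINUTES = (16, 77)     # reported range for high-reasoning models


def summarize():
    """Derive the gap, approximate costs, and slowdown from the quoted figures."""
    gap = round(HUMAN_PASS - BEST_AI_PASS, 1)
    cost_per_attempt = TOTAL_COST_USD / NUM_PUZZLES
    # Expected number of solved puzzles implied by the pass rate (approximate).
    expected_solves = BEST_AI_PASS / 100 * NUM_PUZZLES
    cost_per_solve = TOTAL_COST_USD / expected_solves
    # How many times slower than a human, at both ends of the reported range.
    slowdown = tuple(m * 60 / HUMAN_SECONDS for m in AI_MINUTES)
    return {
        "gap": gap,
        "cost_per_attempt": cost_per_attempt,
        "cost_per_solve": cost_per_solve,
        "slowdown": slowdown,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Under that assumption, the model pays roughly $6 per attempted puzzle and on the order of $100 per solved puzzle, and is roughly 30 to 150 times slower than a human.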

Leaderboard

Pass@1 accuracy on 519 Next-Gen CAPTCHA puzzles
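
For readers unfamiliar with the metric, Pass@1 is the percentage of puzzles solved on the first attempt. The sketch below shows one way to compute it and to order models by score. The function names and the four-puzzle toy input are hypothetical; only the `reported` scores come from this page.

```python
def pass_at_1(first_attempt_solved):
    """Percentage of puzzles solved on the first attempt.

    `first_attempt_solved` is a list of booleans, one per puzzle.
    """
    if not first_attempt_solved:
        raise ValueError("need at least one puzzle result")
    return 100.0 * sum(first_attempt_solved) / len(first_attempt_solved)


def leaderboard(scores):
    """Sort (model, score) pairs by score, highest first.

    Python's sort is stable, so tied models keep their input order.
    """
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Pass@1 scores (percent) as reported on this page.
reported = {
    "GPT-5.2-xHigh": 5.9,
    "Gemini-3-Flash": 3.2,
    "Claude-Opus4.5": 3.0,
    "Gemini-3-Pro": 1.3,
    "Doubao-Seed-1.8": 1.3,
    "Qwen3-VL-Plus": 0.9,
}

if __name__ == "__main__":
    # Hypothetical toy run: one of four puzzles solved on the first try.
    print(pass_at_1([True, False, False, False]))
    for model, score in leaderboard(reported):
        print(f"{score:4.1f}%  {model}")
```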


Try the Demo

Experience Next-Gen CAPTCHAs yourself in the interactive demo, hosted on Hugging Face Spaces.

Cognitive Gap Categories

Our CAPTCHAs target 5 fundamental human-agent gaps

G1: Scene-Structure Inference
Observation interpretation and grounding under partial observability
Examples: Mirror, Shadow Direction, 3D Viewpoint, Backmost Layer

G2: Temporal Integration
Multi-step evidence accumulation from motion and sequential reveals
Examples: Spooky Circle, Structure From Motion, Trajectory Recovery

G3: Numerosity & Invariants
Decision-boundary sensitivity to discrete quantities and counts
Examples: Hole Counting, Color Counting, Subway Paths

G4: Latent-State Tracking
Working-memory consistency across interaction steps
Examples: Dice Roll Path, Box Folding, Temporal Object Continuity

G5: Perception-to-Action
Robust low-level execution of correct browser interactions
Examples: Static Jigsaw, Dynamic Jigsaw, Red Dot
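
The taxonomy above can be written as a small lookup table, which is convenient when grouping per-family results by gap. This is only an illustrative sketch: it contains the example families named on this page, not all 27, and the names `COGNITIVE_GAPS` and `gap_of` are hypothetical.

```python
# Gap ID -> (category name, example CAPTCHA families listed on this page).
COGNITIVE_GAPS = {
    "G1": ("Scene-Structure Inference",
           ["Mirror", "Shadow Direction", "3D Viewpoint", "Backmost Layer"]),
    "G2": ("Temporal Integration",
           ["Spooky Circle", "Structure From Motion", "Trajectory Recovery"]),
    "G3": ("Numerosity & Invariants",
           ["Hole Counting", "Color Counting", "Subway Paths"]),
    "G4": ("Latent-State Tracking",
           ["Dice Roll Path", "Box Folding", "Temporal Object Continuity"]),
    "G5": ("Perception-to-Action",
           ["Static Jigsaw", "Dynamic Jigsaw", "Red Dot"]),
}


def gap_of(family):
    """Return the gap ID for a listed CAPTCHA family, or None if it is not listed."""
    for gap_id, (_name, families) in COGNITIVE_GAPS.items():
        if family in families:
            return gap_id
    return None


if __name__ == "__main__":
    print(gap_of("Box Folding"))
```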

Citation

@article{liu2026nextgen,
  title={Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for
         Scalable and Diverse GUI-Agent Defense},
  author={Liu, Jiacheng and Luo, Yaxin and Cui, Jiacheng and
          Shang, Xinyi and Zhao, Xiaohan and Shen, Zhiqiang},
  year={2026}
}