Reasoning

Multi-step Logic Arena

Test LLMs on complex multi-step reasoning chains that demand consistent logical deduction across long contexts.

248 submissions
Coding

Code Synthesis Challenge

Evaluate code generation quality across diverse programming tasks contributed by engineers worldwide.

182 submissions
Knowledge

Domain Knowledge Probe

Expert-crafted questions across science, law, medicine, and finance to expose the precise limits of LLM knowledge.

317 submissions
Math

Olympiad Math Gauntlet

Competition-level mathematics problems sourced from IMO, AIME, and AMC to rigorously test quantitative reasoning.

94 submissions
Language

Cross-lingual Transfer Test

Multilingual tasks that measure how well LLMs transfer knowledge and reasoning abilities across diverse languages.

138 submissions
Multimodal

Vision-Language Benchmark

Paired image-text challenges that test how well vision-language models align visual understanding with precise language generation.

Coming Soon

How it works

Three steps to contribute datasets and benchmark the world's best LLMs

Step 01

Contribute a Dataset

Submit your carefully crafted puzzle or dataset through our community portal. Q&A, code, reasoning chains — all formats welcome.
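
As a rough illustration, a submission could be as simple as the JSON Lines sketch below. The field names (category, prompt, reference_answer) and the file layout are assumptions for illustration only, not the portal's actual submission schema.

```python
import json

# Hypothetical dataset entries. The field names ("category", "prompt",
# "reference_answer") are illustrative assumptions, not the portal's schema.
entries = [
    {
        "category": "reasoning",
        "prompt": "If all bloops are razzies and all razzies are lazzies, are all bloops lazzies?",
        "reference_answer": "Yes",
    },
    {
        "category": "code",
        "prompt": "Write a Python function that returns the n-th Fibonacci number.",
        "reference_answer": "def fib(n): return n if n < 2 else fib(n - 1) + fib(n - 2)",
    },
]

# Write the entries as JSON Lines, a common format for benchmark datasets.
with open("my_dataset.jsonl", "w", encoding="utf-8") as f:
    for entry in entries:
        f.write(json.dumps(entry) + "\n")
```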

Step 02

Run the Benchmark

Your dataset is automatically evaluated across all supported LLMs. Results are scored, ranked, and published in real time.
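
A minimal sketch of the scoring-and-ranking idea, assuming a simple exact-match metric. The model names, outputs, and metric below are made-up assumptions for illustration, not the platform's actual evaluation pipeline.

```python
# Score each model's answers against the references, then rank by score.
references = ["Yes", "4", "Paris"]
model_outputs = {
    "model-a": ["Yes", "4", "Lyon"],
    "model-b": ["No", "4", "Paris"],
}

def exact_match_score(outputs, refs):
    # Fraction of answers matching the reference exactly (case-insensitive).
    hits = sum(o.strip().lower() == r.strip().lower() for o, r in zip(outputs, refs))
    return hits / len(refs)

scores = {name: exact_match_score(outs, references) for name, outs in model_outputs.items()}

# Sort models by score, highest first, to build a simple leaderboard.
leaderboard = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: {score:.2f}")
```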

Step 03

Explore the Leaderboard

Dive into model comparisons, share insights with the community, and watch the best datasets get featured on the front page.

Contribute a Puzzle

Have a challenging dataset or a creative evaluation idea? Join our contributors and help build the world's most community-driven interactive benchmark.

Submit Your Dataset →