Multi-step Logic Arena
Test LLMs on complex multi-step reasoning chains that demand consistent logical deduction across long contexts.
Code Synthesis Challenge
Evaluate code-generation quality across diverse programming tasks contributed by engineers worldwide.
Domain Knowledge Probe
Expert-crafted questions across science, law, medicine, and finance to expose the precise limits of LLM knowledge.
Olympiad Math Gauntlet
Competition-level mathematics problems sourced from the IMO, AIME, and AMC to rigorously test quantitative reasoning.
Cross-lingual Transfer Test
Multilingual tasks that measure how well LLMs transfer knowledge and reasoning abilities across diverse languages.
Vision-Language Benchmark
Paired image-text challenges that test how well vision-language models align visual understanding with precise language generation.