AI Model Benchmark Leaderboard
Category-aware model rankings from local BLXBench runs, grouped by task domain and difficulty level and compared on pass rate and latency.
| Rank | Model | Model ID | Suite | Pass | Score | Latency | tok/s | Cost | Infra |
|---|---|---|---|---|---|---|---|---|---|
| 9 | Grok Build 0.1 | x-ai/grok-build-0.1 | Suite v2 — Resilience | 234/459 | 72.8 | 12.09s | 190.9 | $2.01 | Mandatory thinking |
| 10 | Nemotron 3 Super 120b A12b | nvidia/nemotron-3-super-120b-a12b:free | Suite v2 — Resilience | 222/459 | 72.3 | 11.80s | 6682.6 | $0.00 | |
| 19 | Nemotron 3 Nano 30b A3b | nvidia/nemotron-3-nano-30b-a3b:free | Suite v2 — Resilience | 187/459 | 68.0 | 1.99s | 250.1 | $0.00 | |
| 23 | Ring 2.6 1t | inclusionai/ring-2.6-1t:free | Suite v2 — Resilience | 199/445 | 65.3 | 9.87s | 105.9 | $0.00 | Mandatory thinking |
| 24 | Claude Fable 5 | anthropic/claude-fable-5 | Suite v2 — Resilience | 259/459 | 64.5 | 11.50s | 187.1 | $18.57 | Mandatory thinking |
| 26 | Nemotron 3 Nano Omni 30b A3b Reasoning | nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free | Suite v2 — Resilience | 167/459 | 57.9 | 5.00s | 227.6 | $0.00 | |
| 27 | Step 3.7 Flash | stepfun/step-3.7-flash | Suite v2 — Resilience | 175/459 | 55.2 | 6.14s | 257.3 | $0.74 | Mandatory thinking |
| 28 | Minimax M2.7 | minimax/minimax-m2.7 | Suite v2 — Resilience | 148/459 | 51.0 | 15.91s | 211.8 | $1.04 | Mandatory thinking |
| 29 | Gemini 3.5 Flash | google/gemini-3.5-flash | Suite v2 — Resilience | 108/459 | 41.5 | 6.66s | 338.3 | $5.43 | Mandatory thinking |
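
The Pass figures are written as passed tests over executed tests (for example 234/459), so a raw pass rate and a cost per passed test can be derived from them. The following is a minimal sketch in Python under that reading. The helper names `pass_rate` and `cost_per_pass` are hypothetical and not part of BLXBench, and the three sample rows are copied from the listing. The listed Score does not appear to be the raw pass rate (234/459 is about 51%, while the listed score is 72.8), so it presumably includes weighting that this page does not describe.

```python
def pass_rate(cell: str) -> float:
    """Convert a 'passed/total' cell such as '234/459' to a fraction in [0, 1]."""
    passed, total = cell.split("/")
    return int(passed) / int(total)


def cost_per_pass(cell: str, cost_usd: float) -> float:
    """Estimated API spend per passed test; returns 0.0 when nothing passed."""
    passed = int(cell.split("/")[0])
    return cost_usd / passed if passed else 0.0


# (model ID, Pass cell, Cost in USD) copied from the leaderboard rows.
rows = [
    ("x-ai/grok-build-0.1", "234/459", 2.01),
    ("anthropic/claude-fable-5", "259/459", 18.57),
    ("inclusionai/ring-2.6-1t:free", "199/445", 0.00),
]

# Sort by raw pass rate, highest first, and print a short summary per model.
for model, cell, cost in sorted(rows, key=lambda r: pass_rate(r[1]), reverse=True):
    print(f"{model}: pass rate {pass_rate(cell):.1%}, "
          f"${cost_per_pass(cell, cost):.4f} per passed test")
```

Sorting by raw pass rate gives a different order from the leaderboard's Score ranking, which is one way to see how much the score weighting matters.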