Benchmark run
Started Apr 24, 2026, 9:57 PM · Recorded Apr 24, 2026, 11:06 PM · Ended Apr 24, 2026, 9:58 PM
Per-model aggregates from overall_ranking.json for this run id.
No matching report.json under results/ — charts use ranking or summary only.
Values are read from report.json when the benchmark wrote them.
Discovery
Limited: up to 1 test per category
blxbench argv
tui
Score % vs mean latency where samples exist. Built from per-test rows in report.json when available.
Per-test rows from report.json → results — by category (collapsed by default), then by difficulty. COMPL from details when present. The Visual column is omitted when no test in this run has a details.visual score.
7 tasks in 7 categories · Grouped by category, then by difficulty; row order within each table matches report.json results (benchmark execution order)
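
The grouping described above could be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the row keys `category` and `difficulty` and the function names `group_rows` and `show_visual_column` are assumptions made for the example. Only `details.visual` and the `results` list are named on this page.

```python
from collections import defaultdict


def group_rows(results):
    """Group per-test rows by category, then by difficulty.

    `results` is assumed to be the list under report.json's "results" key,
    with hypothetical "category" and "difficulty" keys on each row.
    Rows are appended in input order, so each table keeps the benchmark
    execution order.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in results:
        grouped[row["category"]][row["difficulty"]].append(row)
    # Convert to plain dicts; insertion order is preserved.
    return {cat: dict(by_diff) for cat, by_diff in grouped.items()}


def show_visual_column(results):
    """Return True only if at least one row carries a details.visual score."""
    return any(
        (row.get("details") or {}).get("visual") is not None
        for row in results
    )
```

Because Python dictionaries and lists preserve insertion order, no extra sorting is needed to keep rows in execution order within each category and difficulty table.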