Benchmark run

run_c83c0d

Started Apr 24, 2026, 9:57 PM · Ended Apr 24, 2026, 9:58 PM · Recorded Apr 24, 2026, 11:06 PM

Blended score: 0.0

Tests: 7

Models: 1

Passed: 0

Failed: 7

Pass rate: 0.0%

Duration: 51.5s

Categories: 7

Est. cost (run): $0.00

Submitted by: Bitslix

Tokens (Σ results): 0 / 0
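The pass rate above appears to be Passed divided by the total of Passed and Failed (0 of 7 in this run). The following is a minimal Python sketch of that arithmetic. It assumes the dashboard rounds to one decimal place, and the function name `pass_rate` is hypothetical, not taken from blxbench.

```python
def pass_rate(passed: int, failed: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place.

    Hypothetical helper: assumes pass rate = passed / (passed + failed).
    """
    total = passed + failed
    if total == 0:
        # No tests ran; report 0% instead of dividing by zero.
        return 0.0
    return round(100.0 * passed / total, 1)


# Figures from this run: 0 passed, 7 failed.
print(f"Pass rate {pass_rate(0, 7):.1f}%")
```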

Models in this run

Per-model aggregates from overall_ranking.json for this run ID.

Model | Tests | Pass | Score | Latency | Cost
deepseek/deepseek-v4-pro | 7 | 0/7 | 0.0% | n/a | $0.00

No matching report.json was found under results/, so the charts use only the ranking or summary data.
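The fallback above can be sketched in Python as follows. This is a minimal sketch that assumes a report "matches" when its top-level `run_id` key equals the run ID. That key and the helper names `find_report` and `chart_source` are hypothetical and do not come from blxbench.

```python
import json
from pathlib import Path
from typing import Optional


def find_report(results_dir: Path, run_id: str) -> Optional[Path]:
    """Return the first report.json under results_dir whose run_id matches.

    Assumption: each report.json stores its run id in a top-level "run_id" key.
    """
    if not results_dir.is_dir():
        return None
    for path in sorted(results_dir.rglob("report.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # Skip unreadable or malformed files.
        if isinstance(data, dict) and data.get("run_id") == run_id:
            return path
    return None


def chart_source(results_dir: Path, run_id: str) -> str:
    """Use per-test report data when available, else fall back to ranking/summary."""
    return "report" if find_report(results_dir, run_id) else "ranking_or_summary"
```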

blxbench & discovery

Values are read from report.json when the benchmark wrote them.

Discovery

Limited: up to 1 test per category

blxbench argv

tui
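Limited discovery keeps at most a fixed number of tests from each category. With 7 categories and a limit of 1, that yields the 7 tests in this run. Below is a minimal sketch of that selection in Python. It assumes each test is a dict with a `category` key and that input order is kept. The helper name `limit_per_category` is hypothetical.

```python
from typing import Dict, List


def limit_per_category(tests: List[dict], max_per_category: int = 1) -> List[dict]:
    """Keep at most max_per_category tests per category, preserving input order.

    Assumption: each test is a dict with a "category" key.
    """
    counts: Dict[str, int] = {}
    selected: List[dict] = []
    for test in tests:
        cat = test["category"]
        if counts.get(cat, 0) < max_per_category:
            selected.append(test)
            counts[cat] = counts.get(cat, 0) + 1
    return selected
```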

Category performance (this run)

Score (%) plotted against mean latency for categories that have latency samples. Built from the per-test rows in report.json when available.

[Charts: by difficulty level · pass vs. fail · tests per category]
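The category chart pairs a score percentage with a mean latency, and only categories with latency samples get a latency value. The following Python sketch shows that aggregation. It assumes each per-test row has `category`, a boolean `passed`, and an optional `latency_s`, and that the score is the share of passing rows. These field names and the helper `category_performance` are hypothetical, not the report.json schema.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


def category_performance(rows: List[dict]) -> Dict[str, Tuple[float, Optional[float]]]:
    """Map each category to (score %, mean latency in seconds, or None).

    Assumptions: rows carry "category", a boolean "passed", and an optional
    "latency_s"; score % is the share of passing rows in the category.
    """
    outcomes: Dict[str, List[bool]] = defaultdict(list)
    latencies: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        cat = row["category"]
        outcomes[cat].append(bool(row["passed"]))
        if row.get("latency_s") is not None:
            latencies[cat].append(float(row["latency_s"]))
    result: Dict[str, Tuple[float, Optional[float]]] = {}
    for cat, results in outcomes.items():
        score = 100.0 * sum(results) / len(results)
        # No latency samples means no value on the latency axis.
        lat = mean(latencies[cat]) if latencies[cat] else None
        result[cat] = (score, lat)
    return result
```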

All task results

Per-test rows from report.json → results, grouped by category (collapsed by default) and then by difficulty. The COMPL column is taken from details when present. The Visual column is omitted when no test in this run has a details.visual score.
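The column rules above can be sketched in Python. In this sketch, the Visual column appears only if at least one result has a `details.visual` value, and COMPL falls back to a placeholder when details lack it. The base column names, the `compl` key inside details, and both helper names are assumptions for illustration only.

```python
from typing import List


def visible_columns(results: List[dict]) -> List[str]:
    """Return table headers, adding Visual only if some result has details.visual.

    The base column names are hypothetical placeholders.
    """
    columns = ["Test", "Difficulty", "Status", "COMPL"]
    has_visual = any(
        isinstance(r.get("details"), dict) and r["details"].get("visual") is not None
        for r in results
    )
    if has_visual:
        columns.append("Visual")
    return columns


def compl_cell(result: dict) -> str:
    """Show the COMPL value from details when present (key name "compl" assumed)."""
    details = result.get("details") or {}
    value = details.get("compl")
    return "n/a" if value is None else str(value)
```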

7 tasks in 7 categories · Grouped by category, then by difficulty; row order within each table matches report.json results (benchmark execution order)
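Grouping by category and then by difficulty, while keeping execution order within each table, can be sketched with insertion-ordered Python dicts. The sketch assumes each result has `category` and `difficulty` keys and that groups are listed in the order first seen. That group ordering and the helper name `group_results` are assumptions.

```python
from typing import Dict, List


def group_results(results: List[dict]) -> Dict[str, Dict[str, List[dict]]]:
    """Group results by category, then by difficulty.

    Python dicts preserve insertion order, so groups appear in first-seen order
    and rows within each group keep benchmark execution order.
    Assumption: each result has "category" and "difficulty" keys.
    """
    grouped: Dict[str, Dict[str, List[dict]]] = {}
    for row in results:
        by_difficulty = grouped.setdefault(row["category"], {})
        by_difficulty.setdefault(row["difficulty"], []).append(row)
    return grouped
```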