BLXBench Docs

TUI

BLXBench interactive Terminal User Interface.

BLXBench features a modern Terminal User Interface (TUI) built with Ink and React. It provides a command-driven experience for configuring providers, selecting models, running benchmarks, and submitting reports.

Install the CLI globally as @bitslix/blxbench (see Installation); the binary name remains blxbench. To upgrade the CLI, use the same global package manager you used the first time — Updating blxbench has npm, pnpm, and Bun examples and explains the optional “newer on npm” info box on the welcome screen.

Starting the TUI

blxbench

Screens

Welcome Screen

When OPENROUTER_API_KEY is not set (after the config merge), the TUI opens a first-run setup screen instead of the usual banner: enter your OpenRouter key once (input is masked) and it is stored in ~/.blxbench/config.json under env, then you continue to sign-in.
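As a rough illustration of the lookup described above, the sketch below reports where a key is defined. It assumes that config.json is a JSON object with a top-level "env" object mapping variable names to strings, and that the process environment is consulted first; the real merge order is not stated here, and locate_key is a hypothetical helper, not part of blxbench.

```python
import json
from pathlib import Path
from typing import Mapping, Optional


def locate_key(name: str, environ: Mapping[str, str], config_path: Path) -> Optional[str]:
    """Return where `name` is defined: 'environment', 'config.json', or None.

    Assumption: config.json looks like {"env": {"NAME": "value", ...}}.
    Assumption: the process environment is checked before the stored config.
    """
    if environ.get(name):
        return "environment"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or malformed config counts as "not set" here.
        return None
    env_section = config.get("env") if isinstance(config, dict) else None
    if isinstance(env_section, dict) and env_section.get(name):
        return "config.json"
    return None
```

Calling locate_key("OPENROUTER_API_KEY", os.environ, Path.home() / ".blxbench" / "config.json") returns None in the situation where the first-run setup screen would be expected to appear.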

Choosing LM Studio (lms) or Ollama (oll) in /provider when the optional LMSTUDIO_API_KEY / OLLAMA_API_KEY is unset opens the same style of screen: save a token to config.json, or press Enter on an empty field (or S) to skip if you do not need auth locally.

After selecting lms or oll, traffic uses 127.0.0.1 only. If your daemon listens on a non-default TCP port, set it with /local-inference-port <n> (or off / unset for the adapter default) before /run; /show echoes the active override when present.
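Before setting /local-inference-port, it can help to confirm that something is actually listening on the port you intend to use. The sketch below is a generic TCP check against 127.0.0.1 and is independent of blxbench; is_listening is a hypothetical helper name.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
```

For example, is_listening(11434) checks the port a stock Ollama install normally uses; if your daemon runs elsewhere, pass that port and use the same number with /local-inference-port.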

If both the project ./.env and the env section of config.json define the same key, you may see a short picker to prefer workspace, stored, or ignore workspace .env; see Configuration, App config file.

Otherwise you see the normal welcome / sign-in flow:

[Screenshot: Welcome Screen]

Arcade (minigames)

Open BLXBench Arcade from the shell with /arcade, or press the a key while a benchmark is running tests; the benchmark keeps going underneath the overlay. Esc / q closes the overlay only.

Help & Commands

View available commands and shortcuts:

[Screenshot: Help Screen]

Account Management

Sign in to upload results to the leaderboard:

[Screenshot: Account Screen]

Configure Run

Select provider, models, and test filters:

[Screenshot: Configure Run]

Provider list

Use /provider (or /set provider) to open the provider picker. Each row shows the alias, cloud or local mode, and a short description:

[Screenshot: Select provider]

Model list

Use /models list (or /set models) to open the live model picker from your current provider. The list shows modality info, pricing per million tokens, expiry, and context size:

[Screenshot: Select models]

Categories

Use /categories or /set categories to toggle benchmark categories. Each row shows a description and the test count by difficulty level:

[Screenshot: Select categories]

Levels

Use /levels or /set levels to choose easy, medium, and/or hard:

[Screenshot: Select levels]

Config overlay (/config)

/config opens an interactive overlay for editing stored settings without touching files manually. It has three tabs (navigate with ← / → or Tab):

  • Settings — desktop notifications, preferred config source, summary/validation model
  • API Keys — view and edit provider keys stored in ~/.blxbench/config.json
  • Main Provider — set the default provider alias

Navigate rows with ↑ / ↓, press Enter to edit a value, Esc / q to close.

[Screenshot: Config overlay, Settings tab]

[Screenshot: Config overlay, API Keys tab]

Session snapshots

Use /save and /load to persist and restore your run configuration (provider, models, category/level filters, limits, optional local_inference_port for lms/oll, report mode, submit toggle, etc.). The CLI writes JSON under ~/.blxbench/saves/ on Unix and %USERPROFILE%\.blxbench\saves\ on Windows.

Separate from snapshots, /set notify on (or /notify) stores desktop notification preferences in ~/.blxbench/config.json — optional OS hints when a run finishes. See Configuration — Desktop notifications.

[Screenshot: Saved sessions (/load)]

  • With a label (or the default), /save creates a new timestamped file each time so you can keep checkpoints.
  • /load opens a picker (same keys as /report list: ↑/↓, Enter, Esc) and replaces the whole active configuration when you confirm — use /show to verify before /run.
  • Snapshot files hold only run settings. Provider API keys and your BLXBench credentials stay in environment variables and the separate local credentials file managed by /auth — they are not copied into snapshots.

Autosave: while you are in the interactive shell (not during an active run replay), the CLI can periodically overwrite a single file named autosave.json in the same saves directory so you always have a recent backup. Configure the interval with the environment variable BLXBENCH_AUTOSAVE_SEC (seconds); set it to 0 to turn autosave off. See Configuration.
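The autosave setting can be read as a small parsing rule: a positive number of seconds is the interval, and 0 turns autosave off. The sketch below illustrates that rule only. The default of 60 seconds and the fallback for unparsable or negative values are placeholders chosen for the example, not documented blxbench behavior.

```python
from typing import Mapping, Optional

ENV_VAR = "BLXBENCH_AUTOSAVE_SEC"


def autosave_interval(environ: Mapping[str, str], default_sec: int = 60) -> Optional[int]:
    """Return the autosave interval in seconds, or None when autosave is off.

    default_sec=60 is a placeholder value, not the CLI's documented default.
    """
    raw = environ.get(ENV_VAR, "").strip()
    if not raw:
        return default_sec
    try:
        seconds = int(raw)
    except ValueError:
        # Assumption: unparsable values fall back to the default.
        return default_sec
    if seconds <= 0:
        # 0 turns autosave off; negative values are treated the same way here (assumption).
        return None
    return seconds
```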

Invalid or hand-edited snapshot files fail validation when you load them — the TUI shows an error instead of applying a broken config.
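If you edit a snapshot by hand, a quick syntax check before /load can catch the most common mistakes. The sketch below only verifies that the file parses as a JSON object. It does not reproduce the CLI's own validation, whose schema is not described here, and precheck_snapshot is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Tuple


def precheck_snapshot(path: Path) -> Tuple[bool, str]:
    """Return (ok, message) after a purely syntactic check of a snapshot file."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError) as exc:
        return False, f"cannot read file: {exc}"
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(data, dict):
        # Assumption: a snapshot is a JSON object at the top level.
        return False, "top-level value is not a JSON object"
    return True, "parses as a JSON object"
```

A file that passes this check can still be rejected by the TUI if its fields are missing or have the wrong types.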

Running Benchmark

Watch real-time progress during execution:

[Screenshot: Benchmark Run]

Recent reports (/report list)

Open /report list to browse recent runs under your report root (default ~/.blxbench/reports on Unix, %USERPROFILE%\.blxbench\reports on Windows). Navigate with ↑/↓ or j/k, PgUp/PgDn, g / G (top/end), then Enter to open a report. Esc cancels.
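To find the same files outside the TUI, you can scan the report root yourself. The sketch below assumes that each run folder contains a file named report.json somewhere under the root and orders results by modification time; how /report list itself orders or limits entries is not specified here, and recent_reports is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def recent_reports(root: Path, limit: int = 20) -> List[Path]:
    """Return up to `limit` report.json paths under `root`, newest first by mtime."""
    if not root.is_dir():
        return []
    found = [p for p in root.rglob("report.json") if p.is_file()]
    found.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return found[:limit]
```

With the default location, recent_reports(Path.home() / ".blxbench" / "reports") covers both the Unix and Windows paths listed above, since Path.home() resolves to the user profile directory on Windows.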

[Screenshot: Recent reports list]

Report details — text replay

The report replay view (from /report list, or d right after a run) shows the same report.json as a scrollable text summary: timing, per-model stats, category breakdown, and provider cost where available. The path at the top is the JSON file; open the sibling index.html in a browser for the full HTML report when the report mode is html or both.

[Screenshot: Report replay, text summary]

Report details — charts

Press c to toggle charts in the report replay: category score bars (and related visuals) for the current model. If the run used more than one model, press m to cycle which model the charts use. Press c again to return to the text view.

[Screenshot: Report replay, charts]

Navigation

Key     Action
Tab     Complete slash-command suggestions
Enter   Run the current command
Ctrl+C  Exit

While typing a slash command, Tab and the suggestion panel help you pick commands and see short descriptions:

[Screenshot: Slash-command suggestions]

During a benchmark (run dashboard)

While /run is active, the run dashboard shows the model (and current test when known) on the line above the progress bar. The bar counts completed tests (done / total) across models; TTFT on that line reflects the in-flight request when applicable.

At the right end of that same line, blxbench shows a live cumulative cost (· run $…) — the sum of per-test cost_usd values for every finished test in this run (including multi-model runs). Skipped tests contribute $0. The number uses the same rounding style as the $ figures on the log lines under the bar.
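The cumulative figure is a plain sum over finished tests. The sketch below shows that arithmetic under an assumed record shape: each test is a mapping with a "status" string and an optional numeric "cost_usd". Only the cost_usd name comes from the text above; the "status" field and its "skipped" value are illustrative, and the display rounding is not reproduced.

```python
from typing import Iterable, Mapping


def run_cost(tests: Iterable[Mapping[str, object]]) -> float:
    """Sum cost_usd over finished tests; skipped or cost-less tests add 0."""
    total = 0.0
    for test in tests:
        if test.get("status") == "skipped":  # hypothetical status value
            continue
        cost = test.get("cost_usd")
        if isinstance(cost, (int, float)):
            total += float(cost)
    return total
```

In a multi-model run, pass the finished tests of every model in one iterable to get the single run-wide total.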

While tests are executing (not only on the idle summary after a run):

Key  Action
a    Open Arcade minigames; the benchmark continues in the background, and Esc / q closes the overlay
p    Pause the run: waits for the current test to finish, writes a pause snapshot, then holds. You can resume later with /resume from the shell or press r to continue immediately in the same session.

Paused run screen

When the run is paused (snapshot saved):

[Screenshot: Run pausing, waiting for current test to finish]

Key      Action
r        Resume immediately: continues from the saved test index in the same session
q / Esc  Return to the shell: the snapshot is kept on disk; use /resume later to pick it up

Pause snapshots are stored under ~/.blxbench/pauses/ (Unix) or %USERPROFILE%\.blxbench\pauses\ (Windows). The final report generated after resuming includes results from both the pre-pause and post-resume sessions and is marked as a resumed run in the web detail view.

After a benchmark finishes (run dashboard)

The run dashboard stays open with a per-model summary, log, and a line pointing at your local report. With the default report mode (both or html), the TUI shows Report HTML: and the path to index.html in the run folder. If you set /set report json, it shows Report JSON: instead (no index.html is written in that mode).

Key      Action
d        Open report details (replay view) for this run; same screen as choosing a file in /report list
s / r    Upload the run’s report.json to the public leaderboard (manual; works even when /report submit is off), if you are signed in and your role may submit
q / Esc  Return to the shell (the benchmark does not re-run)

The footer on the finished-run dashboard looks like: d — report details · s / r — upload · q / Esc — shell.
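The relationship between report mode and the report line in this section can be summarized as a small mapping. The sketch below mirrors the text (html or both points at index.html, json points at report.json); report_line is a hypothetical helper, and it assumes both files live directly in the run folder.

```python
from pathlib import Path
from typing import Tuple


def report_line(mode: str, run_dir: Path) -> Tuple[str, Path]:
    """Return the label and file path shown for a finished run in the given report mode."""
    if mode in ("html", "both"):
        return "Report HTML:", run_dir / "index.html"
    if mode == "json":
        # No index.html is written in json mode.
        return "Report JSON:", run_dir / "report.json"
    raise ValueError(f"unknown report mode: {mode!r}")
```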

Report details (replay)

Opened from d after a run, or from /report list → Enter on a row. This view reads the selected report.json.

Key      Action
c        Toggle charts (category bars, etc.) vs text summary
m        When charts are on and multiple models ran, switch to the next model’s chart
s / r    Upload this file to the leaderboard (same rules as on the run dashboard)
q / Esc  Back to the shell

Upload still sends report.json to the server; the HTML file is for local viewing and sharing.

Common Commands

The status line under the header shows user, provider, models, cats (category filter), levels (difficulty filter), rate, report, and submit at a glance.

Command                                   Action
/help                                     Show all commands by category
/show                                     Show the active configuration
/provider or /set provider                Open the provider list and pick a registered adapter (no extra text on the command line)
/models list                              Fetch models from the current provider
/models id,id                             Set model ids directly
/categories or /set categories            Open the category checklist; use Space / a / n / Enter to pick embedded names. /categories * = all categories. You cannot type category names after the command.
/levels or /set levels                    Open the difficulty checklist (easy / medium / hard from the suite). /levels * = all levels. You cannot type level names after the command.
/limit N                                  Limit tests per category
/ratelimit RPM-or-off                     Throttle provider requests
/report html|json                         Set the report mode
/report submit on|off                     Turn automatic leaderboard submission on or off
/report list                              List recent reports (same report root); Enter opens report replay (text + c charts) or upload with s / r
/resume                                   Open the paused-run list and continue a paused benchmark from where it stopped (↑/↓ · Enter · Esc)
/save [label]                             Save the current configuration to a JSON snapshot under ~/.blxbench/saves/ (Unix) or %USERPROFILE%\.blxbench\saves\ (Windows)
/load                                     Open the saved-snapshot list and apply the selected file
/output-dir PATH or /set output-dir PATH  Change the report directory
/report browser install                   Install the optional native Report Browser desktop app
/report browser open                      Launch the installed Report Browser
/report browser uninstall                 Remove the installed Report Browser
/auth login                               Sign in with browser device login
/auth whoami                              Show the signed-in account
/usage                                    Usage overlay: masked email, pass / subscription summary, weekly public bench quota per model (heaviest model vs cap, optional per-model lines, UTC week end). Fetches live data from the app you signed into. Close with Esc, Enter, or q.
/pass                                     Same as /usage (not the browser /pass checkout page).
/arcade                                   Arcade: minigame picker from the shell (no run required).
/playwright install                       Install Playwright Chromium
/run                                      Start the benchmark

Account Login

/auth login starts the blxbench device flow. blxbench opens the web app, you approve the displayed code in the browser, and blxbench stores local credentials in your user config directory.

Leaderboard submission requires a signed-in account with a pass tier that includes submission quota: Scout, Bencher, Founder, or Admin. Headless automation can instead use BLXBENCH_API_KEY with --submit.

Public leaderboard uploads are only accepted for full, unfiltered benchmark runs (among other checks). Runs with category, level, or per-category limit filters, /set limit, fail-fast / early exit, or incomplete execution are rejected by the API — you can still run and review them locally. See Public submission rules.
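The rejection conditions listed above can be read as a checklist to run before you try to upload. The sketch below encodes only those named conditions. RunSummary and its field names are invented for illustration, the actual checks happen server-side, and the API applies other checks that are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunSummary:
    """Hypothetical summary of a run; field names are illustrative only."""
    category_filter: bool = False
    level_filter: bool = False
    per_category_limit: Optional[int] = None
    fail_fast: bool = False
    completed: bool = True


def rejection_reasons(run: RunSummary) -> List[str]:
    """List the documented reasons a run would not qualify for public upload."""
    reasons = []
    if run.category_filter:
        reasons.append("category filter active")
    if run.level_filter:
        reasons.append("level filter active")
    if run.per_category_limit is not None:
        reasons.append("per-category limit set")
    if run.fail_fast:
        reasons.append("fail-fast / early exit enabled")
    if not run.completed:
        reasons.append("incomplete execution")
    return reasons
```

An empty list means none of the documented filters apply; it does not guarantee that the upload will be accepted.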

Some deployments may additionally require a verified report.json (cryptographic signing). If your upload fails with an integrity or signature error, use a CLI build that supports signing and follow your operator’s instructions; no key material belongs in snapshot or report files you share in chat.

Local Reports

By default, TUI runs write reports under ~/.blxbench/reports/ on Linux/macOS and %USERPROFILE%\.blxbench\reports\ on Windows. Use /set output-dir PATH to override this for the current run. /report clear cleans the report directory while preserving/resetting ranking files.

/report list scans for recent report.json files under that effective path (default or your /set output-dir). If you change the output directory, the list uses the new location — it does not read from a different folder than the one your runs use.

After each public upload attempt (auto, s/r on a finished run, or from /report list), the CLI appends a local audit trail to the same report.json under the key blxbench_cli.public_submissions (timestamp, success or HTTP error, remote ids on success). That field is not sent to the server on upload, so you can see later what was published from this file.
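To review what was published from a given file, you can read the audit trail back out of report.json. The sketch below assumes that blxbench_cli is a nested object whose public_submissions value is a list of entry objects; the exact entry fields are not assumed, and public_submissions here is a hypothetical helper operating on the already-parsed JSON.

```python
from typing import Any, Dict, List


def public_submissions(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the upload audit entries from a parsed report.json, or [] if none exist.

    Assumption: the trail lives at report["blxbench_cli"]["public_submissions"].
    """
    cli_section = report.get("blxbench_cli")
    if not isinstance(cli_section, dict):
        return []
    entries = cli_section.get("public_submissions")
    if not isinstance(entries, list):
        return []
    return [entry for entry in entries if isinstance(entry, dict)]
```

Load the file with json.load and pass the resulting dict; an empty result means no upload attempt has been recorded in that file.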

Provider API keys remain local: in your environment, in .env, or in ~/.blxbench/config.json. TUI sign-in is only for BLXBench account features such as eligible leaderboard upload.

Features

  • Interactive model selection — Type model ids, or use /models list; provider, categories, and levels are chosen in on-screen lists (not by typing names after the slash command)
  • Real-time progress — Watch benchmark execution
  • Account integration — Sign in via browser device login and upload eligible reports
  • Manual upload — After a run, s / r upload the report without turning auto-submit on; /report list or d opens the report replay (text + optional charts)
  • Command help — Slash-command reference with completion
