Commands
Complete reference for all blxbench commands.
This reference assumes the blxbench command is available (install globally as @bitslix/blxbench).
Interactive TUI
For the visual TUI interface, see TUI Guide.
Running Benchmarks
# Start interactive TUI
blxbench
From TUI
The TUI is command-driven. Type /help to list available commands and use Tab to complete suggestions.
| Command | Description |
|---|---|
| `/show` | Show current run configuration |
| `/config` | Open the config overlay: an interactive editor for stored settings, API keys, and default provider (three tabs; ← / → or Tab to switch, Enter to edit, Esc / q to close) |
| `/set provider` or `/provider` | Open the provider list; pick a registered adapter (no free-text args) |
| `/set models [list \| id,id]` or `/models` | Select the models to run |
| `/set categories` or `/categories` | Open the category checklist, or `*` = all Overall categories. `roblox` is opt-in and must be selected explicitly. |
| `/set levels` or `/levels` | Open the difficulty checklist, or `*` = all. Do not type level names after the command. |
| `/set limit N` | Limit tests per category |
| `/set ratelimit RPM-or-off` | Throttle provider calls |
| `/set fail-fast [on \| off]` | Stop on first failure |
| `/set report html \| json` | Select the report format |
| `/set output-dir PATH` | Write reports somewhere other than `~/.blxbench/reports` |
| `/set notify [on \| off]` or `/notify` | Turn desktop notifications for finished runs on or off |
| `/resume` | Open the paused-run list and continue a previously paused benchmark from where it stopped. Navigate with ↑/↓, Enter to resume, Esc to cancel. |
| `/save [label]` | Write the current run configuration to a JSON snapshot under your user saves directory (optional label → timestamped filename). Does not store provider or web API keys, only settings like provider alias, models, filters, and report options. |
| `/load` | Open a list of saved snapshots (same navigation as `/report list`) and replace the active TUI configuration when you press Enter |
| `/report list` | List recent `report.json` files and open one to review or upload |
| `/report clear` | Clear generated reports in the default report directory |
| `/report submit on \| off` | Turn automatic upload after a run on or off |
| `/report browser install` | Install the optional native Report Browser desktop app (`@bitslix/blxbench-report-browser`) via npm |
| `/report browser open` | Launch the installed native Report Browser |
| `/report browser uninstall` | Remove the installed native Report Browser |
| `/auth login`, `/auth logout`, `/auth whoami` | Manage web account credentials |
| `/usage` | Usage overlay: account summary, subscription / pass line, and weekly public bench quota per model (peak model vs cap, UTC reset from the signed-in site’s `/api/cli/me`). Admin is shown as unlimited when the server omits limits. Esc / Enter / q closes. Requires an active sign-in (`/auth`). |
| `/pass` | Alias for `/usage` (same overlay; the website’s `/pass` page is separate and covers pricing and checkout). |
| `/arcade` | Open Arcade: minigame select from the shell (Esc / q to close). |
| `/playwright status/install/uninstall` | Manage Playwright Chromium |
| `/run` | Start the benchmark |
During a run (TUI)
On the run dashboard, the line above the progress bar shows the active model and current test. · run $… at the end of that line is a live running total of estimated API cost for all completed tests in this run (see TUI — During a benchmark).
While the benchmark is executing tests:
| Key | Action |
|---|---|
| `a` | Open Arcade minigames; the run continues in the background. Esc / q closes the overlay |
| `p` | Pause the run after the current test finishes. A pause snapshot is written so the run can be resumed later with `/resume` |
When the run is paused (snapshot written):
| Key | Action |
|---|---|
| `r` | Resume the run immediately in the same session (skips the shell; resumes from the saved index) |
| `q` / `Esc` | Return to the shell (snapshot is kept; resume later with `/resume`) |
After a run (TUI)
When a benchmark finishes, the run dashboard shows a per-model summary, log, and the path to your local report. Unless you use /set report json, the TUI highlights Report HTML: and the index.html in the run folder (the browser-friendly view). Machine-readable data is still in report.json in the same directory (used for replay in the TUI and for uploads).
While you are on the run screen:
- If desktop notifications are enabled (`/set notify`, `BLXBENCH_NOTIFY`, or `~/.blxbench/config.json`), the OS may show a hint when the run completes, which is useful for long jobs while the terminal is in the background. Cancelled runs do not notify.
- `d`: Open the report details view (same read-only replay as `/report list`): full text summary, optional charts, and upload, without starting a new run
- `q` or `Esc`: Return to the command shell
- `s` or `r`: Manually upload the generated `report.json` to the public leaderboard (independent of `/report submit on|off`). Requires sign-in and a Scout, Bencher, Founder, or Admin role for public submit.
If auto-upload was off or failed, use s / r to try again. A duplicate run_id is rejected by the server (HTTP 409) — you need a new run to create a new public entry.
The server only accepts eligible reports for the public leaderboard: full runs with no category / level / per-category limit filters, no --limit-style limited runs, and no fail-fast / exit-early runs. Runs that add roblox to a full run are accepted, but Roblox results are marked as a special category and excluded from Overall. Filtered or partial runs are still useful locally; use /show to confirm options before you rely on /report submit or s/r. Details: Leaderboard — Public submission rules.
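The eligibility rules above can be approximated locally before a long run. The following is a rough sketch only: `looks_leaderboard_eligible` is a hypothetical helper, not part of blxbench, and the list of Overall categories is copied from the full-suite example in this reference, so it may drift from what the server actually checks. The server remains the authority.

```shell
# Hypothetical pre-check for a headless argument list (assumption: not a blxbench feature).
# Returns 0 when no documented disqualifying option is present, 1 otherwise.
looks_leaderboard_eligible() {
  # Overall categories as listed in the full-suite example of this reference.
  local overall="coding_ui debugging hallucination reasoning refactoring security speed"
  local in_category=0 seen_category=0 selected=" " arg name
  for arg in "$@"; do
    case "$arg" in
      --level|--limit|--fail-fast)
        # Level filters, limited runs, and fail-fast runs are not accepted.
        return 1
        ;;
      --category)
        in_category=1
        seen_category=1
        ;;
      --*)
        # Any other flag ends the list of category values.
        in_category=0
        ;;
      *)
        if [ "$in_category" -eq 1 ]; then
          selected="$selected$arg "
        fi
        ;;
    esac
  done
  # An explicit --category list must still cover every Overall category;
  # adding roblox on top of the full list is fine.
  if [ "$seen_category" -eq 1 ]; then
    for name in $overall; do
      case "$selected" in
        *" $name "*) ;;
        *) return 1 ;;
      esac
    done
  fi
  return 0
}
```

For example, `looks_leaderboard_eligible --headless --provider opr --models openai/gpt-5.4-mini --limit 5` returns a non-zero status because of `--limit`.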
/report list lists recent reports under the same directory as the runner: the default ~/.blxbench/reports/ (or the path from /set output-dir). Use ↑/↓ (or j/k), Enter to open the report replay view. There you can toggle charts with c, cycle models with m when the run has multiple models, upload with s / r, and return with q / Esc. See TUI — Report replay for screenshots.
See TUI for the full walkthrough.
Headless Mode
Run benchmarks without the TUI when stdout is not a TTY, or force it with --headless. Pass options directly — there is no run subcommand:
blxbench --headless --provider <alias> --models <model-id> [more-model-ids...]
See Headless Mode for CI/CD integration.
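In a CI script it can help to wrap the headless invocation in a small function so the provider alias and model list are passed in one place. This is a minimal sketch; `run_headless_bench` is a hypothetical name, and only flags documented in this reference are used.

```shell
# Hypothetical wrapper around the documented headless invocation.
# Usage: run_headless_bench <provider-alias> <model-id> [more-model-ids...]
run_headless_bench() {
  local provider="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "usage: run_headless_bench <provider-alias> <model-id> [more-model-ids...]" >&2
    return 2
  fi
  # --headless forces non-TUI mode even when stdout is a TTY.
  blxbench --headless --provider "$provider" --models "$@"
}
```

A call such as `run_headless_bench opr openai/gpt-5.4-mini` then expands to the command line shown above.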
Options
| Flag | Description | Default |
|---|---|---|
| `--provider` | Provider alias | `opr` (OpenRouter) |
| `--models` | Model ID(s) | (required) |
| `--api-key` | Sets `BLXBENCH_API_KEY` for this process | (none) |
| `--tests-dir` | Path to tests directory | Built-in tests |
| `--category` | Filter categories. Defaults to all Overall categories; include `roblox` explicitly for Roblox OpenGameEval. | All except `roblox` |
| `--level` | Filter difficulty | All |
| `--limit` | Max tests per category | All |
| `--save-json` | Output JSON path | Auto |
| `--fail-fast` | Stop on first failure | `false` |
| `--ratelimit` | Requests per minute | `7` (when flag has no value) |
| `--dotenv-path` | Custom `.env` file | `.env` |
| `--clear` | Clear the default report directory | `false` |
| `--install-chromium` | Install Playwright Chromium | `false` |
| `--skip-render-validation` | Skip UI render stage for `coding_ui` | `false` |
| `--submit` | Upload report after run | `false` |
| `--notify` | Request a desktop notification when the run finishes (also `BLXBENCH_NOTIFY` and app config) | `false` |
| `--local-inference-port` | `lms` / `oll` only: HTTP port on 127.0.0.1 for chat + model list endpoints | Adapter default (1234 / 11434) |
| `--roblox-adapter` | Roblox OpenGameEval backend alias (separate from `--provider`) | `rbx` |
| `--roblox-llm-name` | Override Roblox `custom_llm_info` vendor: `openai`, `claude`, or `gemini` | Inferred from LLM adapter id (e.g. OpenAI) or model prefix (`anthropic/…`, `google/…`) |
| `--roblox-llm-model-version` | Model version sent to Roblox, for example `gpt-5` or `claude-sonnet-4-5-20250929` | Selected model id without provider prefix |
| `--roblox-max-concurrent` | Max concurrent Roblox OpenGameEval jobs | `1` |
| `--roblox-poll-interval` | Poll interval in seconds for Roblox eval records | `10` |
| `--roblox-timeout` | Timeout in seconds per Roblox eval job | `900` |
Utility Commands
Version
blxbench --version
blxbench -V
Prints blxbench <semver>. The TUI footer shows the same version as v<semver>.
Clear Results
blxbench --headless --clear
Removes generated artifacts while preserving ranking files.
By default, reports live in ~/.blxbench/reports/ on Linux/macOS and %USERPROFILE%\.blxbench\reports\ on Windows.
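To check what is stored before clearing, the report files can be listed directly. The sketch below assumes Linux/macOS and that each run folder under the report directory contains a `report.json`, as described in the TUI section; `list_blxbench_reports` is a hypothetical helper name.

```shell
# Hypothetical helper: print every report.json under the report directory.
# Pass a custom directory as the first argument if an output dir other than the default is in use.
list_blxbench_reports() {
  local dir="${1:-$HOME/.blxbench/reports}"
  # Nothing to list if the directory does not exist yet.
  [ -d "$dir" ] || return 0
  find "$dir" -type f -name report.json
}
```

Calling `list_blxbench_reports` with no argument inspects the default location.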
Install Chromium
blxbench --headless --install-chromium
Downloads Playwright Chromium for UI rendering tests.
In the TUI, the same setup is available as /playwright install. Use /playwright status to check whether Chromium is already detected.
Environment Variables
| Variable | Description |
|---|---|
| `OPENROUTER_API_KEY` | OpenRouter (`opr`) |
| `OPENAI_API_KEY` | OpenAI adapter (`oai`) |
| `ANTHROPIC_API_KEY` | Claude key used by Roblox OpenGameEval when `--roblox-llm-name claude` |
| `GEMINI_API_KEY` | Gemini key used by Roblox OpenGameEval when `--roblox-llm-name gemini` |
| `LLM_API_KEY` | Optional explicit LLM key override for Roblox OpenGameEval |
| `OPEN_GAME_EVAL_API_KEY` | Roblox OpenGameEval backend key (`rbx`) |
| `HF_TOKEN` | Hugging Face (`hgf`) |
| `TOGETHER_API_KEY` | Together (`tgr`) |
| `PORTKEY_API_KEY` | Portkey (`ptk`) |
| `CLOUDFLARE_API_TOKEN` | Cloudflare (`cfr`) |
| `BLXBENCH_API_KEY` | BLXBench API key for headless submit |
| `BLXBENCH_SUBMIT` | Set to `1` or `true` to upload after a headless run |
| `BLXBENCH_NOTIFY` | `1` / `true`: show an OS desktop hint when a run finishes (TUI or headless). `0` / `false`: force off for this process, even if `~/.blxbench/config.json` has `desktopNotify: true`. See Configuration. |
| `BLXBENCH_AUTOSAVE_SEC` | TUI only: interval in seconds for overwriting `autosave.json` in your saves directory; `0` disables autosave. When unset, the CLI uses a short default. |
| `BLXBENCH_PREFER_STORED_ENV` | `1` / `true`: treat `~/.blxbench/config.json` env as overriding project `.env` for overlapping keys (same as `preferStoredEnv: true`). `0` / `false` forces the opposite for this process. See the App config file section of Configuration. |
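As an illustration, a CI job might set a handful of these variables before a headless run. The fragment below is a sketch with placeholder values (`replace-me` is not a real key); which variables you need depends on the provider you use and on whether you upload results.

```shell
# Placeholder values: supply real secrets from your CI secret store.
export OPENROUTER_API_KEY="${OPENROUTER_API_KEY:-replace-me}"  # provider key for opr
export BLXBENCH_API_KEY="${BLXBENCH_API_KEY:-replace-me}"      # needed for headless submit
export BLXBENCH_SUBMIT=1   # upload the report after the headless run
export BLXBENCH_NOTIFY=0   # force desktop notifications off for this process
```

With these set, a plain `blxbench --headless --provider opr --models <model-id>` run should pick them up without further flags.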
Examples
Run all tests (OpenRouter):
blxbench --headless --provider opr --models openai/gpt-5.4-mini
Run specific categories:
blxbench --headless --provider opr --models openai/gpt-5.4-mini --category speed reasoning
Run a full Overall suite and attach the special Roblox category:
OPEN_GAME_EVAL_API_KEY=... OPENAI_API_KEY=... \
blxbench --headless --provider opr --models openai/gpt-5.4-mini \
--category coding_ui debugging hallucination reasoning refactoring security speed roblox \
--roblox-llm-name openai \
--roblox-llm-model-version gpt-5
`roblox` appears in reports and web breakdowns, but it does not affect Overall score, Overall rank, trends, or best-run selection.
Limit test count:
blxbench --headless --provider opr --models openai/gpt-5.4-mini --limit 5
Upload results:
blxbench --headless --provider opr --models openai/gpt-5.4-mini --submit