BLXBench - Commands

This reference assumes the blxbench command is available (install globally as @bitslix/blxbench).

Interactive TUI

For a visual walkthrough of the TUI, see TUI Guide.

Running Benchmarks

# Start interactive TUI
blxbench

From TUI

The TUI is command-driven. Type /help to list available commands and use Tab to complete suggestions.

Command	Description
`/show`	Show current run configuration
`/config`	Open the config overlay — interactive editor for stored settings, API keys, and default provider (three tabs; ← / → or Tab to switch, Enter to edit, Esc / q to close)
`/set provider` or `/provider`	Open the provider list; pick a registered adapter (no free-text args)
`/set models [list|id,id]` or `/models`	Select the models to benchmark: list them, or pass comma-separated model IDs
`/set categories` or `/categories`	Open the category checklist, or `*` = all Overall categories. `roblox` is opt-in and must be selected explicitly.
`/set levels` or `/levels`	Open the difficulty checklist, or `*` = all. Do not type level names after the command.
`/set limit N`	Limit tests per category
`/set ratelimit RPM-or-off`	Throttle provider calls
`/set fail-fast [on|off]`	Stop on the first failure
`/set report html|json`	Choose the report format (`html` or `json`)
`/set output-dir PATH`	Write reports somewhere other than `~/.blxbench/reports`
`/set notify [on|off]` or `/notify`	Toggle the desktop notification shown when a run finishes
`/resume`	Open the paused-run list and continue a previously paused benchmark from where it stopped. Navigate with ↑/↓, Enter to resume, Esc to cancel.
`/save [label]`	Write the current run configuration to a JSON snapshot under your user saves directory (optional label → timestamped filename). Does not store provider or web API keys — only settings like provider alias, models, filters, report options.
`/load`	Open a list of saved snapshots (same navigation as `/report list`) and replace the active TUI configuration when you press Enter
`/report list`	List recent `report.json` files and open one to review or upload
`/report clear`	Clear generated reports in the default report directory
`/report submit on|off`	Turn automatic upload of the report after a run on or off
`/report browser install`	Install the optional native Report Browser desktop app (`@bitslix/blxbench-report-browser`) via npm
`/report browser open`	Launch the installed native Report Browser
`/report browser uninstall`	Remove the installed native Report Browser
`/auth login`, `/auth logout`, `/auth whoami`	Manage web account credentials
`/usage`	Usage overlay — account summary, subscription / pass line, and weekly public bench quota per model (peak model vs cap, UTC reset from the signed-in site’s `/api/cli/me`). Admin is shown as unlimited when the server omits limits. Esc / Enter / q closes. Requires an active sign-in (`/auth`).
`/pass`	Alias for `/usage` (same overlay; the website’s /pass page is separate — pricing and checkout).
`/arcade`	Open Arcade — minigame select from the shell (Esc / q to close).
`/playwright status/install/uninstall`	Manage Playwright Chromium
`/run`	Start the benchmark

During a run (TUI)

On the run dashboard, the line above the progress bar shows the active model and current test. The trailing · run $… on that line is a live running total of the estimated API cost of all tests completed so far in this run (see TUI — During a benchmark).

While the benchmark is executing tests:

Key	Action
`a`	Open Arcade minigames — run continues in the background; Esc / q closes the overlay
`p`	Pause the run after the current test finishes — a pause snapshot is written so the run can be resumed later with `/resume`

When the run is paused (snapshot written):

Key	Action
`r`	Resume the run immediately in the same session (skips the shell; resumes from the saved index)
`q` / `Esc`	Return to the shell (snapshot is kept; resume later with `/resume`)

After a run (TUI)

When a benchmark finishes, the run dashboard shows a per-model summary, log, and the path to your local report. Unless you use /set report json, the TUI highlights Report HTML: and the index.html in the run folder (the browser-friendly view). Machine-readable data is still in report.json in the same directory (used for replay in the TUI and for uploads).

If desktop notifications are enabled (/set notify, BLXBENCH_NOTIFY, or ~/.blxbench/config.json), the OS may show a hint when the run completes, which is useful for long jobs while the terminal is in the background. Cancelled runs do not notify.

While you are on the run screen:

d — Open the report details view (same read-only replay as /report list): full text summary, optional charts, and upload — without starting a new run
q or Esc — Return to the command shell
s or r — Manually upload the generated report.json to the public leaderboard (independent of /report submit on|off). Requires sign-in and a Scout, Bencher, Founder, or Admin role for public submit.

If auto-upload was off or failed, use s / r to try again. A duplicate run_id is rejected by the server (HTTP 409) — you need a new run to create a new public entry.

The server only accepts eligible reports for the public leaderboard: full runs with no category / level / per-category limit filters, no --limit-style limited runs, and no fail-fast / exit-early runs. Runs that add roblox to a full run are accepted, but Roblox results are marked as a special category and excluded from Overall. Filtered or partial runs are still useful locally; use /show to confirm options before you rely on /report submit or s/r. Details: Leaderboard — Public submission rules.
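As a rough local pre-check of the submission rules above, the following sketch scans a headless argument list for options that would make a run ineligible. The function name `check_public_eligibility` is hypothetical and not part of blxbench; it only looks at the flags documented on this page, and the server remains the authority on what is accepted.

```shell
# Hypothetical helper (not part of blxbench): scan headless arguments for
# options that the public submission rules describe as disqualifying.
# Prints "eligible", or the first flag that needs attention.
check_public_eligibility() {
  for arg in "$@"; do
    case "$arg" in
      --level|--limit|--fail-fast)
        echo "ineligible: $arg"
        return 1
        ;;
      --category)
        # A category list is only acceptable when it still covers the full
        # Overall suite (optionally plus roblox); this sketch does not verify that.
        echo "check manually: --category"
        return 2
        ;;
    esac
  done
  echo "eligible"
}
```

For example, `check_public_eligibility --headless --provider opr --models openai/gpt-5.4-mini --limit 5` prints `ineligible: --limit`, while the same call without `--limit 5` prints `eligible`.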

/report list lists recent reports under the same directory as the runner: the default ~/.blxbench/reports/ (or the path from /set output-dir). Use ↑/↓ (or j/k), Enter to open the report replay view. There you can toggle charts with c, cycle models with m when the run has multiple models, upload with s / r, and return with q / Esc. See TUI — Report replay for screenshots.

See TUI for the full walkthrough.

Headless Mode

Run benchmarks without the TUI when stdout is not a TTY, or force it with --headless. Pass options directly — there is no run subcommand:

blxbench --headless --provider <alias> --models <model-id> [more-model-ids...]

See Headless Mode for CI/CD integration.
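The mode selection described above can be approximated in a wrapper script. This is a minimal sketch, assuming the rule is exactly "headless when `--headless` is passed or stdout is not a TTY"; `predict_blxbench_mode` is a hypothetical helper, and blxbench's real detection may consider more than this.

```shell
# Hypothetical helper: predict which mode blxbench would pick, assuming
# headless is used when --headless is passed or stdout is not a TTY.
predict_blxbench_mode() {
  for arg in "$@"; do
    if [ "$arg" = "--headless" ]; then
      echo "headless"
      return 0
    fi
  done
  # [ -t 1 ] is true only when file descriptor 1 (stdout) is a terminal.
  if [ -t 1 ]; then
    echo "tui"
  else
    echo "headless"
  fi
}
```

Called directly in an interactive terminal without `--headless`, the function prints `tui`. Inside a pipe, a command substitution, or a typical CI job, stdout is not a terminal, so it prints `headless`.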

Options

Flag	Description	Default
`--provider`	Provider alias	`opr` (OpenRouter)
`--models`	Model ID(s)	(required)
`--api-key`	Sets `BLXBENCH_API_KEY` for this process	—
`--tests-dir`	Path to tests directory	Built-in tests
`--category`	Filter categories. Defaults to all Overall categories; include `roblox` explicitly for Roblox OpenGameEval.	All except `roblox`
`--level`	Filter difficulty	All
`--limit`	Max tests per category	All
`--save-json`	Output JSON path	Auto
`--fail-fast`	Stop on first failure	`false`
`--ratelimit`	Requests per minute	7 (when flag has no value)
`--dotenv-path`	Custom `.env` file	`.env`
`--clear`	Clear the default report directory	`false`
`--install-chromium`	Install Playwright Chromium	`false`
`--skip-render-validation`	Skip UI render stage for `coding_ui`	`false`
`--submit`	Upload report after run	`false`
`--notify`	Request a desktop notification when the run finishes (also `BLXBENCH_NOTIFY` and app config)	`false`
`--local-inference-port`	`lms` / `oll` only: HTTP port on `127.0.0.1` for chat + model list endpoints	Adapter default (1234 / 11434)
`--roblox-adapter`	Roblox OpenGameEval backend alias (separate from `--provider`; default `rbx`)	`rbx`
`--roblox-llm-name`	Override Roblox `custom_llm_info` vendor: `openai`, `claude`, or `gemini`	Inferred from LLM adapter id (e.g. OpenAI) or model prefix (`anthropic/…`, `google/…`)
`--roblox-llm-model-version`	Model version sent to Roblox, for example `gpt-5` or `claude-sonnet-4-5-20250929`	Selected model id without provider prefix
`--roblox-max-concurrent`	Max concurrent Roblox OpenGameEval jobs	`1`
`--roblox-poll-interval`	Poll interval in seconds for Roblox eval records	`10`
`--roblox-timeout`	Timeout in seconds per Roblox eval job	`900`
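To get a feel for the numeric defaults in the table, the sketch below works through two small calculations: the spacing between provider calls at a given `--ratelimit`, and the maximum number of polls one Roblox eval job can receive before `--roblox-timeout` expires. Both helper names are hypothetical, and even spacing of requests is an assumption; the table only states requests per minute.

```shell
# Seconds between calls if requests were spaced evenly at a given RPM.
# Even spacing is an assumption, not documented behaviour.
ratelimit_interval() {
  LC_ALL=C awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 60 / rpm }'
}

# Upper bound on status polls for one Roblox eval job before it times out.
roblox_max_polls() {
  timeout_s="${1:-900}"   # --roblox-timeout default
  interval_s="${2:-10}"   # --roblox-poll-interval default
  echo $(( timeout_s / interval_s ))
}
```

With the defaults, `ratelimit_interval 7` prints `8.57` (60 / 7 seconds per request) and `roblox_max_polls` prints `90` (900 / 10).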

Utility Commands

Version

blxbench --version
blxbench -V

Prints blxbench <semver>. The TUI footer shows the same version as v<semver>.
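Scripts that need only the version number can strip the program name from that output. This is a small sketch, assuming the output is a single line shaped exactly like `blxbench <semver>`; `parse_blxbench_version` is a hypothetical helper.

```shell
# Extract the semver from a line shaped like "blxbench <semver>".
# Returns non-zero when the input does not start with "blxbench ".
parse_blxbench_version() {
  case "$1" in
    "blxbench "*) echo "${1#blxbench }" ;;
    *) return 1 ;;
  esac
}
```

A typical call would be `parse_blxbench_version "$(blxbench --version)"`.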

Clear Results

blxbench --headless --clear

Removes generated artifacts while preserving ranking files.

By default, reports live in ~/.blxbench/reports/ on Linux/macOS and %USERPROFILE%\.blxbench\reports\ on Windows.
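For scripting against that directory, the following sketch prints the most recently modified `report.json`. It assumes each run writes its `report.json` into its own folder somewhere under the reports directory; `latest_report` is a hypothetical helper, not a blxbench command.

```shell
# Print the most recently modified report.json under a reports directory.
# Defaults to the documented Linux/macOS location.
latest_report() {
  dir="${1:-$HOME/.blxbench/reports}"
  [ -d "$dir" ] || return 1
  # ls -t sorts the files found by modification time, newest first.
  find "$dir" -type f -name report.json -exec ls -t {} + 2>/dev/null | head -n 1
}
```

Pass a different directory as the first argument when reports are written elsewhere, for example the path given to `/set output-dir`.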

Install Chromium

blxbench --headless --install-chromium

Downloads Playwright Chromium for UI rendering tests.

In the TUI, the same setup is available as /playwright install. Use /playwright status to check whether Chromium is already detected.

Environment Variables

Variable	Description
`OPENROUTER_API_KEY`	OpenRouter (`opr`)
`OPENAI_API_KEY`	OpenAI adapter (`oai`)
`ANTHROPIC_API_KEY`	Claude key used by Roblox OpenGameEval when `--roblox-llm-name claude`
`GEMINI_API_KEY`	Gemini key used by Roblox OpenGameEval when `--roblox-llm-name gemini`
`LLM_API_KEY`	Optional explicit LLM key override for Roblox OpenGameEval
`OPEN_GAME_EVAL_API_KEY`	Roblox OpenGameEval backend key (`rbx`)
`HF_TOKEN`	Hugging Face (`hgf`)
`TOGETHER_API_KEY`	Together (`tgr`)
`PORTKEY_API_KEY`	Portkey (`ptk`)
`CLOUDFLARE_API_TOKEN`	Cloudflare (`cfr`)
`BLXBENCH_API_KEY`	BLXBench API key for headless submit
`BLXBENCH_SUBMIT`	Set to `1` or `true` to upload after a headless run
`BLXBENCH_NOTIFY`	`1` / `true` — show an OS desktop hint when a run finishes (TUI or headless). `0` / `false` — force off for this process, even if `~/.blxbench/config.json` has `desktopNotify: true`. See Configuration.
`BLXBENCH_AUTOSAVE_SEC`	TUI only: interval in seconds for overwriting `autosave.json` in your saves directory; `0` disables autosave. When unset, the CLI uses a short default.
`BLXBENCH_PREFER_STORED_ENV`	`1` / `true` — treat `~/.blxbench/config.json` `env` as overriding project `.env` for overlapping keys (same as `preferStoredEnv: true`). `0` / `false` forces the opposite for this process. See Configuration — App config file.
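The interaction between `BLXBENCH_NOTIFY` and the stored `desktopNotify` setting can be summarised as a small decision function. This is a sketch of the precedence as the table describes it, assuming that an unset variable falls back to the stored value; `resolve_notify` is a hypothetical helper, and how `--notify` or `/set notify` combine with these two sources is not covered here.

```shell
# Resolve the effective desktop-notify setting.
# $1 = value of BLXBENCH_NOTIFY (may be empty)
# $2 = stored desktopNotify value from the app config ("true" or "false")
resolve_notify() {
  case "$1" in
    1|true)  echo "on" ;;
    0|false) echo "off" ;;   # forces off even when the config says true
    *)
      # Assumption: with the variable unset, the stored value decides.
      if [ "$2" = "true" ]; then echo "on"; else echo "off"; fi
      ;;
  esac
}
```

For instance, `resolve_notify 0 true` prints `off`, while `resolve_notify "" true` prints `on`.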

Examples

Run all tests (OpenRouter):

blxbench --headless --provider opr --models openai/gpt-5.4-mini

Run specific categories:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --category speed reasoning

Run a full Overall suite and attach the special Roblox category:

OPEN_GAME_EVAL_API_KEY=... OPENAI_API_KEY=... \
blxbench --headless --provider opr --models openai/gpt-5.4-mini \
  --category coding_ui debugging hallucination reasoning refactoring security speed roblox \
  --roblox-llm-name openai \
  --roblox-llm-model-version gpt-5

roblox appears in reports and web breakdowns, but it does not affect Overall score, Overall rank, trends, or best-run selection.
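The Roblox defaults listed under Options (vendor inferred from the model prefix, model version taken from the model id without its provider prefix) can be illustrated as follows. This is a sketch under assumptions: both function names are hypothetical, the `openai/` mapping is a guess that the Options table does not spell out, and inference from the adapter id is not modelled.

```shell
# Guess the Roblox vendor name from a provider-prefixed model id.
infer_roblox_llm_name() {
  case "$1" in
    anthropic/*) echo "claude" ;;
    google/*)    echo "gemini" ;;
    openai/*)    echo "openai" ;;   # assumption: not spelled out in the table
    *)           return 1 ;;
  esac
}

# Default model version: the model id without its provider prefix.
default_roblox_model_version() {
  echo "${1#*/}"
}
```

Under these assumptions, `default_roblox_model_version openai/gpt-5.4-mini` prints `gpt-5.4-mini`, which is why the example above passes `--roblox-llm-model-version gpt-5` explicitly to send a different version string.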

Limit test count:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --limit 5

Upload results:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --submit
