BLXBench Docs

Commands

Complete reference for all blxbench commands.

This reference assumes the blxbench command is available (install globally as @bitslix/blxbench).

Interactive TUI

For the visual interface, see the TUI Guide.

Running Benchmarks

# Start interactive TUI
blxbench

From TUI

The TUI is command-driven. Type /help to list available commands and use Tab to complete suggestions.

| Command | Description |
| --- | --- |
| /show | Show current run configuration |
| /config | Open the config overlay: an interactive editor for stored settings, API keys, and default provider (three tabs; ← / → or Tab to switch, Enter to edit, Esc / q to close) |
| /set provider or /provider | Open the provider list; pick a registered adapter (no free-text args) |
| `/set models [list\|id,id]` or `/models` | Select the model ID(s) to benchmark |
| /set categories or /categories | Open the category checklist, or * = all Overall categories. roblox is opt-in and must be selected explicitly. |
| /set levels or /levels | Open the difficulty checklist, or * = all. Do not type level names after the command. |
| /set limit N | Limit tests per category |
| /set ratelimit RPM-or-off | Throttle provider calls |
| `/set fail-fast [on\|off]` | Stop on first failure |
| `/set report html\|json` | Choose the local report format |
| /set output-dir PATH | Write reports somewhere other than ~/.blxbench/reports |
| `/set notify [on\|off]` or `/notify` | Toggle the desktop notification shown when a run finishes |
| /resume | Open the paused-run list and continue a previously paused benchmark from where it stopped. Navigate with ↑/↓, Enter to resume, Esc to cancel. |
| /save [label] | Write the current run configuration to a JSON snapshot under your user saves directory (optional label → timestamped filename). Does not store provider or web API keys, only settings like provider alias, models, filters, and report options. |
| /load | Open a list of saved snapshots (same navigation as /report list) and replace the active TUI configuration when you press Enter |
| /report list | List recent report.json files and open one to review or upload |
| /report clear | Clear generated reports in the default report directory |
| `/report submit on\|off` | Turn automatic upload of the report after a run on or off |
| /report browser install | Install the optional native Report Browser desktop app (@bitslix/blxbench-report-browser) via npm |
| /report browser open | Launch the installed native Report Browser |
| /report browser uninstall | Remove the installed native Report Browser |
| /auth login, /auth logout, /auth whoami | Manage web account credentials |
| /usage | Usage overlay: account summary, subscription / pass line, and weekly public bench quota per model (peak model vs cap, UTC reset from the signed-in site’s /api/cli/me). Admin is shown as unlimited when the server omits limits. Esc / Enter / q closes. Requires an active sign-in (/auth). |
| /pass | Alias for /usage (same overlay; the website’s /pass page is separate and covers pricing and checkout). |
| /arcade | Open Arcade: minigame select from the shell (Esc / q to close). |
| /playwright status/install/uninstall | Manage Playwright Chromium |
| /run | Start the benchmark |

During a run (TUI)

On the run dashboard, the line above the progress bar shows the active model and current test. The "· run $…" suffix at the end of that line is a live running total of the estimated API cost for all tests completed so far in this run (see the During a benchmark section of the TUI guide).

While the benchmark is executing tests:

| Key | Action |
| --- | --- |
| a | Open Arcade minigames; the run continues in the background, and Esc / q closes the overlay |
| p | Pause the run after the current test finishes; a pause snapshot is written so the run can be resumed later with /resume |

When the run is paused (snapshot written):

| Key | Action |
| --- | --- |
| r | Resume the run immediately in the same session (skips the shell; resumes from the saved index) |
| q / Esc | Return to the shell (snapshot is kept; resume later with /resume) |

After a run (TUI)

When a benchmark finishes, the run dashboard shows a per-model summary, the log, and the path to your local report. Unless you use /set report json, the TUI highlights the Report HTML: line, which points to index.html in the run folder (the browser-friendly view). Machine-readable data is still written to report.json in the same directory (used for replay in the TUI and for uploads).

While you are on the run screen:

  • If desktop notifications are enabled (/set notify, BLXBENCH_NOTIFY, or ~/.blxbench/config.json), the OS may show a hint when the run completes — useful for long jobs while the terminal is in the background. Cancelled runs do not notify.
  • d — Open the report details view (same read-only replay as /report list): full text summary, optional charts, and upload — without starting a new run
  • q or Esc — Return to the command shell
  • s or r — Manually upload the generated report.json to the public leaderboard (independent of /report submit on|off). Requires sign-in and a Scout, Bencher, Founder, or Admin role for public submit.

If auto-upload was off or failed, use s / r to try again. A duplicate run_id is rejected by the server (HTTP 409) — you need a new run to create a new public entry.

The server only accepts eligible reports for the public leaderboard: full runs with no category / level / per-category limit filters, no --limit-style limited runs, and no fail-fast / exit-early runs. Runs that add roblox to a full run are accepted, but Roblox results are marked as a special category and excluded from Overall. Filtered or partial runs are still useful locally; use /show to confirm options before you rely on /report submit or s/r. Details: Leaderboard — Public submission rules.

/report list lists recent reports under the same directory the runner writes to: the default ~/.blxbench/reports/ (or the path from /set output-dir). Use ↑/↓ (or j/k) to move through the list and Enter to open the report replay view. There you can toggle charts with c, cycle models with m when the run has multiple models, upload with s / r, and return with q / Esc. See the Report replay section of the TUI guide for screenshots.

See TUI for the full walkthrough.

Headless Mode

blxbench runs without the TUI automatically when stdout is not a TTY; pass --headless to force this mode. Options are passed directly on the command line; there is no run subcommand:

blxbench --headless --provider <alias> --models <model-id> [more-model-ids...]

See Headless Mode for CI/CD integration.

Options

| Flag | Description | Default |
| --- | --- | --- |
| --provider | Provider alias | opr (OpenRouter) |
| --models | Model ID(s) | (required) |
| --api-key | Sets BLXBENCH_API_KEY for this process | (none) |
| --tests-dir | Path to tests directory | Built-in tests |
| --category | Filter categories. Defaults to all Overall categories; include roblox explicitly for Roblox OpenGameEval. | All except roblox |
| --level | Filter difficulty | All |
| --limit | Max tests per category | All |
| --save-json | Output JSON path | Auto |
| --fail-fast | Stop on first failure | false |
| --ratelimit | Requests per minute | 7 (when flag has no value) |
| --dotenv-path | Custom .env file | .env |
| --clear | Clear the default report directory | false |
| --install-chromium | Install Playwright | false |
| --skip-render-validation | Skip UI render stage for coding_ui | false |
| --submit | Upload report after run | false |
| --notify | Request a desktop notification when the run finishes (also BLXBENCH_NOTIFY and app config) | false |
| --local-inference-port | lms / oll only: HTTP port on 127.0.0.1 for chat + model list endpoints | Adapter default (1234 / 11434) |
| --roblox-adapter | Roblox OpenGameEval backend alias (separate from --provider; default rbx) | rbx |
| --roblox-llm-name | Override Roblox custom_llm_info vendor: openai, claude, or gemini | Inferred from LLM adapter id (e.g. OpenAI) or model prefix (anthropic/…, google/…) |
| --roblox-llm-model-version | Model version sent to Roblox, for example gpt-5 or claude-sonnet-4-5-20250929 | Selected model id without provider prefix |
| --roblox-max-concurrent | Max concurrent Roblox OpenGameEval jobs | 1 |
| --roblox-poll-interval | Poll interval in seconds for Roblox eval records | 10 |
| --roblox-timeout | Timeout in seconds per Roblox eval job | 900 |
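
For scripted use, it can help to assemble the headless invocation in one place. The following is a minimal sketch of such a wrapper, not part of blxbench itself: the function name run_bench, the DRY_RUN variable, the rate limit of 30, and the ./reports/ci-report.json path are all illustrative assumptions. Only the flags themselves come from the table above.

```shell
# Sketch of a CI wrapper around the documented headless flags.
# DRY_RUN is a variable of this script only, not a blxbench setting.
run_bench() {
  provider="$1"   # provider alias, e.g. opr
  shift           # remaining arguments are model ids
  set -- blxbench --headless --provider "$provider" --models "$@" \
    --ratelimit 30 --save-json ./reports/ci-report.json
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"     # print the assembled command instead of running it
  else
    "$@"          # run blxbench with the assembled arguments
  fi
}
```

Setting DRY_RUN=1 before calling run_bench opr openai/gpt-5.4-mini prints the command line without contacting any provider, which is a cheap way to review the options before a real run.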

Utility Commands

Version

blxbench --version
blxbench -V

Prints blxbench <semver>. The TUI footer shows the same version as v<semver>.

Clear Results

blxbench --headless --clear

Removes generated artifacts while preserving ranking files.

By default, reports live in ~/.blxbench/reports/ on Linux/macOS and %USERPROFILE%\.blxbench\reports\ on Windows.
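
Because each run folder holds a report.json, the reports on disk can be enumerated with standard tools. This is a small sketch for Linux/macOS; the function name list_reports is hypothetical, and it assumes only that report.json files sit somewhere below the report directory.

```shell
# List every report.json below a report directory.
# Defaults to the documented location; pass another path
# if reports were written elsewhere (for example via /set output-dir).
list_reports() {
  dir="${1:-$HOME/.blxbench/reports}"
  find "$dir" -type f -name report.json
}
```

Calling list_reports with no argument scans the default directory; running it before blxbench --headless --clear shows which reports are currently stored there.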

Install Chromium

blxbench --headless --install-chromium

Downloads Playwright Chromium for UI rendering tests.

In the TUI, the same setup is available as /playwright install. Use /playwright status to check whether Chromium is already detected.

Environment Variables

| Variable | Description |
| --- | --- |
| OPENROUTER_API_KEY | OpenRouter (opr) |
| OPENAI_API_KEY | OpenAI adapter (oai) |
| ANTHROPIC_API_KEY | Claude key used by Roblox OpenGameEval when --roblox-llm-name claude |
| GEMINI_API_KEY | Gemini key used by Roblox OpenGameEval when --roblox-llm-name gemini |
| LLM_API_KEY | Optional explicit LLM key override for Roblox OpenGameEval |
| OPEN_GAME_EVAL_API_KEY | Roblox OpenGameEval backend key (rbx) |
| HF_TOKEN | Hugging Face (hgf) |
| TOGETHER_API_KEY | Together (tgr) |
| PORTKEY_API_KEY | Portkey (ptk) |
| CLOUDFLARE_API_TOKEN | Cloudflare (cfr) |
| BLXBENCH_API_KEY | BLXBench API key for headless submit |
| BLXBENCH_SUBMIT | Set to 1 or true to upload after a headless run |
| BLXBENCH_NOTIFY | 1 / true: show an OS desktop hint when a run finishes (TUI or headless). 0 / false: force off for this process, even if ~/.blxbench/config.json has desktopNotify: true. See Configuration. |
| BLXBENCH_AUTOSAVE_SEC | TUI only: interval in seconds for overwriting autosave.json in your saves directory; 0 disables autosave. When unset, the CLI uses a short default. |
| BLXBENCH_PREFER_STORED_ENV | 1 / true: treat ~/.blxbench/config.json env as overriding project .env for overlapping keys (same as preferStoredEnv: true). 0 / false forces the opposite for this process. See the App config file section of Configuration. |
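
As an illustration of how these variables combine for an unattended run, the sketch below sets the two behaviour toggles in one helper. The function name setup_blxbench_env is hypothetical; the provider key and BLXBENCH_API_KEY are assumed to be injected separately by your CI secret store and are deliberately not set here.

```shell
# Sketch: toggles for an unattended headless run that uploads its report.
# Secrets (OPENROUTER_API_KEY, BLXBENCH_API_KEY, ...) are expected to be
# provided by the environment and are not touched by this helper.
setup_blxbench_env() {
  export BLXBENCH_SUBMIT=1   # upload after the headless run
  export BLXBENCH_NOTIFY=0   # force desktop notifications off for this process
}
```

After calling setup_blxbench_env, a plain blxbench --headless --provider opr --models openai/gpt-5.4-mini should upload its report without needing the --submit flag, assuming the API key is present.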

Examples

Run all tests (OpenRouter):

blxbench --headless --provider opr --models openai/gpt-5.4-mini

Run specific categories:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --category speed reasoning

Run a full Overall suite and attach the special Roblox category:

OPEN_GAME_EVAL_API_KEY=... OPENAI_API_KEY=... \
blxbench --headless --provider opr --models openai/gpt-5.4-mini \
  --category coding_ui debugging hallucination reasoning refactoring security speed roblox \
  --roblox-llm-name openai \
  --roblox-llm-model-version gpt-5

roblox appears in reports and web breakdowns, but it does not affect Overall score, Overall rank, trends, or best-run selection.

Limit test count:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --limit 5

Upload results:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --submit
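
The public leaderboard only accepts full, unfiltered runs (see the submission rules earlier on this page), so a script may want to check its own arguments before adding --submit. The helper below is a conservative sketch under that reading of the rules: the name is_public_eligible is hypothetical, it matches only the exact flag spellings --level, --limit, and --fail-fast, and it does not inspect --category, since a full category list with roblox added remains eligible and needs a manual check.

```shell
# Return 0 when no documented filter flag is present, 1 otherwise.
# This mirrors the stated eligibility rules; the server makes the final call.
is_public_eligible() {
  for arg in "$@"; do
    case "$arg" in
      --level|--limit|--fail-fast) return 1 ;;
    esac
  done
  return 0
}
```

For example, is_public_eligible --headless --provider opr --models openai/gpt-5.4-mini succeeds, while the same call with --limit 5 appended fails, signalling that the run is useful locally but should not be submitted.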
