Configuration
Configure blxbench via files, environment variables, and flags.
The CLI is distributed as the npm package @bitslix/blxbench; configuration below applies to the blxbench command in your shell.
Configuration Files
.env File
BLXBench loads variables from .env in the current directory or the path given by --dotenv-path, and merges ~/.blxbench/config.json env according to the rules in App config file.
# Example — use the env vars required by your chosen adapter (values are not shown here)
OPENROUTER_API_KEY=
OPENAI_API_KEY=
OPEN_GAME_EVAL_API_KEY=
LMSTUDIO_API_KEY=
OLLAMA_API_KEY=
BLXBENCH_API_KEY=
Provider API keys are used locally to call model APIs. OPEN_GAME_EVAL_API_KEY is the Roblox OpenGameEval backend key for the optional roblox category (not your --provider). BLXBENCH_API_KEY is a separate web-app key used only when uploading reports with --submit or BLXBENCH_SUBMIT=1.
Results Directory
By default, generated reports are saved to ~/.blxbench/reports/ (%USERPROFILE%\.blxbench\reports on Windows). This path is used by both the installed native binary and local development builds, so reports do not depend on the current working directory being writable.
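As a rough illustration of the default location described above, the following Python sketch resolves the reports directory and lists whatever files it contains, newest first. The helper names default_reports_dir and newest_first are hypothetical and not part of blxbench; the sketch only assumes the ~/.blxbench/reports path stated in the text.

```python
from pathlib import Path


def default_reports_dir(home=None):
    """Return <home>/.blxbench/reports; Path.home() covers %USERPROFILE% on Windows."""
    base = Path(home) if home is not None else Path.home()
    return base / ".blxbench" / "reports"


def newest_first(directory):
    """List regular files in `directory`, most recently modified first."""
    directory = Path(directory)
    if not directory.is_dir():
        return []  # nothing has been written yet
    files = [p for p in directory.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in newest_first(default_reports_dir()):
        print(report)
```

Running the script prints nothing if no report has been generated yet.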
In headless mode, --save-json writes an additional JSON export to a custom path:
blxbench --headless --save-json ./custom-results.json
In the TUI, use /set output-dir PATH to change the report directory for the interactive run.
Session snapshots and autosave (TUI)
Session configuration snapshots (slash commands /save and /load) are stored under a separate directory from reports:
- Linux / macOS: ~/.blxbench/saves/
- Windows: %USERPROFILE%\.blxbench\saves\
Each manual /save writes a new JSON file (unless you rely on autosave only). Autosave repeatedly overwrites a single file named autosave.json in that folder.
| Variable | Purpose |
|---|---|
| BLXBENCH_AUTOSAVE_SEC | Interval in seconds between autosave writes while the interactive shell is open. Set to 0 to disable. When unset, the CLI uses a built-in default. |
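The three cases in the table (a positive interval, 0 to disable, unset for the default) can be sketched in Python as below. This is only an illustration: the default of 60 seconds is a placeholder because the real built-in value is not stated here, and treating invalid or negative values the way the sketch does is an assumption.

```python
# Placeholder only; the CLI's actual built-in default is not documented on this page.
DEFAULT_AUTOSAVE_SEC = 60


def autosave_interval(env, default=DEFAULT_AUTOSAVE_SEC):
    """Return the autosave interval in seconds, or None when autosave is disabled."""
    raw = env.get("BLXBENCH_AUTOSAVE_SEC", "").strip()
    if raw == "":
        return default  # unset: fall back to the built-in default
    try:
        seconds = int(raw)
    except ValueError:
        return default  # assumption: unparsable values behave like unset
    if seconds <= 0:
        return None  # 0 disables autosave (negative values are assumed to as well)
    return seconds
```

Passing os.environ as env reads the setting of the current process.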
Snapshot files contain run settings only (validated when loaded). They do not contain provider API keys or BLXBench account secrets — those stay in .env, config.json → env, the shell environment, and the credentials file used by /auth.
App config file
The CLI reads ~/.blxbench/config.json (Unix) or %USERPROFILE%\.blxbench\config.json (Windows) at startup and merges it with .env files into process.env before the TUI or headless run starts. See Configuration priority below for merge rules.
| Field | Purpose |
|---|---|
| version | File format version (written by the CLI; you can omit when hand-editing). |
| desktopNotify | When true, the CLI may show an OS-level notification when a benchmark run finishes (TUI or headless). Does not run for cancelled runs. Change from the TUI with /set notify on / /notify off, or edit the JSON. |
| env | Map of extra environment variables (typically provider API keys such as OPENROUTER_API_KEY or OPEN_GAME_EVAL_API_KEY). Values are merged into the process environment. Suitable for secrets you want outside a project .env. The file is chmod 600 when it holds non-empty env entries. |
| preferStoredEnv | When true, non-empty keys in env override values loaded from .env files for those keys (shell exports still win). When false (default), env only fills keys that are still missing after loading .env. |
| ignoreWorkspaceDotenv | When true, the CLI does not read ./.env in the current working directory (package-adjacent .env for the installed CLI may still load). Use this if you want ~/.blxbench/config.json (or shell exports) to be the only file-based source for overlapping keys. |
To apply the same “prefer stored” behavior to a single process, set BLXBENCH_PREFER_STORED_ENV=1 (or 0 to force it off for that run). When preferStoredEnv is unset in the file, this variable acts in its place.
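One possible reading of how the per-process variable and the file field combine is sketched below in Python. The function name prefer_stored is hypothetical, and the precedence shown (the variable wins whenever it is 1 or 0, otherwise the file value applies) is an assumption drawn from the "force off for that run" wording, not a confirmed implementation detail.

```python
def prefer_stored(env, config):
    """Resolve the effective "prefer stored" setting for one process.

    env    -- mapping of environment variables (for example os.environ)
    config -- parsed contents of config.json as a dict
    """
    override = env.get("BLXBENCH_PREFER_STORED_ENV", "").strip()
    if override == "1":
        return True  # forced on for this run
    if override == "0":
        return False  # forced off for this run
    # Assumption: any other value is ignored and the file setting applies.
    return bool(config.get("preferStoredEnv", False))
```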
First-run setup (welcome): If OPENROUTER_API_KEY is missing (required for the default opr provider), the TUI shows a setup screen first: enter the key once, masked, and it is saved under config.env.OPENROUTER_API_KEY. After that you go through sign-in as usual. You can also create env yourself or rely on project .env — see installation.
When you enable the optional roblox category in the TUI and OPEN_GAME_EVAL_API_KEY is missing, the same masked setup flow stores the key under config.env.OPEN_GAME_EVAL_API_KEY.
When you choose a local LLM adapter with an optional API key (lms, oll) and that key is not in the environment, the TUI may prompt once; you can skip if you run locally without auth — see TUI.
Project vs stored keys: If both ./.env and config.env define the same known adapter key, such as OPENROUTER_API_KEY or OPEN_GAME_EVAL_API_KEY (and you have not set preferStoredEnv, ignoreWorkspaceDotenv, or BLXBENCH_PREFER_STORED_ENV), the TUI may ask whether to prefer the project file, the stored config, or to skip the workspace .env. That choice updates config.json accordingly.
Example:
{
"version": 2,
"desktopNotify": true,
"env": {
"OPENROUTER_API_KEY": "sk-or-...",
"OPEN_GAME_EVAL_API_KEY": "..."
},
"preferStoredEnv": false,
"ignoreWorkspaceDotenv": false
}
Desktop notifications
Notifications are best-effort and depend on the desktop session (no guarantee over SSH without a notification daemon, etc.). Evaluation order:
- If BLXBENCH_NOTIFY is 0, false, no, or off → no notification for this process.
- If BLXBENCH_NOTIFY is 1, true, yes, or on → notify when a run completes.
- Else if config.json has desktopNotify: true → notify when a run completes.
- Else if headless was started with --notify → notify when a run completes.
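The evaluation order above can be expressed as a small Python function. This is a sketch of the documented order only: should_notify is a hypothetical name, and matching the values case-insensitively is an assumption.

```python
OFF_VALUES = {"0", "false", "no", "off"}
ON_VALUES = {"1", "true", "yes", "on"}


def should_notify(env, config, notify_flag=False):
    """Decide whether to notify on completion, following the documented order."""
    raw = env.get("BLXBENCH_NOTIFY", "").strip().lower()  # case-insensitivity is assumed
    if raw in OFF_VALUES:
        return False  # an explicit "off" wins over everything else
    if raw in ON_VALUES:
        return True
    if config.get("desktopNotify") is True:
        return True  # config.json opt-in
    return bool(notify_flag)  # headless --notify, else no notification
```

Note that an explicit off value suppresses notifications even when config.json or --notify would enable them.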
The implementation uses the platform’s native mechanisms (no extra npm packages). Details: Commands ( --notify, BLXBENCH_NOTIFY ).
Provider Configuration
The installed CLI includes the official provider adapters in the native bundle. In the source repo they live under packages/benchmark-core/adapters/. Each adapter exposes a provider alias (argument in meta.json):
| Alias | Adapter | Typical env var |
|---|---|---|
| opr | OpenRouter | OPENROUTER_API_KEY |
| oai | OpenAI | OPENAI_API_KEY |
| hgf | Hugging Face | HF_TOKEN |
| tgr | Together | TOGETHER_API_KEY |
| ptk | Portkey | PORTKEY_API_KEY |
| cfr | Cloudflare | CLOUDFLARE_API_TOKEN |
| lms | LM Studio | LMSTUDIO_API_KEY (optional) |
| oll | Ollama | OLLAMA_API_KEY (optional) |
The default --provider is opr. Model ids are whatever that endpoint accepts (e.g. OpenRouter-style vendor/model-name).
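Before starting a run, it can be handy to check that the variable for the chosen alias is set. The Python sketch below encodes the table above; missing_key is a hypothetical helper, and treating every variable not marked optional as required is an assumption (the table only calls them typical).

```python
# alias -> (adapter name, typical env var, assumed required?)
PROVIDERS = {
    "opr": ("OpenRouter", "OPENROUTER_API_KEY", True),
    "oai": ("OpenAI", "OPENAI_API_KEY", True),
    "hgf": ("Hugging Face", "HF_TOKEN", True),
    "tgr": ("Together", "TOGETHER_API_KEY", True),
    "ptk": ("Portkey", "PORTKEY_API_KEY", True),
    "cfr": ("Cloudflare", "CLOUDFLARE_API_TOKEN", True),
    "lms": ("LM Studio", "LMSTUDIO_API_KEY", False),  # optional per the table
    "oll": ("Ollama", "OLLAMA_API_KEY", False),  # optional per the table
}


def missing_key(alias, env):
    """Return the env var name that looks required but is unset, else None."""
    _name, var, required = PROVIDERS[alias]
    if required and not env.get(var, "").strip():
        return var
    return None
```

For example, missing_key("opr", os.environ) returns "OPENROUTER_API_KEY" when that variable is empty or absent in the current shell.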
For local LM Studio runs, start the LM Studio server first (default chat API on TCP port 1234). The bundled lms adapter talks only to 127.0.0.1; override the port with --local-inference-port (headless) or /local-inference-port in the TUI if your server listens elsewhere. Then use --provider lms --models <model-key>. LMSTUDIO_API_KEY is only needed if you enabled API token authentication in LM Studio Server Settings. The same alias can be used for generated summaries and Coding/UI judge validation: SUMMARY_PROVIDER=lms and VALIDATION_PROVIDER=lms.
For Ollama, run ollama serve (default port 11434). Requests use 127.0.0.1 only; override the port like LM Studio when needed. Then --provider oll --models <name> with a model from ollama list / GET /api/tags. OLLAMA_API_KEY is optional locally; use it for Ollama Cloud or authenticated remote endpoints. Summaries and judge validation: SUMMARY_PROVIDER=oll / VALIDATION_PROVIDER=oll. Submitted report.json carries provider_mode: local and alias oll, so the leaderboard shows the same local badges as LM Studio.
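Because both local adapters talk only to 127.0.0.1, a quick TCP check can confirm that the server is listening before a run. The Python sketch below uses the default ports named above (1234 for LM Studio, 11434 for Ollama). The function local_server_reachable is hypothetical, and an open port does not prove that the expected API is behind it.

```python
import socket

# Default ports taken from the text above.
DEFAULT_PORTS = {"lms": 1234, "oll": 11434}


def local_server_reachable(alias, port=None, timeout=1.0):
    """Return True if something accepts TCP connections on 127.0.0.1:<port>."""
    target = port if port is not None else DEFAULT_PORTS[alias]
    try:
        with socket.create_connection(("127.0.0.1", target), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or otherwise unreachable


if __name__ == "__main__":
    for alias in DEFAULT_PORTS:
        print(alias, local_server_reachable(alias))
```

Pass port explicitly when the server listens on a non-default port, matching the value you give to --local-inference-port.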
When you pick lms or oll in /provider and the optional env var is unset, the TUI may offer the masked setup screen; you can skip if you do not need a token (see TUI).
Roblox OpenGameEval backend (rbx)
The rbx entry is not an LLM chat endpoint: it is the HTTP backend for the optional roblox test category (Roblox/open-game-eval). Use a normal LLM alias as --provider (opr, oai, …). Enable Roblox tasks with --category roblox (or TUI category selection); --roblox-adapter defaults to rbx and selects that backend’s endpoint and OPEN_GAME_EVAL_API_KEY.
OpenGameEval still calls your LLM via Roblox’s API: set LLM_API_KEY or the provider-specific variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) matching the inferred vendor (openai | claude | gemini). The CLI infers that vendor from the chosen adapter’s meta.json id when possible (e.g. OpenAI), otherwise from the model id prefix (anthropic/…, google/…). Override with --roblox-llm-name / --roblox-llm-model-version when needed.
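The vendor inference described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the CLI's code: infer_vendor and llm_key are hypothetical names, only the two model id prefixes named in the text are mapped, and preferring LLM_API_KEY over the vendor-specific variable is an assumed order.

```python
VENDOR_ENV = {
    "openai": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

# Only the prefixes named in the text; any further prefixes are unknown here.
MODEL_PREFIX_VENDOR = {"anthropic/": "claude", "google/": "gemini"}


def infer_vendor(adapter_id, model_id):
    """Infer openai | claude | gemini from the adapter id, else the model id prefix."""
    if adapter_id.strip().lower() == "openai":
        return "openai"
    for prefix, vendor in MODEL_PREFIX_VENDOR.items():
        if model_id.startswith(prefix):
            return vendor
    return None  # caller would need --roblox-llm-name to override


def llm_key(env, vendor):
    """Return LLM_API_KEY if set, else the vendor-specific key (assumed order)."""
    generic = env.get("LLM_API_KEY")
    if generic:
        return generic
    specific = VENDOR_ENV.get(vendor)
    return env.get(specific) if specific else None
```

When infer_vendor returns None, the overrides --roblox-llm-name and --roblox-llm-model-version are the documented way to state the vendor explicitly.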
Test Filters
Categories
Filter tests by category (folder names under the active suite directory — e.g. packages/benchmark-core/tests/ for v1 or packages/benchmark-core/suites/v2/tests/ for v2):
| Category | Role |
|---|---|
| speed | Latency-sensitive correctness |
| security | Safe outputs and vulnerability awareness |
| reasoning | Structured / numeric reasoning |
| debugging | Small patches and bug fixes |
| refactoring | Behavior-preserving edits |
| hallucination | Grounding under tricky prompts |
| coding | Executable JavaScript function tasks scored by hidden tests |
| ui | Single-file HTML/UI artifacts with render + judge validation |
| coding_ui | HTML artifacts + optional Playwright render |
| roblox | Optional Roblox OpenGameEval suite; visible in breakdowns, excluded from Overall |
Difficulty Levels
Filter by difficulty:
| Level | Description |
|---|---|
| easy | Lighter fixtures |
| medium | Representative difficulty |
| hard | Stricter / longer tasks |
Legacy German labels (easy, …) are normalized to these ids.
Advanced Options
Playwright Configuration
For coding_ui (and other HTML render checks), Playwright Chromium should be installed:
blxbench --headless --install-chromium
In the TUI, use:
/playwright install
/playwright status
Playwright stores Chromium in its normal per-user browser cache:
| OS | Default cache |
|---|---|
| Linux | ~/.cache/ms-playwright |
| macOS | ~/Library/Caches/ms-playwright |
| Windows | %LOCALAPPDATA%\ms-playwright |
Skip render validation if Chromium is missing:
blxbench --headless --skip-render-validation
Rate Limiting
| Value | Behavior |
|---|---|
| Unset | No rate limiting |
| --ratelimit | Default RPM |
| --ratelimit 30 | Custom RPM |
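To make the RPM figure concrete: a limit of 30 requests per minute corresponds to at least 60 / 30 = 2 seconds between request starts if requests are spaced evenly. The Python sketch below shows that one strategy. How blxbench actually enforces the limit is not documented here, and the class RpmLimiter is hypothetical.

```python
import time


class RpmLimiter:
    """Space calls evenly so that at most `rpm` of them start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm  # seconds between request starts
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request may start, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call wait() immediately before each request. The injectable clock and sleep arguments exist only so the behavior can be checked without real delays.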
Fail Fast
Stop on first test failure:
blxbench --headless --fail-fast
Configuration Priority
Environment merge (provider keys, etc.): For each variable name, a non-empty value already exported in the shell always wins. Otherwise the CLI merges, in order:
1. .env next to the installed CLI package (fills keys).
2. ./.env in the current working directory (fills keys the first pass did not set), unless ignoreWorkspaceDotenv is true.
3. config.json → env: either only fills gaps (default), or overrides the file merge when preferStoredEnv is true or BLXBENCH_PREFER_STORED_ENV=1.
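The merge order above can be sketched as a pure Python function over dictionaries. This is a simplified model for reasoning about which value wins, not the CLI's implementation. The function name merge_env and its parameters are hypothetical, and skipping empty values in every source is an assumption.

```python
def merge_env(shell, package_dotenv, workspace_dotenv, stored_env,
              prefer_stored=False, ignore_workspace=False):
    """Return the effective environment for the documented merge order."""
    merged = {}
    # 1. .env next to the installed CLI package fills keys.
    for key, value in package_dotenv.items():
        if value:
            merged[key] = value
    # 2. ./.env fills keys the first pass did not set, unless ignored.
    if not ignore_workspace:
        for key, value in workspace_dotenv.items():
            if value and key not in merged:
                merged[key] = value
    # 3. config.json env fills gaps, or overrides the file merge when preferred.
    for key, value in stored_env.items():
        if value and (prefer_stored or key not in merged):
            merged[key] = value
    # A non-empty value already exported in the shell always wins.
    for key, value in shell.items():
        if value:
            merged[key] = value
    return merged
```

With prefer_stored=False, a key defined in both ./.env and config.json resolves to the ./.env value; with prefer_stored=True it resolves to the stored value, and a shell export beats both.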
Run options: built-in defaults → merged process.env → CLI flags where the command-line applies (flags win for that invocation).
Custom Tests Directory
Use a custom test tree:
blxbench --headless --tests-dir ./my-tests --provider opr --models openai/gpt-5.4-mini
Your directory should mirror the fixture layout expected by benchmark-core. See Our Tests for how catalog entries map to files.