BLXBench - Configuration

The CLI is distributed as the npm package @bitslix/blxbench; configuration below applies to the blxbench command in your shell.

Configuration Files

.env File

BLXBench loads variables from a .env file in the current directory (or from the path given by --dotenv-path) and merges in the env map from ~/.blxbench/config.json, following the rules in App config file and Configuration Priority.

# Example — use the env vars required by your chosen adapter (values are not shown here)
OPENROUTER_API_KEY=
OPENAI_API_KEY=
OPEN_GAME_EVAL_API_KEY=
LMSTUDIO_API_KEY=
OLLAMA_API_KEY=
BLXBENCH_API_KEY=

Provider API keys are used locally to call the model APIs. OPEN_GAME_EVAL_API_KEY is the key for the Roblox OpenGameEval backend used by the optional roblox category; it is not the key for the LLM provider you select with --provider. BLXBENCH_API_KEY is a separate key for the BLXBench web app and is used only when uploading reports with --submit or BLXBENCH_SUBMIT=1.

Results Directory

By default, generated reports are saved to ~/.blxbench/reports/ (%USERPROFILE%\.blxbench\reports on Windows). This path is used by both the installed native binary and local development builds, so reports do not depend on the current working directory being writable.
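If a script needs these locations (for example, to collect reports after a run), the hypothetical helper below shows how they relate: reports, the saves/ folder used by /save and autosave, and config.json (both described below) all sit under one per-user .blxbench directory. The function name and parameters are illustrative only; the home directory is passed in rather than detected so the sketch stays platform-neutral.

```typescript
// Hypothetical helper (not part of blxbench): derive the per-user BLXBench
// locations from a home directory. On Windows, pass %USERPROFILE%.
type Platform = "linux" | "darwin" | "win32";

interface BlxbenchPaths {
  root: string;     // ~/.blxbench
  reports: string;  // default report directory
  saves: string;    // /save and /load snapshots
  autosave: string; // the single file that autosave overwrites
  config: string;   // app config file
}

function blxbenchPaths(platform: Platform, homeDir: string): BlxbenchPaths {
  // Windows paths use backslashes; Linux and macOS use forward slashes.
  const sep = platform === "win32" ? "\\" : "/";
  const join = (...parts: string[]): string => parts.join(sep);
  const root = join(homeDir, ".blxbench");
  return {
    root,
    reports: join(root, "reports"),
    saves: join(root, "saves"),
    autosave: join(root, "saves", "autosave.json"),
    config: join(root, "config.json"),
  };
}
```

For example, blxbenchPaths("win32", "C:\\Users\\ana").reports yields C:\Users\ana\.blxbench\reports.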

In headless mode, --save-json writes an additional JSON export to a custom path:

blxbench --headless --save-json ./custom-results.json

In the TUI, use /set output-dir PATH to change the report directory for the interactive run.

Session snapshots and autosave (TUI)

Session configuration snapshots (slash commands /save and /load) are stored under a separate directory from reports:

Linux / macOS: ~/.blxbench/saves/
Windows: %USERPROFILE%\.blxbench\saves\

Each manual /save writes a new JSON file. Autosave, by contrast, repeatedly overwrites a single file named autosave.json in the same folder.

Variable	Purpose
`BLXBENCH_AUTOSAVE_SEC`	Interval in seconds between autosave writes while the interactive shell is open. Set to `0` to disable. When unset, the CLI uses a built-in default.
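The table above implies three cases for BLXBENCH_AUTOSAVE_SEC: unset, 0, and a number of seconds. A minimal sketch of a wrapper script interpreting the variable the same way follows; the function name, the millisecond result, and the handling of invalid or negative values are assumptions, and the built-in default is a parameter because this page does not state it.

```typescript
// Hypothetical sketch (not blxbench's code) of the BLXBENCH_AUTOSAVE_SEC rules.
// Returns the autosave interval in milliseconds, or null when autosave is off.
function autosaveIntervalMs(raw: string | undefined, builtInDefaultSec: number): number | null {
  // Unset (or blank): use the built-in default.
  if (raw === undefined || raw.trim() === "") return builtInDefaultSec * 1000;
  const seconds = Number(raw);
  // 0 disables autosave.
  if (seconds === 0) return null;
  // Assumption: non-numeric or negative values fall back to the default.
  if (!Number.isFinite(seconds) || seconds < 0) return builtInDefaultSec * 1000;
  return seconds * 1000;
}
```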

Snapshot files contain run settings only (validated when loaded). They do not contain provider API keys or BLXBench account secrets — those stay in .env, config.json → env, the shell environment, and the credentials file used by /auth.

App config file

The CLI reads ~/.blxbench/config.json (Linux / macOS) or %USERPROFILE%\.blxbench\config.json (Windows) at startup and merges its env map with your .env files into process.env before the TUI or headless run starts. See Configuration Priority below for the merge rules.

Field	Purpose
`version`	File format version (written by the CLI; you can omit when hand-editing).
`desktopNotify`	When `true`, the CLI may show an OS-level notification when a benchmark run finishes (TUI or headless). Cancelled runs never trigger a notification. Change it from the TUI with `/set notify on` / `/notify off`, or edit the JSON.
`env`	Map of extra environment variables (typically provider API keys such as `OPENROUTER_API_KEY` or `OPEN_GAME_EVAL_API_KEY`) that are merged into the process environment. Use it for secrets you want to keep outside a project `.env`. When `env` holds non-empty entries, the CLI sets the file's permissions to `600` (read/write for the owner only).
`preferStoredEnv`	When `true`, non-empty keys in `env` override values loaded from `.env` files for those keys (shell exports still win). When `false` (default), `env` only fills keys that are still missing after loading `.env`.
`ignoreWorkspaceDotenv`	When `true`, the CLI does not read `./.env` in the current working directory (package-adjacent `.env` for the installed CLI may still load). Use this if you want `~/.blxbench/config.json` (or shell exports) to be the only file-based source for overlapping keys.

You can apply the same “prefer stored” behavior to a single process with BLXBENCH_PREFER_STORED_ENV=1, or force it off for that run with BLXBENCH_PREFER_STORED_ENV=0. When the variable is unset, the preferStoredEnv value in config.json applies.

First-run setup (welcome): If OPENROUTER_API_KEY is missing (it is required for the default opr provider), the TUI first shows a setup screen where you enter the key once in a masked field; it is saved under config.env.OPENROUTER_API_KEY. Sign-in then proceeds as usual. Alternatively, add the key to the env map yourself or rely on a project .env (see Installation).

When you enable the optional roblox category in the TUI and OPEN_GAME_EVAL_API_KEY is missing, the same masked setup flow stores the key under config.env.OPEN_GAME_EVAL_API_KEY.

When you choose a local LLM adapter whose API key is optional (lms, oll) and that key is not in the environment, the TUI may prompt for it once; you can skip the prompt if your local server runs without authentication (see TUI).

Project vs stored keys: If both ./.env and config.env define the same known adapter key, such as OPENROUTER_API_KEY or OPEN_GAME_EVAL_API_KEY, and you have not set preferStoredEnv, ignoreWorkspaceDotenv, or BLXBENCH_PREFER_STORED_ENV, the TUI may ask whether to prefer the project file, prefer the stored config, or skip the workspace .env. Your answer updates config.json accordingly.

Example:

{
  "version": 2,
  "desktopNotify": true,
  "env": {
    "OPENROUTER_API_KEY": "sk-or-...",
    "OPEN_GAME_EVAL_API_KEY": "..."
  },
  "preferStoredEnv": false,
  "ignoreWorkspaceDotenv": false
}
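If you generate or lint config.json from a script, a typed view of the fields above can catch mistakes such as true written as the string "true". The sketch below is hypothetical: the interface mirrors the documented fields, but rejecting wrong types (rather than ignoring them) and silently skipping unknown fields are assumptions, not a description of how the CLI validates the file.

```typescript
// Hypothetical typed view of ~/.blxbench/config.json (not the CLI's validator).
interface BlxbenchConfig {
  version?: number;
  desktopNotify?: boolean;
  env?: Record<string, string>;
  preferStoredEnv?: boolean;
  ignoreWorkspaceDotenv?: boolean;
}

const BOOLEAN_FIELDS = ["desktopNotify", "preferStoredEnv", "ignoreWorkspaceDotenv"] as const;

function parseConfig(text: string): BlxbenchConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const config: BlxbenchConfig = {};

  const version = obj["version"];
  if (typeof version === "number") config.version = version;

  // Assumption: a boolean field with any non-boolean value is an error.
  for (const field of BOOLEAN_FIELDS) {
    const value = obj[field];
    if (typeof value === "boolean") config[field] = value;
    else if (value !== undefined) throw new Error(`${field} must be true or false`);
  }

  const env = obj["env"];
  if (env !== undefined) {
    if (typeof env !== "object" || env === null || Array.isArray(env)) {
      throw new Error("env must be an object of string values");
    }
    const entries: Record<string, string> = {};
    for (const [key, value] of Object.entries(env)) {
      if (typeof value !== "string") throw new Error(`env.${key} must be a string`);
      entries[key] = value;
    }
    config.env = entries;
  }
  // Unknown fields are ignored in this sketch.
  return config;
}
```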

Desktop notifications

Notifications are best-effort and depend on the desktop session; for example, they are not guaranteed over SSH without a notification daemon. Evaluation order:

If BLXBENCH_NOTIFY is 0, false, no, or off → no notification for this process.
If BLXBENCH_NOTIFY is 1, true, yes, or on → notify when a run completes.
Else if config.json has desktopNotify: true → notify when a run completes.
Else if headless was started with --notify → notify when a run completes.
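The evaluation order above can be read as a small decision function. This is a sketch under stated assumptions, not the CLI's source: the input names are hypothetical, BLXBENCH_NOTIFY is matched case-insensitively, and an unrecognized BLXBENCH_NOTIFY value falls through to the later rules. The cancelled check reflects the desktopNotify note that cancelled runs do not notify.

```typescript
// Hypothetical sketch of the notification evaluation order (not blxbench's source).
const NOTIFY_OFF = new Set(["0", "false", "no", "off"]);
const NOTIFY_ON = new Set(["1", "true", "yes", "on"]);

interface NotifyInputs {
  blxbenchNotify?: string;     // raw BLXBENCH_NOTIFY value, if set
  desktopNotify?: boolean;     // desktopNotify from config.json
  headlessNotifyFlag: boolean; // headless run started with --notify
  cancelled: boolean;          // cancelled runs never notify
}

function shouldNotify(inputs: NotifyInputs): boolean {
  if (inputs.cancelled) return false;
  // Assumption: values are compared case-insensitively, ignoring surrounding spaces.
  const value = inputs.blxbenchNotify?.trim().toLowerCase();
  if (value !== undefined) {
    if (NOTIFY_OFF.has(value)) return false;
    if (NOTIFY_ON.has(value)) return true;
    // Assumption: any other value falls through to the remaining rules.
  }
  if (inputs.desktopNotify === true) return true;
  return inputs.headlessNotifyFlag;
}
```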

The implementation uses the platform's native notification mechanisms and needs no extra npm packages. For details, see Commands (--notify, BLXBENCH_NOTIFY).

Provider Configuration

The installed CLI includes the official provider adapters in its native bundle; in the source repo they live under packages/benchmark-core/adapters/. Each adapter exposes a provider alias (the argument entry in the adapter's meta.json), which you pass to --provider:

Alias	Adapter	Typical env var
`opr`	OpenRouter	`OPENROUTER_API_KEY`
`oai`	OpenAI	`OPENAI_API_KEY`
`hgf`	Hugging Face	`HF_TOKEN`
`tgr`	Together	`TOGETHER_API_KEY`
`ptk`	Portkey	`PORTKEY_API_KEY`
`cfr`	Cloudflare	`CLOUDFLARE_API_TOKEN`
`lms`	LM Studio	`LMSTUDIO_API_KEY` (optional)
`oll`	Ollama	`OLLAMA_API_KEY` (optional)

The default --provider is opr. Model ids are provider-specific: use whatever ids the selected endpoint accepts (e.g. OpenRouter-style vendor/model-name).
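Before a headless run, you may want to confirm that the key for your chosen alias is present. The hypothetical pre-flight check below encodes the alias table above; it treats the "typical env var" column as the only requirement, which may not hold for every adapter, and treats empty strings as missing.

```typescript
// Hypothetical pre-flight check (not part of blxbench), built from the alias table above.
interface ProviderKey {
  envVar: string;
  optional: boolean; // lms and oll can run without a key
}

const PROVIDER_KEYS: Record<string, ProviderKey> = {
  opr: { envVar: "OPENROUTER_API_KEY", optional: false },
  oai: { envVar: "OPENAI_API_KEY", optional: false },
  hgf: { envVar: "HF_TOKEN", optional: false },
  tgr: { envVar: "TOGETHER_API_KEY", optional: false },
  ptk: { envVar: "PORTKEY_API_KEY", optional: false },
  cfr: { envVar: "CLOUDFLARE_API_TOKEN", optional: false },
  lms: { envVar: "LMSTUDIO_API_KEY", optional: true },
  oll: { envVar: "OLLAMA_API_KEY", optional: true },
};

/** Returns the name of a required env var that is missing, or null if none is. */
function missingProviderKey(env: Record<string, string | undefined>, alias = "opr"): string | null {
  const entry = PROVIDER_KEYS[alias];
  if (entry === undefined) throw new Error(`unknown provider alias: ${alias}`);
  const value = env[entry.envVar];
  const present = value !== undefined && value !== "";
  return entry.optional || present ? null : entry.envVar;
}
```

In Node, a wrapper script could call missingProviderKey(process.env, "oai") before launching blxbench --headless --provider oai.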

For local LM Studio runs, start the LM Studio server first (its API listens on TCP port 1234 by default). The bundled lms adapter talks only to 127.0.0.1; if your server listens on a different port, override it with --local-inference-port (headless) or /local-inference-port in the TUI. Then use --provider lms --models <model-key>. LMSTUDIO_API_KEY is only needed if you enabled API token authentication in LM Studio's Server Settings. The same alias also works for generated summaries and Coding/UI judge validation: set SUMMARY_PROVIDER=lms and VALIDATION_PROVIDER=lms.

For Ollama, run ollama serve (default port 11434). Requests go to 127.0.0.1 only; override the port the same way as for LM Studio when needed. Then use --provider oll --models <name>, with a model name from ollama list or GET /api/tags. OLLAMA_API_KEY is optional for local use; set it for Ollama Cloud or authenticated remote endpoints. For summaries and judge validation, set SUMMARY_PROVIDER=oll and VALIDATION_PROVIDER=oll. A submitted report.json carries provider_mode: local and the alias oll, so the leaderboard shows the same local badges as for LM Studio.
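To confirm that a local server is reachable where the adapters expect it, you can request its model list. The hypothetical helper below builds that URL from the documented defaults (127.0.0.1, port 1234 for LM Studio, 11434 for Ollama) plus an optional port override. GET /api/tags is the Ollama endpoint mentioned above; /v1/models is LM Studio's OpenAI-compatible model list, chosen here as a convenient thing to query, not as a statement about what the lms adapter itself calls.

```typescript
// Hypothetical helper (not blxbench's code): the URL that lists a local server's models.
type LocalAlias = "lms" | "oll";

const DEFAULT_PORT: Record<LocalAlias, number> = { lms: 1234, oll: 11434 };
// LM Studio serves an OpenAI-compatible model list; Ollama lists local models at /api/tags.
const MODELS_PATH: Record<LocalAlias, string> = { lms: "/v1/models", oll: "/api/tags" };

function localModelsUrl(alias: LocalAlias, portOverride?: number): string {
  // The bundled adapters only talk to 127.0.0.1, so only the port varies.
  const port = portOverride ?? DEFAULT_PORT[alias];
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid TCP port: ${port}`);
  }
  return `http://127.0.0.1:${port}${MODELS_PATH[alias]}`;
}
```

For example, open localModelsUrl("oll") in a browser (or fetch it from Node 18+) and check that the model you plan to pass to --models appears in the response.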

When you pick lms or oll in /provider and the optional env var is unset, the TUI may offer the masked setup screen; you can skip if you do not need a token (see TUI).

Roblox OpenGameEval backend (`rbx`)

The rbx entry is not an LLM chat endpoint: it is the HTTP backend for the optional roblox test category (Roblox/open-game-eval). Keep a normal LLM alias as --provider (opr, oai, …). Enable Roblox tasks with --category roblox (or by selecting the category in the TUI); --roblox-adapter defaults to rbx and selects that backend's endpoint and its key, OPEN_GAME_EVAL_API_KEY.

OpenGameEval still calls your LLM through Roblox's API, so set LLM_API_KEY or the vendor-specific variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) that matches the inferred vendor (openai | claude | gemini). The CLI infers the vendor from the chosen adapter's meta.json id when possible (e.g. OpenAI), otherwise from the model id prefix (anthropic/…, google/…). Override the inferred values with --roblox-llm-name and --roblox-llm-model-version when needed.
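The inference rule above can be sketched as follows. This is an illustration, not the CLI's implementation: the function names are hypothetical, adapter meta ids are compared case-insensitively, and mapping an openai/ model prefix to the openai vendor is an assumption beyond the anthropic/ and google/ examples given here.

```typescript
// Hypothetical sketch of the OpenGameEval vendor inference (not blxbench's source).
type RobloxVendor = "openai" | "claude" | "gemini";

// Vendor-specific key names from the text; LLM_API_KEY is the generic alternative.
const VENDOR_KEY: Record<RobloxVendor, string> = {
  openai: "OPENAI_API_KEY",
  claude: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
};

function inferRobloxVendor(adapterMetaId: string, modelId: string): RobloxVendor | undefined {
  // 1. Prefer the adapter's meta.json id when it identifies a vendor (e.g. OpenAI).
  if (adapterMetaId.toLowerCase() === "openai") return "openai";
  // 2. Otherwise use the model id prefix before the first "/".
  const slash = modelId.indexOf("/");
  const prefix = slash >= 0 ? modelId.slice(0, slash).toLowerCase() : "";
  if (prefix === "anthropic") return "claude";
  if (prefix === "google") return "gemini";
  if (prefix === "openai") return "openai"; // assumption, not stated on this page
  // Unknown: pass --roblox-llm-name (and --roblox-llm-model-version) explicitly.
  return undefined;
}

/** Env vars that could satisfy OpenGameEval for a vendor; precedence is not documented here. */
function robloxKeyCandidates(vendor: RobloxVendor): string[] {
  return ["LLM_API_KEY", VENDOR_KEY[vendor]];
}
```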

Test Filters

Categories

Category	Role
`speed`	Latency-sensitive correctness
`security`	Safe outputs and vulnerability awareness
`reasoning`	Structured / numeric reasoning
`debugging`	Small patches and bug fixes
`refactoring`	Behavior-preserving edits
`hallucination`	Grounding under tricky prompts
`coding`	Executable JavaScript function tasks scored by hidden tests
`ui`	Single-file HTML/UI artifacts with render + judge validation
`coding_ui`	HTML artifacts + optional Playwright render
`roblox`	Optional Roblox OpenGameEval suite; visible in breakdowns, excluded from Overall

Difficulty Levels

Filter by difficulty:

Level	Description
`easy`	Lighter fixtures
`medium`	Representative difficulty
`hard`	Stricter / longer tasks

Legacy German difficulty labels are normalized to these ids.

Advanced Options

Playwright Configuration

For coding_ui (and other HTML render checks), Playwright Chromium should be installed:

blxbench --headless --install-chromium

In the TUI, use:

/playwright install
/playwright status

Playwright stores Chromium in its normal per-user browser cache:

OS	Default cache
Linux	`~/.cache/ms-playwright`
macOS	`~/Library/Caches/ms-playwright`
Windows	`%LOCALAPPDATA%\ms-playwright`

Skip render validation if Chromium is missing:

blxbench --headless --skip-render-validation

Rate Limiting

Flag	Behavior
(not set)	No rate limiting
`--ratelimit`	Limits requests to the built-in default requests per minute (RPM)
`--ratelimit 30`	Limits requests to 30 RPM
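This page does not spell out how the limit is enforced. As a rough mental model only, a requests-per-minute cap can be implemented by spacing request starts at least 60 000 / RPM milliseconds apart; at --ratelimit 30 that is one request every 2 seconds. The RpmLimiter class below is a hypothetical sketch of that technique, not the CLI's actual limiter, and it does not assume any particular default RPM for a bare --ratelimit.

```typescript
// Hypothetical sketch of a requests-per-minute limiter (not blxbench's code).
// Each request start is spaced at least 60_000 / rpm milliseconds after the previous one.
class RpmLimiter {
  private readonly intervalMs: number;
  private nextSlotMs = 0; // earliest start time for the next request

  constructor(rpm: number) {
    if (!Number.isFinite(rpm) || rpm <= 0) {
      throw new RangeError(`rpm must be a positive number, got ${rpm}`);
    }
    this.intervalMs = 60_000 / rpm;
  }

  /** Reserves the next slot and returns how many ms to wait before sending. */
  reserve(nowMs: number): number {
    const startMs = Math.max(nowMs, this.nextSlotMs);
    this.nextSlotMs = startMs + this.intervalMs;
    return startMs - nowMs;
  }
}

// Wraps a request so that it waits for its reserved slot before being sent.
async function throttled<T>(limiter: RpmLimiter, send: () => Promise<T>): Promise<T> {
  const waitMs = limiter.reserve(Date.now());
  if (waitMs > 0) {
    await new Promise<void>((resolve) => setTimeout(() => resolve(), waitMs));
  }
  return send();
}
```

Reserving the slot before waiting means concurrent callers queue up in order instead of all waking at the same moment.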

Fail Fast

Stop on first test failure:

blxbench --headless --fail-fast

Configuration Priority

Environment merge (provider keys, etc.): For each variable name, a non-empty value already exported in the shell always wins. Otherwise the CLI merges, in order:

.env next to the installed CLI package (fills keys that are not yet set).
./.env in the current working directory (fills keys the first pass did not set), unless ignoreWorkspaceDotenv is true.
config.json → env: by default only fills keys that are still missing; when preferStoredEnv is true or BLXBENCH_PREFER_STORED_ENV=1, its non-empty values override those loaded from the .env files.
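The merge order above can be expressed as a short function. This is a sketch under stated assumptions, not the CLI's source: the names are hypothetical, empty strings count as unset, BLXBENCH_PREFER_STORED_ENV is read from the shell environment only and accepts just the documented values 1 and 0, and the variable takes precedence over the config.json setting.

```typescript
// Hypothetical sketch of the environment merge order (not blxbench's source).
type Env = Record<string, string | undefined>;

interface StoredConfig {
  env?: Record<string, string>;
  preferStoredEnv?: boolean;
  ignoreWorkspaceDotenv?: boolean;
}

const isSet = (v: string | undefined): v is string => v !== undefined && v !== "";

// BLXBENCH_PREFER_STORED_ENV=1/0 forces the behavior for one process;
// otherwise the config.json value (default false) applies.
function preferStored(config: StoredConfig, override: string | undefined): boolean {
  if (override === "1") return true;
  if (override === "0") return false;
  return config.preferStoredEnv === true;
}

function mergeEnv(shell: Env, packageDotenv: Env, workspaceDotenv: Env, config: StoredConfig): Record<string, string> {
  const merged: Record<string, string> = {};
  // Copies non-empty values only for keys that are still unset.
  const fill = (source: Env): void => {
    for (const [key, value] of Object.entries(source)) {
      if (isSet(value) && !isSet(merged[key])) merged[key] = value;
    }
  };

  fill(packageDotenv); // 1. .env next to the installed package
  if (config.ignoreWorkspaceDotenv !== true) fill(workspaceDotenv); // 2. ./.env
  const prefer = preferStored(config, shell["BLXBENCH_PREFER_STORED_ENV"]);
  for (const [key, value] of Object.entries(config.env ?? {})) {
    // 3. config.json -> env: fill gaps, or override when "prefer stored" is on
    if (isSet(value) && (prefer || !isSet(merged[key]))) merged[key] = value;
  }
  // Non-empty shell exports always win.
  for (const [key, value] of Object.entries(shell)) {
    if (isSet(value)) merged[key] = value;
  }
  return merged;
}
```

For example, if HF_TOKEN is set in both ./.env and config.json → env, mergeEnv keeps the ./.env value by default and the stored value when preferStoredEnv is true; a non-empty shell export of the same key wins in both cases.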

Run options are resolved in this order: built-in defaults, then the merged process.env, then CLI flags for options that have one (a flag wins for that invocation).

Custom Tests Directory

Use a custom test tree:

blxbench --headless --tests-dir ./my-tests --provider opr --models openai/gpt-5.4-mini

Your directory should mirror the fixture layout expected by benchmark-core. See Our Tests for how catalog entries map to files.
