BLXBench Docs

Configuration

Configure blxbench via files, environment variables, and flags.

The CLI is distributed as the npm package @bitslix/blxbench; configuration below applies to the blxbench command in your shell.

Configuration Files

.env File

BLXBench loads variables from .env in the current directory or the path given by --dotenv-path, and merges ~/.blxbench/config.json env according to the rules in App config file.

# Example — use the env vars required by your chosen adapter (values are not shown here)
OPENROUTER_API_KEY=
OPENAI_API_KEY=
OPEN_GAME_EVAL_API_KEY=
LMSTUDIO_API_KEY=
OLLAMA_API_KEY=
BLXBENCH_API_KEY=

Provider API keys are used locally to call model APIs. OPEN_GAME_EVAL_API_KEY is the key for the Roblox OpenGameEval backend used by the optional roblox category; it is not the key for your --provider. BLXBENCH_API_KEY is a separate web-app key, used only when uploading reports with --submit or BLXBENCH_SUBMIT=1.

Results Directory

By default, generated reports are saved to ~/.blxbench/reports/ (%USERPROFILE%\.blxbench\reports on Windows). This path is used by both the installed native binary and local development builds, so reports do not depend on the current working directory being writable.
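For illustration only, the sketch below derives these default per-user locations (reports here, plus the saves folder and config.json described further down) from a home directory. It assumes the home directory is what Node's os.homedir() returns, which on Windows is normally %USERPROFILE%; the function name blxbenchPaths is hypothetical and not part of the CLI.

```typescript
// Hypothetical helper: derives the default BLXBench paths described in these docs.
// homeDir is injected (e.g. from os.homedir()) so the function stays pure.
type Platform = "win32" | "linux" | "darwin";

interface BlxbenchPaths {
  reports: string;
  saves: string;
  config: string;
}

function blxbenchPaths(platform: Platform, homeDir: string): BlxbenchPaths {
  // Windows uses backslashes; Linux and macOS use forward slashes.
  const sep = platform === "win32" ? "\\" : "/";
  const base = [homeDir, ".blxbench"].join(sep);
  return {
    reports: [base, "reports"].join(sep),
    saves: [base, "saves"].join(sep),
    config: [base, "config.json"].join(sep),
  };
}

console.log(blxbenchPaths("linux", "/home/ada").reports); // /home/ada/.blxbench/reports
```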

In headless mode, --save-json writes an additional JSON export to a custom path:

blxbench --headless --save-json ./custom-results.json

In the TUI, use /set output-dir PATH to change the report directory for the interactive run.

Session snapshots and autosave (TUI)

Session configuration snapshots (slash commands /save and /load) are stored under a separate directory from reports:

  • Linux / macOS: ~/.blxbench/saves/
  • Windows: %USERPROFILE%\.blxbench\saves\

Each manual /save writes a new JSON file. Autosave, by contrast, repeatedly overwrites a single file named autosave.json in the same folder.

Variable | Purpose
BLXBENCH_AUTOSAVE_SEC | Interval in seconds between autosave writes while the interactive shell is open. Set to 0 to disable. When unset, the CLI uses a built-in default.
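A minimal sketch of how the documented rules for this variable (0 disables, unset falls back to a default) could be applied. The defaultSec parameter stands in for the CLI's undocumented built-in default, and falling back to it on invalid input is an assumption; autosaveIntervalMs is a hypothetical name.

```typescript
type Env = Record<string, string | undefined>;

// Returns the autosave interval in milliseconds, or null when autosave is disabled.
// Assumption: negative or non-numeric values fall back to the default.
function autosaveIntervalMs(env: Env, defaultSec: number): number | null {
  const raw = env.BLXBENCH_AUTOSAVE_SEC;
  if (raw === undefined || raw.trim() === "") {
    return defaultSec * 1000; // unset: use the built-in default
  }
  const sec = Number(raw);
  if (!Number.isFinite(sec) || sec < 0) {
    return defaultSec * 1000; // invalid value: assumed fallback
  }
  return sec === 0 ? null : sec * 1000; // 0 disables autosave
}
```

A null result would simply mean that no autosave timer is scheduled.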

Snapshot files contain run settings only (validated when loaded). They do not contain provider API keys or BLXBench account secrets — those stay in .env, config.json → env, the shell environment, and the credentials file used by /auth.

App config file

The CLI reads ~/.blxbench/config.json (Unix) or %USERPROFILE%\.blxbench\config.json (Windows) at startup and merges it with .env files into process.env before the TUI or headless run starts. See Configuration priority below for merge rules.

Field | Purpose
version | File format version (written by the CLI; you can omit it when hand-editing).
desktopNotify | When true, the CLI may show an OS-level notification when a benchmark run finishes (TUI or headless). No notification is shown for cancelled runs. Change it from the TUI with /set notify on or /notify off, or edit the JSON.
env | Map of extra environment variables (typically provider API keys such as OPENROUTER_API_KEY or OPEN_GAME_EVAL_API_KEY). Values are merged into the process environment, which makes this map suitable for secrets you want to keep outside a project .env. When env holds non-empty entries, the CLI sets the file's permissions to 600 (owner read/write only).
preferStoredEnv | When true, non-empty keys in env override values loaded from .env files for those keys (shell exports still win). When false (the default), env only fills keys that are still missing after the .env files are loaded.
ignoreWorkspaceDotenv | When true, the CLI does not read ./.env in the current working directory (a .env next to the installed CLI package may still load). Use this if you want ~/.blxbench/config.json (or shell exports) to be the only file-based source for overlapping keys.

To get the same “prefer stored” behavior for a single process, set BLXBENCH_PREFER_STORED_ENV=1 (or 0 to force it off for that run). The variable acts like preferStoredEnv in config.json, including when the file does not set that field.

First-run setup (welcome): If OPENROUTER_API_KEY is missing (it is required for the default opr provider), the TUI first shows a setup screen where you enter the key once in a masked field; it is saved under config.env.OPENROUTER_API_KEY. Sign-in then proceeds as usual. You can also create the env map yourself or rely on a project .env; see Installation.

When you enable the optional roblox category in the TUI and OPEN_GAME_EVAL_API_KEY is missing, the same masked setup flow stores the key under config.env.OPEN_GAME_EVAL_API_KEY.

When you choose a local LLM adapter with an optional API key (lms, oll) and that key is not in the environment, the TUI may prompt once; you can skip if you run locally without auth — see TUI.

Project vs stored keys: If both ./.env and config.env define the same known adapter key, such as OPENROUTER_API_KEY or OPEN_GAME_EVAL_API_KEY, and you have not set preferStoredEnv, ignoreWorkspaceDotenv, or BLXBENCH_PREFER_STORED_ENV, the TUI may ask whether to prefer the project file, prefer the stored config, or skip the workspace .env. Your choice updates config.json accordingly.

Example:

{
  "version": 2,
  "desktopNotify": true,
  "env": {
    "OPENROUTER_API_KEY": "sk-or-...",
    "OPEN_GAME_EVAL_API_KEY": "..."
  },
  "preferStoredEnv": false,
  "ignoreWorkspaceDotenv": false
}
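If you edit config.json by hand, the field table translates roughly into the TypeScript shape below. This is a sketch inferred from the documented fields; BlxbenchConfig and parseConfig are hypothetical names, and the CLI's own validation may be stricter or looser.

```typescript
// Hypothetical shape of ~/.blxbench/config.json, inferred from the field table.
interface BlxbenchConfig {
  version?: number;
  desktopNotify?: boolean;
  env?: Record<string, string>;
  preferStoredEnv?: boolean;
  ignoreWorkspaceDotenv?: boolean;
}

// Parses file contents, keeping only fields that have the expected types.
function parseConfig(text: string): BlxbenchConfig {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("config.json must contain a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const cfg: BlxbenchConfig = {};
  if (typeof obj.version === "number") cfg.version = obj.version;
  if (typeof obj.desktopNotify === "boolean") cfg.desktopNotify = obj.desktopNotify;
  if (typeof obj.preferStoredEnv === "boolean") cfg.preferStoredEnv = obj.preferStoredEnv;
  if (typeof obj.ignoreWorkspaceDotenv === "boolean") {
    cfg.ignoreWorkspaceDotenv = obj.ignoreWorkspaceDotenv;
  }
  if (typeof obj.env === "object" && obj.env !== null && !Array.isArray(obj.env)) {
    const env: Record<string, string> = {};
    // Drop non-string values rather than coercing them.
    for (const [key, value] of Object.entries(obj.env)) {
      if (typeof value === "string") env[key] = value;
    }
    cfg.env = env;
  }
  return cfg;
}
```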

Desktop notifications

Notifications are best-effort and depend on the desktop session (no guarantee over SSH without a notification daemon, etc.). Evaluation order:

  1. If BLXBENCH_NOTIFY is 0, false, no, or off → no notification for this process.
  2. Else if BLXBENCH_NOTIFY is 1, true, yes, or on → notify when a run completes.
  3. Else if config.json has desktopNotify: true → notify when a run completes.
  4. Else if headless was started with --notify → notify when a run completes.
  5. Otherwise → no notification.
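The evaluation order can be read as a small decision function. This sketch assumes case-insensitive matching of BLXBENCH_NOTIFY and that unrecognized values fall through to the next rule, neither of which these docs state; shouldNotify and its parameters are illustrative names.

```typescript
const NOTIFY_OFF = new Set(["0", "false", "no", "off"]);
const NOTIFY_ON = new Set(["1", "true", "yes", "on"]);

function shouldNotify(
  notifyEnv: string | undefined,      // value of BLXBENCH_NOTIFY
  desktopNotify: boolean | undefined, // config.json field
  headlessNotifyFlag: boolean,        // headless run started with --notify
  cancelled: boolean,                 // cancelled runs never notify
): boolean {
  if (cancelled) return false;
  // Assumption: values are compared case-insensitively after trimming.
  const v = notifyEnv?.trim().toLowerCase();
  if (v !== undefined && NOTIFY_OFF.has(v)) return false;
  if (v !== undefined && NOTIFY_ON.has(v)) return true;
  if (desktopNotify === true) return true;
  return headlessNotifyFlag;
}
```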

The implementation uses the platform’s native notification mechanisms (no extra npm packages). Details: Commands (--notify, BLXBENCH_NOTIFY).

Provider Configuration

The installed CLI includes the official provider adapters in the native bundle. In the source repo they live under packages/benchmark-core/adapters/. Each adapter exposes a provider alias (argument in meta.json):

Alias | Adapter | Typical env var
opr | OpenRouter | OPENROUTER_API_KEY
oai | OpenAI | OPENAI_API_KEY
hgf | Hugging Face | HF_TOKEN
tgr | Together | TOGETHER_API_KEY
ptk | Portkey | PORTKEY_API_KEY
cfr | Cloudflare | CLOUDFLARE_API_TOKEN
lms | LM Studio | LMSTUDIO_API_KEY (optional)
oll | Ollama | OLLAMA_API_KEY (optional)
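The table amounts to a lookup from alias to its typical credential. The sketch below assumes that a missing key only matters for the hosted adapters, since the lms and oll keys are optional; PROVIDER_KEYS and missingProviderKey are hypothetical names, and the variables listed are the typical ones only.

```typescript
// Alias -> typical env var, copied from the table above.
const PROVIDER_KEYS = new Map<string, { envVar: string; optional: boolean }>([
  ["opr", { envVar: "OPENROUTER_API_KEY", optional: false }],
  ["oai", { envVar: "OPENAI_API_KEY", optional: false }],
  ["hgf", { envVar: "HF_TOKEN", optional: false }],
  ["tgr", { envVar: "TOGETHER_API_KEY", optional: false }],
  ["ptk", { envVar: "PORTKEY_API_KEY", optional: false }],
  ["cfr", { envVar: "CLOUDFLARE_API_TOKEN", optional: false }],
  ["lms", { envVar: "LMSTUDIO_API_KEY", optional: true }],
  ["oll", { envVar: "OLLAMA_API_KEY", optional: true }],
]);

// Returns the name of a required env var that is missing for this alias,
// or null when nothing required is missing.
function missingProviderKey(alias: string, env: Record<string, string | undefined>): string | null {
  const entry = PROVIDER_KEYS.get(alias);
  if (!entry) throw new Error(`unknown provider alias: ${alias}`);
  const value = env[entry.envVar];
  if (entry.optional || (value !== undefined && value !== "")) return null;
  return entry.envVar;
}
```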

The default --provider is opr. Model ids are whatever that endpoint accepts (e.g. OpenRouter-style vendor/model-name).

For local LM Studio runs, start the LM Studio server first (default chat API on TCP port 1234). The bundled lms adapter talks only to 127.0.0.1; override the port with --local-inference-port (headless) or /local-inference-port in the TUI if your server listens elsewhere. Then use --provider lms --models <model-key>. LMSTUDIO_API_KEY is only needed if you enabled API token authentication in LM Studio Server Settings. The same alias can be used for generated summaries and Coding/UI judge validation: SUMMARY_PROVIDER=lms and VALIDATION_PROVIDER=lms.

For Ollama, run ollama serve (default port 11434). Requests go to 127.0.0.1 only; override the port the same way as for LM Studio when needed. Then use --provider oll --models <name> with a model listed by ollama list or GET /api/tags. OLLAMA_API_KEY is optional locally; use it for Ollama Cloud or authenticated remote endpoints. For summaries and judge validation, set SUMMARY_PROVIDER=oll / VALIDATION_PROVIDER=oll. A submitted report.json carries provider_mode: local and alias oll, so the leaderboard shows the same local badges as for LM Studio.
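Both local adapters target 127.0.0.1 on a default port. The sketch below builds those base URLs and extracts model names from an Ollama GET /api/tags response, whose body has the form {"models": [{"name": ...}, ...]}. localBaseUrl and ollamaModelNames are hypothetical helpers, and portOverride merely stands in for --local-inference-port.

```typescript
// Default ports from the paragraphs above; both adapters use 127.0.0.1 only.
const DEFAULT_LOCAL_PORTS = { lms: 1234, oll: 11434 } as const;

function localBaseUrl(alias: keyof typeof DEFAULT_LOCAL_PORTS, portOverride?: number): string {
  const port = portOverride ?? DEFAULT_LOCAL_PORTS[alias];
  return `http://127.0.0.1:${port}`;
}

// Extracts the names you could pass to --models from a /api/tags response body.
function ollamaModelNames(tagsResponse: { models?: { name: string }[] }): string[] {
  return (tagsResponse.models ?? []).map((m) => m.name);
}
```

With Node 18 or later, you could fetch `${localBaseUrl("oll")}/api/tags`, call .json() on the response, and pass the result to ollamaModelNames.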

When you pick lms or oll in /provider and the optional env var is unset, the TUI may offer the masked setup screen; you can skip if you do not need a token (see TUI).

Roblox OpenGameEval backend (rbx)

The rbx entry is not an LLM chat endpoint: it is the HTTP backend for the optional roblox test category (Roblox/open-game-eval). Use a normal LLM alias as --provider (opr, oai, …). Enable Roblox tasks with --category roblox (or TUI category selection); --roblox-adapter defaults to rbx and selects that backend’s endpoint and OPEN_GAME_EVAL_API_KEY.

OpenGameEval still calls your LLM via Roblox’s API: set LLM_API_KEY or the provider-specific variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY) matching the inferred vendor (openai | claude | gemini). The CLI infers that vendor from the chosen adapter’s meta.json id when possible (e.g. OpenAI), otherwise from the model id prefix (anthropic/…, google/…). Override with --roblox-llm-name / --roblox-llm-model-version when needed.
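The model-id fallback can be sketched as a prefix lookup. This models only the fallback, not the adapter meta.json check the CLI tries first; treating openai/ as a prefix and checking LLM_API_KEY before the vendor-specific variable are assumptions. inferVendorFromModelId and robloxLlmKey are hypothetical names.

```typescript
type RobloxVendor = "openai" | "claude" | "gemini";

// Provider-specific variables named in the paragraph above.
const VENDOR_KEY: Record<RobloxVendor, string> = {
  openai: "OPENAI_API_KEY",
  claude: "ANTHROPIC_API_KEY",
  gemini: "GEMINI_API_KEY",
};

// Maps an OpenRouter-style "vendor/model" id to an OpenGameEval vendor.
function inferVendorFromModelId(modelId: string): RobloxVendor | null {
  const slash = modelId.indexOf("/");
  const prefix = (slash === -1 ? "" : modelId.slice(0, slash)).toLowerCase();
  if (prefix === "anthropic") return "claude";
  if (prefix === "google") return "gemini";
  if (prefix === "openai") return "openai"; // assumed; not named in the docs
  return null; // unknown: use --roblox-llm-name explicitly
}

// Assumption: a generic LLM_API_KEY is checked before the vendor-specific variable.
function robloxLlmKey(vendor: RobloxVendor, env: Record<string, string | undefined>): string | undefined {
  return env.LLM_API_KEY || env[VENDOR_KEY[vendor]];
}
```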

Test Filters

Categories

Filter tests by category (folder names under the active suite directory — e.g. packages/benchmark-core/tests/ for v1 or packages/benchmark-core/suites/v2/tests/ for v2):

Category | Role
speed | Latency-sensitive correctness
security | Safe outputs and vulnerability awareness
reasoning | Structured / numeric reasoning
debugging | Small patches and bug fixes
refactoring | Behavior-preserving edits
hallucination | Grounding under tricky prompts
coding | Executable JavaScript function tasks scored by hidden tests
ui | Single-file HTML/UI artifacts with render + judge validation
coding_ui | HTML artifacts + optional Playwright render
roblox | Optional Roblox OpenGameEval suite; visible in breakdowns, excluded from Overall
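Because categories are folder names under the active suite's tests directory, you can list the ones present in a tree (for example a custom --tests-dir) with a few lines of Node. This is a sketch, not the CLI's discovery code; listCategories is a hypothetical name, and the demo builds a throwaway directory.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns the sorted names of immediate subdirectories, i.e. candidate
// category ids for a suite's tests directory. Plain files are ignored.
function listCategories(testsDir: string): string[] {
  return readdirSync(testsDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory())
    .map((entry) => entry.name)
    .sort();
}

// Demo: build a throwaway tree shaped like a tests directory and list it.
const demo = mkdtempSync(join(tmpdir(), "blx-tests-"));
for (const name of ["speed", "coding", "security"]) mkdirSync(join(demo, name));
writeFileSync(join(demo, "notes.txt"), "not a category");
console.log(listCategories(demo)); // [ 'coding', 'security', 'speed' ]
```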

Difficulty Levels

Filter by difficulty:

Level | Description
easy | Lighter fixtures
medium | Representative difficulty
hard | Stricter / longer tasks

Legacy German difficulty labels are normalized to these ids.
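A sketch of that normalization, assuming lookups are case-insensitive. The German spellings used here (leicht, mittel, schwer) are assumptions, since these docs do not list the legacy labels the CLI accepts; normalizeDifficulty is a hypothetical name.

```typescript
type Difficulty = "easy" | "medium" | "hard";

// Canonical ids from the table above, plus assumed legacy German labels.
const DIFFICULTY_ALIASES = new Map<string, Difficulty>([
  ["easy", "easy"],
  ["medium", "medium"],
  ["hard", "hard"],
  ["leicht", "easy"],   // assumed legacy label
  ["mittel", "medium"], // assumed legacy label
  ["schwer", "hard"],   // assumed legacy label
]);

function normalizeDifficulty(label: string): Difficulty | null {
  return DIFFICULTY_ALIASES.get(label.trim().toLowerCase()) ?? null;
}
```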

Advanced Options

Playwright Configuration

For coding_ui (and other HTML render checks), Playwright Chromium should be installed:

blxbench --headless --install-chromium

In the TUI, use:

/playwright install
/playwright status

Playwright stores Chromium in its normal per-user browser cache:

OS | Default cache
Linux | ~/.cache/ms-playwright
macOS | ~/Library/Caches/ms-playwright
Windows | %LOCALAPPDATA%\ms-playwright
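The table maps directly to a per-OS lookup. This sketch covers only those defaults and does not model overrides such as Playwright's PLAYWRIGHT_BROWSERS_PATH environment variable; playwrightCacheDir is a hypothetical name, and homeDir and env are injected so it stays testable.

```typescript
// Default Playwright browser cache per OS, matching the table above.
function playwrightCacheDir(
  platform: "linux" | "darwin" | "win32",
  homeDir: string,
  env: Record<string, string | undefined>,
): string {
  switch (platform) {
    case "linux":
      return `${homeDir}/.cache/ms-playwright`;
    case "darwin":
      return `${homeDir}/Library/Caches/ms-playwright`;
    case "win32": {
      const localAppData = env.LOCALAPPDATA;
      if (!localAppData) throw new Error("LOCALAPPDATA is not set");
      return `${localAppData}\\ms-playwright`;
    }
    default:
      throw new Error(`unsupported platform: ${String(platform)}`);
  }
}
```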

Skip render validation if Chromium is missing:

blxbench --headless --skip-render-validation

Rate Limiting

Cap requests per minute (RPM) with --ratelimit:

Value | Behavior
Flag omitted | No rate limiting
--ratelimit | Default RPM
--ratelimit 30 | Custom limit of 30 RPM
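To make the RPM values concrete: a cap of N requests per minute means request starts are spaced at least 60000 / N ms apart, so 30 RPM allows one request every 2000 ms. The sketch below is a generic fixed-spacing limiter built on that assumption, not BLXBench's implementation; RpmSpacer is a hypothetical name, and the default RPM used by a bare --ratelimit is not modeled.

```typescript
class RpmSpacer {
  private readonly intervalMs: number;
  private nextSlotMs = 0;

  constructor(rpm: number) {
    if (!(rpm > 0)) throw new Error("rpm must be a positive number");
    this.intervalMs = 60_000 / rpm;
  }

  // Reserves the earliest free slot at or after nowMs and returns how many
  // milliseconds the caller should wait before starting the request.
  reserve(nowMs: number): number {
    const start = Math.max(nowMs, this.nextSlotMs);
    this.nextSlotMs = start + this.intervalMs;
    return start - nowMs;
  }
}

const limiter = new RpmSpacer(30); // 30 RPM -> one request start every 2000 ms
console.log([0, 0, 0].map(() => limiter.reserve(0))); // [ 0, 2000, 4000 ]
```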

Fail Fast

Stop on first test failure:

blxbench --headless --fail-fast
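In effect, --fail-fast turns a run-everything loop into one that stops at the first failing test. The sketch below shows that control flow only; TestCase and runSuite are hypothetical and do not mirror benchmark-core's actual runner.

```typescript
interface TestCase {
  id: string;
  run: () => boolean; // true means the test passed
}

function runSuite(tests: TestCase[], failFast: boolean): { ran: string[]; failed: string[] } {
  const ran: string[] = [];
  const failed: string[] = [];
  for (const t of tests) {
    ran.push(t.id);
    if (!t.run()) {
      failed.push(t.id);
      if (failFast) break; // remaining tests are skipped
    }
  }
  return { ran, failed };
}
```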

Configuration Priority

Environment merge (provider keys, etc.): For each variable name, a non-empty value already exported in the shell always wins. Otherwise the CLI merges, in order:

  1. .env next to the installed CLI package (fills keys).
  2. ./.env in the current working directory (fills keys the first pass did not set) — unless ignoreWorkspaceDotenv is true.
  3. config.json → env: either only fills gaps (default), or overrides the file merge when preferStoredEnv is true or BLXBENCH_PREFER_STORED_ENV=1.
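The merge order above, modeled as a pure function over already-parsed maps. Assumptions: BLXBENCH_PREFER_STORED_ENV is read from the shell environment, its values 1 and 0 override the file's preferStoredEnv, and empty strings count as unset. mergeEnv and MergeInputs are hypothetical names; the CLI also locates and parses the .env files, which is omitted here.

```typescript
type EnvMap = Record<string, string | undefined>;

interface MergeInputs {
  shell: EnvMap;           // variables already exported in the shell
  packageDotenv: EnvMap;   // .env next to the installed CLI package
  workspaceDotenv: EnvMap; // ./.env in the current working directory
  stored: EnvMap;          // config.json -> env
  ignoreWorkspaceDotenv: boolean;
  preferStoredEnv: boolean; // config.json field
}

const isSet = (v: string | undefined): v is string => v !== undefined && v !== "";

// Returns the non-empty values after merging.
function mergeEnv(inp: MergeInputs): Record<string, string> {
  // Assumed: the per-process variable overrides the file setting when 1 or 0.
  const flag = inp.shell.BLXBENCH_PREFER_STORED_ENV;
  const preferStored = flag === "1" ? true : flag === "0" ? false : inp.preferStoredEnv;

  // Passes 1 and 2: each .env file only fills keys that are still missing.
  const merged: Record<string, string> = {};
  const files = inp.ignoreWorkspaceDotenv ? [inp.packageDotenv] : [inp.packageDotenv, inp.workspaceDotenv];
  for (const file of files) {
    for (const [k, v] of Object.entries(file)) {
      if (isSet(v) && !isSet(merged[k])) merged[k] = v;
    }
  }
  // Pass 3: stored values fill gaps, or override file values when preferred.
  for (const [k, v] of Object.entries(inp.stored)) {
    if (isSet(v) && (preferStored || !isSet(merged[k]))) merged[k] = v;
  }
  // A non-empty shell export always wins.
  for (const [k, v] of Object.entries(inp.shell)) {
    if (isSet(v)) merged[k] = v;
  }
  return merged;
}
```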

Run options: built-in defaults are overridden by the merged process.env, and CLI flags override both for the options they cover (flags win for that invocation).

Custom Tests Directory

Use a custom test tree:

blxbench --headless --tests-dir ./my-tests --provider opr --models openai/gpt-5.4-mini

Your directory should mirror the fixture layout expected by benchmark-core. See Our Tests for how catalog entries map to files.

