FAQ
Frequently asked questions about BLXBench.
General
What is BLXBench?
BLXBench is an open benchmark for AI models that evaluates speed, security, reasoning, and coding capabilities.
How is BLXBench different from other benchmarks?
- Open source — All tests are freely available
- Reproducible — Anyone can run the same tests
- No paid placements — Results are based purely on performance
- Community-driven — New tests can be contributed
Which models/providers are supported?
blxbench talks to adapters (OpenRouter, OpenAI, Hugging Face, Together, Portkey, Cloudflare — see Configuration). You pick a model id accepted by that endpoint (e.g. OpenRouter-style vendor/model). New adapters can be added in packages/benchmark-core/adapters/.
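For illustration, an OpenRouter-style id is just vendor/model. A minimal shape check could look like this (the model name below is only an example; check your adapter's catalog for real ids):

```shell
# Sanity-check that a model id has the vendor/model shape (example id only)
model="openai/gpt-4o-mini"
case "$model" in
  */*) id_ok=yes ;;  # has a vendor prefix
  *)   id_ok=no  ;;  # bare model name -- likely rejected by the endpoint
esac
```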
Using BLXBench
What is the npm package name?
@bitslix/blxbench (Bitslix on npm). The command after install is blxbench. See Installation.
How do I run a benchmark?
See Quick Start.
Do I need an API key?
Yes — you need credentials for the adapter you run against (e.g. OPENROUTER_API_KEY for opr).
A BLXBench API key is separate: use it for headless --submit uploads to the web app, together with a pass tier that includes submission quota. Interactive TUI login uses /auth login and the browser device flow.
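As a concrete sketch, the two kinds of credentials can be exported before a headless run (the key values below are placeholders, not real keys):

```shell
# Provider credential for the opr (OpenRouter) adapter -- needed to call the model
export OPENROUTER_API_KEY="sk-or-v1-placeholder"
# Separate BLXBench key -- only needed when uploading with headless --submit
export BLXBENCH_API_KEY="blx-placeholder"
```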
Where are reports saved?
By default, local reports are written to a user-owned directory:
- Linux/macOS: ~/.blxbench/reports/
- Windows: %USERPROFILE%\.blxbench\reports\
Use /set output-dir PATH in the TUI to change the report directory for a run. In headless mode, --save-json PATH writes an additional JSON export; the standard report is still written to the default report directory.
/resume in the TUI shows recent report.json files under that same effective directory (so it stays in sync with your /set output-dir choice).
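As a rough sketch, the per-OS default report directory described above can be resolved in a shell like this (the path layout comes from this FAQ; the uname-based OS check is an assumption, not part of blxbench):

```shell
# Pick the default BLXBench report directory per OS (layout as documented above)
case "$(uname -s 2>/dev/null || echo Windows)" in
  Linux|Darwin) REPORT_DIR="$HOME/.blxbench/reports" ;;
  *)            REPORT_DIR="$USERPROFILE\.blxbench\reports" ;;
esac
```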
Can I upload without turning on auto-submit?
Yes, in the TUI after a run: press s or r to upload the report that was just written. That is separate from /report submit on (auto-upload on completion). You can also use /resume to open an older run and upload its report.json again. See TUI — After a benchmark finishes and Commands — After a run. Public upload still needs sign-in, an eligible pass tier, and a submit-capable account role, like headless --submit.
How long does a benchmark take?
Run time depends on the model and the number of tests:
- Quick run (5 tests): ~5 minutes
- Full run (~100 tests): ~30-60 minutes
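The figures above can be back-of-the-envelope estimated from a per-test rate (the ~30 s/test rate here is an assumption that lands inside the 30-60 minute range for a full run; actual rates vary by model and provider):

```shell
# Rough runtime estimate: tests * seconds-per-test, converted to minutes
tests=100
secs_per_test=30   # assumed average; slower models take longer
est_min=$(( tests * secs_per_test / 60 ))
```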
Does BLXBench cost money?
blxbench and local benchmark runs are free to use. You pay your model provider for API usage, and public leaderboard submission requires a BLXBench pass tier that includes submission quota.
Results
How are scores calculated?
See Leaderboard.
Why did my model's score change?
Scores reflect the latest test results. Changes can occur when:
- New tests are added
- Model versions update
- Your run is superseded by newer submissions
Can I dispute a result?
Contact Support with details about the disputed result.
Troubleshooting
Playwright not found
Run:
```
blxbench --headless --install-chromium
```

Or, in the TUI:

```
/playwright install
```

The browser is installed in Playwright's normal per-user cache, not inside the npm package.
Permission denied creating results
Update to the latest CLI. Current versions write reports to ~/.blxbench/reports/ instead of a results/ folder in the current working directory.
Rate limit errors
Use the --ratelimit flag:

```
blxbench --headless --ratelimit 30
```

API key errors
Make sure your API key is set:

```
export OPENROUTER_API_KEY=sk-or-...
```

Cannot upload results
Ensure you have a BLXBench account, an API key, and a pass tier that includes leaderboard submission (Runner alone is local-only). See Account. Check the blxbench warning if --submit was skipped because BLXBENCH_API_KEY was missing.