BLXBench Docs

Headless Mode

Running benchmarks in automated environments.

Headless mode allows BLXBench to run in CI/CD pipelines, scripts, and automated workflows. Install the blxbench command from the @bitslix/blxbench package (see Installation) before running the examples below.

Basic Usage

blxbench --headless --provider opr --models openai/gpt-5.4-mini

Omit --headless if the process already has no TTY (typical in CI); blxbench then enters the same headless path automatically.

For --provider lms (LM Studio) or --provider oll (Ollama), the runner connects to fixed loopback URLs on 127.0.0.1. If your daemon listens on a non-default TCP port, pass --local-inference-port N; otherwise the defaults apply (1234 for LM Studio, 11434 for Ollama).
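A wrapper script sometimes needs the same address the runner will use, for example to confirm the local daemon is up before starting a long run. The helper below is a hypothetical sketch built only from the documented defaults: local_inference_addr is not part of blxbench, and any URL path the runner appends after host and port is deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the loopback host:port for a local provider,
# using the documented defaults (LM Studio 1234, Ollama 11434).
# An optional second argument mirrors --local-inference-port N.
# Any URL path the runner adds after host:port is not modeled here.
local_inference_addr() {
  local provider=$1 port=${2:-}
  local default_port
  case "$provider" in
    lms) default_port=1234 ;;   # LM Studio
    oll) default_port=11434 ;;  # Ollama
    *) echo "not a local provider: $provider" >&2; return 2 ;;
  esac
  echo "127.0.0.1:${port:-$default_port}"
}
```

For example, local_inference_addr oll 8080 prints 127.0.0.1:8080, the address you would select for Ollama with --local-inference-port 8080.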

Reports are written to the user's report directory by default:

  • Linux/macOS: ~/.blxbench/reports/
  • Windows: %USERPROFILE%\.blxbench\reports\

Use --save-json PATH to write an additional JSON copy elsewhere. To change the report directory itself, run /set output-dir PATH in the TUI.

Desktop notification when the run finishes

Add --notify to have the OS show a short notification when the benchmark completes. The same rules apply as in the TUI: you are notified on success or failure, but not when a run is aborted. You can also set BLXBENCH_NOTIFY=1 or persist desktopNotify in ~/.blxbench/config.json (see Configuration > Desktop notifications). Set BLXBENCH_NOTIFY=0 to force notifications off, for example in CI.
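In a pipeline you may want notifications off regardless of what a shared config enables. A minimal sketch, assuming your CI system exports CI (GitHub Actions and GitLab CI both set CI=true); disable_notify_in_ci is a hypothetical helper name, and it relies only on the documented BLXBENCH_NOTIFY=0 override.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force desktop notifications off when a CI marker is present.
# Assumption: the CI system sets the CI variable to a non-empty value.
disable_notify_in_ci() {
  if [ -n "${CI:-}" ]; then
    export BLXBENCH_NOTIFY=0   # documented: 0 forces notifications off
  fi
}
```

Call it once near the top of a CI script, before invoking blxbench.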

Multiple models

Pass more than one model ID to run separate benchmark runs (one run_id and one report.json per model). Use --parallel [n] to cap concurrent sub-runs (default in the runner: min(3, number of models)). The global --ratelimit budget applies to all sub-runs together.
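The default concurrency cap can be reproduced in a script, for instance to log it or to choose a smaller --parallel value. This is a sketch of the documented formula min(3, number of models); default_parallel is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the documented --parallel default:
# min(3, number of models).
default_parallel() {
  local models=$1
  if [ "$models" -lt 3 ]; then
    echo "$models"
  else
    echo 3
  fi
}
```

For example, with models=(model-a model-b model-c model-d) (placeholder IDs), default_parallel "${#models[@]}" prints 3, so at most three sub-runs share the global --ratelimit budget at any time.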

With --submit, each report.json is uploaded separately. Quota is enforced per model for the current ISO calendar week (UTC): Scout includes 2, Bencher 5, and Founder 10 submissions per model per week. Each distinct model id in report.summary.models consumes one slot for that week, so a multi-model report counts once toward every model it includes. The CLI may still send a shared batch id (quotaGroupId) so the server can correlate the uploads; it does not merge quota. Public submit only accepts full runs: no --limit, no category or level filters, and no fail-fast partial runs (exit_early must be false in report.json). The special roblox category may be added to a full run; its results are visible in the report but excluded from Overall. Details: Public submission rules.
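The quota arithmetic above can be checked locally before uploading. A sketch under stated assumptions: all helper names are hypothetical, iso_week_utc relies on GNU date for its optional date argument, and not_exit_early is only a text match for "exit_early": false. It does not validate the other submission rules (no --limit, no filters).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the documented public-submit bookkeeping.

# Per-model weekly submission quota by plan.
weekly_quota() {
  case "$1" in
    Scout) echo 2 ;;
    Bencher) echo 5 ;;
    Founder) echo 10 ;;
    *) echo "unknown plan: $1" >&2; return 1 ;;
  esac
}

# ISO week label in UTC, for example 2025-W01; quota is counted per such week.
# With an argument, GNU date's -d parses the given date instead of "now".
iso_week_utc() {
  if [ $# -gt 0 ]; then
    date -u -d "$1" +%G-W%V
  else
    date -u +%G-W%V
  fi
}

# Slots left for one model this week, given how many submitted reports
# already include that model (a multi-model report counts for each model).
slots_left() {
  local quota used=$2
  quota=$(weekly_quota "$1") || return 1
  local left=$(( quota - used ))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

# Rough check that a report is not a fail-fast partial run.
# Assumption: the key appears literally as "exit_early" with a JSON boolean.
not_exit_early() {
  grep -Eq '"exit_early"[[:space:]]*:[[:space:]]*false' "$1"
}
```

For example, on the Bencher plan a model that already appears in two submitted reports this week gets slots_left Bencher 2, which prints 3. A stricter report check would parse the JSON with a tool such as jq.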

Roblox OpenGameEval

roblox is an opt-in category backed by Roblox OpenGameEval rather than the normal chat-completions scorer. Default “all categories” runs exclude it; include roblox explicitly when you want those tests.

export OPEN_GAME_EVAL_API_KEY=...
export OPENAI_API_KEY=...

blxbench --headless --provider opr --models openai/gpt-5.4-mini \
  --category coding_ui debugging hallucination reasoning refactoring security speed roblox \
  --roblox-llm-name openai \
  --roblox-llm-model-version gpt-5

No local Python, uv, or Roblox Studio install is required. You do need a Roblox account and an OpenCloud API key with studio-evaluations:create. Roblox currently supports openai, claude, and gemini as OpenGameEval LLM names; OpenRouter models are not exposed through this path until Roblox offers custom provider/base-url support.
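A quick preflight in a script can catch missing Roblox configuration before the run starts. The sketch below is hypothetical and checks only what this section states: OPEN_GAME_EVAL_API_KEY must be set, and the LLM name must be one of openai, claude, or gemini. It does not check provider keys such as OPENAI_API_KEY, because this page does not say which key each LLM name needs.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for runs that include the roblox category.
roblox_preflight() {
  local llm_name=$1
  if [ -z "${OPEN_GAME_EVAL_API_KEY:-}" ]; then
    echo "OPEN_GAME_EVAL_API_KEY is not set" >&2
    return 1
  fi
  case "$llm_name" in
    openai|claude|gemini) ;;   # LLM names Roblox currently accepts
    *) echo "unsupported --roblox-llm-name: $llm_name" >&2; return 1 ;;
  esac
}
```

Usage: roblox_preflight openai && blxbench --headless ... with the flags from the example above.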

Roblox-specific flags:

| Flag | Purpose |
| --- | --- |
| --roblox-adapter rbx | Adapter for https://apis.roblox.com/open-eval-api/v1; reads OPEN_GAME_EVAL_API_KEY. |
| --roblox-llm-name NAME | Provider name sent to Roblox OpenGameEval: openai, claude, or gemini. |
| --roblox-llm-model-version VERSION | Model version sent to Roblox. Defaults to the selected BLXBench model id without an OpenRouter-style provider prefix. |
| --roblox-max-concurrent N | Maximum concurrent Roblox jobs per model. Keep this low unless Roblox raises your quota. |
| --roblox-poll-interval SECONDS | Interval between polls for eval records. |
| --roblox-timeout SECONDS | Timeout for each job. |
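The default for --roblox-llm-model-version strips the OpenRouter-style provider prefix from the model id. In shell terms that is a prefix removal up to the first slash. This sketch assumes the provider prefix itself never contains a slash; default_roblox_model_version is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented default: drop a leading "provider/"
# from the model id. Ids without a slash are returned unchanged.
default_roblox_model_version() {
  local model_id=$1
  echo "${model_id#*/}"
}
```

With --models openai/gpt-5.4-mini and no explicit version, Roblox would receive gpt-5.4-mini; the example earlier in this section overrides that with gpt-5.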

Roblox results use category: "roblox" and include eval metadata such as job id, record URL, place id, and check counts. Secrets are not written to reports.

Integration with CI/CD

GitHub Actions

name: Benchmark
on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1

      - name: Run benchmark
        run: |
          bun install -g @bitslix/blxbench
          blxbench --headless --provider opr --models openai/gpt-5.4-mini

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blxbench-results
          # The env context does not include HOME; ~ is expanded by upload-artifact
          path: ~/.blxbench/reports/

GitLab CI

stages:
  - benchmark

benchmark:
  image: oven/bun:1
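  script:
    - bun install -g @bitslix/blxbench
    # Keep the exit code so reports are still copied when the run fails
    - blxbench --headless --provider opr --models openai/gpt-5.4-mini || status=$?
    # GitLab only collects artifacts from inside the project directory
    - mkdir -p blxbench-reports && cp -r "$HOME/.blxbench/reports/." blxbench-reports/ || true
    - exit "${status:-0}"
  artifacts:
    when: always
    paths:
      - blxbench-reports/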

Exit Codes

| Code | Description |
| --- | --- |
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Test failure (with --fail-fast) |
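To make CI logs explain why a run stopped, a wrapper can translate these codes. This is a sketch: describe_exit and run_and_explain are hypothetical helpers, and codes outside the table are reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# Maps the documented blxbench exit codes to short descriptions.
describe_exit() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "General error" ;;
    2) echo "Invalid arguments" ;;
    3) echo "Test failure (with --fail-fast)" ;;
    *) echo "Unknown exit code $1" ;;
  esac
}

# Runs a command, prints its exit code in words on stderr,
# and returns the same code so the CI job still passes or fails correctly.
run_and_explain() {
  local status=0
  "$@" || status=$?
  echo "exit $status: $(describe_exit "$status")" >&2
  return "$status"
}
```

Usage: run_and_explain blxbench --headless --provider opr --models openai/gpt-5.4-mini. Because the code is passed through, the job still fails on 1, 2, or 3.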

Rate Limiting

Use --ratelimit to avoid hitting provider rate limits:

# Default (60 RPM)
blxbench --headless --provider opr --models openai/gpt-5.4-mini --ratelimit

# Custom (30 requests per minute)
blxbench --headless --provider opr --models openai/gpt-5.4-mini --ratelimit 30
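Because the --ratelimit budget is shared by all sub-runs, it helps to estimate the time cost up front. The helpers below give rough, hypothetical estimates: they assume requests are spaced evenly and that the budget is split evenly between parallel sub-runs. The runner's actual scheduling may differ.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope helpers for --ratelimit (requests per minute, 60 by default).

# Lower bound on rate-limited time, in seconds, for N requests at R requests/minute,
# rounded up: ceil(N * 60 / R).
min_seconds() {
  local requests=$1 rpm=${2:-60}
  echo $(( (requests * 60 + rpm - 1) / rpm ))
}

# Approximate per-model budget when P sub-runs share one global budget
# (assumes an even split; integer division).
per_run_rpm() {
  local rpm=$1 parallel=$2
  echo $(( rpm / parallel ))
}
```

For example, min_seconds 300 30 prints 600 (about 10 minutes for 300 requests at 30 RPM), and per_run_rpm 30 3 prints 10 for three parallel models.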

Output Handling

Save JSON Results

blxbench --headless --provider opr --models openai/gpt-5.4-mini --save-json ./my-results.json

--save-json is an extra export. The regular run folder, HTML report, report.json, screenshots, artifacts, and aggregate ranking files still go under ~/.blxbench/reports/ unless you configure another results directory in the TUI.
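Scripts that need the run's own report.json, for example to archive it next to the --save-json copy, can locate it by modification time. A sketch, assuming GNU find (for -printf) and that each run folder contains a file named report.json. The folder layout beneath ~/.blxbench/reports/ is not specified here, and latest_report is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the most recently modified report.json under a
# reports directory (default ~/.blxbench/reports). Requires GNU find.
latest_report() {
  local dir=${1:-$HOME/.blxbench/reports}
  # Emit "mtime path" pairs, sort numerically by mtime, keep the newest path.
  find "$dir" -type f -name report.json -printf '%T@ %p\n' 2>/dev/null \
    | sort -n | tail -n 1 | cut -d' ' -f2-
}
```

Usage: cp "$(latest_report)" ./report.json copies the newest report into the working directory, which is also where CI systems usually collect artifacts.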

Capture Output

# Suppress progress output (discards everything written to stderr)
blxbench --headless --provider opr --models openai/gpt-5.4-mini 2>/dev/null

# Append stdout and stderr to a log file
blxbench --headless --provider opr --models openai/gpt-5.4-mini >> benchmark.log 2>&1
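Piping through tee shows output live and logs it at the same time, but the pipeline's status is then tee's rather than blxbench's, which would hide the exit codes listed above. The sketch below keeps the command's own status through bash's PIPESTATUS array; run_logged is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Runs a command, appending stdout and stderr to a log file while still
# printing them, and returns the command's exit code instead of tee's.
run_logged() {
  local logfile=$1
  shift
  "$@" 2>&1 | tee -a "$logfile"
  return "${PIPESTATUS[0]}"
}
```

Usage: run_logged benchmark.log blxbench --headless --provider opr --models openai/gpt-5.4-mini. Setting set -o pipefail is an alternative, but it reports a failure from any stage of the pipeline, including tee.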

Automated Submission

Set environment variables for automatic submission:

export BLXBENCH_API_KEY=your-key
export BLXBENCH_SUBMIT=1

blxbench --headless --provider opr --models openai/gpt-5.4-mini

Or pass the equivalent flags:

blxbench --headless --provider opr --models openai/gpt-5.4-mini --submit --api-key your-key
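Passing --api-key on the command line can leave the key in shell history and in process listings, so in CI the environment variable, filled from the CI system's secret store, is usually the safer choice. A minimal sketch that turns on BLXBENCH_SUBMIT only when a key is present, so jobs without access to the secret (such as those from forks) still run the benchmark; enable_submit_if_key is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical CI helper: enable automatic submission only if an API key exists.
enable_submit_if_key() {
  if [ -n "${BLXBENCH_API_KEY:-}" ]; then
    export BLXBENCH_SUBMIT=1
  else
    echo "BLXBENCH_API_KEY not set; running without submission" >&2
    unset BLXBENCH_SUBMIT
  fi
}
```

Call enable_submit_if_key before blxbench --headless ... and leave --submit and --api-key off the command line.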

Non-Interactive Detection

BLXBench automatically detects non-TTY environments and skips the TUI. To force the same behavior in a terminal:

blxbench --headless --provider opr --models openai/gpt-5.4-mini
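A wrapper script can run a similar check itself. This sketch uses the shell's -t test to see whether stdin and stdout are terminals. Which streams blxbench actually inspects is not documented here, so passing --headless explicitly remains the reliable option. has_tty is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if both stdin and stdout are terminals.
has_tty() {
  [ -t 0 ] && [ -t 1 ]
}
```

Usage: has_tty || echo "no TTY detected; expect blxbench to run headless".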
