BLXBench - Headless Mode

Headless mode allows BLXBench to run in CI/CD pipelines, scripts, and automated workflows. Install the blxbench command from the @bitslix/blxbench package (see Installation) before running the examples below.

Basic Usage

blxbench --headless --provider opr --models openai/gpt-5.4-mini

Omit --headless if the process already has no TTY (typical in CI); blxbench then enters the same headless path automatically.

For --provider lms (LM Studio) or --provider oll (Ollama), the runner connects to fixed loopback URLs on 127.0.0.1. If your LM Studio or Ollama daemon listens on a non-default TCP port, pass --local-inference-port N. Otherwise the defaults are used: 1234 for LM Studio and 11434 for Ollama.
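Scripts that target either local backend can keep the default ports in one place. The following is a minimal POSIX sh sketch. `local_inference_port` and `LOCAL_PORT` are hypothetical names, and only the defaults 1234 (lms) and 11434 (oll) come from this page. The commented example also assumes that passing the default port explicitly behaves the same as omitting the flag.

```shell
# Hypothetical helper: print the loopback TCP port for a local provider.
# Defaults from this page: 1234 for lms (LM Studio), 11434 for oll (Ollama).
# A non-empty second argument overrides the default.
local_inference_port() {
  if [ -n "${2:-}" ]; then
    echo "$2"
    return 0
  fi
  case "$1" in
    lms) echo 1234 ;;
    oll) echo 11434 ;;
    *)
      echo "not a local provider: $1" >&2
      return 1
      ;;
  esac
}

# Example (MODEL and LOCAL_PORT are placeholders you set yourself):
# blxbench --headless --provider oll --models "$MODEL" \
#   --local-inference-port "$(local_inference_port oll "${LOCAL_PORT:-}")"
```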

Reports are written to the user's report directory by default:

Linux/macOS: ~/.blxbench/reports/
Windows: %USERPROFILE%\.blxbench\reports\
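After a run, scripts often need the newest run folder. The sketch below covers only the Linux/macOS path from this page. `blxbench_report_dir` and `latest_run_dir` are hypothetical helpers, and the one-subdirectory-per-run layout is an assumption.

```shell
# Default report root on Linux/macOS, as documented on this page.
# (On Windows it is %USERPROFILE%\.blxbench\reports\ instead.)
blxbench_report_dir() {
  echo "${HOME:?HOME is not set}/.blxbench/reports"
}

# Hypothetical helper: print the most recently modified run folder.
# Assumes each run gets its own subdirectory under the report root.
latest_run_dir() {
  root=${1:-$(blxbench_report_dir)}
  # ls -t sorts by modification time, newest first; -d lists the
  # directories themselves rather than their contents.
  newest=$(ls -td "$root"/*/ 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no run folders under $root" >&2
    return 1
  fi
  echo "${newest%/}"
}

# Example:
# ls "$(latest_run_dir)"
```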

Use --save-json PATH to write an additional copy of the JSON results. To change the report directory itself, run /set output-dir PATH in the TUI.

Desktop notification when the run finishes

Add --notify to request a short desktop notification from the OS when the benchmark completes. The same rules as in the TUI apply: you are notified on success or failure, but not when a run is aborted. You can also set BLXBENCH_NOTIFY=1 or persist desktopNotify in ~/.blxbench/config.json (see Desktop notifications under Configuration). Set BLXBENCH_NOTIFY=0 to force notifications off, for example in CI.
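Shared scripts may run both on laptops and in CI, so they can switch notifications off only when a CI environment is detected. This is a sketch. `disable_notify_in_ci` is a hypothetical helper, and it relies on the fact that GitHub Actions and GitLab CI both set CI=true. Other CI systems may need a different check.

```shell
# Hypothetical CI guard: force notifications off when CI is set, and leave
# any existing BLXBENCH_NOTIFY choice untouched everywhere else.
disable_notify_in_ci() {
  if [ -n "${CI:-}" ]; then
    BLXBENCH_NOTIFY=0
    export BLXBENCH_NOTIFY
  fi
}

# Example:
# disable_notify_in_ci
# blxbench --headless --provider opr --models openai/gpt-5.4-mini
```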

Multiple models

Pass more than one model ID to start a separate benchmark run for each model (one run_id and one report.json per model). Use --parallel [n] to cap how many sub-runs execute concurrently (the runner defaults to min(3, number of models)). The global --ratelimit budget is shared by all sub-runs together rather than applied to each one.
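The default concurrency is plain arithmetic. The sketch below reproduces min(3, number of models) in POSIX sh. `default_parallel` is a hypothetical helper, and the model IDs in the commented command are placeholders. The command also assumes that --models takes space-separated IDs, the way --category does in the Roblox section.

```shell
# Hypothetical helper: the runner's documented default for --parallel,
# min(3, number of models), given the model IDs as arguments.
default_parallel() {
  if [ "$#" -lt 3 ]; then
    echo "$#"
  else
    echo 3
  fi
}

# Example with placeholder IDs: five sub-runs, at most two at a time,
# all drawing from one shared --ratelimit budget.
# blxbench --headless --provider opr \
#   --models model-a model-b model-c model-d model-e \
#   --parallel 2 --ratelimit 30
```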

With --submit, each report.json is uploaded separately. Quota is enforced per model for the current ISO calendar week (UTC): Scout includes 2, Bencher 5, and Founder 10 submissions per model per week. Each distinct model id in report.summary.models consumes one slot for that week, so a multi-model report counts once toward every model it includes. The CLI may still send a shared batch id (quotaGroupId) so the server can correlate the uploads; this does not merge quota.

Public submission only accepts full runs: no --limit, no category or level filters, and no fail-fast partial runs (exit_early must be false in report.json). The special roblox category may be added to a full run; it is shown in the report but excluded from Overall. Details: Public submission rules.
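For a single model, the quota rules come down to a lookup and a subtraction. This is a minimal sketch: the tier numbers and the UTC ISO-week rule come from this page, while `weekly_quota`, `remaining_slots`, and `current_quota_week` are hypothetical helpers and not part of the blxbench CLI. The server remains the authority on which submissions have been counted.

```shell
# Weekly submission slots per model, by tier (from this page).
weekly_quota() {
  case "$1" in
    Scout) echo 2 ;;
    Bencher) echo 5 ;;
    Founder) echo 10 ;;
    *)
      echo "unknown tier: $1" >&2
      return 1
      ;;
  esac
}

# Slots left for one model this week, given how many it already used.
# A report covering models A and B uses one slot from each model's count.
remaining_slots() {
  quota=$(weekly_quota "$1") || return 1
  left=$((quota - $2))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

# The quota week is the ISO calendar week in UTC, printed like 2025-W07.
# %G (ISO year) and %V (ISO week) are supported by GNU and BSD date.
current_quota_week() {
  date -u +%G-W%V
}
```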

Roblox OpenGameEval

roblox is an opt-in category backed by Roblox OpenGameEval rather than the normal chat-completions scorer. Runs over the default “all categories” set exclude it; list roblox explicitly in --category when you want those tests.

export OPEN_GAME_EVAL_API_KEY=...
export OPENAI_API_KEY=...

blxbench --headless --provider opr --models openai/gpt-5.4-mini \
  --category coding_ui debugging hallucination reasoning refactoring security speed roblox \
  --roblox-llm-name openai \
  --roblox-llm-model-version gpt-5

No local Python, uv, or Roblox Studio installation is required. You do need a Roblox account and an Open Cloud API key with the studio-evaluations:create scope. Roblox currently accepts openai, claude, and gemini as OpenGameEval LLM names. OpenRouter models cannot be used through this path until Roblox supports custom providers or base URLs.
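A Roblox run needs credentials that are easy to forget in CI, so a preflight check can fail before any work starts. The sketch below uses a hypothetical `require_env` helper. The variable names in the usage line are the ones exported in the Roblox command above.

```shell
# Hypothetical preflight check: report every required variable that is
# unset or empty, and return non-zero if any are missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup in POSIX sh; names must be valid shell identifiers.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# require_env OPEN_GAME_EVAL_API_KEY OPENAI_API_KEY || exit 1
```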

Roblox-specific flags:

Flag	Purpose
`--roblox-adapter rbx`	Adapter for `https://apis.roblox.com/open-eval-api/v1`; reads `OPEN_GAME_EVAL_API_KEY`.
`--roblox-llm-name openai|claude|gemini`	Provider name sent to Roblox OpenGameEval.
`--roblox-llm-model-version VERSION`	Model version sent to Roblox. Defaults to the selected BLXBench model id without an OpenRouter-style provider prefix.
`--roblox-max-concurrent N`	Max concurrent Roblox jobs per model. Keep low unless Roblox raises your quota.
`--roblox-poll-interval SECONDS`	Poll interval for eval records.
`--roblox-timeout SECONDS`	Per-job timeout.
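The documented default for --roblox-llm-model-version strips an OpenRouter-style provider prefix. The sketch below expresses that with POSIX parameter expansion. `roblox_model_version` is a hypothetical helper, and how the runner treats ids containing more than one slash is not covered here.

```shell
# Hypothetical helper mirroring the documented default: drop a leading
# "provider/" prefix. ${1#*/} removes the shortest leading match of "*/",
# i.e. everything up to and including the first slash; ids without a
# slash are returned unchanged.
roblox_model_version() {
  echo "${1#*/}"
}
```

For openai/gpt-5.4-mini this yields gpt-5.4-mini. Pass the flag explicitly when Roblox should receive a different version, as the earlier example does with gpt-5.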

Roblox results use category: "roblox" and include eval metadata such as job id, record URL, place id, and check counts. Secrets are not written to reports.

Integration with CI/CD

GitHub Actions

name: Benchmark
on: [push, pull_request]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v1

      - name: Run benchmark
        run: |
          bun install -g @bitslix/blxbench
          blxbench --headless --provider opr --models openai/gpt-5.4-mini

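      - name: Upload results
        # Upload whatever reports exist even when the benchmark step fails.
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blxbench-results
          # The env context only holds variables set in the workflow, so
          # env.HOME would be empty here; upload-artifact expands ~ itself.
          path: ~/.blxbench/reports/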

GitLab CI

stages:
  - benchmark

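benchmark:
  image: oven/bun:1
  script:
    - bun install -g @bitslix/blxbench
    - blxbench --headless --provider opr --models openai/gpt-5.4-mini
  after_script:
    # GitLab only collects artifacts inside the project directory,
    # so copy the reports there (after_script also runs on failure).
    - cp -r "$HOME/.blxbench/reports" "$CI_PROJECT_DIR/blxbench-reports" || true
  artifacts:
    when: always
    paths:
      - blxbench-reports/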

Exit Codes

Code	Description
0	Success
1	General error
2	Invalid arguments
3	Test failure (with `--fail-fast`)
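CI logs are easier to read when the numeric status is translated, as long as the original status is still propagated. The sketch below does that with a hypothetical `describe_exit` helper whose messages follow the table above.

```shell
# Hypothetical helper: translate a blxbench exit status into the meanings
# listed in the exit-code table.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "general error" ;;
    2) echo "invalid arguments" ;;
    3) echo "test failure (--fail-fast)" ;;
    *) echo "unexpected exit status $1" ;;
  esac
}

# Example: log the meaning, then exit with the original status so the
# CI job still fails when blxbench fails.
# status=0
# blxbench --headless --provider opr --models openai/gpt-5.4-mini || status=$?
# echo "blxbench: $(describe_exit "$status")"
# exit "$status"
```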

Rate Limiting

Use --ratelimit to avoid hitting provider rate limits:

# Default (60 RPM)
blxbench --headless --provider opr --models openai/gpt-5.4-mini --ratelimit

# Custom (30 requests per minute)
blxbench --headless --provider opr --models openai/gpt-5.4-mini --ratelimit 30
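A budget of N requests per minute averages one request every 60000 / N milliseconds. Because the budget is shared across parallel sub-runs (see Multiple models), each sub-run gets roughly N / parallel RPM. The helpers below are hypothetical, and the even split is an assumption: the runner may not distribute requests evenly.

```shell
# Average spacing between requests, in milliseconds, for a given RPM.
min_interval_ms() {
  echo $((60000 / $1))
}

# Rough per-sub-run share of a shared budget, assuming an even split
# (integer division rounds down).
per_subrun_rpm() {
  echo $(($1 / $2))
}
```

For example, --ratelimit 30 with three parallel models averages about 10 requests per minute per model, one every 6 seconds.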

Output Handling

Save JSON Results

blxbench --headless --provider opr --models openai/gpt-5.4-mini --save-json ./my-results.json

--save-json is an extra export. The regular run folder, HTML report, report.json, screenshots, artifacts, and aggregate ranking files still go under ~/.blxbench/reports/ unless you configure another results directory in the TUI.
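Repeated CI runs can overwrite a fixed --save-json path. One option is to build a unique path per model and run. `json_export_path` is a hypothetical helper, and the usage creates the directory first because this page does not say whether --save-json creates missing parent directories.

```shell
# Hypothetical helper: build DIR/MODEL-STAMP.json, replacing slashes in the
# model id so it stays a single file name. STAMP defaults to UTC time.
json_export_path() {
  dir=$1
  model=$2
  stamp=${3:-$(date -u +%Y%m%dT%H%M%SZ)}
  safe_model=$(printf '%s' "$model" | tr '/' '_')
  echo "$dir/$safe_model-$stamp.json"
}

# Example:
# mkdir -p ./results
# blxbench --headless --provider opr --models openai/gpt-5.4-mini \
#   --save-json "$(json_export_path ./results openai/gpt-5.4-mini)"
```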

Capture Output

# Suppress progress output
blxbench --headless --provider opr --models openai/gpt-5.4-mini 2>/dev/null

# Log to file
blxbench --headless --provider opr --models openai/gpt-5.4-mini >> benchmark.log 2>&1
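To watch output live and still keep a log, pipe through tee. In a plain POSIX pipeline, though, the exit status is tee's, not blxbench's. The sketch below works around that with a temporary status file. `run_logged` is a hypothetical helper, and in bash `set -o pipefail` is an alternative.

```shell
# Hypothetical helper: run a command, show its stdout and stderr live,
# append both to a log file, and return the command's own exit status.
run_logged() {
  log=$1
  shift
  status_file=$(mktemp) || return 1
  # The brace group records the command's status before tee runs.
  { "$@" 2>&1; echo "$?" >"$status_file"; } | tee -a "$log"
  status=$(cat "$status_file")
  rm -f "$status_file"
  return "$status"
}

# Example:
# run_logged benchmark.log blxbench --headless --provider opr --models openai/gpt-5.4-mini
```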

Automated Submission

Set environment variables for automatic submission:

export BLXBENCH_API_KEY=your-key
export BLXBENCH_SUBMIT=1

blxbench --headless --provider opr --models openai/gpt-5.4-mini

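Or pass the equivalent flags. In CI, prefer the environment variables, because a key passed on the command line can appear in shell history and process listings: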

blxbench --headless --provider opr --models openai/gpt-5.4-mini --submit --api-key your-key
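When submission is driven by environment variables, a missing key may only surface after a long run. The hypothetical `check_submit_env` sketch below checks the two variables named on this page before the run starts.

```shell
# Hypothetical preflight: if BLXBENCH_SUBMIT=1 requests automatic
# submission, require BLXBENCH_API_KEY to be set and non-empty.
check_submit_env() {
  if [ "${BLXBENCH_SUBMIT:-}" = "1" ] && [ -z "${BLXBENCH_API_KEY:-}" ]; then
    echo "BLXBENCH_SUBMIT=1 but BLXBENCH_API_KEY is empty" >&2
    return 1
  fi
  return 0
}

# Example:
# check_submit_env || exit 2
# blxbench --headless --provider opr --models openai/gpt-5.4-mini
```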

Non-Interactive Detection

BLXBench automatically detects non-TTY environments and skips the TUI. To force the same headless behavior from an interactive terminal, pass --headless:

blxbench --headless --provider opr --models openai/gpt-5.4-mini
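Scripts can always pass --headless, so they behave the same from a terminal, cron, or a CI runner. `test -t` is the standard POSIX check for whether a file descriptor is a terminal. The sketch below uses it only for logging; BLXBench's own detection rule is not documented here and may differ. Both helper names are hypothetical.

```shell
# Hypothetical wrapper: always request headless mode.
blxbench_headless() {
  blxbench --headless "$@"
}

# Print "tty" when stdin and stdout are both terminals, else "no-tty".
# Call it directly, not inside $(...): command substitution turns stdout
# into a pipe, so it would always report no-tty there.
session_kind() {
  if [ -t 0 ] && [ -t 1 ]; then
    echo tty
  else
    echo no-tty
  fi
}

# Example:
# session_kind
# blxbench_headless --provider opr --models openai/gpt-5.4-mini
```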
