BLXBench UI
blxbench

Benchmark

Misc

Docs | Our Tests | Pass | Sponsor / Partnership

Test fixture

Coding-Hard-Json-Pointer

Coding | v2 - Resilience | hard | scorer: javascript_function_tests

Implementation-focused coding tasks with structured correctness checks.

How it is scored

The model receives the user prompt and, optionally, a system message. The run uses the scorer javascript_function_tests with the JSON configuration below. That scorer alone determines pass/fail and partial credit from the model output; there is no human grading.
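
The scorer's internals are not published on this page. As a rough illustration only, the sketch below shows one way a harness could turn raw model output into a callable function, assuming the output is a JSON object with a string field `code` that defines the function at top level or on `module.exports`. The helper name `loadSubmission`, its parameters, and the use of Node's `vm` module are all assumptions, not the actual javascript_function_tests implementation.

```javascript
const vm = require("node:vm");

// Hypothetical loader (not the real scorer): parse the model output and
// return the function named functionName.
function loadSubmission(rawOutput, functionName, loadTimeoutMs) {
  // JSON.parse throws if the output is not pure JSON (e.g. wrapped in markdown).
  const parsed = JSON.parse(rawOutput);
  if (parsed === null || typeof parsed.code !== "string") {
    throw new Error('Expected a JSON object with a string field "code"');
  }

  // Run the submitted code in a fresh context that exposes only module.exports.
  const sandbox = { module: { exports: {} } };
  vm.createContext(sandbox);
  // The timeout here limits only this load step, not later calls to the function.
  vm.runInContext(parsed.code, sandbox, { timeout: loadTimeoutMs });

  // Accept a top-level declaration, module.exports.<name>, or module.exports itself.
  const exported = sandbox.module.exports;
  const candidates = [
    sandbox[functionName],
    exported && exported[functionName],
    exported,
  ];
  const fn = candidates.find((c) => typeof c === "function");
  if (!fn) throw new Error("Function not found: " + functionName);
  return fn;
}
```

Note that Node's `vm` module is not a security boundary, so a real harness would need stronger isolation before running untrusted model output.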

User prompt
Return JSON only with a string field named code. The code must be dependency-free JavaScript, define the requested function in the top level or module.exports, and include no markdown, imports, require, timers, network, filesystem, eval, or placeholders.

Implement function applyJsonPatch(doc, ops) supporting JSON Patch add, remove, and replace with JSON Pointer paths. Return a patched deep copy and do not mutate doc.

Your JSON must look like {"code":"function applyJsonPatch(...) { ... }"}.
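
For orientation, here is a minimal sketch of the kind of function the prompt asks for. It is not a reference solution and has not been run against the hidden tests. It assumes `doc` is plain JSON data (so a JSON round-trip is a valid deep copy), that invalid paths should throw, and that removing the document root is an error; the helpers `parsePointer`, `toIndex`, and `getChild` are illustrative names.

```javascript
// Decode a JSON Pointer into reference tokens ("~1" -> "/", then "~0" -> "~").
function parsePointer(path) {
  if (path === "") return [];
  if (typeof path !== "string" || path[0] !== "/") throw new Error("Invalid JSON Pointer: " + path);
  return path.slice(1).split("/").map((t) => t.replace(/~1/g, "/").replace(/~0/g, "~"));
}

// Convert a token to an array index; "-" (end of array) is allowed only for add.
function toIndex(token, length, allowEnd) {
  if (allowEnd && token === "-") return length;
  if (!/^(0|[1-9]\d*)$/.test(token)) throw new Error("Invalid array index: " + token);
  const i = Number(token);
  if (i > (allowEnd ? length : length - 1)) throw new Error("Index out of bounds: " + token);
  return i;
}

// Step one level down, throwing if the token does not resolve.
function getChild(container, token, path) {
  if (Array.isArray(container)) return container[toIndex(token, container.length, false)];
  if (container !== null && typeof container === "object" &&
      Object.prototype.hasOwnProperty.call(container, token)) return container[token];
  throw new Error("Path not found: " + path);
}

function applyJsonPatch(doc, ops) {
  // Deep copy so the caller's doc is never mutated (assumes plain JSON data).
  let result = JSON.parse(JSON.stringify(doc));
  for (const { op, path, value } of ops) {
    if (op !== "add" && op !== "remove" && op !== "replace") throw new Error("Unsupported op: " + op);
    if (op !== "remove" && value === undefined) throw new Error("Missing value for " + op);
    const tokens = parsePointer(path);
    const copy = op === "remove" ? undefined : JSON.parse(JSON.stringify(value));
    if (tokens.length === 0) {
      // Empty pointer targets the whole document (assumption: root removal is rejected).
      if (op === "remove") throw new Error("Cannot remove the document root");
      result = copy;
      continue;
    }
    let parent = result;
    for (const token of tokens.slice(0, -1)) parent = getChild(parent, token, path);
    const last = tokens[tokens.length - 1];
    if (Array.isArray(parent)) {
      const i = toIndex(last, parent.length, op === "add");
      if (op === "add") parent.splice(i, 0, copy);      // insert, shifting later items
      else if (op === "remove") parent.splice(i, 1);
      else parent[i] = copy;
    } else if (parent !== null && typeof parent === "object") {
      const exists = Object.prototype.hasOwnProperty.call(parent, last);
      if (op !== "add" && !exists) throw new Error("Path not found: " + path);
      if (op === "remove") delete parent[last];
      // defineProperty keeps keys such as "__proto__" as ordinary own properties.
      else Object.defineProperty(parent, last, { value: copy, writable: true, enumerable: true, configurable: true });
    } else {
      throw new Error("Path not found: " + path);
    }
  }
  return result;
}
```

For example, applying `[{ op: "add", path: "/a/-", value: 3 }]` to `{ a: [1, 2] }` returns a new object whose `a` is `[1, 2, 3]`, while the input object is left unchanged.
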
Scorer config
{
  "function_name": "applyJsonPatch",
  "timeout_ms": 250,
  "test_pass_threshold": 1,
  "partial_credit_threshold": 0.5,
  "tests": "[hidden executable tests]"
}
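
The page does not spell out how the two thresholds are applied. One plausible reading, shown here purely as an assumption, is that the scorer computes the fraction of hidden tests passed and compares it against `test_pass_threshold` (full pass) and `partial_credit_threshold` (partial credit). The helper `classifyRun` and the verdict labels are hypothetical.

```javascript
// Hypothetical classification (assumed semantics, not confirmed by the source):
// ratio >= test_pass_threshold      -> "pass"
// ratio >= partial_credit_threshold -> "partial"
// otherwise                         -> "fail"
function classifyRun(passed, total, config) {
  const ratio = total > 0 ? passed / total : 0;
  if (ratio >= config.test_pass_threshold) return { ratio, verdict: "pass" };
  if (ratio >= config.partial_credit_threshold) return { ratio, verdict: "partial" };
  return { ratio, verdict: "fail" };
}

// Thresholds copied from the scorer config on this page.
const exampleConfig = { test_pass_threshold: 1, partial_credit_threshold: 0.5 };
console.log(classifyRun(5, 10, exampleConfig).verdict);
```

Under this reading, a threshold of 1 means every hidden test must succeed for a full pass, and at least half must succeed for partial credit.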
Run parameters

temperature: 0
max_tokens: 2200
timeout (s): 120
type: scored
file: coding-hard-json-pointer.json


BLXBench

Community-driven leaderboard

Public benchmark runner: run in your environment, share results with the community.

© 2026 BLXBench by bitslix.com

Provenance: Aggregated from user runs
Scope: 41 / 11 / 490
Latest: run_3d5451 / 459 / $1.75