Test fixture

Coding-Hard-Token-Bucket

Codingv2 - Resilience · hard · scorer: javascript_function_tests

Implementation-focused coding tasks with structured correctness checks.

How it is scored

The model receives the user prompt and, if one is set, a system message. The run uses the javascript_function_tests scorer with the JSON configuration below. That scorer alone determines pass/fail and partial credit from the model output; there is no human grading.

User prompt

Return JSON only with a string field named code. The code must be dependency-free JavaScript, define the requested function in the top level or module.exports, and include no markdown, imports, require, timers, network, filesystem, eval, or placeholders.

Implement function createTokenBucket(capacity, refillPerMs) that returns allow(cost, timestampMs). For this benchmark also support createTokenBucket(capacity, refillPerMs, calls), where calls is an array of [cost,timestampMs], and return allow results for those calls.

Your JSON must look like {"code":"function createTokenBucket(...) { ... }"}.
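
For orientation only (this is not part of the prompt), here is a minimal sketch of a submission that follows the contract above. The hidden tests define the real edge-case behavior, so several choices here are assumptions: the bucket starts full, refill is continuous and capped at capacity, allow returns a boolean, and a timestamp earlier than the previous one adds no tokens. The last line shows one way to wrap the source into the required JSON shape.

```javascript
// Sketch only: the hidden tests may expect different edge-case behavior.
function createTokenBucket(capacity, refillPerMs, calls) {
  let tokens = capacity; // assumption: the bucket starts full
  let lastMs = null;     // timestamp of the most recent refill point

  function allow(cost, timestampMs) {
    if (lastMs === null) {
      lastMs = timestampMs;
    } else if (timestampMs > lastMs) {
      // Refill in proportion to elapsed time, never above capacity.
      tokens = Math.min(capacity, tokens + (timestampMs - lastMs) * refillPerMs);
      lastMs = timestampMs;
    }
    // assumption: an out-of-order timestamp adds no tokens and leaves lastMs alone
    if (cost <= tokens) {
      tokens -= cost;
      return true;
    }
    return false;
  }

  // Benchmark form: replay [cost, timestampMs] pairs and collect the results.
  if (Array.isArray(calls)) {
    return calls.map(([cost, timestampMs]) => allow(cost, timestampMs));
  }
  return allow;
}

// Wrap the function source in the JSON envelope the prompt asks for.
const response = JSON.stringify({ code: createTokenBucket.toString() });
```

With capacity 2 and a refill rate of 0.5 tokens per millisecond, the calls [[2,0],[1,0],[1,2],[1,3]] give [true, false, true, false] under these assumptions: the first call drains the bucket, and one token is back by t=2.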

Scorer config

{
  "function_name": "createTokenBucket",
  "timeout_ms": 250,
  "test_pass_threshold": 1,
  "partial_credit_threshold": 0.5,
  "tests": "[hidden executable tests]"
}
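
The fixture does not say how the scorer applies the two thresholds. A plausible reading, sketched below as an assumption rather than the scorer's actual logic, is that the fraction of passing tests is compared against each threshold in turn. The function name classifyRun and the verdict labels are hypothetical.

```javascript
// Hypothetical helper: maps a test pass fraction to a verdict.
// Assumes thresholds are inclusive lower bounds on the pass fraction.
function classifyRun(passed, total, config) {
  const fraction = total === 0 ? 0 : passed / total;
  if (fraction >= config.test_pass_threshold) return "pass";
  if (fraction >= config.partial_credit_threshold) return "partial";
  return "fail";
}

const exampleConfig = { test_pass_threshold: 1, partial_credit_threshold: 0.5 };
const verdicts = [
  classifyRun(10, 10, exampleConfig), // every test passes
  classifyRun(5, 10, exampleConfig),  // exactly half pass
  classifyRun(4, 10, exampleConfig),  // fewer than half pass
];
```

Under this reading, a threshold of 1 means a full pass requires every hidden test to succeed, and anything from half upward earns partial credit.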

Run parameters

temperature: (no value listed)
max_tokens: 2200
timeout (s): 120
type: scored
file: coding-hard-token-bucket.json
