Test fixture
Implementation-focused coding tasks with structured correctness checks.
The model receives the prompt (and an optional system message). The run uses the scorer javascript_function_tests with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; there is no human grading.
Return JSON only with a string field named code. The code must be dependency-free JavaScript, define the requested function in the top level or module.exports, and include no markdown, imports, require, timers, network, filesystem, eval, or placeholders.
Implement function createRateLimiter(limit, windowMs) that returns allow(userId, timestampMs). For this benchmark also support being called as createRateLimiter(limit, windowMs, calls), where calls is an array of [userId,timestampMs], and return the allow results for those calls.
Your JSON must look like {"code":"function createRateLimiter(...) { ... }"}.
{
"function_name": "createRateLimiter",
"timeout_ms": 100,
"test_pass_threshold": 1,
"partial_credit_threshold": 0.5,
"tests": "[hidden executable tests]"
}
temperature: 0
max_tokens: 1600
timeout (s): 120
type: scored
file: coding-medium-rate-limit.json
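
For orientation, here is a minimal sketch of one function that would fit the prompt's two calling conventions. The prompt does not say which windowing rule the hidden tests expect, so everything about the semantics here is an assumption: a sliding window, timestamps that do not decrease per user, a call exactly windowMs old treated as expired, and denied calls not counted against the limit. The names history, kept, results, and submission are illustrative only.

```javascript
// Sketch only. ASSUMPTION: sliding window. A call at time t is allowed when
// fewer than `limit` previously allowed calls by the same user fall in
// (t - windowMs, t]. Denied calls are not recorded. The hidden tests may
// expect different semantics (for example, a fixed window).
function createRateLimiter(limit, windowMs, calls) {
  const history = new Map(); // userId -> timestamps of allowed calls

  function allow(userId, timestampMs) {
    // Drop timestamps that have aged out of the window.
    const kept = (history.get(userId) || []).filter(
      (t) => t > timestampMs - windowMs
    );
    const ok = kept.length < limit;
    if (ok) kept.push(timestampMs);
    history.set(userId, kept);
    return ok;
  }

  // Benchmark convention from the prompt: replay [userId, timestampMs] pairs
  // and return the allow results in order.
  if (Array.isArray(calls)) {
    return calls.map(([userId, timestampMs]) => allow(userId, timestampMs));
  }
  return allow;
}

// Usage under the assumed semantics.
const results = createRateLimiter(2, 1000, [
  ["a", 0],
  ["a", 500],
  ["a", 900],
  ["a", 1000],
  ["b", 900],
]);
console.log(results); // [ true, true, false, true, true ]

// Wrapping the function source in the required {"code": "..."} shape.
const submission = JSON.stringify({ code: createRateLimiter.toString() });
```

The third call for user "a" is denied because two allowed calls are still inside the window; by timestamp 1000 the call at 0 has expired under the assumed boundary rule, so the fourth call is allowed. A fixed-window reading of the prompt would give different results near window boundaries.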