Test fixture
Bug fixes, edge conditions, and minimal patch accuracy.
The model receives the prompt (and optional system message). The run uses scorer rubric_json_metrics with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; no human grading.
Return JSON only with keys diagnosis, fix, tests. A read-through cache invalidates a key after a write, but an older asynchronous read resolves later and stores stale data back into the cache. Identify the out-of-order cache race and fix it.
{
"metrics": {
"repro": {
"checks": [
{
"contains": [
"older read"
]
},
{
"contains": [
"resolves later"
]
},
{
"contains": [
"stale data"
]
},
{
"contains": [
"cache"
]
}
]
},
"hidden": {
"checks": [
{
"contains": [
"version"
]
},
{
"contains": [
"generation"
]
},
{
"contains": [
"compare before storing"
]
}
]
},
"diagnose": {
"checks": [
{
"contains": [
"out-of-order"
]
},
{
"contains": [
"race"
]
},
{
"contains": [
"invalidation"
]
}
]
}
}
}temperature
0
max_tokens
520
timeout (s)
120
type
scored
file
debug-cache-invalidation-race-v2.json