Test fixture
Bug fixes, edge conditions, and minimal patch accuracy.
The model receives the prompt (and an optional system message). The run uses the scorer rubric_json_metrics with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; there is no human grading.
Return JSON only with keys diagnosis, fix, tests. A module-level singleton (a connection pool) is not reset during hot reload; each reload adds a new pool instance without closing the old one, causing growing memory consumption, port exhaustion, and stale state observed by some request handlers. Identify the singleton leak and fix it.
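For orientation, here is a minimal sketch of the failure mode the prompt describes and one common way to address it. It assumes a webpack-style hot-reload hook modeled on `module.hot.dispose`; the `Pool`, `HotApi`, `FakeHot`, and `loadModule` names are hypothetical stand-ins so the example runs without a bundler, and they are not part of the fixture.

```typescript
// Hypothetical stand-in for a real connection pool; it only tracks open/closed state.
class Pool {
  static openCount = 0; // number of pools currently open, for demonstration
  closed = false;

  constructor() {
    Pool.openCount += 1;
  }

  close(): void {
    if (!this.closed) {
      this.closed = true;
      Pool.openCount -= 1;
    }
  }
}

// Minimal shape of the hook this sketch relies on (modeled on module.hot.dispose).
interface HotApi {
  dispose(callback: () => void): void;
}

// Simulated hot-reload runtime: collects dispose handlers and runs them
// before the next version of the module is evaluated.
class FakeHot implements HotApi {
  private handlers: Array<() => void> = [];

  dispose(callback: () => void): void {
    this.handlers.push(callback);
  }

  runDispose(): void {
    const pending = this.handlers;
    this.handlers = [];
    for (const handler of pending) {
      handler();
    }
  }
}

// Stands in for evaluating the module body once per reload.
function loadModule(hot?: HotApi): Pool {
  const pool = new Pool(); // the module-level singleton
  if (hot) {
    // Cleanup: close the outgoing pool so reloads do not accumulate instances.
    hot.dispose(() => pool.close());
  }
  return pool;
}

const hot = new FakeHot();
let current = loadModule(hot);
for (let i = 0; i < 3; i++) {
  hot.runDispose(); // the runtime disposes the outgoing module first
  current = loadModule(hot);
}
console.log(Pool.openCount); // prints 1
```

Without the `dispose` registration, each call to `loadModule` would leave the previous pool open, which is the accumulation the prompt asks the model to diagnose.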
{
  "metrics": {
    "repro": {
      "checks": [
        { "contains": ["hot reload"] },
        { "contains": ["singleton"] },
        { "contains": ["stale"] },
        { "contains": ["accumulate"] }
      ]
    },
    "hidden": {
      "checks": [
        { "contains": ["dispose"] },
        { "contains": ["module.hot"] },
        { "contains": ["cleanup"] }
      ]
    },
    "diagnose": {
      "checks": [
        { "contains": ["module-level singleton"] },
        { "contains": ["not reset on reload"] },
        { "contains": ["hot module replacement"] }
      ]
    }
  }
}
temperature
0
max_tokens
420
timeout (s)
120
type
scored
file
debug-hot-reload-module-v2.json
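The scorer configuration in this fixture groups `contains` checks under named metrics. The fixture does not document how rubric_json_metrics evaluates them, so the following is only a sketch under two stated assumptions: a check passes when every listed phrase occurs in the output as a case-insensitive substring, and a metric's score is the fraction of its checks that pass. The type names and the `checkPasses` and `scoreOutput` functions are hypothetical.

```typescript
// Hypothetical types mirroring the shape of the fixture's JSON configuration.
interface Check {
  contains: string[];
}
interface MetricConfig {
  checks: Check[];
}
interface ScorerConfig {
  metrics: Record<string, MetricConfig>;
}

// Assumption: a check passes when every phrase appears in the output, ignoring case.
function checkPasses(output: string, check: Check): boolean {
  const haystack = output.toLowerCase();
  return check.contains.every(
    (phrase) => haystack.indexOf(phrase.toLowerCase()) !== -1
  );
}

// Assumption: each metric's score is the fraction of its checks that pass.
function scoreOutput(output: string, config: ScorerConfig): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const metricName of Object.keys(config.metrics)) {
    const checks = config.metrics[metricName].checks;
    const passed = checks.filter((check) => checkPasses(output, check)).length;
    scores[metricName] = checks.length === 0 ? 0 : passed / checks.length;
  }
  return scores;
}

// The "hidden" metric copied from the configuration above.
const config: ScorerConfig = {
  metrics: {
    hidden: {
      checks: [
        { contains: ["dispose"] },
        { contains: ["module.hot"] },
        { contains: ["cleanup"] },
      ],
    },
  },
};

// An illustrative model output with the three required keys.
const sampleOutput = JSON.stringify({
  diagnosis: "The module-level singleton is not reset on reload.",
  fix: "Register module.hot.dispose to close the old pool.",
  tests: "Reload twice and assert one open pool.",
});

const scores = scoreOutput(sampleOutput, config);
console.log(scores);
```

Under these assumptions the sample output matches `dispose` and `module.hot` but not `cleanup`, so it would earn two of the three `hidden` checks. The real scorer may weight or match checks differently.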