Test fixture
Bug fixes, edge conditions, and minimal patch accuracy.
The model receives the prompt (and an optional system message). The run uses the rubric_json_metrics scorer with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; there is no human grading.
Return JSON only with keys diagnosis, fix, tests. A websocket client starts a heartbeat timer on every reconnect attempt but never clears the previous timer, so reconnect storms create duplicate pings and stale sockets stay active. Identify the timer lifecycle bug and fix it.
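The fixture gives no client code, so the following is only a minimal sketch of the timer lifecycle fix the prompt asks for. It assumes a hypothetical PingSocket interface (anything with send and close, which a browser or Node WebSocket satisfies), a hypothetical connect factory, and a hypothetical reconnect method that runs on every reconnect attempt. The fix is to run cleanup first: call clearInterval on the previous timer and close the stale socket before opening a new one, so a reconnect storm still leaves a single active timer.

```typescript
// Hypothetical minimal socket surface; a browser or Node WebSocket fits it.
interface PingSocket {
  send(data: string): void;
  close(): void;
}

// Hypothetical factory that opens a new connection.
type Connect = () => PingSocket;

class ReconnectingClient {
  private socket: PingSocket | null = null;
  private heartbeat: ReturnType<typeof setInterval> | null = null;
  private readonly connect: Connect;
  private readonly heartbeatMs: number;

  constructor(connect: Connect, heartbeatMs = 30_000) {
    this.connect = connect;
    this.heartbeatMs = heartbeatMs;
  }

  // Runs on the initial connect and on every reconnect attempt.
  reconnect(): void {
    // Cleanup first: without this, each attempt adds another interval
    // (duplicate pings) and the old socket is never closed (stale sockets).
    this.cleanup();
    const socket = this.connect();
    this.socket = socket;
    this.heartbeat = setInterval(() => socket.send("ping"), this.heartbeatMs);
  }

  // Stops the previous timer and closes the previous socket, if any.
  cleanup(): void {
    if (this.heartbeat !== null) {
      clearInterval(this.heartbeat);
      this.heartbeat = null;
    }
    if (this.socket !== null) {
      this.socket.close();
      this.socket = null;
    }
  }

  hasActiveTimer(): boolean {
    return this.heartbeat !== null;
  }
}
```

Keeping the timer handle on the instance is what makes the lifecycle explicit: a local handle inside reconnect could never be cleared by the next attempt, which is exactly the leak described in the prompt.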
{
"metrics": {
"repro": {
"checks": [
{
"contains": [
"heartbeat timer"
]
},
{
"contains": [
"reconnect"
]
},
{
"contains": [
"duplicate pings"
]
},
{
"contains": [
"stale sockets"
]
}
]
},
"hidden": {
"checks": [
{
"contains": [
"clearInterval"
]
},
{
"contains": [
"cleanup"
]
},
{
"contains": [
"single active timer"
]
}
]
},
"diagnose": {
"checks": [
{
"contains": [
"reconnect storm"
]
},
{
"contains": [
"timer lifecycle"
]
},
{
"contains": [
"previous timer"
]
}
]
}
}
}
temperature: 0
max_tokens: 520
timeout (s): 120
type: scored
file: debug-websocket-reconnect-storm-v2.json
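The fixture does not state how rubric_json_metrics evaluates a contains check. The sketch below is a hypothetical reimplementation under three assumptions: each string is matched as a case-sensitive substring of the model output, a check passes only when all of its strings occur, and a metric's partial credit is the fraction of its checks that pass. The names scoreOutput, Check, Metric, and RubricConfig are illustrative, not the scorer's real API.

```typescript
// Shapes mirroring the fixture's JSON configuration.
interface Check {
  contains: string[];
}
interface Metric {
  checks: Check[];
}
interface RubricConfig {
  metrics: Record<string, Metric>;
}

// Assumed semantics: a check passes when every string in `contains`
// occurs verbatim (case-sensitive) in the output; a metric's score is
// the fraction of its checks that pass.
function scoreOutput(output: string, config: RubricConfig): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const [name, metric] of Object.entries(config.metrics)) {
    const passed = metric.checks.filter((check) =>
      check.contains.every((needle) => output.includes(needle)),
    ).length;
    scores[name] = metric.checks.length === 0 ? 0 : passed / metric.checks.length;
  }
  return scores;
}
```

Under these assumptions an answer earns credit only for the literal phrases, so a correct fix that never says "single active timer" or "previous timer" would still lose those checks.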