Test fixture
Code transformation while preserving behavior and intent.
The model receives the prompt (and optional system message). The run uses scorer rubric_json_metrics with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; no human grading.
Return JSON only with keys plan, refactor, tests. Two API endpoints manually map database rows to response DTOs with overlapping field renames. Refactor the mapper and keep endpoint output stable.
{
"metrics": {
"intent": {
"checks": [
{
"contains": [
"response DTO"
]
},
{
"contains": [
"mapper"
]
},
{
"contains": [
"field renames"
]
}
]
},
"visible": {
"checks": [
{
"contains": [
"database rows"
]
},
{
"contains": [
"two endpoints"
]
},
{
"contains": [
"overlap"
]
}
]
},
"hidden": {
"checks": [
{
"contains": [
"snapshot tests"
]
},
{
"contains": [
"stable output"
]
},
{
"contains": [
"null fields"
]
}
]
}
}
}temperature
0
max_tokens
500
timeout (s)
120
type
scored
file
refactor-response-mapper.json