Test fixture
Code transformation while preserving behavior and intent.
The model receives the prompt (and an optional system message). The run uses the rubric_json_metrics scorer with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; there is no human grading.
Return JSON only with keys plan, refactor, tests. Authentication policy, session lookup, and audit logging are tangled in one service. Refactor boundaries so policy decisions are testable and side effects remain explicit.
{
"metrics": {
"intent": {
"checks": [
{
"contains": [
"policy decisions"
]
},
{
"contains": [
"service boundaries"
]
},
{
"contains": [
"testable"
]
}
]
},
"visible": {
"checks": [
{
"contains": [
"session lookup"
]
},
{
"contains": [
"audit logging"
]
},
{
"contains": [
"authentication"
]
}
]
},
"hidden": {
"checks": [
{
"contains": [
"side effects explicit"
]
},
{
"contains": [
"contract tests"
]
},
{
"contains": [
"same security behavior"
]
}
]
}
}
}
temperature: 0
max_tokens: 560
timeout (s): 120
type: scored
file: refactor-auth-policy-boundaries.json
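
As a rough illustration of how a configuration like this could be evaluated, the sketch below treats each "contains" entry as a substring check and reports, per metric, the fraction of checks that pass. The real rubric_json_metrics scorer is not shown in this fixture, so the function name score_output, the case-insensitive matching, and the per-metric fraction are all assumptions made for the example.

```python
import json


def score_output(output: str, config: dict) -> dict:
    """Hypothetical scorer: per metric, the fraction of checks that pass.

    A check passes when every phrase in its "contains" list occurs in the
    output. Case-insensitive matching is an assumption, not documented
    behavior of rubric_json_metrics.
    """
    text = output.lower()
    scores = {}
    for name, metric in config["metrics"].items():
        checks = metric["checks"]
        passed = sum(
            all(phrase.lower() in text for phrase in check["contains"])
            for check in checks
        )
        scores[name] = passed / len(checks) if checks else 0.0
    return scores


# Only the "visible" metric from the fixture, to keep the example short.
config = json.loads("""
{"metrics": {
  "visible": {"checks": [
    {"contains": ["session lookup"]},
    {"contains": ["audit logging"]},
    {"contains": ["authentication"]}
  ]}
}}
""")

# Illustrative model output, not taken from a real run.
sample = json.dumps({
    "plan": "Split authentication policy from session lookup.",
    "refactor": "",
    "tests": "",
})

print(score_output(sample, config))
```

In this sample, "authentication" and "session lookup" appear but "audit logging" does not, so the visible metric comes out at two of three checks. Under a literal substring reading, a response that echoes the prompt's wording "side effects remain explicit" would not satisfy the hidden check "side effects explicit"; whether the real scorer is that strict is not stated in the fixture.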