Test fixture
Latency-sensitive tasks where concise correct output matters.
The model receives the prompt (and optional system message). The run uses scorer contains_all with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; no human grading.
Write a long reflective essay in English on how software teams balance shipping speed with quality and reliability. Use exactly these Markdown headings in order, each followed by several substantial paragraphs: ## Introduction ## Historical context ## Trade-offs ## Practical recommendations ## Conclusion Aim for breadth and depth of prose (not bullet lists). Continue writing until you have fully developed each section.
{
"expected_contains": [
"## Introduction",
"## Trade-offs",
"## Conclusion",
"quality",
"team"
]
}temperature
0.3
max_tokens
2048
timeout (s)
300
type
throughput
file
speed_perf_throughput_essay.json