Test fixture

Summary Customer Support

Speed v1 · Nutrition · medium · scorer: contains_all

Throughput- and TTFT-focused (time to first token) generation tasks.

How it is scored

The model receives the prompt (and an optional system message). The run uses the contains_all scorer with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer's check of the model output; there is no human grading.

User prompt

Summarize in exactly 3 short bullet points:
Support performance improved after introducing intent-based ticket routing, response templates for common issues, and escalation paths for payment and security incidents. Weekly reviews of first-response time and resolution time helped the team prioritize process bottlenecks.

Scorer config

{
  "expected_contains": [
    "ticket routing",
    "escalation",
    "first-response time"
  ]
}
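The contains_all check can be sketched in a few lines of Python. This is a minimal illustration, not the harness's actual implementation: the function name contains_all_score, the case-insensitive substring matching, and the "fraction of phrases found" partial score are all assumptions, since the page does not document the scorer's normalization or partial-credit rules.

```python
import json

# Scorer config copied verbatim from this fixture.
CONFIG = json.loads("""
{
  "expected_contains": [
    "ticket routing",
    "escalation",
    "first-response time"
  ]
}
""")


def contains_all_score(output: str, expected: list[str], case_sensitive: bool = False) -> dict:
    """Hypothetical contains_all scorer.

    Assumptions (not confirmed by the fixture): plain substring matching,
    case-insensitive by default, partial credit = fraction of phrases found.
    """
    haystack = output if case_sensitive else output.lower()
    found, missing = [], []
    for phrase in expected:
        needle = phrase if case_sensitive else phrase.lower()
        # Each expected phrase must appear somewhere in the output, verbatim.
        (found if needle in haystack else missing).append(phrase)
    score = len(found) / len(expected) if expected else 1.0
    return {"passed": not missing, "score": score, "missing": missing}


# A summary that reuses the source wording: every phrase is present.
verbatim = (
    "- Intent-based ticket routing and response templates sped up handling.\n"
    "- Payment and security incidents got dedicated escalation paths.\n"
    "- Weekly reviews of first-response time and resolution time exposed bottlenecks.\n"
)

# A paraphrased summary: only "escalation" survives as an exact substring.
paraphrase = (
    "- Tickets are routed by intent.\n"
    "- Escalation paths cover payment and security.\n"
    "- Weekly reviews track response times.\n"
)

print(contains_all_score(verbatim, CONFIG["expected_contains"]))
print(contains_all_score(paraphrase, CONFIG["expected_contains"]))
```

Under these assumptions the paraphrased summary fails even though it is faithful to the source, because exact-substring scoring rewards reusing the key phrases ("ticket routing", "first-response time") rather than rewording them. With max_tokens capped at 110, a passing answer has to fit all three phrases into three short bullets.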

Run parameters

temperature:
max_tokens: 110
timeout (s): 120
type: scored
file: speed_medium_04.json
