Test fixture
Roblox OpenGameEval tasks. Special category: visible in breakdowns, excluded from Overall.
The model receives the prompt (and an optional system message). The run uses the scorer roblox_open_eval with the JSON configuration below. Pass/fail and partial credit are determined entirely by that scorer against the model output; there is no human grading.
[Roblox OpenGameEval] I want 6 items of different rarities to spawn in the middle of the map; each item must have a unique animal shape.
Each rarity has a spawn chance and a random income between two values:
Common: 60%, $1-5
Rare: 20%, $6-50
Very Rare: 11%, $100-1,000
Mythic: 7%, $1,000-10,000
Legendary: 1%, $10,000-50,000
Ultra: 0.9%, $50,000-100,000
God: 0.09%, $100,000-300,000
Limited: 0.01%, $300,000-500,000
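A minimal sketch of the weighted roll described above, written in plain Lua so it also runs in Luau outside Studio. The table RARITIES and the functions rarityForRoll and rollItem are hypothetical names. The weights are the spawn chances scaled to hundredths of a percent, so they are integers that sum to 10,000, and incomes are assumed to be whole dollars.

```lua
-- Hypothetical rarity table built from the spawn chances and income ranges.
-- Weights are spawn chances in hundredths of a percent (60% -> 6000,
-- 0.01% -> 1), so they are integers that sum to 10000.
local RARITIES = {
	{ name = "Common",    weight = 6000, minIncome = 1,      maxIncome = 5 },
	{ name = "Rare",      weight = 2000, minIncome = 6,      maxIncome = 50 },
	{ name = "Very Rare", weight = 1100, minIncome = 100,    maxIncome = 1000 },
	{ name = "Mythic",    weight = 700,  minIncome = 1000,   maxIncome = 10000 },
	{ name = "Legendary", weight = 100,  minIncome = 10000,  maxIncome = 50000 },
	{ name = "Ultra",     weight = 90,   minIncome = 50000,  maxIncome = 100000 },
	{ name = "God",       weight = 9,    minIncome = 100000, maxIncome = 300000 },
	{ name = "Limited",   weight = 1,    minIncome = 300000, maxIncome = 500000 },
}

-- Sum of all weights (10000 for the table above).
local TOTAL_WEIGHT = 0
for _, r in ipairs(RARITIES) do
	TOTAL_WEIGHT = TOTAL_WEIGHT + r.weight
end

-- Map a roll in [1, TOTAL_WEIGHT] to a rarity by walking the cumulative weights.
local function rarityForRoll(roll)
	local cumulative = 0
	for _, r in ipairs(RARITIES) do
		cumulative = cumulative + r.weight
		if roll <= cumulative then
			return r
		end
	end
	error("roll out of range: " .. tostring(roll))
end

-- Pick a rarity according to the spawn chances, then a whole-dollar income
-- inside that rarity's range (assumption: incomes are integers).
local function rollItem()
	local rarity = rarityForRoll(math.random(1, TOTAL_WEIGHT))
	local income = math.random(rarity.minIncome, rarity.maxIncome)
	return rarity.name, income
end
```

Calling rollItem() returns a rarity name and an income, for example "Common" with an income between 1 and 5. Integer weights avoid floating-point drift when the smallest chance is 0.01%.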
The 6 items spawn on the map and must slide along the already-installed conveyor belt from their spawn point to their platform:
{spawn=Vector3.new(0,0,0), platform=Vector3.new(-56,0,0)}
{spawn=Vector3.new(0,0,0), platform=Vector3.new(-28,0,-48.497)}
{spawn=Vector3.new(0,0,0), platform=Vector3.new(28,0,-48.497)}
{spawn=Vector3.new(0,0,0), platform=Vector3.new(56,0,0)}
{spawn=Vector3.new(0,0,0), platform=Vector3.new(28,0,48.497)}
{spawn=Vector3.new(0,0,0), platform=Vector3.new(-28,0,48.497)}
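The six platform targets listed above are the vertices of a regular hexagon around the spawn point: each lies 56 studs from (0, 0, 0), 60 degrees apart, which is where 48.497 comes from (56 * sin(60°) ≈ 48.497). The sketch below regenerates them in the same order. platformPositions is a hypothetical helper, and plain {x, y, z} tables stand in for Vector3 so the snippet runs outside Roblox.

```lua
-- Radius of the platform ring, in studs, read off the listed coordinates.
local RADIUS = 56

-- Return the six platform positions as the vertices of a regular hexagon
-- centred on the spawn point, starting at -X and stepping 60 degrees.
local function platformPositions()
	local positions = {}
	for i = 0, 5 do
		local angle = math.rad(180 + 60 * i)
		positions[#positions + 1] = {
			x = RADIUS * math.cos(angle),
			y = 0,
			z = RADIUS * math.sin(angle),
		}
	end
	return positions
end
```

Inside a Roblox script, each entry p could be converted with Vector3.new(p.x, p.y, p.z). Generating the ring instead of hard-coding it keeps the spacing consistent if the radius changes.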
Each item slides for 10 seconds, then remains stationary for 30 seconds before disappearing.
One second later, 6 new items spawn at random (respecting the spawn rates), drawn from the 3 items defined for each rarity.
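One way to read the timing rules is as a repeating 41-second cycle: 10 s sliding, 30 s stationary, then a 1 s gap before the next batch spawns. This is a hypothetical sketch: phaseAt and the constant names are illustrative, and it assumes the first batch spawns at elapsed = 0.

```lua
-- Timing constants from the prompt, in seconds.
local SLIDE_TIME = 10   -- spawn point -> platform
local IDLE_TIME = 30    -- stationary on the platform
local RESPAWN_DELAY = 1 -- gap after removal before the next batch of 6
local CYCLE_TIME = SLIDE_TIME + IDLE_TIME + RESPAWN_DELAY -- 41 s per batch

-- Return the phase of the current batch for a given elapsed time and,
-- when the item is on the belt or platform, how far along the belt it is
-- (0 at the spawn point, 1 at the platform).
local function phaseAt(elapsed)
	local t = elapsed % CYCLE_TIME
	if t < SLIDE_TIME then
		return "sliding", t / SLIDE_TIME
	elseif t < SLIDE_TIME + IDLE_TIME then
		return "stationary", 1
	else
		return "gone", nil
	end
end
```

Multiplying the returned fraction into the spawn-to-platform offset gives a linear slide; an actual implementation might instead use a tween over SLIDE_TIME seconds.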
The name and rarity label must follow this billboard setup:

    -- Billboard (name + rarity):
    -- Assumes main (the item's part), name, and rarity are defined by the surrounding spawn code.
    local billboard = Instance.new("BillboardGui")
    billboard.Size = UDim2.new(0, 200, 0, 50)
    billboard.StudsOffset = Vector3.new(0, 3, 0)
    billboard.AlwaysOnTop = true
    billboard.Parent = main
    local textLabel = Instance.new("TextLabel")
    textLabel.Size = UDim2.new(1, 0, 1, 0)
    textLabel.BackgroundTransparency = 1
    textLabel.Text = name .. " [" .. rarity .. "]"
    textLabel.TextScaled = true
    textLabel.TextColor3 = Color3.new(1, 1, 1)
    textLabel.Parent = billboard
Each animal must have a unique cubic Minecraft-style shape that reflects its true form. Here are the 24 animals:
Common (60%): Cow, Sheep, Chicken
Rare (20%): Wolf, Pig, Horse
Very Rare (11%): Tiger, Bear, Elephant
Mythic (7%): Unicorn, Phoenix, Griffin
Legendary (1%): Dragon, Minotaur, Hydra
Ultra (0.9%): Titan, Celestial, Leviathan
God (0.09%): GodSword, DivineShield, HolyOrb
Limited (0.01%): Excalibur, LegendaryGem, DivineStaff
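The catalog above can be kept as a lookup from rarity to its three item names, so that once a rarity has been rolled, the item is chosen among that rarity's entries. CATALOG and pickAnimal are hypothetical names, and choosing uniformly among the three is an assumption; the prompt does not say how to pick within a rarity.

```lua
-- Hypothetical lookup from rarity name to its three item names.
local CATALOG = {
	["Common"]    = { "Cow", "Sheep", "Chicken" },
	["Rare"]      = { "Wolf", "Pig", "Horse" },
	["Very Rare"] = { "Tiger", "Bear", "Elephant" },
	["Mythic"]    = { "Unicorn", "Phoenix", "Griffin" },
	["Legendary"] = { "Dragon", "Minotaur", "Hydra" },
	["Ultra"]     = { "Titan", "Celestial", "Leviathan" },
	["God"]       = { "GodSword", "DivineShield", "HolyOrb" },
	["Limited"]   = { "Excalibur", "LegendaryGem", "DivineStaff" },
}

-- Pick one item name for an already-rolled rarity, uniformly among its entries.
-- Raises an error if the rarity name is not in the catalog.
local function pickAnimal(rarityName)
	local names = CATALOG[rarityName]
	assert(names, "unknown rarity: " .. tostring(rarityName))
	return names[math.random(1, #names)]
end
```

Keeping the item names in data rather than in branching code makes it easy to check that all 24 names are distinct, which the uniqueness requirement depends on.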
Create designs that reflect the item names. The items must float above the conveyor and be the size of a Roblox player's head.
{
"input_script": "[embedded in CLI runner]",
"upstream_path": "Evals/122_animal_item_with_rarity.lua",
"upstream_sha256": "4f4d78ea9f3925532346e3a5f440c0cac2abca5a5e5713a4db02eebf11fd0ff3",
"scenario_name": "122_animal_item_with_rarity",
"place": "baseplate.rbxl",
"eval_kind": "codegen"
}
temperature: 0
max_tokens: 1
timeout (s): 900
type: scored
file: 122_animal_item_with_rarity.json