Model detail
Deepseek V4 Pro
deepseek/deepseek-v4-pro
Overall score: 3.6% · Overall rank: 6/6 · Benchmark runs: 4
Pass rate: 3.6%
Tests passed: 1/28
Avg latency: 22.21 s
TTFT (mean): 2209 ms
Decode (mean): 94.2 tok/s
Leading category: Reasoning
Est. cost: $0.02
Tokens (total): 535 prompt / 6.6k completion
Score over runs
Overall score (%) from merged run_models rows, in chronological order. Only runs that include this model appear as points.
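One way to build this series is to keep only the rows for the model and then sort them by run start time. The following is a minimal Python sketch. The `RunModelRow` record, its fields, and the sample rows are hypothetical. The page only states that the rows are merged from run_models and plotted chronologically.

```python
from dataclasses import dataclass


@dataclass
class RunModelRow:
    """Hypothetical shape of one merged run_models row (field names assumed)."""
    run_id: str
    started_at: str  # ISO-8601 UTC timestamp, so string order is chronological
    model: str
    score: float  # overall score in percent


def score_series(rows: list[RunModelRow], model: str) -> list[tuple[str, float]]:
    """Return (run_id, score) points for `model`, oldest run first.

    Runs that do not include the model contribute no point at all.
    """
    own = [r for r in rows if r.model == model]
    own.sort(key=lambda r: r.started_at)
    return [(r.run_id, r.score) for r in own]


# Made-up rows for illustration only.
rows = [
    RunModelRow("run_2", "2024-05-02T09:00:00Z", "model-a", 5.0),
    RunModelRow("run_1", "2024-05-01T09:00:00Z", "model-a", 2.0),
    RunModelRow("run_1", "2024-05-01T09:00:00Z", "model-b", 9.0),
]
print(score_series(rows, "model-a"))  # [('run_1', 2.0), ('run_2', 5.0)]
```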
Category performance
Score (%, left axis) vs. mean latency per category (seconds, right axis). With typical binary (0/1) scorers, pass rate tracks score. Both values appear in the tooltip and in the breakdown table.
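The relationship between score and pass rate can be sketched in Python under two assumptions: a category score is the mean per-test score scaled to percent, and a "pass" means a per-test score of exactly 1. Both helper names are hypothetical. With 0/1 outcomes the two numbers coincide. For example, the Reasoning row (1 of 4 passed) gives 25.0 for both. A partial-credit scorer would make them diverge.

```python
def category_score(scores: list[float]) -> float:
    """Mean per-test score as a percentage (assumed definition of "Score")."""
    if not scores:
        raise ValueError("category has no tests")
    return 100.0 * sum(scores) / len(scores)


def pass_rate(scores: list[float]) -> float:
    """Percentage of tests scoring exactly 1 (assumed definition of "Pass")."""
    if not scores:
        raise ValueError("category has no tests")
    return 100.0 * sum(1 for s in scores if s == 1) / len(scores)


# Binary scorer: matches the Reasoning row (1 of 4 tests passed).
binary = [1, 0, 0, 0]
print(category_score(binary), pass_rate(binary))  # 25.0 25.0

# Hypothetical partial-credit scorer: the two metrics no longer match.
partial = [1, 0.5, 0, 0]
print(category_score(partial), pass_rate(partial))  # 37.5 25.0
```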
Tests per category
Number of merged result rows from local run reports (coverage in your dataset, not total fixtures).
Difficulty levels
Speed profile by category
Normalized to 0–100 within this model. For TTFT, a shorter time gives a longer spoke. For decode throughput (tok/s), a higher rate gives a longer spoke. Values come from streamed BLXBench runs merged into overall_ranking.json.
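The page does not say which normalization is used. The sketch below assumes min-max scaling of each metric across this model's categories, with the scale flipped for TTFT so that lower values get longer spokes. The function name `normalize` and the tie rule are illustrative choices. The tie rule gives every spoke 100 when all categories have the same value. The decode example uses the tok/s column from the breakdown table. The TTFT example uses made-up values, because per-category TTFT is not listed.

```python
def normalize(values: dict[str, float], higher_is_better: bool) -> dict[str, float]:
    """Min-max scale one metric to 0-100 across a single model's categories.

    When higher_is_better is False (e.g. TTFT), the scale is flipped so the
    smallest value gets the longest spoke (100).
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumed tie rule: identical values all get a full-length spoke.
        return {k: 100.0 for k in values}
    out = {}
    for k, v in values.items():
        frac = (v - lo) / (hi - lo)
        out[k] = 100.0 * (frac if higher_is_better else 1.0 - frac)
    return out


# Decode tok/s per category, from the breakdown table.
decode = normalize(
    {"Coding UI": 44.7, "Debugging": 28.6, "Hallucination": 7.6,
     "Reasoning": 475.0, "Refactoring": 39.6, "Security": 36.0, "Speed": 28.1},
    higher_is_better=True,
)
print(decode["Reasoning"], decode["Hallucination"])  # 100.0 0.0

# Hypothetical TTFT values in ms: shorter is better, so the scale is inverted.
ttft = normalize({"cat_a": 200.0, "cat_b": 1000.0, "cat_c": 600.0}, higher_is_better=False)
print(ttft)  # {'cat_a': 100.0, 'cat_b': 0.0, 'cat_c': 50.0}
```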
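Category | Rank | Pass | Score (%) | Latency (mean) | Decode (tok/s) | Cost (USD)
Coding UI | 5/6 | 0/4 | 0.0 | 33.57 s | 44.7 | $0.02
Debugging | 2/6 | 0/4 | 0.0 | 0.87 s | 28.6 | $0.0004
Hallucination | 4/6 | 0/4 | 0.0 | 0.66 s | 7.6 | $0.0002
Reasoning | 5/6 | 1/4 | 25.0 | 0.58 s | 475.0 | $0.0002
Refactoring | 2/6 | 0/4 | 0.0 | 1.64 s | 39.6 | $0.0011
Security | 4/6 | 0/4 | 0.0 | 0.83 s | 36.0 | $0.0005
Speed | 5/6 | 0/4 | 0.0 | 0.71 s | 28.1 | $0.0003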
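The headline figures at the top of the page can be cross-checked against this table. The following Python sketch assumes two things. First, the overall pass rate pools all tests; with 4 tests in every category, this equals the mean of the category scores. Second, the per-category costs are the rounded values displayed here. The helper name `headline` is hypothetical.

```python
# Per-category (passed, tests, est. cost in USD), copied from the table above.
ROWS = {
    "Coding UI": (0, 4, 0.02),
    "Debugging": (0, 4, 0.0004),
    "Hallucination": (0, 4, 0.0002),
    "Reasoning": (1, 4, 0.0002),
    "Refactoring": (0, 4, 0.0011),
    "Security": (0, 4, 0.0005),
    "Speed": (0, 4, 0.0003),
}


def headline(rows):
    """Pool per-category rows into (passed, tests, pass rate %, total cost)."""
    passed = sum(p for p, _, _ in rows.values())
    tests = sum(t for _, t, _ in rows.values())
    cost = sum(c for _, _, c in rows.values())
    return passed, tests, 100.0 * passed / tests, cost


passed, tests, rate, cost = headline(ROWS)
print(f"Tests passed: {passed}/{tests}")  # Tests passed: 1/28
print(f"Pass rate: {rate:.1f}%")  # Pass rate: 3.6%
print(f"Est. cost: ${cost:.2f}")  # Est. cost: $0.02
```

The cost sum comes to about $0.0227, which rounds to the $0.02 shown as the estimated cost. Nearly all of it comes from the Coding UI category.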
Cost by category
Sum of estimated API costs (USD) per benchmark domain for this model.
Run | Run Σ | This model
run_c83c0d7 | $0.00 | $0.00
run_7c82877 | $0.00 | $0.00
run_acaf287 | $0.00 | $0.00
run_fa781e7 | $0.02 | $0.02