Model detail
Deepseek V4 Flash
deepseek/deepseek-v4-flash
D | Overall score: 28.6 | Overall rank: 4/6 | Benchmark runs: 1
Score: 28.6
Pass rate: 28.6%
Tests: 2/7
Runs: 1
Avg latency: 17.75s
TTFT (Ø): 8463 ms
Decode (Ø): 563.2 tok/s
Leading categories: Reasoning
Est. cost: $0.0019
Tokens (Σ): 535 prompt / 6.6k completion
Category performance
Score % (left axis) vs mean latency per category (seconds, right axis). With typical 0/1 scorers, pass rate tracks score; both are shown in the tooltip and breakdown table.
Tests per category
Number of merged result rows from local run reports (coverage in your dataset, not total fixtures).
Difficulty levels
Speed profile by category
Normalized 0–100 within this model: TTFT (shorter → higher spoke) and decode tok/s (higher → higher spoke). Values come from streamed BLXBench runs merged into overall_ranking.json.
Category | Rank | Pass | Score | Latency | tok/s | Cost
Coding Ui | 6/6 | 0/1 | 0.0 | 83.06s | 72.2 | $0.0017
Debugging | 5/6 | 0/1 | 0.0 | 2.12s | 47.1 | $0.0000
Hallucination | 6/6 | 0/1 | 0.0 | 10.91s | 1.8 | $0.0000
Reasoning | 4/6 | 1/1 | 100.0 | 11.01s | 3029.6 | $0.0000
Refactoring | 6/6 | 0/1 | 0.0 | 13.44s | 415.5 | $0.0001
Security | 5/6 | 0/1 | 0.0 | 1.95s | 61.5 | $0.0000
Speed | 2/6 | 1/1 | 100.0 | 1.78s | 314.7 | $0.0000
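The summary figures at the top of the page can be cross-checked against the category rows. The page does not state how the aggregates are computed; the sketch below assumes an unweighted mean over the seven categories, which reproduces the displayed score, pass rate, average latency, and average decode speed. The tuple layout and variable names are illustrative only.

```python
# Rows copied from the category table: (category, passed, score, latency_s, tok_s).
rows = [
    ("Coding Ui",     0,   0.0, 83.06,   72.2),
    ("Debugging",     0,   0.0,  2.12,   47.1),
    ("Hallucination", 0,   0.0, 10.91,    1.8),
    ("Reasoning",     1, 100.0, 11.01, 3029.6),
    ("Refactoring",   0,   0.0, 13.44,  415.5),
    ("Security",      0,   0.0,  1.95,   61.5),
    ("Speed",         1, 100.0,  1.78,  314.7),
]

n = len(rows)

# Assumption: each aggregate is a plain (unweighted) mean across categories.
overall_score = round(sum(r[2] for r in rows) / n, 1)
pass_rate = round(100 * sum(r[1] for r in rows) / n, 1)
avg_latency = round(sum(r[3] for r in rows) / n, 2)
avg_decode = round(sum(r[4] for r in rows) / n, 1)

print(f"Score        {overall_score}")
print(f"Pass rate    {pass_rate}%")
print(f"Avg latency  {avg_latency}s")
print(f"Decode (avg) {avg_decode} tok/s")
```

The Coding Ui row alone (83.06s) accounts for about two thirds of the summed latency, so the 17.75s average says more about that one test than about the model's typical response time.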
Cost by category
Sum of estimated API costs (USD) per benchmark domain for this model.
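The per-category costs shown on this page add up to $0.0018, while the displayed total is $0.0019. A minimal sketch, assuming every displayed figure is rounded to four decimal places, checks that the gap is within what rounding alone can explain. The variable names are illustrative, and the rounding assumption is not confirmed by the page.

```python
from decimal import Decimal

# Per-category costs as displayed (USD), in table order.
category_costs = [
    Decimal(c)
    for c in ("0.0017", "0.0000", "0.0000", "0.0000", "0.0001", "0.0000", "0.0000")
]
displayed_total = Decimal("0.0019")

summed = sum(category_costs)

# Assumption: each displayed value is rounded to 4 decimals, so each one
# (seven categories plus the total) can be off by up to half a unit.
half_unit = Decimal("0.00005")
max_gap = half_unit * (len(category_costs) + 1)

gap = abs(displayed_total - summed)
consistent = gap <= max_gap

print(f"sum of categories: ${summed}")
print(f"displayed total:   ${displayed_total}")
print(f"gap ${gap} within rounding bound ${max_gap}: {consistent}")
```

`Decimal` is used instead of floats so that the four-decimal figures are compared exactly.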
Run | By | Tests | Run Σ | This model
run_be5c427 | | | $0.0019 | $0.0019