Sarvam AI Benchmark Dashboard

Web versions of the Sarvam AI benchmark reports, grouped into reasoning, systems, and coding tracks. The home page shows summary metrics only; detailed listings are on the Reports page.

Total Benchmarks

Overall Average

7.81 / 10

Top Benchmark

Python Coding (9.03)

Lowest Benchmark

Multi-Step Logic (5.80)


Browse by Track

Reasoning

Logic, contradiction detection, long-context, and policy-layer reasoning.

Average 7.89 across 7 reports

Systems

Distributed architecture, consensus depth, scaling math, and concurrency tradeoffs.

Average 6.00 across 1 report

Coding

Algorithmic coding quality in Python and cross-language concurrency stress.

Average 8.46 across 2 reports
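The headline figures can be cross-checked against the track summaries. The sketch below assumes the overall average is the mean over all reports, so each track average is weighted by its report count; it also assumes the two coding reports are Python Coding (9.03) and Cross-Language Coding (7.90). The names `TRACKS`, `weighted_mean`, and `unweighted_mean` are hypothetical and exist only for this illustration.

```python
# Hypothetical consistency check for the dashboard's summary numbers.
# Assumption: the overall average is a mean over all individual reports,
# so each track average is weighted by how many reports it contains.
TRACKS = {
    "Reasoning": (7.89, 7),  # (track average, number of reports)
    "Systems": (6.00, 1),
    "Coding": (8.46, 2),
}


def weighted_mean(tracks: dict[str, tuple[float, int]]) -> float:
    """Mean over all reports, rebuilt from the (rounded) track averages."""
    total_score = sum(avg * n for avg, n in tracks.values())
    total_reports = sum(n for _, n in tracks.values())
    return total_score / total_reports


def unweighted_mean(tracks: dict[str, tuple[float, int]]) -> float:
    """Plain mean of the track averages, ignoring report counts."""
    return sum(avg for avg, _ in tracks.values()) / len(tracks)


if __name__ == "__main__":
    total_reports = sum(n for _, n in TRACKS.values())
    print(f"reports: {total_reports}")
    print(f"report-weighted mean: {weighted_mean(TRACKS):.3f}")
    print(f"unweighted mean of tracks: {unweighted_mean(TRACKS):.3f}")
    # Assumed coding-track members: Python Coding and Cross-Language Coding.
    coding_runs = [9.03, 7.90]
    print(f"coding track mean: {sum(coding_runs) / len(coding_runs):.3f}")
```

Weighting by report count gives about 7.815 across 10 reports, which agrees with the 7.81 headline to within the rounding of the track averages. A plain mean of the three track averages would give 7.45, so the weighting matters when one track (Systems) has a single report. The two assumed coding runs average about 8.465, consistent with the 8.46 shown for the Coding track.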

Latest Benchmarks

Cross-Language Coding

Strong paradigm switching across languages, but concurrency correctness is not yet production-grade.

Coding track, score 7.90, run #10

Nested Hypothetical

Very strong layered reasoning and scope discipline, with minor gaps at deeper hierarchy levels.

Reasoning track, score 8.80, run #9

Silent Inconsistency

Good detection of silent contradictions, but weaker analysis of real-time enforceability and resource bounds.

Reasoning track, score 7.80, run #8