All Reports
Detailed reports are organized by track. Use the sections below for fast navigation.
Reasoning (7)
Section average: 7.89 / 10
Sarvam AI Web Reasoning Benchmark
Strong logic and math performance, with weaker handling of overlapping constraints.
Sarvam AI Multi-Layer Nested Hypothetical Benchmark
Very strong layered reasoning and scope discipline with minor hierarchy-depth gaps.
Sarvam AI Partial Spec Corruption Benchmark
Strong contradiction detection with moderate depth limits in enforcement-level explanation.
Sarvam AI Long-Context Spec Consistency Benchmark
Strong long-context retention and rule cross-referencing with moderate consistency-depth gaps.
Sarvam AI Silent Inconsistency Injection Benchmark
Good silent contradiction detection, with weaker real-time enforceability and resource-bound analysis.
Sarvam AI Adversarial Spec Mutation Benchmark
Good mutation tracking, but weak enforceability modeling under distributed constraints.
Sarvam AI Multi-Step Logic Stress Benchmark
Solid linear reasoning but weak global validation and constraint reconciliation.
Systems (1)
Section average: 6.00 / 10
Coding (2)
Section average: 8.46 / 10