Benchmark Status: This evaluation is based on the Sarvam AI web version only, from a single prompt run with no re-prompting. It has not been validated in real projects and remains incomplete until stable API access is available.


Scope

This benchmark tests whether the model can reason across four concurrent abstraction layers:

  1. Physical constraints
  2. System architecture
  3. Policy guarantees
  4. Temporary executive override
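
To make the four layers concrete, here is a minimal sketch of how a permission check across them might be modeled. Everything in it is an illustrative assumption rather than part of the benchmark: the names `Override` and `is_permitted`, the integer timestamps, and especially the precedence order, in which physical and architectural limits can never be bypassed while an active override may temporarily suspend a policy restriction.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Override:
    """Hypothetical executive override, active on the half-open window [start, end)."""
    allows: str
    start: int
    end: int

    def active_at(self, t: int) -> bool:
        return self.start <= t < self.end


def is_permitted(
    action: str,
    t: int,
    physically_possible: Set[str],
    architecture_supports: Set[str],
    policy_allows: Set[str],
    override: Optional[Override] = None,
) -> bool:
    # Layers 1 and 2: assumed to be hard limits that nothing can bypass.
    if action not in physically_possible:
        return False
    if action not in architecture_supports:
        return False
    # Layer 3: policy permits the action outright.
    if action in policy_allows:
        return True
    # Layer 4: assumed precedence, an active override suspends the policy layer only.
    return (
        override is not None
        and override.allows == action
        and override.active_at(t)
    )


if __name__ == "__main__":
    ov = Override(allows="skip_audit", start=10, end=20)
    possible = {"skip_audit", "deploy"}
    supported = {"skip_audit", "deploy"}
    policy = {"deploy"}
    print(is_permitted("skip_audit", 15, possible, supported, policy, ov))
    print(is_permitted("skip_audit", 20, possible, supported, policy, ov))
```

Under these assumptions the override applies only inside its window, so the same action is permitted at `t = 15` and refused once the window closes at `t = 20`. Whether an override should be allowed to suspend a policy guarantee at all is a design choice the sketch simply assumes.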

Score

  • Overall: 8.8 / 10

What It Did Well

  • Preserved layer boundaries across all cases
  • Correctly reasoned about temporary override windows
  • Correctly identified duplicate-side-effect contradictions

Where It Can Improve

  • Deeper analysis of the hierarchy between executive override and policy guarantees
  • More explicit treatment of irreversible ordering effects

Verdict

One of the strongest runs. Layered reasoning is stable and resilient under nested hypotheticals.