Benchmark Status: This evaluation is based on the Sarvam AI web version only, from a single prompt run with no re-prompting. It has not been validated in real projects and remains incomplete until stable API access is available.

Scope

The model must detect injected inconsistencies without direct hints and decide whether design claims remain structurally valid.

Score

  • Overall: 7.8 / 10
  • Contradictions detected: 8 / 10
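
As a rough illustration of how a detection count like the one above could be tallied, here is a minimal sketch. It assumes each injected contradiction is recorded as a simple detected/missed flag; the function name detection_score and the placeholder outcome list are hypothetical and not part of the benchmark itself. The report does not say how the overall 7.8 is derived, so the sketch covers the detection count only.

```python
from typing import Sequence


def detection_score(detected: Sequence[bool]) -> tuple[int, int, float]:
    """Return (hits, total, rate) for per-contradiction outcomes.

    Hypothetical helper: the report does not describe its scoring code.
    """
    total = len(detected)
    if total == 0:
        raise ValueError("no injected contradictions to score")
    hits = sum(1 for d in detected if d)
    return hits, total, hits / total


# Placeholder outcomes: only the counts (8 of 10) come from the report.
# Which contradictions were missed is not stated, so the order is arbitrary.
outcomes = [True] * 8 + [False] * 2
hits, total, rate = detection_score(outcomes)
print(f"Contradictions detected: {hits} / {total} ({rate:.0%})")
```

With the counts reported here, this prints "Contradictions detected: 8 / 10 (80%)".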

Strength Pattern

  • Strong when the architectural structure is stated explicitly
  • Good semantic reasoning about ordering and retries

Gap Pattern

  • Analysis of real-time bounds and resource pressure is less reliable
  • Some answers assert guarantees without showing how those guarantees would be enforced

Verdict

Good defensive reasoning, but not yet expert-level in hard-constraint or infrastructure-feasibility analysis.