Benchmark Status: This evaluation is based on the Sarvam AI web version only, from a single prompt run with no re-prompting. It has not been validated in real projects and remains incomplete until stable API access is available.

Scope

This test injects corruption into a partial specification and checks whether the model detects the resulting structural impossibilities.
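
As an illustration only, the kind of structural impossibility this test targets can be sketched as a set of range rules whose intersection is empty. The rule format, the field names, and the function `find_unsatisfiable_fields` are hypothetical and are not taken from the benchmark itself; this is a minimal sketch, assuming each rule constrains one numeric field to a closed range.

```python
# Hypothetical sketch: detect fields that no value can satisfy once
# every rule in a (possibly corrupted) spec is applied together.

def find_unsatisfiable_fields(rules):
    """Return the sorted names of fields whose combined range is empty.

    Each rule is a (field, low, high) tuple meaning low <= field <= high.
    """
    ranges = {}
    for field, low, high in rules:
        cur_low, cur_high = ranges.get(field, (float("-inf"), float("inf")))
        # Intersect the new rule with everything seen so far for this field.
        ranges[field] = (max(cur_low, low), min(cur_high, high))
    # An empty intersection shows up as a lower bound above the upper bound.
    return sorted(f for f, (low, high) in ranges.items() if low > high)


rules = [
    ("retry_limit", 1, 5),
    ("timeout_seconds", 10, 60),
    ("timeout_seconds", 90, 120),  # injected corruption: conflicts with the rule above
]

print(find_unsatisfiable_fields(rules))
```

Neither `timeout_seconds` rule is wrong on its own; the impossibility appears only when the two interact, which is why a finding is most useful when it names the specific rules involved.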

Score

  • Overall: 8.5 / 10

What Worked Well

  • Formal consistency checks were stable
  • Global vs regional authority contradictions were recognized
  • Mathematical consistency cases were handled correctly

Remaining Gaps

  • Some answers were logically correct but not sufficiently grounded in implementation detail
  • In a few cases, findings were not tied to the exact rule interactions that cause them

Verdict

A strong result for contradiction detection under adversarial specification changes.