Benchmark Status: This evaluation is based on the Sarvam AI web version only, from a single prompt run with no re-prompting. It has not been validated in real projects and remains incomplete until stable API access is available.

Scope

This stress test targets multi-constraint logic problems that include self-referential statements and require the final state to be validated against every constraint.

Score

  • Overall: 5.8 / 10

Observed Behavior

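  • Individual reasoning steps are often valid when taken in isolation.
  • Checks of the overall state against the full constraint set are applied inconsistently.
  • Self-referential truth setups, where a statement's truth depends on other statements in the same set, destabilize the reasoning.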
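To make "self-referential truth setup" concrete, here is a minimal sketch of how such a puzzle can be checked exhaustively. The two statements and the helper name `consistent_assignments` are hypothetical illustrations, not the prompt used in this evaluation. A truth assignment counts as consistent only if every statement's claim, evaluated against the whole assignment, matches the truth value assigned to that statement.

```python
from itertools import product
from typing import Callable, List, Tuple

# A statement takes the full truth assignment and returns what it claims.
Statement = Callable[[Tuple[bool, ...]], bool]

def consistent_assignments(statements: List[Statement]) -> List[Tuple[bool, ...]]:
    """Brute-force every truth assignment and keep the self-consistent ones."""
    solutions = []
    for values in product([True, False], repeat=len(statements)):
        # Statement i must be true exactly when its claim holds.
        if all(stmt(values) == values[i] for i, stmt in enumerate(statements)):
            solutions.append(values)
    return solutions

# Hypothetical puzzle (an assumption for illustration only):
#   Statement 1: "Exactly one of these statements is false."
#   Statement 2: "Statement 1 is true."
statements: List[Statement] = [
    lambda v: v.count(False) == 1,
    lambda v: v[0],
]

solutions = consistent_assignments(statements)
print(solutions)
```

Because the check runs over the complete assignment rather than one statement at a time, it is a global check of the kind the observations above describe as inconsistent.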

Typical Failure Modes

  • Constraint retention loss
  • Missed automatic rule activation
  • Arithmetic or assignment checks skipped at final pass
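The failure modes above all come down to a final answer that is never re-checked against the full rule set. A minimal sketch of such a final pass follows, assuming constraints can be written as named predicates over a completed assignment. The puzzle, the variable names, and the `final_pass` helper are hypothetical and are not taken from the evaluated prompt.

```python
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, int]
Constraint = Tuple[str, Callable[[Assignment], bool]]

def final_pass(assignment: Assignment, constraints: List[Constraint]) -> List[str]:
    """Return the name of every constraint the finished assignment violates."""
    return [name for name, check in constraints if not check(assignment)]

# Hypothetical rule set (illustrative assumption):
constraints: List[Constraint] = [
    # Arithmetic check on the end state.
    ("sum is 10", lambda s: s["a"] + s["b"] + s["c"] == 10),
    # Assignment check: no value reused.
    ("all distinct", lambda s: len(set(s.values())) == len(s)),
    # Conditional rule that activates automatically once a > 3.
    ("if a > 3 then c is even", lambda s: s["a"] <= 3 or s["c"] % 2 == 0),
]

good = {"a": 5, "b": 3, "c": 2}
bad = {"a": 5, "b": 2, "c": 3}  # sums to 10, but the conditional rule is missed

print(final_pass(good, constraints))
print(final_pass(bad, constraints))
```

Keeping every constraint in one list and evaluating all of them at the end means no rule can be dropped along the way, and a conditional rule is tested whether or not it was noticed during the step-by-step reasoning.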

Verdict

The model handles linear, step-by-step reasoning but needs stronger whole-system verification before committing to a final answer.