Benchmark Status: This evaluation is based on the Sarvam AI web version only, from a single prompt run with no re-prompting. It has not been validated in real projects and should be treated as incomplete until stable API access is available.

Scope

The benchmark covers 15 problems across logic, quantifiers, divisibility, combinatorics, probability, invariants, graph theory, and strategy puzzles.

Score

  • Final: 135 / 150
  • Average: 9.0 / 10
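The two figures above are consistent with each other, which a short check makes explicit. This is only a sketch: it assumes every problem carries the same maximum (150 / 15 = 10 points), which the report implies but does not state, and the variable names are illustrative. Per-problem scores are not published here, so only the aggregates are used.

```python
# Aggregate figures taken from the report; per-problem scores are not published.
TOTAL_SCORE = 135
MAX_SCORE = 150
NUM_PROBLEMS = 15

# Assumption: each problem has the same maximum, so 150 / 15 = 10 points each.
max_per_problem = MAX_SCORE / NUM_PROBLEMS
average = TOTAL_SCORE / NUM_PROBLEMS
percentage = 100 * TOTAL_SCORE / MAX_SCORE

print(f"max per problem: {max_per_problem:.0f}")
print(f"average: {average:.1f} / {max_per_problem:.0f}")
print(f"percentage: {percentage:.0f}%")
```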

Where It Performed Best

  • Probability and Bayes calculations
  • Invariant and parity reasoning
  • Single-constraint combinatorics
  • Graph degree reasoning

Where It Slipped

  • Multi-constraint counting cases
  • Quantifier interaction edge cases

Notes

Item | Observation
Q2 | Weak quantifier interaction handling
Q9 | Multi-constraint counting error
Remaining set | Mostly high confidence and correct

Verdict

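A strong reasoning profile for interactive use through the web interface. The main gap is global constraint reconciliation: when several rules overlap, the model does not always combine them into one consistent count or conclusion.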
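To illustrate what overlapping constraints look like in a counting problem, here is a minimal sketch. The problem is hypothetical and is not one of the 15 benchmark questions: count the integers from 1 to 100 divisible by 2, 3, or 5. Adding the three counts separately overcounts numbers that satisfy more than one rule; inclusion-exclusion reconciles the overlaps. The function name and the choice of divisors are assumptions made for this example.

```python
from itertools import combinations
from math import prod


def count_divisible_by_any(limit, divisors):
    """Count n in 1..limit divisible by at least one divisor.

    Uses inclusion-exclusion. Assumes the divisors are pairwise coprime,
    so the least common multiple of any subset equals its product.
    """
    total = 0
    for size in range(1, len(divisors) + 1):
        # Odd-sized subsets are added, even-sized subsets are subtracted.
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(divisors, size):
            total += sign * (limit // prod(subset))
    return total


DIVISORS = (2, 3, 5)

# Treating each rule in isolation double-counts numbers such as 6, 10, and 30.
naive = sum(100 // d for d in DIVISORS)

# Reconciling the overlaps gives the correct count.
exact = count_divisible_by_any(100, DIVISORS)

# Brute-force enumeration as an independent check.
brute = sum(1 for n in range(1, 101) if any(n % d == 0 for d in DIVISORS))

print(naive, exact, brute)
```

The naive sum exceeds the true count because each overlap is counted more than once, which is the kind of error that arises when constraints are handled one at a time instead of together.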