Benchmark Status: This evaluation covers the Sarvam AI web version only and is based on a single prompt run with no re-prompting. The results have not been validated in real projects, and the evaluation remains incomplete until stable API access is available.

Scope

The evaluation used fifteen systems problems, ranging from distributed architecture and analytics pipelines to lock-free structures and x86/TSO memory semantics.

Score

  • Overall average: ~6.0 / 10
  • Best single section: real-time analytics design

Strength Pattern

  • Performs well on layered pipeline design (ingest, process, serve)
  • Models retries and deduplication well
  • Communicates architecture clearly at interview level

Failure Pattern

  • Treatment of consensus and cross-region correctness is shallow
  • Numeric capacity estimates are often optimistic
  • Failure-first analysis is incomplete
  • Details of MPMC (multi-producer, multi-consumer) structures and minimal synchronization are inconsistent

Verdict

Useful for structured design drafts. Not yet reliable for high-risk distributed correctness work or low-level concurrency proofs.