Sarvam AI Adversarial Spec Mutation Benchmark

Scope

The benchmark tests whether the model revises its earlier reasoning after rules in the specification are removed or replaced.

Score

Overall: 7.1 / 10

Good Adaptations

Removed obsolete global-priority logic correctly
Updated later answers to align with new assumptions
Correctly identified eventual-consistency shift

Weak Adaptations

Treats an intended policy as if it were already implemented
Blurs delivery semantics in one duplicate-probability case
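
To make the delivery-semantics point concrete, here is a minimal sketch of why the choice of semantics changes a duplicate probability. It is not the benchmark's actual test case, which the report does not reproduce. The toy model is an assumption: every send attempt reaches the receiver, each acknowledgement is lost independently with probability `ack_loss`, and the sender retries only under at-least-once delivery. The function and parameter names are hypothetical.

```python
def duplicate_probability(ack_loss: float, max_attempts: int, at_least_once: bool) -> float:
    """Probability that the receiver sees the same message more than once.

    Assumed toy model (not taken from the report): every attempt is delivered,
    acks are lost independently with probability ack_loss, and the sender
    retries on a missing ack only when at_least_once is True.
    """
    if not 0.0 <= ack_loss <= 1.0:
        raise ValueError("ack_loss must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # At-most-once never resends, and a single attempt cannot duplicate.
    if not at_least_once or max_attempts == 1:
        return 0.0
    # A duplicate occurs as soon as the first ack is lost and a retry is sent.
    return ack_loss


def expected_deliveries(ack_loss: float, max_attempts: int) -> float:
    """Expected number of copies received under at-least-once delivery.

    Attempt k (counting from 0) is sent only if the k previous acks were all lost,
    which happens with probability ack_loss ** k in this model.
    """
    if not 0.0 <= ack_loss <= 1.0:
        raise ValueError("ack_loss must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return sum(ack_loss ** k for k in range(max_attempts))
```

Under these assumptions, an answer that quotes a nonzero duplicate probability while describing at-most-once delivery, or a zero probability while describing at-least-once delivery with retries, has mixed the two semantics.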

Verdict

The reasoning is structurally solid, but weak implementation-feasibility analysis remains the limiting factor.