Verification strategy
Before you rely on Testing as a merge gate, verify four things: local/CI parity, Keys resolution behaviour, flake policy, and artefact usefulness under failure.
- Local run matches CI for the same commit and env variables (modulo base URLs).
- Forced failure produces a triage path within the MVP target (~30 minutes for a competent engineer).
- indeterminate verdicts are rare, logged, and understood, not ignored.
- Keys misconfiguration or outage fails loudly; tests do not silently run against the wrong model.
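One way to check the last item is a small preflight step that runs before any test and refuses to continue when the key or the resolved model is not what the run expects. The sketch below is illustrative only: the variable names MODEL_API_KEY and RESOLVED_MODEL, the KeysPreflightError class, and the preflight function are assumptions for this example, not part of Testing or Keys.

```python
class KeysPreflightError(RuntimeError):
    """Raised when key or model resolution does not match expectations."""


def preflight(env, expected_model):
    """Return the resolved model name, or raise if the setup looks wrong.

    `env` is any mapping of variable names to strings (for example os.environ).
    MODEL_API_KEY and RESOLVED_MODEL are hypothetical names; substitute
    whatever your own setup exposes.
    """
    if not expected_model:
        # An empty expectation would let any model through, so reject it.
        raise KeysPreflightError("expected model is not set")

    api_key = env.get("MODEL_API_KEY", "").strip()
    if not api_key:
        raise KeysPreflightError("MODEL_API_KEY is missing or empty")

    resolved_model = env.get("RESOLVED_MODEL", "").strip()
    if resolved_model != expected_model:
        # Fail loudly instead of silently testing against the wrong model.
        raise KeysPreflightError(
            f"resolved model {resolved_model!r} does not match "
            f"expected {expected_model!r}"
        )
    return resolved_model
```

Called with os.environ at the start of both the local and the CI entry point, a raised KeysPreflightError should abort the run with a non-zero exit code so the failure is visible in the merge gate rather than buried in test output.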
Agent prompts
Optional: use these with a coding agent to implement this phase in your repo in a safe, gated sequence. Expand only when you need them.
- Run the checklist above on a feature branch; file gaps as issues with severity.
- Define team policy for when indeterminate blocks merge vs needs human override.
- Schedule a monthly review of top flaky goals and artefact noise.
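A team policy for indeterminate verdicts is easier to review when it is written down as code. The following is a minimal sketch under stated assumptions: only the indeterminate verdict comes from this page; the pass and fail verdict names, the IndeterminatePolicy enum, and the merge_allowed function are hypothetical and would need to be mapped onto whatever your pipeline actually reports.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"                    # assumed name
    FAIL = "fail"                    # assumed name
    INDETERMINATE = "indeterminate"  # named in the checklist


class IndeterminatePolicy(Enum):
    BLOCK = "block"                        # indeterminate always blocks merge
    REQUIRE_OVERRIDE = "require_override"  # merge only with a human override


def merge_allowed(verdicts, policy, human_override=False):
    """Decide whether a merge may proceed for the given verdicts."""
    verdicts = list(verdicts)
    if not verdicts:
        # No verdicts means nothing was verified, so do not merge.
        return False
    if any(v is Verdict.FAIL for v in verdicts):
        # A hard failure is never overridable in this sketch.
        return False
    if any(v is Verdict.INDETERMINATE for v in verdicts):
        if policy is IndeterminatePolicy.BLOCK:
            return False
        return human_override
    return True
```

Keeping the override as an explicit argument means every overridden merge can be logged, which in turn gives the monthly flake review something concrete to count.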