Verified context

Restormel Connect serves knowledge-graph context to AI applications with a verification chain attached to every claim. This page defines exactly what "verified" means on this API, which guarantees the pipeline enforces, and — most importantly — how you can check each one yourself. It is written for the person auditing a deployment, not only the developer integrating it.

The falsifiability test. Every claim served as supported carries the quoted evidence span, its character offsets, and a content hash of the exact source version it was bound against. A skeptical reader can open the cited source and check the quote themselves. If a surface cannot show that chain, it does not say "verified".

At a glance

Guarantee | Mechanism | How you check it
A claim is never "supported" without locatable evidence | Deterministic evidence binding (quote + offsets + source-version hash) | Re-find the quote at its offsets in the cited source; compare the hash
Misattributed claims are structurally caught | Binding runs against the cited source only; a quote from elsewhere fails | The envelope's evidence is empty and the state is not supported
A missing verdict is never a pass | Fail-safe coverage finalizers: omitted or unparseable verdicts become coverage gaps | Unjudged claims surface as unverified, never as supported
Uncertainty is flagged, not blended | The judge may abstain; abstention and low confidence route to review | Per-claim state + verification_summary counts on every response
Every retrieval is auditable after the fact | Provenance trace recorded per query (included and excluded claims, with reasons) | Export the trace: GET /connect/v1/traces/{trace_id}/export?format=json
Graph quality is held to a published bar | G2 gate: ≥ 90% supported, ≤ 2% unsupported across validated claims | Quality report on every ingest job; webhook on threshold breach
Scoring rules are inspectable, not implicit | Versioned verification rule sets (six weighted dimensions, named policies) | GET /connect/v1/verification-rules

What "verified" means: the five states

Verification is two-layered, per the Evidence-Bound Verification design. Layer 1 is deterministic and model-free: at ingest, every extracted claim must bind a quoted evidence span to exact character offsets in the cited source version, recorded with that version's SHA-256 content hash. Anyone can re-run this check at any time — if the source changed or the quote is not where it was bound, the check fails. Layer 2 is a narrow entailment judgment: a model is asked only "does this bound span entail this claim?", and it may abstain. The judge runs on a different model family than the extractor, so the system that writes claims never grades its own work. Every verdict is recorded with the judge's model id, prompt version, and timestamp, append-only — re-judging adds history, it never rewrites it.
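The Layer 1 re-check described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's implementation: the function name recheck_binding is hypothetical, and it assumes that offsets are a half-open [start, end) range of character positions and that the content hash is the hex SHA-256 of the UTF-8 encoded source text. The source string is a toy value.

```python
import hashlib


def recheck_binding(source_text: str, quote: str,
                    offsets: tuple[int, int], source_hash: str) -> list[str]:
    """Return a list of findings; an empty list means the binding still holds."""
    findings = []
    # Assumption: the hash is the hex SHA-256 of the UTF-8 encoded source text.
    actual_hash = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    if actual_hash != source_hash:
        findings.append("source_hash mismatch")
    # Assumption: offsets are a half-open [start, end) character range.
    start, end = offsets
    if source_text[start:end] != quote:
        findings.append("quote not at recorded offsets")
    return findings


# Toy source text, bound once and then re-checked.
source = "Now virtue is a mean between two vices."
quote = "virtue is a mean between two vices"
start = source.index(quote)
offsets = (start, start + len(quote))
digest = hashlib.sha256(source.encode("utf-8")).hexdigest()

print(recheck_binding(source, quote, offsets, digest))                # []
print(recheck_binding(source + " (edited)", quote, offsets, digest))  # ['source_hash mismatch']
```

No model is involved at any point, which is why the check can be repeated by anyone holding the source version and the envelope.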

State | Meaning | Requires
supported | Evidence-bound and entailed | Layer 1 pass and Layer 2 entailed
inferred | Entailed, but no directly bound span; always labeled as inference | Layer 2 entailed; Layer 1 partial
unverified | Judge abstained, low confidence, or no bindable evidence | Routed to the human review queue
contradicted | Evidence entails the negation | Review; excluded from strict retrieval
excluded | Remediation or operator decision | Reversible soft-exclude: the record is kept but out of active use

The asymmetry is deliberate: a claim with no locatable evidence in its cited source can never be supported, whatever any judge said about it. This is what closes the misattribution hole — a claim that is true somewhere else in the corpus but cited to the wrong source fails the deterministic binding, no judgment required. Graphs verified before this design (or imported from elsewhere) are normalized through the same rule: a legacy-affirmed claim without a bound span is served as inferred at best.
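One way to read the state table and the asymmetry is as a small decision function in which Layer 1 is consulted before any verdict. The sketch below is an interpretation, not the service's code: the binding labels ("bound", "partial", "none"), the verdict labels, and the LOW_CONFIDENCE threshold are all hypothetical. The excluded state is left out because it results from remediation or an operator decision rather than from the two layers.

```python
from typing import Optional

# Hypothetical threshold; a real deployment would take it from the active policy.
LOW_CONFIDENCE = 0.7


def resolve_state(binding: str, verdict: Optional[str], confidence: float = 0.0) -> str:
    """Combine a Layer 1 binding result with a Layer 2 verdict.

    binding: "bound", "partial", or "none" (hypothetical labels)
    verdict: "entailed", "contradicted", "abstain", or None if never judged
    """
    # Layer 1 first: without locatable evidence, no verdict can help.
    if binding == "none":
        return "unverified"
    # A missing verdict or an abstention is never a pass.
    if verdict is None or verdict == "abstain":
        return "unverified"
    if confidence < LOW_CONFIDENCE:
        return "unverified"
    if verdict == "contradicted":
        return "contradicted"
    if verdict == "entailed":
        return "supported" if binding == "bound" else "inferred"
    return "unverified"  # unknown verdict labels fail safe


print(resolve_state("bound", "entailed", 0.93))    # supported
print(resolve_state("none", "entailed", 0.99))     # unverified
print(resolve_state("partial", "entailed", 0.90))  # inferred
```

The second call is the misattribution case: a confident "entailed" verdict cannot lift a claim whose quote was not found in the cited source.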

The verification chain on every response

Retrieval responses (POST /connect/v1/retrieve and POST /connect/v1/graph) carry a verified-claim envelope per returned unit, plus a per-state summary in the response metadata so a consumer can gate on "anything non-supported in this context?" without scanning every claim:

{
  "claim": { "id": "claim:7rk2…", "text": "Virtue is a mean between two vices." },
  "state": "supported",
  "evidence": [
    {
      "quote": "virtue is a mean between two vices, the one involving excess, the other deficiency",
      "offsets": [18204, 18289],
      "source_ref": "source:nicomachean_ethics",
      "source_hash": "9f2c41…e7",
      "match": "exact"
    }
  ],
  "judge": {
    "model": "gemini-2.0-flash",
    "prompt_version": 1,
    "confidence": 0.93,
    "at": "2026-06-10T12:04:11.000Z"
  },
  "citation": "Nicomachean Ethics, Book II",
  "trace_ref": "/connect/v1/traces/3f6f9a3a-…",
  "trust_score": 88
}

Three honesty rules govern this envelope. Evidence is never fabricated: if a span could not be bound, the evidence array is empty and the state says so. Judge attribution is never invented: if a claim has not been judged, the judge field is omitted. And anything looser than an exact quote match is labeled in the match field (normalized or fuzzy), never hidden. Requests can also pass require_verified (or an explicit verification_policy) to exclude non-supported claims from the context entirely. Those exclusions are counted and recorded in the trace, not silently dropped.
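A consumer can apply both ideas, gating on the summary and spot-checking the honesty rules, before handing context to a model. The sketch below assumes a response shape that this page only implies: a verified_claims list of envelopes and a metadata.verification_summary object mapping each state to a count. Both helper names are hypothetical.

```python
def non_supported_total(response: dict) -> int:
    """Count claims in any state other than supported, using only the summary."""
    # Assumption: verification_summary maps state name -> count.
    summary = response.get("metadata", {}).get("verification_summary", {})
    return sum(count for state, count in summary.items() if state != "supported")


def envelope_violations(envelope: dict) -> list[str]:
    """List ways an envelope breaks the honesty rules (empty list = none found)."""
    problems = []
    evidence = envelope.get("evidence") or []
    if envelope.get("state") == "supported" and not evidence:
        problems.append("supported without evidence")
    for span in evidence:
        if span.get("match") not in ("exact", "normalized", "fuzzy"):
            problems.append("unlabeled match type")
    return problems


response = {
    "verified_claims": [
        {"state": "supported", "evidence": [{"quote": "q", "match": "exact"}]},
        {"state": "unverified", "evidence": []},
    ],
    "metadata": {"verification_summary": {"supported": 1, "unverified": 1}},
}

print(non_supported_total(response))  # 1
print([envelope_violations(e) for e in response["verified_claims"]])  # [[], []]
```

An unverified claim with an empty evidence array is not a violation; it is the envelope reporting honestly that nothing could be bound.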

Fail-safe gates in the ingest pipeline

The pipeline that builds the graph (extract → relate → group → embed → validate → remediate → store) fails safe, not open. The gates an auditor should know about:

  • Coverage finalizers. Validation and entailment run in batches against live models, and models sometimes omit items or return malformed output. Any claim the judge did not return a verdict for is finalized as a coverage gap — recorded as an abstention and routed to review. An omission can never default to a pass.
  • Abstention is an outcome, not an error. The entailment judge is explicitly allowed to answer "cannot verify". Abstentions and low-confidence verdicts land in the review queue; they are never laundered into a softer passing grade.
  • Remediation cannot resurrect. Claims flagged weak or unsupported go through a repair pass; repaired text must re-bind its evidence before it can return to circulation, and claims that cannot be supported are soft-excluded — reversibly, with the record and its history retained.
  • Verification cannot silently rot. Because the Layer-1 check is deterministic over hashed content, it is re-runnable at read time. If a source version changes, bindings against the old hash fail the re-check rather than continuing to vouch for text that no longer exists.
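The coverage-finalizer idea from the first bullet can be sketched as follows. The record shapes, the verdict labels, and the "coverage_gap" reason string are hypothetical; the point illustrated is only that the default for anything missing or malformed is an abstention, never a pass.

```python
VALID_VERDICTS = {"entailed", "contradicted", "abstain"}  # hypothetical labels


def finalize_coverage(claim_ids: list[str], raw_verdicts: list) -> dict[str, dict]:
    """Give every claim in the batch a verdict, filling gaps with abstentions."""
    judged = {}
    for item in raw_verdicts:
        # Skip malformed items and verdicts for claims outside this batch.
        if not isinstance(item, dict):
            continue
        if item.get("claim_id") in claim_ids and item.get("verdict") in VALID_VERDICTS:
            judged[item["claim_id"]] = {"verdict": item["verdict"], "reason": "judged"}
    # Anything the judge omitted or garbled becomes a coverage gap.
    gap = {"verdict": "abstain", "reason": "coverage_gap"}
    return {cid: judged.get(cid, dict(gap)) for cid in claim_ids}


batch = ["c1", "c2", "c3"]
returned = [
    {"claim_id": "c1", "verdict": "entailed"},
    "not even an object",                    # malformed output
    {"claim_id": "c3", "verdict": "maybe"},  # unparseable verdict
]
result = finalize_coverage(batch, returned)
print(result["c1"]["verdict"], result["c2"]["reason"], result["c3"]["reason"])
# entailed coverage_gap coverage_gap
```

Iterating over the batch rather than over the judge's output is the design choice that matters: coverage is defined by what was asked, not by what came back.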

The validator itself is measured against planted ground truth, not assumed. The most recent benchmark (2026-06-10, cross-model routing: extraction on gpt-4o-mini, validation on Llama-3.3-70B) measured 100% recall on planted fabricated and misattributed claims, a 14.5% false-flag rate on known-good claims, and 0% affirm rate on claims the validator was never shown — that last probe being the direct test that the fail-safe gates hold under real model behaviour. Numbers are point-in-time, tied to those model versions, and re-measured when models or routes change.

The G2 quality bar

A graph is not "done" because ingest finished; it is done when it clears the published bar. The G2 gate requires at least 90% of validated claims supported and at most 2% unsupported. Every ingest job's quality report states the trust score (0–100) and the supported / weak / unsupported breakdown, so the bar is checkable per run rather than asserted globally. The trust score weighs verification coverage and embedding coverage most heavily, alongside structural health (orphan rate, vector index presence, relation balance), minus a penalty for high-severity issues. You can register a webhook (job.quality_below_threshold) to be notified when a run lands under your threshold — quality failures are pushed to you, not buried in a log.
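The two G2 thresholds are simple enough to re-check from a job's quality report. The sketch below assumes that "validated claims" means the sum of the supported, weak, and unsupported counts, and that an empty report fails the gate; neither detail is stated on this page, and the function name is hypothetical. Integer arithmetic avoids rounding surprises at the boundaries.

```python
def passes_g2(supported: int, weak: int, unsupported: int) -> bool:
    """Check the G2 bar: at least 90% supported and at most 2% unsupported."""
    total = supported + weak + unsupported  # assumption: these are all validated claims
    if total == 0:
        return False  # assumption: nothing validated fails safe
    enough_supported = supported * 100 >= 90 * total
    few_unsupported = unsupported * 100 <= 2 * total
    return enough_supported and few_unsupported


print(passes_g2(supported=92, weak=7, unsupported=1))   # True
print(passes_g2(supported=92, weak=5, unsupported=3))   # False: 3% unsupported
print(passes_g2(supported=89, weak=10, unsupported=1))  # False: 89% supported
```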

Provenance traces and export

Every retrieval query produces a structured audit trace answering "why did the agent get this context?" — the question regulators and internal audit actually ask. The trace records the query, the verification policy in force, the seed claims chosen, the graph expansion, and a per-claim verdict for everything the engine considered: included claims with their verification state and trust score, and excluded claims with the reason they were dropped (verification gate, confidence gate, duplicate). Traces are retained for 90 days and are workspace-scoped.

  • GET /connect/v1/traces/{trace_id} — the versioned trace document
  • GET /connect/v1/traces/{trace_id}/export?format=json — downloadable export for audit files

The trace_ref on every verified-claim envelope links the claim back to the exact query that served it, so a finding in an AI system's output can be walked back to context, claim, evidence span, and source version in four steps.
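Going from an envelope's trace_ref to the export endpoint is mostly string handling. The sketch below builds the request without sending it; the base URL and the bearer-token header are assumptions, since this page does not describe hosts or authentication, and the function name is hypothetical.

```python
import urllib.parse
import urllib.request


def trace_export_request(base_url: str, trace_ref: str, token: str) -> urllib.request.Request:
    """Build a GET request for the JSON export of the trace a claim points to."""
    # trace_ref is the path carried on the envelope, e.g. "/connect/v1/traces/<id>".
    query = urllib.parse.urlencode({"format": "json"})
    url = base_url.rstrip("/") + trace_ref.rstrip("/") + "/export?" + query
    # Assumption: bearer-token auth; substitute whatever your deployment uses.
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


req = trace_export_request(
    "https://connect.example.invalid/",  # placeholder host
    "/connect/v1/traces/abc-123",        # placeholder trace id
    "TOKEN",
)
print(req.get_method(), req.full_url)
# GET https://connect.example.invalid/connect/v1/traces/abc-123/export?format=json
```

Passing the returned object to urllib.request.urlopen would perform the call; the export can then be stored alongside the audit finding it supports.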

Verification rules are public configuration

The reasoning-quality scoring behind verification is not an implicit prompt; it is a versioned rule set. Each rule set defines six weighted dimensions — logical structure, evidence grounding, counterargument coverage, scope calibration, assumption transparency, internal consistency — with per-dimension passing thresholds and named policies (strict / balanced / lenient) that set the overall pass and weak-claim thresholds. Workspaces may override weights per domain pack; the override is itself inspectable.

  • GET /connect/v1/verification-rules — the rule set active for your workspace
  • GET /connect/v1/verification-rules/built-in — the built-in "Restormel Core v1" definition

Auditing a claim yourself

  1. Retrieve context and read the verified_claims envelopes; check metadata.verification_summary for anything non-supported.
  2. For any claim, take the quote, offsets, and source_hash from evidence[0].
  3. Fetch the cited source version and confirm the quote sits at those offsets and the content hashes to the recorded value. A mismatch is a finding — the claim should not be in a supported context.
  4. Follow trace_ref and export the trace; confirm the verification policy and that exclusions carry reasons.
  5. Fetch the active verification rules and confirm the policy thresholds match what your deployment claims to enforce.

Related actions

Engineering reference: docs/decisions/evidence-bound-verification.md in the repository is the canonical design record this page summarizes; the envelope's canonical schema is @restormel/contracts (verified-claim.ts).