---
title: Behavioral contracts
description: "What the engine guarantees: the behaviors you can build against."
---

Eight guarantees about how the engine behaves under your code. Each is testable, audited, and load-bearing for the SDK built on top of the engine. A regression on any of these is release-blocking.

The implementation behind these guarantees evolves as the engine improves. The guarantees themselves are stable. The surface you build against doesn't shift underneath you.

---

## 1. Retries and no-ops are safe

Replaying the same `(idempotencyKey, scope)` on `add`, `after`, or `observe` produces the same state. Empty inputs (`after('')`, an empty `BeliefDelta`) leave state unchanged.

**What this means for you:** at-least-once delivery from queues, webhooks, or flaky networks doesn't double-count evidence. Defensive patterns ("call `after()` every turn even if nothing happened") are zero-cost and zero-risk.
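
One way to picture the guarantee is a write path that deduplicates on the `(idempotencyKey, scope)` pair and ignores empty input. The sketch below is a toy in-memory model; `IdempotentLedger`, `Write`, and `apply` are hypothetical names used for illustration, not the engine's implementation or the SDK's API.

```typescript
// Hypothetical toy model, not the engine's implementation.
type Write = { idempotencyKey: string; scope: string; text: string };

class IdempotentLedger {
  private seen = new Set<string>();
  private evidence: string[] = [];

  // Returns true if the write changed state, false for a replay or a no-op.
  apply(write: Write): boolean {
    if (write.text === "") return false; // empty input leaves state unchanged
    const dedupeKey = JSON.stringify([write.idempotencyKey, write.scope]);
    if (this.seen.has(dedupeKey)) return false; // replay: already applied
    this.seen.add(dedupeKey);
    this.evidence.push(write.text);
    return true;
  }

  count(): number {
    return this.evidence.length;
  }
}

const ledger = new IdempotentLedger();
const write: Write = {
  idempotencyKey: "evt-1",
  scope: "workspace-a",
  text: "Supplier confirmed delivery",
};
ledger.apply(write);
ledger.apply(write); // redelivered by the queue: no second entry
ledger.apply({ idempotencyKey: "evt-2", scope: "workspace-a", text: "" }); // nothing happened this turn
```

In this model the dedupe key includes the scope, so the same idempotency key in a different scope is treated as a distinct write.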

---

## 2. Fusion is order-independent

Combining the same set of contributions in any order produces the same result.

**What this means for you:** retrying a failed `after()` after a peer's write doesn't change the outcome. Multi-agent pipelines are free of an entire class of hidden ordering bugs.
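
Order independence follows whenever fusion is a commutative, associative merge. The sketch below illustrates the property with integer pseudo-counts and a uniform prior. That rule is an assumption chosen for the example; the engine's actual fusion rule is not part of the contract, and `Contribution` and `fuse` are hypothetical names.

```typescript
// Hypothetical toy model: the engine's actual fusion rule is not specified here.
type Contribution = { agent: string; supports: number; opposes: number };

// Fusion as a sum of integer pseudo-counts. Integer addition is commutative and
// associative, so any ordering of the same contributions yields the same totals.
function fuse(contributions: Contribution[]): { confidence: number; evidenceCount: number } {
  let supports = 0;
  let opposes = 0;
  for (const c of contributions) {
    supports += c.supports;
    opposes += c.opposes;
  }
  // Uniform Beta(1, 1) prior, chosen for this sketch only.
  return {
    confidence: (1 + supports) / (2 + supports + opposes),
    evidenceCount: supports + opposes,
  };
}

const scout: Contribution = { agent: "scout", supports: 3, opposes: 1 };
const analyst: Contribution = { agent: "analyst", supports: 0, opposes: 2 };
const reviewer: Contribution = { agent: "reviewer", supports: 5, opposes: 0 };

const forward = fuse([scout, analyst, reviewer]);
const shuffled = fuse([reviewer, scout, analyst]); // same set, different arrival order
```

Both calls see 8 supporting and 3 opposing counts, so `forward` and `shuffled` are identical.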

---

## 3. Trust knobs behave predictably

Lowering an agent's or source's trust attenuates its contributions proportionally without affecting any other agent or source. Locked overrides stay where you set them; the engine's learning never drifts them.

**What this means for you:** `beliefs.trust.set({ kind: 'agent', id: 'unreliable-scout' }, { confidence: 0.1, strength: 50 })` reduces that scout's pull at fusion time without surprising side effects elsewhere.
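
As a rough mental model, trust can be pictured as a per-source multiplier applied at fusion time, with a lock flag that learning respects. The sketch below collapses trust into a single `value`, which is a simplification for illustration; `TrustEntry`, `pull`, and `learn` are hypothetical names, and the smoothing constants are arbitrary.

```typescript
// Hypothetical toy model of trust-weighted contributions.
type Contribution = { agent: string; weight: number };
type TrustEntry = { value: number; locked: boolean };
type TrustTable = Record<string, TrustEntry>;

// A contribution's pull scales linearly with its own agent's trust only.
function pull(c: Contribution, trust: TrustTable): number {
  const entry = trust[c.agent];
  return c.weight * (entry === undefined ? 1 : entry.value);
}

// Learning nudges trust toward observed reliability, but never touches locked entries.
function learn(entry: TrustEntry, observedReliability: number): TrustEntry {
  if (entry.locked) return entry;
  return { value: 0.9 * entry.value + 0.1 * observedReliability, locked: false };
}

const trust: TrustTable = {
  "unreliable-scout": { value: 0.1, locked: true },
  analyst: { value: 1, locked: false },
};

const scoutPull = pull({ agent: "unreliable-scout", weight: 4 }, trust); // attenuated to a tenth
const analystPull = pull({ agent: "analyst", weight: 6 }, trust); // unaffected by the scout's setting
const scoutAfterLearning = learn(trust["unreliable-scout"], 0.9); // locked: stays at 0.1
```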

---

## 4. Older evidence carries less weight

Evidence is downweighted by a freshness factor as time passes, scaled to the workspace's configured decay rate. Stale claims surface for re-verification rather than silently dominating fresh ones.

**What this means for you:** the system creates pressure to refresh. Old analyses lose their grip without being deleted, and when an old and a new observation are otherwise equally strong, the newer one carries more weight.
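
To make the idea concrete, the sketch below assumes an exponential half-life curve for the freshness factor. The curve, the 30-day half-life, and the re-verification threshold are assumptions for illustration; how decay actually scales evidence is an implementation detail. `freshness`, `effectiveWeight`, and `needsReverification` are hypothetical helpers.

```typescript
// Hypothetical decay model: exponential half-life. The engine's actual curve is not specified.
function freshness(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

type Evidence = { claim: string; weight: number; ageDays: number };

// Evidence keeps its original weight on record; only its effective weight shrinks.
function effectiveWeight(e: Evidence, halfLifeDays: number): number {
  return e.weight * freshness(e.ageDays, halfLifeDays);
}

// Flags evidence whose freshness fell below a threshold, instead of deleting it.
function needsReverification(e: Evidence, halfLifeDays: number, threshold = 0.25): boolean {
  return freshness(e.ageDays, halfLifeDays) < threshold;
}

const halfLife = 30; // days, an arbitrary workspace setting for this sketch
const oldAnalysis: Evidence = { claim: "Vendor is solvent", weight: 1, ageDays: 90 };
const freshReport: Evidence = { claim: "Vendor is solvent", weight: 1, ageDays: 0 };

const oldWeight = effectiveWeight(oldAnalysis, halfLife); // three half-lives old
const freshWeight = effectiveWeight(freshReport, halfLife); // undecayed
const stale = needsReverification(oldAnalysis, halfLife);
```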

---

## 5. Confidence labels are calibrated

When the SDK reports `confidence: 'high'`, claims carrying that label resolve true at roughly the rate the label implies. Calibration is enforced in CI; regressions don't ship.

**What this means for you:** the labels are honest. You can route on `'high' / 'medium' / 'low'` without building your own calibration layer on top.
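
If you want to spot-check calibration against your own resolved outcomes, a per-label hit rate is enough to start. The sketch below is a hypothetical check, not the engine's CI suite; `Resolved`, `hitRates`, and `isMonotonic` are illustrative names, and the sample data is made up for the example.

```typescript
type Label = "high" | "medium" | "low";
type Resolved = { label: Label; outcome: boolean };

// Fraction of claims under each label that resolved true (NaN if a label has no data).
function hitRates(resolved: Resolved[]): Record<Label, number> {
  const totals: Record<Label, { hits: number; n: number }> = {
    high: { hits: 0, n: 0 },
    medium: { hits: 0, n: 0 },
    low: { hits: 0, n: 0 },
  };
  for (const r of resolved) {
    totals[r.label].n += 1;
    if (r.outcome) totals[r.label].hits += 1;
  }
  const rate = (l: Label): number => (totals[l].n === 0 ? NaN : totals[l].hits / totals[l].n);
  return { high: rate("high"), medium: rate("medium"), low: rate("low") };
}

// A weak sanity check: labels should at least be ordered by observed hit rate.
function isMonotonic(rates: Record<Label, number>): boolean {
  return rates.high > rates.medium && rates.medium > rates.low;
}

const tag = (label: Label, outcomes: boolean[]): Resolved[] =>
  outcomes.map((outcome) => ({ label, outcome }));

const sample: Resolved[] = [
  ...tag("high", [true, true, true, true, false]),
  ...tag("medium", [true, false]),
  ...tag("low", [false, false, false, true]),
];
const rates = hitRates(sample);
```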

---

## 6. Supersession is a clean cut

When belief B explicitly supersedes belief A, A leaves the active candidate set. `read()` and `list()` no longer return A; `trace()` still surfaces it for audit.

**What this means for you:** an agent updating its position on a claim doesn't leave the prior position competing for attention. Audit history is preserved separately from current state.
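
The split between current state and audit history can be modeled as an append-only log with a `supersededBy` marker. The sketch below is a toy in-memory model; `BeliefLog` is a hypothetical class whose `list` and `trace` methods only mirror the names above and do not reproduce the SDK's signatures.

```typescript
// Hypothetical toy model, not the SDK's read/list/trace implementation.
type Belief = { id: string; claim: string; supersededBy?: string };

class BeliefLog {
  private beliefs: Belief[] = [];

  // Appends a belief; if it supersedes an earlier one, marks that one as retired.
  add(id: string, claim: string, supersedes?: string): void {
    if (supersedes !== undefined) {
      for (const b of this.beliefs) {
        if (b.id === supersedes) b.supersededBy = id;
      }
    }
    this.beliefs.push({ id, claim });
  }

  // Active candidate set: superseded beliefs are excluded.
  list(): Belief[] {
    return this.beliefs.filter((b) => b.supersededBy === undefined);
  }

  // Full audit history, including superseded beliefs.
  trace(): Belief[] {
    return this.beliefs.slice();
  }
}

const log = new BeliefLog();
log.add("A", "Launch slips to Q3");
log.add("B", "Launch is on track for Q2", "A"); // B explicitly supersedes A

const activeIds = log.list().map((b) => b.id);
const auditIds = log.trace().map((b) => b.id);
```

Here `activeIds` contains only `"B"`, while `auditIds` still contains both `"A"` and `"B"`.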

---

## 7. Belief shapes don't contaminate each other

Beliefs of different shapes (binary, categorical, numeric) compose safely. Adding a categorical claim doesn't perturb a binary one.

**What this means for you:** multi-modal world models are safe. Your numeric measurements aren't at risk from a new yes/no claim landing in the same workspace.
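
On the client side, a discriminated union is one way to keep shapes apart in your own code. The sketch below is a hypothetical model of a workspace keyed by belief id; `BeliefState`, `Store`, and `put` are illustrative names, and the fields on each shape are assumptions rather than the SDK's types.

```typescript
// Hypothetical client-side model; field names are assumptions for illustration.
type BeliefState =
  | { shape: "binary"; pTrue: number }
  | { shape: "categorical"; probs: Record<string, number> }
  | { shape: "numeric"; mean: number; n: number };

type Store = Record<string, BeliefState>;

// Writes a belief under its own key and returns a new store.
// Every other entry is carried over by reference, untouched.
function put(store: Store, id: string, next: BeliefState): Store {
  const existing = store[id];
  if (existing !== undefined && existing.shape !== next.shape) {
    throw new Error(`shape mismatch for ${id}: ${existing.shape} vs ${next.shape}`);
  }
  return { ...store, [id]: next };
}

const before: Store = {
  "contract-signed": { shape: "binary", pTrue: 0.8 },
  "unit-cost": { shape: "numeric", mean: 12.5, n: 40 },
};

// A new categorical claim lands in the same workspace.
const after = put(before, "launch-region", {
  shape: "categorical",
  probs: { emea: 0.6, apac: 0.4 },
});
```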

---

## 8. Confidence and evidence count are tracked separately

A claim at 70% with 100 supporting observations is a different signal from a claim at 70% with 2 observations. The SDK exposes both. See [Clarity](/dev/core/clarity) for the two-channel model.

**What this means for you:** you can distinguish "we haven't investigated yet" from "we've investigated extensively and the answer is genuinely close." They demand opposite next actions.
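
A small routing function shows how the two channels can lead to different next actions. The thresholds and action names below are arbitrary assumptions for illustration, and `Reading` and `nextAction` are hypothetical names, not SDK types.

```typescript
// Hypothetical routing helper; thresholds are arbitrary and should be tuned per use case.
type Reading = { confidence: number; evidenceCount: number };
type NextAction = "investigate" | "accept-uncertainty" | "act";

function nextAction(r: Reading, minEvidence = 20, decisiveMargin = 0.15): NextAction {
  // Thin evidence: the confidence number is not yet informative.
  if (r.evidenceCount < minEvidence) return "investigate";
  // Plenty of evidence: a near-50% reading means the question is genuinely close.
  const margin = Math.abs(r.confidence - 0.5);
  return margin >= decisiveMargin ? "act" : "accept-uncertainty";
}

const wellSupported = nextAction({ confidence: 0.7, evidenceCount: 100 });
const barelyChecked = nextAction({ confidence: 0.7, evidenceCount: 2 });
const genuinelyClose = nextAction({ confidence: 0.52, evidenceCount: 100 });
```

With these thresholds, the same 70% reading routes to `"act"` at 100 observations and to `"investigate"` at 2, while a well-evidenced 52% routes to `"accept-uncertainty"`.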

---

## What's deliberately not promised

- **Extraction model choice.** The model behind `after()` and `observe()` may change between releases. Only the *shape* of the resulting `BeliefDelta` is contracted.
- **Absolute confidence numbers across version bumps.** Calibration shifts when models swap; the calibration *quality* is bounded, not the exact numbers.
- **Cost or token usage.** Telemetry is intentionally not part of the public SDK contract.
- **Implementation details.** How fusion combines contributions, how decay scales evidence, how confidence is computed: these evolve as the engine improves. Build against the guarantees, not the implementation.

If you find a case where SDK behavior appears to violate one of these contracts, file it as a P0.
