In investing, the world model is the live conviction on a deal: the claim "is this a good bet at this price?" carried as a posterior that the diligence fleet drives up or down as evidence lands, and that conviction is what sizes the check. Take Helio, a Series B infrastructure startup under review in the namespace deal:helio. The pitch says net retention is strong and the design win at a marquee account is locked. Diligence exists to test exactly those load-bearing claims: pull the cohort data, call the references, read the contract. A belief state that lives outside any one workstream holds the conviction as it actually stands, so the partner reads where the evidence has moved the thesis before sizing the check.
The world
The environment is the target and the deal: the company, the round, and the price you are being asked to pay. The entities are the company and its product, the market it sells into, the customers and the design wins, the cohorts behind the revenue, the founders and the team, and the terms of the round. The relations are the load-bearing structure of the thesis: a customer validates a use case, a cohort supports a retention claim, a competitor threatens a moat, an expert call corroborates or refutes the technical story, a term prices the risk.
Ground truth is uneven, and saying where it lives is half the discipline. The data room holds management's numbers, which are a starting point, not the record. Primary evidence outranks them: the raw cohort export reconciled to the billing ledger, the signed contract behind a claimed design win, the customer who actually renewed, the expert who has shipped the same architecture. Management's framing is a claim of lower rank waiting on primary evidence to confirm or kill it.
What streams in
The fleet runs the deal as separate workstreams, each an agent that gathers its own evidence and fuses its findings into the one shared conviction. The observations are the channels real diligence runs on:
- Market: TAM build, competitive map, pricing power, and where the category is heading
- Financial / cohort: the data-room financials, resolved against raw cohort and retention exports that are reconciled to the billing ledger
- Technical / product: architecture review, security posture, and a read on whether the build matches the story
- Customer & expert references: renewal-stage customers and independent operators who have run the same playbook, weighted by independence: a reference the founder supplied is an advocate and ranks below one sourced off-list
- Team / background: founder track record, key-person risk, and reference checks on the people
The workstreams are not siloed. When the financial agent finds that the raw cohorts undercut the stated retention, that finding becomes an observation the customer agent acts on next, steering its reference calls toward the churned accounts. One agent's evidence is the next agent's starting question, and every finding folds into the same conviction with a receipt back to the source that produced it.
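The hand-off between workstreams can be sketched as a small routing function. This is a minimal illustration, not part of the SDK: the `Finding` and `FollowUp` shapes, the `routeFinding` helper, and the account names are all hypothetical.

```typescript
// Hypothetical shapes for illustration; none of these names come from the beliefs SDK.
type Workstream = 'market' | 'financial' | 'technical' | 'references' | 'team';

interface Finding {
  from: Workstream;
  claim: string;                    // the thesis claim the finding bears on
  verdict: 'supports' | 'refutes';
  source: string;                   // receipt back to the evidence that produced it
  accounts?: string[];              // accounts implicated by the finding, if any
}

interface FollowUp {
  to: Workstream;
  question: string;
  becauseOf: string;                // carries the receipt forward
}

// One agent's evidence becomes the next agent's starting question.
function routeFinding(f: Finding): FollowUp[] {
  if (f.from === 'financial' && f.claim === 'net-retention' && f.verdict === 'refutes') {
    return (f.accounts ?? []).map((account) => ({
      to: 'references' as const,
      question: `Why did ${account} churn or contract?`,
      becauseOf: f.source,
    }));
  }
  return []; // other routes would be added case by case
}

const followUps = routeFinding({
  from: 'financial',
  claim: 'net-retention',
  verdict: 'refutes',
  source: 'Raw cohort export, reconciled to the billing ledger',
  accounts: ['account-a', 'account-b'], // placeholder names
});
```

Each follow-up keeps the source of the finding that raised it, so the reference agent's calls remain traceable to the cohort export.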
The belief state
The conviction here is not a static memo written at first meeting. It is a central claim carrying an explicit posterior that moves as each workstream reports, with every step traceable to the evidence behind it. Supporting evidence lifts it, refuting evidence pulls it down, a genuine contradiction is surfaced rather than averaged into a comfortable middle, and an open gap is named as work still to do.
```text
┌──────────────────────────────────────────────────────────────┐
│ THESIS: HELIO IS A GOOD BET AT THE SERIES B PRICE            │
│                                                              │
│ Conviction   lean invest, 55-65% band      │ 5 streams       │
│                                                              │
│ ● Marquee design win is contracted      +  │ signed MSA      │
│ ● Expansion revenue in top cohorts      +  │ raw export      │
│ ● Founder shipped this before           +  │ 3 off-list refs │
│ ✗ Net retention 130% (pitch) vs 108% raw: definition gap     │
│                                                              │
│ Gap: "Does the #2 logo renew at Q3 term?"            UNKNOWN │
│ ⚠ Stale: expert call on the moat is 7 months old             │
└──────────────────────────────────────────────────────────────┘
```

Read the conviction as a band, not a point: the fleet reports a range it can defend, and the band is what gates the call and bounds the check. Each row is a piece of evidence tied to a source a partner can pull. The row marked ✗ is the point of the exercise: the founder's stated 130% net retention does not stand as an equal beside the 108% the raw cohort export shows. Often the gap is definitional before it is dishonest (two different cohort windows, or a different treatment of contraction), so the engine surfaces the conflict and pins the question rather than blending the two into a number that buries it. A partner adjudicates which definition the thesis should underwrite. The stale flag works the other way: a moat read from an expert call seven months ago has decayed enough that it should be re-confirmed before the thesis leans on it.
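The two flags in the panel, a surfaced contradiction and a stale reading, can be sketched in a few lines. This is an illustration under stated assumptions rather than the engine's logic: the `Reading` shape, the 5-point tolerance, the 180-day freshness window, and the dates are all hypothetical.

```typescript
// Hypothetical shapes and thresholds; not the engine's internal representation.
interface Reading {
  claim: string;   // e.g. 'net-retention'
  value: number;   // percent
  source: string;
}

interface Contradiction {
  claim: string;
  readings: [Reading, Reading]; // both kept side by side, never averaged
  question: string;             // the question pinned for a partner
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Surface a conflict instead of blending it into a comfortable middle.
function findContradiction(a: Reading, b: Reading, tolerance: number): Contradiction | null {
  if (a.claim !== b.claim || Math.abs(a.value - b.value) <= tolerance) return null;
  return {
    claim: a.claim,
    readings: [a, b],
    question: `Which definition of ${a.claim} should the thesis underwrite?`,
  };
}

// Flag evidence older than maxAgeDays so it is re-confirmed before the thesis leans on it.
function isStale(observedAt: Date, now: Date, maxAgeDays: number): boolean {
  return (now.getTime() - observedAt.getTime()) / DAY_MS > maxAgeDays;
}

const pitch: Reading = { claim: 'net-retention', value: 130, source: 'Pitch deck' };
const raw: Reading = { claim: 'net-retention', value: 108, source: 'Raw cohort export' };
const conflict = findContradiction(pitch, raw, 5); // tolerance is an assumed value

// Illustrative dates roughly seven months apart; the 180-day window is an assumption.
const moatIsStale = isStale(
  new Date('2024-01-01T00:00:00Z'),
  new Date('2024-08-01T00:00:00Z'),
  180,
);
```

Returning both readings together with a pinned question, instead of a single merged number, keeps the definitional gap visible until a partner adjudicates it.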
When a second agent works the same deal, the pictures fuse into one conviction. The market agent's optimism and the cohort agent's caution cannot both ride forward as equals; where they disagree, the disagreement is shown, not smoothed. The engine compiles each workstream's observations into the belief state and projects the conviction the partner acts from.
What we're after
The intent is a defensible invest-or-pass decision and, if invest, the right check size for the conviction the evidence supports. Success has a precise shape: the deal's load-bearing assumptions are tested against primary evidence, the conviction is calibrated against the claims that actually resolved during diligence, and every assertion in the memo traces to a source a partner can name. The limiting factor is the assumption doing the most work with the least support behind it: the claim that, if it cracked, would collapse the thesis. For Helio, that is whether the retention story survives the raw cohorts and references sourced off the founder's list.
Moves are ranked against that intent, not against raw uncertainty. The fleet does not chase the most uncertain fact on the board. It chases the gap whose resolution most moves the conviction the partner has to defend in the IC memo.
| Intent | In a diligence process |
|---|---|
| Goal | A defensible invest-or-pass call and a check sized to the conviction the evidence supports. |
| Success | Load-bearing assumptions tested against primary evidence, conviction calibrated on what resolved in diligence, every assertion traceable. |
| Limiting factor | The least-supported assumption the thesis leans on hardest. |
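Ranking moves against the intent rather than against raw uncertainty can be sketched as follows. The scoring rule here (uncertainty multiplied by how much the thesis leans on the claim) is a simplifying assumption for illustration, not the engine's formula, and the gap names and numbers are invented.

```typescript
// Hypothetical model: all fields and numbers are illustrative.
interface Gap {
  id: string;
  uncertainty: number;  // 0..1, how unknown the fact is
  thesisWeight: number; // 0..1, how hard the conviction leans on it
}

// Assumed proxy for how far resolving the gap would move the conviction.
const expectedShift = (g: Gap): number => g.uncertainty * g.thesisWeight;

// Sort a copy so the caller's list is left untouched.
function rankGaps(gaps: Gap[]): Gap[] {
  return [...gaps].sort((a, b) => expectedShift(b) - expectedShift(a));
}

const ranked = rankGaps([
  { id: 'office-lease-terms', uncertainty: 0.9, thesisWeight: 0.05 },
  { id: 'verify-cohort-retention', uncertainty: 0.6, thesisWeight: 0.9 },
  { id: 'number-two-logo-renewal', uncertainty: 0.7, thesisWeight: 0.5 },
]);
const order = ranked.map((g) => g.id);
```

In this toy input the most uncertain fact (the lease terms) ranks last, because almost none of the thesis rests on it, while the retention check ranks first.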
The honest boundary matters here. Venture outcomes resolve over years, so the engine does not predict Helio's exit or claim to know whether the bet pays off. It calibrates against the assumptions that resolve during diligence: did the cohort data confirm the retention claim, did the reference confirm the design win. That calibration accrues across the fund's deal history, not just this one deal, so a partner can ask how often a thesis that looked this well-supported held up when the resolvable checks came back. The engine surfaces a conviction with its evidence; the partner and the IC make the call and set the check.
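One way to picture calibration against resolved checks is a Brier score plus a hit rate per confidence band. This is a generic sketch of those two standard measures, not a description of how the engine computes calibration; the `ResolvedCheck` shape and the sample history are hypothetical.

```typescript
// Hypothetical record of a diligence check that has resolved.
interface ResolvedCheck {
  deal: string;
  claim: string;
  predicted: number; // probability the claim would hold, recorded before the check
  held: boolean;     // what the primary evidence showed
}

// Brier score: mean squared error between stated probability and outcome (0 is perfect).
function brierScore(checks: ResolvedCheck[]): number {
  if (checks.length === 0) throw new Error('no resolved checks to calibrate against');
  const total = checks.reduce((sum, c) => {
    const err = c.predicted - (c.held ? 1 : 0);
    return sum + err * err;
  }, 0);
  return total / checks.length;
}

// How often claims stated with confidence in [lo, hi) actually held.
function hitRate(checks: ResolvedCheck[], lo: number, hi: number): number | null {
  const inBand = checks.filter((c) => c.predicted >= lo && c.predicted < hi);
  if (inBand.length === 0) return null; // no history in this band yet
  return inBand.filter((c) => c.held).length / inBand.length;
}

// Invented history across deals, for illustration only.
const history: ResolvedCheck[] = [
  { deal: 'deal:a', claim: 'retention', predicted: 0.6, held: true },
  { deal: 'deal:b', claim: 'design-win', predicted: 0.6, held: false },
  { deal: 'deal:c', claim: 'retention', predicted: 0.6, held: true },
  { deal: 'deal:c', claim: 'design-win', predicted: 0.9, held: true },
];

const score = brierScore(history);
const bandRate = hitRate(history, 0.55, 0.65);
```

The hit rate answers the partner's question directly: of the claims that looked this well-supported, what fraction held when the check came back. A `null` result signals that the band has no resolved history yet.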
The policy
A diligence process runs under rules that outrank any single attractive narrative. You encode them as the structure the fleet works within:
| Policy bucket | In a diligence process |
|---|---|
| Invariants | Stay inside the investment mandate and stage. Every assertion in the memo cites evidence. The check is never sized by software. Data-room material stays in deal:helio and never leaks across deals or to an unwalled model. |
| Source hierarchy | Primary evidence (raw cohorts, signed contracts, references) leads management's data-room numbers; the data room leads the pitch narrative. |
| Conventions | How conviction maps to a sizing band, the stage gates a deal clears before it advances (screen, partner review, IC), when a deal goes "on watch," how a contested claim is phrased in the memo. |
| Avoid | Letting a preference for the deal inflate the posterior on its thesis. This is the is/ought firewall. |
When two sources disagree, the hierarchy decides. The reconciled cohort export outranks the founder's headline retention number, every time.
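The hierarchy can be encoded as an ordered rank that decides which reading leads. The sketch below is one way to express it in your own fleet code; the tier names, the `Claimed` shape, and the `resolveByHierarchy` helper are hypothetical, and ties within a tier are not handled.

```typescript
// Ranks mirror the source hierarchy in the table above: lower number leads.
const SOURCE_RANK = { primary: 0, dataRoom: 1, pitch: 2 } as const;
type SourceTier = keyof typeof SOURCE_RANK;

interface Claimed {
  metric: string;
  value: number;
  tier: SourceTier;
  source: string;
}

// The highest-ranked reading leads; the rest are kept so the disagreement stays visible.
function resolveByHierarchy(readings: Claimed[]): { leading: Claimed; overridden: Claimed[] } {
  const sorted = [...readings].sort((a, b) => SOURCE_RANK[a.tier] - SOURCE_RANK[b.tier]);
  const leading = sorted[0];
  if (!leading) throw new Error('nothing to resolve');
  return { leading, overridden: sorted.slice(1) };
}

const retention = resolveByHierarchy([
  { metric: 'net-retention', value: 130, tier: 'pitch', source: 'Pitch deck' },
  { metric: 'net-retention', value: 108, tier: 'primary', source: 'Reconciled cohort export' },
]);
```

With these inputs the reconciled export leads at 108 and the pitch figure is retained as overridden, which matches the rule that the hierarchy decides without erasing the lower-ranked claim.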
The actions
What the agent may do in this world, grouped by what it costs to be wrong:
| Action | Safety class | Effect |
|---|---|---|
| `summarize-data-room`, `build-cohort`, `screen-comps` | info | Read-only analysis. |
| `flag-risk`, `update-conviction`, `set-watch` | mutates | Changes the recorded view. |
| `decide-invest`, `set-check-size` | needs-approval | Gated on the partner and the IC. No capital is committed without a signed human decision. |
Today, policy and actions are a modeling lens you encode, not a contract the engine runs for you. You express the invariants, the source hierarchy, and the safety classes in how your fleet observes and acts; a declarative config surface that registers and enforces them is on the roadmap. So read these tables as the discipline your fleet honors today. What the engine supplies is the conviction each decision rests on and a record of what every action changed, so the IC can audit the chain before a dollar moves and an LP can later ask why the call was made.
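Since the safety classes are a discipline you encode yourself today, one way to honor them is a small gate in your own agent code. The sketch below is an assumption about how that might look, not an SDK feature: the `gate` function, the `Approval` shape, and the `log` flag are hypothetical, while the action names and classes come from the table above.

```typescript
type SafetyClass = 'info' | 'mutates' | 'needs-approval';

// Action names and classes taken from the actions table.
const ACTION_CLASS: Record<string, SafetyClass> = {
  'summarize-data-room': 'info',
  'build-cohort': 'info',
  'screen-comps': 'info',
  'flag-risk': 'mutates',
  'update-conviction': 'mutates',
  'set-watch': 'mutates',
  'decide-invest': 'needs-approval',
  'set-check-size': 'needs-approval',
};

// Hypothetical record of a signed human decision.
interface Approval {
  approvedBy: string;
  signedAt: Date;
}

type GateResult = { allowed: true; log: boolean } | { allowed: false; reason: string };

// Refuse unknown actions, and refuse gated actions that lack a signed human decision.
function gate(action: string, approval?: Approval): GateResult {
  const cls = ACTION_CLASS[action];
  if (cls === undefined) return { allowed: false, reason: `unknown action: ${action}` };
  if (cls === 'needs-approval' && !approval) {
    return { allowed: false, reason: `${action} requires a signed human decision` };
  }
  // Anything beyond read-only analysis is flagged for the audit record.
  return { allowed: true, log: cls !== 'info' };
}

const readOnly = gate('build-cohort');
const blocked = gate('set-check-size');
const approved = gate('decide-invest', { approvedBy: 'partner', signedAt: new Date(0) });
```

Denying unknown actions by default keeps a mistyped or newly added action from slipping past the gate unclassified.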
Plan & act
```typescript
import Beliefs from 'beliefs'

const beliefs = new Beliefs({
  apiKey: process.env.BELIEFS_KEY,
  agent: 'diligence-agent',
  namespace: 'deal:helio',
  writeScope: 'space',
})

// 1. Orient: read the current conviction; the ranked next moves ride back on it
const context = await beliefs.before('What is the biggest open risk to the Helio thesis?')
const [move] = context.moves // top-ranked, already in hand; no extra call
// → { action: 'gather_evidence', subType: 'verify-cohort-retention', valueOfInformation: 0.8, ... }

// 2. Act: your agent dispatches the tool the move points to
const cohort = await runTool(move.subType) // 'verify-cohort-retention' → your data-room and analytics tool

// 3. Fold the result back in; it becomes the next observation
await beliefs.after(cohort.text, { source: 'Raw cohort export, reconciled to the billing ledger' })
```

The verdict the world model renders is the conviction on Helio as the evidence stands today, and that is what a recall layer cannot do. A RAG store over the data room would hand the partner the pitch deck, the management retention slide, and the raw cohort export on equal footing, and let an agent write a memo on the 130% number the cohorts already undercut. thinkⁿ ranks verify-cohort-retention above update-conviction, because resolving the load-bearing retention claim moves the thesis more than another paragraph built on management's framing. Once the export folds in, the contradiction surfaces, the conviction settles where the primary evidence supports it, and the partner can walk every line of the memo back to its source. The human stays on the loop for the decision and the check.
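The `runTool` call in the loop is your own dispatcher, not something the SDK supplies. A minimal sketch of one, assuming a simple registry keyed by the move's `subType`, might look like this; the `ToolResult` and `Tool` types, the registry, and the placeholder tool body are all hypothetical.

```typescript
// Hypothetical dispatcher: maps a move's subType to a tool your agent owns.
interface ToolResult {
  text: string; // the finding, in a form that can be folded back as an observation
}

type Tool = () => Promise<ToolResult>;

const tools: Record<string, Tool> = {
  // Placeholder body: a real tool would query your data room and analytics stack.
  'verify-cohort-retention': async () => ({
    text: 'placeholder: cohort reconciliation summary goes here',
  }),
};

// Fail loudly on an unregistered subType rather than silently skipping the move.
async function runTool(subType: string): Promise<ToolResult> {
  const tool = tools[subType];
  if (!tool) throw new Error(`no tool registered for move subType "${subType}"`);
  return tool();
}
```

Throwing on an unknown `subType` means a ranked move the agent cannot execute is reported as a failure instead of being dropped from the loop unnoticed.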
Emerging: plan several moves ahead
As a fund builds resolved history across deals, the agent can forecast which diligence move most sharpens conviction before doing the work: beliefs.forecast.predict(['verify-cohort-retention', 'call-design-win-reference', 'check-key-person-risk']). Forecasting stays deliberately low-confidence until the workspace has enough resolved diligence outcomes to calibrate against what those checks actually turned up, so a forecast is never mistaken for a track record. See Moves.