Committing a Fact: A Trust Boundary for LLM-Extracted Graph Data

I'm a social data analyst working at the intersection of data science, AI integration, and sociopolitical research. I build reproducible studies (open data, validated methods, robustness checks, honest limitations) on questions like how the press frames African elections, where internet shutdowns cost the most, and how violence and displacement move through a conflict. I also build the open-source tooling that makes that kind of analysis repeatable: graph-investigation workbenches and retrieval systems for reading social dynamics out of text. Based in the Netherlands. Python-first, provider-agnostic by design.

A language model reads a sentence from a news wire and returns, cleanly formatted, {value: 40, property: casualty_count, source: "Source X", confidence: "high"}. It looks like a fact. It is a guess: a probabilistic reading of one ambiguous sentence by a system that cannot reliably tell a confirmed count from a rumor it half-recognized, a translation artifact, or a number it carried over from the previous paragraph. The naive pipeline takes that JSON and writes it straight into the graph. In a single step, a text-prediction model has been handed write access to the system of record.

I keep coming back to one image for what is wrong with that step. A proposal arriving from an extraction model is a claimant at a border, not a resident. It has documents that may or may not be genuine, it may be the same person as someone already inside under a different name, and it does not get to wave itself through. Something has to stand between the claim and the record: an inspection, a decision, and a single authority empowered to admit it. This piece is about building that crossing.

To be clear about the target, this is not an argument for distrusting language models. Extraction is exactly the work they are good at, and reading messy prose into structured proposals is the part of the pipeline I would least want to hand-code. The problem is narrower and sharper: extraction is not the same thing as authority, and a system that conflates the two has no defensible answer to the question "who decided this was true?"

One artifact, enriched and never overwritten

Follow a single batch through the system and a discipline becomes visible. A manifest is created when the batch arrives, and every stage that touches it adds to it rather than replacing it. Extraction produces a consolidated manifest. Resolution reads that and writes a resolution-annotated version. Curation reads that and writes a curated version. At no point does a stage reach back and erase what an earlier stage decided; the file grows a new layer of annotation and carries the old ones forward, so the final artifact contains a complete account of how it came to be.
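
A minimal sketch of that discipline, assuming a hypothetical `Manifest` type whose only operation is appending a stage layer. The stage names follow the text; the field names, the counts, and the JSON-lines rendering are illustrative choices, not the actual manifest format:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    stage: str          # which stage wrote this layer
    annotations: dict   # what that stage decided


@dataclass(frozen=True)
class Manifest:
    batch_id: str
    layers: tuple = ()  # a tuple, so earlier layers cannot be replaced in place

    def enrich(self, stage: str, annotations: dict) -> "Manifest":
        """Return a new manifest with one more layer; old layers are carried forward."""
        new_layer = Layer(stage, dict(annotations))
        return Manifest(self.batch_id, self.layers + (new_layer,))

    def to_lines(self) -> str:
        """Render one JSON object per layer so an operator can read the history."""
        return "\n".join(
            json.dumps(
                {"batch": self.batch_id, "stage": layer.stage, **layer.annotations},
                sort_keys=True,
            )
            for layer in self.layers
        )


# Each stage reads the previous artifact and writes an enriched one.
extracted = Manifest("batch-001").enrich("extraction", {"proposals": 12})
resolved = extracted.enrich("resolution", {"merged": 3})
curated = resolved.enrich("curation", {"accepted": 8})
```

Because `enrich` returns a new object, `extracted` still holds exactly one layer after curation has run, and `curated.to_lines()` gives one readable line per stage.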

This is the same principle the companion article applied to data at rest, now applied to data in motion. There, the point was that a fact's history is part of the fact, so the store never overwrites a value. Here, the point is that a fact's processing history is part of its provenance, so the pipeline never overwrites a stage's output. Provenance accumulates in both directions, at rest and in flight, and for the same reason: in this domain, the audit trail is not overhead, it is the product.

The pattern has a name. A shared artifact that successive analyzers enrich without destroying earlier layers is the idea behind annotation pipelines like UIMA's Common Analysis Structure (CAS). The adaptation worth noting is making the artifact a human-readable line format that carries its own decisions inline, so the manifest is something an operator can open and read rather than a black box passed between services.

Proposals, not writes

Now look inside extraction. Four agents run in a fixed order: one locates places, then one identifies actors, then one extracts events, then one binds those into claims. Each agent reads the proposals the previous agents left in a shared workspace, so the actor agent can attach to the locations already found and the claim agent can reference actors and events already proposed. The order is deliberate and the agents never run in parallel.

Why give up the obvious throughput win of running them concurrently, or the flexibility of letting a swarm of autonomous agents negotiate the result among themselves? Because determinism and auditability are worth more here than speed. A fixed sequence means a rerun on the same input produces the same manifest, which is the difference between a bug you can reproduce and one you cannot. It means an auditor can read the pipeline as a story with a beginning and an end. The architecture descends from blackboard systems, where independent agents collaborate through a shared structure, but here the agents run along a plain pipes-and-filters spine. The decision to keep it strictly sequential is a stance, not a v0 apology.

What matters most about this stage is what it cannot do. None of these agents can write to the database. They read a contract and emit proposals, and a proposal has no standing. The sequential design is not a performance confession to be optimized away later; the serialization is a control I would keep even if the clock said otherwise.
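
The shape of this stage can be sketched as follows, under loud assumptions: the four agent functions are hypothetical stubs that return canned output in place of model calls, and `Proposal`, `PIPELINE`, and `run_extraction` are illustrative names. What the sketch does show is the fixed order, the shared workspace, and the absence of any database handle:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    kind: str      # "place", "actor", "event", or "claim"
    ref: str       # placeholder reference, not a database identifier
    payload: dict


# Stub agents: a real agent would call a model. Each one only reads the
# workspace it is handed and returns new proposals.
def place_agent(text, workspace):
    return [Proposal("place", "place:0", {"name": "Town A"})]


def actor_agent(text, workspace):
    places = [p for p in workspace if p.kind == "place"]
    return [Proposal("actor", "actor:0", {"name": "5th Brigade", "located_in": places[0].ref})]


def event_agent(text, workspace):
    actors = [p for p in workspace if p.kind == "actor"]
    return [Proposal("event", "event:0", {"type": "clash", "actor": actors[0].ref})]


def claim_agent(text, workspace):
    events = [p for p in workspace if p.kind == "event"]
    return [Proposal("claim", "claim:0",
                     {"about": events[0].ref, "property": "casualty_count", "value": 40})]


# The order is data, not scheduling: places, then actors, then events, then claims.
PIPELINE = (place_agent, actor_agent, event_agent, claim_agent)


def run_extraction(text):
    workspace = []
    for agent in PIPELINE:
        # Agents see a read-only snapshot of what earlier agents proposed.
        workspace.extend(agent(text, tuple(workspace)))
    return workspace


proposals = run_extraction("...news wire text...")
```

Nothing in `run_extraction` accepts a connection or a store, so the only thing the stage can produce is a list of proposals for later stages to judge.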

Resolution: deciding what is the same thing

Two sources describe what is probably one military unit. One calls it "the 5th Brigade," another "5th Mechanized," and a gazetteer the system already holds lists "5th Mechanized Infantry Brigade." Resolution is the stage that decides these are one entity rather than three, and it does so with a cascade that spends cheap effort before expensive effort.

The cascade tries the inexpensive, high-precision test first and escalates only when it fails. An exact match against the gazetteer settles the clear cases. Edit-distance catches the near-misses, the typos and truncations. Embedding similarity handles the genuinely hard residue, where two strings mean the same thing without looking alike. Past the actor-matching step, the same stage deduplicates events and claims, collapses redundant edges that several sources all assert, and orders the assertion history so the record reads in time sequence. Where one entity resolves to a second that resolves to a third, the chain is flattened so everything points to a single canonical node in one hop.
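
A minimal sketch of the cheap-to-expensive cascade for actor names. Several things here are assumptions: the thresholds are arbitrary, `difflib.SequenceMatcher` from the standard library stands in for a true edit-distance metric, and `embed_similarity` is a hypothetical callable the caller supplies, since the text does not say which embedding model is used:

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence, Tuple


def normalize(name: str) -> str:
    return " ".join(name.lower().split())


def resolve(
    mention: str,
    gazetteer: Sequence[str],
    embed_similarity: Optional[Callable[[str, str], float]] = None,
    fuzzy_threshold: float = 0.85,   # assumed value, tune on real data
    embed_threshold: float = 0.80,   # assumed value, tune on real data
) -> Tuple[Optional[str], str]:
    """Return (canonical name or None, name of the tier that decided)."""
    key = normalize(mention)

    # Tier 1: exact match after normalization. Cheapest, highest precision.
    for entry in gazetteer:
        if normalize(entry) == key:
            return entry, "exact"

    # Tier 2: string similarity for typos and small spelling variants.
    scored = [(SequenceMatcher(None, key, normalize(e)).ratio(), e) for e in gazetteer]
    if scored:
        score, entry = max(scored)
        if score >= fuzzy_threshold:
            return entry, "fuzzy"

    # Tier 3: embedding similarity, only if the caller provided a scorer.
    if embed_similarity is not None and gazetteer:
        score, entry = max((embed_similarity(mention, e), e) for e in gazetteer)
        if score >= embed_threshold:
            return entry, "embedding"

    return None, "unresolved"


GAZETTEER = ["5th Mechanized Infantry Brigade"]
```

Returning the deciding tier alongside the match keeps the resolution step auditable: a reviewer can see whether an identity came from an exact hit or from the most expensive and least certain tier.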

None of this is new in its parts. Cheap-to-expensive matching is the standard shape of record linkage, with a lineage running back through the Fellegi-Sunter model, and flattening merge chains is path compression from the union-find data structure under a different name. Resolution proposes identity, and like extraction before it, it still commits nothing.
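
The chain flattening can be sketched in a few lines. This assumes merges are recorded as a hypothetical `merged_into` mapping from an entity to the entity it was merged into; the function name and the cycle check are additions for illustration:

```python
def flatten_merges(merged_into: dict) -> dict:
    """Point every merged entity at its final canonical node in one hop."""
    canonical = {}
    for start in merged_into:
        path = []
        node = start
        # Walk the chain until we reach a root or an already-flattened node.
        while node in merged_into and node not in canonical:
            if node in path:
                raise ValueError(f"merge cycle detected at {node!r}")
            path.append(node)
            node = merged_into[node]
        root = canonical.get(node, node)
        # Path compression: everything visited now points straight at the root.
        for visited in path:
            canonical[visited] = root
    return canonical


flat = flatten_merges({
    "5th Brigade": "5th Mechanized",
    "5th Mechanized": "5th Mechanized Infantry Brigade",
})
```

Here both aliases end up pointing directly at "5th Mechanized Infantry Brigade", so a lookup never has to follow more than one link.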

The gate: governance made executable

A curated proposal reaches a reviewer that returns exactly one of four verdicts: accept, reject, modify, or escalate. The reviewer is itself a model pass with the criteria written into it, backed by escalation to a human for the cases the system should not decide alone. This is the stage where policy stops being a document and becomes code that runs on every proposal.

The companion article ended on a question it deliberately left open: if the record keeps every claim, who is allowed to assert, how is confidence assigned, and when is one claim permitted to supersede another? The gate is the answer made concrete. "Who may assert" is the accept rule. "When may a claim supersede another" is the logic that distinguishes a real correction from a value that is merely newer. "What do we do when we cannot tell" is escalate, the human-in-the-loop seam where the system declines to rule and hands the decision up rather than guessing. A pipeline without this stage can still store history beautifully; it just has no principled account of how anything got into that history.

The gate is not a quality filter bolted onto the end of an otherwise-finished pipeline. It is the boundary itself, the line a proposal crosses to stop being a claim and start being a record. Everything before it is provisional by construction.
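
The four-verdict contract can be sketched as a small type plus a reviewer function. The text describes the reviewer as a model pass with escalation to a human; the rule-based `review` below is only a stand-in for it, and its three checks (missing fields, unrecognized source, numeric string) are invented for illustration rather than taken from the actual policy:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    MODIFY = "modify"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    reason: str                      # every verdict carries its justification
    proposal: Optional[dict] = None  # the proposal to admit, possibly modified


def review(proposal: dict, known_sources: frozenset) -> Decision:
    """Hypothetical criteria; a real gate would encode the project's own policy."""
    if "value" not in proposal or "source" not in proposal:
        return Decision(Verdict.REJECT, "missing value or source")
    if proposal["source"] not in known_sources:
        # The system declines to rule and hands the decision to a human.
        return Decision(Verdict.ESCALATE, "unrecognized source")
    value = proposal["value"]
    if isinstance(value, str) and value.strip().isdigit():
        fixed = {**proposal, "value": int(value)}
        return Decision(Verdict.MODIFY, "normalized numeric string", fixed)
    return Decision(Verdict.ACCEPT, "passed all checks", proposal)


def admissible(decision: Decision) -> bool:
    """Only accepted or modified proposals may be handed to the writer."""
    return decision.verdict in (Verdict.ACCEPT, Verdict.MODIFY)


SOURCES = frozenset({"Source X"})
decision = review({"value": 40, "property": "casualty_count", "source": "Source X"}, SOURCES)
```

The useful property is that the return type admits exactly four outcomes, so "we were not sure" has to surface as an explicit `ESCALATE` rather than as a silent accept.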

The single writer

Here is the rule that makes the boundary real: exactly one component in the entire system is permitted to write to the store. Every other part reads shared contracts and emits proposals, and not one of them can mint an identifier or modify a node. The write privilege is not distributed with discipline; it is structurally unavailable everywhere but one place.

When an accepted manifest arrives at that one component, it opens a single atomic transaction and does all of the consequential work inside it: mint the node and edge identifiers, resolve the placeholder references that earlier stages used into real ids, and write the nodes, the edges, and the assertion-history rows together. The transaction either wholly succeeds or wholly rolls back, so the store is never left in a half-written state where some of a batch landed and some did not. The single writer is also the one place that stamps lifecycle metadata, because a single writer can hold an invariant that a dozen writers would each have to be trusted to respect.
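
A minimal sketch of the writer's transaction, using SQLite from the Python standard library as a stand-in for the real store. The three-table schema, the `commit_manifest` name, and the shape of its arguments are all assumptions made for the example:

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE edges (id TEXT PRIMARY KEY, src TEXT NOT NULL, dst TEXT NOT NULL, kind TEXT NOT NULL);
CREATE TABLE assertions (id INTEGER PRIMARY KEY, subject TEXT NOT NULL, batch_id TEXT NOT NULL);
"""


def commit_manifest(conn, batch_id, nodes, edges):
    """Write one accepted batch atomically.

    nodes: {placeholder_ref: label}
    edges: [(src_placeholder, dst_placeholder, kind)]
    """
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        # Identifiers are minted here and nowhere else.
        ids = {ref: uuid.uuid4().hex for ref in nodes}
        for ref, label in nodes.items():
            conn.execute("INSERT INTO nodes (id, label) VALUES (?, ?)", (ids[ref], label))
            conn.execute("INSERT INTO assertions (subject, batch_id) VALUES (?, ?)",
                         (ids[ref], batch_id))
        for src, dst, kind in edges:
            edge_id = uuid.uuid4().hex
            # Placeholder references become real ids; an unknown placeholder
            # raises KeyError, which rolls back the whole batch.
            conn.execute("INSERT INTO edges (id, src, dst, kind) VALUES (?, ?, ?, ?)",
                         (edge_id, ids[src], ids[dst], kind))
            conn.execute("INSERT INTO assertions (subject, batch_id) VALUES (?, ?)",
                         (edge_id, batch_id))
    return ids


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
minted = commit_manifest(
    conn, "batch-001",
    nodes={"actor:0": "5th Mechanized Infantry Brigade", "place:0": "Town A"},
    edges=[("actor:0", "place:0", "located_in")],
)
```

If a later batch refers to a placeholder that was never proposed, the `KeyError` fires after some node rows have already been inserted, and the rollback removes them, so no partial batch is left in the store.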

The lineage here is the single-writer principle, familiar from high-throughput designs like the LMAX architecture, meeting the dependency rule of clean and hexagonal architecture, where the core defines contracts and only an outer adapter is allowed to touch the database. The reflex objection is that one writer is a scaling bottleneck. At the scale this serves, a single operator running a bounded ingestion cadence, the serialization is not a cost I am paying. It is the feature I am buying: one place to reason about correctness, one place to audit, one throat to choke.

The honest seam

The transaction commits, and for a moment the system is in an awkward state. The record now holds the new facts, but the graph the analyst actually queries is a separate store, and it has not been rebuilt yet. The record is ahead of the projection. This window is unavoidable the moment you split a system of record from a derived view, and the only real choice is whether to hide it or name it.

I chose to name it. The manifest's status walks through an explicit sequence: the record is committed, the projection is marked pending, the rebuild runs, and only then is the batch marked complete. The pending state is a real, observable thing, not an implied gap between two lines of code. This is the transactional outbox pattern in spirit: commit to the authoritative store, record that a downstream propagation is owed, and carry it out as a tracked step rather than a hopeful side effect. Naming the gap does not close it, and it is not meant to. It converts an eventual-consistency window from a silent risk into a property you can monitor, alert on, and reason about.
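
The status walk can be sketched as a small state machine. The four states mirror the sequence described above; the enum member names and the `advance` helper are hypothetical:

```python
from enum import Enum


class BatchStatus(Enum):
    RECORD_COMMITTED = "record_committed"
    PROJECTION_PENDING = "projection_pending"
    PROJECTION_REBUILDING = "projection_rebuilding"
    COMPLETE = "complete"


# Each status has exactly one legal successor; COMPLETE is terminal.
_NEXT = {
    BatchStatus.RECORD_COMMITTED: BatchStatus.PROJECTION_PENDING,
    BatchStatus.PROJECTION_PENDING: BatchStatus.PROJECTION_REBUILDING,
    BatchStatus.PROJECTION_REBUILDING: BatchStatus.COMPLETE,
}


def advance(status: BatchStatus) -> BatchStatus:
    """Move a batch to its next status, refusing to skip or repeat steps."""
    if status not in _NEXT:
        raise ValueError(f"{status.value} is terminal")
    return _NEXT[status]


def projection_is_stale(status: BatchStatus) -> bool:
    """True while the record is ahead of the graph the analyst queries."""
    return status is not BatchStatus.COMPLETE


status = BatchStatus.RECORD_COMMITTED
history = [status]
while status is not BatchStatus.COMPLETE:
    status = advance(status)
    history.append(status)
```

A monitor can alert on any batch where `projection_is_stale` stays true for longer than the rebuild normally takes, which is what turns the window into something observable.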

What is actually mine to claim

Walk back through the crossing and almost every primitive has a citation. The enriched manifest is an annotation pipeline. The ordered agents over a shared workspace are a blackboard. The matching cascade is record linkage and the merge flattening is union-find. The one privileged writer is the single-writer principle wearing the dependency rule. The named pending state is an outbox. I did not invent any of these, and pretending otherwise would be both false and unnecessary.

What I will stand behind is the assembly and what it is assembled for. The contribution is a trust boundary for machine-generated facts: treat a model's output as a proposal with no standing, route it through a gate that is empowered to refuse it, and grant exactly one component the authority to admit what survives, with the whole crossing legible to a single operator. Plenty of LLM pipelines write model output directly into their store because each step looked reasonable on its own. Drawing one hard line between proposal and record, and putting a single writer on the far side of it, is the move most of them skip, and it is the move that determines whether you can ever answer for what your system believes.

That boundary will only get more important. As extraction models improve, their proposals will look more and more like finished facts, and the temptation to let them write directly will grow in exact proportion to how convincing they become. The bet embedded in this design is that the boundary matters most precisely when the proposals are good enough to wave through on reflex. The question to carry forward is not whether your model is accurate enough to trust. It is whether, the day it is wrong, you can point to the one place where the wrong thing was admitted, and say who let it in.


This describes a general pattern for committing LLM-extracted entities into a versioned graph store; implementation specifics are illustrative.