TL;DR
Most engineering-AI products announced in the last twenty-four months share the same four-part stack: a foundation model, a vector store over customer documents, a RAG pipeline, and a chat UI. That stack is a stochastic parrot perched on a filing cabinet of unknown provenance. Fluent output, ungrounded content, no schema, no invariants, no lifecycle awareness, no audit trail. Good enough for consumer productivity. Not good enough for a safety-critical, export-controlled, regulated engineering programme.
Clarity’s approach is structurally different. It rests on six principles:
- Separate AI capability from domain truth. Foundation models are commoditising utilities. The Lx world model is twenty years of irreplaceable domain IP. They must not be fused in a single prompt.
- One big agent is the wrong answer. Hundreds of narrow ones is the right answer. Each bounded to a specific slice of the Lx schema, each code-reviewed, each replaceable, each auditable.
- Ground every agent in an explicit world model, not in a vector search over a pile of PDFs. Clarity’s 13-layer Lx schema is that world model.
- Temper every generation with lessons learnt. A curated, typed knowledge graph of failure modes, root causes, mitigations, and best practices — injected as RAG context, not baked into training data.
- Bound every decision by DeZolve. The patent-pending Decision Intelligence Framework scores the evidential chain behind every L5 decision, distinguishing defensible decisions from well-intentioned guesses.
- Stay sovereign. Portable across any foundation model (Bring Your Own Model), deployable from open SaaS to fully air-gapped classified, with the world model, prompts, lessons-learnt graph, and DeZolve taxonomy all travelling as proprietary payload inside the deployment boundary.
The headline number — 325 AI agents — is the full-scope target across thirteen Lx layers, ten overlay groups, the reporting surface, and decision intelligence. Today, roughly two dozen production Lambdas and over 160 specialised prompt templates are live across the design plane (L0–L5). The architecture is designed for the full 325 from day one: adding an agent is a prompt file and a Lambda wire-up, not a retraining run.
If you only read one sentence: AI capability is a commoditising utility. Domain truth is irreplaceable intellectual property. The industry’s mistake is fusing them in a single prompt; Clarity’s answer is to keep them separate and let each do what it is uniquely good at.
The bargain on offer
Every week another vendor announces an “AI-powered” version of their product. Copilots for PLM. Agents for ERP. Assistants for MES. The marketing is indistinguishable. The architecture, underneath, is almost always the same: a large language model wired to a vector search over the customer’s unstructured documents, with a chat box on the front.
This is not decision intelligence. It is a stochastic parrot perched on a filing cabinet. The parrot produces fluent text. The filing cabinet contains whatever happened to be scanned, uploaded, emailed, or shared over the last decade — with no structure, no provenance, no invariants, and no semantic model of the system being engineered. The fluency of the parrot disguises the chaos of the cabinet.
For consumer productivity that is often good enough. For engineering a nuclear reactor, a satellite bus, a medical device, a submarine combat system, or an aircraft carrier catapult, it is catastrophically not. An engineering programme that cannot answer the question “what did we decide, when, on what evidence, and what has since changed about that evidence?” cannot be audited, cannot be certified, and cannot be safely evolved.
“Pointing a language model at SharePoint is not a strategy.
It is a lossy compression of twenty years of institutional knowledge
into a bag of word vectors, and then a fluent monologue over the bag.”
This whitepaper explains how Clarity approaches the problem differently. It has three sections:
- Section 1 — why most AI agents are stochastic parrots on a filing cabinet, and the importance of separating AI capability from explicit domain knowledge.
- Section 2 — machine learning, expert systems, and AI: a forty-year arc, and why semantic awareness is the missing middle every era has failed to supply.
- Section 3 — how the 325 AI agents in Clarity actually work: the Lx world model, the lessons-learnt knowledge graph, DeZolve, the prompt resolver, and the bounding primitives that make sovereign, auditable, hallucination-bounded AI possible.
Section 1 — The stochastic parrot on a filing cabinet
Clarity’s AI architecture starts from a blunt premise: the mainstream engineering-AI stack is structurally incapable of doing engineering work. Not because the foundation models are stupid — they are astonishing — but because the architecture around them fuses two things that must be kept apart.
This section explains what that stack actually is, why it fails in engineering contexts, what the stochastic parrot metaphor is really pointing at, and why the separation of AI capability from domain truth is the single most important design decision in bounded AI.
1.1 What a modern “AI agent” usually is
Strip the marketing away from any mainstream engineering-AI product launched in the last 24 months and you find, almost without exception, the same four components:
- A foundation model (GPT-4, Claude, Gemini, Llama, Mistral, Nova, DeepSeek) accessed over an API.
- A vector store built from embeddings of the customer’s documents (PDFs, Word files, drawings, specs, emails).
- A retrieval-augmented generation (RAG) pipeline that, on each user question, does a similarity search over the vector store and stuffs the top-k chunks into the prompt.
- A chat UI that hides all of the above behind a blinking cursor.
This stack is cheap to build, demos well, and scales to millions of users. It also has four structural properties that make it almost entirely unsuitable for engineering decision work.
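A deliberately minimal sketch makes the shape of that stack concrete. Everything here is a stand-in — the bag-of-words "embedding" and the prompt-stuffing are placeholders for a hosted embedding model, a vector database, and a foundation-model API — but the structure is faithful: similarity-sort, truncate, generate.

```python
# Minimal sketch of the four-part RAG-over-documents stack described above.
# The embedding and the model call are stand-ins, not a real implementation.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words vector. Real stacks use a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    # Similarity-sort then truncate: no schema, no provenance, no lifecycle awareness.
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # The top-k chunks are stuffed into the prompt and the model is asked to answer.
    context = "\n---\n".join(retrieve(query, chunks))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

corpus = [
    "Interface ICD-042 rev B defines the 28 V power interface.",
    "Draft note (superseded): the power interface margin is 12%.",
    "Minutes: the data interface uses MIL-STD-1553.",
]
print(build_prompt("What is the power interface margin?", corpus))
```

Note what is absent: nothing in this pipeline knows that one chunk is a superseded draft, that two of them describe different interfaces, or what an interface margin even is.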
1.2 Four structural failures of RAG-over-documents
| Property | What it means | Why engineers can’t rely on it |
|---|---|---|
| Semantic illiteracy | The model has no notion of component, interface, requirement, invariant, baseline, or decision beyond the statistical co-occurrence of those words in training data | It will cheerfully conflate a power interface with a data interface, or a functional requirement with a capability, because it has no schema to tell them apart |
| Provenance blindness | A retrieved chunk is just text. There is no record of who wrote it, under what authority, at what lifecycle stage, superseded by what, or whether the source is still valid | Engineers cannot defend a decision whose evidence chain stops at “the model said so because a PDF chunk mentioned it” |
| Hallucination by construction | When the retrieval returns nothing relevant, the model fills the gap with the most statistically plausible completion. This is indistinguishable, to the user, from a grounded answer | Silent fabrication is catastrophic in safety-critical, export-controlled, or regulated work |
| Filing-cabinet inheritance | The knowledge base is whatever documents happened to be uploaded. Duplicates, stale drafts, contradictory versions, marketing slides, and the CEO’s holiday photos all get embedded with equal weight | The cabinet becomes a snapshot of organisational entropy, not a model of the system |
The vendor pitch for this stack is always the same: “Point it at your SharePoint and ask questions.” The engineering reality is that pointing a language model at SharePoint is not a strategy. It is a lossy dimensionality reduction of twenty years of undocumented institutional knowledge into a bag of word vectors, and then a fluent monologue over the bag.
1.3 Stochastic parrot: the term and the trap
The phrase stochastic parrot comes from Bender, Gebru, McMillan-Major and Shmitchell (2021), describing how large language models generate text by sampling from a learned statistical distribution rather than by understanding meaning. The parrot repeats patterns that sound plausible. It does not know whether what it says is true, because true is not a category in its internal representation.
Used as a general-purpose writing aid, this is fine. Used as the substrate of a decision on a £250M capital programme, it is malpractice. The failure mode is not that the parrot is stupid — modern foundation models are astonishingly capable — but that it is fluent in a way that outruns its grounding. It will complete “the interface margin is” with a number, because that is the shape of the answer the question demanded. That number will be presented in a confident, well-formatted paragraph. It will be wrong in ways no human reviewer can detect without doing the underlying engineering themselves — in which case they did not need the agent.
The filing cabinet half of the metaphor is the other half of the failure. A parrot trained on a well-structured, authoritative corpus of verified engineering truths would be useful. A parrot trained on a random dump of whatever was in the customer’s shared drive is a liability wearing a lab coat.
1.4 The missing separation — AI capability vs. domain truth
The root cause of the filing-cabinet problem is that the industry has conflated two things that must be kept separate:
- AI capability — the ability of a foundation model to read natural language, extract structured claims, summarise, reformulate, draft, translate, code, classify, and reason over short contexts.
- Domain truth — the set of entities, relationships, invariants, constraints, and lifecycle states that describe the specific engineering system being built, and the history of decisions that have shaped it.
AI capability is general, improving rapidly, and commoditising. Domain truth is specific, hard-won, and irreplaceable. The first is a utility. The second is intellectual property.
“AI capability is a commoditising utility.
Domain truth is irreplaceable intellectual property.
The industry's mistake is fusing them in a single prompt.”
Mainstream engineering-AI products fuse the two by pouring domain truth into the foundation model’s context window as unstructured retrieved text, and then asking the model to reason about both capability and truth simultaneously. The model is forced to invent the structure of the domain every time it is prompted, because the structure is not present — only fragments of prose about it are. This is equivalent to hiring a brilliant graduate, locking them in a room with a filing cabinet every morning, and asking them to re-derive the company’s engineering model from scratch before answering the day’s first question. It is a waste of the graduate, and a disservice to the company.
Clarity takes the opposite approach. Domain truth lives in an explicit, structured, versioned, provenance-tracked model — the Lx world model. AI capability is bounded by that model. Every agent is a narrow specialist operating over a single, well-defined slice of the Lx schema, with the structure, vocabulary, invariants, and history supplied by Clarity, not invented by the LLM. The filing cabinet is replaced by a graph. The parrot is replaced by an army of narrow, auditable, replaceable specialists.
1.5 Four symptoms of the fusion mistake
Whenever AI capability and domain truth are fused in a single prompt-over-documents stack, four symptoms appear. Engineers on the receiving end will recognise all of them:
- Confident contradictions. Ask the same question twice, ten minutes apart, and get two different numbers. Both presented fluently. Neither traceable.
- Disappearing provenance. The answer cites “the design document” without naming which one, which revision, which section, or which authority. When pushed, the agent invents a citation that does not exist.
- Silent drift. The underlying documents get updated. The vector index gets re-built. The agent’s answers change. No changelog, no diff, no record of why the answer mutated. Regulators notice.
- Schema evaporation. Any attempt to enforce a structured output schema — JSON, a requirements table, a BOM row — results in field names quietly drifting between camelCase and snake_case, arrays arriving without their wrapper object, and entire required fields being omitted. Every integration downstream has to defensively re-parse.
None of these symptoms are bugs in any individual product. They are emergent properties of the fusion mistake. They cannot be engineered away at the product layer. They have to be engineered away at the architecture layer — by separating capability from truth, binding the AI to an explicit world model, and enforcing structure at every interface. That is what Clarity does, and it is what the remainder of this paper describes.
Section 2 — Machine learning, expert systems, and AI: the missing middle
To understand why Clarity’s architecture looks the way it does, it helps to remember that the engineering community has been trying to build useful decision-support tools for more than forty years, through three distinct eras of machine intelligence. Each era answered a different question, and each era taught a lesson the current era has largely forgotten.
2.1 Three eras, three answers, one failure
| Era | Dominant technology | Core question | Lesson learnt |
|---|---|---|---|
| ~1975–2000 | Expert systems (rule engines, Prolog, CLIPS, OPS5, Drools) | Can we encode what experts know as explicit rules? | Rules scale badly and rot fast; without a shared semantic model, every rule base becomes a tangle only its original author can maintain |
| ~2000–2020 | Classical machine learning (SVMs, random forests, gradient boosting, early neural nets) | Can we learn patterns from data without encoding rules? | Models are only as good as their features; without a semantic model, feature engineering becomes an unmaintainable art form |
| ~2020–present | Foundation models (transformers, LLMs, multimodal models) | Can we learn everything from scale? | Scale produces fluency, not grounding. Without a semantic model, fluency becomes hallucination |
The striking thing about this table is the final column. Every era failed in the same place: the absence of a shared, explicit, versioned semantic model of the domain. Expert systems died of semantic entropy. Classical ML died of feature-engineering fatigue. Foundation models are dying — in high-stakes engineering contexts — of ungrounded fluency.
The missing middle is the same in every era. It is the world model. The thing that says this is a component, this is an interface, this is a requirement, this is a baseline, this is a decision, this is the relationship between them, and this is what must remain true as they evolve.
Clarity’s founders spent twenty years building, breaking, rescuing, and ultimately being frustrated by expert systems across defence, nuclear, aerospace, health, energy, and advanced manufacturing. The Lx world model is the distilled answer to the lessons of that twenty years, carried forward into the foundation-model era — not replaced by it, but augmented by it.
2.2 Expert systems — what they got right, what they got wrong
Expert systems were the first serious attempt to make machines reason about engineering domains. They worked, for a while, and on constrained problems they still do. What they got right was the insight that engineering reasoning needs an explicit, auditable chain of inference — a reviewer must be able to point at a conclusion and ask why?, and the system must be able to answer.
What they got wrong was everything around that chain:
Brittleness
Rules written for one programme did not transfer to the next. Every new project was a cold start.
Opacity
Rules encoded as flat lists had no structural relationship to the system being reasoned about. A reviewer could see what the rule said but not why it existed, what it depended on, what depended on it, or what changed when it was edited.
Semantic drift
Without a shared ontology, different authors used the same word for different things and different words for the same thing. After two years the rule base was unmaintainable by anyone except the original author, and after five years it was unmaintainable even by them.
No lifecycle awareness
Rules were stateless. They had no notion of this requirement was valid at preliminary design but was superseded at critical design review. Every rule base gradually became a record of the past rather than a model of the present.
Every one of these failure modes recurs, in a new costume, in today’s RAG-over-documents stacks. Brittleness becomes “it works on our demo corpus but not yours”. Opacity becomes “why did the agent say that?”. Semantic drift becomes “the model uses inconsistent field names”. No lifecycle awareness becomes “the answer changed and we don’t know why”.
The lesson that expert systems taught, and that must be carried forward, is this: explicit reasoning requires an explicit ontology of the domain, with structural relationships, lifecycle states, and provenance built in from the first commit, not retrofitted from the tenth.
2.3 Classical ML — the feature-engineering detour
Classical machine learning — SVMs, random forests, gradient-boosted trees, early feedforward neural networks — was, in hindsight, a detour. It promised to replace the rule base with a learned function over features, removing the brittleness of hand-coded rules. And within narrow, data-rich domains (fraud detection, ad click prediction, predictive maintenance on well-instrumented machinery) it delivered. In engineering decision support, it mostly did not.
The reason is subtle. Classical ML works when the features are stable, the labels are reliable, and the training distribution looks like the test distribution. Engineering programmes violate all three assumptions:
- Features change as the design changes.
- Labels are scarce, expensive, and disputed.
- The training distribution is the last three programmes, and the test distribution is the next one — which is, by definition, novel.
Transfer does not work, because there is no shared semantic layer that makes one programme’s features meaningful to another’s model.
What ML did contribute was a mental shift away from hand-coded rules toward data-driven inference. That shift is valuable. But it also entrenched a blind spot: the belief that if you only had enough data and enough compute, the model would figure out the domain on its own. That belief has carried forward into the foundation-model era almost unchanged, and it is wrong for the same reason it was wrong in 2010. Engineering domains are not learnable from raw data alone. They require an explicit ontology — not because the model cannot, in principle, learn one, but because the cost of bugs in the ontology is catastrophic, and the cost of making the ontology explicit is low.
2.4 Foundation models — fluency without grounding
Foundation models are a genuine breakthrough. They read natural language, extract structured claims, generate code, summarise long documents, translate across domains, and reformulate ideas in ways that classical ML never could. For the tasks Clarity uses them for — and only for those tasks — they are the right tool.
But they are not a substitute for a world model. A foundation model, prompted with an engineering question and retrieved document fragments, is doing the same thing an expert system did in 1985: reasoning over a context. The difference is that an expert system’s context was explicit, auditable, and version-controlled; a foundation model’s context is a snapshot of a vector search over a pile of PDFs. The expert system could be audited line by line. The foundation model cannot.
The fluency of the model disguises this. An answer that would have been “ERROR: insufficient rules to conclude” in 1985 is now “Based on the available information, the interface margin is approximately 12%” — a sentence that sounds authoritative, reads well, and is completely unfalsifiable without re-doing the engineering. The user’s cognitive load goes up, not down, because now they have to audit the fluency in addition to the answer.
“An expert system said ‘ERROR: insufficient rules to conclude’.
A foundation model says ‘Based on the available information,
the interface margin is approximately 12%’.
The second sentence is more dangerous than the first.”
2.5 Carrying the lesson forward
The Clarity world model — the Lx schema — is what an expert system always should have been, rebuilt with the lessons of forty years. It is:
- Explicit — every entity type, relationship type, and lifecycle state is defined in a schema, not inferred from text.
- Versioned — every change is tracked, attributed, dated, and reversible.
- Provenance-carrying — every field records where it came from, who put it there, what confidence was attached, and what evidence supports it.
- Lifecycle-aware — entities move through thirteen explicit lifecycle phases (L0–L12), each with its own semantic rules and transition conditions.
- Overlay-extensible — domain-specific concerns (financial, security, regulatory, risk, supply chain, export control, quality) are overlays on the base schema, not forks of it.
- Invariant-enforced — three rings of validation catch violations at annotation time, at change-approval time, and at solver time.
- Lessons-learnt-aware — historical failure modes, root causes, and mitigations are carried forward as a knowledge graph that is injected as context into every generation.
And then — then — foundation models are brought in. Not as a replacement for the world model, but as a capability layer bolted to it. Each AI agent is a narrow specialist operating over a single slice of the Lx schema, with the ontology, vocabulary, invariants, and history supplied by Clarity, not invented by the LLM. The agents do what foundation models are uniquely good at (reading, extracting, summarising, drafting, translating). The world model does what foundation models are uniquely bad at (grounding, provenance, consistency, lifecycle awareness, invariants).
That is the missing middle. That is what every previous era failed at. And that is what the third section of this paper describes in structural detail.
Section 3 — How the 325 AI agents in Clarity work
Clarity’s platform architecture is designed around the principle that there should not be one big AI agent. There should be hundreds of small ones, each narrow, each bounded, each auditable, each replaceable, and each tied to a specific slice of the Lx world model.
This section describes, as structurally as possible and while respecting the patent protections around the Clarity IP, how those agents are composed, bounded, and kept honest.
3.1 The headline number, honestly
The headline number — 325 AI agents — is the full-scope target across the thirteen Lx layers (L0 stakeholder intent through L12 disposal), the ten overlay groups, the reporting surface, the decision-intelligence layer, and the cross-layer orchestration. Not every one of those 325 is deployed in the current pre-beta release. Today, roughly two dozen production Lambdas and over 160 specialised prompt templates are live across the design-plane layers (L0–L5). The remaining agents are the implementation-plane expansion (L6–L12) plus the overlay and reporting specialisations that follow once the base is stable.
We report the full number because the architecture is designed for it from day one. Adding an agent is a matter of authoring a new prompt template, registering it in the resolver, and wiring it to a Lambda. No retraining. No foundation-model change. No schema migration. The scaffolding is already there.
What follows is a structural description of how an individual agent is bounded, how agents compose, and how the whole army is kept honest by DeZolve. It is deliberately IP-preserving: the whitepaper describes the shape of the architecture, not the specific algorithms, prompt contents, or scoring formulas that are subject to patent protection.
3.2 Anatomy of a single bounded agent
Every AI agent in Clarity is composed of five parts. None of them is a foundation model alone. The foundation model is the least interesting component.
| Component | What it provides | Why it matters |
|---|---|---|
| Lx schema slice | A narrow, well-defined fragment of the Lx world model — e.g. L2 option sets, L3 scenarios, L0 invariants — with explicit entity types, field definitions, and cross-references | The agent is scoped to a single kind of engineering object; it cannot wander across the schema |
| Prompt template | A versioned, S3-hosted, hierarchically-resolved prompt file specifying task, inputs, outputs, schema, and constraints | The agent’s behaviour is code-reviewed, diff-able, and hot-swappable without redeploying Lambdas |
| RAG context | Targeted retrieval from the world-model JSON, relevant overlays, the lessons-learnt knowledge graph, and (where appropriate) the tenant’s imported standards library | The agent’s grounding comes from authoritative, structured sources — not from a vector search over whatever was uploaded |
| Foundation-model adapter | The LLMClient abstraction, which selects a concrete model from a group registry (generation, extraction, classification) with runtime fallback and IAM-bounded access | The agent is portable across Nova, Claude, DeepSeek, or any future Bedrock or non-Bedrock model without changing any other component |
| Response validator | A three-pass JSON sanitiser, a dual-key field-name guard, structured invariant checks, and a provenance annotator that records lineage, confidence, evidence, and authorship on every written field | The agent’s output is structurally verified before it touches the system of record |
An agent, in this architecture, is not an LLM. It is the five-part composition above. Changing the LLM is the smallest possible change. Changing the prompt is a bigger one. Changing the Lx schema slice is bigger still. Changing the validator is a major architectural decision. The cost of change is correctly proportional to the semantic weight of what is being changed — which is the opposite of how stochastic-parrot stacks behave.
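An illustrative composition, with hypothetical class and field names, shows the shape of that five-part agent — the foundation model is one injected dependency among five, not the agent itself:

```python
# Illustrative five-part composition of a bounded agent. Names are hypothetical;
# this shows the shape of the composition, not Clarity's internal implementation.
from dataclasses import dataclass
from typing import Callable, Protocol

class LLMClient(Protocol):
    def invoke(self, group: str, prompt: str) -> str: ...

@dataclass(frozen=True)
class BoundedAgent:
    schema_slice: dict               # the Lx entity types and fields this agent may touch
    prompt_template: str             # resolved from the versioned prompt hierarchy
    rag_context: dict                # world-model JSON, overlays, lessons-learnt entries
    llm: LLMClient                   # group-based adapter; concrete model resolved at runtime
    validate: Callable[[str], dict]  # sanitiser + schema, invariant, and provenance checks

    def run(self, task_input: dict) -> dict:
        prompt = self.prompt_template.format(context=self.rag_context, input=task_input)
        raw = self.llm.invoke("generation", prompt)
        # Output is structurally verified before it can reach the system of record.
        return self.validate(raw)
```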
3.3 The prompt resolver — 160 agents from one abstraction
Clarity’s prompt layer is an S3-hosted hierarchy addressed by a four-part key: scope, surface, layer, intent. Every agent’s prompt is resolved at runtime by walking that hierarchy, with category-specific overrides falling back to generic defaults.
```
prompts/
  global/
    shell/panel/chat/base.json              ← reusable chat scaffolding
  tab/
    model/sidebar/L0/extract-requirements.json
    model/sidebar/L0/extract-invariants.json
    options/sidebar/L2/generate-options.json
    options/sidebar/L2/generate-parameters.json
    scenarios/sidebar/L3/generate-l3-scenario.json
    scenarios/sidebar/L3/analyze-monte-carlo.json
  reporting/{report-code}/{insight-type}.json
  ...
```
Today over 160 prompts are registered. Each one is a separately-versioned JSON document, code-reviewed, diff-able, and hot-swappable without a redeploy. The resolver is boring by design: given (scope, surface, intent, layer) it returns the right prompt. No prompt content is ever hard-coded in a Lambda. Missing prompts raise an explicit error, not a silent fallback.
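A minimal resolver sketch, with an in-memory dict standing in for the S3 bucket and an assumed fallback order (category-specific path first, then a global default), illustrates the contract:

```python
# Minimal resolver sketch. The path convention mirrors the hierarchy above;
# the dict stands in for S3, and the fallback order is an illustrative assumption.
class PromptNotFound(Exception):
    pass

def resolve_prompt(store: dict[str, dict], scope: str, surface: str,
                   layer: str, intent: str) -> dict:
    candidates = [
        f"prompts/tab/{scope}/{surface}/{layer}/{intent}.json",   # category-specific override
        f"prompts/global/{surface}/{intent}.json",                # generic default
    ]
    for key in candidates:
        if key in store:
            return store[key]
    # Missing prompts are an explicit error, never a silent fallback.
    raise PromptNotFound(f"No prompt for ({scope}, {surface}, {layer}, {intent})")

s3_stub = {"prompts/tab/options/sidebar/L2/generate-options.json": {"task": "..."}}
print(resolve_prompt(s3_stub, "options", "sidebar", "L2", "generate-options"))
```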
The consequence is that behaviour lives in data, not in code. A prompt engineer can tune an L2 option-generation agent without touching the Lambda that invokes it. A security reviewer can diff every prompt change across a release. A compliance officer can snapshot the prompt corpus for audit. In an air-gapped deployment, the prompt corpus is bundled with the world model and ships as part of the proprietary payload — nothing is phoned home, nothing is mutated in flight.
3.4 Bounding by the Lx world model
The single most important design decision in Clarity’s AI architecture is this: the agents do not invent the world model. The world model is supplied to them, as context, by the Lx repository.
The Lx world model is a 13-layer hierarchy of engineering concerns:
- L0 Intent — stakeholders, needs, capabilities, functions, constraints, assumptions, invariants, risks, questions, facts, goals
- L1 Context — system boundary, external actors, interfaces to the world
- L2 Options — option sets, options, interfaces, parameters (Measure of Performance)
- L3 Scenarios — scenarios, analyses, Measure of Effectiveness overrides, Monte Carlo, Pareto, MCDA
- L4 Baselines — change management, ECRs, ECNs, configuration items
- L5 Decisions — decision records, evidence, trustworthiness (Measure of Success)
- L6–L12 Implementation plane — as-designed, as-built, as-validated, as-deployed, as-operated, as-updated, as-disposed
Every layer has an explicit entity type list, a field schema, a set of allowed relationships, and a set of lifecycle rules. Overlays — financial, supply chain, technology readiness, regulatory, lifecycle, security, risk, external systems, quality, and export/international — add annotations without forking the base schema. Every write to any Lx layer carries a @source provenance record with thirteen fields including lineage, timestamp, author, confidence, evidence references, and tool attribution.
When a Clarity AI agent generates an L2 option set, it does not generate it from nothing. It is handed the L1 system boundary, the applicable invariants from L0, the existing option sets from prior runs, the relevant lessons-learnt entries, and the applicable standards — all as structured JSON, not as text chunks. Its job is not to invent the structure of an option set. Its job is to populate a structure that Clarity already owns.
The foundation model, in this arrangement, is doing what it is uniquely good at: reading the supplied context, extracting candidate entities, drafting field values, and returning structured JSON that conforms to the supplied schema. It is not being asked to remember what an option set is, what fields it has, what constraints apply to it, or how it relates to L1 or L3. Those are questions the world model already answers, with twenty years of institutional knowledge embedded in the schema.
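A sketch of what that handover looks like for an L2 option-generation call — the field names are illustrative, but the principle is the point: the agent receives structured world-model JSON plus the output schema it must populate, never retrieved text chunks.

```python
# Illustrative context assembly for an L2 option-generation call.
# Field names are assumptions; the structure is supplied by Clarity, not the model.
import json

def assemble_l2_context(l1_boundary: dict, l0_invariants: list[dict],
                        existing_option_sets: list[dict],
                        lessons: list[dict], standards: list[dict]) -> str:
    context = {
        "l1SystemBoundary": l1_boundary,
        "l0Invariants": l0_invariants,
        "existingOptionSets": existing_option_sets,
        "lessonsLearnt": lessons,
        "applicableStandards": standards,
        # The output schema is supplied to the model, not invented by it.
        "outputSchema": {
            "optionSet": {
                "optionSetId": "string",
                "options": [{"optionId": "string", "description": "string",
                             "parameters": [{"name": "string", "value": "number",
                                             "unit": "string"}]}],
            }
        },
    }
    return json.dumps(context, indent=2)
```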
3.5 World-model binning — how the schema partitions RAG for batching
Every foundation model has a prompt window. Nova Micro handles 128K tokens. Nova Pro and Claude Sonnet are practical up to around 200–300K before latency and reliability degrade. In theory these limits are generous. In practice, an engineering programme with 3,000 requirements, 400 options, 200 interfaces, 80 scenarios, and fifteen years of lessons-learnt material blows past any of them in a single naive prompt.
The mainstream RAG response to this problem is similarity-sort then truncate: embed everything, fetch the top-k most similar chunks, stuff them into the prompt. That approach has no idea what it is keeping and what it is throwing away. It discards a critical invariant because the invariant happens to use unusual vocabulary. It keeps three copies of the same fact because they all embedded close to the query. It drops a supersession record because the replaced document still looks relevant.
Clarity uses the world model to do something different — binning. Because every entity lives in a known Lx layer, with a known type, known relationships, and known overlays, the RAG assembler can partition the full working set into semantically coherent bins before it ever touches the foundation model:
- Layer-bin — include L0 invariants, L1 boundary, and L2 option sets; exclude L3/L4/L5 unless the task specifically requires them.
- Type-bin — for an interface-extraction agent, pass only the interface entities and their counterpart components; leave parameters, constraints, and risks for their own bins.
- Overlay-bin — for a regulatory compliance check, include the regulatory and export-control overlays; leave financial, supply-chain, and TRL overlays out of the prompt.
- Scope-bin — include only the portion of the Lx hierarchy under the option set currently in focus; leave sibling option sets for a different call.
- Lessons-bin — filter the LLKG to the domain and invariant category of the current task; a maritime Monte Carlo run does not need aerospace lessons.
Each bin becomes a batch. A batch is a prompt the Lambda can actually fit in the target model’s window, with the guarantee that everything relevant to the task is inside and everything irrelevant has been excluded. The world model makes this binning possible because it knows what everything is. A RAG stack without a world model cannot bin by semantics — it can only bin by similarity, which is a lossy proxy for meaning.
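A sketch of the binning step, with assumed entity fields (layer, type, overlays, scopePath) standing in for the Lx schema, shows how partitioning by what an entity is differs from partitioning by what it resembles:

```python
# Sketch of world-model binning: entities are partitioned by what they are,
# not by embedding similarity. The entity fields used here are assumptions.
from collections import defaultdict

def bin_entities(entities: list[dict], task: dict) -> dict[str, list[dict]]:
    bins: dict[str, list[dict]] = defaultdict(list)
    for e in entities:
        if e["layer"] not in task["layers"]:
            continue                                  # layer-bin: exclude irrelevant layers
        if task.get("types") and e["type"] not in task["types"]:
            continue                                  # type-bin: only the entity kinds needed
        if task.get("scope") and not e.get("scopePath", "").startswith(task["scope"]):
            continue                                  # scope-bin: only the focused subtree
        overlays = set(e.get("overlays", [])) & set(task.get("overlays", []))
        key = f'{e["layer"]}/{e["type"]}'
        if overlays:
            key += "/" + "+".join(sorted(overlays))   # overlay-bin where requested
        bins[key].append(e)                           # each bin becomes one focused batch
    return dict(bins)

# Lessons-learnt entries are filtered separately, from the LLKG, into their own bin.
task = {"layers": {"L0", "L1", "L2"}, "types": {"invariant", "interface"},
        "overlays": ["regulatory"]}
```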
The payoff is compounding. Batches are smaller, so latency is lower. Batches are focused, so hallucination risk is lower — the model cannot confabulate about entities that are not in its window. Batches are cheaper, because smaller prompts burn fewer tokens. And — most importantly — batches are verifiable, which is the subject of the next sub-section.
3.6 Batching as a validation mechanism — catching the black box
Foundation models are black boxes. No vendor publishes the decision process that produced a particular output. No amount of chain-of-thought prompting makes the internals of a 400-billion-parameter network auditable. For an engineering platform, this is structurally unacceptable — and structurally unavoidable. The question is not how do we open the box? The question is what do we do given that we cannot?
Clarity’s answer is to treat batching not as a performance optimisation but as a validation primitive. When a single logical task is split across multiple batches, each batch becomes an independent replication of part of the work. Consistency across batches becomes a signal. Inconsistency becomes a detected hallucination.
Three batching rules turn this from theory into production defence.
Consistent-schema batching
Every batch in a logical task is given identical schema instructions. Field names, units, value formats, enumerations, and canonical identifiers are spelled out explicitly in every prompt. If a canonical list of identifiers exists — for example, an optionSetId master list for a Monte Carlo sweep — it is embedded in every batch prompt, and the model is instructed to use those exact identifiers. When batch n comes back with optionSetId spelled option_set_id or returns a different canonical name for the same entity, that inconsistency is a caught hallucination — and the per-batch validator rejects it before it is accumulated into the result.
Per-batch conformance checks
After each Bedrock call, the Lambda validates the response against the expected shape (isinstance(parsed, dict), required fields present, numeric ranges sane) before accumulating. A batch that fails conformance is logged, skipped, and — if policy requires — retried against a different model in the same group. The merged output therefore never contains a batch that the model misunderstood. One batch’s drift cannot poison the others.
In-code aggregation, never delegated
Merging the outputs of multiple batches is done in code — by deterministic logic with explicit rules about deduplication, canonical-key resolution, worst-case enumeration, and numeric normalisation. It is never done by a second Bedrock call. The reason is simple: asking a foundation model to merge the output of another foundation model is asking one black box to audit another, and produces a black-box aggregate that inherits the failure modes of both. Code-level aggregation is boring, inspectable, diff-able, and verifiable. It is the exact opposite of an agentic “meta-agent” — which is just another parrot, supervising parrots.
“An agentic ‘meta-agent’ that supervises other agents
is just another parrot, supervising parrots.
Clarity's aggregators are code, not models.”
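A minimal sketch of the per-batch conformance check and the deterministic merge, with illustrative field names and a worst-case merge rule as the example policy:

```python
# Sketch of per-batch conformance checking and in-code aggregation.
# Required fields and the worst-case merge rule are illustrative assumptions.
def conforms(parsed: object, required: tuple[str, ...]) -> bool:
    # Shape check before accumulation: a batch the model misunderstood is dropped.
    return isinstance(parsed, dict) and all(k in parsed for k in required)

def aggregate(batches: list) -> dict:
    merged: dict = {}
    for batch in batches:
        if not conforms(batch, ("results",)):
            continue                                   # skipped and logged; it cannot poison the merge
        for row in batch["results"]:
            key = row["optionSetId"]                   # canonical identifier supplied in every prompt
            best = merged.get(key)
            # Deterministic merge rule: keep the worst-case margin per option set.
            if best is None or row["margin"] < best["margin"]:
                merged[key] = row
    return merged

batches = [
    {"results": [{"optionSetId": "OS-7", "margin": 0.18}]},
    {"results": [{"optionSetId": "OS-7", "margin": 0.12}]},
    ["not", "a", "dict"],                              # a drifted batch: rejected by the shape check
]
print(aggregate(batches))
```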
The cumulative effect is that Clarity’s AI pipeline treats the foundation model as an untrusted component at every step. Every call is bounded by a world-model-derived bin. Every response is validated for shape before it is used. Every batch is compared to its siblings for consistency. Every merge is done deterministically. The foundation model is assumed to misbehave, and the architecture defends against misbehaviour at every boundary — which is precisely why the resulting agents can be trusted to operate on decisions that matter.
And because the batching and validation layer is independent of which foundation model is in use, swapping models is a configuration change, not an architectural one. Which brings us to Bring Your Own Model.
3.7 The lessons-learnt knowledge graph
A second structural bounding layer is the lessons-learnt knowledge graph (LLKG). It is a curated corpus of failure modes, root causes, mitigations, and best practices drawn from real programmes across defence, nuclear, aerospace, health, energy, advanced manufacturing, and robotics. Each entry is a node with explicit fields — domain, severity, invariant category, impact cost, impact schedule, framework alignment — connected to other entries by typed edges (caused_by, mitigated_by, learned_from, relates_to).
The LLKG is not training data. It is context, injected into generation prompts at runtime, filtered by domain and category. When an L0 requirements-extraction agent runs over a ConOps document for a maritime platform, the agent is handed the maritime-applicable lessons-learnt entries alongside the ConOps text. When an L3 Monte Carlo analysis runs on a nuclear instrumentation option set, the agent is handed the nuclear-applicable lessons.
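A sketch of that runtime filter, with the knowledge graph reduced to a flat list of illustrative entries, shows how the short list is selected before injection:

```python
# Illustrative filter over lessons-learnt entries. Field names mirror those
# described above; the flat list stands in for the curated knowledge graph.
def applicable_lessons(llkg: list[dict], domain: str,
                       invariant_categories: set[str], limit: int = 10) -> list[dict]:
    hits = [n for n in llkg
            if n["domain"] == domain and n["invariantCategory"] in invariant_categories]
    # Highest-severity lessons first; the short list is injected into the prompt as context.
    return sorted(hits, key=lambda n: n["severity"], reverse=True)[:limit]

llkg = [
    {"id": "LL-201", "domain": "maritime", "invariantCategory": "power-margin",
     "severity": 4, "failureMode": "Undersized DC bus at full sensor load",
     "mitigatedBy": "LL-202"},
    {"id": "LL-044", "domain": "aerospace", "invariantCategory": "power-margin",
     "severity": 5, "failureMode": "..."},   # filtered out: wrong domain for this task
]
print(applicable_lessons(llkg, "maritime", {"power-margin"}))
```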
The effect is that every generation is biased toward the patterns that have previously failed in related contexts. The agent cannot forget the lessons, because they are supplied on every call. The user cannot forget them either, because they are rendered into the UI alongside the generated content with their full provenance.
This is the opposite of a stochastic parrot. The parrot trained on everything, equally, and remembered nothing. The Clarity agent is supplied, on every call, with the short list of things that have specifically gone wrong before in this kind of work.
3.8 DeZolve — the Decision Intelligence Framework
Layered above the Lx world model and the LLKG sits DeZolve — the patent-pending Decision Intelligence Framework that gives Clarity its name for bounded AI.
DeZolve is an ontology of decision trustworthiness. It defines a catalogue of fifteen node types — Analysis, Context, Conclusion, Question, Option, Idea, Decision, Requirement, Challenge, Need, Assumption, Goal, Fact, Evidence, Data — and twenty-six typed edges describing the relationships between them (Assumption validated by Fact, Decision requires Options, Evidence substantiates Requirement, and so on). Every node and edge type is canonical. The taxonomy is embedded in Clarity’s world model as a first-class artefact.
When a decision is made in Clarity — a formal L5 decision record — DeZolve walks backward through the evidence chain from the decision to its supporting analyses, from those analyses to their contextual assumptions, from the assumptions to the facts that validated them, and from the facts to the raw data or evidence they came from. At each hop the traversal computes a local trust weight: verified (explicit traceability), inferred (AI-derived, human-approved), transitive (indirect path), or gap (missing link). These weights compose into a truth vector for the decision as a whole — a structured measure of how defensible it was at the moment it was taken.
“DeZolve does not ask whether a decision was right.
It asks whether a decision was defensible at the time it was taken,
given the evidence available and the chain by which
that evidence reached the decision-maker.”
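The four local trust weights and the composed truth vector can be pictured as simple data shapes. The sketch below is illustrative structure only — the scoring formula, the weights, and the traversal algorithm are patent-protected and deliberately not reproduced:

```python
# Illustrative data shapes only: the four local trust weights and a truth vector
# as a structure. Not the patented scoring formula or traversal algorithm.
from dataclasses import dataclass
from enum import Enum

class TrustWeight(Enum):
    VERIFIED = "verified"      # explicit traceability
    INFERRED = "inferred"      # AI-derived, human-approved
    TRANSITIVE = "transitive"  # indirect path
    GAP = "gap"                # missing link

@dataclass(frozen=True)
class EvidenceHop:
    source_node: str           # e.g. a Fact node id
    target_node: str           # e.g. the Assumption it validates
    edge_type: str             # one of the typed edges, e.g. "validated_by"
    weight: TrustWeight

@dataclass(frozen=True)
class TruthVector:
    decision_id: str
    hops: tuple[EvidenceHop, ...]   # the walked chain from decision back to data

    def counts(self) -> dict[str, int]:
        # A structural summary (how much of the chain is verified vs. gap);
        # not the DeZolve score itself.
        out = {w.value: 0 for w in TrustWeight}
        for h in self.hops:
            out[h.weight.value] += 1
        return out
```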
Two consequences follow that make DeZolve fundamentally different from any legacy audit trail:
- Decisions are scored on their evidential foundation, not on their outcome. A decision that turns out badly may have been perfectly defensible at the time. A decision that turns out well may have been reckless. DeZolve tells the difference, and the difference matters for auditors, regulators, boards, and post-incident investigators.
- AI-generated content cannot masquerade as human-verified content. Every field in the Lx model carries an explicit lineage — algorithm, ai, human. DeZolve’s trust weights explicitly distinguish between them. An AI-generated assumption that has not been human-approved contributes a lower trust weight than a human-verified fact. The truth vector of a decision is a first-class measure of how much of its foundation was AI and how much was human.
The specific scoring formula, the exact weights, and the graph-traversal algorithm are subject to patent protection and are not described here. What matters for this paper is that DeZolve is the layer that makes the bounding auditable. Without it, a Clarity user would have to trust the designers of the prompts, the curators of the lessons-learnt graph, and the engineers of the Lambdas. With it, every decision carries, as a structural property, a computed measure of how much trust its evidential chain actually deserves.
3.9 Three rings of invariant enforcement
Bounding AI output against a world model is necessary but not sufficient. The model must also enforce its own invariants — structural rules that must hold regardless of what any agent produces. Clarity applies three rings of invariant enforcement, running at three different points in the pipeline.
Ring 1 — Annotation
Every Lx generation Lambda, before writing its output, annotates each entity with an invariantStatus header. Violated invariants are surfaced to the UI as banners, not hidden as errors.
Ring 2 — Change approval
The change-approval pipeline blocks promotion of any change that would violate an active invariant. A pending change with an unresolved invariant violation returns HTTP 409 and cannot progress to baseline.
Ring 3 — Solver gate
The L3 analysis solver pipeline refuses to run if pre-conditions are violated, and annotates post-solve results with any invariants violated during the run. A failed invariant aborts the solver with HTTP 422.
Invariants are not rules in the old expert-system sense. They are predicates attached to the L0 intent layer, authored by engineers, versioned, provenance-carrying, and traceable. When the AI agents are producing content, the invariants are watching. When the humans are approving content, the invariants are watching. When the solvers are running, the invariants are watching. At each point the agent is bounded not only by its schema slice and its prompt, but also by the predicate-level truths that the engineering team has asserted about the system.
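A sketch of the three enforcement points, with invariants modelled as named predicates and the HTTP status codes matching those above (field and function names are illustrative):

```python
# Sketch of the three rings. Invariants are modelled as named predicates over
# entities; names and fields are illustrative, the status codes match the text.
from typing import Callable

Invariant = tuple[str, Callable[[dict], bool]]   # (invariantId, predicate)

def ring1_annotate(entity: dict, invariants: list[Invariant]) -> dict:
    violated = [iid for iid, pred in invariants if not pred(entity)]
    entity["invariantStatus"] = {"violations": violated}   # surfaced as UI banners
    return entity

def ring2_change_gate(entity: dict) -> tuple[int, str]:
    if entity["invariantStatus"]["violations"]:
        return 409, "Unresolved invariant violation: change cannot be promoted"
    return 200, "Promoted to baseline"

def ring3_solver_gate(entity: dict) -> tuple[int, str]:
    if entity["invariantStatus"]["violations"]:
        return 422, "Pre-condition violated: solver run aborted"
    return 200, "Solver run permitted"

invariants: list[Invariant] = [
    ("INV-POWER-01", lambda e: e.get("powerMarginPct", 0) >= 10),
]
candidate = ring1_annotate({"optionId": "OPT-3", "powerMarginPct": 7}, invariants)
print(ring2_change_gate(candidate))   # (409, ...)
```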
3.10 Hallucination bounding — structural, not hopeful
Language-model output is inherently probabilistic. A well-engineered system does not hope that the model will behave; it assumes the model will misbehave and defends against it. Clarity defends at four structural layers:
| Layer | Defence | Mechanism |
|---|---|---|
| Response shape | Foundation models do not reliably honour the field names or the wrapper structure specified in a prompt. They may return snake_case when you asked for camelCase, or a bare array instead of an object | Every Bedrock-reading Lambda uses a dual-key field access pattern and an isinstance guard before any .get() on the parsed response |
| JSON parsing | Foundation models emit Markdown fences, unbalanced braces, stray control characters, and arithmetic expressions that are not valid JSON | A three-pass sanitiser strips fences, balances braces, removes control characters, and evaluates simple arithmetic expressions before handing the result to json.loads() |
| Batch consistency | Foundation-model output drifts across batches: different field names, different enumerations, different formats, even within the same request | Every batch is validated against an explicit schema before being accumulated. Numeric formats are normalised per batch, not after merging. Aggregation is always done in code, never delegated to a second model call |
| Provenance capture | AI-generated content that masquerades as human content destroys the audit trail | Every generated field carries a @source lineage record marking it as algorithm/ai/human, with timestamp, authorship, confidence, and evidence references. Human approval flips the lineage to human and re-computes the DeZolve trust contribution |
None of these defences are novel on their own. What is novel is that they are mandatory, enforced by pattern catalogue, and applied consistently at every boundary where a foundation model’s output becomes part of the system of record. A Clarity Lambda that fails to apply them is considered defective by construction — not because the code is buggy, but because the architecture is compromised.
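A condensed sketch of the first two defences — a fence-stripping, brace-balancing sanitiser and a dual-key field accessor. The production pipeline applies more passes than shown here; this illustrates the pattern, not the full catalogue:

```python
# Sketch of the response-shape defences: a simplified sanitiser and a dual-key
# accessor. Field names in the example are illustrative.
import json, re

def sanitise(raw: str) -> dict:
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())   # pass 1: strip Markdown fences
    text = re.sub(r"[\x00-\x1f]", "", text)                       # pass 2: remove control characters
    opens, closes = text.count("{"), text.count("}")              # pass 3: balance trailing braces
    text += "}" * max(opens - closes, 0)
    parsed = json.loads(text)
    if not isinstance(parsed, dict):                              # guard before any .get()
        raise ValueError("Expected a JSON object, got " + type(parsed).__name__)
    return parsed

def get_field(obj: dict, camel: str, snake: str, default=None):
    # Dual-key access: the model may ignore the requested field-name convention.
    return obj.get(camel, obj.get(snake, default))

raw = '```json\n{"option_set_id": "OS-7", "margin": 0.12\n```'
parsed = sanitise(raw)
print(get_field(parsed, "optionSetId", "option_set_id"))   # "OS-7"
```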
3.11 Bring Your Own Model — portability without re-architecture
A sovereign AI architecture has to survive the next model release, the next model provider, and the next policy change about which models a customer’s jurisdiction will allow. Clarity’s foundation-model layer is deliberately thin.
An abstract LLM client exposes a small group-based interface (generation, extraction, classification, and similar). At runtime, each group resolves to a concrete model via an SSM-parameter lookup. The default registry includes Amazon Nova Premier / Pro / Lite / Micro, Anthropic Claude Sonnet / Haiku / Opus, and DeepSeek R1 / V3 — all accessed through Bedrock. A customer who needs a different model changes the SSM parameter; no code changes.
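A sketch of that resolution path, using the Bedrock Converse API and an assumed SSM parameter naming convention — the group is resolved to an ordered list of model ids, and fallback within the group happens at runtime:

```python
# Sketch of group-based model resolution with runtime fallback. The parameter
# naming convention and the fallback policy are illustrative assumptions.
import boto3

ssm = boto3.client("ssm")
bedrock = boto3.client("bedrock-runtime")

def resolve_group(group: str) -> list[str]:
    # e.g. "/clarity/llm/generation" -> "anthropic.claude-...,amazon.nova-pro-v1:0"
    value = ssm.get_parameter(Name=f"/clarity/llm/{group}")["Parameter"]["Value"]
    return [m.strip() for m in value.split(",")]

def invoke(group: str, prompt: str) -> str:
    last_error = None
    for model_id in resolve_group(group):             # runtime fallback within the group
        try:
            response = bedrock.converse(
                modelId=model_id,
                messages=[{"role": "user", "content": [{"text": prompt}]}],
            )
            return response["output"]["message"]["content"][0]["text"]
        except Exception as err:                        # throttling, access, model outage
            last_error = err
    raise RuntimeError(f"All models in group '{group}' failed") from last_error
```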
Concrete model-to-task mappings in the current pre-beta deployment
The assignment of foundation models to tasks is not accidental. Each task has a specific context-window requirement, a specific reasoning profile, and a specific cost constraint — and Clarity matches the model to the task, not the other way round.
| Task | Group | Current default model | Why this model |
|---|---|---|---|
| L0 requirement extraction from ConOps PDFs | extraction | Amazon Nova Pro (300K context) | Large-context extraction over 50+ page source documents; structured JSON output with bounded vocabulary; cost-efficient at scale |
| L0 invariant extraction + lessons-learnt cross-reference | generation | Anthropic Claude Sonnet 4 (200K) | Stronger reasoning over KG cross-references and predicate-level semantics; strict schema adherence |
| L1 system-boundary generation from L0 facts | generation | Amazon Nova Pro | Fast structured generation with explicit schema; good price/performance for mid-complexity tasks |
| L2 option-set generation and parameter assignment | generation | Anthropic Claude Sonnet 4 | Multi-constraint reasoning across options, interfaces, and MoP parameters; Claude’s instruction-following is the strongest fit |
| L2 bulk parameter refresh | classification | Amazon Nova Lite (128K) | Batched lightweight updates over many entities; Nova Lite’s lower cost dominates given the small per-call reasoning requirement |
| L3 Monte Carlo scenario sweep | generation | Amazon Nova Pro | Long-context sweep across many option sets; structured numeric output; batching pattern assumes 300K window |
| L3 Pareto / MCDA scoring | generation | Anthropic Claude Sonnet 4 | Multi-criteria reasoning with explicit trade-off justification |
| Chat turn orchestration (tenant chat) | generation | Anthropic Claude Haiku 3.5 | Low-latency conversational turns; fast enough to keep the UI responsive |
| Report narrative generation | generation | Anthropic Claude Sonnet 4 | High-quality prose over structured Lx input; the one task where fluency is actually the goal |
| Classification / tagging jobs | classification | Amazon Nova Micro | Small, fast, cheap; used where the task is close to a lookup |
| Alternative evaluation runs (A/B) | any | DeepSeek R1 or V3 | Periodically re-evaluated against Nova and Claude for quality regressions; useful as a sovereignty fallback in jurisdictions where US cloud models are unavailable |
Every mapping in that table is a single SSM parameter. Changing the generation group for L3 Monte Carlo from Nova Pro to Claude Opus is one command. Changing it to a customer-supplied on-premise model in an air-gapped SECRET environment is also one command. The Lambdas do not know, and do not care, which concrete model served their call. They call the group; the group resolves.
Air-gapped deployments — when commercial cloud AI is not authorised
In classified or air-gapped environments where commercial cloud AI is not authorised at all — US C2S/SC2S, UK NCSC SECRET, Australian IRAP SECRET, and equivalent — the same BYOM interface supports customer-provided local models running inside the enclave. In practice that means an on-premise Llama 3 or Mistral variant, a sovereign national model, or a customer-specific fine-tuned model, hosted on whatever accredited inference runtime the classification authority has approved.
The Lambda’s contract is unchanged. The world model, the prompt corpus, the lessons-learnt knowledge graph, and the DeZolve taxonomy all travel with the deployment as proprietary payload. The customer’s security authority approves the concrete model once, points the SSM parameter at the local endpoint, and the platform runs. No code change. No rebuild. No network egress.
Crucially, the quality of the bounded-AI output degrades gracefully when an air-gapped model is less capable than its commercial counterpart. The world-model binning still keeps prompts focused. The batching and validation layer still catches drift. The invariants still enforce constraints. The DeZolve trust vector still scores the evidential chain. A smaller, less capable local model produces smaller, less ambitious candidate outputs — but those outputs are still structurally bounded, still provenance-tracked, and still auditable. The sovereignty of the platform is not contingent on having the biggest model available. It is contingent on having the architecture that makes any model usable.
This matters because the choice of foundation model should not be an architectural commitment. Foundation models are improving month-on-month; a product that bakes in a single vendor’s API is a product that will have to be rebuilt when the vendor’s licensing changes, when the customer’s policy changes, or when a materially better model arrives. Clarity treats the foundation model as what it is: an interchangeable utility with a stable interface and a rapidly changing implementation.
3.12 Sovereignty — architecture, not policy
The last structural property that makes Clarity’s AI architecture sovereign rather than merely private is that every sensitive component lives inside the deployment boundary.
- The Lx world model is proprietary payload, delivered with the deployment, encrypted at rest with customer-managed keys.
- The prompt corpus is proprietary payload, versioned, hot-swappable, code-reviewed, and never mutated outside version control.
- The lessons-learnt knowledge graph is proprietary payload, domain-filtered per deployment, and never mixed with tenant-generated data.
- The DeZolve taxonomy is proprietary payload, canonical, versioned, and the same across every deployment.
- The tenant data is multi-tenant-isolated at the S3 path level, encrypted with tenant-specific KMS aliases, and validated for tenant-membership on every Lambda boundary.
- The identity layer normalises claims from any IdP — Cognito, Keycloak, SailPoint, CAC/PIV, government-supplied — so that the architecture supports both commercial SaaS and air-gapped classified deployments without divergent code paths.
- The exchange layer uses the Diode and Airlock connectors (subject of a separate whitepaper) to move metadata across classification boundaries with three independent enforcement layers and dual-policy redaction.
Sovereignty is not a marketing claim. It is the structural property that nothing in the architecture depends on a call to a third-party service the customer cannot audit, a model the customer cannot replace, a schema the customer cannot version, or a knowledge base the customer cannot inspect. Every component of the bounded-AI stack is owned, auditable, and portable.
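As a small illustration of the tenant-isolation point, a boundary check of this shape (claim names, path layout, and alias convention are assumptions) runs before any Lambda touches tenant data:

```python
# Illustrative tenant-isolation check at a Lambda boundary. The claim name,
# path layout, and KMS alias convention are assumptions for illustration.
def assert_tenant_scope(claims: dict, s3_key: str) -> str:
    tenant = claims["tenantId"]                       # normalised from whichever IdP issued it
    if not s3_key.startswith(f"tenants/{tenant}/"):
        raise PermissionError(f"Key {s3_key!r} is outside tenant {tenant!r}")
    return f"alias/clarity-{tenant}"                  # tenant-specific KMS alias for the write

kms_alias = assert_tenant_scope({"tenantId": "acme"}, "tenants/acme/lx/L2/option-sets.json")
```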
3.13 Putting it together — what a Clarity AI agent actually does
A concrete walk-through, drawn from the L2 option-generation path, makes the architecture tangible.
- A user in the Options tab requests AI-assisted option generation for a particular L1 system boundary.
- The frontend signs a request and invokes the generate-options Lambda via a Function URL with IAM authentication.
- The Lambda loads the L1 system boundary, the active L0 invariants, and the existing L2 option sets from the Lx repository.
- The prompt resolver walks the hierarchy (tab='options', surface='shell.sidebar', intent='generate-options', layer='L2') and returns the current prompt template from S3.
- The Lambda assembles the RAG context: relevant lessons-learnt entries filtered by the L1 domain, applicable standards from the tenant’s library, and the structured L0/L1 data.
- The LLMClient resolves the generation group to a concrete Bedrock model and invokes it with the assembled prompt.
- The response is passed through the three-pass JSON sanitiser. Dual-key field access guards extract option set candidates.
- The invariant validator runs Ring 1 annotation: each candidate is tagged with an invariantStatus reflecting any violations.
- The @source annotator stamps every generated field with lineage=ai, timestamp, confidence, and evidence references.
- The result is written to the Lx repository as a draft L2 option set. The Lx.json aggregation pipeline debounces, re-aggregates, and emits an EventBridge event.
- The frontend renders the draft in the sidebar, with invariant banners where they apply, provenance chips showing AI authorship, and edit controls for human review and approval.
- When the user approves, the @source lineage flips to human-verified, the DeZolve trust vector contribution recomputes, and the option set becomes part of the baseline for downstream L3 analysis.
Every step in that walk-through is bounded. Every step is auditable. Every step is replaceable. No step depends on the foundation model remembering something. No step depends on a vector search over a pile of PDFs. No step produces a fluent paragraph without a provenance chain. The agent is a narrow specialist, the world model supplies the grounding, the LLKG supplies the caution, DeZolve supplies the auditability, and the invariants supply the guardrails. That is how bounded AI actually works.
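The same walk-through, condensed into a dependency-injected handler skeleton — every name here is illustrative rather than Clarity's internal API, but the order of operations is the one described above:

```python
# Condensed, illustrative handler skeleton for the walk-through above.
# All collaborators are injected; names are assumptions, not Clarity's API.
from typing import Callable

def generate_options_handler(
    event: dict,
    load_lx: Callable[[str, str], dict],            # (tenantId, layer) -> structured JSON
    resolve_prompt: Callable[[str, str, str, str], dict],
    assemble_context: Callable[..., str],
    invoke_llm: Callable[[str, str], str],          # (group, prompt) -> raw model output
    validate: Callable[[str], dict],                # sanitise + shape + Ring 1 + @source
    write_draft: Callable[[dict], str],             # -> draft id; emits the EventBridge event
) -> dict:
    tenant = event["claims"]["tenantId"]
    l0, l1, l2 = (load_lx(tenant, layer) for layer in ("L0", "L1", "L2"))
    prompt = resolve_prompt("options", "shell.sidebar", "L2", "generate-options")
    context = assemble_context(l1_boundary=l1, l0_invariants=l0, existing_option_sets=l2)
    raw = invoke_llm("generation", prompt["task"] + "\n" + context)
    draft = validate(raw)                           # rejected output never reaches the repository
    return {"statusCode": 200, "draftId": write_draft(draft)}
```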
Conclusion — bounded, narrow, sovereign, auditable
The AI industry is, at the time of writing, offering engineers a bargain they should refuse. The bargain is this: accept a stochastic parrot on a filing cabinet of unknown provenance, in exchange for fluent answers to engineering questions that cannot be reviewed, traced, or defended. The fluency is real. The grounding is not.
Clarity offers a different bargain. Accept a world model that has been refined over twenty years, carrying the lessons of the expert-system era forward into the foundation-model era. Accept that AI capability and domain truth must be separated, and that the separation is not a compromise but the whole point. Accept that the right number of agents is not one — it is hundreds, each narrow, each bounded, each auditable, each replaceable. Accept that every decision must carry, as a structural property, a computed measure of how defensible its evidential chain actually is.
In exchange, you get engineering AI that behaves like engineering: explicit, versioned, provenance-carrying, lifecycle-aware, invariant-enforced, and sovereign. Not a parrot. Not a filing cabinet. An army of narrow specialists, bounded by DeZolve, grounded in the Lx world model, tempered by twenty years of lessons learnt, and deployable from open SaaS to fully air-gapped classified programmes without changing a line of code.
That is what 325 AI Agents, Bounded by DeZolve means. It is not a slogan. It is an architecture, and it is the one the next twenty years of engineering work actually needs.
This whitepaper forms part of the Clarity technical series. See also: Breaking the DIKW Ceiling and the 25-USP matrix.
Buyer journeys: Systems Engineer · Tier 2–4 Supplier