Inspiration
In regulated industries, deploying AI isn't blocked by capability — it's blocked by trust. Engineers distrust AI not because it gives bad answers, but because they can't trace why it gave that answer or prove it followed safety rules.
The standard response is more guardrails around the model. But that still treats AI as the system of record. It isn't.
I was inspired to build a Zero Trust architecture: treat AI exactly like a third-party API. You wouldn't let a third-party API write directly to your database without validation. You shouldn't let an AI recommendation execute without deterministic enforcement either.
What It Does
Inscripta acts as a compliance firewall between AI outputs and physical execution. Amazon Nova generates complex reallocation strategies when physical assets fail. Inscripta's governance pipeline handles everything else:
- Enforcement Layer — validates Nova's proposal against a versioned, immutable Role Policy. Suggested a part we don't have? Blocked.
- Risk Scorer — pure deterministic math produces a 0–100 score based on physical constraints and operational phase.
- Decision Gate — routes to the right human authority level. High-risk proposals require an Electronic Change Notice (ECN) cryptographically signed by a human commander.
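The three stages above can be sketched as plain functions. This is a minimal illustration, not the production pipeline: the part list, score weights, and the 70-point threshold are invented for the example.

```python
from dataclasses import dataclass

APPROVED_PARTS = {"PUMP-7A", "VALVE-3C"}  # stand-in for the versioned Role Policy

@dataclass
class Proposal:
    part_id: str
    load_delta: float         # physical constraint input (0.0 to 1.0)
    phase_criticality: float  # operational phase weight (0.0 to 1.0)

def enforce_policy(p: Proposal) -> bool:
    # Enforcement Layer: block any proposal referencing a part outside policy
    return p.part_id in APPROVED_PARTS

def risk_score(p: Proposal) -> int:
    # Risk Scorer: pure deterministic math, no model involved
    return round(100 * (0.6 * p.load_delta + 0.4 * p.phase_criticality))

def route(p: Proposal) -> str:
    if not enforce_policy(p):
        return "BLOCKED"
    # Decision Gate: high risk requires a signed ECN from a human commander
    return "REQUIRES_SIGNED_ECN" if risk_score(p) >= 70 else "SUPERVISOR_APPROVAL"

print(route(Proposal("PUMP-7A", 0.9, 0.8)))  # high-risk path
print(route(Proposal("PART-X", 0.1, 0.1)))   # part not in policy
```

The point of the split is that the AI never touches `route` or `risk_score`: the model proposes, deterministic code disposes.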
How We Built It
We built in a strict order: human approval gate first, audit trail second, AI integration third.
The core is a FastAPI backend handling stateful, multi-step governance transactions, backed by PostgreSQL. The frontend is a Streamlit app, chosen for rapid operational dashboarding.
For the intelligence layer, we integrated AWS Bedrock via boto3. Rather than treating Nova like a chatbot, we treated the system prompt as executable policy — passing a strict JSON schema and constraint context via the Converse API, forcing Nova to act as a structured, predictable data generator rather than a free-form responder.
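Assembling such a request might look like the sketch below. The schema, constraint strings, and token limits are illustrative, not the production policy; only the request shape follows the Bedrock Converse API.

```python
import json

RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["action", "part_id", "rationale"],
    "properties": {
        "action": {"enum": ["REALLOCATE", "HOLD"]},
        "part_id": {"type": "string"},
        "rationale": {"type": "string"},
    },
}

def build_converse_request(evidence: str, constraints: list[str]) -> dict:
    # System prompt as executable policy: schema plus hard constraints,
    # separated from the runtime evidence in the user message.
    system_text = (
        "You are a structured data generator. Respond ONLY with JSON "
        f"matching this schema: {json.dumps(RESPONSE_SCHEMA)}. "
        "Hard constraints: " + "; ".join(constraints)
    )
    return {
        "modelId": "amazon.nova-pro-v1:0",
        "system": [{"text": system_text}],
        "messages": [{"role": "user", "content": [{"text": evidence}]}],
        "inferenceConfig": {"temperature": 0.0, "maxTokens": 512},
    }

req = build_converse_request(
    "Pump 7A vibration exceeds limit.",
    ["only propose parts in inventory", "no schedule overrun"],
)
# With AWS credentials configured, this would be sent via:
#   boto3.client("bedrock-runtime").converse(**req)
```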
Challenges We Ran Into: The Dual-Model Consensus Problem
To prevent hallucinations during data ingestion, Inscripta uses a Dual-Model A/B Extraction pipeline. The same diagnostic evidence runs through two independent AI calls. If the extracted values disagree beyond a typed tolerance threshold (numeric: ±10%, categorical: exact match), the system holds and routes to human arbitration before any state is written to the database.
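The comparator itself is simple deterministic code. A minimal sketch, using the tolerances stated above (numeric ±10%, categorical exact match); the function names are illustrative.

```python
def values_agree(a, b, numeric_tol=0.10):
    """Typed tolerance check: numeric within +/-10%, everything else exact."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        if a == b:
            return True
        return abs(a - b) / max(abs(a), abs(b)) <= numeric_tol
    return a == b

def consensus(extract_a: dict, extract_b: dict) -> str:
    # Any disagreement holds the transaction for human arbitration;
    # only a full match may be committed to the database.
    if extract_a.keys() != extract_b.keys():
        return "HUMAN_ARBITRATION"
    for k in extract_a:
        if not values_agree(extract_a[k], extract_b[k]):
            return "HUMAN_ARBITRATION"
    return "COMMIT"

print(consensus({"rpm": 1000, "state": "DEGRADED"},
                {"rpm": 1050, "state": "DEGRADED"}))  # within 10% -> COMMIT
print(consensus({"rpm": 1000, "state": "DEGRADED"},
                {"rpm": 1200, "state": "DEGRADED"}))  # -> HUMAN_ARBITRATION
```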
Initially, we created A/B variance by running the same model twice with different temperature and top_p values. This worked well with other providers — varying sampling parameters produced meaningfully different extractions, giving the comparator something real to evaluate.
When we integrated Amazon Nova, we discovered a different characteristic: Nova's factual grounding is strong enough that on structured extraction tasks, both calls converge to the same output regardless of temperature. The sampling distribution is simply too peaked on unambiguous data for parameter variation to move it. The result was a comparator validating a value against itself — technically passing, but providing no real consensus guarantee.
This exposed a deeper principle: temperature variance is a fragile divergence mechanism. It only works when the model's output distribution is wide enough for sampling noise to matter. For high-stability models, you must engineer divergence through structure.
We implemented two Nova-native strategies:
- Model-size divergence — running amazon.nova-pro-v1:0 (deeper reasoning) against amazon.nova-lite-v1:0 (faster, more literal extraction). Different model sizes produce genuinely different inference paths on the same input, catching edge cases where deep inference and literal text conflict.
- Perspective-based prompts — giving Model A a strict literal-extraction instruction ("extract values exactly as written") and Model B a logical-analysis instruction ("assess the operational state implied by this data"). Two valid perspectives on the same evidence. Real divergence when the data is ambiguous.
We also introduced AI_MODEL_EXTRACTION_A and AI_MODEL_EXTRACTION_B environment variables so the extraction pair can be configured independently per deployment — enabling any A/B model pairing (e.g. Nova Pro vs Nova Lite) without code changes.
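Wiring the two strategies together might look like this sketch. The environment variable names match the write-up; the defaults and the exact perspective-prompt wording are assumptions for illustration.

```python
import os

# Defaults are assumptions; each deployment overrides via environment.
MODEL_A = os.environ.get("AI_MODEL_EXTRACTION_A", "amazon.nova-pro-v1:0")
MODEL_B = os.environ.get("AI_MODEL_EXTRACTION_B", "amazon.nova-lite-v1:0")

PERSPECTIVES = {
    "A": "Extract values exactly as written in the evidence.",
    "B": "Assess the operational state implied by this data.",
}

def extraction_pair() -> list[tuple[str, str]]:
    # Each call pairs a model with a perspective prompt, so divergence is
    # engineered structurally rather than via sampling temperature.
    return [(MODEL_A, PERSPECTIVES["A"]), (MODEL_B, PERSPECTIVES["B"])]

for model_id, system_prompt in extraction_pair():
    print(model_id, "->", system_prompt)
```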
The underlying lesson: Nova's resistance to hallucination is a genuine enterprise strength. But it means you cannot treat all foundation models the same. Nova requires variance through structure and perspective — not randomised sampling.
Accomplishments
We built an architecture that satisfies EU AI Act Article 12 traceability requirements. By combining AWS Bedrock's zero data retention policy with SHA-256 hash chaining, every decision record cryptographically links:
$$\text{hash} = \text{SHA256}(\text{prompt} \,|\, \text{response} \,|\, \text{approval\_signature} \,|\, \text{state\_change})$$
Any historical decision is fully replayable: what Nova saw, what it recommended, what a human authorised.
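A minimal sketch of the chain, assuming each record also folds in the previous record's hash (that inclusion is what makes it a chain rather than independent checksums); the field values are invented.

```python
import hashlib

def record_hash(prev_hash: str, prompt: str, response: str,
                approval_signature: str, state_change: str) -> str:
    # Concatenate the fields from the formula above, plus the previous
    # hash, and digest with SHA-256.
    payload = "|".join([prev_hash, prompt, response,
                        approval_signature, state_change])
    return hashlib.sha256(payload.encode()).hexdigest()

GENESIS = "0" * 64
h1 = record_hash(GENESIS, "evidence...", "REALLOCATE PUMP-7A",
                 "sig:commander-01", "inventory:-1 PUMP-7A")
h2 = record_hash(h1, "evidence 2...", "HOLD", "sig:commander-01", "none")

# Tampering with any earlier field yields a different hash, which breaks
# every later link in the chain:
h1_tampered = record_hash(GENESIS, "evidence...", "REALLOCATE VALVE-3C",
                          "sig:commander-01", "inventory:-1 PUMP-7A")
print(h1 != h1_tampered)  # True
```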
What We Learned
The most underappreciated element of AI integration in regulated workflows is the system prompt. Treating it as a rigid schema definition — with rules separated from runtime data — improved output quality and consistency more than any model parameter change.
Structure and format in the prompt directly define the quality of the output. A prompt that specifies an exact JSON schema, typed enums, and one-sentence rationale requirements produces auditable, parseable output. A conversational prompt produces prose you can't validate programmatically.
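The "parseable output" half of that claim is mechanical to demonstrate. A sketch of a response validator, with invented enums and field names; the one-sentence check is a crude proxy for the rationale rule described above.

```python
import json

ALLOWED_ACTIONS = {"REALLOCATE", "HOLD", "ESCALATE"}

def validate_output(raw: str) -> dict:
    data = json.loads(raw)  # conversational prose fails right here
    assert set(data) == {"action", "part_id", "rationale"}, "unexpected keys"
    assert data["action"] in ALLOWED_ACTIONS, "action outside typed enum"
    # Crude one-sentence check: at most one full stop in the rationale
    assert data["rationale"].count(".") <= 1, "rationale must be one sentence"
    return data

ok = validate_output('{"action": "HOLD", "part_id": "PUMP-7A", '
                     '"rationale": "Vibration exceeds limit."}')
print(ok["action"])  # HOLD
```

Every rejection path is programmatic, which is exactly what a conversational prompt cannot give you.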
What's Next
The full enterprise deployment model on AWS:
| Layer | Service |
|---|---|
| Compute | Amazon ECS + Fargate behind ALB |
| Database | Amazon Aurora PostgreSQL (immutable audit logs) |
| Integration | Amazon EventBridge → customer ERP/MES webhooks |
Webhooks fire only after human authorisation is complete — the compliance firewall guarantees no legacy system receives an AI-originated command that hasn't been reviewed and signed.
One capability I explored but did not complete in time is cost-aware dynamic model routing — automatically selecting between Nova Lite, Nova Pro, and Nova Premier at runtime based on task complexity, current API cost budget, and queue depth. The concept: simple extraction tasks route to Lite (fast, cheap), complex reallocation proposals route to Premier (deep reasoning), and the system degrades gracefully under cost pressure without human intervention. The ModelTier registry and AIGateway abstraction are already in place — the missing piece is the runtime cost tracking and the policy that governs tier selection. This is the next meaningful architectural addition.
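The missing selection policy could be as small as the sketch below. The tier names are real Nova model IDs, but the complexity scoring, budget threshold, and queue-depth rule are all assumptions about how the policy might work.

```python
from dataclasses import dataclass

TIERS = [  # ordered cheapest -> most capable
    ("amazon.nova-lite-v1:0", 1),
    ("amazon.nova-pro-v1:0", 2),
    ("amazon.nova-premier-v1:0", 3),
]

@dataclass
class Budget:
    remaining_usd: float
    queue_depth: int

def select_tier(task_complexity: int, budget: Budget) -> str:
    """task_complexity: 1 = simple extraction, 3 = complex reallocation."""
    wanted = task_complexity
    # Degrade gracefully: under cost pressure or deep queues, step down a tier
    # instead of stalling or paging a human.
    if budget.remaining_usd < 5.0 or budget.queue_depth > 100:
        wanted = max(1, wanted - 1)
    for model_id, capability in TIERS:
        if capability >= wanted:
            return model_id
    return TIERS[-1][0]

print(select_tier(1, Budget(50.0, 3)))  # simple task -> Lite
print(select_tier(3, Budget(2.0, 3)))   # complex but budget-constrained -> Pro
```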
Built With
- amazon-web-services
- aws-bedrock
- docker
- fastapi
- nova
- postgresql
- python
- restapi
- sqlite
- streamlit