The backbone for UK AI

One sovereign data backbone for every AI system that needs UK truth

Any AI system searching for financial signals, industry context, locations or training data ends up reinventing the same fragmented integrations. The Foundry replaces that effort with a single connected backbone, so models train, retrieve and predict on grounded, provenance-tracked UK facts instead of hallucinating.

See the data sources

The scale advantage

40+ connected datasets, one canonical spine, zero integration tax

The combination is the product. Any one of these sources is useful on its own; joined on one identity layer with provenance, together they are transformational for AI.

40+

connected data sources

Corporate, financial, procurement, ESG, property, workforce, trade and derived signals — joined on a single canonical UK entity spine.

17.2M

resolved UK entities

Companies, directors, owners and connected places — the grounding any UK-focused AI system needs to stop guessing.

1

knowledge graph

One unified graph instead of 40 disconnected feeds, so models retrieve consistent, provenance-tracked facts.

0

data brokers to integrate

Replace months of vendor procurement, schema mapping and entity reconciliation with a single sovereign backbone.

How it works

From raw sources to predictive intelligence

Click any stage to see what flows in, what comes out and why it matters for AI systems built on the Foundry.

Stage detail

Raw sources

Authoritative public, licensed and proprietary UK datasets are ingested on stable cadences — corporate filings, financials, procurement, ESG, property, workforce, trade and derived signals.

Inputs

  • Companies House
  • Public spend & tenders
  • Credit & insolvency
  • Property & energy
  • Workforce & skills

Outputs

  • Versioned raw extracts
  • Source + timestamp metadata

Example payload — Raw sources

One raw record arriving from Companies House

entity: SDF_03977902

// source: companies_house  fetched_at: 2026-05-09T03:14:11Z
{
  "CompanyNumber": "03977902",
  "CompanyName": "EXAMPLE HOLDINGS LIMITED",
  "RegAddress.PostCode": "EC2A 4NE",
  "CompanyStatus": "Active",
  "SICCode.SicText_1": "62012 - Business and domestic software development",
  "IncorporationDate": "2000-04-12"
}
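As a sketch of what happens between raw ingest and the canonical spine, the helper below (hypothetical, not part of the Foundry) shows how a raw Companies House payload like the one above could be mapped onto a canonical record with source and timestamp metadata attached per field:

```python
def normalise_companies_house(raw: dict, fetched_at: str) -> dict:
    """Map a raw Companies House payload onto a canonical entity record,
    attaching source + fetch-time metadata to every field (a sketch)."""
    provenance = {"source": "companies_house", "fetched_at": fetched_at}
    return {
        "id": f"SDF_{raw['CompanyNumber']}",
        "name": raw["CompanyName"].title(),
        "status": raw["CompanyStatus"].lower(),
        "postcode": raw["RegAddress.PostCode"],
        "incorporated": raw["IncorporationDate"],
        # At this stage every field shares the same source and fetch time;
        # later stages can refine per-field confidence.
        "_provenance": {
            field: provenance
            for field in ("name", "status", "postcode", "incorporated")
        },
    }
```

The `SDF_` identifier scheme mirrors the entity IDs used elsewhere on this page; the field mapping itself is illustrative only.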

Department view — Raw sources

Switch perspective to see what this stage delivers for each part of government and the SME ecosystem.

Selected perspective

Procurement

Stage: Raw sources

Tailored benefit

One ingest covering every UK contracts, awards and supplier-filings feed, with no per-portal monitoring and no per-vendor licence.

Use case stories

Click a story to highlight the stages that power it


The hallucination problem

Why AI without sovereign data invents UK facts

Hallucinations are not a model failure first — they are a data failure. Models guess when grounding is missing, contradictory or unverifiable.

No grounded UK facts

Foundation models trained on the open web have shallow, outdated and inconsistent knowledge of UK firms, contracts, ownership and places. They confidently fill the gaps with plausible fiction.

Disconnected sources

Even when teams stitch vendors together, identities don't match across systems. The model is fed contradictions, so it picks one — or invents a third.

No provenance

Without source, timestamp and confidence on every fact, models cannot reason about freshness or trust. Stale data is treated as current truth.

No predictive context

Single-source signals miss the patterns that predict insolvency, supplier risk, growth or fraud. The model has no time series to reason over.

What the backbone delivers

Six capabilities every AI team needs and few can build alone

The Foundry exposes its connected estate as the building blocks data-hungry AI systems actually consume.

Training-ready corpora

Curated, deduplicated and licensed UK datasets formatted for supervised fine-tuning, instruction tuning and continued pre-training of domain models.

  • Verified entity records with canonical identifiers
  • Time-stamped financial, procurement and ownership histories
  • Geographic, employment and skills features per postcode
  • Provenance tags on every record so training data is auditable
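A hypothetical example of what "training-ready with provenance" can mean in practice: formatting one resolved entity record as a supervised fine-tuning example in JSONL, with the provenance tags carried alongside the prompt/completion pair (the record shape and prompt template are assumptions, not the Foundry's actual export format):

```python
import json

def to_training_example(record: dict) -> str:
    """Format one resolved entity record as a JSONL line for supervised
    fine-tuning, keeping provenance so the training set stays auditable."""
    prompt = f"What is the registration status of {record['name']}?"
    completion = f"{record['name']} (ID {record['id']}) is {record['status']}."
    return json.dumps({
        "prompt": prompt,
        "completion": completion,
        # Provenance travels with the example, not just the source record,
        # so any trained behaviour can be traced to its evidence.
        "provenance": record["_provenance"],
    })
```

Because each line carries its own provenance, a training run can be audited record-by-record rather than dataset-by-dataset.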

Retrieval-augmented generation (RAG)

Drop-in retrieval over the connected knowledge graph so chat agents, copilots and decision systems answer with cited UK facts instead of plausible guesses.

  • Vector and graph retrieval against entity-resolved records
  • Source, date and confidence returned with every fact
  • Cross-source joins (company → contract → location → owner) in one query
  • Drastically reduced hallucination on UK business questions
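To make "cited UK facts" concrete, here is a minimal sketch of the retrieval-to-prompt step: rendering retrieved facts (each carrying the source, date and confidence metadata described above) into a cited context block for a RAG prompt. The fact shape is an assumption for illustration:

```python
def build_grounded_context(facts: list[dict]) -> str:
    """Render retrieved facts as a cited context block for a RAG prompt,
    so the model answers from evidence it can attribute."""
    lines = []
    for fact in facts:
        citation = f"[{fact['source']}, as of {fact['as_of']}]"
        lines.append(f"- {fact['statement']} {citation}")
    return "\n".join(lines)
```

The generated block is prepended to the user question; the model is then instructed to answer only from the cited lines, which is what turns "plausible guesses" into attributable answers.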

Predictive intelligence features

Pre-engineered features for credit, supplier risk, growth, churn, fraud and demand models — built once on the sovereign spine, reused everywhere.

  • Insolvency and distress signals derived from filings, payment behaviour and filing cadence
  • Growth signals from hiring, premises, awards and group activity
  • Concentration and dependency features for supply-chain models
  • Place-based demand features tied to postcode, employment and demographics
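As a toy illustration of a pre-engineered feature (the thresholds and logic are assumptions, not the Foundry's production models), a filing-cadence distress signal flags entities whose latest filing gap has drifted well beyond the expected annual cadence:

```python
from datetime import date

def filing_cadence_signal(filing_dates: list[date],
                          expected_gap_days: int = 365) -> dict:
    """Toy distress feature: flag an entity whose most recent filing gap
    is more than 1.5x the expected annual cadence."""
    gaps = [(later - earlier).days
            for earlier, later in zip(filing_dates, filing_dates[1:])]
    latest_gap = gaps[-1] if gaps else None
    return {
        "latest_gap_days": latest_gap,
        "overdue": latest_gap is not None
                   and latest_gap > expected_gap_days * 1.5,
    }
```

Built once against the time-stamped filing histories on the spine, a feature like this can be reused by every downstream credit or supplier-risk model.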

Agent and tool APIs

Stable, governed APIs that AI agents can call as tools — with consistent schemas, rate limits and audit trails suitable for production.

  • Lookup, search and graph-traversal endpoints
  • Streaming change feeds for live agent context
  • Per-call provenance and licence metadata
  • Identity-aware access for enterprise and government deployments

Evaluation and benchmarking

Use the same backbone that powers your model to evaluate it against a UK-specific benchmark suite — so accuracy claims are grounded in real sovereign data.

  • Foreign-ownership and beneficial-control detection
  • Local-first sourcing match quality
  • Procurement opportunity matching
  • National supply-capacity modelling

Governed, legal-grade feeds

Every dataset arrives with licence terms, lineage and DPIA-ready notes — so AI systems built on the backbone are deployable in regulated and public-sector settings.

  • Clear licence metadata per source
  • Signed snapshot releases for reproducibility
  • Audit logs of which records influenced which output
  • UK-sovereign data handling and storage terms

The stack

From raw sources to AI-ready interfaces

The same backbone serves training, retrieval, features, agents and evaluation — one investment, reused across every AI workload.

Layer 01

Source integration

40+ authoritative UK feeds harmonised into one ingest layer.

Layer 02

Entity resolution

Canonical identifiers across companies, directors, owners and places.

Layer 03

Knowledge graph

Connected facts that AI systems can traverse instead of guessing.

Layer 04

Provenance & versioning

Source, time and confidence on every assertion.

Layer 05

Feature & embedding store

Pre-computed features and vectors for ML, RAG and agents.

Layer 06

Governed access APIs

Stable endpoints, rate limits, audit trails and licence metadata.

Before and after

What changes when AI teams stop building the same plumbing

Without the backbone

  • Months of vendor procurement and legal review per data source
  • Brittle ETL pipelines breaking on every schema change
  • Conflicting identifiers that corrupt training sets
  • No way to attribute a model output back to a source
  • Hallucinated UK facts shipped into production decisions

With the backbone

  • One contract, one schema, one canonical entity spine
  • Snapshots and change feeds with stable, versioned interfaces
  • Resolved identities across every source from day one
  • Per-fact provenance carried through training and inference
  • Models that cite real UK evidence instead of inventing it

Who it serves

One backbone, every AI workload

Foundation model labs

Continued pre-training and fine-tuning on a coherent UK corpus instead of scraped fragments.

Enterprise AI teams

Build copilots for credit, procurement, sales and risk on grounded sovereign data.

Government AI projects

Deploy assistants and decision systems with provenance suitable for public-sector accountability.

Startups and SMEs

Ship vertical AI products without spending the first two years rebuilding a national data layer.

Developer interfaces

Lookup, search, graph traversal and change feeds

The same backbone that powers AI training is exposed through stable, documented endpoints — designed for production agents, copilots and analytics.

Lookup API

GET

/v1/entities/{id}

Resolve a single canonical entity by SDF ID, Companies House number, domain or VAT number. Returns the unified record with per-field provenance.

GET /v1/entities/SDF_03977902
Authorization: Bearer <token>

200 OK
{
  "id": "SDF_03977902",
  "name": "Example Holdings Ltd",
  "status": "active",
  "registered_address": { "postcode": "EC2A 4NE", "lat": 51.522, "lon": -0.082 },
  "_provenance": {
    "name":   { "source": "companies_house", "as_of": "2026-04-12" },
    "status": { "source": "companies_house", "as_of": "2026-05-02", "confidence": 0.99 }
  }
}
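Because every field in the response above carries an `as_of` date, a client can reason about freshness before trusting a value. A minimal sketch (hypothetical client-side helper, not part of the API) that lists the fields of a `/v1/entities` response older than a freshness window:

```python
from datetime import date, timedelta

def stale_fields(response: dict, max_age_days: int, today: date) -> list[str]:
    """List fields in a /v1/entities response whose provenance `as_of`
    date falls outside the freshness window."""
    stale = []
    for field, prov in response.get("_provenance", {}).items():
        as_of = date.fromisoformat(prov["as_of"])
        if today - as_of > timedelta(days=max_age_days):
            stale.append(field)
    return stale
```

A caller might refresh or down-weight stale fields before feeding them to a model, rather than treating every returned value as current truth.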

Search API

POST

/v1/search

Faceted search across entities, contracts, properties and people. Supports filters on geography, sector, financials, ownership and dates.

POST /v1/search
{
  "type": "entity",
  "filters": {
    "postcode_area": "M",
    "sic_section": "C",
    "employees_gte": 10,
    "status": "active"
  },
  "limit": 50
}

Graph traversal

POST

/v1/graph/traverse

Walk relationships across the knowledge graph — owners, directors, contracts, suppliers, premises — with depth and edge-type controls.

POST /v1/graph/traverse
{
  "start": "SDF_03977902",
  "edges": ["owns", "controlled_by", "awarded"],
  "depth": 3,
  "return": ["entity", "contract"]
}

Change feeds

GET (stream)

/v1/changes?since={cursor}

Resumable, append-only stream of every change with source, timestamp and previous value. Drives live agent context, alerting and incremental retraining.

GET /v1/changes?since=2026-05-09T00:00:00Z
Accept: text/event-stream

event: entity.updated
data: { "id":"SDF_03977902", "field":"status",
        "from":"active", "to":"in_administration",
        "source":"insolvency_service", "as_of":"2026-05-10T08:14:00Z" }

Authentication, governance and audit

Production controls so AI built on the backbone is deployable

Every interface is wrapped in identity, policy and lineage — so departments, enterprises and regulated workloads can trust both the data and the system that served it.

Authentication

Short-lived OAuth 2.0 access tokens issued per workspace, with optional mTLS for government and enterprise tiers. Service accounts and human users carry distinct identities so audit trails are unambiguous.

  • OAuth 2.0 client credentials and authorisation code flows
  • Workspace-scoped tokens with configurable TTL
  • Optional mTLS for cross-organisation deployments
  • SSO via SAML and OIDC for human operators
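The client-credentials flow above is standard OAuth 2.0 (RFC 6749 §4.4). A sketch of building the token request, with a hypothetical token-endpoint URL (the real endpoint and scopes would come from the Foundry's API documentation):

```python
from urllib.parse import urlencode

def client_credentials_request(client_id: str, client_secret: str,
                               scope: str) -> dict:
    """Build an OAuth 2.0 client-credentials token request
    (form-encoded, per RFC 6749 section 4.4)."""
    return {
        # Hypothetical token endpoint, for illustration only.
        "url": "https://auth.example-foundry.uk/oauth/token",
        "headers": {"Content-Type": "application/x-www-form-urlencoded"},
        "body": urlencode({
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }),
    }
```

The short-lived access token in the response is then sent as `Authorization: Bearer <token>` on every API call, as in the Lookup API example above.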

Authorisation & governance

Role-based access combined with dataset-level policies enforces who can read which sources and at what granularity. Licence terms travel with every response so downstream use stays compliant.

  • Roles: reader, analyst, integrator, steward, admin
  • Per-dataset policies (e.g. aggregate-only, redacted addresses)
  • Licence metadata returned on every record
  • Data-handling tier (public-benefit, local-first, government, commercial)

Audit trails

Every API call, schema change and data release is recorded against the calling identity, with the request, response shape and the dataset versions consulted — so any AI output can be traced back to its evidence base.

  • Immutable, append-only audit log per workspace
  • Per-request lineage: identity, dataset versions, fields read
  • Signed snapshots for reproducible model training runs
  • Exportable evidence packs for regulators and DPIAs