How Infinite Federation Works
Stand up a federated query layer, a shared catalog, and product-first governance, then let domains ship value.
Core building blocks
- Federated Engine: Trino or Starburst as the execution fabric across sources.
- Open Table Format: Apache Iceberg for ACID tables and time-travel.
- Catalog: Polaris / Nessie / HMS for table & schema registry.
- Object Store: S3 / ADLS / MinIO for cost-efficient lake storage.
- Metadata Mesh: OpenMetadata (ownership, lineage, docs, tags).
- Policy Plane: OPA (ABAC/RBAC) + row/column masking; SSO groups.
- Observability: Query events, costs, SLOs; quality via data contracts.
- Developer Access: JDBC/ODBC, REST, and optional MCP for agent workflows.
Logical architecture
Operating model (Data Mesh)
Domain ownership
Each domain publishes discoverable, documented products (with SLAs, owners, contracts).
Platform guardrails
A central team runs the engine, catalog, policy, and observability without blocking domain autonomy.
Self-service
Bootstrap templates, connectors, CI/CD, and access requests via groups & tags.
Getting started (30–60–90)
Days 0–30: Foundation
- Deploy Trino/Starburst with 2 catalogs (lake + one external).
- Stand up Iceberg + catalog (Polaris/Nessie/HMS) on MinIO/S3/ADLS.
- Wire SSO (OIDC/SAML), bootstrap OPA policies (PII tags → column masking).
- Install OpenMetadata; ingest Trino, object store, and lineage.
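The two-catalog starting point can be sketched as a pair of Trino catalog property files, rendered here from Python so they can be templated later. This is a minimal sketch under assumptions: the Iceberg catalog is reached over REST (as Polaris exposes), the external source is PostgreSQL, and every hostname, file name, and database name is a placeholder rather than a value from this guide.

```python
from typing import Dict


def render_properties(props: Dict[str, str]) -> str:
    """Render a dict as the key=value lines of a Trino catalog .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Lake catalog: Iceberg tables registered in a REST catalog.
# The URI is a placeholder (assumption), not a value from this guide.
LAKE_CATALOG = {
    "connector.name": "iceberg",
    "iceberg.catalog.type": "rest",
    "iceberg.rest-catalog.uri": "http://polaris.example.internal:8181/api/catalog",
}

# External catalog: one operational database. PostgreSQL is an assumption;
# credentials are pulled from environment variables rather than stored in the file.
EXTERNAL_CATALOG = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://db.example.internal:5432/sales",
    "connection-user": "${ENV:PG_USER}",
    "connection-password": "${ENV:PG_PASSWORD}",
}

# File name (minus .properties) becomes the catalog name in SQL.
CATALOG_FILES = {
    "etc/catalog/lake.properties": render_properties(LAKE_CATALOG),
    "etc/catalog/sales_db.properties": render_properties(EXTERNAL_CATALOG),
}

if __name__ == "__main__":
    for path, body in CATALOG_FILES.items():
        print(f"# {path}\n{body}")
```

With both files in place, a single query could join `lake.<schema>.<table>` against `sales_db.<schema>.<table>`. Object store credentials and security settings are omitted and depend on the deployment.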
Days 31–60: First data products
- Create 3–5 gold tables per domain with data contracts and SLAs.
- Expose governed views; enable caching/CTAS patterns where useful.
- Set SLOs and dashboards for query cost, success rate, freshness.
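A data contract with a freshness SLO can be reduced to a small check that runs in CI or on a schedule. The sketch below is illustrative only: the `DataContract` shape, the `check_contract` helper, and the table name in the usage note are hypothetical, and a real setup would read the actual schema and last-update time from the catalog.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract shape: expected columns plus a freshness SLO."""
    table: str
    columns: Dict[str, str]    # column name -> expected type
    max_staleness: timedelta   # freshness SLO


def check_contract(
    contract: DataContract,
    actual_columns: Dict[str, str],
    last_updated: datetime,
    now: datetime,
) -> List[str]:
    """Return a list of violations; an empty list means the contract holds.

    Extra columns in the table are allowed, so additive changes do not break
    consumers. Missing columns and changed types are violations.
    """
    violations: List[str] = []
    for name, expected in contract.columns.items():
        actual = actual_columns.get(name)
        if actual is None:
            violations.append(f"missing column: {name}")
        elif actual != expected:
            violations.append(
                f"type mismatch on {name}: expected {expected}, got {actual}"
            )
    if now - last_updated > contract.max_staleness:
        violations.append("freshness SLO breached")
    return violations
```

For example, a contract for a hypothetical `lake.sales.orders_gold` table with a 24-hour freshness SLO would report a breach once the last update is more than a day old. The pass rate of this check over time is one way to feed a quality dashboard.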
Days 61–90: Scale & standardize
- Add 2–3 more domains; templatize onboarding via IaC + blueprints.
- Adopt resource groups and queues; define chargeback/showback.
- Optional: enable MCP endpoints for agent workflows over the catalog.
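Resource groups and queues per domain can be generated from the same onboarding template. The sketch below builds a configuration in the shape used by Trino's file-based resource group manager (`rootGroups`, `subGroups`, `selectors`). The limits, the `adhoc` catch-all group, and the convention that clients set a source name starting with their domain are assumptions for illustration; check field names against the engine version in use.

```python
import json
from typing import Dict


def build_resource_groups(domain_limits: Dict[str, int], adhoc_limit: int = 5) -> dict:
    """Build one leaf group per domain plus a catch-all 'adhoc' group.

    domain_limits maps a domain name to its concurrent query limit.
    Memory percentages and queue sizes are placeholder values.
    """
    leaves = dict(sorted(domain_limits.items()))
    sub_groups = [
        {
            "name": name,
            "softMemoryLimit": "30%",
            "hardConcurrencyLimit": limit,
            "maxQueued": limit * 10,
        }
        for name, limit in leaves.items()
    ]
    sub_groups.append(
        {
            "name": "adhoc",
            "softMemoryLimit": "10%",
            "hardConcurrencyLimit": adhoc_limit,
            "maxQueued": adhoc_limit * 10,
        }
    )
    # Assumed convention: clients set a source such as "sales-dashboard".
    selectors = [
        {"source": f"{name}-.*", "group": f"global.{name}"} for name in leaves
    ]
    # Selectors are evaluated in order, so the catch-all goes last.
    selectors.append({"group": "global.adhoc"})
    return {
        "rootGroups": [
            {
                "name": "global",
                "softMemoryLimit": "80%",
                "hardConcurrencyLimit": sum(leaves.values()) + adhoc_limit,
                "maxQueued": 1000,
                "subGroups": sub_groups,
            }
        ],
        "selectors": selectors,
    }


if __name__ == "__main__":
    print(json.dumps(build_resource_groups({"sales": 10, "finance": 5}), indent=2))
```

Because each domain maps to its own group, per-group query statistics can double as the basis for showback reports.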
Governance & security (pragmatic)
- Groups over users: Access via SSO groups mapped to OPA policies.
- Tag-driven rules: PII/SENSITIVE tags → dynamic column masking & row filters.
- Contracts: Schemas + test checks in CI; break-glass policy per domain.
- Audit: Query logs to lake; line-level access events retained per policy.
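The tag-driven rules above come down to two decisions per request: which rows a caller may see, and which columns are returned masked. In the stack described here that logic would live in OPA policies. The Python below only mirrors the decision flow so it can be read and tested in isolation. The tag-to-group mapping, the group names, and the region-based row filter are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Set

Row = Dict[str, Any]

# Hypothetical mapping: tag -> SSO groups allowed to see raw values.
UNMASK_GROUPS: Dict[str, Set[str]] = {
    "PII": {"privacy-office"},
    "SENSITIVE": {"privacy-office", "finance-analysts"},
}


def can_see_raw(tags: Set[str], user_groups: Set[str]) -> bool:
    """A column is readable in the clear only if every governed tag on it
    grants access to at least one of the caller's groups."""
    for tag in tags:
        allowed = UNMASK_GROUPS.get(tag)
        if allowed is not None and not (allowed & user_groups):
            return False
    return True


def region_filter(row: Row, user_groups: Set[str]) -> bool:
    """Example row filter (assumption): group 'region-eu' unlocks rows with region 'eu'."""
    return f"region-{row['region']}" in user_groups


def apply_policy(
    rows: List[Row],
    column_tags: Dict[str, Set[str]],
    user_groups: Set[str],
    row_filter: Callable[[Row, Set[str]], bool],
) -> List[Row]:
    """Drop rows the caller may not see, then mask tagged columns."""
    visible = []
    for row in rows:
        if not row_filter(row, user_groups):
            continue
        visible.append(
            {
                col: value
                if can_see_raw(column_tags.get(col, set()), user_groups)
                else "***"
                for col, value in row.items()
            }
        )
    return visible
```

Keying the rules on tags and groups rather than on table or user names means a newly tagged column is covered without a policy change.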
Success measures
- Time to publish a new data product (PRD → prod).
- Adoption: active consumers, product reuse across domains.
- Cost efficiency: bytes scanned / query, cache hit-rate, egress avoided.
- Quality: contract pass-rate, freshness SLO, incident MTTR.
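Several of these measures are simple ratios over query events and incident records. The sketch below shows one way to compute them. The `QueryEvent` fields and both helper functions are hypothetical, and real inputs would come from the query event log and the incident tracker.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class QueryEvent:
    """Hypothetical subset of a query completion event."""
    bytes_scanned: int
    cache_hit: bool
    succeeded: bool


def summarize(events: List[QueryEvent]) -> Dict[str, float]:
    """Average bytes scanned per query, cache hit rate, and success rate."""
    if not events:
        raise ValueError("no query events to summarize")
    n = len(events)
    return {
        "bytes_scanned_per_query": sum(e.bytes_scanned for e in events) / n,
        "cache_hit_rate": sum(e.cache_hit for e in events) / n,
        "success_rate": sum(e.succeeded for e in events) / n,
    }


def mttr_minutes(incidents: List[Tuple[datetime, datetime]]) -> float:
    """Mean time to resolve, given (opened, resolved) timestamp pairs."""
    if not incidents:
        raise ValueError("no incidents to average")
    total_seconds = sum(
        (resolved - opened).total_seconds() for opened, resolved in incidents
    )
    return total_seconds / len(incidents) / 60
```

Tracking the same numbers per domain rather than only platform-wide makes it easier to see which products drive cost and which are being reused.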