🔥 Q1: How does Trino execute a query across multiple sources?
Answer
Trino follows a distributed execution model.
- The coordinator parses the SQL, builds and optimizes a logical plan, and fragments it into a distributed plan of stages.
- Each stage is broken into tasks, which are distributed to workers.
- Workers use connectors to push down filters/projections to source systems where possible.
- Data is streamed back in pages, and joins/aggregations are executed in parallel across workers.
- Finally, results are merged and returned to the client.
Key point: Trino minimizes data movement by pushing computation closer to the source wherever possible.
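A federated query makes this concrete. The catalog, schema, and table names below are illustrative, but the shape is standard Trino: one statement joining two catalogs, with the filter pushed down to the source where the connector supports it.

```sql
-- Join orders stored in a Hive catalog with customers in a PostgreSQL catalog.
-- Trino pushes the region filter down to PostgreSQL where possible,
-- so only matching customer rows travel over the network.
SELECT o.order_id, c.customer_name, o.total
FROM hive.sales.orders o
JOIN postgresql.crm.customers c
  ON o.customer_id = c.customer_id
WHERE c.region = 'EMEA';
```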
🔥 Q2: How did you avoid performance bottlenecks and large shuffles?
Answer
I focused on reducing data movement:
- Pushed down predicates and projections to limit scanned data
- Preferred broadcast joins for small dimension tables
- Pre-aggregated heavy datasets into Iceberg
- Applied partition pruning where possible
- Set query limits and resource groups for control
Also, for frequently used joins, I created curated datasets to avoid repeated heavy shuffles.
One-liner: Performance was about controlling data movement, not just scaling compute.
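As a sketch of the join-distribution point: Trino exposes a `join_distribution_type` session property (values include `AUTOMATIC`, `PARTITIONED`, `BROADCAST`), and `EXPLAIN` shows what the optimizer actually chose. Table names here are illustrative.

```sql
-- Force a broadcast join when the dimension side is known to be small;
-- the default AUTOMATIC mode normally decides this from table statistics.
SET SESSION join_distribution_type = 'BROADCAST';

-- Inspect the plan to confirm pushdown and the chosen join distribution.
EXPLAIN
SELECT f.sku, d.category, sum(f.qty) AS units
FROM hive.sales.fact_sales f
JOIN postgresql.ref.dim_product d ON f.sku = d.sku
GROUP BY f.sku, d.category;
```

Broadcasting the small side avoids repartitioning the large fact table, which is exactly the shuffle this answer is about controlling.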
🔥 Q3: What if a source system is slow or down?
Answer
Yes, that’s a real challenge in federation.
I handled it in three ways:
- Timeouts and fail-fast configs in Trino
- Critical datasets had fallback curated copies (Iceberg)
- I defined SLA tiers - not all queries were real-time
For operational use cases, I ensured dependencies were known and monitored.
So queries could fail gracefully or fall back, depending on the use case.
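The fail-fast side can be expressed in Trino's coordinator config. These two properties are real Trino settings; the values are illustrative, not a recommendation.

```properties
# etc/config.properties – illustrative fail-fast limits
# Cap active execution time so a slow source cannot hold a query forever.
query.max-execution-time=5m
# Cap total wall-clock time including queueing.
query.max-run-time=10m
```

The same limits can also be set per query via the matching session properties, which is useful for giving SLA tiers different budgets.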
🔥 Q4: How did you handle data consistency?
Answer
I accepted that strong consistency across distributed systems is not always feasible.
So I handled it through:
- Clear data contracts - each source was the system of record
- Timestamp-based correlation for aligning datasets
- Communicating data freshness expectations to users
For critical decisions, I used either:
- Same-time snapshots where possible
- Or validated joins with time windows
So instead of forcing consistency, I made it explicit and manageable.
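A validated time-window join can be sketched in Trino SQL. The catalogs and columns are hypothetical; the point is joining on a key plus a tolerance window rather than assuming exact timestamp equality across systems.

```sql
-- Correlate events from two systems within a 5-minute window,
-- acknowledging that their clocks and ingestion delays differ.
SELECT a.event_id AS app_event, b.event_id AS billing_event
FROM kafka.ops.app_events a
JOIN iceberg.ops.billing_events b
  ON a.user_id = b.user_id
 AND b.event_ts BETWEEN a.event_ts - INTERVAL '5' MINUTE
                    AND a.event_ts + INTERVAL '5' MINUTE;
```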
🔥 Q5: Where is OPA sitting? What about latency?
Answer
OPA was integrated in the query path via Trino’s access control layer.
- Each query triggers a policy evaluation call to OPA
- Policies are pre-loaded into OPA, so every decision is evaluated locally
To keep latency down:
- Policies and data were cached inside OPA
- Evaluations were lightweight (no external calls at decision time)
Impact was minimal - typically milliseconds per query.
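For reference, Trino ships an OPA access-control plugin that is wired in roughly like this; treat the exact property names and the policy path as illustrative and check them against the Trino version in use.

```properties
# etc/access-control.properties – illustrative OPA wiring
access-control.name=opa
# Endpoint of the OPA decision document Trino queries per access check.
opa.policy.uri=http://opa:8181/v1/data/trino/allow
```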
🔥 Q6: What if OPA is down?
Answer
I chose a fail-closed approach for security.
- If OPA is unavailable -> access is denied
- I set up high availability for OPA (replicas + health checks)
For critical systems, the availability of the governance layer was treated as a tier-1 dependency.
Security > availability in this context.
Strong line: I never compromise on access control for availability.
🔥 Q7: What exactly did YOU do?
Answer
I led the architecture and execution of the platform.
Specifically, I:
- Defined the shift from centralized to federated architecture
- Designed integration between Trino, metadata, and governance layers
- Led discussions with security, platform, and governance teams
- Drove implementation of policy-based access using OPA
- Ensured phased rollout with measurable impact
I was responsible end-to-end - from design to adoption.
Note: Prefer “I” unless you are explicitly speaking for an org-level decision.
🔥 Q8: Why not APIs instead of federated SQL?
Answer
APIs work well for predefined use cases, but they don’t scale for exploratory or cross-domain queries.
In my case:
- I needed ad-hoc correlation across multiple systems
- APIs would require building and maintaining multiple integrations
- Every new question would mean new API development
With SQL federation:
- Users could query across systems dynamically
- No additional development was needed
So APIs are great for operational workflows, but federation is better for analytical flexibility and speed.
🔥 Q9: Something that failed?
Answer
One issue I faced early was performance degradation when users ran unoptimized cross-source joins.
Some queries were pulling large datasets unnecessarily.
To fix this, I:
- Introduced query governance controls
- Added training and best practices
- Created curated datasets for heavy joins
This improved performance and reduced system load significantly.
It reinforced that federation needs guardrails, not just access.
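The "query governance controls" can be made concrete with Trino's file-based resource groups (enabled via `resource-groups.configuration-manager=file`). The group name and limits below are illustrative.

```json
{
  "rootGroups": [
    {
      "name": "adhoc",
      "softMemoryLimit": "20%",
      "hardConcurrencyLimit": 10,
      "maxQueued": 50
    }
  ],
  "selectors": [
    { "group": "adhoc" }
  ]
}
```

Capping concurrency and memory for ad-hoc users is the guardrail that keeps one unoptimized cross-source join from degrading the whole cluster.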
🔥 Q10: If you were building this today for Salesforce?
Answer
I would evolve it in three ways:
First, I would invest more in the semantic layer - making data more discoverable and AI-ready.
Second, I would design the platform to be agent-first, where AI systems can:
- Discover metadata
- Generate queries
- And act safely with governance
Third, I would standardize data contracts and SLAs earlier, especially across global teams.
So the architecture would remain similar, but more focused on:
- AI-driven interaction
- Stronger semantics
- And standardized governance from day one.