🔥 Q1: How does Trino execute a query across multiple sources?

Answer

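Trino follows a distributed execution model: a coordinator parses and plans the query, and workers execute the plan in parallel, reading from each source through its connector.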

Key point: Trino minimizes data movement by pushing computation closer to the source wherever possible.
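
As a toy illustration of the key point, the sketch below compares how many rows cross the network when a filter runs at the source versus at the engine. The table contents, the `scan` helper, and the `pushdown` flag are invented for illustration; this is a model of the idea, not Trino's API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical source table: 1,000 orders, 10 of them in region "EU".
ORDERS: List[Row] = [
    {"order_id": i, "region": "EU" if i % 100 == 0 else "US"}
    for i in range(1000)
]


def scan(
    table: List[Row],
    predicate: Callable[[Row], bool],
    pushdown: bool,
) -> Tuple[List[Row], int]:
    """Return (result_rows, rows_transferred_from_source)."""
    if pushdown:
        # The source applies the filter, so only matching rows are shipped.
        transferred = [row for row in table if predicate(row)]
        return transferred, len(transferred)
    # Without pushdown the whole table is shipped and filtered at the engine.
    transferred = list(table)
    result = [row for row in transferred if predicate(row)]
    return result, len(transferred)


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"
```

Both modes return the same rows; only the transfer count differs, which is the cost that pushdown is meant to reduce.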


🔥 Q2: How did you avoid performance bottlenecks and large shuffles?

Answer

I focused on reducing data movement:

Also, for frequently used joins, I created curated datasets to avoid repeated heavy shuffles.

One-liner: Performance was about controlling data movement, not just scaling compute.


🔥 Q3: What if a source system is slow or down?

Answer

Yes, that’s a real challenge in federation.

I handled it in three ways:

For operational use cases, I ensured dependencies were known and monitored.

That way, queries could fail gracefully or fall back, depending on the use case.
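
The "fail gracefully or fall back" behavior can be sketched as a small wrapper. This is a minimal sketch under assumptions: `run_with_fallback`, `SourceUnavailable`, and the idea that the fallback reads from something like a cached or curated copy are hypothetical names and choices, not details from the actual platform.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


class SourceUnavailable(Exception):
    """Raised when the live source fails and no fallback is configured."""


def run_with_fallback(
    primary: Callable[[], T],
    fallback: Optional[Callable[[], T]] = None,
) -> Tuple[T, str]:
    """Run `primary`; on any error use `fallback` if one was given.

    Returns the result together with a label saying which path answered,
    so callers can tell when they are looking at fallback data.
    """
    try:
        return primary(), "primary"
    except Exception as exc:
        if fallback is None:
            # Fail gracefully: one clear error instead of a raw driver error.
            raise SourceUnavailable(f"primary source failed: {exc}") from exc
        return fallback(), "fallback"
```

Returning the label alongside the result keeps the fallback visible to the caller rather than silently serving older data.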


🔥 Q4: How did you handle data consistency?

Answer

I accepted that strong consistency across distributed systems is not always feasible.

So I handled it through:

For critical decisions, I used either:

So instead of forcing consistency, I made it explicit and manageable.


🔥 Q5: Where is OPA sitting? What about latency?

Answer

OPA was integrated in the query path via Trino’s access control layer.

To reduce latency:

The impact was minimal: typically milliseconds per query.


🔥 Q6: What if OPA is down?

Answer

I chose a fail-closed approach for security.

For critical systems, availability of the governance layer was treated as a tier-1 dependency.

Security > availability in this context.

Strong line: I never compromise on access control for availability.
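
A fail-closed check can be sketched in a few lines. This is an illustrative sketch only: `is_allowed` and the `check_policy` callable are hypothetical stand-ins for whatever client talks to the policy engine, not OPA's or Trino's actual interface.

```python
from typing import Callable


def is_allowed(
    check_policy: Callable[[str, str], bool],
    user: str,
    resource: str,
) -> bool:
    """Fail-closed authorization: anything other than an explicit True is a deny."""
    try:
        # Only a literal True grants access; None or other values deny.
        return check_policy(user, resource) is True
    except Exception:
        # Policy engine unreachable or erroring: deny rather than allow.
        return False
```

The design choice is that an outage of the policy engine shows up as denied queries, never as unchecked access.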


🔥 Q7: What exactly did YOU do?

Answer

I led the architecture and execution of the platform.

Specifically, I:

I was responsible end-to-end - from design to adoption.

Note: Prefer “I” unless you are explicitly speaking for an org-level decision.


🔥 Q8: Why not APIs instead of federated SQL?

Answer

APIs work well for predefined use cases, but they don’t scale for exploratory or cross-domain queries.

In my case:

With SQL federation:

So APIs are great for operational workflows, but federation is better for analytical flexibility and speed.


🔥 Q9: Something that failed?

Answer

One issue I faced early was performance degradation when users ran unoptimized cross-source joins.

Some queries were pulling large datasets unnecessarily.

To fix this, I:

This improved performance and reduced system load significantly.

It reinforced that federation needs guardrails, not just access.
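
One possible shape for such a guardrail is a pre-flight check that flags large unfiltered scans before a query runs. The sketch below is an assumption about what a guardrail could look like, not the fix that was actually applied; `ScanEstimate`, `guardrail_violations`, and the row limit are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScanEstimate:
    source: str          # catalog or table being scanned (hypothetical label)
    estimated_rows: int  # planner-style row estimate for the scan
    has_filter: bool     # whether the query constrains this scan


def guardrail_violations(
    scans: List[ScanEstimate],
    max_rows: int = 1_000_000,
) -> List[str]:
    """Return one message per scan that is both large and unfiltered."""
    problems = []
    for scan in scans:
        if scan.estimated_rows > max_rows and not scan.has_filter:
            problems.append(
                f"{scan.source}: unfiltered scan of ~{scan.estimated_rows} rows "
                f"exceeds limit {max_rows}"
            )
    return problems
```

An empty list means the query passes; a non-empty list can be used to reject the query or to ask the user to add a filter.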


🔥 Q10: If you build this today for Salesforce?

Answer

I would evolve it in three ways:

First, I would invest more in the semantic layer - making data more discoverable and AI-ready.

Second, I would design the platform to be agent-first, where AI systems can:

Third, I would standardize data contracts and SLAs earlier, especially across global teams.

So the architecture would remain similar, but more focused on: