# Hi, I’m Vivek. I’m a Principal Engineer, and I design AI-ready data platforms for enterprise-scale systems.
My focus is on building the backbone for AI — enabling data to flow securely and reliably from multiple sources through governed, metadata-driven layers so it can be safely consumed by analytics and AI systems.
More recently, I’ve been working on agentic AI use cases, where intelligent agents can interact with data platforms, translate intent into queries, and retrieve insights in a controlled and auditable way.
What excites me about this opportunity is applying these ideas beyond a single enterprise — and working at Salesforce scale to enable AI-driven insights across global employee systems.
# One project I’m particularly proud of is the AI-ready data platform I built at my organization.
The challenge was that data was spread across multiple systems, and teams struggled to access it in a consistent and governed way. So I designed a platform that combined real-time pipelines with a federated query layer, allowing users to access data from multiple sources using a single interface.
On top of that, I introduced a metadata-driven governance layer to ensure secure and controlled access, which was especially important for sensitive data.
This platform now supports over 200,000 queries per day across more than 1,000 users, with a 99.98% success rate — and it also became the foundation for enabling AI use cases, where agents can interact with data in a governed way.
You mentioned agentic AI — can you explain what that means in your work, in simple terms?
In simple terms, agentic AI means systems where AI doesn’t just generate insights, but can actively interact with data and systems to complete tasks.
In my work, I’ve been building platforms where an AI agent can understand a user’s request, translate it into a query or API call, and retrieve the right data — all within governed and secure boundaries.
For example, instead of a user manually writing SQL, an agent can do that on their behalf and return insights, while ensuring access controls and policies are respected.
So it’s essentially moving from passive analytics to AI systems that can take actions in a controlled and reliable way.
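The pattern above — an agent that maps a user’s intent to a query only after a policy check — can be sketched minimally. Everything here (`INTENT_TEMPLATES`, `ACCESS_POLICY`, `handle_request`) is an illustrative stand-in, not the actual platform’s API:

```python
# Minimal sketch of a governed "intent -> query" agent.
# All names and tables here are hypothetical.

INTENT_TEMPLATES = {
    # Maps a recognized user intent to a parameterized SQL template.
    "open_vulns_by_team": (
        "SELECT team, COUNT(*) AS open_vulns "
        "FROM vulnerabilities WHERE status = 'open' GROUP BY team"
    ),
}

ACCESS_POLICY = {
    # Which roles may run which intents (stand-in for a real policy engine).
    "analyst": {"open_vulns_by_team"},
    "viewer": set(),
}

def handle_request(user_role: str, intent: str) -> str:
    """Translate an intent into SQL only if policy allows it."""
    if intent not in INTENT_TEMPLATES:
        raise ValueError(f"Unknown intent: {intent}")
    if intent not in ACCESS_POLICY.get(user_role, set()):
        raise PermissionError(f"Role '{user_role}' may not run '{intent}'")
    sql = INTENT_TEMPLATES[intent]
    # A real system would execute this against the query engine and log
    # the decision for auditability; here we just return the SQL.
    return sql

print(handle_request("analyst", "open_vulns_by_team"))
```

The important design choice is that the policy check happens before any SQL is generated or executed, so the agent can never act outside its granted scope.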
That’s really interesting. How do you ensure data quality and reliability in the pipelines you build?
For me, data quality and reliability start at ingestion and are enforced throughout the pipeline.
I implement validation checks at multiple stages — for example, schema validation, data completeness, and anomaly detection during ingestion. I also use monitoring and alerting to detect failures or unexpected patterns early.
On top of that, I focus on making pipelines observable — tracking metrics like success rates, latency, and data freshness — so issues can be identified and resolved quickly.
Finally, I integrate governance through metadata, so there is clear lineage and traceability, which helps in both debugging and ensuring trust in the data.
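The ingestion-time checks described above can be sketched as a single validator. The field names, types, and thresholds below are assumptions for illustration, not the actual pipeline’s schema:

```python
# Illustrative ingestion-time checks: schema validation, completeness,
# and a simple range-based anomaly check. Schema is hypothetical.

EXPECTED_SCHEMA = {"host_id": str, "cve_id": str, "severity": float}

def validate_record(record: dict) -> list[str]:
    """Return a list of data-quality issues found in one record."""
    issues = []
    # Schema validation: required fields present with the right types.
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            issues.append(f"bad type for {field}")
    # Completeness: no empty values anywhere in the record.
    if any(v in (None, "") for v in record.values()):
        issues.append("empty value")
    # Anomaly check: CVSS-style severity must fall in [0, 10].
    sev = record.get("severity")
    if isinstance(sev, float) and not 0.0 <= sev <= 10.0:
        issues.append("severity out of range")
    return issues
```

In a pipeline, records failing these checks would be routed to a quarantine table and surface in the monitoring metrics mentioned above, rather than silently dropped.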
Why do you think you’re a good fit for this role at Salesforce?
I think I’m a strong fit because my experience sits at the intersection of data engineering, AI, and platform design — which aligns closely with what this role is looking for.
I’ve built large-scale, production-grade data platforms that not only handle high-volume pipelines, but also enable governed and secure access to data, which is critical when AI systems are involved.
More recently, I’ve been working on agentic AI use cases, which I believe is very aligned with Salesforce’s direction around AI-driven insights and automation.
And beyond the technical side, I enjoy working closely with stakeholders to turn data into meaningful outcomes — which I understand is a key part of this role.
What are the biggest data challenges the team is solving today?
How is Salesforce thinking about governance and trust when AI agents interact with employee data?
I’m motivated by the opportunity to apply my experience in AI-driven data platforms to a global, AI-first environment like Salesforce, where data can directly drive intelligent workflows and decisions.
For my case study, I’d like to walk through my first use case: the federated query engine.
I faced a critical challenge where software vulnerabilities were continuously emerging, and the response process was too slow to keep up.
Whenever a high-severity vulnerability was published, it would take weeks to identify which systems were affected, who owned them, and where those software components were running across the organization.
The core issue was that data was heavily siloed across multiple systems—asset inventories, vulnerability scanners, ownership records—and there was no unified or reliable way to correlate this information quickly.
I initially tried to solve this by building a centralized data lake, but that introduced another problem: the data quality degraded compared to source systems, and I still couldn’t achieve real-time visibility.
The business impact was significant—slow response times meant increased exposure to critical vulnerabilities, and the organization was unable to take timely defensive actions.
So instead of continuing with a centralized ingestion-heavy approach, I shifted to a federated data strategy.
I designed a distributed data mesh using a SQL-on-anything approach with Trino, allowing teams to query data directly from source systems in real time—without waiting for ingestion pipelines.
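A federated query in this style joins live data across source systems in one statement. The sketch below shows the shape of such a Trino query; the catalog, schema, and table names are purely illustrative, not the actual systems:

```sql
-- Hypothetical cross-source join in Trino; names are illustrative.
SELECT a.hostname,
       a.owner_email,
       v.cve_id,
       v.severity
FROM   asset_db.inventory.hosts AS a              -- asset inventory source
JOIN   scanner.findings.vulnerabilities AS v      -- vulnerability scanner source
       ON v.host_id = a.host_id
WHERE  v.severity >= 9.0                          -- critical findings only
  AND  v.status = 'open';
```

Because each catalog maps to a live connector rather than an ingested copy, the result reflects the source systems at query time — which is what eliminated the ingestion lag.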
The key trade-off I had to manage was speed vs. governance:
- Direct access improved freshness and response time
- But it required strong access control and data consistency guarantees
To address this, I implemented metadata-driven governance using OpenMetadata and OPA, ensuring that access decisions were dynamic, policy-based, and aligned with ownership and sensitivity of data.
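The access decision described above can be illustrated with a toy policy function. A real deployment would delegate this to OPA (with policies written in Rego) using tags sourced from OpenMetadata; every name and tag below is a hypothetical example:

```python
# Toy version of a metadata-driven access decision. In production this
# logic would live in OPA/Rego, with tags from a catalog like
# OpenMetadata. All table names and tags here are illustrative.

TABLE_METADATA = {
    "scanner.findings.vulnerabilities": {"sensitivity": "internal", "owner": "secops"},
    "hr.core.employees": {"sensitivity": "restricted", "owner": "hr"},
}

def allow_query(user: dict, table: str) -> bool:
    """Restricted tables require membership in the owning team;
    internal tables are open to any authenticated employee."""
    meta = TABLE_METADATA.get(table)
    if meta is None:
        return False  # unknown table: deny by default
    if meta["sensitivity"] == "restricted":
        return meta["owner"] in user.get("teams", [])
    return user.get("authenticated", False)

print(allow_query({"authenticated": True, "teams": ["secops"]},
                  "scanner.findings.vulnerabilities"))  # → True
```

The point of driving this from metadata is that access rules follow the data’s tags, so onboarding a new source means tagging it, not writing new enforcement code.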
I also made a conscious decision to avoid duplicating data wherever possible, which helped reduce infrastructure costs and eliminate inconsistencies introduced by batch pipelines.
As a result:
- I reduced vulnerability response time from weeks to near real time
- Enabled faster identification of impacted systems and their owners
- Improved data trust by querying source-of-truth systems directly
- Significantly strengthened the firm’s ability to respond to emerging threats proactively
More importantly, this shifted the platform from a passive data system to an actionable intelligence layer, which is something I’m very passionate about building further — especially in the context of AI-driven and agent-based systems.