Secretless by Design: How Aembit Secures Azure Databricks Pipelines Without Static Credentials


CISO TL;DR

Your Azure Databricks pipelines are probably authenticating to downstream services (Azure APIs, Salesforce, whatever they need) using static service principal credentials embedded in pipeline configuration. Those credentials live for months or years, rarely get rotated, and are difficult to track. When they leak, the blast radius is significant.

Aembit eliminates them entirely. By cryptographically verifying the identity of pipelines at runtime and issuing short-lived, ephemeral tokens in their place, Aembit removes the need to store, distribute, rotate, or audit static credentials across your data engineering environment. Policy-based. Fully auditable. Minimal to no code changes required depending on deployment.

Practitioner TL;DR

Aembit can use the Azure Managed Identity assigned to your Azure Databricks compute cluster, or a Databricks-issued OIDC token for the pipeline, to cryptographically attest the workload’s identity. Once the identity is verified against a configured trust provider, Aembit evaluates access policy and performs a token exchange, issuing a short-lived OAuth token for the target service (Azure services such as the Microsoft Graph API, Salesforce, or others).

The pipeline gets the token it needs, uses it, and it expires. No secrets stored in the pipeline. No service principal credentials sitting in config files. Attestation can be configured at the cluster level or scoped to individual pipelines using process-level identifiers. All access events are logged and exportable to your SIEM.

Introduction

Modern enterprises run on interconnected services. Data pipelines pull from APIs, push to cloud platforms, and trigger workflows across SaaS applications, often dozens of them, across business units and environments. Making all of that work requires those services to trust each other, and trust, in practice, has traditionally meant credentials: API keys, service principal secrets, OAuth tokens, and passwords distributed across configuration files, environment variables, and secrets vaults.

The problem isn’t that organizations haven’t tried to manage this carefully. Most have. The problem is that static credentials are fundamentally difficult to secure at scale. They get copied. They outlive their purpose. They accumulate in places nobody remembers. And when one leaks, or gets swept up in a broader incident, the damage extends well beyond the original scope.

Aembit was built to solve exactly this. Rather than helping organizations manage static credentials more carefully, Aembit removes the need for them altogether. Workloads authenticate in real time, access is granted based on verified identity and policy, and the credentials issued are ephemeral, expiring in minutes, not months. It’s workload identity and access management designed for the way modern infrastructure actually works.

The Use Case: Azure Databricks Pipelines Accessing Cloud and SaaS Resources

Teams using Azure Databricks often need their pipelines to reach out to other services: calling the Microsoft Graph API to pull directory data, writing to Salesforce, or accessing internal Azure resources. Today, the most common way to enable that access is a service principal with a client secret baked into the pipeline configuration.

That approach works, but it carries real risk. The secret doesn’t expire automatically. Anyone who can read the pipeline configuration has persistent access to whatever that service principal can reach. Rotation is manual and disruptive, and auditing who used the credential, and when, is difficult. Consider a real case from an unnamed organization: a contractor helped build a Databricks pipeline and had access to the secret during implementation. Months after the project ended, that credential still worked, because nothing tied access to the person or their employment status. The exposure was only caught when a third-party auditor flagged it.

The question Aembit answers is: what if the pipeline never needed to know its own secret?

How Aembit Works with Azure Databricks

Aembit replaces the need for a static secret by supplying ephemeral tokens based on workload identity attestation. When Azure provisions a compute cluster in Azure Databricks, it automatically creates and assigns a Managed Identity to the underlying virtual machines. Tokens for that managed identity are issued and signed by Microsoft Entra ID, so they can be verified cryptographically, and there is no secret sitting in the pipeline configuration to copy or forge. Aembit uses that identity as the foundation for workload attestation.

Here’s what the flow looks like in practice:

Step 1: Identity attestation

When a pipeline needs to access a downstream service, it obtains an identity token in one of two ways: it calls the Azure Instance Metadata Service (IMDS) to retrieve a token for the compute node’s managed identity, or it uses a Databricks OIDC token issued by the Databricks Authorization Server and scoped to the job’s Run As identity. This token is sent to Aembit.
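For the managed-identity path, retrieving the token is a plain HTTP GET to IMDS from the cluster node. The Python sketch below is a minimal illustration, assuming code running on a Databricks node with a managed identity attached. The helper names are hypothetical, the `resource` (audience) value that Aembit’s trust provider expects is deployment-specific and only a placeholder here, and depending on deployment this call may be made by Aembit’s tooling rather than by pipeline code. The Databricks OIDC path is not shown.

```python
import json
import urllib.parse
import urllib.request

# Well-known, non-routable IMDS address reachable only from inside Azure VMs.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource: str, api_version: str = "2018-02-01") -> urllib.request.Request:
    """Build (but do not send) a managed identity token request to IMDS.

    `resource` is the audience the token is requested for; the value your
    Aembit trust provider expects is a deployment-specific assumption.
    """
    query = urllib.parse.urlencode({"api-version": api_version, "resource": resource})
    # IMDS rejects token requests that lack the "Metadata: true" header.
    return urllib.request.Request(f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


def fetch_managed_identity_token(resource: str, timeout: float = 5.0) -> str:
    """Send the request; only succeeds on a VM with a managed identity attached."""
    req = build_imds_request(resource)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # The JSON response carries the JWT in "access_token" (plus "expires_on" etc.).
    return body["access_token"]
```

`build_imds_request` is split out so the request shape can be inspected anywhere; `fetch_managed_identity_token` only works from inside Azure.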

Step 2: Trust provider verification

Aembit’s trust provider validates the token cryptographically and checks the claims within it against the configured policy. This confirms that the request is genuinely coming from the expected Azure Databricks cluster and pipeline.
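To make the claim-checking half of this step concrete, here is a minimal sketch. A real trust provider must first verify the token’s signature against the issuer’s published signing keys; this sketch deliberately skips that and only decodes the payload, so it must not be used as-is for authentication. The claims used in the example (`iss`, `oid`) are standard Microsoft Entra ID access-token claims, but which claims Aembit actually matches is set in its trust provider configuration, and the helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying its signature (illustration only)."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def claims_match_policy(claims: dict, expected: dict, now: Optional[float] = None) -> bool:
    """Return True if the token is within its validity window and every
    expected claim has exactly the expected value."""
    now = time.time() if now is None else now
    if claims.get("exp", 0) <= now:
        return False  # expired, or no exp claim at all: reject
    if claims.get("nbf", 0) > now:
        return False  # not valid yet
    return all(claims.get(name) == value for name, value in expected.items())
```

For example, a policy might expect `{"iss": "https://sts.windows.net/<tenant-id>/", "oid": "<cluster managed identity object ID>"}`, with the placeholders filled from your own tenant.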

Step 3: Policy evaluation and token exchange

Once identity is confirmed, Aembit evaluates the access policy: which target service is being requested, and what credential should be issued? Aembit then performs a token exchange with the target service, either with Microsoft Entra ID via Workload Identity Federation or with a third-party service like Salesforce via OAuth, and returns a short-lived access token.
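For Azure targets, this exchange corresponds to the Microsoft identity platform’s client credentials grant with a federated credential: instead of a `client_secret`, the caller presents a short-lived JWT as `client_assertion`. The sketch below only builds that request. The tenant ID, client ID, and assertion token are placeholders, the function name is hypothetical, and how Aembit obtains or formats the assertion internally is not described in this post.

```python
import urllib.parse
from typing import Tuple

JWT_BEARER_ASSERTION = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_entra_federated_token_request(
    tenant_id: str,
    client_id: str,
    federated_jwt: str,
    scope: str = "https://graph.microsoft.com/.default",
) -> Tuple[str, str]:
    """Return (url, form body) for a client-credentials request that uses a
    federated JWT as the client assertion instead of a client secret."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,                 # app registration trusting the federated issuer
        "client_assertion_type": JWT_BEARER_ASSERTION,
        "client_assertion": federated_jwt,      # short-lived token, not a stored secret
        "scope": scope,                         # e.g. Microsoft Graph
    }
    return url, urllib.parse.urlencode(form)
```

In a real exchange the body is POSTed with `Content-Type: application/x-www-form-urlencoded`, and the `access_token` in the JSON response is what goes back to the pipeline.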

Step 4: Access and expiry

The pipeline uses the token to make its API call. The token expires. No secret was ever stored in the pipeline, and nothing needs to be rotated.

From the pipeline’s perspective, the experience is simple: request access, get a token, use it. The security complexity is handled by Aembit.
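From the pipeline side, "request access, get a token, use it" can be as small as a wrapper that asks for a token, reuses it until shortly before it expires, and then asks again. This is a hypothetical helper, not an Aembit SDK: `EphemeralTokenSource` and its `fetch` callable are assumptions standing in for however your deployment delivers tokens. In deployments where Aembit supplies credentials without code changes, no such wrapper may be needed at all.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class _CachedToken:
    value: str
    expires_at: float  # Unix timestamp


class EphemeralTokenSource:
    """Hands out a short-lived token, fetching a new one shortly before expiry.

    `fetch` is a hypothetical callable returning (token, lifetime_seconds).
    """

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.time,
    ):
        self._fetch = fetch
        self._skew = skew_seconds
        self._clock = clock
        self._cached: Optional[_CachedToken] = None

    def token(self) -> str:
        now = self._clock()
        # Refresh when there is no token or it is within `skew_seconds` of expiring.
        if self._cached is None or now >= self._cached.expires_at - self._skew:
            value, lifetime = self._fetch()
            self._cached = _CachedToken(value, now + lifetime)
        return self._cached.value

    def auth_header(self) -> Dict[str, str]:
        return {"Authorization": f"Bearer {self.token()}"}
```

The token lives only in memory and only for its short lifetime; the skew margin avoids presenting a token that expires mid-request.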

Cluster-Level and Pipeline-Level Attestation

Aembit supports two levels of granularity for workload identification in Azure Databricks, giving security and platform teams flexibility to match their environment and risk requirements.

Cluster-level attestation uses the shared managed identity assigned to the compute cluster. All pipelines running on that cluster share the same identity for attestation purposes. This is straightforward to configure and works well when pipelines on a given cluster share a common access profile.

Pipeline-level attestation adds a process identifier, such as the pipeline name or job name, directly within the pipeline as a client identifier. This allows Aembit to distinguish individual pipelines running on the same cluster, enabling more granular access policies and credential issuance. Alternatively, a Databricks-issued OIDC token, scoped by the Databricks Authorization Server to each job’s Run As service principal, provides a cryptographically verifiable pipeline identity without requiring process-level configuration. Either way, a pipeline responsible for pulling HR data can be scoped to exactly that, separately from a pipeline with broader analytical access.
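A policy table keyed on both levels of identity shows how cluster-level and pipeline-level attestation can combine. This is an illustrative model only, with hypothetical identities and target names, not Aembit’s policy engine: a pipeline-specific entry wins, and any other pipeline on the cluster falls back to the cluster-wide entry.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class WorkloadIdentity:
    cluster_identity: str               # attested cluster identity (cluster-level)
    pipeline_id: Optional[str] = None   # job name or Run As principal (pipeline-level)


# Hypothetical policy table: (cluster identity, pipeline ID or None) -> allowed targets.
POLICIES: Dict[Tuple[str, Optional[str]], FrozenSet[str]] = {
    ("analytics-cluster-mi", "hr-directory-sync"): frozenset({"microsoft-graph"}),
    ("analytics-cluster-mi", None): frozenset({"microsoft-graph", "salesforce"}),
}


def allowed_targets(identity: WorkloadIdentity) -> FrozenSet[str]:
    """Prefer a pipeline-specific policy; otherwise fall back to the cluster-wide one."""
    if identity.pipeline_id is not None:
        specific = POLICIES.get((identity.cluster_identity, identity.pipeline_id))
        if specific is not None:
            return specific
    return POLICIES.get((identity.cluster_identity, None), frozenset())
```

A stricter variant would drop the fallback, so that a pipeline with no policy of its own is denied rather than inheriting the cluster’s broader access.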

Both configurations are supported today and can be set up through the Aembit console, API, or Terraform provider, so they fit naturally into existing infrastructure-as-code workflows.

Beyond Azure: Extending to Third-Party Services

One of the more powerful aspects of this pattern is that it extends beyond Azure-native resources. The same attestation and token exchange flow that provides access to Azure services such as the Microsoft Graph API can be applied to any service that supports OAuth or OIDC, including external SaaS platforms like Salesforce.

An Azure Databricks pipeline calling the Salesforce API today likely has a Salesforce client secret stored somewhere in its configuration. With Aembit, that secret lives only within Aembit’s credential provider. The pipeline authenticates to Aembit, Aembit performs the OAuth exchange with Salesforce on its behalf, and a short-lived token is returned. Salesforce access is granted, the token expires, and no Salesforce credentials ever touch the pipeline environment.
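On the Salesforce side, the credential provider holds the connected-app secret and runs an OAuth flow against Salesforce. Assuming the Salesforce OAuth 2.0 client credentials flow (Aembit may use a different Salesforce flow; this post does not say which), the request the provider sends looks roughly like the sketch below. The class name and My Domain URL are hypothetical placeholders.

```python
import urllib.parse
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class SalesforceCredentialProvider:
    """Hypothetical server-side holder of the Salesforce connected-app secret.

    The secret lives here, never in the pipeline; callers only ever receive
    the short-lived access token that Salesforce returns.
    """
    my_domain_url: str                        # e.g. "https://example.my.salesforce.com"
    client_id: str
    client_secret: str = field(repr=False)    # keep the secret out of reprs and logs

    def token_request(self) -> Tuple[str, str]:
        """Return (url, form body) for the OAuth 2.0 client credentials flow."""
        url = f"{self.my_domain_url.rstrip('/')}/services/oauth2/token"
        body = urllib.parse.urlencode({
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
        })
        return url, body
```

Only the `access_token` from Salesforce’s response is returned to the pipeline; the `client_secret` never leaves the provider.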

The same policy framework governs all of it. One platform. One audit trail.

Auditability and Access Conditions

Every step in this process is logged. Aembit captures each attestation event, policy evaluation, and credential request, recording which workload, compute cluster, and pipeline made the request, which target service was involved, which access policy was applied, and what the outcome was. All of this is exportable to your SIEM.
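The logged fields listed above map naturally onto one structured record per access decision. The sketch below is a hypothetical shape, not Aembit’s export schema; newline-delimited JSON is used because it is a common ingestion format for SIEM collectors.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Current time as an ISO 8601 UTC timestamp."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class AccessAuditEvent:
    """Hypothetical record of one access decision (not Aembit's actual schema)."""
    timestamp: str
    workload: str
    cluster: str
    pipeline: str
    target_service: str
    policy: str
    outcome: str  # e.g. "granted" or "denied"

    def to_json_line(self) -> str:
        # One JSON object per line (NDJSON) for SIEM ingestion.
        return json.dumps(asdict(self), sort_keys=True)
```

Usage: `AccessAuditEvent(utc_now_iso(), "databricks-etl", "analytics-cluster-mi", "hr-directory-sync", "microsoft-graph", "hr-directory-read", "granted").to_json_line()`, with all values being illustrative.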

Access conditions add another layer of control on top of identity verification. Policies can be configured to restrict access by geography, time window, or workload health posture. A pipeline running from an unexpected region can be denied access automatically, without requiring manual intervention.
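Because access conditions are evaluated after identity is already verified, they can only narrow access. A minimal sketch, assuming region, a UTC time window, and a boolean posture signal as inputs; the class, field names, and region strings are hypothetical, and the window is assumed not to cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class AccessConditions:
    """Hypothetical conditions checked after identity attestation succeeds."""
    allowed_regions: FrozenSet[str]
    window_start: time  # inclusive, UTC
    window_end: time    # exclusive, UTC; window must not cross midnight
    require_healthy_posture: bool = True

    def permits(self, region: str, at: datetime, posture_healthy: bool) -> bool:
        if region not in self.allowed_regions:
            return False  # e.g. a run from an unexpected Azure region
        t = at.astimezone(timezone.utc).time()
        if not (self.window_start <= t < self.window_end):
            return False  # outside the allowed time window
        return posture_healthy or not self.require_healthy_posture
```

`at` should be a timezone-aware datetime; `astimezone` would interpret a naive value in the host’s local time zone.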

The result is a complete, auditable record of non-human access across your Azure Databricks environment, something that’s nearly impossible to achieve when credentials are distributed across configuration files and environment variables.

The Bigger Picture

The Azure Databricks use case is a concrete illustration of a broader shift in how security teams think about workload access. Static credentials aren’t a necessary evil; they’re an architectural choice, and one that creates persistent risk and operational overhead. Workload IAM replaces that pattern with something better: access based on verified identity, scoped by policy, and backed by short-lived credentials that expire.

For data engineering teams, the benefit is simpler pipelines with no credential handling. For security teams, it’s a dramatically reduced attack surface and a complete audit trail. For platform teams, it’s a model that scales across pipelines, services, and clouds without multiplying the burden of credential management.


Learn More

To see how Aembit secures agentic AI and workload access across your environment, visit aembit.io or schedule a demo with the team.
