
Databricks

Connect to Databricks through the SQL Statement Execution API, authenticating as a service principal with OAuth. Queries run on a SQL warehouse against the catalogs, schemas, and tables that the service principal is allowed to read.
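To make the API concrete, here is a minimal Python sketch of the kind of request the SQL Statement Execution API accepts: a POST to `/api/2.0/sql/statements/` on the workspace host, carrying the SQL text, the warehouse ID, and the default catalog and schema in a JSON body, with an OAuth access token as a bearer token. It only builds the request with the standard library and never sends it. The helper name `build_statement_request` and all host, token, and warehouse values are hypothetical placeholders, and this is not necessarily how Reeflow constructs its own requests.

```python
import json
import urllib.request


def build_statement_request(
    host: str,
    token: str,
    warehouse_id: str,
    statement: str,
    catalog: str,
    schema: str = "default",
) -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API request."""
    body = {
        "statement": statement,
        "warehouse_id": warehouse_id,
        "catalog": catalog,
        "schema": schema,
        # Wait up to 30 seconds for a result before the API returns a pending statement.
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The short-lived OAuth access token is sent as a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_statement_request(
    host="dbc-12345678-90ab.cloud.databricks.com",
    token="<access-token>",
    warehouse_id="<warehouse-id>",
    statement="SELECT 1",
    catalog="main",
)
print(req.get_method(), req.full_url)
```

Sending it (for example with `urllib.request.urlopen(req)`) would require a real token and warehouse ID.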

Before creating a connection, you need:

  1. A Databricks workspace on AWS, Azure, or GCP
  2. A running SQL warehouse the connection can execute statements on
  3. A Databricks service principal with an OAuth client secret and read-only privileges on the catalogs and schemas you want to query
Creating a read-only service principal (recommended)

For least-privilege access, create a dedicated service principal for Reeflow rather than using an admin account. Databricks’ Unity Catalog privileges reference covers the available grants in detail.

  1. In your workspace, click your username in the top right, then Settings
  2. Go to Identity and access, then next to Service principals click Manage
  3. Click Add service principal, enter a name (e.g. reeflow), and click Add
  4. Give the service principal access to your SQL warehouse: go to SQL Warehouses, open the warehouse, click Permissions, and add the service principal with the Can use permission

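Then grant catalog-level read access by running the following SQL as a Unity Catalog metastore admin, customising the principal and catalog names to match your environment. In GRANT statements, a service principal is referenced by its application ID (the same value as its Client ID), not by its display name: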

```sql
-- Grant catalog visibility and read access on every schema, table, and view in `main`.
-- Replace <application-id> with the service principal's application (client) ID.
GRANT USE CATALOG ON CATALOG main TO `<application-id>`;
GRANT USE SCHEMA ON CATALOG main TO `<application-id>`;
GRANT SELECT ON CATALOG main TO `<application-id>`;
```
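If you manage several catalogs, or want to limit access to a single schema, you might generate these statements instead of editing them by hand. The sketch below is a hypothetical Python helper (`read_only_grants` is not part of Reeflow or Databricks) that produces the same three catalog-wide grants as the SQL above, or a schema-scoped variant, and backtick-quotes identifiers by doubling any embedded backticks, following Databricks SQL's quoting rule. You would still run the resulting statements yourself as a metastore admin.

```python
from typing import List, Optional


def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def read_only_grants(principal: str, catalog: str, schema: Optional[str] = None) -> List[str]:
    """Return GRANT statements giving `principal` read-only access.

    With no schema, USE SCHEMA and SELECT are granted on the whole catalog, so
    they apply to every schema in it. With a schema, they are granted on that
    one schema only. USE CATALOG is required in both cases.
    """
    p = quote_identifier(principal)
    c = quote_identifier(catalog)
    statements = [f"GRANT USE CATALOG ON CATALOG {c} TO {p};"]
    if schema is None:
        statements.append(f"GRANT USE SCHEMA ON CATALOG {c} TO {p};")
        statements.append(f"GRANT SELECT ON CATALOG {c} TO {p};")
    else:
        s = f"{c}.{quote_identifier(schema)}"
        statements.append(f"GRANT USE SCHEMA ON SCHEMA {s} TO {p};")
        statements.append(f"GRANT SELECT ON SCHEMA {s} TO {p};")
    return statements


# "<application-id>" is a placeholder for the service principal's application ID.
for stmt in read_only_grants("<application-id>", "main", schema="sales"):
    print(stmt)
```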

Reeflow authenticates with Databricks using OAuth machine-to-machine (M2M) authentication as a service principal. On each connection, Reeflow automatically exchanges the service principal's client ID and secret for a short-lived OAuth access token, which it then sends as a bearer token.
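Reeflow performs this exchange for you, but seeing it spelled out can help when troubleshooting credentials. The minimal Python sketch below builds a client-credentials request to the workspace's `/oidc/v1/token` endpoint, sending the client ID and secret with HTTP Basic authentication and requesting the `all-apis` scope, then turns a token response into an expiry time. The helper names, the `AccessToken` type, and the 60-second refresh margin are illustrative assumptions, not Reeflow's implementation; the request is built but not sent.

```python
import base64
import time
import urllib.parse
import urllib.request
from dataclasses import dataclass


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth M2M client-credentials token request."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    return urllib.request.Request(
        url=f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            # The client ID and secret travel as HTTP Basic credentials.
            "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


@dataclass
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp in seconds

    def is_fresh(self, now: float, margin: float = 60.0) -> bool:
        """True if the token stays valid for at least `margin` more seconds."""
        return now + margin < self.expires_at


def parse_token_response(payload: dict, now: float) -> AccessToken:
    """Convert the token endpoint's JSON response into an AccessToken."""
    return AccessToken(payload["access_token"], now + float(payload["expires_in"]))


# Illustrative only: a dict shaped like the token endpoint's JSON response.
sample = {"access_token": "<token>", "token_type": "Bearer", "expires_in": 3600}
token = parse_token_response(sample, now=time.time())
print(token.is_fresh(now=time.time()))
```

Checking freshness with a margin means a token close to expiry is replaced before a query starts, rather than failing partway through.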

Use the service principal created in the previous section, or an existing one, and generate an OAuth secret for it:

  1. Click your username in the top right, go to Settings, then Identity and access, then next to Service principals click Manage
  2. Click the service principal name to open the Service principal details page
  3. Open the Secrets tab and click Generate secret
  4. Copy the secret and note the Client ID shown in the dialog. Databricks shows the secret only once, so store it securely before closing the dialog.

When creating a Databricks connection in Reeflow, provide the following:

| Field | Description |
| --- | --- |
| Workspace host | Databricks workspace hostname, without the `https://` prefix. For example, `dbc-12345678-90ab.cloud.databricks.com` or `adb-1234567890123456.7.azuredatabricks.net`. |
| Client ID | The service principal’s Client ID, shown in the Generate secret dialog alongside the secret. |
| OAuth secret | The secret generated on the Secrets tab of the service principal details page. |
| SQL warehouse | The warehouse used to execute queries. Reeflow lists the warehouses available to the service principal. |
| Catalog | Default catalog that queries are issued against. |
| Schema | Optional default schema. Defaults to `default` when omitted. |
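The field rules above (a bare hostname without `https://`, and a schema that falls back to `default`) can be captured in a small validation step. The Python sketch below is hypothetical: `DatabricksConnection`, `make_connection`, and the assumption that the chosen warehouse is stored by its ID are illustrative names and choices, not Reeflow's actual data model. It shows one way to tolerate a pasted URL while still enforcing the required fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabricksConnection:
    workspace_host: str
    client_id: str
    oauth_secret: str
    warehouse_id: str  # assumption: the selected warehouse is stored by ID
    catalog: str
    schema: str = "default"


def normalize_host(raw: str) -> str:
    """Strip whitespace, a leading https:// or http:// scheme, and trailing slashes."""
    host = raw.strip()
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    return host.rstrip("/")


def make_connection(
    workspace_host: str,
    client_id: str,
    oauth_secret: str,
    warehouse_id: str,
    catalog: str,
    schema: Optional[str] = None,
) -> DatabricksConnection:
    """Validate the form fields and apply the documented schema default."""
    fields = {
        "workspace_host": normalize_host(workspace_host),
        "client_id": client_id.strip(),
        "oauth_secret": oauth_secret.strip(),
        "warehouse_id": warehouse_id.strip(),
        "catalog": catalog.strip(),
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    # An omitted or blank schema falls back to "default".
    chosen_schema = schema.strip() if schema and schema.strip() else "default"
    return DatabricksConnection(schema=chosen_schema, **fields)


conn = make_connection(
    " https://dbc-12345678-90ab.cloud.databricks.com/ ",
    "<client-id>", "<oauth-secret>", "<warehouse-id>", "main",
)
print(conn.workspace_host, conn.schema)
```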

Create a Databricks connection

Add a Databricks workspace as a data source in Reeflow.

  1. Navigate to Connections in the main navigation, then click New Connection.
  2. Enter a Name for the connection and an optional Description.
  3. Select Databricks as the connection type.
  4. Enter your Databricks Workspace host, Client ID, Client secret, SQL warehouse, Catalog, and (optionally) Schema.
  5. Click Test Connection to verify that your credentials are correct.
  6. Click Create Connection to save. The connection appears in your connections list.
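If Test Connection fails and you want to check the credentials outside Reeflow, one approach is to obtain an access token and run `SELECT 1` on the warehouse through the Statement Execution API yourself. The hypothetical helper below only interprets the JSON response: a `SUCCEEDED` state with a single row `["1"]` (values in the default `JSON_ARRAY` result format are returned as strings) indicates that authentication, warehouse access, and execution all worked. This is a troubleshooting sketch, not a description of what Reeflow's Test Connection does internally.

```python
from typing import Any, Dict


def check_select_one(response: Dict[str, Any]) -> str:
    """Interpret a Statement Execution API response to `SELECT 1`.

    Returns "ok", "pending", or a short description of what went wrong.
    """
    status = response.get("status", {})
    state = status.get("state")
    if state is None:
        return "malformed response: no status.state"
    if state in ("PENDING", "RUNNING"):
        # Not finished yet: poll GET /api/2.0/sql/statements/{statement_id}.
        return "pending"
    if state == "SUCCEEDED":
        rows = response.get("result", {}).get("data_array", [])
        return "ok" if rows == [["1"]] else f"unexpected result: {rows!r}"
    # FAILED, CANCELED, or CLOSED: surface the error message if there is one.
    error = status.get("error", {})
    return f"{state}: {error.get('message', 'no error message')}"


# Illustrative response shape for a successful inline result.
print(check_select_one({"status": {"state": "SUCCEEDED"}, "result": {"data_array": [["1"]]}}))
```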