---
title: "Databricks integration | Grafana Cloud documentation"
description: "Learn about Databricks Grafana Cloud integration."
---

# Databricks integration for Grafana Cloud

Databricks is a unified analytics platform for data engineering, data science, and machine learning workloads. It provides a lakehouse architecture combining the best of data warehouses and data lakes.

The Databricks integration uses Grafana Alloy to collect billing, jobs, pipelines, and SQL warehouse metrics from Databricks System Tables. Accompanying dashboards are provided to visualize these metrics.

The metrics listed in the Metrics section are provided by the Databricks exporter, which is built into Grafana Alloy.

This integration includes 16 useful alerts and 3 pre-built dashboards to help monitor and visualize Databricks metrics.

## Before you begin

Before configuring Alloy to collect Databricks metrics, you need:

1. **Databricks Workspace** with Unity Catalog and System Tables enabled
2. **Service Principal** with OAuth2 M2M authentication configured
3. **SQL Warehouse** for querying System Tables (serverless is recommended for cost efficiency)

**Get your workspace hostname:**

1. Copy the hostname from your workspace URL, for example, `dbc-abc123-def456.cloud.databricks.com`.

**Create a SQL Warehouse:**

1. Go to **SQL Warehouses** in the sidebar and click **Create SQL warehouse**.
2. Configure the warehouse:
   
   - **Size**: 2X-Small (minimum size to reduce costs).
   - **Auto stop**: After 10 minutes of inactivity.
   - **Scaling**: Min 1, Max 1 cluster.
3. Click **Create**, then go to the **Connection Details** tab.
4. Copy the **HTTP path**, for example, `/sql/1.0/warehouses/abc123def456`.

**Create a Service Principal:**

1. Click your workspace name (top-right) and select **Manage Account**.
2. Go to **User Management** > **Service Principals** tab > **Add service principal**.
3. Enter a name, for example, `grafana-cloud-integration`.
4. Go to **Credentials & secrets** tab > **OAuth secrets** > **Generate secret**.
5. Select the maximum lifetime (730 days) and click **Generate**.
6. Copy the **Client ID** and **Client Secret**. You will need both for the integration configuration.

**Assign the Service Principal to your workspace:**

1. Go to **Workspaces** in the sidebar and select your workspace.
2. Go to the **Permissions** tab and click **Add permissions**.
3. Search for the Service Principal and assign it the **Admin** permission.

**Grant SQL permissions to the Service Principal:**

As a metastore admin or user with MANAGE privilege, run the following SQL statements in a query editor:

```sql
GRANT USE CATALOG ON CATALOG system TO `<service-principal-client-id>`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `<service-principal-client-id>`;
GRANT SELECT ON SCHEMA system.billing TO `<service-principal-client-id>`;
GRANT USE SCHEMA ON SCHEMA system.query TO `<service-principal-client-id>`;
GRANT SELECT ON SCHEMA system.query TO `<service-principal-client-id>`;
GRANT USE SCHEMA ON SCHEMA system.lakeflow TO `<service-principal-client-id>`;
GRANT SELECT ON SCHEMA system.lakeflow TO `<service-principal-client-id>`;
```

Replace `<service-principal-client-id>` with your Service Principal’s Client ID.

Refer to the [Databricks documentation](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html) for detailed OAuth2 M2M setup instructions.

## Install Databricks integration for Grafana Cloud

1. In your Grafana Cloud stack, click **Connections** in the left-hand menu.
2. Find **Databricks** and click its tile to open the integration.
3. Review the prerequisites in the **Configuration Details** tab and set up Grafana Alloy to send Databricks metrics to your Grafana Cloud instance.
4. Click **Install** to add this integration’s pre-built dashboards and alerts to your Grafana Cloud instance. You can then start monitoring your Databricks setup.

## Configuration snippets for Grafana Alloy

### Simple mode

Replace the following:

- *`<your-databricks-server-hostname>`* : The Databricks workspace hostname, for example, `dbc-abc123-def456.cloud.databricks.com`.
- *`<your-databricks-warehouse-http-path>`* : The HTTP path of the SQL Warehouse, for example, `/sql/1.0/warehouses/abc123def456`.
- *`<your-databricks-client-id>`* : The OAuth2 Application ID or Client ID of your Service Principal.
- *`<your-databricks-client-secret>`* : The OAuth2 Client Secret of your Service Principal.

### Integrations snippets

```alloy
prometheus.exporter.databricks "integrations_databricks" {
	server_hostname    = "<your-databricks-server-hostname>"
	warehouse_http_path = "<your-databricks-warehouse-http-path>"
	client_id          = "<your-databricks-client-id>"
	client_secret      = "<your-databricks-client-secret>"
}

discovery.relabel "integrations_databricks" {
	targets = prometheus.exporter.databricks.integrations_databricks.targets

	rule {
		target_label = "instance"
		replacement  = constants.hostname
	}

	rule {
		target_label = "job"
		replacement  = "integrations/databricks"
	}
}

prometheus.scrape "integrations_databricks" {
	targets         = discovery.relabel.integrations_databricks.output
	forward_to      = [prometheus.remote_write.metrics_service.receiver]
	job_name        = "integrations/databricks"
	scrape_interval = "10m0s"
	scrape_timeout  = "9m0s"
}
```
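
The `prometheus.scrape` block above forwards to `prometheus.remote_write.metrics_service.receiver`, which the snippet itself does not define. If your Alloy configuration does not already contain that component, a minimal sketch might look like the following. The URL, username, and password placeholders are assumptions for illustration; use the Prometheus remote write details shown for your Grafana Cloud stack.

```alloy
// Sketch only: the endpoint URL and credentials below are placeholders,
// not values taken from this integration.
prometheus.remote_write "metrics_service" {
	endpoint {
		url = "<your-prometheus-remote-write-url>"

		basic_auth {
			username = "<your-prometheus-username>"
			password = "<your-grafana-cloud-access-token>"
		}
	}
}
```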

### Advanced mode

Replace the following:

- *`<your-databricks-server-hostname>`* : The Databricks workspace hostname, for example, `dbc-abc123-def456.cloud.databricks.com`.
- *`<your-databricks-warehouse-http-path>`* : The HTTP path of the SQL Warehouse, for example, `/sql/1.0/warehouses/abc123def456`.
- *`<your-databricks-client-id>`* : The OAuth2 Application ID or Client ID of your Service Principal.
- *`<your-databricks-client-secret>`* : The OAuth2 Client Secret of your Service Principal.

### Advanced integrations snippets

```alloy
prometheus.exporter.databricks "integrations_databricks" {
	server_hostname    = "<your-databricks-server-hostname>"
	warehouse_http_path = "<your-databricks-warehouse-http-path>"
	client_id          = "<your-databricks-client-id>"
	client_secret      = "<your-databricks-client-secret>"
}

discovery.relabel "integrations_databricks" {
	targets = prometheus.exporter.databricks.integrations_databricks.targets

	rule {
		target_label = "instance"
		replacement  = constants.hostname
	}

	rule {
		target_label = "job"
		replacement  = "integrations/databricks"
	}
}

prometheus.scrape "integrations_databricks" {
	targets         = discovery.relabel.integrations_databricks.output
	forward_to      = [prometheus.remote_write.metrics_service.receiver]
	job_name        = "integrations/databricks"
	scrape_interval = "10m0s"
	scrape_timeout  = "9m0s"
}
```

The Databricks integration uses Alloy’s embedded exporter to query Databricks System Tables.

**Optional configuration arguments:**

| Argument                | Default | Description                                                                         |
|-------------------------|---------|-------------------------------------------------------------------------------------|
| `query_timeout`         | `5m`    | Timeout for individual SQL queries.                                                 |
| `billing_lookback`      | `24h`   | How far back to look for billing data.                                              |
| `jobs_lookback`         | `3h`    | How far back to look for job runs.                                                  |
| `pipelines_lookback`    | `3h`    | How far back to look for pipeline runs.                                             |
| `queries_lookback`      | `2h`    | How far back to look for SQL warehouse queries.                                     |
| `sla_threshold_seconds` | `3600`  | Duration threshold in seconds for job SLA miss detection.                           |
| `collect_task_retries`  | `false` | Collect task retry metrics. Can cause high cardinality due to the `task_key` label. |
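
As a sketch of where these arguments go, the exporter block below sets every optional argument explicitly to the default listed in the table. It assumes that durations are written as quoted strings, `sla_threshold_seconds` as a number, and `collect_task_retries` as a boolean; check the component reference for your Alloy version if a value is rejected.

```alloy
prometheus.exporter.databricks "integrations_databricks" {
	server_hostname     = "<your-databricks-server-hostname>"
	warehouse_http_path = "<your-databricks-warehouse-http-path>"
	client_id           = "<your-databricks-client-id>"
	client_secret       = "<your-databricks-client-secret>"

	// Optional arguments, shown at their documented defaults.
	// Value types are assumed from the defaults in the table.
	query_timeout         = "5m"
	billing_lookback      = "24h"
	jobs_lookback         = "3h"
	pipelines_lookback    = "3h"
	queries_lookback      = "2h"
	sla_threshold_seconds = 3600
	collect_task_retries  = false
}
```

Omitting an argument should have the same effect as setting it to its default, so in practice you only need to list the ones you change.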

**Lookback windows:**

The exporter queries Databricks System Tables using SQL with sliding time windows. Each scrape collects data from `now - lookback` to `now`:

- **`billing_lookback`** : Queries `system.billing.usage` for DBU consumption and cost estimates. Databricks billing data typically has a 24-48 hour lag.
- **`jobs_lookback`** : Queries `system.lakeflow.job_run_timeline` for job run counts, durations, and status.
- **`pipelines_lookback`** : Queries `system.lakeflow.pipeline_event_log` for Delta Live Tables (DLT) pipeline metrics.
- **`queries_lookback`** : Queries `system.query.history` for SQL warehouse query metrics.

The lookback window should be at least 2x the `scrape_interval` to ensure data continuity between scrapes. For example, with a 10-minute scrape interval, use at least 20 minutes of lookback.

**Tuning recommendations:**

- **`scrape_interval`** : Use 10-30 minutes. The exporter queries Databricks System Tables, which can be slow. Increase the `scrape_interval` to reduce your SQL Warehouse costs.
- **`scrape_timeout`** : Must be less than `scrape_interval`. The exporter typically takes 90-120 seconds per scrape depending on data volume.
- **Lookback vs interval**: The lookback windows should be at least 2x the scrape interval. The defaults (`3h` for jobs and pipelines, `2h` for queries) work well with 10-30 minute scrape intervals.
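
For example, the following sketch applies these recommendations with a 30-minute scrape interval. The 5-minute timeout is an illustrative choice, not a documented value: it stays below the interval and leaves headroom over the typical 90-120 second scrape. With a 30-minute interval, the 2x rule calls for at least 60 minutes of lookback, which the default `3h` and `2h` windows already satisfy.

```alloy
prometheus.scrape "integrations_databricks" {
	targets    = discovery.relabel.integrations_databricks.output
	forward_to = [prometheus.remote_write.metrics_service.receiver]
	job_name   = "integrations/databricks"

	// Longer interval: fewer System Table queries and lower warehouse cost.
	scrape_interval = "30m"

	// Illustrative value; must stay below scrape_interval.
	scrape_timeout = "5m"
}
```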

**High cardinality warning:**

The `collect_task_retries` flag adds task-level retry metrics, which can significantly increase cardinality for workspaces with many jobs. Enable it only if you need task-level retry visibility.

## Dashboards

The Databricks integration installs the following dashboards in your Grafana Cloud instance to help monitor your system.

- Databricks Jobs & Pipelines
- Databricks Overview
- Databricks Warehouses & Queries

**Databricks overview dashboard.**

**Databricks warehouses and queries dashboard (1/2).**

**Databricks jobs and pipelines dashboard (1/3).**

## Alerts

The Databricks integration includes the following useful alerts:

| Alert                                    | Description                                                     |
|------------------------------------------|-----------------------------------------------------------------|
| DatabricksWarnSpendSpike                 | Warning: Databricks spend increased significantly day-over-day. |
| DatabricksCriticalSpendSpike             | Critical: Databricks spend spiked critically day-over-day.      |
| DatabricksWarnNoBillingData              | Warning: No billing data received from Databricks.              |
| DatabricksCriticalNoBillingData          | Critical: No billing data received from Databricks.             |
| DatabricksWarnJobFailureRate             | Warning: High job failure rate detected.                        |
| DatabricksCriticalJobFailureRate         | Critical: Critical job failure rate detected.                   |
| DatabricksWarnJobDurationRegression      | Warning: Job duration p95 regression detected.                  |
| DatabricksCriticalJobDurationRegression  | Critical: Critical job duration p95 regression detected.        |
| DatabricksWarnPipelineFailureRate        | Warning: High pipeline failure rate detected.                   |
| DatabricksCriticalPipelineFailureRate    | Critical: Critical pipeline failure rate detected.              |
| DatabricksWarnPipelineDurationRegression | Warning: Pipeline duration p95 regression detected.             |
| DatabricksCritPipelineDurationHigh       | Critical: Critical pipeline duration p95 regression detected.   |
| DatabricksWarnSqlQueryErrorRate          | Warning: High SQL query error rate detected.                    |
| DatabricksCriticalSqlQueryErrorRate      | Critical: Critical SQL query error rate detected.               |
| DatabricksWarnSqlQueryLatencyRegression  | Warning: SQL query latency p95 regression detected.             |
| DatabricksCritQueryLatencyHigh           | Critical: Critical SQL query latency p95 regression detected.   |

## Metrics

The most important metrics provided by the Databricks integration, which are used on the pre-built dashboards and Prometheus alerts, are as follows:

- databricks\_billing\_cost\_estimate\_usd\_sliding
- databricks\_billing\_dbus\_sliding
- databricks\_job\_run\_duration\_seconds\_sliding
- databricks\_job\_run\_status\_sliding
- databricks\_job\_runs\_sliding
- databricks\_pipeline\_freshness\_lag\_seconds\_sliding
- databricks\_pipeline\_retry\_events\_sliding
- databricks\_pipeline\_run\_duration\_seconds\_sliding
- databricks\_pipeline\_run\_status\_sliding
- databricks\_pipeline\_runs\_sliding
- databricks\_queries\_running\_sliding
- databricks\_queries\_sliding
- databricks\_query\_duration\_seconds\_sliding
- databricks\_query\_errors\_sliding
- databricks\_task\_retries\_sliding
- up
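
If you want to forward only these series, one option is to place a `prometheus.relabel` component between the scrape and remote write. This is a sketch rather than part of the integration's snippet: the component label `integrations_databricks` and the regular expression are assumptions, and the regex keeps `up` plus every series whose name starts with `databricks_` instead of listing each metric individually.

```alloy
prometheus.relabel "integrations_databricks" {
	forward_to = [prometheus.remote_write.metrics_service.receiver]

	// Keep `up` and all series named databricks_*; drop everything else.
	// The regex is an assumption, not taken from the integration.
	rule {
		source_labels = ["__name__"]
		regex         = "up|databricks_.+"
		action        = "keep"
	}
}
```

To use it, point `forward_to` in the `prometheus.scrape` block at `prometheus.relabel.integrations_databricks.receiver` instead of the remote write receiver.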

## Changelog

```md
# 1.0.1 - February 2026

- Fix metric_names and regex snippet in alloy example config

# 1.0.0 - February 2026

- Initial release
```

## Cost

By connecting your Databricks instance to Grafana Cloud, you might incur charges. To view information on the number of active series that your Grafana Cloud account uses for metrics included in each Cloud tier, see [Active series and dpm usage](/docs/grafana-cloud/fundamentals/active-series-and-dpm/) and [Cloud tier pricing](/products/cloud/pricing/).
