Workspaces system table reference

Important

This system table is in Public Preview.

This page explains how to use the workspaces system table to monitor workspaces in your Azure Databricks account. Each row in the table represents the latest known state of an active workspace in your account, including metadata and lifecycle status.

This table is most useful when joined with other system tables. You can use it to get aggregate statistics on reliability, performance, and cost across workspaces in your account.

Note

The table includes only workspaces that currently exist in your account. After a workspace is canceled, its row is removed from the workspaces system table.

Table path: This table is located at system.access.workspaces_latest
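
For a quick look at the table's contents, you can preview a few rows directly. This is a minimal example; adjust the LIMIT as needed.

SELECT *
FROM
    system.access.workspaces_latest
LIMIT 10;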

Workspaces table schema

Column name | Data type | Description | Example
account_id | string | ID of the Databricks account | 0722779a-fd4e-49c1-a7a6-8417a97cf9ea
workspace_id | string | ID of the Databricks workspace | 2274721051152826
workspace_name | string | The human-readable name of the workspace | dough-re-mi
workspace_url | string | URL of the workspace | https://dough-re-mi-pizza.cloud.databricks.com/
create_time | timestamp | Timestamp of when the workspace was created (second precision) | 2025-03-05 15:47
status | enum | The status of the workspace. During workspace creation, the status is initially PROVISIONING; continue checking until it reaches RUNNING. | NOT_PROVISIONED, PROVISIONING, RUNNING, FAILED, BANNED
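
For example, you can count workspaces by lifecycle status with a simple aggregation over the status column (a minimal sketch):

SELECT
    status,
    COUNT(*) AS workspace_count
FROM
    system.access.workspaces_latest
GROUP BY
    status
ORDER BY
    workspace_count DESC;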

Sample queries

The following sections include sample SQL queries using the workspaces system table.

Which workspaces are currently running?

The following query lists all workspaces in your account that are currently in the RUNNING state.

SELECT
    workspace_id,
    workspace_name,
    workspace_url,
    create_time
FROM
    system.access.workspaces_latest
WHERE
    status = "RUNNING";

Which workspaces have the highest spend?

This query returns the top 10 workspaces in your account by list-priced spend, in USD, over the last 30 days.

WITH
-- apply date filter
usage_with_ws_filtered_by_date AS (
  SELECT
    w.workspace_id,
    w.workspace_name,
    w.workspace_url,
    u.usage_quantity,
    u.usage_unit,
    u.sku_name,
    u.usage_end_time,
    u.cloud
  FROM
    system.billing.usage AS u NATURAL JOIN system.access.workspaces_latest AS w
  WHERE
    u.usage_date > DATE_ADD(CURRENT_DATE(), -30)
),
-- calc list priced usage in USD
prices AS (
  SELECT
    COALESCE(price_end_time, DATE_ADD(current_date, 1)) AS coalesced_price_end_time,
    *
  FROM
    system.billing.list_prices
  WHERE
    currency_code = 'USD'
),
list_priced_usd AS (
  SELECT
    COALESCE(u.usage_quantity * p.pricing.default, 0) as usage_usd,
    u.*
  FROM
    usage_with_ws_filtered_by_date as u
      LEFT JOIN prices AS p
        ON u.sku_name = p.sku_name
        AND u.cloud = p.cloud
        AND u.usage_unit = p.usage_unit
        AND (u.usage_end_time BETWEEN p.price_start_time AND p.coalesced_price_end_time)
)
-- calc total usage in USD
SELECT
  workspace_id,
  workspace_name,
  workspace_url,
  round(sum(usage_usd), 2) AS usage_usd
FROM
  list_priced_usd
GROUP BY
  1,
  2,
  3
ORDER BY
  4 DESC
LIMIT 10;
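
If you only need raw usage rather than list-priced dollar amounts, a simpler join is sufficient. The following sketch groups the last 30 days of usage by workspace and SKU, using the same columns as the query above:

SELECT
    w.workspace_name,
    u.sku_name,
    u.usage_unit,
    SUM(u.usage_quantity) AS total_usage
FROM
    system.billing.usage AS u
        JOIN system.access.workspaces_latest AS w
            ON u.workspace_id = w.workspace_id
WHERE
    u.usage_date > DATE_ADD(CURRENT_DATE(), -30)
GROUP BY ALL
ORDER BY
    total_usage DESC;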

Which jobs across my account are the most expensive?

This query returns the 10 most expensive jobs in your account by list cost over the last 30 days, along with each job's name and a direct URL.

WITH usage_with_cost AS (
  SELECT
    *,
    t1.usage_quantity * list_prices.pricing.default as list_cost
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices on
      t1.cloud = list_prices.cloud and
      t1.sku_name = list_prices.sku_name and
      t1.usage_start_time >= list_prices.price_start_time and
      (t1.usage_end_time <= list_prices.price_end_time or list_prices.price_end_time is null)
),
most_expensive_jobs_30d AS (
  SELECT
    workspace_id,
    usage_metadata.job_id,
    SUM(list_cost) as list_cost
  FROM usage_with_cost
  WHERE
    usage_metadata.job_id IS NOT NULL
    AND usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
  GROUP BY ALL
  ORDER BY list_cost DESC
  LIMIT 100
),
latest_jobs AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) as rn
  FROM system.lakeflow.jobs QUALIFY rn=1
)
SELECT
  t1.workspace_id,
  t2.workspace_name,
  t3.name as job_name,
  CONCAT(
    t2.workspace_url, '/jobs/', t1.job_id
  ) as job_url,
  t1.list_cost
FROM most_expensive_jobs_30d t1
LEFT JOIN system.access.workspaces_latest t2 USING (workspace_id)
LEFT JOIN latest_jobs t3 USING (workspace_id, job_id)
ORDER BY list_cost DESC
LIMIT 10;
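
A variation of the same pattern can show how widely job spend is distributed across workspaces. The sketch below counts the distinct jobs that incurred any billable usage in each workspace over the last 30 days, reusing only columns already referenced above:

SELECT
    w.workspace_name,
    COUNT(DISTINCT u.usage_metadata.job_id) AS jobs_with_usage
FROM
    system.billing.usage AS u
        JOIN system.access.workspaces_latest AS w
            ON u.workspace_id = w.workspace_id
WHERE
    u.usage_metadata.job_id IS NOT NULL
    AND u.usage_date >= CURRENT_DATE() - INTERVAL 30 DAYS
GROUP BY ALL
ORDER BY
    jobs_with_usage DESC;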