Connect to external HTTP services

Important

This feature is in Public Preview.

This article describes how to set up Lakehouse Federation to run federated queries on external service data that is not managed by Azure Databricks. To learn more about Lakehouse Federation, see What is Lakehouse Federation?.

To connect to your external service database using Lakehouse Federation, you must create the following in your Azure Databricks Unity Catalog metastore:

  • A connection to your external service database.
  • A foreign catalog that mirrors your external service database in Unity Catalog so that you can use Unity Catalog query syntax and data governance tools to manage Azure Databricks user access to the database.

Before you begin

Workspace requirements:

  • Workspace enabled for Unity Catalog.

Compute requirements:

  • Network connectivity from your compute resource to the target database systems. See Networking recommendations for Lakehouse Federation.
  • Azure Databricks compute must use Databricks Runtime 15.4 LTS or above and Standard or Dedicated access mode.
  • SQL warehouses must be pro or serverless and must use 2023.40 or above.

Permissions required:

  • To create a connection, you must be a metastore admin or a user with the CREATE CONNECTION privilege on the Unity Catalog metastore attached to the workspace.
  • To create a foreign catalog, you must have the CREATE CATALOG permission on the metastore and be either the owner of the connection or have the CREATE FOREIGN CATALOG privilege on the connection.

Additional permission requirements are specified in each task-based section that follows.

  • Set up authentication to the external service using one of the following methods:

    • Bearer token: Obtain a bearer token for simple token-based authentication.
    • OAuth 2.0 Machine-to-Machine: Create and configure an app to enable machine-to-machine authentication.
    • OAuth 2.0 User-to-Machine Shared: Sign in interactively once. Everyone who uses the connection shares the resulting credentials.
    • OAuth 2.0 User-to-Machine Per User: Each user signs in interactively with their own identity and accesses the service with their own credentials.

Authentication methods for external services

Bearer token: A bearer token is a simple token-based authentication mechanism where a token is issued to a client and used to access resources without requiring additional credentials. The token is included in the request header and grants access as long as it is valid.
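As a minimal illustration of how a bearer token travels in the request header (the `Authorization: Bearer <token>` scheme from RFC 6750), the following Python sketch builds, but does not send, a request using only the standard library. The URL and token are hypothetical placeholders. With a Unity Catalog connection, the connection supplies the token, so you do not build this header yourself.

```python
import urllib.request


def build_bearer_request(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that authenticates with a bearer token."""
    request = urllib.request.Request(url, method="GET")
    # RFC 6750: the token goes in the Authorization header with the "Bearer" scheme.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Hypothetical endpoint and token, for illustration only.
req = build_bearer_request("https://api.example.com/api/v1/items", "my-bearer-token")
print(req.get_header("Authorization"))  # Bearer my-bearer-token
```

Anyone holding the token can make this request, which is why the token must be stored securely and scoped to a single host.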

OAuth Machine-to-Machine (recommended): OAuth Machine-to-Machine (M2M) authentication is used when two systems or applications communicate without direct user involvement. Tokens are issued to a registered machine client, which uses its own credentials to authenticate. This is ideal for server-to-server communication, microservices, and automation tasks where no user context is needed. Databricks recommends using OAuth Machine-to-Machine when it is available.
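An M2M connection configured with a client ID, client secret, scope, and token endpoint presumably corresponds to the OAuth 2.0 client credentials grant (RFC 6749, section 4.4). The Python sketch below builds, but does not send, such a token request. It authenticates the client with HTTP Basic, one of the methods the RFC allows; how Azure Databricks authenticates to a particular token endpoint is an assumption here, and the client ID and secret are hypothetical placeholders.

```python
import base64
import urllib.parse
import urllib.request
from typing import List


def build_client_credentials_request(
    token_endpoint: str, client_id: str, client_secret: str, scopes: List[str]
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    # The body is form-encoded; multiple scopes are joined with single spaces.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": " ".join(scopes)}
    ).encode("ascii")
    # HTTP Basic client authentication (RFC 6749, section 2.3.1):
    # form-encode the ID and secret, join them with ":", then Base64-encode.
    raw = f"{urllib.parse.quote_plus(client_id)}:{urllib.parse.quote_plus(client_secret)}"
    basic = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    request = urllib.request.Request(token_endpoint, data=body, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    request.add_header("Authorization", f"Basic {basic}")
    return request


req = build_client_credentials_request(
    "https://authorization-server.com/oauth/token",
    "my-client-id",
    "my-client-secret",
    ["channels:read", "chat:write"],
)
print(req.data.decode())  # grant_type=client_credentials&scope=channels%3Aread+chat%3Awrite
```

No user appears anywhere in this exchange: the client's own credentials are enough to obtain a token, which is what makes M2M suitable for automation.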

OAuth User-to-Machine Shared: OAuth User-to-Machine Shared authentication allows a single user identity to authenticate once and share the same set of credentials across multiple clients or users. All users share the same access token. This approach is suitable for shared devices or environments where a single, consistent user identity is sufficient, but it reduces individual accountability and tracking. Select User-to-Machine Shared when the external service requires an interactive user login but per-user access is not needed.

OAuth User-to-Machine Per User: OAuth User-to-Machine Per User authentication allows each user identity to authenticate and use its own credentials to access resources. Each user is issued a unique access token, enabling individual access control, auditing, and accountability. This method is suitable when user-specific data access is required and when accessing external services on behalf of the individual user.
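To make the difference between Shared and Per User concrete, the following Python sketch models a hypothetical token store. In shared mode, every caller of a connection gets the same token; in per-user mode, each user needs a token of their own. This is not how Azure Databricks stores credentials, only an illustration of the lookup rule, and the connection and user names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class TokenStore:
    """Hypothetical in-memory store contrasting shared and per-user OAuth tokens."""

    per_user: bool
    _tokens: Dict[Tuple[str, Optional[str]], str] = field(default_factory=dict, init=False)

    def _key(self, connection: str, user: str) -> Tuple[str, Optional[str]]:
        # Shared: one token per connection, whoever the caller is.
        # Per user: one token per (connection, user) pair.
        return (connection, user if self.per_user else None)

    def save(self, connection: str, user: str, token: str) -> None:
        self._tokens[self._key(connection, user)] = token

    def lookup(self, connection: str, user: str) -> Optional[str]:
        return self._tokens.get(self._key(connection, user))


shared = TokenStore(per_user=False)
shared.save("my_http_conn", "alice", "token-a")
print(shared.lookup("my_http_conn", "bob"))  # token-a (Bob reuses Alice's token)

per_user = TokenStore(per_user=True)
per_user.save("my_http_conn", "alice", "token-a")
print(per_user.lookup("my_http_conn", "bob"))  # None (Bob must sign in himself)
```

The shared lookup ignores who is calling, which is exactly why it loses individual accountability.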

External services must comply with OAuth 2.0 specifications

HTTP connections that use OAuth must connect to services that comply with the official OAuth 2.0 specification for how they handle and return access token data. This means that the service’s responses must use the exact field names and data formats described in the specification, such as access_token, expires_in, and so on.

If you have problems connecting to an external service using OAuth 2.0, check that the service’s responses follow these requirements.
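One way to check is to inspect the raw JSON that the service's token endpoint returns. The following Python sketch is a hypothetical helper, not a Databricks tool. It tests only the fields that RFC 6749, section 5.1 defines: access_token and token_type are required, and expires_in is recommended and gives the token lifetime in seconds. Which fields Azure Databricks actually reads is not stated in this article.

```python
import json
from typing import List


def check_token_response(body: str) -> List[str]:
    """Return problems that make a token response deviate from RFC 6749, section 5.1."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return ["response body is not valid JSON"]
    if not isinstance(data, dict):
        return ["response body is not a JSON object"]
    problems = []
    # access_token and token_type are required string fields.
    for name in ("access_token", "token_type"):
        if not isinstance(data.get(name), str):
            problems.append(f"missing or non-string '{name}'")
    # expires_in is optional but, when present, must be a number of seconds.
    # bool is excluded explicitly because it is a subclass of int in Python.
    expires_in = data.get("expires_in")
    if expires_in is not None and (isinstance(expires_in, bool) or not isinstance(expires_in, int)):
        problems.append("'expires_in' is not an integer number of seconds")
    return problems


good = '{"access_token": "abc", "token_type": "Bearer", "expires_in": 3600}'
bad = '{"accessToken": "abc", "token_type": "Bearer", "expires_in": "3600"}'
print(check_token_response(good))  # []
print(check_token_response(bad))
# ["missing or non-string 'access_token'", "'expires_in' is not an integer number of seconds"]
```

The bad response illustrates two common deviations: a camelCase field name and a lifetime sent as a string.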

Create a connection to the external service

First, create a Unity Catalog connection to the external service that specifies a path and credentials to access the service.

Benefits of using a Unity Catalog connection include the following:

  • Secure credential management: Secrets and tokens are securely stored and managed in Unity Catalog, ensuring they are never exposed to users.
  • Granular access control: Unity Catalog allows fine-grained control over who can use or manage connections with the USE_CONNECTION and MANAGE_CONNECTION privileges.
  • Host-specific token enforcement: Tokens are restricted to the host_name specified during connection creation, ensuring they cannot be used with unauthorized hosts.
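Azure Databricks performs this host enforcement for you. To illustrate the idea only, the hypothetical Python check below allows a request when its host exactly matches the connection's host. The precise matching rules that Azure Databricks applies (ports, subdomains, redirects) are assumptions here, and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def is_allowed_host(connection_host: str, request_url: str) -> bool:
    """Hypothetical check: only send the token to the host named in the connection."""
    # Assumes connection_host includes a scheme such as https://.
    allowed = urlsplit(connection_host).hostname
    target = urlsplit(request_url).hostname
    # urlsplit lowercases hostnames; an exact comparison rejects look-alike hosts.
    return allowed is not None and allowed == target


print(is_allowed_host("https://api.example.com", "https://API.example.com/v1/items"))  # True
print(is_allowed_host("https://api.example.com", "https://api.example.com.evil.test/"))  # False
```

Comparing parsed hostnames, rather than checking whether one string starts with the other, is what stops a look-alike domain from receiving the token.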

Permissions required: Metastore admin or user with the CREATE CONNECTION privilege.

Create a connection using one of the following methods:

Catalog Explorer

Use the Catalog Explorer UI to create a connection.

  1. In your Azure Databricks workspace, click Catalog.

  2. At the top of the Catalog pane, click the Add (plus) icon and select Add a connection from the menu.

    Alternatively, from the Quick access page, click the External data > button, go to the Connections tab, and click Create connection.

  3. Click Create connection.

  4. Enter a user-friendly Connection name.

  5. Select a Connection type of HTTP.

  6. Select an Auth type from the following options:

    • Bearer token
    • OAuth Machine to Machine
    • OAuth User to Machine Shared
    • OAuth User to Machine Per User
  7. On the Authentication page, enter the following connection properties for the HTTP connection.

    For a bearer token:

    • Host: The base URL of the external service. Example: https://databricks.com
    • Port: The network port used for the connection, typically 443 for HTTPS. Example: 443
    • Bearer Token: The authentication token used to authorize API requests. Example: bearer-token
    • Base Path: The root path for API endpoints. Example: /api/

    For OAuth Machine-to-Machine token:

    • Client ID: Unique identifier for the application you created.
    • Client secret: Secret or password generated for the application that you created.
    • OAuth scope: Scope to request for the access token, expressed as a list of space-delimited, case-sensitive strings. For example: channels:read channels:history chat:write
    • Token endpoint: Used by the client to obtain an access token by presenting its authorization grant or refresh token. Usually in the format: https://authorization-server.com/oauth/token

    For OAuth User-to-Machine Shared token:

    • You will be prompted to sign in using your OAuth credentials. The credentials you use will be shared by anyone who uses this connection. Some providers require you to allowlist the redirect URL. If so, add <databricks_workspace_url>/login/oauth/http.html to the redirect URL allowlist. Example: https://databricks.com/login/oauth/http.html

    Enter the following properties:

    • Client ID: Unique identifier for the application you created.
    • Client secret: Secret or password generated for the application that you created.
    • OAuth scope: Scope to grant during user authorization, expressed as a list of space-delimited, case-sensitive strings. For example: channels:read channels:history chat:write
    • Authorization endpoint: Used to authenticate the resource owner via user-agent redirection. Usually in the format: https://authorization-server.com/oauth/authorize
    • Token endpoint: Used by the client to obtain an access token by presenting its authorization grant or refresh token. Usually in the format: https://authorization-server.com/oauth/token

    For OAuth User-to-Machine Per User token:

    • Each user will be prompted to sign in using their individual OAuth credentials the first time they use the HTTP connection. Some providers require you to allowlist the redirect URL. If so, add <databricks_workspace_url>/login/oauth/http.html to the redirect URL allowlist. Example: https://databricks.com/login/oauth/http.html

    Enter the following properties:

    • Client ID: Unique identifier for the application you created. The authorization server uses it to identify the client application during the OAuth flow.
    • Client secret: Secret or password generated for the application that you created. It authenticates the client application when it exchanges authorization codes for tokens and must be kept confidential.
    • OAuth scope: Scope to grant during user authorization, expressed as a list of space-delimited, case-sensitive strings that define the permissions the application requests. For example: channels:read channels:history chat:write
    • Authorization endpoint: Endpoint that authenticates the resource owner via user-agent redirection and obtains authorization. The client directs the user to this endpoint to log in and consent to permissions. Usually in the format: https://authorization-server.com/oauth/authorize
    • Token endpoint: Endpoint the client uses to exchange an authorization grant (such as an authorization code) or a refresh token for an access token. Usually in the format: https://authorization-server.com/oauth/token
  8. Click Create connection.
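For the two User-to-Machine options, the sign-in prompt is presumably the standard OAuth 2.0 authorization code flow: the browser is sent to the Authorization endpoint, and the provider redirects back to <databricks_workspace_url>/login/oauth/http.html. Azure Databricks constructs this request for you. The following Python sketch only shows how the properties you enter map onto the query parameters defined in RFC 6749, section 4.1.1. Whether Azure Databricks adds further parameters (for example, PKCE) is not covered here, and the client ID is a hypothetical placeholder.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(
    authorization_endpoint: str, client_id: str, workspace_url: str, scope: str
) -> Tuple[str, str]:
    """Build an OAuth 2.0 authorization code request URL (RFC 6749, section 4.1.1)."""
    # The redirect URL that some providers require you to allowlist.
    redirect_uri = workspace_url.rstrip("/") + "/login/oauth/http.html"
    # An unguessable state value lets the client detect forged redirects (CSRF).
    state = secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scope,  # space-delimited, case-sensitive scope strings
            "state": state,
        }
    )
    return f"{authorization_endpoint}?{query}", state


url, state = build_authorization_url(
    "https://authorization-server.com/oauth/authorize",
    "my-client-id",
    "https://databricks.com",
    "channels:read channels:history chat:write",
)
params = parse_qs(urlsplit(url).query)
print(params["redirect_uri"][0])  # https://databricks.com/login/oauth/http.html
print(params["scope"][0].split(" "))  # ['channels:read', 'channels:history', 'chat:write']
```

The redirect_uri parameter is what the provider compares against its allowlist, which is why the allowlisted value must match exactly.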

SQL

Use the CREATE CONNECTION SQL command to create a connection.

Note

You cannot use the SQL command to create a connection that uses OAuth User-to-Machine Shared. Instead, see the Catalog Explorer UI instructions.

To create a new connection using a Bearer token, run the following command in a notebook or the Databricks SQL query editor:

CREATE CONNECTION <connection-name> TYPE HTTP
OPTIONS (
  host '<hostname>',
  port '<port>',
  base_path '<base-path>',
  bearer_token '<bearer-token>'
);

Databricks recommends using secrets instead of plaintext strings for sensitive values like credentials. For example:

CREATE CONNECTION <connection-name> TYPE HTTP
OPTIONS (
  host '<hostname>',
  port '<port>',
  base_path '<base-path>',
  bearer_token secret('<secret-scope>', '<secret-key>')
);

To create a new connection using OAuth Machine-to-Machine, run the following command in a notebook or the Databricks SQL query editor:

CREATE CONNECTION <connection-name> TYPE HTTP
OPTIONS (
  host '<hostname>',
  port '<port>',
  base_path '<base-path>',
  client_id '<client-id>',
  client_secret '<client-secret>',
  oauth_scope '<oauth-scope-1> <oauth-scope-2>',
  token_endpoint '<token-endpoint>'
);

Send an HTTP request to the external system

Now that you have a connection, you can send HTTP requests to the service using the built-in http_request SQL function.

Permissions required: USE CONNECTION on the connection object.

Run the following SQL command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • connection-name: The connection object that specifies the host, port, base_path, and access credentials.
  • http-method: The HTTP request method used to make the call. For example: GET, POST, PUT, DELETE
  • path: The path to concatenate after the base_path to invoke the service resource.
  • json: The JSON body to send with the request.
  • headers: A map to specify the request headers.
SELECT http_request(
  conn => '<connection-name>',
  method => '<http-method>',
  path => '<path>',
  json => to_json(named_struct(
    'text', '<text>'
  )),
  headers => map(
    'Accept', 'application/vnd.github+json'
  )
);
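To picture what such a call targets, the hypothetical Python helper below combines a connection's host, port, and base_path with the path argument, and serializes the payload the way to_json does in the SQL example. The joining rules (exactly one "/" between base_path and path, port 443 left implicit) are assumptions for illustration, not documented http_request behavior.

```python
import json
from typing import Any, Dict, Tuple


def resolve_request(
    host: str, port: int, base_path: str, path: str, payload: Dict[str, Any]
) -> Tuple[str, str]:
    """Hypothetical: approximate the URL and JSON body that an http_request call targets."""
    # Assumption: host includes the scheme, as in the Host example value.
    authority = host.rstrip("/")
    if port != 443:  # assumption: the default HTTPS port is left implicit
        authority = f"{authority}:{port}"
    # Assumption: base_path and path are joined with exactly one "/" between them.
    trimmed_base = base_path.strip("/")
    base = f"/{trimmed_base}" if trimmed_base else ""
    url = f"{authority}{base}/{path.lstrip('/')}"
    # Comparable to to_json(named_struct('text', ...)) in the SQL example.
    body = json.dumps(payload)
    return url, body


url, body = resolve_request("https://databricks.com", 443, "/api/", "/v1/resource", {"text": "hello"})
print(url)   # https://databricks.com/api/v1/resource
print(body)  # {"text": "hello"}
```

Keeping the host and base_path in the connection means a query only supplies the resource path, so callers cannot redirect the stored credentials to a different host.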

Note

SQL access with http_request is blocked for the User-to-Machine Per User connection type. Use the Python Databricks SDK instead.

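from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ExternalFunctionRequestHttpMethod

# WorkspaceClient picks up authentication from the notebook or environment.
w = WorkspaceClient()

# Send a POST request through the Per User HTTP connection.
response = w.serving_endpoints.http_request(
    conn="connection-name",  # name of the Unity Catalog HTTP connection
    method=ExternalFunctionRequestHttpMethod.POST,
    path="/api/v1/resource",  # appended to the connection's base_path
    json={"key": "value"},  # request body
    headers={"extra-header-key": "extra-header-value"},  # additional request headers
)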

Use HTTP connections for agent tools

AI agents can use the HTTP connection to access external applications like Slack, Google Calendar, or any service with an API using HTTP requests. Agents can use externally connected tools to automate tasks, send messages, and retrieve data from third-party platforms.

See Connect AI agent tools to external services.