Use foundation models

2025-04-30

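This article describes how to write query requests for foundation models and send them to your model serving endpoint. You can query foundation models that are hosted by Databricks and foundation models hosted outside of Databricks.

For traditional ML or Python model query requests, see Query serving endpoints for custom models.

Mosaic AI Model Serving supports Foundation Model APIs and external models for accessing foundation models. Model Serving uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.

Mosaic AI Model Serving provides the following options for sending scoring requests to endpoints that serve foundation models or external models:

Method	Details
OpenAI client	Query a model hosted by a Mosaic AI Model Serving endpoint using the OpenAI client. Specify the model serving endpoint name as the `model` input. Supported for chat, embeddings, and completions models made available by Foundation Model APIs or external models.
SQL function	Invoke model inference directly from SQL using the `ai_query` SQL function. See Example: Query a foundation model.
Serving UI	Select Query endpoint from the Serving endpoint page. Insert JSON format model input data and click Send Request. If the model has an input example logged, use Show Example to load it.
REST API	Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints serving multiple models, see Query individual models behind an endpoint.
MLflow Deployments SDK	Use the MLflow Deployments SDK's predict() function to query the model.
Databricks Python SDK	The Databricks Python SDK is a layer on top of the REST API. It handles low-level details, such as authentication, making it easier to interact with the models.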

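Requirements

A model serving endpoint.
A Databricks workspace in a supported region.
- Foundation Model APIs regions
- External models regions
To send a scoring request through the OpenAI client, REST API, or MLflow Deployments SDK, you must have a Databricks API token.

Important

As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication during production.

For testing and development, Databricks recommends using a personal access token belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.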

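Install packages

After you have selected a querying method, you must first install the appropriate package to your cluster.

OpenAI client

To use the OpenAI client, the databricks-sdk[openai] package needs to be installed on your cluster. The Databricks SDK provides a wrapper for constructing the OpenAI client with authorization automatically configured to query generative AI models. Run the following in your notebook or your local terminal: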

!pip install "databricks-sdk[openai]>=0.35.0"

The following is only required when installing the package on a Databricks Notebook

dbutils.library.restartPython()

REST API

Access to the Serving REST API is available in Databricks Runtime for Machine Learning.

MLflow Deployments SDK

!pip install mlflow

The following is only required when installing the package on a Databricks Notebook

dbutils.library.restartPython()

Databricks Python SDK

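The Databricks SDK for Python is already installed on all Azure Databricks clusters that use Databricks Runtime 13.3 LTS or above. For Azure Databricks clusters that use Databricks Runtime 12.2 LTS and below, you must install the Databricks SDK for Python first. See Databricks SDK for Python.

Text completion model query

OpenAI client

Important

Querying text completion models made available using Foundation Model APIs pay-per-token is not supported with the OpenAI client. Only querying external models using the OpenAI client is supported, as demonstrated in this section.

To use the OpenAI client, populate the model field with the name of the model serving endpoint that hosts the model you want to query. The following example queries the claude-2 completions model hosted by Anthropic using the OpenAI client.

This example uses a previously created endpoint, anthropic-completions-endpoint, configured for accessing external models from the Anthropic model provider. See how to create external model endpoints.

See Supported models for additional models you can query and their providers.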


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

completion = openai_client.completions.create(
    model="anthropic-completions-endpoint",
    prompt="what is databricks",
    temperature=1.0
)
print(completion)

SQL

Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.

SELECT ai_query(
    "<completions-model-endpoint>",
    "Can you explain AI in ten words?"
  )

REST API

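The following is a completions request for querying a completions model made available using external models.

Important

The following example uses REST API parameters for querying serving endpoints that serve external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.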


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/<completions-model-endpoint>/invocations
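
The same request can also be assembled from Python. The following is a minimal sketch using only the standard library. The helper name build_invocation_request is hypothetical, and the sketch assumes the endpoint accepts the token as an Authorization: Bearer header. It only builds the request object; sending it is left as a commented-out line.

```python
import json
import urllib.request


def build_invocation_request(host: str, endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to an endpoint's invocations URL."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: bearer-token authentication with a Databricks API token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_invocation_request(
    "https://<workspace_host>.databricks.com",
    "<completions-model-endpoint>",
    "dapi-your-databricks-token",
    {"prompt": "What is a quoll?", "max_tokens": 64},
)
# To send the request: urllib.request.urlopen(req).read()
```

In practice the token would be read from an environment variable such as DATABRICKS_TOKEN rather than written into the source.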

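MLflow Deployments SDK

The following is a completions request for querying a completions model made available using external models.

Important

The following example uses the predict() API from the MLflow Deployments SDK.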


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook

os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

completions_response = client.predict(
    endpoint="<completions-model-endpoint>",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)

# Print the response
print(completions_response)

Databricks Python SDK

The following is a completions request for querying a completions model made available using external models.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="<completions-model-endpoint>",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets."
)
print(response.choices[0].text)

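The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.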

{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params":
  {
    "top_p": 0.9
  }
}
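
A request body in this format can be assembled programmatically. The following is a sketch only: build_completion_payload is a hypothetical helper, and its defaults are simply copied from the sample request above rather than taken from any documented server-side defaults.

```python
import json
from typing import Optional


def build_completion_payload(
    prompt: str,
    max_tokens: int = 100,
    temperature: float = 0.1,
    stop: Optional[list] = None,
    n: int = 1,
    stream: bool = False,
    extra_params: Optional[dict] = None,
) -> dict:
    """Assemble a completions request body, leaving out optional fields that were not set."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": n,
        "stream": stream,
    }
    if stop:
        payload["stop"] = list(stop)
    if extra_params:
        payload["extra_params"] = dict(extra_params)
    return payload


payload = build_completion_payload(
    "What is mlflow?", stop=["Human:"], extra_params={"top_p": 0.9}
)
print(json.dumps(payload, indent=2))
```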

The following is the expected response format:

{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}
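
A response in this shape can be reduced to the fields most callers need. The sketch below assumes the response has already been parsed into a Python dict; summarize_completion is a hypothetical helper, and the sample dict is shortened from the response above.

```python
def summarize_completion(response: dict) -> dict:
    """Collect generated texts, finish reasons, and total token usage from a completions response."""
    # Sort by index so that results line up with the order of the n requested choices.
    choices = sorted(response.get("choices", []), key=lambda c: c.get("index", 0))
    usage = response.get("usage", {})
    return {
        "texts": [c.get("text", "") for c in choices],
        "finish_reasons": [c.get("finish_reason") for c in choices],
        "total_tokens": usage.get("total_tokens"),
    }


sample_response = {
    "object": "text_completion",
    "choices": [
        {"text": "MLflow is an open-source platform.", "index": 0, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 83, "total_tokens": 88},
}
summary = summarize_completion(sample_response)
print(summary["texts"][0], summary["total_tokens"])
```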

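Chat completion model query

The following are examples for querying a chat model. The examples apply to querying a chat model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.

For a batch inference example, see Perform batch LLM inference using AI Functions.

OpenAI client

The following is a chat request for the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, in your workspace.

To use the OpenAI client, specify the model serving endpoint name as the model input.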


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

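To query foundation models outside of your workspace, you must use the OpenAI client directly. You also need your Databricks workspace instance to connect the OpenAI client to Databricks. The following example assumes you have a Databricks API token and openai installed on your compute.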


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

SQL

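Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.

The following is a chat request for meta-llama-3-3-70b-instruct made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, in your workspace.

Note

The ai_query() function does not support query endpoints that serve the DBRX or the DBRX Instruct model.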

SELECT ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Can you explain AI in ten words?"
  )

REST API

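Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

The following is a chat request for the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, in your workspace.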

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": " What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-meta-llama-3-3-70b-instruct/invocations

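MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.

The following is a chat request for the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, in your workspace.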


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    inputs={
        "messages": [
            {
              "role": "user",
              "content": "Hello!"
            },
            {
              "role": "assistant",
              "content": "Hello! How can I assist you today?"
            },
            {
              "role": "user",
              "content": "What is a mixture of experts model??"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

Databricks Python SDK

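The following is a chat request for the Meta Llama 3.3 70B Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct, in your workspace.

This code must be run in a notebook in your workspace. See Use the Databricks SDK for Python from an Azure Databricks notebook.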

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-meta-llama-3-3-70b-instruct",
    messages=[
        ChatMessage(
            role=ChatMessageRole.SYSTEM, content="You are a helpful assistant."
        ),
        ChatMessage(
            role=ChatMessageRole.USER, content="What is a mixture of experts model?"
        ),
    ],
    max_tokens=128,
)
print(f"RESPONSE:\n{response.choices[0].message.content}")

LangChain

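To query a foundation model endpoint using LangChain, you can use the ChatDatabricks ChatModel class and specify the endpoint.

The following example uses the ChatDatabricks ChatModel class in LangChain to query the Foundation Model APIs pay-per-token endpoint, databricks-meta-llama-3-3-70b-instruct.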

%pip install databricks-langchain

from langchain_core.messages import HumanMessage, SystemMessage
from databricks_langchain import ChatDatabricks

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
llm.invoke(messages)

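As an example, the following is the expected request format for a chat model when using the REST API. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.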

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
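
A chat request body in this format can be built and checked before it is sent. This is a sketch under stated assumptions: build_chat_payload is a hypothetical helper, and ALLOWED_ROLES lists only the roles that appear in this article's examples, so endpoints that accept further roles would need the set extended.

```python
# Assumption: only the roles used in this article's examples are accepted here.
ALLOWED_ROLES = {"system", "user", "assistant"}


def build_chat_payload(messages: list, max_tokens: int = 100, temperature: float = 0.1) -> dict:
    """Validate chat messages and wrap them in a request body."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for position, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {position} has unsupported role {message.get('role')!r}")
        if "content" not in message:
            raise ValueError(f"message {position} is missing 'content'")
    return {
        "messages": list(messages),
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


chat_payload = build_chat_payload(
    [{"role": "user", "content": "What is a mixture of experts model?"}]
)
```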

The following is the expected response format for a request made using the REST API:

{
  "model": "databricks-meta-llama-3-3-70b-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}

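Reasoning models

Mosaic AI Model Serving provides a unified API to interact with reasoning models. Reasoning gives foundation models enhanced capabilities to tackle complex tasks. Some models also provide transparency by revealing their step-by-step thought process before delivering a final answer.

There are two types of models: reasoning-only and hybrid. Reasoning-only models, like the OpenAI o series, always use internal reasoning in their responses. Hybrid models, such as databricks-claude-3-7-sonnet, support both fast, instant replies and deeper reasoning when needed.

To enable reasoning in hybrid models, include the thinking parameter and set a budget_tokens value that controls how many tokens the model can use for internal thought. Higher budgets can improve quality for complex tasks, but usage above 32K may vary. budget_tokens must be less than max_tokens.

All reasoning models are accessed through the chat completions endpoint.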

import os

from openai import OpenAI

client = OpenAI(
  api_key=os.environ.get('YOUR_DATABRICKS_TOKEN'),
  base_url=os.environ.get('YOUR_DATABRICKS_BASE_URL')
  )

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

msg = response.choices[0].message
reasoning = msg.content[0]["summary"][0]["text"]
answer = msg.content[1]["text"]

print("Reasoning:", reasoning)
print("Answer:", answer)
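
The constraint that budget_tokens must be less than max_tokens can be checked on the client before a request is sent. The helper below is a hypothetical sketch, not part of any SDK; it returns keyword arguments shaped like the ones passed to chat.completions.create in the example above.

```python
def build_thinking_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Return max_tokens and extra_body arguments with reasoning enabled, after validating the budget."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {
        "max_tokens": max_tokens,
        "extra_body": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }


thinking_kwargs = build_thinking_kwargs(max_tokens=20480, budget_tokens=10240)
# Usage with an existing client:
# client.chat.completions.create(model="databricks-claude-3-7-sonnet", messages=[...], **thinking_kwargs)
```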

The API response includes both thinking and text content blocks:

ChatCompletionMessage(
    role="assistant",
    content=[
        {
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": ("The question is asking about the scientific explanation for why the sky appears blue... "),
                    "signature": ("EqoBCkgIARABGAIiQAhCWRmlaLuPiHaF357JzGmloqLqkeBm3cHG9NFTxKMyC/9bBdBInUsE3IZk6RxWge...")
                }
            ]
        },
        {
            "type": "text",
            "text": (
                "# Why the Sky Is Blue\n\n"
                "The sky appears blue because of a phenomenon called Rayleigh scattering. Here's how it works..."
            )
        }
    ],
    refusal=None,
    annotations=None,
    audio=None,
    function_call=None,
    tool_calls=None
)
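
Indexing content[0] and content[1] directly assumes the blocks always arrive in that order. A more defensive approach is to dispatch on each block's type field. The following sketch assumes the content blocks are plain dicts shaped like the response above; split_reasoning_and_text is a hypothetical helper.

```python
def split_reasoning_and_text(content) -> tuple[str, str]:
    """Separate reasoning summaries from answer text in a message's content."""
    # A response without reasoning may carry a plain string instead of a list of blocks.
    if isinstance(content, str):
        return "", content
    reasoning_parts, text_parts = [], []
    for block in content:
        if block.get("type") == "reasoning":
            for summary in block.get("summary", []):
                reasoning_parts.append(summary.get("text", ""))
        elif block.get("type") == "text":
            text_parts.append(block.get("text", ""))
    return "\n".join(reasoning_parts), "\n".join(text_parts)


sample_content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "Think about scattering."}]},
    {"type": "text", "text": "Rayleigh scattering."},
]
reasoning_text, answer_text = split_reasoning_and_text(sample_content)
```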

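Manage reasoning across multiple turns

This section is specific to the databricks-claude-3-7-sonnet model.

In multi-turn conversations, only the reasoning blocks associated with the last assistant turn or tool-use session are visible to the model and counted as input tokens.

If you don't want to pass reasoning tokens back to the model (for example, you don't need it to reason over its prior steps), you can omit the reasoning block entirely. For example: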

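# text_content is the text block of the previous assistant reply, with the reasoning block left out
text_content = msg.content[1]["text"]

response = client.chat.completions.create(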
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)

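However, if you do need the model to reason over its previous reasoning process, for instance if you're building experiences that surface its intermediate reasoning, you must include the full, unmodified assistant message, including the reasoning block from the previous turn. Here's how to continue a thread with the full assistant message: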

assistant_message = response.choices[0].message

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": text_content},
        {"role": "user", "content": "Can you explain in a way that a 5-year-old child can understand?"},
        assistant_message,
        {"role": "user", "content": "Can you simplify the previous answer?"}
    ],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

answer = response.choices[0].message.content[1]["text"]
print("Answer:", answer)
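
The two options above, dropping the reasoning block or passing the assistant content back unmodified, can be captured in one small helper. This is a hypothetical sketch that assumes the assistant content is either a string or a list of dict blocks, as in the sample response earlier in this section.

```python
def assistant_turn(content, keep_reasoning: bool) -> dict:
    """Build the assistant message for the next request, with or without reasoning blocks."""
    if keep_reasoning or isinstance(content, str):
        # Pass the content back unmodified so the model can see its earlier reasoning.
        return {"role": "assistant", "content": content}
    # Otherwise keep only the text blocks and drop the reasoning blocks.
    text = "\n".join(
        block.get("text", "") for block in content if block.get("type") == "text"
    )
    return {"role": "assistant", "content": text}


previous_content = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "..."}]},
    {"type": "text", "text": "The sky is blue because of Rayleigh scattering."},
]
without_reasoning = assistant_turn(previous_content, keep_reasoning=False)
with_reasoning = assistant_turn(previous_content, keep_reasoning=True)
```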

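How does a reasoning model work?

Reasoning models introduce special reasoning tokens in addition to the standard input and output tokens. These tokens let the model "think" through the prompt, breaking it down and considering different ways to respond. After this internal reasoning process, the model generates its final answer as visible output tokens. Some models, like databricks-claude-3-7-sonnet, display these reasoning tokens to users, while others, such as the OpenAI o series, discard them and do not expose them in the final output.

Supported models

Foundation Model API (Databricks-hosted)

databricks-claude-3-7-sonnet

External models

OpenAI models with reasoning capabilities
Anthropic Claude models with reasoning capabilities
Google Gemini models with reasoning capabilities

Vision models

Mosaic AI Model Serving provides a unified API to understand and analyze images using a variety of foundation models, unlocking powerful multimodal capabilities. This functionality is available through select Databricks-hosted models as part of Foundation Model APIs and through serving endpoints that serve external models.

Code example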


from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# encode image
image_url = "https://upload.wikimedia.org/wikipedia/commons/a/a7/Camponotus_flavomarginatus_ant.jpg"
image_data = base64.standard_b64encode(httpx.get(image_url).content).decode("utf-8")

# OpenAI request
completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "what's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

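The Chat Completions API supports multiple image inputs, allowing the model to analyze each image and synthesize information from all inputs to generate a response to the prompt.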


from openai import OpenAI
import base64
import httpx

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

# Encode multiple images

image1_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image1_data = base64.standard_b64encode(httpx.get(image1_url).content).decode("utf-8")

image2_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
image2_data = base64.standard_b64encode(httpx.get(image2_url).content).decode("utf-8")

# OpenAI request

completion = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What are in these images? Is there any difference between them?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image1_data}"},
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image2_data}"},
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)
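
When several images are involved, the repeated encoding and dict construction can be factored out. The helpers below are a hypothetical sketch using only the standard library; they build the same image_url content parts as the examples above from raw image bytes and make no network calls.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Encode raw image bytes as a base64 data URL content part."""
    encoded = base64.standard_b64encode(image_bytes).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


def vision_message(prompt: str, images: list[bytes], mime_type: str = "image/jpeg") -> dict:
    """Build a single user message containing a text prompt followed by the images."""
    content = [{"type": "text", "text": prompt}]
    content.extend(image_part(data, mime_type) for data in images)
    return {"role": "user", "content": content}


# Placeholder bytes stand in for real image data in this sketch.
message = vision_message("What are in these images?", [b"abc", b"abc"])
```

The resulting message can be passed as an element of the messages list in chat.completions.create.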

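Supported models

Foundation Model API (Databricks-hosted)

databricks-claude-3-7-sonnet

External models

OpenAI GPT and o series models with vision capabilities
Anthropic Claude models with vision capabilities
Google Gemini models with vision capabilities
Other external foundation models with vision capabilities that are OpenAI API compatible are also supported.

Input image requirements

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Multiple images per request

Up to 20 images for Claude.ai
Up to 100 images for API requests
All provided images are processed in a request, which is useful for comparing or contrasting them.

Size limits

Images larger than 8000x8000 px are rejected.
If more than 20 images are submitted in one API request, the maximum allowed size per image is 2000 x 2000 px.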
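
These limits can be checked on the client before images are uploaded. The following is a hypothetical sketch: check_image_batch is not part of any SDK, and it assumes "larger than 8000x8000" means that either dimension exceeds the limit.

```python
MAX_IMAGES_PER_REQUEST = 100   # API request limit stated above
MAX_EDGE_PX = 8000             # per-image limit for requests with up to 20 images
MANY_IMAGES_THRESHOLD = 20     # above this count the stricter limit applies
MAX_EDGE_PX_MANY = 2000        # per-image limit for requests with more than 20 images


def check_image_batch(sizes: list[tuple[int, int]]) -> list[str]:
    """Return a list of problems for a batch of (width, height) image sizes; empty if none."""
    problems = []
    if len(sizes) > MAX_IMAGES_PER_REQUEST:
        problems.append(f"{len(sizes)} images exceed the limit of {MAX_IMAGES_PER_REQUEST}")
    edge_limit = MAX_EDGE_PX_MANY if len(sizes) > MANY_IMAGES_THRESHOLD else MAX_EDGE_PX
    for position, (width, height) in enumerate(sizes):
        # Assumption: an image is too large if either dimension exceeds the limit.
        if width > edge_limit or height > edge_limit:
            problems.append(
                f"image {position} is {width}x{height}, larger than {edge_limit}x{edge_limit}"
            )
    return problems


print(check_image_batch([(8000, 8000), (9000, 100)]))
```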

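Image resizing recommendations

For optimal performance, resize images before uploading them if they are too large.
If an image's long edge exceeds 1568 pixels, or its size exceeds approximately 1,600 tokens, it is automatically scaled down while preserving its aspect ratio.
Very small images (under 200 pixels on any edge) may degrade performance.
To reduce latency, keep images within 1.15 megapixels and at most 1568 pixels in both dimensions.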
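
Target dimensions for a client-side resize can be computed ahead of time. The sketch below models only the long-edge rule (1568 pixels) while preserving aspect ratio; it does not model the approximate 1,600-token rule, and fit_long_edge is a hypothetical helper. The actual resizing would be done with an image library of your choice.

```python
def fit_long_edge(width: int, height: int, max_edge: int = 1568) -> tuple[int, int]:
    """Return dimensions scaled down so the long edge is at most max_edge, keeping aspect ratio."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height  # already small enough, leave unchanged
    scale = max_edge / long_edge
    # Round to whole pixels and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_long_edge(3136, 1568))
print(fit_long_edge(1000, 500))
```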

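Image quality considerations

Supported formats: JPEG, PNG, GIF, WebP.
Clarity: Avoid blurry or pixelated images.
Text in images:
- Ensure text is legible and not too small.
- Avoid cropping out key visual context just to enlarge text.

Calculate costs

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Each image in a request to a foundation model adds to your token usage.

Token counts and estimates

If no resizing is needed, estimate tokens with:
tokens = (width px × height px) / 750

Approximate token counts for different image sizes:

Image size	Tokens
200×200 px (0.04 MP)	~54
1000×1000 px (1 MP)	~1334
1092×1092 px (1.19 MP)	~1590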
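
The estimate can be reproduced in a few lines. This sketch applies the formula above and rounds up, which matches the approximate values in the table; estimate_image_tokens is a hypothetical helper and applies only to images that do not need resizing.

```python
import math


def estimate_image_tokens(width_px: int, height_px: int) -> int:
    """Estimate token usage for an image that needs no resizing: (width x height) / 750, rounded up."""
    return math.ceil(width_px * height_px / 750)


for size in [(200, 200), (1000, 1000), (1092, 1092)]:
    print(size, estimate_image_tokens(*size))
```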

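Limitations of image understanding

This section applies only to Foundation Model APIs. For external models, refer to the provider's documentation.

Advanced image understanding for Claude models on Databricks has some limitations:

People identification: Cannot identify or name people in images.
Accuracy: May misinterpret low-quality, rotated, or very small images (<200 px).
Spatial reasoning: Struggles with precise layouts, such as reading analog clocks or chess positions.
Counting: Provides approximate counts, but may be inaccurate for many small objects.
AI-generated images: Cannot reliably detect synthetic or fake images.
Inappropriate content: Blocks explicit or policy-violating images.
Healthcare: Not suited for complex medical scans (for example, CTs and MRIs). It is not a diagnostic tool.

Review all outputs carefully, especially for high-stakes use cases. Avoid using Claude without human oversight for tasks that require perfect precision or sensitive analysis.

Embedding model query

The following is an embeddings request for the gte-large-en model made available by Foundation Model APIs. The example applies to querying an embedding model made available using either of the Model Serving capabilities: Foundation Model APIs or external models.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input.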


from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

response = openai_client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

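To query foundation models outside of your workspace, you must use the OpenAI client directly, as shown below. The following example assumes you have a Databricks API token and openai installed on your compute. You also need your Databricks workspace instance to connect the OpenAI client to Databricks.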


import os
import openai
from openai import OpenAI

client = OpenAI(
    api_key="dapi-your-databricks-token",
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.embeddings.create(
  model="databricks-gte-large-en",
  input="what is databricks"
)

SQL

Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change.


SELECT ai_query(
    "databricks-gte-large-en",
    "Can you explain AI in ten words?"
  )

REST API

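Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models or external models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.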


curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-gte-large-en/invocations

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


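import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"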

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-gte-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)

Databricks Python SDK


from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

w = WorkspaceClient()
response = w.serving_endpoints.query(
    name="databricks-gte-large-en",
    input="Embed this sentence!"
)
print(response.data[0].embedding)

LangChain

To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:

%pip install databricks-langchain

from databricks_langchain import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-gte-large-en")
embeddings.embed_query("Can you explain AI in ten words?")

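The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.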


{
  "input": [
    "embedding text"
  ]
}

The following is the expected response format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

Check if embeddings are normalized

Use the following to check if the embeddings generated by your model are normalized.


  import numpy as np

  def is_normalized(vector: list[float], tol=1e-3) -> bool:
      magnitude = np.linalg.norm(vector)
      return abs(magnitude - 1) < tol
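
If the check fails, the vectors can be normalized on the client. The following is a dependency-free sketch of the same check plus an L2 normalization step, using only the math module; l2_normalize and is_unit_length are hypothetical helper names.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    magnitude = math.sqrt(sum(x * x for x in vector))
    if magnitude == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / magnitude for x in vector]


def is_unit_length(vector: list[float], tol: float = 1e-3) -> bool:
    """Same check as is_normalized above, without requiring NumPy."""
    return abs(math.sqrt(sum(x * x for x in vector)) - 1) < tol


raw = [3.0, 4.0]
print(is_unit_length(raw), is_unit_length(l2_normalize(raw)))
```

With normalized embeddings, the dot product of two vectors equals their cosine similarity.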

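Function calling

Databricks Function Calling is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs and serving endpoints that serve external models. For details, see Function calling on Azure Databricks.

Structured outputs

Structured outputs is OpenAI-compatible and is only available during model serving as part of Foundation Model APIs. For details, see Structured outputs on Azure Databricks.

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Azure Databricks workspace.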
