Create custom AI agent tools with Unity Catalog functions

2025-05-14

Use Unity Catalog functions to create AI agent tools that run custom logic and perform specific tasks, extending what LLMs can do beyond language generation.

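Requirements

A serverless compute connection is required to create Unity Catalog functions written as SQL body create function statements. Python functions do not require serverless compute.
Use Databricks Runtime 15.0 and above.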

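Create an agent tool

In this example, you create a Unity Catalog tool, test its functionality, and add it to an agent. Run the following code in a Databricks notebook.

Install dependencies

Install the Unity Catalog AI packages with the [databricks] extra, and install the Databricks-LangChain integration package.

This example uses LangChain, but a similar approach can be applied to other libraries. See Integrate Unity Catalog tools with third-party generative AI frameworks.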

# Install Unity Catalog AI integration packages with the Databricks extra
%pip install unitycatalog-ai[databricks]
%pip install unitycatalog-langchain[databricks]

# Install the Databricks LangChain integration package
%pip install databricks-langchain

dbutils.library.restartPython()

Initialize the Databricks Function Client

Initialize the Databricks Function Client, a specialized interface for creating, managing, and running Unity Catalog functions in Databricks.

from unitycatalog.ai.core.databricks import DatabricksFunctionClient

client = DatabricksFunctionClient()

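Define the tool's logic

Unity Catalog tools are Unity Catalog user-defined functions (UDFs) under the hood. When you define a Unity Catalog tool, you register a function in Unity Catalog. To learn more about Unity Catalog UDFs, see User-defined functions (UDFs) in Unity Catalog.

You can create Unity Catalog functions using one of two APIs:

create_python_function accepts a Python callable.
create_function accepts a SQL body create function statement. See Create Python functions.

Use the create_python_function API to create the function.

For the Unity Catalog functions data model to recognize a Python callable, your function must meet the following requirements:

Type hints: The function signature must define valid Python type hints. Both the named arguments and the return value must have their types defined.
Do not use variable arguments: Variable arguments such as *args and **kwargs are not supported. All arguments must be explicitly defined.
Type compatibility: Not all Python types are supported in SQL. See Spark supported data types.
Descriptive docstrings: The Unity Catalog functions toolkit reads, parses, and extracts important information from your docstring.
- Docstrings must be formatted according to the Google docstring syntax.
- Write clear descriptions for the function and its arguments to help the LLM understand how and when to use the function.
Dependency imports: Libraries must be imported within the function's body. Imports outside the function are not resolved when the tool runs.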
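
Before registering a callable, it can help to check it against the requirements above. The following is a minimal sketch that uses only the Python standard library. The helper name check_tool_signature is hypothetical and is not part of the Unity Catalog AI packages; it covers type hints, variable arguments, and the presence of a docstring, and it does not verify SQL type compatibility or Google docstring formatting.

```python
import inspect
from typing import get_type_hints


def check_tool_signature(func) -> list[str]:
    """Return a list of problems found in the signature of `func` (hypothetical helper)."""
    problems = []
    signature = inspect.signature(func)
    hints = get_type_hints(func)

    for name, param in signature.parameters.items():
        # *args and **kwargs are not supported for Unity Catalog functions.
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            problems.append(f"variable argument not allowed: {name}")
        elif name not in hints:
            problems.append(f"missing type hint: {name}")

    if "return" not in hints:
        problems.append("missing return type hint")
    if not inspect.getdoc(func):
        problems.append("missing docstring")
    return problems


def documented_tool(number_1: float, number_2: float) -> float:
    """Adds two numbers.

    Args:
        number_1 (float): The first number.
        number_2 (float): The second number.

    Returns:
        float: The sum of the two numbers.
    """
    return number_1 + number_2


def untyped_tool(x, *args):
    return x


print(check_tool_signature(documented_tool))
print(check_tool_signature(untyped_tool))
```

An empty list means that none of the checked requirements were violated. It does not guarantee that registration succeeds.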

The following code snippet uses create_python_function to register the Python callable add_numbers:


CATALOG = "my_catalog"
SCHEMA = "my_schema"

def add_numbers(number_1: float, number_2: float) -> float:
  """
  A function that accepts two floating point numbers, adds them,
  and returns the resulting sum as a float.

  Args:
    number_1 (float): The first of the two numbers to add.
    number_2 (float): The second of the two numbers to add.

  Returns:
    float: The sum of the two input numbers.
  """
  return number_1 + number_2

function_info = client.create_python_function(
  func=add_numbers,
  catalog=CATALOG,
  schema=SCHEMA,
  replace=True
)
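
The dependency import requirement is easy to miss because a module-level import works fine in a notebook. As a small sketch, the hypothetical function circle_area below (not part of the original example) keeps its import inside the function body and otherwise follows the same type hint and Google docstring conventions as add_numbers. It could then be registered with the same create_python_function call shown above.

```python
def circle_area(radius: float) -> float:
    """
    Computes the area of a circle from its radius.

    Args:
        radius (float): Radius of the circle, in any length unit.

    Returns:
        float: The area of the circle, in the square of that unit.
    """
    # Import inside the body so the dependency is resolved when the tool runs.
    import math

    return math.pi * radius**2


# Quick local check before registering the function.
print(circle_area(2.0))
```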

Test the function

Test the function to check that it works as expected. Specify the fully qualified function name in the execute_function API to run the function:

result = client.execute_function(
  function_name=f"{CATALOG}.{SCHEMA}.add_numbers",
  parameters={"number_1": 36939.0, "number_2": 8922.4}
)

result.value # OUTPUT: '45861.4'
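
The output comment above shows the value as a string. A minimal sketch of comparing that result with a local call follows. It assumes the string can be parsed with float(), and it uses a literal as a stand-in for result.value so that it runs outside Databricks.

```python
import math


def add_numbers(number_1: float, number_2: float) -> float:
    """Same logic as the registered function, repeated so the sketch runs on its own."""
    return number_1 + number_2


# Stand-in for `result.value`, copied from the output comment above.
remote_value = "45861.4"

local_value = add_numbers(36939.0, 8922.4)

# Compare with a tolerance instead of ==, because floating point sums
# do not always round-trip exactly through a string.
values_match = math.isclose(local_value, float(remote_value))
print(values_match)
```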

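Wrap the function using the UCFunctionToolkit

Wrap the function using the UCFunctionToolkit to make it accessible to agent authoring libraries. The toolkit ensures consistency across different generative AI libraries and adds helpful features such as auto-tracing for retrievers.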

from databricks_langchain import UCFunctionToolkit

# Create a toolkit with the Unity Catalog function
func_name = f"{CATALOG}.{SCHEMA}.add_numbers"
toolkit = UCFunctionToolkit(function_names=[func_name])

tools = toolkit.tools

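Use the tool in an agent

Add the tool to a LangChain agent using the tools property from UCFunctionToolkit.

For simplicity, this example authors a basic agent using the LangChain AgentExecutor API. For production workloads, use the agent authoring workflow shown in the ChatAgent examples.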

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.prompts import ChatPromptTemplate
from databricks_langchain import (
  ChatDatabricks,
  UCFunctionToolkit,
)
import mlflow

# Initialize the LLM (optional: replace with your LLM of choice)
LLM_ENDPOINT_NAME = "databricks-meta-llama-3-3-70b-instruct"
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME, temperature=0.1)

# Define the prompt
prompt = ChatPromptTemplate.from_messages(
  [
    (
      "system",
      "You are a helpful assistant. Make sure to use tools for additional functionality.",
    ),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
  ]
)

# Enable automatic tracing
mlflow.langchain.autolog()

# Define the agent, specifying the tools from the toolkit above
agent = create_tool_calling_agent(llm, tools, prompt)

# Create the agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_executor.invoke({"input": "What is 36939.0 + 8922.4?"})

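Improve tool-calling with clear documentation

Good documentation helps your agents know when and how to use each tool. Follow these best practices for documenting your tools:

For Unity Catalog functions, use the COMMENT clause to describe tool functionality and parameters.
Clearly define expected inputs and outputs.
Write meaningful descriptions to make tools easier for agents and humans to use.

Example: Effective tool documentation

The following example shows clear COMMENT strings for a tool that queries a structured table.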

CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer whose info to look up.'
)
RETURNS STRING
COMMENT 'Returns metadata about a specific customer including their email and ID.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;

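Example: Ineffective tool documentation

The following example lacks important details, making it harder for agents to use the tool effectively: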

CREATE OR REPLACE FUNCTION main.default.lookup_customer_info(
  customer_name STRING COMMENT 'Name of the customer.'
)
RETURNS STRING
COMMENT 'Returns info about a customer.'
RETURN SELECT CONCAT(
    'Customer ID: ', customer_id, ', ',
    'Customer Email: ', customer_email
  )
  FROM main.default.customer_data
  WHERE customer_data.customer_name = lookup_customer_info.customer_name
  LIMIT 1;

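Run functions using serverless or local mode

When a generative AI service determines that a tool call is needed, the integration packages (UCFunctionToolkit instances) call the DatabricksFunctionClient.execute_function API.

The execute_function call can run functions in two execution modes: serverless or local. The mode determines which resource runs the function.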
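
One way to switch between the two modes without editing code is to read the mode from configuration. The following is a minimal sketch, not part of the Unity Catalog AI packages: the environment variable UC_TOOL_EXECUTION_MODE and the helper resolve_execution_mode are hypothetical names chosen for illustration. The client construction is left as a comment so that the sketch runs outside Databricks.

```python
import os

VALID_MODES = {"serverless", "local"}


def resolve_execution_mode(default: str = "serverless") -> str:
    """Read the mode from the hypothetical UC_TOOL_EXECUTION_MODE variable."""
    mode = os.environ.get("UC_TOOL_EXECUTION_MODE", default).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"execution mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return mode


execution_mode = resolve_execution_mode()
print(execution_mode)

# In a Databricks notebook, the value would then be passed to the client:
# client = DatabricksFunctionClient(execution_mode=execution_mode)
```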

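Serverless mode for production

Serverless mode is the default and recommended option for production use cases. It runs functions remotely using a SQL Serverless endpoint, so your agent's process stays secure and is not exposed to the risks of running arbitrary code locally.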

# Defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="serverless")

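When an agent requests a tool execution in serverless mode, the following happens:

The DatabricksFunctionClient sends a request to Unity Catalog to retrieve the function definition if the definition has not been cached locally.
The DatabricksFunctionClient extracts the function definition and validates the parameter names and types.
The DatabricksFunctionClient submits the execution as a UDF to a serverless instance.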
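
The first two steps above (cache the definition, then validate parameter names and types) can be pictured with a small standalone sketch. This is not the client's actual implementation. The data shape (a mapping from parameter name to Python type), the helpers get_definition and validate_parameters, and the fake_fetch stand-in for the Unity Catalog request are all assumptions made for illustration.

```python
from typing import Any, Callable

# Hypothetical shape of a definition: parameter name -> expected Python type.
Definition = dict[str, type]
_definition_cache: dict[str, Definition] = {}


def get_definition(name: str, fetch: Callable[[str], Definition]) -> Definition:
    """Fetch the definition once, then serve it from the local cache."""
    if name not in _definition_cache:
        _definition_cache[name] = fetch(name)
    return _definition_cache[name]


def validate_parameters(definition: Definition, parameters: dict[str, Any]) -> None:
    """Raise if the parameter names or types do not match the definition."""
    if set(parameters) != set(definition):
        raise ValueError(f"expected parameters {sorted(definition)}, got {sorted(parameters)}")
    for key, expected_type in definition.items():
        if not isinstance(parameters[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")


def fake_fetch(name: str) -> Definition:
    # Stand-in for the request to Unity Catalog.
    return {"number_1": float, "number_2": float}


definition = get_definition("my_catalog.my_schema.add_numbers", fake_fetch)
validate_parameters(definition, {"number_1": 36939.0, "number_2": 8922.4})
print("parameters are valid")
```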

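Local mode for development

Local mode is designed for development and debugging. It runs functions in a local subprocess instead of making requests to a SQL Serverless endpoint. Because stack traces are produced locally, you can troubleshoot tool calls more effectively.

When an agent requests a tool execution in local mode, the DatabricksFunctionClient does the following:

Sends a request to Unity Catalog to retrieve the function definition if the definition has not been cached locally.
Extracts the Python callable definition, caches the callable locally, and validates the parameter names and types.
Invokes the callable with the specified parameters in a restricted subprocess with timeout protection.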

# Select local mode explicitly; the client defaults to serverless if `execution_mode` is not specified
client = DatabricksFunctionClient(execution_mode="local")
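
To make the idea of running a callable in a subprocess with a timeout concrete, here is a minimal standard-library sketch. It is an illustration of the general technique under stated assumptions, not how DatabricksFunctionClient implements local mode. The helper run_in_subprocess and the FUNCTION_SOURCE string are hypothetical, and the 20-second default simply mirrors the EXECUTOR_TIMEOUT default listed in this article.

```python
import json
import subprocess
import sys

# Source of the callable, as a string, so it can be handed to a child process.
FUNCTION_SOURCE = '''
def add_numbers(number_1: float, number_2: float) -> float:
    return number_1 + number_2
'''


def run_in_subprocess(
    source: str, func_name: str, parameters: dict, timeout_s: float = 20.0
) -> str:
    """Run `func_name` from `source` in a child Python process and return its printed result."""
    program = (
        f"{source}\n"
        "import json, sys\n"
        f"print({func_name}(**json.loads(sys.argv[1])))\n"
    )
    completed = subprocess.run(
        [sys.executable, "-c", program, json.dumps(parameters)],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # wall-clock limit; raises subprocess.TimeoutExpired
        check=True,  # raises CalledProcessError; its stderr holds the child's traceback
    )
    return completed.stdout.strip()


output = run_in_subprocess(FUNCTION_SOURCE, "add_numbers", {"number_1": 1.5, "number_2": 2.25})
print(output)
```

Running the function in a separate process means a crash or runaway loop in the tool cannot take down the agent's own process, and the parent can still report the child's traceback.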

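Running in "local" mode provides the following features:

CPU time limit: Restricts the total CPU runtime for callable execution to prevent excessive computational loads.

The CPU time limit is based on actual CPU usage, not wall-clock time. Due to system scheduling and concurrent processes, CPU time can exceed wall-clock time in real-world scenarios.
Memory limit: Restricts the virtual memory allocated to the process.
Timeout protection: Enforces a total wall-clock timeout for running functions.

Customize these limits using the environment variables described in the next section.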
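
CPU time and wall-clock time are measured by different clocks, which is why the two limits are configured separately. The following standard-library sketch shows the difference in the opposite direction from the note above: while a process sleeps, wall-clock time passes but almost no CPU time is consumed. The sleep duration and loop size are arbitrary values chosen for illustration.

```python
import time

wall_start = time.perf_counter()  # wall-clock timer
cpu_start = time.process_time()   # CPU time used by this process

time.sleep(0.2)  # waiting: wall-clock time passes, little CPU is used
total = sum(i * i for i in range(200_000))  # computing: consumes CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.3f}s, CPU: {cpu_elapsed:.3f}s")
```

A function that mostly waits on I/O is therefore more likely to hit the wall-clock timeout, while a compute-heavy function is more likely to hit the CPU time limit.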

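Environment variables

Use the following environment variables to configure how functions run in the DatabricksFunctionClient:

Environment variable	Default value	Description
`EXECUTOR_MAX_CPU_TIME_LIMIT`	`10` seconds	Maximum allowable CPU execution time (local mode only).
`EXECUTOR_MAX_MEMORY_LIMIT`	`100` MB	Maximum allowable virtual memory allocation for the process (local mode only).
`EXECUTOR_TIMEOUT`	`20` seconds	Maximum total wall-clock time (local mode only).
`UCAI_DATABRICKS_SESSION_RETRY_MAX_ATTEMPTS`	`5`	Maximum number of attempts to retry refreshing the session client when the token expires.
`UCAI_DATABRICKS_SERVERLESS_EXECUTION_RESULT_ROW_LIMIT`	`100`	Maximum number of rows to return when running functions using serverless compute and `databricks-connect`.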
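
As a sketch of how these variables might be set from a notebook, the snippet below raises the three local-mode limits. It assumes that the values are plain numeric strings in the units listed in the table and that they are set before the function runs; neither assumption is confirmed by this article. The specific numbers are arbitrary examples.

```python
import os

# Assumed format: plain numeric strings in the units listed in the table.
os.environ["EXECUTOR_MAX_CPU_TIME_LIMIT"] = "30"  # seconds of CPU time
os.environ["EXECUTOR_MAX_MEMORY_LIMIT"] = "200"   # MB of virtual memory
os.environ["EXECUTOR_TIMEOUT"] = "60"             # seconds of wall-clock time

# In a Databricks notebook, the local-mode client would be created afterwards:
# client = DatabricksFunctionClient(execution_mode="local")

local_limits = {
    name: os.environ[name]
    for name in ("EXECUTOR_MAX_CPU_TIME_LIMIT", "EXECUTOR_MAX_MEMORY_LIMIT", "EXECUTOR_TIMEOUT")
}
print(local_limits)
```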

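Next steps

Add Unity Catalog tools to agents programmatically. See ChatAgent examples.
Add Unity Catalog tools to agents using the AI Playground UI. See Prototype tool-calling agents in AI Playground.
Manage Unity Catalog functions using the Function Client. See Unity Catalog documentation - Function client.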
