Monitor apps deployed outside of Azure Databricks (legacy)

2025-06-11

Important

This feature is in Beta.

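Important

This page describes Agent Evaluation version 0.22 with MLflow 2.x. Databricks recommends using MLflow 3, which is integrated with Agent Evaluation >1.0. The Agent Evaluation SDK methods are now exposed through the mlflow SDK.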

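For information on this topic, see Production quality monitoring (running scorers automatically).

This page describes how to set up monitoring for generative AI apps deployed outside of Mosaic AI Agent Framework. For general information about using monitoring, such as the results schema, viewing results, using the UI, adding alerts, and managing monitors, see Monitor generative AI apps (legacy).

Lakehouse Monitoring for gen AI helps you track operational metrics such as volume, latency, errors, and cost, as well as quality metrics such as correctness and guideline adherence, using Mosaic AI Agent Evaluation AI judges.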

How monitoring works:

How it works overview

The monitoring UI:

Lakehouse Monitoring for gen AI UI hero image

Requirements

To enable and configure monitoring, you must use the databricks-agents SDK.

%pip install "databricks-agents>=0.22.0" "mlflow>=2.21.2"
dbutils.library.restartPython()

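Important

You do not need to install databricks-agents in your production application. Your production application only needs mlflow installed, which enables MLflow Tracing instrumentation.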

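Set up monitoring

If you have an AI app deployed outside of Databricks or are using Databricks Apps, use the create_external_monitor method inside a Databricks notebook to set up the monitor.

Note

Monitors are created inside an MLflow experiment. The monitoring UI appears as a tab in the MLflow experiment. Access to logged traces is controlled through the MLflow experiment's access control lists (ACLs).

You then instrument your deployed code using MLflow tracing and mlflow.tracing.set_destination, as described below. For apps deployed using Agent Framework, see Deploy an agent for generative AI applications.

The steps below assume you are working from a Databricks notebook. To create a monitor from your local development environment (for example, your IDE), download and use this Python script instead.

This notebook includes the steps shown in the rest of this page.

Create an external monitor example notebook

Get notebook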

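Step 1: Create an MLflow experiment

You can either use an existing experiment or create a new one. To create a new experiment, follow the steps below.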

import mlflow
mlflow_experiment_path = "/Users/your-user-name@company.com/my_monitor_name"
# set_experiment creates the experiment if it does not exist and returns it
experiment = mlflow.set_experiment(experiment_name=mlflow_experiment_path)

# Get the experiment ID to use in the next step
experiment_id = experiment.experiment_id

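Step 2: Create an external monitor

Important

By default, if you run create_external_monitor from a Databricks notebook without explicitly specifying an experiment, the monitor is created in the notebook's MLflow experiment.

create_external_monitor takes the following inputs: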

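catalog_name: str - The name of the catalog to write Delta artifacts to.
schema_name: str - The name of the schema to write Delta artifacts to. This schema should be part of the catalog above.
[Optional] experiment_id: The MLflow experiment_id where production metrics are stored. Either this or experiment_name should be defined. If not specified, the monitor uses the experiment of the notebook where this command was run.
[Optional] experiment_name: The MLflow experiment_name where production metrics are stored. Either this or experiment_id should be defined. If not specified, the monitor uses the experiment of the notebook where this command was run.
assessments_config: AssessmentsSuiteConfig | dict - Configuration for the assessments computed by the monitor. The following parameters are supported:
- [Optional] sample: float - The global sample rate, that is, the fraction of requests to compute assessments for (between 0 and 1). Defaults to 1.0 (compute assessments for all traffic).
- [Optional] paused: bool - Whether the monitor is paused.
- [Optional] assessments: list[BuiltinJudge | GuidelinesJudge] - A list of assessments, each either a built-in judge or a guidelines judge.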

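BuiltinJudge takes the following arguments:

name: str - One of the built-in judges supported in monitoring: "safety", "groundedness", "relevance_to_query", "chunk_relevance". For more details on the built-in judges, see Built-in judges.
[Optional] sample_rate: float - The fraction of requests to compute this assessment for (between 0 and 1). Defaults to the global sample rate.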

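GuidelinesJudge takes the following arguments:

guidelines: dict[str, list[str]] - A dict containing guideline names and plain-text guidelines that are used to assert over the request/response. For more details on guidelines, see Guideline adherence.
[Optional] sample_rate: float - The fraction of requests to compute guidelines for (between 0 and 1). Defaults to the global sample rate.

For more details, see the Python SDK documentation.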
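
The defaulting rule for sample rates described above can be sketched in plain Python. This is an illustration only, not part of the databricks-agents SDK: effective_sample_rates is a hypothetical helper, and it assumes the dict form of assessments_config, where each assessment is a dict with a name and an optional sample_rate. How the service combines the two rates internally is not specified on this page.

```python
from typing import Any


def effective_sample_rates(assessments_config: dict[str, Any]) -> dict[str, float]:
    """Hypothetical helper: map each assessment name to the sample rate it would use."""
    # The global rate defaults to 1.0 (all traffic), as documented for `sample`.
    global_rate = float(assessments_config.get("sample", 1.0))
    if not 0.0 <= global_rate <= 1.0:
        raise ValueError(f"sample must be between 0 and 1, got {global_rate}")

    rates: dict[str, float] = {}
    for assessment in assessments_config.get("assessments", []):
        # A per-assessment sample_rate takes precedence; otherwise use the global rate.
        rate = float(assessment.get("sample_rate", global_rate))
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"sample_rate must be between 0 and 1, got {rate}")
        rates[assessment["name"]] = rate
    return rates


config = {
    "sample": 0.5,
    "assessments": [
        {"name": "safety"},                            # inherits the global rate
        {"name": "groundedness", "sample_rate": 0.4},  # overrides it
    ],
}
print(effective_sample_rates(config))
```

Checking a configuration this way before creating the monitor can catch out-of-range values early.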

For example:

from databricks.agents.monitoring import create_external_monitor, AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge
import mlflow


external_monitor = create_external_monitor(
  catalog_name='my_catalog',
  schema_name='my_schema',
  # experiment_id=..., # Replace this line with your MLflow experiment ID.  By default, create_external_monitor uses the notebook's MLflow experiment.
  assessments_config=AssessmentsSuiteConfig(
    sample=1.0,
    assessments=[
      # Builtin judges: "safety", "groundedness", "relevance_to_query", "chunk_relevance"
      BuiltinJudge(name='safety'),  # or {'name': 'safety'}
      BuiltinJudge(name='groundedness', sample_rate=0.4), # or {'name': 'groundedness', 'sample_rate': 0.4}
      BuiltinJudge(name='relevance_to_query'), # or {'name': 'relevance_to_query'}
      BuiltinJudge(name='chunk_relevance'), # or {'name': 'chunk_relevance'}
      # Create custom judges with the guidelines judge.
      GuidelinesJudge(guidelines={
        "pii": ["The response must not contain personal information."],
        "english": ["The response must be in English"]
      }),
    ]
  )
  # AssessmentsSuiteConfig can also be simple objects:
  # assessments_config={
  #   "sample": 1.0,
  #   "assessments": [
  #     {'name': 'safety'},
  #     {'name': 'groundedness'},
  #     {'name': 'relevance_to_query'},
  #     {'name': 'chunk_relevance'},
  #     {
  #       'name': 'guideline_adherence',
  #       'guidelines': {
  #         'pii': ['The response must not contain personal information.'],
  #         'english': ['The response must be in English']
  #       }
  #     },
  #   ]
  # }
)
print("experiment_id=", external_monitor.experiment_id)

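You will see a link to the monitoring UI in the cell output. The evaluation results can be viewed in this UI and are stored in monitoring_table. To view the evaluated rows, run: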

display(spark.table("cat.schema.monitor_table"))

Step 3: Instrument your gen AI app with MLflow tracing

Install the following package in your deployed agent to get started:

%pip install "mlflow>=2.21.2"

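Important

The DATABRICKS_TOKEN must belong to a service principal or user that has edit access to the MLflow experiment where the monitor is configured.

In your gen AI app, add the following:

Set the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
- DATABRICKS_HOST is your workspace URL, for example, https://workspace-url.databricks.com
- DATABRICKS_TOKEN is a PAT token. Follow these steps.
  - If you want to use a service principal's PAT token, make sure to grant the service principal edit permission on the MLflow experiment configured in the notebook. Without this, MLflow Tracing cannot log traces.
mlflow.tracing.set_destination to set the trace destination.
MLflow automatic tracing or MLflow fluent APIs to trace your app. MLflow supports autologging for many popular frameworks such as LangChain, Bedrock, DSPy, OpenAI, and more.
Databricks authentication tokens so that MLflow can log traces to Databricks.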
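
Because tracing depends on both environment variables being present, it can help to check them at startup before setting the trace destination. The following is a minimal sketch using only the standard library; missing_databricks_env is a hypothetical helper name, not an MLflow or Databricks API, and it only checks that the variables are non-empty, not that the token is valid.

```python
import os
from collections.abc import Mapping

# The two variables this page says the app must set.
REQUIRED_ENV_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical helper: return the required variables that are unset or blank."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


missing = missing_databricks_env()
print("Missing variables:", ", ".join(missing) if missing else "none")
```

An app could log a warning or refuse to start when the returned list is not empty.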

# Environment variables (set these in the app's environment before this code runs):
# DATABRICKS_TOKEN="..."
# DATABRICKS_HOST="..."

import mlflow
from mlflow.tracing.destination import Databricks

# Setup the destination.
mlflow_experiment_id = "..."  # This is the experiment_id that is configured in step 2.
mlflow.tracing.set_destination(Databricks(experiment_id=mlflow_experiment_id))

# Your AI app code, instrumented with MLflow.
# MLflow supports autologging for a variety of frameworks.

## Option 1: MLFlow autologging
import mlflow

mlflow.langchain.autolog()

# Enable other optional logging
# mlflow.langchain.autolog(log_models=True, log_input_examples=True)

# Your LangChain model code here
# ...
# ...

## Option 2: MLflow Fluent APIs
# Initialize OpenAI client
# This example uses the databricks-sdk's convenience method to get an OpenAI client
# In your app, you can use any OpenAI client (or other SDKs)
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()

# These traces will automatically be sent to Databricks.
@mlflow.trace(span_type='AGENT')
def openai_agent(user_input: str):
  return openai_client.chat.completions.create(
      model="databricks-meta-llama-3-3-70b-instruct",
      messages=[
          {
              "role": "system",
              "content": "You are a helpful assistant that always responds in CAPS!",
          },
          {"role": "user", "content": user_input},
      ],
  )

# Call the app to generate a Trace
openai_agent("What is GenAI observability?")

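Step 4: Visit the monitoring UI to view the logged trace

Go to the MLflow experiment configured in step 2. Click the Monitoring tab, then click "Logs" at the top left to see the traces logged in step 3.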

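Monitor execution and scheduling

When you create a monitor, it starts a job that evaluates a sample of the requests made to your endpoint in the last 30 days. This initial evaluation can take several minutes to complete, depending on the volume of requests and the sampling rate.

After the initial evaluation, the monitor automatically refreshes every 15 minutes to evaluate new requests. There is a delay between when a request is made to the endpoint and when the monitor evaluates it.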
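
The figures above lend themselves to a quick back-of-the-envelope estimate. The sketch below is plain arithmetic under stated assumptions: the helper names are hypothetical, the sample size is an expected value rather than a guarantee, and the exact timing of refreshes is not specified on this page.

```python
from datetime import datetime, timedelta, timezone

LOOKBACK = timedelta(days=30)             # initial evaluation window stated above
REFRESH_INTERVAL = timedelta(minutes=15)  # refresh cadence stated above


def initial_window(created_at: datetime) -> tuple[datetime, datetime]:
    """Start and end of the period covered by the initial evaluation."""
    return created_at - LOOKBACK, created_at


def expected_evaluated(request_count: int, sample: float) -> int:
    """Expected number of requests evaluated at a given sample rate (an average)."""
    if not 0.0 <= sample <= 1.0:
        raise ValueError(f"sample must be between 0 and 1, got {sample}")
    return round(request_count * sample)


def refreshes_per_day() -> int:
    """Number of scheduled refreshes in 24 hours at the stated cadence."""
    return timedelta(days=1) // REFRESH_INTERVAL


start, end = initial_window(datetime(2025, 6, 11, tzinfo=timezone.utc))
print(start.date(), end.date(), expected_evaluated(12000, 0.4), refreshes_per_day())
```

Estimates like these give a rough sense of how long the initial job may run and how many judge calls a given sample rate implies.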

Monitors are backed by Databricks Workflows. To manually trigger a refresh of a monitor, find the workflow named [<endpoint_name>] Agent Monitoring Job and click "Run now".

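Limitations

The following are limitations of Lakehouse Monitoring for gen AI apps deployed outside of Databricks:

The maximum trace ingestion throughput is 10 queries per second (QPS).
The MLflow TraceData fields request and response combined cannot exceed 25 KB.
MLflow TraceData spans cannot exceed 1 MB.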
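
An app can approximate the size limits above on the client side before a trace is logged. The sketch below is an assumption-laden illustration, not how Databricks measures payloads: it treats 1 KB as 1024 bytes, measures the UTF-8 size of the JSON-serialized values, and uses the hypothetical helper check_trace_limits.

```python
import json
from typing import Any

MAX_REQUEST_RESPONSE_BYTES = 25 * 1024  # 25 KB, assuming 1 KB = 1024 bytes
MAX_SPANS_BYTES = 1024 * 1024           # 1 MB, assuming 1 MB = 1024 * 1024 bytes


def utf8_size(value: Any) -> int:
    """UTF-8 size of a string, or of the JSON form of any other JSON-serializable value."""
    text = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_trace_limits(request: Any, response: Any, spans: Any) -> list[str]:
    """Hypothetical helper: describe which documented size limits a trace would exceed."""
    problems: list[str] = []
    combined = utf8_size(request) + utf8_size(response)
    if combined > MAX_REQUEST_RESPONSE_BYTES:
        problems.append(f"request + response is {combined} bytes (limit {MAX_REQUEST_RESPONSE_BYTES})")
    spans_size = utf8_size(spans)
    if spans_size > MAX_SPANS_BYTES:
        problems.append(f"spans are {spans_size} bytes (limit {MAX_SPANS_BYTES})")
    return problems


print(check_trace_limits("What is GenAI observability?", "A SHORT ANSWER", []))
```

A non-empty result suggests truncating or summarizing large inputs and outputs before they are traced.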
