Tracing DSPy

DSPy tracing via automatic logging

DSPy is an open-source framework for building modular AI systems, and it provides algorithms for optimizing prompts and weights.

MLflow Tracing provides automatic tracing for DSPy. You can enable tracing for DSPy by calling the mlflow.dspy.autolog function; nested traces are then automatically logged to the active MLflow experiment when DSPy modules are invoked.

import mlflow

mlflow.dspy.autolog()

Prerequisites

To use MLflow Tracing with DSPy, you need to install the MLflow and dspy-ai libraries.

Development

For development environments, install the full MLflow package with the Databricks extras, plus dspy-ai:

pip install --upgrade "mlflow[databricks]>=3.1" dspy-ai

The full mlflow[databricks] package includes all features for local development and experimentation with Databricks.

Production

For production deployments, install mlflow-tracing and dspy-ai:

pip install --upgrade mlflow-tracing dspy-ai

The mlflow-tracing package is optimized for production use.

Note

For the best tracing experience with DSPy, using MLflow 3 is strongly recommended.

Before running the examples, you need to configure your environment:

For users outside Databricks notebooks: Set the Databricks environment variables:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

For users inside Databricks notebooks: These credentials are set automatically for you.

API keys: Make sure your LLM provider API keys are set:

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed

Example usage

import dspy
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling tracing for DSPy
mlflow.dspy.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/dspy-tracing-demo")

# Configure the language model used by DSPy
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)


# Define a simple summarizer model and run it
class SummarizeSignature(dspy.Signature):
    """Given a passage, generate a summary."""

    passage: str = dspy.InputField(desc="a passage to summarize")
    summary: str = dspy.OutputField(desc="a one-line summary of the passage")


class Summarize(dspy.Module):
    def __init__(self):
        super().__init__()  # initialize dspy.Module so sub-modules are registered
        self.summarize = dspy.ChainOfThought(SummarizeSignature)

    def forward(self, passage: str):
        return self.summarize(passage=passage)


summarizer = Summarize()
summarizer(
    passage=(
        "MLflow Tracing is a feature that enhances LLM observability in your Generative AI (GenAI) applications "
        "by capturing detailed information about the execution of your application's services. Tracing provides "
        "a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, "
        "enabling you to easily pinpoint the source of bugs and unexpected behaviors."
    )
)
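
After running the module, you can also inspect the resulting trace programmatically. The following is a minimal sketch assuming MLflow 3, where the mlflow.get_last_active_trace_id and mlflow.get_trace APIs are available:

# Retrieve the trace generated by the summarizer call above (assumes MLflow 3)
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
print(f"Captured {len(trace.data.spans)} spans in trace {trace_id}")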

Tracing during evaluation

Evaluating DSPy programs is an important step in developing AI systems. MLflow Tracing helps you track how your program performs during evaluation by providing detailed information about the program's execution for each input.

When MLflow auto-tracing is enabled for DSPy, traces are generated automatically when DSPy's built-in evaluation suite is executed. The following example demonstrates how to run an evaluation and review the traces in MLflow:

import dspy
from dspy.evaluate.metrics import answer_exact_match
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling tracing for DSPy evaluation
mlflow.dspy.autolog(log_traces_from_eval=True)

# Configure the LM used by the program under evaluation
# (assumption: the same model as in the example above)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Set up MLflow tracking to Databricks if not already configured
# mlflow.set_tracking_uri("databricks")
# mlflow.set_experiment("/Shared/dspy-eval-demo")

# Define a simple evaluation set
eval_set = [
    dspy.Example(
        question="How many 'r's are in the word 'strawberry'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'a's are in the word 'banana'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'e's are in the word 'elephant'?", answer="2"
    ).with_inputs("question"),
]


# Define a program
class Counter(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(
        desc="Should only contain a single number as an answer"
    )


cot = dspy.ChainOfThought(Counter)

# Evaluate the programs
with mlflow.start_run(run_name="CoT Evaluation"):
    evaluator = dspy.evaluate.Evaluate(
        devset=eval_set,
        return_all_scores=True,
        return_outputs=True,
        show_progress=True,
    )
    aggregated_score, outputs, all_scores = evaluator(cot, metric=answer_exact_match)

    # Log the aggregated score
    mlflow.log_metric("exact_match", aggregated_score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "question": [example.question for example in eval_set],
            "answer": [example.answer for example in eval_set],
            "output": outputs,
            "exact_match": all_scores,
        },
        artifact_file="eval_results.json",
    )

If you open the MLflow UI and navigate to the "CoT Evaluation" run, you will see the evaluation results, along with the list of traces generated during evaluation on the Traces tab.
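
You can also fetch these traces programmatically. The sketch below assumes a recent MLflow version in which mlflow.search_traces accepts a run_id argument:

# Look up the run that just finished and fetch its traces as a pandas DataFrame
run = mlflow.last_active_run()
traces_df = mlflow.search_traces(run_id=run.info.run_id)
print(len(traces_df))  # one trace per evaluated example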

Note

You can disable tracing for these steps by calling the mlflow.dspy.autolog function with the log_traces_from_eval parameter set to False.
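
For example:

# Keep module-level tracing but skip traces from evaluation runs
mlflow.dspy.autolog(log_traces_from_eval=False)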

Tracing during compilation (optimization)

Compilation (optimization) is a core concept in DSPy. Through compilation, DSPy automatically optimizes the prompts and weights of a DSPy program for the best performance.

By default, MLflow does not generate traces during compilation, because compilation can trigger hundreds or even thousands of invocations of DSPy modules. To enable tracing during compilation, call the mlflow.dspy.autolog function with the log_traces_from_compile parameter set to True.

import dspy
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enable auto-tracing for compilation
mlflow.dspy.autolog(log_traces_from_compile=True)

# Set up MLflow tracking to Databricks if not already configured
# mlflow.set_tracking_uri("databricks")
# mlflow.set_experiment("/Shared/dspy-compile-demo")

# Optimize the DSPy program as usual
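# NOTE: `metric`, `cot`, and `trainset` are assumed to be defined earlier
# (for example, the metric and ChainOfThought program from the evaluation section).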
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)
optimized = tp.compile(cot, trainset=trainset, ...)

Disabling auto tracing

Auto tracing for DSPy can be disabled globally by calling mlflow.dspy.autolog(disable=True) or mlflow.autolog(disable=True).
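
For example:

import mlflow

# Disable auto tracing for DSPy only
mlflow.dspy.autolog(disable=True)

# Or disable all MLflow autologging integrations globally
mlflow.autolog(disable=True)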