Tracing LangChain

2025-06-11

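LangChain tracing via autolog

LangChain is an open-source framework for building LLM-powered applications.

MLflow Tracing provides automatic tracing for LangChain. Enable it by calling the mlflow.langchain.autolog function. After that, nested traces are logged automatically to the active MLflow experiment whenever a chain is invoked.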

import mlflow

mlflow.langchain.autolog()

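Prerequisites

To use MLflow Tracing with LangChain, install MLflow and the relevant LangChain packages (for example, langchain and langchain-openai).

Development

For development environments, install the full MLflow package with the Databricks extras, along with the LangChain packages: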

pip install --upgrade "mlflow[databricks]>=3.1" langchain langchain-openai
# Add other langchain community/core packages if needed

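The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

Production

For production deployments, install mlflow-tracing and the LangChain packages: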

pip install --upgrade mlflow-tracing langchain langchain-openai
# Add other langchain community/core packages if needed

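The mlflow-tracing package is optimized for production use.

Note

MLflow 3 is strongly recommended for the best tracing experience with LangChain. If you run into issues, see the example below for specific compatible versions of the LangChain packages.

Before running the examples, configure your environment:

If you are not using a Databricks notebook, set the Databricks environment variables: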

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

If you are using a Databricks notebook, these credentials are set for you automatically.

API keys: set the API keys for your LLM provider:

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
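
Before running the examples, it can help to fail fast when a credential is missing. The following is a minimal sketch of such a preflight check. The helper name missing_env_vars is hypothetical (it is not part of MLflow or LangChain), and the default variable list simply mirrors the export commands above.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Variable names taken from the export commands above.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "OPENAI_API_KEY")


def missing_env_vars(
    required: Sequence[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Inside a Databricks notebook the Databricks credentials are provided automatically, so there it would be enough to check only the provider key, for example missing_env_vars(("OPENAI_API_KEY",)).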

Example usage

import mlflow
import os

from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling autolog for LangChain will enable trace logging.
mlflow.langchain.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langchain-tracing-demo")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7, max_tokens=1000)

prompt_template = PromptTemplate.from_template(
    "Answer the question as if you are {person}, fully embodying their style, wit, personality, and habits of speech. "
    "Emulate their quirks and mannerisms to the best of your ability, embracing their traits—even if they aren't entirely "
    "constructive or inoffensive. The question is: {question}"
)

chain = prompt_template | llm | StrOutputParser()

# Invoke the chain; the call is traced automatically
chain.invoke(
    {
        "person": "Linus Torvalds",
        "question": "Can I just set everyone's access to sudo to make things easier?",
    }
)

Note

The example above has been confirmed to work with the following package versions:

pip install openai==1.30.5 langchain==0.2.1 langchain-openai==0.1.8 langchain-community==0.2.1 mlflow==2.14.0 tiktoken==0.7.0
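
If the example misbehaves, comparing the installed versions against the pinned ones is a quick first check. The sketch below uses only the standard library's importlib.metadata. The helper name installed_versions is hypothetical, and the package list mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Package names taken from the pip command above.
PINNED_PACKAGES = (
    "openai",
    "langchain",
    "langchain-openai",
    "langchain-community",
    "mlflow",
    "tiktoken",
)


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent.

    Hypothetical helper for illustration only.
    """
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, found in installed_versions(PINNED_PACKAGES).items():
        print(f"{name}: {found or 'not installed'}")
```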

Supported APIs

Automatic tracing for LangChain supports the following APIs.

invoke
batch
stream
ainvoke
abatch
astream
get_relevant_documents (for retrievers)
__call__ (for chains and AgentExecutors)

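Customize tracing behavior

Sometimes you may want to customize what is recorded in a trace. You can do this by creating a custom callback handler that inherits from mlflow.langchain.langchain_tracer.MlflowLangchainTracer. MlflowLangchainTracer is a callback handler that is injected into the LangChain model inference process to log traces automatically. It starts a new span when a chain action begins (for example, on_chain_start or on_llm_start) and ends the span when the action completes. Metadata such as the span type, action name, inputs, outputs, and latency is recorded on the span automatically.

The following example shows how to record additional attributes on the span when a chat model starts running.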

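from typing import Any, Dict, List, Optional
from uuid import UUID

from langchain_core.messages import BaseMessage
from mlflow.entities import SpanType
from mlflow.langchain.langchain_tracer import MlflowLangchainTracer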


class CustomLangchainTracer(MlflowLangchainTracer):
    # Override the handler functions to customize the behavior. The method signature is defined by LangChain Callbacks.
    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        name: Optional[str] = None,
        **kwargs: Any,
    ):
        """Run when a chat model starts running."""
        attributes = {
            **kwargs,
            **(metadata or {}),
            # Add additional attribute to the span
            "version": "1.0.0",
        }

        # Call the _start_span method at the end of the handler function to start a new span.
        self._start_span(
            span_name=name or self._assign_span_name(serialized, "chat model"),
            parent_run_id=parent_run_id,
            span_type=SpanType.CHAT_MODEL,
            run_id=run_id,
            inputs=messages,
            attributes=attributes,
        )
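
The span lifecycle described in this section (open a span when a start handler fires, close it when the action completes, and key everything by run_id) can be illustrated without MLflow or LangChain. The following is a simplified stand-in under that assumption, not MLflow's implementation. ToySpan and ToyTracer are hypothetical names, and the handler signatures are reduced to the essentials.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToySpan:
    """Hypothetical span record; not an MLflow class."""

    name: str
    span_type: str
    inputs: Any
    parent_run_id: Optional[uuid.UUID]
    attributes: Dict[str, Any] = field(default_factory=dict)
    outputs: Any = None
    start_time: float = field(default_factory=time.perf_counter)
    latency_s: Optional[float] = None


class ToyTracer:
    """Opens a span in the start handler and closes it in the end handler."""

    def __init__(self) -> None:
        self._open: Dict[uuid.UUID, ToySpan] = {}  # spans still running, by run_id
        self.finished: List[ToySpan] = []  # spans in completion order

    def on_chain_start(self, name, inputs, *, run_id, parent_run_id=None, **attributes):
        # Extra keyword arguments become span attributes.
        self._open[run_id] = ToySpan(name, "CHAIN", inputs, parent_run_id, dict(attributes))

    def on_chain_end(self, outputs, *, run_id):
        # Look up the span opened for this run, record outputs and latency.
        span = self._open.pop(run_id)
        span.outputs = outputs
        span.latency_s = time.perf_counter() - span.start_time
        self.finished.append(span)


# Simulate a chain with one nested step.
tracer = ToyTracer()
root_id, child_id = uuid.uuid4(), uuid.uuid4()
tracer.on_chain_start("chain", {"question": "hi"}, run_id=root_id, version="1.0.0")
tracer.on_chain_start("step", "hi", run_id=child_id, parent_run_id=root_id)
tracer.on_chain_end("HI", run_id=child_id)
tracer.on_chain_end({"answer": "HI"}, run_id=root_id)

for span in tracer.finished:
    print(span.name, span.span_type, span.attributes)
```

The nested step finishes before its parent, so it appears first in tracer.finished, and its parent_run_id links it back to the outer chain. This linkage is what allows nested spans to be assembled into a single trace.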

Disable automatic tracing

You can disable automatic tracing for LangChain globally by calling mlflow.langchain.autolog(disable=True) or mlflow.autolog(disable=True).