Tracing LangChain

LangChain Tracing via autolog

LangChain is an open-source framework for building LLM-powered applications.

MLflow Tracing provides automatic tracing capability for LangChain. You can enable tracing for LangChain by calling the mlflow.langchain.autolog function; nested traces are then automatically logged to the active MLflow Experiment whenever a chain is invoked.

import mlflow

mlflow.langchain.autolog()

Prerequisites

To use MLflow Tracing with LangChain, you need to install MLflow and the relevant LangChain packages (e.g., langchain, langchain-openai).

Development

For development environments, install the full MLflow package with Databricks extras and LangChain packages:

pip install --upgrade "mlflow[databricks]>=3.1" langchain langchain-openai
# Add other langchain community/core packages if needed

The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

Production

For production deployments, install mlflow-tracing and LangChain packages:

pip install --upgrade mlflow-tracing langchain langchain-openai
# Add other langchain community/core packages if needed

The mlflow-tracing package is a lightweight SDK with a minimal dependency footprint, optimized for production use.

Note

MLflow 3 is highly recommended for the best tracing experience with LangChain. Check the example below for specific compatible versions of LangChain packages if you encounter issues.

Before running the examples, you'll need to configure your environment:

For users outside Databricks notebooks: Set your Databricks environment variables:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

For users inside Databricks notebooks: These credentials are automatically set for you.

API Keys: Ensure your LLM provider API keys are set:

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
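
If you prefer to configure credentials from Python rather than the shell (for example, at the top of a script), a minimal sketch using os.environ; the values below are placeholders you must replace with your own:

import os

# Placeholder values -- substitute your own workspace URL and tokens
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"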

Example Usage

import mlflow
import os

from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling autolog for LangChain turns on automatic trace logging.
mlflow.langchain.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/langchain-tracing-demo")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7, max_tokens=1000)

prompt_template = PromptTemplate.from_template(
    "Answer the question as if you are {person}, fully embodying their style, wit, personality, and habits of speech. "
    "Emulate their quirks and mannerisms to the best of your ability, embracing their traits—even if they aren't entirely "
    "constructive or inoffensive. The question is: {question}"
)

chain = prompt_template | llm | StrOutputParser()

# Invoke the chain; a trace is automatically logged to the active experiment
chain.invoke(
    {
        "person": "Linus Torvalds",
        "question": "Can I just set everyone's access to sudo to make things easier?",
    }
)

Note

The example above has been confirmed to work with the following package versions:

pip install openai==1.30.5 langchain==0.2.1 langchain-openai==0.1.8 langchain-community==0.2.1 mlflow==2.14.0 tiktoken==0.7.0

Supported APIs

The following APIs are supported by auto tracing for LangChain; a short usage sketch follows the list.

  • invoke
  • batch
  • stream
  • ainvoke
  • abatch
  • astream
  • get_relevant_documents (for retrievers)
  • __call__ (for Chains and AgentExecutors)
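
Trace logging works the same way for each of these entry points. A minimal sketch, reusing the chain from the example above, showing batch and streaming invocations captured by autolog:

# Batch invocation: the calls are traced automatically
answers = chain.batch(
    [
        {"person": "Linus Torvalds", "question": "Is C still the best language?"},
        {"person": "Ada Lovelace", "question": "What makes a program elegant?"},
    ]
)

# Streaming invocation: the trace completes once the stream is exhausted
for chunk in chain.stream(
    {"person": "Linus Torvalds", "question": "What do you think of code comments?"}
):
    print(chunk, end="")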

Customize Tracing Behavior

Sometimes you may want to customize what information is logged in the traces. You can achieve this by creating a custom callback handler that inherits from mlflow.langchain.langchain_tracer.MlflowLangchainTracer. MlflowLangchainTracer is a callback handler that is injected into the LangChain model inference process to log traces automatically. It starts a new span when a chain action such as on_chain_start or on_llm_start fires, and concludes the span when the action finishes. Various metadata, such as span type, action name, inputs, outputs, and latency, are automatically recorded on the span.

The following example demonstrates how to record an additional attribute to the span when a chat model starts running.

from typing import Any, Dict, List, Optional
from uuid import UUID

from langchain_core.messages import BaseMessage

from mlflow.entities import SpanType
from mlflow.langchain.langchain_tracer import MlflowLangchainTracer


class CustomLangchainTracer(MlflowLangchainTracer):
    # Override the handler functions to customize the behavior. The method
    # signature is defined by the LangChain Callbacks interface.
    def on_chat_model_start(
        self,
        serialized: Dict[str, Any],
        messages: List[List[BaseMessage]],
        *,
        run_id: UUID,
        tags: Optional[List[str]] = None,
        parent_run_id: Optional[UUID] = None,
        metadata: Optional[Dict[str, Any]] = None,
        name: Optional[str] = None,
        **kwargs: Any,
    ):
        """Run when a chat model starts running."""
        attributes = {
            **kwargs,
            **(metadata or {}),
            # Add an additional attribute to the span
            "version": "1.0.0",
        }

        # Call the _start_span method at the end of the handler function to start a new span.
        self._start_span(
            span_name=name or self._assign_span_name(serialized, "chat model"),
            parent_run_id=parent_run_id,
            span_type=SpanType.CHAT_MODEL,
            run_id=run_id,
            inputs=messages,
            attributes=attributes,
        )
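
To apply the customized tracer, pass an instance of it as a LangChain callback at invocation time rather than relying on autolog. A minimal sketch, assuming the chain defined in the example above (callbacks passed via the config argument is standard LangChain usage):

tracer = CustomLangchainTracer()

# Pass the tracer through LangChain's standard callbacks mechanism; the
# resulting spans include the extra "version" attribute added above.
chain.invoke(
    {
        "person": "Linus Torvalds",
        "question": "Can I just set everyone's access to sudo to make things easier?",
    },
    config={"callbacks": [tracer]},
)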

Disable auto-tracing

Auto tracing for LangChain can be disabled globally by calling mlflow.langchain.autolog(disable=True) or mlflow.autolog(disable=True).
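
Both forms are shown below; the scoped call disables only the LangChain integration, while mlflow.autolog(disable=True) turns off autologging across all supported libraries:

import mlflow

# Disable auto tracing for LangChain only
mlflow.langchain.autolog(disable=True)

# Or disable autologging for all supported integrations
mlflow.autolog(disable=True)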