Tracing DSPy

DSPy Tracing via autolog

DSPy is an open-source framework for building modular AI systems and offers algorithms for optimizing their prompts and weights.

MLflow Tracing provides automatic tracing capability for DSPy. To enable tracing for DSPy, call the mlflow.dspy.autolog function; nested traces are then automatically logged to the active MLflow Experiment whenever DSPy modules are invoked.

import mlflow

mlflow.dspy.autolog()

Prerequisites

To use MLflow Tracing with DSPy, you need to install MLflow and the dspy-ai library.

Development

For development environments, install the full MLflow package with Databricks extras and dspy-ai:

pip install --upgrade "mlflow[databricks]>=3.1" dspy-ai

The full mlflow[databricks] package includes all features for local development and experimentation on Databricks.

Production

For production deployments, install mlflow-tracing and dspy-ai:

pip install --upgrade mlflow-tracing dspy-ai

The mlflow-tracing package is optimized for production use.

Note

MLflow 3 is highly recommended for the best tracing experience with DSPy.
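
To confirm which MLflow version is installed in your environment, you can check it from Python:

import mlflow

print(mlflow.__version__)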

Before running the examples, you'll need to configure your environment:

For users outside Databricks notebooks: Set your Databricks environment variables:

export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"

For users inside Databricks notebooks: These credentials are automatically set for you.

API Keys: Ensure your LLM provider API keys are set:

export OPENAI_API_KEY="your-openai-api-key"
# Add other provider keys as needed
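
If you prefer to configure these variables from Python (for example, at the top of a script outside a Databricks notebook) rather than in the shell, a minimal sketch follows; the values are placeholders to replace with your own:

import os

os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "your-personal-access-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"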

Example Usage

import dspy
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling tracing for DSPy
mlflow.dspy.autolog()

# Set up MLflow tracking to Databricks
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/dspy-tracing-demo")

# Configure the language model for DSPy
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)


# Define a simple summarizer model and run it
class SummarizeSignature(dspy.Signature):
    """Given a passage, generate a summary."""

    passage: str = dspy.InputField(desc="a passage to summarize")
    summary: str = dspy.OutputField(desc="a one-line summary of the passage")


class Summarize(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.ChainOfThought(SummarizeSignature)

    def forward(self, passage: str):
        return self.summarize(passage=passage)


summarizer = Summarize()
summarizer(
    passage=(
        "MLflow Tracing is a feature that enhances LLM observability in your Generative AI (GenAI) applications "
        "by capturing detailed information about the execution of your application's services. Tracing provides "
        "a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, "
        "enabling you to easily pinpoint the source of bugs and unexpected behaviors."
    )
)
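
The trace for this call is logged to the active MLflow Experiment and can be viewed in the MLflow UI. To inspect it programmatically instead, here is a minimal sketch using MLflow's trace retrieval APIs (assumes MLflow 3):

trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
print(trace.info.trace_id)  # ID of the logged trace
print(len(trace.data.spans))  # number of spans captured across the DSPy modules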

Tracing during Evaluation

Evaluating DSPy programs is an important step in the development of AI systems. MLflow Tracing can help you understand the performance of your programs after an evaluation by providing detailed information about the execution of your program for each input.

When MLflow auto-tracing is enabled for DSPy, traces will be automatically generated when you execute DSPy's built-in evaluation suites. The following example demonstrates how to run evaluation and review traces in MLflow:

import dspy
from dspy.evaluate.metrics import answer_exact_match
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enabling tracing for DSPy evaluation
mlflow.dspy.autolog(log_traces_from_eval=True)

# Set up MLflow tracking to Databricks if not already configured
# mlflow.set_tracking_uri("databricks")
# mlflow.set_experiment("/Shared/dspy-eval-demo")

# Define a simple evaluation set
eval_set = [
    dspy.Example(
        question="How many 'r's are in the word 'strawberry'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'a's are in the word 'banana'?", answer="3"
    ).with_inputs("question"),
    dspy.Example(
        question="How many 'e's are in the word 'elephant'?", answer="2"
    ).with_inputs("question"),
]


# Define a program
class Counter(dspy.Signature):
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(
        desc="Should only contain a single number as an answer"
    )


cot = dspy.ChainOfThought(Counter)

# Evaluate the programs
with mlflow.start_run(run_name="CoT Evaluation"):
    evaluator = dspy.evaluate.Evaluate(
        devset=eval_set,
        return_all_scores=True,
        return_outputs=True,
        show_progress=True,
    )
    aggregated_score, outputs, all_scores = evaluator(cot, metric=answer_exact_match)

    # Log the aggregated score
    mlflow.log_metric("exact_match", aggregated_score)
    # Log the detailed evaluation results as a table
    mlflow.log_table(
        {
            "question": [example.question for example in eval_set],
            "answer": [example.answer for example in eval_set],
            "output": outputs,
            "exact_match": all_scores,
        },
        artifact_file="eval_results.json",
    )

If you open the MLflow UI and go to the "CoT Evaluation" run, you will see the evaluation results, and the Traces tab lists the traces generated during the evaluation.
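
You can also retrieve these traces programmatically with mlflow.search_traces. A minimal sketch, assuming the evaluation run above is the most recently completed run in the active experiment:

run = mlflow.last_active_run()
traces = mlflow.search_traces(run_id=run.info.run_id)
print(f"{len(traces)} traces logged for run {run.info.run_id}")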

Note

You can disable tracing for these evaluation steps by calling the mlflow.dspy.autolog function with the log_traces_from_eval parameter set to False.
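
For example:

mlflow.dspy.autolog(log_traces_from_eval=False)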

Tracing during Compilation (Optimization)

Compilation (optimization) is the core concept of DSPy. Through compilation, DSPy automatically optimizes the prompts and weights of your DSPy program to achieve the best performance.

By default, MLflow does NOT generate traces during compilation, because compilation can trigger hundreds or thousands of invocations of DSPy modules. To enable tracing for compilation, call the mlflow.dspy.autolog function with the log_traces_from_compile parameter set to True.

import dspy
import mlflow
import os

# Ensure your OPENAI_API_KEY (or other LLM provider keys) is set in your environment
# os.environ["OPENAI_API_KEY"] = "your-openai-api-key" # Uncomment and set if not globally configured

# Enable auto-tracing for compilation
mlflow.dspy.autolog(log_traces_from_compile=True)

# Set up MLflow tracking to Databricks if not already configured
# mlflow.set_tracking_uri("databricks")
# mlflow.set_experiment("/Shared/dspy-compile-demo")

# Optimize the DSPy program as usual
# (assumes `cot`, `metric`, and `trainset` are already defined elsewhere in your code)
tp = dspy.MIPROv2(metric=metric, auto="medium", num_threads=24)
optimized = tp.compile(cot, trainset=trainset, ...)

Disable auto-tracing

Auto tracing for DSPy can be disabled globally by calling mlflow.dspy.autolog(disable=True) or mlflow.autolog(disable=True).
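
For example:

# Disable the DSPy integration only
mlflow.dspy.autolog(disable=True)

# Or disable all MLflow autologging integrations
mlflow.autolog(disable=True)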