Tracing Data Model

This document provides a detailed overview of the MLflow Trace data model. Understanding this model is key to leveraging MLflow Tracing for observability and analysis of your Generative AI applications.

MLflow Traces are designed to be compatible with OpenTelemetry specifications, a widely adopted industry standard for observability. This ensures interoperability and allows MLflow Traces to be exported and used with other OpenTelemetry-compatible systems. MLflow enhances the basic OpenTelemetry Span model by defining specific structures and attributes for Generative AI use cases, providing richer context and deeper insight into quality and performance.

Structure of Traces

At a high level, an MLflow Trace is composed of two primary objects:

  1. TraceInfo:
    • Metadata that describes the origin of the trace, its status, its total execution time, and other contextual details.
    • Tags that provide additional context for the trace, such as the user, session, and developer-provided key:value pairs. Tags can be used for searching or filtering traces.
    • Assessments that let you attach structured feedback labels (from humans or LLM judges) or ground-truth information to a trace or to specific spans within it.
  2. TraceData:
    • The actual payload, which contains the instrumented Span objects that capture your application's step-by-step execution from input to output.

Tip

Check the API documentation for helper methods on these dataclass objects for more information on how to convert or extract data from them.
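
For instance, both objects are accessible from a retrieved trace. A minimal sketch, assuming a trace with the hypothetical ID "trace_123" has already been logged:

import mlflow

# Retrieve a logged trace by ID ("trace_123" is a hypothetical example)
trace = mlflow.get_trace("trace_123")

# TraceInfo: metadata, tags, and assessments
print(trace.info.trace_id)
print(trace.info.state)
print(trace.info.tags)

# TraceData: the instrumented spans
print(len(trace.data.spans))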

Trace Architecture

1. Trace Info

The TraceInfo within MLflow's tracing feature aims to provide a lightweight snapshot of critical data about the overall trace. TraceInfo is a dataclass object that contains metadata about the trace.

This metadata includes information about the trace's origin, status, and various other data that aids in retrieving and filtering traces when used with mlflow.client.MlflowClient.search_traces and for navigation of traces within the MLflow UI. To learn more about how TraceInfo metadata is used for searching, see the trace search documentation.
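
For example, a minimal sketch of retrieving and filtering traces with the client API (the experiment ID and filter string below are illustrative; consult the trace search documentation for the full filter grammar):

from mlflow import MlflowClient

client = MlflowClient()

# Illustrative experiment ID and filter string
traces = client.search_traces(
    experiment_ids=["1"],
    filter_string="attributes.status = 'OK'",
    max_results=5,
)
for trace in traces:
    print(trace.info.trace_id, trace.info.state)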

Parameter Data Type Description
trace_id str The primary identifier for the trace.
trace_location TraceLocation The ___location where the trace is stored, represented as a TraceLocation object. MLflow currently supports an MLflow Experiment or a Databricks Inference Table as the trace ___location.
request_time int Start time of the trace, in milliseconds.
state TraceState State of the trace, represented as a TraceState enum. Can be one of [OK, ERROR, IN_PROGRESS, STATE_UNSPECIFIED].
request_preview Optional[str] Request to the model/agent, equivalent to the input of the root span, but JSON-encoded and possibly truncated.
response_preview Optional[str] Response from the model/agent, equivalent to the output of the root span, but JSON-encoded and possibly truncated.
client_request_id Optional[str] Client supplied request ID associated with the trace. This could be used to identify the trace/request from an external system that produced the trace, e.g., a session ID in a web application.
execution_duration Optional[int] Duration of the trace, in milliseconds.
trace_metadata dict[str, str] Key-value pairs associated with the trace. They are designed for immutable values, such as the ID of the run associated with the trace.
tags dict[str, str] Tags associated with the trace. They are designed for mutable values that can be updated after the trace is created, via the MLflow UI or API.
assessments list[Assessment] List of assessments associated with the trace.

The data that is contained in the TraceInfo object is used to populate the trace view page within the MLflow tracking UI, as shown below.

TraceInfo as it is used in the MLflow UI

The primary components of MLflow TraceInfo objects are listed below.

Assessments

Assessments are crucial for evaluating the quality and correctness of your GenAI application's behavior as captured in traces. They allow you to attach structured labels, scores, or ground truth information to a trace or specific spans within a trace.

MLflow defines two main types of assessments, both inheriting from a base Assessment concept:

  1. Feedback: Represents qualitative or quantitative judgments about an operation's output. This can come from human reviewers, LLM-as-a-judge, or custom scoring functions.
  2. Expectations: Represents the ground truth or expected outcome for a given operation, often used for direct comparison against actual outputs.

Assessments are typically logged to a trace using functions like mlflow.log_feedback(), mlflow.log_expectation(), or the more general mlflow.log_assessment().
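
For example, a minimal sketch of the generic entry point ("trace_abc" is a hypothetical trace ID):

import mlflow
from mlflow.entities import Feedback

# log_assessment() accepts any assessment type; here, a simple Feedback
mlflow.log_assessment(
    trace_id="trace_abc",
    assessment=Feedback(name="helpfulness", value=4),
)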

Assessment Source

Every assessment is associated with a source to track its origin.

  • source_type: An mlflow.entities.AssessmentSourceType enum. Key values include:
    • HUMAN: Feedback or expectation provided by a human.
    • LLM_JUDGE: Assessment generated by an LLM acting as a judge.
    • CODE: Assessment generated by a programmatic rule, heuristic, or custom scorer.
  • source_id: A string identifier for the specific source (e.g., user ID, model name of the LLM judge, script name).
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Example: Human source
human_source = AssessmentSource(
    source_type=AssessmentSourceType.HUMAN,
    source_id="reviewer_alice@example.com"
)

# Example: LLM Judge source
llm_judge_source = AssessmentSource(
    source_type=AssessmentSourceType.LLM_JUDGE,
    source_id="gpt-4o-mini-evaluator"
)

# Example: Code-based scorer source
code_source = AssessmentSource(
    source_type=AssessmentSourceType.CODE,
    source_id="custom_metrics/flesch_kincaid_scorer.py"
)

Feedback

Feedback captures judgments on the quality or characteristics of a trace or span output.

Key Fields:

Parameter Data Type Description
name str The name of the assessment. If not provided, the default name "feedback" is used.
value Optional[FeedbackValueType] The feedback value. Can be a float, int, str, bool, list of these types, or a dict with string keys and values of these types.
error Optional[Union[Exception, AssessmentError]] An optional error associated with the feedback. This is used to indicate that the feedback is not valid or cannot be processed. Accepts an exception object or an AssessmentError object.
rationale Optional[str] The rationale / justification for the feedback.
source Optional[AssessmentSource] The source of the assessment. If not provided, the default source is CODE.
trace_id Optional[str] The ID of the trace associated with the assessment. If unset, the assessment is not associated with any trace yet.
metadata Optional[dict[str, str]] The metadata associated with the assessment.
span_id Optional[str] The ID of the span associated with the assessment, if the assessment should be associated with a particular span in the trace.
create_time_ms Optional[int] The creation time of the assessment in milliseconds. If unset, the current time is used.
last_update_time_ms Optional[int] The last update time of the assessment in milliseconds. If unset, the current time is used.

Example:

import mlflow
from mlflow.entities import Feedback, AssessmentSource, AssessmentSourceType

# Log simple binary feedback
mlflow.log_feedback(
    trace_id="trace_123",
    name="is_correct",
    value=True,
    source=AssessmentSource(source_type=AssessmentSourceType.HUMAN, source_id="user_bob"),
    rationale="The answer provided was factually accurate."
)

# Log a numeric score from an LLM judge
llm_judge_feedback = Feedback(
    name="relevance_score",
    value=0.85,
    source=AssessmentSource(source_type=AssessmentSourceType.LLM_JUDGE, source_id="claude-3-sonnet"),
    rationale="The response directly addressed the user's core question.",
    metadata={"judge_prompt_version": "v1.2"}
)
# Assuming trace_id is known, you can also use log_assessment
# mlflow.log_assessment(trace_id="trace_456", assessment=llm_judge_feedback)

Expectation

Expectations define the ground truth or target output for an operation.

Key Fields:

Parameter Data Type Description
name str The name of the assessment.
value Any The expected value of the operation. This can be any JSON-serializable value.
source Optional[AssessmentSource] The source of the assessment. If not provided, the default source is HUMAN. (See Assessment Source for more details).
trace_id Optional[str] The ID of the trace associated with the assessment. If unset, the assessment is not associated with any trace yet.
metadata Optional[dict[str, str]] The metadata associated with the assessment.
span_id Optional[str] The ID of the span associated with the assessment, if the assessment should be associated with a particular span in the trace.
create_time_ms Optional[int] The creation time of the assessment in milliseconds. If unset, the current time is used.
last_update_time_ms Optional[int] The last update time of the assessment in milliseconds. If unset, the current time is used.

Example:

import mlflow
from mlflow.entities import Expectation, AssessmentSource, AssessmentSourceType

# Log a ground truth answer
mlflow.log_expectation(
    trace_id="trace_789",
    name="ground_truth_response",
    value="The Battle of Hastings was in 1066.",
    source=AssessmentSource(source_type=AssessmentSourceType.HUMAN, source_id="history_expert_01")
)

# Log an expected structured output for a tool call
expected_tool_output = Expectation(
    name="expected_tool_call_result",
    value={"result": {"status": "success", "data": "item_abc_123"}},
    metadata={"tool_name": "inventory_check"}
)
# Assuming trace_id is known:
# mlflow.log_assessment(trace_id="trace_101", assessment=expected_tool_output)

Assessment Error

Used to log errors that occurred during the generation or computation of feedback or an expectation (e.g., an LLM judge failing).

Key Fields:

  • error_code (str): A code for the error (e.g., "RATE_LIMIT_EXCEEDED", "JUDGE_ERROR").
  • error_message (Optional[str]): Detailed error message.
  • stack_trace (Optional[str]): Stack trace, if available.

Example:

import mlflow
from mlflow.entities import AssessmentError, Feedback, AssessmentSource, AssessmentSourceType

judge_error = AssessmentError(
    error_code="LLM_JUDGE_TIMEOUT",
    error_message="The LLM judge timed out after 30 seconds while assessing relevance."
)

mlflow.log_feedback(
    trace_id="trace_error_example",
    name="relevance_with_judge_v2",
    source=AssessmentSource(source_type=AssessmentSourceType.LLM_JUDGE, source_id="custom_judge_model"),
    error=judge_error
    # Note: `value` is typically None when an error is provided
)

These entities provide a flexible yet structured way to associate rich qualitative and quantitative data with your traces, forming a crucial part of the observability and evaluation capabilities within MLflow Tracing.

Tags

The tags property in MLflow's TraceInfo object is used to provide additional context for the trace. These tags can be used for searching, filtering, or providing additional information about the trace.

The tags are key-value pairs, and they are mutable. This means that you can add, modify, or remove tags at any time, even after the trace has been logged to an experiment.

To learn how to add custom tags to capture custom metadata, see Attach custom tags / metadata.

Standard tags

MLflow uses a set of standard tags for common contextual information about users, sessions and the environment, which enable enhanced filtering and grouping capabilities within the MLflow UI and SDK:

  • mlflow.trace.session: Standard tag for session ID, introduced in Track Users & Sessions.
  • mlflow.trace.user: Standard tag for user ID, introduced in Track Users & Sessions.
  • mlflow.source.name: The entry point or script that generated the trace.
  • mlflow.source.git.commit: If run from a Git repository, the commit hash of the source code.
  • mlflow.source.type: The type of source that generated the trace, commonly PROJECT (for MLflow Project runs) or NOTEBOOK (if run from a notebook).

You can learn more about how to implement these in the guides for tracking users & sessions and tracking environments & context.
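
As a quick illustration, the sketch below sets a tag on an in-progress trace and then mutates tags on an already-logged trace ("trace_123" is a hypothetical ID):

import mlflow

# Tag the trace that is currently being recorded
with mlflow.start_span(name="handle_request"):
    mlflow.update_current_trace(tags={"environment": "staging"})

# Tags are mutable, so they can also be changed after logging
client = mlflow.MlflowClient()
client.set_trace_tag("trace_123", "environment", "production")
client.delete_trace_tag("trace_123", "environment")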

2. Trace Data

The MLflow TraceData object, accessible via Trace.data, holds the core payload of the trace. It primarily contains the sequence of operations (spans) that occurred, along with the initial request that triggered the trace and the final response produced.

Key Fields:

  • spans (List[Span]):

    • This is a list of Span objects (conforming to mlflow.entities.Span and OpenTelemetry specifications) that represent the individual steps, operations, or function calls within the trace. Each span details a specific unit of work.
    • Spans are organized hierarchically via parent_id to represent the execution flow.
    • See the Span Schema section below for a detailed breakdown of a Span object.

Note

The request and response properties are preserved for backward compatibility. Their values are derived from the inputs and outputs attributes of the root span, respectively, and are not set directly by the user on the TraceData object.

  • request (str):

    • A JSON-serialized string representing the input data for the root span of the trace. This is typically the end-user's request or the initial parameters that invoked the traced application or workflow.
    • Example: '{"query": "What is MLflow Tracing?", "user_id": "user123"}'
  • response (str):

    • A JSON-serialized string representing the final output data from the root span of the traced application or workflow.
    • Example: '{"answer": "MLflow Tracing provides observability...", "confidence": 0.95}'

Conceptual Representation:

While you typically interact with TraceData through an mlflow.entities.Trace object retrieved via the client (e.g., client.get_trace(trace_id).data), conceptually it bundles these core components:

# Conceptual structure (not direct instantiation like this)
class TraceData:
    def __init__(self, spans: list[Span], request: str, response: str):
        self.spans = spans # List of Span objects
        self.request = request # JSON string: Overall input to the trace
        self.response = response # JSON string: Overall output of the trace

Understanding TraceData is key to programmatically analyzing the detailed execution path and the data transformations that occur throughout your GenAI application's lifecycle.
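
For instance, a short, hedged sketch of that programmatic access ("trace_123" is a hypothetical ID):

import mlflow

trace = mlflow.get_trace("trace_123")

print(trace.data.request)   # JSON string: overall input to the trace
print(trace.data.response)  # JSON string: overall output of the trace

# Walk the spans; parent_id links each span to its parent
for span in trace.data.spans:
    print(span.name, span.span_type, span.parent_id)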

Spans

The Span object within MLflow's tracing feature provides detailed information about the individual steps of the trace.

It complies with the OpenTelemetry Span spec. Each Span object contains information about the step being instrumented, including the span_id, name, start_time, parent_id, status, inputs, outputs, attributes, and events.
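
Each of these fields is exposed as a property on the retrieved span object. A brief sketch, reusing a hypothetical trace ID as above (the _ns suffixes follow the mlflow.entities.Span property names, which report timestamps in nanoseconds):

import mlflow

trace = mlflow.get_trace("trace_123")
span = trace.data.spans[0]

print(span.span_id, span.name, span.parent_id)
print(span.start_time_ns, span.end_time_ns)  # timestamps in nanoseconds
print(span.status)
print(span.inputs, span.outputs)
print(span.attributes, span.events)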

Span Architecture

Span Schema

Spans are the core of the trace data. They record critical data about each step within your GenAI application.

When you view your traces within the MLflow UI, you're looking at a collection of spans, as shown below.

Spans within the MLflow UI

The sections below provide a detailed view of the structure of a span.

Span Types

Span types are a way to categorize spans within a trace. By default, the span type is set to "UNKNOWN" when using the trace decorator. MLflow provides a set of predefined span types for common use cases, while also allowing you to set custom span types.

The following span types are available. Additionally, you can set the span type to any developer-specified str value.

Span Type Description
"CHAT_MODEL" Represents a query to a chat model. This is a special case of an LLM interaction.
"CHAIN" Represents a chain of operations.
"AGENT" Represents an autonomous agent operation.
"TOOL" Represents a tool execution (typically by an agent), such as querying a search engine.
"EMBEDDING" Represents a text embedding operation.
"RETRIEVER" Represents a context retrieval operation, such as querying a vector database.
"PARSER" Represents a parsing operation, transforming text into a structured format.
"RERANKER" Represents a re-ranking operation, ordering the retrieved contexts based on relevance.
"UNKNOWN" A default span type that is used when no other span type is specified.

To set a span type, you can pass the span_type parameter to the mlflow.trace decorator or mlflow.start_span context manager. When you are using automatic tracing, the span type is automatically set by MLflow.

import mlflow
from mlflow.entities import SpanType


# Using a built-in span type
@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_documents(query: str):
    ...


# Setting a custom span type
x, y = 1, 2
with mlflow.start_span(name="add", span_type="MATH") as span:
    span.set_inputs({"x": x, "y": y})
    z = x + y
    span.set_outputs({"z": z})

    print(span.span_type)
    # Output: MATH

Schema for specific span types

MLflow has a set of predefined types of spans (see mlflow.entities.SpanType), and certain span types have properties that are required in order to enable additional functionality within the UI and downstream tasks such as evaluation.

Retriever Spans

The RETRIEVER span type is used for operations involving retrieving data from a data store (for example, querying documents from a vector store). The output of a RETRIEVER span is expected to be a list of documents.

Each document in the list should be a dictionary (or an object that can be serialized to a dictionary) and ideally includes the following keys:

  • page_content (str): The text content of the retrieved document chunk.
  • metadata (Optional[Dict[str, Any]]): A dictionary of additional metadata associated with the document.
    • MLflow UI and evaluation metrics may specifically look for doc_uri (a string URI for the document source) and chunk_id (a string identifier if the document is part of a larger chunked document) within this metadata for enhanced display and functionality.
  • id (Optional[str]): An optional unique identifier for the document chunk itself.

Example of a Retriever Span in action:

import mlflow
from mlflow.entities import SpanType, Document


def search_store(query: str) -> list[tuple[str, str]]:
    # Simulate retrieving documents (e.g., from a vector database)
    return [
        ("MLflow Tracing helps debug GenAI applications...", "docs/mlflow/tracing_intro.md"),
        ("Key components of a trace include spans...", "docs/mlflow/tracing_datamodel.md"),
        ("MLflow provides automatic instrumentation...", "docs/mlflow/auto_trace.md")
    ]


@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_relevant_documents(query: str):
    # Get documents from the search store
    docs = search_store(query)

    # Get the current active span (created by @mlflow.trace)
    span = mlflow.get_current_active_span()

    # Set the outputs of the span in accordance with the tracing schema
    outputs = [Document(page_content=doc, metadata={"doc_uri": uri}) for doc, uri in docs]
    span.set_outputs(outputs)

    # Return the original format for downstream usage
    return docs


# Example usage
user_query = "MLflow Tracing benefits"
retrieved_docs = retrieve_relevant_documents(user_query)

# Read path: Reconstructing the document list from the span outputs
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
span = trace.search_spans(name="retrieve_relevant_documents")[0]
documents = [Document(**doc) for doc in span.outputs]

print(documents)

Conforming to this structure, especially including page_content and relevant metadata like doc_uri, will ensure that RETRIEVER spans are rendered informatively in the MLflow UI (e.g., displaying document content and providing links) and that downstream evaluation tasks can correctly process the retrieved context.

Chat Completion & Tool Call Spans

Spans of type CHAT_MODEL or LLM are used to represent interactions with a chat completions API (for example, OpenAI's chat completions, or Anthropic's messages API). These spans can also capture information about tools (functions) made available to or used by the model.

As providers can have different schemas for their API, there are no strict restrictions on the format of the span's inputs and outputs for the raw LLM call itself. However, to enable rich UI features (like conversation display and tool call visualization) and to standardize data for evaluation, MLflow defines specific attributes for chat messages and tool definitions.

Please refer to the example below for a quick demonstration of how to use the set_span_chat_messages and set_span_chat_tools utility functions, as well as how to retrieve the logged attributes using the span.get_attribute() method:

import mlflow
from mlflow.entities import SpanType
from mlflow.tracing.constant import SpanAttributeKey
from mlflow.tracing import set_span_chat_messages, set_span_chat_tools

# example messages and tools
messages = [
    {
        "role": "system",
        "content": "please use the provided tool to answer the user's questions",
    },
    {"role": "user", "content": "what is 1 + 1?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    }
]


@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def call_chat_model(messages, tools):
    # mocking a response
    response = {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "123",
                "function": {"arguments": '{"a": 1,"b": 2}', "name": "add"},
                "type": "function",
            }
        ],
    }

    combined_messages = messages + [response]

    span = mlflow.get_current_active_span()
    set_span_chat_messages(span, combined_messages)
    set_span_chat_tools(span, tools)

    return response


call_chat_model(messages, tools)

trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
span = trace.data.spans[0]

print("Messages: ", span.get_attribute(SpanAttributeKey.CHAT_MESSAGES))
print("Tools: ", span.get_attribute(SpanAttributeKey.CHAT_TOOLS))
