This document provides a detailed overview of the MLflow Trace data model. Understanding this model is key to leveraging MLflow Tracing for observability and analysis of your Generative AI applications.
MLflow Traces are designed to be compatible with OpenTelemetry specifications, a widely adopted industry standard for observability. This ensures interoperability and allows MLflow Traces to be exported and used with other OpenTelemetry-compatible systems. MLflow enhances the basic OpenTelemetry Span model by defining specific structures and attributes for Generative AI use cases, providing richer context and deeper insight into quality and performance.
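For example, here is a minimal sketch of routing traces to an OpenTelemetry Collector. It assumes the OTLP exporter package is installed and a collector is listening at the endpoint below, which is hypothetical; adjust it to your deployment:

```python
import os

# Hypothetical collector endpoint. When this variable is set before the
# application starts, MLflow exports spans via OTLP to the collector.
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] = "http://localhost:4317"

import mlflow


@mlflow.trace
def answer(question: str) -> str:
    return f"echo: {question}"


answer("What is MLflow Tracing?")  # the resulting span is sent to the collector
```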
Structure of Traces
At a high level, an MLflow Trace is composed of two primary objects:
- TraceInfo:
- Metadata that helps explain the origin of the trace, the status of the trace, information about the total execution time, etc.
- Tags that provide additional context for the trace, such as the user, session, and developer-provided key:value pairs. Tags can be used for searching or filtering traces.
- Assessments that let you add structured feedback labels from humans or LLM judges, or ground truth information, to a trace or specific spans within a trace.
- TraceData:
- The actual payload, which contains the instrumented Span objects that capture your application's step-by-step execution from input to output.
Tip
Check the API documentation for helper methods on these dataclass objects for more information on how to convert or extract data from them.
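For instance, a `Trace` object can be round-tripped through JSON for storage or offline analysis. A minimal sketch, assuming a previously logged trace (the trace ID is hypothetical):

```python
import mlflow
from mlflow.entities import Trace

trace = mlflow.get_trace("<trace_id>")  # hypothetical trace ID

# Serialize the full trace (TraceInfo + TraceData) to a JSON string ...
trace_json = trace.to_json()

# ... and rebuild the dataclass later, e.g., in an offline analysis job.
restored = Trace.from_json(trace_json)
print(restored.info.trace_id)
```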
1. Trace Info
The `TraceInfo` object within MLflow's tracing feature provides a lightweight snapshot of critical data about the overall trace. `TraceInfo` is a dataclass object that contains metadata about the trace.
This metadata includes information about the trace's origin, status, and various other data that aids in retrieving and filtering traces when used with `MlflowClient.search_traces`, and in navigating traces within the MLflow UI. To learn more about how `TraceInfo` metadata is used for searching, you can see examples here and the short sketch below.
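As a hedged sketch, you can filter traces by state via the fluent `mlflow.search_traces` API (the experiment ID is hypothetical; see the search guide for the full filter grammar):

```python
import mlflow

traces = mlflow.search_traces(
    experiment_ids=["1"],  # hypothetical experiment ID
    filter_string="attributes.status = 'OK'",  # keep only successful traces
    max_results=10,
)
```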
Parameter | Data Type | Description |
---|---|---|
`trace_id` | `str` | The primary identifier for the trace. |
`trace_location` | `TraceLocation` | The ___location where the trace is stored, represented as a `mlflow.entities.TraceLocation` object. MLflow currently supports an MLflow Experiment or a Databricks Inference Table as a trace ___location. |
`request_time` | `int` | Start time of the trace, in milliseconds. |
`state` | `TraceState` | State of the trace, represented as a `mlflow.entities.TraceState` enum. Can be one of `OK`, `ERROR`, `IN_PROGRESS`, or `STATE_UNSPECIFIED`. |
`request_preview` | `Optional[str]` | Request to the model/agent, equivalent to the input of the root span but JSON-encoded and possibly truncated. |
`response_preview` | `Optional[str]` | Response from the model/agent, equivalent to the output of the root span but JSON-encoded and possibly truncated. |
`client_request_id` | `Optional[str]` | Client-supplied request ID associated with the trace. This can be used to identify the trace/request from an external system that produced the trace, e.g., a session ID in a web application. |
`execution_duration` | `Optional[int]` | Duration of the trace, in milliseconds. |
`trace_metadata` | `dict[str, str]` | Key-value pairs associated with the trace. They are designed for immutable values, such as the run ID associated with the trace. |
`tags` | `dict[str, str]` | Tags associated with the trace. They are designed for mutable values that can be updated after the trace is created via the MLflow UI or API. |
`assessments` | `list[Assessment]` | List of assessments associated with the trace. |
The data contained in the `TraceInfo` object is used to populate the trace view page within the MLflow tracking UI. The primary components of `TraceInfo` objects are described below.
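As a quick sketch, you can fetch a trace and read these fields programmatically (the trace ID is hypothetical):

```python
import mlflow

trace = mlflow.get_trace("<trace_id>")  # hypothetical trace ID
info = trace.info

print(info.trace_id)
print(info.state)               # e.g., TraceState.OK
print(info.execution_duration)  # milliseconds; may be None while in progress
print(info.tags)                # dict[str, str]
```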
Assessments
Assessments are crucial for evaluating the quality and correctness of your GenAI application's behavior as captured in traces. They allow you to attach structured labels, scores, or ground truth information to a trace or specific spans within a trace.
MLflow defines two main types of assessments, both inheriting from a base Assessment
concept:
- Feedback: Represents qualitative or quantitative judgments about an operation's output. This can come from human reviewers, LLM-as-a-judge, or custom scoring functions.
- Expectations: Represents the ground truth or expected outcome for a given operation, often used for direct comparison against actual outputs.
Assessments are typically logged to a trace using functions like mlflow.log_feedback()
, mlflow.log_expectation()
, or the more general mlflow.log_assessment()
.
Assessment Source
Every assessment is associated with a source to track its origin.
- `source_type`: An `mlflow.entities.AssessmentSourceType` enum. Key values include:
  - `HUMAN`: Feedback or expectation provided by a human.
  - `LLM_JUDGE`: Assessment generated by an LLM acting as a judge.
  - `CODE`: Assessment generated by a programmatic rule, heuristic, or custom scorer.
- `source_id`: A string identifier for the specific source (e.g., user ID, model name of the LLM judge, script name).
from mlflow.entities import AssessmentSource, AssessmentSourceType
# Example: Human source
human_source = AssessmentSource(
source_type=AssessmentSourceType.HUMAN,
source_id="reviewer_alice@example.com"
)
# Example: LLM Judge source
llm_judge_source = AssessmentSource(
source_type=AssessmentSourceType.LLM_JUDGE,
source_id="gpt-4o-mini-evaluator"
)
# Example: Code-based scorer source
code_source = AssessmentSource(
source_type=AssessmentSourceType.CODE,
source_id="custom_metrics/flesch_kincaid_scorer.py"
)
Feedback
Feedback captures judgments on the quality or characteristics of a trace or span output.
Key Fields:
Parameter | Data Type | Description |
---|---|---|
`name` | `str` | The name of the assessment. If not provided, the default name `"feedback"` is used. |
`value` | `Optional[FeedbackValueType]` | The feedback value. Can be a float, int, str, bool, a list of these types, or a dict with string keys and values of these types. |
`error` | `Optional[Union[Exception, AssessmentError]]` | An optional error associated with the feedback, used to indicate that the feedback is not valid or could not be processed. Accepts an exception object or an `AssessmentError` object. |
`rationale` | `Optional[str]` | The rationale / justification for the feedback. |
`source` | `Optional[AssessmentSource]` | The source of the assessment. If not provided, the default source is `CODE`. |
`trace_id` | `Optional[str]` | The ID of the trace associated with the assessment. If unset, the assessment is not yet associated with any trace. |
`metadata` | `Optional[dict[str, str]]` | The metadata associated with the assessment. |
`span_id` | `Optional[str]` | The ID of the span associated with the assessment, if the assessment should be attached to a particular span in the trace. |
`create_time_ms` | `Optional[int]` | The creation time of the assessment in milliseconds. If unset, the current time is used. |
`last_update_time_ms` | `Optional[int]` | The last update time of the assessment in milliseconds. If unset, the current time is used. |
Example:
import mlflow
from mlflow.entities import Feedback, AssessmentSource, AssessmentSourceType
# Log simple binary feedback
mlflow.log_feedback(
trace_id="trace_123",
name="is_correct",
value=True,
source=AssessmentSource(source_type=AssessmentSourceType.HUMAN, source_id="user_bob"),
rationale="The answer provided was factually accurate."
)
# Log a numeric score from an LLM judge
llm_judge_feedback = Feedback(
name="relevance_score",
value=0.85,
source=AssessmentSource(source_type=AssessmentSourceType.LLM_JUDGE, source_id="claude-3-sonnet"),
rationale="The response directly addressed the user's core question.",
metadata={"judge_prompt_version": "v1.2"}
)
# Assuming trace_id is known, you can also use log_assessment
# mlflow.log_assessment(trace_id="trace_456", assessment=llm_judge_feedback)
Expectation
Expectations define the ground truth or target output for an operation.
Key Fields:
Parameter | Data Type | Description |
---|---|---|
`name` | `str` | The name of the assessment. |
`value` | `Any` | The expected value of the operation. This can be any JSON-serializable value. |
`source` | `Optional[AssessmentSource]` | The source of the assessment. If not provided, the default source is `HUMAN`. (See Assessment Source for more details.) |
`trace_id` | `Optional[str]` | The ID of the trace associated with the assessment. If unset, the assessment is not yet associated with any trace. |
`metadata` | `Optional[dict[str, str]]` | The metadata associated with the assessment. |
`span_id` | `Optional[str]` | The ID of the span associated with the assessment, if the assessment should be attached to a particular span in the trace. |
`create_time_ms` | `Optional[int]` | The creation time of the assessment in milliseconds. If unset, the current time is used. |
`last_update_time_ms` | `Optional[int]` | The last update time of the assessment in milliseconds. If unset, the current time is used. |
Example:
import mlflow
from mlflow.entities import Expectation, AssessmentSource, AssessmentSourceType
# Log a ground truth answer
mlflow.log_expectation(
trace_id="trace_789",
name="ground_truth_response",
value="The Battle of Hastings was in 1066.",
source=AssessmentSource(source_type=AssessmentSourceType.HUMAN, source_id="history_expert_01")
)
# Log an expected structured output for a tool call
expected_tool_output = Expectation(
name="expected_tool_call_result",
value={"result": {"status": "success", "data": "item_abc_123"}},
metadata={"tool_name": "inventory_check"}
)
# Assuming trace_id is known:
# mlflow.log_assessment(trace_id="trace_101", assessment=expected_tool_output)
Assessment Error
Used to log errors that occurred during the generation or computation of feedback or an expectation (e.g., an LLM judge failing).
Key Fields:
- `error_code` (`str`): A code for the error (e.g., `"RATE_LIMIT_EXCEEDED"`, `"JUDGE_ERROR"`).
- `error_message` (`Optional[str]`): Detailed error message.
- `stack_trace` (`Optional[str]`): Stack trace, if available.
Example:
import mlflow
from mlflow.entities import AssessmentError, Feedback, AssessmentSource, AssessmentSourceType
judge_error = AssessmentError(
error_code="LLM_JUDGE_TIMEOUT",
error_message="The LLM judge timed out after 30 seconds while assessing relevance."
)
mlflow.log_feedback(
trace_id="trace_error_example",
name="relevance_with_judge_v2",
source=AssessmentSource(source_type=AssessmentSourceType.LLM_JUDGE, source_id="custom_judge_model"),
error=judge_error
# Note: `value` is typically None when an error is provided
)
These entities provide a flexible yet structured way to associate rich qualitative and quantitative data with your traces, forming a crucial part of the observability and evaluation capabilities within MLflow Tracing.
Tags
The tags
property in MLflow's TraceInfo
object is used to provide additional context for the trace. These tags can be used for searching, filtering, or providing additional information about the trace.
The tags are key-value pairs, and they are mutable. This means that you can add, modify, or remove tags at any time, even after the trace has been logged to an experiment.
To learn how to add custom tags to capture custom metadata, see the Attach custom tags / metadata guide.
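For example, here is a minimal sketch of mutating tags on an already-logged trace using the fluent APIs (the trace ID and tag values are hypothetical):

```python
import mlflow

trace_id = "<trace_id>"  # hypothetical trace ID

# Add or update a tag after the trace has been logged
mlflow.set_trace_tag(trace_id, "environment", "staging")

# Remove a tag that is no longer relevant
mlflow.delete_trace_tag(trace_id, "environment")
```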
Standard tags
MLflow uses a set of standard tags for common contextual information about users, sessions and the environment, which enable enhanced filtering and grouping capabilities within the MLflow UI and SDK:
- `mlflow.trace.session`: Standard tag for session ID, introduced in Track Users & Sessions.
- `mlflow.trace.user`: Standard tag for user ID, introduced in Track Users & Sessions.
- `mlflow.source.name`: The entry point or script that generated the trace.
- `mlflow.source.git.commit`: If run from a Git repository, the commit hash of the source code.
- `mlflow.source.type`: The type of source that generated the trace, commonly `PROJECT` (for MLflow Project runs) or `NOTEBOOK` (if run from a notebook).
You can learn more about how to implement these in the guides for tracking users & sessions and tracking environments & context.
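As a hedged sketch, these standard tags can also be used as search filters; tag keys containing dots are backtick-quoted in the filter grammar, and the user ID below is hypothetical:

```python
import mlflow

# Find traces produced by a particular user
traces = mlflow.search_traces(
    filter_string="tags.`mlflow.trace.user` = 'user-123'",
)
```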
2. Trace Data
The MLflow TraceData
object, accessible via Trace.data
, holds the core payload of the trace. It primarily contains the sequence of operations (spans) that occurred, along with the initial request that triggered the trace and the final response produced.
Key Fields:
- `spans` (`list[Span]`):
  - A list of `Span` objects (conforming to `mlflow.entities.Span` and the OpenTelemetry specification) that represent the individual steps, operations, or function calls within the trace. Each span details a specific unit of work.
  - Spans are organized hierarchically via `parent_id` to represent the execution flow, as the sketch after this list illustrates.
  - See the Span Schema section below for a detailed breakdown of a `Span` object.
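A minimal sketch of how this hierarchy arises: nesting traced functions produces child spans whose `parent_id` points at the caller's span.

```python
import mlflow


@mlflow.trace
def run_pipeline(question: str) -> str:
    return generate_answer(question)  # creates a child span under run_pipeline


@mlflow.trace
def generate_answer(question: str) -> str:
    return f"answer to: {question}"


run_pipeline("What is MLflow Tracing?")

trace = mlflow.get_trace(mlflow.get_last_active_trace_id())
for span in trace.data.spans:
    # the root span's parent_id is None; the child's points at the root span
    print(span.name, "->", span.parent_id)
```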
Note
The `request` and `response` properties are preserved for backward compatibility. Their values are looked up from the `inputs` and `outputs` attributes of the root span, respectively, and are not set directly by the user on the `TraceData` object.
- `request` (`str`):
  - A JSON-serialized string representing the input data for the root span of the trace. This is typically the end-user's request or the initial parameters that invoked the traced application or workflow.
  - Example: `'{"query": "What is MLflow Tracing?", "user_id": "user123"}'`
- `response` (`str`):
  - A JSON-serialized string representing the final output data from the root span of the traced application or workflow.
  - Example: `'{"answer": "MLflow Tracing provides observability...", "confidence": 0.95}'`
Conceptual Representation:
While you typically interact with TraceData
through an mlflow.entities.Trace
object retrieved via the client (e.g., client.get_trace(trace_id).data
), conceptually it bundles these core components:
# Conceptual structure (not direct instantiation like this)
class TraceData:
    def __init__(self, spans: list[Span], request: str, response: str):
        self.spans = spans  # List of Span objects
        self.request = request  # JSON string: overall input to the trace
        self.response = response  # JSON string: overall output of the trace
Understanding TraceData
is key to programmatically analyzing the detailed execution path and the data transformations that occur throughout your GenAI application's lifecycle.
Spans
The Span object within MLflow's tracing feature provides detailed information about the individual steps of the trace.
It complies with the OpenTelemetry Span specification. Each Span object contains information about the step being instrumented, including the span_id, name, start_time, parent_id, status, inputs, outputs, attributes, and events.
Span Schema
Spans are the core of the trace data. They record critical data about each step within your GenAI application.
When you view your traces within the MLflow UI, you're looking at a collection of spans.
The sections below provide a detailed view of the structure of a span.
Span Types
Span types are a way to categorize spans within a trace. By default, the span type is set to `"UNKNOWN"` when using the trace decorator. MLflow provides a set of predefined span types for common use cases, while also allowing you to set custom span types.
The following span types are available. Additionally, you can set the span type to any developer-specified str
value.
Span Type | Description |
---|---|
`"CHAT_MODEL"` | Represents a query to a chat model. This is a special case of an LLM interaction. |
`"CHAIN"` | Represents a chain of operations. |
`"AGENT"` | Represents an autonomous agent operation. |
`"TOOL"` | Represents a tool execution (typically by an agent), such as querying a search engine. |
`"EMBEDDING"` | Represents a text embedding operation. |
`"RETRIEVER"` | Represents a context retrieval operation, such as querying a vector database. |
`"PARSER"` | Represents a parsing operation, transforming text into a structured format. |
`"RERANKER"` | Represents a re-ranking operation, ordering the retrieved contexts based on relevance. |
`"UNKNOWN"` | A default span type that is used when no other span type is specified. |
To set a span type, you can pass the span_type
parameter to the mlflow.trace
decorator or mlflow.start_span
context manager. When you are using automatic tracing, the span type is automatically set by MLflow.
import mlflow
from mlflow.entities import SpanType

# Using a built-in span type
@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_documents(query: str):
    ...

# Setting a custom span type
x, y = 1, 2
with mlflow.start_span(name="add", span_type="MATH") as span:
    span.set_inputs({"x": x, "y": y})
    z = x + y
    span.set_outputs({"z": z})
    print(span.span_type)
# Output: MATH
Schema for specific span types
MLflow has a set of predefined types of spans (see mlflow.entities.SpanType
), and certain span types have properties that are required in order to enable additional functionality within the UI and downstream tasks such as evaluation.
Retriever Spans
The RETRIEVER
span type is used for operations involving retrieving data from a data store (for example, querying documents from a vector store). The output of a RETRIEVER
span is expected to be a list of documents.
Each document in the list should be a dictionary (or an object that can be serialized to a dictionary) and ideally includes the following keys:
- `page_content` (`str`): The text content of the retrieved document chunk.
- `metadata` (`Optional[Dict[str, Any]]`): A dictionary of additional metadata associated with the document. The MLflow UI and evaluation metrics may specifically look for `doc_uri` (a string URI for the document source) and `chunk_id` (a string identifier if the document is part of a larger chunked document) within this metadata for enhanced display and functionality.
- `id` (`Optional[str]`): An optional unique identifier for the document chunk itself.
Example of a Retriever Span in action:
import mlflow
from mlflow.entities import SpanType, Document

def search_store(query: str) -> list[tuple[str, str]]:
    # Simulate retrieving documents (e.g., from a vector database)
    return [
        ("MLflow Tracing helps debug GenAI applications...", "docs/mlflow/tracing_intro.md"),
        ("Key components of a trace include spans...", "docs/mlflow/tracing_datamodel.md"),
        ("MLflow provides automatic instrumentation...", "docs/mlflow/auto_trace.md"),
    ]

@mlflow.trace(span_type=SpanType.RETRIEVER)
def retrieve_relevant_documents(query: str):
    # Get documents from the search store
    docs = search_store(query)
    # Get the current active span (created by @mlflow.trace)
    span = mlflow.get_current_active_span()
    # Set the outputs of the span in accordance with the tracing schema
    outputs = [Document(page_content=doc, metadata={"doc_uri": uri}) for doc, uri in docs]
    span.set_outputs(outputs)
    # Return the original format for downstream usage
    return docs

# Example usage
user_query = "MLflow Tracing benefits"
retrieved_docs = retrieve_relevant_documents(user_query)

# Read path: Reconstructing the document list from the span outputs
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)
span = trace.search_spans(name="retrieve_relevant_documents")[0]
documents = [Document(**doc) for doc in span.outputs]
print(documents)
Conforming to this structure, especially including page_content
and relevant metadata
like doc_uri
, will ensure that RETRIEVER
spans are rendered informatively in the MLflow UI (e.g., displaying document content and providing links) and that downstream evaluation tasks can correctly process the retrieved context.
Chat Completion & Tool Call Spans
Spans of type CHAT_MODEL
or LLM
are used to represent interactions with a chat completions API
(for example, OpenAI's chat completions,
or Anthropic's messages API). These spans can also capture information about tools (functions) made available to or used by the model.
As providers can have different schemas for their API, there are no strict restrictions on the format of the span's inputs and outputs for the raw LLM call itself. However, to enable rich UI features (like conversation display and tool call visualization) and to standardize data for evaluation, MLflow defines specific attributes for chat messages and tool definitions.
Please refer to the example below for a quick demonstration of how to use the `set_span_chat_messages` and `set_span_chat_tools` utility functions, as well as how to retrieve the logged attributes using the `span.get_attribute()` function:
import mlflow
from mlflow.entities import SpanType
from mlflow.tracing.constant import SpanAttributeKey
from mlflow.tracing import set_span_chat_messages, set_span_chat_tools

# example messages and tools
messages = [
    {
        "role": "system",
        "content": "please use the provided tool to answer the user's questions",
    },
    {"role": "user", "content": "what is 1 + 1?"},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                },
                "required": ["a", "b"],
            },
        },
    }
]

@mlflow.trace(span_type=SpanType.CHAT_MODEL)
def call_chat_model(messages, tools):
    # mocking a response
    response = {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "123",
                "function": {"arguments": '{"a": 1,"b": 2}', "name": "add"},
                "type": "function",
            }
        ],
    }
    combined_messages = messages + [response]

    span = mlflow.get_current_active_span()
    set_span_chat_messages(span, combined_messages)
    set_span_chat_tools(span, tools)

    return response

call_chat_model(messages, tools)

trace = mlflow.get_trace(mlflow.get_last_active_trace_id())
span = trace.data.spans[0]

print("Messages: ", span.get_attribute(SpanAttributeKey.CHAT_MESSAGES))
print("Tools: ", span.get_attribute(SpanAttributeKey.CHAT_TOOLS))
Next steps
Continue your journey with these recommended actions and tutorials.
- Instrument your app with tracing - Apply these data model concepts to add tracing to your application
- Query traces via SDK - Use the data model to programmatically analyze traces
- Attach custom tags / metadata - Enrich traces with contextual information
Reference guides
Explore detailed documentation about related concepts.
- Tracing concepts - Understand the higher-level concepts behind the data model
- Logging assessments - Learn how to attach feedback and expectations to traces
- Delete traces - Manage trace lifecycle and cleanup