

Version Tracking API Reference

Overview

MLflow version tracking enables you to create versioned representations of your GenAI applications using the LoggedModel entity. This page provides the API reference for tracking application versions in MLflow.

Why Version Your GenAI Application?

Reproducibility: Capture or link to the exact code (e.g., Git commit hash) and configurations used for a specific version, ensuring you can always reconstruct it.

Debugging Regressions: Compare a problematic LoggedModel version against a known-good one by examining differences in code, configurations, evaluation results, and traces.

Objective Comparison: Systematically evaluate versions using mlflow.genai.evaluate() to compare metrics like quality scores, cost, and latency side-by-side.

Auditability: Each LoggedModel version serves as an auditable record, linking to specific code and configurations for compliance and incident investigation.
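Debugging a regression often starts with the configuration difference between two versions. The sketch below diffs two params dictionaries, such as the params of two LoggedModel objects. The helper name diff_params and the sample values are hypothetical; this is plain Python, not an MLflow API.

```python
from typing import Optional


def diff_params(
    baseline: dict[str, str], candidate: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {key: (baseline_value, candidate_value)} for every key that differs.

    None means the key is missing from that version.
    """
    changed: dict[str, tuple[Optional[str], Optional[str]]] = {}
    # Walk the union of keys in a stable order so the output is deterministic.
    for key in sorted(baseline.keys() | candidate.keys()):
        old, new = baseline.get(key), candidate.get(key)
        if old != new:
            changed[key] = (old, new)
    return changed


known_good = {"llm_model": "gpt-4", "temperature": "0.7", "retrieval_k": "5"}
regressed = {"llm_model": "gpt-4", "temperature": "1.0", "max_tokens": "1000"}
print(diff_params(known_good, regressed))
```

In practice you would pass the params of the known-good and the problematic versions. Keys that match are omitted, so the output lists only what changed.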

Core Concepts

LoggedModel

A LoggedModel in MLflow represents a specific version of your GenAI application. Each distinct state of your application that you want to evaluate, deploy, or refer back to can be captured as a new LoggedModel.

Key characteristics:

  • Central versioned entity for your GenAI application
  • Captures application state including configuration and parameters
  • Links to external code (typically via Git commit hash)
  • Tracks lifecycle from development through production

Version Tracking Methods

MLflow provides two approaches for version tracking:

  1. set_active_model: Lightweight version tracking that creates a LoggedModel if needed and links all subsequent traces to it
  2. create_external_model: Full control over version creation, including metadata such as parameters, tags, and model type

API Reference

set_active_model

Sets the active LoggedModel version so that traces generated afterwards are linked to it. If you pass a name and no model with that name exists, one is created automatically.

def set_active_model(
    name: Optional[str] = None,
    model_id: Optional[str] = None
) -> ActiveModel:

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | Optional[str] | No* | Name of the model. If no model with this name exists, a new one is created |
| model_id | Optional[str] | No* | ID of an existing LoggedModel |

*Either name or model_id must be provided.
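When the identifiers come from configuration, it can help to fail fast before calling MLflow. A minimal sketch, assuming a hypothetical helper active_model_kwargs that only builds the keyword arguments for mlflow.set_active_model and enforces the "either name or model_id" rule:

```python
from typing import Optional


def active_model_kwargs(
    name: Optional[str] = None, model_id: Optional[str] = None
) -> dict[str, str]:
    """Build keyword arguments for mlflow.set_active_model.

    Raises ValueError when neither identifier is supplied.
    """
    if not name and not model_id:
        raise ValueError("Provide either name or model_id")
    kwargs: dict[str, str] = {}
    if name:
        kwargs["name"] = name
    if model_id:
        kwargs["model_id"] = model_id
    return kwargs


# With mlflow installed:
# mlflow.set_active_model(**active_model_kwargs(name="my-agent-v1.0"))
```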

Return Value

Returns an ActiveModel object (subclass of LoggedModel) that can be used as a context manager.

Example Usage

import mlflow

# Simple usage - creates model if it doesn't exist
mlflow.set_active_model(name="my-agent-v1.0")

# Use as context manager
with mlflow.set_active_model(name="my-agent-v2.0") as model:
    print(f"Model ID: {model.model_id}")
    # Traces within this context are linked to this model

# Use with existing model ID
mlflow.set_active_model(model_id="existing-model-id")

create_external_model

Creates a new LoggedModel for applications whose code and artifacts are stored outside MLflow (e.g., in Git).

def create_external_model(
    name: Optional[str] = None,
    source_run_id: Optional[str] = None,
    tags: Optional[dict[str, str]] = None,
    params: Optional[dict[str, str]] = None,
    model_type: Optional[str] = None,
    experiment_id: Optional[str] = None,
) -> LoggedModel:

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| name | Optional[str] | No | Model name. If not specified, a random name is generated |
| source_run_id | Optional[str] | No | ID of the associated run. Defaults to the active run ID if called within a run context |
| tags | Optional[dict[str, str]] | No | Key-value pairs for organization and filtering |
| params | Optional[dict[str, str]] | No | Model parameters and configuration (values must be strings) |
| model_type | Optional[str] | No | User-defined type for categorization (e.g., "agent", "rag-system") |
| experiment_id | Optional[str] | No | Experiment to associate with. Uses the active experiment if not specified |

Return Value

Returns a LoggedModel object with:

  • model_id: Unique identifier for the model
  • name: The assigned model name
  • experiment_id: Associated experiment ID
  • creation_timestamp: When the model was created
  • status: Model status (always "READY" for external models)
  • tags: Dictionary of tags
  • params: Dictionary of parameters

Example Usage

import mlflow

# Basic usage
model = mlflow.create_external_model(
    name="customer-support-agent-v1.0"
)

# With full metadata
model = mlflow.create_external_model(
    name="recommendation-engine-v2.1",
    model_type="rag-agent",
    params={
        "llm_model": "gpt-4",
        "temperature": "0.7",
        "max_tokens": "1000",
        "retrieval_k": "5"
    },
    tags={
        "team": "ml-platform",
        "environment": "staging",
        "git_commit": "abc123def"
    }
)

# Within a run context
with mlflow.start_run() as run:
    model = mlflow.create_external_model(
        name="my-agent-v3.0",
        source_run_id=run.info.run_id
    )

LoggedModel Class

The LoggedModel class represents a versioned model in MLflow.

Properties

| Property | Type | Description |
|---|---|---|
| model_id | str | Unique identifier for the model |
| name | str | Model name |
| experiment_id | str | Associated experiment ID |
| creation_timestamp | int | Creation time (milliseconds since epoch) |
| last_updated_timestamp | int | Last update time (milliseconds since epoch) |
| model_type | Optional[str] | User-defined model type |
| source_run_id | Optional[str] | ID of the run that created this model |
| status | LoggedModelStatus | Model status (e.g., PENDING, READY, FAILED) |
| tags | dict[str, str] | Dictionary of tags |
| params | dict[str, str] | Dictionary of parameters |
| model_uri | str | URI for referencing the model, in the form "models:/<model_id>" |
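The timestamps are integers in milliseconds since the Unix epoch, and model_uri follows the "models:/<model_id>" form. The sketch below converts a timestamp to a timezone-aware datetime and builds a URI from an ID. Both helper names and the ID "my-model-id" are hypothetical; on a real LoggedModel you would normally read model.model_uri directly.

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert a millisecond epoch timestamp to a UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


def model_uri_for(model_id: str) -> str:
    """Build a models:/<model_id> URI from a LoggedModel ID."""
    return f"models:/{model_id}"


print(ms_to_datetime(1_700_000_000_000).isoformat())
print(model_uri_for("my-model-id"))
```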

Common Patterns

Version Tracking with Git Integration

import mlflow
import subprocess

# Get current git commit
git_commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()[:8]

# Create versioned model name
model_name = f"my-app-git-{git_commit}"

# Track the version
model = mlflow.create_external_model(
    name=model_name,
    tags={"git_commit": git_commit}
)

Linking Traces to Versions

import mlflow

# Set active model - all subsequent traces will be linked
mlflow.set_active_model(name="my-agent-v1.0")

# Your application code with tracing
@mlflow.trace
def process_request(query: str):
    # This trace will be automatically linked to my-agent-v1.0
    return f"Processing: {query}"

# Run the application
result = process_request("Hello world")

Production Deployment

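In production, you can set the MLFLOW_ACTIVE_MODEL_ID environment variable instead of calling set_active_model() in code. The value must be a LoggedModel ID (model.model_id), not a model name:

# Link traces to an existing LoggedModel by its ID (placeholder value)
export MLFLOW_ACTIVE_MODEL_ID="<your-model-id>"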
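A service can check at startup that the variable is present. The sketch below only reads the environment; the helper name require_active_model_id is hypothetical, and it does not verify that the ID exists on the tracking server.

```python
import os
from collections.abc import Mapping
from typing import Optional


def require_active_model_id(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MLFLOW_ACTIVE_MODEL_ID, or raise if it is unset or blank."""
    env = os.environ if env is None else env
    model_id = env.get("MLFLOW_ACTIVE_MODEL_ID", "").strip()
    if not model_id:
        raise RuntimeError(
            "MLFLOW_ACTIVE_MODEL_ID is not set; traces may not be linked to a version"
        )
    return model_id
```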

Best Practices

  1. Use semantic versioning in model names (e.g., "app-v1.2.3")
  2. Include git commits in tags for traceability
  3. Pass parameters as strings; convert numbers and booleans first
  4. Use model_type to categorize similar applications
  5. Set active model before tracing to ensure proper linkage
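Practices 1, 2, and 4 can be combined in a small helper that validates a semantic version and assembles the arguments for create_external_model. This is a sketch with a hypothetical helper name. Its MAJOR.MINOR.PATCH regex deliberately ignores pre-release and build suffixes, and the default model_type of "agent" is an arbitrary example.

```python
import re
from typing import Any, Optional

_SEMVER = re.compile(r"\d+\.\d+\.\d+")


def version_metadata(
    app: str,
    version: str,
    git_commit: Optional[str] = None,
    model_type: str = "agent",
) -> dict[str, Any]:
    """Build keyword arguments for mlflow.create_external_model."""
    if not _SEMVER.fullmatch(version):
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {version!r}")
    tags = {"git_commit": git_commit} if git_commit else {}
    return {"name": f"{app}-v{version}", "model_type": model_type, "tags": tags}


# With mlflow installed:
# model = mlflow.create_external_model(**version_metadata("app", "1.2.3", "abc123def"))
```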

Common Issues

Invalid parameter types:

# Error: Parameters must be strings
# Wrong:
params = {"temperature": 0.7, "max_tokens": 1000}

# Correct:
params = {"temperature": "0.7", "max_tokens": "1000"}
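Rather than quoting every value by hand, you can convert a configuration dictionary in one step. The helper name stringify_params is hypothetical. Encoding lists and dicts as JSON is one reasonable choice, because it keeps them parseable later, but it is not something MLflow requires.

```python
import json
from typing import Any


def stringify_params(params: dict[str, Any]) -> dict[str, str]:
    """Convert parameter values to strings for create_external_model."""
    out: dict[str, str] = {}
    for key, value in params.items():
        if isinstance(value, str):
            out[key] = value
        elif isinstance(value, (bool, int, float)):
            # Scalars keep their usual Python text form, e.g. 0.7 -> "0.7".
            out[key] = str(value)
        else:
            # Lists, dicts, and None are JSON-encoded.
            out[key] = json.dumps(value, sort_keys=True)
    return out


params = stringify_params(
    {"temperature": 0.7, "max_tokens": 1000, "stream": True, "stop": ["END"]}
)
print(params)
```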

Next Steps