This guide demonstrates how to track versions of your GenAI application when your app's code resides in Git or a similar version control system. In this workflow, an MLflow `LoggedModel` acts as a metadata hub, linking each conceptual application version to its specific external code (e.g., a Git commit) and configurations. This `LoggedModel` can then be associated with MLflow entities like traces and evaluation runs.
The `mlflow.set_active_model(name=...)` function is key to version tracking: calling it links your application's traces to a `LoggedModel`. If the `name` does not exist, a new `LoggedModel` is automatically created.
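For example, a minimal sketch of the pattern (the model name here is illustrative):

import mlflow

# Any traces generated after this call are linked to the LoggedModel named
# "customer_support_agent-v1" (created automatically if it doesn't exist yet)
mlflow.set_active_model(name="customer_support_agent-v1")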
What you'll learn:
- Track versions of your application using `LoggedModels`
- Link evaluation runs to your `LoggedModel`
Tip
We suggest using `LoggedModels` alongside MLflow's prompt registry. If you use the prompt registry, each prompt's version is automatically associated with your `LoggedModel`. See track prompt versions alongside application versions.
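As a rough sketch of the combined workflow (this assumes the MLflow prompt registry API, `mlflow.genai.register_prompt` and `mlflow.genai.load_prompt`; the prompt name and template are illustrative):

import mlflow

# Register a prompt version in the prompt registry (illustrative name/template)
mlflow.genai.register_prompt(
    name="support_greeting",
    template="You are a helpful assistant. Answer: {{question}}",
)

# Loading the prompt inside your app links its version to the active LoggedModel
prompt = mlflow.genai.load_prompt("prompts:/support_greeting/1")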
Prerequisites
Install MLflow and required packages
pip install --upgrade "mlflow[databricks]>=3.1.0" openai
Create an MLflow experiment by following the setup your environment quickstart.
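If you're working outside a Databricks notebook, a minimal environment setup typically looks like this (the experiment path is illustrative, and it assumes your Databricks credentials are already configured):

import mlflow

# Point MLflow at your Databricks workspace and an experiment of your choice
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/genai-app-version-tracking")  # illustrative path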
Step 1: Create a sample application
Below is a simple application that prompts an LLM for a response.
import mlflow
from openai import OpenAI

# Enable MLflow's autologging to instrument your application with Tracing
mlflow.openai.autolog()

# Connect to a Databricks LLM via OpenAI using the same credentials as MLflow
# Alternatively, you can use your own OpenAI credentials here
mlflow_creds = mlflow.utils.databricks_utils.get_databricks_host_creds()
client = OpenAI(
    api_key=mlflow_creds.token,
    base_url=f"{mlflow_creds.host}/serving-endpoints"
)

# Use the trace decorator to capture the application's entry point
@mlflow.trace
def my_app(input: str):
    # This call is automatically instrumented by `mlflow.openai.autolog()`
    response = client.chat.completions.create(
        # This example uses a Databricks-hosted LLM. You can replace this with
        # any AI Gateway or Model Serving endpoint. If you provide your own
        # OpenAI credentials, replace it with a valid OpenAI model, e.g., gpt-4o.
        model="databricks-claude-sonnet-4",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant.",
            },
            {
                "role": "user",
                "content": input,
            },
        ],
    )
    return response.choices[0].message.content

result = my_app(input="What is MLflow?")
print(result)
Step 2: Add version tracking to your app's code
A `LoggedModel` version serves as a central record (metadata hub) for a specific version of your application. It doesn't need to store the application code itself; instead, it points to where your code is managed (e.g., a Git commit hash).
We use `mlflow.set_active_model()` to declare which `LoggedModel` we are currently working with, or to create a new one if it doesn't already exist. This function returns an `ActiveModel` object containing the `model_id`, which is useful for subsequent operations.
Tip
In production, you can set the environment variable `MLFLOW_ACTIVE_MODEL_ID` instead of calling `set_active_model()`. See the version tracking in production guide for more details.
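For example, one way to set it from Python before your application code runs (the value is a placeholder for a real LoggedModel ID):

import os

# Placeholder ID; MLflow links traces to this LoggedModel instead of requiring
# a set_active_model() call in your application code
os.environ["MLFLOW_ACTIVE_MODEL_ID"] = "<your-logged-model-id>"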
Note
The code below uses the current Git commit hash as the model's name, so your model version only increments when you commit. If you want to create a new `LoggedModel` for every change in your code base, even changes not yet committed to Git, see the appendix for a helper function that computes a unique identifier for any change.
Insert the following code at the top of your application from step 1. In your application, you must call `set_active_model()` BEFORE you execute your app's code.
# Keep original imports

### NEW CODE

import subprocess

# Define your application and its version identifier
app_name = "customer_support_agent"

# Get current git commit hash for versioning
try:
    git_commit = (
        subprocess.check_output(["git", "rev-parse", "HEAD"])
        .decode("ascii")
        .strip()[:8]
    )
    version_identifier = f"git-{git_commit}"
except subprocess.CalledProcessError:
    version_identifier = "local-dev"  # Fallback if not in a git repo

logged_model_name = f"{app_name}-{version_identifier}"

# Set the active model context
active_model_info = mlflow.set_active_model(name=logged_model_name)
print(
    f"Active LoggedModel: '{active_model_info.name}', Model ID: '{active_model_info.model_id}'"
)

### END NEW CODE

### ORIGINAL CODE BELOW
### ...
Step 3: (Optional) Record parameters
Optionally, you can log key configuration parameters that define this version of your application directly to the `LoggedModel` using `mlflow.log_model_params()`. This is useful for recording things like LLM names, temperature settings, or retrieval strategies that are tied to this code version.
Add the following code below the code from step 2:
app_params = {
    "llm": "gpt-4o-mini",
    "temperature": 0.7,
    "retrieval_strategy": "vector_search_v3",
}

# Log params
mlflow.log_model_params(model_id=active_model_info.model_id, params=app_params)
Step 4: Run the application
- Now, let's call the application to see how the `LoggedModel` is created and tracked.
# These 2 invocations will be linked to the same LoggedModel
result = my_app(input="What is MLflow?")
print(result)
result = my_app(input="What is Databricks?")
print(result)
- To simulate a change without committing, add the following lines to manually create a new `LoggedModel`.
# Set the active model context
active_model_info = mlflow.set_active_model(name="new-name-set-manually")
print(
    f"Active LoggedModel: '{active_model_info.name}', Model ID: '{active_model_info.model_id}'"
)

app_params = {
    "llm": "gpt-4o",
    "temperature": 0.7,
    "retrieval_strategy": "vector_search_v4",
}

# Log params
mlflow.log_model_params(model_id=active_model_info.model_id, params=app_params)
# This will create a new LoggedModel
result = my_app(input="What is GenAI?")
print(result)
Step 5: View traces linked to the LoggedModel
Via UI
Now, go to the MLflow Experiment UI. In the traces tab, you can see the version of the app that generated each trace (note that the first trace will not have a version attached, since we called the app without calling `set_active_model()` first). In the versions tab, you can see each `LoggedModel` alongside its parameters and linked traces.
Via SDK
You can use `search_traces()` to query for traces from a `LoggedModel`:
import mlflow

traces = mlflow.search_traces(
    filter_string=f"metadata.`mlflow.modelId` = '{active_model_info.model_id}'"
)
print(traces)
You can use `get_logged_model()` to get details of the `LoggedModel`:
import mlflow
import datetime

# Get LoggedModel metadata
logged_model = mlflow.get_logged_model(model_id=active_model_info.model_id)

# Inspect basic properties
print("\n=== LoggedModel Information ===")
print(f"Model ID: {logged_model.model_id}")
print(f"Name: {logged_model.name}")
print(f"Experiment ID: {logged_model.experiment_id}")
print(f"Status: {logged_model.status}")
print(f"Model Type: {logged_model.model_type}")
creation_time = datetime.datetime.fromtimestamp(logged_model.creation_timestamp / 1000)
print(f"Created at: {creation_time}")

# Access the parameters
print("\n=== Model Parameters ===")
for param_name, param_value in logged_model.params.items():
    print(f"{param_name}: {param_value}")

# Access tags if any were set
if logged_model.tags:
    print("\n=== Model Tags ===")
    for tag_key, tag_value in logged_model.tags.items():
        print(f"{tag_key}: {tag_value}")
Step 6: Link evaluation results to the LoggedModel
To evaluate your application and link the results to this `LoggedModel` version, see Link Evaluation Results and Traces to App Versions. That guide covers how to use `mlflow.genai.evaluate()` to assess your application's performance and automatically associate the metrics, evaluation tables, and traces with your specific `LoggedModel` version.
import mlflow
from mlflow.genai import scorers

eval_dataset = [
    {
        "inputs": {"input": "What is the most common aggregate function in SQL?"},
    }
]

mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=my_app,
    model_id=active_model_info.model_id,
    scorers=scorers.get_all_scorers(),
)
View the results in the versions and evaluations tabs in the MLflow Experiment UI.
Appendix: Helper function to compute a unique hash for any file change
The helper function below automatically generates a name for each `LoggedModel` based on the state of your repo. To use this function, call `set_active_model(name=get_current_git_hash())`.
`get_current_git_hash()` generates a unique, deterministic identifier for the current state of a Git repository by returning either the HEAD commit hash (for clean repos) or a combination of the HEAD hash and a hash of the uncommitted changes (for dirty repos). It ensures that different states of the repository always produce different identifiers, so every code change results in a new `LoggedModel`.
import subprocess
import hashlib
import os

def get_current_git_hash():
    """
    Get a deterministic hash representing the current git state.

    For clean repositories, returns the HEAD commit hash.
    For dirty repositories, returns a combination of HEAD + hash of changes.
    """
    try:
        # Get the git repository root
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            capture_output=True, text=True, check=True
        )
        git_root = result.stdout.strip()

        # Get the current HEAD commit hash
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        )
        head_hash = result.stdout.strip()

        # Check if repository is dirty
        result = subprocess.run(
            ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
        )
        if not result.stdout.strip():
            # Repository is clean, return HEAD hash
            return head_hash

        # Repository is dirty, create deterministic hash of changes
        # Collect all types of changes
        changes_parts = []

        # 1. Get staged changes
        result = subprocess.run(
            ["git", "diff", "--cached"], capture_output=True, text=True, check=True
        )
        if result.stdout:
            changes_parts.append(("STAGED", result.stdout))

        # 2. Get unstaged changes to tracked files
        result = subprocess.run(
            ["git", "diff"], capture_output=True, text=True, check=True
        )
        if result.stdout:
            changes_parts.append(("UNSTAGED", result.stdout))

        # 3. Get all untracked/modified files from status
        result = subprocess.run(
            ["git", "status", "--porcelain", "-uall"],
            capture_output=True, text=True, check=True
        )

        # Parse status output to handle all file states
        status_lines = result.stdout.strip().split('\n') if result.stdout.strip() else []

        file_contents = []
        for line in status_lines:
            if len(line) >= 3:
                status_code = line[:2]
                filepath = line[3:]  # Don't strip - filepath starts exactly at position 3

                # For any modified or untracked file, include its current content
                if '?' in status_code or 'M' in status_code or 'A' in status_code:
                    try:
                        # Use absolute path relative to git root
                        abs_filepath = os.path.join(git_root, filepath)
                        with open(abs_filepath, 'rb') as f:
                            # Read as binary to avoid encoding issues
                            content = f.read()
                        # Create a hash of the file content
                        file_hash = hashlib.sha256(content).hexdigest()
                        file_contents.append(f"{filepath}:{file_hash}")
                    except (IOError, OSError):
                        file_contents.append(f"{filepath}:unreadable")

        # Sort file contents for deterministic ordering
        file_contents.sort()

        # Combine all changes
        all_changes_parts = []

        # Add diff outputs
        for change_type, content in changes_parts:
            all_changes_parts.append(f"{change_type}:\n{content}")

        # Add file content hashes
        if file_contents:
            all_changes_parts.append("FILES:\n" + "\n".join(file_contents))

        # Create final hash
        all_changes = "\n".join(all_changes_parts)
        content_to_hash = f"{head_hash}\n{all_changes}"
        changes_hash = hashlib.sha256(content_to_hash.encode()).hexdigest()

        # Return HEAD hash + first 8 chars of changes hash
        return f"{head_hash[:32]}-dirty-{changes_hash[:8]}"

    except subprocess.CalledProcessError as e:
        raise RuntimeError(f"Git command failed: {e}")
    except FileNotFoundError:
        raise RuntimeError("Git is not installed or not in PATH")
Next Steps
- Optionally Package Code: For scenarios where you need to bundle code with the `LoggedModel`.