Track prompt versions and application versions

Important

This feature is in Beta.

This guide shows you how to integrate prompts from the MLflow Prompt Registry into your GenAI application while tracking both prompt versions and application versions. When you use mlflow.set_active_model() together with prompts loaded from the registry, MLflow automatically creates relationships between your prompt versions and your application versions.

What you'll learn:

  • Load and use prompts from the MLflow Prompt Registry in your application
  • Track application versions using LoggedModels
  • View the automatic lineage between prompt versions and application versions
  • Update prompts and see how the changes propagate through your application

Prerequisites

  1. Install MLflow and the required packages:

    pip install --upgrade "mlflow[databricks]>=3.1.0" openai
    
  2. Follow the Set up your environment quickstart to create an MLflow experiment (a minimal sketch follows this list).

  3. CREATE FUNCTION access in a Unity Catalog schema

    • Why? Prompts are stored as functions in Unity Catalog (UC).
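
If you have not created an experiment yet, the following minimal sketch shows one way to do it. The tracking URI value and the experiment path are illustrative assumptions, not fixed names; substitute your own workspace details:

import mlflow

# Point MLflow at your Databricks workspace and pick (or create) an experiment.
# Both values below are placeholders for illustration only.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/prompt-version-demo")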

Step 1: Create a prompt in the registry

First, let's create the prompt that we'll use in our application. You can skip this step if you already created a prompt by following the Create and edit prompts guide.

import mlflow

# Replace with a Unity Catalog schema where you have CREATE FUNCTION permission
uc_schema = "workspace.default"
prompt_name = "customer_support_prompt"

# Define the prompt template with variables
initial_template = """\
You are a helpful customer support assistant for {{company_name}}.

Please help the customer with their inquiry about: {{topic}}

Customer Question: {{question}}

Provide a friendly, professional response that addresses their concern.
"""

# Register a new prompt
prompt = mlflow.genai.register_prompt(
    name=f"{uc_schema}.{prompt_name}",
    template=initial_template,
    commit_message="Initial customer support prompt",
    version_metadata={
        "author": "support-team@company.com",
        "use_case": "customer_service"
    },
    tags={
        "department": "customer_support",
        "language": "en"
    }
)

print(f"Created prompt '{prompt.name}' (version {prompt.version})")

Step 2: Create a version-tracked application that uses the prompt

Now, let's create a GenAI application that loads and uses this prompt from the registry. We'll use mlflow.set_active_model() to track the application version.

When you call mlflow.set_active_model(), MLflow creates a LoggedModel that acts as a metadata hub for this version of your application. The LoggedModel does not store your actual application code; instead, it serves as a central record that links to your external code (such as a Git commit), captures configuration parameters, and automatically tracks which registry prompts your application uses. For a detailed explanation of how application version tracking works, see Track application versions with MLflow.

import mlflow
import subprocess
from openai import OpenAI

# Enable MLflow's autologging to instrument your application with Tracing
mlflow.openai.autolog()

# Connect to a Databricks LLM via OpenAI using the same credentials as MLflow
# Alternatively, you can use your own OpenAI credentials here
mlflow_creds = mlflow.utils.databricks_utils.get_databricks_host_creds()
client = OpenAI(
    api_key=mlflow_creds.token,
    base_url=f"{mlflow_creds.host}/serving-endpoints"
)

# Define your application and its version identifier
app_name = "customer_support_agent"

# Get current git commit hash for versioning
try:
    git_commit = (
        subprocess.check_output(["git", "rev-parse", "HEAD"])
        .decode("ascii")
        .strip()[:8]
    )
    version_identifier = f"git-{git_commit}"
except subprocess.CalledProcessError:
    version_identifier = "local-dev"  # Fallback if not in a git repo
logged_model_name = f"{app_name}-{version_identifier}"

# Set the active model context - this creates a LoggedModel that represents this version of your application
active_model_info = mlflow.set_active_model(name=logged_model_name)
print(
    f"Active LoggedModel: '{active_model_info.name}', Model ID: '{active_model_info.model_id}'"
)

# Log application parameters
# These parameters help you track the configuration of this app version
app_params = {
    "llm": "databricks-claude-sonnet-4",
    "temperature": 0.7,
    "max_tokens": 500
}
mlflow.log_model_params(model_id=active_model_info.model_id, params=app_params)

# Load the prompt from the registry
# NOTE: Loading the prompt AFTER calling set_active_model() is what enables
# automatic lineage tracking between the prompt version and the LoggedModel
prompt = mlflow.genai.load_prompt(f"prompts:/{uc_schema}.{prompt_name}/1")
print(f"Loaded prompt version {prompt.version}")

# Use the trace decorator to capture the application's entry point.
# Each trace created by this function is automatically linked to the LoggedModel
# (application version) set above. In turn, the LoggedModel is linked to the
# prompt version that was loaded from the registry.
@mlflow.trace
def customer_support_app(company_name: str, topic: str, question: str):
    # Format the prompt with variables
    formatted_prompt = prompt.format(
        company_name=company_name,
        topic=topic,
        question=question
    )

    # Call the LLM
    response = client.chat.completions.create(
        model="databricks-claude-sonnet-4",  # Replace with your model
        messages=[
            {
                "role": "user",
                "content": formatted_prompt,
            },
        ],
        temperature=0.7,
        max_tokens=500
    )
    return response.choices[0].message.content

# Test the application
result = customer_support_app(
    company_name="TechCorp",
    topic="billing",
    question="I was charged twice for my subscription last month. Can you help?"
)
print(f"\nResponse: {result}")

Step 3: View the automatic lineage

After you run the application, open your MLflow experiment in the UI to see the automatic lineage: the trace from your application is linked to the LoggedModel (the application version), and the LoggedModel is linked to the prompt version it loaded from the registry.
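
You can also inspect the LoggedModel programmatically. The sketch below is illustrative: it assumes the active_model_info object from step 2 is still in scope, and the exact attributes on the returned object may vary across MLflow versions:

# Fetch the LoggedModel created in step 2 and inspect the metadata recorded on it.
# Attribute names here reflect the MLflow 3 LoggedModel entity; verify against
# your installed version.
logged_model = mlflow.get_logged_model(model_id=active_model_info.model_id)
print(f"Name: {logged_model.name}")
print(f"Params: {logged_model.params}")  # the app_params logged earlier
print(f"Tags: {logged_model.tags}")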

Step 4: Update the prompt and track the change

Let's improve the prompt and see how the new version is automatically tracked when our application uses it.

# Create an improved version of the prompt
improved_template = """\
You are a helpful and empathetic customer support assistant for {{company_name}}.

Customer Topic: {{topic}}
Customer Question: {{question}}

Please provide a response that:
1. Acknowledges the customer's concern with empathy
2. Provides a clear solution or next steps
3. Offers additional assistance if needed
4. Maintains a friendly, professional tone

Remember to:
- Use the customer's name if provided
- Be concise but thorough
- Avoid technical jargon unless necessary
"""

# Register the new version
updated_prompt = mlflow.genai.register_prompt(
    name=f"{uc_schema}.{prompt_name}",
    template=improved_template,
    commit_message="Added structured response guidelines for better customer experience",
    version_metadata={
        "author": "support-team@company.com",
        "improvement": "Added empathy guidelines and response structure"
    }
)

print(f"Created version {updated_prompt.version} of '{updated_prompt.name}'")

Step 5: Use the updated prompt in your application

Now, let's use the new prompt version and create a new application version to track this change:

# Create a new application version
new_version_identifier = "v2-improved-prompt"
new_logged_model_name = f"{app_name}-{new_version_identifier}"

# Set the new active model
active_model_info_v2 = mlflow.set_active_model(name=new_logged_model_name)
print(
    f"Active LoggedModel: '{active_model_info_v2.name}', Model ID: '{active_model_info_v2.model_id}'"
)

# Log updated parameters
app_params_v2 = {
    "llm": "databricks-claude-sonnet-4",
    "temperature": 0.7,
    "max_tokens": 500,
    "prompt_version": "2"  # Track which prompt version we're using
}
mlflow.log_model_params(model_id=active_model_info_v2.model_id, params=app_params_v2)

# Load the new prompt version
prompt_v2 = mlflow.genai.load_prompt(f"prompts:/{uc_schema}.{prompt_name}/2")

# Update the app to use the new prompt
@mlflow.trace
def customer_support_app_v2(company_name: str, topic: str, question: str):
    # Format the prompt with variables
    formatted_prompt = prompt_v2.format(
        company_name=company_name,
        topic=topic,
        question=question
    )

    # Call the LLM
    response = client.chat.completions.create(
        model="databricks-claude-sonnet-4",
        messages=[
            {
                "role": "user",
                "content": formatted_prompt,
            },
        ],
        temperature=0.7,
        max_tokens=500
    )
    return response.choices[0].message.content

# Test with the same question to see the difference
result_v2 = customer_support_app_v2(
    company_name="TechCorp",
    topic="billing",
    question="I was charged twice for my subscription last month. Can you help?"
)
print(f"\nImproved Response: {result_v2}")

Next steps: Evaluate prompt versions

Now that you're tracking different versions of your prompts and application, you can systematically evaluate which prompt version performs best. MLflow's evaluation framework lets you compare multiple prompt versions side by side using LLM judges and custom metrics.

To learn how to evaluate prompt versions, see Evaluate prompts. That guide shows you how to:

  • Run evaluations across different prompt versions
  • Compare results across versions using the evaluation UI
  • Use both built-in LLM judges and custom metrics
  • Make data-driven decisions about which prompt version to deploy

By combining prompt versioning with evaluation, you can iterate on your prompts with confidence, knowing exactly how each change affects your quality metrics.
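
As a taste of what that looks like, here is a hedged sketch of evaluating the two application versions defined above with a built-in judge. It assumes the Safety scorer is available in mlflow.genai.scorers in your MLflow version; see the Evaluate prompts guide for the authoritative workflow:

from mlflow.genai.scorers import Safety  # assumed built-in scorer; verify availability

# A tiny illustrative dataset; the "inputs" keys must match the predict_fn signature.
eval_data = [
    {
        "inputs": {
            "company_name": "TechCorp",
            "topic": "billing",
            "question": "I was charged twice for my subscription last month. Can you help?",
        }
    }
]

# Evaluate each application version against the same data and scorers.
mlflow.genai.evaluate(data=eval_data, predict_fn=customer_support_app, scorers=[Safety()])
mlflow.genai.evaluate(data=eval_data, predict_fn=customer_support_app_v2, scorers=[Safety()])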

Next steps