Track prompt versions alongside application versions

2025-06-24

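This guide shows how to integrate prompts from the MLflow Prompt Registry into a GenAI application while tracking prompt versions and application versions together. When you use mlflow.set_active_model() with prompts from the registry, MLflow automatically creates lineage between the prompt versions and the application version.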

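What you'll learn:

Load and use prompts from the MLflow Prompt Registry in your application
Track application versions using LoggedModels
View the automatic lineage between prompt versions and application versions
Update a prompt and see how the change flows through to your application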

Prerequisites

Install MLflow and the required packages
```
pip install --upgrade "mlflow[databricks]>=3.1.0" openai
```
Follow the environment setup quickstart to create an MLflow experiment.
Access to a Unity Catalog schema with the CREATE FUNCTION privilege
- Why? Prompts are stored in Unity Catalog (UC) as functions

Step 1: Create a prompt in the registry

First, create the prompt that the application will use. If you already created a prompt by following the "Create and edit prompts" guide, you can skip this step.

import mlflow

# Replace with a Unity Catalog schema where you have CREATE FUNCTION permission
uc_schema = "workspace.default"
prompt_name = "customer_support_prompt"

# Define the prompt template with variables
initial_template = """\
You are a helpful customer support assistant for {{company_name}}.

Please help the customer with their inquiry about: {{topic}}

Customer Question: {{question}}

Provide a friendly, professional response that addresses their concern.
"""

# Register a new prompt
prompt = mlflow.genai.register_prompt(
    name=f"{uc_schema}.{prompt_name}",
    template=initial_template,
    commit_message="Initial customer support prompt",
    tags={
        "author": "support-team@company.com",
        "use_case": "customer_service",
        "department": "customer_support",
        "language": "en"
    }
)

print(f"Created prompt '{prompt.name}' (version {prompt.version})")
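
The {{variable}} placeholders in the template are filled in later, when the application calls prompt.format(...). As a rough illustration of that substitution, the sketch below uses only the standard library. It is not MLflow's implementation, and find_variables and render are hypothetical helper names introduced here for illustration.

```python
import re

# Matches double-brace placeholders such as {{company_name}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_variables(template: str) -> list[str]:
    """Return placeholder names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, **values: str) -> str:
    """Substitute every placeholder; raises KeyError if a value is missing."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Shortened stand-in template (assumed for illustration)
template = "You are a support assistant for {{company_name}}. Topic: {{topic}}"
print(find_variables(template))
print(render(template, company_name="TechCorp", topic="billing"))
```

Listing the variables up front makes it easier to spot a mismatch between the template and the keyword arguments the application passes.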

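Step 2: Create a version-tracked application that uses the prompt

Next, create a GenAI application that loads this prompt from the registry and uses it. Use mlflow.set_active_model() to track the application's version.

When you call mlflow.set_active_model(), MLflow creates a LoggedModel that serves as a metadata hub for the application version. This LoggedModel does not store your actual application code. Instead, it acts as a central record that links to external code (such as a Git commit), holds configuration parameters, and automatically tracks which prompts from the registry the application uses. For details on how application version tracking works, see the article on tracking application versions with MLflow.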

import mlflow
import subprocess
from openai import OpenAI

# Enable MLflow's autologging to instrument your application with Tracing
mlflow.openai.autolog()

# Connect to a Databricks LLM via OpenAI using the same credentials as MLflow
# Alternatively, you can use your own OpenAI credentials here
mlflow_creds = mlflow.utils.databricks_utils.get_databricks_host_creds()
client = OpenAI(
    api_key=mlflow_creds.token,
    base_url=f"{mlflow_creds.host}/serving-endpoints"
)

# Define your application and its version identifier
app_name = "customer_support_agent"

# Get current git commit hash for versioning
try:
    git_commit = (
        subprocess.check_output(["git", "rev-parse", "HEAD"])
        .decode("ascii")
        .strip()[:8]
    )
    version_identifier = f"git-{git_commit}"
except (subprocess.CalledProcessError, FileNotFoundError):
    # Fallback if not in a git repo or if git is not installed
    version_identifier = "local-dev"
logged_model_name = f"{app_name}-{version_identifier}"

# Set the active model context - this creates a LoggedModel that represents this version of your application
active_model_info = mlflow.set_active_model(name=logged_model_name)
print(
    f"Active LoggedModel: '{active_model_info.name}', Model ID: '{active_model_info.model_id}'"
)

# Log application parameters
# These parameters help you track the configuration of this app version
app_params = {
    "llm": "databricks-claude-sonnet-4",
    "temperature": 0.7,
    "max_tokens": 500
}
mlflow.log_model_params(model_id=active_model_info.model_id, params=app_params)

# Load the prompt from the registry
# NOTE: Loading the prompt AFTER calling set_active_model() is what enables
# automatic lineage tracking between the prompt version and the LoggedModel
prompt = mlflow.genai.load_prompt(f"prompts:/{uc_schema}.{prompt_name}/1")
print(f"Loaded prompt version {prompt.version}")

# Use the trace decorator to capture the application's entry point.
# Each trace created by this function is automatically linked to the LoggedModel
# (application version) set above. In turn, the LoggedModel is linked to the
# prompt version that was loaded from the registry.
@mlflow.trace
def customer_support_app(company_name: str, topic: str, question: str):
    # Format the prompt with variables
    formatted_prompt = prompt.format(
        company_name=company_name,
        topic=topic,
        question=question
    )

    # Call the LLM
    response = client.chat.completions.create(
        model="databricks-claude-sonnet-4",  # Replace with your model
        messages=[
            {
                "role": "user",
                "content": formatted_prompt,
            },
        ],
        temperature=0.7,
        max_tokens=500
    )
    return response.choices[0].message.content

# Test the application
result = customer_support_app(
    company_name="TechCorp",
    topic="billing",
    question="I was charged twice for my subscription last month. Can you help?"
)
print(f"\nResponse: {result}")
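
The ordering noted in the code comments above (set the active model first, then load the prompt) can be pictured with a toy model. The sketch below is purely conceptual: ToyTracker and its methods are hypothetical and do not mirror MLflow's internals. It only illustrates why a prompt loaded before any active model is set has nothing to be linked to.

```python
from collections import defaultdict
from typing import Optional


class ToyTracker:
    """Hypothetical stand-in for lineage bookkeeping; not MLflow's implementation."""

    def __init__(self) -> None:
        self.active_model: Optional[str] = None
        # Maps an application version name to the prompt URIs it loaded
        self.lineage: dict[str, list[str]] = defaultdict(list)

    def set_active_model(self, name: str) -> None:
        self.active_model = name

    def load_prompt(self, uri: str) -> str:
        # A link is recorded only when an active model already exists
        if self.active_model is not None:
            self.lineage[self.active_model].append(uri)
        return uri


prompt_uri = "prompts:/workspace.default.customer_support_prompt/1"

tracker = ToyTracker()
tracker.load_prompt(prompt_uri)  # loaded too early: no link recorded
tracker.set_active_model("customer_support_agent-local-dev")
tracker.load_prompt(prompt_uri)  # loaded after set_active_model: link recorded
print(dict(tracker.lineage))
```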

Step 3: View the automatic lineage

Step 4: Update the prompt and track the change

Now improve the prompt and see how the new version is tracked automatically once the application uses it.

# Create an improved version of the prompt
improved_template = """\
You are a helpful and empathetic customer support assistant for {{company_name}}.

Customer Topic: {{topic}}
Customer Question: {{question}}

Please provide a response that:
1. Acknowledges the customer's concern with empathy
2. Provides a clear solution or next steps
3. Offers additional assistance if needed
4. Maintains a friendly, professional tone

Remember to:
- Use the customer's name if provided
- Be concise but thorough
- Avoid technical jargon unless necessary
"""

# Register the new version
updated_prompt = mlflow.genai.register_prompt(
    name=f"{uc_schema}.{prompt_name}",
    template=improved_template,
    commit_message="Added structured response guidelines for better customer experience",
    tags={
        "author": "support-team@company.com",
        "improvement": "Added empathy guidelines and response structure"
    }
)

print(f"Created version {updated_prompt.version} of '{updated_prompt.name}'")
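
Before switching the application to a new prompt version, it can help to confirm that the new template still uses the variables the application passes to format(), and to review what changed. The following is a minimal standard-library sketch; the two templates are shortened stand-ins and template_variables is a hypothetical helper, not part of MLflow.

```python
import difflib
import re


def template_variables(template: str) -> set[str]:
    """Collect the names used in {{double_brace}} placeholders."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


# Shortened stand-ins for the two registered templates (assumed for illustration)
old_template = "Assistant for {{company_name}}.\nTopic: {{topic}}\nQuestion: {{question}}\n"
new_template = "Empathetic assistant for {{company_name}}.\nTopic: {{topic}}\nQuestion: {{question}}\n"

# The application calls format(company_name=..., topic=..., question=...),
# so both versions should expose the same variables
missing = template_variables(old_template) - template_variables(new_template)
added = template_variables(new_template) - template_variables(old_template)
print(f"Removed variables: {sorted(missing)}, new variables: {sorted(added)}")

# Line-by-line diff of the two versions
diff = list(
    difflib.unified_diff(
        old_template.splitlines(),
        new_template.splitlines(),
        fromfile="version 1",
        tofile="version 2",
        lineterm="",
    )
)
print("\n".join(diff))
```

If either set is non-empty, the application's format() call likely needs to change together with the prompt.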

Step 5: Use the updated prompt in your application

Next, switch to the new prompt version and create a new application version to track this change.

# Create a new application version
new_version_identifier = "v2-improved-prompt"
new_logged_model_name = f"{app_name}-{new_version_identifier}"

# Set the new active model
active_model_info_v2 = mlflow.set_active_model(name=new_logged_model_name)
print(
    f"Active LoggedModel: '{active_model_info_v2.name}', Model ID: '{active_model_info_v2.model_id}'"
)

# Log updated parameters
app_params_v2 = {
    "llm": "databricks-claude-sonnet-4",
    "temperature": 0.7,
    "max_tokens": 500,
    "prompt_version": "2"  # Track which prompt version we're using
}
mlflow.log_model_params(model_id=active_model_info_v2.model_id, params=app_params_v2)

# Load the new prompt version
prompt_v2 = mlflow.genai.load_prompt(f"prompts:/{uc_schema}.{prompt_name}/2")

# Update the app to use the new prompt
@mlflow.trace
def customer_support_app_v2(company_name: str, topic: str, question: str):
    # Format the prompt with variables
    formatted_prompt = prompt_v2.format(
        company_name=company_name,
        topic=topic,
        question=question
    )

    # Call the LLM
    response = client.chat.completions.create(
        model="databricks-claude-sonnet-4",
        messages=[
            {
                "role": "user",
                "content": formatted_prompt,
            },
        ],
        temperature=0.7,
        max_tokens=500
    )
    return response.choices[0].message.content

# Test with the same question to see the difference
result_v2 = customer_support_app_v2(
    company_name="TechCorp",
    topic="billing",
    question="I was charged twice for my subscription last month. Can you help?"
)
print(f"\nImproved Response: {result_v2}")
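
Because each application version logs its own parameters, comparing the two parameter sets shows what changed between versions. The sketch below compares plain dictionaries shaped like app_params and app_params_v2 above; diff_params is a hypothetical helper, not an MLflow function.

```python
def diff_params(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for keys that were added, removed, or changed."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Copies of the parameter dictionaries logged for the two application versions
params_v1 = {"llm": "databricks-claude-sonnet-4", "temperature": 0.7, "max_tokens": 500}
params_v2 = {**params_v1, "prompt_version": "2"}

# A value of None on either side means the key is absent in that version
print(diff_params(params_v1, params_v2))
```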

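Next: Evaluate prompt versions

Now that you are tracking different versions of your prompts and application, you can systematically evaluate which prompt version performs best. With MLflow's evaluation framework, you can compare multiple prompt versions side by side using LLM judges and custom metrics.

To learn how to evaluate prompt versions, see "Evaluate prompts". That guide shows you how to:

Run evaluations on different prompt versions
Compare results across versions using the evaluation UI
Use both built-in LLM judges and custom metrics
Make data-driven decisions about which prompt version to deploy

By combining prompt versioning with evaluation, you can iterate on prompts with confidence, knowing exactly how each change affects your quality metrics.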

Next steps

Evaluate prompts - Learn how to evaluate different prompt versions for quality
Link production traces to app versions - Track versions in production