Track versions and environments

2025-06-11

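Tracking the execution environment and application version of your GenAI application lets you debug performance and quality issues relative to the code. This metadata enables:

Environment-specific analysis across development, staging, and production
Performance and quality tracking across app versions, including regression detection
Faster root cause analysis when issues occur

MLflow uses metadata (key-value pairs) to store contextual information on traces.

Note

For a comprehensive overview of how versioning works, see "Version tracking".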

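Automatically populated metadata

These standard metadata fields are automatically captured by MLflow based on your execution environment.

Important

If the automatic capture logic does not meet your requirements, you can manually override these automatically populated metadata fields using mlflow.update_current_trace(metadata={"mlflow.source.name": "custom_name"}).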

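Category	Metadata field	Description	Automatic setting logic
Execution environment	`mlflow.source.name`	The entry point or script that generated the trace.	Automatically populated with the file name for Python scripts, or the notebook name for Databricks/Jupyter notebooks.
	`mlflow.source.git.commit`	Git commit hash.	When run from a Git repository, the commit hash is automatically detected and populated.
	`mlflow.source.git.branch`	Git branch.	When run from a Git repository, the current branch name is automatically detected and populated.
	`mlflow.source.git.repoURL`	Git repository URL.	When run from a Git repository, the repository URL is automatically detected and populated.
	`mlflow.source.type`	Captures the execution environment.	Automatically set to `NOTEBOOK` when running in a Jupyter or Databricks notebook, `LOCAL` when running a local Python script, and `UNKNOWN` otherwise (automatically detected). In deployed apps, we recommend updating this variable based on the environment (for example, `PRODUCTION` or `STAGING`).
Application version	`metadata.mlflow.modelId`	MLflow LoggedModel ID.	Automatically set to the model ID value in the `MLFLOW_ACTIVE_MODEL_ID` environment variable, or to the model ID set via the `mlflow.set_active_model()` function.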

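Customizing automatically populated metadata

You can override any of the automatically populated metadata fields using mlflow.update_current_trace(). This is useful when the automatic detection does not meet your requirements or when you want to add context.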

import mlflow
import os

# We suggest populating metadata from environment variables rather than hard coding the values

@mlflow.trace
def my_app(user_question: str) -> dict:
    # Override automatically populated metadata and add custom context
    mlflow.update_current_trace(
        metadata={
            # Use any of the keys from above
            "mlflow.source.type": os.getenv("APP_ENVIRONMENT", "development"),  # Override default LOCAL/NOTEBOOK
        }
    )

    # Application logic

    return {"response": user_question + "!!"}

my_app("test")
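The table above suggests labels such as `PRODUCTION` or `STAGING` for `mlflow.source.type` in deployed apps. The following is a minimal sketch of one way to normalize the `APP_ENVIRONMENT` variable into such a label before passing it to `mlflow.update_current_trace()`. The helper name `resolve_source_type`, the alias table, and the `UNKNOWN` fallback for unrecognized values are illustrative assumptions, not part of MLflow.

```python
import os
from typing import Mapping, Optional

# Hypothetical alias table: maps common spellings to upper-case labels.
# Adjust it to your own deployment conventions.
_ENV_ALIASES = {
    "prod": "PRODUCTION",
    "production": "PRODUCTION",
    "stage": "STAGING",
    "staging": "STAGING",
    "dev": "DEVELOPMENT",
    "development": "DEVELOPMENT",
}


def resolve_source_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a normalized environment label read from APP_ENVIRONMENT."""
    env = os.environ if env is None else env
    raw = env.get("APP_ENVIRONMENT", "development").strip().lower()
    # Assumption: unrecognized values fall back to UNKNOWN.
    return _ENV_ALIASES.get(raw, "UNKNOWN")


print(resolve_source_type({"APP_ENVIRONMENT": "Prod"}))
```

The returned string could be used as the value for the `"mlflow.source.type"` key. Whichever spelling you settle on, use the same spelling in any filter strings that query this field later.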

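Fully custom metadata

You can attach custom metadata to capture any application-specific context. For details on attaching custom metadata, see "Attach custom metadata / tags".

For example, you might attach information such as:

app_version: for example, "1.0.0" (from the APP_VERSION environment variable)
deployment_id: for example, "deploy-abc-123" (from the DEPLOYMENT_ID environment variable)
region: for example, "us-east-1" (from the REGION environment variable)
(You can also add other custom tags, such as feature flags.)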

import mlflow
import os

# We suggest populating metadata from environment variables rather than hard coding the values

@mlflow.trace
def my_app(user_question: str) -> dict:
    # Override automatically populated metadata and add custom context
    mlflow.update_current_trace(
        metadata={
            # Use any key
            "app_version": os.getenv("APP_VERSION", "development")
        }
    )

    # Application logic

    return {"response": user_question + "!!"}

my_app("test")
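When several custom fields come from environment variables, it can help to collect them in one place. The sketch below builds the metadata dictionary for the three fields listed earlier in this section. The helper name `build_custom_metadata` and the default values are assumptions made for illustration; only the keys and environment variable names come from this guide.

```python
import os
from typing import Dict, Mapping, Optional

# Metadata key -> (environment variable, default value).
# The defaults are illustrative assumptions.
_CUSTOM_FIELDS = {
    "app_version": ("APP_VERSION", "development"),
    "deployment_id": ("DEPLOYMENT_ID", "unknown"),
    "region": ("REGION", "unknown"),
}


def build_custom_metadata(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect custom trace metadata from environment variables."""
    env = os.environ if env is None else env
    return {
        key: env.get(var, default)
        for key, (var, default) in _CUSTOM_FIELDS.items()
    }


metadata = build_custom_metadata({"APP_VERSION": "1.0.0", "REGION": "us-east-1"})
print(metadata)
```

The resulting dictionary could be passed as the `metadata` argument of `mlflow.update_current_trace()` inside a traced function. Accepting the environment as a parameter keeps the helper easy to test without modifying `os.environ`.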

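Production web application example

In a production FastAPI application, context can be derived from environment variables, request headers, or application configuration. The following example follows the Production Observability with Tracing guide and shows how to capture various context types.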

import mlflow
import os
from fastapi import FastAPI, Request, HTTPException # HTTPException might be needed depending on full app logic
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")  # Route decorator goes outermost so FastAPI registers the traced function
@mlflow.trace  # Applied first, so each request handled by this route runs inside a trace
def handle_chat(request: Request, chat_request: ChatRequest):
    # Retrieve all context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")

    # Update the current trace with all context and environment metadata
    # The @mlflow.trace decorator ensures an active trace is available
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        metadata={
            # Session context - groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context - associates traces with specific users
            "mlflow.trace.user": user_id,
            # Override automatically populated environment metadata
            "mlflow.source.type": os.getenv("APP_ENVIRONMENT", "development"),  # Override default LOCAL/NOTEBOOK
            # Add custom environment metadata
            "environment": "production",
            "app_version": os.getenv("APP_VERSION", "1.0.0"),
            "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
            "region": os.getenv("REGION", "us-east-1")
        }
    )

    # --- Your application logic for processing the chat message ---
    # For example, calling a language model with context
    # response_text = my_llm_call(
    #     message=chat_request.message,
    #     session_id=session_id,
    #     user_id=user_id
    # )
    response_text = f"Processed message: '{chat_request.message}'"
    # --- End of application logic ---

    # Return response
    return {
        "response": response_text
    }

# To run this example (requires uvicorn and fastapi):
# uvicorn your_file_name:app --reload
#
# Example curl request with context headers:
# curl -X POST "http://127.0.0.1:8000/chat" \
#      -H "Content-Type: application/json" \
#      -H "X-Request-ID: req-abc-123-xyz-789" \
#      -H "X-Session-ID: session-def-456-uvw-012" \
#      -H "X-User-ID: user-jane-doe-12345" \
#      -d '{"message": "What is my account balance?"}'
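In the example above, `request.headers.get(...)` returns `None` when a header is absent, so a `None` value could end up in the metadata dictionary. The following is a minimal sketch of a helper that maps headers to metadata keys and skips missing or empty values. The helper name `metadata_from_headers` is hypothetical; the header names and metadata keys mirror the example above.

```python
from typing import Dict, Mapping, Optional

# Header name -> metadata key, mirroring the FastAPI example.
_HEADER_TO_METADATA = {
    "X-Session-ID": "mlflow.trace.session",
    "X-User-ID": "mlflow.trace.user",
}


def metadata_from_headers(headers: Mapping[str, Optional[str]]) -> Dict[str, str]:
    """Map request headers to trace metadata, skipping absent or empty ones."""
    metadata: Dict[str, str] = {}
    for header, key in _HEADER_TO_METADATA.items():
        value = headers.get(header)
        if value:  # Skips None and empty strings
            metadata[key] = value
    return metadata


print(metadata_from_headers({"X-Session-ID": "session-def-456-uvw-012"}))
```

With a plain dictionary the header lookup is case-sensitive. The headers object on a FastAPI `Request` performs case-insensitive lookups, so the same helper should accept it directly.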

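Querying and analyzing context data

Using the MLflow UI

In the MLflow UI (Traces tab), you can view the attached metadata.

Figure: trace metadata

Programmatic analysis

For more complex analysis or integration with other tools, use the MLflow SDK.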

from mlflow.client import MlflowClient

client = MlflowClient()

# Example 1: Compare error rates across app versions in production
def compare_version_error_rates(experiment_id: str, versions: list):
    error_rates = {}
    for version in versions:
        traces = client.search_traces(
            experiment_ids=[experiment_id],
            filter_string=f"metadata.`mlflow.source.type` = 'production' AND metadata.app_version = '{version}'"
        )
        if not traces:
            error_rates[version] = None # Or 0 if no traces means no errors
            continue

        error_count = sum(1 for t in traces if t.info.status == "ERROR")
        error_rates[version] = (error_count / len(traces)) * 100
    return error_rates

# version_errors = compare_version_error_rates("your_exp_id", ["1.0.0", "1.1.0"])
# print(version_errors)

# Example 2: Analyze performance for a specific feature flag
def analyze_feature_flag_performance(experiment_id: str, flag_name: str):
    control_latency = []
    treatment_latency = []

    control_traces = client.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"metadata.feature_flag_{flag_name} = 'false'",
        # extract_fields=["execution_time_ms"] # Not a real field, use span attributes if needed
    )
    for t in control_traces: control_latency.append(t.info.execution_time_ms)

    treatment_traces = client.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"metadata.feature_flag_{flag_name} = 'true'",
    )
    for t in treatment_traces: treatment_latency.append(t.info.execution_time_ms)

    avg_control_latency = sum(control_latency) / len(control_latency) if control_latency else 0
    avg_treatment_latency = sum(treatment_latency) / len(treatment_latency) if treatment_latency else 0

    return {
        f"avg_latency_{flag_name}_off": avg_control_latency,
        f"avg_latency_{flag_name}_on": avg_treatment_latency
    }

# perf_metrics = analyze_feature_flag_performance("your_exp_id", "new_retriever")
# print(perf_metrics)
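The filter strings in the examples above are assembled by hand with f-strings. A minimal sketch of a small builder for such strings follows, assuming only the syntax visible in those examples: `metadata.<key> = '<value>'` clauses joined by `AND`, with backticks around keys that contain dots. The function name `build_metadata_filter` is hypothetical, and the sketch rejects values containing single quotes rather than guessing at an escaping rule.

```python
from typing import Mapping


def build_metadata_filter(conditions: Mapping[str, str]) -> str:
    """Build an AND-joined filter string from metadata key/value pairs."""
    clauses = []
    for key, value in conditions.items():
        # Keys containing dots are wrapped in backticks, as in Example 1.
        field = f"metadata.`{key}`" if "." in key else f"metadata.{key}"
        # Assumption: values never contain single quotes; fail loudly if they do.
        if "'" in value:
            raise ValueError(f"unsupported quote in value: {value!r}")
        clauses.append(f"{field} = '{value}'")
    return " AND ".join(clauses)


print(build_metadata_filter({"mlflow.source.type": "production", "app_version": "1.0.0"}))
```

The returned string could be passed as `filter_string` to `client.search_traces(...)`. Building the filter from a dictionary keeps the quoting consistent when the same query is repeated for many versions or flags.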

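Next steps

Continue your journey with these recommended actions and tutorials.

Track users and sessions - Add user-centric observability to your traces
Attach custom tags / metadata - Learn more ways to enrich traces with context
Production observability with tracing - Deploy comprehensive tracing in production

Reference guides

Explore detailed documentation for the concepts and features mentioned in this guide.

Tracing data model - Understand metadata and how it is stored on traces
App version tracking concepts - Learn about versioning strategies
Query traces via SDK - Advanced search using metadata filters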
