Track users and sessions

2025-06-11

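Tracking users and sessions in a GenAI application provides essential context for understanding user behavior, analyzing conversation flows, and improving personalization. MLflow has built-in support for associating traces with users and grouping them into sessions.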

Prerequisites

Choose the installation method that fits your environment:

Production

For production deployments, install the mlflow-tracing package:

pip install --upgrade mlflow-tracing

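The mlflow-tracing package is optimized for production use, with minimal dependencies and better performance characteristics.

Development

For development environments, install the full MLflow package with the Databricks extras: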

pip install --upgrade "mlflow[databricks]>=3.1"

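The full mlflow[databricks] package includes everything needed for local development and experimentation on Databricks.

Note

User and session tracking requires MLflow 3. MLflow 2.x is not supported because of performance limitations and missing features that production use requires.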

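Why track users and sessions?

User and session tracking enables the following kinds of analysis and improvement:

User behavior analysis - Understand how different users interact with your application
Conversation flow tracking - Analyze multi-turn conversations and context retention
Personalization insights - Identify patterns that can improve user-specific experiences
Quality per user - Track performance metrics across user segments
Session continuity - Maintain context across multiple interactions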

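Standard MLflow metadata fields

MLflow provides two standard metadata fields for session and user tracking:

mlflow.trace.user - Associates a trace with a specific user
mlflow.trace.session - Groups traces that belong to the same multi-turn conversation

When you use these standard metadata fields, MLflow automatically enables filtering and grouping in the UI. Unlike tags, metadata cannot be updated after the trace is logged, which makes it well suited to immutable identifiers such as user and session IDs.

Basic implementation

The following example shows how to add user and session tracking to an application: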

import mlflow

@mlflow.trace
def chat_completion(user_id: str, session_id: str, message: str):
    """Process a chat message with user and session tracking."""

    # Add user and session context to the current trace
    # The @mlflow.trace decorator ensures there's an active trace
    mlflow.update_current_trace(
        metadata={
            "mlflow.trace.user": user_id,      # Links this trace to a specific user
            "mlflow.trace.session": session_id, # Groups this trace with others in the same conversation
        }
    )

    # Your chat logic here
    # The trace will capture the execution time, inputs, outputs, and any errors
    response = generate_response(message)
    return response

# Example usage in a chat application
def handle_user_message(request):
    # Extract user and session IDs from your application's context
    # These IDs should be consistent across all interactions
    return chat_completion(
        user_id=request.user_id,        # e.g., "user-123" - unique identifier for the user
        session_id=request.session_id,   # e.g., "session-abc-456" - groups related messages
        message=request.message
    )

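Key points:

The @mlflow.trace decorator automatically creates a trace for the function execution
mlflow.update_current_trace() adds the user ID and session ID to the active trace as metadata
Using metadata ensures these identifiers cannot change once the trace has been logged

Production web application example

In production applications, you typically track user, session, and other contextual information together. The following example is adapted from the "Production observability with tracing" guide and incorporates the environment and deployment context shown in the "Track environments and context" guide.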

import mlflow
import os
from fastapi import FastAPI, Request, HTTPException # HTTPException might be needed depending on full app logic
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@app.post("/chat")  # FastAPI route decorator is outermost so it registers the traced function
@mlflow.trace  # Tracing decorator wraps the handler directly
def handle_chat(request: Request, chat_request: ChatRequest):
    # Retrieve all context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")

    # Update the current trace with all context and environment metadata
    # The @mlflow.trace decorator ensures an active trace is available
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        metadata={
            # Session context - groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context - associates traces with specific users
            "mlflow.trace.user": user_id,

        }
    )

    # --- Your application logic for processing the chat message ---
    # For example, calling a language model with context
    # response_text = my_llm_call(
    #     message=chat_request.message,
    #     session_id=session_id,
    #     user_id=user_id
    # )
    response_text = f"Processed message: '{chat_request.message}'"
    # --- End of application logic ---

    # Return response
    return {
        "response": response_text
    }

# To run this example (requires uvicorn and fastapi):
# uvicorn your_file_name:app --reload
#
# Example curl request with context headers:
# curl -X POST "http://127.0.0.1:8000/chat" \
#      -H "Content-Type: application/json" \
#      -H "X-Request-ID: req-abc-123-xyz-789" \
#      -H "X-Session-ID: session-def-456-uvw-012" \
#      -H "X-User-ID: user-jane-doe-12345" \
#      -d '{"message": "What is my account balance?"}'

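This example demonstrates a unified approach to context tracking. It captures:

User information: extracted from the X-User-ID header and logged as mlflow.trace.user metadata.
Session information: extracted from the X-Session-ID header and logged as mlflow.trace.session metadata.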
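
The handler above reads each header directly, so a missing header yields None. The following is a minimal sketch of one way to normalize the headers before attaching them to a trace. The helper name resolve_trace_context, the "anonymous" placeholder, and the generated fallback session ID are assumptions made for illustration; they are not part of MLflow.

```python
import uuid
from typing import Mapping


def resolve_trace_context(headers: Mapping[str, str]) -> dict:
    """Build keyword arguments for a trace update from request headers.

    Hypothetical helper: the fallback rules are illustrative choices only.
    """
    # Lower-case the header names so lookups also work with a plain dict
    normalized = {name.lower(): value.strip() for name, value in headers.items()}

    # Assumed fallbacks: a placeholder user and a freshly generated session ID
    user_id = normalized.get("x-user-id") or "anonymous"
    session_id = normalized.get("x-session-id") or f"session-{uuid.uuid4().hex}"
    request_id = normalized.get("x-request-id") or None

    return {
        "client_request_id": request_id,
        "metadata": {
            "mlflow.trace.user": user_id,
            "mlflow.trace.session": session_id,
        },
    }
```

Inside the handler, the result could be passed along as mlflow.update_current_trace(**resolve_trace_context(request.headers)). Whether a generated session ID is appropriate depends on how your application defines a session.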

Query and analyze data

Use the MLflow UI

In the MLflow UI, filter traces with the following search queries:

# Find all traces for a specific user
metadata.`mlflow.trace.user` = 'user-123'

# Find all traces in a session
metadata.`mlflow.trace.session` = 'session-abc-456'

# Find traces for a user within a specific session
metadata.`mlflow.trace.user` = 'user-123' AND metadata.`mlflow.trace.session` = 'session-abc-456'
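
The same query strings can be assembled in code. The sketch below builds them from optional user and session IDs; build_trace_filter is a hypothetical helper, and it rejects IDs containing quotes or backticks rather than escaping them, because escaping rules are not covered in this guide.

```python
from typing import Optional

USER_KEY = "mlflow.trace.user"
SESSION_KEY = "mlflow.trace.session"


def _clause(key: str, value: str) -> str:
    # Refuse values that could break out of the quoted literal
    if "'" in value or "`" in value:
        raise ValueError(f"Unsupported character in ID: {value!r}")
    return f"metadata.`{key}` = '{value}'"


def build_trace_filter(
    user_id: Optional[str] = None, session_id: Optional[str] = None
) -> str:
    """Return a filter string matching the given user and/or session."""
    clauses = []
    if user_id is not None:
        clauses.append(_clause(USER_KEY, user_id))
    if session_id is not None:
        clauses.append(_clause(SESSION_KEY, session_id))
    if not clauses:
        raise ValueError("Provide at least one of user_id or session_id")
    return " AND ".join(clauses)
```

The returned string has the same shape as the queries shown above and can be pasted into the UI search box or passed as a filter string to the SDK.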

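Programmatic analysis

Use the MLflow SDK to analyze user and session data programmatically. This lets you build custom analytics, generate reports, and monitor user behavior patterns at scale.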

from mlflow.client import MlflowClient

client = MlflowClient()

# Analyze user behavior
def analyze_user_behavior(user_id: str, experiment_id: str):
    """Analyze a specific user's interaction patterns."""

    # Search for all traces from a specific user
    user_traces = client.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"metadata.`mlflow.trace.user` = '{user_id}'",
        max_results=1000
    )

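    # Calculate key metrics
    total_interactions = len(user_traces)
    if total_interactions == 0:
        # Avoid dividing by zero when the user has no traces
        return {"total_interactions": 0, "unique_sessions": 0, "avg_response_time": None}
    # trace_metadata is the MLflow 3 name for a trace's metadata dictionary
    unique_sessions = len({t.info.trace_metadata.get("mlflow.trace.session", "") for t in user_traces})
    avg_response_time = sum(t.info.execution_time_ms for t in user_traces) / total_interactions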

    return {
        "total_interactions": total_interactions,
        "unique_sessions": unique_sessions,
        "avg_response_time": avg_response_time
    }

# Analyze session flow
def analyze_session_flow(session_id: str, experiment_id: str):
    """Analyze conversation flow within a session."""

    # Get all traces from a session, ordered chronologically
    session_traces = client.search_traces(
        experiment_ids=[experiment_id],
        filter_string=f"metadata.`mlflow.trace.session` = '{session_id}'",
        order_by=["timestamp ASC"]
    )

    # Build a timeline of the conversation
    conversation_turns = []
    for i, trace in enumerate(session_traces):
        conversation_turns.append({
            "turn": i + 1,
            "timestamp": trace.info.timestamp_ms,  # request start time in epoch milliseconds
            "duration_ms": trace.info.execution_time_ms,
            "status": trace.info.status
        })

    return conversation_turns

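Key capabilities:

User behavior analysis - Track interaction frequency, session counts, and performance metrics for each user
Session flow analysis - Reconstruct session timelines to understand multi-turn interactions
Flexible filtering - Query traces by any combination of metadata fields using MLflow's search syntax
Scalable analysis - Process thousands of traces programmatically for insights at scale
Export-ready data - Convert results to DataFrames or export them for further analysis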
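
As an illustration of the export step, the sketch below summarizes and serializes a list of turn dictionaries shaped like the output of analyze_session_flow above. It uses only the standard library. The function names summarize_turns and turns_to_csv are hypothetical, and the code assumes that a successful turn has a status that compares equal to the string "OK".

```python
import csv
import io

FIELDS = ["turn", "timestamp", "duration_ms", "status"]


def summarize_turns(turns: list) -> dict:
    """Aggregate per-turn records into session-level metrics."""
    if not turns:
        return {"turns": 0, "total_duration_ms": 0,
                "avg_duration_ms": 0.0, "failed_turns": 0}
    # Treat a missing duration as zero rather than failing
    durations = [t["duration_ms"] or 0 for t in turns]
    return {
        "turns": len(turns),
        "total_duration_ms": sum(durations),
        "avg_duration_ms": sum(durations) / len(turns),
        # Assumption: any status other than "OK" counts as a failed turn
        "failed_turns": sum(1 for t in turns if t["status"] != "OK"),
    }


def turns_to_csv(turns: list) -> str:
    """Serialize the turn records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(turns)
    return buffer.getvalue()
```

The CSV text can be written to a file or loaded into a DataFrame library of your choice for further analysis.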

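Best practices

Consistent ID formats - Use a standardized format for user and session IDs
Session boundaries - Define clear rules for when a session starts and ends
Metadata enrichment - Add further context, such as user segment or session type
Combine with request tracking - Link user and session data to request IDs for end-to-end traceability
Regular analysis - Set up dashboards to monitor user behavior and session patterns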
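
The first two practices can be sketched in a few lines. The example below checks IDs against an assumed naming convention and starts a new session after a period of inactivity. The ID pattern, the 30-minute default timeout, and the SessionTracker class are illustrative assumptions; the tracker keeps state in memory, so it only suits a single process.

```python
import re
import uuid
from dataclasses import dataclass, field

# Assumed convention: "user-..." or "session-..." in lowercase letters, digits, hyphens
ID_PATTERN = re.compile(r"^(user|session)-[a-z0-9-]+$")


def is_valid_id(value: str) -> bool:
    """Return True when the ID follows the assumed naming convention."""
    return bool(ID_PATTERN.match(value))


@dataclass
class SessionTracker:
    """Assigns session IDs, rotating them after a period of inactivity."""

    timeout_seconds: float = 1800.0
    # Maps user_id -> (session_id, time of last activity)
    _sessions: dict = field(default_factory=dict)

    def session_for(self, user_id: str, now: float) -> str:
        current = self._sessions.get(user_id)
        if current is not None and now - current[1] <= self.timeout_seconds:
            session_id = current[0]  # Still within the inactivity window
        else:
            session_id = f"session-{uuid.uuid4().hex}"  # Start a new session
        self._sessions[user_id] = (session_id, now)
        return session_id
```

Passing the current time explicitly (for example time.time()) keeps the boundary rule easy to test. A multi-process deployment would need shared storage instead of the in-memory dictionary.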

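Integration with other MLflow features

User and session tracking integrates with other MLflow features:

Evaluation - Compare quality metrics across user segments to identify areas for improvement
Production monitoring - Track performance patterns by user cohort or session type
Feedback collection - Associate user feedback with specific sessions for quality analysis
Building evaluation datasets - Create targeted datasets from specific user sessions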

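Production considerations

For a comprehensive production implementation, see the guide on production observability with tracing, which covers:

Setting up user and session tracking in production
Combining session IDs with request IDs for end-to-end traceability
Implementing feedback collection across entire sessions
Best practices for high-volume session management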

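Next steps

Continue with these recommended actions and tutorials.

Track environments and context - Add deployment and environment metadata to traces
Collect user feedback - Capture quality signals from users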

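Reference guides

See the detailed documentation for the concepts and features mentioned in this guide.

Tracing data model - Learn about metadata, tags, and trace structure
Query traces via SDK - Learn advanced querying techniques
Production monitoring concepts - Explore monitoring patterns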
