Azure OpenAI stored completions and distillation (preview)

2025-05-25

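Stored completions let you capture the conversation history from chat completions sessions so you can use it as a dataset for evaluations and fine-tuning.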

Stored completions support

API support

Support was first added in API version 2024-10-01-preview. Use 2025-02-01-preview or later to access the latest features.
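As a rough illustration of the version guidance above, the following sketch compares the date prefix of an api-version string against the preview release in which stored completions first appeared. The helper name supports_stored_completions is hypothetical, and the sketch assumes preview versions follow the YYYY-MM-DD-preview pattern. It deliberately returns False for versions without the -preview suffix, because this article only mentions preview API versions.

```python
from datetime import date

# First preview API version that added stored completions, per this article.
FIRST_SUPPORTED = date(2024, 10, 1)


def supports_stored_completions(api_version: str) -> bool:
    """Return True if a preview api-version is dated on or after 2024-10-01.

    Assumption: versions look like "YYYY-MM-DD-preview". Versions without
    the "-preview" suffix are treated as unsupported because the article
    does not say whether they include the feature.
    """
    suffix = "-preview"
    if not api_version.endswith(suffix):
        return False
    try:
        released = date.fromisoformat(api_version[: -len(suffix)])
    except ValueError:
        return False  # not a date-prefixed version string
    return released >= FIRST_SUPPORTED


print(supports_stored_completions("2025-02-01-preview"))  # True
print(supports_stored_completions("2024-08-01-preview"))  # False
```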

Deployment type

Stored completions are supported for all Azure OpenAI deployment types: standard, global, data zone, and provisioned.

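Model and region availability

Stored completions are available whenever you use the Chat Completions API for inference. They are supported for all Azure OpenAI models and in all supported regions, including global-only regions.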

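Configure stored completions

To enable stored completions for your Azure OpenAI deployment, set the store parameter to True. Use the metadata parameter to enrich your stored completion dataset with additional information.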

import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT"), 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

completion = client.chat.completions.create(
    model="gpt-4o",  # replace with your model deployment name
    store=True,  # save this completion as a stored completion
    metadata={
        "user": "admin",
        "category": "docs-test",
    },
    messages=[
    {"role": "system", "content": "Provide a clear and concise summary of the technical content, highlighting key concepts and their relationships. Focus on the main ideas and practical implications."},
    {"role": "user", "content": "Ensemble methods combine multiple machine learning models to create a more robust and accurate predictor. Common techniques include bagging (training models on random subsets of data), boosting (sequentially training models to correct previous errors), and stacking (using a meta-model to combine base model predictions). Random Forests, a popular bagging method, create multiple decision trees using random feature subsets. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of previous trees. These methods often achieve better performance than single models by reducing overfitting and variance while capturing different aspects of the data."}
    ]   
)

print(completion.choices[0].message)

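Important

Use API keys with caution. Don't include the API key directly in your code, and never post it publicly. If you use an API key, store it securely in Azure Key Vault. For more information about using API keys securely in your apps, see API keys with Azure Key Vault.

For more information about AI services security, see Authenticate requests to Azure AI services.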

import os
from openai import AzureOpenAI
    
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),  
    api_version="2025-02-01-preview",
    azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    )

completion = client.chat.completions.create(
    model="gpt-4o",  # replace with your model deployment name
    store=True,  # save this completion as a stored completion
    metadata={
        "user": "admin",
        "category": "docs-test",
    },
    messages=[
    {"role": "system", "content": "Provide a clear and concise summary of the technical content, highlighting key concepts and their relationships. Focus on the main ideas and practical implications."},
    {"role": "user", "content": "Ensemble methods combine multiple machine learning models to create a more robust and accurate predictor. Common techniques include bagging (training models on random subsets of data), boosting (sequentially training models to correct previous errors), and stacking (using a meta-model to combine base model predictions). Random Forests, a popular bagging method, create multiple decision trees using random feature subsets. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of previous trees. These methods often achieve better performance than single models by reducing overfitting and variance while capturing different aspects of the data."}
    ]   
)

print(completion.choices[0].message)

Microsoft Entra ID

curl "$AZURE_OPENAI_ENDPOINT/openai/deployments/gpt-4o/chat/completions?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
  -d '{
    "model": "gpt-4o",
    "store": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

API key

curl "$AZURE_OPENAI_ENDPOINT/openai/deployments/gpt-4o/chat/completions?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "store": true,
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

{
  "id": "chatcmpl-B4eQ716S5wGUyFpGgX2MXnJEC5AW5",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Ensemble methods enhance machine learning performance by combining multiple models to create a more robust and accurate predictor. The key techniques include:\n\n1. **Bagging (Bootstrap Aggregating)**: Involves training multiple models on random subsets of the data to reduce variance and overfitting. A popular method within bagging is Random Forests, which build numerous decision trees using random subsets of features and data samples.\n\n2. **Boosting**: Focuses on sequentially training models, where each new model attempts to correct the errors made by previous ones. Gradient Boosting is a common boosting technique that builds trees sequentially, concentrating on the mistakes of earlier trees to improve accuracy.\n\n3. **Stacking**: Uses a meta-model to combine predictions from various base models, leveraging their strengths to enhance overall predictions.\n\nThese ensemble methods generally outperform individual models because they effectively handle overfitting, reduce variance, and capture diverse aspects of the data. In practical applications, they are valued for their ability to improve model accuracy and stability.",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      },
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "protected_material_code": {
          "filtered": false,
          "detected": false
        },
        "protected_material_text": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ],
  "created": 1740448387,
  "model": "gpt-4o-2024-08-06",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_b705f0c291",
  "usage": {
    "completion_tokens": 205,
    "prompt_tokens": 157,
    "total_tokens": 362,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0
    }
  },
  "prompt_filter_results": [
    {
      "prompt_index": 0,
      "content_filter_results": {
        "hate": {
          "filtered": false,
          "severity": "safe"
        },
        "jailbreak": {
          "filtered": false,
          "detected": false
        },
        "self_harm": {
          "filtered": false,
          "severity": "safe"
        },
        "sexual": {
          "filtered": false,
          "severity": "safe"
        },
        "violence": {
          "filtered": false,
          "severity": "safe"
        }
      }
    }
  ]
}

After stored completions are enabled for an Azure OpenAI deployment, they begin to appear in the Stored Completions pane of the Azure AI Foundry portal.
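To keep the store and metadata settings consistent across calls, one option is to assemble the request arguments in a small helper. The sketch below is illustrative only: build_stored_request is a hypothetical name, and the check that metadata keys and values are strings is an assumption based on how the OpenAI Python library types the metadata parameter.

```python
from typing import Any, Dict, List


def build_stored_request(
    deployment: str,
    messages: List[Dict[str, str]],
    metadata: Dict[str, str],
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper: it always sets store=True and rejects metadata
    entries that are not plain strings (an assumption, not a documented
    service rule).
    """
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"metadata entry {key!r} must map str to str")
    return {
        "model": deployment,  # your model deployment name
        "store": True,
        "metadata": dict(metadata),
        "messages": list(messages),
    }


request = build_stored_request(
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    {"user": "admin", "category": "docs-test"},
)
print(request["store"], request["metadata"])
```

The resulting dictionary could then be passed as client.chat.completions.create(**request), using a client configured as in the earlier examples.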

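Distillation

Distillation lets you turn your stored completions into a fine-tuning dataset. A common use case is to collect stored completions from a larger, more powerful model on a particular task, and then use those stored completions to train a smaller model on high-quality examples of model interactions.

Distillation requires a minimum of 10 stored completions, though we recommend providing hundreds to thousands of stored completions for best results.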

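In the Stored Completions pane of the Azure AI Foundry portal, use the Filter options to select the completions you want to train your model with.
To begin distillation, select Distill.
Select the model you want to fine-tune with your stored completions dataset.
Confirm which version of the model you want to fine-tune.
A .jsonl file with a randomly generated name is created from your stored completions as the training dataset. Select the file, then select Next.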

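Note

Stored completion distillation training files can't be accessed directly and can't be exported or downloaded externally.

The rest of the steps correspond to the typical Azure OpenAI fine-tuning steps. To learn more, see our fine-tuning getting started guide.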
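The generated training file can't be inspected, but it may help to see roughly what a chat fine-tuning record looks like. The sketch below assumes the standard chat fine-tuning JSONL layout (one JSON object per line with a messages array) and shows how the input messages and the assistant reply of a stored completion could map onto it. The helper to_training_line is hypothetical, and the file the portal actually produces may differ.

```python
import json
from typing import Dict, List


def to_training_line(input_messages: List[Dict], completion: Dict) -> str:
    """Serialize one stored completion as a single JSONL training record."""
    reply = completion["choices"][0]["message"]
    # Keep only role and content from each input message.
    messages = [
        {"role": m["role"], "content": m["content"]} for m in input_messages
    ]
    # The assistant reply becomes the final message of the example.
    messages.append({"role": reply["role"], "content": reply["content"]})
    return json.dumps({"messages": messages}, ensure_ascii=False)


line = to_training_line(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    {"choices": [{"message": {"role": "assistant", "content": "Hi there!"}}]},
)
print(line)
```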

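Evaluation

Evaluating large language models is a critical step in measuring their performance across tasks and dimensions. It is especially important for fine-tuned models, where assessing the performance gains (or losses) from training is crucial. Thorough evaluations help you understand how different versions of a model might affect your application or scenario.

Stored completions can be used as a dataset for running evaluations.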

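In the Stored Completions pane of the Azure AI Foundry portal, use the Filter options to select the completions you want to include in your evaluation dataset.
To configure the evaluation, select Evaluate.
This opens the Evaluations pane with a prepopulated .jsonl file, with a randomly generated name, that is created from your stored completions to serve as the evaluation dataset.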

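Note

Stored completion evaluation data files can't be accessed directly and can't be exported or downloaded externally.

To learn more about evaluation, see Getting started with evaluations.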

Stored completions API

To access the stored completions API commands, you might need to upgrade your version of the OpenAI library.

pip install --upgrade openai

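List stored completions

Additional parameters:

metadata: Filter by the key/value pairs in the stored completions.
after: Identifier for the last stored completion from the previous pagination request.
limit: Number of stored completions to retrieve.
order: Sort order of the results by index (ascending or descending).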

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

response = client.chat.completions.list()

print(response.model_dump_json(indent=2))

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"), 
  api_version="2025-02-01-preview"
)

response = client.chat.completions.list()

print(response.model_dump_json(indent=2))

Microsoft Entra ID

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN"

API key

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"

{
  "data": [
    {
      "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
      "choices": [
        {
          "finish_reason": null,
          "index": 0,
          "logprobs": null,
          "message": {
            "content": "Ensemble methods enhance machine learning performance by combining multiple models to create a more robust and accurate predictor. The key techniques include:\n\n1. **Bagging (Bootstrap Aggregating):** This involves training models on random subsets of the data to reduce variance and prevent overfitting. Random Forests, a popular bagging method, build multiple decision trees using random feature subsets, leading to robust predictions.\n\n2. **Boosting:** This sequential approach trains models to correct the errors of their predecessors, thereby focusing on difficult-to-predict data points. Gradient Boosting is a common implementation that sequentially builds decision trees, each improving upon the prediction errors of the previous ones.\n\n3. **Stacking:** This technique uses a meta-model to combine the predictions of multiple base models, leveraging their diverse strengths to enhance overall prediction accuracy.\n\nThe practical implications of ensemble methods include achieving superior model performance compared to single models by capturing various data patterns and reducing overfitting and variance. These methods are widely used in applications where high accuracy and model reliability are critical.",
            "refusal": null,
            "role": "assistant",
            "audio": null,
            "function_call": null,
            "tool_calls": null
          }
        }
      ],
      "created": 1740447656,
      "model": "gpt-4o-2024-08-06",
      "object": null,
      "service_tier": null,
      "system_fingerprint": "fp_b705f0c291",
      "usage": {
        "completion_tokens": 208,
        "prompt_tokens": 157,
        "total_tokens": 365,
        "completion_tokens_details": null,
        "prompt_tokens_details": null
      },
      "request_id": "0000aaaa-11bb-cccc-dd22-eeeeee333333",
      "seed": -430976584126747957,
      "top_p": 1,
      "temperature": 1,
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "metadata": {
        "user": "admin",
        "category": "docs-test"
      }
    }
  ],
  "has_more": false,
  "object": "list",
  "total": 1,
  "first_id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
  "last_id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u"
}
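The after and limit parameters, together with the has_more and last_id fields in the response above, suggest a cursor-style pagination loop. The sketch below shows that loop against a stand-in backend so it runs without a service call. The names iter_stored_completions and fake_list_page are hypothetical, and the loop assumes each page has the shape shown in the sample response.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_stored_completions(
    list_page: Callable[[Optional[str], int], Dict],
    limit: int = 20,
) -> Iterator[Dict]:
    """Yield every stored completion by following `after` cursors.

    `list_page(after, limit)` is a hypothetical callable returning one page
    shaped like the sample response: {"data": [...], "has_more": bool,
    "last_id": str}.
    """
    after: Optional[str] = None
    while True:
        page = list_page(after, limit)
        yield from page["data"]
        if not page.get("has_more") or not page["data"]:
            break
        after = page["last_id"]  # cursor for the next request


# Stand-in backend holding three items, served in pages of `limit`.
_ITEMS = [{"id": f"chatcmpl-{n}"} for n in range(3)]


def fake_list_page(after: Optional[str], limit: int) -> Dict:
    ids = [item["id"] for item in _ITEMS]
    start = 0 if after is None else ids.index(after) + 1
    chunk = _ITEMS[start : start + limit]
    return {
        "data": chunk,
        "has_more": start + limit < len(_ITEMS),
        "last_id": chunk[-1]["id"] if chunk else None,
    }


print([c["id"] for c in iter_stored_completions(fake_list_page, limit=2)])
```

With a real client, list_page could wrap client.chat.completions.list(...), passing after only when it is not None and converting the result with model_dump(). Recent versions of the OpenAI Python library may also paginate automatically when you iterate over the returned page object, in which case a manual loop like this isn't needed.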

Get stored completion

Get a stored completion by ID.

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/", 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

response = client.chat.completions.retrieve("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u")

print(response.model_dump_json(indent=2))

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"), 
  api_version="2025-02-01-preview"
)

response = client.chat.completions.retrieve("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u")

print(response.model_dump_json(indent=2))

Microsoft Entra ID

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN"

API key

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"

{
  "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Ensemble methods enhance machine learning performance by combining multiple models to create a more robust and accurate predictor. The key techniques include:\n\n1. **Bagging (Bootstrap Aggregating):** This involves training models on random subsets of the data to reduce variance and prevent overfitting. Random Forests, a popular bagging method, build multiple decision trees using random feature subsets, leading to robust predictions.\n\n2. **Boosting:** This sequential approach trains models to correct the errors of their predecessors, thereby focusing on difficult-to-predict data points. Gradient Boosting is a common implementation that sequentially builds decision trees, each improving upon the prediction errors of the previous ones.\n\n3. **Stacking:** This technique uses a meta-model to combine the predictions of multiple base models, leveraging their diverse strengths to enhance overall prediction accuracy.\n\nThe practical implications of ensemble methods include achieving superior model performance compared to single models by capturing various data patterns and reducing overfitting and variance. These methods are widely used in applications where high accuracy and model reliability are critical.",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1740447656,
  "model": "gpt-4o-2024-08-06",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_b705f0c291",
  "usage": {
    "completion_tokens": 208,
    "prompt_tokens": 157,
    "total_tokens": 365,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "request_id": "0000aaaa-11bb-cccc-dd22-eeeeee333333",
  "seed": -430976584126747957,
  "top_p": 1,
  "temperature": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "metadata": {
    "user": "admin",
    "category": "docs-test"
  }
}
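Once a stored completion has been retrieved, it is often convenient to reduce it to the few fields you care about. The following sketch works on a plain dictionary with the same field names as the sample response above (for example, the result of response.model_dump()). The helper summarize_stored_completion is hypothetical, and the sample dictionary is abbreviated.

```python
from typing import Any, Dict


def summarize_stored_completion(completion: Dict[str, Any]) -> Dict[str, Any]:
    """Pull a few commonly inspected fields out of a stored completion."""
    usage = completion.get("usage") or {}
    reply = completion["choices"][0]["message"]["content"]
    return {
        "id": completion["id"],
        "model": completion.get("model"),
        "total_tokens": usage.get("total_tokens"),
        "metadata": completion.get("metadata") or {},
        "reply_preview": reply[:40],  # first 40 characters of the reply
    }


sample = {
    "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
    "choices": [
        {"message": {"role": "assistant", "content": "Ensemble methods enhance machine learning performance."}}
    ],
    "model": "gpt-4o-2024-08-06",
    "usage": {"completion_tokens": 208, "prompt_tokens": 157, "total_tokens": 365},
    "metadata": {"user": "admin", "category": "docs-test"},
}
print(summarize_stored_completion(sample))
```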

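Get stored chat completion messages

Additional parameters:

after: Identifier for the last stored completion message from the previous pagination request.
limit: Number of stored completion messages to retrieve.
order: Sort order of the results by index (ascending or descending).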

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/", 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

response = client.chat.completions.messages.list("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u", limit=2)

print(response.model_dump_json(indent=2))

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"), 
  api_version="2025-02-01-preview"
)

response = client.chat.completions.messages.list("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u", limit=2)

print(response.model_dump_json(indent=2))

Microsoft Entra ID

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u/messages?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN"

API key

curl "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u/messages?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"

{
  "data": [
    {
      "content": "Provide a clear and concise summary of the technical content, highlighting key concepts and their relationships. Focus on the main ideas and practical implications.",
      "refusal": null,
      "role": "system",
      "audio": null,
      "function_call": null,
      "tool_calls": null,
      "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u-0"
    },
    {
      "content": "Ensemble methods combine multiple machine learning models to create a more robust and accurate predictor. Common techniques include bagging (training models on random subsets of data), boosting (sequentially training models to correct previous errors), and stacking (using a meta-model to combine base model predictions). Random Forests, a popular bagging method, create multiple decision trees using random feature subsets. Gradient Boosting builds trees sequentially, with each tree focusing on correcting the errors of previous trees. These methods often achieve better performance than single models by reducing overfitting and variance while capturing different aspects of the data.",
      "refusal": null,
      "role": "user",
      "audio": null,
      "function_call": null,
      "tool_calls": null,
      "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u-1"
    }
  ],
  "has_more": false,
  "object": "list",
  "total": 2,
  "first_id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u-0",
  "last_id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u-1"
}
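In the sample response above, each message id is the completion ID followed by a numeric index (-0, -1). Assuming that pattern holds in general, the sketch below orders messages by that suffix and renders them as a readable transcript. The helper names and the shortened IDs in the sample data are hypothetical.

```python
from typing import Dict, List


def message_index(message: Dict) -> int:
    """Return the numeric suffix of an id such as 'chatcmpl-...-1'."""
    return int(message["id"].rsplit("-", 1)[1])


def format_transcript(messages: List[Dict]) -> str:
    """Order stored messages by index and render 'role: content' lines."""
    ordered = sorted(messages, key=message_index)
    return "\n".join(f'{m["role"]}: {m["content"]}' for m in ordered)


# Sample page with the messages deliberately out of order.
page = {
    "data": [
        {"id": "chatcmpl-abc-1", "role": "user", "content": "Hello!"},
        {"id": "chatcmpl-abc-0", "role": "system", "content": "You are a helpful assistant."},
    ]
}
print(format_transcript(page["data"]))
```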

Update stored chat completion

Add metadata key:value pairs to an existing stored completion.

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/", 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

response = client.chat.completions.update(
    "chatcmpl-C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w",
    metadata={"fizz": "buzz"}
)

print(response.model_dump_json(indent=2))

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"), 
  api_version="2025-02-01-preview"
)

response = client.chat.completions.update(
    "chatcmpl-C2dE3fH4iJ5kL6mN7oP8qR9sT0uV1w",
    metadata={"fizz": "buzz"}
)

print(response.model_dump_json(indent=2))

Microsoft Entra ID

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
  -d '{
    "metadata": {
      "fizz": "buzz"
    }
  }'

API key

curl -X POST "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY" \
  -d '{
    "metadata": {
      "fizz": "buzz"
    }
  }'

{
  "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
  "choices": [
    {
      "finish_reason": null,
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "Ensemble methods enhance machine learning performance by combining multiple models to create a more robust and accurate predictor. The key techniques include:\n\n1. **Bagging (Bootstrap Aggregating):** This involves training models on random subsets of the data to reduce variance and prevent overfitting. Random Forests, a popular bagging method, build multiple decision trees using random feature subsets, leading to robust predictions.\n\n2. **Boosting:** This sequential approach trains models to correct the errors of their predecessors, thereby focusing on difficult-to-predict data points. Gradient Boosting is a common implementation that sequentially builds decision trees, each improving upon the prediction errors of the previous ones.\n\n3. **Stacking:** This technique uses a meta-model to combine the predictions of multiple base models, leveraging their diverse strengths to enhance overall prediction accuracy.\n\nThe practical implications of ensemble methods include achieving superior model performance compared to single models by capturing various data patterns and reducing overfitting and variance. These methods are widely used in applications where high accuracy and model reliability are critical.",
        "refusal": null,
        "role": "assistant",
        "audio": null,
        "function_call": null,
        "tool_calls": null
      }
    }
  ],
  "created": 1740447656,
  "model": "gpt-4o-2024-08-06",
  "object": "chat.completion",
  "service_tier": null,
  "system_fingerprint": "fp_b705f0c291",
  "usage": {
    "completion_tokens": 208,
    "prompt_tokens": 157,
    "total_tokens": 365,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "request_id": "0000aaaa-11bb-cccc-dd22-eeeeee333333",
  "seed": -430976584126747957,
  "top_p": 1,
  "temperature": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "metadata": {
    "user": "admin",
    "category": "docs-test",
    "fizz": "buzz"
  }
}
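The sample response above shows the new fizz key alongside the existing user and category keys. If you prefer not to rely on how the service combines metadata, one defensive pattern is to retrieve the current metadata, merge it locally, and send the full set. A minimal sketch, with merge_metadata as a hypothetical helper:

```python
from typing import Dict


def merge_metadata(existing: Dict[str, str], additions: Dict[str, str]) -> Dict[str, str]:
    """Return a new dict with `additions` layered over `existing`."""
    merged = dict(existing)  # copy so the caller's dict is not mutated
    merged.update(additions)  # new values win when a key already exists
    return merged


current = {"user": "admin", "category": "docs-test"}
print(merge_metadata(current, {"fizz": "buzz"}))
```

The merged dictionary could then be passed as the metadata argument of client.chat.completions.update(...).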

Delete stored chat completion

Delete a stored completion by completion ID.

Microsoft Entra ID

from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/", 
  azure_ad_token_provider=token_provider,
  api_version="2025-02-01-preview"
)

response = client.chat.completions.delete("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u")

print(response.model_dump_json(indent=2))

import os
from openai import AzureOpenAI

client = AzureOpenAI(
  azure_endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com", 
  api_key=os.getenv("AZURE_OPENAI_API_KEY"), 
  api_version="2025-02-01-preview"
)

response = client.chat.completions.delete("chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u")

print(response.model_dump_json(indent=2))

Microsoft Entra ID

curl -X DELETE -D - "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN"

API key

curl -X DELETE -D - "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/chat/completions/chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u?api-version=2025-02-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $AZURE_OPENAI_API_KEY"

{
  "id": "chatcmpl-A1bC2dE3fH4iJ5kL6mN7oP8qR9sT0u",
  "deleted": true,
  "object": "chat.completion.deleted"
}

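Troubleshooting

Do I need special permissions to use stored completions?

Stored completions access is controlled through two DataActions:

Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/read
Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/action

By default, the Cognitive Services OpenAI Contributor role has access to both of these permissions.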
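If the built-in role is broader than you need, the two DataActions could in principle be granted through a custom role. The sketch below builds a role definition document in the general shape used by Azure custom roles. The role name, description, and subscription ID placeholder are hypothetical, and whether these two DataActions alone are sufficient for your scenario is an assumption, so confirm the schema and required permissions against the Azure RBAC documentation before using it.

```python
import json

# The two DataActions named in this article.
STORED_COMPLETIONS_DATA_ACTIONS = [
    "Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/read",
    "Microsoft.CognitiveServices/accounts/OpenAI/stored-completions/action",
]


def build_role_definition(subscription_id: str) -> dict:
    """Return a custom role definition granting only the stored completions DataActions."""
    return {
        "Name": "Stored Completions User (example)",  # hypothetical role name
        "IsCustom": True,
        "Description": "Example role limited to stored completions data actions.",
        "Actions": [],
        "NotActions": [],
        "DataActions": list(STORED_COMPLETIONS_DATA_ACTIONS),
        "NotDataActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


# Placeholder subscription ID; replace with your own.
role = build_role_definition("00000000-0000-0000-0000-000000000000")
print(json.dumps(role, indent=2))
```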

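How do I delete stored data?

Data can be deleted by deleting the associated Azure OpenAI resource. If you want to delete only the stored completions data, you must open a case with customer support.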

How much stored completion data can I store?

You can store a maximum of 10 GB of data.

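Can I prevent stored completions from being enabled on a subscription?

You need to open a case with customer support to disable stored completions at the subscription level.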

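TypeError: Completions.create() got an unexpected keyword argument 'store'

This error occurs when you're running an older version of the OpenAI client library that predates the release of the stored completions feature. Run pip install openai --upgrade.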
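One way to detect this situation before making a request is to inspect the function signature rather than compare version numbers. The sketch below uses the standard inspect module with two stand-in functions so that it runs without the OpenAI library installed. The helper accepts_parameter and the stand-ins old_create and new_create are hypothetical.

```python
import inspect
from typing import Callable


def accepts_parameter(func: Callable, name: str) -> bool:
    """Return True if `func` declares `name` or accepts arbitrary **kwargs."""
    params = inspect.signature(func).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for an older and a newer create() signature.
def old_create(*, model, messages):
    ...


def new_create(*, model, messages, store=None, metadata=None):
    ...


print(accepts_parameter(old_create, "store"))  # False
print(accepts_parameter(new_create, "store"))  # True
```

With the library installed, the same check could be applied to client.chat.completions.create. If it returns False, upgrading the package as described above is the likely fix.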
