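Troubleshoot high data ingestion in Application Insights

Article
2025-04-30

An increase in billing charges for Application Insights or Log Analytics is usually caused by high data ingestion. This article helps you troubleshoot the issue and describes ways to reduce data ingestion costs.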

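General troubleshooting steps

Step 1: Identify the resources that ingest large amounts of data

In the Azure portal, go to your subscription and select Cost Management > Cost analysis. This view charts the cost of each resource, as shown in the following screenshot:

Screenshot that shows the "Cost analysis" blade.

Step 2: Identify the tables with costly data ingestion

After you identify an Application Insights resource or a Log Analytics workspace, analyze its data to find where ingestion is highest. Choose the approach that best fits your situation:

Based on raw record count

Use the following query to compare record counts across tables: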
```
search *
| where timestamp > ago(7d)
| summarize count() by $table
| sort by count_ desc
```
This query helps identify the noisiest tables. From there, you can refine your queries to narrow down the investigation.
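
One possible refinement is to split the record count by day, so that a table that recently became noisy stands out from one that has always been large. The following is a minimal sketch, not part of the original guidance. It assumes the classic Application Insights schema (a `timestamp` column) and uses `union withsource=` instead of `search *`; the column names `TableName` and `RecordCount` are illustrative.

```kusto
// Daily record count per table over the last 7 days
union withsource=TableName *
| where timestamp > ago(7d)
| summarize RecordCount = count() by TableName, bin(timestamp, 1d)
| order by TableName asc, timestamp asc
```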

Based on consumed bytes

Use the format_bytes() scalar function to find the tables with the highest byte ingestion:

systemEvents
| where timestamp > ago(7d)
| where type == "Billing"
| extend BillingTelemetryType = tostring(dimensions["BillingTelemetryType"])
| extend BillingTelemetrySizeInBytes = todouble(measurements["BillingTelemetrySize"])
// Total billed bytes per telemetry type
| summarize TotalBillingTelemetrySize = sum(BillingTelemetrySizeInBytes) by BillingTelemetryType
| extend BillingTelemetrySizeGB = format_bytes(TotalBillingTelemetrySize, 1, "GB")
// Sort on the numeric total, then drop it and keep only the formatted value
| sort by TotalBillingTelemetrySize desc
| project-away TotalBillingTelemetrySize

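As with the record count query, the preceding query helps identify the most active tables so that you can pick specific tables for further investigation.

Using Log Analytics workspace workbooks

In the Azure portal, go to your Log Analytics workspace, select Monitoring > Workbooks, and then select Usage under Log Analytics Workspace Insights.

This workbook provides useful insights, such as the percentage of data ingestion for each table and detailed ingestion statistics for each resource that reports to the same workspace.

Step 3: Determine the factors that contribute to high data ingestion

After you identify the tables with high data ingestion, focus on the most active table and determine what drives its ingestion. The cause might be a specific application that generates more data than the others, an exception message that's logged too frequently, or a new logger category that emits too much information.

Here are some sample queries that you can use for this analysis: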

requests
| where timestamp > ago(7d)
| summarize count() by cloud_RoleInstance
| sort by count_ desc

requests
| where timestamp > ago(7d)
| summarize count() by operation_Name
| sort by count_ desc

dependencies
| where timestamp > ago(7d)
| summarize count() by cloud_RoleName
| sort by count_ desc

dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc

traces
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc

exceptions
| where timestamp > ago(7d)
| summarize count() by message
| sort by count_ desc
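
When traces dominate, the severity level is another field worth grouping on, because a large share of verbose or informational records usually points to a log level that's set too low. The following sketch follows the same pattern as the preceding queries and assumes the classic `traces` schema, where `severityLevel` is a numeric column.

```kusto
// Trace volume per application and severity level
traces
| where timestamp > ago(7d)
| summarize count() by cloud_RoleName, severityLevel
| sort by count_ desc
```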

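You can try different telemetry fields. For example, you might first run the following query and find no obvious cause for the excessive telemetry: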

dependencies
| where timestamp > ago(7d)
| summarize count() by target
| sort by count_ desc

In that case, try another telemetry field instead of target, such as type:

dependencies
| where timestamp > ago(7d)
| summarize count() by type
| sort by count_ desc

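In some cases, you might need to investigate a specific application or instance further. Use the following queries to identify noisy messages or exception types: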

traces
| where timestamp > ago(7d)
| where cloud_RoleName == 'Specify a role name'
| summarize count() by message
| sort by count_ desc

exceptions
| where timestamp > ago(7d)
| where cloud_RoleInstance == 'Specify a role instance'
| summarize count() by type
| sort by count_ desc
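
Grouping by the full message text can scatter one noisy log statement across many rows when the message contains variable parts such as IDs or timestamps. One way to work around that, shown here as a sketch, is to group on a fixed-length prefix. The 80-character length and the `MessagePrefix` column name are arbitrary choices for illustration.

```kusto
// Group trace messages by their first 80 characters
traces
| where timestamp > ago(7d)
| where cloud_RoleName == 'Specify a role name'
| extend MessagePrefix = substring(message, 0, 80)
| summarize count() by MessagePrefix
| sort by count_ desc
```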

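Step 4: Investigate how ingestion evolves over time

For the factors you identified earlier, examine how ingestion changes over time. This shows whether the behavior has been consistent or whether it changed at a specific point. Knowing when the change occurred makes the cause of the high data ingestion clearer, which is important for resolving the issue and implementing an effective solution.

In the following query, the bin() Kusto Query Language (KQL) scalar function segments the data into one-day intervals. This approach helps with trend analysis because you can see how the data has or hasn't changed over time.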

dependencies
| where timestamp > ago(30d)
| summarize count() by bin(timestamp, 1d), operation_Name
| sort by timestamp desc
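
A daily breakdown is often easier to read as a chart than as a table. As a sketch of the same idea, the following query groups by dependency type instead of operation name (an arbitrary choice for illustration) and plots one series per type with the `render` operator.

```kusto
// Daily dependency count per type, drawn as a time chart
dependencies
| where timestamp > ago(30d)
| summarize count() by bin(timestamp, 1d), type
| render timechart
```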

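Use the min() aggregation function to find the earliest recorded timestamp for a specific factor. This approach helps establish a baseline and shows when an event or change first occurred.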

dependencies
| where timestamp > ago(30d)
| where type == 'Specify dependency type being investigated'
| summarize min(timestamp) by type
| sort by min_timestamp desc
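
If you don't yet know which dependency type to investigate, a variation is to compute the first and last occurrence for every type at once, together with the record count. This is a sketch; the column names `FirstSeen`, `LastSeen`, and `RecordCount` are illustrative.

```kusto
// First and last occurrence and volume of each dependency type
dependencies
| where timestamp > ago(30d)
| summarize FirstSeen = min(timestamp), LastSeen = max(timestamp), RecordCount = count() by type
| sort by FirstSeen desc
```

Types whose `FirstSeen` value falls well inside the 30-day window are candidates for newly introduced telemetry.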

Troubleshooting steps for specific scenarios

Scenario 1: High data ingestion in Log Analytics

Query all tables in the Log Analytics workspace:

search *
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| summarize TotalBilledSize = sum(_BilledSize) by $table
| extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
| sort by TotalBilledSize desc
| project-away TotalBilledSize
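
Because `search *` scans every table, it can be slow on large workspaces. A lighter alternative, sketched here under the assumption that the workspace's standard `Usage` table is populated, reads the pre-aggregated usage records instead. `Quantity` in that table is reported in megabytes, so the query divides by 1,000 to get gigabytes; `IngestedGB` is an illustrative column name.

```kusto
// Billable ingestion per table from the Usage table
Usage
| where TimeGenerated > ago(7d)
| where IsBillable == true
| summarize IngestedGB = sum(Quantity) / 1000 by DataType
| sort by IngestedGB desc
```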

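The result shows which table contributes most to the cost. Here's an example where it's AppTraces:

Screenshot that shows that the AppTraces table is the biggest contributor to costs.

Query the specific application that drives the cost of traces: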

AppTraces
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| summarize TotalBilledSize = sum(_BilledSize) by  AppRoleName
| extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
| sort by TotalBilledSize desc
| project-away TotalBilledSize

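Screenshot that shows the specific application driving the cost of traces.

Run the following query, which is specific to that application, to drill down into the logger categories that send telemetry to the AppTraces table: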

AppTraces
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| where AppRoleName contains 'transformation'
| extend LoggerCategory = Properties['Category']
| summarize TotalBilledSize = sum(_BilledSize) by tostring(LoggerCategory)
| extend IngestedVolumeGB = format_bytes(TotalBilledSize, 1, "GB")
| sort by TotalBilledSize desc
| project-away TotalBilledSize

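The result shows the two main categories responsible for the cost:

Screenshot that shows the specific logger categories sending telemetry to the AppTraces table.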
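
To see whether those categories have always been expensive or only recently grew, the same query can be split by day. The following sketch reuses the filter from the preceding query and charts the billed size per logger category.

```kusto
// Daily billed size per logger category for the same application
AppTraces
| where TimeGenerated > ago(7d)
| where _IsBillable == true
| where AppRoleName contains 'transformation'
| extend LoggerCategory = tostring(Properties['Category'])
| summarize TotalBilledSize = sum(_BilledSize) by LoggerCategory, bin(TimeGenerated, 1d)
| render timechart
```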

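Scenario 2: High data ingestion in Application Insights

To determine the factors that contribute to the cost, follow these steps:

Query the telemetry across all tables and get a record count per table and SDK version: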
```
search *
| where TimeGenerated > ago(7d)
| summarize count() by $table, SDKVersion
| sort by count_ desc
```
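The following example shows that Azure Functions is generating a large amount of trace and exception telemetry:

Run the following query to find the specific app that generates more traces than the others: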

AppTraces
| where TimeGenerated > ago(7d)
| where SDKVersion == 'azurefunctions: 4.34.2.22820'
| summarize count() by AppRoleName
| sort by count_ desc

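Screenshot that shows which app is generating the most traces.

Refine the query to include only that app and get a record count for each individual message: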

AppTraces
| where TimeGenerated > ago(7d)
| where SDKVersion == 'azurefunctions: 4.34.2.22820'
| where AppRoleName contains 'inbound'
| summarize count() by Message
| sort by count_ desc

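The result shows the specific messages that increase ingestion costs:

Screenshot that shows the record count for each message.

Scenario 3: Daily cap reached unexpectedly

Suppose you unexpectedly reached the daily cap on September 4. Use the following query to count custom events and identify the earliest timestamp associated with each event name: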

customEvents
| where timestamp between(datetime(2024-08-25) .. 15d)
| summarize count(), min(timestamp) by name

This analysis indicates that certain events started being ingested on September 4 and quickly became very noisy.
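
To confirm how quickly those events grew, you can chart the daily count per event name over the same window. This is a sketch that reuses the date range from the preceding query.

```kusto
// Daily count of each custom event around the date the cap was reached
customEvents
| where timestamp between (datetime(2024-08-25) .. 15d)
| summarize count() by bin(timestamp, 1d), name
| render timechart
```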

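Reduce data ingestion costs

After you identify the factors in the Azure Monitor tables that are responsible for the unexpected data ingestion, use the following methods to reduce data ingestion costs, depending on your scenario.

Method 1: Update the daily cap configuration

Adjust the daily cap to prevent excessive telemetry ingestion.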
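
Before choosing a cap value, it can help to know the typical and peak daily volume. The following sketch assumes a workspace-based resource whose `Usage` table is available. `Quantity` is reported in megabytes, and the column names `DailyGB`, `AvgDailyGB`, and `MaxDailyGB` are illustrative.

```kusto
// Average and peak billable ingestion per day over the last 30 days
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize DailyGB = sum(Quantity) / 1000 by bin(StartTime, 1d)
| summarize AvgDailyGB = avg(DailyGB), MaxDailyGB = max(DailyGB)
```

A cap set far below the average would drop telemetry on most days, so treat these numbers as a starting point rather than a recommendation.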

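Method 2: Switch the table plan

Switch to another table plan that's supported for Application Insights. Data ingestion charges depend on the table plan and on the region of the Log Analytics workspace. For details, see the Azure Monitor Logs articles on table plans and on the tables that support the Basic table plan.

Method 3: Use telemetry SDK features for the Java agent

The recommended default solution is to use sampling overrides. The Application Insights Java agent provides two types of sampling. A common use case is suppressing telemetry collection for health checks.

The following methods supplement sampling overrides:

Reduce the cost of the traces table:
- Reduce the telemetry log level.
- Remove application logs (not framework or library logs) by using an MDC attribute and a sampling override.
- Disable log instrumentation by updating the applicationinsights.json file: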
```
{
  "instrumentation": {
    "logging": {
      "enabled": false
    }
  }
}
```
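Reduce the cost of the dependencies table:
- Suppress the collection of dependency telemetry that's generated by a Java method.
- Disable the instrumentation that produces the dependency telemetry.
  
  If the dependency is a database call, you won't see the database on the application map. If you remove the dependency instrumentation for an HTTP call or a message (for example, a Kafka message), all downstream telemetry is dropped.
Reduce the cost of the customMetrics table:
- Increase the metrics interval.
- Exclude metrics by using a telemetry processor.
- Increase the heartbeat interval.
Reduce the cost of OpenTelemetry attributes:

OpenTelemetry attributes are added to the customDimensions column and are represented as properties in Application Insights. You can remove attributes by using an attribute telemetry processor. For more information, see Telemetry processor examples - Delete.

Method 4: Update the application code (log levels or exceptions)

In some cases, changing the application code directly can help reduce the amount of telemetry that the application generates and that the Application Insights backend service consumes. A common example is a noisy exception raised by the application.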

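Third-party contact disclaimer

Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.

Contact us for help

If you have questions or need help, create a support request or ask Azure community support. You can also submit product feedback to the Azure feedback community.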
