本文介绍如何排查 ClusterResourcePlacementScheduled
在 Azure Kubernetes Fleet Manager 中使用 ClusterResourcePlacement
API 对象传播资源时出现的问题。
症状
使用 ClusterResourcePlacement
Azure Kubernetes Fleet Manager 中的 API 对象传播资源时,机队工作负荷的计划程序找不到计划策略指定的所有必需群集, ClusterResourcePlacementScheduled
条件状态显示为 False
。
注释
若要获取有关计划失败的原因的详细信息,可以检查 计划程序 日志。
原因
此问题可能由于以下原因之一而发生:
- 放置策略设置为
PickFixed
,但指定的群集名称与队列中的任何已加入成员群集名称不匹配,或者指定的群集不再连接到该机群。 - 放置策略设置为
PickN
,并且指定了 N 个群集,但少于 N 个群集已加入队列或满足放置策略。 - 资源
ClusterResourcePlacement
选择器选择保留的命名空间。
注释
当放置策略设置为 PickAll
时,条件 ClusterResourcePlacementScheduled
将设置为 True
。
案例研究
在以下示例中,具有 PickN
放置策略的 ClusterResourcePlacement
正尝试将资源传播到被标记为 env:prod
的两个集群。 这两个群集分别命名为 kind-cluster-1
和 kind-cluster-2
,已加入车队。 但是,只有一个成员群集 kind-cluster-1
具有标签 env:prod
。
ClusterResourcePlacement 规范
spec:
policy:
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
numberOfClusters: 2
placementType: PickN
resourceSelectors:
...
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
ClusterResourcePlacement 状态
status:
conditions:
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: could not find all the clusters needed as specified by the scheduling
policy
observedGeneration: 1
reason: SchedulingPolicyUnfulfilled
status: "False"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: All 1 cluster(s) start rolling out the latest resource
observedGeneration: 1
reason: RolloutStarted
status: "True"
type: ClusterResourcePlacementRolloutStarted
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: No override rules are configured for the selected resources
observedGeneration: 1
reason: NoOverrideSpecified
status: "True"
type: ClusterResourcePlacementOverridden
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: Works(s) are successfully created or updated in the 1 target clusters'
namespaces
observedGeneration: 1
reason: WorkSynchronized
status: "True"
type: ClusterResourcePlacementWorkSynchronized
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: The selected resources are successfully applied to 1 clusters
observedGeneration: 1
reason: ApplySucceeded
status: "True"
type: ClusterResourcePlacementApplied
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: The selected resources in 1 cluster are available now
observedGeneration: 1
reason: ResourceAvailable
status: "True"
type: ClusterResourcePlacementAvailable
observedResourceIndex: "0"
placementStatuses:
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 1
reason: Scheduled
status: "True"
type: Scheduled
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: Detected the new changes on the resources and started the rollout process
observedGeneration: 1
reason: RolloutStarted
status: "True"
type: RolloutStarted
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: No override rules are configured for the selected resources
observedGeneration: 1
reason: NoOverrideSpecified
status: "True"
type: Overridden
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: All of the works are synchronized to the latest
observedGeneration: 1
reason: AllWorkSynced
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: All corresponding work objects are applied
observedGeneration: 1
reason: AllWorkHaveBeenApplied
status: "True"
type: Applied
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: All corresponding work objects are available
observedGeneration: 1
reason: AllWorkAreAvailable
status: "True"
type: Available
- conditions:
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: 'kind-cluster-2 is not selected: ClusterUnschedulable, cluster does not
match with any of the required cluster affinity terms'
observedGeneration: 1
reason: ScheduleFailed
status: "False"
type: Scheduled
selectedResources:
...
在 ClusterResourcePlacement
状态下,ClusterResourcePlacementScheduled
的条件状态显示为 False
。 若要确定调度程序为何无法调度指定放置策略的资源,请检查 ClusterSchedulingPolicySnapshot
规范和状态。 若要了解如何获取最新的 ClusterSchedulingPolicySnapshot
,请参阅 如何查找和验证 ClusterResourcePlacement 部署的最新 ClusterSchedulingPolicySnapshot?
最新集群调度策略快照
apiVersion: placement.kubernetes-fleet.io/v1
kind: ClusterSchedulingPolicySnapshot
metadata:
annotations:
kubernetes-fleet.io/CRP-generation: "1"
kubernetes-fleet.io/number-of-clusters: "2"
creationTimestamp: "2024-05-07T22:36:33Z"
generation: 1
labels:
kubernetes-fleet.io/is-latest-snapshot: "true"
kubernetes-fleet.io/parent-CRP: crp-2
kubernetes-fleet.io/policy-index: "0"
name: crp-2-0
ownerReferences:
- apiVersion: placement.kubernetes-fleet.io/v1beta1
blockOwnerDeletion: true
controller: true
kind: ClusterResourcePlacement
name: crp-2
uid: 48bc1e92-a8b9-4450-a2d5-c6905df2cbf0
resourceVersion: "10090"
uid: 2137887e-45fd-4f52-bbb7-b96f39854625
spec:
policy:
affinity:
clusterAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
clusterSelectorTerms:
- labelSelector:
matchLabels:
env: prod
placementType: PickN
policyHash: ZjE0Yjk4YjYyMTVjY2U3NzQ1MTZkNWRhZjRiNjQ1NzQ4NjllNTUyMzZkODBkYzkyYmRkMGU3OTI3MWEwOTkyNQ==
status:
conditions:
- lastTransitionTime: "2024-05-07T22:36:33Z"
message: could not find all the clusters needed as specified by the scheduling
policy
observedGeneration: 1
reason: SchedulingPolicyUnfulfilled
status: "False"
type: Scheduled
observedCRPGeneration: 1
targetClusters:
- clusterName: kind-cluster-1
clusterScore:
affinityScore: 0
priorityScore: 0
reason: picked by scheduling policy
selected: true
- clusterName: kind-cluster-2
reason: ClusterUnschedulable, cluster does not match with any of the required
cluster affinity terms
selected: false
决议
在此方案中,若要解决此问题,请将 env:prod
标签添加到 kind-cluster-2
成员集群资源,以便调度程序可以选择集群来传播资源。