本文讨论如何在 Microsoft Azure Kubernetes Fleet Manager 中使用ClusterResourcePlacementApplied
对象 API 传播资源时排查ClusterResourcePlacement
问题。
症状
使用 ClusterResourcePlacement
Azure Kubernetes Fleet Manager 中的 API 对象传播资源时,部署将失败。 状态 ClusterResourcePlacementApplied
显示为 False
。
原因
此问题可能是由于以下原因之一导致的:
- 该资源已存在于群集上,并且不受机群控制器管理。 若要解决此问题,请更新
ClusterResourcePlacement
清单 YAML 文件,以便在ApplyStrategy
中使用AllowCoOwnership
,从而允许集群控制器管理资源。 - 另一个
ClusterResourcePlacement
部署已使用不同的应用策略来管理所选群集的资源。 - 由于语法错误或资源配置无效,
ClusterResourcePlacement
部署不会应用清单。 如果资源通过信封对象传播,则也可能发生这种情况。
故障排除步骤
-
ClusterResourcePlacement
查看状态并找到placementStatuses
分区。 检查placementStatuses
值,以识别哪些群集已将False
条件设置为ResourceApplied
,并记下它们的clusterName
值。 - 在
Work
中心群集中找到对象。 使用标识clusterName
查找Work
与成员群集关联的对象。 有关详细信息,请参阅如何查找与ClusterResourcePlacement
该资源关联的正确工作资源。 - 检查对象的状态
Work
,了解阻止成功资源应用程序的特定问题。
案例研究
在以下示例中, ClusterResourcePlacement
尝试将包含部署的命名空间传播到两个成员群集。 但是,命名空间已存在于一个成员群集上,具体而言 kind-cluster-1
。
ClusterResourcePlacement 规范
apiVersion: placement.kubernetes-fleet.io/v1beta1
kind: ClusterResourcePlacement
metadata:
name: crp
spec:
policy:
clusterNames:
- kind-cluster-1
- kind-cluster-2
placementType: PickFixed
resourceSelectors:
- group: ""
kind: Namespace
name: test-ns
version: v1
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
ClusterResourcePlacement 状态
status:
conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message:couldn't find all the clusters needed as specified by the scheduling
policy
observedGeneration: 1
reason: SchedulingPolicyUnfulfilled
status: "False"
type: ClusterResourcePlacementScheduled
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: All 2 cluster(s) start rolling out the latest resource
observedGeneration: 1
reason: RolloutStarted
status: "True"
type: ClusterResourcePlacementRolloutStarted
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: No override rules are configured for the selected resources
observedGeneration: 1
reason: NoOverrideSpecified
status: "True"
type: ClusterResourcePlacementOverridden
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Works(s) are succcesfully created or updated in the 2 target clusters'
namespaces
observedGeneration: 1
reason: WorkSynchronized
status: "True"
type: ClusterResourcePlacementWorkSynchronized
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Failed to apply resources to 1 clusters, please check the `failedPlacements`
status
observedGeneration: 1
reason: ApplyFailed
status: "False"
type: ClusterResourcePlacementApplied
observedResourceIndex: "0"
placementStatuses:
- clusterName: kind-cluster-2
conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: 'Successfully scheduled resources for placement in kind-cluster-2 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 1
reason: Scheduled
status: "True"
type: Scheduled
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Detected the new changes on the resources and started the rollout process
observedGeneration: 1
reason: RolloutStarted
status: "True"
type: RolloutStarted
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: No override rules are configured for the selected resources
observedGeneration: 1
reason: NoOverrideSpecified
status: "True"
type: Overridden
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: All of the works are synchronized to the latest
observedGeneration: 1
reason: AllWorkSynced
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: All corresponding work objects are applied
observedGeneration: 1
reason: AllWorkHaveBeenApplied
status: "True"
type: Applied
- lastTransitionTime: "2024-05-07T23:32:49Z"
message: The availability of work object crp-4-work isn't trackable
observedGeneration: 1
reason: WorkNotTrackable
status: "True"
type: Available
- clusterName: kind-cluster-1
conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: 'Successfully scheduled resources for placement in kind-cluster-1 (affinity
score: 0, topology spread score: 0): picked by scheduling policy'
observedGeneration: 1
reason: Scheduled
status: "True"
type: Scheduled
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Detected the new changes on the resources and started the rollout process
observedGeneration: 1
reason: RolloutStarted
status: "True"
type: RolloutStarted
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: No override rules are configured for the selected resources
observedGeneration: 1
reason: NoOverrideSpecified
status: "True"
type: Overridden
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: All of the works are synchronized to the latest
observedGeneration: 1
reason: AllWorkSynced
status: "True"
type: WorkSynchronized
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Work object crp-4-work isn't applied
observedGeneration: 1
reason: NotAllWorkHaveBeenApplied
status: "False"
type: Applied
failedPlacements:
- condition:
lastTransitionTime: "2024-05-07T23:32:40Z"
message: 'Failed to apply manifest: failed to process the request due to a
client error: resource exists and isn't managed by the fleet controller
and co-ownernship is disallowed'
reason: ManifestsAlreadyOwnedByOthers
status: "False"
type: Applied
kind: Namespace
name: test-ns
version: v1
selectedResources:
- kind: Namespace
name: test-ns
version: v1
- group: apps
kind: Deployment
name: test-nginx
namespace: test-ns
version: v1
在 kind-cluster-1
的 failedPlacements
部分中,message
字段说明了为何未在成员集群上应用资源。 在前一conditions
部分中,Applied
条件为kind-cluster-1
标记为false
并显示NotAllWorkHaveBeenApplied
原因。 这表示为成员群集 Work
准备的 kind-cluster-1
对象未被应用。 有关详细信息,请参阅如何查找与ClusterResourcePlacement
该资源关联的正确工作资源。
kind-cluster-1 的工作状态
status:
conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: 'Apply manifest {Ordinal:0 Group: Version:v1 Kind:Namespace Resource:namespaces
Namespace: Name:test-ns} failed'
observedGeneration: 1
reason: WorkAppliedFailed
status: "False"
type: Applied
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: ""
observedGeneration: 1
reason: WorkAppliedFailed
status: Unknown
type: Available
manifestConditions:
- conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: 'Failed to apply manifest: failed to process the request due to a client
error: resource exists and isn't managed by the fleet controller and co-ownernship
is disallowed'
reason: ManifestsAlreadyOwnedByOthers
status: "False"
type: Applied
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Manifest isn't applied yet
reason: ManifestApplyFailed
status: Unknown
type: Available
identifier:
kind: Namespace
name: test-ns
ordinal: 0
resource: namespaces
version: v1
- conditions:
- lastTransitionTime: "2024-05-07T23:32:40Z"
message: Manifest is already up to date
observedGeneration: 1
reason: ManifestAlreadyUpToDate
status: "True"
type: Applied
- lastTransitionTime: "2024-05-07T23:32:51Z"
message: Manifest is trackable and available now
observedGeneration: 1
reason: ManifestAvailable
status: "True"
type: Available
identifier:
group: apps
kind: Deployment
name: test-nginx
namespace: test-ns
ordinal: 1
resource: deployments
version: v1
Work
检查状态,尤其是manifestConditions
部分。 可以看到无法应用命名空间,但命名空间中的部署已从中心传播到成员群集。
决议
在这种情况下,潜在的解决方案是在 ApplyStrategy 策略中将 AllowCoOwnership
设置为 true
。 但是,请务必注意,此决定应由用户做出,因为资源可能不共享。
此外,还可以查看 “应用工作控制器 ”的日志,以更深入地了解资源不可用的原因。