OpenClawSlowReconciliation
Meaning
The p99 reconciliation duration exceeds 30 seconds. Reconciliation should normally complete in under 5 seconds.
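As a rough illustration of what the threshold means: a p99 above 30 seconds implies that more than about 1% of reconciliations take longer than 30 seconds. The sketch below estimates that percentage from Prometheus text-format metrics read on stdin. It assumes the operator is built on controller-runtime and exposes the `controller_runtime_reconcile_time_seconds` histogram with an `le="30"` bucket; this runbook does not confirm either, and `slow_fraction` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# slow_fraction (hypothetical helper): read Prometheus text-format metrics on
# stdin and print the percentage of reconciliations slower than 30 seconds.
# ASSUMPTION: the metrics include controller-runtime's
# controller_runtime_reconcile_time_seconds histogram with an le="30" bucket.
slow_fraction() {
  awk '
    # Cumulative count of reconciliations that finished within 30 seconds.
    /^controller_runtime_reconcile_time_seconds_bucket/ && /le="30"/ { within += $NF }
    # Total number of observed reconciliations.
    /^controller_runtime_reconcile_time_seconds_count/ { total += $NF }
    END {
      if (total == 0) { print "no samples"; exit }
      printf "%.2f\n", 100 * (total - within) / total
    }'
}
```

Pipe the operator's metrics output into `slow_fraction`, however your deployment exposes it. A result above 1.00 is consistent with the p99 exceeding 30 seconds. Counters are cumulative since process start, so this reflects the operator's lifetime rather than the alert's evaluation window.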
Impact
Slow reconciliation delays the propagation of spec changes to managed resources. Under high load, the controller may fall behind on processing events.
Diagnosis
# Check operator resource usage
kubectl top pod -n openclaw-operator-system

# Check operator logs for slow operations
kubectl logs -n openclaw-operator-system deploy/openclaw-operator-controller-manager -c manager --tail=200

# Check API server latency
kubectl get --raw /metrics | grep apiserver_request_duration

# Check number of managed instances
kubectl get openclawinstance --all-namespaces --no-headers | wc -l

# Check workqueue depth
kubectl get --raw /metrics | grep workqueue_depth
Mitigation
- API server overload - Check kube-apiserver health and latency
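- Too many instances - Consider running multiple operator replicas with leader election; note that with standard leader election only the elected leader reconciles, so extra replicas improve failover rather than throughput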
- Network issues - Check connectivity between operator pod and API server
- Resource starvation - Increase operator CPU/memory limits
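
To help tell a reconcile backlog (too many instances, resource starvation) apart from a slow API server, the workqueue depth check from the Diagnosis section can be narrowed to queues above a threshold. The sketch below reads Prometheus text-format metrics on stdin. It assumes `workqueue_depth` samples carry a `name` label, as controller-runtime and client-go emit them; `deep_queues` and the default threshold of 10 are hypothetical choices, not values from this runbook.

```shell
#!/usr/bin/env bash
# deep_queues [THRESHOLD] (hypothetical helper): read Prometheus text-format
# metrics on stdin and print "<queue name> <depth>" for every workqueue whose
# depth exceeds THRESHOLD (default 10, an arbitrary illustrative value).
# ASSUMPTION: samples look like workqueue_depth{name="..."} <value>.
deep_queues() {
  awk -v limit="${1:-10}" '
    # Only workqueue_depth samples that carry a name label.
    $1 ~ /^workqueue_depth[{]/ && match($0, /name="[^"]*"/) {
      # Strip the leading name=" (6 chars) and the trailing quote.
      queue = substr($0, RSTART + 6, RLENGTH - 7)
      if ($NF + 0 > limit + 0) print queue, $NF
    }'
}
```

For example, `kubectl get --raw /metrics | deep_queues 10` filters the API server's own queues; to inspect the operator's queues, feed it the operator's metrics output instead. A persistently deep queue alongside normal API server latency points toward the instance-count or resource-starvation causes rather than API server overload.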