This article explains how to configure a MachineHealthCheck for a Kubernetes cluster deployed by cluster-api-provider-openstack.
Target environment
The management cluster runs Cluster API v0.3.12 with the OpenStack infrastructure provider v0.3.3. It manages a workload cluster named external with three control plane Machines and one worker Machine, all running Kubernetes v1.17.11.
ubuntu@capi:~$ clusterctl upgrade plan
Checking cert-manager version...
Cert-Manager is already up to date
Checking new release availability...
Management group: capi-system/cluster-api, latest release available for the v1alpha3 API Version of Cluster API (contract):
NAME                       NAMESPACE                           TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-kubeadm          capi-kubeadm-bootstrap-system       BootstrapProvider        v0.3.12           Already up to date
control-plane-kubeadm      capi-kubeadm-control-plane-system   ControlPlaneProvider     v0.3.12           Already up to date
cluster-api                capi-system                         CoreProvider             v0.3.12           Already up to date
infrastructure-openstack   capo-system                         InfrastructureProvider   v0.3.3            Already up to date
You are already up to date!
New clusterctl version available: v0.3.10 -> v0.3.12
https://github.com/kubernetes-sigs/cluster-api/releases/tag/v0.3.12
ubuntu@capi:~$ kubectl get machines
NAME                             PROVIDERID                                         PHASE     VERSION
external-control-plane-fffhl     openstack://33170cf7-8112-4163-91ab-c9fb4b4f1f81   Running   v1.17.11
external-control-plane-q9fms     openstack://6a5bd72f-7926-413e-92a6-d4d6b8ab5c7d   Running   v1.17.11
external-control-plane-zd7hd     openstack://8eca63d1-fac4-4979-ad76-447bc803d98e   Running   v1.17.11
external-md-0-84b9fff89c-snh5m   openstack://af280c1f-fe8b-46a6-993b-6cea5e015505   Running   v1.17.11
Creating a MachineHealthCheck
Create a manifest file like the following and apply it (an example apply command follows the manifest).
apiVersion: cluster.x-k8s.io/v1alpha3
kind: MachineHealthCheck
metadata:
  name: external-node-unhealthy-5m
spec:
  clusterName: external
  maxUnhealthy: 100%
  selector:
    matchLabels:
      cluster.x-k8s.io/deployment-name: external-md-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
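Assuming the manifest is saved as mhc.yaml (the file name is arbitrary), apply it against the management cluster:
ubuntu@capi:~$ kubectl apply -f mhc.yaml
Note that maxUnhealthy: 100% means remediation is never short-circuited, even if every Machine matched by the selector is unhealthy at once. That is convenient for this test, but likely too permissive for production.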
Check the created MachineHealthCheck resource:
ubuntu@capi:~$ kubectl get mhc
NAME                         MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
external-node-unhealthy-5m   100%           1                  1
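For more detail, such as the targeted Machines and their observed conditions, you can describe the resource (output omitted here):
ubuntu@capi:~$ kubectl describe machinehealthcheck external-node-unhealthy-5m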
Test
Log in to the worker node and stop the kubelet to mark the node unhealthy. If the worker nodes are not directly reachable, you can add a bastion host and log in to the worker node through it.
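For example, assuming the kubelet is managed by systemd and the node is reached through a bastion host (the addresses below are placeholders), stopping the kubelet might look like this:
# From the bastion host, ssh to the worker node:
ubuntu@bastion:~$ ssh ubuntu@<worker-node-ip>
# On the worker node, stop the kubelet:
ubuntu@external-md-0-rcct8:~$ sudo systemctl stop kubelet
Once the kubelet stops reporting, the node's Ready condition becomes Unknown, which matches the first unhealthyCondition after the 300s timeout.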
Let's observe the events and Machines. Cluster API deletes the unhealthy worker Machine and creates a replacement. Because the kubelet was stopped, the cluster cannot drain the Machine's node right away, so the drain is retried.
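To follow the replacement in real time, you can also watch the Machine objects (the -w flag streams updates):
ubuntu@capi:~$ kubectl get machines -w
A snapshot taken during remediation looks like this: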
ubuntu@capi:~$ kubectl get machines
NAME                             PROVIDERID                                         PHASE      VERSION
external-control-plane-fffhl     openstack://33170cf7-8112-4163-91ab-c9fb4b4f1f81   Running    v1.17.11
external-control-plane-q9fms     openstack://6a5bd72f-7926-413e-92a6-d4d6b8ab5c7d   Running    v1.17.11
external-control-plane-zd7hd     openstack://8eca63d1-fac4-4979-ad76-447bc803d98e   Running    v1.17.11
external-md-0-84b9fff89c-97sz2   openstack://956ebe53-a3d0-4e02-8e65-4561fbfc6f9a   Running    v1.17.11
external-md-0-84b9fff89c-snh5m   openstack://af280c1f-fe8b-46a6-993b-6cea5e015505   Deleting   v1.17.11
ubuntu@capi:~$ kubectl get events --sort-by=.metadata.creationTimestamp
4m48s Normal MachineMarkedUnhealthy machine/external-md-0-84b9fff89c-snh5m Machine default/external-node-unhealthy-5m/external-md-0-84b9fff89c-snh5m/external-md-0-rcct8 has been marked as unhealthy
4m48s Normal SuccessfulCreate machineset/external-md-0-84b9fff89c Created machine "external-md-0-84b9fff89c-97sz2"
111s Normal DetectedUnhealthy machine/external-md-0-84b9fff89c-97sz2 Machine default/external-node-unhealthy-5m/external-md-0-84b9fff89c-97sz2/ has unhealthy node
4m35s Normal SuccessfulCreateServer openstackmachine/external-md-0-nfwds Created server external-md-0-nfwds with id 956ebe53-a3d0-4e02-8e65-4561fbfc6f9a
6s Warning FailedDrainNode machine/external-md-0-84b9fff89c-snh5m error draining Machine's node "external-md-0-rcct8": requeue in 20s
111s Normal SuccessfulSetNodeRef machine/external-md-0-84b9fff89c-97sz2 external-md-0-nfwds
After a few minutes, the worker node has been replaced.
ubuntu@capi:~$ kubectl get machines
NAME                             PROVIDERID                                         PHASE     VERSION
external-control-plane-fffhl     openstack://33170cf7-8112-4163-91ab-c9fb4b4f1f81   Running   v1.17.11
external-control-plane-q9fms     openstack://6a5bd72f-7926-413e-92a6-d4d6b8ab5c7d   Running   v1.17.11
external-control-plane-zd7hd     openstack://8eca63d1-fac4-4979-ad76-447bc803d98e   Running   v1.17.11
external-md-0-84b9fff89c-97sz2   openstack://956ebe53-a3d0-4e02-8e65-4561fbfc6f9a   Running   v1.17.11
ubuntu@capi:~$ kubectl get events --sort-by=.metadata.creationTimestamp
10m Normal DetectedUnhealthy machine/external-md-0-84b9fff89c-snh5m Machine default/external-node-unhealthy-5m/external-md-0-84b9fff89c-snh5m/external-md-0-rcct8 has unhealthy node external-md-0-rcct8
75s Normal MachineMarkedUnhealthy machine/external-md-0-84b9fff89c-snh5m Machine default/external-node-unhealthy-5m/external-md-0-84b9fff89c-snh5m/external-md-0-rcct8 has been marked as unhealthy
7m40s Normal SuccessfulCreate machineset/external-md-0-84b9fff89c Created machine "external-md-0-84b9fff89c-97sz2"
4m43s Normal DetectedUnhealthy machine/external-md-0-84b9fff89c-97sz2 Machine default/external-node-unhealthy-5m/external-md-0-84b9fff89c-97sz2/ has unhealthy node
7m27s Normal SuccessfulCreateServer openstackmachine/external-md-0-nfwds Created server external-md-0-nfwds with id 956ebe53-a3d0-4e02-8e65-4561fbfc6f9a
2m38s Warning FailedDrainNode machine/external-md-0-84b9fff89c-snh5m error draining Machine's node "external-md-0-rcct8": requeue in 20s
4m43s Normal SuccessfulSetNodeRef machine/external-md-0-84b9fff89c-97sz2 external-md-0-nfwds
73s Normal SuccessfulDrainNode machine/external-md-0-84b9fff89c-snh5m success draining Machine's node "external-md-0-rcct8"
73s Normal SuccessfulDeleteServer openstackmachine/external-md-0-rcct8 Deleted server external-md-0-rcct8 with id af280c1f-fe8b-46a6-993b-6cea5e015505
The MachineHealthCheck worked successfully: the unhealthy worker Machine was drained, deleted, and replaced without manual intervention.
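If the health check is no longer needed after the test, it can be deleted:
ubuntu@capi:~$ kubectl delete machinehealthcheck external-node-unhealthy-5m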