
kubeadm join hangs at "Running pre-flight checks"


Hello,

I am stuck at task 8 of Exercise 2.2. I used the suggested grep command to get the exact sudo kubeadm join command to run on the worker node, and I made sure to copy it line by line. Unfortunately, the run hangs at "Running pre-flight checks". I also ran it with the --v=5 flag (as kubeadm suggested), and it cannot connect to the IP address I specified, even though it is the same one written in the cp.out file. I even used the kubectl get nodes -o wide command to check the IP address of the control plane node, and it is the same. Does anyone have suggestions on how to tackle this problem? Was I supposed to run any other command before sudo kubeadm join? Thanks in advance.

Best Answer

  • chrispokorni Posts: 2,181
    Answer ✓

    Hi @gmmajal,

    After adding 10.0.0.10 k8scp to the two /etc/hosts files, perform the following to attempt to grow the cluster:

    On the CP node (your control plane with assumed private IP 10.0.0.10) run the following command:
    sudo kubeadm token create --print-join-command

    On the WORKER node (with an assumed private IP 10.0.0.x) run the following commands:
    sudo kubeadm reset
    sudo kubeadm join ... #<-- the entire join command generated on the CP node

    If the join is still not successful, please review the VPC and firewall configuration steps from the GCP demo video. Also, ensure both VMs (CP and WORKER) are created in the same VPC/subnet, so that they are protected by the same firewall (open to all inbound traffic, all protocols, from all sources, to all destination ports).
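
    As a quick sanity check before re-running the join, the reachability of the API endpoint can be confirmed from the WORKER node (a minimal sketch, assuming the 10.0.0.10 k8scp entry is already in both /etc/hosts files):

    ping -c 3 k8scp                       # name resolution and basic reachability
    curl -k https://k8scp:6443/version    # any TLS/JSON response proves port 6443 is open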

    Regards,
    -Chris

Answers

  • chrispokorni Posts: 2,181
    edited April 10

    Hi @gmmajal,

    Please provide the output produced by the kubeadm join command, using the code format.

    Also, keep in mind that correctly setting up the infrastructure is essential. Did you follow the provisioning videos from the introductory chapter? The most important aspects are the VPC network and firewall configuration.
    What cloud or local hypervisor provisions your infrastructure? What is the guest OS of the VMs? How many network interfaces does each VM have? Are your firewalls disabled as instructed?
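
    On Ubuntu guests, a quick way to confirm the host firewall state is the following (a sketch; ufw is the default firewall front end on Ubuntu):

    sudo ufw status     # expect: Status: inactive
    sudo ufw disable    # only needed if it reports active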

    Regards,
    -Chris

  • gmmajal Posts: 5
    edited April 10
    I0410 18:17:52.181740    6386 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
    I0410 18:17:52.182229    6386 initconfiguration.go:122] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
    [preflight] Running pre-flight checks
    I0410 18:17:52.182428    6386 preflight.go:93] [preflight] Running general checks
    I0410 18:17:52.182509    6386 checks.go:280] validating the existence of file /etc/kubernetes/kubelet.conf
    I0410 18:17:52.182540    6386 checks.go:280] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
    I0410 18:17:52.182564    6386 checks.go:104] validating the container runtime
    I0410 18:17:52.225113    6386 checks.go:639] validating whether swap is enabled or not
    I0410 18:17:52.225242    6386 checks.go:370] validating the presence of executable crictl
    I0410 18:17:52.225287    6386 checks.go:370] validating the presence of executable conntrack
    I0410 18:17:52.225320    6386 checks.go:370] validating the presence of executable ip
    I0410 18:17:52.225353    6386 checks.go:370] validating the presence of executable iptables
    I0410 18:17:52.225389    6386 checks.go:370] validating the presence of executable mount
    I0410 18:17:52.225430    6386 checks.go:370] validating the presence of executable nsenter
    I0410 18:17:52.225462    6386 checks.go:370] validating the presence of executable ebtables
    I0410 18:17:52.225494    6386 checks.go:370] validating the presence of executable ethtool
    I0410 18:17:52.225522    6386 checks.go:370] validating the presence of executable socat
    I0410 18:17:52.225552    6386 checks.go:370] validating the presence of executable tc
    I0410 18:17:52.225580    6386 checks.go:370] validating the presence of executable touch
    I0410 18:17:52.225615    6386 checks.go:516] running all checks
    I0410 18:17:52.244927    6386 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
    I0410 18:17:52.250203    6386 checks.go:605] validating kubelet version
    I0410 18:17:52.331698    6386 checks.go:130] validating if the "kubelet" service is enabled and active
    I0410 18:17:52.347137    6386 checks.go:203] validating availability of port 10250
    I0410 18:17:52.347514    6386 checks.go:280] validating the existence of file /etc/kubernetes/pki/ca.crt
    I0410 18:17:52.347552    6386 checks.go:430] validating if the connectivity type is via proxy or direct
    I0410 18:17:52.347613    6386 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
    I0410 18:17:52.347697    6386 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
    I0410 18:17:52.347753    6386 join.go:532] [preflight] Discovering cluster-info
    I0410 18:17:52.347806    6386 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.0.0.6:6443"
    I0410 18:18:02.349715    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:18.743849    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:34.340314    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:50.171644    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    couldn't validate the identity of the API Server
    k8s.io/kubernetes/cmd/kubeadm/app/discovery.For
            cmd/kubeadm/app/discovery/discovery.go:45
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).TLSBootstrapCfg
            cmd/kubeadm/app/cmd/join.go:533
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
            cmd/kubeadm/app/cmd/join.go:543
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runPreflight
            cmd/kubeadm/app/cmd/phases/join/preflight.go:98
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
            cmd/kubeadm/app/cmd/join.go:180
    github.com/spf13/cobra.(*Command).execute
            vendor/github.com/spf13/cobra/command.go:940
    github.com/spf13/cobra.(*Command).ExecuteC
            vendor/github.com/spf13/cobra/command.go:1068
    github.com/spf13/cobra.(*Command).Execute
            vendor/github.com/spf13/cobra/command.go:992
    k8s.io/kubernetes/cmd/kubeadm/app.Run
            cmd/kubeadm/app/kubeadm.go:50
    main.main
            cmd/kubeadm/kubeadm.go:25
    runtime.main
            /usr/local/go/src/runtime/proc.go:267
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1650
    error execution phase preflight
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
            cmd/kubeadm/app/cmd/join.go:180
    github.com/spf13/cobra.(*Command).execute
            vendor/github.com/spf13/cobra/command.go:940
    github.com/spf13/cobra.(*Command).ExecuteC
            vendor/github.com/spf13/cobra/command.go:1068
    github.com/spf13/cobra.(*Command).Execute
            vendor/github.com/spf13/cobra/command.go:992
    k8s.io/kubernetes/cmd/kubeadm/app.Run
            cmd/kubeadm/app/kubeadm.go:50
    main.main
            cmd/kubeadm/kubeadm.go:25
    runtime.main
            /usr/local/go/src/runtime/proc.go:267
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1650
    
    

    Hi @chrispokorni

    The block above is my (truncated) output when I run the sudo kubeadm join command with the --v=5 flag. kubeadm prompted me to add this flag to get more verbose output and identify the nature of the error.

    With regard to your questions:
    1) I am using Google Compute Engine, and I connect to the VM instances via PuTTY.
    2) The guest OS is Ubuntu 20.04 LTS.
    3) I made sure to choose the VPC network I created specifically for this class (following the instructions in the first lesson). There is just one network interface per VM.
    4) I have also made sure the firewall is disabled, and I have attached a screenshot of the firewall rule that is operational.

  • chrispokorni Posts: 2,181

    Hi @gmmajal,

    Thank you for the detailed output.
    What are the custom entries in the two /etc/hosts files, and what are the private IP addresses and hostnames of the two VMs?

    What are the outputs of kubectl get nodes -o wide and kubectl get pods -A -o wide ?
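
    For reference, those details can be gathered with commands along these lines (run on each node as applicable):

    hostname                        # node hostname
    hostname -I                     # private IP address(es) of the VM
    cat /etc/hosts                  # custom entries, if any
    kubectl get nodes -o wide       # on the CP node
    kubectl get pods -A -o wide     # on the CP node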

    Regards,
    -Chris

  • gmmajal Posts: 5
    NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE   NOMINATED NODE   READINESS GATES
    kube-system   cilium-k62nk                       1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   cilium-operator-58684c48c9-b4c8f   1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   coredns-76f75df574-725bc           1/1     Running   0          10m   10.0.0.5     cp     <none>           <none>
    kube-system   coredns-76f75df574-gccb4           1/1     Running   0          10m   10.0.0.245   cp     <none>           <none>
    kube-system   etcd-cp                            1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-apiserver-cp                  1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-controller-manager-cp         1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-proxy-bqh7r                   1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-scheduler-cp                  1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    
    NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
    cp     Ready    control-plane   18m   v1.29.1   10.0.0.10     <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-gcp   containerd://1.6.31
    
    

    Hi Chris,
    Thanks for the prompt response. The first block is the output of the kubectl get pods command on the control plane node; the second block is the output of the kubectl get nodes command, also on the control plane node.

    The entries inside the /etc/hosts file are the following:

    127.0.0.1 localhost
    
    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
    169.254.169.254 metadata.google.internal metadata
    

    The hostnames of the VM instances are worker and cp. The IP addresses are 34.91.60.229 and 34.32.234.112, respectively.

    With regard to the firewall rules, I just wanted to recheck one thing. There are a few rules created by default for a VPC network on Google Cloud. Are we supposed to delete them entirely before inserting our own firewall rule?

    Regards,
    GMMajal

  • chrispokorni Posts: 2,181

    Hi @gmmajal,

    You probably missed a step in the lab exercise. You must configure both /etc/hosts files, one on each node, with the same additional entry: CP-NODE-PRIVATE-IP k8scp. In your case the additional entry should be 10.0.0.10 k8scp.
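
    A minimal sketch of adding and verifying that entry on each node (assuming 10.0.0.10 is the CP's private IP, as in this thread):

    echo "10.0.0.10 k8scp" | sudo tee -a /etc/hosts
    grep k8scp /etc/hosts     # expect: 10.0.0.10 k8scp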

    Regards,
    -Chris

  • gmmajal Posts: 5

    Hi @chrispokorni

    I made the additional entry in the /etc/hosts files on both nodes. Unfortunately, the problem still persists. Can you tell me which part of the exercise covers the configuration you mentioned in your earlier message? I couldn't really find it. If I run kubectl get nodes on my worker node, I get the following output:

    E0412 10:07:35.586706   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.587280   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.588770   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.589209   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.590655   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

    Even running a curl request, curl https://10.0.0.10:6443, results in a connection timed out error:

    curl: (28) Failed to connect to 10.0.0.10 port 6443: Connection timed out

    Is there somewhere else where I need to modify entries to allow this connection to happen?
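
    Two checks that help narrow this down to the VPC firewall (a sketch; nc comes from the netcat-openbsd package, and YOUR_VPC is a placeholder for the course network name):

    nc -vz 10.0.0.10 6443                                            # raw TCP reachability of the API server port
    gcloud compute firewall-rules list --filter="network:YOUR_VPC"   # rules attached to the course VPC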

    Regards,
    GMMajal

  • gmmajal Posts: 5

    Hi @chrispokorni ,

    Thanks for your response. I tried what you suggested about growing the cluster and then connecting the worker node to the CP, but unfortunately that did not work. So I started all over again, recreating the VPC and the VM instances. This time I followed each instruction in the exercise carefully, and it worked; it seems the original problem was indeed with my firewall rule. I did not have to insert any additional entries in the /etc/hosts file. The problem was with my firewall setup to begin with.
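
    For anyone hitting the same issue, the permissive lab rule looks roughly like this (a sketch with placeholder names; opening every protocol from any source is only appropriate for an isolated training VPC):

    gcloud compute firewall-rules create allow-all-lab \
        --network=YOUR_VPC --direction=INGRESS \
        --allow=all --source-ranges=0.0.0.0/0
    # "all" is accepted as a protocol by the GCP firewall API; list tcp,udp,icmp instead if your gcloud rejects it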

    Regards,
    GMMajal
