
kubeadm join hangs at "Running pre-flight checks"


Hello,

I am stuck at task 8 of Exercise 2.2. I used the suggested grep command to get the exact sudo kubeadm join command to run on the worker node, and I made sure to copy it line by line. Unfortunately, the run hangs at "Running pre-flight checks". I also ran it with the --v=5 flag (as kubeadm suggested), and it cannot connect to the IP address I specified, even though it is the same one written in the cp.out file. I even used the kubectl get nodes -o wide command to check the IP address of the control plane node, and it is the same. Does anyone have suggestions on how to tackle this problem? Was I supposed to run any other command before sudo kubeadm join? Thanks in advance.

Best Answer

  • chrispokorni Posts: 2,181
    Answer ✓

    Hi @gmmajal,

    After adding 10.0.0.10 k8scp to the two /etc/hosts files, perform the following to attempt to grow the cluster:

    On the CP node (your control plane with assumed private IP 10.0.0.10) run the following command:
    sudo kubeadm token create --print-join-command

    On the WORKER node (with an assumed private IP 10.0.0.x) run the following commands:
    sudo kubeadm reset
    sudo kubeadm join ... #<-- the entire join command generated on the CP node

    If the join is still not successful, please review the VPC and firewall configuration steps from the GCP demo video. Also, ensure both VMs (CP and WORKER) are created in the same VPC/subnet, so that they are protected by the same firewall (open to all inbound traffic, all protocols, from all sources, to all destination ports).
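
    As a quick sanity check before re-running the join, the reachability of the API endpoint can be confirmed from the WORKER node (a minimal sketch, assuming the 10.0.0.10 k8scp entry is already in both /etc/hosts files):

    ping -c 3 k8scp                       # name resolution and basic reachability
    curl -k https://k8scp:6443/version    # any TLS/JSON response proves port 6443 is open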

    Regards,
    -Chris

Answers

  • chrispokorni Posts: 2,181
    edited April 10

    Hi @gmmajal,

    Please provide the output produced by the kubeadm join command, using the code format.

    Also, keep in mind that correctly setting up the infrastructure is essential. Did you follow the provisioning videos from the introductory chapter? The most important aspects are the VPC network and firewall configuration.
    What cloud or local hypervisor provisions your infrastructure? What is the guest OS of the VMs? How many network interfaces does each VM have? Are your firewalls disabled as instructed?
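
    On Ubuntu guests, a quick way to confirm the host firewall state is the following (a sketch; ufw is the default firewall front end on Ubuntu):

    sudo ufw status     # expect: Status: inactive
    sudo ufw disable    # only needed if it reports active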

    Regards,
    -Chris

  • gmmajal Posts: 5
    edited April 10
    I0410 18:17:52.181740    6386 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
    I0410 18:17:52.182229    6386 initconfiguration.go:122] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
    [preflight] Running pre-flight checks
    I0410 18:17:52.182428    6386 preflight.go:93] [preflight] Running general checks
    I0410 18:17:52.182509    6386 checks.go:280] validating the existence of file /etc/kubernetes/kubelet.conf
    I0410 18:17:52.182540    6386 checks.go:280] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
    I0410 18:17:52.182564    6386 checks.go:104] validating the container runtime
    I0410 18:17:52.225113    6386 checks.go:639] validating whether swap is enabled or not
    I0410 18:17:52.225242    6386 checks.go:370] validating the presence of executable crictl
    I0410 18:17:52.225287    6386 checks.go:370] validating the presence of executable conntrack
    I0410 18:17:52.225320    6386 checks.go:370] validating the presence of executable ip
    I0410 18:17:52.225353    6386 checks.go:370] validating the presence of executable iptables
    I0410 18:17:52.225389    6386 checks.go:370] validating the presence of executable mount
    I0410 18:17:52.225430    6386 checks.go:370] validating the presence of executable nsenter
    I0410 18:17:52.225462    6386 checks.go:370] validating the presence of executable ebtables
    I0410 18:17:52.225494    6386 checks.go:370] validating the presence of executable ethtool
    I0410 18:17:52.225522    6386 checks.go:370] validating the presence of executable socat
    I0410 18:17:52.225552    6386 checks.go:370] validating the presence of executable tc
    I0410 18:17:52.225580    6386 checks.go:370] validating the presence of executable touch
    I0410 18:17:52.225615    6386 checks.go:516] running all checks
    I0410 18:17:52.244927    6386 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
    I0410 18:17:52.250203    6386 checks.go:605] validating kubelet version
    I0410 18:17:52.331698    6386 checks.go:130] validating if the "kubelet" service is enabled and active
    I0410 18:17:52.347137    6386 checks.go:203] validating availability of port 10250
    I0410 18:17:52.347514    6386 checks.go:280] validating the existence of file /etc/kubernetes/pki/ca.crt
    I0410 18:17:52.347552    6386 checks.go:430] validating if the connectivity type is via proxy or direct
    I0410 18:17:52.347613    6386 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
    I0410 18:17:52.347697    6386 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
    I0410 18:17:52.347753    6386 join.go:532] [preflight] Discovering cluster-info
    I0410 18:17:52.347806    6386 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.0.0.6:6443"
    I0410 18:18:02.349715    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:18.743849    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:34.340314    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    I0410 18:18:50.171644    6386 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Get "https://10.0.0.6:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    couldn't validate the identity of the API Server
    k8s.io/kubernetes/cmd/kubeadm/app/discovery.For
            cmd/kubeadm/app/discovery/discovery.go:45
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).TLSBootstrapCfg
            cmd/kubeadm/app/cmd/join.go:533
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
            cmd/kubeadm/app/cmd/join.go:543
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runPreflight
            cmd/kubeadm/app/cmd/phases/join/preflight.go:98
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
            cmd/kubeadm/app/cmd/join.go:180
    github.com/spf13/cobra.(*Command).execute
            vendor/github.com/spf13/cobra/command.go:940
    github.com/spf13/cobra.(*Command).ExecuteC
            vendor/github.com/spf13/cobra/command.go:1068
    github.com/spf13/cobra.(*Command).Execute
            vendor/github.com/spf13/cobra/command.go:992
    k8s.io/kubernetes/cmd/kubeadm/app.Run
            cmd/kubeadm/app/kubeadm.go:50
    main.main
            cmd/kubeadm/kubeadm.go:25
    runtime.main
            /usr/local/go/src/runtime/proc.go:267
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1650
    error execution phase preflight
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
            cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
            cmd/kubeadm/app/cmd/join.go:180
    github.com/spf13/cobra.(*Command).execute
            vendor/github.com/spf13/cobra/command.go:940
    github.com/spf13/cobra.(*Command).ExecuteC
            vendor/github.com/spf13/cobra/command.go:1068
    github.com/spf13/cobra.(*Command).Execute
            vendor/github.com/spf13/cobra/command.go:992
    k8s.io/kubernetes/cmd/kubeadm/app.Run
            cmd/kubeadm/app/kubeadm.go:50
    main.main
            cmd/kubeadm/kubeadm.go:25
    runtime.main
            /usr/local/go/src/runtime/proc.go:267
    runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1650
    
    

    Hi @chrispokorni

    The block above is my (truncated) output when I run the sudo kubeadm join command with the --v=5 flag. kubeadm prompted me to add this flag to get more verbose output and identify the nature of the error.

    With regard to your questions:
    1) I am using Google Compute Engine, and I connect to the VM instances via PuTTY.
    2) The guest OS is Ubuntu 20.04 LTS.
    3) I made sure to choose the VPC network I created specifically for this class (following the instructions in the first lesson). There is just one network interface per VM.
    4) I have also made sure the firewall is disabled, and I have attached a screenshot of the firewall rule that is operational.

  • chrispokorni Posts: 2,181

    Hi @gmmajal,

    Thank you for the detailed output.
    What are the custom entries in the two /etc/hosts files, and what are the private IP addresses and hostnames of the two VMs?

    What are the outputs of kubectl get nodes -o wide and kubectl get pods -A -o wide ?
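
    For reference, those details can be gathered with commands along these lines (run on each node as applicable):

    hostname                        # node hostname
    hostname -I                     # private IP address(es) of the VM
    cat /etc/hosts                  # custom entries, if any
    kubectl get nodes -o wide       # on the CP node
    kubectl get pods -A -o wide     # on the CP node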

    Regards,
    -Chris

  • gmmajal Posts: 5
    NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE   IP           NODE   NOMINATED NODE   READINESS GATES
    kube-system   cilium-k62nk                       1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   cilium-operator-58684c48c9-b4c8f   1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   coredns-76f75df574-725bc           1/1     Running   0          10m   10.0.0.5     cp     <none>           <none>
    kube-system   coredns-76f75df574-gccb4           1/1     Running   0          10m   10.0.0.245   cp     <none>           <none>
    kube-system   etcd-cp                            1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-apiserver-cp                  1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-controller-manager-cp         1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-proxy-bqh7r                   1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    kube-system   kube-scheduler-cp                  1/1     Running   0          10m   10.0.0.10    cp     <none>           <none>
    
    NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION    CONTAINER-RUNTIME
    cp     Ready    control-plane   18m   v1.29.1   10.0.0.10     <none>        Ubuntu 20.04.6 LTS   5.15.0-1053-gcp   containerd://1.6.31
    
    

    Hi Chris,
    Thanks for the prompt response. The first block is the output of the kubectl get pods command on the control plane node; the second block is the output of the kubectl get nodes command, also on the control plane node.

    The entries inside the /etc/hosts file are the following:

    127.0.0.1 localhost
    
    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
    169.254.169.254 metadata.google.internal metadata
    

    The hostnames of the VM instances are worker and cp. The IP addresses are 34.91.60.229 and 34.32.234.112, respectively.

    With regard to the firewall rules, I just wanted to recheck one thing. There are a few rules created by default for a VPC network on Google Cloud. Are we supposed to delete them entirely before inserting our own firewall rule?

    Regards,
    GMMajal

  • chrispokorni Posts: 2,181

    Hi @gmmajal,

    You probably missed a step in the lab exercise. You must configure both /etc/hosts files, one on each node, with the same additional entry: CP-NODE-PRIVATE-IP k8scp. In your case the additional entry should be 10.0.0.10 k8scp.
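
    A minimal sketch of adding and verifying that entry on each node (assuming 10.0.0.10 is the CP's private IP, as in this thread):

    echo "10.0.0.10 k8scp" | sudo tee -a /etc/hosts
    grep k8scp /etc/hosts     # expect: 10.0.0.10 k8scp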

    Regards,
    -Chris

  • gmmajal Posts: 5

    Hi @chrispokorni

    I made the additional entry in the /etc/hosts files on both nodes. Unfortunately, the problem still persists. Can you tell me which part of the exercise covers the configuration you mentioned in your earlier message? I couldn't really find it. If I run kubectl get nodes on my worker node, I get the following output:

    E0412 10:07:35.586706   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.587280   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.588770   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.589209   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    E0412 10:07:35.590655   16538 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

    Even running a curl request, curl https://10.0.0.10:6443, results in a connection timed out error:

    curl: (28) Failed to connect to 10.0.0.10 port 6443: Connection timed out

    Is there somewhere else where I need to modify entries to allow this connection to happen?
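
    Two checks that help narrow this down to the VPC firewall (a sketch; nc comes from the netcat-openbsd package, and YOUR_VPC is a placeholder for the course network name):

    nc -vz 10.0.0.10 6443                                            # raw TCP reachability of the API server port
    gcloud compute firewall-rules list --filter="network:YOUR_VPC"   # rules attached to the course VPC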

    Regards,
    GMMajal

  • gmmajal Posts: 5

    Hi @chrispokorni ,

    Thanks for your response. I tried what you suggested about growing the cluster and then connecting the worker node to the CP, but unfortunately that did not work. So I started all over again, recreating the VPC and the VM instances. This time I followed each instruction in the exercise carefully, and it worked; it seems the original problem was indeed with my firewall rule. I did not have to insert any additional entries in the /etc/hosts file. The problem was with my firewall setup to begin with.
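
    For anyone hitting the same issue, the permissive lab rule looks roughly like this (a sketch with placeholder names; opening every protocol from any source is only appropriate for an isolated training VPC):

    gcloud compute firewall-rules create allow-all-lab \
        --network=YOUR_VPC --direction=INGRESS \
        --allow=all --source-ranges=0.0.0.0/0
    # "all" is accepted as a protocol by the GCP firewall API; list tcp,udp,icmp instead if your gcloud rejects it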

    Regards,
    GMMajal
