I am trying to set up a Kubernetes master by issuing the following command:
kubeadm init --pod-network-cidr=192.168.0.0/16
Problem: the coredns pods are in CrashLoopBackOff or Error state:
# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-lflwx 2/2 Running 0 2d
coredns-576cbf47c7-nm7gc 0/1 CrashLoopBackOff 69 2d
coredns-576cbf47c7-nwcnx 0/1 CrashLoopBackOff 69 2d
etcd-suey.nknwn.local 1/1 Running 0 2d
kube-apiserver-suey.nknwn.local 1/1 Running 0 2d
kube-controller-manager-suey.nknwn.local 1/1 Running 0 2d
kube-proxy-xkgdr 1/1 Running 0 2d
kube-scheduler-suey.nknwn.local 1/1 Running 0 2d
#
I tried Troubleshooting kubeadm - Kubernetes, but my node is not running SELinux and Docker is up to date:
# docker --version
Docker version 18.06.1-ce, build e68fc7a
#
kubectl describe:
# kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx
Name: coredns-576cbf47c7-nwcnx
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: suey.nknwn.local/192.168.86.81
Start Time: Sun, 28 Oct 2018 22:39:46 -0400
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Annotations: cni.projectcalico.org/podIP: 192.168.0.30/32
Status: Running
IP: 192.168.0.30
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
Image: k8s.gcr.io/coredns:1.2.2
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 31 Oct 2018 23:28:58 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 31 Oct 2018 23:21:35 -0400
Finished: Wed, 31 Oct 2018 23:23:54 -0400
Ready: True
Restart Count: 103
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-xvq8b:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-xvq8b
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 54m (x10 over 4h19m) kubelet, suey.nknwn.local Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 9m56s (x92 over 4h20m) kubelet, suey.nknwn.local Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 5m4s (x173 over 4h10m) kubelet, suey.nknwn.local Back-off restarting failed container
# kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc
Name: coredns-576cbf47c7-nm7gc
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: suey.nknwn.local/192.168.86.81
Start Time: Sun, 28 Oct 2018 22:39:46 -0400
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Annotations: cni.projectcalico.org/podIP: 192.168.0.31/32
Status: Running
IP: 192.168.0.31
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
Image: k8s.gcr.io/coredns:1.2.2
Image ID: docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Wed, 31 Oct 2018 23:29:11 -0400
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 31 Oct 2018 23:21:58 -0400
Finished: Wed, 31 Oct 2018 23:24:08 -0400
Ready: True
Restart Count: 102
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-xvq8b:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-xvq8b
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 44m (x12 over 4h18m) kubelet, suey.nknwn.local Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff 4m58s (x170 over 4h9m) kubelet, suey.nknwn.local Back-off restarting failed container
Warning Unhealthy 8s (x102 over 4h19m) kubelet, suey.nknwn.local Liveness probe failed: HTTP probe failed with statuscode: 503
#
kubectl logs:
# kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc
E1101 03:31:58.974836 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974836 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974857 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.975493 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.976732 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.977788 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.976164 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.977415 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.978332 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
E1101 03:33:31.976864 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.978080 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.979156 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
#
# kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
.:53
2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:13 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
# kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
.:53
2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:18 [INFO] linux/AMD64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/AMD64, go1.11, eb51e8b
2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
#
describe (apiserver):
# kubectl -n kube-system describe pod kube-apiserver-suey.nknwn.local
Name: kube-apiserver-suey.nknwn.local
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: suey.nknwn.local/192.168.87.20
Start Time: Fri, 02 Nov 2018 00:28:44 -0400
Labels: component=kube-apiserver
tier=control-plane
Annotations: kubernetes.io/config.hash: 2433a531afe72165364aace3b746ea4c
kubernetes.io/config.mirror: 2433a531afe72165364aace3b746ea4c
kubernetes.io/config.seen: 2018-11-02T00:28:43.795663261-04:00
kubernetes.io/config.source: file
scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 192.168.87.20
Containers:
kube-apiserver:
Container ID: docker://659456385a1a859f078d36f4d1b91db9143d228b3bc5b3947a09460a39ce41fc
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.87.20
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Sun, 04 Nov 2018 22:57:27 -0500
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 04 Nov 2018 20:12:06 -0500
Finished: Sun, 04 Nov 2018 22:55:24 -0500
Ready: True
Restart Count: 2
Requests:
cpu: 250m
Liveness: http-get https://192.168.87.20:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/ca-certificates from etc-ca-certificates (ro)
/etc/kubernetes/pki from k8s-certs (ro)
/etc/ssl/certs from ca-certs (ro)
/usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
/usr/share/ca-certificates from usr-share-ca-certificates (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
etc-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /etc/ca-certificates
HostPathType: DirectoryOrCreate
k8s-certs:
Type: HostPath (bare Host directory volume)
Path: /etc/kubernetes/pki
HostPathType: DirectoryOrCreate
ca-certs:
Type: HostPath (bare Host directory volume)
Path: /etc/ssl/certs
HostPathType: DirectoryOrCreate
usr-share-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /usr/share/ca-certificates
HostPathType: DirectoryOrCreate
usr-local-share-ca-certificates:
Type: HostPath (bare Host directory volume)
Path: /usr/local/share/ca-certificates
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events: <none>
#
syslog (host):
Nov 4 22:59:36 suey kubelet[1234]: E1104 22:59:36.139538 1234 pod_workers.go:186] Error syncing pod d8146b7e-de57-11e8-a1e2-ec8eb57434c8 ("coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"
Please advise.
This error
[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
occurs when CoreDNS detects a loop in the resolver configuration, and it is the intended behavior. You are hitting this issue:
https://github.com/kubernetes/kubeadm/issues/1162
https://github.com/coredns/coredns/issues/2087
Hacky solution: Disable the CoreDNS loop detection
Edit the CoreDNS configmap:
kubectl -n kube-system edit configmap coredns
Delete or comment out the line containing loop, then save and exit.
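If you prefer not to edit interactively, the same change can be scripted. The sketch below uses a hypothetical helper, comment_out_loop (not part of kubectl or CoreDNS). It assumes the plugin sits on its own line as plain loop, and it comments the line out rather than deleting it, so it is easy to restore later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Corefile (or the ConfigMap YAML that embeds it)
# on stdin and print it with the `loop` plugin line commented out.
# Only lines whose sole token is `loop` are touched; `loopback` etc. are kept.
comment_out_loop() {
  sed -E 's/^([[:space:]]*)loop[[:space:]]*$/\1# loop/'
}
```

For example, kubectl -n kube-system get configmap coredns -o yaml | comment_out_loop | kubectl replace -f - would write the edited ConfigMap back, assuming kubectl prints the Corefile as a multi-line block. Inspect the piped output first before replacing anything.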
Then delete the CoreDNS pods so that new ones are created with the new configuration:
kubectl -n kube-system delete pod -l k8s-app=kube-dns
After that, everything should be fine.
Preferred solution: Remove the loop in the DNS configuration
First, check whether you are using systemd-resolved. If you are running Ubuntu 18.04, you probably are:
systemctl list-unit-files | grep enabled | grep systemd-resolved
If so, check which resolv.conf file your cluster uses as its reference:
ps auxww | grep kubelet
You might see a line like this:
/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf
The important part is --resolv-conf: it tells you whether the systemd resolv.conf is in use.
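To pull out just that value, a small filter like the following can help. It is a sketch, not part of any tool: kubelet_resolv_conf is a hypothetical name, and it only handles the --resolv-conf=PATH form on the command line. If your kubelet takes the setting from its config file (the resolvConf field) instead, nothing is printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of kubelet's --resolv-conf flag from a
# command line read on stdin (e.g. the output of `ps auxww | grep kubelet`).
# Prints nothing when the flag is absent. Assumes the --flag=value form.
kubelet_resolv_conf() {
  grep -o -- '--resolv-conf=[^[:space:]]*' | head -n 1 | cut -d= -f2-
}
```

Usage: ps auxww | grep kubelet | kubelet_resolv_conf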
If it is the systemd resolv.conf, check the contents of /run/systemd/resolve/resolv.conf to see whether it has a record like:
nameserver 127.0.0.1
If 127.0.0.1 is there, that is what is causing the loop.
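The check can be scripted as below. This is a hypothetical helper (loopback_nameservers is not a system command). It treats any 127.x.x.x address, including the 127.0.0.53 stub that systemd-resolved writes to /etc/resolv.conf, or ::1 as a loop risk, because inside the CoreDNS pod a loopback address points back at the pod itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the loopback nameserver lines (127.0.0.0/8 or ::1)
# found in the resolv.conf given as $1. Exit status follows grep: 0 if at least
# one such line exists (CoreDNS would forward to itself), 1 if none.
loopback_nameservers() {
  grep -E '^[[:space:]]*nameserver[[:space:]]+(127\.|::1([[:space:]]|$))' "$1"
}
```

For example, loopback_nameservers /run/systemd/resolve/resolv.conf && echo "loop risk" flags the problem, and the same call is a quick way to confirm the fix afterwards.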
To get rid of it, don't edit that file directly; instead, fix the places it is generated from so that it comes out correct.
Check all files under /etc/systemd/network, and if you find a record like
DNS=127.0.0.1
delete that record. Also check /etc/systemd/resolved.conf and do the same there if needed. Make sure at least one or two DNS servers are configured, for example:
DNS=1.1.1.1 1.0.0.1
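To find the offending records without opening every file by hand, something like the following sketch may help. find_loopback_dns_entries is a hypothetical name; it defaults to the two locations above, prints file:line:text for each DNS= line that lists a 127.x address (even when it is not the first server on the line), and only reads files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report DNS= entries that point at a 127.x address.
# With no arguments it searches the systemd locations named in the answer;
# otherwise it searches the files/directories passed in. Read-only.
find_loopback_dns_entries() {
  [ "$#" -gt 0 ] || set -- /etc/systemd/network /etc/systemd/resolved.conf
  # -r recurse, -H show file name, -n line number, -s hide missing-file errors
  grep -rHnsE '^DNS=(.*[[:space:]])?127\.' "$@"
}
```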
Once all of that is done, restart the systemd services for the changes to take effect:
systemctl restart systemd-networkd systemd-resolved
After that, verify that the nameserver 127.0.0.1 record is no longer in the resolv.conf file:
cat /run/systemd/resolve/resolv.conf
Finally, trigger re-creation of the DNS pods:
kubectl -n kube-system delete pod -l k8s-app=kube-dns
Summary: the solution is to get rid of whatever looks like a DNS lookup loop in the host DNS configuration. The exact steps depend on which resolv.conf manager/implementation you use.
For minikube on Ubuntu with the "none" driver, the following flag makes it work without any other changes:
sudo minikube start --extra-config=kubelet.resolv-conf=/run/systemd/resolve/resolv.conf
See this related issue.
On Ubuntu 16.04, dnsmasq may be the problem, since it sets a loopback address automatically. I posted a similar answer here.
Here is a shell hack that automates tk's answer:
# remove loop from DNS config files (dots escaped so they match literally)
sudo find /etc/systemd/network /etc/systemd/resolved.conf -type f \
    -exec sed -i '/^DNS=127\.0\.0\.1/d' {} +
# if necessary, configure some DNS servers (use Cloudflare public)
if ! grep -q '^DNS=' /etc/systemd/resolved.conf; then
    sudo sed -i '$aDNS=1.1.1.1 1.0.0.1' /etc/systemd/resolved.conf
fi
# restart systemd services
sudo systemctl restart systemd-networkd systemd-resolved
# force (re-)creation of the dns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns
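Before running the hack against the live system, you may want to try its file edits on a copy. The sketch below is a hypothetical, prefix-aware variant (fix_loopback_dns is not an existing tool). It performs only the two file edits, under a ROOT directory you pass in, and assumes GNU sed (for sed -i and the one-line $a form), as on Ubuntu. Called with no argument it edits the real /etc and then needs root.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the hack above that works under a ROOT prefix, so it
# can be tried on a copy of /etc first (ROOT defaults to "", the real system).
# Only the file edits are done; the systemctl/kubectl steps are left out.
fix_loopback_dns() {
  local root="${1:-}"
  local resolved="$root/etc/systemd/resolved.conf"
  # Drop DNS= lines that point straight at 127.0.0.1 (GNU sed -i).
  find "$root/etc/systemd/network" "$resolved" -type f \
    -exec sed -i '/^DNS=127\.0\.0\.1/d' {} + 2>/dev/null
  # Append public resolvers only if resolved.conf has no DNS= line left.
  if [ -f "$resolved" ] && ! grep -q '^DNS=' "$resolved"; then
    sed -i '$aDNS=1.1.1.1 1.0.0.1' "$resolved"
  fi
}
```

For example, copy /etc/systemd into "$scratch/etc/systemd", run fix_loopback_dns "$scratch", and diff -r the result against /etc/systemd before applying the real commands.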