I would like to set up instrumentation for OOMKilled events. When I inspect a pod, such an event looks like this:
Name: pnovotnak-manhole-123456789-82l2h
Namespace: test
Node: test-cluster-cja8smaK-oQSR/10.x.x.x
Start Time: Fri, 03 Feb 2017 14:34:57 -0800
Labels: pod-template-hash=123456789
run=pnovotnak-manhole
Status: Running
IP: 10.x.x.x
Controllers: ReplicaSet/pnovotnak-manhole-123456789
Containers:
pnovotnak-manhole:
Container ID: docker://...
Image: pnovotnak/it
Image ID: docker://sha256:...
Port:
Limits:
cpu: 2
memory: 3Gi
Requests:
cpu: 200m
memory: 256Mi
State: Running
Started: Fri, 03 Feb 2017 14:41:12 -0800
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Fri, 03 Feb 2017 14:35:08 -0800
Finished: Fri, 03 Feb 2017 14:41:11 -0800
Ready: True
Restart Count: 1
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-tder (ro)
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-46euo:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-tder
QoS Class: Burstable
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
11m 11m 1 {default-scheduler } Normal Scheduled Successfully assigned pnovotnak-manhole-123456789-82l2h to test-cluster-cja8smaK-oQSR
10m 10m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Created Created container with docker id xxxxxxxxxxxx; Security:[seccomp=unconfined]
10m 10m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Started Started container with docker id xxxxxxxxxxxx
11m 4m 2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Pulling pulling image "pnovotnak/it"
10m 4m 2 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Pulled Successfully pulled image "pnovotnak/it"
4m 4m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Created Created container with docker id yyyyyyyyyyyy; Security:[seccomp=unconfined]
4m 4m 1 {kubelet test-cluster-cja8smaK-oQSR} spec.containers{pnovotnak-manhole} Normal Started Started container with docker id yyyyyyyyyyyy
All I can get from the pod logs is:
{
textPayload: "shutting down, got signal: Terminated
"
insertId: "aaaaaaaaaaaaaaaa"
resource: {
type: "container"
labels: {
pod_id: "pnovotnak-manhole-123456789-82l2h"
...
}
}
timestamp: "2017-02-03T22:34:48Z"
severity: "ERROR"
labels: {
container.googleapis.com/container_name: "POD"
...
}
logName: "projects/myproj/logs/POD"
}
And from the kubelet logs:
{
insertId: "bbbbbbbbbbbbbb"
jsonPayload: {
_BOOT_ID: "ffffffffffffffffffffffffffffffff"
MESSAGE: "I0203 22:41:11.925928 1843 kubelet.go:1816] SyncLoop (PLEG): "pnovotnak-manhole-123456789-82l2h_test(a-uuid)", event: &pleg.PodLifecycleEvent{ID:"another-uuid", Type:"ContainerDied", Data:"..."}"
...
This does not seem like enough to uniquely identify the termination as an OOM event. Any other ideas?
The OOMKilled event does not appear in the logs, but if you can detect that a pod was killed, you can then run kubectl get pod -o go-template=... <pod-id>
to determine the reason. An example from the docs:
[13:59:01] $ ./cluster/kubectl.sh get pod -o go-template='{{range.status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}' simmemleak-60xbc
Container Name: simmemleak
LastState: map[terminated:map[exitCode:137 reason:OOM Killed startedAt:2015-07-07T20:58:43Z finishedAt:2015-07-07T20:58:43Z containerID:docker://0e4095bba1feccdfe7ef9fb6ebffe972b4b14285d5acdec6f0d3ae8a22fad8b2]]
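If you are doing this programmatically, a better alternative to relying on kubectl output is to call the Kubernetes REST API method GET /api/v1/pods directly. The ways to access the API are also covered in the documentation.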
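As a rough illustration of the REST approach, here is a minimal Python sketch that scans the JSON returned by GET /api/v1/pods for containers whose terminated state reports an OOM kill. Several things here are assumptions rather than anything from the question: the function names `find_oom_killed` and `fetch_pods` are made up for this example, `fetch_pods` assumes `kubectl proxy` is running locally on its default port 8001 so no authentication is needed, and `SAMPLE` is a hand-written fragment modelled on the pod above, not real API output. The reason string is compared with spaces removed, because the docs example prints "OOM Killed" while the describe output in the question prints "OOMKilled".

```python
import json
import urllib.request


def find_oom_killed(pod_list):
    """Return (namespace, pod, container, finishedAt) for OOM-killed containers.

    pod_list is the decoded JSON body of GET /api/v1/pods.
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for cs in statuses:
            # A restarted container keeps the kill in lastState; a container
            # that has not been restarted yet reports it in state.
            for key in ("lastState", "state"):
                term = (cs.get(key) or {}).get("terminated")
                if not term:
                    continue
                reason = (term.get("reason") or "").replace(" ", "")
                if reason == "OOMKilled":
                    hits.append((meta.get("namespace"), meta.get("name"),
                                 cs.get("name"), term.get("finishedAt")))
    return hits


def fetch_pods(base_url="http://127.0.0.1:8001"):
    """Fetch all pods. Assumes `kubectl proxy` is listening on base_url."""
    with urllib.request.urlopen(base_url + "/api/v1/pods") as resp:
        return json.load(resp)


# Hand-written sample modelled on the pod in the question (not real API output).
SAMPLE = {
    "items": [{
        "metadata": {"name": "pnovotnak-manhole-123456789-82l2h",
                     "namespace": "test"},
        "status": {"containerStatuses": [{
            "name": "pnovotnak-manhole",
            "restartCount": 1,
            "state": {"running": {"startedAt": "2017-02-03T22:41:12Z"}},
            "lastState": {"terminated": {
                "exitCode": 137,
                "reason": "OOMKilled",
                "finishedAt": "2017-02-03T22:41:11Z",
            }},
        }]},
    }]
}

if __name__ == "__main__":
    for hit in find_oom_killed(SAMPLE):
        print(hit)
```

To run it against a live cluster, replace `SAMPLE` with `fetch_pods()`. Since this only sees the most recent termination per container, a poller would likely also need to track `restartCount` or `finishedAt` to avoid reporting the same kill twice.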