BPF Go program in Kubernetes
Wed 17 November 2021
BPF opens a lot of possibilities of making observability tools running in Kubernetes.
One can start with BCC libbpf-tools written in C,
e.g., launch tcpconnlat program and process its stdout with another program to
detect cases when it took too long to establish a TCP connection.
For example, curl http://example.com
took 47.8 milliseconds to establish a connection
where source address is 10.0.2.15
and destination address is 93.184.216.34
.
PID COMM IP SADDR DADDR DPORT LAT(ms)
21500 curl 4 10.0.2.15 93.184.216.34 80 47.80
Another option is to use Go version of tcpconnlat and modify it as you wish, e.g., write the events into Kafka for further analysis.
I have already published a Docker image marselester/go-libbpf-tools containing tcpconnlat and tried to run it on Mac, that didn't go well though.
﹪ docker run --rm -it --privileged marselester/go-libbpf-tools:latest bash
root@552a159ce901:/opt/libbpf-tools# ./tcpconnlat
failed to load BPF programs and maps: field TcpRcvStateProcess: program tcp_rcv_state_process: CO-RE relocations: no BTF for kernel version 5.10.47-linuxkit: not supported
Minikube with Virtualbox driver didn't help either because it uses an old kernel, hopefully it will be upgraded soon #10501.
﹪ minikube start --driver=virtualbox
﹪ minikube ssh
﹩ uname -nr
Linux minikube 4.19.202
Luckily there is another option called Kubespray.
Kubespray
Kubespray sets up a Kubernetes cluster of 3 nodes using Vagrant and Ansible. Clone the repository and install Python dependencies for provisioning tasks.
﹪ git clone https://github.com/kubernetes-sigs/kubespray
﹪ cd ./kubespray/
﹪ virtualenv venv
﹪ source ./venv/bin/activate
(venv) ﹪ pip install -r requirements.txt
From looking at the Vagrantfile we see that Kubespray supports Fedora Linux 34, so BTF and CO-RE technologies should be there.
﹪ mkdir vagrant
﹪ echo '$os = "fedora34"' > ./vagrant/config.rb
﹪ vagrant up
﹪ export KUBECONFIG=$(pwd)/.vagrant/provisioners/ansible/inventory/artifacts/admin.conf
﹪ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-1 Ready control-plane,master 8m20s v1.22.3
k8s-2 Ready control-plane,master 7m56s v1.22.3
k8s-3 Ready <none> 6m58s v1.22.3
﹪ vagrant ssh k8s-1
﹩ uname -nr
k8s-1 5.11.12-300.fc34.x86_64
See vagrant.md if the cluster wasn't provisioned.
DaemonSet
An observability tool should run on each node, and a DaemonSet ensures that all nodes run a copy of a pod. Let's try to launch tcpconnlat on all 3 nodes.
﹪ kubectl apply -f - <<<'
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: tcpconnlat-daemon
spec:
selector:
matchLabels:
app: tcpconnlat
template:
metadata:
labels:
app: tcpconnlat
spec:
containers:
- name: libbpf-tools
image: marselester/go-libbpf-tools:latest
command:
- /opt/libbpf-tools/tcpconnlat
'
Unfortunately pods have crashed because the containers didn't have privileged mode.
﹪ kubectl get pods
NAME READY STATUS RESTARTS AGE
tcpconnlat-daemon-646h2 0/1 CrashLoopBackOff 5 (2m9s ago) 5m13s
tcpconnlat-daemon-hwnzt 0/1 CrashLoopBackOff 5 (118s ago) 5m13s
tcpconnlat-daemon-tmn6r 0/1 CrashLoopBackOff 5 (2m14s ago) 5m13s
﹪ kubectl logs -f tcpconnlat-daemon-646h2
failed to set temporary RLIMIT_MEMLOCK: operation not permitted
Let's enable it.
By default a container is not allowed to access any devices on the host, but a "privileged" container is given access to all devices on the host. This allows the container nearly all the same access as processes running on the host.
https://kubernetes.io/docs/concepts/policy/pod-security-policy/#privileged
﹪ kubectl apply -f - <<<'
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: tcpconnlat-daemon
spec:
selector:
matchLabels:
app: tcpconnlat
template:
metadata:
labels:
app: tcpconnlat
spec:
containers:
- name: libbpf-tools
image: marselester/go-libbpf-tools:latest
command:
- /opt/libbpf-tools/tcpconnlat
securityContext:
privileged: true
'
It works! 👇
﹪ kubectl get pods
NAME READY STATUS RESTARTS AGE
tcpconnlat-daemon-9sgc5 1/1 Running 0 18s
tcpconnlat-daemon-lrvh5 1/1 Running 0 18s
tcpconnlat-daemon-th9k4 1/1 Running 0 18s
﹪ kubectl logs -f tcpconnlat-daemon-9sgc5
PID COMM IP SADDR DADDR DPORT LAT(ms)
5955 coredns 4 127.0.0.1 127.0.0.1 8080 0.02
703 kubelet 4 172.18.8.101 172.18.8.101 6443 0.04
703 kubelet 4 10.233.64.1 10.233.64.4 8181 0.07
703 kubelet 4 169.254.25.10 169.254.25.10 9254 0.03
703 kubelet 4 10.233.64.1 10.233.64.5 8080 0.03
5537 node-cache 4 169.254.25.10 169.254.25.10 9254 0.06
4204 etcd 4 172.18.8.101 172.18.8.103 2380 0.22
4204 etcd 4 172.18.8.101 172.18.8.102 2380 0.21
I also tried to run execsnoop and tcpconnect, but alas they crashed with the corresponding errors.
failed to attach the BPF program to sys_enter_execve tracepoint: trace event syscalls/sys_enter_execve: file does not exist
failed to load BPF programs and maps: field TcpV4ConnectRet: program tcp_v4_connect_ret: load program: permission denied: trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.
Category: Go Tagged: bpf golang kubernetes