Tetragon eBPF for Kubernetes: The Verdict Is Out
Cilium Tetragon was created roughly a year ago to address not only some of Cilium’s shortcomings but also those of eBPF tools in general, both commercial and open source. One of the main issues has been the power consumption conundrum. While eBPF is relatively powerful with its observability and security capabilities — which start from deep within the kernel and extend across different environments — there are trade-offs to be considered. Tetragon was created to mitigate some of these challenges, especially by enhancing observability without compromising performance.
One of Tetragon’s key attributes is how it simplifies security observability. It integrates seamlessly with various libraries that connect well to observability tools like Grafana and other consoles. This streamlines monitoring, making it easier for teams to get insights across their environments.
Another significant, and arguably key, feature of Tetragon is its Kubernetes-native design, meaning it was built to be highly compatible with Kubernetes, which is quite different from traditional networking systems. This gives Tetragon the potential to become an especially powerful tool in Kubernetes environments, where its hooks and integration are particularly effective.
But now that Tetragon 1.0 has passed its one-year anniversary, has it proved its worth for monitoring and observability in esoteric Kubernetes environments, or not?
As Benyamin Hirschberg, CTO and co-founder of Kubernetes security provider ARMO, explained, Tetragon was created by the same team behind the Cilium project. That team is one of the leaders in eBPF technologies, and it extended the project’s network capabilities. “Cilium provided eBPF with runtime security detection and response capabilities also in eBPF,” Hirschberg said.
One of my takeaways of this #OSSummit talk is that #eBPF for developers is being abstracted away: eBPF: A New Era in Cloud Infrastructure Tools. @lizrice, @Isovalent; Frederic Branczyk, Polar Signals, Hemanth Malla, Datadog; Yusheng Zheng, EUNOMIA, and Richard Simon, T-Systems. pic.twitter.com/5UyteUtxKH
— BC Gain (@bcamerongain) September 16, 2024
Tetragon is geared for Kubernetes and cloud native environments, especially in terms of Kubernetes abstractions. But Tetragon is more than that, Liz Rice, head of open source at Isovalent, said during the Q&A session of the Open Source Summit panel discussion “eBPF: A New Era in Cloud Infrastructure Tools.” With Tetragon, the user can access security profiles that can be attached to different events and report on security blocks. “You can use Tetragon directly on a host; it doesn’t have to be used with containers or Kubernetes,” Rice said.
The Open Source
Tetragon consists of a user-space agent and a kernel component. As Thomas Graf, co-founder and CTO of Isovalent and vice president of cloud networking and security for Isovalent at Cisco, explained during a webcast when Tetragon was publicly announced, the hard work happens in eBPF in the kernel, enabling low overhead. The agent collects the data and interacts with logging or metrics systems. The real-time enforcement happens in the kernel, allowing the system to act more quickly than if the action had to be processed in user space.
As Tetragon is deeply embedded in the kernel, its integration at that level is a significant achievement, further solidifying its role in enhancing observability and security in modern cloud native environments. This deep kernel integration is not something to be overlooked and represents a major advance for tools of this kind.
The difference in performance overhead is due to Tetragon filtering events in the kernel. Traditional solutions handle all filtering in user space, causing higher overhead. This difference also shows in Transmission Control Protocol connect-request-response (TCP-CRR), a network benchmark where Tetragon outperforms traditional solutions by reducing latency and connection overhead.
Since Tetragon was designed to filter events against a security policy in-kernel, the data flowing from kernel to user space is reduced. This results in much higher performance, Liz Rice, head of open source at Isovalent, told The New Stack. “Tetragon also places its eBPF hooks deeper within the kernel’s internals, so that it’s not vulnerable to the TOCTOU attacks documented at DEFCON a couple of years ago,” Rice said. “Its vantage point within the kernel means that it has visibility over any application running on the same machine, without needing to modify, reconfigure or even restart the applications.”
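To make the filtering argument concrete, here is a toy Python model of the two designs. It is not Tetragon code and runs no eBPF; the event shape, the watched binaries and the function names are all assumptions made up for illustration. It only counts how many events would have to cross from kernel to user space when the policy check runs before the crossing rather than after it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A simplified process-execution event (hypothetical shape)."""
    pid: int
    binary: str


# Hypothetical policy: only executions of these binaries are of interest.
WATCHED_BINARIES = frozenset({"/bin/sh", "/usr/bin/nc"})


def matches_policy(event):
    return event.binary in WATCHED_BINARIES


def userspace_filtering(events):
    """Model of the traditional design: every event is copied to user
    space first, and the policy is applied there."""
    crossed = list(events)  # all events cross the boundary
    matched = [e for e in crossed if matches_policy(e)]
    return len(crossed), matched


def in_kernel_filtering(events):
    """Model of the in-kernel design: the policy is applied before the
    crossing, so only matching events are copied to user space."""
    matched = [e for e in events if matches_policy(e)]
    return len(matched), matched


def sample_events(n, seed=0):
    """Generate n synthetic events with a fixed seed for repeatability."""
    rng = random.Random(seed)
    binaries = ["/usr/bin/node", "/usr/bin/python3", "/bin/ls",
                "/bin/sh", "/usr/bin/nc"]
    return [Event(pid=1000 + i, binary=rng.choice(binaries)) for i in range(n)]


if __name__ == "__main__":
    events = sample_events(10_000)
    crossed_user, matched_user = userspace_filtering(events)
    crossed_kernel, matched_kernel = in_kernel_filtering(events)
    print(f"user-space filtering: {crossed_user} events crossed")
    print(f"in-kernel filtering:  {crossed_kernel} events crossed")
    print(f"same matches either way: {matched_user == matched_kernel}")
```

Both functions report the same matches; what differs is the volume of data that has to be moved to reach them, which is the overhead the benchmarks above are measuring.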
Now for Kubernetes: First, Tetragon, as the agent, connects to the Kubernetes API server and retrieves metadata to associate with what it observes in the kernel. This metadata extends to the namespace and pod. Tetragon was designed to detect whether a process is being run inside a pod, including the pod name, labels and all related details, Graf explained when it was released. It can also automatically trace whether the execution happened inside the container, or whether the kubelet executed it from outside by entering the pod’s namespace, which is what kubectl exec does, Graf said.
For example, a Splunk app connected to the Hubble UI can show how a compromised Node.js app invoked a reverse shell, leading to an internal Elasticsearch service being accessed through lateral movement.
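The general idea of pairing a kernel-level event with Kubernetes metadata can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of Tetragon’s internals: the `PodCache` class, the use of a container ID as the lookup key and every field name here are hypothetical. In a real agent the cache would be kept current from the Kubernetes API server; here it is filled by hand.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class PodInfo:
    """Metadata an agent might learn from the Kubernetes API server."""
    namespace: str
    name: str
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class ProcessEvent:
    """A process event as seen from the kernel side (hypothetical shape)."""
    pid: int
    binary: str
    container_id: Optional[str] = None  # None means a plain host process
    pod: Optional[PodInfo] = None       # filled in by enrichment


class PodCache:
    """Hypothetical lookup table from container ID to pod metadata."""

    def __init__(self):
        self._by_container: Dict[str, PodInfo] = {}

    def upsert(self, container_id, pod):
        self._by_container[container_id] = pod

    def remove(self, container_id):
        self._by_container.pop(container_id, None)

    def enrich(self, event):
        """Return a copy of the event with pod metadata attached, if any.
        Host processes and unknown containers come back with pod=None."""
        if event.container_id is None:
            return event
        return replace(event, pod=self._by_container.get(event.container_id))


if __name__ == "__main__":
    cache = PodCache()
    cache.upsert("abc123", PodInfo("shop", "checkout-7d9f", {"app": "checkout"}))

    in_pod = cache.enrich(ProcessEvent(4242, "/bin/sh", container_id="abc123"))
    on_host = cache.enrich(ProcessEvent(1, "/usr/lib/systemd/systemd"))

    print(in_pod.pod)   # pod metadata for the containerized process
    print(on_host.pod)  # None, since this process is not in a pod
```

The point of the design is that a consumer of the event stream sees a pod name and labels next to a process ID, without the application itself having to report anything.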
‘Tetragon came to a solution space that was already “occupied” by another famous CNCF project: Falco. They took a deliberate design decision and instead of feeding a user-space rule engine with all eBPF events, like Falco does, they are embedding their policy engine in the eBPF code. This strategy enables higher performance (no need to transfer high-volume events to the user space), but due to the limitations of an eBPF program, Tetragon’s policy language is not as rich as Falco’s. On the one hand, Tetragon is a very promising project. On the other hand, it fails some of the basic detection techniques Falco has supported for a long time.
Under the Hood
For the user, the levels of abstraction are such that when working with Datadog, Grafana, Polar Signals or another observability provider, Cilium, not to mention eBPF and its hooks, is running under the hood. This means the user doesn’t necessarily need to know about eBPF to take advantage of this much-talked-about technology.
In the case of Polar Signals, eBPF is used to build a profiler that shows the top-line number where CPU resources are being spent. However, the observability provider did not start out using eBPF for this function; it used a completely different mechanism to obtain this data. “That just goes to show that it happened to be the right tool for the job, but unless you dig in and try to understand how the profiler itself works, you’re not going to know that eBPF is being used under the hood,” Frederic Branczyk, founder and CEO of Polar Signals, said during the Open Source Summit Europe panel discussion cited above.