rootless-containers/bypass4netns

bypass4netns: Accelerator for slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD (Kernel 5.9)

bypass4netns is as fast as --net=host and almost as secure as traditional slirp4netns.

The current version of bypass4netns must be used in conjunction with slirp4netns. A future version may work without slirp4netns.

Benchmark

(Oct 16, 2020)

Workload: iperf3 -c HOST_IP from podman run

  • --net=host (insecure): 57.9 Gbps
  • bypass4netns: 56.5 Gbps
  • slirp4netns: 7.56 Gbps

How it works

bypass4netns eliminates the overhead of slirp4netns by trapping socket syscalls and executing them in the host network namespace using SECCOMP_IOCTL_NOTIF_ADDFD.

See also the talks.

Requirements

  • kernel >= 5.9
  • runc >= 1.1, or crun >= 1.6
  • libseccomp >= 2.5
  • Rootless Docker, Rootless Podman, or Rootless containerd/nerdctl

Build-time requirement:

  • golang >= 1.17
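
The kernel requirement above can be checked before installing. The following is a minimal sketch, not part of bypass4netns: the helper name kernelAtLeast is hypothetical, and it assumes the release string starts with "MAJOR.MINOR" as it does in /proc/sys/kernel/osrelease on typical Linux systems.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// kernelAtLeast reports whether a release string such as "5.15.0-91-generic"
// is at least major.minor. Only the first two numeric fields are compared.
// (Hypothetical helper for illustration.)
func kernelAtLeast(release string, major, minor int) (bool, error) {
	var gotMajor, gotMinor int
	if _, err := fmt.Sscanf(release, "%d.%d", &gotMajor, &gotMinor); err != nil {
		return false, fmt.Errorf("unrecognized kernel release %q: %w", release, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	// Assumption: running on Linux, where this file holds the kernel release.
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(raw))
	ok, err := kernelAtLeast(release, 5, 9)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("kernel %s satisfies >= 5.9: %v\n", release, ok)
}
```

This checks only the kernel version. runc, crun, and libseccomp versions still need to be verified separately.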

Compile

make
sudo make install

The following binaries will be installed into /usr/local/bin:

  • bypass4netns: the bypass4netns binary.
  • bypass4netnsd: an optional REST daemon for controlling bypass4netns processes from a non-initial network namespace. Used by nerdctl.

Usage

Hard way (docker|podman|nerdctl)

$ bypass4netns --ignore="127.0.0.0/8,10.0.0.0/8,auto" -p="8080:80"

--ignore=... is a list of the CIDRs that cannot be bypassed:

  • loopback CIDRs (127.0.0.0/8)
  • slirp4netns CIDR (10.0.0.0/8)
  • CNI CIDRs inside the slirp's network namespace (auto)

$ ./test/seccomp.json.sh >$HOME/seccomp.json
$ $DOCKER run -it --rm --security-opt seccomp=$HOME/seccomp.json --runtime=runc alpine

$DOCKER is either docker, podman, or nerdctl.

Easy way (nerdctl)

bypass4netns is experimentally integrated into nerdctl (>= 0.17.0).

containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run -it --rm -p 8080:80 --annotation nerdctl/bypass4netns=true alpine

NOTE: nerdctl prior to v2.0 needs --label instead of --annotation. Also, the syntax will probably be replaced with --security-opt or something like --network-opt in a future version of nerdctl.

⚠️ Caveats ⚠️

Accesses to host abstract sockets and host loopback IPs (127.0.0.0/8) from containers are designed to be rejected.

However, it is probably possible to connect to host loopback IPs by exploiting TOCTOU of struct sockaddr * pointers.
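
The TOCTOU (time-of-check to time-of-use) concern can be pictured with a simplified model. The sketch below is not bypass4netns code and does not touch seccomp: sharedAddr stands in for a struct sockaddr in container memory, and the between callback stands in for a concurrent container thread that rewrites the address after it has been checked. All names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// sharedAddr models a sockaddr living in memory the container controls:
// its contents can change at any moment, including between check and use.
type sharedAddr struct{ ip string }

func isLoopback(ip string) bool {
	parsed := net.ParseIP(ip)
	return parsed != nil && parsed.IsLoopback()
}

// checkThenUse validates the address through the pointer, then reads it
// again when acting. This is the racy pattern.
func checkThenUse(addr *sharedAddr, between func()) (string, error) {
	if isLoopback(addr.ip) { // time of check
		return "", errors.New("loopback rejected")
	}
	between()           // the container rewrites the address here
	return addr.ip, nil // time of use: second read through the pointer
}

// copyThenCheck reads the address once into a private copy and both checks
// and uses only that copy.
func copyThenCheck(addr *sharedAddr, between func()) (string, error) {
	snapshot := addr.ip
	if isLoopback(snapshot) {
		return "", errors.New("loopback rejected")
	}
	between()
	return snapshot, nil
}

func main() {
	addr := &sharedAddr{ip: "192.0.2.10"}
	swap := func() { addr.ip = "127.0.0.1" }

	used, _ := checkThenUse(addr, swap)
	fmt.Println("check-then-use connects to: ", used)

	addr.ip = "192.0.2.10"
	used, _ = copyThenCheck(addr, swap)
	fmt.Println("copy-then-check connects to:", used)
}
```

In this model the first call ends up using 127.0.0.1 even though the check passed, while the second keeps 192.0.2.10. Whether a private copy is enough in practice depends on which component performs the final read of the pointer, so treat this only as an illustration of the bug class.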

TODOs

  • Integration for Docker
  • Integration for Podman
  • Enable connections to port-forwarded ports from other containers
    • Currently, a container published with an option like -p 8080:80 cannot be reached on port 80 from other containers in the same network namespace.
  • Handle protocol-specific publish options like -p 8080:80/udp.
    • Currently, bypass4netns ignores the protocol in the publish option.
  • Bind the port when bypass4netns starts with a publish option like -p 8080:80
    • Currently, bypass4netns binds the socket to port 8080 only when it handles bind(2) with target port 80.
    • bind(2) can fail if another process binds port 8080 before the container's process binds port 80.
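
As a rough illustration of the publish-option items above, the following Go sketch parses a spec such as 8080:80 or 8080:80/udp and looks up the host port for a container-side bind(2). It is not taken from bypass4netns: portMapping, parsePublish, and hostPortFor are hypothetical names, and defaulting to tcp when no protocol is given is an assumption borrowed from the usual -p convention.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portMapping is a hypothetical representation of one -p HOST:CONTAINER entry.
type portMapping struct {
	HostPort      int
	ContainerPort int
	Protocol      string // "tcp" or "udp"; "tcp" is assumed when omitted
}

func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil || n < 1 || n > 65535 {
		return 0, fmt.Errorf("invalid port %q", s)
	}
	return n, nil
}

// parsePublish parses "HOST:CONTAINER" with an optional "/tcp" or "/udp" suffix.
func parsePublish(spec string) (portMapping, error) {
	m := portMapping{Protocol: "tcp"}
	if i := strings.IndexByte(spec, '/'); i >= 0 {
		m.Protocol = strings.ToLower(spec[i+1:])
		spec = spec[:i]
	}
	if m.Protocol != "tcp" && m.Protocol != "udp" {
		return portMapping{}, fmt.Errorf("unsupported protocol %q", m.Protocol)
	}
	parts := strings.Split(spec, ":")
	if len(parts) != 2 {
		return portMapping{}, fmt.Errorf("expected HOST:CONTAINER, got %q", spec)
	}
	var err error
	if m.HostPort, err = parsePort(parts[0]); err != nil {
		return portMapping{}, err
	}
	if m.ContainerPort, err = parsePort(parts[1]); err != nil {
		return portMapping{}, err
	}
	return m, nil
}

// hostPortFor returns the host port to use when the container binds
// containerPort with the given protocol; ok is false if nothing matches.
func hostPortFor(mappings []portMapping, containerPort int, protocol string) (hostPort int, ok bool) {
	for _, m := range mappings {
		if m.ContainerPort == containerPort && m.Protocol == protocol {
			return m.HostPort, true
		}
	}
	return 0, false
}

func main() {
	m, err := parsePublish("8080:80/udp")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
	port, ok := hostPortFor([]portMapping{m}, 80, "udp")
	fmt.Println(port, ok)
}
```

Keeping the protocol in the lookup key means a tcp bind on port 80 would not match a udp-only mapping, which is the distinction the protocol-specific TODO asks for.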

Publications