Why do control plane CNFs hate isolcpus?

Why can't some containers use more than one CPU core?

A little bit of background:

People who have worked on OpenStack-based telco clouds might know the kernel boot parameter isolcpus (often written as "isolcpu"). It is one of the features that let latency-sensitive VNFs run on compute nodes. isolcpus isolates the listed workload CPUs from the operating system's housekeeping tasks and from the scheduler's load balancing, so that workloads pinned to them can run mostly uninterrupted. (There is more to it; this keeps it simple.)
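To see which CPUs are isolated on a given node, the kernel's CPU list format can be expanded with a few lines of Python. This is a minimal sketch: the helper names `parse_cpu_list` and `isolated_cpus` are made up for illustration, it assumes a Linux node that exposes `/sys/devices/system/cpu/isolated`, and it handles only plain lists and ranges such as `2-5,8`, not the stride syntax the kernel also accepts.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel-style CPU list such as "2-5,8" into [2, 3, 4, 5, 8]."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def isolated_cpus(path: str = "/sys/devices/system/cpu/isolated") -> list[int]:
    """Return the CPUs the kernel reports as isolated, or [] if the file is absent."""
    p = Path(path)
    return parse_cpu_list(p.read_text()) if p.exists() else []


if __name__ == "__main__":
    print("isolated CPUs:", isolated_cpus())
```

An empty list means either that no CPUs are isolated or that the file is not available on that system.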

Figure: OpenStack compute node

Some telco operators use this isolation to run control plane VNFs and data plane VNFs on the same compute node, making capacity planning easier and improving overall compute utilization.

What does it have to do with CNFs?

Many network functions are now containerized, but their requirements have not changed much: they still need exclusive, pinned CPUs that are isolated from one another. So isolcpus is still used by many Kubernetes distributions to provide that isolation for data plane network functions.

So, what would happen if control plane pods run on a worker node with isolcpus?

The short answer is that control plane containers cannot scale beyond one core, irrespective of the number of cores allocated. Even at peak utilization, all of a container's processes share a single core, usually the first core in its allocated set.

Figure: Kubernetes pod

How does this look in practice?

Figure: pod running on a node with isolcpus and exclusive CPUs [all 3 processes sharing the same core]

Figure: the same pod running on a node without isolcpus [all 3 processes on different cores]

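The same observation can be made without screenshots by asking the kernel which CPU each thread of a process last ran on. The sketch below reads field 39 (`processor`) of `/proc/<pid>/task/<tid>/stat`; the function names are hypothetical, and it assumes a Linux host with `/proc` mounted. Running `ps -eLo pid,tid,psr,comm` shows the same information in the `psr` column.

```python
import os


def last_cpu_from_stat(stat_line: str) -> int:
    """Extract the 'processor' field (field 39) from a /proc stat line."""
    # Field 2 (comm) may contain spaces or parentheses, so split after the last ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field 39 is rest[36].
    return int(rest[36])


def thread_cpus(pid: int) -> dict[int, int]:
    """Map each thread ID of `pid` to the CPU it last ran on."""
    result: dict[int, int] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                result[int(tid)] = last_cpu_from_stat(f.read())
        except FileNotFoundError:
            continue  # the thread exited while we were reading
    return result


if __name__ == "__main__":
    print(thread_cpus(os.getpid()))
```

If every busy thread of a multi-core container reports the same CPU while the load is high, the container is most likely hitting the behavior described here.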
Is this a bug?

It's not a bug! With isolcpus, the kernel removes the specified cores from the scheduler's load balancing. A process runs on those cores only when its CPU affinity places it there, and the kernel will not move it from one isolated core to another. So, whether the container has 2 or 20 cores, its processes stay on a single core unless something pins them elsewhere. This is by design.
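The rule can be written down as a small decision function. This is a deliberately simplified model of the behavior described above, not the kernel's actual logic; the function name and the returned labels are invented for illustration.

```python
def balancing_outlook(allowed: set[int], isolated: set[int]) -> str:
    """Classify a container's CPU set against the node's isolated CPUs.

    Simplified model (an assumption, not kernel code): the scheduler
    load-balances only across CPUs that are NOT isolated.
    """
    if not allowed:
        raise ValueError("allowed CPU set must not be empty")
    balanced = allowed - isolated
    if balanced == allowed:
        return "balanced"          # no isolated CPUs involved
    if balanced:
        return "partial"           # balancing only across the non-isolated CPUs
    if len(allowed) == 1:
        return "pinned"            # one isolated CPU: nothing to balance anyway
    return "single-core risk"      # several isolated CPUs, no kernel balancing


if __name__ == "__main__":
    # A container given exclusive CPUs 4-7 on a node booted with CPUs 2-15 isolated.
    print(balancing_outlook({4, 5, 6, 7}, set(range(2, 16))))
```

The "single-core risk" case is the one a control plane container lands in when it is given several exclusive CPUs on an isolated node and does not set thread affinity itself.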

What about data plane CNFs? How can they scale to more than one core?

The differentiator is DPDK: it does its own scheduling and load balancing and doesn't rely on the kernel for packet processing. People who have worked on DPDK vSwitch/vRouter/vApps know that you must specify the CPU affinity explicitly, so each worker thread is pinned to its own core and never needs the kernel to move it.
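Explicit affinity is not specific to DPDK; any application can pin its own threads. The sketch below is not DPDK code. It only illustrates the same idea in Python on Linux, with a hypothetical `pin_workers` helper that starts one thread per CPU and has each thread pin itself with `os.sched_setaffinity`.

```python
import os
import threading


def pin_workers(cpus: list[int]) -> dict[int, set[int]]:
    """Start one worker thread per CPU, pin each to its CPU, and
    return the affinity mask each worker observed after pinning."""
    observed: dict[int, set[int]] = {}
    lock = threading.Lock()

    def worker(cpu: int) -> None:
        # On Linux, pid 0 refers to the calling thread, so this pins
        # only this worker and leaves the other threads untouched.
        os.sched_setaffinity(0, {cpu})
        with lock:
            observed[cpu] = set(os.sched_getaffinity(0))
        # A real worker would run its processing loop here.

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


if __name__ == "__main__":
    available = sorted(os.sched_getaffinity(0))
    print(pin_workers(available[:2]))
```

Because every worker names its core explicitly, the result does not depend on the kernel's load balancing, which is why this pattern keeps working on isolated cores.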

So this is neither a Kubernetes issue nor a container runtime issue. It is simply how things work when Kubernetes CPU pinning and isolcpus are used together. Again, it's not a bug; it's by design.

How can this be avoided in a telco cloud if you have to run data plane and control plane CNFs on the same node?

  • The obvious and most straightforward solution is to create different worker nodes for data plane and control plane network functions.
  • Some OCI runtimes support container-runtime-specific implementations. However, these require extensive testing with the vendors and pod spec changes particular to that runtime, which won't work if the runtime is different.
  • Replace isolcpus with another isolation technique:

isolcpus has its tradeoffs; hence, some Linux distributions also support cpuset cgroup-based isolation (some are even trying to deprecate isolcpus). There are many articles explaining how to use cpuset-based isolation and how it compares with isolcpus.

Every approach has its pros and cons, but these are some of the solutions that I have considered. If you are aware of a different approach, it would be interesting to discuss it.

NB: If the container CPU resources are in milli-cores or the containers are not expected to scale to more than one core, you won't notice this.

chinasubbareddy mallavarapu

Solution Architect at Ericsson | Telco 5G Core | Cloud Native infrastructure(CNIS) | Kubernetes for Telco| Cloud RAN

1y

Good insight, thanks Ananth

KanakaDurgarao Vemana

Mobility Core | NFVi | Cloud Native | Hyper Scalers | Solution Design

1y

Curious to know the behavior in the case of data plane CNFs with SR-IOV ports.

ALUMBWAGE MCHILO

NFVI (VMWARE/Openstack) /SDN/ IMS Core/CS Core/ VNFs

1y

Interesting stuff. I need to widen my skills into containers now.

Asad Khan

Systems Solution Architect @ Ericsson | TelcoCloud Trainer, NFV Professional

1y

Good explanation Ananth
