KOBE is a benchmarking system that leverages Docker and Kubernetes to reproduce experiments of federated query processing over collections of data sources.
In the SPARQL query processing community, as well as in the wider databases community, benchmark reproducibility is based on releasing datasets and query workloads. However, this paradigm breaks down for federated query processors, as these systems do not manage the data they serve to their clients but provide a data-integration abstraction over the actual query processors that are in direct contact with the data.
The KOBE benchmarking engine aims to provide a generic platform for benchmarking and experimentation that is reproducible across different environments. It was designed with the following objectives in mind:
- to allow benchmark and experiment specifications to be reproduced in different environments and to produce comparable and reliable results;
- to ease the deployment of complex benchmarking experiments by automating the tedious tasks of initialization and execution.
KOBE requires the following:
- Kubernetes >= 1.8.0
- kubectl configured for the Kubernetes cluster
- Helm version 3 (for the Evaluation Metrics Extraction subsystem)
- nfs-common installed in the nodes of the cluster
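As a quick sanity check for the command-line prerequisites, a small shell helper can report which tools are not yet on the PATH. This is a minimal sketch and not part of KOBE: the function name missing_tools is hypothetical, and it only checks that the commands exist, not their versions or the cluster configuration.

```shell
# missing_tools TOOL...
# Prints the name of every listed command that cannot be found on the PATH.
# (Hypothetical helper, not shipped with KOBE.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: missing_tools kubectl helm
```

An empty output means every listed tool was found.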
Note: The following instructions were tested on Debian 12. Minor adjustments may be necessary for installation on other Linux distributions or operating systems.
curl -LO "https://dl.k8s.io/release/v1.20.7/bin/linux/amd64/kubectl"
curl -LO "https://dl.k8s.io/release/v1.20.7/bin/linux/amd64/kubectl.sha256"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
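The checksum file fetched by the second command is not used by the commands above. A minimal sketch of how the binary could be checked before running the install command, assuming sha256sum from GNU coreutils is available (verify_checksum is a hypothetical helper name, not a KOBE or Kubernetes tool):

```shell
# verify_checksum FILE CHECKSUM_FILE
# Succeeds only if the SHA-256 digest of FILE equals the digest stored in
# the first field of the first line of CHECKSUM_FILE.
# The body runs in a subshell so its variables do not leak to the caller.
verify_checksum() (
    expected=$(awk 'NR == 1 {print $1}' "$2") || exit 1
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
)

# Example, from the directory holding both downloads:
# verify_checksum kubectl kubectl.sha256 && echo "kubectl: checksum OK"
```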
If you are not using an existing Kubernetes cluster, you can quickly set up a local environment for testing and development using Minikube:
Download and install Minikube on your system:
curl -LO https://storage.googleapis.com/minikube/releases/v1.21.0/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
Install and set up Docker:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
Start Minikube with the Docker driver:
minikube start --driver=docker
kubectl cluster-info
sudo apt-get install nfs-common
KOBE requires its Kubernetes operator to be installed in the cluster. To install the KOBE operator quickly, use the kobectl script found in the bin directory:
export PATH=`pwd`/bin:$PATH
kobectl install operator .
If you are using Kubernetes version 1.15 or below, use the following instead:
kobectl install operator-v1beta1
Alternatively, you could run the following commands:
kubectl apply -f operator/deploy/crds
kubectl apply -f operator/deploy/service_account.yaml
kubectl apply -f operator/deploy/clusterrole.yaml
kubectl apply -f operator/deploy/clusterrole_binding.yaml
kubectl apply -f operator/deploy/operator.yaml
For Kubernetes version 1.15 or below, replace
kubectl apply -f operator/deploy/crds
with
kubectl apply -f operator/deploy/crds-v1beta1
You will get a confirmation message for each resource that has been created successfully. This starts the operator in your Kubernetes cluster and needs to be done only once.
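The choice between the two CRD directories can be scripted. The sketch below assumes GNU sort (for the -V version ordering) and that you pass the Kubernetes version yourself; version_ge and crd_dir are hypothetical helper names, not part of kobectl.

```shell
# version_ge A B: succeeds if version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# crd_dir VERSION: prints the CRD directory matching a Kubernetes version
# such as 1.20.7 or v1.15.12 (a leading "v" is ignored).
# Versions 1.15 and below get the v1beta1 definitions.
crd_dir() {
    if version_ge "${1#v}" 1.16; then
        echo operator/deploy/crds
    else
        echo operator/deploy/crds-v1beta1
    fi
}

# Example: kubectl apply -f "$(crd_dir 1.20.7)"
```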
KOBE uses Istio to support network delays between the different deployments. To install Istio, first define the version:
export ISTIO_VERSION=1.11.3
then deploy Istio:
kobectl install istio .
Alternatively, consult the official installation guide or run the following commands:
curl -L https://istio.io/downloadIstio | sh -
./istio-*/bin/istioctl manifest apply --set profile=default
KOBE uses Helm to simplify the management of dependencies within Kubernetes environments. To install Helm on your system, run:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
To enable the evaluation metrics extraction subsystem, run
kobectl install efk .
or, alternatively, the following commands:
helm repo add elastic https://helm.elastic.co
helm repo add kiwigrid https://kiwigrid.github.io
helm install elasticsearch elastic/elasticsearch --set persistence.enabled=false --set replicas=1 --version 7.6.2
helm install kibana elastic/kibana --set service.type=NodePort --version 7.6.2
helm install fluentd kiwigrid/fluentd-elasticsearch -f operator/deploy/efk-config/fluentd-values.yaml --version 8.0.1
kubectl apply -f operator/deploy/efk-config/kobe-kibana-configuration.yaml
These commands result in the simplest setup: a single-node Elasticsearch that does not persist data across pod recreation, a Fluentd DaemonSet, and a Kibana node that exposes a NodePort.
After all pods are in the Running state, the Kibana dashboards can be accessed at
http://<NODE-IP>:<NODEPORT>/app/kibana#/dashboard/
where <NODE-IP> is the IP of any of the Kubernetes worker nodes and <NODEPORT> is the result of
kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services kibana-kibana
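The two lookups can be combined into one helper that prints the dashboard URL. This is a sketch under stated assumptions: kibana_url is a hypothetical name, the kibana-kibana service is assumed to live in the current namespace, and the first node's InternalIP is assumed to be reachable from your machine (on Minikube, the output of minikube ip may be the address you need instead).

```shell
# kibana_url: prints the Kibana dashboard URL for the current cluster.
# Assumes the first node listed by kubectl accepts NodePort traffic.
kibana_url() (
    node_ip=$(kubectl get nodes \
        -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
    node_port=$(kubectl get -o jsonpath="{.spec.ports[0].nodePort}" services kibana-kibana)
    echo "http://${node_ip}:${node_port}/app/kibana#/dashboard/"
)

# Example: kibana_url
```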
The setup can be customized by changing the configuration parameters of each helm chart. Please check the corresponding documentation of each chart for more info.
To ensure compatibility, we recommend using the following versions of the dependencies:
- Kubernetes = v1.20.7
- Minikube = v1.21.0 (if used)
- Istio = v1.11.3
- Helm = v3
- Elasticsearch = 7.6.2
- Kibana = 7.6.2
- Fluentd = 8.0.1
These versions have been tested and verified to work together.
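To keep the download commands in line with these versions, the version numbers can be held in variables and the URLs derived from them. The sketch below reuses the URL patterns from the installation commands earlier in this document; the variable and function names are hypothetical, and it assumes that binaries for architectures other than amd64 are published under the same pattern.

```shell
# Recommended versions of the two downloaded binaries.
KUBECTL_VERSION=v1.20.7
MINIKUBE_VERSION=v1.21.0

# kubectl_url VERSION ARCH: download URL for the kubectl binary.
kubectl_url() {
    echo "https://dl.k8s.io/release/$1/bin/linux/$2/kubectl"
}

# minikube_url VERSION ARCH: download URL for the Minikube binary.
minikube_url() {
    echo "https://storage.googleapis.com/minikube/releases/$1/minikube-linux-$2"
}

# Example (Debian):
# curl -LO "$(kubectl_url "$KUBECTL_VERSION" "$(dpkg --print-architecture)")"
```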
The typical workflow of defining a KOBE experiment is the following.
- Create one DatasetTemplate for each dataset server you want to use in your benchmark.
- Define your Benchmark, which should contain a list of datasets and a list of queries.
- Create one FederatorTemplate for the federator engine you want to use in your experiment.
- Define an Experiment over your previously defined benchmark.
Several examples of the above specifications can be found in the examples directory.
In the following, we show the steps for deploying an experiment on a simple benchmark that comprises three queries over a Semagrow federation of two Virtuoso endpoints.
You can use the kobectl script found in the bin directory for controlling your experiments:
export PATH=`pwd`/bin:$PATH
kobectl help
First, apply the templates for Virtuoso and Semagrow:
kobectl apply examples/dataset-virtuoso/virtuosotemplate.yaml
kobectl apply examples/federator-semagrow/semagrowtemplate.yaml
Then, apply the benchmark.
kobectl apply examples/benchmark-toybench/toybench-simple.yaml
Before running the experiment, you should verify that the datasets are loaded. Use the following command:
kobectl show benchmark toybench-simple
When the datasets are loaded, you should get the following output:
NAME STATUS
toy1 Running
toy2 Running
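Instead of re-running the command by hand, the check can be polled from a script. This is a minimal sketch that assumes the output has exactly the two-column NAME/STATUS layout shown above; all_running and wait_for_benchmark are hypothetical helpers, not kobectl subcommands.

```shell
# all_running: reads a NAME/STATUS table on stdin and succeeds only if there
# is at least one data row and every row's second column is "Running".
all_running() {
    awk 'NR > 1 && NF > 0 { rows++; if ($2 != "Running") bad++ }
         END { if (rows > 0 && bad == 0) exit 0; exit 1 }'
}

# wait_for_benchmark NAME [TRIES] [DELAY_SECONDS]
# Polls `kobectl show benchmark NAME` until all datasets are Running.
# Returns non-zero if they are still not Running after TRIES attempts.
wait_for_benchmark() (
    name=$1
    tries=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if kobectl show benchmark "$name" | all_running; then
            exit 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    exit 1
)

# Example: wait_for_benchmark toybench-simple && echo "datasets loaded"
```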
Proceed now with the execution of the experiment:
kobectl apply examples/experiment-toybench/toyexp-simple.yaml
As before, you can review the state of the experiment with the following command:
kobectl show experiment toyexp-simple
You can now view the evaluation metrics in the Kibana dashboards.
To remove all of the above, issue the following commands:
kobectl delete experiment toyexp-simple
kobectl delete benchmark toybench-simple
kobectl delete federatortemplate semagrowtemplate
kobectl delete datasettemplate virtuosotemplate
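When cleaning up after a partially completed run, some of these resources may not exist, so it can help to keep going after a failed delete. The wrapper below is a sketch of that idea; kobe_teardown is a hypothetical name, and the resource names are the ones used in this example.

```shell
# kobe_teardown: deletes the example resources in reverse order of creation.
# It continues when a delete fails (e.g. the resource was never created)
# and returns non-zero if any step failed.
kobe_teardown() {
    rc=0
    kobectl delete experiment toyexp-simple           || rc=1
    kobectl delete benchmark toybench-simple          || rc=1
    kobectl delete federatortemplate semagrowtemplate || rc=1
    kobectl delete datasettemplate virtuosotemplate   || rc=1
    return "$rc"
}

# Example: kobe_teardown || echo "some resources could not be deleted"
```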
For more advanced control options for KOBE, use kubectl.
To remove KOBE from your cluster, run the following command:
kobectl purge .
To remove the KOBE operator manually, run
kubectl delete -f operator/deploy/operator.yaml
kubectl delete -f operator/deploy/role.yaml
kubectl delete -f operator/deploy/clusterrole_binding.yaml
kubectl delete -f operator/deploy/clusterrole.yaml
kubectl delete -f operator/deploy/service_account.yaml
kubectl delete -f operator/deploy/crds
To remove Istio manually, run
./istio-*/bin/istioctl manifest generate --set profile=default | kubectl delete -f -
kubectl delete namespace istio-system
To remove the evaluation metrics extraction subsystem manually, run
helm uninstall elasticsearch
helm uninstall kibana
helm uninstall fluentd
helm repo remove elastic
helm repo remove kiwigrid
kubectl delete jobs.batch kobe-kibana-configuration
kubectl delete configmaps kobe-kibana-config
and then, on each Kubernetes node, run
rm -rf /var/log/fluentd-buffers/
rm /var/log/containers.log.pos