Milestone 5

TASKS:

  • Deploying our services across both the IU and TACC instances of Jetstream (using Kubespray).
  • Installing Istio (service mesh) on both IU & TACC.
  • Modifying the Jenkins pipeline to deploy the latest versions to both IU & TACC (to keep both in sync).
  • Switching between the TACC and IU VMs with no disruption of service on network failure, using "blue-green" deployments.
  • Testing failover.

Deploying a Kubernetes cluster on TACC & creating a new instance for HAProxy

The diagram below shows our Jetstream deployment, where both IU & TACC have a Kubernetes cluster installed. The only difference is that IU holds two extra instances: HAProxy & Jenkins.
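
For reference, standing up each cluster with Kubespray follows roughly the steps below; this is a sketch of the standard Kubespray quickstart, and the inventory name and node IPs are placeholders rather than our actual hosts.

  # Clone Kubespray and install its Ansible requirements
  git clone https://github.com/kubernetes-sigs/kubespray.git &&
  cd kubespray &&
  pip install -r requirements.txt &&
  # Build an inventory from the node IPs ("jetstream-cluster" and the IPs are placeholders)
  cp -rfp inventory/sample inventory/jetstream-cluster &&
  declare -a IPS=(10.0.0.11 10.0.0.12 10.0.0.13) &&
  CONFIG_FILE=inventory/jetstream-cluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]} &&
  # Run the playbook against all nodes to bring up the Kubernetes cluster
  ansible-playbook -i inventory/jetstream-cluster/hosts.yaml --become --become-user=root cluster.yml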

Deploying Service Mesh (Istio)

  1. Injects a sidecar proxy into each application pod
  2. Connects the control plane (istiod) with the data plane (the pods with sidecar proxies)

We use the following set of commands to install & deploy Istio and to deploy the Kiali dashboard, converting its service from ClusterIP to LoadBalancer.

  # Download Istio (pinned to 1.9.3 so the cd below matches the extracted directory)
  curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.9.3 sh - &&
  cd istio-1.9.3/ &&
  export PATH=$PWD/bin:$PATH &&
  istioctl install -y &&
  # Enable automatic sidecar injection for the default namespace
  kubectl label namespace default istio-injection=enabled &&
  cd .. &&
  # Recreate our services so their pods come back up with the sidecar injected
  kubectl delete -f PingIntelligence/ &&
  kubectl apply -f PingIntelligence/ &&
  cd PingIntelligence/ &&
  # Copy our customized kiali.yaml (ClusterIP -> LoadBalancer) into the addons
  git checkout automation-script &&
  cp ./kiali.yaml ../istio-1.9.3/samples/addons/ &&
  git checkout kubernetes_files &&
  cd .. &&
  # Deploy Kiali and the other bundled addons
  kubectl apply -f istio-1.9.3/samples/addons/
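
After the install, a quick sanity check (a sketch, not part of our pipeline) confirms that istiod and the addons are running and that the application pods carry the istio-proxy sidecar:

  # The Istio control plane, Kiali, and the other addons should all be Running
  kubectl get pods -n istio-system &&
  # Application pods should now report 2/2 containers (the app plus istio-proxy)
  kubectl get pods -n default &&
  # Flag common mesh configuration problems
  istioctl analyze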

Here is a snapshot of the Kiali dashboard:

The Kiali dashboard is accessible at the URLs below:

  1. http://149.165.156.145:32001/
  2. http://129.114.16.125:32001/
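
Port 32001 is the node port carried by our modified Kiali Service (a LoadBalancer Service also allocates a node port). For reference, the same exposure could be achieved by patching the stock Service directly; the command below is a sketch of the idea, not the exact contents of our kiali.yaml:

  # Expose the Kiali dashboard on every node at port 32001
  kubectl -n istio-system patch svc kiali --type json -p '[
    {"op": "replace", "path": "/spec/type", "value": "NodePort"},
    {"op": "add", "path": "/spec/ports/0/nodePort", "value": 32001}
  ]'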

Blue-Green Deployment

To achieve blue-green deployment, we created bash scripts that use Kubernetes & Istio to determine the status of each deployment. The script files deployed on each instance can be found at the links below:

  1. IU Kubernetes Master Node
  2. TACC Kubernetes Master Node
  3. HAproxy Node

The monitor script, present on "149.165.172.138", runs continuously and watches for any failure in IU (the blue deployment). If it detects one, the database state is extracted and loaded onto the TACC server (the green deployment), and HAProxy then redirects traffic from IU's master node to TACC's master node. Similarly, if any failure occurs on TACC, control switches back from TACC to IU.
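
The exact scripts are linked above; as a rough illustration of that control flow, the monitor loop looks approximately like the sketch below. The hostnames, health endpoint, and database commands are placeholders, not our real values.

  #!/bin/bash
  # Sketch of the failover monitor running on the HAProxy node (placeholder values)
  BLUE=iu-master.example.org      # blue deployment (IU)
  GREEN=tacc-master.example.org   # green deployment (TACC)

  while true; do
    # Probe the blue deployment; any failed response counts as an outage
    if ! curl -sf --max-time 5 "http://$BLUE/healthz" > /dev/null; then
      # Extract the database state from blue and load it onto green
      # (placeholder: the actual dump/restore tooling depends on the service)
      ssh "$BLUE" "mysqldump --all-databases" | ssh "$GREEN" "mysql"
      # Switch the HAProxy backend from IU to TACC and reload HAProxy
      sudo sed -i "s/$BLUE/$GREEN/" /etc/haproxy/haproxy.cfg
      sudo systemctl reload haproxy
    fi
    sleep 10
  done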

Hence, at any given point in time, both the IU and TACC deployments are up & running. The HAProxy URL to access the service is: http://149.165.172.138/

Assumption: the IU/TACC clusters may go down, but the HAProxy server won't fail.

Control Flow:

Advantages of using the HAProxy server

  • Free and open-source.
  • Provides a high-availability load balancer and proxy server across multiple servers.
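
For reference, this redirection can also be expressed in /etc/haproxy/haproxy.cfg as a primary/backup server pair. The fragment below is only a sketch (the backend port and health check are assumptions): the IU master serves traffic and the TACC master takes over when IU fails its checks; our monitor script achieves the same switch by rewriting the active server and reloading HAProxy.

  # Illustrative fragment of /etc/haproxy/haproxy.cfg (not our exact configuration)
  frontend www
      bind *:80
      default_backend kube_masters

  backend kube_masters
      option httpchk GET /
      # IU master serves traffic; the TACC master only takes over if IU fails its checks
      server iu   149.165.156.145:80 check
      server tacc 129.114.16.125:80 check backup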

Challenges faced:

  • Configuring HAProxy (port conflicts with Nginx)
  • Extracting & loading DB states
  • Identifying the reasons for failover
  • Building a shell script for continuous monitoring on the HAProxy server

The HAProxy status can be monitored from the HAProxy dashboard using the link.

Username: ubuntu

Password: ubuntu
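
The stats page itself is enabled by a small listen section in haproxy.cfg; the fragment below is a sketch using the credentials above (the port, 8404, and the URI are assumptions):

  # Illustrative stats section of /etc/haproxy/haproxy.cfg
  listen stats
      mode http
      bind *:8404
      stats enable
      stats uri /stats
      stats auth ubuntu:ubuntu
      stats refresh 10s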

Below is a snapshot of the dashboard:

Below are the links we referred to in order to understand the possible disruptions, which we tried to address using the scripts mentioned above:

  1. https://kubernetes.io/docs/concepts/workloads/pods/disruptions/
  2. https://kubernetes.io/docs/concepts/architecture/nodes/#condition
  3. https://istio.io/latest/docs/ops/diagnostic-tools/proxy-cmd/
  4. http://www.haproxy.org/

Another alternative we considered was the Keepalived package. Its advantage is that it does not require additional instances, and redirection happens automatically (without the need to write a script). Redirection relies on the following mechanisms: VIP (Virtual IP), VRRP (Virtual Router Redundancy Protocol), and heartbeat (to elect the next master node if the current master fails). The master node is elected from a set of backup nodes based on the configured priority. The only challenge we faced was knowing the status of the master node (i.e., when control should shift from master to backup via the health-check script). Hence, we went ahead with HAProxy.
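
For completeness, a minimal Keepalived configuration for the would-be master node looks roughly like the sketch below (the interface, virtual IP, and credentials are placeholders). The piece we found hard to get right is the health-check script referenced by the vrrp_script block, which decides when control shifts from master to backup.

  # Illustrative /etc/keepalived/keepalived.conf (placeholder values throughout)
  vrrp_script check_cluster {
      script "/usr/local/bin/cluster_healthy.sh"   # the health-check script we would have to write
      interval 5
      fall 2
  }

  vrrp_instance VI_1 {
      state MASTER              # the standby node runs the same block with state BACKUP
      interface eth0
      virtual_router_id 51
      priority 100              # the highest-priority live node wins the election
      authentication {
          auth_type PASS
          auth_pass changeme
      }
      virtual_ipaddress {
          10.0.0.100            # the VIP that clients use
      }
      track_script {
          check_cluster
      }
  }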

Testing the Failover

We tested the failover from IU to TACC by pausing worker node 2. This led to disruption of the services provided by the pods on node 2.
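
One way to reproduce this kind of failure (a sketch; the instance name is a placeholder, and either command on its own is enough) is to pause the underlying Jetstream/OpenStack instance or stop the kubelet on the node:

  # Pause the OpenStack instance backing worker node 2 ("iu-worker-2" is a placeholder)
  openstack server pause iu-worker-2

  # Alternatively, stop the kubelet so Kubernetes marks the node NotReady
  ssh iu-worker-2 "sudo systemctl stop kubelet"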

Similarly, we tested the TACC to IU failover using another failure mode: deleting all deployments (and hence their pods) from the Kubernetes cluster.

  kubectl delete deployment --all

In both failover tests, we observed a successful transition of control, first from IU -> TACC and then from TACC -> IU, with the database state preserved throughout.