
In this article, I will show you how the Ansible operator-sdk works by using it to create a RabbitMQ operator. The main purpose is to verify whether an ops person can use it, or if it is a developer-oriented tool.

Target audience: k8s users and Ansible users

By Jacques Roussel, Cloud Consultant @Objectif Libre

Definition

In Kubernetes, an operator is a piece of code (a controller) that watches specific resources via the k8s API. These resources are defined by a Custom Resource Definition (CRD), which adds new resource types to the k8s API.

In this article, we will add a Custom Resource called Rabbitmq. This CRD will allow us to create RabbitMQ clusters, which will be managed by our operator.
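To give an idea of what such a definition contains, here is a rough sketch of a CRD declaring this resource type (the SDK will generate the real file for us later, so the exact fields may differ):

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # The name must be <plural>.<group>
  name: rabbitmqs.rabbitmq.olibre.io
spec:
  group: rabbitmq.olibre.io
  names:
    kind: Rabbitmq
    listKind: RabbitmqList
    plural: rabbitmqs
    singular: rabbitmq
  scope: Namespaced
  version: v1alpha1
```

Once this CRD is registered, the k8s API accepts objects of kind Rabbitmq, and our operator reacts to them.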

Configuration

Prepare your environment

To test our operator, we need a k8s cluster. We use minikube for its simplicity.

$ sudo apt update && sudo apt install virtualbox -y
$ curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64   && chmod +x minikube
$ sudo mv minikube /usr/local/bin/

We need to install the operator-sdk too:

$ sudo add-apt-repository ppa:longsleep/golang-backports
$ sudo apt update && sudo apt install golang-go -y
$ mkdir ~/go
$ export GOPATH="$HOME/go"
$ mkdir -p $GOPATH/src/github.com/operator-framework
$ cd $GOPATH/src/github.com/operator-framework
$ git clone https://github.com/operator-framework/operator-sdk
$ cd operator-sdk
$ git checkout master
$ make install
$ export PATH=$PATH:$(go env GOPATH)/bin

Now we check that everything is ok before we take the next step:

$ operator-sdk --version
operator-sdk version v0.7.0+git
$ minikube version
minikube version: v1.0.0

Generate the file tree

We create a namespace-scoped operator. This means the operator must be deployed in the namespace where we want to use it. Let’s start by generating the project skeleton:

$ operator-sdk new rabbitmq-operator --api-version=rabbitmq.olibre.io/v1alpha1 --kind=Rabbitmq --type=ansible
INFO[0000] Creating new Ansible operator 'rabbitmq-operator'. 
INFO[0000] Created deploy/service_account.yaml          
INFO[0000] Created deploy/role.yaml                     
INFO[0000] Created deploy/role_binding.yaml             
INFO[0000] Created deploy/crds/rabbitmq_v1alpha1_rabbitmq_crd.yaml 
INFO[0000] Created deploy/crds/rabbitmq_v1alpha1_rabbitmq_cr.yaml 
INFO[0000] Created build/Dockerfile                     
INFO[0000] Created roles/rabbitmq/README.md             
INFO[0000] Created roles/rabbitmq/meta/main.yml         
INFO[0000] Created roles/rabbitmq/files/.placeholder    
INFO[0000] Created roles/rabbitmq/templates/.placeholder 
INFO[0000] Created roles/rabbitmq/vars/main.yml         
INFO[0000] Created molecule/test-local/playbook.yml     
INFO[0000] Created roles/rabbitmq/defaults/main.yml     
INFO[0000] Created roles/rabbitmq/tasks/main.yml        
INFO[0000] Created molecule/default/molecule.yml        
INFO[0000] Created build/test-framework/Dockerfile      
INFO[0000] Created molecule/test-cluster/molecule.yml   
INFO[0000] Created molecule/default/prepare.yml         
INFO[0000] Created molecule/default/playbook.yml        
INFO[0000] Created build/test-framework/ansible-test.sh 
INFO[0000] Created molecule/default/asserts.yml         
INFO[0000] Created molecule/test-cluster/playbook.yml   
INFO[0000] Created roles/rabbitmq/handlers/main.yml     
INFO[0000] Created watches.yaml                         
INFO[0000] Created deploy/operator.yaml                 
INFO[0000] Created .travis.yml                          
INFO[0000] Created molecule/test-local/molecule.yml     
INFO[0000] Created molecule/test-local/prepare.yml      
INFO[0000] Run git init ...                             
Initialized empty Git repository in /home/jroussel/Documents/Clients/Objectif-libre/blog/article-operator/rabbitmq-operator/.git/
INFO[0000] Run git init done                            
INFO[0000] Project creation complete.
$ cd rabbitmq-operator/

As you can see, the SDK creates a lot of files. We will modify only a few of them. The main one is roles/rabbitmq/tasks/main.yml: this is where you list the k8s objects that you want your operator to deploy.
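Among the generated files, watches.yaml is also worth a look: it is what maps our CRD to the Ansible role the operator runs. In the generated project it looks roughly like this (the role path may differ depending on the SDK version):

```yaml
---
# Map the Rabbitmq kind to the role that reconciles it
- version: v1alpha1
  group: rabbitmq.olibre.io
  kind: Rabbitmq
  role: /opt/ansible/roles/rabbitmq
```

Each time a Rabbitmq object is created or modified, the operator replays this role against the cluster.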

Customize your operator

The operator that we will create is based on this statefulset, with a few modifications:

---
# tasks file for rabbitmq
- name: create rabbitmq
  k8s:
    definition:
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: '{{ meta.name }}-rabbitmq'
        namespace: '{{ meta.namespace }}'

- name: Add rbac
  k8s:
    definition:
      kind: Role
      apiVersion: rbac.authorization.k8s.io/v1beta1
      metadata:
        name: endpoint-reader
        namespace: '{{ meta.namespace }}'
      rules:
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get"]

- name: Add rolebinding
  k8s:
    definition:
      kind: RoleBinding
      apiVersion: rbac.authorization.k8s.io/v1beta1
      metadata:
        name: endpoint-reader
        namespace: '{{ meta.namespace }}'
      subjects:
      - kind: ServiceAccount
        name: '{{ meta.name }}-rabbitmq'
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: Role
        name: endpoint-reader

- name: Add service
  k8s:
    definition:
      kind: Service
      apiVersion: v1
      metadata:
        namespace: '{{ meta.namespace }}'
        name: rabbitmq
        labels:
          app: rabbitmq
      spec:
        type: ClusterIP
        ports:
         - name: http
           protocol: TCP
           port: 15672
           targetPort: 15672
         - name: amqp
           protocol: TCP
           port: 5672
           targetPort: 5672
        selector:
          app: rabbitmq

- name: Add configmap
  k8s:
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: rabbitmq-config
        namespace: '{{ meta.namespace }}'
      data:
        enabled_plugins: |
            [rabbitmq_management,rabbitmq_peer_discovery_k8s].
      
        rabbitmq.conf: |
            ## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
            cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
            cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
            ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
            ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
            ## Set to "hostname" to use pod hostnames.
            ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
            ## environment variable.
            cluster_formation.k8s.address_type = ip
            ## How often should node cleanup checks run?
            cluster_formation.node_cleanup.interval = 30
            ## Set to false if automatic removal of unknown/absent nodes
            ## is desired. This can be dangerous, see
            ##  * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
            ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
            cluster_formation.node_cleanup.only_log_warning = true
            cluster_partition_handling = autoheal
            ## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
            queue_master_locator=min-masters
            ## See http://www.rabbitmq.com/access-control.html#loopback-users
            loopback_users.guest = false
   
- name: Add sts
  k8s:
    definition:
      apiVersion: apps/v1beta1
      kind: StatefulSet
      metadata:
        name: rabbitmq
        namespace: '{{ meta.namespace }}'
      spec:
        serviceName: rabbitmq
        replicas: "{{size}}"
        template:
          metadata:
            labels:
              app: rabbitmq
          spec:
            serviceAccountName: '{{ meta.name }}-rabbitmq'
            terminationGracePeriodSeconds: 10
            containers:        
            - name: rabbitmq-k8s
              image: rabbitmq:3.7
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/rabbitmq
              ports:
                - name: http
                  protocol: TCP
                  containerPort: 15672
                - name: amqp
                  protocol: TCP
                  containerPort: 5672
              livenessProbe:
                exec:
                  command: ["rabbitmqctl", "status"]
                initialDelaySeconds: 60
                # See https://www.rabbitmq.com/monitoring.html for monitoring frequency recommendations.
                periodSeconds: 60
                timeoutSeconds: 15
              readinessProbe:
                exec:
                  command: ["rabbitmqctl", "status"]
                initialDelaySeconds: 20
                periodSeconds: 60
                timeoutSeconds: 10
              imagePullPolicy: Always
              env:
                - name: MY_POD_IP
                  valueFrom:
                    fieldRef:
                      fieldPath: status.podIP
                - name: RABBITMQ_USE_LONGNAME
                  value: "true"
                # See a note on cluster_formation.k8s.address_type in the config file section
                - name: RABBITMQ_NODENAME
                  value: "rabbit@$(MY_POD_IP)"
                - name: K8S_SERVICE_NAME
                  value: "rabbitmq"
                - name: RABBITMQ_ERLANG_COOKIE
                  value: "mycookie" 
            volumes:
              - name: config-volume
                configMap:
                  name: rabbitmq-config
                  items:
                  - key: rabbitmq.conf
                    path: rabbitmq.conf
                  - key: enabled_plugins
                    path: enabled_plugins

If you are used to manipulating k8s objects, there is nothing special here. We use variables like {{ meta.namespace }} or {{ size }}. These variables reference parameters that you define when you create an instance of the CRD: meta.* comes from the metadata section and size comes from the spec section.
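To make the mapping concrete, here is what a custom resource supplying these values could look like (the SDK generates a similar file in deploy/crds/; the field names shown in the comments are the only assumptions):

```yaml
apiVersion: rabbitmq.olibre.io/v1alpha1
kind: Rabbitmq
metadata:
  name: example-rabbitmq   # available in the role as {{ meta.name }}
  namespace: test          # available in the role as {{ meta.namespace }}
spec:
  size: 3                  # available in the role as {{ size }}
```

Any key placed under spec becomes an Ansible variable with the same name, which is how size reaches the statefulset's replicas field.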

Then we need to modify deploy/role.yaml, because our operator needs more permissions on the k8s API than the initial example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  creationTimestamp: null
  name: rabbitmq-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - serviceaccounts
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - apps
  resources:
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - get
  - create
- apiGroups:
  - apps
  resourceNames:
  - rabbitmq-operator
  resources:
  - deployments/finalizers
  verbs:
  - update
- apiGroups:
  - rabbitmq.olibre.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - roles
  - rolebindings
  verbs:
  - '*'

That’s it. Now we have to build the docker image that will embed our operator.

Build the docker image

In order to deploy our operator in the cluster, we have to build and publish a Docker image that contains our operator logic.

$ operator-sdk build jarou/rabbitmq-operator:v0.0.3 #change the tag with your own registry or use this one
$ docker push jarou/rabbitmq-operator:v0.0.3

Configure the docker image in the manifest

Modify deploy/operator.yaml to set the image to jarou/rabbitmq-operator:v0.0.3 and imagePullPolicy to Always:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rabbitmq-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: rabbitmq-operator
  template:
    metadata:
      labels:
        name: rabbitmq-operator
    spec:
      serviceAccountName: rabbitmq-operator
      containers:
        - name: ansible
          command:
          - /usr/local/bin/ao-logs
          - /tmp/ansible-operator/runner
          - stdout
          # Replace this with the built image name
          image: "jarou/rabbitmq-operator:v0.0.3"
          imagePullPolicy: "Always"
          volumeMounts:
          - mountPath: /tmp/ansible-operator/runner
            name: runner
            readOnly: true
        - name: operator
          # Replace this with the built image name
          image: "jarou/rabbitmq-operator:v0.0.3"
          imagePullPolicy: "Always"
          volumeMounts:
          - mountPath: /tmp/ansible-operator/runner
            name: runner
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "rabbitmq-operator"
      volumes:
        - name: runner
          emptyDir: {}

We are ready to deploy our operator and create an instance of our CRD.

Test the operator

Deploy the operator

$ minikube start --memory 4096
$ kubectl create ns test
$ kubectl create -f deploy/crds/rabbitmq_v1alpha1_rabbitmq_crd.yaml -n test
$ kubectl create -f deploy/service_account.yaml -n test
$ kubectl create -f deploy/role.yaml -n test
$ kubectl create -f deploy/role_binding.yaml -n test
$ kubectl create -f deploy/operator.yaml -n test

Now we can create a rabbitmq instance.

Use the operator

$ kubectl create -f  deploy/crds/rabbitmq_v1alpha1_rabbitmq_cr.yaml -n test

Done. Our operator is working. To check that reconciliation works, you can edit the rabbitmq object to change the size of the cluster and verify that the change is applied:

$ kubectl exec -it rabbitmq-1 bash -n test
root@rabbitmq-1:/# rabbitmqctl cluster_status
Cluster status of node rabbit@172.17.0.5 ...
[{nodes,[{disc,['rabbit@172.17.0.3','rabbit@172.17.0.5',
                'rabbit@172.17.0.8']}]},
 {running_nodes,['rabbit@172.17.0.8','rabbit@172.17.0.3','rabbit@172.17.0.5']},
 {cluster_name,<<"rabbit@rabbitmq-0.rabbitmq.test.svc.cluster.local">>},
 {partitions,[]},
 {alarms,[{'rabbit@172.17.0.8',[]},
          {'rabbit@172.17.0.3',[]},
          {'rabbit@172.17.0.5',[]}]}]
root@rabbitmq-1:/# exit
$ kubectl edit rabbitmq example-rabbitmq -n test # Change the size to 2
rabbitmq.rabbitmq.olibre.io/example-rabbitmq edited
$ kubectl exec -it rabbitmq-1 bash -n test
root@rabbitmq-1:/# rabbitmqctl cluster_status
Cluster status of node rabbit@172.17.0.5 ...
[{nodes,[{disc,['rabbit@172.17.0.3','rabbit@172.17.0.5',
                'rabbit@172.17.0.8']}]},
 {running_nodes,['rabbit@172.17.0.3','rabbit@172.17.0.5']},
 {cluster_name,<<"rabbit@rabbitmq-0.rabbitmq.test.svc.cluster.local">>},
 {partitions,[]},
 {alarms,[{'rabbit@172.17.0.3',[]},{'rabbit@172.17.0.5',[]}]}]
root@rabbitmq-1:/# exit

Conclusion

So it is possible for an ops person to create an operator without writing a single line of code, and it remains simple.

In this example, I used the SDK (which I had just discovered) and a RabbitMQ statefulset (which I found for this article). Integrating the two was simple and efficient. This SDK seems to be a good tool.

Nevertheless, the Mesos team has released a similar tool called KUDO. It would be interesting to test it as well, to see whether the two are comparable or whether one is more effective than the other depending on the use case.