A VM journey from VMware to Kubernetes

On Sunday, February 2nd, Marek Libra, senior software engineer at Red Hat, presented how to migrate VMs from VMware to a Kubernetes environment.

As many know, Kubernetes is a container orchestrator, but it can also operate VMs using KubeVirt. A legitimate question follows: why mix containers and VMs in the same environment?

Several answers are relevant, such as:

  • To migrate more easily from a legacy application environment to a container-centric design.
  • To keep critical legacy applications up and running.
  • To isolate workloads with more control, for better security and stability.
  • To scale horizontally.

KubeVirt implements VM creation using K8s Custom Resource Definitions.
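
For readers who have never seen KubeVirt in action, the sketch below shows what declaring a VM through that CRD looks like. It is illustrative only: the manifest is modelled on the upstream KubeVirt demo (Cirros container disk), the VM name is made up, and the apiVersion may differ depending on the KubeVirt version installed on your cluster.

# Minimal sketch of a VirtualMachine custom resource (based on the KubeVirt demo).
kubectl apply -f - <<'EOF'
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: demo-vm
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 64M
      volumes:
        - name: rootdisk
          containerDisk:
            image: kubevirt/cirros-container-disk-demo
EOF

# Once the VM is running, the virtctl client gives serial console access:
virtctl console demo-vm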

The VM import feature is under development inside OKD, the community distribution of Kubernetes that serves as the basis for Red Hat OpenShift.

A multi-stage VM import form is available inside the OKD console. Submitting it triggers the creation of several interconnected resources that replicate the vCenter VM into OKD. One of these resources is the VM itself; others are for the persistent volumes.

Another one of those resources is the conversion pod. Its role is to download the vCenter data into Kubernetes persistent volumes.

The state of the conversion can be viewed in the OKD web interface. Once it completes, the user has access to a VM, up and running, right inside Kubernetes.

The following features are still under development:

  • Add other providers
  • Bulk import

A call for contributors has been made.

Useful links:

OpenShift Cluster Console UI Repository https://github.com/openshift/console

KubeVirt Official Website https://kubevirt.io/

KubeVirt Architecture https://github.com/kubevirt/kubevirt/blob/master/docs/architecture.md

Why is KubeVirt not a CRI? https://github.com/kubevirt/kubevirt/issues/49

Video recording of the talk https://ftp.fau.de/fosdem/2020/H.1309/vai_vm_journey_from_vmware_to_k8s.webm

Kubernetes debugging

Debugging an application in Kubernetes can be a real pain, especially if the application is replicated across multiple nodes. During FOSDEM, two talks dealt with this problem.

Debugging apps running in Kubernetes

By: Jeff Knurek

Note: the presented tools are not recommended for production environments. The talk was mostly about tools developers can use to troubleshoot their applications during the development phase:

  • Ksync: a tool that mounts a pod's filesystem onto the local filesystem, so changes can be tested in seconds instead of waiting minutes for a rebuild; to sum it up, ksync is the -v of Kubernetes (see the sketch after this list).
  • Telepresence: another troubleshooting tool, but unlike ksync, Telepresence runs the whole application on the local system and proxies its communication to the Kubernetes cluster when the application needs other microservices.
  • Skaffold: "Local Kubernetes Development", yet another tool to help developers build their microservices in a Kubernetes environment. It mostly watches for changes on files; when a file changes, it rebuilds the image and runs it on Kubernetes. The tool can be helpful if the application needs other applications running on Kubernetes.
  • Squash: Squash allows you to modify container code on the fly and watch its logs, all of this from the local IDE. Besides debugging pods, Squash can also be used to debug Kubernetes services.
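
As an illustration, here is roughly what a development loop with these tools looks like. The command syntax is a sketch based on each project's documentation at the time (and may have changed since); the app label, paths and file names are made up.

# ksync: keep a local directory in sync with a directory inside matching pods.
ksync init                                     # install ksync's cluster-side component
ksync watch &                                  # keep the local sync daemon running
ksync create --selector=app=myapp $(pwd) /app  # map the current dir to /app in the pods
ksync get                                      # check sync status

# Telepresence (v1-era syntax): run the service locally, proxied into the cluster.
telepresence --swap-deployment myapp --run python3 app.py

# Skaffold: rebuild the image and redeploy on every file change.
skaffold dev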

For more information, see the links below:

Video + slides: https://fosdem.org/2020/schedule/event/debugging_kubernetes/
ksync: https://github.com/ksync/ksync
Telepresence: https://www.telepresence.io/
skaffold: https://github.com/GoogleContainerTools/skaffold
Squash: https://github.com/solo-io/squash

Inspektor Gadget and traceloop

By: Alban Crequy

Stracing a pod in Kubernetes, or even a single container, is not practical at all: filtering on a PID number is of very limited use in the container world. traceloop solves this by tracing containers at the cgroup level. Unlike strace, traceloop saves syscalls into a ring buffer, which can be consulted after a container’s crash. Tracing syscalls on containers is a nice feature, but within Kubernetes developers don’t have access to the worker nodes, and containers are dispatched across multiple workers. Inspektor Gadget was created to overcome this problem: its role is to run traceloop on the workers and to collect syscalls using Kubernetes labels and metadata instead of cgroups (see the sketch below). For more information about the project, see the recording of the presentation.
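
As a rough illustration, the kubectl plugin shipped with Inspektor Gadget lets you list the per-container traces and dump a ring buffer even after the container has died. The commands below follow the project's documentation from around early 2020 and may have changed since; <trace-id> is a placeholder.

# Sketch of the Inspektor Gadget / traceloop workflow.
kubectl gadget traceloop list              # one trace per container, with pod/namespace metadata
kubectl gadget traceloop show <trace-id>   # dump the recorded syscalls, even post-mortem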

Video: https://video.fosdem.org/2020/UD2.208/containers_bpf_tracing.mp4

Falco: the missing detection tool for Kubernetes security

By: Kris Nova, Falco lead maintainer

In the domain of security, there are two complementary approaches: prevention and detection.

Prevention is locking the door: blocking unwanted behavior from your users, from anyone on the Internet, and so on.

Detection is for when prevention has failed: in order not to fail next time, you need to know how you failed, what the attacker did, why and how they were able to do it, and therefore what you can do to prevent it.

Detection is important because there is no such thing as a perfectly secure system.

In the Kubernetes context, prevention is done with tools such as Pod Security Policies, Network Policies, or Open Policy Agent.
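
As a reminder of what the prevention side looks like (this is plain Kubernetes API, not something specific to the talk), a deny-by-default NetworkPolicy is a one-object example:

# Deny all ingress traffic to every pod in the namespace; allowed flows must then
# be whitelisted by additional NetworkPolicy objects.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF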

Kris Nova presented Falco, which fills the detection role in that context.

Falco combines kernel tracing [1], container context (provided by the container engine) and Kubernetes metadata, and matches these against rules provided by the user. It then produces its output by means of logs, a gRPC interface, or webhooks.
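
A Falco rule is a small YAML snippet combining exactly those sources of information. The rule below is a sketch in the spirit of the stock rules (the fields used in the condition and output are real Falco fields, but the rule itself is illustrative):

# Illustrative rule: alert whenever a shell is spawned inside a container.
# Appended to Falco's local rules file, then picked up on the next reload.
cat >> /etc/falco/falco_rules.local.yaml <<'EOF'
- rule: Shell spawned in container
  desc: Detect a shell started inside any container
  condition: container.id != host and evt.type = execve and proc.name in (bash, sh, zsh)
  output: "Shell in container (user=%user.name container=%container.name image=%container.image.repository)"
  priority: WARNING
EOF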

Combining these three levels of information allows getting the broader picture without missing any key information. As Kris Nova explains in the talk, kernel tracing is important because you can, for example, open a file by devious means to hide it from userspace, but ultimately it is the kernel that executes the call, and it is therefore able to trace it. Container engine and Kubernetes data can then relate this action to where it came from and why it took place.

One last point: observability often comes at a performance cost. Falco is implemented in C and C++, and Kris claims it has a negligible performance impact.

The project is currently incubated at the CNCF, and looks quite promising.

FOSDEM recording: Falco Internals

SELinux for containers made (almost) easy

SELinux basics

SELinux is hard. The most common interaction Linux users have with it is sestatus followed by setenforce Permissive (at least according to Google search suggestions).

If you’re not familiar with SELinux, don’t worry, we’ll explain the basics. You can see it as an additional security layer on top of the regular Linux permission system, using a Mandatory Access Control (MAC) model. Its goal is to isolate processes in order to mitigate attacks via privilege escalation, by limiting access to system resources. In Linux, every system resource is conveniently a file, so SELinux works by assigning labels to processes and to files, stored in the extended attributes of the file system (most filesystems support extended attributes). A SELinux policy is what describes the allowed interactions between processes and files. By default, everything is denied, and you need to define policy rules to allow certain interactions. SELinux supports containers the same way: you can assign labels to containers. It’s particularly useful for privileged containers.
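
Concretely, you can inspect those labels yourself: every file and every process carries a user:role:type(:level) context.

# Show the SELinux context of a file and of a process.
ls -Z /etc/passwd      # something like system_u:object_r:passwd_file_t:s0
ps -eZ | grep sshd     # something like system_u:system_r:sshd_t:s0
sestatus               # current mode and loaded policy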

Fortunately, there is a generic SELinux policy for containers: container processes can only read/execute /usr files and can only write to container files. Container processes get the container_t label and their files get the container_file_t label. That’s perfect to protect the host from containers, but we also need a way to protect containers from other containers. In addition to the type label, each container gets MCS categories to differentiate it from the others:
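
For instance, two containers started with Podman on an SELinux-enabled host share the container_t type but get distinct, randomly assigned MCS category pairs (the category numbers below are purely illustrative):

# Both processes run as container_t, but with different categories, so the
# policy keeps them isolated from each other.
podman run -d --name web1 nginx
podman run -d --name web2 nginx
ps -eZ | grep container_t
#   system_u:system_r:container_t:s0:c129,c345 ... nginx
#   system_u:system_r:container_t:s0:c371,c514 ... nginx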

This default policy should be enough for most use cases, but it has its issues:

  • No SELinux network control (all containers can bind to any network port)
  • No SELinux control on capabilities
  • Too strict for certain use cases (containers that need access to /home or /var/log directories)
  • Too loose if you simply add the container_file_t label to every directory your container needs access to (and this can conflict with other SELinux-enabled tools); see the example just after this list.
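
To make the last two points concrete: giving a container access to a host directory under the generic policy usually means relabeling that directory, either by hand or through the container runtime (the path and image below are hypothetical):

# Relabel a host directory so the generic container policy allows access to it.
chcon -Rt container_file_t /srv/myapp-data   # manual, shared by every container
# Or let the runtime relabel the volume: :z = shared label, :Z = private MCS category.
podman run -v /srv/myapp-data:/data:Z registry.example.com/myapp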

You’ll probably need to write custom policies to take full advantage of SELinux (or simply to get your containerized apps running). That’s where SELinux’s biggest obstacle lies: writing SELinux policies is not easy and not fun. But SELinux is effective: every container breakout vulnerability was a file system breakout, and SELinux has contained them all.

Udica: generating policies

“Udica is a tool for generating SELinux security profiles for containers.” It looks at a running container (with SELinux temporarily disabled) and generates a SELinux policy based on the container’s capabilities, network ports and volume binds. Support for multiple container runtimes is included (CRI-O, Docker, Podman). It can be run from a container or installed as a Python package, and it uses a JSON representation of the container as input: docker inspect <container> | udica my_container. After loading the policy it created, you can simply run your container with --security-opt label=type:my_container.process.
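
Put together, the workflow looks roughly like this (the container name and mounts are hypothetical; Udica prints the exact semodule command to run, including which template .cil files from /usr/share/udica/templates to load):

# 1. Start the container and feed its JSON description to Udica.
podman run -d --name my_container -v /var/log:/logs:ro -p 8080:80 nginx
podman inspect my_container | udica my_container
# Udica writes my_container.cil, a policy allowing exactly these mounts and ports.

# 2. Load the generated policy together with Udica's base template.
semodule -i my_container.cil /usr/share/udica/templates/base_container.cil

# 3. Re-run the container confined by the new policy.
podman run -d --security-opt label=type:my_container.process \
    -v /var/log:/logs:ro -p 8080:80 nginx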

Udica is useful by itself, especially for maintainers who would like to provide a SELinux policy for their container images. I don’t really expect many maintainers to do such a thing, though, so Udica is also nice for users of container images who run SELinux.

Beyond running containers on a single host with SELinux enabled, Udica can also be useful in Kubernetes clusters: with an operator, you can scale this automated policy generation to a whole cluster!

Slides and talk recording: https://fosdem.org/2020/schedule/event/security_using_selinux_with_container_runtimes/

[1] Trace system calls and their parameters, using either a kernel module or an eBPF probe (the latter being the preferred approach, since eBPF programs are verified and sandboxed by the kernel and, unlike a kernel module, cannot crash it).