Stateless vs Stateful Kubernetes

WEKA. February 4, 2021

Kubernetes has become the de facto container orchestration tool. It initially supported stateless applications, but stateful (data-driven) applications are very common and are critical to almost every business. Today, a lot of support is available for running stateful applications with Kubernetes.

In this article you’ll learn about these topics:

Stateless vs stateful basics
Containerized Stateful Application Use Cases and Their Challenges
Kubernetes storage and stateful applications
Kubernetes storage provisioning & AI workloads
How Weka’s Parallel File System Helps Solve Stateful Use Cases

Stateless vs. Stateful Kubernetes

At a very basic level, as the name suggests, the term “stateless” means that no past data or state is stored or needs to persist when a new container is created. Stateless applications tend to include containerized microservices, CDNs, print services, and other short-term workers, and they are easy to deploy and manage.

Stateful applications typically involve a database, such as Cassandra, MongoDB, or MySQL, and process reads and/or writes against it. Requests are typically processed based on the information provided. The prior request history affects the current state; hence, the server must access and hold onto state information generated while processing earlier requests, which is why such applications are called stateful.
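The difference can be illustrated with a minimal sketch. The handler and the cart service below are hypothetical examples, not part of any real application: the first returns a response that depends only on the current request, while the second returns a response that depends on what earlier requests stored.

```python
from typing import Dict, List


def stateless_handler(request: Dict[str, str]) -> Dict[str, str]:
    """Stateless: the response depends only on the incoming request."""
    return {"greeting": "Hello, " + request["user"]}


class StatefulCartService:
    """Stateful: each response depends on earlier requests."""

    def __init__(self) -> None:
        # Kept in memory here for brevity. In a real deployment this state
        # would live in a database on persistent storage so that it
        # survives a container restart.
        self._carts: Dict[str, List[str]] = {}

    def add_item(self, user: str, item: str) -> List[str]:
        cart = self._carts.setdefault(user, [])
        cart.append(item)
        return list(cart)  # return a copy of the accumulated state


if __name__ == "__main__":
    print(stateless_handler({"user": "ana"}))
    service = StatefulCartService()
    service.add_item("ana", "book")
    print(service.add_item("ana", "pen"))
```

If the container running the cart service is replaced, the in-memory carts are lost, which is exactly the gap persistent storage is meant to close.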

Almost all applications with modern workloads, such as AI/ML, financial data, and genomic sequencing, are stateful and require persistent storage.

Containerized Stateful Application Use Cases and Their Challenges

Containerized applications need statefulness, as they are commonly deployed in hybrid and edge-to-core-to-cloud workloads, as well as CI/CD use cases. Here are some of the common use cases for containerized application deployments:

Data analytics processing and AI/ML–Hadoop, Spark, TensorFlow, PyTorch, and Kubeflow are increasingly adopting containers and need to go over massive amounts of data repeatedly.
MLOps–There are a number of stateful requirements when using containers for MLOps environments, such as checkpointing for large training jobs and sharing training and inference results.
Databases and messaging–Some applications recommend local flash for low latency. However, using local flash on the cluster’s worker nodes limits the ability to move containers between worker nodes, which reduces agility. That is why high-performance shared storage like Weka, which can provide the same or better latency while keeping the data shared, makes it possible to run applications such as these effectively:

– Single-instance databases like MySQL, PostgreSQL, MariaDB
– NoSQL databases like Cassandra and MongoDB
– In-memory databases like Redis, MemSQL, and KDB+
– Messaging apps like Kafka
– Business-critical apps like Oracle, SQL Server, and SAP

Kubernetes Storage and Stateful Applications

In Kubernetes, the basic storage building blocks are known as volumes. A volume is attached to a pod. A regular volume behaves like local storage: it is released when the pod is destroyed. As such, a regular volume lacks persistence, portability, and scalability.

Persistent storage, as the name suggests, retains the data generated by an application, making it suitable for stateful applications. Unlike local storage or a regular volume, a persistent volume is managed by the cluster and does not depend on the pod lifecycle; therefore, the data can be retained and reused.

When creating a persistent volume for a Kubernetes cluster, the storage file system and its configuration (IDs, access modes, size, names, etc.) need to be specified in a StorageClass.

There are three steps involved in creating a persistent volume and attaching it to a container in a pod:

Create a StorageClass
Create a PersistentVolumeClaim
Define the volume
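The three steps above can be sketched as Kubernetes manifests. The example below is a minimal sketch that builds them as Python dictionaries and prints them as JSON, which kubectl accepts alongside YAML. Every name in it (fast-shared, csi.example.com, training-data, trainer, the busybox image, and the 10Gi size) is a hypothetical placeholder and would need to be replaced with values for the storage driver actually installed.

```python
import json

# Hypothetical placeholder names, not values from any real driver.
STORAGE_CLASS = "fast-shared"
CLAIM_NAME = "training-data"


def storage_class() -> dict:
    """Step 1: a StorageClass naming the provisioner that creates volumes."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": STORAGE_CLASS},
        "provisioner": "csi.example.com",  # assumed driver name
        "reclaimPolicy": "Delete",
    }


def persistent_volume_claim() -> dict:
    """Step 2: a PersistentVolumeClaim requesting storage from that class."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": CLAIM_NAME},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": STORAGE_CLASS,
            "resources": {"requests": {"storage": "10Gi"}},
        },
    }


def pod() -> dict:
    """Step 3: a pod that defines a volume from the claim and mounts it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "trainer"},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox",
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": CLAIM_NAME},
            }],
        },
    }


def manifest_list() -> dict:
    """Bundle the three objects into a single v1 List."""
    items = [storage_class(), persistent_volume_claim(), pod()]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(manifest_list(), indent=2))
```

Saving the output to a file and passing it to kubectl apply -f should create all three objects, assuming a driver registered under the placeholder provisioner name exists in the cluster.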

Kubernetes Storage Provisioning and AI Workloads

There are two popular ways of provisioning persistent storage in Kubernetes: static and dynamic. We expanded on the topic in “Kubernetes storage provisioning: what you should know before deploying containerized applications.” To recap, the main difference lies in the point at which you want to configure storage. If you need to pre-populate data in a volume, you choose static provisioning. If you need to create volumes on demand, you go for dynamic provisioning.
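As a rough sketch of that difference, the Python below builds the objects involved in each mode. In the static case an administrator creates the PersistentVolume up front, pointing at an existing dataset, and the claim binds to it by name. In the dynamic case the claim only names a StorageClass and the volume is created on demand. The driver name csi.example.com, the volume handle, and all object names are hypothetical.

```python
import json


def static_volume(name: str, handle: str, size: str) -> dict:
    """A pre-created PersistentVolume that points at existing data."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "",
            # Assumed CSI driver and handle for an existing dataset.
            "csi": {"driver": "csi.example.com", "volumeHandle": handle},
        },
    }


def static_claim(name: str, volume_name: str, size: str) -> dict:
    """A claim bound to one specific, already existing volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "",  # empty string disables dynamic provisioning
            "volumeName": volume_name,
            "resources": {"requests": {"storage": size}},
        },
    }


def dynamic_claim(name: str, storage_class: str, size: str) -> dict:
    """A claim that asks the StorageClass to create a volume on demand."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    pv = static_volume("genomes-pv", "existing-dataset-001", "100Gi")
    print(json.dumps(pv, indent=2))
    print(json.dumps(static_claim("genomes", "genomes-pv", "100Gi"), indent=2))
    print(json.dumps(dynamic_claim("scratch", "fast-shared", "10Gi"), indent=2))
```

The static path takes two objects and prior knowledge of the backing volume; the dynamic path takes a single claim, which is why it suits volumes created on demand.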

With dynamic provisioning, storage is no longer tied to the lifecycle of the pod. Moreover, the workload is independent of the storage technology and of whether it is running on premises or in the cloud. That gives you more flexibility in choosing different deployment modes for different workloads. The outcome is optimal and flexible usage of resources for each specific workload.

In the past, though, dynamic provisioning required changes to the Kubernetes source code. Today, Kubernetes supports the Container Storage Interface (CSI), which makes it possible to implement and provision storage without changing the Kubernetes source code. CSI brings dynamic and static provisioning under a single umbrella, so both are easier to perform. Users simply install the volume plugin within the cluster and use it with a StorageClass object.

How Weka’s Parallel File System Helps Solve Stateful Use Cases

WekaFS, the world’s fastest and most scalable parallel file system, addresses the shareability, performance, and portability challenges by providing stateful, reliable storage, allowing seamless deployment on premises and easy migration to the cloud, while meeting high performance and low latency requirements. Using the WekaFS Kubernetes CSI plugin, organizations now have increased flexibility in how and where they deploy containers while delivering local storage performance and low latency.

The WekaFS CSI plugin is deployed using a Helm chart or as a DaemonSet, along with the POSIX agent on Kubernetes worker nodes, and is available in the Rancher application catalog as well. WekaFS supports volume provisioning in both the dynamic (persistent volume claim) and static (persistent volume) forms with its own storage class. It also supports ReadOnlyMany, ReadWriteOnce, and ReadWriteMany access modes.
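The three access modes named above are standard Kubernetes values: ReadWriteOnce allows read-write mounting by a single node, ReadOnlyMany allows read-only mounting by many nodes, and ReadWriteMany allows read-write mounting by many nodes. The sketch below is a simplified rule of thumb for choosing among them; the helper name and its two parameters are hypothetical and are not part of Kubernetes or the WekaFS plugin.

```python
from enum import Enum


class AccessMode(Enum):
    """Kubernetes access modes mentioned in the text."""
    READ_WRITE_ONCE = "ReadWriteOnce"   # read-write, single node
    READ_ONLY_MANY = "ReadOnlyMany"     # read-only, many nodes
    READ_WRITE_MANY = "ReadWriteMany"   # read-write, many nodes


def choose_access_mode(multi_node: bool, needs_write: bool) -> AccessMode:
    """Pick an access mode from two simplified questions.

    multi_node: will pods on more than one node mount the volume?
    needs_write: does any of those pods need to write to it?
    """
    if not multi_node:
        return AccessMode.READ_WRITE_ONCE
    if needs_write:
        return AccessMode.READ_WRITE_MANY
    return AccessMode.READ_ONLY_MANY


if __name__ == "__main__":
    # Shared training data written by many workers.
    print(choose_access_mode(multi_node=True, needs_write=True).value)
    # A reference dataset read by many inference pods.
    print(choose_access_mode(multi_node=True, needs_write=False).value)
    # A single-instance database.
    print(choose_access_mode(multi_node=False, needs_write=True).value)
```

The returned value is the string that goes into the accessModes list of a PersistentVolumeClaim.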

WekaFS provides the same high-performance disaggregated storage, scalability, encryption, and data protection to all application-specific Kubernetes pods and clusters, whether on premises or in the cloud.