Cloud Computing (3): Cloud-Native and Container Technologies

Why cloud-native exists, what containers actually do at the kernel level, how Kubernetes really works, when service mesh is worth its weight, and how the whole stack fits together in production.

The shift from monolithic applications to cloud-native architectures is one of the most consequential changes in software engineering this decade. The headline — containers and Kubernetes — is well known. The interesting story is why this stack won, what each layer actually does, and where the seams are that determine whether your platform feels effortless or feels like a maze.

This article walks the cloud-native stack from first principles. We start with the architectural shift that motivates everything else, then dig into what a container really is at the Linux kernel level, climb up to Kubernetes orchestration, examine when a service mesh earns its complexity, and finish with packaging and delivery via Helm and GitOps. Examples are deliberately concrete: copy-pastable Dockerfiles, real manifests, and the trade-offs that matter when you run this in production.


What You Will Learn

  • The 12-Factor App methodology and why each factor exists
  • Containers from the inside: namespaces, cgroups, union filesystems, and image layering
  • Docker production essentials: multi-stage builds, security, Compose for local dev
  • Kubernetes architecture: how the control plane drives worker nodes via the reconciliation loop
  • Workload primitives: Pods, Services, Deployments, StatefulSets, DaemonSets, Jobs
  • Networking: CNI plugins, NetworkPolicy, Ingress, and when Istio service mesh pays for itself
  • Storage: PV/PVC dynamic provisioning and what ReadWriteMany actually costs
  • Helm packaging, release history, and how rollbacks really work
  • Microservices patterns: circuit breakers, sagas, API gateways
  • GitOps with ArgoCD and the operational discipline it forces

Prerequisites

  • Comfortable with the Linux command line and basic networking (routing, DNS, TCP)
  • Understanding of HTTP/REST and how web apps and databases talk to each other
  • Parts 1-6 of this series (especially Virtualization, Networking, and DevOps) provide useful background

Cloud-Native: What Changed and Why

Cloud-native is not “running stuff in the cloud.” A lift-and-shifted VM is in the cloud but not cloud-native. The CNCF definition is precise:

Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

Three ideas do most of the work behind that sentence:

  1. Immutable infrastructure. Servers are not pets you patch; they are cattle you replace. A new release is a new image, never an in-place edit. This eliminates configuration drift, a common and hard-to-diagnose source of production incidents.
  2. Declarative APIs. You describe the desired state (“I want 3 replicas of v1.4 with 500 MB memory each”) and the platform makes reality match. The opposite — imperative scripts that say “do step 1, then step 2” — breaks the moment reality differs from the script’s assumptions.
  3. Loose coupling at every layer. Services are independent. So are deploys. So are failures. So are scaling decisions. The cost is more moving parts; the benefit is that no single moving part can break everything.
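
The declarative idea in point 2 can be sketched as a tiny reconciliation function. This is a minimal illustration, not how any particular platform implements it: the names `DesiredState` and `reconcile` are hypothetical, and real controllers track far more than an image tag and a replica count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesiredState:
    """What the operator declares: image version and replica count."""
    image: str
    replicas: int


def reconcile(desired: DesiredState, running: list[str]) -> list[str]:
    """Return the actions needed to move `running` toward `desired`.

    `running` holds one image tag per live replica.
    """
    actions = []
    # Replicas on the wrong image are replaced, never patched in place.
    stale = [img for img in running if img != desired.image]
    actions += [f"delete {img}" for img in stale]
    current = len(running) - len(stale)
    if current < desired.replicas:
        actions += [f"create {desired.image}"] * (desired.replicas - current)
    elif current > desired.replicas:
        actions += [f"delete {desired.image}"] * (current - desired.replicas)
    return actions


desired = DesiredState(image="web:v1.4", replicas=3)
print(reconcile(desired, ["web:v1.3", "web:v1.4"]))
# ['delete web:v1.3', 'create web:v1.4', 'create web:v1.4']
print(reconcile(desired, ["web:v1.4"] * 3))
# []
```

The function never asks how the system got into its current state. It only compares what is declared with what is observed, which is why the same logic works after a crash, a manual change, or a fresh install.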

Monolith vs Microservices: The Trade-off Made Visible

[Figure: Monolith vs Microservices Architecture]

The diagram above shows the structural difference, but the real story is in four dimensions:

| Dimension | Monolith | Microservices |
| --- | --- | --- |
| Deploy unit | 1 binary | N independent services |
| Scale unit | Whole app | Each service independently |
| Tech stack | One language/runtime | Polyglot per service |
| Failure blast radius | 100% | 1 service (with circuit breakers) |

Microservices are not strictly better. They trade simplicity for independence: you pay with distributed systems complexity (network failures, eventual consistency, distributed tracing, contract versioning) to gain the ability to deploy, scale, and fail independently. The decision rule: if your team is small enough to be fed by two pizzas and your release cadence is monthly, a well-structured monolith is almost certainly the right answer. The threshold to introduce microservices is when coordination overhead between teams starts dominating engineering time.

The 12-Factor App: A Survival Guide

The 12-Factor methodology (Heroku, 2011) predates Kubernetes but has become the default operational contract a containerized service is expected to honor. Each factor exists to make a specific failure mode impossible:

| # | Factor | Why it matters |
| --- | --- | --- |
| 1 | Codebase: one repo, many deploys | Same code, different config = reliable promotion path |
| 2 | Dependencies: explicitly declared and isolated | “Works on my machine” becomes impossible |
| 3 | Config: in environment, not code | Same image runs in dev/staging/prod |
| 4 | Backing services: attached resources | Swap a DB by changing a URL, not refactoring |
| 5 | Build, release, run: strictly separated | A release is immutable and rollback-able |
| 6 | Processes: stateless and share-nothing | Any replica can serve any request |
| 7 | Port binding: self-contained | No assumed external server (Tomcat, IIS) |
| 8 | Concurrency: scale via process model | Horizontal scaling is the default |
| 9 | Disposability: fast startup, graceful shutdown | Auto-scaling and rolling updates work |
| 10 | Dev/prod parity: keep environments similar | Production surprises shrink |
| 11 | Logs: as event streams to stdout | Platform aggregates, you don’t write to files |
| 12 | Admin processes: one-off in same env | Migrations don’t have a separate stack |

Violating a factor is sometimes the right call (factor 6 is genuinely hard for stateful systems), but each violation is a debt you should know you took on.
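
Factor 3 is the easiest one to show in code. The sketch below assumes a service that needs a database URL, a log level, and a port. The variable names `DATABASE_URL`, `LOG_LEVEL`, and `PORT` and the example URLs are illustrative choices, not something the methodology prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    log_level: str
    port: int


def load_settings(env=os.environ) -> Settings:
    """Build settings from environment variables only (factor 3)."""
    try:
        database_url = env["DATABASE_URL"]  # required: no baked-in default
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Settings(
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "info"),
        port=int(env.get("PORT", "8080")),
    )


# The same code, two deploys: only the environment differs.
staging = load_settings({"DATABASE_URL": "postgres://staging-db/app", "LOG_LEVEL": "debug"})
prod = load_settings({"DATABASE_URL": "postgres://prod-db/app", "PORT": "9000"})
print(staging)
print(prod)
```

Failing fast on a missing required variable is a deliberate choice here: a container that crashes at startup with a clear message is easier to debug than one that silently connects to a default database.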

Containers: What They Actually Are

A common mental model is “containers are lightweight VMs.” That mental model is wrong in important ways. Containers are not virtualization; they are process isolation. A container is just a Linux process (or process tree) where the kernel has been instructed to lie to it about what the system looks like.

Three Linux kernel features do the work:

  1. Namespaces — give a process its own view of system resources (PID, network, mount, UTS, IPC, user, cgroup). Inside a PID namespace, your container sees itself as PID 1 and cannot see processes outside.
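  2. cgroups (v2) — enforce resource limits (CPU, memory, IO, PIDs). When you run a container with --memory=512m, the kernel's OOM killer terminates a process in that cgroup once usage reaches the limit and nothing more can be reclaimed. CPU limits behave differently: the process is throttled, not killed.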
  3. Union filesystems (overlay2 today) — stack read-only image layers under a thin writable layer per container, enabling instant copy-on-write filesystem semantics.

That’s it. A container shares the host kernel. There is no hypervisor, no second OS. The payoff, in rough orders of magnitude: ~50 ms startup vs ~30 s for a VM, ~5 MB overhead vs ~500 MB, and density of hundreds per host vs tens.
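
The copy-on-write behaviour of the union filesystem is easier to see in a toy model. The class below is a deliberately simplified simulation in plain Python, not the overlay2 driver: `OverlayFS`, its methods, and the file paths are all made up for illustration. It models shared read-only lower layers, one writable upper layer per container, and whiteouts for deletions.

```python
class OverlayFS:
    """Toy model of overlay lookup: read-only lower layers plus one writable upper layer."""

    def __init__(self, lower_layers):
        # lower_layers: list of dicts (path -> content), bottom layer first.
        self.lower = lower_layers
        self.upper = {}          # per-container writable layer
        self.whiteouts = set()   # paths deleted inside the container

    def read(self, path):
        if path in self.whiteouts:
            raise FileNotFoundError(path)
        if path in self.upper:
            return self.upper[path]
        # The topmost lower layer that has the path wins.
        for layer in reversed(self.lower):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Copy-on-write: the change lands in the upper layer only.
        self.whiteouts.discard(path)
        self.upper[path] = content

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path does not exist
        self.upper.pop(path, None)
        self.whiteouts.add(path)


base = {"/etc/os-release": "debian", "/app/config": "default"}
app = {"/app/config": "from-image"}
image = [base, app]

c1 = OverlayFS(image)
c2 = OverlayFS(image)
c1.write("/app/config", "edited-in-c1")
print(c1.read("/app/config"))  # edited-in-c1
print(c2.read("/app/config"))  # from-image
print(app["/app/config"])      # from-image
```

Both containers share the same image layers in memory, and the write in `c1` never touches them. That is the property that lets many containers start from one image without copying it.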

Image Layers: The Cache That Makes Builds Fast

[Figure: Docker Image Layers]

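Each Dockerfile instruction that changes the filesystem (RUN, COPY, ADD) creates a new layer; the others only record metadata. Layers stack via the union filesystem and are content-addressed, so identical layers are stored once per host and once in the registry, no matter how many images use them. This is why two well-structured images that share a base can differ by megabytes even if the base is gigabytes.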

Two practical consequences:

1. Order Dockerfile instructions for cache reuse. Put things that change rarely (system packages, language runtime) first; put things that change every commit (your app code) last. A cached build is seconds; a cold build is minutes.
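
Why ordering matters can be shown with a simplified model of the build cache. The sketch assumes each layer's cache key depends on its parent's key, the instruction text, and the instruction's inputs. Docker's real cache rules are more detailed, and the functions `layer_keys`, `cache_hits`, and `build` are hypothetical helpers written for this illustration.

```python
import hashlib


def layer_keys(instructions):
    """Return one cache key per instruction.

    Each key hashes the parent key plus the instruction and its inputs,
    so a change at step i changes every key from i onward.
    """
    keys, parent = [], ""
    for instruction, inputs in instructions:
        digest = hashlib.sha256(f"{parent}|{instruction}|{inputs}".encode()).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def cache_hits(old_keys, new_keys):
    """Count leading layers that can be reused from the previous build."""
    hits = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        hits += 1
    return hits


def build(order, app_code):
    """Compute layer keys for a given instruction order and app code version."""
    steps = {
        "base": ("FROM python:3.12-slim", ""),
        "copy_reqs": ("COPY requirements.txt .", "requirements-v1"),
        "deps": ("RUN pip install -r requirements.txt", ""),
        "code": ("COPY . /app", app_code),
    }
    return layer_keys([steps[name] for name in order])


good = ["base", "copy_reqs", "deps", "code"]  # rarely-changing steps first
bad = ["base", "code", "deps"]                # app code copied before dependencies

print(cache_hits(build(good, "commit-1"), build(good, "commit-2")))  # 3
print(cache_hits(build(bad, "commit-1"), build(bad, "commit-2")))    # 1
```

In the good ordering a new commit invalidates only the final COPY, so the dependency install is reused. In the bad ordering the same commit invalidates everything after the base image, including the expensive install step.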

[Figure: GitOps Deployment Pipeline]

The key shift is GitOps: the cluster’s state is defined by Git. ArgoCD (or Flux) continuously reconciles the cluster against a Git repo. Two big wins:

  1. Audit trail. Every change is a commit. Want to know who changed prod at 2am? git blame.
  2. Disaster recovery. Cluster gone? kubectl apply from the manifest repo and the declared workloads are back. Persistent data is the exception: it still needs its own backups.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata: { name: web, namespace: argocd }
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/k8s-manifests
    path: apps/web/overlays/prod
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: prod
  syncPolicy:
    automated: { prune: true, selfHeal: true }
    syncOptions: [CreateNamespace=true]
```

selfHeal: true means “if someone kubectl edits a resource by hand, ArgoCD will revert it.” That is the discipline GitOps enforces — the cluster’s state is what’s in Git, not what’s in someone’s terminal.
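
What a single sync pass does with drift can be modelled in a few lines. This is a toy model under stated assumptions, not ArgoCD's actual algorithm: the `sync` function and the resource names are hypothetical, and every object in `live` is assumed to be tracked by the application.

```python
import copy


def sync(git, live, prune):
    """One sync pass: make `live` match `git`. Returns (new_live, actions)."""
    new_live = copy.deepcopy(live)
    actions = []
    for name, spec in sorted(git.items()):
        if name not in live:
            actions.append(f"create {name}")
        elif live[name] != spec:
            actions.append(f"revert {name}")
        else:
            continue
        new_live[name] = copy.deepcopy(spec)
    # Objects that Git no longer declares are removed only when pruning is on.
    for name in sorted(live.keys() - git.keys()):
        if prune:
            actions.append(f"prune {name}")
            del new_live[name]
    return new_live, actions


git = {"deploy/web": {"replicas": 3, "image": "web:v1.4"}}
# Someone hand-edited the replica count, and configmap/old was removed
# from Git in a later commit but still exists in the cluster.
live = {
    "deploy/web": {"replicas": 5, "image": "web:v1.4"},
    "configmap/old": {"verbose": "true"},
}

healed, actions = sync(git, live, prune=True)
print(actions)        # ['revert deploy/web', 'prune configmap/old']
print(healed == git)  # True
```

With `prune=False` the same pass would still revert the hand edit but leave `configmap/old` in place, which is the safer default when you are not yet sure everything in the cluster is represented in Git.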

Operating in Production: The Commands That Matter

```shell
# Cluster overview
kubectl cluster-info
kubectl get nodes -o wide

# What's running where
kubectl get pods -A -o wide
kubectl top pods -A                   # CPU/memory actuals

# Debug a failing pod
kubectl describe pod <name> -n <ns>   # events, status, scheduling
kubectl logs <name> -n <ns> -f        # stream logs
kubectl logs <name> -n <ns> --previous  # logs from the crashed previous container
kubectl exec -it <name> -n <ns> -- sh

# Recent cluster events (the goldmine for "why did this happen")
kubectl get events -A --sort-by='.lastTimestamp' | tail -30

# Scale and rollout
kubectl scale deploy/web --replicas=5
kubectl rollout restart deploy/web    # forces a fresh rollout, useful for picking up new secrets
```

The single most useful pair: kubectl describe (status, events, scheduling decisions) and kubectl logs --previous (what happened in the container that just crashed).

Production Checklist

Before declaring a workload production-ready:

  • Multi-stage Dockerfile, non-root user, distroless or minimal base
  • Image pinned by digest (or at least immutable tag), signed, scanned in CI
  • Resource requests and limits set on every container
  • Liveness and readiness probes (readiness controls traffic, liveness controls restarts)
  • PodDisruptionBudget so cluster maintenance doesn’t take you below minAvailable
  • HorizontalPodAutoscaler if traffic is variable
  • NetworkPolicy with default-deny + explicit allows
  • Secrets held in an external store (Vault, AWS Secrets Manager) and synced into the cluster with a tool such as External Secrets Operator, not committed as plain Secret manifests
  • Logs to stdout, structured (JSON), aggregated to a central system
  • Metrics exposed (Prometheus format) and dashboards exist
  • Distributed tracing instrumented (OpenTelemetry)
  • Backups tested (especially for StatefulSets)
  • Runbook exists for the common failure modes

A workload that ticks all these boxes is not unbreakable — but the failure modes that remain are the interesting ones, not the embarrassing ones.
