Docker and Containers (7): Security — Running Containers Without Giving Away the Keys

Containers provide isolation, not security. Default Docker configurations run processes as root with a broad set of Linux capabilities. This article shows how to lock containers down for production.

Docker’s default configuration prioritizes convenience over security. Containers run as root, have access to a broad set of Linux capabilities, and can write to their entire filesystem. This is fine for development but dangerous for production. A container escape vulnerability in a root-privileged container means an attacker can take over the host. Let’s fix that.


The Threat Model

Before securing your setup, understand what you’re defending against:


  1. Vulnerable application code: Your app has a bug (RCE, path traversal, SSRF) and an attacker gets code execution inside the container
  2. Vulnerable dependencies: A library in your image has a known CVE
  3. Container escape: An attacker exploits a kernel or runtime vulnerability to break out of the container
  4. Supply chain attack: A malicious base image or package is used
  5. Secrets exposure: Credentials leak through environment variables, image history, or logs
  6. Lateral movement: An attacker in one container pivots to other containers or the host

Each hardening technique addresses one or more of these threats. The goal is defense in depth: no single measure is sufficient, but layers of hardening make exploitation much harder.

Running as Non-Root

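By default, the process inside a Docker container runs as root (UID 0). Unless user namespaces are enabled (for example through userns-remap or rootless mode), this is the same UID 0 as root on the host. Docker's default restrictions limit what that root can do inside the container, but an attacker who escapes the container ends up as root on the host.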
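A quick way to see which case you are in is to read /proc/self/uid_map from inside the container. Each line maps a range of container UIDs to host UIDs, and without remapping the map is the identity (0 0 4294967295). The bash function below is a minimal sketch of that check; the name userns_remapped is our own, and the optional file argument exists only so it can be tested against a sample map.

```bash
#!/usr/bin/env bash
# userns_remapped: exit 0 if container UID 0 maps to a non-zero host UID,
# i.e. user-namespace remapping (userns-remap or rootless mode) is active.
# Each line of uid_map reads "<container-start> <host-start> <length>".
userns_remapped() {
  local map_file=${1:-/proc/self/uid_map}
  local inside outside length
  while read -r inside outside length; do
    # Any range that covers container UID 0 must start at 0
    if (( inside == 0 && length > 0 )); then
      (( outside != 0 ))   # status 0 only if root maps to a non-root host UID
      return
    fi
  done < "$map_file"
  return 1   # no range covers UID 0 (unexpected); report "not remapped"
}

# Example, run inside the container:
# if userns_remapped; then echo "root in here is not root on the host"; fi
```

Without the helper, docker exec my-container cat /proc/self/uid_map shows the raw map.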

In the Dockerfile

FROM python:3.11-slim

WORKDIR /app

# Install dependencies as root (needed for system packages)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Create a non-root user with a fixed UID/GID (1000) so file ownership is predictable
RUN groupadd -r -g 1000 appuser && useradd -r -u 1000 -g appuser -d /app -s /sbin/nologin appuser

# Copy application files and set ownership
COPY --chown=appuser:appuser . .

# Switch to non-root user for all subsequent instructions and runtime
USER appuser

EXPOSE 8000
CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

On Alpine-based images, the syntax is slightly different:

FROM python:3.11-alpine

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

RUN addgroup -S -g 1000 appuser && adduser -S -u 1000 -G appuser appuser
COPY --chown=appuser:appuser . .
USER appuser

CMD ["gunicorn", "--bind", "0.0.0.0:8000", "app:app"]

At Runtime

Even if the Dockerfile doesn’t set a user, you can override it at runtime:

# Run as a specific UID:GID
docker run --user 1000:1000 myapp

# Run as the "nobody" user
docker run --user nobody myapp

Verifying the user

# Check what user the container is running as
docker exec my-container id
# Output: uid=1000(appuser) gid=1000(appuser) groups=1000(appuser)

# Compare with a default container
docker exec default-container id
# Output: uid=0(root) gid=0(root) groups=0(root)
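To catch a missing USER instruction before anything is deployed, a CI job can inspect the image's configured user. The sketch below assumes bash; is_root_user is a hypothetical helper name, and it only looks at the Config.User string that docker image inspect reports. A named user that maps to UID 0 in the image's /etc/passwd would slip through.

```bash
#!/usr/bin/env bash
# is_root_user: succeed if an image's configured User (as printed by
# docker image inspect --format '{{.Config.User}}') means the container
# starts as root. An empty value means the image never set USER.
is_root_user() {
  local user=${1%%:*}   # drop any ":group" suffix
  [[ -z $user || $user == root || $user == 0 ]]
}

# Hypothetical CI gate: fail the build if the image would run as root
# if is_root_user "$(docker image inspect --format '{{.Config.User}}' myapp)"; then
#   echo "myapp runs as root; add a USER instruction" >&2
#   exit 1
# fi
```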

Common non-root gotchas

Running as a non-root user can break things that assume root access:

Problem | Symptom | Solution
Can’t bind to ports < 1024 | Permission denied on port 80 | Use port 8080+ and map with -p 80:8080
Can’t write to directories | Permission denied on /var/log | RUN mkdir -p /var/log/app && chown appuser /var/log/app
Can’t install packages at runtime | apt-get fails | Install everything in the build stage, before USER
Can’t read mounted files | Permission denied on volumes | Match UID/GID with the host, or use named volumes
Package managers need root | npm/pip fail | Use the --user flag for pip, or install before switching user

Read-Only Filesystem


A read-only root filesystem prevents attackers from modifying binaries, planting malware, or changing configuration files:

# Run with read-only filesystem
docker run --read-only myapp

Most applications need to write to some locations (temp files, caches, pid files). Use tmpfs for these writable areas:

# Read-only root with writable /tmp and /var/run
docker run --read-only \
    --tmpfs /tmp:size=100m \
    --tmpfs /var/run:size=1m \
    myapp

In Docker Compose:

services:
  api:
    image: myapp
    read_only: true
    tmpfs:
      - /tmp:size=100m
      - /var/run:size=1m
      - /app/cache:size=50m

Test your application with --read-only in development. If it crashes, the error message usually names the path it tried to write to; add a tmpfs for that path (or a volume, if the data must survive restarts).

# Find where your app writes
docker run --read-only myapp 2>&1 | grep "Read-only file system"
# Output: OSError: [Errno 30] Read-only file system: '/app/logs/app.log'
# Solution: Add --tmpfs /app/logs:size=50m
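Once the app starts cleanly, it is worth confirming that only the paths you intended are writable. A minimal sketch, assuming the image ships a POSIX shell and mktemp (both Debian slim and Alpine do): rather than trusting permission bits, it actually tries to create a scratch file in each directory, which fails on a read-only mount even for root. The function name check_writable is illustrative.

```bash
#!/usr/bin/env bash
# check_writable: report whether each directory accepts new files.
# Uses only POSIX sh constructs, so it can also be pasted into "sh -c".
check_writable() {
  for dir in "$@"; do
    # Try to create (and immediately remove) a probe file in the directory
    if probe=$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null); then
      rm -f "$probe"
      echo "$dir: writable"
    else
      echo "$dir: not writable"
    fi
  done
}

# Example: only /tmp (the tmpfs) should come back writable
# docker run --rm --read-only --tmpfs /tmp myapp \
#   sh -c "$(declare -f check_writable); check_writable / /tmp /app"
```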

Linux Capabilities


Linux capabilities divide root’s power into about 40 individual privileges. By default, Docker grants containers 14 of them, which is more than most applications need.


Default capabilities given to Docker containers:

Capability | What it allows | Typically needed?
CHOWN | Change file ownership | Rarely
DAC_OVERRIDE | Bypass file read/write/execute permission checks | Rarely
FSETID | Keep SUID/SGID bits when modifying files | Almost never
FOWNER | Bypass checks that require owning the file | Rarely
MKNOD | Create special (device) files | Almost never
NET_RAW | Use raw sockets (ping, packet capture) | Sometimes
SETGID | Change group ID | Sometimes (init scripts)
SETUID | Change user ID | Sometimes (init scripts)
SETFCAP | Set file capabilities | Almost never
SETPCAP | Modify process capabilities | Almost never
NET_BIND_SERVICE | Bind to ports < 1024 | Only for privileged ports such as 80/443
SYS_CHROOT | Use chroot | Almost never
KILL | Bypass permission checks when sending signals | Sometimes
AUDIT_WRITE | Write to kernel audit log | Rarely

Follow the principle of least privilege: drop all capabilities and add back only what your application needs.

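# Drop ALL capabilities, then add back only what's needed
# (here: binding to a port below 1024)
docker run \
    --cap-drop ALL \
    --cap-add NET_BIND_SERVICE \
    myapp

# The official nginx image starts as root, binds port 80, then switches its
# workers to the nginx user, so it also needs CHOWN, SETUID, and SETGID
# (verify the exact set against the nginx version you run)
docker run \
    --cap-drop ALL \
    --cap-add NET_BIND_SERVICE \
    --cap-add CHOWN \
    --cap-add SETUID \
    --cap-add SETGID \
    -p 80:80 \
    nginx

# Most applications need nothing
docker run \
    --cap-drop ALL \
    myapp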

In Docker Compose:

services:
  api:
    image: myapp
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

Checking capabilities

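# See what capabilities a running container has
docker exec my-container cat /proc/1/status | grep Cap

# Decode a hex mask (e.g. the CapEff value) on the host; capsh ships with
# libcap (libcap2-bin on Debian/Ubuntu) and is rarely present in slim images.
# 00000000a80425fb is Docker's default capability set.
capsh --decode=00000000a80425fb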
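capsh is not always available. Because each capability is simply one bit of that hex mask (bit 0 is CAP_CHOWN, bit 40 is CAP_CHECKPOINT_RESTORE, following the kernel numbering documented in capabilities(7)), a short bash function can do the decoding instead. This is a sketch: decode_caps is a hypothetical name, and bits above 40, which future kernels may add, are simply not printed.

```bash
#!/usr/bin/env bash
# Capability names indexed by kernel bit number (0 = CAP_CHOWN ... 40 = CAP_CHECKPOINT_RESTORE)
CAP_NAMES=(
  chown dac_override dac_read_search fowner fsetid kill setgid setuid
  setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
  ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
  sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod
  lease audit_write audit_control setfcap mac_override mac_admin syslog
  wake_alarm block_suspend audit_read perfmon bpf checkpoint_restore
)

# decode_caps: turn a hex capability mask (CapEff/CapPrm/CapBnd from
# /proc/<pid>/status) into a comma-separated list of capability names.
decode_caps() {
  local hex=${1#0x}              # accept masks with or without a 0x prefix
  local mask=$(( 16#$hex ))      # bash integers are 64-bit, enough for 41 bits
  local bit names=()
  for bit in "${!CAP_NAMES[@]}"; do
    if (( (mask >> bit) & 1 )); then
      names+=("cap_${CAP_NAMES[bit]}")
    fi
  done
  local IFS=,
  echo "${names[*]}"
}
```

For example, decode_caps 00000000a80425fb lists the 14 capabilities in Docker's default set, and decode_caps "$(docker exec my-container cat /proc/1/status | awk '/^CapEff/ {print $2}')" decodes a running container's effective set.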

Secrets Management

Mishandled secrets (API keys, database passwords, TLS certificates) are among the most common security failures in containerized applications.

How NOT to handle secrets

# NEVER: Secrets in build arguments (stored in image history)
ARG DB_PASSWORD=supersecret
RUN echo "password=$DB_PASSWORD" >> /app/config

# NEVER: Secrets in environment variables in the Dockerfile
ENV API_KEY=sk-12345abcde

# NEVER: Secrets copied into the image with COPY
COPY credentials.json /app/credentials.json

All three of these are visible to anyone with access to the image:

# Build args are visible in history (--no-trunc shows the full commands)
docker history --no-trunc myapp
# Shows: ARG DB_PASSWORD=supersecret

# Environment variables are visible in inspect
docker inspect myapp --format '{{json .Config.Env}}'
# Shows: [..., "API_KEY=sk-12345abcde"] alongside the base image's variables

# Files are extractable from the image
docker create --name extract myapp
docker cp extract:/app/credentials.json .
docker rm extract
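The history check can be automated. The sketch below reads image-history commands on stdin and prints any that look like they embed credentials. The keyword list is a rough heuristic of our own, not a substitute for a dedicated secret scanner: it will also flag harmless lines such as RUN --mount=type=secret, and a file added with COPY only shows up by its name, not its content.

```bash
#!/usr/bin/env bash
# find_secret_like_history: print history lines that mention credential-like
# keywords. Exit status follows grep: 0 if something matched, 1 if not.
find_secret_like_history() {
  grep -Ei 'password|passwd|secret|token|api[_-]?key|credential'
}

# Example CI gate (fails the job if anything suspicious is found):
# if docker history --no-trunc --format '{{.CreatedBy}}' myapp | find_secret_like_history; then
#   exit 1
# fi
```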

Environment variables (acceptable for many use cases)

Environment variables at runtime (not in the Dockerfile) are the most common approach:

docker run -e DB_PASSWORD=secret -e API_KEY=sk-12345 myapp

Or with a file:

docker run --env-file .env myapp

The .env file should never be committed to version control (add it to .gitignore).

Risks of environment variables:

  • Visible via docker inspect
  • Available to all processes in the container (including child processes)
  • Can be logged accidentally (env | sort in debug output, error reporters)
  • Visible in /proc/<pid>/environ inside the container

Docker BuildKit secrets (for build-time secrets)

BuildKit can mount secrets during the build without storing them in any layer:

# syntax=docker/dockerfile:1

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .

# Mount the secret at build time; it is never stored in a layer
RUN --mount=type=secret,id=pip_extra_index \
    pip install --no-cache-dir \
    --extra-index-url "$(cat /run/secrets/pip_extra_index)" \
    -r requirements.txt

# Build with the secret
DOCKER_BUILDKIT=1 docker build \
    --secret id=pip_extra_index,src=./pip_index_url.txt \
    -t myapp .

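The secret is available during that RUN instruction but is not stored in the image or any layer, as long as the command itself doesn't write it to disk (for example, into a pip configuration file).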

Docker Swarm secrets (for runtime secrets)

If you use Docker Swarm, secrets are first-class:

# Create a secret (printf avoids the trailing newline that echo would add)
printf '%s' "supersecretpassword" | docker secret create db_password -

# Use it in a service
docker service create \
    --name api \
    --secret db_password \
    myapp

Inside the container, the secret is available as a file at /run/secrets/db_password. This is more secure than environment variables because:

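  • It’s mounted on an in-memory tmpfs inside the container, so it is never written to the container’s filesystem (Swarm managers store it encrypted in their Raft log)
  • Only available to services that explicitly request it
  • Can be rotated by creating a new secret and running docker service update with --secret-rm and --secret-add, without rebuilding the image (the update redeploys the service’s tasks)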

Files mounted at runtime

For non-Swarm deployments, you can get some of the same benefits by bind-mounting secret files read-only:

docker run \
    -v /secure/path/credentials.json:/run/secrets/credentials.json:ro \
    myapp

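The :ro flag makes the mount read-only. Combining it with --read-only limits where a compromised process could stash a copy of the secret, although anything that can read the file can still copy it into a writable tmpfs mount (such as /tmp) or send it over the network.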
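The application then has to read the secret from a file instead of an environment variable. A common convention, used by official images such as postgres through POSTGRES_PASSWORD_FILE, is to accept NAME_FILE as an alternative to NAME. The bash helper below is a minimal sketch of that convention for shell entrypoints; read_secret and DB_PASSWORD are illustrative names, not part of any Docker API.

```bash
#!/usr/bin/env bash
# read_secret NAME: print the secret NAME, preferring the file named by
# NAME_FILE (e.g. DB_PASSWORD_FILE=/run/secrets/db_password) over a plain
# NAME environment variable.
read_secret() {
  local name=$1
  local file_var="${name}_FILE"
  local file=${!file_var:-}   # indirect expansion: value of e.g. $DB_PASSWORD_FILE

  if [[ -n $file ]]; then
    if [[ ! -r $file ]]; then
      echo "read_secret: cannot read $file" >&2
      return 1
    fi
    # $(< file) drops trailing newlines, e.g. from "echo secret > file"
    printf '%s' "$(< "$file")"
  elif [[ -n ${!name:-} ]]; then
    printf '%s' "${!name}"
  else
    echo "read_secret: neither $name nor $file_var is set" >&2
    return 1
  fi
}
```

In an entrypoint you might write db_password=$(read_secret DB_PASSWORD) and hand the value only to the process that needs it; exporting it again as an environment variable brings back the risks listed earlier.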

Image Scanning with Trivy

Trivy is a vulnerability scanner that checks container images against known CVE databases:


# Install Trivy
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin

# Scan an image
trivy image myapp:latest

Example output (abridged; the CVE IDs are placeholders):

myapp:latest (debian 12.1)
===========================
Total: 45 (UNKNOWN: 0, LOW: 25, MEDIUM: 12, HIGH: 6, CRITICAL: 2)

+-------------------+------------------+----------+-------------------+-------------------+
|      LIBRARY      |  VULNERABILITY   | SEVERITY | INSTALLED VERSION |   FIXED VERSION   |
+-------------------+------------------+----------+-------------------+-------------------+
| libssl3           | CVE-2023-XXXXX   | CRITICAL | 3.0.9-1           | 3.0.11-1          |
| libcurl4          | CVE-2023-YYYYY   | CRITICAL | 7.88.1-10         | 7.88.1-10+deb12u4 |
| python3.11        | CVE-2023-ZZZZZ   | HIGH     | 3.11.4            | 3.11.5            |
+-------------------+------------------+----------+-------------------+-------------------+

Python (python-pkg)
===================
Total: 3 (HIGH: 2, MEDIUM: 1)

+-------------------+------------------+----------+-------------------+-------------------+
|      LIBRARY      |  VULNERABILITY   | SEVERITY | INSTALLED VERSION |   FIXED VERSION   |
+-------------------+------------------+----------+-------------------+-------------------+
| requests          | CVE-2023-AAAAA   | HIGH     | 2.28.0            | 2.31.0            |
| flask             | CVE-2023-BBBBB   | MEDIUM   | 2.2.0             | 2.3.3             |
+-------------------+------------------+----------+-------------------+-------------------+

Trivy scans both OS packages and application dependencies (pip, npm, gem, etc.).

# Scan only for CRITICAL and HIGH severity
trivy image --severity CRITICAL,HIGH myapp:latest

# Exit with code 1 if any CRITICAL vulnerability is found (useful in CI)
trivy image --exit-code 1 --severity CRITICAL myapp:latest

# Scan a Dockerfile for misconfigurations (e.g. no USER, :latest base tags)
trivy config Dockerfile

# Scan a local filesystem for vulnerabilities and hard-coded secrets
# (newer Trivy releases renamed the older --security-checks flag to --scanners)
trivy fs --scanners vuln,secret ./

Integrate Trivy in CI

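# GitHub Actions example (assumes an earlier step built myapp:${{ github.sha }})
- name: Run Trivy vulnerability scanner
  # Consider pinning to a release tag or commit SHA instead of @master
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'table'
    exit-code: '1'
    severity: 'CRITICAL,HIGH'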

Minimal Base Images

The fewer files in your image, the smaller the attack surface. Compare these base images:

Base image | Size (approx.) | Packages | Shell | Security posture
ubuntu:22.04 | 78 MB | ~100 | bash | Large attack surface
debian:bookworm-slim | 75 MB | ~80 | bash | Slightly smaller
alpine:3.18 | 7 MB | ~15 | sh | Small, uses musl libc
distroless/base | 20 MB | ~5 | None | Minimal, no shell access
distroless/static | 2 MB | ~2 | None | Only static binaries
scratch | 0 MB | 0 | None | Absolute minimum

Distroless Images

Google’s distroless images contain only your application and its runtime dependencies — no shell, no package manager, no unnecessary utilities:

# Multi-stage build with distroless
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a self-contained directory that can be copied wholesale
RUN pip install --no-cache-dir --target=/deps -r requirements.txt
COPY . .

# The :nonroot tag runs as an unprivileged user (UID 65532) instead of root
FROM gcr.io/distroless/python3-debian12:nonroot
WORKDIR /app
COPY --from=builder /deps /deps
COPY --from=builder /app .
# Make the copied dependencies importable
ENV PYTHONPATH=/deps
ENTRYPOINT ["python3", "app.py"]

Benefits:

  • No shell means docker exec ... bash doesn’t work, and an attacker who gains code execution can’t simply spawn an interactive shell
  • No package manager means attackers can’t install tools
  • Fewer files means fewer potential CVEs

Drawback: debugging is harder, because you can’t exec a shell into the container. Use the ephemeral debug container technique from the previous article.

Scratch Images (Go, Rust)

For statically compiled languages, you can use scratch (literally empty):

FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server .

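FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
# scratch has no /etc/passwd, so use a numeric UID:GID (65534 is conventionally "nobody")
USER 65534:65534
EXPOSE 8080
ENTRYPOINT ["/server"]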

The final image contains just the binary and the CA certificate bundle, so there is almost nothing else in it for an attacker to use.

Docker Content Trust

Docker Content Trust (DCT) uses digital signatures to verify image authenticity:

# Enable content trust
export DOCKER_CONTENT_TRUST=1

# Now pulls and pushes require signatures
docker pull nginx:latest
# Only succeeds if the image is signed

# Push a signed image (requires setting up signing keys)
docker push myrepo/myapp:v1.0
# Docker will prompt for a signing passphrase

DCT uses The Update Framework (TUF) to manage keys and signatures. When enabled:

  • docker pull verifies that the image was signed by a trusted publisher
  • docker push signs the image with your key
  • Unsigned images are rejected

This prevents supply chain attacks where a registry is compromised and images are replaced with malicious ones.

Resource Limits

Without resource limits, a container can consume as much CPU, memory, and disk I/O as the host will give it, starving other containers and the host itself:

# Memory limit (the kernel OOM-kills processes in the container if it exceeds this)
docker run --memory 512m myapp

# Memory limit with swap (--memory-swap is memory + swap, so this allows 512m of swap)
docker run --memory 512m --memory-swap 1g myapp

# CPU limit (container gets at most 0.5 CPU cores)
docker run --cpus 0.5 myapp

# CPU shares (relative weight, default 1024; only matters under CPU contention)
docker run --cpu-shares 512 myapp

# Combined limits (--memory-swap equal to --memory disables swap)
docker run \
    --memory 512m \
    --memory-swap 512m \
    --cpus 1.0 \
    --pids-limit 100 \
    myapp

In Docker Compose:

services:
  api:
    image: myapp
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 128M

Resource | Flag | Effect
Memory | --memory 512m | Hard limit, OOM-killed if exceeded
Memory + Swap | --memory-swap 1g | Total memory+swap limit
CPU | --cpus 0.5 | Hard limit: 50% of one core
CPU shares | --cpu-shares 512 | Relative weight (soft limit)
PIDs | --pids-limit 100 | Max number of processes (prevents fork bombs)
Disk I/O | --device-read-bps /dev/sda:1mb | Disk bandwidth limit

The --pids-limit flag is often overlooked but prevents fork bomb attacks:

# Without --pids-limit, a fork bomb in one container can exhaust the host's process table
# With it, the container is limited to 100 processes
docker run --pids-limit 100 myapp
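To confirm the limits actually reached the kernel, you can read the cgroup interface files from inside the container. On a cgroup v2 host, where Docker gives each container its own cgroup namespace by default, /sys/fs/cgroup inside the container is the container's own cgroup: memory.max, pids.max, and cpu.max hold the limits, with cpu.max expressed as quota and period in microseconds (so --cpus 0.5 appears as 50000 100000). The function below is a sketch under those assumptions; show_limits is a hypothetical name.

```bash
#!/usr/bin/env bash
# show_limits: report the memory, PID, and CPU limits of a cgroup v2 directory
# (default /sys/fs/cgroup, i.e. the container's own cgroup when run inside it).
show_limits() {
  local dir=${1:-/sys/fs/cgroup}
  local mem pids quota period cpus

  # memory.max holds a byte count, or the literal "max" when unlimited
  mem=$(cat "$dir/memory.max" 2>/dev/null) || mem=unknown
  if [[ $mem =~ ^[0-9]+$ ]]; then
    mem="$((mem / 1024 / 1024))MiB"
  elif [[ $mem == max ]]; then
    mem=unlimited
  fi

  # pids.max holds a process count, or "max"
  pids=$(cat "$dir/pids.max" 2>/dev/null) || pids=unknown
  [[ $pids == max ]] && pids=unlimited

  # cpu.max holds "<quota> <period>", e.g. "50000 100000" for half a core
  if read -r quota period 2>/dev/null < "$dir/cpu.max"; then
    if [[ $quota == max ]]; then
      cpus=unlimited
    else
      cpus=$(awk -v q="$quota" -v p="$period" 'BEGIN { printf "%.2f", q / p }')
    fi
  else
    cpus=unknown
  fi

  printf 'memory: %s\npids: %s\ncpus: %s\n' "$mem" "$pids" "$cpus"
}
```

On the host, with the systemd cgroup driver, the same function could be pointed at /sys/fs/cgroup/system.slice/docker-<container-id>.scope; that path is an assumption about your host's cgroup layout.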

Security Options


Seccomp Profiles

Seccomp (Secure Computing Mode) filters which system calls a container can make. According to Docker's documentation, the default seccomp profile blocks around 44 of the 300+ available system calls:


# Run with the default seccomp profile (automatic)
docker run myapp

# Run with a custom seccomp profile
docker run --security-opt seccomp=/path/to/profile.json myapp

# Disable seccomp (DON'T do this in production)
docker run --security-opt seccomp=unconfined myapp
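To verify that a filter is actually applied, check the Seccomp field of /proc/<pid>/status: 0 means disabled, 1 strict mode, and 2 a filter, which is what Docker's default profile installs. A minimal bash sketch, with seccomp_mode as a hypothetical name:

```bash
#!/usr/bin/env bash
# seccomp_mode: print the seccomp mode recorded in a /proc/<pid>/status file
# (default: the current process). Matches the "Seccomp:" field exactly, so the
# separate "Seccomp_filters:" line on newer kernels is ignored.
seccomp_mode() {
  local status_file=${1:-/proc/self/status}
  local mode
  mode=$(awk '$1 == "Seccomp:" { print $2 }' "$status_file" 2>/dev/null)
  case $mode in
    0) echo disabled ;;
    1) echo strict ;;
    2) echo filter ;;
    *) echo unknown; return 1 ;;
  esac
}
```

Run inside a container started with --security-opt seccomp=unconfined, it should report disabled rather than filter.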

AppArmor and SELinux

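On hosts with AppArmor (typically Ubuntu/Debian), Docker applies its docker-default profile automatically. On SELinux hosts (RHEL/CentOS/Fedora), containers are confined by SELinux labels when the daemon runs with SELinux support enabled: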

# Check the AppArmor profile
docker inspect my-container --format '{{.AppArmorProfile}}'
# Output: docker-default

# Run with a custom AppArmor profile
docker run --security-opt apparmor=my-custom-profile myapp

No New Privileges

The no-new-privileges option prevents processes inside the container from gaining additional privileges, for example through setuid binaries:

docker run --security-opt no-new-privileges:true myapp

In Docker Compose:

services:
  api:
    image: myapp
    security_opt:
      - no-new-privileges:true
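The kernel exposes this flag as the NoNewPrivs field of /proc/<pid>/status (since Linux 4.10), so you can confirm the option took effect. A minimal sketch; no_new_privs_enabled is an illustrative name:

```bash
#!/usr/bin/env bash
# no_new_privs_enabled: succeed if the process described by a /proc/<pid>/status
# file (default: the current process) has the no_new_privs bit set.
no_new_privs_enabled() {
  local status_file=${1:-/proc/self/status}
  [[ $(awk '$1 == "NoNewPrivs:" { print $2 }' "$status_file" 2>/dev/null) == 1 ]]
}
```

For a quick manual check, docker run --rm --security-opt no-new-privileges:true alpine grep NoNewPrivs /proc/self/status should show a value of 1.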

Security Best Practices Checklist

Practice | Priority | Implementation
Run as non-root user | Critical | USER appuser in Dockerfile
Use specific image tags | Critical | FROM python:3.11.5-slim, never latest
Scan images for CVEs | Critical | trivy image myapp in CI pipeline
Drop all capabilities | High | --cap-drop ALL --cap-add <needed>
Use read-only filesystem | High | --read-only --tmpfs /tmp
Set memory limits | High | --memory 512m
Use .dockerignore | High | Exclude .git, .env, secrets
No secrets in images | Critical | Use runtime env vars, mounted files, or Docker secrets
Use multi-stage builds | High | Build tools stay out of production image
Enable no-new-privileges | Medium | --security-opt no-new-privileges:true
Use minimal base images | Medium | Alpine, distroless, or scratch
Pin dependency versions | Medium | Lockfiles, exact version pins
Set PID limits | Medium | --pids-limit 100
Enable content trust | Medium | DOCKER_CONTENT_TRUST=1
Use health checks | Medium | HEALTHCHECK CMD curl -f http://localhost/health
Limit network exposure | Medium | Use custom networks, don’t expose unnecessary ports
Audit image history | Low | docker history --no-trunc myapp
Use read-only volumes | Low | -v config:/app/config:ro
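Several of these items can be spot-checked on a running container from docker inspect output. The sketch below assumes bash 4+ and one pipe-separated line produced by the inspect format string shown in its comment; audit_hardening is a hypothetical name, and how unset limits print (0, -1, or <nil>) can vary between Docker versions, so treat its verdicts as hints rather than proof.

```bash
#!/usr/bin/env bash
# audit_hardening: flag common hardening gaps in one line of docker inspect
# output, formatted as User|ReadonlyRootfs|CapDrop|Memory|PidsLimit|SecurityOpt:
#   docker inspect --format '{{.Config.User}}|{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.CapDrop}}|{{.HostConfig.Memory}}|{{.HostConfig.PidsLimit}}|{{.HostConfig.SecurityOpt}}' my-container
audit_hardening() {
  local user ro capdrop mem pids secopt issues=0
  IFS='|' read -r user ro capdrop mem pids secopt <<< "$1"

  case ${user%%:*} in
    ""|0|root) echo "WARN: runs as root"; issues=$((issues + 1)) ;;
  esac
  if [[ $ro != true ]]; then
    echo "WARN: root filesystem is writable"; issues=$((issues + 1))
  fi
  if [[ ${capdrop^^} != *ALL* ]]; then
    echo "WARN: capabilities not dropped with --cap-drop ALL"; issues=$((issues + 1))
  fi
  if [[ -z $mem || $mem == 0 ]]; then
    echo "WARN: no memory limit"; issues=$((issues + 1))
  fi
  if ! [[ $pids =~ ^[1-9][0-9]*$ ]]; then
    echo "WARN: no PID limit"; issues=$((issues + 1))
  fi
  if [[ $secopt != *no-new-privileges* || $secopt == *no-new-privileges?false* ]]; then
    echo "WARN: no-new-privileges not set"; issues=$((issues + 1))
  fi

  echo "$issues issue(s) found"
  (( issues == 0 ))   # exit status 0 only when nothing was flagged
}
```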

A Hardened Docker Compose Example

Putting it all together:

services:
  api:
    build:
      context: ./api
      target: production
    read_only: true
    tmpfs:
      - /tmp:size=100m,mode=1777
      - /var/run:size=1m
    user: "1000:1000"
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    pids_limit: 100
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 512M
        reservations:
          memory: 128M
    healthcheck:
      test: ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8000/health')\""]
      interval: 30s
      timeout: 5s
      retries: 3
    environment:
      DATABASE_URL: ${DATABASE_URL}
    ports:
      - "8000:8000"
    networks:
      - frontend
      - backend
    restart: unless-stopped
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  postgres:
    image: postgres:16-alpine
    read_only: true
    tmpfs:
      - /tmp
      - /var/run/postgresql
    # postgres:16-alpine's postgres user is UID/GID 70 (999 is the Debian-based image's UID)
    user: "70:70"
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    deploy:
      resources:
        limits:
          memory: 1G
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
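      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    # Mounts the secret at /run/secrets/db_password (the host file must be readable by the container user)
    secrets:
      - db_password
    networks:
      - backend
    restart: unless-stopped

networks:
  frontend:
  backend:
    internal: true  # No external access; only containers on this network can reach it

volumes:
  postgres-data:

secrets:
  db_password:
    file: ./secrets/db_password.txt  # hypothetical path; keep it out of version control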

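Notice that backend is declared with internal: true. Containers attached only to that network, like postgres, cannot reach the internet, which limits the blast radius if the database container is compromised. The api container can still reach the outside world through frontend.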

What’s Next

You now know how to secure individual containers: non-root users, minimal capabilities, read-only filesystems, image scanning, and resource limits. But security is one challenge — scaling is another. What happens when a single host isn’t enough? When you need automatic failover, rolling updates, and service discovery across multiple machines? The final article previews container orchestration: Docker Swarm for simplicity and Kubernetes for scale, and when you might not need either.

In this series

Docker and Containers (8 parts)

  1. Docker and Containers (1): Why Containers — The Problem VMs Didn't Solve
  2. Docker and Containers (2): Images and Layers — What docker pull Actually Downloads
  3. Docker and Containers (3): Dockerfile Patterns — From Naive to Production
  4. Docker and Containers (4): Networking and Volumes — How Containers Talk and Persist
  5. Docker and Containers (5): Docker Compose — Multi-Container Applications
  6. Docker and Containers (6): Debugging and Logging — When Things Go Wrong Inside a Box
  7. Docker and Containers (7): Security — Running Containers Without Giving Away the Keys (this article)
  8. Docker and Containers (8): Beyond Docker — Kubernetes, Swarm, and What Comes Next
