---
name: kubernetes-expert
description: Deploy, scale, and operate production Kubernetes clusters. Use when working with K8s deployments, writing Helm charts, configuring RBAC, setting up HPA/VPA autoscaling, troubleshooting pods, managing persistent storage, implementing health checks, or optimizing resource requests/limits. Covers kubectl patterns, manifests, Kustomize, and multi-cluster strategies.
---
# Kubernetes Production Expert
## Core Mental Model

Kubernetes is a declarative system: you describe the desired state, and the control plane reconciles reality to match it.

```
User → kubectl apply → API Server → etcd (state store)
                            ↓
              Scheduler → assigns pods to nodes
              Controller Manager → reconciles state
              Kubelet (on each node) → runs containers
```
## Essential kubectl Patterns

```bash
# Context management
kubectl config get-contexts
kubectl config use-context prod-cluster
kubectl config current-context

# Debug a pod fast
kubectl get pods -n myapp -o wide                   # Which node? What IP?
kubectl describe pod myapp-xyz-abc -n myapp         # Events + conditions
kubectl logs myapp-xyz-abc -n myapp --previous      # Crashed container logs
kubectl exec -it myapp-xyz-abc -n myapp -- /bin/sh  # Shell into container
kubectl top pod -n myapp                            # CPU/mem usage

# Watch resources change
kubectl get pods -n myapp -w
kubectl rollout status deployment/myapp -n myapp

# Port forward for local debugging
kubectl port-forward pod/myapp-xyz-abc 8080:8080 -n myapp

# Rolling restart (preserves availability)
kubectl rollout restart deployment/myapp -n myapp

# Cordon and drain a node for maintenance
kubectl cordon node-01
kubectl drain node-01 --ignore-daemonsets --delete-emptydir-data
```
## Deployment Manifest (Production-Ready)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  labels:
    app: myapp
    version: "1.2.3"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  # Zero-downtime rolling update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Allow 1 extra pod during an update
      maxUnavailable: 0   # Never drop below the desired count
  template:
    metadata:
      labels:
        app: myapp
        version: "1.2.3"
      annotations:
        # Force a redeploy when the ConfigMap changes.
        # NOTE: this is Helm templating; drop it if applying raw manifests.
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
    spec:
      # Spread pods across zones
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: myapp
      # Prefer not to run multiple pods on the same node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: myapp
      serviceAccountName: myapp
      automountServiceAccountToken: false  # Security: mount only if the app calls the API
      # Graceful shutdown
      terminationGracePeriodSeconds: 30
      containers:
        - name: myapp
          image: myregistry.io/myapp:1.2.3  # ALWAYS pin an exact version
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          # CRITICAL: always set resource requests and limits
          resources:
            requests:
              cpu: "100m"       # 0.1 CPU core
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          # Liveness: restart the container if it is broken
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          # Readiness: remove from Service endpoints when not ready
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          # Startup: give the app time to initialize before liveness kicks in
          startupProbe:
            httpGet:
              path: /healthz
              port: http
            failureThreshold: 30  # 30 * 10s = 5 minutes max startup
            periodSeconds: 10
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets
          volumeMounts:
            - name: tmp
              mountPath: /tmp
          # Graceful handling of preStop: let in-flight requests drain
          # (lifecycle belongs to the container, not the pod spec)
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
          # Security hardening
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
      volumes:
        - name: tmp
          emptyDir: {}
```
## Services and Networking

```yaml
# ClusterIP (internal only)
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
    - name: http
      port: 80
      targetPort: http  # References the named container port
---
# Ingress (external traffic)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rpm: "100"              # requests/min per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  name: http
```
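The debugging runbook later in this document touches on network policy issues; as a hedged sketch, a NetworkPolicy that allows ingress to the app only from the ingress controller's namespace (the namespace label assumes a standard ingress-nginx install):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-ingress
  namespace: myapp
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # assumed controller namespace
      ports:
        - protocol: TCP
          port: 8080   # the container port, not the Service port
```

Note that a NetworkPolicy only takes effect if the cluster's CNI plugin (Calico, Cilium, etc.) enforces it.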
## Autoscaling

### Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale out when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom metric (requires a metrics adapter, e.g. Prometheus adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # Consider the last 60s before scaling up
      policies:
        - type: Percent
          value: 100                   # At most double the replicas
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5m before scaling down
      policies:
        - type: Percent
          value: 10                    # Remove at most 10% per minute (slow scale-down)
          periodSeconds: 60
```
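The skill description also mentions VPA. A recommendation-only VerticalPodAutoscaler sketch (requires the VPA operator to be installed in the cluster; kept in `"Off"` mode here because an active VPA and an HPA both driven by CPU/memory will fight each other):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # only publish recommendations; do not evict pods
```

Read the recommendations with `kubectl describe vpa myapp` and use them to tune the requests/limits in the Deployment manifest.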
### KEDA (Event-driven Autoscaling)

```yaml
# Scale to zero when the queue is empty; great for job processing
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-queue-scaler
spec:
  scaleTargetRef:
    name: myapp
  minReplicaCount: 0     # Scale to zero!
  maxReplicaCount: 100
  triggers:
    - type: rabbitmq
      metadata:
        hostFromEnv: RABBITMQ_URL  # assumed env var holding the amqp:// connection string
        queueName: work-queue
        queueLength: "10"          # Target ~1 replica per 10 messages
```
## RBAC

```yaml
# Service account for your app
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
  namespace: myapp
---
# Role: namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myapp-role
  namespace: myapp
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["myapp-secrets"]  # ONLY this specific secret
    verbs: ["get"]
---
# Bind the role to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp-rolebinding
  namespace: myapp
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myapp-role
subjects:
  - kind: ServiceAccount
    name: myapp
    namespace: myapp
```
## ConfigMaps and Secrets

```yaml
# ConfigMap for non-sensitive config
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
data:
  APP_PORT: "8080"
  LOG_LEVEL: "info"
  DB_HOST: "postgres.myapp.svc.cluster.local"
  # Multi-line config files
  app.yaml: |
    server:
      port: 8080
      timeout: 30s
    database:
      pool_size: 10
---
# Secret (base64-encoded, NOT encrypted at rest unless etcd encryption/KMS is enabled)
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
  namespace: myapp
type: Opaque
stringData:  # plain text here; the API server stores it base64-encoded under .data
  DB_PASSWORD: "my-secure-password"
  API_KEY: "my-api-key"
```
Better: use the External Secrets Operator to sync from AWS Secrets Manager / Vault:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
spec:
  secretStoreRef:
    name: aws-secretsmanager
    kind: ClusterSecretStore
  target:
    name: myapp-secrets
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: production/myapp
        property: db_password
```
## Persistent Storage

```yaml
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes:
    - ReadWriteOnce        # One node at a time (most cloud block disks)
    # ReadWriteMany: for shared access (NFS, EFS)
  storageClassName: gp3    # AWS gp3, GKE pd-ssd, etc.
  resources:
    requests:
      storage: 20Gi
```

```yaml
# StatefulSet: when pod identity and stable storage matter
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres   # must match spec.selector
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3
        resources:
          requests:
            storage: 100Gi
```
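The StatefulSet above references `serviceName: postgres-headless`, which must exist as a headless Service (`clusterIP: None`) so each replica gets a stable DNS name like `postgres-0.postgres-headless.<ns>.svc.cluster.local`. A minimal sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None    # headless: DNS resolves directly to pod IPs
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
```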
## PodDisruptionBudget

Limits how many pods voluntary disruptions (node drains, cluster upgrades) may evict at once:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  selector:
    matchLabels:
      app: myapp
  minAvailable: 2        # Always keep at least 2 pods running
  # OR: maxUnavailable: 1  # Allow at most 1 pod to be unavailable
```
## Helm Chart Best Practices

```yaml
# values.yaml: defaults, overrideable per environment
image:
  repository: myregistry.io/myapp
  tag: "1.2.3"
  pullPolicy: IfNotPresent

replicaCount: 3

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 50
  targetCPUUtilizationPercentage: 70

ingress:
  enabled: true
  className: nginx
  hosts:
    - host: myapp.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: myapp-tls
      hosts:
        - myapp.example.com

podDisruptionBudget:
  enabled: true
  minAvailable: 2
```
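One gotcha with the values above: when `autoscaling.enabled` is true, the chart's Deployment template should omit `spec.replicas` entirely so the HPA owns the replica count (otherwise every `helm upgrade` resets it). The standard template guard, as a sketch:

```yaml
# templates/deployment.yaml (fragment)
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
```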
```bash
# Helm workflow
helm lint ./myapp-chart
helm template ./myapp-chart --values values-prod.yaml             # Render locally (dry run)
helm diff upgrade myapp ./myapp-chart --values values-prod.yaml   # Show changes (helm-diff plugin)

# --atomic rolls back on failure; --wait blocks until all pods are ready
# (comments cannot follow a trailing backslash, so they live up here)
helm upgrade --install myapp ./myapp-chart \
  --values values-prod.yaml \
  --atomic \
  --timeout 5m \
  --wait
```
## Debugging Runbook

```bash
# Pod stuck in Pending
kubectl describe pod <pod> -n <ns>
# → Insufficient CPU/memory? Check node capacity: kubectl top nodes
# → Node selector/affinity mismatch? Check node labels, taints, and tolerations
# → PVC not bound? Check: kubectl get pvc -n <ns>

# Pod in CrashLoopBackOff
kubectl logs <pod> -n <ns> --previous    # Logs from the crashed container
# → Exit code 1? Check app startup errors
# → Exit code 137 / OOMKilled? Raise the memory limit or fix the leak

# Pod in ImagePullBackOff
kubectl describe pod <pod> -n <ns>       # Check the Events section
# → Check image name/tag spelling
# → Check imagePullSecrets if using a private registry

# Service not accessible
kubectl get endpoints <service> -n <ns>  # Are pods listed as endpoints?
# → If no endpoints: check that the Service selector matches the pod labels

# Debug network policy / DNS issues from inside the cluster
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
# Inside: curl http://<service>.<namespace>.svc.cluster.local
```
Resource Sizing Guidelines
| Workload | CPU Request | CPU Limit | Mem Request | Mem Limit |
| Node.js API | 50m-100m | 500m-1000m | 64Mi-128Mi | 256Mi-512Mi |
| Python API | 100m-200m | 500m-1000m | 128Mi-256Mi | 512Mi-1Gi |
| Go API | 50m-100m | 200m-500m | 32Mi-64Mi | 128Mi-256Mi |
| Java API | 500m-1000m | 2000m | 512Mi-1Gi | 2Gi-4Gi |
| Redis | 100m | 500m | 128Mi | 512Mi |
| PostgreSQL | 250m | 2000m | 256Mi | 2Gi |
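The large Java footprint in the table comes partly from the JVM sizing its heap independently of the container limit. A hedged env fragment for the Deployment's container (the 75% value is an assumption to tune per app) that ties heap size to the cgroup memory limit:

```yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=75.0"  # heap = 75% of the container memory limit (JDK 10+)
```

This leaves headroom for metaspace, threads, and off-heap buffers so the container is less likely to be OOMKilled at a full heap.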