DevOps & CloudDocumentedScanned

kubernetes

Comprehensive Kubernetes and OpenShift cluster management.

Share:

Installation

npx clawhub@latest install kubernetes

View the full skill documentation and source below.

Documentation

Kubernetes & OpenShift Cluster Management

Comprehensive skill for Kubernetes and OpenShift clusters covering operations, troubleshooting, manifests, security, and GitOps.

Current Versions (January 2026)

PlatformVersionDocumentation
Kubernetes1.31.x
OpenShift4.17.x
EKS1.31
AKS1.31
GKE1.31

Key Tools

ToolVersionPurpose
ArgoCDv2.13.xGitOps deployments
Fluxv2.4.xGitOps toolkit
Kustomizev5.5.xManifest customization
Helmv3.16.xPackage management
Velero1.15.xBackup/restore
Trivy0.58.xSecurity scanning
Kyverno1.13.xPolicy engine

Command Convention

IMPORTANT: Use kubectl for standard Kubernetes. Use oc for OpenShift/ARO.


1. CLUSTER OPERATIONS

Node Management

# View nodes
kubectl get nodes -o wide

# Drain node for maintenance
kubectl drain ${NODE} --ignore-daemonsets --delete-emptydir-data --grace-period=60

# Uncordon after maintenance
kubectl uncordon ${NODE}

# View node resources
kubectl top nodes

Cluster Upgrades

AKS:

az aks get-upgrades -g ${RG} -n ${CLUSTER} -o table
az aks upgrade -g ${RG} -n ${CLUSTER} --kubernetes-version ${VERSION}

EKS:

aws eks update-cluster-version --name ${CLUSTER} --kubernetes-version ${VERSION}

GKE:

gcloud container clusters upgrade ${CLUSTER} --master --cluster-version ${VERSION}

OpenShift:

oc adm upgrade --to=${VERSION}
oc get clusterversion

Backup with Velero

# Install Velero
velero install --provider ${PROVIDER} --bucket ${BUCKET} --secret-file ${CREDS}

# Create backup
velero backup create ${BACKUP_NAME} --include-namespaces ${NS}

# Restore
velero restore create --from-backup ${BACKUP_NAME}

2. TROUBLESHOOTING

Health Assessment

Run the bundled script for comprehensive health check:

bash scripts/cluster-health-check.sh

Pod Status Interpretation

StatusMeaningAction
PendingScheduling issueCheck resources, nodeSelector, tolerations
CrashLoopBackOffContainer crashingCheck logs: kubectl logs ${POD} --previous
ImagePullBackOffImage unavailableVerify image name, registry access
OOMKilledOut of memoryIncrease memory limits
EvictedNode pressureCheck node resources

Debugging Commands

# Pod logs (current and previous)
kubectl logs ${POD} -c ${CONTAINER} --previous

# Multi-pod logs with stern
stern ${LABEL_SELECTOR} -n ${NS}

# Exec into pod
kubectl exec -it ${POD} -- /bin/sh

# Pod events
kubectl describe pod ${POD} | grep -A 20 Events

# Cluster events (sorted by time)
kubectl get events -A --sort-by='.lastTimestamp' | tail -50

Network Troubleshooting

# Test DNS
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default

# Test service connectivity
kubectl run -it --rm debug --image=curlimages/curl -- curl -v 

# Check endpoints
kubectl get endpoints ${SVC}

3. MANIFEST GENERATION

Production Deployment Template

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  namespace: ${NAMESPACE}
  labels:
    app.kubernetes.io/name: ${APP_NAME}
    app.kubernetes.io/version: "${VERSION}"
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP_NAME}
    spec:
      serviceAccountName: ${APP_NAME}
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP_NAME}
          image: ${IMAGE}:${TAG}
          ports:
            - name: http
              containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app.kubernetes.io/name: ${APP_NAME}
                topologyKey: kubernetes.io/hostname

Service & Ingress

apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
spec:
  selector:
    app.kubernetes.io/name: ${APP_NAME}
  ports:
    - name: http
      port: 80
      targetPort: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${APP_NAME}
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - ${HOST}
      secretName: ${APP_NAME}-tls
  rules:
    - host: ${HOST}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${APP_NAME}
                port:
                  name: http

OpenShift Route

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: ${APP_NAME}
spec:
  to:
    kind: Service
    name: ${APP_NAME}
  port:
    targetPort: http
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Redirect

Use the bundled script for manifest generation:

bash scripts/generate-manifest.sh deployment myapp production


4. SECURITY

Security Audit

Run the bundled script:

bash scripts/security-audit.sh [namespace]

Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: ${NAMESPACE}
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/warn: restricted

NetworkPolicy (Zero Trust)

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ${APP_NAME}-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

RBAC Best Practices

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${APP_NAME}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${APP_NAME}-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${APP_NAME}-binding
subjects:
  - kind: ServiceAccount
    name: ${APP_NAME}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${APP_NAME}-role

Image Scanning

# Scan image with Trivy
trivy image ${IMAGE}:${TAG}

# Scan with severity filter
trivy image --severity HIGH,CRITICAL ${IMAGE}:${TAG}

# Generate SBOM
trivy image --format spdx-json -o sbom.json ${IMAGE}:${TAG}

5. GITOPS

ArgoCD Application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: ${APP_NAME}
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: ${GIT_REPO}
    targetRevision: main
    path: k8s/overlays/${ENV}
  destination:
    server: 
    namespace: ${NAMESPACE}
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Kustomize Structure

k8s/
├── base/
│   ├── kustomization.yaml
│   ├── deployment.yaml
│   └── service.yaml
└── overlays/
    ├── dev/
    │   └── kustomization.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── prod/
        └── kustomization.yaml

base/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml

overlays/prod/kustomization.yaml:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
namePrefix: prod-
namespace: production
replicas:
  - name: myapp
    count: 5
images:
  - name: myregistry/myapp
    newTag: v1.2.3

GitHub Actions CI/CD

name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Build and push image
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
      
      - name: Update Kustomize image
        run: |
          cd k8s/overlays/prod
          kustomize edit set image myapp=${{ secrets.REGISTRY }}/${{ github.event.repository.name }}:${{ github.sha }}
          
      - name: Commit and push
        run: |
          git config user.name "github-actions"
          git config user.email "github-actions@github.com"
          git add .
          git commit -m "Update image to ${{ github.sha }}"
          git push

Use the bundled script for ArgoCD sync:

bash scripts/argocd-app-sync.sh ${APP_NAME} --prune


Helper Scripts

This skill includes automation scripts in the scripts/ directory:

ScriptPurpose
cluster-health-check.shComprehensive cluster health assessment with scoring
security-audit.shSecurity posture audit (privileged, root, RBAC, NetworkPolicy)
node-maintenance.shSafe node drain and maintenance prep
pre-upgrade-check.shPre-upgrade validation checklist
generate-manifest.shGenerate production-ready K8s manifests
argocd-app-sync.shArgoCD application sync helper
Run any script:
bash scripts/<script-name>.sh [arguments]