diff --git a/skills/kubernetes-patterns/SKILL.md b/skills/kubernetes-patterns/SKILL.md
new file mode 100644
index 00000000..1d5a23e8
--- /dev/null
+++ b/skills/kubernetes-patterns/SKILL.md
@@ -0,0 +1,755 @@
+---
+name: kubernetes-patterns
+description: Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
+origin: ECC
+---
+
+# Kubernetes Patterns
+
+Production-grade Kubernetes patterns for deploying, managing, and debugging workloads reliably.
+
+## When to Activate
+
+- Writing Kubernetes manifests (Deployments, Services, Ingress, Jobs)
+- Configuring resource requests/limits, liveness/readiness probes
+- Setting up RBAC, namespaces, or ServiceAccounts
+- Managing configuration and secrets in K8s
+- Debugging CrashLoopBackOff, OOMKilled, pending pods, or image pull errors
+- Configuring HPA (Horizontal Pod Autoscaler) or PodDisruptionBudgets
+- Reviewing K8s YAML for security or correctness
+
+## When to Use
+
+> Same as **When to Activate** above. This alias satisfies repo skill-format conventions. Use this skill any time you are writing, reviewing, or debugging Kubernetes YAML and workloads.
+
+## How It Works
+
+This skill provides **copy-pasteable, production-grade YAML patterns** and **kubectl debugging commands** organized by task:
+
+1. **Deployment template** — A fully configured production `Deployment` with security context, rolling update strategy, all three probe types, resource limits, and environment injection from ConfigMap/Secret.
+2. **Probes** — Decision table for startup vs liveness vs readiness, with correct `failureThreshold × periodSeconds` math.
+3. **Services & Ingress** — ClusterIP, LoadBalancer, and TLS Ingress patterns with cert-manager annotations.
+4. **ConfigMaps & Secrets** — `envFrom`, file-mount, and external secrets guidance.
+5. **Resource management** — Requests vs limits rules of thumb by workload type (web API, JVM, worker, sidecar).
+6. **RBAC** — Least-privilege ServiceAccount → Role → RoleBinding chain.
+7. **HPA & PDB** — Autoscaling and node-drain safety configurations.
+8. **Jobs & CronJobs** — One-off and scheduled workload patterns with correct `restartPolicy`.
+9. **kubectl cheatsheet** — Logs, exec, rollback, port-forward, dry-run, and common error diagnosis commands.
+10. **Anti-patterns & checklist** — What NOT to do, and a security/reliability/observability checklist.
+
+## Examples
+
+See the sections below for complete, runnable examples. Quick references:
+
+| Task | Jump to |
+|------|---------|
+| Full production Deployment YAML | [Core Workload Patterns](#core-workload-patterns) |
+| Probe configuration | [Probes](#probes--liveness-readiness-startup) |
+| RBAC least-privilege setup | [RBAC](#rbac--roles-and-serviceaccounts) |
+| Debug a CrashLoopBackOff | [kubectl Debugging Cheatsheet](#kubectl-debugging-cheatsheet) |
+| Autoscaling | [HPA](#horizontal-pod-autoscaler-hpa) |
+
+---
+
+## Core Workload Patterns
+
+### Deployment — Production Template
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-app
+  namespace: my-namespace
+  labels:
+    app: my-app
+    version: "1.0.0"
+spec:
+  replicas: 3
+  selector:
+    matchLabels:
+      app: my-app
+  strategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxSurge: 1          # Allow 1 extra pod during update
+      maxUnavailable: 0    # Never reduce below desired count
+  template:
+    metadata:
+      labels:
+        app: my-app
+        version: "1.0.0"
+    spec:
+      # Security context at pod level
+      securityContext:
+        runAsNonRoot: true
+        runAsUser: 1001
+        fsGroup: 1001
+
+      # Graceful shutdown
+      terminationGracePeriodSeconds: 30
+
+      containers:
+        - name: my-app
+          image: ghcr.io/org/my-app:1.0.0   # Never use :latest
+          imagePullPolicy: IfNotPresent
+
+          ports:
+            - containerPort: 8080
+              protocol: TCP
+
+          # Resource requests AND limits are both required
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "128Mi"
+            limits:
+              cpu: "500m"
+              memory: "256Mi"
+
+          # Container security context
+          securityContext:
+            allowPrivilegeEscalation: false
+            readOnlyRootFilesystem: true
+            capabilities:
+              drop:
+                - ALL
+
+          # Probes (see Probes section below)
+          startupProbe:
+            httpGet:
+              path: /health
+              port: 8080
+            failureThreshold: 30
+            periodSeconds: 5
+          livenessProbe:
+            httpGet:
+              path: /health
+              port: 8080
+            initialDelaySeconds: 0
+            periodSeconds: 30
+            failureThreshold: 3
+          readinessProbe:
+            httpGet:
+              path: /ready
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 10
+            failureThreshold: 2
+
+          # Environment from ConfigMap and Secret
+          envFrom:
+            - configMapRef:
+                name: my-app-config
+          env:
+            - name: DB_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: my-app-secrets
+                  key: db-password
+
+          # Writable tmp directory when readOnlyRootFilesystem: true
+          volumeMounts:
+            - name: tmp
+              mountPath: /tmp
+
+      volumes:
+        - name: tmp
+          emptyDir: {}
+```
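+
+The rolling-update settings above translate into simple pod-count bounds. A minimal sketch with a hypothetical `rollout_bounds` helper, assuming integer `maxSurge`/`maxUnavailable` values (for percentages, Kubernetes rounds `maxSurge` up and `maxUnavailable` down before applying the same arithmetic):
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: pod-count bounds during a RollingUpdate.
+# Assumes integer maxSurge/maxUnavailable values, not percentages.
+rollout_bounds() {
+  local replicas=$1 max_surge=$2 max_unavailable=$3
+  local max_pods=$(( replicas + max_surge ))               # ceiling while old and new pods overlap
+  local min_available=$(( replicas - max_unavailable ))    # floor of available pods
+  echo "max_pods=${max_pods} min_available=${min_available}"
+}
+
+# Values from the Deployment template above: replicas 3, maxSurge 1, maxUnavailable 0.
+rollout_bounds 3 1 0
+```
+
+This prints `max_pods=4 min_available=3`. Because `maxUnavailable: 0` keeps old pods running until new ones are available, the cluster needs room for the surge pod's requests; if that pod cannot be scheduled, the rollout stalls with a Pending pod rather than reducing capacity.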
+
+---
+
+## Probes — Liveness, Readiness, Startup
+
+Each probe type triggers a different action when it fails:
+
+| Probe | Failure Action | Use For |
+|-------|---------------|---------|
+| `startupProbe` | Restarts container if it never passes within `failureThreshold × periodSeconds`; liveness and readiness checks wait until it passes | Slow-starting apps (JVM, Python) |
+| `livenessProbe` | Restarts container | Deadlock / hung process detection |
+| `readinessProbe` | Removes from Service endpoints | Temporary unavailability (DB reconnect) |
+
+```yaml
+# Correct pattern: startupProbe covers slow startup,
+# then liveness/readiness take over
+startupProbe:
+  httpGet:
+    path: /health
+    port: 8080
+  failureThreshold: 30  # 30 * 5s = 150s max startup time
+  periodSeconds: 5
+
+livenessProbe:
+  httpGet:
+    path: /health
+    port: 8080
+  periodSeconds: 30
+  failureThreshold: 3   # 3 * 30s = 90s before restart
+
+readinessProbe:
+  httpGet:
+    path: /ready         # Separate endpoint: checks DB, cache, etc.
+    port: 8080
+  periodSeconds: 10
+  failureThreshold: 2
+```
+
+```yaml
+# WRONG: initialDelaySeconds without startupProbe
+# If the app takes 60s to start, set a startupProbe instead
+livenessProbe:
+  httpGet:
+    path: /health
+    port: 8080
+  initialDelaySeconds: 60   # BAD: fixed guess; too short restarts a slow boot, too long delays failure detection
+```
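+
+The worst-case windows in the comments above are just `failureThreshold × periodSeconds`. A small sketch with a hypothetical `probe_window` helper that reproduces those numbers, handy for re-checking after tuning a probe. It ignores `timeoutSeconds` and any `initialDelaySeconds`, so real timings can run slightly longer:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: seconds of consecutive failures before the kubelet acts.
+# Approximation only: ignores timeoutSeconds and initialDelaySeconds.
+probe_window() {
+  local failure_threshold=$1 period_seconds=$2
+  echo $(( failure_threshold * period_seconds ))
+}
+
+echo "startup budget:     $(probe_window 30 5)s"   # 150s
+echo "liveness detection: $(probe_window 3 30)s"   # 90s
+echo "readiness removal:  $(probe_window 2 10)s"   # 20s
+```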
+
+---
+
+## Services and Ingress
+
+### Service Types
+
+```yaml
+# ClusterIP (default) — internal-only
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-app
+  namespace: my-namespace
+spec:
+  selector:
+    app: my-app
+  ports:
+    - port: 80
+      targetPort: 8080
+      protocol: TCP
+  type: ClusterIP
+```
+
+```yaml
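+# LoadBalancer: external traffic (cloud providers)
+# Fragment: only the spec fields that differ from the ClusterIP Service above
+spec:
+  type: LoadBalancer
+  ports:
+    - port: 443          # HTTPS only if the cloud LB terminates TLS (provider annotations); the pod speaks HTTP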
+      targetPort: 8080
+```
+
+### Ingress with TLS
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: my-app
+  namespace: my-namespace
+  annotations:
+    nginx.ingress.kubernetes.io/ssl-redirect: "true"
+    cert-manager.io/cluster-issuer: "letsencrypt-prod"
+spec:
+  ingressClassName: nginx
+  tls:
+    - hosts:
+        - myapp.example.com
+      secretName: my-app-tls
+  rules:
+    - host: myapp.example.com
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: my-app
+                port:
+                  number: 80
+```
+
+---
+
+## ConfigMaps and Secrets
+
+### ConfigMap — Non-sensitive configuration
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: my-app-config
+  namespace: my-namespace
+data:
+  LOG_LEVEL: "info"
+  APP_ENV: "production"
+  MAX_CONNECTIONS: "100"
+  # Mount as a file for complex config
+  app.yaml: |
+    server:
+      port: 8080
+      timeout: 30s
+```
+
+```yaml
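+# Mount ConfigMap key app.yaml as /etc/app/app.yaml
+# (volumes belong in the pod spec, volumeMounts in the container)
+spec:
+  template:
+    spec:
+      containers:
+        - name: my-app
+          volumeMounts:
+            - name: config
+              mountPath: /etc/app
+              readOnly: true
+      volumes:
+        - name: config
+          configMap:
+            name: my-app-config
+            items:
+              - key: app.yaml
+                path: app.yaml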
+```
+
+### Secrets — Sensitive data
+
+```bash
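+# Create or update a Secret idempotently: render with --dry-run=client, then apply.
+# --from-literal leaves the value in shell history; keep the source of truth in Vault/SOPS.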
+kubectl create secret generic my-app-secrets \
+  --from-literal=db-password='s3cr3t' \
+  --namespace=my-namespace \
+  --dry-run=client -o yaml | kubectl apply -f -
+```
+
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+  name: my-app-secrets
+  namespace: my-namespace
+type: Opaque
+# Values are base64-encoded (NOT encrypted — use Sealed Secrets or ESO for real encryption)
+data:
+  db-password: czNjcjN0  # base64 of 's3cr3t'
+```
+
+> **Important:** Raw Kubernetes Secrets are only base64-encoded, not encrypted at rest unless your cluster has encryption configured. Use [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) or [External Secrets Operator](https://external-secrets.io) for production.
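+
+Because `data` values are plain base64, anyone who can read the Secret object can decode them without a key. The sketch below round-trips the example value with the standard `base64` utility; it uses `printf '%s'` rather than `echo` because `echo` appends a newline that would silently become part of the stored password:
+
+```bash
+#!/usr/bin/env bash
+# Encode the example password exactly as it appears under `data:`.
+encoded=$(printf '%s' 's3cr3t' | base64)
+echo "$encoded"                            # czNjcjN0
+
+# Decoding needs no key: base64 is an encoding, not encryption.
+printf '%s' "$encoded" | base64 --decode   # s3cr3t
+echo
+
+# Pitfall: echo adds a trailing newline, which changes the encoded value.
+echo 's3cr3t' | base64                     # czNjcjN0Cg==
+```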
+
+---
+
+## Resource Requests and Limits
+
+```yaml
+resources:
+  requests:       # Scheduler uses this to place the pod
+    cpu: "100m"   # 100 millicores = 0.1 CPU
+    memory: "128Mi"
+  limits:         # CPU above this is throttled; memory above this is OOM-killed
+    cpu: "500m"
+    memory: "256Mi"
+```
+
+**Rules of thumb:**
+
+| Workload Type | CPU Request | Memory Request | Notes |
+|---------------|-------------|----------------|-------|
+| Web API | 100–250m | 128–256Mi | Set limits 2-4x requests |
+| Worker/consumer | 250–500m | 256–512Mi | Memory limit = request for predictability |
+| JVM app | 500m–1 | 512Mi–2Gi | Allow headroom above `-Xmx` for JVM overhead |
+| Sidecar | 10–50m | 32–64Mi | Keep minimal |
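+
+CPU quantities in the table mix two notations: `500m` is 500 millicores and a bare `1` is one full core (1000m). A sketch of a hypothetical `to_millicores` helper for comparing such values; it assumes only plain numbers and the `m` suffix, and does not handle other Kubernetes quantity forms:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: convert a CPU quantity to millicores.
+# Handles "250m", "1" and "0.5" style values only.
+to_millicores() {
+  local q=$1
+  case "$q" in
+    *m) echo "${q%m}" ;;   # already in millicores: strip the suffix
+    *)  awk -v cores="$q" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;   # cores -> millicores, rounded
+  esac
+}
+
+to_millicores 250m   # 250
+to_millicores 1      # 1000
+to_millicores 0.5    # 500
+```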
+
+```yaml
+# WRONG: No requests or limits — unpredictable scheduling, OOM evictions
+containers:
+  - name: app
+    image: myapp:latest
+    # No resources block at all: the pod is BestEffort and is evicted first under node pressure
+
+# WRONG: Limits without requests — requests default to limits, over-reserves capacity
+resources:
+  limits:
+    cpu: "2"
+    memory: "1Gi"
+  # requests missing — will default to limits values
+```
+
+---
+
+## RBAC — Roles and ServiceAccounts
+
+### Principle of Least Privilege
+
+**Two patterns depending on whether the app calls the Kubernetes API:**
+
+#### Pattern A — App does NOT need the Kubernetes API (most apps)
+
+Disable token automounting on the ServiceAccount. No Role or RoleBinding is needed.
+
+```yaml
+# ServiceAccount with token disabled — safest default
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: my-app-sa
+  namespace: my-namespace
+automountServiceAccountToken: false   # No K8s API token injected into pods
+```
+
+```yaml
+# Reference in Deployment — no token, no API access
+spec:
+  template:
+    spec:
+      serviceAccountName: my-app-sa
+      automountServiceAccountToken: false   # Belt-and-suspenders: also set at pod level
+```
+
+#### Pattern B — App DOES need the Kubernetes API (operators, controllers, config watchers)
+
+Enable the token and grant only the permissions actually required.
+
+```yaml
+# 1. ServiceAccount — enable token for this SA
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: my-app-sa
+  namespace: my-namespace
+automountServiceAccountToken: true    # Token required: app calls K8s API
+```
+
+```yaml
+# 2. Role — grant only what the app needs (namespace-scoped)
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: my-app-role
+  namespace: my-namespace
+rules:
+  - apiGroups: [""]
+    resources: ["configmaps"]
+    verbs: ["get", "list", "watch"]    # Read-only, specific resource
+  - apiGroups: [""]
+    resources: ["secrets"]
+    resourceNames: ["my-app-secrets"]  # Restrict to specific secret by name
+    verbs: ["get"]
+```
+
+```yaml
+# 3. Bind Role to ServiceAccount
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: my-app-rolebinding
+  namespace: my-namespace
+subjects:
+  - kind: ServiceAccount
+    name: my-app-sa
+    namespace: my-namespace
+roleRef:
+  kind: Role
+  apiGroup: rbac.authorization.k8s.io
+  name: my-app-role
+```
+
+```yaml
+# 4. Reference SA in Deployment
+spec:
+  template:
+    spec:
+      serviceAccountName: my-app-sa
+      # automountServiceAccountToken defaults to true from SA — token is injected
+```
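+
+To confirm the Role grants exactly what the app needs, you can impersonate the ServiceAccount with `kubectl auth can-i ... --as=<user>` (your own account needs impersonation rights). ServiceAccounts authenticate as `system:serviceaccount:<namespace>:<name>`; the hypothetical `sa_user` helper below builds that string and prints the checks to run, so the sketch itself works without cluster access:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: the username a ServiceAccount authenticates as.
+sa_user() {
+  local namespace=$1 name=$2
+  echo "system:serviceaccount:${namespace}:${name}"
+}
+
+user=$(sa_user my-namespace my-app-sa)
+echo "$user"   # system:serviceaccount:my-namespace:my-app-sa
+
+# Print (not run) the checks; run them where kubectl has cluster access.
+# With the Role above, expect: yes, yes, no.
+for check in "get configmaps" "get secrets/my-app-secrets" "list secrets"; do
+  echo "kubectl auth can-i ${check} --as=${user} -n my-namespace"
+done
+```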
+
+---
+
+## Horizontal Pod Autoscaler (HPA)
+
+```yaml
+apiVersion: autoscaling/v2
+kind: HorizontalPodAutoscaler
+metadata:
+  name: my-app-hpa
+  namespace: my-namespace
+spec:
+  scaleTargetRef:
+    apiVersion: apps/v1
+    kind: Deployment
+    name: my-app
+  minReplicas: 2      # Always at least 2 for HA
+  maxReplicas: 10
+  metrics:
+    - type: Resource
+      resource:
+        name: cpu
+        target:
+          type: Utilization
+          averageUtilization: 70    # Target 70% of requested CPU on average (scales up and down)
+    - type: Resource
+      resource:
+        name: memory
+        target:
+          type: Utilization
+          averageUtilization: 80
+```
+
+> HPA requires `resources.requests` to be set on all containers — it calculates utilization as `current / request`.
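+
+The controller's core rule is `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`, computed per metric; the largest result wins and is clamped to `minReplicas`/`maxReplicas`. A sketch in integer shell arithmetic with a hypothetical `hpa_desired` helper; it deliberately omits the controller's tolerance band (10% by default) and the scale-down stabilization window:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: replicas the HPA would request for a single metric.
+# Omits the default 10% tolerance and scale-down stabilization.
+hpa_desired() {
+  local current=$1 utilization=$2 target=$3 min=$4 max=$5
+  # Integer ceiling of current * utilization / target.
+  local desired=$(( (current * utilization + target - 1) / target ))
+  if (( desired < min )); then desired=$min; fi
+  if (( desired > max )); then desired=$max; fi
+  echo "$desired"
+}
+
+# 3 pods averaging 140% of their CPU request, 70% target, min 2, max 10.
+hpa_desired 3 140 70 2 10   # 6
+```
+
+Utilization is measured against the request, so a pod using 140m of a 100m request counts as 140%.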
+
+---
+
+## PodDisruptionBudget (PDB)
+
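+Limit how many pods voluntary disruptions (node drains, cluster-autoscaler scale-down) can evict at once. Rolling updates are not throttled by a PDB; they follow the Deployment's `maxUnavailable`: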
+
+```yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: my-app-pdb
+  namespace: my-namespace
+spec:
+  minAvailable: 2           # OR use maxUnavailable: 1
+  selector:
+    matchLabels:
+      app: my-app
+```
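+
+With `minAvailable`, the budget is `healthy pods - minAvailable`, never below zero. The sketch below (hypothetical `pdb_allowed` helper) shows why pairing `minAvailable: 2` with an HPA whose `minReplicas` is 2 can block node drains: at two replicas the budget is zero, so evictions wait until more pods exist. `maxUnavailable: 1` avoids this by permitting one eviction whenever all pods are healthy:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: evictions a minAvailable-style PDB currently permits.
+pdb_allowed() {
+  local healthy=$1 min_available=$2
+  local allowed=$(( healthy - min_available ))
+  if (( allowed < 0 )); then allowed=0; fi   # the budget never goes negative
+  echo "$allowed"
+}
+
+pdb_allowed 3 2   # 1: a drain can evict one pod at a time
+pdb_allowed 2 2   # 0: drains block, e.g. after the HPA scales down to 2
+```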
+
+---
+
+## Namespaces and Multi-Tenancy
+
+```bash
+# Create namespace with resource quotas
+kubectl create namespace my-namespace
+
+# Apply ResourceQuota to limit namespace consumption
+kubectl apply -f - <<EOF
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: my-namespace-quota
+  namespace: my-namespace
+spec:
+  hard:
+    requests.cpu: "4"
+    requests.memory: 4Gi
+    limits.cpu: "8"
+    limits.memory: 8Gi
+    pods: "20"
+EOF
+```
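+
+A quota caps the sum of every pod's requests and limits, so the pod count you can actually reach is set by the tightest dimension. A worked sketch using the quota above and the Deployment template's per-pod values (`100m`/`128Mi` requests, `500m`/`256Mi` limits); `quota_fit` is hypothetical and expects each pair in the same unit (millicores, MiB):
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: how many identical pods fit in a ResourceQuota.
+# First arg: hard pod-count limit; then "quota per_pod" pairs in matching units.
+quota_fit() {
+  local fit=$1; shift
+  while (( $# >= 2 )); do
+    local n=$(( $1 / $2 ))   # pods this dimension allows
+    if (( n < fit )); then fit=$n; fi
+    shift 2
+  done
+  echo "$fit"
+}
+
+# pods 20; requests.cpu, requests.memory, limits.cpu, limits.memory
+quota_fit 20  4000 100  4096 128  8000 500  8192 256   # 16 (limits.cpu binds)
+```
+
+Once a quota tracks CPU or memory, pods that omit those requests or limits are rejected unless a LimitRange supplies defaults.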
+
+---
+
+## Jobs and CronJobs
+
+```yaml
+# One-off Job (DB migration, data processing)
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: db-migrate
+  namespace: my-namespace
+spec:
+  backoffLimit: 3          # Retry up to 3 times on failure
+  ttlSecondsAfterFinished: 3600   # Auto-delete after 1h
+  template:
+    spec:
+      restartPolicy: OnFailure    # Jobs accept OnFailure or Never (Always is rejected)
+      containers:
+        - name: migrate
+          image: ghcr.io/org/my-app:1.0.0
+          command: ["python", "manage.py", "migrate"]
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "256Mi"
+```
+
+```yaml
+# CronJob
+apiVersion: batch/v1
+kind: CronJob
+metadata:
+  name: cleanup-job
+  namespace: my-namespace
+spec:
+  schedule: "0 2 * * *"         # 02:00 daily, controller's time zone unless spec.timeZone is set
+  concurrencyPolicy: Forbid      # Don't run if previous still running
+  successfulJobsHistoryLimit: 3
+  failedJobsHistoryLimit: 1
+  jobTemplate:
+    spec:
+      template:
+        spec:
+          restartPolicy: OnFailure
+          containers:
+            - name: cleanup
+              image: ghcr.io/org/cleanup:1.0.0
+              resources:
+                requests:
+                  cpu: "50m"
+                  memory: "64Mi"
+```
+
+---
+
+## kubectl Debugging Cheatsheet
+
+```bash
+# --- Pod status and logs ---
+kubectl get pods -n my-namespace
+kubectl get pods -n my-namespace -o wide          # Show node assignment
+kubectl describe pod <pod-name> -n my-namespace   # Events and state details
+kubectl logs <pod-name> -n my-namespace           # Current logs
+kubectl logs <pod-name> -n my-namespace --previous  # Logs from crashed container
+kubectl logs <pod-name> -n my-namespace -c <container>  # Multi-container pod
+
+# --- Execute into a running container ---
+kubectl exec -it <pod-name> -n my-namespace -- sh
+kubectl exec -it <pod-name> -n my-namespace -- bash
+
+# --- Check resource usage ---
+kubectl top pods -n my-namespace
+kubectl top nodes
+
+# --- Deployment operations ---
+kubectl rollout status deployment/my-app -n my-namespace
+kubectl rollout history deployment/my-app -n my-namespace
+kubectl rollout undo deployment/my-app -n my-namespace      # Rollback
+kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace
+
+# --- Scale manually ---
+kubectl scale deployment my-app --replicas=5 -n my-namespace
+
+# --- Inspect recent events in the namespace ---
+kubectl get events -n my-namespace --sort-by='.lastTimestamp'
+
+# --- Port-forward for local debugging ---
+kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
+kubectl port-forward svc/my-app 8080:80 -n my-namespace
+
+# --- Dry-run to validate YAML ---
+kubectl apply -f deployment.yaml --dry-run=client
+kubectl apply -f deployment.yaml --dry-run=server   # Validates against live cluster
+```
+
+### Diagnosing Common Errors
+
+```bash
+# CrashLoopBackOff: container keeps crashing
+kubectl logs <pod-name> --previous -n my-namespace  # Check crash logs
+kubectl describe pod <pod-name> -n my-namespace     # Check exit code & OOMKilled
+
+# ImagePullBackOff: can't pull image
+kubectl describe pod <pod-name> -n my-namespace     # Check Events section
+# Causes: wrong image name or tag, missing/invalid imagePullSecret for a private registry, registry unreachable
+
+# Pending pod: not scheduled
+kubectl describe pod <pod-name> -n my-namespace
+# Causes: insufficient resources, no matching node selector, taint/toleration mismatch
+
+# OOMKilled: out of memory
+# Increase memory limits, check for memory leaks
+kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
+```
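+
+Exit codes above 128 mean the process died from signal `code - 128`: an `OOMKilled` container typically reports 137 (SIGKILL), while 143 (SIGTERM) means it exited on the termination signal the kubelet sends first. 137 alone does not prove OOM, because SIGKILL also follows an expired grace period; confirm with `Reason: OOMKilled` under `Last State`. A sketch of a hypothetical `explain_exit` helper that names the signal with bash's `kill -l`:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical helper: translate a container exit code from `kubectl describe`.
+explain_exit() {
+  local code=$1
+  if (( code > 128 )); then
+    echo "killed by SIG$(kill -l $(( code - 128 )))"   # e.g. 137 - 128 = 9 -> KILL
+  elif (( code == 0 )); then
+    echo "exited cleanly"
+  else
+    echo "application error (exit $code)"
+  fi
+}
+
+explain_exit 137   # prints: killed by SIGKILL
+explain_exit 143   # prints: killed by SIGTERM
+explain_exit 1     # prints: application error (exit 1)
+```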
+
+---
+
+## Anti-Patterns
+
+```yaml
+# BAD: Using :latest tag — non-deterministic deployments
+image: myapp:latest
+
+# GOOD: Pin to a specific immutable tag (SHA or semver)
+image: ghcr.io/org/myapp:1.4.2
+# or
+image: ghcr.io/org/myapp@sha256:abc123...
+
+# ---
+
+# BAD: Running as root
+securityContext: {}    # Runs as the image's USER, which is often root
+
+# GOOD: Non-root with explicit UID
+securityContext:
+  runAsNonRoot: true
+  runAsUser: 1001
+
+# ---
+
+# BAD: No resource limits — one pod can starve the entire node
+containers:
+  - name: app
+    image: myapp:1.0.0
+    # No resources defined
+
+# GOOD: Always set requests and limits
+resources:
+  requests:
+    cpu: "100m"
+    memory: "128Mi"
+  limits:
+    cpu: "500m"
+    memory: "256Mi"
+
+# ---
+
+# BAD: Storing plaintext secrets in ConfigMaps
+apiVersion: v1
+kind: ConfigMap
+data:
+  DB_PASSWORD: "mysecretpassword"   # NEVER — use Secret or external secrets manager
+
+# ---
+
+# BAD: ClusterAdmin for application service accounts
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+roleRef:
+  kind: ClusterRole
+  name: cluster-admin    # Grants god-mode to your app
+
+# ---
+
+# BAD: minAvailable: 0 in PDB — defeats the purpose
+spec:
+  minAvailable: 0
+
+# ---
+
+# BAD: restartPolicy: Always in a Job (the API server rejects it)
+spec:
+  restartPolicy: Always   # Use OnFailure or Never for Jobs
+```
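+
+The `:latest` rule is easy to lint in CI. A sketch of a hypothetical `image_is_pinned` check that accepts a digest or an explicit tag other than `latest`; it reads the tag only after the last `/`, so a registry port such as `localhost:5000/app` is not mistaken for a tag. It checks syntax only, not whether a tag is actually immutable in your registry:
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical lint: is an image reference pinned?
+image_is_pinned() {
+  local ref=$1
+  [[ $ref == *@sha256:* ]] && return 0   # digest pin
+  local last=${ref##*/}                  # name[:tag] after the last slash
+  [[ $last == *:* ]] || return 1         # no tag means an implicit :latest
+  [[ ${last##*:} != latest ]]            # explicit tag, but not "latest"
+}
+
+for ref in myapp:latest ghcr.io/org/myapp:1.4.2 localhost:5000/app; do
+  if image_is_pinned "$ref"; then echo "ok       $ref"; else echo "UNPINNED $ref"; fi
+done
+```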
+
+---
+
+## Best Practices Checklist
+
+### Security
+- [ ] Container runs as non-root (`runAsNonRoot: true`, `runAsUser` set)
+- [ ] `readOnlyRootFilesystem: true` with `emptyDir` for writable paths
+- [ ] `allowPrivilegeEscalation: false`
+- [ ] All capabilities dropped (`capabilities.drop: [ALL]`)
+- [ ] Dedicated ServiceAccount per app, not `default`
+- [ ] `automountServiceAccountToken: false` unless needed
+- [ ] RBAC follows least privilege (use `Role`, not `ClusterRole` unless needed)
+- [ ] Secrets managed via Sealed Secrets or External Secrets Operator
+
+### Reliability
+- [ ] All 3 probe types configured (startup + liveness + readiness)
+- [ ] Resource requests AND limits set on every container
+- [ ] `minReplicas: 2+` for any production workload
+- [ ] PodDisruptionBudget defined for stateful or critical services
+- [ ] `RollingUpdate` strategy with `maxUnavailable: 0`
+- [ ] HPA configured for variable-load services
+
+### Observability
+- [ ] App exposes `/health` (liveness) and `/ready` (readiness) endpoints
+- [ ] Structured JSON logging (no PII in logs)
+- [ ] Resource labels: `app`, `version`, `environment`
+
+---
+
+## Related Skills
+
+- `docker-patterns` — Multi-stage Dockerfiles and image security
+- `deployment-patterns` — CI/CD pipelines, rollback strategy, health check endpoints
+- `security-review` — Broader security hardening context
+- `git-workflow` — GitOps integration with K8s (ArgoCD / Flux patterns)