mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-13 23:03:34 +08:00
feat(skills): add kubernetes-patterns skill (#2178)
* feat(skills): add kubernetes-patterns skill
* fix(skills): address CodeRabbit review on kubernetes-patterns
  - Add When to Use alias section (repo skill-format requirement)
  - Add How It Works overview section (required schema)
  - Add Examples quick-reference table (required schema)
  - Fix RBAC: split into Pattern A (no API, token disabled) and Pattern B (needs API, token enabled) to resolve contradiction between automountServiceAccountToken: false and Role/RoleBinding
  - Fix missing -n my-namespace flag on OOMKilled kubectl describe command
parent
7883da658b
commit
e116d69c65
755
skills/kubernetes-patterns/SKILL.md
Normal file
@ -0,0 +1,755 @@
---
name: kubernetes-patterns
description: Kubernetes workload patterns, resource management, RBAC, probes, autoscaling, ConfigMap/Secret handling, and kubectl debugging for production-grade deployments.
origin: ECC
---

# Kubernetes Patterns

Production-grade Kubernetes patterns for deploying, managing, and debugging workloads reliably.

## When to Activate

- Writing Kubernetes manifests (Deployments, Services, Ingress, Jobs)
- Configuring resource requests/limits, liveness/readiness probes
- Setting up RBAC, namespaces, or ServiceAccounts
- Managing configuration and secrets in K8s
- Debugging CrashLoopBackOff, OOMKilled, pending pods, or image pull errors
- Configuring HPA (Horizontal Pod Autoscaler) or PodDisruptionBudgets
- Reviewing K8s YAML for security or correctness

## When to Use

> Same as **When to Activate** above. This alias satisfies repo skill-format conventions. Use this skill any time you are writing, reviewing, or debugging Kubernetes YAML and workloads.

## How It Works

This skill provides **copy-pasteable, production-grade YAML patterns** and **kubectl debugging commands** organized by task:

1. **Deployment template** — A fully configured production `Deployment` with security context, rolling update strategy, all three probe types, resource limits, and environment injection from ConfigMap/Secret.
2. **Probes** — Decision table for startup vs liveness vs readiness, with correct `failureThreshold × periodSeconds` math.
3. **Services & Ingress** — ClusterIP, LoadBalancer, and TLS Ingress patterns with cert-manager annotations.
4. **ConfigMaps & Secrets** — `envFrom`, file-mount, and external secrets guidance.
5. **Resource management** — Requests vs limits rules of thumb by workload type (web API, JVM, worker, sidecar).
6. **RBAC** — Least-privilege ServiceAccount → Role → RoleBinding chain.
7. **HPA & PDB** — Autoscaling and node-drain safety configurations.
8. **Jobs & CronJobs** — One-off and scheduled workload patterns with correct `restartPolicy`.
9. **kubectl cheatsheet** — Logs, exec, rollback, port-forward, dry-run, and common error diagnosis commands.
10. **Anti-patterns & checklist** — What NOT to do, and a security/reliability/observability checklist.

## Examples

See the sections below for complete, runnable examples. Quick references:

| Task | Jump to |
|------|---------|
| Full production Deployment YAML | [Core Workload Patterns](#core-workload-patterns) |
| Probe configuration | [Probes](#probes--liveness-readiness-startup) |
| RBAC least-privilege setup | [RBAC](#rbac--roles-and-serviceaccounts) |
| Debug a CrashLoopBackOff | [kubectl Debugging Cheatsheet](#kubectl-debugging-cheatsheet) |
| Autoscaling | [HPA](#horizontal-pod-autoscaler-hpa) |

---

## Core Workload Patterns

### Deployment — Production Template

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-app
    version: "1.0.0"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Allow 1 extra pod during update
      maxUnavailable: 0    # Never reduce below desired count
  template:
    metadata:
      labels:
        app: my-app
        version: "1.0.0"
    spec:
      # Security context at pod level
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001

      # Graceful shutdown
      terminationGracePeriodSeconds: 30

      containers:
        - name: my-app
          image: ghcr.io/org/my-app:1.0.0   # Never use :latest
          imagePullPolicy: IfNotPresent

          ports:
            - containerPort: 8080
              protocol: TCP

          # Resource requests AND limits are both required
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"

          # Container security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL

          # Probes (see Probes section below)
          startupProbe:
            httpGet:
              path: /health
              port: 8080
            failureThreshold: 30
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 0
            periodSeconds: 30
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 2

          # Environment from ConfigMap and Secret
          envFrom:
            - configMapRef:
                name: my-app-config
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-app-secrets
                  key: db-password

          # Writable tmp directory when readOnlyRootFilesystem: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp

      volumes:
        - name: tmp
          emptyDir: {}
```

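
The `maxSurge` and `maxUnavailable` settings in the template bound how many pods exist during a rollout. The sketch below works through that arithmetic for absolute values only (percentage values are not handled); `rollout_bounds` is a hypothetical helper name, not a kubectl feature.

```bash
# Pod-count bounds during a RollingUpdate, for absolute (non-percentage)
# maxSurge / maxUnavailable values.
# Usage: rollout_bounds <replicas> <maxSurge> <maxUnavailable>
rollout_bounds() {
  replicas=$1
  max_surge=$2
  max_unavailable=$3
  # Upper bound on total pods, lower bound on available pods.
  echo "max_total=$((replicas + max_surge)) min_available=$((replicas - max_unavailable))"
}

# Values from the Deployment template: replicas 3, maxSurge 1, maxUnavailable 0
rollout_bounds 3 1 0
```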
---

## Probes — Liveness, Readiness, Startup

Understanding when to use each probe is critical:

| Probe | Failure Action | Use For |
|-------|---------------|---------|
| `startupProbe` | Container is killed and restarted if it has not passed after `failureThreshold × periodSeconds`; liveness and readiness checks are held off until it succeeds | Slow-starting apps (JVM, Python) |
| `livenessProbe` | Restarts container | Deadlock / hung process detection |
| `readinessProbe` | Removes from Service endpoints | Temporary unavailability (DB reconnect) |

```yaml
# Correct pattern: startupProbe covers slow startup,
# then liveness/readiness take over
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30   # 30 * 5s = 150s max startup time
  periodSeconds: 5

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 30
  failureThreshold: 3    # 3 * 30s = 90s before restart

readinessProbe:
  httpGet:
    path: /ready         # Separate endpoint: checks DB, cache, etc.
    port: 8080
  periodSeconds: 10
  failureThreshold: 2
```

```yaml
# WRONG: initialDelaySeconds without startupProbe
# If the app takes 60s to start, set a startupProbe instead
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 60   # BAD: fixed guess; too short kills slow starts, too long delays failure detection
```

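
The `failureThreshold × periodSeconds` budgets quoted in the comments above can be checked with a few lines of shell. This is an approximation that ignores `timeoutSeconds` and probe scheduling jitter; `probe_window` is a hypothetical helper introduced only for this illustration.

```bash
# Approximate seconds before a probe gives up:
#   initialDelaySeconds + failureThreshold * periodSeconds
# Usage: probe_window <failureThreshold> <periodSeconds> [initialDelaySeconds]
probe_window() {
  echo $(( ${3:-0} + $1 * $2 ))
}

probe_window 30 5     # startupProbe: maximum startup time
probe_window 3 30     # livenessProbe: time until restart
probe_window 2 10 5   # readinessProbe from the Deployment template (initialDelaySeconds: 5)
```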
---

## Services and Ingress

### Service Types

```yaml
# ClusterIP (default) — internal-only
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
```

```yaml
# LoadBalancer — external traffic (cloud providers)
# Fragment: apiVersion, kind, metadata, and selector are the same as in the ClusterIP Service above
spec:
  type: LoadBalancer
  ports:
    - port: 443
      targetPort: 8080
```

### Ingress with TLS

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: my-app-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

---

## ConfigMaps and Secrets

### ConfigMap — Non-sensitive configuration

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config
  namespace: my-namespace
data:
  LOG_LEVEL: "info"
  APP_ENV: "production"
  MAX_CONNECTIONS: "100"
  # Mount as a file for complex config
  app.yaml: |
    server:
      port: 8080
      timeout: 30s
```

```yaml
# Mount ConfigMap as a file
# Pod level (spec.template.spec.volumes)
volumes:
  - name: config
    configMap:
      name: my-app-config
      items:
        - key: app.yaml
          path: app.yaml
# Container level (containers[].volumeMounts)
volumeMounts:
  - name: config
    mountPath: /etc/app
    readOnly: true
```

### Secrets — Sensitive data

```bash
# Create or update a Secret from a literal via the CLI (keep the source value in Vault/SOPS)
kubectl create secret generic my-app-secrets \
  --from-literal=db-password='s3cr3t' \
  --namespace=my-namespace \
  --dry-run=client -o yaml | kubectl apply -f -
```

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets
  namespace: my-namespace
type: Opaque
# Values are base64-encoded (NOT encrypted — use Sealed Secrets or ESO for real encryption)
data:
  db-password: czNjcjN0   # base64 of 's3cr3t'
```

> **Important:** Raw Kubernetes Secrets are only base64-encoded, and they are not encrypted at rest unless your cluster has encryption configured. For production, use [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets) or [External Secrets Operator](https://external-secrets.io) so that plaintext values never live in manifests or Git.

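
To make the "encoded, not encrypted" point concrete, the round trip below shows that anyone who can read the Secret manifest can recover the value with a standard tool. It is a small sketch that assumes a `base64` binary accepting `-d` (GNU coreutils and BusyBox do; some older macOS versions use `-D` instead).

```bash
# base64 is a reversible encoding, not encryption.
encoded=$(printf '%s' 's3cr3t' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)

echo "encoded: $encoded"
echo "decoded: $decoded"
```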
---

## Resource Requests and Limits

```yaml
resources:
  requests:            # Scheduler uses this to place the pod
    cpu: "100m"        # 100 millicores = 0.1 CPU
    memory: "128Mi"
  limits:              # CPU is throttled at the limit; exceeding the memory limit gets the container OOM-killed
    cpu: "500m"
    memory: "256Mi"
```

**Rules of thumb:**

| Workload Type | CPU Request | Memory Request | Notes |
|---------------|-------------|----------------|-------|
| Web API | 100–250m | 128–256Mi | Set limits 2-4x requests |
| Worker/consumer | 250–500m | 256–512Mi | Memory limit = request for predictability |
| JVM app | 500m–1 | 512Mi–2Gi | Allow headroom above `-Xmx` for JVM overhead |
| Sidecar | 10–50m | 32–64Mi | Keep minimal |

```yaml
# WRONG: No requests or limits — unpredictable scheduling, OOM evictions
containers:
  - name: app
    image: myapp:latest
    # No resources block at all: dangerous in production

# WRONG: Limits without requests — requests default to limits, over-reserves capacity
resources:
  limits:
    cpu: "2"
    memory: "1Gi"
  # requests missing — will default to limits values
```

---

## RBAC — Roles and ServiceAccounts

### Principle of Least Privilege

**Two patterns depending on whether the app calls the Kubernetes API:**

#### Pattern A — App does NOT need the Kubernetes API (most apps)

Disable token automounting on the ServiceAccount. No Role or RoleBinding is needed.

```yaml
# ServiceAccount with token disabled — safest default
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: false   # No K8s API token injected into pods
```

```yaml
# Reference in Deployment — no token, no API access
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      automountServiceAccountToken: false   # Belt-and-suspenders: also set at pod level
```

#### Pattern B — App DOES need the Kubernetes API (operators, controllers, config watchers)

Enable the token and grant only the permissions actually required.

```yaml
# 1. ServiceAccount — enable token for this SA
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-namespace
automountServiceAccountToken: true   # Token required: app calls K8s API
```

```yaml
# 2. Role — grant only what the app needs (namespace-scoped)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: my-app-role
  namespace: my-namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]     # Read-only, specific resource
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["my-app-secrets"]   # Restrict to specific secret by name
    verbs: ["get"]
```

```yaml
# 3. Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-rolebinding
  namespace: my-namespace
subjects:
  - kind: ServiceAccount
    name: my-app-sa
    namespace: my-namespace
roleRef:
  kind: Role
  apiGroup: rbac.authorization.k8s.io
  name: my-app-role
```

```yaml
# 4. Reference SA in Deployment
spec:
  template:
    spec:
      serviceAccountName: my-app-sa
      # automountServiceAccountToken is not set here, so the SA's value (true) applies and the token is mounted
```

---

## Horizontal Pod Autoscaler (HPA)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2    # Always at least 2 for HA
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale up when avg CPU > 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```

> HPA requires `resources.requests` to be set on all containers — it calculates utilization as `current / request`.

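
The scaling decision follows the documented HPA formula `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. The sketch below evaluates it with integer arithmetic for a single metric; it deliberately ignores the default tolerance band, stabilization windows, and the `minReplicas`/`maxReplicas` clamp, and `hpa_desired` is a hypothetical helper name.

```bash
# Desired replica count for one Utilization metric.
# Usage: hpa_desired <currentReplicas> <currentUtilization%> <targetUtilization%>
hpa_desired() {
  current_replicas=$1
  current_util=$2
  target_util=$3
  # Integer ceiling of (current_replicas * current_util) / target_util
  echo $(( (current_replicas * current_util + target_util - 1) / target_util ))
}

hpa_desired 3 90 70   # 3 pods at 90% CPU against a 70% target
hpa_desired 4 35 70   # 4 pods at 35% CPU against a 70% target
```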
---

## PodDisruptionBudget (PDB)

Limit how many pods voluntary disruptions such as node drains can take down at once. A PDB does not constrain a Deployment's rolling update; that is governed by the `maxSurge`/`maxUnavailable` strategy settings.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: my-namespace
spec:
  minAvailable: 2    # OR use maxUnavailable: 1
  selector:
    matchLabels:
      app: my-app
```

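
With `minAvailable`, the number of evictions a drain may perform at any moment is the count of healthy pods minus `minAvailable`, floored at zero. The sketch below shows why `minAvailable: 2` works with 3 replicas but blocks drains entirely when only 2 pods are running; `pdb_allowed` is a hypothetical helper for illustration.

```bash
# Evictions currently permitted by a PDB that uses minAvailable.
# Usage: pdb_allowed <healthyPods> <minAvailable>
pdb_allowed() {
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

pdb_allowed 3 2   # 3 healthy replicas: one pod may be evicted
pdb_allowed 2 2   # scaled down to 2: no eviction allowed, the drain waits
```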
---

## Namespaces and Multi-Tenancy

```bash
# Create the namespace
kubectl create namespace my-namespace

# Apply ResourceQuota to limit namespace consumption
kubectl apply -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-namespace-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 4Gi
    limits.cpu: "8"
    limits.memory: 8Gi
    pods: "20"
EOF
```

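
A quota is only as generous as its tightest dimension. As a worked example, the sketch below estimates how many pods shaped like the Deployment template's single container (100m/128Mi requests, 500m/256Mi limits) fit into the quota above. Units are converted by hand to millicores and Mi, and `min_of` is a hypothetical helper.

```bash
# Smallest of the given integers.
min_of() {
  smallest=$1
  shift
  for value in "$@"; do
    if [ "$value" -lt "$smallest" ]; then
      smallest=$value
    fi
  done
  echo "$smallest"
}

# Quota from the ResourceQuota above (CPU in millicores, memory in Mi).
quota_req_cpu=4000
quota_req_mem=4096
quota_lim_cpu=8000
quota_lim_mem=8192
quota_pods=20

# Per-pod values from the Deployment template (one container per pod).
pod_req_cpu=100
pod_req_mem=128
pod_lim_cpu=500
pod_lim_mem=256

max_pods=$(min_of \
  $((quota_req_cpu / pod_req_cpu)) \
  $((quota_req_mem / pod_req_mem)) \
  $((quota_lim_cpu / pod_lim_cpu)) \
  $((quota_lim_mem / pod_lim_mem)) \
  "$quota_pods")

# limits.cpu is the binding constraint here (8000m / 500m), not the pods count.
echo "max pods: $max_pods"
```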
---

## Jobs and CronJobs

```yaml
# One-off Job (DB migration, data processing)
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  namespace: my-namespace
spec:
  backoffLimit: 3                 # Retry up to 3 times on failure
  ttlSecondsAfterFinished: 3600   # Auto-delete after 1h
  template:
    spec:
      restartPolicy: OnFailure    # Jobs accept only OnFailure or Never (Always is not allowed)
      containers:
        - name: migrate
          image: ghcr.io/org/my-app:1.0.0
          command: ["python", "manage.py", "migrate"]
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
```

```yaml
# CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-job
  namespace: my-namespace
spec:
  schedule: "0 2 * * *"        # 2am daily
  concurrencyPolicy: Forbid    # Don't run if previous still running
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: cleanup
              image: ghcr.io/org/cleanup:1.0.0
              resources:
                requests:
                  cpu: "50m"
                  memory: "64Mi"
```

---

## kubectl Debugging Cheatsheet

```bash
# --- Pod status and logs ---
kubectl get pods -n my-namespace
kubectl get pods -n my-namespace -o wide                 # Show node assignment
kubectl describe pod <pod-name> -n my-namespace          # Events and state details
kubectl logs <pod-name> -n my-namespace                  # Current logs
kubectl logs <pod-name> -n my-namespace --previous       # Logs from crashed container
kubectl logs <pod-name> -n my-namespace -c <container>   # Multi-container pod

# --- Execute into a running container ---
kubectl exec -it <pod-name> -n my-namespace -- sh
kubectl exec -it <pod-name> -n my-namespace -- bash      # Only if the image ships bash

# --- Check resource usage (requires metrics-server) ---
kubectl top pods -n my-namespace
kubectl top nodes

# --- Deployment operations ---
kubectl rollout status deployment/my-app -n my-namespace
kubectl rollout history deployment/my-app -n my-namespace
kubectl rollout undo deployment/my-app -n my-namespace   # Rollback
kubectl rollout undo deployment/my-app --to-revision=2 -n my-namespace

# --- Scale manually ---
kubectl scale deployment my-app --replicas=5 -n my-namespace

# --- Inspect events in the namespace, oldest first ---
kubectl get events -n my-namespace --sort-by='.lastTimestamp'

# --- Port-forward for local debugging ---
kubectl port-forward pod/<pod-name> 8080:8080 -n my-namespace
kubectl port-forward svc/my-app 8080:80 -n my-namespace

# --- Dry-run to validate YAML ---
kubectl apply -f deployment.yaml --dry-run=client
kubectl apply -f deployment.yaml --dry-run=server        # Validates against live cluster
```

### Diagnosing Common Errors

```bash
# CrashLoopBackOff: container keeps crashing
kubectl logs <pod-name> --previous -n my-namespace   # Check crash logs
kubectl describe pod <pod-name> -n my-namespace      # Check exit code & OOMKilled

# ImagePullBackOff: can't pull image
kubectl describe pod <pod-name> -n my-namespace      # Check Events section
# Causes: wrong image tag, missing imagePullSecret, private registry

# Pending pod: not scheduled
kubectl describe pod <pod-name> -n my-namespace
# Causes: insufficient resources, no matching node selector, taint/toleration mismatch

# OOMKilled: out of memory
# Increase memory limits, check for memory leaks
kubectl describe pod <pod-name> -n my-namespace | grep -A5 "Last State"
```

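
When reading the exit code in `kubectl describe pod` output, the usual convention is that codes above 128 mean the process was terminated by signal `code - 128`; for example 137 is SIGKILL (9), which is what an OOM kill produces, and 143 is SIGTERM (15). The sketch below applies that convention. It is a heuristic, since an application can also choose to exit with such a code itself, and `describe_exit_code` is a hypothetical helper.

```bash
# Rough interpretation of a container exit code.
# Usage: describe_exit_code <exitCode>
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit $code)"
  fi
}

describe_exit_code 137   # signal 9 (SIGKILL), typical for OOMKilled
describe_exit_code 143   # signal 15 (SIGTERM), e.g. shutdown during a rollout
describe_exit_code 1     # the application itself failed; check the --previous logs
```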
---

## Anti-Patterns

```yaml
# BAD: Using :latest tag — non-deterministic deployments
image: myapp:latest

# GOOD: Pin to a specific version tag, or to a digest for true immutability
image: ghcr.io/org/myapp:1.4.2
# or
image: ghcr.io/org/myapp@sha256:abc123...

# ---

# BAD: Running as root
securityContext: {}   # Falls back to the image's user, which is often root

# GOOD: Non-root with explicit UID
securityContext:
  runAsNonRoot: true
  runAsUser: 1001

# ---

# BAD: No resource limits — one pod can starve the entire node
containers:
  - name: app
    image: myapp:1.0.0
    # No resources defined

# GOOD: Always set requests and limits
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "256Mi"

# ---

# BAD: Storing plaintext secrets in ConfigMaps
apiVersion: v1
kind: ConfigMap
data:
  DB_PASSWORD: "mysecretpassword"   # NEVER — use Secret or external secrets manager

# ---

# BAD: cluster-admin for application service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
roleRef:
  kind: ClusterRole
  name: cluster-admin   # Grants god-mode to your app

# ---

# BAD: minAvailable: 0 in PDB — defeats the purpose
spec:
  minAvailable: 0

# ---

# BAD: restartPolicy: Always in a Job (rejected by the API server)
spec:
  restartPolicy: Always   # Use OnFailure or Never for Jobs
```

---

## Best Practices Checklist

### Security
- [ ] Container runs as non-root (`runAsNonRoot: true`, `runAsUser` set)
- [ ] `readOnlyRootFilesystem: true` with `emptyDir` for writable paths
- [ ] `allowPrivilegeEscalation: false`
- [ ] All capabilities dropped (`capabilities.drop: [ALL]`)
- [ ] Dedicated ServiceAccount per app, not `default`
- [ ] `automountServiceAccountToken: false` unless needed
- [ ] RBAC follows least privilege (use `Role`, not `ClusterRole` unless needed)
- [ ] Secrets managed via Sealed Secrets or External Secrets Operator

### Reliability
- [ ] All 3 probe types configured (startup + liveness + readiness)
- [ ] Resource requests AND limits set on every container
- [ ] `minReplicas: 2+` for any production workload
- [ ] PodDisruptionBudget defined for stateful or critical services
- [ ] `RollingUpdate` strategy with `maxUnavailable: 0`
- [ ] HPA configured for variable-load services

### Observability
- [ ] App exposes `/health` (liveness) and `/ready` (readiness) endpoints
- [ ] Structured JSON logging (no PII in logs)
- [ ] Resource labels: `app`, `version`, `environment`

---

## Related Skills

- `docker-patterns` — Multi-stage Dockerfiles and image security
- `deployment-patterns` — CI/CD pipelines, rollback strategy, health check endpoints
- `security-review` — Broader security hardening context
- `git-workflow` — GitOps integration with K8s (ArgoCD / Flux patterns)