mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-06-16 16:36:53 +08:00
The official Agent Skills spec (agentskills.io/specification) whitelists exactly 6 top-level frontmatter keys (name/description/license/compatibility/metadata/allowed-tools). A top-level `origin` key fails the official validator (anthropics/skills quick_validate.py ALLOWED_PROPERTIES; skills-ref validate). This moves `origin: X` -> `metadata.origin: X` across the canonical skills/ tree, preserving each value verbatim. Frontmatter-only, minimal diff.

- 251 SKILL.md updated (242 new metadata block, 9 appended to existing metadata)
- origin values preserved verbatim (verified 251/251)
- YAML validated on all changed files
- scoped to canonical skills/ only (docs/<lang> translations + tool mirrors .cursor/.kiro/.agents left untouched; presumably regenerated from canonical)

Addresses #2233
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
metadata:
  origin: ECC
---

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)
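
As one way to picture the lifecycle and kill-switch domains above, here is a minimal Python sketch of a runtime state machine. The `AgentRuntime` class, the `State` enum, and the transition table are hypothetical names chosen for illustration; the skill does not prescribe a particular implementation, and a real system would attach these transitions to an actual process supervisor.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed lifecycle transitions (an illustrative policy, not a standard).
TRANSITIONS = {
    ("start", State.STOPPED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,  # resume from pause
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
    ("restart", State.RUNNING): State.RUNNING,
}


class AgentRuntime:
    """Hypothetical in-memory model of an agent's lifecycle."""

    def __init__(self):
        self.state = State.STOPPED
        self.killed = False
        self.audit = []  # (action, old_state, new_state) tuples

    def apply(self, action):
        # Once the kill switch is engaged, only "stop" is accepted.
        if self.killed and action != "stop":
            raise RuntimeError("kill switch engaged")
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} from {self.state.value}")
        old, self.state = self.state, TRANSITIONS[key]
        self.audit.append((action, old.value, self.state.value))
        return self.state

    def kill(self):
        # Engage the kill switch and stop the agent if it is not already stopped.
        self.killed = True
        if self.state is not State.STOPPED:
            self.apply("stop")
```

Recording every transition in `audit` keeps the change-management domain (who did what, and when it took effect) in the same place as the lifecycle logic.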

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions
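
The timeout and retry budget control above could be sketched as follows. `run_with_budget`, `BudgetExceeded`, and the default limits are assumptions made for this example, not part of the skill. Note that this sketch only checks the deadline between attempts; enforcing a truly hard timeout on a running task usually needs process-level or orchestrator-level limits.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the retry budget or the deadline is exhausted."""


def run_with_budget(task, max_retries=3, deadline_s=30.0, clock=time.monotonic):
    """Call task() until it succeeds, retries run out, or the deadline passes.

    Returns (result, attempts). The deadline is checked before each attempt,
    so an attempt that is already running is not interrupted.
    """
    start = clock()
    attempts = 0
    last_error = None
    while attempts <= max_retries:
        if clock() - start >= deadline_s:
            raise BudgetExceeded(
                f"deadline of {deadline_s}s exceeded after {attempts} attempts"
            ) from last_error
        attempts += 1
        try:
            return task(), attempts
        except Exception as exc:  # in real code, catch only retryable errors
            last_error = exc
    raise BudgetExceeded(
        f"retry budget spent after {attempts} attempts"
    ) from last_error
```

Returning the attempt count alongside the result makes it straightforward to feed a retries-per-task metric without extra bookkeeping.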

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
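
To make these metrics concrete, the sketch below computes four of them from a list of per-task records. The `TaskRecord` fields and the `summarize` function are hypothetical; the skill does not define a record schema. Time to recovery is left out because it depends on incident start and end timestamps rather than on per-task data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout", "tool_error"


def summarize(records):
    """Aggregate per-task records into the tracked metrics."""
    total = len(records)
    if total == 0:
        return None
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Failed tasks still cost money, so total spend is divided by successes.
        "cost_per_successful_task": total_cost / successes if successes else None,
        "failure_class_distribution": dict(
            Counter(r.failure_class for r in records if not r.succeeded)
        ),
    }
```

Dividing total spend by successful tasks, rather than averaging the cost of successes alone, keeps the cost of failed attempts visible in the figure.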

## Incident Pattern

When failure spikes:
1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually
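
Steps 1 and 6 of the pattern above lend themselves to a small sketch: a gate that freezes rollout when the recent failure rate crosses a threshold, then resumes in stages once checks pass. `RolloutGate`, the window size, the threshold, and the traffic steps are all illustrative assumptions; trace capture, route isolation, and patching (steps 2 to 4) are human or tooling work that this sketch does not model.

```python
from collections import deque


class RolloutGate:
    """Hypothetical gate: freeze on a failure spike, resume in stages."""

    def __init__(self, window=50, threshold=0.2, resume_steps=(0.05, 0.25, 0.5, 1.0)):
        self.outcomes = deque(maxlen=window)  # True means the task succeeded
        self.threshold = threshold
        self.resume_steps = resume_steps
        self.frozen = False
        self.step = len(resume_steps) - 1  # start fully rolled out

    def record(self, success):
        self.outcomes.append(success)
        # Only judge the failure rate once the window is full.
        if len(self.outcomes) == self.outcomes.maxlen:
            failure_rate = self.outcomes.count(False) / len(self.outcomes)
            if failure_rate > self.threshold:
                self.frozen = True  # step 1: freeze new rollout

    def resume(self, checks_passed):
        # Step 6: only after regression + security checks (step 5) have passed.
        if not checks_passed:
            raise RuntimeError("checks must pass before resuming")
        self.frozen = False
        self.outcomes.clear()
        self.step = 0  # restart at the smallest traffic share

    def advance(self):
        # Move to the next traffic share; a frozen gate does not advance.
        if not self.frozen and self.step < len(self.resume_steps) - 1:
            self.step += 1

    @property
    def traffic_share(self):
        return self.resume_steps[self.step]
```

A typical flow under these assumptions: call `record()` for each finished task, stop deploying while `frozen` is true, then call `resume(checks_passed=True)` followed by `advance()` at each healthy checkpoint.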

## Deployment Integrations

This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates