mirror of https://github.com/affaan-m/everything-claude-code.git, synced 2026-05-14 02:10:07 +08:00
docs: define AgentShield enterprise roadmap (#1833)
parent d2d8cda8b3
commit be42989746
@ -99,11 +99,20 @@ As of 2026-05-12:
|
|||||||
scope, expiry, and days-until-expiry reporting; terminal output and GitHub
|
scope, expiry, and days-until-expiry reporting; terminal output and GitHub
|
||||||
Action job-summary evidence; README docs; rebuilt action bundles; and
|
Action job-summary evidence; README docs; rebuilt action bundles; and
|
||||||
1,708-test validation.
|
1,708-test validation.
|
||||||
|
- AgentShield PR #63 exposed baseline drift in the GitHub Action with
|
||||||
|
`baseline` / `save-baseline` inputs, baseline drift outputs, job-summary
|
||||||
|
evidence, regression annotations, README/API docs, rebuilt action bundles,
|
||||||
|
and green remote action/self-scan/Node verification.
|
||||||
- AgentShield PDF-export decision: defer a native PDF writer for now. The
|
- AgentShield PDF-export decision: defer a native PDF writer for now. The
|
||||||
self-contained HTML executive report remains the exportable buyer artifact
|
self-contained HTML executive report remains the exportable buyer artifact
|
||||||
and can be printed to PDF when needed; native PDF generation should wait for
|
and can be printed to PDF when needed; native PDF generation should wait for
|
||||||
explicit enterprise/compliance demand or a print-fidelity gap in the HTML
|
explicit enterprise/compliance demand or a print-fidelity gap in the HTML
|
||||||
report.
|
report.
|
||||||
|
- `docs/architecture/agentshield-enterprise-research-roadmap.md` identifies
|
||||||
|
the next AgentShield enterprise signal: move from scanner/report/policy gate
|
||||||
|
to a team control plane with baseline drift, evidence packs, multi-harness
|
||||||
|
adapters, corpus accuracy gates, remediation routing, threat intelligence,
|
||||||
|
and ECC-Tools/GitHub App integration.
|
||||||
- ECC PR #1778 recovered the useful stale #1413 network/homelab architect-agent
|
- ECC PR #1778 recovered the useful stale #1413 network/homelab architect-agent
|
||||||
concepts.
|
concepts.
|
||||||
- ECC-Tools PR #26 added cost/token-risk predictive follow-ups for AI routing,
|
- ECC-Tools PR #26 added cost/token-risk predictive follow-ups for AI routing,
|
||||||
@ -208,7 +217,7 @@ is not complete unless the evidence column exists and has been freshly verified.
|
|||||||
| Naming and rename readiness | Naming matrix across package/plugin/docs/social surfaces | `docs/releases/2.0.0-rc.1/naming-and-publication-matrix.md` records current package, repo, Claude plugin, Codex plugin, OpenCode, and npm availability evidence | Complete for rc.1; post-rc rename remains future work |
|
| Naming and rename readiness | Naming matrix across package/plugin/docs/social surfaces | `docs/releases/2.0.0-rc.1/naming-and-publication-matrix.md` records current package, repo, Claude plugin, Codex plugin, OpenCode, and npm availability evidence | Complete for rc.1; post-rc rename remains future work |
|
||||||
| Claude and Codex plugin publication | Contact/submission path with required artifacts and status | Publication readiness, naming matrix, and May 12 dry-run evidence document plugin validation, clean-checkout Claude tag/install smoke, and Codex marketplace CLI shape | Needs explicit approval for real tag/push and marketplace submission |
|
| Claude and Codex plugin publication | Contact/submission path with required artifacts and status | Publication readiness, naming matrix, and May 12 dry-run evidence document plugin validation, clean-checkout Claude tag/install smoke, and Codex marketplace CLI shape | Needs explicit approval for real tag/push and marketplace submission |
|
||||||
| Articles, tweets, and announcements | X thread, LinkedIn copy, GitHub release copy, push checklist | Draft launch collateral exists under rc.1 release docs | Needs URL-backed refresh |
|
| Articles, tweets, and announcements | X thread, LinkedIn copy, GitHub release copy, push checklist | Draft launch collateral exists under rc.1 release docs | Needs URL-backed refresh |
|
||||||
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit | PRs #53, #55-#62 landed with test evidence; native PDF export deferred in favor of self-contained HTML plus print-to-PDF until explicit enterprise demand appears | Needs next enterprise signal |
|
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit, baseline drift action surface, enterprise research roadmap | PRs #53, #55-#63 landed with test evidence; native PDF export deferred in favor of self-contained HTML plus print-to-PDF until explicit enterprise demand appears; `docs/architecture/agentshield-enterprise-research-roadmap.md` selects baseline drift as the first control-plane slice | Baseline-drift Action surface landed; CLI/evidence-pack routing remains |
|
||||||
| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog, evaluator/RAG corpus | PRs #26-#40 landed with test evidence | Needs capacity-backed Linear rollout |
|
| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog, evaluator/RAG corpus | PRs #26-#40 landed with test evidence | Needs capacity-backed Linear rollout |
|
||||||
| GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
|
| GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy and deterministic follow-up checks | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, and PR Review/Salvage Evidence | Partially complete |
|
||||||
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md`, `examples/evaluator-rag-prototype/`, and ECC-Tools PR #40 define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison scenarios with trace, report, playbook, verifier, and predictive-check artifacts | Local corpus complete; hosted integration remains future |
|
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md`, `examples/evaluator-rag-prototype/`, and ECC-Tools PR #40 define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison scenarios with trace, report, playbook, verifier, and predictive-check artifacts | Local corpus complete; hosted integration remains future |
|
||||||
@ -231,7 +240,7 @@ back to the repo evidence and merge commits.
|
|||||||
| Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
|
| Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
|
||||||
| Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
|
| Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
|
||||||
| Evaluation and RAG | Reference-set validation, harness audit, traces, ECC-Tools corpus | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison fixtures | Hosted retrieval/check-run automation plan |
|
| Evaluation and RAG | Reference-set validation, harness audit, traces, ECC-Tools corpus | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison fixtures | Hosted retrieval/check-run automation plan |
|
||||||
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | Next enterprise signal | After PDF/export decision |
|
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | Baseline-drift CLI/evidence-pack follow-up | Next implementation batch |
|
||||||
| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy, evaluator/RAG corpus | Capacity-backed Linear rollout | Next implementation batch |
|
| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy, evaluator/RAG corpus | Capacity-backed Linear rollout | Next implementation batch |
|
||||||
| Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
|
| Linear progress | Linear project status updates and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
|
||||||
|
|
||||||
@ -432,9 +441,11 @@ Acceptance:
|
|||||||
|
|
||||||
## Next Engineering Slices
|
## Next Engineering Slices
|
||||||
|
|
||||||
1. Identify the next AgentShield enterprise signal beyond the merged HTML
|
1. Finish the AgentShield baseline-drift control-plane slice from
|
||||||
executive report, corpus benchmark output, exception lifecycle audit, and
|
`docs/architecture/agentshield-enterprise-research-roadmap.md`: PR #63
|
||||||
deferred native-PDF decision.
|
shipped the GitHub Action baseline outputs and job-summary evidence; the
|
||||||
|
remaining work is CLI baseline UX, evidence-pack routing, and ECC-Tools
|
||||||
|
backlog sync integration.
|
||||||
2. Enable/configure the merged Linear backlog sync path after workspace issue
|
2. Enable/configure the merged Linear backlog sync path after workspace issue
|
||||||
capacity clears or the Linear workspace is upgraded.
|
capacity clears or the Linear workspace is upgraded.
|
||||||
3. Use the ECC-Tools evaluator/RAG corpus as the promotion gate before adding
|
3. Use the ECC-Tools evaluator/RAG corpus as the promotion gate before adding
|
||||||
|
|||||||
328
docs/architecture/agentshield-enterprise-research-roadmap.md
Normal file
@ -0,0 +1,328 @@
|
|||||||
|
# AgentShield Enterprise Research Roadmap
|
||||||
|
|
||||||
|
Generated: 2026-05-12
|
||||||
|
|
||||||
|
This is a planning artifact for the next AgentShield enterprise iteration. It
|
||||||
|
does not modify AgentShield code. The goal is to turn the current scanner,
|
||||||
|
policy gate, corpus, and reporting surface into a security control plane for
|
||||||
|
teams running AI coding agents across multiple harnesses.
|
||||||
|
|
||||||
|
## Evidence Reviewed
|
||||||
|
|
||||||
|
Current AgentShield repository state:
|
||||||
|
|
||||||
|
- AgentShield checkout on clean `main`.
|
||||||
|
- `README.md`, `API.md`, `package.json`, `.github/workflows/*`, and
|
||||||
|
`src/`/`tests/` module layout.
|
||||||
|
- Current supported user surfaces: `agentshield scan`, `agentshield init`,
|
||||||
|
`agentshield miniclaw start`, scanner JSON, MiniClaw API, GitHub Action,
|
||||||
|
HTML, SARIF, markdown, terminal, and JSON reports.
|
||||||
|
- Current enterprise-like surfaces: policy packs, GitHub Action policy
|
||||||
|
enforcement, SARIF policy violations, supply-chain provenance, corpus
|
||||||
|
benchmark, HTML executive reports, and exception lifecycle audit.
|
||||||
|
|
||||||
|
External references checked from official GitHub repos or README sources:
|
||||||
|
|
||||||
|
- [stablyai/orca](https://github.com/stablyai/orca): multi-agent IDE,
|
||||||
|
worktree isolation, live agent status, GitHub integration, diff review, and
|
||||||
|
notifications.
|
||||||
|
- [superset-sh/superset](https://github.com/superset-sh/superset): AI-agent
|
||||||
|
editor with worktree orchestration, built-in diff review, workspace presets,
|
||||||
|
and universal CLI-agent compatibility.
|
||||||
|
- [standardagents/dmux](https://github.com/standardagents/dmux): tmux/worktree
|
||||||
|
multiplexer with lifecycle hooks, multi-agent launches, pane visibility, and
|
||||||
|
merge/PR workflows.
|
||||||
|
- [jarrodwatts/claude-hud](https://github.com/jarrodwatts/claude-hud): Claude
|
||||||
|
Code statusline, context health, tool activity, agent tracking, todo
|
||||||
|
progress, transcript parsing, and usage telemetry.
|
||||||
|
- [stanford-iris-lab/meta-harness](https://github.com/stanford-iris-lab/meta-harness):
|
||||||
|
harness optimization through repeatable tasks, logged proposer interactions,
|
||||||
|
and evaluated scaffold changes.
|
||||||
|
- [greyhaven-ai/autocontext](https://github.com/greyhaven-ai/autocontext):
|
||||||
|
recursive improvement loop with traces, scored generations, playbooks,
|
||||||
|
persisted knowledge, scenario evaluation, and optional production traces.
|
||||||
|
- [NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent):
|
||||||
|
self-improving skills, memory, session search, multi-platform gateway,
|
||||||
|
scheduled automation, terminal backends, and trajectory generation.
|
||||||
|
- [anthropics/claude-code](https://github.com/anthropics/claude-code):
|
||||||
|
terminal, IDE, GitHub, plugin, permission, MCP, and data-retention surfaces.
|
||||||
|
- [anomalyco/opencode](https://github.com/anomalyco/opencode): provider-agnostic
|
||||||
|
open-source coding agent with build/plan agents, desktop beta,
|
||||||
|
client/server architecture, and LSP support.
|
||||||
|
- [opencode-ai/opencode](https://github.com/opencode-ai/opencode): earlier
|
||||||
|
archived Go-based terminal agent with sessions, providers, LSP, file change
|
||||||
|
tracking, custom commands, and auto-compact.
|
||||||
|
- [zed-industries/zed](https://github.com/zed-industries/zed): high-performance
|
||||||
|
multiplayer editor with strict license/compliance CI expectations.
|
||||||
|
- [aidenybai/ghast](https://github.com/aidenybai/ghast): native terminal
|
||||||
|
multiplexer built around Ghostty, workspace grouping, split panes, drag/drop,
|
||||||
|
notifications, and terminal search.
|
||||||
|
|
||||||
|
Local Claude Code source inspection:
|
||||||
|
|
||||||
|
- Reviewed only non-secret local file/module shape from a private Claude Code
|
||||||
|
source snapshot.
|
||||||
|
- Relevant surfaces observed: `tools/`, `utils/permissions/`, `utils/mcp/`,
|
||||||
|
`utils/hooks/`, `utils/plugins/`, `types/permissions.ts`,
|
||||||
|
`types/plugin.ts`, `remote/`, `tasks/`, `assistant/sessionHistory.ts`,
|
||||||
|
and session/history utilities.
|
||||||
|
- No code was copied. The takeaway is that AgentShield should track permissions,
|
||||||
|
plugins, MCP, hooks, remote sessions, task/subagent activity, and history as
|
||||||
|
first-class audit domains rather than treating a `.claude/` tree as the only
|
||||||
|
source of truth.
|
||||||
|
|
||||||
|
## Current AgentShield Position
|
||||||
|
|
||||||
|
AgentShield is already more than a static lint tool:
|
||||||
|
|
||||||
|
- Rule coverage spans secrets, permissions, hooks, MCP servers, agent configs,
|
||||||
|
prompt injection, supply chain, taint analysis, sandbox execution, policy
|
||||||
|
evaluation, runtime repair/status, corpus validation, MiniClaw, and Opus
|
||||||
|
analysis.
|
||||||
|
- Reports are usable by humans and machines: terminal, JSON, markdown, HTML,
|
||||||
|
SARIF, scan logs, and GitHub Action outputs.
|
||||||
|
- Enterprise hooks exist: policy packs, exception metadata, expiring/expired
|
||||||
|
exception reporting, SARIF code scanning, and job-summary output.
|
||||||
|
- Accuracy work is active: `runtimeConfidence`, template/example weighting,
|
||||||
|
docs-example downgrades, hook-manifest resolution, false-positive audit
|
||||||
|
guidance, and corpus readiness.
|
||||||
|
|
||||||
|
The next iteration should not be "add more regex rules" by default. The
|
||||||
|
higher-leverage move is to make AgentShield remember, compare, route, and enforce
|
||||||
|
security posture across time, repos, teams, and harnesses.
|
||||||
|
|
||||||
|
## Enterprise Gaps
|
||||||
|
|
||||||
|
### 1. Organization Baselines And Drift
|
||||||
|
|
||||||
|
Enterprise buyers need to know whether a repo, team, or agent fleet is getting
|
||||||
|
safer or riskier over time. AgentShield has scan logs and baseline comparison
|
||||||
|
modules, and PR #63 now exposes that drift through GitHub Action inputs,
|
||||||
|
outputs, annotations, and job-summary evidence. The remaining product surface
|
||||||
|
should make baseline snapshots, CLI drift summaries, and owner-ready deltas
|
||||||
|
explicit.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- `agentshield baseline write --output agentshield-baseline.json`
|
||||||
|
- `agentshield scan --baseline agentshield-baseline.json`
|
||||||
|
- Report sections for new, fixed, unchanged, suppressed, and policy-excepted
|
||||||
|
findings.
|
||||||
|
- GitHub Action output that posts "security posture changed" rather than only a
|
||||||
|
point-in-time grade.
|
||||||
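The report sections listed above could be derived from a plain set comparison. The following is a minimal Python sketch, not AgentShield's implementation: the `Finding` shape, the `status` values, and the idea of a per-finding `fingerprint` string are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    fingerprint: str      # hypothetical stable ID for a finding
    rule: str
    status: str = "open"  # "open", "suppressed", or "policy-excepted"


def diff_against_baseline(baseline, current):
    """Bucket findings into new, fixed, unchanged, suppressed, policy-excepted."""
    base = {f.fingerprint: f for f in baseline}
    curr = {f.fingerprint: f for f in current}
    report = {"new": [], "fixed": [], "unchanged": [],
              "suppressed": [], "policy-excepted": []}
    for fp, finding in sorted(curr.items()):
        if finding.status in ("suppressed", "policy-excepted"):
            report[finding.status].append(fp)
        elif fp in base:
            report["unchanged"].append(fp)
        else:
            report["new"].append(fp)
    # Anything in the baseline that no longer appears counts as fixed.
    report["fixed"] = sorted(fp for fp in base if fp not in curr)
    return report


def posture_changed(report):
    """True when the scan differs from the accepted baseline."""
    return bool(report["new"] or report["fixed"])
```

A CI step could call `posture_changed` to decide whether to post a "security posture changed" summary instead of only a grade.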
|
|
||||||
|
### 2. Multi-Harness Security Adapters
|
||||||
|
|
||||||
|
The market is moving toward many parallel agent harnesses, not one tool. Orca,
|
||||||
|
Superset, dmux, OpenCode, Claude Code, Codex, Gemini, Zed, and terminal
|
||||||
|
multiplexers all create different security surfaces.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- A small adapter registry for `claude-code`, `opencode`, `codex`, `gemini`,
|
||||||
|
`zed`, `dmux`, `orca`, `superset`, and `generic-terminal`.
|
||||||
|
- Each adapter declares config paths, permission concepts, plugin surfaces,
|
||||||
|
MCP/tooling conventions, history/session surfaces, and CI evidence.
|
||||||
|
- Report output groups findings by harness and confidence, so template/docs
|
||||||
|
findings do not look like active runtime exposure.
|
||||||
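One possible shape for such a registry, sketched in Python under loud assumptions: `HarnessAdapter`, `REGISTRY`, and `discover` are hypothetical names, and only the `.claude` path comes from this document. The second entry is a placeholder, since real config locations would need to be verified per harness.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class HarnessAdapter:
    name: str
    config_paths: tuple  # relative paths whose presence suggests the harness


# Illustrative entries only; "example-harness" is a placeholder, not a real tool.
REGISTRY = (
    HarnessAdapter("claude-code", (".claude",)),
    HarnessAdapter("example-harness", (".example-harness/config.json",)),
)


def discover(root, registry=REGISTRY):
    """Return {adapter name: [matched paths]} for adapters found under root."""
    root = Path(root)
    matches = {}
    for adapter in registry:
        hits = [p for p in adapter.config_paths if (root / p).exists()]
        if hits:
            matches[adapter.name] = hits
    return matches
```

Returning the matched paths alongside the adapter name gives a report a way to say why an adapter matched, not only that it did.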
|
|
||||||
|
### 3. Session And Worktree Awareness
|
||||||
|
|
||||||
|
Worktree-native orchestrators change the risk model. A team can run many agents
|
||||||
|
in parallel, each with its own branch, shell, MCP config, and local state.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- Optional scan metadata for branch, worktree path, agent name, session id,
|
||||||
|
provider, and orchestrator.
|
||||||
|
- A scan-history table that answers: which worktree introduced a new permission,
|
||||||
|
which agent run added a risky MCP, which branch relaxed policy, and whether
|
||||||
|
the final merged branch fixed it.
|
||||||
|
- A compact "security HUD" summary usable by statuslines, GitHub checks, and
|
||||||
|
local dashboards.
|
||||||
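The scan-history questions above reduce to simple queries once each scan carries its metadata. A hedged Python sketch follows; `ScanRecord` and its fields are assumed for illustration, and `sequence` stands in for whatever ordering a real history store would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanRecord:
    sequence: int            # order in which scans were recorded
    branch: str
    worktree: str
    agent: str
    session_id: str
    fingerprints: frozenset  # findings present in this scan


def first_introduced(history, fingerprint):
    """Return the earliest scan record containing the finding, or None."""
    for record in sorted(history, key=lambda r: r.sequence):
        if fingerprint in record.fingerprints:
            return record
    return None


def fixed_on(history, fingerprint, branch):
    """True if the latest scan of `branch` no longer contains the finding."""
    scans = [r for r in history if r.branch == branch]
    if not scans:
        return False
    latest = max(scans, key=lambda r: r.sequence)
    return fingerprint not in latest.fingerprints
```

`first_introduced(history, fp).worktree` answers "which worktree introduced this", and `fixed_on(history, fp, "main")` answers whether the merged branch is clean.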
|
|
||||||
|
### 4. Evidence Packs For Buyers And Auditors
|
||||||
|
|
||||||
|
HTML reports are the right buyer-facing artifact today; native PDF is deferred.
|
||||||
|
The deeper need is a portable evidence bundle that can be attached to audits,
|
||||||
|
security reviews, and customer questionnaires.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- `agentshield scan --evidence-pack out/agentshield-evidence`
|
||||||
|
- Bundle includes JSON report, HTML report, SARIF, policy evaluation,
|
||||||
|
exception audit, baseline diff, dependency/provenance summary, and a short
|
||||||
|
README explaining how to interpret the artifacts.
|
||||||
|
- Optional redaction mode for secrets, local paths, usernames, and project names.
|
||||||
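As a rough illustration of the redaction mode, the Python sketch below rewrites home-directory paths and caller-supplied names. It is an assumption-laden example: the `redact` function and its placeholders are hypothetical, and it does not attempt secret detection, which would need the scanner's own rules.

```python
import re


def redact(text, usernames=(), project_names=()):
    """Replace home-directory users and supplied names with placeholders."""
    # Home-directory prefixes on Unix-like and Windows layouts.
    text = re.sub(r"(/home/|/Users/)[^/\s]+", r"\1<user>", text)
    text = re.sub(r"([A-Za-z]:\\Users\\)[^\\\s]+", r"\1<user>", text)
    for name in usernames:
        text = re.sub(re.escape(name), "<user>", text)
    for name in project_names:
        text = re.sub(re.escape(name), "<project>", text)
    return text
```

Applying the same function to every file in the bundle keeps placeholders consistent across the JSON, HTML, and SARIF artifacts.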
|
|
||||||
|
### 5. Regression Corpus And Reference Sets
|
||||||
|
|
||||||
|
Meta-Harness and Autocontext point to the same lesson: improvements need scored
|
||||||
|
scenarios, traces, and playbooks. AgentShield already has a corpus benchmark,
|
||||||
|
but enterprise trust needs a curated reference set for false positives,
|
||||||
|
false negatives, and policy regressions.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- Versioned scenario fixtures for critical rules, false-positive suppressions,
|
||||||
|
policy exceptions, template/docs examples, plugin manifests, and hook-code
|
||||||
|
resolution.
|
||||||
|
- Per-category precision/coverage reporting, not just aggregate readiness.
|
||||||
|
- A "no accuracy regression" gate that must pass before releases.
|
||||||
|
- Playbook notes for why a suppression exists and when it should expire.
|
||||||
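Per-category reporting and a regression gate might look like the Python sketch below. It assumes each corpus scenario yields a `(category, expected, detected)` triple and interprets "coverage" as recall; both are assumptions, as are the function names and the threshold format.

```python
from collections import defaultdict


def category_metrics(results):
    """results: iterable of (category, expected: bool, detected: bool)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for category, expected, detected in results:
        c = counts[category]  # registers the category even for true negatives
        if expected and detected:
            c["tp"] += 1
        elif detected:
            c["fp"] += 1
        elif expected:
            c["fn"] += 1
    metrics = {}
    for category, c in counts.items():
        flagged = c["tp"] + c["fp"]
        relevant = c["tp"] + c["fn"]
        metrics[category] = {
            "precision": c["tp"] / flagged if flagged else 1.0,
            "coverage": c["tp"] / relevant if relevant else 1.0,
        }
    return metrics


def failing_categories(metrics, thresholds):
    """Names of categories below their required precision or coverage."""
    failing = []
    for category, required in thresholds.items():
        got = metrics.get(category, {"precision": 0.0, "coverage": 0.0})
        if (got["precision"] < required["precision"]
                or got["coverage"] < required["coverage"]):
            failing.append(category)
    return sorted(failing)
```

A release gate would fail when `failing_categories` returns a non-empty list, which keeps one strong category from masking a weak one in an aggregate score.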
|
|
||||||
|
### 6. Remediation Workflow
|
||||||
|
|
||||||
|
Security tools become enterprise-grade when they turn findings into accountable
|
||||||
|
work without flooding maintainers.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- One-click or CLI-generated remediation branch for safe transforms.
|
||||||
|
- Policy comments that group findings by owner and risk rather than by file
|
||||||
|
order.
|
||||||
|
- GitHub App support for check-run annotations, issue caps, Linear sync, and
|
||||||
|
deferred backlog export.
|
||||||
|
- Finding fingerprints that avoid duplicate issues across repeated scans.
|
||||||
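A fingerprint that survives repeated scans has to ignore details that shift between runs. The Python sketch below is one way to do that, assuming a finding is identified by rule ID, file path, and matched evidence text; the field names, the owner-prefix map, and the 16-character truncation are illustrative choices, not AgentShield's format.

```python
import hashlib
from collections import defaultdict


def fingerprint(rule_id, path, evidence):
    """Stable ID that ignores line numbers, path style, and whitespace."""
    path = path.replace("\\", "/")
    if path.startswith("./"):
        path = path[2:]
    normalized = "\x1f".join([rule_id, path, " ".join(evidence.split())])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def group_findings(findings, owners):
    """Group fingerprints by (owner, severity); owners maps path prefix to owner."""
    groups = defaultdict(list)
    for f in findings:
        owner = next(
            (o for prefix, o in owners.items() if f["path"].startswith(prefix)),
            "unowned",
        )
        fp = fingerprint(f["rule"], f["path"], f["evidence"])
        groups[(owner, f["severity"])].append(fp)
    return dict(groups)
```

Because the line number is not part of the hash, a finding that moves within a file keeps its ID and does not open a second issue.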
|
|
||||||
|
### 7. Threat Intelligence And Package Reputation
|
||||||
|
|
||||||
|
Agent security depends on MCP packages, plugin repositories, action bundles,
|
||||||
|
and rapidly changing CLI ecosystems. Static checks need a maintained external
|
||||||
|
reputation layer.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- A local-first threat-intel cache for known MCP/package risks, CVEs, malware
|
||||||
|
package names, suspicious install scripts, mutable git dependencies, and
|
||||||
|
known-good packages.
|
||||||
|
- Offline deterministic mode remains available.
|
||||||
|
- Online enrichment is opt-in and produces clear provenance for every external
|
||||||
|
claim.
|
||||||
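The local-first, opt-in-online behavior can be sketched as a small cache class. This Python example is hypothetical throughout: `IntelRecord`, `ThreatIntelCache`, and the `online_lookup` callback are invented names, and no real threat feed is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntelRecord:
    package: str
    verdict: str    # e.g. "malicious" or "known-good"
    source: str     # provenance: where the claim came from
    retrieved: str  # ISO date the claim was recorded


class ThreatIntelCache:
    def __init__(self, records, online_lookup=None):
        self._local = {r.package: r for r in records}
        # Online enrichment is opt-in: it stays off unless a lookup is supplied.
        self._online_lookup = online_lookup

    def check(self, package):
        """Return an IntelRecord, or None when nothing is known."""
        record = self._local.get(package)
        if record is None and self._online_lookup is not None:
            record = self._online_lookup(package)
            if record is not None:
                self._local[package] = record  # cache for later offline runs
        return record
```

With no `online_lookup`, results depend only on the local records, which keeps the offline mode deterministic. Every returned record carries its `source`, so a report can cite where each external claim came from.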
|
|
||||||
|
### 8. Commercial And Team Controls
|
||||||
|
|
||||||
|
AgentShield is already connected conceptually to the ECC Tools GitHub App.
|
||||||
|
Native GitHub payments make the product path more concrete: free local scans,
|
||||||
|
paid org policy gates, paid evidence bundles, and paid drift/history.
|
||||||
|
|
||||||
|
Target capability:
|
||||||
|
|
||||||
|
- Tier-aware GitHub App checks: free static scan, paid org policy enforcement,
|
||||||
|
paid evidence packs, paid historical drift, and paid deep analysis.
|
||||||
|
- Seat/team mapping for policy owners and exception approvers.
|
||||||
|
- Billing readiness checks shared with ECC-Tools so payment state never changes
|
||||||
|
enforcement behavior silently.
|
||||||
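To keep payment state from changing enforcement silently, every skipped check can carry an explicit reason. The Python sketch below assumes two tiers and uses check names derived from the list above; the identifiers and the `plan_checks` function are hypothetical.

```python
FREE_CHECKS = frozenset({"static-scan"})
PAID_CHECKS = frozenset({
    "org-policy-enforcement",
    "evidence-pack",
    "historical-drift",
    "deep-analysis",
})


def plan_checks(requested, tier):
    """Split requested checks into (run, skipped) with a reason per skip."""
    if tier not in ("free", "paid"):
        # Fail loudly rather than guess at enforcement for an unknown tier.
        raise ValueError(f"unknown tier: {tier!r}")
    allowed = (FREE_CHECKS | PAID_CHECKS) if tier == "paid" else FREE_CHECKS
    run, skipped = [], {}
    for check in requested:
        if check in allowed:
            run.append(check)
        elif check in PAID_CHECKS:
            skipped[check] = "requires paid tier"
        else:
            skipped[check] = "unknown check"
    return run, skipped
```

Surfacing the `skipped` map in the check-run output makes a downgrade visible instead of looking like a passing scan.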
|
|
||||||
|
## Recommended Build Order
|
||||||
|
|
||||||
|
### Slice 1: Baseline Drift MVP
|
||||||
|
|
||||||
|
Implement the smallest enterprise control-plane primitive: compare this scan to
|
||||||
|
the last accepted baseline.
|
||||||
|
|
||||||
|
Artifacts:
|
||||||
|
|
||||||
|
- Baseline JSON schema.
|
||||||
|
- Baseline writer and comparator.
|
||||||
|
- Terminal and JSON report sections for new/fixed/unchanged findings.
|
||||||
|
- Tests covering stable fingerprints, fixed findings, new findings, and policy
|
||||||
|
exception carry-forward.
|
||||||
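A baseline writer for this slice mainly needs deterministic output and exception carry-forward. The Python sketch below shows both under assumed names: `schemaVersion`, the `findings` entry shape, and the `exception` field are placeholders, since the actual baseline schema is not defined here.

```python
import json

SCHEMA_VERSION = 1  # hypothetical; the real schema is not specified in this roadmap


def build_baseline(findings, previous=None):
    """findings: list of dicts with 'fingerprint' and 'rule' keys."""
    carried = {}
    if previous is not None:
        # Keep exception metadata for findings that are still present.
        carried = {
            e["fingerprint"]: e["exception"]
            for e in previous["findings"]
            if e.get("exception")
        }
    entries = []
    for f in sorted(findings, key=lambda f: f["fingerprint"]):
        entry = {"fingerprint": f["fingerprint"], "rule": f["rule"]}
        if f["fingerprint"] in carried:
            entry["exception"] = carried[f["fingerprint"]]
        entries.append(entry)
    return {"schemaVersion": SCHEMA_VERSION, "findings": entries}


def serialize(baseline):
    """Deterministic JSON so the file diffs cleanly in CI."""
    return json.dumps(baseline, indent=2, sort_keys=True) + "\n"
```

Sorting entries and keys, and leaving timestamps out of the payload, means two scans with the same findings produce byte-identical baseline files.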
|
|
||||||
|
Why first:
|
||||||
|
|
||||||
|
- It reuses existing scan output.
|
||||||
|
- It improves CLI, GitHub Action, and GitHub App value at once.
|
||||||
|
- It does not require a hosted service.
|
||||||
|
|
||||||
|
### Slice 2: Evidence Pack Bundle
|
||||||
|
|
||||||
|
Bundle the existing machine and human reports into a portable audit artifact.
|
||||||
|
|
||||||
|
Artifacts:
|
||||||
|
|
||||||
|
- `--evidence-pack <dir>` CLI flag.
|
||||||
|
- Redacted bundle README.
|
||||||
|
- HTML, JSON, SARIF, policy, exception, and baseline diff files.
|
||||||
|
- Tests for file layout, redaction, and deterministic output names.
|
||||||
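The "deterministic output names" requirement suggests a fixed bundle layout. A minimal Python sketch follows, assuming the caller has already rendered each artifact to text; the file names in `PACK_FILES` and the `write_evidence_pack` function are hypothetical.

```python
from pathlib import Path

# Hypothetical file names, fixed so the layout never varies between runs.
PACK_FILES = {
    "baseline-diff.json": "Findings that are new, fixed, or unchanged versus the baseline.",
    "exceptions.json": "Exception lifecycle audit.",
    "policy.json": "Policy evaluation result.",
    "report.html": "Human-readable executive report.",
    "report.json": "Machine-readable scan report.",
    "report.sarif": "SARIF output for code-scanning tools.",
}


def write_evidence_pack(out_dir, artifacts):
    """Write artifacts ({file name: text}) plus a generated README.md."""
    if set(artifacts) != set(PACK_FILES):
        raise ValueError(
            "artifacts must contain exactly: " + ", ".join(sorted(PACK_FILES))
        )
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name in sorted(artifacts):
        (out / name).write_text(artifacts[name], encoding="utf-8")
    readme = "# AgentShield evidence pack\n\n" + "".join(
        f"- `{name}`: {PACK_FILES[name]}\n" for name in sorted(PACK_FILES)
    )
    (out / "README.md").write_text(readme, encoding="utf-8")
    return sorted(p.name for p in out.iterdir())
```

Rejecting missing or unexpected files up front gives the layout tests a single contract to assert against.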
|
|
||||||
|
Why second:
|
||||||
|
|
||||||
|
- It converts existing reporting work into buyer-ready proof.
|
||||||
|
- It keeps native PDF deferred while still meeting audit handoff needs.
|
||||||
|
|
||||||
|
### Slice 3: Harness Adapter Registry
|
||||||
|
|
||||||
|
Make harness support explicit instead of implicit.
|
||||||
|
|
||||||
|
Artifacts:
|
||||||
|
|
||||||
|
- Adapter metadata for Claude Code, OpenCode, Codex, Gemini, dmux, generic
|
||||||
|
terminal, and project-local templates.
|
||||||
|
- Discovery output that reports which adapters matched and why.
|
||||||
|
- Report grouping by adapter.
|
||||||
|
- Tests using fixture directories for each adapter.
|
||||||
|
|
||||||
|
Why third:
|
||||||
|
|
||||||
|
- It aligns AgentShield with ECC's harness-agnostic positioning.
|
||||||
|
- It creates a stable surface for future Zed, Orca, Superset, and Hermes
|
||||||
|
integration without pretending all harnesses share Claude's config model.
|
||||||
|
|
||||||
|
### Slice 4: Corpus Accuracy Gate
|
||||||
|
|
||||||
|
Promote the corpus from a benchmark into a release gate.
|
||||||
|
|
||||||
|
Artifacts:
|
||||||
|
|
||||||
|
- Per-category corpus report.
|
||||||
|
- Required category thresholds.
|
||||||
|
- Regression snapshots for known false-positive suppressions.
|
||||||
|
- Release checklist entry requiring corpus readiness before publish.
|
||||||
|
|
||||||
|
Why fourth:
|
||||||
|
|
||||||
|
- It prevents enterprise credibility from degrading as rules expand.
|
||||||
|
- It creates a durable route for Meta-Harness/Autocontext-style improvement
|
||||||
|
loops later.
|
||||||
|
|
||||||
|
### Slice 5: GitHub App And Linear Sync Wiring
|
||||||
|
|
||||||
|
Connect AgentShield findings to ECC-Tools follow-up routing.
|
||||||
|
|
||||||
|
Artifacts:
|
||||||
|
|
||||||
|
- Finding fingerprints compatible with ECC-Tools issue caps.
|
||||||
|
- Linear-ready backlog export for baseline drift and policy violations.
|
||||||
|
- Check-run annotations grouped by owner/risk.
|
||||||
|
- Tests that ensure repeated scans do not spam duplicate issues.
|
||||||
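The duplicate-issue and issue-cap requirements can be expressed as one planning step. The Python sketch below assumes each finding is a dict with `fingerprint` and `severity` keys and that the caller knows which fingerprints already have issues; the function name and severity labels are illustrative.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def plan_issue_export(findings, already_filed, cap):
    """Return (to_file, deferred) fingerprints, filing at most `cap` new issues."""
    seen = set(already_filed)
    fresh = []
    for f in findings:
        fp = f["fingerprint"]
        if fp not in seen:  # skips filed findings and in-scan duplicates
            seen.add(fp)
            fresh.append(f)
    # Highest severity first so the cap defers the least urgent findings.
    fresh.sort(
        key=lambda f: (
            SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
            f["fingerprint"],
        )
    )
    to_file = [f["fingerprint"] for f in fresh[:cap]]
    deferred = [f["fingerprint"] for f in fresh[cap:]]
    return to_file, deferred
```

Running the same scan twice with the first run's `to_file` added to `already_filed` yields no repeats, and the `deferred` list is what a backlog export would carry forward.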
|
|
||||||
|
Why fifth:
|
||||||
|
|
||||||
|
- It needs the baseline/fingerprint work from Slice 1.
|
||||||
|
- It is the bridge from local CLI to paid team workflow.
|
||||||
|
|
||||||
|
## Non-Goals For This Iteration
|
||||||
|
|
||||||
|
- Native PDF generation, unless buyer/compliance workflows explicitly require
|
||||||
|
generated PDF instead of HTML plus print-to-PDF.
|
||||||
|
- Hosted dashboards before the local baseline/evidence/fingerprint contracts are
|
||||||
|
stable.
|
||||||
|
- Fine-tuning or model training before deterministic corpus gates and reference
|
||||||
|
traces exist.
|
||||||
|
- Broad automated code rewrites for risky findings without explicit,
|
||||||
|
reviewable transforms and tests.
|
||||||
|
|
||||||
|
## Acceptance Gates
|
||||||
|
|
||||||
|
The AgentShield enterprise iteration is not complete until these are true:
|
||||||
|
|
||||||
|
- Local `npm run typecheck`, `npm run lint`, `npm test`, and `npm run build`
|
||||||
|
pass from the AgentShield repository root.
|
||||||
|
- Built CLI smoke tests cover the new flags or report modes.
|
||||||
|
- GitHub Action self-test covers the new CI-visible output.
|
||||||
|
- Documentation names the free/local path and the paid/team path separately.
|
||||||
|
- Evidence produced by the feature is deterministic enough for CI diffing.
|
||||||
|
- ECC-Tools can consume the finding fingerprints or backlog export without
|
||||||
|
exceeding GitHub/Linear object caps.
|
||||||
|
- The GA roadmap and Linear project status link to the merged AgentShield PRs.
|
||||||