mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-06-13 23:03:34 +08:00

feat: expand Kiro adapter to full language coverage (#2101 )

* feat: expand Kiro adapter to full language coverage

- Add 17 new agents (typescript, rust, kotlin, java, cpp, django, swift,
  fsharp, pytorch, mle, performance-optimizer) in both .md and .json formats
- Add 25 new skills (rust, kotlin, java/spring, django, fastapi, nestjs,
  react, nextjs, cpp, swift, mle/pytorch, deep-research, strategic-compact,
  autonomous-loops, content-hash-cache-pattern)
- Add 6 new language-specific steering files (rust, kotlin, java, cpp, php, ruby)
- Add 3 new hooks (rust-check-on-edit, python-lint-on-edit, security-check-on-create)
- Update README with expanded component inventory and documentation
- Fix install.sh line endings for macOS compatibility

Total Kiro components: 33 agents, 43 skills, 22 steering files, 13 hooks

* fix: resolve P1/P2 violations in Kiro agents, skills, and steering

- java-patterns.md: remove reference to non-existent quarkus-patterns skill
- kotlin-patterns.md: fix insecure BuildConfig recommendation for secrets
- swift-actor-persistence: fix Swift version claim (5.9+) and Dictionary crash
- java-reviewer.md: add recursive framework detection + robust diff chain
- kotlin-reviewer.md: replace unreliable diff detection with fallback chain
- rust-reviewer.md: add diff fallback + make CI gating mandatory
- jpa-patterns: add DISTINCT to fetch-join query to prevent duplicates
- django-reviewer.md: add migration safety check, narrow save() rule,
  fix pytest-django behavior description

* fix: resolve remaining violations in Kiro agents, skills, and docs

Agents:
- java-build-resolver.md: remove quarkus-patterns ref, fix 'Initialise' spelling
- java-reviewer.json: remove quarkus-patterns ref from prompt
- mle-reviewer.md, cpp-build-resolver.md, java-build-resolver.md,
  performance-optimizer.md: fix allowedTools 'read' -> 'fs_read'

Hooks:
- rust-check-on-edit: fix description to match askAgent behavior

Skills:
- content-hash-cache-pattern: hyphenate 'Content-Hash-Based'
- cpp-testing: hyphenate 'real-time'
- django-security: use placeholder secrets, fix CSRF_COOKIE_HTTPONLY=False
- nestjs-patterns: add Logger to HttpExceptionFilter for non-Http errors
- react-patterns: add React 19 compatibility note for useActionState
- rust-patterns: remove edition-specific 'Rust 2024+' reference
- springboot-patterns: cap exponential backoff, recommend Resilience4j
- springboot-security: fix invalid @Query SQL injection example
- swift-protocol-di-testing: add thread-safety doc comment to mock

Docs:
- README.md: fix Project Structure counts (33/43/22/13)

* fix: sync README tree with counts, restore local diff in kotlin-reviewer, correct django FK index guidance

- README.md: Project Structure tree now lists all 33 agents, 43 skills,
  22 steering files, and 13 hooks (was showing old subset)
- kotlin-reviewer.md: restore git diff --staged / git diff for local
  pre-commit review before falling back to HEAD~1
- django-reviewer.md: clarify that ForeignKey fields are indexed by
  default; only flag missing db_index on non-FK filter columns

2026-06-07 13:26:37 +08:00

5.5 KiB

Raw Blame History

name, description, origin

name	description	origin
content-hash-cache-pattern	Cache expensive file processing results using SHA-256 content hashes — path-independent, auto-invalidating, with service layer separation.	ECC

Content-Hash File Cache Pattern

Cache expensive file processing results (PDF parsing, text extraction, image analysis) using SHA-256 content hashes as cache keys. Unlike path-based caching, this approach survives file moves/renames and auto-invalidates when content changes.

When to Activate

Building file processing pipelines (PDF, images, text extraction)
Processing cost is high and same files are processed repeatedly
Need a --cache/--no-cache CLI option
Want to add caching to existing pure functions without modifying them

Core Pattern

1. Content-Hash-Based Cache Key

Use file content (not path) as the cache key:

import hashlib
from pathlib import Path

_HASH_CHUNK_SIZE = 65536  # 64KB chunks for large files

def compute_file_hash(path: Path) -> str:
    """SHA-256 of file contents (chunked for large files)."""
    if not path.is_file():
        raise FileNotFoundError(f"File not found: {path}")
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(_HASH_CHUNK_SIZE)
            if not chunk:
                break
            sha256.update(chunk)
    return sha256.hexdigest()

Why content hash? File rename/move = cache hit. Content change = automatic invalidation. No index file needed.

2. Frozen Dataclass for Cache Entry

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class CacheEntry:
    file_hash: str
    source_path: str
    document: ExtractedDocument  # The cached result

3. File-Based Cache Storage

Each cache entry is stored as {hash}.json — O(1) lookup by hash, no index file required.

import json
from typing import Any

def write_cache(cache_dir: Path, entry: CacheEntry) -> None:
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{entry.file_hash}.json"
    data = serialize_entry(entry)
    cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")

def read_cache(cache_dir: Path, file_hash: str) -> CacheEntry | None:
    cache_file = cache_dir / f"{file_hash}.json"
    if not cache_file.is_file():
        return None
    try:
        raw = cache_file.read_text(encoding="utf-8")
        data = json.loads(raw)
        return deserialize_entry(data)
    except (json.JSONDecodeError, ValueError, KeyError):
        return None  # Treat corruption as cache miss

4. Service Layer Wrapper (SRP)

Keep the processing function pure. Add caching as a separate service layer.

def extract_with_cache(
    file_path: Path,
    *,
    cache_enabled: bool = True,
    cache_dir: Path = Path(".cache"),
) -> ExtractedDocument:
    """Service layer: cache check -> extraction -> cache write."""
    if not cache_enabled:
        return extract_text(file_path)  # Pure function, no cache knowledge

    file_hash = compute_file_hash(file_path)

    # Check cache
    cached = read_cache(cache_dir, file_hash)
    if cached is not None:
        logger.info("Cache hit: %s (hash=%s)", file_path.name, file_hash[:12])
        return cached.document

    # Cache miss -> extract -> store
    logger.info("Cache miss: %s (hash=%s)", file_path.name, file_hash[:12])
    doc = extract_text(file_path)
    entry = CacheEntry(file_hash=file_hash, source_path=str(file_path), document=doc)
    write_cache(cache_dir, entry)
    return doc

Key Design Decisions

Decision	Rationale
SHA-256 content hash	Path-independent, auto-invalidates on content change
`{hash}.json` file naming	O(1) lookup, no index file needed
Service layer wrapper	SRP: extraction stays pure, cache is a separate concern
Manual JSON serialization	Full control over frozen dataclass serialization
Corruption returns `None`	Graceful degradation, re-processes on next run
`cache_dir.mkdir(parents=True)`	Lazy directory creation on first write

Best Practices

Hash content, not paths — paths change, content identity doesn't
Chunk large files when hashing — avoid loading entire files into memory
Keep processing functions pure — they should know nothing about caching
Log cache hit/miss with truncated hashes for debugging
Handle corruption gracefully — treat invalid cache entries as misses, never crash

Anti-Patterns to Avoid

# BAD: Path-based caching (breaks on file move/rename)
cache = {"/path/to/file.pdf": result}

# BAD: Adding cache logic inside the processing function (SRP violation)
def extract_text(path, *, cache_enabled=False, cache_dir=None):
    if cache_enabled:  # Now this function has two responsibilities
        ...

# BAD: Using dataclasses.asdict() with nested frozen dataclasses
# (can cause issues with complex nested types)
data = dataclasses.asdict(entry)  # Use manual serialization instead

When to Use

File processing pipelines (PDF parsing, OCR, text extraction, image analysis)
CLI tools that benefit from --cache/--no-cache options
Batch processing where the same files appear across runs
Adding caching to existing pure functions without modifying them

When NOT to Use

Data that must always be fresh (real-time feeds)
Cache entries that would be extremely large (consider streaming instead)
Results that depend on parameters beyond file content (e.g., different extraction configs)

5.5 KiB Raw Blame History