Claude Code Deep Dive: Architecture, Governance, and Engineering Practices


0. TL;DR

This post comes from six months of deep Claude Code usage—two $40-per-month accounts and a lot of trial and error.

At first, I treated it like a chatbot. Pretty quickly I hit the same walls everyone does: context getting messy, tools multiplying but results degrading, rules growing longer yet compliance dropping. After wrestling with it and studying how Claude Code actually works, I realized this isn’t a prompt engineering problem—it’s how the system is designed.

I want to walk through how Claude Code actually works, why context drifts and how to manage it, how to design Skills and Hooks, the right way to use Subagents, the architectural impact of Prompt Caching, and how to write a CLAUDE.md that actually works.

Here’s how I think about Claude Code: six layers.

| Layer | Responsibility |
|---|---|
| CLAUDE.md / rules / memory | Long-term context—tells Claude “what this is” |
| Tools / MCP | Action capabilities—tells Claude “what I can do” |
| Skills | On-demand methodologies—tells Claude “how to do it” |
| Hooks | Enforced behaviors that don’t rely on Claude’s judgment |
| Subagents | Context-isolated workers for controlled autonomy |
| Verifiers | Validation loops that make output testable, rollbackable, auditable |

Over-index on one layer and things get wobbly. CLAUDE.md gets too long and pollutes its own context. Too many tools and selection gets confusing. Subagents everywhere and state drifts. Skip verification and you won’t know where things broke.


1. How It Actually Works

Agent Loop

Claude Code isn’t about “answering”—it’s an iterative agent loop:

Gather context → Take action → Verify result → [Done or loop back]
     ↑                    ↓
  CLAUDE.md          Hooks / Permissions / Sandbox
  Skills             Tools / MCP
  Memory

After using it for a while, you realize getting stuck usually isn’t because the model’s “not smart enough.” More often, you’ve generated something but have no way to judge whether it’s correct, and no way to roll back. And wrong information is more dangerous than no information.

The Five Surfaces That Actually Matter

| Surface | Core Question | Primary Carriers |
|---|---|---|
| Context surface | What’s always loaded vs. loaded on demand | CLAUDE.md, rules, memory, skills |
| Action surface | What actions can Claude currently take | Built-in tools, MCP, plugins |
| Control surface | What actions must be constrained, blocked, or audited | Permissions, sandbox, hooks |
| Isolation surface | What tasks need context and permission isolation | Subagents, worktrees, forked sessions |
| Verification surface | How to know a task is done and the result is trustworthy | Tests, lint, screenshots, logs, CI |

When something’s off, check these surfaces. Unstable results? Check context loading order—not the model. Automation running wild? Check the control layer—not the agent being “too active.” Quality degrading in long sessions? Intermediate artifacts have polluted the context—start a fresh session instead of endlessly tuning prompts.


2. Concept Boundaries: MCP / Plugin / Tools / Skills / Hooks / Subagents

| Concept | Runtime Role | Solves What | Common Misuse |
|---|---|---|---|
| CLAUDE.md | Project-level persistent contract | Commands, boundaries, prohibitions that must hold every session | Writing it as a team knowledge base |
| .claude/rules/* | Path or language-specific rules | Directory, file type, or language-level local conventions | Dumping all rules into root CLAUDE.md |
| Built-in Tools | Built-in capabilities | Read files, modify files, run commands, search | Putting all integrations into shell |
| MCP | External capability protocol | Let Claude access GitHub, Sentry, databases | Connecting too many servers, tool definitions overwhelming context |
| Plugin | Packaging and distribution layer | Distribute skills/hooks/MCP together | Treating plugin as a runtime primitive |
| Skill | On-demand loaded knowledge/workflow | Give Claude a “method package” | Skill trying to be both encyclopedia and deployment script |
| Hook | Rule-enforcing interception layer | Execute rules around lifecycle events | Using hooks to replace all model judgment |
| Subagent | Context-isolated work unit | Parallel research, limit tools and permissions | Unbounded fan-out, governance loss |

Simple rule: Give Claude new actions with Tool/MCP, give it methodologies with Skill, isolate execution environment with Subagent, enforce constraints and audit with Hook, distribute across projects with Plugin. Mix these together and the system becomes unmanageable fast.


3. Context Engineering: The Most Important System Constraint

Many people treat context as a “capacity problem,” but the real issue is usually noise—useful information drowned in irrelevant content.

The Real Context Cost Structure


Claude Code’s 200K context isn’t all usable:

200K total context
├── Fixed overhead (~15-20K)
│   ├── System instructions: ~2K
│   ├── All enabled Skill descriptors: ~1-5K
│   ├── MCP Server tool definitions: ~10-20K  ← Biggest hidden killer
│   └── LSP state: ~2-5K
│
├── Semi-fixed (~5-10K)
│   ├── CLAUDE.md: ~2-5K
│   └── Memory: ~1-2K
│
└── Dynamically available (~160-180K)
    ├── Conversation history
    ├── File contents
    └── Tool call results

A typical MCP Server (like GitHub) has 20-30 tool definitions at ~200 tokens each, totaling 4,000-6,000 tokens. Connect 5 servers and that’s 25,000 tokens (12.5%) in fixed overhead alone. When I first ran the numbers, I was genuinely surprised—in scenarios reading large codebases, that 12.5% really matters.
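The overhead arithmetic above can be sanity-checked in a few lines. This is a sketch using the rough per-tool and per-server figures from this section, not measured values:

```python
# Back-of-envelope context overhead from MCP tool definitions.
# The per-tool and per-server numbers are the rough estimates quoted
# above, not measurements.
TOKENS_PER_TOOL = 200      # typical size of one tool definition
TOOLS_PER_SERVER = 25      # midpoint of the 20-30 range
CONTEXT_WINDOW = 200_000

def mcp_overhead(servers: int) -> tuple[int, float]:
    """Return (tokens spent on tool definitions, share of the window)."""
    tokens = servers * TOOLS_PER_SERVER * TOKENS_PER_TOOL
    return tokens, tokens / CONTEXT_WINDOW

tokens, share = mcp_overhead(5)
print(f"{tokens} tokens, {share:.1%} of the window")
# → 25000 tokens, 12.5% of the window
```

The point of running the numbers: this cost is paid on every single request, before any file is read.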

Always resident    → CLAUDE.md: project contract / build commands / prohibitions
Path-loaded        → rules: language / directory / file-type specific
On-demand loaded   → skills: workflows / domain knowledge
Isolated loaded    → subagents: heavy exploration / parallel research
Never in context   → hooks: deterministic scripts / audit / blocking

Put it simply: don’t load things every time if you only need them occasionally.

Context Best Practices

  • Keep CLAUDE.md short, hard, and executable—prioritize commands, constraints, architectural boundaries. Anthropic’s own CLAUDE.md is about 2.5K tokens—use it as reference
  • Split large reference docs into skill supporting files, don’t stuff them into SKILL.md body
  • Use .claude/rules/ for path/language rules, don’t make root CLAUDE.md handle all differences
  • Use /context proactively to observe consumption, don’t wait for automatic compression to kick in

(Screenshot: /context output showing the token breakdown by source.)

  • For task switches, prefer /clear; for new phases of same task, use /compact
  • Write Compact Instructions into CLAUDE.md—you control what survives compression, not the algorithm

The Compression Trap


The default compression algorithm judges by “re-readability”—early tool outputs and file contents get prioritized for deletion, which also throws away architectural decisions and constraint rationale. When you need to make changes later, the session no longer remembers what was decided two hours ago. Mysterious bugs often come from this.

The fix is to specify in CLAUDE.md:

## Compact Instructions

When compressing, preserve in priority order:

1. Architecture decisions (NEVER summarize)
2. Modified files and their key changes
3. Current verification status (pass/fail)
4. Open TODOs and rollback notes
5. Tool outputs (can delete, keep pass/fail only)

Besides Compact Instructions, there’s a more proactive approach: before starting a new session, have Claude write a HANDOFF.md documenting current progress, what was tried, what worked, what didn’t, and what to do next. The next Claude instance can pick up from just this file, not dependent on compression algorithm quality:

Write the current progress in HANDOFF.md. Explain what you tried, what worked, what didn’t, so the next agent with fresh context can continue from just this file.

Give it a quick review for gaps, then start a new session pointing at the HANDOFF.md path.
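A minimal HANDOFF.md skeleton might look like the following. The section names and contents are my own convention, not a Claude Code format:

```markdown
# HANDOFF

## Goal
Migrate config loading to the new schema.

## Done
- Schema v2 types added in `src/config/`
- Unit tests pass for the parser

## Tried and failed
- Auto-detecting old configs by extension (too many false positives)

## Next
- Wire up the migration flag, then verify with `make test`
```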

Plan Mode’s Engineering Value

Plan Mode splits exploration from execution—read-only during exploration, files touched only after plan confirmation:

  • Exploration phase is read-only operations
  • Claude can clarify goals and boundaries before submitting a concrete plan
  • Execution cost only occurs after plan confirmation

For complex refactoring, migrations, and cross-module changes, this separation beats “shipping code immediately”—it significantly reduces the probability of continuing to modify based on wrong assumptions. Double-tap Shift+Tab to enter Plan Mode. Advanced technique: have one Claude instance write the plan and another model (I use Codex) review it as a “senior engineer”—AI reviewing AI works well.


4. Skills Design: Not a Template Library, But Workflows Loaded On Demand

Skills are described as “on-demand loaded knowledge and workflows”—descriptors stay in context, full content loads on demand. Using them feels quite different from “saved prompts.”

What Makes a Good Skill

  • A good skill description tells the model “when to use me,” not “what I do”—big difference.
  • Has complete steps, inputs, outputs, and stop conditions—not just a beginning
  • Body contains navigation and core constraints only, large materials split into supporting files
  • Skills with side effects should explicitly set disable-model-invocation: true, otherwise Claude decides whether to run them

How Skills Achieve “On-Demand Loading”

Claude Code’s team emphasizes “progressive disclosure” in internal design—don’t show the model everything at once, but give it an index and navigation first, then pull details on demand:

  • SKILL.md defines task semantics, boundaries, and execution skeleton
  • Supporting files provide domain details
  • Scripts deterministically gather context or evidence

A stable structure looks like:

.claude/skills/
└── incident-triage/
    ├── SKILL.md
    ├── runbook.md
    ├── examples.md
    └── scripts/
        └── collect-context.sh

Three Typical Skill Types

These examples come from my actual usage in the open-source terminal project Kaku.

Type 1: Checklist (Quality Gate)

Run before release to ensure nothing is missed:

---
name: release-check
description: Use before cutting a release to verify build, version, and smoke test.
---

## Pre-flight (All must pass)
- [ ] `cargo build --release` passes
- [ ] `cargo clippy -- -D warnings` clean
- [ ] Version bumped in Cargo.toml
- [ ] CHANGELOG updated
- [ ] `kaku doctor` passes on clean env

## Output
Pass / Fail per item. Any Fail must be fixed before release.

Type 2: Workflow (Standardized Operations)

Config migration is high-risk, explicit invocation with built-in rollback:

---
name: config-migration
description: Migrate config schema. Run only when explicitly requested.
disable-model-invocation: true
---

## Steps
1. Backup: `cp ~/.config/kaku/config.toml ~/.config/kaku/config.toml.bak`
2. Dry run: `kaku config migrate --dry-run`
3. Apply: remove `--dry-run` after confirming output
4. Verify: `kaku doctor` all pass

## Rollback
`cp ~/.config/kaku/config.toml.bak ~/.config/kaku/config.toml`

Type 3: Domain Expert (Encapsulated Decision Framework)

When runtime issues occur, have Claude collect evidence along a fixed path instead of guessing:

---
name: runtime-diagnosis
description: Use when kaku crashes, hangs, or behaves unexpectedly at runtime.
---

## Evidence Collection
1. Run `kaku doctor` and capture full output
2. Last 50 lines of `~/.local/share/kaku/logs/`
3. Plugin state: `kaku --list-plugins`

## Decision Matrix
| Symptom | First Check |
|---|---|
| Crash on startup | doctor output → Lua syntax error |
| Rendering glitch | GPU backend / terminal capability |
| Config not applied | Config path + schema version |

## Output Format
Root cause / Blast radius / Fix steps / Verification command

Keep Descriptors Short—Every Skill Has Context Cost

Each enabled skill keeps its descriptor in context. The gap between optimized and unoptimized is huge:

# Inefficient (~45 tokens)
description: |
  This skill helps you review code changes in Rust projects.
  It checks for common issues like unsafe code, error handling...
  Use this when you want to ensure code quality before merging.

# Efficient (~9 tokens)
description: Use for PR reviews with focus on correctness.

A simple strategy for when to disable auto-invoke:

  • High frequency (>1 per session) → Keep auto-invoke, optimize descriptor
  • Low frequency (<1 per session) → Disable auto-invoke, manual trigger, descriptor completely out of context
  • Very low frequency (<1 per month) → Remove skill, document in AGENTS.md instead

Skills Anti-Patterns

  • Description too short: description: help with backend (triggers for any backend work)
  • Body too long: Hundreds of lines of manual stuffed into SKILL.md body
  • One skill covering review, deploy, debug, docs, incident—five different things
  • Side-effect skills allowing model auto-invocation

5. Tool Design: Helping Claude Pick the Right Tool

The more I used it, the more I realized: tools for Claude and APIs for humans are different. Human APIs often pursue feature completeness, but for agents, the focus isn’t how complete the features are—it’s how easy it is to use correctly.

Good Tools vs. Bad Tools

| Dimension | Good Tools | Bad Tools |
|---|---|---|
| Name | jira_issue_get, sentry_errors_search | query, fetch, do_action |
| Parameters | issue_key, project_id, response_format | id, name, target |
| Return | Information directly relevant to next decision | UUIDs, internal fields, raw noise |
| Scope | Single purpose, clear boundaries | Multiple actions mixed, opaque side effects |
| Cost | Default output controlled, truncatable | Default returns oversized context |
| Errors | Include correction suggestions | Only return opaque error codes |

Practical design principles:

  • Name prefix by system or resource layer: github_pr_*, jira_issue_*
  • Support response_format: concise / detailed for large responses
  • Error responses should teach the model how to fix, not just throw opaque codes
  • When high-level task tools can be merged, don’t expose too many low-level fragment tools—avoid list_all_* making the model filter itself
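As a sketch of these principles, a tool definition might look like this in JSON Schema form. The tool name, fields, and descriptions are illustrative, not a real MCP server:

```json
{
  "name": "jira_issue_get",
  "description": "Fetch one Jira issue by key. Returns summary, status, and assignee. Use response_format=detailed only when full history is needed.",
  "input_schema": {
    "type": "object",
    "properties": {
      "issue_key": { "type": "string", "description": "e.g. PROJ-123" },
      "response_format": {
        "type": "string",
        "enum": ["concise", "detailed"],
        "default": "concise"
      }
    },
    "required": ["issue_key"]
  }
}
```

Note how the description answers “when to use me” and the response_format default keeps the common case cheap.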

Lessons from Claude Code’s Internal Tool Evolution


The Claude Code team’s evolution of internal tools is pretty interesting. For scenarios needing to pause mid-task and ask the user, they tried three approaches:

  • Version 1: Add a question parameter to existing tools (like Bash), letting Claude ask while calling the tool. Result: Claude usually ignored this parameter and kept going, never stopping to ask.
  • Version 2: Require Claude to write specific markdown format in output, outer layer parses and pauses. Problem: no enforcement, Claude often “forgot” to follow the format, questioning logic very fragile.
  • Version 3: Make it a standalone AskUserQuestion tool. Claude must explicitly call it to ask, call equals pause, no ambiguity. Significantly better than previous versions.

This image explains why Version 3 is more stable:

(Figure: Improving Elicitation and the AskUserQuestion tool.)

The left approach (free-form markdown output) is too loose: the model’s format is arbitrary and the outer parsing is fragile. The right one (an ExitPlanTool parameter) is too rigid: by the time you exit the plan phase to ask, it’s too late. AskUserQuestion as an independent tool sits in the middle—structured, callable at any time, and the most stable design of the three.

Bottom line: if you want Claude to stop and ask, give it a dedicated tool. Adding a flag or agreeing on output format often gets skipped.

Todo Tool Evolution:

(Figure: Updating with Capabilities - Tasks & Todos.)

Early versions used TodoWrite tool + reminders every 5 turns to keep Claude on task. As models got better, this tool actually became a constraint—Todo reminders made Claude feel it had to follow them rigidly, unable to flexibly adjust plans. Funny how that works: the tool was added because models weren’t strong enough, but once they improved, it became a burden. Worth checking periodically whether constraints you added back then still make sense.

Search Tool Evolution: Claude Code initially used a RAG vector database—fast, but it required indexing, was fragile across environments, and, most importantly, Claude didn’t like using it. Switching to a Grep tool and letting Claude search directly worked significantly better. A side benefit surfaced later: Claude reads skill files, skill files reference other files, and the model reads recursively, discovering information on demand without pre-loading. This pattern became known as “progressive disclosure.”

When NOT to Add Another Tool

  • Things local shell can reliably do
  • Model only needs static knowledge, not real external interaction
  • Requirements better fit skill workflow constraints than tool action capabilities
  • Haven’t validated that tool description, schema, and return format can be stably used by model

6. Hooks: Running Your Own Code Before/After Claude Acts

Hooks are easily treated as “automatically running scripts,” but my experience is they’re more about taking things you can’t leave to Claude’s on-the-fly judgment and moving them into deterministic processes.

Like whether to run formatting, whether protected files can be modified, whether to notify after task completion—don’t expect Claude to remember these every time.

Current Hook Points

(Screenshot: the hooks configuration interface showing the available hook points.)

Suitable vs. Unsuitable for Hooks

Suitable: Blocking modifications to protected files, auto formatting/lint/light validation after Edit, injecting dynamic context after SessionStart (Git branch, env vars), pushing notifications after task completion.

Unsuitable: Complex semantic judgments requiring lots of context, long-running business processes, decisions requiring multi-step reasoning and tradeoffs—these belong in skills or subagents.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "pattern": "*.rs",
        "hooks": [
          {
            "type": "command",
            "command": "cargo check 2>&1 | head -30",
            "statusMessage": "Running cargo check..."
          }
        ]
      }
    ],
    "Notification": [
      {
        "type": "command",
        "command": "osascript -e 'display notification \"Task completed\" with title \"Claude Code\"'"
      }
    ]
  }
}
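The “blocking modifications to protected files” case runs before the edit rather than after it. A PreToolUse sketch, where the checker script path is hypothetical:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/check_protected.py"
          }
        ]
      }
    ]
  }
}
```

As documented for Claude Code hooks, the command receives the pending tool call as JSON on stdin, and exiting with status code 2 blocks the call and feeds stderr back to Claude.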

Hooks: Earlier Error Detection Saves Time

(Figure: where hooks intervene in the execution flow.)

In a 100-edit session, saving 30-60 seconds each time accumulates to 1-2 hours saved—quite significant. Limit output length (| head -30) to avoid hook output polluting context.

Hooks + Skills + CLAUDE.md Three-Layer Stack

  • CLAUDE.md: Declare “must pass tests and lint before commit”
  • Skill: Tell Claude in what order to run tests, how to read failures, how to fix
  • Hook: Hard validation on critical paths, block if necessary

In practice, any single layer has gaps. CLAUDE.md rules alone get ignored, hooks alone can’t handle judgment calls—all three together is what actually works.


7. Subagents: Sending an Independent Claude to Do One Specific Thing

A subagent is basically an independent Claude instance that branches off from your main conversation, with its own context window and only the tools you specify. The core value isn’t “parallelism” but isolation—codebase scans, test runs, and reviews that generate lots of output go to subagents; the main thread gets only summaries and isn’t polluted by the intermediate process.

Claude Code has built-in Explore (read-only scan, runs Haiku to save cost), Plan (planning research), General-purpose (general), and custom options.

Explicit Constraints in Configuration

  • tools / disallowedTools: Limit what tools can be used, don’t give same-wide permissions as main thread
  • model: Exploration tasks use Haiku/Sonnet, important reviews use Opus
  • maxTurns: Prevent runaway
  • isolation: worktree: Isolate filesystem when files need modification
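Custom subagents are markdown files with YAML frontmatter carrying these constraints. A sketch of a reviewer agent, where the prompt body and tool list are illustrative:

```markdown
---
name: reviewer
description: Review diffs for correctness and style. Use after significant edits.
tools: Read, Grep, Glob
model: opus
---

You are a code reviewer. Read the diff, check it against the project
conventions in CLAUDE.md, and return: a verdict (approve/revise), the
issues found, and the exact commands to verify each fix.
```

Read-only tools plus a fixed output shape is what makes the summary usable by the main thread.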

Another practical detail: long-running bash commands can be moved to background with Ctrl+B, Claude will check results later with BashOutput tool without blocking main thread. Same for subagents—just tell it “run in background.”

Common Anti-Patterns

  • Subagent with same-wide permissions as main thread, isolation meaningless
  • Output format not fixed, unusable by main thread
  • Strong dependencies between subtasks, frequently sharing intermediate state—Subagent not suitable for this

8. Prompt Caching: Core to Claude Code’s Internal Architecture

You don’t see this covered much in tutorials, but it heavily affects Claude Code’s cost structure and design choices.

Engineering wisdom: “Cache Rules Everything Around Me”—true for agents too. Claude Code’s entire architecture is built around prompt caching. High cache hit rate doesn’t just reduce costs, it enables more generous rate limits. Anthropic even runs alerts on hit rates—too low and it’s declared a SEV.

Prompt Layout Designed for Caching


Prompt caching works by prefix matching—content from request start to each cache_control breakpoint gets cached. So order matters:

Claude Code prompt order:
1. System Prompt → Static, locked
2. Tool Definitions → Static, locked
3. Chat History → Dynamic, comes after
4. Current user input → Last
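A toy simulation makes the prefix rule concrete. This is an illustration of the matching behavior, not Anthropic’s implementation:

```python
# Toy model of prefix caching: a request reuses the cache up to the
# first block that differs from the previous request; everything after
# that point must be reprocessed. Illustration only, not the real
# implementation.
def cached_prefix_len(previous: list[str], current: list[str]) -> int:
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n

stable = ["system prompt", "tool definitions", "turn 1", "turn 2"]

# Appending a new turn keeps the entire previous request cached:
assert cached_prefix_len(stable, stable + ["turn 3"]) == 4

# A timestamp in the system prompt invalidates the whole cache:
dated = ["system prompt @ 09:41"] + stable[1:]
assert cached_prefix_len(stable, dated) == 0
```

The same logic is why tool definitions sit before chat history: they change rarely, so the expensive prefix stays byte-stable across turns.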

Common caching pitfalls:

  • Putting timestamped content in static system prompt (makes it change every time)
  • Non-deterministically shuffling tool definition order
  • Adding/removing tools mid-session

What about dynamic info like current time? Don’t touch system prompt, put it in the next message. Claude Code does this too—adding <system-reminder> tags in user messages, system prompt untouched, cache not broken.

Don’t Switch Models Mid-Session

Prompt cache is model-specific. If you’ve chatted 100K tokens with Opus and want to ask a simple question, switching to Haiku is actually more expensive than continuing with Opus, because you have to rebuild the entire cache for Haiku.

If you really need to switch, hand off via subagent: Opus prepares a “handoff message” for another model explaining the task to complete.

Compaction’s Actual Implementation

Forking Context — Compaction

Above is the compaction (context compression) flow. Left: the context is nearly full. Middle: Claude Code forks a call that feeds the full conversation history to the model with the instruction “Summarize this conversation”—this hits the cache, so it costs about 1/10 the price. Right: after compression, dozens of conversation turns become a ~20K-token summary; System + Tools remain, previously referenced files are re-attached, and space is freed for new turns.

Intuitively, Plan Mode should switch to a read-only toolset—but that would break the cache. The actual implementation: EnterPlanMode is a tool the model can call itself, entering plan mode autonomously when it detects a complex issue. The toolset is unchanged, so the cache is unaffected.

defer_loading: Lazy Loading for Tools

Claude Code can have dozens of MCP tools; including full definitions in every request would be expensive, but removing them mid-session breaks the cache. The solution: send lightweight stubs containing just tool names, marked defer_loading: true. The model “discovers” them via the ToolSearch tool, and a full tool schema is loaded only after the model selects that tool—keeping the cache prefix stable.
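A deferred stub might look something like this on the wire. This is purely illustrative; the field name follows the Anthropic tool-search beta, and Claude Code’s internals may differ:

```json
{
  "tools": [
    { "name": "mcp__github__pr_get", "defer_loading": true },
    { "name": "mcp__github__issue_search", "defer_loading": true }
  ]
}
```

Only the names count against the prefix; the full schemas are injected after selection, outside the cached region.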


9. Verification Loop: No Verifier, No Engineering Agent

“Claude says it’s done” has little engineering value. What matters is knowing whether it’s actually correct, whether you can roll back if something’s wrong, and whether the process is auditable.

Verifier Levels

  • Lowest: command exit codes, lint, typecheck, unit test
  • Middle: integration tests, screenshot comparison, contract test, smoke test
  • Higher: production log verification, monitoring metrics, manual review checklists

Explicitly Define Verification in Prompt, Skill, and CLAUDE.md

## Verification

For backend changes:

- Run `make test` and `make lint`
- For API changes, update contract tests under `tests/contracts/`

For UI changes:

- Capture before/after screenshots if visual

Definition of done:

- All tests pass
- Lint passes
- No TODO left behind unless explicitly tracked

When writing task prompts or skills, define acceptance criteria upfront. Which commands must pass, what to check first if they fail, what screenshots and logs should show to pass—the earlier these are clear, the less trouble later.

My simple test: if you can’t clearly explain “how Claude knows it’s done correctly,” it’s probably not suitable to throw at Claude for autonomous completion.


10. Engineering Value of High-Frequency Commands

These commands serve one core purpose: actively manage context, don’t wait for the system to handle it.

Context Management

/context   # View token consumption structure, debug MCP and file read ratios
/clear     # Clear session, restart if same issue got corrected twice
/compact   # Compress but keep key points, works with Compact Instructions
/memory    # Confirm which CLAUDE.md actually got loaded

Capabilities and Governance

(Screenshot: /mcp connection status showing server tool counts and token consumption.)

/mcp           # Manage MCP connections, check token costs, disconnect idle servers
/hooks         # Manage hooks, control plane entry
/permissions   # View or update permission whitelist
/sandbox       # Configure sandbox isolation, essential for high-automation scenarios
/model         # Switch models: Opus for deep reasoning, Sonnet for routine, Haiku for quick exploration

Session Continuity and Parallelism

claude --continue               # Resume latest session in current directory, continue next day
claude --resume                 # Open selector to resume historical session
claude --continue --fork        # Fork from existing session, different approaches from same starting point
claude --worktree               # Create isolated git worktree
claude -p "prompt"              # Non-interactive mode, integrate into CI / pre-commit / scripts
claude -p --output-format json  # Structured output, easy for scripts to consume

Several Less Common but Useful Commands

/simplify: Checks just-modified code along three dimensions—reuse, quality, and efficiency—and fixes issues directly. Great to run right after modifying logic instead of reviewing manually.

/rewind: Not “undo,” but return to a session checkpoint and re-summarize. Good for: Claude explored too far down wrong path; want to keep early consensus but discard later failures.

/btw: Ask a quick side question without interrupting the main task. Good for single-round tangential questions like “what’s the difference between these two commands”—not for questions that need repo reading or tool calls.

claude -p --output-format stream-json: Real-time JSON event stream, good for long task monitoring, incremental processing, streaming integration into your own tools.

/insight: Have Claude analyze current session, extract what’s worth codifying into CLAUDE.md. Usage: run after some session progress, it’ll point out things like “this convention was mentioned repeatedly but not written into contract”—great for iteratively improving CLAUDE.md.

Double-tap ESC to backtrack: returns to a previous input for re-editing, no need to retype. When Claude goes off track, or your previous message wasn’t clear, double-tap ESC to modify and resend—much easier than restarting the session.

Conversation history is all local: All session records are in ~/.claude/projects/, folder names by project path (slashes become dashes), each session is a .jsonl file. To find history on a topic, just grep -rl "keyword" ~/.claude/projects/ or tell Claude “help me search for previous discussions about X” and it’ll look itself.


11. How to Write a Good CLAUDE.md

I see CLAUDE.md as more of a collaboration contract between you and Claude—not team docs, not a knowledge base. Only put things that must hold for every session.

My advice is actually simple: start by writing nothing. Use it first, only add when you find yourself repeating the same thing. Adding is simple too—type # to append current conversation to CLAUDE.md, or tell Claude “add this to the project’s CLAUDE.md” and it’ll know which file to modify.


What to Include

  • Build, test, lint, run commands (most important)
  • Key directory structure and module boundaries
  • Explicit code style and naming constraints
  • Non-obvious environment dependencies and pitfalls
  • Prohibitions and high-risk operations (NEVER list)
  • Information that must survive compression (Compact Instructions)

What NOT to Include

  • Long background introductions
  • Complete API documentation
  • Vague principles like “write high-quality code”
  • Obvious information Claude can infer from reading the repo
  • Large background materials and low-frequency task knowledge (put these in skills)

High-Quality Template

# Project Contract

## Build And Test

- Install: `pnpm install`
- Dev: `pnpm dev`
- Test: `pnpm test`
- Typecheck: `pnpm typecheck`
- Lint: `pnpm lint`

## Architecture Boundaries

- HTTP handlers live in `src/http/handlers/`
- Domain logic lives in `src/domain/`
- Do not put persistence logic in handlers
- Shared types live in `src/contracts/`

## Coding Conventions

- Prefer pure functions in domain layer
- Do not introduce new global state without explicit justification
- Reuse existing error types from `src/errors/`

## Safety Rails

## NEVER

- Modify `.env`, lockfiles, or CI secrets without explicit approval
- Remove feature flags without searching all call sites
- Commit without running tests

## ALWAYS

- Show diff before committing
- Update CHANGELOG for user-facing changes

## Verification

- Backend changes: `make test` + `make lint`
- API changes: update contract tests under `tests/contracts/`
- UI changes: capture before/after screenshots

## Compact Instructions

Preserve:

1. Architecture decisions (NEVER summarize)
2. Modified files and key changes
3. Current verification status (pass/fail commands)
4. Open risks, TODOs, rollback notes

Usage is actually straightforward: what must be known every session goes in CLAUDE.md, what affects only some files goes in rules, and what’s needed only for certain task types goes in skills.

Let Claude Maintain Its Own CLAUDE.md

One of my favorite moves: after correcting Claude’s mistake, have it update CLAUDE.md:

“Update your CLAUDE.md so you don’t make that mistake again.”

Claude is actually quite good at patching these rules for itself—over time the same mistakes show up less. That said, review it periodically; entries go stale and constraints that made sense once may not anymore. More on keeping it healthy in section 14.


12. What I’ve Learned Recently

Over the Spring Festival break, I built an open-source terminal Kaku with Claude Code—Rust + Lua at the core, with some AI sprinkled in. Mixed languages plus a custom config system actually surfaced a bunch of typical agent collaboration problems. Here are a few learnings that helped me most.

“Environment Transparency” Matters More Than You Think

Claude Code calls real shells, git, package managers, and local configurations. As long as one layer is opaque, it has to start guessing; once it starts guessing environment, reliability usually drops fast. This isn’t unique to Claude Code—many agents are like this.

So I quickly added a doctor command to the terminal, pulling together environment state, dependencies, and config status into a structured health report. Running doctor before Claude Code starts work does save a lot of “didn’t understand environment before starting” problems.

I also found that if the CLI itself has clearly semantic subcommands like init, config, reset, Claude Code uses them more stably than letting it guess config file placement. Converge state first, then expose edit entry points—reverse the order and chaos easily follows.

Hooks Practice for Mixed-Language Projects

Two languages, two checks—Hooks are well-suited to trigger separately by file type:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "pattern": "*.rs",
        "hooks": [{
          "type": "command",
          "command": "cargo check 2>&1 | head -30",
          "statusMessage": "Checking Rust..."
        }]
      },
      {
        "matcher": "Edit",
        "pattern": "*.lua",
        "hooks": [{
          "type": "command",
          "command": "luajit -b $FILE /dev/null 2>&1 | head -10",
          "statusMessage": "Checking Lua syntax..."
        }]
      }
    ]
  }
}

Knowing immediately if there’s a compile error beats “running a bunch of stuff only to find it was broken from the start.”
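If you’d rather maintain one hook entry instead of two, the per-language dispatch can live in a small script keyed on file extension. This is a sketch under the assumption that your hook hands the script a file path; adapt it to however your hook actually receives input:

```shell
#!/bin/sh
# Map an edited file to the check command for its language.
# Unknown extensions get no check rather than a guessed one.
check_cmd() {
  case "$1" in
    *.rs)  echo "cargo check" ;;
    *.lua) echo "luajit -b $1 /dev/null" ;;
    *)     echo "" ;;
  esac
}

check_cmd "src/main.rs"     # cargo check
check_cmd "config/init.lua" # luajit -b config/init.lua /dev/null
```

The explicit empty case matters: a file type you haven’t planned for should produce no check, not an improvised one.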

Complete Engineering Layout Reference

If you want a complete Claude Code setup for your project, here’s a reference structure—don’t copy all of it, just grab what you need:

Project/
├── CLAUDE.md
├── .claude/
│   ├── rules/
│   │   ├── core.md
│   │   ├── config.md
│   │   └── release.md
│   ├── skills/
│   │   ├── runtime-diagnosis/     # Collect logs, state, and dependencies in one pass
│   │   ├── config-migration/      # Config migration with rollback protection
│   │   ├── release-check/         # Pre-release validation and smoke tests
│   │   └── incident-triage/       # Production incident triage
│   ├── agents/
│   │   ├── reviewer.md
│   │   └── explorer.md
│   └── settings.json
└── docs/
    └── ai/
        ├── architecture.md
        └── release-runbook.md

With global constraints (CLAUDE.md), path-scoped constraints (rules), workflows (skills), and architectural details fully decoupled, Claude Code runs noticeably more predictably. If you maintain multiple projects, keep stable personal baselines in ~/.claude/ and project-specific differences in the project-level .claude/, distributed via sync scripts, so projects don’t pollute each other.
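The sync script can be dead simple; the one rule worth enforcing is that the project-level file always wins. A minimal sketch, with made-up paths:

```shell
#!/bin/sh
# Copy personal baseline files into a project's .claude/ without
# overwriting anything the project already defines.
sync_baseline() {
  src="$1" dst="$2"
  mkdir -p "$dst"
  for f in "$src"/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    # Project-specific file wins; the baseline only fills gaps.
    [ -e "$dst/$name" ] || cp -R "$f" "$dst/$name"
  done
}

# Example (paths are assumptions):
# sync_baseline "$HOME/.claude" "./myproject/.claude"
```

Filling gaps instead of overwriting is what keeps the baseline from becoming a second source of cross-project pollution.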


13. Common Anti-Patterns

| Anti-Pattern | Symptom | Fix |
| --- | --- | --- |
| CLAUDE.md as a wiki | Pollutes context on every load; key instructions get diluted | Keep only the contract; move reference material into skills and rules |
| Skill grab-bag | Description can’t trigger reliably; workflows conflict | One skill does one thing, with explicit side-effect control |
| Too many tools, vague descriptions | Wrong tool gets selected; schemas overwhelm the context | Merge overlapping tools; use clear namespacing |
| No verification loop | Claude can only “feel like it’s done” | Bind a verifier to every task type |
| Over-autonomy | Unbounded multi-agent parallelism; hard to stop when things go wrong | Minimize roles, permissions, and worktrees; set an explicit maxTurns |
| No context segmentation | Research, implementation, and review all pile onto the main thread; effective context gets diluted | /clear on task switches, /compact on phase switches, push heavy exploration to a subagent (Explore → Main pattern) |
| Wide autonomy, thin governance | Multi-agent and external tools are wide open, but permission and recovery boundaries are missing | Combine permissions + sandbox + hooks + subagent boundaries |
| Approved commands pile up uncleaned | rm -rf and other dangerous operations linger in settings.json; once triggered, irreversible | Regularly review the allowedTools list in .claude/settings.json |
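That last review is easy to automate as a periodic check. A hedged sketch: it just greps an approved-commands file for patterns you consider dangerous, and both the pattern list and the file path are assumptions to adjust for your setup:

```shell
#!/bin/sh
# Flag risky entries in an approved-commands list before they bite.
# Pattern list and file path are illustrative; tune both to your setup.
audit_settings() {
  grep -nE 'rm -rf|sudo |curl[^|]*\| *sh' "$1" && return 1
  echo "no risky entries found"
}

# Example (hypothetical path):
# audit_settings .claude/settings.json
```

A nonzero exit on a match makes this easy to wire into CI or a pre-session check.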

14. Configuration Health Check

Based on the six-layer framework from this post, I’ve packaged these checks into an open-source skill, tw93/claude-health, that checks your Claude Code configuration in one shot.

npx skills add tw93/claude-health

After installing, run /health in any session. It automatically gauges project complexity, checks CLAUDE.md, rules, skills, hooks, allowedTools, and actual behavior patterns, and outputs a prioritized report: fix immediately, structural issues, and improve gradually.

If you want to know how far your configuration is from these principles after reading this post, running /health is the fastest way.


15. Conclusion

Using Claude Code typically goes through three stages:

| Stage | Focus | Perceived Efficiency |
| --- | --- | --- |
| Tool user | “How do I use this feature?” | Helpful but limited |
| Process optimizer | “How do I make collaboration smoother?” (start writing CLAUDE.md and Skills) | Significant improvement |
| System designer | “How do I make the agent operate autonomously under constraints?” | Qualitative change |

By the third stage, the question has quietly shifted from “how do I use this feature” to “how do I make the agent run autonomously under constraints,” and those are very different questions.

One thing worth thinking about: if you can’t clearly articulate what “done” looks like, the task probably isn’t suitable to hand to Claude for autonomous completion. Without acceptance criteria there is no correct answer, no matter how capable the model.

These are my takeaways from six months of tinkering—there’s definitely more to dig into. If you’ve found something that works even better, let me know.
