Claude Code Deep Dive: Architecture, Governance, and Engineering Practices

Categories: Share

Preface

I wrote this handbook for engineers who already use Claude Code and want workflows that are more predictable, more controllable, and easier to verify.

Unless noted otherwise, the guidance here is based on my own usage, experiments, and public Anthropic materials available as of March 2026. Treat it as engineering guidance, not as an official specification of Claude Code internals.

TL;DR

Six months of heavy Claude Code use, two accounts, roughly $40 per month.

At first I treated it like a chatbot. Things went sideways pretty fast: context kept getting messy, tools multiplied while outcomes got worse, rule sets grew longer while compliance dropped. Spent some time working through those failures and studying how Claude Code operates, and finally understood where things were going wrong.

The most direct way to think about it: six layers.

Layer Responsibility
CLAUDE.md / rules / memory Long-term context,tells Claude “what this is”
Tools / MCP Action capabilities,tells Claude “what I can do”
Skills On-demand methodologies,tells Claude “how to do it”
Hooks Enforced behaviors that don’t rely on Claude’s judgment
Subagents Context-isolated workers for controlled autonomy
Verifiers Validation loops that make output testable, rollbackable, auditable

Over-index on one layer and the system becomes unstable. CLAUDE.md grows too long and pollutes its own context. Too many tools make selection noisy. Too many subagents make state drift harder to control. Skip verification and you lose the ability to tell where things broke.


Execution Model

Agent Loop

Claude Code runs an iterative agent loop at its core:

Gather context → Take action → Verify result → [Done or loop back]
     ↑                    ↓
  CLAUDE.md          Hooks / Permissions / Sandbox
  Skills             Tools / MCP
  Memory

After using it for a while, you realize that failures usually do not come from a lack of raw model capability. Wrong information is more dangerous than missing information. In many cases, the real problem is that something was produced, but you have no reliable way to validate it or roll it back.

Five Diagnostic Surfaces

Surface Core Question Primary Carriers
Context surface What’s always loaded vs. loaded on demand CLAUDE.md, rules, memory, skills
Action surface What actions can Claude currently take Built-in tools, MCP, plugins
Control surface What actions must be constrained, blocked, or audited Permissions, sandbox, hooks
Isolation surface What tasks need context and permission isolation Subagents, worktrees, forked sessions
Verification surface How to know a task is done and the result is trustworthy Tests, lint, screenshots, logs, CI

When something goes wrong, these surfaces are the right place to start. If results are unstable, inspect context loading order. If automation runs too far, inspect the control layer. If quality drops in long sessions, assume intermediate artifacts have polluted the context and start a fresh session instead of tuning prompts.


Concept Boundaries: MCP / Plugin / Tools / Skills / Hooks / Subagents

Concept Runtime Role Solves What Common Misuse
CLAUDE.md Project-level persistent contract Commands, boundaries, prohibitions that must hold every session Writing it as a team knowledge base
.claude/rules/* Path or language-specific rules Directory, file type, or language-level local conventions Dumping all rules into root CLAUDE.md
Built-in Tools Built-in capabilities Read files, modify files, run commands, search Putting all integrations into shell
MCP External capability protocol Let Claude access GitHub, Sentry, databases Connecting too many servers, tool definitions overwhelming context
Plugin Packaging and distribution layer Distribute skills/hooks/MCP together Treating plugin as a runtime primitive
Skill On-demand loaded knowledge/workflow Give Claude a method package Skill trying to be both encyclopedia and deployment script
Hook Rule-enforcing interception layer Execute rules around lifecycle events Using hooks to replace all model judgment
Subagent Context-isolated work unit Parallel research, limit tools and permissions Unbounded fan-out, governance loss

Simple rule: use Tools and MCP to give Claude new actions, use Skills to provide reusable methods, use Subagents to isolate execution, use Hooks to enforce constraints and collect audit signals, and use Plugins for distribution. Blur these boundaries and the system becomes difficult to reason about.


Context Engineering: The Most Important System Constraint

Many people treat context as a “capacity problem,” but the real issue is usually noise,useful information drowned in irrelevant content.

The Real Context Cost Structure

Context Loading

Claude Code’s large context window is not fully available for task-specific work:

200K total context
├── Fixed overhead (~15-20K)
│   ├── System instructions: ~2K
│   ├── All enabled Skill descriptors: ~1-5K
│   ├── MCP Server tool definitions: ~10-20K  ← Largest hidden overhead
│   └── LSP state: ~2-5K
│
├── Semi-fixed (~5-10K)
│   ├── CLAUDE.md: ~2-5K
│   └── Memory: ~1-2K
│
└── Dynamically available (~160-180K)
    ├── Conversation history
    ├── File contents
    └── Tool call results

A typical MCP server such as GitHub can expose 20-30 tool definitions at roughly 200 tokens each, which is 4,000-6,000 tokens. Connect five servers and you may be spending roughly 25,000 tokens (12.5%) on fixed overhead alone. In large codebases, that overhead is material.

Always resident    → CLAUDE.md: project contract / build commands / prohibitions
Path-loaded        → rules: language / directory / file-type specific
On-demand loaded   → skills: workflows / domain knowledge
Isolated loaded    → subagents: heavy exploration / parallel research
Never in context   → hooks: deterministic scripts / audit / blocking

Context Best Practices

  • Keep CLAUDE.md short, strict, and operational. Prioritize commands, constraints, and architectural boundaries. In Anthropic’s public examples, these files stay relatively compact; that is the right direction even if there is no universal token target.
  • Split large reference documents into supporting files for the skill; do not put them directly in SKILL.md
  • Use .claude/rules/ for path- and language-specific rules; do not make root CLAUDE.md handle every variation
  • Use /context proactively to inspect consumption; do not wait for automatic compression

/context command output showing token breakdown by source

  • For task switches, prefer /clear; for a new phase of the same task, use /compact
  • Write Compact Instructions in CLAUDE.md. You should decide what survives compression, not the algorithm.

Tool Output Noise: The Other Hidden Context Killer

The analysis above covers fixed overhead from MCP tool definitions. But dynamic context has its own trap: tool output. A single cargo test run can produce thousands of lines, git log fills up fast on any non-trivial repo, and find or grep can easily flood the window with irrelevant matches. Claude does not need all of it, but once it’s in context it consumes real tokens and crowds out conversation history and file contents.

I came across RTK (Rust Token Killer) and the idea clicked immediately: filter command output before it reaches Claude, keeping only what’s relevant to the next decision. For cargo test, instead of thousands of lines:

# Raw output Claude would see
running 262 tests
test auth::test_login ... ok
...(thousands of lines)

# After RTK
✓ cargo test: 262 passed (1 suite, 0.08s)

What Claude needs is “did it pass, and if not, where did it fail.” Everything else is noise. RTK rewrites commands transparently via a hook, so Claude Code never notices it. The difference from manually appending | head -30 to every command is that RTK covers all output automatically. The project is open source on GitHub.

The Compression Trap

Session Continuity

The default compression algorithm optimizes for “re-readability.” Early tool outputs and file contents are often deleted first, which can also remove architectural decisions and constraint rationale. Two hours later, when you need to change something, the earlier decisions may already be gone. A surprising number of bugs start there.

The fix is to specify in CLAUDE.md:

## Compact Instructions

When compressing, preserve in priority order:

1. Architecture decisions (NEVER summarize)
2. Modified files and their key changes
3. Current verification status (pass/fail)
4. Open TODOs and rollback notes
5. Tool outputs (can delete, keep pass/fail only)

Compact Instructions help, but a HANDOFF.md is even more reliable. Before ending a session, have Claude record the current state: what it tried, what worked, what failed, and what should happen next. Then the next Claude instance can continue from that file instead of depending on compression quality:

Write the current progress in HANDOFF.md. Explain what you tried, what worked, what didn’t, so the next agent with fresh context can continue from just this file.

Review it quickly for gaps, then start the next session from the HANDOFF.md path.

Plan Mode’s Engineering Value

Plan Mode separates exploration from execution. Exploration stays read-only, and files are touched only after the plan is confirmed:

  • The exploration phase is read-only
  • Claude can clarify goals and boundaries before proposing a concrete plan
  • Execution begins only after the plan is confirmed

For complex refactors, migrations, and cross-module changes, this separation is usually better than editing immediately. Running with a bad assumption is much less likely when the plan is confirmed first. Double-tap Shift+Tab to enter Plan Mode. A useful pattern is to let one agent draft the plan and another review it before execution.


Skills Design: Not a Template Library, But Workflows Loaded On Demand

Skills are best understood as on-demand workflow packages. Their descriptors remain in context, while the full body is loaded only when needed. That makes them operationally different from saved prompts.

What Makes a Good Skill

  • A good skill description tells the model when to use the skill, not just what it contains
  • It defines complete steps, inputs, outputs, and stop conditions, not just an opening instruction
  • The body should contain navigation and core constraints only; large reference material should live in supporting files
  • Skills with side effects should explicitly disable model invocation; otherwise the model decides whether to run them

How Skills Achieve “On-Demand Loading”

A useful pattern here is progressive disclosure. Do not show the model everything at once. Give it an index and navigation first, then load details on demand:

  • SKILL.md defines task semantics, boundaries, and execution skeleton
  • Supporting files provide domain details
  • Scripts deterministically gather context or evidence

A stable structure looks like:

.claude/skills/
└── incident-triage/
    ├── SKILL.md
    ├── runbook.md
    ├── examples.md
    └── scripts/
        └── collect-context.sh

Three Typical Skill Types

These examples come from my own use in the open-source terminal project Kaku.

Type 1: Checklist (Quality Gate)

Run this before a release so nothing critical is missed:

---
name: release-check
description: Use before cutting a release to verify build, version, and smoke test.
---

## Pre-flight (All must pass)
- [ ] `cargo build --release` passes
- [ ] `cargo clippy -- -D warnings` clean
- [ ] Version bumped in Cargo.toml
- [ ] CHANGELOG updated
- [ ] `kaku doctor` passes on clean env

## Output
Pass / Fail per item. Any Fail must be fixed before release.

Type 2: Workflow (Standardized Operations)

Config migration is high risk, so use explicit invocation and a built-in rollback path:

---
name: config-migration
description: Migrate config schema. Run only when explicitly requested.
disable-model-invocation: true
---

## Steps
1. Backup: `cp ~/.config/kaku/config.toml ~/.config/kaku/config.toml.bak`
2. Dry run: `kaku config migrate --dry-run`
3. Apply: remove `--dry-run` after confirming output
4. Verify: `kaku doctor` all pass

## Rollback
`cp ~/.config/kaku/config.toml.bak ~/.config/kaku/config.toml`

Type 3: Domain Expert (Encapsulated Decision Framework)

When runtime issues occur, have Claude collect evidence along a fixed path instead of guessing:

---
name: runtime-diagnosis
description: Use when kaku crashes, hangs, or behaves unexpectedly at runtime.
---

## Evidence Collection
1. Run `kaku doctor` and capture full output
2. Last 50 lines of `~/.local/share/kaku/logs/`
3. Plugin state: `kaku --list-plugins`

## Decision Matrix
| Symptom | First Check |
|---|---|
| Crash on startup | doctor output → Lua syntax error |
| Rendering glitch | GPU backend / terminal capability |
| Config not applied | Config path + schema version |

## Output Format
Root cause / Blast radius / Fix steps / Verification command

Keep Descriptors Short,Every Skill Has Context Cost

Each enabled skill keeps its descriptor in context. The gap between optimized and unoptimized is huge:

# Inefficient (~45 tokens)
description: |
  This skill helps you review code changes in Rust projects.
  It checks for common issues like unsafe code, error handling...
  Use this when you want to ensure code quality before merging.

# Efficient (~9 tokens)
description: Use for PR reviews with focus on correctness.

Important invocation strategy:

  • High frequency (>1 per session) → Keep auto-invoke and optimize the descriptor
  • Low frequency (<1 per session) → Disable auto-invoke, trigger manually, keep the descriptor out of context
  • Very low frequency (<1 per month) → Remove the skill and document it in AGENTS.md instead

Skills Anti-Patterns

  • Description too short: description: help with backend (it triggers for almost any backend task)
  • Body too long: hundreds of lines of manual content stuffed into SKILL.md
  • One skill covering review, deploy, debug, docs, incident,five different things
  • Side-effect skills allowing model auto-invocation

Tool Design: Helping Claude Pick the Right Tool

The more I used Claude Code, the clearer this became: tools for Claude and APIs for humans should be designed differently. Human-facing APIs often optimize for feature completeness. Agent-facing tools should optimize for correct selection and correct use.

Good Tools vs. Bad Tools

Dimension Good Tools Bad Tools
Name jira_issue_get, sentry_errors_search query, fetch, do_action
Parameters issue_key, project_id, response_format id, name, target
Return Information directly relevant to next decision UUIDs, internal fields, raw noise
Scope Single purpose, clear boundaries Multiple actions mixed, opaque side effects
Cost Default output controlled, truncatable Default returns oversized context
Errors Include correction suggestions Only return opaque error codes

Practical design principles:

  • Prefix names by system or resource layer: github_pr_*, jira_issue_*
  • Support response_format: concise / detailed for large responses
  • Make error responses corrective; do not return opaque codes only
  • When high-level task tools can be merged, do not expose too many low-level fragments. Avoid patterns like list_all_* that force the model to filter the results itself.

Lessons from Claude Code’s Internal Tool Evolution

Finding the sweet spot

One useful lesson from Anthropic’s public discussion of Claude Code tooling is how they handled cases where the agent needed to stop mid-task and ask the user a question. They tried three approaches:

  • Version 1: Add a question parameter to existing tools such as Bash, letting Claude ask while calling the tool. Result: Claude often ignored the parameter and continued without pausing.
  • Version 2: Require Claude to emit a specific markdown format that an outer layer parses to pause. Problem: there is no hard enforcement, so the questioning path remains fragile.
  • Version 3: Make it a standalone AskUserQuestion tool. Claude must explicitly call it to ask, and the tool call itself becomes the pause signal. Much more reliable than either previous approach.

This image shows why Version 3 is more stable:

Improving Elicitation & the AskUserQuestion tool

The left approach is too loose because parsing depends on free-form output. The right approach is too rigid because the model can ask only after exiting plan mode. AskUserQuestion as a dedicated tool sits in the middle: structured, explicit, and callable at any point.

The practical takeaway is straightforward: if you want Claude to stop and ask, give it a dedicated tool. Flags and output-format conventions are much easier for the model to skip.

Todo Tool Evolution:

Updating with Capabilities - Tasks & Todos

Early versions used a TodoWrite tool plus periodic reminders to keep Claude on task. As models improved, that tool became more of a constraint than a benefit. The broader lesson is worth keeping: controls that were useful for weaker models may become unnecessary friction later, so they should be reviewed periodically.

Search Tool Evolution: Early approaches relied on a RAG-style vector database. It was fast, but it required indexing and proved fragile across environments. More importantly, tool adoption by the model was poor. Switching to a Grep-style tool that let Claude search directly worked better in practice. A useful side effect is that Claude can read a skill file, follow references to other files, and load information recursively only when needed. That is a strong example of progressive disclosure.

When NOT to Add Another Tool

  • Tasks the local shell can already perform reliably
  • Cases where the model needs static knowledge, not external interaction
  • Requirements that fit skill workflow constraints better than tool actions
  • Cases where you have not yet validated that the description, schema, and return format are stable for model use

Hooks: Running Your Own Code Before/After Claude Acts

Hooks are easy to think of as “automatically running scripts,” but in practice they are better understood as a way to move work out of on-the-fly model judgment and into deterministic processes.

Examples include whether formatting should run, whether protected files can be modified, and whether to notify after task completion. These are not the kinds of controls you should rely on the model to remember every time.

Current Hook Points

Hooks configuration interface

Suitable vs. Unsuitable for Hooks

Suitable: Blocking modifications to protected files, auto formatting/lint/light validation after Edit, injecting dynamic context after SessionStart (Git branch, env vars), pushing notifications after task completion.

Unsuitable: Complex semantic judgments requiring lots of context, long-running business processes, decisions requiring multi-step reasoning and tradeoffs,these belong in skills or subagents.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "pattern": "*.rs",
        "hooks": [
          {
            "type": "command",
            "command": "cargo check 2>&1 | head -30",
            "statusMessage": "Running cargo check..."
          }
        ]
      }
    ],
    "Notification": [
      {
        "type": "command",
        "command": "osascript -e 'display notification \"Task completed\" with title \"Claude Code\"'"
      }
    ]
  }
}

Hooks: Earlier Error Detection Saves Time

Hooks intervention points in execution flow

In a 100-edit session, saving 30-60 seconds each time adds up to 1-2 hours. That is material. Limit output length (| head -30) to avoid hook output polluting context. For a more systematic approach to output noise across all commands, see the RTK note in section 3.

Hooks + Skills + CLAUDE.md Three-Layer Stack

  • CLAUDE.md: Declare “must pass tests and lint before commit”
  • Skill: Tell Claude in what order to run tests, how to read failures, how to fix
  • Hook: Hard validation on critical paths, block if necessary

In practice, any single layer has gaps. CLAUDE.md rules alone get ignored, hooks alone can’t handle judgment calls,all three together is what works.


Subagents: Sending an Independent Claude to Do One Specific Thing

A subagent is an independent Claude instance that branches off from the main conversation, with its own context window and only the tools you allow. The main value is isolation, not raw parallelism. Codebase scans, test runs, and review passes that generate large amounts of output can go to a subagent, while the main thread receives only the summary instead of carrying all intermediate output in its context window.

Claude Code has built-in Explore (read-only scan, runs Haiku to save cost), Plan (planning research), General-purpose (general), and custom options.

Explicit Constraints in Configuration

  • tools / disallowedTools: Limit what tools can be used; do not grant the same broad permissions as the main thread
  • model: Exploration tasks use Haiku/Sonnet, important reviews use Opus
  • maxTurns: Prevent runaway
  • isolation: worktree: Isolate filesystem when files need modification

Another practical detail: long-running bash commands can be moved to the background with Ctrl+B. Claude can check the results later with the BashOutput tool without blocking the main thread. The same pattern applies to subagents.

Common Anti-Patterns

  • A subagent with the same broad permissions as the main thread; isolation becomes largely meaningless
  • Output format not fixed, unusable by main thread
  • Strong dependencies between subtasks, frequently sharing intermediate state,Subagent not suitable for this

Prompt Caching: A First-Class Design Constraint

Tutorials rarely cover this topic, but it has a major effect on Claude Code’s cost structure and design choices. Claude Code’s entire architecture is built around prompt caching. A high cache hit rate cuts cost, improves latency, and loosens rate limits. Anthropic monitors hit rates internally and treats low cache performance as an incident-level signal.

Prompt Layout Designed for Caching

Lay Out Your Prompt for Caching

Prompt caching works by prefix matching,content from request start to each cache_control breakpoint gets cached. So order matters:

Claude Code prompt order:
1. System Prompt → Static, locked
2. Tool Definitions → Static, locked
3. Chat History → Dynamic, comes after
4. Current user input → Last

Common caching pitfalls:

  • Putting timestamped content in static system prompt (makes it change every time)
  • Non-deterministically shuffling tool definition order
  • Adding/removing tools mid-session

What about dynamic information such as the current time? Do not put it in the system prompt. Put it in a later message instead. That preserves cache stability.

Don’t Switch Models Mid-Session

Prompt cache is model-specific. If you’ve chatted 100K tokens with Opus and want to ask a simple question, switching to Haiku is more expensive than continuing with Opus, because you have to rebuild the entire cache for Haiku.

If you really need to switch, hand off via subagent: Opus prepares a “handoff message” for another model explaining the task to complete.

Compaction’s Actual Implementation

Forking Context , Compaction

Above is the compaction flow. On the left, the context window is close to full. In the middle, Claude Code forks a summarization call over the existing conversation, which can benefit from caching. On the right, many turns have been compressed into a shorter summary while the system prompt, tool definitions, and previously referenced files remain available, freeing space for the session to continue.

At first glance, Plan Mode might seem like it should switch to a different read-only toolset, but that would hurt cache reuse. Anthropic has publicly described a tool-driven approach instead: the model can enter plan mode without changing the underlying tool prefix, which preserves cache stability.

defer_loading: Lazy Loading for Tools

Claude Code can have many MCP tools. Including every full definition in every request would be expensive, but adding and removing tools mid-session also hurts cache reuse. One approach Anthropic has described is to keep lightweight stubs in the stable prefix and load fuller schemas only when the model selects them.


Verification Loop: No Verifier, No Engineering Agent

“Claude says it’s done” has little engineering value. What matters is knowing whether it’s correct, whether you can roll back if something’s wrong, and whether the process is auditable.

Verifier Levels

  • Lowest: command exit codes, lint, typecheck, unit test
  • Middle: integration tests, screenshot comparison, contract test, smoke test
  • Higher: production log verification, monitoring metrics, manual review checklists

Explicitly Define Verification in Prompt, Skill, and CLAUDE.md

## Verification

For backend changes:

- Run `make test` and `make lint`
- For API changes, update contract tests under `tests/contracts/`

For UI changes:

- Capture before/after screenshots if visual

Definition of done:

- All tests pass
- Lint passes
- No TODO left behind unless explicitly tracked

When writing task prompts or skills, define acceptance criteria upfront. Which commands must pass, what to check first if they fail, what screenshots and logs should show to pass,the earlier these are clear, the less trouble later.

My simple test: if you can’t clearly explain “how Claude knows it’s done correctly,” it’s probably not suitable to throw at Claude for autonomous completion.


Commands Worth Using Frequently

These commands serve one main purpose: manage context actively instead of waiting for the system to manage it for you.

The exact command set can change across releases, but the operational pattern is stable: inspect context usage, control loaded capabilities, and keep long sessions intentional.

Context Management

/context   # Inspect token consumption, including MCP and file-read ratios
/clear     # Reset the session; useful when the same issue has already been corrected twice
/compact   # Compress while retaining key points; works best with Compact Instructions
/memory    # Confirm which CLAUDE.md got loaded

Capabilities and Governance

/mcp connection status showing server tool counts and token consumption

/mcp           # Manage MCP connections, check token costs, disconnect idle servers
/hooks         # Manage hooks; this is a key control-plane entry point
/permissions   # View or update permission whitelist
/sandbox       # Configure sandbox isolation, essential for high-automation scenarios
/model         # Switch models: Opus for deep reasoning, Sonnet for routine, Haiku for quick exploration

Session Continuity and Parallelism

claude --continue               # Resume the latest session in the current directory; useful for picking work back up the next day
claude --resume                 # Open selector to resume historical session
claude --continue --fork        # Fork from an existing session to try a different approach from the same starting point
claude --worktree               # Create isolated git worktree
claude -p "prompt"              # Non-interactive mode for CI, pre-commit, or other scripts
claude -p --output-format json  # Structured output that scripts can consume directly

Several Less Common but Useful Commands

/simplify: Runs a quick pass over recently modified code with a focus on reuse, quality, and efficiency. Useful immediately after changing logic instead of waiting for later manual review.

/rewind: Not an “undo,” but a return to an earlier session checkpoint followed by a new summary. Useful when Claude went too far down the wrong path and you want to keep the early consensus but discard later failures.

/btw: Ask a quick side question without interrupting the main task. It works well for light tangents such as comparing two commands, but not for questions that require repository reads or tool calls.

claude -p --output-format stream-json: Emits a real-time JSON event stream. Useful for long-running task monitoring, incremental processing, or integration into your own tooling.

/insight: Ask Claude to analyze the current session and extract what should be codified in CLAUDE.md. Run it after some progress has accumulated; it can surface patterns such as “this convention came up repeatedly but was never written into the contract.”

Double-tap ESC to backtrack: Brings the previous input back for editing instead of retyping it. If Claude starts down the wrong path, or your last message was underspecified, this is often faster than restarting the session.

Conversation history is all local: Session records live under ~/.claude/projects/. Folder names are derived from the project path, and each session is stored as a .jsonl file. To find prior work on a topic, run grep -rl "keyword" ~/.claude/projects/ or ask Claude to search previous discussions about that topic.


How to Write a Good CLAUDE.md

I think of CLAUDE.md as a collaboration contract between you and Claude. It is not team documentation and not a knowledge base. Put only the information that must hold across every session.

My advice is simple: start with nothing. Use Claude Code first, then add entries only when you notice yourself repeating the same instruction. To add an entry, type # to append the current conversation to CLAUDE.md, or tell Claude “add this to the project’s CLAUDE.md” and it will usually update the right file.

Keep it simple

What to Include

  • Build, test, lint, run commands (most important)
  • Key directory structure and module boundaries
  • Explicit code style and naming constraints
  • Non-obvious environment dependencies and pitfalls
  • Prohibitions and high-risk operations (NEVER list)
  • Information that must survive compression (Compact Instructions)

What NOT to Include

  • Long background introductions
  • Complete API documentation
  • Vague principles like “write high-quality code”
  • Obvious information Claude can infer from reading the repo
  • Large background materials and low-frequency task knowledge (put these in skills)

High-Quality Template

# Project Contract

## Build And Test

- Install: `pnpm install`
- Dev: `pnpm dev`
- Test: `pnpm test`
- Typecheck: `pnpm typecheck`
- Lint: `pnpm lint`

## Architecture Boundaries

- HTTP handlers live in `src/http/handlers/`
- Domain logic lives in `src/domain/`
- Do not put persistence logic in handlers
- Shared types live in `src/contracts/`

## Coding Conventions

- Prefer pure functions in domain layer
- Do not introduce new global state without explicit justification
- Reuse existing error types from `src/errors/`

## Safety Rails

## NEVER

- Modify `.env`, lockfiles, or CI secrets without explicit approval
- Remove feature flags without searching all call sites
- Commit without running tests

## ALWAYS

- Show diff before committing
- Update CHANGELOG for user-facing changes

## Verification

- Backend changes: `make test` + `make lint`
- API changes: update contract tests under `tests/contracts/`
- UI changes: capture before/after screenshots

## Compact Instructions

Preserve:

1. Architecture decisions (NEVER summarize)
2. Modified files and key changes
3. Current verification status (pass/fail commands)
4. Open risks, TODOs, rollback notes

Let Claude Maintain Its Own CLAUDE.md

One useful habit is to update CLAUDE.md immediately after correcting Claude’s mistake:

“Update your CLAUDE.md so you don’t make that mistake again.”

Claude is often reasonably good at writing these rules for itself, and repeated mistakes do usually become less frequent over time. Still, review the file periodically. Entries go stale, and constraints that once helped may stop paying for themselves. More on keeping it healthy in section 14.


Field Notes from a Real Project

Over the Spring Festival break, I built an open-source terminal called Kaku with Claude Code. It uses Rust and Lua at the core, plus a custom configuration system. That combination surfaced many of the collaboration problems that agentic coding workflows tend to expose. Here are the practices that helped most.

“Environment Transparency” Matters More Than You Think

Claude Code calls real shells, git, package managers, and local configurations. As long as one layer is opaque, it has to start guessing, and once it starts guessing environment, reliability drops fast.

I quickly added a doctor command to the terminal that collects environment state, dependencies, and configuration status into a structured health report. Running doctor before Claude Code starts work eliminates many cases where the agent begins from an incorrect understanding of the environment.

I also found that when the CLI exposes clearly semantic subcommands such as init, config, and reset, Claude Code uses them more reliably than when it has to infer where configuration files live. Converge state first, then expose edit entry points. Reversing that order creates unnecessary chaos.

Hooks Practice for Mixed-Language Projects

Two languages, two checks,Hooks are well-suited to trigger separately by file type:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit",
        "pattern": "*.rs",
        "hooks": [{
          "type": "command",
          "command": "cargo check 2>&1 | head -30",
          "statusMessage": "Checking Rust..."
        }]
      },
      {
        "matcher": "Edit",
        "pattern": "*.lua",
        "hooks": [{
          "type": "command",
          "command": "luajit -b $FILE /dev/null 2>&1 | head -10",
          "statusMessage": "Checking Lua syntax..."
        }]
      }
    ]
  }
}

Knowing immediately that a file no longer compiles is much better than running a long sequence of steps only to discover it was broken from the start.

Complete Engineering Layout Reference

If you want a complete Claude Code setup for your project, here is a reference structure. Treat it as a reference, not a template:

Project/
├── CLAUDE.md
├── .claude/
│   ├── rules/
│   │   ├── core.md
│   │   ├── config.md
│   │   └── release.md
│   ├── skills/
│   │   ├── runtime-diagnosis/     # Uniformly collect logs, state, dependencies
│   │   ├── config-migration/      # Config migration rollback protection
│   │   ├── release-check/         # Pre-release validation, smoke test
│   │   └── incident-triage/       # Production incident triage
│   ├── agents/
│   │   ├── reviewer.md
│   │   └── explorer.md
│   └── settings.json
└── docs/
    └── ai/
        ├── architecture.md
        └── release-runbook.md

With global constraints (CLAUDE.md), path constraints (rules), workflows (skills), and deeper architectural detail separated cleanly, Claude Code behaves much more predictably. If you maintain multiple projects, keep stable personal defaults in ~/.claude/ and project-specific differences in each project’s .claude/ directory to reduce cross-project pollution.


Common Anti-Patterns

Anti-Pattern Symptom Fix
CLAUDE.md as wiki Pollutes context every load, key instructions diluted Keep only contract, move materials to skills and rules
Skill grab-bag Description can’t stably trigger, workflow conflicts One skill does one thing, explicit side-effect control
Too many tools, vague descriptions Wrong tool selection, schema overwhelms context Merge overlapping tools, clear namespacing
No verification loop Claude has no way to tell if it’s actually done Bind verifier to every task type
Over-autonomy Unbounded multi-agent parallelism that is hard to stop once it goes wrong Minimize roles, permissions, and worktrees; set explicit maxTurns
No context segmentation Research, implementation, and review all pile onto the main thread, diluting effective context Use /clear for task switches, /compact for phase switches, and offload heavy exploration to a subagent (Explore → Main pattern)
Wide autonomy but insufficient governance Multi-agent execution and external tools are open, but permission boundaries and recovery boundaries are weak Combine permissions, sandbox, hooks, and subagents into a single control boundary
Approved commands pile up uncleaned Dangerous operations such as rm -rf remain in settings.json; once triggered, they are irreversible Regularly review the allowedTools list in .claude/settings.json

Configuration Health Check

Based on the six-layer framework in this handbook, I packaged these checks into an open-source skill project, tw93/claude-health, which can evaluate your Claude Code configuration in one pass.

npx skills add tw93/claude-health

After installing, run /health in any session. It evaluates project complexity, checks CLAUDE.md, rules, skills, hooks, allowedTools, and recurring behavior patterns, then outputs a prioritized report: fix now, structural issues, and gradual improvements.

If you want to know how far your configuration is from these principles after reading this handbook, running /health is the fastest way.


Conclusion

Using Claude Code typically involves three stages:

Stage Focus Efficiency Perception
Tool User “How do I use this feature?” Helpful but limited
Process Optimizer “How do I make collaboration smoother?” Start using CLAUDE.md and Skills intentionally Significant improvement
System Designer “How do I make the agent operate autonomously under constraints?” Qualitative change

One practical test matters more than most: if you cannot clearly articulate what “done” looks like, the task is probably not ready for autonomous execution by Claude. Without acceptance criteria, there is no reliable notion of a correct answer, no matter how capable the model is.

These are my takeaways from six months of sustained use. There is still more to learn, but this framework has held up well in practice. If you have found a better operating pattern, I would be interested in comparing notes.

Read More

Installing OpenClaw Isn't the Same as Using It

【2026-03-07】Watching the frenzy around Tencent Tower installing OpenClaw today gave me a lot to think about. Many big tech companies are aggressively pushing non-technical frontline employees to install this AI tool, with some even offering 500 RMB door-to-door installation services. Everyone's desperately searching for use cases, demanding implementation, and trying to prove this thing is too important to miss. The whole process gives me a strong sense of cyber-tech folding.