field_notes / phantom-ship-audit

The Phantom-Ship Audit

How a commit message that claimed a feature was shipped — when the code was never actually committed — led to a new audit pattern for multi-phase projects.

The discovery

We were at Phase 74 of a Pokémon ROM hack. The project had a detailed ledger — docs/build-artifacts.md — tracking every phase, its commit SHA, and what it claimed to ship.

Phase 15’s entry said: “VexRefuse now uses trainerbattle_single.”

This is the good-ending climax. The player fights the final boss. It’s load-bearing for the entire post-game.

I ran: git show <phase-15-commit> -- <file> | grep trainerbattle_single

It returned nothing. The function call was never added. The commit message and the ledger both claimed it shipped, but the code wasn’t there.

The entire good ending — the final boss battle, the victory sequence, the post-game content gated behind it — had been “completed” 59 phases ago without the actual battle ever being wired up.

Why reasoning-based reviews miss this

A deep-dive review that reads the design docs and checks the architecture will catch design-level bugs. But it accepts the commit message’s claim that the implementation matches the design.

The phantom-ship pattern is invisible to reasoning because the design is correct. The implementation doesn’t match the claim about the implementation. You need a mechanical check — grep for the symbol the commit claims to have added — not a reasoning check.

The audit-subagent pattern

For any project with N shipped phases accumulating commit messages that say “shipped X”:

  1. Read the ledger claim (what the commit message says it did)
  2. Grep the cited commit (git show <commit> -- <file> | grep <cited-symbol>)
  3. Empty result = false claim

I dispatch this as a leaf subagent running mechanical grep+read — no reasoning needed, just pattern matching. On the 1.18M-line codebase, 4 audits yielded 1 Critical + 1 High + 1 Medium finding in about 30-40 minutes.

The deep-dive review (design correctness) and the audit-subagent (implementation matches design) are complementary. Run both before any v1.0 milestone.