Why Claude Code Needs More Than Just Your Words: A Case Study in Story Refinement
By Mike Holloway
The 3-Sentence Story Showcasing Lazy Prompting
Here's a real story description I wrote for one of my own projects:
"We need to create a prototype web page that can be launched to test the consumer chat functionality outside of the product_intel.io site. So this page is basically a representation of a client implementation of our consumer chat. And we need a testing page to see the behaviors and everything else. It doesn't need to be very fancy or anything else, but it should include a basic chat input box and chat window and chat responses so that we can test the true behavior and output of the consumer chat."
Three sentences. Somewhat clear intent. A human developer would read this, ask a few questions, make some assumptions, and start building. Granted, this is not the quality of story you would put into Jira for your development team, but recent models have become so good at appearing to interpret your intent that it's easy to slip into giving your coding agent this kind of prompt. Especially when your coding sessions carry a lot of memories and "context" about what you've been building, it really can enable laziness, because everything seems to be working...
An AI agent would do the same, except it wouldn't ask the questions. It would make the assumptions silently, start building immediately, and produce something that technically works but may not be what you needed.
Now here's what that same story looks like after 30 seconds of AI-powered spec enrichment:
- 6 files to create, 2 to modify, 3 to reuse (don't recreate)
- 6 acceptance criteria, each with data source, display behavior, and error state
- 8 negative constraints ("Do NOT build a full chat UI library")
- 5 error handling scenarios with exact user-facing messages
- 10 unresolved questions the AI identified but couldn't answer alone
The enriched spec is 400+ lines. The original was 3 sentences. The difference between the two is the difference between an AI agent that builds what you meant and one that builds what it assumed.
The Hidden Cost of Vague Stories
The problem with vague stories isn't that they produce bad code. The code works. It compiles, it renders, it does what was asked. The problem is that vague stories create an implicit permission to make assumptions, and every assumption is a coin flip that compounds over time.
I learned this the hard way building ProductIntel, an AI-native operations platform with 21 modules, 85+ database tables, and 55+ pages. Over six months of development, across hundreds of iterations and hundreds of conversations with AI, we accumulated technical debt that was invisible until it wasn't.
The Architecture That Outgrew Itself
When we started building, the implicit requirement was "build pages that show data." Simple enough. Every page used client-side fetch() calls to load data from API routes. It was the simplest pattern and it worked perfectly, when there were 5 pages.
By Release 6, we had 55+ pages making 63 client-side fetch calls. The authentication system used server-side cookies that raced with client-side fetches, causing silent 401 failures. Pages with AI features made 3-6 parallel LLM calls that took 10-20 seconds. Users would navigate between tabs and watch loading spinners while the same data was re-fetched from scratch.
The fix took three full sessions across four days: migrating 65 fetch calls across 4 tiers, converting 40 to server-side data loading, wrapping 15 with auth-aware utilities, converting 25 mutations to server actions, removing 7 dead API routes, and adding 9 error boundaries.
One line in an early story spec would have prevented all of it:
Negative Constraint: Do NOT use client-side fetch() for initial
page data loads. Use server components with direct service calls.
That's what a refined story catches. Not a bug, but a decision that should have been made explicitly instead of assumed implicitly.
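Stripped of framework specifics, the two patterns differ in where the data resolves: in the browser after render, or on the server before it. Here's a minimal sketch of that contrast; every name in it (`Page`, `listRows`, `/api/rows`) is invented for illustration, not ProductIntel's real API:

```typescript
type Page = { title: string; rows: string[] };

// Anti-pattern: the page renders first, then fetches from an API route in the
// browser. This is the pattern that raced with server-side auth cookies and
// produced silent 401s and loading spinners.
async function clientSideLoad(
  fetchJson: (url: string) => Promise<string[]>
): Promise<Page> {
  const rows = await fetchJson('/api/rows'); // runs in the browser, after render
  return { title: 'Rows', rows };
}

// Preferred: the server calls the service layer directly. No HTTP round trip,
// no cookie race; the data arrives with the initial render.
async function serverSideLoad(service: {
  listRows: () => Promise<string[]>;
}): Promise<Page> {
  const rows = await service.listRows(); // direct service call, full auth context
  return { title: 'Rows', rows };
}
```

In Next.js terms, the second pattern is a server component awaiting a service call; the point is that the choice between the two is an architectural decision, not an implementation detail the agent should make for you.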
The 170-Document System Prompt
We built a Knowledge Chat feature. The story was, in effect: "Add an AI chat to the knowledge base that can answer questions about our docs." Simple, right?
The implementation loaded all documents from the database and stuffed their summaries into the LLM's system prompt. With 9 product docs, this worked great. Then our Discovery Agent started running, generating findings stored as document-type artifacts. Each run created 5+ new documents. We ran Discovery dozens of times during testing.
One day, the Knowledge Chat started giving vague, generic answers. Response times increased. Token costs spiked. We didn't know why, until we built an Inference Inspector and looked at the actual prompt being sent to the model.
170+ documents. 51,000+ characters. Most of them duplicate discovery outputs that had nothing to do with the knowledge base.
The model was drowning in noise. The actual product documentation, the stuff the user was asking about, was buried under a mountain of AI-generated artifacts.
The fix was a one-line filter:
const docs = allDocs.filter(d => d.status === 'published' && !d.title.startsWith('Discovery:'))
But here's the thing. The original code wasn't wrong. getDocuments() returned all documents. That's what it was supposed to do. Nobody specified "only include published product documentation, not AI-generated artifacts." So the agent included everything. Technically correct. Practically disastrous.
A refined story would have specified:
Data Source: getDocuments() filtered to status='published' and
type NOT IN ('discovery-finding', 'discovery-output')
Negative Constraint: Do NOT load all documents into the system
prompt. Discovery artifacts are NOT knowledge base content.
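That spec translates almost line-for-line into code. A sketch of the filter it implies, where the `Doc` shape, the artifact type names, and the `Discovery:` title prefix are assumptions drawn from this article's examples rather than ProductIntel's real schema:

```typescript
type Doc = { title: string; status: string; type: string };

// Artifact types to exclude from the knowledge base prompt.
// These type names are illustrative, taken from the spec above.
const EXCLUDED_TYPES = new Set(['discovery-finding', 'discovery-output']);

// Keep only published product documentation; drop AI-generated
// discovery artifacts so they never reach the system prompt.
function knowledgeBaseDocs(allDocs: Doc[]): Doc[] {
  return allDocs.filter(
    (d) =>
      d.status === 'published' &&
      !EXCLUDED_TYPES.has(d.type) &&
      !d.title.startsWith('Discovery:')
  );
}
```

The filter itself is trivial. What the refined story adds is the decision that it should exist at all, made before the prompt ballooned to 170 documents instead of after.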
The Vector Search That Couldn't Find Its Own Documents
We embedded all our product documentation for semantic search. A user asks "What's the difference between Archive API v2 and v3?" and the system should find the v2 and v3 API specs and answer from them.
Instead, the top result was the Preview Manager documentation at 69% relevance. The actual Archive API v3 spec, the document literally about what the user asked, scored 25%.
The root cause: we embedded entire documents as single vectors. A 5,000-word API spec gets compressed into one array of 1,536 numbers, a "semantic fingerprint" of the entire document's average meaning. The v3 spec covers webhooks, batch operations, customer portals, and statistics. "Version differences" is a tiny fraction of what the document is about, so the embedding barely matches.
The original approach was: "Embed the documents so we can do semantic search."
Nobody questioned whether "one embedding per document" was the right granularity. Nobody specified chunking. Nobody defined what "good retrieval" looks like. So the agent did the simplest thing and it worked, until it didn't.
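Chunking is not exotic; even a naive fixed-window splitter changes retrieval behavior dramatically, because each chunk's embedding captures one topic instead of the document's average meaning. A sketch, where the word counts roughly approximate a 300-500 token range and both the window and overlap sizes are illustrative, not tuned values:

```typescript
// Split a long document into overlapping word windows before embedding.
// Overlap keeps sentences that straddle a boundary retrievable from
// either side of it.
function chunkDocument(
  text: string,
  chunkWords = 300,
  overlapWords = 50
): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += chunkWords - overlapWords) {
    chunks.push(words.slice(start, start + chunkWords).join(' '));
    if (start + chunkWords >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

With this in place, "version differences between v2 and v3" matches the specific chunk that discusses versioning instead of competing against webhooks, batch operations, and customer portals for a single vector.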
What Good Refinement Actually Looks Like
Story refinement isn't about writing a novel. It's about answering the questions that an AI agent would otherwise answer silently and wrong.
The enriched spec we generated has several sections that prevent the exact problems we lived through:
Negative Constraints: What NOT to Build
This is arguably the most valuable section:
- Do NOT build a full chat UI library, use basic HTML/Tailwind components only
- Do NOT create a separate database table, reuse pb_artifacts
- Do NOT hardcode model names, always use resolveModel()
- Do NOT implement message search, filtering, or export features
Without these, an agent will build more than you need. It will create a component library for a test page. It will design a schema for a throwaway feature. It will add search functionality nobody asked for. Each of these adds complexity, increases maintenance surface, and burns tokens.
Negative constraints are the guardrails that keep an agent focused on the 20% of the work that delivers 80% of the value.
Unresolved Questions: Where the AI Needs You
The enrichment produced 10 questions it couldn't answer:
- Should chat history persist across browser sessions?
- What is the maximum message length allowed?
- Should the test page require authentication?
- What is the expected response time for assistant messages?
This is the AI saying: "Here are the decisions I'd have to guess about if you don't tell me. Do you want me guessing, or do you want to decide?"
In our experience, the unresolved questions are where the real architectural decisions hide. "Should chat history persist?" is actually a question about storage strategy, session management, and data lifecycle. An agent that guesses "yes" might build a full persistence layer for a throwaway test page.
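For a throwaway test page, "yes, but only within the browser session" can be the entire storage strategy: a few lines against a Storage-shaped interface instead of a persistence layer. A sketch with invented names (the `ChatMessage` shape, the storage key, and the interface are all illustrative):

```typescript
type ChatMessage = { role: 'user' | 'assistant'; text: string };

// Minimal Storage-shaped interface so this works against
// window.sessionStorage in the browser or a plain stub in tests.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = 'test-chat-history'; // illustrative key

function saveHistory(store: KeyValueStore, history: ChatMessage[]): void {
  store.setItem(HISTORY_KEY, JSON.stringify(history));
}

function loadHistory(store: KeyValueStore): ChatMessage[] {
  const raw = store.getItem(HISTORY_KEY);
  return raw ? (JSON.parse(raw) as ChatMessage[]) : [];
}
```

Passing `window.sessionStorage` gives tab-scoped persistence for free. The point is the size of the gap: this is what "yes, within the session" costs, versus the database table, schema, and lifecycle logic an agent might build if it silently guesses "yes, forever."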
Files to Reuse: Don't Reinvent the Wheel
The enrichment identifies existing code that should be reused:
- context-engine.ts: Use for enriching chat context
- schema.ts: Reference pb_artifacts table definition
- models.ts: Use resolveModel() for LLM selection
Without this, agents commonly rebuild utilities that already exist in the codebase. We've seen it happen. A new feature creates its own database helper when a perfectly good one exists two directories over. The enrichment prevents this by explicitly naming what already exists.
The Honest Truth About AI Spec Enrichment
The enriched spec isn't perfect. It hallucinated some implementation details, like suggesting the wrong table for message storage because it didn't have full awareness of how the consumer chat system actually works. It made assumptions about architecture that a human with context would challenge.
But imperfect refinement is 10x better than no refinement.
The goal of spec enrichment isn't to produce a perfect blueprint. It's to:
- Surface decisions that need to be made before code is written
- Establish constraints that prevent scope creep and over-engineering
- Identify reusable code so the agent doesn't reinvent existing patterns
- Define error handling so failures are graceful, not silent
- Give the agent enough structure to build 80% correctly on the first attempt instead of 20%
The remaining 20%, the hallucinated details and wrong assumptions, is where human judgment matters. I think an AI PM's job isn't to accept the enrichment as-is. It's to review it, catch the mistakes, answer the unresolved questions, and then hand a genuinely solid spec to the agent.
The Pattern Behind the Pattern
Looking back at every architectural problem we encountered, from the client-side fetch migration to the system prompt bloat to the embedding granularity to the dozens of smaller issues along the way, they all share the same root cause:
A decision that should have been made explicitly was instead made implicitly by the agent.
| What I said | What the agent assumed | What should have been specified |
|---|---|---|
| "Add a page that shows data" | Client-side fetch | Server component with direct service calls |
| "Add AI chat to the knowledge base" | Load all docs into prompt | Filter by status, exclude AI-generated artifacts |
| "Embed the documents for search" | One embedding per document | Chunk at 300-500 tokens, embed each chunk |
| "Show discovery findings on the product page" | Show all findings | Scope to product, filter by confidence, limit count |
Every row in that table is a story that wasn't refined. Every row cost hours to fix after the fact.
The Practical Takeaway
If you're building with AI agents, whether through Claude Code, Cursor, Copilot, or any other tool, the highest-leverage investment you can make isn't a better model or a bigger context window. It's better input.
Spend 30 seconds running your story through spec enrichment before the first line of code is written. Review the negative constraints. Answer the unresolved questions. Verify the file paths and data sources.
The 30 seconds the LLM spends enriching saves hours of migration later. And the unresolved questions it surfaces? Those are your highest-risk assumptions, the ones that compound silently until they become the wall your architecture hits at scale.
Story refinement isn't about being pedantic. It's about being intentional. And in an AI-first development world, intentionality is the difference between building something that works and building something that lasts.
Mike Holloway is a VP of Product Management at a Fortune 500 FinTech with 25+ years of enterprise product and technology leadership. He writes about the intersection of product management, AI engineering, and enterprise software. See his AI skills portfolio at mikeholloway.dev/ai-skills.
This is an early preview. I'd rather get the ideas out while the conversation is happening than wait for perfectly polished prose. Further editing and proofing will come in time.
A note on how this was written: This article was developed through collaborative sessions with Claude, drawing on real learnings from building ProductIntel over the past six months. The experiences, the specific technical examples, and the lessons are mine. The organization and drafting was a human-AI partnership, which feels appropriate given the article is about how to work with AI more effectively.