The Pre-Product Discovery Problem
Why the first 5 minutes of an AI project can determine the next 6 months, and how a simple Q&A could prevent most architectural migrations
The Pattern
Every architectural problem I've encountered while building AI products traces back to the same root cause: a decision that should have been made explicitly in the first week was instead made implicitly by whatever felt simplest at the time.
Not wrong decisions. Implicit ones. Decisions that nobody realized they were making because nobody asked the question.
- "How should pages load data?" → Nobody asked. The AI tool used client-side fetches. Six months later: 65 fetch calls migrated across 4 tiers.
- "How should documents be embedded for search?" → Nobody asked. One embedding per document. Six months later: 25% retrieval relevance on documents the query was directly about.
- "What goes into the system prompt?" → Nobody asked. Everything. Six months later: 51,000-character prompts with 95% noise.
- "Which LLM features should be optional vs always-on?" → Nobody asked. Everything was always-on. Six months later: customers who can't afford the token spend for features they don't use.
These aren't edge cases. This is the pattern. And it repeats across every AI project I've seen, not just my own.
Why AI Projects Are Especially Vulnerable
Traditional software projects have the same problem: implicit decisions that compound. But AI projects are uniquely exposed for three reasons:
1. AI Coding Tools Move Faster Than Architecture Discussions
When you can ship a working feature in 20 minutes, there's enormous pressure to just build. The tool is ready. The model is ready. You have an idea. Why stop to discuss data loading patterns when you could have a working page right now?
Because the working page you build in 20 minutes establishes a pattern that every subsequent page will follow. If that pattern is wrong, you don't discover it until page 30, and by then, migrating is a week-long project.
2. AI Decisions Are Invisible Until They Compound
In traditional software, a bad database schema is obvious. Queries are slow, joins are painful, migrations are frequent. In AI software, a bad embedding strategy looks fine until you have enough documents to dilute the vectors. A bloated system prompt looks fine until you have enough content to overflow the context window. A missing retrieval filter looks fine until AI-generated artifacts outnumber human-authored ones.
AI technical debt is silent. It doesn't throw errors. It degrades gradually. Response quality drops 2% per month. Token costs creep up. Retrieval scores decline. Nobody notices until someone asks "why is the AI giving worse answers than it used to?"
3. The Decision Space Is Unfamiliar
Most product managers and developers know how to discuss database schemas, API design, and frontend architecture. Few know how to discuss embedding strategies, prompt composition patterns, model selection criteria, or token budget allocation. The decisions are unfamiliar, so they get deferred, which means they're made by default rather than by design.
The 5-Minute Discovery
What if, before the first line of code, there was a structured conversation that surfaced the decisions that are expensive to change later? Not a requirements document. Not a technical design review. A 5-minute Q&A that forces you to be explicit about the things that matter.
Here are the questions, organized by how expensive they are to change later:
Tier 1: Change-the-Foundation Expensive
How should pages load data?
- Options: Server components (SSR), client-side fetch, hybrid
- Impact: Affects every page in the application. Migrating later means touching every file.
- Our mistake: Started with client-side fetches. Migrated to server components across 65 files.
What's the auth model?
- Options: Session-based, token-based, API key, OAuth
- Impact: Affects middleware, every API route, every server action.
- Our mistake: Auth cookie race conditions caused silent 401 failures on client-side fetches.
What's the deployment target?
- Options: Vercel, self-hosted, local-only, hybrid
- Impact: Affects build configuration, environment variables, worker architecture.
- Our lesson: Mac Mini worker for agent execution + Vercel for the UI was the right split but wasn't planned upfront.
Tier 2: Significant-Refactor Expensive
How should documents be embedded?
- Options: Whole-document, chunked (300-500 tokens), hierarchical
- Impact: Affects retrieval quality for every RAG feature.
- Our mistake: Whole-document embedding. 5,000-word specs scored 25% relevance.
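To make the chunked option concrete, here is a minimal sketch of pre-embedding chunking. It approximates token counts as characters divided by four rather than using a real tokenizer, and the function name and constants are illustrative, not from our codebase:

```typescript
// Sketch: split a document into roughly 300-500-token chunks before embedding.
// Tokens are approximated as chars / 4; a real implementation would use the
// embedding model's own tokenizer.
const TARGET_TOKENS = 400;
const MAX_TOKENS = 500;
const CHARS_PER_TOKEN = 4; // rough heuristic, not a tokenizer

function chunkText(text: string): string[] {
  const maxChars = MAX_TOKENS * CHARS_PER_TOKEN;
  const targetChars = TARGET_TOKENS * CHARS_PER_TOKEN;
  // Split on sentence boundaries so chunks stay semantically coherent.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    if (current && current.length + sentence.length + 1 > maxChars) {
      chunks.push(current); // flush before exceeding the hard cap
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
    if (current.length >= targetChars) {
      chunks.push(current); // flush once the target size is reached
      current = "";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk then gets its own embedding, so a query about one section of a 5,000-word spec matches that section's vector instead of a diluted whole-document average.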
What's the model selection strategy?
- Options: Single model, per-feature model config, adaptive routing
- Impact: Affects cost, quality, and latency across every AI feature.
- Our approach: Per-feature model config stored in database. Right decision, made early by accident.
Should AI features be tierable?
- Options: All-or-nothing, configurable levels (minimal/standard/extended)
- Impact: Determines addressable market, because not every customer wants full AI at full cost.
- Our lesson: Built configurable AI Levels after realizing small businesses can't afford unlimited LLM calls.
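The tierable option amounts to a gate in front of every LLM call. A minimal sketch, assuming the minimal/standard/extended levels from above; the feature names and the mapping are hypothetical:

```typescript
// Sketch: gate each AI feature on a configurable aiLevel setting.
// Level names follow the article; feature names are illustrative.
type AiLevel = "minimal" | "standard" | "extended";

// Minimum level each feature requires before any LLM call is made.
const FEATURE_LEVELS: Record<string, AiLevel> = {
  searchSummaries: "standard",
  autoTagging: "standard",
  discoveryAgent: "extended",
};

const LEVEL_RANK: Record<AiLevel, number> = { minimal: 0, standard: 1, extended: 2 };

function featureEnabled(feature: string, level: AiLevel): boolean {
  const required = FEATURE_LEVELS[feature];
  if (!required) return false; // unknown features stay off by default
  return LEVEL_RANK[level] >= LEVEL_RANK[required];
}
```

The point is that the check exists from day one, so pricing tiers become a configuration change rather than a retrofit across every feature.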
Tier 3: Moderate-Effort to Change
What goes into system prompts?
- Options: Static instructions only, static + dynamic context, everything assembled per-request
- Impact: Affects token cost, response quality, prompt debugging.
- Our mistake: Mixed static instructions with dynamic content in one blob. Couldn't separate "prompt problem" from "context problem" until we built Context Segments.
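A minimal sketch of the segmented approach: static instructions stay in one place, and each piece of dynamic context arrives as a named segment, so when quality drops you can ask which segment caused it. The interface and function names here are illustrative, not our actual Context Segments implementation:

```typescript
// Sketch: assemble a system prompt from labeled segments instead of one blob,
// so a "prompt problem" can be diagnosed separately from a "context problem".
interface ContextSegment {
  name: string;    // e.g. "knowledge-base", "conversation-summary"
  content: string;
  tokens: number;  // pre-computed token estimate for budget checks
}

function buildSystemPrompt(
  staticInstructions: string,
  segments: ContextSegment[]
): string {
  const parts = [staticInstructions];
  for (const seg of segments) {
    // Label each segment so traces can attribute size and content to a source.
    parts.push(`## ${seg.name}\n${seg.content}`);
  }
  return parts.join("\n\n");
}
```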
How are LLM calls traced?
- Options: Not traced, basic logging, full inference traces with context segments
- Impact: Determines whether you can diagnose AI quality issues.
- Our lesson: Should have been instrumented from day one. We only found the bloated prompt problem because we built an Inspector 9 months in.
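The trace record doesn't need to be sophisticated to be useful. Here's a hypothetical in-memory sketch of what a `logInferenceTrace()` hook might capture; a real implementation would persist these, and the record shape is an assumption:

```typescript
// Sketch: record enough per-call data to answer "which segment bloated the
// prompt?" later. Traces accumulate in memory here for illustration only.
interface InferenceTrace {
  feature: string;
  model: string;
  promptChars: number;                  // total prompt size
  segmentChars: Record<string, number>; // chars contributed by each segment
  at: number;                           // timestamp (ms)
}

const traces: InferenceTrace[] = [];

function logInferenceTrace(
  feature: string,
  model: string,
  staticPrompt: string,
  segments: Record<string, string>
): void {
  const segmentChars: Record<string, number> = {};
  let total = staticPrompt.length;
  for (const [name, content] of Object.entries(segments)) {
    segmentChars[name] = content.length;
    total += content.length;
  }
  traces.push({ feature, model, promptChars: total, segmentChars, at: Date.now() });
}
```

With even this much data from day one, a 51,000-character prompt shows up as a trend line, not a 9-month-late surprise.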
Where does AI-generated content live?
- Options: Same table as human content, separate table, tagged with metadata
- Impact: Affects every query that loads "content." If you can't distinguish human vs AI content, AI output pollutes human-curated spaces.
- Our mistake: Discovery findings stored as DOC-type artifacts. Knowledge Chat loaded them as if they were product documentation.
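The metadata-tagging option reduces to a filter that every retrieval query applies. A sketch, assuming the `generatedBy` and `discoveryFinding` tags the article's constraints mention; the `Doc` shape is illustrative:

```typescript
// Sketch: exclude AI-generated artifacts from knowledge base retrieval by
// checking metadata tags, so discovery findings never masquerade as docs.
interface Doc {
  id: string;
  status: string;
  metadata?: { generatedBy?: string; discoveryFinding?: boolean };
}

function knowledgeBaseDocs(docs: Doc[]): Doc[] {
  return docs.filter(
    (d) =>
      d.status === "published" &&      // human-curated, published content only
      !d.metadata?.generatedBy &&      // drop anything an agent produced
      !d.metadata?.discoveryFinding    // drop discovery artifacts explicitly
  );
}
```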
Tier 4: Easy to Change (But Still Worth Deciding)
- What's the caching strategy for LLM responses?
- What's the error handling pattern for failed AI calls?
- How are token costs tracked and attributed?
- What's the maximum acceptable latency for AI features?
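Even these Tier 4 decisions benefit from a trivial starting point. A hypothetical sketch of per-feature token cost attribution, with placeholder model names and prices that are purely illustrative:

```typescript
// Sketch: attribute LLM spend to the feature that incurred it.
// Model names and per-1K-token prices are illustrative placeholders.
const PRICE_PER_1K: Record<string, { input: number; output: number }> = {
  "small-model": { input: 0.00015, output: 0.0006 },
  "large-model": { input: 0.003, output: 0.015 },
};

const costByFeature: Record<string, number> = {};

function recordUsage(
  feature: string,
  model: string,
  inputTokens: number,
  outputTokens: number
): void {
  const price = PRICE_PER_1K[model];
  if (!price) return; // unknown model: skip rather than misattribute
  const cost =
    (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
  costByFeature[feature] = (costByFeature[feature] ?? 0) + cost;
}
```

Ten lines like these, written before the first feature, are what make "which feature is burning our token budget?" answerable later.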
What the Output Looks Like
The answer to these questions isn't a document. It's a set of constraints that get written into the project's configuration before the first feature is built.
For us, that means CLAUDE.md, the file that AI coding tools read at the start of every session. But the mechanism doesn't matter. What matters is that the decisions are explicit and enforced:
```
## Architecture Rules

- All data uses server components for initial load. Client-side fetch only
  for polling, streaming, and user-triggered actions.
- Documents must be chunked at 300-500 tokens before embedding.
- System prompts must separate static instructions from dynamic context.
- Every LLM call must be traceable via logInferenceTrace().
- AI-generated content must be tagged in metadata (discoveryFinding,
  generatedBy, etc.) and excluded from knowledge base context.
- AI features must respect the aiLevel setting (minimal/standard/extended).
```
Six constraints. Maybe 30 seconds to write. Would have prevented the three biggest architectural issues we encountered.
The Skill Files
Beyond constraints, the discovery Q&A can generate skill files, which are reusable prompt templates that encode architectural patterns for AI coding tools.
For example, a /data-loading skill that fires whenever a new page is created:
```
When creating a new page:

1. Use a server component for initial data load
2. Call service functions directly (not API routes)
3. Pass data as props to client components
4. Only use client-side fetch for: polling > 30s interval,
   streaming responses, user-triggered mutations
```
Or a /ai-feature skill:
```
When adding an AI feature:

1. Check aiLevel before making LLM calls, skip at 'minimal'
2. Use tracedGenerateText() instead of generateText()
3. Build context from structured segments, not monolithic strings
4. Filter knowledge base queries to status='published' only
5. Set a token budget for the context, log a warning if exceeded
```
These skills are the AI equivalent of coding standards. They're not suggestions; they're constraints that the AI coding tool follows automatically. Every feature built with these skills inherits the right architecture by default.
The Product Opportunity
I think there's a product hiding in this concept.
Imagine a guided UI that walks a PM or founder through the discovery questions, maybe 8-10 questions over 5 minutes. Based on the answers, it generates:
- A CLAUDE.md file (or equivalent) with the architectural constraints
- Starter skill files with pattern-specific guardrails for the AI coding tool
- A ready-to-use prompt for v0.app, Claude Code, Cursor, or any builder tool
The input is a 5-minute conversation. The output is a project that's architecturally constrained from commit one.
We didn't have this when we started. We learned every constraint the hard way. But the constraints themselves are transferable. A new project in a similar domain should be able to start with the lessons we already learned, not repeat them.
The Compound Effect
The real cost of implicit decisions isn't the migration itself. It's the compounding.
Every feature built on top of a wrong decision inherits that wrongness. If you start with client-side fetches, every new page uses client-side fetches. If you start with whole-document embeddings, every document is embedded wrong. If you start with unfiltered context assembly, every new content type pollutes the prompts.
By the time you notice the problem, it's not one file. It's 65 files. It's not one embedding. It's 200 embeddings. It's not one prompt. It's every prompt in the system.
Five minutes of discovery at the start doesn't prevent all problems. But it prevents the ones that compound. And in a world where AI tools can ship features in minutes, the decisions that compound are the only ones that matter.
Mike Holloway is a VP of Product Management at a Fortune 500 FinTech with 25+ years of enterprise product and technology leadership. He writes about the intersection of product management, AI engineering, and enterprise software. See his AI skills portfolio at mikeholloway.dev/ai-skills.
This is an early preview. I'd rather get the ideas out while the conversation is happening than wait for perfectly polished prose. Further editing and proofing will come in time.
A note on how this was written: This article was developed through collaborative sessions with Claude, pulling together lessons from building ProductIntel over the past year. The mistakes, the discovery questions, and the constraints are all drawn from real development experience. The organization and drafting was a human-AI partnership.