Building an AI Product Manager Skillset, In Public
What 8 weeks of intense AI platform development taught me about the role that doesn't exist yet
The Starting Line
I'm a VP of Product Management at a Fortune 500 FinTech. 25 years of enterprise and IT experience. I know APIs, databases, and system architecture. But a year ago, I couldn't tell you what an embedding was or how a vector search worked. I understood AI conceptually. I didn't understand it mechanically.
I decided to learn by building. Not by taking courses, not by reading papers, but by shipping a real product. In January 2026, I opened Claude Code, described what I wanted, and started iterating.
Eight weeks later, reusing some components from previous projects but building most of it from scratch, I have a 21-module AI platform with 85+ database tables, 55+ pages, a working agent orchestration framework, RAG pipelines, inference observability, and a Discovery Agent that reads API documentation and identifies cross-product opportunities.
Eight weeks. Not eight months. That speed is both the point and the problem, and it's exactly what makes the lessons worth sharing.
Phase 1: The Confidence Trap (Weeks 1-3)
The first thing I learned was that AI coding tools make you dangerously productive. Within weeks, I had a working application. Authentication, database schema, CRUD operations, a dashboard. Features were shipping daily. I felt like a 10x engineer.
I was not a 10x engineer. I was a 1x engineer with a 10x typing speed.
The code worked. It compiled. It rendered. But I was making architectural decisions by omission. I wasn't choosing a data loading pattern (the tool chose client-side fetches), wasn't choosing an auth strategy (the tool chose the simplest approach), wasn't choosing an embedding granularity (the tool embedded entire documents as single vectors). Every "non-decision" was actually a decision that would cost me later.
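To make one of those non-decisions concrete: embedding granularity. Here is a minimal sketch of what choosing chunked embeddings explicitly might look like; the chunk size and overlap values are illustrative, not the ones my platform actually uses:

```python
def chunk_document(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping chunks so each gets its own vector.

    Embedding the whole document as a single vector (the tool's default)
    blurs every topic in it together; chunking keeps retrieval granular.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Ten lines, and the decision is now visible in the codebase instead of buried in a tool's default behavior.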
What I learned: Speed without architecture is technical debt on a payment plan. The monthly payments are small until they're not.
The skill I developed: Knowing which decisions matter enough to be explicit about, even when the tool will happily make them for you.
Phase 2: The Scale Wall (Weeks 3-5)
By the fourth release, just a few weeks in, the application had grown beyond what the initial architecture could handle. Pages took 15+ seconds to load because they were making 3-6 parallel LLM calls via client-side fetches. Authentication cookies raced with client-side requests, causing silent 401 errors. The knowledge base chat was stuffing 170+ documents into the system prompt because nobody specified what "load the docs" meant at scale.
This was my first real architectural crisis, and it taught me something I couldn't have learned from a course: the feeling of outgrowing your own architecture. I'd been building fast, shipping features, celebrating progress, and suddenly nothing worked well anymore.
The fix took a full week. We migrated 65 client-side fetch calls across 4 tiers, converted pages to server-side data loading, added error boundaries, removed dead API routes. We essentially rebuilt the data layer of the entire application.
What I learned: Architecture decisions made in Week 1 compound through every file written in Weeks 2-8. A 5-minute conversation about data loading patterns at the start would have saved a week of migration.
The skill I developed: Recognizing architectural inflection points, the moment when adding one more feature on the current foundation is more expensive than changing the foundation.
Phase 3: The AI Awakening (Weeks 5-7)
After the architecture stabilization, I started building the AI-native features that made the platform genuinely novel: an Intelligence module with AI-generated briefings, a Discovery Agent that analyzes knowledge base changes and identifies opportunities, a Products module that gives a single-pane-of-glass view per product with AI narratives.
This is where the role shift happened. I was no longer just building software. I was designing AI experiences. And the design questions were fundamentally different from traditional product work:
- What should the AI summarize vs what should it leave for the human?
- How confident does the AI need to be before it makes a recommendation?
- When should AI features be on by default vs opt-in?
- How do you show the user why the AI said what it said?
That last question led to building an Inference Inspector, a tracing system that shows the full lifecycle of every LLM call. What prompt was sent, what documents were retrieved, what the model returned, and why. Building it taught me more about AI than any blog post or course because I was learning by observing real AI behavior in a system I understood deeply.
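The core of a system like that is surprisingly small. A hedged sketch of what one trace record could look like; the field names here are hypothetical, and the real Inspector tracks considerably more:

```python
from dataclasses import dataclass, field
import time

@dataclass
class InferenceTrace:
    """One record per LLM call: what went in, what came out, and how long it took.

    Field names are illustrative, not the platform's actual schema.
    """
    feature: str                  # which product feature made the call
    system_prompt: str            # the full prompt as sent to the model
    retrieved_doc_ids: list[str]  # documents the retrieval step selected
    model: str                    # model identifier used for the call
    response: str = ""
    started_at: float = field(default_factory=time.time)
    latency_ms: float = 0.0

    def finish(self, response: str) -> None:
        """Record the model's output and the elapsed wall-clock time."""
        self.latency_ms = (time.time() - self.started_at) * 1000
        self.response = response
```

Persist one of these per call and the question "why did the AI say that?" becomes a lookup instead of a mystery.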
What I learned: The difference between an AI-assisted product (AI helps the human) and an AI-first product (AI leads, human approves) is a design philosophy, not a technology choice. Getting it right requires product judgment, not engineering skill.
The skill I developed: AI-first product design, knowing when to lead with the AI's recommendation vs when to present options, and how to make AI decisions transparent without overwhelming the user.
Phase 4: The Observability Shift (Weeks 7-8 and Beyond)
The most recent phase has been the most transformative for my understanding. We built inference tracing, context segment analysis, a prompt playground for A/B testing system prompts, and integrated Langfuse for aggregated observability.
This latest phase came from a conversation with a data scientist colleague who said: "You're a proficient builder. Now you need to understand what happens inside the LLM call." He was right. I could build AI products, but I couldn't diagnose them.
The first thing I did was trace a Knowledge Chat query through our Inspector. What I found shocked me: the system prompt was 51,000+ characters because it was loading every document in the database, including 100+ AI-generated discovery outputs. The model was drowning in noise while trying to answer a simple question about API retention policies.
This single observation taught me:
- Retrieval quality matters more than model quality. The right documents at 90% relevance with Haiku will outperform wrong documents at 50% relevance with Opus.
- Prompt composition is an optimization problem. Every token in the context window has a cost (literal dollars) and a benefit (relevance to the query). Most prompts are not optimized. They're assembled.
- Observability is not optional. If you can't see what the model received, you can't diagnose why it responded poorly.
We broke the prompt into a structured breakdown: system instructions, product context, document summaries, and vector search results, each tracked as a separate "context segment" with its own token count. Now we can see that 60% of our tokens go to document summaries while only 15% go to vector search results. Is that the right ratio? We can test it.
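The measurement itself is simple once the segments are tracked. A minimal sketch, using illustrative token counts chosen to mirror the ratios above rather than the platform's real figures:

```python
def segment_ratios(segments: dict[str, int]) -> dict[str, float]:
    """Given token counts per context segment, return each segment's
    share of the total context window, rounded to two decimals."""
    total = sum(segments.values())
    return {name: round(tokens / total, 2) for name, tokens in segments.items()}

# Illustrative numbers, not measured values from the platform:
ratios = segment_ratios({
    "system_instructions": 1_500,
    "product_context": 1_000,
    "document_summaries": 6_000,
    "vector_search_results": 1_500,
})
```

Once the ratio is a number you compute on every request, "is 60% on summaries the right spend?" stops being a debate and becomes an A/B test.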
What I learned: Understanding AI at the inference level is the gap between "I can build AI products" and "I can optimize AI products." It's the difference between a PM who ships features and one who ships features that actually work well.
The skill I developed: LLM observability and diagnostics, including tracing requests, diagnosing retrieval quality, optimizing prompt composition, and making data-driven decisions about model selection and context budgets.
What I'd Tell Week-0 Me
If I could go back and give myself five pieces of advice:
1. Make architectural decisions explicitly, even when the tool will make them for you. Write them down. "We're using server-side data loading. We're chunking documents before embedding. We're filtering discovery artifacts from knowledge base queries." Every explicit decision is a future migration avoided.
2. Build observability early. I wish I'd built the Inference Inspector in Week 3, not Week 7. The bloated prompt issue was probably there for weeks before I could see it. Every AI feature should trace its calls from day one.
3. The AI PM skill isn't building. It's judgment. Anyone can prompt an AI to build features. The skill is knowing which features to build, which architectural patterns to choose, when to lead with AI and when to let the human drive, and how to diagnose AI behavior when it goes wrong.
4. Story refinement matters more than story volume. We shipped hundreds of features with vague stories. Every vague story was an implicit permission for the AI to make assumptions. The assumptions that were wrong cost hours to fix. Thirty seconds of spec enrichment before coding saves weeks of migration after.
5. Revisit what you've built, out loud. Some of my deepest learning didn't happen during the build sessions. It happened afterward, when I sat down with Claude and talked through what I had just experienced. Not to write code, but to process it. "Why did that architecture break at scale? What pattern should I have used instead? How does this connect to what I learned three weeks ago?" Those conversations forced me to move from "I fixed the problem" to "I understand why the problem existed." Writing about it takes that even further, because putting lessons into words forces you to relive the development experience and find the structure inside it. The building teaches you what happened. The conversation teaches you why. The writing makes sure you don't forget either one.
The Role That's Emerging
Over this intense build period, I've come to believe that there's a new product role forming at the intersection of three skills that don't traditionally live together:
- Product judgment, knowing what to build and why
- Builder capability, being able to validate ideas by building them
- AI fluency, understanding LLM behavior at the inference level
Most PMs have the first. Some are developing the second (the "builder PM" trend). Very few have the third, which is the ability to look at an AI system and diagnose why it's behaving a certain way, then make a data-driven decision about how to fix it.
That third skill is what separates someone who uses AI tools from someone who manages AI products. And the only way I've found to develop it is to build something real, trace every AI call, and learn from what you see.
The role doesn't have a standard title yet. AI Product Manager. AI-Native PM. Builder PM. Whatever it's called, the defining characteristic is the same: you can go from user problem to shipped AI solution to optimized AI behavior, end to end, with judgment at every step.
That's what I've been building toward. Eight weeks in, I'm not there yet. But I can see the path, and the speed at which this skillset can be developed is part of the story. You don't need a year. You need intensity, real problems, and the willingness to trace every AI call until you understand why it did what it did.
Mike Holloway is a VP of Product Management at a Fortune 500 FinTech with 25+ years of enterprise product and technology leadership. He writes about the intersection of product management, AI engineering, and enterprise software. See his AI skills portfolio at mikeholloway.dev/ai-skills.
This is an early preview. I'd rather get the ideas out while the conversation is happening than wait for perfectly polished prose. Further editing and proofing will come in time.
A note on how this was written: This article is a good example of the learning process it describes. I built the platform, then sat down with Claude to revisit and make sense of what I'd learned across those eight weeks. The experiences, the mistakes, and the lessons are mine. The conversation that helped me find the structure in them, and the drafting that followed, was a human-AI partnership.