No. 5 · The framework

Seven Skills I Believe Will Define the Next Product Manager

Early Preview

In my first article, I argued that the PM role is splitting in two: one path leads toward faster execution of the same job, the other toward a new discipline entirely. This article is about what that new discipline actually looks like. Not the title or the org chart, but the skills.

If we're going to talk about a new kind of product manager, one who designs and builds AI systems rather than just specifying them, then we need to define what that person actually knows. Concretely. Specifically enough that a company could hire against it and an individual could develop toward it.

I've identified seven competency domains. Some of them didn't exist two years ago. Some of them have existed for decades but have become dramatically more important in an AI-native world. None of them live cleanly in the "product management" or "software engineering" buckets we're used to, and that's the point.

1. AI-First Product Design

This is not "add a chatbot to your existing product."

AI-First Product Design means fundamentally rethinking the user experience so that AI leads the interaction. Instead of presenting users with a dashboard full of controls and filters and making them do the analysis, you design experiences where the AI has already done the analysis and presents recommended actions.

I think of it as "recommendations before controls." The default landing page isn't a blank canvas. It's a narrative briefing. Here's what happened since you last logged in. Here are the three things that need your attention, ranked by impact. Here's what I recommend, with cost estimates and confidence levels. The controls still exist, but they're secondary to the AI's assessment.

This is a design philosophy, not a technology choice. And it requires a product person to get right because the hard decisions are about what to surface and what to suppress, how much AI confidence the user needs to see before they trust the recommendation, and where the human override needs to be prominent versus buried.

Engineers can absolutely build AI-first interfaces. But deciding what the AI should prioritize, how to frame uncertainty, and when the experience should feel automated versus collaborative is product judgment applied to a new medium.

2. Context Engineering

This is the new information architecture.

In traditional product work, information architecture meant organizing navigation, taxonomies, and content hierarchies for human users. In AI-native products, information architecture means organizing knowledge so that AI agents find exactly what they need, at the right moment, at the right cost.

Context Engineering covers a range of technical decisions that all come back to one question: what goes into the context window?

Retrieval strategy is a big part of this. When should you use vector search versus full-text search versus hybrid approaches? How do you weight relevance versus recency? How do you handle the case where the most relevant document is also the most expensive to load?
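One way to make the relevance-versus-recency weighting concrete is a blended score. This is a minimal sketch; the 0.7 weight and 30-day half-life are illustrative assumptions, not recommendations, and `hybrid_score` is a hypothetical name rather than any library's API.

```python
def hybrid_score(relevance: float, age_days: float,
                 w_relevance: float = 0.7, half_life_days: float = 30.0) -> float:
    """Blend a 0-1 relevance score with an exponential recency decay."""
    # Recency is 1.0 for a document from today, 0.5 after one half-life.
    recency = 0.5 ** (age_days / half_life_days)
    return w_relevance * relevance + (1.0 - w_relevance) * recency
```

Tuning `w_relevance` per use case is exactly the kind of product decision this section is describing: a support bot may want recency to dominate, while a policy lookup may not care about it at all.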

Then there's context scoping, which is about what each agent sees versus what is kept out of view. Loading everything into every agent's context is the default instinct, and it's almost always wrong. Each agent works better with a focused, narrow view. The orchestrator holds the broad view. The specialists hold deep, scoped views.

And underlying all of it is the cost tradeoff. Every token in the context window has a dollar cost. Rich context produces better output, but the relationship isn't linear. There's a point of diminishing returns, and finding that point for each use case is a core skill.

Here's a concrete framework I've been developing: a tiered onboarding model where 5 minutes of context loading gets you to roughly 70% AI effectiveness, and 30 minutes gets you to roughly 95%. The question then becomes whether the delta between 70% and 95% is worth the additional cost and latency for this specific use case. Sometimes yes, sometimes no. That's a product decision, not an engineering decision.
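The tiered-onboarding tradeoff above can be reduced to a small break-even check. Everything here is an assumption for illustration: the 70%/95% effectiveness tiers come from the framework described, but the idea of pricing each effectiveness point in dollars is just one way to frame the decision.

```python
def worth_deep_context(value_per_point: float, extra_cost: float,
                       shallow_eff: float = 0.70, deep_eff: float = 0.95) -> bool:
    """Is the 70% -> 95% effectiveness delta worth the extra loading cost?

    value_per_point: estimated dollar value of one effectiveness percentage point.
    extra_cost: added cost (tokens, latency converted to dollars) of deep loading.
    """
    delta_value = (deep_eff - shallow_eff) * 100 * value_per_point
    return delta_value > extra_cost
```

If each effectiveness point is worth about $2 of output quality and deep loading costs $40 more, the 25-point delta is worth it; at $1 per point it isn't. Sometimes yes, sometimes no, as the section says.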

3. Agent System Design

Most people talking about AI agents are still running single-agent loops. One model, one system prompt, one set of tools. That works for simple use cases, but it breaks down fast when you need specialized reasoning across multiple domains.

Agent System Design is about multi-agent orchestration, and it's an emerging discipline with almost no established best practices. There are several key decisions you need to make.

Team topology is the first one. When do you use a solo agent versus a supervised agent versus multiple agents collaborating? The answer depends on the complexity of the task, the cost tolerance, and whether the domains are separable.

Handoff protocols are another critical area. When Agent A finishes its work and Agent B needs to continue, what gets communicated? The full output? A compressed summary? Pointers to artifacts with on-demand retrieval? Each approach has cost, quality, and latency tradeoffs that need to be considered.
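The three handoff styles mentioned above could be routed by output size. This is a hedged sketch under assumed thresholds; the class and field names are illustrative, not from any agent framework, and the truncation stands in for a real model-generated summary.

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    style: str                   # "full" | "summary" | "pointer"
    payload: str = ""            # inline text for full/summary styles
    artifact_ids: list = field(default_factory=list)  # for pointer style

def make_handoff(output: str, artifact_id: str,
                 inline_limit: int = 500, summary_limit: int = 5000) -> Handoff:
    """Crude routing: inline small outputs, summarize medium, point at large."""
    approx_tokens = len(output) // 4  # rough 4-characters-per-token heuristic
    if approx_tokens <= inline_limit:
        return Handoff("full", payload=output)
    if approx_tokens <= summary_limit:
        # In practice a model would produce the summary; truncation stands in here.
        return Handoff("summary", payload=output[: inline_limit * 4])
    return Handoff("pointer", artifact_ids=[artifact_id])
```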

Then there's error propagation. When an agent fails or produces low-confidence output, what happens? Does the workflow stop? Does it retry? Does it escalate to a different agent or to a human? Error handling in multi-agent systems is fundamentally harder than in single-agent systems because errors can cascade across agents in non-obvious ways.
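An escalation policy for failed or low-confidence steps can be made explicit rather than left implicit in retry loops. A sketch, assuming a 0.8 confidence floor and two retries; the thresholds and escalation tiers are placeholders, not a prescription.

```python
def next_action(confidence: float, attempts: int,
                min_confidence: float = 0.8, max_retries: int = 2) -> str:
    """Decide what happens to a low-confidence agent step."""
    if confidence >= min_confidence:
        return "accept"
    if attempts < max_retries:
        return "retry"                          # cheap first response
    if attempts == max_retries:
        return "escalate_to_stronger_agent"     # trade cost for quality
    return "escalate_to_human"                  # last resort: stop the cascade
```

Making the policy a pure function like this also makes it testable and auditable, which matters precisely because errors cascade across agents in non-obvious ways.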

And here's a question almost nobody is asking yet: how do you know which agent team configuration is actually better? If you have two different ways to structure a workflow, maybe a three-agent pipeline versus a two-agent pipeline with broader scope, how do you objectively compare them?

This is where A/B testing at the agent team level comes in, and it's something I've actually built. The concept is straightforward but the implications are significant. You take the same task or story, run it through two different agent team configurations, and then score the outputs against each other on dimensions like quality, cost, latency, and token consumption.

Over time, you build up data on which team topologies perform best for which types of work. Maybe your research tasks perform better with a solo deep-reasoning agent, while your feature decomposition tasks need a three-agent pipeline with specialized roles. You wouldn't know that without measuring it. This kind of empirical testing of agent team design is, I think, going to become standard practice, but right now almost nobody is doing it.
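The comparison step can be sketched as a per-dimension aggregation. This assumes per-task scores already exist upstream (from human review or automated evals); the dimension names mirror the ones in the text, and the function itself is illustrative.

```python
def compare_configs(results_a: list, results_b: list,
                    dims=("quality", "cost", "latency", "tokens")) -> dict:
    """Average each scoring dimension for two agent-team configurations.

    results_a / results_b: one dict of scores per task run, e.g.
    {"quality": 0.85, "cost": 1.20, "latency": 2.1, "tokens": 9400}.
    Returns {dimension: (avg_a, avg_b)} for side-by-side review.
    """
    def avg(rows, dim):
        return sum(r[dim] for r in rows) / len(rows)
    return {d: (avg(results_a, d), avg(results_b, d)) for d in dims}
```

Note that "better" is multi-dimensional by design: a config can win on quality while losing on cost, and deciding which tradeoff to accept per task type is the product call.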

The key insight I keep coming back to is that agent team design looks a lot like human team design. The same principles apply. Clear responsibilities, scoped authority, structured communication, and well-defined escalation paths. The people who are best positioned to design agent teams are the ones who've designed and managed human teams.

4. Specification as Code

In the old world, a product manager wrote requirements for humans to interpret. Ambiguity was tolerable because humans could fill in the gaps with judgment, ask clarifying questions, and course-correct in standup meetings.

Agents don't do any of that. They execute exactly what you specify. If your specification is vague, the output is vague. If you leave a gap, the agent will fill it, but not necessarily the way you intended.

Specification as Code means writing agent specifications with the same rigor you'd apply to a function interface. Each agent needs explicit capabilities (what it can do), explicit constraints (what it must not do), an explicit output schema (the structure and format of what it produces), and scoped tool permissions (what tools it has access to, and nothing more).

The principle of least privilege, a concept borrowed from enterprise security, turns out to be one of the most important patterns in agent design. Give each agent exactly the tools it needs and nothing more. Research shows that reducing available tools actually improves agent performance, because the agent spends less time reasoning about which tool to use and more time doing the work.
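The spec shape described above, including least-privilege tool scoping, could look something like this. A minimal sketch: the field names and the example agent are hypothetical, not a standard or any particular framework's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    name: str
    capabilities: tuple    # what it can do
    constraints: tuple     # what it must not do
    output_schema: dict    # required structure of what it produces
    allowed_tools: frozenset  # least privilege: exactly these tools, nothing more

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools

# Hypothetical example agent, for illustration only.
summarizer = AgentSpec(
    name="release-notes-summarizer",
    capabilities=("summarize merged pull requests for a release",),
    constraints=("never invent PR numbers", "never edit repository contents"),
    output_schema={"summary": "str", "pr_ids": "list[int]"},
    allowed_tools=frozenset({"read_repo"}),
)
```

The point of the frozen dataclass is the same as a function interface: the spec is explicit, reviewable, and enforceable at runtime, and the tool check makes scope violations a bug you can catch rather than a behavior you discover.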

I've written specifications for over 20 agents at this point. The ones that perform consistently well share a common trait: they're specified with precision. The ones that surprise me with bad behavior? Every time, I can trace it back to a gap in the spec.

This is arguably the most transferable PM skill on the list. Product managers who write tight, unambiguous requirements for human teams are already halfway to writing good agent specifications. The discipline is the same. The audience is just less forgiving.

5. Model Economics

This is the CFO dimension of AI product work, and almost nobody is talking about it seriously yet.

If you were in enterprise technology during the cloud migration wave, you've seen this movie before. Organizations moved everything to the cloud because of the perceived cost savings, and in many cases those savings were real at first. But once the migration was complete and all the environments were running, the bills started adding up in ways nobody had forecasted. Suddenly you had teams spinning up resources without understanding the cost implications, and the monthly invoice became a surprise instead of a plan.

I think the exact same thing is about to happen with AI and token economics. Right now most organizations are running a handful of AI features or prototypes, and the costs feel manageable. But as AI agents penetrate deeper into enterprise workflows and you go from a few thousand agent calls a month to hundreds of thousands, the economics change dramatically. Without someone paying attention to model selection, context loading efficiency, and token budgets from the beginning, companies are going to get the same sticker shock they got with their first real cloud bill.

Every agent call consumes tokens. Every model has a different price-per-token. Every unnecessary piece of context is money burned. As AI features move from prototypes to production, the question "what does this cost per user per month?" becomes critical, and most teams can't answer it.

Model Economics covers several interconnected decisions.

Model-task fit is the starting point. Not every task needs your most expensive model. A summarization task that works fine on a fast, cheap model doesn't need a reasoning-heavy model that costs 10x more. Assigning the right model to the right task is a product decision with direct P&L impact.
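Model-task fit can start as nothing fancier than a routing table. The model names and per-token prices below are placeholders, not real vendor pricing; the point is that the mapping itself is an explicit, reviewable product artifact.

```python
# Illustrative routing table: task type -> model tier. Prices are made up.
ROUTES = {
    "summarize": {"model": "small-fast",   "usd_per_mtok": 0.25},
    "classify":  {"model": "small-fast",   "usd_per_mtok": 0.25},
    "plan":      {"model": "large-reason", "usd_per_mtok": 3.00},
}

def pick_model(task_type: str) -> str:
    # Default to the cheap tier; only explicitly named reasoning
    # tasks get the model that costs roughly 10x more.
    return ROUTES.get(task_type, ROUTES["summarize"])["model"]
```

Defaulting unknown task types to the cheap tier is itself a policy choice with P&L impact: it forces someone to argue a task onto the expensive model rather than the reverse.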

Token budgeting is about setting explicit budgets per workflow and per agent, so costs are predictable rather than surprising. This includes designing context loading strategies that minimize tokens without sacrificing quality.

Challenger testing means systematically evaluating whether a cheaper model can produce equivalent output for a specific task. This is essentially A/B testing for model economics. You replay recent tasks through a challenger model, score the output similarity, and surface downgrade recommendations with estimated cost savings.
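The replay-and-compare loop can be sketched in a few lines. This assumes a similarity score (embeddings, rubric evals, or human grading) is computed upstream; the 0.9 threshold is an illustrative assumption.

```python
def downgrade_recommended(pairs: list, threshold: float = 0.9) -> bool:
    """Recommend swapping in the cheaper challenger model?

    pairs: one (champion_output, challenger_output, similarity) tuple per
    replayed task, with similarity in [0, 1] from whatever scorer you use.
    """
    if not pairs:
        return False  # no evidence, no recommendation
    avg_similarity = sum(sim for _, _, sim in pairs) / len(pairs)
    return avg_similarity >= threshold
```

A fuller version would also surface the estimated savings alongside the recommendation, since "equivalent output at 40% lower cost" is the sentence that actually moves the decision.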

And then there's burn rate visibility, which means dashboards and alerts that show real-time and projected AI costs per feature, per workflow, per user. The same cost discipline you'd apply to any SaaS infrastructure, applied to model inference.

This competency barely exists anywhere right now. But I believe it's going to become critical as AI costs become a real line item on the P&L. The product person who can say "I can cut our model costs 40% without degrading output quality, and here's the data to prove it" is going to be extremely valuable.

6. AI Observability and Trust

Monitoring AI systems is not like monitoring traditional software.

Traditional observability answers a few basic questions. Is it up? Is it fast? Is it throwing errors? Those questions still matter for AI systems, but they're necessary rather than sufficient. An AI system can have 100% uptime, sub-second response times, and zero errors while confidently producing wrong output all day long.

AI Observability adds several new dimensions that teams need to think about.

Output quality monitoring asks whether the AI's outputs are actually good. This requires evaluation frameworks, human review sampling, and automated quality scoring. It's not enough to know the system responded. You need to know whether the response was useful.

Retrieval attribution is about verifying sources. When the AI cites information to support a recommendation, did it actually find that information in the data? Or did it hallucinate the source? Attribution tracking traces every claim back to its source material.
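A first-pass attribution check can be as simple as verifying that each cited claim appears in some retrieved chunk. This is a deliberately naive sketch using substring matching; a production system would need fuzzier matching, and the function name is illustrative.

```python
def unattributed_claims(claims: list, source_chunks: list) -> list:
    """Return the claims that cannot be traced to any retrieved source chunk."""
    return [
        claim for claim in claims
        if not any(claim.lower() in chunk.lower() for chunk in source_chunks)
    ]
```

Anything this returns is a candidate hallucinated source: the AI asserted it, but nothing in the retrieved material contains it, which is exactly the gap attribution tracking exists to close.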

Cost trend monitoring looks at not just current costs but trajectory. Is this workflow getting more expensive per invocation? Why? Is it because context windows are growing over time? Because retry rates are increasing?

Security boundary monitoring watches whether agents are staying within their authorized scope. Are they accessing tools they shouldn't? Are they producing output that violates content policies?

The trust angle is equally important and harder to engineer. AI systems need to earn trust the same way any new tool earns trust in an enterprise: through transparency. Every decision the AI makes should be inspectable. Every recommendation should come with a reasoning trail that a skeptical stakeholder can review. Not because they will review every decision, but because they could.

I've been designing observability around the concept of an Inference Inspector, which is a per-trace view that shows not just what the AI decided, but why, in plain language. Not for the engineers. For the product manager, the business stakeholder, the compliance officer who needs to understand why the AI recommended what it recommended.

7. Strategic Product Judgment

This is the domain where traditional PM experience becomes more valuable, not less.

As build cycles compress and it becomes possible to prototype in hours what used to take weeks, the constraint shifts from "can we build it" to "should we build it." And answering "should we build it" requires judgment that AI itself can't provide.

Problem selection means looking at ten things AI could do and identifying the two that actually matter for your users and your business. This requires customer empathy, market understanding, and the ability to distinguish genuine pain points from interesting technical demos.

Prioritization under uncertainty gets harder, not easier, when you can build anything quickly. The number of decision points multiplies. More options means more chances to choose wrong, and the premium on good prioritization goes up.

Organizational context is about understanding how a feature will actually be adopted, who will resist it, what the training and change management requirements are, and whether the organization is ready for it. This is knowledge that comes from years of shipping software to real companies.

Strategic Product Judgment is the one competency on this list that cannot be learned quickly. The others, like Context Engineering, Agent System Design, and Model Economics, can be developed in months with focused effort. Strategic Product Judgment comes from years of doing the work, from having shipped things that succeeded and things that failed, and from having the pattern recognition to know the difference.

This is why I believe the people best positioned for this new discipline are experienced product leaders who learn AI, not AI engineers who learn product. The technical skills are acquirable. The judgment is not.

What This Framework Is For

This isn't an academic exercise. It's meant to be practical.

If you're a product manager figuring out what to learn next, this gives you a development map. Pick the domain where you're weakest and go deep.

If you're a company trying to hire for this role, this gives you an interview framework. Don't ask candidates if they "know AI." Ask them how they'd design a context loading strategy for a multi-agent workflow. Ask them how they'd set up model economics monitoring. Ask them what happens when two agents disagree.

If you're a leader trying to understand how your product team should evolve, this gives you a maturity model. Where is your team today across these seven domains? Where do they need to be in 12 months?

The PM role is changing. I believe these are the skills that will shape what it's changing into, and while nobody can predict the future with certainty, the direction feels clear to me.


Mike Holloway is a VP of Product Management at a Fortune 500 FinTech with 25+ years of enterprise product and technology leadership. He writes about the intersection of product management, AI engineering, and enterprise software. See his AI skills portfolio at mikeholloway.dev/ai-skills.


This is an early preview. I'd rather get the ideas out while the conversation is happening than wait for perfectly polished prose. Further editing and proofing will come in time.

A note on how this was written: Like the rest of this series, this article was developed through collaborative sessions with Claude, synthesizing ideas that emerged over months of building AI systems and reflecting on how enterprise product experience applies to this new discipline. The framework, the examples, and the opinions are mine. The organization and drafting was a human-AI partnership.