You paid for the AI. Nobody built the system.

Several product orgs I work with have the same pattern. AI is in the workflow. Spend is on the P&L. Measurable operational outcomes are not in the data. The teams are using it. The investment just has no defensible return because no one built the system that would generate one.

Spend · fully visible
AI tool licenses: $4,200/mo
Seat expansion: $1,800/mo
Onboarding + training: $12,000 (one-time)
Product manager time on AI workflow: $4,800/mo

Impact · not tracked
Product manager output per head: not measured
Rework rate: not measured
Handoff failure rate: not measured
Cycle time delta: not measured

Representative example. A typical 15-person product org running mainstream AI tooling. Excludes API and infrastructure consumption, AI program leadership compensation, custom integrations, and platform engineering time. Real run rates are often several times higher.
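To make the visible side of the ledger concrete, here is a minimal sketch that rolls the representative figures above into a monthly run rate and a first-year total. The 12-month horizon and the assumption that monthly costs stay flat are mine, not part of the example; the variable names are illustrative only.

```python
# Representative spend figures from the example above (USD).
monthly_spend = {
    "AI tool licenses": 4_200,
    "Seat expansion": 1_800,
    "Product manager time on AI workflow": 4_800,
}
one_time_spend = {
    "Onboarding + training": 12_000,
}

# Assumption: monthly costs stay flat over a 12-month horizon.
MONTHS = 12

monthly_run_rate = sum(monthly_spend.values())
first_year_total = monthly_run_rate * MONTHS + sum(one_time_spend.values())

print(f"Monthly run rate: ${monthly_run_rate:,}")
print(f"First-year visible spend: ${first_year_total:,}")
```

Under those assumptions the visible spend is $10,800 a month and $141,600 in the first year, before any of the excluded items.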

Three gaps that neutralize the investment.

Every org I see running AI without a return is missing the same three things. They aren't prompting badly. They're operating without the connective tissue between AI output and operational outcome. The same three gaps in the same order of incidence: context, handoff, decision. Every team that hasn't closed them is running the same playbook and getting the same result.

01 · Most common
The AI doesn't know who you are.

I watch product managers run the most capable reasoning system ever built with zero grounding in their company. No product context. No customer segments. No strategic priorities. No recent decisions. Every prompt starts from scratch. The output feels generic because it is, and product managers compensate by hand-writing context into every prompt. The model isn't lazy. It's working with nothing.

What it costs: 40 to 60 percent of product manager output requires significant rework before it can be used. At a loaded product manager cost, that isn't a productivity gap. It's a margin problem. The org is paying for AI and still paying for the rework.
"We ran it three times and kept getting the same boilerplate. We might as well have Googled it."

The fix is not a better prompt. It is org grounding that flows into every output without the product manager having to remember to attach it. Strategy, customer segments, current priorities, recent decisions: persistent context that travels with the work, so a new product manager produces output indistinguishable from someone who has been at the company three years.

02 · Most expensive
The work doesn't survive the handoff.

The AI doesn't know where the output goes, so it returns a wall of text when the work needed a development-ready spec. Even when the format is right, the reasoning rarely travels. Someone summarizes the brief in Slack. Someone else interprets the summary in a ticket. By the time the work reaches the builder, the original intent is three degrees removed and nobody owns the gap.

What it costs: A bad spec isn't one hour of rework. It's one sprint. Misaligned execution doesn't show up in your AI budget. It shows up in cycle time, rework rate, and missed release dates.
"The brief was right. Three handoffs later, we were building something else."

I see two failure modes producing the same outcome. Format failure: discovery output arrives unshaped, and engineering has to reformat it before scoping. Reasoning failure: the brief is summarized and reinterpreted at each hop, and the why disappears. Both end in a feature that ships off-target. The worked example below traces one of these failures through to dollars.

03 · Most overlooked
Everything is there. Nothing is decided.

The AI doesn't know what you're trying to decide, what tradeoffs matter, or who's reading the output. So it gives you comprehensive. Comprehensive isn't actionable. You still have to do the hard part. The artifacts look rigorous, fill the meeting, and leave the room without a recommendation. AI added a step. It didn't remove one.

What it costs: Decision latency is the drag nobody tracks. Every cycle where AI produces output but doesn't sharpen a decision is cost absorbed with no value captured.
"The brief was thorough. I read it twice and still didn't know what we were recommending."

This gap is the hardest to spot because the artifacts look good. The fix is framing every output around a decision: the tradeoff being navigated, the recommendation, the next action. Comprehensiveness is a default, not a goal. If a leader can't act on the output without a follow-up meeting, the AI did the work twice and still didn't move the metric.

Three handoffs later.

One feature. One handoff failure. The original brief was specific, sourced, and measurable. The second version is what reached the engineer. The structural fields survived. The substance dropped out at every degree of separation. This is what Gap 02 looks like when you trace it through to dollars.

Your brief
Objective: Reduce notification fatigue for enterprise users by consolidating alerts into a daily digest.
User problem: Power users receive 40+ notifications daily. 73% dismissed without reading.
Why now: Q3 churn data: notification overload in 34% of at-risk accounts.
Success metric: 30% reduction in dismissed alerts within 60 days.

Three handoffs later
Objective: Reduce the number of notifications.
  ↳ lost: enterprise scope, digest mechanism
User problem: Users get too many notifications.
  ↳ lost: 40+ figure, 73% stat, power user context
Why now: (empty)
  ↳ lost: Q3 data, churn link, 34% stat
Success metric: TBD
  ↳ lost: 30% target, 60-day window

The structure survived. The substance didn't.
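One cheap way to catch this kind of degradation before work begins is to check whether the figures in the original brief still appear in the version that arrives. The sketch below is a hypothetical illustration, not a description of any tool: the `figures` and `lost_figures` helpers are names I made up, and the check only covers numbers. It would not notice that "enterprise" or "daily digest" dropped out of the objective.

```python
import re

# Field values copied from the worked example above.
original_brief = {
    "Objective": "Reduce notification fatigue for enterprise users by "
                 "consolidating alerts into a daily digest.",
    "User problem": "Power users receive 40+ notifications daily. "
                    "73% dismissed without reading.",
    "Why now": "Q3 churn data: notification overload in 34% of at-risk accounts.",
    "Success metric": "30% reduction in dismissed alerts within 60 days.",
}
received_brief = {
    "Objective": "Reduce the number of notifications.",
    "User problem": "Users get too many notifications.",
    "Why now": "",
    "Success metric": "TBD",
}


def figures(text):
    """Return the standalone numbers in text (labels like 'Q3' are skipped)."""
    return set(re.findall(r"\b\d+", text))


def lost_figures(original, received):
    """Map each field to the figures present originally but missing on arrival."""
    report = {}
    for field, text in original.items():
        missing = figures(text) - figures(received.get(field, ""))
        if missing:
            report[field] = sorted(missing, key=int)
    return report


for field, missing in lost_figures(original_brief, received_brief).items():
    print(f"{field}: lost {', '.join(missing)}")
```

On the example brief this flags the user problem, the why-now, and the success metric. The objective passes the check even though it lost its scope, which is why a numeric check is a floor and not a substitute for someone owning the handoff.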

What breaks | Time lost | Est. cost
Product manager rewrites degraded brief | 4 hrs | ~$800
Engineering scopes broken spec | 6 hrs | ~$1,800
Sprint replanned mid-cycle | 1 day | ~$4,200
Feature ships misaligned | 2 wks | ~$18,000
At-risk account escalates | open | untracked
Total visible cost, one feature, one handoff failure | ~3 wks | ~$25K

Cost estimates based on typical loaded engineering and product manager rates at growth-stage SaaS, not a sourced study.
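The total is simple addition over the rows above. A minimal sketch, using the table's own estimates and treating the untracked escalation as excluded rather than zero (the variable names are illustrative):

```python
# Estimated visible costs from the table above (USD).
# The at-risk account escalation is untracked, so it is left out, not counted as 0.
visible_costs = [
    ("Product manager rewrites degraded brief", 800),
    ("Engineering scopes broken spec", 1_800),
    ("Sprint replanned mid-cycle", 4_200),
    ("Feature ships misaligned", 18_000),
]

visible_total = sum(cost for _, cost in visible_costs)

print(f"Visible total: ${visible_total:,}")
print(f"Rounded: ~${round(visible_total, -3) // 1000}K")
```

The rows sum to $24,800, which is where the roughly $25K figure comes from. Whatever the escalation ends up costing sits on top of that.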

One bad handoff on one feature carries about $25K of visible cost, three weeks of slip, and an at-risk account that escalates. None of it gets traced back to the AI investment.

This happens every sprint. It doesn't show up on any AI ROI dashboard.


Six questions.

Yes or no for your own team. Two questions per gap. Take it as a CPO answering for your own org, or as an operating partner answering for a portfolio company. The questions are blunt by design. Hedging here defeats the point. If the honest answer is yes-but, it's a no.

Gap 01 · context
  1. Does every AI output produced by your team start from the same persistent organizational context, without a product manager remembering to add it?
  2. Can a new product manager produce work indistinguishable in framing from a product manager who has been at the company three years?
Gap 02 · handoff
  1. When discovery output reaches engineering, does the engineer have the original reasoning behind the brief, not just the brief?
  2. When briefs degrade across handoffs, does someone catch it before the work begins, or after?
Gap 03 · decision
  1. Do AI-generated artifacts come framed around a decision, recommendation, or next action, rather than around comprehensiveness?
  2. Has any decision been faster in the last quarter because of an AI output? Name it.
Five or six yeses: the system is working. Three or four: the gap is open and visible in your metrics. Fewer than three: the AI investment is decoration.
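If you want to run the six questions across several teams or portfolio companies, the scoring rule above is easy to encode. This is a sketch under my own naming (`read_scorecard` is hypothetical); the thresholds and verdicts are the ones stated above, and a yes-but should be entered as `False`.

```python
def read_scorecard(answers):
    """Return the verdict for six yes/no answers (True = yes, False = no)."""
    if len(answers) != 6:
        raise ValueError("expected exactly six answers, two per gap")
    yeses = sum(1 for answer in answers if answer is True)
    if yeses >= 5:
        return "the system is working"
    if yeses >= 3:
        return "the gap is open and visible in your metrics"
    return "the AI investment is decoration"


# Example: both context questions are a yes, one handoff question is a yes,
# and neither decision question is.
team_answers = [True, True, True, False, False, False]
print(read_scorecard(team_answers))
```

Three yeses lands in the middle band: the gap is open and visible in your metrics.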

What closing the gaps looks like.

This is not a prompting technique or a tooling upgrade. It is a different operating model. The teams I watch moving operational metrics built each of these three deliberately, not as a side effect of better tools. Three things change.

Org grounding flows through every output and every handoff.

No product manager remembering to add context. No generic outputs that could belong to any company. The reasoning behind the brief travels with it through the org, so the builder gets the why, not just the what.

Artifacts arrive shaped for where they go.

A development-ready spec comes out of a discovery session ready for engineering, not three rounds of reformatting later. Each artifact is shaped for the role and the decision waiting at the next step.

Decisions come out of the process, not after it.

Outputs are framed around the tradeoff, the recommendation, the next action. Not comprehensiveness. The team can name which decisions got faster this quarter, and point at the specific AI artifact that moved them.

Today: Spend → What's missing: System → What gets reported: EBITDA lever

The middle box is the work most orgs skipped. This is the gap between AI as a cost line and AI as an EBITDA lever.

If this maps to what you're seeing in your portfolio or your own org, I'd be glad to talk. More on embedded product leadership for PE-backed SaaS.
