Defining Value from AI

I was talking with a fellow CIO a few weeks ago. He had just walked his board through eighteen months of AI investment. Adoption numbers were strong. Three pilots had been promoted to production. The internal NPS on the tools was high. Then his CFO asked one question: "What did any of this do to our cost-to-serve?" The room went quiet. Not because the answer was bad. Because nobody had ever defined what a good answer would look like.

That scene plays out more often than most leaders want to admit. And it gets at something I think the industry is only starting to reckon with.

Global AI spending is projected to hit $665 billion this year. Nearly every leadership team has increased their AI budget. The CFO holdout population has collapsed from 70% to 4% in five years. The money is flowing. The conviction is real.

And yet 73% of AI deployments fail to achieve their projected ROI. McKinsey reports that only 11% of companies can confidently measure the return on their AI investments. Deloitte's 2026 data is even more blunt: 56% of enterprises have captured neither revenue gains nor cost savings from their AI spending.

This is not usually a technology failure. The models work. The infrastructure is better and cheaper than it has ever been. What most organizations are missing is more fundamental than that. They cannot tell you what AI is worth to them because they never decided what "value" means in the first place.

The metrics we use are wrong

The most common way companies measure AI today is hours saved. It is also, I think, the most misleading.

If AI saves an employee two hours a day and that person spends those two hours on low-priority work, you saved nothing. Hours saved is an input metric. It tells you the machine is faster. It does not tell you the business is better.

Economists have a name for this: Jevons Paradox. Efficiency improvements lead to increased consumption rather than net savings. Finance teams use AI to accelerate month-end close, then fill the recovered hours with more variance analysis. Marketing generates content faster, then simply produces more of it. HR automates resume screening, then interviews more candidates without improving quality of hire. The work expands to consume whatever time becomes available. I suspect most organizations have experienced some version of this already, even if they haven't named it.

The same problem shows up in adoption metrics. "78% of employees have logged in" tells a board nothing about whether the investment is working. If 60% of those users stopped after the first week, your real adoption rate is closer to 31%. Sustained usage after 90 days is a signal. Launch week logins are noise.

And accuracy without a baseline is meaningless. If your human process runs at 99% and your AI achieves 95%, you just made things worse. The number only matters relative to what it replaced, weighted by the cost of getting it wrong.
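
To make the point concrete, here is a minimal sketch of that comparison, with entirely hypothetical volumes, accuracies, and error costs. A process can be cheaper per transaction and still lose money once the accuracy delta is priced in.

```python
# Hypothetical comparison: accuracy only matters relative to the baseline it
# replaces, weighted by what a mistake costs. All numbers are illustrative.

volume = 10_000          # transactions per month
cost_per_error = 250.0   # assumed cost of one mistake (rework, refunds, risk)

human_accuracy = 0.99
ai_accuracy = 0.95

human_error_cost = volume * (1 - human_accuracy) * cost_per_error   # $25,000
ai_error_cost = volume * (1 - ai_accuracy) * cost_per_error         # $125,000

# Suppose the AI also cuts processing cost per transaction.
human_unit_cost, ai_unit_cost = 4.00, 1.50
processing_savings = volume * (human_unit_cost - ai_unit_cost)      # $25,000

net_change = processing_savings - (ai_error_cost - human_error_cost)
print(f"Net monthly impact: {net_change:,.0f}")  # about -75,000: cheaper per unit, and still worse overall
```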

What is starting to shift, at least in the research, is that productivity is losing its grip as the headline metric. Futurum Group's latest data shows direct financial impact nearly doubling as the primary measure while productivity dropped almost six points. The market has moved past the productivity argument. Most companies' measurement frameworks have not caught up.

Three kinds of value, not one

When Thomson Reuters set out to evaluate AI tools across their organization, they ran into something I think more companies will recognize as their own measurement matures. Time savings told less than half the story. Roughly 60% of the value they found was not efficiency at all. It was capability expansion: people doing work they literally could not do before.

A business analyst with no coding background started building interactive data visualizations. A junior team member produced market analysis that matched senior-level quality. A non-technical program manager created automated reporting dashboards without involving engineering. None of that shows up in a "time saved" calculation. In some cases, these people spent more time, not less. But they produced output that previously required specialized roles the organization could not access or afford.

This maps to what I think is the right way to categorize AI value. There are three types, and most companies only measure the first:

Efficiency. The same work, done faster or cheaper. Cycle time reduction, cost per transaction, error rates. Real and important. But it is the floor, not the ceiling.

Throughput. The same team, handling meaningfully more volume. Thomson Reuters found that 61% of use cases reported a 2x capacity increase. That is not "saving time." It is expanding what a team can do without expanding headcount — a support team resolving twice the tickets, a legal team reviewing twice the contracts, with the cost staying roughly flat.

Capability. New work that was not possible before. This is the category most measurement systems miss entirely, and it is often where the most strategic value hides. When a generalist can do specialist-level work, when a team can tackle problems they previously had to outsource or skip, you are not optimizing. You are changing what the organization can do. That is a harder thing to put a number on, which is probably why so few companies try.
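
One way to keep those three categories from collapsing back into a single "time saved" number is to price them separately. A rough back-of-envelope sketch, with hypothetical figures that are not drawn from any of the companies above:

```python
# Pricing the three value types separately. All figures are illustrative.

# 1. Efficiency: the same work at a lower cost per unit.
tickets_per_month = 5_000
old_cost_per_ticket, new_cost_per_ticket = 12.00, 9.40
efficiency_value = tickets_per_month * (old_cost_per_ticket - new_cost_per_ticket)

# 2. Throughput: the same team absorbing more volume at roughly flat cost.
extra_tickets_absorbed = 4_000         # added volume handled without added headcount
throughput_value = extra_tickets_absorbed * old_cost_per_ticket   # cost avoided at the old rate

# 3. Capability: work that previously had to be outsourced or skipped entirely.
analyses_brought_in_house = 6
outsourced_cost_per_analysis = 15_000  # what a specialist firm would have charged
capability_value = analyses_brought_in_house * outsourced_cost_per_analysis

for label, value in [("efficiency", efficiency_value),
                     ("throughput", throughput_value),
                     ("capability", capability_value)]:
    print(f"{label:>10}: ${value:,.0f} per month")
```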

BCG's research reinforces this from a different angle. In a typical AI implementation, only 10% of the value comes from the algorithms and 20% from the technology and data. The remaining 70% comes from managing process change. If you are measuring only the efficiency layer, you are fishing in the shallowest part of the pond.

The value that disappears

Here is the part that should worry every CIO, and it is something I have seen play out more than once.

Even when organizations can point to specific efficiency gains from AI, those gains often do not show up in the P&L. BCG found this pattern consistently: without a clear plan for what happens with freed-up capacity, the savings vaporize.

The critical question most teams skip is deceptively simple. If AI makes a process 15% more efficient, what happens to that 15%?

There are only three honest answers. The team gets smaller. The team takes on more work that moves the needle. Or the time gets absorbed into nothing — more meetings, more reporting, more activity that feels productive but does not change a financial outcome. That third option is the default when no one makes a deliberate choice. And in my experience, it is the most common outcome by a wide margin.
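
The arithmetic behind that question is worth writing down, because it shows how quickly an impressive-sounding percentage rounds to zero when nobody decides. A sketch with assumed team size and cost figures:

```python
# Where does a 15% efficiency gain actually go? Illustrative numbers only.

team_size = 20
fully_loaded_cost = 120_000   # assumed annual cost per person
efficiency_gain = 0.15

freed_capacity_fte = team_size * efficiency_gain            # 3.0 FTE-equivalents

# Answer 1: the team gets smaller, and the gain reaches the P&L.
headcount_savings = freed_capacity_fte * fully_loaded_cost  # $360,000 per year

# Answer 2: the team takes on work that moves the needle. The value is real,
# but it only counts if someone measures the outcome that time is pointed at.

# Answer 3: nobody decides, the capacity is absorbed, and the P&L impact is zero.
absorbed_value = 0

print(f"Freed capacity: {freed_capacity_fte:.1f} FTE-equivalents")
print(f"If converted to headcount reduction: ${headcount_savings:,.0f} per year")
print(f"If absorbed by default: ${absorbed_value:,.0f}")
```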

IBM's cost transformation is instructive here. They reduced annual operating costs by more than $4.5 billion, but the savings were real because they restructured around them — deliberate process redesign, function right-sizing, platform consolidation. More than 90% of HR inquiries now go through an AI chatbot, which dropped HR operating expenses by 40%. The chatbot alone did not produce those savings. The organizational decisions around the chatbot did.

An MIT Sloan study found that 61% of enterprise AI projects were approved on the basis of projected value that was never formally measured after deployment. Executives signed off, then moved on. Nobody went back to check. That is not a technology problem. That is a management problem, and it is one that better tooling alone will not fix.

What the top performers do differently

The roughly 30% of organizations seeing meaningful AI returns tend to share a few patterns.

They define success in business terms before they deploy. Not "deploy AI across four departments" — that is an activity metric. Instead: "reduce customer support cost per ticket by 22%." Or "increase throughput per analyst by 2x without adding headcount." The metric is tied to a business outcome, not an AI activity. This sounds obvious, but it is remarkable how often it gets skipped in practice.

They baseline before they build. Forrester's data suggests that companies are three times more likely to scale AI successfully when they define and baseline ROI before deployment. That means documenting the current cost per unit, cycle time, error rate, and headcount before the AI goes live. Without that baseline, you are comparing against a feeling, not a fact. And feelings do not survive a CFO review.
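
The baseline itself does not need to be elaborate. A before/after snapshot of exactly those metrics, captured ahead of go-live and revisited after 90 days, is enough to keep the later conversation grounded in numbers. A minimal sketch, with hypothetical field names and figures:

```python
from dataclasses import dataclass

@dataclass
class ProcessBaseline:
    """Snapshot of a process before AI touches it. Fields are illustrative."""
    cost_per_unit: float      # dollars per ticket, contract, claim, etc.
    cycle_time_hours: float   # end to end, not per step
    error_rate: float         # defects or rework as a fraction of volume
    headcount: float          # FTEs attributed to the process

def delta(before: ProcessBaseline, after: ProcessBaseline) -> dict:
    """Fractional change per dimension; negative means the number went down."""
    fields = ("cost_per_unit", "cycle_time_hours", "error_rate", "headcount")
    return {f: (getattr(after, f) - getattr(before, f)) / getattr(before, f) for f in fields}

# Captured before deployment, then measured again once usage has settled.
before = ProcessBaseline(cost_per_unit=12.00, cycle_time_hours=36, error_rate=0.04, headcount=18)
after = ProcessBaseline(cost_per_unit=9.40, cycle_time_hours=20, error_rate=0.05, headcount=18)

print(delta(before, after))  # the CFO conversation starts from these numbers, not a feeling
```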

They hold AI to the same standard as any other investment. In the organizations seeing returns, the CEO or CFO tends to own the AI ROI conversation directly. It is not delegated to a Chief AI Officer or a VP of Innovation. The person accountable for company financial performance is also accountable for AI financial performance. That alignment changes everything about where, how, and whether AI gets deployed.

Hyundai Motor Manufacturing Czech is a useful example of what this discipline looks like in practice. They deployed AI-driven production scheduling at their plant that produces 1,400 vehicles daily. The result: $540,000 in annual savings with a three-month payback, 74% improvement in primer efficiency, planning time reduced from hours to five minutes. Those numbers are specific because the baseline was specific. They knew exactly what the process cost before AI touched it.

Measuring at the right altitude

The three types of value above — efficiency, throughput, capability — describe what kind of value AI creates. But there is a separate question that matters just as much: at what level are you measuring it?

Most organizations measure at the task level. Did this agent save time on one step? That is useful, but it is the narrowest view. The more consequential measurement happens at the workflow level: did the end-to-end process actually improve? Fewer handoffs, faster cycle times, lower cost-to-serve. AI often creates its biggest value by eliminating the seams between steps, not just by speeding up individual ones. And then there is the enterprise level: does any of this move a business KPI? Revenue per employee. EBIT margin. Customer retention. Working capital efficiency.

If I am being honest, most AI measurement I have seen stops at the task level. The teams building the tools measure task performance because that is what they can control. The business impact — which is what actually matters to a board — requires someone to connect the dots across workflow and financial systems. That connecting work is tedious, cross-functional, and not particularly glamorous, which is exactly why it rarely gets done.

BCG's data makes the case for doing it anyway. AI leaders who measure across this full stack deliver 3x greater cost reduction, 1.6x higher EBIT margins, and 2.7x the return on invested capital compared to their peers. Same technology. Different measurement and management discipline.

Five questions to ask before your next AI investment

If you take nothing else from this, take these five questions into your next budget conversation:

  • What is the specific business outcome we expect? Not "deploy AI." Not "improve productivity." What measurable result, in business terms, will this produce?
  • What does the process cost today? Document the baseline. If you cannot answer this question now, you will not be able to prove value later.
  • What happens with the freed-up capacity? If AI makes a team 20% more efficient, where does that 20% go? If you do not decide in advance, it will almost certainly disappear.
  • Are we measuring at the right altitude? The agent might be fast. The process might still be slow. Make sure you are measuring where value actually accrues.
  • Who owns the ROI number? Not who owns the AI tool. Who owns the business result? If no one specific is accountable for the financial outcome, the project probably will not produce one.

These are not complicated questions. But the majority of organizations spending on AI right now cannot answer them.

The prove-it era

This is the year the patience runs out. Or at least starts to.

Investors expect returns within six months. Only 16% of CEOs think they can deliver on that timeline. Forrester predicts a quarter of planned AI spend will get deferred to 2027 because the returns are not visible yet. CFOs who once approved AI budgets on faith are now demanding the same financial rigor they apply to ERP implementations and headcount decisions.

The "invest and learn" phase is winding down. What replaces it is the "prove it" phase, and the organizations that come out ahead will probably not be the ones with the best models or the most pilots. They will be the ones that defined what value means before they started spending, measured it honestly, and made the organizational changes required to actually capture it.

The technology was never really the bottleneck. The question was always simpler and harder than that: do you know what you are trying to get from it?

Define that clearly enough, and the value tends to follow. Skip it, and you are just adding to the $665 billion.