April 21st 2026, Karel V, Utrecht | AI is getting cheaper. Your AI bill isn’t.
Most boards are asking some version of the same question this quarter. AI budgets keep growing. Where is the return?
On April 21st, fifteen senior AI leaders sat down behind closed doors at Karel V in Utrecht. Two hours, no slides, no pitch decks. The AI & ROI Roundtable, hosted by NXT Minds and MDCS.AI with NVIDIA, brought together CDOs, CIOs, and Heads of AI who had moved past pilots and were running AI in production.
Each one of them was watching costs compound, facing a board that wanted numbers.
Cost per token kept dropping that quarter, and total AI spend kept climbing. Both were true at once.
A paradox no calculator resolves.
The board question without a framework
The pilot phase of AI is over. Hard results are expected now. The question on every board’s table has shifted from “can we use AI?” to “where is the return?”
The frameworks built to answer that question were built for SaaS. None of them survive contact with AI workloads in production.
Niels van Rees, Co-founder of MDCS.AI and co-host of the afternoon, framed it directly. AI budgets keep growing, and the board question has changed.
Vera Schut of NXT Minds opened the conversation with a similar observation. The pilot phase is over, hard results are expected, and what stays on the table is one fundamental question. How do you make sure AI investments deliver structural value while costs keep climbing?
Every leader in the room had felt some version of that question land in a quarterly review. We see it every week in conversations with CDOs and Heads of AI across nine sectors.
The question is consistent. The frameworks people reach for to answer it are not.
Why cheaper tokens lead to bigger bills
Cost per token is dropping. Total AI spend keeps growing. Why both at once?
Cost per token is falling. Total AI spend keeps growing. Why? That’s not a coincidence. It’s a consequence of how you design and deploy AI. – Niels van Rees, Co-founder of MDCS.AI
Four multipliers do the work. They run in parallel, and each one alone is enough to bend the spend curve upward.
- Workloads scale faster than prices fall. More prompts per user, longer context windows, multi-turn conversations, agentic patterns running continuous load. Every drop in token price gets outpaced by a rise in token volume.
- Cloud architecture bills per call. Owned hardware amortizes over years. Cloud invoices every prompt, every inference, every user. There is no flat monthly rate when load is variable and growing.
- Stack fragmentation multiplies invoices. Compute, storage, networking, monitoring, orchestration. Different vendors, different contracts, different bills. The total accumulates outside the line item your CFO is watching.
- Pilot to production changes the load shape. A pilot runs ad hoc, mostly during business hours, mostly forgiving. Production runs 24/7 against SLAs, and the same workload suddenly bills around the clock.
Reserved instances drop the unit price. They leave the other three multipliers untouched.
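The compounding above is simple to put in numbers. A minimal sketch, with purely illustrative figures (the 15% quarterly price drop and 40% volume growth are assumptions, not figures from the event), shows unit price falling while the bill climbs:

```python
# Hedged sketch: why falling token prices can still produce a rising bill.
# All rates and volumes below are illustrative assumptions.

def quarterly_spend(price_per_1k_tokens, tokens_per_quarter):
    """Total spend for one quarter, given unit price and token volume."""
    return price_per_1k_tokens * tokens_per_quarter / 1000

price = 0.010           # $ per 1k tokens, assumed to fall 15% per quarter
tokens = 2_000_000_000  # tokens per quarter, assumed to grow 40% per quarter
                        # (more prompts, longer contexts, agentic load)

for q in range(1, 5):
    spend = quarterly_spend(price, tokens)
    print(f"Q{q}: ${price:.4f}/1k tokens -> spend ${spend:,.0f}")
    price *= 0.85   # unit price keeps dropping
    tokens *= 1.40  # volume grows faster than price falls
```

Under these assumptions the unit price falls almost 40% over the year while quarterly spend rises by more than half. Which is exactly the paradox in the room: both curves are real, and only one of them is on the vendor's slide.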
Koen van den Berg from NVIDIA and Niels walked the room through how GPU usage, architecture, and utilization directly determine cost and scalability. The real challenge sits between the layers, in how compute, storage, and software work together.
AI leaks. The question is which kind.
You might read this as “cloud bad, owned good.” The framing went the other way.
AI leaks. Always. In the cloud, through scale and invisible costs. On-prem, through underutilization and architecture choices made with traditional IT logic. – Niels van Rees
Cloud leaks happen on the multiplier side. Every user, every shadow tool, every long-running session compounds against an architecture built for elastic billing.
The leak is structural. It runs in the background until the quarterly invoice arrives.
On-prem leaks happen on the architecture side. Average GPU utilization across most enterprise AI environments sits well below thirty percent.
The bottleneck lives in the fuel line. Storage tiers configured for file shares instead of 800-gigabit AI fabrics, schedulers built for batch jobs rather than continuous inference, monitoring designed for virtual machines and blind to tokens per watt, all of it adds up across the stack.
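The utilization leak is easy to quantify. A hedged sketch, with assumed numbers for amortized monthly cost and peak throughput (not figures from the event), shows how idle capacity inflates the effective cost of every token an owned stack produces:

```python
# Hedged sketch: effective cost per token on owned hardware.
# Cost and throughput figures are illustrative assumptions.

def effective_cost_per_1m_tokens(monthly_cost, peak_tokens_per_month, utilization):
    """Hardware bills the same whether it runs hot or idle; only the
    utilized fraction of peak throughput actually produces tokens."""
    tokens_produced = peak_tokens_per_month * utilization
    return monthly_cost / tokens_produced * 1_000_000

COST = 25_000           # $/month, amortized GPUs + power + ops (assumed)
PEAK = 10_000_000_000   # tokens/month at full utilization (assumed)

for util in (0.9, 0.5, 0.3, 0.1):
    cost = effective_cost_per_1m_tokens(COST, PEAK, util)
    print(f"{util:.0%} utilization -> ${cost:.2f} per 1M tokens")
```

At 30% utilization, every token costs three times what it would at 90%. The invoice never shows that multiplier; it hides inside a fixed line item that looks stable.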
Owned infrastructure isn’t an automatic win. It carries its own leak.
The architectural discipline that closes the on-prem leak differs from the cloud one. Different problem, different fix.
The sharper question is which leak your architecture allows, and which one you are actively designing to close.
What you can actually steer
ROI on AI is hard to calculate. The variables move faster than accounting cycles. Token prices shift quarterly, context-window pricing shifts monthly, and agentic patterns reshape workload assumptions every time a new framework lands.
What you can steer is the foundation.
AI ROI is hard to calculate. What you can steer is the foundation. Your stack, your cost per token, your scalability, your control. – Niels van Rees
Four levers run through the framing. Stack architecture as the way layers work together rather than the choice of any single one. Cost per token as a real management metric instead of GPU hours.
Scalability under realistic production load rather than pilot load. Control over the layers that compound when load grows.
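Treating cost per token as a management metric means dividing the all-in spend, across every fragmented invoice, by the tokens actually delivered. A minimal sketch with hypothetical category names and amounts:

```python
# Hedged sketch: cost per token as a management metric, not GPU hours.
# Invoice categories and amounts are hypothetical, for illustration only.

monthly_invoices = {        # $ across the fragmented stack
    "compute": 18_000,
    "storage": 3_500,
    "networking": 1_200,
    "monitoring": 900,
    "orchestration": 1_400,
}
tokens_served = 4_000_000_000  # tokens actually delivered to users (assumed)

total = sum(monthly_invoices.values())
cost_per_1m = total / tokens_served * 1_000_000
print(f"All-in spend: ${total:,} -> ${cost_per_1m:.2f} per 1M tokens")
```

The point of the metric is the numerator: it only works if every layer of the stack is in it, including the invoices that land outside the line item the CFO is watching.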
On each of these four, an owned AI stack outperforms default-cloud configurations in almost every case we see. Cloud was built for elasticity. The cost shape of production AI is a different problem.
The panel made this concrete. Koen, Surajeet Ghosh at Heineken, and Maarten Kramer at a.s.r. opened up the leadership questions. Speed, performance, sustainability, risk, model quality, all on the table.
What you actually steer on shapes the choice between cloud, hybrid, and an owned stack. Without a sharp value framework, sustainable ROI is close to impossible to land.
Sebastiaan van Duijn at Trivium Packaging then walked the room through how a small team designed and shipped their own AI platform, going from ROI discussions to live infrastructure decisions in months rather than years.
His point landed in the room. AI is not an experiment. It is an economic infrastructure choice.
From discussion to design to running platform, with a team you could fit around a kitchen table.
The ROI paradox resolves through design choices.
AI ROI is not calculated. It is designed.
The cost-per-token narrative will keep delivering reassuring quarterly trends. Total AI spend will keep climbing alongside it. The lever that moves both is the stack underneath.
Stack architecture, cost per token, scalability, control. Four knobs that fit on a whiteboard, available Monday morning. By the end of next quarter, the difference between teams that picked them up and teams that waited will show up on the invoice.
If you are weighing the move from cloud to a sovereign AI stack, this is the moment to design it deliberately. MDCS.AI helps you take that step in a controlled way. [Get in touch]
Thanks to Vera and the NXT Minds team for hosting, and to every leader at the table for two hours of straight talk.
