On a slide near the start of his GTC keynote last month, Jensen Huang sketched five rectangles stacked on top of each other. Energy at the bottom. Then chips.
Then infrastructure. Then models. Applications at the top.
That stack, he said, is what an AI factory looks like now.
Everything around AI is changing at breakneck speed, and keynotes are one of the few moments where someone tries to pause the spin and describe what’s actually taking shape.
A few hours of slides, announcements, and demos later, four observations stayed with us. Each one sharpens the case European organizations already have for building AI on their own terms.
From cost center to value center
Framed as a five-layer stack, an AI factory is a production facility. Its output gets measured in tokens generated per unit of energy and capital invested.
NVIDIA offered its own analogy on stage. The more cars BMW’s factories produce, the more money BMW makes. AI factories work the same way, with tokens instead of cars coming off the line.
That shift stayed with Niels van Rees, co-founder at MDCS.AI, long after the keynote ended. Once intelligence tokens become the economic unit, the datacenter stops being a cost center. Output, efficiency, and return all become directly measurable.
For a CFO, that changes the conversation. The hardware in the corner of the building starts earning its own line on the P&L.
A GPU cluster generating meaningful tokens around the clock reads differently on a dashboard than one sitting mostly idle between experiments. Token throughput and cost-per-token turn into numbers that end up in board decks.
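To make those two numbers concrete, here is a back-of-the-envelope sketch of how they fall out of throughput, power draw, and hourly cost. Every figure in it is an assumed placeholder, not something from the keynote or from MDCS.AI.

```python
# Illustrative only: all numbers below are assumed placeholders,
# not figures from NVIDIA, the keynote, or MDCS.AI.

tokens_per_second = 50_000      # sustained cluster throughput (assumed)
power_draw_kw = 120             # cluster power draw in kilowatts (assumed)
cost_per_hour_eur = 400         # amortized hardware + energy + ops per hour (assumed)

tokens_per_hour = tokens_per_second * 3600
tokens_per_kwh = tokens_per_hour / power_draw_kw              # energy efficiency
eur_per_million_tokens = cost_per_hour_eur / (tokens_per_hour / 1e6)

print(f"tokens per kWh:    {tokens_per_kwh:,.0f}")            # 1,500,000
print(f"EUR per 1M tokens: {eur_per_million_tokens:.2f}")     # 2.22
```

Two lines of arithmetic, and an idle cluster suddenly has a number attached to it. That is the whole cost-center-to-value-center shift in miniature.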
When inference becomes the dominant workload
For years, the headlines belonged to training. How big the model, how many parameters, how long the training run. The keynote made clear that the center of gravity has moved.
Reasoning models now think through a problem step by step before answering. A single complex question can chain through many internal thinking steps before a single line of output reaches the user.
Agentic models go further. They read files, call tools, check what they just did, correct course, and repeat until the task is finished. Every one of those steps happens at inference time, on live compute, in production.
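Here is a minimal sketch of that loop, with stub helpers so it runs standalone. The helper names are hypothetical, not any vendor's API; the point is that every pass through the loop is another inference call on live compute.

```python
# A sketch of the agent loop described above. The stubs stand in for what
# would be model or tool calls in a real system; the names are illustrative.

def plan_next_step(context):          # would be one inference call
    return f"step {len(context)}"

def call_tool(step):                  # would read files, call APIs, etc.
    return f"result of {step}"

def looks_done(context):              # another inference call: self-check
    return len(context) >= 4

def run_agent(task, max_steps=10):
    context = [task]
    for _ in range(max_steps):
        step = plan_next_step(context)     # decide the next action (inference)
        context.append(call_tool(step))    # act on live data (inference + I/O)
        if looks_done(context):            # review the result, correct course
            return context
    return context                         # step budget exhausted

print(run_agent("summarise the quarterly report"))
```

Even in this toy version, one task fans out into a chain of inference calls. Multiply that by every agent in production and the shift from training-dominated to inference-dominated compute follows almost mechanically.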
According to NVIDIA’s own framing, compute demand for inference workloads has grown roughly a millionfold over the past two years. Even if that number invites scrutiny, the direction is unmistakable.
Jensen Huang told the room NVIDIA now has at least $1 trillion in orders booked through 2027. An order book that size points to inference scaling out in production.
Tokens per watt and tokens per dollar move from marketing slide to operating dashboard.
Trusted data is no longer optional
When AI only generated text for a human to review, a wrong answer was an annoyance. Someone edited it or deleted it. That safety net disappears the moment agents start acting on data directly.
One of the watch-party speakers illustrated the point well. A professor was caught on Dutch television using a fabricated Einstein quote produced by ChatGPT. On a comedy show, the clip is funny.
Inside a production workflow, the same kind of error plays out very differently. An agent triggering wrong payouts at an insurer, on the basis of unreliable data, causes real financial and reputational damage.
The same logic applies across finance, government, and healthcare. A wrong answer in that world becomes a payment someone shouldn’t have received, or a record that shouldn’t exist.
Trusted data, with transparent lineage and auditable access, becomes the precondition for letting agents touch production work at all.
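One way that precondition can look in code is a lineage check that gates every action an agent wants to take. The schema and allow-list below are illustrative assumptions, not a specific product's design.

```python
# Sketch of the precondition described above: an agent may only act on a
# record whose full lineage comes from trusted, audited sources.
# Field names and the policy itself are illustrative assumptions.

TRUSTED_SOURCES = {"claims_db", "policy_db"}   # assumed allow-list

def may_act_on(record: dict) -> bool:
    lineage = record.get("lineage", [])
    return bool(lineage) and all(
        step["source"] in TRUSTED_SOURCES and step.get("audited", False)
        for step in lineage
    )

payout = {
    "amount_eur": 2_500,
    "lineage": [
        {"source": "claims_db", "audited": True},
        {"source": "llm_summary", "audited": False},   # generated, unverified
    ],
}

assert not may_act_on(payout)   # the agent must stop here, not pay out
```

The check itself is trivial. What makes it possible is the hard part: data that actually carries its lineage, and access logs someone can audit after the fact.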
Sovereignty was there, just not named
Jensen Huang didn’t speak about sovereignty in the political sense. He didn’t have to.
The building blocks were all there, sitting quietly inside the technical story. Jensen talked about full-stack control, and open models that move between NVIDIA systems and any other stack.
He mentioned AI factories deployable in any country or region, fully on-premise. And confidential computing that keeps data invisible, even to the operator of the machine.
That’s how Raymond Drielinger, co-founder at MDCS.AI, read the keynote. It matches what we see in European boardrooms every week.
In those boardrooms, sovereignty shows up as a short list of practical questions a CTO brings to architecture conversations:
- Where does the data sit, and under which jurisdiction?
- How does the stack get upgraded, and who decides when?
- What happens when the vendor relationship ends?
- Who can see what, including the operator of the infrastructure?
The answers to those questions are what sovereignty looks like at the architecture level.
What European organizations do with this
Most of what Jensen Huang described on stage maps onto infrastructure that MDCS.AI, together with ecosystem partners including NVIDIA and Cisco, has already been building.
End-to-end AI stacks, designed and managed as one environment. Built on hardware that runs open models across stacks. Deployed inside EU borders, on terms the customer sets.
The keynote sharpened the reasons behind that direction. As inference turns into the dominant workload, as agents start acting on live data, and as token economics become line items on the budget, the case for owning the stack and controlling the data gets easier to make in any serious AI conversation.
European CTOs and CFOs are looking at the same options they had six months ago. They’re evaluating them faster now, and with sharper attention to where each one places AI infrastructure on the cost-center-versus-value-center line.
The real question is how to build on European terms, with clarity on where the data lives, what it costs, and what it returns. Racing the hyperscalers or retreating from them misses the point.
That’s the conversation the NVIDIA GTC keynote quietly set the table for.
If these observations match questions coming out of your own boardroom, we’re always glad to compare notes.
