Most enterprises plug expensive GPUs into networks built for office traffic.
A company invests in high-end GPUs. Racks them. Connects them to the existing enterprise network. Flips the switch. And then waits for AI performance that never quite arrives.
Raymond Drielinger, founder of MDCS.AI, sees this play out at organizations across the Netherlands. The GPUs are top-tier. The network feeding them was designed to handle email, file shares, and video calls. Nobody redesigned it for AI.
In a recent episode of the MDCS.AI podcast series Cisco: More Than Networking, Raymond and Sander ten Hoedt (Cisco, Leader Cloud & AI Account Executives, BeNeLux & Switzerland) sat down in Cisco’s Amsterdam studio to talk about what actually happens inside an AI factory when the data flow can’t keep up. And why that directly drives up the metric that will define AI economics: cost per token.
GPU shopping with an office network
The pattern is consistent. An organization buys GPUs, adds some storage, connects it all to its existing network fabric, and calls it an AI platform.
The assumption sounds reasonable: if the GPUs are powerful enough, the rest follows. But a GPU can only process data as fast as the network delivers it. When the network runs at 100 or 200 gigabit and the workload demands 400 or 800, the compute sits idle. You’re paying for a race car engine and starving it of fuel.
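A quick back-of-envelope makes the gap concrete. The sketch below is deliberately simplified, assuming a purely network-bound pipeline and illustrative bandwidth figures; real ceilings also depend on storage, topology, and congestion:

```python
# Back-of-envelope: how a network bottleneck caps GPU utilization.
# All figures are illustrative assumptions, not vendor benchmarks.

def network_bound_utilization(link_gbps: float, demand_gbps: float) -> float:
    """Fraction of time the GPU can compute if its input stream needs
    demand_gbps but the fabric only delivers link_gbps."""
    return min(1.0, link_gbps / demand_gbps)

for link in (100, 200, 400, 800):
    cap = network_bound_utilization(link, demand_gbps=800)
    print(f"{link:>4} Gbit/s link vs 800 Gbit/s demand -> utilization cap {cap:.0%}")
# 100 Gbit/s caps the GPU at ~12%, 200 at 25%, 400 at 50%.
```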
The trade-offs start early. Raymond describes teams choosing 200 gigabit networking to save on costs, while their workload needs 400 or more. That gap translates directly into GPUs waiting for data instead of processing it.
The fuel line that starves the engine
Raymond reaches for a Formula 1 analogy to make it concrete. You can take Max Verstappen’s engine and put it in your car. But if the fuel tank is too small and the fuel lines too narrow to deliver what the engine needs, that car finishes at the back of the grid. Every single race.
“We can all take Verstappen’s engine block. But if the fuel lines are too narrow to get the fuel to the engine, performance won’t be there. Every race, you finish last.” – Raymond Drielinger
The same dynamic plays out inside AI factories. The GPUs are the engine. The network is the fuel line. When throughput falls short, GPU utilization drops. And every drop means the infrastructure keeps running, the electricity keeps flowing, but the output goes down. Your cost per token goes up without anyone touching the model.
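The arithmetic behind that claim fits in a few lines. The figures below are assumptions for illustration only; the point is that a fixed hourly cluster cost keeps accruing whether or not the GPUs are fed:

```python
# Illustrative cost-per-token sensitivity to GPU utilization.
# Hourly cost and peak throughput are assumed, not measured.

CLUSTER_COST_PER_HOUR = 250.0        # euros: amortized hardware + power (assumed)
TOKENS_PER_HOUR_AT_FULL_LOAD = 50e6  # nominal peak throughput (assumed)

def cost_per_million_tokens(utilization: float) -> float:
    tokens_per_hour = TOKENS_PER_HOUR_AT_FULL_LOAD * utilization
    return CLUSTER_COST_PER_HOUR / tokens_per_hour * 1e6

for u in (0.9, 0.6, 0.3):
    print(f"utilization {u:.0%} -> EUR {cost_per_million_tokens(u):.2f} per 1M tokens")
# 90% -> 5.56, 60% -> 8.33, 30% -> 16.67: same hardware, triple the cost.
```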
Sander ten Hoedt adds a sharp point: GPU utilization is one of the most important metrics of any AI factory. And the bottleneck, he says, is often the network.
From chatbot spikes to continuous peak load
For a basic chatbot, most organizations still manage with loosely assembled components. Each prompt creates a short spike on the system. Concurrency stays low. The infrastructure holds.
Agentic AI changes that picture completely. Agents don’t send occasional prompts. They run sustained workloads across the system, all day. Sander calls it continuous peak load, requiring a fundamentally different infrastructure philosophy.
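A rough sketch shows why that shift matters, with duty cycles and token rates assumed purely for illustration: a thousand chat users mostly sit idle between prompts, while a thousand agents keep the pipeline busy nearly all day.

```python
# Assumed figures: bursty chat users vs. always-on agents.
CHAT_USERS = AGENTS = 1000
CHAT_DUTY_CYCLE = 0.02            # a user generates tokens ~2% of the time
AGENT_DUTY_CYCLE = 0.90           # an agent pipeline is active ~90% of the time
TOKENS_PER_SEC_PER_STREAM = 50    # nominal per-stream generation rate

chat_load = CHAT_USERS * CHAT_DUTY_CYCLE * TOKENS_PER_SEC_PER_STREAM
agent_load = AGENTS * AGENT_DUTY_CYCLE * TOKENS_PER_SEC_PER_STREAM
print(f"chat:   ~{chat_load:,.0f} tokens/s sustained")
print(f"agents: ~{agent_load:,.0f} tokens/s sustained")   # ~45x the chat load
```

Under these assumptions, the same headcount produces roughly forty-five times the sustained demand. Infrastructure sized for chat spikes buckles under that.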
The throughput numbers tell the story. Enterprise AI networking has moved from 200 gigabit to 400, then to 800. The next generation targets 1.6 terabit. Each step up in network speed feeds GPUs faster, raises utilization, and pushes cost per token down.
Cost per token: the metric that connects CTO and CFO
Most companies don’t yet measure their AI output in cost-per-token terms. Sander confirms this. Pharma and financial services are ahead of the curve. For the rest, the metric hasn’t entered the boardroom yet.
It will. Cost per token captures what it costs to produce one unit of AI output. And that number reflects more than your model choice. It reflects your entire infrastructure stack: how well compute, storage, and network work together under load.
Three reasons cost per token creeps up without anyone noticing:
- GPUs sit idle because network or storage can’t deliver data fast enough
- Performance swings cause retries and wasted compute cycles
- No end-to-end observability means the leak stays invisible
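A hedged sketch, with assumed numbers, of how the first two leaks compound; the third leak, missing observability, is what keeps the resulting gap off everyone’s dashboard:

```python
# Each leak multiplies cost per token. All figures are assumptions.
BASE_COST_PER_1M_TOKENS = 5.00  # euros at healthy utilization (assumed)

utilization = 0.55              # GPUs starved by network or storage
retry_overhead = 0.10           # 10% of compute spent redoing failed work

effective = BASE_COST_PER_1M_TOKENS / utilization * (1 + retry_overhead)
print(f"EUR {effective:.2f} vs EUR {BASE_COST_PER_1M_TOKENS:.2f} per 1M tokens")
# Roughly 2x the baseline, and invisible without end-to-end measurement.
```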
Sander puts the consequence plainly: if cost per token runs too high, the plug gets pulled. That, he says, is exactly what happens to many pilots today.
Five checks before you scale your AI factory
Before investing further in GPUs or expanding workloads, five questions are worth asking your own team:
- What is your workload profile? A chatbot creates short spikes. Agentic AI demands sustained throughput. The infrastructure design follows from that distinction.
- Is your network built for continuous AI traffic? Both east-west bandwidth (between GPUs) and north-south bandwidth (to storage and users) need to match the workload.
- What is your GPU utilization during peak and throughout the day? If it drops, find out why before adding more compute. (A minimal sampling sketch follows this list.)
- Can you explain your cost per token and identify which layer drives it up? If the answer is no, you’re making AI investment decisions without seeing the full picture.
- Do you have end-to-end observability across your entire AI stack? Bottlenecks you can’t measure will erode returns before anyone spots them.
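For the utilization question, you don’t need a full observability stack to get a first signal. A minimal sketch, assuming an NVIDIA GPU and the nvidia-ml-py bindings (imported as pynvml); run it during a representative workload window:

```python
import time
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)       # first GPU

samples = []
for _ in range(60):                                 # one sample/sec for a minute
    util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu  # percent busy
    samples.append(util)
    time.sleep(1)

print(f"mean GPU utilization: {sum(samples) / len(samples):.0f}%")
pynvml.nvmlShutdown()
```

If the mean sits well below what you’re paying for, the next question is which layer, network or storage, is starving the GPUs.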
Raymond shares a recognizable scenario. A car that does 320 kilometers per hour goes in for maintenance. Different teams service different components. The car comes back and does 180. Everyone checks their own domain and says everything is fine. You’ve lost half your speed and nobody can tell you why. That is what happens when you run AI infrastructure without a validated, integrated architecture.
If your cost per token keeps climbing while you’re scaling up, that’s often a signal that your data flow isn’t growing with your compute. We’re happy to help you think through your specific situation.
This blog is based on the MDCS.AI podcast episode “Cisco: More Than Networking,” featuring Sander ten Hoedt (Cisco) and Raymond Drielinger (MDCS.AI).
Wondering what your first step could look like? Let’s talk.
