A new class of hardware, sometimes called “memory godboxes”, is emerging to address one of the most pressing bottlenecks in modern datacentres: the chronic shortage of RAM driven by AI workloads. Writing in The Register, Systems Editor Tobias Mann examines how networked memory appliances built on the Compute Express Link (CXL) standard could fundamentally change how memory is provisioned and shared across servers. The catch? AI may consume any resulting savings as fast as they are made.
The RAMpocalypse: Why Memory Has Become a Crisis
AI workloads are notoriously memory-hungry. During inference, the process of running a trained AI model to generate responses, systems must maintain what are known as key-value (KV) caches. These caches hold the attention keys and values already computed for each request’s tokens, so the model does not have to recompute them for every new token it generates. In multi-tenant environments (where many users share the same hardware) the combined KV cache frequently exceeds the size of the model itself.
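For a sense of scale, KV cache size can be estimated with a commonly used back-of-the-envelope formula: two tensors (keys and values) per layer, each holding one vector per KV head per token. The sketch below applies it to a hypothetical model. Every number in it (layer count, head count, context length, parameter count, concurrent requests) is an illustrative assumption, not a figure from The Register's article.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bytes_per_value=2):
    """Approximate KV cache size for one sequence.

    Keys and values are each stored per layer, per KV head, per token,
    hence the leading factor of 2.
    """
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_value


# Hypothetical model shape, chosen only for illustration.
LAYERS = 80
KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2          # 16-bit values
PARAMS = 70e9                # assumed parameter count
model_bytes = PARAMS * BYTES_PER_VALUE

per_request = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens=32_768)
concurrent_requests = 64     # assumed multi-tenant load
total_cache = per_request * concurrent_requests

GIB = 1024 ** 3
print(f"model weights:      {model_bytes / GIB:.1f} GiB")
print(f"KV cache / request: {per_request / GIB:.1f} GiB")
print(f"KV cache, {concurrent_requests} requests: {total_cache / GIB:.1f} GiB")
```

Under these assumptions each long-context request holds 10 GiB of cache, so 64 concurrent requests need 640 GiB, several times the roughly 130 GiB occupied by the weights.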
When system memory fills up, these caches spill over to flash storage. That is a problem for two reasons: flash is significantly slower than RAM, degrading inference performance; and flash has finite write endurance, meaning the constant churn of AI workloads physically wears out the storage hardware over time.
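The endurance concern can be put into rough numbers. Flash drives are commonly rated in terabytes written (TBW), and dividing that rating by the daily write volume gives an approximate lifetime. The rating and the write rates below are hypothetical values chosen for illustration, not measurements of any real drive or workload.

```python
def drive_lifetime_days(rated_tbw, writes_tb_per_day):
    """Days until the rated write endurance (TBW) is used up."""
    return rated_tbw / writes_tb_per_day


# Hypothetical figures for illustration only.
RATED_TBW = 7_000            # assumed endurance rating, in terabytes written
LIGHT_WRITES_TB_DAY = 1      # assumed general-purpose workload
SPILL_WRITES_TB_DAY = 20     # assumed constant KV cache spill-over

light_days = drive_lifetime_days(RATED_TBW, LIGHT_WRITES_TB_DAY)
spill_days = drive_lifetime_days(RATED_TBW, SPILL_WRITES_TB_DAY)
print(f"light workload: {light_days / 365:.1f} years")
print(f"KV cache spill: {spill_days / 365:.1f} years")
```

With these assumed figures, the same drive that would outlast its server under light use reaches its rated endurance in under a year when it absorbs continuous cache spill.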
The result is sustained, structural pressure on RAM availability and cost across the industry — what The Register dubs the “RAMpocalypse.”
What Is a Memory Godbox?
A memory godbox is a networked memory appliance: a dedicated piece of hardware that pools large quantities of RAM and makes it available to multiple servers over a high-speed interconnect, rather than requiring each individual server to carry all the memory it might ever need. Think of it as treating memory the way cloud computing treats compute — as a shared, on-demand resource rather than a fixed allocation per machine.
The technology making this practical is Compute Express Link (CXL), a cache-coherent interface standard that connects CPUs, memory, accelerators, and other peripherals using a common protocol. CXL runs over the PCIe physical layer, meaning it can use the high-speed interconnect infrastructure already present in modern servers.
CXL: Seven Years in the Making
CXL has evolved through several generations, each adding meaningfully to what is possible:
- CXL 1.0 — Introduced memory expansion modules, allowing a server to attach additional RAM via a PCIe slot. Useful, but limited to a single machine at a time.
- CXL 2.0 (2020) — Added basic switching, enabling memory pooling: a pool of RAM that can be partitioned and assigned to different hosts, though not accessed simultaneously by multiple systems.
- CXL 3.0 — The significant leap. Enables true memory sharing: multiple machines can access the same physical memory simultaneously. Operates over a PCIe 6.0 baseline providing 16 GB/s of bidirectional bandwidth per lane (about 8 GB/s in each direction). With 64 lanes per CPU, that works out to up to 512 GB/s of additional bandwidth in each direction, or roughly 1 TB/s bidirectional. Round-trip latency sits at approximately 170–250 nanoseconds, comparable to a NUMA hop within a traditional server.
- CXL 3.1 — Added confidential computing capabilities for workload isolation where required.
- CXL 4.0 (ratified late 2025) — Doubles bandwidth to 32 GB/s per lane by rebasing on PCIe 7.0.
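The per-lane figures in the list translate into aggregate bandwidth by simple multiplication. The sketch below works that through, assuming the quoted per-lane numbers are bidirectional totals (so each direction gets half) and reusing the 64-lane count from the CXL 3.0 entry. The dictionary and function names are illustrative only.

```python
# Per-lane bidirectional bandwidth in GB/s, as quoted in the list above.
PER_LANE_BIDIRECTIONAL_GBPS = {
    "CXL 3.0 (PCIe 6.0)": 16,
    "CXL 4.0 (PCIe 7.0)": 32,
}


def aggregate_bandwidth(per_lane_bidirectional, lanes):
    """Return (per-direction, bidirectional) aggregate bandwidth in GB/s."""
    bidirectional = per_lane_bidirectional * lanes
    per_direction = bidirectional / 2
    return per_direction, bidirectional


LANES = 64  # lane count used in the CXL 3.0 example
results = {
    name: aggregate_bandwidth(rate, LANES)
    for name, rate in PER_LANE_BIDIRECTIONAL_GBPS.items()
}
for name, (one_way, both_ways) in results.items():
    print(f"{name}: {one_way:.0f} GB/s each way, {both_ways:.0f} GB/s total")
```

These are raw link figures. Protocol overhead and the memory devices behind the link will reduce what an application actually sees.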
Amazon’s Graviton5 CPUs already support the CXL 3.0 specification, and AMD and Intel’s current Epyc and Xeon server processors support the existing appliances. The next generation of server CPUs and GPUs is expected to push CXL 3.0 adoption into the mainstream.
Products Already on the Market
This is not purely theoretical. Several vendors are already shipping or actively developing memory appliance hardware:
- Panmnesia PanSwitch — A CXL 3.2-compatible switch with 256 lanes of connectivity, capable of connecting memory modules, devices, or CPUs in large-scale topologies.
- Liqid Composable Memory Platform — Provides up to 100 TB of pooled DDR5 memory accessible by as many as 32 host servers simultaneously.
- UnifabriX Max — Offers CXL 1.1 or 2.0 connectivity to 16 or more systems, with CXL 3.2 support in development.
The Catch: AI May Eat the Savings
Here is where the analysis takes a sobering turn. Memory godboxes could, in principle, let enterprises right-size the RAM in individual servers and draw on a shared pool only when demand spikes — reducing both capital cost and the waste of provisioning peak capacity in every machine. In a world of diverse, mixed workloads, that is a compelling proposition.
But the dominant workload driving RAM demand right now is AI inference — and that workload will consume the additional pooled memory for the same KV cache offloading purposes that are causing the shortage in the first place. The expanded capacity does not eliminate the problem; it moves the ceiling upwards until AI fills the room again.
As The Register puts it: memory godboxes could theoretically reduce enterprise infrastructure costs, but AI adoption will likely consume those resources for KV cache offloading — “bad news for enterprises looking to these memory godboxes for salvation from the RAMpocalypse.”
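The right-sizing argument, and the reason AI undermines it, can be made concrete with a small sketch. It compares provisioning every server for its own peak against giving each server its baseline and covering spikes from a shared pool. The server names, the demand figures, and the assumption that at most two servers spike at once are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    baseline_gb: int   # memory needed most of the time
    peak_gb: int       # memory needed during a demand spike

    @property
    def burst_gb(self) -> int:
        return self.peak_gb - self.baseline_gb


def per_server_provisioning(servers):
    """Every server carries enough local RAM for its own peak."""
    return sum(s.peak_gb for s in servers)


def pooled_provisioning(servers, max_concurrent_spikes):
    """Local RAM covers baselines; a shared pool covers the largest
    bursts that are assumed to overlap at any one time."""
    local = sum(s.baseline_gb for s in servers)
    bursts = sorted((s.burst_gb for s in servers), reverse=True)
    pool = sum(bursts[:max_concurrent_spikes])
    return local + pool


# Hypothetical mixed-workload fleet.
fleet = [
    Server("database", 256, 512),
    Server("virtualisation", 384, 768),
    Server("analytics", 128, 1024),
    Server("inference", 512, 1024),
]

fixed = per_server_provisioning(fleet)
pooled = pooled_provisioning(fleet, max_concurrent_spikes=2)
print(f"per-server peak provisioning: {fixed} GB")
print(f"baseline + shared pool:       {pooled} GB")

# If every server is a saturated inference node, baseline equals peak
# and the pool has nothing left to save.
saturated = [Server(f"inference-{i}", 1024, 1024) for i in range(4)]
print(per_server_provisioning(saturated) == pooled_provisioning(saturated, 2))
```

In the mixed fleet the pooled design needs 2,688 GB against 3,328 GB for per-server peaks. In the saturated inference fleet the two totals are identical, which is the article's point: a workload that always runs at its peak leaves nothing for pooling to reclaim.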
What This Means for IT Infrastructure Planning
For IT teams managing on-premises infrastructure or advising on server procurement, the CXL story has a few practical takeaways:
- Memory is the new compute bottleneck. When specifying servers for AI-adjacent workloads — even indirectly, such as running applications that call cloud AI APIs — RAM capacity deserves as much attention as CPU core count or storage throughput.
- CXL support is a future-proofing consideration. If you are buying server hardware in 2026 with a three-to-five-year lifespan, checking for CXL 3.0 support in the CPU specification is a worthwhile step. It preserves the option of attaching pooled memory appliances as the ecosystem matures.
- Memory pooling is most valuable in mixed-workload environments. Organisations running diverse server workloads with variable memory demand — databases, virtualisation, analytics, and AI inference on the same infrastructure — stand to gain most from pooled memory. Homogeneous AI inference clusters will likely consume every byte regardless.
- The cloud abstraction hides but does not eliminate this problem. If your AI workloads run in the cloud, the memory bottleneck is the cloud provider’s problem to solve — but it feeds directly into the GPU and compute instance availability and pricing you experience. Understanding what is driving those costs helps you make better procurement and architecture decisions.
CXL-based memory pooling is a genuinely important technology that is, after seven years of development, finally approaching practical viability at scale. Whether it delivers meaningful cost relief or simply enables the next tier of AI memory consumption will depend largely on how organisations choose to deploy it — and how aggressively AI workload growth continues to outpace hardware advances.
If you are planning a server refresh or evaluating your infrastructure capacity for AI workloads, speak to the BIT Tech team for independent guidance on hardware specification and architecture.

