AI Is Insatiable
What Happened
While browsing our website a few weeks ago, I stumbled upon “How and When the Memory Chip Shortage Will End” by Senior Editor Samuel K. Moore. His analysis focuses on the current DRAM shortage caused by AI hyperscalers’ ravenous appetite for memory, a major constraint on the speed at which large language models can scale.
Our Take
DRAM prices have climbed ~40% since late 2024. Hyperscalers bulk-buying HBM3 for LLM inference is draining standard server memory supply — the same memory your self-hosted inference stack competes for.
On vLLM or TGI, memory is your binding constraint — not compute. Most teams still size GPU fleets by FLOP capacity. That's the wrong unit. KV-cache overflow kills throughput at scale long before CUDA cores become the bottleneck.
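The arithmetic behind that claim is easy to check. As a sketch, assuming a Llama-70B-style configuration (80 layers, 8 grouped-query KV heads, head dimension 128, fp16 cache — all assumed numbers, not taken from the article):

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, dtype_bytes: int) -> int:
    # 2x because each layer must retain both a K and a V tensor per token.
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

# Assumed Llama-70B-style config: 80 layers, 8 KV heads (GQA),
# head_dim 128, fp16 cache (2 bytes per element).
per_token = kv_cache_bytes_per_token(80, 8, 128, 2)  # 327,680 B ~ 320 KiB/token
per_seq_8k = per_token * 8192                        # ~2.5 GiB per 8k-token sequence
print(per_token, per_seq_8k / 2**30)
```

At roughly 2.5 GiB of cache per full-context sequence, a few dozen concurrent requests consume more memory than the model weights themselves — which is why cache, not FLOPs, is the unit that runs out first.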
Self-hosted teams running Llama 70B+ should reprice hardware budgets now. Teams on managed APIs — OpenAI, Anthropic — can ignore this for another two quarters.
What To Do
Size your vLLM deployment by KV-cache memory budget first, not GPU count, because DRAM scarcity is now the real constraint on self-hosted inference throughput.
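A back-of-envelope sizing helper makes the budget concrete. This is a sketch under stated assumptions — the 140 GiB fp16 weight figure, 80 GiB card, and 2.5 GiB-per-sequence cache cost are illustrative, and a real deployment should be tuned against the KV block count vLLM reports at startup:

```python
def max_concurrent_seqs(gpu_mem_gib: float, weight_gib: float,
                        kv_gib_per_seq: float, util: float = 0.9) -> int:
    """Rough count of sequences one GPU can hold before KV-cache overflow.

    `util` mirrors vLLM's gpu_memory_utilization cap; activation memory
    and framework overhead are ignored for simplicity.
    """
    budget = gpu_mem_gib * util - weight_gib
    return max(0, int(budget // kv_gib_per_seq))

# Assumed: 8x 80 GiB GPUs, ~140 GiB of fp16 70B weights sharded evenly
# (17.5 GiB per GPU), and ~2.5 GiB of KV cache per 8k-token sequence
# sharded the same way (~0.31 GiB per GPU).
print(max_concurrent_seqs(80, 140 / 8, 2.5 / 8))
```

Note that the count goes to zero as soon as weights eat the utilization budget — which is the failure mode FLOP-based sizing never surfaces.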
Builder's Brief
What Skeptics Say
Chip shortage narratives historically precede overbuilding and gluts — current AI-driven DRAM demand may reflect a cyclical surge rather than a permanent new floor, and markets are pricing in the optimistic scenario.