AI Markets · 8 min read

Google’s TurboQuant Compresses AI Memory 6x—But the Real Story Is Why Total Demand May Surge

Efficiency gains historically expand resource consumption rather than contract it, potentially accelerating the HBM supercycle to $100B+ by 2028.

Google’s TurboQuant algorithm achieved 6x memory compression in late March 2026, triggering immediate panic-selling across memory stocks—yet hyperscaler capex projections continue climbing toward $700 billion this year, suggesting the market misread efficiency as demand destruction.

The algorithm compresses the KV cache from the standard 16 bits per value down to 3 bits with zero measurable accuracy loss, according to Google Research. Micron fell approximately 5% post-earnings, while Western Digital and Sandisk dropped 7-11% within 48 hours of the announcement. The immediate read: if AI models need less memory, aggregate demand for HBM must contract.

But structural data contradicts that narrative. Moody’s projects hyperscaler capex will reach $700 billion in 2026 across the six largest US firms, rising to $820 billion in 2027. Memory’s share of that spend is expanding sharply: SemiAnalysis forecasts memory will account for approximately 30% of hyperscaler capex this year, up from ~8% in 2023-24, per Digital Today. Amazon’s AWS capex alone is expected to reach ~$200 billion in 2026, with a backlog of $244 billion as of February, a 40% year-over-year increase.

HBM Market Acceleration
  • 2025 TAM: $38B
  • 2026 Projection: $58B
  • YoY Growth: +52%

The Jevons Paradox in Silicon

Historical precedent suggests efficiency gains typically expand total resource consumption rather than contract it. The dynamic, known as the Jevons Paradox after William Stanley Jevons, the 19th-century economist who observed coal consumption rising as steam engines became more efficient, applies directly to AI infrastructure. Morgan Stanley framed it explicitly: “If TurboQuant lowers AI operating costs to one-sixth of current levels, companies that had hesitated to adopt AI because of the cost burden will enter the AI ecosystem. This will not reduce aggregate memory demand, but instead serve as a catalyst that expands the overall size of the AI market itself.”

The mechanism is straightforward. TurboQuant’s 4-bit variant delivers up to an 8x speedup in computing attention logits on NVIDIA H100 GPUs, per VentureBeat. That efficiency makes previously uneconomical use cases viable: longer-context models, broader inference deployments across edge applications, agentic workflows requiring persistent memory states. Hyperscalers don’t pocket the savings; they redeploy freed capacity into expansion.
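For intuition on what that compression means mechanically, here is a minimal sketch of low-bit KV-cache quantization using plain symmetric per-channel quantization in NumPy. TurboQuant’s published scheme is more sophisticated; the tensor shapes, bit-width, and quantizer below are illustrative assumptions, not Google’s implementation.

```python
# Illustrative only: symmetric per-channel quantization of a KV-cache tensor
# to 4 bits. Shows the memory arithmetic behind low-bit KV-cache compression;
# TurboQuant's actual quantizer is more sophisticated than this.
import numpy as np

def quantize_kv(kv: np.ndarray, bits: int = 4):
    """Quantize a [tokens, channels] KV tensor to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                             # 7 for 4-bit
    scale = np.abs(kv).max(axis=0, keepdims=True) / qmax   # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)               # guard all-zero channels
    q = np.clip(np.round(kv / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float16) * scale

# 32K tokens of one attention head's keys, stored in fp16.
kv = np.random.randn(32_768, 128).astype(np.float16)
q, scale = quantize_kv(kv, bits=4)

# Real kernels pack two 4-bit values per byte; q is held as int8 here only
# for simplicity, so the byte count below assumes packing.
fp16_bytes = kv.size * 2                                   # 16 bits per value
int4_bytes = kv.size / 2 + scale.size * 2                  # 4 bits + scales
print(f"compression: {fp16_bytes / int4_bytes:.1f}x")      # ~4.0x
print(f"max abs error: {np.abs(dequantize_kv(q, scale) - kv).max():.4f}")
```

In production the dequantization is fused into the attention kernel itself, which is where the reported speedups come from: less memory traffic per attention operation, not just a smaller resident cache.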

“The entire industry remains capacity-constrained because demand for computing capacity to train new AI models and support exploding growth in inferencing and agentic applications exceeds supply.”

— Moody’s Ratings

Alphabet CEO Sundar Pichai quantified the demand signal in February: “The number of deals in 2025 over a billion dollars surpassed the previous three years combined,” he told investors, according to Network World. That deal backlog sits alongside persistent supply constraints: HBM capacity across all major suppliers is essentially sold out for 2026, with SK Hynix alone holding 62% market share and Samsung and Micron splitting the remainder at 17% and 21% respectively, per Introl.

Capex Signals Contradict Compression Thesis

Micron’s own guidance contradicts the demand-destruction narrative. The company reported Q2 fiscal 2026 revenue of $23.86 billion, a 196% year-over-year increase, with gross margin at 75%, according to Financial Content. Management guided Q3 FY26 to $33.5 billion in revenue and announced a $25 billion capex plan for the remainder of 2026. That level of capital commitment, announced after TurboQuant’s release, signals management expects sustained demand expansion, not contraction.

Samsung and SK Hynix are moving in parallel. Samsung plans a 50% production capacity expansion in 2026, while SK Hynix has announced a fourfold increase in infrastructure investment, per DataCenter Dynamics. Both accelerated HBM4 mass production to February 2026, several quarters ahead of the original timeline. Gross margins reflect pricing power from tight supply: TrendForce expects Samsung to achieve 63-67% gross margin in Q4 2025, with SK Hynix margins surpassing TSMC’s 60% for the first time since Q4 2018.

Key Drivers of Demand Expansion
  • Longer-context models become economically viable at reduced compression costs, expanding use cases into legal document analysis, enterprise knowledge bases, and persistent agentic memory
  • Inference workloads shift from training-centric to deployment-centric, multiplying total compute and memory requirements across distributed edge infrastructure
  • Lower per-query costs trigger new entrants into AI adoption, expanding the addressable market beyond early-adopter hyperscalers
  • Freed capacity enables hyperscalers to pursue multimodal models requiring simultaneous text, image, and video processing—each demanding distinct memory architectures

Pricing Power Persists Despite Efficiency

Memory pricing contradicts any narrative of weakening demand. DRAM prices rose roughly 50% through the first nine months of 2025, climbed another 30% in Q4, and added 20% more in early 2026, according to Counterpoint Research. Contract DDR5 prices nearly tripled, reaching $19.50 per unit from ~$7 earlier in 2025. HBM pricing remains elevated despite TurboQuant: suppliers maintain pricing discipline in an oligopoly where three firms control 100% of production.

The capital intensity of the build-out is accelerating, not decelerating. Deloitte projects AI data center capex at $400-450 billion globally in 2026, with $250-300 billion allocated to chips alone. That figure rises to $1 trillion by 2028, with AI chips exceeding $400 billion. Memory sits at the center of that spend: HBM’s share of total semiconductor content in AI servers is rising from 15% in 2024 to an estimated 25% by 2027.

Hyperscaler Capex Trajectory
Year  | Total Capex | Memory Share | HBM TAM
2024  | $550B       | 8%           | $28B
2025  | $620B       | 18%          | $38B
2026  | $700B       | 30%          | $58B
2027E | $820B       | 32%          | $78B

What to Watch

The Q2 2026 earnings cycle will test the compression-versus-expansion thesis. Micron reports in late June; investors should focus on HBM4 shipment volumes and whether capex guidance remains at $25 billion or increases. Samsung and SK Hynix earnings in late July will clarify whether capacity expansions are meeting demand or creating oversupply risk.

Hyperscaler commentary on inference workloads matters more than training spend. If Amazon, Google, and Microsoft begin quantifying the shift from model training to deployment-scale inference, particularly for agentic applications requiring persistent memory states, that validates the expansion thesis. Watch for changes in average context window lengths deployed in production: if those rise from 8K-32K tokens toward 128K-1M tokens, memory demand per query scales with context length, outpacing the compression improvements.
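To see why, consider the back-of-the-envelope arithmetic below, using illustrative dimensions for a 70B-class dense model (80 layers, 8 KV heads under grouped-query attention, head dimension 128). These figures are assumptions for illustration, not any vendor’s deployment numbers.

```python
# Rough per-request KV-cache sizing under illustrative 70B-class model
# dimensions (assumptions, not vendor figures): 80 layers, 8 KV heads,
# head_dim 128, storing both keys and values per layer.
def kv_cache_gib(context_tokens: int, bits_per_value: float,
                 layers: int = 80, kv_heads: int = 8, head_dim: int = 128) -> float:
    values = 2 * layers * kv_heads * head_dim * context_tokens   # K and V
    return values * bits_per_value / 8 / 2**30                   # bytes -> GiB

for ctx in (8_192, 32_768, 131_072, 1_048_576):
    print(f"{ctx:>9,} tokens: fp16 {kv_cache_gib(ctx, 16):6.1f} GiB | "
          f"3-bit {kv_cache_gib(ctx, 3):5.1f} GiB")
```

Under these assumptions, a 1M-token context quantized to 3 bits still occupies about 60 GiB, six times the roughly 10 GiB of a 32K context held at full fp16. Context growth outruns the compression gain.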

The strategic question for semiconductor investors: does TurboQuant represent peak efficiency before demand plateaus, or does it represent the cost reduction that unlocks the next order-of-magnitude expansion in AI deployment? Historical precedent—and current capex trajectories—suggest the latter. If correct, the HBM supercycle accelerates toward $100 billion in annual TAM by 2028 rather than plateauing at current forecasts.