In Brief:
- Enterprise SSD demand is displacing client and embedded NAND allocations.
- RAG and vector databases are pulling storage closer to compute, shifting the system balance between DRAM capacity, NAND capacity, and IOPS.
- Suppliers are leaning on process upgrades and QLC, while OEMs hedge with qualification breadth, LTAs, and capacity-flexible designs.
Phison chief executive K.S. Pua has been making headlines this week by stating candidly what he thinks comes next for flash memory. In a Chinese-language interview cited by multiple outlets, he described major OEMs as "pleading" with suppliers for flash memory, warned that smaller players may not secure supply, and characterised the outlook for the second half of 2026 in apocalyptic terms, including the line that "consumer electronics are finished," and that the market "will see a lot of victims."
Industrial electronics engineers have heard versions of this before. Memory cycles are vicious, and they rarely end politely. The reason Pua’s comments are landing now is that they line up with what the market data is already signalling: TrendForce expects NAND Flash contract prices to rise 33–38% quarter-on-quarter in 1Q26, explicitly tying the move to disciplined supplier capacity management and robust server demand displacing other applications. That is not a forecast of “no NAND anywhere,” but it is a warning that allocation will become the real product, and price is simply the paperwork around it.
What is actually changing in NAND demand
NAND has spent two decades expanding by stealth — more layers, more bits per cell, and more die per package — while most system designers treated it as a solved problem. AI inference is breaking that assumption because it creates storage demand that is simultaneously capacity-heavy, latency-sensitive, and operationally expensive (writes, endurance, and bandwidth all matter at once).
At the front end, training clusters still ingest, checkpoint, and reshuffle multi-terabyte datasets, but inference is where NAND becomes infrastructure rather than a component. Retrieval-augmented generation (RAG) and vector search architectures pull large embeddings and index structures into “always-on” storage tiers that sit close to GPUs, and they reward predictable micro-latency and high random-read throughput. Kioxia’s own AI storage material makes the point directly: RAG vector databases scale up quickly, and an SSD-based approach is positioned as a way to avoid DRAM scaling limits while keeping query performance acceptable.
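The scaling pressure behind that argument is easy to see with back-of-envelope arithmetic. The sketch below sizes a vector index in memory; all figures (corpus size, dimensionality, index overhead) are illustrative assumptions, not vendor data.

```python
# Back-of-envelope sizing for a RAG vector index: why embeddings
# migrate from DRAM to SSD tiers as corpora grow.

def index_size_gib(num_vectors: int, dim: int, bytes_per_component: int = 4,
                   overhead: float = 1.5) -> float:
    """Raw embedding storage plus graph/index overhead, in GiB."""
    raw = num_vectors * dim * bytes_per_component
    return raw * overhead / 2**30

# 1 billion 1024-dim float32 embeddings with an assumed 1.5x index overhead:
size = index_size_gib(1_000_000_000, 1024)
print(f"{size:.0f} GiB")  # → 5722 GiB: far beyond commodity DRAM, routine for SSD
```

At multi-terabyte index sizes, holding everything in DRAM stops being a pricing question and becomes a platform question, which is exactly the gap SSD-based approaches are positioned to fill.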
That shift changes purchasing behaviour. Hyperscalers do not buy “some SSDs.” They lock capacity, qualify specific SKUs, and then treat SSDs as a fleet asset with defined performance envelopes, endurance budgets, and failure-rate targets. TrendForce is effectively saying the same thing in market language: NAND demand is becoming polarised between consumer applications and AI-driven infrastructure, with suppliers shifting output from client SSDs to data centre SSDs, and high-capacity, low-cost QLC products “especially limited.”
There is also another accelerant: cold storage. When nearline HDD supply tightens, the industry does not stop storing data; it substitutes. TrendForce has argued that data centres and cloud service providers are being pushed toward high-capacity QLC enterprise SSDs as replacements, and that the combination of yield and validation challenges constrains supply for those very high-capacity drives. The net effect is that the most bit-hungry segment is also the hardest segment to manufacture smoothly.
Why supply cannot “just ramp” in time
The persistent myth is that NAND is “commodity,” so supply can respond quickly. In reality, NAND output is the product of wafer starts, layer scaling, yields, packaging capacity, controller availability, firmware qualification, and customer-specific validation cycles. If any one of those becomes the bottleneck, more capex simply buys you a larger queue.
Even within NAND itself, the industry is juggling contradictory goals. Layer-count increases and bits-per-cell moves (TLC to QLC, and beyond) raise bits per wafer, but they also raise the burden on process control (high aspect-ratio etch, deposition uniformity, and defect density), and they complicate endurance and performance management at the device level. QLC, in particular, can be excellent for read-heavy, cold-tier workloads, but it is far less forgiving under sustained write, and it demands more controller sophistication, better wear management, and more aggressive caching strategies to hit enterprise QoS targets.
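Why sustained writes punish QLC is, at bottom, arithmetic. The sketch below budgets drive lifetime from rated P/E cycles, a write amplification factor (WAF), and a steady host write rate; every number is an illustrative assumption, not a datasheet value.

```python
# Rough endurance budgeting for a QLC drive under sustained writes.
# All inputs are illustrative assumptions, not vendor specifications.

def lifetime_years(capacity_tb: float, pe_cycles: int, waf: float,
                   host_writes_tb_per_day: float) -> float:
    """Years until rated P/E cycles are consumed, given a write
    amplification factor (WAF) and a steady host write rate."""
    total_nand_writes_tb = capacity_tb * pe_cycles        # rated TBW at WAF = 1
    daily_nand_writes_tb = host_writes_tb_per_day * waf   # what the NAND actually sees
    return total_nand_writes_tb / daily_nand_writes_tb / 365

# 30.72 TB QLC drive, 1,500 rated P/E cycles, WAF of 3, 5 TB/day of host writes:
print(f"{lifetime_years(30.72, 1500, 3.0, 5.0):.1f} years")  # prints "8.4 years"
```

Note that halving WAF doubles the lifetime, which is why controller sophistication and caching strategy matter as much as the raw cycle rating.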
Meanwhile, suppliers are making deliberate allocation choices. TrendForce’s January briefing is explicit that profit-maximising suppliers are shifting supply from client SSDs to data centre SSDs, and that they are controlling shipments in a way that tightens availability across categories. In December, TrendForce data cited by Astute Group described tightening wafer supply across major density segments, with strong enterprise SSD demand, phasing out of older nodes, and sharp contract price increases, particularly in TLC.
This is where industrial and embedded buyers get nervous, because embedded storage is a rounding error in volume terms but a headache in product terms. eMMC and UFS are long-life-cycle parts, often qualified to conservative temperature and reliability profiles, and tied to specific controller-plus-NAND combinations. When supply tightens, “equivalent” is rarely equivalent.
NAND outside the data centre
It would be a mistake to frame this as hyperscalers versus consumer gadgets. NAND content is rising almost everywhere, and the list of applications that now assume local non-volatile storage keeps expanding.
Automotive is the obvious example. Micron’s investor material points to ADAS and AI-enhanced in-cabin experiences driving “significantly higher memory and storage content,” and it places embedded AI systems — drones, robots, and AR/VR — in the same trajectory. Modern vehicles are increasingly designed around software-defined features, multi-camera sensor ingestion, event logging, map and model updates, and richer infotainment. All of that lands on NAND, even when the marketing slide says “experience.”
Industrial equipment is following the same pattern, just with less gloss. Edge inference wants local model storage, local feature stores, and local logging because connectivity is not guaranteed and cloud latency is often unacceptable. Add regulatory pressure to retain data, and NAND becomes part of compliance, not just functionality. For engineers, the critical point is that these are not bursty consumer writes; they are sustained operational workloads with real endurance implications, which forces higher-capacity parts, stronger controllers, and better power-loss protection strategies.
What mitigation looks like in practice
Phison’s own language hints at the risk profile: he said he and many others are now “memory beggars,” and claimed his customers’ fulfilment rate is under 30%. Even if the exact percentage varies by segment, the procurement lesson is the same: spot buys fail first, and long-tail SKUs get rationed first.
For system designers and supply-chain teams, mitigation falls into three buckets.
First, specification flexibility. If a product design hardcodes a single capacity point, a single package geometry, and a single interface generation, it is volunteering to be squeezed. Engineering teams that define qualified “bands” (for example, multiple densities, alternate package heights, and validated second-source controllers) are buying themselves optionality. In embedded, that often means treating eMMC and UFS as design elements with roadmap ownership, not as components thrown over the wall at the end of the project.
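A qualified "band" can be as simple as a structured list of validated alternates that procurement can filter against the design envelope. The sketch below is a hypothetical example; the part names, fields, and limits are invented for illustration.

```python
# A qualified "band" for one embedded storage slot: multiple densities,
# package heights, and second-source controllers validated up front.
# Vendors, controllers, and field values here are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class StorageOption:
    vendor: str
    density_gb: int
    package_height_mm: float
    controller: str
    qualified: bool

BAND = [
    StorageOption("VendorA", 64, 1.0, "CtlX", True),
    StorageOption("VendorA", 128, 1.0, "CtlX", True),
    StorageOption("VendorB", 64, 1.2, "CtlY", True),
    StorageOption("VendorC", 64, 1.0, "CtlZ", False),  # still in qualification
]

def sourceable(band, min_density_gb, max_height_mm):
    """Qualified alternates that still meet the design envelope."""
    return [o for o in band
            if o.qualified
            and o.density_gb >= min_density_gb
            and o.package_height_mm <= max_height_mm]

print(len(sourceable(BAND, 64, 1.2)))  # → 3 sourceable options, not 1
```

The point is not the code but the posture: when allocation tightens, a team with three pre-qualified alternates negotiates; a team with one part number waits.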
Second, qualification strategy. Enterprise buyers already behave this way: they qualify multiple SSD SKUs, enforce firmware baselines, and maintain approved alternates because they assume the market will misbehave. Industrial OEMs can do the same, scaled to their reality: qualify alternates early, insist on PCN discipline, and keep a controlled process for introducing new NAND generations without resetting the entire product certification stack.
Third, architectural pressure relief. Data centres are already experimenting with SSD-centric approaches to reduce DRAM pressure in RAG systems, including SSD-based ANNS implementations that move index data to SSD and reduce the need to hold it in DRAM. The industrial analogue is less exotic but equally effective: tier cold data away from premium NAND, use aggressive compression and log management, and design for predictable write patterns to reduce write amplification. If you cannot reduce demand, you can at least stop wasting it.
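The leverage of those mitigations compounds, which a quick model makes visible. The ratios below (cold fraction, compression ratio, WAF improvement) are illustrative assumptions, not measured figures.

```python
# Sketch of how tiering, compression, and predictable write patterns
# stretch a fixed premium-NAND endurance budget. Ratios are illustrative.

def effective_premium_writes_tb(total_writes_tb: float, cold_fraction: float,
                                compression_ratio: float, waf: float) -> float:
    """NAND-level writes landing on the premium tier after diverting the
    cold fraction elsewhere and compressing what remains."""
    hot_host_writes = total_writes_tb * (1 - cold_fraction)
    return hot_host_writes / compression_ratio * waf

baseline = effective_premium_writes_tb(100.0, 0.0, 1.0, 3.0)   # no mitigation
mitigated = effective_premium_writes_tb(100.0, 0.6, 2.0, 2.0)  # tiered, compressed, lower WAF
print(baseline, mitigated)  # 300.0 vs 40.0 TB of premium-NAND writes
```

In this sketch, three modest mitigations cut premium-tier NAND writes by 7.5x, which is the difference between a drive fleet that ages gracefully and one that needs replacing mid-programme.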
Geopolitics and manufacturing concentration
Even if demand and supply were perfectly forecast, the NAND supply chain still sits on a narrow manufacturing base and a global tool ecosystem. That makes geopolitical friction a genuine operational risk rather than a headline risk.
Reuters reported in September 2025 that tighter US export restrictions would make it harder for Samsung and SK hynix to import US semiconductor manufacturing equipment into China, with analysts estimating that around a third of Samsung's NAND output is produced in China, and that 30–40% of SK hynix's DRAM and NAND production is based there. The Financial Times' reporting on the same policy shift highlighted similar exposure in specific Chinese fabs and the long-term constraint on upgrades.
Washington later granted annual licences for 2026 equipment shipments, but “annual approvals” are still a weaker guarantee than standing waivers, and they introduce planning friction in a business that already runs on multi-year timelines. In a tight NAND market, reduced flexibility is not a theoretical problem; it is what turns a squeeze into a shortage for the buyers at the back of the allocation queue.
The practical takeaway is dull, but it is the only one that matters: if your product ships on a memory component, treat memory as a programme risk. Pua has supplied the hook, TrendForce has supplied the numbers, and the rest is down to how disciplined engineering teams are willing to be before procurement becomes a contact sport.