• In AI models, the real bottleneck isn't computing power, it's memory

    From TechnologyDaily@1337:1/100 to All on Wed Jan 14 17:45:09 2026
In AI models, the real bottleneck isn't computing power, it's memory: Phison
    CEO on 244TB SSDs, PLC NAND, why high-bandwidth flash isn't a good idea, and why CSP profit goes hand in hand with storage capacity

    Date:
    Wed, 14 Jan 2026 17:32:17 +0000

    Description:
    Phison CEO tells us about 244TB SSDs, PLC NAND, and why High Bandwidth Flash is not a good idea.

    FULL STORY ======================================================================

The technology industry increasingly talks about GPUs as the center of
    AI infrastructure, but the limiting factor that decides which models you can run is actually memory.

In a wide-ranging interview, Phison CEO Pua Khein Seng, who invented the world's first single-chip USB flash drive, told TechRadar Pro that the focus on compute has distracted from a more basic constraint, one that shows up
    everywhere, from laptops running local inference to hyperscalers building AI data centers.

"In AI models, the real bottleneck isn't computing power - it's memory," Pua
    said. "If you don't have enough memory, the system crashes."

    Compensating for DRAM limits

That constraint is what's behind Phison's aiDAPTIV+ work, which the company discussed publicly at CES 2026, and which is essentially a way to extend AI processing to integrated GPU systems by using NAND flash as a memory pool.

    Pua describes it as using SSD capacity to compensate for DRAM limits and keep GPUs focused on compute instead of waiting on memory.

"Our invention uses SSDs as a complement to DRAM memory," he says. "We use this as memory expansion."
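
    Pua doesn't walk through the mechanics, but the pattern he is describing, tiering data between DRAM and flash so the working set can exceed physical RAM, can be sketched in a few lines. The following is a minimal illustration in Python, not Phison's aiDAPTIV+ code: the TieredTensorStore class, its naive eviction policy, and the .npy file backing are all assumptions made for the example.

    # Illustrative sketch only -- not Phison's aiDAPTIV+ implementation.
    # Hot tensors stay in DRAM; cold ones are spilled to an SSD-backed
    # file and faulted back in on demand.
    import os
    import tempfile
    import numpy as np

    class TieredTensorStore:
        def __init__(self, dram_budget_bytes):
            self.budget = dram_budget_bytes
            self.in_dram = {}   # name -> np.ndarray held in RAM
            self.on_ssd = {}    # name -> path of spilled .npy file
            self.used = 0

        def put(self, name, arr):
            if self.used + arr.nbytes > self.budget:
                self._spill(arr.nbytes)
            self.in_dram[name] = arr
            self.used += arr.nbytes

        def _spill(self, needed):
            # Evict tensors to SSD until enough DRAM is free.
            while self.in_dram and self.used + needed > self.budget:
                name, arr = self.in_dram.popitem()
                path = os.path.join(tempfile.gettempdir(), name + ".npy")
                np.save(path, arr)          # write to flash
                self.on_ssd[name] = path
                self.used -= arr.nbytes

        def get(self, name):
            if name in self.in_dram:
                return self.in_dram[name]
            arr = np.load(self.on_ssd.pop(name))  # fault back from SSD
            self.put(name, arr)                   # may spill something else
            return arr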

A practical goal is improving responsiveness during inference, especially
    Time to First Token, the delay between submitting a prompt and seeing the first output. Pua argues a long TTFT makes local AI feel broken, even when the model eventually completes the task.

"If you ask your device something and have to wait 60 seconds for the first word, would you wait?" he says. "When I ask something, I can wait two seconds. But if it takes 10 seconds, users will think it's garbage."
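
    TTFT itself is easy to measure against any streaming inference API. This is a generic sketch; stream_tokens is a hypothetical stand-in for whatever token generator a given stack exposes, not a real library call.

    import time

    def measure_ttft(stream_tokens, prompt):
        # stream_tokens(prompt) is assumed to return an iterator that
        # yields tokens as the model produces them.
        start = time.perf_counter()
        first = next(stream_tokens(prompt))   # blocks until the first token
        ttft = time.perf_counter() - start
        print(f"first token {first!r} after {ttft:.2f}s")
        return ttft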

    Pua links TTFT improvements to better reuse of memory-heavy inference data, particularly KV cache, comparing it to a doctor repeating the same instructions to every patient because nothing is saved between visits.

"In AI inference, there's something called KV cache - it's like cookies in web browsing," he expanded. "Most systems don't have enough DRAM, so every time you ask the same question, it has to recompute everything."

Phison's approach, Pua added, is to store frequently used cache in storage so the system can retrieve it quickly when a user repeats or revisits a
    query.
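
    The article doesn't specify the on-disk format, but the shape of the idea, persisting the KV cache keyed by prompt so a repeated query skips the expensive prefill, looks roughly like the sketch below. The hashed-filename layout and pickle serialization are illustrative assumptions, not how aiDAPTIV+ actually stores its data.

    import hashlib
    import pathlib
    import pickle

    CACHE_DIR = pathlib.Path("/tmp/kv_cache")
    CACHE_DIR.mkdir(exist_ok=True)

    def kv_path(prompt):
        digest = hashlib.sha256(prompt.encode()).hexdigest()
        return CACHE_DIR / (digest + ".pkl")

    def get_kv(prompt, prefill):
        """Return the KV cache for prompt, recomputing only on a miss."""
        path = kv_path(prompt)
        if path.exists():
            return pickle.loads(path.read_bytes())  # fast path: SSD read
        kv = prefill(prompt)                        # slow path: full prefill
        path.write_bytes(pickle.dumps(kv))          # persist for next time
        return kv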

That memory-first framing extends beyond laptops into how companies build GPU servers: Pua notes that many organizations buy extra GPUs not for compute throughput, but to collect more VRAM, which leads to wasted silicon.

"Without our solution, people buy multiple GPU cards primarily to aggregate memory, not for compute power," he adds. "Most of those expensive GPUs end up idle because they're just being used for their memory."
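
    A back-of-the-envelope calculation shows how memory rather than compute can set the card count. The model size, precision, and VRAM figures below are generic assumptions for illustration, not numbers from Pua.

    # How many GPUs does it take just to *hold* a model?
    PARAMS = 70e9            # a 70B-parameter model (assumption)
    BYTES_PER_PARAM = 2      # FP16 weights
    VRAM_PER_GPU = 80e9      # an 80GB accelerator (assumption)

    weights_bytes = PARAMS * BYTES_PER_PARAM             # 140 GB of weights
    gpus_for_memory = -(-weights_bytes // VRAM_PER_GPU)  # ceiling division
    print(f"{weights_bytes / 1e9:.0f} GB of weights needs "
          f"{int(gpus_for_memory)} GPUs before any KV cache is stored")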

If SSDs can provide a larger memory pool, Pua says, GPUs can be bought and scaled for compute instead. "Once you have enough memory, then you can focus
    on compute speed," he notes. "If one GPU is slow, you can add two, four, or eight GPUs to improve computing power."

    244TB SSDs

    From there, Pua widened the lens to the economics of hyperscalers and AI infrastructure, describing the current wave of GPU spending as necessary but incomplete, because the business case for AI depends on inference, and inference depends on data storage.

"CSPs have invested over $200 billion in GPUs," he says. "They're not making
    money directly from GPUs. The revenue comes from inference, which requires massive data storage."

He summarized the situation with a line he returned to repeatedly: "CSP profit equals storage capacity."

That argument also feeds into Phison's push toward extreme-capacity enterprise SSDs. The company has announced a 244TB model, and Pua told us, "Our current 122TB drive uses our X2 controller with 16-layer NAND stacking. To reach 244TB, we simply need 32-layer stacking. The design is complete, but the challenge is manufacturing yield."

He also outlined an interesting alternative route: higher-density NAND dies. "We're waiting for 4Tb NAND dies; with those, we could achieve 244TB with just 16 layers," he said, adding that timing would depend on manufacturing
    maturity.
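
    The arithmetic behind both routes is straightforward to check. The per-die density and package count below are assumptions chosen to match the stated drive sizes; real board layouts will differ.

    # Raw NAND capacity = density per die x dies per stack x packages.
    def raw_tb(tbit_per_die, dies_per_stack, packages):
        return tbit_per_die / 8 * dies_per_stack * packages  # Tbit -> TB

    print(raw_tb(2, 16, 32))  # 128.0 TB raw: 2Tb dies, 16-high (today's ~122TB drive)
    print(raw_tb(2, 32, 32))  # 256.0 TB raw: same dies, 32-high stacking (~244TB)
    print(raw_tb(4, 16, 32))  # 256.0 TB raw: 4Tb dies, 16-high stacking (~244TB)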

On PLC NAND, Pua was clear that Phison doesn't control when it arrives, but he told us he intends to support it once manufacturers can ship it reliably.

"PLC is five-bit NAND; that's primarily a NAND manufacturer decision, not ours," he said. "When NAND companies mature their PLC technology, our SSD designs
    will be ready to support it."
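
    The appeal of PLC is plain density arithmetic: one more bit per cell than QLC from the same number of cells. A quick comparison (the gain column is relative to QLC):

    # Bits stored per NAND cell, by cell type.
    for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4), ("PLC", 5)]:
        print(f"{name}: {bits} bits/cell ({bits / 4 - 1:+.0%} vs QLC)")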

    He was more skeptical about a different storage trend: tying flash directly into GPU-style memory stacks, sometimes discussed under labels like high-bandwidth flash. Pua argued the endurance mismatch creates a nasty failure mode.

"The challenge with integrating NAND directly with GPUs is the write cycle limitation," he said. "NAND has finite program/erase cycles. If you integrate them, when the NAND reaches end-of-life, you have to discard the entire expensive GPU card."
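
    The endurance argument reduces to a rough lifetime formula: the total data a flash device can absorb is approximately its capacity times its program/erase cycle rating. The figures below are illustrative assumptions, not Phison's, but they show how flash soldered onto a GPU card could wear out years before the GPU silicon around it.

    # Rough flash lifetime: capacity x P/E cycles / daily write volume.
    CAPACITY_TB = 8           # flash integrated next to a GPU (assumption)
    PE_CYCLES = 3000          # typical TLC-class endurance (assumption)
    WRITES_TB_PER_DAY = 20    # heavy inference / KV-cache traffic (assumption)

    lifetime_days = CAPACITY_TB * PE_CYCLES / WRITES_TB_PER_DAY
    print(f"~{lifetime_days / 365:.1f} years until the flash (and the whole "
          "integrated card) is worn out")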

Phison's preferred model is modular: keeping SSDs as replaceable,
    plug-and-play components. When an SSD wears out, he says, you simply replace it while keeping the expensive GPU.

Taken together, Pua's view of the AI hardware future is less about chasing ever-larger GPUs and more about building systems where memory capacity is cheap, scalable, and replaceable.

Whether the target is local inference on an integrated GPU or rack-scale inference in a hyperscaler, the company is betting that storage density and memory expansion will decide what's practical long before another jump in compute does.




    ======================================================================
Link to news story: https://www.techradar.com/pro/in-ai-models-the-real-bottleneck-isnt-computing-power-its-memory-phison-ceo-on-244tb-ssds-plc-nand-why-high-bandwidth-flash-isnt-a-good-idea-and-why-csp-profit-goes-hand-in-hand-with-storage-capacity


    --- Mystic BBS v1.12 A49 (Linux/64)
    * Origin: tqwNet Technology News (1337:1/100)