• This tiny AMD PC just ran a massive 397B AI Model that required a

    From TechnologyDaily@1337:1/100 to All on Fri Jun 19 03:00:23 2026
    This tiny AMD PC just ran a massive 397B AI Model that required a server room full of GPUs a year ago

    Date:
    Fri, 19 Jun 2026 00:20:00 +0000

    Description:
    Longsys shows how far edge AI computing has come by running a mammoth 397-billion-parameter model on a customized AMD Ryzen AI Max+ 395-based PC with 128GB of RAM.

    FULL STORY ======================================================================
    AMD's Ryzen AI Halo recently went on sale for $4,000, sparking an interesting debate about how it compares with Nvidia's slightly pricier DGX Spark offering.

    The configuration that the Ryzen AI Halo offers, however, has been on the market for a few months now. While most OEMs and enterprise providers are offering the same flavor and configuration, Shenzhen-based memory and storage company Longsys has taken things a step further. The storage giant demonstrated a locally run version of a 397B-parameter AI model on its own take on the Ryzen AI Halo, featuring the same 16-core Ryzen AI Max+ 395 and 128GB of RAM.

    How was the Ryzen AI Max+ 395 able to run such a massive model with only 128GB of RAM? While the model was not explicitly named, it appears to be a customized version derived from Alibaba's Qwen 3.5 397B (A17B), a multimodal foundation model built on a Mixture-of-Experts (MoE) architecture, the same approach that made the original DeepSeek such a potent challenger.
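    The "A17B" in the model name refers to roughly 17 billion parameters being active per token: in an MoE layer, a small router scores every expert and only the top few actually run. The toy Python sketch below illustrates top-k routing with scalar "experts", renormalising the gate weights over the chosen experts (one common variant). The expert count, k=2, and the function names are illustrative assumptions, not Qwen's actual configuration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])  # renormalise over the chosen k only
    return list(zip(chosen, gates))

def moe_forward(x, experts, router_logits, k=2):
    """Combine the outputs of the selected experts; unselected experts never run."""
    return sum(gate * experts[idx](x) for idx, gate in route_token(router_logits, k))

# Usage: 8 toy "experts", each of which just scales its input by its own index.
experts = [lambda x, s=s: s * x for s in range(8)]
logits = [0.1, 2.0, -1.0, 1.5, 0.0, -0.5, 0.3, 0.9]
print(route_token(logits, k=2))  # experts 1 and 3 are chosen
print(moe_forward(1.0, experts, logits, k=2))
```

    Because only k experts run per token, the rest can sit anywhere, including slower storage, as long as they can be fetched before the router selects them.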

    Even with INT4 quantization, the model's memory requirements far exceed what the demonstration device had on offer: only 96GB of the 128GB unified memory pool is available to the GPU as VRAM, versus an estimated 200-250GB of VRAM the model needs to run.
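    The arithmetic behind that estimate is straightforward: at 4 bits per weight, each parameter takes half a byte. This back-of-the-envelope sketch assumes exactly 4 bits per weight and decimal gigabytes, and it counts weights only, ignoring quantization scales, KV cache and runtime overhead, which is why real-world estimates land higher. It also shows why MoE matters here: the roughly 17B parameters a single token touches are a small fraction of the total.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

total_int4 = weight_footprint_gb(397, 4)   # every expert resident
active_int4 = weight_footprint_gb(17, 4)   # parameters one token actually touches ("A17B")
gpu_budget_gb = 96                         # GPU-visible memory quoted in the article

print(f"all weights at INT4: {total_int4:.1f} GB")                   # 198.5 GB
print(f"shortfall vs 96 GB: {total_int4 - gpu_budget_gb:.1f} GB")    # 102.5 GB
print(f"active weights per token: {active_int4:.1f} GB")             # 8.5 GB
```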

    The secret sauce is Longsys's recently unveiled custom SPU and iSA configuration, which compresses data in real time. The company says this lets it fit up to twice as much data on storage drives of up to 128GB, while a caching layer considerably reduces DRAM requirements.

    The approach involves offloading experts that are not in active use to a large, fast storage buffer, from which the AI chip can reload them when needed.
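    Longsys has not published its cache policy, so as one illustration, here is a minimal expert cache assuming a least-recently-used (LRU) eviction rule: a fixed number of experts stay resident in fast memory, and a miss triggers a read from storage. The class name, the `load_from_storage` callback and the capacity are hypothetical.

```python
from collections import OrderedDict

class ExpertCache:
    """Keep at most `capacity` experts in fast memory; the rest stay on storage.

    Assumes an LRU eviction policy; Longsys has not disclosed its actual policy.
    """

    def __init__(self, capacity, load_from_storage):
        self.capacity = capacity
        self.load_from_storage = load_from_storage  # hypothetical callback: expert_id -> weights
        self.resident = OrderedDict()               # expert_id -> weights, least recent first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)    # mark as most recently used
            self.hits += 1
            return self.resident[expert_id]
        self.misses += 1
        weights = self.load_from_storage(expert_id)  # slow path: read from the SSD tier
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)        # evict the least recently used expert
        return weights

# Usage with a stand-in loader that records every storage read.
loads = []

def fake_loader(expert_id):
    loads.append(expert_id)
    return f"weights-{expert_id}"

cache = ExpertCache(capacity=2, load_from_storage=fake_loader)
for eid in [0, 1, 0, 2, 1]:
    cache.get(eid)
print(loads, cache.hits, cache.misses)  # [0, 1, 2, 1] 1 4
```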

    In a press release, Longsys claimed its approach works by targeting "the pain points of MoE LLMs", such as large parameter counts, rapid KV cache expansion, and I/O latency that hampers inference efficiency.

    "It leverages expert offloading, intelligent cache management, and predictive prefetch algorithms to efficiently resolve storage scheduling challenges and comprehensively improve local AI inference smoothness," the company added.
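    "Predictive prefetch" can take many forms, and Longsys has not described its algorithm. One simple possibility, sketched here purely as an assumption, is to count which experts tend to fire in the next layer after a given expert fires in the current one, then request the most likely candidates from storage while the current layer is still computing. `NextLayerPredictor` and its methods are hypothetical names.

```python
from collections import Counter

class NextLayerPredictor:
    """Count which experts fire in layer L+1 after each expert fires in layer L."""

    def __init__(self):
        self.transitions = {}  # (layer, expert) -> Counter of experts seen in layer + 1

    def observe(self, layer, experts_now, experts_next):
        for e in experts_now:
            self.transitions.setdefault((layer, e), Counter()).update(experts_next)

    def predict(self, layer, experts_now, n=2):
        """Return up to n experts for layer + 1, most frequently co-occurring first."""
        votes = Counter()
        for e in experts_now:
            votes.update(self.transitions.get((layer, e), Counter()))
        return [e for e, _ in votes.most_common(n)]

# Usage: routing traces where expert 3 in layer 0 is usually followed by experts 5 and 7.
predictor = NextLayerPredictor()
predictor.observe(layer=0, experts_now=[3], experts_next=[5, 7])
predictor.observe(layer=0, experts_now=[3], experts_next=[5, 7])
predictor.observe(layer=0, experts_now=[3], experts_next=[5, 1])
prefetch = predictor.predict(layer=0, experts_now=[3], n=2)
print(prefetch)  # [5, 7]
```

    Because the prediction runs one layer ahead, a correct guess hides the storage read behind computation; a wrong guess only costs wasted bandwidth, since the expert the router actually picks can still be loaded on demand.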

    It is important to note that, impressive as the feat is, Longsys did not provide performance figures such as tokens per second, a metric on which the Ryzen AI chip is relatively limited compared with most modern AI GPUs.
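    Without published figures, a rough ceiling can still be reasoned about: single-stream decoding is usually memory-bandwidth bound, because every generated token has to read its active weights at least once. The sketch below turns that into a formula. The ~256 GB/s RAM figure is an assumed theoretical peak for a 256-bit LPDDR5X-8000 interface, the 7 GB/s SSD figure and the 10% miss rate are hypothetical, and the model ignores compute time, KV cache reads and any overlap between RAM and SSD transfers, so treat the output as an upper bound rather than a prediction.

```python
def decode_ceiling_tokens_per_s(active_gb, ram_bw_gbs, miss_fraction=0.0, ssd_bw_gbs=7.0):
    """Upper bound on single-stream decode speed if each token streams its active weights once.

    miss_fraction is the share of active weights read from SSD instead of RAM.
    RAM and SSD reads are assumed not to overlap, and compute time is ignored.
    """
    if not 0.0 <= miss_fraction <= 1.0:
        raise ValueError("miss_fraction must be between 0 and 1")
    ram_seconds = active_gb * (1.0 - miss_fraction) / ram_bw_gbs
    ssd_seconds = active_gb * miss_fraction / ssd_bw_gbs
    return 1.0 / (ram_seconds + ssd_seconds)

ACTIVE_GB = 8.5     # ~17B active parameters at 4 bits each, weights only
RAM_BW_GBS = 256.0  # assumed theoretical peak of a 256-bit LPDDR5X-8000 interface
SSD_BW_GBS = 7.0    # hypothetical sequential read speed of a fast PCIe 4.0 x4 SSD

all_in_ram = decode_ceiling_tokens_per_s(ACTIVE_GB, RAM_BW_GBS)
ten_pct_from_ssd = decode_ceiling_tokens_per_s(ACTIVE_GB, RAM_BW_GBS, 0.10, SSD_BW_GBS)
print(f"{all_in_ram:.1f} tok/s ceiling with every active expert in RAM")
print(f"{ten_pct_from_ssd:.1f} tok/s ceiling if 10% of active weights come from SSD")
```

    Even a small share of SSD reads dominates the per-token time in this model, which is why cache hit rate and prefetch accuracy matter as much as raw capacity.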

    Regardless, the approach, which essentially treats storage as memory, suggests that local AI might be able to run considerably larger models, and that memory might not be as hard a constraint for certain workloads.

    It shows that memory constraints can be circumvented with fast storage, letting a frontier-level model that would otherwise require tens of thousands of dollars in AI hardware run locally, which is no small feat. Models that were previously confined to datacenters can now run on a device that fits in the palm of your hand.



    ======================================================================
    Link to news story: https://www.techradar.com/pro/this-tiny-amd-pc-just-ran-a-massive-397b-ai-model-that-required-a-server-room-full-of-gpus-a-year-ago


    --- Mystic BBS v1.12 A49 (Linux/64)
    * Origin: tqwNet Technology News (1337:1/100)